[ovs-dev] [PATCH v3 0/3] XPS implementation (Second part of XPS patch-set).

2016-07-11 Thread Ilya Maximets
This is the second part of the XPS patch-set, which contains XPS itself.
The implementation makes PMD threads use the dp->ports structure, which
requires replacing port_mutex with an rwlock.

Generic changes are also applied to fat-rwlock itself to add new
functionality: upgrading a read-lock to a write-lock and downgrading it back.

Patches 1 and 2 are new.
Patch 3 is a slightly fixed version of the one from v2:
http://openvswitch.org/pipermail/dev/2016-May/070901.html

Version 3:
* Dropped already applied changes.
* fat-rwlock used instead of port_mutex.
* revalidation of 'non-pmd' thread's tx queues added to
  'dpif_netdev_run' to make it faster.

Ilya Maximets (3):
  fat-rwlock: Make fat-rwlock upgradable.
  dpif-netdev: Use fat-rwlock to protect dp->ports.
  dpif-netdev: XPS (Transmit Packet Steering) implementation.

 lib/dpif-netdev.c | 233 +++---
 lib/fat-rwlock.c  |  56 -
 lib/fat-rwlock.h  |  11 +++
 3 files changed, 217 insertions(+), 83 deletions(-)

-- 
2.7.4



[ovs-dev] [PATCH v3 3/3] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-11 Thread Ilya Maximets
If the number of CPUs in pmd-cpu-mask is not divisible by the number of
queues, and in a few more complex situations, TX queue-ids may be distributed
unfairly between PMD threads.

For example, if we have 2 ports with 4 queues each and 6 CPUs in pmd-cpu-mask,
the following distribution is possible:
<>
pmd thread numa_id 0 core_id 13:
port: vhost-user1   queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1   queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1   queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1   queue-id: 3
<>

As we can see above, the dpdk0 port is polled by threads on cores
12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id regardless
of the port.

In the previous example:

pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5

So, for the dpdk0 port, after truncation (queue-id modulo the number of queues) in netdev-dpdk:

core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1

As a result, only 2 of the 4 queues are used.

To fix this issue, a form of XPS is implemented in the following way
(see the sketch after this list):

* TX queue-ids are allocated dynamically.
* When a PMD thread tries to send packets to a new port for the first
  time, it allocates the least used TX queue for this port.
* PMD threads periodically revalidate the allocated TX queue-ids. If a
  queue wasn't used during the last XPS_CYCLES, it is freed during
  revalidation.
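
(For illustration only: a simplified, hypothetical sketch of the
"allocate the least used TX queue" step.  The real logic lives in
dpif_netdev_xps_get_tx_qid() in the patch below and additionally deals
with locking and cycle accounting.)

    /* Pick the least used tx queue of a port and account for the new
     * user.  'txq_used[i]' counts how many threads currently send to
     * queue 'i'. */
    static int
    xps_pick_least_used_txq(unsigned *txq_used, int n_txq)
    {
        int i, min_qid = 0;

        for (i = 1; i < n_txq; i++) {
            if (txq_used[i] < txq_used[min_qid]) {
                min_qid = i;
            }
        }
        txq_used[min_qid]++;    /* This thread becomes a user of the queue. */
        return min_qid;
    }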

Reported-by: Zhihong Wang 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 130 +++---
 1 file changed, 94 insertions(+), 36 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3fb1942..5eed50c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
 PMD_N_CYCLES
 };
 
+#define XPS_CYCLES 10ULL
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -256,6 +258,8 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
+unsigned *txq_used; /* Number of threads that uses each tx queue. */
+struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
 };
 
@@ -385,6 +389,8 @@ struct rxq_poll {
 /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
 struct tx_port {
 odp_port_t port_no;
+int qid;
+unsigned long long last_cycles;
 struct netdev *netdev;
 struct hmap_node node;
 };
@@ -541,6 +547,11 @@ static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
 static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 OVS_REQUIRES(pmd->port_mutex);
 
+static void
+dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd);
+static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
+  struct tx_port *tx);
+
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
 
@@ -1185,7 +1196,9 @@ port_create(const char *devname, const char *open_type, const char *type,
 port->netdev = netdev;
 port->n_rxq = netdev_n_rxq(netdev);
 port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq);
+port->txq_used = xcalloc(netdev_n_txq(netdev), sizeof *port->txq_used);
 port->type = xstrdup(type);
+ovs_mutex_init(&port->txq_used_mutex);
 
 for (i = 0; i < port->n_rxq; i++) {
 error = netdev_rxq_open(netdev, &port->rxq[i], i);
@@ -1211,7 +1224,9 @@ out_rxq_close:
 for (i = 0; i < n_open_rxqs; i++) {
 netdev_rxq_close(port->rxq[i]);
 }
+ovs_mutex_destroy(&port->txq_used_mutex);
 free(port->type);
+free(port->txq_used);
 free(port->rxq);
 free(port);
 
@@ -1353,7 +1368,8 @@ port_destroy(struct dp_netdev_port *port)
 for (unsigned i = 0; i < port->n_rxq; i++) {
 netdev_rxq_close(port->rxq[i]);
 }
-
+ovs_mutex_destroy(&port->txq_use

Re: [ovs-dev] [PATCH v3 0/3] XPS implementation (Second part of XPS patch-set).

2016-07-11 Thread Ilya Maximets
The first two patches are "being held until the list moderator can review
it for approval."

Could anyone approve them?

'fat-rwlock's are very suspicious.

Best regards, Ilya Maximets.

On 11.07.2016 18:15, Ilya Maximets wrote:
> This is the second part of XPS patch-set which contains XPS itself.
> Implementation will use dp->ports structure by PMD threads. This
> requires replacing of port_mutex with rwlock.
> 
> Also generic changes applied to fat-rwlock itself to add new
> functionality: Upgrading read-lock to write-lock and backward.
> 
> Patches 1 and 2 are new ones.
> Patch 3 is a little fixed one from v2:
> http://openvswitch.org/pipermail/dev/2016-May/070901.html
> 
> Version 3:
>   * Dropped already applied changes.
>   * fat-rwlock used instead of port_mutex.
>   * revalidation of 'non-pmd' thread's tx queues added to
> 'dpif_netdev_run' to make it faster.
> 
> Ilya Maximets (3):
>   fat-rwlock: Make fat-rwlock upgradable.
>   dpif-netdev: Use fat-rwlock to protect dp->ports.
>   dpif-netdev: XPS (Transmit Packet Steering) implementation.
> 
>  lib/dpif-netdev.c | 233 
> +++---
>  lib/fat-rwlock.c  |  56 -
>  lib/fat-rwlock.h  |  11 +++
>  3 files changed, 217 insertions(+), 83 deletions(-)
> 


Re: [ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

2016-07-11 Thread Ilya Maximets
On 07.07.2016 03:27, Daniele Di Proietto wrote:
> Hi Ilya,
> 
> 
> 
> apologies for the delay
> 
> On 20/06/2016 07:22, "Ilya Maximets"  wrote:
> 
>> On 11.06.2016 02:53, Daniele Di Proietto wrote:
>>> On 02/06/2016 06:55, "Ilya Maximets"  wrote:
>>>
>>>> Hi, Daniele.
>>>> Thanks for review.
>>>>
>>>> On 02.06.2016 04:33, Daniele Di Proietto wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> apologies for the delay.
>>>>>
>>>>> I didn't take a extremely detailed look at this series, but I have
>>>>> a few high level comments.
>>>>>
>>>>> Thanks for adding a command to configure the rxq affinity.  Have
>>>>> you thought about using the database instead?  I think it will
>>>>> be easier to use because it survives restarts, and one can batch
>>>>> the affinity assignment for multiple ports without explicitly
>>>>> calling pmd-reconfigure.  I'm not sure what the best interface
>>>>> would look like. Perhaps a string in Interface:other_config that
>>>>> maps rxqs with core ids?
>>>>>
>>>>> I'd prefer to avoid exporting an explicit command like
>>>>> dpif-netdev/pmd-reconfigure.  If we use the database we don't have to,
>>>>> right?
>>>>
>>>> I thought about solution with database. Actually, I can't see big
>>>> difference between database and appctl in this case. For automatic
>>>> usage both commands may be scripted, but for manual pinning this
>>>> approaches equally uncomfortable.
>>>> IMHO, if it will be database it shouldn't be a per 'Interface'
>>>> string with mapping, because one map influences on other ports
>>>> (core isolation). Also there is an issue with synchronization with
>>>> 'pmd-cpu-mask' that should be performed manually anyway.
>>>> appctl command may be changed to receive string of all mappings and
>>>> trigger reconfiguration. In this case there will be no need to have
>>>> explicit 'dpif-netdev/pmd-reconfigure'.
>>>
>>> Do we really need to implement core isolation? I'd prefer an interface where
>>> if an interface has an affinity we enforce that (as far as we can with the
>>> current pmd-cpu-mask), and for other interfaces we keep the current model.
>>> Probably there are some limitation I'm not seeing with this model.
>>
>> Generally, core isolation prevents polling of other ports on PMD thread.
>> This is useful to keep constant polling rate on some performance
>> critical port while adding/deleting of other ports. Without isolation
>> we will need to pin exactly all ports to achieve desired level of 
>> performance.
>>
>>> I'd prefer to keep the mapping in the database because it's more in line
>>> with the rest of OVS configuration.  The database survives crashes, restarts
>>> and reboots.
>>
>> Ok. How about something like this:
>>
>>  * Per-port database entry for available core-ids:
>>
>># ovs-vsctl set interface  \
>>  other_config:pmd-rxq-affinity=
>>
>>where:
>> ::= NULL | 
>> ::=  |
>>  , 
>> ::=  : 
>>
>>Example:
>>
>># ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>>  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>>Queue #0 pinned to core 3;
>>Queue #1 pinned to core 7;
>>Queue #2 not pinned.
>>Queue #3 pinned to core 8;
> 
> Unless someone has better ideas, this looks good

Ok. I'll implement this in third (last) part of this patch-set.

>>
>>  * Configurable mask of isolated PMD threads:
>>
>># ./bin/ovs-vsctl set Open_vSwitch . \
>>  other_config:pmd-isol-cpus=
>>Empty means "none".
> 
> I still think this looks kind of complicated.  These are the options:
> 
> 1) Do not deal with isolation.  If some "isolation" is required, the user
>has to assign the rxqs of every port.
> 
> 2) Automatically isolate cores that have rxq explicitly assigned to them.
> 
> 3) Add a pmd-isol-cpus parameter
> 
> 4) Add a rxq-default-affinity (the opposite of pmd-isol-cpus).
> 
> 1) and 2) only add a single configuratio

[ovs-dev] [PATCH v3 1/3] fat-rwlock: Make fat-rwlock upgradable.

2016-07-12 Thread Ilya Maximets
New functions 'fat_rwlock_{up,down}grade()' are introduced to allow
upgrading a read-lock to a write-lock and downgrading it back.
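
(Usage sketch for illustration only, not part of the patch.  It assumes
the existing fat_rwlock_init()/fat_rwlock_rdlock()/fat_rwlock_unlock()
API and the single-upgrader restriction documented in fat-rwlock.h
below.)

    struct fat_rwlock lock;

    fat_rwlock_init(&lock);

    fat_rwlock_rdlock(&lock);      /* Shared access. */
    /* ... decide that a modification is needed ... */
    fat_rwlock_upgrade(&lock);     /* Exclusive access; only one thread may
                                    * ever attempt to upgrade concurrently. */
    /* ... modify the protected data ... */
    fat_rwlock_downgrade(&lock);   /* Back to shared access. */
    fat_rwlock_unlock(&lock);      /* Releases the remaining read-lock. */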

Signed-off-by: Ilya Maximets 
---
 lib/fat-rwlock.c | 56 ++--
 lib/fat-rwlock.h | 11 +++
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/lib/fat-rwlock.c b/lib/fat-rwlock.c
index 2f42b05..86a4693 100644
--- a/lib/fat-rwlock.c
+++ b/lib/fat-rwlock.c
@@ -53,10 +53,13 @@ struct fat_rwlock_slot {
  *
  * - UINT_MAX: This thread has the write-lock on 'rwlock' and holds
  *   'mutex' (plus the 'mutex' of all of 'rwlock''s other slots).
+ *   'upgrade_depth' means the depth of read-lock on which it was
+ *   upgraded to write-lock.
  *
  * Accessed only by the slot's own thread, so no synchronization is
  * needed. */
 unsigned int depth;
+unsigned int upgrade_depth;
 };
 
 static void
@@ -127,6 +130,7 @@ fat_rwlock_get_slot__(struct fat_rwlock *rwlock)
 slot->rwlock = rwlock;
 ovs_mutex_init(&slot->mutex);
 slot->depth = 0;
+slot->upgrade_depth = 0;
 
 ovs_mutex_lock(&rwlock->mutex);
 ovs_list_push_back(&rwlock->threads, &slot->list_node);
@@ -236,6 +240,7 @@ fat_rwlock_wrlock(const struct fat_rwlock *rwlock_)
 
 ovs_assert(!this->depth);
 this->depth = UINT_MAX;
+this->upgrade_depth = 1;
 
 ovs_mutex_lock(&rwlock->mutex);
 LIST_FOR_EACH (slot, list_node, &rwlock->threads) {
@@ -257,11 +262,13 @@ fat_rwlock_unlock(const struct fat_rwlock *rwlock_)
 
 switch (this->depth) {
 case UINT_MAX:
+this->depth = this->upgrade_depth - 1;
 LIST_FOR_EACH (slot, list_node, &rwlock->threads) {
-ovs_mutex_unlock(&slot->mutex);
+if (slot != this || this->depth == 0) {
+ovs_mutex_unlock(&slot->mutex);
+}
 }
 ovs_mutex_unlock(&rwlock->mutex);
-this->depth = 0;
 break;
 
 case 0:
@@ -275,3 +282,48 @@ fat_rwlock_unlock(const struct fat_rwlock *rwlock_)
 break;
 }
 }
+
+/* Upgrades last taken read-lock to write-lock.
+ * Not thread-safe with 'fat_rwlock_wrlock' and concurrent upgrades. */
+void
+fat_rwlock_upgrade(const struct fat_rwlock *rwlock_)
+OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+struct fat_rwlock *rwlock = CONST_CAST(struct fat_rwlock *, rwlock_);
+struct fat_rwlock_slot *this = fat_rwlock_get_slot__(rwlock);
+struct fat_rwlock_slot *slot;
+
+ovs_assert(this->depth && this->depth != UINT_MAX);
+
+this->upgrade_depth = this->depth;
+this->depth = UINT_MAX;
+
+ovs_mutex_lock(&rwlock->mutex);
+LIST_FOR_EACH (slot, list_node, &rwlock->threads) {
+if (slot != this) {
+ovs_mutex_lock(&slot->mutex);
+}
+}
+}
+
+/* Downgrades write-lock to read-lock. */
+void
+fat_rwlock_downgrade(const struct fat_rwlock *rwlock_)
+OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+struct fat_rwlock *rwlock = CONST_CAST(struct fat_rwlock *, rwlock_);
+struct fat_rwlock_slot *this = fat_rwlock_get_slot__(rwlock);
+struct fat_rwlock_slot *slot;
+
+ovs_assert(this->depth == UINT_MAX);
+
+this->depth = this->upgrade_depth;
+this->upgrade_depth = 0;
+
+LIST_FOR_EACH (slot, list_node, &rwlock->threads) {
+if (slot != this) {
+ovs_mutex_unlock(&slot->mutex);
+}
+}
+ovs_mutex_unlock(&rwlock->mutex);
+}
diff --git a/lib/fat-rwlock.h b/lib/fat-rwlock.h
index 181fa92..70d5e95 100644
--- a/lib/fat-rwlock.h
+++ b/lib/fat-rwlock.h
@@ -46,4 +46,15 @@ int fat_rwlock_tryrdlock(const struct fat_rwlock *rwlock)
 void fat_rwlock_wrlock(const struct fat_rwlock *rwlock) OVS_ACQ_WRLOCK(rwlock);
 void fat_rwlock_unlock(const struct fat_rwlock *rwlock) OVS_RELEASES(rwlock);
 
+/*
+ * Following functions used to upgrade last taken read-lock to write-lock and
+ * downgrade it back to read-lock. Upgrading/downgrading doesn't change depth
+ * of recursive locking.
+ *
+ * Upgrading is NOT thread-safe operation, so, the caller must be sure that
+ * it is the only thread that wants to acquire write-lock.
+ */
+void fat_rwlock_upgrade(const struct fat_rwlock *rwlock);
+void fat_rwlock_downgrade(const struct fat_rwlock *rwlock);
+
 #endif /* fat-rwlock.h */
-- 
2.7.4



[ovs-dev] [PATCH v3 2/3] dpif-netdev: Use fat-rwlock to protect dp->ports.

2016-07-12 Thread Ilya Maximets
PMD threads can't wait on 'dp->port_mutex' because of a possible
deadlock with the main thread waiting in cond_wait().

This patch replaces the ovs_mutex with a fat-rwlock to allow PMD threads
to use dp->ports. This is required for the future XPS implementation.
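
(Illustration only, not a hunk from this patch: the conversion pattern
is to take read-locks for lookups into 'dp->ports' -- which PMD threads
may now do concurrently -- and write-locks only for port addition and
removal.)

    /* Before: every access to dp->ports is exclusive. */
    ovs_mutex_lock(&dp->port_mutex);
    port = dp_netdev_lookup_port(dp, port_no);
    ovs_mutex_unlock(&dp->port_mutex);

    /* After: readers, including PMD threads, can proceed in parallel. */
    fat_rwlock_rdlock(&dp->port_rwlock);
    port = dp_netdev_lookup_port(dp, port_no);
    fat_rwlock_unlock(&dp->port_rwlock);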

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 103 ++
 1 file changed, 58 insertions(+), 45 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e0107b7..3fb1942 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -183,7 +183,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
  * Acquisition order is, from outermost to innermost:
  *
  *dp_netdev_mutex (global)
- *port_mutex
+ *port_rwlock
  *non_pmd_mutex
  */
 struct dp_netdev {
@@ -196,8 +196,8 @@ struct dp_netdev {
 /* Ports.
  *
  * Any lookup into 'ports' or any access to the dp_netdev_ports found
- * through 'ports' requires taking 'port_mutex'. */
-struct ovs_mutex port_mutex;
+ * through 'ports' requires taking 'port_rwlock'. */
+struct fat_rwlock port_rwlock;
 struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
@@ -232,7 +232,7 @@ struct dp_netdev {
 
 static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
 odp_port_t)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 
 enum dp_stat_type {
 DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
@@ -481,17 +481,17 @@ struct dpif_netdev {
 
 static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no,
   struct dp_netdev_port **portp)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static int get_port_by_name(struct dp_netdev *dp, const char *devname,
 struct dp_netdev_port **portp)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static void dp_netdev_free(struct dp_netdev *)
 OVS_REQUIRES(dp_netdev_mutex);
 static int do_add_port(struct dp_netdev *dp, const char *devname,
const char *type, odp_port_t port_no)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static void do_del_port(struct dp_netdev *dp, struct dp_netdev_port *)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static int dpif_netdev_open(const struct dpif_class *, const char *name,
 bool create, struct dpif **);
 static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
@@ -511,7 +511,7 @@ static void dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd,
 int numa_id);
 static void dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_set_nonpmd(struct dp_netdev *dp)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 
 static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
   unsigned core_id);
@@ -520,7 +520,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
 static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
  struct dp_netdev_port *port);
@@ -534,7 +534,7 @@ static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
 static struct dp_netdev_pmd_thread *
 dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
-OVS_REQUIRES(dp->port_mutex);
+OVS_REQ_RDLOCK(dp->port_rwlock);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
@@ -870,7 +870,7 @@ create_dpif_netdev(struct dp_netdev *dp)
  * Return ODPP_NONE on failure. */
 static odp_port_t
 choose_port(struct dp_netdev *dp, const char *name)
-OVS_REQUIRES(dp->port_mutex)
+OVS_REQ_RDLOCK(dp->port_rwlock)
 {
 uint32_t port_no;
 
@@ -924,7 +924,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
 ovs_refcount_init(&dp->ref_cnt);
 atomic_flag_clear(&dp->destroyed);
 
-ovs_mutex_init(&dp->port_mutex);
+fat_rwlock_init(&dp->port_rwlock);
 hmap_init(&

[ovs-dev] [PATCH v4] XPS implementation (Second part of XPS patch-set).

2016-07-13 Thread Ilya Maximets
This is the second part of XPS patch-set which contains XPS itself.

Version 4:
* Dropped rwlock related patches.
* Added pointer from 'struct tx_port' to 'struct dp_netdev_port'
  to avoid locking of 'dp->ports'. This works because as long as
  a port is in a pmd thread's tx_port cache it cannot be deleted
  from the datapath.
* Added 'now' parameter to 'dp_netdev_execute_actions()' to pass
  the current time to XPS functions (see the snippet after this list).
  This is needed to avoid using 'last_cycles', which is always 0
  without DPDK.
* Fixed tx queue ids cleanup on PMD thread deletion.
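
(A minimal sketch of the idea, not an actual hunk from the patch: the
caller samples the time once and threads it down to the XPS code
instead of relying on 'last_cycles'.)

    long long now = time_msec();    /* Wall-clock milliseconds; available
                                     * with and without DPDK. */
    dp_netdev_execute_actions(pmd, &batch, true /* may_steal */,
                              actions, actions_len, now);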

Version 3:
* Dropped already applied changes.
* fat-rwlock used instead of port_mutex.
* revalidation of 'non-pmd' thread's tx queues added to
  'dpif_netdev_run' to make it faster.



Ilya Maximets (1):
  dpif-netdev: XPS (Transmit Packet Steering) implementation.

 lib/dpif-netdev.c | 170 +-
 1 file changed, 117 insertions(+), 53 deletions(-)

-- 
2.7.4



[ovs-dev] [PATCH v4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-13 Thread Ilya Maximets
If the number of CPUs in pmd-cpu-mask is not divisible by the number of
queues, and in a few more complex situations, TX queue-ids may be distributed
unfairly between PMD threads.

For example, if we have 2 ports with 4 queues each and 6 CPUs in pmd-cpu-mask,
the following distribution is possible:
<>
pmd thread numa_id 0 core_id 13:
port: vhost-user1   queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1   queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1   queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1   queue-id: 3
<>

As we can see above, the dpdk0 port is polled by threads on cores
12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id regardless
of the port.

In the previous example:

pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5

So, for the dpdk0 port, after truncation (queue-id modulo the number of queues) in netdev-dpdk:

core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1

As a result, only 2 of the 4 queues are used.

To fix this issue, a form of XPS is implemented in the following way
(see the revalidation sketch after this list):

* TX queue-ids are allocated dynamically.
* When a PMD thread tries to send packets to a new port for the first
  time, it allocates the least used TX queue for this port.
* PMD threads periodically revalidate the allocated TX queue-ids. If a
  queue wasn't used during the last XPS_TIMEOUT_MS milliseconds, it is
  freed during revalidation.
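
(Illustration only, with simplified data structures and hypothetical
names: the point of the timeout-based revalidation is that a cached tx
queue-id whose owner has been idle for XPS_TIMEOUT_MS gives up its
reference so the queue can be handed to another thread.  The real code
walks the pmd's port cache and holds 'txq_used_mutex' while touching
'txq_used'.)

    static void
    xps_revalidate_sketch(struct tx_port *tx_ports, size_t n, long long now)
    {
        size_t i;

        for (i = 0; i < n; i++) {
            struct tx_port *tx = &tx_ports[i];

            if (tx->qid >= 0 && now - tx->last_used > XPS_TIMEOUT_MS) {
                tx->port->txq_used[tx->qid]--;  /* Drop our reference... */
                tx->qid = -1;                   /* ...and forget the queue. */
            }
        }
    }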

Reported-by: Zhihong Wang 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 170 +-
 1 file changed, 117 insertions(+), 53 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e0107b7..6345944 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
 PMD_N_CYCLES
 };
 
+#define XPS_TIMEOUT_MS 500LL
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -256,6 +258,8 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
+unsigned *txq_used; /* Number of threads that uses each tx queue. */
+struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
 };
 
@@ -384,8 +388,9 @@ struct rxq_poll {
 
 /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
 struct tx_port {
-odp_port_t port_no;
-struct netdev *netdev;
+struct dp_netdev_port *port;
+int qid;
+long long last_used;
 struct hmap_node node;
 };
 
@@ -498,7 +503,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
   struct dp_packet_batch *,
   bool may_steal,
   const struct nlattr *actions,
-  size_t actions_len);
+  size_t actions_len,
+  long long now);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
 struct dp_packet_batch *, odp_port_t port_no);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
@@ -541,6 +547,12 @@ static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
 static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 OVS_REQUIRES(pmd->port_mutex);
 
+static void
+dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
+   long long now, bool purge);
+static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
+  struct tx_port *tx, long long now);
+
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
 
@@ -1185,7 +1197,9 @@ port_create(const char *devname, const char *open_type, const char *type,
 port->netdev = netdev;
 port->n_rxq = netdev_n_rxq(netdev);
 port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq);
+port->t

Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-13 Thread Ilya Maximets
Hi, Daniele.
Thanks for review.

On 13.07.2016 04:15, Daniele Di Proietto wrote:
> Thanks for the patch.
> 
> This is not a complete review, but I have some preliminary comments.
> 
> If I understand correctly 'port_mutex' is converted to rwlock because
> we want the pmd threads in dpif_netdev_xps_get_tx_qid() to be able to
> grab it concurrently.  I think that we can add a pointer from 'struct
> tx_port' to 'struct dp_netdev_port' and access that without locking.
> As long as a port is in a pmd thread tx_port cache it cannot be
> deleted from the datapath.  This way we can avoid the rwlock.

Yes, thank you for the suggestion. This greatly simplifies the patch set;
it became almost 2 times smaller.

> 'last_cycles' is only used to monitor the performances of the pmd
> threads and it is always 0 if we compile without DPDK.  Perhaps
> we can add a 'now' parameter to dp_netdev_execute_actions(),
> pass it from packet_batch_per_flow_execute() and use that instead.

Thanks, I've implemented this in v4.

> Maybe we can improve this in the future, but with this patch
> dpif-netdev calls netdev_send() taking into account n_txq, which
> is the real number of queue.  Perhaps txq_needs_locking for
> phy devices should be stored in dpif-netdev and passed to every
> invocation of netdev_send()?

Yes, I had this idea, but decided to implement it later because
there is no real benefit for the XPS patch set.

> Finally, have you thought about avoiding txq_used_mutex and
> using some form of atomic_compare_exchange() on the number
> of users, perhaps?  I'm not sure it's better than the
> mutex, I just wanted to throw this here, in case someone
> comes up with a good idea.

I thought about an atomic solution, but it would lead to some
retries or a non-optimal txq distribution. So, I chose a simple
mutex instead of a complex solution with atomics. If this place
becomes a bottleneck, we could replace the loop with a more
efficient data structure (a heap?) or invent some scheme with
atomics in the future.
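
(For illustration only -- a rough sketch of the lock-free alternative
discussed above, written with plain C11 atomics rather than the OVS
atomic wrappers.  It shows where the retries come from: another thread
may bump the chosen counter between the scan and the claim.)

    #include <stdatomic.h>

    /* 'txq_used' would have to become an array of atomic counters. */
    static int
    xps_pick_txq_lockless(atomic_uint *txq_used, int n_txq)
    {
        for (;;) {
            unsigned int min_cnt = atomic_load(&txq_used[0]);
            int i, min_qid = 0;

            for (i = 1; i < n_txq; i++) {
                unsigned int cnt = atomic_load(&txq_used[i]);

                if (cnt < min_cnt) {
                    min_cnt = cnt;
                    min_qid = i;
                }
            }

            /* Claim the queue only if its counter is unchanged;
             * otherwise rescan (the "retries" mentioned above). */
            if (atomic_compare_exchange_weak(&txq_used[min_qid],
                                             &min_cnt, min_cnt + 1)) {
                return min_qid;
            }
        }
    }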

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-13 Thread Ilya Maximets
On 13.07.2016 23:36, Ben Pfaff wrote:
> It looks like v4 doesn't need the fat-rwlock change, then?  I had been
> planning to review it but I'll skip it in that case.  Please let me know
> if you still want me to review it.

It would be great if you could take a look at the first patch of this series.
It extends the current rwlock implementation with some new features. Maybe
someone will use this functionality in the future; anyway, it would be sad to
just let it fade away somewhere in the mailing list.

Best regards, Ilya Maximets.


[ovs-dev] [PATCH v3 1/3] bridge: Pass interface's configuration to datapath.

2016-07-15 Thread Ilya Maximets
This commit adds functionality to pass the value of the 'other_config'
column of the 'Interface' table down to the datapath.

This may be used to pass options that are not directly related to the netdev
and to configure the behaviour of the datapath for different ports.
For example: pinning of rx queues to polling threads in dpif-netdev.
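
(Illustration only, not part of the patch: a datapath implementation
that wants per-port configuration would provide the new callback --
the function name below is hypothetical -- and reference it from its
'struct dpif_class' instead of the NULL used by dpif-netdev and
dpif-netlink here.)

    static int
    my_dpif_port_set_config(struct dpif *dpif OVS_UNUSED,
                            odp_port_t port_no OVS_UNUSED,
                            const struct smap *cfg)
    {
        /* Pick up the Interface:other_config keys this datapath
         * understands; unknown keys are simply ignored. */
        const char *affinity = smap_get(cfg, "pmd-rxq-affinity");

        if (affinity) {
            /* Remember the value and apply it later, e.g. from run(),
             * as the comment in dpif-provider.h allows. */
        }
        return 0;
    }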

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c  |  1 +
 lib/dpif-netlink.c |  1 +
 lib/dpif-provider.h|  5 +
 lib/dpif.c | 17 +
 lib/dpif.h |  1 +
 ofproto/ofproto-dpif.c | 16 
 ofproto/ofproto-provider.h |  5 +
 ofproto/ofproto.c  | 29 +
 ofproto/ofproto.h  |  2 ++
 vswitchd/bridge.c  |  2 ++
 10 files changed, 79 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 6345944..4643cce 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4295,6 +4295,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_get_stats,
 dpif_netdev_port_add,
 dpif_netdev_port_del,
+NULL,   /* port_set_config */
 dpif_netdev_port_query_by_number,
 dpif_netdev_port_query_by_name,
 NULL,   /* port_get_pid */
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e2bea23..2f939ae 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2348,6 +2348,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_get_stats,
 dpif_netlink_port_add,
 dpif_netlink_port_del,
+NULL,   /* port_set_config */
 dpif_netlink_port_query_by_number,
 dpif_netlink_port_query_by_name,
 dpif_netlink_port_get_pid,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 25f4280..21fb0ba 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -167,6 +167,11 @@ struct dpif_class {
 /* Removes port numbered 'port_no' from 'dpif'. */
 int (*port_del)(struct dpif *dpif, odp_port_t port_no);
 
+/* Refreshes configuration of 'dpif's port. The implementation might
+ * postpone applying the changes until run() is called. */
+int (*port_set_config)(struct dpif *dpif, odp_port_t port_no,
+   const struct smap *cfg);
+
 /* Queries 'dpif' for a port with the given 'port_no' or 'devname'.
  * If 'port' is not null, stores information about the port into
  * '*port' if successful.
diff --git a/lib/dpif.c b/lib/dpif.c
index 5f1be41..f6e5338 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -610,6 +610,23 @@ dpif_port_exists(const struct dpif *dpif, const char *devname)
 return !error;
 }
 
+/* Refreshes configuration of 'dpif's port. */
+int
+dpif_port_set_config(struct dpif *dpif, odp_port_t port_no,
+ const struct smap *cfg)
+{
+int error = 0;
+
+if (dpif->dpif_class->port_set_config) {
+error = dpif->dpif_class->port_set_config(dpif, port_no, cfg);
+if (error) {
+log_operation(dpif, "port_set_config", error);
+}
+}
+
+return error;
+}
+
 /* Looks up port number 'port_no' in 'dpif'.  On success, returns 0 and
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
diff --git a/lib/dpif.h b/lib/dpif.h
index 981868c..a7c5097 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -839,6 +839,7 @@ void dpif_register_upcall_cb(struct dpif *, upcall_callback *, void *aux);
 int dpif_recv_set(struct dpif *, bool enable);
 int dpif_handlers_set(struct dpif *, uint32_t n_handlers);
 int dpif_poll_threads_set(struct dpif *, const char *cmask);
+int dpif_port_set_config(struct dpif *, odp_port_t, const struct smap *cfg);
 int dpif_recv(struct dpif *, uint32_t handler_id, struct dpif_upcall *,
   struct ofpbuf *);
 void dpif_recv_purge(struct dpif *);
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index ce9383a..97510a9 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3542,6 +3542,21 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
 }
 
 static int
+port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
+const struct smap *cfg)
+{
+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
+struct ofport_dpif *ofport = ofp_port_to_ofport(ofproto, ofp_port);
+
+if (!ofport || sset_contains(&ofproto->ghost_ports,
+ netdev_get_name(ofport->up.netdev))) {
+return 0;
+}
+
+return dpif_port_set_config(ofproto->backer->dpif, ofport->odp_port, cfg);
+}
+
+static int
 port_get_stats(const struct ofport *ofport_, struct netdev_stats *stats)
 {
 struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
@@ -5609,6 +5624,7 @@ const struct ofproto_class ofproto_dpif_class = {
 port_query_by_name,
   

[ovs-dev] [PATCH v3 3/3] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-15 Thread Ilya Maximets
New 'other_config:pmd-rxq-affinity' field for the Interface table to
perform manual pinning of RX queues to desired cores.

This functionality is required to achieve maximum performance because
different kinds of ports have different costs of rx/tx operations and
only the user can know the expected workload on different ports.

Example:
# ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
Queue #0 pinned to core 3;
Queue #1 pinned to core 7;
Queue #2 not pinned.
Queue #3 pinned to core 8;

It was decided to automatically isolate cores that have an rxq explicitly
assigned to them, because it's useful to keep a constant polling rate on
some performance-critical ports while adding/deleting other ports
without explicitly pinning all of them.
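
(Illustration only: a minimal, hypothetical sketch of how a
"<queue-id>:<core-id>" list such as "0:3,1:7,3:8" could be parsed.
The names and error handling are not the patch's actual parser.)

    /* Fill 'core_ids' (an array of 'n_rxq' entries, pre-initialized to
     * "not pinned") from a "q:c,q:c,..." affinity list.  Returns 0 on
     * success or EINVAL on a malformed list. */
    static int
    parse_rxq_affinity_sketch(const char *list, unsigned *core_ids,
                              unsigned n_rxq)
    {
        char *copy = xstrdup(list);
        char *pair, *save_ptr = NULL;
        int error = 0;

        for (pair = strtok_r(copy, ",", &save_ptr); pair;
             pair = strtok_r(NULL, ",", &save_ptr)) {
            unsigned queue_id, core_id;

            if (sscanf(pair, "%u:%u", &queue_id, &core_id) != 2
                || queue_id >= n_rxq) {
                error = EINVAL;
                break;
            }
            core_ids[queue_id] = core_id;
        }
        free(copy);
        return error;
    }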

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md  |  49 +++-
 NEWS |   2 +
 lib/dpif-netdev.c| 218 ++-
 tests/pmd.at |   6 ++
 vswitchd/vswitch.xml |  23 ++
 5 files changed, 257 insertions(+), 41 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 5407794..7609aa7 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
  # Check current stats
ovs-appctl dpif-netdev/pmd-stats-show
 
+ # Clear previous stats
+   ovs-appctl dpif-netdev/pmd-stats-clear
+ ```
+
+  7. Port/rxq assignment to PMD threads
+
+ ```
  # Show port/rxq assignment
ovs-appctl dpif-netdev/pmd-rxq-show
+ ```
 
- # Clear previous stats
-   ovs-appctl dpif-netdev/pmd-stats-clear
+ To change default rxq assignment to pmd threads rxqs may be manually
+ pinned to desired cores using:
+
+ ```
+ ovs-vsctl set Interface  \
+   other_config:pmd-rxq-affinity=
  ```
+ where:
+
+ ```
+  ::= NULL | 
+  ::=  |
+   , 
+  ::=  : 
+ ```
+
+ Example:
+
+ ```
+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+ Queue #0 pinned to core 3;
+ Queue #1 pinned to core 7;
+ Queue #2 not pinned.
+ Queue #3 pinned to core 8;
+ ```
+
+ After that PMD threads on cores where RX queues was pinned will become
+ `isolated`. This means that this thread will poll only pinned RX queues.
+
+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX queues
+ will not be polled. Also, if provided `core_id` is not available (ex. this
+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
+ PMD thread.
+
+ Isolation of PMD threads also can be checked using
+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
 
-  7. Stop vswitchd & Delete bridge
+  8. Stop vswitchd & Delete bridge
 
  ```
  ovs-appctl -t ovs-vswitchd exit
diff --git a/NEWS b/NEWS
index 6496dc1..9ccc1f5 100644
--- a/NEWS
+++ b/NEWS
@@ -44,6 +44,8 @@ Post-v2.5.0
Old 'other_config:n-dpdk-rxqs' is no longer supported.
Not supported by vHost interfaces. For them number of rx and tx queues
is applied from connected virtio device.
+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
+   allows to pin port's rx queues to desired cores.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 18ce316..e5a8dec 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -63,6 +63,7 @@
 #include "random.h"
 #include "seq.h"
 #include "shash.h"
+#include "smap.h"
 #include "sset.h"
 #include "timeval.h"
 #include "tnl-neigh-cache.h"
@@ -250,6 +251,12 @@ enum pmd_cycles_counter_type {
 
 #define XPS_TIMEOUT_MS 500LL
 
+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
+struct dp_netdev_rxq {
+struct netdev_rxq *rxq;
+unsigned core_id;   /* Core to which this queue is pinned. */
+};
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -257,10 +264,11 @@ struct dp_netdev_port {
 struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
-struct netdev_rxq **rxq;
+struct dp_netdev_rxq *rxqs;
 unsigned *txq_used; /* Number of threads that uses each tx queue. */
 struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
+char *rxq_affinity_list;/* Requested affinity of rx queues. */
 };
 
 /* 

[ovs-dev] [PATCH v3 0/3] Manual pinning of rxqs (Third part of XPS patch-set).

2016-07-15 Thread Ilya Maximets
This is the third and last part of XPS patch-set which contains
implementation of manual pinning of rx queues to pmd threads.

Manual pinning API was discussed here:
http://openvswitch.org/pipermail/dev/2016-July/074674.html

This patch-set based on top of
"[PATCH v4] XPS implementation (Second part of XPS patch-set)."
http://openvswitch.org/pipermail/dev/2016-July/075122.html

The first patch implements passing the 'other_config' column from the
'Interface' table to the datapath. This is required to pass the pinning
configuration to dpif-netdev.

Note: IMHO, pinning configuration is not connected with the netdev, so it
  shouldn't be passed via 'options'. 'other_config' is not used by
  anyone, so I think it's a good place to store such information
  and push it down to the datapath.

The second patch is a refactoring to decrease code duplication.

The third patch is a modified version of the last patch from v2,
"dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command."
It was modified so that the configuration is pushed via 'port_set_config'.

Ilya Maximets (3):
  bridge: Pass interface's configuration to datapath.
  util: Expose function nullable_string_is_equal.
  dpif-netdev: Introduce pmd-rxq-affinity.

 INSTALL.DPDK.md  |  49 -
 NEWS |   2 +
 lib/dpif-netdev.c| 231 ++-
 lib/dpif-netlink.c   |   1 +
 lib/dpif-provider.h  |   5 +
 lib/dpif.c   |  17 
 lib/dpif.h   |   1 +
 lib/util.c   |   6 ++
 lib/util.h   |   1 +
 ofproto/ofproto-dpif-ipfix.c |   6 --
 ofproto/ofproto-dpif-sflow.c |   6 --
 ofproto/ofproto-dpif.c   |  16 +++
 ofproto/ofproto-provider.h   |   5 +
 ofproto/ofproto.c|  29 ++
 ofproto/ofproto.h|   2 +
 tests/pmd.at |   6 ++
 vswitchd/bridge.c|   2 +
 vswitchd/vswitch.xml |  23 +
 18 files changed, 344 insertions(+), 64 deletions(-)

-- 
2.7.4



[ovs-dev] [PATCH v3 2/3] util: Expose function nullable_string_is_equal.

2016-07-15 Thread Ilya Maximets
Implementation of 'nullable_string_is_equal()' moved to util.c and
reused inside dpif-netdev.

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c| 14 ++
 lib/util.c   |  6 ++
 lib/util.h   |  1 +
 ofproto/ofproto-dpif-ipfix.c |  6 --
 ofproto/ofproto-dpif-sflow.c |  6 --
 5 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4643cce..18ce316 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2524,16 +2524,6 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
 }
 }
 
-static bool
-cmask_equals(const char *a, const char *b)
-{
-if (a && b) {
-return !strcmp(a, b);
-}
-
-return a == NULL && b == NULL;
-}
-
 /* Changes the number or the affinity of pmd threads.  The changes are actually
  * applied in dpif_netdev_run(). */
 static int
@@ -2541,7 +2531,7 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 
-if (!cmask_equals(dp->requested_pmd_cmask, cmask)) {
+if (!nullable_string_is_equal(dp->requested_pmd_cmask, cmask)) {
 free(dp->requested_pmd_cmask);
 dp->requested_pmd_cmask = nullable_xstrdup(cmask);
 }
@@ -2756,7 +2746,7 @@ dpif_netdev_run(struct dpif *dpif)
 
 dp_netdev_pmd_unref(non_pmd);
 
-if (!cmask_equals(dp->pmd_cmask, dp->requested_pmd_cmask)
+if (!nullable_string_is_equal(dp->pmd_cmask, dp->requested_pmd_cmask)
 || ports_require_restart(dp)) {
 reconfigure_pmd_threads(dp);
 }
diff --git a/lib/util.c b/lib/util.c
index e1dc3d2..241a7f1 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -157,6 +157,12 @@ nullable_xstrdup(const char *s)
 return s ? xstrdup(s) : NULL;
 }
 
+bool
+nullable_string_is_equal(const char *a, const char *b)
+{
+return a ? b && !strcmp(a, b) : !b;
+}
+
 char *
 xvasprintf(const char *format, va_list args)
 {
diff --git a/lib/util.h b/lib/util.h
index e738c9f..6a61dde 100644
--- a/lib/util.h
+++ b/lib/util.h
@@ -113,6 +113,7 @@ void *xmemdup(const void *, size_t) MALLOC_LIKE;
 char *xmemdup0(const char *, size_t) MALLOC_LIKE;
 char *xstrdup(const char *) MALLOC_LIKE;
 char *nullable_xstrdup(const char *) MALLOC_LIKE;
+bool nullable_string_is_equal(const char *a, const char *b);
 char *xasprintf(const char *format, ...) OVS_PRINTF_FORMAT(1, 2) MALLOC_LIKE;
 char *xvasprintf(const char *format, va_list) OVS_PRINTF_FORMAT(1, 0) MALLOC_LIKE;
 void *x2nrealloc(void *p, size_t *n, size_t s);
diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c
index 5744abb..d9069cb 100644
--- a/ofproto/ofproto-dpif-ipfix.c
+++ b/ofproto/ofproto-dpif-ipfix.c
@@ -464,12 +464,6 @@ static void get_export_time_now(uint64_t *, uint32_t *);
 static void dpif_ipfix_cache_expire_now(struct dpif_ipfix_exporter *, bool);
 
 static bool
-nullable_string_is_equal(const char *a, const char *b)
-{
-return a ? b && !strcmp(a, b) : !b;
-}
-
-static bool
 ofproto_ipfix_bridge_exporter_options_equal(
 const struct ofproto_ipfix_bridge_exporter_options *a,
 const struct ofproto_ipfix_bridge_exporter_options *b)
diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
index 7d0aa36..8ede492 100644
--- a/ofproto/ofproto-dpif-sflow.c
+++ b/ofproto/ofproto-dpif-sflow.c
@@ -92,12 +92,6 @@ static void dpif_sflow_del_port__(struct dpif_sflow *,
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 static bool
-nullable_string_is_equal(const char *a, const char *b)
-{
-return a ? b && !strcmp(a, b) : !b;
-}
-
-static bool
 ofproto_sflow_options_equal(const struct ofproto_sflow_options *a,
  const struct ofproto_sflow_options *b)
 {
-- 
2.7.4



Re: [ovs-dev] [ovs-dev, RFC, v2, 1/1] netdev-dpdk: Add support for DPDK 16.07

2016-07-18 Thread Ilya Maximets
On 12.07.2016 12:11, Ciara Loftus wrote:
> This commit introduces support for DPDK 16.07 and consequently breaks
> compatibility with DPDK 16.04.
> 
> DPDK 16.07 introduces some changes to various APIs. These have been
> updated in OVS, including:
> * xstats API: changes to structure of xstats
> * vhost API:  replace virtio-net references with 'vid'
> 
> Signed-off-by: Ciara Loftus 
> ---
>  .travis/linux-build.sh   |   2 +-
>  INSTALL.DPDK-ADVANCED.md |   8 +-
>  INSTALL.DPDK.md  |  20 ++--
>  lib/netdev-dpdk.c| 243 
> +++
>  4 files changed, 135 insertions(+), 138 deletions(-)
> 
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 065de39..1b3d43d 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -68,7 +68,7 @@ fi
>  
>  if [ "$DPDK" ]; then
>  if [ -z "$DPDK_VER" ]; then
> -DPDK_VER="16.04"
> +DPDK_VER="16.07"
>  fi
>  install_dpdk $DPDK_VER
>  if [ "$CC" = "clang" ]; then
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 9ae536d..ec1de29 100644
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -43,7 +43,7 @@ for DPDK and OVS.
>  For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
>  
>  ```
> -export DPDK_DIR=/usr/src/dpdk-16.04
> +export DPDK_DIR=/usr/src/dpdk-16.07
>  export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>  make install T=$DPDK_TARGET DESTDIR=install
>  ```
> @@ -339,7 +339,7 @@ For users wanting to do packet forwarding using kernel 
> stack below are the steps
> cd /usr/src/cmdline_generator
> wget 
> https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/cmdline_generator.c
> wget 
> https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/Makefile
> -   export RTE_SDK=/usr/src/dpdk-16.04
> +   export RTE_SDK=/usr/src/dpdk-16.07
> export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
> make
> ./build/cmdline_generator -m -p dpdkr0 XXX
> @@ -363,7 +363,7 @@ For users wanting to do packet forwarding using kernel 
> stack below are the steps
> mount -t hugetlbfs nodev /dev/hugepages (if not already mounted)
>  
> # Build the DPDK ring application in the VM
> -   export RTE_SDK=/root/dpdk-16.04
> +   export RTE_SDK=/root/dpdk-16.07
> export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
> make
>  
> @@ -374,7 +374,7 @@ For users wanting to do packet forwarding using kernel 
> stack below are the steps
>  
>  ##  6. Vhost Walkthrough
>  
> -DPDK 16.04 supports two types of vhost:
> +DPDK 16.07 supports two types of vhost:
>  
>  1. vhost-user - enabled default
>  
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 5407794..9022ad8 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -21,7 +21,7 @@ The DPDK support of Open vSwitch is considered 
> 'experimental'.
>  
>  ### Prerequisites
>  
> -* Required: DPDK 16.04, libnuma
> +* Required: DPDK 16.07, libnuma
>  * Hardware: [DPDK Supported NICs] when physical ports in use
>  
>  ##  2. Building and Installation
> @@ -42,10 +42,10 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>  
>   ```
>   cd /usr/src/
> - wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip
> - unzip dpdk-16.04.zip
> + wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.07.zip
> + unzip dpdk-16.07.zip
>  
> - export DPDK_DIR=/usr/src/dpdk-16.04
> + export DPDK_DIR=/usr/src/dpdk-16.07
>   cd $DPDK_DIR
>   ```
>  
> @@ -329,9 +329,9 @@ can be found in [Vhost Walkthrough].
>  
>```
>cd /root/dpdk/
> -  wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip
> -  unzip dpdk-16.04.zip
> -  export DPDK_DIR=/root/dpdk/dpdk-16.04
> +  wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.07.zip
> +  unzip dpdk-16.07.zip
> +  export DPDK_DIR=/root/dpdk/dpdk-16.07
>export DPDK_TARGET=x86_64-native-linuxapp-gcc
>export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>cd $DPDK_DIR
> @@ -487,7 +487,7 @@ can be found in [Vhost Walkthrough].
> 
> 
>   
> - 
> + 
>   
>   
> 
> @@ -557,9 +557,9 @@ can be found in [Vhost Walkthrough].
>  DPDK. It is recommended that users update Network Interface firmware to
>  match what has been validated for the DPDK release.
>  
> -For DPDK 16.04, the list of validated firmware versions can be found at:
> +For DPDK 16.07, the list of validated firmware versions can be found at:
>  
> -http://dpdk.org/doc/guides/rel_notes/release_16_04.html
> +http://dpdk.org/doc/guides/rel_notes/release_16.07.html
>  
>  
>  Bug Reporting:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 85b18fd..9cf0b0c 100644

Re: [ovs-dev] [PATCH v4] XPS implementation (Second part of XPS patch-set).

2016-07-24 Thread Ilya Maximets
Ping.

Best regards, Ilya Maximets.

On 13.07.2016 15:34, Ilya Maximets wrote:
> This is the second part of XPS patch-set which contains XPS itself.
> 
> Version 4:
>   * Dropped rwlock related patches.
>   * Added pointer from 'struct tx_port' to 'struct dp_netdev_port'
> to avoid locking of 'dp->ports'. This works because as long as
> a port is in a pmd thread's tx_port cache it cannot be deleted
> from the datapath.
>   * Added 'now' parameter to 'dp_netdev_execute_actions()' to pass
> current time to XPS functions. This needed to avoid using
> 'last_cycles' that is always 0 without DPDK.
>   * Fixed tx queue ids cleanup on PMD thread deletion.
> 
> Version 3:
>   * Dropped already applied changes.
>   * fat-rwlock used instead of port_mutex.
>   * revalidation of 'non-pmd' thread's tx queues added to
> 'dpif_netdev_run' to make it faster.
> 
> 
> 
> Ilya Maximets (1):
>   dpif-netdev: XPS (Transmit Packet Steering) implementation.
> 
>  lib/dpif-netdev.c | 170 
> +-
>  1 file changed, 117 insertions(+), 53 deletions(-)
> 


Re: [ovs-dev] [PATCH v3 0/3] Manual pinning of rxqs (Third part of XPS patch-set).

2016-07-24 Thread Ilya Maximets
Ping.

Best regards, Ilya Maximets.

On 15.07.2016 14:54, Ilya Maximets wrote:
> This is the third and last part of XPS patch-set which contains
> implementation of manual pinning of rx queues to pmd threads.
> 
> Manual pinning API was discussed here:
> http://openvswitch.org/pipermail/dev/2016-July/074674.html
> 
> This patch-set based on top of
> "[PATCH v4] XPS implementation (Second part of XPS patch-set)."
> http://openvswitch.org/pipermail/dev/2016-July/075122.html
> 
> First patch implements passing of 'other_config' column from 'Interface'
> table to datapath. This is required to pass pinning configuration
> to dpif-netdev.
> 
> Note: IMHO, pinning configuration is not connected with netdev, so, it
>   shouldn't be passed via 'options'. 'other_config' is not used by
>   anyone, so, I think, it's a good place to store such information
>   and push it down to datapath.
> 
> Second patch is a refactoring to decrease code duplication.
> 
> Third patch is a modified version of the last patch from v2:
> "dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command."
> It was modified to get configuration pushed via 'port_set_config'.
> 
> Ilya Maximets (3):
>   bridge: Pass interface's configuration to datapath.
>   util: Expose function nullable_string_is_equal.
>   dpif-netdev: Introduce pmd-rxq-affinity.
> 
>  INSTALL.DPDK.md  |  49 -
>  NEWS |   2 +
>  lib/dpif-netdev.c| 231 
> ++-
>  lib/dpif-netlink.c   |   1 +
>  lib/dpif-provider.h  |   5 +
>  lib/dpif.c   |  17 
>  lib/dpif.h   |   1 +
>  lib/util.c   |   6 ++
>  lib/util.h   |   1 +
>  ofproto/ofproto-dpif-ipfix.c |   6 --
>  ofproto/ofproto-dpif-sflow.c |   6 --
>  ofproto/ofproto-dpif.c   |  16 +++
>  ofproto/ofproto-provider.h   |   5 +
>  ofproto/ofproto.c|  29 ++
>  ofproto/ofproto.h|   2 +
>  tests/pmd.at |   6 ++
>  vswitchd/bridge.c|   2 +
>  vswitchd/vswitch.xml |  23 +
>  18 files changed, 344 insertions(+), 64 deletions(-)
> 


Re: [ovs-dev] [PATCH v3 1/3] bridge: Pass interface's configuration to datapath.

2016-07-26 Thread Ilya Maximets
On 26.07.2016 04:45, Daniele Di Proietto wrote:
> Thanks for the patch
> 
> It looks good to me, a few minor comments inline
> 
> 
> On 15/07/2016 04:54, "Ilya Maximets"  wrote:
> 
>> This commit adds functionality to pass value of 'other_config' column
>> of 'Interface' table to datapath.
>>
>> This may be used to pass not directly connected with netdev options and
>> configure behaviour of the datapath for different ports.
>> For example: pinning of rx queues to polling threads in dpif-netdev.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> lib/dpif-netdev.c  |  1 +
>> lib/dpif-netlink.c |  1 +
>> lib/dpif-provider.h|  5 +
>> lib/dpif.c | 17 +
>> lib/dpif.h |  1 +
>> ofproto/ofproto-dpif.c | 16 
>> ofproto/ofproto-provider.h |  5 +
>> ofproto/ofproto.c  | 29 +
>> ofproto/ofproto.h  |  2 ++
>> vswitchd/bridge.c  |  2 ++
>> 10 files changed, 79 insertions(+)
>>
>> [...]
>>
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index ce9383a..97510a9 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -3542,6 +3542,21 @@ port_del(struct ofproto *ofproto_, ofp_port_t 
>> ofp_port)
>> }
>>
>> static int
>> +port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
>> +const struct smap *cfg)
> 
> Can we change this to directly take struct ofport_dpif *ofport instead of 
> ofp_port_t?

We can't get a 'struct ofport_dpif *' because the ofproto layer knows nothing
about the 'ofport_dpif' structure. All we can do is get a 'struct ofport *'
and cast it.

How about the following fixup to this patch:
--
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 3a13326..79f2aa0 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3543,14 +3543,13 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
 }
 
 static int
-port_set_config(struct ofproto *ofproto_, ofp_port_t ofp_port,
-const struct smap *cfg)
+port_set_config(const struct ofport *ofport_, const struct smap *cfg)
 {
-struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
-struct ofport_dpif *ofport = ofp_port_to_ofport(ofproto, ofp_port);
+struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 
-if (!ofport || sset_contains(&ofproto->ghost_ports,
- netdev_get_name(ofport->up.netdev))) {
+if (sset_contains(&ofproto->ghost_ports,
+  netdev_get_name(ofport->up.netdev))) {
 return 0;
 }
 
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 2fc7452..7156814 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -972,10 +972,9 @@ struct ofproto_class {
  * convenient. */
 int (*port_del)(struct ofproto *ofproto, ofp_port_t ofp_port);
 
-/* Refreshes dtapath configuration of port number 'ofp_port' in 'ofproto'.
+/* Refreshes datapath configuration of 'port'.
  * Returns 0 if successful, otherwise a positive errno value. */
-int (*port_set_config)(struct ofproto *ofproto, ofp_port_t ofp_port,
-   const struct smap *cfg);
+int (*port_set_config)(const struct ofport *port, const struct smap *cfg);
 
 /* Get port stats */
 int (*port_get_stats)(const struct ofport *port,
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index c66c866..6cd2600 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2079,7 +2079,7 @@ ofproto_port_del(struct ofproto *ofproto, ofp_port_t ofp_port)
 return error;
 }
 
-/* Refreshes dtapath configuration of port number 'ofp_port' in 'ofproto'.
+/* Refreshes datapath configuration of port number 'ofp_port' in 'ofproto'.
  *
  * This function has no effect if 'ofproto' does not have a port 'ofp_port'. */
 void
@@ -2097,10 +2097,10 @@ ofproto_port_set_config(struct ofproto *ofproto, ofp_port_t ofp_port,
 }
 
 error = (ofproto->ofproto_class->port_set_config
- ? ofproto->ofproto_class->port_set_config(ofproto, ofp_port, cfg)
+ ? ofproto->ofproto_class->port_set_config(ofport, cfg)
  : EOPNOTSUPP);
 if (error) {
-VLOG_WARN("%s: dtatapath configuration on port %"PRIu16
+VLOG_WARN("%s: datapath configuration on port %"PRI

Re: [ovs-dev] [PATCH v3 3/3] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-26 Thread Ilya Maximets
On 26.07.2016 04:46, Daniele Di Proietto wrote:
> Thanks for the patch.
> 
> I haven't been able to apply this without the XPS patch.

That was the original idea. Using this patch with the current
tx queue management may lead to performance issues on multiqueue
configurations.

> This looks like a perfect chance to add more tests to pmd.at.  I can do it if 
> you want

Sounds good.

> I started taking a look at this patch and I have a few comments inline.  I'll 
> keep looking at it tomorrow
> 
> Thanks,
> 
> Daniele
> 
> 
> On 15/07/2016 04:54, "Ilya Maximets"  wrote:
> 
>> New 'other_config:pmd-rxq-affinity' field for Interface table to
>> perform manual pinning of RX queues to desired cores.
>>
>> This functionality is required to achieve maximum performance because
>> all kinds of ports have different cost of rx/tx operations and
>> only user can know about expected workload on different ports.
>>
>> Example:
>>  # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>>other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>>  Queue #0 pinned to core 3;
>>  Queue #1 pinned to core 7;
>>  Queue #2 not pinned.
>>  Queue #3 pinned to core 8;
>>
>> It's decided to automatically isolate cores that have rxq explicitly
>> assigned to them because it's useful to keep constant polling rate on
>> some performance critical ports while adding/deleting other ports
>> without explicit pinning of all ports.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> INSTALL.DPDK.md  |  49 +++-
>> NEWS |   2 +
>> lib/dpif-netdev.c| 218 
>> ++-
>> tests/pmd.at |   6 ++
>> vswitchd/vswitch.xml |  23 ++
>> 5 files changed, 257 insertions(+), 41 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 5407794..7609aa7 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>>  # Check current stats
>>ovs-appctl dpif-netdev/pmd-stats-show
>>
>> + # Clear previous stats
>> +   ovs-appctl dpif-netdev/pmd-stats-clear
>> + ```
>> +
>> +  7. Port/rxq assigment to PMD threads
>> +
>> + ```
>>  # Show port/rxq assignment
>>ovs-appctl dpif-netdev/pmd-rxq-show
>> + ```
>>
>> - # Clear previous stats
>> -   ovs-appctl dpif-netdev/pmd-stats-clear
>> + To change default rxq assignment to pmd threads rxqs may be manually
>> + pinned to desired cores using:
>> +
>> + ```
>> + ovs-vsctl set Interface  \
>> +   other_config:pmd-rxq-affinity=
>>  ```
>> + where:
>> +
>> + ```
>> +  ::= NULL | 
>> +  ::=  |
>> +   , 
>> +  ::=  : 
>> + ```
>> +
>> + Example:
>> +
>> + ```
>> + ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>> +   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>> +
>> + Queue #0 pinned to core 3;
>> + Queue #1 pinned to core 7;
>> + Queue #2 not pinned.
>> + Queue #3 pinned to core 8;
>> + ```
>> +
>> + After that PMD threads on cores where RX queues was pinned will become
>> + `isolated`. This means that this thread will poll only pinned RX 
>> queues.
>> +
>> + WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
>> queues
>> + will not be polled. Also, if provided `core_id` is not available (ex. 
>> this
>> + `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>> + PMD thread.
>> +
>> + Isolation of PMD threads also can be checked using
>> + `ovs-appctl dpif-netdev/pmd-rxq-show` command.
>>
>> -  7. Stop vswitchd & Delete bridge
>> +  8. Stop vswitchd & Delete bridge
>>
>>  ```
>>  ovs-appctl -t ovs-vswitchd exit
>> diff --git a/NEWS b/NEWS
>> index 6496dc1..9ccc1f5 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -44,6 +44,8 @@ Post-v2.5.0
>>Old 'other_config:n-dpdk-rxqs' is no longer supported.
>>Not supported by vHost interfaces. For them number of rx and tx queues
>>is applied from connected virtio device.
>> + * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, tha

[ovs-dev] [PATCH v5 0/4] XPS + Manual pinning (all)

2016-07-27 Thread Ilya Maximets
Manual pinning API was discussed here:
http://openvswitch.org/pipermail/dev/2016-July/074674.html

Version 5:
* XPS and Manual pinning back together.
* Dropped already applied patches
* All fixups from pinning v3 merged.
* XPS doesn't work if we have enough TX queues
* Affinity parser changed to reuse existing code
* 'needs_locking' logic moved to dpif-netdev.

Old XPS log:

Version 4:
* Dropped rwlock related patches.
* Added pointer from 'struct tx_port' to 'struct dp_netdev_port'
  to avoid locking of 'dp->ports'. This works because as long as
  a port is in a pmd thread's tx_port cache it cannot be deleted
  from the datapath.
* Added 'now' parameter to 'dp_netdev_execute_actions()' to pass
  current time to XPS functions. This needed to avoid using
  'last_cycles' that is always 0 without DPDK.
* Fixed tx queue ids cleanup on PMD thread deletion.

Version 3:
* Dropped already applied changes.
* fat-rwlock used instead of port_mutex.
* revalidation of 'non-pmd' thread's tx queues added to
  'dpif_netdev_run' to make it faster.


Ilya Maximets (4):
  dpif-netdev: XPS (Transmit Packet Steering) implementation.
  bridge: Pass interface's configuration to datapath.
  dpif-netdev: Add reconfiguration request to dp_netdev.
  dpif-netdev: Introduce pmd-rxq-affinity.

 INSTALL.DPDK.md|  49 -
 NEWS   |   2 +
 lib/dpif-netdev.c  | 450 -
 lib/dpif-netlink.c |   1 +
 lib/dpif-provider.h|   5 +
 lib/dpif.c |  17 ++
 lib/dpif.h |   1 +
 lib/netdev-bsd.c   |   3 +-
 lib/netdev-dpdk.c  |  32 ++--
 lib/netdev-dummy.c |   3 +-
 lib/netdev-linux.c |   3 +-
 lib/netdev-provider.h  |  11 +-
 lib/netdev.c   |  13 +-
 lib/netdev.h   |   2 +-
 ofproto/ofproto-dpif.c |  15 ++
 ofproto/ofproto-provider.h |   4 +
 ofproto/ofproto.c  |  29 +++
 ofproto/ofproto.h  |   2 +
 tests/pmd.at   |   6 +
 vswitchd/bridge.c  |   2 +
 vswitchd/vswitch.xml   |  23 +++
 21 files changed, 553 insertions(+), 120 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v5 1/4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-27 Thread Ilya Maximets
If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<>
pmd thread numa_id 0 core_id 13:
port: vhost-user1   queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1   queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1   queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1   queue-id: 3
<>

As we can see above dpdk0 port polled by threads on cores:
12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.

In previous example:

pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1

As a result only 2 of 4 queues used.

To fix this issue, a kind of XPS is implemented in the following way:

* TX queue-ids are allocated dynamically.
* When a PMD thread tries to send packets to a new port for the first
  time, it allocates the least used TX queue for this port (sketched
  in the example below).
* PMD threads periodically perform revalidation of the allocated
  TX queue-ids. If a queue wasn't used in the last XPS_TIMEOUT_MS
  milliseconds, it is freed during revalidation.
* XPS is not used if there are already enough TX queues.
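
A purely illustrative, self-contained sketch of the allocation and
revalidation policy described above (the real patch keeps the counters
in 'struct dp_netdev_port' as 'txq_used' under 'txq_used_mutex', as the
diff below shows; the names here are made up):

    #include <stdbool.h>

    /* Simplified model: one usage counter per tx queue of a port. */
    struct example_port_txqs {
        int n_txq;        /* Number of tx queues the device really has. */
        unsigned *used;   /* used[i]: threads currently sending to queue i. */
    };

    /* A thread that starts sending to this port takes the least used
     * tx queue, which is the "allocates least used TX queue" rule. */
    static int
    example_xps_alloc_tx_qid(struct example_port_txqs *p)
    {
        int best = 0;

        for (int i = 1; i < p->n_txq; i++) {
            if (p->used[i] < p->used[best]) {
                best = i;
            }
        }
        p->used[best]++;
        return best;
    }

    /* Revalidation: release a queue id that stayed idle for too long, so
     * another thread may pick it up; the owner re-allocates on next send. */
    static bool
    example_xps_release_if_idle(struct example_port_txqs *p, int qid,
                                long long now, long long last_used,
                                long long timeout_ms)
    {
        if (now - last_used > timeout_ms) {
            p->used[qid]--;
            return true;
        }
        return false;
    }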

Reported-by: Zhihong Wang 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 204 --
 lib/netdev-bsd.c  |   3 +-
 lib/netdev-dpdk.c |  32 +++-
 lib/netdev-dummy.c|   3 +-
 lib/netdev-linux.c|   3 +-
 lib/netdev-provider.h |  11 +--
 lib/netdev.c  |  13 ++--
 lib/netdev.h  |   2 +-
 8 files changed, 198 insertions(+), 73 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..d1ba6f3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
 PMD_N_CYCLES
 };
 
+#define XPS_TIMEOUT_MS 500LL
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -256,6 +258,9 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
+atomic_bool dynamic_txqs;   /* If true XPS will be used. */
+unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
+struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
 };
 
@@ -384,8 +389,9 @@ struct rxq_poll {
 
 /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
 struct tx_port {
-odp_port_t port_no;
-struct netdev *netdev;
+struct dp_netdev_port *port;
+int qid;
+long long last_used;
 struct hmap_node node;
 };
 
@@ -443,9 +449,10 @@ struct dp_netdev_pmd_thread {
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
 
-/* Queue id used by this pmd thread to send packets on all netdevs.
- * All tx_qid's are unique and less than 'ovs_numa_get_n_cores() + 1'. */
-atomic_int tx_qid;
+/* Queue id used by this pmd thread to send packets on all netdevs if
+ * XPS disabled for this netdev. All static_tx_qid's are unique and less
+ * than 'ovs_numa_get_n_cores() + 1'. */
+atomic_int static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -498,7 +505,8 @@ static void dp_netdev_execute_actions(struct 
dp_netdev_pmd_thread *pmd,
   struct dp_packet_batch *,
   bool may_steal,
   const struct nlattr *actions,
-  size_t actions_len);
+  size_t 

[ovs-dev] [PATCH v5 3/4] dpif-netdev: Add reconfiguration request to dp_netdev.

2016-07-27 Thread Ilya Maximets
Upcoming patches will add new conditions under which reconfiguration will
be required. It will be simpler to have a common way to request
reconfiguration.
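
For illustration, the request/check pattern that the diff below introduces
boils down to bumping a seq on the "request" side and comparing it with the
last observed value on the "check" side (a sketch only, reusing the same
seq primitives that appear in the hunks):

    #include <stdbool.h>
    #include <stdint.h>
    #include "seq.h"

    struct example_reconf_state {
        struct seq *reconfigure_seq;    /* Bumped on every request. */
        uint64_t last_reconfigure_seq;  /* Value seen at the last reconfig. */
    };

    static void
    example_request_reconfigure(struct example_reconf_state *s)
    {
        seq_change(s->reconfigure_seq);         /* Any caller may request. */
    }

    static bool
    example_is_reconf_required(const struct example_reconf_state *s)
    {
        return seq_read(s->reconfigure_seq) != s->last_reconfigure_seq;
    }

    static void
    example_ack_reconfigure(struct example_reconf_state *s)
    {
        /* Done at the start of the actual reconfiguration, just like
         * reconfigure_pmd_threads() does in the diff. */
        s->last_reconfigure_seq = seq_read(s->reconfigure_seq);
    }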

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b8c069d..1ef0cd7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -223,8 +223,10 @@ struct dp_netdev {
  * 'struct dp_netdev_pmd_thread' in 'per_pmd_key'. */
 ovsthread_key_t per_pmd_key;
 
+struct seq *reconfigure_seq;
+uint64_t last_reconfigure_seq;
+
 /* Cpu mask for pin of pmd threads. */
-char *requested_pmd_cmask;
 char *pmd_cmask;
 
 uint64_t last_tnl_conf_seq;
@@ -943,6 +945,9 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 dp->port_seq = seq_create();
 fat_rwlock_init(&dp->upcall_rwlock);
 
+dp->reconfigure_seq = seq_create();
+dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
 /* Disable upcalls by default. */
 dp_netdev_disable_upcall(dp);
 dp->upcall_aux = NULL;
@@ -967,6 +972,18 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 return 0;
 }
 
+static void
+dp_netdev_request_reconfigure(struct dp_netdev *dp)
+{
+seq_change(dp->reconfigure_seq);
+}
+
+static bool
+dp_netdev_is_reconf_required(struct dp_netdev *dp)
+{
+return seq_read(dp->reconfigure_seq) != dp->last_reconfigure_seq;
+}
+
 static int
 dpif_netdev_open(const struct dpif_class *class, const char *name,
  bool create, struct dpif **dpifp)
@@ -1025,6 +1042,8 @@ dp_netdev_free(struct dp_netdev *dp)
 ovs_mutex_unlock(&dp->port_mutex);
 cmap_destroy(&dp->poll_threads);
 
+seq_destroy(dp->reconfigure_seq);
+
 seq_destroy(dp->port_seq);
 hmap_destroy(&dp->ports);
 ovs_mutex_destroy(&dp->port_mutex);
@@ -2545,9 +2564,10 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 
-if (!nullable_string_is_equal(dp->requested_pmd_cmask, cmask)) {
-free(dp->requested_pmd_cmask);
-dp->requested_pmd_cmask = nullable_xstrdup(cmask);
+if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
+free(dp->pmd_cmask);
+dp->pmd_cmask = nullable_xstrdup(cmask);
+dp_netdev_request_reconfigure(dp);
 }
 
 return 0;
@@ -2696,12 +2716,12 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 struct dp_netdev_port *port, *next;
 int n_cores;
 
+dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
 dp_netdev_destroy_all_pmds(dp);
 
 /* Reconfigures the cpu mask. */
-ovs_numa_set_cpu_mask(dp->requested_pmd_cmask);
-free(dp->pmd_cmask);
-dp->pmd_cmask = nullable_xstrdup(dp->requested_pmd_cmask);
+ovs_numa_set_cpu_mask(dp->pmd_cmask);
 
 n_cores = ovs_numa_get_n_cores();
 if (n_cores == OVS_CORE_UNSPEC) {
@@ -2770,8 +2790,7 @@ dpif_netdev_run(struct dpif *dpif)
 
 dp_netdev_pmd_unref(non_pmd);
 
-if (!nullable_string_is_equal(dp->pmd_cmask, dp->requested_pmd_cmask)
-|| ports_require_restart(dp)) {
+if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) {
 reconfigure_pmd_threads(dp);
 }
 ovs_mutex_unlock(&dp->port_mutex);
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v5 2/4] bridge: Pass interface's configuration to datapath.

2016-07-27 Thread Ilya Maximets
This commit adds functionality to pass the value of the 'other_config'
column of the 'Interface' table to the datapath.

This may be used to pass options that are not directly related to the
netdev and to configure the behaviour of the datapath for different ports.
For example: pinning of rx queues to polling threads in dpif-netdev.
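
As a purely illustrative sketch of how a datapath could consume such a
per-port configuration smap (the real consumer is added later in the
pmd-rxq-affinity patch; the helper name and key handling below are made
up for this example):

    #include <stdbool.h>
    #include "smap.h"

    /* Hypothetical handler for the smap that dpif_port_set_config()
     * hands down to the datapath implementation. */
    static bool
    example_port_apply_config(const struct smap *cfg,
                              const char **affinity_out)
    {
        const char *affinity = smap_get(cfg, "pmd-rxq-affinity");

        if (!affinity) {
            return false;       /* Nothing interesting in this example. */
        }
        /* A real implementation would parse "queue:core[,queue:core]..."
         * here and request a datapath reconfiguration. */
        *affinity_out = affinity;
        return true;
    }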

Signed-off-by: Ilya Maximets 
Acked-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c  |  1 +
 lib/dpif-netlink.c |  1 +
 lib/dpif-provider.h|  5 +
 lib/dpif.c | 17 +
 lib/dpif.h |  1 +
 ofproto/ofproto-dpif.c | 15 +++
 ofproto/ofproto-provider.h |  4 
 ofproto/ofproto.c  | 29 +
 ofproto/ofproto.h  |  2 ++
 vswitchd/bridge.c  |  2 ++
 10 files changed, 77 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1ba6f3..b8c069d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4347,6 +4347,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_get_stats,
 dpif_netdev_port_add,
 dpif_netdev_port_del,
+NULL,   /* port_set_config */
 dpif_netdev_port_query_by_number,
 dpif_netdev_port_query_by_name,
 NULL,   /* port_get_pid */
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index d544072..a39faa2 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2348,6 +2348,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_get_stats,
 dpif_netlink_port_add,
 dpif_netlink_port_del,
+NULL,   /* port_set_config */
 dpif_netlink_port_query_by_number,
 dpif_netlink_port_query_by_name,
 dpif_netlink_port_get_pid,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 25f4280..21fb0ba 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -167,6 +167,11 @@ struct dpif_class {
 /* Removes port numbered 'port_no' from 'dpif'. */
 int (*port_del)(struct dpif *dpif, odp_port_t port_no);
 
+/* Refreshes configuration of 'dpif's port. The implementation might
+ * postpone applying the changes until run() is called. */
+int (*port_set_config)(struct dpif *dpif, odp_port_t port_no,
+   const struct smap *cfg);
+
 /* Queries 'dpif' for a port with the given 'port_no' or 'devname'.
  * If 'port' is not null, stores information about the port into
  * '*port' if successful.
diff --git a/lib/dpif.c b/lib/dpif.c
index bb2c4e6..53958c5 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -610,6 +610,23 @@ dpif_port_exists(const struct dpif *dpif, const char 
*devname)
 return !error;
 }
 
+/* Refreshes configuration of 'dpif's port. */
+int
+dpif_port_set_config(struct dpif *dpif, odp_port_t port_no,
+ const struct smap *cfg)
+{
+int error = 0;
+
+if (dpif->dpif_class->port_set_config) {
+error = dpif->dpif_class->port_set_config(dpif, port_no, cfg);
+if (error) {
+log_operation(dpif, "port_set_config", error);
+}
+}
+
+return error;
+}
+
 /* Looks up port number 'port_no' in 'dpif'.  On success, returns 0 and
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
diff --git a/lib/dpif.h b/lib/dpif.h
index 981868c..a7c5097 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -839,6 +839,7 @@ void dpif_register_upcall_cb(struct dpif *, upcall_callback 
*, void *aux);
 int dpif_recv_set(struct dpif *, bool enable);
 int dpif_handlers_set(struct dpif *, uint32_t n_handlers);
 int dpif_poll_threads_set(struct dpif *, const char *cmask);
+int dpif_port_set_config(struct dpif *, odp_port_t, const struct smap *cfg);
 int dpif_recv(struct dpif *, uint32_t handler_id, struct dpif_upcall *,
   struct ofpbuf *);
 void dpif_recv_purge(struct dpif *);
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index faff1c7..79f2aa0 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3543,6 +3543,20 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
 }
 
 static int
+port_set_config(const struct ofport *ofport_, const struct smap *cfg)
+{
+struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+
+if (sset_contains(&ofproto->ghost_ports,
+  netdev_get_name(ofport->up.netdev))) {
+return 0;
+}
+
+return dpif_port_set_config(ofproto->backer->dpif, ofport->odp_port, cfg);
+}
+
+static int
 port_get_stats(const struct ofport *ofport_, struct netdev_stats *stats)
 {
 struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
@@ -5610,6 +5624,7 @@ const struct ofproto_class ofproto_dpif_class = {
 port_query_by_name,
 port_add,
 port_del,

[ovs-dev] [PATCH v5 4/4] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-27 Thread Ilya Maximets
New 'other_config:pmd-rxq-affinity' field for Interface table to
perform manual pinning of RX queues to desired cores.

This functionality is required to achieve maximum performance because
different kinds of ports have different costs of rx/tx operations and
only the user knows the expected workload on different ports.

Example:
# ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
Queue #0 pinned to core 3;
Queue #1 pinned to core 7;
Queue #2 not pinned.
Queue #3 pinned to core 8;

It was decided to automatically isolate cores that have an rxq explicitly
assigned to them, because it's useful to keep a constant polling rate on
some performance-critical ports while adding/deleting other ports
without explicitly pinning all ports.

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md  |  49 +++-
 NEWS |   2 +
 lib/dpif-netdev.c| 216 +--
 tests/pmd.at |   6 ++
 vswitchd/vswitch.xml |  23 ++
 5 files changed, 254 insertions(+), 42 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 5407794..7609aa7 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
  # Check current stats
ovs-appctl dpif-netdev/pmd-stats-show
 
+ # Clear previous stats
+   ovs-appctl dpif-netdev/pmd-stats-clear
+ ```
+
+  7. Port/rxq assigment to PMD threads
+
+ ```
  # Show port/rxq assignment
ovs-appctl dpif-netdev/pmd-rxq-show
+ ```
 
- # Clear previous stats
-   ovs-appctl dpif-netdev/pmd-stats-clear
+ To change default rxq assignment to pmd threads rxqs may be manually
+ pinned to desired cores using:
+
+ ```
+ ovs-vsctl set Interface  \
+   other_config:pmd-rxq-affinity=
  ```
+ where:
+
+ ```
+  ::= NULL | 
+  ::=  |
+   , 
+  ::=  : 
+ ```
+
+ Example:
+
+ ```
+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+ Queue #0 pinned to core 3;
+ Queue #1 pinned to core 7;
+ Queue #2 not pinned.
+ Queue #3 pinned to core 8;
+ ```
+
+ After that PMD threads on cores where RX queues was pinned will become
+ `isolated`. This means that this thread will poll only pinned RX queues.
+
+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
queues
+ will not be polled. Also, if provided `core_id` is not available (ex. this
+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
+ PMD thread.
+
+ Isolation of PMD threads also can be checked using
+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
 
-  7. Stop vswitchd & Delete bridge
+  8. Stop vswitchd & Delete bridge
 
  ```
  ovs-appctl -t ovs-vswitchd exit
diff --git a/NEWS b/NEWS
index 73d3fcf..1a34f75 100644
--- a/NEWS
+++ b/NEWS
@@ -45,6 +45,8 @@ Post-v2.5.0
Old 'other_config:n-dpdk-rxqs' is no longer supported.
Not supported by vHost interfaces. For them number of rx and tx queues
is applied from connected virtio device.
+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
+   allows to pin port's rx queues to desired cores.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1ef0cd7..33f1216 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -53,7 +53,9 @@
 #include "openvswitch/list.h"
 #include "openvswitch/match.h"
 #include "openvswitch/ofp-print.h"
+#include "openvswitch/ofp-util.h"
 #include "openvswitch/ofpbuf.h"
+#include "openvswitch/shash.h"
 #include "openvswitch/vlog.h"
 #include "ovs-numa.h"
 #include "ovs-rcu.h"
@@ -62,7 +64,7 @@
 #include "pvector.h"
 #include "random.h"
 #include "seq.h"
-#include "openvswitch/shash.h"
+#include "smap.h"
 #include "sset.h"
 #include "timeval.h"
 #include "tnl-neigh-cache.h"
@@ -252,6 +254,12 @@ enum pmd_cycles_counter_type {
 
 #define XPS_TIMEOUT_MS 500LL
 
+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
+struct dp_netdev_rxq {
+struct netdev_rxq *rxq;
+unsigned core_id;   /* Core to which this queue is pinned. */
+};
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -259,11 +267,12 @@ struct dp_netdev_port {
 struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
 struct netdev_saved_

Re: [ovs-dev] [PATCH v5 00/16] Userspace (DPDK) connection tracker

2016-07-27 Thread Ilya Maximets
I guess you pushed some development version of this patch set.

There is a strange commit in there:

commit 6c54734ed27bc22975d7035a6bd5f32a412335a0
Author: Daniele Di Proietto 
Date:   Wed Jul 27 18:32:15 2016 -0700

XXX Improve comment.


Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v5 00/16] Userspace (DPDK) connection tracker

2016-07-28 Thread Ilya Maximets
Sorry.
TO: Daniele Di Proietto 

On 28.07.2016 09:27, Ilya Maximets wrote:
> I guess, you pushed some development version of this patch set.
> 
> There is strange commit there:
> 
> commit 6c54734ed27bc22975d7035a6bd5f32a412335a0
> Author: Daniele Di Proietto 
> Date:   Wed Jul 27 18:32:15 2016 -0700
> 
> XXX Improve comment.
> 
> 
> Best regards, Ilya Maximets.
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] tests: Add new pmd test for pmd-rxq-affinity.

2016-07-28 Thread Ilya Maximets
Thanks for making this.

Acked-by: Ilya Maximets 

On 27.07.2016 23:12, Daniele Di Proietto wrote:
> This tests that the newly introduced pmd-rxq-affinity option works as
> intended, at least for a single port.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  tests/pmd.at | 53 +
>  1 file changed, 53 insertions(+)
> 
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 47639b6..3052f95 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -461,3 +461,56 @@ 
> icmp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10
>  
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - rxq affinity])
> +OVS_VSWITCHD_START(
> +  [], [], [], [--dummy-numa 0,0,0,0,0,0,0,0,0])
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 actions=controller])
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1fe])
> +
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dummy-pmd 
> ofport_request=1 options:n_rxq=4 
> other_config:pmd-rxq-affinity="0:3,1:7,2:2,3:8"])
> +
> +dnl The rxqs should be on the requested cores.
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], 
> [dnl
> +p1 0 0 3
> +p1 1 0 7
> +p1 2 0 2
> +p1 3 0 8
> +])
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6])
> +
> +dnl We removed the cores requested by some queues from pmd-cpu-mask.
> +dnl Those queues will not be polled.
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], 
> [dnl
> +p1 2 0 2
> +])
> +
> +AT_CHECK([ovs-vsctl remove Interface p1 other_config pmd-rxq-affinity])
> +
> +dnl We removed the rxq-affinity request.  dpif-netdev should assign queues
> +dnl in a round robin fashion.  We just make sure that every rxq is being
> +dnl polled again.
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 
> 1,2 -d ' ' | sort], [0], [dnl
> +p1 0
> +p1 1
> +p1 2
> +p1 3
> +])
> +
> +AT_CHECK([ovs-vsctl set Interface p1 other_config:pmd-rxq-affinity='0:1'])
> +
> +dnl We explicitly requested core 1 for queue 0.  Core 1 becomes isolated and
> +dnl every other queue goes to core 2.
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], 
> [dnl
> +p1 0 0 1
> +p1 1 0 2
> +p1 2 0 2
> +p1 3 0 2
> +])
> +
> +OVS_VSWITCHD_STOP(["/dpif_netdev|WARN|There is no PMD thread on core/d"])
> +AT_CLEANUP
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] dpif-netdev: Fix xps revalidation.

2016-07-29 Thread Ilya Maximets
Revalidation should only be performed for ports with 'dynamic_txqs == true'.

Fixes: 324c8374852a ("dpif-netdev: XPS (Transmit Packet Steering) 
implementation.")
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 828171e..c446ae8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4193,7 +4193,7 @@ dpif_netdev_xps_revalidate_pmd(const struct 
dp_netdev_pmd_thread *pmd,
 long long interval;
 
 HMAP_FOR_EACH (tx, node, &pmd->port_cache) {
-if (tx->port->dynamic_txqs) {
+if (!tx->port->dynamic_txqs) {
 continue;
 }
 interval = now - tx->last_used;
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev,v3,3/5] netdev-dpdk: Add vHost User PMD

2016-07-29 Thread Ilya Maximets
Not a complete review, just a few comments on the design.

And what about performance? Is there any difference in comparison to the
current version of the code? I guess this may be slower than direct
access to the vhost library.

Comments inline.

Best regards, Ilya Maximets.

On 28.07.2016 19:21, Ciara Loftus wrote:
> DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
> to be controlled by the librte_ether API, like physical 'dpdk' ports and
> IVSHM 'dpdkr' ports. This commit integrates this PMD into OVS and
> removes direct calls to the librte_vhost DPDK library.
> 
> This commit removes extended statistics support for vHost User ports
> until such a time that this becomes available in the vHost PMD in a
> DPDK release supported by OVS.
> 
> Signed-off-by: Ciara Loftus 
> ---
>  INSTALL.DPDK.md   |  10 +
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 857 
> ++
>  3 files changed, 300 insertions(+), 569 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 7609aa7..4feb7be 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -604,6 +604,16 @@ can be found in [Vhost Walkthrough].
>  
>  http://dpdk.org/doc/guides/rel_notes/release_16_04.html
>  
> +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context 
> of
> +DPDK as they are all managed by the rte_ether API. This means that they
> +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
> +default is set to 32. This means by default the combined total number of
> +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. 
> This
> +value can be changed if desired by modifying the configuration file in
> +DPDK, or by overriding the default value on the command line when 
> building
> +DPDK. eg.
> +
> +`make install CONFIG_RTE_MAX_ETHPORTS=64`
>  
>  Bug Reporting:
>  --
> diff --git a/NEWS b/NEWS
> index dc3dedb..6510dde 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -64,6 +64,8 @@ Post-v2.5.0
>   * Basic connection tracking for the userspace datapath (no ALG,
> fragmentation or NAT support yet)
>   * Remove dpdkvhostcuse port type.
> + * vHost PMD integration brings vhost-user ports under control of the
> +   rte_ether DPDK API.
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index d6959fe..d6ceeec 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "dirs.h"
>  #include "dp-packet.h"
> @@ -56,9 +55,9 @@
>  #include "unixctl.h"
>  
>  #include "rte_config.h"
> +#include "rte_eth_vhost.h"
>  #include "rte_mbuf.h"
>  #include "rte_meter.h"
> -#include "rte_virtio_net.h"
>  
>  VLOG_DEFINE_THIS_MODULE(dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> @@ -141,6 +140,9 @@ static char *vhost_sock_dir = NULL;   /* Location of 
> vhost-user sockets */
>  
>  #define VHOST_ENQ_RETRY_NUM 8
>  
> +/* Array that tracks the used & unused vHost user driver IDs */
> +static unsigned int vhost_drv_ids[RTE_MAX_ETHPORTS];
> +
>  static const struct rte_eth_conf port_conf = {
>  .rxmode = {
>  .mq_mode = ETH_MQ_RX_RSS,
> @@ -346,12 +348,15 @@ struct netdev_dpdk {
>  struct rte_eth_link link;
>  int link_reset_cnt;
>  
> -/* virtio-net structure for vhost device */
> -OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> +/* Number of virtqueue pairs reported by the guest */
> +uint32_t vhost_qp_nb;
>  
>  /* Identifier used to distinguish vhost devices from each other */
>  char vhost_id[PATH_MAX];
>  
> +/* ID of vhost user port given to the PMD driver */
> +unsigned int vhost_pmd_id;
> +
>  /* In dpdk_list. */
>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>  
> @@ -382,16 +387,23 @@ struct netdev_rxq_dpdk {
>  static bool dpdk_thread_is_pmd(void);
>  
>  static int netdev_dpdk_construct(struct netdev *);
> -
> -struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
> +static int netdev_dpdk_vhost_construct(struct netdev *);
>  
>  struct ingress_policer *
>  netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
>  
> +static void link_status_changed_callback(uint8_t port_id,
> +enum rte_eth_event_type type, void *param);
> +stat

Re: [ovs-dev] [ovs-dev,v4,3/5] netdev-dpdk: Add vHost User PMD

2016-08-01 Thread Ilya Maximets
I've applied this patch and performed the following test:

OVS with 2 VMs connected via vhost-user ports.
Each vhost-user port has 4 queues.

VM1 executes ping on the LOCAL port.
In a normal situation the ping results are the following:

100 packets transmitted, 100 received, 0% packet loss, time 99144ms
rtt min/avg/max/mdev = 0.231/0.459/0.888/0.111 ms

After that VM2 starts execution of this script:

while true;
do
ethtool -L eth0 combined 4;
ethtool -L eth0 combined 1;
done

Now the results of ping between VM1 and the LOCAL port are:

100 packets transmitted, 100 received, 0% packet loss, time 99116ms
rtt min/avg/max/mdev = 5.466/150.327/356.201/85.208 ms

Minimal time increased from 0.231 to 5.466 ms.
Average time increased from 0.459 to 150.327 ms (~300 times)!

This happens because of constant reconfiguration requests from
the 'vring_state_changed_callback()'.

As Ciara said, "Previously we could work with only reconfiguring during
link status change as we had full information available to us
ie. virtio_net->virt_qp_nb. We don't have that any more, so we need to
count the queues in OVS now every time we get a vring_change."

The test above shows that it is unacceptable for OVS to perform
reconfiguration each time the vring state changes, because this gives
the guest user the ability to break normal networking on all ports
connected to the same instance of Open vSwitch.

If this vulnerability is unavoidable with the current version of the
vHost PMD, I suggest postponing its integration until there is a
method or special API to get the number of queues from inside
'link_status_changed_callback()'.

I've added the vHost maintainers to the CC list to hear their opinion
about a new API to get the number of queues from the vHost PMD.
Maybe we can expose 'rte_vhost_get_queue_num()' somehow or make
'dev_info->nb_rx_queues' usable?

NACK for now.

Best regards, Ilya Maximets.

On 29.07.2016 16:24, Ciara Loftus wrote:
> DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
> to be controlled by the librte_ether API, like physical 'dpdk' ports and
> IVSHM 'dpdkr' ports. This commit integrates this PMD into OVS and
> removes direct calls to the librte_vhost DPDK library.
> 
> This commit removes extended statistics support for vHost User ports
> until such a time that this becomes available in the vHost PMD in a
> DPDK release supported by OVS.
> 
> Signed-off-by: Ciara Loftus 
> ---
>  INSTALL.DPDK.md   |  10 +
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 857 
> ++
>  3 files changed, 300 insertions(+), 569 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 7609aa7..4feb7be 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -604,6 +604,16 @@ can be found in [Vhost Walkthrough].
>  
>  http://dpdk.org/doc/guides/rel_notes/release_16_04.html
>  
> +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context 
> of
> +DPDK as they are all managed by the rte_ether API. This means that they
> +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
> +default is set to 32. This means by default the combined total number of
> +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. 
> This
> +value can be changed if desired by modifying the configuration file in
> +DPDK, or by overriding the default value on the command line when 
> building
> +DPDK. eg.
> +
> +`make install CONFIG_RTE_MAX_ETHPORTS=64`
>  
>  Bug Reporting:
>  --
> diff --git a/NEWS b/NEWS
> index dc3dedb..6510dde 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -64,6 +64,8 @@ Post-v2.5.0
>   * Basic connection tracking for the userspace datapath (no ALG,
> fragmentation or NAT support yet)
>   * Remove dpdkvhostcuse port type.
> + * vHost PMD integration brings vhost-user ports under control of the
> +   rte_ether DPDK API.
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index d6959fe..d6ceeec 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "dirs.h"
>  #include "dp-packet.h"
> @@ -56,9 +55,9 @@
>  #include "unixctl.h"
>  
>  #include "rte_config.h"
> +#include "rte_eth_vhost.h"
>  #include "rte_mbuf.h"
>  #include "rte_meter.h"
> -#include &

Re: [ovs-dev] [ovs-dev, 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-01 Thread Ilya Maximets
Hi Daniele. Thanks for posting this.
I have almost the same patch in my local branch.

I haven't tested this with physical DPDK NICs yet, but I have a few
high-level comments:

1. Have you thought about renaming 'mtu_request' inside netdev-dpdk
   to 'requested_mtu'? I think this would be clearer and more
   consistent with the other configurable parameters (n_rxq, n_txq, ...).

2. I'd prefer not to fail reconfiguration if there is not enough memory
   for the new mempool. I think it will be a common situation that we
   request more memory than we have. Failure leads to destruction
   of the port and the inability to reconnect to a vhost-user port after
   re-creation if vhost is in server mode. We can just keep the old
   mempool and inform the user via VLOG_ERR (see the sketch after this
   list).

3. Minor issues inline.
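
A rough, self-contained sketch of what (2) could look like in the
reconfiguration path (the helper names and fields are stand-ins for the
netdev-dpdk mempool code, not the patch itself):

    #include <stdio.h>

    struct example_mempool;                       /* Opaque stand-in. */
    struct example_dev {
        struct example_mempool *mp;               /* Current mempool. */
        int mtu;                                  /* Currently applied MTU. */
    };
    /* Returns NULL when there is not enough memory. */
    struct example_mempool *example_mp_get(int mtu);
    void example_mp_put(struct example_mempool *);

    /* On allocation failure keep the old mempool and report, instead of
     * failing reconfiguration and destroying the port. */
    static int
    example_apply_mtu(struct example_dev *dev, int new_mtu)
    {
        struct example_mempool *mp = example_mp_get(new_mtu);

        if (!mp) {
            fprintf(stderr, "not enough memory for MTU %d, keeping MTU %d\n",
                    new_mtu, dev->mtu);           /* VLOG_ERR in real code. */
            return 0;                             /* Do not fail. */
        }
        example_mp_put(dev->mp);
        dev->mp = mp;
        dev->mtu = new_mtu;
        return 0;
    }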

What do you think?

Best regards, Ilya Maximets.

On 30.07.2016 04:22, Daniele Di Proietto wrote:
> From: Mark Kavanagh 
> 
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
>  INSTALL.DPDK-ADVANCED.md |  59 +-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 151 
> +++
>  4 files changed, 185 insertions(+), 27 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 191e69e..5cd64bf 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx 
> side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's 
> `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
> Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be 
> accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame 
> size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK 
> NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames 
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
> in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device 
> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
> driver),
> +   the MTU of those logical network interfaces must also be increased to 
> a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calcu

Re: [ovs-dev] [ovs-dev,v4,3/5] netdev-dpdk: Add vHost User PMD

2016-08-02 Thread Ilya Maximets
>> Test above shows that this is unacceptable for OVS to perform
>> reconfiguration each time vring state changed because this leads to
>> ability for the guest user to break normal networking on all ports
>> connected to the same instance of Open vSwitch.
>>
>> If this vulnerability is unavoidable with current version of vHost PMD,
>> I'm suggesting to postpone it's integration until there will be
>> method or special API to get number of queues from the inside of
>> 'link_status_changed_callback()'.
>>
>> I've added vHost maintainers to CC-list to hear their opinion about
>> new API to get number of queues from the vHost PMD.
>> Maybe we can expose 'rte_vhost_get_queue_num()' somehow or make
>> 'dev_info->nb_rx_queues' usable?
>>
>> NACK for now.
>>
>> Best regards, Ilya Maximets.
>>
>> On 29.07.2016 16:24, Ciara Loftus wrote:
>>> DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
>>> to be controlled by the librte_ether API, like physical 'dpdk' ports and
>>> IVSHM 'dpdkr' ports. This commit integrates this PMD into OVS and
>>> removes direct calls to the librte_vhost DPDK library.
>>>
>>> This commit removes extended statistics support for vHost User ports
>>> until such a time that this becomes available in the vHost PMD in a
>>> DPDK release supported by OVS.
>>>
>>> Signed-off-by: Ciara Loftus 
>>> ---
>>>  INSTALL.DPDK.md   |  10 +
>>>  NEWS  |   2 +
>>>  lib/netdev-dpdk.c | 857 ++-
>> ---
>>>  3 files changed, 300 insertions(+), 569 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index 7609aa7..4feb7be 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -604,6 +604,16 @@ can be found in [Vhost Walkthrough].
>>>
>>>  http://dpdk.org/doc/guides/rel_notes/release_16_04.html
>>>
>>> +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context
>> of
>>> +DPDK as they are all managed by the rte_ether API. This means that
>> they
>>> +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS
>> which by
>>> +default is set to 32. This means by default the combined total number 
>>> of
>>> +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32.
>> This
>>> +value can be changed if desired by modifying the configuration file in
>>> +DPDK, or by overriding the default value on the command line when
>> building
>>> +DPDK. eg.
>>> +
>>> +`make install CONFIG_RTE_MAX_ETHPORTS=64`
>>>
>>>  Bug Reporting:
>>>  --
>>> diff --git a/NEWS b/NEWS
>>> index dc3dedb..6510dde 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -64,6 +64,8 @@ Post-v2.5.0
>>>   * Basic connection tracking for the userspace datapath (no ALG,
>>> fragmentation or NAT support yet)
>>>   * Remove dpdkvhostcuse port type.
>>> + * vHost PMD integration brings vhost-user ports under control of the
>>> +   rte_ether DPDK API.
>>> - Increase number of registers to 16.
>>> - ovs-benchmark: This utility has been removed due to lack of use and
>>>   bitrot.
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index d6959fe..d6ceeec 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -30,7 +30,6 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> -#include 
>>>
>>>  #include "dirs.h"
>>>  #include "dp-packet.h"
>>> @@ -56,9 +55,9 @@
>>>  #include "unixctl.h"
>>>
>>>  #include "rte_config.h"
>>> +#include "rte_eth_vhost.h"
>>>  #include "rte_mbuf.h"
>>>  #include "rte_meter.h"
>>> -#include "rte_virtio_net.h"
>>>
>>>  VLOG_DEFINE_THIS_MODULE(dpdk);
>>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>> @@ -141,6 +140,9 @@ static char *vhost_sock_dir = NULL;   /* Location of
>> vhost-user sockets */
>>>
>>>  #define VHOST_ENQ_RETRY_NUM 8
>>>
>>> +/* Array that tracks the used & unused vHost user driver IDs */
>>> +static unsigned int vho

Re: [ovs-dev] [ovs-dev,v4,3/5] netdev-dpdk: Add vHost User PMD

2016-08-03 Thread Ilya Maximets
On 03.08.2016 12:21, Loftus, Ciara wrote:
>>
>> I've applied this patch and performed following test:
>>
>> OVS with 2 VMs connected via vhost-user ports.
>> Each vhost-user port has 4 queues.
>>
>> VM1 executes ping on LOCAL port.
>> In normal situation ping results are following:
>>
>>  100 packets transmitted, 100 received, 0% packet loss, time 99144ms
>>  rtt min/avg/max/mdev = 0.231/0.459/0.888/0.111 ms
>>
>> After that VM2 starts execution of this script:
>>
>>  while true;
>>  do
>>  ethtool -L eth0 combined 4;
>>  ethtool -L eth0 combined 1;
>>  done
>>
>> Now results of ping between VM1 and LOCAL port are:
>>
>>  100 packets transmitted, 100 received, 0% packet loss, time 99116ms
>>  rtt min/avg/max/mdev = 5.466/150.327/356.201/85.208 ms
>>
>> Minimal time increased from 0.231 to 5.466 ms.
>> Average time increased from 0.459 to 150.327 ms (~300 times)!
>>
>> This happens because of constant reconfiguration requests from
>> the 'vring_state_changed_callback()'.
>>
>> As Ciara said, "Previously we could work with only reconfiguring during
>> link status change as we had full information available to us
>> ie. virtio_net->virt_qp_nb. We don't have that any more, so we need to
>> count the queues in OVS now every time we get a vring_change."
>>
>> Test above shows that this is unacceptable for OVS to perform
>> reconfiguration each time vring state changed because this leads to
>> ability for the guest user to break normal networking on all ports
>> connected to the same instance of Open vSwitch.
> 
> Hi Ilya,
> 
> Another thought on this. With the current master branch, isn't the above 
> possible too with a script like this:
> 
> while true;
> do
> echo ":00:03.0" > /sys/bus/pci/drivers/virtio-pci/bind
> echo ":00:03.0" > /sys/bus/pci/drivers/virtio-pci/unbind
> done
> 
> The bind/unbind calls new/destroy device which in turn call reconfigure() 
> each time.

Hmm, yes, you're right.
But this may be easily fixed by the following patch:
-
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 41ca91d..b9e72b8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2313,10 +2313,14 @@ new_device(int vid)
 newnode = dev->socket_id;
 }
 
-dev->requested_socket_id = newnode;
-dev->requested_n_rxq = qp_num;
-dev->requested_n_txq = qp_num;
-netdev_request_reconfigure(&dev->up);
+if (dev->requested_n_txq != qp_num
+|| dev->requested_n_rxq != qp_num
+|| dev->requested_socket_id != newnode) {
+dev->requested_socket_id = newnode;
+dev->requested_n_rxq = qp_num;
+dev->requested_n_txq = qp_num;
+netdev_request_reconfigure(&dev->up);
+}
 
 exists = true;
 
@@ -2376,9 +2380,6 @@ destroy_device(int vid)
 dev->vid = -1;
 /* Clear tx/rx queue settings. */
 netdev_dpdk_txq_map_clear(dev);
-dev->requested_n_rxq = NR_QUEUE;
-dev->requested_n_txq = NR_QUEUE;
-netdev_request_reconfigure(&dev->up);
 
 netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
-

We will not decrease the number of queues on disconnect, but I think
that's not very important.

Thanks for pointing this out.
I'll post the above diff as a separate patch.

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.

2016-08-03 Thread Ilya Maximets
Binding/unbinding of the virtio driver inside a VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind
in an infinite loop to break normal networking on all ports attached
to the same instance of Open vSwitch.

Fix that by avoiding reconfiguration when it is not necessary.
The number of queues will not be decreased to 1 on device disconnection,
but that is not very important in comparison with a possible DoS attack
from inside the guest OS.

Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
  ports from attached virtio.")
Reported-by: Ciara Loftus 
Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index a0d541a..98369f1 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2273,11 +2273,14 @@ new_device(struct virtio_net *virtio_dev)
 newnode = dev->socket_id;
 }
 
-dev->requested_socket_id = newnode;
-dev->requested_n_rxq = qp_num;
-dev->requested_n_txq = qp_num;
-netdev_request_reconfigure(&dev->up);
-
+if (dev->requested_n_txq != qp_num
+|| dev->requested_n_rxq != qp_num
+|| dev->requested_socket_id != newnode) {
+dev->requested_socket_id = newnode;
+dev->requested_n_rxq = qp_num;
+dev->requested_n_txq = qp_num;
+netdev_request_reconfigure(&dev->up);
+}
 ovsrcu_set(&dev->virtio_dev, virtio_dev);
 exists = true;
 
@@ -2333,11 +2336,7 @@ destroy_device(volatile struct virtio_net *virtio_dev)
 ovs_mutex_lock(&dev->mutex);
 virtio_dev->flags &= ~VIRTIO_DEV_RUNNING;
 ovsrcu_set(&dev->virtio_dev, NULL);
-/* Clear tx/rx queue settings. */
 netdev_dpdk_txq_map_clear(dev);
-dev->requested_n_rxq = NR_QUEUE;
-dev->requested_n_txq = NR_QUEUE;
-netdev_request_reconfigure(&dev->up);
 
 netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev, 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-03 Thread Ilya Maximets
Hi, Mark.

On 03.08.2016 15:14, Kavanagh, Mark B wrote:
>>
>> Hi Daniele. Thanks for posting this.
> 
> Hi Ilya,
> 
> I actually implemented this patch as part of Daniele's MTU patchset, based on 
> my earlier patch - Daniele mainly rebased it to head of master :)
> 
> Thanks for your feedback - I've responded inline.
> 
> Cheers,
> Mark
> 
>> I have almost same patch in my local branch.
>>
>> I didn't test this with physical DPDK NICs yet, but I have few
>> high level comments:
>>
>> 1. Do you thought about renaming of 'mtu_request' inside netdev-dpdk
>>   to 'requested_mtu'? I think, this would be more clear and
>>   consistent with other configurable parameters (n_rxq, n_txq, ...).
> 
> 'mtu_request' was the name suggested by Daniele, following a discussion with 
> colleagues.
> I don't have strong feelings either way, so I'll leave Daniele to comment.

I meant only renaming 'netdev_dpdk->mtu_request' to
'netdev_dpdk->requested_mtu'.
The database column should stay 'mtu_request' as it is now.

>>
>> 2. I'd prefer not to fail reconfiguration if there is no enough memory
>>   for new mempool. I think, it'll be common situation when we are
>>   requesting more memory than we have. Failure leads to destruction
>>   of the port and inability to reconnect to vhost-user port after
>>   re-creation if vhost is in server mode. We can just keep old
>>   mempool and inform user via VLOG_ERR.
>>
> Agreed - I'll modify V2 accordingly.
> 
> 
>> 3. Minor issues inline.
> 
> Comments on these inline also.
> 
>>
>> What do you think?
>>
>> Best regards, Ilya Maximets.
>>
>> On 30.07.2016 04:22, Daniele Di Proietto wrote:
>>> From: Mark Kavanagh 
>>>
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated to each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length). The increased space
>>> available in the mbuf means that an entire Jumbo Frame of a specific
>>> size can be carried in a single mbuf, as opposed to partitioning
>>> it across multiple mbuf segments.
>>>
>>> The amount of space allocated to each mbuf to hold frame data is
>>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>>> parameter.
>>>
>>> Signed-off-by: Mark Kavanagh 
>>> [diproiet...@vmware.com rebased]
>>> Signed-off-by: Daniele Di Proietto 
>>> ---
>>>  INSTALL.DPDK-ADVANCED.md |  59 +-
>>>  INSTALL.DPDK.md  |   1 -
>>>  NEWS |   1 +
>>>  lib/netdev-dpdk.c| 151 
>>> +++
>>>  4 files changed, 185 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>>> index 191e69e..5cd64bf 100755
>>> --- a/INSTALL.DPDK-ADVANCED.md
>>> +++ b/INSTALL.DPDK-ADVANCED.md
>>> @@ -1,5 +1,5 @@
>>>  OVS DPDK ADVANCED INSTALL GUIDE
>>> -=
>>> +===
>>>
>>>  ## Contents
>>>
>>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>>>  7. [QOS](#qos)
>>>  8. [Rate Limiting](#rl)
>>>  9. [Flow Control](#fc)
>>> -10. [Vsperf](#vsperf)
>>> +10. [Jumbo Frames](#jumbo)
>>> +11. [Vsperf](#vsperf)
>>>
>>>  ##  1. Overview
>>>
>>> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at 
>>> tx side,
>>>
>>>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>>>
>>> -##  10. Vsperf
>>> +##  10. Jumbo Frames
>>> +
>>> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). 
>>> To
>>> +enable Jumbo Frames support for a DPDK port, change the Interface's 
>>> `mtu_request`
>>> +attribute to a sufficiently large value.
>>> +
>>> +e.g. Add a DPDK Phy port with MTU of 9000:
>>> +
>>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
>>> Interface dpdk0
>> mtu_request=9000`
>>> +
>>> +e.g. Change the MTU of an existing port to 6200:
>>> +
>>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
>>

Re: [ovs-dev] [PATCH] netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.

2016-08-04 Thread Ilya Maximets
On 04.08.2016 12:00, Loftus, Ciara wrote:
>>
>> Binding/unbinding of virtio driver inside VM leads to reconfiguration
>> of PMD threads. This behaviour may be abused by executing bind/unbind
>> in an infinite loop to break normal networking on all ports attached
>> to the same instance of Open vSwitch.
>>
>> Fix that by avoiding reconfiguration if it's not necessary.
>> Number of queues will not be decreased to 1 on device disconnection but
>> it's not very important in comparison with possible DOS attack from the
>> inside of guest OS.
>>
>> Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
>>   ports from attached virtio.")
>> Reported-by: Ciara Loftus 
>> Signed-off-by: Ilya Maximets 
>> ---
>>  lib/netdev-dpdk.c | 17 -
>>  1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index a0d541a..98369f1 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -2273,11 +2273,14 @@ new_device(struct virtio_net *virtio_dev)
>>  newnode = dev->socket_id;
>>  }
>>
>> -dev->requested_socket_id = newnode;
>> -dev->requested_n_rxq = qp_num;
>> -dev->requested_n_txq = qp_num;
>> -netdev_request_reconfigure(&dev->up);
>> -
>> +if (dev->requested_n_txq != qp_num
>> +|| dev->requested_n_rxq != qp_num
>> +|| dev->requested_socket_id != newnode) {
>> +dev->requested_socket_id = newnode;
>> +dev->requested_n_rxq = qp_num;
>> +dev->requested_n_txq = qp_num;
>> +netdev_request_reconfigure(&dev->up);
>> +}
>>  ovsrcu_set(&dev->virtio_dev, virtio_dev);
>>  exists = true;
>>
>> @@ -2333,11 +2336,7 @@ destroy_device(volatile struct virtio_net
>> *virtio_dev)
>>  ovs_mutex_lock(&dev->mutex);
>>  virtio_dev->flags &= ~VIRTIO_DEV_RUNNING;
>>  ovsrcu_set(&dev->virtio_dev, NULL);
>> -/* Clear tx/rx queue settings. */
>>  netdev_dpdk_txq_map_clear(dev);
>> -dev->requested_n_rxq = NR_QUEUE;
>> -dev->requested_n_txq = NR_QUEUE;
>> -netdev_request_reconfigure(&dev->up);
> 
> Hi Ilya,
> 
> I assume we will still poll on N queues despite the device being down?
> Do you have any data showing how this may affect performance?

No, I haven't. But it must be negligible because there will be an instant
return from 'netdev_dpdk_vhost_rxq_recv()'. Anyway, we're already polling
queue #0 all the time. Also, I think the state in which no driver is
loaded for the device should not last long.
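
For context, the vhost receive path bails out immediately when no virtio
device is attached, so polling an unused queue costs almost nothing;
schematically (a simplification, not the exact OVS code):

    #include <errno.h>

    /* Shape of the early exit in a vhost rxq receive function; the real
     * check in netdev_dpdk_vhost_rxq_recv() is against the device state. */
    static int
    example_vhost_rxq_recv(const void *virtio_dev, int *n_packets)
    {
        *n_packets = 0;
        if (!virtio_dev) {      /* Guest driver not (yet) attached. */
            return EAGAIN;      /* Caller just moves on to the next rxq. */
        }
        /* ... dequeue packets from the guest into a batch ... */
        return 0;
    }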

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev, CudaMailTagged] netdev-dpdk: unlink socket file when constructing vhostuser.

2016-08-04 Thread Ilya Maximets
Oh, again...

1. This patch is not for the upstream version of OVS. (I guess your OVS
   is patched at least with vHost client support.)

2. If you restart OVS properly (even by SIGTERM or SIGINT) the socket
   will be removed inside 'rte_vhost_driver_unregister()' or inside the
   fatal_signal handler. Are you sure that this is an issue of upstream
   OVS and not of your local patches?

3. A segmentation fault or another hard failure is the only reason for
   sockets to be left on the filesystem.

Best regards, Ilya Maximets.

On 04.08.2016 23:31, xu.binb...@zte.com.cn wrote:
> Work with DPDK 16.07, a UNIX socket will be created when we
> add vhostuser port. After that, the restarting of ovs-vswitchd
> leads to the failure of socket binding, so the vhostuser port
> can't be created successfully.
> 
> This commit unlink socket file before creating UNIX socket to
> avoid failure of socket binding.
> 
> Signed-off-by: Binbin Xu 
> ---
>  lib/netdev-dpdk.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>  mode change 100644 => 100755 lib/netdev-dpdk.c
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> old mode 100644
> new mode 100755
> index aaac0d1..95cf7c3
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -885,7 +885,10 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>   */
>  snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>   vhost_sock_dir, name);
> -
> +
> +if (!(flags & RTE_VHOST_USER_CLIENT)) {
> +unlink(dev->vhost_id);
> +}
>  err = rte_vhost_driver_register(dev->vhost_id, flags);
>  if (err) {
>  VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Reply: Re: [ovs-dev, CudaMailTagged] netdev-dpdk: unlink socket file when constructing vhostuser.

2016-08-04 Thread Ilya Maximets
On 04.08.2016 14:05, xu.binb...@zte.com.cn wrote:
> I get the source code clone from branch origin/master in the github.

Sorry. All is OK, but you changed the mode of the 'lib/netdev-dpdk.c' file.

> In fact, I killed ovs-vswitchd process and then start it again.
> In this case, sockets left on the filesystem.

That is normal. Why are you using hard, untrappable signals? There will be
no issues in the normal case (even SIGTERM or SIGINT).

There are a few discussions about this in the mailing list. That's an old topic.
Some links:
http://openvswitch.org/pipermail/dev/2016-February/065556.html
http://openvswitch.org/pipermail/dev/2016-February/065470.html
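
To illustrate why trappable signals leave no stale socket behind: the
daemon gets a chance to run its cleanup before exiting, which is what the
fatal_signal handler does for registered files. A generic, stand-alone
sketch of the idea (plain POSIX, not the OVS implementation):

    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t stopping;

    static void
    handle_signal(int signo)
    {
        (void) signo;
        stopping = 1;           /* Ask the main loop to exit cleanly. */
    }

    int
    main(void)
    {
        const char *sock_path = "/tmp/example-vhost.sock";  /* Made up. */

        signal(SIGTERM, handle_signal);
        signal(SIGINT, handle_signal);
        while (!stopping) {
            pause();            /* A real daemon would poll/serve here. */
        }
        unlink(sock_path);      /* Runs for SIGTERM/SIGINT, but a SIGKILL
                                 * or a crash skips it, leaving the file. */
        return 0;
    }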


> Sorry for my description in the commit msg, it may be a puzzle to you.
> 
> 
> 
> 
> From: Ilya Maximets 
> To: xu.binb...@zte.com.cn, dev@openvswitch.org,
> Cc: Ben Pfaff , Dyasly Sergey , 
> Heetae Ahn 
> Date: 2016/08/04 18:42
> Subject: Re: [ovs-dev, CudaMailTagged] netdev-dpdk: unlink socket file when 
> constructing vhostuser.
> 
> 
> 
> 
> Oh, again...
> 
> 1. This patch not for upstream version of OVS. (I guess your OVS patched
>   at least with vHost client support).
> 
> 2. If you will restart OVS properly (even by SIGTERM or SIGINT) socket
>   will be removed inside 'rte_vhost_driver_unregister()' or inside
>   fatal_signal handler. Are you sure that it's the issue of upstream
>   OVS and not your local patches?
> 
> 3. Segmentation fault or another hard failure is the only reason to
>   sockets left on the filesystem.
> 
> Best regards, Ilya Maximets.
> 
> On 04.08.2016 23:31, xu.binb...@zte.com.cn wrote:
>> Work with DPDK 16.07, a UNIX socket will be created when we
>> add vhostuser port. After that, the restarting of ovs-vswitchd
>> leads to the failure of socket binding, so the vhostuser port
>> can't be created successfully.
>>
>> This commit unlink socket file before creating UNIX socket to
>> avoid failure of socket binding.
>>
>> Signed-off-by: Binbin Xu 
>> ---
>>  lib/netdev-dpdk.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>  mode change 100644 => 100755 lib/netdev-dpdk.c
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> old mode 100644
>> new mode 100755
>> index aaac0d1..95cf7c3
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -885,7 +885,10 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>   */
>>  snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>   vhost_sock_dir, name);
>> -
>> +
>> +if (!(flags & RTE_VHOST_USER_CLIENT)) {
>> +unlink(dev->vhost_id);
>> +}
>>  err = rte_vhost_driver_register(dev->vhost_id, flags);
>>  if (err) {
>>  VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
>>
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Reply: Re: Reply: Re: [ovs-dev, CudaMailTagged] netdev-dpdk: unlink socket file when constructing vhostuser.

2016-08-04 Thread Ilya Maximets
On 04.08.2016 14:52, xu.binb...@zte.com.cn wrote:
> Thank you.
> 
> I saw the patch in the link: 
> http://openvswitch.org/pipermail/dev/2016-February/065556.html
> 
> But in the latest code of OVS, I can't find this path??

Did you read the whole discussion?

> By the way, should I keep the mode of 'lib/netdev-dpdk.c' file?

Definitely. But I don't think you should send a second version unless you
have very strong arguments for such a solution.

> Ilya Maximets  wrote on 2016/08/04 19:33:58:
> 
>> From:  Ilya Maximets 
>> To:  xu.binb...@zte.com.cn,
>> Cc: Ben Pfaff , dev@openvswitch.org, Heetae Ahn
>> , Dyasly Sergey 
>> Date:  2016/08/04 19:34
>> Subject: Re: Reply: Re: [ovs-dev, CudaMailTagged] netdev-dpdk: unlink
>> socket file when constructing vhostuser.
>>
>> On 04.08.2016 14:05, xu.binb...@zte.com.cn wrote:
>> > I get the source code clone from branch origin/master in the github.
>>
>> Sorry. All is OK, but you changed the mode of 'lib/netdev-dpdk.c' file.
>>
>> > In fact, I killed ovs-vswitchd process and then start it again.
>> > In this case, sockets left on the filesystem.
>>
>> That is normal. Why you using hard untrappable signals? There will be
>> no issues in normal case (even SIGTERM or SIGINT).
>>
>> There are few discussions about this in mail-list. That's an old topic.
>> Some links:
>> http://openvswitch.org/pipermail/dev/2016-February/065556.html
>> http://openvswitch.org/pipermail/dev/2016-February/065470.html
>>
>>
>> > Sorry for my description in the commit msg, it may be a puzzle to you.
>> >
>> >
>> >
>> >
>> > From: Ilya Maximets 
>> > To: xu.binb...@zte.com.cn, dev@openvswitch.org,
>> > Cc: Ben Pfaff , Dyasly Sergey
>> , Heetae Ahn 
>> > Date: 2016/08/04 18:42
>> > Subject: Re: [ovs-dev, CudaMailTagged] netdev-dpdk: unlink
>> socket file when constructing vhostuser.
>> >
>> --
>> >
>> >
>> >
>> > Oh, again...
>> >
>> > 1. This patch not for upstream version of OVS. (I guess your OVS patched
>> >   at least with vHost client support).
>> >
>> > 2. If you will restart OVS properly (even by SIGTERM or SIGINT) socket
>> >   will be removed inside 'rte_vhost_driver_unregister()' or inside
>> >   fatal_signal handler. Are you sure that it's the issue of upstream
>> >   OVS and not your local patches?
>> >
>> > 3. Segmentation fault or another hard failure is the only reason to
>> >   sockets left on the filesystem.
>> >
>> > Best regards, Ilya Maximets.
>> >
>> > On 04.08.2016 23:31, xu.binb...@zte.com.cn wrote:
>> >> Work with DPDK 16.07, a UNIX socket will be created when we
>> >> add vhostuser port. After that, the restarting of ovs-vswitchd
>> >> leads to the failure of socket binding, so the vhostuser port
>> >> can't be created successfully.
>> >>
>> >> This commit unlink socket file before creating UNIX socket to
>> >> avoid failure of socket binding.
>> >>
>> >> Signed-off-by: Binbin Xu 
>> >> ---
>> >>  lib/netdev-dpdk.c | 5 -
>> >>  1 file changed, 4 insertions(+), 1 deletion(-)
>> >>  mode change 100644 => 100755 lib/netdev-dpdk.c
>> >>
>> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> >> old mode 100644
>> >> new mode 100755
>> >> index aaac0d1..95cf7c3
>> >> --- a/lib/netdev-dpdk.c
>> >> +++ b/lib/netdev-dpdk.c
>> >> @@ -885,7 +885,10 @@ netdev_dpdk_vhost_user_construct(struct
>> netdev *netdev)
>> >>   */
>> >>  snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>> >>   vhost_sock_dir, name);
>> >> -
>> >> +
>> >> +if (!(flags & RTE_VHOST_USER_CLIENT)) {
>> >> +unlink(dev->vhost_id);
>> >> +}
>> >>  err = rte_vhost_driver_register(dev->vhost_id, flags);
>> >>  if (err) {
>> >>  VLOG_ERR("vhost-user socket device setup failure for socket 
>> >> %s\n",
>> >>
>> >
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev, 3/3] netdev-dpdk: vHost client mode and reconnect

2016-08-04 Thread Ilya Maximets
Hi, Ciara.
I'm also suggesting the following change, so that the 'vhost device still
attached' error is not logged for client-mode ports, where a disconnect
followed by a reconnection is expected:
-
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 57dc437..f092fa2 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -959,7 +963,8 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 
 /* Guest becomes an orphan if still attached. */
-if (netdev_dpdk_get_vid(dev) >= 0) {
+if (netdev_dpdk_get_vid(dev) >= 0
+&& !(vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
 VLOG_ERR("Removing port '%s' while vhost device still attached.",
  netdev->name);
 VLOG_ERR("To restore connectivity after re-adding of port, VM on 
socket"
-

A few comments inline.

On 04.08.2016 13:42, Ciara Loftus wrote:
> A new other_config DB option has been added called 'vhost-driver-mode'.
> By default this is set to 'server' which is the mode of operation OVS
> with DPDK has used up until this point - whereby OVS creates and manages
> vHost user sockets.
> 
> If set to 'client', OVS will act as the vHost client and connect to
> sockets created and managed by QEMU which acts as the server. This mode
> allows for reconnect capability, which allows vHost ports to resume
> normal connectivity in event of switch reset.
> 
> QEMU v2.7.0+ is required when using OVS in client mode and QEMU in
> server mode.
> 
> Signed-off-by: Ciara Loftus 
> ---
>  INSTALL.DPDK-ADVANCED.md | 27 +++
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 28 +---
>  vswitchd/vswitch.xml | 13 +
>  4 files changed, 62 insertions(+), 7 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index f9587b5..a773533 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -483,6 +483,33 @@ For users wanting to do packet forwarding using kernel 
> stack below are the steps
> where `-L`: Changes the numbers of channels of the specified network 
> device
> and `combined`: Changes the number of multi-purpose channels.
>  
> +4. Enable OVS vHost client-mode & vHost reconnect (OPTIONAL)
> +
> +   By default, OVS DPDK acts as the vHost socket server and QEMU the
> +   client. In QEMU v2.7 the option is available for QEMU to act as the
> +   server. In order for this to work, OVS DPDK must be switched to 
> 'client'
> +   mode. This is possible by setting the 'vhost-driver-mode' DB entry to
> +   'client' like so:
> +
> +   ```
> +   ovs-vsctl set Open_vSwitch . other_config:vhost-driver-mode="client"
> +   ```
> +
> +   This must be done before the switch is launched. It cannot sucessfully
> +   be changed after switch has launched.
> +
> +   One must also append ',server' to the 'chardev' arguments on the QEMU
> +   command line, to instruct QEMU to use vHost server mode, like so:
> +
> +   
> +   -chardev 
> socket,id=char0,path=/usr/local/var/run/openvswitch/vhost0,server
> +   
> +
> +   One benefit of using this mode is the ability for vHost ports to
> +   'reconnect' in event of the switch crashing or being brought down. 
> Once
> +   it is brought back up, the vHost ports will reconnect automatically 
> and
> +   normal service will resume.
> +
>- VM Configuration with libvirt
>  
>  * change the user/group, access control policty and restart libvirtd.
> diff --git a/NEWS b/NEWS
> index 9f09e1c..99412ba 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -70,6 +70,7 @@ Post-v2.5.0
> fragmentation or NAT support yet)
>   * Support for DPDK 16.07
>   * Remove dpdkvhostcuse port type.
> + * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 7692cc8..c528cb4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -136,7 +136,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / 
> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
>  #define OVS_VHOST_QUEUE_DISABLED(-2) /* Queue was disabled by guest and 
> not
>* yet mapped to another queue. */
>  
> -static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
> +static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
> +static uint64_t vhost_driver_flags = 0; /* Denote whether client/server mode 
> */
>  
>  #define VHOST_ENQ_RETRY_NUM 8
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> @@ -833,7 +834,6 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  const char *name = netdev->name;
>  int 

[ovs-dev] [openvswitch 2.5.90] testsuite: 2224 failed

2016-08-05 Thread Ilya Maximets
There is one interesting bug:

Test 2224 (ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS) constantly fails
with 'CFLAGS=-march=native'. All other tests work normally.

Environment:

* OVS current master:
  commit d59831e9b08e ("bridge: No QoS configured is not an error")
* Red Hat Enterprise Linux Server release 7.2 (Maipo)
* Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
* Intel(R) Xeon(R) CPU E5-2690 v3

Test scenario:

1. Checkout current master branch.

2. Configure OVS with default configuration:

   # ./boot.sh && ./configure && make

3. Check test #2224

   # make check TESTSUITEFLAGS='2224'
   2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   ok

4. Clean up

   # make distclean

5. Configure OVS with '-march=native':

   # ./boot.sh && ./configure CFLAGS="-march=native" && make

6. Check test #2224

   # make check TESTSUITEFLAGS='2224'
   2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   FAILED 
(ovn.at:3205)

The test failed because of a bad packet:

./ovn.at:3205: cat 1.packets | cut -c 53-
--- expout  2016-08-05 14:29:47.205360523 +0300
+++ /ovs/tests/testsuite.dir/at-groups/2224/stdout   2016-08-05 
14:29:47.215360172 +0300
@@ -1 +1 @@
-0a010a0400430044011c020106006359aa760a04
 
f001
 

 

 

 
638253633501020104ff
 0003040a0136040a0133040e10ff
+0a010a0400430044011c020106006359aa760a04
 
f001
 

 

 

 
6382536335010236040a
 010104ff0003040a0133040e10ff

Full log attached.

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [openvswitch 2.5.90] testsuite: 2224 failed

2016-08-05 Thread Ilya Maximets
Exactly the same situation with gcc (GCC) 6.1.1 20160510 (Red Hat 6.1.1-2).

On 05.08.2016 14:37, Ilya Maximets wrote:
> There is one interesting bug:
> 
> Test 2224 (ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS) constantly fails
> with 'CFLAGS=-march=native'. All other tests works normally.
> 
> Environment:
> 
>   * OVS current master:
> commit d59831e9b08e ("bridge: No QoS configured is not an error")
>   * Red Hat Enterprise Linux Server release 7.2 (Maipo)
>   * Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
>   * Intel(R) Xeon(R) CPU E5-2690 v3
> 
> Test scenario:
> 
>   1. Checkout current master branch.
> 
>   2. Configure OVS with default configuration:
> 
>  # ./boot.sh && ./configure && make
> 
>   3. Check test #2224
> 
>  # make check TESTSUITEFLAGS='2224'
>  2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   ok
> 
>   4. Clean up
> 
>  # make distclean
> 
>   5. Configure OVS with '-march=native':
> 
>  # ./boot.sh && ./configure CFLAGS="-march=native" && make
> 
>   6. Check test #2224
> 
>  # make check TESTSUITEFLAGS='2224'
>  2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   FAILED 
> (ovn.at:3205)
> 
> Test failed because of bad packet:
> 
> ./ovn.at:3205: cat 1.packets | cut -c 53-
> --- expout  2016-08-05 14:29:47.205360523 +0300
> +++ /ovs/tests/testsuite.dir/at-groups/2224/stdout   2016-08-05 
> 14:29:47.215360172 +0300
> @@ -1 +1 @@
> -0a010a0400430044011c020106006359aa760a04
>  
> f001
>  
> 
>  
> 
>  
> 
>  
> 638253633501020104ff
>  0003040a0136040a0133040e10ff
> +0a010a0400430044011c020106006359aa760a04
>  
> f001
>  
> 
>  
> 
>  
> 
>  
> 6382536335010236040a
>  010104ff0003040a0133040e10ff
> 
> Full log attached.
> 
> Best regards, Ilya Maximets.
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [openvswitch 2.5.90] testsuite: 2224 failed

2016-08-05 Thread Ilya Maximets
The same situation in another environment:

* Ubuntu 16.04 LTS
* Compiler: gcc (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413
* Intel(R) Core(TM) i7-3770 CPU

Best regards, Ilya Maximets.

On 05.08.2016 14:37, Ilya Maximets wrote:
> There is one interesting bug:
> 
> Test 2224 (ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS) constantly fails
> with 'CFLAGS=-march=native'. All other tests works normally.
> 
> Environment:
> 
>   * OVS current master:
> commit d59831e9b08e ("bridge: No QoS configured is not an error")
>   * Red Hat Enterprise Linux Server release 7.2 (Maipo)
>   * Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
>   * Intel(R) Xeon(R) CPU E5-2690 v3
> 
> Test scenario:
> 
>   1. Checkout current master branch.
> 
>   2. Configure OVS with default configuration:
> 
>  # ./boot.sh && ./configure && make
> 
>   3. Check test #2224
> 
>  # make check TESTSUITEFLAGS='2224'
>  2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   ok
> 
>   4. Clean up
> 
>  # make distclean
> 
>   5. Configure OVS with '-march=native':
> 
>  # ./boot.sh && ./configure CFLAGS="-march=native" && make
> 
>   6. Check test #2224
> 
>  # make check TESTSUITEFLAGS='2224'
>  2224: ovn -- dhcpv4 : 1 HV, 2 LS, 2 LSPs/LS   FAILED 
> (ovn.at:3205)
> 
> Test failed because of bad packet:
> 
> ./ovn.at:3205: cat 1.packets | cut -c 53-
> --- expout  2016-08-05 14:29:47.205360523 +0300
> +++ /ovs/tests/testsuite.dir/at-groups/2224/stdout   2016-08-05 
> 14:29:47.215360172 +0300
> @@ -1 +1 @@
> -0a010a0400430044011c020106006359aa760a04
>  
> f001
>  
> 
>  
> 
>  
> 
>  
> 638253633501020104ff
>  0003040a0136040a0133040e10ff
> +0a010a0400430044011c020106006359aa760a04
>  
> f001
>  
> 
>  
> 
>  
> 
>  
> 6382536335010236040a
>  010104ff0003040a0133040e10ff
> 
> Full log attached.
> 
> Best regards, Ilya Maximets.
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev,V2] netdev-dpdk: fix memory leak

2016-08-05 Thread Ilya Maximets
On 04.08.2016 12:49, Mark Kavanagh wrote:
> DPDK v16.07 introduces the ability to free memzones.
> Up until this point, DPDK memory pools created in OVS could
> not be destroyed, thus incurring a memory leak.
> 
> Leverage the DPDK v16.07 rte_mempool API to free DPDK
> mempools when their associated reference count reaches 0 (this
> indicates that the memory pool is no longer in use).
> 
> Signed-off-by: Mark Kavanagh 
> ---
> 
> v2->v1: rebase to head of master, and remove 'RFC' tag
> 
>  lib/netdev-dpdk.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index aaac0d1..ffcd35c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -506,7 +506,7 @@ dpdk_mp_get(int socket_id, int mtu) 
> OVS_REQUIRES(dpdk_mutex)
>  }
>  
>  static void
> -dpdk_mp_put(struct dpdk_mp *dmp)
> +dpdk_mp_put(struct dpdk_mp *dmp) OVS_REQUIRES(dpdk_mutex)
>  {
>  
>  if (!dmp) {
> @@ -514,15 +514,12 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>  }
>  
>  dmp->refcount--;
> -ovs_assert(dmp->refcount >= 0);
>  
> -#if 0
> -/* I could not find any API to destroy mp. */
> -if (dmp->refcount == 0) {
> -list_delete(dmp->list_node);
> -/* destroy mp-pool. */
> -}
> -#endif
> +if (OVS_UNLIKELY(!dmp->refcount)) {
> +ovs_list_remove(&dmp->list_node);
> +rte_mempool_free(dmp->mp);
> + }
> +
>  }
>  
>  static void
> @@ -928,16 +925,18 @@ netdev_dpdk_destruct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  
> +ovs_mutex_lock(&dpdk_mutex);
>  ovs_mutex_lock(&dev->mutex);
> +
>  rte_eth_dev_stop(dev->port_id);
>  free(ovsrcu_get_protected(struct ingress_policer *,
>&dev->ingress_policer));
> -ovs_mutex_unlock(&dev->mutex);
>  
> -ovs_mutex_lock(&dpdk_mutex);
>  rte_free(dev->tx_q);
>  ovs_list_remove(&dev->list_node);
>  dpdk_mp_put(dev->dpdk_mp);
> +
> +ovs_mutex_unlock(&dev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
>  }
>  
> @@ -946,6 +945,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  
> +ovs_mutex_lock(&dpdk_mutex);
> +ovs_mutex_lock(&dev->mutex);
> +
>  /* Guest becomes an orphan if still attached. */
>  if (netdev_dpdk_get_vid(dev) >= 0) {
>  VLOG_ERR("Removing port '%s' while vhost device still attached.",
> @@ -961,15 +963,14 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  fatal_signal_remove_file_to_unlink(dev->vhost_id);
>  }
>  
> -ovs_mutex_lock(&dev->mutex);
>  free(ovsrcu_get_protected(struct ingress_policer *,
>&dev->ingress_policer));
> -ovs_mutex_unlock(&dev->mutex);
>  
> -ovs_mutex_lock(&dpdk_mutex);
>  rte_free(dev->tx_q);
>  ovs_list_remove(&dev->list_node);
>  dpdk_mp_put(dev->dpdk_mp);
> +
> +ovs_mutex_unlock(&dev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
>  }

I agree that the locking here was wrong, but this change introduces an issue:
'rte_vhost_driver_unregister()' may call 'destroy_device()', and OVS will be
aborted on the attempt to lock 'dpdk_mutex' again:

VHOST_CONFIG: free connfd = 37 for device '/vhost1'
ovs-vswitchd: lib/netdev-dpdk.c:2305: pthread_mutex_lock failed (Resource 
deadlock avoided)

Program received signal SIGABRT, Aborted.
0x007fb7ad6d38 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x007fb7ad6d38 in raise () from /lib64/libc.so.6
#1  0x007fb7ad8aa8 in abort () from /lib64/libc.so.6
#2  0x00692be0 in ovs_abort_valist at lib/util.c:335
#3  0x00692ba0 in ovs_abort at lib/util.c:327
#4  0x00651800 in ovs_mutex_lock_at (l_=0x899ab0 , 
where=0x78a458 "lib/netdev-dpdk.c:2305") at lib/ovs-thread.c:76
#5  0x006c0190 in destroy_device (vid=0) at lib/netdev-dpdk.c:2305
#6  0x004ea850 in vhost_destroy_device ()
#7  0x004ee578 in rte_vhost_driver_unregister ()
#8  0x006bc8c8 in netdev_dpdk_vhost_destruct (netdev=0x7f6bffed00) at 
lib/netdev-dpdk.c:944
#9  0x005e4ad4 in netdev_unref (dev=0x7f6bffed00) at lib/netdev.c:499
#10 0x005e4b9c in netdev_close (netdev=0x7f6bffed00) at lib/netdev.c:523
[...]
#20 0x0053ad94 in main (argc=7, argv=0x7ff318) at 
vswitchd/ovs-vswitchd.c:112

This may be reproduced by removing a port while a virtio device is still
attached. It blocks the reconnection feature and the deletion of a port while
QEMU is still attached.

Someone should fix this. Any thoughts?
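
One direction that comes to mind (a rough, untested sketch only): release
both mutexes in 'netdev_dpdk_vhost_destruct()' before calling
'rte_vhost_driver_unregister()', so that 'destroy_device()' can take them
itself:

    ovs_mutex_unlock(&dev->mutex);
    ovs_mutex_unlock(&dpdk_mutex);

    /* Without the locks held, destroy_device() may be invoked from
     * here and can lock dpdk_mutex/dev->mutex on its own. */
    if (rte_vhost_driver_unregister(dev->vhost_id)) {
        VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
    } else {
        fatal_signal_remove_file_to_unlink(dev->vhost_id);
    }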

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] netdev-dpdk: Fix deadlock in destroy_device().

2016-08-07 Thread Ilya Maximets
Minor comment inline.

Acked-by: Ilya Maximets 

On 05.08.2016 23:57, Daniele Di Proietto wrote:
> netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
> can trigger the destroy_device() callback.  destroy_device() will try to
> take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
> deadlock.
> 
> This problem can be solved by dropping the mutexes before calling
> rte_vhost_driver_unregister().  The netdev_dpdk_vhost_destruct() and
> construct() call are already serialized by netdev_mutex.
> 
> This commit also makes clear that dev->vhost_id is constant and can be
> accessed without taking any mutexes in the lifetime of the devices.
> 
> Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
> Reported-by: Ilya Maximets 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 34 --
>  1 file changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f37ec1c..98bff62 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -355,8 +355,10 @@ struct netdev_dpdk {
>  /* True if vHost device is 'up' and has been reconfigured at least once 
> */
>  bool vhost_reconfigured;
>  
> -/* Identifier used to distinguish vhost devices from each other */
> -char vhost_id[PATH_MAX];
> +/* Identifier used to distinguish vhost devices from each other.  It does
> + * not change during the lifetime of a struct netdev_dpdk.  It can be 
> read
> + * without holding any mutex. */
> +const char vhost_id[PATH_MAX];
>  
>  /* In dpdk_list. */
>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> @@ -846,7 +848,8 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>  }
>  
>  ovs_mutex_lock(&dpdk_mutex);
> -strncpy(dev->vhost_id, netdev->name, sizeof(dev->vhost_id));
> +strncpy(CONST_CAST(char *, dev->vhost_id), netdev->name,
> +sizeof dev->vhost_id);
>  err = vhost_construct_helper(netdev);
>  ovs_mutex_unlock(&dpdk_mutex);
>  return err;
> @@ -878,7 +881,7 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>  /* Take the name of the vhost-user port and append it to the location 
> where
>   * the socket is to be created, then register the socket.
>   */
> -snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
> +snprintf(CONST_CAST(char *,dev->vhost_id), sizeof(dev->vhost_id), 
> "%s/%s",

Nits: a space is missing between the arguments of 'CONST_CAST()', and the
operand of 'sizeof' is parenthesized here, unlike 'sizeof dev->vhost_id' in
the other hunk.

>   vhost_sock_dir, name);
>  
>  err = rte_vhost_driver_register(dev->vhost_id, flags);
> @@ -938,6 +941,17 @@ netdev_dpdk_destruct(struct netdev *netdev)
>  ovs_mutex_unlock(&dpdk_mutex);
>  }
>  
> +/* rte_vhost_driver_unregister() can call back destroy_device(), which will
> + * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'.  To avoid a
> + * deadlock, none of the mutexes must be held while calling this function. */
> +static int
> +dpdk_vhost_driver_unregister(struct netdev_dpdk *dev)
> +OVS_EXCLUDED(dpdk_mutex)
> +OVS_EXCLUDED(dev->mutex)
> +{
> +return rte_vhost_driver_unregister(dev->vhost_id);
> +}
> +
>  static void
>  netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  {
> @@ -955,12 +969,6 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>   dev->vhost_id);
>  }
>  
> -if (rte_vhost_driver_unregister(dev->vhost_id)) {
> -VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
> -} else {
> -fatal_signal_remove_file_to_unlink(dev->vhost_id);
> -}
> -
>  free(ovsrcu_get_protected(struct ingress_policer *,
>&dev->ingress_policer));
>  
> @@ -970,6 +978,12 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  
>  ovs_mutex_unlock(&dev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
> +
> +if (dpdk_vhost_driver_unregister(dev)) {
> +VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
> +} else {
> +fatal_signal_remove_file_to_unlink(dev->vhost_id);
> +}
>  }
>  
>  static void
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-08 Thread Ilya Maximets
On 05.08.2016 17:34, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> Previous: http://openvswitch.org/pipermail/dev/2016-July/076845.html
> 
> v2->v1:
> - rebase to HEAD of master
> - fall back to previous 'good' MTU if reconfigure fails
> - introduce new field 'last_mtu' in struct netdev-dpdk to facilitate
>   fall-back
> - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
> - remove rebasing artifact in INSTALL.DPDK-Advanced.md
> - remove superflous variable in dpdk_mp_configure
> - fix minor coding style infraction
> 
>  INSTALL.DPDK-ADVANCED.md |  58 -
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 165 
> ---
>  4 files changed, 197 insertions(+), 28 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx 
> side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's 
> `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
> Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be 
> accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame 
> size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK 
> NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames 
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
> in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device 
> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
> driver),
> +   the MTU of those logical network interfaces must also be increased to 
> a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +   header and CRC lengths (i.e. 18B) from the max supported frame size.
> +   So, to set the MTU for a 9018B Jumbo Frame:
> +
> +   ```
> +   ifconfig eth1 mtu 9000
> +   ```
> +
> +##  11. Vsperf
>  
>  Vsperf project goal is to develop vSwitch test framework that can be used to
>  validate the suitability of different vSwitch implementations in a Telco 
> deployment
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 253d022..a810ac8 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough].
>  
>  ##  6. Limitations
>  
> -  - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future 
> OVS rel

[ovs-dev] [PATCH v2] netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.

2016-08-08 Thread Ilya Maximets
Binding/unbinding the virtio driver inside a VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind in
an infinite loop to break normal networking on all ports attached to the
same instance of Open vSwitch.

Fix that by avoiding reconfiguration when it is not necessary.
The number of queues will no longer be decreased to 1 on device
disconnection, but that is a minor drawback compared with a possible
DoS attack from inside the guest OS.

Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
  ports from attached virtio.")
Reported-by: Ciara Loftus 
Signed-off-by: Ilya Maximets 
---

Version 2:
* Set 'vhost_reconfigured' flag if reconfiguration not
  required.
* Rebased on current master.

 lib/netdev-dpdk.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index b671601..ea0e16e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2299,10 +2299,17 @@ new_device(int vid)
 newnode = dev->socket_id;
 }
 
-dev->requested_socket_id = newnode;
-dev->requested_n_rxq = qp_num;
-dev->requested_n_txq = qp_num;
-netdev_request_reconfigure(&dev->up);
+if (dev->requested_n_txq != qp_num
+|| dev->requested_n_rxq != qp_num
+|| dev->requested_socket_id != newnode) {
+dev->requested_socket_id = newnode;
+dev->requested_n_rxq = qp_num;
+dev->requested_n_txq = qp_num;
+netdev_request_reconfigure(&dev->up);
+} else {
+/* Reconfiguration not required. */
+dev->vhost_reconfigured = true;
+}
 
 ovsrcu_index_set(&dev->vid, vid);
 exists = true;
@@ -2362,11 +2369,7 @@ destroy_device(int vid)
 ovs_mutex_lock(&dev->mutex);
 dev->vhost_reconfigured = false;
 ovsrcu_index_set(&dev->vid, -1);
-/* Clear tx/rx queue settings. */
 netdev_dpdk_txq_map_clear(dev);
-dev->requested_n_rxq = NR_QUEUE;
-dev->requested_n_txq = NR_QUEUE;
-netdev_request_reconfigure(&dev->up);
 
 netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V3 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-08 Thread Ilya Maximets
On 08.08.2016 13:24, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superflous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
> 
>  INSTALL.DPDK-ADVANCED.md |  58 -
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 166 
> ---
>  4 files changed, 198 insertions(+), 28 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx 
> side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's 
> `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
> Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be 
> accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame 
> size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK 
> NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames 
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
> in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device 
> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
> driver),
> +   the MTU of those logical network interfaces must also be increased to 
> a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +   header and CRC lengths (i.e. 18B) from the max supported frame size.
> +   So, to set the MTU for a 9018B Jumbo Frame:
> +
> +   ```
> +   ifconfig eth1 mtu 9000
> +   ```
> +
> +##  11. Vsperf
>  
>  Vsperf project goal is to develop vSwitch test framework that can be used to
>  validate the suitability of different vSwitch implementations in a Telco 
> deployment
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 253d022..a810ac8 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough].
>  
>  ##  6. Limitations
>  
> -  - Supports MTU size 1500, MTU setting fo

Re: [ovs-dev] [ovs-dev, v2, 1/3] netdev-dpdk: Remove dpdkvhostcuse ports

2016-08-08 Thread Ilya Maximets
On 04.08.2016 17:09, Ciara Loftus wrote:
> This commit removes the 'dpdkvhostcuse' port type from the userspace
> datapath. vhost-cuse ports are quickly becoming obsolete as the
> vhost-user port type begins to support a greater feature-set thanks to
> the addition of things like vhost-user multiqueue and potential
> upcoming features like vhost-user client-mode and vhost-user reconnect.
> The feature is also expected to be removed from DPDK soon.
> 
> One potential drawback of the removal of this support is that a
> userspace vHost port type is not available in OVS for use with older
> versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
> this should however be a low impact change.
> 
> Signed-off-by: Ciara Loftus 
> Acked-by: Flavio Leitner 
> Acked-by: Daniele Di Proietto 

Acked-by: Ilya Maximets 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V4 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Ilya Maximets
= qos_pkts;
> -netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts, cnt);
> +netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
> + cnt + mtu_dropped + qos_pkts);
>  rte_spinlock_unlock(&dev->stats_lock);
>  
>  out:
> @@ -1639,6 +1702,26 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int 
> *mtup)
>  }
>  
>  static int
> +netdev_dpdk_set_mtu(struct netdev *netdev, int mtu)
> +{
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +if (MTU_TO_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN) {

Should we also check for 'mtu < ETHER_MIN_MTU' here? I think it would be nice
to check it here and pass only sane values to the device. Otherwise the
configuration will fail and the device will be deleted.

> +VLOG_WARN("Unsupported MTU (%d)\n", mtu);

I would prefer to mention the name of the device in this message.
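
For example, a rough sketch of the combined check (assuming an
'ETHER_MIN_MTU' definition, e.g. 68, is available):

    if (MTU_TO_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN
        || mtu < ETHER_MIN_MTU) {
        /* Name the device and reject both too-large and too-small MTUs. */
        VLOG_WARN("%s: unsupported MTU %d", dev->up.name, mtu);
        return EINVAL;
    }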

> +return EINVAL;
> +}
> +
> +ovs_mutex_lock(&dev->mutex);
> +if (dev->requested_mtu != mtu) {
> +dev->requested_mtu = mtu;
> +netdev_request_reconfigure(netdev);
> +}
> +ovs_mutex_unlock(&dev->mutex);
> +
> +return 0;
> +}
> +
> +static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
>  
>  static int
> @@ -2803,7 +2886,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>  ovs_mutex_lock(&dev->mutex);
>  
>  if (netdev->n_txq == dev->requested_n_txq
> -&& netdev->n_rxq == dev->requested_n_rxq) {
> +&& netdev->n_rxq == dev->requested_n_rxq
> +&& dev->mtu == dev->requested_mtu) {
>  /* Reconfiguration is unnecessary */
>  
>  goto out;
> @@ -2811,6 +2895,18 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>  
>  rte_eth_dev_stop(dev->port_id);
>  
> +if (dev->mtu != dev->requested_mtu) {
> +int prev_mtu = dev->mtu;
> +
> +dev->mtu = dev->requested_mtu;
> +err = dpdk_mp_configure(dev);
> +if (err) {
> +/* Revert to previous configuration; don't flag this as an error 
> */
> +dev->mtu = prev_mtu;
> +err = 0;
> +}

How about making this like in 'netdev_dpdk_vhost_user_reconfigure()':

if (dpdk_mp_configure(dev)) {
/* Failed. Revert to previous configuration */
dev->mtu = prev_mtu;
}
?

> +}
> +
>  netdev->n_txq = dev->requested_n_txq;
>  netdev->n_rxq = dev->requested_n_rxq;
>  
> @@ -2818,6 +2914,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>  err = dpdk_eth_dev_init(dev);
>  netdev_dpdk_alloc_txq(dev, netdev->n_txq);
>  
> +netdev_change_seq_changed(netdev);
> +
>  out:
>  
>  ovs_mutex_unlock(&dev->mutex);
> @@ -2830,7 +2928,6 @@ static int
>  netdev_dpdk_vhost_user_reconfigure(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -int err = 0;
>  
>  ovs_mutex_lock(&dpdk_mutex);
>  ovs_mutex_lock(&dev->mutex);
> @@ -2845,14 +2942,19 @@ netdev_dpdk_vhost_user_reconfigure(struct netdev 
> *netdev)
>  
>  netdev_dpdk_remap_txqs(dev);
>  
> -if (dev->requested_socket_id != dev->socket_id) {
> +if (dev->requested_socket_id != dev->socket_id
> +|| dev->requested_mtu != dev->mtu) {
> +int prev_mtu = dev->mtu;
> +
>  dev->socket_id = dev->requested_socket_id;
> -/* Change mempool to new NUMA Node */
> -dpdk_mp_put(dev->dpdk_mp);
> -dev->dpdk_mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> -if (!dev->dpdk_mp) {
> -err = ENOMEM;
> +dev->mtu = dev->requested_mtu;
> +
> +/* Change mempool to new NUMA Node and to new MTU.
> + * In the event of error, restore previous MTU. */
> +if (dpdk_mp_configure(dev)) {

'dev->socket_id' should be reverted too in this case.

Maybe it would be better to add two arguments ('mtu' and 'socket_id') to
the 'dpdk_mp_configure()' function and update the real 'mtu' and 'socket_id'
only on success? Like this:

if (!dpdk_mp_configure(dev, dev->requested_mtu,
dev->requested_socket_id)) {
dev->socket_id = dev->requested_socket_id;
dev->mtu = dev->requested_mtu;
}

Another option is to rename 'dpdk_mp_configure' to
'netdev_dpdk_reconfigure_mempool', use 'requested_{mtu, socket_id}'
inside, and update the real ones on success. Like this (not tested):

/* Tries to allocate new mempool on requested_socket_id with
 * mbuf size corresponding to requested_mtu.
 * On success new configuration will be applied.
 * On error, device will be left unchanged. */
static int
netdev_dpdk_reconfigure_mempool(struct netdev_dpdk *dev)
OVS_REQUIRES(dpdk_mutex)
OVS_REQUIRES(dev->mutex)
{
uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
struct dpdk_mp *mp;

mp = dpdk_mp_get(dev->requested_socket_id, FRAME_LEN_TO_MTU(buf_size));
if (!mp) {
VLOG_ERR("Insufficient memory to create memory pool for netdev %s"
 " on socket %d\n", dev->up.name, dev->requested_socket_id);
return ENOMEM;
} else {
dpdk_mp_put(dev->dpdk_mp);
dev->dpdk_mp = mp;
dev->mtu = dev->requested_mtu;
dev->socket_id = dev->requested_socket_id;
dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
}

return 0;
}

netdev_dpdk_vhost_user_reconfigure:

if (dev->requested_socket_id != dev->socket_id
|| dev->requested_mtu != dev->mtu) {
netdev_dpdk_reconfigure_mempool(dev);
}

The same applies in the other reconfigure paths; see the sketch below.
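
For instance, the 'eth' path in 'netdev_dpdk_reconfigure()' would become
something like this (sketch, not tested):

    if (dev->requested_mtu != dev->mtu) {
        if (netdev_dpdk_reconfigure_mempool(dev)) {
            /* Allocation failed: keep the previous mempool and MTU and
             * don't treat this as a fatal reconfiguration error. */
        }
    }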

I prefer the option with 'netdev_dpdk_reconfigure_mempool()'.

What do you think?

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V5 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Ilya Maximets
A few minor comments inline.

On 09.08.2016 15:03, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
> 
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superflous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 145 
> +++
>  4 files changed, 176 insertions(+), 29 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx 
> side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's 
> `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
> Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be 
> accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame 
> size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK 
> NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames 
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
> in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device 
> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
> driver),
> +   the MTU of those logical network interfaces must also be increased to 
> a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +   header and CRC lengths (i.e. 18B) from the max supported frame size.
> +   

Re: [ovs-dev] [PATCH V6 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Ilya Maximets
On 09.08.2016 18:02, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> v6:
> - include device name in netdev_dpdk_set_mtu error log
> - resolve minor coding standards infractions
> 
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
> 
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superflous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
> 
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS     |   1 +
>  lib/netdev-dpdk.c| 145 
> +++----
>  4 files changed, 176 insertions(+), 29 deletions(-)

Looks good to me.
You may add one of these tags:

Signed-off-by: Ilya Maximets 
Acked-by: Ilya Maximets 

Choose whichever of them is more suitable.

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V7 6/7] netdev: Make netdev_set_mtu() netdev parameter non-const.

2016-08-09 Thread Ilya Maximets
On 09.08.2016 19:01, Mark Kavanagh wrote:
> From: Daniele Di Proietto 
> 
> Every provider silently drops the const attribute when converting the
> parameter to the appropriate subclass.  Might as well drop the const
> attribute from the parameter, since this is a "set" function.
> 
> Signed-off-by: Daniele Di Proietto 

Acked-by: Ilya Maximets 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V7 2/7] vswitchd: Introduce 'mtu_request' column in Interface.

2016-08-09 Thread Ilya Maximets
Hi, Daniele.
Maybe I'm not the right person to review such things, but I have
actually been using this patch since it appeared in your GitHub and it
looks good to me. Also, I have wanted to remove this non-working
implementation of 'netdev_dpdk_set_mtu' for a long time.

Acked-by: Ilya Maximets 

On 09.08.2016 19:01, Mark Kavanagh wrote:
> From: Daniele Di Proietto 
> 
> The 'mtu_request' column can be used to set the MTU of a specific
> interface.
> 
> This column is useful because it will allow changing the MTU of DPDK
> devices (implemented in a future commit), which are not accessible
> outside the ovs-vswitchd process, but it can be used for kernel
> interfaces as well.
> 
> The current implementation of set_mtu() in netdev-dpdk is removed
> because it's broken.  It will be reintroduced by a subsequent commit on
> this series.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  NEWS   |  2 ++
>  lib/netdev-dpdk.c  | 53 
> +-
>  vswitchd/bridge.c  |  9 
>  vswitchd/vswitch.ovsschema | 10 +++--
>  vswitchd/vswitch.xml   | 52 +
>  5 files changed, 58 insertions(+), 68 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index c2ed71d..ce10982 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -101,6 +101,8 @@ Post-v2.5.0
> - ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
>   SHA-1 is no longer secure and some operating systems have started to
>   disable it in OpenSSL.
> +   - Add 'mtu_request' column to the Interface table. It can be used to
> + configure the MTU of non-internal ports.
>  
>  
>  v2.5.0 - 26 Feb 2016
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f37ec1c..60db568 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1639,57 +1639,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int 
> *mtup)
>  }
>  
>  static int
> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> -{
> -struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -int old_mtu, err, dpdk_mtu;
> -struct dpdk_mp *old_mp;
> -struct dpdk_mp *mp;
> -uint32_t buf_size;
> -
> -ovs_mutex_lock(&dpdk_mutex);
> -ovs_mutex_lock(&dev->mutex);
> -if (dev->mtu == mtu) {
> -err = 0;
> -goto out;
> -}
> -
> -buf_size = dpdk_buf_size(mtu);
> -dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> -
> -mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> -if (!mp) {
> -err = ENOMEM;
> -goto out;
> -}
> -
> -rte_eth_dev_stop(dev->port_id);
> -
> -old_mtu = dev->mtu;
> -old_mp = dev->dpdk_mp;
> -dev->dpdk_mp = mp;
> -dev->mtu = mtu;
> -dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> -
> -err = dpdk_eth_dev_init(dev);
> -if (err) {
> -dpdk_mp_put(mp);
> -dev->mtu = old_mtu;
> -dev->dpdk_mp = old_mp;
> -dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> -dpdk_eth_dev_init(dev);
> -goto out;
> -}
> -
> -dpdk_mp_put(old_mp);
> -netdev_change_seq_changed(netdev);
> -out:
> -ovs_mutex_unlock(&dev->mutex);
> -ovs_mutex_unlock(&dpdk_mutex);
> -return err;
> -}
> -
> -static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
>  
>  static int
> @@ -2964,7 +2913,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev 
> *netdev)
>  netdev_dpdk_set_etheraddr,\
>  netdev_dpdk_get_etheraddr,\
>  netdev_dpdk_get_mtu,  \
> -netdev_dpdk_set_mtu,  \
> +NULL,   /* set_mtu */ \
>  netdev_dpdk_get_ifindex,  \
>  GET_CARRIER,  \
>  netdev_dpdk_get_carrier_resets,   \
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index ddf1fe5..397be70 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -775,6 +775,15 @@ bridge_delete_or_reconfigure_ports(struct bridge *br)
>  goto delete;
>  }
>  
> +if (iface->cfg->n_mtu_request == 1
> +&& strcmp(iface->type,
> +  ofproto_port_open_type(br->type, "internal"))) {
> +/* Try to set the MTU to the requested value.  This is not done
> + * for inte

[ovs-dev] [PATCH] netdev-dpdk: vhost: Fix double free and use after free with QoS.

2016-08-10 Thread Ilya Maximets
While using QoS with vHost interfaces, 'netdev_dpdk_qos_run__()' frees
mbufs while executing 'netdev_dpdk_policer_run()'. After that, the same
mbufs are freed again at the end of '__netdev_dpdk_vhost_send()' if
'may_steal == true'. This behaviour breaks the mempool.

Also, 'netdev_dpdk_qos_run__()' frees packets even when we shouldn't do
this ('may_steal == false'). This leads to the upper layers using
already-freed packets.

Fix that by copying all packets that we can't steal, as is done for
DPDK_DEV_ETH devices, and by freeing only the packets not already freed
by QoS.

Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.")
Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0e6db26..9a1f7cd 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1373,14 +1373,13 @@ netdev_dpdk_vhost_update_tx_counters(struct 
netdev_stats *stats,
 
 static void
 __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
- struct dp_packet **pkts, int cnt,
- bool may_steal)
+ struct dp_packet **pkts, int cnt)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
 unsigned int total_pkts = cnt;
-unsigned int qos_pkts = cnt;
-int retries = 0;
+unsigned int qos_pkts = 0;
+int i, retries = 0;
 
 qid = dev->tx_q[qid % netdev->n_txq].map;
 
@@ -1396,7 +1395,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 
 /* Check has QoS has been configured for the netdev */
 cnt = netdev_dpdk_qos_run__(dev, cur_pkts, cnt);
-qos_pkts -= cnt;
+qos_pkts = total_pkts - cnt;
 
 do {
 int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
@@ -1423,12 +1422,8 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 rte_spinlock_unlock(&dev->stats_lock);
 
 out:
-if (may_steal) {
-int i;
-
-for (i = 0; i < total_pkts; i++) {
-dp_packet_delete(pkts[i]);
-}
+for (i = 0; i < total_pkts - qos_pkts; i++) {
+dp_packet_delete(pkts[i]);
 }
 }
 
@@ -1488,7 +1483,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 
 if (dev->type == DPDK_DEV_VHOST) {
 __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) mbufs,
- newcnt, true);
+ newcnt);
 } else {
 unsigned int qos_pkts = newcnt;
 
@@ -1516,13 +1511,12 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
bool may_steal, bool concurrent_txq OVS_UNUSED)
 {
 
-if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) {
+if (OVS_UNLIKELY(!may_steal || batch->packets[0]->source != DPBUF_DPDK)) {
 dpdk_do_tx_copy(netdev, qid, batch);
 dp_packet_delete_batch(batch, may_steal);
 } else {
 dp_packet_batch_apply_cutlen(batch);
-__netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count,
- may_steal);
+__netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count);
 }
 return 0;
 }
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

2016-05-31 Thread Ilya Maximets
Ping.

Best regards, Ilya Maximets.

On 24.05.2016 16:34, Ilya Maximets wrote:
> Manual pinning of RX queues to PMD threads required for performance
> optimisation. This will give to user ability to achieve max. performance
> using less number of CPUs because currently only user may know which
> ports are heavy loaded and which are not.
> 
> To give full controll on ports TX queue manipulation mechanisms also
> required. For example, to avoid issue described in 'dpif-netdev: XPS
> (Transmit Packet Steering) implementation.' which becomes worse with
> ability of manual pinning.
> ( http://openvswitch.org/pipermail/dev/2016-March/067152.html )
> 
> First 3 patches: prerequisites to XPS implementation.
> Patch #4: XPS implementation.
> Patches #5 and #6: Manual pinning implementation.
> 
> Version 2:
>   * Rebased on current master.
>   * Fixed initialization of newly allocated memory in
> 'port_reconfigure()'.
> 
> Ilya Maximets (6):
>   netdev-dpdk: Use instant sending instead of queueing of packets.
>   dpif-netdev: Allow configuration of number of tx queues.
>   netdev-dpdk: Mandatory locking of TX queues.
>   dpif-netdev: XPS (Transmit Packet Steering) implementation.
>   dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.
>   dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.
> 
>  INSTALL.DPDK.md|  44 +++--
>  NEWS   |   4 +
>  lib/dpif-netdev.c  | 393 
> ++---
>  lib/netdev-bsd.c   |   1 -
>  lib/netdev-dpdk.c  | 198 ++-
>  lib/netdev-dummy.c |   1 -
>  lib/netdev-linux.c |   1 -
>  lib/netdev-provider.h  |  18 +--
>  lib/netdev-vport.c |   1 -
>  lib/netdev.c   |  30 
>  lib/netdev.h   |   1 -
>  vswitchd/ovs-vswitchd.8.in |  10 ++
>  12 files changed, 400 insertions(+), 302 deletions(-)
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH RFC v2 1/6] netdev-dpdk: Use instant sending instead of queueing of packets.

2016-06-01 Thread Ilya Maximets
On 02.06.2016 04:33, Daniele Di Proietto wrote:
> I wanted to do this for a long time now, thanks for posting this patch.
> 
> I didn't notice a drop in throughput with this patch, for phy-ovs-phy
> tests, even when we call rte_eth_tx_burst() for every single packet.

It's good news.

> How about 'dpdk_tx_burst()' instead of 'netdev_dpdk_eth_instant_send()'?
> The "instant" part makes sense compared to the current code, but that
> code is removed.

Yes, I also wanted to change it. Maybe 'netdev_dpdk_eth_tx_burst()'? Just
to stay closer to the coding style; it also points out that the function
works with 'eth' devices. But, I think, you may choose any name you want
and just replace it while pushing.

> Acked-by: Daniele Di Proietto 
> 
> If there are no objection I can push this separately from the rest of
> the series.

OK. Thanks.
Best regards, Ilya Maximets.

> On 24/05/2016 06:34, "Ilya Maximets"  wrote:
> 
>> Current implementarion of TX packet's queueing is broken in several ways:
>>
>>  * TX queue flushing implemented on receive assumes that all
>>core_id-s are sequential and starts from zero. This may lead
>>to situation when packets will stuck in queue forever and,
>>also, this influences on latency.
>>
>>  * For a long time flushing logic depends on uninitialized
>>'txq_needs_locking', because it usually calculated after
>>'netdev_dpdk_alloc_txq' but used inside of this function
>>for initialization of 'flush_tx'.
>>
>> According to current flushing logic, constant flushing required if TX
>> queues will be shared among different CPUs. Further patches will implement
>> mechanisms for manipulations with TX queues in runtime. In this case PMD
>> threads will not know is current queue shared or not. This means that
>> constant flushing will be required.
>>
>> Conclusion: Lets remove queueing at all because it doesn't work
>> properly now and, also, constant flushing required anyway.
>>
>> Testing on basic PHY-OVS-PHY and PHY-OVS-VM-OVS-PHY scenarios shows
>> insignificant performance drop (less than 0.5 percents) in compare to
>> profit that we can achieve in the future using XPS or other features.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> lib/netdev-dpdk.c | 102 
>> --
>> 1 file changed, 14 insertions(+), 88 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 0d1b8c9..66e33df 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -167,7 +167,6 @@ static const struct rte_eth_conf port_conf = {
>> },
>> };
>>
>> -enum { MAX_TX_QUEUE_LEN = 384 };
>> enum { DPDK_RING_SIZE = 256 };
>> BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
>> enum { DRAIN_TSC = 20ULL };
>> @@ -284,8 +283,7 @@ static struct ovs_list dpdk_mp_list 
>> OVS_GUARDED_BY(dpdk_mutex)
>> = OVS_LIST_INITIALIZER(&dpdk_mp_list);
>>
>> /* This mutex must be used by non pmd threads when allocating or freeing
>> - * mbufs through mempools. Since dpdk_queue_pkts() and dpdk_queue_flush() 
>> may
>> - * use mempools, a non pmd thread should hold this mutex while calling them 
>> */
>> + * mbufs through mempools. */
>> static struct ovs_mutex nonpmd_mempool_mutex = OVS_MUTEX_INITIALIZER;
>>
>> struct dpdk_mp {
>> @@ -299,17 +297,12 @@ struct dpdk_mp {
>> /* There should be one 'struct dpdk_tx_queue' created for
>>  * each cpu core. */
>> struct dpdk_tx_queue {
>> -bool flush_tx; /* Set to true to flush queue everytime 
>> */
>> -   /* pkts are queued. */
>> -int count;
>> rte_spinlock_t tx_lock;/* Protects the members and the NIC queue
>> * from concurrent access.  It is used 
>> only
>> * if the queue is shared among different
>> * pmd threads (see 'txq_needs_locking'). 
>> */
>> int map;   /* Mapping of configured vhost-user queues
>> * to enabled by guest. */
>> -uint64_t tsc;
>> -struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN];
>> };
>>
>> /* dpdk has no way to remove dpdk ring ethernet devices
>> @@ -703,19 +696,6 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned 
>> int n_txqs)
>>
>>  

Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

2016-06-01 Thread Ilya Maximets
On 02.06.2016 04:32, Daniele Di Proietto wrote:
> 
> On 25/05/2016 04:03, "Ilya Maximets"  wrote:
> 
>> On 23.05.2016 17:55, Traynor, Kevin wrote:
>>>> -Original Message-
>>>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>>>> Sent: Tuesday, May 17, 2016 4:09 PM
>>>> To: dev@openvswitch.org; Daniele Di Proietto 
>>>> Cc: Dyasly Sergey ; Heetae Ahn
>>>> ; Flavio Leitner ;
>>>> Traynor, Kevin ; Pravin B Shelar
>>>> ; Ilya Maximets 
>>>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
>>>> __netdev_dpdk_vhost_send().
>>>>
>>>> There are situations when PMD thread can hang forever inside
>>>> __netdev_dpdk_vhost_send() because of broken virtqueue ring.
>>>>
>>>> This happens if rte_vring_available_entries() always positive and
>>>> rte_vhost_enqueue_burst() can't send anything (possible with broken
>>>> ring).
>>>>
>>>> In this case time expiration will be never checked and 'do {} while
>>>> (cnt)'
>>>> loop will be infinite.
>>>>
>>>> This scenario sometimes reproducible with dpdk-16.04-rc2 inside guest
>>>> VM.
>>>> Also it may be reproduced by manual braking of ring structure inside
>>>> the guest VM.
>>>>
>>>> Fix that by checking time expiration even if we have available
>>>> entries.
>>>
>>> Hi Ilya,
>>
>> Hi, Kevin.
>>
>> Christian and Thiago CC-ed, because, I think, they're faced with similar 
>> issue.
>>
>>>
>>> Thanks for catching this. This intersects with something else I've seen
>>> wrt retry code and there's a few options...
>>>
>>> 1. Remove retries when nothing sent. For the VM that needs retries it is a
>>> good thing to have, but Bhanu and I saw in a test with multiple VM's 
>>> recently
>>> that if one VM causes a lot of retries there is a large performance 
>>> degradation
>>> for the other VM's. So I changed the retry to only occur when at least one 
>>> packet
>>> has been sent on the previous call. I put a patch up here.
>>> http://openvswitch.org/pipermail/dev/2016-May/071517.html
>>>
>>> If we keep retries we can either
>>>
>>> 2. Make more robust coordination between rte_ring_available_entries() and
>>> rte_vhost_enqueue_burst(), as per your patch.
>>>
>>> 3. As you've shown that we can't rely on the rte_ring_available_entries() 
>>> to know we
>>> can enqueue, how about just remove it and use rte_vhost_enqueue_burst() 
>>> directly
>>> in the retry loop.
>>>
>>> My preference would be for 1. because on balance I'd rather one VM did not 
>>> degrade
>>> performance of others, more than I'd like it to have retries. Of course 
>>> there could
>>> be some compromise between them as well i.e. reduce amount of retries, but 
>>> any retries
>>> could affect performance for another path if they are using the same core.
>>>
>>> What do you think?
>>
>> I'm worry about scenarios with "pulsing" traffic, i.e. if we have not very 
>> big but
>> enough amount of packets to overload vring in a short time and long period 
>> of silence
>> after that. HW can keep in its RX queues much more packets than can be 
>> pushed to
>> virtio ring. In this scenario, without retrying, most of packets will be 
>> dropped.
>>
>> How about just decreasing of VHOST_ENQ_RETRY_USECS to, may be, 1 usec with 
>> my fix
>> applied of course? Such interval should be enough to handle 20G traffic with 
>> 64B
>> packets by one PMD thread. And, also, this timeout may be applied to both 
>> cases
>> (something sent or not) to decrease cost of retrying.
>>
>> Best regards, Ilya Maximets.
> 
> I haven't done any performance testing with many vms, but ...
> 
> I think the retry logic was introduced because at the time NETDEV_MAX_BURST
> was 192 and we felt that batches of 192 packets could easily overload the
> guest ring.
> 
> Now that NETDEV_MAX_BURST is only 32, I agree with Kevin on applying solution 
> 1.
> Retries can degrade performance for other vms, which IMHO is worse than
> dropping packets destined for a "slow" receiver vm.
> 
> Thanks,
> 
> Daniele

OK. I agree that removing the timeout is reasonable.
Another thing: the implementation provided by Kevin allows a situation (in some
corner case) where rte_vhost_enqueue_burst() will be called 32 times in a row.
How much time will that take? Maybe we should limit the number of retries or the
time spent inside __netdev_dpdk_vhost_send()?

What do you think?

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

2016-06-02 Thread Ilya Maximets
Hi, Daniele.
Thanks for the review.

On 02.06.2016 04:33, Daniele Di Proietto wrote:
> Hi Ilya,
> 
> apologies for the delay.
> 
> I didn't take a extremely detailed look at this series, but I have
> a few high level comments.
> 
> Thanks for adding a command to configure the rxq affinity.  Have
> you thought about using the database instead?  I think it will
> be easier to use because it survives restarts, and one can batch
> the affinity assignment for multiple ports without explicitly
> calling pmd-reconfigure.  I'm not sure what the best interface
> would look like. Perhaps a string in Interface:other_config that
> maps rxqs with core ids?
>
> I'd prefer to avoid exporting an explicit command like
> dpif-netdev/pmd-reconfigure.  If we use the database we don't have to,
> right?

I thought about a solution with the database. Actually, I can't see a big
difference between the database and appctl in this case. For automatic
usage both commands may be scripted, but for manual pinning both
approaches are equally uncomfortable.
IMHO, if it is the database, it shouldn't be a per-'Interface'
string with a mapping, because one mapping influences other ports
(core isolation). There is also an issue with synchronization with
'pmd-cpu-mask' that would have to be performed manually anyway.
The appctl command may be changed to receive a string of all mappings and
to trigger reconfiguration. In that case there would be no need for an
explicit 'dpif-netdev/pmd-reconfigure'.

 
> I'm not sure what's the best way to introduce XPS in OVS.  First of all,
> for physical NICs I'd want to keep the current model where possible
> (one queue per core, no locking).  Maybe we can introduce some mapping
> just for vhost-user ports in netdev-dpdk to implement packet steering?

We can just set the default values for 'n_txq' more accurately: 'n_cores() + 1'
for phy ports and '1' for virtual ones. To avoid locking on TX to phy ports
we may just check that 'n_txq' >= 'n_cores() + 1'. We can do that because
reconfiguration is required to change 'n_txq' and, in the current implementation
of XPS, one queue will not be used twice while there are unused queues.
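
Something like this, just as a standalone sketch to illustrate the idea
(names here are illustrative, not the actual OVS code):

    #include <stdbool.h>
    #include <stdio.h>

    enum dev_type { DEV_ETH, DEV_VHOST };

    /* Default number of tx queues: one per PMD core plus one for
     * non-PMD threads on physical ports; a single queue for virtual. */
    static int
    default_n_txq(enum dev_type type, int n_cores)
    {
        return type == DEV_ETH ? n_cores + 1 : 1;
    }

    /* Locking on TX may be skipped only while every sending thread
     * can own its own queue. */
    static bool
    txq_needs_locking(int n_txq, int n_cores)
    {
        return n_txq < n_cores + 1;
    }

    int
    main(void)
    {
        int n_cores = 6;

        printf("phy:   n_txq=%d locking=%d\n",
               default_n_txq(DEV_ETH, n_cores),
               txq_needs_locking(default_n_txq(DEV_ETH, n_cores), n_cores));
        printf("vhost: n_txq=%d locking=%d\n",
               default_n_txq(DEV_VHOST, n_cores),
               txq_needs_locking(default_n_txq(DEV_VHOST, n_cores), n_cores));
        return 0;
    }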

> Also, have you considered packet reordering issues?  If a 5-tuple is
> processed by one core and we often change the tx queue, we end up
> with the same 5-tuple on two different tx queues.

To avoid reordering issues there is a timeout mechanism inside XPS.
The current tx queue-id may be changed only if there were no packets sent to
that queue for a significant amount of time (XPS_CYCLES).
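
Roughly (a standalone sketch of the idea, not the actual patch code; the
threshold value is illustrative):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define XPS_CYCLES_SKETCH 100000000ULL  /* illustrative idle threshold */

    struct xps_tx_port_sketch {
        int qid;              /* currently assigned tx queue-id, -1 if none */
        uint64_t last_used;   /* cycle counter at the last send to 'qid' */
    };

    /* The cached queue-id may be dropped and re-picked only after the port
     * has been idle long enough that reordering is not a concern. */
    static bool
    xps_qid_may_change(const struct xps_tx_port_sketch *p, uint64_t now)
    {
        return p->qid == -1 || now - p->last_used > XPS_CYCLES_SKETCH;
    }

    int
    main(void)
    {
        struct xps_tx_port_sketch p = { .qid = 2, .last_used = 1000 };

        printf("%d\n", xps_qid_may_change(&p, 1000 + XPS_CYCLES_SKETCH + 1)); /* 1 */
        printf("%d\n", xps_qid_may_change(&p, 1000 + 10));                    /* 0 */
        return 0;
    }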

> Lastly I'm not 100% sure we need a "n-txq" parameter.  I think that
> for vhost-user ports, we know the value in new_device() (we should
> do that for rxqs too now that we have netdev_reconfigure).  physical
> NICs txqs should match the cpu cores and avoid locking, where
> possible.

Actually, we don't need this if the default values are set as described
above and a reconfiguration request is issued inside 'new_device()'
(I guess this should be implemented separately).
But it's harmless and, actually, almost free to implement. Maybe
it can be used for some specific cases or devices.

Best regards, Ilya Maximets.

> On 24/05/2016 06:34, "Ilya Maximets"  wrote:
> 
>> Manual pinning of RX queues to PMD threads required for performance
>> optimisation. This will give to user ability to achieve max. performance
>> using less number of CPUs because currently only user may know which
>> ports are heavy loaded and which are not.
>>
>> To give full controll on ports TX queue manipulation mechanisms also
>> required. For example, to avoid issue described in 'dpif-netdev: XPS
>> (Transmit Packet Steering) implementation.' which becomes worse with
>> ability of manual pinning.
>> ( http://openvswitch.org/pipermail/dev/2016-March/067152.html )
>>
>> First 3 patches: prerequisites to XPS implementation.
>> Patch #4: XPS implementation.
>> Patches #5 and #6: Manual pinning implementation.
>>
>> Version 2:
>>  * Rebased on current master.
>>  * Fixed initialization of newly allocated memory in
>>'port_reconfigure()'.
>>
>> Ilya Maximets (6):
>>  netdev-dpdk: Use instant sending instead of queueing of packets.
>>  dpif-netdev: Allow configuration of number of tx queues.
>>  netdev-dpdk: Mandatory locking of TX queues.
>>  dpif-netdev: XPS (Transmit Packet Steering) implementation.
>>  dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.
>>  dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.
>>
>> INSTALL.DPDK.md|  44 +++--
>> NEWS   |   

Re: [ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

2016-06-03 Thread Ilya Maximets
One addition below.

On 02.06.2016 16:55, Ilya Maximets wrote:
> Hi, Daniele.
> Thanks for review.
> 
> On 02.06.2016 04:33, Daniele Di Proietto wrote:
>> Hi Ilya,
>>
>> apologies for the delay.
>>
>> I didn't take a extremely detailed look at this series, but I have
>> a few high level comments.
>>
>> Thanks for adding a command to configure the rxq affinity.  Have
>> you thought about using the database instead?  I think it will
>> be easier to use because it survives restarts, and one can batch
>> the affinity assignment for multiple ports without explicitly
>> calling pmd-reconfigure.  I'm not sure what the best interface
>> would look like. Perhaps a string in Interface:other_config that
>> maps rxqs with core ids?
>>
>> I'd prefer to avoid exporting an explicit command like
>> dpif-netdev/pmd-reconfigure.  If we use the database we don't have to,
>> right?
> 
> I thought about solution with database. Actually, I can't see big
> difference between database and appctl in this case. For automatic
> usage both commands may be scripted, but for manual pinning this
> approaches equally uncomfortable.
> IMHO, if it will be database it shouldn't be a per 'Interface'
> string with mapping, because one map influences on other ports
> (core isolation). Also there is an issue with synchronization with
> 'pmd-cpu-mask' that should be performed manually anyway.
> appctl command may be changed to receive string of all mappings and
> trigger reconfiguration. In this case there will be no need to have
> explicit 'dpif-netdev/pmd-reconfigure'.
> 
>  
>> I'm not sure what's the best way to introduce XPS in OVS.  First of all,
>> for physical NICs I'd want to keep the current model where possible
>> (one queue per core, no locking).  Maybe we can introduce some mapping
>> just for vhost-user ports in netdev-dpdk to implement packet steering?
> 
> We can just set default values for 'n_txq' more accurately: 'n_cores() + 1'
> for phy ports and '1' for virtual. To avoid locking on TX to phy ports
> we may just check that 'n_txq' >= 'n_cores() + 1'. We can do that because
> reconfiguration required to change 'n_txq' and, in current implementation
> of XPS, one queue will not be used twice if we have unused queues.
> 
>> Also, have you considered packet reordering issues?  If a 5-tuple is
>> processed by one core and we often change the tx queue, we end up
>> with the same 5-tuple on two different tx queues.
> 
> To avoid reordering issues there is a timeout mechanism inside XPS.
> Current tx queue_id may be changed only if there was no packets to
> this queue for a significant amount of time (XPS_CYCLES).
> 
>> Lastly I'm not 100% sure we need a "n-txq" parameter.  I think that
>> for vhost-user ports, we know the value in new_device() (we should
>> do that for rxqs too now that we have netdev_reconfigure).  physical
>> NICs txqs should match the cpu cores and avoid locking, where
>> possible.
> 
> Actually, we don't need this if default values will be set as described
> above and reconfiguration request will be used inside 'new_device()'
> (I guess, this should be implemented separately).
> But it's harmless and, actually, almost free in implementation. May be
> it can be used for some specific cases or devices.

For example, it may be used to test multiqueue with 'dummy-pmd' from
the 'PMD Testsuite' patch-set:
http://openvswitch.org/pipermail/dev/2016-May/071815.html

> 
> Best regards, Ilya Maximets.
> 
>> On 24/05/2016 06:34, "Ilya Maximets"  wrote:
>>
>>> Manual pinning of RX queues to PMD threads required for performance
>>> optimisation. This will give to user ability to achieve max. performance
>>> using less number of CPUs because currently only user may know which
>>> ports are heavy loaded and which are not.
>>>
>>> To give full controll on ports TX queue manipulation mechanisms also
>>> required. For example, to avoid issue described in 'dpif-netdev: XPS
>>> (Transmit Packet Steering) implementation.' which becomes worse with
>>> ability of manual pinning.
>>> ( http://openvswitch.org/pipermail/dev/2016-March/067152.html )
>>>
>>> First 3 patches: prerequisites to XPS implementation.
>>> Patch #4: XPS implementation.
>>> Patches #5 and #6: Manual pinning implementation.
>>>
>>> Version 2:
>>> * Rebased on current master.
>>> * Fixed init

Re: [ovs-dev] [PATCH 5/5] tests: Allow extra cmd line args to OVS_VSWITCHD_START.

2016-06-07 Thread Ilya Maximets
For the series:
Acked-by: Ilya Maximets 

One little issue is that initialization of dummy-numa usually
occurs before the log file is enabled. This means that the information
about discovered cores and NUMA nodes is not available in ovs-vswitchd.log.
But at testing time these log messages are available on stderr,
so it doesn't really matter.

Best regards, Ilya Maximets.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 0/3] PMD Testsuite.

2016-06-07 Thread Ilya Maximets
A bunch of PMD-specific tests. Some already existing tests are
reused to run with 'dummy-pmd' interfaces.

Version 3:
* Rebased on top of 'dummy-numa' patch-set:
  http://openvswitch.org/pipermail/dev/2016-June/072277.html
* --dummy-numa option used for all pmd tests.
  This should fix issues with non-Linux platforms and Travis-CI.
* Skipped already applied patches.

Version 2:
* 'dummy-pmd' implemented as a separate netdev_class.
  'dummy' and 'dummy-pmd' available at the same time.
* Proper multiqueue support implemented.
* Only a few tests restarted with dummy-pmd.
  Restarting implemented similarly to the python2/3 case.
    * Rebased on current master.

Ilya Maximets (3):
  dpif-netdev.at: Run tests with dummy-pmd.
  ofproto-dpif.at: Run tests with dummy-pmd.
  testsuite: Add PMD specific tests.

 tests/automake.mk   |   1 +
 tests/dpif-netdev.at| 147 --
 tests/ofproto-dpif.at   | 157 +++--
 tests/ofproto-macros.at |  31 ++---
 tests/pmd.at| 182 
 tests/testsuite.at  |   1 +
 6 files changed, 378 insertions(+), 141 deletions(-)
 create mode 100644 tests/pmd.at

-- 
2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 1/3] dpif-netdev.at: Run tests with dummy-pmd.

2016-06-07 Thread Ilya Maximets
Signed-off-by: Ilya Maximets 
---
 tests/dpif-netdev.at | 147 ---
 1 file changed, 80 insertions(+), 67 deletions(-)

diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index 12468f4..ab83634 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -40,110 +40,123 @@ strip_metadata () {
 ]
 m4_divert_pop([PREPARE_TESTS])
 
-AT_SETUP([dpif-netdev - dummy interface])
-# Create br0 with interfaces p1 and p7
-#and br1 with interfaces p2 and p8
-# with p1 and p2 connected via unix domain socket
-OVS_VSWITCHD_START(
-  [add-port br0 p1 -- set interface p1 type=dummy 
options:pstream=punix:$OVS_RUNDIR/p0.sock ofport_request=1 -- \
-   add-port br0 p7 -- set interface p7 ofport_request=7 type=dummy -- \
-   add-br br1 -- \
-   set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \
-   set bridge br1 datapath-type=dummy other-config:datapath-id=1234 \
-  fail-mode=secure -- \
-   add-port br1 p2 -- set interface p2 type=dummy 
options:stream=unix:$OVS_RUNDIR/p0.sock ofport_request=2 -- \
-   add-port br1 p8 -- set interface p8 ofport_request=8 type=dummy --])
-AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
-
-AT_CHECK([ovs-ofctl add-flow br0 action=normal])
-AT_CHECK([ovs-ofctl add-flow br1 action=normal])
-ovs-appctl time/stop
-ovs-appctl time/warp 5000
-AT_CHECK([ovs-appctl netdev-dummy/receive p7 
'in_port(7),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-AT_CHECK([ovs-appctl netdev-dummy/receive p8 
'in_port(8),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.3,dst=10.0.0.4,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-ovs-appctl time/warp 100
-sleep 1  # wait for forwarders process packets
-
-AT_CHECK([filter_flow_install < ovs-vswitchd.log | strip_xout], [0], [dnl
+m4_define([DPIF_NETDEV_DUMMY_IFACE],
+  [AT_SETUP([dpif-netdev - $1 interface])
+   # Create br0 with interfaces p1 and p7
+   #and br1 with interfaces p2 and p8
+   # with p1 and p2 connected via unix domain socket
+   OVS_VSWITCHD_START(
+ [add-port br0 p1 -- set interface p1 type=$1 
options:pstream=punix:$OVS_RUNDIR/p0.sock ofport_request=1 -- \
+  add-port br0 p7 -- set interface p7 ofport_request=7 type=$1 -- \
+  add-br br1 -- \
+  set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \
+  set bridge br1 datapath-type=dummy other-config:datapath-id=1234 \
+ fail-mode=secure -- \
+  add-port br1 p2 -- set interface p2 type=$1 
options:stream=unix:$OVS_RUNDIR/p0.sock ofport_request=2 -- \
+  add-port br1 p8 -- set interface p8 ofport_request=8 type=$1 --], [], [],
+  [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
+   AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+
+   AT_CHECK([ovs-ofctl add-flow br0 action=normal])
+   AT_CHECK([ovs-ofctl add-flow br1 action=normal])
+   ovs-appctl time/stop
+   ovs-appctl time/warp 5000
+   AT_CHECK([ovs-appctl netdev-dummy/receive p7 
'in_port(7),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+   AT_CHECK([ovs-appctl netdev-dummy/receive p8 
'in_port(8),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.3,dst=10.0.0.4,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+   ovs-appctl time/warp 100
+   sleep 1  # wait for forwarders process packets
+
+   AT_CHECK([filter_flow_install < ovs-vswitchd.log | strip_xout], [0], [dnl
 
recirc_id=0,ip,in_port=1,vlan_tci=0x/0x1fff,dl_src=50:54:00:00:00:0b,dl_dst=50:54:00:00:00:0c,nw_frag=no,
 actions: 
 
recirc_id=0,ip,in_port=2,vlan_tci=0x/0x1fff,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_frag=no,
 actions: 
 
recirc_id=0,ip,in_port=7,vlan_tci=0x/0x1fff,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_frag=no,
 actions: 
 
recirc_id=0,ip,in_port=8,vlan_tci=0x/0x1fff,dl_src=50:54:00:00:00:0b,dl_dst=50:54:00:00:00:0c,nw_frag=no,
 actions: 
 ])
 
-OVS_VSWITCHD_STOP
-AT_CLEANUP
+   OVS_VSWITCHD_STOP
+   AT_CLEANUP])
 
-AT_SETUP([dpif-netdev - miss upcall key matches flow_install])
-OVS_VSWITCHD_START(
-  [add-port br0 p1 -- set interface p1 type=dummy 
options:pstream=punix:$OVS_RUNDIR/p0.sock
-   set bridge br0 datapath-type=dummy other-config:datapath-id=1234 \
-  fail-mode=secure])
-AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+DPIF_NETDEV_DUMMY_IFACE([dummy])
+DPIF_NETDEV_DUMMY_IFACE([dummy-pmd])
 
-AT_CHECK([ovs-ofctl add-flow br0 action=normal])
-AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-sleep 1
+m4_define([DPIF_NETDEV_MISS_

[ovs-dev] [PATCH v3 3/3] testsuite: Add PMD specific tests.

2016-06-07 Thread Ilya Maximets
Signed-off-by: Ilya Maximets 
---
 tests/automake.mk  |   1 +
 tests/pmd.at   | 182 +
 tests/testsuite.at |   1 +
 3 files changed, 184 insertions(+)
 create mode 100644 tests/pmd.at

diff --git a/tests/automake.mk b/tests/automake.mk
index 7af7f69..777f6db 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -48,6 +48,7 @@ TESTSUITE_AT = \
tests/json.at \
tests/jsonrpc.at \
tests/jsonrpc-py.at \
+   tests/pmd.at \
tests/tunnel.at \
tests/tunnel-push-pop.at \
tests/tunnel-push-pop-ipv6.at \
diff --git a/tests/pmd.at b/tests/pmd.at
new file mode 100644
index 000..5ca9323
--- /dev/null
+++ b/tests/pmd.at
@@ -0,0 +1,182 @@
+AT_BANNER([PMD])
+
+dnl CHECK_CPU_DISCOVERED([n_cpu])
+dnl
+dnl Waits until CPUs discovered and checks if number of discovered CPUs
+dnl is greater or equal to 'n_cpu'. Without parameters checks that at
+dnl least one CPU discovered.
+m4_define([CHECK_CPU_DISCOVERED], [
+PATTERN="Discovered [[0-9]]* NUMA nodes and [[0-9]]* CPU cores"
+OVS_WAIT_UNTIL([grep "$PATTERN" stderr])
+N_CPU=$(grep "$PATTERN" stderr | sed -e 's/.* \([[0-9]]*\) CPU cores/\1/')
+if [[ -z "$1" ]]
+then AT_CHECK([test "$N_CPU" -gt "0"])
+else AT_SKIP_IF([test "$N_CPU" -lt "$1"])
+fi
+])
+
+dnl CHECK_PMD_THREADS_CREATED([n_threads], [numa_id], [+line])
+dnl
+dnl Waits for creation of 'n_threads' threads, or at least 1 thread if $1 not
+dnl passed. Checking starts from line number 'line' in ovs-vswitchd.log.
+m4_define([CHECK_PMD_THREADS_CREATED], [
+PATTERN="Created [[0-9]]* pmd threads on numa node $2"
+line_st=$3
+if [[ -z "$line_st" ]]
+then
+line_st="+0"
+fi
+OVS_WAIT_UNTIL([tail -n $line_st ovs-vswitchd.log | grep "$PATTERN"])
+N_THREADS=$(tail -n $line_st ovs-vswitchd.log | grep "$PATTERN" | tail -1 
| sed -e 's/.* \([[0-9]]*\) pmd .*/\1/')
+if [[ -z "$1" ]]
+then AT_CHECK([test "$N_THREADS" -gt 0])
+else AT_CHECK([test "$N_THREADS" -eq "$1"])
+fi
+])
+
+m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/"])
+m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0,1,1,1,1"])
+
+AT_SETUP([PMD - creating a thread/add-port])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd], [], 
[], [DUMMY_NUMA])
+
+CHECK_CPU_DISCOVERED()
+CHECK_PMD_THREADS_CREATED()
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], 
[0], [dnl
+pmd thread numa_id  core_id :
+   port: p0queue-id: 0
+])
+
+AT_CHECK([ovs-appctl dpif/show | sed 
's/\(tx_queues=\)[[0-9]]*/\1/g'], [0], [dnl
+dummy@ovs-dummy: hit:0 missed:0
+   br0:
+   br0 65534/100: (dummy)
+   p0 1/1: (dummy-pmd: configured_rx_queues=1, 
configured_tx_queues=, requested_rx_queues=1, 
requested_tx_queues=)
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([PMD - multiqueue support])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd], [], 
[], [DUMMY_NUMA])
+
+CHECK_CPU_DISCOVERED()
+CHECK_PMD_THREADS_CREATED()
+
+AT_CHECK([ovs-vsctl set interface p0 options:n_rxq=8])
+
+AT_CHECK([ovs-appctl dpif/show | sed 
's/\(tx_queues=\)[[0-9]]*/\1/g'], [0], [dnl
+dummy@ovs-dummy: hit:0 missed:0
+   br0:
+   br0 65534/100: (dummy)
+   p0 1/1: (dummy-pmd: configured_rx_queues=8, 
configured_tx_queues=, requested_rx_queues=8, 
requested_tx_queues=)
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], 
[0], [dnl
+pmd thread numa_id  core_id :
+   port: p0queue-id: 0 1 2 3 4 5 6 7
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+
+AT_SETUP([PMD - pmd-cpu-mask/distribution of rx queues])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd 
options:n_rxq=8],
+   [], [], [DUMMY_NUMA])
+
+CHECK_CPU_DISCOVERED(2)
+CHECK_PMD_THREADS_CREATED()
+
+AT_CHECK([ovs-appctl dpif/show | sed 
's/\(tx_queues=\)[[0-9]]*/\1/g'], [0], [dnl
+dummy@ovs-dummy: hit:0 missed:0
+   br0:
+   br0 65534/100: (dummy)
+   p0 1/1: (dummy-pmd: configured_rx_queues=8, 
configured_tx_queues=, requested_rx_queues=8, 
requested_tx_queues=)
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], 
[0], [dnl
+pmd thread numa_id  core_id :
+   port: p0queue-id: 0 1 2 3 4 5 6 7
+])
+
+TMP=$(cat ovs-vswitchd.log | wc -l)
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=3])
+CHECK_PMD_THREADS_CREATED([2], [], [+$TMP])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], 
[0], [dnl
+pmd thread numa_id  core_id :
+   port: p0qu

[ovs-dev] [PATCH v3 2/3] ofproto-dpif.at: Run tests with dummy-pmd.

2016-06-07 Thread Ilya Maximets
Signed-off-by: Ilya Maximets 
---
 tests/ofproto-dpif.at   | 157 
 tests/ofproto-macros.at |  31 +++---
 2 files changed, 114 insertions(+), 74 deletions(-)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index d0aacfa..638d269 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -6007,16 +6007,17 @@ OVS_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([ofproto-dpif - ovs-appctl dpif/show])
-OVS_VSWITCHD_START([add-br br1 -- set bridge br1 datapath-type=dummy])
-add_of_ports br0 1 2
+OVS_VSWITCHD_START([add-br br1 -- set bridge br1 datapath-type=dummy], [], [],
+   [--dummy-numa="0,0,0,0,1,1,1,1"])
+add_pmd_of_ports br0 1 2
 add_of_ports br1 3
 
-AT_CHECK([ovs-appctl dpif/show], [0], [dnl
+AT_CHECK([ovs-appctl dpif/show | sed 's/\(dummy-pmd: \).*)/\1)/'], 
[0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
br0 65534/100: (dummy)
-   p1 1/1: (dummy)
-   p2 2/2: (dummy)
+   p1 1/1: (dummy-pmd: )
+   p2 2/2: (dummy-pmd: )
br1:
br1 65534/101: (dummy)
p3 3/3: (dummy)
@@ -6028,8 +6029,10 @@ AT_SETUP([ofproto-dpif - ovs-appctl dpif/dump-flows])
 # bump max-idle to avoid the flows being reclaimed behind us
 OVS_VSWITCHD_START([add-br br1 -- \
 set bridge br1 datapath-type=dummy fail-mode=secure -- \
-set Open_vSwitch . other_config:max-idle=1])
-add_of_ports br0 1 2
+set Open_vSwitch . other_config:max-idle=1], [], [],
+[--dummy-numa="0,0,0,0,1,1,1,1"])
+add_of_ports br0 1
+add_pmd_of_ports br0 2
 add_of_ports br1 3
 
 AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
@@ -6057,24 +6060,31 @@ 
skb_priority(0/0),skb_mark(0/0),recirc_id(0),dp_hash(0/0),in_port(p3),eth(src=50
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
-AT_SETUP([ofproto-dpif - ovs-appctl dpif/get-flow])
+m4_define([OFPROTO_DPIF_GET_FLOW],
+  [AT_SETUP([ofproto-dpif - ovs-appctl dpif/get-flow$1])
 
-OVS_VSWITCHD_START([add-br br1 -- \
-set bridge br1 datapath-type=dummy fail-mode=secure -- \
-set Open_vSwitch . other_config:max-idle=1])
-add_of_ports br0 1 2
+   OVS_VSWITCHD_START([add-br br1 -- \
+   set bridge br1 datapath-type=dummy fail-mode=secure -- \
+   set Open_vSwitch . other_config:max-idle=1], [], [],
+   [m4_if([$1], [], [], [--dummy-numa="0,0,0,0,1,1,1,1"])])
 
-AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-ovs-appctl revalidator/wait
-AT_CHECK([ovs-appctl dpif/dump-flows -m br0], [0], [stdout])
+   func=`echo -n "$1_" | cut -c 4-`
+   add_${func}of_ports br0 1 2
+
+   AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+   ovs-appctl revalidator/wait
+   AT_CHECK([ovs-appctl dpif/dump-flows -m br0], [0], [stdout])
 
-UFID=`sed -n 's/\(ufid:[[-0-9a-fA-F]]*\).*/\1/p' stdout`
-AT_CHECK([ovs-appctl dpctl/get-flow $UFID], [0], [dnl
+   UFID=`sed -n 's/\(ufid:[[-0-9a-fA-F]]*\).*/\1/p' stdout`
+   AT_CHECK([ovs-appctl dpctl/get-flow $UFID], [0], [dnl
 recirc_id(0),in_port(1),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, 
used:never, actions:drop
 ])
 
-OVS_VSWITCHD_STOP
-AT_CLEANUP
+   OVS_VSWITCHD_STOP
+   AT_CLEANUP])
+
+OFPROTO_DPIF_GET_FLOW([])
+OFPROTO_DPIF_GET_FLOW([ - pmd])
 
 AT_SETUP([ofproto-dpif - MPLS actions that result in a userspace action])
 OVS_VSWITCHD_START([dnl
@@ -6386,20 +6396,25 @@ 
recirc_id=0,icmp,in_port=1,vlan_tci=0x,nw_frag=no,icmp_type=0x8/0xff, action
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
-AT_SETUP([ofproto-dpif megaflow - normal])
-OVS_VSWITCHD_START
-AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
-add_of_ports br0 1 2
-AT_CHECK([ovs-ofctl add-flow br0 action=normal])
-AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800),ipv4(src=10.0.0.4,dst=10.0.0.3,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
-sleep 1
-AT_CHECK([filter_flow_install < ovs-vswitchd.log | strip_xout], [0], [dnl
+m4_define([OFPROTO_DPIF_MEGAFLOW_NORMAL],
+  [AT_SETUP(

Re: [ovs-dev] [PATCH v2 0/7] PMD Testsuite.

2016-06-07 Thread Ilya Maximets
Thanks for all your review and testing.
I've sent a new version of the tests based on your 'dummy-numa' series:
http://openvswitch.org/pipermail/dev/2016-June/072331.html

Actually, I have no environment to test with non-Linux systems and
also no account to test with Travis. So, this series is
only tested on Linux. I hope the new version will work.

Best regards, Ilya Maximets.

On 07.06.2016 04:53, Daniele Di Proietto wrote:
> Hi Ilya,
> 
> thanks for the series, I really appreciate the effort to test
> the pmd threads.
> 
> I reviewed the whole series and it looks good to me, but I found
> some problems with travis and non linux systems.
> 
> 1 On FreeBSD (and I assume also on windows), the pmd tests fail,
>   because ovs_numa_get_n_cores() returns OVS_CORE_UNSPEC and
>   dpif-netdev refuses to add the port.  This could be easily
>   solvable by skipping the tests on windows and BSD with
>   AT_SKIP_IF.
> 
> 2 When I try to run the pmd tests under valgrind, I get a failure,
>   because of a timeout.  I've tried adding a sched_yield() in
>   netdev_dummy_rxq_recv() and the issue seems to be fixed, so maybe
>   we can do that.
> 
> 3 When the tests are run as a non root user, pthread_setaffinity_np()
>   fails, causing the travis build to fail:
>   
>   https://travis-ci.org/ddiproietto/ovs/jobs/135744944#L7953
> 
> My idea to address 1 and 3 (I'm especially concerned about 3) is to have
> a dummy implementation for ovs-numa.  I've attempted that in the past,
> when I tried to work on my dummy-pmd implementation, so I posted a
> series here:
> 
> http://openvswitch.org/pipermail/dev/2016-June/072277.html
> 
> but I'm open to better ideas.  Feel free to ack or include my patches
> in your series (or, if you have better ideas to discard them)
> 
> In the meantime I pushed everything except the tests to master.
> 
> 
> 
> 
> Thanks,
> 
> Daniele
> 
> On 27/05/2016 06:32, "Ilya Maximets"  wrote:
> 
>> New 'dummy-pmd' class created in a purpose of testing of PMD interfaces.
>> Added a bunch of PMD specific tests. Some already existing tests
>> reused to run with 'dummy-pmd' interfaces.
>> 'appctl dpctl/flow-get' implemented for dpif-netdev with PMD threads.
>>
>> Version 2:
>>  * 'dummy-pmd' implemented as a separate netdev_class.
>>'dummy' and 'dummy-pmd' available at the same time.
>>  * Proper multiqueue support implemented.
>>  * Only few tests restarted with dummy-pmd.
>>Restarting implemented similar to python2/3 case.
>>  * Rebased on current master.
>>
>> Ilya Maximets (7):
>>  netdev-dummy: Add dummy-pmd class.
>>  dpif-netdev.at: Run tests with dummy-pmd.
>>  dpctl: Implement dpctl/flow-get for dpif-netdev.
>>  ofproto-dpif.at: Run tests with dummy-pmd.
>>  ovs-vsctl.at: Use OVS_VSCTL_CLEANUP.
>>  netdev-dummy: Add multiqueue support to dummy-pmd.
>>  testsuite-pmd: Add PMD specific tests.
>>
>> lib/dpctl.c |   3 +-
>> lib/dpif-netdev.c   |  49 ++---
>> lib/netdev-dummy.c  | 264 
>> +---
>> tests/automake.mk   |   1 +
>> tests/dpif-netdev.at| 146 ++
>> tests/ofproto-dpif.at   | 149 +++
>> tests/ofproto-macros.at |  31 --
>> tests/ovs-vsctl.at  |   6 +-
>> tests/pmd.at| 179 
>> tests/testsuite.at  |   1 +
>> 10 files changed, 588 insertions(+), 241 deletions(-)
>> create mode 100644 tests/pmd.at
>>
>> -- 
>> 2.5.0
>>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Remove vhost send retries when no packets have been sent.

2016-06-13 Thread Ilya Maximets
Looks good to me.
Acked-by: Ilya Maximets 


On 11.06.2016 02:08, Daniele Di Proietto wrote:
> Thanks for the patch, it looks good to me.
> 
> If everybody agrees (Ilya?) I can push this to master.
> 
> Thanks,
> 
> Daniele
> 
> 2016-06-10 9:49 GMT-07:00 Kevin Traynor  <mailto:kevin.tray...@intel.com>>:
> 
> If the guest is connected but not servicing the virt queue, this leads
> to vhost send retries until timeout. This is fine in isolation but if
> there are other high rate queues also being serviced by the same PMD
> it can lead to a performance hit on those queues. Change to only retry
> when at least some packets have been successfully sent on the previous
> attempt.
> 
> Also, limit retries to avoid a similar delays if packets are being sent
> at a very low rate due to few available descriptors.
> 
> Reported-by: Bhanuprakash Bodireddy  <mailto:bhanuprakash.bodire...@intel.com>>
> Signed-off-by: Kevin Traynor  <mailto:kevin.tray...@intel.com>>
> Acked-by: Bhanuprakash Bodireddy  <mailto:bhanuprakash.bodire...@intel.com>>
> ---
> 
>  RFC->v2
>  - Change to PATCH after ML discussion.
>  - Rebase.
>  - Add retry limit when packets are being sent.
>  - Add Ack from Bhanu.
> 
>  lib/netdev-dpdk.c |   34 ++
>  1 files changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 19d355f..582569c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -141,10 +141,7 @@ static char *cuse_dev_name = NULL;/* Character 
> device cuse_dev_name. */
>  #endif
>  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets 
> */
> 
> -/*
> - * Maximum amount of time in micro seconds to try and enqueue to vhost.
> - */
> -#define VHOST_ENQ_RETRY_USECS 100
> +#define VHOST_ENQ_RETRY_NUM 8
> 
>  static const struct rte_eth_conf port_conf = {
>  .rxmode = {
> @@ -1377,7 +1374,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int 
> qid,
>  struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>  unsigned int total_pkts = cnt;
>  unsigned int qos_pkts = cnt;
> -uint64_t start = 0;
> +int retries = 0;
> 
>  qid = dev->tx_q[qid % dev->real_n_txq].map;
> 
> @@ -1404,32 +1401,13 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, 
> int qid,
>  if (OVS_LIKELY(tx_pkts)) {
>  /* Packets have been sent.*/
>  cnt -= tx_pkts;
> -/* Prepare for possible next iteration.*/
> +/* Prepare for possible retry.*/
>  cur_pkts = &cur_pkts[tx_pkts];
>  } else {
> -uint64_t timeout = VHOST_ENQ_RETRY_USECS * 
> rte_get_timer_hz() / 1E6;
> -unsigned int expired = 0;
> -
> -if (!start) {
> -start = rte_get_timer_cycles();
> -}
> -
> -/*
> - * Unable to enqueue packets to vhost interface.
> - * Check available entries before retrying.
> - */
> -while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
> -if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > 
> timeout)) {
> -expired = 1;
> -break;
> -}
> -}
> -if (expired) {
> -/* break out of main loop. */
> -break;
> -}
> +/* No packets sent - do not retry.*/
> +break;
>  }
> -} while (cnt);
> +} while (cnt && (retries++ < VHOST_ENQ_RETRY_NUM));
> 
>  rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
> 
> --
> 1.7.4.1
> 
> ___
> dev mailing list
> dev@openvswitch.org <mailto:dev@openvswitch.org>
> http://openvswitch.org/mailman/listinfo/dev
> 
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

2016-06-20 Thread Ilya Maximets
On 11.06.2016 02:53, Daniele Di Proietto wrote:
> On 02/06/2016 06:55, "Ilya Maximets"  wrote:
> 
>> Hi, Daniele.
>> Thanks for review.
>>
>> On 02.06.2016 04:33, Daniele Di Proietto wrote:
>>> Hi Ilya,
>>>
>>> apologies for the delay.
>>>
>>> I didn't take a extremely detailed look at this series, but I have
>>> a few high level comments.
>>>
>>> Thanks for adding a command to configure the rxq affinity.  Have
>>> you thought about using the database instead?  I think it will
>>> be easier to use because it survives restarts, and one can batch
>>> the affinity assignment for multiple ports without explicitly
>>> calling pmd-reconfigure.  I'm not sure what the best interface
>>> would look like. Perhaps a string in Interface:other_config that
>>> maps rxqs with core ids?
>>>
>>> I'd prefer to avoid exporting an explicit command like
>>> dpif-netdev/pmd-reconfigure.  If we use the database we don't have to,
>>> right?
>>
>> I thought about solution with database. Actually, I can't see big
>> difference between database and appctl in this case. For automatic
>> usage both commands may be scripted, but for manual pinning this
>> approaches equally uncomfortable.
>> IMHO, if it will be database it shouldn't be a per 'Interface'
>> string with mapping, because one map influences on other ports
>> (core isolation). Also there is an issue with synchronization with
>> 'pmd-cpu-mask' that should be performed manually anyway.
>> appctl command may be changed to receive string of all mappings and
>> trigger reconfiguration. In this case there will be no need to have
>> explicit 'dpif-netdev/pmd-reconfigure'.
> 
> Do we really need to implement core isolation? I'd prefer an interface where
> if an interface has an affinity we enforce that (as far as we can with the
> current pmd-cpu-mask), and for other interfaces we keep the current model.
> Probably there are some limitation I'm not seeing with this model.

Generally, core isolation prevents polling of other ports on a PMD thread.
This is useful to keep a constant polling rate on some performance-critical
port while adding/deleting other ports. Without isolation
we would need to pin literally all ports to achieve the desired level of performance.

> I'd prefer to keep the mapping in the database because it's more in line
> with the rest of OVS configuration.  The database survives crashes, restarts
> and reboots.

Ok. How about something like this:

* Per-port database entry for available core-ids:

  # ovs-vsctl set interface <iface> \
      other_config:pmd-rxq-affinity=<rxq-affinity-list>

  where:
   <rxq-affinity-list> ::= NULL | <non-empty-list>
   <non-empty-list> ::= <affinity-pair> |
                        <affinity-pair> , <non-empty-list>
   <affinity-pair> ::= <queue-id> : <core-id>

  Example:

  # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
other_config:pmd-rxq-affinity="0:3,1:7,3:8"
  Queue #0 pinned to core 3;
  Queue #1 pinned to core 7;
  Queue #2 not pinned;
  Queue #3 pinned to core 8.

* Configurable mask of isolated PMD threads:

  # ./bin/ovs-vsctl set Open_vSwitch . \
      other_config:pmd-isol-cpus=<cpu-mask>
  Empty means "none".

* A pinned RX queue can be polled only by the PMD thread on the core to
  which it is pinned.

* Only pinned RX queues can be polled by an isolated PMD thread.
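
  As a purely hypothetical illustration of how these rules combine, take the
  dpdk0 example above with cores 3, 7 and 8 marked as isolated via
  'pmd-isol-cpus':

  * cores 3, 7 and 8 poll only the queues pinned to them: queue #0 on
    core 3, queue #1 on core 7, queue #3 on core 8;
  * queue #2 has no affinity, so it is distributed among the remaining,
    non-isolated PMD threads from 'pmd-cpu-mask'.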

>>
>>> I'm not sure what's the best way to introduce XPS in OVS.  First of all,
>>> for physical NICs I'd want to keep the current model where possible
>>> (one queue per core, no locking).  Maybe we can introduce some mapping
>>> just for vhost-user ports in netdev-dpdk to implement packet steering?
>>
>> We can just set default values for 'n_txq' more accurately: 'n_cores() + 1'
>> for phy ports and '1' for virtual. To avoid locking on TX to phy ports
>> we may just check that 'n_txq' >= 'n_cores() + 1'. We can do that because
>> reconfiguration required to change 'n_txq' and, in current implementation
>> of XPS, one queue will not be used twice if we have unused queues.
> 
> Ok, if it can be done without any overhead for phy ports, I'm fine with that.
> 
>>
>>> Also, have you considered packet reordering issues?  If a 5-tuple is
>>> processed by one core and we often change the tx queue, we end up
>>> with the same 5-tuple on two different tx queues.
>>
>> To avoid reordering issues there is 

Re: [ovs-dev] [PATCH RFC 6/6] dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.

2016-06-24 Thread Ilya Maximets
Hi, Ryan.
Thanks for your attention to this series.

I have a rebased and slightly fixed version of this patch-set, but,
actually, I'm waiting for comments about the high-level design in
this thread:
http://openvswitch.org/pipermail/dev/2016-June/073196.html .

Maybe I'll send a new version in a few days.

Best regards, Ilya Maximets.

On 22.06.2016 17:35, Ryan Moats wrote:
> "dev"  wrote on 05/12/2016 08:43:15 AM:
> 
>> From: Ilya Maximets 
>> To: dev@openvswitch.org, Daniele Di Proietto 
>> Cc: Dyasly Sergey , Flavio Leitner
>> , Ilya Maximets , Kevin
>> Traynor 
>> Date: 05/12/2016 08:45 AM
>> Subject: [ovs-dev] [PATCH RFC 6/6] dpif-netdev: Add dpif-netdev/pmd-
>> rxq-set appctl command.
>> Sent by: "dev" 
>>
>> New appctl command to perform manual pinning of RX queues
>> to desired cores.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> This patch also failed to apply completely clean.  While none of the
> rejects are all that terrible, I'm going to ask if a rebase is
> possible?
> 
> Thanks in advance,
> Ryan
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] netdev-dpdk: Fix using uninitialized link_status.

2016-06-24 Thread Ilya Maximets
'rte_eth_link_get_nowait()' works only with physical ports.
In case of a vhost-user port, 'link' will stay uninitialized and there
will be random messages in the log about the link status.

Ex.:
|dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 1 Mbps - full-duplex

Fix that by calling 'check_link_status()' only for physical ports.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index fc0c8d3..12733ad 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -567,7 +567,9 @@ dpdk_watchdog(void *dummy OVS_UNUSED)
 ovs_mutex_lock(&dpdk_mutex);
 LIST_FOR_EACH (dev, list_node, &dpdk_list) {
 ovs_mutex_lock(&dev->mutex);
-check_link_status(dev);
+if (dev->type == DPDK_DEV_ETH) {
+check_link_status(dev);
+}
 ovs_mutex_unlock(&dev->mutex);
 }
 ovs_mutex_unlock(&dpdk_mutex);
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 1/2] netdev-dpdk: Use instant sending instead of queueing of packets.

2016-06-27 Thread Ilya Maximets
The current implementation of TX packet queueing is broken in several ways:

* TX queue flushing implemented on receive assumes that all
  core_id-s are sequential and start from zero. This may lead
  to a situation where packets get stuck in a queue forever and,
  also, it affects latency.

* For a long time the flushing logic has depended on the uninitialized
  'txq_needs_locking', because it is usually calculated after
  'netdev_dpdk_alloc_txq' but used inside that function
  for the initialization of 'flush_tx'.

Testing shows no performance difference with and without queueing.
Let's remove queueing altogether because it doesn't work properly now and
also does not increase performance.

Signed-off-by: Ilya Maximets 
Acked-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 101 --
 1 file changed, 14 insertions(+), 87 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 02e2c58..8bb33d6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -165,7 +165,6 @@ static const struct rte_eth_conf port_conf = {
 },
 };
 
-enum { MAX_TX_QUEUE_LEN = 384 };
 enum { DPDK_RING_SIZE = 256 };
 BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
 enum { DRAIN_TSC = 20ULL };
@@ -282,8 +281,7 @@ static struct ovs_list dpdk_mp_list 
OVS_GUARDED_BY(dpdk_mutex)
 = OVS_LIST_INITIALIZER(&dpdk_mp_list);
 
 /* This mutex must be used by non pmd threads when allocating or freeing
- * mbufs through mempools. Since dpdk_queue_pkts() and dpdk_queue_flush() may
- * use mempools, a non pmd thread should hold this mutex while calling them */
+ * mbufs through mempools. */
 static struct ovs_mutex nonpmd_mempool_mutex = OVS_MUTEX_INITIALIZER;
 
 struct dpdk_mp {
@@ -297,17 +295,12 @@ struct dpdk_mp {
 /* There should be one 'struct dpdk_tx_queue' created for
  * each cpu core. */
 struct dpdk_tx_queue {
-bool flush_tx; /* Set to true to flush queue everytime */
-   /* pkts are queued. */
-int count;
 rte_spinlock_t tx_lock;/* Protects the members and the NIC queue
 * from concurrent access.  It is used only
 * if the queue is shared among different
 * pmd threads (see 'txq_needs_locking'). */
 int map;   /* Mapping of configured vhost-user queues
 * to enabled by guest. */
-uint64_t tsc;
-struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN];
 };
 
 /* dpdk has no way to remove dpdk ring ethernet devices
@@ -720,19 +713,6 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned 
int n_txqs)
 
 dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
 for (i = 0; i < n_txqs; i++) {
-int numa_id = ovs_numa_get_numa_id(i);
-
-if (!dev->txq_needs_locking) {
-/* Each index is considered as a cpu core id, since there should
- * be one tx queue for each cpu core.  If the corresponding core
- * is not on the same numa node as 'dev', flags the
- * 'flush_tx'. */
-dev->tx_q[i].flush_tx = dev->socket_id == numa_id;
-} else {
-/* Queues are shared among CPUs. Always flush */
-dev->tx_q[i].flush_tx = true;
-}
-
 /* Initialize map for vhost devices. */
 dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
 rte_spinlock_init(&dev->tx_q[i].tx_lock);
@@ -1088,16 +1068,15 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 }
 
 static inline void
-dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
+netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
+ struct rte_mbuf **pkts, int cnt)
 {
-struct dpdk_tx_queue *txq = &dev->tx_q[qid];
 uint32_t nb_tx = 0;
 
-while (nb_tx != txq->count) {
+while (nb_tx != cnt) {
 uint32_t ret;
 
-ret = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts + nb_tx,
-   txq->count - nb_tx);
+ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
 if (!ret) {
 break;
 }
@@ -1105,32 +1084,18 @@ dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
 nb_tx += ret;
 }
 
-if (OVS_UNLIKELY(nb_tx != txq->count)) {
+if (OVS_UNLIKELY(nb_tx != cnt)) {
 /* free buffers, which we couldn't transmit, one at a time (each
  * packet could come from a different mempool) */
 int i;
 
-for (i = nb_tx; i < txq->count; i++) {
-rte_pktmbuf_free(txq->burst_pkts[i]);
+for (i = nb_tx; i < cnt; i++) {
+rte_pktmbuf_free(pkts[i]);
 }
 rte_spinlock_lock(&d

[ovs-dev] [PATCH v3 0/2] Instant send + queue number. (First part of XPS patch-set).

2016-06-27 Thread Ilya Maximets
While the API for manual pinning of rx queues is under discussion [1], I
decided to split the whole 'XPS + pinning' patch-set into three parts for
faster review and application.

This is the first part, which contains generic fixes not directly connected
with XPS or manual pinning but required by them.

The first patch is only rebased on top of the current master branch, and the
name of the function is also changed as requested. Daniele, you may still
change the name of 'netdev_dpdk_eth_tx_burst' if you don't like it. The ack
is preserved for now.

The second patch is a new one, which allows reconfiguring the number of queues
at runtime according to the settings from the connected virtio device.

* [1] http://openvswitch.org/pipermail/dev/2016-June/073196.html

Ilya Maximets (2):
  netdev-dpdk: Use instant sending instead of queueing of packets.
  dpif-netdev: Move setting of queue number to netdev layer.

 INSTALL.DPDK.md   |  24 ++---
 NEWS  |   2 +
 lib/dpif-netdev.c |  25 -
 lib/netdev-bsd.c  |   1 -
 lib/netdev-dpdk.c | 261 +++---
 lib/netdev-dummy.c|  31 ++
 lib/netdev-linux.c|   1 -
 lib/netdev-provider.h |  16 
 lib/netdev-vport.c|   1 -
 lib/netdev.c  |  30 --
 lib/netdev.h  |   1 -
 vswitchd/vswitch.xml  |   3 +-
 12 files changed, 98 insertions(+), 298 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 2/2] dpif-netdev: Move setting of queue number to netdev layer.

2016-06-27 Thread Ilya Maximets
Currently, there are a few inconsistencies between the dpif-netdev
and netdev layers:

* dpif-netdev can't know the exact number of tx queues
  allocated inside the netdev.
  This leads to constant mapping of queue-ids to the 'real' ones.

* dpif-netdev is able to change the number of tx queues while
  it knows nothing about the real hardware or the number of queues
  allocated in the VM.
  This leads to complications in the reconfiguration of vhost-user
  ports, because setting 'n_txq' from different sources
  (dpif-netdev and the 'new_device()' call) requires additional
  synchronization between these two layers.

Also: We are able to configure 'n_rxq' for vhost-user devices, but
  there is only one sane number of rx queues, which must be used and
  configured manually (the number of queues allocated in QEMU).

This patch moves all configuration of queues to the netdev layer and disables
configuration of 'n_rxq' for vhost devices.

The configuration of rx and tx queues is now automatically applied from the
connected virtio device. The standard reconfiguration mechanism is used to
apply these changes.

The number of tx queues is set to 'n_cores + 1' by default for physical ports,
and the old 'needs_locking' logic is preserved.

For dummy-pmd ports a new undocumented option, 'n_txq', is introduced to
configure the number of tx queues.

Ex.:
ovs-vsctl set interface dummy-pmd0 options:n_txq=32

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md   |  24 +++-
 NEWS  |   2 +
 lib/dpif-netdev.c |  25 
 lib/netdev-bsd.c  |   1 -
 lib/netdev-dpdk.c | 160 --
 lib/netdev-dummy.c|  31 ++
 lib/netdev-linux.c|   1 -
 lib/netdev-provider.h |  16 -
 lib/netdev-vport.c|   1 -
 lib/netdev.c  |  30 --
 lib/netdev.h  |   1 -
 vswitchd/vswitch.xml  |   3 +-
 12 files changed, 84 insertions(+), 211 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 00e75bd..e6810a5 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -378,12 +378,9 @@ Performance Tuning:
 
`ovs-vsctl set Interface  options:n_rxq=`
 
-   The command above sets the number of rx queues for DPDK interface.
+   The command above sets the number of rx queues for DPDK physical interface.
The rx queues are assigned to pmd threads on the same NUMA node in a
-   round-robin fashion.  For more information, please refer to the
-   Open_vSwitch TABLE section in
-
-   `man ovs-vswitchd.conf.db`
+   round-robin fashion.
 
 4. Exact Match Cache
 
@@ -660,16 +657,8 @@ Follow the steps below to attach vhost-user port(s) to a 
VM.
```
 
 3. Optional: Enable multiqueue support
-   The vhost-user interface must be configured in Open vSwitch with the
-   desired amount of queues with:
-
-   ```
-   ovs-vsctl set Interface vhost-user-2 options:n_rxq=
-   ```
-
-   QEMU needs to be configured as well.
-   The $q below should match the queues requested in OVS (if $q is more,
-   packets will not be received).
+   QEMU needs to be configured to use multiqueue.
+   The $q below is the number of queues.
The $v is the number of vectors, which is '$q x 2 + 2'.
 
```
@@ -678,6 +667,11 @@ Follow the steps below to attach vhost-user port(s) to a 
VM.
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
```
 
+   The vhost-user interface will be automatically reconfigured with required
+   number of rx and tx queues after connection of virtio device.
+   Manual configuration of `n_rxq` is not supported because OVS anyway will
+   work properly only if `n_rxq` will match number of queues configured in 
QEMU.
+
If one wishes to use multiple queues for an interface in the guest, the
driver in the guest operating system must be configured to do so. It is
recommended that the number of queues configured be equal to '$q'.
diff --git a/NEWS b/NEWS
index 7aa050b..6e0dda7 100644
--- a/NEWS
+++ b/NEWS
@@ -34,6 +34,8 @@ Post-v2.5.0
- DPDK:
  * New option "n_rxq" for PMD interfaces.
Old 'other_config:n-dpdk-rxqs' is no longer supported.
+   Not supported by vHost interfaces. For them number of rx and tx queues
+   is applied from connected virtio device.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index ff4227c..9d88b72 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1153,31 +1153,6 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 goto out;
 }
 
-if (netdev_is_pmd(netdev)) {
-int n_cores = ovs_numa_get_n_cores();
-
-if (n_cores == OVS_CORE_UNSPEC) {
-VLOG_ERR

Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Move setting of queue number to netdev layer.

2016-06-28 Thread Ilya Maximets
This incremental fixup adds a few comments and reduces the
number of queues to poll after virtio disconnection by
requesting reconfiguration with 1 queue pair.

---
 lib/dpif-netdev.c |  6 --
 lib/netdev-dpdk.c | 12 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9d88b72..5087223 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -442,8 +442,10 @@ struct dp_netdev_pmd_thread {
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
-atomic_int tx_qid;  /* Queue id used by this pmd thread to
- * send packets on all netdevs */
+
+/* Queue id used by this pmd thread to send packets on all netdevs.
+ * All tx_qid's are unique and less than 'ovs_numa_get_n_cores() + 1'. */
+atomic_int tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7ee7d2f..6687960 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2296,15 +2296,15 @@ destroy_device(volatile struct virtio_net *virtio_dev)
 ovs_mutex_lock(&dev->mutex);
 virtio_dev->flags &= ~VIRTIO_DEV_RUNNING;
 ovsrcu_set(&dev->virtio_dev, NULL);
-/* Clear txq settings. */
+/* Clear tx/rx queue settings. */
 netdev_dpdk_txq_map_clear(dev);
-dev->up.n_rxq = NR_QUEUE;
-dev->up.n_txq = NR_QUEUE;
-dev->requested_n_rxq = dev->up.n_rxq;
-dev->requested_n_txq = dev->up.n_txq;
-exists = true;
+dev->requested_n_rxq = NR_QUEUE;
+dev->requested_n_txq = NR_QUEUE;
+netdev_request_reconfigure(&dev->up);
+
 netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
+exists = true;
 break;
 }
 }
-- 
2.7.4
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [ovs-dev, v3, 1/4] Add support for 802.1ad (QinQ tunneling)

2016-07-07 Thread Ilya Maximets
Hi, Xiao.
You did a good job.

Unfortunately, this patch breaks the DPDK build because of redeclaration of
a structure with the same name:

include/openvswitch/packets.h:
--
struct vlan_hdr {
ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD */
ovs_be16 tci;
};

DPDK:lib/librte_ether/rte_ether.h :
--
/**
* Ethernet VLAN Header.
* Contains the 16-bit VLAN Tag Control Identifier and the Ethernet type
* of the encapsulated frame.
*/
struct vlan_hdr {
uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
} __attribute__((__packed__));

Actually, these structures are logically equal. This collision should be fixed
somehow.
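
One possible way to resolve it (just an illustration with a made-up name;
any non-colliding name, or a rename on the DPDK side, would work equally
well):

    /* include/openvswitch/packets.h -- sketch of a rename that avoids
     * clashing with DPDK's 'struct vlan_hdr' from rte_ether.h.
     * 'ovs_vlan_header' is only an example name. */
    struct ovs_vlan_header {
        ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD. */
        ovs_be16 tci;
    };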

Best regards, Ilya Maximets.

On 03.07.2016 03:47, Xiao Liang wrote:
> Flow key handling changes:
> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
>   headers.
> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
> 
> Refactor VLAN handling in dpif-xlate:
> - Introduce 'xvlan' to track VLAN stack during flow processing.
> - Input and output VLAN translation according to the xbundle type.
> 
> Push VLAN action support:
> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
> - Support push_vlan on dot1q packets.
> 
> Add new port VLAN mode "dot1q-tunnel":
> - Example:
> ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
>   and pops it on egress.
> - Customer VLAN check:
> ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
>   Only customer VLAN of 10 and 20 are allowed.
> 
> Signed-off-by: Xiao Liang 
> ---
>  include/openvswitch/flow.h|  13 +-
>  include/openvswitch/ofp-actions.h |  10 +-
>  include/openvswitch/packets.h |   5 +
>  lib/dpctl.c   |  29 ++-
>  lib/dpif-netdev.c |   7 +-
>  lib/flow.c| 109 ++
>  lib/flow.h|   6 +-
>  lib/match.c   |  47 ++--
>  lib/meta-flow.c   |  22 +-
>  lib/nx-match.c|  14 +-
>  lib/odp-util.c| 227 
>  lib/odp-util.h|   4 +-
>  lib/ofp-actions.c |  61 +++---
>  lib/ofp-util.c|  56 ++---
>  lib/tnl-ports.c   |   2 +-
>  ofproto/bond.c|   2 +-
>  ofproto/ofproto-dpif-ipfix.c  |   6 +-
>  ofproto/ofproto-dpif-rid.h|   2 +-
>  ofproto/ofproto-dpif-sflow.c  |   4 +-
>  ofproto/ofproto-dpif-xlate.c  | 436 
> ++
>  ofproto/ofproto-dpif-xlate.h  |   6 +-
>  ofproto/ofproto-dpif.c|  74 ++-
>  ofproto/ofproto.h |   8 +-
>  ovn/controller/pinctrl.c  |   5 +-
>  tests/test-classifier.c   |  15 +-
>  utilities/ovs-ofctl.c |  29 +--
>  vswitchd/bridge.c |  27 ++-
>  vswitchd/vswitch.ovsschema|  16 +-
>  vswitchd/vswitch.xml  |  31 +++
>  29 files changed, 866 insertions(+), 407 deletions(-)
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v4] Instant send + queue number. (First part of XPS patch-set).

2016-07-07 Thread Ilya Maximets
This is the first part of the XPS patch-set, which contains generic fixes not
directly connected with XPS or manual pinning but required by them.

The last remaining patch here allows reconfiguring the number of queues
at runtime according to the settings of the connected virtio device.

Version 4:
* Dropped already applied patch
  "netdev-dpdk: Use instant sending instead of queueing of packets."
* Merged fixup from previous version:
  http://openvswitch.org/pipermail/dev/2016-June/073906.html
  It adds a few comments and reduces the number of queues to poll after
  virtio disconnection by requesting reconfiguration with 1 queue pair.
* rebased on current master

Ilya Maximets (1):
  dpif-netdev: Move setting of queue number to netdev layer.

 INSTALL.DPDK-ADVANCED.md |  26 +++-
 NEWS |   2 +
 lib/dpif-netdev.c|  31 ++---
 lib/netdev-bsd.c |   1 -
 lib/netdev-dpdk.c| 162 +++
 lib/netdev-dummy.c   |  31 ++---
 lib/netdev-linux.c   |   1 -
 lib/netdev-provider.h|  16 -
 lib/netdev-vport.c   |   1 -
 lib/netdev.c |  30 -
 lib/netdev.h |   1 -
 vswitchd/vswitch.xml |   3 +-
 12 files changed, 90 insertions(+), 215 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v4] dpif-netdev: Move setting of queue number to netdev layer.

2016-07-07 Thread Ilya Maximets
Currently, there are a few inconsistencies between the dpif-netdev
and netdev layers:

* dpif-netdev can't know the exact number of tx queues
  allocated inside netdev.
  This leads to constant mapping of queue-ids to 'real' ones.

* dpif-netdev is able to change the number of tx queues while
  it knows nothing about the real hardware or the number of queues
  allocated in the VM.
  This leads to complications in the reconfiguration of vhost-user
  ports, because setting 'n_txq' from different sources
  (dpif-netdev and the 'new_device()' call) requires additional
  synchronization between these two layers.

Also: We are able to configure 'n_rxq' for vhost-user devices, but
  there is only one sane number of rx queues which must be used and
  configured manually (the number of queues allocated in QEMU).

This patch moves all configuration of queues to the netdev layer and disables
configuration of 'n_rxq' for vhost devices.

Configuration of rx and tx queues is now automatically applied from the
connected virtio device. The standard reconfiguration mechanism is used to
apply these changes.

The number of tx queues is set to 'n_cores + 1' by default for physical ports,
and the old 'needs_locking' logic is preserved.

For dummy-pmd ports, a new undocumented option 'n_txq' is introduced to
configure the number of tx queues.

Ex.:
ovs-vsctl set interface dummy-pmd0 options:n_txq=32

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK-ADVANCED.md |  26 +++-
 NEWS |   2 +
 lib/dpif-netdev.c|  31 ++---
 lib/netdev-bsd.c |   1 -
 lib/netdev-dpdk.c| 162 +++
 lib/netdev-dummy.c   |  31 ++---
 lib/netdev-linux.c   |   1 -
 lib/netdev-provider.h|  16 -
 lib/netdev-vport.c   |   1 -
 lib/netdev.c |  30 -
 lib/netdev.h |   1 -
 vswitchd/vswitch.xml |   3 +-
 12 files changed, 90 insertions(+), 215 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index ec47e26..9ae536d 100644
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -246,16 +246,13 @@ needs to be affinitized accordingly.
 
   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
 
-### 4.3 DPDK port Rx Queues
+### 4.3 DPDK physical port Rx Queues
 
   `ovs-vsctl set Interface  options:n_rxq=`
 
-  The command above sets the number of rx queues for DPDK interface.
+  The command above sets the number of rx queues for DPDK physical interface.
   The rx queues are assigned to pmd threads on the same NUMA node in a
-  round-robin fashion.  For more information, please refer to the
-  Open_vSwitch TABLE section in
-
-  `man ovs-vswitchd.conf.db`
+  round-robin fashion.
 
 ### 4.4 Exact Match Cache
 
@@ -454,16 +451,8 @@ DPDK 16.04 supports two types of vhost:
 
 3. Enable multiqueue support(OPTIONAL)
 
-   The vhost-user interface must be configured in Open vSwitch with the
-   desired amount of queues with:
-
-   ```
-   ovs-vsctl set Interface vhost-user-2 options:n_rxq=
-   ```
-
-   QEMU needs to be configured as well.
-   The $q below should match the queues requested in OVS (if $q is more,
-   packets will not be received).
+   QEMU needs to be configured to use multiqueue.
+   The $q below is the number of queues.
The $v is the number of vectors, which is '$q x 2 + 2'.
 
```
@@ -472,6 +461,11 @@ DPDK 16.04 supports two types of vhost:
-device 
virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
```
 
+   The vhost-user interface will be automatically reconfigured with 
required
+   number of rx and tx queues after connection of virtio device.
+   Manual configuration of `n_rxq` is not supported because OVS will work
+   properly only if `n_rxq` will match number of queues configured in QEMU.
+
A least 2 PMDs should be configured for the vswitch when using 
multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost
queue rather than being distributed among different vhost queues for a
diff --git a/NEWS b/NEWS
index f7b202b..a6d4035 100644
--- a/NEWS
+++ b/NEWS
@@ -37,6 +37,8 @@ Post-v2.5.0
- DPDK:
  * New option "n_rxq" for PMD interfaces.
Old 'other_config:n-dpdk-rxqs' is no longer supported.
+   Not supported by vHost interfaces. For them number of rx and tx queues
+   is applied from connected virtio device.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 37c2631..1c5d6a1 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -442,8 +442,1

[ovs-dev] [PATCH v5 1/2] netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.

2016-07-08 Thread Ilya Maximets
Currently, there are a few inconsistencies in the ways to configure the
number of queues for a netdev device:

* dpif-netdev can't know the exact number of queues
  allocated inside netdev.
  This leads to constant mapping of queue-ids to 'real' ones.

* We are able to configure 'n_rxq' for vhost-user devices, but
  there is only one sane number of rx queues which must be used
  and configured manually (the number of queues allocated
  in QEMU).

This patch disables configuration of 'n_rxq' for DPDK vHost devices.
Configuration of rx and tx queues is now automatically applied from the
connected virtio device. The standard reconfiguration mechanism is used to
apply these changes.

Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues
in the device.

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK-ADVANCED.md |  26 -
 NEWS |   2 +
 lib/dpif-netdev.c|   6 ++-
 lib/netdev-dpdk.c| 138 ++-
 vswitchd/vswitch.xml |   3 +-
 5 files changed, 81 insertions(+), 94 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index ec47e26..9ae536d 100644
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -246,16 +246,13 @@ needs to be affinitized accordingly.
 
   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
 
-### 4.3 DPDK port Rx Queues
+### 4.3 DPDK physical port Rx Queues
 
   `ovs-vsctl set Interface  options:n_rxq=`
 
-  The command above sets the number of rx queues for DPDK interface.
+  The command above sets the number of rx queues for DPDK physical interface.
   The rx queues are assigned to pmd threads on the same NUMA node in a
-  round-robin fashion.  For more information, please refer to the
-  Open_vSwitch TABLE section in
-
-  `man ovs-vswitchd.conf.db`
+  round-robin fashion.
 
 ### 4.4 Exact Match Cache
 
@@ -454,16 +451,8 @@ DPDK 16.04 supports two types of vhost:
 
 3. Enable multiqueue support(OPTIONAL)
 
-   The vhost-user interface must be configured in Open vSwitch with the
-   desired amount of queues with:
-
-   ```
-   ovs-vsctl set Interface vhost-user-2 options:n_rxq=
-   ```
-
-   QEMU needs to be configured as well.
-   The $q below should match the queues requested in OVS (if $q is more,
-   packets will not be received).
+   QEMU needs to be configured to use multiqueue.
+   The $q below is the number of queues.
The $v is the number of vectors, which is '$q x 2 + 2'.
 
```
@@ -472,6 +461,11 @@ DPDK 16.04 supports two types of vhost:
-device 
virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
```
 
+   The vhost-user interface will be automatically reconfigured with 
required
+   number of rx and tx queues after connection of virtio device.
+   Manual configuration of `n_rxq` is not supported because OVS will work
+   properly only if `n_rxq` will match number of queues configured in QEMU.
+
A least 2 PMDs should be configured for the vswitch when using 
multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost
queue rather than being distributed among different vhost queues for a
diff --git a/NEWS b/NEWS
index f7b202b..a6d4035 100644
--- a/NEWS
+++ b/NEWS
@@ -37,6 +37,8 @@ Post-v2.5.0
- DPDK:
  * New option "n_rxq" for PMD interfaces.
Old 'other_config:n-dpdk-rxqs' is no longer supported.
+   Not supported by vHost interfaces. For them number of rx and tx queues
+   is applied from connected virtio device.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 37c2631..fec7615 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -442,8 +442,10 @@ struct dp_netdev_pmd_thread {
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
-atomic_int tx_qid;  /* Queue id used by this pmd thread to
- * send packets on all netdevs */
+
+/* Queue id used by this pmd thread to send packets on all netdevs.
+ * All tx_qid's are unique and less than 'ovs_numa_get_n_cores() + 1'. */
+atomic_int tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8bb33d6..b850ff6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -349,12 +349,10 @@ struct netdev_dpdk {
 struct rte_eth_link link;
 int link_reset_cn

[ovs-dev] [PATCH v5 2/2] netdev-dummy: Add n_txq option.

2016-07-08 Thread Ilya Maximets
Will be used for testing with different numbers of TX queues.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dummy.c | 32 +++-
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 24c107e..9ea765b 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -821,7 +821,7 @@ netdev_dummy_set_config(struct netdev *netdev_, const 
struct smap *args)
 {
 struct netdev_dummy *netdev = netdev_dummy_cast(netdev_);
 const char *pcap;
-int new_n_rxq, new_numa_id;
+int new_n_rxq, new_n_txq, new_numa_id;
 
 ovs_mutex_lock(&netdev->mutex);
 netdev->ifindex = smap_get_int(args, "ifindex", -EOPNOTSUPP);
@@ -858,10 +858,13 @@ netdev_dummy_set_config(struct netdev *netdev_, const 
struct smap *args)
 }
 
 new_n_rxq = MAX(smap_get_int(args, "n_rxq", netdev->requested_n_rxq), 1);
+new_n_txq = MAX(smap_get_int(args, "n_txq", netdev->requested_n_txq), 1);
 new_numa_id = smap_get_int(args, "numa_id", 0);
 if (new_n_rxq != netdev->requested_n_rxq
+|| new_n_txq != netdev->requested_n_txq
 || new_numa_id != netdev->requested_numa_id) {
 netdev->requested_n_rxq = new_n_rxq;
+netdev->requested_n_txq = new_n_txq;
 netdev->requested_numa_id = new_numa_id;
 netdev_request_reconfigure(netdev_);
 }
@@ -883,26 +886,6 @@ netdev_dummy_get_numa_id(const struct netdev *netdev_)
 return numa_id;
 }
 
-/* Requests the number of tx queues for the dummy PMD interface. */
-static int
-netdev_dummy_set_tx_multiq(struct netdev *netdev_, unsigned int n_txq)
-{
-struct netdev_dummy *netdev = netdev_dummy_cast(netdev_);
-
-ovs_mutex_lock(&netdev->mutex);
-
-if (netdev_->n_txq == n_txq) {
-goto out;
-}
-
-netdev->requested_n_txq = n_txq;
-netdev_request_reconfigure(netdev_);
-
-out:
-ovs_mutex_unlock(&netdev->mutex);
-return 0;
-}
-
 /* Sets the number of tx queues and rx queues for the dummy PMD interface. */
 static int
 netdev_dummy_reconfigure(struct netdev *netdev_)
@@ -1325,7 +1308,7 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 
 /* Helper functions. */
 
-#define NETDEV_DUMMY_CLASS(NAME, PMD, TX_MULTIQ, RECOFIGURE)   \
+#define NETDEV_DUMMY_CLASS(NAME, PMD, RECOFIGURE)   \
 {   \
 NAME,   \
 PMD,/* is_pmd */\
@@ -1344,7 +1327,7 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 NULL,   /* push header */   \
 NULL,   /* pop header */\
 netdev_dummy_get_numa_id,   \
-TX_MULTIQ,  \
+NULL,   /* set_tx_multiq */ \
 \
 netdev_dummy_send,  /* send */  \
 NULL,   /* send_wait */ \
@@ -1396,11 +1379,10 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 }
 
 static const struct netdev_class dummy_class =
-NETDEV_DUMMY_CLASS("dummy", false, NULL, NULL);
+NETDEV_DUMMY_CLASS("dummy", false, NULL);
 
 static const struct netdev_class dummy_pmd_class =
 NETDEV_DUMMY_CLASS("dummy-pmd", true,
-   netdev_dummy_set_tx_multiq,
netdev_dummy_reconfigure);
 
 static void
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v5 0/2] Instant send + queue number. (First part of XPS patch-set).

2016-07-08 Thread Ilya Maximets
This is the first part of the XPS patch-set, which contains generic fixes not
directly connected with XPS or manual pinning but required by them.

The first patch allows reconfiguring the number of queues for DPDK vHost
at runtime according to the settings of the connected virtio device.

The second patch adds a new option 'n_txq' for dummy-pmd. It will be used
in future patches.

Version 5:
* 'dpif-netdev: Move setting of queue number to netdev layer.'
  split into 2 patches because the changes are not really
  connected now.
* 'set_tx_multiq()' brought back to configure 'n_txq' from the
  datapath layer for ETH ports.

Version 4:
* Dropped already applied patch
  "netdev-dpdk: Use instant sending instead of queueing of packets."
* Merged fixup from previous version:
  http://openvswitch.org/pipermail/dev/2016-June/073906.html
  It adds a few comments and reduces the number of queues to poll after
  virtio disconnection by requesting reconfiguration with 1 queue pair.
* rebased on current master

Ilya Maximets (2):
  netdev-dpdk: Obtain number of queues for vhost ports from attached
virtio.
  netdev-dummy: Add n_txq option.

 INSTALL.DPDK-ADVANCED.md |  26 -
 NEWS |   2 +
 lib/dpif-netdev.c|   6 ++-
 lib/netdev-dpdk.c| 138 ++-
 lib/netdev-dummy.c   |  32 +++
 vswitchd/vswitch.xml |   3 +-
 6 files changed, 88 insertions(+), 119 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v4] dpif-netdev: Move setting of queue number to netdev layer.

2016-07-08 Thread Ilya Maximets
Hi,
Thanks for the review.

On 08.07.2016 05:19, Daniele Di Proietto wrote:
> Thanks for the patch, I think this moves in the right direction.
> 
> I like how this patch removes "real_n_txq", as you pointed out
> it was confusing and, as proven here, unnecessary.
> 
> I don't like very much that the netdev implementation decides
> to create max_tx_queue_id + 1 queues.  I still think the
> request should come from the datapath with netdev_dpdk_set_tx_multiq().
> 
> How about this?
> 
> * For phy devices netdev_dpdk_set_tx_multiq() stays as it is.
>   requested_n_txq is coming only from the datapath
> * For vhost devices netdev_dpdk_set_tx_multiq() becomes a no-op, because
>   it doesn't matter how many queues the datapath requests, we're locking
>   on every transmission anyway.  requested_n_txq is coming only from the
>   new_device() callback.

I've sent v5 with these changes.
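
For reference, a rough sketch of that split (illustration only, not the
exact v5 code):

    /* Physical DPDK ports: the datapath keeps requesting the tx queue
     * count through the usual class hook. */
    static int
    netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq)
    {
        struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);

        ovs_mutex_lock(&dev->mutex);
        if (dev->requested_n_txq != n_txq) {
            dev->requested_n_txq = n_txq;
            netdev_request_reconfigure(netdev);
        }
        ovs_mutex_unlock(&dev->mutex);

        return 0;
    }

    /* vhost ports: the datapath's request is ignored; the queue counts
     * come only from the new_device() callback, so the hook can be a
     * no-op (or simply NULL in the class definition). */
    static int
    netdev_dpdk_vhost_set_tx_multiq(struct netdev *netdev OVS_UNUSED,
                                    unsigned int n_txq OVS_UNUSED)
    {
        return 0;
    }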

> 
> Other than this I tested the patch and it appears to work, at least the
> automatic assignment of the number of queues from qemu, so thanks!
> 
> 
> 
> On 07/07/2016 06:58, "Ilya Maximets"  wrote:
> 
>> Currently, there are few inconsistencies between dpif-netdev
>> and netdev layers:
>>
>>  * dpif-netdev can't know about exact number of tx queues
>>allocated inside netdev.
>>This leads to constant mapping of queue-ids to 'real' ones.
> 
> Now n_txq is always the real number of transmission queues in the device.
> I think this is an improvement.
> 
>>
>>  * dpif-netdev is able to change number of tx queues while
>>it knows nothing about real hardware or number of queues
>>allocated in VM.
>>This leads to complications in reconfiguration of vhost-user
>>ports, because setting of 'n_txq' from different sources
>>(dpif-netdev and 'new_device()' call) requires additional
>>sychronization between this two layers.
> 
> I suggested above a way to avoid this synchronization problem while
> maintaining the netdev_dpdk_set_tx_multiq() call.
> 
>>
>> Also: We are able to configure 'n_rxq' for vhost-user devices, but
>>  there is only one sane number of rx queues which must be used and
>>  configured manually (number of queues that allocated in QEMU).
>>
>> This patch moves all configuration of queues to netdev layer and disables
>> configuration of 'n_rxq' for vhost devices.
>>
>> Configuration of rx and tx queues now automatically applied from
>> connected virtio device. Standard reconfiguration mechanism was used to
>> apply this changes.
>>
>> Number of tx queues by default set to 'n_cores + 1' for physical ports
>> and old 'needs_locking' logic preserved.
>>
>> For dummy-pmd ports new undocumented option 'n_txq' introduced to
>> configure number of tx queues.
>>
>> Ex.:
>>  ovs-vsctl set interface dummy-pmd0 options:n_txq=32
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> INSTALL.DPDK-ADVANCED.md |  26 +++-
>> NEWS |   2 +
>> lib/dpif-netdev.c|  31 ++---
>> lib/netdev-bsd.c |   1 -
>> lib/netdev-dpdk.c| 162 
>> +++
>> lib/netdev-dummy.c   |  31 ++---
>> lib/netdev-linux.c   |   1 -
>> lib/netdev-provider.h|  16 -
>> lib/netdev-vport.c   |   1 -
>> lib/netdev.c |  30 -
>> lib/netdev.h |   1 -
>> vswitchd/vswitch.xml |   3 +-
>> 12 files changed, 90 insertions(+), 215 deletions(-)
> 
> [...]
> 
>>
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 072fef4..a32f4ef 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -2346,12 +2346,13 @@
>> Only PMD netdevs support these options.
>>   
>>
>> -  <column name="n_rxq"
>> +  <column name="options" key="n_rxq"
> I'm embarrassed, I didn't notice this pretty obvious documentation mistake
> for a long time.  Thanks for fixing it!
> 
>>   type='{"type": "integer", "minInteger": 1}'>
>> 
>>   Specifies the maximum number of rx queues to be created for PMD
>>   netdev.  If not specified or specified to 0, one rx queue will
>>   be created by default.
>> +  Not supported by vHost interfaces.
> 
> maybe "DPDK vHost", instead of "vHost"?
> 
>> 
>>   
>> 
>> -- 
>> 2.7.4
>>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] dpif-netdev: fix race for queues between pmd threads

2015-07-24 Thread Ilya Maximets
Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to a race between pmds,
because dp_netdev_set_pmds_on_numa() starts them one by one and
the current number of threads changes incrementally.

As a result we may have the following situation with 2 pmd threads:

* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'

We have:
core 1 --> port 1, port 2
core 2 --> port 2

Fix this by reloading all pmds to get the right port mapping.

If we reload pmds, we'll have:
core 1 --> port 1
core 2 --> port 2

Cc: Dyasly Sergey 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2958d52..fd700f9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1127,10 +1127,9 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 ovs_refcount_init(&port->ref_cnt);
 cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
 
-if (netdev_is_pmd(netdev)) {
+if (netdev_is_pmd(netdev))
 dp_netdev_set_pmds_on_numa(dp, netdev_get_numa_id(netdev));
-dp_netdev_reload_pmds(dp);
-}
+
 seq_change(dp->port_seq);
 
 return 0;
@@ -2978,6 +2977,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
 }
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
+dp_netdev_reload_pmds(dp);
 }
 }
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Yes, I think this way is better. But it may be simpler
if we just keep all the dp_netdev_pmd_thread structures.
There is no need to search over all pmd threads in the system.

Like this:

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 79c4612..8e4c025 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
  * pmd threads for the numa node. */
 if (!n_pmds) {
 int can_have, n_unpinned, i;
+struct dp_netdev_pmd_thread *pmd;
 
 n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
 if (!n_unpinned) {
@@ -2972,16 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 /* If cpu mask is specified, uses all unpinned cores, otherwise
  * tries creating NR_PMD_THREADS pmd threads. */
 can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
NR_PMD_THREADS);
+pmd = xzalloc(can_have * sizeof(*pmd));
 for (i = 0; i < can_have; i++) {
-struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
-
-dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
+dp_netdev_configure_pmd(&pmd[i], dp, i, core_id, numa_id);
+}
+/* The pmd thread code needs to see all the others configured pmd
+ * threads on the same numa node.  That's why we call
+ * 'dp_netdev_configure_pmd()' on all the threads and then we actually
+ * start them. */
+for (i = 0; i < can_have; i++) {
 /* Each thread will distribute all devices rx-queues among
  * themselves. */
-pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
+pmd[i].thread = ovs_thread_create("pmd", pmd_thread_main, &pmd[i]);
 }
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
+dp_netdev_reload_pmds(dp);
 }
 }


On 24.07.2015 19:42, Daniele Di Proietto wrote:
> That's a bad race condition, thanks for reporting it!
> 
> Regarding the fix, I agree that reloading the threads would
> restore the correct mapping, but it would still allow
> the threads to run with the incorrect mapping for a brief
> interval.
> 
> How about postponing the actual threads creation until all
> the pmds are configured?
> 
> Something like:
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 79c4612..26d9f1f 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2961,6 +2961,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int
> numa_id)
>   * pmd threads for the numa node. */
>  if (!n_pmds) {
>  int can_have, n_unpinned, i;
> +struct dp_netdev_pmd_thread *pmd;
> 
>  n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>  if (!n_unpinned) {
> @@ -2973,13 +2974,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
> int numa_id)
>   * tries creating NR_PMD_THREADS pmd threads. */
>  can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned,
> NR_PMD_THREADS);
>  for (i = 0; i < can_have; i++) {
> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>  unsigned core_id =
> ovs_numa_get_unpinned_core_on_numa(numa_id);
> 
> +pmd = xzalloc(sizeof *pmd);
>  dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
> -/* Each thread will distribute all devices rx-queues among
> - * themselves. */
> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
> +}
> +
> +/* The pmd thread code needs to see all the others configured pmd
> + * threads on the same numa node.  That's why we call
> + * 'dp_netdev_configure_pmd()' on all the threads and then we
> actually
> + * start them. */
> +CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> +if (pmd->numa_id == numa_id) {
> +/* Each thread will distribute all devices rx-queues among
> + * themselves. */
> +    pmd->thread = ovs_thread_create("pmd", pmd_thread_main,
> pmd);
> +}
>  }
>  VLOG_INFO("Created %d pmd threads on numa node %d", can_have,
> numa_id);
>  }
> 
> 
> What do you think?
> 
> Thanks!
> 
> On 24/07/2015 12:18, "Ilya Maximets"  wrote:
> 
>> Currently pmd threads select queues in pmd_load_queues() according to
>> get_n_pmd_threads_on_numa(). This behavior leads to race between pmds,
>> beacause dp_netdev_set_pmds_on_numa() starts

Re: [ovs-dev] [PATCH] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Sorry,
that diff should be without dp_netdev_reload_pmds(dp) at the end.

On 27.07.2015 10:58, Ilya Maximets wrote:
> Yes, I think, this way is better. But it may be more simple
> if we just keep all the dp_netdev_pmd_thread structures.
> There is no need to search over all pmd threads on system.
> 
> Like this:
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 79c4612..8e4c025 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
> numa_id)
>   * pmd threads for the numa node. */
>  if (!n_pmds) {
>  int can_have, n_unpinned, i;
> +struct dp_netdev_pmd_thread *pmd;
>  
>  n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>  if (!n_unpinned) {
> @@ -2972,16 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
> numa_id)
>  /* If cpu mask is specified, uses all unpinned cores, otherwise
>   * tries creating NR_PMD_THREADS pmd threads. */
>  can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
> NR_PMD_THREADS);
> +pmd = xzalloc(can_have * sizeof(*pmd));
>  for (i = 0; i < can_have; i++) {
> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>  unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
> -
> -dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
> +dp_netdev_configure_pmd(&pmd[i], dp, i, core_id, numa_id);
> +}
> +/* The pmd thread code needs to see all the others configured pmd
> + * threads on the same numa node.  That's why we call
> + * 'dp_netdev_configure_pmd()' on all the threads and then we 
> actually
> + * start them. */
> +for (i = 0; i < can_have; i++) {
>  /* Each thread will distribute all devices rx-queues among
>   * themselves. */
> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
> +pmd[i].thread = ovs_thread_create("pmd", pmd_thread_main, 
> &pmd[i]);
>  }
>  VLOG_INFO("Created %d pmd threads on numa node %d", can_have, 
> numa_id);
> +dp_netdev_reload_pmds(dp);
>  }
>  }
> 
> 
> On 24.07.2015 19:42, Daniele Di Proietto wrote:
>> That's a bad race condition, thanks for reporting it!
>>
>> Regarding the fix, I agree that reloading the threads would
>> restore the correct mapping, but it would still allow
>> the threads to run with the incorrect mapping for a brief
>> interval.
>>
>> How about postponing the actual threads creation until all
>> the pmds are configured?
>>
>> Something like:
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 79c4612..26d9f1f 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2961,6 +2961,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int
>> numa_id)
>>   * pmd threads for the numa node. */
>>  if (!n_pmds) {
>>  int can_have, n_unpinned, i;
>> +struct dp_netdev_pmd_thread *pmd;
>>
>>  n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>>  if (!n_unpinned) {
>> @@ -2973,13 +2974,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>> int numa_id)
>>   * tries creating NR_PMD_THREADS pmd threads. */
>>  can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned,
>> NR_PMD_THREADS);
>>  for (i = 0; i < can_have; i++) {
>> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>>  unsigned core_id =
>> ovs_numa_get_unpinned_core_on_numa(numa_id);
>>
>> +pmd = xzalloc(sizeof *pmd);
>>  dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
>> -/* Each thread will distribute all devices rx-queues among
>> - * themselves. */
>> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
>> +}
>> +
>> +/* The pmd thread code needs to see all the others configured pmd
>> + * threads on the same numa node.  That's why we call
>> + * 'dp_netdev_configure_pmd()' on all the threads and then we
>> actually
>> + * start them. */
>> +CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> +if (pmd->numa_id == numa_id) {
>> +/* Each thread will distribute all devices rx-queues among
>> + * themselves. */
>> +pm

Re: [ovs-dev] [PATCH] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Previous diff was completely wrong. Sorry.

@@ -2971,6 +2970,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
  * pmd threads for the numa node. */
 if (!n_pmds) {
 int can_have, n_unpinned, i;
+struct dp_netdev_pmd_thread **pmds;
 
 n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
 if (!n_unpinned) {
@@ -2982,15 +2982,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 /* If cpu mask is specified, uses all unpinned cores, otherwise
  * tries creating NR_PMD_THREADS pmd threads. */
 can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
NR_PMD_THREADS);
+pmds = xzalloc(can_have * sizeof(struct dp_netdev_pmd_thread*));
 for (i = 0; i < can_have; i++) {
-struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
-
-dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
+pmds[i] = xzalloc(sizeof(struct dp_netdev_pmd_thread));
+dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
+}
+/* The pmd thread code needs to see all the others configured pmd
+ * threads on the same numa node.  That's why we call
+ * 'dp_netdev_configure_pmd()' on all the threads and then we actually
+ * start them. */
+for (i = 0; i < can_have; i++) {
 /* Each thread will distribute all devices rx-queues among
  * themselves. */
-pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
+pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main, 
pmds[i]);
 }
+free(pmds);
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
 }
 }


On 27.07.2015 11:17, Ilya Maximets wrote:
> Sorry,
> without dp_netdev_reload_pmds(dp) at the end.
> 
> On 27.07.2015 10:58, Ilya Maximets wrote:
>> Yes, I think, this way is better. But it may be more simple
>> if we just keep all the dp_netdev_pmd_thread structures.
>> There is no need to search over all pmd threads on system.
>>
>> Like this:
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 79c4612..8e4c025 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
>> numa_id)
>>   * pmd threads for the numa node. */
>>  if (!n_pmds) {
>>  int can_have, n_unpinned, i;
>> +struct dp_netdev_pmd_thread *pmd;
>>  
>>  n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>>  if (!n_unpinned) {
>> @@ -2972,16 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
>> numa_id)
>>  /* If cpu mask is specified, uses all unpinned cores, otherwise
>>   * tries creating NR_PMD_THREADS pmd threads. */
>>  can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
>> NR_PMD_THREADS);
>> +pmd = xzalloc(can_have * sizeof(*pmd));
>>  for (i = 0; i < can_have; i++) {
>> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>>  unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
>> -
>> -dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
>> +dp_netdev_configure_pmd(&pmd[i], dp, i, core_id, numa_id);
>> +}
>> +/* The pmd thread code needs to see all the others configured pmd
>> + * threads on the same numa node.  That's why we call
>> + * 'dp_netdev_configure_pmd()' on all the threads and then we 
>> actually
>> + * start them. */
>> +for (i = 0; i < can_have; i++) {
>>  /* Each thread will distribute all devices rx-queues among
>>   * themselves. */
>> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
>> +pmd[i].thread = ovs_thread_create("pmd", pmd_thread_main, 
>> &pmd[i]);
>>  }
>>  VLOG_INFO("Created %d pmd threads on numa node %d", can_have, 
>> numa_id);
>> +dp_netdev_reload_pmds(dp);
>>  }
>>  }
>>
>>
>> On 24.07.2015 19:42, Daniele Di Proietto wrote:
>>> That's a bad race condition, thanks for reporting it!
>>>
>>> Regarding the fix, I agree that reloading the threads would
>>> restore the correct mapping, but it would still allow
>>> the threads to run with the incorrect mapping for a brief
>>> interval.
>

Re: [ovs-dev] [PATCH] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Ok. I will.

Thanks to you.

On 27.07.2015 14:12, Daniele Di Proietto wrote:
> That looks better to me.  Minor comment: we usually prefer
> 'sizeof *pmds' to 'sizeof(struct dp_netdev_pmd_thread*)'
> 
> https://github.com/openvswitch/ovs/blob/master/CodingStyle.md#user-content-
> expressions
> 
> Would you mind changing that, updating the commit message and
> resending to the list?
> 
> Thanks!
> 
> On 27/07/2015 10:34, "Ilya Maximets"  wrote:
> 
>> Previous diff was completely wrong. Sorry.
>>
>> @@ -2971,6 +2970,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>> int numa_id)
>>  * pmd threads for the numa node. */
>> if (!n_pmds) {
>> int can_have, n_unpinned, i;
>> +struct dp_netdev_pmd_thread **pmds;
>>
>> n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>> if (!n_unpinned) {
>> @@ -2982,15 +2982,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>> int numa_id)
>> /* If cpu mask is specified, uses all unpinned cores, otherwise
>>  * tries creating NR_PMD_THREADS pmd threads. */
>> can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned,
>> NR_PMD_THREADS);
>> +pmds = xzalloc(can_have * sizeof(struct dp_netdev_pmd_thread*));
>> for (i = 0; i < can_have; i++) {
>> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>> unsigned core_id =
>> ovs_numa_get_unpinned_core_on_numa(numa_id);
>> -
>> -dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
>> +pmds[i] = xzalloc(sizeof(struct dp_netdev_pmd_thread));
>> +dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
>> +}
>> +/* The pmd thread code needs to see all the others configured pmd
>> + * threads on the same numa node.  That's why we call
>> + * 'dp_netdev_configure_pmd()' on all the threads and then we
>> actually
>> + * start them. */
>> +for (i = 0; i < can_have; i++) {
>> /* Each thread will distribute all devices rx-queues among
>>  * themselves. */
>> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
>> +pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main,
>> pmds[i]);
>> }
>> +free(pmds);
>> VLOG_INFO("Created %d pmd threads on numa node %d", can_have,
>> numa_id);
>> }
>> }
>>
>>
>> On 27.07.2015 11:17, Ilya Maximets wrote:
>>> Sorry,
>>> without dp_netdev_reload_pmds(dp) at the end.
>>>
>>> On 27.07.2015 10:58, Ilya Maximets wrote:
>>>> Yes, I think, this way is better. But it may be more simple
>>>> if we just keep all the dp_netdev_pmd_thread structures.
>>>> There is no need to search over all pmd threads on system.
>>>>
>>>> Like this:
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 79c4612..8e4c025 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>>>> int numa_id)
>>>>   * pmd threads for the numa node. */
>>>>  if (!n_pmds) {
>>>>  int can_have, n_unpinned, i;
>>>> +struct dp_netdev_pmd_thread *pmd;
>>>>  
>>>>  n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>>>>  if (!n_unpinned) {
>>>> @@ -2972,16 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev
>>>> *dp, int numa_id)
>>>>  /* If cpu mask is specified, uses all unpinned cores,
>>>> otherwise
>>>>   * tries creating NR_PMD_THREADS pmd threads. */
>>>>  can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned,
>>>> NR_PMD_THREADS);
>>>> +pmd = xzalloc(can_have * sizeof(*pmd));
>>>>  for (i = 0; i < can_have; i++) {
>>>> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>>>>  unsigned core_id =
>>>> ovs_numa_get_unpinned_core_on_numa(numa_id);
>>>> -
>>>> -dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
>>>> +dp_netdev_configure_pmd(&pmd[i], dp, i, core_id, numa_id);
>>>> +}
>>>> +/* The pmd thread code needs to

[ovs-dev] [PATCH v2] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to a race between pmds,
because dp_netdev_set_pmds_on_numa() starts them one by one and
the current number of threads changes incrementally.

As a result we may have the following situation with 2 pmd threads:

* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'

We have:
core 1 --> port 1, port 2
core 2 --> port 2

Fix this by starting pmd threads only after all of them have
been configured.

Cc: Daniele Di Proietto 
Cc: Dyasly Sergey 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 79c4612..4fca7b7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1123,10 +1123,9 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 ovs_refcount_init(&port->ref_cnt);
 cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
 
-if (netdev_is_pmd(netdev)) {
+if (netdev_is_pmd(netdev))
 dp_netdev_set_pmds_on_numa(dp, netdev_get_numa_id(netdev));
-dp_netdev_reload_pmds(dp);
-}
+
 seq_change(dp->port_seq);
 
 return 0;
@@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
  * pmd threads for the numa node. */
 if (!n_pmds) {
 int can_have, n_unpinned, i;
+struct dp_netdev_pmd_thread **pmds;
 
 n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
 if (!n_unpinned) {
@@ -2972,15 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 /* If cpu mask is specified, uses all unpinned cores, otherwise
  * tries creating NR_PMD_THREADS pmd threads. */
 can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
NR_PMD_THREADS);
+pmds = xzalloc(can_have * sizeof *pmds);
 for (i = 0; i < can_have; i++) {
-struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
-
-dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
+pmds[i] = xzalloc(sizeof **pmds);
+dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
+}
+/* The pmd thread code needs to see all the others configured pmd
+ * threads on the same numa node.  That's why we call
+ * 'dp_netdev_configure_pmd()' on all the threads and then we actually
+ * start them. */
+for (i = 0; i < can_have; i++) {
 /* Each thread will distribute all devices rx-queues among
  * themselves. */
-pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
+pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main, 
pmds[i]);
 }
+free(pmds);
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
 }
 }
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3] dpif-netdev: fix race for queues between pmd threads

2015-07-27 Thread Ilya Maximets
Currently pmd threads select queues in pmd_load_queues() according to
get_n_pmd_threads_on_numa(). This behavior leads to a race between pmds,
because dp_netdev_set_pmds_on_numa() starts them one by one and
the current number of threads changes incrementally.

As a result we may have the following situation with 2 pmd threads:

* dp_netdev_set_pmds_on_numa()
* pmd12 thread started. Currently only 1 pmd thread exists.
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
* pmd14 thread started. 2 pmd threads exists.
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'

We have:
core 1 --> port 1, port 2
core 2 --> port 2

Fix this by starting pmd threads only after all of them have
been configured.

Cc: Daniele Di Proietto 
Cc: Dyasly Sergey 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 79c4612..83e55e7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2961,6 +2961,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
  * pmd threads for the numa node. */
 if (!n_pmds) {
 int can_have, n_unpinned, i;
+struct dp_netdev_pmd_thread **pmds;
 
 n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
 if (!n_unpinned) {
@@ -2972,15 +2973,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 /* If cpu mask is specified, uses all unpinned cores, otherwise
  * tries creating NR_PMD_THREADS pmd threads. */
 can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
NR_PMD_THREADS);
+pmds = xzalloc(can_have * sizeof *pmds);
 for (i = 0; i < can_have; i++) {
-struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
-
-dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
+pmds[i] = xzalloc(sizeof **pmds);
+dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
+}
+/* The pmd thread code needs to see all the others configured pmd
+ * threads on the same numa node.  That's why we call
+ * 'dp_netdev_configure_pmd()' on all the threads and then we actually
+ * start them. */
+for (i = 0; i < can_have; i++) {
 /* Each thread will distribute all devices rx-queues among
  * themselves. */
-pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
+pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main, 
pmds[i]);
 }
+free(pmds);
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
 }
 }
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] dpif-netdev: fix race for queues between pmd threads

2015-07-28 Thread Ilya Maximets
I agree. It made sense with my first fix. Fixed and sent again.

Thanks.

On 27.07.2015 19:58, Daniele Di Proietto wrote:
> Thanks for the updated patch.
> 
> One comment below
> 
> On 27/07/2015 13:19, "Ilya Maximets"  wrote:
> 
>> Currently pmd threads select queues in pmd_load_queues() according to
>> get_n_pmd_threads_on_numa(). This behavior leads to race between pmds,
>> beacause dp_netdev_set_pmds_on_numa() starts them one by one and
>> current number of threads changes incrementally.
>>
>> As a result we may have the following situation with 2 pmd threads:
>>
>> * dp_netdev_set_pmds_on_numa()
>> * pmd12 thread started. Currently only 1 pmd thread exists.
>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
>> * pmd14 thread started. 2 pmd threads exists.
>> dpif_netdev|INFO|Created 2 pmd threads on numa node 0
>> dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'
>>
>> We have:
>> core 1 --> port 1, port 2
>> core 2 --> port 2
>>
>> Fix this by starting pmd threads only after all of them have
>> been configured.
>>
>> Cc: Daniele Di Proietto 
>> Cc: Dyasly Sergey 
>> Signed-off-by: Ilya Maximets 
>> ---
>> lib/dpif-netdev.c | 21 ++---
>> 1 file changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 79c4612..4fca7b7 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -1123,10 +1123,9 @@ do_add_port(struct dp_netdev *dp, const char
>> *devname, const char *type,
>> ovs_refcount_init(&port->ref_cnt);
>> cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
>>
>> -if (netdev_is_pmd(netdev)) {
>> +if (netdev_is_pmd(netdev))
>> dp_netdev_set_pmds_on_numa(dp, netdev_get_numa_id(netdev));
>> -dp_netdev_reload_pmds(dp);
>> -}
>> +
> 
> I think we should still call 'dp_netdev_reload_pmds()' when adding a
> new port.
> 
>> seq_change(dp->port_seq);
>>
>> return 0;
>> @@ -2961,6 +2960,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>> int numa_id)
>>  * pmd threads for the numa node. */
>> if (!n_pmds) {
>> int can_have, n_unpinned, i;
>> +struct dp_netdev_pmd_thread **pmds;
>>
>> n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
>> if (!n_unpinned) {
>> @@ -2972,15 +2972,22 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp,
>> int numa_id)
>> /* If cpu mask is specified, uses all unpinned cores, otherwise
>>  * tries creating NR_PMD_THREADS pmd threads. */
>> can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned,
>> NR_PMD_THREADS);
>> +pmds = xzalloc(can_have * sizeof *pmds);
>> for (i = 0; i < can_have; i++) {
>> -struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
>> unsigned core_id =
>> ovs_numa_get_unpinned_core_on_numa(numa_id);
>> -
>> -dp_netdev_configure_pmd(pmd, dp, i, core_id, numa_id);
>> +pmds[i] = xzalloc(sizeof **pmds);
>> +dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
>> +}
>> +/* The pmd thread code needs to see all the others configured pmd
>> + * threads on the same numa node.  That's why we call
>> + * 'dp_netdev_configure_pmd()' on all the threads and then we
>> actually
>> + * start them. */
>> +for (i = 0; i < can_have; i++) {
>> /* Each thread will distribute all devices rx-queues among
>>  * themselves. */
>> -pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
>> +pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main,
>> pmds[i]);
>> }
>> +free(pmds);
>> VLOG_INFO("Created %d pmd threads on numa node %d", can_have,
>> numa_id);
>> }
>> }
>> -- 
>> 2.1.4
>>
> 
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] dpif: allow adding ukeys for same flow by different pmds

2015-07-30 Thread Ilya Maximets
In multiqueue mode, several pmd threads may process one
port, but different queues. A flow doesn't depend on the queue.

So, during miss upcall processing, all threads (except the first
one for that port) will receive error = ENOSPC due to
ukey_install failure. Therefore they will not add the flow
to the flow_table and will not insert it into the exact match cache.

As a result, all threads (except the first one for that port) will
always execute a miss.

Fix that by comparing ukeys not only by ufids but also
by pmd_ids.

Signed-off-by: Ilya Maximets 
---
 ofproto/ofproto-dpif-upcall.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 440f9e9..38e03c5 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -286,7 +286,8 @@ static bool ukey_install_start(struct udpif *, struct 
udpif_key *ukey);
 static bool ukey_install_finish(struct udpif_key *ukey, int error);
 static bool ukey_install(struct udpif *udpif, struct udpif_key *ukey);
 static struct udpif_key *ukey_lookup(struct udpif *udpif,
- const ovs_u128 *ufid);
+ const ovs_u128 *ufid,
+ const unsigned pmd_id);
 static int ukey_acquire(struct udpif *, const struct dpif_flow *,
 struct udpif_key **result, int *error);
 static void ukey_delete__(struct udpif_key *);
@@ -1141,7 +1142,8 @@ process_upcall(struct udpif *udpif, struct upcall *upcall,
 }
 if (actions_len == 0) {
 /* Lookup actions in userspace cache. */
-struct udpif_key *ukey = ukey_lookup(udpif, upcall->ufid);
+struct udpif_key *ukey = ukey_lookup(udpif, upcall->ufid,
+ upcall->pmd_id);
 if (ukey) {
 actions = ukey->actions->data;
 actions_len = ukey->actions->size;
@@ -1324,14 +1326,14 @@ get_ufid_hash(const ovs_u128 *ufid)
 }
 
 static struct udpif_key *
-ukey_lookup(struct udpif *udpif, const ovs_u128 *ufid)
+ukey_lookup(struct udpif *udpif, const ovs_u128 *ufid, const unsigned pmd_id)
 {
 struct udpif_key *ukey;
 int idx = get_ufid_hash(ufid) % N_UMAPS;
 struct cmap *cmap = &udpif->ukeys[idx].cmap;
 
 CMAP_FOR_EACH_WITH_HASH (ukey, cmap_node, get_ufid_hash(ufid), cmap) {
-if (ovs_u128_equals(&ukey->ufid, ufid)) {
+if (ovs_u128_equals(&ukey->ufid, ufid) && ukey->pmd_id == pmd_id) {
 return ukey;
 }
 }
@@ -1488,7 +1490,7 @@ ukey_install_start(struct udpif *udpif, struct udpif_key 
*new_ukey)
 idx = new_ukey->hash % N_UMAPS;
 umap = &udpif->ukeys[idx];
 ovs_mutex_lock(&umap->mutex);
-old_ukey = ukey_lookup(udpif, &new_ukey->ufid);
+old_ukey = ukey_lookup(udpif, &new_ukey->ufid, new_ukey->pmd_id);
 if (old_ukey) {
 /* Uncommon case: A ukey is already installed with the same UFID. */
 if (old_ukey->key_len == new_ukey->key_len
@@ -1570,7 +1572,7 @@ ukey_acquire(struct udpif *udpif, const struct dpif_flow 
*flow,
 struct udpif_key *ukey;
 int retval;
 
-ukey = ukey_lookup(udpif, &flow->ufid);
+ukey = ukey_lookup(udpif, &flow->ufid, flow->pmd_id);
 if (ukey) {
 retval = ovs_mutex_trylock(&ukey->mutex);
 } else {
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3] dpif-netdev: fix race for queues between pmd threads

2015-08-04 Thread Ilya Maximets
Does anyone plan to apply this patch?

Best regards, Ilya Maximets.

On 28.07.2015 23:48, Flavio Leitner wrote:
> On Tue, Jul 28, 2015 at 09:55:52AM +0300, Ilya Maximets wrote:
>> Currently pmd threads select queues in pmd_load_queues() according to
>> get_n_pmd_threads_on_numa(). This behavior leads to a race between pmds,
>> because dp_netdev_set_pmds_on_numa() starts them one by one and
>> the current number of threads changes incrementally.
>>
>> As a result we may have the following situation with 2 pmd threads:
>>
>> * dp_netdev_set_pmds_on_numa()
>> * pmd12 thread started. Currently only 1 pmd thread exists.
>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1'
>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2'
>> * pmd14 thread started. 2 pmd threads exists.
>> dpif_netdev|INFO|Created 2 pmd threads on numa node 0
>> dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2'
>>
>> We have:
>> core 1 --> port 1, port 2
>> core 2 --> port 2
>>
>> Fix this by starting pmd threads only after all of them have
>> been configured.
>>
>> Cc: Daniele Di Proietto 
>> Cc: Dyasly Sergey 
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> 
> Looks good to me.
> Acked-by: Flavio Leitner 
> 
> 
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] dpif-netdev: proper tx queue id

2015-08-05 Thread Ilya Maximets
Currently tx_qid is equal to pmd->core_id. This leads to wrong
behavior if pmd-cpu-mask is different from '/(0*)(1|3|7)?(f*)/',
e.g. if core_ids are not sequential, or don't start from 0, or both.

Example (just one of the possible wrong scenarios):

starting 2 pmd threads with 1 port, 2 rxqs per port and
pmd-cpu-mask = 0014.

In that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
txq_needs_locking = false (if the device has 2 queues).

In netdev_dpdk_send__(), qid will not be truncated, and
dpdk_queue_pkts() will be called for nonexistent queues (2 and 4).

Fix this by calculating tx_qid from the rxq index for each rxq separately.
The 'rxq_poll' structure is supplemented with tx_qid and renamed to 'q_poll'.
'poll_list' is moved inside the dp_netdev_pmd_thread structure to be able
to get the proper tx_qid for the current port while calling netdev_send().
Also, information about the queues of each thread is added to the log.

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 102 ++
 lib/netdev.c  |   6 
 lib/netdev.h  |   1 +
 3 files changed, 57 insertions(+), 52 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 83e55e7..03af4bf 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -365,6 +365,13 @@ struct dp_netdev_pmd_cycles {
 atomic_ullong n[PMD_N_CYCLES];
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'poll_list' member.  */
+struct q_poll {
+struct dp_netdev_port *port;
+struct netdev_rxq *rx;
+unsigned tx_qid;
+};
+
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
@@ -420,8 +427,10 @@ struct dp_netdev_pmd_thread {
 /* threads on same numa node. */
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
-int tx_qid; /* Queue id used by this pmd thread to
- * send packets on all netdevs */
+
+struct q_poll *poll_list;   /* List of queues polling by this pmd */
+unsigned int poll_cnt;  /* Number of queues in poll_list */
+unsigned int poll_cur;  /* Index of current queue in poll_list */
 
 /* Only a pmd thread can write on its own 'cycles' and 'stats'.
  * The main thread keeps 'stats_zero' and 'cycles_zero' as base
@@ -2624,25 +2633,18 @@ dpif_netdev_wait(struct dpif *dpif)
 seq_wait(tnl_conf_seq, dp->last_tnl_conf_seq);
 }
 
-struct rxq_poll {
-struct dp_netdev_port *port;
-struct netdev_rxq *rx;
-};
-
-static int
-pmd_load_queues(struct dp_netdev_pmd_thread *pmd,
-struct rxq_poll **ppoll_list, int poll_cnt)
+static void
+pmd_load_queues(struct dp_netdev_pmd_thread *pmd)
 {
-struct rxq_poll *poll_list = *ppoll_list;
 struct dp_netdev_port *port;
-int n_pmds_on_numa, index, i;
+int n_pmds_on_numa, n_txqs, index, i;
 
 /* Simple scheduler for netdev rx polling. */
-for (i = 0; i < poll_cnt; i++) {
-port_unref(poll_list[i].port);
+for (i = 0; i < pmd->poll_cnt; i++) {
+port_unref(pmd->poll_list[i].port);
 }
 
-poll_cnt = 0;
+pmd->poll_cnt = 0;
 n_pmds_on_numa = get_n_pmd_threads_on_numa(pmd->dp, pmd->numa_id);
 index = 0;
 
@@ -2652,17 +2654,18 @@ pmd_load_queues(struct dp_netdev_pmd_thread *pmd,
 if (port_try_ref(port)) {
 if (netdev_is_pmd(port->netdev)
 && netdev_get_numa_id(port->netdev) == pmd->numa_id) {
-int i;
+n_txqs = netdev_n_txq(port->netdev);
 
 for (i = 0; i < netdev_n_rxq(port->netdev); i++) {
 if ((index % n_pmds_on_numa) == pmd->index) {
-poll_list = xrealloc(poll_list,
-sizeof *poll_list * (poll_cnt + 1));
+pmd->poll_list = xrealloc(pmd->poll_list,
+sizeof *pmd->poll_list * (pmd->poll_cnt + 1));
 
 port_ref(port);
-poll_list[poll_cnt].port = port;
-poll_list[poll_cnt].rx = port->rxq[i];
-poll_cnt++;
+pmd->poll_list[pmd->poll_cnt].port = port;
+pmd->poll_list[pmd->poll_cnt].rx = port->rxq[i];
+pmd->poll_list[pmd->poll_cnt].tx_qid = i % n_txqs;
+pmd->poll_cnt++;
 }
 index++;
 }
@@ -2671,9 +2674,6 @@ pmd_load_queues(stru

Re: [ovs-dev] [PATCH] dpif-netdev: proper tx queue id

2015-08-05 Thread Ilya Maximets
Sorry, I agree that the example is incorrect. It doesn't actually hold,
because ovs_numa_get_n_cores() is used to call netdev_set_multiq().
No, I didn't actually observe a bug.

But there is another example:

same configuration (2 pmd threads with 1 port,
2 rxqs per port and pmd-cpu-mask = 0014).

pmd_1->tx_qid = 2, pmd_2->tx_qid = 4,
txq_needs_locking = true (if the device doesn't have ovs_numa_get_n_cores()
queues)

Let netdev->real_n_txq = 2 (the device has 2 queues).

In that case, after truncating in netdev_dpdk_send__()
'qid = qid % dev->real_n_txq;'
pmd_1: qid = 2 % 2 = 0
pmd_2: qid = 4 % 2 = 0

So, both threads will call dpdk_queue_pkts() with the same qid = 0.
This is unexpected behavior if there are 2 tx queues in the device.
Queue #1 will not be used and both threads will lock queue #0
on each send.
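
For reference, a tiny standalone snippet (illustration only, not the OVS
code) that just reproduces the arithmetic above:

--
#include <stdio.h>

int
main(void)
{
    const int real_n_txq = 2;           /* queues the device really has */
    const int pmd_tx_qid[] = { 2, 4 };  /* tx_qid == core_id for the 2 pmds */

    for (int i = 0; i < 2; i++) {
        printf("pmd_%d: qid = %d %% %d = %d\n", i + 1, pmd_tx_qid[i],
               real_n_txq, pmd_tx_qid[i] % real_n_txq);
    }
    return 0;   /* both pmds end up with qid 0, so queue #1 stays unused */
}
--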

About your example:
2 pmd threads can't call netdev_send() with the same tx_qid,
because pmd->tx_qid = pmd->core_id and there is only one thread
with core_id = 0. See dp_netdev_configure_pmd().

So,
pmd1 will call netdev_send(netdev=dpdk0, tx_qid= *pmd1->core_id* )
pmd2 will call netdev_send(netdev=dpdk0, tx_qid= *pmd2->core_id* )


On 05.08.2015 17:54, Daniele Di Proietto wrote:
> 
> 
> On 05/08/2015 13:28, "Ilya Maximets"  wrote:
> 
>> Currently tx_qid is equal to pmd->core_id. This leads to wrong
>> behavior if pmd-cpu-mask different from '/(0*)(1|3|7)?(f*)/',
>> e.g. if core_ids are not sequential, or doesn't start from 0, or both.
>>
>> Example(just one of possible wrong scenarios):
>>
>>  starting 2 pmd threads with 1 port, 2 rxqs per port and
>>  pmd-cpu-mask = 0014.
>>
>>  It that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
>>  txq_needs_locking = false (if device has 2 queues).
>>
>>  While netdev_dpdk_send__() qid will not be truncated and
>>  dpdk_queue_pkts() will be called for nonexistent queues (2 and 4).
> 
> This shouldn't be possible, because the datapath requests one txq
> for each core in the system, not for each core in pmd-cpu-mask
> (see the calls to netdev_set_multiq() in dpif-netdev.c).
> 
> Did you actually observe a bug or an unexpected behaviour?
> 
> I didn't read the patch carefully (I want to understand the problem first),
> but it appears that two pmd threads could call netdev_send on the same
> port, with the same tx_qid concurrently.  Example:
> 
> pmd1 is processing dpdk0 with rx_qid 0, pmd2 is processing dpdk1 with
> rx_qid 0.
> 
> The flow table is configured to send everything to dpdk0.
> 
> pmd1 will call netdev_send(netdev=dpdk0, tx_qid=0)
> pmd2 will call netdev_send(netdev=dpdk0, tx_qid=0)
> 
> these calls can happen concurrently
> 
>>
>> Fix that by calculating tx_qid from rxq indexes for each rxq separately.
>> 'rxq_poll' structure supplemented by tx_qid and renamed to 'q_poll'.
>> 'poll_list' moved inside dp_netdev_pmd_thread structure to be able
>> to get proper tx_qid for current port while calling netdev_send().
>> Also, information about queues of each thread added to log.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> lib/dpif-netdev.c | 102
>> ++
>> lib/netdev.c  |   6 
>> lib/netdev.h  |   1 +
>> 3 files changed, 57 insertions(+), 52 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 83e55e7..03af4bf 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -365,6 +365,13 @@ struct dp_netdev_pmd_cycles {
>> atomic_ullong n[PMD_N_CYCLES];
>> };
>>
>> +/* Contained by struct dp_netdev_pmd_thread's 'poll_list' member.  */
>> +struct q_poll {
>> +struct dp_netdev_port *port;
>> +struct netdev_rxq *rx;
>> +unsigned tx_qid;
>> +};
>> +
>> /* PMD: Poll modes drivers.  PMD accesses devices via polling to
>> eliminate
>>  * the performance overhead of interrupt processing.  Therefore netdev
>> can
>>  * not implement rx-wait for these devices.  dpif-netdev needs to poll
>> @@ -420,8 +427,10 @@ struct dp_netdev_pmd_thread {
>> /* threads on same numa node. */
>> unsigned core_id;   /* CPU core id of this pmd thread. */
>> int numa_id;/* numa node id of this pmd thread.
>> */
>> -int tx_qid; /* Queue id used by this pmd 

Re: [ovs-dev] [PATCH] dpif-netdev: proper tx queue id

2015-08-05 Thread Ilya Maximets


On 05.08.2015 19:26, Daniele Di Proietto wrote:
> 
> 
> On 05/08/2015 16:42, "Ilya Maximets"  wrote:
> 
>> Sorry, I agree that example is incorrect. It is really not true, because
>> of using ovs_numa_get_n_cores() to call netdev_set_multiq().
>> No, I didn't actually observe a bug.
>>
>> But there is another example:
>>
>>  same configuration(2 pmd threads with 1 port,
>>  2 rxqs per port and pmd-cpu-mask = 0014).
>>
>>  pmd_1->tx_qid = 2, pmd_2->tx_qid = 4,
>>  txq_needs_locking = true (if device hasn't ovs_numa_get_n_cores() 
>> queues)
>>
>>  Lets netdev->real_n_txq = 2; (device has 2 queues)
>>
>>  In that case, after truncating in netdev_dpdk_send__()
>>  'qid = qid % dev->real_n_txq;'
>>  pmd_1: qid = 2 % 2 = 0
>>  pmd_2: qid = 4 % 2 = 0
>>
>>  So, both threads will call dpdk_queue_pkts() with same qid = 0.
>>  This is unexpected behavior if there is 2 tx queues in device.
>>  Queue #1 will not be used and both threads will lock queue #0
>>  on each send.
> 
> Yes, that is true.  In general it is hard to properly distribute
> transmission queues because potentially every pmd thread can send
> a packet to any netdev.
> 
> I agree that we can do better than this. Currently we create a txq
> for every core (not for every pmd thread), because we don't want
> to reconfigure every device when a new thread is added (I remember
> discussing this with Alex, perhaps he can provide more insight).
> We can always change the code to create one txq for every pmd
> thread (I'd appreciate other opinions on this).
> 
>>
>> About your example:
>>  2 pmd threads can't call netdev_send() with same tx_qid,
>>  because pmd->tx_qid = pmd->core_id and there is only one thread
>>  with core_id = 0. See dp_netdev_configure_pmd().
>>
>>  So,
>>  pmd1 will call netdev_send(netdev=dpdk0, tx_qid= *pmd1->core_id* )
>>  pmd2 will call netdev_send(netdev=dpdk0, tx_qid= *pmd2->core_id* )
> 
> I agree, on current master they can't. I meant with this patch applied.
> 

Yes, with the patch applied, calls with the same tx_qid can happen
concurrently, but there is rte_spinlock_lock(&dev->tx_q[qid].tx_lock)
for that case inside netdev_dpdk_send__(). Other netdevs don't use
qid at all.
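
For illustration, a minimal self-contained model of that per-queue locking
(only a sketch: the real netdev_dpdk_send__() keeps a DPDK rte_spinlock in
dev->tx_q[qid], here replaced by a pthread mutex so the sketch builds
without DPDK):

--
#include <pthread.h>
#include <stdbool.h>

#define MODEL_MAX_TXQ 8

struct model_txq {
    pthread_mutex_t tx_lock;     /* stands in for the per-queue spinlock */
};

struct model_dev {
    int real_n_txq;              /* tx queues the device really has */
    bool txq_needs_locking;      /* true when pmds may share a queue */
    struct model_txq tx_q[MODEL_MAX_TXQ];
};

static void
model_send(struct model_dev *dev, int qid)
{
    qid %= dev->real_n_txq;      /* the truncation discussed above */

    if (dev->txq_needs_locking) {
        pthread_mutex_lock(&dev->tx_q[qid].tx_lock);
    }
    /* ... enqueue/transmit the packet burst on queue 'qid' ... */
    if (dev->txq_needs_locking) {
        pthread_mutex_unlock(&dev->tx_q[qid].tx_lock);
    }
}
--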

>>
>>  
>> On 05.08.2015 17:54, Daniele Di Proietto wrote:
>>>
>>>
>>> On 05/08/2015 13:28, "Ilya Maximets"  wrote:
>>>
>>>> Currently tx_qid is equal to pmd->core_id. This leads to wrong
>>>> behavior if pmd-cpu-mask different from '/(0*)(1|3|7)?(f*)/',
>>>> e.g. if core_ids are not sequential, or doesn't start from 0, or both.
>>>>
>>>> Example(just one of possible wrong scenarios):
>>>>
>>>>starting 2 pmd threads with 1 port, 2 rxqs per port and
>>>>pmd-cpu-mask = 0014.
>>>>
>>>>It that case pmd_1->tx_qid = 2, pmd_2->tx_qid = 4 and
>>>>txq_needs_locking = false (if device has 2 queues).
>>>>
>>>>While netdev_dpdk_send__() qid will not be truncated and
>>>>dpdk_queue_pkts() will be called for nonexistent queues (2 and 4).
>>>
>>> This shouldn't be possible, because the datapath requests one txq
>>> for each core in the system, not for each core in pmd-cpu-mask
>>> (see the calls to netdev_set_multiq() in dpif-netdev.c).
>>>
>>> Did you actually observe a bug or an unexpected behaviour?
>>>
>>> I didn't read the patch carefully (I want to understand the problem
>>> first),
>>> but it appears that two pmd threads could call netdev_send on the same
>>> port, with the same tx_qid concurrently.  Example:
>>>
>>> pmd1 is processing dpdk0 with rx_qid 0, pmd2 is processing dpdk1
>>> with
>>> rx_qid 0.
>>>
>>> The flow table is configured to send everything to dpdk0.
>>>
>>> pmd1 will call netdev_send(netdev=dpdk0, tx_qid=0)
>>> pmd2 will call netdev_send(netdev=dpdk0, tx_qid=0)
>>> 
>>> these calls can happen concurrently
>>>
>>>>
>>>> Fix that by calculating tx_qid from rxq indexes for each rxq
>>>> separately.
>>>> 'rxq_poll' structure supplemented by tx_qid and renamed to 'q_poll'.
>>>> 'poll_list' moved inside dp_netdev_pmd_thread structure to be able
>>>> to get proper tx_qid for current port while calling netdev_send().
>>>> Also, information about queues of each thread added to log.
>>>>
>>>> Signed-off-by: Ilya Maximets 
>>>> ---
>>>> lib/dpif-netdev.c | 102
>>>> ++
>>>> lib/netdev.c  |   6 
>>>> lib/netdev.h  |   1 +
>>>> 3 files changed, 57 insertions(+), 52 deletions(-)
> 
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v7 08/16] dpif-netdev: Add pmd thread local port cache for transmission.

2016-04-18 Thread Ilya Maximets
On 08.04.2016 06:13, Daniele Di Proietto wrote:
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dpif-netdev.c | 243 
> +++---
>  1 file changed, 175 insertions(+), 68 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8c5893d..5d1cc43 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -185,6 +185,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
>   *
>   *dp_netdev_mutex (global)
>   *port_mutex
> + *non_pmd_mutex
>   */
>  struct dp_netdev {
>  const struct dpif_class *const class;
> @@ -380,6 +381,13 @@ struct rxq_poll {
>  struct ovs_list node;
>  };
>  
> +/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
> +struct tx_port {
> +odp_port_t port_no;
> +struct netdev *netdev;
> +struct hmap_node node;
> +};
> +
>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>   * the performance overhead of interrupt processing.  Therefore netdev can
>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> @@ -436,10 +444,18 @@ struct dp_netdev_pmd_thread {
>  atomic_int tx_qid;  /* Queue id used by this pmd thread to
>   * send packets on all netdevs */
>  
> -struct ovs_mutex poll_mutex;/* Mutex for poll_list. */
> +struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. 
> */
>  /* List of rx queues to poll. */
>  struct ovs_list poll_list OVS_GUARDED;
> -int poll_cnt;   /* Number of elemints in poll_list. */
> +/* Number of elements in 'poll_list' */
> +int poll_cnt;
> +/* Map of 'tx_port's used for transmission.  Written by the main thread,
> + * read by the pmd thread. */
> +struct hmap tx_ports OVS_GUARDED;
> +
> +/* Map of 'tx_port' used in the fast path. This is a thread-local copy
> + * 'tx_ports'. */
> +struct hmap port_cache;
>  
>  /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>   * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> @@ -495,7 +511,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct 
> cmap_position *pos);
>  static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
>  static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
>  static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id);
> -static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd);
> +static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
>   struct dp_netdev_port *port);
>  static void
> @@ -509,6 +525,8 @@ static void dp_netdev_reset_pmd_threads(struct dp_netdev 
> *dp);
>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
> +static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
> +OVS_REQUIRES(pmd->port_mutex);
>  
>  static inline bool emc_entry_alive(struct emc_entry *ce);
>  static void emc_clear_entry(struct emc_entry *ce);
> @@ -691,7 +709,7 @@ pmd_info_show_rxq(struct ds *reply, struct 
> dp_netdev_pmd_thread *pmd)
>  ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
>pmd->numa_id, pmd->core_id);
>  
> -ovs_mutex_lock(&pmd->poll_mutex);
> +ovs_mutex_lock(&pmd->port_mutex);
>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>  const char *name = netdev_get_name(poll->port->netdev);
>  
> @@ -705,7 +723,7 @@ pmd_info_show_rxq(struct ds *reply, struct 
> dp_netdev_pmd_thread *pmd)
>  ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx));
>  prev_name = name;
>  }
> -ovs_mutex_unlock(&pmd->poll_mutex);
> +ovs_mutex_unlock(&pmd->port_mutex);
>  ds_put_cstr(reply, "\n");
>  }
>  }
> @@ -1078,6 +1096,11 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread 
> *pmd)
>  int old_seq;
>  
>  if (pmd->core_id == NON_PMD_CORE_ID) {
> +ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
> +ovs_mutex_lock(&pmd->port_mutex);
> +pmd_load_cached_ports(pmd);
> +ovs_mutex_unlock(&pmd->port_mutex);
> +ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
>  return;
>  }
>  
> @@ -1200,9 +1223,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
> const char *type,
>  
>  cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
>  
> -if (netdev_is_pmd(port->netdev)) {
> -dp_netdev_add_port_to_pmds(dp, port);
> -}
> +dp_netdev_add_port_to_pmds(dp, port);
>  seq_change(dp->port_seq);
>  
>  return 0;
> @@ -1371,6 +1392,9 @@ do_del_port(struct dp_netdev *dp, struct dp_netdev_p

Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Ilya Maximets
There was a reason for the 2 calls to dp_netdev_pmd_reload_done() inside
pmd_thread_main(): we must wait until the PMD thread is completely done
with reloading. This patch introduces a race condition on pmd->exit_latch.
While removing the last port on a numa node, dp_netdev_reload_pmd__(pmd)
will be called twice for each port: first to remove the port and then to
destroy the PMD thread. pmd->exit_latch is set between these two calls.
This leads to a probable situation where the PMD thread exits while
processing the first reload. The main thread will then wait forever on
cond_wait in the second reload. The situation is easily reproducible by
addition/deletion of the last port (possibly after a few iterations in a
loop).
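
A minimal sketch of the safe ordering (not the actual OVS code; all names
here are made up): the worker has to sample the exit flag before
acknowledging the reload, so that its decision to loop again matches the
request it just acknowledged:

--
#include <pthread.h>
#include <stdbool.h>

struct reload_handshake {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool reload_done;       /* set by worker, awaited by main thread */
    bool exit_requested;    /* models pmd->exit_latch */
};

/* Worker side: acknowledge the current reload request and report whether
 * the worker should exit.  Sampling 'exit_requested' after signalling
 * would let the worker exit on a flag set for a *later* reload, leaving
 * the main thread waiting forever for that reload's acknowledgement. */
static bool
ack_reload_and_check_exit(struct reload_handshake *hs)
{
    bool exiting;

    pthread_mutex_lock(&hs->mutex);
    exiting = hs->exit_requested;   /* sample _before_ signalling */
    hs->reload_done = true;
    pthread_cond_signal(&hs->cond);
    pthread_mutex_unlock(&hs->mutex);

    return exiting;
}
--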

Best regards, Ilya Maximets.

On 08.04.2016 06:13, Daniele Di Proietto wrote:
> The pmds and the main threads are synchronized using a condition
> variable.  The main thread writes a new configuration, then it waits on
> the condition variable.  A pmd thread reads the new configuration, then
> it calls signal() on the condition variable. To make sure that the pmds
> and the main thread have a consistent view, each signal() should be
> backed by a wait().
> 
> Currently the first signal() doesn't have a corresponding wait().  If
> the pmd thread takes a long time to start and the signal() is received
> by a later wait, the threads will have an inconsistent view.
> 
> The commit fixes the problem by removing the first signal() from the
> pmd thread.
> 
> This is hardly a problem on current master, because the main thread
> will call the first wait() a long time after the creation of a pmd
> thread.  It becomes a problem with the next commits.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dpif-netdev.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9c32c64..2424d3e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>  
>  static int
>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll 
> **ppoll_list)
> -OVS_REQUIRES(pmd->poll_mutex)
>  {
>  struct rxq_poll *poll_list = *ppoll_list;
>  struct rxq_poll *poll;
>  int i;
>  
> +ovs_mutex_lock(&pmd->poll_mutex);
>  poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>  
>  i = 0;
>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>  poll_list[i++] = *poll;
>  }
> +ovs_mutex_unlock(&pmd->poll_mutex);
>  
>  *ppoll_list = poll_list;
> -return pmd->poll_cnt;
> +return i;
>  }
>  
>  static void *
> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>  pmd_thread_setaffinity_cpu(pmd->core_id);
> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>  reload:
>  emc_cache_init(&pmd->flow_cache);
>  
> -ovs_mutex_lock(&pmd->poll_mutex);
> -poll_cnt = pmd_load_queues(pmd, &poll_list);
> -ovs_mutex_unlock(&pmd->poll_mutex);
> -
>  /* List port/core affinity */
>  for (i = 0; i < poll_cnt; i++) {
> VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> @@ -2699,10 +2697,6 @@ reload:
>  netdev_rxq_get_queue_id(poll_list[i].rx));
>  }
>  
> -/* Signal here to make sure the pmd finishes
> - * reloading the updated configuration. */
> -dp_netdev_pmd_reload_done(pmd);
> -
>  for (;;) {
>  for (i = 0; i < poll_cnt; i++) {
>  dp_netdev_process_rxq_port(pmd, poll_list[i].port, 
> poll_list[i].rx);
> @@ -2725,14 +2719,17 @@ reload:
>  }
>  }
>  
> +poll_cnt = pmd_load_queues(pmd, &poll_list);
> +/* Signal here to make sure the pmd finishes
> + * reloading the updated configuration. */
> +dp_netdev_pmd_reload_done(pmd);
> +
>  emc_cache_uninit(&pmd->flow_cache);
>  
>  if (!latch_is_set(&pmd->exit_latch)){
>  goto reload;
>  }
>  
> -dp_netdev_pmd_reload_done(pmd);
> -
>  free(poll_list);
>  return NULL;
>  }
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v7 08/16] dpif-netdev: Add pmd thread local port cache for transmission.

2016-04-19 Thread Ilya Maximets
On 19.04.2016 00:19, Daniele Di Proietto wrote:
> 
> 
> On 18/04/2016 07:50, "Ilya Maximets"  wrote:
> 
>> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>>> Signed-off-by: Daniele Di Proietto 
>>> ---
>>>  lib/dpif-netdev.c | 243
>>> +++---
>>>  1 file changed, 175 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 8c5893d..5d1cc43 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -185,6 +185,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
>>>   *
>>>   *dp_netdev_mutex (global)
>>>   *port_mutex
>>> + *non_pmd_mutex
>>>   */
>>>  struct dp_netdev {
>>>  const struct dpif_class *const class;
>>> @@ -380,6 +381,13 @@ struct rxq_poll {
>>>  struct ovs_list node;
>>>  };
>>>  
>>> +/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or
>>> 'tx_ports'. */
>>> +struct tx_port {
>>> +odp_port_t port_no;
>>> +struct netdev *netdev;
>>> +struct hmap_node node;
>>> +};
>>> +
>>>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to
>>> eliminate
>>>   * the performance overhead of interrupt processing.  Therefore netdev
>>> can
>>>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
>>> @@ -436,10 +444,18 @@ struct dp_netdev_pmd_thread {
>>>  atomic_int tx_qid;  /* Queue id used by this pmd
>>> thread to
>>>   * send packets on all netdevs */
>>>  
>>> -struct ovs_mutex poll_mutex;/* Mutex for poll_list. */
>>> +struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and
>>> 'tx_ports'. */
>>>  /* List of rx queues to poll. */
>>>  struct ovs_list poll_list OVS_GUARDED;
>>> -int poll_cnt;   /* Number of elemints in
>>> poll_list. */
>>> +/* Number of elements in 'poll_list' */
>>> +int poll_cnt;
>>> +/* Map of 'tx_port's used for transmission.  Written by the main
>>> thread,
>>> + * read by the pmd thread. */
>>> +struct hmap tx_ports OVS_GUARDED;
>>> +
>>> +/* Map of 'tx_port' used in the fast path. This is a thread-local
>>> copy
>>> + * 'tx_ports'. */
>>> +struct hmap port_cache;
>>>  
>>>  /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>>>   * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>>> @@ -495,7 +511,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct
>>> cmap_position *pos);
>>>  static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
>>>  static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int
>>> numa_id);
>>>  static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int
>>> numa_id);
>>> -static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread
>>> *pmd);
>>> +static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread
>>> *pmd);
>>>  static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
>>>   struct dp_netdev_port
>>> *port);
>>>  static void
>>> @@ -509,6 +525,8 @@ static void dp_netdev_reset_pmd_threads(struct
>>> dp_netdev *dp);
>>>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
>>> +static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
>>> +OVS_REQUIRES(pmd->port_mutex);
>>>  
>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>  static void emc_clear_entry(struct emc_entry *ce);
>>> @@ -691,7 +709,7 @@ pmd_info_show_rxq(struct ds *reply, struct
>>> dp_netdev_pmd_thread *pmd)
>>>  ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
>>>pmd->numa_id, pmd->core_id);
>>>  
>>> -ovs_mutex_lock(&pmd->poll_mutex);
>>> +ovs_mutex_lock(&pmd->port_mutex);
>>>  LIST_FOR_EACH

Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Ilya Maximets
On 19.04.2016 10:18, Ilya Maximets wrote:
> There was a reason for the 2 calls to dp_netdev_pmd_reload_done() inside
> pmd_thread_main(): we must wait until the PMD thread is completely done
> with reloading. This patch introduces a race condition on pmd->exit_latch.
> While removing the last port on a numa node, dp_netdev_reload_pmd__(pmd)
> will be called twice for each port: first to remove the port and then to
> destroy the PMD thread. pmd->exit_latch is set between these two calls.
> This leads to a probable situation where the PMD thread exits while
> processing the first reload. The main thread will then wait forever on
> cond_wait in the second reload. The situation is easily reproducible by
> addition/deletion of the last port (possibly after a few iterations in a
> loop).
> 
> Best regards, Ilya Maximets.

This incremental should help:
--
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 588d56f..2235297 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2785,6 +2785,7 @@ pmd_thread_main(void *f_)
 unsigned int port_seq = PMD_INITIAL_SEQ;
 int poll_cnt;
 int i;
+bool exiting;
 
 poll_cnt = 0;
 poll_list = NULL;
@@ -2825,14 +2826,15 @@ reload:
 }
 }
 
+emc_cache_uninit(&pmd->flow_cache);
+
 poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+exiting = latch_is_set(&pmd->exit_latch);
 /* Signal here to make sure the pmd finishes
  * reloading the updated configuration. */
 dp_netdev_pmd_reload_done(pmd);
 
-emc_cache_uninit(&pmd->flow_cache);
-
-if (!latch_is_set(&pmd->exit_latch)){
+if (!exiting) {
 goto reload;
 }
 
--

 
> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>> The pmds and the main threads are synchronized using a condition
>> variable.  The main thread writes a new configuration, then it waits on
>> the condition variable.  A pmd thread reads the new configuration, then
>> it calls signal() on the condition variable. To make sure that the pmds
>> and the main thread have a consistent view, each signal() should be
>> backed by a wait().
>>
>> Currently the first signal() doesn't have a corresponding wait().  If
>> the pmd thread takes a long time to start and the signal() is received
>> by a later wait, the threads will have an inconsistent view.
>>
>> The commit fixes the problem by removing the first signal() from the
>> pmd thread.
>>
>> This is hardly a problem on current master, because the main thread
>> will call the first wait() a long time after the creation of a pmd
>> thread.  It becomes a problem with the next commits.
>>
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>  lib/dpif-netdev.c | 21 +
>>  1 file changed, 9 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 9c32c64..2424d3e 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>>  
>>  static int
>>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll 
>> **ppoll_list)
>> -OVS_REQUIRES(pmd->poll_mutex)
>>  {
>>  struct rxq_poll *poll_list = *ppoll_list;
>>  struct rxq_poll *poll;
>>  int i;
>>  
>> +ovs_mutex_lock(&pmd->poll_mutex);
>>  poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>>  
>>  i = 0;
>>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>  poll_list[i++] = *poll;
>>  }
>> +ovs_mutex_unlock(&pmd->poll_mutex);
>>  
>>  *ppoll_list = poll_list;
>> -return pmd->poll_cnt;
>> +return i;
>>  }
>>  
>>  static void *
>> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>>  pmd_thread_setaffinity_cpu(pmd->core_id);
>> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>>  reload:
>>  emc_cache_init(&pmd->flow_cache);
>>  
>> -ovs_mutex_lock(&pmd->poll_mutex);
>> -poll_cnt = pmd_load_queues(pmd, &poll_list);
>> -ovs_mutex_unlock(&pmd->poll_mutex);
>> -
>>  /* List port/core affinity */
>>  for (i = 0; i < poll_cnt; i++) {
>> VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> @@ -2699,10 +2697,6 @@ reload:
>>   

Re: [ovs-dev] [PATCH v8 10/16] dpif-netdev: Use hmap for ports.

2016-04-20 Thread Ilya Maximets
On 20.04.2016 01:28, diproiettod at vmware.com (Daniele Di Proietto) wrote:
> netdev objects are hard to use with RCU, because it's not possible to
> split removal and reclamation.  Postponing the removal means that the
> port is not removed and cannot be readded immediately.  Waiting for
> reclamation means introducing a quiescent state, and that may introduce
> subtle bugs, due to the RCU model we use in userspace.
> 
> This commit changes the port container from cmap to hmap.  'port_mutex'
> must be held by readers and writers.  This shouldn't have performance
> impact, as readers in the fast path use a thread-local cache.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dpif-netdev.c | 96 
> +--
>  1 file changed, 57 insertions(+), 39 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd2249e..8cc37e2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -195,9 +195,10 @@ struct dp_netdev {
>  
>  /* Ports.
>   *
> - * Protected by RCU.  Take the mutex to add or remove ports. */
> + * Any lookup into 'ports' or any access to the dp_netdev_ports found
> + * through 'ports' requires taking 'port_mutex'. */
>  struct ovs_mutex port_mutex;
> -struct cmap ports;
> +struct hmap ports;
>  struct seq *port_seq;   /* Incremented whenever a port changes. */
>  
>  /* Protects access to ofproto-dpif-upcall interface during revalidator
> @@ -228,7 +229,8 @@ struct dp_netdev {
>  };
>  
>  static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev 
> *dp,
> -odp_port_t);
> +odp_port_t)
> +OVS_REQUIRES(&dp->port_mutex);

OVS_REQUIRES(dp->port_mutex);
here and in two more places below.
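
I.e., the suggested form of the declaration would be (sketch based on the
prototype in the quoted patch):

--
static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                    odp_port_t)
    OVS_REQUIRES(dp->port_mutex);
--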

>  
>  enum dp_stat_type {
>  DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
> @@ -248,7 +250,7 @@ enum pmd_cycles_counter_type {
>  struct dp_netdev_port {
>  odp_port_t port_no;
>  struct netdev *netdev;
> -struct cmap_node node;  /* Node in dp_netdev's 'ports'. */
> +struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
>  struct netdev_saved_flags *sf;
>  unsigned n_rxq; /* Number of elements in 'rxq' */
>  struct netdev_rxq **rxq;
> @@ -476,9 +478,11 @@ struct dpif_netdev {
>  };
>  
>  static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no,
> -  struct dp_netdev_port **portp);
> +  struct dp_netdev_port **portp)
> +OVS_REQUIRES(dp->port_mutex);
>  static int get_port_by_name(struct dp_netdev *dp, const char *devname,
> -struct dp_netdev_port **portp);
> +struct dp_netdev_port **portp)
> +OVS_REQUIRES(dp->port_mutex);
>  static void dp_netdev_free(struct dp_netdev *)
>  OVS_REQUIRES(dp_netdev_mutex);
>  static int do_add_port(struct dp_netdev *dp, const char *devname,
> @@ -522,7 +526,8 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
>   struct dp_netdev_port *port, struct netdev_rxq *rx);
>  static struct dp_netdev_pmd_thread *
>  dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
> -static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp);
> +static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
> +OVS_REQUIRES(dp->port_mutex);
>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
> @@ -913,7 +918,7 @@ create_dp_netdev(const char *name, const struct 
> dpif_class *class,
>  atomic_flag_clear(&dp->destroyed);
>  
>  ovs_mutex_init(&dp->port_mutex);
> -cmap_init(&dp->ports);
> +hmap_init(&dp->ports);
>  dp->port_seq = seq_create();
>  fat_rwlock_init(&dp->upcall_rwlock);
>  
> @@ -984,7 +989,7 @@ static void
>  dp_netdev_free(struct dp_netdev *dp)
>  OVS_REQUIRES(dp_netdev_mutex)
>  {
> -struct dp_netdev_port *port;
> +struct dp_netdev_port *port, *next;
>  
>  shash_find_and_delete(&dp_netdevs, dp->name);
>  
> @@ -993,15 +998,14 @@ dp_netdev_free(struct dp_netdev *dp)
>  ovsthread_key_delete(dp->per_pmd_key);
>  
>  ovs_mutex_lock(&dp->port_mutex);
> -CMAP_FOR_EACH (port, node, &dp->ports) {
> -/* PMD threads are destroyed here. do_del_port() cannot quiesce */
> +HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
>  do_del_port(dp, port);
>  }
>  ovs_mutex_unlock(&dp->port_mutex);
>  cmap_destroy(&dp->poll_threads);
>  
>  seq_destroy(dp->port_seq);
> -cmap_destroy(&dp->ports);
> +hmap_destroy(&dp->ports);
>  ovs_mutex_destroy(&dp->port_mutex);
>  
>  /* Upcalls must be disabled at this point */
> @@ -1222,7 +1226,7 @@ do_ad

Re: [ovs-dev] [dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Ilya Maximets
Hi, Christian.
You are likely using the tar archive of openvswitch from openvswitch.org.
Unfortunately, it doesn't contain many bug fixes from git/branch-2.5.

The problem that you are facing has been solved in branch-2.5 by

commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
Author: Ilya Maximets 
Date:   Thu Mar 3 11:30:06 2016 +0300

netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().

Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
    Signed-off-by: Ilya Maximets 
Acked-by: Flavio Leitner 
Acked-by: Daniele Di Proietto 

Best regards, Ilya Maximets.

> I assume there is a leak somewhere on adding/removing vhost_user ports.
> Although it could also be "only" a fragmentation issue.
> 
> Reproduction is easy:
> I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> Then in a loop I
>- add up to 512 more ports
>- test connectivity between the two guests
>- remove up to 512 ports
> 
> Depending on memory and the amount of multiqueue/rxq I use it seems to
> slightly change when exactly it breaks. But for my default setup of 4
> queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
> iteration.
> Here is a link to the stack trace indicating a memory shortage (TBC):
> https://launchpadlibrarian.net/253916410/apport-retrace.log
> 
> Known Todos:
> - I want to track it down more, and will try to come up with a non
> openvswitch based looping testcase that might show it as well to simplify
> debugging.
> - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04 and
> Openvswitch master is planned.
> 
> I will go on debugging this and let you know, but I wanted to give a heads
> up to everyone.
> In case this is a known issue for some of you please let me know.
> 
> Kind Regards,
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
> ovs-dpdk as well.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

