Re: [ovs-dev] [PATCH v2] odp-execute: Optimize IP header modification in OVS datapath

2016-12-22 Thread Daniele Di Proietto
2016-12-22 2:05 GMT-08:00 Zoltán Balogh :
> Hi Daniele,
>
> Thank you for the confirmation. I've used Ivy Bridge Xeon E5-2658 v2 running 
> at 3GHz. This is an older architecture than yours. Your results look better 
> for the new patch. I think we should take the new patch.

Great, I pushed the new patch to master.

Thanks!

Daniele
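The optimization discussed in this thread concerns masked set-field actions on the IPv4 header, including the suggestion to skip work when a field's mask is zero. Below is a minimal sketch of that idea, resting on several assumptions. The header is modeled as ten 16-bit words holding host-order values of the big-endian wire words. The names (`csum_full`, `csum_update16`, `set_ttl_masked`) are hypothetical and are not taken from `lib/odp-execute.c`. The checksum fix-up uses the incremental rule from RFC 1624 (eqn. 3) rather than recomputing the whole header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_WORDS     10  /* 20-byte IPv4 header, no options */
#define IP_TTL_WORD   4  /* TTL (high byte) | protocol (low byte) */
#define IP_CSUM_WORD  5  /* header checksum */

/* Internet checksum (RFC 1071) over 'n' 16-bit words. */
static uint16_t
csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    assert(words != NULL);
    for (size_t i = 0; i < n; i++) {
        sum += words[i];
    }
    while (sum >> 16) {                  /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m'). */
static uint16_t
csum_update16(uint16_t csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical masked TTL rewrite: bits set in 'mask' come from 'key'.
 * Returns early when the mask is zero or the value does not change, so
 * neither the header word nor the checksum is touched. */
static void
set_ttl_masked(uint16_t hdr[IP_WORDS], uint8_t key, uint8_t mask)
{
    if (!mask) {
        return;
    }
    uint16_t old_word = hdr[IP_TTL_WORD];
    uint8_t old_ttl = (uint8_t) (old_word >> 8);
    uint8_t new_ttl = (uint8_t) ((key & mask) | (old_ttl & ~mask));
    if (new_ttl == old_ttl) {
        return;
    }
    uint16_t new_word = (uint16_t) ((new_ttl << 8) | (old_word & 0x00ff));
    hdr[IP_TTL_WORD] = new_word;
    hdr[IP_CSUM_WORD] = csum_update16(hdr[IP_CSUM_WORD], old_word, new_word);
}
```

Skipping the zero-mask case avoids both the read-modify-write and the checksum update. As the measurements in this thread show, whether such extra branches pay off depends on the action mix, so treat this as an illustration rather than a recommendation.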

>
> Best regards,
> Zoltan
>
> -Original Message-
> From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
> Sent: Thursday, December 22, 2016 4:06 AM
> To: Zoltán Balogh 
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] odp-execute: Optimize IP header 
> modification in OVS datapath
>
> 2016-12-13 9:27 GMT-08:00 Zoltán Balogh :
>>
>> Hi Daniele,
>>
>>> Have you also tried avoiding computing the new field if the mask is 0?
>>> For example, if mask->ipv4_src is 0, there's no reason to compute
>>> new_ip_src, or to extract ip_src_nh.
>>
>> Yes, I investigated this case in mid-October. My finding was that
>> executing "output + dec_ttl" and "output + mod_nw_tos" became even faster,
>> but executing the rest of the actions I investigated became slower
>> compared to the patch I posted previously.
>> So I rejected this modification.
>>
>> Now I have rebased the code onto the 9 December master and run the tests
>> with DPDK 16.11 again. Please find the new rebased patch below.
>> The results of my original patch and the new patch:
>>
>>          | TTL | TOS | IPs | IPd |  Vanilla OVS  ||  + old patch  |  + new patch
>>          |     |     |     |     | (nsec/packet) || (nsec/packet) | (nsec/packet)
>> ---------+-----+-----+-----+-----+---------------++---------------+---------------
>> output   |     |     |     |     |     67.19     ||     67.19     |     67.19
>>          |  X  |     |     |     |     74.48     ||     69.53     |     68.78 (+)
>>          |     |  X  |     |     |     74.42     ||     71.20     |     70.07 (+)
>>          |     |     |  X  |     |     84.62     ||     78.70     |     78.03 (+)
>>          |     |     |     |  X  |     84.25     ||     78.51     |     77.94 (+)
>>          |     |     |  X  |  X  |     97.46     ||     91.98     |     91.86 (+)
>>          |  X  |     |  X  |  X  |    100.42     ||     95.00     |     96.00 (-)
>>          |  X  |  X  |  X  |  X  |    102.80     ||    100.40     |    100.73 (-)
>>
>> I ran each test five times. The values are the mean of the readings obtained.
>>
>> As you can see, there is an improvement in the first 5 cases. However, for
>> "output + dec_ttl + mod_nw_src + mod_nw_dst" and
>> "output + dec_ttl + mod_nw_tos + mod_nw_src + mod_nw_dst", executing the
>> actions became slower compared to the old patch. Even so, it is still
>> faster than vanilla OVS.
>>
>> Could you please confirm the results?
>> Are these values acceptable?
>> Is there anything still to improve in the new patch?
>
> Sorry for the delay, and thanks for the details and the new version.
>
> I've tried to reproduce your tests (64-byte UDP phy-phy throughput)
> on my system (2 GHz Haswell with ixgbe).
>
> Here are my results:
>
>          | TTL | TOS | IPs | IPd |  Vanilla OVS  ||  + old patch  |  + new patch
>          |     |     |     |     | (nsec/packet) || (nsec/packet) | (nsec/packet)
> ---------+-----+-----+-----+-----+---------------++---------------+---------------
> output   |     |     |     |     |     67.20     ||     67.20     |     67.20
>          |  X  |     |     |     |     87.03     ||     82.17     |     80.19 (+)
>          |     |  X  |     |     |     87.03     ||     81.37     |     81.50 (-)
>          |     |     |  X  |     |     95.88     ||     90.66     |     87.26 (+)
>          |     |     |     |  X  |     95.51     ||     90.42     |     87.18 (+)
>          |     |     |  X  |  X  |    107.53     ||    103.41     |    100.20 (+)
>          |  X  |     |  X  |  X  |    111.98     ||    108.23     |    105.49 (+)
>          |  X  |  X  |  X  |  X  |    116.01     ||    112.49     |    111.11 (+)
>
> There are no regressions compared to master in any case, which is good.
>
> Both versions look good to me.  Since you did the original profiling, which
> version do you prefer?
>
> Thanks,
>
> Daniele
>
>>
>> The patch was applied to: 7971b36c3acc279f8e931d360f16c200752a3be2
>>
>> Best regards,
>> Zoltan
>>
>> Signed-off-by: Zoltán Balogh 
>>
>> ---
>>
>> diff --git a/lib/odp-execute.c b/lib/odp-execute.c
>> index 65a6fcd..cc555b8 100644
>> --- a/lib/odp-execute.c
>> +++ b/lib/odp-execute.c
>> @@ -33,6 +33,7 @@
>>  #include "flow.h"

Re: [ovs-dev] [PATCH 1/2] conntrack: Do not create new connections from ICMP errors.

2016-12-22 Thread Daniele Di Proietto
 && key->nw_proto == IPPROTO_ICMP) {
>-return (!related || check_l4_icmp(data, size))
>-   && extract_l4_icmp(key, data, size, related);
>+return (!maybe_related || check_l4_icmp(data, size))
>+   && extract_l4_icmp(key, data, size, maybe_related);
> } else if (key->dl_type == htons(ETH_TYPE_IPV6)
>&& key->nw_proto == IPPROTO_ICMPV6) {
>-return (!related || check_l4_icmp6(key, data, size, l3))
>-   && extract_l4_icmp6(key, data, size, related);
>+return (!maybe_related || check_l4_icmp6(key, data, size, l3))
>+   && extract_l4_icmp6(key, data, size, maybe_related);
> } else {
> return false;
> }
>@@ -975,7 +979,7 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
> }
> 
> if (ok) {
>-if (extract_l4(&ctx->key, l4, tail - l4, &ctx->related, l3)) {
>+if (extract_l4(&ctx->key, l4, tail - l4, &ctx->maybe_related, l3)) {
> ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
> return true;
> }
>
>
>On 12/20/16, 12:25 PM, "ovs-dev-boun...@openvswitch.org on behalf of Daniele Di Proietto" <diproiet...@vmware.com> wrote:
>
>ICMP error packets (e.g. destination unreachable messages) are
>considered 'related' to another connection and are treated as part of
>that.
>
>However:
>
>* We shouldn't create new entries in the connection table if the
>  original connection is not found.  This is consistent with what the
>  kernel does.
>* We certainly shouldn't call valid_new() on the packet, because
>  valid_new() assumes the packet l4 type (might be TCP, UDP or ICMP)
>  to be consistent with the conn_key nw_proto type.
>
>Found by inspection.
>
>Fixes: a489b16854b5("conntrack: New userspace connection tracker.")
>Signed-off-by: Daniele Di Proietto 
>---
> lib/conntrack.c | 50 
> +
> tests/system-traffic.at | 27 +++---
> 2 files changed, 42 insertions(+), 35 deletions(-)
>
>diff --git a/lib/conntrack.c b/lib/conntrack.c
>index 7c50a28..d459321 100644
>--- a/lib/conntrack.c
>+++ b/lib/conntrack.c
>@@ -213,38 +213,40 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
> struct conn *conn = ctx->conn;
> uint16_t state = 0;
> 
>-if (conn) {
>-if (ctx->related) {
>+if (ctx->related) {
>+if (conn) {
> state |= CS_RELATED;
> if (ctx->reply) {
> state |= CS_REPLY_DIR;
> }
> } else {
>-enum ct_update_res res;
>+state |= CS_INVALID;
>+}
>+} else if (conn) {
>+enum ct_update_res res;
> 
>-res = conn_update(conn, &ct->buckets[bucket], pkt,
>-  ctx->reply, now);
>+res = conn_update(conn, &ct->buckets[bucket], pkt,
>+  ctx->reply, now);
> 
>-switch (res) {
>-case CT_UPDATE_VALID:
>-state |= CS_ESTABLISHED;
>-if (ctx->reply) {
>-state |= CS_REPLY_DIR;
>-}
>-break;
>-case CT_UPDATE_INVALID:
>-state |= CS_INVALID;
>-break;
>-case CT_UPDATE_NEW:
>-ovs_list_remove(&conn->exp_node);
>-hmap_remove(&ct->buckets[bucket].connections, &conn->node);
>-atomic_count_dec(&ct->n_conn);
>-delete_conn(conn);
>-conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
>-break;
>-default:
>-OVS_NOT_REACHED();
>+switch (res) {
>+case CT_UPDATE_VALID:
>+state |= CS_ESTABLISHED;
>+if (ctx->reply) {
>+state |= CS_REPLY_DIR;
> }
>+break;
>+case CT_UPDATE_INVALID:
>+state |= CS_INVALID;
>+break;
>+case CT_UPDATE_NEW:
>+ovs_list_remove(&conn->exp_node);
>+hmap_remove(&ct->buckets[

[ovs-dev] [PATCH v2 1/3] conntrack: Do not create new connections from ICMP errors.

2016-12-22 Thread Daniele Di Proietto
ICMP error packets (e.g. destination unreachable messages) are
considered 'related' to another connection and are treated as part of
that.

However:

* We shouldn't create new entries in the connection table if the
  original connection is not found.  This is consistent with what the
  kernel does.
* We certainly shouldn't call valid_new() on the packet, because
  valid_new() assumes the packet l4 type (might be TCP, UDP or ICMP)
  to be consistent with the conn_key nw_proto type.

Found by inspection.

Fixes: a489b16854b5("conntrack: New userspace connection tracker.")
Signed-off-by: Daniele Di Proietto 
---
v2: Handle ICMP errors for nonexistent connections in the else branch
without restructuring the whole code flow.
---
 lib/conntrack.c |  6 +-
 tests/system-traffic.at | 27 ---
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 7c50a28..9bea3d9 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -247,7 +247,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 }
 } else {
-conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
+if (ctx->related) {
+state |= CS_INVALID;
+} else {
+conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
+}
 }
 
 write_ct_md(pkt, state, zone, conn ? conn->mark : 0,
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 9ea6d6b..a5023d3 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1331,12 +1331,8 @@ ADD_VETH(p1, at_ns1, br0, "172.16.0.2/24")
 
 dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
 AT_DATA([flows.txt], [dnl
-priority=1,action=drop
-priority=10,arp,action=normal
-priority=100,in_port=1,udp,ct_state=-trk,action=ct(commit,table=0)
-priority=100,in_port=1,ip,ct_state=+trk,actions=controller
-priority=100,in_port=2,ip,ct_state=-trk,action=ct(table=0)
-priority=100,in_port=2,ip,ct_state=+trk+rel+rpl,action=controller
+table=0,ip,action=ct(commit,table=1)
+table=1,ip,action=controller
 ])
 
 AT_CHECK([ovs-ofctl --bundle replace-flows br0 flows.txt])
@@ -1345,22 +1341,31 @@ AT_CAPTURE_FILE([ofctl_monitor.log])
 AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
 
 dnl 1. Send an ICMP port unreach reply for port 8738, without any previous request
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) 'f64c473528c9c6f54ecb72db080045c0003d2e874001f355ac14ac130303553f4521317040004011b138ac13ac14000d20966369616f0a'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 resubmit\(,0\) 'f64c473528c9c6f54ecb72db080045c0003d2e874001f351ac14ac130303da494521317040004011b138ac13ac14000d20966369616f0a'])
 
 dnl 2. Send and UDP packet to port 
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit,table=0\) 'c6f94ecb72dbe64c473528c908004521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 resubmit\(,0\) 'c6f94ecb72dbe64c473528c908004521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
 
 dnl 3. Send an ICMP port unreach reply for port , related to the first packet
-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) 'e64c473528c9c6f94ecb72db080045c0003d2e874001f355ac12ac110303553f4521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 resubmit\(,0\) 'e64c473528c9c6f94ecb72db080045c0003d2e874001f355ac12ac110303553f4521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
 
 dnl Check this output. We only see the latter two packets, not the first.
 AT_CHECK([cat ofctl_monitor.log], [0], [dnl
-NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=47 ct_state=new|trk,in_port=1 (via action) data_len=47 (unbuffered)
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=75 ct_state=inv|trk,in_port=2 (via action) data_len=75 (unbuffered)
+icmp,vlan_tci=0x,dl_src=c6:f5:4e:cb:72:db,dl_dst=f6:4c:47:35:28:c9,nw_src=172.16.0.4,nw_dst=172.16.0.3,nw_tos=192,nw_ecn=0,nw_ttl=64,icmp_type=3,icmp_code=3
 icmp_csum:da49
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=47 ct_state=new|trk,in_port=1 (via action) data_len=47 (unbuffered)
 udp,vlan_tci=0x,dl_src=e6:4c:47:35:28:c9,dl_dst=c6:f9:4e:cb:72:db,nw_src=172.16.0.1,nw_dst=172.16.0.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=41614,tp_dst=
 udp_csum:2096
-NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=75 ct_state=rel|rpl|trk,in_port=2 (via action) data_len=75 (unbuffered)
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=75 ct_state=rel|rpl|trk,in_port=2 (via 

[ovs-dev] [PATCH v2 2/3] conntrack: Return NEW for IPv6 ND packets without tracking.

2016-12-22 Thread Daniele Di Proietto
The userspace connection tracker treats Neighbor Discovery packets
as invalid, because they're not checked against any connection.

This is inconsistent with the kernel connection tracker, which always
returns 'CS_NEW'.

Therefore, this commit makes the userspace connection tracker conform
to the kernel.  ND packets still do not create or read any state, but
they're treated as NEW.

To support this, the key extraction functions can now return
KEY_NO_TRACK, meaning that the packet is ok, but it should be treated
statelessly.
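To illustrate the new return value, here is a minimal sketch of how an ICMPv6 key extractor could classify message types, and how the resulting status maps to a ct_state, mirroring `key_status_to_cs()` from this patch. The `SKETCH_*` names are hypothetical. The numeric ICMPv6 types come from RFC 4443 and RFC 4861. Exactly which ND types the real patch exempts from tracking is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_key_status {
    SKETCH_KEY_INVALID,   /* could not extract a key: invalid */
    SKETCH_KEY_OK,        /* key extracted: look it up / track it */
    SKETCH_KEY_NO_TRACK,  /* packet is fine but handled statelessly */
};

/* Hypothetical ct_state bits (not OVS's actual CS_* values). */
#define SKETCH_CS_NEW     0x01
#define SKETCH_CS_INVALID 0x10

static enum sketch_key_status
classify_icmp6_type(uint8_t type)
{
    switch (type) {
    case 1: case 2: case 3: case 4:  /* errors (nested parsing not shown) */
    case 128: case 129:              /* echo request / echo reply */
        return SKETCH_KEY_OK;
    case 133: case 134:              /* router solicitation / advertisement */
    case 135: case 136:              /* neighbor solicitation / advertisement */
    case 137:                        /* redirect */
        return SKETCH_KEY_NO_TRACK;
    default:
        return SKETCH_KEY_INVALID;
    }
}

/* Same shape as key_status_to_cs(): only INVALID maps to INVALID. */
static uint16_t
sketch_status_to_cs(enum sketch_key_status s)
{
    switch (s) {
    case SKETCH_KEY_INVALID:
        return SKETCH_CS_INVALID;
    case SKETCH_KEY_OK:
    case SKETCH_KEY_NO_TRACK:
        return SKETCH_CS_NEW;
    }
    assert(0 && "unreachable");
    return SKETCH_CS_INVALID;
}
```

In the batch loop of the diff, only `KEY_OK` packets go on to the connection lookup. Packets with either of the other statuses get their ct_state written immediately, which is why ND packets end up NEW without creating or reading any state.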

We also have to remove a test that explicitly checked that neighbor
discovery was treated as invalid.

Reported-by: Sridhar Gaddam 
Signed-off-by: Daniele Di Proietto 
---
v2: Update comment to reflect that we do not do special validation with
the packet.
---
 lib/conntrack.c | 134 
 tests/ofproto-dpif.at   |  32 
 tests/system-traffic.at |  35 +
 3 files changed, 125 insertions(+), 76 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 9bea3d9..86228d6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -52,9 +52,17 @@ struct conn_lookup_ctx {
 bool related;
 };
 
-static bool conn_key_extract(struct conntrack *, struct dp_packet *,
- ovs_be16 dl_type, struct conn_lookup_ctx *,
- uint16_t zone);
+enum key_status {
+KEY_INVALID,   /* Could not extract the connection key: invalid. */
+KEY_OK,/* Connection key is ok. */
+KEY_NO_TRACK,  /* Connection key is ok, but it should not be tracked. */
+};
+
+static enum key_status conn_key_extract(struct conntrack *,
+struct dp_packet *,
+ovs_be16 dl_type,
+struct conn_lookup_ctx *,
+uint16_t zone);
 static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis);
 static void conn_key_reverse(struct conn_key *);
 static void conn_key_lookup(struct conntrack_bucket *ctb,
@@ -157,6 +165,20 @@ static unsigned hash_to_bucket(uint32_t hash)
 return (hash >> (32 - CONNTRACK_BUCKETS_SHIFT)) % CONNTRACK_BUCKETS;
 }
 
+static uint16_t
+key_status_to_cs(enum key_status s)
+{
+switch (s) {
+case KEY_INVALID:
+return CS_INVALID;
+case KEY_OK:
+case KEY_NO_TRACK:
+return CS_NEW;
+default:
+OVS_NOT_REACHED();
+}
+}
+
 static void
 write_ct_md(struct dp_packet *pkt, uint16_t state, uint16_t zone,
 uint32_t mark, ovs_u128 label)
@@ -303,10 +325,13 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
 
 memset(bucket_list, INT8_C(-1), sizeof bucket_list);
 for (i = 0; i < cnt; i++) {
+enum key_status extract_res;
 unsigned bucket;
 
-if (!conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone)) {
-write_ct_md(pkts[i], CS_INVALID, zone, 0, OVS_U128_ZERO);
+extract_res = conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone);
+if (extract_res != KEY_OK) {
+write_ct_md(pkts[i], key_status_to_cs(extract_res), zone, 0,
+OVS_U128_ZERO);
 continue;
 }
 
@@ -693,8 +718,11 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
 return key->src.port && key->dst.port;
 }
 
-static inline bool extract_l4(struct conn_key *key, const void *data,
-  size_t size, bool *related, const void *l3);
+static inline enum key_status extract_l4(struct conn_key *key,
+ const void *data,
+ size_t size,
+ bool *related,
+ const void *l3);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -724,14 +752,14 @@ reverse_icmp_type(uint8_t type)
  * instead and set *related to true.  If 'related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
-static inline int
+static inline enum key_status
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 bool *related)
 {
 const struct icmp_header *icmp = data;
 
 if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
-return false;
+return KEY_INVALID;
 }
 
 switch (icmp->icmp_type) {
@@ -742,13 +770,15 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 case ICMP4_INFOREQUEST:
 case ICMP4_INFOREPLY:
 if (icmp->icmp_code != 0) {
-return false;
+return KEY_INVALID;
 }
 /* Separate ICMP connection: identified using id */
 key->src.icmp_id = key->dst.icmp_id = icmp->icmp_fields.echo.id;
 key->src.icmp_type = icmp->i

[ovs-dev] [PATCH v2 3/3] conntrack: Use 'maybe_related' instead of 'related'.

2016-12-22 Thread Daniele Di Proietto
This is just a naming change.  When we extract the key of an ICMP error
message we suspect that it might be related, but we're not sure until we
perform a lookup in the connection table.
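The reasoning behind the rename can be made concrete with a small sketch. The key taken from an ICMP error is the nested (original) header with source and destination swapped, and the packet only becomes RELATED if that key matches a known connection. Everything here is a hypothetical simplification of the hash-table lookup and `conn_key_reverse()` used in `lib/conntrack.c`, including `struct sketch_key`, the linear-scan `sketch_lookup()`, and `sketch_key_from_icmp_error()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection key (IPv4 5-tuple). */
struct sketch_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

static void
sketch_key_reverse(struct sketch_key *k)
{
    uint32_t ip = k->src_ip;
    uint16_t port = k->src_port;

    k->src_ip = k->dst_ip;
    k->dst_ip = ip;
    k->src_port = k->dst_port;
    k->dst_port = port;
}

static bool
sketch_key_eq(const struct sketch_key *a, const struct sketch_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip
           && a->src_port == b->src_port && a->dst_port == b->dst_port
           && a->proto == b->proto;
}

/* Linear scan over known connections (stored in original direction).
 * Returns true on a match and tells the caller which direction matched. */
static bool
sketch_lookup(const struct sketch_key *table, size_t n,
              const struct sketch_key *key, bool *reply)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct sketch_key rev = table[i];

        if (sketch_key_eq(&table[i], key)) {
            *reply = false;
            return true;
        }
        sketch_key_reverse(&rev);
        if (sketch_key_eq(&rev, key)) {
            *reply = true;
            return true;
        }
    }
    return false;
}

/* Key extraction for an ICMP error: use the nested header, reversed,
 * and flag the packet as only *possibly* related. */
static struct sketch_key
sketch_key_from_icmp_error(const struct sketch_key *inner,
                           bool *maybe_related)
{
    struct sketch_key k = *inner;

    sketch_key_reverse(&k);
    *maybe_related = true;
    return k;
}
```

A failed lookup on such a key is exactly the case that the first patch in this series marks as invalid instead of creating a new connection.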

Suggested-by: Darrell Ball 
Signed-off-by: Daniele Di Proietto 
---
 lib/conntrack.c | 52 ++--
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 86228d6..902e370 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -49,7 +49,7 @@ struct conn_lookup_ctx {
 struct conn *conn;
 uint32_t hash;
 bool reply;
-bool related;
+bool maybe_related;
 };
 
 enum key_status {
@@ -236,7 +236,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 uint16_t state = 0;
 
 if (conn) {
-if (ctx->related) {
+if (ctx->maybe_related) {
 state |= CS_RELATED;
 if (ctx->reply) {
 state |= CS_REPLY_DIR;
@@ -269,7 +269,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 }
 } else {
-if (ctx->related) {
+if (ctx->maybe_related) {
 state |= CS_INVALID;
 } else {
 conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
@@ -721,7 +721,7 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
 static inline enum key_status extract_l4(struct conn_key *key,
  const void *data,
  size_t size,
- bool *related,
+ bool *maybe_related,
  const void *l3);
 
 static uint8_t
@@ -747,14 +747,14 @@ reverse_icmp_type(uint8_t type)
 }
 }
 
-/* If 'related' is not NULL and the function is processing an ICMP
+/* If 'maybe_related' is not NULL and the function is processing an ICMP
  * error packet, extract the l3 and l4 fields from the nested header
- * instead and set *related to true.  If 'related' is NULL we're
+ * instead and set *maybe_related to true.  If 'maybe_related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
 static inline enum key_status
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *maybe_related)
 {
 const struct icmp_header *icmp = data;
 
@@ -793,7 +793,7 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 enum key_status res;
 bool ok;
 
-if (!related) {
+if (!maybe_related) {
 return KEY_INVALID;
 }
 
@@ -817,7 +817,7 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 res = extract_l4(key, l4, tail - l4, NULL, l3);
 if (res != KEY_INVALID) {
 conn_key_reverse(key);
-*related = true;
+*maybe_related = true;
 }
 return res;
 }
@@ -839,14 +839,14 @@ reverse_icmp6_type(uint8_t type)
 }
 }
 
-/* If 'related' is not NULL and the function is processing an ICMP
+/* If 'maybe_related' is not NULL and the function is processing an ICMP
  * error packet, extract the l3 and l4 fields from the nested header
- * instead and set *related to true.  If 'related' is NULL we're
+ * instead and set *maybe_related to true.  If 'maybe_related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
 static inline enum key_status
 extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
- bool *related)
+ bool *maybe_related)
 {
 const struct icmp6_header *icmp6 = data;
 
@@ -882,7 +882,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 enum key_status res;
 bool ok;
 
-if (!related) {
+if (!maybe_related) {
 return KEY_INVALID;
 }
 
@@ -908,7 +908,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 res = extract_l4(key, l4, tail - l4, NULL, l3);
 if (res != KEY_INVALID) {
 conn_key_reverse(key);
-*related = true;
+*maybe_related = true;
 }
 return res;
 }
@@ -929,33 +929,33 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 /* Extract l4 fields into 'key', which must already contain valid l3
  * members.
  *
- * If 'related' is not NULL and an ICMP error packet is being
+ * If 'maybe_related' is not NULL and an ICMP error packet is being
  * processed, the function will extract the key from the packet nested
- * in the ICMP paylod and set '*related' to true.
+ * in the ICMP paylod and set '*maybe_related' to true.
  *
 * If

Re: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Arbitrary 'dpdk' port naming

2016-12-22 Thread Daniele Di Proietto
2016-12-22 3:05 GMT-08:00 Kevin Traynor :
> On 12/22/2016 10:02 AM, Kevin Traynor wrote:
>> On 12/21/2016 07:35 PM, Daniele Di Proietto wrote:
>>> 2016-12-21 10:18 GMT-08:00 Kevin Traynor :
>>>> On 12/21/2016 03:02 AM, Daniele Di Proietto wrote:
>>>>> 2016-12-20 14:08 GMT-08:00 Kevin Traynor :
>>>>>> On 12/15/2016 11:54 AM, Ciara Loftus wrote:
>>>>>>> 'dpdk' ports no longer have naming restrictions. Now, instead of
>>>>>>> specifying the dpdk port ID as part of the name, the PCI address of the
>>>>>>> device must be specified via the 'dpdk-devargs' option, e.g.:
>>>>>>>
>>>>>>> ovs-vsctl add-port br0 my-port
>>>>>>> ovs-vsctl set Interface my-port type=dpdk
>>>>>>> ovs-vsctl set Interface my-port options:dpdk-devargs=:06:00.3
>>>>>>
>>>>>> I wouldn't encourage people to split up commands like above as they'll
>>>>>> see errors and warnings.
>>>>>
>>>>> Good point
>>>>>
>>>>>>
>>>>>> If you use the old command (which people surely will), it's not
>>>>>> intuitive that it's now still a valid cmd but incomplete for setting up
>>>>>> the port:
>>>>>>
>>>>>> []# ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>>>>>> 2016-12-20T19:53:54Z|00051|netdev|WARN|dpdk0: could not set
>>>>>> configuration (Invalid argument)
>>>>>> ovs-vsctl: Error detected while setting up 'dpdk0'.  See ovs-vswitchd
>>>>>> log for details.
>>>>>>
>>>>>> It would be nice if this was just a warning and more informative - this
>>>>>> could be an incremental change also.
>>>>>
>>>>> You're right, vsctl errors are pretty obscure in this case. I think a
>>>>> first step is to improve what ovs-vsctl reports to the user. I sent a
>>>>> patch here:
>>>>>
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326542.html
>>>>>
>>>>> The second step would be to allow netdev_dpdk_set_config() to return
>>>>> an error string that can be printed by ovs-vsctl.  I'm fine with doing
>>>>> that later.
>>>>> What do you guys think?
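The second step proposed here, letting the configuration hook return a human-readable reason, follows a common C pattern. The function returns an errno value and, on failure, stores a heap-allocated message in an out-parameter that the caller prints and frees. Here is a minimal sketch of it. The `sketch_set_config()` signature is hypothetical and is not the actual `netdev_dpdk_set_config()` prototype. A portable `vsnprintf`-based formatter stands in for OVS's own helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly malloc'd buffer (portable asprintf stand-in). */
static char *
sketch_xasprintf(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    int len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (len < 0) {
        return NULL;
    }
    char *buf = malloc((size_t) len + 1);
    if (buf) {
        va_start(args, fmt);
        vsnprintf(buf, (size_t) len + 1, fmt, args);
        va_end(args);
    }
    return buf;
}

/* Hypothetical configuration hook: returns 0 on success or an errno
 * value; on failure '*errp' receives a message owned by the caller, so
 * a CLI can print more than "Invalid argument". */
static int
sketch_set_config(const char *name, const char *devargs, char **errp)
{
    assert(errp != NULL);
    *errp = NULL;
    if (!devargs || !strlen(devargs)) {
        *errp = sketch_xasprintf("%s: 'dpdk-devargs' is required for "
                                 "type=dpdk interfaces", name);
        return EINVAL;
    }
    return 0;
}
```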
>>>>
>>>> sounds like a good plan to me.
>>>>
>>>> I've done some testing with this patch today and I can't seem to get
>>>> hotplug working after applying 2/3. It works with 1/3 but I'm seeing
>>>> hangs with the new scheme in 2/3 and a dpdk seg fault with 3/3. I think
>>>> maybe my dpdk build is bad or my steps are wrong but it would be good if
>>>> someone else could test too.
>>>>
>>>> - dpdk-devbind.py -u :01:00.0 :01:00.1
>>>> - start vswitchd and add bridge
>>>> - dpdk-devbind.py -b igb_uio :01:00.0 :01:00.1
>>>> - ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>>>> options:dpdk-devargs=:01:00.0
>>>>
>>>
>>> I think there's a bug in DPDK 16.11 that should be fixed by this commit:
>>>
>>> f9ae888b1e19("ethdev: fix port lookup if none").
>>>
>>> Does it work if you include the fix in your DPDK build?
>>
>> yes - that was the issue I was hitting for the seg fault, thanks. The
>> hang is fixed too, I think my igb_uio was out of date.
>>
>> It is reasonable to expect that someone might mistakenly try to add a port
>> when there are no devices in DPDK, but the DPDK fix won't be available
>> until 16.11.1, so we need an equivalent check in OVS. I tested
>> attach/detach with no device present, with and without this incremental,
>> and it works now.
>
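The incremental diff relies on the left-to-right short-circuit evaluation of `||`. If the device count is zero, the name lookup affected by the DPDK 16.11 bug is never reached. The sketch below demonstrates that ordering with hypothetical stubs. `stub_dev_count()`, `stub_get_port_by_name()` and `stub_is_valid_port()` stand in for `rte_eth_dev_count()`, `rte_eth_dev_get_port_by_name()` and `rte_eth_dev_is_valid_port()`. The diff is truncated before it shows what the real function does when the lookup fails, so the sketch simply returns `UINT8_MAX`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PORTS 4

/* Hypothetical device table standing in for DPDK's ethdev list. */
static const char *sketch_ports[SKETCH_MAX_PORTS];
static unsigned sketch_n_ports;
static unsigned sketch_lookup_calls;  /* counts lookups, for the demo */

static unsigned
stub_dev_count(void)
{
    return sketch_n_ports;
}

/* Returns 0 and sets '*port_id' on success, nonzero otherwise. */
static int
stub_get_port_by_name(const char *name, uint8_t *port_id)
{
    sketch_lookup_calls++;
    for (unsigned i = 0; i < sketch_n_ports; i++) {
        if (sketch_ports[i] && !strcmp(sketch_ports[i], name)) {
            *port_id = (uint8_t) i;
            return 0;
        }
    }
    return -1;
}

static bool
stub_is_valid_port(uint8_t port_id)
{
    return port_id < sketch_n_ports;
}

/* Same guard order as the incremental: with no devices, the lookup is
 * skipped entirely. */
static uint8_t
sketch_process_devargs(const char *devargs)
{
    uint8_t port_id = UINT8_MAX;

    assert(devargs != NULL);
    if (!stub_dev_count()
        || stub_get_port_by_name(devargs, &port_id)
        || !stub_is_valid_port(port_id)) {
        return UINT8_MAX;  /* not found: real handling not shown */
    }
    return port_id;
}
```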
> pah...I didn't test when there was a single device bound after vswitchd
> starts and of course it breaks, so it should be:
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 75559fe..11c007d 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1097,8 +1097,9 @@ netdev_dpdk_process_devargs(const char *devargs)
>  {
>  uint8_t new_port_id = UINT8_MAX;
>
> -if (rte_eth_dev_get_port_by_name(devargs, &new_port_id)
> -|| !rte_eth_dev_is_valid_port(new_port_id)) {
> +if (!rte_eth_dev_count()
> +   || rte_eth_dev_get_por

[ovs-dev] [PATCH v3 2/2] conntrack: Use 'maybe_related' instead of 'related'.

2016-12-23 Thread Daniele Di Proietto
This is just a naming change.  When we extract the key of an ICMP error
message we suspect that it might be related, but we're not sure until we
perform a lookup in the connection table.

Suggested-by: Darrell Ball 
Signed-off-by: Daniele Di Proietto 
Acked-by: Darrell Ball 
---
 lib/conntrack.c | 52 ++--
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index d2b5f3a..a9627f9 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -49,7 +49,7 @@ struct conn_lookup_ctx {
 struct conn *conn;
 uint32_t hash;
 bool reply;
-bool related;
+bool maybe_related;
 };
 
 enum key_status {
@@ -236,7 +236,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 uint16_t state = 0;
 
 if (conn) {
-if (ctx->related) {
+if (ctx->maybe_related) {
 state |= CS_RELATED;
 if (ctx->reply) {
 state |= CS_REPLY_DIR;
@@ -269,7 +269,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 }
 } else {
-if (ctx->related) {
+if (ctx->maybe_related) {
 state |= CS_INVALID;
 } else {
 conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
@@ -721,7 +721,7 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
 static inline enum key_status extract_l4(struct conn_key *key,
  const void *data,
  size_t size,
- bool *related,
+ bool *maybe_related,
  const void *l3);
 
 static uint8_t
@@ -747,14 +747,14 @@ reverse_icmp_type(uint8_t type)
 }
 }
 
-/* If 'related' is not NULL and the function is processing an ICMP
+/* If 'maybe_related' is not NULL and the function is processing an ICMP
  * error packet, extract the l3 and l4 fields from the nested header
- * instead and set *related to true.  If 'related' is NULL we're
+ * instead and set *maybe_related to true.  If 'maybe_related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
 static inline enum key_status
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *maybe_related)
 {
 const struct icmp_header *icmp = data;
 
@@ -793,7 +793,7 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 enum key_status res;
 bool ok;
 
-if (!related) {
+if (!maybe_related) {
 return KEY_INVALID;
 }
 
@@ -817,7 +817,7 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 res = extract_l4(key, l4, tail - l4, NULL, l3);
 if (res != KEY_INVALID) {
 conn_key_reverse(key);
-*related = true;
+*maybe_related = true;
 }
 return res;
 }
@@ -839,14 +839,14 @@ reverse_icmp6_type(uint8_t type)
 }
 }
 
-/* If 'related' is not NULL and the function is processing an ICMP
+/* If 'maybe_related' is not NULL and the function is processing an ICMP
  * error packet, extract the l3 and l4 fields from the nested header
- * instead and set *related to true.  If 'related' is NULL we're
+ * instead and set *maybe_related to true.  If 'maybe_related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
 static inline enum key_status
 extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
- bool *related)
+ bool *maybe_related)
 {
 const struct icmp6_header *icmp6 = data;
 
@@ -882,7 +882,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 enum key_status res;
 bool ok;
 
-if (!related) {
+if (!maybe_related) {
 return KEY_INVALID;
 }
 
@@ -908,7 +908,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 res = extract_l4(key, l4, tail - l4, NULL, l3);
 if (res != KEY_INVALID) {
 conn_key_reverse(key);
-*related = true;
+*maybe_related = true;
 }
 return res;
 }
@@ -929,33 +929,33 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 /* Extract l4 fields into 'key', which must already contain valid l3
  * members.
  *
- * If 'related' is not NULL and an ICMP error packet is being
+ * If 'maybe_related' is not NULL and an ICMP error packet is being
  * processed, the function will extract the key from the packet nested
- * in the ICMP paylod and set '*related' to true.
+ * in the ICMP paylod and set '*maybe_related

[ovs-dev] [PATCH v3 1/2] conntrack: Return NEW for IPv6 ND packets without tracking.

2016-12-23 Thread Daniele Di Proietto
The userspace connection tracker treats Neighbor Discovery packets
as invalid, because they're not checked against any connection.

This is inconsistent with the kernel connection tracker, which always
returns 'CS_NEW'.

Therefore, this commit makes the userspace connection tracker conform
to the kernel.  ND packets still do not create or read any state, but
they're treated as NEW.

To support this, the key extraction functions can now return
KEY_NO_TRACK, meaning the packet should be treated statelessly and not
be sent to the connection tracker.

We also have to remove a test that explicitly checked that neighbor
discovery was treated as invalid.

Reported-by: Sridhar Gaddam 
Signed-off-by: Daniele Di Proietto 
---
 lib/conntrack.c | 134 
 tests/ofproto-dpif.at   |  32 
 tests/system-traffic.at |  95 ++
 3 files changed, 185 insertions(+), 76 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 9bea3d9..d2b5f3a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -52,9 +52,17 @@ struct conn_lookup_ctx {
 bool related;
 };
 
-static bool conn_key_extract(struct conntrack *, struct dp_packet *,
- ovs_be16 dl_type, struct conn_lookup_ctx *,
- uint16_t zone);
+enum key_status {
+KEY_INVALID,   /* Could not extract the connection key: invalid. */
+KEY_OK,/* Connection key is ok. */
+KEY_NO_TRACK,  /* Connection key should not be tracked. */
+};
+
+static enum key_status conn_key_extract(struct conntrack *,
+struct dp_packet *,
+ovs_be16 dl_type,
+struct conn_lookup_ctx *,
+uint16_t zone);
 static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis);
 static void conn_key_reverse(struct conn_key *);
 static void conn_key_lookup(struct conntrack_bucket *ctb,
@@ -157,6 +165,20 @@ static unsigned hash_to_bucket(uint32_t hash)
 return (hash >> (32 - CONNTRACK_BUCKETS_SHIFT)) % CONNTRACK_BUCKETS;
 }
 
+static uint16_t
+key_status_to_cs(enum key_status s)
+{
+switch (s) {
+case KEY_INVALID:
+return CS_INVALID;
+case KEY_OK:
+case KEY_NO_TRACK:
+return CS_NEW;
+default:
+OVS_NOT_REACHED();
+}
+}
+
 static void
 write_ct_md(struct dp_packet *pkt, uint16_t state, uint16_t zone,
 uint32_t mark, ovs_u128 label)
@@ -303,10 +325,13 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
 
 memset(bucket_list, INT8_C(-1), sizeof bucket_list);
 for (i = 0; i < cnt; i++) {
+enum key_status extract_res;
 unsigned bucket;
 
-if (!conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone)) {
-write_ct_md(pkts[i], CS_INVALID, zone, 0, OVS_U128_ZERO);
+extract_res = conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone);
+if (extract_res != KEY_OK) {
+write_ct_md(pkts[i], key_status_to_cs(extract_res), zone, 0,
+OVS_U128_ZERO);
 continue;
 }
 
@@ -693,8 +718,11 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
 return key->src.port && key->dst.port;
 }
 
-static inline bool extract_l4(struct conn_key *key, const void *data,
-  size_t size, bool *related, const void *l3);
+static inline enum key_status extract_l4(struct conn_key *key,
+ const void *data,
+ size_t size,
+ bool *related,
+ const void *l3);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -724,14 +752,14 @@ reverse_icmp_type(uint8_t type)
  * instead and set *related to true.  If 'related' is NULL we're
  * already processing a nested header and no such recursion is
  * possible */
-static inline int
+static inline enum key_status
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 bool *related)
 {
 const struct icmp_header *icmp = data;
 
 if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
-return false;
+return KEY_INVALID;
 }
 
 switch (icmp->icmp_type) {
@@ -742,13 +770,15 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 case ICMP4_INFOREQUEST:
 case ICMP4_INFOREPLY:
 if (icmp->icmp_code != 0) {
-return false;
+return KEY_INVALID;
 }
 /* Separate ICMP connection: identified using id */
 key->src.icmp_id = key->dst.icmp_id = icmp->icmp_fields.echo.id;
 key->src.icmp_type = icmp->icmp_type;
key->dst.icmp_type = reverse_icmp_type(icmp->

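The `extract_l4_icmp()` hunk above replaces boolean results with `enum key_status`. A minimal standalone sketch of that validation step follows. It assumes the RFC 792 numeric type values and an 8-byte ICMP header; `struct icmp_hdr_sketch` and `classify_icmp()` are hypothetical names, and which query and error types the real function handles beyond the two visible in the hunk is an assumption. The real code also fills in the connection key and parses the embedded header of error messages, which this sketch skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same three values as the enum introduced by the patch. */
enum key_status { KEY_INVALID, KEY_OK, KEY_NO_TRACK };

/* Assumed: type, code, checksum and 4 bytes of type-specific fields. */
#define ICMP_HEADER_LEN_SKETCH 8

/* Hypothetical stand-in for the first bytes of an ICMPv4 header. */
struct icmp_hdr_sketch {
    uint8_t type;
    uint8_t code;
};

/* Query messages must carry code 0.  Error messages are flagged as
 * related; if 'related' is NULL we are already inside a nested header,
 * so a second level of ICMP error is rejected. */
static enum key_status
classify_icmp(const struct icmp_hdr_sketch *icmp, size_t size, bool *related)
{
    if (size < ICMP_HEADER_LEN_SKETCH) {
        return KEY_INVALID;              /* truncated header */
    }
    switch (icmp->type) {
    case 0:   /* echo reply */
    case 8:   /* echo request */
    case 13:  /* timestamp */
    case 14:  /* timestamp reply */
    case 15:  /* information request */
    case 16:  /* information reply */
        return icmp->code == 0 ? KEY_OK : KEY_INVALID;
    case 3:   /* destination unreachable */
    case 11:  /* time exceeded */
    case 12:  /* parameter problem */
        if (!related) {
            return KEY_INVALID;          /* no recursion into nested errors */
        }
        *related = true;
        return KEY_OK;
    default:
        return KEY_INVALID;
    }
}
```

Returning an enum rather than `bool` is what later lets the caller distinguish "invalid" from "valid but not tracked" without a second out-parameter.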
Re: [ovs-dev] [PATCH v2 1/3] conntrack: Do not create new connections from ICMP errors.

2016-12-23 Thread Daniele Di Proietto

On 22/12/2016 18:55, "Darrell Ball"  wrote:

>
>
>On 12/22/16, 6:36 PM, "Daniele Di Proietto"  wrote:
>
>ICMP error packets (e.g. destination unreachable messages) are
>considered 'related' to another connection and are treated as part of
>that.
>
>However:
>
>* We shouldn't create new entries in the connection table if the
>  original connection is not found.  This is consistent with what the
>  kernel does.
>* We certainly shouldn't call valid_new() on the packet, because
>  valid_new() assumes the packet l4 type (might be TCP, UDP or ICMP)
>  to be consistent with the conn_key nw_proto type.
>
>Found by inspection.
>    
>    Fixes: a489b16854b5("conntrack: New userspace connection tracker.")
>Signed-off-by: Daniele Di Proietto 
>---
>v2: Handle ICMP error for non existing connection in else branch without
>restructuring the whole code flow.
>---
> lib/conntrack.c |  6 +-
> tests/system-traffic.at | 27 ---
> 2 files changed, 21 insertions(+), 12 deletions(-)
>
>
>Acked-by: Darrell Ball 

Thanks, I pushed this to master and branch-2.6
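The fix to `process_one()` boils down to one decision on a lookup miss. The sketch below models only that control flow with hypothetical stand-ins (`struct conn_table_sketch`, `on_lookup_miss()`) and assumed state-bit values; it is not the OVS implementation, and it simplifies `conn_not_found()` to "create an entry only when committing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit values for illustration; OVS defines its own CS_* flags. */
enum {
    SKETCH_CS_NEW     = 1 << 0,
    SKETCH_CS_INVALID = 1 << 4,
};

/* Hypothetical connection table that only counts its entries. */
struct conn_table_sketch {
    size_t n_conns;
};

/* Called when no existing connection matches the packet's key.
 * 'related' is true for ICMP errors whose embedded header was parsed.
 * Returns true if a new connection entry was created. */
static bool
on_lookup_miss(struct conn_table_sketch *ct, bool related, bool commit,
               unsigned *state)
{
    if (related) {
        /* ICMP error for an unknown connection: mark it invalid and never
         * create an entry or run new-connection validation on it. */
        *state |= SKETCH_CS_INVALID;
        return false;
    }
    *state |= SKETCH_CS_NEW;
    if (commit) {
        ct->n_conns++;          /* only committed packets create state */
    }
    return commit;
}
```

Checking `related` before anything else is what keeps a stray ICMP error from ever reaching code that assumes the key's protocol matches the packet's L4 header.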

>
>diff --git a/lib/conntrack.c b/lib/conntrack.c
>index 7c50a28..9bea3d9 100644
>--- a/lib/conntrack.c
>+++ b/lib/conntrack.c
>@@ -247,7 +247,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
> }
> }
> } else {
>-conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
>+if (ctx->related) {
>+state |= CS_INVALID;
>+} else {
>+conn = conn_not_found(ct, pkt, ctx, &state, commit, now);
>+}
> }
> 
> write_ct_md(pkt, state, zone, conn ? conn->mark : 0,
>diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>index 9ea6d6b..a5023d3 100644
>--- a/tests/system-traffic.at
>+++ b/tests/system-traffic.at
>@@ -1331,12 +1331,8 @@ ADD_VETH(p1, at_ns1, br0, "172.16.0.2/24")
> 
> dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> AT_DATA([flows.txt], [dnl
>-priority=1,action=drop
>-priority=10,arp,action=normal
>-priority=100,in_port=1,udp,ct_state=-trk,action=ct(commit,table=0)
>-priority=100,in_port=1,ip,ct_state=+trk,actions=controller
>-priority=100,in_port=2,ip,ct_state=-trk,action=ct(table=0)
>-priority=100,in_port=2,ip,ct_state=+trk+rel+rpl,action=controller
>+table=0,ip,action=ct(commit,table=1)
>+table=1,ip,action=controller
> ])
> 
> AT_CHECK([ovs-ofctl --bundle replace-flows br0 flows.txt])
>@@ -1345,22 +1341,31 @@ AT_CAPTURE_FILE([ofctl_monitor.log])
> AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> 
> dnl 1. Send an ICMP port unreach reply for port 8738, without any previous request
>-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) 'f64c473528c9c6f54ecb72db080045c0003d2e874001f355ac14ac130303553f4521317040004011b138ac13ac14000d20966369616f0a'])
>+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 resubmit\(,0\) 'f64c473528c9c6f54ecb72db080045c0003d2e874001f351ac14ac130303da494521317040004011b138ac13ac14000d20966369616f0a'])
> 
> dnl 2. Send and UDP packet to port 
>-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit,table=0\) 'c6f94ecb72dbe64c473528c908004521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
>+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 resubmit\(,0\) 'c6f94ecb72dbe64c473528c908004521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
> 
> dnl 3. Send an ICMP port unreach reply for port , related to the first packet
>-AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) 'e64c473528c9c6f94ecb72db080045c0003d2e874001f355ac12ac110303553f4521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
>+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 resubmit\(,0\) 'e64c473528c9c6f94ecb72db080045c0003d2e874001f355ac12ac110303553f4521317040004011b138ac11ac12a28e15b3000d20966369616f0a'])
> 
> dnl Check this output. We only see the latter two packets, not the first.
> AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>-NXT_PACKET_IN2 (xid=0x0): c

Re: [ovs-dev] [PATCH v2 2/3] conntrack: Return NEW for IPv6 ND packets without tracking.

2016-12-23 Thread Daniele Di Proietto

On 22/12/2016 21:20, "Darrell Ball"  wrote:

>Some comments inline

Thanks for the review, I've sent a v3

>
>On 12/22/16, 6:36 PM, "Daniele Di Proietto"  wrote:
>
>The userspace connection tracker treats Neighbor Discovery packets
>as invalid, because they're not checked against any connection.
>
>This is inconsistent with the kernel connection tracker, which always
>returns 'CS_NEW'.
>
>Therefore, this commit makes the userspace connection tracker conform
>to the kernel.  ND packets still do not create or read any state, but
>they're treated as NEW.
>
>To support this, the key extraction functions can now return
>KEY_NO_TRACK, meaning that the packet is ok, but it should be treated
>statelessly.
>
>
>s/To support this, the key extraction functions can now return
>KEY_NO_TRACK, meaning that the packet is ok, but it should be treated
>statelessly.
>/
>To support this, the key extraction functions can now return
>KEY_NO_TRACK, meaning the packet should be treated statelessly
>and not be sent to the connection tracker.
>/

ok, changed
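To make the three-way result concrete, here is a minimal sketch of how an ICMPv6 classifier could return `KEY_NO_TRACK` for Neighbor Discovery and how the batch loop then maps each status to a state, mirroring `key_status_to_cs()` and the `conntrack_execute()` hunk in this patch. The ND type numbers (135 Neighbor Solicitation, 136 Neighbor Advertisement) come from RFC 4861; which ND types the real `conn_key_extract()` covers, the rejection of all other non-echo types, and the state-bit values are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum key_status { KEY_INVALID, KEY_OK, KEY_NO_TRACK };

/* Assumed state-bit values, for illustration only. */
enum { SKETCH_CS_NEW = 1 << 0, SKETCH_CS_INVALID = 1 << 4 };

/* Hypothetical ICMPv6 classifier: echo request/reply (128/129) are
 * tracked, NS/NA (135/136) are accepted but handled statelessly, and
 * everything else is rejected in this simplified sketch. */
static enum key_status
classify_icmp6(uint8_t type)
{
    switch (type) {
    case 128:
    case 129:
        return KEY_OK;
    case 135:
    case 136:
        return KEY_NO_TRACK;
    default:
        return KEY_INVALID;
    }
}

/* Same mapping as key_status_to_cs() in the patch. */
static unsigned
status_to_state(enum key_status s)
{
    return s == KEY_INVALID ? SKETCH_CS_INVALID : SKETCH_CS_NEW;
}

/* Only KEY_OK packets continue to the connection lookup; the others get
 * their state written immediately.  Returns how many reach the lookup. */
static int
count_tracked(const uint8_t *types, int n, unsigned *states)
{
    int tracked = 0;

    for (int i = 0; i < n; i++) {
        enum key_status s = classify_icmp6(types[i]);

        if (s != KEY_OK) {
            states[i] = status_to_state(s);
            continue;
        }
        states[i] = 0;          /* would be filled in by the lookup */
        tracked++;
    }
    return tracked;
}
```

With this shape, ND packets come out as NEW without ever touching the connection table, which is the kernel-compatible behavior the commit message describes.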

>
>
>We also have to remove a test that explicitly checked that neighbor
>discovery was treated as invalid.
>
>Reported-by: Sridhar Gaddam 
>Signed-off-by: Daniele Di Proietto 
>---
>v2: Update comment to reflect that we do not do special validation with
>the packet.
>---
> lib/conntrack.c | 134 
> 
> tests/ofproto-dpif.at   |  32 
> tests/system-traffic.at |  35 +
> 3 files changed, 125 insertions(+), 76 deletions(-)
>
>diff --git a/lib/conntrack.c b/lib/conntrack.c
>index 9bea3d9..86228d6 100644
>--- a/lib/conntrack.c
>+++ b/lib/conntrack.c
>@@ -52,9 +52,17 @@ struct conn_lookup_ctx {
> bool related;
> };
> 
>-static bool conn_key_extract(struct conntrack *, struct dp_packet *,
>- ovs_be16 dl_type, struct conn_lookup_ctx *,
>- uint16_t zone);
>+enum key_status {
>+KEY_INVALID,   /* Could not extract the connection key: invalid. */
>+KEY_OK,/* Connection key is ok. */
>+KEY_NO_TRACK,  /* Connection key is ok, but it should not be tracked. */
>
>
>KEY_NO_TRACK,  /* Connection key should not be tracked. */

ok

>
>
>+};
>+
>+static enum key_status conn_key_extract(struct conntrack *,
>+struct dp_packet *,
>+ovs_be16 dl_type,
>+struct conn_lookup_ctx *,
>+uint16_t zone);
> static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis);
> static void conn_key_reverse(struct conn_key *);
> static void conn_key_lookup(struct conntrack_bucket *ctb,
>@@ -157,6 +165,20 @@ static unsigned hash_to_bucket(uint32_t hash)
> return (hash >> (32 - CONNTRACK_BUCKETS_SHIFT)) % CONNTRACK_BUCKETS;
> }
> 
>+static uint16_t
>+key_status_to_cs(enum key_status s)
>+{
>+switch (s) {
>+case KEY_INVALID:
>+return CS_INVALID;
>+case KEY_OK:
>+case KEY_NO_TRACK:
>+return CS_NEW;
>+default:
>+OVS_NOT_REACHED();
>+}
>+}
>+
> static void
> write_ct_md(struct dp_packet *pkt, uint16_t state, uint16_t zone,
> uint32_t mark, ovs_u128 label)
>@@ -303,10 +325,13 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
> 
> memset(bucket_list, INT8_C(-1), sizeof bucket_list);
> for (i = 0; i < cnt; i++) {
>+enum key_status extract_res;
> unsigned bucket;
> 
>-if (!conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone)) {
>-write_ct_md(pkts[i], CS_INVALID, zone, 0, OVS_U128_ZERO);
>+extract_res = conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone);
>+if (extract_res != KEY_OK) {
>+write_ct_md(pkts[i], key_status_to_cs(extract_res), zone, 0,
>+OVS_U128_ZERO);
> continue;
> }
> 
>@@ -693,8 +718,11 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
> return key->src.port && key->dst.port;
> }
> 
>-static inline bool extract_l

[ovs-dev] [PATCH] ofproto: Fix crash on flow monitor request with tun_metadata.

2016-12-27 Thread Daniele Di Proietto
nx_put_match() needs a non-NULL tunnel metadata table, otherwise it will
crash if a flow matches on tunnel metadata.

This wasn't handled in ofputil_append_flow_update(), causing a crash
when the controller sent a flow monitor request.

To fix the problem, this commit changes ofputil_append_flow_update() to
behave like ofputil_append_flow_stats_reply().
Since ofputil_append_flow_update() now needs to temporarily modify the
match, this commit also embeds 'struct match' into 'struct
ofputil_flow_update', to be safer.  This is more similar to
'struct ofputil_flow_stats'.

A regression test is added and a comment is updated in ovs-ofctl.c.

 #0  0x55699bd82fa0 in memcpy_from_metadata (dst=0x7ffc770930d0, src=0x7ffc77093698, loc=0x18) at ../lib/tun-metadata.c:451
 #1  0x55699bd83c2e in metadata_loc_from_match_read (map=0x0, match=0x7ffc77093410, idx=0, mask=0x7ffc77093658, is_masked=0x7ffc77093287) at ../lib/tun-metadata.c:848
 #2  0x55699bd83d9b in tun_metadata_to_nx_match (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410) at ../lib/tun-metadata.c:871
 #3  0x55699bce523d in nx_put_raw (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1052
 #4  0x55699bce5580 in nx_put_match (b=0x55699d3f0300, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1116
 #5  0x55699bd3926f in ofputil_append_flow_update (update=0x7ffc770940b0, replies=0x7ffc77094e00) at ../lib/ofp-util.c:6805
 #6  0x55699bc4b5a9 in ofproto_compose_flow_refresh_update (rule=0x55699d405b40, flags=(NXFMF_INITIAL | NXFMF_ACTIONS), msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5915
 #7  0x55699bc4b5f6 in ofmonitor_compose_refresh_updates (rules=0x7ffc77094e10, msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5929
 #8  0x55699bc4bafc in handle_flow_monitor_request (ofconn=0x55699d404090, oh=0x55699d404220) at ../ofproto/ofproto.c:6082
 #9  0x55699bc4f46d in handle_openflow__ (ofconn=0x55699d404090, msg=0x55699d404910) at ../ofproto/ofproto.c:7912
 #10 0x55699bc4f5df in handle_openflow (ofconn=0x55699d404090, ofp_msg=0x55699d404910) at ../ofproto/ofproto.c:8002
 #11 0x55699bc88154 in ofconn_run (ofconn=0x55699d404090, handle_openflow=0x55699bc4f5bc ) at ../ofproto/connmgr.c:1427
 #12 0x55699bc85934 in connmgr_run (mgr=0x55699d3adb90, handle_openflow=0x55699bc4f5bc ) at ../ofproto/connmgr.c:363
 #13 0x55699bc422c9 in ofproto_run (p=0x55699d3c85e0) at ../ofproto/ofproto.c:1798
 #14 0x55699bc31ec6 in bridge_run__ () at ../vswitchd/bridge.c:2881
 #15 0x55699bc320a6 in bridge_run () at ../vswitchd/bridge.c:2938
 #16 0x55699bc3784e in main (argc=10, argv=0x7ffc770952c8) at ../vswitchd/ovs-vswitchd.c:111

Fixes: 8d8ab6c2d574 ("tun-metadata: Manage tunnel TLV mapping table on a
per-bridge basis.")

Signed-off-by: Daniele Di Proietto 
---
 include/openvswitch/ofp-util.h |  5 +++--
 lib/ofp-print.c|  4 +---
 lib/ofp-util.c | 14 ++---
 ofproto/connmgr.c  | 10 +-
 ofproto/ofproto.c  | 12 +--
 tests/ofproto.at   | 45 ++
 utilities/ovs-ofctl.c  |  2 +-
 7 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 91ff0c2..b197a9a 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -1083,7 +1083,7 @@ struct ofputil_flow_update {
 uint8_t table_id;
 uint16_t priority;
 ovs_be64 cookie;
-struct match *match;
+struct match match;
 const struct ofpact *ofpacts;
 size_t ofpacts_len;
 
@@ -1095,7 +1095,8 @@ int ofputil_decode_flow_update(struct ofputil_flow_update *,
struct ofpbuf *msg, struct ofpbuf *ofpacts);
 void ofputil_start_flow_update(struct ovs_list *replies);
 void ofputil_append_flow_update(const struct ofputil_flow_update *,
-struct ovs_list *replies);
+struct ovs_list *replies,
+const struct tun_table *);
 
 /* Abstract nx_flow_monitor_cancel. */
 uint32_t ofputil_decode_flow_monitor_cancel(const struct ofp_header *);
diff --git a/lib/ofp-print.c b/lib/ofp-print.c
index 7b7c430..dacaa07 100644
--- a/lib/ofp-print.c
+++ b/lib/ofp-print.c
@@ -2371,10 +2371,8 @@ ofp_print_nxst_flow_monitor_reply(struct ds *string,
 for (;;) {
 char reasonbuf[OFP_FLOW_REMOVED_REASON_BUFSIZE];
 struct ofputil_flow_update update;
-struct match match;
 int retval;
 
-update.match = &match;
 retval = ofputil_decode_flow_update(&update, &b, &ofpacts);
 if (retval) {
 if (retval != EOF) {
@@ -2418,7 +2416,7 @@ ofp_print_nxst_flow_monitor_reply(struct ds *string,
 ds_put_format(string, " cooki

Re: [ovs-dev] [PATCH v4 1/3] netdev-dpdk: add hotplug support

2017-01-04 Thread Daniele Di Proietto
2017-01-04 8:20 GMT-08:00 Stephen Finucane :
> On Wed, 2017-01-04 at 15:31 +, Ciara Loftus wrote:
>> In order to use dpdk ports in ovs they have to be bound to a DPDK
>> compatible driver before ovs is started.
>>
>> This patch adds the possibility to hotplug (or hot-unplug) a device
>> after ovs has been started. The implementation adds two appctl
>> commands:
>> netdev-dpdk/attach and netdev-dpdk/detach
>>
>> After the user attaches a new device, it has to be added to a bridge
>> using the add-port command; similarly, before detaching a device,
>> it has to be removed using the del-port command.
>
> There are a couple of issues with the documentation part of this that
> would be good to fix if/when you respin. I'll leave the rest of the
> changes to someone better qualified to review them.
>
> Stephen

Thanks for the patch, Ciara; it looks good to me.

Since you put these tags:

Signed-off-by: Mauricio Vasquez B 
Signed-off-by: Ciara Loftus 
Co-authored-by: Ciara Loftus 

should the author be Mauricio?

If you want to respin including Stephen's comments (thanks!) I'll be
happy to apply it.  Otherwise I can make the changes myself before
pushing it.

Thanks,

Daniele
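The ordering rule in the commit message (attach, then add-port; del-port, then detach) can be modeled as a tiny state machine. This is a hypothetical sketch of that rule only: `enum dev_state` and the `dev_*()` helpers are invented names, not functions from netdev-dpdk.c, and each helper stands for the corresponding appctl or ovs-vsctl command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of a DPDK device as seen by OVS. */
enum dev_state { DEV_DETACHED, DEV_ATTACHED, DEV_IN_BRIDGE };

/* netdev-dpdk/attach: only a detached device can be attached. */
static bool
dev_attach(enum dev_state *s)
{
    if (*s != DEV_DETACHED) {
        return false;
    }
    *s = DEV_ATTACHED;
    return true;
}

/* add-port: the device must already be attached. */
static bool
dev_add_port(enum dev_state *s)
{
    if (*s != DEV_ATTACHED) {
        return false;
    }
    *s = DEV_IN_BRIDGE;
    return true;
}

/* del-port: removes the port from the bridge, device stays attached. */
static bool
dev_del_port(enum dev_state *s)
{
    if (*s != DEV_IN_BRIDGE) {
        return false;
    }
    *s = DEV_ATTACHED;
    return true;
}

/* netdev-dpdk/detach: refused while the port is still in a bridge. */
static bool
dev_detach(enum dev_state *s)
{
    if (*s != DEV_ATTACHED) {
        return false;
    }
    *s = DEV_DETACHED;
    return true;
}
```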
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/3] netdev-dpdk: Arbitrary 'dpdk' port naming

2017-01-04 Thread Daniele Di Proietto
2017-01-04 7:31 GMT-08:00 Ciara Loftus :
> 'dpdk' ports no longer have naming restrictions. Now, instead of
> specifying the dpdk port ID as part of the name, the PCI address of the
> device must be specified via the 'dpdk-devargs' option, e.g.:
>
> ovs-vsctl add-port br0 my-port
> ovs-vsctl set Interface my-port type=dpdk
>   options:dpdk-devargs=:06:00.3
>
> Users no longer need to hotplug-attach DPDK ports by issuing the
> ovs-appctl netdev-dpdk/attach command. Hotplug is now invoked
> automatically when a valid PCI address is set in dpdk-devargs.
> The format of the ovs-appctl netdev-dpdk/detach command has changed:
> the user must now specify the relevant PCI address as input instead
> of the port name.
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Daniele Di Proietto 
> Co-authored-by: Daniele Di Proietto 
> Signed-off-by: Kevin Traynor 
> Co-authored-by: Kevin Traynor 

Thanks for the new version

I've been testing this for a while and I couldn't find any problems.

The code looks good to me.

I know it's a big change, but at this point I think it's better to
merge it and refine it later (if we have to).

Kevin, do you have any more comments?
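With this series, whether `dpdk-devargs` looks like a PCI address decides how the device is found. A rough standalone check for the domain:bus:device.function form is sketched below. `parse_pci_addr()` and `struct pci_addr_sketch` are hypothetical (the patches themselves rely on DPDK helpers such as `eal_parse_pci_DomBDF()`), and the field-width and range limits are assumptions based on the usual PCI layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct pci_addr_sketch {
    unsigned domain, bus, devid, function;
};

/* Accepts "DDDD:BB:DD.F" (hex fields) and nothing else: sscanf must fill
 * all four fields and %n must show the whole string was consumed. */
static bool
parse_pci_addr(const char *s, struct pci_addr_sketch *out)
{
    int consumed = -1;

    if (sscanf(s, "%4x:%2x:%2x.%1x%n", &out->domain, &out->bus,
               &out->devid, &out->function, &consumed) != 4) {
        return false;
    }
    /* Assumed limits: 5-bit device number, 3-bit function number. */
    return consumed == (int) strlen(s)
           && out->devid <= 0x1f && out->function <= 7;
}
```

A string such as `eth_null0` fails this check, which is the cue to treat it as something other than a physical device.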

> ---
> Changelog:
> * Keep port-detach appctl function - use PCI as input arg
> * Add requires_mutex to devargs processing functions
> * use reconfigure infrastructure for devargs changes
> * process devargs even if valid portid ie. device already configured
> * report err if dpdk-devargs is not specified
> * Add Daniele's incremental patch & Sign-off + Co-author tags
> * Update details of detach command to reflect PCI is needed instead of
>   port name
> * Update NEWS to mention that the new naming scheme is not backwards
>   compatible
> * Use existing DPDK functions to get port ID from PCI address/devname
> * Merged process_devargs and process_pdevargs functions
> * Removed unnecessary requires_mutex from devargs processing fn
> * Fix case where port is re-attached after detach
> * Add note to documentation that devices won't work until devargs set.
> * Set netdev type and dpdk-devargs in one command in docs to avoid
>   errors.
> * Change port names in documentation to emphasise arbitrary-ness.
> * Add Kevin's incremental & sign-off/co-authored-by
> * Check if devargs string has changed before processing it as suggested
>   by Daniele.
> * Print error if attach fails
>
>  Documentation/howto/dpdk.rst |   7 +-
>  Documentation/intro/install/dpdk.rst |   9 +-
>  NEWS |   5 +
>  lib/netdev-dpdk.c| 181 
> ---
>  vswitchd/vswitch.xml |   8 ++
>  5 files changed, 148 insertions(+), 62 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 900f9b7..df5db71 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -312,14 +312,13 @@ dpdk_nic_bind.py script:
>
>  Then it can be attached to OVS:
>
> -   $ ovs-appctl netdev-dpdk/attach :01:00.0
> -
> -At this point, the user can create a ovs port using the add-port command.
> +   $ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
> +   options:dpdk-devargs=:01:00.0
>
>  It is also possible to detach a port from ovs, the user has to remove the
>  port using the del-port command, then it can be detached using:
>
> -   $ ovs-appctl netdev-dpdk/detach dpdk0
> +   $ ovs-appctl netdev-dpdk/detach dpdkx
>
>  This feature is not supported with VFIO and does not work with some NICs.
>  For more information please refer to the `DPDK Port Hotplug Framework
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 54d56ec..6b49cab 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -258,8 +258,13 @@ ports. For example, to create a userspace bridge named ``br0`` and add two
>  ``dpdk`` ports to it, run::
>
>  $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> -$ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
> +$ ovs-vsctl add-port br0 myportnameone -- set Interface myportnameone \
> +type=dpdk options:dpdk-devargs=:06:00.0
> +$ ovs-vsctl add-port br0 myportnametwo -- set Interface myportnametwo \
> +type=dpdk options:dpdk-devargs=:06:00.1
> +
> +DPDK devices will not be available for use until a valid dpdk-devargs is
> +specified.
>
>  Refer to ovs-vsctl(8) and :doc:`/howto/dpdk` for more detai

Re: [ovs-dev] [PATCH v4 3/3] netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)

2017-01-04 Thread Daniele Di Proietto
2017-01-04 7:31 GMT-08:00 Ciara Loftus :
> Prior to this commit, the 'dpdk' port type could only be used for
> physical DPDK devices. Now, virtual devices (or 'vdevs') are supported.
> 'vdev' devices are those which use virtual DPDK Poll Mode Drivers, e.g.
> null, pcap. To add a DPDK vdev, a valid 'dpdk-devargs' must be set for
> the given dpdk port. The format expected is 'eth_' where
> 'x' is a number between 0 and RTE_MAX_ETHPORTS -1.
>
> For example to add a port that uses the 'null' DPDK PMD driver:
>
> ovs-vsctl set Interface null0 options:dpdk-devargs=eth_null0
>
> Not all DPDK vdevs have been verified to work at this point in time.
>
> Signed-off-by: Ciara Loftus 

Thanks for the new version.

I have a minor suggestion below.

If you want to incorporate the comments on the series and respin, I'd
appreciate that. Otherwise, I can fix the series before pushing it.

Thanks,

Daniele
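The vdev naming in this patch can be checked without DPDK. Below is a hypothetical parser, `parse_vdev_devargs()`, for strings shaped like the examples in the patch (`eth_null0`, `eth_af_packet0`) and the suggested `eth_af_packet0,iface=eth0`. It assumes `RTE_MAX_ETHPORTS` is 32, as the "(31)" in the documentation hunk suggests, that the index is the trailing number of the name, and that anything after the first comma is driver parameters. In the patch the string is handed to DPDK's own ethdev helpers, which do the real parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ASSUMED_MAX_ETHPORTS 32   /* assumption: RTE_MAX_ETHPORTS == 32 */

struct vdev_args_sketch {
    char name[64];        /* e.g. "eth_af_packet0" */
    char driver[64];      /* e.g. "af_packet" */
    unsigned index;       /* trailing number of the name */
    const char *params;   /* text after the first ',', or "" */
};

static bool
parse_vdev_devargs(const char *devargs, struct vdev_args_sketch *out)
{
    const char *comma = strchr(devargs, ',');
    size_t name_len = comma ? (size_t) (comma - devargs) : strlen(devargs);

    if (name_len >= sizeof out->name || strncmp(devargs, "eth_", 4) != 0) {
        return false;
    }
    memcpy(out->name, devargs, name_len);
    out->name[name_len] = '\0';
    out->params = comma ? comma + 1 : "";

    /* Split "<driver><index>" after the "eth_" prefix. */
    size_t end = name_len;
    while (end > 4 && isdigit((unsigned char) out->name[end - 1])) {
        end--;
    }
    if (end == name_len || end == 4) {
        return false;                 /* missing index or driver name */
    }
    memcpy(out->driver, out->name + 4, end - 4);
    out->driver[end - 4] = '\0';

    unsigned long idx = strtoul(out->name + end, NULL, 10);
    if (idx >= ASSUMED_MAX_ETHPORTS) {
        return false;
    }
    out->index = (unsigned) idx;
    return true;
}
```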

> ---
> Changelog:
> * Updated process_vdevargs to work with Daniele's incremental in the
>   previous patch.
> * Allow vdev detach
> * Update docs to show af_packet example
>
>  Documentation/howto/dpdk.rst | 29 +
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 35 +++
>  vswitchd/vswitch.xml |  9 +++--
>  4 files changed, 48 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index df5db71..fc2f81e 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -324,6 +324,35 @@ This feature is not supported with VFIO and does not work with some NICs.
>  For more information please refer to the `DPDK Port Hotplug Framework
>  
> `__.
>
> +.. _vdev-support:
> +
> +Vdev Support
> +
> +
> +DPDK provides drivers for both physical and virtual devices. Physical DPDK
> +devices are added to OVS by specifying a valid PCI address in 'dpdk-devargs'.
> +Virtual DPDK devices which do not have PCI addresses can be added using a
> +different format for 'dpdk-devargs'.
> +
> +Typically, the format expected is 'eth_' where 'x' is a
> +number between 0 and RTE_MAX_ETHPORTS -1 (31).
> +
> +For example to add a dpdk port that uses the 'null' DPDK PMD driver:
> +
> +   $ ovs-vsctl add-port br0 null0 -- set Interface null0 type=dpdk \
> +   options:dpdk-devargs=eth_null0
> +
> +Similarly, to add a dpdk port that uses the 'af_packet' DPDK PMD driver:
> +
> +   $ ovs-vsctl add-port br0 af0 -- set Interface af0 type=dpdk \
> +   options:dpdk-devargs=eth_af_packet0

How about a real example?

$ ovs-vsctl add-port br0 myeth0 -- set Interface myeth0 type=dpdk \
   options:dpdk-devargs=eth_af_packet0,iface=eth0

> +
> +More information on the different types of virtual DPDK PMDs can be found in
> +the `DPDK documentation
> +`__.
> +
> +Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
> +
>  .. _dpdk-ovs-in-guest:
>
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index d66d402..cc319a9 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -53,6 +53,7 @@ Post-v2.6.0
> with the old dpdk naming scheme is broken, and as such a
> device will not be available for use until a valid dpdk-devargs is
> specified.
> + * Virtual DPDK Poll Mode Driver (vdev PMD) support.
> - Fedora packaging:
>   * A package upgrade does not automatically restart OVS service.
> - ovs-vswitchd/ovs-vsctl:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index ba4935e..170d01a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1134,25 +1134,19 @@ netdev_dpdk_lookup_by_port_id(int port_id)
>  static int
>  netdev_dpdk_process_devargs(const char *devargs)
>  {
> -struct rte_pci_addr addr;
>  uint8_t new_port_id = UINT8_MAX;
>
> -if (!eal_parse_pci_DomBDF(devargs, &addr)) {
> -/* Valid PCI address format detected - configure physical device */
> -if (!rte_eth_dev_count()
> -|| rte_eth_dev_get_port_by_name(devargs, &new_port_id)
> -|| !rte_eth_dev_is_valid_port(new_port_id)) {
> -/* PCI device not found in DPDK, attempt to attach it */
> -if (!rte_eth_dev_attach(devargs, &new_port_id)) {
> -/* Attach successful */
> -VLOG_INFO("Device "PCI_PRI_FMT" has been attached to DPDK",
> -  addr.domain, addr.bus, addr.devid, addr.function);
> -} else {
> -/* Attach unsuccessful */
> -VLOG_INFO("Error attaching device "PCI_PRI_FMT" to DPDK",
> -  addr.domain, addr.bus, addr.devid, addr.function);
> -return -1;
> -}
> +if (!rte_eth_dev_count()
> +|| rte_eth_dev_get_port_by_name(devargs, &new_port_id)
> +|| !rte_eth_dev_is_valid_po

Re: [ovs-dev] [PATCH 1/2] doc: Remove ivshmem instructions.

2017-01-04 Thread Daniele Di Proietto
2017-01-03 12:51 GMT-08:00 Mauricio Vasquez :
>
>
> On 01/03/2017 01:21 PM, Kevin Traynor wrote:
>>
>> ivshmem is a path to the guest using DPDK rings that was
>> introduced before userspace vhost was available in the OVS-DPDK
>> datapath. ivshmem is external to OVS but the scheme of using it
>> with DPDK rings is documented.
>>
>> Remove ivshmem instruction documentation because:
>>
>> - The ivshmem library has been removed in DPDK since DPDK 16.11.
>> - The instructions/scheme provided will not work with current
>>supported and future DPDK versions.
>> - The linked patch needed to enable support in QEMU has never
>>been upstreamed and does not apply to the last 4 QEMU releases.
>> - Userspace vhost has become the de facto OVS-DPDK path to the guest.
>>
>> Fixes: 04de404e1bfa ("netdev-dpdk: Add support for DPDK 16.11")
>> Cc: Ciara Loftus 
>> Cc: Stephen Finucane 
>> Signed-off-by: Kevin Traynor 
>
> This was on my TODO list for a long time; unfortunately, I didn't have the
> time for it.
>
> Acked-by: Mauricio Vasquez B 

Applied to master, thanks!



Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Rename ivshmem structures.

2017-01-04 Thread Daniele Di Proietto
2017-01-03 10:21 GMT-08:00 Kevin Traynor :
> Rename some structures that call themselves ivshmem,
> as they are just a collection of dpdk rings and other
> information.
>
> Signed-off-by: Kevin Traynor 

Applied to master, thanks!
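`dpdk_ring_open()` in the diff below finds an existing ring pair by walking `dpdk_ring_list` and comparing `user_port_id`. A self-contained sketch of that lookup follows, using a plain singly linked list in place of OVS's `ovs_list`/`LIST_FOR_EACH`; `struct ring_pair_sketch` and `ring_pair_lookup()` are hypothetical names, and the rings themselves are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct dpdk_ring' (rings omitted). */
struct ring_pair_sketch {
    unsigned user_port_id;            /* port number chosen by the user */
    unsigned eth_port_id;             /* DPDK ethdev id backing the rings */
    struct ring_pair_sketch *next;
};

/* Returns 0 and stores the ethdev id if a pair with 'port_no' exists,
 * otherwise returns -1. */
static int
ring_pair_lookup(const struct ring_pair_sketch *head, unsigned port_no,
                 unsigned *eth_port_id)
{
    for (const struct ring_pair_sketch *rp = head; rp; rp = rp->next) {
        if (rp->user_port_id == port_no) {
            *eth_port_id = rp->eth_port_id;
            return 0;
        }
    }
    return -1;
}
```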

> ---
>  lib/netdev-dpdk.c | 40 
>  1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 625f425..5f60959 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2617,10 +2617,10 @@ dpdk_ring_create(const char dev_name[], unsigned int port_no,
>   unsigned int *eth_port_id)
>  {
> -struct dpdk_ring *ivshmem;
> +struct dpdk_ring *ring_pair;
>  char *ring_name;
>  int err;
>
> -ivshmem = dpdk_rte_mzalloc(sizeof *ivshmem);
> -if (!ivshmem) {
> +ring_pair = dpdk_rte_mzalloc(sizeof *ring_pair);
> +if (!ring_pair) {
>  return ENOMEM;
>  }
> @@ -2630,9 +2630,9 @@ dpdk_ring_create(const char dev_name[], unsigned int port_no,
>
>  /* Create single producer tx ring, netdev does explicit locking. */
> -ivshmem->cring_tx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0,
> +ring_pair->cring_tx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0,
>  RING_F_SP_ENQ);
>  free(ring_name);
> -if (ivshmem->cring_tx == NULL) {
> -rte_free(ivshmem);
> +if (ring_pair->cring_tx == NULL) {
> +rte_free(ring_pair);
>  return ENOMEM;
>  }
> @@ -2641,25 +2641,25 @@ dpdk_ring_create(const char dev_name[], unsigned int port_no,
>
>  /* Create single consumer rx ring, netdev does explicit locking. */
> -ivshmem->cring_rx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0,
> +ring_pair->cring_rx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0,
>  RING_F_SC_DEQ);
>  free(ring_name);
> -if (ivshmem->cring_rx == NULL) {
> -rte_free(ivshmem);
> +if (ring_pair->cring_rx == NULL) {
> +rte_free(ring_pair);
>  return ENOMEM;
>  }
>
> -err = rte_eth_from_rings(dev_name, &ivshmem->cring_rx, 1,
> - &ivshmem->cring_tx, 1, SOCKET0);
> +err = rte_eth_from_rings(dev_name, &ring_pair->cring_rx, 1,
> + &ring_pair->cring_tx, 1, SOCKET0);
>
>  if (err < 0) {
> -rte_free(ivshmem);
> +rte_free(ring_pair);
>  return ENODEV;
>  }
>
> -ivshmem->user_port_id = port_no;
> -ivshmem->eth_port_id = rte_eth_dev_count() - 1;
> -ovs_list_push_back(&dpdk_ring_list, &ivshmem->list_node);
> +ring_pair->user_port_id = port_no;
> +ring_pair->eth_port_id = rte_eth_dev_count() - 1;
> +ovs_list_push_back(&dpdk_ring_list, &ring_pair->list_node);
>
> -*eth_port_id = ivshmem->eth_port_id;
> +*eth_port_id = ring_pair->eth_port_id;
>  return 0;
>  }
> @@ -2669,5 +2669,5 @@ dpdk_ring_open(const char dev_name[], unsigned int *eth_port_id)
>  OVS_REQUIRES(dpdk_mutex)
>  {
> -struct dpdk_ring *ivshmem;
> +struct dpdk_ring *ring_pair;
>  unsigned int port_no;
>  int err = 0;
> @@ -2680,9 +2680,9 @@ dpdk_ring_open(const char dev_name[], unsigned int *eth_port_id)
>
>  /* Look through our list to find the device */
> -LIST_FOR_EACH (ivshmem, list_node, &dpdk_ring_list) {
> - if (ivshmem->user_port_id == port_no) {
> +LIST_FOR_EACH (ring_pair, list_node, &dpdk_ring_list) {
> + if (ring_pair->user_port_id == port_no) {
>  VLOG_INFO("Found dpdk ring device %s:", dev_name);
>  /* Really all that is needed */
> -*eth_port_id = ivshmem->eth_port_id;
> +*eth_port_id = ring_pair->eth_port_id;
>  return 0;
>   }
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH] ofproto: Fix crash on flow monitor request with tun_metadata.

2017-01-04 Thread Daniele Di Proietto

On 04/01/2017 16:04, "Ben Pfaff"  wrote:

>On Tue, Dec 27, 2016 at 07:39:52PM -0800, Daniele Di Proietto wrote:
>> nx_put_match() needs a non-NULL tunnel metadata table, otherwise it will
>> crash if a flow matches on tunnel metadata.
>> 
>> This wasn't handled in ofputil_append_flow_update(), causing a crash
>> when the controller sent a flow monitor request.
>> 
>> To fix the problem, this commit changes ofputil_append_flow_update() to
>> behave like ofputil_append_flow_stats_reply().
>> Since ofputil_append_flow_update() now needs to temporarily modify the
>> match, this commit also embeds 'struct match' into 'struct
>> ofputil_flow_update', to be safer.  This is more similar to
>> 'struct ofputil_flow_stats'.
>> 
>> A regression test is added and a comment is updated in ovs-ofctl.c.
>> 
>>  #0  0x55699bd82fa0 in memcpy_from_metadata (dst=0x7ffc770930d0, src=0x7ffc77093698, loc=0x18) at ../lib/tun-metadata.c:451
>>  #1  0x55699bd83c2e in metadata_loc_from_match_read (map=0x0, match=0x7ffc77093410, idx=0, mask=0x7ffc77093658, is_masked=0x7ffc77093287) at ../lib/tun-metadata.c:848
>>  #2  0x55699bd83d9b in tun_metadata_to_nx_match (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410) at ../lib/tun-metadata.c:871
>>  #3  0x55699bce523d in nx_put_raw (b=0x55699d3f0300, oxm=0, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1052
>>  #4  0x55699bce5580 in nx_put_match (b=0x55699d3f0300, match=0x7ffc77093410, cookie=0, cookie_mask=0) at ../lib/nx-match.c:1116
>>  #5  0x55699bd3926f in ofputil_append_flow_update (update=0x7ffc770940b0, replies=0x7ffc77094e00) at ../lib/ofp-util.c:6805
>>  #6  0x55699bc4b5a9 in ofproto_compose_flow_refresh_update (rule=0x55699d405b40, flags=(NXFMF_INITIAL | NXFMF_ACTIONS), msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5915
>>  #7  0x55699bc4b5f6 in ofmonitor_compose_refresh_updates (rules=0x7ffc77094e10, msgs=0x7ffc77094e00) at ../ofproto/ofproto.c:5929
>>  #8  0x55699bc4bafc in handle_flow_monitor_request (ofconn=0x55699d404090, oh=0x55699d404220) at ../ofproto/ofproto.c:6082
>>  #9  0x55699bc4f46d in handle_openflow__ (ofconn=0x55699d404090, msg=0x55699d404910) at ../ofproto/ofproto.c:7912
>>  #10 0x55699bc4f5df in handle_openflow (ofconn=0x55699d404090, ofp_msg=0x55699d404910) at ../ofproto/ofproto.c:8002
>>  #11 0x55699bc88154 in ofconn_run (ofconn=0x55699d404090, handle_openflow=0x55699bc4f5bc ) at ../ofproto/connmgr.c:1427
>>  #12 0x55699bc85934 in connmgr_run (mgr=0x55699d3adb90, handle_openflow=0x55699bc4f5bc ) at ../ofproto/connmgr.c:363
>>  #13 0x55699bc422c9 in ofproto_run (p=0x55699d3c85e0) at ../ofproto/ofproto.c:1798
>>  #14 0x55699bc31ec6 in bridge_run__ () at ../vswitchd/bridge.c:2881
>>  #15 0x55699bc320a6 in bridge_run () at ../vswitchd/bridge.c:2938
>>  #16 0x55699bc3784e in main (argc=10, argv=0x7ffc770952c8) at ../vswitchd/ovs-vswitchd.c:111
>> 
>> Fixes: 8d8ab6c2d574 ("tun-metadata: Manage tunnel TLV mapping table on a
>> per-bridge basis.")
>> 
>> Signed-off-by: Daniele Di Proietto 
>
>Thank you!
>
>Acked-by: Ben Pfaff 

Thanks! I applied this to master
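The core of the fix is to give the encoder a match whose tunnel-metadata table pointer is non-NULL, then restore the caller's value afterwards, as ofputil_append_flow_stats_reply() does. A hypothetical, self-contained sketch of that save/swap/restore pattern follows. `struct match_sketch`, `encode_match()` and `append_flow_update()` are stand-ins for the OVS types and for `nx_put_match()`, the sketch takes a non-const update pointer for simplicity, and it returns -1 where the real encoder would crash.

```c
#include <assert.h>
#include <stddef.h>

struct tun_table_sketch {
    int n_entries;
};

struct match_sketch {
    const struct tun_table_sketch *tun_tab;  /* may be NULL in stored rules */
    int uses_tun_metadata;
};

/* Match embedded by value, as the patch does with 'struct match'. */
struct flow_update_sketch {
    struct match_sketch match;
};

/* Stand-in encoder: needs a table when the match uses tunnel metadata. */
static int
encode_match(const struct match_sketch *m)
{
    if (m->uses_tun_metadata && !m->tun_tab) {
        return -1;
    }
    return m->uses_tun_metadata ? m->tun_tab->n_entries : 0;
}

static int
append_flow_update(struct flow_update_sketch *update,
                   const struct tun_table_sketch *tun_table)
{
    const struct tun_table_sketch *orig = update->match.tun_tab;
    int ret;

    update->match.tun_tab = tun_table;   /* temporarily install the table */
    ret = encode_match(&update->match);
    update->match.tun_tab = orig;        /* leave the caller's data intact */
    return ret;
}
```

Embedding the match by value means the temporary write lands in the update's own storage rather than in whatever a caller-supplied pointer happened to reference.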


Re: [ovs-dev] [PATCH v5 1/3] netdev-dpdk: add hotplug support

2017-01-05 Thread Daniele Di Proietto
2017-01-05 7:52 GMT-08:00 Stephen Finucane :
> On Thu, 2017-01-05 at 10:42 +, Ciara Loftus wrote:
>> From: Mauricio Vásquez 
>>
>> In order to use DPDK ports in OVS, they have to be bound to a
>> DPDK-compatible driver before OVS is started.
>>
>> This patch adds the possibility to hotplug (or hot-unplug) a device
>> after ovs has been started. The implementation adds two appctl
>> commands:
>> netdev-dpdk/attach and netdev-dpdk/detach
>>
>> After the user attaches a new device, it has to be added to a bridge
>> using the add-port command; similarly, before detaching a device,
>> it has to be removed using the del-port command.
>>
>> Signed-off-by: Mauricio Vasquez B
>> Signed-off-by: Ciara Loftus 
>> Co-authored-by: Ciara Loftus 
>
> The docs aspects of all three patches all look a-OK to me now. Cheers
> :)
>
> Acked-by: Stephen Finucane   # docs only

Applied to master, thanks


Re: [ovs-dev] [PATCH v5 2/3] netdev-dpdk: Arbitrary 'dpdk' port naming

2017-01-05 Thread Daniele Di Proietto
If new_port_id == -1, I think it's better to return failure from
set_config() rather than from a later reconfigure(), so I squashed
the following incremental:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e66ff2711..038f79b80 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1231,12 +1231,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
  * is valid */
 if (!(dev->devargs && !strcmp(dev->devargs, new_devargs)
&& rte_eth_dev_is_valid_port(dev->port_id))) {
-err = EINVAL;
 int new_port_id = netdev_dpdk_process_devargs(new_devargs);
-if (new_port_id == dev->port_id) {
+if (!rte_eth_dev_is_valid_port(new_port_id)) {
+err = EINVAL;
+} else if (new_port_id == dev->port_id) {
 /* Already configured, do not reconfigure again */
 err = 0;
-} else if (rte_eth_dev_is_valid_port(new_port_id)) {
+} else {
 struct netdev_dpdk *dup_dev;
 dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
 if (dup_dev) {

and applied to master, thanks!
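The incremental changes the order of the checks on the port id returned by netdev_dpdk_process_devargs(): validity is tested first, so an unusable devargs string makes set_config() itself fail with EINVAL. In the old order, a new id equal to the current one was treated as "already configured" even when both were invalid. A minimal sketch of the new decision order follows; `classify_new_port()`, `port_is_valid()` and `MAX_PORTS` are hypothetical stand-ins (not OVS or DPDK APIs), and the duplicate-device lookup is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bound: assume ports 0..MAX_PORTS-1 may exist. */
#define MAX_PORTS 32

/* Hypothetical stand-in for rte_eth_dev_is_valid_port(). */
static bool
port_is_valid(int port_id)
{
    return port_id >= 0 && port_id < MAX_PORTS;
}

/* Order of checks after the incremental: reject an invalid new port first,
 * then treat an unchanged port as a no-op.  Returns 0 or a positive errno
 * value; '*needs_reconfigure' is set when a different, valid port was
 * selected. */
static int
classify_new_port(int cur_port_id, int new_port_id, bool *needs_reconfigure)
{
    *needs_reconfigure = false;
    if (!port_is_valid(new_port_id)) {
        return EINVAL;              /* Fail early, in set_config(). */
    } else if (new_port_id == cur_port_id) {
        return 0;                   /* Already configured. */
    } else {
        *needs_reconfigure = true;  /* Real code also checks for duplicates. */
        return 0;
    }
}
```

With the old order, `classify_new_port(-1, -1, ...)` would have returned 0; with the new order it returns EINVAL.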

2017-01-05 2:42 GMT-08:00 Ciara Loftus :
> 'dpdk' ports no longer have naming restrictions. Now, instead of
> specifying the dpdk port ID as part of the name, the PCI address of the
> device must be specified via the 'dpdk-devargs' option, e.g.:
>
> ovs-vsctl add-port br0 my-port
> ovs-vsctl set Interface my-port type=dpdk
>   options:dpdk-devargs=:06:00.3
>
> The user no longer needs to hotplug-attach DPDK ports by issuing the
> ovs-appctl netdev-dpdk/attach command. The hotplug is now invoked
> automatically when a valid PCI address is set in dpdk-devargs. The
> format of the ovs-appctl netdev-dpdk/detach command has also changed:
> the user must now specify the relevant PCI address as input instead of
> the port name.
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Daniele Di Proietto 
> Co-authored-by: Daniele Di Proietto 
> Signed-off-by: Kevin Traynor 
> Co-authored-by: Kevin Traynor 
> ---
> Changelog:
> * Keep port-detach appctl function - use PCI as input arg
> * Add requires_mutex to devargs processing functions
> * use reconfigure infrastructure for devargs changes
> * process devargs even if valid portid ie. device already configured
> * report err if dpdk-devargs is not specified
> * Add Daniele's incremental patch & Sign-off + Co-author tags
> * Update details of detach command to reflect PCI is needed instead of
>   port name
> * Update NEWS to mention that the new naming scheme is not backwards
>   compatible
> * Use existing DPDK functions to get port ID from PCI address/devname
> * Merged process_devargs and process_pdevargs functions
> * Removed unnecessary requires_mutex from devargs processing fn
> * Fix case where port is re-attached after detach
> * Add note to documentation that devices won't work until devargs set.
> * Set netdev type and dpdk-devargs in one command in docs to avoid
>   errors.
> * Change port names in documentation to emphasise arbitrary-ness.
> * Add Kevin's incremental & sign-off/co-authored-by
> * Check if devargs string has changed before processing it as suggested
>   by Daniele.
> * Print error if attach fails
>
>  Documentation/howto/dpdk.rst |   7 +-
>  Documentation/intro/install/dpdk.rst |   9 +-
>  NEWS |   5 +
>  lib/netdev-dpdk.c| 181 
> ---
>  vswitchd/vswitch.xml |   8 ++
>  5 files changed, 148 insertions(+), 62 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 13da548..1ff672c 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -312,14 +312,13 @@ In order to attach a port, it has to be bound to DPDK using the
>
>  Then it can be attached to OVS::
>
> -$ ovs-appctl netdev-dpdk/attach :01:00.0
> -
> -At this point, the user can create a dpdk port using the ``add-port`` 
> command.
> +$ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
> +options:dpdk-devargs=:01:00.0
>
>  It is also possible to detach a port from ovs, the user has to remove the
>  port using the del-port command, then it can be detached using::
>
> -$ ovs-appctl netdev-dpdk/detach dpdk0
> +$ ovs-appctl netdev-dpdk/detach dpdkx
>
>  This feature is not supported with VFIO and does not work with some NICs.
>  For more information please refer to the `DPDK Port Hotplug Framework
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Do

Re: [ovs-dev] [PATCH v5 3/3] netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)

2017-01-05 Thread Daniele Di Proietto
2017-01-05 2:42 GMT-08:00 Ciara Loftus :
> Prior to this commit, the 'dpdk' port type could only be used for
> physical DPDK devices. Now, virtual devices (or 'vdevs') are supported.
> 'vdev' devices are those which use virtual DPDK Poll Mode Drivers, e.g.
> null, pcap. To add a DPDK vdev, a valid 'dpdk-devargs' must be set for
> the given dpdk port. The format expected is 'eth_' where
> 'x' is a number between 0 and RTE_MAX_ETHPORTS - 1.
>
> For example, to add a port that uses the 'null' DPDK PMD driver:
>
> ovs-vsctl set Interface null0 options:dpdk-devargs=eth_null0
>
> Not all DPDK vdevs have been verified to work at this point in time.
>
> Signed-off-by: Ciara Loftus 

Thanks for all your work on this!

Applied to master
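With this patch, netdev_dpdk_process_devargs() no longer special-cases PCI addresses: as the diff below shows, it looks the devargs string up by name and attaches the device only if it is not already known, so PCI addresses and vdev names follow the same path. A rough model of that flow is sketched here; `lookup_by_name()`, `attach_device()`, `process_devargs()` and the small name table are hypothetical stand-ins for rte_eth_dev_get_port_by_name() and rte_eth_dev_attach(), not the real implementation.

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 4   /* Hypothetical; DPDK uses RTE_MAX_ETHPORTS. */

/* Hypothetical table of attached devices, indexed by port id.  Stores the
 * caller's pointer, so 'devargs' must outlive the table (real code copies). */
static const char *attached[MAX_PORTS];

/* Returns the port id of 'devargs' if it is already attached, else -1. */
static int
lookup_by_name(const char *devargs)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (attached[i] && !strcmp(attached[i], devargs)) {
            return i;
        }
    }
    return -1;
}

/* "Hotplugs" 'devargs' into the first free slot; returns its id or -1. */
static int
attach_device(const char *devargs)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!attached[i]) {
            attached[i] = devargs;
            return i;
        }
    }
    return -1;
}

/* Same flow for PCI addresses and vdev names: reuse a known device,
 * otherwise try to attach it. */
static int
process_devargs(const char *devargs)
{
    int port_id = lookup_by_name(devargs);

    return port_id >= 0 ? port_id : attach_device(devargs);
}
```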


> ---
> Changelog:
> * Updated process_vdevargs to work with Daniele's incremental in the
>   previous patch.
> * Allow vdev detach
> * Update docs to show af_packet example
> * Fix af_packet docs example
> * Fix style issues in docs
>
>  Documentation/howto/dpdk.rst | 29 +
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 35 +++
>  vswitchd/vswitch.xml |  9 +++--
>  4 files changed, 48 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 1ff672c..fbb4b53 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -324,6 +324,35 @@ This feature is not supported with VFIO and does not work with some NICs.
>  For more information please refer to the `DPDK Port Hotplug Framework
>  
> `__.
>
> +.. _vdev-support:
> +
> +Vdev Support
> +
> +
> +DPDK provides drivers for both physical and virtual devices. Physical DPDK
> +devices are added to OVS by specifying a valid PCI address in 'dpdk-devargs'.
> +Virtual DPDK devices which do not have PCI addresses can be added using a
> +different format for 'dpdk-devargs'.
> +
> +Typically, the format expected is 'eth_' where 'x' is a
> +number between 0 and RTE_MAX_ETHPORTS -1 (31).
> +
> +For example to add a dpdk port that uses the 'null' DPDK PMD driver::
> +
> +   $ ovs-vsctl add-port br0 null0 -- set Interface null0 type=dpdk \
> +   options:dpdk-devargs=eth_null0
> +
> +Similarly, to add a dpdk port that uses the 'af_packet' DPDK PMD driver::
> +
> +   $ ovs-vsctl add-port br0 myeth0 -- set Interface myeth0 type=dpdk \
> +   options:dpdk-devargs=eth_af_packet0,iface=eth0
> +
> +More information on the different types of virtual DPDK PMDs can be found in
> +the `DPDK documentation
> +`__.
> +
> +Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
> +
>  .. _dpdk-ovs-in-guest:
>
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index d66d402..cc319a9 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -53,6 +53,7 @@ Post-v2.6.0
> with the old dpdk naming scheme is broken, and as such a
> device will not be available for use until a valid dpdk-devargs is
> specified.
> + * Virtual DPDK Poll Mode Driver (vdev PMD) support.
> - Fedora packaging:
>   * A package upgrade does not automatically restart OVS service.
> - ovs-vswitchd/ovs-vsctl:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index e66ff27..79eddb5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1134,25 +1134,19 @@ netdev_dpdk_lookup_by_port_id(int port_id)
>  static int
>  netdev_dpdk_process_devargs(const char *devargs)
>  {
> -struct rte_pci_addr addr;
>  uint8_t new_port_id = UINT8_MAX;
>
> -if (!eal_parse_pci_DomBDF(devargs, &addr)) {
> -/* Valid PCI address format detected - configure physical device */
> -if (!rte_eth_dev_count()
> -|| rte_eth_dev_get_port_by_name(devargs, &new_port_id)
> -|| !rte_eth_dev_is_valid_port(new_port_id)) {
> -/* PCI device not found in DPDK, attempt to attach it */
> -if (!rte_eth_dev_attach(devargs, &new_port_id)) {
> -/* Attach successful */
> -VLOG_INFO("Device "PCI_PRI_FMT" has been attached to DPDK",
> -  addr.domain, addr.bus, addr.devid, addr.function);
> -} else {
> -/* Attach unsuccessful */
> -VLOG_INFO("Error attaching device "PCI_PRI_FMT" to DPDK",
> -  addr.domain, addr.bus, addr.devid, addr.function);
> -return -1;
> -}
> +if (!rte_eth_dev_count()
> +|| rte_eth_dev_get_port_by_name(devargs, &new_port_id)
> +|| !rte_eth_dev_is_valid_port(new_port_id)) {
> +/* Device not found in DPDK, attempt to attach it */
> +if (!rte_eth_dev_attach(devargs, &new_port_id)) {
> +/* Attach successful */
> +VLOG_INFO("Device '%s' attached t

[ovs-dev] [PATCH] ofproto-dpif: Continue port dump if a port is missing from dpif-netdev.

2017-01-05 Thread Daniele Di Proietto
bridge_delete_or_reconfigure() deletes every interface that's not dumped
by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
calling port_query_by_name().  If port_query_by_name() returns an error,
the dump is interrupted.  If port_query_by_name() returns ENODEV, the
device doesn't exist and the dump can continue.

port_query_by_name() for the userspace datapath returns ENOENT instead
of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
handled correctly by port_dump_next().

This commit fixes the problem by handling ENOENT like ENODEV.

dpif-netdev handles reconfiguration errors for an interface by deleting
it from the datapath, so it's possible that a device is missing. When this
happens we must make sure that port_dump_next() continues to dump other
devices, otherwise they will be deleted and the two layers will have an
inconsistent view.
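The dump behaviour described above can be modelled in a few lines. A query that fails with ENODEV (or, with this fix, ENOENT from the userspace datapath) means "this port is gone, skip it", while any other error stops the dump. The sketch is illustrative only: `struct fake_port`, `query_port()` and `count_dumped_ports()` are hypothetical and do not mirror the real ofproto-dpif code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical datapath entry: the error a query for this port returns. */
struct fake_port {
    const char *name;   /* For readability only. */
    int query_error;    /* 0, ENODEV, ENOENT or some other errno value. */
};

/* Hypothetical stand-in for port_query_by_name(). */
static int
query_port(const struct fake_port *p)
{
    return p->query_error;
}

/* Walks 'ports' like port_dump_next(): missing ports (ENODEV or ENOENT)
 * are skipped, any other error interrupts the dump.  Returns the number of
 * ports dumped and stores the terminating error, if any, in '*errorp'. */
static int
count_dumped_ports(const struct fake_port *ports, int n, int *errorp)
{
    int dumped = 0;

    *errorp = 0;
    for (int i = 0; i < n; i++) {
        int error = query_port(&ports[i]);

        if (!error) {
            dumped++;
        } else if (error != ENODEV && error != ENOENT) {
            *errorp = error;    /* Real problem: stop the dump. */
            break;
        }
        /* ENODEV/ENOENT: port missing from the datapath, keep going. */
    }
    return dumped;
}
```

Without the ENOENT case, a single missing port would end the dump early, and every port after it would look absent to bridge_delete_or_reconfigure().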

The problem was found while developing new code.

Signed-off-by: Daniele Di Proietto 
---
 ofproto/ofproto-dpif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 3b036a116..5051667b9 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3481,7 +3481,7 @@ port_dump_next(const struct ofproto *ofproto_, void *state_,
 *port = state->port;
 state->has_port = true;
 return 0;
-} else if (error != ENODEV) {
+} else if (error != ENODEV && error != ENOENT) {
 return error;
 }
 }
-- 
2.11.0



[ovs-dev] [PATCH v2] dpif: Return ENODEV from dpif_port_query_by_name() if there's no port.

2017-01-06 Thread Daniele Di Proietto
bridge_delete_or_reconfigure() deletes every interface that's not dumped
by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
calling port_query_by_name().  If port_query_by_name() returns an error,
the dump is interrupted.  If port_query_by_name() returns ENODEV, the
device doesn't exist and the dump can continue.

port_query_by_name() for the userspace datapath returns ENOENT instead
of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
handled correctly by port_dump_next().

This commit fixes the problem by translating ENOENT into ENODEV in
dpif_port_query_by_name().

dpif-netdev handles reconfiguration errors for an interface by deleting
it from the datapath, so it's possible that a device is missing. When this
happens we must make sure that port_dump_next() continues to dump other
devices, otherwise they will be deleted and the two layers will have an
inconsistent view.
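Translating the error once, at the dpif layer, means every caller only has to know about ENODEV. A sketch of that normalization follows, with a hypothetical `provider_query()` standing in for a provider's port_query_by_name() callback and a hypothetical `query_port_normalized()` wrapper in place of dpif_port_query_by_name() (logging is omitted).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical provider callback: only "br0" exists, and a missing port
 * is reported with ENOENT, as dpif-netdev did before this fix. */
static int
provider_query(const char *devname)
{
    return strcmp(devname, "br0") == 0 ? 0 : ENOENT;
}

/* Hypothetical wrapper in the style of dpif_port_query_by_name(): maps
 * ENOENT to ENODEV, so callers only ever see ENODEV for "no such port"
 * and can treat other errno values as real failures. */
static int
query_port_normalized(const char *devname)
{
    int error = provider_query(devname);

    if (error == ENOENT) {
        error = ENODEV;
    }
    return error;
}
```

With this in place, a caller such as port_dump_next() needs only its existing `error != ENODEV` check. The other option, having the provider return ENODEV itself, is the one v3 of this patch adopts.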

The problem was found while developing new code.

Signed-off-by: Daniele Di Proietto 
---
v2:
* Translate ENOENT into ENODEV in dpif_port_query_by_name(), instead of
  handling both in port_dump_next().
---
 lib/dpif.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/dpif.c b/lib/dpif.c
index 53958c559..8ef0049c8 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -653,6 +653,8 @@ dpif_port_query_by_number(const struct dpif *dpif, odp_port_t port_no,
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
  *
+ * Retuns ENODEV if the port doesn't exist.
+ *
  * The caller owns the data in 'port' and must free it with
  * dpif_port_destroy() when it is no longer needed. */
 int
@@ -666,12 +668,18 @@ dpif_port_query_by_name(const struct dpif *dpif, const char *devname,
 } else {
 memset(port, 0, sizeof *port);
 
-/* For ENOENT or ENODEV we use DBG level because the caller is probably
+if (error == ENOENT) {
+/* Some dpif provider can return ENOENT if the port is not there,
+ * we want to translate that to ENODEV. */
+error = ENODEV;
+}
+
+/* For ENODEV we use DBG level because the caller is probably
  * interested in whether 'dpif' actually has a port 'devname', so that
  * it's not an issue worth logging if it doesn't.  Other errors are
  * uncommon and more likely to indicate a real problem. */
 VLOG_RL(&error_rl,
-error == ENOENT || error == ENODEV ? VLL_DBG : VLL_WARN,
+error == ENODEV ? VLL_DBG : VLL_WARN,
 "%s: failed to query port %s: %s",
 dpif_name(dpif), devname, ovs_strerror(error));
 }
-- 
2.11.0



Re: [ovs-dev] [PATCH] ofproto-dpif: Continue port dump if a port is missing from dpif-netdev.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 09:28, "Ben Pfaff"  wrote:

>On Thu, Jan 05, 2017 at 08:37:26PM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>> 
>> This commit fixes the problem by handling ENOENT like ENODEV.
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices, otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto 
>
>I'm not sure whether there's a difference in meaning between ENOENT and
>ENODEV when it comes from these functions.  I wonder whether the dpif
>layer should translate one of them into the other, for callers'
>convenience.

Good idea, let me send a v2.

Thanks,

Daniele

>
>Acked-by: Ben Pfaff 


[ovs-dev] [PATCH v3] dpif: Return ENODEV from dpif_port_query_by_*() if there's no port.

2017-01-06 Thread Daniele Di Proietto
bridge_delete_or_reconfigure() deletes every interface that's not dumped
by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
calling port_query_by_name().  If port_query_by_name() returns an error,
the dump is interrupted.  If port_query_by_name() returns ENODEV, the
device doesn't exist and the dump can continue.

port_query_by_name() for the userspace datapath returns ENOENT instead
of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
handled correctly by port_dump_next().

dpif-netdev handles reconfiguration errors for an interface by deleting
it from the datapath, so it's possible that a device is missing. When this
happens we must make sure that port_dump_next() continues to dump other
devices, otherwise they will be deleted and the two layers will have an
inconsistent view.

This commit fixes the problem by returning ENODEV from the userspace
datapath if the port doesn't exist, and by documenting this clearly in
the dpif interfaces.

The problem was found while developing new code.

Signed-off-by: Daniele Di Proietto 
---
v3: Return ENODEV instead of ENOENT from dpif-netdev. Document that ENODEV
means that the port doesn't exist, other error numbers indicate problems.

---
 lib/dpif-netdev.c   |  7 +--
 lib/dpif-provider.h |  4 
 lib/dpif.c  | 11 +++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0b73056a8..1953bb37c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1435,7 +1435,7 @@ get_port_by_number(struct dp_netdev *dp,
 return EINVAL;
 } else {
 *portp = dp_netdev_lookup_port(dp, port_no);
-return *portp ? 0 : ENOENT;
+return *portp ? 0 : ENODEV;
 }
 }
 
@@ -1473,7 +1473,10 @@ get_port_by_name(struct dp_netdev *dp,
 return 0;
 }
 }
-return ENOENT;
+
+/* Callers of dpif_netdev_port_query_by_name() expect ENODEV for a non
+ * existing port. */
+return ENODEV;
 }
 
 static int
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 56b88f93d..d3b2bb91d 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -176,6 +176,10 @@ struct dpif_class {
  * If 'port' is not null, stores information about the port into
  * '*port' if successful.
  *
+ * If the port doesn't exist, the provider must return ENODEV.  Other
+ * error numbers means that something wrong happened and will be
+ * treated differently by upper layers.
+ *
  * If 'port' is not null, the caller takes ownership of data in
  * 'port' and must free it with dpif_port_destroy() when it is no
  * longer needed. */
diff --git a/lib/dpif.c b/lib/dpif.c
index 53958c559..cc4936c70 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -602,7 +602,7 @@ bool
 dpif_port_exists(const struct dpif *dpif, const char *devname)
 {
 int error = dpif->dpif_class->port_query_by_name(dpif, devname, NULL);
-if (error != 0 && error != ENOENT && error != ENODEV) {
+if (error != 0 && error != ENODEV) {
 VLOG_WARN_RL(&error_rl, "%s: failed to query port %s: %s",
  dpif_name(dpif), devname, ovs_strerror(error));
 }
@@ -631,6 +631,8 @@ dpif_port_set_config(struct dpif *dpif, odp_port_t port_no,
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
  *
+ * Retuns ENODEV if the port doesn't exist.
+ *
  * The caller owns the data in 'port' and must free it with
  * dpif_port_destroy() when it is no longer needed. */
 int
@@ -653,6 +655,8 @@ dpif_port_query_by_number(const struct dpif *dpif, odp_port_t port_no,
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
  *
+ * Retuns ENODEV if the port doesn't exist.
+ *
  * The caller owns the data in 'port' and must free it with
  * dpif_port_destroy() when it is no longer needed. */
 int
@@ -666,12 +670,11 @@ dpif_port_query_by_name(const struct dpif *dpif, const char *devname,
 } else {
 memset(port, 0, sizeof *port);
 
-/* For ENOENT or ENODEV we use DBG level because the caller is probably
+/* For ENODEV we use DBG level because the caller is probably
  * interested in whether 'dpif' actually has a port 'devname', so that
  * it's not an issue worth logging if it doesn't.  Other errors are
  * uncommon and more likely to indicate a real problem. */
-VLOG_RL(&error_rl,
-error == ENOENT || error == ENODEV ? VLL_DBG : VLL_WARN,
+VLOG_RL(&error_rl, error == ENODEV ? VLL_DBG : VLL_WARN,
 "%s: failed to query port %s: %s",
 dpif_name(dpif), devname, ovs_strerror(error));
 }
-- 
2.11.0



Re: [ovs-dev] [PATCH v2] dpif: Return ENODEV from dpif_port_query_by_name() if there's no port.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 11:34, "Ben Pfaff"  wrote:

>On Fri, Jan 06, 2017 at 10:59:07AM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>
>Should port_query_by_name() for the userspace datapath return ENODEV,
>instead?

Sorry to waste your time on this.  Yes, that seems the more appropriate solution.

I decided to handle both ENODEV and ENOENT to be consistent with what we did in
the past, e.g. bee6b8bc16b1 ("dpif: Don't log warning for ENOENT with
dpif_port_exists().").

I suspected that ENOENT could only come from the userspace datapath, but I
wasn't too sure about that.

After looking at vport_cmd_get() and testing it, I couldn't find any ENOENT,
so we probably added all those special cases just for the userspace datapath.

How about the following v3?

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327323.html

Thanks,

Daniele

>
>> This commit fixes the problem by translating ENOENT into ENODEV in
>> dpif_port_query_by_name().
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices, otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto 
>> ---
>> v2:
>> * Translate ENOENT into ENODEV in dpif_port_query_by_name(), instead of
>>   handling both in port_dump_next().
>
>Acked-by: Ben Pfaff 


Re: [ovs-dev] [PATCH v3] dpif: Return ENODEV from dpif_port_query_by_*() if there's no port.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 13:01, "Ben Pfaff"  wrote:

>On Fri, Jan 06, 2017 at 12:42:35PM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices, otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> This commit fixes the problem by returning ENODEV from the userspace
>> datapath if the port doesn't exist, and by documenting this clearly in
>> the dpif interfaces.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto 
>> ---
>> v3: Return ENODEV instead of ENOENT from dpif-netdev. Document that ENODEV
>> means that the port doesn't exist, other error numbers indicate problems.
>
>Acked-by: Ben Pfaff 

Thanks!  Pushed to master


[ovs-dev] [PATCH] netdev: Add 'errp' to set_config().

2017-01-06 Thread Daniele Di Proietto
Since 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
a detailed error message to the user.  Tunnel devices and patch ports
rely heavily on set_config() as well.

This commit adds a parameter to set_config() that can be used to return
an error message, and makes use of it in netdev-dpdk and netdev-vport.
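The 'errp' convention lets a callee hand a human-readable message back to its caller instead of only an errno value. Below is a minimal sketch of the pattern in plain C. It assumes the callee allocates the message and the caller frees it. `set_error()` and `example_set_config()` are hypothetical, and the "peer" check is simplified from the patch-port case below. OVS itself relies on its own helpers, such as xasprintf() and the VLOG_WARN_BUF() macro used in this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stores a newly allocated, formatted message in '*errp' if 'errp' is
 * nonnull.  The caller owns the string and must free() it. */
static void
set_error(char **errp, const char *format, ...)
{
    va_list args;
    int len;
    char *msg;

    if (!errp) {
        return;                         /* Caller is not interested. */
    }

    va_start(args, format);
    len = vsnprintf(NULL, 0, format, args);   /* Measure first. */
    va_end(args);
    if (len < 0) {
        *errp = NULL;
        return;
    }

    msg = malloc((size_t) len + 1);
    if (msg) {
        va_start(args, format);
        vsnprintf(msg, (size_t) len + 1, format, args);
        va_end(args);
    }
    *errp = msg;
}

/* Hypothetical set_config() for a patch-like device that requires a peer. */
static int
example_set_config(const char *name, const char *peer, char **errp)
{
    if (!peer || !peer[0]) {
        set_error(errp, "%s: patch type requires valid 'peer' argument",
                  name);
        return EINVAL;
    }
    return 0;
}
```

A caller that only needs the errno value can pass NULL for 'errp'. Otherwise, it can show the message, as ovs-vsctl does in the examples below, and then free it.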

Before this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

After this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing 'options:dpdk-devargs'. The old 'dpdk' names are not supported.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires valid 'peer' argument.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

CC: Ciara Loftus 
CC: Kevin Traynor 
Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 27 ++
 lib/netdev-dummy.c|  3 +-
 lib/netdev-provider.h |  9 --
 lib/netdev-vport.c| 76 ++-
 lib/netdev.c  | 10 +--
 5 files changed, 84 insertions(+), 41 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0f02c4d74..1bcc27a62 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1132,7 +1132,7 @@ netdev_dpdk_lookup_by_port_id(int port_id)
 }
 
 static int
-netdev_dpdk_process_devargs(const char *devargs)
+netdev_dpdk_process_devargs(const char *devargs, char **errp)
 {
 uint8_t new_port_id = UINT8_MAX;
 
@@ -1145,7 +1145,7 @@ netdev_dpdk_process_devargs(const char *devargs)
 VLOG_INFO("Device '%s' attached to DPDK", devargs);
 } else {
 /* Attach unsuccessful */
-VLOG_INFO("Error attaching device '%s' to DPDK", devargs);
+VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
 return -1;
 }
 }
@@ -1184,7 +1184,8 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
 }
 
 static int
-netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
+netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
+   char **errp)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 bool rx_fc_en, tx_fc_en, autoneg;
@@ -1225,7 +1226,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
  * is valid */
 if (!(dev->devargs && !strcmp(dev->devargs, new_devargs)
&& rte_eth_dev_is_valid_port(dev->port_id))) {
-int new_port_id = netdev_dpdk_process_devargs(new_devargs);
+int new_port_id = netdev_dpdk_process_devargs(new_devargs, errp);
 if (!rte_eth_dev_is_valid_port(new_port_id)) {
 err = EINVAL;
 } else if (new_port_id == dev->port_id) {
@@ -1235,10 +1236,10 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 struct netdev_dpdk *dup_dev;
 dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
 if (dup_dev) {
-VLOG_WARN("'%s' is trying to use device '%s' which is "
-  "already in use by '%s'.",
-  netdev_get_name(netdev), new_devargs,
-  netdev_get_name(&dup_dev->up));
+   

[ovs-dev] [PATCH v3 02/18] dpif-netdev: Take non_pmd_mutex to access tx cached ports.

2017-01-08 Thread Daniele Di Proietto
As documented in dp_netdev_pmd_thread, we must take non_pmd_mutex to
access the tx port caches for the non-pmd thread.

Found by inspection.
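The rule being enforced is the usual one for state shared between threads: every access to the non-pmd thread's caches, teardown included, happens under the same mutex that other users hold. Here is a self-contained sketch with POSIX threads. `struct fake_pmd`, its `cached_ports` counter and `free_cached_ports_locked()` are hypothetical, and plain `pthread_mutex_t` replaces the ovs_mutex_lock()/ovs_mutex_unlock() wrappers used in the diff.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread state for the thread that has no core of its own. */
struct fake_pmd {
    pthread_mutex_t *non_pmd_mutex;   /* Shared with other non-pmd users. */
    int cached_ports;                 /* Stand-in for the tx port caches. */
};

/* Teardown takes the same lock as every other accessor of the cache, so a
 * concurrent sender cannot use the cache while it is being freed. */
static void
free_cached_ports_locked(struct fake_pmd *pmd)
{
    pthread_mutex_lock(pmd->non_pmd_mutex);
    pmd->cached_ports = 0;            /* Real code frees the cached ports here. */
    pthread_mutex_unlock(pmd->non_pmd_mutex);
}
```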

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9003f703d..f600cab00 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3353,8 +3353,10 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
 /* NON_PMD_CORE_ID doesn't have a thread, so we don't have to synchronize,
  * but extra cleanup is necessary */
 if (pmd->core_id == NON_PMD_CORE_ID) {
+ovs_mutex_lock(&dp->non_pmd_mutex);
 emc_cache_uninit(&pmd->flow_cache);
 pmd_free_cached_ports(pmd);
+ovs_mutex_unlock(&dp->non_pmd_mutex);
 } else {
 latch_set(&pmd->exit_latch);
 dp_netdev_reload_pmd__(pmd);
-- 
2.11.0



[ovs-dev] [PATCH v3 00/18] DPDK/pmd reconfiguration refactor and bugfixes

2017-01-08 Thread Daniele Di Proietto
The first two commits of the series are trivial bugfixes for dpif-netdev.

Then the series fixes a long-standing bug that caused a crash when the
admin link state of a port is changed while traffic is flowing.

The next part makes use of reconfiguration for port add: this makes
the operation twice as fast and reduces some code duplication.  This part
conflicts with the port naming change, so I'm willing to postpone it unless
it turns out to be useful for the port naming change.

The rest of the series refactors a lot of code in dpif-netdev:

* We no longer start pmd threads on demand for each numa node.  This made
  the code very complicated and introduced a lot of bugs.
* The pmd threads' state is now internal to dpif-netdev and it's not stored in
  ovs-numa.
* There's now a single function that handles pmd threads/ports changes: this
  reduces code duplication and makes port reconfiguration faster, as we don't
  have to bring down the whole datapath.

v2->v3:

* Rebased:
  * Rebased against dpdk arbitrary name change.
  * Dropped unsigned 'core_id' commit because a similar fix is already
on master
* Put space between *FOR_EACH* and (
* Actually use new FOR_EACH_NUMA_ON_DUMP
* Use hmap_contains() instead of dp_netdev_lookup_port() in a couple of
  places
* Restore spaces in log messages, lost while wrapping the string.

v1->v2:

* Postpone cls deletion in dp_netdev_destroy_pmd()
* Allow ports to be in tnl_port_cache and send_port_cache at the same time
* Set counter to 1025 when reloading pmd without queues to be polled
* Rebased:
  * Allow 0x in pmd-cpu-mask
  * ...
* Don't duplicate get_core_by_core_id() in get_cpu_core()
* New commit for ovs-numa: don't use hmap_first_with_hash()
* Keep per numa count of cores in ovs_numa_dump
* Print queue id and port name in warning if there's no pmd thread
* Extract pmd_remove_stale_ports() from reconfigure_datapath()
* s/reload_all_pmds()/reload_affected_pmds()/
* Declare variables at the beginning of the block in rxq_scheduling()
* Use 'q' instead of 'port->rxqs[qid]' in a couple of places
* Unref pmd in rxq_scheduling()
* Simplify check for changed pmd threads
* Properly reset queues to unassigned in reconfigure_datapath()
* Optimize tx port insertion in pmd cache


Daniele Di Proietto (18):
  dpif-netdev: Fix memory leak.
  dpif-netdev: Take non_pmd_mutex to access tx cached ports.
  dpif-netdev: Don't try to output on a device without txqs.
  netdev-dpdk: Don't call rte_dev_stop() in update_flags().
  netdev-dpdk: Start also dpdkr devices only once on port-add.
  netdev-dpdk: Refactor construct and destruct.
  dpif-netdev: Use a boolean instead of pmd->port_seq.
  dpif-netdev: Block pmd threads if there are no ports.
  dpif-netdev: Create pmd threads for every numa node.
  dpif-netdev: Make 'static_tx_qid' const.
  dpctl: Avoid making assumptions on pmd threads.
  ovs-numa: New ovs_numa_dump_contains_core() function.
  ovs-numa: Add new dump types.
  ovs-numa: Don't use hmap_first_with_hash().
  ovs-numa: Add per numa and global counts in dump.
  dpif-netdev: Use hmap for poll_list in pmd threads.
  dpif-netdev: Centralized threads and queues handling code.
  ovs-numa: Remove unused functions.

 lib/dpctl.c       |  107 +---
 lib/dpif-netdev.c | 1427 -
 lib/dpif.c        |    6 +-
 lib/dpif.h        |   12 +-
 lib/netdev-dpdk.c |  170 +++
 lib/netdev.c      |   41 +-
 lib/netdev.h      |    1 +
 lib/ovs-numa.c    |  284 +--
 lib/ovs-numa.h    |   35 +-
 tests/pmd.at      |   49 +-
 vswitchd/bridge.c |    2 +
 11 files changed, 1079 insertions(+), 1055 deletions(-)

-- 
2.11.0


[ovs-dev] [PATCH v3 01/18] dpif-netdev: Fix memory leak.

2017-01-08 Thread Daniele Di Proietto
We keep all the per-port classifiers around, since they can be reused,
but when a pmd thread is destroyed we should free them.

Found using valgrind.

Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1f9661a2..9003f703d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -,6 +,7 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 /* All flows (including their dpcls_rules) have been deleted already */
 CMAP_FOR_EACH (cls, node, &pmd->classifiers) {
 dpcls_destroy(cls);
+ovsrcu_postpone(free, cls);
 }
 cmap_destroy(&pmd->classifiers);
 cmap_destroy(&pmd->flow_table);
-- 
2.11.0


[ovs-dev] [PATCH v3 04/18] netdev-dpdk: Don't call rte_dev_stop() in update_flags().

2017-01-08 Thread Daniele Di Proietto
Calling rte_eth_dev_stop() while the device is running causes a crash.

We could use rte_eth_dev_set_link_down(), but not every PMD implements
that, and I found one NIC where that has no effect.

Instead, this commit checks whether the device has the NETDEV_UP flag when
transmitting or receiving (similarly to what we do for vhostuser). I
didn't notice any performance difference from this check when the
device is up.

An alternative would be to remove the device queues from the pmd threads'
tx and rx caches, but that requires reconfiguration, which I'd prefer
to avoid because the change can come from OpenFlow.
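A minimal sketch of the admin-state check described above, assuming a toy device structure rather than the real netdev-dpdk types (`toy_dev`, `TOY_UP`, `toy_recv()`, `toy_send()` and `toy_set_up()` are hypothetical names). The control path only flips a flag; the data path checks it, returning `EAGAIN` on receive and dropping the batch on transmit, so nothing ever stops the hardware underneath a running pmd thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NETDEV_UP / NETDEV_PROMISC. */
enum { TOY_UP = 1 << 0, TOY_PROMISC = 1 << 1 };

struct toy_dev {
    int flags;        /* Admin state, changed by the control path. */
    int rx_pending;   /* Packets the "hardware" has ready. */
    int tx_sent;      /* Packets handed to the "hardware". */
    int tx_dropped;   /* Packets dropped because the device was down. */
};

/* Receive path: bail out early if the device is administratively down. */
static int
toy_recv(struct toy_dev *dev, int *n_rx)
{
    if (!(dev->flags & TOY_UP)) {
        *n_rx = 0;
        return EAGAIN;
    }
    *n_rx = dev->rx_pending;
    dev->rx_pending = 0;
    return 0;
}

/* Transmit path: when down, drop the batch instead of queueing it
 * (the patch frees it with dp_packet_delete_batch()). */
static void
toy_send(struct toy_dev *dev, int n_pkts)
{
    if (!(dev->flags & TOY_UP)) {
        dev->tx_dropped += n_pkts;
        return;
    }
    dev->tx_sent += n_pkts;
}

/* Control path: toggling the flag is all that update_flags() needs. */
static void
toy_set_up(struct toy_dev *dev, bool up)
{
    if (up) {
        dev->flags |= TOY_UP;
    } else {
        dev->flags &= ~TOY_UP;
    }
}
```

Packets that arrive while the port is down simply stay in the (toy) hardware queue until the flag is set again.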

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8bb908691..2df3e220c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -783,8 +783,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp);
 dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
 
-dev->flags = NETDEV_UP | NETDEV_PROMISC;
-
 /* Get the Flow control configuration for DPDK-ETH */
 diag = rte_eth_dev_flow_ctrl_get(dev->port_id, &dev->fc_conf);
 if (diag) {
@@ -890,6 +888,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;
+
+dev->flags = NETDEV_UP | NETDEV_PROMISC;
+
 if (type == DPDK_DEV_ETH) {
 if (rte_eth_dev_is_valid_port(dev->port_id)) {
 err = dpdk_eth_dev_init(dev);
@@ -900,8 +901,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
 } else {
 dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-/* Enable DPDK_DEV_VHOST device and set promiscuous mode flag. */
-dev->flags = NETDEV_UP | NETDEV_PROMISC;
 }
 
 if (!dev->tx_q) {
@@ -1591,6 +1590,10 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch)
 int nb_rx;
 int dropped = 0;
 
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+return EAGAIN;
+}
+
 nb_rx = rte_eth_rx_burst(rx->port_id, rxq->queue_id,
  (struct rte_mbuf **) batch->packets,
  NETDEV_MAX_BURST);
@@ -1821,6 +1824,11 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
struct dp_packet_batch *batch, bool may_steal,
bool concurrent_txq)
 {
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+dp_packet_delete_batch(batch, may_steal);
+return;
+}
+
 if (OVS_UNLIKELY(concurrent_txq)) {
 qid = qid % dev->up.n_txq;
 rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
@@ -2285,8 +2293,6 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
enum netdev_flags *old_flagsp)
 OVS_REQUIRES(dev->mutex)
 {
-int err;
-
 if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
 return EINVAL;
 }
@@ -2300,20 +2306,10 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
 }
 
 if (dev->type == DPDK_DEV_ETH) {
-if (dev->flags & NETDEV_UP) {
-err = rte_eth_dev_start(dev->port_id);
-if (err)
-return -err;
-}
-
 if (dev->flags & NETDEV_PROMISC) {
 rte_eth_promiscuous_enable(dev->port_id);
 }
 
-if (!(dev->flags & NETDEV_UP)) {
-rte_eth_dev_stop(dev->port_id);
-}
-
 netdev_change_seq_changed(&dev->up);
 } else {
 /* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is
-- 
2.11.0


[ovs-dev] [PATCH v3 03/18] dpif-netdev: Don't try to output on a device without txqs.

2017-01-08 Thread Daniele Di Proietto
Tunnel devices have 0 txqs and don't support netdev_send().  While
netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed
on output, and that might be confused by devices with no txqs.

It seems better to have different structures in the fast path for ports
that support netdev_{push,pop}_header (tunnel devices), and ports that
support netdev_send.  With this we can also remove a branch in
netdev_send().

This is also necessary for a future commit, which starts DPDK devices
without txqs.
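The split between the two fast-path caches can be sketched as follows. This is a simplified model, assuming plain arrays instead of OVS `hmap`s and a hypothetical `toy_port` type whose two fields stand in for `netdev_has_tunnel_push_pop()` and `netdev_n_txq()`. A tunnel port with 0 txqs never lands in the send cache, so an OUTPUT lookup cannot reach the XPS/send logic for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port description. */
struct toy_port {
    int port_no;
    bool has_tunnel_push_pop;  /* Like netdev_has_tunnel_push_pop(). */
    int n_txq;                 /* Like netdev_n_txq(). */
};

#define TOY_MAX_PORTS 16

/* OUTPUT only looks in 'send'; tunnel PUSH/POP only looks in 'tnl'.
 * A port may appear in both. */
struct toy_caches {
    const struct toy_port *send[TOY_MAX_PORTS];
    size_t n_send;
    const struct toy_port *tnl[TOY_MAX_PORTS];
    size_t n_tnl;
};

static void
toy_load_cached_ports(struct toy_caches *c,
                      const struct toy_port *ports, size_t n_ports)
{
    assert(n_ports <= TOY_MAX_PORTS);
    c->n_send = c->n_tnl = 0;
    for (size_t i = 0; i < n_ports; i++) {
        if (ports[i].has_tunnel_push_pop) {
            c->tnl[c->n_tnl++] = &ports[i];
        }
        if (ports[i].n_txq > 0) {
            c->send[c->n_send++] = &ports[i];
        }
    }
}

/* Lookup used for OUTPUT: returns NULL for ports without txqs. */
static const struct toy_port *
toy_lookup_send_port(const struct toy_caches *c, int port_no)
{
    for (size_t i = 0; i < c->n_send; i++) {
        if (c->send[i]->port_no == port_no) {
            return c->send[i];
        }
    }
    return NULL;
}
```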

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 73 +++
 lib/netdev.c  | 35 ++
 lib/netdev.h  |  1 +
 3 files changed, 73 insertions(+), 36 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f600cab00..004b28dc8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -422,7 +422,8 @@ struct rxq_poll {
 struct ovs_list node;
 };
 
-/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
+/* Contained by struct dp_netdev_pmd_thread's 'send_port_cache',
+ * 'tnl_port_cache' or 'tx_ports'. */
 struct tx_port {
 struct dp_netdev_port *port;
 int qid;
@@ -504,11 +505,18 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
-/* Map of 'tx_port' used in the fast path. This is a thread-local copy of
- * 'tx_ports'. The instance for cpu core NON_PMD_CORE_ID can be accessed
- * by multiple threads, and thusly need to be protected by 'non_pmd_mutex'.
- * Every other instance will only be accessed by its own pmd thread. */
-struct hmap port_cache;
+/* These are thread-local copies of 'tx_ports'.  One contains only tunnel
+ * ports (that support push_tunnel/pop_tunnel), the other contains ports
+ * with at least one txq (that support send).  A port can be in both.
+ *
+ * There are two separate maps to make sure that we don't try to execute
+ * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device.
+ *
+ * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple
+ * threads, and thusly need to be protected by 'non_pmd_mutex'.  Every
+ * other instance will only be accessed by its own pmd thread. */
+struct hmap tnl_port_cache;
+struct hmap send_port_cache;
 
 /* Only a pmd thread can write on its own 'cycles' and 'stats'.
  * The main thread keeps 'stats_zero' and 'cycles_zero' as base
@@ -3058,7 +3066,10 @@ pmd_free_cached_ports(struct dp_netdev_pmd_thread *pmd)
 /* Free all used tx queue ids. */
 dpif_netdev_xps_revalidate_pmd(pmd, 0, true);
 
-HMAP_FOR_EACH_POP (tx_port_cached, node, &pmd->port_cache) {
+HMAP_FOR_EACH_POP (tx_port_cached, node, &pmd->tnl_port_cache) {
+free(tx_port_cached);
+}
+HMAP_FOR_EACH_POP (tx_port_cached, node, &pmd->send_port_cache) {
 free(tx_port_cached);
 }
 }
@@ -3072,12 +3083,21 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 struct tx_port *tx_port, *tx_port_cached;
 
 pmd_free_cached_ports(pmd);
-hmap_shrink(&pmd->port_cache);
+hmap_shrink(&pmd->send_port_cache);
+hmap_shrink(&pmd->tnl_port_cache);
 
 HMAP_FOR_EACH (tx_port, node, &pmd->tx_ports) {
-tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
-hmap_insert(&pmd->port_cache, &tx_port_cached->node,
-hash_port_no(tx_port_cached->port->port_no));
+if (netdev_has_tunnel_push_pop(tx_port->port->netdev)) {
+tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
+hmap_insert(&pmd->tnl_port_cache, &tx_port_cached->node,
+hash_port_no(tx_port_cached->port->port_no));
+}
+
+if (netdev_n_txq(tx_port->port->netdev)) {
+tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
+hmap_insert(&pmd->send_port_cache, &tx_port_cached->node,
+hash_port_no(tx_port_cached->port->port_no));
+}
 }
 }
 
@@ -3312,7 +3332,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
 ovs_list_init(&pmd->poll_list);
 hmap_init(&pmd->tx_ports);
-hmap_init(&pmd->port_cache);
+hmap_init(&pmd->tnl_port_cache);
+hmap_init(&pmd->send_port_cache);
 /* init the 'flow_cache' since there is no
  * actual thread created for NON_PMD_CORE_ID. */
 if (core_id == NON_PMD_CORE_ID) {
@@ -3328,7 +3349,8 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)

[ovs-dev] [PATCH v3 05/18] netdev-dpdk: Start also dpdkr devices only once on port-add.

2017-01-08 Thread Daniele Di Proietto
Since commit 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming"),
we no longer call rte_eth_dev_start() from netdev_open(); we only call
it from netdev_reconfigure().  This commit does the same for 'dpdkr'
devices and removes some useless code.

Calling rte_eth_dev_start() from netdev_open() as well was unnecessary
and wasteful. Not doing it reduces code duplication and makes adding a
port faster (~900ms before the patch, ~400ms after).

Another reason why this is useful is that some DPDK drivers might have
problems with reconfiguration. For example, until DPDK commit
8618d19b52b1("net/vmxnet3: reallocate shared memzone on re-config"),
vmxnet3 didn't support being restarted with a different number of
queues.

Technically, the netdev interface changes: before opening rxqs or
calling netdev_send(), the user must now check whether reconfiguration
is required.  This patch also documents that, even though the userspace
datapath (the only user) needs no change.
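The new contract (construction only records the requested configuration; the device is started once, on the first reconfiguration) can be modelled with a toy structure. The names below (`toy_netdev`, `toy_construct()`, `toy_is_reconf_required()`, `toy_reconfigure()`, `toy_port_add()`) are hypothetical and only loosely mirror the requested/applied fields and the reconfigure request in netdev-dpdk; this is a sketch, not the real API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the requested-vs-applied split. */
struct toy_netdev {
    int n_rxq, n_txq;                      /* Currently applied. */
    int requested_n_rxq, requested_n_txq;  /* Wanted by the user. */
    bool reconf_requested;                 /* Reconfigure pending. */
    int n_starts;                          /* Times the device was started. */
};

/* Construction records what is wanted; nothing is started yet and the
 * device exposes 0 queues until it is reconfigured. */
static void
toy_construct(struct toy_netdev *dev, int n_queues)
{
    dev->n_rxq = dev->n_txq = 0;
    dev->requested_n_rxq = dev->requested_n_txq = n_queues;
    dev->n_starts = 0;
    dev->reconf_requested = true;
}

static bool
toy_is_reconf_required(const struct toy_netdev *dev)
{
    return dev->reconf_requested;
}

/* The single place where the device is (re)started. */
static int
toy_reconfigure(struct toy_netdev *dev)
{
    dev->n_rxq = dev->requested_n_rxq;
    dev->n_txq = dev->requested_n_txq;
    dev->n_starts++;
    dev->reconf_requested = false;
    return 0;
}

/* What a user must do on port add, before opening rxqs or sending. */
static int
toy_port_add(struct toy_netdev *dev)
{
    if (toy_is_reconf_required(dev)) {
        return toy_reconfigure(dev);
    }
    return 0;
}
```

Calling `toy_port_add()` twice starts the device only once, which is the saving the commit message measures.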

Lastly, this patch makes sure the errors returned by ofproto_port_add
(which includes the first port reconfiguration) are reported back to the
database.

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 70 ---
 lib/netdev.c  |  6 -
 vswitchd/bridge.c |  2 ++
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2df3e220c..d6315357b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -746,10 +746,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 int diag;
 int n_rxq, n_txq;
 
-if (!rte_eth_dev_is_valid_port(dev->port_id)) {
-return ENODEV;
-}
-
 rte_eth_dev_info_get(dev->port_id, &info);
 
 n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
@@ -858,30 +854,23 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 dev->port_id = port_no;
 dev->type = type;
 dev->flags = 0;
-dev->requested_mtu = dev->mtu = ETHER_MTU;
+dev->requested_mtu = ETHER_MTU;
 dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 ovsrcu_index_init(&dev->vid, -1);
 dev->vhost_reconfigured = false;
 
-err = netdev_dpdk_mempool_configure(dev);
-if (err) {
-goto unlock;
-}
-
 ovsrcu_init(&dev->qos_conf, NULL);
 
 ovsrcu_init(&dev->ingress_policer, NULL);
 dev->policer_rate = 0;
 dev->policer_burst = 0;
 
-netdev->n_rxq = NR_QUEUE;
-netdev->n_txq = NR_QUEUE;
-dev->requested_n_rxq = netdev->n_rxq;
-dev->requested_n_txq = netdev->n_txq;
-dev->rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
-dev->txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
-dev->requested_rxq_size = dev->rxq_size;
-dev->requested_txq_size = dev->txq_size;
+netdev->n_rxq = 0;
+netdev->n_txq = 0;
+dev->requested_n_rxq = NR_QUEUE;
+dev->requested_n_txq = NR_QUEUE;
+dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
+dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
 
 /* Initialize the flow control to NULL */
 memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
@@ -891,25 +880,18 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
-if (type == DPDK_DEV_ETH) {
-if (rte_eth_dev_is_valid_port(dev->port_id)) {
-err = dpdk_eth_dev_init(dev);
-if (err) {
-goto unlock;
-}
-}
-dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
-} else {
+if (type == DPDK_DEV_VHOST) {
 dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-}
-
-if (!dev->tx_q) {
-err = ENOMEM;
-goto unlock;
+if (!dev->tx_q) {
+err = ENOMEM;
+goto unlock;
+}
 }
 
 ovs_list_push_back(&dpdk_list, &dev->list_node);
 
+netdev_request_reconfigure(netdev);
+
 unlock:
 ovs_mutex_unlock(&dev->mutex);
 return err;
@@ -3168,7 +3150,7 @@ out:
 return err;
 }
 
-static void
+static int
 dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
 {
@@ -3189,32 +3171,38 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 }
 }
 
+if (!dev->dpdk_mp) {
+return ENOMEM;
+}
+
 if (netdev_dpdk_get_vid(dev) >= 0) {
 dev->vhost_reconfigured = true;
 }
+
+return 0;
 }
 
 static int
 netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+int err;
 
 ovs_mutex_lock(&dev->mutex);
-dpdk_vhost_reconfigure_helper(dev);
+err = dpdk_vhost_reconfigure_helper(dev);
 ovs_mutex_unlock(&dev->mutex);
-return 0;
+
+return err;
 }
 
 static int
 netdev_dpdk_vhost_clie

[ovs-dev] [PATCH v3 06/18] netdev-dpdk: Refactor construct and destruct.

2017-01-08 Thread Daniele Di Proietto
Some refactoring for _construct() and _destruct() methods:
* Rename netdev_dpdk_init() to common_construct(). init() has a
  different meaning in the netdev context.
* Remove DPDK_DEV_ETH and DPDK_DEV_VHOST checks in common_construct()
  and move them to different functions.
* Introduce common_destruct().
* Avoid taking 'dev->mutex' in construct and destruct: we're guaranteed
  to be the only thread with access to the object.
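The construct/destruct layering can be sketched as below, assuming a cut-down device type (`toy_dpdk_dev`, `toy_common_construct()`, `toy_vhost_construct()`, `toy_eth_construct()` and `toy_common_destruct()` are hypothetical). The type-specific wrappers decide the socket id and allocate their own resources; the common constructor only applies the "negative socket id means use socket 0" fallback; the common destructor frees what every type may own.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_SOCKET0 0

enum toy_dev_type { TOY_DEV_ETH, TOY_DEV_VHOST };

struct toy_dpdk_dev {
    enum toy_dev_type type;
    int socket_id;
    int *tx_q;          /* Pre-allocated only for vhost in this sketch. */
};

/* Shared setup.  A negative 'socket_id' means the NUMA information was
 * unavailable, so fall back to socket 0. */
static int
toy_common_construct(struct toy_dpdk_dev *dev, enum toy_dev_type type,
                     int socket_id)
{
    dev->type = type;
    dev->socket_id = socket_id < 0 ? TOY_SOCKET0 : socket_id;
    return 0;
}

/* vhost wrapper: allocates its tx queues before the common part. */
static int
toy_vhost_construct(struct toy_dpdk_dev *dev, int n_txq, int master_socket)
{
    dev->tx_q = calloc(n_txq, sizeof *dev->tx_q);
    if (!dev->tx_q) {
        return ENOMEM;
    }
    return toy_common_construct(dev, TOY_DEV_VHOST, master_socket);
}

/* ethernet wrapper: queues are set up later, on reconfiguration. */
static int
toy_eth_construct(struct toy_dpdk_dev *dev)
{
    dev->tx_q = NULL;
    return toy_common_construct(dev, TOY_DEV_ETH, TOY_SOCKET0);
}

/* Shared teardown, called by every type-specific destructor. */
static void
toy_common_destruct(struct toy_dpdk_dev *dev)
{
    free(dev->tx_q);
    dev->tx_q = NULL;
}
```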

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 86 ++-
 1 file changed, 41 insertions(+), 45 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6315357b..45320e370 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -827,29 +827,20 @@ netdev_dpdk_alloc_txq(unsigned int n_txqs)
 }
 
 static int
-netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
- enum dpdk_dev_type type)
+common_construct(struct netdev *netdev, unsigned int port_no,
+ enum dpdk_dev_type type, int socket_id)
 OVS_REQUIRES(dpdk_mutex)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int sid;
-int err = 0;
 
 ovs_mutex_init(&dev->mutex);
-ovs_mutex_lock(&dev->mutex);
 
 rte_spinlock_init(&dev->stats_lock);
 
 /* If the 'sid' is negative, it means that the kernel fails
  * to obtain the pci numa info.  In that situation, always
  * use 'SOCKET0'. */
-if (type == DPDK_DEV_ETH && rte_eth_dev_is_valid_port(dev->port_id)) {
-sid = rte_eth_dev_socket_id(port_no);
-} else {
-sid = rte_lcore_to_socket_id(rte_get_master_lcore());
-}
-
-dev->socket_id = sid < 0 ? SOCKET0 : sid;
+dev->socket_id = socket_id < 0 ? SOCKET0 : socket_id;
 dev->requested_socket_id = dev->socket_id;
 dev->port_id = port_no;
 dev->type = type;
@@ -880,21 +871,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
-if (type == DPDK_DEV_VHOST) {
-dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-if (!dev->tx_q) {
-err = ENOMEM;
-goto unlock;
-}
-}
-
 ovs_list_push_back(&dpdk_list, &dev->list_node);
 
 netdev_request_reconfigure(netdev);
 
-unlock:
-ovs_mutex_unlock(&dev->mutex);
-return err;
+return 0;
 }
 
 /* dev_name must be the prefix followed by a positive decimal number.
@@ -919,6 +900,21 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
 }
 
 static int
+vhost_common_construct(struct netdev *netdev)
+OVS_REQUIRES(dpdk_mutex)
+{
+int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore());
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
+if (!dev->tx_q) {
+return ENOMEM;
+}
+
+return common_construct(netdev, -1, DPDK_DEV_VHOST, socket_id);
+}
+
+static int
 netdev_dpdk_vhost_construct(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -952,7 +948,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
 VLOG_INFO("Socket %s created for vhost-user port %s\n",
   dev->vhost_id, name);
 }
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_VHOST);
+err = vhost_common_construct(netdev);
 
 ovs_mutex_unlock(&dpdk_mutex);
 return err;
@@ -964,7 +960,7 @@ netdev_dpdk_vhost_client_construct(struct netdev *netdev)
 int err;
 
 ovs_mutex_lock(&dpdk_mutex);
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_VHOST);
+err = vhost_common_construct(netdev);
 ovs_mutex_unlock(&dpdk_mutex);
 return err;
 }
@@ -975,29 +971,36 @@ netdev_dpdk_construct(struct netdev *netdev)
 int err;
 
 ovs_mutex_lock(&dpdk_mutex);
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_ETH);
+err = common_construct(netdev, -1, DPDK_DEV_ETH, SOCKET0);
 ovs_mutex_unlock(&dpdk_mutex);
 return err;
 }
 
 static void
+common_destruct(struct netdev_dpdk *dev)
+OVS_REQUIRES(dpdk_mutex)
+OVS_EXCLUDED(dev->mutex)
+{
+rte_free(dev->tx_q);
+dpdk_mp_put(dev->dpdk_mp);
+
+ovs_list_remove(&dev->list_node);
+free(ovsrcu_get_protected(struct ingress_policer *,
+  &dev->ingress_policer));
+ovs_mutex_destroy(&dev->mutex);
+}
+
+static void
 netdev_dpdk_destruct(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 
 ovs_mutex_lock(&dpdk_mutex);
-ovs_mutex_lock(&dev->mutex);
 
 rte_eth_dev_stop(dev->port_id);
 free(dev->devargs);
-free(ovsrcu_get_protected(struct ingress_policer *,
-  &dev->ingress_policer));
+common_destruct(dev);
 
-rte_free(dev->tx

[ovs-dev] [PATCH v3 07/18] dpif-netdev: Use a boolean instead of pmd->port_seq.

2017-01-08 Thread Daniele Di Proietto
There's no need for a sequence number, since the main thread has to wait
for the pmd thread, so there's no chance that an update will go
undetected.

A seq object will be introduced for another purpose in the next commit,
and changing this to boolean makes the code more readable.
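The boolean handshake can be modelled with plain pthreads and C11 atomics. This is a sketch under stated assumptions: `toy_pmd`, `toy_reload_pmd()`, `toy_reload_done()`, `toy_pmd_main()` and `toy_reload_demo()` are hypothetical, the `quit` flag stands in for `exit_latch`, and the sketch uses default (sequentially consistent) atomics where OVS uses relaxed accesses plus `cond_mutex`. Because the main thread sets `reload` while holding the mutex and the pmd clears it under the same mutex before signalling, a plain flag is enough: no update can be missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one pmd thread and its reload handshake. */
struct toy_pmd {
    atomic_bool reload;          /* Set by main thread, cleared by pmd. */
    atomic_bool quit;            /* Stand-in for 'exit_latch'. */
    pthread_mutex_t cond_mutex;
    pthread_cond_t cond;
    int n_reloads;               /* Written only by the pmd thread. */
};

/* Main thread: request a reload and wait for the acknowledgement.  The
 * flag doubles as the predicate that guards against spurious wakeups. */
static void
toy_reload_pmd(struct toy_pmd *pmd)
{
    pthread_mutex_lock(&pmd->cond_mutex);
    atomic_store(&pmd->reload, true);
    while (atomic_load(&pmd->reload)) {
        pthread_cond_wait(&pmd->cond, &pmd->cond_mutex);
    }
    pthread_mutex_unlock(&pmd->cond_mutex);
}

/* pmd thread: acknowledge (cf. dp_netdev_pmd_reload_done()). */
static void
toy_reload_done(struct toy_pmd *pmd)
{
    pthread_mutex_lock(&pmd->cond_mutex);
    atomic_store(&pmd->reload, false);
    pthread_cond_signal(&pmd->cond);
    pthread_mutex_unlock(&pmd->cond_mutex);
}

static void *
toy_pmd_main(void *arg)
{
    struct toy_pmd *pmd = arg;
    unsigned int lc = 0;

    for (;;) {
        /* ... poll rx queues here ... */
        if (lc++ > 1024) {
            lc = 0;
            if (atomic_load(&pmd->reload)) {
                bool exiting = atomic_load(&pmd->quit);

                pmd->n_reloads++;        /* Re-read the port list. */
                toy_reload_done(pmd);
                if (exiting) {
                    return NULL;
                }
            }
        }
    }
}

/* Performs 'n' reloads plus a final one that also asks the pmd to exit.
 * Returns the number of reloads the pmd thread handled. */
static int
toy_reload_demo(int n)
{
    struct toy_pmd pmd;
    pthread_t thread;

    atomic_init(&pmd.reload, false);
    atomic_init(&pmd.quit, false);
    pthread_mutex_init(&pmd.cond_mutex, NULL);
    pthread_cond_init(&pmd.cond, NULL);
    pmd.n_reloads = 0;

    pthread_create(&thread, NULL, toy_pmd_main, &pmd);
    for (int i = 0; i < n; i++) {
        toy_reload_pmd(&pmd);
    }
    atomic_store(&pmd.quit, true);
    toy_reload_pmd(&pmd);
    pthread_join(thread, NULL);

    pthread_cond_destroy(&pmd.cond);
    pthread_mutex_destroy(&pmd.cond_mutex);
    return pmd.n_reloads;
}
```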

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 004b28dc8..0d47a3286 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -485,7 +485,7 @@ struct dp_netdev_pmd_thread {
 unsigned long long last_cycles;
 
 struct latch exit_latch;/* For terminating the pmd thread. */
-atomic_uint change_seq; /* For reloading pmd ports. */
+atomic_bool reload; /* Do we need to reload ports? */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
@@ -526,8 +526,6 @@ struct dp_netdev_pmd_thread {
 uint64_t cycles_zero[PMD_N_CYCLES];
 };
 
-#define PMD_INITIAL_SEQ 1
-
 /* Interface to netdev-based datapath. */
 struct dpif_netdev {
 struct dpif dpif;
@@ -1201,8 +1199,6 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 static void
 dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 {
-int old_seq;
-
 if (pmd->core_id == NON_PMD_CORE_ID) {
 ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
 ovs_mutex_lock(&pmd->port_mutex);
@@ -1213,7 +1209,7 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 }
 
 ovs_mutex_lock(&pmd->cond_mutex);
-atomic_add_relaxed(&pmd->change_seq, 1, &old_seq);
+atomic_store_relaxed(&pmd->reload, true);
 ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3131,7 +3127,6 @@ pmd_thread_main(void *f_)
 struct dp_netdev_pmd_thread *pmd = f_;
 unsigned int lc = 0;
 struct rxq_poll *poll_list;
-unsigned int port_seq = PMD_INITIAL_SEQ;
 bool exiting;
 int poll_cnt;
 int i;
@@ -3159,7 +3154,7 @@ reload:
 }
 
 if (lc++ > 1024) {
-unsigned int seq;
+bool reload;
 
 lc = 0;
 
@@ -3169,9 +3164,8 @@ reload:
 emc_cache_slow_sweep(&pmd->flow_cache);
 }
 
-atomic_read_relaxed(&pmd->change_seq, &seq);
-if (seq != port_seq) {
-port_seq = seq;
+atomic_read_relaxed(&pmd->reload, &reload);
+if (reload) {
 break;
 }
 }
@@ -3228,6 +3222,7 @@ static void
 dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd)
 {
 ovs_mutex_lock(&pmd->cond_mutex);
+atomic_store_relaxed(&pmd->reload, false);
 xpthread_cond_signal(&pmd->cond);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3322,7 +3317,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
-atomic_init(&pmd->change_seq, PMD_INITIAL_SEQ);
+atomic_init(&pmd->reload, false);
 xpthread_cond_init(&pmd->cond, NULL);
 ovs_mutex_init(&pmd->cond_mutex);
 ovs_mutex_init(&pmd->flow_mutex);
-- 
2.11.0


[ovs-dev] [PATCH v3 08/18] dpif-netdev: Block pmd threads if there are no ports.

2017-01-08 Thread Daniele Di Proietto
There's no reason for a pmd thread to perform its main loop if there are
no queues in its poll_list.

This commit introduces a seq object on which the pmd thread can block
if there are no queues.

When the main thread wants to reload a pmd thread, it must now change
the seq object (in case the thread is blocked) and set 'reload' to true.

This is useful to avoid wasting CPU cycles and is also necessary for a
future commit.
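The blocking behaviour can be approximated with a counter protected by a mutex and a condition variable. This is only a model of OVS's `seq_read()`/`seq_change()`/`seq_wait()` + `poll_block()` pattern, not its implementation; `toy_seq`, `toy_seq_block_until_changed()` and `toy_idle_pmd` are hypothetical. An idle pmd sleeps until the value moves away from the one it last saw, so it burns no CPU; if the change already happened, it returns immediately.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for OVS's 'struct seq'. */
struct toy_seq {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint64_t value;
};

static void
toy_seq_init(struct toy_seq *seq)
{
    pthread_mutex_init(&seq->mutex, NULL);
    pthread_cond_init(&seq->cond, NULL);
    seq->value = 1;
}

static uint64_t
toy_seq_read(struct toy_seq *seq)
{
    pthread_mutex_lock(&seq->mutex);
    uint64_t v = seq->value;
    pthread_mutex_unlock(&seq->mutex);
    return v;
}

/* Main thread: bump the counter and wake every waiter. */
static void
toy_seq_change(struct toy_seq *seq)
{
    pthread_mutex_lock(&seq->mutex);
    seq->value++;
    pthread_cond_broadcast(&seq->cond);
    pthread_mutex_unlock(&seq->mutex);
}

/* Sleep (instead of spinning) until the value differs from 'last'. */
static uint64_t
toy_seq_block_until_changed(struct toy_seq *seq, uint64_t last)
{
    pthread_mutex_lock(&seq->mutex);
    while (seq->value == last) {
        pthread_cond_wait(&seq->cond, &seq->mutex);
    }
    uint64_t v = seq->value;
    pthread_mutex_unlock(&seq->mutex);
    return v;
}

struct toy_idle_pmd {
    struct toy_seq *reload_seq;
    uint64_t last_reload_seq;
    uint64_t woke_at;
};

/* pmd thread with an empty poll list.  After waking, the real code sets
 * lc = 1025 so that 'reload' is checked on the next loop iteration. */
static void *
toy_idle_pmd_main(void *arg)
{
    struct toy_idle_pmd *pmd = arg;

    pmd->woke_at = toy_seq_block_until_changed(pmd->reload_seq,
                                               pmd->last_reload_seq);
    return NULL;
}
```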

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0d47a3286..dc24e72dc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -485,6 +485,8 @@ struct dp_netdev_pmd_thread {
 unsigned long long last_cycles;
 
 struct latch exit_latch;/* For terminating the pmd thread. */
+struct seq *reload_seq;
+uint64_t last_reload_seq;
 atomic_bool reload; /* Do we need to reload ports? */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
@@ -1209,6 +1211,7 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 }
 
 ovs_mutex_lock(&pmd->cond_mutex);
+seq_change(pmd->reload_seq);
 atomic_store_relaxed(&pmd->reload, true);
 ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
 ovs_mutex_unlock(&pmd->cond_mutex);
@@ -3148,6 +3151,14 @@ reload:
 netdev_rxq_get_queue_id(poll_list[i].rx));
 }
 
+if (!poll_cnt) {
+while (seq_read(pmd->reload_seq) == pmd->last_reload_seq) {
+seq_wait(pmd->reload_seq, pmd->last_reload_seq);
+poll_block();
+}
+lc = 1025;
+}
+
 for (;;) {
 for (i = 0; i < poll_cnt; i++) {
 dp_netdev_process_rxq_port(pmd, poll_list[i].port, 
poll_list[i].rx);
@@ -3223,6 +3234,7 @@ dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd)
 {
 ovs_mutex_lock(&pmd->cond_mutex);
 atomic_store_relaxed(&pmd->reload, false);
+pmd->last_reload_seq = seq_read(pmd->reload_seq);
 xpthread_cond_signal(&pmd->cond);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3317,6 +3329,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
+pmd->reload_seq = seq_create();
+pmd->last_reload_seq = seq_read(pmd->reload_seq);
 atomic_init(&pmd->reload, false);
 xpthread_cond_init(&pmd->cond, NULL);
 ovs_mutex_init(&pmd->cond_mutex);
@@ -3356,6 +3370,7 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 cmap_destroy(&pmd->flow_table);
 ovs_mutex_destroy(&pmd->flow_mutex);
 latch_destroy(&pmd->exit_latch);
+seq_destroy(pmd->reload_seq);
 xpthread_cond_destroy(&pmd->cond);
 ovs_mutex_destroy(&pmd->cond_mutex);
 ovs_mutex_destroy(&pmd->port_mutex);
-- 
2.11.0


[ovs-dev] [PATCH v3 13/18] ovs-numa: Add new dump types.

2017-01-08 Thread Daniele Di Proietto
They will be used by a future commit.

This patch introduces some code duplication which will be removed in a
future commit.
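The mask handling in ovs_numa_dump_cores_with_cmask() walks the hex string from its last (least significant) digit, so that digit covers cores 0-3, the next one cores 4-7, and so on. A standalone sketch of just that parsing step, assuming a hypothetical `toy_cmask_to_cores()` that returns core ids instead of looking them up in the NUMA topology, and treating invalid digits as 0 as the patch does after logging a warning:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the ids of the set bits of hex mask 'cmask' (e.g. "0x6"),
 * lowest first, into 'cores'.  Returns the number written (<= 'max'). */
static size_t
toy_cmask_to_cores(const char *cmask, unsigned cores[], size_t max)
{
    size_t n = 0;
    unsigned core_id = 0;
    size_t start = 0;

    if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
        start = 2;   /* Ignore a leading "0x". */
    }

    /* Walk right to left; a size_t index avoids the signed/unsigned
     * mix of 'int i = strlen(cmask) - 1'. */
    for (size_t i = strlen(cmask); i > start; i--) {
        int hex = toupper((unsigned char) cmask[i - 1]);
        int bin;

        if (hex >= '0' && hex <= '9') {
            bin = hex - '0';
        } else if (hex >= 'A' && hex <= 'F') {
            bin = hex - 'A' + 10;
        } else {
            bin = 0;     /* Invalid digit: contributes no cores. */
        }

        for (int j = 0; j < 4; j++, core_id++) {
            if (((bin >> j) & 1) && n < max) {
                cores[n++] = core_id;
            }
        }
    }
    return n;
}
```

For example, "0x6" selects cores 1 and 2, and "0X101" selects cores 0 and 8.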

Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-numa.c | 78 ++
 lib/ovs-numa.h |  4 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 85f392a91..61c31cf69 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -512,6 +512,84 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 return dump;
 }
 
+struct ovs_numa_dump *
+ovs_numa_dump_cores_with_cmask(const char *cmask)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+int core_id = 0;
+int end_idx;
+
+hmap_init(&dump->dump);
+
+/* Ignore leading 0x. */
+end_idx = 0;
+if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
+end_idx = 2;
+}
+
+for (int i = strlen(cmask) - 1; i >= end_idx; i--) {
+char hex = toupper((unsigned char) cmask[i]);
+int bin, j;
+
+if (hex >= '0' && hex <= '9') {
+bin = hex - '0';
+} else if (hex >= 'A' && hex <= 'F') {
+bin = hex - 'A' + 10;
+} else {
+VLOG_WARN("Invalid cpu mask: %c", cmask[i]);
+bin = 0;
+}
+
+for (j = 0; j < 4; j++) {
+if ((bin >> j) & 0x1) {
+struct cpu_core *core = get_core_by_core_id(core_id);
+
+if (core) {
+struct ovs_numa_info *info = xmalloc(sizeof *info);
+
+info->numa_id = core->numa->numa_id;
+info->core_id = core->core_id;
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
+}
+}
+
+core_id++;
+}
+}
+
+return dump;
+}
+
+struct ovs_numa_dump *
+ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+const struct numa_node *n;
+
+hmap_init(&dump->dump);
+
+HMAP_FOR_EACH (n, hmap_node, &all_numa_nodes) {
+const struct cpu_core *core;
+int i = 0;
+
+LIST_FOR_EACH (core, list_node, &n->cores) {
+if (i++ >= cores_per_numa) {
+break;
+}
+
+struct ovs_numa_info *info = xmalloc(sizeof *info);
+
+info->numa_id = core->numa->numa_id;
+info->core_id = core->core_id;
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
+}
+}
+
+return dump;
+}
+
 bool
 ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
 int numa_id, unsigned core_id)
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index c0eae07d8..62bdb225f 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -54,12 +54,14 @@ unsigned ovs_numa_get_unpinned_core_any(void);
 unsigned ovs_numa_get_unpinned_core_on_numa(int numa_id);
 void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
+struct ovs_numa_dump *ovs_numa_dump_cores_with_cmask(const char *cmask);
+struct ovs_numa_dump *ovs_numa_dump_n_cores_per_numa(int n);
 bool ovs_numa_dump_contains_core(const struct ovs_numa_dump *,
  int numa_id, unsigned core_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
 
-#define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)\
+#define FOR_EACH_CORE_ON_DUMP(ITER, DUMP)\
 HMAP_FOR_EACH((ITER), hmap_node, &(DUMP)->dump)
 
 #endif /* ovs-numa.h */
-- 
2.11.0


[ovs-dev] [PATCH v3 09/18] dpif-netdev: Create pmd threads for every numa node.

2017-01-08 Thread Daniele Di Proietto
A lot of the complexity in the code that handles pmd threads and ports
in dpif-netdev is due to the fact that we postpone the creation of pmd
threads on a numa node until we have a port that needs to be polled on
that particular node.

Since the previous commit, a pmd thread with no ports will not consume
any CPU, so it seems easier to create all the threads at once.

This will also make future commits easier.
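Creating the pmd threads on every NUMA node up front only requires choosing the cores once, independently of which ports exist. A sketch of that selection in the spirit of ovs_numa_dump_n_cores_per_numa() from patch 13/18, assuming a hypothetical flat topology array (`numa_of[core]`) and picking cores in core-id order (the real function follows each node's core list instead):

```c
#include <assert.h>
#include <stddef.h>

struct toy_core_sel {
    int numa_id;
    unsigned core_id;
};

/* Selects up to 'per_numa' cores from each of 'n_numa' nodes, so one pmd
 * thread can be started per selected core whether or not that node has a
 * port to poll yet.  Returns the number of entries written to 'out'. */
static size_t
toy_select_pmd_cores(const int numa_of[], unsigned n_cores, int n_numa,
                     int per_numa, struct toy_core_sel out[], size_t max)
{
    size_t n = 0;

    for (int numa = 0; numa < n_numa; numa++) {
        int taken = 0;

        for (unsigned core = 0; core < n_cores && taken < per_numa; core++) {
            if (numa_of[core] == numa && n < max) {
                out[n].numa_id = numa;
                out[n].core_id = core;
                n++;
                taken++;
            }
        }
    }
    return n;
}
```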

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 208 ++
 tests/pmd.at  |   2 +-
 2 files changed, 69 insertions(+), 141 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index dc24e72dc..432bac814 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -575,8 +575,8 @@ static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
 static struct dp_netdev_pmd_thread *
 dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
-static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
-static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id)
+static void dp_netdev_stop_pmds(struct dp_netdev *dp);
+static void dp_netdev_start_pmds(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
@@ -1092,19 +1092,20 @@ dp_netdev_free(struct dp_netdev *dp)
 
 shash_find_and_delete(&dp_netdevs, dp->name);
 
-dp_netdev_destroy_all_pmds(dp);
-ovs_mutex_destroy(&dp->non_pmd_mutex);
-ovsthread_key_delete(dp->per_pmd_key);
-
-conntrack_destroy(&dp->conntrack);
-
 ovs_mutex_lock(&dp->port_mutex);
 HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
 do_del_port(dp, port);
 }
 ovs_mutex_unlock(&dp->port_mutex);
+dp_netdev_destroy_all_pmds(dp);
 cmap_destroy(&dp->poll_threads);
 
+ovs_mutex_destroy(&dp->non_pmd_mutex);
+ovsthread_key_delete(dp->per_pmd_key);
+
+conntrack_destroy(&dp->conntrack);
+
+
 seq_destroy(dp->reconfigure_seq);
 
 seq_destroy(dp->port_seq);
@@ -1348,10 +1349,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type,
 }
 
 if (netdev_is_pmd(port->netdev)) {
-int numa_id = netdev_get_numa_id(port->netdev);
-
-ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
-dp_netdev_set_pmds_on_numa(dp, numa_id);
+dp_netdev_start_pmds(dp);
 }
 
 dp_netdev_add_port_to_pmds(dp, port);
@@ -1493,45 +1491,16 @@ get_n_pmd_threads(struct dp_netdev *dp)
 return cmap_count(&dp->poll_threads) - 1;
 }
 
-static int
-get_n_pmd_threads_on_numa(struct dp_netdev *dp, int numa_id)
-{
-struct dp_netdev_pmd_thread *pmd;
-int n_pmds = 0;
-
-CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-if (pmd->numa_id == numa_id) {
-n_pmds++;
-}
-}
-
-return n_pmds;
-}
-
-/* Returns 'true' if there is a port with pmd netdev and the netdev is on
- * numa node 'numa_id' or its rx queue assigned to core on that numa node . */
+/* Returns 'true' if there is a port with pmd netdev. */
 static bool
-has_pmd_rxq_for_numa(struct dp_netdev *dp, int numa_id)
+has_pmd_port(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex)
 {
 struct dp_netdev_port *port;
 
 HMAP_FOR_EACH (port, node, &dp->ports) {
 if (netdev_is_pmd(port->netdev)) {
-int i;
-
-if (netdev_get_numa_id(port->netdev) == numa_id) {
-return true;
-}
-
-for (i = 0; i < port->n_rxq; i++) {
-unsigned core_id = port->rxqs[i].core_id;
-
-if (core_id != OVS_CORE_UNSPEC
-&& ovs_numa_get_numa_id(core_id) == numa_id) {
-return true;
-}
-}
+return true;
 }
 }
 
@@ -1549,14 +1518,9 @@ do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
 dp_netdev_del_port_from_all_pmds(dp, port);
 
 if (netdev_is_pmd(port->netdev)) {
-int numa_id = netdev_get_numa_id(port->netdev);
-
-/* PMD threads can not be on invalid numa node. */
-ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
-/* If there is no netdev on the numa node, deletes the pmd threads
- * for that numa. */
-if (!has_pmd_rxq_for_numa(dp, numa_id)) {
-dp_netdev_del_pmds_on_numa(dp, numa_id);
+/* If there is no pmd netdev, delete the pmd threads */
+if (!has_pmd_port(dp)) {
+dp_netdev_stop_pmds(dp);
 }
 }
 
@@ -3407,18 +3371,22 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
 dp_netdev_pmd_un

[ovs-dev] [PATCH v3 10/18] dpif-netdev: Make 'static_tx_qid' const.

2017-01-08 Thread Daniele Di Proietto
Since the previous commit, 'static_tx_qid' doesn't need to be atomic and
is never written after initialization, so it can be made const.
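A minimal sketch of the write-once pattern the patch switches to, assuming a simplified stand-in for OVS's CONST_CAST() and a hypothetical, trimmed-down 'struct pmd_sketch' (neither is the real dpif-netdev code): the const member is written exactly once, through a cast, while the freshly allocated object is still private to its creator, and every later reader uses a plain load instead of atomic_read_relaxed().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for OVS's CONST_CAST() macro (assumption: the real
 * one may add type checking on top of this plain cast). */
#define CONST_CAST(TYPE, POINTER) ((TYPE) (POINTER))

/* Hypothetical sentinel standing in for NON_PMD_CORE_ID. */
#define SKETCH_NON_PMD_CORE_ID 0xffffffffu

/* Hypothetical, trimmed-down version of struct dp_netdev_pmd_thread. */
struct pmd_sketch {
    unsigned core_id;
    const int static_tx_qid;   /* Written once, read-only afterwards. */
};

/* Allocates a zeroed pmd and sets its write-once tx queue id: the non-pmd
 * thread gets 'n_cores', a pmd thread gets the current pmd count. */
struct pmd_sketch *
pmd_sketch_create(unsigned core_id, int n_cores, int n_pmd_threads)
{
    struct pmd_sketch *pmd = calloc(1, sizeof *pmd);

    assert(pmd != NULL);
    pmd->core_id = core_id;
    /* Single write through a non-const pointer, done before the object
     * is shared with any other thread. */
    *CONST_CAST(int *, &pmd->static_tx_qid)
        = core_id == SKETCH_NON_PMD_CORE_ID ? n_cores : n_pmd_threads;
    return pmd;
}

/* Fast-path read: a plain load replaces atomic_read_relaxed(). */
int
pmd_sketch_tx_qid(const struct pmd_sketch *pmd)
{
    return pmd->static_tx_qid;
}
```

For example, pmd_sketch_create(3, 8, 2) yields a pmd whose tx queue id is 2, while passing SKETCH_NON_PMD_CORE_ID as the core yields 8.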

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 432bac814..436f945b7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -496,7 +496,7 @@ struct dp_netdev_pmd_thread {
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and less
  * than 'ovs_numa_get_n_cores() + 1'. */
-atomic_int static_tx_qid;
+const int static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -3286,10 +3286,9 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 pmd->numa_id = numa_id;
 pmd->poll_cnt = 0;
 
-atomic_init(&pmd->static_tx_qid,
-(core_id == NON_PMD_CORE_ID)
-? ovs_numa_get_n_cores()
-: get_n_pmd_threads(dp));
+*CONST_CAST(int *, &pmd->static_tx_qid) = (core_id == NON_PMD_CORE_ID)
+  ? ovs_numa_get_n_cores()
+  : get_n_pmd_threads(dp);
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
@@ -4394,7 +4393,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 if (dynamic_txqs) {
 tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
 } else {
-atomic_read_relaxed(&pmd->static_tx_qid, &tx_qid);
+tx_qid = pmd->static_tx_qid;
 }
 
 netdev_send(p->port->netdev, tx_qid, packets_, may_steal,
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 12/18] ovs-numa: New ovs_numa_dump_contains_core() function.

2017-01-08 Thread Daniele Di Proietto
It will be used by a future commit.  struct ovs_numa_dump now uses an
hmap instead of a list to make ovs_numa_dump_contains_core() more
efficient.
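To illustrate why an hmap makes the membership test cheaper than a list, here is a self-contained sketch with a fixed-size chained hash table. 'struct core_dump', sketch_hash_2words() and the bucket count are assumptions made for illustration, not the ovs-numa API; the point is that a lookup only walks the one bucket selected by hashing (numa_id, core_id), as ovs_numa_dump_contains_core() does with HMAP_FOR_EACH_WITH_HASH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fixed bucket count; OVS's hmap resizes itself, this sketch does not. */
#define SKETCH_N_BUCKETS 16

/* Hypothetical stand-in for struct ovs_numa_info. */
struct core_entry {
    struct core_entry *next;   /* Chain of entries sharing a bucket. */
    int numa_id;
    unsigned core_id;
};

struct core_dump {
    struct core_entry *buckets[SKETCH_N_BUCKETS];
};

/* Simple mixing hash standing in for OVS's hash_2words(). */
static uint32_t
sketch_hash_2words(uint32_t a, uint32_t b)
{
    uint32_t h = a * 0x9e3779b1u ^ (b + 0x7f4a7c15u);

    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    return h;
}

void
core_dump_add(struct core_dump *dump, int numa_id, unsigned core_id)
{
    struct core_entry *e = malloc(sizeof *e);
    uint32_t b = sketch_hash_2words(numa_id, core_id) % SKETCH_N_BUCKETS;

    assert(e != NULL);
    e->numa_id = numa_id;
    e->core_id = core_id;
    e->next = dump->buckets[b];
    dump->buckets[b] = e;
}

/* Scans only the bucket selected by the hash; both fields are compared
 * because different pairs can land in the same bucket. */
bool
core_dump_contains(const struct core_dump *dump, int numa_id,
                   unsigned core_id)
{
    uint32_t b = sketch_hash_2words(numa_id, core_id) % SKETCH_N_BUCKETS;
    const struct core_entry *e;

    for (e = dump->buckets[b]; e; e = e->next) {
        if (e->numa_id == numa_id && e->core_id == core_id) {
            return true;
        }
    }
    return false;
}

/* Frees every entry and leaves the dump empty. */
void
core_dump_destroy(struct core_dump *dump)
{
    for (size_t i = 0; i < SKETCH_N_BUCKETS; i++) {
        struct core_entry *e = dump->buckets[i];

        while (e) {
            struct core_entry *next = e->next;

            free(e);
            e = next;
        }
        dump->buckets[i] = NULL;
    }
}
```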

Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-numa.c | 25 ++---
 lib/ovs-numa.h | 10 ++
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index e1e7068a2..85f392a91 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -494,7 +494,7 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
 struct numa_node *numa = get_numa_by_numa_id(numa_id);
 
-ovs_list_init(&dump->dump);
+hmap_init(&dump->dump);
 
 if (numa) {
 struct cpu_core *core;
@@ -504,13 +504,30 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 
 info->numa_id = numa->numa_id;
 info->core_id = core->core_id;
-ovs_list_insert(&dump->dump, &info->list_node);
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
 }
 }
 
 return dump;
 }
 
+bool
+ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
+int numa_id, unsigned core_id)
+{
+struct ovs_numa_info *core;
+
+HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_2words(numa_id, core_id),
+ &dump->dump) {
+if (core->core_id == core_id && core->numa_id == numa_id) {
+return true;
+}
+}
+
+return false;
+}
+
 void
 ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 {
@@ -520,10 +537,12 @@ ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 return;
 }
 
-LIST_FOR_EACH_POP (iter, list_node, &dump->dump) {
+HMAP_FOR_EACH_POP (iter, hmap_node, &dump->dump) {
 free(iter);
 }
 
+hmap_destroy(&dump->dump);
+
 free(dump);
 }
 
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index be836b2ca..c0eae07d8 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -21,19 +21,19 @@
 #include 
 
 #include "compiler.h"
-#include "openvswitch/list.h"
+#include "openvswitch/hmap.h"
 
 #define OVS_CORE_UNSPEC INT_MAX
 #define OVS_NUMA_UNSPEC INT_MAX
 
 /* Dump of a list of 'struct ovs_numa_info'. */
 struct ovs_numa_dump {
-struct ovs_list dump;
+struct hmap dump;
 };
 
 /* A numa_id - core_id pair. */
 struct ovs_numa_info {
-struct ovs_list list_node;
+struct hmap_node hmap_node;
 int numa_id;
 unsigned core_id;
 };
@@ -54,10 +54,12 @@ unsigned ovs_numa_get_unpinned_core_any(void);
 unsigned ovs_numa_get_unpinned_core_on_numa(int numa_id);
 void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
+bool ovs_numa_dump_contains_core(const struct ovs_numa_dump *,
+ int numa_id, unsigned core_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
 
 #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)\
-LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)
+HMAP_FOR_EACH((ITER), hmap_node, &(DUMP)->dump)
 
 #endif /* ovs-numa.h */
-- 
2.11.0



[ovs-dev] [PATCH v3 14/18] ovs-numa: Don't use hmap_first_with_hash().

2017-01-08 Thread Daniele Di Proietto
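I think it's better to iterate over the entries with a matching hash and
compare the ids than to use hmap_first_with_hash(), because iterating
handles hash collisions: hmap_first_with_hash() returns the first node
whose hash matches, which might belong to a different id.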
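The difference can be shown with a single hash chain and a deliberately weak hash under which ids 1 and 5 collide. Both lookup functions below are hypothetical stand-ins: first_with_hash() only compares cached hashes, as hmap_first_with_hash() does, while find_core() also compares the key, as the new HMAP_FOR_EACH_WITH_HASH loops in the patch do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpu_core linked in one hash chain. */
struct core_node {
    struct core_node *next;
    uint32_t hash;            /* Cached hash, as in struct hmap_node. */
    unsigned core_id;
};

/* Deliberately weak hash so that ids 1 and 5 collide (illustration only). */
static uint32_t
weak_hash(unsigned id)
{
    return id % 4;
}

/* Mimics hmap_first_with_hash(): returns the first node whose cached hash
 * matches, without looking at the key. */
struct core_node *
first_with_hash(struct core_node *chain, uint32_t hash)
{
    for (struct core_node *n = chain; n; n = n->next) {
        if (n->hash == hash) {
            return n;
        }
    }
    return NULL;
}

/* Mimics the HMAP_FOR_EACH_WITH_HASH loop: keeps walking until both the
 * hash and the key match, and returns NULL if no node has the key. */
struct core_node *
find_core(struct core_node *chain, unsigned core_id)
{
    uint32_t hash = weak_hash(core_id);

    for (struct core_node *n = chain; n; n = n->next) {
        if (n->hash == hash && n->core_id == core_id) {
            return n;
        }
    }
    return NULL;
}
```

With a chain 5 -> 1, first_with_hash(chain, weak_hash(1)) returns the node for core 5, while find_core(chain, 1) returns the node for core 1.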

Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-numa.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 61c31cf69..f8a37b1ea 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -241,30 +241,32 @@ discover_numa_and_core(void)
 static struct cpu_core*
 get_core_by_core_id(unsigned core_id)
 {
-struct cpu_core *core = NULL;
+struct cpu_core *core;
 
-if (ovs_numa_core_id_is_valid(core_id)) {
-core = CONTAINER_OF(hmap_first_with_hash(&all_cpu_cores,
- hash_int(core_id, 0)),
-struct cpu_core, hmap_node);
+HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_int(core_id, 0),
+ &all_cpu_cores) {
+if (core->core_id == core_id) {
+return core;
+}
 }
 
-return core;
+return NULL;
 }
 
 /* Gets 'struct numa_node' by 'numa_id'. */
 static struct numa_node*
 get_numa_by_numa_id(int numa_id)
 {
-struct numa_node *numa = NULL;
+struct numa_node *numa;
 
-if (ovs_numa_numa_id_is_valid(numa_id)) {
-numa = CONTAINER_OF(hmap_first_with_hash(&all_numa_nodes,
- hash_int(numa_id, 0)),
-struct numa_node, hmap_node);
+HMAP_FOR_EACH_WITH_HASH (numa, hmap_node, hash_int(numa_id, 0),
+ &all_numa_nodes) {
+if (numa->numa_id == numa_id) {
+return numa;
+}
 }
 
-return numa;
+return NULL;
 }
 
 
-- 
2.11.0



[ovs-dev] [PATCH v3 15/18] ovs-numa: Add per numa and global counts in dump.

2017-01-08 Thread Daniele Di Proietto
They will be used by a future commit.
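A rough sketch of the bookkeeping that ovs_numa_dump_add() performs: every added core bumps a global count and the counter of its numa node, creating that counter on first use. Fixed-size arrays replace the two hmaps ('cores' and 'numas') of the real dump to keep the example short; all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CORES 64
#define SKETCH_MAX_NUMAS 8

/* Hypothetical dump: flat core list plus per-numa counters. */
struct numa_dump_sketch {
    struct { int numa_id; unsigned core_id; } cores[SKETCH_MAX_CORES];
    size_t n_cores;                         /* Global count. */
    int numa_ids[SKETCH_MAX_NUMAS];         /* Distinct numa nodes seen. */
    size_t numa_n_cores[SKETCH_MAX_NUMAS];  /* Cores per numa node. */
    size_t n_numas;
};

void
numa_dump_sketch_add(struct numa_dump_sketch *dump, int numa_id,
                     unsigned core_id)
{
    size_t i;

    assert(dump->n_cores < SKETCH_MAX_CORES);
    dump->cores[dump->n_cores].numa_id = numa_id;
    dump->cores[dump->n_cores].core_id = core_id;
    dump->n_cores++;

    /* Bump the counter of an already known numa node... */
    for (i = 0; i < dump->n_numas; i++) {
        if (dump->numa_ids[i] == numa_id) {
            dump->numa_n_cores[i]++;
            return;
        }
    }
    /* ...or start a new counter at 1. */
    assert(dump->n_numas < SKETCH_MAX_NUMAS);
    dump->numa_ids[dump->n_numas] = numa_id;
    dump->numa_n_cores[dump->n_numas] = 1;
    dump->n_numas++;
}

/* Returns the number of cores recorded on 'numa_id' (0 if none). */
size_t
numa_dump_sketch_count_on_numa(const struct numa_dump_sketch *dump,
                               int numa_id)
{
    for (size_t i = 0; i < dump->n_numas; i++) {
        if (dump->numa_ids[i] == numa_id) {
            return dump->numa_n_cores[i];
        }
    }
    return 0;
}
```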

Suggested-by: Ilya Maximets 
Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-numa.c | 96 +-
 lib/ovs-numa.h | 18 +--
 2 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index f8a37b1ea..04225a958 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -489,25 +489,53 @@ ovs_numa_unpin_core(unsigned core_id)
 }
 }
 
+static struct ovs_numa_dump *
+ovs_numa_dump_create(void)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+
+hmap_init(&dump->cores);
+hmap_init(&dump->numas);
+
+return dump;
+}
+
+static void
+ovs_numa_dump_add(struct ovs_numa_dump *dump, int numa_id, int core_id)
+{
+struct ovs_numa_info_core *c = xzalloc(sizeof *c);
+struct ovs_numa_info_numa *n;
+
+c->numa_id = numa_id;
+c->core_id = core_id;
+hmap_insert(&dump->cores, &c->hmap_node, hash_2words(numa_id, core_id));
+
+HMAP_FOR_EACH_WITH_HASH (n, hmap_node, hash_int(numa_id, 0),
+ &dump->numas) {
+if (n->numa_id == numa_id) {
+n->n_cores++;
+return;
+}
+}
+
+n = xzalloc(sizeof *n);
+n->numa_id = numa_id;
+n->n_cores = 1;
+hmap_insert(&dump->numas, &n->hmap_node, hash_int(numa_id, 0));
+}
+
 /* Given the 'numa_id', returns dump of all cores on the numa node. */
 struct ovs_numa_dump *
 ovs_numa_dump_cores_on_numa(int numa_id)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 struct numa_node *numa = get_numa_by_numa_id(numa_id);
 
-hmap_init(&dump->dump);
-
 if (numa) {
 struct cpu_core *core;
 
-LIST_FOR_EACH(core, list_node, &numa->cores) {
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+LIST_FOR_EACH (core, list_node, &numa->cores) {
+ovs_numa_dump_add(dump, numa->numa_id, core->core_id);
 }
 }
 
@@ -517,12 +545,10 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 struct ovs_numa_dump *
 ovs_numa_dump_cores_with_cmask(const char *cmask)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 int core_id = 0;
 int end_idx;
 
-hmap_init(&dump->dump);
-
 /* Ignore leading 0x. */
 end_idx = 0;
 if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
@@ -547,12 +573,9 @@ ovs_numa_dump_cores_with_cmask(const char *cmask)
 struct cpu_core *core = get_core_by_core_id(core_id);
 
 if (core) {
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = core->numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+ovs_numa_dump_add(dump,
+  core->numa->numa_id,
+  core->core_id);
 }
 }
 
@@ -566,11 +589,9 @@ ovs_numa_dump_cores_with_cmask(const char *cmask)
 struct ovs_numa_dump *
 ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 const struct numa_node *n;
 
-hmap_init(&dump->dump);
-
 HMAP_FOR_EACH (n, hmap_node, &all_numa_nodes) {
 const struct cpu_core *core;
 int i = 0;
@@ -580,12 +601,7 @@ ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
 break;
 }
 
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = core->numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+ovs_numa_dump_add(dump, core->numa->numa_id, core->core_id);
 }
 }
 
@@ -596,10 +612,10 @@ bool
 ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
 int numa_id, unsigned core_id)
 {
-struct ovs_numa_info *core;
+struct ovs_numa_info_core *core;
 
 HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_2words(numa_id, core_id),
- &dump->dump) {
+ &dump->cores) {
 if (cor

[ovs-dev] [PATCH v3 11/18] dpctl: Avoid making assumptions on pmd threads.

2017-01-08 Thread Daniele Di Proietto
Currently dpctl depends on the ovs-numa module to delete and create flows
on the different pmd threads of pmd devices.

The next commits will move the pmd thread state away from ovs-numa and
into dpif-netdev, so this ovs-numa interface will no longer be supported.

Also, the assignment of ports to threads is an implementation detail of
dpif-netdev; dpctl shouldn't know anything about it.

This commit makes dpif_flow_put() and dpif_flow_del() iterate over all
the pmd threads when pmd_id is PMD_ID_NULL.

A simple test is added.
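A sketch of the dispatch rule described above, assuming a hypothetical callback in place of the per-pmd flow installation inside dpif-netdev (which is not shown here): a specific pmd_id targets one thread, while the PMD_ID_NULL sentinel broadcasts to every pmd currently in the datapath. Returning EINVAL when there are no pmds and reporting the last per-pmd error are choices made for this sketch, not necessarily what dpif-netdev does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for PMD_ID_NULL. */
#define SKETCH_PMD_ID_NULL 0xffffffffu

/* Installs a flow on one pmd; returns 0 or an errno value. */
typedef int (*flow_put_on_pmd_fn)(unsigned pmd_id, void *aux);

/* Applies 'put' to the single pmd 'pmd_id', or to every pmd in 'pmd_ids'
 * when 'pmd_id' is the "no specific pmd" sentinel.  In the broadcast case
 * it keeps going after a failure and returns the last error seen. */
int
flow_put_sketch(const unsigned *pmd_ids, size_t n_pmds, unsigned pmd_id,
                flow_put_on_pmd_fn put, void *aux)
{
    int error = 0;

    if (pmd_id != SKETCH_PMD_ID_NULL) {
        return put(pmd_id, aux);
    }
    if (!n_pmds) {
        return EINVAL;          /* Nothing to install the flow on. */
    }
    for (size_t i = 0; i < n_pmds; i++) {
        int pmd_error = put(pmd_ids[i], aux);

        if (pmd_error) {
            error = pmd_error;
        }
    }
    return error;
}

/* Example callback for illustration: counts invocations through 'aux'. */
static int
count_put(unsigned pmd_id, void *aux)
{
    (void) pmd_id;
    ++*(int *) aux;
    return 0;
}
```

With three pmds, a call with SKETCH_PMD_ID_NULL invokes count_put() three times, while a call naming one pmd invokes it once.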

Signed-off-by: Daniele Di Proietto 
---
 lib/dpctl.c   | 107 
 lib/dpif-netdev.c | 180 +-
 lib/dpif.c|   6 +-
 lib/dpif.h|  12 +++-
 tests/pmd.at  |  44 +
 5 files changed, 194 insertions(+), 155 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 7011b183d..23837ce74 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -40,7 +40,6 @@
 #include "netlink.h"
 #include "odp-util.h"
 #include "openvswitch/ofpbuf.h"
-#include "ovs-numa.h"
 #include "packets.h"
 #include "openvswitch/shash.h"
 #include "simap.h"
@@ -876,43 +875,12 @@ out_freefilter:
 return error;
 }
 
-/* Extracts the in_port from the parsed keys, and returns the reference
- * to the 'struct netdev *' of the dpif port.  On error, returns NULL.
- * Users must call 'netdev_close()' after finish using the returned
- * reference. */
-static struct netdev *
-get_in_port_netdev_from_key(struct dpif *dpif, const struct ofpbuf *key)
-{
-const struct nlattr *in_port_nla;
-struct netdev *dev = NULL;
-
-in_port_nla = nl_attr_find(key, 0, OVS_KEY_ATTR_IN_PORT);
-if (in_port_nla) {
-struct dpif_port dpif_port;
-odp_port_t port_no;
-int error;
-
-port_no = ODP_PORT_C(nl_attr_get_u32(in_port_nla));
-error = dpif_port_query_by_number(dpif, port_no, &dpif_port);
-if (error) {
-goto out;
-}
-
-netdev_open(dpif_port.name, dpif_port.type, &dev);
-dpif_port_destroy(&dpif_port);
-}
-
-out:
-return dev;
-}
-
 static int
 dpctl_put_flow(int argc, const char *argv[], enum dpif_flow_put_flags flags,
struct dpctl_params *dpctl_p)
 {
 const char *key_s = argv[argc - 2];
 const char *actions_s = argv[argc - 1];
-struct netdev *in_port_netdev = NULL;
 struct dpif_flow_stats stats;
 struct dpif_port dpif_port;
 struct dpif_port_dump port_dump;
@@ -968,39 +936,15 @@ dpctl_put_flow(int argc, const char *argv[], enum dpif_flow_put_flags flags,
 goto out_freeactions;
 }
 
-/* For DPDK interface, applies the operation to all pmd threads
- * on the same numa node. */
-in_port_netdev = get_in_port_netdev_from_key(dpif, &key);
-if (in_port_netdev && netdev_is_pmd(in_port_netdev)) {
-int numa_id;
-
-numa_id = netdev_get_numa_id(in_port_netdev);
-if (ovs_numa_numa_id_is_valid(numa_id)) {
-struct ovs_numa_dump *dump = ovs_numa_dump_cores_on_numa(numa_id);
-struct ovs_numa_info *iter;
-
-FOR_EACH_CORE_ON_NUMA (iter, dump) {
-if (ovs_numa_core_is_pinned(iter->core_id)) {
-error = dpif_flow_put(dpif, flags,
-  key.data, key.size,
-  mask.size == 0 ? NULL : mask.data,
-  mask.size, actions.data,
-  actions.size, ufid_present ? &ufid : NULL,
-  iter->core_id, dpctl_p->print_statistics ? &stats : NULL);
-}
-}
-ovs_numa_dump_destroy(dump);
-} else {
-error = EINVAL;
-}
-} else {
-error = dpif_flow_put(dpif, flags,
-  key.data, key.size,
-  mask.size == 0 ? NULL : mask.data,
-  mask.size, actions.data,
-  actions.size, ufid_present ? &ufid : NULL,
-  PMD_ID_NULL, dpctl_p->print_statistics ? &stats : NULL);
-}
+/* The flow will be added on all pmds currently in the datapath. */
+error = dpif_flow_put(dpif, flags,
+  key.data, key.size,
+  mask.size == 0 ? NULL : mask.data,
+  mask.size, actions.data,
+  actions.size, ufid_present ? &ufid : NULL,
+  PMD_ID_NULL,
+  dpctl_p->print_statistics ? &stats : NULL);
+
 if (error) {
 dpctl_error(dpctl_p, error, "updating flow table");
 goto out_freeactions;
@@ -1021,7 +965,6 @@ ou

[ovs-dev] [PATCH v3 17/18] dpif-netdev: Centralized threads and queues handling code.

2017-01-08 Thread Daniele Di Proietto
Currently we have three different code paths that deal with pmd threads
and queues, in response to different inputs:

1. When a port is added
2. When a port is deleted
3. When the cpumask changes or a port must be reconfigured.

1. and 2. are carefully written to minimize disruption to the running
datapath, while 3. brings down all the threads, reconfigures all the
ports and restarts everything.

This commit removes the three separate code paths by introducing the
reconfigure_datapath() function, which takes care of adapting the pmd
threads and queues to the current datapath configuration, no matter how
we got there.

This aims at simplifying maintenance and introduces a long overdue
improvement: port reconfiguration (which can happen quite frequently for
dpdkvhost ports) is now done without shutting down the whole datapath,
just by temporarily removing the port that needs to be reconfigured
(while the rest of the datapath keeps running).

We now also recompute the rxq scheduling from scratch every time a port
is added or deleted.  This means that the queues will be more balanced,
especially when dealing with explicit rxq-affinity from the user
(without shutting down the threads and restarting them), but it also
means that adding or deleting a port might cause existing queues to be
moved between pmd threads.  This negative effect could be avoided by
taking into account the existing distribution when computing the new
scheduling, but I considered code clarity and fast reconfiguration more
important than optimizing port addition or removal (a port is added and
removed only once, but can be reconfigured many times).
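The "recompute from scratch" scheduling can be sketched as two passes over all rx queues, under simplifying assumptions (plain arrays, hypothetical names, no load metrics): queues with an explicit rxq-affinity go to the pmd on the requested core, which is then marked isolated (mirroring the 'isolated' flag printed by pmd_info_show_rxq()) and receives no other queues, and the remaining queues are spread round-robin over the non-isolated pmds on their port's NUMA node. Because previous placements are ignored, adding or deleting a queue can move existing ones, which is the trade-off described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CORE_UNSPEC UINT_MAX   /* Stands in for OVS_CORE_UNSPEC. */

struct pmd_s {
    unsigned core_id;
    int numa_id;
    bool isolated;       /* Set when the pmd has at least one pinned queue. */
};

struct rxq_s {
    int numa_id;         /* NUMA node of the queue's port. */
    unsigned core_id;    /* Requested affinity, or SKETCH_CORE_UNSPEC. */
    int pmd;             /* Output: index into the pmd array, or -1. */
};

/* Recomputes the whole assignment, ignoring where queues were polled
 * before.  A pinned queue whose core has no pmd is left unassigned. */
void
schedule_rxqs(struct pmd_s *pmds, size_t n_pmds,
              struct rxq_s *rxqs, size_t n_rxqs)
{
    size_t i, j, rr = 0;

    for (j = 0; j < n_pmds; j++) {
        pmds[j].isolated = false;
    }

    /* Pass 1: honor explicit affinity and isolate the target pmd. */
    for (i = 0; i < n_rxqs; i++) {
        rxqs[i].pmd = -1;
        if (rxqs[i].core_id == SKETCH_CORE_UNSPEC) {
            continue;
        }
        for (j = 0; j < n_pmds; j++) {
            if (pmds[j].core_id == rxqs[i].core_id) {
                rxqs[i].pmd = (int) j;
                pmds[j].isolated = true;
                break;
            }
        }
    }

    /* Pass 2: round-robin the remaining queues over non-isolated pmds on
     * the queue's NUMA node; 'rr' carries over between queues. */
    for (i = 0; i < n_rxqs; i++) {
        if (rxqs[i].core_id != SKETCH_CORE_UNSPEC) {
            continue;
        }
        for (size_t tries = 0; tries < n_pmds; tries++) {
            struct pmd_s *p = &pmds[rr % n_pmds];

            rr++;
            if (!p->isolated && p->numa_id == rxqs[i].numa_id) {
                rxqs[i].pmd = (int) (p - pmds);
                break;
            }
        }
    }
}
```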

Lastly, this commit moves the pmd thread state away from ovs-numa.  Now
the pmd thread state is kept only in dpif-netdev.

Signed-off-by: Daniele Di Proietto 
Co-authored-by: Ilya Maximets 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 896 +++---
 tests/pmd.at  |   3 +-
 2 files changed, 450 insertions(+), 449 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d996e3c9a..22fbb6eb7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -286,6 +286,7 @@ struct dp_netdev_rxq {
   pinned. OVS_CORE_UNSPEC if the
   queue doesn't need to be pinned to a
   particular core. */
+struct dp_netdev_pmd_thread *pmd;  /* pmd thread that will poll this queue. */
 };
 
 /* A port in a netdev-based datapath. */
@@ -301,6 +302,7 @@ struct dp_netdev_port {
 struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
 char *rxq_affinity_list;/* Requested affinity of rx queues. */
+bool need_reconfigure;  /* True if we should reconfigure netdev. */
 };
 
 /* Contained by struct dp_netdev_flow's 'stats' member.  */
@@ -503,7 +505,7 @@ struct dp_netdev_pmd_thread {
 
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and less
- * than 'ovs_numa_get_n_cores() + 1'. */
+ * than 'cmap_count(dp->poll_threads)'. */
 const int static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
@@ -532,6 +534,9 @@ struct dp_netdev_pmd_thread {
  * reporting to the user */
 unsigned long long stats_zero[DP_N_STATS];
 uint64_t cycles_zero[PMD_N_CYCLES];
+
+/* Set to true if the pmd thread needs to be reloaded. */
+bool need_reload;
 };
 
 /* Interface to netdev-based datapath. */
@@ -576,29 +581,26 @@ static void dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_set_nonpmd(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex);
 
+static void *pmd_thread_main(void *);
 static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
   unsigned core_id);
 static struct dp_netdev_pmd_thread *
 dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
-static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
-static void dp_netdev_stop_pmds(struct dp_netdev *dp);
-static void dp_netdev_start_pmds(struct dp_netdev *dp)
-OVS_REQUIRES(dp->port_mutex);
+static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp, bool non_pmd);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
-static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
- struct dp_netdev_port *port);
-static void dp_netdev_add_port_to_pmds(struct dp_netdev *dp,
-   struct dp_netdev_port *port);
 static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
- struct dp_netdev_port *port);
+ 

[ovs-dev] [PATCH v3 18/18] ovs-numa: Remove unused functions.

2017-01-08 Thread Daniele Di Proietto
ovs-numa doesn't need to keep the state of the pmd threads; that state is
an implementation detail of dpif-netdev.

Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-numa.c | 175 -
 lib/ovs-numa.h |   7 ---
 2 files changed, 182 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 04225a958..2e038b745 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -70,8 +70,6 @@ struct cpu_core {
 struct ovs_list list_node; /* In 'numa_node->cores' list. */
 struct numa_node *numa;/* numa node containing the core. */
 unsigned core_id;  /* Core id. */
-bool available;/* If the core can be pinned. */
-bool pinned;   /* If a thread has been pinned to the core. */
 };
 
 /* Contains all 'struct numa_node's. */
@@ -119,7 +117,6 @@ insert_new_cpu_core(struct numa_node *n, unsigned core_id)
 ovs_list_insert(&n->cores, &c->list_node);
 c->core_id = core_id;
 c->numa = n;
-c->available = true;
 
 return c;
 }
@@ -342,18 +339,6 @@ ovs_numa_core_id_is_valid(unsigned core_id)
 return found_numa_and_core && core_id < ovs_numa_get_n_cores();
 }
 
-bool
-ovs_numa_core_is_pinned(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-return core->pinned;
-}
-
-return false;
-}
-
 /* Returns the number of numa nodes. */
 int
 ovs_numa_get_n_numas(void)
@@ -398,97 +383,6 @@ ovs_numa_get_n_cores_on_numa(int numa_id)
 return OVS_CORE_UNSPEC;
 }
 
-/* Returns the number of cpu cores that are available and unpinned
- * on numa node.  Returns OVS_CORE_UNSPEC if 'numa_id' is invalid. */
-int
-ovs_numa_get_n_unpinned_cores_on_numa(int numa_id)
-{
-struct numa_node *numa = get_numa_by_numa_id(numa_id);
-
-if (numa) {
-struct cpu_core *core;
-int count = 0;
-
-LIST_FOR_EACH(core, list_node, &numa->cores) {
-if (core->available && !core->pinned) {
-count++;
-}
-}
-return count;
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Given 'core_id', tries to pin that core.  Returns true, if succeeds.
- * False, if the core has already been pinned, or if it is invalid or
- * not available. */
-bool
-ovs_numa_try_pin_core_specific(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return true;
-}
-}
-
-return false;
-}
-
-/* Searches through all cores for an unpinned and available core.  Returns
- * the 'core_id' if found and sets the 'core->pinned' to true.  Otherwise,
- * returns OVS_CORE_UNSPEC. */
-unsigned
-ovs_numa_get_unpinned_core_any(void)
-{
-struct cpu_core *core;
-
-HMAP_FOR_EACH(core, hmap_node, &all_cpu_cores) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return core->core_id;
-}
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Searches through all cores on numa node with 'numa_id' for an
- * unpinned and available core.  Returns the core_id if found and
- * sets the 'core->pinned' to true.  Otherwise, returns OVS_CORE_UNSPEC. */
-unsigned
-ovs_numa_get_unpinned_core_on_numa(int numa_id)
-{
-struct numa_node *numa = get_numa_by_numa_id(numa_id);
-
-if (numa) {
-struct cpu_core *core;
-
-LIST_FOR_EACH(core, list_node, &numa->cores) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return core->core_id;
-}
-}
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Unpins the core with 'core_id'. */
-void
-ovs_numa_unpin_core(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-core->pinned = false;
-}
-}
-
 static struct ovs_numa_dump *
 ovs_numa_dump_create(void)
 {
@@ -654,75 +548,6 @@ ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 free(dump);
 }
 
-/* Reads the cpu mask configuration from 'cmask' and sets the
- * 'available' of corresponding cores.  For unspecified cores,
- * sets 'available' to false. */
-void
-ovs_numa_set_cpu_mask(const char *cmask)
-{
-int core_id = 0;
-int i;
-int end_idx;
-
-if (!found_numa_and_core) {
-return;
-}
-
-/* If no mask specified, resets the 'available' to true for all cores. */
-if (!cmask) {
-struct cpu_core *core;
-
-HMAP_FOR_EACH(core, hmap_node, &all_cpu_cores) {
-core->available = true;
-}
-
-return;
-}
-
-/* Ignore leading 0x. */
-end_idx = 0;
-if (!strnc

[ovs-dev] [PATCH v3 16/18] dpif-netdev: Use hmap for poll_list in pmd threads.

2017-01-08 Thread Daniele Di Proietto
A future commit will use this to determine if a queue is already
contained in a pmd thread.

To keep the behavior unaltered, we now have to sort the queues before
printing them in pmd_info_show_rxq(), because the iteration order of an
hmap is arbitrary.

This commit also introduces 'struct polled_queue', which will be used
exclusively in the fast path, makes 'struct rxq_poll' point to a
'struct dp_netdev_rxq', and consistently names 'netdev_rxq' pointers
'rx' and 'dp_netdev_rxq' pointers 'rxq'.
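Because an hmap iterates in hash order, a stable display needs an explicit sort. This sketch mirrors the ordering of compare_poll_list() (port name first, then queue id) on a hypothetical 'struct poll_sketch' that holds those two keys directly instead of calling netdev_rxq_get_name() and netdev_rxq_get_queue_id(). It compares queue ids without subtraction and skips qsort() for fewer than two entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the (port name, queue id) pair. */
struct poll_sketch {
    const char *name;
    int queue_id;
};

/* Orders by port name, then by queue id.  The (a > b) - (a < b) idiom
 * avoids the overflow that subtracting arbitrary ints could cause. */
static int
compare_poll_sketch(const void *a_, const void *b_)
{
    const struct poll_sketch *a = a_;
    const struct poll_sketch *b = b_;
    int cmp = strcmp(a->name, b->name);

    if (cmp) {
        return cmp;
    }
    return (a->queue_id > b->queue_id) - (a->queue_id < b->queue_id);
}

/* Sorts a snapshot of hash-ordered poll entries so the printed output
 * does not depend on hmap iteration order.  The n > 1 guard also avoids
 * passing a NULL array to qsort(), which ISO C does not allow even for
 * zero elements. */
void
sort_polls(struct poll_sketch *polls, size_t n)
{
    if (n > 1) {
        qsort(polls, n, sizeof *polls, compare_poll_sketch);
    }
}
```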

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 168 --
 1 file changed, 112 insertions(+), 56 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f170f5c96..d996e3c9a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -280,8 +280,12 @@ enum pmd_cycles_counter_type {
 
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */
 struct dp_netdev_rxq {
-struct netdev_rxq *rxq;
-unsigned core_id;   /* Сore to which this queue is pinned. */
+struct dp_netdev_port *port;
+struct netdev_rxq *rx;
+unsigned core_id;  /* Core to which this queue should be
+  pinned. OVS_CORE_UNSPEC if the
+  queue doesn't need to be pinned to a
+  particular core. */
 };
 
 /* A port in a netdev-based datapath. */
@@ -415,11 +419,15 @@ struct dp_netdev_pmd_cycles {
 atomic_ullong n[PMD_N_CYCLES];
 };
 
+struct polled_queue {
+struct netdev_rxq *rx;
+odp_port_t port_no;
+};
+
 /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
 struct rxq_poll {
-struct dp_netdev_port *port;
-struct netdev_rxq *rx;
-struct ovs_list node;
+struct dp_netdev_rxq *rxq;
+struct hmap_node node;
 };
 
 /* Contained by struct dp_netdev_pmd_thread's 'send_port_cache',
@@ -500,9 +508,7 @@ struct dp_netdev_pmd_thread {
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
-struct ovs_list poll_list OVS_GUARDED;
-/* Number of elements in 'poll_list' */
-int poll_cnt;
+struct hmap poll_list OVS_GUARDED;
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -586,8 +592,8 @@ static void dp_netdev_add_port_to_pmds(struct dp_netdev *dp,
 static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
  struct dp_netdev_port *port);
 static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
- struct dp_netdev_port *port,
- struct netdev_rxq *rx);
+ struct dp_netdev_rxq *rxq)
+OVS_REQUIRES(pmd->port_mutex);
 static struct dp_netdev_pmd_thread *
 dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
@@ -783,12 +789,56 @@ pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
 }
 }
 
+static int
+compare_poll_list(const void *a_, const void *b_)
+{
+const struct rxq_poll *a = a_;
+const struct rxq_poll *b = b_;
+
+const char *namea = netdev_rxq_get_name(a->rxq->rx);
+const char *nameb = netdev_rxq_get_name(b->rxq->rx);
+
+int cmp = strcmp(namea, nameb);
+if (!cmp) {
+return netdev_rxq_get_queue_id(a->rxq->rx)
+   - netdev_rxq_get_queue_id(b->rxq->rx);
+} else {
+return cmp;
+}
+}
+
+static void
+sorted_poll_list(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **list,
+ size_t *n)
+{
+struct rxq_poll *ret, *poll;
+size_t i;
+
+*n = hmap_count(&pmd->poll_list);
+if (!*n) {
+ret = NULL;
+} else {
+ret = xcalloc(*n, sizeof *ret);
+i = 0;
+HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
+ret[i] = *poll;
+i++;
+}
+ovs_assert(i == *n);
+}
+
+qsort(ret, *n, sizeof *ret, compare_poll_list);
+
+*list = ret;
+}
+
 static void
 pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
 {
 if (pmd->core_id != NON_PMD_CORE_ID) {
-struct rxq_poll *poll;
 const char *prev_name = NULL;
+struct rxq_poll *list;
+size_t i, n;
 
 ds_put_format(reply,
   "pmd thread numa_id %d core_id %u:\n\tisolated : %s\n",
@@ -796,21 +846,23 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
   ? "true" : "false");
 
 ovs_mutex_lock(&pmd->port_mutex);
-LIST_FOR_EACH (poll, node, &pmd->

Re: [ovs-dev] [PATCH 1/4] datapath: Fix formatting typo.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, I changed the prefix to netdev-dpdk (instead of datapath) and
pushed this to master.

> ---
>  lib/netdev-dpdk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 625f425..376aa4d 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -738,7 +738,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>
>  memset(&eth_addr, 0x0, sizeof(eth_addr));
>  rte_eth_macaddr_get(dev->port_id, &eth_addr);
> -VLOG_INFO_RL(&rl, "Port %d: "ETH_ADDR_FMT"",
> +VLOG_INFO_RL(&rl, "Port %d: "ETH_ADDR_FMT,
>  dev->port_id, ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes));
>
>  memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN);
> --
> 1.8.3.1


[ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-08 Thread Daniele Di Proietto
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started.  This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.

This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.

Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.
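The control flow of the new dpdk_init() can be sketched as follows, assuming calls from a single thread and a plain boolean in place of smap_get_bool(ovs_other_config, "dpdk-init", false); sketch_dpdk_init(), sketch_init__() and the counters are hypothetical. Initialization happens the first time the flag is seen true, even after earlier calls saw it false, and the "disabled" message is printed at most once. In this sketch a static flag records that initialization already happened; the patch additionally relies on ovsthread_once to guard dpdk_init__().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters used to observe what the sketch did. */
static int n_inits;
static int n_disabled_msgs;

/* Stand-in for dpdk_init__(): the expensive, once-only initialization. */
static void
sketch_init__(void)
{
    n_inits++;
}

/* Called on every configuration change with the current value of the
 * 'dpdk-init' setting.  Single-threaded by assumption. */
void
sketch_dpdk_init(bool dpdk_init)
{
    static bool enabled = false;
    static bool disabled_logged = false;

    if (enabled) {
        return;                 /* Already initialized: nothing to do. */
    }
    if (dpdk_init) {
        puts("DPDK Enabled - initializing...");
        sketch_init__();
        enabled = true;
        puts("DPDK Enabled - initialized");
    } else if (!disabled_logged) {
        puts("DPDK Disabled - Use other_config:dpdk-init to enable");
        n_disabled_msgs++;
        disabled_logged = true;
    }
}
```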

CC: Aaron Conole 
Signed-off-by: Daniele Di Proietto 
---
v1->v2: No change, first non-RFC post.
---
 lib/dpdk-stub.c |  8 
 lib/dpdk.c  | 30 --
 tests/ofproto-macros.at |  2 +-
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index bd981bb90..daef7291f 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovsthread_once_start(&once)) {
-if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+if (ovsthread_once_start(&once)) {
 VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
+ovsthread_once_done(&once);
 }
-ovsthread_once_done(&once);
 }
 }
 
diff --git a/lib/dpdk.c b/lib/dpdk.c
index ee4360b22..008c6c06d 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
 cpu_set_t cpuset;
 char *sock_dir_subcomponent;
 
-if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
-VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
-return;
-}
-
-VLOG_INFO("DPDK Enabled, initializing");
 if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
 NAME_MAX, ovs_other_config,
 &sock_dir_subcomponent)) {
@@ -413,11 +407,27 @@ dpdk_init__(const struct smap *ovs_other_config)
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+static bool enabled = false;
+
+if (enabled || !ovs_other_config) {
+return;
+}
+
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovs_other_config && ovsthread_once_start(&once)) {
-dpdk_init__(ovs_other_config);
-ovsthread_once_done(&once);
+if (ovsthread_once_start(&once_enable)) {
+VLOG_INFO("DPDK Enabled - initializing...");
+dpdk_init__(ovs_other_config);
+VLOG_INFO("DPDK Enabled - initialized");
+ovsthread_once_done(&once_enable);
+}
+} else {
+static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
+if (ovsthread_once_start(&once_disable)) {
+VLOG_INFO("DPDK Disabled - Use other_config:dpdk-init to enable");
+ovsthread_once_done(&once_disable);
+}
 }
 }
 
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 547b8..faff5b0a8 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -331,7 +331,7 @@ m4_define([_OVS_VSWITCHD_START],
 /ofproto|INFO|using datapath ID/d
 /netdev_linux|INFO|.*device has unknown hardware address family/d
 /ofproto|INFO|datapath ID changed to fedcba9876543210/d
-/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
+/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d']])
 ])
 
 # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override],
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/4] datapath: Uses the NR_QUEUE instead of magic numbers.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> The NR_QUEUE is defined in "lib/dpif-netdev.h", netdev-dpdk
> uses it instead of magic number. netdev-dummy should be
> in the same case.
>
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, I changed the prefix of the commit message and applied to master

> ---
>  lib/netdev-dummy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index d75e597..8d9c805 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -868,8 +868,8 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  goto exit;
>  }
>
> -new_n_rxq = MAX(smap_get_int(args, "n_rxq", 1), 1);
> -new_n_txq = MAX(smap_get_int(args, "n_txq", 1), 1);
> +new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> +new_n_txq = MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1);
>  new_numa_id = smap_get_int(args, "numa_id", 0);
>  if (new_n_rxq != netdev->requested_n_rxq
>  || new_n_txq != netdev->requested_n_txq
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH 2/4] datapath: Limits the number of tx/rx queues for netdev-dummy.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> This patch prevents ovs_rcu from reporting a WARN, caused by being blocked
> for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port may be appropriate,
> because DPDK uses it as its default max value.
>
> Signed-off-by: nickcooper-zhangtonghao 

I don't think this is a big deal, since netdev-dummy is only used for
testing, but don't you think it's better to check it in set_config()
and return an error?

Also, could you use the prefix netdev-dummy, instead of datapath?

Thanks,

Daniele

> ---
>  lib/netdev-dummy.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index d406cbc..d75e597 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -897,6 +897,9 @@ netdev_dummy_get_numa_id(const struct netdev *netdev_)
>  return numa_id;
>  }
>
> +
> +#define DUMMY_MAX_QUEUES_PER_PORT 1024
> +
>  /* Sets the number of tx queues and rx queues for the dummy PMD interface. */
>  static int
>  netdev_dummy_reconfigure(struct netdev *netdev_)
> @@ -905,8 +908,8 @@ netdev_dummy_reconfigure(struct netdev *netdev_)
>
>  ovs_mutex_lock(&netdev->mutex);
>
> -netdev_->n_txq = netdev->requested_n_txq;
> -netdev_->n_rxq = netdev->requested_n_rxq;
> +netdev_->n_txq = MIN(DUMMY_MAX_QUEUES_PER_PORT, netdev->requested_n_txq);
> +netdev_->n_rxq = MIN(DUMMY_MAX_QUEUES_PER_PORT, netdev->requested_n_rxq);
>  netdev->numa_id = netdev->requested_numa_id;
>
>  ovs_mutex_unlock(&netdev->mutex);
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH 4/4] datapath: Uses the OVS_CORE_UNSPEC instead of magic numbers.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> This patch uses OVS_CORE_UNSPEC for unpinned queues instead
> of "-1". More importantly, "-1" cast to unsigned int is
> equal to NON_PMD_CORE_ID, so this patch keeps the two distinct.
>
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, this bothered me as well.  In fact I sent a patch for it in
the past as part of a series:

https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/325692.html.

This shouldn't fix any actual problem, because I think we only compared
core_id with pmd threads (not with the non-pmd thread), but I agree that using
-1 for an unsigned is not pretty.
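As a hedged illustration of that point: on platforms where `unsigned` is 32 bits, storing -1 in an `unsigned` core id yields UINT_MAX, which is the same value as a 32-bit all-ones sentinel. The sketch assumes, as I understand the OVS headers of that era, that NON_PMD_CORE_ID is UINT32_MAX and OVS_CORE_UNSPEC is INT_MAX; the `_SKETCH` macros and both helpers are hypothetical local copies, not the real definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: local copies, not the real OVS definitions. */
#define NON_PMD_CORE_ID_SKETCH UINT32_MAX
#define OVS_CORE_UNSPEC_SKETCH INT_MAX

/* The "unpinned" marker and the non-PMD core id must never collide. */
static_assert((uint32_t) OVS_CORE_UNSPEC_SKETCH != NON_PMD_CORE_ID_SKETCH,
              "unpinned marker must differ from the non-PMD core id");

/* Old style: an unpinned queue stored -1 in an unsigned field. */
static unsigned
old_unpinned_core_id(void)
{
    unsigned core_id = -1;      /* converts to UINT_MAX */
    return core_id;
}

/* New style: compare against a dedicated "unspecified" value. */
static bool
rxq_is_unpinned(unsigned core_id)
{
    return core_id == (unsigned) OVS_CORE_UNSPEC_SKETCH;
}
```

With the -1 sentinel, a check like `core_id != -1` could not, in principle, tell an unpinned queue apart from one assigned to the non-PMD thread; a dedicated constant removes that ambiguity.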

I fixed the title and applied this to master, thanks

> ---
>  lib/dpif-netdev.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0b73056..99e4d35 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1293,7 +1293,7 @@ port_create(const char *devname, const char *type,
>   devname, ovs_strerror(errno));
>  goto out_rxq_close;
>  }
> -port->rxqs[i].core_id = -1;
> +port->rxqs[i].core_id = OVS_CORE_UNSPEC;
>  n_open_rxqs++;
>  }
>
> @@ -1517,7 +1517,7 @@ has_pmd_rxq_for_numa(struct dp_netdev *dp, int numa_id)
>  for (i = 0; i < port->n_rxq; i++) {
>  unsigned core_id = port->rxqs[i].core_id;
>
> -if (core_id != -1
> +if (core_id != OVS_CORE_UNSPEC
>  && ovs_numa_get_numa_id(core_id) == numa_id) {
>  return true;
>  }
> @@ -2704,7 +2704,7 @@ parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
>  int error = 0;
>
>  for (i = 0; i < n_rxq; i++) {
> -core_ids[i] = -1;
> +core_ids[i] = OVS_CORE_UNSPEC;
>  }
>
>  if (!affinity_list) {
> @@ -3617,7 +3617,7 @@ dp_netdev_add_port_rx_to_pmds(struct dp_netdev *dp,
>
>  for (i = 0; i < port->n_rxq; i++) {
>  if (pinned) {
> -if (port->rxqs[i].core_id == -1) {
> +if (port->rxqs[i].core_id == OVS_CORE_UNSPEC) {
>  continue;
>  }
>  pmd = dp_netdev_get_pmd(dp, port->rxqs[i].core_id);
> @@ -3631,7 +3631,7 @@ dp_netdev_add_port_rx_to_pmds(struct dp_netdev *dp,
>  pmd->isolated = true;
>  dp_netdev_pmd_unref(pmd);
>  } else {
> -if (port->rxqs[i].core_id != -1) {
> +if (port->rxqs[i].core_id != OVS_CORE_UNSPEC) {
>  continue;
>  }
>  pmd = dp_netdev_less_loaded_pmd_on_numa(dp, numa_id);
> @@ -3760,7 +3760,7 @@ dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
>  for (i = 0; i < port->n_rxq; i++) {
>  unsigned core_id = port->rxqs[i].core_id;
>
> -if (core_id != -1) {
> +if (core_id != OVS_CORE_UNSPEC) {
>  numa_id = ovs_numa_get_numa_id(core_id);
>  hmapx_add(&numas, (void *) numa_id);
>  }
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto

On 09/01/2017 07:14, "Aaron Conole"  wrote:

>Daniele Di Proietto  writes:
>
>> With this commit, we allow the user to set other_config:dpdk-init=true
>> after the process is started.  This makes it easier to start Open
>> vSwitch with DPDK using standard init scripts without restarting the
>> service.
>>
>> This is still far from ideal, because initializing DPDK might still
>> abort the process (e.g. if there is not enough memory), so the user must
>> check the status of the process after setting dpdk-init to true.
>>
>> Nonetheless, I think this is an improvement, because it doesn't require
>> restarting the whole unit.
>>
>> CC: Aaron Conole 
>> Signed-off-by: Daniele Di Proietto 
>> ---
>> v1->v2: No change, first non-RFC post.
>> ---
>
>Looks good - just one minor detail below
>
>>  lib/dpdk-stub.c |  8 
>>  lib/dpdk.c  | 30 --
>>  tests/ofproto-macros.at |  2 +-
>>  3 files changed, 25 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
>> index bd981bb90..daef7291f 100644
>> --- a/lib/dpdk-stub.c
>> +++ b/lib/dpdk-stub.c
>> @@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>>  void
>>  dpdk_init(const struct smap *ovs_other_config)
>>  {
>> -static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>  
>> -if (ovsthread_once_start(&once)) {
>> -if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +if (ovsthread_once_start(&once)) {
>>  VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
>> +ovsthread_once_done(&once);
>>  }
>> -ovsthread_once_done(&once);
>>  }
>>  }
>>  
>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>> index ee4360b22..008c6c06d 100644
>> --- a/lib/dpdk.c
>> +++ b/lib/dpdk.c
>> @@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>>  cpu_set_t cpuset;
>>  char *sock_dir_subcomponent;
>>  
>> -if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> -VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
>> -return;
>> -}
>> -
>> -VLOG_INFO("DPDK Enabled, initializing");
>>  if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
>>  NAME_MAX, ovs_other_config,
>>  &sock_dir_subcomponent)) {
>> @@ -413,11 +407,27 @@ dpdk_init__(const struct smap *ovs_other_config)
>>  void
>>  dpdk_init(const struct smap *ovs_other_config)
>>  {
>> -static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +static bool enabled = false;
>
>This doesn't appear to be used, apart from the first test of the
>following conditional (where it will always pass to the second).  Did I
>miss something?

Oops, it should be used.

I need to set it to true after calling dpdk_init__(), otherwise the following scenario might happen:

1) other_config:dpdk-init is set to "true"
2) vswitchd is started and dpdk_init__() is called
3) other_config:dpdk-init is set to "false"
4) The log message "DPDK Disabled - Use other_config:dpdk-init to enable" is printed, giving the illusion that DPDK was disabled.

I'll add 'enabled = true' in the next version.
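The scenario above can be reproduced with a single-threaded sketch. Plain static flags stand in for the two `ovsthread_once` objects, a counter stands in for `dpdk_init__()`, and `dpdk_init_sketch()` is a hypothetical name; only the control flow mirrors the patch, with `enabled = true` set right after initialization as proposed for the next version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int init_calls = 0;      /* times the stand-in for dpdk_init__() ran */
static int disabled_msgs = 0;   /* times the "DPDK Disabled" line was logged */

/* 'dpdk_init_requested' plays the role of other_config:dpdk-init. */
static void
dpdk_init_sketch(bool dpdk_init_requested)
{
    static bool enabled = false;
    static bool enable_done = false;    /* stands in for once_enable */
    static bool disable_done = false;   /* stands in for once_disable */

    if (enabled) {
        return;     /* DPDK is up; later config changes are ignored. */
    }

    if (dpdk_init_requested) {
        if (!enable_done) {
            enable_done = true;
            init_calls++;           /* dpdk_init__() would run here */
            enabled = true;         /* without this, step 4 logs "Disabled" */
        }
    } else if (!disable_done) {
        disable_done = true;
        disabled_msgs++;
        printf("DPDK Disabled - Use other_config:dpdk-init to enable\n");
    }
    assert(init_calls <= 1);        /* initialization must never repeat */
}
```

Dropping the `enabled = true` line and calling `dpdk_init_sketch(true)` followed by `dpdk_init_sketch(false)` prints the misleading "DPDK Disabled" message, which is the problem pointed out in the review.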

Thanks,

Daniele

>
>> +if (enabled || !ovs_other_config) {
>> +return;
>> +}
>> +
>> +if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
>>  
>> -if (ovs_other_config && ovsthread_once_start(&once)) {
>> -dpdk_init__(ovs_other_config);
>> -ovsthread_once_done(&once);
>> +if (ovsthread_once_start(&once_enable)) {
>> +VLOG_INFO("DPDK Enabled - initializing...");
>> +dpdk_init__(ovs_other_config);
>> +VLOG_INFO("DPDK Enabled - initialized");
>> +ovsthread_once_done(&once_enable);
>> +}
>> +} else {
>> +static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
>> +if (ovsthread_once_start(&once_disable)) {
>

Re: [ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto

On 09/01/2017 03:49, "nickcooper-zhangtonghao"  wrote:

>hi Daniele,
>I reviewed this patch. One question: should we check the
>hugepage memory before calling rte_eal_init()? Maybe as an improvement in the next version?

How would you suggest checking for hugepages before calling rte_eal_init()?

I think everybody agrees that in the long term we need to avoid aborting if the
initialization fails, but most of that work needs to happen in the DPDK library.

If there's a simple check we could do here, I'm fine with including it; if
it's something more complicated that needs to be a separate patch, we should
probably defer it, since we're in feature freeze now.

Thanks,

Daniele

>
>Thanks.
>Nick
>
>On Jan 9, 2017, at 11:21 AM, Daniele Di Proietto wrote:
>
>With this commit, we allow the user to set other_config:dpdk-init=true
>after the process is started.  This makes it easier to start Open
>vSwitch with DPDK using standard init scripts without restarting the
>service.
>
>This is still far from ideal, because initializing DPDK might still
>abort the process (e.g. if there is not enough memory), so the user must
>check the status of the process after setting dpdk-init to true.
>
>Nonetheless, I think this is an improvement, because it doesn't require
>restarting the whole unit.
>
>CC: Aaron Conole 
>Signed-off-by: Daniele Di Proietto 
>---
>v1->v2: No change, first non-RFC post.
>---
> lib/dpdk-stub.c |  8 
> lib/dpdk.c  | 30 --
> tests/ofproto-macros.at |  2 +-
> 3 files changed, 25 insertions(+), 15 deletions(-)


Re: [ovs-dev] [PATCH 2/4] datapath: Limits the number of tx/rx queues for netdev-dummy.

2017-01-09 Thread Daniele Di Proietto
2017-01-08 20:02 GMT-08:00 nickcooper-zhangtonghao :
> Thanks Daniele,
> Yes, it’s a small improvement, but it is necessary for us. I will check it
> in set_config(). One question: should we also check the tx/rx queues for
> netdev-dpdk in set_config()?

I think for DPDK devices ultimately there's no way to check without
actually setting up the queues, that's why it's done in reconfigure().

Thanks,

Daniele

>
> Now we check it in dpdk_eth_dev_init().
>
> Thanks.
>
> On Jan 9, 2017, at 11:22 AM, Daniele Di Proietto wrote:
>
> 2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
>
> This patch prevents ovs_rcu from reporting a WARN, caused by being blocked
> for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port may be appropriate,
> because DPDK uses it as its default max value.
>
> Signed-off-by: nickcooper-zhangtonghao 
>
>
> I don't think this is a big deal, since netdev-dummy is only used for
> testing, but don't you think it's better to check it in set_config()
> and return an error?
>
> Also, could you use the prefix netdev-dummy, instead of datapath?
>
> Thanks,
>
> Daniele


Re: [ovs-dev] [PATCH v2] netdev-dummy: Limits the number of tx/rx queues.

2017-01-09 Thread Daniele Di Proietto
2017-01-09 6:40 GMT-08:00 nickcooper-zhangtonghao :
> This patch prevents ovs_rcu from reporting a WARN, caused by being blocked
> for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port may be appropriate,
> because DPDK uses it as its default max value.
>
> Signed-off-by: nickcooper-zhangtonghao 
> ---
>  lib/netdev-dummy.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index bdb77e1..5370404 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -827,6 +827,8 @@ netdev_dummy_set_in6(struct netdev *netdev_, struct in6_addr *in6,
>  return 0;
>  }
>
> +#define DUMMY_MAX_QUEUES_PER_PORT 1024
> +
>  static int
>  netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  {
> @@ -868,8 +870,11 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  goto exit;
>  }
>
> -new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> -new_n_txq = MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1);
> +new_n_rxq = MIN(DUMMY_MAX_QUEUES_PER_PORT,
> +MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1));
> +new_n_txq = MIN(DUMMY_MAX_QUEUES_PER_PORT,
> +MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1));
> +

I'm fine with this, but don't you think it's better to return an error if
one of the numbers exceeds DUMMY_MAX_QUEUES_PER_PORT?

>  new_numa_id = smap_get_int(args, "numa_id", 0);
>  if (new_n_rxq != netdev->requested_n_rxq
>  || new_n_txq != netdev->requested_n_txq
> --
> 1.8.3.1


[ovs-dev] [PATCH v3] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started.  This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.

This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.

Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.

CC: Aaron Conole 
Signed-off-by: Daniele Di Proietto 
---
v3: Set 'enabled' after dpdk_init__()
---
 lib/dpdk-stub.c |  8 
 lib/dpdk.c  | 31 +--
 tests/ofproto-macros.at |  2 +-
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index bd981bb90..daef7291f 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovsthread_once_start(&once)) {
-if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+if (ovsthread_once_start(&once)) {
 VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
+ovsthread_once_done(&once);
 }
-ovsthread_once_done(&once);
 }
 }
 
diff --git a/lib/dpdk.c b/lib/dpdk.c
index ee4360b22..9ae249141 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
 cpu_set_t cpuset;
 char *sock_dir_subcomponent;
 
-if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
-VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
-return;
-}
-
-VLOG_INFO("DPDK Enabled, initializing");
 if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
 NAME_MAX, ovs_other_config,
 &sock_dir_subcomponent)) {
@@ -413,11 +407,28 @@ dpdk_init__(const struct smap *ovs_other_config)
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+static bool enabled = false;
+
+if (enabled || !ovs_other_config) {
+return;
+}
+
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovs_other_config && ovsthread_once_start(&once)) {
-dpdk_init__(ovs_other_config);
-ovsthread_once_done(&once);
+if (ovsthread_once_start(&once_enable)) {
+VLOG_INFO("DPDK Enabled - initializing...");
+dpdk_init__(ovs_other_config);
+enabled = true;
+VLOG_INFO("DPDK Enabled - initialized");
+ovsthread_once_done(&once_enable);
+}
+} else {
+static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
+if (ovsthread_once_start(&once_disable)) {
+VLOG_INFO("DPDK Disabled - Use other_config:dpdk-init to enable");
+ovsthread_once_done(&once_disable);
+}
 }
 }
 
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 547b8..faff5b0a8 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -331,7 +331,7 @@ m4_define([_OVS_VSWITCHD_START],
 /ofproto|INFO|using datapath ID/d
 /netdev_linux|INFO|.*device has unknown hardware address family/d
 /ofproto|INFO|datapath ID changed to fedcba9876543210/d
-/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
+/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d']])
 ])
 
 # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override],
-- 
2.11.0



Re: [ovs-dev] [PATCH v3] dpdk: Late initialization.

2017-01-10 Thread Daniele Di Proietto

On 10/01/2017 06:54, "Aaron Conole"  wrote:

>Daniele Di Proietto  writes:
>
>> With this commit, we allow the user to set other_config:dpdk-init=true
>> after the process is started.  This makes it easier to start Open
>> vSwitch with DPDK using standard init scripts without restarting the
>> service.
>>
>> This is still far from ideal, because initializing DPDK might still
>> abort the process (e.g. if there is not enough memory), so the user must
>> check the status of the process after setting dpdk-init to true.
>>
>> Nonetheless, I think this is an improvement, because it doesn't require
>> restarting the whole unit.
>>
>> CC: Aaron Conole 
>> Signed-off-by: Daniele Di Proietto 
>> ---
>
>Acked-by: Aaron Conole 

Thanks for reviewing this, applied to master


Re: [ovs-dev] [PATCH v3] netdev-dummy: Limits the number of tx/rx queues.

2017-01-10 Thread Daniele Di Proietto
2017-01-09 21:56 GMT-08:00 nickcooper-zhangtonghao :
> This patch prevents ovs_rcu from reporting a WARN, caused by being blocked
> for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port may be appropriate,
> because DPDK uses it as its default max value.
>
> Signed-off-by: nickcooper-zhangtonghao 

Applied to master, thanks

> ---
> v3:
> * Limits the number of tx/rx queues in set_config().
> * Adds the WARN log when exceeds DUMMY_MAX_QUEUES_PER_PORT.
> ---
>  lib/netdev-dummy.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index bdb77e1..4a23cba 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -827,6 +827,8 @@ netdev_dummy_set_in6(struct netdev *netdev_, struct in6_addr *in6,
>  return 0;
>  }
>
> +#define DUMMY_MAX_QUEUES_PER_PORT 1024
> +
>  static int
>  netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  {
> @@ -870,6 +872,21 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>
>  new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
>  new_n_txq = MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1);
> +
> +if (new_n_rxq > DUMMY_MAX_QUEUES_PER_PORT ||
> +new_n_txq > DUMMY_MAX_QUEUES_PER_PORT) {
> +VLOG_WARN("The one or both of interface %s queues"
> +  "(rxq: %d, txq: %d) exceed %d. Sets it %d.\n",
> +  netdev->up.name,
> +  new_n_rxq,
> +  new_n_txq,
> +  DUMMY_MAX_QUEUES_PER_PORT,
> +  DUMMY_MAX_QUEUES_PER_PORT);
> +
> +new_n_rxq = MIN(DUMMY_MAX_QUEUES_PER_PORT, new_n_rxq);
> +new_n_txq = MIN(DUMMY_MAX_QUEUES_PER_PORT, new_n_txq);
> +}
> +
>  new_numa_id = smap_get_int(args, "numa_id", 0);
>  if (new_n_rxq != netdev->requested_n_rxq
>  || new_n_txq != netdev->requested_n_txq
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Assign socket id according to device's numa id

2017-01-11 Thread Daniele Di Proietto
2017-01-12 6:18 GMT-08:00 Binbin Xu :
> We can hotplug attach DPDK ports specified via the 'dpdk-devargs'
> option now.
>
> But the socket id of DPDK ports can't be assigned correctly,
> it is always 0. The socket id of DPDK ports should be assigned
> according to the numa id of the device.
>
> Fixes: 55e075e65ef9e ("netdev-dpdk: Arbitrary 'dpdk' port naming")
> Signed-off-by: Binbin Xu 

Thanks a lot for fixing this, applied to master

> ---
>  lib/netdev-dpdk.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 8bb9086..57ebdb3 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1197,6 +1197,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  bool temp_flag;
>  const char *new_devargs;
>  int err = 0;
> +int sid;
>
>  ovs_mutex_lock(&dpdk_mutex);
>  ovs_mutex_lock(&dev->mutex);
> @@ -1242,6 +1243,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  err = EADDRINUSE;
>  } else {
>  dev->devargs = xstrdup(new_devargs);
> +sid = rte_eth_dev_socket_id(new_port_id);
> +dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
>  dev->port_id = new_port_id;
>  netdev_request_reconfigure(&dev->up);
>  err = 0;
> @@ -3140,7 +3143,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>  && netdev->n_rxq == dev->requested_n_rxq
>  && dev->mtu == dev->requested_mtu
>  && dev->rxq_size == dev->requested_rxq_size
> -&& dev->txq_size == dev->requested_txq_size) {
> +&& dev->txq_size == dev->requested_txq_size
> +&& dev->socket_id == dev->requested_socket_id) {
>  /* Reconfiguration is unnecessary */
>
>  goto out;
> @@ -3148,7 +3152,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>
>  rte_eth_dev_stop(dev->port_id);
>
> -if (dev->mtu != dev->requested_mtu) {
> +if (dev->mtu != dev->requested_mtu
> +|| dev->socket_id != dev->requested_socket_id) {
>  netdev_dpdk_mempool_configure(dev);
>  }
>
> --
> 1.8.3.1


Re: [ovs-dev] [PATCH] netdev: Add 'errp' to set_config().

2017-01-11 Thread Daniele Di Proietto

On 11/01/2017 02:56, "Kevin Traynor"  wrote:

>On 01/07/2017 01:24 AM, Daniele Di Proietto wrote:
>> Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
>> set_config() is used to identify a DPDK device, so it's better to report
>> its detailed error message to the user.  Tunnel devices and patch ports
>> rely a lot on set_config() as well.
>> 
>> This commit adds a param to set_config() that can be used to return
>> an error message and makes use of that in netdev-dpdk and netdev-vport.
>> 
>> Before this patch:
>> 
>> $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> $ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
>> ovs-vsctl: Error detected while setting up 'p+': p+: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> $ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
>> ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> After this patch:
>> 
>> $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing 'options:dpdk-devargs'. The old 'dpdk' names are not supported.  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> $ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
>> ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires valid 'peer' argument.  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> $ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
>> ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
>> ovs-vsctl: The default log directory is "/var/log/openvswitch/".
>> 
>> CC: Ciara Loftus 
>> CC: Kevin Traynor 
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>  lib/netdev-dpdk.c | 27 ++
>>  lib/netdev-dummy.c|  3 +-
>>  lib/netdev-provider.h |  9 --
>>  lib/netdev-vport.c| 76 ++-
>>  lib/netdev.c  | 10 +--
>>  5 files changed, 84 insertions(+), 41 deletions(-)
>> 
>
>Note that the commit message lines get truncated in git log; you may want
>to wrap them. It's much more intuitive now, thanks.
>
>Acked-by: Kevin Traynor 

I'm always unsure what to do with long commit messages, especially with
shell commands or gdb stack traces.

Documentation/internals/contributing/submitting-patches.rst explicitly
suggests to limit lines to 75 characters, so I have to agree with you :-)

Pushed to master, thanks
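The `errp` convention this patch introduces (return an errno value and, on failure, hand back a heap-allocated message for the caller to report and free) can be sketched as follows. `patch_set_config_sketch()` is hypothetical: the real set_config() callbacks take a `struct netdev *` and a `struct smap *`, and the message text here is simply copied from the "after" output in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PEER_ERR_FMT "%s: patch type requires valid 'peer' argument"

/* Returns 0 on success.  On failure returns an errno value and stores a
 * malloc()ed message in '*errp', which the caller must free(). */
static int
patch_set_config_sketch(const char *name, const char *peer, char **errp)
{
    assert(errp != NULL);
    *errp = NULL;

    if (!peer || !*peer) {
        int len = snprintf(NULL, 0, PEER_ERR_FMT, name);
        char *msg = len >= 0 ? malloc((size_t) len + 1) : NULL;

        if (msg) {
            snprintf(msg, (size_t) len + 1, PEER_ERR_FMT, name);
        }
        *errp = msg;
        return EINVAL;
    }
    return 0;
}
```

The caller can then show `*errp` to the user instead of the generic "could not set configuration (Invalid argument)".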


[ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on success.

2017-01-12 Thread Daniele Di Proietto
set_tunnel_config() always logs a warning, even on success. This
shouldn't happen.

Without this, some unit tests fail.

Fixes: 9fff138ec3a6("netdev: Add 'errp' to set_config().")
Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-vport.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index ad5ffcc81..2db51df72 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -561,10 +561,12 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
 err = 0;
 
 out:
-ds_chomp(&errors, '\n');
-VLOG_WARN("%s", ds_cstr(&errors));
-if (err) {
-*errp = ds_steal_cstr(&errors);
+if (*ds_cstr(&errors)) {
+ds_chomp(&errors, '\n');
+VLOG_WARN("%s", ds_cstr(&errors));
+if (err) {
+*errp = ds_steal_cstr(&errors);
+}
 }
 
 ds_destroy(&errors);
-- 
2.11.0



Re: [ovs-dev] broken tests

2017-01-12 Thread Daniele Di Proietto

On 12/01/2017 09:24, "Ben Pfaff"  wrote:

>Commit 9fff138ec3a6dbe75073d16cba7fbe86ac273c36 "netdev: Add 'errp' to
>set_config()." breaks the unit tests because netdev-vport now logs lots
>of blank lines.  I am unsure of the right fix--is it to just drop the
>new VLOG_WARN call?

Hi Ben,

Sorry about that, I posted a patch that should fix this:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327564.html

If you're fine with it, I'll merge it shortly.

>
>Thanks,
>
>Ben.


Re: [ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on success.

2017-01-12 Thread Daniele Di Proietto

On 12/01/2017 09:33, "Ben Pfaff"  wrote:

>On Thu, Jan 12, 2017 at 12:23:55AM -0800, Daniele Di Proietto wrote:
>> set_tunnel_config() always logs a warning, even on success. This
>> shouldn't happen.
>> 
>> Without this, some unit tests fail.
>> 
>> Fixes: 9fff138ec3a6("netdev: Add 'errp' to set_config().")
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>  lib/netdev-vport.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index ad5ffcc81..2db51df72 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -561,10 +561,12 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>  err = 0;
>>  
>>  out:
>> -ds_chomp(&errors, '\n');
>> -VLOG_WARN("%s", ds_cstr(&errors));
>> -if (err) {
>> -*errp = ds_steal_cstr(&errors);
>> +if (*ds_cstr(&errors)) {
>
>How about "if (errors.length)" instead?

Ok

>
>> +ds_chomp(&errors, '\n');
>> +VLOG_WARN("%s", ds_cstr(&errors));
>> +if (err) {
>> +*errp = ds_steal_cstr(&errors);
>> +}
>>  }
>>  
>>  ds_destroy(&errors);
>
>Acked-by: Ben Pfaff 

Thanks, pushed to master
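The fix boils down to logging only when something was actually accumulated. The sketch below uses a tiny fixed-size buffer as a hypothetical stand-in for OVS's `struct ds`; `errbuf_append()` and `flush_warnings()` are illustrative names, and the length test mirrors the `if (errors.length)` form suggested in the review.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct ds: text plus its current length. */
struct errbuf {
    char text[256];
    size_t length;
};

static void
errbuf_append(struct errbuf *b, const char *s)
{
    size_t n = strlen(s);

    assert(b->length + n < sizeof b->text);
    memcpy(b->text + b->length, s, n + 1);   /* copies the terminator too */
    b->length += n;
}

static int warnings_logged = 0;

/* Logs accumulated warnings only if there are any: an empty buffer means
 * the configuration succeeded, so nothing is printed. */
static void
flush_warnings(struct errbuf *b)
{
    if (b->length) {
        if (b->text[b->length - 1] == '\n') {   /* like ds_chomp(..., '\n') */
            b->text[--b->length] = '\0';
        }
        fprintf(stderr, "WARN: %s\n", b->text);
        warnings_logged++;
    }
}
```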


Re: [ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on success.

2017-01-12 Thread Daniele Di Proietto

On 12/01/2017 03:40, "Fischetti, Antonio"  wrote:

>Hi Daniele,
>I've checked that without this patch I was getting
>ERROR: 2316 tests were run,
>75 failed unexpectedly.
>2 tests were skipped.
>
>Instead after applying this patch I get
>ERROR: 2316 tests were run,
>42 failed unexpectedly.
>2 tests were skipped.

I'm curious why these fail for you; they pass on Travis and locally for me.
This may be worth investigating.

>
>
>In particular, after I apply this patch the following tunnel tests are not 
>failing anymore.
>
>===
>tunnel
>
>749: tunnel - input  ok
>750: tunnel - ECN decapsulation  ok
>751: tunnel - output ok
>752: tunnel - unencrypted tunnel and not setting skb_mark ok
>753: tunnel - unencrypted tunnel and setting skb_mark to 1 ok
>754: tunnel - unencrypted tunnel and setting skb_mark to 2 ok
>755: tunnel - ToS and TTL inheritance ok
>756: tunnel - set_tunnel ok
>757: tunnel - keyok
>758: tunnel - key match  ok
>759: tunnel - Geneve ok
>760: tunnel - VXLAN  ok
>761: tunnel - LISP   ok
>762: tunnel - different VXLAN UDP port   ok
>763: ofproto-dpif - set_field - tun_src/tun_dst/tun_id ok
>764: tunnel - Geneve metadataok
>765: tunnel - Geneve option present  ok
>766: tunnel - concomitant IPv6 and IPv4 tunnels  ok
>
>tunnel_push_pop
>
>767: tunnel_push_pop - action ok
>768: tunnel_push_pop - packet_out ok
>
>tunnel_push_pop_ipv6
>
>769: tunnel_push_pop_ipv6 - action   ok
>
>1093: ofproto-dpif - truncate and output to gre tunnel ok
>
>1097: ofproto-dpif - sFlow packet sampling - tunnel set ok
>1098: ofproto-dpif - sFlow packet sampling - tunnel push ok
>
>1107: ofproto-dpif - Flow IPFIX sanity check - tunnel set ok
>
>1147: ofproto-dpif megaflow - tunnels ok
>
>1154: ofproto-dpif - ofproto-dpif-monitor 1   ok
>1155: ofproto-dpif - ofproto-dpif-monitor 2   ok
>
>===
>
>One further comment, this patch fails with ./utilities/checkpatch.py.
>E: No signatures found.
>Warnings: 0, Errors: 1

Mmmh, I'm not sure why it's reporting this error: there is a Signed-off-by line.
When I ran it myself, it didn't report anything.

>
>Besides this, it looks ok to me.
>
>Acked-by: Antonio Fischetti 

Thanks for the review

>
>
>> -----Original Message-----
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>> boun...@openvswitch.org] On Behalf Of Daniele Di Proietto
>> Sent: Thursday, January 12, 2017 8:24 AM
>> To: d...@openvswitch.org
>> Cc: Daniele Di Proietto 
>> Subject: [ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on
>> success.
>> 
>> set_tunnel_config() always logs a warning, even on success. This
>> shouldn't happen.
>> 
>> Without this, some unit tests fail.
>> 
>> Fixes: 9fff138ec3a6("netdev: Add 'errp' to set_config().")
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>  lib/netdev-vport.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index ad5ffcc81..2db51df72 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -561,10 +561,12 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>      err = 0;
>> 
>>  out:
>> -    ds_chomp(&errors, '\n');
>> -    VLOG_WARN("%s", ds_cstr(&errors));
>> -    if (err) {
>> -        *errp = ds_steal_cstr(&errors);
>> +    if (*ds_cstr(&errors)) {
>> +        ds_chomp(&errors, '\n');
>> +        VLOG_WARN("%s", ds_cstr(&errors));
>> +        if (err) {
>> +            *errp = ds_steal_cstr(&errors);
>> +        }
>>      }
>> 
>>      ds_destroy(&errors);
>> --
>> 2.11.0
>> 


Re: [ovs-dev] broken tests

2017-01-12 Thread Daniele Di Proietto

On 12/01/2017 09:34, "Ben Pfaff"  wrote:

>On Thu, Jan 12, 2017 at 05:27:05PM +0000, Daniele Di Proietto wrote:
>> On 12/01/2017 09:24, "Ben Pfaff"  wrote:
>> 
>> >Commit 9fff138ec3a6dbe75073d16cba7fbe86ac273c36 "netdev: Add 'errp' to
>> >set_config()." breaks the unit tests because netdev-vport now logs lots
>> >of blank lines.  I am unsure of the right fix--is it to just drop the
>> >new VLOG_WARN call?
>> 
>> Hi Ben,
>> 
>> Sorry about that, I posted a patch that should fix this:
>
>Thanks, I've reviewed it now.

Sorry about the breakage, and thanks for the quick review. I hope it's fixed now.


Re: [ovs-dev] [PATCH v3 11/18] dpctl: Avoid making assumptions on pmd threads.

2017-01-15 Thread Daniele Di Proietto

On 13/01/2017 16:51, "Ben Pfaff"  wrote:

>On Sun, Jan 08, 2017 at 07:15:09PM -0800, Daniele Di Proietto wrote:
>> Currently dpctl depends on ovs-numa module to delete and create flows on
>> different pmd threads for pmd devices.
>> 
>> The next commits will move away the pmd threads state from ovs-numa to
>> dpif-netdev, so the ovs-numa interface will not be supported.
>> 
>> Also, the assignment between ports and thread is an implementation
>> detail of dpif-netdev, dpctl shouldn't know anything about it.
>> 
>> This commit changes the dpif_flow_put() and dpif_flow_del() calls to
>> iterate over all the pmd threads, if pmd_id is PMD_ID_NULL.
>> 
>> A simple test is added.
>> 
>> Signed-off-by: Daniele Di Proietto 
>
>dpif_netdev_flow_del() and dpif_netdev_flow_put() have a lot of very
>similar code, which makes one wonder whether any common code can be
>factored out.

You're right, they share a lot of code.  I haven't found an easy way to
reduce the duplication without introducing callbacks, so I left it as is
for now; I hope that's OK.

Perhaps I can move the common code into dpif_netdev_operate(); I'll post
something soon.
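Both `dpif_netdev_flow_put()` and `dpif_netdev_flow_del()` have to handle the same two cases: a specific `pmd_id`, or `PMD_ID_NULL` meaning every pmd thread. A minimal sketch of sharing that loop without callbacks, by passing an operation code to a single driver, could look like this. Everything here is hypothetical (`struct pmd_sketch`, `apply_flow_op()`, and `PMD_ID_NULL_SKETCH` standing in for `PMD_ID_NULL`), and continuing after a failure while reporting the first error is an assumption of the sketch, not a description of dpif-netdev.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "every pmd thread", like PMD_ID_NULL. */
#define PMD_ID_NULL_SKETCH (-1)

enum flow_op { FLOW_PUT, FLOW_DEL };

/* Hypothetical per-thread state: only a flow count, for illustration. */
struct pmd_sketch {
    int core_id;
    int n_flows;
};

/* Applies 'op' to a single thread.  Returns 0 or an errno value. */
static int
apply_one(struct pmd_sketch *pmd, enum flow_op op)
{
    assert(pmd != NULL);
    if (op == FLOW_PUT) {
        pmd->n_flows++;
        return 0;
    }
    if (pmd->n_flows == 0) {
        return ENOENT;          /* Nothing to delete on this thread. */
    }
    pmd->n_flows--;
    return 0;
}

/* Shared driver for put and delete.  A specific 'pmd_id' targets one
 * thread; PMD_ID_NULL_SKETCH targets all of them.  Keeps going after a
 * failure and returns the first error, or EINVAL if no thread matched. */
static int
apply_flow_op(struct pmd_sketch *pmds, size_t n_pmds, int pmd_id,
              enum flow_op op)
{
    int first_error = 0;
    int matched = 0;

    for (size_t i = 0; i < n_pmds; i++) {
        if (pmd_id != PMD_ID_NULL_SKETCH && pmds[i].core_id != pmd_id) {
            continue;
        }
        matched = 1;

        int error = apply_one(&pmds[i], op);
        if (error && !first_error) {
            first_error = error;
        }
    }
    return matched ? first_error : EINVAL;
}
```

The operation code keeps a single loop for both paths; the price is that per-operation differences end up in `apply_one()` instead of in two separate functions.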

Thanks,

Daniele


Re: [ovs-dev] [PATCH v3 08/18] dpif-netdev: Block pmd threads if there are no ports.

2017-01-15 Thread Daniele Di Proietto

On 13/01/2017 16:44, "Ben Pfaff"  wrote:

>On Sun, Jan 08, 2017 at 07:15:06PM -0800, Daniele Di Proietto wrote:
>> +    if (!poll_cnt) {
>> +        while (seq_read(pmd->reload_seq) == pmd->last_reload_seq) {
>> +            seq_wait(pmd->reload_seq, pmd->last_reload_seq);
>> +            poll_block();
>> +        }
>> +        lc = 1025;
>
>1025 is very magic here.  How about UINT_MAX?

Much better, thanks

>
>> +    }
>
>Thanks,
>
>Ben.


Re: [ovs-dev] [PATCH v3 00/18] DPDK/pmd reconfiguration refactor and bugfixes

2017-01-15 Thread Daniele Di Proietto

On 13/01/2017 16:57, "Ben Pfaff"  wrote:

>On Sun, Jan 08, 2017 at 07:14:58PM -0800, Daniele Di Proietto wrote:
>> The first two commits of the series are trivial bugfixes for dpif-netdev.
>> 
>> Then the series fixes a long standing bug that caused a crash when the
>> admin link state of a port is changed while traffic is flowing.
>> 
>> The next part makes use of reconfiguration for port add: this makes
>> the operation twice as fast and reduces some code duplication.  This part
>> conflicts with the port naming change, so I'm willing to postpone it, unless
>> we find it to be useful for the port naming change.
>> 
>> The rest of the series refactors a lot of code in dpif-netdev:
>
>I skimmed the whole series and made only one or two minor comments,
>which are not important.  Thanks for doing this!

Thanks for taking a look!


Re: [ovs-dev] [PATCH v3 13/18] ovs-numa: Add new dump types.

2017-01-15 Thread Daniele Di Proietto

On 13/01/2017 16:54, "Ben Pfaff"  wrote:

>On Sun, Jan 08, 2017 at 07:15:11PM -0800, Daniele Di Proietto wrote:
>> They will be used by a future commit.
>> 
>> This patch introduces some code duplication which will be removed in a
>> future commit.
>> 
>> Signed-off-by: Daniele Di Proietto 
>
>The hexit_value() function might help, in
>ovs_numa_dump_cores_with_cmask().

It does help; I changed the commit to use it.  Thanks
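For reference, the kind of parsing `ovs_numa_dump_cores_with_cmask()` needs can be sketched with a `hexit_value()`-style helper: each hex digit of the mask, counted from the right, covers four consecutive core ids. The sketch below uses local, hypothetical names (`hexit_value_sketch()`, `cmask_has_core()`) rather than the OVS functions, accepts an optional `0x` prefix, and treats an invalid digit as 0, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for a hexit_value()-style helper: returns the value of
 * the hex digit 'c', or -1 if 'c' is not a hex digit. */
static int
hexit_value_sketch(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/* Returns true if 'core_id' is set in the hex core mask 'cmask'.  An
 * optional "0x" prefix is skipped.  Digit 0 (the rightmost) covers cores
 * 0-3, digit 1 covers cores 4-7, and so on. */
static bool
cmask_has_core(const char *cmask, unsigned int core_id)
{
    size_t digit = core_id / 4;     /* Digit index, counted from the right. */
    size_t len;
    int bits;

    assert(cmask != NULL);
    if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
        cmask += 2;
    }
    len = strlen(cmask);
    if (digit >= len) {
        return false;               /* Mask too short to cover 'core_id'. */
    }

    bits = hexit_value_sketch((unsigned char) cmask[len - 1 - digit]);
    if (bits < 0) {
        bits = 0;                   /* Assumption: invalid digit means 0. */
    }
    return (bits >> (core_id % 4)) & 1;
}
```

For example, with the mask `0x6` (binary 0110) only cores 1 and 2 are selected.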


Re: [ovs-dev] [PATCH v3 00/18] DPDK/pmd reconfiguration refactor and bugfixes

2017-01-15 Thread Daniele Di Proietto

On 12/01/2017 04:30, "Ilya Maximets"  wrote:

>Hi, Daniele.
>Thanks for v3.
>
>Acked-by: Ilya Maximets 

Thanks a lot for the review,

pushed to master

>
>On 09.01.2017 06:14, Daniele Di Proietto wrote:
>> The first two commits of the series are trivial bugfixes for dpif-netdev.
>> 
>> Then the series fixes a long standing bug that caused a crash when the
>> admin link state of a port is changed while traffic is flowing.
>> 
>> The next part makes use of reconfiguration for port add: this makes
>> the operation twice as fast and reduces some code duplication.  This part
>> conflicts with the port naming change, so I'm willing to postpone it, unless
>> we find it to be useful for the port naming change.
>> 
>> The rest of the series refactors a lot of code in dpif-netdev:
>> 
>> * We no longer start pmd threads on demand for each numa node.  This made
>>   the code very complicated and introduced a lot of bugs.
>> * The pmd threads state is now internal to dpif-netdev and it's not stored in
>>   ovs-numa.
>> * There's now a single function that handles pmd threads/ports changes: this
>>   reduces code duplication and makes port reconfiguration faster, as we don't
>>   have to bring down the whole datapath.
>> 
>> v3->v2:
>> 
>> * Rebased:
>>   * Rebased against dpdk arbitrary name change.
>>   * Dropped unsigned 'core_id' commit because a similar fix is already
>> on master
>> * Put space between *FOR_EACH* and (
>> * Actually use new FOR_EACH_NUMA_ON_DUMP
>> * Use hmap_contains() instead of dp_netdev_lookup_port() in a couple of
>>   places
>> * Restore spaces in log messages, lost while wrapping the string.
>> 
>> v1->v2:
>> 
>> * Postpone cls deletion in dp_netdev_destroy_pmd()
>> * Allow ports to be in tnl_port_cache and send_port_cache at the same time
>> * Set counter to 1025 when reloading pmd without queues to be polled
>> * Rebased:
>>   * Allow 0x in pmd-cpu-mask
>>   * ...
>> * Don't duplicate get_core_by_core_id() in get_cpu_core()
>> * New commit for ovs-numa: don't use hmap_first_with_hash()
>> * Keep per numa count of cores in ovs_numa_dump
>> * Print queue id and port name in warning if there's no pmd thread
>> * Extract pmd_remove_stale_ports() from reconfigure_datapath()
>> * s/reload_all_pmds()/reload_affected_pmds()/
>> * Declare variables at the beginning of the block in rxq_scheduling()
>> * Use 'q' instead of 'port->rxqs[qid]' in a couple of places
>> * Unref pmd in rxq_scheduling()
>> * Simplify check for changed pmd threads
>> * Properly reset queues to unassigned in reconfigure_datapath()
>> * Optimize tx port insertion in pmd cache
>> 
>> 
>> Daniele Di Proietto (18):
>>   dpif-netdev: Fix memory leak.
>>   dpif-netdev: Take non_pmd_mutex to access tx cached ports.
>>   dpif-netdev: Don't try to output on a device without txqs.
>>   netdev-dpdk: Don't call rte_dev_stop() in update_flags().
>>   netdev-dpdk: Start also dpdkr devices only once on port-add.
>>   netdev-dpdk: Refactor construct and destruct.
>>   dpif-netdev: Use a boolean instead of pmd->port_seq.
>>   dpif-netdev: Block pmd threads if there are no ports.
>>   dpif-netdev: Create pmd threads for every numa node.
>>   dpif-netdev: Make 'static_tx_qid' const.
>>   dpctl: Avoid making assumptions on pmd threads.
>>   ovs-numa: New ovs_numa_dump_contains_core() function.
>>   ovs-numa: Add new dump types.
>>   ovs-numa: Don't use hmap_first_with_hash().
>>   ovs-numa: Add per numa and global counts in dump.
>>   dpif-netdev: Use hmap for poll_list in pmd threads.
>>   dpif-netdev: Centralized threads and queues handling code.
>>   ovs-numa: Remove unused functions.
>> 
>>  lib/dpctl.c   |  107 +---
>>  lib/dpif-netdev.c | 1427 -
>>  lib/dpif.c|6 +-
>>  lib/dpif.h|   12 +-
>>  lib/netdev-dpdk.c |  170 +++
>>  lib/netdev.c  |   41 +-
>>  lib/netdev.h  |1 +
>>  lib/ovs-numa.c|  284 +--
>>  lib/ovs-numa.h|   35 +-
>>  tests/pmd.at  |   49 +-
>>  vswitchd/bridge.c |2 +
>>  11 files changed, 1079 insertions(+), 1055 deletions(-)
>> 


Re: [ovs-dev] [PATCH] dpif-netdev: Avoids repeated addition of DP_STAT_LOST.

2017-01-16 Thread Daniele Di Proietto

On 16/01/2017 09:31, "Ben Pfaff"  wrote:

>On Mon, Jan 16, 2017 at 04:56:39AM -0800, nickcooper-zhangtonghao wrote:
>> Signed-off-by: nickcooper-zhangtonghao 
>> ---
>>  lib/dpif-netdev.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 08167b5..3901129 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4258,7 +4258,6 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>>          ofpbuf_uninit(&actions);
>>          ofpbuf_uninit(&put_actions);
>>          fat_rwlock_unlock(&dp->upcall_rwlock);
>> -        dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
>>      } else if (OVS_UNLIKELY(any_miss)) {
>>          for (i = 0; i < cnt; i++) {
>>              if (OVS_UNLIKELY(!rules[i])) {
>
>Acked-by: Ben Pfaff 
>
>I believe that this also should be tagged:
>
>CC: Daniele Di Proietto 
>Fixes: 8aaa125dab66 ("dpif-netdev: Share emc and fast path output batches.")
>
>Since this dates to May 2015 and DPDK isn't really my area, I'll leave
>this to Daniele for final application.

LGTM as well. I added the tags and applied this to master, branch-2.6 and
branch-2.5.

Thanks!


Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-17 Thread Daniele Di Proietto
2017-01-17 11:43 GMT-08:00 Kevin Traynor :
> On 01/17/2017 05:43 PM, Ciara Loftus wrote:
>> Instead of counting all polling cycles as processing cycles, only count
>> the cycles where packets were received from the polling.
>
> This makes these stats much clearer. One minor comment below, other than
> that
>
> Acked-by: Kevin Traynor 
>
>>
>> Signed-off-by: Georg Schmuecking 
>> Signed-off-by: Ciara Loftus 
>> Co-authored-by: Ciara Loftus 

Minor: the Co-authored-by tag should name someone other than the main author.

This makes it easier to understand how busy a pmd thread is, which is a valid
question for a sysadmin.

The counters were originally introduced to help developers understand how
cycles are split between the drivers (netdev rx) and datapath processing
(dpif).  Do you think it's OK to lose this type of information?  Perhaps it
is, since a developer can also use a profiler, but I'm not sure.

Maybe we could keep 'last_cycles' as it is and introduce a separate counter
to get the idle/busy ratio.  I'm not 100% sure this is the best way.
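A minimal sketch of that suggestion, assuming the cycles of each poll iteration are already split into a receive part and a processing part: the per-phase counters stay as they are for developers, and a separate busy/idle pair answers the sysadmin's question. All names (`struct pmd_stats_sketch`, `account_iteration()`, `busy_ratio()`) are hypothetical and not part of dpif-netdev.

```c
#include <assert.h>
#include <stdint.h>

/* Per-phase split, as the original counters intended. */
enum phase_sketch { PHASE_RX, PHASE_PROCESSING, N_PHASES };

/* Hypothetical per-thread statistics: the per-phase counters are kept,
 * and a separate busy/idle pair tracks whether an iteration did work. */
struct pmd_stats_sketch {
    uint64_t phase_cycles[N_PHASES];
    uint64_t busy_cycles;   /* Iterations that received packets. */
    uint64_t idle_cycles;   /* Iterations that received nothing. */
};

/* Accounts one poll iteration.  'rx_cycles' and 'proc_cycles' are the
 * measured durations of the receive and processing phases; 'n_pkts' is
 * the number of packets the receive call returned. */
static void
account_iteration(struct pmd_stats_sketch *s, uint64_t rx_cycles,
                  uint64_t proc_cycles, int n_pkts)
{
    assert(n_pkts >= 0);
    s->phase_cycles[PHASE_RX] += rx_cycles;
    s->phase_cycles[PHASE_PROCESSING] += proc_cycles;

    if (n_pkts > 0) {
        s->busy_cycles += rx_cycles + proc_cycles;
    } else {
        s->idle_cycles += rx_cycles + proc_cycles;
    }
}

/* Fraction of all accounted cycles spent in iterations that did work. */
static double
busy_ratio(const struct pmd_stats_sketch *s)
{
    uint64_t total = s->busy_cycles + s->idle_cycles;

    return total ? (double) s->busy_cycles / total : 0.0;
}
```

With this split, the stats output printed by `pmd_info_show_stats()` could keep the rx/processing breakdown and add an idle/busy percentage; whether that is worth the extra counters is the open question above.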

What do you guys think?

Thanks,

Daniele

>> ---
>> v2:
>> - Rebase
>> ---
>>  lib/dpif-netdev.c | 57 ++-
>>  1 file changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 3901129..3854c79 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -272,7 +272,10 @@ enum dp_stat_type {
>>
>>  enum pmd_cycles_counter_type {
>>      PMD_CYCLES_POLLING,     /* Cycles spent polling NICs. */
>
> this is not used anymore and can be removed
>
>> -    PMD_CYCLES_PROCESSING,  /* Cycles spent processing packets */
>> +    PMD_CYCLES_IDLE,        /* Cycles spent idle or unsuccessful polling */
>> +    PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
>> +                             * processing polled packets */
>> +
>>      PMD_N_CYCLES
>>  };
>>
>> @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
>>  }
>>
>>      ds_put_format(reply,
>> -                  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
>> +                  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>>                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
>> -                  cycles[PMD_CYCLES_POLLING],
>> -                  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
>> +                  cycles[PMD_CYCLES_IDLE],
>> +                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
>>                    cycles[PMD_CYCLES_PROCESSING],
>>                    cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
>>
>> @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>      non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>>  }
>>
>> -static void
>> +/* Calculate the intermediate cycle result and add to the counter 'type' */
>> +static inline void
>> +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>> +                          enum pmd_cycles_counter_type type)

I'd add an OVS_REQUIRES(&cycles_counter_fake_mutex)

>> +    OVS_NO_THREAD_SAFETY_ANALYSIS
>> +{
>> +    unsigned long long new_cycles = cycles_counter();
>> +    unsigned long long interval = new_cycles - pmd->last_cycles;
>> +    pmd->last_cycles = new_cycles;
>> +
>> +    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> +}
>> +
>> +static int
>>  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>>                             struct netdev_rxq *rx,
>>                             odp_port_t port_no)
>>  {
>>      struct dp_packet_batch batch;
>>      int error;
>> +    int batch_cnt = 0;
>>
>>      dp_packet_batch_init(&batch);
>> -    cycles_count_start(pmd);
>>      error = netdev_rxq_recv(rx, &batch);
>> -    cycles_count_end(pmd, PMD_CYCLES_POLLING);
>>      if (!error) {
>>          *recirc_depth_get() = 0;
>>
>> -        cycles_count_start(pmd);
>> +        batch_cnt = batch.count;
>>          dp_netdev_input(pmd, &batch, port_no);
>> -        cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
>>      } else if (error != EAGAIN && error != EOPNOTSUPP) {
>>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>
>>          VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>>                      netdev_rxq_get_name(rx), ovs_strerror(error));
>>      }
>> +
>> +    return batch_cnt;
>>  }
>>
>>  static struct tx_port *
>> @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
>>      struct dp_netdev *dp = get_dp_netdev(dpif);
>>      struct dp_netdev_pmd_thread *non_pmd;
>>      uint64_t new_tnl_seq;
>> +    int process_packets = 0;
>>
>>      ovs_mutex_lock(&dp->port_mutex);
>>      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>>      if (non_pmd) {
>>          ovs_mutex_lock(&dp->non_pmd_mutex);
>> +        cycles_count_start(non_pmd);
>>          HMAP_FOR_EACH (port, node, &dp->ports) {
>>              if (!netdev_is_pmd(port->netdev)) {
>>                  int i;
>>
>>  

Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-17 Thread Daniele Di Proietto
2017-01-12 8:49 GMT-08:00 Ciara Loftus :
> Unconditional insertion of EMC entries results in EMC thrashing at high
> numbers of parallel flows. When this occurs, the performance of the EMC
> often falls below that of the dpcls classifier, rendering the EMC
> practically useless.
>
> Instead of unconditionally inserting entries into the EMC when a miss
> occurs, use a 1% probability of insertion. This ensures that the most
> frequent flows have the highest chance of creating an entry in the EMC,
> and the probability of thrashing the EMC is also greatly reduced.
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Georg Schmuecking 
> Co-authored-by: Georg Schmuecking 

Thanks for the patch

> ---
>  lib/dpif-netdev.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 546a1e9..8d55ba2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -144,6 +144,9 @@ struct netdev_flow_key {
>  #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
>  #define EM_FLOW_HASH_SEGS 2
>
> +#define EM_FLOW_INSERT_PROB 100
> +#define EM_FLOW_INSERT_MIN (UINT32_MAX / EM_FLOW_INSERT_PROB)
> +
>  struct emc_entry {
>      struct dp_netdev_flow *flow;
>      struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> @@ -1994,6 +1997,19 @@ emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
>      emc_change_entry(to_be_replaced, flow, key);
>  }
>
> +static inline void
> +emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
> +                         struct emc_cache *cache,
> +                         const struct netdev_flow_key *key,
> +                         struct dp_netdev_flow *flow)
> +{
> +    /* One in every EM_FLOW_INSERT_PROB packets are inserted to reduce
> +     * thrashing */
> +    if ((key->hash ^ (uint32_t)pmd->last_cycles) <= EM_FLOW_INSERT_MIN) {

pmd->last_cycles is always 0 when OVS is compiled without DPDK, so the
insertion decision would depend only on the hash.  While we currently don't
require high throughput from this code unless DPDK is enabled, I think that
depending only on the hash might decrease the coverage of the exact match
cache in the unit tests.

Have you thought about just using a counter?

> +        emc_insert(cache, key, flow);
> +    }
> +}
> +
>  static inline struct dp_netdev_flow *
>  emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
>  {
> @@ -4092,7 +4108,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
>          }
>          ovs_mutex_unlock(&pmd->flow_mutex);
>
> -        emc_insert(&pmd->flow_cache, key, netdev_flow);
> +        emc_probabilistic_insert(pmd, &pmd->flow_cache, key, netdev_flow);
>      }
>  }
>
> @@ -4187,7 +4203,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
>          flow = dp_netdev_flow_cast(rules[i]);
>
> -        emc_insert(flow_cache, &keys[i], flow);
> +        emc_probabilistic_insert(pmd, flow_cache, &keys[i], flow);
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
>
> --
> 2.4.3
>


Re: [ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration

2017-01-18 Thread Daniele Di Proietto
2017-01-18 11:55 GMT-08:00 Binbin Xu :
> After the hotplug of DPDK ports, a valid dpdk-devargs must be
> specified. Otherwise, the DPDK device can't be available.
>
> Signed-off-by: Binbin Xu 

Thanks! Applied to master and branch-2.7

> ---
>  Documentation/faq/configuration.rst | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
> index c03d069..8bd0e11 100644
> --- a/Documentation/faq/configuration.rst
> +++ b/Documentation/faq/configuration.rst
> @@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
>  startup when other_config:dpdk-init is set to 'true'.
>
>  Secondly, when adding a DPDK port, unlike a system port, the type for the
> -interface must be specified. For example::
> +interface and valid dpdk-devargs must be specified. For example::
>
>  $ ovs-vsctl add-br br0
> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> -
> -Finally, it is required that DPDK port names begin with ``dpdk``.
> +$ ovs-vsctl add-port br0 myportname -- set Interface myportname \
> +type=dpdk options:dpdk-devargs=0000:06:00.0
>
>  Refer to :doc:`/intro/install/dpdk` for more information on enabling and
>  using DPDK with Open vSwitch.
> --
> 1.8.3.1
>
>


[ovs-dev] [PATCH] Documentation: Update DPDK doc after port naming change.

2017-01-18 Thread Daniele Di Proietto
options:dpdk-devargs is always required now.  This commit also changes
some of the names from 'dpdk0' to various others.

netdev-dpdk/detach accepts a PCI id instead of a port name.

CC: Ciara Loftus 
Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Daniele Di Proietto 
---
 Documentation/howto/dpdk.rst| 77 -
 Documentation/howto/userspace-tunneling.rst |  2 +-
 2 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index fbb4b5361..d1e6e899f 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -44,8 +44,10 @@ ovs-vsctl can also be used to add DPDK devices. OVS expects DPDK device names
 to start with ``dpdk`` and end with a portid. ovs-vswitchd should print the
 number of dpdk devices found in the log file::
 
-$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-$ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=0000:01:00.0
+$ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
+options:dpdk-devargs=0000:01:00.1
 
 After the DPDK ports get added to switch, a polling thread continuously polls
 DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
@@ -55,12 +57,12 @@ DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
 $ ps -eLo pid,psr,comm | grep pmd
 
 Creating bonds of DPDK interfaces is slightly different to creating bonds of
-system interfaces. For DPDK, the interface type must be explicitly set. For
-example::
+system interfaces. For DPDK, the interface type and devargs must be explicitly
+set. For example::
 
-$ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
--- set Interface dpdk0 type=dpdk \
--- set Interface dpdk1 type=dpdk
+$ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
+-- set Interface p0 type=dpdk options:dpdk-devargs=0000:01:00.0 \
+-- set Interface p1 type=dpdk options:dpdk-devargs=0000:01:00.1
 
 To stop ovs-vswitchd & delete bridge, run::
 
@@ -98,7 +100,7 @@ where:
 
 For example::
 
-$ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
+$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
 other_config:pmd-rxq-affinity="0:3,1:7,3:8"
 
 This will ensure:
@@ -165,27 +167,27 @@ Flow Control
 Flow control can be enabled only on DPDK physical ports. To enable flow control
 support at tx side while adding a port, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=0000:01:00.0 options:tx-flow-ctrl=true
 
 Similarly, to enable rx flow control, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=0000:01:00.0 options:rx-flow-ctrl=true
 
 To enable flow control auto-negotiation, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=0000:01:00.0 options:flow-ctrl-autoneg=true
 
 To turn ON the tx flow control at run time for an existing port, run::
 
-$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true
+$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true
 
 The flow control parameters can be turned off by setting ``false`` to the
 respective parameter. To disable the flow control at tx side, run::
 
-$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false
+$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=false
 
 pdump
 -
@@ -234,13 +236,12 @@ enable Jumbo Frames support for a DPDK port, change the Interface's
 ``mtu_request`` attribute to a sufficiently large value. For example, to add a
 DPDK Phy port with MTU of 9000::
 
-$ ovs-vsctl add-port br0 dpdk0 \
-  -- set Interface dpdk0 type=dpdk \
-  -- set Interface dpdk0 mtu_request=9000`
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+  options:dpdk-devargs=0000:01:00.0 mtu_request=9000
 
 Similarly, to change the MTU of an existing port to 6200::
 
-$ ovs-vsctl set Interface dpdk0 mtu_request=6200
+$ ovs-vsctl set Interface dpdk-p0 mtu_request=6200
 
 Some additional configuration is needed to take advantage of jumbo frames with
 vHost ports:
@@ -280,14 +281,14 @@ By default, DPDK physical ports are enabled with Rx checksum offload. Rx
 checksum offload can be configured on a DPDK physical port either when adding
 or at run time.
 
-To disable Rx checksum offlo

Re: [ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration

2017-01-18 Thread Daniele Di Proietto
2017-01-18 15:18 GMT-08:00 Daniele Di Proietto :
> 2017-01-18 11:55 GMT-08:00 Binbin Xu :
>> After the hotplug of DPDK ports, a valid dpdk-devargs must be
>> specified. Otherwise, the DPDK device can't be available.
>>
>> Signed-off-by: Binbin Xu 
>
> Thanks! Applied to master and branch-2.7

I realized that we forgot to update the documentation in other places,
so I sent a patch here:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327782.html

>
>> ---
>>  Documentation/faq/configuration.rst | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
>> index c03d069..8bd0e11 100644
>> --- a/Documentation/faq/configuration.rst
>> +++ b/Documentation/faq/configuration.rst
>> @@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
>>  startup when other_config:dpdk-init is set to 'true'.
>>
>>  Secondly, when adding a DPDK port, unlike a system port, the type for the
>> -interface must be specified. For example::
>> +interface and valid dpdk-devargs must be specified. For example::
>>
>>  $ ovs-vsctl add-br br0
>> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> -
>> -Finally, it is required that DPDK port names begin with ``dpdk``.
>> +$ ovs-vsctl add-port br0 myportname -- set Interface myportname \
>> +type=dpdk options:dpdk-devargs=0000:06:00.0
>>
>>  Refer to :doc:`/intro/install/dpdk` for more information on enabling and
>>  using DPDK with Open vSwitch.
>> --
>> 1.8.3.1
>>
>>


Re: [ovs-dev] [PATCH] Documentation: Update DPDK doc after port naming change.

2017-01-19 Thread Daniele Di Proietto

On 19/01/2017 03:12, "Loftus, Ciara"  wrote:

>> 
>> options:dpdk-devargs is always required now.  This commit also changes
>> some of the names from 'dpdk0' to various others.
>> 
>> netdev-dpdk/detach accepts a PCI id instead of a port name.
>> 
>> CC: Ciara Loftus 
>> Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
>> Signed-off-by: Daniele Di Proietto 
>
>Patch looks good. Thanks for the fixes!
>
>Acked-by: Ciara Loftus 

Thanks! Pushed to master

>
>> ---
>>  Documentation/howto/dpdk.rst| 77 -
>>  Documentation/howto/userspace-tunneling.rst |  2 +-
>>  2 files changed, 43 insertions(+), 36 deletions(-)
>> 
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index fbb4b5361..d1e6e899f 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -44,8 +44,10 @@ ovs-vsctl can also be used to add DPDK devices. OVS expects DPDK device names
>>  to start with ``dpdk`` and end with a portid. ovs-vswitchd should print the
>>  number of dpdk devices found in the log file::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> -$ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=0000:01:00.0
>> +$ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
>> +options:dpdk-devargs=0000:01:00.1
>> 
>>  After the DPDK ports get added to switch, a polling thread continuously polls
>>  DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
>> @@ -55,12 +57,12 @@ DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
>>  $ ps -eLo pid,psr,comm | grep pmd
>> 
>>  Creating bonds of DPDK interfaces is slightly different to creating bonds of
>> -system interfaces. For DPDK, the interface type must be explicitly set. For
>> -example::
>> +system interfaces. For DPDK, the interface type and devargs must be explicitly
>> +set. For example::
>> 
>> -$ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
>> --- set Interface dpdk0 type=dpdk \
>> --- set Interface dpdk1 type=dpdk
>> +$ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
>> +-- set Interface p0 type=dpdk options:dpdk-devargs=0000:01:00.0 \
>> +-- set Interface p1 type=dpdk options:dpdk-devargs=0000:01:00.1
>> 
>>  To stop ovs-vswitchd & delete bridge, run::
>> 
>> @@ -98,7 +100,7 @@ where:
>> 
>>  For example::
>> 
>> -$ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>> +$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
>>  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>> 
>>  This will ensure:
>> @@ -165,27 +167,27 @@ Flow Control
>>  Flow control can be enabled only on DPDK physical ports. To enable flow control
>>  support at tx side while adding a port, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:tx-flow-ctrl=true
>> 
>>  Similarly, to enable rx flow control, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:rx-flow-ctrl=true
>> 
>>  To enable flow control auto-negotiation, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:flow-ctrl-autoneg=true
>> 
>>  To turn ON the tx flow control at run time for an existing port, run::
>> 
>> -$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true
>> +$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true
>> 
>>  The flow control parameters can be turned off by setting ``false`` to the
>>  respective parameter. To disable the flow control at tx side, run::
>> 
>> -$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false
>

Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-20 Thread Daniele Di Proietto
2017-01-20 5:59 GMT-08:00 Jan Scheurich :
>
>
> On 2017-01-18 17:32, Kevin Traynor wrote:
>>
>> On 01/18/2017 01:34 AM, Daniele Di Proietto wrote:
>>>
>>> 2017-01-17 11:43 GMT-08:00 Kevin Traynor :
>>>>
>>>> On 01/17/2017 05:43 PM, Ciara Loftus wrote:
>>>>>
>>>>> Instead of counting all polling cycles as processing cycles, only count
>>>>> the cycles where packets were received from the polling.
>>>>
>>>> This makes these stats much clearer. One minor comment below, other than that
>>>>
>>>> Acked-by: Kevin Traynor 
>>>>
>>>>> Signed-off-by: Georg Schmuecking 
>>>>> Signed-off-by: Ciara Loftus 
>>>>> Co-authored-by: Ciara Loftus 
>>>
>>> Minor: the co-authored-by tag should be different from the main author.
>>>
>>> This makes it easier to understand how busy a pmd thread is, a valid
>>> question that a sysadmin might have.
>>>
>>> The counters were originally introduced to help developers understand how
>>> cycles are spent between drivers (netdev rx) and datapath processing (dpif).
>>> Do you think it's ok to lose this type of information?  Perhaps it is, since
>>> a developer can also use a profiler, I'm not sure.
>>>
>>> Maybe we could keep 'last_cycles' as it is and introduce a separate counter
>>> to get the idle/busy ratio.  I'm not 100% sure this is the best way.
>>>
>>> What do you guys think?
>>>
>> I've only ever used the current stats for trying to estimate if polling
>> was getting packets or not, so the addition of an idle stat helps that.
>> I like your suggestion of having all three stats, so then it would be
>> something like:
>>
>> polling unsuccessful (idle)
>> polling successful (got pkts)
>> processing pkts
>>
>> That would keep the info for a developer and it could help initial debug
>> if pkt rates drop on a pmd.
>>
>> Kevin.
>
>
> From an operational perspective, the most important data is clearly the
> fraction of busy cycles. Any additional breakdown of busy cycles is
> debatable. We have always been wondering why Rx cost was accounted for
> separately in the current code, while Tx cost was included in the
> processing. That didn't make much sense to us.
>
> A developer should be able to split the busy cycles between Rx polling,
> processing (parsing, EMC lookup, dpcls lookup, upcall(!), actions) and Tx to
> port by analysing "perf top" output, as we have done in the analysis for our
> performance patches, or using a fancier profiler.

Thanks Kevin and Jan, based on the above discussion I think we can remove
the distinction between successful polling and processing, meaning that the
patch is good.

Since we want to expose this to the user rather than the developer, I think
that the documentation in vswitchd/ovs-vswitchd.8.in should explain the
meaning of idle cycles and processing cycles.
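A minimal sketch of the accounting agreed on above, using hypothetical names (`struct pmd_cycle_stats`, `pmd_account_poll()`, `pmd_busy_pct()`) rather than the actual dpif-netdev fields: each poll iteration is charged entirely to 'processing' if it returned at least one packet and to 'idle' otherwise, so Tx and everything else done for those packets lands in 'processing'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-PMD counters; not the real dpif-netdev fields. */
struct pmd_cycle_stats {
    uint64_t idle;        /* Cycles of polls that returned no packets. */
    uint64_t processing;  /* Cycles of polls that returned >= 1 packet. */
};

/* Charges one poll iteration, from 'start' to 'end' (e.g. TSC readings),
 * to exactly one bucket depending only on whether packets were received. */
static void
pmd_account_poll(struct pmd_cycle_stats *s, uint64_t start, uint64_t end,
                 int n_rx)
{
    assert(end >= start);
    uint64_t spent = end - start;

    if (n_rx > 0) {
        s->processing += spent;
    } else {
        s->idle += spent;
    }
}

/* Busy fraction in percent, the figure an operator is most likely to want. */
static double
pmd_busy_pct(const struct pmd_cycle_stats *s)
{
    uint64_t total = s->idle + s->processing;

    return total ? 100.0 * (double) s->processing / (double) total : 0.0;
}
```

For example, an empty poll of 100 cycles followed by a 300-cycle poll that received 32 packets gives a busy fraction of 75%.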

>
> One additional metric that would be interesting to see in pmd_stats_show,
> however, is the average number of packets per batch polled from a port (or
> recirculated).

Good point.  Maybe we can address this in a separate patch?
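A possible shape for the packets-per-batch metric, as a sketch only: the counters and function names are hypothetical, and excluding empty polls from the average is an assumption (those polls are already counted as idle cycles).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters for an "average packets per batch" statistic. */
struct pmd_batch_stats {
    uint64_t n_batches;  /* Non-empty batches received or recirculated. */
    uint64_t n_packets;  /* Packets contained in those batches. */
};

static void
pmd_count_batch(struct pmd_batch_stats *s, int batch_size)
{
    assert(batch_size >= 0);
    if (batch_size > 0) {        /* Empty polls are left out of the average. */
        s->n_batches++;
        s->n_packets += (uint64_t) batch_size;
    }
}

static double
pmd_avg_batch(const struct pmd_batch_stats *s)
{
    return s->n_batches ? (double) s->n_packets / (double) s->n_batches : 0.0;
}
```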

>
> Regards, Jan
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-20 Thread Daniele Di Proietto
2017-01-20 5:38 GMT-08:00 Jan Scheurich :
>
>>> +static inline void
>>> +emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
>>> + struct emc_cache *cache,
>>> + const struct netdev_flow_key *key,
>>> + struct dp_netdev_flow *flow)
>>> +{
>>> +/* One in every EM_FLOW_INSERT_PROB packets are inserted to reduce
>>> + * thrashing */
>>> +if ((key->hash ^ (uint32_t)pmd->last_cycles) <= EM_FLOW_INSERT_MIN) {
>>
>> pmd->last_cycles is always 0 when OVS is compiled without DPDK.  While we
>> currently don't require high throughput from this code unless DPDK is
>> enabled, I think that depending only on the hash might decrease the
>> coverage of the exact match cache in the unit tests.
>>
>> Have you thought about just using a counter?
>
>
> I think that probabilistic insertion will in any case impact EMC coverage in
> unit test cases. I doubt that there are test cases that trigger the same DP
> flow often enough to have it inserted into EMC with any probability.
>
> If unit test coverage is the issue, we should set the EMC insertion
> probability to 100% when executing unit tests, e.g. by defining
> emc_probabilistic_insert() to unconditionally call emc_insert() when not
> compiling for DPDK.
>

It's not a big deal, since the most important use case we have for
dpif-netdev is with dpdk, but I'd still like the code to behave
similarly on different platforms.  How about defining a function that
uses random_uint32 when compiling without DPDK?
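To illustrate the coverage concern: if `last_cycles` never changes, the check reduces to a function of the flow hash alone, so each flow is either inserted on every miss or never. The sketch assumes `EM_FLOW_INSERT_MIN` is `UINT32_MAX / EM_FLOW_INSERT_PROB` with a probability of 1/100; the posted hunk does not show the actual definition, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 1-in-100 insertion, threshold UINT32_MAX / 100. */
#define INSERT_PROB 100u
#define INSERT_MIN  (UINT32_MAX / INSERT_PROB)

/* Same test as in the patch; 'cycles' stands in for
 * (uint32_t) pmd->last_cycles. */
static bool
should_insert(uint32_t hash, uint32_t cycles)
{
    return (hash ^ cycles) <= INSERT_MIN;
}

/* Counts how many of 'n' EMC misses of one flow would be inserted when
 * 'cycles' is stuck at 0, as in a build without DPDK. */
static unsigned
inserts_with_frozen_cycles(uint32_t hash, unsigned n)
{
    unsigned hits = 0;

    for (unsigned i = 0; i < n; i++) {
        hits += should_insert(hash, 0);
    }
    return hits;
}
```

Mixing in a value that changes between packets, such as a pseudo-random number, is what allows the same hash to be inserted on some misses and skipped on others.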

For testing it's not that simple, because unit tests can be run with
or without DPDK.  It would need to be configurable at runtime.
Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
help people that want to experiment with different values, even
though, based on the comments, I guess they wouldn't really see much
difference.

Again, what do you think about simply counting the packets and
inserting only one in every EM_FLOW_INSERT_PROB?
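For comparison, a sketch of the counter-based alternative: insert one EMC miss out of every N. The struct and function names are hypothetical and not part of the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state for deterministic 1-in-N insertion. */
struct emc_insert_counter {
    uint32_t every_n;  /* Insert one EMC miss out of every 'every_n'. */
    uint32_t count;    /* Misses seen since the last insertion. */
};

/* Called on each EMC miss; returns true when this miss should be inserted. */
static bool
emc_counter_should_insert(struct emc_insert_counter *c)
{
    assert(c->every_n > 0);
    if (++c->count >= c->every_n) {
        c->count = 0;
        return true;
    }
    return false;
}
```

Being fully deterministic, this is cheap and behaves the same with or without DPDK, but the insertion pattern is tied to the order in which misses arrive.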

Thanks,

Daniele

> Regards, Jan


Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-23 Thread Daniele Di Proietto
2017-01-22 11:45 GMT-08:00 Jan Scheurich :
>
>> It's not a big deal, since the most important use case we have for
>> dpif-netdev is with dpdk, but I'd still like the code to behave
>> similarly on different platforms.  How about defining a function that
>> uses random_uint32 when compiling without DPDK?
>>
>> For testing it's not that simple, because unit tests can be run with
>> or without DPDK.  It would need to be configurable at runtime.
>> Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
>> help people that want to experiment with different values, even
>> though, based on the comments, I guess they wouldn't really see much
>> difference.
>>
>> Again, what do you think about simply counting the packets and
>> inserting only one in every EM_FLOW_INSERT_PROB?
>>
>> Thanks,
>>
>> Daniele
>
>
> As far as I know Ciara did some quick tests with a counter-based
> implementation and it performed 5% worse for 1K and 4K flows than the
> current patch. Perhaps we could find the reason for that and fix it, but I
> also feel uncomfortable with deterministic insertion of every Nth flow. This
> could lead to very strange lock-step phenomena with typical artificial test
> workloads, which often generate flows round-robin. I would rather use a
> random function, as you suggest, or count "cycles" differently when
> compiling without DPDK.

Ok, using another pseudo random function when compiling without DPDK sounds
good to me.
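A deliberately simplified model of the lock-step effect described above. It assumes every packet is an EMC miss (roughly the thrashing regime the patch targets, where entries are evicted before they are hit again) and a generator that sends flows strictly round-robin; all names are hypothetical. With a deterministic 1-in-N policy, the number of distinct flows that ever get inserted is n_flows / gcd(n_flows, every_n), so when the two periods coincide a single flow is inserted over and over.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FLOWS 64

/* Simulates 'n_pkts' misses from 'n_flows' round-robin flows under a
 * deterministic "insert one miss in every 'every_n'" policy and returns how
 * many distinct flows were ever inserted. */
static int
roundrobin_inserted_flows(int n_flows, int every_n, int n_pkts)
{
    bool inserted[MAX_FLOWS];
    int count = 0, distinct = 0;

    assert(n_flows > 0 && n_flows <= MAX_FLOWS && every_n > 0);
    memset(inserted, 0, sizeof inserted);

    for (int i = 0; i < n_pkts; i++) {
        int flow = i % n_flows;      /* Round-robin traffic generator. */

        if (++count >= every_n) {    /* Deterministic 1-in-every_n. */
            count = 0;
            if (!inserted[flow]) {
                inserted[flow] = true;
                distinct++;
            }
        }
    }
    return distinct;
}
```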

>
> I agree to making the parameter EM_FLOW_INSERT_PROB configurable for unit
> test or other purposes. Should it be a new option in the OpenvSwitch table
> in OVSDB or rather a run-time parameter to be changed with ovs-appctl?

I think a new option in Openvswitch other_config would be appropriate.

Thanks,

Daniele

>
> Jan
>


[ovs-dev] [PATCH 2/3] rhel: Remove obsolete OVSDPDKVhostPort from ifdown script.

2017-01-24 Thread Daniele Di Proietto
Support for vhost-cuse ports was removed long ago.

Fixes: 419876444357 ("netdev-dpdk: Remove dpdkvhostcuse ports")
Signed-off-by: Daniele Di Proietto 
---
 rhel/etc_sysconfig_network-scripts_ifdown-ovs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rhel/etc_sysconfig_network-scripts_ifdown-ovs b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
index 39884016c..8c9f3694c 100755
--- a/rhel/etc_sysconfig_network-scripts_ifdown-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
@@ -59,7 +59,7 @@ case "$TYPE" in
OVSPatchPort|OVSTunnel)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
-   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostPort|OVSDPDKVhostUserPort|OVSDPDKBond)
+   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostUserPort|OVSDPDKBond)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
*)
-- 
2.11.0



[ovs-dev] [PATCH 1/3] rhel: Fix ifdown for OVSDPDKBond.

2017-01-24 Thread Daniele Di Proietto
The OVSDPDKBond case wasn't handled in the rhel ifdown script.

Fixes: f6bf8880613a ("rhel: Add support DPDK port creation via network scripts")
Signed-off-by: Daniele Di Proietto 
---
 rhel/etc_sysconfig_network-scripts_ifdown-ovs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rhel/etc_sysconfig_network-scripts_ifdown-ovs b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
index dd98d2323..39884016c 100755
--- a/rhel/etc_sysconfig_network-scripts_ifdown-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
@@ -59,7 +59,7 @@ case "$TYPE" in
OVSPatchPort|OVSTunnel)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
-   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostPort|OVSDPDKVhostUserPort)
+   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostPort|OVSDPDKVhostUserPort|OVSDPDKBond)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
*)
-- 
2.11.0



[ovs-dev] [PATCH 3/3] rhel: Fix ifup and ifdown after DPDK naming change.

2017-01-24 Thread Daniele Di Proietto
Names like dpdk0 and dpdk1 are not enough to identify a DPDK interface.
We could update README.RHEL.rst and add

OVS_EXTRA='set Interface ${DEVICE} options:dpdk-devargs=:01:00.0'

but a better solution is to add new parameters in the configuration file
to explicitly specify the dpdk-devargs.

Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Daniele Di Proietto 
---
 rhel/README.RHEL.rst| 13 +
 rhel/etc_sysconfig_network-scripts_ifup-ovs |  6 --
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst
index afccf1703..af4589325 100644
--- a/rhel/README.RHEL.rst
+++ b/rhel/README.RHEL.rst
@@ -266,14 +266,16 @@ DPDK NIC port:
 
 ::
 
-==> ifcfg-dpdk0 <==
-DPDK vhost-user port:
-DEVICE=dpdk0
+==> ifcfg-mydpdk0 <==
+DEVICE=mydpdk0
+DPDK_DEVARGS=":01:00.0"
 ONBOOT=yes
 DEVICETYPE=ovs
 TYPE=OVSDPDKPort
 OVS_BRIDGE=obr0
 
+DPDK vhost-user port:
+
 ::
 
 ==> ifcfg-vhu0 <==
@@ -283,6 +285,8 @@ DPDK NIC port:
 TYPE=OVSDPDKVhostUserPort
 OVS_BRIDGE=obr0
 
+DPDK bond:
+
 ::
 
 ==> ifcfg-bond0 <==
@@ -292,7 +296,8 @@ DPDK NIC port:
 TYPE=OVSDPDKBond
 OVS_BRIDGE=ovsbridge0
 BOOTPROTO=none
-BOND_IFACES="dpdk0 dpdk1"
+BOND_IFACES="mydpdk0 mydpdk1"
+BOND_DPDK_DEVARGS=":01:00.0 :06:00.0"
 OVS_OPTIONS="bond_mode=active-backup"
 HOTPLUG=no
 
diff --git a/rhel/etc_sysconfig_network-scripts_ifup-ovs b/rhel/etc_sysconfig_network-scripts_ifup-ovs
index e49e6fe71..8fe60fcb1 100755
--- a/rhel/etc_sysconfig_network-scripts_ifup-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifup-ovs
@@ -170,7 +170,7 @@ case "$TYPE" in
ovs-vsctl -t ${TIMEOUT} \
-- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
-- add-port "$OVS_BRIDGE" "$DEVICE" $OVS_OPTIONS \
-   -- set Interface "$DEVICE" type=dpdk ${OVS_EXTRA+-- $OVS_EXTRA}
+   -- set Interface "$DEVICE" type=dpdk options:dpdk-devargs="${DPDK_DEVARGS}" ${OVS_EXTRA+-- $OVS_EXTRA}
;;
OVSDPDKRPort)
ifup_ovs_bridge
@@ -188,8 +188,10 @@ case "$TYPE" in
;;
OVSDPDKBond)
ifup_ovs_bridge
+   set -- ${BOND_DPDK_DEVARGS}
for _iface in $BOND_IFACES; do
-   IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk"
+   IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk options:dpdk-devargs=$1"
+   shift
done
ovs-vsctl -t ${TIMEOUT} \
-- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
-- 
2.11.0
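The `set --` / `shift` loop in the ifup-ovs hunk above pairs the Nth name in BOND_IFACES with the Nth entry in BOND_DPDK_DEVARGS. The same pairing is sketched in C below purely as an illustration: `build_bond_iface_args()` is hypothetical, and the devargs values in the example are placeholders. Unlike the shell loop, where a missing entry would expand `$1` to an empty string, the sketch rejects lists of different lengths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends " -- set interface IFACE type=dpdk options:dpdk-devargs=ARG" for
 * each pair taken in order from two space-separated lists.  Returns the
 * number of pairs, or -1 on a length mismatch or if 'out' is too small. */
static int
build_bond_iface_args(const char *ifaces, const char *devargs,
                      char *out, size_t out_size)
{
    const char *i = ifaces, *d = devargs;
    size_t used = 0;
    int n = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (;;) {
        i += strspn(i, " ");              /* Skip separators in both lists. */
        d += strspn(d, " ");
        if (!*i || !*d) {
            break;
        }
        size_t ilen = strcspn(i, " ");    /* Length of the next token. */
        size_t dlen = strcspn(d, " ");
        int w = snprintf(out + used, out_size - used,
                         " -- set interface %.*s type=dpdk"
                         " options:dpdk-devargs=%.*s",
                         (int) ilen, i, (int) dlen, d);
        if (w < 0 || (size_t) w >= out_size - used) {
            return -1;                    /* Output buffer too small. */
        }
        used += (size_t) w;
        i += ilen;
        d += dlen;
        n++;
    }
    /* Leftover tokens mean the two lists had different lengths. */
    return (*i || *d) ? -1 : n;
}
```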



[ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-01-24 Thread Daniele Di Proietto
Current SELinux policy in RHEL and Fedora doesn't allow the creation of
TAP devices.

A tap device is used by dpif-netdev to create internal devices.

Without this patch, adding any bridge backed by the userspace datapath
would fail.

This doesn't mean that we can run Open vSwitch with DPDK under SELinux
yet, but at least we can use the userspace datapath.

Signed-off-by: Daniele Di Proietto 
---
 selinux/openvswitch-custom.te | 5 +
 1 file changed, 5 insertions(+)

diff --git a/selinux/openvswitch-custom.te b/selinux/openvswitch-custom.te
index 47ddb562c..98de89c98 100644
--- a/selinux/openvswitch-custom.te
+++ b/selinux/openvswitch-custom.te
@@ -5,8 +5,11 @@ require {
 type openvswitch_tmp_t;
 type ifconfig_exec_t;
 type hostname_exec_t;
+type tun_tap_device_t;
 class netlink_socket { setopt getopt create connect getattr write read };
 class file { write getattr read open execute execute_no_trans };
+class chr_file { ioctl open read write };
+class tun_socket { create };
 }
 
 #= openvswitch_t ==
@@ -14,3 +17,5 @@ allow openvswitch_t self:netlink_socket { setopt getopt create connect getattr w
 allow openvswitch_t hostname_exec_t:file { read getattr open execute execute_no_trans };
 allow openvswitch_t ifconfig_exec_t:file { read getattr open execute execute_no_trans };
 allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
+allow openvswitch_t self:tun_socket { create };
+allow openvswitch_t tun_tap_device_t:chr_file { ioctl open read write };
-- 
2.11.0



Re: [ovs-dev] [PATCH] ovs-fields: Eliminate non-ASCII characters from groff input.

2017-01-25 Thread Daniele Di Proietto
2017-01-25 17:07 GMT-08:00 Ben Pfaff :
> It's difficult to make groff portably accept non-ASCII characters.  It's
> easier to replace them by groff escapes for the same characters, which
> this commit does.
>
> Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
> Signed-off-by: Ben Pfaff 

I'm not an expert in groff, but

Acked-by: Daniele Di Proietto 

There's an extra failure on OS X:

https://travis-ci.org/openvswitch/ovs/jobs/195350024

lib/ovs-fields.7:403: warning: macro `TQ' not defined

Not sure if it should be addressed in this patch.

Thanks

> ---
>  build-aux/extract-ofp-fields | 2 ++
>  lib/meta-flow.xml| 3 +--
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
> index bc15a8b..4c92246 100755
> --- a/build-aux/extract-ofp-fields
> +++ b/build-aux/extract-ofp-fields
> @@ -752,6 +752,8 @@ ovs\-fields \- protocol header fields in OpenFlow and Open vSwitch
>  oline = oline.replace(u'\u2208', r'\[mo]')
>  oline = oline.replace(u'\u2260', r'\[!=]')
>  oline = oline.replace(u'\u2264', r'\[<=]')
> +oline = oline.replace(u'\u2265', r'\[>=]')
> +oline = oline.replace(u'\u00d7', r'\[mu]')
>  if len(oline):
>  output += [oline]
>
> diff --git a/lib/meta-flow.xml b/lib/meta-flow.xml
> index 186a8db..3db0f82 100644
> --- a/lib/meta-flow.xml
> +++ b/lib/meta-flow.xml
> @@ -3102,8 +3102,7 @@ actions=clone(load:0->NXM_OF_IN_PORT[],output:123)
>
>  vlan_tci=0x5000/0xe000
>  
> -  Match packets with no 802.1Q header or tagged with prior‐
> -  ity 2 (in any VLAN).
> +  Match packets with no 802.1Q header or tagged with priority 2 (in any VLAN).
>  
>
>  vlan_tci=0/0xefff
> --
> 2.10.2
>


Re: [ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-01-25 Thread Daniele Di Proietto

On 25/01/2017 00:01, "Ansis Atteka"  wrote:

>
>
>On Jan 25, 2017 4:22 AM, "Daniele Di Proietto"  wrote:
>
>Current SELinux policy in RHEL and Fedora doesn't allow the creation of
>TAP devices.
>
>A tap device is used by dpif-netdev to create internal devices.
>
>Without this patch, adding any bridge backed by the userspace datapath
>would fail.
>
>This doesn't mean that we can run Open vSwitch with DPDK under SELinux
>yet, but at least we can use the userspace datapath.
>
>Signed-off-by: Daniele Di Proietto 
>
>Acked-by: Ansis Atteka 
>
>
>I saw that other open source projects like OpenVPN use the rw_file_perms
>shortcut macro. Not sure how relevant that is for OVS, but that macro expands
>to a few more permissions than what you have below. Maybe we don't need it,
>if what you have just worked.

Thanks a lot for the review.

I cooked this up using audit2allow and I tested it on Fedora 25.  I'm now able
to create and delete userspace bridges, without any further complaints from
SELinux.

I'm definitely not an expert in SELinux, so I'm not sure if it's better to use
the macro and ask for extra permissions, or to hardcode the list.

What do you think?

>
>---
> selinux/openvswitch-custom.te | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/selinux/openvswitch-custom.te b/selinux/openvswitch-custom.te
>index 47ddb562c..98de89c98 100644
>--- a/selinux/openvswitch-custom.te
>+++ b/selinux/openvswitch-custom.te
>@@ -5,8 +5,11 @@ require {
> type openvswitch_tmp_t;
> type ifconfig_exec_t;
> type hostname_exec_t;
>+type tun_tap_device_t;
> class netlink_socket { setopt getopt create connect getattr write read };
> class file { write getattr read open execute execute_no_trans };
>+class chr_file { ioctl open read write };
>+class tun_socket { create };
> }
>
> #= openvswitch_t ==
>@@ -14,3 +17,5 @@ allow openvswitch_t self:netlink_socket { setopt getopt create connect getattr w
> allow openvswitch_t hostname_exec_t:file { read getattr open execute execute_no_trans };
> allow openvswitch_t ifconfig_exec_t:file { read getattr open execute execute_no_trans };
> allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
>+allow openvswitch_t self:tun_socket { create };
>+allow openvswitch_t tun_tap_device_t:chr_file { ioctl open read write };
>--
>2.11.0
>


Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-25 Thread Daniele Di Proietto
2017-01-25 7:52 GMT-08:00 Loftus, Ciara :
>> 2017-01-22 11:45 GMT-08:00 Jan Scheurich :
>> >
>> >> It's not a big deal, since the most important use case we have for
>> >> dpif-netdev is with dpdk, but I'd still like the code to behave
>> >> similarly on different platforms.  How about defining a function that
>> >> uses random_uint32 when compiling without DPDK?
>> >>
>> >> For testing it's not that simple, because unit tests can be run with
>> >> or without DPDK.  It would need to be configurable at runtime.
>> >> Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
>> >> help people that want to experiment with different values, even
>> >> though, based on the comments, I guess they wouldn't really see much
>> >> difference.
>> >>
>> >> Again, what do you think about simply counting the packets and
>> >> inserting only one in every EM_FLOW_INSERT_PROB?
>> >>
>> >> Thanks,
>> >>
>> >> Daniele
>> >
>> >
>> > As far as I know Ciara did some quick tests with a counter-based
>> > implementation and it performed 5% worse for 1K and 4K flows than the
>> > current patch. Perhaps we could find the reason for that and fix it, but I
>> > also feel uncomfortable with deterministic insertion of every Nth flow. This
>> > could lead to very strange lock-step phenomena with typical artificial test
>> > workloads, which often generate flows round-robin. I would rather use a
>> > random function, as you suggest, or count "cycles" differently when
>> > compiling without DPDK.
>>
>> Ok, using another pseudo random function when compiling without DPDK
>> sounds
>> good to me.
>>
>
> Any suggestions for the random function?

I think we can use random_uint32() from lib/random.h
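A sketch of the pseudo-random variant agreed on above. To keep it self-contained, Marsaglia's xorshift32 stands in for `random_uint32()` from lib/random.h, and the function names are hypothetical. Comparing a uniformly distributed 32-bit value against `UINT32_MAX / inv_prob` passes with a probability of about 1/inv_prob, independently of the flow hash and of whether DPDK is compiled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for random_uint32(): xorshift32.  'state' must be nonzero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns true with probability of about 1/inv_prob on each EMC miss. */
static bool
emc_random_should_insert(uint32_t *state, uint32_t inv_prob)
{
    assert(inv_prob > 0);
    return xorshift32(state) <= UINT32_MAX / inv_prob;
}
```

Over many misses the observed insertion rate should sit close to 1/inv_prob; with inv_prob equal to 1 every miss is inserted.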

>
>> >
>> > I agree to making the parameter EM_FLOW_INSERT_PROB configurable for unit
>> > test or other purposes. Should it be a new option in the OpenvSwitch table
>> > in OVSDB or rather a run-time parameter to be changed with ovs-appctl?
>>
>> I think a new option in Openvswitch other_config would be appropriate.
>
> I like this idea. I've started making these changes. How about something like
> the following?
>
> +   +  type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> +
> +  Specifies the probability (1/emc-insert-prob) of a flow being
> +  inserted into the Exact Match Cache (EMC). Higher values of
> +  emc-insert-prob will result in fewer insertions, and lower
> +  values will result in more insertions. A value of zero will
> +  result in no insertions and essentially disable the EMC.
> +
> +
> +  Defaults to 100, i.e. there is a 1/100 chance of EMC insertion.

Looks good to me, thanks.

I would also add that this only applies to 'netdev' bridges (userspace) and that
a value of 1 means that every flow is going to be sent to the EMC.
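One way the proposed `emc-insert-prob` value could be turned into the threshold used by the insertion check, covering the edge cases above (0 disables insertion, 1 inserts every miss). This is a sketch with hypothetical names; note that a threshold of 0 must be special-cased, because `r <= 0` is still true when the random value happens to be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maps the configured inverse probability to a 32-bit threshold:
 *   0 -> 0           (caller treats this as "never insert")
 *   1 -> UINT32_MAX  (every random value passes: always insert)
 *   N -> UINT32_MAX / N (about one insertion in N misses) */
static uint32_t
emc_insert_min_from_prob(uint32_t inv_prob)
{
    return inv_prob ? UINT32_MAX / inv_prob : 0;
}

/* Insertion decision for a random value 'r' and threshold 'min'. */
static bool
emc_insert_enabled(uint32_t min, uint32_t r)
{
    return min != 0 && r <= min;
}
```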

>
> Thanks,
> Ciara
>
>>
>> Thanks,
>>
>> Daniele
>>
>> >
>> > Jan
>> >


Re: [ovs-dev] [PATCH] extract-ofp-fields: Define .TQ directive in nroff output.

2017-01-25 Thread Daniele Di Proietto
2017-01-25 20:31 GMT-08:00 Ben Pfaff :
> This missing directive caused groff warnings and probably some erroneous
> output too.
>
> Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
> Reported-by: Daniele Di Proietto 
> Signed-off-by: Ben Pfaff 

Acked-by: Daniele Di Proietto 

> ---
>  build-aux/extract-ofp-fields | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
> index 4c92246..333d90e 100755
> --- a/build-aux/extract-ofp-fields
> +++ b/build-aux/extract-ofp-fields
> @@ -714,6 +714,12 @@ def make_ovs_fields(meta_flow_h, meta_flow_xml):
>  .  ns
>  .  IP "$1"
>  ..
> +
> +.de TQ
> +.  br
> +.  ns
> +.  TP "$1"
> +..
>  .de URL
>  $2 \\(laURL: $1 \\(ra$3
>  ..
> --
> 2.10.2
>


Re: [ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-01-26 Thread Daniele Di Proietto

On 26/01/2017 12:35, "Ansis Atteka"  wrote:

>
>
>On 26 January 2017 at 21:24, Aaron Conole  wrote:
>
>Daniele Di Proietto  writes:
>
>> On 25/01/2017 00:01, "Ansis Atteka"  wrote:
>>
>>>On Jan 25, 2017 4:22 AM, "Daniele Di Proietto"  wrote:
>>>
>>>Current SELinux policy in RHEL and Fedora doesn't allow the creation of
>>>TAP devices.
>>>
>>>A tap device is used by dpif-netdev to create internal devices.
>>>
>>>Without this patch, adding any bridge backed by the userspace datapath
>>>would fail.
>>>
>>>This doesn't mean that we can run Open vSwitch with DPDK under SELinux
>>>yet, but at least we can use the userspace datapath.
>>>
>>>Signed-off-by: Daniele Di Proietto 
>
>I just noticed this, sorry for jumping in late.
>
>>>Acked-by: Ansis Atteka 
>>>
>>>
>>>I saw that other open source projects like OpenVPN use rw_file_perms
>>> shortcut macro. Not sure how relevant that is for OVS but that macro
>>> expands to a little more function calls than what you have
>>> below. Maybe we don't need it, if what you have
>>> just worked.
>>
>> Thanks a lot for the review.
>>
>> I cooked this up using audit2allow and I tested it on Fedora 25.  I'm
>> now able to create and delete userspace bridges, without any further
>> complaints from SELinux.
>
>I have the following openvswitch-custom.te that did work to run
>ovs+dpdk under selinux and pass traffic:
>
>
>Thanks for posting this. I think that this is really helpful to gather all 
>necessary OVS+DPDK rules from different sources to make sure that nothing is 
>missed.

+1, thanks a lot

> 
> 8< 
>
>require {
>type openvswitch_t;
>type openvswitch_tmp_t;
>type openvswitch_var_run_t;
>type ifconfig_exec_t;
>type hostname_exec_t;
>type vfio_device_t;
>type kernel_t;
>type tun_tap_device_t;
>type hugetlbfs_t;
>type init_t;
>class netlink_socket { setopt getopt create connect getattr write read };
>class file { write getattr read open execute execute_no_trans create unlink };
>class chr_file { write getattr read open ioctl };
>class unix_stream_socket { write getattr read connectto connect setopt getopt sendto accept bind recvfrom acceptfrom };
>class dir { write remove_name add_name lock read };
>}
>
>#= openvswitch_t ==
>allow openvswitch_t self:netlink_socket { setopt getopt create connect getattr write read };
>allow openvswitch_t hostname_exec_t:file { read getattr open execute execute_no_trans };
>allow openvswitch_t ifconfig_exec_t:file { read getattr open execute execute_no_trans };
>allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
>allow openvswitch_t openvswitch_tmp_t:unix_stream_socket { write getattr read connectto connect setopt getopt sendto accept bind recvfrom acceptfrom };
>allow openvswitch_t vfio_device_t:chr_file { read write open ioctl getattr };
>allow openvswitch_t tun_tap_device_t:chr_file { read write getattr open ioctl };
>allow openvswitch_t hugetlbfs_t:dir { write remove_name add_name lock read };
>allow openvswitch_t hugetlbfs_t:file { create unlink };
>allow openvswitch_t kernel_t:unix_stream_socket { write getattr read connectto connect setopt getopt sendto accept bind recvfrom acceptfrom };
>allow openvswitch_t init_t:file { read open };
>
> >8 
>
>You'll note that this change gives openvswitch complete access to the
>hugetlbfs label, which might be the scariest part.
>
>
>There is also the option to use SELinux switches (booleans) that activate
>only a subset of SELinux rules on a "per OVS feature" basis, if there is a
>risk that the DPDK whitelist would unconditionally loosen the SELinux policy
>too much for non-DPDK cases. See
>[https://wiki.centos.org/TipsAndTricks/SelinuxBooleans] for more details.

Ok, so perhaps we should require tun_tap_device_t permissions only if we enable 
userspace support with a boolean.

I just posted this piece because the corresponding code is in the openvswitch
source tree.

The rest of the permissions (hugepages, vfio) are required because of code
that's in the DPDK library.  Is there a way to put these in DPDK and then just
call a macro here, like

dpdk_perms(openvswitch_t)

I'm a little bit concerned because there are different drivers in DPDK and they 
require

[ovs-dev] [PATCH] dpif-netdev: Pass Openvswitch other_config smap to dpif.

2017-01-27 Thread Daniele Di Proietto
Currently we parse the 'other_config' column in the Open_vSwitch table in
bridge.c.  We extract the values (just 'pmd-cpu-mask' for now) and we
pass them down to the datapath, via different layers.

If we want to pass other values to dpif-netdev.c (like we recently
discussed) we would have to touch ofproto.c, ofproto-dpif.c and dpif.c.

This patch sends the entire other_config column to dpif-netdev, so that
dpif-netdev can extract the values it's interested in.

No functional change.

Signed-off-by: Daniele Di Proietto 
---
I don't like that dpif-netdev receives the whole other_config column,
because it contains other values which are completely unrelated, but
unfortunately there's no better place in the database for datapath
specific configuration.
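A sketch of the pattern this patch enables, using minimal stand-ins for OVS types so it compiles on its own (`struct mini_smap`, `mini_smap_get()`, `nullable_eq()` and `mini_dp_set_config()` are hypothetical, modelled loosely on `smap_get()` and `nullable_string_is_equal()`): the provider looks up only the keys it understands, ignores the rest of other_config, and reports whether anything it cares about changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for 'struct smap' (string-to-string map). */
struct kv { const char *key; const char *value; };
struct mini_smap { const struct kv *items; size_t n; };

static const char *
mini_smap_get(const struct mini_smap *m, const char *key)
{
    for (size_t i = 0; i < m->n; i++) {
        if (!strcmp(m->items[i].key, key)) {
            return m->items[i].value;
        }
    }
    return NULL;
}

/* Like nullable_string_is_equal(): two NULLs compare equal. */
static bool
nullable_eq(const char *a, const char *b)
{
    return a ? b && !strcmp(a, b) : !b;
}

static char *
copy_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p) {
        memcpy(p, s, len);
    }
    return p;
}

/* Hypothetical datapath state. */
struct mini_dp { char *pmd_cmask; };

/* Picks out only the keys this datapath understands.  Returns true if a
 * reconfiguration is needed. */
static bool
mini_dp_set_config(struct mini_dp *dp, const struct mini_smap *other_config)
{
    const char *cmask = mini_smap_get(other_config, "pmd-cpu-mask");

    if (nullable_eq(dp->pmd_cmask, cmask)) {
        return false;
    }
    free(dp->pmd_cmask);
    dp->pmd_cmask = cmask ? copy_str(cmask) : NULL;
    return true;
}
```

Unrelated keys in other_config are simply never looked up, which is what keeps the extra data passed down harmless.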
---
 lib/dpif-netdev.c  |  9 +
 lib/dpif-netlink.c |  2 +-
 lib/dpif-provider.h|  8 +++-
 lib/dpif.c | 12 ++--
 lib/dpif.h |  2 +-
 ofproto/ofproto-dpif.c | 19 ---
 ofproto/ofproto-provider.h | 11 ---
 ofproto/ofproto.c  | 13 +
 ofproto/ofproto.h  |  3 ++-
 vswitchd/bridge.c  | 18 +-
 10 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 719a51823..0be5db514 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2724,12 +2724,13 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
 }
 }
 
-/* Changes the number or the affinity of pmd threads.  The changes are actually
- * applied in dpif_netdev_run(). */
+/* Applies datapath configuration from the database. Some of the changes are
+ * actually applied in dpif_netdev_run(). */
 static int
-dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
+dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
+const char *cmask = smap_get(other_config, "pmd-cpu-mask");
 
 if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
 free(dp->pmd_cmask);
@@ -4844,7 +4845,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_operate,
 NULL,   /* recv_set */
 NULL,   /* handlers_set */
-dpif_netdev_pmd_set,
+dpif_netdev_set_config,
 dpif_netdev_queue_to_priority,
 NULL,   /* recv */
 NULL,   /* recv_wait */
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index c8b0e37f9..9762a87be 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2387,7 +2387,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_operate,
 dpif_netlink_recv_set,
 dpif_netlink_handlers_set,
-NULL,   /* poll_thread_set */
+NULL,   /* set_config */
 dpif_netlink_queue_to_priority,
 dpif_netlink_recv,
 dpif_netlink_recv_wait,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index d3b2bb91d..a0dc1ef35 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -326,11 +326,9 @@ struct dpif_class {
  * */
 int (*handlers_set)(struct dpif *dpif, uint32_t n_handlers);
 
-/* If 'dpif' creates its own I/O polling threads, refreshes poll threads
- * configuration.  'cmask' configures the cpu mask for setting the polling
- * threads' cpu affinity.  The implementation might postpone applying the
- * changes until run() is called. */
-int (*poll_threads_set)(struct dpif *dpif, const char *cmask);
+/* Pass custom configuration options to the datapath.  The implementation
+ * might postpone applying the changes until run() is called. */
+int (*set_config)(struct dpif *dpif, const struct smap *other_config);
 
 /* Translates OpenFlow queue ID 'queue_id' (in host byte order) into a
  * priority value used for setting packet priority. */
diff --git a/lib/dpif.c b/lib/dpif.c
index 374f013ab..57aa3c6c4 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1440,17 +1440,17 @@ dpif_print_packet(struct dpif *dpif, struct dpif_upcall *upcall)
 }
 }
 
-/* If 'dpif' creates its own I/O polling threads, refreshes poll threads
- * configuration. */
+/* Pass custom configuration to the datapath implementation.  Some of the
+ * changes can be postponed until dpif_run() is called. */
 int
-dpif_poll_threads_set(struct dpif *dpif, const char *cmask)
+dpif_set_config(struct dpif *dpif, const struct smap *cfg)
 {
 int error = 0;
 
-if (dpif->dpif_class->poll_threads_set) {
-error = dpif->dpif_class->poll_threads_set(dpif, cmask);
+if (dpif->dpif_class->set_config) {
+error = dpif->dpif_class->set_config(dpif, cfg);
 if (error) {
-log_operation(dpif, "poll_threads_set", error);
+log_operation(dpif, "set_config

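The patch above replaces the cpu-mask-only poll_threads_set() hook with a generic set_config() that receives the whole other_config map. Providers that need nothing leave the pointer NULL, and the generic wrapper only dispatches when it is set. A minimal, self-contained sketch of that pattern follows. struct config, config_get(), struct provider_class and netdev_set_config() are hypothetical stand-ins for struct smap, smap_get(), struct dpif_class and dpif_netdev_set_config(), not the OVS API. For simplicity the sketch treats a missing key like an empty string, which nullable_string_is_equal() does not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a key/value map such as 'struct smap'. */
struct kv {
    const char *key;
    const char *value;
};

struct config {
    const struct kv *items;
    size_t n;
};

/* Returns the value stored for 'key', or NULL if the key is absent. */
static const char *
config_get(const struct config *cfg, const char *key)
{
    for (size_t i = 0; i < cfg->n; i++) {
        if (!strcmp(cfg->items[i].key, key)) {
            return cfg->items[i].value;
        }
    }
    return NULL;
}

struct provider;

/* Hypothetical provider vtable; 'set_config' is optional and may be NULL. */
struct provider_class {
    const char *name;
    int (*set_config)(struct provider *, const struct config *);
};

struct provider {
    const struct provider_class *cls;
    char pmd_cmask[64];         /* Last value seen; "" means unset. */
    int reconfigure_needed;     /* Applied later, e.g. from a run() loop. */
};

/* Generic wrapper: dispatch only if the provider implements the hook. */
static int
provider_set_config(struct provider *p, const struct config *cfg)
{
    int error = 0;

    if (p->cls->set_config) {
        error = p->cls->set_config(p, cfg);
        if (error) {
            fprintf(stderr, "%s: set_config failed (%s)\n",
                    p->cls->name, strerror(error));
        }
    }
    return error;
}

/* Example implementation: picks out one key and only records that a change
 * happened; the actual reconfiguration is deferred. */
static int
netdev_set_config(struct provider *p, const struct config *cfg)
{
    const char *cmask = config_get(cfg, "pmd-cpu-mask");
    const char *value = cmask ? cmask : "";

    if (strlen(value) >= sizeof p->pmd_cmask) {
        return EINVAL;
    }
    if (strcmp(p->pmd_cmask, value)) {
        strcpy(p->pmd_cmask, value);
        p->reconfigure_needed = 1;
    }
    return 0;
}
```

Keeping the hook optional means a datapath without tunables, like dpif-netlink here, needs no stub implementation.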
Re: [ovs-dev] [PATCH v2] dpif-netdev: Conditional EMC insert

2017-01-27 Thread Daniele Di Proietto
2017-01-26 9:51 GMT-08:00 Ciara Loftus :
> Unconditional insertion of EMC entries results in EMC thrashing at high
> numbers of parallel flows. When this occurs, the performance of the EMC
> often falls below that of the dpcls classifier, rendering the EMC
> practically useless.
>
> Instead of unconditionally inserting entries into the EMC when a miss
> occurs, use a 1% probability of insertion. This ensures that the most
> frequent flows have the highest chance of creating an entry in the EMC,
> and the probability of thrashing the EMC is also greatly reduced.
>
> The probability of insertion is configurable, via the
> other_config:emc-insert-prob option. For example the following command
> increases the insertion probability to 1/10, i.e. 10%.
>
> ovs-vsctl set Open_vSwitch . other_config:emc-insert-prob=10
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Georg Schmuecking 
> Co-authored-by: Georg Schmuecking 
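The conditional insertion described in the commit message can be sketched by mapping the user-facing value N to a 32-bit threshold and comparing a uniformly distributed random number against it, so that roughly 1 in N misses triggers an insertion. This mirrors the `UINT32_MAX / 100` default in the lib/dpdk.c hunk of this patch, but emc_insert_threshold(), emc_should_insert() and xorshift32() (a stand-in for a real PRNG such as random_uint32()) are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maps the inverse probability N to a threshold: 0 disables insertion,
 * 1 inserts on every miss, 100 inserts on roughly 1% of misses. */
static uint32_t
emc_insert_threshold(uint32_t n)
{
    return n ? UINT32_MAX / n : 0;
}

/* Inserts when the random value falls at or below the threshold. */
static bool
emc_should_insert(uint32_t threshold, uint32_t random_value)
{
    return threshold && random_value <= threshold;
}

/* Hypothetical stand-in PRNG (Marsaglia xorshift32); state must be nonzero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

With N = 100 the threshold is 42949672, so about (42949672 + 1) / 2^32, or roughly 1%, of uniformly random values pass.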

Thanks for v2

I think the patch doesn't compile without DPDK.  Also there's no way to control
the value without DPDK.

I think we could pass down the value like we do for pmd-cpu-mask, this would
make it work even without DPDK.  I sent a patch that extends what we do for
pmd-cpu-mask:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/328161.html

Can we avoid having to restart the daemon when we want to change this?

I think we should store the probability in 'struct dp_netdev' using an atomic
uint32.  We can read and write to it using atomic relaxed operations, which
have no additional cost, like we do for 'enable_megaflows' in
ofproto-dpif-upcall.c.
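Storing the probability in an atomic 32-bit field, as suggested above, lets the configuration thread update it at runtime while packet-processing threads keep reading it without locks or a daemon restart. The sketch below uses C11 `<stdatomic.h>` directly; in-tree code would presumably use the ovs-atomic.h wrappers instead, and `struct datapath` and its accessors are hypothetical. Relaxed ordering is enough because the value is a standalone tunable that does not publish any other data. On mainstream architectures such as x86-64 and AArch64, a relaxed load or store of an aligned 32-bit value compiles to an ordinary load or store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, trimmed-down datapath holding one runtime tunable. */
struct datapath {
    _Atomic uint32_t emc_insert_min;   /* 0 disables EMC insertion. */
};

/* Called from the configuration thread whenever other_config changes. */
static void
datapath_set_emc_insert_min(struct datapath *dp, uint32_t min)
{
    atomic_store_explicit(&dp->emc_insert_min, min, memory_order_relaxed);
}

/* Called from packet-processing threads on every EMC miss.  Relaxed ordering
 * suffices: the value guards no other data, so readers only need some
 * recent value, not a memory fence. */
static uint32_t
datapath_get_emc_insert_min(struct datapath *dp)
{
    return atomic_load_explicit(&dp->emc_insert_min, memory_order_relaxed);
}
```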

If you want to store it in pmd without atomics, like Jan suggested, I
think we can use reconfiguration to change it at runtime.

Thanks,

Daniele

> ---
> v2:
> - Enable probability configurability via other_config:emc-insert-prob
>   option.
>
>  Documentation/howto/dpdk.rst | 23 +++
>  NEWS |  2 ++
>  lib/dpdk.c   | 15 +++
>  lib/dpdk.h   |  1 +
>  lib/dpif-netdev.c| 28 ++--
>  vswitchd/vswitch.xml | 17 +
>  6 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d1e6e89..a37b9d5 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -354,6 +354,29 @@ the `DPDK documentation
>
>  Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
>
> +EMC Insertion Probability
> +-
> +By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC).
> +It is possible to change this insertion probability by setting the
> +``emc-insert-prob`` option::
> +
> +$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-prob=N
> +
> +where:
> +
> +``N``
> +  is an integer between 0 and 4294967295.
> +
> +If ``N`` is set to 1, an insertion will be performed for every flow. The lower
> +the value of ``emc-insert-prob`` the higher the probability of insertion,
> +except for the value 0 which will result in no insertions being performed and
> +thus essentially disabling the EMC.
> +
> +If ``emc-insert-prob`` is modified, the daemon needs to be restarted in order
> +for the changes to take effect.
> +
> +For more information on the EMC, refer to :doc:`/intro/install/dpdk`.
> +
>  .. _dpdk-ovs-in-guest:
>
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index 0a9551c..8fb1f53 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -63,6 +63,8 @@ Post-v2.6.0
> device will not be available for use until a valid dpdk-devargs is
> specified.
>   * Virtual DPDK Poll Mode Driver (vdev PMD) support.
> + * New 'other_config:emc-insert-prob' field for userspace netdevs that
> +   allows definition of the EMC insertion probability.
> - Fedora packaging:
>   * A package upgrade does not automatically restart OVS service.
> - ovs-vswitchd/ovs-vsctl:
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 9ae2491..bb9e758 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -38,6 +38,8 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>
>  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>
> +static uint32_t emc_insert_min = UINT32_MAX / 100;
> +
>  static int
>  process_vhost_flags(char *flag, char *default_val, int size,
>  const struct smap *ovs_other_config,
> @@ -272,6 +274,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>  int err = 0;
>  cpu_set_t cpuset;
>  char *sock_dir_subcomponent;
> +int insert_prob;
>
>  if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
>  NAME_MAX, ovs_other_config,
> @@ -297,6 +300,12 @@ dpdk_init__(const struct smap *ovs_other_config)
>  vhost_sock_dir = sock_dir_subcomponent;
>  }
>
> +/* Configure EMC insertion probability */
> +insert_prob = smap_get_int(ovs_other_config

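The truncated hunk above reads the option with smap_get_int(). As a hedged alternative sketch, the value can be parsed as an unsigned number so that the whole documented range of 0 to 4294967295 is accepted, with anything missing, malformed or out of range falling back to the default of 100. emc_insert_min_from_config() and EMC_INSERT_PROB_DEFAULT are hypothetical names; strtoull() is the standard C library function.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EMC_INSERT_PROB_DEFAULT 100u   /* Hypothetical: 1 in 100. */

/* Converts the 'emc-insert-prob' string (the inverse probability N) into
 * the 32-bit insertion threshold; NULL or invalid input uses the default. */
static uint32_t
emc_insert_min_from_config(const char *s)
{
    uint32_t n = EMC_INSERT_PROB_DEFAULT;

    /* Require a leading digit so that "-1" or " 7" are rejected instead of
     * being silently accepted by strtoull(). */
    if (s && isdigit((unsigned char) *s)) {
        char *end;
        unsigned long long v;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (!errno && *end == '\0' && v <= UINT32_MAX) {
            n = (uint32_t) v;
        }
    }
    return n ? UINT32_MAX / n : 0;   /* 0 disables insertion. */
}
```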
Re: [ovs-dev] [patch_v4 0/6] Userspace Datapath: Introduce NAT support.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> This patch series introduces NAT support for the userspace datapath.
>
> The per packet scope of lookups for NAT and un_NAT is at
> the bucket level rather than global. One hash table is
> introduced to support create/delete handling. The create/delete
> events may be further optimized, if the need becomes clear.
>
> The existing NAT tests are enabled for the dpdk datapath,
> with an added enhancement to the V6 NAT test.
>
> Some NAT options with limited utility (persistent, random) are
> not supported yet, but will be supported in a later patch.
>
> One V6 API is exported to facilitate selective editing of the V6
> header: packet_set_ipv6_addr().
>
> alg and fragmentation support are not included here but are
> being worked on.
>
> NEWS is not updated in this series yet, until confirmation of
> release.
>
> I realize patch 3 is big. It may be clearer and easier to keep
> as a single patch, so I have done that after some discussion.

Thanks a lot for the series.  All the NAT and OVN system tests
are passing, which is great!

You can include an update to NEWS, it won't be pushed before the
rest of the series :-)

Usually we prefix the commit message with the name of the module
that the commit touches.

More comments in the individual commits.

I'm sorry I don't have more meaningful comments yet.  I'll keep looking
at the series.

Thanks,

Daniele


>
> v3->v4: Fix rev_key vs key for nat_conn_keys access in a couple
> places; this would have affected cleanup; at same time
> rename some variables and change nat_conn_keys APIs to
> use conn key, rather than conn.
>
> Fix conntrack_flush() CT_CONN_TYPE_DEFAULT flag placement;
> the intention was that it be the same as in sweep_bucket().
>
> Fix nat_ipv6_addrs_delta() max boundary checking logic. I
> also enhanced the conntrack - IPv6 HTTP with NAT test to
> give it more coverage as partial penance.
>
> Rebase
>
> v2->v3: Fix a theoretical resend for closed connection restart.
> Parse out a function to help and also limit
> conn_state_update() to one.
>
> I decided to cap V6 address range delta at 4 billion using
> internal adjustment (user visibility not required).
>
> Some cleanup of deprecated code path.
>
> Parse out some more changes as separate patches.
>
> v1->v2: Updates/fixes that were missed in v1 patches.
>
> Darrell Ball (6):
>   Export packet_set_ipv6_addr()fordpdkdatapath.
>   Parse NAT netlink for userspace datapath.
>   Userspace Datapath: Introduce NAT support.
>   Unset CS_NEW for established connections.
>   Enable NAT tests for userspace datapath.
>   Enhance V6 NAT test.
>
>  lib/conntrack-private.h  |  25 +-
>  lib/conntrack.c  | 742 ++-
>  lib/conntrack.h  |  73 +++-
>  lib/dpif-netdev.c|  85 -
>  lib/packets.c|   2 +-
>  lib/packets.h|   4 +
>  tests/system-traffic.at  |   4 +-
>  tests/system-userspace-macros.at |   7 +-
>  tests/test-conntrack.c   |   8 +-
>  9 files changed, 843 insertions(+), 107 deletions(-)
>
> --
> 1.9.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch_v4 1/6] Export packet_set_ipv6_addr()fordpdkdatapath.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 

LGTM, thanks

The commit message is missing a few spaces.

> ---
>  lib/packets.c | 2 +-
>  lib/packets.h | 4 
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/lib/packets.c b/lib/packets.c
> index fa70df6..94e7d87 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -986,7 +986,7 @@ packet_update_csum128(struct dp_packet *packet, uint8_t proto,
>  }
>  }
>
> -static void
> +void
>  packet_set_ipv6_addr(struct dp_packet *packet, uint8_t proto,
>   ovs_16aligned_be32 addr[4],
>   const struct in6_addr *new_addr,
> diff --git a/lib/packets.h b/lib/packets.h
> index c4d3799..850f192 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -1100,6 +1100,10 @@ void packet_set_ipv4_addr(struct dp_packet *packet, ovs_16aligned_be32 *addr,
>  void packet_set_ipv6(struct dp_packet *, const struct in6_addr *src,
>   const struct in6_addr *dst, uint8_t tc,
>   ovs_be32 fl, uint8_t hlmit);
> +void packet_set_ipv6_addr(struct dp_packet *packet, uint8_t proto,
> +  ovs_16aligned_be32 addr[4],
> +  const struct in6_addr *new_addr,
> +  bool recalculate_csum);
>  void packet_set_tcp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
>  void packet_set_udp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
>  void packet_set_sctp_port(struct dp_packet *, ovs_be16 src, ovs_be16 dst);
> --
> 1.9.1
>


Re: [ovs-dev] [patch_v4 2/6] Parse NAT netlink for userspace datapath.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 10:50 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack-private.h |  9 --
>  lib/conntrack.c |  3 +-
>  lib/conntrack.h | 31 +-
>  lib/dpif-netdev.c   | 85 ++---
>  tests/test-conntrack.c  |  8 +++--
>  5 files changed, 118 insertions(+), 18 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 013f19f..493865f 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -29,15 +29,6 @@
>  #include "packets.h"
>  #include "unaligned.h"
>
> -struct ct_addr {
> -union {
> -ovs_16aligned_be32 ipv4;
> -union ovs_16aligned_in6_addr ipv6;
> -ovs_be32 ipv4_aligned;
> -struct in6_addr ipv6_aligned;
> -};
> -};
> -
>  struct ct_endpoint {
>  struct ct_addr addr;
>  union {
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 9bea3d9..bae42a3 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -273,7 +273,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
>ovs_be16 dl_type, bool commit, uint16_t zone,
>const uint32_t *setmark,
>const struct ovs_key_ct_labels *setlabel,
> -  const char *helper)
> +  const char *helper,
> + const struct nat_action_info_t *nat_action_info OVS_UNUSED)
>  {
>  struct dp_packet **pkts = pkt_batch->packets;
>  size_t cnt = pkt_batch->count;
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 254f61c..cbdfb91 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -26,6 +26,8 @@
>  #include "openvswitch/thread.h"
>  #include "openvswitch/types.h"
>  #include "ovs-atomic.h"
> +#include "ovs-thread.h"
> +#include "packets.h"
>
>  /* Userspace connection tracker
>   * 
> @@ -61,6 +63,32 @@ struct dp_packet_batch;
>
>  struct conntrack;
>
> +struct ct_addr {
> +union {
> +ovs_16aligned_be32 ipv4;
> +union ovs_16aligned_in6_addr ipv6;
> +ovs_be32 ipv4_aligned;
> +struct in6_addr ipv6_aligned;
> +};
> +};
> +
> +// Both NAT_ACTION_* and NAT_ACTION_*_PORT can be set

We normally don't use // comments

> +enum nat_action_e {
> +   NAT_ACTION = 1 << 0,
> +   NAT_ACTION_SRC = 1 << 1,
> +   NAT_ACTION_SRC_PORT = 1 << 2,
> +   NAT_ACTION_DST = 1 << 3,
> +   NAT_ACTION_DST_PORT = 1 << 4,
> +};

This is indented with tabs instead of 4 spaces.

Is NAT_ACTION really necessary?  I think it should always be set when
nat_action_info != NULL, so we can probably remove it.

> +
> +struct nat_action_info_t {
> +   struct ct_addr min_addr;
> +   struct ct_addr max_addr;
> +   uint16_t min_port;
> +   uint16_t max_port;

Tabs

> +uint16_t nat_action;
> +};
> +
>  void conntrack_init(struct conntrack *);
>  void conntrack_destroy(struct conntrack *);
>
> @@ -68,7 +96,8 @@ int conntrack_execute(struct conntrack *, struct dp_packet_batch *,
>ovs_be16 dl_type, bool commit,
>uint16_t zone, const uint32_t *setmark,
>const struct ovs_key_ct_labels *setlabel,
> -  const char *helper);
> +  const char *helper,
> +  const struct nat_action_info_t *nat_action_info);
>
>  struct conntrack_dump {
>  struct conntrack *ct;
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3901129..a71c766 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -97,7 +97,8 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
>  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
>
>  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
> - | CS_INVALID | CS_REPLY_DIR | CS_TRACKED)
> + | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
> + | CS_SRC_NAT | CS_DST_NAT)
>  #define DP_NETDEV_CS_UNSUPPORTED_MASK (~(uint32_t)DP_NETDEV_CS_SUPPORTED_MASK)
>
>  static struct odp_support dp_netdev_support = {
> @@ -4681,7 +4682,9 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
>  const char *helper = NULL;
>  const uint32_t *setmark = NULL;
>  const struct ovs_key_ct_labels *setlabel = NULL;
> -
> +struct nat_action_info_t nat_action_info;
> +bool nat = false;
> +memset(&nat_action_info, 0, sizeof nat_action_info);

As discussed offline, can this memset be moved inside the OVS_CT_ATTR_NAT case?

>  NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a),
>   nl_attr_get_size(a)) {
>  enum ovs_ct_attr sub_type = nl_attr_type(b);
> @@ -4702,15 +4705,89 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *pack

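Moving the memset into the NAT case, as asked in the review above, means the structure is only initialized when a NAT attribute is actually present, and the caller passes NULL otherwise. The following is a hypothetical, self-contained sketch of that parsing shape. The attribute enum and `struct ct_attr` stand in for the netlink attributes walked with NL_ATTR_FOR_EACH_UNSAFE and are not the OVS types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the OVS_CT_ATTR_* netlink attributes. */
enum ct_attr_type {
    CT_ATTR_COMMIT,
    CT_ATTR_ZONE,
    CT_ATTR_NAT,
};

struct ct_attr {
    enum ct_attr_type type;
    uint16_t value;
};

struct nat_info {
    uint16_t flags;
    uint16_t min_port;
    uint16_t max_port;
};

/* Walks 'attrs'.  '*storage' is zeroed and filled only if a NAT attribute
 * is present; the function then returns 'storage', otherwise NULL. */
static const struct nat_info *
parse_ct_attrs(const struct ct_attr *attrs, size_t n,
               struct nat_info *storage, uint16_t *zone)
{
    bool nat = false;

    for (size_t i = 0; i < n; i++) {
        switch (attrs[i].type) {
        case CT_ATTR_COMMIT:
            break;
        case CT_ATTR_ZONE:
            *zone = attrs[i].value;
            break;
        case CT_ATTR_NAT:
            memset(storage, 0, sizeof *storage);
            storage->flags = attrs[i].value;
            nat = true;
            break;
        }
    }
    return nat ? storage : NULL;
}
```

Packets without a NAT attribute then skip the memset entirely, which is the point of moving it.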
Re: [ovs-dev] [patch_v4 3/6] Userspace Datapath: Introduce NAT support.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> This patch introduces NAT support for the userspace datapath.
> The conntrack module changes are in this patch.
>
> The per packet scope of lookups for NAT and un_NAT is at
> the bucket level rather than global. One hash table is
> introduced to support create/delete handling. The create/delete
> events may be further optimized, if the need becomes clear.
>
> Some NAT options with limited utility (persistent, random) are
> not supported yet, but will be supported in a later patch.
>
> Signed-off-by: Darrell Ball 

Sparse reports some problems:

../lib/conntrack.c:1375:16: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1398:9: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1400:36: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1403:33: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:214:30: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:240:30: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1360:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1362:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1365:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1367:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1395:44: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1396:44: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1409:25: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1410:25: error: no member 's6_addr32' in struct in6_addr
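Sparse rejects `s6_addr32`, which is a non-standard extension (glibc exposes it, for example) rather than part of POSIX; the portable member is the 16-byte `s6_addr` array. The cover letter also mentions capping the IPv6 address range delta at 4 billion. The hedged sketch below combines both ideas. It reads 32-bit words from `s6_addr` with memcpy() and ntohl(), computes the delta between two addresses in two 64-bit halves, and saturates the result at UINT32_MAX. in6_word(), in6_half() and in6_delta_capped() are hypothetical helpers, not the functions in the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 32-bit word 'i' (0..3) of 'a' in host byte order, using only the
 * POSIX s6_addr byte array. */
static uint32_t
in6_word(const struct in6_addr *a, int i)
{
    uint32_t w;

    memcpy(&w, &a->s6_addr[i * 4], sizeof w);
    return ntohl(w);
}

/* Returns the upper (half == 0) or lower (half == 1) 64 bits of 'a'. */
static uint64_t
in6_half(const struct in6_addr *a, int half)
{
    return ((uint64_t) in6_word(a, half * 2) << 32) | in6_word(a, half * 2 + 1);
}

/* Returns max - min saturated at UINT32_MAX, or 0 if max < min. */
static uint32_t
in6_delta_capped(const struct in6_addr *min, const struct in6_addr *max)
{
    uint64_t min_hi = in6_half(min, 0), min_lo = in6_half(min, 1);
    uint64_t max_hi = in6_half(max, 0), max_lo = in6_half(max, 1);
    uint64_t diff_hi, diff_lo;

    if (max_hi < min_hi || (max_hi == min_hi && max_lo < min_lo)) {
        return 0;
    }
    diff_lo = max_lo - min_lo;                       /* Wraps mod 2^64. */
    diff_hi = max_hi - min_hi - (max_lo < min_lo);   /* Subtract borrow. */
    if (diff_hi || diff_lo > UINT32_MAX) {
        return UINT32_MAX;
    }
    return (uint32_t) diff_lo;
}
```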

There are some minor coding style problems, you can find them with
utilities/checkpatch.py

> ---
>  lib/conntrack-private.h |  16 +-
>  lib/conntrack.c | 740 ++--
>  lib/conntrack.h |  44 +++
>  3 files changed, 717 insertions(+), 83 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 493865f..b71af37 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -51,14 +51,23 @@ struct conn_key {
>  uint16_t zone;
>  };
>
> +struct nat_conn_key_node {
> +struct hmap_node node;
> +struct conn_key key;
> +struct conn_key value;
> +};
> +
>  struct conn {
>  struct conn_key key;
>  struct conn_key rev_key;
>  long long expiration;
>  struct ovs_list exp_node;
>  struct hmap_node node;
> -uint32_t mark;
>  ovs_u128 label;
> +/* XXX: consider flattening. */
> +struct nat_action_info_t *nat_info;
> +uint32_t mark;
> +uint8_t conn_type;
>  };
>
>  enum ct_update_res {
> @@ -67,6 +76,11 @@ enum ct_update_res {
>  CT_UPDATE_NEW,
>  };
>
> +enum ct_conn_type {
> +CT_CONN_TYPE_DEFAULT,
> +   CT_CONN_TYPE_UN_NAT,
> +};
> +
>  struct ct_l4_proto {
>  struct conn *(*new_conn)(struct conntrack_bucket *, struct dp_packet *pkt,
>   long long now);
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 0a611a2..34728a6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -76,6 +76,20 @@ static void set_label(struct dp_packet *, struct conn *,
>const struct ovs_key_ct_labels *mask);
>  static void *clean_thread_main(void *f_);
>
> +static struct nat_conn_key_node *
> +nat_conn_keys_lookup(struct hmap *nat_conn_keys,
> + const struct conn_key *key,
> + uint32_t basis);
> +
> +static void
> +nat_conn_keys_remove(struct hmap *nat_conn_keys,
> +const struct conn_key *key,
> +uint32_t basis);
> +
> +static bool
> +nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
> +  struct conn *nat_conn);
> +
>  static struct ct_l4_proto *l4_protos[] = {
>  [IPPROTO_TCP] = &ct_proto_tcp,
>  [IPPROTO_UDP] = &ct_proto_other,
> @@ -90,9 +104,11 @@ long long ct_timeout_val[] = {
>  };
>
>  /* If the total number of connections goes above this value, no new connections
> - * are accepted */
> + * are accepted; this is for CT_CONN_TYPE_DEFAULT connections. */
>  #define DEFAULT_N_CONN_LIMIT 300
>
> +#define DT
> +

I guess this is left here from debugging

>  /* Initializes the connection tracker 'ct'.  The caller is responsible for
>   * calling 'conntrack_destroy()', when the instance is not needed anymore */
>  void
> @@ -101,6 +117,11 @@ conntrack_init(struct conntrack *ct)
>  unsigned i, j;
>  long long now = time_msec();
>
> +ct_rwlock_init(&ct->nat_resources_lock);
> +ct_rwlock_wrlock(&ct->nat_resources_lock);
> +hmap_init(&ct->nat_conn_keys);
> +ct_rwlock_unlock(&ct->nat_resources_lock);
> +
>  for (i = 0; i < CONNTRACK_BUCKETS; i++) {
>  struct conntrack_bucket *ctb = &ct->buckets[i];
>
> @@ -139,13 +160,24 @@ conntr

Re: [ovs-dev] [patch_v4 4/6] Unset CS_NEW for established connections.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 34728a6..aaecb00 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -443,6 +443,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
>  switch (res) {
>  case CT_UPDATE_VALID:
>  *state |= CS_ESTABLISHED;
> +*state &= ~CS_NEW;

Maybe I'm missing something, but can *state be !=0 at this point?

>  if (ctx->reply) {
>  *state |= CS_REPLY_DIR;
>  }
> --
> 1.9.1
>

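On the question in the review above: in the hunk shown, clearing CS_NEW only changes the result if the caller passes in a state that already has CS_NEW set; starting from 0, the added line is a no-op. The small sketch below makes that concrete. The bit values are made up for illustration (the real CS_* constants are defined elsewhere in OVS), and update_state_valid() is a hypothetical function that mirrors the CT_UPDATE_VALID branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real CS_* constants are defined in OVS. */
#define CS_NEW         (1u << 0)
#define CS_ESTABLISHED (1u << 1)
#define CS_REPLY_DIR   (1u << 3)

/* Mirrors the CT_UPDATE_VALID branch from the patch. */
static void
update_state_valid(uint32_t *state, int reply)
{
    *state |= CS_ESTABLISHED;
    *state &= ~CS_NEW;      /* Only changes anything if CS_NEW was set. */
    if (reply) {
        *state |= CS_REPLY_DIR;
    }
}
```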

Re: [ovs-dev] [patch_v4 5/6] Enable NAT tests for userspace datapath.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 

Thanks for the patch.

The macro is now empty for kernel and userspace.

Would you mind removing it from system-userspace-macros,
system-kmod-macros and from the testsuites?

Also some of the NAT tests fail for me, I guess because they use ALGs:

 57: conntrack - FTP NAT prerecirc           FAILED (system-traffic.at:2693)
 58: conntrack - FTP NAT prerecirc seqadj    FAILED (system-traffic.at:2704)
 59: conntrack - FTP NAT postrecirc          FAILED (system-traffic.at:2756)
 60: conntrack - FTP NAT postrecirc seqadj   FAILED (system-traffic.at:2767)

 62: conntrack - IPv6 FTP with NAT           FAILED (system-traffic.at:2859)

I think adding CHECK_CONNTRACK_ALG() to them should skip them for the
userspace datapath.


> ---
>  tests/system-userspace-macros.at | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
> index 631f71a..6e3d468 100644
> --- a/tests/system-userspace-macros.at
> +++ b/tests/system-userspace-macros.at
> @@ -99,9 +99,6 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK],
>  # CHECK_CONNTRACK_NAT()
>  #
>  # Perform requirements checks for running conntrack NAT tests. The userspace
> -# doesn't support NATs yet, so skip the tests
> +# datapath supports NAT.
>  #
> -m4_define([CHECK_CONNTRACK_NAT],
> -[
> -AT_SKIP_IF([:])
> -])
> +m4_define([CHECK_CONNTRACK_NAT])
> --
> 1.9.1
>


Re: [ovs-dev] [PATCH] rhel: Add missing unpackaged file 'ovs-fields.7.gz' in the %files list

2017-01-30 Thread Daniele Di Proietto
2017-01-27 5:06 GMT-08:00 Numan Siddique :
> Fixes: commit 96fee5e0a2a0 ("ovs-fields: New manpage to document Open
> vSwitch and OpenFlow fields")
>
> Signed-off-by: Numan Siddique 

Thanks for the patch.

I noticed that also rhel/openvswitch.spec.in was broken, so I squashed
in a similar fix for that file as well.

Applied to master and branch-2.7

> ---
>  rhel/openvswitch-fedora.spec.in | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
> index 3499d63..140c575 100644
> --- a/rhel/openvswitch-fedora.spec.in
> +++ b/rhel/openvswitch-fedora.spec.in
> @@ -472,6 +472,7 @@ fi
>  %{_mandir}/man1/ovsdb-tool.1*
>  %{_mandir}/man5/ovs-vswitchd.conf.db.5*
>  %{_mandir}/man5/vtep.5*
> +%{_mandir}/man7/ovs-fields.7.*
>  %{_mandir}/man8/vtep-ctl.8*
>  %{_mandir}/man8/ovs-appctl.8*
>  %{_mandir}/man8/ovs-bugtool.8*
> --
> 2.9.3
>


Re: [ovs-dev] [PATCH 1/3] rhel: Fix ifdown for OVSDPDKBond.

2017-01-30 Thread Daniele Di Proietto





On 26/01/2017 11:11, "Aaron Conole"  wrote:

>Daniele Di Proietto  writes:
>
>> The OVSDPDKBond case wasn't handled in the rhel ifdown script.
>>
>> Fixes: f6bf8880613a ("rhel: Add support DPDK port creation via network scripts")
>> Signed-off-by: Daniele Di Proietto 
>> ---
>
>D'oh!
>
>Acked-by: Aaron Conole 

Thanks!  Pushed to master, branch-2.{7,6,5}


Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-30 Thread Daniele Di Proietto
2017-01-26 9:55 GMT-08:00 Loftus, Ciara :
>>
>> 2017-01-25 7:52 GMT-08:00 Loftus, Ciara :
>> >> 2017-01-22 11:45 GMT-08:00 Jan Scheurich :
>> >> >
>> >> >> It's not a big deal, since the most important use case we have for
>> >> >> dpif-netdev is with dpdk, but I'd still like the code to behave
>> >> >> similarly on different platforms.  How about defining a function that
>> >> >> uses random_uint32 when compiling without DPDK?
>> >> >>
>> >> >> For testing it's not that simple, because unit tests can be run with
>> >> >> or without DPDK.  It would need to be configurable at runtime.
>> >> >> Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
>> >> >> help people that want to experiment with different values, even
>> >> >> though, based on the comments, I guess they wouldn't really see much
>> >> >> difference.
>> >> >>
>> >> >> Again, what do you think about simply counting the packets and
>> >> >> inserting only 1 every EM_FLOW_INSERT_PROB?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Daniele
>> >> >
>> >> >
>> >> > As far as I know Ciara did some quick tests with a counter-based
>> >> > implementation and it performed 5% worse for 1K and 4K flows than the
>> >> > current patch. Perhaps we could find the reason for that and fix it, but I
>> >> > also feel uncomfortable with deterministic insertion of every Nth flow. This
>> >> > could lead to very strange lock-step phenomena with typical artificial test
>> >> > workloads, which often generate flows round-robin. I would rather use a
>> >> > random function, as you suggest, or count "cycles" differently when
>> >> > compiling without DPDK.
>> >>
>> >> Ok, using another pseudo-random function when compiling without DPDK
>> >> sounds good to me.
>> >>
>> >
>> > Any suggestions for the random function?
>>
>> I think we can use random_uint32() from lib/random.h
>>
>> >
>> >> >
>> >> > I agree with making the parameter EM_FLOW_INSERT_PROB configurable for
>> >> > unit test or other purposes. Should it be a new option in the OpenvSwitch
>> >> > table in OVSDB or rather a run-time parameter to be changed with
>> >> > ovs-appctl?
>> >>
>> >> I think a new option in Openvswitch other_config would be appropriate.
>> >
>> > I like this idea. I've started making these changes. How about something
>> > like the following?..
>> >
>> > +  > > +  type='{"type": "integer", "minInteger": 0, "maxInteger":
>> 4294967295}'>
>> > +
>> > +  Specifies the probability (1/emc-insert-prob) of a flow being
>> > +  inserted into the Exact Match Cache (EMC). Higher values of
>> > +  emc-insert-prob will result in fewer insertions, and lower
>> > +  values will result in more insertions. A value of zero will
>> > +  result in no insertions and essentially disable the EMC.
>> > +
>> > +
>> > +  Defaults to 100, i.e. there is a 1/100 chance of EMC insertion.
>>
>> Looks good to me, thanks.
>>
>> I would also add that this only applies to 'netdev' bridges (userspace) and
>> that a value of 1 means that every flow is going to be sent to the EMC.
>
> Thanks Daniele. I've posted a v2. I'm not sure it's the ideal approach, so I
> would appreciate your feedback if you get a chance.
>
> On a separate note, I'm wondering whether now would be a good time to consider
> allowing the size of the EMC to be configurable, i.e. allowing EM_FLOW_HASH_SHIFT
> or a similar value to be modified. Whatever scheme is followed for
> modifying the insert probability could probably easily be reused for EMC
> sizing too. Just an idea!

By the way, I forgot to mention that I like this idea.  Hopefully it
shouldn't introduce any overhead.

Thanks,

Daniele

>
> Thanks,
> Ciara
>
>>
>> >
>> > Thanks,
>> > Ciara
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Daniele
>> >>
>> >> >
>> >> > Jan
>> >> >

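Jan's concern in this thread, that deterministic every-Nth insertion can lock step with round-robin test traffic, can be illustrated with a toy simulation. The setup sends 100 flows round-robin, treats every packet as an EMC miss for simplicity, and selects 1 in 100 misses for insertion. A counter then always selects the same flow, while a random draw spreads insertions across flows. Everything here (the constants, xorshift32() as a stand-in for random_uint32(), and distinct_inserted()) is hypothetical, and the simulation is not a benchmark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_FLOWS      100    /* Flows generated round-robin. */
#define INSERT_EVERY 100    /* Insert on 1 of every 100 misses. */

/* Hypothetical stand-in PRNG (Marsaglia xorshift32); state must be nonzero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns how many distinct flows are ever selected for insertion over
 * 'n_packets' misses, using a counter (use_random == 0) or a random draw. */
static int
distinct_inserted(int use_random, int n_packets)
{
    int inserted[N_FLOWS];
    uint32_t counter = 0;
    uint32_t seed = 2463534242u;
    const uint32_t threshold = UINT32_MAX / INSERT_EVERY;
    int n = 0;

    memset(inserted, 0, sizeof inserted);
    for (int i = 0; i < n_packets; i++) {
        int flow = i % N_FLOWS;    /* Round-robin traffic. */
        int insert = use_random ? xorshift32(&seed) <= threshold
                                : ++counter % INSERT_EVERY == 0;
        if (insert) {
            inserted[flow] = 1;
        }
    }
    for (int f = 0; f < N_FLOWS; f++) {
        n += inserted[f];
    }
    return n;
}
```

With the counter, only flow 99 is ever chosen, so distinct_inserted(0, 100000) returns 1, whereas the random policy reaches many different flows.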
