date:20160727

Re: [ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-27 Thread Joe Stringer

On 27 July 2016 at 19:01, Daniele Di Proietto  wrote:
>
>
>
>
>
> On 27/07/2016 17:14, "Joe Stringer"  wrote:
>
>>On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
>>> This commit adds a thread that periodically removes expired connections.
>>>
>>> The expiration time of a connection can be expressed by:
>>>
>>> expiration = now + timeout
>>>
>>> For each possible 'timeout' value (there aren't many) we keep a list.
>>> When the expiration is updated, we move the connection to the back of the
>>> corresponding 'timeout' list. This ways, the list is always ordered by
>>> 'expiration'.
>>>
>>> When the cleanup thread iterates through the lists for expired
>>> connections, it can stop at the first non expired connection.
>>>
>>> Suggested-by: Joe Stringer 
>>> Signed-off-by: Daniele Di Proietto 
>>
>>Acked-by: Joe Stringer 
>
> Thanks for the review!
>
>>
>>Minor comments on comments below. Thanks!
>>
>>
>>
>>> +/* Cleanup:
>>> + *
>>
>>Extra line.
>
> Fixed
>
>>
>>> + * We must call conntrack_clean() periodically.  conntrack_clean() return
>>> + * value gives an hint on when the next cleanup must be done (either 
>>> because
>>> + * there is an actual connection that expires, or because a new connection
>>> + * might be created with the minimum timeout).
>>> + *
>>> + * The logic below has two goals:
>>> + *
>>> + * - Avoid calling conntrack_clean() too often.  If we call 
>>> conntrack_clean()
>>> + *   each time a connection expires, the thread will consume 100% CPU, so 
>>> we
>>> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to 
>>> batch
>>> + *   removal.
>>
>>Isn't it CT_CLEAN_MIN_INTERVAL that prevents calls happening too
>>often? I would imagine that under high cps conditions where you're
>>likely to peg 100% on cleanup, cleanup is behind and CT_CLEAN_INTERVAL
>>logic won't come into the picture.
>>
>>> + * - On the other hand, it's not a good idea to keep the buckets locked for
>>> + *   too long, as we might prevent traffic from flowing.  If 
>>> conntrack_clean()
>>> + *   returns a value which is in the past, it means that the internal limit
>>> + *   has been reached and more cleanup is required.  In this case, just 
>>> wait
>>> + *   CT_CLEAN_MIN_INTERVAL before the next call.
>>> + */
>>
>>Keeping the buckets locked too long also happens if you constantly
>>call conntrack_clean(), so I think these two paragraphs are arguing
>>slightly different angles for the same parameter.
>>
>>CT_CLEAN_MIN_INTERVAL ensures that if cleanup is behind, there is
>>atleast some 200ms blocks of time when buckets will be left alone so
>>the datapath can operate unhindered.
>>
>>CT_CLEAN_INTERVAL ensures that if we are coping with the current
>>cleanup tasks, then we wait at least 5 seconds to do further cleanup.
>>This seems like it's more targeted towards reducing wakeups when there
>>is a wide distribution of timeouts but relatively small number of
>>connections, that could be handled by less frequent cleanups.
>>
>>I like the "the logic has two goals" presentation of this, but maybe
>>there is a better way we can frame the comment above?
>
> I couldn't have said it better, I almost stole your wording entirely:
>
> + * The logic below has two goals:
> + *
> + * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
> + *   each time a connection expires, the thread will consume 100% CPU, so we
> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to 
> batch
> + *   removal.
> + *
> + * - On the other hand, it's not a good idea to keep the buckets locked for
> + *   too long, as we might prevent traffic from flowing.  If 
> conntrack_clean()
> + *   returns a value which is in the past, it means that the internal limit
> + *   has been reached and more cleanup is required.  In this case, just wait
> + *   CT_CLEAN_MIN_INTERVAL before the next call.

Is the new wording missing? This looks the same as the old wording.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-northd: create patch ports when necessary.

2016-07-27 Thread nickcooper-zhangtonghao


> On Jul 28, 2016, at 5:03 AM, Ben Pfaff  wrote:
> 
> On Mon, Jul 25, 2016 at 09:23:19AM -0700, nickcooper-zhangtonghao wrote:
>> If the logical router ports without 'peer' or the port named peer
>> is not created, it is unnecessary to insert the ports into the
>> southbound Port_Binding table.
>> 
>> Similarly, if logical switch ports of type 'router' don't contain 
>> 'router-port'
>> value, these ports will not be inserted.
>> 
>> This patch may optimize the process of 'build_ports'.
>> 
>> Signed-off-by: nickcooper-zhangtonghao 
>> 
> 
> This doesn't seem like a common case.  Is it worth optimizing?

It seems to me that it’s unnecessary to create the ports unused currently.

Thanks,
zhangtonghao

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak in execute_update.

2016-07-27 Thread nickcooper-zhangtonghao

Good, it is unnecessary to check again.

> On Jul 28, 2016, at 2:13 AM, William Tu  wrote:
> 
> Hi zhangtonghao,
> 
> Thanks for the patch!
> I think it's safe to call free(ptr) when ptr == NULL. The free() will
> be no operation.
> 
> Regards,
> William
> 
> On Tue, Jul 26, 2016 at 9:18 PM, nickcooper-zhangtonghao
>  wrote:
>> Hi William,
>> I reviewed your patch codes and found other bugs.
>> 
>> If the ‘ovsdb_condition_from_json’ return the error, cnd->clauses will be
>> set NULL, so ‘ovsdb_condition_destroy’ should check the 'cnd->clauses’ before
>> free it.
>> 
>> It is applied to ‘ovsdb_row_from_json’.
>> 
>> 
>> Signed-off-by: nickcooper-zhangtonghao 
>> 
>> ---
>> ovsdb/column.c  | 5 -
>> ovsdb/condition.c   | 6 +-
>> ovsdb/replication.c | 3 +++
>> 3 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/ovsdb/column.c b/ovsdb/column.c
>> index 8838df3..f01301f 100644
>> --- a/ovsdb/column.c
>> +++ b/ovsdb/column.c
>> @@ -129,7 +129,10 @@ ovsdb_column_set_init(struct ovsdb_column_set *set)
>> void
>> ovsdb_column_set_destroy(struct ovsdb_column_set *set)
>> {
>> -free(set->columns);
>> +if (set->columns) {
>> +free(set->columns);
>> +set->columns = NULL;
>> +}
>> }
>> 
>> void
>> diff --git a/ovsdb/condition.c b/ovsdb/condition.c
>> index 6da3b08..f337a37 100644
>> --- a/ovsdb/condition.c
>> +++ b/ovsdb/condition.c
>> @@ -485,7 +485,11 @@ ovsdb_condition_destroy(struct ovsdb_condition *cnd)
>> for (i = 0; i < cnd->n_clauses; i++) {
>> ovsdb_clause_free(>clauses[i]);
>> }
>> -free(cnd->clauses);
>> +
>> +if (cnd->clauses) {
>> +free(cnd->clauses);
>> +cnd->clauses = NULL;
>> +}
>> cnd->n_clauses = 0;
>> 
>> ovsdb_condition_optimize_destroy(cnd);
>> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
>> index 52b7085..bfd2ca1 100644
>> --- a/ovsdb/replication.c
>> +++ b/ovsdb/replication.c
>> @@ -568,6 +568,8 @@ execute_delete(struct ovsdb_txn *txn, const char *uuid,
>> }
>> 
>> ovsdb_condition_destroy();
>> +json_destroy(CONST_CAST(struct json *, where));
>> +
>> return error;
>> }
>> 
>> @@ -625,6 +627,7 @@ execute_update(struct ovsdb_txn *txn, const char *uuid,
>> ovsdb_row_destroy(row);
>> ovsdb_column_set_destroy();
>> ovsdb_condition_destroy();
>> +json_destroy(CONST_CAST(struct json *, where));
>> 
>> return error;
>> }
>> --
>> 1.8.3.1
>> 
>> 
>>> From: William Tu 
>>> To: dev@openvswitch.org
>>> Subject: [ovs-dev] [PATCH] ovsdb: Fix memory leak in execute_update.
>>> Message-ID: <1469582910-64371-1-git-send-email-u9012...@gmail.com>
>>> 
>>> Valgrind testcase 1804 ovsdb-server.at:1023 insert rows, update rows by 
>>> value
>>> reports the following leak.
>>>   json_from_string (json.c:1025)
>>>   execute_update (replication.c:614), similarily at execute_delete()
>>>   process_table_update (replication.c:502)
>>>   process_notification.part.5 (replication.c:445)
>>>   process_notification (replication.c:402)
>>>   check_for_notifications (replication.c:418)
>>>   replication_run (replication.c:110)
>>> 
>>> Signed-off-by: William Tu 
>>> ---
>>> ovsdb/replication.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
>>> index af7ae5c..fe89d39 100644
>>> --- a/ovsdb/replication.c
>>> +++ b/ovsdb/replication.c
>>> @@ -573,6 +573,8 @@ execute_delete(struct ovsdb_txn *txn, const char *uuid,
>>>}
>>> 
>>>ovsdb_condition_destroy();
>>> +json_destroy(CONST_CAST(struct json *, where));
>>> +
>>>return error;
>>> }
>>> 
>>> @@ -630,6 +632,7 @@ execute_update(struct ovsdb_txn *txn, const char *uuid,
>>>ovsdb_row_destroy(row);
>>>ovsdb_column_set_destroy();
>>>ovsdb_condition_destroy();
>>> +json_destroy(CONST_CAST(struct json *, where));
>>> 
>>>return error;
>>> }
>>> --
>>> 2.5.0
>> 



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-northd: Only peer router ports to other router ports.

2016-07-27 Thread nickcooper-zhangtonghao


> On Jul 28, 2016, at 5:01 AM, Ben Pfaff  wrote:
> 
> On Mon, Jul 25, 2016 at 09:19:00AM -0700, nickcooper-zhangtonghao wrote:
>> Deletion of a logical router port, which may be an other
>> router peer, result in a WARN, because the "ports" variable
>> still contains the logical port deleted. This port exists
>> but should not have been initialized fully.
>> 
>> nbsp == NULL, nbrp == NULL
>> 
>> It is necessary to check 'nbsp'.
>> 
>> A router port's "peer", if set, must point to another router port, but the
>> code as written also accepted switch ports.  This caused problems when
>> switch ports were actually specified.
>> 
>> Reported-by: Gurucharan Shetty 
>> Reported-at: http://openvswitch.org/pipermail/dev/2016-July/075524.html
>> Signed-off-by: Ben Pfaff 
>> Acked-by: Gurucharan Shetty 
>> Signed-off-by: nickcooper-zhangtonghao 
>> 
>> ---
>> ovn/northd/ovn-northd.c | 8 +++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index a3d1672..efc915c 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -746,9 +746,15 @@ join_logical_ports(struct northd_context *ctx,
>> } else if (op->nbrp && op->nbrp->peer) {
>> struct ovn_port *peer = ovn_port_find(ports, op->nbrp->peer);
>> if (peer) {
>> +/* Deletion of a logical router port, which may be an other
>> + * router peer, result in a WARN, because the "ports" 
>> variable
>> + * still contains the logical port deleted. This port 
>> exists 
>> + * but should not have been initialized fully.
>> + *
>> + * nbsp == NULL, nbrp == NULL */
>> if (peer->nbrp) {
>> op->peer = peer;
>> -} else {
>> +} else if (peer->nbsp) {
>> /* An ovn_port for a switch port of type "router" does 
>> have
>>  * a router port as its peer (see the case above for
>>  * "router" ports), but this is set via 
>> options:router-port
> 
> I don't understand this yet.  I don't see how an ovn_port's nbsp and
> nbrp can both be NULL.  I only see two calls to ovn_port_create(), and
> each of them supplies either nbsp or nbrp as nonnull.
> 
> Can you explain further?
> 
> Thanks,
> 
> Ben.


Run the commands below, you will get a WARN info in the ovn-northd.log

ovn-nbctl lr-add lr0
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr0 lr0-p0 00:11:22:33:44:55 192.168.10.10/24 peer=lr1-p0
ovn-nbctl lrp-add lr1 lr1-p0 00:11:22:33:44:66 192.168.20.10/24 peer=lr0-p0
ovn-nbctl lrp-del lr1-p0

WARN INFO:
“Bad configuration: The peer of router port lr0-p0 is a switch port.”


Here’s is why:
In the ‘join_logical_ports’ function,  firstly, we get the ‘ovn_port’s from 
SB database and save them to “ports” variable. The ovn_port’s nbsp and nbrp both
are NULL via ovn_port_create.

Next, in the HMAP_FOR_EACH  loop, we initialize the ‘ovn_port’s except which 
were deleted on the NB database. The ovn_port which is deleted in NB database 
is still in SB database. so, still in “ports” variable.

Then, when connected to their peer, the ovn_port deleted may be found as a peer.
But the ovn_port's nbsp and nbsp both are NULL.

Thanks,
zhangtonghao  





___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-controller: squelch expected duplicate flow warnings

2016-07-27 Thread Ryan Moats



Ben Pfaff  wrote on 07/27/2016 03:53:56 PM:

> From: Ben Pfaff 
> To: Guru Shetty 
> Cc: Ryan Moats/Omaha/IBM@IBMUS, ovs dev 
> Date: 07/27/2016 03:54 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: squelch expected
> duplicate flow warnings
>
> On Tue, Jul 26, 2016 at 01:54:29PM -0700, Guru Shetty wrote:
> > On 24 July 2016 at 10:07, Ryan Moats  wrote:
> >
> > > In the physical processing of ovn-controller, there are two
> > > sets of OF flows that are still fully recalculated every cycle:
> > >
> > >   Flows that aren't associated with any logical flow, and
> > >   Flows calculated based on multicast groups
> > >
> > > Because these flows are recalculated fully each cycle, full
> > > duplicates of existing OF flows are created and the OF management
> > > code in ovn-controller pollutes the logs with false positive
> > > warnings about repeated duplicates.
> > >
> > > As a short term measure, ignore full duplicates for both of
> > > these types of flows, but still warn if the action changes
> > > (as that is not expected and may be indicative of a problem).
> > >
> > > Signed-off-by: Ryan Moats 
> > >
> >
> > I also noticed that "commit 70c7cfef188b5ae9940abd5 (ovn-controller:
Add
> > incremental processing to lflow_run and physical_run)" causes load
> > balancing system unit tests to fail. A little debugging shows that
groups
> > are getting deleted when new flows are added.  My hunch is that this is
> > likely because 'desired_groups' in ofctl_put gets deleted in every run.
But
> > in the next run, it does not get updated as we no longer process all
flows.
>
> It's unclear to me from the discussion of this patch whether it's
> beneficial.  Can someone clarify?
>
> Thanks,
>
> Ben.

Unfortunately, I don't think we maintained reasonable email thread
discipline
on this discussion - for me, the vast majority of the conversation was
about
the CPU cycle bug that Guru mentions and not the patch itself.

The conditions stated in the commit message still apply even after fixing
Guru's issue and so I still see the squelching as being useful to avoid
polluting the ovn-controller logs with false positive messages...

Ryan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-27 Thread Daniele Di Proietto






On 27/07/2016 17:14, "Joe Stringer"  wrote:

>On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
>> This commit adds a thread that periodically removes expired connections.
>>
>> The expiration time of a connection can be expressed by:
>>
>> expiration = now + timeout
>>
>> For each possible 'timeout' value (there aren't many) we keep a list.
>> When the expiration is updated, we move the connection to the back of the
>> corresponding 'timeout' list. This ways, the list is always ordered by
>> 'expiration'.
>>
>> When the cleanup thread iterates through the lists for expired
>> connections, it can stop at the first non expired connection.
>>
>> Suggested-by: Joe Stringer 
>> Signed-off-by: Daniele Di Proietto 
>
>Acked-by: Joe Stringer 

Thanks for the review!

>
>Minor comments on comments below. Thanks!
>
>
>
>> +/* Cleanup:
>> + *
>
>Extra line.

Fixed

>
>> + * We must call conntrack_clean() periodically.  conntrack_clean() return
>> + * value gives an hint on when the next cleanup must be done (either because
>> + * there is an actual connection that expires, or because a new connection
>> + * might be created with the minimum timeout).
>> + *
>> + * The logic below has two goals:
>> + *
>> + * - Avoid calling conntrack_clean() too often.  If we call 
>> conntrack_clean()
>> + *   each time a connection expires, the thread will consume 100% CPU, so we
>> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to 
>> batch
>> + *   removal.
>
>Isn't it CT_CLEAN_MIN_INTERVAL that prevents calls happening too
>often? I would imagine that under high cps conditions where you're
>likely to peg 100% on cleanup, cleanup is behind and CT_CLEAN_INTERVAL
>logic won't come into the picture.
>
>> + * - On the other hand, it's not a good idea to keep the buckets locked for
>> + *   too long, as we might prevent traffic from flowing.  If 
>> conntrack_clean()
>> + *   returns a value which is in the past, it means that the internal limit
>> + *   has been reached and more cleanup is required.  In this case, just wait
>> + *   CT_CLEAN_MIN_INTERVAL before the next call.
>> + */
>
>Keeping the buckets locked too long also happens if you constantly
>call conntrack_clean(), so I think these two paragraphs are arguing
>slightly different angles for the same parameter.
>
>CT_CLEAN_MIN_INTERVAL ensures that if cleanup is behind, there is
>atleast some 200ms blocks of time when buckets will be left alone so
>the datapath can operate unhindered.
>
>CT_CLEAN_INTERVAL ensures that if we are coping with the current
>cleanup tasks, then we wait at least 5 seconds to do further cleanup.
>This seems like it's more targeted towards reducing wakeups when there
>is a wide distribution of timeouts but relatively small number of
>connections, that could be handled by less frequent cleanups.
>
>I like the "the logic has two goals" presentation of this, but maybe
>there is a better way we can frame the comment above?

I couldn't have said it better, I almost stole your wording entirely:

+ * The logic below has two goals:
+ *
+ * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
+ *   each time a connection expires, the thread will consume 100% CPU, so we
+ *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to batch
+ *   removal.
+ *
+ * - On the other hand, it's not a good idea to keep the buckets locked for
+ *   too long, as we might prevent traffic from flowing.  If conntrack_clean()
+ *   returns a value which is in the past, it means that the internal limit
+ *   has been reached and more cleanup is required.  In this case, just wait
+ *   CT_CLEAN_MIN_INTERVAL before the next call.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH, v5] Scanning only changed entries in the ovnsb

2016-07-27 Thread Hui Kang

- Improve performance by scanning only changed port binding entries
when determining whether to mark the logical switch port up or
down

---
v4->v5:
- rebase

v3->v4:
- Add an initialization function to scan all entries in Port_binding table
  when ovn-northd restarts or fails over

Signed-off-by: Hui Kang 
---
 ovn/northd/ovn-northd.c | 100 
 1 file changed, 68 insertions(+), 32 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 716f123..32c14a2 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -53,6 +53,11 @@ struct northd_context {
 struct ovsdb_idl_txn *ovnsb_txn;
 };
 
+struct lport_hash_node {
+struct hmap_node node;
+const struct nbrec_logical_switch_port *nbsp;
+};
+
 static const char *ovnnb_db;
 static const char *ovnsb_db;
 
@@ -3206,20 +3211,51 @@ ovnnb_db_run(struct northd_context *ctx, struct 
ovsdb_idl_loop *sb_loop)
 }
 }
 
+static void
+sb_chassis_update_nbsec(const struct sbrec_port_binding *sb,
+struct hmap *lports_hmap)
+
+{
+struct lport_hash_node *hash_node;
+const struct nbrec_logical_switch_port *nbsp;
+
+nbsp = NULL;
+HMAP_FOR_EACH_WITH_HASH(hash_node, node,
+hash_string(sb->logical_port, 0),
+lports_hmap) {
+if (!strcmp(sb->logical_port, hash_node->nbsp->name)) {
+nbsp = hash_node->nbsp;
+break;
+}
+}
+
+if (!nbsp) {
+/* The logical port doesn't exist for this port binding.  This can
+ * happen under normal circumstances when ovn-northd hasn't gotten
+ * around to pruning the Port_Binding yet. */
+return;
+}
+
+if (sb->chassis && (!nbsp->up || !*nbsp->up)) {
+bool up = true;
+nbrec_logical_switch_port_set_up(nbsp, , 1);
+} else if (!sb->chassis && (!nbsp->up || *nbsp->up)) {
+bool up = false;
+nbrec_logical_switch_port_set_up(nbsp, , 1);
+}
+}
+
 /* Handle changes to the 'chassis' column of the 'Port_Binding' table.  When
  * this column is not empty, it means we need to set the corresponding logical
  * port as 'up' in the northbound DB. */
 static void
-update_logical_port_status(struct northd_context *ctx)
+update_logical_port_status(struct northd_context *ctx, bool is_init)
 {
 struct hmap lports_hmap;
 const struct sbrec_port_binding *sb;
 const struct nbrec_logical_switch_port *nbsp;
 
-struct lport_hash_node {
-struct hmap_node node;
-const struct nbrec_logical_switch_port *nbsp;
-} *hash_node;
+struct lport_hash_node *hash_node;
 
 hmap_init(_hmap);
 
@@ -3229,30 +3265,13 @@ update_logical_port_status(struct northd_context *ctx)
 hmap_insert(_hmap, _node->node, hash_string(nbsp->name, 
0));
 }
 
-SBREC_PORT_BINDING_FOR_EACH(sb, ctx->ovnsb_idl) {
-nbsp = NULL;
-HMAP_FOR_EACH_WITH_HASH(hash_node, node,
-hash_string(sb->logical_port, 0),
-_hmap) {
-if (!strcmp(sb->logical_port, hash_node->nbsp->name)) {
-nbsp = hash_node->nbsp;
-break;
-}
-}
-
-if (!nbsp) {
-/* The logical port doesn't exist for this port binding.  This can
- * happen under normal circumstances when ovn-northd hasn't gotten
- * around to pruning the Port_Binding yet. */
-continue;
+if (is_init) {
+SBREC_PORT_BINDING_FOR_EACH(sb, ctx->ovnsb_idl) {
+sb_chassis_update_nbsec(sb, _hmap);
 }
-
-if (sb->chassis && (!nbsp->up || !*nbsp->up)) {
-bool up = true;
-nbrec_logical_switch_port_set_up(nbsp, , 1);
-} else if (!sb->chassis && (!nbsp->up || *nbsp->up)) {
-bool up = false;
-nbrec_logical_switch_port_set_up(nbsp, , 1);
+} else {
+SBREC_PORT_BINDING_FOR_EACH_TRACKED(sb, ctx->ovnsb_idl) {
+sb_chassis_update_nbsec(sb, _hmap);
 }
 }
 
@@ -3354,13 +3373,14 @@ update_northbound_cfg(struct northd_context *ctx,
 
 /* Handle a fairly small set of changes in the southbound database. */
 static void
-ovnsb_db_run(struct northd_context *ctx, struct ovsdb_idl_loop *sb_loop)
+ovnsb_db_run(struct northd_context *ctx, struct ovsdb_idl_loop *sb_loop,
+ bool is_init)
 {
 if (!ctx->ovnnb_txn || !ovsdb_idl_has_ever_connected(ctx->ovnsb_idl)) {
 return;
 }
 
-update_logical_port_status(ctx);
+update_logical_port_status(ctx, is_init);
 update_northbound_cfg(ctx, sb_loop);
 }
 
@@ -3463,6 +3483,19 @@ add_column_noalert(struct ovsdb_idl *idl,
 ovsdb_idl_omit_alert(idl, column);
 }
 
+static void
+ovnsb_db_run_init(struct ovsdb_idl_loop *ovnsb_idl_loop)
+{
+struct northd_context ctx = {
+.ovnnb_idl = NULL,
+.ovnnb_txn = NULL,
+

Re: [ovs-dev] [PATCH] netdev-dpdk: Add Flow Control support.

2016-07-27 Thread Daniele Di Proietto

Thanks for the patch.

I agree, I'd be nice to document this in INSTALL.DPDK-ADVANCED.md

We should also document the new fields in vswitchd/vswitch.xml.

Probably it's better to use "true" and "false" rather that "on" and "off",
for consistency with other configuration options and so that we can use
smap_get_bool().

I assume it's ok to call rte_eth_dev_flow_ctrl_get()/_set() while the
device is transmitting/receiving.

Maybe it would be better to cache fc_conf in struct netdev_dpdk and call
_set() only if we have to make a change?

2016-07-22 6:18 GMT-07:00 Sugesh Chandran :

> Add support for flow-control(mac control frame) to DPDK enabled physical
> port types. By default, the flow-control is OFF on both rx and tx side.
> The flow control can be enabled/disabled either when adding a port to OVS
> or at run time.
>
> For eg:
> To enable flow control support at tx side while adding a port, add the
> 'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
>
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:tx-flow-ctrl=on'
>
> Similarly to enable rx flow control,
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:rx-flow-ctrl=on'
>
> And to enable the flow control auto-negotiation,
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=on'
>
> To turn ON the tx flow control at run time(After the port is being added
> to OVS), the command-line input will be,
>  'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=on'
>
> The flow control parameters can be turned off by setting 'off' to the
> respective parameter. To turn off the flow control at tx side,
>  'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=off'
>
> Signed-off-by: Sugesh Chandran 
> ---
>  lib/netdev-dpdk.c | 68
> +++
>  1 file changed, 68 insertions(+)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 85b18fd..74efd25 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -634,6 +634,67 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int
> n_rxq, int n_txq)
>  return diag;
>  }
>
> +static void
> +dpdk_eth_parse_flow_ctrl(struct netdev_dpdk *dev,
> + const struct smap *args,
> + struct rte_eth_fc_conf *fc_conf)
> + OVS_REQUIRES(dev->mutex)
>

Minor: the other thread safety annotations are not aligned with the
parameters, but are just indented four spaces.


> +{
> +int ret = 0;
> +int rx_fc_en = 0;
> +int tx_fc_en = 0;
> +const char *rx_flow_mode;
> +const char *tx_flow_mode;
> +const char *flow_autoneg;
> +enum rte_eth_fc_mode fc_mode_set[2][2] = {{RTE_FC_NONE,
> RTE_FC_TX_PAUSE},
> +  {RTE_FC_RX_PAUSE,
> RTE_FC_FULL}
> + };
> +
> +ret = rte_eth_dev_flow_ctrl_get(dev->port_id, fc_conf);
> +if (ret != 0) {
> +VLOG_DBG("cannot get flow control parameters on port=%d, err=%s",
> + dev->port_id, rte_strerror(ret));
>

I'm not sure, do we need to change 'ret' sign before passing it to
rte_strerror()?


> +return;
> +}
> +rx_flow_mode = smap_get(args, "rx-flow-ctrl");
> +tx_flow_mode = smap_get(args, "tx-flow-ctrl");
> +flow_autoneg = smap_get(args, "flow-ctrl-autoneg");
> +if (rx_flow_mode) {
> +if (!strcmp(rx_flow_mode, "on")) {
> +rx_fc_en = 1;
> +}
> +else if (!strcmp(rx_flow_mode, "off")) {
> +rx_fc_en = 0;
> +}
> +}
> +if (tx_flow_mode) {
> +if (!strcmp(tx_flow_mode, "on")) {
> +tx_fc_en =1;
> +}
> +else if (!strcmp(tx_flow_mode, "off")) {
> +tx_fc_en =0;
> +}
> +}
> +if (flow_autoneg) {
> +if (!strcmp(flow_autoneg, "on")) {
> +fc_conf->autoneg = 1;
> +}
> +else if (!strcmp(flow_autoneg, "off")) {
> +fc_conf->autoneg = 0;
> +}
> +}
> +fc_conf->mode = fc_mode_set[tx_fc_en][rx_fc_en];
> +}
> +
> +static void
> +dpdk_eth_flow_ctrl_config(struct netdev_dpdk *dev,
> +  struct rte_eth_fc_conf *fc_conf)
> +  OVS_REQUIRES(dev->mutex)
>

Minor: the other thread safety annotations are not aligned with the
parameters, but are just indented four spaces.


> +{
> +if (rte_eth_dev_flow_ctrl_set(dev->port_id, fc_conf) != 0) {
>

I'd drop the != 0


> +VLOG_ERR("Failed to enable flow control on device %d",
> dev->port_id);
>

VLOG_WARN is probably enough


> +}
> +}
>
>  static int
>  dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex)
> @@ -991,6 +1052,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const
> struct smap *args)
>  dev->requested_n_rxq = new_n_rxq;
>

Re: [ovs-dev] [PATCH v5 05/16] conntrack: Periodically delete expired connections.

2016-07-27 Thread Joe Stringer

On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
> This commit adds a thread that periodically removes expired connections.
>
> The expiration time of a connection can be expressed by:
>
> expiration = now + timeout
>
> For each possible 'timeout' value (there aren't many) we keep a list.
> When the expiration is updated, we move the connection to the back of the
> corresponding 'timeout' list. This ways, the list is always ordered by
> 'expiration'.
>
> When the cleanup thread iterates through the lists for expired
> connections, it can stop at the first non expired connection.
>
> Suggested-by: Joe Stringer 
> Signed-off-by: Daniele Di Proietto 

Acked-by: Joe Stringer 

Minor comments on comments below. Thanks!

> +/* Cleanup:
> + *

Extra line.

> + * We must call conntrack_clean() periodically.  conntrack_clean() return
> + * value gives an hint on when the next cleanup must be done (either because
> + * there is an actual connection that expires, or because a new connection
> + * might be created with the minimum timeout).
> + *
> + * The logic below has two goals:
> + *
> + * - Avoid calling conntrack_clean() too often.  If we call conntrack_clean()
> + *   each time a connection expires, the thread will consume 100% CPU, so we
> + *   try to call the function _at most_ once every CT_CLEAN_INTERVAL, to 
> batch
> + *   removal.

Isn't it CT_CLEAN_MIN_INTERVAL that prevents calls happening too
often? I would imagine that under high cps conditions where you're
likely to peg 100% on cleanup, cleanup is behind and CT_CLEAN_INTERVAL
logic won't come into the picture.

> + * - On the other hand, it's not a good idea to keep the buckets locked for
> + *   too long, as we might prevent traffic from flowing.  If 
> conntrack_clean()
> + *   returns a value which is in the past, it means that the internal limit
> + *   has been reached and more cleanup is required.  In this case, just wait
> + *   CT_CLEAN_MIN_INTERVAL before the next call.
> + */

Keeping the buckets locked too long also happens if you constantly
call conntrack_clean(), so I think these two paragraphs are arguing
slightly different angles for the same parameter.

CT_CLEAN_MIN_INTERVAL ensures that if cleanup is behind, there is
atleast some 200ms blocks of time when buckets will be left alone so
the datapath can operate unhindered.

CT_CLEAN_INTERVAL ensures that if we are coping with the current
cleanup tasks, then we wait at least 5 seconds to do further cleanup.
This seems like it's more targeted towards reducing wakeups when there
is a wide distribution of timeouts but relatively small number of
connections, that could be handled by less frequent cleanups.

I like the "the logic has two goals" presentation of this, but maybe
there is a better way we can frame the comment above?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH, v4] Scanning only changed entries in the ovnsb

2016-07-27 Thread Hui Kang



"dev"  wrote on 07/27/2016 05:53:17 PM:

> From: Ben Pfaff 
> To: Hui Kang 
> Cc: dev@openvswitch.org
> Date: 07/27/2016 05:53 PM
> Subject: Re: [ovs-dev] [PATCH, v4] Scanning only changed entries in the
ovnsb
> Sent by: "dev" 
>
> On Tue, Jul 26, 2016 at 10:25:07PM -0400, Hui Kang wrote:
> > - Improve performance by scanning only changed port binding entries
> > when determining whether to mark the logical switch port up or
> > down
> >
> > v3->v4:
> > - Add an initialization function to scan all entries in Port_binding
table
> >   when ovn-northd restarts or fails over
> >
> > Signed-off-by: Hui Kang 
>
> Hi, thanks for the revision.  This gets a few patch rejects, would you
> mind rebasing?

Not a problem; will submit an update soon.

- Hui

>
> Thanks,
>
> Ben.
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] ovn-controller: Restore all_lports for update_ct_zone

2016-07-27 Thread Ryan Moats


Russell Bryant  wrote on 07/27/2016 04:13:34 PM:

> From: Russell Bryant 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: ovs dev 
> Date: 07/27/2016 04:14 PM
> Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: Restore all_lports
> for update_ct_zone
>
> On Wed, Jul 27, 2016 at 8:42 AM, Ryan Moats  wrote:
> As [1] indicates, commit 263064a (Convert binding_run
> to incremental processing.) incorrectly localized
> all_lports to the binding module, leaving an empty
> set for update_ct_zone to work with.  This patch
> restores all_lports processing to what existed prior
> to that patch.
>
> [1] http://openvswitch.org/pipermail/dev/2016-July/076171.html
>
> Signed-off-by: Ryan Moats 
> ---
>  v1->v2: Added missing reference to commit message
>
>  ovn/controller/binding.c        | 30 +-
>  ovn/controller/binding.h        |  3 ++-
>  ovn/controller/ovn-controller.c |  3 ++-
>  3 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index e83c1d5..82ee3ba 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -230,7 +230,8 @@ consider_local_datapath(struct controller_ctx *ctx,
>
>  void
>  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge
*br_int,
> -            const char *chassis_id, struct hmap *local_datapaths)
> +            const char *chassis_id, struct sset *all_lports,
> +            struct hmap *local_datapaths)
>  {
>      const struct sbrec_chassis *chassis_rec;
>      const struct sbrec_port_binding *binding_rec;
> @@ -251,6 +252,33 @@ binding_run(struct controller_ctx *ctx, const
> struct ovsrec_bridge *br_int,
>          process_full_binding = true;
>      }
>
> +    struct shash_node *node;
> +    SHASH_FOR_EACH (node, _to_iface) {
> +        sset_add(all_lports, node->name);
> +    }
>
> I think you could just sset_clone() local_ids into all_lports and
> get the same result.

You are correct

>
> +
> +    /* Run through all binding records to collect lists of lports
> +       for later use in updating ct zones. */
> +    SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
> +        const struct ovsrec_interface *iface_rec
> +            = shash_find_data(_to_iface, binding_rec->
logical_port);
> +        if (iface_rec
> +            || (binding_rec->parent_port && binding_rec->parent_port[0]
&&
> +                sset_contains(all_lports, binding_rec->parent_port))) {
> +            if (binding_rec->parent_port && binding_rec->parent_port[0])
{
> +                /* Add child logical port to the set of all local ports.
*/
> +                sset_add(all_lports, binding_rec->logical_port);
> +            }
> +        } else if (!binding_rec->chassis
> +                   && !strcmp(binding_rec->type, "localnet")) {
> +            /* localnet ports will never be bound to a chassis, but we
want
> +             * to list them in all_lports because we want to allocate
> +             * a conntrack zone ID for each one, as we'll be creating
> +             * a patch port for each one. */
> +            sset_add(all_lports, binding_rec->logical_port);
> +        }
> +    }
> +
>
> This seems to undo the benefits of the original change to do
> "incremental procesing" in binding.c.
>
> It seems like we weren't that far from a complete fix in Babu's first
patch.

We weren't/aren't - the last piece is how to handle persisting the
information
in the above code snippet, which could then be appended to the cloned
local_ids
for the final result.  What I thought of during the day (and I was still
looking
for something more efficient) would be to use a hashmap that included the
binding_rec uuid and the logical_port string - then I could just check to
see if
the string already existed on the add side, and remove the value based on
the
uuid on the delete side.

Ryan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn: Add ovn-controller-vtep debian package

2016-07-27 Thread Ryan Moats



Ben Pfaff  wrote on 07/27/2016 03:27:22 PM:

> From: Ben Pfaff 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/27/2016 03:27 PM
> Subject: Re: [ovs-dev] [PATCH] ovn: Add ovn-controller-vtep debian
package
>
> On Sat, Jul 23, 2016 at 10:38:34AM -0500, Ryan Moats wrote:
> > Having a separate debian package for deploying
> > the ovn-controller-vtep binary enables the ability
> > to assign specific nodes the role of communicating
> > with VTEP enabled TORs.
> >
> > Change-Id: Ia36aea7d89bd011a57918820b2a9f6e3469b3e04
> > Signed-off-by: Ryan Moats 
>
> Should there be an initscript to start and stop it?
>
> There's inconsistent use of tabs in the automake.mk change.
>

Indeed there should be an initscript and I see the tab problems.
I'll spin a V2 at some point over the next couple of days...

Ryan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH ovn-controller] Physical: persist tunnels

2016-07-27 Thread Ryan Moats



Ben Pfaff  wrote on 07/27/2016 04:14:07 PM:

> From: Ben Pfaff 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/27/2016 04:14 PM
> Subject: Re: [ovs-dev] [PATCH ovn-controller] Physical: persist tunnels
>
> On Mon, Jul 25, 2016 at 04:28:52PM +, Ryan Moats wrote:
> > While commit ab39371d68842b7e4000cc5d8718e6fc04e92795
> > (ovn-controller: Handle physical changes correctly) addressed
> > unit test failures, it did so at the cost of performance: [1]
> > notes that ovn-controller cpu usage is now pegged at 100%.
> >
> > Root cause of this is that while the storage for tunnels is
> > persisted, their creation is not (which the above changed
> > incorrectly assumed was the case).  This patch persists
> > tunneled data across invocations of physical_run.  A side
> > effect is that renaming of localfvif_map_changed variable
> > to physical_map_changed and extending its scope to include
> > tunnel changes.
> >
> > [1] http://openvswitch.org/pipermail/dev/2016-July/076058.html
> >
> > Signed-off-by: Ryan Moats 
>
> Thanks for the fix.
>
> I applied this to master, folding in the following incremental which is
> mostly style but I think that the extra check on tun->type is a bug fix
> albeit for a corner case.  Please take a look.
>
> --8<--cut here-->8--

I think that's fine

and if it isn't we'll figure it out...

Ryan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] debian: Add six dependency to python-openvswitch.

2016-07-27 Thread Joe Stringer

On 27 July 2016 at 14:50, Ben Pfaff  wrote:
> On Tue, Jul 26, 2016 at 12:34:16PM -0700, Joe Stringer wrote:
>> python-openvswitch uses the python "six" library, add a dependency for
>> this to the debian package.
>>
>> VMware-BZ: #1700259
>> Reported-by: Devang Doshi 
>> Signed-off-by: Joe Stringer 
>
> Acked-by: Ben Pfaff 

Thanks, applied.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/2] route-table: flush addresses list when route table is reset

2016-07-27 Thread Thadeu Lima de Souza Cascardo

On Wed, Jul 27, 2016 at 01:48:47PM -0700, Ben Pfaff wrote:
> On Sun, Jul 24, 2016 at 01:07:27PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When the route table is reset, the addresses list may be out of date, as we 
> > race
> > for the many netlink socket notifications.
> > 
> > A quick fix for this is flushing the addresses list, before dumping the 
> > routes
> > and gathering source addresses for them.
> > 
> > That way, instead of using invalid source addresses or preventing an entry 
> > to be
> > added because of missing source addresses, repeated tests showed the correct
> > entry is always added.
> > 
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> 
> This gives me a compiler error:
> ../lib/route-table.c:154:5: error: implicit declaration of function 
> 'netdev_get_addrs_list_flush' is invalid in C99 
> [-Werror,-Wimplicit-function-declaration]
> 
> I also noticed that netdev_get_addrs_list_flush() is not available on
> Windows.
> 
> Thanks,
> 
> Ben.

Hum... I'll take a look at that and resubmit.

Thanks.
Cascardo.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb-client: Fix memory leak reported by Valgrind.

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 08:24:57PM -0700, William Tu wrote:
> Testcase 1857: ovsdb-monitor.at:538 monitor-cond-change reports the
> following definitely memory leak:
> ovsdb_schema_create (ovsdb.c:34)
> ovsdb_schema_from_json (ovsdb.c:196)
> fetch_schema (ovsdb-client.c:385)
> do_monitor_cond (ovsdb-client.c:1112)

This is the kind of pseudo-leak that doesn't matter because the program
exists immediately.  But we might as well fix it.

> Signed-of-by: William Tu 

I corrected the spelling of Signed-"off"-by here.

Thanks, I applied this to master.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] debian: Add six dependency to python-openvswitch.

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 12:34:16PM -0700, Joe Stringer wrote:
> python-openvswitch uses the python "six" library, add a dependency for
> this to the debian package.
> 
> VMware-BZ: #1700259
> Reported-by: Devang Doshi 
> Signed-off-by: Joe Stringer 

Acked-by: Ben Pfaff 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak reported by Valgrind.

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 08:12:36PM -0700, William Tu wrote:
> Valgrind testcase 1967: simple idl, conditional, modify as delete due
> to condition - C reports the following leak:
> json_array_create_empty (json.c:185)
> json_parser_push_array (json.c:1234)
> json_parser_input (json.c:1328)
> json_lex_input (json.c:945)
> json_parser_feed (json.c:1103)
> json_from_string (json.c:1025)
> parse_json (test-ovsdb.c:227)
> update_conditions (test-ovsdb.c:2324)
> do_idl (test-ovsdb.c:2389)
> ovs_cmdl_run_command (command-line.c:121)
> main (test-ovsdb.c:73)
> 
> Signed-off-by: William Tu 

Thanks, applied.

I changed "ovsdb:" to "test-ovsdb:" in the title to emphasize that this
is just a leak in a test case.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH, v4] Scanning only changed entries in the ovnsb

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 10:25:07PM -0400, Hui Kang wrote:
> - Improve performance by scanning only changed port binding entries
> when determining whether to mark the logical switch port up or
> down
> 
> v3->v4:
> - Add an initialization function to scan all entries in Port_binding table
>   when ovn-northd restarts or fails over
> 
> Signed-off-by: Hui Kang 

Hi, thanks for the revision.  This gets a few patch rejects, would you
mind rebasing?

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 0/5] netdev_open conflicting types

2016-07-27 Thread Daniele Di Proietto

Looks good to me, applied to master, thanks!

2016-07-27 8:06 GMT-07:00 Thadeu Lima de Souza Cascardo :

> Fix some uses of type on netdev_open.
>
> We established that there are two types: the database type and the netdev
> type.
> And that ofproto and dpif layers use the netdev type and return it when
> queried.
>
> Some calls to netdev_open should use NULL instead of system. If there is no
> netdev opened, system will be used as the default, but if a different type
> is
> opened, the caller does not expect it to be of type system.
>
> In other cases, the use of internal is incorrect and the appropriate
> port_open_type should be used instead.
>
> Finally, we make netdev_open return an error when a netdev of a different
> type
> than the one requested already exists.
>
> This will fix some bugs and alert users when conflicting interfaces exist
> on the
> system and the database. For example, when a user configures an interface
> with a
> type other than system, and there is a system interface with the same name.
>
> Thadeu Lima de Souza Cascardo (5):
>   in-band: use open_type when opening internal device
>   in-band: don't use system type when opening netdev
>   netdev-vport: don't use system type when opening netdev
>   dpif-netdev: use the open_type when creating the local port
>   netdev: do not allow devices to be opened with conflicting types
>
>  lib/dpif-netdev.c  | 11 ++-
>  lib/netdev-vport.c |  2 +-
>  lib/netdev.c   |  8 +++-
>  ofproto/in-band.c  |  5 +++--
>  tests/dpctl.at | 13 +++--
>  5 files changed, 24 insertions(+), 15 deletions(-)
>
> --
> 2.7.4
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] python: Send old values of the updated cols in notify for update2

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 11:28:14PM +0530, Numan Siddique wrote:
> When python IDL calls the "notify" function after processing the "update2"
> message from ovsdb-server, it is suppose to send the old values of the
> updated columns as the last parameter. But the recent commit "897c8064"
> sends the updated values. This breaks the behaviour.
> This patch fixes this issue. It also updates the description of
> the 'updates' param of the notify function to make it more clear.
> 
> Fixes: 897c8064 ("python: move Python idl to work with monitor_cond")
> Signed-off-by: Numan Siddique 

Thanks, applied!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] FAQ: Add contents section and enable internal links.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 10:16:17PM +0100, Bhanuprakash Bodireddy wrote:
> Add contents section to FAQ and enable internal links in doc for pretty
> printing on GitHub.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
> v1->v2
> * Rebase.
> * Change link handling to titles.

Much less ugly.  Thanks.

Applied to master.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] ovn: Support for GARP for NAT IPs via localnet

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 02:15:54AM -0700, Chandra S Vejendla wrote:
> In cases where a DNAT IP is moved to a new router or the SNAT IP is reused
> with a new mac address, the NAT IPs become unreachable because the external
> switches/routers have stale ARP entries. This commit
> aims to fix the problem by sending GARPs for NAT IPs via locanet
> 
> A new options key "nat-addresses" is added to the logical switch port of
> type router. The value for the key "nat-addresses" is the MAC address of the
> port followed by a list of SNAT & DNAT IPs.
> 
> Signed-off-by: Chandra Sekhar Vejendla 

Here are some suggested changes to fold in, to improve style and reduce
redundancy.

ovn-sb.xml should document the new option (that it copies from the NB
database).

I didn't spot any actual problems, but I don't feel like I did a good
review of this either.  I'd appreciate some comments from people who
better understand how NAT and OVN interact.  Justin?  Guru?

Thanks,

Ben.

--8<--cut here-->8--

diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index ffe7f1e..8a904a3 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -709,6 +709,19 @@ destroy_send_garps(void)
 shash_destroy_free_data(_garp_data);
 }
 
+static void
+add_garp(const char *name, ofp_port_t ofport,
+ const struct eth_addr ea, ovs_be32 ip)
+{
+struct garp_data *garp = xmalloc(sizeof *garp);
+garp->ea = ea;
+garp->ipv4 = ip;
+garp->announce_time = time_msec() + 1000;
+garp->backoff = 1;
+garp->ofport = ofport;
+shash_add(_garp_data, name, garp);
+}
+
 /* Add or update a vif for which GARPs need to be announced. */
 static void
 send_garp_update(const struct sbrec_port_binding *binding_rec,
@@ -738,24 +751,12 @@ send_garp_update(const struct sbrec_port_binding 
*binding_rec,
 char *name = xasprintf("%s-%s", binding_rec->logical_port,
 laddrs->ipv4_addrs[i].addr_s);
 garp = shash_find_data(_garp_data, name);
-free(name);
-
 if (garp) {
 garp->ofport = ofport;
-continue;
-}
-else {
-struct garp_data *garp = xmalloc(sizeof *garp);
-garp->ea = laddrs->ea;
-garp->ipv4 = laddrs->ipv4_addrs[i].addr;
-garp->announce_time = time_msec() + 1000;
-garp->backoff = 1;
-garp->ofport = ofport;
-char *name = xasprintf("%s-%s", binding_rec->logical_port,
-laddrs->ipv4_addrs[i].addr_s);
-shash_add(_garp_data, name, garp);
-free(name);
+} else {
+add_garp(name, ofport, laddrs->ea, laddrs->ipv4_addrs[i].addr);
 }
+free(name);
 }
 destroy_lport_addresses(laddrs);
 return;
@@ -777,13 +778,8 @@ send_garp_update(const struct sbrec_port_binding 
*binding_rec,
 continue;
 }
 
-struct garp_data *garp = xmalloc(sizeof *garp);
-garp->ea = laddrs.ea;
-garp->ipv4 = laddrs.ipv4_addrs[0].addr;
-garp->announce_time = time_msec() + 1000;
-garp->backoff = 1;
-garp->ofport = ofport;
-shash_add(_garp_data, binding_rec->logical_port, garp);
+add_garp(binding_rec->logical_port, ofport,
+ laddrs.ea, laddrs.ipv4_addrs[0].addr);
 
 destroy_lport_addresses();
 break;
@@ -922,8 +918,8 @@ get_nat_addresses_and_keys(struct sset *nat_address_keys,
 continue;
 }
 
-struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
-if (!extract_lsp_addresses((char *)nat_addresses_options, laddrs)) {
+struct lport_addresses laddrs = xmalloc(sizeof *laddrs);
+if (!extract_lsp_addresses(nat_addresses_options, laddrs)) {
 free(laddrs);
 continue;
 }
diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
index de54624..e4e3f51 100644
--- a/ovn/lib/ovn-util.c
+++ b/ovn/lib/ovn-util.c
@@ -72,13 +72,13 @@ add_ipv6_netaddr(struct lport_addresses *laddrs, struct 
in6_addr addr,
  *
  * The caller must call destroy_lport_addresses(). */
 bool
-extract_lsp_addresses(char *address, struct lport_addresses *laddrs)
+extract_lsp_addresses(const char *address, struct lport_addresses *laddrs)
 {
 memset(laddrs, 0, sizeof *laddrs);
 
-char *buf = address;
+const char *buf = address;
 int buf_index = 0;
-char *buf_end = buf + strlen(address);
+const char *buf_end = buf + strlen(address);
 if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT,
   ETH_ADDR_SCAN_ARGS(laddrs->ea))) {
 laddrs->ea = eth_addr_zero;
diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h
index e9f3ec2..7056781 100644
--- a/ovn/lib/ovn-util.h
+++ b/ovn/lib/ovn-util.h
@@

Re: [ovs-dev] [PATCH] ovs-numa: fixed cmask parse with 0x prefix

2016-07-27 Thread Daniele Di Proietto

Thanks for the patch.

We never accepted the 0x prefix for pmd-cpu-mask, but I guess there's no
harm in doing it and it might make user's life easier.

We always use braces, even for single statement, please read CodingStyle.md

https://github.com/openvswitch/ovs/blob/master/CodingStyle.md#statements

I cannot merge this unless you provide a signoff, the details and the
meaning is explained here:

https://github.com/openvswitch/ovs/blob/master/CONTRIBUTING.md#developers-certificate-of-origin

Thanks,

Daniele

2016-07-26 14:56 GMT-07:00 Wei Shen :

> Fixed a minor bug that would print out a confusing warning about core mask,
> "ovs_numa|WARN|Invalid cpu mask: x", when dpdl-lcore-mask has 0x prefix,
> e.g.
> 0x123, which is the convention used in INSTALL.DPDK.md.
> ---
>  lib/ovs-numa.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> index c8173e0..c1938eb 100644
> --- a/lib/ovs-numa.c
> +++ b/lib/ovs-numa.c
> @@ -551,6 +551,10 @@ ovs_numa_set_cpu_mask(const char *cmask)
>  return;
>  }
>
> +/* Skip 0x if supplied in the cmask */
> +if (!strncmp(cmask, "0x", 2))
> +cmask += 2;
> +
>  for (i = strlen(cmask) - 1; i >= 0; i--) {
>  char hex = toupper((unsigned char)cmask[i]);
>  int bin, j;
> --
> 2.5.5
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-controller: update_ct_zone operates always on empty set

2016-07-27 Thread Russell Bryant

On Wed, Jul 27, 2016 at 1:44 AM, Babu Shanmugam  wrote:

>
>
> On Wednesday 27 July 2016 06:43 AM, Russell Bryant wrote:
>
> On Tue, Jul 26, 2016 at 6:46 AM,  wrote:
>
>> From: Babu Shanmugam 
>>
>> Commit 263064a (Convert binding_run to incremental processing.) removed
>> the usage
>> of all_lports from binding_run, but it is infact used in the context of
>> the caller,
>> especially by update_ct_zones().
>>
>> Without this change, update_ct_zones operates on an empty set always.
>>
>> Signed-off-by: Babu Shanmugam 
>>
>
> Ouch. This is a really bad regression.  If I understand correctly, we're
> not setting a ct zone ID for any logical ports.  All are just using the
> default zone of 0.
>
> Yes Russell, your understanding is correct.
>
> We should think about a good way to test OVN's use of conntrack zones to
> ensure that entries end up in separate zones for separate ports.  A good
> test for that may require userspace conntrack support, though.
>  Another test we could do now would be looking at the flows in table 0 and
> ensuring that the input flow for each port has a different conntrack zone
> ID assigned.  That feels like kind of a hack, though.
>
> I agree that we need more test cases. I could not spend much time to
> figure out a proper approach for a test case. I will have a look at it.
>

FYI, the patch doesn't apply for some reason.  It's small enough that I can
do it manually, though.

I can take a look at this tomorrow to add the missing bits that I pointed
out unless you beat me to it.  I know you've got a few other things you're
working on, as well.

-- 
Russell Bryant
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] move ovn/lib/<lex|actions|expr>.h to include/ovn

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 03:04:32PM -0700, Aaron Rosen wrote:
> This patch is done to enable in tree building of the ovn-utils python
> wrapper. This is similar to what was done in:
> ee89ea7b477bb4fd05137de03b2e8443807ed9f4
> 
> Signed-off-by: Aaron Rosen 

Applied, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] FAQ: Add contents section and enable internal links.

2016-07-27 Thread Bodireddy, Bhanuprakash

>-Original Message-
>From: Ben Pfaff [mailto:b...@ovn.org]
>Sent: Wednesday, July 27, 2016 8:59 PM
>To: Bodireddy, Bhanuprakash 
>Cc: dev@openvswitch.org
>Subject: Re: [PATCH] FAQ: Add contents section and enable internal links.
>
>On Wed, Jul 27, 2016 at 08:48:12PM +0100, Bhanuprakash Bodireddy wrote:
>> Add contents section to FAQ and enable internal links in doc for
>> pretty printing on GitHub.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> 
>> ---
>> Reviewers can review the rendered form here
>> https://github.com/bbodired/ovs/blob/master/FAQ.md
>
>This makes the FAQ less readable as a text file, because it's harder to spot
>
>##  2. Releases
>
>than
>
>Releases
>
>
>Is there a way to link to a title?
I found a better way to handle the links to titles. Sent v2 patch.

Regards,
Bhanu Prakash.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 01:51:00PM -0700, Jesse Gross wrote:
> On Wed, Jul 27, 2016 at 1:40 PM, Daniele Di Proietto
>  wrote:
> > On 27/07/2016 13:12, "Joe Stringer"  wrote:
> >
> >>On 26 July 2016 at 17:58, Daniele Di Proietto  
> >>wrote:
> >>> The userspace connection tracker doesn't support ALGs, frag reassembly
> >>> or NAT yet, so skip those tests.
> >>>
> >>> Also, connection tracking state input from a local port is not possible
> >>> in userspace.
> >>>
> >>> The userspace datapath pads all frames with 0, to make them at
> >>> least 64 bytes.
> >>>
> >>> Finally, the userspace datapath checks for the IPv4 header checksum, so
> >>> fix those in the hardcoded packets.
> >>>
> >>> Signed-off-by: Daniele Di Proietto 
> >>> Acked-by: Joe Stringer 
> >>> Acked-by: Flavio Leitner 
> >>> ---
> >>
> >>
> >>
> >>> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit 
> >>> "destination unreachable" response.
> >>>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 
> >>> 1"])
> >>>
> >>>  AT_CHECK([ovs-appctl revalidator/purge], [0])
> >>> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], 
> >>> [0], [dnl
> >>> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 
> >>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
> >>> - n_packets=1, n_bytes=72, 
> >>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
> >>> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 
> >>> actions=ct(table=0)
> >>> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
> >>> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | 
> >>> sed -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
> >>> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 
> >>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
> >>> + n_packets=1, n_bytes=, 
> >>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
> >>> + n_packets=1, n_bytes=, 
> >>> priority=100,ct_state=-trk,icmp,in_port=2 actions=ct(table=0)
> >>> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
> >>>  NXST_FLOW reply:
> >>>  ])
> >>
> >>I think this is a completely orthogonal point, but it's still a bit
> >>surprising to me that the n_bytes would differ when receiving short
> >>packets in kernel vs. userspace datapaths. I follow that userspace
> >>pads shorter packets on receive, but shouldn't we be able to attribute
> >>these stats consistently, regardless of the datapath?
> >
> > That's a good point.
> >
> > We call dp_packet_pad() in netdev_linux_recv().  That used to be in 
> > netdev_recv() and can be traced back to the initial OVS commit.  Here's a 
> > comment in netdev_recv():
> >
> > COVERAGE_INC(netdev_received);
> > buffer->size += n_bytes;
> >
> > /* When the kernel internally sends out an Ethernet frame on an
> >  * interface, it gives us a copy *before* padding the frame to the
> >  * minimum length.  Thus, when it sends out something like an ARP
> >  * request, we see a too-short frame.  So pad it out to the minimum
> >  * length. */
> > pad_to_minimum_length(buffer);
> 
> I wonder if anything in OVS actually cares about this anymore? I don't
> know the history of that comment.

I don't remember the origin anymore but it was probably my comment.
It's possible that some code in OVS assumed that packets were at least
64 bytes long at some point.  For example, flow_extract() might have
been able to be slightly simplified on the basis that access to the IPv4
header couldn't be beyond the end of the buffer.

I doubt we do that kind of optimization any longer.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH v2] FAQ: Add contents section and enable internal links.

2016-07-27 Thread Bhanuprakash Bodireddy

Add contents section to FAQ and enable internal links in doc for pretty
printing on GitHub.

Signed-off-by: Bhanuprakash Bodireddy 
---
v1->v2
* Rebase.
* Change link handling to titles.

 FAQ.md | 56 +++-
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index f685ee8..a7b159d 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -3,8 +3,22 @@ Frequently Asked Questions
 
 Open vSwitch 
 
-General

+## Contents
+
+- [General](#general)
+- [Releases](#releases)
+- [Terminology](#terminology)
+- [Basic configuration](#basic-configuration)
+- [Implementation Details](#implementation-details)
+- [Performance](#performance)
+- [Configuration Problems](#configuration-problems)
+- [QOS](#qos)
+- [VLANs](#vlans)
+- [VXLANs](#vxlans)
+- [Using OpenFlow](#using-openflow)
+- [Development](#development)
+
+## General
 
 ### Q: What is Open vSwitch?
 
@@ -118,9 +132,7 @@ A: Starting in OVS 2.4, we switched the default ports to the
cannot, all the programs allow overriding the default port.  See the
appropriate man page.
 
-
-Releases
-
+## Releases
 
 ### Q: What does it mean for an Open vSwitch release to be LTS (long-term 
support)?
 
@@ -356,8 +368,7 @@ A: Bridge compatibility was a feature of Open vSwitch 1.9 
and earlier.
the release.  Be sure to start the ovs-brcompatd daemon.
 
 
-Terminology

+## Terminology
 
 ### Q: I thought Open vSwitch was a virtual Ethernet switch, but the 
documentation keeps talking about bridges.  What's a bridge?
 
@@ -369,9 +380,7 @@ A: In networking, the terms "bridge" and "switch" are 
synonyms.  Open
 
 A: See the "VLAN" section below.
 
-
-Basic Configuration

+## Basic configuration
 
 ### Q: How do I configure a port as an access port?
 
@@ -578,9 +587,7 @@ A: First, why do you want to do this?  Two connected 
bridges are not
 A: Open vSwitch does not support such a configuration.
Bridges always have their local ports.
 
-
-Implementation Details
---
+## Implementation Details
 
 ### Q: I hear OVS has a couple of kinds of flows.  Can you tell me about them?
 
@@ -670,9 +677,7 @@ A: No.  There are several reasons:
   please read "The Design and Implementation of Open vSwitch",
   published in USENIX NSDI 2015.
 
-
-Performance

+## Performance
 
 ### Q: I just upgraded and I see a performance drop.  Why?
 
@@ -692,8 +697,7 @@ A: The OVS kernel datapath may have been updated to a newer 
version than
userspace.
 
 
-Configuration Problems
---
+## Configuration Problems
 
 ### Q: I created a bridge and added my Ethernet port to it, using commands
like these:
@@ -1048,8 +1052,7 @@ A: The short answer is that this is a misuse of a "tap" 
device.  Use
type "system", the default, instead).
 
 
-Quality of Service (QoS)
-
+## QOS
 
 ### Q: Does OVS support Quality of Service (QoS)?
 
@@ -1205,9 +1208,7 @@ A: Since version 2.0, Open vSwitch has OpenFlow protocol 
support for
vSwitch software switch (neither the kernel-based nor userspace
switches).
 
-
-VLANs
--
+## VLANs
 
 ### Q: What's a VLAN?
 
@@ -1480,8 +1481,7 @@ A: Open vSwitch implements Independent VLAN Learning 
(IVL) for
for each VLANs.
 
 
-VXLANs
--
+## VXLANs
 
 ### Q: What's a VXLAN?
 
@@ -1515,8 +1515,7 @@ A: By default, Open vSwitch will use the assigned IANA 
port for VXLAN, which
options:dst_port=8472
 
 
-Using OpenFlow (Manually or Via Controller)

+## Using OpenFlow
 
 ### Q: What versions of OpenFlow does Open vSwitch support?
 
@@ -2077,8 +2076,7 @@ A: By itself, the "learn" action can only put two kinds 
of actions
Applications".
 
 
-Development

+## Development
 
 ### Q: How do I implement a new OpenFlow message?
 
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn: Rename "gateway" to "l3gateway".

2016-07-27 Thread Russell Bryant

On Tue, Jul 26, 2016 at 5:07 PM, Kyle Mestery  wrote:

> On Tue, Jul 26, 2016 at 3:49 PM, Russell Bryant  wrote:
> > When L3 gateway support was added, it introduced a port type called
> > "gateway" and a corresponding option called "gateway-chassis".  Since
> > that time, we also have an L2 gateway port type called "l2gateway" and a
> > corresponding option called "l2gateway-chassis".  This patch renames the
> > L3 gateway port type and option to "l3gateway" and "l3gateway-chassis"
> > to make things a little more clear and consistent.
> >
> > Signed-off-by: Russell Bryant 
>
> This seems very reasonable to me Russell, and makes sense given the
> two disparate gateway port types.
>
> Acked-by: Kyle Mestery 
>

Thanks for the review, Kyle!

Guru, since you did the L3 gateway work, does this seem OK to you?

-- 
Russell Bryant
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] doc: Update INSTALL.Docker.md to reflect it's focus on OVN

2016-07-27 Thread Russell Bryant

On Wed, Jul 27, 2016 at 1:40 PM, Kyle Mestery  wrote:

> While reading this document, the title stood out to me as not
> accurate. The title indicates it will discuss how to use
> Open vSwitch with Docker, but in reality, it's about using
> Open Virtual Networking with Docker.
>
> This change updates the title, as well as the opening paragraphs
> to more accurately reflect what the document is talking about.
>
> Signed-off-by: Kyle Mestery 
>
>
Seems fine to me, but I think Guru should review it.

-- 
Russell Bryant
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH ovn-controller] Physical: persist tunnels

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 04:28:52PM +, Ryan Moats wrote:
> While commit ab39371d68842b7e4000cc5d8718e6fc04e92795
> (ovn-controller: Handle physical changes correctly) addressed
> unit test failures, it did so at the cost of performance: [1]
> notes that ovn-controller cpu usage is now pegged at 100%.
> 
> Root cause of this is that while the storage for tunnels is
> persisted, their creation is not (which the above changed
> incorrectly assumed was the case).  This patch persists
> tunneled data across invocations of physical_run.  A side
> effect is that renaming of localfvif_map_changed variable
> to physical_map_changed and extending its scope to include
> tunnel changes.
> 
> [1] http://openvswitch.org/pipermail/dev/2016-July/076058.html
> 
> Signed-off-by: Ryan Moats 

Thanks for the fix.

I applied this to master, folding in the following incremental which is
mostly style but I think that the extra check on tun->type is a bug fix
albeit for a corner case.  Please take a look.

--8<--cut here-->8--

diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index e788fe5..e5df4f2 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -610,7 +610,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id 
mff_ovn_geneve,
 
 struct simap new_localvif_to_ofport =
 SIMAP_INITIALIZER(_localvif_to_ofport);
-struct simap new_tunnel_to_ofport = 
+struct simap new_tunnel_to_ofport =
 SIMAP_INITIALIZER(_tunnel_to_ofport);
 for (int i = 0; i < br_int->n_ports; i++) {
 const struct ovsrec_port *port_rec = br_int->ports[i];
@@ -674,11 +674,13 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id 
mff_ovn_geneve,
 }
 
 simap_put(_tunnel_to_ofport, chassis_id, ofport);
-struct chassis_tunnel *tun;
-if ((tun = chassis_tunnel_find(chassis_id))) {
+struct chassis_tunnel *tun = chassis_tunnel_find(chassis_id);
+if (tun) {
 /* If the tunnel's ofport has changed, update. */
-if (tun->ofport != u16_to_ofp(ofport)) {
+if (tun->ofport != u16_to_ofp(ofport) ||
+tun->type != tunnel_type) {
 tun->ofport = u16_to_ofp(ofport);
+tun->type = tunnel_type;
 physical_map_changed = true;
 }
 } else {
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] ovn-controller: Restore all_lports for update_ct_zone

2016-07-27 Thread Russell Bryant

On Wed, Jul 27, 2016 at 8:42 AM, Ryan Moats  wrote:

> As [1] indicates, commit 263064a (Convert binding_run
> to incremental processing.) incorrectly localized
> all_lports to the binding module, leaving an empty
> set for update_ct_zone to work with.  This patch
> restores all_lports processing to what existed prior
> to that patch.
>
> [1] http://openvswitch.org/pipermail/dev/2016-July/076171.html
>
> Signed-off-by: Ryan Moats 
> ---
>  v1->v2: Added missing reference to commit message
>
>  ovn/controller/binding.c| 30 +-
>  ovn/controller/binding.h|  3 ++-
>  ovn/controller/ovn-controller.c |  3 ++-
>  3 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index e83c1d5..82ee3ba 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -230,7 +230,8 @@ consider_local_datapath(struct controller_ctx *ctx,
>
>  void
>  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge
> *br_int,
> -const char *chassis_id, struct hmap *local_datapaths)
> +const char *chassis_id, struct sset *all_lports,
> +struct hmap *local_datapaths)
>  {
>  const struct sbrec_chassis *chassis_rec;
>  const struct sbrec_port_binding *binding_rec;
> @@ -251,6 +252,33 @@ binding_run(struct controller_ctx *ctx, const struct
> ovsrec_bridge *br_int,
>  process_full_binding = true;
>  }
>
> +struct shash_node *node;
> +SHASH_FOR_EACH (node, _to_iface) {
> +sset_add(all_lports, node->name);
> +}
>

I think you could just sset_clone() local_ids into all_lports and get the
same result.


> +
> +/* Run through all binding records to collect lists of lports
> +   for later use in updating ct zones. */
> +SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
> +const struct ovsrec_interface *iface_rec
> += shash_find_data(_to_iface, binding_rec->logical_port);
> +if (iface_rec
> +|| (binding_rec->parent_port && binding_rec->parent_port[0] &&
> +sset_contains(all_lports, binding_rec->parent_port))) {
> +if (binding_rec->parent_port && binding_rec->parent_port[0]) {
> +/* Add child logical port to the set of all local ports.
> */
> +sset_add(all_lports, binding_rec->logical_port);
> +}
> +} else if (!binding_rec->chassis
> +   && !strcmp(binding_rec->type, "localnet")) {
> +/* localnet ports will never be bound to a chassis, but we
> want
> + * to list them in all_lports because we want to allocate
> + * a conntrack zone ID for each one, as we'll be creating
> + * a patch port for each one. */
> +sset_add(all_lports, binding_rec->logical_port);
> +}
> +}
> +
>

This seems to undo the benefits of the original change to do "incremental
procesing" in binding.c.

It seems like we weren't that far from a complete fix in Babu's first patch.

-- 
Russell Bryant
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority

2016-07-27 Thread Daniele Di Proietto

Thanks for the patch, the implementation looks good to me too.

During testing I kept noticing that it's way too easy to make OVS
completely unresponsive.  As you point out in the documentation by having
dpdk-lcore-mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill
-9 is required).  I wonder what happens if one tries to set pmd-cpu-mask to
every core in the system.

As a way to mitigate the risk perhaps we can avoid setting the main thread
affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
dpdk_init__(), also if auto_determine is false.

Perhaps we should start explicitly prohibiting creating a pmd thread on the
first core in dpdk-lcore-mask (I get why previous version of this didn't do
it on core 0.  Perhaps we can generalize that to the first core in
dpdk-lcore-mask).

What's the behavior of other DPDK applications?

Thanks,

Daniele

2016-07-27 5:28 GMT-07:00 Kavanagh, Mark B :

> >
> >Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> >priority to highest priority value of the policy. This is to deal with
> >pmd thread starvation case where another cpu hogging process can get
> >scheduled/affinitized on to the same core the pmd thread is running
> >there by significantly impacting the datapath performance.
> >
> >Setting the realtime scheduling policy to the pmd threads is one step
> >towards Fastpath Service Assurance in OVS DPDK.
> >
> >The realtime scheduling policy is applied only when CPU mask is passed
> >to 'pmd-cpu-mask'. For example:
> >
> >* In the absence of pmd-cpu-mask, one pmd thread shall be created
> >  and default scheduling policy and priority gets applied.
> >
> >* If pmd-cpu-mask is specified, one or more pmd threads shall be
> >  spawned on the corresponding core(s) in the mask and real time
> >  scheduling policy SCHED_RR and highest priority of the policy is
> >  applied to the pmd thread(s).
> >
> >To reproduce the pmd thread starvation case:
> >
> >ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> >taskset 0x2 cat /dev/zero > /dev/null &
> >
> >With this commit, it is recommended that the OVS control thread and pmd
> >thread shouldn't be pinned to same core ('dpdk-lcore-mask','pmd-cpu-mask'
> >should be non-overlapping). Also other processes with same affinity as
> >PMD thread will be unresponsive.
> >
> >Signed-off-by: Bhanuprakash Bodireddy 
>
> LGTM - Acked-by: mark.b.kavan...@intel.com
>
> >---
> >v4->v5:
> >* Reword Note section in DPDK-ADVANCED.md
> >
> >v3->v4:
> >* Document update
> >* Use ovs_strerror for reporting errors in lib-numa.c
> >
> >v2->v3:
> >* Move set_priority() function to lib/ovs-numa.c
> >* Apply realtime scheduling policy and priority to pmd thread only if
> >  pmd-cpu-mask is passed.
> >* Update INSTALL.DPDK-ADVANCED.
> >
> >v1->v2:
> >* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
> >  in netdev-dpdk.h
> >* Rebase
> >
> > INSTALL.DPDK-ADVANCED.md | 17 +
> > lib/dpif-netdev.c|  9 +
> > lib/ovs-numa.c   | 18 ++
> > lib/ovs-numa.h   |  1 +
> > 4 files changed, 41 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> >index 9ae536d..d76cb4e 100644
> >--- a/INSTALL.DPDK-ADVANCED.md
> >+++ b/INSTALL.DPDK-ADVANCED.md
> >@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
> > pmd thread is CPU bound, and needs to be affinitized to isolated
> > cores for optimum performance.
> >
> >-By setting a bit in the mask, a pmd thread is created and pinned
> >-to the corresponding CPU core. e.g. to run a pmd thread on core 2
> >+By setting a bit in the mask, a pmd thread is created, pinned
> >+to the corresponding CPU core and the scheduling policy SCHED_RR
> >+along with maximum priority of the policy applied to the pmd thread.
> >+e.g. to pin a pmd thread on core 2
> >
> > `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
> >
> >@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
> >   responsible for different ports/rxq's. Assignment of ports/rxq's to
> >   pmd threads is done automatically.
> >
> >-  A set bit in the mask means a pmd thread is created and pinned
> >-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
> >+  A set bit in the mask means a pmd thread is created, pinned to the
> >+  corresponding CPU core and the scheduling policy SCHED_RR with highest
> >+  priority of the scheduling policy applied to pmd thread.
> >+  e.g. to run pmd threads on core 1 and 2
> >
> >   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> >
> >@@ -246,6 +250,11 @@ needs to be affinitized accordingly.
> >
> >   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> >
> >+  Note: It is recommended that the OVS control thread and pmd thread
> >+  shouldn't be pinned to same core i.e 'dpdk-lcore-mask' and
>

Re: [ovs-dev] [PATCH] ovn-controller: update_ct_zone operates always on empty set

2016-07-27 Thread Russell Bryant

On Wed, Jul 27, 2016 at 8:34 AM, Ryan Moats  wrote:

> "dev"  wrote on 07/27/2016 12:44:38 AM:
>
> > From: Babu Shanmugam 
> > To: Russell Bryant 
> > Cc: ovs dev 
> > Date: 07/27/2016 12:45 AM
> > Subject: Re: [ovs-dev] [PATCH] ovn-controller: update_ct_zone
> > operates always on empty set
> > Sent by: "dev" 
> >
> >
> >
> > On Wednesday 27 July 2016 06:43 AM, Russell Bryant wrote:
> > > On Tue, Jul 26, 2016 at 6:46 AM,  > > >> wrote:
> > >
> > > From: Babu Shanmugam  > > >>
> > >
> > > Commit 263064a (Convert binding_run to incremental processing.)
> > > removed the usage
> > > of all_lports from binding_run, but it is infact used in the
> > > context of the caller,
> > > especially by update_ct_zones().
> > >
> > > Without this change, update_ct_zones operates on an empty set
> always.
> > >
> > > Signed-off-by: Babu Shanmugam  > > >>
> > >
> > >
> > > Ouch. This is a really bad regression.  If I understand correctly,
> > > we're not setting a ct zone ID for any logical ports.  All are just
> > > using the default zone of 0.
> > >
> > Yes Russell, your understanding is correct.
> > > We should think about a good way to test OVN's use of conntrack zones
> > > to ensure that entries end up in separate zones for separate ports.  A
> > > good test for that may require userspace conntrack support, though.
> > >  Another test we could do now would be looking at the flows in table 0
> > > and ensuring that the input flow for each port has a different
> > > conntrack zone ID assigned.  That feels like kind of a hack, though.
> > I agree that we need more test cases. I could not spend much time to
> > figure out a proper approach for a test case. I will have a look at it.
> >
> > Thank you,
> > Babu
> > ___
> > dev mailing list
> > dev@openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
> 
> http://patchwork.ozlabs.org/patch/653288/ replicates the
> all_lports code from binding.c prior to commit 263064a (I literally
> rolled a repo back to the commit before that one and then did a
> code inspection and copy/paste). Now, I still don't have a test case
> that shows the revert is fixed (because I frankly don't know how to
> write this one), but I believe that with the above patch set we are
> no longer using only the default zone.
>
Thanks.  Re-adding a full loop over all port bindings seems pretty
undesirable.  I think we can iterate on this original patch and get a full
solution without a ton of extra effort.

-- 
Russell Bryant
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-northd: create patch ports when necessary.

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 09:23:19AM -0700, nickcooper-zhangtonghao wrote:
> If the logical router ports without 'peer' or the port named peer
> is not created, it is unnecessary to insert the ports into the
> southbound Port_Binding table.
> 
> Similarly, if logical switch ports of type 'router' don't contain 
> 'router-port'
> value, these ports will not be inserted.
> 
> This patch may optimize the process of 'build_ports'.
> 
> Signed-off-by: nickcooper-zhangtonghao 
> 

This doesn't seem like a common case.  Is it worth optimizing?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-northd: Only peer router ports to other router ports.

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 09:19:00AM -0700, nickcooper-zhangtonghao wrote:
> Deletion of a logical router port, which may be an other
> router peer, result in a WARN, because the "ports" variable
> still contains the logical port deleted. This port exists
> but should not have been initialized fully.
> 
> nbsp == NULL, nbrp == NULL
> 
> It is necessary to check 'nbsp'.
> 
> A router port's "peer", if set, must point to another router port, but the
> code as written also accepted switch ports.  This caused problems when
> switch ports were actually specified.
> 
> Reported-by: Gurucharan Shetty 
> Reported-at: http://openvswitch.org/pipermail/dev/2016-July/075524.html
> Signed-off-by: Ben Pfaff 
> Acked-by: Gurucharan Shetty 
> Signed-off-by: nickcooper-zhangtonghao 
> 
> ---
>  ovn/northd/ovn-northd.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index a3d1672..efc915c 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -746,9 +746,15 @@ join_logical_ports(struct northd_context *ctx,
>  } else if (op->nbrp && op->nbrp->peer) {
>  struct ovn_port *peer = ovn_port_find(ports, op->nbrp->peer);
>  if (peer) {
> +/* Deletion of a logical router port, which may be an other
> + * router peer, result in a WARN, because the "ports" 
> variable
> + * still contains the logical port deleted. This port exists 
> + * but should not have been initialized fully.
> + *
> + * nbsp == NULL, nbrp == NULL */
>  if (peer->nbrp) {
>  op->peer = peer;
> -} else {
> +} else if (peer->nbsp) {
>  /* An ovn_port for a switch port of type "router" does 
> have
>   * a router port as its peer (see the case above for
>   * "router" ports), but this is set via 
> options:router-port

I don't understand this yet.  I don't see how an ovn_port's nbsp and
nbrp can both be NULL.  I only see two calls to ovn_port_create(), and
each of them supplies either nbsp or nbrp as nonnull.

Can you explain further?

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-controller: squelch expected duplicate flow warnings

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 01:54:29PM -0700, Guru Shetty wrote:
> On 24 July 2016 at 10:07, Ryan Moats  wrote:
> 
> > In the physical processing of ovn-controller, there are two
> > sets of OF flows that are still fully recalculated every cycle:
> >
> >   Flows that aren't associated with any logical flow, and
> >   Flows calculated based on multicast groups
> >
> > Because these flows are recalculated fully each cycle, full
> > duplicates of existing OF flows are created and the OF management
> > code in ovn-controller pollutes the logs with false positive
> > warnings about repeated duplicates.
> >
> > As a short term measure, ignore full duplicates for both of
> > these types of flows, but still warn if the action changes
> > (as that is not expected and may be indicative of a problem).
> >
> > Signed-off-by: Ryan Moats 
> >
> 
> I also noticed that "commit 70c7cfef188b5ae9940abd5 (ovn-controller: Add
> incremental processing to lflow_run and physical_run)" causes load
> balancing system unit tests to fail. A little debugging shows that groups
> are getting deleted when new flows are added.  My hunch is that this is
> likely because 'desired_groups' in ofctl_put gets deleted in every run. But
> in the next run, it does not get updated as we no longer process all flows.

It's unclear to me from the discussion of this patch whether it's
beneficial.  Can someone clarify?

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/3] ovsdb: Fix memory leak in replication logic

2016-07-27 Thread Andy Zhou

On Wed, Jul 27, 2016 at 12:48 PM, Andy Zhou  wrote:

>
>
> On Wed, Jul 27, 2016 at 12:43 PM, Ben Pfaff  wrote:
>
>> On Tue, Jul 26, 2016 at 01:08:06PM -0700, Andy Zhou wrote:
>> > Release the memory of reply message of the initial "monitor" request.
>> >
>> > Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076075.html
>> > Signed-off-by: Andy Zhou 
>> > ---
>> >  ovsdb/replication.c | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/ovsdb/replication.c b/ovsdb/replication.c
>> > index 3d589ef..af7ae5c 100644
>> > --- a/ovsdb/replication.c
>> > +++ b/ovsdb/replication.c
>> > @@ -365,6 +365,10 @@ get_initial_db_state(const struct db *database)
>> >  if (msg->type == JSONRPC_REPLY) {
>> >  process_notification(msg->result, database->db);
>> >  }
>> > +
>> > +if (msg) {
>> > +jsonrpc_msg_destroy(msg);
>> > +}
>> >  }
>>
>> The 'if' isn't needed.
>>
> Right,  jsonrpc_msg_destroy() checks for NULL internally.
>
> I will drop the 'if' before pushing.  Thanks for catching it.
>

Pushed to master.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/2] rhel: Allow openvswitch to get parent information

2016-07-27 Thread Joe Stringer

On 26 July 2016 at 14:10, Flavio Leitner  wrote:
> On Tue, Jul 26, 2016 at 12:57:07PM -0700, Joe Stringer wrote:
>> On 25 July 2016 at 18:16, Flavio Leitner  wrote:
>> > Updates SELinux to allow ovs-vsctl to get parent process
>> > information and log that to the database:
>> >
>> > record 241: 2016-07-26 00:59:47.418 "ovs-vsctl (invoked by /bin/bash
>> > (pid 1589)): ovs-vsctl -t 10 -- --if-exist ...
>> >
>> > Jul 25 12:57:35 localhost.localdomain audit[830]: AVC avc:  denied  {
>> > search } for  pid=830 comm="ovs-vsctl" name="731" dev="proc" ino=14140
>> > scontext=system_u:system_r:openvswitch_t:s0
>> > tcontext=system_u:system_r:initrc_t:s0 tclass=dir permissive=0
>> >
>> > Signed-off-by: Flavio Leitner 
>> > ---
>> >  selinux/openvswitch-custom.te | 5 +
>> >  1 file changed, 5 insertions(+)
>> >
>> > diff --git a/selinux/openvswitch-custom.te b/selinux/openvswitch-custom.te
>> > index fc32b97..5739595 100644
>> > --- a/selinux/openvswitch-custom.te
>> > +++ b/selinux/openvswitch-custom.te
>> > @@ -2,8 +2,13 @@ module openvswitch-custom 1.0;
>> >
>> >  require {
>> >  type openvswitch_t;
>> > +attribute domain;
>> >  class netlink_socket { setopt getopt create connect getattr write 
>> > read };
>> > +class dir { search };
>> > +class file { open getattr read };
>> >  }
>> >
>> >  #= openvswitch_t ==
>> >  allow openvswitch_t self:netlink_socket { setopt getopt create connect 
>> > getattr write read };
>> > +allow openvswitch_t domain:dir { search };
>> > +allow openvswitch_t domain:file { open getattr read };
>>
>> Hi Flavio,
>>
>> Thanks for spending some time to get OVS in better shape with SELinux.
>> I figure that once this settles down a bit we should take the policy
>> file here and work towards upstreaming all of the policy changes.
>
> Yeah, we can try to do both in parallel.  Once this gets in, I will
> open the bz requesting to fix Fedora which would fix upstream too.

Sounds good.

>> As far as I can follow, this "domain" type is not just for accessing
>> OVS directories and files (like openvswitch_t), but ifor a much wider
>> range of paths:
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/SELinux_Guide/rhlcommon-section-0048.html
>>
>> "# The domain attribute identifies every type that can be
>> # assigned to a process.  This attribute is used in TE rules
>> # that should be applied to all domains, e.g. permitting
>> # init to kill all processes."
>>
>> Is my understanding (+documentation) correct here? Is there an similar
>
> Your understanding is correct.  Turns out that we don't know which
> process will be the parent, so it could bash unconfined or initrc_t
> or in any other context (neutron?).
>
>> but more restrictive policy that allows ovs-vsctl to access, for
>> example, /var/run/openvswitch/* (with var_run_openvswitch_t or
>> similar)? Alternatively is there an example of another daemon that has
>> a similar policy that set a precedence for writing the policy like
>> this?
>
> I spent few hours on this and I couldn't find a way to restrict it
> more that I proposed with selinux.  Basically the above is an expansion
> of the interface domain_read_all_domains_state()[1] which other
> applications are using to read other processes states.  However, that
> seemed relatively new and probably not available on older distros, so
> I have expanded to the relevant actions removing what we don't need.
>
> [1] http://danwalsh.livejournal.com/51435.html
>
>> Would you also be able to provide the full ovs-vsctl commandline? It
>> was a little difficult to understand exactly what was going on during
>> this event, or try to reproduce.
>
> utilities/ovs-vsctl.c:2473
>
> 2472 static char *
> 2473 vsctl_parent_process_info(void)
> 2474 {
> 2475 #ifdef __linux__
> 2476 pid_t parent_pid;
> 2477 char *procfile;
> 2478 struct ds s;
> 2479 FILE *f;
> 2480
> 2481 parent_pid = getppid();
> 2482 procfile = xasprintf("/proc/%d/cmdline", parent_pid);
> 2483
> 2484 f = fopen(procfile, "r");
>
> That is called from do_vsctl() to find the parent info.  If you run as
> root, then it's unconfined and it works, but it doesn't work during
> boot time (initrc_t) for instance.
>
> To reproduce you just need to configure an OVS interface using ifcfg
> with ONBOOT=yes and reboot.

Hmm. So all we want to do in this case is to get the parent's name to
log it. I'm of two minds:

1) Be more restrictive. If this were the only place in OVS that we
read /proc then maybe it should just print the pid on failure to read
/proc, then we just accept that SELinux denies this access. It seems
like it's just a niceity for prettyprinting which shouldn't affect
actual setup/operation of OVS. While a pid is less than a name, it's
still just as much information for cases like this where it's being
invoked from initrc_t.

2) Be more lenient. It seems there

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak reported by valgrind.

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 11:00:29AM +0300, Liran Schour wrote:
> Destroy shash on destroy of session's condition structure.
> Reported here: http://openvswitch.org/pipermail/dev/2016-July/075968.html
> 
> Signed-off-by: Liran Schour 

Applied, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-27 Thread Jesse Gross

On Wed, Jul 27, 2016 at 1:40 PM, Daniele Di Proietto
 wrote:
> On 27/07/2016 13:12, "Joe Stringer"  wrote:
>
>>On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
>>> The userspace connection tracker doesn't support ALGs, frag reassembly
>>> or NAT yet, so skip those tests.
>>>
>>> Also, connection tracking state input from a local port is not possible
>>> in userspace.
>>>
>>> The userspace datapath pads all frames with 0, to make them at
>>> least 64 bytes.
>>>
>>> Finally, the userspace datapath checks for the IPv4 header checksum, so
>>> fix those in the hardcoded packets.
>>>
>>> Signed-off-by: Daniele Di Proietto 
>>> Acked-by: Joe Stringer 
>>> Acked-by: Flavio Leitner 
>>> ---
>>
>>
>>
>>> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit 
>>> "destination unreachable" response.
>>>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 
>>> 1"])
>>>
>>>  AT_CHECK([ovs-appctl revalidator/purge], [0])
>>> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], 
>>> [0], [dnl
>>> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 
>>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>>> - n_packets=1, n_bytes=72, 
>>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>>> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 
>>> actions=ct(table=0)
>>> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
>>> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | 
>>> sed -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
>>> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 
>>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>>> + n_packets=1, n_bytes=, 
>>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>>> + n_packets=1, n_bytes=, priority=100,ct_state=-trk,icmp,in_port=2 
>>> actions=ct(table=0)
>>> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
>>>  NXST_FLOW reply:
>>>  ])
>>
>>I think this is a completely orthogonal point, but it's still a bit
>>surprising to me that the n_bytes would differ when receiving short
>>packets in kernel vs. userspace datapaths. I follow that userspace
>>pads shorter packets on receive, but shouldn't we be able to attribute
>>these stats consistently, regardless of the datapath?
>
> That's a good point.
>
> We call dp_packet_pad() in netdev_linux_recv().  That used to be in 
> netdev_recv() and can be traced back to the initial OVS commit.  Here's a 
> comment in netdev_recv():
>
> COVERAGE_INC(netdev_received);
> buffer->size += n_bytes;
>
> /* When the kernel internally sends out an Ethernet frame on an
>  * interface, it gives us a copy *before* padding the frame to the
>  * minimum length.  Thus, when it sends out something like an ARP
>  * request, we see a too-short frame.  So pad it out to the minimum
>  * length. */
> pad_to_minimum_length(buffer);

I wonder if anything in OVS actually cares about this anymore? I don't
know the history of that comment.

I should point out that there are other sins in stats accounting (TSO
in the kernel is probably the worst since it causes the flow stats to
not match what is actually seen on the wire). This wouldn't be the
worst inconsistency but if there is any easy fix that would be nice.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 1/2] ovs-router: ignore IPv6 source addresses for IPv4 routes

2016-07-27 Thread Ben Pfaff

On Sun, Jul 24, 2016 at 01:07:26PM -0300, Thadeu Lima de Souza Cascardo wrote:
> Though this should not happen when we have another address on the device that 
> is
> IPv4 mapped, we should prevent adding a routing entry to IPv4 with an IPv6
> source address.
> 
> This entry has been observed when the addresses list was out of date.
> 
> Cached: 172.16.10.1/32 dev br3 SRC fe80::c4d0:14ff:feb1:b54b
> Cached: 172.16.10.0/24 dev br3 SRC fe80::c4d0:14ff:feb1:b54b
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo 

Applied, thanks!

Does this need any backports?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/2] route-table: flush addresses list when route table is reset

2016-07-27 Thread Ben Pfaff

On Sun, Jul 24, 2016 at 01:07:27PM -0300, Thadeu Lima de Souza Cascardo wrote:
> When the route table is reset, the addresses list may be out of date, as we 
> race
> for the many netlink socket notifications.
> 
> A quick fix for this is flushing the addresses list, before dumping the routes
> and gathering source addresses for them.
> 
> That way, instead of using invalid source addresses or preventing an entry to 
> be
> added because of missing source addresses, repeated tests showed the correct
> entry is always added.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo 

This gives me a compiler error:
../lib/route-table.c:154:5: error: implicit declaration of function 
'netdev_get_addrs_list_flush' is invalid in C99 
[-Werror,-Wimplicit-function-declaration]

I also noticed that netdev_get_addrs_list_flush() is not available on
Windows.

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 0/5] netdev_open conflicting types

2016-07-27 Thread Ben Pfaff

I'm planning to leave this series to Daniele, since he already seems to
be handling the main discussion.

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH V3] ovs-vtep: vtep-ctl and ovs-vtep support of adding explicit tunnel key

2016-07-27 Thread Ben Pfaff

Justin, are you the best person to review this (when you're back from
vacation)?

Thanks,

Ben.

On Sun, Jul 24, 2016 at 03:01:00PM +0300, itamaro wrote:
> From: itamaro 
> 
> This patch adds support for handeling a per-tunnel tunnel key in the
> ovs-vtep and vtep-ctl to support the usage of neutron L2GW as an
> inter-cloud gateway.
> 
> The Neutron spec is available here:
> https://review.openstack.org/#/c/270786/
> 
> The aim of this patch is to support the usage of hardware vtep switch as
> an inter-cloud gateway, which is used to connect two separated l2 broadcast
> domains. This document will also explain the logic behind the addition of the 
> new per-tunnel tunnel-key in the hardware_vtep schema.  
> 
> The introduction of the relay tunnel, does not change the current logic of 
> hardware_vtep, it does however introduce new logic related to iner-cloud
> connectivity.
> 
> 
> Network layout
> ==
> 
> virtual network 1 shared  network virtual network 
> 2 
> ++  
> ++
> |Compute Host|   VNI=1VNI=2 |Compute 
> Host|
> |   +--+ <-++--->   +--+  
>|
> |   |vm| | ||   |   |vm|  
>|
> |   +--+ | |   L3 network   |   |   +--+  
>|
> +---^+ ||   
> +-^--+
> |  +---v+  X +--v-+   
> |   
> |  +---> hardware_vtep  |  X | hardware_vtep  |   
> |   
> | VNI=1|   | logical switch |  X | logical switch <-+ 
> |   
> |  |   | (tunnel_key 1) |==<<==X=>>==| (tunnel_key 2) | 
> |VNI=2|   
> |  |   |   +-+ +-+  |  X |   +-+  +-+ | | 
> |   
> |  |   |   |-| |-|  |  X |   |-|  |-| | | 
> |   
> +---v--v-+ ++  X ++ | 
> |   
> |Compute Host| vlan2|   |vlan5   vlan9||vlan21  | 
> |   
> |   +--+ |  |   |relay vxlan  ||
> +---v-v--+
> |   |vm| |  |   |  VNI=100|||Compute 
> Host|
> |   +--+ |  |   | |||   +--+  
>|
> +++-v-+   +-v-+ +-v-++-v-+  |   |vm|  
>|
>   |   |   |   | |   ||   |  |   +--+  
>|
>   |   |   |   | |   ||   |  
> ++
>   +---+   +---+ +---++---+
> 
> Bare metal elements Bare metal elements
> 
> Logical switch
> ===
> In a cloud architecture, there is sometimes need to connect virtual machines
> and physical machines to the same L2 broadcast domain.
> A logical switch is an entity representing an l2 virtual overlay network,
> identified by a shared tunnel key. This tunnel key (VxLAN VNI) is shared among
> all overlay virtual tunnel endpoints (VTEP) of the switch.
> The logical switch binds physical ports with either identical or different
> "VLAN" tags to the "VxLAN overlay" network.
> 
> In a multi-cloud architecture, it may be useful to establish a cross-cloud 
> l2 broadcast domain. The extended hardware vtep uses a relay l2 tunnel,
> which is a tunnel with an explicit tunnel-key property. The tunnel-key 
> propery 
> is used to map each overlay network (logical switch tunnel-key) in each cloud 
> to
> the tunnel-key of the relay tunnel.
> 
> The mapping to a remote logical switch is done by defining the same tunnel key
> in both ends of the the relay tunnel. This tunnel key (VxLAN VNI) is a
> property of the relay tunnel itself.
> 
> To support the above tunnel behevior, a new kind of VTEP port is logic is
> introduced, defining two VTEP port groups. One group represents the existing
> VTEP ports of the local l2 overlay network, and another new group represents 
> the
> individual l2 relay VTEPS.
> 
> Multicast and Unknown unicast traffic
> =
> Currently Broadcast, Unknown unicast, and Multicast,"BUM" traffic to the 
> overlay networks
> is handled by two replication modes:
> 
>   - "source_node" mode: The packets originated from physical port
> are replicated on all the VTEPs ports pointed by unknown-dst, and flooded
> to all the physical bound ports.
> 
>   - "service_node" mode: The packets originated from physical port are
> forwarded to only a selected single service node from the unknown-dst 
> ports
> (the service node responsible for "BUM" propagation to the overlay 
> network),
> and flooded to all the

Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-27 Thread Daniele Di Proietto






On 27/07/2016 13:12, "Joe Stringer"  wrote:

>On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
>> The userspace connection tracker doesn't support ALGs, frag reassembly
>> or NAT yet, so skip those tests.
>>
>> Also, connection tracking state input from a local port is not possible
>> in userspace.
>>
>> The userspace datapath pads all frames with 0, to make them at
>> least 64 bytes.
>>
>> Finally, the userspace datapath checks for the IPv4 header checksum, so
>> fix those in the hardcoded packets.
>>
>> Signed-off-by: Daniele Di Proietto 
>> Acked-by: Joe Stringer 
>> Acked-by: Flavio Leitner 
>> ---
>
>
>
>> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit 
>> "destination unreachable" response.
>>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 
>> 1"])
>>
>>  AT_CHECK([ovs-appctl revalidator/purge], [0])
>> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], 
>> [0], [dnl
>> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 
>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> - n_packets=1, n_bytes=72, 
>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 
>> actions=ct(table=0)
>> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
>> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | 
>> sed -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
>> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 
>> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
>> + n_packets=1, n_bytes=, 
>> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
>> + n_packets=1, n_bytes=, priority=100,ct_state=-trk,icmp,in_port=2 
>> actions=ct(table=0)
>> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
>>  NXST_FLOW reply:
>>  ])
>
>I think this is a completely orthogonal point, but it's still a bit
>surprising to me that the n_bytes would differ when receiving short
>packets in kernel vs. userspace datapaths. I follow that userspace
>pads shorter packets on receive, but shouldn't we be able to attribute
>these stats consistently, regardless of the datapath?

That's a good point.

We call dp_packet_pad() in netdev_linux_recv().  That used to be in 
netdev_recv() and can be traced back to the initial OVS commit.  Here's a 
comment in netdev_recv():

COVERAGE_INC(netdev_received);
buffer->size += n_bytes;

/* When the kernel internally sends out an Ethernet frame on an
 * interface, it gives us a copy *before* padding the frame to the
 * minimum length.  Thus, when it sends out something like an ARP
 * request, we see a too-short frame.  So pad it out to the minimum
 * length. */
pad_to_minimum_length(buffer);
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak in execute_update.

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 06:28:30PM -0700, William Tu wrote:
> Valgrind testcase 1804 ovsdb-server.at:1023 insert rows, update rows by value
> reports the following leak.
> json_from_string (json.c:1025)
> execute_update (replication.c:614), similarily at execute_delete()
> process_table_update (replication.c:502)
> process_notification.part.5 (replication.c:445)
> process_notification (replication.c:402)
> check_for_notifications (replication.c:418)
> replication_run (replication.c:110)
> 
> Signed-off-by: William Tu 
> ---
>  ovsdb/replication.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> index af7ae5c..fe89d39 100644
> --- a/ovsdb/replication.c
> +++ b/ovsdb/replication.c
> @@ -573,6 +573,8 @@ execute_delete(struct ovsdb_txn *txn, const char *uuid,
>  }
>  
>  ovsdb_condition_destroy();
> +json_destroy(CONST_CAST(struct json *, where));
> +
>  return error;
>  }
>  
> @@ -630,6 +632,7 @@ execute_update(struct ovsdb_txn *txn, const char *uuid,
>  ovsdb_row_destroy(row);
>  ovsdb_column_set_destroy();
>  ovsdb_condition_destroy();
> +json_destroy(CONST_CAST(struct json *, where));
>  
>  return error;
>  }

Thanks, good catch.

I think that we should just remove "const" from the variable
declarations in each case; the cast is not worth it.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [patch_v6 1/2] ovn: Fix receive from vxlan in ovn-controller.

2016-07-27 Thread Ben Pfaff

On Sat, Jul 23, 2016 at 01:18:01PM -0700, Darrell Ball wrote:
> This patch enables source node replication in OVN for receive from Vxlan
> tunnels.  OVN only supports source node replication mode.
> 
> OVN only supports source_node replication and previously vtep interaction,
> which used service node replication by default for
> multicast/broadcast/unknown unicast traffic worked by happenstance.
> Because of limited vxlan encapsulation metadata, received packets were
> resubmitted to find the egress port(s). This is not correct for multicast,
> broadcast and unknown unicast traffic as traffic will get resent on the tunnel
> mesh. ovn-controller is changed not to send traffic received from vxlan
> tunnels out the tunnel mesh again.  Traffic received from vxlan tunnels is
> now only sent locally as intended.
> 
> To support keeping state for receipt from a vxlan tunnel a MFF logical
> register is allocated for general scratchpad purposes and one bit is used for
> receipt from vxlan context.
> 
> As part of this change ovn-controller-vtep is hard-coded to set the 
> replication
> mode of each logical switch to source node as OVN will only support source
> node replication.
> 
> Signed-off-by: Darrell Ball 

This seems like a useful advance.  Thank you.

ovn-architecture(7) has a section titled "Architectural Physical Life
Cycle of a Packet" that describes how table 32 on hypervisors works.
Please update this to describe the special handling of packets that
arrive from VXLAN tunnels.

Please also update ovn-architecture(7) to mention the new flag bit.
Yes, I know that you want to break this document apart but I want to
treat that issue separately.

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn: Add ovn-controller-vtep debian package

2016-07-27 Thread Ben Pfaff

On Sat, Jul 23, 2016 at 10:38:34AM -0500, Ryan Moats wrote:
> Having a separate debian package for deploying
> the ovn-controller-vtep binary enables the ability
> to assign specific nodes the role of communicating
> with VTEP enabled TORs.
> 
> Change-Id: Ia36aea7d89bd011a57918820b2a9f6e3469b3e04
> Signed-off-by: Ryan Moats 

Should there be an initscript to start and stop it?

There's inconsistent use of tabs in the automake.mk change.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak in execute_update.

2016-07-27 Thread Andy Zhou

On Tue, Jul 26, 2016 at 6:28 PM, William Tu  wrote:

> Valgrind testcase 1804 ovsdb-server.at:1023 insert rows, update rows by
> value
> reports the following leak.
> json_from_string (json.c:1025)
> execute_update (replication.c:614), similarily at execute_delete()
> process_table_update (replication.c:502)
> process_notification.part.5 (replication.c:445)
> process_notification (replication.c:402)
> check_for_notifications (replication.c:418)
> replication_run (replication.c:110)
>
> Signed-off-by: William Tu 
>

Tested, it removed the memory leak reported.
I think declare 'where' as 'struct json' instead of 'const struct json'
will make the code easier to read.

 Acked-by: Andy Zhou 


>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2 1/3] utilities/ovs-ctl.in: Allow non-monitoring daemons

2016-07-27 Thread Ben Pfaff

On Mon, Jul 25, 2016 at 02:03:51PM -0400, Aaron Conole wrote:
> This commit allows the ovs-ctl command to spawn daemons without the
> internal process monitor.  This is useful when integrating with,
> ex. systemd, which provides its own monitoring facilities.
> 
> Signed-off-by: Aaron Conole 
> Reviewed-by: Markos Chandras 
> Acked-by: Ben Pfaff 
> Acked-by: Flavio Fernandes 
> Acked-by: Flavio Leitner 

Applied, thanks!

I'm not planning to look at the rest of the series; I'll assume that
Russell will handle it (unless he objects).

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v6 1/2] Check and allocate free qdisc queue id for ports with qos parameters

2016-07-27 Thread Ben Pfaff

On Wed, Jul 20, 2016 at 08:02:03PM +0530, bscha...@redhat.com wrote:
> ovn-northd processes the list of Port_Bindings and hashes the list of
> queues per chassis. When it finds a port with qos_parameters and without
> a queue_id, it allocates a free queue for the chassis that this port belongs.
> The queue_id information is stored in the options field of Port_binding table.
> Adds an action set_queue to the ingress table 0 of the logical flows
> which will be translated to openflow set_queue by ovn-controller
> 
> ovn-controller opens the netdev corresponding to the tunnel interface's
> status:tunnel_egress_iface value and configures a HTB qdisc on it. Then for
> each SB port_binding that has queue_id set, it allocates a queue with the
> qos_parameters of that port. It also frees up unused queues.
> 
> This patch replaces the older approach of policing
> 
> Signed-off-by: Babu Shanmugam 

I suggest folding in the following changes.  Notably, set_queue(0); was
documented but didn't work because QDISC_MIN_QUEUE_ID was 1.

This series has no tests.  I think that at least the DSCP marking
feature could be tested with the existing OVN testing infrastructure.
Will you work on that?

Thanks,

Ben.

--8<--cut here-->8--

diff --git a/ovn/lib/actions.h b/ovn/lib/actions.h
index c0fde28..230115f 100644
--- a/ovn/lib/actions.h
+++ b/ovn/lib/actions.h
@@ -24,7 +24,11 @@
 #include "openvswitch/dynamic-string.h"
 #include "util.h"
 
-#define QDISC_MIN_QUEUE_ID  1
+/* Valid arguments to set_queue() action.
+ *
+ * QDISC_MIN_QUEUE_ID is the default queue, so user-defined queues should
+ * start at QDISC_MIN_QUEUE_ID+1. */
+#define QDISC_MIN_QUEUE_ID  0
 #define QDISC_MAX_QUEUE_ID  0xf000
 
 struct expr;
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 4eef2d9..dc803d2 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -309,7 +309,7 @@ chassis_queueid_in_use(const struct hmap *set, const char 
*chassis,
 static uint32_t
 allocate_chassis_queueid(struct hmap *set, const char *chassis)
 {
-for (uint32_t queue_id = QDISC_MIN_QUEUE_ID;
+for (uint32_t queue_id = QDISC_MIN_QUEUE_ID + 1;
  queue_id <= QDISC_MAX_QUEUE_ID;
  queue_id++) {
 if (!chassis_queueid_in_use(set, chassis, queue_id)) {
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 5e6f9c3..057a4f1 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1768,10 +1768,10 @@ tcp.flags = RST;
 interface, in bits.
   
 
-  
+  
 Indicates the queue number on the physical device. This is same as the
-queue_id used in OpenFlow in struct ofp_action_enqueue. Value should
-be in the range of 1 to 61440.
+queue_id used in OpenFlow in struct ofp_action_enqueue.
   
 
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 86efcf5..84ee2c1 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -655,6 +655,11 @@ reg1[0] = put_dhcp_opts(offerip=1.2.3.4, domain=1.2.3.4); 
=> DHCP option domain
 # na
 na { eth.src = 12:34:56:78:9a:bc; nd.tll = 12:34:56:78:9a:bc; outport = 
inport; inport = ""; /* Allow sending out inport. */ output; }; => 
actions=controller(userdata=00.00.00.03.00.00.00.00.00.19.00.10.80.00.08.06.12.34.56.78.9a.bc.00.00.00.19.00.10.80.00.42.06.12.34.56.78.9a.bc.00.00.ff.ff.00.18.00.00.23.20.00.06.00.20.00.00.00.00.00.01.1c.04.00.01.1e.04.00.19.00.10.00.01.1c.04.00.00.00.00.00.00.00.00.00.19.00.10.00.00.00.02.00.00.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00),
 prereqs=nd
 
+# set_queue
+set_queue(0); => actions=set_queue:0, prereqs=1
+set_queue(61440); => actions=set_queue:61440, prereqs=1
+set_queue(65535); => Queue ID 65535 for set_queue is not in valid range 0 to 
61440.
+
 # Contradictionary prerequisites (allowed but not useful):
 ip4.src = ip6.src[0..31]; => 
actions=move:NXM_NX_IPV6_SRC[0..31]->NXM_OF_IP_SRC[], prereqs=eth.type == 0x800 
&& eth.type == 0x86dd
 ip4.src <-> ip6.src[0..31]; => 
actions=push:NXM_NX_IPV6_SRC[0..31],push:NXM_OF_IP_SRC[],pop:NXM_NX_IPV6_SRC[0..31],pop:NXM_OF_IP_SRC[],
 prereqs=eth.type == 0x800 && eth.type == 0x86dd
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5 13/16] system-tests: Run conntrack tests with userspace.

2016-07-27 Thread Joe Stringer

On 26 July 2016 at 17:58, Daniele Di Proietto  wrote:
> The userspace connection tracker doesn't support ALGs, frag reassembly
> or NAT yet, so skip those tests.
>
> Also, connection tracking state input from a local port is not possible
> in userspace.
>
> The userspace datapath pads all frames with 0, to make them at
> least 64 bytes.
>
> Finally, the userspace datapath checks for the IPv4 header checksum, so
> fix those in the hardcoded packets.
>
> Signed-off-by: Daniele Di Proietto 
> Acked-by: Joe Stringer 
> Acked-by: Flavio Leitner 
> ---



> @@ -1324,11 +1327,11 @@ dnl UDP packets from ns0->ns1 should solicit 
> "destination unreachable" response.
>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 
> 1"])
>
>  AT_CHECK([ovs-appctl revalidator/purge], [0])
> -AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop], 
> [0], [dnl
> - n_packets=1, n_bytes=44, priority=100,udp,in_port=1 
> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
> - n_packets=1, n_bytes=72, 
> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
> - n_packets=1, n_bytes=72, priority=100,ct_state=-trk,icmp,in_port=2 
> actions=ct(table=0)
> - n_packets=2, n_bytes=84, priority=10,arp actions=NORMAL
> +AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort | grep -v drop | sed 
> -e 's/n_bytes=[[0-9]]*/n_bytes=/g'], [0], [dnl
> + n_packets=1, n_bytes=, priority=100,udp,in_port=1 
> actions=ct(commit,exec(load:0x1->NXM_NX_CT_MARK[[]])),output:2
> + n_packets=1, n_bytes=, 
> priority=100,ct_state=+rel+trk,ct_mark=0x1,icmp,in_port=2 actions=output:1
> + n_packets=1, n_bytes=, priority=100,ct_state=-trk,icmp,in_port=2 
> actions=ct(table=0)
> + n_packets=2, n_bytes=, priority=10,arp actions=NORMAL
>  NXST_FLOW reply:
>  ])

I think this is a completely orthogonal point, but it's still a bit
surprising to me that the n_bytes would differ when receiving short
packets in kernel vs. userspace datapaths. I follow that userspace
pads shorter packets on receive, but shouldn't we be able to attribute
these stats consistently, regardless of the datapath?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH] tests: Add new pmd test for pmd-rxq-affinity.

2016-07-27 Thread Daniele Di Proietto

This tests that the newly introduced pmd-rxq-affinity option works as
intended, at least for a single port.

Signed-off-by: Daniele Di Proietto 
---
 tests/pmd.at | 53 +
 1 file changed, 53 insertions(+)

diff --git a/tests/pmd.at b/tests/pmd.at
index 47639b6..3052f95 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -461,3 +461,56 @@ 
icmp,vlan_tci=0x,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - rxq affinity])
+OVS_VSWITCHD_START(
+  [], [], [], [--dummy-numa 0,0,0,0,0,0,0,0,0])
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+
+AT_CHECK([ovs-ofctl add-flow br0 actions=controller])
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1fe])
+
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dummy-pmd 
ofport_request=1 options:n_rxq=4 
other_config:pmd-rxq-affinity="0:3,1:7,2:2,3:8"])
+
+dnl The rxqs should be on the requested cores.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 0 0 3
+p1 1 0 7
+p1 2 0 2
+p1 3 0 8
+])
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6])
+
+dnl We removed the cores requested by some queues from pmd-cpu-mask.
+dnl Those queues will not be polled.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 2 0 2
+])
+
+AT_CHECK([ovs-vsctl remove Interface p1 other_config pmd-rxq-affinity])
+
+dnl We removed the rxq-affinity request.  dpif-netdev should assign queues
+dnl in a round robin fashion.  We just make sure that every rxq is being
+dnl polled again.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 
1,2 -d ' ' | sort], [0], [dnl
+p1 0
+p1 1
+p1 2
+p1 3
+])
+
+AT_CHECK([ovs-vsctl set Interface p1 other_config:pmd-rxq-affinity='0:1'])
+
+dnl We explicitly requested core 1 for queue 0.  Core 1 becomes isolated and
+dnl every other queue goes to core 2.
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 0 0 1
+p1 1 0 2
+p1 2 0 2
+p1 3 0 2
+])
+
+OVS_VSWITCHD_STOP(["/dpif_netdev|WARN|There is no PMD thread on core/d"])
+AT_CLEANUP
-- 
2.8.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] ovsdb active backup deployment

2016-07-27 Thread Andy Zhou

On Tue, Jul 26, 2016 at 6:20 PM, Russell Bryant  wrote:

>
>
> On Tue, Jul 26, 2016 at 3:48 PM, Andy Zhou  wrote:
>
>>
>>
>> On Tue, Jul 26, 2016 at 11:59 AM, Russell Bryant  wrote:
>>
>>>
>>>
>>> On Tue, Jul 26, 2016 at 2:41 PM, Andy Zhou  wrote:
>>>


 On Tue, Jul 26, 2016 at 5:37 AM, Russell Bryant 
 wrote:

>
>
> On Mon, Jul 25, 2016 at 8:15 PM, Andy Zhou  wrote:
>
>> Hi, Rayn and Russell,
>>
>
> Can we move this discussion to the ovs dev mailing list?  Feel free to
> just add it in a reply if you'd like.
>
 Done.

>
>
>> I am wondering how we can actually use the active/backup feature that
>> is now part of
>> OVSDB to increase OVN availability.
>>
>
> TO be clear, I haven't actually tried this yet.  I'm only speaking
> about how I think it should work.
>
>
>> Specifically:
>>
>> 1. When the active OVSDB server failed, should the back up server
>> take over, and allow write transactions? One simpler possibility is to
>> allow read only access to the backup serve.
>>
>
> The  backup server needs to take over.  It's OK if that requires
> intervention by an HA manager like Pacemaker.  If we can't make the 
> passive
> server take over, I'd say the solution is incomplete.
>

 O.K. make sense.

 One possible issue with backup server taking over is "split head".  In
 case due to network error, backup server becomes disconnected from the
 active
 server, then we may have both server thinking they are active server
 now.  Does Pacemaker help with solving this issue.

>>>
>>> It can, yes.  I would expect Pacemaker to explicitly configure a node to
>>> be either the active or passive node.
>>>
>> Manual switching is more straight forward. I agree.
>>
>>>
>
>> 2. When a crashed active OVSDB server recovers, should it become the
>> new backup, or it should switch back.
>>
>
> Becoming the new backup is fine.  Again, this can be orchestrated by
> an HA manager (Pacemaker).
>
 I am not familiar with pacemaker. Can I assume it can provide a correct
 --sync-from argument (pointing to backup server) when relaunch OVSDB
 server?

>>>
>>> Yes.  I'd have to consult with some Pacemaker experts on exactly what
>>> the implementation would look like, but roughly:
>>>
>>> Pacemaker manages services using "OCF Resource Agents", which are just
>>> scripts with a defined set of inputs and outputs for service management.  I
>>> would imagine a Pacemaker cluster being told it must have exactly 1 active
>>> and 1 passive OVSDB service.  When the passive OVSDB service is started, it
>>> would include the "sync-from" argument based on where the active OVSDB
>>> service is currently running.
>>>
>>> We really need to prototype this and document it.  I'm guessing too
>>> much.  Pacemaker is frequently used to manage active/passive HA, though.
>>>
>>> Sounds reasonable,  I will work on ovsdb internal changes to support
>> manual switching, using appctl commands. Then looking into prototyping with
>> HA systems.  I have not used pacemaker in the past, so it may take some
>> time to ramp up.
>>
>
> I should be able to help.  We need to do this work anyway for integration
> into OpenStack deployment tools.  Let me see if I can get some helpful
> examples to follow.
>

Thanks for helping out.

Given that, I now plan to work from bottom up, initially focusing on ovsdb
server changes.

1. Add a state in ovsdb-server for it to know whether it is an active
server.  Backup server will not accept any connections.  Server started with
--sync-from argument will be put in the back state by default.

2. Add appctl commands to allow manually switch state.

3. Add a new table for backup server to register its address and ports.
OVSDB clients can learn about them at run time. Back up server should issue
an
transaction to register its address before issuing the monitoring request.
This feature is not strictly necessary, and can be pushed to HA manager,
but having it built into ovsdb-server may make it simpler for integrationl.

What do you think?



>
> --
> Russell Bryant
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v5 1/4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-27 Thread Daniele Di Proietto

I don't think dynamic_txqs should be atomic, since we change it when the pmd 
threads are stopped.

Also, in port_create() we should check for 'netdev_n_txq(netdev) < n_cores + 1' 
after we reconfigure the device.

Other than that this looks good to me, so I applied the following incremental 
and pushed this to master.

Thanks,

Daniele


---8<---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1ba6f3..d45aba0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -258,7 +258,7 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
-atomic_bool dynamic_txqs;   /* If true XPS will be used. */
+bool dynamic_txqs;  /* If true XPS will be used. */
 unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
 struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
@@ -1151,6 +1151,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 enum netdev_flags flags;
 struct netdev *netdev;
 int n_open_rxqs = 0;
+int n_cores = 0;
 int i, error;
 bool dynamic_txqs = false;

@@ -1171,7 +1172,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 }

 if (netdev_is_pmd(netdev)) {
-int n_cores = ovs_numa_get_n_cores();
+n_cores = ovs_numa_get_n_cores();

 if (n_cores == OVS_CORE_UNSPEC) {
 VLOG_ERR("%s, cannot get cpu core info", devname);
@@ -1186,9 +1187,6 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 VLOG_ERR("%s, cannot set multiq", devname);
 goto out;
 }
-if (netdev_n_txq(netdev) < n_cores + 1) {
-dynamic_txqs = true;
-}
 }

 if (netdev_is_reconf_required(netdev)) {
@@ -1198,6 +1196,12 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 }
 }

+if (netdev_is_pmd(netdev)) {
+if (netdev_n_txq(netdev) < n_cores + 1) {
+dynamic_txqs = true;
+}
+}
+
 port = xzalloc(sizeof *port);
 port->port_no = port_no;
 port->netdev = netdev;
@@ -1206,7 +1210,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 port->txq_used = xcalloc(netdev_n_txq(netdev), sizeof *port->txq_used);
 port->type = xstrdup(type);
 ovs_mutex_init(>txq_used_mutex);
-atomic_init(>dynamic_txqs, dynamic_txqs);
+port->dynamic_txqs = dynamic_txqs;

 for (i = 0; i < port->n_rxq; i++) {
 error = netdev_rxq_open(netdev, >rxq[i], i);
@@ -2718,8 +2722,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 seq_change(dp->port_seq);
 port_destroy(port);
 } else {
-atomic_init(>dynamic_txqs,
-netdev_n_txq(port->netdev) < n_cores + 1);
+port->dynamic_txqs = netdev_n_txq(port->netdev) < n_cores + 1;
 }
 }
 /* Restores the non-pmd. */
@@ -4015,11 +4018,9 @@ dpif_netdev_xps_revalidate_pmd(const struct 
dp_netdev_pmd_thread *pmd,
 struct tx_port *tx;
 struct dp_netdev_port *port;
 long long interval;
-bool dynamic_txqs;

 HMAP_FOR_EACH (tx, node, >port_cache) {
-atomic_read_relaxed(>port->dynamic_txqs, _txqs);
-if (dynamic_txqs) {
+if (tx->port->dynamic_txqs) {
 continue;
 }
 interval = now - tx->last_used;
@@ -4156,7 +4157,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 int tx_qid;
 bool dynamic_txqs;

-atomic_read_relaxed(>port->dynamic_txqs, _txqs);
+dynamic_txqs = p->port->dynamic_txqs;
 if (dynamic_txqs) {
 tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
 } else {

---8<---


On 27/07/2016 07:44, "Ilya Maximets"  wrote:

>If CPU number in pmd-cpu-mask is not divisible by the number of queues and
>in a few more complex situations there may be unfair distribution of TX
>queue-ids between PMD threads.
>
>For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
>such distribution is possible:
><>
>pmd thread numa_id 0 core_id 13:
>port: vhost-user1   queue-id: 1
>port: dpdk0 queue-id: 3
>pmd thread numa_id 0 core_id 14:
>port: vhost-user1   queue-id: 2
>pmd thread numa_id 0 core_id 16:
>port: dpdk0 queue-id: 0
>pmd thread numa_id 0 core_id 17:
>port: dpdk0 queue-id: 1
>pmd thread numa_id 0 core_id 12:
>port: vhost-user1   queue-id: 0
>port: dpdk0 queue-id: 2
>pmd thread numa_id 0 core_id 15:
>port: vhost-user1   queue-id: 3
><>
>
>As we can see above dpdk0 port

Re: [ovs-dev] [PATCH v5 4/4] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-27 Thread Daniele Di Proietto

ofputil_parse_key_value() to parse the affinity seems like a good idea, thanks!


I got a compiler warning on an unused variable.  I fixed that and applied the 
series to master.

Thanks,

Daniele



On 27/07/2016 07:44, "Ilya Maximets"  wrote:

>New 'other_config:pmd-rxq-affinity' field for Interface table to
>perform manual pinning of RX queues to desired cores.
>
>This functionality is required to achieve maximum performance because
>all kinds of ports have different cost of rx/tx operations and
>only user can know about expected workload on different ports.
>
>Example:
>   # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
> other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>   Queue #0 pinned to core 3;
>   Queue #1 pinned to core 7;
>   Queue #2 not pinned.
>   Queue #3 pinned to core 8;
>
>It's decided to automatically isolate cores that have rxq explicitly
>assigned to them because it's useful to keep constant polling rate on
>some performance critical ports while adding/deleting other ports
>without explicit pinning of all ports.
>
>Signed-off-by: Ilya Maximets 
>---
> INSTALL.DPDK.md  |  49 +++-
> NEWS |   2 +
> lib/dpif-netdev.c| 216 +--
> tests/pmd.at |   6 ++
> vswitchd/vswitch.xml |  23 ++
> 5 files changed, 254 insertions(+), 42 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index 5407794..7609aa7 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>  # Check current stats
>ovs-appctl dpif-netdev/pmd-stats-show
> 
>+ # Clear previous stats
>+   ovs-appctl dpif-netdev/pmd-stats-clear
>+ ```
>+
>+  7. Port/rxq assigment to PMD threads
>+
>+ ```
>  # Show port/rxq assignment
>ovs-appctl dpif-netdev/pmd-rxq-show
>+ ```
> 
>- # Clear previous stats
>-   ovs-appctl dpif-netdev/pmd-stats-clear
>+ To change default rxq assignment to pmd threads rxqs may be manually
>+ pinned to desired cores using:
>+
>+ ```
>+ ovs-vsctl set Interface  \
>+   other_config:pmd-rxq-affinity=
>  ```
>+ where:
>+
>+ ```
>+  ::= NULL | 
>+  ::=  |
>+   , 
>+  ::=  : 
>+ ```
>+
>+ Example:
>+
>+ ```
>+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>+
>+ Queue #0 pinned to core 3;
>+ Queue #1 pinned to core 7;
>+ Queue #2 not pinned.
>+ Queue #3 pinned to core 8;
>+ ```
>+
>+ After that PMD threads on cores where RX queues was pinned will become
>+ `isolated`. This means that this thread will poll only pinned RX queues.
>+
>+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
>queues
>+ will not be polled. Also, if provided `core_id` is not available (ex. 
>this
>+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
>+ PMD thread.
>+
>+ Isolation of PMD threads also can be checked using
>+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
> 
>-  7. Stop vswitchd & Delete bridge
>+  8. Stop vswitchd & Delete bridge
> 
>  ```
>  ovs-appctl -t ovs-vswitchd exit
>diff --git a/NEWS b/NEWS
>index 73d3fcf..1a34f75 100644
>--- a/NEWS
>+++ b/NEWS
>@@ -45,6 +45,8 @@ Post-v2.5.0
>Old 'other_config:n-dpdk-rxqs' is no longer supported.
>Not supported by vHost interfaces. For them number of rx and tx queues
>is applied from connected virtio device.
>+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
>+   allows to pin port's rx queues to desired cores.
>  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
>assignment.
>  * Type of log messages from PMD threads changed from INFO to DBG.
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 1ef0cd7..33f1216 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -53,7 +53,9 @@
> #include "openvswitch/list.h"
> #include "openvswitch/match.h"
> #include "openvswitch/ofp-print.h"
>+#include "openvswitch/ofp-util.h"
> #include "openvswitch/ofpbuf.h"
>+#include "openvswitch/shash.h"
> #include "openvswitch/vlog.h"
> #include "ovs-numa.h"
> #include "ovs-rcu.h"
>@@ -62,7 +64,7 @@
> #include "pvector.h"
> #include "random.h"
> #include "seq.h"
>-#include "openvswitch/shash.h"
>+#include "smap.h"
> #include "sset.h"
> #include "timeval.h"
> #include "tnl-neigh-cache.h"
>@@ -252,6 +254,12 @@ enum pmd_cycles_counter_type {
> 
> #define XPS_TIMEOUT_MS 500LL
> 
>+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
>+struct dp_netdev_rxq {
>+struct netdev_rxq *rxq;
>+unsigned core_id;   /* Сore to which this queue is pinned. */
>+};
>+
> /* A port in a netdev-based datapath. */
> struct dp_netdev_port {
> odp_port_t

Re: [ovs-dev] [PATCH] FAQ: Add contents section and enable internal links.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 08:48:12PM +0100, Bhanuprakash Bodireddy wrote:
> Add contents section to FAQ and enable internal links in doc for pretty
> printing on GitHub.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
> Reviewers can review the rendered form here 
> https://github.com/bbodired/ovs/blob/master/FAQ.md

This makes the FAQ less readable as a text file, because it's harder to spot

##  2. Releases

than

Releases


Is there a way to link to a title?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v2] ovn: the implementation of icmp4 reject actions.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 20, 2016 at 01:19:37AM -0700, nickcooper-zhangtonghao wrote:
> Hi Ryan,
> Thank you for your advice. The controller will drop the icmp4 packet whose 
> eth.src or eth.dst is
> broadcast/multicast address and the type of icmp4 is “destination 
> unreachable”. That may reduce 
> the broadcast storm attack(e.g. icmp flood, dhcp flood). I write the rate 
> limiting of icmp4 into 
> my to-do list. Thanks.

Please don't put additional commentary above a commit message.  "git am"
can't distinguish it from the commit message itself.

This patch needs a rebase due to conflicts.

Please don't use bare sprintf() in OVS.  Use snprintf() or xasprintf()
or ds_put_format() instead.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH] FAQ: Add contents section and enable internal links.

2016-07-27 Thread Bhanuprakash Bodireddy

Add contents section to FAQ and enable internal links in doc for pretty
printing on GitHub.

Signed-off-by: Bhanuprakash Bodireddy 
---
Reviewers can review the rendered form here 
https://github.com/bbodired/ovs/blob/master/FAQ.md

 FAQ.md | 52 +++-
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 35e1cac..da63df1 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -3,8 +3,22 @@ Frequently Asked Questions
 
 Open vSwitch 
 
-General

+## Contents
+
+1. [General](#general)
+2. [Releases](#releases)
+3. [Terminology](#terminology)
+4. [Basic configuration](#configuration)
+5. [Implementation Details](#implementation)
+6. [Performance](#performance)
+7. [Configuration Problems](#configproblems)
+8. [Quality of Service(QOS)](#qos)
+9. [VLANs](#vlans)
+10. [VXLANs](#vxlans)
+11. [Using OpenFlow (Manually or Via Controller)](#openflowusage)
+12. [Development](#development)
+
+##  1. General
 
 ### Q: What is Open vSwitch?
 
@@ -119,8 +133,7 @@ A: Starting in OVS 2.4, we switched the default ports to the
appropriate man page.
 
 
-Releases
-
+##  2. Releases
 
 ### Q: What does it mean for an Open vSwitch release to be LTS (long-term 
support)?
 
@@ -353,8 +366,7 @@ A: Bridge compatibility was a feature of Open vSwitch 1.9 
and earlier.
the release.  Be sure to start the ovs-brcompatd daemon.
 
 
-Terminology

+##  3. Terminology
 
 ### Q: I thought Open vSwitch was a virtual Ethernet switch, but the 
documentation keeps talking about bridges.  What's a bridge?
 
@@ -367,8 +379,7 @@ A: In networking, the terms "bridge" and "switch" are 
synonyms.  Open
 A: See the "VLAN" section below.
 
 
-Basic Configuration

+##  4. Basic configuration
 
 ### Q: How do I configure a port as an access port?
 
@@ -575,9 +586,7 @@ A: First, why do you want to do this?  Two connected 
bridges are not
 A: Open vSwitch does not support such a configuration.
Bridges always have their local ports.
 
-
-Implementation Details
---
+##  5. Implementation Details
 
 ### Q: I hear OVS has a couple of kinds of flows.  Can you tell me about them?
 
@@ -668,8 +677,7 @@ A: No.  There are several reasons:
   published in USENIX NSDI 2015.
 
 
-Performance

+##  6. Performance
 
 ### Q: I just upgraded and I see a performance drop.  Why?
 
@@ -689,8 +697,7 @@ A: The OVS kernel datapath may have been updated to a newer 
version than
userspace.
 
 
-Configuration Problems
---
+##  7. Configuration Problems
 
 ### Q: I created a bridge and added my Ethernet port to it, using commands
like these:
@@ -1045,8 +1052,7 @@ A: The short answer is that this is a misuse of a "tap" 
device.  Use
type "system", the default, instead).
 
 
-Quality of Service (QoS)
-
+##  8. Quality of Service(QOS)
 
 ### Q: Does OVS support Quality of Service (QoS)?
 
@@ -1202,9 +1208,7 @@ A: Since version 2.0, Open vSwitch has OpenFlow protocol 
support for
vSwitch software switch (neither the kernel-based nor userspace
switches).
 
-
-VLANs
--
+##  9. VLANs
 
 ### Q: What's a VLAN?
 
@@ -1477,8 +1481,7 @@ A: Open vSwitch implements Independent VLAN Learning 
(IVL) for
for each VLANs.
 
 
-VXLANs
--
+##  10. VXLANs
 
 ### Q: What's a VXLAN?
 
@@ -1512,8 +1515,7 @@ A: By default, Open vSwitch will use the assigned IANA 
port for VXLAN, which
options:dst_port=8472
 
 
-Using OpenFlow (Manually or Via Controller)

+##  11. Using OpenFlow (Manually or Via Controller)
 
 ### Q: What versions of OpenFlow does Open vSwitch support?
 
@@ -1573,7 +1575,7 @@ A: The following table lists the versions of OpenFlow 
supported by
1.5 is solidly implemented, Open vSwitch will enable those version
by default.
 
-### Q: Does Open vSwitch support MPLS?
+##  12. Development
 
 A: Before version 1.11, Open vSwitch did not support MPLS.  That is,
these versions can match on MPLS Ethernet types, but they cannot
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/3] ovsdb: Fix memory leak in replication logic

2016-07-27 Thread Andy Zhou

On Wed, Jul 27, 2016 at 12:43 PM, Ben Pfaff  wrote:

> On Tue, Jul 26, 2016 at 01:08:06PM -0700, Andy Zhou wrote:
> > Release the memory of reply message of the initial "monitor" request.
> >
> > Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076075.html
> > Signed-off-by: Andy Zhou 
> > ---
> >  ovsdb/replication.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> > index 3d589ef..af7ae5c 100644
> > --- a/ovsdb/replication.c
> > +++ b/ovsdb/replication.c
> > @@ -365,6 +365,10 @@ get_initial_db_state(const struct db *database)
> >  if (msg->type == JSONRPC_REPLY) {
> >  process_notification(msg->result, database->db);
> >  }
> > +
> > +if (msg) {
> > +jsonrpc_msg_destroy(msg);
> > +}
> >  }
>
> The 'if' isn't needed.
>
Right,  jsonrpc_msg_destroy() checks for NULL internally.

I will drop the 'if' before pushing.  Thanks for catching it.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 2/3] ovsdb: Fix memory leak in replication logic

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 01:08:06PM -0700, Andy Zhou wrote:
> Release the memory of reply message of the initial "monitor" request.
> 
> Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076075.html
> Signed-off-by: Andy Zhou 
> ---
>  ovsdb/replication.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> index 3d589ef..af7ae5c 100644
> --- a/ovsdb/replication.c
> +++ b/ovsdb/replication.c
> @@ -365,6 +365,10 @@ get_initial_db_state(const struct db *database)
>  if (msg->type == JSONRPC_REPLY) {
>  process_notification(msg->result, database->db);
>  }
> +
> +if (msg) {
> +jsonrpc_msg_destroy(msg);
> +}
>  }

The 'if' isn't needed.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] Considering the possibility of integrating DPDK generic classifier APIs in OVS.

2016-07-27 Thread Jesse Gross

On Thu, Jul 21, 2016 at 3:55 AM, Chandran, Sugesh
 wrote:
> Hi Mark & Jesse
>
> Thank you for looking into the the proposal,
> Please find my answers inline below.
>
> Regards
> _Sugesh
>
>> -Original Message-
>> From: Gray, Mark D
>> Sent: Wednesday, July 20, 2016 7:17 PM
>> To: Jesse Gross 
>> Cc: Chandran, Sugesh ;
>> dev@openvswitch.org; Giller, Robin 
>> Subject: RE: [ovs-dev] Considering the possibility of integrating DPDK 
>> generic
>> classifier APIs in OVS.
>>
>> >
>> > On Wed, Jul 20, 2016 at 6:43 PM, Gray, Mark D 
>> > wrote:
>> > >  [Gray, Mark D] I think we should focus on one or two use cases
>> > > rather than a general offload like you discuss below. A general
>> > > offload involves a huge amount of code churn and there are a lot of
>> > > difficulties,
>> > some that you have highlighted below.
>> > > A more focused implementation will flush out any issues with the API.
>> > > In particular, the VxLAN use case that you mentioned above and
>> > > perhaps the offload of the hash calculation (but the generic
>> > > filtering api would also need to support generation of hashes) could
>> > > be two targets for
>> > this DPDK api.
>> >
>> > I agree that targeting a specific use case is a good idea (as well as
>> > your other comments). It's probably worthwhile talking to John
>> > Fastabend about this (also from Intel) since he has tried to something
>> > similar for the past several years in Linux. Many of the general
>> > problems listed in the original email turn out to be very difficult.
>> > (Examples include capabilities; describing flows in a hardware
>> > independent manner is something that OpenFlow tried to tackle for a
>> > long time; which flows to offload in the face of table size limits
>> > while maintaining correct forwarding behavior; etc.)
>>  [Gray, Mark D]
>> Yes, John and I have discussed a lot of this in depth and we have done
>> whiteboarding of possible hw offload designs in OVS which is why I am quite
>> familiar with the issues.
> [Sugesh] I feel that the design must be considering all the capabilities
> Of DPDK APIs though it uses only for the VxLAN and hashing for now.
> The earlier implementation installs the flows in hardware when a flow get
> populated in the datapath. Everything happens in the datapath.
> The main focus of that implementation is to just optimize the VxLAN
> traffic, so we haven’t consider other cases where flow director can be useful.
> The generic APIs can do much more than just a flow director. So having
> a generic extendable design in OVS helps in many ways.
> Comments?
>
>>
>> >
>> > I think the VXLAN acceleration was a good use case since the vswitch
>> > is the owner of the tunnel endpoint and therefore is better equipped
>> > to make policy decisions. The main concern that I had with the
>> > previous implementation was that it was making assumptions about the
>> > contents of the inner flow based on the UDP source port, which is not
>> > really safe since that is just a hash.
>> [Gray, Mark D]
>> I read your comments on this I had a look through Sugesh's code to try and
>> see where this was happening. I couldn’t see it but I agree that the source
>> port is basically random and it's only a hash of the inner flow by 
>> convention.
>> Sugesh, is Jesse's concern valid in your implementation? I thought it was
>> actually extracting the inner header and you weren't making an assumption
>> about the source port?
> [Sugesh] Its not possible to do the inner packet extraction and lookup because
> the inner packet miniflow also include metadata from outer header. The
> inner packet matches on hash + tunnel flag to match the flow in the last
> implementation.
> The proposed design may solve this by making  the control plane to insert two 
> set rules in the datapath,
> for VxLAN tunnel traffic
> set :1 (Software fallback path)
> Rule 1: Outer tunnel header rule
> Rule 2 : Inner header rule.
> Set:1 is purely software rules and that expected to be present in the 
> datapath until the
> Hardware flow insertion is complete.
>
> Set:2 (Software + Hardware path)
> Rule 1:- Outer header hardware flow director rule, Programmed by OVS control 
> plane.
> Rule 2:- Inner header software rule(matches on miniflow only from inner 
> header +
> hardware reported tunnel ID). Please note there is no tunnel metadata from 
> the outer
> header uses here.
>
> To start off,
> OVS can verify the hardware capabilities when a user adds a port. For now its 
> only
> the flow director and hashing.
> For every flow insertion request To/From the port, control plane has to check 
> if it is a
> candidate for hardware flow and insert the flows in NICs accordingly.
> We can assume that every hardware flow associate with an ID and queue(Or 
> either of one).
> The flow lookup and matching logic has to be changed to handle these 
> parameters too.
>

Re: [ovs-dev] [PATCH 1/3] INSTALL.md: Update missing hyperlink for Windows install guide.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 07:31:09PM +0100, Bhanuprakash Bodireddy wrote:
> Signed-off-by: Bhanuprakash Bodireddy 

I applied all of these, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 1/3] ovsdb: Properly close replication rpc connection

2016-07-27 Thread Andy Zhou

On Tue, Jul 26, 2016 at 5:37 PM, William Tu  wrote:

> Hi Andy,
>
> Thanks for fixing the memory leak! I've tested it and it solved the
> issue. btw, I think we don't have to assign "NULL" to static variable,
> C99 standard assume all static variable initializes to 0.
>

Right, I will drop the NULL initializations.

>
> Acked-by: William Tu 
>

Thanks for the review. I will push to master in a minute.

>
> On Tue, Jul 26, 2016 at 1:08 PM, Andy Zhou  wrote:
> > This patch removes rpc related memory leak reported below.
> >
> > Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076075.html
> > Signed-off-by: Andy Zhou 
> > ---
> >  ovsdb/ovsdb-server.c | 1 +
> >  ovsdb/replication.c  | 5 +++--
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> > index 239cca8..1c6ddca 100644
> > --- a/ovsdb/ovsdb-server.c
> > +++ b/ovsdb/ovsdb-server.c
> > @@ -202,6 +202,7 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc,
> struct shash *all_dbs,
> >  }
> >  }
> >
> > +disconnect_remote_server();
> >  free(remotes_error);
> >  }
> >
> > diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> > index 52b7085..3d589ef 100644
> > --- a/ovsdb/replication.c
> > +++ b/ovsdb/replication.c
> > @@ -32,8 +32,8 @@
> >  #include "table.h"
> >  #include "transaction.h"
> >
> > -static char *remote_ovsdb_server;
> > -static struct jsonrpc *rpc;
> > +static char *remote_ovsdb_server = NULL;
> > +static struct jsonrpc *rpc = NULL;
> >  static struct sset monitored_tables =
> SSET_INITIALIZER(_tables);
> >  static struct sset tables_blacklist =
> SSET_INITIALIZER(_blacklist);
> >  static bool reset_dbs = true;
> > @@ -391,6 +391,7 @@ check_for_notifications(struct shash *all_dbs)
> >  if (error == EAGAIN) {
> >  return;
> >  } else if (error) {
> > +jsonrpc_close(rpc);
> >  rpc = open_jsonrpc(remote_ovsdb_server);
> >  if (!rpc) {
> >  /* Remote server went down. */
> > --
> > 1.9.1
> >
> > ___
> > dev mailing list
> > dev@openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] release-process.md: Document OVS release process and propose a schedule.

2016-07-27 Thread Ben Pfaff

On Tue, Jul 19, 2016 at 10:58:08AM -0700, Ben Pfaff wrote:
> This document has two different kinds of text:
> 
>- The first sections of the document, "Release Strategy" and "Release
>  Numbering", describe what we've already been doing for most of the
>  history of Open vSwitch.  If there is anything surprising in them,
>  then it's because our process has not been transparent enough, and not
>  because we're making a change.
> 
>- The final section of the document, "Release Scheduling", is a proposal
>  for current and future releases.  We have not had a regular release
>  schedule in the past, but it seems important to have one in the
>  future, so this section requires review and feedback from everyone in
>  the community.
> 
> Signed-off-by: Ben Pfaff 

I applied this to master.  Thanks to everyone for your feedback.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [RFC PATCH v2 00/13] Add Network Service Header Support

2016-07-27 Thread Jesse Gross

On Thu, Jul 21, 2016 at 3:40 PM, Jan Scheurich
 wrote:
> 1. The pending question whether to model NSH headers as packet header match 
> fields, metadata fields, or both applies in particular to the MD2 TLVs.
>
> We have three main OpenFlow use cases for the MD2 TLVs:
> a) Match on an MD2 TLV of an NSH packet
> b) Set a MD2 TLV of an NSH packet
> c) Pop an NSH header of MD2 format (saving selected TLVs in metadata)
> d) Push an NSH header of MD2 format with selected TLVs and set their values
>
> Use cases c) and d) would naturally fit the idea to re-use the 
> TUN_METADATA[X] fields introduced for Geneve also for NSH TLVs. The push_nsh 
> action should take the MD format as input parameter and in case of MD2 would 
> push those TLVs for which mappings have been defined (see question 2 below) 
> and set their values from the mapped TUN_METADATA[X] fields. I guess that is 
> how the Geneve tunnel ports works. Can you confirm that?

With Geneve, the current implementation doesn't automatically add all
TLVs that have a mapping. It will only add the options for which a
value has actually been assigned during the flow translation, for
example, using set_field. When a value is assigned to a field during
flow setup, a bit is also set in a mask that tracks which options
should be emitted. This allows different flows to use different sets
of options.

There is also an analogue to this on receive: matching on a metadata
field requires that the field be present in the original packet.
Effectively this means that the metadata fields are not initialized to
zero regardless of their presence. If you only wish match on the
presence of a field, this can be done by including the match with an
all wildcarded value. If you don't care whether it is present, then
simply don't include it in the flow specification, even if it is
defined in the mapping table.

> But for use cases a) and b) the same TUN_METADATA[X] fields would have to be 
> populated by the parser and used as packet header match fields. Technically 
> this might be implementable in OVS, but would it be acceptable?
>
> Otherwise we'd have to define a second set of TLV_OPTION[X] match fields on 
> the OpenFlow protocol level, which could be subject to the same TLV mapping 
> as TUN_METADATA[X].
>
> Internally, these fields could probably be stored in the same memory location 
> in the flow as TUN_METADATA[X]. There should be no collision because a mapped 
> NSH TLV can either be a packet header or metadata but not both. And even if 
> NSH would use Geneve tunnels as transport in the future, Geneve and NSH TLVs 
> can never be mapped to the same index.
>
> In contrast to TUN_METADATA[X] fields, a mapped TLV_OPTION[X] may not be 
> present in a packet fulfilling that pre-requisite. The presence info must be 
> somehow be captured in the packet's flow.

I guess having both map to the same set of fields doesn't necessarily
bother me that much. If they are all being stored internally in the
same place in the flow (which I think is necessary - having a second
set of option data in the flow key seems wasteful and unlikely to be
needed in practice) then it seems like it mostly comes down to naming.
I suppose there isn't that much cost to having an extra set of
OpenFlow field names but it seems cleaner not to given the number of
fields that we are talking about.

In any case, there will clearly be some limit on the amount of
metadata when nesting NSH in Geneve. As long as we do something
reasonably sane in this situation, I don't think it will be a big
issue.

In terms of tracking presence of options, does my comment above answer
your question?

> 2. The current Nicira extension messages for managing TLV option mappings 
> identify the TLV option to be mapped by {Class, Type and Length}. While such 
> a tuple is unique within a given protocol like Geneve, there is no guarantee 
> that the same triple cannot be defined for a TLV in another protocol. Thus, a 
> mapping for, say, NSH could collide with an independent mapping for Geneve.
>
> To address this issue, the TLV option mapping commands could include an 
> additional 'protocol' identifier. But that would be a backward incompatible 
> change of an external interface. Would you accept that or is there a clean 
> way to solve this without such an impact on OpenFlow?

There is some reserved space in the OpenFlow mapping actions. One way
to handle this would be to allocate a protocol type out of that space
and Geneve could be assigned type 0 (which is what existing clients
will send for reserved space) and NSH assigned type 1. That should
allow this behavior without breaking backwards compatibility of the
existing interface.

I'm not sure if this is really necessary though, given that we are
tracking which options are used. Another possibility would be to use
the length field as part of the lookup when converting the raw packet
data to OpenFlow fields during upcalls (currently we only use class
and

[ovs-dev] Mea culpa - rewound master branch

2016-07-27 Thread Ben Pfaff

Some time ago this morning, I accidentally pushed several patches that
were still under review to master.  I've force-pushed a correction to
the branch.  My apologies--I hope that this does not screw up anyone's
development process in a bad way.

Sorry,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v1] Enable manager_option configuation for NB and SB

2016-07-27 Thread Ben Pfaff

On Mon, Jul 18, 2016 at 10:21:09PM -0700, Amitabha Biswas wrote:
> NB and SB dbs are handled by separate ovsdb-server processes. The
> ovsdb-server processes manage dbs based on the schemas for OVN NorthBound
> and SouthBound. These schemas do not have any Manager Options similar
> to the Open_vSwitch schema. As a result, for e.g., the frequency of
> inactivity probes sent to clients from the ovsdb-server cannot be
> modified.
> 
> This patch addresses the above problem by creating independent
>  "conf.db"s for NB and SB. The ovsdb-server process which handles
> the NB will handle a NB_conf.db as well, ditto for the ovsdb-server
> that handles the SB.
> 
> A similar result can be obtained by adding a Manager Table to the
> NorthBound and SouthBound DBs as well. In such case there would
> not be a need for a separate "config" database. I'm willing to
> consider this solution as well pending feedback.

Is there an advantage to having a separate database for this purpose?
It strikes me as an odd design.

The vtep schema also in the OVS tree (vtep/vtep.ovsschema) has its own
Manager table.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ tunneling)

2016-07-27 Thread Ben Pfaff

On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
> Flow key handleing changes:
> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
>   headers.
> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
> 
> Refacter VLAN handling in dpif-xlate:
> - Introduce 'xvlan' to track VLAN stack during flow processing.
> - Input and output VLAN translation according to the xbundle type.
> 
> Push VLAN action support:
> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
> - Support push_vlan on dot1q packets.
> 
> Add new port VLAN mode "dot1q-tunnel":
> - Example:
> ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
>   and pops it on egress.
> - Customer VLAN check:
> ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
>   Only customer VLAN of 10 and 20 are allowed.
> 
> Signed-off-by: Xiao Liang 

Thanks for working on this!  I can see that it was a great deal of
work.  In addition to the "sparse" related fixes from a few days ago,
I have some more detailed comments.

It looks like this patch causes some tests to fail and then later
patches fix them.  This is not our practice: a patch should never make
tests fails (at least not intentionally).  Instead, if a patch
requires changes to tests, then the same patch should update the
tests.  If the later patches are required to fix tests, they should be
folded into this one.

Our practice is to give arrays names that are plural, like 'vlans'.

Let's talk about VLANs in the flow struct.  The use of an array of
TPID+TCI seems fine.  I think, however, that we need to define and
enforce an invariant: if vlan[i] matches on anything, for i > 0, then
every vlan[j], for j < i, needs to match on (tci & VLAN_CFI) != 0.
Otherwise we can end up with nonsensical matches like one that matches
on vlan[0] being not present (i.e. match on (vlan[0] & VLAN_CFI) == 0)
but vlan[1] being present (i.e. match on (vlan[1] & VLAN_CFI) != 0).
We enforce a similar invariant for MPLS.

I'm concerned about backward compatibility.  Consider some application
built on Open vSwitch using OpenFlow.  Today, it can distinguish
single-tagged and double-tagged packets (that use outer Ethertype
0x8100), as follows:

- A single-tagged packet has vlan_tci != 0 and some non-VLAN
  dl_type.

- A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.

With this patch, this won't work, because neither kind of packet has a
VLAN dl_type.  Instead, applications need to first match on the outer
VLAN, then pop it off, then match on the inner VLAN.  This difference
could lead to security problems in applications.  An application
might, for example, want to pop an outer VLAN and forward the packet,
but only if there is no inner VLAN.  If it is implemented according to
the previous rules, then it will not notice the inner VLAN.

There are probably multiple ways to deal with this problem.  Let me
propose one.  It is somewhat awkward, so there might be a more
graceful way.  Basically the idea is to retain the current view from
an OpenFlow perspective:

- Packet without tags: vlan_tci == 0, dl_type is non-VLAN
- Packet with 1 tag:   vlan_tci != 0, dl_type is non-VLAN
- Packet with 2+ tags: vlan_tci != 0, dl_type is 2nd VLAN

So, when a packet with 2 tags pops off the outermost one, dl_type
becomes the inner Ethertype (such as IPv4) and vlan_tci becomes the
single remaining VLAN.

I think there's another backward compatibility risk.  This patch adds
support for TPID 0x88a8 without adding any way for OpenFlow
applications to distinguish 88a8 from 8100.  This means that
applications that previously handled 0x8100 VLANs will now handle
0x88a8 VLANs whereas previously they were able to filter these out.  I
don't have a wonderful idea on how to handle this but I think we need
some way.  (The OpenFlow spec is totally unhelpful here.)

The tests seem incomplete in that they do not seem to add much in the
way of tests for OpenFlow handling of multiple VLANs.  I'd feel more
confident if it added some.

In a few places it would be more convenient to make struct
flow_vlan_hdr into a union, like this:

union flow_vlan_hdr {
ovs_be32 qtag;
struct {
ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD */
ovs_be16 tci;
};
};

We could take advantage of it like this, for example:

@@ -326,16 +326,12 @@ parse_mpls(const void **datap, size_t *sizep)
 }

 static inline bool
-parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
+parse_vlan(const void **datap, size_t *sizep, union flow_vlan_hdr *vlan_hdrs)
 {
 int encaps;
 const ovs_be16 *eth_type;
-struct qtag_prefix {
-ovs_be16 eth_type;
-ovs_be16 tci;
-};

-memset(vlan_hdrs, 0, sizeof(struct

[ovs-dev] [PATCH 1/3] INSTALL.md: Update missing hyperlink for Windows install guide.

2016-07-27 Thread Bhanuprakash Bodireddy

Signed-off-by: Bhanuprakash Bodireddy 
---
 INSTALL.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/INSTALL.md b/INSTALL.md
index 47126a3..591f924 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -779,6 +779,7 @@ Please report problems to b...@openvswitch.org.
 [INSTALL.RHEL.md]:INSTALL.RHEL.md
 [INSTALL.XenServer.md]:INSTALL.XenServer.md
 [INSTALL.NetBSD.md]:INSTALL.NetBSD.md
+[INSTALL.Windows.md]:INSTALL.Windows.md
 [INSTALL.DPDK.md]:INSTALL.DPDK.md
 [INSTALL.userspace.md]:INSTALL.userspace.md
 [FAQ.md]:FAQ.md
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 2/3] INSTALL.RHEL: Update missing hyperlink for Fedora install guide.

2016-07-27 Thread Bhanuprakash Bodireddy

Signed-off-by: Bhanuprakash Bodireddy 
---
 INSTALL.RHEL.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/INSTALL.RHEL.md b/INSTALL.RHEL.md
index 82f3efd..3c3f1a7 100644
--- a/INSTALL.RHEL.md
+++ b/INSTALL.RHEL.md
@@ -167,4 +167,5 @@ Reporting Bugs
 Please report problems to b...@openvswitch.org.
 
 [INSTALL.md]:INSTALL.md
+[INSTALL.Fedora.md]:INSTALL.Fedora.md
 [rhel/README.RHEL]:rhel/README.RHEL
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 3/3] INSTALL.md: Update configure section for built-in intrinsics.

2016-07-27 Thread Bhanuprakash Bodireddy

Built-in CRC32 intrinsics can be used for efficient hash computation on
processors with SSE4.2 support.

Signed-off-by: Bhanuprakash Bodireddy 
---
 INSTALL.md | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/INSTALL.md b/INSTALL.md
index 591f924..6fbf9ea 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -217,6 +217,21 @@ default CFLAGS plus "-mssse3", you might run configure as 
follows:
 
   `% ./configure CFLAGS="-g -O2 -mssse3"`
 
+For efficient hash computation special flags can be passed to leverage
+built-in intrinsics.  For example on X86_64 with SSE4.2 instruction set
+support, CRC32 intrinsics can be used by passing '-msse4.2'.
+
+  `% ./configure CFLAGS="-g -O2 -msse4.2"`
+
+If you are on a different processor and don't know what flags to choose, it
+is recommended to use '-march=native' settings.
+
+  `% ./configure CFLAGS="-g -O2 -march=native"`
+
+With this, GCC will detect the processor and automatically set appropriate
+flags for it.  This should not be used if you are compiling OVS outside the
+target machine.
+
 Note that these CFLAGS are not applied when building the Linux
 kernel module.  Custom CFLAGS for the kernel module are supplied
 using the EXTRA_CFLAGS variable when running make.  So, for example:
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [ovs-dev,ovn-controller] Physical: persist tunnels

2016-07-27 Thread Hui Kang



"dev"  wrote on 07/26/2016 02:37:30 PM:

> From: "Liran Schour" 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/26/2016 02:38 PM
> Subject: Re: [ovs-dev] [ovs-dev,ovn-controller] Physical: persist tunnels
> Sent by: "dev" 
>
> "dev"  wrote on 26/07/2016 12:02:37 AM:
>
> > From: Flavio Fernandes 
> > To: Ryan Moats 
> > Cc: dev@openvswitch.org
> > Date: 26/07/2016 12:06 AM
> > Subject: Re: [ovs-dev] [ovs-dev,ovn-controller] Physical: persist
> tunnels
> > Sent by: "dev" 
> >
> >
> > > On Jul 25, 2016, at 12:28 PM, Ryan Moats  wrote:
> > >
> > > While commit ab39371d68842b7e4000cc5d8718e6fc04e92795
> > > (ovn-controller: Handle physical changes correctly) addressed
> > > unit test failures, it did so at the cost of performance: [1]
> > > notes that ovn-controller cpu usage is now pegged at 100%.
> > >
> > > Root cause of this is that while the storage for tunnels is
> > > persisted, their creation is not (which the above changed
> > > incorrectly assumed was the case).  This patch persists
> > > tunneled data across invocations of physical_run.  A side
> > > effect is that renaming of localfvif_map_changed variable
> > > to physical_map_changed and extending its scope to include
> > > tunnel changes.
> > >
> >
> > I tested this fix by using a Vagrant file that stamps out vms as
> > compute nodes.
> > To deploy ovn, call the script ?/vagrant/scripts/setup-ovn-
> > cluster.sh? and that
> > would render ovn-controller in compute nodes to peg the cpu at 100%
> before the
> > changes.
> >
> > More info on _easily_ deploying this cluster is available here:
> >
https://github.com/flavio-fernandes/just-ovn-nodes/blob/master/README.md
> >
> >
> > > [1] http://openvswitch.org/pipermail/dev/2016-July/076058.html
> > >
> > > Signed-off-by: Ryan Moats 
> >
> > Acked-by: Flavio Fernandes 
> > Tested-by: Flavio Fernandes 
> >
>
> Tested this fix with a cluster of 50 hosts.
>
> Acked-by: Liran Schour 
> Tested-by: Liran Schour 

This patch fixes the 100% CPU unitization of ovn-controller. Without this
patch
the ovn-controller reaches 100% CPU in fresh deployment using
ovn-scale-test [1].
After applying this patch, the CPU utilization looks good.

[1] https://github.com/openvswitch/ovn-scale-test

Acked-by: Hui Kang 
Tested-by: Hui Kang 

>
> >
> > > ---
> > > ovn/controller/physical.c | 59 
> > +--
> > > 1 file changed, 37 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> > > index a104e33..e788fe5 100644
> > > --- a/ovn/controller/physical.c
> > > +++ b/ovn/controller/physical.c
> > > @@ -605,8 +605,13 @@ physical_run(struct controller_ctx *ctx, enum
> > mf_field_id mff_ovn_geneve,
> > > uuid_generate(hc_uuid);
> > > }
> > >
> > > +/* This bool tracks physical mapping changes. */
> > > +bool physical_map_changed = false;
> > > +
> > > struct simap new_localvif_to_ofport =
> > > SIMAP_INITIALIZER(_localvif_to_ofport);
> > > +struct simap new_tunnel_to_ofport =
> > > +SIMAP_INITIALIZER(_tunnel_to_ofport);
> > > for (int i = 0; i < br_int->n_ports; i++) {
> > > const struct ovsrec_port *port_rec = br_int->ports[i];
> > > if (!strcmp(port_rec->name, br_int->name)) {
> > > @@ -668,18 +673,23 @@ physical_run(struct controller_ctx *ctx,
> > enum mf_field_id mff_ovn_geneve,
> > > continue;
> > > }
> > >
> > > -struct chassis_tunnel *tun = xmalloc(sizeof *tun);
> > > -hmap_insert(, >hmap_node,
> > > -hash_string(chassis_id, 0));
> > > -tun->chassis_id = chassis_id;
> > > -tun->ofport = u16_to_ofp(ofport);
> > > -tun->type = tunnel_type;
> > > -full_binding_processing = true;
> > > -binding_reset_processing();
> > > -
> > > -/* Reprocess logical flow table immediately. */
> > > -lflow_reset_processing();
> > > -poll_immediate_wake();
> > > +simap_put(_tunnel_to_ofport, chassis_id,
ofport);
> > > +struct chassis_tunnel *tun;
> > > +if ((tun = chassis_tunnel_find(chassis_id))) {
> > > +/* If the tunnel's ofport has changed, update.
*/
> > > +if (tun->ofport != u16_to_ofp(ofport)) {
> > > +tun->ofport = u16_to_ofp(ofport);
> > > +physical_map_changed = true;
> > > +}
> > > +} else {
> > > +tun = xmalloc(sizeof

[ovs-dev] [PATCH v4] ovn-northd, tests: Adding IPAM to ovn-northd.

2016-07-27 Thread Nimay Desai

Added an IPv4 and MAC addresses management system to ovn-northd. When a logical
switch's other_config:subnet field is set, logical ports attached to that
switch that have the keyword "dynamic" in their addresses column will
automatically be allocated a globally unique MAC address/unused IPv4 address
within the provided subnet. The allocated address will populate the
dynamic_addresses column. This can be useful for a user who wants to deploy
many VM's or containers with networking capabilities, but does not care about
the specific MAC/IPv4 addresses that are assigned.

Added tests in ovn.at for ipam.

Signed-off-by: Nimay Desai 
---
 AUTHORS   |   1 +
 ovn/northd/ovn-northd.c   | 390 +-
 ovn/ovn-nb.ovsschema  |  10 +-
 ovn/ovn-nb.xml|  35 +
 ovn/utilities/ovn-nbctl.c |   2 +-
 tests/ovn.at  | 294 ++
 6 files changed, 728 insertions(+), 4 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index 334e17f..8e04e75 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -166,6 +166,7 @@ Murphy McCauley murphy.mccau...@gmail.com
 Natasha Gudenata...@nicira.com
 Neil McKee  neil.mc...@inmon.com
 Neil Zhuz...@centecnetworks.com
+Nimay Desai nimaydes...@gmail.com
 Nithin Raju nit...@vmware.com
 Niti Rohillaniti.rohi...@tcs.com
 Numan Siddique  nusid...@redhat.com
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index a836881..5183d15 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -58,6 +58,13 @@ static const char *ovnsb_db;
 
 static const char *default_nb_db(void);
 static const char *default_sb_db(void);
+
+#define MAC_ADDR_PREFIX 0x0A00ULL
+#define MAC_ADDR_SPACE 0xff
+
+/* MAC address table of "struct eth_addr"s, that holds the MAC addresses
+ * allocated by the ipam. */
+static struct hmap macam = HMAP_INITIALIZER();
 
 /* Pipeline stages. */
 
@@ -288,8 +295,40 @@ struct ovn_datapath {
 uint32_t port_key_hint;
 
 bool has_unknown;
+
+/* IPAM data. */
+struct hmap ipam;
+};
+
+struct macam_node {
+struct hmap_node hmap_node;
+struct eth_addr mac_addr; /* Allocated MAC address. */
 };
 
+static void
+cleanup_macam(struct hmap *macam)
+{
+struct macam_node *node;
+HMAP_FOR_EACH_POP (node, hmap_node, macam) {
+free(node);
+}
+}
+
+struct ipam_node {
+struct hmap_node hmap_node;
+uint32_t ip_addr; /* Allocated IP address. */
+};
+
+static void
+destroy_ipam(struct hmap *ipam)
+{
+struct ipam_node *node;
+HMAP_FOR_EACH_POP (node, hmap_node, ipam) {
+free(node);
+}
+hmap_destroy(ipam);
+}
+
 static struct ovn_datapath *
 ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
 const struct nbrec_logical_switch *nbs,
@@ -302,6 +341,7 @@ ovn_datapath_create(struct hmap *datapaths, const struct 
uuid *key,
 od->nbs = nbs;
 od->nbr = nbr;
 hmap_init(>port_tnlids);
+hmap_init(>ipam);
 od->port_key_hint = 0;
 hmap_insert(datapaths, >key_node, uuid_hash(>key));
 return od;
@@ -316,6 +356,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct 
ovn_datapath *od)
  * use it. */
 hmap_remove(datapaths, >key_node);
 destroy_tnlids(>port_tnlids);
+destroy_ipam(>ipam);
 free(od->router_ports);
 free(od);
 }
@@ -600,6 +641,318 @@ ovn_port_allocate_key(struct ovn_datapath *od)
   (1u << 15) - 1, >port_key_hint);
 }
 
+static bool
+ipam_is_duplicate_mac(struct eth_addr *ea, uint64_t mac64, bool warn)
+{
+struct macam_node *macam_node;
+HMAP_FOR_EACH_WITH_HASH (macam_node, hmap_node, hash_uint64(mac64),
+ ) {
+if (eth_addr_equals(*ea, macam_node->mac_addr)) {
+if (warn) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_WARN_RL(, "Duplicate MAC set: "ETH_ADDR_FMT,
+ ETH_ADDR_ARGS(macam_node->mac_addr));
+}
+return true;
+}
+}
+return false;
+}
+
+static bool
+ipam_is_duplicate_ip(struct ovn_datapath *od, uint32_t ip, bool warn)
+{
+struct ipam_node *ipam_node;
+HMAP_FOR_EACH_WITH_HASH (ipam_node, hmap_node, hash_int(ip, 0),
+ >ipam) {
+if (ipam_node->ip_addr == ip) {
+if (warn) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_WARN_RL(, "Duplicate IP set: "IP_FMT,
+ IP_ARGS(htonl(ip)));
+}
+return true;
+}
+}
+return false;
+}
+
+static void
+ipam_insert_mac(struct eth_addr *ea, bool check)
+{
+if (!ea) {
+return;
+}
+
+uint64_t mac64 = eth_addr_to_uint64(*ea);
+/* If the new MAC was not

Re: [ovs-dev] [PATCH v3] ovn-northd, tests: Adding IPAM to ovn-northd.

2016-07-27 Thread Nimay Desai

On Sun, Jul 3, 2016 at 11:16 AM, Ben Pfaff  wrote:

> On Wed, Jun 29, 2016 at 01:53:24PM -0700, Nimay Desai wrote:
> > Added an IPv4 and MAC addresses management system to ovn-northd. When a
> logical
> > switch's options:subnet field is set, logical ports attached to that
> switch
> > that do not have a MAC/IPv4 address will automatically be allocated a
> globally
> > unique MAC address/unused IPv4 address within the provided subnet. This
> > can be useful for a user who wants to deploy many VM's or containers with
> > networking capabilities, but does not care about the specific MAC/IPv4
> > addresses that are assigned.
> >
> > Added tests in ovn.at for ipam.
> >
> > Signed-off-by: Nimay Desai 
>
> Thanks for working on this!
>
> "sparse" reports that MAC_ADDR_PREFIX is "long long" even though it's
> suffixed with plain "L".  I'd use a "ULL" suffix instead.


Okay, I have fixed that.


> There seems to be multiple instances of open-coded
> HMAP_FOR_EACH_WITH_HASH searches for IP and MAC addresses.  I recommend
> adding helper functions.


I have created two helper functions ipam_is_duplicate_mac() and
ipam_is_duplicate_ip()
to fix this.

>
> It seems inefficient in ipam_get_unused_mac() and ipam_get_unused_ip()
> to start the search from the beginning of the space instead of from just
> past the last-assigned address, or in random order.  I don't know
> whether it matters though.




ipam_get_unused_mac() will now start its search from just past the
last-assigned
address. For ipam_get_unused_ip(), I am not sure what the operator's
expectation
is for ip address assignment and so for the time being the search
starts from
the beginning of the address space.

>
>
> Our usual pattern, I think, is that a column is named "options" if the
> meaning of its key-value pairs depend on the type of an entity, and
> "other_config" if the meaning does not depend on a type.  I think that
> this "subnet" configuration is independent of type (there is only one
> type of Logical_Switch), so I would put it in an other_config column.
>

I have changed the "subnet" configuration to reside in the other_config
column
in the Logical_Switch table.

>
> Everything following is about a few things that concern me a little.
> None of these are necessarily big problems, but I want to bring them up
> so that they can be thought through if necessary.
>
> Usually we do not design OVSDB schemas so that daemons with different
> purposes modify the same column, because it can be a recipe for
> confusion.  For example, we use separate columns in the Open_vSwitch
> schema to report the MAC address for a port and to request that a port
> be assigned a specific MAC address, because with a single column it's
> difficult to distinguish whether its value is meant for one purpose or
> the other.  I am not sure that this is exactly the same case, but if we
> were going to follow this principle here, too, I would add a new column
> to Logical_Switch_Port for northd-assigned addresses.  It could be
> called "dynamic_addresses", for example.
>

Dynamically allocated addresses will now reside in a new
"dynamic_addresses"
column in the Logical_Switch_Port table instead of the "addresses"
column.

>
> I am not sure that I am comfortable with the idea of assigning a dynamic
> address automatically when there is nothing in the addresses column.  It
> seems somewhat surprising.  I would consider adding a new "addresses"
> syntax to request a dynamic MAC/IP; for example, having it requested by
> specifying "dynamic" in addresses.
>

A logical switch port's "dynamic_addresses" column will now be
populated with a
dynamically allocated MAC and IPv4 address when the corresponding
logical
switch's subnet is set and the "dynamic" keyword is in the logical
switch port's
addresses column.

>
> I suspect that many deployments would want to enable port security for
> dynamic addresses, but I don't see a way to do that with the current
> design, except in a race-prone way where the CMS copies "addresses" into
> "port_security" once it's populated.
>

As of now, I have not been able to provide a solution for this. The
burden of
copying dynamically allocated addresses into "port_security" remains on
the
operator.

>
> Thanks,
>
> Ben.
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovsdb: Fix memory leak in execute_update.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 12:18:37PM +0800, nickcooper-zhangtonghao wrote:
> If the ‘ovsdb_condition_from_json’ return the error, cnd->clauses will be
> set NULL, so ‘ovsdb_condition_destroy’ should check the 'cnd->clauses’ before
> free it.

Please read CodingStyle.md:

Functions that destroy an instance of a dynamically-allocated type
should accept and ignore a null pointer argument.  Code that calls
such a function (including the C standard library function free())
should omit a null-pointer check.  We find that this usually makes
code easier to read.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] doc: Update INSTALL.Docker.md to reflect it's focus on OVN

2016-07-27 Thread Ryan Moats

"dev"  wrote on 07/27/2016 12:40:23 PM:

> From: Kyle Mestery 
> To: dev@openvswitch.org
> Date: 07/27/2016 12:40 PM
> Subject: [ovs-dev] [PATCH] doc: Update INSTALL.Docker.md to reflect
> it's focus on OVN
> Sent by: "dev" 
>
> While reading this document, the title stood out to me as not
> accurate. The title indicates it will discuss how to use
> Open vSwitch with Docker, but in reality, it's about using
> Open Virtual Networking with Docker.
>
> This change updates the title, as well as the opening paragraphs
> to more accurately reflect what the document is talking about.
>
> Signed-off-by: Kyle Mestery 
> ---
>  INSTALL.Docker.md | 19 ---
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/INSTALL.Docker.md b/INSTALL.Docker.md
> index b7ca4ba..08758ae 100644
> --- a/INSTALL.Docker.md
> +++ b/INSTALL.Docker.md
> @@ -1,14 +1,11 @@
> -How to Use Open vSwitch with Docker
> -
> -
> -This document describes how to use Open vSwitch with Docker 1.9.0 or
> -later.  This document assumes that you installed Open vSwitch by
following
> -[INSTALL.md] or by using the distribution packages such as .deb or .rpm.
> -Consult www.docker.com for instructions on how to install Docker.
> -
> -Docker 1.9.0 comes with support for multi-host networking.  Integration
> -of Docker networking and Open vSwitch can be achieved via Open vSwitch
> -virtual network (OVN).
> +How to Use Open Virtual Networking With Docker
> +==
> +
> +This document describes how to use Open Virtual Networking with Docker
> +1.9.0 or later.  This document assumes that you have installed Open
> +vSwitch by following [INSTALL.md] or by using hte distribution packages
> +such as .deb or.rpm. Consult www.docker.com for instructions on how to
> +install Docker.  Docker 1.9.0 comes with support for multi-host
networkign.
>
>
>  Setup

lgtm, opening the potential mosh pit via...

Acked-by: Ryan Moats 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH] doc: Update INSTALL.Docker.md to reflect it's focus on OVN

2016-07-27 Thread Kyle Mestery

While reading this document, the title stood out to me as not
accurate. The title indicates it will discuss how to use
Open vSwitch with Docker, but in reality, it's about using
Open Virtual Networking with Docker.

This change updates the title, as well as the opening paragraphs
to more accurately reflect what the document is talking about.

Signed-off-by: Kyle Mestery 
---
 INSTALL.Docker.md | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/INSTALL.Docker.md b/INSTALL.Docker.md
index b7ca4ba..08758ae 100644
--- a/INSTALL.Docker.md
+++ b/INSTALL.Docker.md
@@ -1,14 +1,11 @@
-How to Use Open vSwitch with Docker
-
-
-This document describes how to use Open vSwitch with Docker 1.9.0 or
-later.  This document assumes that you installed Open vSwitch by following
-[INSTALL.md] or by using the distribution packages such as .deb or .rpm.
-Consult www.docker.com for instructions on how to install Docker.
-
-Docker 1.9.0 comes with support for multi-host networking.  Integration
-of Docker networking and Open vSwitch can be achieved via Open vSwitch
-virtual network (OVN).
+How to Use Open Virtual Networking With Docker
+==
+
+This document describes how to use Open Virtual Networking with Docker
+1.9.0 or later.  This document assumes that you have installed Open
+vSwitch by following [INSTALL.md] or by using hte distribution packages
+such as .deb or.rpm. Consult www.docker.com for instructions on how to
+install Docker.  Docker 1.9.0 comes with support for multi-host networkign.
 
 
 Setup
-- 
2.7.4 (Apple Git-66)

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v3] Scanning only changed entries in the ovnsb

2016-07-27 Thread Hui Kang



Ben Pfaff  wrote on 07/27/2016 12:30:25 PM:

> From: Ben Pfaff 
> To: Hui Kang/Watson/IBM@IBMUS
> Cc: Hui Kang , dev@openvswitch.org
> Date: 07/27/2016 12:30 PM
> Subject: Re: [ovs-dev] [PATCH v3] Scanning only changed entries in the
ovnsb
>
> On Tue, Jul 26, 2016 at 03:44:57PM -0400, Hui Kang wrote:
> >
> >
> > "dev"  wrote on 07/26/2016 02:20:27 PM:
> >
> > > From: Ben Pfaff 
> > > To: Hui Kang 
> > > Cc: dev@openvswitch.org
> > > Date: 07/26/2016 02:20 PM
> > > Subject: Re: [ovs-dev] [PATCH v3] Scanning only changed entries in
the
> > ovnsb
> > > Sent by: "dev" 
> > >
> > > On Sat, Jul 16, 2016 at 11:58:25PM -0400, Hui Kang wrote:
> > > > Improve performance by scanning only changed port binding entries
> > > > when determining whether to mark the logical switch port up or
> > > > down
> > > >
> > > > Signed-off-by: Hui Kang 
> > >
> > > Won't this skip an initial round of updates at ovn-northd startup
time?
> > > (Certainly ovn-northd might get killed and restarted occasionally,
> > > especially if we're doing failover to a second host.)
> >
> > Hi, Ben,
> > After second thought, I think skipping the initial round is the purpose
of
> > this patch.
> >
> > ovsdb_idl_create(ovsdb) copies the the Port_binding table from
southbound
> > database whenever ovn-northd gets started. In this case, the northbound
> > DB and southbound db are synced.
>
> I don't understand this statement.  ovsdb_idl_create() doesn't sync
> anything between nb and sb dbs.

Sorry for the confusion. I have updated the patch to v4 addressing your
comment.

- Hui

>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v3] Scanning only changed entries in the ovnsb

2016-07-27 Thread Ben Pfaff

On Tue, Jul 26, 2016 at 03:44:57PM -0400, Hui Kang wrote:
> 
> 
> "dev"  wrote on 07/26/2016 02:20:27 PM:
> 
> > From: Ben Pfaff 
> > To: Hui Kang 
> > Cc: dev@openvswitch.org
> > Date: 07/26/2016 02:20 PM
> > Subject: Re: [ovs-dev] [PATCH v3] Scanning only changed entries in the
> ovnsb
> > Sent by: "dev" 
> >
> > On Sat, Jul 16, 2016 at 11:58:25PM -0400, Hui Kang wrote:
> > > Improve performance by scanning only changed port binding entries
> > > when determining whether to mark the logical switch port up or
> > > down
> > >
> > > Signed-off-by: Hui Kang 
> >
> > Won't this skip an initial round of updates at ovn-northd startup time?
> > (Certainly ovn-northd might get killed and restarted occasionally,
> > especially if we're doing failover to a second host.)
> 
> Hi, Ben,
> After second thought, I think skipping the initial round is the purpose of
> this patch.
> 
> ovsdb_idl_create(ovsdb) copies the the Port_binding table from southbound
> database whenever ovn-northd gets started. In this case, the northbound
> DB and southbound db are synced. 

I don't understand this statement.  ovsdb_idl_create() doesn't sync
anything between nb and sb dbs.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] FAQ: Update performance section regarding CRC32 intrinsics.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 05:15:43PM +0100, Bhanuprakash Bodireddy wrote:
> Signed-off-by: Bhanuprakash Bodireddy 

This probably makes better sense in INSTALL.md.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovs : Implementation to add meter in ovs flows

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 09:13:50PM +0530, Deepanshu Saxena1/CHN/TCS wrote:
> Thanks for the review comments. I am working on the review comments. 
> 
> As per earlier discussion on meter implementation [1] the approach is to use 
> tc capabilities for kernel implementation. Using qdisc and filters 
> I implemented meter as rate limiter. 
> 
> I understood your point that my code is applying tc on the output port 
> following the meter instruction. I implemented it like this because as per my 
> understanding tc is applied to a port. I selected the output port for that 
> purpose. 
>  
> Can you please clarify the approach to use tc for kernel space implementation 
> of meters. 

If that's all you can do with Linux traffic control then meters can't be
implemented that way.  If you do want to use Linux traffic control then
you'll have to find a creative way to apply it in the middle of a set of
actions.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH] FAQ: Update performance section regarding CRC32 intrinsics.

2016-07-27 Thread Bhanuprakash Bodireddy

Signed-off-by: Bhanuprakash Bodireddy 
---
 FAQ.md | 16 
 1 file changed, 16 insertions(+)

diff --git a/FAQ.md b/FAQ.md
index 35e1cac..6218feb 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -688,6 +688,22 @@ A: The OVS kernel datapath may have been updated to a 
newer version than
recommended to pair the same versions of the kernel module and OVS
userspace.
 
+### Q: My processor supports special instructions to compute hash and how to 
leverage them in OVS?
+
+A: OVS should be configured for leveraging the special instructions.  For
+   example on X86_64 with SSE4.2 instruction set support, CRC32 intrinsics
+   can be used for efficient hash computation by appending 'msse4.2' to CFLAGS.
+
+   ./configure CFLAGS="-g -O2 -msse4.2"
+
+   If you are on a different processor and don't know what flags to choose, it
+   is recommended to use '-march=native' settings.
+
+   ./configure CFLAGS="-g -O2 -march=native"
+
+   With this, GCC will autodetect the processor and automatically set 
appropriate
+   flags for it.  This should not be used if you are compiling OVS outside the
+   target machine.
 
 Configuration Problems
 --
-- 
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovn-nbctl: Improve usage message.

2016-07-27 Thread Ben Pfaff

On Wed, Jul 27, 2016 at 01:07:29PM +0530, Numan Siddique wrote:
> On Wed, Jul 27, 2016 at 11:13 AM, Ben Pfaff  wrote:
> 
> > The most important change here is to delete misspelled "the".
> >
> > Signed-off-by: Ben Pfaff 
> > ---
> >  ovn/utilities/ovn-nbctl.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> > index e594a32..947e3a1 100644
> > --- a/ovn/utilities/ovn-nbctl.c
> > +++ b/ovn/utilities/ovn-nbctl.c
> > @@ -402,9 +402,9 @@ DHCP Options commands:\n\
> >dhcp-options-list\n\
> > lists the DHCP_Options rows\n\
> >dhcp-options-set-options DHCP_OPTIONS_UUID  KEY=VALUE [KEY=VALUE]...\n\
> > -   set DHCP options to the DHCP_OPTIONS_UUID\n\
> > +   set DHCP options for DHCP_OPTIONS_UUID\n\
> >dhcp-options-get-options DHCO_OPTIONS_UUID \n\
> > -   displays the DHCP options of th
> > DHCP_OPTIONS_UUID\n\
> > +   displays the DHCP options for
> > DHCP_OPTIONS_UUID\n\
> >  \n\
> >  %s\
> >  \n\
> >
> 
> 
> Sorry for the typo and incorrect grammar in the original patch
> 
> Acked-by: Numan Siddique 

No problem.

Thanks for the review, I applied this to master.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] ovs : Implementation to add meter in ovs flows

2016-07-27 Thread Deepanshu Saxena1/CHN/TCS

Thanks for the review comments. I am working on the review comments. 

As per earlier discussion on meter implementation [1] the approach is to use tc 
capabilities for kernel implementation. Using qdisc and filters 
I implemented meter as rate limiter. 

I understood your point that my code is applying tc on the output port 
following the meter instruction. I implemented it like this because as per my 
understanding tc is applied to a port. I selected the output port for that 
purpose. 
 
Can you please clarify the approach to use tc for kernel space implementation 
of meters. 

[1] http://openvswitch.org/pipermail/dev/2016-April/069276.html


=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 3/5] netdev-vport: don't use system type when opening netdev

2016-07-27 Thread Thadeu Lima de Souza Cascardo

tunnel_check_status_change__ calls netdev_open with type system. Using NULL
instead will default to system in case the device is not opened yet, and allow a
different type in case it's already opened.

Any type should be fine, as netdev_get_carrier will work with any of them.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 lib/netdev-vport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 83a795c..87a30f8 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -274,7 +274,7 @@ tunnel_check_status_change__(struct netdev_vport *netdev)
 if (ovs_router_lookup(route, iface, NULL, )) {
 struct netdev *egress_netdev;
 
-if (!netdev_open(iface, "system", _netdev)) {
+if (!netdev_open(iface, NULL, _netdev)) {
 status = netdev_get_carrier(egress_netdev);
 netdev_close(egress_netdev);
 }
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 4/5] dpif-netdev: use the open_type when creating the local port

2016-07-27 Thread Thadeu Lima de Souza Cascardo

Instead of using the internal type, use the port_open_type when creating the
local port. That makes sure that whenever dpif_port_query is used, the netdev
open_type is returned instead of the "internal" type.

For other ports, that is already the case, as the netdev type is used when
creating the dp_netdev_port.

That changes the output of dpctl when showing the local port, and also when
trying to change its type. So, corresponding tests are fixed.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 11 ++-
 tests/dpctl.at| 13 +++--
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..1fea0d7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -941,7 +941,9 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 ovs_mutex_lock(>port_mutex);
 dp_netdev_set_nonpmd(dp);
 
-error = do_add_port(dp, name, "internal", ODPP_LOCAL);
+error = do_add_port(dp, name, dpif_netdev_port_open_type(dp->class,
+ "internal"),
+ODPP_LOCAL);
 ovs_mutex_unlock(>port_mutex);
 if (error) {
 dp_netdev_free(dp);
@@ -1129,7 +1131,7 @@ hash_port_no(odp_port_t port_no)
 }
 
 static int
-port_create(const char *devname, const char *open_type, const char *type,
+port_create(const char *devname, const char *type,
 odp_port_t port_no, struct dp_netdev_port **portp)
 {
 struct netdev_saved_flags *sf;
@@ -1142,7 +1144,7 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 *portp = NULL;
 
 /* Open and validate network device. */
-error = netdev_open(devname, open_type, );
+error = netdev_open(devname, type, );
 if (error) {
 return error;
 }
@@ -1233,8 +1235,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 return EEXIST;
 }
 
-error = port_create(devname, dpif_netdev_port_open_type(dp->class, type),
-type, port_no, );
+error = port_create(devname, type, port_no, );
 if (error) {
 return error;
 }
diff --git a/tests/dpctl.at b/tests/dpctl.at
index 067f2d2..b6d5dd6 100644
--- a/tests/dpctl.at
+++ b/tests/dpctl.at
@@ -23,14 +23,14 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 missed:0 lost:0
flows: 0
-   port 0: br0 (internal)
+   port 0: br0 (dummy)
 ])
 AT_CHECK([ovs-appctl dpctl/add-if dummy@br0 vif1.0,type=dummy,port_no=5])
 AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 missed:0 lost:0
flows: 0
-   port 0: br0 (internal)
+   port 0: br0 (dummy)
port 5: vif1.0 (dummy)
 ])
 AT_CHECK([ovs-appctl dpctl/add-if dummy@br0 vif1.0,type=dummy], [2], [],
@@ -44,8 +44,9 @@ AT_CHECK([ovs-appctl dpctl/set-if dummy@br0 
vif1.0,type=system], [2], [],
   [ovs-vswitchd: vif1.0: can't change type from dummy to system
 ovs-appctl: ovs-vswitchd: server returned an error
 ])
-AT_CHECK([ovs-appctl dpctl/set-if dummy@br0 br0,type=dummy], [2], [],
-  [ovs-vswitchd: br0: can't change type from internal to dummy
+AT_CHECK([ovs-appctl dpctl/set-if dummy@br0 br0,type=dummy], [0])
+AT_CHECK([ovs-appctl dpctl/set-if dummy@br0 br0,type=internal], [2], [],
+  [ovs-vswitchd: br0: can't change type from dummy to internal
 ovs-appctl: ovs-vswitchd: server returned an error
 ])
 AT_CHECK([ovs-appctl dpctl/del-if dummy@br0 vif1.0])
@@ -53,7 +54,7 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 missed:0 lost:0
flows: 0
-   port 0: br0 (internal)
+   port 0: br0 (dummy)
 ])
 AT_CHECK([ovs-appctl dpctl/del-if dummy@br0 vif1.0], [2], [],
   [ovs-vswitchd: no port named vif1.0
@@ -63,7 +64,7 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 missed:0 lost:0
flows: 0
-   port 0: br0 (internal)
+   port 0: br0 (dummy)
 ])
 AT_CHECK([ovs-appctl dpctl/del-if dummy@br0 nonexistent], [2], [],
   [ovs-vswitchd: no port named nonexistent
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 5/5] netdev: do not allow devices to be opened with conflicting types

2016-07-27 Thread Thadeu Lima de Souza Cascardo

When a device is already opened, netdev_open should verify that the types match,
or else return an error.

Otherwise, users might expect to open a device with a certain type and get a
handle belonging to a different type.

This also prevents certain conflicting configurations that would have a port of
a certain type in the database and one of a different type on the system.

For example, when adding an interface with a type other than system, and there
is already a system interface with the same name, as the routing table will hold
a reference to that system interface, some conflicts will arise. The netdev will
be opened with the incorrect type and that will make vswitchd remove it, but
adding it again will fail as it already exists. Failing earlier prevents some
vswitchd loops in reconfiguring the interface.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 lib/netdev.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/netdev.c b/lib/netdev.c
index 31a6a46..f0dcfc8 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -339,7 +339,8 @@ netdev_open(const char *name, const char *type, struct 
netdev **netdevp)
 if (!netdev) {
 struct netdev_registered_class *rc;
 
-rc = netdev_lookup_class(type && type[0] ? type : "system");
+type = type && type[0] ? type : "system";
+rc = netdev_lookup_class(type);
 if (rc && ovs_refcount_try_ref_rcu(>refcnt)) {
 netdev = rc->class->alloc();
 if (netdev) {
@@ -376,6 +377,11 @@ netdev_open(const char *name, const char *type, struct 
netdev **netdevp)
   name, type);
 error = EAFNOSUPPORT;
 }
+} else if (type && strcmp(type, netdev_get_type(netdev))) {
+VLOG_WARN("trying to create netdev %s of different type %s,"
+  " already is %s\n",
+  name, type, netdev_get_type(netdev));
+error = EEXIST;
 } else {
 error = 0;
 }
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 2/5] in-band: don't use system type when opening netdev

2016-07-27 Thread Thadeu Lima de Souza Cascardo

A netdev might be already opened with a different type and that can be used
instead. The system type is already the default type that will be used when
there is no netdev opened and the type is not specified.

And as long as the opened netdev supports the required operations, it doesn't
matter its type.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 ofproto/in-band.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/in-band.c b/ofproto/in-band.c
index e3ee41a..f69e94f 100644
--- a/ofproto/in-band.c
+++ b/ofproto/in-band.c
@@ -134,7 +134,7 @@ refresh_remote(struct in_band *ib, struct in_band_remote *r)
 {
 netdev_close(r->remote_netdev);
 
-retval = netdev_open(next_hop_dev, "system", >remote_netdev);
+retval = netdev_open(next_hop_dev, NULL, >remote_netdev);
 if (retval) {
 VLOG_WARN_RL(, "%s: cannot open netdev %s (next hop "
  "to controller "IP_FMT"): %s",
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 0/5] netdev_open conflicting types

2016-07-27 Thread Thadeu Lima de Souza Cascardo

Fix some uses of type on netdev_open.

We established that there are two types: the database type and the netdev type.
And that ofproto and dpif layers use the netdev type and return it when queried.

Some calls to netdev_open should use NULL instead of system. If there is no
netdev opened, system will be used as the default, but if a different type is
opened, the caller does not expect it to be of type system.

In other cases, the use of internal is incorrect and the appropriate
port_open_type should be used instead.

Finally, we make netdev_open return an error when a netdev of a different type
than the one requested already exists.

This will fix some bugs and alert users when conflicting interfaces exist on the
system and the database. For example, when a user configures an interface with a
type other than system, and there is a system interface with the same name.

Thadeu Lima de Souza Cascardo (5):
  in-band: use open_type when opening internal device
  in-band: don't use system type when opening netdev
  netdev-vport: don't use system type when opening netdev
  dpif-netdev: use the open_type when creating the local port
  netdev: do not allow devices to be opened with conflicting types

 lib/dpif-netdev.c  | 11 ++-
 lib/netdev-vport.c |  2 +-
 lib/netdev.c   |  8 +++-
 ofproto/in-band.c  |  5 +++--
 tests/dpctl.at | 13 +++--
 5 files changed, 24 insertions(+), 15 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH 1/5] in-band: use open_type when opening internal device

2016-07-27 Thread Thadeu Lima de Souza Cascardo

in-band code will open a device that it expects to be the main internal port of
the bridge. However, it's possible that the correct type is a different one. For
dpif-netdev, it might be a tap device, or a dummy device for dummy datapaths.
ofproto_port_open_type will give the correct type.

While this doesn't cause any problems right now, as the needed type would be
opened already, a later patch assumes netdev with different types cannot be
opened.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 ofproto/in-band.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ofproto/in-band.c b/ofproto/in-band.c
index 36e80f4..e3ee41a 100644
--- a/ofproto/in-band.c
+++ b/ofproto/in-band.c
@@ -422,9 +422,10 @@ in_band_create(struct ofproto *ofproto, const char 
*local_name,
 struct in_band *in_band;
 struct netdev *local_netdev;
 int error;
+const char *type = ofproto_port_open_type(ofproto->type, "internal");
 
 *in_bandp = NULL;
-error = netdev_open(local_name, "internal", _netdev);
+error = netdev_open(local_name, type, _netdev);
 if (error) {
 VLOG_ERR("%s: failed to initialize in-band control: cannot open "
  "datapath local port %s (%s)", ofproto->name,
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] SCAN00009562

2016-07-27 Thread Mandy







Sent from my Samsung device
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH v5 2/4] bridge: Pass interface's configuration to datapath.

2016-07-27 Thread Ilya Maximets

This commit adds functionality to pass value of 'other_config' column
of 'Interface' table to datapath.

This may be used to pass not directly connected with netdev options and
configure behaviour of the datapath for different ports.
For example: pinning of rx queues to polling threads in dpif-netdev.

Signed-off-by: Ilya Maximets 
Acked-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c  |  1 +
 lib/dpif-netlink.c |  1 +
 lib/dpif-provider.h|  5 +
 lib/dpif.c | 17 +
 lib/dpif.h |  1 +
 ofproto/ofproto-dpif.c | 15 +++
 ofproto/ofproto-provider.h |  4 
 ofproto/ofproto.c  | 29 +
 ofproto/ofproto.h  |  2 ++
 vswitchd/bridge.c  |  2 ++
 10 files changed, 77 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1ba6f3..b8c069d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4347,6 +4347,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_get_stats,
 dpif_netdev_port_add,
 dpif_netdev_port_del,
+NULL,   /* port_set_config */
 dpif_netdev_port_query_by_number,
 dpif_netdev_port_query_by_name,
 NULL,   /* port_get_pid */
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index d544072..a39faa2 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2348,6 +2348,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_get_stats,
 dpif_netlink_port_add,
 dpif_netlink_port_del,
+NULL,   /* port_set_config */
 dpif_netlink_port_query_by_number,
 dpif_netlink_port_query_by_name,
 dpif_netlink_port_get_pid,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 25f4280..21fb0ba 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -167,6 +167,11 @@ struct dpif_class {
 /* Removes port numbered 'port_no' from 'dpif'. */
 int (*port_del)(struct dpif *dpif, odp_port_t port_no);
 
+/* Refreshes configuration of 'dpif's port. The implementation might
+ * postpone applying the changes until run() is called. */
+int (*port_set_config)(struct dpif *dpif, odp_port_t port_no,
+   const struct smap *cfg);
+
 /* Queries 'dpif' for a port with the given 'port_no' or 'devname'.
  * If 'port' is not null, stores information about the port into
  * '*port' if successful.
diff --git a/lib/dpif.c b/lib/dpif.c
index bb2c4e6..53958c5 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -610,6 +610,23 @@ dpif_port_exists(const struct dpif *dpif, const char 
*devname)
 return !error;
 }
 
+/* Refreshes configuration of 'dpif's port. */
+int
+dpif_port_set_config(struct dpif *dpif, odp_port_t port_no,
+ const struct smap *cfg)
+{
+int error = 0;
+
+if (dpif->dpif_class->port_set_config) {
+error = dpif->dpif_class->port_set_config(dpif, port_no, cfg);
+if (error) {
+log_operation(dpif, "port_set_config", error);
+}
+}
+
+return error;
+}
+
 /* Looks up port number 'port_no' in 'dpif'.  On success, returns 0 and
  * initializes '*port' appropriately; on failure, returns a positive errno
  * value.
diff --git a/lib/dpif.h b/lib/dpif.h
index 981868c..a7c5097 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -839,6 +839,7 @@ void dpif_register_upcall_cb(struct dpif *, upcall_callback 
*, void *aux);
 int dpif_recv_set(struct dpif *, bool enable);
 int dpif_handlers_set(struct dpif *, uint32_t n_handlers);
 int dpif_poll_threads_set(struct dpif *, const char *cmask);
+int dpif_port_set_config(struct dpif *, odp_port_t, const struct smap *cfg);
 int dpif_recv(struct dpif *, uint32_t handler_id, struct dpif_upcall *,
   struct ofpbuf *);
 void dpif_recv_purge(struct dpif *);
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index faff1c7..79f2aa0 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3543,6 +3543,20 @@ port_del(struct ofproto *ofproto_, ofp_port_t ofp_port)
 }
 
 static int
+port_set_config(const struct ofport *ofport_, const struct smap *cfg)
+{
+struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+
+if (sset_contains(>ghost_ports,
+  netdev_get_name(ofport->up.netdev))) {
+return 0;
+}
+
+return dpif_port_set_config(ofproto->backer->dpif, ofport->odp_port, cfg);
+}
+
+static int
 port_get_stats(const struct ofport *ofport_, struct netdev_stats *stats)
 {
 struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
@@ -5610,6 +5624,7 @@ const struct ofproto_class ofproto_dpif_class = {
 port_query_by_name,
 port_add,
 port_del,
+port_set_config,
 port_get_stats,
 port_dump_start,
 port_dump_next,
diff --git a/ofproto/ofproto-provider.h

[ovs-dev] [PATCH v5 4/4] dpif-netdev: Introduce pmd-rxq-affinity.

2016-07-27 Thread Ilya Maximets

New 'other_config:pmd-rxq-affinity' field for Interface table to
perform manual pinning of RX queues to desired cores.

This functionality is required to achieve maximum performance because
all kinds of ports have different cost of rx/tx operations and
only user can know about expected workload on different ports.

Example:
# ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
Queue #0 pinned to core 3;
Queue #1 pinned to core 7;
Queue #2 not pinned.
Queue #3 pinned to core 8;

It's decided to automatically isolate cores that have rxq explicitly
assigned to them because it's useful to keep constant polling rate on
some performance critical ports while adding/deleting other ports
without explicit pinning of all ports.

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md  |  49 +++-
 NEWS |   2 +
 lib/dpif-netdev.c| 216 +--
 tests/pmd.at |   6 ++
 vswitchd/vswitch.xml |  23 ++
 5 files changed, 254 insertions(+), 42 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 5407794..7609aa7 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -289,14 +289,57 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
  # Check current stats
ovs-appctl dpif-netdev/pmd-stats-show
 
+ # Clear previous stats
+   ovs-appctl dpif-netdev/pmd-stats-clear
+ ```
+
+  7. Port/rxq assigment to PMD threads
+
+ ```
  # Show port/rxq assignment
ovs-appctl dpif-netdev/pmd-rxq-show
+ ```
 
- # Clear previous stats
-   ovs-appctl dpif-netdev/pmd-stats-clear
+ To change default rxq assignment to pmd threads rxqs may be manually
+ pinned to desired cores using:
+
+ ```
+ ovs-vsctl set Interface  \
+   other_config:pmd-rxq-affinity=
  ```
+ where:
+
+ ```
+  ::= NULL | 
+  ::=  |
+   , 
+  ::=  : 
+ ```
+
+ Example:
+
+ ```
+ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
+   other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+ Queue #0 pinned to core 3;
+ Queue #1 pinned to core 7;
+ Queue #2 not pinned.
+ Queue #3 pinned to core 8;
+ ```
+
+ After that PMD threads on cores where RX queues was pinned will become
+ `isolated`. This means that this thread will poll only pinned RX queues.
+
+ WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
queues
+ will not be polled. Also, if provided `core_id` is not available (ex. this
+ `core_id` not in `pmd-cpu-mask`), RX queue will not be polled by any
+ PMD thread.
+
+ Isolation of PMD threads also can be checked using
+ `ovs-appctl dpif-netdev/pmd-rxq-show` command.
 
-  7. Stop vswitchd & Delete bridge
+  8. Stop vswitchd & Delete bridge
 
  ```
  ovs-appctl -t ovs-vswitchd exit
diff --git a/NEWS b/NEWS
index 73d3fcf..1a34f75 100644
--- a/NEWS
+++ b/NEWS
@@ -45,6 +45,8 @@ Post-v2.5.0
Old 'other_config:n-dpdk-rxqs' is no longer supported.
Not supported by vHost interfaces. For them number of rx and tx queues
is applied from connected virtio device.
+ * New 'other_config:pmd-rxq-affinity' field for PMD interfaces, that
+   allows to pin port's rx queues to desired cores.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1ef0cd7..33f1216 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -53,7 +53,9 @@
 #include "openvswitch/list.h"
 #include "openvswitch/match.h"
 #include "openvswitch/ofp-print.h"
+#include "openvswitch/ofp-util.h"
 #include "openvswitch/ofpbuf.h"
+#include "openvswitch/shash.h"
 #include "openvswitch/vlog.h"
 #include "ovs-numa.h"
 #include "ovs-rcu.h"
@@ -62,7 +64,7 @@
 #include "pvector.h"
 #include "random.h"
 #include "seq.h"
-#include "openvswitch/shash.h"
+#include "smap.h"
 #include "sset.h"
 #include "timeval.h"
 #include "tnl-neigh-cache.h"
@@ -252,6 +254,12 @@ enum pmd_cycles_counter_type {
 
 #define XPS_TIMEOUT_MS 500LL
 
+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
+struct dp_netdev_rxq {
+struct netdev_rxq *rxq;
+unsigned core_id;   /* Сore to which this queue is pinned. */
+};
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -259,11 +267,12 @@ struct dp_netdev_port {
 struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
-struct netdev_rxq **rxq;
+struct dp_netdev_rxq *rxqs;
 atomic_bool dynamic_txqs;   /* If true XPS will be used. */
 unsigned *txq_used; /* Number of threads

[ovs-dev] [PATCH v5 3/4] dpif-netdev: Add reconfiguration request to dp_netdev.

2016-07-27 Thread Ilya Maximets

Next patches will add new conditions when reconfiguration will be
required. It'll be simpler to have common way to request reconfiguration.

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b8c069d..1ef0cd7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -223,8 +223,10 @@ struct dp_netdev {
  * 'struct dp_netdev_pmd_thread' in 'per_pmd_key'. */
 ovsthread_key_t per_pmd_key;
 
+struct seq *reconfigure_seq;
+uint64_t last_reconfigure_seq;
+
 /* Cpu mask for pin of pmd threads. */
-char *requested_pmd_cmask;
 char *pmd_cmask;
 
 uint64_t last_tnl_conf_seq;
@@ -943,6 +945,9 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 dp->port_seq = seq_create();
 fat_rwlock_init(>upcall_rwlock);
 
+dp->reconfigure_seq = seq_create();
+dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
 /* Disable upcalls by default. */
 dp_netdev_disable_upcall(dp);
 dp->upcall_aux = NULL;
@@ -967,6 +972,18 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 return 0;
 }
 
+static void
+dp_netdev_request_reconfigure(struct dp_netdev *dp)
+{
+seq_change(dp->reconfigure_seq);
+}
+
+static bool
+dp_netdev_is_reconf_required(struct dp_netdev *dp)
+{
+return seq_read(dp->reconfigure_seq) != dp->last_reconfigure_seq;
+}
+
 static int
 dpif_netdev_open(const struct dpif_class *class, const char *name,
  bool create, struct dpif **dpifp)
@@ -1025,6 +1042,8 @@ dp_netdev_free(struct dp_netdev *dp)
 ovs_mutex_unlock(>port_mutex);
 cmap_destroy(>poll_threads);
 
+seq_destroy(dp->reconfigure_seq);
+
 seq_destroy(dp->port_seq);
 hmap_destroy(>ports);
 ovs_mutex_destroy(>port_mutex);
@@ -2545,9 +2564,10 @@ dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 
-if (!nullable_string_is_equal(dp->requested_pmd_cmask, cmask)) {
-free(dp->requested_pmd_cmask);
-dp->requested_pmd_cmask = nullable_xstrdup(cmask);
+if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
+free(dp->pmd_cmask);
+dp->pmd_cmask = nullable_xstrdup(cmask);
+dp_netdev_request_reconfigure(dp);
 }
 
 return 0;
@@ -2696,12 +2716,12 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 struct dp_netdev_port *port, *next;
 int n_cores;
 
+dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
+
 dp_netdev_destroy_all_pmds(dp);
 
 /* Reconfigures the cpu mask. */
-ovs_numa_set_cpu_mask(dp->requested_pmd_cmask);
-free(dp->pmd_cmask);
-dp->pmd_cmask = nullable_xstrdup(dp->requested_pmd_cmask);
+ovs_numa_set_cpu_mask(dp->pmd_cmask);
 
 n_cores = ovs_numa_get_n_cores();
 if (n_cores == OVS_CORE_UNSPEC) {
@@ -2770,8 +2790,7 @@ dpif_netdev_run(struct dpif *dpif)
 
 dp_netdev_pmd_unref(non_pmd);
 
-if (!nullable_string_is_equal(dp->pmd_cmask, dp->requested_pmd_cmask)
-|| ports_require_restart(dp)) {
+if (dp_netdev_is_reconf_required(dp) || ports_require_restart(dp)) {
 reconfigure_pmd_threads(dp);
 }
 ovs_mutex_unlock(>port_mutex);
-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] [PATCH v5 1/4] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-07-27 Thread Ilya Maximets

If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<>
pmd thread numa_id 0 core_id 13:
port: vhost-user1   queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1   queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1   queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1   queue-id: 3
<>

As we can see above dpdk0 port polled by threads on cores:
12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.

In previous example:

pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1

As a result only 2 of 4 queues used.

To fix this issue some kind of XPS implemented in following way:

* TX queue-ids are allocated dynamically.
* When PMD thread first time tries to send packets to new port
  it allocates less used TX queue for this port.
* PMD threads periodically performes revalidation of
  allocated TX queue-ids. If queue wasn't used in last
  XPS_TIMEOUT_MS milliseconds it will be freed while revalidation.
* XPS is not working if we have enough TX queues.

Reported-by: Zhihong Wang 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 204 --
 lib/netdev-bsd.c  |   3 +-
 lib/netdev-dpdk.c |  32 +++-
 lib/netdev-dummy.c|   3 +-
 lib/netdev-linux.c|   3 +-
 lib/netdev-provider.h |  11 +--
 lib/netdev.c  |  13 ++--
 lib/netdev.h  |   2 +-
 8 files changed, 198 insertions(+), 73 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..d1ba6f3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
 PMD_N_CYCLES
 };
 
+#define XPS_TIMEOUT_MS 500LL
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -256,6 +258,9 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
+atomic_bool dynamic_txqs;   /* If true XPS will be used. */
+unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
+struct ovs_mutex txq_used_mutex;
 char *type; /* Port type as requested by user. */
 };
 
@@ -384,8 +389,9 @@ struct rxq_poll {
 
 /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
 struct tx_port {
-odp_port_t port_no;
-struct netdev *netdev;
+struct dp_netdev_port *port;
+int qid;
+long long last_used;
 struct hmap_node node;
 };
 
@@ -443,9 +449,10 @@ struct dp_netdev_pmd_thread {
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
 
-/* Queue id used by this pmd thread to send packets on all netdevs.
- * All tx_qid's are unique and less than 'ovs_numa_get_n_cores() + 1'. */
-atomic_int tx_qid;
+/* Queue id used by this pmd thread to send packets on all netdevs if
+ * XPS disabled for this netdev. All static_tx_qid's are unique and less
+ * than 'ovs_numa_get_n_cores() + 1'. */
+atomic_int static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -498,7 +505,8 @@ static void dp_netdev_execute_actions(struct 
dp_netdev_pmd_thread *pmd,
   struct dp_packet_batch *,
   bool may_steal,
   const struct nlattr *actions,
-  size_t actions_len);
+  size_t actions_len,
+  long long now);
 static void

[ovs-dev] [PATCH v5 0/4] XPS + Manual pinning (all)

2016-07-27 Thread Ilya Maximets

Manual pinning API was discussed here:
http://openvswitch.org/pipermail/dev/2016-July/074674.html

Version 5:
* XPS and Manual pinning back together.
* Dropped already applied patches
* All fixups from pinning v3 merged.
* XPS doesn't work if we have enough TX queues
* Affinity parser changed to reuse existing code
* 'needs_locking' logic moved to dpif-netdev.

Old XPS log:

Version 4:
* Dropped rwlock related patches.
* Added pointer from 'struct tx_port' to 'struct dp_netdev_port'
  to avoid locking of 'dp->ports'. This works because as long as
  a port is in a pmd thread's tx_port cache it cannot be deleted
  from the datapath.
* Added 'now' parameter to 'dp_netdev_execute_actions()' to pass
  current time to XPS functions. This needed to avoid using
  'last_cycles' that is always 0 without DPDK.
* Fixed tx queue ids cleanup on PMD thread deletion.

Version 3:
* Dropped already applied changes.
* fat-rwlock used instead of port_mutex.
* revalidation of 'non-pmd' thread's tx queues added to
  'dpif_netdev_run' to make it faster.


Ilya Maximets (4):
  dpif-netdev: XPS (Transmit Packet Steering) implementation.
  bridge: Pass interface's configuration to datapath.
  dpif-netdev: Add reconfiguration request to dp_netdev.
  dpif-netdev: Introduce pmd-rxq-affinity.

 INSTALL.DPDK.md|  49 -
 NEWS   |   2 +
 lib/dpif-netdev.c  | 450 -
 lib/dpif-netlink.c |   1 +
 lib/dpif-provider.h|   5 +
 lib/dpif.c |  17 ++
 lib/dpif.h |   1 +
 lib/netdev-bsd.c   |   3 +-
 lib/netdev-dpdk.c  |  32 ++--
 lib/netdev-dummy.c |   3 +-
 lib/netdev-linux.c |   3 +-
 lib/netdev-provider.h  |  11 +-
 lib/netdev.c   |  13 +-
 lib/netdev.h   |   2 +-
 ofproto/ofproto-dpif.c |  15 ++
 ofproto/ofproto-provider.h |   4 +
 ofproto/ofproto.c  |  29 +++
 ofproto/ofproto.h  |   2 +
 tests/pmd.at   |   6 +
 vswitchd/bridge.c  |   2 +
 vswitchd/vswitch.xml   |  23 +++
 21 files changed, 553 insertions(+), 120 deletions(-)

-- 
2.7.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

[ovs-dev] COPY0000549

2016-07-27 Thread Rosa







Sent from my Samsung device
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

1 2 >

1 - 100 of 124 matches

Mail list logo