Re: [ovs-dev] Request to work on the Extensions - EXT-320 / EXT-46 : OF Version 1.5 ref.

2016-10-26 Thread Jarno Rajahalme

> On Oct 26, 2016, at 12:08 AM, Pratyushaw P/HYD/TCS  
> wrote:
> 
> Hello,
> 
> It's a pleasure to contribute to the to the OF version 1.5 Extensions for the 
> upcoming releases.  
> 
> We have planned and picked up few of the below Extensions from the OpenFlow 
> 1.5 Specification listed:
> 
> B.18.5 Copy-Field action to copy between two OXM fields
> B.18.9 Allow set-field action to set metadata field
> 
> Let me know if there are any development happening for the above listed 
> Extensions so that duplication can be avoided. 
> 

Both of these should already be supported. Any Nicira extension “move” action 
is encoded to an OpenFlow15 copy_field action on the wire for OpenFlow versions 
1.5 and higher. The textual form you see with ovs-ofctl is still “move” for all 
versions. If you look at the test case “OpenFlow 1.5 action translation” in 
tests/ofp-actions.at , you’ll see that this is the 
case, if you decipher the OpenFlow encoding there.

Also, I believe we never restricted set-field action to not be able to set the 
metadata field. OVS allows set-field to set any writeable match field. You’ll 
find numerous examples of set_field to the metadata field from the OVS test 
suite.

 Jarno

> 
> Thanks & Regards,
> Pratyushaw P.
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain 
> confidential or privileged information. If you are 
> not the intended recipient, any dissemination, use, 
> review, distribution, printing or copying of the 
> information contained in this e-mail message 
> and/or attachments to it are strictly prohibited. If 
> you have received this communication in error, 
> please notify us by reply e-mail or telephone and 
> immediately and permanently delete the message 
> and any attachments. Thank you
> 
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] configure: Support compiling with Linux 4.8.

2016-10-20 Thread Jarno Rajahalme

> On Oct 20, 2016, at 2:55 PM, Pravin Shelar  wrote:
> 
> On Tue, Oct 18, 2016 at 5:01 PM, Jarno Rajahalme  wrote:
>> Datapath should now compile and work with Linux 4.8.
>> 
>> Signed-off-by: Jarno Rajahalme 
> Acked-by: Pravin B Shelar 

Thanks for the review, pushed to master and branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] datapath: Support a fixed size of 128 distinct labels.

2016-10-20 Thread Jarno Rajahalme

> On Oct 20, 2016, at 2:55 PM, Pravin Shelar  wrote:
> 
> On Tue, Oct 18, 2016 at 5:01 PM, Jarno Rajahalme  <mailto:ja...@ovn.org>> wrote:
>> Port upstream change in conntrack labels extension.  Add a new
>> configure macro HAVE_NF_CONN_LABELS_WITH_WORDS to detect the old
>> definition.  Unfortunately there is no conntrack API to hide the
>> difference, so the this makes conntrack.c deviate from upstream source
>> a bit.
>> 
>> Upstream commit:
>>commit 23014011ba4209a086931ff402eac1c41abbe456
>>Author: Florian Westphal 
>>Date:   Thu Jul 21 12:51:16 2016 +0200
>> 
>>netfilter: conntrack: support a fixed size of 128 distinct labels
>> 
>>The conntrack label extension is currently variable-sized, e.g. if
>>only 2 labels are used by iptables rules then the labels->bits[] array
>>will only contain one element.
>> 
>>We track size of each label storage area in the 'words' member.
>> 
>>But in nftables and openvswitch we always have to ask for worst-case
>>since we don't know what bit will be used at configuration time.
>> 
>>As most arches are 64bit we need to allocate 24 bytes in this case:
>> 
>>struct nf_conn_labels {
>>u8words;   /* 0 1 */
>>/* XXX 7 bytes hole, try to pack */
>>long unsigned bits[2]; /* 8 24 */
>> 
>>Make bits a fixed size and drop the words member, it simplifies
>>the code and only increases memory requirements on x86 when
>>less than 64bit labels are required.
>> 
>>We still only allocate the extension if its needed.
>> 
>>Signed-off-by: Florian Westphal 
>>Signed-off-by: Pablo Neira Ayuso 
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Pravin B Shelar mailto:pshe...@ovn.org>>

Thanks for the review, pushed to master and branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 07/12] dpif-netdev: Cache align netdev_flow_keys.

2016-10-19 Thread Jarno Rajahalme

> On Oct 19, 2016, at 10:01 AM, Joe Stringer  wrote:
> 
> On 14 October 2016 at 07:37, Bhanuprakash Bodireddy
>  wrote:
>> Aligning the 'keys' array seems to have positive performance impact.
>> 
>> Signed-off-by: Bhanuprakash Bodireddy 
>> Co-authored-by: Antonio Fischetti 
>> Signed-off-by: Antonio Fischetti 
>> Acked-by: Daniele Di Proietto 
> 
> It looks like the Windows build was broken after this change:
> https://ci.appveyor.com/project/blp/ovs/build/1.0.2391
> 
> Will one of you take a look?

The example in include/openvswitch/compiler.c has the OVS_ALIGNED_VAR() before 
the declaration, not after, so changing the modified line accordingly might 
help. I’d not want to propose a patch before testing that this fixes the 
Windows build, though, and I do not have access to a Windows build system…

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] Makefile.am: Add DocumentationStyle.rst to distribution.

2016-10-19 Thread Jarno Rajahalme

> On Oct 19, 2016, at 7:25 AM, Russell Bryant  wrote:
> 
> 
> 
> On Tue, Oct 18, 2016 at 7:59 PM, Jarno Rajahalme  <mailto:ja...@ovn.org>> wrote:
> Compile fails otherwise.
> 
> Signed-off-by: Jarno Rajahalme mailto:ja...@ovn.org>>
> ---
>  Makefile.am | 1 +
>  1 file changed, 1 insertion(+)
> 
> Sorry about that.
> 
> I went ahead and applied this to unbreak master.
> 
> -- 
> Russell Bryant

Thanks!

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/2] configure: Support compiling with Linux 4.8.

2016-10-18 Thread Jarno Rajahalme
Datapath should now compile and work with Linux 4.8.

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index a3c95f5..2c500a8 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -134,10 +134,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
 AC_MSG_RESULT([$kversion])
 
 if test "$version" -ge 4; then
-   if test "$version" = 4 && test "$patchlevel" -le 7; then
+   if test "$version" = 4 && test "$patchlevel" -le 8; then
   : # Linux 4.x
else
-  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.7.x is not supported (please refer to the FAQ for advice)])
+  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.8.x is not supported (please refer to the FAQ for advice)])
fi
 elif test "$version" = 3 && test "$patchlevel" -ge 10; then
: # Linux 3.x
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 1/2] datapath: Support a fixed size of 128 distinct labels.

2016-10-18 Thread Jarno Rajahalme
Port upstream change in conntrack labels extension.  Add a new
configure macro HAVE_NF_CONN_LABELS_WITH_WORDS to detect the old
definition.  Unfortunately there is no conntrack API to hide the
difference, so the this makes conntrack.c deviate from upstream source
a bit.

Upstream commit:
commit 23014011ba4209a086931ff402eac1c41abbe456
Author: Florian Westphal 
Date:   Thu Jul 21 12:51:16 2016 +0200

netfilter: conntrack: support a fixed size of 128 distinct labels

The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.

We track size of each label storage area in the 'words' member.

But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.

As most arches are 64bit we need to allocate 24 bytes in this case:

struct nf_conn_labels {
u8words;   /* 0 1 */
/* XXX 7 bytes hole, try to pack */
long unsigned bits[2]; /* 8 24 */

Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.

We still only allocate the extension if its needed.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Jarno Rajahalme 
---
 acinclude.m4 |  2 ++
 datapath/conntrack.c | 14 --
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 353519d..a3c95f5 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -529,6 +529,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_labels.h],
   [nf_connlabels_get], [int bit],
   [OVS_DEFINE([HAVE_NF_CONNLABELS_GET_TAKES_BIT])])
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_labels.h],
+[nf_conn_labels], [words])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/ipv6/nf_defrag_ipv6.h],
   [nf_ct_frag6_consume_orig])
   OVS_GREP_IFELSE([$KSRC/include/net/netfilter/ipv6/nf_defrag_ipv6.h],
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index a2fc450..6e722b6 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -136,13 +136,22 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 #endif
 }
 
+static size_t ovs_ct_get_labels_len(struct nf_conn_labels *cl)
+{
+#ifdef HAVE_NF_CONN_LABELS_WITH_WORDS
+   return cl->words * sizeof(long);
+#else
+   return sizeof(cl->bits);
+#endif
+}
+
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
 
if (cl) {
-   size_t len = cl->words * sizeof(long);
+   size_t len = ovs_ct_get_labels_len(cl);
 
if (len > OVS_CT_LABELS_LEN)
len = OVS_CT_LABELS_LEN;
@@ -281,7 +290,8 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct 
sw_flow_key *key,
nf_ct_labels_ext_add(ct);
cl = nf_ct_labels_find(ct);
}
-   if (!cl || cl->words * sizeof(long) < OVS_CT_LABELS_LEN)
+
+   if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN)
return -ENOSPC;
 
err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask,
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] Makefile.am: Add DocumentationStyle.rst to distribution.

2016-10-18 Thread Jarno Rajahalme
Compile fails otherwise.

Signed-off-by: Jarno Rajahalme 
---
 Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile.am b/Makefile.am
index a4842c1..dc74886 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -69,6 +69,7 @@ docs = \
CONTRIBUTING.md \
CodingStyle.md \
DESIGN.md \
+   DocumentationStyle.rst \
FAQ.md \
INSTALL.rst \
INSTALL.Debian.rst \
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 01/12] dpcls: Use 32 packet batches for lookups.

2016-10-18 Thread Jarno Rajahalme
See one comment below,

  Jarno

> On Oct 14, 2016, at 7:37 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> This patch increases the number of packets processed in a batch during a
> lookup from 16 to 32. Processing batches of 32 packets improves
> performance and also one of the internal loops can be avoided here.
> 
> Signed-off-by: Antonio Fischetti 
> Co-authored-by: Bhanuprakash Bodireddy 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
> lib/dpif-netdev.c | 110 ++
> 1 file changed, 45 insertions(+), 65 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index eb9f764..0a4f338 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4985,23 +4985,21 @@ dpcls_lookup(struct dpcls *cls, const struct 
> netdev_flow_key keys[],
>  int *num_lookups_p)
> {
> /* The received 'cnt' miniflows are the search-keys that will be processed
> - * in batches of 16 elements.  N_MAPS will contain the number of these
> - * 16-elements batches.  i.e. for 'cnt' = 32, N_MAPS will be 2.  The 
> batch
> - * size 16 was experimentally found faster than 8 or 32. */
> -typedef uint16_t map_type;
> + * to find a matching entry into the available subtables.
> + * The number of bits in map_type is equal to NETDEV_MAX_BURST. */
> +typedef uint32_t map_type;
> #define MAP_BITS (sizeof(map_type) * CHAR_BIT)
> +BUILD_ASSERT_DECL(MAP_BITS >= NETDEV_MAX_BURST);
> 
> -#if !defined(__CHECKER__) && !defined(_WIN32)
> -const int N_MAPS = DIV_ROUND_UP(cnt, MAP_BITS);
> -#else
> -enum { N_MAPS = DIV_ROUND_UP(NETDEV_MAX_BURST, MAP_BITS) };
> -#endif
> -map_type maps[N_MAPS];
> struct dpcls_subtable *subtable;
> 
> -memset(maps, 0xff, sizeof maps);
> -if (cnt % MAP_BITS) {
> -maps[N_MAPS - 1] >>= MAP_BITS - cnt % MAP_BITS; /* Clear extra bits. 
> */
> +map_type keys_map = TYPE_MAXIMUM(map_type);
> +map_type found_map;
> +uint32_t hashes[MAP_BITS];
> +const struct cmap_node *nodes[MAP_BITS];
> +
> +if (cnt != NETDEV_MAX_BURST) {
> +keys_map >>= NETDEV_MAX_BURST - cnt; /* Clear extra bits. */

‘keys_maps’ has MAP_BITS bits set, and after this is should only have the ‘cnt’ 
LSBs set. The assert above allows NETDEV_MAX_BURST to be less than MAP_BITS, 
hence MAP_BITS must be used here instead.

Otherwise looks good to me, so with this change:

Acked-by: Jarno Rajahalme 


> }
> memset(rules, 0, cnt * sizeof *rules);
> 
> @@ -5015,61 +5013,43 @@ dpcls_lookup(struct dpcls *cls, const struct 
> netdev_flow_key keys[],
>  * search-key, the search for that key can stop because the rules are
>  * non-overlapping. */
> PVECTOR_FOR_EACH (subtable, &cls->subtables) {
> -const struct netdev_flow_key *mkeys = keys;
> -struct dpcls_rule **mrules = rules;
> -map_type remains = 0;
> -int m;
> -
> -BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);
> -
> -/* Loops on each batch of 16 search-keys. */
> -for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += MAP_BITS) {
> -uint32_t hashes[MAP_BITS];
> -const struct cmap_node *nodes[MAP_BITS];
> -unsigned long map = maps[m];
> -int i;
> -
> -if (!map) {
> -continue; /* Skip empty maps. */
> -}
> -
> -/* Compute hashes for the remaining keys.  Each search-key is
> - * masked with the subtable's mask to avoid hashing the 
> wildcarded
> - * bits. */
> -ULLONG_FOR_EACH_1(i, map) {
> -hashes[i] = netdev_flow_key_hash_in_mask(&mkeys[i],
> - &subtable->mask);
> -}
> -/* Lookup. */
> -map = cmap_find_batch(&subtable->rules, map, hashes, nodes);
> -/* Check results.  When the i-th bit of map is set, it means 
> that a
> - * set of nodes with a matching hash value was found for the i-th
> - * search-key.  Due to possible hash collisions we need to check
> - * which of the found rules, if any, really matches our masked
> - * search-key. */
> -ULLONG_FOR_EACH_1(i, map) {
> -struct dpcls_rule *rule;
> -
> -CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
> -if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) 
> {
> -mrules[i] = rule;
> -/* Even at 20 Mpps the 32-bit hit_cnt c

Re: [ovs-dev] [PATCH v3 02/12] flow: Add comments to mf_get_next_in_map().

2016-10-18 Thread Jarno Rajahalme

> On Oct 17, 2016, at 8:06 PM, Daniele Di Proietto  wrote:
> 
> 2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy <
> bhanuprakash.bodire...@intel.com <mailto:bhanuprakash.bodire...@intel.com>>:
> 
>> This patch adds comments to mf_get_next_in_map() to make it more
>> comprehensible.
>> 
>> Signed-off-by: Jarno Rajahalme mailto:ja...@ovn.org>>
>> Acked-by: Bhanuprakash Bodireddy > <mailto:bhanuprakash.bodire...@intel.com>>
>> Acked-by: Antonio Fischetti > <mailto:antonio.fische...@intel.com>>
>> 
> 
> The tags here look weird, at least your signoff is missing.
> 

I guess this was based on all the changes in the patch being written by me. I’m 
OK with these tags, if this makes sense.

  Jarno

> 
>> ---
>> lib/flow.h | 32 +++-
>> 1 file changed, 27 insertions(+), 5 deletions(-)
>> 
>> diff --git a/lib/flow.h b/lib/flow.h
>> index ea24e28..5a14941 100644
>> --- a/lib/flow.h
>> +++ b/lib/flow.h
>> @@ -564,12 +564,27 @@ flow_values_get_next_in_maps(struct
>> flow_for_each_in_maps_aux *aux,
>>  flow_values_get_next_in_maps(&aux__, &(VALUE));)
>> 
>> struct mf_for_each_in_map_aux {
>> -size_t unit;
>> -struct flowmap fmap;
>> -struct flowmap map;
>> -const uint64_t *values;
>> +size_t unit; /* Current 64-bit unit of the flowmaps
>> +   being processed. */
>> +struct flowmap fmap; /* Remaining 1-bits corresponding to the
>> +   64-bit words in ‘values’ */
>> +struct flowmap map;  /* Remaining 1-bits corresponding to the
>> +   64-bit words of interest. */
>> +const uint64_t *values;  /* 64-bit words corresponding to the
>> +   1-bits in ‘fmap’. */
>> };
>> 
>> +/* Get the data from ‘aux->values’ corresponding to the next lowest 1-bit
>> + * in ‘aux->map’, given that ‘aux->values’ points to an array of 64-bit
>> + * words corresponding to the 1-bits in ‘aux->fmap’, starting from the
>> + * rightmost 1-bit.
>> + *
>> + * Returns ’true’ if the traversal is incomplete, ‘false’ otherwise.
>> + * ‘aux’ is prepared for the next iteration after each call.
>> + *
>> + * This is used to traverse through, for example, the values in a miniflow
>> + * representation of a flow key selected by non-zero 64-bit words in a
>> + * corresponding subtable mask. */
>> static inline bool
>> mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>>uint64_t *value)
>> @@ -577,8 +592,10 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>> map_t *map, *fmap;
>> map_t rm1bit;
>> 
>> +/* Skip empty map units. */
>> while (OVS_UNLIKELY(!*(map = &aux->map.bits[aux->unit]))) {
>> -/* Skip remaining data in the previous unit. */
>> +/* Skip remaining data in the current unit before advancing
>> + * to the next. */
>> aux->values += count_1bits(aux->fmap.bits[aux->unit]);
>> if (++aux->unit == FLOWMAP_UNITS) {
>> return false;
>> @@ -589,7 +606,12 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>> *map -= rm1bit;
>> fmap = &aux->fmap.bits[aux->unit];
>> 
>> +/* If the rightmost 1-bit found from the current unit in ‘aux->map’
>> + * (‘rm1bit’) is also present in ‘aux->fmap’, store the corresponding
>> + * value from ‘aux->values’ to ‘*value', otherwise store 0. */
>> if (OVS_LIKELY(*fmap & rm1bit)) {
>> +/* Skip all 64-bit words in ‘values’ preceding the one
>> corresponding
>> + * to ‘rm1bit’. */
>> map_t trash = *fmap & (rm1bit - 1);
>> 
>> *fmap -= trash;
>> --
>> 2.4.11
>> 
>> ___
>> dev mailing list
>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>> http://openvswitch.org/mailman/listinfo/dev 
>> <http://openvswitch.org/mailman/listinfo/dev>
>> 
> ___
> dev mailing list
> dev@openvswitch.org <mailto:dev@openvswitch.org>
> http://openvswitch.org/mailman/listinfo/dev 
> <http://openvswitch.org/mailman/listinfo/dev>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 03/12] flow: Skip invoking expensive count_1bits() with zero input.

2016-10-17 Thread Jarno Rajahalme

> On Oct 17, 2016, at 2:10 AM, Fischetti, Antonio  
> wrote:
> 
> Thanks Jarno, one question below.
> 
> Antonio
> 
>> -Original Message-
>> From: dev [mailto:dev-boun...@openvswitch.org 
>> <mailto:dev-boun...@openvswitch.org>] On Behalf Of Jarno
>> Rajahalme
>> Sent: Friday, October 14, 2016 5:03 PM
>> To: Bodireddy, Bhanuprakash > <mailto:bhanuprakash.bodire...@intel.com>>
>> Cc: dev@openvswitch.org <mailto:dev@openvswitch.org>
>> Subject: Re: [ovs-dev] [PATCH v3 03/12] flow: Skip invoking expensive
>> count_1bits() with zero input.
>> 
>> 
>>> On Oct 14, 2016, at 7:37 AM, Bhanuprakash Bodireddy
>>  wrote:
>>> 
>>> This patch checks if trash is non-zero and only then resets the flowmap
>>> bit and increment the pointer by set bits as found in trash.
>>> 
>>> Signed-off-by: Bhanuprakash Bodireddy 
>>> Co-authored-by: Antonio Fischetti 
>>> Signed-off-by: Antonio Fischetti 
>>> Acked-by: Jarno Rajahalme 
>>> ---
>>> lib/flow.h | 15 ++-
>>> 1 file changed, 10 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/lib/flow.h b/lib/flow.h
>>> index 5a14941..74e75d6 100644
>>> --- a/lib/flow.h
>>> +++ b/lib/flow.h
>>> @@ -614,11 +614,16 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux
>> *aux,
>>> * to ‘rm1bit’. */
>>>map_t trash = *fmap & (rm1bit - 1);
>>> 
>>> -*fmap -= trash;
>>> -/* count_1bits() is fast for systems where speed matters (e.g.,
>>> - * DPDK), so we don't try avoid using it.
>>> - * Advance 'aux->values' to point to the value for 'rm1bit'. */
>>> -aux->values += count_1bits(trash);
>>> +/* Avoid resetting 'fmap' and calling count_1bits() when trash
>> is
>>> + * zero. */
>>> +if (trash) {
>>> +*fmap -= trash;
>>> +/* count_1bits() is fast for systems where speed matters
>> (e.g.,
>>> + * DPDK), so we don't try avoid using it.
>> 
>> The comment above is still wrong as we now test ‘trash’ for non-zero
>> above.
>> 
>>  Jarno
> [Antonio F] Just to avoid misunderstanding, do you mean we can now entirely 
> remove
> the existing comment
>/* count_1bits() is fast for systems where speed
> right?
> Because with the latest changes now we do try to avoid using count_1bits(). 
> Is that ok?
> 
> 

Yes.

  Jarno

>> 
>>> + * Advance 'aux->values' to point to the value for 'rm1bit'
>> only.
>>> + */
>>> +aux->values += count_1bits(trash);
>>> +}
>>> 
>>>*value = *aux->values;
>>>} else {
>>> --
>>> 2.4.11
>>> 
>>> ___
>>> dev mailing list
>>> dev@openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/dev
>> 
>> ___
>> dev mailing list
>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>> http://openvswitch.org/mailman/listinfo/dev 
>> <http://openvswitch.org/mailman/listinfo/dev>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 03/12] flow: Skip invoking expensive count_1bits() with zero input.

2016-10-14 Thread Jarno Rajahalme

> On Oct 14, 2016, at 7:37 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> This patch checks if trash is non-zero and only then resets the flowmap
> bit and increment the pointer by set bits as found in trash.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Co-authored-by: Antonio Fischetti 
> Signed-off-by: Antonio Fischetti 
> Acked-by: Jarno Rajahalme 
> ---
> lib/flow.h | 15 ++-
> 1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/flow.h b/lib/flow.h
> index 5a14941..74e75d6 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -614,11 +614,16 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>  * to ‘rm1bit’. */
> map_t trash = *fmap & (rm1bit - 1);
> 
> -*fmap -= trash;
> -/* count_1bits() is fast for systems where speed matters (e.g.,
> - * DPDK), so we don't try avoid using it.
> - * Advance 'aux->values' to point to the value for 'rm1bit'. */
> -aux->values += count_1bits(trash);
> +/* Avoid resetting 'fmap' and calling count_1bits() when trash is
> + * zero. */
> +if (trash) {
> +*fmap -= trash;
> +/* count_1bits() is fast for systems where speed matters (e.g.,
> + * DPDK), so we don't try avoid using it.

The comment above is still wrong as we now test ‘trash’ for non-zero above.

  Jarno

> + * Advance 'aux->values' to point to the value for 'rm1bit' only.
> + */
> +aux->values += count_1bits(trash);
> +}
> 
> *value = *aux->values;
> } else {
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 06/13] netdev-dpdk: Do not abort if out of hugepage memory.

2016-10-12 Thread Jarno Rajahalme

> On Oct 4, 2016, at 6:22 PM, Daniele Di Proietto  
> wrote:
> 
> We can run out of hugepage memory coming from rte_*alloc() more easily
> than heap coming from malloc().
> 
> Therefore:
> 
> * We should use hugepage memory if we're going to access it only in
>  the slow path.


Should this be “We should NOT use huge page…” instead?

  Jarno

> * We shouldn't abort if we're out of hugepage memory.
> * We should gracefully handle out of memory conditions.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
> lib/netdev-dpdk.c | 62 ++-
> 1 file changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fec3e68..f5523a9 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -436,19 +436,13 @@ dpdk_buf_size(int mtu)
>  NETDEV_DPDK_MBUF_ALIGN);
> }
> 
> -/* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
> - * for all other segments data, bss and text. */
> -
> +/* Allocates an area of 'sz' bytes from DPDK.  The memory is zero'ed.
> + *
> + * Unlike xmalloc(), this function can return NULL on failure. */
> static void *
> dpdk_rte_mzalloc(size_t sz)
> {
> -void *ptr;
> -
> -ptr = rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
> -if (ptr == NULL) {
> -out_of_memory();
> -}
> -return ptr;
> +return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
> }
> 
> /* XXX this function should be called only by pmd threads (or by non pmd
> @@ -483,6 +477,9 @@ dpdk_mp_create(int socket_id, int mtu)
> unsigned mp_size;
> 
> dmp = dpdk_rte_mzalloc(sizeof *dmp);
> +if (!dmp) {
> +return NULL;
> +}
> dmp->socket_id = socket_id;
> dmp->mtu = mtu;
> dmp->refcount = 1;
> @@ -801,17 +798,22 @@ netdev_dpdk_alloc(void)
> return NULL;
> }
> 
> -static void
> -netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
> +static struct dpdk_tx_queue *
> +netdev_dpdk_alloc_txq(unsigned int n_txqs)
> {
> +struct dpdk_tx_queue *txqs;
> unsigned i;
> 
> -dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
> -for (i = 0; i < n_txqs; i++) {
> -/* Initialize map for vhost devices. */
> -dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
> -rte_spinlock_init(&dev->tx_q[i].tx_lock);
> +txqs = dpdk_rte_mzalloc(n_txqs * sizeof *txqs);
> +if (txqs) {
> +for (i = 0; i < n_txqs; i++) {
> +/* Initialize map for vhost devices. */
> +txqs[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
> +rte_spinlock_init(&txqs[i].tx_lock);
> +}
> }
> +
> +return txqs;
> }
> 
> static int
> @@ -874,13 +876,18 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int 
> port_no,
> if (err) {
> goto unlock;
> }
> -netdev_dpdk_alloc_txq(dev, netdev->n_txq);
> +dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
> } else {
> -netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
> +dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
> /* Enable DPDK_DEV_VHOST device and set promiscuous mode flag. */
> dev->flags = NETDEV_UP | NETDEV_PROMISC;
> }
> 
> +if (!dev->tx_q) {
> +err = ENOMEM;
> +goto unlock;
> +}
> +
> ovs_list_push_back(&dpdk_list, &dev->list_node);
> 
> unlock:
> @@ -1226,7 +1233,11 @@ netdev_dpdk_rxq_alloc(void)
> {
> struct netdev_rxq_dpdk *rx = dpdk_rte_mzalloc(sizeof *rx);
> 
> -return &rx->up;
> +if (rx) {
> +return &rx->up;
> +}
> +
> +return NULL;
> }
> 
> static struct netdev_rxq_dpdk *
> @@ -2352,7 +2363,7 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
> int *enabled_queues, n_enabled = 0;
> int i, k, total_txqs = dev->up.n_txq;
> 
> -enabled_queues = dpdk_rte_mzalloc(total_txqs * sizeof *enabled_queues);
> +enabled_queues = xcalloc(total_txqs, sizeof *enabled_queues);
> 
> for (i = 0; i < total_txqs; i++) {
> /* Enabled queues always mapped to themselves. */
> @@ -2379,7 +2390,7 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
> VLOG_DBG("%2d --> %2d", i, dev->tx_q[i].map);
> }
> 
> -rte_free(enabled_queues);
> +free(enabled_queues);
> }
> 
> /*
> @@ -2620,7 +2631,7 @@ dpdk_ring_create(const char dev_name[], unsigned int 
> port_no,
> int err;
> 
> ivshmem = dpdk_rte_mzalloc(sizeof *ivshmem);
> -if (ivshmem == NULL) {
> +if (!ivshmem) {
> return ENOMEM;
> }
> 
> @@ -2977,7 +2988,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> 
> rte_free(dev->tx_q);
> err = dpdk_eth_dev_init(dev);
> -netdev_dpdk_alloc_txq(dev, netdev->n_txq);
> +dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
> +if (!dev->tx_q) {
> +err = ENOMEM;
> +}
> 
> netdev_change_seq_changed(netdev);
> 
> -- 
> 2.9.3
> 
> ___
> dev mailing list
> dev@op

[ovs-dev] [PATCH] ofproto: Return the OFPC_BUNDLES bit in switch features reply.

2016-10-12 Thread Jarno Rajahalme
Add definitions for the OpenFlow 1.5 specific capabilities bits
OFPC15_BUNDLES and OFPC15_FLOW_MONITORING.  Return the bundles
capability bit in switch features reply.

Reported-by: Andrej Leitner 
Signed-off-by: Jarno Rajahalme 
---
 include/openflow/openflow-1.5.h |  7 +++
 include/openvswitch/ofp-util.h  | 11 ---
 lib/ofp-print.c |  2 ++
 lib/ofp-util.c  |  8 ++--
 ofproto/ofproto.c   |  2 +-
 tests/ofp-print.at  | 12 
 6 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/include/openflow/openflow-1.5.h b/include/openflow/openflow-1.5.h
index 3649e6c..e3c7c6f 100644
--- a/include/openflow/openflow-1.5.h
+++ b/include/openflow/openflow-1.5.h
@@ -39,6 +39,13 @@
 
 #include 
 
+/* OpenFlow 1.5 specific capabilities
+ * (struct ofp_switch_features, member capabilities). */
+enum ofp15_capabilities {
+OFPC15_BUNDLES= 1 << 9,/* Switch supports bundles. */
+OFPC15_FLOW_MONITORING = 1 << 10,  /* Switch supports flow monitoring. */
+};
+
 /* Body for ofp15_multipart_request of type OFPMP_PORT_DESC. */
 struct ofp15_port_desc_request {
 ovs_be32 port_no; /* All ports if OFPP_ANY. */
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index f3cb624..e55d7b5 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -613,7 +613,7 @@ struct ofputil_phy_port {
 };
 
 enum ofputil_capabilities {
-/* OpenFlow 1.0, 1.1, 1.2, and 1.3 share these capability values. */
+/* All OpenFlow versions share these capability values. */
 OFPUTIL_C_FLOW_STATS = 1 << 0,  /* Flow statistics. */
 OFPUTIL_C_TABLE_STATS= 1 << 1,  /* Table statistics. */
 OFPUTIL_C_PORT_STATS = 1 << 2,  /* Port statistics. */
@@ -626,11 +626,16 @@ enum ofputil_capabilities {
 /* OpenFlow 1.0 only. */
 OFPUTIL_C_STP= 1 << 3,  /* 802.1d spanning tree. */
 
-/* OpenFlow 1.1, 1.2, and 1.3 share this capability. */
+/* OpenFlow 1.1+ only.  Note that this bit value does not match the one
+ * in the OpenFlow message. */
 OFPUTIL_C_GROUP_STATS= 1 << 4,  /* Group statistics. */
 
-/* OpenFlow 1.2 and 1.3 share this capability */
+/* OpenFlow 1.2+ only. */
 OFPUTIL_C_PORT_BLOCKED   = 1 << 8,  /* Switch will block looping ports */
+
+/* OpenFlow 1.5+ only. */
+OFPUTIL_C_BUNDLES = 1 << 9,  /* Switch supports bundles. */
+OFPUTIL_C_FLOW_MONITORING = 1 << 10, /* Switch supports flow monitoring. */
 };
 
 /* Abstract ofp_switch_features. */
diff --git a/lib/ofp-print.c b/lib/ofp-print.c
index 0a68551..b7b5290 100644
--- a/lib/ofp-print.c
+++ b/lib/ofp-print.c
@@ -506,6 +506,8 @@ ofputil_capabilities_to_name(uint32_t bit)
 case OFPUTIL_C_STP:  return "STP";
 case OFPUTIL_C_GROUP_STATS:  return "GROUP_STATS";
 case OFPUTIL_C_PORT_BLOCKED: return "PORT_BLOCKED";
+case OFPUTIL_C_BUNDLES:  return "BUNDLES";
+case OFPUTIL_C_FLOW_MONITORING: return "FLOW_MONITORING";
 }
 
 return NULL;
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index 0445968..84109d0 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -4644,6 +4644,9 @@ BUILD_ASSERT_DECL((int) OFPUTIL_C_PORT_STATS == 
OFPC_PORT_STATS);
 BUILD_ASSERT_DECL((int) OFPUTIL_C_IP_REASM == OFPC_IP_REASM);
 BUILD_ASSERT_DECL((int) OFPUTIL_C_QUEUE_STATS == OFPC_QUEUE_STATS);
 BUILD_ASSERT_DECL((int) OFPUTIL_C_ARP_MATCH_IP == OFPC_ARP_MATCH_IP);
+BUILD_ASSERT_DECL((int) OFPUTIL_C_PORT_BLOCKED == OFPC12_PORT_BLOCKED);
+BUILD_ASSERT_DECL((int) OFPUTIL_C_BUNDLES == OFPC15_BUNDLES);
+BUILD_ASSERT_DECL((int) OFPUTIL_C_FLOW_MONITORING == OFPC15_FLOW_MONITORING);
 
 static uint32_t
 ofputil_capabilities_mask(enum ofp_version ofp_version)
@@ -4656,9 +4659,11 @@ ofputil_capabilities_mask(enum ofp_version ofp_version)
 case OFP12_VERSION:
 case OFP13_VERSION:
 case OFP14_VERSION:
+return OFPC_COMMON | OFPC12_PORT_BLOCKED;
 case OFP15_VERSION:
 case OFP16_VERSION:
-return OFPC_COMMON | OFPC12_PORT_BLOCKED;
+return OFPC_COMMON | OFPC12_PORT_BLOCKED | OFPC15_BUNDLES
+| OFPC15_FLOW_MONITORING;
 default:
 /* Caller needs to check osf->header.version itself */
 return 0;
@@ -4784,7 +4789,6 @@ ofputil_encode_switch_features(const struct 
ofputil_switch_features *features,
 osf->n_buffers = htonl(features->n_buffers);
 osf->n_tables = features->n_tables;
 
-osf->capabilities = htonl(features->capabilities & OFPC_COMMON);
 osf->capabilities = htonl(features->capabilities &
   ofputil_capabilities_mask(version));
 switch (version) {
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index de1c469..53b7226 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -3295,7 +329

Re: [ovs-dev] [PATCH] meta-flow: Fix the NXM_NX_* names of xxreg2 and xxreg3.

2016-10-11 Thread Jarno Rajahalme

> On Oct 11, 2016, at 8:33 AM, Ben Pfaff  wrote:
> 
> On Mon, Oct 10, 2016 at 11:38:21AM -0700, Jarno Rajahalme wrote:
>> 
>>> On Oct 7, 2016, at 5:54 PM, Justin Pettit  wrote:
>>> 
>>> 
>>>> On Oct 7, 2016, at 5:41 PM, Jarno Rajahalme  wrote:
>>>> 
>>>> xxreg2 and xxreg3 had the same NXM_NX_* names as xxreg0 and xxreg1,
>>>> correspondingly.
>>>> 
>>>> Found by inspection.
>>>> 
>>>> CC: Justin Pettit 
>>>> Fixes: b23ada8eecfd ("Introduce 128-bit xxregs.")
>>>> Signed-off-by: Jarno Rajahalme 
>>> 
>>> Thanks!
>>> 
>>> Acked-by: Justin Pettit 
>> 
>> Thanks for the speedy review! Pushed to master and branch-2.6 with
>> placeholders for four more xxregs.
> 
> Thanks for fixing this.
> 
> It didn't seem to have made it to branch-2.6, so I did the backport
> myself.

It appears I forgot to push after the cherry-pick on the local branch-2.6, duh!

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall structure.

2016-10-10 Thread Jarno Rajahalme
How about making the ‘dp_packet’ member the first member and adding a comment 
that this should be first?

  Jarno

> On Oct 10, 2016, at 8:42 AM, Bodireddy, Bhanuprakash 
>  wrote:
> 
> Hello jarno,
> 
> Thanks for the feedback, while reordering the members of dpif_upcall, I had 
> to deviate from standards due to below reason.
> The dp_packet member has mbuf as first member that starts at new cache line 
> creating hole of size 60 bytes between dpif_upcall_type and dp_packet as 
> pointed below.
> 
> struct dpif_upcall {
>enum dpif_upcall_type  type;   
>   
> 
>   -->  60 bytes hole
> 
>/* --- cacheline 1 boundary (64 bytes) --- */
>struct dp_packet {
>struct rte_mbuf {
>/* typedef MARKER */ void * cacheline0[0]; /*  
>   64 0 */
> 
>   }
>   struct nlattr *key; 
>  
> .
> .
> }
> 
> I tried to pack this hole by moving other members in to this space. 
> 
> Regards,
> Bhanu Prakash. 
> 
>> -Original Message-
>> From: Jarno Rajahalme [mailto:ja...@ovn.org]
>> Sent: Friday, October 7, 2016 10:11 PM
>> To: Bodireddy, Bhanuprakash 
>> Cc: dev@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall
>> structure.
>> 
>> CodingStyle.md instructs to group struct members into related groups. Also,
>> changing the relative order of pointers should not make any difference. Could
>> you achieve the same by reordering just the members above the
>> ‘DPIF_UC_ACTION only.’ comment?
>> 
>> Jarno
>> 
>>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy
>>  wrote:
>>> 
>>> By reordering the data elements in dpif_upcall structure, pad bytes
>>> can be reduced and also a cache line.
>>> 
>>> Before: structure size:768, holes:1, sum padbytes:60, cachelines:12
>>> After: structure size:656, holes:1, sum padbytes:4, cachelines:11
>>> 
>>> Signed-off-by: Bhanuprakash Bodireddy
>>> 
>>> Signed-off-by: Antonio Fischetti 
>>> ---
>>> lib/dpif.h | 17 +
>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/lib/dpif.h b/lib/dpif.h
>>> index a7c5097..4a4bb3d 100644
>>> --- a/lib/dpif.h
>>> +++ b/lib/dpif.h
>>> @@ -779,17 +779,18 @@ const char *dpif_upcall_type_to_string(enum
>>> dpif_upcall_type); struct dpif_upcall {
>>>/* All types. */
>>>enum dpif_upcall_type type;
>>> -struct dp_packet packet;   /* Packet data. */
>>> -struct nlattr *key; /* Flow key. */
>>> -size_t key_len; /* Length of 'key' in bytes. */
>>> -ovs_u128 ufid;  /* Unique flow identifier for 'key'. */
>>> -struct nlattr *mru; /* Maximum receive unit. */
>>> -struct nlattr *cutlen;  /* Number of bytes shrink from the end. */
>>> 
>>>/* DPIF_UC_ACTION only. */
>>> -struct nlattr *userdata;/* Argument to
>> OVS_ACTION_ATTR_USERSPACE. */
>>> -struct nlattr *out_tun_key;/* Output tunnel key. */
>>>struct nlattr *actions;/* Argument to OVS_ACTION_ATTR_USERSPACE.
>> */
>>> +struct nlattr *out_tun_key;/* Output tunnel key. */
>>> +struct nlattr *userdata;/* Argument to
>> OVS_ACTION_ATTR_USERSPACE. */
>>> +
>>> +struct nlattr *cutlen;  /* Number of bytes shrink from the end. */
>>> +struct nlattr *mru; /* Maximum receive unit. */
>>> +ovs_u128 ufid;  /* Unique flow identifier for 'key'. */
>>> +struct dp_packet packet;   /* Packet data. */
>>> +struct nlattr *key; /* Flow key. */
>>> +size_t key_len; /* Length of 'key' in bytes. */
>>> };
>>> 
>>> /* A callback to notify higher layer of dpif about to be purged, so
>>> that
>>> --
>>> 2.4.11
>>> 
>>> ___
>>> dev mailing list
>>> dev@openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/dev
> 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] meta-flow: Fix the NXM_NX_* names of xxreg2 and xxreg3.

2016-10-10 Thread Jarno Rajahalme

> On Oct 7, 2016, at 5:54 PM, Justin Pettit  wrote:
> 
> 
>> On Oct 7, 2016, at 5:41 PM, Jarno Rajahalme  wrote:
>> 
>> xxreg2 and xxreg3 had the same NXM_NX_* names as xxreg0 and xxreg1,
>> correspondingly.
>> 
>> Found by inspection.
>> 
>> CC: Justin Pettit 
>> Fixes: b23ada8eecfd ("Introduce 128-bit xxregs.")
>> Signed-off-by: Jarno Rajahalme 
> 
> Thanks!
> 
> Acked-by: Justin Pettit 
> 

Thanks for the speedy review! Pushed to master and branch-2.6 with placeholders 
for four more xxregs.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] meta-flow: Fix the NXM_NX_* names of xxreg2 and xxreg3.

2016-10-07 Thread Jarno Rajahalme
Should we also reserve some range of NXM_NX numbers after 114 for potential 
future registers, say 115-118 at least? Other register types do not have this 
problem as they have their own classes of numbers.

  Jarno

> On Oct 7, 2016, at 5:41 PM, Jarno Rajahalme  wrote:
> 
> xxreg2 and xxreg3 had the same NXM_NX_* names as xxreg0 and xxreg1,
> correspondingly.
> 
> Found by inspection.
> 
> CC: Justin Pettit 
> Fixes: b23ada8eecfd ("Introduce 128-bit xxregs.")
> Signed-off-by: Jarno Rajahalme 
> ---
> include/openvswitch/meta-flow.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
> index 966ff7f..ef07561 100644
> --- a/include/openvswitch/meta-flow.h
> +++ b/include/openvswitch/meta-flow.h
> @@ -940,8 +940,8 @@ enum OVS_PACKED_ENUM mf_field_id {
>  * Access: read/write.
>  * NXM: NXM_NX_XXREG0(111) since v2.6.  <0>
>  * NXM: NXM_NX_XXREG1(112) since v2.6.  <1>
> - * NXM: NXM_NX_XXREG0(113) since v2.6.  <2>
> - * NXM: NXM_NX_XXREG1(114) since v2.6.  <3>
> + * NXM: NXM_NX_XXREG2(113) since v2.6.  <2>
> + * NXM: NXM_NX_XXREG3(114) since v2.6.  <3>
>  * OXM: none.
>  */
> MFF_XXREG0,
> -- 
> 2.1.4
> 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] meta-flow: Fix the NXM_NX_* names of xxreg2 and xxreg3.

2016-10-07 Thread Jarno Rajahalme
xxreg2 and xxreg3 had the same NXM_NX_* names as xxreg0 and xxreg1,
correspondingly.

Found by inspection.

CC: Justin Pettit 
Fixes: b23ada8eecfd ("Introduce 128-bit xxregs.")
Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/meta-flow.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 966ff7f..ef07561 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -940,8 +940,8 @@ enum OVS_PACKED_ENUM mf_field_id {
  * Access: read/write.
  * NXM: NXM_NX_XXREG0(111) since v2.6.  <0>
  * NXM: NXM_NX_XXREG1(112) since v2.6.  <1>
- * NXM: NXM_NX_XXREG0(113) since v2.6.  <2>
- * NXM: NXM_NX_XXREG1(114) since v2.6.  <3>
+ * NXM: NXM_NX_XXREG2(113) since v2.6.  <2>
+ * NXM: NXM_NX_XXREG3(114) since v2.6.  <3>
  * OXM: none.
  */
 MFF_XXREG0,
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 12/12] timeval: Reorder elements in clock structure.

2016-10-07 Thread Jarno Rajahalme
I would leave the ‘stopped’ member below the comment.

Also, the 2nd cacheline is only ever accessed during unit tests, so this should 
not have real performance impact.

Acked-by: Jarno Rajahalme 

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> By reordering the elements in clock structure, pad bytes can be reduced
> and also a cache line is saved.
> 
> Before: structure size:136, holes:3, sum padbytes:18, cachelines:3
> After: structure size:120, holes:1, sum padbytes:2, cachelines:2
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/timeval.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/timeval.c b/lib/timeval.c
> index 0e8709a..ee755db 100644
> --- a/lib/timeval.c
> +++ b/lib/timeval.c
> @@ -69,12 +69,12 @@ struct large_warp {
> 
> struct clock {
> clockid_t id;   /* CLOCK_MONOTONIC or CLOCK_REALTIME. */
> +atomic_bool slow_path; /* True if warped or stopped. */
> +bool stopped OVS_GUARDED;  /* Disable real-time updates if true. 
> */
> 
> /* Features for use by unit tests.  Protected by 'mutex'. */
> struct ovs_mutex mutex;
> -atomic_bool slow_path; /* True if warped or stopped. */
> struct timespec warp OVS_GUARDED;  /* Offset added for unit tests. */
> -bool stopped OVS_GUARDED;  /* Disable real-time updates if true. 
> */
> struct timespec cache OVS_GUARDED; /* Last time read from kernel. */
> struct large_warp large_warp OVS_GUARDED; /* Connection information 
> waiting
>  for warp response. */
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 11/12] netlink-socket: Reorder elements in nl_dump structure.

2016-10-07 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> By reordering the elements in nl_dump structure, pad bytes can be
> reduced there by saving a cache line.
> 
> Before: structure size:72, holes:1, sum padbytes:4, cachelines:2
> After: structure size:64, holes:0, sum padbytes:0, cachelines:1
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/netlink-socket.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h
> index f73fc7d..d3cc642 100644
> --- a/lib/netlink-socket.h
> +++ b/lib/netlink-socket.h
> @@ -260,12 +260,12 @@ struct nl_dump {
> /* These members are immutable during the lifetime of the nl_dump. */
> struct nl_sock *sock;   /* Socket being dumped. */
> uint32_t nl_seq;/* Expected nlmsg_seq for replies. */
> -
> -/* 'mutex' protects 'status' and serializes access to 'sock'. */
> -struct ovs_mutex mutex; /* Protects 'status', synchronizes recv(). */
> int status OVS_GUARDED; /* 0: dump in progress,
>  * positive errno: dump completed with error,
>  * EOF: dump completed successfully. */
> +
> +/* 'mutex' protects 'status' and serializes access to 'sock'. */
> +struct ovs_mutex mutex; /* Protects 'status', synchronizes recv(). */
> };
> 
> void nl_dump_start(struct nl_dump *, int protocol,
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 10/12] ovsdb: Reorder elements in ovsdb_table_schema structure.

2016-10-07 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> By reordering the elements in ovsdb_table_schema structure, pad bytes
> can be reduced and also a cache line is saved.
> 
> Before: structure size:72, holes:2, sum padbytes:10, cachelines:2
> After: structure size:64, holes:1, sum padbytes:2, cachelines:1
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> ovsdb/table.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/ovsdb/table.h b/ovsdb/table.h
> index f910d18..69dd649 100644
> --- a/ovsdb/table.h
> +++ b/ovsdb/table.h
> @@ -28,9 +28,9 @@ struct uuid;
> struct ovsdb_table_schema {
> char *name;
> bool mutable;
> -struct shash columns;   /* Contains "struct ovsdb_column *"s. */
> -unsigned int max_rows;  /* Maximum number of rows. */
> bool is_root;   /* Part of garbage collection root set? */
> +unsigned int max_rows;  /* Maximum number of rows. */
> +struct shash columns;   /* Contains "struct ovsdb_column *"s. */
> struct ovsdb_column_set *indexes;
> size_t n_indexes;
> };
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall structure.

2016-10-07 Thread Jarno Rajahalme
CodingStyle.md instructs to group struct members into related groups. Also, 
changing the relative order of pointers should not make any difference. Could 
you achieve the same by reordering just the members above the ‘DPIF_UC_ACTION 
only.’ comment?

  Jarno

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> By reordering the data elements in dpif_upcall structure, pad bytes can
> be reduced and also a cache line.
> 
> Before: structure size:768, holes:1, sum padbytes:60, cachelines:12
> After: structure size:656, holes:1, sum padbytes:4, cachelines:11
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/dpif.h | 17 +
> 1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/dpif.h b/lib/dpif.h
> index a7c5097..4a4bb3d 100644
> --- a/lib/dpif.h
> +++ b/lib/dpif.h
> @@ -779,17 +779,18 @@ const char *dpif_upcall_type_to_string(enum 
> dpif_upcall_type);
> struct dpif_upcall {
> /* All types. */
> enum dpif_upcall_type type;
> -struct dp_packet packet;   /* Packet data. */
> -struct nlattr *key; /* Flow key. */
> -size_t key_len; /* Length of 'key' in bytes. */
> -ovs_u128 ufid;  /* Unique flow identifier for 'key'. */
> -struct nlattr *mru; /* Maximum receive unit. */
> -struct nlattr *cutlen;  /* Number of bytes shrink from the end. */
> 
> /* DPIF_UC_ACTION only. */
> -struct nlattr *userdata;/* Argument to OVS_ACTION_ATTR_USERSPACE. */
> -struct nlattr *out_tun_key;/* Output tunnel key. */
> struct nlattr *actions;/* Argument to OVS_ACTION_ATTR_USERSPACE. */
> +struct nlattr *out_tun_key;/* Output tunnel key. */
> +struct nlattr *userdata;/* Argument to OVS_ACTION_ATTR_USERSPACE. */
> +
> +struct nlattr *cutlen;  /* Number of bytes shrink from the end. */
> +struct nlattr *mru; /* Maximum receive unit. */
> +ovs_u128 ufid;  /* Unique flow identifier for 'key'. */
> +struct dp_packet packet;   /* Packet data. */
> +struct nlattr *key; /* Flow key. */
> +size_t key_len; /* Length of 'key' in bytes. */
> };
> 
> /* A callback to notify higher layer of dpif about to be purged, so that
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 08/12] dpif-netdev: Reorder elements in dp_netdev_port structure.

2016-10-07 Thread Jarno Rajahalme
Would equivalent packing be achieved by moving the line down before the bool 
instead? If yes, it would be preferable.

Acked-by: Jarno Rajahalme 

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> By reordering the data elements in dp_netdev_port structure, pad bytes
> can be reduced and there by saving a cache line.
> 
> Before: structure size:136, holes:3, sum padbytes:15, cachelines:3
> After: structure size:128, holes:1, sum padbytes:7, cachelines:2
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/dpif-netdev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index dfc9cbd..262f4de 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -284,10 +284,10 @@ struct dp_netdev_rxq {
> /* A port in a netdev-based datapath. */
> struct dp_netdev_port {
> odp_port_t port_no;
> +unsigned n_rxq; /* Number of elements in 'rxq' */
> struct netdev *netdev;
> struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
> struct netdev_saved_flags *sf;
> -unsigned n_rxq; /* Number of elements in 'rxq' */
> struct dp_netdev_rxq *rxqs;
> bool dynamic_txqs;  /* If true XPS will be used. */
> unsigned *txq_used; /* Number of threads that uses each tx queue. 
> */
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 07/12] dpif-netdev: Cache align netdev_flow_keys.

2016-10-07 Thread Jarno Rajahalme

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> Aligning the 'keys' array seems to positively impact performance.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/dpif-netdev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d0bb191..dfc9cbd 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4157,7 +4157,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> /* Sparse or MSVC doesn't like variable length array. */
> enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
> #endif
> -struct netdev_flow_key keys[PKT_ARRAY_SIZE];
> +struct netdev_flow_key keys[PKT_ARRAY_SIZE] __attribute__((aligned(64)));

Due to compiler compatibility you must use OVS_ALIGNED_VAR(64) instead.

> struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
> long long now = time_msec();
> size_t newcnt, n_batches, i;
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 06/12] cmap: Remove prefetching in cmap_find_batch().

2016-10-07 Thread Jarno Rajahalme

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> prefetching the data in to the caches isn't improving the performance in
> cmap_find_batch(). Moreover its found that there is slight improvement
> in performance with out prefetching.
> 

I recall doing some performance testing for this earlier, but have no records 
of the system or other circumstances. Is it likely that this is at least 
somewhat system and test case dependent? Also the modified batch size may be a 
factor here. Anyway, I don’t currently have any proof to the contrary, so I 
have no problem removing the prefetching.

However, you should also update all the comments referring to prefetching.

  Jarno

> This patch removes prefetching from cmap_find_batch().
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/cmap.c | 4 
> 1 file changed, 4 deletions(-)
> 
> diff --git a/lib/cmap.c b/lib/cmap.c
> index 8c7312d..4c34bda 100644
> --- a/lib/cmap.c
> +++ b/lib/cmap.c
> @@ -397,7 +397,6 @@ cmap_find_batch(const struct cmap *cmap, unsigned long 
> map,
> ULLONG_FOR_EACH_1(i, map) {
> h1s[i] = rehash(impl, hashes[i]);
> b1s[i] = &impl->buckets[h1s[i] & impl->mask];
> -OVS_PREFETCH(b1s[i]);
> }
> /* Lookups, Round 1. Only look up at the first bucket. */
> ULLONG_FOR_EACH_1(i, map) {
> @@ -413,13 +412,11 @@ cmap_find_batch(const struct cmap *cmap, unsigned long 
> map,
> if (!node) {
> /* Not found (yet); Prefetch the 2nd bucket. */
> b2s[i] = &impl->buckets[other_hash(h1s[i]) & impl->mask];
> -OVS_PREFETCH(b2s[i]);
> c1s[i] = c1; /* We may need to check this after Round 2. */
> continue;
> }
> /* Found. */
> ULLONG_SET0(map, i); /* Ignore this on round 2. */
> -OVS_PREFETCH(node);
> nodes[i] = node;
> }
> /* Round 2. Look into the 2nd bucket, if needed. */
> @@ -453,7 +450,6 @@ cmap_find_batch(const struct cmap *cmap, unsigned long 
> map,
> continue;
> }
> found:
> -OVS_PREFETCH(node);
> nodes[i] = node;
> }
> return result;
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 05/12] dpif-netdev: Clear flow batches inside packet_batch_execute.

2016-10-07 Thread Jarno Rajahalme
Daniele had a comment on this, I believe?

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> There is a slight negative performance impact, by zeroing out the flow
> batch pointers in dp_netdev_input__ ahead of packet_batch_execute(). The
> issue has been observed with multiple batches test scenario.
> 
> This patch fixes the problem by removing the extra for loop and clear
> the flow batches inside the packet_batch_per_flow_execute(). Also the
> vtune analysis showed that the overall no. of instructions retired for
> dp_netdev_input__ reduced by 1,273,800,000 with this patch.
> 
> Fixes: 603f2ce04d00 ("dpif-netdev: Clear flow batches before execute.")
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/dpif-netdev.c | 5 +
> 1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c002dd3..d0bb191 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3878,6 +3878,7 @@ packet_batch_per_flow_execute(struct 
> packet_batch_per_flow *batch,
> {
> struct dp_netdev_actions *actions;
> struct dp_netdev_flow *flow = batch->flow;
> +flow->batch = NULL;
> 
> dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
> batch->tcp_flags, now);
> @@ -4173,10 +4174,6 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> }
> 
> for (i = 0; i < n_batches; i++) {
> -batches[i].flow->batch = NULL;
> -}
> -
> -for (i = 0; i < n_batches; i++) {
> packet_batch_per_flow_execute(&batches[i], pmd, now);
> }
> }
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 04/12] hash: Skip invoking mhash_add__() with zero input.

2016-10-07 Thread Jarno Rajahalme

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> mhash_add__() is expensive and should be only called with valid input.
> This patch will validate the input before invoking the mhash_add__ and
> there by saving some cpu cycles.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/hash.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/hash.h b/lib/hash.h
> index 114a419..9bfebdb 100644
> --- a/lib/hash.h
> +++ b/lib/hash.h
> @@ -70,7 +70,7 @@ static inline uint32_t mhash_add__(uint32_t hash, uint32_t 
> data)
> 
> static inline uint32_t mhash_add(uint32_t hash, uint32_t data)
> {
> -hash = mhash_add__(hash, data);
> +hash = data ? mhash_add__(hash, data): hash;

IMO the zero check is best placed in the function mhash_add__(), where it is 
also more evident that zero-valued data would not change the hash anyway. Maybe 
a comment to that effect would be also nice?

> hash = hash_rot(hash, 13);
> return hash * 5 + 0xe6546b64;
> }
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 03/12] flow: Skip invoking expensive count_1bits() with zero input.

2016-10-07 Thread Jarno Rajahalme
With the nit below,

Acked-by: Jarno Rajahalme 

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> This patch checks if trash is non-zero and only then resets the flowmap
> bit and increment the pointer by set bits as found in trash.
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/flow.h | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/flow.h b/lib/flow.h
> index 4eb19ae..8cfd243 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -609,11 +609,13 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>  * corresponding data chunks should be skipped accordingly. */
> map_t trash = *fmap & (rm1bit - 1);
> 
> -*fmap -= trash;
> +if (trash) {
> +*fmap -= trash;
> /* count_1bits() is fast for systems where speed matters (e.g.,
>  * DPDK), so we don't try avoid using it.
>  * Advance 'aux->values' to point to the value for 'rm1bit'. */

You should update the comment, as we now avoid calling count_1bits().

> -aux->values += count_1bits(trash);
> +aux->values += count_1bits(trash);
> +}
> 
> *value = *aux->values;
> } else {
> -- 
> 2.4.11
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 02/12] flow: Add comments to mf_get_next_in_map()

2016-10-07 Thread Jarno Rajahalme

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 

A brief commit message should be included here. E.g.:

This patch adds comments to mf)get_next_in_map() to make it more comprehensible.

> Signed-off-by: Bhanuprakash Bodireddy 
> Signed-off-by: Antonio Fischetti 
> ---
> lib/flow.h | 23 +--
> 1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/flow.h b/lib/flow.h
> index ea24e28..4eb19ae 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -570,15 +570,27 @@ struct mf_for_each_in_map_aux {
> const uint64_t *values;
> };
> 
> +/*
> + * Parse the bitmap mask of the subtable and output the next value
> + * from the search-key.
> + */

While it is helpful to refer to higher level concepts such as subtable and 
search-key, I’d prefer the comment to rather state what the function does in 
terms of the abstractions at hand and then provide an example of use referring 
to subtables and such. Like this for example:

/* Get the data from ‘aux->values’ corresponding to the next lowest 1-bit in
 * ‘aux->map’, given that ‘aux->values’ points to an array of 64-bit words
 * corresponding to the 1-bits in ‘aux->fmap’, starting from the rightmost 
1-bit.
 * 
 * Returns ’true’ if the traversal is incomplete, ‘false’ otherwise. ‘aux’ is 
prepared
 * for the next iteration after each call.
 *
 * This is used to traverse through, for example, the values in a miniflow
 * representation of a flow key selected by non-zero 64-bit words in a
 * corresponding subtable mask. */ 

> static inline bool
> mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
>uint64_t *value)
> {
> -map_t *map, *fmap;
> +map_t *map;/* map refers to the bitmap mask of the subtable. */
> +map_t *fmap;   /* fmap refers to the bitmap of the search-key. */

Again, given that the example use was referred in the main comment above, these 
comments should stick to the abstractions at hand. Better yet, these comments 
should instead be placed into the aux struct definition above, e.g.:

struct mf_for_each_in_map_aux {
size_t unit;  /* Current 64-bit unit of the flowmaps 
being processed. */
struct flowmap fmap;  /* Remaining 1-bits corresponding to the 64-bit 
words in ‘values’
struct flowmap map;   /* Remaining 1-bits corresponding to the 64-bit 
words of interest.
const uint64_t *values;   /* 64-bit words corresponding to the 1-bits in 
‘fmap’. */
};

> map_t rm1bit;
> 
> +/* The bitmap mask selects which value must be considered from the
> + * search-key. We process the corresponding 'unit' of size 64 bits.
> + * 'aux->unit' is an index to the current unit of 64 bits. */

Given the above comments this could be changed to:

/* Skip empty map units. */

> while (OVS_UNLIKELY(!*(map = &aux->map.bits[aux->unit]))) {
> -/* Skip remaining data in the previous unit. */
> +/* No bits are enabled, so no search-key value will be output.
> + * In case some of the corresponding data chunks in the search-key
> + * bitmap are set, the data chunks must be skipped, as they are not
> + * considered by this mask. This is handled by incrementing 
> aux->values
> + * accordingly. */

This comment is incorrect, as the determination of the iteration termination is 
done later (the ‘false’ return case).

How about this:

/* Skip remaining data in the current unit before advancing to the next. */

> aux->values += count_1bits(aux->fmap.bits[aux->unit]);
> if (++aux->unit == FLOWMAP_UNITS) {
> return false;
> @@ -589,7 +601,12 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
> *map -= rm1bit;
> fmap = &aux->fmap.bits[aux->unit];
> 
> +/* If the 8-byte value referred by 'rm1bit' is present in the
> + * search-key return 'value', otherwise return 0. */

How about this instead:

/* If the rightmost 1-bit found from the current unit in ‘aux->map’
 * (‘rm1bit’) is also present in ‘aux->fmap’, store the corresponding
 * value from ‘aux->values’ to ‘*value', otherwise store 0. */

> if (OVS_LIKELY(*fmap & rm1bit)) {
> +/* The value is in the search-key, if the search-key contains other
> + * values preceeding the 'rm1bit' bit, we consider it trash and the
> + * corresponding data chunks should be skipped accordingly. */

How about this instead:

/* Skip all 64-bit words in ‘values’ preceding the one corresponding to 
‘rm1bit’. */

> map_t trash = *fmap & (rm1bit - 1);
> 
> *fmap -= trash;
> @@ -600,6 +617,8 @@ mf_get_next_in_map(struct mf_for_each_in_map_aux *aux,
> 
> *value = *aux->values;
> } else {
> +/* The search-key does not contain a value that corresponds to
> + * rm1bit. */

Given the above, I believe this comment can be omitted.

> *value = 0;
> }
> return true;
> -- 
> 2.4.11
> 
> ___
> dev ma

Re: [ovs-dev] [PATCH 01/12] dpcls: Use 32 packet batches for lookups.

2016-10-07 Thread Jarno Rajahalme

> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy 
>  wrote:
> 
> This patch increases the number of packets processed in a batch during a
> lookup from 16 to 32. Processing batches of 32 packets improves
> performance and also one of the internal loops can be avoided here.
> 

Can you provide some qualification of the performance test conditions? Do you 
believe the performance difference applies universally?

> Signed-off-by: Antonio Fischetti 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
> lib/dpif-netdev.c | 109 +++---
> 1 file changed, 46 insertions(+), 63 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 6e09e44..c002dd3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4975,23 +4975,20 @@ dpcls_lookup(struct dpcls *cls, const struct 
> netdev_flow_key keys[],
>  int *num_lookups_p)
> {
> /* The received 'cnt' miniflows are the search-keys that will be processed
> - * in batches of 16 elements.  N_MAPS will contain the number of these
> - * 16-elements batches.  i.e. for 'cnt' = 32, N_MAPS will be 2.  The 
> batch
> - * size 16 was experimentally found faster than 8 or 32. */
> -typedef uint16_t map_type;
> + * to find a matching entry into the available subtables.
> + * The number of bits in map_type is equal to NETDEV_MAX_BURST. */

This needs a build time assertion to catch any future change in 
NETDEV_MAX_BURST.

Preferably, if you can verify that the compiler is able to remove the 
unnecessary loop in this case, you should consider not removing the extra loop 
that would kick in only if NETDEV_MAX_BURST becomes larger than 32.

> +typedef uint32_t map_type;
> #define MAP_BITS (sizeof(map_type) * CHAR_BIT)
> 
> -#if !defined(__CHECKER__) && !defined(_WIN32)
> -const int N_MAPS = DIV_ROUND_UP(cnt, MAP_BITS);
> -#else
> -enum { N_MAPS = DIV_ROUND_UP(NETDEV_MAX_BURST, MAP_BITS) };
> -#endif
> -map_type maps[N_MAPS];
> struct dpcls_subtable *subtable;
> 
> -memset(maps, 0xff, sizeof maps);
> -if (cnt % MAP_BITS) {
> -maps[N_MAPS - 1] >>= MAP_BITS - cnt % MAP_BITS; /* Clear extra bits. 
> */
> +map_type keys_map = 0x;
> +map_type found_map;
> +uint32_t hashes[MAP_BITS];
> +const struct cmap_node *nodes[MAP_BITS];
> +
> +if (OVS_UNLIKELY(cnt != NETDEV_MAX_BURST)) {
> +keys_map >>= NETDEV_MAX_BURST - cnt; /* Clear extra bits. */
> }
> memset(rules, 0, cnt * sizeof *rules);
> 
> @@ -5007,59 +5004,45 @@ dpcls_lookup(struct dpcls *cls, const struct 
> netdev_flow_key keys[],
> PVECTOR_FOR_EACH (subtable, &cls->subtables) {
> const struct netdev_flow_key *mkeys = keys;
> struct dpcls_rule **mrules = rules;
> -map_type remains = 0;
> -int m;
> -
> -BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);
> -
> -/* Loops on each batch of 16 search-keys. */
> -for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += MAP_BITS) {
> -uint32_t hashes[MAP_BITS];
> -const struct cmap_node *nodes[MAP_BITS];
> -unsigned long map = maps[m];
> -int i;
> -
> -if (!map) {
> -continue; /* Skip empty maps. */
> -}
> -
> -/* Compute hashes for the remaining keys.  Each search-key is
> - * masked with the subtable's mask to avoid hashing the 
> wildcarded
> - * bits. */
> -ULLONG_FOR_EACH_1(i, map) {
> -hashes[i] = netdev_flow_key_hash_in_mask(&mkeys[i],
> - &subtable->mask);
> -}
> -/* Lookup. */
> -map = cmap_find_batch(&subtable->rules, map, hashes, nodes);
> -/* Check results.  When the i-th bit of map is set, it means 
> that a
> - * set of nodes with a matching hash value was found for the i-th
> - * search-key.  Due to possible hash collisions we need to check
> - * which of the found rules, if any, really matches our masked
> - * search-key. */
> -ULLONG_FOR_EACH_1(i, map) {
> -struct dpcls_rule *rule;
> -
> -CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
> -if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) 
> {
> -mrules[i] = rule;
> -/* Even at 20 Mpps the 32-bit hit_cnt cannot wrap
> - * within one second optimization interval  */
> -subtable->hit_cnt++;
> -lookups_match += subtable_pos;
> -goto next;
> -}
> +int i;
> +found_map = keys_map;
> +
> +/* Compute hashes for the remaining keys.  Each search-key is
> + * masked with the subtable's mask to avoid hashing the wildcarded
> +   

Re: [ovs-dev] [PATCH] ofproto: Always delete rules before deleting a meter.

2016-10-04 Thread Jarno Rajahalme

> On Oct 4, 2016, at 2:32 PM, Ben Pfaff  wrote:
> 
> On Fri, Sep 30, 2016 at 11:08:19AM -0700, Jarno Rajahalme wrote:
>> When deleting a bridge it is currently possible to delete a mater
>> without deleting the rules using the meter first.  Fix this by moving
>> the meter's rule deletion to meter_delete().
>> 
>> Signed-off-by: Jarno Rajahalme 
>> Reported-by: Petr Machata 
> 
> Acked-by: Ben Pfaff 

Thanks for the review!

Pushed to master, but not sure if need to backport to earlier branches, as 
meters are not really supported by any release yet?

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] Destroy rules before destroying meters

2016-09-30 Thread Jarno Rajahalme
Petr,

While deleting all the rules before deleting al the meters is more efficient, I 
think we should make the code more future proof by making sure we never delete 
a meter without deleting the rules referring to it.

Would you try the following patch to see if it fixes the problem you face:

  https://patchwork.ozlabs.org/patch/677164/

Regards,

  Jarno


> On Sep 22, 2016, at 8:03 AM, Petr Machata  wrote:
> 
> Hi,
> 
> this patch fixes a memory corruptor when a bridge containing rules with
> meters is removed from the system.  It was observed and fixed on
> branch-2.5.
> 
> This was regtested by "make check", though without SSL and IPsec tests,
> and the patch introduced no regressions.  It also actually fixes the
> corruptor in our system.
> 
> Please consider applying.
> 
> On master, I see that ofproto_rule_remove__ is called in
> delete_flows_start__ instead of delete_flows_finish__.  But otherwise
> the sequencing is the same and I think the problem should exhibit on
> master as well.  We don't however have resources to forward-port our
> provider and actually observe the issue in practice.
> 
> Thank you,
> Petr Machata
> 
> --8<
> 
> An ofproto that's being destroyed could include rules with meters.  Each
> meter in turn holds a list of rules where that meter is attached.  Rules
> that are destroyed are removed from the list of rules at the meter
> attached to the rule under destruction.
> 
> But the function ofproto_destroy sequences meter removal before a call
> to ofproto_flush__, which takes care of rule removal.  Thus at the time
> that rules are removed from the list at a meter, that list is already
> gone.  This leads to invalid memory accesses.
> 
> The particular call chains are:
> 
> - ofproto_destroy calls meter_delete calls free
> 
> - ofproto_destroy calls ofproto_flush__ calls delete_flows__ calls
>  delete_flows_finish__ calls ofproto_rule_remove__ calls
>  list_remove(&rule->meter_list_node).  rule->meter_list_node's next
>  references a free'd meter->rules.
> 
> To fix this problem, change the order in which rules and meters are
> destroyed.
> 
> Signed-off-by: Amit Nishry 
> Signed-off-by: Mykola Zhuravel 
> ---
> ofproto/ofproto.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index c6b802b..6cdfa8c 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -1590,6 +1590,8 @@ ofproto_destroy(struct ofproto *p, bool del)
> return;
> }
> 
> +ofproto_flush__(p);
> +
> if (p->meters) {
> meter_delete(p, 1, p->meter_features.max_meters);
> p->meter_features.max_meters = 0;
> @@ -1597,7 +1599,6 @@ ofproto_destroy(struct ofproto *p, bool del)
> p->meters = NULL;
> }
> 
> -ofproto_flush__(p);
> HMAP_FOR_EACH_SAFE (ofport, next_ofport, hmap_node, &p->ports) {
> ofport_destroy(ofport, del);
> }
> -- 
> 2.1.0
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] ofproto: Always delete rules before deleting a meter.

2016-09-30 Thread Jarno Rajahalme
When deleting a bridge it is currently possible to delete a mater
without deleting the rules using the meter first.  Fix this by moving
the meter's rule deletion to meter_delete().

Signed-off-by: Jarno Rajahalme 
Reported-by: Petr Machata 
---
 ofproto/ofproto.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index d601f71..de1c469 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -6192,10 +6192,22 @@ static void
 meter_delete(struct ofproto *ofproto, uint32_t first, uint32_t last)
 OVS_REQUIRES(ofproto_mutex)
 {
-uint32_t mid;
-for (mid = first; mid <= last; ++mid) {
+for (uint32_t mid = first; mid <= last; ++mid) {
 struct meter *meter = ofproto->meters[mid];
 if (meter) {
+/* First delete the rules that use this meter. */
+if (!ovs_list_is_empty(&meter->rules)) {
+struct rule_collection rules;
+struct rule *rule;
+
+rule_collection_init(&rules);
+
+LIST_FOR_EACH (rule, meter_list_node, &meter->rules) {
+rule_collection_add(&rules, rule);
+}
+delete_flows__(&rules, OFPRR_METER_DELETE, NULL);
+}
+
 ofproto->meters[mid] = NULL;
 ofproto->ofproto_class->meter_del(ofproto,
   meter->provider_meter_id);
@@ -6253,7 +6265,6 @@ handle_delete_meter(struct ofconn *ofconn, struct 
ofputil_meter_mod *mm)
 {
 struct ofproto *ofproto = ofconn_get_ofproto(ofconn);
 uint32_t meter_id = mm->meter.meter_id;
-struct rule_collection rules;
 enum ofperr error = 0;
 uint32_t first, last;
 
@@ -6267,25 +6278,9 @@ handle_delete_meter(struct ofconn *ofconn, struct 
ofputil_meter_mod *mm)
 first = last = meter_id;
 }
 
-/* First delete the rules that use this meter.  If any of those rules are
- * currently being modified, postpone the whole operation until later. */
-rule_collection_init(&rules);
-ovs_mutex_lock(&ofproto_mutex);
-for (meter_id = first; meter_id <= last; ++meter_id) {
-struct meter *meter = ofproto->meters[meter_id];
-if (meter && !ovs_list_is_empty(&meter->rules)) {
-struct rule *rule;
-
-LIST_FOR_EACH (rule, meter_list_node, &meter->rules) {
-rule_collection_add(&rules, rule);
-}
-}
-}
-delete_flows__(&rules, OFPRR_METER_DELETE, NULL);
-
 /* Delete the meters. */
+ovs_mutex_lock(&ofproto_mutex);
 meter_delete(ofproto, first, last);
-
 ovs_mutex_unlock(&ofproto_mutex);
 
 return error;
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Deferring ofproto_class::destruct vs. ovs-appctl exit

2016-09-29 Thread Jarno Rajahalme
Btw. I don’t see this problem triggered by the testsuite even if I set the 
‘ofproto->backer' pointer to NULL right after the free() call. Do you see this 
happening on an unmodified OVS 2.5? If so, could you send the steps to 
reproduce. Just need to know if this is a potential crashing bug in OVS 2.5(.1) 
that needs to be fixed, or if this is something that affects your development.

Thanks,

  Jarno

> On Sep 29, 2016, at 12:54 PM, Jarno Rajahalme  wrote:
> 
> This may not be the cleanest solution, but how about changing the last line 
> of close_dpif_backer() in ofproto/ofproto-dpif.c like this:
> 
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 83dcc9c..5b42b7e 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -864,7 +864,7 @@ close_dpif_backer(struct dpif_backer *backer)
> free(backer->type);
> free(backer->dp_version_string);
> dpif_close(backer->dpif);
> -free(backer);
> +ovsrcu_postpone(free, backer);
> }
> 
> /* Datapath port slated for removal from datapath. */
> 
> Maybe this would solve the problem you found? That is, if the problem is that 
> the backer reference is stale, keeping the memory around for the RCU thread 
> should help. Obviously most of the backer has already been destructed at this 
> point, but it this case it should not matter.
> 
>  Jarno
> 
>> On Sep 29, 2016, at 11:32 AM, Petr Machata  wrote:
>> 
>> Hi,
>> 
>> On 2.5, we are seeing the following problem when removing a bridge:
>> 
>> - ofproto_destroy calls ofproto_flush__, which eventually calls
>>   ovsrcu_postpone(remove_rules_rcu)
>> 
>> - ofproto_destroy also calls p->ofproto_class->destruct, which
>>   eventually leads to release of DPIF backer (close_dpif_backer)
>> 
>> - at some later point, remove_rules_rcu is picked up by the RCU
>>   thread.  That calls rule_delete, calls complete_operation, and that
>>   references the backer, which is however already gone:
>> ofproto->backer->need_revalidate = REV_FLOW_TABLE;
>> 
>> My first idea is to do this:
>> 
>> modified   ofproto/ofproto.c
>> @@ -1588,9 +1588,16 @@ ofproto_destroy__(struct ofproto *ofproto)
>> * - 1st we defer the removal of the rules from the classifier
>> * - 2nd we defer the actual destruction of the rules. */
>> static void
>> +ofproto_class_destruct__(struct ofproto *ofproto)
>> +{
>> +ofproto->ofproto_class->destruct(ofproto);
>> +}
>> +
>> +static void
>> ofproto_destroy_defer__(struct ofproto *ofproto)
>>OVS_EXCLUDED(ofproto_mutex)
>> {
>> +ovsrcu_postpone(ofproto_class_destruct__, ofproto);
>>ovsrcu_postpone(ofproto_destroy__, ofproto);
>> }
>> 
>> @@ -1623,8 +1630,6 @@ ofproto_destroy(struct ofproto *p, bool del)
>>free(usage);
>>}
>> 
>> -p->ofproto_class->destruct(p);
>> -
>>/* We should not postpone this because it involves deleting a listening
>> * socket which we may want to reopen soon. 'connmgr' should not be used
>> * by other threads */
>> 
>> That seems to fix the issue.
>> 
>> But "ovs-appctl exit" (or rather the ovs-vswitchd exit action that
>> ovs-appctl exit triggers) doesn't wait for the RCU thread to do all the
>> deferred work, so this ends up not calling the cleanup at all.  We can
>> work around by writing something like "ovs-appctl mlnx/wait-br-cleanup",
>> but that's unsatisfactory.
>> 
>> It seems like ovs-vswitchd's exit handling should actually wait for
>> deferred work to get done.
>> 
>> Thoughts?  How would I go about implementing this?
>> 
>> Thanks,
>> Petr
>> ___
>> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
> 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Deferring ofproto_class::destruct vs. ovs-appctl exit

2016-09-29 Thread Jarno Rajahalme
This may not be the cleanest solution, but how about changing the last line of 
close_dpif_backer() in ofproto/ofproto-dpif.c like this:

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 83dcc9c..5b42b7e 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -864,7 +864,7 @@ close_dpif_backer(struct dpif_backer *backer)
 free(backer->type);
 free(backer->dp_version_string);
 dpif_close(backer->dpif);
-free(backer);
+ovsrcu_postpone(free, backer);
 }
 
 /* Datapath port slated for removal from datapath. */

Maybe this would solve the problem you found? That is, if the problem is that 
the backer reference is stale, keeping the memory around for the RCU thread 
should help. Obviously most of the backer has already been destructed at this 
point, but it this case it should not matter.

  Jarno

> On Sep 29, 2016, at 11:32 AM, Petr Machata  wrote:
> 
> Hi,
> 
> On 2.5, we are seeing the following problem when removing a bridge:
> 
>  - ofproto_destroy calls ofproto_flush__, which eventually calls
>ovsrcu_postpone(remove_rules_rcu)
> 
>  - ofproto_destroy also calls p->ofproto_class->destruct, which
>eventually leads to release of DPIF backer (close_dpif_backer)
> 
>  - at some later point, remove_rules_rcu is picked up by the RCU
>thread.  That calls rule_delete, calls complete_operation, and that
>references the backer, which is however already gone:
>  ofproto->backer->need_revalidate = REV_FLOW_TABLE;
> 
> My first idea is to do this:
> 
> modified   ofproto/ofproto.c
> @@ -1588,9 +1588,16 @@ ofproto_destroy__(struct ofproto *ofproto)
>  * - 1st we defer the removal of the rules from the classifier
>  * - 2nd we defer the actual destruction of the rules. */
> static void
> +ofproto_class_destruct__(struct ofproto *ofproto)
> +{
> +ofproto->ofproto_class->destruct(ofproto);
> +}
> +
> +static void
> ofproto_destroy_defer__(struct ofproto *ofproto)
> OVS_EXCLUDED(ofproto_mutex)
> {
> +ovsrcu_postpone(ofproto_class_destruct__, ofproto);
> ovsrcu_postpone(ofproto_destroy__, ofproto);
> }
> 
> @@ -1623,8 +1630,6 @@ ofproto_destroy(struct ofproto *p, bool del)
> free(usage);
> }
> 
> -p->ofproto_class->destruct(p);
> -
> /* We should not postpone this because it involves deleting a listening
>  * socket which we may want to reopen soon. 'connmgr' should not be used
>  * by other threads */
> 
> That seems to fix the issue.
> 
> But "ovs-appctl exit" (or rather the ovs-vswitchd exit action that
> ovs-appctl exit triggers) doesn't wait for the RCU thread to do all the
> deferred work, so this ends up not calling the cleanup at all.  We can
> work around by writing something like "ovs-appctl mlnx/wait-br-cleanup",
> but that's unsatisfactory.
> 
> It seems like ovs-vswitchd's exit handling should actually wait for
> deferred work to get done.
> 
> Thoughts?  How would I go about implementing this?
> 
> Thanks,
> Petr
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 3/3] ofp-actions: Always consider inconsistent CT actions as an error.

2016-09-27 Thread Jarno Rajahalme
Thanks for the fast review, Joe!

Series pushed to master, branch-2.6, and branch-2.5.

  Jarno

> On Sep 27, 2016, at 2:38 PM, Joe Stringer  wrote:
> 
> On 27 September 2016 at 14:32, Jarno Rajahalme  <mailto:ja...@ovn.org>> wrote:
>> We can't downgrade to OF1.0 and expect inconsistent CT actions
>> be silently disgarded.  Instead, datapath flow install fails, so
>> it is better to flag inconsistent CT actions as hard errors.
>> 
>> Signed-off-by: Jarno Rajahalme mailto:ja...@ovn.org>>
> 
> LGTM, minor spelling below. Thanks.
> 
> Acked-by: Joe Stringer mailto:j...@ovn.org>>
> 
>> ---
>> lib/ofp-actions.c |  7 +--
>> tests/ofproto-dpif.at | 40 
>> tests/ovs-ofctl.at| 46 +++---
>> 3 files changed, 48 insertions(+), 45 deletions(-)
>> 
>> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
>> index f896f98..19e47fb 100644
>> --- a/lib/ofp-actions.c
>> +++ b/lib/ofp-actions.c
>> @@ -7033,7 +7033,10 @@ ofpact_check__(enum ofputil_protocol 
>> *usable_protocols, struct ofpact *a,
>> if (!dl_type_is_ip_any(flow->dl_type)
>> || (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)
>> || (oc->alg == IPPORT_FTP && flow->nw_proto != IPPROTO_TCP)) {
>> -inconsistent_match(usable_protocols);
>> +/* We can't downgrade to OF1.0 and expect inconsistent CT 
>> actions
>> + * be silently disgarded.  Instead, datapath flow install 
>> fails, so
> 
> *discarded

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] system-traffic: Collapse FTP NAT tests.

2016-09-27 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

> On Sep 7, 2016, at 4:34 PM, Joe Stringer  wrote:
> 
> Previously we had the following tests:
> * FTP with NAT
> * FTP with NAT (seq-adj)
> * FTP with NAT 2
> 
> Tests 1 and 2 share everything, except use different IP addresses. Test
> 3 has a different flow table, but shares the topology with 1 and 2.
> 
> This commit creates macros:
> * CHECK_FTP_NAT(title, ip, flow_table)
> * CHECK_FTP_NAT_PRE_RECIRC(title, ip, ip-as-hex)
> * CHECK_FTP_NAT_POST_RECIRC(title, ip, ip-as-hex)
> 
> The second macro represents tests 1 and 2, while the third macro
> represents two variations on test 3: with and without TCP sequence
> adjustment.
> 
> By using these macros to declare the tests, much of the code may be
> reused and shared rather than copying/pasting. As a result, the
> differences between tests are easier to identify.
> 
> Signed-off-by: Joe Stringer 
> ---
> tests/system-traffic.at | 218 ++--
> 1 file changed, 81 insertions(+), 137 deletions(-)
> 
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 4dabd90356a1..31076f8b141a 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2405,103 +2405,55 @@ 
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=
> OVS_TRAFFIC_VSWITCHD_STOP
> AT_CLEANUP
> 
> -AT_SETUP([conntrack - FTP with NAT])
> -AT_SKIP_IF([test $HAVE_PYFTPDLIB = no])
> -CHECK_CONNTRACK()
> -CHECK_CONNTRACK_NAT()
> -
> -OVS_TRAFFIC_VSWITCHD_START()
> -
> -ADD_NAMESPACES(at_ns0, at_ns1)
> -
> -ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> -NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
> -ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> -
> -dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from 
> ns1->ns0.
> -
> -AT_DATA([flows.txt], [dnl
> -dnl track all IP traffic, de-mangle non-NEW connections
> -table=0 in_port=1, ip, action=ct(table=1,nat)
> -table=0 in_port=2, ip, action=ct(table=2,nat)
> -dnl
> -dnl ARP
> -dnl
> -table=0 priority=100 arp arp_op=1 
> action=move:OXM_OF_ARP_TPA[[]]->NXM_NX_REG2[[]],resubmit(,8),goto_table:10
> -table=0 priority=10 arp action=normal
> -table=0 priority=0 action=drop
> -dnl
> -dnl Table 1: port 1 -> 2
> -dnl
> -dnl Allow new FTP connections. These need to be commited.
> -table=1 ct_state=+new, tcp, tp_dst=21, nw_src=10.1.1.1, 
> action=ct(alg=ftp,commit,nat(src=10.1.1.9)),2
> -dnl Allow established TCP connections, make sure they are NATted already.
> -table=1 ct_state=+est, tcp, nw_src=10.1.1.9, action=2
> -dnl
> -dnl Table 1: droppers
> +dnl CHECK_FTP_NAT(TITLE, IP_ADDR, FLOWS)
> dnl
> -table=1 priority=10, tcp, action=drop
> -table=1 priority=0,action=drop
> -dnl
> -dnl Table 2: port 2 -> 1
> -dnl
> -dnl Allow established TCP connections, make sure they are reverse NATted
> -table=2 ct_state=+est, tcp, nw_dst=10.1.1.1, action=1
> -dnl Allow (new) related (data) connections.  These need to be commited.
> -table=2 ct_state=+new+rel, tcp, nw_dst=10.1.1.9, action=ct(commit,nat),1
> -dnl Allow related ICMP packets, make sure they are reverse NATted
> -table=2 ct_state=+rel, icmp, nw_dst=10.1.1.1, action=1
> -dnl
> -dnl Table 2: droppers
> -dnl
> -table=2 priority=10, tcp, action=drop
> -table=2 priority=0, action=drop
> -dnl
> -dnl MAC resolution table for IP in reg2, stores mac in OXM_OF_PKT_REG0
> -dnl
> -table=8,reg2=0x0a010109/0x,action=load:0x8088->OXM_OF_PKT_REG0[[]]
> -table=8,priority=0,action=load:0->OXM_OF_PKT_REG0[[]]
> -dnl ARP responder mac filled in at OXM_OF_PKT_REG0, or 0 for normal action.
> -dnl TPA IP in reg2.
> -dnl Swaps the fields of the ARP message to turn a query to a response.
> -table=10 priority=100 arp xreg0=0 action=normal
> -table=10 
> priority=10,arp,arp_op=1,action=load:2->OXM_OF_ARP_OP[[]],move:OXM_OF_ARP_SHA[[]]->OXM_OF_ARP_THA[[]],move:OXM_OF_PKT_REG0[[0..47]]->OXM_OF_ARP_SHA[[]],move:OXM_OF_ARP_SPA[[]]->OXM_OF_ARP_TPA[[]],move:NXM_NX_REG2[[]]->OXM_OF_ARP_SPA[[]],move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],move:OXM_OF_PKT_REG0[[0..47]]->NXM_OF_ETH_SRC[[]],move:NXM_OF_IN_PORT[[]]->NXM_NX_REG3[[0..15]],load:0->NXM_OF_IN_PORT[[]],output:NXM_NX_REG3[[0..15]]
> -table=10 priority=0 action=drop
> -])
> +dnl Checks the implementation of conntrack with FTP ALGs in combination with
> +dnl NAT, using the provided flow table.
> +m4_define([CHECK_FTP_NAT],
> +   [AT_SETUP([conntrack - FTP NAT $1])
> +AT_SKIP_IF([test $HAVE_PYFTPDLIB = no])
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> 
> -AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> 

[ovs-dev] [PATCH v2 1/3] ofp-actions: Style fixes.

2016-09-27 Thread Jarno Rajahalme
Replace a tab by a space and remove an unnecessary variable.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h | 2 +-
 lib/ofp-actions.c | 8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 67e84fa..74e9dcc 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -556,7 +556,7 @@ enum nx_conntrack_flags {
 #define NX_CT_RECIRC_NONE OFPTT_ALL
 
 #if !defined(IPPORT_FTP)
-#defineIPPORT_FTP  21
+#define IPPORT_FTP  21
 #endif
 
 /* OFPACT_CT.
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 22c7b16..6fea508 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -7029,7 +7029,6 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 
 case OFPACT_CT: {
 struct ofpact_conntrack *oc = ofpact_get_CT(a);
-enum ofperr err;
 
 if (!dl_type_is_ip_any(flow->dl_type)
 || (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)) {
@@ -7040,10 +7039,9 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 return mf_check_src(&oc->zone_src, flow);
 }
 
-err = ofpacts_check(oc->actions, ofpact_ct_get_action_len(oc),
-flow, max_ports, table_id, n_tables,
-usable_protocols);
-return err;
+return ofpacts_check(oc->actions, ofpact_ct_get_action_len(oc),
+ flow, max_ports, table_id, n_tables,
+ usable_protocols);
 }
 
 case OFPACT_NAT: {
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2 2/3] ofp-actions: Check that 'alg=ftp' matches on TCP.

2016-09-27 Thread Jarno Rajahalme
Datapath flow setup fails when setting the FTP helper on an
unsupported IP protocol.  It is better to fail at the OpenFlow rule
set-up time instead.

Signed-off-by: Jarno Rajahalme 
---
 lib/ofp-actions.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 6fea508..f896f98 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -7031,7 +7031,8 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 struct ofpact_conntrack *oc = ofpact_get_CT(a);
 
 if (!dl_type_is_ip_any(flow->dl_type)
-|| (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)) {
+|| (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)
+|| (oc->alg == IPPORT_FTP && flow->nw_proto != IPPROTO_TCP)) {
 inconsistent_match(usable_protocols);
 }
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2 3/3] ofp-actions: Always consider inconsistent CT actions as an error.

2016-09-27 Thread Jarno Rajahalme
We can't downgrade to OF1.0 and expect inconsistent CT actions
be silently disgarded.  Instead, datapath flow install fails, so
it is better to flag inconsistent CT actions as hard errors.

Signed-off-by: Jarno Rajahalme 
---
 lib/ofp-actions.c |  7 +--
 tests/ofproto-dpif.at | 40 
 tests/ovs-ofctl.at| 46 +++---
 3 files changed, 48 insertions(+), 45 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index f896f98..19e47fb 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -7033,7 +7033,10 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 if (!dl_type_is_ip_any(flow->dl_type)
 || (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)
 || (oc->alg == IPPORT_FTP && flow->nw_proto != IPPROTO_TCP)) {
-inconsistent_match(usable_protocols);
+/* We can't downgrade to OF1.0 and expect inconsistent CT actions
+ * be silently disgarded.  Instead, datapath flow install fails, so
+ * it is better to flag inconsistent CT actions as hard errors. */
+return OFPERR_OFPBAC_MATCH_INCONSISTENT;
 }
 
 if (oc->zone_src.field) {
@@ -7052,7 +7055,7 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 (on->range_af == AF_INET && flow->dl_type != htons(ETH_TYPE_IP)) ||
 (on->range_af == AF_INET6
  && flow->dl_type != htons(ETH_TYPE_IPV6))) {
-inconsistent_match(usable_protocols);
+return OFPERR_OFPBAC_MATCH_INCONSISTENT;
 }
 return 0;
 }
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index e2b983f..7b5972d 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8396,7 +8396,7 @@ add_of_ports br0 1 2
 AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg vconn:info ofproto_dpif:info])
 
 AT_DATA([flows.txt], [dnl
-ct_state=-trk,action=ct(table=0,zone=0)
+ipv6,ct_state=-trk,action=ct(table=0,zone=0)
 ct_state=+trk,action=controller
 ])
 
@@ -8407,14 +8407,14 @@ AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P 
nxt_packet_in --detach --no
 
 AT_CHECK([ovs-appctl time/stop])
 
-AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'eth(src=80:88:88:88:88:88,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.0.1,tip=192.168.0.2,op=1,sha=50:54:00:00:00:05,tha=00:00:00:00:00:00)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'0060970769ea860580da86dd60203afffe80020086fffe0580dafe80026097fffe0769ea870068bdfe80026097fffe0769ea0101860580da'])
 
 OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 1])
 OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
 
 AT_CHECK([cat ofctl_monitor.log], [0], [dnl
-NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=42 ct_state=inv|trk,in_port=2 
(via action) data_len=42 (unbuffered)
-arp,vlan_tci=0x,dl_src=80:88:88:88:88:88,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.2,arp_op=1,arp_sha=50:54:00:00:00:05,arp_tha=00:00:00:00:00:00
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=86 ct_state=inv|trk,in_port=2 
(via action) data_len=86 (unbuffered)
+icmp6,vlan_tci=0x,dl_src=00:00:86:05:80:da,dl_dst=00:60:97:07:69:ea,ipv6_src=fe80::200:86ff:fe05:80da,ipv6_dst=fe80::260:97ff:fe07:69ea,ipv6_label=0x0,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=135,icmp_code=0,nd_target=fe80::260:97ff:fe07:69ea,nd_sll=00:00:86:05:80:da,nd_tll=00:00:00:00:00:00
 icmp6_csum:68bd
 ])
 
 OVS_VSWITCHD_STOP
@@ -8545,8 +8545,8 @@ AT_DATA([flows.txt], [dnl
 dnl Table 0
 dnl
 table=0,priority=100,arp,action=normal
-table=0,priority=10,in_port=1,udp,action=ct(commit,table=1)
-table=0,priority=10,in_port=2,action=ct(table=1)
+table=0,priority=10,ip,in_port=1,udp,action=ct(commit,table=1)
+table=0,priority=10,ip,in_port=2,action=ct(table=1)
 table=0,priority=1,action=drop
 dnl
 dnl Table 1
@@ -8596,12 +8596,12 @@ dnl Allow new connections on p1->p2. Allow only 
established connections p2->p1
 AT_DATA([flows.txt], [dnl
 dnl Table 0
 dnl
-table=0,priority=100,arp,action=normal
-table=0,priority=10,in_port=1,udp,tp_src=1,action=ct(commit,exec(set_field:1->ct_mark)),controller
-table=0,priority=10,in_port=1,udp,tp_src=3,action=ct(commit,exec(set_field:3->ct_mark)),controller
-table=0,priority=10,in_port=1,udp,tp_src=5,action=ct(commit,exec(set_field:5->ct_mark)),controller
-table=0,priority=10,in_port=2,actions=ct(table=1)
-table=0,priority=1,action=drop
+table=0,arp,action=normal
+table=0,ip,in_port=1,udp,tp_src=1,action=ct(commit,exec(set_field:1->ct_mark)),controller
+table=0,ip,in_port=1,udp,tp_src=3,action=ct(commit,exec(set_field:3->ct_mark)),controller
+table=0,ip,in_port=1,udp,tp_src=5,action=ct(commit,exec(set_field:5->ct_mark)),controller
+table=0,ip,in_port=2,act

Re: [ovs-dev] [PATCH] ofp-actions: Error on conntrack action inconsistency.

2016-09-27 Thread Jarno Rajahalme

> On Sep 27, 2016, at 11:06 AM, Joe Stringer  wrote:
> 
> On 26 September 2016 at 18:46, Jarno Rajahalme  <mailto:ja...@ovn.org>> wrote:
>> Setting up a datapath flow that has a conntrack action with 'alg=ftp',
>> but does not match on 'nw_proto=tcp' fails with 'WARN' in
>> ovs-vswitchd.log.  It is better to flag such inconsistencies during
>> OpenFlow rule set-up time.  Also, conntrack action inconsistencies
>> should be flagged as such, rather than assuming that falling back to
>> OpenFlow 1.0 (which allows inconsistent actions) would remedy the
>> situation.
>> 
>> Remove the IP check from the action inconsistency check so that it is
>> possible to write rules that operate on both IPv4 and IPv6 packets,
>> when the action itself is not IP version dependent.  Correspondingly,
>> when translating a conntrack action, do not issue any datapath actions
>> if the packet being translated is not an IP packet, as conntrack can
>> operate only on IP packets and a datapath flow set-up otherwise fails
>> anyway.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> I see about three separate changes here, it would be easier to
> understand what you're trying to solve if they were proposed
> separately. I also see a few cosmetic changes which are usually
> handled separate from functional changes.
> 
> * Ensuring match on TCP for alg=ftp looks good
> 
> * Enforcing the action inconsistency check
> 
> Falling back to OF1.0 could allow one flow to execute CT() on any
> {ip,ipv6} packets, so while it may fail it's also not a terrible
> assumption that it may remedy the situation. That said, it sounds like
> a reasonable change to me - I doubt that people are relying on this,
> and it makes the errors more explicit by erroring back at the OF layer
> rather than printing a message in a kernel log somewhere.
> 
> There may be an argument for the two above being bugfixes: people
> running v2.5 LTS are more likely to run into unexpected translation
> errors in the datapath without these checks being properly presented
> at the OpenFlow layer.
> 

OK.

> * Removing IP check from action inconsistency check
> 
> I wouldn't mind some input from eg. OVN pipeline writers whether it is
> useful to match on any proto (ie, ipv4+ipv6) to do CT(). This is a
> basic tradeoff between pipeline flexibility and explicit erroring at
> the OpenFlow layer. I could see someone wanting to build conjunctive
> flows that do CT(), but it's disallowed today and no-one's complained,
> so is it really that useful?
> 
> Regardless I'm not sure that silently skipping the CT() action is
> quite right - shouldn't it complain that the flows are wrong?
> 

I’ll remove this change in v2 and separate out the changes to separate patches.

Thanks for the review,

  Jarno

>> ---
>> include/openvswitch/ofp-actions.h |  2 +-
>> lib/ofp-actions.c | 22 +++---
>> ofproto/ofproto-dpif-xlate.c  | 10 +++---
>> tests/ofproto-dpif.at |  6 +++---
>> tests/ovs-ofctl.at| 34 +-
>> 5 files changed, 39 insertions(+), 35 deletions(-)
>> 
>> diff --git a/include/openvswitch/ofp-actions.h 
>> b/include/openvswitch/ofp-actions.h
>> index 67e84fa..74e9dcc 100644
>> --- a/include/openvswitch/ofp-actions.h
>> +++ b/include/openvswitch/ofp-actions.h
>> @@ -556,7 +556,7 @@ enum nx_conntrack_flags {
>> #define NX_CT_RECIRC_NONE OFPTT_ALL
>> 
>> #if !defined(IPPORT_FTP)
>> -#defineIPPORT_FTP  21
>> +#define IPPORT_FTP  21
>> #endif
>> 
>> /* OFPACT_CT.
>> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
>> index 22c7b16..4a8e79d 100644
>> --- a/lib/ofp-actions.c
>> +++ b/lib/ofp-actions.c
>> @@ -7029,31 +7029,31 @@ ofpact_check__(enum ofputil_protocol 
>> *usable_protocols, struct ofpact *a,
>> 
>> case OFPACT_CT: {
>> struct ofpact_conntrack *oc = ofpact_get_CT(a);
>> -enum ofperr err;
>> 
>> -if (!dl_type_is_ip_any(flow->dl_type)
>> -|| (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)) 
>> {
>> -inconsistent_match(usable_protocols);
>> +if ((flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)
>> +|| (oc->alg == IPPORT_FTP && flow->nw_proto != IPPROTO_TCP)) {
>> +/* We can't downgrade to OF1.0 and expect inconsistent CT 
>> actions
>> + * be silently disgarded.  Inst

Re: [ovs-dev] 2.6 release update

2016-09-27 Thread Jarno Rajahalme

> On Sep 27, 2016, at 8:36 AM, Ben Pfaff  wrote:
> 
> The previous plan was to release last Thursday.  When I surveyed the
> team at VMware, they said there were a few more bug fixes they wanted to
> get into 2.6.  I think that the last of these is this fix from Jarno:
>https://patchwork.ozlabs.org/patch/672410/
> 
> Once that's in, which I expect to happen today, we'll do the release.
> Justin, do you want to send the release patches now so that we can have
> that ready to go?
> 

I’m working today on a small OpenFlow API tightening around validating CT 
actions. Currently we allow inconsistent CT actions on OpenFlow 1.0, which in 
some cases leads to surprising results, like packet executes working but 
corresponding datapath flow setups failing.

IMO we could address this on a .1 release as well, but it might be nice to have 
these kind of changes done ASAP…

  Jarno



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Any updates on the "userspace meter" patches ?

2016-09-27 Thread Jarno Rajahalme

> On Sep 27, 2016, at 4:57 AM, Enas Ahmad  wrote:
> 
> Hi,
> I know that meters are still not implemented in the OVS master branch. 
> However, I would like to enquiry about the "userspace meter" patches 
> submitted a while ago by Jarno Rajahalme, is there any updates on these 
> patches? has the implementation been extended to support "DSCP remark" action 
> ? 
> 

They have not been applied, as they still need more work.

Regards,

  Jarno

> Thanks,
> Enas
> 
> This message and its contents, including attachments are intended solely for 
> the original recipient. If you are not the intended recipient or have 
> received this message in error, please notify me immediately and delete this 
> message from your computer system. Any unauthorized use or distribution is 
> prohibited. Please consider the environment before printing this email.

This legalese makes little sense on a public mailing list.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] upcall: Don't start new revalidation round too soon after the last one.

2016-09-27 Thread Jarno Rajahalme

> On Sep 26, 2016, at 5:39 PM, Ben Pfaff  wrote:
> 
> On Tue, Sep 20, 2016 at 11:42:45AM -0700, Jarno Rajahalme wrote:
>> The execution time of 'ovs-ofctl add-flows' with a large number of
>> flows can be more than halved if revalidators are not running after
>> each flow mod separately.  This was first suspected when it was found
>> that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
>> same command without the '--bundle' option in a scenario where there
>> is a large set of flows being added and no datapath flows at all.  One
>> of the differences caused by the '--bundle' option is that the
>> revalidators are woken up only once, at the end of the whole set of
>> flow table changes, rather than after each flow table change
>> individually.
>> 
>> This patch limits the revalidation to run at most 200 times a second
>> by enforcing a minimum of 5ms time gap between the start times of
>> revalidation rounds.  If nothing happens in, say 6 milliseconds, and
>> then a new flow table change is signaled, the revalidator threads wake
>> up immediately without any further delay.  Values smaller than 5 were
>> found to increase the 'ovs-ofctl add-flows' execution time noticeably.
>> 
>> Since the revalidators are not running after each flow mod, the
>> overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
>> is reduced roughly by one core on a four core machine.
>> 
>> In testing the 'ovs-ofctl add-flows' execution time is not
>> significantly improved from this even if the revalidators are not
>> notified about the flow table changes at all.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Is this the patch you said you wanted to squeak into branch-2.6?
> 
> Acked-by: Ben Pfaff 

Yes, thank you. Running the testsuite once more I noticed that I needed to add 
this incremental to wait for the added delay in a test case that changes the 
flow tables in a loop and expected the changes to be instantly visible in the 
datapath:

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index e2b983f..025c923 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -4609,6 +4609,9 @@ m4_define([CHECK_CONTINUATION], [dnl
 m4_if([$3], [0], [],
 [AT_CHECK([echo "$actions1" | sed 's/pause/controller(pause)/g' | 
ovs-ofctl -O OpenFlow13 add-flows br1 -])])
 
+# Make sure the datapath is up-to-date before sending the packet.
+ovs-appctl revalidator/wait
+
 # Run a packet through the switch.
 AT_CHECK([ovs-appctl netdev-dummy/receive p1 "$flow"], [0], [stdout])

Pushed to master and branch-2.6.

  Jarno

 


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] ofp-actions: Error on conntrack action inconsistency.

2016-09-26 Thread Jarno Rajahalme
Setting up a datapath flow that has a conntrack action with 'alg=ftp',
but does not match on 'nw_proto=tcp' fails with 'WARN' in
ovs-vswitchd.log.  It is better to flag such inconsistencies during
OpenFlow rule set-up time.  Also, conntrack action inconsistencies
should be flagged as such, rather than assuming that falling back to
OpenFlow 1.0 (which allows inconsistent actions) would remedy the
situation.

Remove the IP check from the action inconsistency check so that it is
possible to write rules that operate on both IPv4 and IPv6 packets,
when the action itself is not IP version dependent.  Correspondingly,
when translating a conntrack action, do not issue any datapath actions
if the packet being translated is not an IP packet, as conntrack can
operate only on IP packets and a datapath flow set-up otherwise fails
anyway.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/ofp-actions.h |  2 +-
 lib/ofp-actions.c | 22 +++---
 ofproto/ofproto-dpif-xlate.c  | 10 +++---
 tests/ofproto-dpif.at |  6 +++---
 tests/ovs-ofctl.at| 34 +-
 5 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h 
b/include/openvswitch/ofp-actions.h
index 67e84fa..74e9dcc 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -556,7 +556,7 @@ enum nx_conntrack_flags {
 #define NX_CT_RECIRC_NONE OFPTT_ALL
 
 #if !defined(IPPORT_FTP)
-#defineIPPORT_FTP  21
+#define IPPORT_FTP  21
 #endif
 
 /* OFPACT_CT.
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 22c7b16..4a8e79d 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -7029,31 +7029,31 @@ ofpact_check__(enum ofputil_protocol *usable_protocols, 
struct ofpact *a,
 
 case OFPACT_CT: {
 struct ofpact_conntrack *oc = ofpact_get_CT(a);
-enum ofperr err;
 
-if (!dl_type_is_ip_any(flow->dl_type)
-|| (flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)) {
-inconsistent_match(usable_protocols);
+if ((flow->ct_state & CS_INVALID && oc->flags & NX_CT_F_COMMIT)
+|| (oc->alg == IPPORT_FTP && flow->nw_proto != IPPROTO_TCP)) {
+/* We can't downgrade to OF1.0 and expect inconsistent CT actions
+ * be silently disgarded.  Instead, datapath flow install fails, so
+ * it is better to flag inconsistent CT actions as hard errors. */
+return OFPERR_OFPBAC_MATCH_INCONSISTENT;
 }
 
 if (oc->zone_src.field) {
 return mf_check_src(&oc->zone_src, flow);
 }
 
-err = ofpacts_check(oc->actions, ofpact_ct_get_action_len(oc),
-flow, max_ports, table_id, n_tables,
-usable_protocols);
-return err;
+return ofpacts_check(oc->actions, ofpact_ct_get_action_len(oc),
+ flow, max_ports, table_id, n_tables,
+ usable_protocols);
 }
 
 case OFPACT_NAT: {
 struct ofpact_nat *on = ofpact_get_NAT(a);
 
-if (!dl_type_is_ip_any(flow->dl_type) ||
-(on->range_af == AF_INET && flow->dl_type != htons(ETH_TYPE_IP)) ||
+if ((on->range_af == AF_INET && flow->dl_type != htons(ETH_TYPE_IP)) ||
 (on->range_af == AF_INET6
  && flow->dl_type != htons(ETH_TYPE_IPV6))) {
-inconsistent_match(usable_protocols);
+return OFPERR_OFPBAC_MATCH_INCONSISTENT;
 }
 return 0;
 }
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 4c28712..479bf13 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5030,12 +5030,16 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t 
ofpacts_len,
 break;
 
 case OFPACT_CT:
-compose_conntrack_action(ctx, ofpact_get_CT(a));
+if (is_ip_any(flow)) {
+compose_conntrack_action(ctx, ofpact_get_CT(a));
+}
 break;
 
 case OFPACT_NAT:
-/* This will be processed by compose_conntrack_action(). */
-ctx->ct_nat_action = ofpact_get_NAT(a);
+if (is_ip_any(flow)) {
+/* This will be processed by compose_conntrack_action(). */
+ctx->ct_nat_action = ofpact_get_NAT(a);
+}
 break;
 
 case OFPACT_DEBUG_RECIRC:
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index e2b983f..cc3265f 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8407,14 +8407,14 @@ AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P 
nxt_packet_in --detach --no
 
 AT_CHECK([ovs-appctl time/stop])
 
-AT_CHECK([ovs-appctl netdev-dummy/

Re: [ovs-dev] [PATCH 3/3] seq: Add support for initial delay before waking up.

2016-09-20 Thread Jarno Rajahalme
Thanks for the review. I sent a v2 with an implementation that just does the 
right thing (I hope) in the revalidator thread itself.

  Jarno

> On Sep 19, 2016, at 10:32 PM, Ben Pfaff  wrote:
> 
> On Fri, Sep 16, 2016 at 04:10:51PM -0700, Jarno Rajahalme wrote:
>> The execution time of 'ovs-ofctl add-flows' with a large number of
>> flows can be more than halved if revalidators are not running after
>> each flow mod separately.  This was first suspected when it was found
>> that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
>> same command without the '--bundle' option in a scenario where there
>> is a large set of flows being added and no datapath flows at all.  One
>> of the differences caused by the '--bundle' option is that the
>> revalidators are woken up only once, at the end of the whole set of
>> flow table changes, rather than after each flow table change
>> individually.
>> 
>> This patch limits the revalidation to run at most 200 times a second
>> by enforcing a minimum of 5ms delay for flow table change wakeup after
>> each complete revalidation round.  This is implemented by adding a new
>> seq_wait_delay() function, that takes a delay parameter, which, when
>> non-zero, causes the wakeup to be postponed by the given number of
>> milliseconds from the original seq_wait_delay() call time.  If nothing
>> happens in, say 6 milliseconds, and then a new flow table change is
>> signaled, the revalidator threads wake up immediately without any
>> further delay.  Values smaller than 5 were found to increase the
>> 'ovs-ofctl add-flows' execution time noticeably.
>> 
>> Since the revalidators are not running after each flow mod, the
>> overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
>> is reduced roughly by one core on a four core machine.
>> 
>> In testing the 'ovs-ofctl add-flows' execution time is not
>> significantly improved from this even if the revalidators are not
>> notified about the flow table changes at all.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> I need some help understanding how this works.  Before this commit, a
> thread that wanted to wake up if a seq changed would call seq_wait().
> If the seq changed, it got awakened immediately because seq_change()
> called seq_wake_waiters() which set a latch that woke the thread.  After
> this commit, that still works fine with delay==0.  However suppose that
> delay==5 instead.  As I see it, the same chain of events will occur
> except that seq_wake_waiters() might do nothing because the time hasn't
> come yet.  Suppose that nothing ever calls seq_change() again for that
> seq.  What wakes up the thread after 5 ms?
> 
> Once I understand that, here's a possible nit to look into.  A couple of
> places talk about how time_msec() has to be called without holding
> seq_mutex because time_msec() calls time_init(), which creates a seq,
> which takes seq_mutex.  Maybe this is good enough.  But it also appears
> to be easy to eliminate the dependency of seq_create() on seq_mutex.
> Here is why it takes seq_mutex:
> 
>ovs_mutex_lock(&seq_mutex);
>seq->value = seq_next++;
>hmap_init(&seq->waiters);
>ovs_mutex_unlock(&seq_mutex);
> 
> The first statement is in seq_mutex because seq_next requires it.  This
> could be fixed by make seq_next atomic and then using atomic_add()
> on it.  The second statement is in seq_mutex only to silence Clang; an
> OVS_NO_THREAD_SAFETY_ANALYSIS annotation would also silence it.
> 
> Thanks,
> 
> Ben.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/3] token-bucket: Add token_bucket_wait_at().

2016-09-20 Thread Jarno Rajahalme

> On Sep 19, 2016, at 5:19 PM, Ben Pfaff  wrote:
> 
> On Fri, Sep 16, 2016 at 04:10:50PM -0700, Jarno Rajahalme wrote:
>> Having the caller of token_bucket_wait() indicated in the log messages
>> makes debugging easier.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Ben Pfaff 

Applied to master and branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/3] ofproto: Do not signal revalidation for group mods twice.

2016-09-20 Thread Jarno Rajahalme

> On Sep 20, 2016, at 8:20 AM, Ben Pfaff  wrote:
> 
> On Fri, Sep 16, 2016 at 04:10:49PM -0700, Jarno Rajahalme wrote:
>> The new group mod implementation signals revalidation through
>> '->set_tables_version()', so the separate '->group_modify()' is no
>> longer needed.  The ofproto-provider API is changed to allow
>> 'group_modify' to be NULL.
>> 
>> Fixes: 5d08a275cd ("ofproto: Make groups versioned.")
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Ben Pfaff 

Thanks for the review, pushed to master and branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2] upcall: Don't start new revalidation round too soon after the last one.

2016-09-20 Thread Jarno Rajahalme
The execution time of 'ovs-ofctl add-flows' with a large number of
flows can be more than halved if revalidators are not running after
each flow mod separately.  This was first suspected when it was found
that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
same command without the '--bundle' option in a scenario where there
is a large set of flows being added and no datapath flows at all.  One
of the differences caused by the '--bundle' option is that the
revalidators are woken up only once, at the end of the whole set of
flow table changes, rather than after each flow table change
individually.

This patch limits the revalidation to run at most 200 times a second
by enforcing a minimum of 5ms time gap between the start times of
revalidation rounds.  If nothing happens in, say 6 milliseconds, and
then a new flow table change is signaled, the revalidator threads wake
up immediately without any further delay.  Values smaller than 5 were
found to increase the 'ovs-ofctl add-flows' execution time noticeably.

Since the revalidators are not running after each flow mod, the
overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
is reduced roughly by one core on a four core machine.

In testing the 'ovs-ofctl add-flows' execution time is not
significantly improved from this even if the revalidators are not
notified about the flow table changes at all.

Signed-off-by: Jarno Rajahalme 
---
 ofproto/ofproto-dpif-upcall.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index d73dc12..f76182e 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -939,6 +939,21 @@ udpif_revalidator(void *arg)
 latch_wait(&udpif->exit_latch);
 latch_wait(&udpif->pause_latch);
 poll_block();
+
+if (!latch_is_set(&udpif->pause_latch) &&
+!latch_is_set(&udpif->exit_latch)) {
+long long int now = time_msec();
+/* Block again if we are woken up within 5ms of the last start
+ * time. */
+start_time += 5;
+
+if (now < start_time) {
+poll_timer_wait_until(start_time);
+latch_wait(&udpif->exit_latch);
+latch_wait(&udpif->pause_latch);
+poll_block();
+}
+}
 }
 }
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix memory leak in execute_controller_action.

2016-09-20 Thread Jarno Rajahalme

> On Sep 20, 2016, at 8:35 AM, Ryan Moats  wrote:
> 
> commit df70a7731 ("ofproto-dpif-xlate: Allow translating
> without side-effects.") created a memory leak by removing the
> dp_packet_delete statement in execute_controller_action that
> freed the earlier cloned packet.  This commit restores this
> statement to the end of the method.
> 
> This issue is only seen in the master branch.
> 

Thanks for the fix. I missed the fact that while the packet data is 
unconditionally taken away from the ‘packet’, the struct dp_packet itself still 
needs to be freed.

  Jarno

> Fixes: df70a7731 ("ofproto-dpif-xlate: Allow translating without 
> side-effects.")
> Signed-off-by: Ryan Moats 
> ---
> ofproto/ofproto-dpif-xlate.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index f17cb79..71ffca0 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3648,6 +3648,8 @@ execute_controller_action(struct xlate_ctx *ctx, int 
> len,
> entry->controller.ofproto = ctx->xbridge->ofproto;
> entry->controller.am = am;
> }
> +
> +dp_packet_delete(packet);
> }
> 
> static void
> -- 
> 2.7.0
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 4/5] treewide: Use WC_MASK_FIELD() and FLOW_WC_GET_AND_MASK_WC().

2016-09-19 Thread Jarno Rajahalme
Unfortunately this and the last patch of the series does not apply to master, 
so you’d need to rebase these.

  Jarno

> On Aug 30, 2016, at 6:47 PM, Daniele Di Proietto  
> wrote:
> 
> I think it makes the code more readable. No functional change.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
> lib/bfd.c| 36 
> lib/cfm.c|  9 +++
> lib/flow.c   | 42 
> lib/flow.h   | 47 +---
> lib/odp-util.c   |  2 +-
> lib/tnl-neigh-cache.c|  6 ++---
> lib/tnl-ports.c  |  2 +-
> ofproto/netflow.c|  6 ++---
> ofproto/ofproto-dpif-xlate.c | 57 +++-
> ofproto/tunnel.c |  2 +-
> 10 files changed, 92 insertions(+), 117 deletions(-)
> 
> diff --git a/lib/bfd.c b/lib/bfd.c
> index 87f3322..203a56e 100644
> --- a/lib/bfd.c
> +++ b/lib/bfd.c
> @@ -648,30 +648,24 @@ bfd_should_process_flow(const struct bfd *bfd_, const 
> struct flow *flow,
> {
> struct bfd *bfd = CONST_CAST(struct bfd *, bfd_);
> 
> -if (!eth_addr_is_zero(bfd->rmt_eth_dst)) {
> -memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst);
> -
> -if (!eth_addr_equals(bfd->rmt_eth_dst, flow->dl_dst)) {
> -return false;
> -}
> +if (!eth_addr_is_zero(bfd->rmt_eth_dst)
> +&& !eth_addr_equals(bfd->rmt_eth_dst,
> +FLOW_WC_GET_AND_MASK_WC(flow, wc, dl_dst))) {
> +return false;
> }
> 
> -if (flow->dl_type == htons(ETH_TYPE_IP)) {
> -memset(&wc->masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
> -if (flow->nw_proto == IPPROTO_UDP && !(flow->nw_frag & 
> FLOW_NW_FRAG_LATER)) {
> -memset(&wc->masks.tp_dst, 0xff, sizeof wc->masks.tp_dst);
> -if (flow->tp_dst == htons(BFD_DEST_PORT)) {
> -bool check_tnl_key;
> -
> -atomic_read_relaxed(&bfd->check_tnl_key, &check_tnl_key);
> -if (check_tnl_key) {
> -memset(&wc->masks.tunnel.tun_id, 0xff,
> -   sizeof wc->masks.tunnel.tun_id);
> -return flow->tunnel.tun_id == htonll(0);
> -}
> -return true;
> -}
> +if (flow->dl_type == htons(ETH_TYPE_IP)
> +&& FLOW_WC_GET_AND_MASK_WC(flow, wc, nw_proto) == IPPROTO_UDP
> +&& !(flow->nw_frag & FLOW_NW_FRAG_LATER)
> +&& FLOW_WC_GET_AND_MASK_WC(flow, wc, tp_dst) == 
> htons(BFD_DEST_PORT)) {
> +bool check_tnl_key;
> +
> +atomic_read_relaxed(&bfd->check_tnl_key, &check_tnl_key);
> +if (check_tnl_key) {
> +return FLOW_WC_GET_AND_MASK_WC(flow, wc, tunnel.tun_id)
> +   == htonll(0);
> }
> +return true;
> }
> return false;
> }
> diff --git a/lib/cfm.c b/lib/cfm.c
> index 7bc22e3..3c55333 100644
> --- a/lib/cfm.c
> +++ b/lib/cfm.c
> @@ -731,16 +731,17 @@ cfm_should_process_flow(const struct cfm *cfm_, const 
> struct flow *flow,
> return false;
> }
> 
> -memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst);
> -if (OVS_UNLIKELY(!eth_addr_equals(flow->dl_dst, cfm_ccm_addr(cfm {
> +if (OVS_UNLIKELY(
> +!eth_addr_equals(FLOW_WC_GET_AND_MASK_WC(flow, wc, dl_dst),
> + cfm_ccm_addr(cfm {
> return false;
> }
> 
> atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> 
> if (check_tnl_key) {
> -memset(&wc->masks.tunnel.tun_id, 0xff, sizeof 
> wc->masks.tunnel.tun_id);
> -return flow->tunnel.tun_id == htonll(0);
> +return FLOW_WC_GET_AND_MASK_WC(flow, wc, tunnel.tun_id)
> +   == htonll(0);
> }
> return true;
> }
> diff --git a/lib/flow.c b/lib/flow.c
> index ba4f8c7..8f052ba 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -855,8 +855,8 @@ void
> flow_unwildcard_tp_ports(const struct flow *flow, struct flow_wildcards *wc)
> {
> if (flow->nw_proto != IPPROTO_ICMP) {
> -memset(&wc->masks.tp_src, 0xff, sizeof wc->masks.tp_src);
> -memset(&wc->masks.tp_dst, 0xff, sizeof wc->masks.tp_dst);
> +WC_MASK_FIELD(wc, tp_src);
> +WC_MASK_FIELD(wc, tp_dst);
> } else {
> wc->masks.tp_src = htons(0xff);
> wc->masks.tp_dst = htons(0xff);
> @@ -1478,8 +1478,8 @@ flow_wildcards_clear_non_packet_fields(struct 
> flow_wildcards *wc)
> /* Update this function whenever struct flow changes. */
> BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
> 
> -memset(&wc->masks.metadata, 0, sizeof wc->masks.metadata);
> -memset(&wc->masks.regs, 0, sizeof wc->masks.regs);
> +WC_UNMASK_FIELD(wc, metadata);
> +WC_UNMASK_FIELD(wc, regs);
> wc->masks.actset_output = 0;
> wc->masks.conj_id = 0;
> }
> @@ -1834,21 +1834,21 @@ flow_mask_hash_fields(const struct flow *flow, struct 
> flow_wildcards

Re: [ovs-dev] [PATCH v2 3/5] tnl-neigh-cache: Unwildcard flow members before inspecting them.

2016-09-19 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

IMO this and the earlier (2/5) one are bug fixes that should be cherry-picked 
to branches 2.6 and 2.5, if applicable.

  Jarno

> On Aug 30, 2016, at 6:47 PM, Daniele Di Proietto  
> wrote:
> 
> tnl_neigh_snoop() is part of the translation.  During translation we
> have to unwildcard all the fields we examine to make a decision.
> 
> tnl_arp_snoop() and tnl_nd_snoop() failed to unwildcard fileds in case
> of failure.  The solution is to do unwildcarding before the field is
> inspected.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
> lib/flow.h|  7 +++
> lib/tnl-neigh-cache.c | 22 --
> 2 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/flow.h b/lib/flow.h
> index 5b83695..ea24e28 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -854,6 +854,13 @@ pkt_metadata_from_flow(struct pkt_metadata *md, const 
> struct flow *flow)
> md->ct_label = flow->ct_label;
> }
> 
> +/* Often, during translation we need to read a value from a flow('FLOW') and
> + * unwildcard the corresponding bits in the wildcards('WC').  This macro 
> makes
> + * it easier to do that. */
> +
> +#define FLOW_WC_GET_AND_MASK_WC(FLOW, WC, FIELD) \
> +(((WC) ? WC_MASK_FIELD(WC, FIELD) : NULL), ((FLOW)->FIELD))
> +
> static inline bool is_vlan(const struct flow *flow,
>struct flow_wildcards *wc)
> {
> diff --git a/lib/tnl-neigh-cache.c b/lib/tnl-neigh-cache.c
> index f7d29f6..499efff 100644
> --- a/lib/tnl-neigh-cache.c
> +++ b/lib/tnl-neigh-cache.c
> @@ -115,7 +115,7 @@ tnl_neigh_delete(struct tnl_neigh_entry *neigh)
> 
> static void
> tnl_neigh_set__(const char name[IFNAMSIZ], const struct in6_addr *dst,
> -  const struct eth_addr mac)
> +const struct eth_addr mac)
> {
> ovs_mutex_lock(&mutex);
> struct tnl_neigh_entry *neigh = tnl_neigh_lookup__(name, dst);
> @@ -151,26 +151,21 @@ static int
> tnl_arp_snoop(const struct flow *flow, struct flow_wildcards *wc,
>   const char name[IFNAMSIZ])
> {
> -if (flow->dl_type != htons(ETH_TYPE_ARP) ||
> -flow->nw_proto != ARP_OP_REPLY ||
> -eth_addr_is_zero(flow->arp_sha)) {
> +if (flow->dl_type != htons(ETH_TYPE_ARP)
> +|| FLOW_WC_GET_AND_MASK_WC(flow, wc, nw_proto) != ARP_OP_REPLY
> +|| eth_addr_is_zero(FLOW_WC_GET_AND_MASK_WC(flow, wc, arp_sha))) {
> return EINVAL;
> }
> 
> -/* Exact Match on all ARP flows. */
> -memset(&wc->masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
> -memset(&wc->masks.nw_src, 0xff, sizeof wc->masks.nw_src);
> -memset(&wc->masks.arp_sha, 0xff, sizeof wc->masks.arp_sha);
> -
> -tnl_arp_set(name, flow->nw_src, flow->arp_sha);
> +tnl_arp_set(name, FLOW_WC_GET_AND_MASK_WC(flow, wc, nw_src), 
> flow->arp_sha);
> return 0;
> }
> 
> static int
> tnl_nd_snoop(const struct flow *flow, struct flow_wildcards *wc,
> -  const char name[IFNAMSIZ])
> + const char name[IFNAMSIZ])
> {
> -if (!is_nd(flow, NULL) || flow->tp_src != htons(ND_NEIGHBOR_ADVERT)) {
> +if (!is_nd(flow, wc) || flow->tp_src != htons(ND_NEIGHBOR_ADVERT)) {
> return EINVAL;
> }
> /* - RFC4861 says Neighbor Advertisements sent in response to unicast 
> Neighbor
> @@ -179,14 +174,13 @@ tnl_nd_snoop(const struct flow *flow, struct 
> flow_wildcards *wc,
>  *   TLL address and other Advertisements not including it can be ignored.
>  * - OVS flow extract can set this field to zero in case of packet 
> parsing errors.
>  *   For details refer miniflow_extract()*/
> -if (eth_addr_is_zero(flow->arp_tha)) {
> +if (eth_addr_is_zero(FLOW_WC_GET_AND_MASK_WC(flow, wc, arp_tha))) {
> return EINVAL;
> }
> 
> memset(&wc->masks.ipv6_src, 0xff, sizeof wc->masks.ipv6_src);
> memset(&wc->masks.ipv6_dst, 0xff, sizeof wc->masks.ipv6_dst);
> memset(&wc->masks.nd_target, 0xff, sizeof wc->masks.nd_target);
> -memset(&wc->masks.arp_tha, 0xff, sizeof wc->masks.arp_tha);
> 
> tnl_neigh_set__(name, &flow->nd_target, flow->arp_tha);
> return 0;
> -- 
> 2.9.3
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 2/5] ofproto-dpif-xlate: Adjust generated mask for fragments.

2016-09-19 Thread Jarno Rajahalme
Acked-by: Jarno Rajahalme 

  Jarno

> On Aug 30, 2016, at 6:47 PM, Daniele Di Proietto  
> wrote:
> 
> It's possible to install an OpenFlow flow that matches on udp source and
> destination ports without matching on fragments.  If the subtable where
> such flow stays is visited during translation of a later fragment, the
> generated mask will have incorrect prerequisited for the datapath and it
> would be revalidated away at the first chance.
> 
> This commit fixes it by adjusting the mask for later fragments after
> translation.
> 
> Other prerequisites of the mask are also prerequisites in OpenFlow, but
> not the ip fragment bit, that's why we need a special case here.
> 
> For completeness, this commits also fixes a related problem in bfd,
> where we check the udp destination port without checking if the frame is
> an ip fragment.  It's not really necessary to address this separately,
> given the adjustment that we perform.
> 
> VMware-BZ: #1651589
> Signed-off-by: Daniele Di Proietto 
> ---
> lib/bfd.c|  2 +-
> ofproto/ofproto-dpif-xlate.c | 11 +++
> tests/ofproto-dpif.at| 35 +++
> 3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bfd.c b/lib/bfd.c
> index fcb6f64..87f3322 100644
> --- a/lib/bfd.c
> +++ b/lib/bfd.c
> @@ -658,7 +658,7 @@ bfd_should_process_flow(const struct bfd *bfd_, const 
> struct flow *flow,
> 
> if (flow->dl_type == htons(ETH_TYPE_IP)) {
> memset(&wc->masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
> -if (flow->nw_proto == IPPROTO_UDP) {
> +if (flow->nw_proto == IPPROTO_UDP && !(flow->nw_frag & 
> FLOW_NW_FRAG_LATER)) {
> memset(&wc->masks.tp_dst, 0xff, sizeof wc->masks.tp_dst);
> if (flow->tp_dst == htons(BFD_DEST_PORT)) {
> bool check_tnl_key;
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 0118d01..0403c98 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -5329,6 +5329,17 @@ xlate_wc_finish(struct xlate_ctx *ctx)
> if (ctx->wc->masks.vlan_tci) {
> ctx->wc->masks.vlan_tci |= htons(VLAN_CFI);
> }
> +
> +/* The classifier might return masks that match on tp_src and tp_dst even
> + * for later fragments.  This happens because there might be flows that
> + * match on tp_src or tp_dst without matching on the frag bits, because
> + * it is not a prerequisite for OpenFlow.  Since it is a prerequisite for
> + * datapath flows and since tp_src and tp_dst are always going to be 0,
> + * wildcard the fields here. */
> +if (ctx->xin->flow.nw_frag & FLOW_NW_FRAG_LATER) {
> +ctx->wc->masks.tp_src = 0;
> +ctx->wc->masks.tp_dst = 0;
> +}
> }
> 
> /* Translates the flow, actions, or rule in 'xin' into datapath actions in
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index 8e5fde6..e047c1c 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -8903,3 +8903,38 @@ AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface 
> int1 mtu=1600])
> 
> OVS_VSWITCHD_STOP
> AT_CLEANUP
> +
> +AT_SETUP([ofproto - fragment prerequisites])
> +OVS_VSWITCHD_START
> +
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
> +
> +add_of_ports br0 1
> +
> +AT_DATA([flows.txt], [dnl
> +priority=10,in_port=1,udp,tp_src=67,tp_dst=68,action=drop
> +priority=1,in_port=1,udp,action=drop
> +])
> +
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=1])
> +
> +ovs-appctl time/stop
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 
> 'recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=later)'])
> +ovs-appctl time/warp 5000
> +
> +AT_CHECK([strip_ufid < ovs-vswitchd.log | filter_flow_install | strip_used], 
> [0], [dnl
> +recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=later), 
> actions:drop
> +])
> +
> +dnl Change the flow table.  This will trigger revalidation of all the flows.
> +AT_CHECK([ovs-ofctl add-flow br0 priority=5,in_port=1,action=drop])
> +AT_CHECK([ovs-appctl revalidator/wait], [0])
> +
> +dnl We don't want revalidators to delete any flow.  If the flow has been
> +dnl deleted it means that there's some inconsistency with the revalidation.
> +AT_CHECK([grep flow_del ovs-vswitchd.log], [1])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> -- 
> 2.9.3
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 3/3] seq: Add support for initial delay before waking up.

2016-09-16 Thread Jarno Rajahalme
The execution time of 'ovs-ofctl add-flows' with a large number of
flows can be more than halved if revalidators are not running after
each flow mod separately.  This was first suspected when it was found
that 'ovs-ofctl --bundle add-flows' is about 10 times faster than the
same command without the '--bundle' option in a scenario where there
is a large set of flows being added and no datapath flows at all.  One
of the differences caused by the '--bundle' option is that the
revalidators are woken up only once, at the end of the whole set of
flow table changes, rather than after each flow table change
individually.

This patch limits the revalidation to run at most 200 times a second
by enforcing a minimum of 5ms delay for flow table change wakeup after
each complete revalidation round.  This is implemented by adding a new
seq_wait_delay() function, that takes a delay parameter, which, when
non-zero, causes the wakeup to be postponed by the given number of
milliseconds from the original seq_wait_delay() call time.  If nothing
happens in, say 6 milliseconds, and then a new flow table change is
signaled, the revalidator threads wake up immediately without any
further delay.  Values smaller than 5 were found to increase the
'ovs-ofctl add-flows' execution time noticeably.

Since the revalidators are not running after each flow mod, the
overall OVS CPU utilization during the 'ovs-ofctl add-flows' run time
is reduced roughly by one core on a four core machine.

In testing the 'ovs-ofctl add-flows' execution time is not
significantly improved from this even if the revalidators are not
notified about the flow table changes at all.

Signed-off-by: Jarno Rajahalme 
---
 lib/seq.c | 50 ++-
 lib/seq.h |  9 ++--
 ofproto/ofproto-dpif-upcall.c |  2 +-
 tests/ofproto-dpif.at |  3 +++
 4 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/lib/seq.c b/lib/seq.c
index 6e2f596..2b64dd3 100644
--- a/lib/seq.c
+++ b/lib/seq.c
@@ -22,9 +22,10 @@
 
 #include "coverage.h"
 #include "hash.h"
-#include "openvswitch/hmap.h"
 #include "latch.h"
+#include "openvswitch/hmap.h"
 #include "openvswitch/list.h"
+#include "timeval.h"
 #include "ovs-thread.h"
 #include "poll-loop.h"
 
@@ -45,6 +46,8 @@ struct seq_waiter {
 struct seq_thread *thread OVS_GUARDED;  /* Thread preparing to wait. */
 struct ovs_list list_node OVS_GUARDED;  /* In 'thread->waiters'. */
 
+long long int delay_until_time OVS_GUARDED;
+
 uint64_t value OVS_GUARDED; /* seq->value we're waiting to change. */
 };
 
@@ -66,7 +69,8 @@ static struct seq_thread *seq_thread_get(void) 
OVS_REQUIRES(seq_mutex);
 static void seq_thread_exit(void *thread_) OVS_EXCLUDED(seq_mutex);
 static void seq_thread_woke(struct seq_thread *) OVS_REQUIRES(seq_mutex);
 static void seq_waiter_destroy(struct seq_waiter *) OVS_REQUIRES(seq_mutex);
-static void seq_wake_waiters(struct seq *) OVS_REQUIRES(seq_mutex);
+static void seq_wake_waiters(struct seq *, long long int now)
+OVS_REQUIRES(seq_mutex);
 
 /* Creates and returns a new 'seq' object. */
 struct seq * OVS_EXCLUDED(seq_mutex)
@@ -94,7 +98,7 @@ seq_destroy(struct seq *seq)
  OVS_EXCLUDED(seq_mutex)
 {
 ovs_mutex_lock(&seq_mutex);
-seq_wake_waiters(seq);
+seq_wake_waiters(seq, LLONG_MAX);
 hmap_destroy(&seq->waiters);
 free(seq);
 ovs_mutex_unlock(&seq_mutex);
@@ -123,13 +127,13 @@ seq_unlock(void)
 /* Increments 'seq''s sequence number, waking up any threads that are waiting
  * on 'seq'. */
 void
-seq_change_protected(struct seq *seq)
+seq_change_protected_now(struct seq *seq, long long int now)
 OVS_REQUIRES(seq_mutex)
 {
 COVERAGE_INC(seq_change);
 
 seq->value = seq_next++;
-seq_wake_waiters(seq);
+seq_wake_waiters(seq, now);
 }
 
 /* Increments 'seq''s sequence number, waking up any threads that are waiting
@@ -138,8 +142,12 @@ void
 seq_change(struct seq *seq)
 OVS_EXCLUDED(seq_mutex)
 {
+/* time initialization uses a seq, so we must take 'now' before
+ * locking 'seq_mutex'. */
+long long int now = time_msec();
+
 ovs_mutex_lock(&seq_mutex);
-seq_change_protected(seq);
+seq_change_protected_now(seq, now);
 ovs_mutex_unlock(&seq_mutex);
 }
 
@@ -173,8 +181,10 @@ seq_read(const struct seq *seq)
 return value;
 }
 
+/* Find or create a waiter for the current thread. */
 static void
-seq_wait__(struct seq *seq, uint64_t value, const char *where)
+seq_wait__(struct seq *seq, uint64_t value, long long int delay_until_time,
+   const char *where)
 OVS_REQUIRES(seq_mutex)
 {
 unsigned int id = ovsthread_id_self();
@@ 

[ovs-dev] [PATCH 1/3] ofproto: Do not signal revalidation for group mods twice.

2016-09-16 Thread Jarno Rajahalme
The new group mod implementation signals revalidation through
'->set_tables_version()', so the separate '->group_modify()' is no
longer needed.  The ofproto-provider API is changed to allow
'group_modify' to be NULL.

Fixes: 5d08a275cd ("ofproto: Make groups versioned.")
Signed-off-by: Jarno Rajahalme 
---
 ofproto/ofproto-dpif.c | 10 +-
 ofproto/ofproto.c  |  3 ++-
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 0cffca1..63a84a8 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4549,14 +4549,6 @@ group_destruct(struct ofgroup *group_)
 ovs_mutex_destroy(&group->stats_mutex);
 }
 
-static void
-group_modify(struct ofgroup *group_)
-{
-struct ofproto_dpif *ofproto = ofproto_dpif_cast(group_->ofproto);
-
-ofproto->backer->need_revalidate = REV_FLOW_TABLE;
-}
-
 static enum ofperr
 group_get_stats(const struct ofgroup *group_, struct ofputil_group_stats *ogs)
 {
@@ -5899,7 +5891,7 @@ const struct ofproto_class ofproto_dpif_class = {
 group_construct,/* group_construct */
 group_destruct, /* group_destruct */
 group_dealloc,  /* group_dealloc */
-group_modify,   /* group_modify */
+NULL,   /* group_modify */
 group_get_stats,/* group_get_stats */
 get_datapath_version,   /* get_datapath_version */
 };
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index f89f0ef..e87def7 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -7091,7 +7091,8 @@ ofproto_group_mod_finish(struct ofproto *ofproto,
 struct ofgroup *new_group = ogm->new_group;
 struct ofgroup *old_group;
 
-if (new_group && group_collection_n(&ogm->old_groups)) {
+if (new_group && group_collection_n(&ogm->old_groups) &&
+ofproto->ofproto_class->group_modify) {
 /* Modify a group. */
 ovs_assert(group_collection_n(&ogm->old_groups) == 1);
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/3] token-bucket: Add token_bucket_wait_at().

2016-09-16 Thread Jarno Rajahalme
Having the caller of token_bucket_wait() indicated in the log messages
makes debugging easier.

Signed-off-by: Jarno Rajahalme 
---
 include/openvswitch/token-bucket.h | 5 -
 lib/token-bucket.c | 7 ---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/openvswitch/token-bucket.h 
b/include/openvswitch/token-bucket.h
index bbcde84..6bb6040 100644
--- a/include/openvswitch/token-bucket.h
+++ b/include/openvswitch/token-bucket.h
@@ -37,6 +37,9 @@ void token_bucket_init(struct token_bucket *,
 void token_bucket_set(struct token_bucket *,
unsigned int rate, unsigned int burst);
 bool token_bucket_withdraw(struct token_bucket *, unsigned int n);
-void token_bucket_wait(struct token_bucket *, unsigned int n);
+void token_bucket_wait_at(struct token_bucket *, unsigned int n,
+  const char *where);
+#define token_bucket_wait(bucket, n)\
+token_bucket_wait_at(bucket, n, OVS_SOURCE_LOCATOR)
 
 #endif /* token-bucket.h */
diff --git a/lib/token-bucket.c b/lib/token-bucket.c
index be90453..29f5165 100644
--- a/lib/token-bucket.c
+++ b/lib/token-bucket.c
@@ -85,12 +85,13 @@ token_bucket_withdraw(struct token_bucket *tb, unsigned int 
n)
 /* Causes the poll loop to wake up when at least 'n' tokens will be available
  * for withdrawal from 'tb'. */
 void
-token_bucket_wait(struct token_bucket *tb, unsigned int n)
+token_bucket_wait_at(struct token_bucket *tb, unsigned int n,
+ const char *where)
 {
 if (tb->tokens >= n) {
-poll_immediate_wake();
+poll_immediate_wake_at(where);
 } else {
 unsigned int need = n - tb->tokens;
-poll_timer_wait_until(tb->last_fill + need / tb->rate + 1);
+poll_timer_wait_until_at(tb->last_fill + need / tb->rate + 1, where);
 }
 }
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 0/3] Fixes for OVS 2.6

2016-09-16 Thread Jarno Rajahalme
These are fixes that would still be nice to get to OVS 2.6 release.
The second patch is a nice to have, but is not durectly used by the
other patches in the series.

Jarno Rajahalme (3):
  ofproto: Do not signal revalidation for group mods twice.
  token-bucket: Add token_bucket_wait_at().
  seq: Add support for initial delay before waking up.

 include/openvswitch/token-bucket.h |  5 +++-
 lib/seq.c  | 50 ++
 lib/seq.h  |  9 +--
 lib/token-bucket.c |  7 +++---
 ofproto/ofproto-dpif-upcall.c  |  2 +-
 ofproto/ofproto-dpif.c | 10 +---
 ofproto/ofproto.c  |  3 ++-
 tests/ofproto-dpif.at  |  3 +++
 8 files changed, 57 insertions(+), 32 deletions(-)

-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] xlate: Use dp_hash for select groups.

2016-09-16 Thread Jarno Rajahalme
No, thank you for remembering to prompt for the documentation, I would not have 
noticed the bug otherwise!

v3 pushed to master,

  Jarno

> On Sep 15, 2016, at 7:26 PM, Ben Pfaff  wrote:
> 
> OK.  Thanks for being so careful!
> 
> I'll look at v3.
> 
> On Thu, Sep 15, 2016 at 06:43:39PM -0700, Jarno Rajahalme wrote:
>> Thanks for the fix!
>> 
>> While I was working with tightening the parsing, I found that I had earlier 
>> introduced a bug that crashes ovs-ofctl when a parsing error is found after 
>> parsing ‘fields’, essentially a double free. I sent a v3 fixing this, 
>> documenting the parsing tightening and then rebasing this patch with this 
>> fix, additional documentation and with the NEWS piece moved to the post-2.6b 
>> section.
>> 
>>  Jarno
>> 
>>> On Sep 14, 2016, at 8:53 PM, Ben Pfaff  wrote:
>>> 
>>> On Wed, Sep 14, 2016 at 08:51:42PM -0700, Ben Pfaff wrote:
>>>> On Mon, Sep 12, 2016 at 04:16:08PM -0700, Jarno Rajahalme wrote:
>>>>> Add a new select group selection method "dp_hash", which uses minimal
>>>>> number of bits from the datapath calculated packet hash to inform the
>>>>> select group bucket selection.  This makes the datapath flows more
>>>>> generic resulting in less upcalls to userspace, but adds recirculation
>>>>> prior to group selection.
>>>>> 
>>>>> Signed-off-by: Jarno Rajahalme 
>>>>> ---
>>>>> v2: Rebase and documentation.
>>>> 
>>>> Thanks for adding the documentation!  It describes the syntax, which is
>>>> useful.  To make it even more helpful, I would suggest adding some
>>>> advice to the user to explain when one might best choose one or the
>>>> other.
>>>> 
>>>> I think that the dp_hash method ignores fields specified by the user if
>>>> any are given, so the documentation might mention that for dp_hash the
>>>> "fields" option should be omitted.
>>>> 
>>>> Thanks!
>>>> 
>>>> Acked-by: Ben Pfaff 
>>> 
>>> Oh, I forgot to say that I get a compiler warning:
>>> 
>>>   ../ofproto/ofproto-dpif-xlate.c: In function 'xlate_dp_hash_select_group':
>>>   ../ofproto/ofproto-dpif-xlate.c:3410:32: error: variable 'buckets' set 
>>> but not used [-Werror=unused-but-set-variable]
>>>const struct ovs_list *buckets;
>>> 
>>> Fixed by:
>>> 
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index c83132c..a74daa7 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -3407,10 +3407,8 @@ xlate_dp_hash_select_group(struct xlate_ctx *ctx, 
>>> struct group_dpif *group)
>>> 
>>>ctx_trigger_recirculate_with_hash(ctx, param >> 32, (uint32_t)param);
>>>} else {
>>> -const struct ovs_list *buckets;
>>>uint32_t n_buckets;
>>> -
>>> -buckets = group_dpif_get_buckets(group, &n_buckets);
>>> +group_dpif_get_buckets(group, &n_buckets);
>>>if (n_buckets) {
>>>/* Minimal mask to cover the number of buckets. */
>>>uint32_t mask = (1 << log_2_ceil(n_buckets)) - 1;
>> 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 2/3] ofp-parse: Harden checking with group selection_method.

2016-09-16 Thread Jarno Rajahalme
Thanks! Pushed to master and branch-2.6.

  Jarno

> On Sep 15, 2016, at 7:28 PM, Ben Pfaff  wrote:
> 
> On Thu, Sep 15, 2016 at 06:40:20PM -0700, Jarno Rajahalme wrote:
>> Only allow fields when "selection_method=hash".  Only allow
>> selection_method_param when a non-nil selection_method is given.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 1/3] ofp-parse: Remove double uninit of group mod if parsing fails.

2016-09-16 Thread Jarno Rajahalme
Thanks for the review, pushed to master and branch-2.6.

  Jarno

> On Sep 15, 2016, at 7:27 PM, Ben Pfaff  wrote:
> 
> On Thu, Sep 15, 2016 at 06:40:19PM -0700, Jarno Rajahalme wrote:
>> Double ofputil_uninit_group_mod() used to be harmless, but leads to
>> double free after commit e8dba7197, which will crash if any error in
>> group parsing happens.
>> 
>> Add a test to prevent this regression from happening again.
>> 
>> Fixes: e8dba7197 ("meta-flow: Compact struct field_array.")
>> Signed-off-by: Jarno Rajahalme 
> 
> Thanks!
> 
> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] xlate: Use dp_hash for select groups.

2016-09-15 Thread Jarno Rajahalme
Thanks for the fix!

While I was working with tightening the parsing, I found that I had earlier 
introduced a bug that crashes ovs-ofctl when a parsing error is found after 
parsing ‘fields’, essentially a double free. I sent a v3 fixing this, 
documenting the parsing tightening and then rebasing this patch with this fix, 
additional documentation and with the NEWS piece moved to the post-2.6b section.

  Jarno

> On Sep 14, 2016, at 8:53 PM, Ben Pfaff  wrote:
> 
> On Wed, Sep 14, 2016 at 08:51:42PM -0700, Ben Pfaff wrote:
>> On Mon, Sep 12, 2016 at 04:16:08PM -0700, Jarno Rajahalme wrote:
>>> Add a new select group selection method "dp_hash", which uses minimal
>>> number of bits from the datapath calculated packet hash to inform the
>>> select group bucket selection.  This makes the datapath flows more
>>> generic resulting in less upcalls to userspace, but adds recirculation
>>> prior to group selection.
>>> 
>>> Signed-off-by: Jarno Rajahalme 
>>> ---
>>> v2: Rebase and documentation.
>> 
>> Thanks for adding the documentation!  It describes the syntax, which is
>> useful.  To make it even more helpful, I would suggest adding some
>> advice to the user to explain when one might best choose one or the
>> other.
>> 
>> I think that the dp_hash method ignores fields specified by the user if
>> any are given, so the documentation might mention that for dp_hash the
>> "fields" option should be omitted.
>> 
>> Thanks!
>> 
>> Acked-by: Ben Pfaff 
> 
> Oh, I forgot to say that I get a compiler warning:
> 
>../ofproto/ofproto-dpif-xlate.c: In function 'xlate_dp_hash_select_group':
>../ofproto/ofproto-dpif-xlate.c:3410:32: error: variable 'buckets' set but 
> not used [-Werror=unused-but-set-variable]
> const struct ovs_list *buckets;
> 
> Fixed by:
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index c83132c..a74daa7 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3407,10 +3407,8 @@ xlate_dp_hash_select_group(struct xlate_ctx *ctx, 
> struct group_dpif *group)
> 
> ctx_trigger_recirculate_with_hash(ctx, param >> 32, (uint32_t)param);
> } else {
> -const struct ovs_list *buckets;
> uint32_t n_buckets;
> -
> -buckets = group_dpif_get_buckets(group, &n_buckets);
> +group_dpif_get_buckets(group, &n_buckets);
> if (n_buckets) {
> /* Minimal mask to cover the number of buckets. */
> uint32_t mask = (1 << log_2_ceil(n_buckets)) - 1;

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 3/3] xlate: Use dp_hash for select groups.

2016-09-15 Thread Jarno Rajahalme
Add a new select group selection method "dp_hash", which uses minimal
number of bits from the datapath calculated packet hash to inform the
select group bucket selection.  This makes the datapath flows more
generic resulting in less upcalls to userspace, but adds recirculation
prior to group selection.

Signed-off-by: Jarno Rajahalme 
---
v3: Further improved documentation, and moved the NEWS piece to post-2.6.

 NEWS |  8 +
 lib/ofp-util.c   | 16 --
 ofproto/ofproto-dpif-xlate.c | 70 ++--
 ofproto/ofproto-dpif.c   | 11 ---
 ofproto/ofproto-dpif.h   |  4 +--
 tests/ofproto-dpif.at| 38 
 tests/ofproto.at |  3 ++
 utilities/ovs-ofctl.8.in | 38 ++--
 8 files changed, 166 insertions(+), 22 deletions(-)

diff --git a/NEWS b/NEWS
index 343f7f1..98d9702 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,14 @@ Post-v2.6.0
  not accurate.
- OpenFlow:
  * OFPT_PACKET_OUT messages are now supported in bundles.
+ * A new "selection_method=dp_hash" type for OpenFlow select group
+   bucket selection that uses the datapath computed 5-tuple hash
+   without making datapath flows match the 5-tuple fields, which
+   is useful for more efficient load balancing, for example.  This
+   uses the Netronome extension to OpenFlow 1.5+ that allows
+   control over the OpenFlow select groups selection method.  See
+   "selection_method" and related options in ovs-ofctl(8) for
+   details.
- ovs-ofctl:
  * 'bundle' command now supports packet-out messages.
  * New syntax for 'ovs-ofctl packet-out' command, which uses the
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index c84ed17..f83e4a3 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -8904,28 +8904,24 @@ parse_group_prop_ntr_selection_method(struct ofpbuf 
*payload,
 return OFPERR_OFPBPC_BAD_VALUE;
 }
 
-if (strcmp("hash", prop->selection_method)) {
+if (strcmp("hash", prop->selection_method)
+&& strcmp("dp_hash", prop->selection_method)) {
 OFPPROP_LOG(&bad_ofmsg_rl, false,
 "ntr selection method '%s' is not supported",
 prop->selection_method);
 return OFPERR_OFPBPC_BAD_VALUE;
 }
+/* 'method_len' is now non-zero. */
 
 strcpy(gp->selection_method, prop->selection_method);
 gp->selection_method_param = ntohll(prop->selection_method_param);
 
-if (!method_len && gp->selection_method_param) {
-OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method parameter is "
-"non-zero but selection method is empty");
-return OFPERR_OFPBPC_BAD_VALUE;
-}
-
 ofpbuf_pull(payload, sizeof *prop);
 
 fields_len = ntohs(prop->length) - sizeof *prop;
-if (!method_len && fields_len) {
-OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method parameter is "
-"zero but fields are provided");
+if (fields_len && strcmp("hash", gp->selection_method)) {
+OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method %s "
+"does not support fields", gp->selection_method);
 return OFPERR_OFPBPC_BAD_VALUE;
 }
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 6854da3..9dd7206 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -346,6 +346,11 @@ struct xlate_ctx {
 * case at that point.
 */
 bool freezing;
+bool recirc_update_dp_hash;/* Generated recirculation will be preceded
+* by datapath HASH action to get an updated
+* dp_hash after recirculation. */
+uint32_t dp_hash_alg;
+uint32_t dp_hash_basis;
 struct ofpbuf frozen_actions;
 const struct ofpact_controller *pause;
 
@@ -408,6 +413,17 @@ ctx_trigger_freeze(struct xlate_ctx *ctx)
 ctx->freezing = true;
 }
 
+static void
+ctx_trigger_recirculate_with_hash(struct xlate_ctx *ctx, uint32_t type,
+  uint32_t basis)
+{
+ctx->exit = true;
+ctx->freezing = true;
+ctx->recirc_update_dp_hash = true;
+ctx->dp_hash_alg = type;
+ctx->dp_hash_basis = basis;
+}
+
 static bool
 ctx_first_frozen_action(const struct xlate_ctx *ctx)
 {
@@ -419,6 +435,7 @@ ctx_cancel_freeze(struct xlate_ctx *ctx)
 {
 if (ctx->freezing) {
 ctx->freezing = false;
+ctx->recirc_update_dp_hash = false;
 ofpbuf_clear(&ctx->frozen_actions);
 ctx->frozen_actions.header = NULL;
 }
@@ -1465,7 +1482,7 @@ group_first_liv

[ovs-dev] [PATCH v3 1/3] ofp-parse: Remove double uninit of group mod if parsing fails.

2016-09-15 Thread Jarno Rajahalme
Double ofputil_uninit_group_mod() used to be harmless, but leads to
double free after commit e8dba7197, which will crash if any error in
group parsing happens.

Add a test to prevent this regression from happening again.

Fixes: e8dba7197 ("meta-flow: Compact struct field_array.")
Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 lib/ofp-parse.c   | 4 
 tests/ofproto-dpif.at | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/ofp-parse.c b/lib/ofp-parse.c
index 0568fc7..92c4693 100644
--- a/lib/ofp-parse.c
+++ b/lib/ofp-parse.c
@@ -1733,10 +1733,6 @@ parse_ofp_group_mod_str(struct ofputil_group_mod *gm, 
int command,
 char *error = parse_ofp_group_mod_str__(gm, command, string,
 usable_protocols);
 free(string);
-
-if (error) {
-ofputil_uninit_group_mod(gm);
-}
 return error;
 }
 
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 557c8be..2978cc5 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -418,6 +418,10 @@ AT_CLEANUP
 AT_SETUP([ofproto-dpif - select group with hash selection method])
 OVS_VSWITCHD_START
 add_of_ports br0 1 10 11
+# Check that parse failures after 'fields' parsing work
+AT_CHECK([ovs-ofctl -O OpenFlow10 add-group br0 
'group_id=1,type=select,fields(eth_dst),bukket=output:10'], [1], ,[dnl
+ovs-ofctl: unknown keyword bukket
+])
 AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 
'group_id=1234,type=select,selection_method=hash,fields(eth_dst,ip_dst,tcp_dst),bucket=output:10,bucket=output:11'])
 AT_CHECK([ovs-ofctl -O OpenFlow15 add-flow br0 'ip 
actions=write_actions(group:1234)'])
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 2/3] ofp-parse: Harden checking with group selection_method.

2016-09-15 Thread Jarno Rajahalme
Only allow fields when "selection_method=hash".  Only allow
selection_method_param when a non-nil selection_method is given.

Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 lib/ofp-parse.c  | 12 
 tests/ofproto-dpif.at| 10 ++
 utilities/ovs-ofctl.8.in | 16 
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/lib/ofp-parse.c b/lib/ofp-parse.c
index 92c4693..2980a1d 100644
--- a/lib/ofp-parse.c
+++ b/lib/ofp-parse.c
@@ -1668,6 +1668,18 @@ parse_ofp_group_mod_str__(struct ofputil_group_mod *gm, 
int command,
 goto out;
 }
 
+/* Exclude fields for non "hash" selection method. */
+if (strcmp(gm->props.selection_method, "hash") &&
+gm->props.fields.values_size) {
+error = xstrdup("fields may only be specified with 
\"selection_method=hash\"");
+goto out;
+}
+/* Exclude selection_method_param if no selection_method is given. */
+if (gm->props.selection_method[0] == 0
+&& gm->props.selection_method_param != 0) {
+error = xstrdup("selection_method_param is only allowed with 
\"selection_method\"");
+goto out;
+}
 if (fields & F_COMMAND_BUCKET_ID) {
 if (!(fields & F_COMMAND_BUCKET_ID_ALL || had_command_bucket_id)) {
 error = xstrdup("must specify a command bucket id");
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 2978cc5..cc38858 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -449,6 +449,16 @@ AT_CHECK([sort results | uniq | sed 's/1[[01]]/1?/'], [0],
   [Datapath actions: 1?
 ])
 
+# Check that fields are rejected without "selection_method=hash".
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 
'group_id=1235,type=select,fields(eth_dst,ip_dst,tcp_dst),bucket=output:10,bucket=output:11'],
 1, [], [dnl
+ovs-ofctl: fields may only be specified with "selection_method=hash"
+])
+
+# Check that selection_method_param without selection_method is rejected.
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 
'group_id=1235,type=select,selection_method_param=1,bucket=output:10,bucket=output:11'],
 1, [], [dnl
+ovs-ofctl: selection_method_param is only allowed with "selection_method"
+])
+
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
diff --git a/utilities/ovs-ofctl.8.in b/utilities/ovs-ofctl.8.in
index b648ab8..675c308 100644
--- a/utilities/ovs-ofctl.8.in
+++ b/utilities/ovs-ofctl.8.in
@@ -2887,14 +2887,14 @@ when using Open vSwitch 2.4 and later with OpenFlow 1.5 
and later.
 .IP \fBfields\fR=\fIfield\fR
 .IQ \fBfields(\fIfield\fR[\fB=\fImask\fR]\fR...\fB)\fR
 The field parameters to selection method selected by the
-\fBselection_method\fR field.  The syntax is described in \fBFlow Syntax\fR
-with the additional restrictions that if a value is provided it is
-treated as a wildcard mask and wildcard masks following a slash are
-prohibited. The pre-requisites of fields must be provided by any flows that
-output to the group. The use of the fields is defined by the lower-layer
-that implements the \fBselection_method\fR.  They are optional if the
-\fBselection_method\fR field is specified as a non-empty string.
-Prohibited otherwise. The default is no fields.
+\fBselection_method\fR field.  The syntax is described in \fBFlow
+Syntax\fR with the additional restrictions that if a value is provided
+it is treated as a wildcard mask and wildcard masks following a slash
+are prohibited. The pre-requisites of fields must be provided by any
+flows that output to the group.  The use of the fields is defined by
+the lower-layer that implements the \fBselection_method\fR.  They are
+optional if the \fBselection_method\fR field is specified as ``hash',
+prohibited otherwise.  The default is no fields.
 .IP
 This option will use a Netronome OpenFlow extension which is only supported
 when using Open vSwitch 2.4 and later with OpenFlow 1.5 and later.
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] ofproto: Remove double reporting from bundles.

2016-09-15 Thread Jarno Rajahalme

> On Sep 15, 2016, at 2:35 PM, Jarno Rajahalme  wrote:
> 
>> 
>> On Sep 14, 2016, at 8:55 PM, Ben Pfaff  wrote:
>> 
>> On Wed, Sep 14, 2016 at 07:25:34PM -0700, Jarno Rajahalme wrote:
>>> Patch b0d38b2f17 unified flow mod reporting in ofproto for both
>>> stand-alone flow mods and bundle flow mods, but left bundle-specific
>>> reporting to the bundle removal code.  This patch fixes this by
>>> removing the bundle-specific reporting of flow mods.
>>> 
>>> Found by inspection.
>>> 
>>> Fixes: b0d38b2f17 ("ofproto: Report flow mods also from bundles.")
>>> Signed-off-by: Jarno Rajahalme 
>> 
>> Nice.
>> 
>> Should this be covered by a test?
>> 
> 
> Pushed to master with a test change to cover this. Will push to branch-2.6 
> soon.
> 
> Thanks for the review!
> 
>  Jarno
> 
>> Acked-by: Ben Pfaff mailto:b...@ovn.org>>

Both patches backported to branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 03/13] ofproto: Add a fixed bundle idle timeout of 10 seconds.

2016-09-15 Thread Jarno Rajahalme

> On Sep 13, 2016, at 5:14 PM, Ben Pfaff  wrote:
> 
> On Tue, Sep 13, 2016 at 03:01:42PM -0700, Jarno Rajahalme wrote:
>> 
>>> On Sep 13, 2016, at 11:14 AM, Ben Pfaff  wrote:
>>> 
>>> On Mon, Sep 12, 2016 at 01:52:33PM -0700, Jarno Rajahalme wrote:
>>>> Timing out idle bundles frees memory that would effectively be leaked
>>>> if a long standing OpenFlow connection would fail to commit or discard
>>>> a bundle.
>>>> 
>>>> OpenFlow specification mandates the timeout to be at least one second,
>>>> if the switch implements such a timeout.  This patch makes the bundle
>>>> idle timeout to be 10 seconds.
>>>> 
>>>> We do not limit the number of messages in a bundle, so it does not
>>>> make sense to limit the number of bundles either, especially now that
>>>> idle bundles are timed out.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme 
>>>> ---
>>>> v3: New patch for v3.
>>> 
>>> It would be kind to describe the behavior in ovs-vswitchd(8), in the
>>> OPENFLOW IMPLEMENTATION section (and then the code should refer back to
>>> this so that we'll update it if we change the code).
>>> 
>> 
>> Done, thanks for the review. Will push to master (only?) soon.
> 
> This fixes an effective memory leak, right?  If so, then I'd put it on
> branch-2.6 also unless you have another idea.

Backported to branch-2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 01/13] ofproto: Don't use connmgr after destruction.

2016-09-15 Thread Jarno Rajahalme

> On Sep 13, 2016, at 5:13 PM, Ben Pfaff  wrote:
> 
> On Tue, Sep 13, 2016 at 02:45:54PM -0700, Jarno Rajahalme wrote:
>> 
>>> On Sep 13, 2016, at 10:56 AM, Ben Pfaff  wrote:
>>> 
>>> On Mon, Sep 12, 2016 at 01:52:31PM -0700, Jarno Rajahalme wrote:
>>>> Set ofproto's connmgr pointer to NULL after the connmgr has been
>>>> destructed, and check for NULL when sending a flow removed
>>>> notification.
>>>> 
>>>> Verified by sending the flow removed message unconditionally and
>>>> observing numerous core dumps in the test suite.
>>>> 
>>>> Found by inspection.
>>>> 
>>>> Fixes: f695ebfae5 ("ofproto: Postpone sending flow removed messages.")
>>>> Signed-off-by: Jarno Rajahalme 
>>>> ---
>>>> v3: New patch for v3.
>>> 
>>> I'm worried a bit that the rules needed to access these structures
>>> correctly are getting complicated.  Maybe there should be a high-level
>>> overview somewhere.
>>> 
>> 
>> The main complication here is that if we tagged connmgr * with 
>> OVS_GUARDED_BY(ofproto_mutex), we would need to do widen the scope of 
>> ofproto_mutex quite a bit. Maybe introducing a rwlock for connmgr separately 
>> would do it, albeit with the assumption that the main thread remains the 
>> only writer, all that extra read locking from the main thread would be new 
>> overhead. It would be more maintainable, though.
> 
> Yes, I recognize that and I don't want to make the scope bigger if we
> can avoid it.
> 
> Another option might be to just stick the flow_removed messages on a
> list from the callback and then flush them later in the ofproto_run()
> loop.  They're already asynchronous and unpredictable since RCU can
> delay sending them an arbitrary amount of time.
> 

Even that would have to deal with ofproto disappearing, but I like it as it 
would simplify locking requirements. I’ll make a note to look at this later.

>> I made a note to look into this later.
>> 
>> Do you think we should apply this to branch-2.6 as well as the master?
> 
> It fixes a crash bug right?  So, yes, unless you know a simpler fix.

I back ported this to 2.6.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] NEWS: Move bundle timeout to 2.6.

2016-09-15 Thread Jarno Rajahalme
Move the news for bundle timeouts to OVS 2.6, as we decided to
backport it there.

Signed-off-by: Jarno Rajahalme 
---
 NEWS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 6cc01eb..343f7f1 100644
--- a/NEWS
+++ b/NEWS
@@ -4,8 +4,6 @@ Post-v2.6.0
  2.3.0, wherein the number of OpenFlow table hits and misses was
  not accurate.
- OpenFlow:
- * Bundles now expire after 10 seconds since the last time the
-   bundle was either opened, modified, or closed.
  * OFPT_PACKET_OUT messages are now supported in bundles.
- ovs-ofctl:
  * 'bundle' command now supports packet-out messages.
@@ -24,6 +22,8 @@ v2.6.0 - xx xxx 
  * New "monitor_cond" "monitor_cond_update" and "update2" extensions to
RFC 7047.
- OpenFlow:
+ * OpenFlow 1.3+ bundles now expire after 10 seconds since the
+   last time the bundle was either opened, modified, or closed.
  * OpenFlow 1.3 Extension 230, adding OpenFlow Bundles support, is
now implemented.
  * OpenFlow 1.3+ bundles are now supported for group mods as well as
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] NEWS: Memory efficiency improvements.

2016-09-15 Thread Jarno Rajahalme
Pushed to master, 2.6 to follow shortly, thanks!

 Jarno

> On Sep 14, 2016, at 8:55 PM, Ben Pfaff  wrote:
> 
> On Wed, Sep 14, 2016 at 07:25:35PM -0700, Jarno Rajahalme wrote:
>> Mention both flow table and bundle memory efficiency improvements.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] ofproto: Remove double reporting from bundles.

2016-09-15 Thread Jarno Rajahalme

> On Sep 14, 2016, at 8:55 PM, Ben Pfaff  wrote:
> 
> On Wed, Sep 14, 2016 at 07:25:34PM -0700, Jarno Rajahalme wrote:
>> Patch b0d38b2f17 unified flow mod reporting in ofproto for both
>> stand-alone flow mods and bundle flow mods, but left bundle-specific
>> reporting to the bundle removal code.  This patch fixes this by
>> removing the bundle-specific reporting of flow mods.
>> 
>> Found by inspection.
>> 
>> Fixes: b0d38b2f17 ("ofproto: Report flow mods also from bundles.")
>> Signed-off-by: Jarno Rajahalme 
> 
> Nice.
> 
> Should this be covered by a test?
> 

Pushed to master with a test change to cover this. Will push to branch-2.6 soon.

Thanks for the review!

  Jarno

> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 1/2] ofproto: Remove double reporting from bundles.

2016-09-14 Thread Jarno Rajahalme
Patch b0d38b2f17 unified flow mod reporting in ofproto for both
stand-alone flow mods and bundle flow mods, but left bundle-specific
reporting to the bundle removal code.  This patch fixes this by
removing the bundle-specific reporting of flow mods.

Found by inspection.

Fixes: b0d38b2f17 ("ofproto: Report flow mods also from bundles.")
Signed-off-by: Jarno Rajahalme 
---
 ofproto/bundles.c | 20 +++-
 ofproto/bundles.h |  2 +-
 ofproto/connmgr.c |  4 ++--
 ofproto/ofproto.c |  2 +-
 4 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/ofproto/bundles.c b/ofproto/bundles.c
index e0b4b7d..24ea1ae 100644
--- a/ofproto/bundles.c
+++ b/ofproto/bundles.c
@@ -62,16 +62,11 @@ ofp_bundle_create(uint32_t id, uint16_t flags, const struct 
ofp_header *oh)
 }
 
 void
-ofp_bundle_remove__(struct ofconn *ofconn, struct ofp_bundle *bundle,
-bool success)
+ofp_bundle_remove__(struct ofconn *ofconn, struct ofp_bundle *bundle)
 {
 struct ofp_bundle_entry *msg;
 
 LIST_FOR_EACH_POP (msg, node, &bundle->msg_list) {
-if (success && msg->type == OFPTYPE_FLOW_MOD) {
-/* Tell connmgr about successful flow mods. */
-ofconn_report_flow_mod(ofconn, msg->ofm.command);
-}
 ofp_bundle_entry_free(msg);
 }
 
@@ -90,7 +85,7 @@ ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t 
flags,
 
 if (bundle) {
 VLOG_INFO("Bundle %x already exists.", id);
-ofp_bundle_remove__(ofconn, bundle, false);
+ofp_bundle_remove__(ofconn, bundle);
 
 return OFPERR_OFPBFC_BAD_ID;
 }
@@ -116,12 +111,12 @@ ofp_bundle_close(struct ofconn *ofconn, uint32_t id, 
uint16_t flags)
 }
 
 if (bundle->state == BS_CLOSED) {
-ofp_bundle_remove__(ofconn, bundle, false);
+ofp_bundle_remove__(ofconn, bundle);
 return OFPERR_OFPBFC_BUNDLE_CLOSED;
 }
 
 if (bundle->flags != flags) {
-ofp_bundle_remove__(ofconn, bundle, false);
+ofp_bundle_remove__(ofconn, bundle);
 return OFPERR_OFPBFC_BAD_FLAGS;
 }
 
@@ -141,8 +136,7 @@ ofp_bundle_discard(struct ofconn *ofconn, uint32_t id)
 return OFPERR_OFPBFC_BAD_ID;
 }
 
-ofp_bundle_remove__(ofconn, bundle, false);
-
+ofp_bundle_remove__(ofconn, bundle);
 return 0;
 }
 
@@ -165,10 +159,10 @@ ofp_bundle_add_message(struct ofconn *ofconn, uint32_t 
id, uint16_t flags,
 return error;
 }
 } else if (bundle->state == BS_CLOSED) {
-ofp_bundle_remove__(ofconn, bundle, false);
+ofp_bundle_remove__(ofconn, bundle);
 return OFPERR_OFPBFC_BUNDLE_CLOSED;
 } else if (flags != bundle->flags) {
-ofp_bundle_remove__(ofconn, bundle, false);
+ofp_bundle_remove__(ofconn, bundle);
 return OFPERR_OFPBFC_BAD_FLAGS;
 }
 
diff --git a/ofproto/bundles.h b/ofproto/bundles.h
index 1818b16..dd64700 100644
--- a/ofproto/bundles.h
+++ b/ofproto/bundles.h
@@ -83,7 +83,7 @@ enum ofperr ofp_bundle_add_message(struct ofconn *, uint32_t 
id,
uint16_t flags, struct ofp_bundle_entry *,
const struct ofp_header *);
 
-void ofp_bundle_remove__(struct ofconn *, struct ofp_bundle *, bool success);
+void ofp_bundle_remove__(struct ofconn *, struct ofp_bundle *);
 
 static inline struct ofp_bundle_entry *
 ofp_bundle_entry_alloc(enum ofptype type, const struct ofp_header *oh)
diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 6377245..4b927d6 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1234,7 +1234,7 @@ bundle_remove_all(struct ofconn *ofconn)
 struct ofp_bundle *b, *next;
 
 HMAP_FOR_EACH_SAFE (b, next, node, &ofconn->bundles) {
-ofp_bundle_remove__(ofconn, b, false);
+ofp_bundle_remove__(ofconn, b);
 }
 }
 
@@ -1247,7 +1247,7 @@ bundle_remove_expired(struct ofconn *ofconn, long long 
int now)
 HMAP_FOR_EACH_SAFE (b, next, node, &ofconn->bundles) {
 if (b->used <= limit) {
 ofconn_send_error(ofconn, &b->ofp_msg, OFPERR_OFPBFC_TIMEOUT);
-ofp_bundle_remove__(ofconn, b, false);
+ofp_bundle_remove__(ofconn, b);
 }
 }
 }
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 90b1ffa..f89f0ef 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -7590,7 +7590,7 @@ do_bundle_commit(struct ofconn *ofconn, uint32_t id, 
uint16_t flags)
 }
 
 /* The bundle is discarded regardless the outcome. */
-ofp_bundle_remove__(ofconn, bundle, !error);
+ofp_bundle_remove__(ofconn, bundle);
 return error;
 }
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/2] NEWS: Memory efficiency improvements.

2016-09-14 Thread Jarno Rajahalme
Mention both flow table and bundle memory efficiency improvements.

Signed-off-by: Jarno Rajahalme 
---
 NEWS | 8 
 1 file changed, 8 insertions(+)

diff --git a/NEWS b/NEWS
index a1ca864..6cc01eb 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,14 @@ v2.6.0 - xx xxx 
  * OpenFlow 1.3+ bundles are now supported for group mods as well as
flow mods and port mods.  Both 'atomic' and 'ordered' bundle
flags are supported for group mods as well as flow mods.
+ * Internal OpenFlow rule representation for load and set-field
+   actions is now much more memory efficient.  For a complex flow
+   table this can reduce rule memory consumption by 40%.
+ * Bundles are now much more memory efficient than in OVS 2.5.
+   Together with memory efficiency improvements in OpenFlow rule
+   representation, the peak OVS resident memory use during a
+   bundle commit for large complex set of flow mods can be only
+   25% of that in OVS 2.5 (4x lower).
  * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
  * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
  * OpenFlow 1.4+ OFPT_TABLE_STATUS is now supported.
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 13/13] ofproto: Support packet_outs in bundles.

2016-09-14 Thread Jarno Rajahalme
Series now pushed to master,

  Jarno

> On Sep 14, 2016, at 2:57 PM, Jarno Rajahalme  wrote:
> 
>> 
>> On Sep 13, 2016, at 5:09 PM, Ben Pfaff mailto:b...@ovn.org>> 
>> wrote:
>> 
>> On Mon, Sep 12, 2016 at 01:52:43PM -0700, Jarno Rajahalme wrote:
>>> Add support for OFPT_PACKET_OUT messages in bundles.
>>> 
>>> While ovs-ofctl already has a packet-out command, we did not have a
>>> string parser for it, as the parsing was done directly from command
>>> line arguments.
>>> 
>>> This patch adds the string parser for packet-out messages, and adds a
>>> new ofctl/packet-out ovs-appctl command that can be used when
>>> ovs-ofctl is used as a flow monitor.
>>> 
>>> The new packet-out parser is further supported with the ovs-ofctl
>>> bundle command, which allows bundles to mix flow mods, group mods and
>>> packet-out messages.  Also the packet-outs in bundles are only
>>> executed if the whole bundle is successful.  A failing packet-out
>>> translation may also make the whole bundle to fail.
>>> 
>>> Signed-off-by: Jarno Rajahalme mailto:ja...@ovn.org>>
>> 
>> Should the existing packet-out command in ovs-ofctl also support this
>> new syntax?  It could distinguish it from the legacy syntax by checking
>> the number of arguments (1 versus 4+).  If so, perhaps we should
>> deprecate the legacy syntax.
> 
> OK.
> 
>> I think that parse_ofp_packet_out_str__() has some memory leaks inside
>> the while loop: if it detects than one error, then the earlier one is
>> leaked.
> 
> Fixed by adding “goto out;” after each time error is set to a non-NULL value, 
> rather than checking after multiple possible errors.
> 
>> 
>> It's a little unusual that ofputil_encode_bundle_msgs() frees the
>> bundled messages and that there's no separate way to do that without
>> encoding.
> 
> Added a separate ofputil_free_bundle_msgs(), and removed the freeing from the 
> ofputil_encode_bundle_msgs().
> 
>> 
>> I'm surprised that do_bundle_commit() potentially increases
>> ofproto->tables_version more than once (in the loop labeled
>> "4. Finish.").  I would have guessed that it would do that once, to make
>> visible everything in the bundle.
>> 
> 
> We have no versioning for port_mods, which OpenFlow spec says need to be 
> supported in bundles. If the bundle does not include port mods, everything is 
> published as one new version, but if there are port mods, then the other 
> (flow, group, packet out) mods before and after each port mod are made 
> visible as one new version. I presume that if the caller is not specifying 
> the ‘ordered’ flag, we could first issue all the port mods and then commit 
> everything else as one version, but I have not wanted to deal with the 
> complications of ‘out-of-order’ handling of messages, as we would need to 
> detect potential conflicts among the messages. E.g., if one flow mod deletes 
> all the flows and another adds one, out-of-order execution could be 
> considered ‘conflicting’. Now we always execute everything in order, so the 
> semantics is clear. However, I’m not sure if we should still detect such 
> conflicts if the client does not specify the ‘ordered’ flag.
> 
> Thanks a lot for the reviews, I’ll push the patches 6-13 in a moment (I 
> pushed patches 1-5 already yesterday).
> 
>   Jarno
> 
>> Acked-by: Ben Pfaff mailto:b...@ovn.org>>

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 13/13] ofproto: Support packet_outs in bundles.

2016-09-14 Thread Jarno Rajahalme

> On Sep 13, 2016, at 5:09 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:43PM -0700, Jarno Rajahalme wrote:
>> Add support for OFPT_PACKET_OUT messages in bundles.
>> 
>> While ovs-ofctl already has a packet-out command, we did not have a
>> string parser for it, as the parsing was done directly from command
>> line arguments.
>> 
>> This patch adds the string parser for packet-out messages, and adds a
>> new ofctl/packet-out ovs-appctl command that can be used when
>> ovs-ofctl is used as a flow monitor.
>> 
>> The new packet-out parser is further supported with the ovs-ofctl
>> bundle command, which allows bundles to mix flow mods, group mods and
>> packet-out messages.  Also the packet-outs in bundles are only
>> executed if the whole bundle is successful.  A failing packet-out
>> translation may also make the whole bundle to fail.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> Should the existing packet-out command in ovs-ofctl also support this
> new syntax?  It could distinguish it from the legacy syntax by checking
> the number of arguments (1 versus 4+).  If so, perhaps we should
> deprecate the legacy syntax.

OK.

> I think that parse_ofp_packet_out_str__() has some memory leaks inside
> the while loop: if it detects than one error, then the earlier one is
> leaked.

Fixed by adding “goto out;” after each time error is set to a non-NULL value, 
rather than checking after multiple possible errors.

> 
> It's a little unusual that ofputil_encode_bundle_msgs() frees the
> bundled messages and that there's no separate way to do that without
> encoding.

Added a separate ofputil_free_bundle_msgs(), and removed the freeing from the 
ofputil_encode_bundle_msgs().

> 
> I'm surprised that do_bundle_commit() potentially increases
> ofproto->tables_version more than once (in the loop labeled
> "4. Finish.").  I would have guessed that it would do that once, to make
> visible everything in the bundle.
> 

We have no versioning for port_mods, which OpenFlow spec says need to be 
supported in bundles. If the bundle does not include port mods, everything is 
published as one new version, but if there are port mods, then the other (flow, 
group, packet out) mods before and after each port mod are made visible as one 
new version. I presume that if the caller is not specifying the ‘ordered’ flag, 
we could first issue all the port mods and then commit everything else as one 
version, but I have not wanted to deal with the complications of ‘out-of-order’ 
handling of messages, as we would need to detect potential conflicts among the 
messages. E.g., if one flow mod deletes all the flows and another adds one, 
out-of-order execution could be considered ‘conflicting’. Now we always execute 
everything in order, so the semantics is clear. However, I’m not sure if we 
should still detect such conflicts if the client does not specify the ‘ordered’ 
flag.

Thanks a lot for the reviews, I’ll push the patches 6-13 in a moment (I pushed 
patches 1-5 already yesterday).

  Jarno

> Acked-by: Ben Pfaff mailto:b...@ovn.org>>

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 12/13] ofproto: Refactor packet_out handling.

2016-09-14 Thread Jarno Rajahalme

> On Sep 13, 2016, at 4:44 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:42PM -0700, Jarno Rajahalme wrote:
>> Refactor handle_packet_out() to prepare for bundle support for packet
>> outs in a later patch.
>> 
>> Two new callbacks are introduced in ofproto-provider class:
>> ->packet_xlate() and ->packet_execute().  ->packet_xlate() translates
>> the packet using the flow and actions provided by the caller, but
>> defers all OpenFlow-visible side-effects (stats, learn actions, actual
>> packet output, etc.) to be explicitly executed with the
>> ->packet_execute() call.
>> 
>> Adds a new ofproto_rule_reduce_timeouts__() that must be called with
>> 'ofproto_mutex' held.  This is used in the next patch.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: Removed layer violations by making ofproto-dpif initialize and
>>manage its private 'aux' member in struct ofproto_packet_out.
> 
> It appears that an xlate_cache always allocates at least 512 bytes of
> space.  (It's not a change but I hadn't noticed before.)  Is that right?
> It seems like a lot, since I guess that the ordinary occupancy is much
> smaller than that.
> 

It’s an union, and the size is 40 bytes per entry. Given that each match/miss 
gets a table entry, each matched rule gets a rule entry, and then the 
occasional normal, learn, etc., the total size in a complex pipeline could 
easily be the 512 or more. E.g., pipeline of 15 tables would generate about 30 
entries, with total size of 1200 bytes in the ofpbuf.

However, since some pipelines are simple, I reduced the initial size to 120 
bytes. We can revisit this when the optimizations described below are done.

I have though of optimizing this further by making the entires variable sized, 
much like the ofpacts. We could also add a table number to the XC_RULE entry, 
and then modify the XC_TABLE to XC_TABLE_MISS. That would effectively halve the 
number of entries, and also make the average entry size about 16 bytes. 
Furthermore, entries now using malloc could use the ofpbuf directly, which 
could save some alloc/free overhead. Finally, we could give the same treatment 
that I did for XC_NORMAL to XC_BOND and XC_NETFLOW, replacing the full struct 
flow with just the fields that are needed.

> Acked-by: Ben Pfaff mailto:b...@ovn.org>>


Thanks for the review!

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 09/13] ofproto: Use ofproto_flow_mod for learn execution from xlate cache.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 1:04 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:39PM -0700, Jarno Rajahalme wrote:
>> Use ofproto_flow_mod with a reference to an existing or new rule
>> instead of ofputil_flow_mod for learn action execution from xlate
>> cache
>> 
>> Typically we would find that when a learn xlate cache entry is
>> created, a preceding upcall has already created the learned flow.  In
>> this case the xlate cache entry takes a reference to that flow and
>> keeps refreshing it without needing to perform any flow table lookups.
>> Otherwise the creation of the xlate cache entry creates the new rule,
>> which is then subsequently added to the classifier.  In both cases
>> this is both faster and shrinks the memory cost of each learn cache
>> entry from ~3.5kb to about 0.3kb.
>> 
>> If the learned rule does not yet exist, it is created and attached to
>> the ofproto_flow_mod, from which it is then added.  If the referred
>> rule happens to expire, or is modified in any way and is thus removed
>> from the classifier tables, we create a new rule using the old rule as
>> a template, so that we can avoid storing the ofputil_flow_mod in all
>> cases.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: Fixed error handling, simplified reference keeping.
> 
> Thanks for implementing this!
> 

Thanks for the review!

  Jarno

> Acked-by: Ben Pfaff mailto:b...@ovn.org>>

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 10/13] ofproto-dpif-xlate: Allow translating without side-effects.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 1:06 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:40PM -0700, Jarno Rajahalme wrote:
>> Extend 'may_learn' attribute to also control the treatment of
>> FIN_TIMEOUT action and asynchronous messages (packet ins,
>> continuations), so that when 'may_learn' is 'false' and
>> 'resubmit_stats' is 'NULL', no OpenFlow-visible side effects are
>> generated by the translation.
>> 
>> Correspondingly, add support for one-time asynchronous messages to
>> xlate cache, so that all side-effects of the translation may be
>> executed at a later stage.  This will be useful for bundle commits.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: Rebase.
> 
> Should we rename may_learn to something more general?  But I don't have
> a good name in mind.
> 

I changed it to ‘allow_side_effects’.

> Acked-by: Ben Pfaff 

Thanks for the review!

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 07/13] ofproto-dpif-xlate: Expose xlate cache.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 12:58 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:37PM -0700, Jarno Rajahalme wrote:
>> Later patches will need to create xlate cache entries from different
>> modules.  This patch refactors the xlate cache code in preparation
>> without any functional changes, so that the changes are clearly
>> visible in the following patches.
>> 
>> The definition of XC_ENTRY_FOR_EACH() iterator macro is changed so
>> that it now does not take the xlate cache pointer to unify the usage
>> accross all call sites.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: Separated xlate cache to it's own module.
> 
> At the top of ofproto-dpif-xlate-cache.h, usually I alphabetize my
> struct forward definitions.
> 

Done. Never used emacs sort-lines before :-)

> Optionally we could get rid of the name for the 'u' member of struct
> xc_entry, which would make references into the union a little more
> convenient.
> 

Done.

> Thanks for working on this!
> 

Did you skip the patch 06/13, or did I just not get the email? Either way, I’ll 
wait for that before pushing the rest of the series.

> Acked-by: Ben Pfaff 

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 05/13] connmgr: Make connmgr_wants_packet_in_on_miss() lock-free.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 12:38 PM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:35PM -0700, Jarno Rajahalme wrote:
>> Make connmgr_wants_packet_in_on_miss() use an atomic int instead of a
>> list traversal taking the 'ofproto_mutex'.  This allows
>> connmgr_wants_packet_in_on_miss() to be called also when
>> 'ofproto_mutex' is already held, and makes it faster, too.
>> 
>> Remove unused ofproto_dpif_wants_packet_in_on_miss().
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: Fix the totally broken behavior with a help of a per-ofconn boolean.
> 
> I think that update_want_packet_in_on_miss() can be a little more
> straightforward.  How about this?  I have not tested it.
> 
> static void
> update_want_packet_in_on_miss(struct ofconn *ofconn)
> {
>/* We want a packet-in on miss when controller_id is zero and OpenFlow is
> * lower than version 1.3. */
>enum ofputil_protocol p = ofconn->protocol;
>int new_want = (ofconn->controller_id == 0 &&
>(p == OFPUTIL_P_NONE ||
> ofputil_protocol_to_ofp_version(p) < OFP13_VERSION));
> 
>/* Update the setting and the count if ncessary. */
>int old_want = ofconn->want_packet_in_on_miss;
>if (old_want != new_want) {
>atomic_int *dst = &ofconn->connmgr->want_packet_in_on_miss;
>int count;
>atomic_read_relaxed(dst, &count);
>atomic_store_relaxed(dst, count - old_want + new_want);
> 
>ofconn->want_packet_in_on_miss = new_want;
>}
> }
> 

Thanks for the review, I adopted this improvement.

  Jarno

> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 03/13] ofproto: Add a fixed bundle idle timeout of 10 seconds.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 11:14 AM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:33PM -0700, Jarno Rajahalme wrote:
>> Timing out idle bundles frees memory that would effectively be leaked
>> if a long standing OpenFlow connection would fail to commit or discard
>> a bundle.
>> 
>> OpenFlow specification mandates the timeout to be at least one second,
>> if the switch implements such a timeout.  This patch makes the bundle
>> idle timeout to be 10 seconds.
>> 
>> We do not limit the number of messages in a bundle, so it does not
>> make sense to limit the number of bundles either, especially now that
>> idle bundles are timed out.
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: New patch for v3.
> 
> It would be kind to describe the behavior in ovs-vswitchd(8), in the
> OPENFLOW IMPLEMENTATION section (and then the code should refer back to
> this so that we'll update it if we change the code).
> 

Done, thanks for the review. Will push to master (only?) soon.

  Jarno


> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 01/13] ofproto: Don't use connmgr after destruction.

2016-09-13 Thread Jarno Rajahalme

> On Sep 13, 2016, at 10:56 AM, Ben Pfaff  wrote:
> 
> On Mon, Sep 12, 2016 at 01:52:31PM -0700, Jarno Rajahalme wrote:
>> Set ofproto's connmgr pointer to NULL after the connmgr has been
>> destructed, and check for NULL when sending a flow removed
>> notification.
>> 
>> Verified by sending the flow removed message unconditionally and
>> observing numerous core dumps in the test suite.
>> 
>> Found by inspection.
>> 
>> Fixes: f695ebfae5 ("ofproto: Postpone sending flow removed messages.")
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> v3: New patch for v3.
> 
> I'm worried a bit that the rules needed to access these structures
> correctly are getting complicated.  Maybe there should be a high-level
> overview somewhere.
> 

The main complication here is that if we tagged connmgr * with 
OVS_GUARDED_BY(ofproto_mutex), we would need to do widen the scope of 
ofproto_mutex quite a bit. Maybe introducing a rwlock for connmgr separately 
would do it, albeit with the assumption that the main thread remains the only 
writer, all that extra read locking from the main thread would be new overhead. 
It would be more maintainable, though.

I made a note to look into this later.

Do you think we should apply this to branch-2.6 as well as the master?

Thanks for the review,

  Jarno

> Acked-by: Ben Pfaff 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 10/15] ofproto-dpif-xlate: Add xlate cache type XC_TABLE.

2016-09-12 Thread Jarno Rajahalme

> On Aug 29, 2016, at 2:57 PM, Ben Pfaff  wrote:
> 
> On Mon, Aug 22, 2016 at 04:31:36PM -0700, Jarno Rajahalme wrote:
>> Xlate cache entry type XC_TABLE is required for the table stats
>> (number of misses and matches) to be correctly attributed.
>> 
>> It appears that table stats have been off ever since xlate cache was
>> introduced.  This was now revealed by a PACKET_OUT unit test case that
>> checks for table stats explicitly.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> It might be worth mentioning this in NEWS.

Done in v3.

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 03/15] types.h: Move mirror_mask_t here.

2016-09-12 Thread Jarno Rajahalme

> On Aug 29, 2016, at 2:11 PM, Ben Pfaff  wrote:
> 
> On Mon, Aug 22, 2016 at 04:31:29PM -0700, Jarno Rajahalme wrote:
>> Move mirror_mask_t from ofproto/ofproto-dpif-mirror.h to
>> openvswitch/types.h to avoid including function definitions with
>> colliding names (e.g., mirror_destroy(), which is also the name of a
>> static function in bridge.c).
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> I guess this must be about some upcoming change, since I don't see a
> failure before this patch.  What is the change?  It would be nice to
> describe it in the commit message.

I removed the layer violations from the series so that this patch was not 
necessary any more :-)

  Jarno

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC PATCH] xlate: Use dp_hash for select groups.

2016-09-12 Thread Jarno Rajahalme
I added a NEWS flash as well,

  Jarno

> On Sep 12, 2016, at 3:53 PM, Ben Pfaff  wrote:
> 
> Apparently ;-)
> 
> On Mon, Sep 12, 2016 at 02:59:16PM -0700, Jarno Rajahalme wrote:
>> I’m rebasing this now. Adding ovs-ofctl documentation should be sufficient, 
>> right?
>> 
>>  Jarno
>> 
>>> On Apr 22, 2016, at 8:27 AM, Ben Pfaff  wrote:
>>> 
>>> On Mon, Apr 18, 2016 at 05:42:53PM -0700, Jarno Rajahalme wrote:
>>>> Add a new select group selection method "dp_hash", which uses minimal
>>>> number of bits from the datapath calculated packet hash to inform the
>>>> select group bucket selection.  This makes the datapath flows more
>>>> generic resulting in less upcalls to userspace, but adds recirculation
>>>> prior to group selection.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme 
>>> 
>>> How can we explain this to users?  I don't think it tries, yet--I don't
>>> see any documentation updates.
>>> 
>>> Thanks,
>>> 
>>> Ben.
>> 

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2] xlate: Use dp_hash for select groups.

2016-09-12 Thread Jarno Rajahalme
Add a new select group selection method "dp_hash", which uses minimal
number of bits from the datapath calculated packet hash to inform the
select group bucket selection.  This makes the datapath flows more
generic resulting in less upcalls to userspace, but adds recirculation
prior to group selection.

Signed-off-by: Jarno Rajahalme 
---
v2: Rebase and documentation.

 NEWS |  8 +
 lib/ofp-parse.c  |  2 ++
 lib/ofp-util.c   | 16 --
 ofproto/ofproto-dpif-xlate.c | 71 ++--
 ofproto/ofproto-dpif.c   | 11 ---
 ofproto/ofproto-dpif.h   |  4 +--
 tests/ofproto-dpif.at| 38 
 tests/ofproto.at |  3 ++
 utilities/ovs-ofctl.8.in | 16 --
 9 files changed, 147 insertions(+), 22 deletions(-)

diff --git a/NEWS b/NEWS
index 8c78b36..88fd90a 100644
--- a/NEWS
+++ b/NEWS
@@ -37,6 +37,14 @@ v2.6.0 - xx xxx 
already expected to work properly in cases where the switch can
not buffer packets, so this change should not affect existing
users.
+ * A new "selection_method=dp_hash" type for OpenFlow select group
+   bucket selection that uses the datapath computed 5-tuple hash
+   without making datapath flows match the 5-tuple fields, which
+   is useful for more efficient load balancing, for example.  This
+   uses the Netronome extension to OpenFlow 1.5+ that allows
+   control over the OpenFlow select groups selection method.  See
+   "selection_method" and related options in ovs-ofctl(8) for
+   details.
- Improved OpenFlow version compatibility for actions:
  * New OpenFlow extension to support the "group" action in OpenFlow 1.0.
  * OpenFlow 1.0 "enqueue" action now properly translated to OpenFlow 1.1+.
diff --git a/lib/ofp-parse.c b/lib/ofp-parse.c
index d3ef140..4c1f07a 100644
--- a/lib/ofp-parse.c
+++ b/lib/ofp-parse.c
@@ -1566,6 +1566,8 @@ parse_ofp_group_mod_str__(struct ofputil_group_mod *gm, 
int command,
 goto out;
 }
 
+/* XXX Exclude fields for non "hash" selection method. */
+
 if (fields & F_COMMAND_BUCKET_ID) {
 if (!(fields & F_COMMAND_BUCKET_ID_ALL || had_command_bucket_id)) {
 error = xstrdup("must specify a command bucket id");
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index 1fa4998..f0fdbe6 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -8901,28 +8901,24 @@ parse_group_prop_ntr_selection_method(struct ofpbuf 
*payload,
 return OFPERR_OFPBPC_BAD_VALUE;
 }
 
-if (strcmp("hash", prop->selection_method)) {
+if (strcmp("hash", prop->selection_method)
+&& strcmp("dp_hash", prop->selection_method)) {
 OFPPROP_LOG(&bad_ofmsg_rl, false,
 "ntr selection method '%s' is not supported",
 prop->selection_method);
 return OFPERR_OFPBPC_BAD_VALUE;
 }
+/* 'method_len' is now non-zero. */
 
 strcpy(gp->selection_method, prop->selection_method);
 gp->selection_method_param = ntohll(prop->selection_method_param);
 
-if (!method_len && gp->selection_method_param) {
-OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method parameter is "
-"non-zero but selection method is empty");
-return OFPERR_OFPBPC_BAD_VALUE;
-}
-
 ofpbuf_pull(payload, sizeof *prop);
 
 fields_len = ntohs(prop->length) - sizeof *prop;
-if (!method_len && fields_len) {
-OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method parameter is "
-"zero but fields are provided");
+if (fields_len && strcmp("hash", gp->selection_method)) {
+OFPPROP_LOG(&bad_ofmsg_rl, false, "ntr selection method %s "
+"does not support fields", gp->selection_method);
 return OFPERR_OFPBPC_BAD_VALUE;
 }
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 74e3387..e00a5f8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -348,6 +348,11 @@ struct xlate_ctx {
 * case at that point.
 */
 bool freezing;
+bool recirc_update_dp_hash;/* Generated recirculation will be preceded
+* by datapath HASH action to get an updated
+* dp_hash after recirculation. */
+uint32_t dp_hash_alg;
+uint32_t dp_hash_basis;
 struct ofpbuf frozen_actions;
 const struct ofpact_controller *pause;
 
@@ -410,6 +415,17 @@ ctx_trigger_freeze(struct xlate_ctx *ctx)
 ctx->freezing = true;
 }
 
+static void
+ctx_trigger_recirculate_with_hash(struct xlate_ct

Re: [ovs-dev] [RFC PATCH] xlate: Use dp_hash for select groups.

2016-09-12 Thread Jarno Rajahalme
I’m rebasing this now. Adding ovs-ofctl documentation should be sufficient, 
right?

  Jarno

> On Apr 22, 2016, at 8:27 AM, Ben Pfaff  wrote:
> 
> On Mon, Apr 18, 2016 at 05:42:53PM -0700, Jarno Rajahalme wrote:
>> Add a new select group selection method "dp_hash", which uses minimal
>> number of bits from the datapath calculated packet hash to inform the
>> select group bucket selection.  This makes the datapath flows more
>> generic resulting in less upcalls to userspace, but adds recirculation
>> prior to group selection.
>> 
>> Signed-off-by: Jarno Rajahalme 
> 
> How can we explain this to users?  I don't think it tries, yet--I don't
> see any documentation updates.
> 
> Thanks,
> 
> Ben.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 12/13] ofproto: Refactor packet_out handling.

2016-09-12 Thread Jarno Rajahalme
Refactor handle_packet_out() to prepare for bundle support for packet
outs in a later patch.

Two new callbacks are introduced in ofproto-provider class:
->packet_xlate() and ->packet_execute().  ->packet_xlate() translates
the packet using the flow and actions provided by the caller, but
defers all OpenFlow-visible side-effects (stats, learn actions, actual
packet output, etc.) to be explicitly executed with the
->packet_execute() call.

Adds a new ofproto_rule_reduce_timeouts__() that must be called with
'ofproto_mutex' held.  This is used in the next patch.

Signed-off-by: Jarno Rajahalme 
---
v3: Removed layer violations by making ofproto-dpif initialize and
manage its private 'aux' member in struct ofproto_packet_out.

 ofproto/ofproto-dpif-upcall.c  |  11 +-
 ofproto/ofproto-dpif-xlate-cache.c |  17 ++-
 ofproto/ofproto-dpif-xlate-cache.h |   2 +
 ofproto/ofproto-dpif-xlate.c   |  40 +++---
 ofproto/ofproto-dpif-xlate.h   |   3 +-
 ofproto/ofproto-dpif.c | 213 +---
 ofproto/ofproto-dpif.h |  15 +-
 ofproto/ofproto-provider.h |  91 ++--
 ofproto/ofproto.c  | 275 ++---
 9 files changed, 488 insertions(+), 179 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 3080919..6bf659a 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1078,7 +1078,9 @@ upcall_xlate(struct udpif *udpif, struct upcall *upcall,
 stats.used = time_msec();
 stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
 
-xlate_in_init(&xin, upcall->ofproto, upcall->flow, upcall->in_port, NULL,
+xlate_in_init(&xin, upcall->ofproto,
+  ofproto_dpif_get_tables_version(upcall->ofproto),
+  upcall->flow, upcall->in_port, NULL,
   stats.tcp_flags, upcall->packet, wc, odp_actions);
 
 if (upcall->type == DPIF_UC_MISS) {
@@ -1908,7 +1910,8 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
*ukey,
 ukey->xcache = xlate_cache_new();
 }
 
-xlate_in_init(&xin, ofproto, &flow, ofp_in_port, NULL, push.tcp_flags,
+xlate_in_init(&xin, ofproto, ofproto_dpif_get_tables_version(ofproto),
+  &flow, ofp_in_port, NULL, push.tcp_flags,
   NULL, need_revalidate ? &wc : NULL, odp_actions);
 if (push.n_packets) {
 xin.resubmit_stats = &push;
@@ -2086,7 +2089,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
size_t n_ops)
 if (!error) {
 struct xlate_in xin;
 
-xlate_in_init(&xin, ofproto, &flow, ofp_in_port, NULL,
+xlate_in_init(&xin, ofproto,
+  ofproto_dpif_get_tables_version(ofproto),
+  &flow, ofp_in_port, NULL,
   push->tcp_flags, NULL, NULL, NULL);
 xin.resubmit_stats = push->n_packets ? push : NULL;
 xin.may_learn = push->n_packets > 0;
diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index 68ea958..18ffc52 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -47,12 +47,17 @@
 
 VLOG_DEFINE_THIS_MODULE(ofproto_xlate_cache);
 
+void
+xlate_cache_init(struct xlate_cache *xcache)
+{
+ofpbuf_init(&xcache->entries, 512);
+}
+
 struct xlate_cache *
 xlate_cache_new(void)
 {
 struct xlate_cache *xcache = xmalloc(sizeof *xcache);
-
-ofpbuf_init(&xcache->entries, 512);
+xlate_cache_init(xcache);
 return xcache;
 }
 
@@ -261,9 +266,15 @@ xlate_cache_clear(struct xlate_cache *xcache)
 }
 
 void
-xlate_cache_delete(struct xlate_cache *xcache)
+xlate_cache_uninit(struct xlate_cache *xcache)
 {
 xlate_cache_clear(xcache);
 ofpbuf_uninit(&xcache->entries);
+}
+
+void
+xlate_cache_delete(struct xlate_cache *xcache)
+{
+xlate_cache_uninit(xcache);
 free(xcache);
 }
diff --git a/ofproto/ofproto-dpif-xlate-cache.h 
b/ofproto/ofproto-dpif-xlate-cache.h
index 3705000..875cf10 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -130,12 +130,14 @@ struct xlate_cache {
 struct ofpbuf entries;
 };
 
+void xlate_cache_init(struct xlate_cache *);
 struct xlate_cache *xlate_cache_new(void);
 struct xc_entry *xlate_cache_add_entry(struct xlate_cache *, enum xc_type);
 void xlate_push_stats_entry(struct xc_entry *, const struct dpif_flow_stats *);
 void xlate_push_stats(struct xlate_cache *, const struct dpif_flow_stats *);
 void xlate_cache_clear_entry(struct xc_entry *);
 void xlate_cache_clear(struct xlate_cache *);
+void xlate_cache_uninit(struct xlate_cache *);
 void xlate_cache_delete(struct xlate_cache *);
 
 #endif /* ofproto-dpif-xlate-cache.h */
diff --git a/of

[ovs-dev] [PATCH v3 13/13] ofproto: Support packet_outs in bundles.

2016-09-12 Thread Jarno Rajahalme
Add support for OFPT_PACKET_OUT messages in bundles.

While ovs-ofctl already has a packet-out command, we did not have a
string parser for it, as the parsing was done directly from command
line arguments.

This patch adds the string parser for packet-out messages, and adds a
new ofctl/packet-out ovs-appctl command that can be used when
ovs-ofctl is used as a flow monitor.

The new packet-out parser is further supported with the ovs-ofctl
bundle command, which allows bundles to mix flow mods, group mods and
packet-out messages.  Also the packet-outs in bundles are only
executed if the whole bundle is successful.  A failing packet-out
translation may also make the whole bundle to fail.

Signed-off-by: Jarno Rajahalme 
---
v3: Rebase.

 NEWS|   1 +
 include/openvswitch/ofp-parse.h |   5 ++
 include/openvswitch/ofp-util.h  |   1 +
 lib/ofp-parse.c | 108 
 lib/ofp-util.c  |   7 ++-
 ofproto/bundles.h   |   7 ++-
 ofproto/ofproto-dpif.c  |  23 +++
 ofproto/ofproto-provider.h  |   7 +++
 ofproto/ofproto.c   |  82 +---
 tests/ofp-print.at  |  19 ++
 tests/ofproto.at| 134 ++--
 utilities/ovs-ofctl.8.in|  46 --
 utilities/ovs-ofctl.c   |  55 +
 13 files changed, 460 insertions(+), 35 deletions(-)

diff --git a/NEWS b/NEWS
index 7f501d6..20fa0b7 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,7 @@ Post-v2.6.0
- OpenFlow:
  * Bundles now expire after 10 seconds since the last time the
bundle was either opened, modified, or closed.
+ * OFPT_PACKET_OUT messages are now supported in bundles.
 
 v2.6.0 - xx xxx 
 -
diff --git a/include/openvswitch/ofp-parse.h b/include/openvswitch/ofp-parse.h
index df60b18..e68e57b 100644
--- a/include/openvswitch/ofp-parse.h
+++ b/include/openvswitch/ofp-parse.h
@@ -28,6 +28,7 @@
 struct flow;
 struct ofpbuf;
 struct ofputil_flow_mod;
+struct ofputil_packet_out;
 struct ofputil_flow_monitor_request;
 struct ofputil_flow_stats_request;
 struct ofputil_group_mod;
@@ -47,6 +48,10 @@ char *parse_ofp_flow_mod_str(struct ofputil_flow_mod *, 
const char *string,
  enum ofputil_protocol *usable_protocols)
 OVS_WARN_UNUSED_RESULT;
 
+char *parse_ofp_packet_out_str(struct ofputil_packet_out *po, const char *str_,
+   enum ofputil_protocol *usable_protocols)
+OVS_WARN_UNUSED_RESULT;
+
 char *parse_ofp_table_mod(struct ofputil_table_mod *,
   const char *table_id, const char *flow_miss_handling,
   uint32_t *usable_versions)
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 450e739..9283e13 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -1343,6 +1343,7 @@ struct ofputil_bundle_msg {
 union {
 struct ofputil_flow_mod fm;
 struct ofputil_group_mod gm;
+struct ofputil_packet_out po;
 };
 };
 
diff --git a/lib/ofp-parse.c b/lib/ofp-parse.c
index d3ef140..7fccb48 100644
--- a/lib/ofp-parse.c
+++ b/lib/ofp-parse.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "byte-order.h"
+#include "dp-packet.h"
 #include "learn.h"
 #include "multipath.h"
 #include "netdev.h"
@@ -550,6 +551,106 @@ parse_ofp_str(struct ofputil_flow_mod *fm, int command, 
const char *str_,
 return error;
 }
 
+/* Parse a string representation of a OFPT_PACKET_OUT to '*po'.  If successful,
+ * both 'po->ofpacts' and 'po->packet' must be free()d by the caller. */
+static char * OVS_WARN_UNUSED_RESULT
+parse_ofp_packet_out_str__(struct ofputil_packet_out *po, char *string,
+   enum ofputil_protocol *usable_protocols)
+{
+enum ofputil_protocol action_usable_protocols;
+uint64_t stub[256 / 8];
+struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
+struct dp_packet *packet = NULL;
+char *act_str = NULL;
+char *name, *value;
+char *error = NULL;
+
+*usable_protocols = OFPUTIL_P_ANY;
+
+*po = (struct ofputil_packet_out) {
+.buffer_id = UINT32_MAX,
+.in_port = OFPP_CONTROLLER,
+};
+
+act_str = extract_actions(string);
+
+while (ofputil_parse_key_value(&string, &name, &value)) {
+if (!*value) {
+error = xasprintf("field %s missing value", name);
+}
+
+if (!strcmp(name, "in_port")) {
+if (!ofputil_port_from_string(value, &po->in_port)) {
+error = xasprintf("%s is not a valid OpenFlow port", value);
+}
+if (po->in_port > OFPP_MAX && po->in_port != OFPP_LOCAL
+&& po->in_port != OFPP_NONE
+

[ovs-dev] [PATCH v3 01/13] ofproto: Don't use connmgr after destruction.

2016-09-12 Thread Jarno Rajahalme
Set ofproto's connmgr pointer to NULL after the connmgr has been
destructed, and check for NULL when sending a flow removed
notification.

Verified by sending the flow removed message unconditionally and
observing numerous core dumps in the test suite.

Found by inspection.

Fixes: f695ebfae5 ("ofproto: Postpone sending flow removed messages.")
Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 ofproto/connmgr.c  | 16 
 ofproto/connmgr.h  |  6 --
 ofproto/fail-open.c|  8 +---
 ofproto/fail-open.h|  2 +-
 ofproto/in-band.c  |  2 ++
 ofproto/ofproto-provider.h |  2 +-
 ofproto/ofproto.c  | 36 
 ofproto/ofproto.h  |  3 ++-
 8 files changed, 55 insertions(+), 20 deletions(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index d2bedad..51e676a 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -191,7 +191,10 @@ struct connmgr {
 
 /* OpenFlow connections. */
 struct hmap controllers; /* All OFCONN_PRIMARY controllers. */
-struct ovs_list all_conns;   /* All controllers. */
+struct ovs_list all_conns;   /* All controllers.  All modifications are
+protected by ofproto_mutex, so that any
+traversals from other threads can be made
+safe by holding the ofproto_mutex. */
 uint64_t master_election_id; /* monotonically increasing sequence number
   * for master election */
 bool master_election_id_defined;
@@ -255,6 +258,7 @@ connmgr_create(struct ofproto *ofproto,
 /* Frees 'mgr' and all of its resources. */
 void
 connmgr_destroy(struct connmgr *mgr)
+OVS_REQUIRES(ofproto_mutex)
 {
 struct ofservice *ofservice, *next_ofservice;
 struct ofconn *ofconn, *next_ofconn;
@@ -264,11 +268,9 @@ connmgr_destroy(struct connmgr *mgr)
 return;
 }
 
-ovs_mutex_lock(&ofproto_mutex);
 LIST_FOR_EACH_SAFE (ofconn, next_ofconn, node, &mgr->all_conns) {
 ofconn_destroy(ofconn);
 }
-ovs_mutex_unlock(&ofproto_mutex);
 
 hmap_destroy(&mgr->controllers);
 
@@ -770,7 +772,9 @@ update_fail_open(struct connmgr *mgr)
 mgr->fail_open = fail_open_create(mgr->ofproto, mgr);
 }
 } else {
+ovs_mutex_lock(&ofproto_mutex);
 fail_open_destroy(mgr->fail_open);
+ovs_mutex_unlock(&ofproto_mutex);
 mgr->fail_open = NULL;
 }
 }
@@ -1203,6 +1207,7 @@ ofconn_get_target(const struct ofconn *ofconn)
 static struct ofconn *
 ofconn_create(struct connmgr *mgr, struct rconn *rconn, enum ofconn_type type,
   bool enable_async_msgs)
+OVS_REQUIRES(ofproto_mutex)
 {
 struct ofconn *ofconn;
 
@@ -1607,10 +1612,13 @@ connmgr_send_requestforward(struct connmgr *mgr, const 
struct ofconn *source,
 }
 
 /* Sends an OFPT_FLOW_REMOVED or NXT_FLOW_REMOVED message based on 'fr' to
- * appropriate controllers managed by 'mgr'. */
+ * appropriate controllers managed by 'mgr'.
+ *
+ * This may be called from the RCU thread. */
 void
 connmgr_send_flow_removed(struct connmgr *mgr,
   const struct ofputil_flow_removed *fr)
+OVS_REQUIRES(ofproto_mutex)
 {
 struct ofconn *ofconn;
 
diff --git a/ofproto/connmgr.h b/ofproto/connmgr.h
index 0768aa0..1a75c4c 100644
--- a/ofproto/connmgr.h
+++ b/ofproto/connmgr.h
@@ -71,7 +71,8 @@ void ofproto_async_msg_free(struct ofproto_async_msg *);
 /* Basics. */
 struct connmgr *connmgr_create(struct ofproto *ofproto,
const char *dpif_name, const char *local_name);
-void connmgr_destroy(struct connmgr *);
+void connmgr_destroy(struct connmgr *)
+OVS_REQUIRES(ofproto_mutex);
 
 void connmgr_run(struct connmgr *,
  void (*handle_openflow)(struct ofconn *,
@@ -142,7 +143,8 @@ bool connmgr_wants_packet_in_on_miss(struct connmgr *mgr);
 void connmgr_send_port_status(struct connmgr *, struct ofconn *source,
   const struct ofputil_phy_port *, uint8_t reason);
 void connmgr_send_flow_removed(struct connmgr *,
-   const struct ofputil_flow_removed *);
+   const struct ofputil_flow_removed *)
+OVS_REQUIRES(ofproto_mutex);
 void connmgr_send_async_msg(struct connmgr *,
 const struct ofproto_async_msg *);
 void ofconn_send_role_status(struct ofconn *ofconn, uint32_t role,
diff --git a/ofproto/fail-open.c b/ofproto/fail-open.c
index 3c3579e..e038e77 100644
--- a/ofproto/fail-open.c
+++ b/ofproto/fail-open.c
@@ -79,7 +79,7 @@ struct fail_open {
 bool fail_open_active;
 };
 
-static void fail_open_recover(struct fail_open *);
+static void fail_open_recover(struct fail_open *) OVS_REQUIRES(ofproto_mutex);
 
 /* Returns the number 

[ovs-dev] [PATCH v3 03/13] ofproto: Add a fixed bundle idle timeout of 10 seconds.

2016-09-12 Thread Jarno Rajahalme
Timing out idle bundles frees memory that would effectively be leaked
if a long standing OpenFlow connection would fail to commit or discard
a bundle.

OpenFlow specification mandates the timeout to be at least one second,
if the switch implements such a timeout.  This patch makes the bundle
idle timeout to be 10 seconds.

We do not limit the number of messages in a bundle, so it does not
make sense to limit the number of bundles either, especially now that
idle bundles are timed out.

Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 NEWS  |  4 +++-
 ofproto/bundles.c | 21 +--
 ofproto/bundles.h | 23 +++-
 ofproto/connmgr.c | 32 +---
 ofproto/ofproto.c | 10 -
 tests/ofproto.at  | 63 +++
 6 files changed, 133 insertions(+), 20 deletions(-)

diff --git a/NEWS b/NEWS
index 8c78b36..9e8034a 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,8 @@
 Post-v2.6.0
 -
-
+   - OpenFlow:
+ * Bundles now expire after 10 seconds since the last time the
+   bundle was either opened, modified, or closed.
 
 v2.6.0 - xx xxx 
 -
diff --git a/ofproto/bundles.c b/ofproto/bundles.c
index 353a3a7..e0b4b7d 100644
--- a/ofproto/bundles.c
+++ b/ofproto/bundles.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (c) 2013, 2014 Alexandru Copot , with 
support from IXIA.
  * Copyright (c) 2013, 2014 Daniel Baluta 
- * Copyright (c) 2014, 2015 Nicira, Inc.
+ * Copyright (c) 2014, 2015, 2016 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -41,18 +41,23 @@
 VLOG_DEFINE_THIS_MODULE(bundles);
 
 static struct ofp_bundle *
-ofp_bundle_create(uint32_t id, uint16_t flags)
+ofp_bundle_create(uint32_t id, uint16_t flags, const struct ofp_header *oh)
 {
 struct ofp_bundle *bundle;
 
 bundle = xmalloc(sizeof(*bundle));
 
+bundle->used = time_msec();
 bundle->id = id;
 bundle->flags = flags;
 bundle->state = BS_OPEN;
 
 ovs_list_init(&bundle->msg_list);
 
+/* Max 64 bytes for error reporting. */
+memcpy(bundle->ofp_msg_data, oh,
+   MIN(ntohs(oh->length), sizeof bundle->ofp_msg_data));
+
 return bundle;
 }
 
@@ -75,7 +80,8 @@ ofp_bundle_remove__(struct ofconn *ofconn, struct ofp_bundle 
*bundle,
 }
 
 enum ofperr
-ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t flags)
+ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t flags,
+const struct ofp_header *oh)
 {
 struct ofp_bundle *bundle;
 enum ofperr error;
@@ -89,7 +95,7 @@ ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t 
flags)
 return OFPERR_OFPBFC_BAD_ID;
 }
 
-bundle = ofp_bundle_create(id, flags);
+bundle = ofp_bundle_create(id, flags, oh);
 error = ofconn_insert_bundle(ofconn, bundle);
 if (error) {
 free(bundle);
@@ -119,6 +125,7 @@ ofp_bundle_close(struct ofconn *ofconn, uint32_t id, 
uint16_t flags)
 return OFPERR_OFPBFC_BAD_FLAGS;
 }
 
+bundle->used = time_msec();
 bundle->state = BS_CLOSED;
 return 0;
 }
@@ -141,7 +148,8 @@ ofp_bundle_discard(struct ofconn *ofconn, uint32_t id)
 
 enum ofperr
 ofp_bundle_add_message(struct ofconn *ofconn, uint32_t id, uint16_t flags,
-   struct ofp_bundle_entry *bmsg)
+   struct ofp_bundle_entry *bmsg,
+   const struct ofp_header *oh)
 {
 struct ofp_bundle *bundle;
 
@@ -150,7 +158,7 @@ ofp_bundle_add_message(struct ofconn *ofconn, uint32_t id, 
uint16_t flags,
 if (!bundle) {
 enum ofperr error;
 
-bundle = ofp_bundle_create(id, flags);
+bundle = ofp_bundle_create(id, flags, oh);
 error = ofconn_insert_bundle(ofconn, bundle);
 if (error) {
 free(bundle);
@@ -164,6 +172,7 @@ ofp_bundle_add_message(struct ofconn *ofconn, uint32_t id, 
uint16_t flags,
 return OFPERR_OFPBFC_BAD_FLAGS;
 }
 
+bundle->used = time_msec();
 ovs_list_push_back(&bundle->msg_list, &bmsg->node);
 return 0;
 }
diff --git a/ofproto/bundles.h b/ofproto/bundles.h
index 802de77..3069224 100644
--- a/ofproto/bundles.h
+++ b/ofproto/bundles.h
@@ -42,33 +42,45 @@ struct ofp_bundle_entry {
 };
 
 /* OpenFlow header and some of the message contents for error reporting. */
-struct ofp_header ofp_msg[DIV_ROUND_UP(64, sizeof(struct ofp_header))];
+union {
+struct ofp_header ofp_msg;
+uint8_t ofp_msg_data[64];
+};
 };
 
-enum bundle_state {
+enum OVS_PACKED_ENUM bundle_state {
 BS_OPEN,
 BS_CLOSED
 };
 
 struct ofp_bundle {
 struct hmap_node  node;  /* In struct ofconn's "bundles" hmap. */
+long long int used;  /* Last time bundle was used. */
 uint32_t  id;
 uint

[ovs-dev] [PATCH v3 11/13] coverage: Rename init functions to avoid symbol collisions.

2016-09-12 Thread Jarno Rajahalme
ofproto now uses various *_init() functions, so use something else for
coverage constructors.

Signed-off-by: Jarno Rajahalme 
---
v3: Use '_init_coverage' suffix instead of '_init__'.

 lib/coverage.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/coverage.h b/lib/coverage.h
index 34a04aa..dea990e 100644
--- a/lib/coverage.h
+++ b/lib/coverage.h
@@ -75,7 +75,7 @@ void coverage_counter_register(struct coverage_counter*);
 extern struct coverage_counter counter_##COUNTER;   \
 struct coverage_counter counter_##COUNTER   \
 = { #COUNTER, COUNTER##_count, 0, 0, {0}, {0} };\
-OVS_CONSTRUCTOR(COUNTER##_init) {   \
+OVS_CONSTRUCTOR(COUNTER##_init_coverage) {  \
 coverage_counter_register(&counter_##COUNTER);  \
 }
 
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 10/13] ofproto-dpif-xlate: Allow translating without side-effects.

2016-09-12 Thread Jarno Rajahalme
Extend 'may_learn' attribute to also control the treatment of
FIN_TIMEOUT action and asynchronous messages (packet ins,
continuations), so that when 'may_learn' is 'false' and
'resubmit_stats' is 'NULL', no OpenFlow-visible side effects are
generated by the translation.

Correspondingly, add support for one-time asynchronous messages to
xlate cache, so that all side-effects of the translation may be
executed at a later stage.  This will be useful for bundle commits.

Signed-off-by: Jarno Rajahalme 
---
v3: Rebase.

 ofproto/ofproto-dpif-xlate-cache.c | 14 +
 ofproto/ofproto-dpif-xlate-cache.h |  9 +++--
 ofproto/ofproto-dpif-xlate.c   | 40 +-
 ofproto/ofproto-dpif-xlate.h   |  3 ++-
 4 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index eabdad7..68ea958 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -27,6 +27,7 @@
 #include "bond.h"
 #include "bundle.h"
 #include "byte-order.h"
+#include "connmgr.h"
 #include "coverage.h"
 #include "dp-packet.h"
 #include "dpif.h"
@@ -146,6 +147,13 @@ xlate_push_stats_entry(struct xc_entry *entry,
 tnl_neigh_lookup(entry->u.tnl_neigh_cache.br_name,
  &entry->u.tnl_neigh_cache.d_ipv6, &dmac);
 break;
+case XC_CONTROLLER:
+if (entry->u.controller.am) {
+ofproto_dpif_send_async_msg(entry->u.controller.ofproto,
+entry->u.controller.am);
+entry->u.controller.am = NULL; /* One time only. */
+}
+break;
 default:
 OVS_NOT_REACHED();
 }
@@ -225,6 +233,12 @@ xlate_cache_clear_entry(struct xc_entry *entry)
 break;
 case XC_TNL_NEIGH:
 break;
+case XC_CONTROLLER:
+if (entry->u.controller.am) {
+ofproto_async_msg_free(entry->u.controller.am);
+entry->u.controller.am = NULL;
+}
+break;
 default:
 OVS_NOT_REACHED();
 }
diff --git a/ofproto/ofproto-dpif-xlate-cache.h 
b/ofproto/ofproto-dpif-xlate-cache.h
index b32a02a..3705000 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -46,11 +46,12 @@ enum xc_type {
 XC_NETDEV,
 XC_NETFLOW,
 XC_MIRROR,
-XC_LEARN,
+XC_LEARN,/* Calls back to ofproto. */
 XC_NORMAL,
-XC_FIN_TIMEOUT,
+XC_FIN_TIMEOUT,  /* Calls back to ofproto. */
 XC_GROUP,
 XC_TNL_NEIGH,
+XC_CONTROLLER,
 };
 
 /* xlate_cache entries hold enough information to perform the side effects of
@@ -113,6 +114,10 @@ struct xc_entry {
 char br_name[IFNAMSIZ];
 struct in6_addr d_ipv6;
 } tnl_neigh_cache;
+struct {
+struct ofproto_dpif *ofproto;
+struct ofproto_async_msg *am;
+} controller;
 } u;
 };
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 9c534be..0783d48 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3508,6 +3508,10 @@ execute_controller_action(struct xlate_ctx *ctx, int len,
 return;
 }
 
+if (!ctx->xin->may_learn && !ctx->xin->xcache) {
+return;
+}
+
 packet = dp_packet_clone(ctx->xin->packet);
 packet_batch_init_packet(&batch, packet);
 odp_execute_actions(NULL, &batch, false,
@@ -3546,13 +3550,26 @@ execute_controller_action(struct xlate_ctx *ctx, int 
len,
 };
 flow_get_metadata(&ctx->xin->flow, &am->pin.up.public.flow_metadata);
 
-ofproto_dpif_send_async_msg(ctx->xbridge->ofproto, am);
-dp_packet_delete(packet);
+/* Async messages are only sent once, so if we send one now, no
+ * xlate cache entry is created.  */
+if (ctx->xin->may_learn) {
+ofproto_dpif_send_async_msg(ctx->xbridge->ofproto, am);
+} else /* xcache */ {
+struct xc_entry *entry;
+
+entry = xlate_cache_add_entry(ctx->xin->xcache, XC_CONTROLLER);
+entry->u.controller.ofproto = ctx->xbridge->ofproto;
+entry->u.controller.am = am;
+}
 }
 
 static void
 emit_continuation(struct xlate_ctx *ctx, const struct frozen_state *state)
 {
+if (!ctx->xin->may_learn && !ctx->xin->xcache) {
+return;
+}
+
 struct ofproto_async_msg *am = xmalloc(sizeof *am);
 *am = (struct ofproto_async_msg) {
 .controller_id = ctx->pause->controller_id,
@@ -3584,7 +3601,18 @@ emit_continuation(struct xlate_ctx *ctx, const struct 
frozen_state *state)
 },
 };
 flow_get_metadata(&ctx->xin->flow, &am->pin.up.public.flow_metadata);
-of

[ovs-dev] [PATCH v3 04/13] ofproto: Change rule's 'removed' member to a tri-state 'state'.

2016-09-12 Thread Jarno Rajahalme
As a rule may not be re-inserted to ofproto data structures, it is
cleaner to have three states for the rule, rather than just two.  This
will be useful for managing learned flows in later patches.

Signed-off-by: Jarno Rajahalme 
---
v3: Do not change the locking requirements, as locking for a single (integral)
variable makes no difference.

 ofproto/ofproto-provider.h | 17 +++--
 ofproto/ofproto.c  | 13 +++--
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index b11bf12..ef8ed67 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -323,6 +323,19 @@ struct oftable {
  *  'rule->mutex', and safely written only by coding holding ofproto_mutex
  *  AND 'rule->mutex'.  These are marked OVS_GUARDED.
  */
+enum OVS_PACKED_ENUM rule_state {
+RULE_INITIALIZED, /* Rule has been initialized, but not inserted to the
+   * ofproto data structures.  Versioning makes sure the
+   * rule is not visible to lookups by other threads, even
+   * if the rule is added to a classifier. */
+RULE_INSERTED,/* Rule has been inserted to ofproto data structures and
+   * may be visible to lookups by other threads. */
+RULE_REMOVED, /* Rule has been removed from ofproto data structures,
+   * and may still be visible to lookups by other threads
+   * until they quiesce, after which the rule will be
+   * removed from the classifier as well. */
+};
+
 struct rule {
 /* Where this rule resides in an OpenFlow switch.
  *
@@ -330,8 +343,8 @@ struct rule {
 struct ofproto *const ofproto; /* The ofproto that contains this rule. */
 const struct cls_rule cr;  /* In owning ofproto's classifier. */
 const uint8_t table_id;/* Index in ofproto's 'tables' array. */
-bool removed;  /* Rule has been removed from the ofproto
-* data structures. */
+
+enum rule_state state;
 
 /* Protects members marked OVS_GUARDED.
  * Readers only need to hold this mutex.
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index cfc4d41..762c42d 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -1478,7 +1478,7 @@ ofproto_rule_delete(struct ofproto *ofproto, struct rule 
*rule)
  * be killed. */
 ovs_mutex_lock(&ofproto_mutex);
 
-if (!rule->removed) {
+if (rule->state == RULE_INSERTED) {
 /* Make sure there is no postponed removal of the rule. */
 ovs_assert(cls_rule_visible_in_version(&rule->cr, OVS_VERSION_MAX));
 
@@ -4816,7 +4816,7 @@ ofproto_rule_create(struct ofproto *ofproto, struct 
cls_rule *cr,
 return error;
 }
 
-rule->removed = true;   /* Not yet in ofproto data structures. */
+rule->state = RULE_INITIALIZED;
 
 *new_rule = rule;
 return 0;
@@ -8100,7 +8100,8 @@ ofproto_rule_insert__(struct ofproto *ofproto, struct 
rule *rule)
 {
 const struct rule_actions *actions = rule_get_actions(rule);
 
-ovs_assert(rule->removed);
+/* A rule may not be reinserted. */
+ovs_assert(rule->state == RULE_INITIALIZED);
 
 if (rule->hard_timeout || rule->idle_timeout) {
 ovs_list_insert(&ofproto->expirable, &rule->expirable);
@@ -8123,7 +8124,7 @@ ofproto_rule_insert__(struct ofproto *ofproto, struct 
rule *rule)
 }
 }
 
-rule->removed = false;
+rule->state = RULE_INSERTED;
 }
 
 /* Removes 'rule' from the ofproto data structures.  Caller may have deferred
@@ -8132,7 +8133,7 @@ static void
 ofproto_rule_remove__(struct ofproto *ofproto, struct rule *rule)
 OVS_REQUIRES(ofproto_mutex)
 {
-ovs_assert(!rule->removed);
+ovs_assert(rule->state == RULE_INSERTED);
 
 cookies_remove(ofproto, rule);
 
@@ -8168,7 +8169,7 @@ ofproto_rule_remove__(struct ofproto *ofproto, struct 
rule *rule)
 }
 }
 
-rule->removed = true;
+rule->state = RULE_REMOVED;
 }
 
 /* unixctl commands. */
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v3 02/13] tests: Keyword fixes.

2016-09-12 Thread Jarno Rajahalme
Add "bundle" keyword to ofp-print.at tests about bundle messages.
Add a missing ofp-print.at test for bundle group mods.
Remove "monitor" keyword from ofproto.at tests that do not use a monitor.

Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 tests/ofp-print.at | 58 ++
 tests/ofproto.at   | 14 +
 2 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/tests/ofp-print.at b/tests/ofp-print.at
index ff1df39..5d2040b 100644
--- a/tests/ofp-print.at
+++ b/tests/ofp-print.at
@@ -3339,8 +3339,8 @@ NXST_FLOW_MONITOR reply (xid=0x4):
 AT_CLEANUP
 
 
-AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_SETUP([OFPT_BUNDLE_CONTROL - atomic OPEN_REQUEST])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 00 00 01 \
@@ -3350,8 +3350,8 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 ])
 AT_CLEANUP
 
-AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_SETUP([OFPT_BUNDLE_CONTROL - ordered OPEN_REQUEST])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 00 00 02 \
@@ -3361,8 +3361,8 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 ])
 AT_CLEANUP
 
-AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_SETUP([OFPT_BUNDLE_CONTROL - atomic ordered OPEN_REQUEST])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 00 00 03 \
@@ -3373,7 +3373,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - OPEN_REPLY])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 01 00 01 \
@@ -3384,7 +3384,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - CLOSE_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 02 00 01 \
@@ -3395,7 +3395,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - CLOSE_REPLY])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 03 00 01 \
@@ -3406,7 +3406,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - COMMIT_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 04 00 01 \
@@ -3417,7 +3417,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - COMMIT_REPLY])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 05 00 01 \
@@ -3428,7 +3428,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - DISCARD_REQUEST])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 06 00 01 \
@@ -3439,7 +3439,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_CONTROL - DISCARD_REPLY])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 21 00 10 00 00 00 00 \
 00 00 00 01 00 07 00 01 \
@@ -3450,7 +3450,7 @@ OFPT_BUNDLE_CONTROL (OF1.4) (xid=0x0):
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_ADD_MESSAGE - verify xid])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 22 00 20 00 00 00 00 00 00 00 01 00 00 00 01 \
 05 00 00 08 00 00 00 01 00 00 00 00 00 00 00 00 \
@@ -3460,7 +3460,7 @@ OFPT_BUNDLE_ADD_MESSAGE (OF1.4) (xid=0x0): ***decode 
error: OFPBFC_MSG_BAD_XID**
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_ADD_MESSAGE - reject OFPT_HELLO])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl '-vPATTERN:console:%c|%p|%m' ofp-print "\
 05 22 00 20 00 00 00 00 00 00 00 01 00 00 00 01 \
 05 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 \
@@ -3472,7 +3472,7 @@ ofp_util|WARN|OFPT_HELLO message not allowed inside 
OFPT14_BUNDLE_ADD_MESSAGE
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_ADD_MESSAGE - FLOW_MOD])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 22 00 a0 00 00 00 02 00 00 00 01 00 00 00 01 \
 05 0e 00 90 00 00 00 02 00 00 00 00 00 00 00 00 \
@@ -3492,7 +3492,7 @@ OFPT_FLOW_MOD (OF1.4) (xid=0x2): ADD table:1 
priority=65535,arp,in_port=1,vlan_t
 AT_CLEANUP
 
 AT_SETUP([OFPT_BUNDLE_ADD_MESSAGE - PORT_MOD])
-AT_KEYWORDS([ofp-print])
+AT_KEYWORDS([ofp-print bundle])
 AT_CHECK([ovs-ofctl ofp-print "\
 05 22 00 38 00 00 00 03 00 00 00 01 00 00 00 01 \
 05 10 00 28 00 00 00 03 00 00 00 03 00 00 00 00 \
@@ -3508,6 +3508

[ovs-dev] [PATCH v3 08/13] ofproto-dpif-xlate: Add xlate cache type XC_TABLE.

2016-09-12 Thread Jarno Rajahalme
Xlate cache entry type XC_TABLE is required for the table stats
(number of misses and matches) to be correctly attributed.

It appears that table stats have been off ever since xlate cache was
introduced.  This was now revealed by a PACKET_OUT unit test case in a
later patch that checks for table stats explicitly.

Fixes: b256dc52 ("ofproto-dpif-xlate: Cache xlate_actions() effects.")
Signed-off-by: Jarno Rajahalme 
---
v3: Add the NEWS flash.

 NEWS   |  3 +++
 ofproto/ofproto-dpif-xlate-cache.c | 10 ++
 ofproto/ofproto-dpif-xlate-cache.h |  6 ++
 ofproto/ofproto-dpif-xlate.c   |  6 +++---
 ofproto/ofproto-dpif.c | 35 ++-
 ofproto/ofproto-dpif.h |  8 +++-
 6 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index 9e8034a..7f501d6 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,8 @@
 Post-v2.6.0
 -
+   - Fixed regression in table stats maintenance introduced in OVS
+ 2.3.0, wherein the number of OpenFlow table hits and misses was
+ not accurate.
- OpenFlow:
  * Bundles now expire after 10 seconds since the last time the
bundle was either opened, modified, or closed.
diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index ebcfa8a..ff32a99 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -88,6 +88,14 @@ xlate_push_stats_entry(struct xc_entry *entry,
 struct eth_addr dmac;
 
 switch (entry->type) {
+case XC_TABLE:
+ofproto_dpif_credit_table_stats(entry->u.table.ofproto,
+entry->u.table.id,
+entry->u.table.match
+? stats->n_packets : 0,
+entry->u.table.match
+? 0 : stats->n_packets);
+break;
 case XC_RULE:
 rule_dpif_credit_stats(entry->u.rule, stats);
 break;
@@ -178,6 +186,8 @@ void
 xlate_cache_clear_entry(struct xc_entry *entry)
 {
 switch (entry->type) {
+case XC_TABLE:
+break;
 case XC_RULE:
 rule_dpif_unref(entry->u.rule);
 break;
diff --git a/ofproto/ofproto-dpif-xlate-cache.h 
b/ofproto/ofproto-dpif-xlate-cache.h
index 30e5c75..d9e0e52 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -40,6 +40,7 @@ struct group_dpif;
 struct ofputil_bucket;
 
 enum xc_type {
+XC_TABLE,
 XC_RULE,
 XC_BOND,
 XC_NETDEV,
@@ -64,6 +65,11 @@ enum xc_type {
 struct xc_entry {
 enum xc_type type;
 union {
+struct {
+struct ofproto_dpif *ofproto;
+uint8_t id;
+boolmatch; /* or miss. */
+} table;
 struct rule_dpif *rule;
 struct {
 struct netdev *tx;
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 11d3732..cf8d1fc 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3188,8 +3188,8 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t 
in_port, uint8_t table_id,
&ctx->xin->flow, ctx->wc,
ctx->xin->resubmit_stats,
&ctx->table_id, in_port,
-   may_packet_in, honor_table_miss);
-
+   may_packet_in, honor_table_miss,
+   ctx->xin->xcache);
 if (OVS_UNLIKELY(ctx->xin->resubmit_hook)) {
 ctx->xin->resubmit_hook(ctx->xin, rule, ctx->indentation + 1);
 }
@@ -5336,7 +5336,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out 
*xout)
 ctx.rule = rule_dpif_lookup_from_table(
 ctx.xbridge->ofproto, ctx.tables_version, flow, ctx.wc,
 ctx.xin->resubmit_stats, &ctx.table_id,
-flow->in_port.ofp_port, true, true);
+flow->in_port.ofp_port, true, true, ctx.xin->xcache);
 if (ctx.xin->resubmit_stats) {
 rule_dpif_credit_stats(ctx.rule, ctx.xin->resubmit_stats);
 }
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index d928f01..b8171a7 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -49,6 +49,7 @@
 #include "ofproto-dpif-sflow.h"
 #include "ofproto-dpif-upcall.h"
 #include "ofproto-dpif-xlate.h"
+#include "ofproto-dpif-xlate-cache.h"
 #include "openvswitch/ofp-actions.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/meta-flow.h"
@@ -3914,6 +3915,21 @@ rule_dpif_lookup_in_table(struct ofproto_dpif *ofproto, 
ovs_v

[ovs-dev] [PATCH v3 09/13] ofproto: Use ofproto_flow_mod for learn execution from xlate cache.

2016-09-12 Thread Jarno Rajahalme
Use ofproto_flow_mod with a reference to an existing or new rule
instead of ofputil_flow_mod for learn action execution from xlate
cache

Typically we would find that when a learn xlate cache entry is
created, a preceding upcall has already created the learned flow.  In
this case the xlate cache entry takes a reference to that flow and
keeps refreshing it without needing to perform any flow table lookups.
Otherwise the creation of the xlate cache entry creates the new rule,
which is then subsequently added to the classifier.  In both cases
this is both faster and shrinks the memory cost of each learn cache
entry from ~3.5kb to about 0.3kb.

If the learned rule does not yet exist, it is created and attached to
the ofproto_flow_mod, from which it is then added.  If the referred
rule happens to expire, or is modified in any way and is thus removed
from the classifier tables, we create a new rule using the old rule as
a template, so that we can avoid storing the ofputil_flow_mod in all
cases.

Signed-off-by: Jarno Rajahalme 
---
v3: Fixed error handling, simplified reference keeping.

 ofproto/ofproto-dpif-xlate-cache.c |  14 +-
 ofproto/ofproto-dpif-xlate-cache.h |   4 +-
 ofproto/ofproto-dpif-xlate.c   |  47 ---
 ofproto/ofproto-dpif.c |  17 ++-
 ofproto/ofproto-dpif.h |   5 +-
 ofproto/ofproto-provider.h |   6 +
 ofproto/ofproto.c  | 253 +++--
 7 files changed, 242 insertions(+), 104 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index ff32a99..eabdad7 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -115,9 +115,15 @@ xlate_push_stats_entry(struct xc_entry *entry,
 entry->u.mirror.mirrors,
 stats->n_packets, stats->n_bytes);
 break;
-case XC_LEARN:
-ofproto_dpif_flow_mod(entry->u.learn.ofproto, entry->u.learn.fm);
+case XC_LEARN: {
+enum ofperr error;
+error = ofproto_flow_mod_learn(entry->u.learn.ofm, true);
+if (error) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(&rl, "xcache LEARN action execution failed.");
+}
 break;
+}
 case XC_NORMAL:
 xlate_mac_learning_update(entry->u.normal.ofproto,
   entry->u.normal.in_port,
@@ -205,8 +211,8 @@ xlate_cache_clear_entry(struct xc_entry *entry)
 mbridge_unref(entry->u.mirror.mbridge);
 break;
 case XC_LEARN:
-free(entry->u.learn.fm);
-ofpbuf_delete(entry->u.learn.ofpacts);
+ofproto_flow_mod_uninit(entry->u.learn.ofm);
+free(entry->u.learn.ofm);
 break;
 case XC_NORMAL:
 break;
diff --git a/ofproto/ofproto-dpif-xlate-cache.h 
b/ofproto/ofproto-dpif-xlate-cache.h
index d9e0e52..b32a02a 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -91,9 +91,7 @@ struct xc_entry {
 uint16_t vid;
 } bond;
 struct {
-struct ofproto_dpif *ofproto;
-struct ofputil_flow_mod *fm;
-struct ofpbuf *ofpacts;
+struct ofproto_flow_mod *ofm;
 } learn;
 struct {
 struct ofproto_dpif *ofproto;
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index cf8d1fc..9c534be 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3981,37 +3981,42 @@ xlate_bundle_action(struct xlate_ctx *ctx,
 }
 
 static void
-xlate_learn_action__(struct xlate_ctx *ctx, const struct ofpact_learn *learn,
- struct ofputil_flow_mod *fm, struct ofpbuf *ofpacts)
-{
-learn_execute(learn, &ctx->xin->flow, fm, ofpacts);
-if (ctx->xin->may_learn) {
-ofproto_dpif_flow_mod(ctx->xbridge->ofproto, fm);
-}
-}
-
-static void
 xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
 {
 learn_mask(learn, ctx->wc);
 
-if (ctx->xin->xcache) {
-struct xc_entry *entry;
-
-entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
-entry->u.learn.ofproto = ctx->xbridge->ofproto;
-entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm);
-entry->u.learn.ofpacts = ofpbuf_new(64);
-xlate_learn_action__(ctx, learn, entry->u.learn.fm,
- entry->u.learn.ofpacts);
-} else if (ctx->xin->may_learn) {
+if (ctx->xin->xcache || ctx->xin->may_learn) {
 uint64_t ofpacts_stub[1024 / 8];
 struct ofputil_flow_mod fm;
+struct ofproto_flow_mod ofm__, *ofm;
 struct ofpbuf ofpacts;
+enum ofperr error;
+
+if (ctx->xin->xcache) {
+struct xc_entry *entry;
+
+ 

[ovs-dev] [PATCH v3 07/13] ofproto-dpif-xlate: Expose xlate cache.

2016-09-12 Thread Jarno Rajahalme
Later patches will need to create xlate cache entries from different
modules.  This patch refactors the xlate cache code in preparation
without any functional changes, so that the changes are clearly
visible in the following patches.

The definition of XC_ENTRY_FOR_EACH() iterator macro is changed so
that it now does not take the xlate cache pointer to unify the usage
accross all call sites.

Signed-off-by: Jarno Rajahalme 
---
v3: Separated xlate cache to it's own module.

 ofproto/automake.mk|   2 +
 ofproto/ofproto-dpif-upcall.c  |   1 +
 ofproto/ofproto-dpif-xlate-cache.c | 239 +
 ofproto/ofproto-dpif-xlate-cache.h | 132 +++
 ofproto/ofproto-dpif-xlate.c   | 262 +
 ofproto/ofproto-dpif-xlate.h   |   7 +-
 6 files changed, 379 insertions(+), 264 deletions(-)
 create mode 100644 ofproto/ofproto-dpif-xlate-cache.c
 create mode 100644 ofproto/ofproto-dpif-xlate-cache.h

diff --git a/ofproto/automake.mk b/ofproto/automake.mk
index 7486f2b..5a83a60 100644
--- a/ofproto/automake.mk
+++ b/ofproto/automake.mk
@@ -43,6 +43,8 @@ ofproto_libofproto_la_SOURCES = \
ofproto/ofproto-dpif-xlate.c \
ofproto/ofproto-dpif-xlate.h \
ofproto/ofproto-provider.h \
+   ofproto/ofproto-dpif-xlate-cache.c \
+   ofproto/ofproto-dpif-xlate-cache.h \
ofproto/pinsched.c \
ofproto/pinsched.h \
ofproto/tunnel.c \
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 565de5b..3080919 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -33,6 +33,7 @@
 #include "ofproto-dpif-ipfix.h"
 #include "ofproto-dpif-sflow.h"
 #include "ofproto-dpif-xlate.h"
+#include "ofproto-dpif-xlate-cache.h"
 #include "ovs-rcu.h"
 #include "packets.h"
 #include "poll-loop.h"
diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
new file mode 100644
index 000..ebcfa8a
--- /dev/null
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -0,0 +1,239 @@
+/* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License. */
+
+#include 
+
+#include "ofproto/ofproto-dpif-xlate-cache.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "bfd.h"
+#include "bitmap.h"
+#include "bond.h"
+#include "bundle.h"
+#include "byte-order.h"
+#include "coverage.h"
+#include "dp-packet.h"
+#include "dpif.h"
+#include "learn.h"
+#include "mac-learning.h"
+#include "netdev-vport.h"
+#include "ofproto/ofproto-dpif-mirror.h"
+#include "ofproto/ofproto-dpif.h"
+#include "ofproto/ofproto-dpif-xlate.h"
+#include "ofproto/ofproto-provider.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "ovs-router.h"
+#include "packets.h"
+#include "tnl-neigh-cache.h"
+#include "util.h"
+
+VLOG_DEFINE_THIS_MODULE(ofproto_xlate_cache);
+
+struct xlate_cache *
+xlate_cache_new(void)
+{
+struct xlate_cache *xcache = xmalloc(sizeof *xcache);
+
+ofpbuf_init(&xcache->entries, 512);
+return xcache;
+}
+
+struct xc_entry *
+xlate_cache_add_entry(struct xlate_cache *xcache, enum xc_type type)
+{
+struct xc_entry *entry;
+
+entry = ofpbuf_put_zeros(&xcache->entries, sizeof *entry);
+entry->type = type;
+
+return entry;
+}
+
+static void
+xlate_cache_netdev(struct xc_entry *entry, const struct dpif_flow_stats *stats)
+{
+if (entry->u.dev.tx) {
+netdev_vport_inc_tx(entry->u.dev.tx, stats);
+}
+if (entry->u.dev.rx) {
+netdev_vport_inc_rx(entry->u.dev.rx, stats);
+}
+if (entry->u.dev.bfd) {
+bfd_account_rx(entry->u.dev.bfd, stats);
+}
+}
+
+/* Push stats and perform side effects of flow translation. */
+void
+xlate_push_stats_entry(struct xc_entry *entry,
+   const struct dpif_flow_stats *stats)
+{
+struct eth_addr dmac;
+
+switch (entry->type) {
+case XC_RULE:
+rule_dpif_credit_stats(entry->u.rule, stats);
+break;
+case XC_BOND:
+bond_account(entry->u.bond.bo

[ovs-dev] [PATCH v3 05/13] connmgr: Make connmgr_wants_packet_in_on_miss() lock-free.

2016-09-12 Thread Jarno Rajahalme
Make connmgr_wants_packet_in_on_miss() use an atomic int instead of a
list traversal taking the 'ofproto_mutex'.  This allows
connmgr_wants_packet_in_on_miss() to be called also when
'ofproto_mutex' is already held, and makes it faster, too.

Remove unused ofproto_dpif_wants_packet_in_on_miss().

Signed-off-by: Jarno Rajahalme 
---
v3: Fix the totally broken behavior with a help of a per-ofconn boolean.

 ofproto/connmgr.c  | 80 ++
 ofproto/ofproto-dpif.c | 12 
 ofproto/ofproto-dpif.h |  1 -
 3 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 6014899..11ac08a 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -32,6 +32,7 @@
 #include "openvswitch/ofpbuf.h"
 #include "openvswitch/vconn.h"
 #include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
 #include "pinsched.h"
 #include "poll-loop.h"
 #include "rconn.h"
@@ -65,6 +66,7 @@ struct ofconn {
 enum ofconn_type type;  /* Type. */
 enum ofproto_band band; /* In-band or out-of-band? */
 bool enable_async_msgs; /* Initially enable async messages? */
+bool want_packet_in_on_miss;
 
 /* State that should be cleared from one connection to the next. */
 
@@ -219,6 +221,8 @@ struct connmgr {
 struct sockaddr_in *extra_in_band_remotes;
 size_t n_extra_remotes;
 int in_band_queue;
+
+ATOMIC(int) want_packet_in_on_miss;   /* Sum of ofconns' values. */
 };
 
 static void update_in_band_remotes(struct connmgr *);
@@ -258,9 +262,46 @@ connmgr_create(struct ofproto *ofproto,
 mgr->n_extra_remotes = 0;
 mgr->in_band_queue = -1;
 
+atomic_init(&mgr->want_packet_in_on_miss, 0);
+
 return mgr;
 }
 
+/* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the
+ * packet rather than to send the packet to the controller.
+ *
+ * This function maintains the count of pre-OpenFlow1.3 with controller_id 0,
+ * as we assume these are the controllers that should receive "table-miss"
+ * notifications. */
+static void
+update_want_packet_in_on_miss(struct ofconn *ofconn)
+{
+int want_packet_in_on_miss_delta = 0;
+
+/* Set ofconn->want_packet_in_on_miss when controller_id is zero
+ * and OpenFlow is lower than version 1.3, otherwise clear it. */
+if (ofconn->controller_id == 0 &&
+(ofconn->protocol == OFPUTIL_P_NONE ||
+ ofputil_protocol_to_ofp_version(ofconn->protocol) < OFP13_VERSION)) {
+if (!ofconn->want_packet_in_on_miss) {
+ofconn->want_packet_in_on_miss = true;
+want_packet_in_on_miss_delta = 1;
+}
+} else {
+if (ofconn->want_packet_in_on_miss) {
+ofconn->want_packet_in_on_miss = false;
+want_packet_in_on_miss_delta = -1;
+}
+}
+if (want_packet_in_on_miss_delta) {
+int count;
+atomic_read_relaxed(&ofconn->connmgr->want_packet_in_on_miss,
+&count);
+atomic_store_relaxed(&ofconn->connmgr->want_packet_in_on_miss,
+ count + want_packet_in_on_miss_delta);
+}
+}
+
 /* Frees 'mgr' and all of its resources. */
 void
 connmgr_destroy(struct connmgr *mgr)
@@ -1001,6 +1042,7 @@ void
 ofconn_set_protocol(struct ofconn *ofconn, enum ofputil_protocol protocol)
 {
 ofconn->protocol = protocol;
+update_want_packet_in_on_miss(ofconn);
 }
 
 /* Returns the currently configured packet in format for 'ofconn', one of
@@ -1030,6 +1072,7 @@ void
 ofconn_set_controller_id(struct ofconn *ofconn, uint16_t controller_id)
 {
 ofconn->controller_id = controller_id;
+update_want_packet_in_on_miss(ofconn);
 }
 
 /* Returns the default miss send length for 'ofconn'. */
@@ -1304,6 +1347,11 @@ ofconn_destroy(struct ofconn *ofconn)
 {
 ofconn_flush(ofconn);
 
+/* Force clearing of want_packet_in_on_miss to keep the global count
+ * accurate. */
+ofconn->controller_id = 1;
+update_want_packet_in_on_miss(ofconn);
+
 if (ofconn->type == OFCONN_PRIMARY) {
 hmap_remove(&ofconn->connmgr->controllers, &ofconn->hmap_node);
 }
@@ -1492,37 +1540,17 @@ ofconn_receives_async_msg(const struct ofconn *ofconn,
 return (masks[type] & (1u << reason)) != 0;
 }
 
-/* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the
- * packet rather than to send the packet to the controller.
- *
- * This function returns true to indicate that a packet_in message
+/* This function returns true to indicate that a packet_in message
  * for a "table-miss" should be sent to at least one controller.
- * That is there is at least one controller with controller_id 0
- * which connected using an OpenFlow v

[ovs-dev] [PATCH v3 06/13] lib: Refactor mac-learning updates.

2016-09-12 Thread Jarno Rajahalme
Make mac table update functions part of the mac-learning module, which
also helps in figuring what is the minimal set of struct flow fields
needed for the update.  Use this to change the xlate cache entry for
XC_NORMAL to not take a copy of the struct flow, but only save the
in_port, dl_src, and some auxiliary fields.  This reduces the memory
burden of XC_NORMAL by roughly 0.5kb.

Signed-off-by: Jarno Rajahalme 
---
v3: New patch for v3.

 lib/mac-learning.c   | 126 +
 lib/mac-learning.h   |   4 ++
 ofproto/ofproto-dpif-xlate.c | 147 +--
 3 files changed, 159 insertions(+), 118 deletions(-)

diff --git a/lib/mac-learning.c b/lib/mac-learning.c
index 5a2ccba..5509f22 100644
--- a/lib/mac-learning.c
+++ b/lib/mac-learning.c
@@ -335,6 +335,132 @@ mac_learning_insert(struct mac_learning *ml,
 return e;
 }
 
+/* Checks whether a MAC learning update is necessary for MAC learning table
+ * 'ml' given that a packet matching 'src' was received on 'in_port' in 'vlan',
+ * and given that the packet was gratuitous ARP if 'is_gratuitous_arp' is
+ * 'true' and 'in_port' is a bond port if 'is_bond' is 'true'.
+ *
+ * Most packets processed through the MAC learning table do not actually
+ * change it in any way.  This function requires only a read lock on the MAC
+ * learning table, so it is much cheaper in this common case.
+ *
+ * Keep the code here synchronized with that in update_learning_table__()
+ * below. */
+static bool
+is_mac_learning_update_needed(const struct mac_learning *ml,
+  struct eth_addr src, int vlan,
+  bool is_gratuitous_arp, bool is_bond,
+  void *in_port)
+OVS_REQ_RDLOCK(ml->rwlock)
+{
+struct mac_entry *mac;
+
+if (!mac_learning_may_learn(ml, src, vlan)) {
+return false;
+}
+
+mac = mac_learning_lookup(ml, src, vlan);
+if (!mac || mac_entry_age(ml, mac)) {
+return true;
+}
+
+if (is_gratuitous_arp) {
+/* We don't want to learn from gratuitous ARP packets that are
+ * reflected back over bond slaves so we lock the learning table.  For
+ * more detail, see the bigger comment in update_learning_table__(). */
+if (!is_bond) {
+return true;   /* Need to set the gratuitous ARP lock. */
+} else if (mac_entry_is_grat_arp_locked(mac)) {
+return false;
+}
+}
+
+return mac_entry_get_port(ml, mac) != in_port /* ofbundle */;
+}
+
+/* Updates MAC learning table 'ml' given that a packet matching 'src' was
+ * received on 'in_port' in 'vlan', and given that the packet was gratuitous
+ * ARP if 'is_gratuitous_arp' is 'true' and 'in_port' is a bond port if
+ * 'is_bond' is 'true'.
+ *
+ * This code repeats all the checks in is_mac_learning_update_needed() because
+ * the lock was released between there and here and thus the MAC learning state
+ * could have changed.
+ *
+ * Returns 'true' if 'ml' was updated, 'false' otherwise.
+ *
+ * Keep the code here synchronized with that in is_mac_learning_update_needed()
+ * above. */
+static bool
+update_learning_table__(struct mac_learning *ml, struct eth_addr src,
+int vlan, bool is_gratuitous_arp, bool is_bond,
+void *in_port)
+OVS_REQ_WRLOCK(ml->rwlock)
+{
+struct mac_entry *mac;
+
+if (!mac_learning_may_learn(ml, src, vlan)) {
+return false;
+}
+
+mac = mac_learning_insert(ml, src, vlan);
+if (is_gratuitous_arp) {
+/* Gratuitous ARP packets received over non-bond interfaces could be
+ * reflected back over bond slaves.  We don't want to learn from these
+ * reflected packets, so we lock each entry for which a gratuitous ARP
+ * packet was received over a non-bond interface and refrain from
+ * learning from gratuitous ARP packets that arrive over bond
+ * interfaces for this entry while the lock is in effect.  See
+ * vswitchd/INTERNALS for more in-depth discussion on this topic. */
+if (!is_bond) {
+mac_entry_set_grat_arp_lock(mac);
+} else if (mac_entry_is_grat_arp_locked(mac)) {
+return false;
+}
+}
+
+if (mac_entry_get_port(ml, mac) != in_port) {
+mac_entry_set_port(ml, mac, in_port);
+return true;
+}
+return false;
+}
+
+/* Updates MAC learning table 'ml' given that a packet matching 'src' was
+ * received on 'in_port' in 'vlan', and given that the packet was gratuitous
+ * ARP if 'is_gratuitous_arp' is 'true' and 'in_port' is a bond port if
+ * 'is_bo

  1   2   3   4   5   6   7   8   9   10   >