date:20170804

Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr substitute.

2017-08-04 Thread Darrell Ball

-Original Message-
From:  on behalf of Ben Pfaff 
Date: Friday, August 4, 2017 at 2:11 PM
To: Darrell Ball 
Cc: "d...@openvswitch.org" 
Subject: Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr  
substitute.

On Thu, Aug 03, 2017 at 09:34:41PM -0700, Darrell Ball wrote:
> strcasestr is not defined for Windows, so implement a version
> that could be used on Windows. This is needed for an upcoming
> patch.
> 
> Signed-off-by: Darrell Ball 

Thanks!

I don't understand why this uses a macro to change the name of
strcasestr to strcasestr_s.  I don't know of a benefit to that.  Maybe
it follows the pattern in string.h.in that substitutes strerror_s for
strerror_r and strtok_s to strtok_r, but those cases are because the
Windows C library has functions named strerror_s and strtok_s that do
what OVS wants.  The Windows C library does not (as far as I know) have
a function strcasestr_s.

The header should provide a prototype for the function.  It doesn't do
clients any good to provide the prototype in the .c file.

Well, there is an explanation for everything – although it may not be a good 
one : ) ;
my intention was to provide a substitute with a separate name and use it for
Windows presently (with the remapping of strcasestr to strcasestr_s) but in the 
most
limited scope of source file usage.
However, I agree it is better just to provide a Windows version outright.

The implementation should follow POSIX in that searching for an empty
substring should always succeed; otherwise, eventually we will end up
with confusion.  See:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strstr.html

That was part of the reason for V9 - to support empty substr find success.
The V9 patch does handle the empty substring success case.
However, if both str and substr are empty, it should still be ‘success’; your 
version
does that; mine version returns NULL, which is non-compliant. 

My understanding is that it is not a good idea, legally, to replace
non-contiguous ranges of years by a range in a copyright notice.

ohh; thanks

Here is my suggestion.  What do you think?

Nice
I was wondering if the below incremental looked ok to you ?
I like the two do/while loops for consistency, affirmative character matching 
and the explicit else ? 
It seems more definitional, but it might be slower and is not as compact, but 
either way is fine with me ?

+#ifdef _WIN32
+char *
+strcasestr(const char *str, const char *substr)
+{
+size_t i = 0;
+do {
+do {
+if (!substr[i]) {
+return CONST_CAST(char *, str);
+} else if (tolower(substr[i]) == tolower(str[i])) {
+i++;
+} else {
+i = 0;
+break;
+}
+} while (1);
+} while (*str++);
+return NULL;
+}
+#endif

Thanks,

Ben.

--8<--cut here-->8--

From: Darrell Ball 
Date: Thu, 3 Aug 2017 21:34:41 -0700
Subject: [PATCH] string: Implement strcasestr substitute.

strcasestr is not defined for Windows, so implement a version
that could be used on Windows. This is needed for an upcoming
patch.

Signed-off-by: Darrell Ball 
Signed-off-by: Ben Pfaff 
---
 lib/string.c| 23 +--
 lib/string.h.in |  4 +++-
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 082359d858d8..a9ceabe47e9f 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009, 2011 Nicira, Inc.
+ * Copyright (c) 2009, 2011, 2017 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -15,9 +15,11 @@
  */

 #include 
-
+#include 
 #include 

+#include "util.h"
+
 #ifndef HAVE_STRNLEN
 size_t
 strnlen(const char *s, size_t maxlen)
@@ -26,3 +28,20 @@ strnlen(const char *s, size_t maxlen)
 return end ? end - s : maxlen;
 }
 #endif
+
+#ifdef _WIN32
+char *
+strcasestr(const char *str, const char *substr)
+{
+do {
+for (size_t i = 0; ; i++) {
+if (!substr[i]) {
+return CONST_CAST(char *, str);
+} else if (tolower(substr[i]) != tolower(str[i])) {
+break;
+}
+}
+} while (*str++);
+return NULL;
+}
+#endif
diff --git

Re: [ovs-dev] [PATCH] ovsdb-server: Document clarification for some bad wording in RFC 7047.

2017-08-04 Thread Ben Pfaff

On Fri, Aug 04, 2017 at 01:45:56PM -0700, Andy Zhou wrote:
> On Thu, Jul 27, 2017 at 4:20 PM, Ben Pfaff  wrote:
> > Reported-by: Harish Kanakaraju 
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Andy Zhou 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [patch_v9 3/6] Userspace Datapath: Add TFTP support.

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:34:43PM -0700, Darrell Ball wrote:
> Both ipv4 and ipv6 are supported. Also, NAT support is included.
> 
> Signed-off-by: Darrell Ball 

I'm happy with this patch and the rest of them.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [patch_v9 2/6] Userspace Datapath: Add ALG infra and FTP.

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:34:42PM -0700, Darrell Ball wrote:
> ALG infra and FTP (both V4 and V6) support is added to the userspace
> datapath.  Also, NAT support is included.
> 
> Signed-off-by: Darrell Ball 

Thanks for implementing this.  I have only a few minor comments.

struct conn_key has at least one hole in it, between 'nw_proto' and
'zone'.  That makes it risky, at best, to do byte-by-byte comparisons of
two instances of it with memcmp(), but expectation_lookup() does such a
comparison.  It would probably be better to do a member-by-member
comparison, or to carefully rearrange struct conn_key to avoid holes.

It doesn't make sense to me to have a strcasestr_s() prototype in
conntrack.c.  I think it can be removed.

As a possible direction for the future, usually the need to read-lock a
data structure can be eliminated by using RCU.

I'm appending some minor suggestions that help to better conform to the
usual OVS style or that made the code easier for me to read and
understand.

--8<--cut here-->8--

diff --git a/lib/conntrack.c b/lib/conntrack.c
index be7d0623b24f..a05c54019bc9 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -66,8 +66,6 @@ enum ct_alg_mode {
 CT_FTP_MODE_PASSIVE,
 };
 
-char *strcasestr_s(const char *str, const char *substr);
-
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -333,7 +331,6 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const 
struct conn *conn,
 static uint8_t
 get_ip_proto(const struct dp_packet *pkt)
 {
-
 uint8_t ip_proto;
 struct eth_header *l2 = dp_packet_eth(pkt);
 if (l2->eth_type == htons(ETH_TYPE_IPV6)) {
@@ -1178,18 +1175,16 @@ sweep_bucket(struct conntrack *ct, struct 
conntrack_bucket *ctb,
 }
 }
 
-#define MAX_ALG_EXP_TO_EXPIRE 1000
+enum { MAX_ALG_EXP_TO_EXPIRE = 1000 };
 size_t alg_exp_count = hmap_count(>alg_expectations);
 /* XXX: revisit this. */
-size_t max_to_expire =
-MAX(alg_exp_count/10, MAX_ALG_EXP_TO_EXPIRE);
+size_t max_to_expire = MAX(alg_exp_count / 10, MAX_ALG_EXP_TO_EXPIRE);
 count = 0;
 ct_rwlock_wrlock(>resources_lock);
 struct alg_exp_node *alg_exp_node, *alg_exp_node_next;
 LIST_FOR_EACH_SAFE (alg_exp_node, alg_exp_node_next,
 exp_node, >alg_exp_list) {
-if (now < alg_exp_node->expiration ||
-count >= max_to_expire) {
+if (now < alg_exp_node->expiration || count >= max_to_expire) {
 min_expiration = MIN(min_expiration, alg_exp_node->expiration);
 break;
 }
@@ -2355,7 +2350,6 @@ static struct alg_exp_node *
 expectation_lookup(struct hmap *alg_expectations,
const struct conn_key *key, uint32_t basis)
 {
-
 struct conn_key check_key = *key;
 check_key.src.port = ALG_WC_SRC_PORT;
 struct alg_exp_node *alg_exp_node;
@@ -2410,7 +2404,7 @@ expectation_create(struct conntrack *ct,
 alg_exp_node->master_mark = master_conn->mark;
 alg_exp_node->master_label = master_conn->label;
 alg_exp_node->master_key = master_conn->key;
-alg_exp_node->passive_mode = mode == CT_FTP_MODE_PASSIVE ? true : false;
+alg_exp_node->passive_mode = mode == CT_FTP_MODE_PASSIVE;
 /* Take the write lock here because it is almost 100%
  * likely that the lookup will fail and
  * expectation_create() will be called below. */
@@ -2458,7 +2452,7 @@ repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 
v4_addr_rep,
  char *ftp_data_start,
  size_t addr_offset_from_ftp_data_start)
 {
-#define MAX_FTP_V4_NAT_DELTA 8
+enum { MAX_FTP_V4_NAT_DELTA = 8 };
 
 /* Do conservative check for pathological MTU usage. */
 uint32_t orig_used_size = dp_packet_size(pkt);
@@ -2475,26 +2469,25 @@ repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 
v4_addr_rep,
 
 int overall_delta = 0;
 char *byte_str = ftp_data_start + addr_offset_from_ftp_data_start;
-char *next_delim;
-size_t substr_size;
-uint8_t rep_byte;
-char rep_str[4];
-size_t replace_size;
 
+/* Replace the existing IPv4 address by the new one. */
 for (uint8_t i = 0; i < 4; i++) {
-memset(rep_str, 0 , sizeof rep_str);
-next_delim = memchr(byte_str,',',4);
+/* Find the end of the string for this octet. */
+char *next_delim = memchr(byte_str, ',', 4);
 ovs_assert(next_delim);
-substr_size = next_delim - byte_str;
+int substr_size = next_delim - byte_str;
 remain_size -= substr_size;
-rep_byte = get_v4_byte_be(v4_addr_rep, i);
-replace_size = sprintf(rep_str, "%d", rep_byte);
-ovs_assert(replace_size == strlen(rep_str));
+
+/* Compose the new string for this octet, and replace it. */
+char rep_str[4];
+uint8_t

Re: [ovs-dev] [PATCH v3 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

2017-08-04 Thread Greg Rose


On 08/01/2017 08:58 AM, Kevin Traynor wrote:

Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produced a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor 


Kevin,

Thanks for your work on openvswitch.

Unfortunately, this patch fails when applied to the master branch.  I didn't 
see a 'From:' sha id so I don't know which commit to base this series upon.

Here's the error:

Applying: dpif-netdev: Count the rxq processing cycles for an rxq.
Applying patch #796309 using 'git am'
Description: [ovs-dev,v3,4/6] dpif-netdev: Change rxq_scheduling to use rxq 
processing cycles.
Applying: dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
error: patch failed: lib/dpif-netdev.c:3306
error: lib/dpif-netdev.c: patch does not apply
Patch failed at 0001 dpif-netdev: Change rxq_scheduling to use rxq processing 
cycles.

I guess we need the series to be rebased.

Thanks,

- Greg


---
  Documentation/howto/dpdk.rst |  7 +
  lib/dpif-netdev.c| 73 +++-
  2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index af01d3e..a969285 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned 
will become
thread.

+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
  QoS
  ---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 25a521a..a05e586 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
  }

+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+struct dp_netdev_rxq * qa;
+struct dp_netdev_rxq * qb;
+
+qa = *(struct dp_netdev_rxq **) a;
+qb = *(struct dp_netdev_rxq **) b;
+
+if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+return -1;
+}
+
+return 1;
+}
+
  /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
   * queues and marks the pmds as isolated.  Otherwise, assign non isolated
   * pmds to unpinned queues.
   *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
   * The function doesn't touch the pmd threads, it just stores the assignment
   * in the 'pmd' member of each rxq. */
@@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
  struct dp_netdev_port *port;
  struct rr_numa_list rr;
-
-rr_numa_list_populate(dp, );
+struct dp_netdev_rxq ** rxqs = NULL;
+int i, n_rxqs = 0;
+struct rr_numa *numa = NULL;
+int numa_id;

  HMAP_FOR_EACH (port, node, >ports) {
-struct rr_numa *numa;
-int numa_id;
-
  if (!netdev_is_pmd(port->netdev)) {
  continue;
  }

-numa_id = netdev_get_numa_id(port->netdev);
-numa = rr_numa_list_lookup(, numa_id);
-
  for (int qid = 0; qid < port->n_rxq; qid++) {
  struct dp_netdev_rxq *q = >rxqs[qid];
@@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
  }
  } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-if (!numa) {
-VLOG_WARN("There's no available (non isolated) pmd thread "
-  "on numa node %d. Queue %d on port \'%s\' will "
-  "not be polled.",
-  numa_id, qid, netdev_get_name(port->netdev));
+if (n_rxqs == 0) {
+rxqs = xmalloc(sizeof *rxqs);
  } else {
-q->pmd = rr_numa_get_pmd(numa);
+rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
  }
+/* Store the queue. */
+rxqs[n_rxqs++] = q;
  }
  }
  }

+if (n_rxqs > 1) {
+/* Sort the queues in order of the processing cycles
+ * they consumed during their last pmd interval. */
+qsort(rxqs,

Re: [ovs-dev] [patch_v9 1/6] string: Implement strcasestr substitute.

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:34:41PM -0700, Darrell Ball wrote:
> strcasestr is not defined for Windows, so implement a version
> that could be used on Windows. This is needed for an upcoming
> patch.
> 
> Signed-off-by: Darrell Ball 

Thanks!

I don't understand why this uses a macro to change the name of
strcasestr to strcasestr_s.  I don't know of a benefit to that.  Maybe
it follows the pattern in string.h.in that substitutes strerror_s for
strerror_r and strtok_s to strtok_r, but those cases are because the
Windows C library has functions named strerror_s and strtok_s that do
what OVS wants.  The Windows C library does not (as far as I know) have
a function strcasestr_s.

The header should provide a prototype for the function.  It doesn't do
clients any good to provide the prototype in the .c file.

The implementation should follow POSIX in that searching for an empty
substring should always succeed; otherwise, eventually we will end up
with confusion.  See:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strstr.html

My understanding is that it is not a good idea, legally, to replace
non-contiguous ranges of years by a range in a copyright notice.

Here is my suggestion.  What do you think?

Thanks,

Ben.

--8<--cut here-->8--

From: Darrell Ball 
Date: Thu, 3 Aug 2017 21:34:41 -0700
Subject: [PATCH] string: Implement strcasestr substitute.

strcasestr is not defined for Windows, so implement a version
that could be used on Windows. This is needed for an upcoming
patch.

Signed-off-by: Darrell Ball 
Signed-off-by: Ben Pfaff 
---
 lib/string.c| 23 +--
 lib/string.h.in |  4 +++-
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 082359d858d8..a9ceabe47e9f 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009, 2011 Nicira, Inc.
+ * Copyright (c) 2009, 2011, 2017 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -15,9 +15,11 @@
  */

 #include 
-
+#include 
 #include 

+#include "util.h"
+
 #ifndef HAVE_STRNLEN
 size_t
 strnlen(const char *s, size_t maxlen)
@@ -26,3 +28,20 @@ strnlen(const char *s, size_t maxlen)
 return end ? end - s : maxlen;
 }
 #endif
+
+#ifdef _WIN32
+char *
+strcasestr(const char *str, const char *substr)
+{
+do {
+for (size_t i = 0; ; i++) {
+if (!substr[i]) {
+return CONST_CAST(char *, str);
+} else if (tolower(substr[i]) != tolower(str[i])) {
+break;
+}
+}
+} while (*str++);
+return NULL;
+}
+#endif
diff --git a/lib/string.h.in b/lib/string.h.in
index bbdaeb43e612..59998aa36fc4 100644
--- a/lib/string.h.in
+++ b/lib/string.h.in
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009, 2011, 2013 Nicira, Inc.
+ * Copyright (c) 2009-2017 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -36,6 +36,8 @@
 #define strcasecmp _stricmp
 #define strncasecmp _strnicmp
 #define strerror_r(errnum, buf, buflen) strerror_s(buf, buflen, errnum)
+
+char *strcasestr(const char *, const char *);
 #endif

 #ifndef HAVE_STRNLEN
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] debian.rst: Clarify that "dpkg" needs manual help with dependencies.

2017-08-04 Thread Andy Zhou

On Thu, Jul 6, 2017 at 4:47 PM, Ben Pfaff  wrote:
> On Mon, May 29, 2017 at 11:40:51AM -0700, Ben Pfaff wrote:
>> Reported-by: Mircea Ulinic 
>> Signed-off-by: Ben Pfaff 

Acked-by: Andy Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovsdb-server: Document clarification for some bad wording in RFC 7047.

2017-08-04 Thread Andy Zhou

On Thu, Jul 27, 2017 at 4:20 PM, Ben Pfaff  wrote:
> Reported-by: Harish Kanakaraju 
> Signed-off-by: Ben Pfaff 

Acked-by: Andy Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 6/6] NSH unit test cases using encap and decap actions

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:15:00AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> With the support of generic encap and decap actions for Ethernet and NSH
> it is now possible to build test cases that mimic realistic OVS
> configurations and OF pipelines for Service Function Chaining. Packets
> are being encapsulated in NSH, forwarded based on NSH headers, sent over
> Ethernet links and VXLAN-GPE tunnels, and decapsulated at the end of
> a service chain.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 

Thanks for adding tests!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 5/6] Generic encap and decap support for NSH

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:14:59AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> This commit adds translation and netdev datapath support for generic
> encap and decap actions for the NSH MD1 header. The generic encap and
> decap actions are mapped to specific encap_nsh and decap_nsh actions
> in the datapath.
> 
> The translation follows that general scheme that decap() of an NSH
> packet triggers recirculation after decapsulation, while encap(nsh)
> just modifies struct flow and sets the ctx->pending_encap flag to
> generate the encap_nsh action at the next commit to be able to include
> subsequent set_field actions for NSH headers.
> 
> Support for the flexible MD2 format using TLV properties is foreseen
> in encap(nsh), but not yet fully implemented.
> 
> The CLI syntax for encap of NSH is
> encap(nsh(md_type=1))
> encap(nsh(md_type=2[,tlv(,,),...]))
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 

I don't see the new parts added to openvswitch.h in upstream Linux (even
in net-next), so I think that all of them should be enclosed in #ifndef
__KERNEL__ to make that clear.

struct ovs_action_encap_nsh is absurdly large due to the metadata array.
It does not make sense for it to be that big given only MD1 support.
Ideally, struct ovs_action_encap_nsh would be converted into nested
Netlink attributes; then, the metadata could be variable length.  That
is the right way to add kernel support.  Before kernel support, though,
it would make more sense to just use a __be32[4] metadata array.

This patch seems to make a lot of changes that should have been made in
whatever patch originally added the code.  For example, all the changes
to format_nsh_key() and format_be32_masked() appear to be in this
category.

There are some new compiler warnings or errors:

../ofproto/ofproto-dpif-ipfix.c:2793:17: error: enumeration values 
'OVS_ACTION_ATTR_ENCAP_NSH' and 'OVS_ACTION_ATTR_DECAP_NSH' not explicitly 
handled in switch [-Werror,-Wswitch-enum]
../ofproto/ofproto-dpif-xlate.c:5824:43: error: cast from 'const char *' to 
'struct ofpact_ed_prop *' increases required alignment from 1 to 2 
[-Werror,-Wcast-align]
../lib/odp-util.c:1847:21: error: cast from 'uint8_t *' (aka 'unsigned char 
*') to 'struct nsh_md1_ctx *' increases required alignment from 1 to 4 
[-Werror,-Wcast-align]
../lib/odp-util.c:6818:35: error: cast from 'uint8_t *' (aka 'unsigned char 
*') to 'struct nsh_md1_ctx *' increases required alignment from 1 to 4 
[-Werror,-Wcast-align]

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 4/6] userspace: add NSH support to vxlan-gpe tunnels

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:14:58AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> Signed-off-by: Yi Yang 
> Signed-off-by: Jan Scheurich 

Looks good to me, but are there any tests?  (Do the later patches test
this support?)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 3/6] Adding nsh.at for NSH unit tests

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:14:57AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> First basic NSH test case implemented and working.
> 
> Unconditionally show matched packet_type in megaflows, even when
> matching on eth.
> 
> Signed-off-by: Jan Scheurich 

I needed to add the following to fix one remaining test (maybe because
it was added recently).  Please fold it in.

--8<--cut here-->8--

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 8aaa1d264ff7..284a65ec6524 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -4785,7 +4785,7 @@ AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: 
icmp,tun_id=0x6,in_port=
 ])
 
 AT_CHECK([ovs-appctl ofproto/trace ovs-dummy "$flow,recirc_id(1)" -generate], 
[0], [stdout])
-AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: 
recirc_id=0x1,icmp,tun_id=0x6,in_port=1,vlan_tci=0x,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=128,icmp_type=8,icmp_code=0
+AT_CHECK([grep "Final flow:" stdout], [0], [Final flow: 
recirc_id=0x1,eth,icmp,tun_id=0x6,in_port=1,vlan_tci=0x,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=128,icmp_type=8,icmp_code=0
 ])
 
 OVS_VSWITCHD_STOP
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 2/6] userspace: enable set_field support for nsh fields

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:14:56AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> Signed-off-by: Yi Yang 
> Signed-off-by: Jan Scheurich 

Thanks for working on this.

I think that this should be folded into patch 1, since it fixes what is
essentially a bug in patch 1.

I noticed that struct nsh_hdr (introduced in the previous patch) isn't
suitable for 16-bit alignment, since it has 32-bit members.  Our
practice is to use appropriate type to ensure safety for 16-bit
alignment, like this:

/**
 * struct nsh_md1_ctx - Keeps track of NSH context data
 * @nshc<1-4>: NSH Contexts.
 */
struct nsh_md1_ctx {
ovs_16aligned_be32 c[4];
};

struct nsh_md2_tlv {
ovs_be16 md_class;
uint8_t type;
uint8_t length;
uint8_t md_value[];
};

struct nsh_hdr {
ovs_be16 ver_flags_len;
uint8_t md_type;
uint8_t next_proto;
ovs_16aligned_be32 path_hdr;
union {
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[0];
};
};

This makes it harder to forget to use the proper accessors to read
misaligned data, since the types more or less ensure it.

I noticed that it's easier to just use arrays of 4 elements (c[4]) than
separate members (c1, c2, c3, c4).

I'm appending an incremental that can be applied on top of patches 1 and
2 to achieve many of the suggestions I've made.  It includes the
incremental for patch 1.  I haven't tested anything, beyond compiling.

--8<--cut here-->8--

diff --git a/build-aux/extract-odp-netlink-h b/build-aux/extract-odp-netlink-h
index a509adb88f79..bc1cc35a7e9b 100755
--- a/build-aux/extract-odp-netlink-h
+++ b/build-aux/extract-odp-netlink-h
@@ -45,7 +45,11 @@ s,#.*,,
 
s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]*\]/struct
 eth_addr \1/
 
 # Transform IPv6 addresses from an array to struct in6_addr
-s/__be32[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct
 in6_addr \1/
+#
+# As a very special case, only transform member names with more than
+# one character because struct ovs_key_nsh has a member "__be32 c[4];"
+# that is not an IPv6 address.
+s/__be32[[:space:]]*\([a-zA-Z0-9_]\{2,\}\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct
 in6_addr \1/
 
 # Transform most Linux-specific __u types into C99 uint_t types,
 # and most Linux-specific __be into Open vSwitch ovs_be,
diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 7b6151f3384b..184b75e36bef 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -67,9 +67,9 @@ PREREQS = {"none": "MFP_NONE",
 # If a name matches more than one prefix, the longest one is used.
 OXM_CLASSES = {"NXM_OF_":(0,  0x),
"NXM_NX_":(0,  0x0001),
+   "NXOXM_NSH_": (0x005ad650, 0x),
"OXM_OF_":(0,  0x8000),
"OXM_OF_PKT_REG": (0,  0x8001),
-   "OXM_NSH_":   (0,  0x8004),
"ONFOXM_ET_": (0x4f4e4600, 0x),
 
# This is the experimenter OXM class for Nicira, which is the
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2381ed2c8c3c..5806aba93f73 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -360,9 +360,6 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking labels */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
-#ifndef __KERNEL__
-   OVS_KEY_ATTR_NSH,   /* struct ovs_key_nsh */
-#endif
 
 #ifdef __KERNEL__
/* Only used within kernel data path. */
@@ -372,6 +369,7 @@ enum ovs_key_attr {
 #ifndef __KERNEL__
/* Only used within userspace data path. */
OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
+   OVS_KEY_ATTR_NSH,  /* struct ovs_key_nsh */
 #endif
 
__OVS_KEY_ATTR_MAX
@@ -500,10 +498,7 @@ struct ovs_key_nsh {
 __u8 np;
 __u8 pad;
 __be32 path_hdr;
-__be32 c1;
-__be32 c2;
-__be32 c3;
-__be32 c4;
+__be32 c[4];
 };
 
 /* OVS_KEY_ATTR_CT_STATE flags */
diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 6192898f76e3..1c4b56b3c051 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -1745,7 +1745,7 @@ enum OVS_PACKED_ENUM mf_field_id {
 
 /* "nsh_flags".
  *
- * flags field in NSH base header (8 bits).
+ * flags field in NSH base header.
  *
  * Type: u8.
  * Maskable: bitwise.
@@ -1753,13 +1753,13 @@ enum OVS_PACKED_ENUM mf_field_id {
  * Prerequisites: NSH.
  * Access: read/write.
  * NXM: none.
- * OXM: OXM_NSH_FLAGS(1)

Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement policy.

2017-08-04 Thread Darrell Ball

-Original Message-
From: Ilya Maximets 
Date: Monday, July 31, 2017 at 7:25 AM
To: Darrell Ball , "Wang, Yipeng1" , 
"ovs-dev@openvswitch.org" 
Cc: Heetae Ahn 
Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement policy.

On 31.07.2017 04:41, Darrell Ball wrote:
> 
> 
> -Original Message-
> From:  on behalf of "Wang, Yipeng1" 

> Date: Friday, July 28, 2017 at 11:04 AM
> To: Ilya Maximets , "ovs-dev@openvswitch.org" 

> Cc: Heetae Ahn 
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Simplify emc replacement 
policy.
> 
> Good catch. But I think the hash comparison is to "randomly" choose 
one of the two entries to replace when both entries are live. 
> Your change would always replace the first one in such case. It might 
cause some thrashing issue for certain traffic. Meanwhile, to my experience, 
the original "hash comparison" is also not a good way to choose random entry, I 
encountered some thrashing issue before.
> 
> I think we want some condition like below, but a way to fast choose a 
random entry. 
> 
> if (!to_be_replaced || (emc_entry_alive(to_be_replaced) && 
!emc_entry_alive(current_entry) )
> to_be_replaced = current_entry;
> else if((emc_entry_alive(to_be_replaced) && 
(emc_entry_alive(current_entry))
>   to_be_replaced = random_entry;

I agree that we need to have something like random choosing of active entry 
to replace.
I though about this a little and came up with idea to reuse the random 
value generated
for insertion probability. This should give a good distribution for 
replacement.
I'll send v2 soon with that approach. 

The effect here is highly data dependent and in fact dominated by the packet 
distribution which will not be random
or rather pseudo-random. I had done my own testing with pseudo random flows, 
fwiw.
I did not see any thrashing with even at 4000 flows and saw one alive/alive 
choice at 8000.

We can also see the data dependency with this patch in this first version.
This patch removed all randomness when choosing an entry to replace when both 
candidates are alive and instead
always choose the first entry.

However, you observed that this fixed your problem of thrashing with your 
dataset – if so, the dataset used in
your testing may not be very random.
This change would have been worse in the general case, but seemed perfect for 
your dataset.

> //
> 
> I agree – we are trying to randomly select one of two live entries with 
the last condition.
> Something like this maybe makes it more clear what we are trying to do ?

Your code solves the issue with replacement of alive entries while dead 
ones exists,
but you're still uses hashes as random values which is not right. Hashes 
are not random
and there is no any difference in choosing the first entry or the entry 
with a bit
set in a particular place. There always will be some bad case where you 
will replace
same entries all the time and performance of EMC will be low.

> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 47a9fa0..75cc039 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2051,12 +2051,15 @@ emc_insert(struct emc_cache *cache, const struct 
netdev_flow_key *key,
>  }
>  
>  /* Replacement policy: put the flow in an empty (not alive) 
entry, or
> - * in the first entry where it can be */
> -if (!to_be_replaced
> -|| (emc_entry_alive(to_be_replaced)
> -&& !emc_entry_alive(current_entry))
> -|| current_entry->key.hash < to_be_replaced->key.hash) {
> + * randomly choose one of the two alive entries to be replaced. 
*/
> +if (!to_be_replaced) {
>  to_be_replaced = current_entry;
> +} else if (emc_entry_alive(to_be_replaced) && 
!emc_entry_alive(current_entry)) {
> +to_be_replaced = current_entry;
> +} else if (emc_entry_alive(to_be_replaced) && 
emc_entry_alive(current_entry)) {
> +if (current_entry->key.hash & 0x1) {
> +to_be_replaced = current_entry;
> +}
>  }
>  }
>  /* We didn't find the miniflow in the cache.
> 
> //
> 
> Thanks
> Yipeng
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Ilya Maximets
> > Sent: Friday, July 28, 2017

Re: [ovs-dev] [PATCH v4 5/5] redhat: allow dpdk to also run as non-root user

2017-08-04 Thread Russell Bryant

On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conole  wrote:
> After this commit, users may start a dpdk-enabled ovs setup as a
> non-root user.  This is accomplished by exporting the $HOME directory,
> which dpdk uses to fill in it's semi-persistent RTE configuration.
>
> This change may be a bit controversial since it modifies /dev/hugepages
> as part of starting the ovs-vswitchd to set a hugetlbfs group
> ownership.  This is used to enable writing to /dev/hugepages so that the
> dpdk_init will successfully complete.  There is an alternate way of
> accomplishing this - namely to initialize DPDK before dropping
> privileges.  However, this would mean that if DPDK ever grows an uninit
> / reinit function, non-root ovs likely could never use it.

Indeed ... the modifications to /dev/hugepages don't look ideal ...

If this was truly limited to when DPDK was in use, I'd feel better
about it.  We want to build a single package for OVS, right?  The
package will have DPDK enabled, even for normal uses that won't use
DPDK.  That means these modifications take place even for non-DPDK
use.  I'd feel more comfortable if it could be restricted to only when
DPDK was actually in use.  Maybe some of this logic could be moved
into ovs-ctl so that the check could be at runtime?

>
> This does not change OvS+DPDK's SELinux requirements.  It still must be
> disabled.
>
> Signed-off-by: Aaron Conole 
> ---
>  Documentation/intro/install/dpdk.rst|  7 +++
>  NEWS|  1 +
>  rhel/README.RHEL.rst| 11 +++
>  rhel/openvswitch-fedora.spec.in | 13 +
>  rhel/usr_lib_systemd_system_ovs-vswitchd.service.in |  5 +
>  5 files changed, 37 insertions(+)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group

2017-08-04 Thread Russell Bryant

On Fri, Aug 4, 2017 at 2:31 PM, Russell Bryant  wrote:
> On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conole  wrote:
>> Under rpm based distributions, the only user:group that the rhel daemons run
>> as is 'root:root'.  This is fine as a default, but as part of a security
>> procedure, users may want to run as an alternate uid/gid.  This commit
>> adds an OVS_USER_ID environment variable for systemd, which defaults to
>> root:root, but can be overridden by changing the /etc/sysconfig/openvswitch
>> environment file.
>>
>> Acked-by: Markos Chandras 
>> Signed-off-by: Aaron Conole 
>> ---
>>  rhel/automake.mk  | 1 +
>>  rhel/etc_openvswitch_default.conf | 5 +
>>  rhel/openvswitch-fedora.spec.in   | 4 
>>  rhel/usr_lib_systemd_system_ovs-vswitchd.service  | 3 +++
>>  rhel/usr_lib_systemd_system_ovsdb-server.service  | 3 +++
>>  rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++
>>  6 files changed, 19 insertions(+)
>>  create mode 100644 rhel/etc_openvswitch_default.conf
>
>
>> diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template 
>> b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
>> index 3050a07..fdaee00 100644
>> --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
>> +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
>> @@ -21,3 +21,6 @@
>>  #   --ovsdb-server-wrapper=valgrind
>>  #
>>  OPTIONS=""
>> +
>> +# Uncomment and set the OVS User/Group value
>> +#OVS_USER_ID="openvswitch:openvswitch"
>
> Is this really needed?  How about just documenting the use of
> --ovs-user with OPTIONS above?

Nevermind, I see how else this is being used once I read the next patch ...

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-04 Thread Darrell Ball

-Original Message-
From: Ilya Maximets 
Date: Friday, August 4, 2017 at 7:17 AM
To: "ovs-dev@openvswitch.org" 
Cc: Heetae Ahn , Darrell Ball , 
Yipeng Wang , Kevin Traynor , 
Ciara Loftus , Antonio Fischetti 
, Ilya Maximets 
Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC 
probability.

Currently, real insertion probability is higher than configured
for the maximum case because of wrong usage of the random value.

i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
equals to 1. In this case we're allowing insert if random vailue
is less or equal to 1. So, two of 2**32 values (0 and 1) are
allowed and real probability is 2 times higher than configured.

This happens because 'random_uint32()' returns value in range
[0; UINT32_MAX], but for the checking to be correct we should
generate random value in range [0; UINT32_MAX - 1].

I understand the calculation is slightly off.
If the user enters 4,294,967,295 then the probability to insert into emc will 
be 
2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,

However, is there a general concern about such a low probability anyways ?
This max inverse value would be rarely, if ever used and if used, it would be 
impossible
to see the difference in any real use case.

The user might as well just disable emc rather than use such tiny probabilities.

This existing api was discussed extensively and was very contentious.

However, if patch 2 really has an absolute dependency on this patch 1, we can 
include it.
I have done various testing and don’t see that, but I have some comments on the
other threads.

To fix this we have 4 possible solutions:

 1. need to use uint64_t for 'emc-insert-min' and calculate it
as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
range [0; UINT32_MAX].

This may decrease performance becaue of 64 bit atomic ops.

 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
because it's the only value we have issues with.

This will require additional explanations and not very friendly
for users.

 3. Generate random value in range [0; UINT32_MAX - 1].

This will require heavy division operation.

 4. Decrease the range of available values for 'emc-insert-inv-prob'.

Actually, we don't need to have so much different values for
that option. I beleve that values higher than 1M are completely
useless. Choosing the upper limit as a power of 2 like 2**20 we
will be able to mask the generated random value in a fast way
and also avoid range issue, because same uint32_t can be used to
store 2**20.

This patch implements solution #4.

CC: Ciara Loftus 
Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
Signed-off-by: Ilya Maximets 
Acked-by: Antonio Fischetti 
---

Infrastructure and logic introduced here will be used for fixing
emc replacement policy.

 lib/dpif-netdev.c| 14 ++
 vswitchd/vswitch.xml |  3 ++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8ecc9d1..e23c674 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -153,9 +153,14 @@ struct netdev_flow_key {
 #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
 #define EM_FLOW_HASH_SEGS 2

+/* Set up maximum inverse EMC insertion probability to 2^20 - 1.
+ * Higher values considered useless in practice. */
+#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
+#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
+#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
 /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
 #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
-#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
+#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\
 DEFAULT_EM_FLOW_INSERT_INV_PROB)

 struct emc_entry {
@@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread 
*pmd,
 uint32_t min;
 atomic_read_relaxed(>dp->emc_insert_min, );

-if (min && random_uint32() <= min) {
+if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
 emc_insert(>flow_cache, key, flow);
 }
 }
@@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const 
struct smap *other_config)

Re: [ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group

2017-08-04 Thread Russell Bryant

On Fri, Aug 4, 2017 at 1:00 PM, Aaron Conole  wrote:
> Under rpm based distributions, the only user:group that the rhel daemons run
> as is 'root:root'.  This is fine as a default, but as part of a security
> procedure, users may want to run as an alternate uid/gid.  This commit
> adds an OVS_USER_ID environment variable for systemd, which defaults to
> root:root, but can be overridden by changing the /etc/sysconfig/openvswitch
> environment file.
>
> Acked-by: Markos Chandras 
> Signed-off-by: Aaron Conole 
> ---
>  rhel/automake.mk  | 1 +
>  rhel/etc_openvswitch_default.conf | 5 +
>  rhel/openvswitch-fedora.spec.in   | 4 
>  rhel/usr_lib_systemd_system_ovs-vswitchd.service  | 3 +++
>  rhel/usr_lib_systemd_system_ovsdb-server.service  | 3 +++
>  rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++
>  6 files changed, 19 insertions(+)
>  create mode 100644 rhel/etc_openvswitch_default.conf


> diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template 
> b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
> index 3050a07..fdaee00 100644
> --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
> +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
> @@ -21,3 +21,6 @@
>  #   --ovsdb-server-wrapper=valgrind
>  #
>  OPTIONS=""
> +
> +# Uncomment and set the OVS User/Group value
> +#OVS_USER_ID="openvswitch:openvswitch"

Is this really needed?  How about just documenting the use of
--ovs-user with OPTIONS above?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/6] userspace: Add support for NSH MD1 match fields

2017-08-04 Thread Ben Pfaff

On Thu, Aug 03, 2017 at 09:14:55AM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> This patch adds support for NSH packet header fields to the OVS
> control plane and the userspace datapath. Initially we support the
> fields of the NSH base header as defined in
> https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt
> and the fixed context headers specified for metadata format MD1.
> The variable length MD2 format is parsed but the TLV context headers
> are not yet available for matching.
> 
> The NSH fields are modelled as OXM fields with the dedicated OXM
> class 0x8004 proposed for NSH in ONF. The following fields are defined:
> 
> OXM codeofctl nameSize  Comment
> =
> OXM_NSH_FLAGS   nsh_flags   8   Bits 2-9 of 1st NSH word
> (0x8004,1)
> OXM_NSH_MDTYPE  nsh_mdtype  8   Bits 16-23
> (0x8004,2)
> OXM_NSH_NEXTPROTO   nsh_np  8   Bits 24-31
> (0x8004,3)
> OXM_NSH_SPI nsh_spi 24  Bits 0-23 of 2nd NSH word
> (0x8004,4)
> OXM_NSH_SI  nsh_si  8   Bits 24-31
> (0x8004,5)
> OXM_NSH_C1  nsh_c1  32  Maskable, nsh_mdtype==1
> (0x8004,6)
> OXM_NSH_C2  nsh_c2  32  Maskable, nsh_mdtype==1
> (0x8004,7)
> OXM_NSH_C3  nsh_c3  32  Maskable, nsh_mdtype==1
> (0x8004,8)
> OXM_NSH_C4  nsh_c4  32  Maskable, nsh_mdtype==1
> (0x8004,9)
> 
> Co-authored-by: Johnson Li 
> Signed-off-by: Yi Yang 
> Signed-off-by: Jan Scheurich 

Thanks for working on this!

I do not think that we should use the ONF-reserved class 0x8004.
Although some drafts assign 0x8004 for NSH use, I do not think that any
of them have been officially approved.  So, I suggest that we use a
randomly chosen OUI to form an experimenter class.

The documentation in meta-flow.xml is incomplete.  Please expand it to
include the kinds of details that we see in other sections of the
document.

Please add OVS_KEY_ATTR_NSH to the existing #ifndef __KERNEL__ block.

In meta-flow.xml, please indicate the 24-bit size of nsh_spi as part of
the type.  Also, it's not necessary to redundantly include the size in a
comment.

I find the NSH version field confusing.  The diagram shows the version
as the most significant 2 bits of the first 16-bit field of the header,
and NSH_VER_MASK is a 2-bit mask, but NSH_VER_SHIFT is defined as 12.
Why isn't it 14?

I don't see anything in parse_nsh() that verifies that the packet's
length is long enough for the NSH header length.  Probably, it needs to
check before trying to access the length field, and then again when to
verify that it's long enough for the encoded length.

In format_nsh_masked(), it is redundant to check the masks before each
call, because the format_*_masked() functions check them too.

In mf_is_value_valid(), I think we should verify that the high 8 bits of
MFF_NSH_SPI are zero, since it is a 24-bit field.

I'm appending a suggested incremental that addresses many of the issues
I noted above.

Thanks,

Ben.

--8<--cut here-->8--

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index 7b6151f3384b..184b75e36bef 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -67,9 +67,9 @@ PREREQS = {"none": "MFP_NONE",
 # If a name matches more than one prefix, the longest one is used.
 OXM_CLASSES = {"NXM_OF_":(0,  0x),
"NXM_NX_":(0,  0x0001),
+   "NXOXM_NSH_": (0x005ad650, 0x),
"OXM_OF_":(0,  0x8000),
"OXM_OF_PKT_REG": (0,  0x8001),
-   "OXM_NSH_":   (0,  0x8004),
"ONFOXM_ET_": (0x4f4e4600, 0x),

# This is the experimenter OXM class for Nicira, which is the
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 2381ed2c8c3c..a98ecc0ebedc 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -360,9 +360,6 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking labels */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
-#ifndef __KERNEL__
-   OVS_KEY_ATTR_NSH,   /* struct ovs_key_nsh */
-#endif

 #ifdef __KERNEL__
/* Only used within kernel data path. */
@@ -372,6 +369,7 @@ enum ovs_key_attr {
 #ifndef __KERNEL__
/* Only used within userspace data path. */
OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
+   OVS_KEY_ATTR_NSH,  /* struct ovs_key_nsh */
 #endif

__OVS_KEY_ATTR_MAX
diff --git

Re: [ovs-dev] [PATCH] tests: fix wrapped comment

2017-08-04 Thread Russell Bryant

On Fri, Aug 4, 2017 at 10:15 AM, Lance Richardson  wrote:
> Add missing '#' to comment line.
>
> Signed-off-by: Lance Richardson 
> ---
> It's odd that this doesn't result in failures in more environments,
> but it does fail under Alpine Linux.

Definitely odd ... thanks for the patch, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] tests: avoid non-posix options to wc

2017-08-04 Thread Russell Bryant

Thanks, applied to master.

On Fri, Aug 4, 2017 at 10:26 AM, Lance Richardson  wrote:
> The '--lines' option for the wc command is a GNU extension and is not
> recognized by some implemenations. Use the POSIX 1003.1 '-l' option
> instead.
>
> Signed-off-by: Lance Richardson 
> ---
>  tests/ovn.at | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 29c5fc0ab..7d816785e 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -7442,12 +7442,12 @@ 
> expected=${dst_mac}${src_mac}0800451c3f110100${src_ip}${dst_ip}00351
>  echo $expected >> hv2-vif1.expected
>  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [hv2-vif1.expected])
>
> -AT_CHECK([ovn-sbctl --bare --columns _uuid find Port_Binding 
> logical_port=cr-alice | wc --lines], [0], [1
> +AT_CHECK([ovn-sbctl --bare --columns _uuid find Port_Binding 
> logical_port=cr-alice | wc -l], [0], [1
>  ])
>
>  ovn-nbctl --timeout=3 --wait=sb remove Logical_Router_Port alice options 
> redirect-chassis
>
> -AT_CHECK([ovn-sbctl find Port_Binding logical_port=cr-alice | wc --lines], 
> [0], [0
> +AT_CHECK([ovn-sbctl find Port_Binding logical_port=cr-alice | wc -l], [0], [0
>  ])
>
>  OVN_CLEANUP([hv1],[hv2],[hv3])
> --
> 2.13.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 5/5] redhat: allow dpdk to also run as non-root user

2017-08-04 Thread Aaron Conole

After this commit, users may start a dpdk-enabled ovs setup as a
non-root user.  This is accomplished by exporting the $HOME directory,
which dpdk uses to fill in it's semi-persistent RTE configuration.

This change may be a bit controversial since it modifies /dev/hugepages
as part of starting the ovs-vswitchd to set a hugetlbfs group
ownership.  This is used to enable writing to /dev/hugepages so that the
dpdk_init will successfully complete.  There is an alternate way of
accomplishing this - namely to initialize DPDK before dropping
privileges.  However, this would mean that if DPDK ever grows an uninit
/ reinit function, non-root ovs likely could never use it.

This does not change OvS+DPDK's SELinux requirements.  It still must be
disabled.

Signed-off-by: Aaron Conole 
---
 Documentation/intro/install/dpdk.rst|  7 +++
 NEWS|  1 +
 rhel/README.RHEL.rst| 11 +++
 rhel/openvswitch-fedora.spec.in | 13 +
 rhel/usr_lib_systemd_system_ovs-vswitchd.service.in |  5 +
 5 files changed, 37 insertions(+)

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index b2b91d4..4b0828c 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -134,6 +134,13 @@ has to be configured with DPDK support (``--with-dpdk``).
 
 Additional information can be found in :doc:`general`.
 
+.. note::
+  If you are running using the Fedora or Red Hat package, the Open vSwitch
+  daemon will run as a non-root user.  This implies that you must have a
+  working IOMMU.  Visit the `RHEL README`__ for additional information.
+
+__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst
+
 Setup
 -
 
diff --git a/NEWS b/NEWS
index a89e718..a7bc1c3 100644
--- a/NEWS
+++ b/NEWS
@@ -70,6 +70,7 @@ Post-v2.7.0
First supported use case is encap/decap for Ethernet.
- Fedora Packaging:
  * OVN services are no longer restarted automatically after upgrade.
+ * ovs-vswitchd and ovsdb-server run as non-root users by default.
- Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)).
- L3 tunneling:
  * Use new tunnel port option "packet_type" to configure L2 vs. L3.
diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst
index 6affdba..f3d2942 100644
--- a/rhel/README.RHEL.rst
+++ b/rhel/README.RHEL.rst
@@ -337,6 +337,17 @@ running. All other commands where executed when Open 
vSwitch was successfully
 running.
 
 
+Non-root User Support
+---
+Fedora and RHEL support running the Open vSwitch daemons as a non-root user.
+By default, a fresh installation will create an *openvswitch* user, along
+with any additional support groups needed (such as *hugetlbfs* for DPDK
+support).
+
+This is controlled by modifying the ``OVS_USER_ID`` option.  Setting this
+to 'root:root', or commenting the variable out will revert this behavior.
+
+
 Reporting Bugs
 --
 
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 1061824..2eccada 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -95,6 +95,10 @@ Requires: openssl hostname iproute module-init-tools
 Requires(post): /usr/bin/getent
 Requires(post): /usr/sbin/useradd
 Requires(post): /usr/bin/sed
+%if %{with dpdk}
+Requires(post): /usr/sbin/usermod
+Requires(post): /usr/sbin/groupadd
+%endif
 Requires(post): systemd-units
 Requires(preun): systemd-units
 Requires(postun): systemd-units
@@ -379,6 +383,15 @@ if [ $1 -eq 1 ]; then
 
 sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch
 
+%if %{with dpdk}
+getent group hugetlbfs >/dev/null || \
+groupadd hugetlbfs
+usermod -a -G hugetlbfs openvswitch
+sed -i \
+
's@OVS_USER_ID="openvswitch:openvswitch"@OVS_USER_ID="openvswitch:hugetlbfs"@'\
+/etc/sysconfig/openvswitch
+%endif
+
 # In the case of upgrade, this is not needed.
 chown -R openvswitch:openvswitch /etc/openvswitch
 fi
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
index 9aff70b..bf0f058 100644
--- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
+++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
@@ -10,8 +10,13 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+Environment=HOME=/var/run/openvswitch
 EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
+@begin_dpdk@
+ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages
+ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages
+@end_dpdk@
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovsdb-server --no-monitor --system-id=random \
   --ovs-user=${OVS_USER_ID} \
-- 
2.9.4

___
dev mailing list

[ovs-dev] [PATCH v4 3/5] dpdkstrip: add a preprocessor tool for stripping dpdk blocks

2017-08-04 Thread Aaron Conole

Normally, in C code, pre-processing macros can be used to enable/disable
specific functionality based on switches passed to configure.  This works
for DPDK using the --with-dpdk flag, which sets the DPDK_NETDEV define to
the appropriate value.

However, not all files are processed with the C pre-processor.  For those
files which are not, this commit adds a new pre-processor tool for .in
files to either include or exclude those stanzas as appropriate.

Signed-off-by: Aaron Conole 
---
 Makefile.am|  1 +
 build-aux/dpdkstrip.pl | 35 +++
 2 files changed, 36 insertions(+)
 create mode 100644 build-aux/dpdkstrip.pl

diff --git a/Makefile.am b/Makefile.am
index 373ef6e..035afe6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -82,6 +82,7 @@ EXTRA_DIST = \
build-aux/cksum-schema-check \
build-aux/calculate-schema-cksum \
build-aux/dist-docs \
+   build-aux/dpdkstrip.pl \
build-aux/sodepends.pl \
build-aux/soexpand.pl \
build-aux/xml2nroff \
diff --git a/build-aux/dpdkstrip.pl b/build-aux/dpdkstrip.pl
new file mode 100644
index 000..98539cc
--- /dev/null
+++ b/build-aux/dpdkstrip.pl
@@ -0,0 +1,35 @@
+# Copyright (c) 2017 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+my ($check_dpdk) = 0;
+my ($disabled_print) = 0;
+
+Getopt::Long::Configure ("bundling");
+GetOptions("dpdk!" => \$check_dpdk) or exit(1);
+
+OUTER: while () {
+if (/@(begin|end)_dpdk@/) {
+if (!$check_dpdk) {
+$disabled_print = ! $disabled_print;
+}
+next;
+}
+
+print $_ unless $disabled_print;
+}
+exit 0;
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 4/5] redhat: dynamic service file for vswitchd

2017-08-04 Thread Aaron Conole

This commit changes the service file from static configuration to an
autogenerated file, produced during the build.  This will be relevant in a
future commit.

Signed-off-by: Aaron Conole 
---
 rhel/.gitignore  | 1 +
 rhel/automake.mk | 4 +++-
 rhel/openvswitch-fedora.spec.in  | 9 +
 ...hd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} | 0
 4 files changed, 13 insertions(+), 1 deletion(-)
 rename rhel/{usr_lib_systemd_system_ovs-vswitchd.service => 
usr_lib_systemd_system_ovs-vswitchd.service.in} (100%)

diff --git a/rhel/.gitignore b/rhel/.gitignore
index 164bb66..e584a1e 100644
--- a/rhel/.gitignore
+++ b/rhel/.gitignore
@@ -4,3 +4,4 @@ openvswitch-kmod-rhel6.spec
 openvswitch-kmod-fedora.spec
 openvswitch.spec
 openvswitch-fedora.spec
+usr_lib_systemd_system_ovs-vswitchd.service
diff --git a/rhel/automake.mk b/rhel/automake.mk
index 2d9443f..11c8be0 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -28,13 +28,15 @@ EXTRA_DIST += \
rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template \
rhel/usr_lib_systemd_system_openvswitch.service \
rhel/usr_lib_systemd_system_ovsdb-server.service \
-   rhel/usr_lib_systemd_system_ovs-vswitchd.service \
+   rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \
rhel/usr_lib_systemd_system_ovn-controller.service \
rhel/usr_lib_systemd_system_ovn-controller-vtep.service \
rhel/usr_lib_systemd_system_ovn-northd.service \
rhel/usr_lib_firewalld_services_ovn-central-firewall-service.xml \
rhel/usr_lib_firewalld_services_ovn-host-firewall-service.xml
 
+DISTCLEANFILES += rhel/usr_lib_systemd_system_ovs-vswitchd.service
+
 update_rhel_spec = \
   $(AM_V_GEN)($(ro_shell) && sed -e 's,[@]VERSION[@],$(VERSION),g') \
 < $(srcdir)/rhel/$(@F).in > $(@F).tmp || exit 1; \
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index fe86054..1061824 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -225,6 +225,15 @@ Docker network plugins for OVN.
--enable-ssl \
--with-pkidir=%{_sharedstatedir}/openvswitch/pki
 
+/usr/bin/perl build-aux/dpdkstrip.pl \
+%if %{with dpdk}
+   --dpdk \
+%else
+   --nodpdk \
+%endif
+   < rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \
+   > rhel/usr_lib_systemd_system_ovs-vswitchd.service
+
 make %{?_smp_mflags}
 cd selinux
 make -f %{_datadir}/selinux/devel/Makefile
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
similarity index 100%
rename from rhel/usr_lib_systemd_system_ovs-vswitchd.service
rename to rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 2/5] redhat: dynamically allocate and reference ovs user

2017-08-04 Thread Aaron Conole

After this commit, the fedora RPM will create the openvswitch user, from the
non-static pool, for use as an Open vSwitch daemon user.  This only happens
on install - not upgrade.  This will be the default user:group
combination for the openvswitch daemons.

To do this in a way that doesn't impact existing installations, the
/etc/openvswitch directory will be created during the installation,
rather than being provided as part of the rpm.

Acked-by: Markos Chandras 
Signed-off-by: Aaron Conole 
---
 rhel/openvswitch-fedora.spec.in  | 13 +
 rhel/usr_lib_systemd_system_ovsdb-server.service |  1 +
 2 files changed, 14 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index a779c9a..fe86054 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -92,6 +92,9 @@ Requires: openssl hostname iproute module-init-tools
 #Upstream kernel commit 4f647e0a3c37b8d5086214128614a136064110c3
 #Requires: kernel >= 3.15.0-0
 
+Requires(post): /usr/bin/getent
+Requires(post): /usr/sbin/useradd
+Requires(post): /usr/bin/sed
 Requires(post): systemd-units
 Requires(preun): systemd-units
 Requires(postun): systemd-units
@@ -361,6 +364,16 @@ rm -rf $RPM_BUILD_ROOT
 %endif
 
 %post
+if [ $1 -eq 1 ]; then
+getent passwd openvswitch >/dev/null || \
+useradd -r -d / -s /sbin/nologin -c "Open vSwitch Daemons" openvswitch
+
+sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch
+
+# In the case of upgrade, this is not needed.
+chown -R openvswitch:openvswitch /etc/openvswitch
+fi
+
 %if 0%{?systemd_post:1}
 %systemd_post %{name}.service
 %else
diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service 
b/rhel/usr_lib_systemd_system_ovsdb-server.service
index b82cb33..7acd25f 100644
--- a/rhel/usr_lib_systemd_system_ovsdb-server.service
+++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
@@ -10,6 +10,7 @@ Type=forking
 Restart=on-failure
 EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
+ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovs-vswitchd --no-monitor --system-id=random \
   --ovs-user=${OVS_USER_ID} \
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 1/5] redhat: allow arbitrary user:group

2017-08-04 Thread Aaron Conole

Under rpm based distributions, the only user:group that the rhel daemons run
as is 'root:root'.  This is fine as a default, but as part of a security
procedure, users may want to run as an alternate uid/gid.  This commit
adds an OVS_USER_ID environment variable for systemd, which defaults to
root:root, but can be overridden by changing the /etc/sysconfig/openvswitch
environment file.

Acked-by: Markos Chandras 
Signed-off-by: Aaron Conole 
---
 rhel/automake.mk  | 1 +
 rhel/etc_openvswitch_default.conf | 5 +
 rhel/openvswitch-fedora.spec.in   | 4 
 rhel/usr_lib_systemd_system_ovs-vswitchd.service  | 3 +++
 rhel/usr_lib_systemd_system_ovsdb-server.service  | 3 +++
 rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++
 6 files changed, 19 insertions(+)
 create mode 100644 rhel/etc_openvswitch_default.conf

diff --git a/rhel/automake.mk b/rhel/automake.mk
index 1265fa7..2d9443f 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -10,6 +10,7 @@ EXTRA_DIST += \
rhel/automake.mk \
rhel/etc_init.d_openvswitch \
rhel/etc_logrotate.d_openvswitch \
+   rhel/etc_openvswitch_default.conf \
rhel/etc_sysconfig_network-scripts_ifdown-ovs \
rhel/etc_sysconfig_network-scripts_ifup-ovs \
rhel/openvswitch-dkms.spec \
diff --git a/rhel/etc_openvswitch_default.conf 
b/rhel/etc_openvswitch_default.conf
new file mode 100644
index 000..c74417d
--- /dev/null
+++ b/rhel/etc_openvswitch_default.conf
@@ -0,0 +1,5 @@
+# DO NOT EDIT THIS FILE
+
+# The following is the *default* configuration for the openvswitch user ID.
+# This is for backward compatibility.
+OVS_USER_ID="root:root"
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 1cad940..a779c9a 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -246,6 +246,9 @@ done
 install -m 0755 rhel/etc_init.d_openvswitch \
 $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/openvswitch.init
 
+install -p -D -m 0644 rhel/etc_openvswitch_default.conf \
+$RPM_BUILD_ROOT/%{_sysconfdir}/openvswitch/default.conf
+
 install -p -D -m 0644 rhel/etc_logrotate.d_openvswitch \
 $RPM_BUILD_ROOT/%{_sysconfdir}/logrotate.d/openvswitch
 
@@ -475,6 +478,7 @@ fi
 %{_sysconfdir}/bash_completion.d/ovs-appctl-bashcomp.bash
 %{_sysconfdir}/bash_completion.d/ovs-vsctl-bashcomp.bash
 %dir %{_sysconfdir}/openvswitch
+%{_sysconfdir}/openvswitch/default.conf
 %config %ghost %{_sysconfdir}/openvswitch/conf.db
 %ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~
 %config %ghost %{_sysconfdir}/openvswitch/system-id.conf
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service
index 886b68a..9aff70b 100644
--- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service
+++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service
@@ -10,12 +10,15 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovsdb-server --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   start $OPTIONS
 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
 ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \
   --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   restart $OPTIONS
 TimeoutSec=300
diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service 
b/rhel/usr_lib_systemd_system_ovsdb-server.service
index 68deace..b82cb33 100644
--- a/rhel/usr_lib_systemd_system_ovsdb-server.service
+++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
@@ -8,12 +8,15 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovs-vswitchd --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   start $OPTIONS
 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop
 ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \
+   --ovs-user=${OVS_USER_ID} \
--no-monitor restart $OPTIONS
 RuntimeDirectory=openvswitch
 RuntimeDirectoryMode=0755
diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template 
b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
index 3050a07..fdaee00 100644
--- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
+++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
@@ -21,3 +21,6 @@
 #   --ovsdb-server-wrapper=valgrind
 #
 OPTIONS=""
+
+# Uncomment and set the OVS User/Group value

[ovs-dev] [PATCH v4 0/5] rhel/fedora: non-root OvS out of the box

2017-08-04 Thread Aaron Conole

This series attempts to introduce the ability to start and use
Open vSwitch 'out of the box' as a non-root user.  It does this by
modifying the service files to pass the recently introduced --ovs-user
argument around, and by making some minor tweaks to the passwd, group,
and filesystem information.

I prefixed the packaging work with 'redhat', but if rpm or packaging
is a preferred prefx for that work, I can respin.

The more controversial changes are:

* This modifies the /etc/sysconfig/ file on install.
* The dpdk support directly modifies /dev/hugepages with a call to chmod
* A new user 'openvswitch', and up to two new groups 'openvswitch', and
  'hugetlbfs' are created

After this series:

> [root at wsfd-netdev60 ~]# yum install openvswitch-2.7.90-1.fc25.x86_64.rpm 
> Loaded plugins: product-id, search-disabled-repos, subscription-manager
> This system is not registered to Red Hat Subscription Management. You can use 
> subscription-manager to register.
> Examining openvswitch-2.7.90-1.fc25.x86_64.rpm: 
> openvswitch-2.7.90-1.fc25.x86_64
> Marking openvswitch-2.7.90-1.fc25.x86_64.rpm to be installed
> Resolving Dependencies
> --> Running transaction check
> ---> Package openvswitch.x86_64 0:2.7.90-1.fc25 will be installed
> --> Finished Dependency Resolution
>
> Dependencies Resolved
>
> 
>  Package  ArchVersion  Repository  
> Size
> 
> Installing:
>  openvswitch  x86_64  2.7.90-1.fc25/openvswitch-2.7.90-1.fc25.x86_64   11 
> M
>
> Transaction Summary
> 
> Install  1 Package
>
> Total size: 11 M
> Installed size: 11 M
> Is this ok [y/d/N]: y
> Downloading packages:
> Running transaction check
> Running transaction test
> Transaction test succeeded
> Running transaction
>   Installing : openvswitch-2.7.90-1.fc25.x86_64 
> 1/1 
>   Verifying  : openvswitch-2.7.90-1.fc25.x86_64 
> 1/1 
>
> Installed:
>   openvswitch.x86_64 0:2.7.90-1.fc25  
>   
>
> Complete!
> [root at wsfd-netdev60 ~]# systemctl start openvswitch
> [root at wsfd-netdev60 ~]# ps aux | grep ovs
> openvsw+  12642  0.0  0.0  47864  2296 ?S ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info 
> --remote=punix:/var/run/openvswitch/db.sock 
> --private-key=db:Open_vSwitch,SSL,private_key 
> --certificate=db:Open_vSwitch,SSL,certificate 
> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user 
> openvswitch:openvswitch --no-chdir 
> --log-file=/var/log/openvswitch/ovsdb-server.log 
> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
> openvsw+  12688  0.0  0.0  49588 10600 ?S ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err 
> -vfile:info --mlockall --user openvswitch:openvswitch --no-chdir 
> --log-file=/var/log/openvswitch/ovs-vswitchd.log 
> --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach


v1->v2:
https://lists.linux-foundation.org/pipermail/ovs-dev/2017-June/333417.html

The previous method used 3 different locations of configuration from
environment variables:
1. The systemd file.
2. A new /etc/sysconfig/openvswitch-pre
3. The existing /etc/sysconfig/openvswitch

Now, configuration is from two areas:
1. A new /etc/openvswitch/default.conf
2. The existing /etc/sysconfig/openvswitch

As part of the install, we set the OVS_USER_ID to the new values.

v2->v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/334946.html

Refactor for the dpdk non-root user portion due to an issue discovered
where the generated service file didn't honor new configuration when
re-running ./configure.

Also, converted the "Reviewed-by" to "Acked-by".  This is because there
is no such thing as Reviewed-by in the OVS contributing guide.

Finally, included some documentation updates.

v3->v4:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336558.html

Remove the makefile modifications


Aaron Conole (5):
  redhat: allow arbitrary user:group
  redhat: dynamically allocate and reference ovs user
  dpdkstrip: add a preprocessor tool for stripping dpdk blocks
  redhat: dynamic service file for vswitchd
  redhat: allow dpdk to also run as non-root user

 Documentation/intro/install/dpdk.rst   |  7 
 Makefile.am|  1 +
 NEWS   |  1 +
 build-aux/dpdkstrip.pl | 35 +++
 rhel/.gitignore|  1 +
 rhel/README.RHEL.rst   | 11 ++
 rhel/automake.mk   |  5 ++-
 rhel/etc_openvswitch_default.conf  |  5 +++
 rhel/openvswitch-fedora.spec.in

Re: [ovs-dev] [PATCH v3 6/6] redhat: allow dpdk to also run as non-root user

2017-08-04 Thread Aaron Conole

Sergio Gonzalez Monroy  writes:

> On 02/08/2017 16:07, Aaron Conole wrote:
>> Sergio Gonzalez Monroy  writes:
>>
>>> Hi Aaron,
>> Hi Sergio,
>>
>>> On 01/08/2017 23:05, Aaron Conole wrote:
 After this commit, users may start a dpdk-enabled ovs setup as a
 non-root user.  This is accomplished by exporting the $HOME directory,
 which dpdk uses to fill in it's semi-persistent RTE configuration.

 This change may be a bit controversial since it modifies /dev/hugepages
 as part of starting the ovs-vswitchd to set a hugetlbfs group
 ownership.  This is used to enable writing to /dev/hugepages so that the
 dpdk_init will successfully complete.  There is an alternate way of
 accomplishing this - namely to initialize DPDK before dropping
 privileges.  However, this would mean that if DPDK ever grows an uninit
 / reinit function, non-root ovs likely could never use it.

 This does not change OvS+DPDK's SELinux requirements.  It still must be
 disabled.

 Signed-off-by: Aaron Conole 
 ---
>>> Instead of modifying /dev/hugepages, what about creating a hugetlbfs
>>> mount point for OvS? You could then point DPDK to use that specific
>>> mount (--huge-dir).
>> With this approach, I need to also insert new information into the
>> ovsdb.  Additionally, if the user wants to customize it, I'm not sure
>> the best way of doing that.  Do you have a concrete set of steps you
>> think makes sense here?
>
> I was just checking with the team here (I am not very familiar with
> the OvS specifics) and
> the option is already in ovsdb 'dpdk-hugepage-dir'.
>
> Are you referring to something else?

Whoops.  I forgot to hit send on my reply, it seems.  Sorry for that.

I'm referring to populating that information in the database in a way
that doesn't break the user and also can be done in an automated
fashion.  That would require either starting ovsdb during the install,
only to stop it again, OR possibly adding commands to the startup
sequence to set this directory value.  Maybe there's some other way,
though?

> Thanks,
> Sergio
>
>>> The only downside I can think of for this approach is that OvS would
>>> be fixed to use a single size (either 2MB or 1GB whatever the mount
>>> point is set to).
>>> Without specifying the mount point directory, DPDK could use both
>>> hugepage sizes.
>>>
>>> Could you elaborate on the OvS+DPDK's SELinux requirements?
>> Sure.  We need r/w permissions on vfio labeled devices, and some
>> additional unix socket permissions for vhost-user sockets, as well as
>> r/w and create bits for hugetlbfs labeled files.  I think that was it
>> off the top of my head.  I have an selinux policy ready to go, but I
>> want to get this change in first.
>>
>>> Quick summary of DPDK privileged/unprivileged user (assuming the
>>> unprivileged user has permissions for hugepage allocation):
>>> - if all devices are bound to vfio-pci driver (thus IOMMU enabled
>>> system), then the DPDK can run as unprivileged user. This is the
>>> recommended mode is possible.
>>> - if any device is bound to igb_uio/uio_pci_generic (likely scenario
>>> when running in VMs), then the DPDK needs privileged user to be able
>>> to read each hugepage physical address from /proc/self/pagemap.
>> Correct.  I agree with the above.
>>
>> Thanks so much for looking at this!
>>
>>> Thanks,
>>> Sergio
>>>
>>>
Documentation/intro/install/dpdk.rst|  7 +++
NEWS|  1 +
rhel/README.RHEL.rst| 11 +++
rhel/openvswitch-fedora.spec.in | 13 +
rhel/usr_lib_systemd_system_ovs-vswitchd.service.in |  5 +
5 files changed, 37 insertions(+)

 diff --git a/Documentation/intro/install/dpdk.rst 
 b/Documentation/intro/install/dpdk.rst
 index a05aa1a..0585c6a 100644
 --- a/Documentation/intro/install/dpdk.rst
 +++ b/Documentation/intro/install/dpdk.rst
 @@ -134,6 +134,13 @@ has to be configured with DPDK support 
 (``--with-dpdk``).
  Additional information can be found in :doc:`general`.
+.. note::
 +  If you are running using the Fedora or Red Hat package, the Open vSwitch
 +  daemon will run as a non-root user.  This implies that you must have a
 +  working IOMMU.  Visit the `RHEL README`__ for additional information.
 +
 +__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst
 +
Setup
-
diff --git a/NEWS b/NEWS
 index facea02..095272a 100644
 --- a/NEWS
 +++ b/NEWS
 @@ -64,6 +64,7 @@ Post-v2.7.0
 * OpenFlow 1.5 packet-out is now supported.
   - Fedora Packaging:
 * OVN services are no longer restarted automatically after upgrade.
 + * ovs-vswitchd and

Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.

2017-08-04 Thread Bodireddy, Bhanuprakash

HI Ilya,

Thanks for looking in to this and providing your feedback. 

When this feature was first posted as RFC 
(https://mail.openvswitch.org/pipermail/ovs-dev/2016-July/318243.html), the 
implementation in OvS was done based on DPDK Keepalive library and keeping 
collectd in sync.  As you can see from RFC it was pretty compact code and 
integrated well with ceilometer and provided end to end functionality. Much of 
the RFC code was  to handle SHM. 

However the reviewers pointed below flaws.

- Very DPDK specific.
- Shared memory for inter process communication(Between OvS and collectd 
threads).
- Tracks PMD cores and not threads.
- Limited support to detect false negatives & false positives.
- Limited support to query KA status.

As per suggestions, below changes were introduced.

- Basic infrastructure to register & track threads instead of cores. (Now only 
PMDs are tracked only but can be extended to track non-PMD threads).
- Keep most of the APIs generic so that they can extended in the future. All 
generic APIs are in Keepalive.[hc]
- Remove Shared memory and introduce OvSDB.
- Add support to detect false negatives.
- appctl options to query status.

I agree that we have few issues but they can be reworked.
 -  invoke dpdk_is_enabled() from generic code (vswitchd/bridge.c) isn't nice, 
I had to do  to pass few unit test cases last time.
 -  Half a dozen stub APIs. I couldn't avoid it as they are needed to get the 
kernel datapath build.

The patch series can be categorized  in to sub patchesets (KA infrastructure/  
OvSDB changes/  Query KA stats / Check False positives).  This patch series in 
the current form is using rte_keepalive library to handle PMD thread. But 
importantly has  introduced basic infrastructure to deal with other threads in 
the future.  

Regards,
Bhanuprakash. 

>-Original Message-
>From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>Sent: Friday, August 4, 2017 2:40 PM
>To: ovs-dev@openvswitch.org; Bodireddy, Bhanuprakash
>
>Cc: Darrell Ball ; Ben Pfaff ; Aaron
>Conole 
>Subject: Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive
>functionality.
>
>Hi Bhanuprakash,
>
>Thanks for working on this.
>I have a general concern about implementation of this functionality:
>
>*What is the profit from using rte_keepalive library ?*
>
>Pros:
>
>* No need to implement 'rte_keepalive_dispatch_pings()' (40 LOC)
>  and struct rte_keepalive (30 LOC, can be significantly decreased
>  by removing not needed elements) ---> ~70 LOC.
>
>Cons:
>
>* DPDK dependency:
>
>* Implementation of PMD threads management (KA) inside netdev code
>  (netdev-dpdk) looks very strange.
>* Many DPDK references in generic code (like dpdk_is_enabled).
>* Feature isn't available for the common threads (main?) wihtout DPDK.
>* Many stubs and placeholders for cases without dpdk.
>* No ability for unit testing.
>
>So, does it worth to use rte_keepalive? To make functionality fully generic we
>only need to implement 'rte_keepalive_dispatch_pings()'
>and few helpers. As soon as this function does nothing dpdk-specific it's a
>really simple task which will allow to greatly clean up the code. The feature 
>is
>too big to use external library for 70 LOCs of really simple code. (Clean up
>should save much more).
>
>Am I missed something?
>Any thoughts?
>
>Best regards, Ilya Maximets.
>
>> Keepalive feature is aimed at achieving Fastpath Service Assurance in
>> OVS-DPDK deployments. It adds support for monitoring the packet
>> processing cores(PMD thread cores) by dispatching heartbeats at
>> regular intervals. Incase of heartbeat misses additional health checks
>> are enabled on the PMD thread to detect the failure and the same shall
>> be reported to higher level fault management systems/frameworks.
>>
>> The implementation uses OVSDB for reporting the health of the PMD
>threads.
>> Any external monitoring application can read the status from OVSDB at
>> regular intervals (or) subscribe to the updates in OVSDB so that they
>> get notified when the changes happen on OVSDB.
>>
>> keepalive info struct is created and initialized for storing the
>> status of the PMD threads. This is initialized by main
>> thread(vswitchd) as part of init process and will be periodically updated by
>'keepalive'
>> thread. keepalive feature can be enabled through below OVSDB settings.
>>
>> enable-keepalive=true
>>   - Keepalive feature is disabled by default.
>>
>> keepalive-interval="5000"
>>   - Timer interval in milliseconds for monitoring the packet
>> processing cores.
>>
>> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
>> up at regular intervals to update the timestamp and status of pmd
>> cores in keepalive info struct. This information shall be read by
>> vswitchd thread and write the status in to 'keepalive' column of

Re: [ovs-dev] [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy.

2017-08-04 Thread Fischetti, Antonio

LGTM

Acked-by: Antonio Fischetti 

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Friday, August 4, 2017 3:17 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Darrell Ball ;
> Wang, Yipeng1 ; Kevin Traynor ;
> Loftus, Ciara ; Fischetti, Antonio
> ; Ilya Maximets 
> Subject: [PATCH v3 2/2] dpif-netdev: Fix emc replacement policy.
> 
> Current EMC replacement policy allows to replace active EMC entry
> even if there are dead (empty) entries available. This leads to
> EMC trashing even on few hundreds of flows. In some cases PMD
> threads starts to execute classifier lookups even in tests with
> 50 - 100 active flows.
> 
> Looks like the hash comparison rule was introduced to randomly
> choose one of alive entries to replace. But it doesn't work as
> needed and also hashes has nothing common with randomness.
> 
> Lets fix the replacement policy by removing hash checking and
> using the random value passed from 'emc_probabilistic_insert()'
> only while considering replace of the alive entry.
> This should give us nearly fair way to choose the entry to replace.
> 
> We are avoiding calculation of the new random value by reusing
> bits of already generated random for probabilistic EMC insertion.
> Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
> lower bits are less than 'min' and not fully random.
> 
> Not replacing of alive entries while dead ones exists allows to
> significantly decrease EMC trashing.
> 
> Testing shows stable work of exact match cache without misses
> with up to 3072 - 6144 active flows (depends on traffic pattern).
> 
> Signed-off-by: Ilya Maximets 
> ---
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-04 Thread Loftus, Ciara

> 
> Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> So L4 packets's cksum were calculated in VM side but performance is not
> good.
> Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput and
> makes virtio-net frontend-driver support NETIF_F_SG as well
> 
> Signed-off-by: Zhenyu Gao 
> ---
> 
> Here is some performance number:
> 
> Setup:
> 
>  qperf client
> +-+
> |   VM|
> +-+
>  |
>  |  qperf server
> +--+  ++
> | vswitch+dpdk |  | bare-metal |
> +--+  ++
>||
>||
>   pNic-PhysicalSwitch
> 
> do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on'
> in VM side.
>   It offload cksum job to ovs-dpdk side.
> 
> do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM
> side.
> VM calculate cksum for tcp/udp packets.
> 
> We can see huge improvment in TCP throughput if we leverage ovs-dpdk
> cksum.

Hi Zhenyu,

Thanks for the patch. I tested some alternative use cases and unfortunately I 
see a degradation for phy-phy and phy-vm-phy topologies.
Here are my results:

phy-vm-phy:
without patch: 0.871Mpps
with patch (offload=on): 0.877Mpps
with patch (offload=off): 0.891Mpps

phy-phy:
without patch: 13.581Mpps
with patch: 13.055Mpps

The half a million pps drop for the second test case is concerning to me but 
not surprising since we're adding extra complexity to netdev_dpdk_send()
Could this be avoided? Would it make sense to put this functionality somewhere 
else eg. vhost receive?

On another note I have a general concern. I understand similar functionality is 
present in the DPDK vhost sample app. I wonder if it would be feasible for this 
to be implemented in the DPDK vhost library and leveraged here, rather than 
having two implementations in two separate code bases.

I have some other comments inline.

Thanks,
Ciara

> 
> [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
> tcp_bw tcp_lat udp_bw udp_lat
>   do cksum in ovs-dpdk  do cksum in VM without this patch
> tcp_bw:
> bw  =  2.05 MB/secbw  =  1.92 MB/secbw  =  1.95 MB/sec
> tcp_bw:
> bw  =  3.9 MB/sec bw  =  3.99 MB/secbw  =  3.98 MB/sec
> tcp_bw:
> bw  =  8.09 MB/secbw  =  7.82 MB/secbw  =  8.19 MB/sec
> tcp_bw:
> bw  =  14.9 MB/secbw  =  14.8 MB/secbw  =  15.7 MB/sec
> tcp_bw:
> bw  =  27.7 MB/secbw  =  28 MB/sec  bw  =  29.7 MB/sec
> tcp_bw:
> bw  =  51.2 MB/secbw  =  50.9 MB/secbw  =  54.9 MB/sec
> tcp_bw:
> bw  =  86.7 MB/secbw  =  86.8 MB/secbw  =  95.1 MB/sec
> tcp_bw:
> bw  =  149 MB/sec bw  =  160 MB/sec bw  =  149 MB/sec
> tcp_bw:
> bw  =  211 MB/sec bw  =  205 MB/sec bw  =  216 MB/sec
> tcp_bw:
> bw  =  271 MB/sec bw  =  254 MB/sec bw  =  275 MB/sec
> tcp_bw:
> bw  =  326 MB/sec bw  =  303 MB/sec bw  =  321 MB/sec
> tcp_bw:
> bw  =  407 MB/sec bw  =  359 MB/sec bw  =  361 MB/sec
> tcp_bw:
> bw  =  816 MB/sec bw  =  512 MB/sec bw  =  419 MB/sec
> tcp_bw:
> bw  =  840 MB/sec bw  =  756 MB/sec bw  =  457 MB/sec
> tcp_bw:
> bw  =  1.07 GB/secbw  =  880 MB/sec bw  =  480 MB/sec
> tcp_bw:
> bw  =  1.17 GB/secbw  =  1.01 GB/secbw  =  488 MB/sec
> tcp_bw:
> bw  =  1.17 GB/secbw  =  1.11 GB/secbw  =  483 MB/sec
> tcp_lat:
> latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
> tcp_lat:
> latency  =  28.9 us   latency  =  29.3 us   latency  =  29.5 us
> tcp_lat:
> latency  =  29 us latency  =  29.3 us   latency  =  29.6 us
> tcp_lat:
> latency  =  29 us latency  =  29.4 us   latency  =  29.5 us
> tcp_lat:
> latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
> tcp_lat:
> latency  =  29.1 us   latency  =  29.3 us   latency  =  29.7 us
> tcp_lat:
> latency  =  29.4 us   latency  =  29.6 us   latency  =  30 us
> tcp_lat:
> latency  =  29.8 us   latency  =  30.1 us   latency  =  30.2 us
> tcp_lat:
> latency  =  30.9 us   latency  =  30.9 us   latency  =  31 us
> tcp_lat:
> latency  =  46.9 us   latency  =  46.2 us   latency  =  32.2 us
> tcp_lat:
> latency  =  51.5 us   latency  =  52.6 us   latency  =  34.5 us
> tcp_lat:
> latency  =  43.9 us   latency  =  43.8 us   latency  =  43.6 us
> tcp_lat:
>  latency  =  47.6 us  latency  =  48 us latency  =  48.1 us
> tcp_lat:
> latency  =  77.7 us   latency  =  78.8 us

Re: [ovs-dev] [PATCH v5 2/3] ovn-controller: Add a new action - 'put_nd_ra_opts'

2017-08-04 Thread Numan Siddique

On Fri, Aug 4, 2017 at 4:09 PM, Numan Siddique  wrote:

>
>
> On Fri, Aug 4, 2017 at 3:02 AM, Ben Pfaff  wrote:
>
>> On Mon, Jul 31, 2017 at 06:11:35PM +0530, nusid...@redhat.com wrote:
>> > From: Numan Siddique 
>> >
>> > This patch adds a new OVN action 'put_nd_ra_opts' to support native
>> > IPv6 Router Advertisement in OVN. This action can be used to respond
>> > to the IPv6 Router Solicitation requests.
>> >
>> > ovn-controller parses this action and adds a NXT_PACKET_IN2 OF flow
>> > with 'pause' flag set and the RA options stored in 'userdata' field.
>> > This action is similar to 'put_dhcp_opts' and 'put_dhcpv6_opts'.
>> >
>> > When a valid IPv6 RS packet is received by the pinctrl module of
>> > ovn-controller, it frames a new RA packet and sets the RA options
>> > from the 'userdata' field and resumes the packet storing 1 in the
>> > 1-bit result sub-field. If the packet is invalid, it resumes the
>> > packet without any modifications storing 0 in the 1-bit result
>> > sub-field.
>> >
>> > Eg. reg0[5] = put_nd_ra_opts(address_mode = "slaac", mtu = 1450,
>> >  slla = 01:02:03:04:05:06, prefix =
>> aef0::/64)
>> >
>> > Note that unlike DHCPv4/v6, a new table to store the supported IPv6 ND
>> RA
>> > options is not added in SB DB since there are only 3 ND RA options.
>> >
>> > Co-authored-by: Zongkai LI 
>> > Signed-off-by: Zongkai LI 
>> > Signed-off-by: Numan Siddique 
>> > Acked-by: Miguel Angel Ajo 
>>
>> Thanks for working on this.
>>
>> Looking at it, the treatment of some of the options strikes me as odd.
>> Some of the options (SLL) are actually required, and others could be
>> supplied as fixed data since there's a default value (mo_flags, mtu).
>> Only the prefixes seem to really be options in what I think of as the
>> usual sense.  It looks to me like those prefixes could be supplied
>> directly in the userdata as bytes to append to the packet.  It might be
>> cleaner to make these changes--then there would be less parsing of text
>> options, encoding them as binary options, decoding those binary options,
>> and then re-encoding them again in the packet.
>>
>>
> Thanks for the review Ben. I think what you are suggesting makes sense and
> simplies the code. Based on your comments, this is what I am planning to
> modify this patch.
>
> "put_nd_ra_opts" action would be like
>
> res_bit = put_nd_ra_opts(slla, mtu, mo_flags, ...)
>
>  ovn-northd would include 0 or more prefixes following the mo_flags field
> depending the address mode defined by the user.
>
> Eg.
> reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 0, aef0::/64)
> reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 64, aef0::/64,
> bef0::/64)
> reg0[4] = put_nd_ra_opts(01:02:03:04:05:06, 1450, 128)
>
>
> If this makes sense, I will update the patches accordingly. Please let me
> know.
>
>
And also it is possible to store the complete ICMPv6 response data (which
includes ICMP v6 header, flags and options) in the userdata field as you
mentioned, making the job of framing the reply easier.

Numan

Thanks again
>
> Numan
>
>
> If that doesn't make sense, though, this ALIGNED_CAST in
>> pinctrl_handle_put_nd_ra_opts()looks wrong--I see no reason to believe
>> that opt_data is 32-bit aligned.  Probably should use
>> get_unaligned_be32():
>> if (opt_len == sizeof(ovs_be32)) {
>> mtu = *(ALIGNED_CAST(const ovs_be32 *, opt_data));
>> }
>>
>> Thanks,
>>
>> Ben.
>>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-04 Thread Ilya Maximets

Currently, real insertion probability is higher than configured
for the maximum case because of wrong usage of the random value.

i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
equals to 1. In this case we're allowing insert if random vailue
is less or equal to 1. So, two of 2**32 values (0 and 1) are
allowed and real probability is 2 times higher than configured.

This happens because 'random_uint32()' returns value in range
[0; UINT32_MAX], but for the checking to be correct we should
generate random value in range [0; UINT32_MAX - 1].

To fix this we have 4 possible solutions:

 1. need to use uint64_t for 'emc-insert-min' and calculate it
as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
range [0; UINT32_MAX].

This may decrease performance becaue of 64 bit atomic ops.

 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
because it's the only value we have issues with.

This will require additional explanations and not very friendly
for users.

 3. Generate random value in range [0; UINT32_MAX - 1].

This will require heavy division operation.

 4. Decrease the range of available values for 'emc-insert-inv-prob'.

Actually, we don't need to have so much different values for
that option. I beleve that values higher than 1M are completely
useless. Choosing the upper limit as a power of 2 like 2**20 we
will be able to mask the generated random value in a fast way
and also avoid range issue, because same uint32_t can be used to
store 2**20.

This patch implements solution #4.

CC: Ciara Loftus 
Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
Signed-off-by: Ilya Maximets 
Acked-by: Antonio Fischetti 
---

Infrastructure and logic introduced here will be used for fixing
emc replacement policy.

 lib/dpif-netdev.c| 14 ++
 vswitchd/vswitch.xml |  3 ++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8ecc9d1..e23c674 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -153,9 +153,14 @@ struct netdev_flow_key {
 #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
 #define EM_FLOW_HASH_SEGS 2
 
+/* Set up maximum inverse EMC insertion probability to 2^20 - 1.
+ * Higher values considered useless in practice. */
+#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
+#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
+#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
 /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
 #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
-#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
+#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /\
 DEFAULT_EM_FLOW_INSERT_INV_PROB)
 
 struct emc_entry {
@@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
 uint32_t min;
 atomic_read_relaxed(>dp->emc_insert_min, );
 
-if (min && random_uint32() <= min) {
+if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
 emc_insert(>flow_cache, key, flow);
 }
 }
@@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 }
 
 atomic_read_relaxed(>emc_insert_min, _min);
-if (insert_prob <= UINT32_MAX) {
-insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
+if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
+insert_min = insert_prob == 0
+ ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;
 } else {
 insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
 insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 074535b..61f252e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -381,7 +381,8 @@
   
 
   
+  type='{"type": "integer",
+ "minInteger": 0, "maxInteger": 1048575}'>
 
   Specifies the inverse probability (1/emc-insert-inv-prob) of a flow
   being inserted into the Exact Match Cache (EMC). On average one in
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 0/2] EMC management fixes.

2017-08-04 Thread Ilya Maximets

Version 3:
* Added comment to EM_FLOW_INSERT_INV_PROB_SHIFT.

Ilya Maximets (2):
  dpif-netdev: Decrease range of values for EMC probability.
  dpif-netdev: Fix emc replacement policy.

 lib/dpif-netdev.c| 35 +--
 vswitchd/vswitch.xml |  3 ++-
 2 files changed, 27 insertions(+), 11 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] tests: fix wrapped comment

2017-08-04 Thread Lance Richardson

Add missing '#' to comment line.

Signed-off-by: Lance Richardson 
---
It's odd that this doesn't result in failures in more environments,
but it does fail under Alpine Linux.

 tests/ovn.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 40fa817f9..29c5fc0ab 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -7134,7 +7134,7 @@ ovn_start
 # for the localnet on hv2. These tests are expected to succeed.
 
 # Create three hypervisors and create OVS ports corresponding
-to logical ports.
+# to logical ports.
 net_add n1
 
 sim_add hv1
-- 
2.13.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] 答复: Re: [PATCH] ovn: Support for taas(tap-as-a-service) function

2017-08-04 Thread Russell Bryant

On Thu, Aug 3, 2017 at 8:52 PM,  wrote:

> Miguel Ángel and Russell
>
> Thanks for your reviews.
>
> Current taas function just for port monitor, in this situation, we can
> simplify the design by just add new port type. But we have the plane to add
> flow_classifier to tap_flow to monitor special flows of given port. The
> flow_classifier definition may like as follow:
> 'flow_classifiers': {
> 'id': {'allow_post': False, 'allow_put': False,
>'validate': {'type:uuid': None}, 'is_visible': True,
>'primary_key': True},
> 'tenant_id': {'allow_post': True, 'allow_put': False,
>   'validate': {'type:string': None},
>   'required_by_policy': True, 'is_visible': True},
> 'name': {'allow_post': True, 'allow_put': True,
>  'validate': {'type:string': None},
>  'is_visible': True, 'default': ''},
> 'description': {'allow_post': True, 'allow_put': True,
> 'validate': {'type:string': None},
> 'is_visible': True, 'default': ''},
> 'protocol': {'allow_post': True, 'allow_put': True,
>  'validate': {'type:string': None},
>  'is_visible': True, 'default': ''},
> 'src_port_range_min': {'allow_post': True, 'allow_put': True,
>'convert_to': attr.convert_to_int,
>'is_visible': True, 'default': 0},
> 'src_port_range_max': {'allow_post': True, 'allow_put': True,
>'convert_to': attr.convert_to_int,
>'is_visible': True, 'default': 0},
> 'dst_port_range_min': {'allow_post': True, 'allow_put': True,
>'convert_to': attr.convert_to_int,
>'is_visible': True, 'default': 0},
> 'dst_port_range_max': {'allow_post': True, 'allow_put': True,
>'convert_to': attr.convert_to_int,
>'is_visible': True, 'default': 0},
> 'src_ip_prefix': {'allow_post': True, 'allow_put': True,
>   'validate': {'type:subnet':
> attr._validate_subnet},
>   'is_visible': True, 'default': '0.0.0.0/0'},
> 'dst_ip_prefix': {'allow_post': True, 'allow_put': True,
>   'validate': {'type:subnet':
> attr._validate_subnet},
>   'is_visible': True, 'default': '0.0.0.0/0'}
> }
>
> This may need more complex pipeline. So I think add a new table and new
> pipeline may be a easier way.
>

Thanks for sharing the info on future capabilities.

We have a very flexible syntax for traffic classification in OVN.  It's the
logical flow match syntax (see logical flows in the southbound database).
We expose this syntax in the northbound database in the "match" column of
the ACL table.

This would be another use case where we could use this syntax in the
northbound database.  Expanding on my preview proposal:

 - a new port type of 'mirror'

 - when port type=mirror, an option to identify which port is being mirrored

 - (the new part) when port type=mirror, an option that may be used to
specify traffic classification for the subset of traffic on a port to
mirror, in "match" syntax

Do you think this captures the use case?

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovs-vsctl-bashcomp: Make compatible with busybox "awk".

2017-08-04 Thread Lance Richardson

> From: "Ben Pfaff" 
> To: d...@openvswitch.org
> Cc: "Ben Pfaff" , "Stuart Cardall" 
> Sent: Friday, 14 July, 2017 12:42:54 AM
> Subject: [ovs-dev] [PATCH] ovs-vsctl-bashcomp: Make compatible with busybox   
> "awk".
> 
> It seems that awk in busybox doesn't think that an empty string is part of
> a larger string, but that GNU awk does.  This commit adds an extra test to
> make _ovs_vsctl_check_startswith_string work either way.
> 
> This allows the following tests to pass with busybox awk:
> 
> vsctl bashcomp unit tests
> 
>   7: vsctl-bashcomp - basic verification ok
>   8: vsctl-bashcomp - argument completionok
> 
> Reported-by: Stuart Cardall 
> Signed-off-by: Ben Pfaff 
> ---

Makes sense, verified that these test cases now pass in the Alpine
environment with busybox awk.

Acked-by: Lance Richardson 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for EMC probability.

2017-08-04 Thread Fischetti, Antonio

My reply inline.

Acked-by: Antonio Fischetti 

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Friday, August 4, 2017 12:38 PM
> To: Fischetti, Antonio ; ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Loftus, Ciara
> 
> Subject: Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values 
> for
> EMC probability.
> 
> Hi Antonio,
> 
> Thanks for review and testing. Comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 03.08.2017 18:37, Fischetti, Antonio wrote:
> > LGTM,
> > just wondering if a further comment would help,
> > eg something like
> >
> > +/* random_uint32() returns a value in the [0; UINT32_MAX] range.
> > +   For our checking to be correct we would need instead a random value
> > +   in the range [0; UINT32_MAX - 1]. To avoid further computation
> > +   we use a decreased range of available values for 'emc-insert-inv-prob'
> > +   ie [0; 2**20 - 1]. */
> > +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
> > +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
> > +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
> >
> > ?
> >
> > -Antonio
> 
> I don't think that such a comment will be useful for the reader.
> This is more like explanation why we made this change and it should
> be in commit message (which already has this information).
> 
> In addition, the next patch adds build time assert which will forbid
> using of EM_FLOW_INSERT_INV_PROB_SHIFT higher than 30.
> 
> One thing we may clarify here is why 20 was chosen.
> Something like this:
> 
> /* Set up maximum inverse EMC insertion probability to 2^20 - 1.
>  * Higher values considered useless in practice. */
> 
> What do you think?

[Antonio] yes, sounds good. Thanks.


> 
> >
> >> -Original Message-
> >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org]
> >> On Behalf Of Ilya Maximets
> >> Sent: Monday, July 31, 2017 3:41 PM
> >> To: ovs-dev@openvswitch.org
> >> Cc: Heetae Ahn ; Ilya Maximets
> >> 
> >> Subject: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Decrease range of values for
> EMC
> >> probability.
> >>
> >> Currently, real insertion probability is higher than configured
> >> for the maximum case because of wrong usage of the random value.
> >>
> >> i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
> >> equals to 1. In this case we're allowing insert if random vailue
> >> is less or equal to 1. So, two of 2**32 values (0 and 1) are
> >> allowed and real probability is 2 times higher than configured.
> >>
> >> This happens because 'random_uint32()' returns value in range
> >> [0; UINT32_MAX], but for the checking to be correct we should
> >> generate random value in range [0; UINT32_MAX - 1].
> >>
> >> To fix this we have 4 possible solutions:
> >>
> >>  1. need to use uint64_t for 'emc-insert-min' and calculate it
> >> as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
> >> range [0; UINT32_MAX].
> >>
> >> This may decrease performance becaue of 64 bit atomic ops.
> >>
> >>  2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
> >> because it's the only value we have issues with.
> >>
> >> This will require additional explanations and not very friendly
> >> for users.
> >>
> >>  3. Generate random value in range [0; UINT32_MAX - 1].
> >>
> >> This will require heavy division operation.
> >>
> >>  4. Decrease the range of available values for 'emc-insert-inv-prob'.
> >>
> >> Actually, we don't need to have so much different values for
> >> that option. I beleve that values higher than 1M are completely
> >> useless. Choosing the upper limit as a power of 2 like 2**20 we
> >> will be able to mask the generated random value in a fast way
> >> and also avoid range issue, because same uint32_t can be used to
> >> store 2**20.
> >>
> >> This patch implements solution #4.
> >>
> >> CC: Ciara Loftus 
> >> Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>
> >> Infrastructure and logic introduced here will be used for fixing
> >> emc replacement policy.
> >>
> >>  lib/dpif-netdev.c| 12 
> >>  vswitchd/vswitch.xml |  3 ++-
> >>  2 files changed, 10 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 47a9fa0..123a7c9 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -152,9 +152,12 @@ struct netdev_flow_key {
> >>  #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
> >>  #define EM_FLOW_HASH_SEGS 2
> >>
> >> +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
> >> +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
> >> +#define EM_FLOW_INSERT_INV_PROB_MASK

Re: [ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.

2017-08-04 Thread Ilya Maximets

Hi Bhanuprakash,

Thanks for working on this.
I have a general concern about implementation of this functionality:

*What is the profit from using rte_keepalive library ?*

Pros:

* No need to implement 'rte_keepalive_dispatch_pings()' (40 LOC)
  and struct rte_keepalive (30 LOC, can be significantly decreased
  by removing not needed elements) ---> ~70 LOC.

Cons:

* DPDK dependency:

* Implementation of PMD threads management (KA) inside netdev code
  (netdev-dpdk) looks very strange.
* Many DPDK references in generic code (like dpdk_is_enabled).
* Feature isn't available for the common threads (main?) wihtout DPDK.
* Many stubs and placeholders for cases without dpdk.
* No ability for unit testing.

So, does it worth to use rte_keepalive? To make functionality fully
generic we only need to implement 'rte_keepalive_dispatch_pings()'
and few helpers. As soon as this function does nothing dpdk-specific
it's a really simple task which will allow to greatly clean up the
code. The feature is too big to use external library for 70 LOCs of
really simple code. (Clean up should save much more).

Am I missed something?
Any thoughts?

Best regards, Ilya Maximets.

> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing cores(PMD thread cores) by dispatching heartbeats at regular
> intervals. Incase of heartbeat misses additional health checks are
> enabled on the PMD thread to detect the failure and the same shall be
> reported to higher level fault management systems/frameworks.
> 
> The implementation uses OVSDB for reporting the health of the PMD threads.
> Any external monitoring application can read the status from OVSDB at 
> regular intervals (or) subscribe to the updates in OVSDB so that they get
> notified when the changes happen on OVSDB.
> 
> keepalive info struct is created and initialized for storing the
> status of the PMD threads. This is initialized by main thread(vswitchd)
> as part of init process and will be periodically updated by 'keepalive'
> thread. keepalive feature can be enabled through below OVSDB settings.
> 
> enable-keepalive=true
>   - Keepalive feature is disabled by default.
> 
> keepalive-interval="5000"
>   - Timer interval in milliseconds for monitoring the packet
> processing cores.
> 
> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
> up at regular intervals to update the timestamp and status of pmd cores
> in keepalive info struct. This information shall be read by vswitchd thread
> and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB.
> 
> An external monitoring framework like collectd with ovs events support
> can read (or) subscribe to the datapath status changes in ovsdb. When the 
> state
> is updated, the collectd shall be notified and will eventually relay the 
> status
> to ceilometer service running in the controller. Below is the high level
> overview of deployment model.
> 
> Compute NodeControllerCompute Node
> 
> Collectd  <--> Ceilometer <>   Collectd
> 
> OvS DPDK   OvS DPDK
> 
> +-+
> | VM  |
> +--+--+
> \---+---/
> |
> +--+---+   ++--+ +--+---+
> | OVS  |-> |   ovsevents plugin| --> |   collectd   |
> +--+---+   ++--+ +--+---+
> 
> +--+-+ +---++ |
> | Ceilometer | <-- | collectd ceilometer plugin |  <---
> +--+-+ +---++
> 
> github: The patches can be found here:
>   https://github.com/bbodired/ovs (Last master commit e7cd8c363)
> 
> Performance impact:
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
> 
> -
> v2 -> v3
>   * Rebase.
>   * Verified with dpdk-stable-17.05.1 release.
>   * Fixed build issues with MSVC and cross checked with appveyor.
> 
> v1 -> v2
>   * Rebase
>   * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
> is already applied as separate patch.
> 
> RFCv3 -> v1
>   * Made changes to fix failures in some unit test cases.
>   * some more code cleanup w.r.t process related APIs.
> 
> RFCv2 -> RFCv3
>   * Remove POSIX shared memory block implementation (suggested by Aaron).
>   * Rework the logic to register and track threads instead of cores. This way
> in the future any thread can be registered to KA framework. For now only 
> PMD
> threads are tracked (suggested by Aaron).
>   * Refactor few APIs and further clean up the code.
>
> RFCv1 -> RFCv2
>   * Merged the xml and schema commits to later commit where the actual
> implementation is done(suggested by Ben).
>   * Fix ovs-appctl keepalive/* hang issue

Re: [ovs-dev] [PATCH] OF1.5/EXT-334 OXS/Extensible Flow Entry Statistics Support

2017-08-04 Thread Satyavalli Rama

Hi Ben,

Much Thanks for the Review.
We will address the comments and will submit the updated patch soon.

Thanks & Regards
Satya Valli
Tata Consultancy Services
Mailto: satyavalli.r...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting

From:   Ben Pfaff 
To: SatyaValli 
Cc: d...@openvswitch.org, Surya Muttamsetty 
, SatyaValli 
Date:   08/03/2017 01:41 AM
Subject:Re: [ovs-dev] [PATCH] OF1.5/EXT-334 OXS/Extensible Flow 
Entry Statistics Support

Thanks for the revision.  I have some more comments.

On Mon, Jun 19, 2017 at 01:14:40PM +0530, SatyaValli wrote:
> commit 49a3592b2878a33033e5fd2e6e5ce82ebccef780
> Author: Satya Valli 
> Co-authored-by: Lavanya Harivelam 
> Co-authored-by: Surya Muttamsetty 

None of the above should be at the top of the commit message.  Coauthors
go with the sign-offs at the end.

> OpenVswitch: OF1.5/EXT-334 Extensible Flow Entry Statistics

Please don't repeat the subject again in the body.

> OpenFlow 1.5 introduces the Extensible Statistics (OXS) by redefining 
the existing
> flow entry statistics with OXS Fields.
> This Patch provides implementation for OXS fields encoding in TLV 
format.
> 
> To support this implementation below two messages are newly added
> 
> OFPST_OXS_FLOW_STATS_REQUEST
> OFPST_OXS_FLOW_STATS_REPLY
> OFPST_OXS_AGGREGATE_STATS_REQUEST
> OFPST_OXS_AGGREGATE_STATS_REPLY
> OFPST_FLOW_REMOVED

> As per the openflow specification-1.5, this enhancement should take 
place
> on the existing flow entry statistics with the OXS fields on all the 
messages
> that carries flow entry statistics.
> 
> The current commit adds support for the new feature in flow statistics 
multipart messages,
> aggregate multipart messages and OXS flow statistics support for flow 
removal message.
> 
> Some more fields are added to ovs-ofctl dump-flows command to support 
OpenFlow15 OXS stats.
> Below are Commands to display OXS stats field wise
> 
> Flow Statistics Multipart
> ovs-ofctl dump-flows -O OpenFlow15  oxs-duration
> ovs-ofctl dump-flows -O OpenFlow15  oxs-idle_time
> ovs-ofctl dump-flows -O OpenFlow15  oxs-packet_count
> ovs-ofctl dump-flows -O OpenFlow15  oxs-byte_count
> 
> Aggregate Flow Statistics Multipart
> ovs-ofctl dump-aggregate -O OpenFlow15  oxs-packet_count
> ovs-ofctl dump-aggregate -O OpenFlow15  oxs-byte_count
> ovs-ofctl dump-aggregate -O OpenFlow15  oxs-flow_count
> 
> Make Check Changes in  - ofproto.at and ofproto-dpif.at:
> For 1.5 version "flags"  variable has been removed from flow_stats_reply 
structure.
> For avoiding the make check failures for OF1.5 version add-flow with 
flags,
> below test cases has been modified for 1.5 version in respective .at 
files
> 
> Files   testcase
> ofproto.at  0884
> ofproto-dpif1126

Please don't refer to tests by number.  The numbers are not meaningful
and change often.  Use the name of the test.

> In ofproto.at there is one test case numbered 0884 and 1126 in 
ofproto-dpif.at
> which is validating flagsfor OF1.5 this testcase is not valid, because
> in stats reply flags are not explicitly communicated,I've removed this 
test case.
> Before removing this test case is failing for 1.5 version.
> 
> Since FLOW_REMOVED is not supported in OF1.5 version,
> testing has been done by adding test case in ofproto.at.
> While doing make check after adding test case we are able to see
> OFPT_FLOW_REMOVED (OF1.5) message has sent successfully.
> 
> But make check is failing because of Signal 15 termination,
> so we've removed that test case from the patch.

Why isn't FLOW_REMOVED supported in OF1.5?  Support is needed.  Signal
15 is SIGTERM, meaning that something actually used "kill" to kill it
intentionally.  Please investigate and fix the problem rather than
removing a test.

This adds features that users can use via ovs-ofctl, but it does not
document them.  Please document them in ovs-ofctl(8).  Please also add
appropriate NEWS items to explain the new features.

This adds support for new OpenFlow messages, but it does not add any
tests for printing them in ofp-print.at.  Please add at least one
representative test there for every new message.

struct ofp15_flow_removed doesn't seem to be the same as
ofp_flow_removed in OpenFlow 1.5.

This adds a global variable oxs_field_set.  This is inappropriate.  Do
not use global variables.

The variable oxs_field_set has a bunch of unnamed magic numbers.  This
is inappropriate.  Do not use unnamed magic numbers.

This code is weird.  Fix it please:

+if (parse_oxs_field(name, )) {
+if (f->fl_type == OFPXST_OFB_DURATION) {
+oxs_field_set |= (1 <<

[ovs-dev] [PATCH v3 12/19] keepalive: Add support to query keepalive status.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit adds support to query if keepalive status is
enabled/disabled.

  $ ovs-appctl keepalive/status
keepAlive Status: Enabled

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/keepalive.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/keepalive.c b/lib/keepalive.c
index a2f0314..43f8f11 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -429,6 +429,19 @@ out:
 ds_destroy();
 }
 
+static void
+ka_unixctl_status(struct unixctl_conn *conn, int argc OVS_UNUSED,
+  const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
+{
+struct ds ds = DS_EMPTY_INITIALIZER;
+
+ds_put_format(, "keepAlive Status: %s",
+  ka_is_enabled() ? "Enabled" : "Disabled");
+
+unixctl_command_reply(conn, ds_cstr());
+ds_destroy();
+}
+
 static int
 ka_init__(void)
 {
@@ -473,6 +486,8 @@ ka_init(const struct smap *ovs_other_config)
 
 unixctl_command_register("keepalive/pmd-health-show", "", 0, 0,
   ka_unixctl_pmd_health_show, ka_info);
+unixctl_command_register("keepalive/status", "", 0, 0,
+  ka_unixctl_status, NULL);
 
 ovsthread_once_done(_enable);
 }
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 11/19] keepalive: Add support to query keepalive statistics.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit adds support to query keepalive statistics. Datapath health
status can be retrieved as follows:

  $ ovs-appctl keepalive/pmd-health-show

  Keepalive status

keepalive status  : Enabled
keepalive interval: 1000 ms
PMD threads   : 8

 PMDCORESTATE   LAST SEEN TIMESTAMP
pmd620  ALIVE   8632183482028293
pmd631  ALIVE   8632183482028425
pmd642  ALIVE   8632190191004294
pmd653  ALIVE   8632183482028525
pmd664  GONE8612183482028117
pmd675  ALIVE   8632190191004984
pmd686  ALIVE   8632190191005713
pmd697  ALIVE   8632190191006555

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/keepalive.c | 78 +
 1 file changed, 78 insertions(+)

diff --git a/lib/keepalive.c b/lib/keepalive.c
index 83ce26d..a2f0314 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -23,9 +23,11 @@
 #include "dpdk.h"
 #include "keepalive.h"
 #include "lib/vswitch-idl.h"
+#include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
 #include "ovs-thread.h"
 #include "process.h"
+#include "unixctl.h"
 
 VLOG_DEFINE_THIS_MODULE(keepalive);
 
@@ -354,6 +356,79 @@ ka_stats_run(void)
 return ka_stats;
 }
 
+static void
+ka_unixctl_pmd_health_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
+   const char *argv[] OVS_UNUSED, void *ka_info_)
+{
+struct ds ds = DS_EMPTY_INITIALIZER;
+ds_put_format(,
+  "\n\t\tKeepalive status\n\n");
+
+ds_put_format(, "keepalive status  : %s\n",
+  ka_is_enabled() ? "Enabled" : "Disabled");
+
+if (!ka_is_enabled()) {
+goto out;
+}
+
+ds_put_format(, "keepalive interval: %"PRIu32" ms\n",
+  get_ka_interval());
+
+struct keepalive_info *ka_info = (struct keepalive_info *)ka_info_;
+if (OVS_UNLIKELY(!ka_info)) {
+goto out;
+}
+
+ds_put_format(, "PMD threads   : %"PRIu32" \n", ka_info->pmd_cnt);
+ds_put_format(,
+  "\n PMD\tCORE\tSTATE\tLAST SEEN TIMESTAMP\n");
+
+struct ka_process_info *pinfo, *pinfo_next;
+
+ovs_mutex_lock(_info->proclist_mutex);
+HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, _info->process_list) {
+char *state = NULL;
+if (pinfo->core_state == KA_STATE_UNUSED ||
+  pinfo->core_state == KA_STATE_SLEEP)
+continue;
+
+switch (pinfo->core_state) {
+case KA_STATE_ALIVE:
+state = "ALIVE";
+break;
+case KA_STATE_MISSING:
+state = "MISSING";
+break;
+case KA_STATE_DEAD:
+state = "DEAD";
+break;
+case KA_STATE_GONE:
+state = "GONE";
+break;
+case KA_STATE_DOZING:
+state = "DOZING";
+break;
+case KA_STATE_SLEEP:
+state = "SLEEP";
+break;
+case KA_STATE_CHECK:
+state = "HEALTH_CHECK_RUNNING";
+break;
+case KA_STATE_UNUSED:
+break;
+}
+ds_put_format(, "%s\t%2d\t%s\t%"PRIu64"\n",
+  pinfo->name, pinfo->core_id, state,
+  pinfo->core_last_seen_times);
+}
+ovs_mutex_unlock(_info->proclist_mutex);
+
+ds_put_format(, "\n");
+out:
+unixctl_command_reply(conn, ds_cstr());
+ds_destroy();
+}
+
 static int
 ka_init__(void)
 {
@@ -396,6 +471,9 @@ ka_init(const struct smap *ovs_other_config)
 }
 }
 
+unixctl_command_register("keepalive/pmd-health-show", "", 0, 0,
+  ka_unixctl_pmd_health_show, ka_info);
+
 ovsthread_once_done(_enable);
 }
 }
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 10/19] bridge: Update keepalive status in OVSDB

2017-08-04 Thread Bhanuprakash Bodireddy

This commit allows vswitchd thread to update the OVSDB with the
status of all registered PMD threads. The status can be monitored
using ovsdb-client and the sample output is below.

$ ovsdb-client monitor Open_vSwitch Open_vSwitch keepalive

rowaction keepalive
7b746190-ee71-4dcc-becf-f8cb9c7cb909 old  { "pmd62"="ALIVE,0,9226457935188922"
"pmd63"="ALIVE,1,9226457935189628"
"pmd64"="ALIVE,2,9226457935189897"
"pmd65"="ALIVE,3,9226457935190127"}

 new  { "pmd62"="ALIVE,0,9226460230167364"
"pmd63"="ALIVE,1,9226460230168100"
"pmd64"="ALIVE,2,9226460230168905"
"pmd65"="ALIVE,3,9226460230169632"}

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/keepalive.c   | 15 +++
 lib/keepalive.h   |  1 +
 vswitchd/bridge.c | 26 ++
 3 files changed, 42 insertions(+)

diff --git a/lib/keepalive.c b/lib/keepalive.c
index 509a0d1..83ce26d 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -339,6 +339,21 @@ get_ka_stats(void)
 ovs_mutex_unlock();
 }
 
+struct smap *
+ka_stats_run(void)
+{
+struct smap *ka_stats = NULL;
+
+ovs_mutex_lock();
+if (keepalive_stats) {
+ka_stats = keepalive_stats;
+keepalive_stats = NULL;
+}
+ovs_mutex_unlock();
+
+return ka_stats;
+}
+
 static int
 ka_init__(void)
 {
diff --git a/lib/keepalive.h b/lib/keepalive.h
index b18f6e0..cedc390 100644
--- a/lib/keepalive.h
+++ b/lib/keepalive.h
@@ -101,6 +101,7 @@ int get_ka_init_status(void);
 int ka_alloc_portstats(unsigned, int);
 void ka_destroy_portstats(void);
 void get_ka_stats(void);
+struct smap *ka_stats_run(void);
 
 void dispatch_heartbeats(void);
 #endif /* keepalive.h */
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 8ff91df..56c55f5 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -286,6 +286,7 @@ static bool port_is_synthetic(const struct port *);
 
 static void reconfigure_system_stats(const struct ovsrec_open_vswitch *);
 static void run_system_stats(void);
+static void run_keepalive_stats(void);
 
 static void bridge_configure_mirrors(struct bridge *);
 static struct mirror *mirror_create(struct bridge *,
@@ -403,6 +404,7 @@ bridge_init(const char *remote)
 
 ovsdb_idl_omit_alert(idl, _open_vswitch_col_cur_cfg);
 ovsdb_idl_omit_alert(idl, _open_vswitch_col_statistics);
+ovsdb_idl_omit_alert(idl, _open_vswitch_col_keepalive);
 ovsdb_idl_omit_alert(idl, _open_vswitch_col_datapath_types);
 ovsdb_idl_omit_alert(idl, _open_vswitch_col_iface_types);
 ovsdb_idl_omit(idl, _open_vswitch_col_external_ids);
@@ -2686,6 +2688,29 @@ run_system_stats(void)
 }
 }
 
+void
+run_keepalive_stats(void)
+{
+struct smap *ka_stats;
+const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(idl);
+
+ka_stats = ka_stats_run();
+if (ka_stats && cfg) {
+struct ovsdb_idl_txn *txn;
+struct ovsdb_datum datum;
+
+txn = ovsdb_idl_txn_create(idl);
+ovsdb_datum_from_smap(, ka_stats);
+smap_destroy(ka_stats);
+ovsdb_idl_txn_write(>header_, _open_vswitch_col_keepalive,
+);
+ovsdb_idl_txn_commit(txn);
+ovsdb_idl_txn_destroy(txn);
+
+free(ka_stats);
+}
+}
+
 static const char *
 ofp12_controller_role_to_str(enum ofp12_controller_role role)
 {
@@ -3038,6 +3063,7 @@ bridge_run(void)
 run_stats_update();
 run_status_update();
 run_system_stats();
+run_keepalive_stats();
 }
 
 void
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 09/19] keepalive: Retrieve PMD status periodically.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit implements APIs to retrieve the PMD thread status and return
the status in the below format for each PMD thread.

  Format: pmdid="status,core id,last_seen_timestamp"
  eg: pmd62="ALIVE,2,9220698256784207"
  pmd63="GONE,3,9220698256786231"

The status is periodically retrieved by keepalive thread and stored in
keepalive_stats struc which later shall be retrieved by vswitchd thread.
In case of four PMD threads the status is as below:

   "pmd62"="ALIVE,0,9220698256784207"
   "pmd63"="ALIVE,1,9220698256784913"
   "pmd64"="ALIVE,2,9220698256785902"
   "pmd65"="ALIVE,3,9220698256786231"

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/dpif-netdev.c |  1 +
 lib/keepalive.c   | 73 +++
 lib/keepalive.h   |  1 +
 3 files changed, 75 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 009ee24..65db5fd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -991,6 +991,7 @@ ovs_keepalive(void *f_)
 int n_pmds = cmap_count(>poll_threads) - 1;
 if (n_pmds > 0) {
 dispatch_heartbeats();
+get_ka_stats();
 }
 
 xusleep(get_ka_interval() * 1000);
diff --git a/lib/keepalive.c b/lib/keepalive.c
index 3022789..509a0d1 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -24,6 +24,7 @@
 #include "keepalive.h"
 #include "lib/vswitch-idl.h"
 #include "openvswitch/vlog.h"
+#include "ovs-thread.h"
 #include "process.h"
 
 VLOG_DEFINE_THIS_MODULE(keepalive);
@@ -33,6 +34,9 @@ static bool ka_init_status = ka_init_failure; /* Keepalive 
initialization */
 static uint32_t keepalive_timer_interval; /* keepalive timer interval */
 static struct keepalive_info *ka_info = NULL;
 
+static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
+static struct smap *keepalive_stats OVS_GUARDED_BY(mutex);
+
 inline bool
 ka_is_enabled(void)
 {
@@ -266,6 +270,75 @@ keepalive_info_create(void)
 return ka_info;
 }
 
+static void
+get_pmd_status(struct smap *ka_pmd_stats)
+OVS_REQUIRES(ka_info->proclist_mutex)
+{
+if (OVS_UNLIKELY(!ka_info)) {
+return;
+}
+
+struct ka_process_info *pinfo, *pinfo_next;
+HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, _info->process_list) {
+int core_id = pinfo->core_id;
+char *state = NULL;
+if (pinfo->core_state == KA_STATE_UNUSED ||
+   pinfo->core_state == KA_STATE_SLEEP ) {
+continue;
+}
+
+switch (pinfo->core_state) {
+case KA_STATE_ALIVE:
+state = "ALIVE";
+break;
+case KA_STATE_MISSING:
+state = "MISSING";
+break;
+case KA_STATE_DEAD:
+state = "DEAD";
+break;
+case KA_STATE_GONE:
+state = "GONE";
+break;
+case KA_STATE_DOZING:
+state = "DOZING";
+break;
+case KA_STATE_SLEEP:
+state = "SLEEP";
+break;
+case KA_STATE_CHECK:
+state = "HEALTH_CHECK_RUNNING";
+break;
+case KA_STATE_UNUSED:
+break;
+}
+
+smap_add_format(ka_pmd_stats, pinfo->name, "%s,%d,%ld",
+state, core_id, pinfo->core_last_seen_times);
+}
+}
+
+void
+get_ka_stats(void)
+{
+struct smap *ka_pmd_stats;
+ka_pmd_stats = xmalloc(sizeof *ka_pmd_stats);
+smap_init(ka_pmd_stats);
+
+ovs_mutex_lock(_info->proclist_mutex);
+get_pmd_status(ka_pmd_stats);
+ovs_mutex_unlock(_info->proclist_mutex);
+
+ovs_mutex_lock();
+if (keepalive_stats) {
+smap_destroy(keepalive_stats);
+free(keepalive_stats);
+keepalive_stats = NULL;
+}
+keepalive_stats = ka_pmd_stats;
+ovs_mutex_unlock();
+}
+
 static int
 ka_init__(void)
 {
diff --git a/lib/keepalive.h b/lib/keepalive.h
index 31ee34c..b18f6e0 100644
--- a/lib/keepalive.h
+++ b/lib/keepalive.h
@@ -100,6 +100,7 @@ uint32_t get_ka_interval(void);
 int get_ka_init_status(void);
 int ka_alloc_portstats(unsigned, int);
 void ka_destroy_portstats(void);
+void get_ka_stats(void);
 
 void dispatch_heartbeats(void);
 #endif /* keepalive.h */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 08/19] dpif-netdev: Enable heartbeats for DPDK datapath.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit adds heartbeat mechanism support for DPDK datapath. Heartbeats
are sent to registered PMD threads at predefined intervals (as set in ovsdb
with 'keepalive-interval').

The heartbeats are only enabled when there is atleast one port added to
the bridge and with active PMD thread polling the port.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/dpdk-stub.c   | 6 ++
 lib/dpdk.c| 7 +++
 lib/dpdk.h| 2 ++
 lib/dpif-netdev.c | 9 -
 lib/keepalive.c   | 9 +
 lib/keepalive.h   | 1 +
 6 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index b4f111a..740ec11 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -78,3 +78,9 @@ dpdk_mark_pmd_core_sleep(void)
 {
 /* Nothing */
 }
+
+void
+dpdk_dispatch_pmd_hb(void)
+{
+/* Nothing */
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 250cc2f..59ac035 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -542,3 +542,10 @@ dpdk_mark_pmd_core_sleep(void)
 {
 rte_keepalive_mark_sleep(rte_global_keepalive_info);
 }
+
+/* Dispatch pings */
+void
+dpdk_dispatch_pmd_hb(void)
+{
+rte_keepalive_dispatch_pings(NULL, rte_global_keepalive_info);
+}
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 7619730..2532cb7 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -47,4 +47,6 @@ void dpdk_unregister_pmd_core(unsigned core_id);
 void dpdk_mark_pmd_core_alive(void);
 void dpdk_mark_pmd_core_sleep(void);
 
+void dpdk_dispatch_pmd_hb(void);
+
 #endif /* dpdk.h */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5737223..009ee24 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -981,11 +981,18 @@ sorted_poll_thread_list(struct dp_netdev *dp,
 }
 
 static void *
-ovs_keepalive(void *f_ OVS_UNUSED)
+ovs_keepalive(void *f_)
 {
+struct dp_netdev *dp = f_;
+
 pthread_detach(pthread_self());
 
 for (;;) {
+int n_pmds = cmap_count(>poll_threads) - 1;
+if (n_pmds > 0) {
+dispatch_heartbeats();
+}
+
 xusleep(get_ka_interval() * 1000);
 }
 
diff --git a/lib/keepalive.c b/lib/keepalive.c
index eeaa25a..3022789 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -248,6 +248,15 @@ ka_destroy_portstats(void)
 }
 }
 
+/* Dispatch pings */
+void
+dispatch_heartbeats(void)
+{
+#ifdef DPDK_NETDEV
+dpdk_dispatch_pmd_hb();
+#endif
+}
+
 static struct keepalive_info *
 keepalive_info_create(void)
 {
diff --git a/lib/keepalive.h b/lib/keepalive.h
index 605e41c..31ee34c 100644
--- a/lib/keepalive.h
+++ b/lib/keepalive.h
@@ -101,4 +101,5 @@ int get_ka_init_status(void);
 int ka_alloc_portstats(unsigned, int);
 void ka_destroy_portstats(void);
 
+void dispatch_heartbeats(void);
 #endif /* keepalive.h */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 07/19] dpif-netdev: Register packet processing cores to KA framework.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit registers the packet processing PMD cores to keepalive
framework. Only PMDs that have rxqs mapped will be registered and
actively monitored by KA framework.

This commit spawns a keepalive thread that will dispatch heartbeats to
PMD cores. The pmd threads respond to heartbeats by marking themselves
alive. As long as PMD responds to heartbeats it is considered 'healthy'.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/dpif-netdev.c |  99 +++
 lib/keepalive.c   | 138 +++---
 lib/keepalive.h   |  25 +-
 lib/util.c|  10 
 lib/util.h|   1 +
 5 files changed, 254 insertions(+), 19 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b51674f..5737223 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -73,6 +73,7 @@
 #include "seq.h"
 #include "smap.h"
 #include "sset.h"
+#include "svec.h"
 #include "timeval.h"
 #include "tnl-neigh-cache.h"
 #include "tnl-ports.h"
@@ -979,6 +980,94 @@ sorted_poll_thread_list(struct dp_netdev *dp,
 *n = k;
 }
 
+static void *
+ovs_keepalive(void *f_ OVS_UNUSED)
+{
+pthread_detach(pthread_self());
+
+for (;;) {
+xusleep(get_ka_interval() * 1000);
+}
+
+return NULL;
+}
+
+static void
+ka_thread_start(struct dp_netdev *dp)
+{
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+if (ovsthread_once_start()) {
+ovs_thread_create("ovs_keepalive", ovs_keepalive, dp);
+
+ovsthread_once_done();
+}
+}
+
+static void
+pmd_num_poll_ports(struct dp_netdev_pmd_thread *pmd, int *num_poll_ports)
+{
+struct svec pmd_port_poll_list;
+svec_init(_port_poll_list);
+
+struct rxq_poll *poll;
+const char *port_name;
+int i = 0;
+
+HMAP_FOR_EACH (poll, node, >poll_list) {
+svec_add(_port_poll_list, netdev_rxq_get_name(poll->rxq->rx));
+}
+/* With MQ enabled, remove the duplicates. */
+svec_sort_unique(_port_poll_list);
+SVEC_FOR_EACH (i, port_name, _port_poll_list) {
+VLOG_DBG("%d Port:%s", i, port_name);
+}
+svec_destroy(_port_poll_list);
+
+*num_poll_ports = i;
+VLOG_DBG("PMD thread [%d] polling [%d] ports",
+ pmd->core_id, *num_poll_ports);
+}
+
+static void
+ka_register_datapath_threads(struct dp_netdev *dp)
+{
+int ka_init = get_ka_init_status();
+VLOG_DBG("Keepalive: Was initialization successful? [%s]",
+ka_init ? "Success" : "Failure");
+if (!ka_init) {
+return;
+}
+
+ka_thread_start(dp);
+
+struct dp_netdev_pmd_thread *pmd;
+CMAP_FOR_EACH (pmd, node, >poll_threads) {
+/* Skip PMD thread with no rxqs mapping. */
+if (OVS_UNLIKELY(!hmap_count(>poll_list))) {
+continue;
+}
+
+/*  Register only PMD threads. */
+if (pmd->core_id != NON_PMD_CORE_ID) {
+int err;
+int nports;
+pmd_num_poll_ports(pmd, );
+err = ka_alloc_portstats(pmd->core_id, nports);
+if (err) {
+VLOG_FATAL("Unable to allocate memory for PMD core %d",
+pmd->core_id);
+return;
+}
+
+int tid = ka_get_pmd_tid(pmd->core_id);
+ka_register_thread(tid, true);
+VLOG_DBG("Registered PMD thread [%d] on Core [%d] to KA framework",
+  tid, pmd->core_id);
+}
+}
+}
+
 static void
 dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
  void *aux)
@@ -3626,6 +3715,9 @@ reconfigure_datapath(struct dp_netdev *dp)
 
 /* Reload affected pmd threads. */
 reload_affected_pmds(dp);
+
+/* Register datapath threads to KA monitoring. */
+ka_register_datapath_threads(dp);
 }
 
 /* Returns true if one of the netdevs in 'dp' requires a reconfiguration */
@@ -3862,6 +3954,9 @@ reload:
   : PMD_CYCLES_IDLE);
 }
 
+/* Mark PMD thread alive. */
+ka_mark_pmd_thread_alive();
+
 if (lc++ > 1024) {
 bool reload;
 
@@ -3895,6 +3990,10 @@ reload:
 }
 
 emc_cache_uninit(>flow_cache);
+
+int tid = ka_get_pmd_tid(pmd->core_id);
+ka_unregister_thread(tid, true);
+
 free(poll_list);
 pmd_free_cached_ports(pmd);
 return NULL;
diff --git a/lib/keepalive.c b/lib/keepalive.c
index 0087e5c..eeaa25a 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -24,6 +24,7 @@
 #include "keepalive.h"
 #include "lib/vswitch-idl.h"
 #include "openvswitch/vlog.h"
+#include "process.h"
 
 VLOG_DEFINE_THIS_MODULE(keepalive);
 
@@ -77,21 +78,85 @@ ka_store_pmd_id(unsigned core_idx)
 
 /* Register thread to KA framework. */
 void
-ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED,
-   unsigned core_id)
+ka_register_thread(int tid, bool

[ovs-dev] [PATCH v3 06/19] dpif-netdev: Add helper function to store datapath tids.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit adds an API to store the PMD thread ids in to KA info struct.
The thread ids shall be used to check false positives and for status and
statistics reporting.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/dpif-netdev.c |  3 +++
 lib/keepalive.c   | 13 +
 lib/keepalive.h   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8ecc9d1..b51674f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -49,6 +49,7 @@
 #include "flow.h"
 #include "hmapx.h"
 #include "id-pool.h"
+#include "keepalive.h"
 #include "latch.h"
 #include "netdev.h"
 #include "netdev-vport.h"
@@ -3824,6 +3825,8 @@ pmd_thread_main(void *f_)
 
 poll_list = NULL;
 
+ka_store_pmd_id(pmd->core_id);
+
 /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
 ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
 ovs_numa_thread_setaffinity_core(pmd->core_id);
diff --git a/lib/keepalive.c b/lib/keepalive.c
index 66c969d..0087e5c 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -62,6 +62,19 @@ get_ka_init_status(void)
 return ka_init_status;
 }
 
+void
+ka_store_pmd_id(unsigned core_idx)
+{
+int tid = -1;
+#ifdef DPDK_NETDEV
+tid = rte_sys_gettid();
+#endif
+
+if (ka_is_enabled()) {
+ka_info->thread_id[core_idx] = tid;
+}
+}
+
 /* Register thread to KA framework. */
 void
 ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED,
diff --git a/lib/keepalive.h b/lib/keepalive.h
index c96f2d0..f74b23a 100644
--- a/lib/keepalive.h
+++ b/lib/keepalive.h
@@ -76,6 +76,7 @@ void ka_unregister_pmd_thread(int, bool, unsigned);
 void ka_mark_pmd_thread_alive(void);
 void ka_mark_pmd_thread_sleep(void);
 
+void ka_store_pmd_id(unsigned core);
 uint32_t get_ka_interval(void);
 int get_ka_init_status(void);
 
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 05/19] keepalive: Add more helper functions to KA framework.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit introduces helper functions in 'keepalive' module that are
needed to register/unregister PMD threads to KA framework. Also
introduce APIs to mark the PMD core states.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/keepalive.c | 51 +++
 lib/keepalive.h |  8 
 2 files changed, 59 insertions(+)

diff --git a/lib/keepalive.c b/lib/keepalive.c
index e93dc99..66c969d 100644
--- a/lib/keepalive.c
+++ b/lib/keepalive.c
@@ -49,6 +49,57 @@ ka_get_pmd_tid(unsigned core_idx)
 return tid;
 }
 
+/* Return the Keepalive timer interval. */
+inline uint32_t
+get_ka_interval(void)
+{
+return keepalive_timer_interval;
+}
+
+int
+get_ka_init_status(void)
+{
+return ka_init_status;
+}
+
+/* Register thread to KA framework. */
+void
+ka_register_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED,
+   unsigned core_id)
+{
+if (ka_is_enabled()) {
+dpdk_register_pmd_core(core_id);
+}
+}
+
+/* Unregister thread from KA framework. */
+void
+ka_unregister_pmd_thread(int tid OVS_UNUSED, bool thread_is_pmd OVS_UNUSED,
+ unsigned core_id)
+{
+if (ka_is_enabled()) {
+dpdk_unregister_pmd_core(core_id);
+}
+}
+
+/* Mark packet processing core alive. */
+void
+ka_mark_pmd_thread_alive(void)
+{
+if (ka_is_enabled()) {
+dpdk_mark_pmd_core_alive();
+}
+}
+
+/* Mark packet processing core as idle. */
+inline void
+ka_mark_pmd_thread_sleep(void)
+{
+if (ka_is_enabled()) {
+dpdk_mark_pmd_core_sleep();
+}
+}
+
 void
 ka_set_pmd_state_ts(unsigned core_id, enum keepalive_state state,
 uint64_t last_alive)
diff --git a/lib/keepalive.h b/lib/keepalive.h
index b87b66f..c96f2d0 100644
--- a/lib/keepalive.h
+++ b/lib/keepalive.h
@@ -71,4 +71,12 @@ void ka_set_pmd_state_ts(unsigned, enum keepalive_state, 
uint64_t);
 
 int ka_get_pmd_tid(unsigned core);
 bool ka_is_enabled(void);
+void ka_register_pmd_thread(int, bool, unsigned);
+void ka_unregister_pmd_thread(int, bool, unsigned);
+void ka_mark_pmd_thread_alive(void);
+void ka_mark_pmd_thread_sleep(void);
+
+uint32_t get_ka_interval(void);
+int get_ka_init_status(void);
+
 #endif /* keepalive.h */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 04/19] Keepalive: Add initial keepalive support.

2017-08-04 Thread Bhanuprakash Bodireddy

This commit introduces the initial keepalive support by adding
'keepalive' module and also helper and initialization functions
that will be invoked by later commits.

This commit adds new ovsdb column "keepalive" that shows the status
of the datapath threads. This is implemented for DPDK datapath and
only status of PMD threads is reported.

For eg:
  To enable keepalive feature.
  'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'

  To set timer interval of 5000ms for monitoring packet processing cores.
  'ovs-vsctl --no-wait set Open_vSwitch . \
 other_config:keepalive-interval="5000"

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/automake.mk|   2 +
 lib/dpdk-stub.c|   6 ++
 lib/dpdk.c |  30 +++--
 lib/dpdk.h |   4 ++
 lib/keepalive.c| 157 +
 lib/keepalive.h|  74 +
 lib/netdev-dpdk.c  |  61 +-
 lib/netdev-dpdk.h  |   5 ++
 vswitchd/bridge.c  |   8 +++
 vswitchd/vswitch.ovsschema |   8 ++-
 vswitchd/vswitch.xml   |  49 ++
 11 files changed, 397 insertions(+), 7 deletions(-)
 create mode 100644 lib/keepalive.c
 create mode 100644 lib/keepalive.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 2415f4c..0d99f0a 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/json.c \
lib/jsonrpc.c \
lib/jsonrpc.h \
+   lib/keepalive.c \
+   lib/keepalive.h \
lib/lacp.c \
lib/lacp.h \
lib/latch.h \
diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index d7fb19b..b4f111a 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -49,6 +49,12 @@ dpdk_get_vhost_sock_dir(void)
 return NULL;
 }
 
+bool
+dpdk_is_enabled(void)
+{
+return false;
+}
+
 void
 dpdk_register_pmd_core(unsigned core_id OVS_UNUSED)
 {
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 8db63bf..250cc2f 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -32,6 +32,7 @@
 
 #include "dirs.h"
 #include "fatal-signal.h"
+#include "keepalive.h"
 #include "netdev-dpdk.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
@@ -42,6 +43,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 static FILE *log_stream = NULL;   /* Stream for DPDK log redirection */
 
 static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
+static bool dpdk_enabled = false; /* DPDK status. */
 
 static int
 process_vhost_flags(char *flag, const char *default_val, int size,
@@ -303,6 +305,12 @@ static cookie_io_functions_t dpdk_log_func = {
 .write = dpdk_log_write,
 };
 
+bool
+dpdk_is_enabled(void)
+{
+return dpdk_enabled;
+}
+
 static void
 dpdk_init__(const struct smap *ovs_other_config)
 {
@@ -456,9 +464,7 @@ dpdk_init__(const struct smap *ovs_other_config)
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static bool enabled = false;
-
-if (enabled || !ovs_other_config) {
+if (dpdk_enabled || !ovs_other_config) {
 return;
 }
 
@@ -468,7 +474,7 @@ dpdk_init(const struct smap *ovs_other_config)
 if (ovsthread_once_start(_enable)) {
 VLOG_INFO("DPDK Enabled - initializing...");
 dpdk_init__(ovs_other_config);
-enabled = true;
+dpdk_enabled = true;
 VLOG_INFO("DPDK Enabled - initialized");
 ovsthread_once_done(_enable);
 }
@@ -477,6 +483,22 @@ dpdk_init(const struct smap *ovs_other_config)
 }
 }
 
+int
+dpdk_ka_init(struct keepalive_info *ka_info)
+{
+/* Initialize keepalive subsystem */
+if ((rte_global_keepalive_info =
+rte_keepalive_create(_failcore_cb, ka_info)) == NULL) {
+VLOG_ERR("Keepalive initialization failed.");
+return -1;
+} else {
+rte_keepalive_register_relay_callback(rte_global_keepalive_info,
+dpdk_ka_update_core_state, ka_info);
+}
+
+return 0;
+}
+
 const char *
 dpdk_get_vhost_sock_dir(void)
 {
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 3f31211..7619730 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -17,6 +17,7 @@
 #ifndef DPDK_H
 #define DPDK_H
 
+#include "stdbool.h"
 #ifdef DPDK_NETDEV
 
 #include 
@@ -31,9 +32,12 @@
 #endif /* DPDK_NETDEV */
 
 struct smap;
+struct keepalive_info;
 
 struct rte_keepalive *rte_global_keepalive_info;
+bool dpdk_is_enabled(void);
 void dpdk_init(const struct smap *ovs_other_config);
+int dpdk_ka_init(struct keepalive_info *ka_info);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 
diff --git a/lib/keepalive.c b/lib/keepalive.c
new file mode 100644
index 000..e93dc99
--- /dev/null
+++ b/lib/keepalive.c
@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with

[ovs-dev] [PATCH v3 03/19] process: Add helper function to retrieve process status.

2017-08-04 Thread Bhanuprakash Bodireddy

Implement helper function to retrieve the process status. This commit
also enables the fields relating to process name, state and core the
process was last scheduled. The APIs will be used by keepalive monitoring
framework in future commits.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/process.c | 56 ++--
 lib/process.h | 13 +
 2 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/lib/process.c b/lib/process.c
index 0b8d195..cbf83db 100644
--- a/lib/process.c
+++ b/lib/process.c
@@ -64,7 +64,23 @@ struct raw_process_info {
 long long int uptime;   /* ms since started. */
 long long int cputime;  /* ms of CPU used during 'uptime'. */
 pid_t ppid; /* Parent. */
-char name[18];  /* Name (surrounded by parentheses). */
+char state; /* Process state. */
+int core_id;/* Core id last executed on. */
+char name[18];  /* Name. */
+};
+
+struct pstate2Num {
+char pidstate;
+int num;
+};
+
+const struct pstate2Num pstate_map[] = {
+{ 'S', STOPPED_STATE },
+{ 'R', ACTIVE_STATE },
+{ 't', TRACED_STATE },
+{ 'Z', DEFUNC_STATE },
+{ 'D', UNINTERRUPTIBLE_SLEEP_STATE },
+{ '\0', UNUSED_STATE },
 };
 
 /* Pipe used to signal child termination. */
@@ -421,8 +437,8 @@ get_raw_process_info(pid_t pid, struct raw_process_info 
*raw)
 
 n = fscanf(stream,
"%*d "   /* (1. pid) */
-   "%17s "  /* 2. process name */
-   "%*c "   /* (3. state) */
+   "(%17[^)]) " /* 2. process name */
+   "%c "/* 3. state */
"%lu "   /* 4. ppid */
"%*d "   /* (5. pgid) */
"%*d "   /* (6. sid) */
@@ -458,7 +474,7 @@ get_raw_process_info(pid_t pid, struct raw_process_info 
*raw)
"%*u "   /* (36. always 0) */
"%*u "   /* (37. always 0) */
"%*d "   /* (38. exit_signal) */
-   "%*d "   /* (39. task_cpu) */
+   "%d "/* 39. task_cpu */
 #if 0
/* These are here for documentation but #if'd out to save
 * actually parsing them from the stream for no benefit. */
@@ -468,9 +484,10 @@ get_raw_process_info(pid_t pid, struct raw_process_info 
*raw)
"%*lu "  /* (43. gtime) */
"%*ld"   /* (44. cgtime) */
 #endif
-   , raw->name, , , , _time, , );
+   , raw->name, >state, , , , _time,
+  , , >core_id);
 fclose(stream);
-if (n != 7) {
+if (n != 9) {
 VLOG_ERR_ONCE("%s: fscanf failed", file_name);
 return false;
 }
@@ -496,12 +513,15 @@ get_process_info(pid_t pid, struct process_info *pinfo)
 return false;
 }
 
+ovs_strlcpy(pinfo->name, child.name, sizeof pinfo->name);
+pinfo->state = child.state;
 pinfo->vsz = child.vsz;
 pinfo->rss = child.rss;
 pinfo->booted = child.uptime;
 pinfo->crashes = 0;
 pinfo->uptime = child.uptime;
 pinfo->cputime = child.cputime;
+pinfo->core_id = child.core_id;
 
 if (child.ppid) {
 struct raw_process_info parent;
@@ -579,6 +599,30 @@ process_run(void)
 #endif
 }
 
+bool
+process_is_active(int pid)
+{
+struct process_info pinfo;
+int pstate = UNUSED_STATE;
+
+ovs_assert(LINUX);
+
+int success = get_process_info(pid, );
+if (success) {
+for (int i = 0; pstate_map[i].pidstate != '\0'; i++) {
+if (pstate_map[i].pidstate == pinfo.state) {
+VLOG_DBG_ONCE("The state is %c", pstate_map[i].pidstate);
+pstate = pstate_map[i].num;
+break;
+}
+}
+
+if (pstate == ACTIVE_STATE) {
+return true;
+}
+}
+return false;
+}
 
 /* Causes the next call to poll_block() to wake up when process 'p' has
  * exited. */
diff --git a/lib/process.h b/lib/process.h
index 999ac68..69ab77f 100644
--- a/lib/process.h
+++ b/lib/process.h
@@ -20,6 +20,15 @@
 #include 
 #include 
 
+enum process_states {
+UNUSED_STATE,
+STOPPED_STATE,
+ACTIVE_STATE,
+TRACED_STATE,
+DEFUNC_STATE,
+UNINTERRUPTIBLE_SLEEP_STATE
+};
+
 struct process;
 
 struct process_info {
@@ -29,6 +38,9 @@ struct process_info {
 int crashes;/* # of crashes (usually 0). */
 long long int uptime;   /* ms since last (re)started by monitor. */
 long long int cputime;  /* ms of CPU used during 'uptime'. */
+char state;
+int core_id;
+char name[18];
 };
 
 /* Starting and monitoring subprocesses.
@@ -47,6 +59,7 @@ bool process_exited(struct process *);
 int process_status(const struct process *);
 void process_run(void);
 void process_wait(struct process

[ovs-dev] [PATCH v3 02/19] process: Avoid warnings compiling process.c

2017-08-04 Thread Bhanuprakash Bodireddy

This commit fixes the following "sparse" warning:

lib/process.c:439:16: error: use of assignment suppression and length
modifier together in gnu_scanf format [-Werror=format=].

This fix doesn't need any other changes as the fields aren't used for now.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/process.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/lib/process.c b/lib/process.c
index 3e119b5..0b8d195 100644
--- a/lib/process.c
+++ b/lib/process.c
@@ -444,24 +444,24 @@ get_raw_process_info(pid_t pid, struct raw_process_info 
*raw)
"%llu "  /* 22. start_time */
"%llu "  /* 23. vsize */
"%llu "  /* 24. rss */
+   "%*u "   /* (25. rsslim) */
+   "%*u "   /* (26. start_code) */
+   "%*u "   /* (27. end_code) */
+   "%*u "   /* (28. start_stack) */
+   "%*u "   /* (29. esp) */
+   "%*u "   /* (30. eip) */
+   "%*u "   /* (31. pending signals) */
+   "%*u "   /* (32. blocked signals) */
+   "%*u "   /* (33. ignored signals) */
+   "%*u "   /* (34. caught signals) */
+   "%*u "   /* (35. whcan) */
+   "%*u "   /* (36. always 0) */
+   "%*u "   /* (37. always 0) */
+   "%*d "   /* (38. exit_signal) */
+   "%*d "   /* (39. task_cpu) */
 #if 0
/* These are here for documentation but #if'd out to save
 * actually parsing them from the stream for no benefit. */
-   "%*lu "  /* (25. rsslim) */
-   "%*lu "  /* (26. start_code) */
-   "%*lu "  /* (27. end_code) */
-   "%*lu "  /* (28. start_stack) */
-   "%*lu "  /* (29. esp) */
-   "%*lu "  /* (30. eip) */
-   "%*lu "  /* (31. pending signals) */
-   "%*lu "  /* (32. blocked signals) */
-   "%*lu "  /* (33. ignored signals) */
-   "%*lu "  /* (34. caught signals) */
-   "%*lu "  /* (35. whcan) */
-   "%*lu "  /* (36. always 0) */
-   "%*lu "  /* (37. always 0) */
-   "%*d "   /* (38. exit_signal) */
-   "%*d "   /* (39. task_cpu) */
"%*u "   /* (40. rt_priority) */
"%*u "   /* (41. policy) */
"%*llu " /* (42. blkio_ticks) */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 01/19] dpdk: Add helper functions for DPDK datapath keepalive.

2017-08-04 Thread Bhanuprakash Bodireddy

Introduce helper functions in 'dpdk' module that are needed for
DPDK keepalive functionality. Also add dummy functions in 'dpdk-stub' module
that are needed when DPDK datapath is not available.

Signed-off-by: Bhanuprakash Bodireddy 
---
 lib/dpdk-stub.c | 24 
 lib/dpdk.c  | 31 +++
 lib/dpdk.h  |  7 +++
 3 files changed, 62 insertions(+)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index daef729..d7fb19b 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -48,3 +48,27 @@ dpdk_get_vhost_sock_dir(void)
 {
 return NULL;
 }
+
+void
+dpdk_register_pmd_core(unsigned core_id OVS_UNUSED)
+{
+/* Nothing */
+}
+
+void
+dpdk_unregister_pmd_core(unsigned core_id OVS_UNUSED)
+{
+/* Nothing */
+}
+
+void
+dpdk_mark_pmd_core_alive(void)
+{
+/* Nothing */
+}
+
+void
+dpdk_mark_pmd_core_sleep(void)
+{
+/* Nothing */
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 8da6c32..8db63bf 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #ifdef DPDK_PDUMP
@@ -489,3 +490,33 @@ dpdk_set_lcore_id(unsigned cpu)
 ovs_assert(cpu != NON_PMD_CORE_ID);
 RTE_PER_LCORE(_lcore_id) = cpu;
 }
+
+/* Register packet processing core 'core_id' for liveness checks. */
+void
+dpdk_register_pmd_core(unsigned core)
+{
+rte_keepalive_register_core(rte_global_keepalive_info, core);
+}
+
+void
+dpdk_unregister_pmd_core(unsigned core OVS_UNUSED)
+{
+/* XXX: DPDK unfortunately hasn't implemented unregister API
+ * This will be fixed later, instead use sleep API now.
+ */
+rte_keepalive_mark_sleep(rte_global_keepalive_info);
+}
+
+/* Mark packet processing core alive. */
+void
+dpdk_mark_pmd_core_alive(void)
+{
+rte_keepalive_mark_alive(rte_global_keepalive_info);
+}
+
+/* Mark packet processing core as idle. */
+void
+dpdk_mark_pmd_core_sleep(void)
+{
+rte_keepalive_mark_sleep(rte_global_keepalive_info);
+}
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 673a1f1..3f31211 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -32,8 +32,15 @@
 
 struct smap;
 
+struct rte_keepalive *rte_global_keepalive_info;
 void dpdk_init(const struct smap *ovs_other_config);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 
+/* Keepalive APIs */
+void dpdk_register_pmd_core(unsigned core_id);
+void dpdk_unregister_pmd_core(unsigned core_id);
+void dpdk_mark_pmd_core_alive(void);
+void dpdk_mark_pmd_core_sleep(void);
+
 #endif /* dpdk.h */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 00/19] Add OVS DPDK keep-alive functionality.

2017-08-04 Thread Bhanuprakash Bodireddy

Keepalive feature is aimed at achieving Fastpath Service Assurance
in OVS-DPDK deployments. It adds support for monitoring the packet
processing cores(PMD thread cores) by dispatching heartbeats at regular
intervals. Incase of heartbeat misses additional health checks are
enabled on the PMD thread to detect the failure and the same shall be
reported to higher level fault management systems/frameworks.

The implementation uses OVSDB for reporting the health of the PMD threads.
Any external monitoring application can read the status from OVSDB at 
regular intervals (or) subscribe to the updates in OVSDB so that they get
notified when the changes happen on OVSDB.

keepalive info struct is created and initialized for storing the
status of the PMD threads. This is initialized by main thread(vswitchd)
as part of init process and will be periodically updated by 'keepalive'
thread. keepalive feature can be enabled through below OVSDB settings.

enable-keepalive=true
  - Keepalive feature is disabled by default.

keepalive-interval="5000"
  - Timer interval in milliseconds for monitoring the packet
processing cores.

When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
up at regular intervals to update the timestamp and status of pmd cores
in keepalive info struct. This information shall be read by vswitchd thread
and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB.

An external monitoring framework like collectd with ovs events support
can read (or) subscribe to the datapath status changes in ovsdb. When the state
is updated, the collectd shall be notified and will eventually relay the status
to ceilometer service running in the controller. Below is the high level
overview of deployment model.

Compute NodeControllerCompute Node

Collectd  <--> Ceilometer <>   Collectd

OvS DPDK   OvS DPDK

+-+
| VM  |
+--+--+
\---+---/
|
+--+---+   ++--+ +--+---+
| OVS  |-> |   ovsevents plugin| --> |   collectd   |
+--+---+   ++--+ +--+---+

+--+-+ +---++ |
| Ceilometer | <-- | collectd ceilometer plugin |  <---
+--+-+ +---++

github: The patches can be found here:
  https://github.com/bbodired/ovs (Last master commit e7cd8c363)

Performance impact:
  No noticeable performance or latency impact is observed with
  KA feature enabled.

-
v2 -> v3
  * Rebase.
  * Verified with dpdk-stable-17.05.1 release.
  * Fixed build issues with MSVC and cross checked with appveyor.

v1 -> v2
  * Rebase
  * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
is already applied as separate patch.

RFCv3 -> v1
  * Made changes to fix failures in some unit test cases.
  * some more code cleanup w.r.t process related APIs.

RFCv2 -> RFCv3
  * Remove POSIX shared memory block implementation (suggested by Aaron).
  * Rework the logic to register and track threads instead of cores. This way
in the future any thread can be registered to KA framework. For now only PMD
threads are tracked (suggested by Aaron).
  * Refactor few APIs and further clean up the code.
   
RFCv1 -> RFCv2
  * Merged the xml and schema commits to later commit where the actual
implementation is done(suggested by Ben).
  * Fix ovs-appctl keepalive/* hang issue when KA disabled.
  * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
pmd-xstats-show.
  * Refactor code and fixed APIs dealing with PMD health monitoring.


Bhanuprakash Bodireddy (19):
  dpdk: Add helper functions for DPDK datapath keepalive.
  process: Avoid warnings compiling process.c
  process: Add helper function to retrieve process status.
  Keepalive: Add initial keepalive support.
  keepalive: Add more helper functions to KA framework.
  dpif-netdev: Add helper function to store datapath tids.
  dpif-netdev: Register packet processing cores to KA framework.
  dpif-netdev: Enable heartbeats for DPDK datapath.
  keepalive: Retrieve PMD status periodically.
  bridge: Update keepalive status in OVSDB
  keepalive: Add support to query keepalive statistics.
  keepalive: Add support to query keepalive status.
  dpif-netdev: Add additional datapath health checks.
  keepalive: Check the link status as part of PMD health checks.
  keepalive: Check the packet statistics as part of PMD health checks.
  keepalive: Check the PMD cycle stats as part of PMD health checks.
  netdev-dpdk: Enable PMD health checks on heartbeat failure.
  keepalive: Display extended Keepalive status.
  Documentation: Update DPDK doc with Keepalive feature.

 Documentation/howto/dpdk.rst |  90 +
 lib/automake.mk  |   2 +
 lib/dpdk-stub.c  |  36 ++
 lib/dpdk.c

53 matches

Mail list logo