[ovs-dev] [PATCH v2] Monitor Database table to manage lifecycle of IDL client.

2019-01-03 Thread Ted Elhourani
The Python IDL implementation supports ovsdb cluster connections.
This patch is a follow up to commit 31e434fc98, it adds the option of
connecting to the leader (the default) in the Raft-based cluster. It mimics
the exisiting C IDL support for clusters introduced in commit 1b1d2e6daa.

The _Server database schema is first requested, then a monitor of the
Database table in the _Server Database. Method __check_server_db verifies
the eligibility of the server. If the attempt to obtain a monitor of the
_Server database fails and a cluster id was not provided this implementation
proceeds to request the data monitor. If a cluster id was provided via the
set_cluster_id method then the connection is aborted and a connection to a
different node is instead attempted, until a valid cluster node is found.
Thus, when supplied, cluster id is interpreted as the intention to only
allow connections to a clustered database. If not supplied, connections to
standalone nodes, or nodes that do not have the _Server database are
allowed. change_seqno is not incremented in the case of Database table
updates.
Signed-off-by: Ted Elhourani 
---
 python/ovs/db/idl.py| 217 
 python/ovs/reconnect.py |   3 +
 tests/ovsdb-idl.at  |  62 +++---
 3 files changed, 237 insertions(+), 45 deletions(-)

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 250e897..f989548 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -38,6 +38,7 @@ ROW_DELETE = "delete"
 OVSDB_UPDATE = 0
 OVSDB_UPDATE2 = 1
 
+CLUSTERED = "clustered"
 
 class Idl(object):
 """Open vSwitch Database Interface Definition Language (OVSDB IDL).
@@ -92,10 +93,12 @@ class Idl(object):
 """
 
 IDL_S_INITIAL = 0
-IDL_S_MONITOR_REQUESTED = 1
-IDL_S_MONITOR_COND_REQUESTED = 2
+IDL_S_SERVER_SCHEMA_REQUESTED = 1
+IDL_S_SERVER_MONITOR_REQUESTED = 2
+IDL_S_DATA_MONITOR_REQUESTED = 3
+IDL_S_DATA_MONITOR_COND_REQUESTED = 4
 
-def __init__(self, remote, schema_helper, probe_interval=None):
+def __init__(self, remote, schema_helper, leader_only=True, 
probe_interval=None):
 """Creates and returns a connection to the database named 'db_name' on
 'remote', which should be in a form acceptable to
 ovs.jsonrpc.session.open().  The connection will maintain an in-memory
@@ -119,6 +122,9 @@ class Idl(object):
 
 The IDL uses and modifies 'schema' directly.
 
+If 'leader_only' is set to True (default value) the IDL will only 
monitor
+and transact with the leader of the cluster.
+
 If "probe_interval" is zero it disables the connection keepalive
 feature. If non-zero the value will be forced to at least 1000
 milliseconds. If None it will just use the default value in OVS.
@@ -137,6 +143,20 @@ class Idl(object):
 self._last_seqno = None
 self.change_seqno = 0
 self.uuid = uuid.uuid1()
+
+# Server monitor.
+self._server_schema_request_id = None
+self._server_monitor_request_id = None
+self._db_change_aware_request_id = None
+self._server_db_name = '_Server'
+self._server_db_table = 'Database'
+self.server_tables = None
+self._server_db = None
+self.server_monitor_uuid = uuid.uuid1()
+self.leader_only = leader_only
+self.cluster_id = None
+self._min_index = 0
+
 self.state = self.IDL_S_INITIAL
 
 # Database locking.
@@ -172,6 +192,15 @@ class Idl(object):
 remotes.append(r)
 return remotes
 
+def set_cluster_id(self, cluster_id):
+"""Set the id of the cluster that this idl must connect to."""
+if cluster_id:
+self.cluster_id = str(cluster_id)
+else:
+self.cluster_id = None
+if self.state != self.IDL_S_INITIAL:
+self.force_reconnect()
+
 def index_create(self, table, name):
 """Create a named multi-column index on a table"""
 return self.tables[table].rows.index_create(name)
@@ -222,7 +251,7 @@ class Idl(object):
 if seqno != self._last_seqno:
 self._last_seqno = seqno
 self.__txn_abort_all()
-self.__send_monitor_request()
+self.__send_server_schema_request()
 if self.lock_name:
 self.__send_lock_request()
 break
@@ -230,6 +259,7 @@ class Idl(object):
 msg = self._session.recv()
 if msg is None:
 break
+
 if (msg.type == ovs.jsonrpc.Message.T_NOTIFY
 and msg.method == "update2"
 and len(msg.params) == 2):
@@ -239,7 +269,15 @@ class Idl(object):
 and msg.method == "update"
 and len(msg.params) == 2):
 # Database contents changed.
-self.__parse_update(msg.params[1], 

Re: [ovs-dev] [PATCH v1] Monitor Database table to manage lifecycle of IDL client.

2019-01-03 Thread Ted Elhourani
Thanks for taking a look. I will send a revised patch that addresses both 
issues you 
pointed to, in particular backward compatibility with _Server-less ovsdbs.

Thanks,
Ted

On 12/27/18, 9:23 AM, "Ben Pfaff"  wrote:

On Tue, Dec 18, 2018 at 12:18:06AM +, Ted Elhourani wrote:
> The Python IDL implementation supports ovsdb cluster connections.
> This patch is a follow up to commit 31e434fc98, it adds the option of
> connecting to the leader in the Raft-based cluster. It mimics the
> exisiting C IDL support for clusters introduced in commit 1b1d2e6daa.
> 
> The server schema is first requested, then a monitor of the Database table
> in the _Server Database. __check_server_db is called to verify the
> eligibility of the server. Unlike the C IDL, if an attempt to obtain a
> monitor of the Server database fails this implementation does not proceed
> to monitor and transact with the database. The reason is connections to
> standalone databases should not be permitted when the intention is to
> connect to a valid cluster node. The change_seqno is not incremented in 
the
> case of Database table updates.
> 
> The run method of Session (jsonrpc) is modified such that the session is
> closed when a stream connection fails, if more than one remote exists.
> Otherwise the session will re-attempt to establish a stream to the same
> cluster node. Method pick_remote must be called before opening a stream
> in __connect (in Session), otherwise a force_reconnect causes the Session
> to re-attempt a connection to the same server. pick_remote is no longer
> necessary in method run of Session.
> 
> Signed-off-by: Ted Elhourani 

I'm pretty sure that this breaks backward compatibility with any
ovsdb-server not new enough to support the _Server table.  The commit
message above conflates that with whether the database is clustered.  I
think your intention is to reject non-clustered databases, but it
doesn't do that; instead, it rejects database servers too old to have a
_Server table and accepts standalone databases served up by new-enough
servers.  It's one thing to avoid connecting to standalone databases
when a cluster is expected, but this goes beyond that to drop support
for all older database servers.  I don't think that's reasonable behavior.

I noticed that the following looks wrong.  EAGAIN indicates that the
connection is not complete yet.  It should not be treated as a
connection failure (unless a timeout has occurred) even if there is more
than one remote:

> @@ -516,9 +516,9 @@ class Session(object):
>  self.reconnect.connected(ovs.timeval.msec())
>  self.rpc = Connection(self.stream)
>  self.stream = None
> -elif error != errno.EAGAIN:
> +elif (error != errno.EAGAIN or
> +  self.get_num_of_remotes() > 1):
>  self.reconnect.connect_failed(ovs.timeval.msec(), error)
> -self.pick_remote()
>  self.stream.close()
>  self.stream = None

Thanks,

Ben.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-03 Thread Kevin Traynor
On 01/03/2019 12:36 PM, Nitin Katiyar wrote:
> Port rx queues that have not been statically assigned to PMDs are currently
> assigned based on periodically sampled load measurements.
> The assignment is performed at specific instances – port addition, port
> deletion, upon reassignment request via CLI etc.
> 
> Due to change in traffic pattern over time it can cause uneven load among
> the PMDs and thus resulting in lower overall throughout.
> 
> This patch enables the support of auto load balancing of PMDs based on
> measured load of RX queues. Each PMD measures the processing load for each
> of its associated queues every 10 seconds. If the aggregated PMD load reaches
> 95% for 6 consecutive intervals then PMD considers itself to be overloaded.
> 
> If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> performed by OVS main thread. The dry-run does NOT change the existing
> queue to PMD assignments.
> 
> If the resultant mapping of dry-run indicates an improved distribution
> of the load then the actual reassignment will be performed.
> 
> The automatic rebalancing will be disabled by default and has to be
> enabled via configuration option. The interval (in minutes) between
> two consecutive rebalancing can also be configured via CLI, default
> is 1 min.
> 
> Following example commands can be used to set the auto-lb params:
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
> 

Hi Nitin, thanks for v2. Not full review yet but sending some comments
below.

Maybe you can put some of the above into a new section below this
http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads

I also think this feature should be experimental until it has been road
tested a bit more.

> Co-authored-by: Rohith Basavaraja 
> Co-authored-by: Venkatesan Pradeep 
> Signed-off-by: Rohith Basavaraja 
> Signed-off-by: Venkatesan Pradeep 
> Signed-off-by: Nitin Katiyar 
> ---
>  lib/dpif-netdev.c| 403 
> +--
>  vswitchd/vswitch.xml |  30 
>  2 files changed, 424 insertions(+), 9 deletions(-)
> 

There seems to be windows style line endings in the patch? or something
else has gone wrong for me.

> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1564db9..8db106f 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -80,6 +80,12 @@
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
> +/* Auto Load Balancing Defaults */
> +#define ACCEPT_IMPROVE_DEFAULT   (25)
> +#define PMD_LOAD_THRE_DEFAULT(95)

Probably you should remove the brackets above to be consistent with the
others below and in the rest of the file.

> +#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> +#define MIN_TO_MSEC  6
> +
>  #define FLOW_DUMP_MAX_BATCH 50
>  /* Use per thread recirc_depth to prevent recirculation loop. */
>  #define MAX_RECIRC_DEPTH 6
> @@ -288,6 +294,13 @@ struct dp_meter {
>  struct dp_meter_band bands[];
>  };
>  
> +struct pmd_auto_lb {
> +bool auto_lb_conf;/* enable-disable auto load balancing */

I'm not sure what '_conf' is short for? maybe it should be something
like 'auto_lb_requested'

> +bool is_enabled;  /* auto_lb current status */

Comments should be of style /* Sentence case. */
http://docs.openvswitch.org/en/latest/internals/contributing/coding-style/#comments


> +uint64_t rebalance_intvl;
> +uint64_t rebalance_poll_timer;
> +};
> +
>  /* Datapath based on the network device interface from netdev.h.
>   *
>   *
> @@ -368,6 +381,7 @@ struct dp_netdev {
>  uint64_t last_tnl_conf_seq;
>  
>  struct conntrack conntrack;
> +struct pmd_auto_lb pmd_alb;
>  };
>  
>  static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread {
>  struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. 
> */
>  /* List of rx queues to poll. */
>  struct hmap poll_list OVS_GUARDED;
> +

Unrelated newline should be removed

>  /* Map of 'tx_port's used for transmission.  Written by the main thread,
>   * read by the pmd thread. */
>  struct hmap tx_ports OVS_GUARDED;
> @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread {
>  /* Keep track of detailed PMD performance statistics. */
>  struct pmd_perf_stats perf_stats;
>  
> +/* Some stats from previous iteration used by automatic pmd
> +   load balance logic. */

Nit, but see coding stds. and other multi-line comments wrt style

> +uint64_t prev_stats[PMD_N_STATS];> +atomic_count pmd_overloaded;
> +
>  /* Set to true if the pmd thread needs to be reloaded. */
>  bool need_reload;
>  };
> @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
>   enum rxq_cycles_counter_type type);
>  static void
>  dp_netdev_rxq_set_intrvl_cycles(struct 

Re: [ovs-dev] [PATCH v2] rhel: bug fix upgrade path in kmod fedora spec file

2019-01-03 Thread Flavio Leitner
Hi Martin,

Sorry the delay, coming back from PTO and holidays.

On Thu, Dec 13, 2018 at 02:50:22PM -0800, Martin Xu wrote:
> Hi Flavio,
> 
> I remember when we had a discussion about package "succession" when I had
> the "Obsoletes" tag added in one of the previous patches submitted for the
> fedora spec files.
> 
> We do need the capability for the system to know openvswitch-kmod is
> automatically upgrading kmod-openvswitch. For some reason, I thought I
> tested "Provides" alone would suffice. I just tried again and it seems not
> the case. "Provides" does not make the renamed package automatically
> upgrade the older one. Is there any other ways of doing this? I found one
> suggestion from the fedora site, that uses both "Provides" and "Obsoletes."
> I understand this means we will be declaring openvswitch-kmod is being
> replaced. We can't go either way anymore. Is there a better solution?
> Thanks.

Provides does not allow upgrades. It will satisfy dependencies on what
you are providing. A common example is smtp: sendmail and postfix. Both
could do: "Provides: smtp", then "mutt" could do "Requires: smtp" and
either sendmail or postfix would be fine.

You're right that you need Obsoletes to allow upgrades. The problem
with Obsoletes is that once you ship it, there is no turning back.
That's why I suggested to limit to a specific version.
For example:

   Obsoletes: kmod-openvswitch < %{version}-%{release}

That grants all previous packages are indeed obsoleted while it
would be possible to resurrect kmod-openvswitch if we change our minds
in the future. All we would need is to ship kmod-openvswitch newer
than the last %{version}-%{release} shipped.

HTH,
fbl

> 
> Martin
> 
> On Thu, Dec 13, 2018 at 2:44 PM Martin Xu  wrote:
> 
> > This patch removes the "Conflicts" tag and adds "Obsoletes" tag.
> >
> > With the conflicts tag, when a user attempts to install or upgrade with
> > the same version as already installed, the conflict kicks in. Otherwise,
> > such is allowed with --replacepkgs.
> >
> > Obsoletes is needed for the upgrade path from kmod-openvswitch to
> > openvswitch-kmod.
> >
> > Fixes: 22c33c3039 (rhel: support kmod build against mulitple kernel
> > versions, fedora)
> >
> > VMware-BZ: #2249788
> >
> > Signed-off-by: Martin Xu 
> > CC: Flavio Leitner 
> > CC: Yi-Hung Wei 
> > CC: Yifeng Sun 
> > CC: Zak Whittington 
> > CC: Ben Pfaff 
> > ---
> > v1->v2: adds "Obsoletes" tag needed for upgrade after renaming
> > adds more reviewers
> >
> >  rhel/openvswitch-kmod-fedora.spec.in | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/rhel/openvswitch-kmod-fedora.spec.in b/rhel/
> > openvswitch-kmod-fedora.spec.in
> > index 8d54fd734..7913dbdb4 100644
> > --- a/rhel/openvswitch-kmod-fedora.spec.in
> > +++ b/rhel/openvswitch-kmod-fedora.spec.in
> > @@ -33,7 +33,7 @@ Source: openvswitch-%{version}.tar.gz
> >  #Source1: openvswitch-init
> >  Buildroot: /tmp/openvswitch-xen-rpm
> >  Provides: kmod-openvswitch
> > -Conflicts: kmod-openvswitch
> > +Obsoletes: kmod-openvswitch
> >
> >  %description
> >  Open vSwitch provides standard network bridging functions augmented with
> > --
> > 2.12.3
> >
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Flavio

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC v2 3/5] ovn: Add a new OVN field icmp4.frag_mtu

2019-01-03 Thread nusiddiq
From: Numan Siddique 

In order to support OVN specific fields (which are not yet
supported in OpenvSwitch to set or modify values) a generic
OVN field support is added in this patch. These OVN fields are
expected to be used as nested OVN actions inside OVN actions
like icmp4, icmp6 etc. This patch adds only one field for now
 - icmp4.frag_mtu. It should be fairly straightforward to
add similar fields in the near future.

This field is expected to be used as an inner field with in
the 'icmp4' action.

Eg.
"icmp4 {"eth.dst <-> eth.src; "
"icmp4.type = 3; /* Destination Unreachable */ "
"icmp4.code = 4; /* Fragmentation Needed */ "
 icmp4.frag_mtu = 1442;
 ...
 "next; };",

pinctrl module of ovn-controller will set the specified value
in the the low-order 16 bits of the ICMP4 header field that is
labelled "unused" in the ICMP specification as defined in the RFC 1191.

Upcoming patch will use it to send an icmp4 packet if the
source IPv4 packet destined to go via external gateway needs to
be fragmented.

Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h |  25 +++-
 include/ovn/automake.mk   |   3 +-
 include/ovn/expr.h|   5 +
 {ovn/lib => include/ovn}/logical-fields.h |  41 +++
 ovn/controller/lflow.c|   1 +
 ovn/controller/lflow.h|   2 +-
 ovn/controller/pinctrl.c  |  23 +++-
 ovn/lib/actions.c | 135 +-
 ovn/lib/automake.mk   |   1 -
 ovn/lib/expr.c|  17 ++-
 ovn/lib/logical-fields.c  |  36 +-
 ovn/northd/ovn-northd.c   |   2 +-
 ovn/ovn-sb.xml|  11 ++
 ovn/utilities/ovn-trace.c |   5 +-
 tests/ovn.at  |  18 ++-
 tests/test-ovn.c  |   2 +-
 16 files changed, 282 insertions(+), 45 deletions(-)
 rename {ovn/lib => include/ovn}/logical-fields.h (71%)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1c0c67ce6..095b60df0 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -80,7 +80,8 @@ struct ovn_extend_table;
 OVNACT(LOG,   ovnact_log) \
 OVNACT(PUT_ND_RA_OPTS,ovnact_put_opts)\
 OVNACT(ND_NS, ovnact_nest)\
-OVNACT(SET_METER, ovnact_set_meter)
+OVNACT(SET_METER, ovnact_set_meter)   \
+OVNACT(OVNFIELD_LOAD, ovnact_load)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -142,6 +143,20 @@ ovnact_end(const struct ovnact *ovnacts, size_t 
ovnacts_len)
 #define OVNACT_FOR_EACH(POS, OVNACTS, OVNACTS_LEN)  \
 for ((POS) = (OVNACTS); (POS) < ovnact_end(OVNACTS, OVNACTS_LEN);  \
  (POS) = ovnact_next(POS))
+
+static inline int
+ovnacts_count(const struct ovnact *ovnacts, size_t ovnacts_len)
+{
+uint8_t n_ovnacts = 0;
+if (ovnacts) {
+const struct ovnact *a;
+
+OVNACT_FOR_EACH (a, ovnacts, ovnacts_len) {
+n_ovnacts++;
+}
+}
+return n_ovnacts;
+}
 
 /* Action structure for each OVNACT_*. */
 
@@ -229,6 +244,8 @@ struct ovnact_nest {
 struct ovnact ovnact;
 struct ovnact *nested;
 size_t nested_len;
+struct ovnact *nested_ovnfields;
+size_t nested_ovnfields_len;
 };
 
 /* OVNACT_GET_ARP, OVNACT_GET_ND. */
@@ -461,6 +478,12 @@ struct action_header {
 };
 BUILD_ASSERT_DECL(sizeof(struct action_header) == 8);
 
+OVS_PACKED(
+struct ovnfield_act_header {
+ovs_be16 id; /* one of enum ovnfield_id. */
+ovs_be16 len; /* Length of the ovnfield data. */
+});
+
 struct ovnact_parse_params {
 /* A table of "struct expr_symbol"s to support (as one would provide to
  * expr_parse()). */
diff --git a/include/ovn/automake.mk b/include/ovn/automake.mk
index d2924c2f6..54b0e2c0e 100644
--- a/include/ovn/automake.mk
+++ b/include/ovn/automake.mk
@@ -2,4 +2,5 @@ ovnincludedir = $(includedir)/ovn
 ovninclude_HEADERS = \
include/ovn/actions.h \
include/ovn/expr.h \
-   include/ovn/lex.h
+   include/ovn/lex.h  \
+   include/ovn/logical-fields.h
diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 3995e62f0..a68fd6e1c 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -58,6 +58,7 @@
 #include "openvswitch/list.h"
 #include "openvswitch/match.h"
 #include "openvswitch/meta-flow.h"
+#include "logical-fields.h"
 
 struct ds;
 struct expr;
@@ -244,6 +245,7 @@ struct expr_symbol {
 int width;
 
 const struct mf_field *field; /* Fields only, otherwise NULL. */
+const struct ovn_field *ovn_field;  /* OVN Fields only, otherwise NULL. */
 const struct expr_symbol *parent; /* Subfields only, otherwise NULL. */
 int parent_ofs;   /* Subfields only, otherwise 0. */
 char 

[ovs-dev] [RFC v2 2/5] Add a new OVS action check_pkt_larger

2019-01-03 Thread nusiddiq
From: Numan Siddique 

This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.

Usage: check_pkt_larger:len->REGISTER
Eg. match=...,actions=check_pkt_larger:1442->NXM_NX_REG0[0],next;

This patch makes use of the new datapath action - 'check_pkt_len'
which is still WIP. At the start of ovs-vswitchd, datapath is probed
for this action. If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.

Datapath action 'check_pkt_len' takes these nlattrs
  * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'
  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.

Let's say we have these flows added to an OVS bridge br-int

table=0, priority=100 
in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4

Suppose if a packet is received from in_port=1, the packet is sent
to userspace (MISS_UPCALL). The action 'check_pkt_larger' will be translated as
  - check_pkt_len( > 100 ? (3) : (4))

datapath will check the packet length and if the packet length is greater than 
100,
it will be output to port 3, else it will be output to port 4.

In case, datapath doesn't support 'check_pkt_len' action, the OVS action
'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added.

This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is lesser than the packet length. More information can be found here [1]

TODO:
 - Add test cases.
 - Add documentation

Request to suggest a better name for the action in case 'check_pkt_larger'
seems odd.

[1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff 
Signed-off-by: Numan Siddique 
CC: Ben Pfaff 
---
 .../linux/compat/include/linux/openvswitch.h  |  30 +++--
 include/openvswitch/ofp-actions.h |  18 +++
 lib/dpif-netdev.c |   1 +
 lib/dpif.c|   1 +
 lib/odp-execute.c |   4 +
 lib/odp-util.c|  36 ++
 lib/ofp-actions.c |  98 +++-
 lib/ofp-parse.c   |  10 ++
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  | 108 ++
 ofproto/ofproto-dpif.c|  47 
 ofproto/ofproto-dpif.h|   5 +-
 13 files changed, 349 insertions(+), 11 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1b0..26dd69dcd 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -855,6 +855,24 @@ enum ovs_nat_attr {
 
 #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
 
+/*
+ * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN.
+ *
+ * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for.
+ * @OVS_CHECK_PKT_LEN_ATTR_USERSPACE_COND: u8 comparison condition to send
+ * the packet to userspace. One of OVS_CHECK_PKT_LEN_COND_*.
+ * @OVS_CHECK_PKT_LEN_ATTR_USERPACE - Nested OVS_USERSPACE_ATTR_* actions.
+ */
+enum ovs_check_pkt_len_attr {
+OVS_CHECK_PKT_LEN_ATTR_UNSPEC,
+OVS_CHECK_PKT_LEN_ATTR_PKT_LEN,
+OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER,
+OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL,
+__OVS_CHECK_PKT_LEN_ATTR_MAX,
+};
+
+#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -909,8 +927,6 @@ enum ovs_nat_attr {
  * tunnel header.
  * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the
  * packet, or modify the packet (e.g., change the DSCP field).
- * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
- * actions without affecting the original packet and key.
  */
 
 enum ovs_action_attr {
@@ -936,12 +952,13 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_CT_CLEAR, /* No argument. */
OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */
OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
-   OVS_ACTION_ATTR_METER,/* u32 meter number. */
-   OVS_ACTION_ATTR_CLONE, 

[ovs-dev] [RFC v2 1/5] datapath: Add a new action check_pkt_len

2019-01-03 Thread nusiddiq
From: Numan Siddique 

[Please note, this patch is submitted as RFC in ovs-dev ML to
get feedback before submitting to netdev ML]

This patch adds a new action which checks the packet length and
executes a set of actions if the packet length is greater than
the specified length or executes another set of actions if the
packet length is lesser or equal to.

This action takes below nlattrs
  * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for

  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'

  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.

The main use case for adding this action is to solve the packet
drops because of MTU mismatch in OVN virtual networking solution.
When a VM (which belongs to a logical switch of OVN) sends a packet
destined to go via the gateway router and if the nic which provides
external connectivity, has a lesser MTU, OVS drops the packet
if the packet length is greater than this MTU.

With the help of this action, OVN will check the packet length
and if it is greater than the MTU size, it will generate an
ICMP packet (type 3, code 4) and includes the next hop mtu in it
so that the sender can fragment the packets.

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff 
Signed-off-by: Numan Siddique 
CC: Ben Pfaff 
CC: Justin Pettit 
CC: Pravin B Shelar 
---
 include/uapi/linux/openvswitch.h |  25 -
 net/openvswitch/actions.c|  55 +-
 net/openvswitch/flow_netlink.c   | 175 +++
 3 files changed, 253 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index dbe0cbe4f1b7..c395baffdd42 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -798,6 +798,27 @@ struct ovs_action_push_eth {
struct ovs_key_ethernet addresses;
 };
 
+/*
+ * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN.
+ *
+ * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is greater than the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is lesser or equal to the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ */
+enum ovs_check_pkt_len_attr {
+   OVS_CHECK_PKT_LEN_ATTR_UNSPEC,
+   OVS_CHECK_PKT_LEN_ATTR_PKT_LEN,
+   OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER,
+   OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL,
+   __OVS_CHECK_PKT_LEN_ATTR_MAX,
+};
+
+#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -842,7 +863,8 @@ struct ovs_action_push_eth {
  * packet, or modify the packet (e.g., change the DSCP field).
  * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
  * actions without affecting the original packet and key.
- *
+ * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the length of the packet and
+ * execute a set of actions as specified in OVS_CHECK_PKT_LEN_ATTR_*.
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
  * type may not be changed.
@@ -875,6 +897,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */
OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
OVS_ACTION_ATTR_METER,/* u32 meter ID. */
+   OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e47ebbbe71b8..9551c07eae92 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -169,6 +169,10 @@ static int clone_execute(struct datapath *dp, struct 
sk_buff *skb,
 const struct nlattr *actions, int len,
 bool last, bool clone_flow_key);
 
+static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct nlattr *attr, int len);
+
 static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr,
 __be16 ethertype)
 {
@@ -1213,6 +1217,46 @@ static int execute_recirc(struct datapath *dp, struct 
sk_buff *skb,
return clone_execute(dp, skb, key, recirc_id, NULL, 0, last, true);
 }
 
+static int 

[ovs-dev] [RFC v2 0/5] Address MTU issue for larger packets in OVN

2019-01-03 Thread nusiddiq
From: Numan Siddique 

This is an RFC series to address the MTU issues for OVN reported
here [1].

To address this issue, a new OVS action - check_pkt_larger is added.
A new datapath action is also added - check_pkt_len.

The datapath patch is submitted here to get feedback before submitting
to the kernel net-dev ML.

v1 of RFC included patches for datapath and OVS. This v2 series also
includes the corresponding OVN changes.

In v1, check_pkt_len datapath action was implemented as 
  - check_pkt_len(pkt_len, condition, userspace action if condition matches)
   Eg.
check_pkt_len( > 1500 ? userspace(pid=,slow_path(check_pkt_len))
If the packet length is greater than 1500 bytes, send it to userspace.

In v2 a different approach is taken
  - check_pkt_len(> pkt_len ? (set of action to apply) : (another set of
actions to apply))
Eg. check_pkt_len(> 1500 ? (2): (3))
If the packet length is greater than 1500, output to port 2, else
output to port 3.

ovs-vswitchd is modified accordingly.

The ovs action - check_pkt_larger is unchanged. It looks like
  - check_pkt_larger(pkt_len)->REGISTER_BIT.
Eg.  check_pkt_larger(1500)->NXM_NX_REG0[0]
If the packet is larger than 1500 bytes, REG0[0] will be set to 1.

More details in the patches.

If the approach seems reasonable, I will submit the datapath patch first
to net-dev ML first.

[1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Datapath patch
-
Numan Siddique (1):
 datapath: Add a new action check_pkt_len

 include/uapi/linux/openvswitch.h |  25 -
 net/openvswitch/actions.c|  55 +-
 net/openvswitch/flow_netlink.c   | 175 +++
 3 files changed, 253 insertions(+), 2 deletions(-)


OVS patches
---
Numan Siddique (4):
  Add a new OVS action check_pkt_larger
  ovn: Add a new OVN field icmp4.frag_mtu
  ovn: Support OVS action 'check_pkt_larger' in OVN
  ovn: Generate ICMPv4 packet in router pipeline for larger packets

 .../linux/compat/include/linux/openvswitch.h  |  30 ++-
 include/openvswitch/ofp-actions.h |  18 ++
 include/ovn/actions.h |  33 ++-
 include/ovn/automake.mk   |   3 +-
 include/ovn/expr.h|   5 +
 {ovn/lib => include/ovn}/logical-fields.h |  41 
 lib/dpif-netdev.c |   1 +
 lib/dpif.c|   1 +
 lib/odp-execute.c |   4 +
 lib/odp-util.c|  36 
 lib/ofp-actions.c |  98 -
 lib/ofp-parse.c   |  10 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  | 108 ++
 ofproto/ofproto-dpif.c|  47 +
 ofproto/ofproto-dpif.h|   5 +-
 ovn/controller/lflow.c|   1 +
 ovn/controller/lflow.h|   2 +-
 ovn/controller/pinctrl.c  |  38 +++-
 ovn/lib/actions.c | 188 ++---
 ovn/lib/automake.mk   |   1 -
 ovn/lib/expr.c|  17 +-
 ovn/lib/logical-fields.c  |  36 +++-
 ovn/northd/ovn-northd.8.xml   |  83 +++-
 ovn/northd/ovn-northd.c   |  91 -
 ovn/ovn-sb.xml|  32 +++
 ovn/utilities/ovn-trace.c |   9 +-
 tests/ovn.at  | 193 +-
 tests/test-ovn.c  |   2 +-
 30 files changed, 1072 insertions(+), 63 deletions(-)
 rename {ovn/lib => include/ovn}/logical-fields.h (71%)

-- 
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] conntrack: Check all addresses for ephemeral ports.

2019-01-03 Thread Ben Pfaff
On Thu, Jan 03, 2019 at 10:11:29AM -0500, Aaron Conole wrote:
> Aaron Conole  writes:
> 
> > Ben Pfaff  writes:
> >
> >> On Thu, Dec 20, 2018 at 01:22:18PM -0500, Aaron Conole wrote:
> >>> Kevin Traynor  writes:
> >>> 
> >>> > On 12/19/2018 08:23 AM, Darrell Ball wrote:
> >>> >> On Tue, Dec 18, 2018 at 6:57 PM 0-day Robot  wrote:
> >>> >> 
> >>> >>> Bleep bloop.  Greetings Darrell Ball, I am a robot and I have tried 
> >>> >>> out
> >>> >>> your patch.
> >>> >>> Thanks for your contribution.
> >>> >>>
> >>> >>> I encountered some error that I wasn't expecting.  See the details 
> >>> >>> below.
> >>> >>>
> >>> >>>
> >>> >>> checkpatch:
> >>> >>> ERROR: Too many signoffs; are you missing Co-authored-by lines?
> >>> >>> Lines checked: 37, Warnings: 0, Errors: 1
> >>> >>>
> >>> >> 
> >>> >> I don't understand this complaint.
> >>> >> 
> >>> >
> >>> > This is a false positive. I've seen patchwork duplicate signed-off-by's
> >>> > in an mbox before (reported to Stephen,
> >>> > https://github.com/getpatchwork/patchwork/issues/219) which would cause
> >>> > this error, but downloading this mbox locally it seems ok.
> >>> 
> >>> It's a false positive, but not for that reason.  The bot keeps up to date
> >>> with the most recent checkpatch, so it stamps a sign-off as part of the
> >>> delivery chain. The older branch checkpatch version doesn't understand
> >>> that.
> >>> 
> >>> We probably should either backport 3267343a8487 ("checkpatch: Improve
> >>> accuracy and specificity of sign-off checking.") to the relevant
> >>> branches so that when someone submits a patch on the branch it's
> >>> checked, -OR- improve the robot to just save off the latest checkpatch
> >>> version before starting to apply patches.  I like the idea of the former
> >>> so that checkpatch changes can self-check, but it comes with a drawback
> >>> (like checkpatch changes won't be invoked until after they're applied to
> >>> the tree .. I guess it isn't such a big deal, though).
> >>
> >> Hmm.  I'm inclined to suggest that the robot should always use the
> >> latest checkpatch regardless of branch.  Otherwise we'll have to
> >> make a policy of backporting checkpatch updates, and I'm not really in
> >> favor of that.
> >
> > Makes sense.  I'll work on it when I'm back from my PTO.
> 
> So, I'm modifying the bot - just before it switches branches, it will
> save off the latest checkpatch.  That lets new changes on master be
> invoked, and backports will use the latest checkpatch versions.  I'll
> hook it up today.
> 
> Sound reasonable?

Sure, sounds reasonable to me.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Stopwatch module test failure on FreeBSD.

2019-01-03 Thread Aaron Conole
0-day Robot  writes:

> Bleep bloop.  Greetings Ilya Maximets, I am a robot and I have tried out your 
> patch.
> Thanks for your contribution.
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
> git-am:
> fatal: sha1 information is lacking or useless
> (tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/85/stdout).
> Repository lacks necessary blobs to fall back on 3-way merge.
> Cannot fall back to three-way merge.
> Patch failed at 0001 Stopwatch module test failure on FreeBSD.
> The copy of the patch that failed is found in:
>
> /var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch
> When you have resolved this problem, run "git am --resolved".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

This is a cool corner case.  Patchwork detected this as a patch
(probably because there is a diff line in the email body), and that
triggered the bot.  Cool find!

I'm going to try and make a change to filter these kinds of things out.

Sorry for the spam.

>
> Please check this out.  If you feel there has been an error, please
> email acon...@bytheb.org
>
> Thanks,
> 0-day Robot
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] conntrack: Check all addresses for ephemeral ports.

2019-01-03 Thread Aaron Conole
Aaron Conole  writes:

> Ben Pfaff  writes:
>
>> On Thu, Dec 20, 2018 at 01:22:18PM -0500, Aaron Conole wrote:
>>> Kevin Traynor  writes:
>>> 
>>> > On 12/19/2018 08:23 AM, Darrell Ball wrote:
>>> >> On Tue, Dec 18, 2018 at 6:57 PM 0-day Robot  wrote:
>>> >> 
>>> >>> Bleep bloop.  Greetings Darrell Ball, I am a robot and I have tried out
>>> >>> your patch.
>>> >>> Thanks for your contribution.
>>> >>>
>>> >>> I encountered some error that I wasn't expecting.  See the details 
>>> >>> below.
>>> >>>
>>> >>>
>>> >>> checkpatch:
>>> >>> ERROR: Too many signoffs; are you missing Co-authored-by lines?
>>> >>> Lines checked: 37, Warnings: 0, Errors: 1
>>> >>>
>>> >> 
>>> >> I don't understand this complaint.
>>> >> 
>>> >
>>> > This is a false positive. I've seen patchwork duplicate signed-off-by's
>>> > in an mbox before (reported to Stephen,
>>> > https://github.com/getpatchwork/patchwork/issues/219) which would cause
>>> > this error, but downloading this mbox locally it seems ok.
>>> 
>>> It's a false positive, but not for that reason.  The bot keeps up to date
>>> with the most recent checkpatch, so it stamps a sign-off as part of the
>>> delivery chain. The older branch checkpatch version doesn't understand
>>> that.
>>> 
>>> We probably should either backport 3267343a8487 ("checkpatch: Improve
>>> accuracy and specificity of sign-off checking.") to the relevant
>>> branches so that when someone submits a patch on the branch it's
>>> checked, -OR- improve the robot to just save off the latest checkpatch
>>> version before starting to apply patches.  I like the idea of the former
>>> so that checkpatch changes can self-check, but it comes with a drawback
>>> (like checkpatch changes won't be invoked until after they're applied to
>>> the tree .. I guess it isn't such a big deal, though).
>>
>> Hmm.  I'm inclined to suggest that the robot should always use the
>> latest checkpatch regardless of branch.  Otherwise we'll have to
>> make a policy of backporting checkpatch updates, and I'm not really in
>> favor of that.
>
> Makes sense.  I'll work on it when I'm back from my PTO.

So, I'm modifying the bot - just before it switches branches, it will
save off the latest checkpatch.  That lets new changes on master be
invoked, and backports will use the latest checkpatch versions.  I'll
hook it up today.

Sound reasonable?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-03 Thread Aaron Conole
Nitin Katiyar  writes:

> Port rx queues that have not been statically assigned to PMDs are currently
> assigned based on periodically sampled load measurements.
> The assignment is performed at specific instances – port addition, port
> deletion, upon reassignment request via CLI etc.
>
> Due to change in traffic pattern over time it can cause uneven load among
> the PMDs and thus resulting in lower overall throughout.
>
> This patch enables the support of auto load balancing of PMDs based on
> measured load of RX queues. Each PMD measures the processing load for each
> of its associated queues every 10 seconds. If the aggregated PMD load reaches
> 95% for 6 consecutive intervals then PMD considers itself to be overloaded.
>
> If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
> performed by OVS main thread. The dry-run does NOT change the existing
> queue to PMD assignments.
>
> If the resultant mapping of dry-run indicates an improved distribution
> of the load then the actual reassignment will be performed.
>
> The automatic rebalancing will be disabled by default and has to be
> enabled via configuration option. The interval (in minutes) between
> two consecutive rebalancing can also be configured via CLI, default
> is 1 min.
>
> Following example commands can be used to set the auto-lb params:
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
>
> Co-authored-by: Rohith Basavaraja 
> Co-authored-by: Venkatesan Pradeep 
> Signed-off-by: Rohith Basavaraja 
> Signed-off-by: Venkatesan Pradeep 
> Signed-off-by: Nitin Katiyar 
> ---
>  lib/dpif-netdev.c| 403 
> +--
>  vswitchd/vswitch.xml |  30 
>  2 files changed, 424 insertions(+), 9 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1564db9..8db106f 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -80,6 +80,12 @@
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
> +/* Auto Load Balancing Defaults */
> +#define ACCEPT_IMPROVE_DEFAULT   (25)
> +#define PMD_LOAD_THRE_DEFAULT(95)
> +#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> +#define MIN_TO_MSEC  6
> +
>  #define FLOW_DUMP_MAX_BATCH 50
>  /* Use per thread recirc_depth to prevent recirculation loop. */
>  #define MAX_RECIRC_DEPTH 6
> @@ -288,6 +294,13 @@ struct dp_meter {
>  struct dp_meter_band bands[];
>  };
>  
> +struct pmd_auto_lb {
> +bool auto_lb_conf;/* enable-disable auto load balancing */
> +bool is_enabled;  /* auto_lb current status */
> +uint64_t rebalance_intvl;
> +uint64_t rebalance_poll_timer;
> +};
> +
>  /* Datapath based on the network device interface from netdev.h.
>   *
>   *
> @@ -368,6 +381,7 @@ struct dp_netdev {
>  uint64_t last_tnl_conf_seq;
>  
>  struct conntrack conntrack;
> +struct pmd_auto_lb pmd_alb;
>  };
>  
>  static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread {
>  struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. 
> */
>  /* List of rx queues to poll. */
>  struct hmap poll_list OVS_GUARDED;
> +
>  /* Map of 'tx_port's used for transmission.  Written by the main thread,
>   * read by the pmd thread. */
>  struct hmap tx_ports OVS_GUARDED;
> @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread {
>  /* Keep track of detailed PMD performance statistics. */
>  struct pmd_perf_stats perf_stats;
>  
> +/* Some stats from previous iteration used by automatic pmd
> +   load balance logic. */
> +uint64_t prev_stats[PMD_N_STATS];
> +atomic_count pmd_overloaded;
> +
>  /* Set to true if the pmd thread needs to be reloaded. */
>  bool need_reload;
>  };
> @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
>   enum rxq_cycles_counter_type type);
>  static void
>  dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
> -   unsigned long long cycles);
> +unsigned long long cycles,
> +unsigned idx);
>  static uint64_t
> -dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
> +dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx,
> +unsigned idx);
>  static void
>  dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
> bool purge);
> @@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op 
> **ops, size_t n_ops,
>  }
>  }
>  
> +/* Enable/Disable PMD auto load balancing */
> +static void
> +set_pmd_auto_lb(struct dp_netdev *dp)
> +{
> +unsigned int cnt = 0;
> +struct dp_netdev_pmd_thread *pmd;
> +struct pmd_auto_lb * pmd_alb = >pmd_alb;
> +
> +bool enable = false;
> +bool 

[ovs-dev] [PATCH v2] Adding support for PMD auto load balancing

2019-01-03 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.

Due to change in traffic pattern over time it can cause uneven load among
the PMDs and thus resulting in lower overall throughout.

This patch enables the support of auto load balancing of PMDs based on
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals then PMD considers itself to be overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by OVS main thread. The dry-run does NOT change the existing
queue to PMD assignments.

If the resultant mapping of dry-run indicates an improved distribution
of the load then the actual reassignment will be performed.

The automatic rebalancing will be disabled by default and has to be
enabled via configuration option. The interval (in minutes) between
two consecutive rebalancing can also be configured via CLI, default
is 1 min.

Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 lib/dpif-netdev.c| 403 +--
 vswitchd/vswitch.xml |  30 
 2 files changed, 424 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..8db106f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ACCEPT_IMPROVE_DEFAULT   (25)
+#define PMD_LOAD_THRE_DEFAULT(95)
+#define PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_TO_MSEC  6
+
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
 #define MAX_RECIRC_DEPTH 6
@@ -288,6 +294,13 @@ struct dp_meter {
 struct dp_meter_band bands[];
 };
 
+struct pmd_auto_lb {
+bool auto_lb_conf;/* enable-disable auto load balancing */
+bool is_enabled;  /* auto_lb current status */
+uint64_t rebalance_intvl;
+uint64_t rebalance_poll_timer;
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
  *
@@ -368,6 +381,7 @@ struct dp_netdev {
 uint64_t last_tnl_conf_seq;
 
 struct conntrack conntrack;
+struct pmd_auto_lb pmd_alb;
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread {
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct hmap poll_list OVS_GUARDED;
+
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread {
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;
 
+/* Some stats from previous iteration used by automatic pmd
+   load balance logic. */
+uint64_t prev_stats[PMD_N_STATS];
+atomic_count pmd_overloaded;
+
 /* Set to true if the pmd thread needs to be reloaded. */
 bool need_reload;
 };
@@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
  enum rxq_cycles_counter_type type);
 static void
 dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
-   unsigned long long cycles);
+unsigned long long cycles,
+unsigned idx);
 static uint64_t
-dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
+dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx,
+unsigned idx);
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
bool purge);
@@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op 
**ops, size_t n_ops,
 }
 }
 
+/* Enable/Disable PMD auto load balancing */
+static void
+set_pmd_auto_lb(struct dp_netdev *dp)
+{
+unsigned int cnt = 0;
+struct dp_netdev_pmd_thread *pmd;
+struct pmd_auto_lb * pmd_alb = >pmd_alb;
+
+bool enable = false;
+bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc;
+
+/* Ensure that there is at least 2 non-isolated PMDs and
+ * one of the PMD is polling more than one rxq
+ */
+CMAP_FOR_EACH (pmd, node, >poll_threads) {
+if (pmd->core_id == NON_PMD_CORE_ID || 

Re: [ovs-dev] OVS seems to not release memory on ARP bursts

2019-01-03 Thread Ani Sinha
Oh and I forgot to mention that we are seeing this issue on OVS versions 2.5.0 
and 2.5.2.

thanks
Ani

On Jan 3, 2019, 1:10 PM +0530, Ani Sinha , wrote:
Hi Ben:

In order to reproduce, please put the node in a network with flood of ARP 
broadcast packets, for example in an environment where machines are performing 
peer to peer upgrades at scheduled intervals. If you have a packet generator, 
you can generate ARP broadcast packets with random source MACs and then flood 
the network with these packets with varying transmission rates. If you see the 
RSS of OVS increasing monotonically, you have recreated the issue. If the host 
has OOM killer enabled, it should eventually kill the OVS daemon. The confusing 
part is that the OVS never seems to free the memory once the slow path packets 
have been processed. That is why the RSS in allocated huge pages keeps 
increasing. We are not sure why this happens.

Let me know if this helps. Happy New Year.

Thanks
Ani
On Dec 28, 2018, 11:21 PM +0530, Ben Pfaff , wrote:
On Fri, Dec 28, 2018 at 07:30:52AM +, Ani Sinha wrote:
We are performing an experiment based on our observed behavior on some of our 
systems. We are using a python packet generator called scapy to generate arp 
broadcast packet bursts with random MACs. What we are observing is that within 
a period of 20 sec to 30 sec, the upcall flow count reaches to about 30K with 
about 1.6G memory consumption by OVS at which point the kernel OOM killer kicks 
in and kills the OVS daemon.

We are wondering if this is an expected behavior and if it is, whether there is 
a setting to limit the size of internal data structures used by OVS for ARP 
packets or processing slow path ARP packets (I have not looked into the code 
and done any analysis).

It's not expected. How do we reproduce it?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev