[ovs-dev] [PATCH v2] Monitor Database table to manage lifecycle of IDL client.
The Python IDL implementation supports ovsdb cluster connections. This patch is a follow up to commit 31e434fc98, it adds the option of connecting to the leader (the default) in the Raft-based cluster. It mimics the exisiting C IDL support for clusters introduced in commit 1b1d2e6daa. The _Server database schema is first requested, then a monitor of the Database table in the _Server Database. Method __check_server_db verifies the eligibility of the server. If the attempt to obtain a monitor of the _Server database fails and a cluster id was not provided this implementation proceeds to request the data monitor. If a cluster id was provided via the set_cluster_id method then the connection is aborted and a connection to a different node is instead attempted, until a valid cluster node is found. Thus, when supplied, cluster id is interpreted as the intention to only allow connections to a clustered database. If not supplied, connections to standalone nodes, or nodes that do not have the _Server database are allowed. change_seqno is not incremented in the case of Database table updates. Signed-off-by: Ted Elhourani --- python/ovs/db/idl.py| 217 python/ovs/reconnect.py | 3 + tests/ovsdb-idl.at | 62 +++--- 3 files changed, 237 insertions(+), 45 deletions(-) diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index 250e897..f989548 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -38,6 +38,7 @@ ROW_DELETE = "delete" OVSDB_UPDATE = 0 OVSDB_UPDATE2 = 1 +CLUSTERED = "clustered" class Idl(object): """Open vSwitch Database Interface Definition Language (OVSDB IDL). @@ -92,10 +93,12 @@ class Idl(object): """ IDL_S_INITIAL = 0 -IDL_S_MONITOR_REQUESTED = 1 -IDL_S_MONITOR_COND_REQUESTED = 2 +IDL_S_SERVER_SCHEMA_REQUESTED = 1 +IDL_S_SERVER_MONITOR_REQUESTED = 2 +IDL_S_DATA_MONITOR_REQUESTED = 3 +IDL_S_DATA_MONITOR_COND_REQUESTED = 4 -def __init__(self, remote, schema_helper, probe_interval=None): +def __init__(self, remote, schema_helper, leader_only=True, probe_interval=None): """Creates and returns a connection to the database named 'db_name' on 'remote', which should be in a form acceptable to ovs.jsonrpc.session.open(). The connection will maintain an in-memory @@ -119,6 +122,9 @@ class Idl(object): The IDL uses and modifies 'schema' directly. +If 'leader_only' is set to True (default value) the IDL will only monitor +and transact with the leader of the cluster. + If "probe_interval" is zero it disables the connection keepalive feature. If non-zero the value will be forced to at least 1000 milliseconds. If None it will just use the default value in OVS. @@ -137,6 +143,20 @@ class Idl(object): self._last_seqno = None self.change_seqno = 0 self.uuid = uuid.uuid1() + +# Server monitor. +self._server_schema_request_id = None +self._server_monitor_request_id = None +self._db_change_aware_request_id = None +self._server_db_name = '_Server' +self._server_db_table = 'Database' +self.server_tables = None +self._server_db = None +self.server_monitor_uuid = uuid.uuid1() +self.leader_only = leader_only +self.cluster_id = None +self._min_index = 0 + self.state = self.IDL_S_INITIAL # Database locking. @@ -172,6 +192,15 @@ class Idl(object): remotes.append(r) return remotes +def set_cluster_id(self, cluster_id): +"""Set the id of the cluster that this idl must connect to.""" +if cluster_id: +self.cluster_id = str(cluster_id) +else: +self.cluster_id = None +if self.state != self.IDL_S_INITIAL: +self.force_reconnect() + def index_create(self, table, name): """Create a named multi-column index on a table""" return self.tables[table].rows.index_create(name) @@ -222,7 +251,7 @@ class Idl(object): if seqno != self._last_seqno: self._last_seqno = seqno self.__txn_abort_all() -self.__send_monitor_request() +self.__send_server_schema_request() if self.lock_name: self.__send_lock_request() break @@ -230,6 +259,7 @@ class Idl(object): msg = self._session.recv() if msg is None: break + if (msg.type == ovs.jsonrpc.Message.T_NOTIFY and msg.method == "update2" and len(msg.params) == 2): @@ -239,7 +269,15 @@ class Idl(object): and msg.method == "update" and len(msg.params) == 2): # Database contents changed. -self.__parse_update(msg.params[1],
Re: [ovs-dev] [PATCH v1] Monitor Database table to manage lifecycle of IDL client.
Thanks for taking a look. I will send a revised patch that addresses both issues you pointed to, in particular backward compatibility with _Server-less ovsdbs. Thanks, Ted On 12/27/18, 9:23 AM, "Ben Pfaff" wrote: On Tue, Dec 18, 2018 at 12:18:06AM +, Ted Elhourani wrote: > The Python IDL implementation supports ovsdb cluster connections. > This patch is a follow up to commit 31e434fc98, it adds the option of > connecting to the leader in the Raft-based cluster. It mimics the > exisiting C IDL support for clusters introduced in commit 1b1d2e6daa. > > The server schema is first requested, then a monitor of the Database table > in the _Server Database. __check_server_db is called to verify the > eligibility of the server. Unlike the C IDL, if an attempt to obtain a > monitor of the Server database fails this implementation does not proceed > to monitor and transact with the database. The reason is connections to > standalone databases should not be permitted when the intention is to > connect to a valid cluster node. The change_seqno is not incremented in the > case of Database table updates. > > The run method of Session (jsonrpc) is modified such that the session is > closed when a stream connection fails, if more than one remote exists. > Otherwise the session will re-attempt to establish a stream to the same > cluster node. Method pick_remote must be called before opening a stream > in __connect (in Session), otherwise a force_reconnect causes the Session > to re-attempt a connection to the same server. pick_remote is no longer > necessary in method run of Session. > > Signed-off-by: Ted Elhourani I'm pretty sure that this breaks backward compatibility with any ovsdb-server not new enough to support the _Server table. The commit message above conflates that with whether the database is clustered. I think your intention is to reject non-clustered databases, but it doesn't do that; instead, it rejects database servers too old to have a _Server table and accepts standalone databases served up by new-enough servers. It's one thing to avoid connecting to standalone databases when a cluster is expected, but this goes beyond that to drop support for all older database servers. I don't think that's reasonable behavior. I noticed that the following looks wrong. EAGAIN indicates that the connection is not complete yet. It should not be treated as a connection failure (unless a timeout has occurred) even if there is more than one remote: > @@ -516,9 +516,9 @@ class Session(object): > self.reconnect.connected(ovs.timeval.msec()) > self.rpc = Connection(self.stream) > self.stream = None > -elif error != errno.EAGAIN: > +elif (error != errno.EAGAIN or > + self.get_num_of_remotes() > 1): > self.reconnect.connect_failed(ovs.timeval.msec(), error) > -self.pick_remote() > self.stream.close() > self.stream = None Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing
On 01/03/2019 12:36 PM, Nitin Katiyar wrote: > Port rx queues that have not been statically assigned to PMDs are currently > assigned based on periodically sampled load measurements. > The assignment is performed at specific instances – port addition, port > deletion, upon reassignment request via CLI etc. > > Due to change in traffic pattern over time it can cause uneven load among > the PMDs and thus resulting in lower overall throughout. > > This patch enables the support of auto load balancing of PMDs based on > measured load of RX queues. Each PMD measures the processing load for each > of its associated queues every 10 seconds. If the aggregated PMD load reaches > 95% for 6 consecutive intervals then PMD considers itself to be overloaded. > > If any PMD is overloaded, a dry-run of the PMD assignment algorithm is > performed by OVS main thread. The dry-run does NOT change the existing > queue to PMD assignments. > > If the resultant mapping of dry-run indicates an improved distribution > of the load then the actual reassignment will be performed. > > The automatic rebalancing will be disabled by default and has to be > enabled via configuration option. The interval (in minutes) between > two consecutive rebalancing can also be configured via CLI, default > is 1 min. > > Following example commands can be used to set the auto-lb params: > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true" > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5" > Hi Nitin, thanks for v2. Not full review yet but sending some comments below. Maybe you can put some of the above into a new section below this http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads I also think this feature should be experimental until it has been road tested a bit more. > Co-authored-by: Rohith Basavaraja > Co-authored-by: Venkatesan Pradeep > Signed-off-by: Rohith Basavaraja > Signed-off-by: Venkatesan Pradeep > Signed-off-by: Nitin Katiyar > --- > lib/dpif-netdev.c| 403 > +-- > vswitchd/vswitch.xml | 30 > 2 files changed, 424 insertions(+), 9 deletions(-) > There seems to be windows style line endings in the patch? or something else has gone wrong for me. > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 1564db9..8db106f 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -80,6 +80,12 @@ > > VLOG_DEFINE_THIS_MODULE(dpif_netdev); > > +/* Auto Load Balancing Defaults */ > +#define ACCEPT_IMPROVE_DEFAULT (25) > +#define PMD_LOAD_THRE_DEFAULT(95) Probably you should remove the brackets above to be consistent with the others below and in the rest of the file. > +#define PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */ > +#define MIN_TO_MSEC 6 > + > #define FLOW_DUMP_MAX_BATCH 50 > /* Use per thread recirc_depth to prevent recirculation loop. */ > #define MAX_RECIRC_DEPTH 6 > @@ -288,6 +294,13 @@ struct dp_meter { > struct dp_meter_band bands[]; > }; > > +struct pmd_auto_lb { > +bool auto_lb_conf;/* enable-disable auto load balancing */ I'm not sure what '_conf' is short for? maybe it should be something like 'auto_lb_requested' > +bool is_enabled; /* auto_lb current status */ Comments should be of style /* Sentence case. */ http://docs.openvswitch.org/en/latest/internals/contributing/coding-style/#comments > +uint64_t rebalance_intvl; > +uint64_t rebalance_poll_timer; > +}; > + > /* Datapath based on the network device interface from netdev.h. > * > * > @@ -368,6 +381,7 @@ struct dp_netdev { > uint64_t last_tnl_conf_seq; > > struct conntrack conntrack; > +struct pmd_auto_lb pmd_alb; > }; > > static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id) > @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread { > struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. > */ > /* List of rx queues to poll. */ > struct hmap poll_list OVS_GUARDED; > + Unrelated newline should be removed > /* Map of 'tx_port's used for transmission. Written by the main thread, > * read by the pmd thread. */ > struct hmap tx_ports OVS_GUARDED; > @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread { > /* Keep track of detailed PMD performance statistics. */ > struct pmd_perf_stats perf_stats; > > +/* Some stats from previous iteration used by automatic pmd > + load balance logic. */ Nit, but see coding stds. and other multi-line comments wrt style > +uint64_t prev_stats[PMD_N_STATS];> +atomic_count pmd_overloaded; > + > /* Set to true if the pmd thread needs to be reloaded. */ > bool need_reload; > }; > @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, > enum rxq_cycles_counter_type type); > static void > dp_netdev_rxq_set_intrvl_cycles(struct
Re: [ovs-dev] [PATCH v2] rhel: bug fix upgrade path in kmod fedora spec file
Hi Martin, Sorry the delay, coming back from PTO and holidays. On Thu, Dec 13, 2018 at 02:50:22PM -0800, Martin Xu wrote: > Hi Flavio, > > I remember when we had a discussion about package "succession" when I had > the "Obsoletes" tag added in one of the previous patches submitted for the > fedora spec files. > > We do need the capability for the system to know openvswitch-kmod is > automatically upgrading kmod-openvswitch. For some reason, I thought I > tested "Provides" alone would suffice. I just tried again and it seems not > the case. "Provides" does not make the renamed package automatically > upgrade the older one. Is there any other ways of doing this? I found one > suggestion from the fedora site, that uses both "Provides" and "Obsoletes." > I understand this means we will be declaring openvswitch-kmod is being > replaced. We can't go either way anymore. Is there a better solution? > Thanks. Provides does not allow upgrades. It will satisfy dependencies on what you are providing. A common example is smtp: sendmail and postfix. Both could do: "Provides: smtp", then "mutt" could do "Requires: smtp" and either sendmail or postfix would be fine. You're right that you need Obsoletes to allow upgrades. The problem with Obsoletes is that once you ship it, there is no turning back. That's why I suggested to limit to a specific version. For example: Obsoletes: kmod-openvswitch < %{version}-%{release} That grants all previous packages are indeed obsoleted while it would be possible to resurrect kmod-openvswitch if we change our minds in the future. All we would need is to ship kmod-openvswitch newer than the last %{version}-%{release} shipped. HTH, fbl > > Martin > > On Thu, Dec 13, 2018 at 2:44 PM Martin Xu wrote: > > > This patch removes the "Conflicts" tag and adds "Obsoletes" tag. > > > > With the conflicts tag, when a user attempts to install or upgrade with > > the same version as already installed, the conflict kicks in. Otherwise, > > such is allowed with --replacepkgs. > > > > Obsoletes is needed for the upgrade path from kmod-openvswitch to > > openvswitch-kmod. > > > > Fixes: 22c33c3039 (rhel: support kmod build against mulitple kernel > > versions, fedora) > > > > VMware-BZ: #2249788 > > > > Signed-off-by: Martin Xu > > CC: Flavio Leitner > > CC: Yi-Hung Wei > > CC: Yifeng Sun > > CC: Zak Whittington > > CC: Ben Pfaff > > --- > > v1->v2: adds "Obsoletes" tag needed for upgrade after renaming > > adds more reviewers > > > > rhel/openvswitch-kmod-fedora.spec.in | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/rhel/openvswitch-kmod-fedora.spec.in b/rhel/ > > openvswitch-kmod-fedora.spec.in > > index 8d54fd734..7913dbdb4 100644 > > --- a/rhel/openvswitch-kmod-fedora.spec.in > > +++ b/rhel/openvswitch-kmod-fedora.spec.in > > @@ -33,7 +33,7 @@ Source: openvswitch-%{version}.tar.gz > > #Source1: openvswitch-init > > Buildroot: /tmp/openvswitch-xen-rpm > > Provides: kmod-openvswitch > > -Conflicts: kmod-openvswitch > > +Obsoletes: kmod-openvswitch > > > > %description > > Open vSwitch provides standard network bridging functions augmented with > > -- > > 2.12.3 > > > > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Flavio ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [RFC v2 3/5] ovn: Add a new OVN field icmp4.frag_mtu
From: Numan Siddique In order to support OVN specific fields (which are not yet supported in OpenvSwitch to set or modify values) a generic OVN field support is added in this patch. These OVN fields are expected to be used as nested OVN actions inside OVN actions like icmp4, icmp6 etc. This patch adds only one field for now - icmp4.frag_mtu. It should be fairly straightforward to add similar fields in the near future. This field is expected to be used as an inner field with in the 'icmp4' action. Eg. "icmp4 {"eth.dst <-> eth.src; " "icmp4.type = 3; /* Destination Unreachable */ " "icmp4.code = 4; /* Fragmentation Needed */ " icmp4.frag_mtu = 1442; ... "next; };", pinctrl module of ovn-controller will set the specified value in the the low-order 16 bits of the ICMP4 header field that is labelled "unused" in the ICMP specification as defined in the RFC 1191. Upcoming patch will use it to send an icmp4 packet if the source IPv4 packet destined to go via external gateway needs to be fragmented. Signed-off-by: Numan Siddique --- include/ovn/actions.h | 25 +++- include/ovn/automake.mk | 3 +- include/ovn/expr.h| 5 + {ovn/lib => include/ovn}/logical-fields.h | 41 +++ ovn/controller/lflow.c| 1 + ovn/controller/lflow.h| 2 +- ovn/controller/pinctrl.c | 23 +++- ovn/lib/actions.c | 135 +- ovn/lib/automake.mk | 1 - ovn/lib/expr.c| 17 ++- ovn/lib/logical-fields.c | 36 +- ovn/northd/ovn-northd.c | 2 +- ovn/ovn-sb.xml| 11 ++ ovn/utilities/ovn-trace.c | 5 +- tests/ovn.at | 18 ++- tests/test-ovn.c | 2 +- 16 files changed, 282 insertions(+), 45 deletions(-) rename {ovn/lib => include/ovn}/logical-fields.h (71%) diff --git a/include/ovn/actions.h b/include/ovn/actions.h index 1c0c67ce6..095b60df0 100644 --- a/include/ovn/actions.h +++ b/include/ovn/actions.h @@ -80,7 +80,8 @@ struct ovn_extend_table; OVNACT(LOG, ovnact_log) \ OVNACT(PUT_ND_RA_OPTS,ovnact_put_opts)\ OVNACT(ND_NS, ovnact_nest)\ -OVNACT(SET_METER, ovnact_set_meter) +OVNACT(SET_METER, ovnact_set_meter) \ +OVNACT(OVNFIELD_LOAD, ovnact_load) /* enum ovnact_type, with a member OVNACT_ for each action. */ enum OVS_PACKED_ENUM ovnact_type { @@ -142,6 +143,20 @@ ovnact_end(const struct ovnact *ovnacts, size_t ovnacts_len) #define OVNACT_FOR_EACH(POS, OVNACTS, OVNACTS_LEN) \ for ((POS) = (OVNACTS); (POS) < ovnact_end(OVNACTS, OVNACTS_LEN); \ (POS) = ovnact_next(POS)) + +static inline int +ovnacts_count(const struct ovnact *ovnacts, size_t ovnacts_len) +{ +uint8_t n_ovnacts = 0; +if (ovnacts) { +const struct ovnact *a; + +OVNACT_FOR_EACH (a, ovnacts, ovnacts_len) { +n_ovnacts++; +} +} +return n_ovnacts; +} /* Action structure for each OVNACT_*. */ @@ -229,6 +244,8 @@ struct ovnact_nest { struct ovnact ovnact; struct ovnact *nested; size_t nested_len; +struct ovnact *nested_ovnfields; +size_t nested_ovnfields_len; }; /* OVNACT_GET_ARP, OVNACT_GET_ND. */ @@ -461,6 +478,12 @@ struct action_header { }; BUILD_ASSERT_DECL(sizeof(struct action_header) == 8); +OVS_PACKED( +struct ovnfield_act_header { +ovs_be16 id; /* one of enum ovnfield_id. */ +ovs_be16 len; /* Length of the ovnfield data. */ +}); + struct ovnact_parse_params { /* A table of "struct expr_symbol"s to support (as one would provide to * expr_parse()). */ diff --git a/include/ovn/automake.mk b/include/ovn/automake.mk index d2924c2f6..54b0e2c0e 100644 --- a/include/ovn/automake.mk +++ b/include/ovn/automake.mk @@ -2,4 +2,5 @@ ovnincludedir = $(includedir)/ovn ovninclude_HEADERS = \ include/ovn/actions.h \ include/ovn/expr.h \ - include/ovn/lex.h + include/ovn/lex.h \ + include/ovn/logical-fields.h diff --git a/include/ovn/expr.h b/include/ovn/expr.h index 3995e62f0..a68fd6e1c 100644 --- a/include/ovn/expr.h +++ b/include/ovn/expr.h @@ -58,6 +58,7 @@ #include "openvswitch/list.h" #include "openvswitch/match.h" #include "openvswitch/meta-flow.h" +#include "logical-fields.h" struct ds; struct expr; @@ -244,6 +245,7 @@ struct expr_symbol { int width; const struct mf_field *field; /* Fields only, otherwise NULL. */ +const struct ovn_field *ovn_field; /* OVN Fields only, otherwise NULL. */ const struct expr_symbol *parent; /* Subfields only, otherwise NULL. */ int parent_ofs; /* Subfields only, otherwise 0. */ char
[ovs-dev] [RFC v2 2/5] Add a new OVS action check_pkt_larger
From: Numan Siddique This patch adds a new action 'check_pkt_larger' which checks if the packet is larger than the given size and stores the result in the destination register. Usage: check_pkt_larger:len->REGISTER Eg. match=...,actions=check_pkt_larger:1442->NXM_NX_REG0[0],next; This patch makes use of the new datapath action - 'check_pkt_len' which is still WIP. At the start of ovs-vswitchd, datapath is probed for this action. If the datapath action is present, then 'check_pkt_larger' makes use of this datapath action. Datapath action 'check_pkt_len' takes these nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. Let's say we have these flows added to an OVS bridge br-int table=0, priority=100 in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1) table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3 table=1, priority=100,in_port=1,ip,actions=output:4 Suppose if a packet is received from in_port=1, the packet is sent to userspace (MISS_UPCALL). The action 'check_pkt_larger' will be translated as - check_pkt_len( > 100 ? (3) : (4)) datapath will check the packet length and if the packet length is greater than 100, it will be output to port 3, else it will be output to port 4. In case, datapath doesn't support 'check_pkt_len' action, the OVS action 'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added. This OVS action is intended to be used by OVN to check the packet length and generate an ICMP packet with type 3, code 4 and next hop mtu in the logical router pipeline if the MTU of the physical interface is lesser than the packet length. More information can be found here [1] TODO: - Add test cases. - Add documentation Request to suggest a better name for the action in case 'check_pkt_larger' seems odd. [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff Signed-off-by: Numan Siddique CC: Ben Pfaff --- .../linux/compat/include/linux/openvswitch.h | 30 +++-- include/openvswitch/ofp-actions.h | 18 +++ lib/dpif-netdev.c | 1 + lib/dpif.c| 1 + lib/odp-execute.c | 4 + lib/odp-util.c| 36 ++ lib/ofp-actions.c | 98 +++- lib/ofp-parse.c | 10 ++ ofproto/ofproto-dpif-ipfix.c | 1 + ofproto/ofproto-dpif-sflow.c | 1 + ofproto/ofproto-dpif-xlate.c | 108 ++ ofproto/ofproto-dpif.c| 47 ofproto/ofproto-dpif.h| 5 +- 13 files changed, 349 insertions(+), 11 deletions(-) diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 9b087f1b0..26dd69dcd 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -855,6 +855,24 @@ enum ovs_nat_attr { #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1) +/* + * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN. + * + * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for. + * @OVS_CHECK_PKT_LEN_ATTR_USERSPACE_COND: u8 comparison condition to send + * the packet to userspace. One of OVS_CHECK_PKT_LEN_COND_*. + * @OVS_CHECK_PKT_LEN_ATTR_USERPACE - Nested OVS_USERSPACE_ATTR_* actions. + */ +enum ovs_check_pkt_len_attr { +OVS_CHECK_PKT_LEN_ATTR_UNSPEC, +OVS_CHECK_PKT_LEN_ATTR_PKT_LEN, +OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER, +OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL, +__OVS_CHECK_PKT_LEN_ATTR_MAX, +}; + +#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1) + /** * enum ovs_action_attr - Action types. * @@ -909,8 +927,6 @@ enum ovs_nat_attr { * tunnel header. * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the * packet, or modify the packet (e.g., change the DSCP field). - * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of - * actions without affecting the original packet and key. */ enum ovs_action_attr { @@ -936,12 +952,13 @@ enum ovs_action_attr { OVS_ACTION_ATTR_CT_CLEAR, /* No argument. */ OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */ OVS_ACTION_ATTR_POP_NSH, /* No argument. */ - OVS_ACTION_ATTR_METER,/* u32 meter number. */ - OVS_ACTION_ATTR_CLONE,
[ovs-dev] [RFC v2 1/5] datapath: Add a new action check_pkt_len
From: Numan Siddique [Please note, this patch is submitted as RFC in ovs-dev ML to get feedback before submitting to netdev ML] This patch adds a new action which checks the packet length and executes a set of actions if the packet length is greater than the specified length or executes another set of actions if the packet length is lesser or equal to. This action takes below nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. The main use case for adding this action is to solve the packet drops because of MTU mismatch in OVN virtual networking solution. When a VM (which belongs to a logical switch of OVN) sends a packet destined to go via the gateway router and if the nic which provides external connectivity, has a lesser MTU, OVS drops the packet if the packet length is greater than this MTU. With the help of this action, OVN will check the packet length and if it is greater than the MTU size, it will generate an ICMP packet (type 3, code 4) and includes the next hop mtu in it so that the sender can fragment the packets. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff Signed-off-by: Numan Siddique CC: Ben Pfaff CC: Justin Pettit CC: Pravin B Shelar --- include/uapi/linux/openvswitch.h | 25 - net/openvswitch/actions.c| 55 +- net/openvswitch/flow_netlink.c | 175 +++ 3 files changed, 253 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index dbe0cbe4f1b7..c395baffdd42 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -798,6 +798,27 @@ struct ovs_action_push_eth { struct ovs_key_ethernet addresses; }; +/* + * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN. + * + * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for. + * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: Nested OVS_ACTION_ATTR_* + * actions to apply if the packer length is greater than the specified + * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN. + * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested OVS_ACTION_ATTR_* + * actions to apply if the packer length is lesser or equal to the specified + * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN. + */ +enum ovs_check_pkt_len_attr { + OVS_CHECK_PKT_LEN_ATTR_UNSPEC, + OVS_CHECK_PKT_LEN_ATTR_PKT_LEN, + OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER, + OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL, + __OVS_CHECK_PKT_LEN_ATTR_MAX, +}; + +#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1) + /** * enum ovs_action_attr - Action types. * @@ -842,7 +863,8 @@ struct ovs_action_push_eth { * packet, or modify the packet (e.g., change the DSCP field). * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of * actions without affecting the original packet and key. - * + * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the length of the packet and + * execute a set of actions as specified in OVS_CHECK_PKT_LEN_ATTR_*. * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all * fields within a header are modifiable, e.g. the IPv4 protocol and fragment * type may not be changed. @@ -875,6 +897,7 @@ enum ovs_action_attr { OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */ OVS_ACTION_ATTR_POP_NSH, /* No argument. */ OVS_ACTION_ATTR_METER,/* u32 meter ID. */ + OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */ OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*. */ __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index e47ebbbe71b8..9551c07eae92 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -169,6 +169,10 @@ static int clone_execute(struct datapath *dp, struct sk_buff *skb, const struct nlattr *actions, int len, bool last, bool clone_flow_key); +static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, + struct sw_flow_key *key, + const struct nlattr *attr, int len); + static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr, __be16 ethertype) { @@ -1213,6 +1217,46 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, return clone_execute(dp, skb, key, recirc_id, NULL, 0, last, true); } +static int
[ovs-dev] [RFC v2 0/5] Address MTU issue for larger packets in OVN
From: Numan Siddique This is an RFC series to address the MTU issues for OVN reported here [1]. To address this issue, a new OVS action - check_pkt_larger is added. A new datapath action is also added - check_pkt_len. The datapath patch is submitted here to get feedback before submitting to the kernel net-dev ML. v1 of RFC included patches for datapath and OVS. This v2 series also includes the corresponding OVN changes. In v1, check_pkt_len datapath action was implemented as - check_pkt_len(pkt_len, condition, userspace action if condition matches) Eg. check_pkt_len( > 1500 ? userspace(pid=,slow_path(check_pkt_len)) If the packet length is greater than 1500 bytes, send it to userspace. In v2 a different approach is taken - check_pkt_len(> pkt_len ? (set of action to apply) : (another set of actions to apply)) Eg. check_pkt_len(> 1500 ? (2): (3)) If the packet length is greater than 1500, output to port 2, else output to port 3. ovs-vswitchd is modified accordingly. The ovs action - check_pkt_larger is unchanged. It looks like - check_pkt_larger(pkt_len)->REGISTER_BIT. Eg. check_pkt_larger(1500)->NXM_NX_REG0[0] If the packet is larger than 1500 bytes, REG0[0] will be set to 1. More details in the patches. If the approach seems reasonable, I will submit the datapath patch first to net-dev ML first. [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Datapath patch - Numan Siddique (1): datapath: Add a new action check_pkt_len include/uapi/linux/openvswitch.h | 25 - net/openvswitch/actions.c| 55 +- net/openvswitch/flow_netlink.c | 175 +++ 3 files changed, 253 insertions(+), 2 deletions(-) OVS patches --- Numan Siddique (4): Add a new OVS action check_pkt_larger ovn: Add a new OVN field icmp4.frag_mtu ovn: Support OVS action 'check_pkt_larger' in OVN ovn: Generate ICMPv4 packet in router pipeline for larger packets .../linux/compat/include/linux/openvswitch.h | 30 ++- include/openvswitch/ofp-actions.h | 18 ++ include/ovn/actions.h | 33 ++- include/ovn/automake.mk | 3 +- include/ovn/expr.h| 5 + {ovn/lib => include/ovn}/logical-fields.h | 41 lib/dpif-netdev.c | 1 + lib/dpif.c| 1 + lib/odp-execute.c | 4 + lib/odp-util.c| 36 lib/ofp-actions.c | 98 - lib/ofp-parse.c | 10 + ofproto/ofproto-dpif-ipfix.c | 1 + ofproto/ofproto-dpif-sflow.c | 1 + ofproto/ofproto-dpif-xlate.c | 108 ++ ofproto/ofproto-dpif.c| 47 + ofproto/ofproto-dpif.h| 5 +- ovn/controller/lflow.c| 1 + ovn/controller/lflow.h| 2 +- ovn/controller/pinctrl.c | 38 +++- ovn/lib/actions.c | 188 ++--- ovn/lib/automake.mk | 1 - ovn/lib/expr.c| 17 +- ovn/lib/logical-fields.c | 36 +++- ovn/northd/ovn-northd.8.xml | 83 +++- ovn/northd/ovn-northd.c | 91 - ovn/ovn-sb.xml| 32 +++ ovn/utilities/ovn-trace.c | 9 +- tests/ovn.at | 193 +- tests/test-ovn.c | 2 +- 30 files changed, 1072 insertions(+), 63 deletions(-) rename {ovn/lib => include/ovn}/logical-fields.h (71%) -- 2.20.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] conntrack: Check all addresses for ephemeral ports.
On Thu, Jan 03, 2019 at 10:11:29AM -0500, Aaron Conole wrote: > Aaron Conole writes: > > > Ben Pfaff writes: > > > >> On Thu, Dec 20, 2018 at 01:22:18PM -0500, Aaron Conole wrote: > >>> Kevin Traynor writes: > >>> > >>> > On 12/19/2018 08:23 AM, Darrell Ball wrote: > >>> >> On Tue, Dec 18, 2018 at 6:57 PM 0-day Robot wrote: > >>> >> > >>> >>> Bleep bloop. Greetings Darrell Ball, I am a robot and I have tried > >>> >>> out > >>> >>> your patch. > >>> >>> Thanks for your contribution. > >>> >>> > >>> >>> I encountered some error that I wasn't expecting. See the details > >>> >>> below. > >>> >>> > >>> >>> > >>> >>> checkpatch: > >>> >>> ERROR: Too many signoffs; are you missing Co-authored-by lines? > >>> >>> Lines checked: 37, Warnings: 0, Errors: 1 > >>> >>> > >>> >> > >>> >> I don't understand this complaint. > >>> >> > >>> > > >>> > This is a false positive. I've seen patchwork duplicate signed-off-by's > >>> > in an mbox before (reported to Stephen, > >>> > https://github.com/getpatchwork/patchwork/issues/219) which would cause > >>> > this error, but downloading this mbox locally it seems ok. > >>> > >>> It's a false positive, but not for that reason. The bot keeps up to date > >>> with the most recent checkpatch, so it stamps a sign-off as part of the > >>> delivery chain. The older branch checkpatch version doesn't understand > >>> that. > >>> > >>> We probably should either backport 3267343a8487 ("checkpatch: Improve > >>> accuracy and specificity of sign-off checking.") to the relevant > >>> branches so that when someone submits a patch on the branch it's > >>> checked, -OR- improve the robot to just save off the latest checkpatch > >>> version before starting to apply patches. I like the idea of the former > >>> so that checkpatch changes can self-check, but it comes with a drawback > >>> (like checkpatch changes won't be invoked until after they're applied to > >>> the tree .. I guess it isn't such a big deal, though). > >> > >> Hmm. I'm inclined to suggest that the robot should always use the > >> latest checkpatch regardless of branch. Otherwise we'll have to > >> make a policy of backporting checkpatch updates, and I'm not really in > >> favor of that. > > > > Makes sense. I'll work on it when I'm back from my PTO. > > So, I'm modifying the bot - just before it switches branches, it will > save off the latest checkpatch. That lets new changes on master be > invoked, and backports will use the latest checkpatch versions. I'll > hook it up today. > > Sound reasonable? Sure, sounds reasonable to me. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Stopwatch module test failure on FreeBSD.
0-day Robot writes: > Bleep bloop. Greetings Ilya Maximets, I am a robot and I have tried out your > patch. > Thanks for your contribution. > > I encountered some error that I wasn't expecting. See the details below. > > > git-am: > fatal: sha1 information is lacking or useless > (tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/85/stdout). > Repository lacks necessary blobs to fall back on 3-way merge. > Cannot fall back to three-way merge. > Patch failed at 0001 Stopwatch module test failure on FreeBSD. > The copy of the patch that failed is found in: > > /var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch > When you have resolved this problem, run "git am --resolved". > If you prefer to skip this patch, run "git am --skip" instead. > To restore the original branch and stop patching, run "git am --abort". This is a cool corner case. Patchwork detected this as a patch (probably because there is a diff line in the email body), and that triggered the bot. Cool find! I'm going to try and make a change to filter these kinds of things out. Sorry for the spam. > > Please check this out. If you feel there has been an error, please > email acon...@bytheb.org > > Thanks, > 0-day Robot > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] conntrack: Check all addresses for ephemeral ports.
Aaron Conole writes: > Ben Pfaff writes: > >> On Thu, Dec 20, 2018 at 01:22:18PM -0500, Aaron Conole wrote: >>> Kevin Traynor writes: >>> >>> > On 12/19/2018 08:23 AM, Darrell Ball wrote: >>> >> On Tue, Dec 18, 2018 at 6:57 PM 0-day Robot wrote: >>> >> >>> >>> Bleep bloop. Greetings Darrell Ball, I am a robot and I have tried out >>> >>> your patch. >>> >>> Thanks for your contribution. >>> >>> >>> >>> I encountered some error that I wasn't expecting. See the details >>> >>> below. >>> >>> >>> >>> >>> >>> checkpatch: >>> >>> ERROR: Too many signoffs; are you missing Co-authored-by lines? >>> >>> Lines checked: 37, Warnings: 0, Errors: 1 >>> >>> >>> >> >>> >> I don't understand this complaint. >>> >> >>> > >>> > This is a false positive. I've seen patchwork duplicate signed-off-by's >>> > in an mbox before (reported to Stephen, >>> > https://github.com/getpatchwork/patchwork/issues/219) which would cause >>> > this error, but downloading this mbox locally it seems ok. >>> >>> It's a false positive, but not for that reason. The bot keeps up to date >>> with the most recent checkpatch, so it stamps a sign-off as part of the >>> delivery chain. The older branch checkpatch version doesn't understand >>> that. >>> >>> We probably should either backport 3267343a8487 ("checkpatch: Improve >>> accuracy and specificity of sign-off checking.") to the relevant >>> branches so that when someone submits a patch on the branch it's >>> checked, -OR- improve the robot to just save off the latest checkpatch >>> version before starting to apply patches. I like the idea of the former >>> so that checkpatch changes can self-check, but it comes with a drawback >>> (like checkpatch changes won't be invoked until after they're applied to >>> the tree .. I guess it isn't such a big deal, though). >> >> Hmm. I'm inclined to suggest that the robot should always use the >> latest checkpatch regardless of branch. Otherwise we'll have to >> make a policy of backporting checkpatch updates, and I'm not really in >> favor of that. > > Makes sense. I'll work on it when I'm back from my PTO. So, I'm modifying the bot - just before it switches branches, it will save off the latest checkpatch. That lets new changes on master be invoked, and backports will use the latest checkpatch versions. I'll hook it up today. Sound reasonable? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Adding support for PMD auto load balancing
Nitin Katiyar writes: > Port rx queues that have not been statically assigned to PMDs are currently > assigned based on periodically sampled load measurements. > The assignment is performed at specific instances – port addition, port > deletion, upon reassignment request via CLI etc. > > Due to change in traffic pattern over time it can cause uneven load among > the PMDs and thus resulting in lower overall throughout. > > This patch enables the support of auto load balancing of PMDs based on > measured load of RX queues. Each PMD measures the processing load for each > of its associated queues every 10 seconds. If the aggregated PMD load reaches > 95% for 6 consecutive intervals then PMD considers itself to be overloaded. > > If any PMD is overloaded, a dry-run of the PMD assignment algorithm is > performed by OVS main thread. The dry-run does NOT change the existing > queue to PMD assignments. > > If the resultant mapping of dry-run indicates an improved distribution > of the load then the actual reassignment will be performed. > > The automatic rebalancing will be disabled by default and has to be > enabled via configuration option. The interval (in minutes) between > two consecutive rebalancing can also be configured via CLI, default > is 1 min. > > Following example commands can be used to set the auto-lb params: > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true" > ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5" > > Co-authored-by: Rohith Basavaraja > Co-authored-by: Venkatesan Pradeep > Signed-off-by: Rohith Basavaraja > Signed-off-by: Venkatesan Pradeep > Signed-off-by: Nitin Katiyar > --- > lib/dpif-netdev.c| 403 > +-- > vswitchd/vswitch.xml | 30 > 2 files changed, 424 insertions(+), 9 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 1564db9..8db106f 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -80,6 +80,12 @@ > > VLOG_DEFINE_THIS_MODULE(dpif_netdev); > > +/* Auto Load Balancing Defaults */ > +#define ACCEPT_IMPROVE_DEFAULT (25) > +#define PMD_LOAD_THRE_DEFAULT(95) > +#define PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */ > +#define MIN_TO_MSEC 6 > + > #define FLOW_DUMP_MAX_BATCH 50 > /* Use per thread recirc_depth to prevent recirculation loop. */ > #define MAX_RECIRC_DEPTH 6 > @@ -288,6 +294,13 @@ struct dp_meter { > struct dp_meter_band bands[]; > }; > > +struct pmd_auto_lb { > +bool auto_lb_conf;/* enable-disable auto load balancing */ > +bool is_enabled; /* auto_lb current status */ > +uint64_t rebalance_intvl; > +uint64_t rebalance_poll_timer; > +}; > + > /* Datapath based on the network device interface from netdev.h. > * > * > @@ -368,6 +381,7 @@ struct dp_netdev { > uint64_t last_tnl_conf_seq; > > struct conntrack conntrack; > +struct pmd_auto_lb pmd_alb; > }; > > static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id) > @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread { > struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. > */ > /* List of rx queues to poll. */ > struct hmap poll_list OVS_GUARDED; > + > /* Map of 'tx_port's used for transmission. Written by the main thread, > * read by the pmd thread. */ > struct hmap tx_ports OVS_GUARDED; > @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread { > /* Keep track of detailed PMD performance statistics. */ > struct pmd_perf_stats perf_stats; > > +/* Some stats from previous iteration used by automatic pmd > + load balance logic. */ > +uint64_t prev_stats[PMD_N_STATS]; > +atomic_count pmd_overloaded; > + > /* Set to true if the pmd thread needs to be reloaded. */ > bool need_reload; > }; > @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, > enum rxq_cycles_counter_type type); > static void > dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx, > - unsigned long long cycles); > +unsigned long long cycles, > +unsigned idx); > static uint64_t > -dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx); > +dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, > +unsigned idx); > static void > dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, > bool purge); > @@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op > **ops, size_t n_ops, > } > } > > +/* Enable/Disable PMD auto load balancing */ > +static void > +set_pmd_auto_lb(struct dp_netdev *dp) > +{ > +unsigned int cnt = 0; > +struct dp_netdev_pmd_thread *pmd; > +struct pmd_auto_lb * pmd_alb = >pmd_alb; > + > +bool enable = false; > +bool
[ovs-dev] [PATCH v2] Adding support for PMD auto load balancing
Port rx queues that have not been statically assigned to PMDs are currently assigned based on periodically sampled load measurements. The assignment is performed at specific instances – port addition, port deletion, upon reassignment request via CLI etc. Due to change in traffic pattern over time it can cause uneven load among the PMDs and thus resulting in lower overall throughout. This patch enables the support of auto load balancing of PMDs based on measured load of RX queues. Each PMD measures the processing load for each of its associated queues every 10 seconds. If the aggregated PMD load reaches 95% for 6 consecutive intervals then PMD considers itself to be overloaded. If any PMD is overloaded, a dry-run of the PMD assignment algorithm is performed by OVS main thread. The dry-run does NOT change the existing queue to PMD assignments. If the resultant mapping of dry-run indicates an improved distribution of the load then the actual reassignment will be performed. The automatic rebalancing will be disabled by default and has to be enabled via configuration option. The interval (in minutes) between two consecutive rebalancing can also be configured via CLI, default is 1 min. Following example commands can be used to set the auto-lb params: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true" ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5" Co-authored-by: Rohith Basavaraja Co-authored-by: Venkatesan Pradeep Signed-off-by: Rohith Basavaraja Signed-off-by: Venkatesan Pradeep Signed-off-by: Nitin Katiyar --- lib/dpif-netdev.c| 403 +-- vswitchd/vswitch.xml | 30 2 files changed, 424 insertions(+), 9 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1564db9..8db106f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -80,6 +80,12 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); +/* Auto Load Balancing Defaults */ +#define ACCEPT_IMPROVE_DEFAULT (25) +#define PMD_LOAD_THRE_DEFAULT(95) +#define PMD_REBALANCE_POLL_INTERVAL 1 /* 1 Min */ +#define MIN_TO_MSEC 6 + #define FLOW_DUMP_MAX_BATCH 50 /* Use per thread recirc_depth to prevent recirculation loop. */ #define MAX_RECIRC_DEPTH 6 @@ -288,6 +294,13 @@ struct dp_meter { struct dp_meter_band bands[]; }; +struct pmd_auto_lb { +bool auto_lb_conf;/* enable-disable auto load balancing */ +bool is_enabled; /* auto_lb current status */ +uint64_t rebalance_intvl; +uint64_t rebalance_poll_timer; +}; + /* Datapath based on the network device interface from netdev.h. * * @@ -368,6 +381,7 @@ struct dp_netdev { uint64_t last_tnl_conf_seq; struct conntrack conntrack; +struct pmd_auto_lb pmd_alb; }; static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id) @@ -682,6 +696,7 @@ struct dp_netdev_pmd_thread { struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll. */ struct hmap poll_list OVS_GUARDED; + /* Map of 'tx_port's used for transmission. Written by the main thread, * read by the pmd thread. */ struct hmap tx_ports OVS_GUARDED; @@ -702,6 +717,11 @@ struct dp_netdev_pmd_thread { /* Keep track of detailed PMD performance statistics. */ struct pmd_perf_stats perf_stats; +/* Some stats from previous iteration used by automatic pmd + load balance logic. */ +uint64_t prev_stats[PMD_N_STATS]; +atomic_count pmd_overloaded; + /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; }; @@ -792,9 +812,11 @@ dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, enum rxq_cycles_counter_type type); static void dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx, - unsigned long long cycles); +unsigned long long cycles, +unsigned idx); static uint64_t -dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx); +dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, +unsigned idx); static void dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, bool purge); @@ -3734,6 +3756,51 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops, } } +/* Enable/Disable PMD auto load balancing */ +static void +set_pmd_auto_lb(struct dp_netdev *dp) +{ +unsigned int cnt = 0; +struct dp_netdev_pmd_thread *pmd; +struct pmd_auto_lb * pmd_alb = >pmd_alb; + +bool enable = false; +bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc; + +/* Ensure that there is at least 2 non-isolated PMDs and + * one of the PMD is polling more than one rxq + */ +CMAP_FOR_EACH (pmd, node, >poll_threads) { +if (pmd->core_id == NON_PMD_CORE_ID ||
Re: [ovs-dev] OVS seems to not release memory on ARP bursts
Oh and I forgot to mention that we are seeing this issue on OVS versions 2.5.0 and 2.5.2. thanks Ani On Jan 3, 2019, 1:10 PM +0530, Ani Sinha , wrote: Hi Ben: In order to reproduce, please put the node in a network with flood of ARP broadcast packets, for example in an environment where machines are performing peer to peer upgrades at scheduled intervals. If you have a packet generator, you can generate ARP broadcast packets with random source MACs and then flood the network with these packets with varying transmission rates. If you see the RSS of OVS increasing monotonically, you have recreated the issue. If the host has OOM killer enabled, it should eventually kill the OVS daemon. The confusing part is that the OVS never seems to free the memory once the slow path packets have been processed. That is why the RSS in allocated huge pages keeps increasing. We are not sure why this happens. Let me know if this helps. Happy New Year. Thanks Ani On Dec 28, 2018, 11:21 PM +0530, Ben Pfaff , wrote: On Fri, Dec 28, 2018 at 07:30:52AM +, Ani Sinha wrote: We are performing an experiment based on our observed behavior on some of our systems. We are using a python packet generator called scapy to generate arp broadcast packet bursts with random MACs. What we are observing is that within a period of 20 sec to 30 sec, the upcall flow count reaches to about 30K with about 1.6G memory consumption by OVS at which point the kernel OOM killer kicks in and kills the OVS daemon. We are wondering if this is an expected behavior and if it is, whether there is a setting to limit the size of internal data structures used by OVS for ARP packets or processing slow path ARP packets (I have not looked into the code and done any analysis). It's not expected. How do we reproduce it? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev