Re: [ovs-dev] [Suspected-Phishing] [PATCH v8 0/6] OVS-DPDK flow offload with rte_flow

2018-04-09 Thread Shahaf Shuler
Tuesday, March 27, 2018 10:55 AM, Shahaf Shuler:

Hi, 

Any comments on this version?

> 
> Hi,
> 
> Here is a joint work from Mellanox and Napatech, to enable the flow hw
> offload with the DPDK generic flow interface (rte_flow).
> 
> The basic idea is to associate the flow with a mark id (a unit32_t number).
> Later, we then get the flow directly from the mark id, which could bypass
> some heavy CPU operations, including but not limiting to mini flow extract,
> emc lookup, dpcls lookup, etc.
> 
> The association is done with CMAP in patch 1. The CPU workload bypassing is
> done in patch 2. The flow offload is done in patch 3, which mainly does two
> things:
> 
> - translate the ovs match to DPDK rte flow patterns
> - bind those patterns with a RSS + MARK action.
> 
> Patch 5 makes the offload work happen in another thread, for leaving the
> datapath as light as possible.
> 
> A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and 1
> million streams (tp_src=1000-1999, tp_dst=2000-2999) show more than 260%
> performance boost.
> 
> Note that it's disabled by default, which can be enabled by:
> 
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> 
> v8: - enhanced documentation with more details on supported protocols
> - fixed VLOG to start with capital letter
> - fixed compilation issues
> - fixed coding style
> - addressed the rest of Ian's comments
> 
> v7: - fixed wrong hash for mark_to_flow that has been refactored in v6
> - set the rss_conf for rss action to NULL, to workaround a mlx5 change
>   in DPDK v17.11. Note that it will obey the rss settings OVS-DPDK has
>   set in the beginning. Thus, nothing should be effected.
> 
> v6: - fixed a sparse warning
> - added documentation
> - used hash_int to compute mark to flow hash
> - added more comments
> - added lock for pot lookup
> - rebased on top of the latest code
> 
> v5: - fixed an issue that it took too long if we do flow add/remove
>   repeatedly.
> - removed an unused mutex lock
> - turned most of the log level to DBG
> - rebased on top of the latest code
> 
> v4: - use RSS action instead of QUEUE action with MARK
> - make it work with multiple queue (see patch 1)
> - rebased on top of latest code
> 
> v3: - The mark and id association is done with array instead of CMAP.
> - Added a thread to do hw offload operations
> - Removed macros completely
> - dropped the patch to set FDIR_CONF, which is a workround some
>   Intel NICs.
> - Added a debug patch to show all flow patterns we have created.
> - Misc fixes
> 
> v2: - workaround the queue action issue
> - fixed the tcp_flags being skipped issue, which also fixed the
>   build warnings
> - fixed l2 patterns for Intel nic
> - Converted some macros to functions
> - did not hardcode the max number of flow/action
> - rebased on top of the latest code
> 
> Thanks.
> 
> ---
> 
> Finn Christensen (1):
>   netdev-dpdk: implement flow offload with rte flow
> 
> Yuanhan Liu (5):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: add debug for rte flow patterns
>   dpif-netdev: do hw flow offload in a thread
>   Documentation: document ovs-dpdk flow offload
> 
>  Documentation/howto/dpdk.rst |  22 ++
>  NEWS |   3 +-
>  lib/dp-packet.h  |  13 +
>  lib/dpif-netdev.c| 497 -
>  lib/flow.c   | 155 ++--
>  lib/flow.h   |   1 +
>  lib/netdev-dpdk.c| 740
> +-
>  lib/netdev.h |   6 +
>  8 files changed, 1397 insertions(+), 40 deletions(-)
> 
> --
> 2.7.4
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fma
> il.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-
> dev&data=02%7C01%7Cshahafs%40mellanox.com%7C06b32d22c2e34de31af
> 508d593b82bd2%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C1%7C6365
> 77341596013800&sdata=FrFTVCkg6HzG%2FMaFYw9zsytUuQPXMzyr8z893Qp
> zYbc%3D&reserved=0
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev 1/2] tc: Make the actions order consistent

2018-04-09 Thread Chris Mi
When OVS DP passes the actions to TC library, we save all the
actions in data structure tc_flower and each action type has its
own field in tc_flower. So when TC library passes the actions to
kernel, actually the actions order is lost.

We add an actions array in tc_flower to keep the actions order
in this patch.

Issue: 1321102
Change-Id: If1d96eb3d1dc3b482667f456ddcddb17ac4c
Signed-off-by: Chris Mi 
Reviewed-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/netdev-tc-offloads.c | 182 +++
 lib/tc.c | 158 
 lib/tc.h |  57 +--
 3 files changed, 238 insertions(+), 159 deletions(-)

diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
index 6db76801f..101b2ffb1 100644
--- a/lib/netdev-tc-offloads.c
+++ b/lib/netdev-tc-offloads.c
@@ -393,18 +393,14 @@ parse_tc_flower_to_match(struct tc_flower *flower,
  struct match *match,
  struct nlattr **actions,
  struct dpif_flow_stats *stats,
- struct ofpbuf *buf) {
+ struct ofpbuf *buf)
+{
 size_t act_off;
 struct tc_flower_key *key = &flower->key;
 struct tc_flower_key *mask = &flower->mask;
 odp_port_t outport = 0;
-
-if (flower->ifindex_out) {
-outport = netdev_ifindex_to_odp_port(flower->ifindex_out);
-if (!outport) {
-return ENOENT;
-}
-}
+struct tc_action *action;
+int i;
 
 ofpbuf_clear(buf);
 
@@ -487,60 +483,71 @@ parse_tc_flower_to_match(struct tc_flower *flower,
 
 act_off = nl_msg_start_nested(buf, OVS_FLOW_ATTR_ACTIONS);
 {
-if (flower->vlan_pop) {
-nl_msg_put_flag(buf, OVS_ACTION_ATTR_POP_VLAN);
-}
-
-if (flower->vlan_push_id || flower->vlan_push_prio) {
-struct ovs_action_push_vlan *push;
-push = nl_msg_put_unspec_zero(buf, OVS_ACTION_ATTR_PUSH_VLAN,
-  sizeof *push);
-
-push->vlan_tpid = htons(ETH_TYPE_VLAN);
-push->vlan_tci = htons(flower->vlan_push_id
-   | (flower->vlan_push_prio << 13)
-   | VLAN_CFI);
-}
-
-if (flower->rewrite.rewrite) {
-parse_flower_rewrite_to_netlink_action(buf, flower);
-}
-
-if (flower->set.set) {
-size_t set_offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SET);
-size_t tunnel_offset =
-nl_msg_start_nested(buf, OVS_KEY_ATTR_TUNNEL);
-
-nl_msg_put_be64(buf, OVS_TUNNEL_KEY_ATTR_ID, flower->set.id);
-if (flower->set.ipv4.ipv4_src) {
-nl_msg_put_be32(buf, OVS_TUNNEL_KEY_ATTR_IPV4_SRC,
-flower->set.ipv4.ipv4_src);
-}
-if (flower->set.ipv4.ipv4_dst) {
-nl_msg_put_be32(buf, OVS_TUNNEL_KEY_ATTR_IPV4_DST,
-flower->set.ipv4.ipv4_dst);
+action = flower->actions;
+for (i = 0; i < flower->action_count; i++, action++) {
+switch (action->type) {
+case TC_ACT_VLAN_POP: {
+nl_msg_put_flag(buf, OVS_ACTION_ATTR_POP_VLAN);
 }
-if (!is_all_zeros(&flower->set.ipv6.ipv6_src,
-  sizeof flower->set.ipv6.ipv6_src)) {
-nl_msg_put_in6_addr(buf, OVS_TUNNEL_KEY_ATTR_IPV6_SRC,
-&flower->set.ipv6.ipv6_src);
+break;
+case TC_ACT_VLAN_PUSH: {
+struct ovs_action_push_vlan *push;
+
+push = nl_msg_put_unspec_zero(buf, OVS_ACTION_ATTR_PUSH_VLAN,
+  sizeof *push);
+push->vlan_tpid = htons(ETH_TYPE_VLAN);
+push->vlan_tci = htons(action->vlan.vlan_push_id
+   | (action->vlan.vlan_push_prio << 13)
+   | VLAN_CFI);
 }
-if (!is_all_zeros(&flower->set.ipv6.ipv6_dst,
-  sizeof flower->set.ipv6.ipv6_dst)) {
-nl_msg_put_in6_addr(buf, OVS_TUNNEL_KEY_ATTR_IPV6_DST,
-&flower->set.ipv6.ipv6_dst);
+break;
+case TC_ACT_PEDIT: {
+parse_flower_rewrite_to_netlink_action(buf, flower);
 }
-nl_msg_put_be16(buf, OVS_TUNNEL_KEY_ATTR_TP_DST,
-flower->set.tp_dst);
-
-nl_msg_end_nested(buf, tunnel_offset);
-nl_msg_end_nested(buf, set_offset);
-}
+break;
+case TC_ACT_ENCAP: {
+size_t set_offset = nl_msg_start_nested(buf, 
OVS_ACTION_ATTR_SET);
+size_t tunnel_offset =
+nl_msg_start_nested(b

[ovs-dev] [ovs-dev 2/2] netdev-tc-offloads: Add offloading of multiple outputs

2018-04-09 Thread Chris Mi
Currently, we support offloading of one output port. Remove that
limitation by use of mirred mirror action for all output ports,
except that the last one is mirred redirect action.

Issue: 1321102
Change-Id: I65f0d0b6b4531c74b5550ad88a19a441027dc8f6
Signed-off-by: Chris Mi 
Reviewed-by: Paul Blakey 
Reviewed-by: Roi Dayan 
---
 lib/dpif-netlink.c | 10 +-
 lib/tc.c   | 15 +++
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 17be3ddde..bb9e95df7 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2081,7 +2081,6 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 odp_port_t in_port;
 const struct nlattr *nla;
 size_t left;
-int outputs = 0;
 struct netdev *dev;
 struct offload_info info;
 ovs_be16 dst_port = 0;
@@ -2108,20 +2107,13 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 return EOPNOTSUPP;
 }
 
-/* Get tunnel dst port and count outputs */
+/* Get tunnel dst port */
 NL_ATTR_FOR_EACH(nla, left, put->actions, put->actions_len) {
 if (nl_attr_type(nla) == OVS_ACTION_ATTR_OUTPUT) {
 const struct netdev_tunnel_config *tnl_cfg;
 struct netdev *outdev;
 odp_port_t out_port;
 
-outputs++;
-if (outputs > 1) {
-VLOG_DBG_RL(&rl, "offloading multiple ports isn't supported");
-err = EOPNOTSUPP;
-goto out;
-}
-
 out_port = nl_attr_get_odp_port(nla);
 outdev = netdev_ports_get(out_port, dpif_class);
 if (!outdev) {
diff --git a/lib/tc.c b/lib/tc.c
index aeec94ac7..3fc390275 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1209,15 +1209,16 @@ nl_msg_put_act_drop(struct ofpbuf *request)
 }
 
 static void
-nl_msg_put_act_redirect(struct ofpbuf *request, int ifindex)
+nl_msg_put_act_mirred(struct ofpbuf *request, int ifindex, int action,
+  int eaction)
 {
 size_t offset;
 
 nl_msg_put_string(request, TCA_ACT_KIND, "mirred");
 offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
 {
-struct tc_mirred m = { .action = TC_ACT_STOLEN,
-   .eaction = TCA_EGRESS_REDIR,
+struct tc_mirred m = { .action = action,
+   .eaction = eaction,
.ifindex = ifindex };
 
 nl_msg_put_unspec(request, TCA_MIRRED_PARMS, &m, sizeof m);
@@ -1453,7 +1454,13 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
 return EINVAL;
 }
 act_offset = nl_msg_start_nested(request, act_index++);
-nl_msg_put_act_redirect(request, ifindex);
+if (i == flower->action_count - 1) {
+nl_msg_put_act_mirred(request, ifindex, TC_ACT_STOLEN,
+  TCA_EGRESS_REDIR);
+} else {
+nl_msg_put_act_mirred(request, ifindex, TC_ACT_PIPE,
+  TCA_EGRESS_MIRROR);
+}
 nl_msg_put_act_cookie(request, &flower->act_cookie);
 nl_msg_end_nested(request, act_offset);
 }
-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev 0/2] Add offloading of multiple outputs

2018-04-09 Thread Chris Mi
This patchset adds the offloading support of multiple outputs.

The first patch makes the actions order consistent. In previous
implementation, the actions order is lost when offloading. If there
is only one output, there is on problem. But if there are multiple
outputs, some ports should see the packets before some actions and
some ports should see the packets after some actions. So we should
keep the actions order that OVS DP passes to TC library to make sure
the right ports see the right packets.

The second patch is straighforward that we remove the one output
limitation. If the output is the last action, we use TC mirred
redirect. Otherwise, we use TC mirred mirror.


Chris Mi (2):
  tc: Make the actions order consistent
  netdev-tc-offloads: Add offloading of multiple outputs

 lib/dpif-netlink.c   |  10 +--
 lib/netdev-tc-offloads.c | 182 +++
 lib/tc.c | 171 +++-
 lib/tc.h |  57 +--
 4 files changed, 249 insertions(+), 171 deletions(-)

-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH] 1/1 Add multi-column index support for the Python IDL

2018-04-09 Thread twilson
From: Terry Wilson 

This adds multi-column index support for the Python IDL that is
similar to the feature in the C IDL.

Signed-off-by: Terry Wilson 
---
 python/automake.mk|   1 +
 python/ovs/db/custom_index.py | 151 ++
 python/ovs/db/idl.py  |  55 ---
 tests/test-ovsdb.py   |   7 +-
 4 files changed, 199 insertions(+), 15 deletions(-)
 create mode 100644 python/ovs/db/custom_index.py

diff --git a/python/automake.mk b/python/automake.mk
index 9b5d3d8..583a4e9 100644
--- a/python/automake.mk
+++ b/python/automake.mk
@@ -13,6 +13,7 @@ ovs_pyfiles = \
python/ovs/daemon.py \
python/ovs/fcntl_win.py \
python/ovs/db/__init__.py \
+   python/ovs/db/custom_index.py \
python/ovs/db/data.py \
python/ovs/db/error.py \
python/ovs/db/idl.py \
diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py
new file mode 100644
index 000..878cf37
--- /dev/null
+++ b/python/ovs/db/custom_index.py
@@ -0,0 +1,151 @@
+import collections
+import functools
+import operator
+try:
+from UserDict import IterableUserDict as DictBase
+except ImportError:
+from collections import UserDict as DictBase
+
+import sortedcontainers
+
+from ovs.db import data
+
+OVSDB_INDEX_ASC = "ASC"
+OVSDB_INDEX_DESC = "DESC"
+ColumnIndex = collections.namedtuple('ColumnIndex',
+ ['column', 'direction', 'key'])
+
+
+class MultiColumnIndex(object):
+def __init__(self, name):
+self.name = name
+self.columns = []
+self.clear()
+
+def __repr__(self):
+return "{}(name={})".format(self.__class__.__name__, self.name)
+
+def __str__(self):
+return repr(self) + " columns={} values={}".format(
+self.columns, [str(v) for v in self.values])
+
+def add_column(self, column, direction=OVSDB_INDEX_ASC, key=None):
+self.columns.append(ColumnIndex(column, direction,
+ key or operator.attrgetter(column)))
+
+def add_columns(self, *columns):
+self.columns.extend(ColumnIndex(col, OVSDB_INDEX_ASC,
+operator.attrgetter(col))
+for col in columns)
+
+def _cmp(self, a, b):
+for col, direction, key in self.columns:
+aval, bval = key(a), key(b)
+if aval == bval:
+continue
+result = (aval > bval) - (aval < bval)
+return result if direction == OVSDB_INDEX_ASC else -result
+return 0
+
+def index_entry_from_row(self, row):
+return row._table.rows.IndexEntry(
+uuid=row.uuid,
+**{c.column: getattr(row, c.column) for c in self.columns})
+
+def add(self, row):
+if not all(hasattr(row, col.column) for col in self.columns):
+# This is a new row, but it hasn't had the necessary columns set
+# We'll add it later
+return
+self.values.add(self.index_entry_from_row(row))
+
+def remove(self, row):
+self.values.remove(self.index_entry_from_row(row))
+
+def clear(self):
+self.values = sortedcontainers.SortedListWithKey(
+key=functools.cmp_to_key(self._cmp))
+
+def irange(self, start, end):
+return iter(r._table.rows[r.uuid]
+for r in self.values.irange(start, end))
+
+def __iter__(self):
+return iter(r._table.rows[r.uuid] for r in self.values)
+
+
+class IndexedRows(DictBase, object):
+def __init__(self, table, *args, **kwargs):
+super(IndexedRows, self).__init__(*args, **kwargs)
+self.table = table
+self.indexes = {}
+self.IndexEntry = IndexEntryClass(table)
+
+def index_create(self, name):
+if name in self.indexes:
+raise ValueError("An index named {} already exists".format(name))
+index = self.indexes[name] = MultiColumnIndex(name)
+return index
+
+def __setitem__(self, key, item):
+self.data[key] = item
+for index in self.indexes.values():
+index.add(item)
+
+def __delitem__(self, key):
+val = self.data[key]
+del self.data[key]
+for index in self.indexes.values():
+index.remove(val)
+
+def clear(self):
+self.data.clear()
+for index in self.indexes.values():
+index.clear()
+
+# Nothing uses the methods below, though they'd be easy to implement
+def update(self, dict=None, **kwargs):
+raise NotImplementedError()
+
+def setdefault(self, key, failobj=None):
+raise NotImplementedError()
+
+def pop(self, key, *args):
+raise NotImplementedError()
+
+def popitem(self):
+raise NotImplementedError()
+
+@classmethod
+def fromkeys(cls, iterable, value=None):
+raise NotImplementedError()
+
+
+def IndexEntryClass(table):
+"""Create a cla

[ovs-dev] [RFC] [PATCH] 0/1 Add multi-column index support for the Python IDL

2018-04-09 Thread twilson
From: Terry Wilson 

This adds a Python version of the C IDL's multi-column indexes. I tried
to make this as non-invasive as possible by just replacing idl.table.rows
with a custom dict-like object that updates the index when adding or
removing rows, and modifying Row.__setattr__ to update indexes when an
Row has an indexed column modified.

Some additional changes had to be made since Row.__getattr__ will raise
an AssertionError if ._changes == None, which happens when delete() is
called on a row. Also, when a txn is disassembled after a commit,
deleted rows are added back to idl.table.rows with ._changes == None.
Mostly, I just had to reorder when ._changes was set to None.

All tests pass in-tree both before changing the simple3 table tests to
use indexes and after. Also, functional tests in the ovsdbapp project
pass when it is converted to use indexes.

The performance difference on adding 1000 ports via ovsdbapp (with
may_exist=True which forces a lookup by name) takes the time from around
18 seconds to 9 without using the C json extension, and from 10 seconds
to 5 with the extension.

The patch does use an external library, 'sortedcontainers', which is
packaged by Ubuntu and Fedora (but currently isn't packaged in RHEL). It
is a very mature, pure-Ptyhon library that is also licensed Apache 2.0.
I didn't want to write something from scratch for a PoC if something
good existed, and it did. :)

I'd like some comments on whether this is a good direction, whether
anything in the code looks dumb, if we are ok adding an external
dependency (I could always update the patch to work just without the
indexinng feature if sortedcontainers didn't exist, or licensing would
permit us to just take the parts we want in tree--but I'd hate to do
that). If so, I can add some more in-tree testing for indexes with
multiple columns, etc. and fill in some missing (though unused) dict
methods.

Terry Wilson (1):
  Add multi-column index support for the Python IDL

 python/automake.mk|   1 +
 python/ovs/db/custom_index.py | 151 ++
 python/ovs/db/idl.py  |  55 ---
 tests/test-ovsdb.py   |   7 +-
 4 files changed, 199 insertions(+), 15 deletions(-)
 create mode 100644 python/ovs/db/custom_index.py

-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Facilitar la educativa como padre o docente

2018-04-09 Thread Desarrolle mejor la inteligencia de los niños
 
 

 
Cómo preparar excelentes lectores desde la primera infancia 
Marzo 27 - webinar Interactivo

Objetivo:

En esta capacitación veremos los detalles de una metodología que está 
revolucionando la manera de enseñar a leer y que está soportada 
científicamente. Le daremos los detalles para que empiece a preparar lectores 
de alto rendimiento. 

8 razones para asistir a este webinar :

- Porque el mundo está cambiando y la educación no se puede quedar atrás.
- Para ser protagonista de los cambios en la educación.
- Facilitar la labor educativa como padre o docente.
- Ayudar a mejorar a nuestra sociedad.
- Poder ayudar a una generación de niños que necesitan hacer de la lectura su 
principal herramienta de formación.
- Afrontar de mejor manera la educación de sus hijos. 
- Saber cómo desarrollar mejor la inteligencia de los niños.
 - Una buena educación es la mejor manera de construir la paz. 
 
 
Temario e Inscripciones:

Respondiendo por este medio "Lectura"+TELÉFONO + NOMBRE o marcando al:

045 + 5515546630  



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] raft ovsdb clustering with scale test

2018-04-09 Thread aginwala
Further Analysis indicates that cluster is not able to select a new leader
and hence every node keeps retrying:

ovn-northd logs around the same time on all three nodes complains about
disconnection from cluster and failing to choose leader


#Node1
2018-04-06T01:19:06.481Z|00466|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
2018-04-06T01:19:07.668Z|00467|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
2018-04-06T01:19:07.671Z|00468|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
clustered database server is not cluster leader; trying another server
2018-04-06T01:19:15.673Z|00469|reconnect|INFO|tcp:192.168.220.103:6642:
connected
2018-04-06T01:19:15.673Z|00470|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.

#Node2
2018-04-06T01:19:58.249Z|00487|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
2018-04-06T01:19:58.255Z|00488|ovsdb_idl|INFO|tcp:192.168.220.103:6642:
clustered database server is disconnected from cluster; trying another
server
2018-04-06T01:20:09.261Z|00489|reconnect|INFO|tcp:192.168.220.102:6642:
connected
2018-04-06T01:20:09.261Z|00490|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.


#Node3
2018-04-06T01:19:01.654Z|00964|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
2018-04-06T01:19:01.711Z|00965|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
clustered database server is disconnected frr
om cluster; trying another server
2018-04-06T01:19:09.715Z|00966|reconnect|INFO|tcp:192.168.220.103:6642:
connected
2018-04-06T01:19:09.716Z|00967|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.



Regards,
Aliasgar

On Fri, Apr 6, 2018 at 2:16 PM, aginwala  wrote:

> Hi Ben/Numan:
>
> So I went ahead to try out clustered db in scale env and results are not
> as expected.
> OVS branch used is master; commit-id(2062840612904ff0873d46b2f4f226
> bc23f2296d)
>
> Setup is uing 3 nodes.
>
> Also disabled inactivity_probe,
> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:
> 192.168.220.103:6641" set NB . external_ids:inactivity_probe=0
> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:
> 192.168.220.103:6642" set SB . external_ids:inactivity_probe=0
>
>
> 1. With wait_up = true for 10k lports and 1k HVs which internally uses
> wait-until, it was able to create around 1k lports and rally exited due db
> connection error.
> Cause of failure: database connection failed is @
> https://raw.githubusercontent.com/noah8713/ovn-scale-test/
> e398ccdd77b5c0bfed3d1cfe37c7e7ac5d8d8f81/results/output_raft_fail.html
>
> Some data from sb db logs are as below
> 2018-04-05T06:29:32.628Z|00065|poll_loop|INFO|wakeup due to 9-ms timeout
> at lib/reconnect.c:643 (90% CPU usage)
> 2018-04-05T06:30:19.432Z|00066|raft|INFO|current entry eid
> 23a85453-70a1-49ae-bf8f-22a1eeda6a60 does not match prerequisite
> 966d3234-961a-4dc5-b2ad-4f12766fd610 in execute_command_request
> 2018-04-05T06:31:03.428Z|00067|raft|INFO|current entry eid
> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> 2018-04-05T06:31:03.469Z|00068|raft|INFO|current entry eid
> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> 2018-04-05T06:31:03.492Z|00069|raft|INFO|current entry eid
> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> 2018-04-05T06:31:03.505Z|00070|raft|INFO|current entry eid
> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> 2018-04-05T06:31:03.510Z|00071|raft|INFO|current entry eid
> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> 2018-04-05T06:32:04.056Z|00072|memory|INFO|peak resident set size grew
> 51% in last 35031.7 seconds, from 24748 kB to 37332 kB
> 2018-04-05T06:32:04.056Z|00073|memory|INFO|cells:59908 json-caches:1
> monitors:3 sessions:431
> 2018-04-05T06:32:27.956Z|00074|raft|WARN|ignoring vote request received
> as leader
> 2018-04-05T06:32:27.958Z|00075|timeval|WARN|Unreasonably long 1278ms poll
> interval (1044ms user, 232ms system)
> 2018-04-05T06:32:27.958Z|00076|timeval|WARN|faults: 61669 minor, 0 major
> 2018-04-05T06:32:27.958Z|00077|timeval|WARN|context switches: 0
> voluntary, 3 involuntary
> 2018-04-05T06:32:27.958Z|00078|coverage|INFO|Event coverage, avg rate
> over last: 5 seconds, last minute, last hour,  hash=1dd2af20:
> 2018-04-05T06:32:27.959Z|00079|coverage|INFO|hmap_pathological
>  25.0/sec24.633/sec1.5550/sec   total: 19274
> 2018-04-05T06:32:27.959Z|00080|coverage|INFO|hmap_expand
> 4202.6/sec  4041.417/sec  578.9233/sec   to

[ovs-dev] [PATCH] netdev-dpdk: fix MAC address in port addr example

2018-04-09 Thread Marcelo Ricardo Leitner
The MAC address is always 6-bytes long, never 7. The extra :01 and :02
doesn't belong in there as it doesn't mean selecting one port or
another.

Instead, use an incrementing MAC address, which is what usually happens
on such cards.

See-also: http://www.dpdk.org/ml/archives/dev/2018-April/094976.html
Fixes: 5e7588186839 ("netdev-dpdk: fix port addition for ports sharing same PCI 
id")
Signed-off-by: Marcelo Ricardo Leitner 
---
 Documentation/howto/dpdk.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 
79b626c76d0dd45381bd75ab867b7815ca941208..69e692f40d500cf65d59d1979e07afa6f99cf903
 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -53,9 +53,9 @@ with multiple ports. Using a PCI device like above won't 
work. Instead, below
 usage is suggested::
 
 $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
-options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:01"
+options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
 $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
-options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:02"
+options:dpdk-devargs="class=eth,mac=00:11:22:33:44:56"
 
 Note: such syntax won't support hotplug. The hotplug is supposed to work with
 future DPDK release, v18.05.
-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] stopwatch: Fix Windows incompatibility

2018-04-09 Thread Mark Michelson
Stopwatch was implemented using a Unix-only pipe structure. This commit
changes to using a guarded list and latch in order to pass data between
threads.

Signed-off-by: Mark Michelson 
---
 lib/stopwatch.c | 92 +
 1 file changed, 54 insertions(+), 38 deletions(-)

diff --git a/lib/stopwatch.c b/lib/stopwatch.c
index 2a4124840..bd40434af 100644
--- a/lib/stopwatch.c
+++ b/lib/stopwatch.c
@@ -25,6 +25,8 @@
 #include 
 #include "socket-util.h"
 #include "util.h"
+#include "latch.h"
+#include "guarded-list.h"
 
 VLOG_DEFINE_THIS_MODULE(stopwatch);
 
@@ -73,6 +75,7 @@ enum stopwatch_op {
 };
 
 struct stopwatch_packet {
+struct ovs_list list_node;
 enum stopwatch_op op;
 char name[32];
 unsigned long long time;
@@ -82,7 +85,8 @@ static struct shash stopwatches = 
SHASH_INITIALIZER(&stopwatches);
 static struct ovs_mutex stopwatches_lock = OVS_MUTEX_INITIALIZER;
 static pthread_cond_t stopwatches_sync = PTHREAD_COND_INITIALIZER;
 
-static int stopwatch_pipe[2];
+static struct latch stopwatch_latch;
+static struct guarded_list stopwatch_commands;
 static pthread_t stopwatch_thread_id;
 
 static const char *unit_name[] = {
@@ -329,17 +333,33 @@ stopwatch_show(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 ds_destroy(&s);
 }
 
+static struct stopwatch_packet *
+stopwatch_packet_create(enum stopwatch_op op)
+{
+struct stopwatch_packet *pkt;
+
+pkt = xzalloc(sizeof *pkt);
+pkt->op = op;
+
+return pkt;
+}
+
+static void
+stopwatch_packet_write(struct stopwatch_packet *pkt)
+{
+guarded_list_push_back(&stopwatch_commands, &pkt->list_node, SIZE_MAX);
+latch_set(&stopwatch_latch);
+}
+
 static void
 stopwatch_reset(struct unixctl_conn *conn, int argc OVS_UNUSED,
 const char *argv[], void *aux OVS_UNUSED)
 {
-struct stopwatch_packet pkt = {
-.op = OP_RESET,
-};
+struct stopwatch_packet *pkt = stopwatch_packet_create(OP_RESET);
 if (argc > 1) {
-ovs_strlcpy(pkt.name, argv[1], sizeof pkt.name);
+ovs_strlcpy(pkt->name, argv[1], sizeof pkt->name);
 }
-ignore(write(stopwatch_pipe[1], &pkt, sizeof pkt));
+stopwatch_packet_write(pkt);
 unixctl_command_reply(conn, "");
 }
 
@@ -406,31 +426,34 @@ stopwatch_thread(void *ign OVS_UNUSED)
 bool should_exit = false;
 
 while (!should_exit) {
-struct stopwatch_packet pkt;
-while (read(stopwatch_pipe[0], &pkt, sizeof pkt) > 0) {
-ovs_mutex_lock(&stopwatches_lock);
-switch (pkt.op) {
+struct ovs_list command_list;
+struct stopwatch_packet *pkt;
+
+guarded_list_pop_all(&stopwatch_commands, &command_list);
+ovs_mutex_lock(&stopwatches_lock);
+LIST_FOR_EACH_POP (pkt, list_node, &command_list) {
+switch (pkt->op) {
 case OP_START_SAMPLE:
-stopwatch_start_sample_protected(&pkt);
+stopwatch_start_sample_protected(pkt);
 break;
 case OP_END_SAMPLE:
-stopwatch_end_sample_protected(&pkt);
+stopwatch_end_sample_protected(pkt);
 break;
 case OP_SYNC:
 xpthread_cond_signal(&stopwatches_sync);
 break;
 case OP_RESET:
-stopwatch_reset_protected(&pkt);
+stopwatch_reset_protected(pkt);
 break;
 case OP_SHUTDOWN:
 should_exit = true;
 break;
 }
-ovs_mutex_unlock(&stopwatches_lock);
 }
+ovs_mutex_unlock(&stopwatches_lock);
 
 if (!should_exit) {
-poll_fd_wait(stopwatch_pipe[0], POLLIN);
+latch_wait(&stopwatch_latch);
 poll_block();
 }
 }
@@ -442,11 +465,8 @@ static void
 stopwatch_exit(void)
 {
 struct shash_node *node, *node_next;
-struct stopwatch_packet pkt = {
-.op = OP_SHUTDOWN,
-};
-
-ignore(write(stopwatch_pipe[1], &pkt, sizeof pkt));
+struct stopwatch_packet *pkt = stopwatch_packet_create(OP_SHUTDOWN);
+stopwatch_packet_write(pkt);
 xpthread_join(stopwatch_thread_id, NULL);
 
 /* Process is exiting and we have joined the only
@@ -460,6 +480,8 @@ stopwatch_exit(void)
 }
 shash_destroy(&stopwatches);
 ovs_mutex_destroy(&stopwatches_lock);
+guarded_list_destroy(&stopwatch_commands);
+latch_destroy(&stopwatch_latch);
 }
 
 static void
@@ -469,7 +491,8 @@ do_init_stopwatch(void)
  stopwatch_show, NULL);
 unixctl_command_register("stopwatch/reset", "[NAME]", 0, 1,
  stopwatch_reset, NULL);
-xpipe_nonblocking(stopwatch_pipe);
+guarded_list_init(&stopwatch_commands);
+latch_init(&stopwatch_latch);
 stopwatch_thread_id = ovs_thread_create(
 "stopwatch", stopwatch_thread, NULL);
 atexit(stopwatch_exit);
@@ -503,34 +526,27 @@ stopwatch_cr

[ovs-dev] A few of your incoming messages has been suspended

2018-04-09 Thread Jasna Kofol
Your Messages Has Been Suspended By Microsoft Outlook Team

A few of your incoming messages has been suspended by microsoft verification 
Team because your email box account has not been properly verify. Do 
verify now to receive your suspended messages 
now.



Microsoft Outlook Team

Microsoft Outlook Copyright (c) 2018












___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 0/2] dpdk: minor refactor of the initialization step

2018-04-09 Thread Aaron Conole
"Mooney, Sean K"  writes:

> So just from a deployment tools point of view I would like to point out that
> This change could break existing workflow that deploy ovs in a docker 
> container.
> Kolla ansible assumes that if the docker ovs_vswitchd container is
> still running that the
> is infact running in dpdk mode when we set dpdk-init=true.

Is there a way to test this out and see the behavior?

It does seem strange that for a possible configuration error we abort()
running the vswitchd (and with --monitor set, it will continue to
abort() over and over - so I guess you're also not using the monitor
thread?).  In the case that an abort does happen, does the Kolla script
distinguish between issues where dpdk setup failed vs. some other
software issue?

> Can I request that if you make this change you add something along the lines 
> of
> dpdk-init-is-fatal=true/false so that we can explicitly say which behavior we 
> want.
> I would not be surprised if people have built monitoring around "is
> the ovs-vswitchd running"

I think they have, but I don't know that they use it to infer such
low-level details (meaning a crash implies that dpdk configuration is
wrong).

> To infer at least at a highlevel that "everything is fine" where as
> the log message/db field proposed
> Here will invalidate that.

I've added Assaf Mueller from our Open Stack team as well - maybe he has
some additional details on those mechanisms outside of Kolla (maybe it
exists in some kind of director / other software too, as you point out).

> it would be ease to check that field but its work that needs to be
> done in multiple places.

I think such a knob wouldn't be useful.  I believe it would either
have to be defaulted to 'dpdk-init-is-fatal=true' to abort on failure
(which most users would want to change making it an undesirable
default), or the Kolla ansible scripts (and other detection mechanisms
for dpdk failure - if they exist) would need to change.  Maybe there's
another approach, though?

>> -Original Message-
>> From: Aaron Conole [mailto:acon...@redhat.com]
>> Sent: Thursday, April 5, 2018 10:23 PM
>> To: d...@openvswitch.org
>> Cc: Stokes, Ian ; Kevin Traynor
>> ; Ilya Maximets ; Loftus,
>> Ciara ; Mooney, Sean K
>> ; Terry Wilson 
>> Subject: [RFC 0/2] dpdk: minor refactor of the initialization step
>> 
>> Sometimes, DPDK initialization can fail, but ovs-vswitchd will abort in
>> that case.  When that occurs, ovs-vswitchd will be restarted by the
>> monitor and immediately abort.  This is rather unfriendly to users, who
>> would prefer to possibly correct the issue or at least, not have lots
>> of processes continually spawning.
>> 
>> This series accepts that rte_eal_init() can and does fail for real.  It
>> reflects the initialization status in the database, as well as adding
>> the DPDK version (where appropriate).
>> 
>> Submitted as RFC to spawn discussion around the type to reflect for the
>> initialized information.  Presented here as a boolean - however, it
>> might be more interesting to be a 'string' and have more elaborate
>> details (ex: 'failed - ovs_strerror(rte_errno)' or 'uninitialized' or
>> 'initialized').
>> 
>> Aaron Conole (2):
>>   dpdk: allow init to fail
>>   dpdk: reflect status and version in the database
>> 
>>  lib/dpdk-stub.c| 10 ++
>>  lib/dpdk.c | 31 +--
>>  lib/dpdk.h |  3 ++-
>>  vswitchd/bridge.c  |  5 +
>>  vswitchd/vswitch.ovsschema | 11 ---
>>  vswitchd/vswitch.xml   | 11 +++
>>  6 files changed, 61 insertions(+), 10 deletions(-)
>> 
>> --
>> 2.14.3
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/8] doc: Move additional sections to "physical ports" doc

2018-04-09 Thread Stokes, Ian
> The "vdev", "hotplugging", and "Rx checksum offload" sections only apply
> to 'dpdk' ports and are too detailed to include in a high-level howto.

Should flow control be in here too? AFAIK it's phy port only.

> Move them, reworking some aspects of this in the process.
> 

It may not be obvious to users that these are relevant to phy only and as such 
are found under Documentation/topics/dpdk/phy.rst.

We should be making it clear where these topics can be found to user at a 
higher level.

Have you considered a high level documentation map, possibly in 
Documentation/howto/dpdk.rst showing where feature docs can be found?

More comments below.


> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/howto/dpdk.rst  | 93 +++---
> -
>  Documentation/topics/dpdk/phy.rst | 91
> ++
>  2 files changed, 97 insertions(+), 87 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index c2324118d..4d993a0eb 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -57,8 +57,12 @@ usage is suggested::
>  $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
>  options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:02"
> 
> -Note: such syntax won't support hotplug. The hotplug is supposed to work
> with -future DPDK release, v18.05.
> +.. important::
> +
> +Hotplugging physical interfaces is not supported using the above
> syntax.
> +This is expected to change with the release of DPDK v18.05. For
> information
> +on hotplugging physical interfaces, you should instead refer to
> +:ref:`port-hotplug`.
> 
>  After the DPDK ports get added to switch, a polling thread continuously
> polls  DPDK devices and consumes 100% of the core, as can be checked from
> ``top`` and @@ -236,16 +240,6 @@ largest frame size supported by Fortville
> NIC using the DPDK i40e driver, but  larger frames and other DPDK NIC
> drivers may be supported. These cases are  common for use cases involving
> East-West traffic only.
> 
> -Rx Checksum Offload
> 
> -
> -By default, DPDK physical ports are enabled with Rx checksum offload.
> -
> -Rx checksum offload can offer performance improvement only for tunneling
> -traffic in OVS-DPDK because the checksum validation of tunnel packets is
> -offloaded to the NIC. Also enabling Rx checksum may slightly reduce the -
> performance of non-tunnel traffic, specifically for smaller size packet.
> -
>  .. _extended-statistics:
> 
>  Extended & Custom Statistics
> @@ -278,81 +272,6 @@ Note about "Extended Statistics": vHost ports
> supports only partial  statistics. RX packet size based counter are only
> supported and  doesn't include TX packet size counters.
> 
> -.. _port-hotplug:
> -
> -Port Hotplug
> -
> -
> -OVS supports port hotplugging, allowing the use of ports that were not
> bound -to DPDK when vswitchd was started.
> -In order to attach a port, it has to be bound to DPDK using the -
> ``dpdk_nic_bind.py`` script::
> -
> -$ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio :01:00.0
> -
> -Then it can be attached to OVS::
> -
> -$ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
> -options:dpdk-devargs=:01:00.0
> -
> -Detaching will be performed while processing del-port command::
> -
> -$ ovs-vsctl del-port dpdkx
> -
> -Sometimes, the del-port command may not detach the device.
> -Detaching can be confirmed by the appearance of an INFO log.
> -For example::
> -
> -INFO|Device ':04:00.1' has been detached
> -
> -If the log is not seen, then the port can be detached using::
> -
> -$ ovs-appctl netdev-dpdk/detach :01:00.0
> -
> -Detaching can be confirmed by console output::
> -
> -Device ':04:00.1' has been detached
> -
> -.. warning::
> -Detaching should not be done if a device is known to be non-
> detachable, as
> -this may cause the device to behave improperly when added back with
> -add-port. The Chelsio Terminator adapters which use the cxgbe driver
> seem
> -to be an example of this behavior; check the driver documentation if
> this
> -is suspected.
> -
> -This feature does not work with some NICs.
> -For more information please refer to the `DPDK Port Hotplug Framework -
>  >`__.
> -
> -.. _vdev-support:
> -
> -Vdev Support
> -
> -
> -DPDK provides drivers for both physical and virtual devices. Physical
> DPDK -devices are added to OVS by specifying a valid PCI address in 'dpdk-
> devargs'.
> -Virtual DPDK devices which do not have PCI addresses can be added using a
> -different format for 'dpdk-devargs'.
> -
> -Typically, the format expected is 'eth_' where 'x' is a -
> unique identifier of your choice for the given port.
> -
> -For example to add a dpdk port that uses the 'null' DPDK PMD driver::
> -
> -   $ 

[ovs-dev] [PATCH] docs: Fix 7 byte octets MAC addresses in dpdk.rst

2018-04-09 Thread Timothy Redaelli
Currently the code relies on the standard 6 byte octets, but the
documentation uses a wrong 7-byte octects.
This commit fix the documention in order to use the correct 6 byte octets
syntax.

Fixes: 5e7588186839 ("netdev-dpdk: fix port addition for ports sharing same PCI 
id")

Signed-off-by: Timothy Redaelli 
---
 Documentation/howto/dpdk.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 79b626c76..69e692f40 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -53,9 +53,9 @@ with multiple ports. Using a PCI device like above won't 
work. Instead, below
 usage is suggested::
 
 $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
-options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:01"
+options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
 $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
-options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:02"
+options:dpdk-devargs="class=eth,mac=00:11:22:33:44:56"
 
 Note: such syntax won't support hotplug. The hotplug is supposed to work with
 future DPDK release, v18.05.
-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 7/8] doc: Split Jumbo Frames guide between two docs

2018-04-09 Thread Stokes, Ian
> While there is some duplication going on here, that's not necessarily a
> bad thing. If nothing else, it lets us remove one more overly-detailed
> step from the howto.
> 

I think it's cleaner to move Jumbo frames to its own doc altogether rather than 
splitting in two.
>From the point of view flagging known issues and future maintenance this will 
>help.

Maybe just add a small section in both phy and vhost user noting Jumbo frames 
are supported and then link to the main jumbo frame doc.

Ian

> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/howto/dpdk.rst | 52 ++-
> -
>  Documentation/topics/dpdk/phy.rst| 24 +++
>  Documentation/topics/dpdk/vhost-user.rst | 36 ++
>  3 files changed, 63 insertions(+), 49 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 1a72e90bf..ba01810f8 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -48,9 +48,9 @@ number of dpdk devices found in the log file::
>  $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
>  options:dpdk-devargs=:01:00.1
> 
> -Some NICs (i.e. Mellanox ConnectX-3) have only one PCI address associated
> -with multiple ports. Using a PCI device like above won't work. Instead,
> below -usage is suggested::
> +Some NICs (i.e. Mellanox ConnectX-3) have only one PCI address
> +associated with multiple ports. Using a PCI device like above won't
> +work. Instead, below usage is suggested::
> 
>  $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>  options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:01"
> @@ -85,52 +85,6 @@ To stop ovs-vswitchd & delete bridge, run::
>  $ ovs-appctl -t ovsdb-server exit
>  $ ovs-vsctl del-br br0
> 
> -Jumbo Frames
> -
> -
> -By default, DPDK ports are configured with standard Ethernet MTU (1500B).
> To -enable Jumbo Frames support for a DPDK port, change the Interface's -
> ``mtu_request`` attribute to a sufficiently large value. For example, to
> add a -DPDK Phy port with MTU of 9000::
> -
> -$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> -  options:dpdk-devargs=:01:00.0 mtu_request=9000
> -
> -Similarly, to change the MTU of an existing port to 6200::
> -
> -$ ovs-vsctl set Interface dpdk-p0 mtu_request=6200
> -
> -Some additional configuration is needed to take advantage of jumbo frames
> with -vHost ports:
> -
> -1. *mergeable buffers* must be enabled for vHost ports, as demonstrated
> in the
> -   QEMU command line snippet below::
> -
> -   -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
> -   -device virtio-net-
> pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on
> -
> -2. Where virtio devices are bound to the Linux kernel driver in a guest
> -   environment (i.e. interfaces are not bound to an in-guest DPDK
> driver), the
> -   MTU of those logical network interfaces must also be increased to a
> -   sufficiently large value. This avoids segmentation of Jumbo Frames
> received
> -   in the guest. Note that 'MTU' refers to the length of the IP packet
> only,
> -   and not that of the entire frame.
> -
> -   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> header
> -   and CRC lengths (i.e. 18B) from the max supported frame size.  So, to
> set
> -   the MTU for a 9018B Jumbo Frame::
> -
> -   $ ip link set eth1 mtu 9000
> -
> -When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments
> are -increased, such that a full Jumbo Frame of a specific size may be
> accommodated -within a single mbuf segment.
> -
> -Jumbo frame support has been validated against 9728B frames, which is the
> -largest frame size supported by Fortville NIC using the DPDK i40e driver,
> but -larger frames and other DPDK NIC drivers may be supported. These
> cases are -common for use cases involving East-West traffic only.
> -
>  .. _dpdk-ovs-in-guest:
> 
>  OVS with DPDK Inside VMs
> diff --git a/Documentation/topics/dpdk/phy.rst
> b/Documentation/topics/dpdk/phy.rst
> index 93aff628c..d49269567 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -216,3 +216,27 @@ Flow Control
> 
>  Flow control is available for DPDK physical ports. For more information,
> refer  to :ref:`dpdk-flow-control`.
> +
> +Jumbo Frames
> +
> +
> +By default, ``dpdk`` ports are configured with standard Ethernet MTU
> (1500B).
> +To enable Jumbo Frames support for such a port, change the interface's
> +``mtu_request`` attribute to a sufficiently large value. For example,
> +to add a ``dpdk`` port with MTU of 9000, run::
> +
> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> +options:dpdk-devargs=:01:00.0 mtu_request=9000
> +
> +Similarly, to change the MTU of an existing port to 6200::
> +
> +$ ovs-vsctl set Interface d

Re: [ovs-dev] [PATCH 6/8] doc: Move "pdump" guide to its own document

2018-04-09 Thread Stokes, Ian
Minor nit, would like to see a commit message here. I think you modify caption 
headers in the patch also, these could be kept in the final clean up patch of 
the series.

Ian

> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/howto/dpdk.rst| 39 --
>  Documentation/topics/dpdk/index.rst |  7 
> Documentation/topics/dpdk/pdump.rst | 65
> +
>  3 files changed, 72 insertions(+), 39 deletions(-)  create mode 100644
> Documentation/topics/dpdk/pdump.rst
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index c01bf7a39..1a72e90bf 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -85,45 +85,6 @@ To stop ovs-vswitchd & delete bridge, run::
>  $ ovs-appctl -t ovsdb-server exit
>  $ ovs-vsctl del-br br0
> 
> -pdump
> --
> -
> -pdump allows you to listen on DPDK ports and view the traffic that is
> passing -on them. To use this utility, one must have libpcap installed on
> the system.
> -Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and -
> ``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.
> -
> -.. warning::
> -  A performance decrease is expected when using a monitoring application
> like
> -  the DPDK pdump app.
> -
> -To use pdump, simply launch OVS as usual, then navigate to the
> ``app/pdump`` -directory in DPDK, ``make`` the application and run like
> so::
> -
> -$ sudo ./build/app/dpdk-pdump -- \
> ---pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
> ---server-socket-path=/usr/local/var/run/openvswitch
> -
> -The above command captures traffic received on queue 0 of port 0 and
> stores it -in ``/tmp/pkts.pcap``. Other combinations of port numbers,
> queues numbers and -pcap locations are of course also available to use.
> For example, to capture all -packets that traverse port 0 in a single pcap
> file::
> -
> -$ sudo ./build/app/dpdk-pdump -- \
> ---pdump 'port=0,queue=*,rx-dev=/tmp/pkts.pcap,tx-
> dev=/tmp/pkts.pcap' \
> ---server-socket-path=/usr/local/var/run/openvswitch
> -
> -``server-socket-path`` must be set to the value of ``ovs_rundir()`` which
> -typically resolves to ``/usr/local/var/run/openvswitch``.
> -
> -Many tools are available to view the contents of the pcap file. Once
> example is -tcpdump. Issue the following command to view the contents of
> ``pkts.pcap``::
> -
> -$ tcpdump -r pkts.pcap
> -
> -More information on the pdump app and its usage can be found in the `DPDK
> docs -`__.
> -
>  Jumbo Frames
>  
> 
> diff --git a/Documentation/topics/dpdk/index.rst
> b/Documentation/topics/dpdk/index.rst
> index 52cacaef6..c9b231f06 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -27,10 +27,17 @@ The DPDK Datapath
> 
>  .. toctree::
> :maxdepth: 2
> +   :caption: Bridges and Ports
> 
> bridge
> phy
> vhost-user
> ring
> +
> +.. toctree::
> +   :maxdepth: 2
> +   :caption: Advanced Configuration
> +
> pmd
> qos
> +   pdump
> diff --git a/Documentation/topics/dpdk/pdump.rst
> b/Documentation/topics/dpdk/pdump.rst
> new file mode 100644
> index 0..f1a408989
> --- /dev/null
> +++ b/Documentation/topics/dpdk/pdump.rst
> @@ -0,0 +1,65 @@
> +..
> +  Licensed under the Apache License, Version 2.0 (the "License"); you
> may
> +  not use this file except in compliance with the License. You may
> obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> +  License for the specific language governing permissions and
> limitations
> +  under the License.
> +
> +  Convention for heading levels in Open vSwitch documentation:
> +
> +  ===  Heading 0 (reserved for the title in a document)
> +  ---  Heading 1
> +  ~~~  Heading 2
> +  +++  Heading 3
> +  '''  Heading 4
> +
> +  Avoid deeper levels because they do not render well.
> +
> +=
> +pdump
> +=
> +
> +pdump allows you to listen on DPDK ports and view the traffic that is
> +passing on them. To use this utility, one must have libpcap installed on
> the system.
> +Furthermore, DPDK must be built with ``CONFIG_RTE_LIBRTE_PDUMP=y`` and
> +``CONFIG_RTE_LIBRTE_PMD_PCAP=y``.
> +
> +.. warning::
> +
> +   A performance decrease is expected when using a monitoring application
> like
> +   the DPDK pdump app.
> +
> +To use pdump, simply launch OVS as usual, then navigate to the
> +``app/pdump`` directory in DPDK, ``make`` the application and run like
> so::
> +
> +$ sudo ./build/app/dpdk-pdump -- \
> +--pdump port=0,queue=0,rx-dev=/tmp/pkts.pcap \
> +   

Re: [ovs-dev] [PATCH 5/8] doc: Add "bridge" topic document

2018-04-09 Thread Stokes, Ian
> This details configuration steps that apply to the entire bridge, rather
> than individual ports.

Comments inline.

> 
> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/howto/dpdk.rst |  60 
>  Documentation/topics/dpdk/bridge.rst | 103
> +++
>  Documentation/topics/dpdk/index.rst  |   1 +
>  3 files changed, 104 insertions(+), 60 deletions(-)  create mode 100644
> Documentation/topics/dpdk/bridge.rst
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 608dde814..c01bf7a39 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -170,66 +170,6 @@ largest frame size supported by Fortville NIC using
> the DPDK i40e driver, but  larger frames and other DPDK NIC drivers may be
> supported. These cases are  common for use cases involving East-West
> traffic only.
> 
> -.. _extended-statistics:
> -
> -Extended & Custom Statistics
> -
> -
> -DPDK Extended Statistics API allows PMD to expose unique set of
> statistics.
> -The Extended statistics are implemented and supported only for DPDK
> physical -and vHost ports. Custom statistics are dynamic set of counters
> which can -vary depenend on a driver. Those statistics are implemented -
> for DPDK physical ports and contain all "dropped", "error" and
> "management"
> -counters from XSTATS. XSTATS counters list can be found here:
> -`__.
> -
> -To enable statistics, you have to enable OpenFlow 1.4 support for OVS.
> -Configure bridge br0 to support OpenFlow version 1.4::
> -
> -$ ovs-vsctl set bridge br0 datapath_type=netdev \
> -  protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14
> -
> -Check the OVSDB protocols column in the bridge table if OpenFlow 1.4
> support -is enabled for OVS::
> -
> -$ ovsdb-client dump Bridge protocols
> -
> -Query the port statistics by explicitly specifying -O OpenFlow14 option::
> -
> -$ ovs-ofctl -O OpenFlow14 dump-ports br0
> -
> -Note about "Extended Statistics": vHost ports supports only partial -
> statistics. RX packet size based counter are only supported and -doesn't
> include TX packet size counters.
> -
> -EMC Insertion Probability
> --
> -By default 1 in every 100 flows are inserted into the Exact Match Cache
> (EMC).
> -It is possible to change this insertion probability by setting the -
> ``emc-insert-inv-prob`` option::
> -
> -$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-
> prob=N
> -
> -where:
> -
> -``N``
> -  is a positive integer representing the inverse probability of insertion
> ie.
> -  on average 1 in every N packets with a unique flow will generate an EMC
> -  insertion.
> -
> -If ``N`` is set to 1, an insertion will be performed for every flow. If
> set to -0, no insertions will be performed and the EMC will effectively be
> disabled.
> -
> -With default ``N`` set to 100, higher megaflow hits will occur initially
> -as observed with pmd stats::
> -
> -$ ovs-appctl dpif-netdev/pmd-stats-show
> -
> -For certain traffic profiles with many parallel flows, it's recommended
> to set -``N`` to '0' to achieve higher forwarding performance.
> -
> -For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> -
>  .. _dpdk-ovs-in-guest:
> 
>  OVS with DPDK Inside VMs
> diff --git a/Documentation/topics/dpdk/bridge.rst
> b/Documentation/topics/dpdk/bridge.rst
> new file mode 100644
> index 0..307005f3b
> --- /dev/null
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -0,0 +1,103 @@
> +..
> +  Licensed under the Apache License, Version 2.0 (the "License"); you
> may
> +  not use this file except in compliance with the License. You may
> obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> +  License for the specific language governing permissions and
> limitations
> +  under the License.
> +
> +  Convention for heading levels in Open vSwitch documentation:
> +
> +  ===  Heading 0 (reserved for the title in a document)
> +  ---  Heading 1
> +  ~~~  Heading 2
> +  +++  Heading 3
> +  '''  Heading 4
> +
> +  Avoid deeper levels because they do not render well.
> +
> +
> +DPDK Bridges
> +
> +
> +The DPDK datapath requires specially configured bridge(s) in order to
> +utilize DPDK-backed :doc:`physical ` and `virtual `
> ports.
> +
> +Quick Example
> +-
> +
> +This example demonstrates how to add a bridge using the DPDK datapath::
> +
> +$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
> +

Re: [ovs-dev] [PATCH 4/8] doc: Move "QoS" guide to its own document

2018-04-09 Thread Stokes, Ian
> Again, this stuff is too detailed for a high-level howto.
> 
> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/howto/dpdk.rst|  70 -
>  Documentation/topics/dpdk/index.rst |   1 +
>  Documentation/topics/dpdk/phy.rst   |   6 +++
>  Documentation/topics/dpdk/qos.rst   | 100
> 
>  4 files changed, 107 insertions(+), 70 deletions(-)  create mode 100644
> Documentation/topics/dpdk/qos.rst
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 4d993a0eb..608dde814 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -85,76 +85,6 @@ To stop ovs-vswitchd & delete bridge, run::
>  $ ovs-appctl -t ovsdb-server exit
>  $ ovs-vsctl del-br br0
> 
> -QoS
> 
> -
> -Assuming you have a vhost-user port transmitting traffic consisting of
> packets -of size 64 bytes, the following command would limit the egress
> transmission -rate of the port to ~1,000,000 packets per second::
> -
> -$ ovs-vsctl set port vhost-user0 qos=@newqos -- \
> ---id=@newqos create qos type=egress-policer other-
> config:cir=4600 \
> -other-config:cbs=2048`
> -
> -To examine the QoS configuration of the port, run::
> -
> -$ ovs-appctl -t ovs-vswitchd qos/show vhost-user0
> -
> -To clear the QoS configuration from the port and ovsdb, run::
> -
> -$ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
> -
> -Refer to vswitch.xml for more details on egress-policer.
> -
> -Rate Limiting
> ---
> -
> -Here is an example on Ingress Policing usage. Assuming you have a vhost-
> user -port receiving traffic consisting of packets of size 64 bytes, the
> following -command would limit the reception rate of the port to
> ~1,000,000 packets per
> -second::
> -
> -$ ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 \
> -ingress_policing_burst=1000`
> -
> -To examine the ingress policer configuration of the port::
> -
> -$ ovs-vsctl list interface vhost-user0
> -
> -To clear the ingress policer configuration from the port::
> -
> -$ ovs-vsctl set interface vhost-user0 ingress_policing_rate=0
> -
> -Refer to vswitch.xml for more details on ingress-policer.
> -
> -Flow Control
> -
> -
> -Flow control can be enabled only on DPDK physical ports. To enable flow
> control -support at tx side while adding a port, run::
> -
> -$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> -options:dpdk-devargs=:01:00.0 options:tx-flow-ctrl=true
> -
> -Similarly, to enable rx flow control, run::
> -
> -$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> -options:dpdk-devargs=:01:00.0 options:rx-flow-ctrl=true
> -
> -To enable flow control auto-negotiation, run::
> -
> -$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> -options:dpdk-devargs=:01:00.0 options:flow-ctrl-autoneg=true
> -
> -To turn ON the tx flow control at run time for an existing port, run::
> -
> -$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true
> -
> -The flow control parameters can be turned off by setting ``false`` to the
> -respective parameter. To disable the flow control at tx side, run::
> -
> -$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=false
> -
>  pdump
>  -
> 
> diff --git a/Documentation/topics/dpdk/index.rst
> b/Documentation/topics/dpdk/index.rst
> index dfde88377..fe4d97b8b 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -32,3 +32,4 @@ The DPDK Datapath
> vhost-user
> ring
> pmd
> +   qos
> diff --git a/Documentation/topics/dpdk/phy.rst
> b/Documentation/topics/dpdk/phy.rst
> index 507dac869..93aff628c 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -210,3 +210,9 @@ More information on the different types of virtual
> DPDK PMDs can be found in  the `DPDK documentation`__.
> 
>  __ http://dpdk.org/doc/guides/nics/overview.html
> +
> +Flow Control
> +
> +
> +Flow control is available for DPDK physical ports. For more
> +information, refer to :ref:`dpdk-flow-control`.

Ah, I see you add a section for flow control here, disregard previous comment 
in patch 3.

> diff --git a/Documentation/topics/dpdk/qos.rst
> b/Documentation/topics/dpdk/qos.rst
> new file mode 100644
> index 0..c19e01db7
> --- /dev/null
> +++ b/Documentation/topics/dpdk/qos.rst
> @@ -0,0 +1,100 @@
> +..
> +  Licensed under the Apache License, Version 2.0 (the "License"); you
> may
> +  not use this file except in compliance with the License. You may
> obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS,
> WITHOU

Re: [ovs-dev] [PATCH 2/8] doc: Add "PMD" topic document

2018-04-09 Thread Stokes, Ian
> This continues the breakup of the huge DPDK "howto" into smaller
> components. There are a couple of related changes included, such as using
> "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> 
> We enable the TODO directive, so we can actually start calling out some
> TODOs.
> 
> Signed-off-by: Stephen Finucane 
> ---
>  Documentation/conf.py|   2 +-
>  Documentation/howto/dpdk.rst |  86 ---
>  Documentation/topics/dpdk/index.rst  |   1 +
>  Documentation/topics/dpdk/phy.rst|  10 +++
>  Documentation/topics/dpdk/pmd.rst| 139
> +++
>  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
>  6 files changed, 159 insertions(+), 96 deletions(-)  create mode 100644
> Documentation/topics/dpdk/pmd.rst
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py index
> 6ab144c5d..babda21de 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
>  # Add any Sphinx extension module names here, as strings. They can be  #
> extensions coming with Sphinx (named 'sphinx.ext.*') or your custom  #
> ones.
> -extensions = []
> +extensions = ['sphinx.ext.todo']
> 
>  # Add any paths that contain templates here, relative to this directory.
>  templates_path = ['_templates']
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d717d2ebe..c2324118d 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
>  $ ovs-appctl -t ovsdb-server exit
>  $ ovs-vsctl del-br br0
> 
> -PMD Thread Statistics
> --
> -
> -To show current stats::
> -
> -$ ovs-appctl dpif-netdev/pmd-stats-show
> -
> -To clear previous stats::
> -
> -$ ovs-appctl dpif-netdev/pmd-stats-clear
> -
> -Port/RXQ Assigment to PMD Threads
> --
> -
> -To show port/rxq assignment::
> -
> -$ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -To change default rxq assignment to pmd threads, rxqs may be manually
> pinned to -desired cores using::
> -
> -$ ovs-vsctl set Interface  \
> -other_config:pmd-rxq-affinity=
> -
> -where:
> -
> --  is a CSV list of ``:``
> values
> -
> -For example::
> -
> -$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> -other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> -
> -This will ensure:
> -
> -- Queue #0 pinned to core 3
> -- Queue #1 pinned to core 7
> -- Queue #2 not pinned
> -- Queue #3 pinned to core 8
> -
> -After that PMD threads on cores where RX queues was pinned will become -
> ``isolated``. This means that this thread will poll only pinned RX queues.
> -
> -.. warning::
> -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues
> will
> -  not be polled. Also, if provided ``core_id`` is not available (ex. this
> -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by
> any PMD
> -  thread.
> -
> -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds
> (cores) -automatically. The processing cycles that have been stored for
> each rxq -will be used where known to assign rxqs to pmd based on a round
> robin of the -sorted rxqs.
> -
> -For example, in the case where here there are 5 rxqs and 3 cores (e.g.
> 3,7,8) -available, and the measured usage of core cycles per rxq over the
> last -interval is seen to be:
> -
> -- Queue #0: 30%
> -- Queue #1: 80%
> -- Queue #3: 60%
> -- Queue #4: 70%
> -- Queue #5: 10%
> -
> -The rxqs will be assigned to cores 3,7,8 in the following order:
> -
> -Core 3: Q1 (80%) |
> -Core 7: Q4 (70%) | Q5 (10%)
> -core 8: Q3 (60%) | Q0 (30%)
> -
> -To see the current measured usage history of pmd core cycles for each
> rxq::
> -
> -$ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -.. note::
> -
> -  A history of one minute is recorded and shown for each rxq to allow for
> -  traffic pattern spikes. An rxq's pmd core cycles usage changes due to
> traffic
> -  pattern or reconfig changes will take one minute before they are fully
> -  reflected in the stats.
> -
> -Rxq to pmds assignment takes place whenever there are configuration
> changes -or can be triggered by using::
> -
> -$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> -
>  QoS
>  ---
> 
> diff --git a/Documentation/topics/dpdk/index.rst
> b/Documentation/topics/dpdk/index.rst
> index 5f836a6e9..dfde88377 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -31,3 +31,4 @@ The DPDK Datapath
> phy
> vhost-user
> ring
> +   pmd
> diff --git a/Documentation/topics/dpdk/phy.rst
> b/Documentation/topics/dpdk/phy.rst
> index 1c18e4e3d..222fa3e9f 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -109,3 +109,13 @@ tool::
>  For more information, refer to the `DPDK documentation `__.
> 
>  .. _dpdk-drivers: http://dpdk.org/doc/gui

Re: [ovs-dev] [PATCH 0/8] Split up the DPDK how-to

2018-04-09 Thread Stokes, Ian
> The DPDK howto has slowly morphed into a catch all for everything DPDK,
> which goes against the original design goal for 'howto' documents [*].
> This series attempts to return some sanity to the universe by splitting
> this document into many more 'topic' documents. Along the way, we add a
> lot of semantic markup, rework some text, and add an overview on 'dpdk'-
> type ports (the original goal here).

Thanks for working on this Stephen.

I'm in favor of improving the documentation but we need to be careful when 
splitting it further apart. I think we should do this only where it makes sense.

Since moving to the current documentation design layout a common complaint I've 
heard at the community call is that users don't know where to find specific 
info as there are multiple locations dealing with the same topic. It can be 
argued that this info exists in different locations due to the differences in 
the type of content expected i.e. the difference between what is put in an 
intro, how-to or tutorial. But this can be non-obvious to a user. It seems the 
technical details of these differences are hampering the ease of use from a 
user POV.

I'd agree the previous approach of having a single DPDK doc catch all is far 
from ideal but one aspect users liked was that it was a single clear location.

Possibly a document map at a high level could help with this.

I be interested for others input on this series as well.

Thanks
Ian
> 
> There's a good chance I've made some mistakes in the process and I've left
> TODOs for someone to resolve now or at a future date. I welcome feedback
> on both of these.
> 
> Now to go back to figure how exactly NUMA affinity works for and affects
> PMD threads...
> 
> [*] 'howto' documents are supposed to be brief, high-level overviews on
> a particular group of features, with a focus on the user. They're
> not as all-encompassing as a 'tutorial', but not as specific as a
> 'topic'.
> 
> Stephen Finucane (8):
>   doc: Add an overview of the 'dpdk' port
>   doc: Add "PMD" topic document
>   doc: Move additional sections to "physical ports" doc
>   doc: Move "QoS" guide to its own document
>   doc: Add "bridge" topic document
>   doc: Move "pdump" guide to its own document
>   doc: Split Jumbo Frames guide between two docs
>   doc: Final cleanup of the DPDK howto
> 
>  Documentation/conf.py|   2 +-
>  Documentation/howto/dpdk.rst | 453 +++---
> -
>  Documentation/topics/dpdk/bridge.rst | 103 +++
>  Documentation/topics/dpdk/index.rst  |  11 +
>  Documentation/topics/dpdk/pdump.rst  |  65 +
>  Documentation/topics/dpdk/phy.rst| 242 +
>  Documentation/topics/dpdk/pmd.rst| 139 ++
>  Documentation/topics/dpdk/qos.rst| 100 +++
>  Documentation/topics/dpdk/vhost-user.rst |  53 +++-
>  9 files changed, 740 insertions(+), 428 deletions(-)  create mode 100644
> Documentation/topics/dpdk/bridge.rst
>  create mode 100644 Documentation/topics/dpdk/pdump.rst
>  create mode 100644 Documentation/topics/dpdk/phy.rst  create mode 100644
> Documentation/topics/dpdk/pmd.rst  create mode 100644
> Documentation/topics/dpdk/qos.rst
> 
> --
> 2.14.3
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/8] doc: Add an overview of the 'dpdk' port

2018-04-09 Thread Stokes, Ian
> These ports are used to allow ingress/egress from the host and are
> therefore _reasonably_ important. However, there is no clear overview of
> what these ports actually are or why things are done the way they are.
> Start closing this gap by providing a standalone example of using these
> ports along with a little more detailed overview of the binding process.
> 
> There is additional cleanup to be done for the DPDK howto, but that will
> be done separately.
> 
> Signed-off-by: Stephen Finucane 
> Cc: Ciara Loftus 
> Cc: Kevin Traynor 
> ---
>  Documentation/topics/dpdk/index.rst |   1 +
>  Documentation/topics/dpdk/phy.rst   | 111
> 
>  2 files changed, 112 insertions(+)
>  create mode 100644 Documentation/topics/dpdk/phy.rst
> 
> diff --git a/Documentation/topics/dpdk/index.rst
> b/Documentation/topics/dpdk/index.rst
> index da148b323..5f836a6e9 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -28,5 +28,6 @@ The DPDK Datapath
>  .. toctree::
> :maxdepth: 2
> 
> +   phy
> vhost-user
> ring
> diff --git a/Documentation/topics/dpdk/phy.rst
> b/Documentation/topics/dpdk/phy.rst
> new file mode 100644
> index 0..1c18e4e3d
> --- /dev/null
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -0,0 +1,111 @@
> +..
> +  Copyright 2018, Red Hat, Inc.
> +
> +  Licensed under the Apache License, Version 2.0 (the "License"); you
> may
> +  not use this file except in compliance with the License. You may
> obtain
> +  a copy of the License at
> +
> +  http://www.apache.org/licenses/LICENSE-2.0
> +
> +  Unless required by applicable law or agreed to in writing, software
> +  distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> +  License for the specific language governing permissions and
> limitations
> +  under the License.
> +
> +  Convention for heading levels in Open vSwitch documentation:
> +
> +  ===  Heading 0 (reserved for the title in a document)
> +  ---  Heading 1
> +  ~~~  Heading 2
> +  +++  Heading 3
> +  '''  Heading 4
> +
> +  Avoid deeper levels because they do not render well.
> +
> +===
> +DPDK Physical Ports
> +===
> +
> +The DPDK datapath provides a way to attach DPDK-backed physical
> +interfaces to allow high-performance ingress/egress from the host.
> +
> +.. versionchanged:: 2.7.0
> +
> +   Before Open vSwitch 2.7.0, it was necessary to prefix port names with
> a
> +   ``dpdk`` prefix. Starting with 2.7.0, this is no longer necessary.
> +
> +Quick Example
> +-
> +
> +This example demonstrates how to bind two ``dpdk`` ports, bound to
> +physical interfaces identified by hardware IDs ``:01:00.0`` and
> +``:01:00.1``, to an existing bridge called ``br0``::
> +
> +$ ovs-vsctl add-port br0 dpdk-p0 \
> +   -- set Interface dpdk-p0 type=dpdk options:dpdk-
> devargs=:01:00.0
> +$ ovs-vsctl add-port br0 dpdk-p1 \
> +   -- set Interface dpdk-p1 type=dpdk
> + options:dpdk-devargs=:01:00.1
> +
> +For the above example to work, the two physical interfaces must be
> +bound to the DPDK poll-mode drivers in userspace rather than the
> +traditional kernel drivers. See the `binding NIC drivers  nics>` section for details.

I think an example should be added here for when multiple ports share the same 
bus slot function.

Support for this was added in as part of OVS 2.9.

If not added here then we need to flag clearly that it's supported but that 
users need to consult the dpdk-binding-nic section for specifics.
 
Ian
> +
> +.. _dpdk-binding-nics:
> +
> +Binding NIC Drivers
> +---
> +
> +DPDK operates entirely in userspace and, as a result, requires use of
> +its own poll-mode drivers in user space for physical interfaces and a
> +passthrough-style driver for the devices in kernel space.
> +
> +There are two different tools for binding drivers: :command:`driverctl`
> +which is a generic tool for persistently configuring alternative device
> +drivers, and :command:`dpdk-devbind` which is a DPDK-specific tool and
> +whose changes do not persist across reboots. In addition, there are two
> +options available for this kernel space driver - VFIO (Virtual Function
> +I/O) and UIO (Userspace I/O) - along with a number of drivers for each
> +option. We will demonstrate examples of both tools and will use the
> +``vfio-pci`` driver, which is the more secure, robust driver of those
> +available. More information can be found in the `DPDK documentation
> `__.
> +
> +To list devices using :command:`driverctl`, run::
> +
> +$ driverctl -v list-devices | grep -i net
> +:07:00.0 igb (I350 Gigabit Network Connection (Ethernet Server
> Adapter I350-T2))
> +:07:00.1 igb (I350 Gigabit Network Connection (Ethernet Server
> + Ad

Re: [ovs-dev] [branch-2.9 PATCH 1/2] netdev-dpdk: Free mempool only when no in-use mbufs.

2018-04-09 Thread Kevin Traynor
On 04/06/2018 04:51 PM, Ilya Maximets wrote:
>>> DPDK mempools are freed when they are no longer needed.
>>> This can happen when a port is removed or a port's mtu is reconfigured so
>>> that a new mempool is used.
>>>
>>> It is possible that an mbuf is attempted to be returned to a freed mempool
>>> from NIC Tx queues and this can lead to a segfault.
>>>
>>> In order to prevent this, only free mempools when they are not needed and
>>> have no in-use mbufs. As this might not be possible immediately, sweep the
>>> mempools anytime a port tries to get a mempool.
>>>
>>
>> Thanks for working on this Kevin. Just a query below. From a testing POV I 
>> didn't come across any issues.
>>
>>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>>> Cc: mark.b.kavanagh81 at gmail.com
>>> Signed-off-by: Kevin Traynor 
>>> ---
>>>
>>> Found on OVS 2.6, but code is operationally similar on recent the branch-
>>> 2.*'s. If the patch is acceptable I can backport to older branches.
> 
> This issue was actually rised back in January while discussing mempool issue.
> 

Hmmm, thanks for pointing it out. Seems the last mails in the thread
coincided with when I was traveling and I didn't read them back then.

>>
>> Sounds good to me.
>>>
>>>  lib/netdev-dpdk.c | 40 +++-
>>>  1 file changed, 27 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c19cedc..9b8e732
>>> 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)  }
>>>
>>> +/* Free unused mempools. */
>>> +static void
>>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex) {
>>> +struct dpdk_mp *dmp, *next;
>>> +
>>> +LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>>> +if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>>
>> I hadn't looked at rte_mempool_full() before. I noticed the documentation 
>> warns not to use it as part of the datapath but for debug purposes only.
> 
> Yes, rte_mempool_full() is racy and could return false-positives, but it's
> the best and only useful function in DPDK API. In practice we can never be
> sure if someone still uses the memory pool. I also used this function in
> my solution for branch 2.8 posted here:
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html
> 

The similarities and differences are interesting!

> Memory handling in DPDK is awful. In fact, noone is able to use 
> rte_mempool_free(),
> because there is no way correct to check if memory pool is still in use.
> 
> There is one dirty hack to check if mempool is really full. It based on the
> internal mempool implementation and the fact that we're not allocating any
> mbufs from the pool:
> 1.  Check rte_mempool_ops_get_count(). <-- this exposed externally for some 
> reason
> 2.  Check rte_mempool_full().
> 3.  Check rte_mempool_ops_get_count() again.
> 4.  If rte_mempool_full() was 'true' and both calls to
> rte_mempool_ops_get_count() returned same value > Mempool really full.
> 5.  Here we could safely call rte_mempool_free().
> 

Yes, that would work.

The ring count is made before the cache count in rte_mempool_full() and
as we are not getting mbufs I think we should only have the possibility
of mbufs moving from a cache to the ring. In that case we may get a
false negative but wouldn't get a false positive for rte_mempool_full().
Obviously the downside is that it is implementation specific - but maybe
it could be argued the implementation will not change on already
released DPDK versions. What do you think?

>>
>> Does its use add to the downtime in a noticeable way while we reconfigure? 
>> I'm thinking of a deployment where the number of lcores is high and we 
>> reconfigure often. When cache is enabled, it has to browse the length of all 
>> lcores. There may not be an issue here but was curious.
> 

I think it should be insignificant compared to actually
allocating/freeing a mempool. I will try to check it.

> That is the interesting question. In my version (referenced above) I used
> watchdog thread to periodically free mempools with zero refcount.
> Not sure which solution is better.
> 

The advantage I see of being on the watchdog timer thread is that
probably no one cares how long it takes :-) The advantage of being in
the mempool config is that it runs (and guarantees to run) just before a
new mempool is going to be requested, so there is smaller risk of
additional memory requirements.

In the short term I think we should use the a fix like the patches that
don't cover every possible corner case or the get_count() that isn't the
most elegant. Longer term we can think about what we would want to add
to DPDK to make it easier to do this.

thanks,
Kevin.

>>
>> Did you do any testing around this?
>>
>> Thanks
>> Ian
>>
>>> +ovs_list_remove(&dmp->list_node);
>>> +rte_mempool_free(dmp->mp);
>>> +rte_free(dmp);
>>> + 

[ovs-dev] [PATCH v2 2/2] OVN: add icmp6 action to ovn acl reject support

2018-04-09 Thread Lorenzo Bianconi
Whenever the acl reject rule is hit by an IPv6 packet send back
an ICMPv6 destination unreachable packet using the icmp6 action

Signed-off-by: Lorenzo Bianconi 
---
 NEWS|  4 
 ovn/northd/ovn-northd.c | 25 ++---
 ovn/ovn-nb.xml  |  3 ++-
 tests/ovn.at| 20 
 4 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 0cfcac54f..757d648a1 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,10 @@ Post-v2.9.0
  * OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
- Linux kernel 4.14
  * Add support for compiling OVS with the latest Linux 4.14 kernel
+   - ovn:
+ * implemented icmp4/icmp6/tcp_reset actions in order to drop the packet
+   and reply with a RST for TCP or ICMPv4/ICMPv6 unreachable message for
+   other IPv4/IPv6-based protocols whenever a reject ACL rule is hit.
 
 v2.9.0 - 19 Feb 2018
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 9ca15bc2f..845faba86 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -2912,10 +2912,12 @@ build_pre_acls(struct ovn_datapath *od, struct hmap 
*lflows)
  * unreachable packets. */
 ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
   "nd || nd_rs || nd_ra || icmp4.type == 3 || "
-  "(tcp && tcp.flags == 4)", "next;");
+  "icmp6.type == 1 || (tcp && tcp.flags == 4)",
+  "next;");
 ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
   "nd || nd_rs || nd_ra || icmp4.type == 3 || "
-  "(tcp && tcp.flags == 4)", "next;");
+  "icmp6.type == 1 || (tcp && tcp.flags == 4)",
+  "next;");
 
 /* Ingress and Egress Pre-ACL Table (Priority 100).
  *
@@ -3131,7 +3133,7 @@ build_reject_acl_rules(struct ovn_datapath *od, struct 
hmap *lflows,
 ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET + 10,
   ds_cstr(&match), ds_cstr(&actions));
 
-/* IPv4 traffic */
+/* IP traffic */
 ds_clear(&match);
 ds_clear(&actions);
 build_acl_log(&actions, acl);
@@ -3148,6 +3150,23 @@ build_reject_acl_rules(struct ovn_datapath *od, struct 
hmap *lflows,
   ingress ? "output;" : "next(pipeline=ingress,table=0);");
 ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET,
   ds_cstr(&match), ds_cstr(&actions));
+ds_clear(&match);
+ds_clear(&actions);
+build_acl_log(&actions, acl);
+if (extra_match->length > 0) {
+ds_put_format(&match, "(%s) && ", extra_match->string);
+}
+ds_put_format(&match, "ip6 && (%s)", acl->match);
+if (extra_actions->length > 0) {
+ds_put_format(&actions, "%s ", extra_actions->string);
+}
+ds_put_format(&actions, "reg0 = 0; icmp6 { "
+  "eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
+  "outport <-> inport; %s };",
+  ingress ? "output;" : "next(pipeline=ingress,table=0);");
+ovn_lflow_add(lflows, od, stage, acl->priority + OVN_ACL_PRI_OFFSET,
+  ds_cstr(&match), ds_cstr(&actions));
+
 ds_destroy(&match);
 ds_destroy(&actions);
 }
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 2ebaac561..1f6e69e2d 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1061,7 +1061,8 @@
 
 
   reject: Drop the packet, replying with a RST for TCP or
-  ICMP unreachable message for other IP-based protocols.
+  ICMPv4/ICMPv6 unreachable message for other IPv4/IPv6-based
+  protocols.
 
   
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 423c73c26..3bb9932f2 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -9412,6 +9412,24 @@ test_ip_packet() {
 as hv$hv ovs-appctl netdev-dummy/receive vif$inport $packet
 }
 
+# test_ipv6_packet INPORT HV ETH_SRC ETH_DST IPV4_SRC IPV4_DST EXP_ICMP_CHKSUM
+#
+# Causes a packet to be received on INPORT of the hypervisor HV. The packet is 
an IPv6 packet with
+# ETH_SRC, ETH_DST, IPV6_SRC, IPV6_DST as specified.
+# EXP_ICMP_CHKSUM is the icmp6 checksums of the icmp6 destination unreachable 
frame generated from ACL rule hit
+test_ipv6_packet() {
+local inport=$1 hv=$2 eth_src=$3 eth_dst=$4 ipv6_src=$5 ipv6_dst=$6 
exp_icmp_chksum=$7
+shift 7
+
+local ip6_hdr=60083aff${ipv6_src}${ipv6_dst}
+local packet=${eth_dst}${eth_src}86dd${ip6_hdr}
+
+local 
reply=${eth_src}${eth_dst}86dd60303aff${ipv6_dst}${ipv6_src}0101${exp_icmp_chksum}${ip6_hdr}
+echo $reply >> vif$inport.expected
+
+as hv$hv ovs-appctl netdev-dummy/receive vif$inport $packet
+}
+
 # test_tcp_syn_packet INPORT HV ETH_SRC ETH_DST IPV4_SRC IPV4_DST IP_CHKSUM 
TCP_SPORT TCP_DPORT TCP_CHKSUM EXP_IP_CHKSUM EXP_TCP_RST_CHKSUM
 #
 # Causes a packet to be received on INPORT of the hy

[ovs-dev] [PATCH v2 1/2] OVN: add icmp6{} action support

2018-04-09 Thread Lorenzo Bianconi
icmp6 action is used to replace the IPv6 packet been processed with
an ICMPv6 packet initialized based on incoming IPv6 one.
Ethernet and IPv6 fields not listed are not changed:
- ip.proto = 58 (ICMPv6)
- ip.ttl = 255
- icmp6.type = 1 (destination unreachable)
- icmp6.code = 1 (communication administratively prohibited)
Prerequisite: ip6

Signed-off-by: Lorenzo Bianconi 
---
 include/ovn/actions.h |  5 ++--
 ovn/controller/pinctrl.c  | 73 +--
 ovn/lib/actions.c | 24 +++-
 ovn/ovn-sb.xml| 26 +
 ovn/utilities/ovn-trace.c | 30 +++
 tests/ovn.at  | 10 +++
 6 files changed, 144 insertions(+), 24 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 086ab19e0..3f58b72f8 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -65,6 +65,7 @@ struct ovn_extend_table;
 OVNACT(CLONE, ovnact_nest)\
 OVNACT(ARP,   ovnact_nest)\
 OVNACT(ICMP4, ovnact_nest)\
+OVNACT(ICMP6, ovnact_nest)\
 OVNACT(TCP_RESET, ovnact_nest)\
 OVNACT(ND_NA, ovnact_nest)\
 OVNACT(GET_ARP,   ovnact_get_mac_bind)\
@@ -432,11 +433,11 @@ enum action_opcode {
  */
 ACTION_OPCODE_ND_NS,
 
-/* "icmp4 { ...actions... }".
+/* "icmp4 { ...actions... } and icmp6 { ...actions... }".
  *
  * The actions, in OpenFlow 1.3 format, follow the action_header.
  */
-ACTION_OPCODE_ICMP4,
+ACTION_OPCODE_ICMP,
 
 /* "tcp_reset { ...actions... }".
  *
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index c816b2dd6..2f130994a 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -220,15 +220,16 @@ pinctrl_handle_arp(const struct flow *ip_flow, const 
struct match *md,
 }
 
 static void
-pinctrl_handle_icmp4(const struct flow *ip_flow, const struct match *md,
- struct ofpbuf *userdata)
+pinctrl_handle_icmp(const struct flow *ip_flow, struct dp_packet *pkt_in,
+const struct match *md, struct ofpbuf *userdata)
 {
 /* This action only works for IP packets, and the switch should only send
  * us IP packets this way, but check here just to be sure. */
-if (ip_flow->dl_type != htons(ETH_TYPE_IP)) {
+if (ip_flow->dl_type != htons(ETH_TYPE_IP) &&
+ip_flow->dl_type != htons(ETH_TYPE_IPV6)) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 VLOG_WARN_RL(&rl,
- "ICMP4 action on non-IP packet (eth_type 0x%"PRIx16")",
+ "ICMP action on non-IP packet (eth_type 0x%"PRIx16")",
  ntohs(ip_flow->dl_type));
 return;
 }
@@ -243,21 +244,50 @@ pinctrl_handle_icmp4(const struct flow *ip_flow, const 
struct match *md,
 struct eth_header *eh = dp_packet_put_zeros(&packet, sizeof *eh);
 eh->eth_dst = ip_flow->dl_dst;
 eh->eth_src = ip_flow->dl_src;
-eh->eth_type = htons(ETH_TYPE_IP);
-
-struct ip_header *nh = dp_packet_put_zeros(&packet, sizeof *nh);
-dp_packet_set_l3(&packet, nh);
-nh->ip_ihl_ver = IP_IHL_VER(5, 4);
-nh->ip_tot_len = htons(sizeof(struct ip_header) +
-   sizeof(struct icmp_header));
-nh->ip_proto = IPPROTO_ICMP;
-nh->ip_frag_off = htons(IP_DF);
-packet_set_ipv4(&packet, ip_flow->nw_src, ip_flow->nw_dst,
-ip_flow->nw_tos, 255);
-
-struct icmp_header *ih = dp_packet_put_zeros(&packet, sizeof *ih);
-dp_packet_set_l4(&packet, ih);
-packet_set_icmp(&packet, ICMP4_DST_UNREACH, 1);
+
+if (get_dl_type(ip_flow) == htons(ETH_TYPE_IP)) {
+struct ip_header *nh = dp_packet_put_zeros(&packet, sizeof *nh);
+
+eh->eth_type = htons(ETH_TYPE_IP);
+dp_packet_set_l3(&packet, nh);
+nh->ip_ihl_ver = IP_IHL_VER(5, 4);
+nh->ip_tot_len = htons(sizeof(struct ip_header) +
+   sizeof(struct icmp_header));
+nh->ip_proto = IPPROTO_ICMP;
+nh->ip_frag_off = htons(IP_DF);
+packet_set_ipv4(&packet, ip_flow->nw_src, ip_flow->nw_dst,
+ip_flow->nw_tos, 255);
+
+struct icmp_header *ih = dp_packet_put_zeros(&packet, sizeof *ih);
+dp_packet_set_l4(&packet, ih);
+packet_set_icmp(&packet, ICMP4_DST_UNREACH, 1);
+} else {
+struct ip6_hdr *nh = dp_packet_put_zeros(&packet, sizeof *nh);
+struct icmp6_error_header *ih;
+uint32_t icmpv6_csum;
+
+eh->eth_type = htons(ETH_TYPE_IPV6);
+dp_packet_set_l3(&packet, nh);
+nh->ip6_vfc = 0x60;
+nh->ip6_nxt = IPPROTO_ICMPV6;
+nh->ip6_plen = htons(sizeof(*nh) + ICMP6_ERROR_HEADER_LEN);
+packet_set_ipv6(&packet, &ip_flow->ipv6_src, &ip_flow->ipv6_dst,
+ip_flow->nw_tos, ip_

[ovs-dev] [PATCH v2 0/2] add icmp6 action support to ovn acl framework

2018-04-09 Thread Lorenzo Bianconi
Changes since v1:
- squashed ACTION_OPCODE_ICMP4 and ACTION_OPCODE_ICMP6 in ACTION_OPCODE_ICMP
- updated ovn-northd manpage
- added a NEWS item that describes the new features

Lorenzo Bianconi (2):
  OVN: add icmp6{} action support
  OVN: add icmp6 action to ovn acl reject support

 NEWS  |  4 +++
 include/ovn/actions.h |  5 ++--
 ovn/controller/pinctrl.c  | 73 +--
 ovn/lib/actions.c | 24 +++-
 ovn/northd/ovn-northd.c   | 25 ++--
 ovn/ovn-nb.xml|  3 +-
 ovn/ovn-sb.xml| 26 +
 ovn/utilities/ovn-trace.c | 30 +++
 tests/ovn.at  | 30 +++
 9 files changed, 192 insertions(+), 28 deletions(-)

-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] Configurable Link State Change (LSC) detection mode

2018-04-09 Thread Eelco Chaudron

On 05/04/18 15:17, Stokes, Ian wrote:

On 03/27/2018 04:07 PM, Stokes, Ian wrote:

On 27.03.2018 13:19, Stokes, Ian wrote:

It is possible to change LSC detection mode to polling or interrupt
mode for DPDK interfaces. The default is polling mode. To set
interrupt mode, option dpdk-lsc-interrupt has to be set to true.

In polling mode more processor time is needed, since the OVS
repeatedly reads the link state with a short period. It can lead to
packet loss for certain systems.

In interrupt mode the hardware itself triggers an interrupt when
link state change happens, so less processing time needs for the OVS.

For detailed description and usage see the dpdk install

documentation.

Could you, please, better describe why we need this change?
Because we're not removing the polling thread. OVS will still poll
the link states periodically. This config option has no effect on that

side.

Also, link state polling in OVS uses 'rte_eth_link_get_nowait()'
function which will be called in both cases and should not wait for
hardware reply in any implementation.

rte_eth_link_get_nowait() on Intel XL710 could take an excessive time to
respond. The following patch, https://dpdk.org/ml/archives/dev/2018-
March/092156.html is taking care of it from a DPDK side.

There might be other drivers that also take a long time, hence this patch
might still be useful in the future.


I believe it was related to a case where bonded mode in active back
was causing packet drops due to the frequency that the LSC was being
polled. Using interrupt based approach alleviated the issue. (I'm open
to correction on this :))

@Robert/Eelco You may be able to provide some more light here and

whether the patches below in DPDK resolve the issue?
This long delay can be an issue in bonding mode, as the links checks for
bonding interfaces is holding the RW lock in bond_run(). This same lock is
taken in the PMD thread when calling the bond_check_admissibility() for
upcall traffic.

There was recent bug fix for intel NICs that fixes waiting of an
admin queue on link state requests despite of 'no_wait' flag:
  http://dpdk.org/ml/archives/dev/2018-March/092156.html
Will this fix your target case?

So, the difference of execution time of 'rte_eth_link_get_nowait()'
with enabled and disabled interrupts should be not so significant.
Do you have performance measurements? Measurement with above fix

applied?
I do not have delay numbers but I know that we were no longer seeing
dropped traffic compared to other NICs under the same load with upcall
traffic present.

Thanks for working on this Robert.

I've completed some testing including the case where LSC is not

supported, in which case the port will remain in a down state and
fail rx/tx traffic. This behavior conforms to the netdev_reconfigure
expectations in the fail case so that's ok.

I'm not sure if this is acceptable. For example, we're not failing
reconfiguration in case of issues with number of queues. We're trying
different numbers until we have working configuration.
Maybe we need the same fall-back mechanism in case of not supported
LSC interrupts? (MTU setup errors are really uncommon unlike LSC

interrupts'

support in PMDs).

Thanks for raising this Ilya.

I thought of this as well. I'd like to see a fall back to the PMD but

didn’t see how it could be done in a clean way.

Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is

requested but not supported.

It doesn't give us a clue if the error is related to lsc mode as it

could also relate to a number of other configure issues such as
nb_rxq/nb_txq/portid etc.

It would be better if we could query the device via ethdev api to see if

it supports lsc interrupt mode but that’s not available currently.
Maybe a DPDK patch before we continue?

It's hard to say. As part of the call to rte_eth_dev_configure() there is a 
check specifically to see if lsc interrupt is supported with the following.

/* Check that the device supports requested interrupts */
if ((dev_conf->intr_conf.lsc == 1) &&
(!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
RTE_PMD_DEBUG_TRACE("driver %s does not support lsc\n",
dev->device->driver->name);
return -EINVAL;
}

Even if a patch was submitted to extend the API in DPDK to allow this check 
specifically, I feel it will be the same code as above, just in a new function 
name. The check would remain in rte_eth_dev_configure() anyway.
I was more hinting to the fact that the only way to get this device 
information is trough some global data set, 
rte_eth_devices[port_id].data->dev_flags, would be nice if a clean API 
existed. But if we are ok, we can access it directly from OVS.



We could do something similar in OVS for now to allow PMD fallback where it's 
not supported.

Ian



___
dev mailing list
d...@openvswitch.org
https:/

Re: [ovs-dev] [PATCH] ofp-actions: Correct execution of encap/decap actions in action set

2018-04-09 Thread Jan Scheurich
Hi Yi,

The assertion failure is indeed caused by the incorrect implementation of 
double encap() and should be fixed by the patch you mention (which is merged to 
master by now).

Prior to the below fix this happened with every encap(nsh) in an group bucket. 

I can't say why it still happens periodically every few minutes in your test. 
You'd need to carefully analyze a crash dump to try to understand the packet 
processing history that leads to a double encap() or perhaps decap().

It is definitely worth trying whether the problem is already resolved on the 
latest master.

BR, Jan

> -Original Message-
> From: Yang, Yi Y [mailto:yi.y.y...@intel.com]
> Sent: Sunday, 08 April, 2018 10:27
> To: Jan Scheurich ; d...@openvswitch.org
> Subject: RE: [PATCH] ofp-actions: Correct execution of encap/decap actions in 
> action set
> 
> Hi, Jan
> 
> Sangfor guy tried this one, he still encountered assert issue after ovs ran 
> for about 20 minutes, moreover it appeared periodically. I'm
> not sure if https://patchwork.ozlabs.org/patch/895405/ is helpful for this 
> issue. Do you think what the root cause is?
> 
> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Monday, March 26, 2018 3:36 PM
> To: d...@openvswitch.org
> Cc: Yang, Yi Y ; Jan Scheurich 
> 
> Subject: [PATCH] ofp-actions: Correct execution of encap/decap actions in 
> action set
> 
> The actions encap, decap and dec_nsh_ttl were wrongly flagged as set_field 
> actions in ofpact_is_set_or_move_action(). This caused
> them to be executed twice in the action set or a group bucket, once 
> explicitly in
> ofpacts_execute_action_set() and once again as part of the list of set_field 
> or move actions.
> 
> Fixes: f839892a ("OF support and translation of generic encap and decap")
> Fixes: 491e05c2 ("nsh: add dec_nsh_ttl action")
> 
> Signed-off-by: Jan Scheurich 
> 
> ---
> 
> The fix should be backported to OVS 2.9 and OVS 2.8 (without the case for 
> OFPACT_DEC_NSH_TTL introduced in 2.9).
> 
> 
>  lib/ofp-actions.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index db85716..87797bc 
> 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -6985,9 +6985,6 @@ ofpact_is_set_or_move_action(const struct ofpact *a)
>  case OFPACT_SET_TUNNEL:
>  case OFPACT_SET_VLAN_PCP:
>  case OFPACT_SET_VLAN_VID:
> -case OFPACT_ENCAP:
> -case OFPACT_DECAP:
> -case OFPACT_DEC_NSH_TTL:
>  return true;
>  case OFPACT_BUNDLE:
>  case OFPACT_CLEAR_ACTIONS:
> @@ -7025,6 +7022,9 @@ ofpact_is_set_or_move_action(const struct ofpact *a)
>  case OFPACT_WRITE_METADATA:
>  case OFPACT_DEBUG_RECIRC:
>  case OFPACT_DEBUG_SLOW:
> +case OFPACT_ENCAP:
> +case OFPACT_DECAP:
> +case OFPACT_DEC_NSH_TTL:
>  return false;
>  default:
>  OVS_NOT_REACHED();
> --
> 1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev