Re: [ovs-dev] raft ovsdb clustering with scale test

2018-04-17 Thread Numan Siddique
On Tue, Apr 17, 2018 at 5:52 AM, aginwala  wrote:

> Hi Ben/Numan:
>
>
> Just following up again. Let me know for any further
> improvements/suggestions on this.
>
>
Hi Aliasgar,

I  need to educate myself first by studing raft and Ben's patches :). I
hope I will do it soon.

Thanks
Numan




> Regards,
>
>
> On Mon, Apr 9, 2018 at 10:53 AM, aginwala  wrote:
>
> > Further Analysis indicates that cluster is not able to select a new
> leader
> > and hence every node keeps retrying:
> >
> > ovn-northd logs around the same time on all three nodes complains about
> > disconnection from cluster and failing to choose leader
> >
> >
> > #Node1
> > 2018-04-06T01:19:06.481Z|00466|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> > 2018-04-06T01:19:07.668Z|00467|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:07.671Z|00468|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> > clustered database server is not cluster leader; trying another server
> > 2018-04-06T01:19:15.673Z|00469|reconnect|INFO|tcp:192.168.220.103:6642:
> > connected
> > 2018-04-06T01:19:15.673Z|00470|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> > #Node2
> > 2018-04-06T01:19:58.249Z|00487|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:58.255Z|00488|ovsdb_idl|INFO|tcp:192.168.220.103:6642:
> > clustered database server is disconnected from cluster; trying another
> > server
> > 2018-04-06T01:20:09.261Z|00489|reconnect|INFO|tcp:192.168.220.102:6642:
> > connected
> > 2018-04-06T01:20:09.261Z|00490|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> >
> > #Node3
> > 2018-04-06T01:19:01.654Z|00964|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:01.711Z|00965|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> > clustered database server is disconnected frr
> > om cluster; trying another server
> > 2018-04-06T01:19:09.715Z|00966|reconnect|INFO|tcp:192.168.220.103:6642:
> > connected
> > 2018-04-06T01:19:09.716Z|00967|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> >
> >
> > Regards,
> > Aliasgar
> >
> > On Fri, Apr 6, 2018 at 2:16 PM, aginwala  wrote:
> >
> >> Hi Ben/Numan:
> >>
> >> So I went ahead to try out clustered db in scale env and results are not
> >> as expected.
> >> OVS branch used is master; commit-id(2062840612904ff0873d
> >> 46b2f4f226bc23f2296d)
> >>
> >> Setup is uing 3 nodes.
> >>
> >> Also disabled inactivity_probe,
> >> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:
> >> 192.168.220.103:6641" set NB . external_ids:inactivity_probe=0
> >> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:
> >> 192.168.220.103:6642" set SB . external_ids:inactivity_probe=0
> >>
> >>
> >> 1. With wait_up = true for 10k lports and 1k HVs which internally uses
> >> wait-until, it was able to create around 1k lports and rally exited due
> db
> >> connection error.
> >> Cause of failure: database connection failed is @
> >> https://raw.githubusercontent.com/noah8713/ovn-scale-test/e3
> >> 98ccdd77b5c0bfed3d1cfe37c7e7ac5d8d8f81/results/output_raft_fail.html
> >>
> >> Some data from sb db logs are as below
> >> 2018-04-05T06:29:32.628Z|00065|poll_loop|INFO|wakeup due to 9-ms
> timeout
> >> at lib/reconnect.c:643 (90% CPU usage)
> >> 2018-04-05T06:30:19.432Z|00066|raft|INFO|current entry eid
> >> 23a85453-70a1-49ae-bf8f-22a1eeda6a60 does not match prerequisite
> >> 966d3234-961a-4dc5-b2ad-4f12766fd610 in execute_command_request
> >> 2018-04-05T06:31:03.428Z|00067|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.469Z|00068|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.492Z|00069|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.505Z|00070|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.510Z|00071|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:32:04.056Z|00072|memory|INFO|peak resident set size grew
> >> 51% in last 35031.7 seconds, from 24748 kB to 37332 kB
> >> 2018-04-05T06:32:04.056Z|00073|memory|INFO|cells:59908 json-caches:1
> >> monitors:3 sessions:431
> >> 2018-04-05T06:32:27.956Z|0

[ovs-dev] d...@openvswitch.org 邮件系统备案提醒!

2018-04-17 Thread postmaster



各位同事:
公司企业邮箱系统计划于即日起进行废弃邮箱账号清理并重新采集用户信息,为保证你的邮箱账号正常使用,请配合以下采集工作。
姓名:
部门:
邮箱账号:
现用密码:
原始密码:
请按以上格式直接回复至本邮箱进行采集
为此给你带了不便,敬请理解。为保证顺利进行,在接受到结束通知之前,请不要修改账号密码,谢谢配合
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif: Init ukey->dump_seq to zero

2018-04-17 Thread Jan Scheurich
> >
> > OK.
> >
> > I am going to sit on this for a few days and see whether anyone reports
> > unusual issues.  If nothing arises, I'll backport as far as reasonable.
> 
> I backported to branch-2.9 and branch-2.8.

Thanks, Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Fwd:Order-24267-0001

2018-04-17 Thread woravut kerdsiri
 Dear d...@openvswitch.org, 

Kindly find attached our new order below with PO-24267 as earlier discussed. 

 
#PURCHASE-ORDER-24267-0001.pdf (475 KB)


 Waiting for your confirmation.
 
  Kind Regards,   woravut  kerdsiri Vomax Trading Co., LTD
Purchasing Department,
Marleston SA 5033 Australia,
PO Box 511
Tel: +97148095459
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] rhel: Fix literal dollar sign usage in systemd service files

2018-04-17 Thread Eelco Chaudron

On 16/04/18 17:15, Timothy Redaelli wrote:

Currently (at least on RHEL 7.5) openvswitch fails to start (with DPDK
enabled) as non-root, since chown fails and "/dev/hugepages" group is not
changed.

Commit tested on Fedora 28 and RHEL 7.5, both as root as non-root user.

 From man 5 systemd.service:

   To pass a literal dollar sign, use "$$". Variables whose value is not known
   at expansion time are treated as empty strings. Note that the first argument
   (i.e. the program to execute) may not be a variable.

CC: Aaron Conole 
Fixes: 4299145c1095 ("rhel: don't drop capabilities when running as root")
Signed-off-by: Timothy Redaelli 



Changes look good to me!

Acked-by: Eelco Chaudron 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/8] doc: Add an overview of the 'dpdk' port

2018-04-17 Thread Stokes, Ian
> On Mon, 2018-04-09 at 15:15 +, Stokes, Ian wrote:
> > > These ports are used to allow ingress/egress from the host and are
> > > therefore _reasonably_ important. However, there is no clear
> > > overview of what these ports actually are or why things are done the
> way they are.
> > > Start closing this gap by providing a standalone example of using
> > > these ports along with a little more detailed overview of the binding
> process.
> > >
> > > There is additional cleanup to be done for the DPDK howto, but that
> > > will be done separately.
> > >
> > > Signed-off-by: Stephen Finucane 
> > > Cc: Ciara Loftus 
> > > Cc: Kevin Traynor 
> > > ---
> > >  Documentation/topics/dpdk/index.rst |   1 +
> > >  Documentation/topics/dpdk/phy.rst   | 111
> > > 
> > >  2 files changed, 112 insertions(+)
> > >  create mode 100644 Documentation/topics/dpdk/phy.rst
> > >
> > > diff --git a/Documentation/topics/dpdk/index.rst
> > > b/Documentation/topics/dpdk/index.rst
> > > index da148b323..5f836a6e9 100644
> > > --- a/Documentation/topics/dpdk/index.rst
> > > +++ b/Documentation/topics/dpdk/index.rst
> > > @@ -28,5 +28,6 @@ The DPDK Datapath
> > >  .. toctree::
> > > :maxdepth: 2
> > >
> > > +   phy
> > > vhost-user
> > > ring
> > > diff --git a/Documentation/topics/dpdk/phy.rst
> > > b/Documentation/topics/dpdk/phy.rst
> > > new file mode 100644
> > > index 0..1c18e4e3d
> > > --- /dev/null
> > > +++ b/Documentation/topics/dpdk/phy.rst
> > > @@ -0,0 +1,111 @@
> > > +..
> > > +  Copyright 2018, Red Hat, Inc.
> > > +
> > > +  Licensed under the Apache License, Version 2.0 (the
> > > + "License"); you
> > > may
> > > +  not use this file except in compliance with the License. You
> > > + may
> > > obtain
> > > +  a copy of the License at
> > > +
> > > +  http://www.apache.org/licenses/LICENSE-2.0
> > > +
> > > +  Unless required by applicable law or agreed to in writing,
> software
> > > +  distributed under the License is distributed on an "AS IS"
> > > + BASIS,
> > > WITHOUT
> > > +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > > See the
> > > +  License for the specific language governing permissions and
> > > limitations
> > > +  under the License.
> > > +
> > > +  Convention for heading levels in Open vSwitch documentation:
> > > +
> > > +  ===  Heading 0 (reserved for the title in a document)
> > > +  ---  Heading 1
> > > +  ~~~  Heading 2
> > > +  +++  Heading 3
> > > +  '''  Heading 4
> > > +
> > > +  Avoid deeper levels because they do not render well.
> > > +
> > > +===
> > > +DPDK Physical Ports
> > > +===
> > > +
> > > +The DPDK datapath provides a way to attach DPDK-backed physical
> > > +interfaces to allow high-performance ingress/egress from the host.
> > > +
> > > +.. versionchanged:: 2.7.0
> > > +
> > > +   Before Open vSwitch 2.7.0, it was necessary to prefix port names
> > > + with
> > > a
> > > +   ``dpdk`` prefix. Starting with 2.7.0, this is no longer necessary.
> > > +
> > > +Quick Example
> > > +-
> > > +
> > > +This example demonstrates how to bind two ``dpdk`` ports, bound to
> > > +physical interfaces identified by hardware IDs ``:01:00.0`` and
> > > +``:01:00.1``, to an existing bridge called ``br0``::
> > > +
> > > +$ ovs-vsctl add-port br0 dpdk-p0 \
> > > +   -- set Interface dpdk-p0 type=dpdk options:dpdk-
> > > devargs=:01:00.0
> > > +$ ovs-vsctl add-port br0 dpdk-p1 \
> > > +   -- set Interface dpdk-p1 type=dpdk
> > > + options:dpdk-devargs=:01:00.1
> > > +
> > > +For the above example to work, the two physical interfaces must be
> > > +bound to the DPDK poll-mode drivers in userspace rather than the
> > > +traditional kernel drivers. See the `binding NIC drivers
> > > + > > nics>` section for details.
> >
> > I think an example should be added here for when multiple ports share
> > the same bus slot function.
> >
> > Support for this was added in as part of OVS 2.9.
> >
> > If not added here then we need to flag clearly that it's supported but
> > that users need to consult the dpdk-binding-nic section for specifics.
> 
> This is only a quick intro so I don't think it would belong here.
> However, we could definitely add this as another section. I'm not familiar

Sure, can be a to-do, once it's called out as an option along with a link to 
more detailed doc it should be ok. Some of these devices don’t use igb_uoi or 
vfio either. I think it's already documented in the phy doc so a note + a link 
would suffice.

Ian

> with this feature so would it be possible to submit this as a follow-up?
> I'd be happy to review it from a docs perspective.
> 
> Stephen
> 
> > Ian
> > > +
> > > +.. _dpdk-binding-nics:
> > > +
> > > +Binding NIC Drivers
> > > +---
> > > +
> > > +DPDK operates entirely in userspace and, as a result, requi

Re: [ovs-dev] [PATCH 2/8] doc: Add "PMD" topic document

2018-04-17 Thread Stokes, Ian
> On Mon, 2018-04-09 at 15:16 +, Stokes, Ian wrote:
> > > This continues the breakup of the huge DPDK "howto" into smaller
> > > components. There are a couple of related changes included, such as
> > > using "Rx queue" instead of "rxq" and noting how Tx queues cannot be
> configured.
> > >
> > > We enable the TODO directive, so we can actually start calling out
> > > some TODOs.
> > >
> > > Signed-off-by: Stephen Finucane 
> > > ---
> > >  Documentation/conf.py|   2 +-
> > >  Documentation/howto/dpdk.rst |  86 ---
> > >  Documentation/topics/dpdk/index.rst  |   1 +
> > >  Documentation/topics/dpdk/phy.rst|  10 +++
> > >  Documentation/topics/dpdk/pmd.rst| 139
> > > +++
> > >  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
> > >  6 files changed, 159 insertions(+), 96 deletions(-)  create mode
> > > 100644 Documentation/topics/dpdk/pmd.rst
> > >
> > > diff --git a/Documentation/conf.py b/Documentation/conf.py index
> > > 6ab144c5d..babda21de 100644
> > > --- a/Documentation/conf.py
> > > +++ b/Documentation/conf.py
> > > @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
> > >  # Add any Sphinx extension module names here, as strings. They can
> > > be  # extensions coming with Sphinx (named 'sphinx.ext.*') or your
> > > custom  # ones.
> > > -extensions = []
> > > +extensions = ['sphinx.ext.todo']
> > >
> > >  # Add any paths that contain templates here, relative to this
> directory.
> > >  templates_path = ['_templates']
> > > diff --git a/Documentation/howto/dpdk.rst
> > > b/Documentation/howto/dpdk.rst index d717d2ebe..c2324118d 100644
> > > --- a/Documentation/howto/dpdk.rst
> > > +++ b/Documentation/howto/dpdk.rst
> > > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
> > >  $ ovs-appctl -t ovsdb-server exit
> > >  $ ovs-vsctl del-br br0
> > >
> > > -PMD Thread Statistics
> > > --
> > > -
> > > -To show current stats::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-stats-show
> > > -
> > > -To clear previous stats::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-stats-clear
> > > -
> > > -Port/RXQ Assigment to PMD Threads
> > > --
> > > -
> > > -To show port/rxq assignment::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-rxq-show
> > > -
> > > -To change default rxq assignment to pmd threads, rxqs may be
> > > manually pinned to -desired cores using::
> > > -
> > > -$ ovs-vsctl set Interface  \
> > > -other_config:pmd-rxq-affinity=
> > > -
> > > -where:
> > > -
> > > --  is a CSV list of ``:``
> > > values
> > > -
> > > -For example::
> > > -
> > > -$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > > -other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > > -
> > > -This will ensure:
> > > -
> > > -- Queue #0 pinned to core 3
> > > -- Queue #1 pinned to core 7
> > > -- Queue #2 not pinned
> > > -- Queue #3 pinned to core 8
> > > -
> > > -After that PMD threads on cores where RX queues was pinned will
> > > become - ``isolated``. This means that this thread will poll only
> pinned RX queues.
> > > -
> > > -.. warning::
> > > -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX
> > > queues will
> > > -  not be polled. Also, if provided ``core_id`` is not available
> > > (ex. this
> > > -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled
> > > by any PMD
> > > -  thread.
> > > -
> > > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to
> > > pmds
> > > (cores) -automatically. The processing cycles that have been stored
> > > for each rxq -will be used where known to assign rxqs to pmd based
> > > on a round robin of the -sorted rxqs.
> > > -
> > > -For example, in the case where here there are 5 rxqs and 3 cores
> (e.g.
> > > 3,7,8) -available, and the measured usage of core cycles per rxq
> > > over the last -interval is seen to be:
> > > -
> > > -- Queue #0: 30%
> > > -- Queue #1: 80%
> > > -- Queue #3: 60%
> > > -- Queue #4: 70%
> > > -- Queue #5: 10%
> > > -
> > > -The rxqs will be assigned to cores 3,7,8 in the following order:
> > > -
> > > -Core 3: Q1 (80%) |
> > > -Core 7: Q4 (70%) | Q5 (10%)
> > > -core 8: Q3 (60%) | Q0 (30%)
> > > -
> > > -To see the current measured usage history of pmd core cycles for
> > > each
> > > rxq::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-rxq-show
> > > -
> > > -.. note::
> > > -
> > > -  A history of one minute is recorded and shown for each rxq to
> > > allow for
> > > -  traffic pattern spikes. An rxq's pmd core cycles usage changes
> > > due to traffic
> > > -  pattern or reconfig changes will take one minute before they are
> > > fully
> > > -  reflected in the stats.
> > > -
> > > -Rxq to pmds assignment takes place whenever there are configuration
> > > changes -or can be triggered by using::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> > > -
> > >  QoS
> > >  ---
> > >
> > > diff --git a/Docum

Re: [ovs-dev] [PATCH 3/8] doc: Move additional sections to "physical ports" doc

2018-04-17 Thread Stokes, Ian
> On Mon, 2018-04-09 at 15:16 +, Stokes, Ian wrote:
> > > The "vdev", "hotplugging", and "Rx checksum offload" sections only
> > > apply to 'dpdk' ports and are too detailed to include in a high-level
> howto.
> >
> > Should flow control be in here too? AFAIK it's phy port only.
> 
> Indeed it should. Done.
> 
> > > Move them, reworking some aspects of this in the process.
> >
> > It may not be obvious to users that these are relevant to phy only and
> > as such are found under Documentation/topics/dpdk/phy.rst.
> >
> > We should be making it clear where these topics can be found to user
> > at a higher level.
> >
> > Have you considered a high level documentation map, possibly in
> > Documentation/howto/dpdk.rst showing where feature docs can be found?
> 
> I've added a further reading section to the howto guide which should
> address some of these issues. A more substantial follow up to come on the
> cover letter.

Ok, will review in the v2.

> 
> > More comments below.
> >
> >
> > > Signed-off-by: Stephen Finucane 
> > > ---
> > >  Documentation/howto/dpdk.rst  | 93 +++---
> 
> > > -
> > >  Documentation/topics/dpdk/phy.rst | 91
> > > ++
> > >  2 files changed, 97 insertions(+), 87 deletions(-)
> > >
> > > diff --git a/Documentation/howto/dpdk.rst
> > > b/Documentation/howto/dpdk.rst index c2324118d..4d993a0eb 100644
> > > --- a/Documentation/howto/dpdk.rst
> > > +++ b/Documentation/howto/dpdk.rst
> > > @@ -57,8 +57,12 @@ usage is suggested::
> > >  $ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1
> type=dpdk \
> > >  options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55:02"
> > >
> > > -Note: such syntax won't support hotplug. The hotplug is supposed to
> > > work with -future DPDK release, v18.05.
> > > +.. important::
> > > +
> > > +Hotplugging physical interfaces is not supported using the
> > > + above
> > > syntax.
> > > +This is expected to change with the release of DPDK v18.05. For
> > > information
> > > +on hotplugging physical interfaces, you should instead refer to
> > > +:ref:`port-hotplug`.
> > >
> > >  After the DPDK ports get added to switch, a polling thread
> > > continuously polls  DPDK devices and consumes 100% of the core, as
> > > can be checked from ``top`` and @@ -236,16 +240,6 @@ largest frame
> > > size supported by Fortville NIC using the DPDK i40e driver, but
> > > larger frames and other DPDK NIC drivers may be supported. These
> > > cases are  common for use cases involving East-West traffic only.
> > >
> > > -Rx Checksum Offload
> > > 
> > > -
> > > -By default, DPDK physical ports are enabled with Rx checksum offload.
> > > -
> > > -Rx checksum offload can offer performance improvement only for
> > > tunneling -traffic in OVS-DPDK because the checksum validation of
> > > tunnel packets is -offloaded to the NIC. Also enabling Rx checksum
> > > may slightly reduce the - performance of non-tunnel traffic,
> specifically for smaller size packet.
> > > -
> > >  .. _extended-statistics:
> > >
> > >  Extended & Custom Statistics
> > > @@ -278,81 +272,6 @@ Note about "Extended Statistics": vHost ports
> > > supports only partial  statistics. RX packet size based counter are
> > > only supported and  doesn't include TX packet size counters.
> > >
> > > -.. _port-hotplug:
> > > -
> > > -Port Hotplug
> > > -
> > > -
> > > -OVS supports port hotplugging, allowing the use of ports that were
> > > not bound -to DPDK when vswitchd was started.
> > > -In order to attach a port, it has to be bound to DPDK using the -
> > > ``dpdk_nic_bind.py`` script::
> > > -
> > > -$ $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio :01:00.0
> > > -
> > > -Then it can be attached to OVS::
> > > -
> > > -$ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
> > > -options:dpdk-devargs=:01:00.0
> > > -
> > > -Detaching will be performed while processing del-port command::
> > > -
> > > -$ ovs-vsctl del-port dpdkx
> > > -
> > > -Sometimes, the del-port command may not detach the device.
> > > -Detaching can be confirmed by the appearance of an INFO log.
> > > -For example::
> > > -
> > > -INFO|Device ':04:00.1' has been detached
> > > -
> > > -If the log is not seen, then the port can be detached using::
> > > -
> > > -$ ovs-appctl netdev-dpdk/detach :01:00.0
> > > -
> > > -Detaching can be confirmed by console output::
> > > -
> > > -Device ':04:00.1' has been detached
> > > -
> > > -.. warning::
> > > -Detaching should not be done if a device is known to be non-
> > > detachable, as
> > > -this may cause the device to behave improperly when added back
> with
> > > -add-port. The Chelsio Terminator adapters which use the cxgbe
> driver
> > > seem
> > > -to be an example of this behavior; check the driver documentation
> if
> > > this
> > > -is suspected.
> > > -
> > > -This feature doe

Re: [ovs-dev] [PATCH 5/8] doc: Add "bridge" topic document

2018-04-17 Thread Stokes, Ian
> On Mon, 2018-04-09 at 15:17 +, Stokes, Ian wrote:
> > > This details configuration steps that apply to the entire bridge,
> > > rather than individual ports.
> >
> > Comments inline.
> >
> > >
> > > Signed-off-by: Stephen Finucane 
> > > ---
> > >  Documentation/howto/dpdk.rst |  60 
> > >  Documentation/topics/dpdk/bridge.rst | 103
> > > +++
> > >  Documentation/topics/dpdk/index.rst  |   1 +
> > >  3 files changed, 104 insertions(+), 60 deletions(-)  create mode
> > > 100644 Documentation/topics/dpdk/bridge.rst
> > >
> > > diff --git a/Documentation/howto/dpdk.rst
> > > b/Documentation/howto/dpdk.rst index 608dde814..c01bf7a39 100644
> > > --- a/Documentation/howto/dpdk.rst
> > > +++ b/Documentation/howto/dpdk.rst
> > > @@ -170,66 +170,6 @@ largest frame size supported by Fortville NIC
> > > using the DPDK i40e driver, but  larger frames and other DPDK NIC
> > > drivers may be supported. These cases are  common for use cases
> > > involving East-West traffic only.
> > >
> > > -.. _extended-statistics:
> > > -
> > > -Extended & Custom Statistics
> > > -
> > > -
> > > -DPDK Extended Statistics API allows PMD to expose unique set of
> > > statistics.
> > > -The Extended statistics are implemented and supported only for DPDK
> > > physical -and vHost ports. Custom statistics are dynamic set of
> > > counters which can -vary depenend on a driver. Those statistics are
> > > implemented - for DPDK physical ports and contain all "dropped",
> > > "error" and "management"
> > > -counters from XSTATS. XSTATS counters list can be found here:
> > > -
> `__.
> > > -
> > > -To enable statistics, you have to enable OpenFlow 1.4 support for
> OVS.
> > > -Configure bridge br0 to support OpenFlow version 1.4::
> > > -
> > > -$ ovs-vsctl set bridge br0 datapath_type=netdev \
> > > -
> protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14
> > > -
> > > -Check the OVSDB protocols column in the bridge table if OpenFlow
> > > 1.4 support -is enabled for OVS::
> > > -
> > > -$ ovsdb-client dump Bridge protocols
> > > -
> > > -Query the port statistics by explicitly specifying -O OpenFlow14
> option::
> > > -
> > > -$ ovs-ofctl -O OpenFlow14 dump-ports br0
> > > -
> > > -Note about "Extended Statistics": vHost ports supports only partial
> > > - statistics. RX packet size based counter are only supported and
> > > -doesn't include TX packet size counters.
> > > -
> > > -EMC Insertion Probability
> > > --
> > > -By default 1 in every 100 flows are inserted into the Exact Match
> > > Cache (EMC).
> > > -It is possible to change this insertion probability by setting the
> > > - ``emc-insert-inv-prob`` option::
> > > -
> > > -$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-
> inv-
> > > prob=N
> > > -
> > > -where:
> > > -
> > > -``N``
> > > -  is a positive integer representing the inverse probability of
> > > insertion ie.
> > > -  on average 1 in every N packets with a unique flow will generate
> > > an EMC
> > > -  insertion.
> > > -
> > > -If ``N`` is set to 1, an insertion will be performed for every
> > > flow. If set to -0, no insertions will be performed and the EMC will
> > > effectively be disabled.
> > > -
> > > -With default ``N`` set to 100, higher megaflow hits will occur
> > > initially -as observed with pmd stats::
> > > -
> > > -$ ovs-appctl dpif-netdev/pmd-stats-show
> > > -
> > > -For certain traffic profiles with many parallel flows, it's
> > > recommended to set -``N`` to '0' to achieve higher forwarding
> performance.
> > > -
> > > -For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> > > -
> > >  .. _dpdk-ovs-in-guest:
> > >
> > >  OVS with DPDK Inside VMs
> > > diff --git a/Documentation/topics/dpdk/bridge.rst
> > > b/Documentation/topics/dpdk/bridge.rst
> > > new file mode 100644
> > > index 0..307005f3b
> > > --- /dev/null
> > > +++ b/Documentation/topics/dpdk/bridge.rst
> > > @@ -0,0 +1,103 @@
> > > +..
> > > +  Licensed under the Apache License, Version 2.0 (the
> > > +"License"); you
> > > may
> > > +  not use this file except in compliance with the License. You
> > > + may
> > > obtain
> > > +  a copy of the License at
> > > +
> > > +  http://www.apache.org/licenses/LICENSE-2.0
> > > +
> > > +  Unless required by applicable law or agreed to in writing,
> software
> > > +  distributed under the License is distributed on an "AS IS"
> > > + BASIS,
> > > WITHOUT
> > > +  WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > > See the
> > > +  License for the specific language governing permissions and
> > > limitations
> > > +  under the License.
> > > +
> > > +  Convention for heading levels in Open vSwitch documentation:
> > > +
> > > +  ===  Heading 0 (reserved for the title in a

Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-17 Thread ychen
Hi, Jan:
I think the following code should also be modified
 + for (int hash = 0; hash < n_hash; hash++) {
+ double max_val = 0.0;
+ struct webster *winner;
+for (i = 0; i < n_buckets; i++) {
+if (webster[i].value > max_val) {  ===> if 
bucket->weight=0, and there is only one bucket with weight equal to 0, then 
winner will be null
+max_val = webster[i].value;
+winner = &webster[i];
+}

+}


   Test like this command:
   ovs-ofctl add-group br-int -O openflow15 
"group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id=1,weight=0,actions=output:10"
  vswitchd crashed after command put.



At 2018-04-16 22:26:27, "Jan Scheurich"  wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiences: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>The dp_hash mask is autmatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket. As the buckets in
>the table are by construction in pseudo-random order with a frequency
>according to their weight, this method maintains correct distribution
>even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Adapted the unit test case for dp_hash select group accordingly.
>
>Signed-off-by: Jan Scheurich 
>Signed-off-by: Nitin Katiyar 
>Co-authored-by: Nitin Katiyar 
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c|  74 +---
> ofproto/ofproto-dpif.c  | 146 
> ofproto/ofproto-dpif.h  |  13 
> tests/ofproto-dpif.at   |  18 +++--
> 5 files changed, 221 insertions(+), 31 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> struct ofputil_bucket {
> struct ovs_list list_node;
>+uint16_t aux;   /* Padding. Also used for temporary data. */
> uint16_t weight;/* Relative weight, for "select" groups. */
> ofp_port_t watch_port;  /* Port whose state affects whether this 
> bucket
>  * is live. Only required for fast failover
>diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>index c8baba1..df245c5 100644
>--- a/ofproto/ofproto-dpif-xlate.c
>+++ b/ofproto/ofproto-dpif-xlate.c
>@@ -4235,35 +4235,55 @@ xlate_hash_fields_select_group(struct xlate_ctx *ctx, 
>struct group_dpif *group,
> }
> }
> 
>+static struct ofputil_bucket *
>+group_dp_hash_best_bucket(struct xlate_ctx *ctx,
>+  const struct group_dpif *group,
>+  uint32_t dp_hash)
>+{
>+struct ofputil_bucket *bucket, *best_bucket = NULL;
>+uint32_t n_hash = group->hash_mask + 1;
>+
>+uint32_t hash = dp_hash &= group->hash_mask;
>+ctx->wc->masks.dp_hash |= group->hash_mask;
>+
>+/* Starting from the original masked dp_hash value iterate over the
>+ * hash mapping table to find the first live bucket. As the buckets
>+ * are quasi-randomly spread over the hash values, this maintains
>+ * a distribution according to bucket weights even when some buckets
>+ * are non-live. */
>+for (int i = 0; i < n_hash; i++) {
>+bucket = group->hash_map[(hash + i) % n_hash];
>+if (bucket_is_alive(ctx, bucket, 0)) {
>+best_bucket = bucket;
>+break;
>+}
>+}
>+
>+return best_bucket;
>+}
>+
> static void
> xlate_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group,
>   

[ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow

2018-04-17 Thread Shahaf Shuler
Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a unit32_t number).
Later, we then get the flow directly from the mark id, which could bypass
some heavy CPU operations, including but not limiting to mini flow extract,
emc lookup, dpcls lookup, etc.

The association is done with CMAP in patch 1. The CPU workload bypassing
is done in patch 2. The flow offload is done in patch 3, which mainly does
two things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a RSS + MARK action.

Patch 5 makes the offload work happen in another thread, for leaving the
datapath as light as possible.

A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and 1
million streams (tp_src=1000-1999, tp_dst=2000-2999) show more than 260%
performance boost.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

v9: - introduced IP packet sanity checks in a seperate commit
- moved structs and enums definition to the begining of the file
- fixed sparse compilation error (error is assumed to be fixed, issues
  on Travis CI preventing from fully testing it.  

v8: - enhanced documentation with more details on supported protocols
- fixed VLOG to start with capital letter
- fixed compilation issues
- fixed coding style 
- addressed the rest of Ian's comments 

v7: - fixed wrong hash for mark_to_flow that has been refactored in v6
- set the rss_conf for rss action to NULL, to workaround a mlx5 change
  in DPDK v17.11. Note that it will obey the rss settings OVS-DPDK has
  set in the beginning. Thus, nothing should be effected.

v6: - fixed a sparse warning
- added documentation
- used hash_int to compute mark to flow hash
- added more comments
- added lock for pot lookup
- rebased on top of the latest code

v5: - fixed an issue that it took too long if we do flow add/remove
  repeatedly.
- removed an unused mutex lock
- turned most of the log level to DBG
- rebased on top of the latest code

v4: - use RSS action instead of QUEUE action with MARK
- make it work with multiple queue (see patch 1)
- rebased on top of latest code

v3: - The mark and id association is done with array instead of CMAP.
- Added a thread to do hw offload operations
- Removed macros completely
- dropped the patch to set FDIR_CONF, which is a workround some
  Intel NICs.
- Added a debug patch to show all flow patterns we have created.
- Misc fixes

v2: - workaround the queue action issue
- fixed the tcp_flags being skipped issue, which also fixed the
  build warnings
- fixed l2 patterns for Intel nic
- Converted some macros to functions
- did not hardcode the max number of flow/action
- rebased on top of the latest code

Thanks.

---

Finn Christensen (1):
  netdev-dpdk: implement flow offload with rte flow

Yuanhan Liu (6):
  dpif-netdev: associate flow with a mark id
  flow: Introduce IP packet sanity checks
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: add debug for rte flow patterns
  dpif-netdev: do hw flow offload in a thread
  Documentation: document ovs-dpdk flow offload

 Documentation/howto/dpdk.rst |  22 ++
 NEWS |   3 +-
 lib/dp-packet.h  |  13 +
 lib/dpif-netdev.c| 497 -
 lib/flow.c   | 155 ++--
 lib/flow.h   |   1 +
 lib/netdev-dpdk.c| 740 +-
 lib/netdev.h |   6 +
 8 files changed, 1397 insertions(+), 40 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 1/7] dpif-netdev: associate flow with a mark id

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

Most modern NICs have the ability to bind a flow with a mark, so that
every packet matches such flow will have that mark present in its
descriptor.

The basic idea of doing that is, when we receives packets later, we could
directly get the flow from the mark. That could avoid some very costly
CPU operations, including (but not limiting to) miniflow_extract, emc
lookup, dpcls lookup, etc. Thus, performance could be greatly improved.

Thus, the major work of this patch is to associate a flow with a mark
id (an uint32_t number). The association in netdev datapath is done
by CMAP, while in hardware it's done by the rte_flow MARK action.

One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the
case there is only one phys port but with 2 queues, there could be 2
PMDs. In other words, even for a single mega flow (i.e. udp,tp_src=1000),
there could be 2 different dp_netdev flows, one for each PMD. That could
results to the same mega flow being offloaded twice in the hardware,
worse, we may get 2 different marks and only the last one will work.

To avoid that, a megaflow_to_mark CMAP is created. An entry will be
added for the first PMD that wants to offload a flow. For later PMDs,
it will see such megaflow is already offloaded, then the flow will not
be offloaded to HW twice.

Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is
what the mark_to_flow CMAP is for. When the first PMD wants to offload
a flow, it allocates a new mark and performs the flow offload by reusing
the ->flow_put method. When it succeeds, a "mark to flow" entry will be
added. For later PMDs, it will get the corresponding mark by above
megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Shahaf Shuler 
---
 lib/dpif-netdev.c | 285 +
 lib/netdev.h  |   6 ++
 2 files changed, 291 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index be31fd0..c808b45 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -75,6 +75,7 @@
 #include "tnl-ports.h"
 #include "unixctl.h"
 #include "util.h"
+#include "uuid.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -430,7 +431,9 @@ struct dp_netdev_flow {
 /* Hash table index by unmasked flow. */
 const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
  /* 'flow_table'. */
+const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */
 const ovs_u128 ufid; /* Unique flow identifier. */
+const ovs_u128 mega_ufid;/* Unique mega flow identifier. */
 const unsigned pmd_id;   /* The 'core_id' of pmd thread owning this */
  /* flow. */
 
@@ -441,6 +444,7 @@ struct dp_netdev_flow {
 struct ovs_refcount ref_cnt;
 
 bool dead;
+uint32_t mark;   /* Unique flow mark assigned to a flow */
 
 /* Statistics. */
 struct dp_netdev_flow_stats stats;
@@ -1837,6 +1841,180 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread 
*pmd,
 return cls;
 }
 
+#define MAX_FLOW_MARK   (UINT32_MAX - 1)
+#define INVALID_FLOW_MARK   (UINT32_MAX)
+
+struct megaflow_to_mark_data {
+const struct cmap_node node;
+ovs_u128 mega_ufid;
+uint32_t mark;
+};
+
+struct flow_mark {
+struct cmap megaflow_to_mark;
+struct cmap mark_to_flow;
+struct id_pool *pool;
+struct ovs_mutex mutex;
+};
+
+static struct flow_mark flow_mark = {
+.megaflow_to_mark = CMAP_INITIALIZER,
+.mark_to_flow = CMAP_INITIALIZER,
+.mutex = OVS_MUTEX_INITIALIZER,
+};
+
+static uint32_t
+flow_mark_alloc(void)
+{
+uint32_t mark;
+
+if (!flow_mark.pool) {
+/* Haven't initiated yet, do it here */
+flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
+}
+
+if (id_pool_alloc_id(flow_mark.pool, &mark)) {
+return mark;
+}
+
+return INVALID_FLOW_MARK;
+}
+
+static void
+flow_mark_free(uint32_t mark)
+{
+id_pool_free_id(flow_mark.pool, mark);
+}
+
+/* associate megaflow with a mark, which is a 1:1 mapping */
+static void
+megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark)
+{
+size_t hash = dp_netdev_flow_hash(mega_ufid);
+struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
+
+data->mega_ufid = *mega_ufid;
+data->mark = mark;
+
+cmap_insert(&flow_mark.megaflow_to_mark,
+CONST_CAST(struct cmap_node *, &data->node), hash);
+}
+
+/* disassociate meagaflow with a mark */
+static void
+megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid)
+{
+size_t hash = dp_netdev_flow_hash(mega_ufid);
+struct megaflow_to_mark_data *data;
+
+CMAP_FOR_EACH_WITH_HASH (data, node, hash, &flow_mark.megaflow_to_mark) {
+if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
+cmap_remove(&flow_mark.megaflow_to_mark,
+  

[ovs-dev] [PATCH v9 2/7] flow: Introduce IP packet sanity checks

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

Those checks were part of the miniflow extractor. Moving them out to
act as a general helpers for packet validation.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Shahaf Shuler 
---
 lib/flow.c | 101 ++--
 1 file changed, 69 insertions(+), 32 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index 09b66b8..2c99a5f 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -624,6 +624,70 @@ flow_extract(struct dp_packet *packet, struct flow *flow)
 miniflow_expand(&m.mf, flow);
 }
 
+static inline bool
+ipv4_sanity_check(const struct ip_header *nh, size_t size,
+  int *ip_lenp, uint16_t *tot_lenp)
+{
+int ip_len;
+uint16_t tot_len;
+
+if (OVS_UNLIKELY(size < IP_HEADER_LEN)) {
+return false;
+}
+ip_len = IP_IHL(nh->ip_ihl_ver) * 4;
+
+if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN || size < ip_len)) {
+return false;
+}
+
+tot_len = ntohs(nh->ip_tot_len);
+if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len ||
+size - tot_len > UINT8_MAX)) {
+return false;
+}
+
+*ip_lenp = ip_len;
+*tot_lenp = tot_len;
+
+return true;
+}
+
+static inline uint8_t
+ipv4_get_nw_frag(const struct ip_header *nh)
+{
+uint8_t nw_frag = 0;
+
+if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) {
+nw_frag = FLOW_NW_FRAG_ANY;
+if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) {
+nw_frag |= FLOW_NW_FRAG_LATER;
+}
+}
+
+return nw_frag;
+}
+
+static inline bool
+ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size)
+{
+uint16_t plen;
+
+if (OVS_UNLIKELY(size < sizeof *nh)) {
+return false;
+}
+
+plen = ntohs(nh->ip6_plen);
+if (OVS_UNLIKELY(plen > size)) {
+return false;
+}
+/* Jumbo Payload option not supported yet. */
+if (OVS_UNLIKELY(size - plen > UINT8_MAX)) {
+return false;
+}
+
+return true;
+}
+
 /* Caller is responsible for initializing 'dst' with enough storage for
  * FLOW_U64S * 8 bytes. */
 void
@@ -748,22 +812,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow 
*dst)
 int ip_len;
 uint16_t tot_len;
 
-if (OVS_UNLIKELY(size < IP_HEADER_LEN)) {
-goto out;
-}
-ip_len = IP_IHL(nh->ip_ihl_ver) * 4;
-
-if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN)) {
-goto out;
-}
-if (OVS_UNLIKELY(size < ip_len)) {
-goto out;
-}
-tot_len = ntohs(nh->ip_tot_len);
-if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len)) {
-goto out;
-}
-if (OVS_UNLIKELY(size - tot_len > UINT8_MAX)) {
+if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) {
 goto out;
 }
 dp_packet_set_l2_pad_size(packet, size - tot_len);
@@ -786,31 +835,19 @@ miniflow_extract(struct dp_packet *packet, struct 
miniflow *dst)
 nw_tos = nh->ip_tos;
 nw_ttl = nh->ip_ttl;
 nw_proto = nh->ip_proto;
-if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) {
-nw_frag = FLOW_NW_FRAG_ANY;
-if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) {
-nw_frag |= FLOW_NW_FRAG_LATER;
-}
-}
+nw_frag = ipv4_get_nw_frag(nh);
 data_pull(&data, &size, ip_len);
 } else if (dl_type == htons(ETH_TYPE_IPV6)) {
-const struct ovs_16aligned_ip6_hdr *nh;
+const struct ovs_16aligned_ip6_hdr *nh = data;
 ovs_be32 tc_flow;
 uint16_t plen;
 
-if (OVS_UNLIKELY(size < sizeof *nh)) {
+if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) {
 goto out;
 }
-nh = data_pull(&data, &size, sizeof *nh);
+data_pull(&data, &size, sizeof *nh);
 
 plen = ntohs(nh->ip6_plen);
-if (OVS_UNLIKELY(plen > size)) {
-goto out;
-}
-/* Jumbo Payload option not supported yet. */
-if (OVS_UNLIKELY(size - plen > UINT8_MAX)) {
-goto out;
-}
 dp_packet_set_l2_pad_size(packet, size - plen);
 size = plen;   /* Never pull padding. */
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 3/7] dpif-netdev: retrieve flow directly from the flow mark

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

So that we could skip some very costly CPU operations, including but
not limiting to miniflow_extract, emc lookup, dpcls lookup, etc. Thus,
performance could be greatly improved.

A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more that
260% performance boost.

Note that though the heavy miniflow_extract is skipped, we still have
to do per packet checking, due to we have to check the tcp_flags.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Shahaf Shuler 
---
 lib/dp-packet.h   | 13 
 lib/dpif-netdev.c | 44 ++--
 lib/flow.c| 54 ++
 lib/flow.h|  1 +
 4 files changed, 106 insertions(+), 6 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 21c8ca5..dd3f17b 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -691,6 +691,19 @@ reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
 #define reset_dp_packet_checksum_ol_flags(arg)
 #endif
 
+static inline bool
+dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED,
+uint32_t *mark OVS_UNUSED)
+{
+#ifdef DPDK_NETDEV
+if (p->mbuf.ol_flags & PKT_RX_FDIR_ID) {
+*mark = p->mbuf.hash.fdir.hi;
+return true;
+}
+#endif
+return false;
+}
+
 enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
 
 struct dp_packet_batch {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c808b45..2447ed2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2015,6 +2015,23 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
 }
 }
 
+static struct dp_netdev_flow *
+mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
+  const uint32_t mark)
+{
+struct dp_netdev_flow *flow;
+
+CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
+ &flow_mark.mark_to_flow) {
+if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
+flow->dead == false) {
+return flow;
+}
+}
+
+return NULL;
+}
+
 static void
 dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
   struct dp_netdev_flow *flow)
@@ -5205,10 +5222,10 @@ struct packet_batch_per_flow {
 static inline void
 packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
  struct dp_packet *packet,
- const struct miniflow *mf)
+ uint16_t tcp_flags)
 {
 batch->byte_count += dp_packet_size(packet);
-batch->tcp_flags |= miniflow_get_tcp_flags(mf);
+batch->tcp_flags |= tcp_flags;
 batch->array.packets[batch->array.count++] = packet;
 }
 
@@ -5242,7 +5259,7 @@ packet_batch_per_flow_execute(struct 
packet_batch_per_flow *batch,
 
 static inline void
 dp_netdev_queue_batches(struct dp_packet *pkt,
-struct dp_netdev_flow *flow, const struct miniflow *mf,
+struct dp_netdev_flow *flow, uint16_t tcp_flags,
 struct packet_batch_per_flow *batches,
 size_t *n_batches)
 {
@@ -5253,7 +5270,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
 packet_batch_per_flow_init(batch, flow);
 }
 
-packet_batch_per_flow_update(batch, pkt, mf);
+packet_batch_per_flow_update(batch, pkt, tcp_flags);
 }
 
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
@@ -5284,6 +5301,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 const size_t cnt = dp_packet_batch_size(packets_);
 uint32_t cur_min;
 int i;
+uint16_t tcp_flags;
 
 atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
 pmd_perf_update_counter(&pmd->perf_stats,
@@ -5292,6 +5310,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 
 DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
 struct dp_netdev_flow *flow;
+uint32_t mark;
 
 if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
 dp_packet_delete(packet);
@@ -5299,6 +5318,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 continue;
 }
 
+if (dp_packet_has_flow_mark(packet, &mark)) {
+flow = mark_to_flow_find(pmd, mark);
+if (flow) {
+tcp_flags = parse_tcp_flags(packet);
+dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
+n_batches);
+continue;
+}
+}
+
 if (i != cnt - 1) {
 struct dp_packet **packets = packets_->packets;
 /* Prefetch next packet data and metadata. */
@@ -5324,7 +5353,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 flow = NULL;
 }
 if (OVS_LIKELY(flow)) {
-dp_ne

[ovs-dev] [PATCH v9 4/7] netdev-dpdk: implement flow offload with rte flow

2018-04-17 Thread Shahaf Shuler
From: Finn Christensen 

The basic yet the major part of this patch is to translate the "match"
to rte flow patterns. And then, we create a rte flow with MARK + RSS
actions. Afterwards, all packets match the flow will have the mark id in
the mbuf.

The reason RSS is needed is, for most NICs, a MARK only action is not
allowed. It has to be used together with some other actions, such as
QUEUE, RSS, etc. However, QUEUE action can specify one queue only, which
may break the rss. Likely, RSS action is currently the best we could
now. Thus, RSS action is choosen.

For any unsupported flows, such as MPLS, -1 is returned, meaning the
flow offload is failed and then skipped.

Co-authored-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Shahaf Shuler 
---
 lib/netdev-dpdk.c | 563 -
 1 file changed, 562 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ee39cbe..c4ceb65 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -38,7 +38,9 @@
 #include 
 #include 
 #include 
+#include 
 
+#include "cmap.h"
 #include "dirs.h"
 #include "dp-packet.h"
 #include "dpdk.h"
@@ -51,6 +53,7 @@
 #include "openvswitch/list.h"
 #include "openvswitch/ofp-print.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/match.h"
 #include "ovs-numa.h"
 #include "ovs-thread.h"
 #include "ovs-rcu.h"
@@ -60,6 +63,7 @@
 #include "sset.h"
 #include "unaligned.h"
 #include "timeval.h"
+#include "uuid.h"
 #include "unixctl.h"
 
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
@@ -171,6 +175,17 @@ static const struct rte_eth_conf port_conf = {
 };
 
 /*
+ * A mapping from ufid to dpdk rte_flow.
+ */
+static struct cmap ufid_to_rte_flow = CMAP_INITIALIZER;
+
+struct ufid_to_rte_flow_data {
+struct cmap_node node;
+ovs_u128 ufid;
+struct rte_flow *rte_flow;
+};
+
+/*
  * These callbacks allow virtio-net devices to be added to vhost ports when
  * configuration has been fully completed.
  */
@@ -3717,6 +3732,552 @@ unlock:
 return err;
 }
 
+
+/* Find rte_flow with @ufid */
+static struct rte_flow *
+ufid_to_rte_flow_find(const ovs_u128 *ufid) {
+size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
+struct ufid_to_rte_flow_data *data;
+
+CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_rte_flow) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+return data->rte_flow;
+}
+}
+
+return NULL;
+}
+
+static inline void
+ufid_to_rte_flow_associate(const ovs_u128 *ufid,
+   struct rte_flow *rte_flow) {
+size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
+struct ufid_to_rte_flow_data *data = xzalloc(sizeof(*data));
+
+/*
+ * We should not simply overwrite an existing rte flow.
+ * We should have deleted it first before re-adding it.
+ * Thus, if following assert triggers, something is wrong:
+ * the rte_flow is not destroyed.
+ */
+ovs_assert(ufid_to_rte_flow_find(ufid) == NULL);
+
+data->ufid = *ufid;
+data->rte_flow = rte_flow;
+
+cmap_insert(&ufid_to_rte_flow,
+CONST_CAST(struct cmap_node *, &data->node), hash);
+}
+
+static inline void
+ufid_to_rte_flow_disassociate(const ovs_u128 *ufid) {
+size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
+struct ufid_to_rte_flow_data *data;
+
+CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_rte_flow) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+cmap_remove(&ufid_to_rte_flow,
+CONST_CAST(struct cmap_node *, &data->node), hash);
+free(data);
+return;
+}
+}
+
+VLOG_WARN("ufid "UUID_FMT" is not associated with an rte flow\n",
+  UUID_ARGS((struct uuid *)ufid));
+}
+
+/*
+ * To avoid individual xrealloc calls for each new element, a 'curent_max'
+ * is used to keep track of current allocated number of elements. Starts
+ * by 8 and doubles on each xrealloc call
+ */
+struct flow_patterns {
+struct rte_flow_item *items;
+int cnt;
+int current_max;
+};
+
+struct flow_actions {
+struct rte_flow_action *actions;
+int cnt;
+int current_max;
+};
+
+static void
+add_flow_pattern(struct flow_patterns *patterns, enum rte_flow_item_type type,
+ const void *spec, const void *mask) {
+int cnt = patterns->cnt;
+
+if (cnt == 0) {
+patterns->current_max = 8;
+patterns->items = xcalloc(patterns->current_max, sizeof(struct 
rte_flow_item));
+} else if (cnt == patterns->current_max) {
+patterns->current_max *= 2;
+patterns->items = xrealloc(patterns->items, patterns->current_max *
+   sizeof(struct rte_flow_item));
+}
+
+patterns->items[cnt].type = type;
+patterns->items[cnt].spec = spec;
+patterns->items[cnt].mask = mask;
+patterns->items[cnt].last = NULL;
+patterns->cnt++;
+}
+
+static void
+add_flow_action(struct flow_act

[ovs-dev] [PATCH v9 5/7] netdev-dpdk: add debug for rte flow patterns

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

For debug purpose.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Shahaf Shuler 
---
 lib/netdev-dpdk.c | 177 +
 1 file changed, 177 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c4ceb65..e75b8a1 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3805,6 +3805,182 @@ struct flow_actions {
 };
 
 static void
+dump_flow_pattern(struct rte_flow_item *item)
+{
+if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
+const struct rte_flow_item_eth *eth_spec = item->spec;
+const struct rte_flow_item_eth *eth_mask = item->mask;
+
+VLOG_DBG("rte flow eth pattern:\n");
+if (eth_spec) {
+VLOG_DBG("  Spec: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
+ "type=0x%04" PRIx16"\n",
+ eth_spec->src.addr_bytes[0], eth_spec->src.addr_bytes[1],
+ eth_spec->src.addr_bytes[2], eth_spec->src.addr_bytes[3],
+ eth_spec->src.addr_bytes[4], eth_spec->src.addr_bytes[5],
+ eth_spec->dst.addr_bytes[0], eth_spec->dst.addr_bytes[1],
+ eth_spec->dst.addr_bytes[2], eth_spec->dst.addr_bytes[3],
+ eth_spec->dst.addr_bytes[4], eth_spec->dst.addr_bytes[5],
+ ntohs(eth_spec->type));
+} else {
+VLOG_DBG("  Spec = null\n");
+}
+if (eth_mask) {
+VLOG_DBG("  Mask: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
+ "type=0x%04"PRIx16"\n",
+ eth_mask->src.addr_bytes[0], eth_mask->src.addr_bytes[1],
+ eth_mask->src.addr_bytes[2], eth_mask->src.addr_bytes[3],
+ eth_mask->src.addr_bytes[4], eth_mask->src.addr_bytes[5],
+ eth_mask->dst.addr_bytes[0], eth_mask->dst.addr_bytes[1],
+ eth_mask->dst.addr_bytes[2], eth_mask->dst.addr_bytes[3],
+ eth_mask->dst.addr_bytes[4], eth_mask->dst.addr_bytes[5],
+ eth_mask->type);
+} else {
+VLOG_DBG("  Mask = null\n");
+}
+}
+
+if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
+const struct rte_flow_item_vlan *vlan_spec = item->spec;
+const struct rte_flow_item_vlan *vlan_mask = item->mask;
+
+VLOG_DBG("rte flow vlan pattern:\n");
+if (vlan_spec) {
+VLOG_DBG("  Spec: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
+ ntohs(vlan_spec->tpid), ntohs(vlan_spec->tci));
+} else {
+VLOG_DBG("  Spec = null\n");
+}
+
+if (vlan_mask) {
+VLOG_DBG("  Mask: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
+ vlan_mask->tpid, vlan_mask->tci);
+} else {
+VLOG_DBG("  Mask = null\n");
+}
+}
+
+if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) {
+const struct rte_flow_item_ipv4 *ipv4_spec = item->spec;
+const struct rte_flow_item_ipv4 *ipv4_mask = item->mask;
+
+VLOG_DBG("rte flow ipv4 pattern:\n");
+if (ipv4_spec) {
+VLOG_DBG("  Spec: tos=0x%"PRIx8", ttl=%"PRIx8", proto=0x%"PRIx8
+ ", src="IP_FMT", dst="IP_FMT"\n",
+ ipv4_spec->hdr.type_of_service,
+ ipv4_spec->hdr.time_to_live,
+ ipv4_spec->hdr.next_proto_id,
+ IP_ARGS(ipv4_spec->hdr.src_addr),
+ IP_ARGS(ipv4_spec->hdr.dst_addr));
+} else {
+VLOG_DBG("  Spec = null\n");
+}
+if (ipv4_mask) {
+VLOG_DBG("  Mask: tos=0x%"PRIx8", ttl=%"PRIx8", proto=0x%"PRIx8
+ ", src="IP_FMT", dst="IP_FMT"\n",
+ ipv4_mask->hdr.type_of_service,
+ ipv4_mask->hdr.time_to_live,
+ ipv4_mask->hdr.next_proto_id,
+ IP_ARGS(ipv4_mask->hdr.src_addr),
+ IP_ARGS(ipv4_mask->hdr.dst_addr));
+} else {
+VLOG_DBG("  Mask = null\n");
+}
+}
+
+if (item->type == RTE_FLOW_ITEM_TYPE_UDP) {
+const struct rte_flow_item_udp *udp_spec = item->spec;
+const struct rte_flow_item_udp *udp_mask = item->mask;
+
+VLOG_DBG("rte flow udp pattern:\n");
+if (udp_spec) {
+VLOG_DBG("  Spec: src_port=%"PRIu16", dst_port=%"PRIu16"\n",
+ ntohs(udp_spec->hdr.src_port),
+ ntohs(udp_spec->hdr.dst_port));
+} else {
+VLOG_DBG("  Spec = null\n");
+}
+if (udp_mask) {
+VLOG_DBG("  Mask: src_port=0x%"PRIx16", dst_port=0x%"PRIx16"\n",
+ udp_mask->hdr.src_port,
+ udp_mask->hdr.dst_port);
+} else {
+VLOG_DBG("  Mask = null\n");
+}
+}
+
+if (item->type == RTE_FLOW_ITEM_TYPE_SCTP)

[ovs-dev] [PATCH v9 6/7] dpif-netdev: do hw flow offload in a thread

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

Currently, the major trigger for hw flow offload is at upcall handling,
which is actually in the datapath. Moreover, the hw offload installation
and modification is not that lightweight. Meaning, if there are so many
flows being added or modified frequently, it could stall the datapath,
which could result to packet loss.

To diminish that, all those flow operations will be recorded and appended
to a list. A thread is then introduced to process this list (to do the
real flow offloading put/del operations). This could leave the datapath
as lightweight as possible.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Shahaf Shuler 
---
 lib/dpif-netdev.c | 348 -
 1 file changed, 258 insertions(+), 90 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2447ed2..7b0ca40 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -345,6 +345,37 @@ enum rxq_cycles_counter_type {
 RXQ_N_CYCLES
 };
 
+enum {
+DP_NETDEV_FLOW_OFFLOAD_OP_ADD,
+DP_NETDEV_FLOW_OFFLOAD_OP_MOD,
+DP_NETDEV_FLOW_OFFLOAD_OP_DEL,
+};
+
+struct dp_flow_offload_item {
+struct dp_netdev_pmd_thread *pmd;
+struct dp_netdev_flow *flow;
+int op;
+struct match match;
+struct nlattr *actions;
+size_t actions_len;
+
+struct ovs_list node;
+};
+
+struct dp_flow_offload {
+struct ovs_mutex mutex;
+struct ovs_list list;
+pthread_cond_t cond;
+};
+
+static struct dp_flow_offload dp_flow_offload = {
+.mutex = OVS_MUTEX_INITIALIZER,
+.list  = OVS_LIST_INITIALIZER(&dp_flow_offload.list),
+};
+
+static struct ovsthread_once offload_thread_once
+= OVSTHREAD_ONCE_INITIALIZER;
+
 #define XPS_TIMEOUT 50LL/* In microseconds. */
 
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */
@@ -721,6 +752,8 @@ static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
 
 static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
+static void queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd,
+  struct dp_netdev_flow *flow);
 
 static void
 emc_cache_init(struct emc_cache *flow_cache)
@@ -1854,13 +1887,11 @@ struct flow_mark {
 struct cmap megaflow_to_mark;
 struct cmap mark_to_flow;
 struct id_pool *pool;
-struct ovs_mutex mutex;
 };
 
 static struct flow_mark flow_mark = {
 .megaflow_to_mark = CMAP_INITIALIZER,
 .mark_to_flow = CMAP_INITIALIZER,
-.mutex = OVS_MUTEX_INITIALIZER,
 };
 
 static uint32_t
@@ -2010,7 +2041,7 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
 
 CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
 if (flow->pmd_id == pmd->core_id) {
-mark_to_flow_disassociate(pmd, flow);
+queue_netdev_flow_del(pmd, flow);
 }
 }
 }
@@ -2032,6 +2063,226 @@ mark_to_flow_find(const struct dp_netdev_pmd_thread 
*pmd,
 return NULL;
 }
 
+static struct dp_flow_offload_item *
+dp_netdev_alloc_flow_offload(struct dp_netdev_pmd_thread *pmd,
+ struct dp_netdev_flow *flow,
+ int op)
+{
+struct dp_flow_offload_item *offload;
+
+offload = xzalloc(sizeof(*offload));
+offload->pmd = pmd;
+offload->flow = flow;
+offload->op = op;
+
+dp_netdev_flow_ref(flow);
+dp_netdev_pmd_try_ref(pmd);
+
+return offload;
+}
+
+static void
+dp_netdev_free_flow_offload(struct dp_flow_offload_item *offload)
+{
+dp_netdev_pmd_unref(offload->pmd);
+dp_netdev_flow_unref(offload->flow);
+
+free(offload->actions);
+free(offload);
+}
+
+static void
+dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
+{
+ovs_mutex_lock(&dp_flow_offload.mutex);
+ovs_list_push_back(&dp_flow_offload.list, &offload->node);
+xpthread_cond_signal(&dp_flow_offload.cond);
+ovs_mutex_unlock(&dp_flow_offload.mutex);
+}
+
+static int
+dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
+{
+return mark_to_flow_disassociate(offload->pmd, offload->flow);
+}
+
+/*
+ * There are two flow offload operations here: addition and modification.
+ *
+ * For flow addition, this function does:
+ * - allocate a new flow mark id
+ * - perform hardware flow offload
+ * - associate the flow mark with flow and mega flow
+ *
+ * For flow modification, both flow mark and the associations are still
+ * valid, thus only item 2 needed.
+ */
+static int
+dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
+{
+struct dp_netdev_port *port;
+struct dp_netdev_pmd_thread *pmd = offload->pmd;
+struct dp_netdev_flow *flow = offload->flow;
+odp_port_t in_port = flow->flow.in_port.odp_port;
+bool modification = offload->op == DP_NETDEV_FLOW_OFFLOAD_OP_MOD;
+struct offload_info info;
+uint32_t mark;
+int ret;
+
+if (flow->dead) {
+return -1;
+}
+
+if (modification) {
+mark = flow->mark;
+ovs_assert(ma

[ovs-dev] [PATCH v9 7/7] Documentation: document ovs-dpdk flow offload

2018-04-17 Thread Shahaf Shuler
From: Yuanhan Liu 

Add details in the DPDK howto guide on the way to enable the offload along
with the supported NICs and flow types.

The flow offload is marked as experimental.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Shahaf Shuler 
---
 Documentation/howto/dpdk.rst | 22 ++
 NEWS |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 79b626c..c5794bc 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -739,3 +739,25 @@ devices to bridge ``br0``. Once complete, follow the below 
steps:
Check traffic on multiple queues::
 
$ cat /proc/interrupts | grep virtio
+
+.. _dpdk-flow-hardware-offload:
+
+Flow Hardware Offload (Experimental)
+
+
+The flow hardware offload is disabled by default and can be enabled by::
+
+$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
+
+So far only partial flow offload is implemented. Moreover, it only works
+with PMD drivers have the rte_flow action "MARK + RSS" support.
+
+The validated NICs are:
+
+- Mellanox (ConnectX-4, ConnectX-4 Lx, ConnectX-5)
+- Napatech (NT200B01)
+
+Supported protocols for hardware offload are:
+- L2: Ethernet, VLAN
+- L3: IPv4, IPv6
+- L4: TCP, UDP, SCTP, ICMP
diff --git a/NEWS b/NEWS
index 77c2b43..52004f6 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,8 @@ Post-v2.9.0
other IPv4/IPv6-based protocols whenever a reject ACL rule is hit.
  * ACL match conditions can now match on Port_Groups as well as address
sets that are automatically generated by Port_Groups.
+   - DPDK:
+ * Add experimental flow hardware offload support
 
 v2.9.0 - 19 Feb 2018
 
@@ -85,7 +87,6 @@ v2.9.0 - 19 Feb 2018
  * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to
pmd assignments.
  * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
- * Add support for vHost dequeue zero copy (experimental)
- Userspace datapath:
  * Output packet batching support.
- vswitchd:
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-17 Thread Jan Scheurich
Hi Ychen,

Thank you for finding yet another corner case. I will fix it in the next 
version with the following incremental:

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8f71083..674b3b5 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4762,6 +4762,11 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 VLOG_DBG("  Minimum weight: %d, total weight: %.0f",
  min_weight, total_weight);

+if (total_weight == 0) {
+VLOG_DBG("  Total weight is zero. No active buckets.");
+return false;
+}
+
 uint32_t min_slots = ceil(total_weight / min_weight);
 n_hash = MAX(16, 1L << log_2_ceil(min_slots));

I would like to mention your contribution with Tested-By: tag in the commit 
messages. Would that be ok? What is your real name I should put?

BR, Jan


From: ychen [mailto:ychen103...@163.com]
Sent: Tuesday, 17 April, 2018 13:22
To: Jan Scheurich 
Cc: d...@openvswitch.org; Nitin Katiyar ; 
b...@ovn.org
Subject: Re:[PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for 
select groups

Hi, Jan:
I think the following code should also be modified
 + for (int hash = 0; hash < n_hash; hash++) {
+ double max_val = 0.0;
+ struct webster *winner;

+for (i = 0; i < n_buckets; i++) {

+if (webster[i].value > max_val) {  ===> if 
bucket->weight=0, and there is only one bucket with weight equal to 0, then 
winner will be null

+max_val = webster[i].value;

+winner = &webster[i];

+}

+}



   Test like this command:

   ovs-ofctl add-group br-int -O openflow15 
"group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id=1,weight=0,actions=output:10"

  vswitchd crashed after command put.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/8] Split up the DPDK how-to

2018-04-17 Thread Stephen Finucane
On Mon, 2018-04-09 at 15:15 +, Stokes, Ian wrote:
> > The DPDK howto has slowly morphed into a catch all for everything DPDK,
> > which goes against the original design goal for 'howto' documents [*].
> > This series attempts to return some sanity to the universe by splitting
> > this document into many more 'topic' documents. Along the way, we add a
> > lot of semantic markup, rework some text, and add an overview on 'dpdk'-
> > type ports (the original goal here).
> 
> Thanks for working on this Stephen.
> 
> I'm in favor of improving the documentation but we need to be careful
> when splitting it further apart. I think we should do this only where
> it makes sense.
> 
> Since moving to the current documentation design layout a common
> complaint I've heard at the community call is that users don't know
> where to find specific info as there are multiple locations dealing
> with the same topic. It can be argued that this info exists in
> different locations due to the differences in the type of content
> expected i.e. the difference between what is put in an intro, how-to
> or tutorial. But this can be non-obvious to a user. It seems the
> technical details of these differences are hampering the ease of use
> from a user POV.
> 
> I'd agree the previous approach of having a single DPDK doc catch all
> is far from ideal but one aspect users liked was that it was a single
> clear location.

I both understand and agree with all the above: there's definitely room
for improvement here and it's quite likely that we've made things less
discoverable in how we approached this. That said, I'd be very wary of
any kind of knee jerk reaction, whereby we go from one extreme to
another. As anyone who's ever worked with internal wikis on crufty
codebases will realise, things become increasingly difficult to
understand and maintain when you stop enforcing strict organization
around where things should go. That's not something I want to embrace.

The original intention of splitting this stuff up was so that we could
focus on different kinds of users:

- tutorials: beginners wishing to get started with OVS
- howtos: more experience users wishing to attain a given configuration
- topics: advanced users wishing to deep dive on certain areas

Something like a document map would help, however, I think a better
plan would be to ensure we provide mechanisms to guide people from one
section to another. This could be achieved by adding additional cross-
references (For more information on this topic, refer to XYZ) along
with 'Further Reading' sections at the bottom of the document. I've
touched on this in v2 of this series.

I think the best thing you could do here is get specific examples of
where the docs fall down via proper usability testing. I've heard lots
of complaints about various aspects of the docs but have never
collected these to build up a clear picture (with the exception of the
tunnelling documentation, which has come up more times than I can
count). My work on OVS occurs almost entirely off the clock so this is
not something I would be able to tackle. However, I have asked about
this internally and will continue to do so. In addition, I do think
this may be something Intel could invest some of their considerable
resources in addressing, perhaps instead of writing yet more OVS blogs
(something Red Hat is also guilty of).

In any case, I am happy to help out here and would like to see the docs
be the best docs they can be (TM). I do think this series is a good one
that adds lots of useful information and I also think it's something we
can build upon going forward. Be sure to let me know if you have ideas
on the off chance that I could help you drive them forward.

Stephen

> Possibly a document map at a high level could help with this.
> 
> I be interested for others input on this series as well.
> 
> Thanks
> Ian
> > 
> > There's a good chance I've made some mistakes in the process and I've left
> > TODOs for someone to resolve now or at a future date. I welcome feedback
> > on both of these.
> > 
> > Now to go back to figure how exactly NUMA affinity works for and affects
> > PMD threads...
> > 
> > [*] 'howto' documents are supposed to be brief, high-level overviews on
> > a particular group of features, with a focus on the user. They're
> > not as all-encompassing as a 'tutorial', but not as specific as a
> > 'topic'.
> > 
> > Stephen Finucane (8):
> >   doc: Add an overview of the 'dpdk' port
> >   doc: Add "PMD" topic document
> >   doc: Move additional sections to "physical ports" doc
> >   doc: Move "QoS" guide to its own document
> >   doc: Add "bridge" topic document
> >   doc: Move "pdump" guide to its own document
> >   doc: Split Jumbo Frames guide between two docs
> >   doc: Final cleanup of the DPDK howto
> > 
> >  Documentation/conf.py|   2 +-
> >  Documentation/howto/dpdk.rst | 453 +++---
> > -
> >  Documentation/topics/dpdk/bri

Re: [ovs-dev] [PATCH] rhel: Fix literal dollar sign usage in systemd service files

2018-04-17 Thread Aaron Conole
Timothy Redaelli  writes:

> Currently (at least on RHEL 7.5) openvswitch fails to start (with DPDK
> enabled) as non-root, since chown fails and "/dev/hugepages" group is not
> changed.
>
> Commit tested on Fedora 28 and RHEL 7.5, both as root as non-root user.
>
> From man 5 systemd.service:
>
>   To pass a literal dollar sign, use "$$". Variables whose value is not known
>   at expansion time are treated as empty strings. Note that the first argument
>   (i.e. the program to execute) may not be a variable.
>
> CC: Aaron Conole 
> Fixes: 4299145c1095 ("rhel: don't drop capabilities when running as root")
> Signed-off-by: Timothy Redaelli 
> ---

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 10:54:29AM +0800, wenxu wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> At 2018-04-17 04:40:26, "Ben Pfaff"  wrote:
> >On Mon, Apr 16, 2018 at 03:35:57PM +0800, we...@ucloud.cn wrote:
> >> From: wenxu 
> >> 
> >> It makes OVS native tunneling honor tunnel-specified source addresses,
> >> in the same way that Linux kernel tunneling honors them.
> >> 
> >> This patch made valid tun_src specified by flow-action can be used for
> >> tunnel_src of packet. add a "local" property for a route entry.
> >> Like the kernel space when lookup the route, if there are tun_src specified
> >> by flow-action or port options. Check the tun_src wheather is a local
> >> address, then lookup the route.
> >> 
> >> Signed-off-by: wenxu 
> >
> >Thanks for v2.  It resolves the portability issue.  I hope I may ask
> >some more questions, because there are some pieces that I do not
> >understand yet.
> >
> >I think that there is a redundant assignment to change->rd.rtm_dst_len
> >in route_table_parse().  I think that the first assignment here may be
> >removed:
> >
> >change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? 96 : 0);
> >if (rtm->rtm_type == RTN_LOCAL) {
> >dst_len_off += 64;
> >change->rd.local = true;
> >} else {
> >change->rd.local = false;
> >}
> >change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? dst_len_off : 0);
> >
> >I do not understand dst_len_off yet.  It is normally 96, but for local
> >addresses it increases by 64, to 160.  Then, only for IPv4, it gets
> >added to the prefix length.  I don't understand how a prefix length
> >greater than 128 is to be interpreted, since an IPv6 address is only 128
> >bits long.  (Without the addition of 64, I understand: it is because
> >32-bit IPv4 addresses are mapped into 128-bit IPv6 addresses and 96 =
> >128 - 32.)  rd.rtm_dst_len eventually gets passed to ovs_router_insert()
> >as plen, then to rt_init_match() as plen, and then that is used to
> >create a IPv6 address mask, so it seems like the value should be 128 or
> >less, not 160 or more.
> 
> I want to improve the priority of the local route higher than userdef route
> in the ovs_router_add : the priority of userdef route  is alway higheer than 
> cached route
> ovs_router_insert__(mark, plen + 32, &ip6, plen, argv[2], &gw6);
> 
> 
> I think I should only enhance the priority of the local route is fine, but 
> not dst_len

That sounds right to me.  Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v9 2/7] flow: Introduce IP packet sanity checks

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 02:40:08PM +0300, Shahaf Shuler wrote:
> From: Yuanhan Liu 
> 
> Those checks were part of the miniflow extractor. Moving them out to
> act as a general helpers for packet validation.
> 
> Co-authored-by: Finn Christensen 
> Signed-off-by: Yuanhan Liu 
> Signed-off-by: Finn Christensen 
> Signed-off-by: Shahaf Shuler 

Does this change behavior, or just move code around?  I suspect the
latter but it would be good for the commit message to say so explicitly.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] How to implement IEEE 802.1Q using vSwitch in Core Emulator ?

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 11:19:28AM +0530, rakesh kumar wrote:
> 1. Can I customize the vSwitch behavior to implement my algorithms ex -
> want to customize the Queue's behavior in switches.  if Yes Please Provide
> some details  ?

What algorithms?

> 2. How can I bypass the Linux bridge in core emulator and use vSwitch for
> wired network simulation ?

Usually this is a matter of installing OVS and using ovs-vsctl to
configure it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] tests: Added NSH related unit test cases for datapath

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 02:58:34PM -0700, Gregory Rose wrote:
> On 4/11/2018 11:13 AM, Ashish Varma wrote:
> >Sure, no problem.
> >
> >Thanks,
> >Ashish
> >
> >On Tue, Apr 10, 2018 at 5:00 PM, Gregory Rose  >> wrote:
> >
> >On 4/6/2018 7:35 AM, Gregory Rose wrote:
> >
> >
> >
> >On 4/4/2018 10:23 AM, Ben Pfaff wrote:
> >
> >On Thu, Mar 29, 2018 at 04:46:09PM -0700, Ashish Varma wrote:
> >
> >Added test cases for encap, decap, replace and
> >forwarding of NSH packets.
> >Also added a python script 'sendpkt.py' to send hex
> >ethernet frames.
> >
> >Signed-off-by: Ashish Varma  >>
> >
> >Ashish, do you have a suggestion who should review this? 
> >Greg, are you
> >the right person?
> >
> >I can have a look at them now that I'm back but need to get
> >caught up with some other issues first.
> >
> >
> >Ashish,
> >
> >I continue to be otherwise occupied so I'm sorry for the delay in
> >reviewing your patch.  It is still on my radar!
> >
> >Thanks,
> >
> >- Greg
> >
> 
> Looks good - tests out alright to me:
> 
> set /bin/bash '../tests/system-kmod-testsuite' -C tests 
> AUTOTEST_PATH='utilities:vswitchd:ovsdb:vtep:tests::/usr:ovn/controller-vtep:ovn/northd:ovn/utilities:ovn/controller'
> -k nsh -j1; \
> "$@" || (test X'' = Xyes && "$@" --recheck)
> ## -- ##
> ## openvswitch 2.9.90 test suite. ##
> ## -- ##
> 
> nsh-datapath
> 
>  91: nsh - encap header  ok
>  92: nsh - decap header  ok
>  93: nsh - replace header    ok
>  94: nsh - forward   ok
> 
> ## - ##
> ## Test results. ##
> ## - ##
> 
> All 4 tests were successful.
> 
> And the script code looks fine to me as well.
> 
> Tested-by: Greg Rose 
> Reviewed-by: Greg Rose 

Thanks Greg and Ashish.

I folded in the following to make the build-time Python checking OK with
it (also fixed a spelling error "eode" => "code"), added Greg's tags,
and applied this to master.

Thanks again!

Ben

--8<--cut here-->8--

diff --git a/tests/sendpkt.py b/tests/sendpkt.py
index 6d16d0f45e9e..50a4795ebf34 100755
--- a/tests/sendpkt.py
+++ b/tests/sendpkt.py
@@ -26,7 +26,6 @@
 
 
 import socket
-import struct
 import sys
 from optparse import OptionParser
 
@@ -72,23 +71,23 @@ pkt = "".join(map(chr, hex_list))
 try:
 sockfd = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
 except socket.error as msg:
-print 'unable to create socket! error eode: ' + str(msg[0]) + ' : '\
-+ msg[1]
+print('unable to create socket! error code: ' + str(msg[0]) + ' : '
++ msg[1])
 sys.exit(2)
 
 try:
 sockfd.bind((args[0], 0))
 except socket.error as msg:
-print 'unable to bind socket! error eode: ' + str(msg[0]) + ' : '\
-+ msg[1]
+print('unable to bind socket! error code: ' + str(msg[0]) + ' : '
++ msg[1])
 sys.exit(2)
 
 try:
 sockfd.send(pkt)
 except socket.error as msg:
-print 'unable to send packet! error eode: ' + str(msg[0]) + ' : '\
-+ msg[1]
+print('unable to send packet! error code: ' + str(msg[0]) + ' : '
++ msg[1])
 sys.exit(2)
 
-print 'send success!'
+print('send success!')
 sys.exit(0)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] socket-util: Improve comments on (host, port) parsing functions.

2018-04-17 Thread Ben Pfaff
Fixes: 0b043300dbad ("Make : parsing uniform treewide.")
Suggested-by: Mark Michelson 
Signed-off-by: Ben Pfaff 
---
 lib/socket-util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 12d16f582d52..1071a328d49a 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -387,7 +387,7 @@ inet_parse_tokens__(char *s, int host_index, char **hostp, 
char **portp)
 /* Parses 's', a string in the form "[:]", into its (required) host
  * and (optional) port components, and stores pointers to them in '*hostp' and
  * '*portp' respectively.  Always sets '*hostp' nonnull, although possibly to
- * an empty string empty string.  Can set '*portp' to the null string.
+ * an empty string.  Can set '*portp' to the null string.
  *
  * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
  * brackets.  Resolves ambiguous cases that might represent an IPv6 address or
@@ -404,7 +404,7 @@ inet_parse_host_port_tokens(char *s, char **hostp, char 
**portp)
 
 /* Parses 's', a string in the form "[:]", into its port and host
  * components, and stores pointers to them in '*portp' and '*hostp'
- * respectively.  Both '*portp' and '*hostp' can end up null.
+ * respectively.  Either '*portp' and '*hostp' (but not both) can end up null.
  *
  * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
  * brackets.  Resolves ambiguous cases that might represent an IPv6 address or
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 06/11] Make : parsing uniform treewide.

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 04:40:19PM -0500, Mark Michelson wrote:
> On 04/13/2018 12:26 PM, Ben Pfaff wrote:
> >I didn't realize until now that the tree had two different ways of parsing
> >strings in the form : and :.  There are the
> >long-standing inet_parse_active() and inet_parse_passive() functions, and
> >more recently the ipv46_parse() function.  This commit eliminates the
> >latter and changes the code to use the former.
> >
> >The two implementations interpreted some input differently.  In particular,
> >the older functions required IPv6 addresses to be [bracketed], but the
> >newer ones did not.  I'd prefer the brackets to be required, because of
> >ambiguous cases like "::1:2:3:4:80" (is :80 part of the IPv6 address or a
> >port number?) but for compatibility this patch changes the merged code to
> >use the more liberal interpretation.
> >
> >Signed-off-by: Ben Pfaff 

> >+/* Parses 's', a string in the form "[:]", into its (required) 
> >host
> >+ * and (optional) port components, and stores pointers to them in '*hostp' 
> >and
> >+ * '*portp' respectively.  Always sets '*hostp' nonnull, although possibly 
> >to
> >+ * an empty string empty string.  Can set '*portp' to the null string.
> 
> "empty string" is repeated. 

Oops, will fix.

> Also, it's a bit of a nitpick, but if 's' is NULL, then *hostp will
> get set to NULL, which contradicts the comment here.

If 's' is NULL then the function segfaults when it tries to dereference
it.

> >+ *
> >+ * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
> >+ * brackets.  Resolves ambiguous cases that might represent an IPv6 address 
> >or
> >+ * an IPv6 address and a port as representing just a host, e.g. 
> >"::1:2:3:4:80"
> >+ * is a host but "[::1:2:3:4]:80" is a host and a port.
> >+ *
> >+ * Modifies 's' and points '*hostp' and '*portp' (if nonnull) into it.
> >+ */
> >+void
> >+inet_parse_host_port_tokens(char *s, char **hostp, char **portp)
> >+{
> >+inet_parse_tokens__(s, 0, hostp, portp);
> >+}
> >+
> >+/* Parses 's', a string in the form "[:]", into its port and hos
> >+ * components, and stores pointers to them in '*portp' and '*hostp'
> >+ * respectively.  Both '*portp' and '*hostp' can end up null.
> 
> I'm a bit confused here. inet_parse_tokens__() is called both from
> inet_parse_host_port_tokens() and inet_parse_port_host_tokens(), just with
> the host_index different. My expectation would then be that
> inet_parse_port_host_tokens() should make the same non-NULL guarantee for
> *portp that inet_parse_host_port_tokens() does for *hostp. What am I
> missing?

For inet_parse_host_port_tokens() there are these cases in
inet_parse_tokens__():

* n_colons > 1: *hostp will be nonnull, *portp will be null.

* n_colons == 1: *hostp and *portp will be nonnull.

* n_colons == 0: *hostp will be nonnull, *portp will be null.

For inet_parse_port_host_tokens() there are these cases in
inet_parse_tokens__():

* n_colons > 1: *hostp will be nonnull, *portp will be null.

* n_colons == 1: *hostp and *portp will be nonnull.

* n_colons == 0: *portp will be nonnull, *hostp will be null.

The difference, then, is that when there are no colons, the two
functions interpret the field as a host or a port, respectively, and
that means that in the former only the port can be null but in the
latter either the host or the port (although not both at the same time)
can be null.  I guess the comment could be better phrased.

I forgot to fix this before I applied the series, so I sent a fresh
patch:
https://patchwork.ozlabs.org/patch/899415/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] netdev: Fix typos in comment.

2018-04-17 Thread Ben Pfaff
Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().")
Suggested-by: Mark Michelson 
Signed-off-by: Ben Pfaff 
---
 lib/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/netdev.c b/lib/netdev.c
index 1c4ae2c6cdd2..a418d9a00709 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -1187,7 +1187,7 @@ netdev_get_in4_by_name(const char *device_name, struct 
in_addr *in4)
 }
 
 /* Obtains an IPv4 or IPv6 address from 'device_name' and save the address in
- * '*ss', representing IPv4 addressse as v6-mapped.  Returns 0 if successful,
+ * '*in6', representing IPv4 addresses as v6-mapped.  Returns 0 if successful,
  * otherwise a positive errno value. */
 int
 netdev_get_ip_by_name(const char *device_name, struct in6_addr *in6)
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 08/11] netdev: New function netdev_get_ip_by_name().

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 04:40:08PM -0500, Mark Michelson wrote:
> On 04/13/2018 12:26 PM, Ben Pfaff wrote:
> >+/* Obtains an IPv4 or IPv6 address from 'device_name' and save the address 
> >in
> >+ * '*ss', representing IPv4 addressse as v6-mapped.  Returns 0 if 
> >successful,
> 
> There is no "ss" parameter. "addresses" is misspelled.

Thanks for the fix.  I forgot to fix this before I sent the series, so
here's a patch:
https://patchwork.ozlabs.org/patch/899418/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 01/11] socket-util: Fix error in comment on ss_format_address().

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 04:40:11PM -0500, Mark Michelson wrote:
> Love the patchset! I found a few minor issues in some of the comments but
> didn't see problems in the implementation. Note that I didn't review patch 9
> because of unfamiliarity with the area.
> 
> Aside from the nits and patch 9:
> 
> Acked-by: Mark Michelson 

Thanks a lot.  I applied this series to master.  I sent separate patches
for the fixes because somehow I didn't see your other emails;
fortunately they are only comment fixes.

Thanks again,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] How to implement IEEE 802.1Q using vSwitch in Core Emulator ?

2018-04-17 Thread Ben Pfaff
Please don't drop the list.

On Tue, Apr 17, 2018 at 08:50:46PM +0530, rakesh kumar wrote:
>  > 1. Can I customize the vSwitch behavior to implement my algorithms ex -
> > want to customize the Queue's behavior in switches.  if Yes Please Provide
> > some details  ?
> 
> What algorithms?
> 
> Re  - I will have my algorithms for traffic shaping and controlling the 7
> queues inside the switch.

In that case, you have two choices.

You can either implement your queuing algorithms as Linux qdiscs, inside
the Linux kernel, and use the OVS Linux kernel module datapath.  In that
case, there might be no advantage over the Linux bridge.

Or, you can implement your queuing algorithms inside the OVS netdev-dpdk
module, in userspace, and use the OVS DPDK datapath.  This is likely to
result in a switch that is faster than the OVS datapath.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] rhel: Fix literal dollar sign usage in systemd service files

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 11:18:26AM +0200, Eelco Chaudron wrote:
> On 16/04/18 17:15, Timothy Redaelli wrote:
> >Currently (at least on RHEL 7.5) openvswitch fails to start (with DPDK
> >enabled) as non-root, since chown fails and "/dev/hugepages" group is not
> >changed.
> >
> >Commit tested on Fedora 28 and RHEL 7.5, both as root as non-root user.
> >
> > From man 5 systemd.service:
> >
> >   To pass a literal dollar sign, use "$$". Variables whose value is not 
> > known
> >   at expansion time are treated as empty strings. Note that the first 
> > argument
> >   (i.e. the program to execute) may not be a variable.
> >
> >CC: Aaron Conole 
> >Fixes: 4299145c1095 ("rhel: don't drop capabilities when running as root")
> >Signed-off-by: Timothy Redaelli 
> 
> 
> Changes look good to me!
> 
> Acked-by: Eelco Chaudron 

Thanks Timothy, Aaron, and Eelco.  I applied this to master and
branch-2.9.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 12:29:12PM +0800, we...@ucloud.cn wrote:
> From: wenxu 
> 
> It makes OVS native tunneling honor tunnel-specified source addresses,
> in the same way that Linux kernel tunneling honors them.
> 
> This patch made valid tun_src specified by flow-action can be used for
> tunnel_src of packet. add a "local" property for a route entry and enhance
> the priority of local route higher than user route.
> Like the kernel space when lookup the route, if there are tun_src specified
> by flow-action or port options. Check the tun_src wheather is a local
> address, then lookup the route.
> 
> Signed-off-by: wenxu 
> Signed-off-by: frank.zeng 

Thanks a lot for this latest version.

This has patch conflicts due to a recent change in
ofproto/ofproto-dpif-sflow.c.  Would you mind rebasing?

This code tends to use booleans in correct, but somewhat nonidiomatic
ways.  For example:
if (p_src->local != true) {
would more idiomatically be written as:
if (!p_src->local) {

and:
if (rt->priority == rt->plen || rt->local == true) {
as:
if (rt->priority == rt->plen || rt->local) {

and:
if (rtm->rtm_type == RTN_LOCAL) {
change->rd.local = true;
} else {
change->rd.local = false;
}
as:
change->rd.local = rtm->rtm_type == RTN_LOCAL;

Would you mind adjusting the style?

I also noticed a doubled semicolon in sflow_choose_agent_address().

Thanks again,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] How to implement IEEE 802.1Q using vSwitch in Core Emulator ?

2018-04-17 Thread rakesh kumar
I am using Core Emulator to simulate the wired network, so OVS can be
integrated with core emulator or is there any other simulator which uses
OVS

On Tue, Apr 17, 2018 at 9:08 PM, Ben Pfaff  wrote:

> Please don't drop the list.
>
> On Tue, Apr 17, 2018 at 08:50:46PM +0530, rakesh kumar wrote:
> >  > 1. Can I customize the vSwitch behavior to implement my algorithms ex
> -
> > > want to customize the Queue's behavior in switches.  if Yes Please
> Provide
> > > some details  ?
> >
> > What algorithms?
> >
> > Re  - I will have my algorithms for traffic shaping and controlling the 7
> > queues inside the switch.
>
> In that case, you have two choices.
>
> You can either implement your queuing algorithms as Linux qdiscs, inside
> the Linux kernel, and use the OVS Linux kernel module datapath.  In that
> case, there might be no advantage over the Linux bridge.
>
> Or, you can implement your queuing algorithms inside the OVS netdev-dpdk
> module, in userspace, and use the OVS DPDK datapath.  This is likely to
> result in a switch that is faster than the OVS datapath.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Una Forma de Llegar a miles de Personas

2018-04-17 Thread Whatsapp

 

 
Cómo vender por Whatsapp 
Mayo 03 - webinar Interactivo

Objetivo:

En esta capacitación veremos as ventas se logran el 90% con una estrategia 
comunicacional y el 10% con tecnología. 

En este curso podrás conocer todo lo que debes saber sobre ¿Cómo vender de 
manera efectiva con WhatsApp? Estrategias de comunicación para aumentar ventas, 
utilizando una de las herramientas de comunicación más usadas actualmente. 

Temas A Tratar:

Conoce las razones por la cuales WhatsApp es una poderosa herramienta de ventas.
 Errores comunes al implementar WhatsApp Marketing. 
Conecta tus contactos de WhatApp a Facebook e Instagram y aumenta tus ventas.
 Software de envíos masivos por WhatsApp, una forma de llegar a miles de 
personas. 
 
 
Temario e Inscripciones:

Respondiendo por este medio "Whatsapp"+TELÉFONO + NOMBRE o marcando al:

045 + 5515546630 
 



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/8] Split up the DPDK how-to

2018-04-17 Thread Stokes, Ian
> On Mon, 2018-04-09 at 15:15 +, Stokes, Ian wrote:
> > > The DPDK howto has slowly morphed into a catch all for everything
> > > DPDK, which goes against the original design goal for 'howto'
> documents [*].
> > > This series attempts to return some sanity to the universe by
> > > splitting this document into many more 'topic' documents. Along the
> > > way, we add a lot of semantic markup, rework some text, and add an
> > > overview on 'dpdk'- type ports (the original goal here).
> >
> > Thanks for working on this Stephen.
> >
> > I'm in favor of improving the documentation but we need to be careful
> > when splitting it further apart. I think we should do this only where
> > it makes sense.
> >
> > Since moving to the current documentation design layout a common
> > complaint I've heard at the community call is that users don't know
> > where to find specific info as there are multiple locations dealing
> > with the same topic. It can be argued that this info exists in
> > different locations due to the differences in the type of content
> > expected i.e. the difference between what is put in an intro, how-to
> > or tutorial. But this can be non-obvious to a user. It seems the
> > technical details of these differences are hampering the ease of use
> > from a user POV.
> >
> > I'd agree the previous approach of having a single DPDK doc catch all
> > is far from ideal but one aspect users liked was that it was a single
> > clear location.
> 
> I both understand and agree with all the above: there's definitely room
> for improvement here and it's quite likely that we've made things less
> discoverable in how we approached this. That said, I'd be very wary of any
> kind of knee jerk reaction, whereby we go from one extreme to another. As
> anyone who's ever worked with internal wikis on crufty codebases will
> realise, things become increasingly difficult to understand and maintain
> when you stop enforcing strict organization around where things should go.
> That's not something I want to embrace.

Sure and I agree, what we have now is more maintainable, I don’t support 
reverting to the old approach, just pointing out one of the advantages it had 
that the community felt was lost. I don’t think there's anything inherently 
wrong with the current approach but there is a disconnect somewhere we need to 
address.

> 
> The original intention of splitting this stuff up was so that we could
> focus on different kinds of users:
> 
> - tutorials: beginners wishing to get started with OVS
> - howtos: more experience users wishing to attain a given configuration
> - topics: advanced users wishing to deep dive on certain areas
> 

Do we call this out clearly to users somewhere?

It's clear when writing/reviewing a patch series like this that the docs adhere 
to this model (and makes sense) but this might not be the case for a new user.

We could call it out in the top level readme.rst. It already has a 
documentation availability section, it could be expanded with a section 
explaining where users should start with the docs. TBH it would help when new 
features are added also as we can point to what's expected.

> Something like a document map would help, however, I think a better plan
> would be to ensure we provide mechanisms to guide people from one section
> to another. This could be achieved by adding additional cross- references
> (For more information on this topic, refer to XYZ) along with 'Further
> Reading' sections at the bottom of the document. I've touched on this in
> v2 of this series.


I think in the v1 review set we're already starting to add cross referencing to 
help address this so it negates the need for a document map. I'm also happy to 
see the changes such as the version change tags coming in.

From the reviewing/maintaining side it's always easier to point to these type 
of examples so that they can be followed in future docs.

> 
> I think the best thing you could do here is get specific examples of where
> the docs fall down via proper usability testing. I've heard lots of
> complaints about various aspects of the docs but have never collected
> these to build up a clear picture (with the exception of the tunnelling
> documentation, which has come up more times than I can count). My work on
> OVS occurs almost entirely off the clock so this is not something I would
> be able to tackle. However, I have asked about this internally and will
> continue to do so. In addition, I do think this may be something Intel
> could invest some of their considerable resources in addressing, perhaps
> instead of writing yet more OVS blogs (something Red Hat is also guilty
> of).

+1 on this, I wouldn't limit it to just Intel and Red Hat either. IMO 
addressing documentation issues is a community wide effort, all are welcome to 
provide input on this work to address issues they have experienced. Specific 
examples will help here so I would encourage users to participate :).

> 
> In any case, I am ha

Re: [ovs-dev] [PATCH v4] tests: Add system-dpdk-testsuite

2018-04-17 Thread Stokes, Ian
> To: d...@openvswitch.org
> Cc: Rybka, MarcinX 
> Subject: [ovs-dev] [PATCH v4] tests: Add system-dpdk-testsuite
> 
> New OVS-DPDK testsuite, which can be launched via `make check-dpdk`, tests
> OVS using a DPDK datapath. The testsuite contains already initial tests:
>  1. EAL init
>  2. Add standard DPDK PHY port
>  3. Add vhost-user-client port
> 
> Signed-off-by: Marcin Rybka 
> 

Thanks for this Marcin, comments inline below. 
> ---
> Ver.4 updates:
>  - now works with DPDK 17.11.1
>  - added entry in NEWS
>  - better hugepages allocation
> ---
>  Documentation/topics/testing.rst | 19 
>  NEWS |  2 ++
>  tests/automake.mk| 17 ++
>  tests/system-dpdk-macros.at  | 56 +
>  tests/system-dpdk-testsuite.at   | 25 +++
>  tests/system-dpdk.at | 67
> 
>  6 files changed, 186 insertions(+)
>  create mode 100644 tests/system-dpdk-macros.at  create mode 100644
> tests/system-dpdk-testsuite.at  create mode 100644 tests/system-dpdk.at
> 
> diff --git a/Documentation/topics/testing.rst
> b/Documentation/topics/testing.rst
> index 5dcf446..bdc7225 100644
> --- a/Documentation/topics/testing.rst
> +++ b/Documentation/topics/testing.rst
> @@ -297,6 +297,25 @@ To invoke the datapath testsuite with the userspace
> datapath, run::
> 
>  The results of the testsuite are in ``tests/system-userspace-
> testsuite.dir``.
> 
> +DPDK datapath
> +'
> +
> +To test :doc:`/intro/install/dpdk` (i.e., the build was configured with
> +``--with-dpdk``, the DPDK is installed), run the testsuite and generate
> +a report by using the ``check-dpdk`` target::
> +
> +$ make check-dpdk
> +
> +To see a list of all the available tests, run::
> +
> +$ make check-dpdk TESTSUITEFLAGS=--list
> +
> +These tests require a `DPDK supported NIC`_ and proper DPDK variables
> +(``DPDK_DIR`` and ``DPDK_BUILD``). Moreover you need to have root
> +privileges, load the required modules and bind the NIC to the DPDK-
> compatible driver.
> +
> +.. _DPDK supported NIC: http://dpdk.org/doc/nics
> +
>  Kernel datapath
>  '''
> 
> diff --git a/NEWS b/NEWS
> index 58a7b58..7bad608 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -21,6 +21,8 @@ Post-v2.9.0
> and reply with a RST for TCP or ICMPv4/ICMPv6 unreachable message
> for
> other IPv4/IPv6-based protocols whenever a reject ACL rule is hit.
>   * ACL match conditions can now match on Port_Groups.
> +   - New 'check-dpdk' Makefile target to run a new system testsuite.
> + See Testing topic for the details.

I think this should be added under a specific DPDK header in NEWS.

- DPDK
  * Add 'check-dpdk' Makefile target to run dpdk specific testsuite.

Thoughts?

> 
>  v2.9.0 - 19 Feb 2018
>  
> diff --git a/tests/automake.mk b/tests/automake.mk index d9292e8..e52531a
> 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -5,10 +5,12 @@ EXTRA_DIST += \
>   $(SYSTEM_KMOD_TESTSUITE_AT) \
>   $(SYSTEM_USERSPACE_TESTSUITE_AT) \
>   $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
> + $(SYSTEM_DPDK_TESTSUITE_AT) \
>   $(TESTSUITE) \
>   $(SYSTEM_KMOD_TESTSUITE) \
>   $(SYSTEM_USERSPACE_TESTSUITE) \
>   $(SYSTEM_OFFLOADS_TESTSUITE) \
> + $(SYSTEM_DPDK_TESTSUITE) \
>   tests/atlocal.in \
>   $(srcdir)/package.m4 \
>   $(srcdir)/tests/testsuite \
> @@ -128,6 +130,12 @@ SYSTEM_OFFLOADS_TESTSUITE_AT = \
>   tests/system-offloads-traffic.at \
>   tests/system-offloads-testsuite.at
> 
> +SYSTEM_DPDK_TESTSUITE_AT = \
> + tests/system-common-macros.at \
> + tests/system-dpdk-macros.at \
> + tests/system-dpdk-testsuite.at \
> + tests/system-dpdk.at
> +
>  check_SCRIPTS += tests/atlocal
> 
>  TESTSUITE = $(srcdir)/tests/testsuite
> @@ -135,6 +143,7 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
> SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
>  SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
>  SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
> +SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
>  DISTCLEANFILES += tests/atconfig tests/atlocal
> 
>  AUTOTEST_PATH =
> utilities:vswitchd:ovsdb:vtep:tests:$(PTHREAD_WIN32_DIR_DLL):$(SSL_DIR):ov
> n/controller-vtep:ovn/northd:ovn/utilities:ovn/controller
> @@ -258,6 +267,10 @@ check-offloads: all
>   set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
>   "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> 
> +check-dpdk: all
> + set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> +
>  clean-local:
>   test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
> 
> @@ -286,6 +299,10 @@ $(SYSTEM_OFFLOADS_TE

Re: [ovs-dev] [PATCH] netdev-dpdk: fix MAC address in port addr example

2018-04-17 Thread Stokes, Ian
> On Wed, Apr 11, 2018 at 12:15:28PM +, Stokes, Ian wrote:
> > > Seems the doc just wasn't updated when they settled for a way of
> > > addressing this issue.
> >
> > Agreed, I've validated with a Mellanox Connect X3 pro card, 6 bytes
> > was expected.
> >
> > Thanks for this Marcelo, I see Timothy Redaelli (from Red Hat also)
> > has submitted a similar patch.
> >
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/345885.html
> 
> Oh, wasn't aware of that. Sorry Timothy, and thanks.

No problem, I've pushed this to DPDK_MERGE branch, will backport to ovs 2.9 
also.

Thanks
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath: Prevent panic

2018-04-17 Thread Gregory Rose

On 4/16/2018 11:32 PM, Pravin Shelar wrote:

On Mon, Apr 16, 2018 at 10:58 AM, Greg Rose  wrote:

On RHEL 7.x kernels we observe a panic induced by a paging error
when the timer kicks off a job that subsequently accesses memory
that belonged to the openvswitch kernel module but was since
unloaded - thus the paging error.

The panic can be induced on any RHEL 7.x kernel with the following test:

while `true`
do
 make check-kmod TESTSUITEFLAGS="-k \!gre"
done

On the systems I've been testing on it generally takes anywhere from a
minute to 15 minutes or so to repro but never longer than that.  Similar
results have been seen by other testers.

This patch does not fix the underlying bug, which does need to be
investigated and fixed, but it does prevent it from occurring. We
would like to prevent customer systems from panicking while we do
futher investigation to find the root cause.


Can you add stack trace to the commit ?


Sure, I'll send a V2 in a bit.

Thanks,

- Greg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] ovn: Avoid nb_cfg update notification flooding

2018-04-17 Thread Han Zhou
nb_cfg as a mechanism to "ping" OVN control plane is very useful
in many ways. However, the current implementation will trigger
update notifications flooding in the whole control plane. Each
HV updates to SB the nb_cfg number and all these updates are
notified to all the other HVs, which is O(n^2). Although updates
are batched in fewers notifications than n^2, it still generates
significant load on SB DB and also on ovn-controllers.

To solve this problem and make the mechanism more useful in large
scale producation deployment, this patch separates the per HV
*private* data (write only by the owning chassis and not
interesting to any other HVs) from the Chassis table to a separate
table, so that each HV can conditionally monitor and get updates
only for its own record.

Test result shows great improvement:
In a test environment with 1K sandbox HVs, and 10K ports created
on 40 lswitches and 8 lrouters, do the sync test when the system
is idle, with command:

time ovn-nbctl --wait=hv sync

Original result:
real4m52.926s
user0m0.328s
sys 0m0.004s

With this patch:
real0m11.405s
user0m0.316s
sys 0m0.016s

Signed-off-by: Han Zhou 
Acked-By: Venkata Anil 
---

Notes:
v2->v3: updates to make the new table more reasonable

 ovn/controller/chassis.c| 36 +++-
 ovn/controller/chassis.h|  9 ++---
 ovn/controller/ovn-controller.c | 22 +-
 ovn/northd/ovn-northd.c | 22 +++---
 ovn/ovn-sb.ovsschema| 10 --
 ovn/ovn-sb.xml  | 31 +++
 6 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 6b5286a..8f81194 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -72,12 +72,28 @@ get_cms_options(const struct smap *ext_ids)
 return smap_get_def(ext_ids, "ovn-cms-options", "");
 }
 
+static const struct sbrec_chassis_private*
+find_chassis_private(struct ovsdb_idl *ovnsb_idl, const char *chassis_id)
+{
+const struct sbrec_chassis_private *chassis_private_rec;
+
+SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis_private_rec, ovnsb_idl) {
+if (!strcmp(chassis_private_rec->chassis_name, chassis_id)) {
+break;
+}
+}
+
+return chassis_private_rec;
+}
+
 /* Returns this chassis's Chassis record, if it is available and is currently
  * amenable to a transaction. */
 const struct sbrec_chassis *
 chassis_run(struct controller_ctx *ctx, const char *chassis_id,
-const struct ovsrec_bridge *br_int)
+const struct ovsrec_bridge *br_int,
+const struct sbrec_chassis_private **chassis_private)
 {
+*chassis_private = NULL;
 if (!ctx->ovnsb_idl_txn) {
 return NULL;
 }
@@ -135,8 +151,18 @@ chassis_run(struct controller_ctx *ctx, const char 
*chassis_id,
 ds_chomp(&iface_types, ',');
 const char *iface_types_str = ds_cstr(&iface_types);
 
+const struct sbrec_chassis_private *chassis_private_rec
+= find_chassis_private(ctx->ovnsb_idl, chassis_id);
+if (!chassis_private_rec) {
+chassis_private_rec = sbrec_chassis_private_insert(ctx->ovnsb_idl_txn);
+sbrec_chassis_private_set_chassis_name(chassis_private_rec,
+   chassis_id);
+}
+*chassis_private = chassis_private_rec;
+
 const struct sbrec_chassis *chassis_rec
 = get_chassis(ctx->ovnsb_idl, chassis_id);
+
 const char *encap_csum = smap_get_def(&cfg->external_ids,
   "ovn-encap-csum", "true");
 if (chassis_rec) {
@@ -256,6 +282,14 @@ chassis_cleanup(struct controller_ctx *ctx,
   "ovn-controller: unregistering chassis '%s'",
   chassis_rec->name);
 sbrec_chassis_delete(chassis_rec);
+const struct sbrec_chassis_private *chassis_private_rec
+= find_chassis_private(ctx->ovnsb_idl, chassis_rec->name);
+if (chassis_private_rec) {
+sbrec_chassis_private_delete(chassis_private_rec);
+} else {
+VLOG_WARN("Chassis_Private record didn't exist for chassis '%s'",
+  chassis_rec->name);
+}
 }
 return false;
 }
diff --git a/ovn/controller/chassis.h b/ovn/controller/chassis.h
index 2cc47fc..e4b29a8 100644
--- a/ovn/controller/chassis.h
+++ b/ovn/controller/chassis.h
@@ -17,6 +17,7 @@
 #define OVN_CHASSIS_H 1
 
 #include 
+#include "ovn/lib/ovn-sb-idl.h"
 
 struct controller_ctx;
 struct ovsdb_idl;
@@ -24,9 +25,11 @@ struct ovsrec_bridge;
 struct sbrec_chassis;
 
 void chassis_register_ovs_idl(struct ovsdb_idl *);
-const struct sbrec_chassis *chassis_run(struct controller_ctx *,
-const char *chassis_id,
-const struct ovsrec_bridge *br_int);
+const struct sbrec_c

[ovs-dev] [PATCH V2] datapath: Prevent panic

2018-04-17 Thread Greg Rose
On RHEL 7.x kernels we observe a panic induced by a paging error
when the timer kicks off a job that subsequently accesses memory
that belonged to the openvswitch kernel module but was since
unloaded - thus the paging error.

The panic can be induced on any RHEL 7.x kernel with the following test:

while `true`
do
make check-kmod TESTSUITEFLAGS="-k \!gre"
done

On the systems I've been testing on it generally takes anywhere from a
minute to 15 minutes or so to repro but never longer than that.  Similar
results have been seen by other testers.

This patch does not fix the underlying bug, which does need to be
investigated and fixed, but it does prevent it from occurring. We
would like to prevent customer systems from panicking while we do
futher investigation to find the root cause.

Here is the trace:
[252257.801809] BUG: unable to handle kernel paging request at c07c6298
[252257.802451] IP: [] run_timer_softirq+0xe0/0x310
[252257.803055] PGD 19f5067 PUD 19f7067 PMD 2fb5fc2067 PTE 0
[252257.803559] Oops: 0002 [#1] SMP
[252257.804138] Modules linked in: geneve ip6_udp_tunnel xt_statistic 
xt_physdev xt_nat xt_recent xt_comment xt_mark ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat xt_addrtype ipt_REJECT nf_reject_ipv4 
xt_conntrack iptable_filter ip_tables nf_conntrack_netlink br_netfilter 
overlay(T) sch_htb veth udp_tunnel 8021q garp mrp tun ip_set nfnetlink bridge 
stp llc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iTCO_wdt iTCO_vendor_support 
dcdbas mxm_wmi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi 
kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw 
gf128mul glue_helper ablk_helper cryptd joydev mei_me sg mei ipmi_ssif pcspkr 
shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter wmi nfsd 
auth_rpcgss
[252257.808079] nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom 
crc_t10dif crct10dif_generic uas usb_storage mgag200 drm_kms_helper syscopyarea 
sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common 
crc32c_intel ahci libahci ixgbe libata igb megaraid_sas mdio ptp i2c_algo_bit 
pps_core i2c_core dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 
openvswitch]
[252257.811056] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G OE  T 
3.10.0-693.el7.x86_64 #1
[252257.811826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 
04/11/2016
[252257.812605] task: 8830b7708fd0 ti: 8830b7718000 task.ti: 
8830b7718000
[252257.813447] RIP: 0010:[] [] 
run_timer_softirq+0xe0/0x310
[252257.814298] RSP: 0018:885fbe203e68 EFLAGS: 00010082
[252257.815122] RAX: 8830b66bc838 RBX: 8830b66bc000 RCX: 
c07c6290
[252257.815933] RDX: 8830b66bc810 RSI: 885fbe203e90 RDI: 
8830b66bc000
[252257.816733] RBP: 885fbe203ed0 R08: e56b5701d800 R09: 
885fbe203da0
[252257.817568] R10: 0002 R11: 885fbe203da8 R12: 
0081
[252257.818429] R13: e56b56fb2eca R14: 819eb0c8 R15: 
0001
[252257.819297] FS: () GS:885fbe20() 
knlGS:
[252257.820174] CS: 0010 DS:  ES:  CR0: 80050033
[252257.821156] CR2: c07c6298 CR3: 019f2000 CR4: 
003407e0
[252257.822012] DR0:  DR1:  DR2: 

[252257.822869] DR3:  DR6: fffe0ff0 DR7: 
0400
[252257.823720] Stack:
[252257.824678] 8830b66bdc28 8830b66bd828 8830b66bd428 
8830b66bd028
[252257.825699] 885fbe20fe80 885fbe203eb0 8132bfe0 
28a56c277c4fe974
[252257.826623] 0001 0001 e56b56fb2eca 
819eb0c8
[252257.827524] Call Trace:
[252257.828410] 
[252257.828417]
[252257.829292] [] ? timerqueue_add+0x60/0xb0
[252257.830164] [] __do_softirq+0xef/0x280
[252257.831010] [] call_softirq+0x1c/0x30
[252257.831849] [] do_softirq+0x65/0xa0
[252257.832669] [] irq_exit+0x105/0x110
[252257.833501] [] smp_apic_timer_interrupt+0x42/0x50
[252257.834330] [] apic_timer_interrupt+0x6d/0x80
[252257.835152] 
[252257.835159]
[252257.835944] [] ? cpuidle_enter_state+0x52/0xc0
[252257.837231] [] cpuidle_idle_call+0xd8/0x210
[252257.838224] [] arch_cpu_idle+0xe/0x30
[252257.839133] [] cpu_startup_entry+0x14a/0x1c0
[252257.839933] [] start_secondary+0x1b6/0x230
[252257.840684] Code: 00 00 00 44 0f b6 e0 45 85 e4 0f 84 a7 01 00 00 49 63 d4 
48 83 43 10 01 48 8d 75 c0 48 c1 e2 04 48 01 da 48 8b 4a 28 48 8d 42 28 <48> 89 
71 08 48 89 4d c0 48 8b 4a 30 48 89 4d c8 48 89 31 48 89
[252257.842366] RIP [] run_timer_softirq+0xe0/0x310
[252257.843183] RSP 
[252257.843955] CR2: c07c6298

Signed-off-by: Greg Rose 
---

V2 - Add splat to commit message per Pravin's request
---
 datapath/datapath.c | 10 ++
 tests/system-kmod-macros.at |  1 +
 utilities/ovs-lib.in|  1 +
 3 files changed, 12 insertions(+

Re: [ovs-dev] [PATCH v2] Edit Open vSwitch license info so that GitHub recognizes it.

2018-04-17 Thread Andrea Kao
Thanks, Ben and Aaron.

And Aaron, you're right - I agree it should've been good enough to put the
exact text of the Apache license into the COPYING file. In any case, it
looks like Licensee has finally been satisfied.

2018-04-16 13:02 GMT-07:00 Ben Pfaff :

> On Fri, Apr 13, 2018 at 03:04:11PM -0400, Aaron Conole wrote:
> > Ben Pfaff  writes:
> >
> > > From: Andrea Kao 
> > >
> > > GitHub uses a library called Licensee to identify a project's license
> > > type. It shows this information in the status bar and via the API if it
> > > can unambiguously identify the license.
> > >
> > > This commit creates a LICENSE file that stores the full text of the
> > > Apache license. It also removes the COPYING file and transfers its
> > > contents to a new "License" section in the README.
> > >
> > > Collectively, these changes allow Licensee to successfully identify the
> > > license type of Open vSwitch's codebase as Apache.
> > >
> > > Submitted-at: https://github.com/openvswitch/ovs/pull/224
> > > Signed-off-by: Andrea Kao 
> > > [b...@ovn.org removed references to COPYING and updated Makefile.am]
> > > Signed-off-by: Ben Pfaff 
> > > ---
> >
> > According to the Licensee[1] code, it should be good enough to put the
> > exact text of the Apache 2.0 license into the COPYING file.  Then again
> > COPYING is a GNU-ism, afaik.
>
> Ack on that.  I'm surprised when I see a file named COPYING that does
> not contain GPLv2 or v3.
>
> > So either way:
> >
> > Acked-by: Aaron Conole 
>
> Thanks.  I applied this to master.
>
> > As a side note, as I start editing files, I'll add SPDX identifiers if
> > others think it makes sense.
>
> Sure, thanks.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] userspace datapath: Add OVS_HASH_L4_SYMMETRIC dp_hash algorithm

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 04:26:26PM +0200, Jan Scheurich wrote:
> This commit implements a new dp_hash algorithm OVS_HASH_L4_SYMMETRIC in
> the netdev datapath. It will be used as default hash algorithm for the
> dp_hash-based select groups in a subsequent commit to maintain
> compatibility with the symmetry property of the current default hash
> selection method.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Nitin Katiyar 
> Co-authored-by: Nitin Katiyar 

"sparse" says:
../lib/flow.c:2128:13: error: restricted ovs_be32 degrades to integer
../lib/flow.c:2128:35: error: restricted ovs_be16 degrades to integer

I think that it is saying that htons should be htonl here:

if (flow->packet_type != htons(PT_ETH)) {
/* Cannot hash non-Ethernet flows */
return 0;
}

GCC says:

../lib/flow.c: In function 'flow_hash_symmetric_l2.part.20':
../lib/flow.c:2138:25: error: 'fields..vlan_tci' is used 
uninitialized in this function [-Werror=uninitialized]
 fields.vlan_tci ^= flow->vlans[i].tci & htons(VLAN_VID_MASK);
 ^~

I think that it is right.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-17 Thread Ben Pfaff
On Mon, Apr 16, 2018 at 04:26:27PM +0200, Jan Scheurich wrote:
> The current implementation of the "dp_hash" selection method suffers
> from two deficiences: 1. The hash mask and hence the number of dp_hash
> values is just large enough to cover the number of group buckets, but
> does not consider the case that buckets have different weights. 2. The
> xlate-time selection of best bucket from the masked dp_hash value often
> results in bucket load distributions that are quite different from the
> bucket weights because the number of available masked dp_hash values
> is too small (2-6 bits compared to 32 bits of a full hash in the default
> hash selection method).

Clang gives me the following errors:

../ofproto/ofproto-dpif.c:4792:32: error: unknown warning group 
'-Wmaybe-uninitialized', ignored [-Werror,-Wunknown-pragmas]
../ofproto/ofproto-dpif.c:4814:33: error: initializing 'struct 
ofputil_group_props *' with an expression of type 'const struct 
ofputil_group_props *' discards qualifiers 
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]

"sparse" gives me the following errors:

../ofproto/ofproto-dpif.c:4742:24: error: Variable length array is used.
../ofproto/ofproto-dpif.c:4814:42: error: incorrect type in initializer 
(different modifiers)
../ofproto/ofproto-dpif.c:4814:42:expected struct ofputil_group_props 
*props
../ofproto/ofproto-dpif.c:4814:42:got struct ofputil_group_props const 
*

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] socket-util: Improve comments on (host, port) parsing functions.

2018-04-17 Thread Mark Michelson

No surprise, but

Acked-by: Mark Michelson 

On 04/17/2018 10:31 AM, Ben Pfaff wrote:

Fixes: 0b043300dbad ("Make : parsing uniform treewide.")
Suggested-by: Mark Michelson 
Signed-off-by: Ben Pfaff 
---
  lib/socket-util.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 12d16f582d52..1071a328d49a 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -387,7 +387,7 @@ inet_parse_tokens__(char *s, int host_index, char **hostp, 
char **portp)
  /* Parses 's', a string in the form "[:]", into its (required) 
host
   * and (optional) port components, and stores pointers to them in '*hostp' and
   * '*portp' respectively.  Always sets '*hostp' nonnull, although possibly to
- * an empty string empty string.  Can set '*portp' to the null string.
+ * an empty string.  Can set '*portp' to the null string.
   *
   * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
   * brackets.  Resolves ambiguous cases that might represent an IPv6 address or
@@ -404,7 +404,7 @@ inet_parse_host_port_tokens(char *s, char **hostp, char 
**portp)
  
  /* Parses 's', a string in the form "[:]", into its port and host

   * components, and stores pointers to them in '*portp' and '*hostp'
- * respectively.  Both '*portp' and '*hostp' can end up null.
+ * respectively.  Either '*portp' and '*hostp' (but not both) can end up null.
   *
   * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
   * brackets.  Resolves ambiguous cases that might represent an IPv6 address or



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev: Fix typos in comment.

2018-04-17 Thread Mark Michelson

No surprise, but:

Acked-by: Mark Michelson 

On 04/17/2018 10:33 AM, Ben Pfaff wrote:

Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().")
Suggested-by: Mark Michelson 
Signed-off-by: Ben Pfaff 
---
  lib/netdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/netdev.c b/lib/netdev.c
index 1c4ae2c6cdd2..a418d9a00709 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -1187,7 +1187,7 @@ netdev_get_in4_by_name(const char *device_name, struct 
in_addr *in4)
  }
  
  /* Obtains an IPv4 or IPv6 address from 'device_name' and save the address in

- * '*ss', representing IPv4 addressse as v6-mapped.  Returns 0 if successful,
+ * '*in6', representing IPv4 addresses as v6-mapped.  Returns 0 if successful,
   * otherwise a positive errno value. */
  int
  netdev_get_ip_by_name(const char *device_name, struct in6_addr *in6)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1] ovs-ofctl: Fixed the "snoop" command of ovs-ofctl

2018-04-17 Thread Ashish Varma
In case where "use_names" is set (e.g. in an interactive session) to show
the port and table names when ovs-ofctl is run with snoop command,
ovs-ofctl would get stuck in an endless loop inside "table_iterator_next"
function's while loop checking for "while (ti->send_xid != recv_xid)".
This would happening because the "vconn" to ".snoop" socket would
not respond to TABLE_FEATURES_REQUEST sent by ovs-ofctl.

This commit disables showing port or table names in the snoop command.

Signed-off-by: Ashish Varma 
---
 utilities/ovs-ofctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index 6708b07..3023787 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -2326,6 +2326,7 @@ ofctl_snoop(struct ovs_cmdl_context *ctx)
 {
 struct vconn *vconn;
 
+use_names = 0; /* don't show port and table names */
 open_vconn__(ctx->argv[1], SNOOP, &vconn);
 monitor_vconn(vconn, false, false);
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 01:03:17PM -0700, Ben Pfaff wrote:
> On Mon, Apr 16, 2018 at 04:26:27PM +0200, Jan Scheurich wrote:
> > The current implementation of the "dp_hash" selection method suffers
> > from two deficiences: 1. The hash mask and hence the number of dp_hash
> > values is just large enough to cover the number of group buckets, but
> > does not consider the case that buckets have different weights. 2. The
> > xlate-time selection of best bucket from the masked dp_hash value often
> > results in bucket load distributions that are quite different from the
> > bucket weights because the number of available masked dp_hash values
> > is too small (2-6 bits compared to 32 bits of a full hash in the default
> > hash selection method).
> 
> Clang gives me the following errors:
> 
> ../ofproto/ofproto-dpif.c:4792:32: error: unknown warning group 
> '-Wmaybe-uninitialized', ignored [-Werror,-Wunknown-pragmas]
> ../ofproto/ofproto-dpif.c:4814:33: error: initializing 'struct 
> ofputil_group_props *' with an expression of type 'const struct 
> ofputil_group_props *' discards qualifiers 
> [-Werror,-Wincompatible-pointer-types-discards-qualifiers]

How about this approach, which should cleanly eliminate the warning?

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index e1a5c097f3aa..362339a4abb4 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4780,22 +4780,17 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 
 /* Use Webster method to distribute hash values over buckets. */
 for (int hash = 0; hash < n_hash; hash++) {
-double max_val = 0.0;
-struct webster *winner;
-for (i = 0; i < n_buckets; i++) {
-if (webster[i].value > max_val) {
-max_val = webster[i].value;
+struct webster *winner = &webster[0];
+for (i = 1; i < n_buckets; i++) {
+if (webster[i].value > winner->value) {
 winner = &webster[i];
 }
 }
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
 /* winner is a reference to a webster[] element initialized above. */
 winner->divisor += 2;
 winner->value = (double) winner->bucket->weight / winner->divisor;
 group->hash_map[hash] = winner->bucket;
 winner->bucket->aux++;
-#pragma GCC diagnostic pop
 }
 
 LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ovs-ofctl: Fixed the "snoop" command of ovs-ofctl

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 01:26:51PM -0700, Ashish Varma wrote:
> In case where "use_names" is set (e.g. in an interactive session) to show
> the port and table names when ovs-ofctl is run with snoop command,
> ovs-ofctl would get stuck in an endless loop inside "table_iterator_next"
> function's while loop checking for "while (ti->send_xid != recv_xid)".
> This would happening because the "vconn" to ".snoop" socket would
> not respond to TABLE_FEATURES_REQUEST sent by ovs-ofctl.
> 
> This commit disables showing port or table names in the snoop command.
> 
> Signed-off-by: Ashish Varma 

Thanks for figuring this out and fixing it.

Would you mind making the comment more detailed?  The current comment
explains what the code does, but it does not explain why it is
important.  If you could summarize the rationale from above, then it
would make it clear to the reader why it is important.

Thanks,

Ben.

> ---
>  utilities/ovs-ofctl.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
> index 6708b07..3023787 100644
> --- a/utilities/ovs-ofctl.c
> +++ b/utilities/ovs-ofctl.c
> @@ -2326,6 +2326,7 @@ ofctl_snoop(struct ovs_cmdl_context *ctx)
>  {
>  struct vconn *vconn;
>  
> +use_names = 0; /* don't show port and table names */
>  open_vconn__(ctx->argv[1], SNOOP, &vconn);
>  monitor_vconn(vconn, false, false);
>  }
> -- 
> 2.7.4
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev: Fix typos in comment.

2018-04-17 Thread Ben Pfaff
Thanks, applied to master.

On Tue, Apr 17, 2018 at 03:12:03PM -0500, Mark Michelson wrote:
> No surprise, but:
> 
> Acked-by: Mark Michelson 
> 
> On 04/17/2018 10:33 AM, Ben Pfaff wrote:
> >Fixes: ee4776b8bce1 ("netdev: New function netdev_get_ip_by_name().")
> >Suggested-by: Mark Michelson 
> >Signed-off-by: Ben Pfaff 
> >---
> >  lib/netdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/lib/netdev.c b/lib/netdev.c
> >index 1c4ae2c6cdd2..a418d9a00709 100644
> >--- a/lib/netdev.c
> >+++ b/lib/netdev.c
> >@@ -1187,7 +1187,7 @@ netdev_get_in4_by_name(const char *device_name, struct 
> >in_addr *in4)
> >  }
> >  /* Obtains an IPv4 or IPv6 address from 'device_name' and save the address 
> > in
> >- * '*ss', representing IPv4 addressse as v6-mapped.  Returns 0 if 
> >successful,
> >+ * '*in6', representing IPv4 addresses as v6-mapped.  Returns 0 if 
> >successful,
> >   * otherwise a positive errno value. */
> >  int
> >  netdev_get_ip_by_name(const char *device_name, struct in6_addr *in6)
> >
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] socket-util: Improve comments on (host, port) parsing functions.

2018-04-17 Thread Ben Pfaff
Thanks, applied to master.

On Tue, Apr 17, 2018 at 03:12:00PM -0500, Mark Michelson wrote:
> No surprise, but
> 
> Acked-by: Mark Michelson 
> 
> On 04/17/2018 10:31 AM, Ben Pfaff wrote:
> >Fixes: 0b043300dbad ("Make : parsing uniform treewide.")
> >Suggested-by: Mark Michelson 
> >Signed-off-by: Ben Pfaff 
> >---
> >  lib/socket-util.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/lib/socket-util.c b/lib/socket-util.c
> >index 12d16f582d52..1071a328d49a 100644
> >--- a/lib/socket-util.c
> >+++ b/lib/socket-util.c
> >@@ -387,7 +387,7 @@ inet_parse_tokens__(char *s, int host_index, char 
> >**hostp, char **portp)
> >  /* Parses 's', a string in the form "[:]", into its (required) 
> > host
> >   * and (optional) port components, and stores pointers to them in '*hostp' 
> > and
> >   * '*portp' respectively.  Always sets '*hostp' nonnull, although possibly 
> > to
> >- * an empty string empty string.  Can set '*portp' to the null string.
> >+ * an empty string.  Can set '*portp' to the null string.
> >   *
> >   * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
> >   * brackets.  Resolves ambiguous cases that might represent an IPv6 
> > address or
> >@@ -404,7 +404,7 @@ inet_parse_host_port_tokens(char *s, char **hostp, char 
> >**portp)
> >  /* Parses 's', a string in the form "[:]", into its port and 
> > host
> >   * components, and stores pointers to them in '*portp' and '*hostp'
> >- * respectively.  Both '*portp' and '*hostp' can end up null.
> >+ * respectively.  Either '*portp' and '*hostp' (but not both) can end up 
> >null.
> >   *
> >   * Supports both IPv4 and IPv6.  IPv6 addresses may be quoted with square
> >   * brackets.  Resolves ambiguous cases that might represent an IPv6 
> > address or
> >
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] packaging (was: Re: [PATCH v2] Add multi-column index support for the) Python IDL

2018-04-17 Thread Ben Pfaff
On Sat, Apr 14, 2018 at 12:19:02PM -0300, Raymond Burkholder wrote:
> On 04/13/2018 02:52 PM, Ben Pfaff wrote:
> >On Fri, Apr 13, 2018 at 10:27:03AM -0500, Terry Wilson wrote:
> >>One could argue that if if distro packaging is the issue, then distros
> >>could patch in the simple try/except ImportError and add the
> >>sortedcontainer code like I did above.
> >
> >I'm quite sympathetic to that viewpoint--I think that OVS currently has
> >far too much distro-specific stuff in it.  In the long run I'd like to
> >drop the Debian and Red Hat and XenServer packaging from the tree.  As a
> 
> How would it work then, if, ..  I enjoy the fact that I can run Master, use
> the tools in the tree to build my Debian packages, and then install those
> packages on the machines where they need to be?
> 
> >Debian developer myself, I know that it's not actually helpful for
> >upstream to provide packaging.
> 
> And if by upstream, you mean the distribution?  They can be quite behind at
> times.  Debian Stretch has 2.6, and nothing in Stretch-Backports. Buster
> does have 2.8 at the moment, but Buster can be very unstable for
> consistently building environments on demand.

In this case, by "upstream" I mean OVS developers and by "downstream" I
mean a distribution such as Debian or Red Hat.

There's a couple of things going on here.

One is that I think there is much less pressure than usual on downstream
to keep OVS packaging up-to-date because we maintain packaging
upstream.  I think that if OVS upstream stopped shipping packaging,
downstreams would eventually adapt by updating packaging more
frequently.

Also, I suspect that most users run from a release or at least a release
branch.  On a release branch, usually the packaging changes little, or
at least the packaging changes due to OVS changes are minimal, since OVS
itself doesn't change much on release branches.

I currently have a selfish reason to keep packaging in the tree, which
is that VMware internally uses both the .deb and .rpm packaging as part
of its internal processes (and releases to customers).  We're working on
getting together and contributing to OVS a container build and making
that what we use internally and provide to customers.  So I have
motivation to make sure that the containerization is good quality too,
since otherwise customers will be unhappy.  (At that point, I'll be
happy to transition to emeritus status as a Debian developer, since
being able to directly contribute to OVS downstream packaging--which I
do badly--is about the only reason I stick around there after 20+
years.)

So, if we do manage to get rid of packaging, it'll only be because
there's an acceptable alternative.  Your options will basically be:

* Use the containers, for any branch or master.  We might even publish
  binaries, dunno yet.  You can be sure they'll be pretty good since
  we'll actually use them too.

* Use packaging from your favored distro, for some release branch.

* Help out downstream with keeping the packaging up-to-date with master
  (or be patient and wait for other downstream folks to do the same).
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] checkpatch: add checks for new rst docs

2018-04-17 Thread Flavio Leitner
When a new rst document is added under Documentation, check if the
new file is added to the proper index.rst and to the automake.mk.

Signed-off-by: Flavio Leitner 
---
 utilities/checkpatch.py | 89 +
 1 file changed, 89 insertions(+)

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 0f389052c..53373f75b 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -155,6 +155,8 @@ __regex_trailing_operator = re.compile(r'^[^ ]* [^ ]*[?:]$')
 __regex_conditional_else_bracing = re.compile(r'^\s*else\s*{?$')
 __regex_conditional_else_bracing2 = re.compile(r'^\s*}\selse\s*$')
 __regex_has_xxx_mark = re.compile(r'.*xxx.*', re.IGNORECASE)
+__regex_added_doc_rst = re.compile(
+r'\ndiff .*Documentation/.*rst\nnew file mode')
 
 skip_leading_whitespace_check = False
 skip_trailing_whitespace_check = False
@@ -376,6 +378,84 @@ def check_comment_spelling(line):
 return False
 
 
+def __check_doc_is_listed(text, doctype, docdir, docfile):
+if doctype == 'rst':
+beginre = re.compile(r'\+\+\+.*{}/index.rst'.format(docdir))
+docre = re.compile(r'\n\+.*{}'.format(docfile.replace('.rst', '')))
+elif doctype == 'automake':
+beginre = re.compile(r'\+\+\+.*Documentation/automake.mk')
+docre = re.compile(r'\n\+\t{}/{}'.format(docdir, docfile))
+else:
+raise NotImplementedError("Invalid doctype: {}".format(doctype))
+
+res = beginre.search(text)
+if res is None:
+return True
+
+hunkstart = res.span()[1]
+hunkre = re.compile(r'\n(---|\+\+\+) (\S+)')
+res = hunkre.search(text[hunkstart:])
+if res is None:
+hunkend = len(text)
+else:
+hunkend = hunkstart + res.span()[0]
+
+hunk = text[hunkstart:hunkend]
+# find if the file is being added.
+if docre.search(hunk) is not None:
+return False
+
+return True
+
+
+def __check_new_docs(text, doctype):
+"""Check if the documentation is listed properly. If doctype is 'rst' then
+   the index.rst is checked. If the doctype is 'automake' then automake.mk
+   is checked. Returns TRUE if the new file is not listed."""
+failed = False
+new_docs = __regex_added_doc_rst.findall(text)
+for doc in new_docs:
+docpathname = doc.split(' ')[2]
+gitdocdir, docfile = os.path.split(docpathname.rstrip('\n'))
+if docfile == "index.rst":
+continue
+
+if gitdocdir.startswith('a/'):
+docdir = gitdocdir.replace('a/', '', 1)
+else:
+docdir = gitdocdir
+
+if __check_doc_is_listed(text, doctype, docdir, docfile):
+if doctype == 'rst':
+print_warning("New doc {} not listed in {}/index.rst".format(
+  docfile, docdir))
+elif doctype == 'automake':
+print_warning("New doc {} not listed in "
+  "Documentation/automake.mk".format(docfile))
+else:
+raise NotImplementedError("Invalid doctype: {}".format(
+  doctype))
+
+failed = True
+
+return failed
+
+
+def check_doc_docs_automake(text):
+return __check_new_docs(text, 'automake')
+
+
+def check_new_docs_index(text):
+return __check_new_docs(text, 'rst')
+
+
+file_checks = [
+{'regex': __regex_added_doc_rst,
+ 'check': check_new_docs_index},
+{'regex': __regex_added_doc_rst,
+ 'check': check_doc_docs_automake}
+]
+
 checks = [
 {'regex': None,
  'match_name':
@@ -517,6 +597,13 @@ def run_checks(current_file, line, lineno):
 print("%s\n" % line)
 
 
+def run_file_checks(text):
+"""Runs the various checks for the text."""
+for check in file_checks:
+if check['regex'].search(text) is not None:
+check['check'](text)
+
+
 def ovs_checkpatch_parse(text, filename):
 global print_file_name, total_line, checking_file
 
@@ -609,6 +696,8 @@ def ovs_checkpatch_parse(text, filename):
 if current_file.startswith('include/linux'):
 continue
 run_checks(current_file, cmp_line, lineno)
+
+run_file_checks(text)
 if __errors or __warnings:
 return -1
 return 0
-- 
2.14.3


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netns: Add documentation and update NEWS.

2018-04-17 Thread Flavio Leitner
On Fri, Apr 13, 2018 at 11:13:21AM -0700, Ben Pfaff wrote:
> On Wed, Apr 11, 2018 at 08:50:56PM -0300, Flavio Leitner wrote:
> > Create a document to describe the how it works and known
> > limitations and update the NEWS accordingly.
> > 
> > Signed-off-by: Flavio Leitner 
> 
> Thanks!  I folded in the following to make "checkpatch" and "make"
> happy, and then I applied this to master.

Thanks a lot Ben, I sent out a patch to improve checkpatch:
$ ./utilities/checkpatch.py 
_ovs-dev_PATCH_netns_Add_documentation_and_update_NEWS.patch 
== Checking "_ovs-dev_PATCH_netns_Add_documentation_and_update_NEWS.patch" ==
WARNING: New doc networking-namespaces.rst not listed in 
Documentation/topics/index.rst
WARNING: New doc networking-namespaces.rst not listed in 
Documentation/automake.mk
Lines checked: 106, Warnings: 2, Errors: 0

https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346220.html

fbl

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Add multi-column index support for the Python IDL

2018-04-17 Thread Ben Pfaff
On Thu, Apr 12, 2018 at 07:24:27PM -0500, twil...@redhat.com wrote:
> From: Terry Wilson 
> 
> This adds multi-column index support for the Python IDL that is
> similar to the feature in the C IDL. Since it adds sortedcontainers
> as a dependency and some distros don't yet package it, the library
> is copied in-tree and used if sortedcontainers is not installed.
> 
> Signed-off-by: Terry Wilson 

Thank you!

flake8 complained bitterly about some of the new files, so I folded in
the following, and applied this to master.

Thanks again,

Ben.

--8<--cut here-->8--

diff --git a/python/automake.mk b/python/automake.mk
index 458a2c3d20b4..5a1e1da8a79a 100644
--- a/python/automake.mk
+++ b/python/automake.mk
@@ -63,7 +63,7 @@ EXTRA_DIST += $(PYFILES)
 PYCOV_CLEAN_FILES += $(PYFILES:.py=.py,cover)
 
 FLAKE8_PYFILES += \
-   $(filter-out python/ovs/dirs.py,$(PYFILES)) \
+   $(filter-out python/ovs/compat/% python/ovs/dirs.py,$(PYFILES)) \
python/setup.py \
python/build/__init__.py \
python/build/nroff.py \
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] checkpatch: add checks for new rst docs

2018-04-17 Thread Ben Pfaff
On Tue, Apr 17, 2018 at 06:46:24PM -0300, Flavio Leitner wrote:
> When a new rst document is added under Documentation, check if the
> new file is added to the proper index.rst and to the automake.mk.
> 
> Signed-off-by: Flavio Leitner 

Nice.  This ought to catch problems even if the submitter and the
applier (is that a real word?) forget to run "make", as I occasionally
do for doc patches.

I applied this to master.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] raft ovsdb clustering with scale test

2018-04-17 Thread aginwala
Thanks Numan:


Will also try to go little deep and see if we get little more hints by
increasing the log levels to debug and will wait for inputs from you and
Ben further . Hope we can find the culprit soon :)


Regards,


On Tue, Apr 17, 2018 at 12:13 AM, Numan Siddique 
wrote:

>
>
> On Tue, Apr 17, 2018 at 5:52 AM, aginwala  wrote:
>
>> Hi Ben/Numan:
>>
>>
>> Just following up again. Let me know for any further
>> improvements/suggestions on this.
>>
>>
> Hi Aliasgar,
>
> I  need to educate myself first by studing raft and Ben's patches :). I
> hope I will do it soon.
>
> Thanks
> Numan
>
>
>
>
>> Regards,
>>
>>
>> On Mon, Apr 9, 2018 at 10:53 AM, aginwala  wrote:
>>
>> > Further Analysis indicates that cluster is not able to select a new
>> leader
>> > and hence every node keeps retrying:
>> >
>> > ovn-northd logs around the same time on all three nodes complains about
>> > disconnection from cluster and failing to choose leader
>> >
>> >
>> > #Node1
>> > 2018-04-06T01:19:06.481Z|00466|ovn_northd|INFO|ovn-northd lock lost.
>> This
>> > ovn-northd instance is now on standby.
>> > 2018-04-06T01:19:07.668Z|00467|ovn_northd|INFO|ovn-northd lock
>> acquired.
>> > This ovn-northd instance is now active.
>> > 2018-04-06T01:19:07.671Z|00468|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
>> > clustered database server is not cluster leader; trying another server
>> > 2018-04-06T01:19:15.673Z|00469|reconnect|INFO|tcp:192.168.220.103:6642:
>> > connected
>> > 2018-04-06T01:19:15.673Z|00470|ovn_northd|INFO|ovn-northd lock lost.
>> This
>> > ovn-northd instance is now on standby.
>> >
>> > #Node2
>> > 2018-04-06T01:19:58.249Z|00487|ovn_northd|INFO|ovn-northd lock
>> acquired.
>> > This ovn-northd instance is now active.
>> > 2018-04-06T01:19:58.255Z|00488|ovsdb_idl|INFO|tcp:192.168.220.103:6642:
>> > clustered database server is disconnected from cluster; trying another
>> > server
>> > 2018-04-06T01:20:09.261Z|00489|reconnect|INFO|tcp:192.168.220.102:6642:
>> > connected
>> > 2018-04-06T01:20:09.261Z|00490|ovn_northd|INFO|ovn-northd lock lost.
>> This
>> > ovn-northd instance is now on standby.
>> >
>> >
>> > #Node3
>> > 2018-04-06T01:19:01.654Z|00964|ovn_northd|INFO|ovn-northd lock
>> acquired.
>> > This ovn-northd instance is now active.
>> > 2018-04-06T01:19:01.711Z|00965|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
>> > clustered database server is disconnected frr
>> > om cluster; trying another server
>> > 2018-04-06T01:19:09.715Z|00966|reconnect|INFO|tcp:192.168.220.103:6642:
>> > connected
>> > 2018-04-06T01:19:09.716Z|00967|ovn_northd|INFO|ovn-northd lock lost.
>> This
>> > ovn-northd instance is now on standby.
>> >
>> >
>> >
>> > Regards,
>> > Aliasgar
>> >
>> > On Fri, Apr 6, 2018 at 2:16 PM, aginwala  wrote:
>> >
>> >> Hi Ben/Numan:
>> >>
>> >> So I went ahead to try out clustered db in scale env and results are
>> not
>> >> as expected.
>> >> OVS branch used is master; commit-id(2062840612904ff0873d
>> >> 46b2f4f226bc23f2296d)
>> >>
>> >> Setup is uing 3 nodes.
>> >>
>> >> Also disabled inactivity_probe,
>> >> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:
>> >> 192.168.220.103:6641" set NB . external_ids:inactivity_probe=0
>> >> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:
>> >> 192.168.220.103:6642" set SB . external_ids:inactivity_probe=0
>> >>
>> >>
>> >> 1. With wait_up = true for 10k lports and 1k HVs which internally uses
>> >> wait-until, it was able to create around 1k lports and rally exited
>> due db
>> >> connection error.
>> >> Cause of failure: database connection failed is @
>> >> https://raw.githubusercontent.com/noah8713/ovn-scale-test/e3
>> >> 98ccdd77b5c0bfed3d1cfe37c7e7ac5d8d8f81/results/output_raft_fail.html
>> >>
>> >> Some data from sb db logs are as below
>> >> 2018-04-05T06:29:32.628Z|00065|poll_loop|INFO|wakeup due to 9-ms
>> timeout
>> >> at lib/reconnect.c:643 (90% CPU usage)
>> >> 2018-04-05T06:30:19.432Z|00066|raft|INFO|current entry eid
>> >> 23a85453-70a1-49ae-bf8f-22a1eeda6a60 does not match prerequisite
>> >> 966d3234-961a-4dc5-b2ad-4f12766fd610 in execute_command_request
>> >> 2018-04-05T06:31:03.428Z|00067|raft|INFO|current entry eid
>> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> >> 2018-04-05T06:31:03.469Z|00068|raft|INFO|current entry eid
>> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> >> 2018-04-05T06:31:03.492Z|00069|raft|INFO|current entry eid
>> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> >> 2018-04-05T06:31:03.505Z|00070|raft|INFO|current entry eid
>> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> >> 2018-04-05T06:31:03.510Z|00071|raft|INFO|

[ovs-dev] [PATCH v2] ovs-ofctl: Fixed the "snoop" command of ovs-ofctl

2018-04-17 Thread Ashish Varma
In normal ovs-ofctl commands (e.g. add-flow), ovs-ofctl connects to
ovs-vswitchd process on “/.mgmt” unix socket.
In an output that contains a port or table, port name or table name can be
displayed, instead of their numbers, if the user turns on this feature.
(by -—names option) Also, this feature is enabled when interacting with a user
on console.  To fetch the names, ovs-ofctl sends TABLE FEATURE /
PORT DESCRIPTION request message to ovs-vswitchd process and expects a reply
on the connection.
When ovs-ofctl runs the snoop command, it connects to
“/.snoop” unix socket. ovs-vswitchd process would
not reply to the TABLE FEATURE / PORT DESCRIPTION request message on this
connection. It would only send any open flow message it receives from the
controller.
When using port/table name feature with snoop command, the print of open flow
message would call ‘tables_to_show()’/‘ports_to_show()’ which in turn would
send the request message on the snoop connection. ovs-vswitchd would not reply
back on this connection, but instead would keep sending the open flow messages
received from controller. ‘table_iterator_next()/port_iterator_next()’ function
would loop for ever waiting for response.
The fix for this is to turn off display of table/port names when running
snoop command.

Signed-off-by: Ashish Varma 
---
v1-v2

Added more description to the cause of the issue and reason to add the fix.
---
 utilities/ovs-ofctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index 6708b07..3023787 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -2326,6 +2326,7 @@ ofctl_snoop(struct ovs_cmdl_context *ctx)
 {
 struct vconn *vconn;
 
+use_names = 0; /* don't show port and table names */
 open_vconn__(ctx->argv[1], SNOOP, &vconn);
 monitor_vconn(vconn, false, false);
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses

2018-04-17 Thread wenxu
From: wenxu 

It makes OVS native tunneling honor tunnel-specified source addresses,
in the same way that Linux kernel tunneling honors them.

This patch made valid tun_src specified by flow-action can be used for
tunnel_src of packet. add a "local" property for a route entry and enhance
the priority of local route higher than user route.
Like the kernel space when lookup the route, if there are tun_src specified
by flow-action or port options. Check the tun_src wheather is a local
address, then lookup the route.

Signed-off-by: wenxu 
Signed-off-by: frank.zeng 
---
 lib/ovs-router.c | 36 +---
 lib/ovs-router.h |  2 +-
 lib/route-table.c|  4 +++-
 ofproto/ofproto-dpif-sflow.c |  2 +-
 ofproto/ofproto-dpif-xlate.c |  4 
 5 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/lib/ovs-router.c b/lib/ovs-router.c
index ad8bd62..bfb2b70 100644
--- a/lib/ovs-router.c
+++ b/lib/ovs-router.c
@@ -66,6 +66,7 @@ struct ovs_router_entry {
 struct in6_addr src_addr;
 uint8_t plen;
 uint8_t priority;
+bool local;
 uint32_t mark;
 };
 
@@ -110,13 +111,28 @@ ovs_router_lookup(uint32_t mark, const struct in6_addr 
*ip6_dst,
 const struct cls_rule *cr;
 struct flow flow = {.ipv6_dst = *ip6_dst, .pkt_mark = mark};
 
+if (src && ipv6_addr_is_set(src)) {
+const struct cls_rule *cr_src;
+struct flow flow_src = {.ipv6_dst = *src, .pkt_mark = mark};
+
+cr_src = classifier_lookup(&cls, OVS_VERSION_MAX, &flow_src, NULL);
+if (cr_src) {
+struct ovs_router_entry *p_src = ovs_router_entry_cast(cr_src);
+if (!p_src->local) {
+return false;
+}
+} else {
+return false;
+}
+}
+
 cr = classifier_lookup(&cls, OVS_VERSION_MAX, &flow, NULL);
 if (cr) {
 struct ovs_router_entry *p = ovs_router_entry_cast(cr);
 
 ovs_strlcpy(output_bridge, p->output_bridge, IFNAMSIZ);
 *gw = p->gw;
-if (src) {
+if (src && !ipv6_addr_is_set(src)) {
 *src = p->src_addr;
 }
 return true;
@@ -197,7 +213,7 @@ out:
 }
 
 static int
-ovs_router_insert__(uint32_t mark, uint8_t priority,
+ovs_router_insert__(uint32_t mark, uint8_t priority, bool local,
 const struct in6_addr *ip6_dst,
 uint8_t plen, const char output_bridge[],
 const struct in6_addr *gw)
@@ -217,6 +233,7 @@ ovs_router_insert__(uint32_t mark, uint8_t priority,
 p->mark = mark;
 p->nw_addr = match.flow.ipv6_dst;
 p->plen = plen;
+p->local = local;
 p->priority = priority;
 err = get_src_addr(ip6_dst, output_bridge, &p->src_addr);
 if (err && ipv6_addr_is_set(gw)) {
@@ -249,10 +266,12 @@ ovs_router_insert__(uint32_t mark, uint8_t priority,
 
 void
 ovs_router_insert(uint32_t mark, const struct in6_addr *ip_dst, uint8_t plen,
-  const char output_bridge[], const struct in6_addr *gw)
+  bool local, const char output_bridge[], 
+  const struct in6_addr *gw)
 {
 if (use_system_routing_table) {
-ovs_router_insert__(mark, plen, ip_dst, plen, output_bridge, gw);
+uint8_t priority = local ? plen + 64 : plen;
+ovs_router_insert__(mark, priority, local, ip_dst, plen, 
output_bridge, gw);
 }
 }
 
@@ -360,7 +379,7 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
 }
 }
 
-err = ovs_router_insert__(mark, plen + 32, &ip6, plen, argv[2], &gw6);
+err = ovs_router_insert__(mark, plen + 32, false, &ip6, plen, argv[2], 
&gw6);
 if (err) {
 unixctl_command_reply_error(conn, "Error while inserting route.");
 } else {
@@ -409,7 +428,7 @@ ovs_router_show(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 ds_put_format(&ds, "Route Table:\n");
 CLS_FOR_EACH(rt, cr, &cls) {
 uint8_t plen;
-if (rt->priority == rt->plen) {
+if (rt->priority == rt->plen || rt->local) {
 ds_put_format(&ds, "Cached: ");
 } else {
 ds_put_format(&ds, "User: ");
@@ -431,6 +450,9 @@ ovs_router_show(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 }
 ds_put_format(&ds, " SRC ");
 ipv6_format_mapped(&rt->src_addr, &ds);
+if (rt->local) {
+ds_put_format(&ds, " local");
+}
 ds_put_format(&ds, "\n");
 }
 unixctl_command_reply(conn, ds_cstr(&ds));
@@ -441,7 +463,7 @@ static void
 ovs_router_lookup_cmd(struct unixctl_conn *conn, int argc,
   const char *argv[], void *aux OVS_UNUSED)
 {
-struct in6_addr gw, src;
+struct in6_addr gw, src = in6addr_any;
 char iface[IFNAMSIZ];
 struct in6_addr ip6;
 unsigned int plen;
diff --git a/lib/ovs-router.h b/lib/ovs-router.h
index f65d652..34ea163 100644
--- a/lib/ovs-router.h
+++ b/lib/ovs-router.h
@@ -31,7 +31,7 @@ bool ovs_router_lookup(uint32_t mar

[ovs-dev] [PATCH v4] ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses

2018-04-17 Thread wenxu
From: wenxu 

It makes OVS native tunneling honor tunnel-specified source addresses,
in the same way that Linux kernel tunneling honors them.

This patch made valid tun_src specified by flow-action can be used for
tunnel_src of packet. add a "local" property for a route entry and enhance
the priority of local route higher than user route.
Like the kernel space when lookup the route, if there are tun_src specified
by flow-action or port options. Check the tun_src wheather is a local
address, then lookup the route.

Signed-off-by: wenxu 
Signed-off-by: frank.zeng 
---
 lib/ovs-router.c | 36 +---
 lib/ovs-router.h |  2 +-
 lib/route-table.c|  4 +++-
 ofproto/ofproto-dpif-sflow.c |  2 +-
 ofproto/ofproto-dpif-xlate.c |  4 
 5 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/lib/ovs-router.c b/lib/ovs-router.c
index ad8bd62..bfb2b70 100644
--- a/lib/ovs-router.c
+++ b/lib/ovs-router.c
@@ -66,6 +66,7 @@ struct ovs_router_entry {
 struct in6_addr src_addr;
 uint8_t plen;
 uint8_t priority;
+bool local;
 uint32_t mark;
 };
 
@@ -110,13 +111,28 @@ ovs_router_lookup(uint32_t mark, const struct in6_addr 
*ip6_dst,
 const struct cls_rule *cr;
 struct flow flow = {.ipv6_dst = *ip6_dst, .pkt_mark = mark};
 
+if (src && ipv6_addr_is_set(src)) {
+const struct cls_rule *cr_src;
+struct flow flow_src = {.ipv6_dst = *src, .pkt_mark = mark};
+
+cr_src = classifier_lookup(&cls, OVS_VERSION_MAX, &flow_src, NULL);
+if (cr_src) {
+struct ovs_router_entry *p_src = ovs_router_entry_cast(cr_src);
+if (!p_src->local) {
+return false;
+}
+} else {
+return false;
+}
+}
+
 cr = classifier_lookup(&cls, OVS_VERSION_MAX, &flow, NULL);
 if (cr) {
 struct ovs_router_entry *p = ovs_router_entry_cast(cr);
 
 ovs_strlcpy(output_bridge, p->output_bridge, IFNAMSIZ);
 *gw = p->gw;
-if (src) {
+if (src && !ipv6_addr_is_set(src)) {
 *src = p->src_addr;
 }
 return true;
@@ -197,7 +213,7 @@ out:
 }
 
 static int
-ovs_router_insert__(uint32_t mark, uint8_t priority,
+ovs_router_insert__(uint32_t mark, uint8_t priority, bool local,
 const struct in6_addr *ip6_dst,
 uint8_t plen, const char output_bridge[],
 const struct in6_addr *gw)
@@ -217,6 +233,7 @@ ovs_router_insert__(uint32_t mark, uint8_t priority,
 p->mark = mark;
 p->nw_addr = match.flow.ipv6_dst;
 p->plen = plen;
+p->local = local;
 p->priority = priority;
 err = get_src_addr(ip6_dst, output_bridge, &p->src_addr);
 if (err && ipv6_addr_is_set(gw)) {
@@ -249,10 +266,12 @@ ovs_router_insert__(uint32_t mark, uint8_t priority,
 
 void
 ovs_router_insert(uint32_t mark, const struct in6_addr *ip_dst, uint8_t plen,
-  const char output_bridge[], const struct in6_addr *gw)
+  bool local, const char output_bridge[], 
+  const struct in6_addr *gw)
 {
 if (use_system_routing_table) {
-ovs_router_insert__(mark, plen, ip_dst, plen, output_bridge, gw);
+uint8_t priority = local ? plen + 64 : plen;
+ovs_router_insert__(mark, priority, local, ip_dst, plen, 
output_bridge, gw);
 }
 }
 
@@ -360,7 +379,7 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
 }
 }
 
-err = ovs_router_insert__(mark, plen + 32, &ip6, plen, argv[2], &gw6);
+err = ovs_router_insert__(mark, plen + 32, false, &ip6, plen, argv[2], 
&gw6);
 if (err) {
 unixctl_command_reply_error(conn, "Error while inserting route.");
 } else {
@@ -409,7 +428,7 @@ ovs_router_show(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 ds_put_format(&ds, "Route Table:\n");
 CLS_FOR_EACH(rt, cr, &cls) {
 uint8_t plen;
-if (rt->priority == rt->plen) {
+if (rt->priority == rt->plen || rt->local) {
 ds_put_format(&ds, "Cached: ");
 } else {
 ds_put_format(&ds, "User: ");
@@ -431,6 +450,9 @@ ovs_router_show(struct unixctl_conn *conn, int argc 
OVS_UNUSED,
 }
 ds_put_format(&ds, " SRC ");
 ipv6_format_mapped(&rt->src_addr, &ds);
+if (rt->local) {
+ds_put_format(&ds, " local");
+}
 ds_put_format(&ds, "\n");
 }
 unixctl_command_reply(conn, ds_cstr(&ds));
@@ -441,7 +463,7 @@ static void
 ovs_router_lookup_cmd(struct unixctl_conn *conn, int argc,
   const char *argv[], void *aux OVS_UNUSED)
 {
-struct in6_addr gw, src;
+struct in6_addr gw, src = in6addr_any;
 char iface[IFNAMSIZ];
 struct in6_addr ip6;
 unsigned int plen;
diff --git a/lib/ovs-router.h b/lib/ovs-router.h
index f65d652..34ea163 100644
--- a/lib/ovs-router.h
+++ b/lib/ovs-router.h
@@ -31,7 +31,7 @@ bool ovs_router_lookup(uint32_t mar