On 2/3/2020 7:05 PM, Sriharsha Basavapatna wrote:
Hi Eli,

Thanks for sending this patchset. I have some questions about the
design, please see my comments below.

Thanks,
-Harsha

On Mon, Jan 20, 2020 at 8:39 PM Eli Britstein <el...@mellanox.com> wrote:
In the netdev datapath, packets arriving over a tunnel are processed by
more than one flow. For example a packet that should be decapsulated and
forwarded to another port is processed by two flows:
1. in_port=<PF>, outer-header matches, dst-port=4789 (VXLAN),
    actions:tnl_pop(vxlan_sys_4789).
2. in_port=vxlan_sys_4789, tunnel matches (VNI for example),
    inner-header matches, actions: vm1.

In order to offload such a multi-flow processing path, a multi-table HW
model is used.
Did you explore other ways to avoid this multi-flow processing model?
For example, maybe support some additional logic to merge the two
datapath flow-rules for decap into a single flow-rule in the offload
layer?
Yes, we did explore this approach, and our research led us to conclude
that following the OVS multi-table SW model in HW is the more robust
architecture. A single-table architecture for tunnel offloads has, for
example, the following issues:
1. Statistics - flow #1 counts bytes including the outer header. If the
outer header is IPv6, it can have extension headers, which must be
counted per packet. Flow #2 counts the bytes of the inner packet only.
2. Update rate - flattening multiple tables into a single one means
taking the cartesian product of all the flows. Any change to a single
flow in one table then requires many removals and insertions of rules
in HW.
The single-table model also breaks when offloading more complex actions
such as CT, dp_hash and others.

Flow #1 matches are offloaded as is, but tnl_pop is
translated to MARK and JUMP to a tnl-port table actions.
The response to the tnl_pop action is now different between the
non-offload and offload datapaths, since the offload layer changes it
into entirely different actions. The user would expect a consistent
response to the tnl_pop action, especially if you treat this flow rule
as an independent rule by itself (i.e., assuming there's no flow #2
subsequently).

Regarding the JUMP action:
IMO, the offload framework should avoid anything that is too vendor
specific and make use of simple/common abstractions. I'm not sure the
JUMP action is really needed to support offload of the tnl_pop action.
Also, it may not map to the real use case for which the JUMP action
seems to have been defined, which is to support flow grouping (see
rte_flow.h).
The rte_flow API does not offer an action equivalent to the SW tnl_pop
action, which moves the outer headers into a metadata buffer. The JUMP
action is the equivalent of recirculating on the vport, and the MARK
action makes sure the user gets a consistent response in SW/HW (again,
sticking to the SW model): in case of a miss, the packet is recovered
so that SW processing continues normally.

Flow #2 is
offloaded to that tnl-port table, its tunnel matches are translated to
outer-header matches, followed by the inner-header ones. The actions
start with an implicit DECAP action followed by the flow's actions.
The DECAP is done in flow #2. If it was done in flow #1, the outer
headers of the packets would have been lost, and they could not have
been matched in flow #2.
The only missing match field that would allow decap in flow #1 itself
seems to be the tunnel-id. If that were provided in flow #1, then we
may not need to match it again in flow #2, and flow #2 could just match
on the inner packet headers. This would also make handling a flow-miss
simple, since you wouldn't need the recovery logic (the offload layer
popping the header etc.). The packet would be in the right state for
the next stage of processing in SW if there's a flow-miss after the
first stage in HW. It also means tnl_pop can be done in flow #1,
making the behavior consistent with non-offload processing.

The other advantage of providing the tunnel-id in flow #1 is that we
wouldn't need to match on the entire packet, including the outer
headers, in the second stage; only the inner header fields would need
to be matched in flow #2.
Even if we could somehow pull the matches from flow #2 into flow #1
(leading back to the single-flow model) and do the DECAP in flow #1,
we would then have to change the SW matches in flow #2 for the case
where the packet is processed in SW, as it would no longer have the
outer header/tunnel.

Furthermore, once the HW processing path is composed of multi-table
processing, there is a need to handle the case where processing starts
in HW but is not completed there. For example, only flow #1 is
offloaded while flow #2 doesn't exist. For that, the MARK value set in
the tnl_pop action is received in SW, the packet is recovered via the
mapped vport, and processing continues in SW.
If it is not possible to provide the tunnel-id in flow #1, then the
other alternative might be to defer offloading until flow #2 and
offload only flow #2. You are anyway matching almost the entire outer
headers and the inner header in flow #2, and the decap action is also
added implicitly to flow #2. Why not avoid offloading flow #1
entirely? That way, none of the miss-handling code is required.
See the first argument about following the SW model.

As vports are virtual and cannot have rte_flows, offloading a vport
generates rte_flows for every PF in the system, as we cannot know from
which one the packets arrive.
Is it possible to track the "original in-port" through this two-stage
flow processing and generate the rte_flow only on the specific PF that
received the packet?
We could “track” it only if we tried to merge both flow #1 and flow #2. See the first argument about following the SW model.

Thanks,
-Harsha
This series adds support for IPv6, tunnel push actions, and the
multi-table model for the decap processing.

v1 Travis passed:
https://travis-ci.org/elibritstein/OVS/builds/639484233

Eli Britstein (14):
   dpif-netdev: Add mega ufid in flow add log
   netdev-offload-dpdk: Remove pre-validate of patterns function
   netdev-offload-dpdk: Add IPv6 pattern matching
   netdev-offload-dpdk: Support offload of clone tnl_push/output actions
   netdev-offload-dpdk: Support tnl/push using vxlan encap attribute
   netdev-offload: Don't use zero flow mark
   netdev-offload-dpdk: Introduce map APIs for table id and miss context
   netdev-offload-dpdk: Implement HW miss packet recover for vport
   netdev-offload-dpdk: Refactor disassociate ufid function
   netdev-offload-dpdk: Support tunnel pop action
   netdev-offload-dpdk: Implement flow dump create/destroy APIs
   netdev-dpdk: Getter function for dpdk devargs API
   netdev-dpdk: Introduce get netdev by devargs function
   netdev-dpdk-offload: Add vxlan pattern matching function

Ilya Maximets (4):
   netdev: Allow storing dpif type into netdev structure.
   netdev-offload: Use dpif type instead of class.
   netdev-offload: Allow offloading to netdev without ifindex.
   netdev-offload: Disallow offloading to unrelated tunneling vports.

Ophir Munk (6):
   dpif-netdev: Make mark allocation public API
   dpif-netdev: Add HW miss packet state recover logic
   netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
     port
   netdev-dpdk-offload: Infrastructure for multiple rte_flows per UFID
   netdev-dpdk: Add flow_api support for netdev vxlan vports
   netdev-offload-dpdk: Support offload for vxlan vport

Oz Shlomo (1):
   netdev-offload: Add HW miss packet state recover API

  Documentation/howto/dpdk.rst  |    5 +-
  NEWS                          |    6 +-
  lib/dpif-netdev.c             |   70 +--
  lib/dpif-netlink.c            |   23 +-
  lib/dpif.c                    |   21 +-
  lib/netdev-dpdk.c             |   68 +++
  lib/netdev-dpdk.h             |    6 +
  lib/netdev-offload-dpdk.c     | 1275 +++++++++++++++++++++++++++++++++++------
  lib/netdev-offload-provider.h |    6 +
  lib/netdev-offload-tc.c       |   11 +-
  lib/netdev-offload.c          |  115 ++--
  lib/netdev-offload.h          |   23 +-
  lib/netdev-provider.h         |    3 +-
  lib/netdev.c                  |   16 +
  lib/netdev.h                  |    2 +
  ofproto/ofproto-dpif-upcall.c |    5 +-
  tests/dpif-netdev.at          |   14 +-
  tests/ofproto-macros.at       |    3 +-
  18 files changed, 1394 insertions(+), 278 deletions(-)

--
2.14.5

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev