On 9 May 2017 at 04:46, Zoltán Balogh <zoltan.bal...@ericsson.com> wrote:
> Hi,
>
> Thank you for your comments. Actually, I was wrong in my previous e-mail when
> I stated that the packet is truncated only at the end of the pipeline, when
> output is applied. The packet size is set according to the max_len set by
> truncate when tunnel_push is applied. So the size of the packet is correct.
>
> The root cause of the problem, why stats in the underlay bridge are wrong, is
> that statistics are incremented by the packet count and packet size set by
> upcall_xlate():
>
> static void
> upcall_xlate(struct udpif *udpif, struct upcall *upcall,
>              struct ofpbuf *odp_actions, struct flow_wildcards *wc)
> {
>     struct dpif_flow_stats stats;
>     struct xlate_in xin;
>
>     stats.n_packets = 1;
>     stats.n_bytes = dp_packet_size(upcall->packet);
>     stats.used = time_msec();
>     stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
>     ...
> }
>
> Since we excluded recirculation in the "Avoid recirculation" patch, there is
> no second upcall when the packet enters the tunnel, but the stats created by
> the "first" and only upcall are used for the statistics of both the
> integration and underlay bridges. And that's not correct.
>
> I completed my old patch to solve this problem by adding two new members to
> struct rule_dpif and modifying stats with their values.
Hi Zoltan, thanks again for looking into this.

I had trouble reviewing this, in part because changes that fix multiple
issues are being placed into one patch. If there are separate issues being
addressed, each should be fixed in its own patch, with a clear description
of the intent of the change (with links to the patches that introduced the
breakage, if possible!). If there is refactoring to do, consider separating
the non-functional changes into a separate patch: when looking at the
original patch that started this discussion, I found it difficult to review
post-merge because the refactoring was included and I couldn't tell whether
there were actually functional changes.

Given that this is taking a bit longer than I expected and we need to take
a holistic approach to how all of these interactions should work, I plan to
go ahead and apply the revert patch. I look forward to a fresh series that
brings back the benefits of the original patch.

More feedback below.

> Here comes the patch:
>
>
> Since tunneling and forwarding via patch port uses clone action, the tunneled

Where does forwarding via a patch port use the clone action?

> packet will not be parsed by miniflow_extract() and flow data will not be
> updated within clone. So when a tunnel header is added to the packet, the MAC
> and IP data of struct flow will not be updated according to the newly created
> tunnel header.
>
> This patch stores the MAC and IP data of the flow before calling
> apply_nested_clone_actions() in build_tunnel_send(), then restores the data
> after apply_nested_clone_actions() has returned.
>
> Furthermore, it introduces truncated_packet_size and tunnel_header_size
> struct rule_dpif members in order to correct the n_bytes statistics of OF
> flows.
> ---
>  ofproto/ofproto-dpif-xlate.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
>  ofproto/ofproto-dpif.c       | 14 ++++++-
>  ofproto/ofproto-dpif.h       |  5 +++
>  tests/ofproto-dpif.at        | 11 +++++-
>  4 files changed, 120 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index b308f21..55015d7 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3141,6 +3141,33 @@ tnl_send_arp_request(struct xlate_ctx *ctx, const struct xport *out_dev,
>      dp_packet_uninit(&packet);
>  }
>
> +/*
> + * Copy IP data of 'flow->tunnel' into 'flow' and 'base_flow'.
> + */
> +static void
> +propagate_tunnel_data_to_flow(struct flow *flow, struct flow *base_flow)
> +{
> +    flow->nw_dst = flow->tunnel.ip_dst;
> +    flow->nw_src = flow->tunnel.ip_src;
> +    flow->ipv6_dst = flow->tunnel.ipv6_dst;
> +    flow->ipv6_src = flow->tunnel.ipv6_src;
> +
> +    flow->nw_tos = flow->tunnel.ip_tos;
> +    flow->nw_ttl = flow->tunnel.ip_ttl;
> +    flow->tp_dst = flow->tunnel.tp_dst;
> +    flow->tp_src = flow->tunnel.tp_src;
> +
> +    base_flow->nw_dst = flow->nw_dst;
> +    base_flow->nw_src = flow->nw_src;
> +    base_flow->ipv6_dst = flow->ipv6_dst;
> +    base_flow->ipv6_src = flow->ipv6_src;
> +
> +    base_flow->nw_tos = flow->nw_tos;
> +    base_flow->nw_ttl = flow->nw_ttl;
> +    base_flow->tp_dst = flow->tp_dst;
> +    base_flow->tp_src = flow->tp_src;
> +}
> +
>  static int
>  build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>                    const struct flow *flow, odp_port_t tunnel_odp_port)
> @@ -3156,6 +3183,11 @@ build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>      int err;
>      char buf_sip6[INET6_ADDRSTRLEN];
>      char buf_dip6[INET6_ADDRSTRLEN];
> +    /* Structures to backup Ethernet and IP data of flow and base_flow. */
> +    struct flow old_flow = ctx->xin->flow;
> +    struct flow old_base_flow = ctx->base_flow;
> +    /* Variable to backup actual tunnel header size. */
> +    uint64_t old_ths = 0;

apply_nested_clone_actions() also takes copies of the flow and base_flow;
do we need to do it twice in this path? I suspect that this work could
benefit from some further refactoring, taking care to keep refactoring and
functional changes in separate patches.

>
>      err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
>      if (err) {
> @@ -3216,16 +3248,57 @@ build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>      tnl_push_data.tnl_port = odp_to_u32(tunnel_odp_port);
>      tnl_push_data.out_port = odp_to_u32(out_dev->odp_port);
>
> +    /* After the tunnel header has been added, MAC and IP data of flow and
> +     * base_flow need to be set properly, since there is no recirculation
> +     * any more when sending the packet to the tunnel. */
> +    if (!eth_addr_is_zero(ctx->xin->flow.dl_dst)) {
> +        ctx->xin->flow.dl_dst = dmac;
> +        ctx->base_flow.dl_dst = ctx->xin->flow.dl_dst;
> +    }

What's the expectation if ctx->xin->flow.dl_dst is zero? Is that for L3
tunneled packets?

> +
> +    ctx->xin->flow.dl_src = smac;
> +    ctx->base_flow.dl_src = ctx->xin->flow.dl_src;
> +
> +    propagate_tunnel_data_to_flow(&ctx->xin->flow, &ctx->base_flow);
> +
> +    if (!tnl_params.is_ipv6) {
> +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IP);
> +    } else {
> +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IPV6);
> +    }

When writing "if (!condition) { ... } else { ... }", please consider
whether you can swap the branches and remove the negation. It's less
cognitive load if we don't have to negate a negative condition to work out
which case the "else" applies to. (This otherwise looks correct, so just a
style request.)
> +
> +    if (tnl_push_data.tnl_type == OVS_VPORT_TYPE_GRE) {
> +        ctx->xin->flow.nw_proto = IPPROTO_GRE;
> +    } else {
> +        ctx->xin->flow.nw_proto = IPPROTO_UDP;
> +    }
> +    ctx->base_flow.nw_proto = ctx->xin->flow.nw_proto;

Should we use a switch () statement here to ensure that newer tunnel port
types are properly handled if/when they are added? What about things like
LISP or STT?

> +
>      size_t push_action_size = 0;
>      size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>                                             OVS_ACTION_ATTR_CLONE);
>      odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>      push_action_size = ctx->odp_actions->size;
> +
> +    if (ctx->rule) {
> +        old_ths = ctx->rule->tunnel_header_size;
> +        ctx->rule->tunnel_header_size = tnl_push_data.header_len;
> +    }
> +
>      apply_nested_clone_actions(ctx, xport, out_dev);
>      if (ctx->odp_actions->size > push_action_size) {
>          /* Update the CLONE action only when combined */
>          nl_msg_end_nested(ctx->odp_actions, clone_ofs);
>      }

If there are no actions on the underlay bridge (i.e. drop), the else
branch here should probably reset the nl_msg nesting so that we don't
generate any actions for the datapath at all. You might also consider
adding an xlate_report() call to explain why the drop is occurring; this
will help debugging later on using ofproto/trace.

> +
> +    if (ctx->rule) {
> +        ctx->rule->tunnel_header_size = old_ths;
> +    }
> +
> +    /* Restore flow and base_flow data. */
> +    ctx->xin->flow = old_flow;
> +    ctx->base_flow = old_base_flow;
> +
>      return 0;
>  }
>
> @@ -3728,6 +3801,10 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id,
>      }
>
>      if (rule) {
> +        if (ctx->rule) {
> +            rule->truncated_packet_size = ctx->rule->truncated_packet_size;
> +            rule->tunnel_header_size = ctx->rule->tunnel_header_size;
> +        }
>          /* Fill in the cache entry here instead of xlate_recursively
>           * to make the reference counting more explicit.  We take a
>           * reference in the lookups above if we are going to cache the

> @@ -4602,6 +4679,8 @@ xlate_output_trunc_action(struct xlate_ctx *ctx,
>      bool support_trunc = ctx->xbridge->support.trunc;
>      struct ovs_action_trunc *trunc;
>      char name[OFP10_MAX_PORT_NAME_LEN];
> +    /* Variable to backup truncate max len. */
> +    uint64_t old_tps = 0;
>
>      switch (port) {
>      case OFPP_TABLE:
> @@ -4634,7 +4713,20 @@ xlate_output_trunc_action(struct xlate_ctx *ctx,
>                                  OVS_ACTION_ATTR_TRUNC,
>                                  sizeof *trunc);
>          trunc->max_len = max_len;
> +
> +        /* Update truncate correction. */
> +        if (ctx->rule) {
> +            old_tps = ctx->rule->truncated_packet_size;
> +            ctx->rule->truncated_packet_size = max_len;
> +        }
> +
>          xlate_output_action(ctx, port, max_len, false);
> +
> +        /* Restore truncate correction. */
> +        if (ctx->rule) {
> +            ctx->rule->truncated_packet_size = old_tps;
> +        }
> +
>          if (!support_trunc) {
>              ctx->xout->slow |= SLOW_ACTION;
>          }
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index dc5f004..800d0f6 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -3959,8 +3959,18 @@ rule_dpif_credit_stats__(struct rule_dpif *rule,
>      OVS_REQUIRES(rule->stats_mutex)
>  {
>      if (credit_counts) {
> +        uint64_t stats_n_bytes = 0;
> +
> +        if (rule->truncated_packet_size) {
> +            stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes);
> +        } else {
> +            stats_n_bytes = stats->n_bytes;
> +        }

Is this fixing a separate issue? I'm confused about whether truncated
packet stats are being correctly attributed today on master. If so, I
think this should be split out into a separate patch.

You might also consider whether it makes more sense to modify 'stats'
earlier in the path so that each rule doesn't need to individually apply
the adjustment. I could imagine a store/restore of the stats, plus
modifying them for truncation during xlate_output_trunc_action()
processing, rather than pushing this complexity all the way into the rule
stats attribution.
> +
> +        stats_n_bytes += rule->tunnel_header_size;
> +
>          rule->stats.n_packets += stats->n_packets;
> -        rule->stats.n_bytes += stats->n_bytes;
> +        rule->stats.n_bytes += stats_n_bytes;
>      }
>      rule->stats.used = MAX(rule->stats.used, stats->used);
>  }
> @@ -4153,6 +4163,8 @@ rule_dpif_lookup_from_table(struct ofproto_dpif *ofproto,
>          entry->table.match = (rule != NULL);
>      }
>      if (rule) {
> +        rule->truncated_packet_size = 0;
> +        rule->tunnel_header_size = 0;
>          goto out;   /* Match. */
>      }
>      if (honor_table_miss) {
> diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
> index a3a6f1f..748df4c 100644
> --- a/ofproto/ofproto-dpif.h
> +++ b/ofproto/ofproto-dpif.h
> @@ -93,6 +93,11 @@ struct rule_dpif {
>       * The recirculation id and associated internal flow should
>       * be freed when the rule is freed */
>      uint32_t recirc_id;
> +
> +    /* Variables to be used to correct statistics when packet is sent to
> +     * tunnel or truncated. */
> +    uint64_t truncated_packet_size;
> +    uint64_t tunnel_header_size;
>  };
>
>  struct rule_dpif *rule_dpif_lookup_from_table(struct ofproto_dpif *,
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index e52ab32..1f6cd84 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -6257,6 +6257,15 @@ HEADER
>  dgramSeqNo=1
>  ds=127.0.0.1>2:1000
>  fsSeqNo=1
> +tunnel4_out_length=0
> +tunnel4_out_protocol=47
> +tunnel4_out_src=1.1.2.88
> +tunnel4_out_dst=1.1.2.92
> +tunnel4_out_src_port=0
> +tunnel4_out_dst_port=0
> +tunnel4_out_tcp_flags=0
> +tunnel4_out_tos=0
> +tunnel_out_vni=456
>  in_vlan=0
>  in_priority=0
>  out_vlan=0
> @@ -6266,7 +6275,7 @@ HEADER
>  dropEvents=0
>  in_ifindex=2011
>  in_format=0
> -out_ifindex=2
> +out_ifindex=1
>  out_format=2
>  hdr_prot=1
>  pkt_len=46
> --
> 2.7.4

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev