On 9 May 2017 at 04:46, Zoltán Balogh <zoltan.bal...@ericsson.com> wrote:
> Hi,
>
> Thank you for your comments. Actually, I was wrong in my previous e-mail when
> I stated that the packet is truncated only at the end of the pipeline, when
> output is applied. The packet size is set according to the max_len set by
> truncate when tunnel_push is applied. So the size of the packet is correct.
>
> The root cause of the problem, why stats in the underlay bridge are wrong, is
> that statistics are incremented by the packet count and packet size set by
> upcall_xlate():
>
> static void
> upcall_xlate(struct udpif *udpif, struct upcall *upcall,
>              struct ofpbuf *odp_actions, struct flow_wildcards *wc)
> {
>     struct dpif_flow_stats stats;
>     struct xlate_in xin;
>
>     stats.n_packets = 1;
>     stats.n_bytes = dp_packet_size(upcall->packet);
>     stats.used = time_msec();
>     stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
>     ...
> }
>
> Since we excluded recirculation in the "Avoid recirculation" patch, there is
> no second upcall when the packet enters the tunnel, but the stats created by
> the "first" and only upcall are used for the statistics of both the
> integration and underlay bridges. And that's not correct.
>
> I completed my old patch to solve this problem by adding two new members to
> struct rule_dpif and modifying stats with their values.
Hi Zoltan, thanks again for looking into this.

I had trouble reviewing this, in part because changes that fix multiple
issues are being placed into one patch. If there are separate issues being
addressed, each should be fixed in its own patch, with a clear description
of the intent of the change (with links to the patches that introduced the
breakage, if possible!). If there is refactoring to do, consider separating
the non-functional changes into a separate patch: when looking at the
original patch that started this discussion, I found it difficult to review
post-merge because the refactoring was included and I couldn't tell whether
there were actually functional changes.

Given that this is taking a bit longer than I expected and we need to take
a holistic approach to how all of these interactions should work, I plan to
go ahead and apply the revert patch. I look forward to a fresh series that
brings back the benefits of the original patch.

More feedback below.

> Here comes the patch:
>
>
> Since tunneling and forwarding via patch port uses clone action, the tunneled

Where does forwarding via a patch port use the clone action?

> packet will not be parsed by miniflow_extract() and flow data will not be
> updated within clone. So when a tunnel header is added to the packet, the MAC
> and IP data of struct flow will not be updated according to the newly created
> tunnel header.
>
> This patch stores the MAC and IP data of the flow before calling
> apply_nested_clone_actions() in build_tunnel_send(), then restores the data
> after apply_nested_clone_actions() has returned.
>
> Furthermore, it introduces truncated_packet_size and tunnel_header_size
> struct rule_dpif members in order to correct the n_bytes statistics of OF
> flows.
> ---
>  ofproto/ofproto-dpif-xlate.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
>  ofproto/ofproto-dpif.c       | 14 ++++++-
>  ofproto/ofproto-dpif.h       |  5 +++
>  tests/ofproto-dpif.at        | 11 +++++-
>  4 files changed, 120 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index b308f21..55015d7 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3141,6 +3141,33 @@ tnl_send_arp_request(struct xlate_ctx *ctx, const struct xport *out_dev,
>      dp_packet_uninit(&packet);
>  }
>
> +/*
> + * Copy IP data of 'flow->tunnel' into 'flow' and 'base_flow'.
> + */
> +static void
> +propagate_tunnel_data_to_flow(struct flow *flow, struct flow *base_flow)
> +{
> +    flow->nw_dst = flow->tunnel.ip_dst;
> +    flow->nw_src = flow->tunnel.ip_src;
> +    flow->ipv6_dst = flow->tunnel.ipv6_dst;
> +    flow->ipv6_src = flow->tunnel.ipv6_src;
> +
> +    flow->nw_tos = flow->tunnel.ip_tos;
> +    flow->nw_ttl = flow->tunnel.ip_ttl;
> +    flow->tp_dst = flow->tunnel.tp_dst;
> +    flow->tp_src = flow->tunnel.tp_src;
> +
> +    base_flow->nw_dst = flow->nw_dst;
> +    base_flow->nw_src = flow->nw_src;
> +    base_flow->ipv6_dst = flow->ipv6_dst;
> +    base_flow->ipv6_src = flow->ipv6_src;
> +
> +    base_flow->nw_tos = flow->nw_tos;
> +    base_flow->nw_ttl = flow->nw_ttl;
> +    base_flow->tp_dst = flow->tp_dst;
> +    base_flow->tp_src = flow->tp_src;
> +}
> +
>  static int
>  build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>                    const struct flow *flow, odp_port_t tunnel_odp_port)
> @@ -3156,6 +3183,11 @@ build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>      int err;
>      char buf_sip6[INET6_ADDRSTRLEN];
>      char buf_dip6[INET6_ADDRSTRLEN];
> +    /* Structures to backup Ethernet and IP data of flow and base_flow. */
> +    struct flow old_flow = ctx->xin->flow;
> +    struct flow old_base_flow = ctx->base_flow;
> +    /* Variable to backup actual tunnel header size. */
> +    uint64_t old_ths = 0;

apply_nested_clone_actions() also takes copies of the flow and base_flow;
do we need to do it twice in this path? I suspect that this work could
benefit from some further refactoring, taking care to keep refactoring and
functional changes in separate patches.

>
>      err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
>      if (err) {
> @@ -3216,16 +3248,57 @@ build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
>      tnl_push_data.tnl_port = odp_to_u32(tunnel_odp_port);
>      tnl_push_data.out_port = odp_to_u32(out_dev->odp_port);
>
> +    /* After the tunnel header has been added, MAC and IP data of flow and
> +     * base_flow need to be set properly, since there is no recirculation
> +     * any more when sending the packet to the tunnel. */
> +    if (!eth_addr_is_zero(ctx->xin->flow.dl_dst)) {
> +        ctx->xin->flow.dl_dst = dmac;
> +        ctx->base_flow.dl_dst = ctx->xin->flow.dl_dst;
> +    }

What's the expectation if ctx->xin->flow.dl_dst is zero? Is that for L3
tunneled packets?

> +
> +    ctx->xin->flow.dl_src = smac;
> +    ctx->base_flow.dl_src = ctx->xin->flow.dl_src;
> +
> +    propagate_tunnel_data_to_flow(&ctx->xin->flow, &ctx->base_flow);
> +
> +    if (!tnl_params.is_ipv6) {
> +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IP);
> +    } else {
> +        ctx->xin->flow.dl_type = htons(ETH_TYPE_IPV6);
> +    }

When writing "if (!condition) { ... } else { ... }", please consider
whether you can swap the branches and remove the negation. It's less
cognitive load if we don't have to negate a negative condition to work out
which case the "else" applies to. (This otherwise looks correct, so just a
style request.)
> +
> +    if (tnl_push_data.tnl_type == OVS_VPORT_TYPE_GRE) {
> +        ctx->xin->flow.nw_proto = IPPROTO_GRE;
> +    } else {
> +        ctx->xin->flow.nw_proto = IPPROTO_UDP;
> +    }
> +    ctx->base_flow.nw_proto = ctx->xin->flow.nw_proto;

Should we use a switch () statement here to ensure that newer tunnel port
types are properly handled if/when they are added? What about things like
LISP or STT?

> +
>      size_t push_action_size = 0;
>      size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>                                             OVS_ACTION_ATTR_CLONE);
>      odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>      push_action_size = ctx->odp_actions->size;
> +
> +    if (ctx->rule) {
> +        old_ths = ctx->rule->tunnel_header_size;
> +        ctx->rule->tunnel_header_size = tnl_push_data.header_len;
> +    }
> +
>      apply_nested_clone_actions(ctx, xport, out_dev);
>      if (ctx->odp_actions->size > push_action_size) {
>          /* Update the CLONE action only when combined */
>          nl_msg_end_nested(ctx->odp_actions, clone_ofs);
>      }

If there are no actions on the underlay bridge (i.e. drop), the else
branch here should probably reset the nl_msg nesting so that we don't
generate any actions for the datapath at all. You might also consider
adding an xlate_report() call to explain why the drop is occurring; this
will help debugging later on using ofproto/trace.

> +
> +    if (ctx->rule) {
> +        ctx->rule->tunnel_header_size = old_ths;
> +    }
> +
> +    /* Restore flow and base_flow data. */
> +    ctx->xin->flow = old_flow;
> +    ctx->base_flow = old_base_flow;
> +
>      return 0;
>  }
>
> @@ -3728,6 +3801,10 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id,
>      }
>
>      if (rule) {
> +        if (ctx->rule) {
> +            rule->truncated_packet_size = ctx->rule->truncated_packet_size;
> +            rule->tunnel_header_size = ctx->rule->tunnel_header_size;
> +        }
>          /* Fill in the cache entry here instead of xlate_recursively
>           * to make the reference counting more explicit.  We take a
>           * reference in the lookups above if we are going to cache the

> @@ -4602,6 +4679,8 @@ xlate_output_trunc_action(struct xlate_ctx *ctx,
>      bool support_trunc = ctx->xbridge->support.trunc;
>      struct ovs_action_trunc *trunc;
>      char name[OFP10_MAX_PORT_NAME_LEN];
> +    /* Variable to backup truncate max len. */
> +    uint64_t old_tps = 0;
>
>      switch (port) {
>      case OFPP_TABLE:
> @@ -4634,7 +4713,20 @@ xlate_output_trunc_action(struct xlate_ctx *ctx,
>                                  OVS_ACTION_ATTR_TRUNC,
>                                  sizeof *trunc);
>          trunc->max_len = max_len;
> +
> +        /* Update truncate correction. */
> +        if (ctx->rule) {
> +            old_tps = ctx->rule->truncated_packet_size;
> +            ctx->rule->truncated_packet_size = max_len;
> +        }
> +
>          xlate_output_action(ctx, port, max_len, false);
> +
> +        /* Restore truncate correction. */
> +        if (ctx->rule) {
> +            ctx->rule->truncated_packet_size = old_tps;
> +        }
> +
>          if (!support_trunc) {
>              ctx->xout->slow |= SLOW_ACTION;
>          }
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index dc5f004..800d0f6 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -3959,8 +3959,18 @@ rule_dpif_credit_stats__(struct rule_dpif *rule,
>      OVS_REQUIRES(rule->stats_mutex)
>  {
>      if (credit_counts) {
> +        uint64_t stats_n_bytes = 0;
> +
> +        if (rule->truncated_packet_size) {
> +            stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes);
> +        } else {
> +            stats_n_bytes = stats->n_bytes;
> +        }

Is this fixing a separate issue? I'm confused about whether truncated
packet stats are being correctly attributed today on master. If so, I
think this should be split out into a separate patch.

You might also consider whether it makes more sense to modify 'stats'
earlier in the path so that each rule doesn't need to individually apply
the adjustment. I could imagine a store/restore of the stats, plus
modifying them for truncation during xlate_output_trunc_action()
processing, rather than pushing this complexity all the way into the rule
stats attribution.
> +
> +        stats_n_bytes += rule->tunnel_header_size;
> +
>          rule->stats.n_packets += stats->n_packets;
> -        rule->stats.n_bytes += stats->n_bytes;
> +        rule->stats.n_bytes += stats_n_bytes;
>      }
>      rule->stats.used = MAX(rule->stats.used, stats->used);
>  }
> @@ -4153,6 +4163,8 @@ rule_dpif_lookup_from_table(struct ofproto_dpif *ofproto,
>          entry->table.match = (rule != NULL);
>      }
>      if (rule) {
> +        rule->truncated_packet_size = 0;
> +        rule->tunnel_header_size = 0;
>          goto out;   /* Match. */
>      }
>      if (honor_table_miss) {
> diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
> index a3a6f1f..748df4c 100644
> --- a/ofproto/ofproto-dpif.h
> +++ b/ofproto/ofproto-dpif.h
> @@ -93,6 +93,11 @@ struct rule_dpif {
>       * The recirculation id and associated internal flow should
>       * be freed when the rule is freed */
>      uint32_t recirc_id;
> +
> +    /* Variables to be used to correct statistics when packet is sent to
> +     * tunnel or truncated. */
> +    uint64_t truncated_packet_size;
> +    uint64_t tunnel_header_size;
>  };
>
>  struct rule_dpif *rule_dpif_lookup_from_table(struct ofproto_dpif *,
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index e52ab32..1f6cd84 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -6257,6 +6257,15 @@ HEADER
>  dgramSeqNo=1
>  ds=127.0.0.1>2:1000
>  fsSeqNo=1
> +tunnel4_out_length=0
> +tunnel4_out_protocol=47
> +tunnel4_out_src=1.1.2.88
> +tunnel4_out_dst=1.1.2.92
> +tunnel4_out_src_port=0
> +tunnel4_out_dst_port=0
> +tunnel4_out_tcp_flags=0
> +tunnel4_out_tos=0
> +tunnel_out_vni=456
>  in_vlan=0
>  in_priority=0
>  out_vlan=0
> @@ -6266,7 +6275,7 @@ HEADER
>  dropEvents=0
>  in_ifindex=2011
>  in_format=0
> -out_ifindex=2
> +out_ifindex=1
>  out_format=2
>  hdr_prot=1
>  pkt_len=46
> --
> 2.7.4

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev