On Wed, Jan 24, 2018 at 05:29:45PM +0000, Stokes, Ian wrote: > > -----Original Message----- > > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > > Sent: Wednesday, December 20, 2017 2:45 PM > > To: d...@openvswitch.org > > Cc: Finn Christensen <f...@napatech.com>; Darrell Ball <db...@vmware.com>; > > Chandran, Sugesh <sugesh.chand...@intel.com>; Simon Horman > > <simon.hor...@netronome.com>; Stokes, Ian <ian.sto...@intel.com>; Yuanhan > > Liu <y...@fridaylinux.org> > > Subject: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id > > > > Hi Yuanhan, thanks for working on this, a few comments on the commit message > and the code below to be addressed.
Thanks for the review, I will address all the comments tomorrow, including rebase and re-test. --yliu > > > Most modern NICs have the ability to bind a flow with a mark, so that > > every pkt matches such flow will have that mark present in its desc. > > Can you use 'packet' and 'descriptor' rather than abbreviations pkt & desc > above for clarity in the commit message. This can applies to other instances > throughput the patch. > > > > > The basic idea of doing that is, when we receives pkts later, we could > > directly get the flow from the mark. That could avoid some very costly CPU > > operations, including (but not limiting to) miniflow_extract, emc lookup, > > dpcls lookup, etc. Thus, performance could be greatly improved. > > > > Thus, the mojor work of this patch is to associate a flow with a mark id > Typo 'major' > > > (an uint32_t number). The association in netdev datapatch is done by CMAP, > Typo 'datapath' > > > while in hardware it's done by the rte_flow MARK action. > > > > One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the case > > there is only one phys port but with 2 queues, there could be 2 PMDs. In > > another word, even for a single mega flow (i.e. udp,tp_src=1000), there > I think above should be 'In otherwords' rather than 'another word'. > > > could be 2 different dp_netdev flows, one for each PMD. That could results > > to the same mega flow being offloaded twice in the hardware, worse, we may > > get 2 different marks and only the last one will work. > > > > To avoid that, a megaflow_to_mark CMAP is created. An entry will be added > > for the first PMD wants to offload a flow. For later PMDs, it will see > > Pmd 'that' wants to offload. > > > such megaflow is already offloaded, then the flow will not be offloaded to > > HW twice. > > > > Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is what > > Is this 1:N or 1:1? I thought I spotted below that it's 1:1. > > > the mark_to_flow CMAP for. For the first PMD wants to offload a flow, it > > mark_to_flow CMAP 'is' for. 'If' the first PMD wants to offload a flow it... > > > allocates a new mark and do the flow offload by reusing the > > And 'performs' the flow offload by.... > > > ->flow_put method. When it succeeds, a "mark to flow" entry will be > > added. For later PMDs, it will get the corresponding mark by above > > corresponding mark by 'the'... > > > megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added. > > > > Another thing might worth mentioning is that hte megaflow is created by > > Another point worth mentioning is that 'the'... > > > masking all the bytes from match->flow with match->wc. It works well so > > far, but I have a feeling that is not the best way. > > > > Co-authored-by: Finn Christensen <f...@napatech.com> > > Signed-off-by: Yuanhan Liu <y...@fridaylinux.org> > > Signed-off-by: Finn Christensen <f...@napatech.com> > > --- > > > > v5: - fixed check of flow_mark_has_no_ref (renamed from > > is_last_flow_mark_reference). > > This fixed an issue that it took too long to finish > > flow add/removal if we do that repeatdly. > > > > - do mark_to_flow disassociation if flow modification failed > > --- > > lib/dpif-netdev.c | 263 > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > lib/netdev.h | 6 ++ > > 2 files changed, 269 insertions(+) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 55be632..2fdc8dd > > 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -77,6 +77,7 @@ > > #include "tnl-ports.h" > > #include "unixctl.h" > > #include "util.h" > > +#include "uuid.h" > > > > Just a general comment for this patch, > > There's a lot of work ongoing in the dpif-netdev layer these days, did you > think about moving some of the HWOL functionality here to a separate HWOL > specific file? As HWOL grows over time I'm just thinking about the code > maintainability. > > > > VLOG_DEFINE_THIS_MODULE(dpif_netdev); > > > > @@ -442,7 +443,9 @@ struct dp_netdev_flow { > > /* Hash table index by unmasked flow. */ > > const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ > > Can you clarify the comment above, at first glance it's not clear /* In > owning dp_netdev_pmd_thread's */ ? > > > /* 'flow_table'. */ > > + const struct cmap_node mark_node; /* In owning flow_mark's > > + mark_to_flow */ > > Same here, it's not clear what is meant by 'In owning...' > > > const ovs_u128 ufid; /* Unique flow identifier. */ > > + const ovs_u128 mega_ufid; > > As per OVS Coding style can you add a comment for mega_ufid. > > > const unsigned pmd_id; /* The 'core_id' of pmd thread owning > > this */ > > /* flow. */ > > > > @@ -453,6 +456,7 @@ struct dp_netdev_flow { > > struct ovs_refcount ref_cnt; > > > > bool dead; > > + uint32_t mark; /* Unique flow mark assiged to a flow */ > > Typo, 'assigned'. > > > > > /* Statistics. */ > > struct dp_netdev_flow_stats stats; > > @@ -1832,6 +1836,172 @@ dp_netdev_pmd_find_dpcls(struct > > dp_netdev_pmd_thread *pmd, > > return cls; > > } > > > > +#define MAX_FLOW_MARK (UINT32_MAX - 1) > > +#define INVALID_FLOW_MARK (UINT32_MAX) > > + > > +struct megaflow_to_mark_data { > > + const struct cmap_node node; > > + ovs_u128 mega_ufid; > > + uint32_t mark; > > +}; > > + > > +struct flow_mark { > > + struct cmap megaflow_to_mark; > > + struct cmap mark_to_flow; > > + struct id_pool *pool; > > + struct ovs_mutex mutex; > > +}; > > + > > Breaks compilation with Sparse. Sparse complains with: > 'lib/dpif-netdev.c:1860:18: error: symbol 'flow_mark' was not declared. > Should it be static?' > > > +struct flow_mark flow_mark = { > > + .megaflow_to_mark = CMAP_INITIALIZER, > > + .mark_to_flow = CMAP_INITIALIZER, > > + .mutex = OVS_MUTEX_INITIALIZER, > > +}; > > + > > +static uint32_t > > +flow_mark_alloc(void) > > +{ > > + uint32_t mark; > > + > > I would expect the id pool to be allocated only once before it's used to > allocate ids for the marks. > > Would it make sense to move the !flow_mark.pool check and allocation out of > here to simplify the function to a single purpose and avoid the check for > every mark id allocation. > > > > + if (!flow_mark.pool) { > > + /* Haven't initiated yet, do it here */ > > + flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK); > > + } > > + > > + if (id_pool_alloc_id(flow_mark.pool, &mark)) { > > + return mark; > > + } > > + > > + return INVALID_FLOW_MARK; > > +} > > + > > +static void > > +flow_mark_free(uint32_t mark) > > +{ > > + id_pool_free_id(flow_mark.pool, mark); } > > + > > +/* associate flow with a mark, which is a 1:1 mapping */ static void > > Can you specify megaflow in the comment above. > > > +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark) { > > + size_t hash = dp_netdev_flow_hash(mega_ufid); > > + struct megaflow_to_mark_data *data = xzalloc(sizeof(*data)); > > + > > + data->mega_ufid = *mega_ufid; > > + data->mark = mark; > > + > > + cmap_insert(&flow_mark.megaflow_to_mark, > > + CONST_CAST(struct cmap_node *, &data->node), hash); } > > + > > +/* disassociate flow with a mark */ > > +static void > > +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) { > > + size_t hash = dp_netdev_flow_hash(mega_ufid); > > + struct megaflow_to_mark_data *data; > > + > > + CMAP_FOR_EACH_WITH_HASH (data, node, hash, > > &flow_mark.megaflow_to_mark) { > > + if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) { > > + cmap_remove(&flow_mark.megaflow_to_mark, > > + CONST_CAST(struct cmap_node *, &data->node), > > hash); > > + free(data); > > + return; > > + } > > + } > > + > > + VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n", > > + UUID_ARGS((struct uuid *)mega_ufid)); } > > + > > +static inline uint32_t > > +megaflow_to_mark_find(const ovs_u128 *mega_ufid) { > > + size_t hash = dp_netdev_flow_hash(mega_ufid); > > + struct megaflow_to_mark_data *data; > > + > > + CMAP_FOR_EACH_WITH_HASH (data, node, hash, > > &flow_mark.megaflow_to_mark) { > > + if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) { > > + return data->mark; > > + } > > + } > > + > > + return INVALID_FLOW_MARK; > > +} > > + > > +/* associate mark with a flow, which is 1:N mapping */ static void > > +mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow > > +*flow) { > > + dp_netdev_flow_ref(flow); > > + > > + cmap_insert(&flow_mark.mark_to_flow, > > + CONST_CAST(struct cmap_node *, &flow->mark_node), > > + mark); > > + flow->mark = mark; > > + > > + VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow, > > +mark); } > > + > > I'd like to see a comment explain the purpose of the function below. Which > reference is the flow_mark missing? > > > +static bool > > +flow_mark_has_no_ref(uint32_t mark) > > +{ > > + struct dp_netdev_flow *flow; > > + > > Maybe I'm missing something below, but I expected a hash to be computed for > mark before being called with CMAP_FOR_EACH_WITH_HASH? > > > + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark, > > + &flow_mark.mark_to_flow) { > > + if (flow->mark == mark) { > > + return false; > > + } > > + } > > + > > + return true; > > +} > > + > > +static int > > +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd, > > + struct dp_netdev_flow *flow) { > > + int ret = 0; > > + uint32_t mark = flow->mark; > > + struct cmap_node *mark_node = CONST_CAST(struct cmap_node *, > > + &flow->mark_node); > > + > > + cmap_remove(&flow_mark.mark_to_flow, mark_node, mark); > > + flow->mark = INVALID_FLOW_MARK; > > + > > A comment on this code block could be helpful. Mainly just to explain how the > reference could be missing and the required steps because of this. > > > + if (flow_mark_has_no_ref(mark)) { > > + struct dp_netdev_port *port; > > + odp_port_t in_port = flow->flow.in_port.odp_port; > > + > > Need port Mutex 'pmd->dp->port_mutex' before calling dp_netdev_lookup_port(). > > > + port = dp_netdev_lookup_port(pmd->dp, in_port); > > + if (port) { > > + ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL); > > + } > > + > > + flow_mark_free(mark); > > + VLOG_DBG("freed flow mark %u\n", mark); > > + > > + megaflow_to_mark_disassociate(&flow->mega_ufid); > > + } > > + dp_netdev_flow_unref(flow); > > + > > + return ret; > > +} > > + > > +static void > > +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) { > > + struct dp_netdev_flow *flow; > > + > > + CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) { > > + if (flow->pmd_id == pmd->core_id) { > > + mark_to_flow_disassociate(pmd, flow); > > + } > > + } > > +} > > + > > static void > > dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, > > struct dp_netdev_flow *flow) @@ -1845,6 +2015,9 > > @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, > > ovs_assert(cls != NULL); > > dpcls_remove(cls, &flow->cr); > > cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow- > > >ufid)); > > + if (flow->mark != INVALID_FLOW_MARK) { > > + mark_to_flow_disassociate(pmd, flow); > > + } > > flow->dead = true; > > > > dp_netdev_flow_unref(flow); > > @@ -2424,6 +2597,87 @@ out: > > return error; > > } > > > > +static void > > +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port, > > + struct dp_netdev_flow *flow, struct match *match, > > + const struct nlattr *actions, size_t actions_len) { > > + struct offload_info info; > > + struct dp_netdev_port *port; > > + bool modification = flow->mark != INVALID_FLOW_MARK; > > + const char *op = modification ? "modify" : "add"; > > + uint32_t mark; > > + int ret; > > + > > Again port Mutex 'pmd->dp->port_mutex' required in this function for calling > dp_netdev_lookup_port(). > > > + port = dp_netdev_lookup_port(pmd->dp, in_port); > > + if (!port) { > > + return; > > + } > > + > > + ovs_mutex_lock(&flow_mark.mutex); > > + > > + if (modification) { > > + mark = flow->mark; > > + } else { > > + if (!netdev_is_flow_api_enabled()) { > > + goto out; > > + } > > + > > + /* > > + * If a mega flow has already been offloaded (from other PMD > > + * instances), do not offload it again. > > + */ > > + mark = megaflow_to_mark_find(&flow->mega_ufid); > > + if (mark != INVALID_FLOW_MARK) { > > + VLOG_DBG("flow has already been offloaded with mark %u\n", > > mark); > Just to understand this, Am I right in thinking the expeted behavior is a > mark esists so it has been offloaded by another PMD, don't offload it in the > HW again BUT do mark it as associated. Is this associated call only relevant > to the current PMD? (i.e. each PMD must discover this association themselves?) > > > + mark_to_flow_associate(mark, flow); > > + goto out; > > + } > > + > > + mark = flow_mark_alloc(); > > + if (mark == INVALID_FLOW_MARK) { > > + VLOG_ERR("failed to allocate flow mark!\n"); > > + goto out; > > + } > > + } > > + > > + info.flow_mark = mark; > > + ret = netdev_flow_put(port->netdev, match, > > + CONST_CAST(struct nlattr *, actions), > > + actions_len, &flow->mega_ufid, &info, NULL); > > + if (ret) { > > + VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark); > > + if (!modification) { > > + flow_mark_free(mark); > > + } else { > > + mark_to_flow_disassociate(pmd, flow); > > + } > > + goto out; > > + } > > + > > + if (!modification) { > > + megaflow_to_mark_associate(&flow->mega_ufid, mark); > > + mark_to_flow_associate(mark, flow); > > + } > > + VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark); > > The whole section above seems a bit complicated, again comments explaining > the behavior that depends on modification would help a lot here. > > > + > > +out: > > + ovs_mutex_unlock(&flow_mark.mutex); > > +} > > + > > +static void > > +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid) > > +{ > > + struct flow masked_flow; > > + size_t i; > > + > > + for (i = 0; i < sizeof(struct flow); i++) { > > + ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] & > > + ((uint8_t *)&match->wc)[i]; > > + } > > + dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid); > > +} > > + > > static struct dp_netdev_flow * > > dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, > > struct match *match, const ovs_u128 *ufid, @@ -2459,12 > > +2713,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, > > memset(&flow->stats, 0, sizeof flow->stats); > > flow->dead = false; > > flow->batch = NULL; > > + flow->mark = INVALID_FLOW_MARK; > > *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id; > > *CONST_CAST(struct flow *, &flow->flow) = match->flow; > > *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid; > > ovs_refcount_init(&flow->ref_cnt); > > ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions, > > actions_len)); > > > > + dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *, > > + &flow->mega_ufid)); > > netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask); > > > > /* Select dpcls for in_port. Relies on in_port to be exact match. */ > > @@ -2474,6 +2730,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, > > cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow- > > >node), > > dp_netdev_flow_hash(&flow->ufid)); > > > > + try_netdev_flow_put(pmd, in_port, flow, match, actions, > > + actions_len); > > + > > if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) { > > struct ds ds = DS_EMPTY_INITIALIZER; > > struct ofpbuf key_buf, mask_buf; @@ -2554,6 +2812,7 @@ > > flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, > > if (put->flags & DPIF_FP_MODIFY) { > > struct dp_netdev_actions *new_actions; > > struct dp_netdev_actions *old_actions; > > + odp_port_t in_port = netdev_flow->flow.in_port.odp_port; > > > > new_actions = dp_netdev_actions_create(put->actions, > > put->actions_len); @@ > > -2561,6 +2820,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, > > old_actions = dp_netdev_flow_get_actions(netdev_flow); > > ovsrcu_set(&netdev_flow->actions, new_actions); > > > > + try_netdev_flow_put(pmd, in_port, netdev_flow, match, > > + put->actions, put->actions_len); > > + > > if (stats) { > > get_dpif_flow_stats(netdev_flow, stats); > > } > > @@ -3570,6 +3832,7 @@ reload_affected_pmds(struct dp_netdev *dp) > > > > CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { > > if (pmd->need_reload) { > > + flow_mark_flush(pmd); > > dp_netdev_reload_pmd__(pmd); > > pmd->need_reload = false; > > } > > diff --git a/lib/netdev.h b/lib/netdev.h index 3a545fe..0c1946a 100644 > > --- a/lib/netdev.h > > +++ b/lib/netdev.h > > @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int qid); > > struct offload_info { > > const struct dpif_class *dpif_class; > > ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */ > > + > > + /* > > + * The flow mark id assigened to the flow. If any pkts hit the flow, > > + * it will be in the pkt meta data. > > + */ > > + uint32_t flow_mark; > > }; > > struct dpif_class; > > struct netdev_flow_dump; > > -- > > 2.7.4 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev