On Thu, Jan 25, 2024 at 4:21 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 1/25/24 06:44, Han Zhou wrote:
> > On Wed, Jan 24, 2024 at 8:39 PM Numan Siddique <num...@ovn.org> wrote:
> >>
> >> On Wed, Jan 24, 2024 at 10:53 PM Han Zhou <hz...@ovn.org> wrote:
> >>>
> >>> On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>
> >>>> On 1/24/24 06:01, Han Zhou wrote:
> >>>>> On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>>>>>
> >>>>>> On 1/11/24 16:31, num...@ovn.org wrote:
> >>>>>>> +
> >>>>>>> +void
> >>>>>>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
> >>>>>>> +                      const struct ovn_datapath *od,
> >>>>>>> +                      const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> >>>>>>> +                      enum ovn_stage stage, uint16_t priority,
> >>>>>>> +                      const char *match, const char *actions,
> >>>>>>> +                      const char *io_port, const char *ctrl_meter,
> >>>>>>> +                      const struct ovsdb_idl_row *stage_hint,
> >>>>>>> +                      const char *where,
> >>>>>>> +                      struct lflow_ref *lflow_ref)
> >>>>>>> +    OVS_EXCLUDED(fake_hash_mutex)
> >>>>>>> +{
> >>>>>>> +    struct ovs_mutex *hash_lock;
> >>>>>>> +    uint32_t hash;
> >>>>>>> +
> >>>>>>> +    ovs_assert(!od ||
> >>>>>>> +               ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
> >>>>>>> +
> >>>>>>> +    hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
> >>>>>>> +                                 ovn_stage_get_pipeline(stage),
> >>>>>>> +                                 priority, match,
> >>>>>>> +                                 actions);
> >>>>>>> +
> >>>>>>> +    hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
> >>>>>>> +    struct ovn_lflow *lflow =
> >>>>>>> +        do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> >>>>>>> +                         dp_bitmap_len, hash, stage,
> >>>>>>> +                         priority, match, actions,
> >>>>>>> +                         io_port, ctrl_meter, stage_hint, where);
> >>>>>>> +
> >>>>>>> +    if (lflow_ref) {
> >>>>>>> +        /* lflow referencing is only supported if 'od' is not NULL. */
> >>>>>>> +        ovs_assert(od);
> >>>>>>> +
> >>>>>>> +        struct lflow_ref_node *lrn =
> >>>>>>> +            lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
> >>>>>>> +        if (!lrn) {
> >>>>>>> +            lrn = xzalloc(sizeof *lrn);
> >>>>>>> +            lrn->lflow = lflow;
> >>>>>>> +            lrn->dp_index = od->index;
> >>>>>>> +            ovs_list_insert(&lflow_ref->lflows_ref_list,
> >>>>>>> +                            &lrn->lflow_list_node);
> >>>>>>> +            inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
> >>>>>>> +            ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
> >>>>>>> +
> >>>>>>> +            hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        lrn->linked = true;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    lflow_hash_unlock(hash_lock);
> >>>>>>> +
> >>>>>>> +}
> >>>>>>> +
> >>>>>>
> >>>>>> This part is not thread safe.
> >>>>>>
> >>>>>> If two threads try to add logical flows that have different hashes and
> >>>>>> lflow_ref is not NULL we're going to have a race condition when
> >>>>>> inserting to the &lflow_ref->lflow_ref_nodes hash map because the two
> >>>>>> threads will take different locks.
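> >>>>>>
> >>>>>> To make the failure mode concrete, here is a standalone toy
> >>>>>> (plain pthreads, nothing OVN-specific, every name made up) that
> >>>>>> goes wrong the same way: the lock taken depends on the flow
> >>>>>> hash, but one structure is shared by all hashes:
> >>>>>>
> >>>>>>     /* build: cc -pthread toy_race.c */
> >>>>>>     #include <pthread.h>
> >>>>>>     #include <stdint.h>
> >>>>>>     #include <stdio.h>
> >>>>>>
> >>>>>>     #define N_LOCKS 4
> >>>>>>     static pthread_mutex_t hash_locks[N_LOCKS] = {
> >>>>>>         PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
> >>>>>>         PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
> >>>>>>     };
> >>>>>>
> >>>>>>     /* Stand-in for lflow_ref->lflow_ref_nodes: shared by all hashes. */
> >>>>>>     static size_t shared_ref_size;
> >>>>>>
> >>>>>>     static void
> >>>>>>     add_flow(uint32_t hash)
> >>>>>>     {
> >>>>>>         /* The lock we take depends on the flow hash... */
> >>>>>>         pthread_mutex_lock(&hash_locks[hash % N_LOCKS]);
> >>>>>>         /* ...but this counter is shared by *all* hashes, so two
> >>>>>>          * threads holding different locks update it concurrently. */
> >>>>>>         shared_ref_size++;
> >>>>>>         pthread_mutex_unlock(&hash_locks[hash % N_LOCKS]);
> >>>>>>     }
> >>>>>>
> >>>>>>     static void *
> >>>>>>     worker(void *base_)
> >>>>>>     {
> >>>>>>         uint32_t base = *(uint32_t *) base_;
> >>>>>>         for (uint32_t i = 0; i < 1000000; i++) {
> >>>>>>             add_flow(base + i * N_LOCKS);  /* fixed lock per thread */
> >>>>>>         }
> >>>>>>         return NULL;
> >>>>>>     }
> >>>>>>
> >>>>>>     int
> >>>>>>     main(void)
> >>>>>>     {
> >>>>>>         pthread_t t1, t2;
> >>>>>>         uint32_t b1 = 0, b2 = 1;  /* different hash classes */
> >>>>>>         pthread_create(&t1, NULL, worker, &b1);
> >>>>>>         pthread_create(&t2, NULL, worker, &b2);
> >>>>>>         pthread_join(t1, NULL);
> >>>>>>         pthread_join(t2, NULL);
> >>>>>>         printf("%zu\n", shared_ref_size);  /* usually < 2000000 */
> >>>>>>         return 0;
> >>>>>>     }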
> >>>>>>
> >>>>>
> >>>>> I think it is safe because a lflow_ref is always associated with an
> >>>>> object, e.g. port, datapath, lb, etc., and lflow generation for a
> >>>>> single such object is never executed in parallel, which is how the
> >>>>> parallel lflow build is designed.
> >>>>> Does it make sense?
> >>>>
> >>>> It happens that it's safe in this current patch set because indeed we
> >>>> always process individual ports, datapaths, lbs, etc., in the same
> >>>> thread.  However, this code (lflow_table_add_lflow()) is generic and
> >>>> there's nothing (not even a comment) that would warn developers in the
> >>>> future about the potential race if the lflow_ref is shared.
> >>>>
> >>>> I spoke to Numan offline a bit about this and I think the current plan
> >>>> is to leave it as is and add proper locking as a follow up (or in v7).
> >>>> But I think we still need a clear comment here warning users about
> >>>> this.  Maybe we should add a comment where the lflow_ref structure is
> >>>> defined too.
> >>>>
> >>>> What do you think?
> >>>
> >>> I totally agree with you about adding comments to explain the thread
> >>> safety considerations, and to make it clear that the lflow_ref should
> >>> always be associated with the object that is being processed by the
> >>> thread.
> >>> With regard to any follow-up change for proper locking, I am not sure
> >>> what scenario is your concern.  I think if we always make sure the
> >>> lflow_ref passed in is the one owned by the object, then the current
> >>> locking is sufficient.  And this is the natural case.
> >>>
> >>> However, I did think about a situation where there might be a potential
> >>> problem in the future when we need to maintain references for more than
> >>> one input for the same lflow.  The "main" input, which is the object
> >>> the thread is iterating over and generating lflows for, will have its
> >>> lflow_ref passed into this function, but we might also need to maintain
> >>> the reference of the lflow for a secondary input (or even a third).  In
> >>> that case it is not just the locking issue; first we need a proper way
> >>> to pass in the secondary lflow_ref, which is what I mentioned in the
> >>> review comments for v3 (the last paragraph):
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410269.html
> >>> Regardless of that, it is not a problem for now, and I hope there is no
> >>> need to add references for more inputs for the I-P that matters for
> >>> production use cases.
> >>>
> >>
> >> I'll update the next version with the proper documentation.
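> >>
> >> Something along these lines, maybe (just a sketch; exact wording to
> >> be refined in the next version):
> >>
> >>     /* Thread safety: an lflow_ref is owned by exactly one input
> >>      * object (port, datapath, lb, ...).  lflow_table_add_lflow()
> >>      * may be called from multiple threads, but only as long as
> >>      * each thread passes the lflow_ref of the object that this
> >>      * thread alone is processing.  Passing another object's
> >>      * lflow_ref (e.g. the peer port's) is not protected by the
> >>      * lflow hash locks and can corrupt the ref map. */
> >>     struct lflow_ref;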
> >>
> >> @Han Regarding your comment that we may have the requirement in the
> >> future to add a logical flow to 2 or more lflow_refs, I doubt we would
> >> have such a requirement or scenario.
> >>
> >> Because calling lflow_table_add_lflow(..., lflow_ref1, lflow_ref2)
> >> is the same as calling lflow_table_add_lflow twice, i.e.
> >>
> >>     lflow_table_add_lflow(..., lflow_ref1)
> >>     lflow_table_add_lflow(..., lflow_ref2)
> >>
> >> In the second call, since the ovn_lflow already exists in the lflow
> >> table, it will just be referenced in lflow_ref2.
> >> It would be a problem for sure if these lflow_refs (ref1 and ref2)
> >> belong to different objects.
> >>
> >
> > The scenario you mentioned here is more like duplicated lflows generated
> > when processing different objects, which have different lflow_refs; that
> > is not the scenario I was talking about.
> > I am thinking about the case where a lflow is generated from multiple
> > inputs.  For example (extremely simplified just to express the idea),
> > when generating lflows for a LB object, we also consult other objects
> > such as datapaths.  Now that we have implemented I-P for LB, for the LB
> > object, for each lflow generated we will call
> > lflow_table_add_lflow(..., lb->lflow_ref) to maintain the reference
> > from the LB object to the lflow.  However, if in the future we want to
> > do I-P for datapaths, which means that when a datapath is deleted or
> > updated we need to quickly find the lflows that are linked to the
> > datapath, we need to maintain the reference from the datapath to the
> > same lflow, generated by the same call: lflow_table_add_lflow(...,
> > lb->lflow_ref, datapath->lflow_ref).  Or do you suggest that even in
> > this case we just call the function twice, with the second call's only
> > purpose being to add the reference to the secondary input?  Hmm, it
> > would actually work, as if we were trying to add a redundant lflow,
> > although it wastes some cycles on the ovn_lflow_find().  Of course the
> > locking issue would arise in this case.
> > Anyway, I don't have a realistic example to prove this is a requirement
> > of I-P for solving real production scale/performance issues.  In the
> > example I provided, datapath I-P seems to be a non-goal as far as I can
> > tell for now.  So I am not too worried.
> >
> >> However I do see a scenario where lflow_table_add_lflow() could be
> >> called with a different lflow_ref object.
> >>
> >> A few scenarios I could think of:
> >>
> >> 1. While generating logical flows for a logical switch port (of type
> >>    router) 'P1', there is a possibility that a contributor (most
> >>    likely by mistake) may call something like
> >>
> >>        lflow_table_add_lflow(...., P1->peer->lflow_ref)
> >>
> >>    Even in this case, we don't generate logical router port flows in
> >>    parallel with logical switch ports, and hence P1->peer->lflow_ref
> >>    may not be modified by multiple threads.  But it's a possibility,
> >>    and I think Dumitru is perhaps concerned about such scenarios.
> >>
> >
> > In this case it does seem to be a problem, because it is possible that
> > thread1 is processing P1 while thread2 is processing P1->peer, and now
> > both threads can access P1->peer->lflow_ref.  We should make it clear
> > that when processing an object, we only add lflow_refs for that object.
> >
> >> 2. While generating load balancer flows we may also generate logical
> >>    flows for the routers and store them in the od->lflow_ref (if we
> >>    happen to support I-P for router ports), i.e.
> >>
> >>        HMAP_FOR_EACH_PARALLEL (lb, lb_maps) {
> >>            generate_lflows_for_lb(lb, lb->lflow_ref)
> >>            FOR_EACH_ROUTER_OF_LB(lr, lb) {
> >>                generate_lb_lflows_for_router(lr, lr->lflow_ref)
> >>            }
> >>        }
> >>
> >>    In this case, it is possible that a logical router 'lr' may be
> >>    associated with multiple lbs, and this may corrupt lr->lflow_ref.
> >>
> >> We have 2 choices (after this patch series is accepted :)):
> >>
> >> a. Don't do anything.  The documentation added by this patch series
> >>    (which I'll update in the next version) is good enough, and
> >>    whenever such scenarios arise, the contributor will add the
> >>    thread-safe code or make sure not to use the lflow_ref of other
> >>    objects while generating lflows.  In scenario 2 above, the
> >>    contributor should not generate lflows for the routers in the
> >>    lb_maps loop; they should instead be generated in the lr_datapaths
> >>    loop or the lr_stateful loop.
> >>
> >> b. As a follow-up patch, make lflow_table_add_lflow() thread safe
> >>    for lflow_ref so that we are covered for the future scenarios
> >>    (see the sketch below).  The downside is that we have to maintain
> >>    hash locks for each lflow_ref and reconcile the lflow_ref hmap
> >>    after all the threads finish the lflow generation.  It does incur
> >>    some cost (both memory and CPU wise), but only during recomputes,
> >>    which should not be too frequent.
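> >>
> >> For (b), roughly something like this (sketch only; the OVS hmap /
> >> ovs-thread types are real, but LFLOW_REF_LOCK_MASK and the field
> >> names are made up):
> >>
> >>     struct lflow_ref {
> >>         struct hmap lflow_ref_nodes;
> >>         struct ovs_list lflows_ref_list;
> >>         /* New: hash locks protecting 'lflow_ref_nodes', mirroring
> >>          * the LFLOW_HASH_LOCK_MASK scheme of the lflow table. */
> >>         struct ovs_mutex ref_locks[LFLOW_REF_LOCK_MASK + 1];
> >>     };
> >>
> >>     /* Taken in addition to (and always after) the lflow table's
> >>      * hash lock, so the lock order stays fixed. */
> >>     static struct ovs_mutex *
> >>     lflow_ref_hash_lock(struct lflow_ref *ref, uint32_t hash)
> >>     {
> >>         struct ovs_mutex *lock =
> >>             &ref->ref_locks[hash & LFLOW_REF_LOCK_MASK];
> >>         ovs_mutex_lock(lock);
> >>         return lock;
> >>     }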
> >>
> >
> > I'd prefer solution (a) with proper documentation to emphasize the
> > rule/assumption.
> >
>
> Would it make sense to also try to detect whether a lflow_ref is used
> from multiple threads?  E.g., store the id of the last thread that
> accessed it and assert that it's the same as the thread currently
> accessing the lflow_ref?
>
> Just to make sure we crash explicitly in case such bugs are
> inadvertently introduced?
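>
> Something like this, perhaps (debug-only sketch; the struct and field
> names are hypothetical, and the check itself is best-effort rather
> than atomic):
>
>     #include <pthread.h>
>     #include <stdbool.h>
>
>     struct lflow_ref_owner {
>         pthread_t owner;    /* Last thread that touched the ref. */
>         bool owner_set;
>     };
>
>     static inline void
>     lflow_ref_assert_single_thread(struct lflow_ref_owner *o)
>     {
>         if (!o->owner_set) {
>             o->owner = pthread_self();
>             o->owner_set = true;
>         } else {
>             /* ovs_assert() (from OVS's util.h) aborts, so shared
>              * use crashes explicitly. */
>             ovs_assert(pthread_equal(o->owner, pthread_self()));
>         }
>     }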

I think it would be hard to make a decision here because I-P is handled
in a single thread.  How do we distinguish, in lflow_ref.c, between a
full recompute with parallel threads and single-threaded I-P?

Numan

> Regards,
> Dumitru
>
> > Thanks,
> > Han
> >
> >>
> >> Let me know what you think.  I'm fine with both, but personally prefer
> >> (a) as I don't see such scenarios in the near future.
> >>
> >> Thanks
> >> Numan
> >>
> >>> Thanks,
> >>> Han
> >>>
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Han
> >>>>>
> >>>>>> That might corrupt the map.
> >>>>>>
> >>>>>> I guess, if we don't want to cause more performance degradation we
> >>>>>> should maintain as many lflow_ref instances as we do hash_locks,
> >>>>>> i.e., LFLOW_HASH_LOCK_MASK + 1.  Will that even be possible?
> >>>>>>
> >>>>>> Wdyt?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev