On 1/25/24 16:40, Numan Siddique wrote:
> On Thu, Jan 25, 2024 at 4:21 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On 1/25/24 06:44, Han Zhou wrote:
>>> On Wed, Jan 24, 2024 at 8:39 PM Numan Siddique <num...@ovn.org> wrote:
>>>>
>>>> On Wed, Jan 24, 2024 at 10:53 PM Han Zhou <hz...@ovn.org> wrote:
>>>>>
>>>>> On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>
>>>>>> On 1/24/24 06:01, Han Zhou wrote:
>>>>>>> On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 1/11/24 16:31, num...@ovn.org wrote:
>>>>>>>>> +
>>>>>>>>> +void
>>>>>>>>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
>>>>>>>>> +                      const struct ovn_datapath *od,
>>>>>>>>> +                      const unsigned long *dp_bitmap,
>>>>>>>>> +                      size_t dp_bitmap_len,
>>>>>>>>> +                      enum ovn_stage stage, uint16_t priority,
>>>>>>>>> +                      const char *match, const char *actions,
>>>>>>>>> +                      const char *io_port, const char *ctrl_meter,
>>>>>>>>> +                      const struct ovsdb_idl_row *stage_hint,
>>>>>>>>> +                      const char *where,
>>>>>>>>> +                      struct lflow_ref *lflow_ref)
>>>>>>>>> +    OVS_EXCLUDED(fake_hash_mutex)
>>>>>>>>> +{
>>>>>>>>> +    struct ovs_mutex *hash_lock;
>>>>>>>>> +    uint32_t hash;
>>>>>>>>> +
>>>>>>>>> +    ovs_assert(!od ||
>>>>>>>>> +               ovn_stage_to_datapath_type(stage) ==
>>>>>>>>> +               ovn_datapath_get_type(od));
>>>>>>>>> +
>>>>>>>>> +    hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
>>>>>>>>> +                                 ovn_stage_get_pipeline(stage),
>>>>>>>>> +                                 priority, match,
>>>>>>>>> +                                 actions);
>>>>>>>>> +
>>>>>>>>> +    hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
>>>>>>>>> +    struct ovn_lflow *lflow =
>>>>>>>>> +        do_ovn_lflow_add(lflow_table, od, dp_bitmap,
>>>>>>>>> +                         dp_bitmap_len, hash, stage,
>>>>>>>>> +                         priority, match, actions,
>>>>>>>>> +                         io_port, ctrl_meter, stage_hint, where);
>>>>>>>>> +
>>>>>>>>> +    if (lflow_ref) {
>>>>>>>>> +        /* lflow referencing is only supported if 'od' is not NULL. */
>>>>>>>>> +        ovs_assert(od);
>>>>>>>>> +
>>>>>>>>> +        struct lflow_ref_node *lrn =
>>>>>>>>> +            lflow_ref_node_find(&lflow_ref->lflow_ref_nodes,
>>>>>>>>> +                                lflow, hash);
>>>>>>>>> +        if (!lrn) {
>>>>>>>>> +            lrn = xzalloc(sizeof *lrn);
>>>>>>>>> +            lrn->lflow = lflow;
>>>>>>>>> +            lrn->dp_index = od->index;
>>>>>>>>> +            ovs_list_insert(&lflow_ref->lflows_ref_list,
>>>>>>>>> +                            &lrn->lflow_list_node);
>>>>>>>>> +            inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
>>>>>>>>> +            ovs_list_insert(&lflow->referenced_by,
>>>>>>>>> +                            &lrn->ref_list_node);
>>>>>>>>> +
>>>>>>>>> +            hmap_insert(&lflow_ref->lflow_ref_nodes,
>>>>>>>>> +                        &lrn->ref_node, hash);
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>> +        lrn->linked = true;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    lflow_hash_unlock(hash_lock);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>
>>>>>>>> This part is not thread safe.
>>>>>>>>
>>>>>>>> If two threads try to add logical flows that have different
>>>>>>>> hashes and lflow_ref is not NULL, we're going to have a race
>>>>>>>> condition when inserting into the &lflow_ref->lflow_ref_nodes
>>>>>>>> hash map, because the two threads will take different locks.
>>>>>>>>
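>>>>>>>> A minimal illustration of the interleaving I have in mind
>>>>>>>> (hypothetical hashes h1 != h2, both flows added with the same
>>>>>>>> 'lflow_ref'):
>>>>>>>>
>>>>>>>>     /* Thread A */                  /* Thread B */
>>>>>>>>     lflow_hash_lock(entries, h1);   lflow_hash_lock(entries, h2);
>>>>>>>>     ...                             ...
>>>>>>>>     hmap_insert(                    hmap_insert(
>>>>>>>>       &lflow_ref->lflow_ref_nodes,    &lflow_ref->lflow_ref_nodes,
>>>>>>>>       &lrn_a->ref_node, h1);          &lrn_b->ref_node, h2);
>>>>>>>>
>>>>>>>> Different hash locks are held, so the two hmap_insert() calls on
>>>>>>>> the same hmap can run concurrently.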
>>>>>>>
>>>>>>> I think it is safe because a lflow_ref is always associated with
>>>>>>> an object, e.g. port, datapath, lb, etc., and lflow generation for
>>>>>>> a single such object is never executed in parallel, which is how
>>>>>>> the parallel lflow build is designed.
>>>>>>> Does it make sense?
>>>>>>
>>>>>> It happens that it's safe in this current patch set because indeed we
>>>>>> always process individual ports, datapaths, lbs, etc, in the same
>>>>>> thread.  However, this code (lflow_table_add_lflow()) is generic and
>>>>>> there's nothing (not even a comment) that would warn developers in the
>>>>>> future about the potential race if the lflow_ref is shared.
>>>>>>
>>>>>> I spoke to Numan offline a bit about this and I think the current plan
>>>>>> is to leave it as is and add proper locking as a follow up (or in v7).
>>>>>> But I think we still need a clear comment here warning users about
>>>>>> this.  Maybe we should add a comment where the lflow_ref structure
>>>>>> is defined too.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> I totally agree with you about adding comments to explain the
>>>>> thread safety considerations, and to make it clear that the
>>>>> lflow_ref should always be associated with the object that is being
>>>>> processed by the thread.
>>>>> With regard to any follow-up change for proper locking, I am not
>>>>> sure which scenario concerns you. I think if we always make sure
>>>>> that the lflow_ref passed in is the one owned by the object, then
>>>>> the current locking is sufficient. And this is the natural case.
>>>>>
>>>>> However, I did think about a situation where there might be a
>>>>> potential problem in the future, when we need to maintain references
>>>>> for more than one input for the same lflow. The "main" input, which
>>>>> is the object the thread is iterating over and generating lflows
>>>>> for, will have its lflow_ref passed to this function, but we might
>>>>> also need to maintain the reference of the lflow for a secondary
>>>>> input (or even a third). In that case it is not just a locking
>>>>> issue; firstly we need a proper way to pass in the secondary
>>>>> lflow_ref, which is what I had mentioned in the review comments for
>>>>> v3 (the last paragraph):
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410269.html
>>>>> Regardless of that, it is not a problem for now, and I hope there is
>>>>> no need to add references for more inputs for the I-P that matters
>>>>> for production use cases.
>>>>>
>>>>
>>>> I'll update the next version with the proper documentation.
>>>>
>>>> @Han Regarding your comment that we may have a future requirement to
>>>> add a logical flow to 2 or more lflow_refs, I doubt we would have
>>>> such a requirement or scenario.
>>>>
>>>> Because calling lflow_table_add_lflow(..., lflow_ref1, lflow_ref2)
>>>> is the same as calling lflow_table_add_lflow() twice,
>>>>
>>>> i.e.
>>>> lflow_table_add_lflow(..., lflow_ref1)
>>>> lflow_table_add_lflow(..., lflow_ref2)
>>>>
>>>> In the second call, since the ovn_lflow already exists in the lflow
>>>> table, it will just be referenced in lflow_ref2 as well.
>>>> It would be a problem for sure if these lflow_refs (ref1 and ref2)
>>>> belong to different objects.
>>>>
>>>
>>> The scenario you mentioned here is more like duplicated lflows
>>> generated when processing different objects, which have different
>>> lflow_refs; that is not the scenario I was talking about.
>>> I am thinking about the case where a lflow is generated from multiple
>>> inputs. For example (extremely simplified, just to express the idea),
>>> when generating lflows for a LB object, we also consult other objects
>>> such as datapaths. Now that we have implemented I-P for LB, for each
>>> lflow generated for the LB object we call
>>> lflow_table_add_lflow(..., lb->lflow_ref) to maintain the reference
>>> from the LB object to the lflow. However, if in the future we want to
>>> do I-P for datapaths, meaning that when a datapath is deleted or
>>> updated we need to quickly find the lflows that are linked to that
>>> datapath, we need to maintain the reference from the datapath to the
>>> same lflow in the same call: lflow_table_add_lflow(...,
>>> lb->lflow_ref, datapath->lflow_ref). Or, do you suggest that even in
>>> this case we just call the function twice, with the second call's
>>> only purpose being to add the reference to the secondary input? Hmm,
>>> it would actually work, as if we were trying to add a redundant
>>> lflow, although it wastes some cycles on ovn_lflow_find(). Of course
>>> the locking issue would arise in this case.
>>> Anyway, I don't have a realistic example to prove that this is a
>>> requirement of I-P for solving real production scale/performance
>>> issues. In the example I provided, datapath I-P seems to be a
>>> non-goal as far as I can tell for now, so I am not too worried.
>>>
>>>> However, I do see a scenario where lflow_table_add_lflow() could be
>>>> called with a different object's lflow_ref.
>>>>
>>>> A few scenarios I could think of:
>>>>
>>>> 1.  While generating logical flows for a logical switch port (of
>>>> type router) 'P1', there is a possibility (most likely a mistake by
>>>> a contributor) that someone may call something like
>>>>
>>>> lflow_table_add_lflow(....,  P1->peer->lflow_ref)
>>>>
>>>> Even in this case, we don't generate logical router port flows in
>>>> parallel with logical switch ports, and hence P1->peer->lflow_ref
>>>> may not be modified by multiple threads.  But it's a possibility,
>>>> and I think Dumitru
>>>>
>>> In this case it does seem to be a problem, because it is possible
>>> that thread1 is processing P1 while thread2 is processing P1->peer,
>>> and now both threads can access P1->peer->lflow_ref. We should make
>>> it clear that when processing an object, we only add lflow_refs for
>>> that object.
>>>
>>>> 2.  While generating load balancer flows, we may also generate
>>>> logical flows for the routers and store them in od->lflow_ref (if we
>>>> happen to support I-P for router ports), i.e.
>>>>
>>>>      HMAP_FOR_EACH_PARALLEL (lb, lb_maps) {
>>>>          generate_lflows_for_lb(lb, lb->lflow_ref)
>>>>          FOR_EACH_ROUTER_OF_LB(lr, lb) {
>>>>              generate_lb_lflows_for_router(lr, lr->lflow_ref)
>>>>          }
>>>>      }
>>>>
>>>> In this case, it is possible that a logical router 'lr' may be
>>>> associated with multiple lbs, and this may corrupt lr->lflow_ref.
>>>>
>>>>
>>>> We have 2 choices (after this patch series is accepted :))
>>>>
>>>>   a.  Don't do anything.  The documentation added by this patch
>>>> series (which I'll update in the next version) is good enough, and
>>>> whenever such scenarios arise, the contributor will add the
>>>> thread-safe code or make sure not to use the lflow_ref of other
>>>> objects while generating lflows.  In scenario 2 above, the
>>>> contributor should not generate lflows for the routers in the
>>>> lb_maps loop; they should instead be generated in the lr_datapaths
>>>> loop or lr_stateful loop.
>>>>
>>>>   b.  As a follow-up patch, make lflow_table_add_lflow() thread safe
>>>> for lflow_ref so that we are covered for future scenarios.
>>>> The downside is that we have to maintain hash locks for each
>>>> lflow_ref and reconcile the lflow_ref hmaps after all the threads
>>>> finish lflow generation.  This incurs some cost (both memory- and
>>>> CPU-wise), but only during recomputes, which should not be too
>>>> frequent.
>>>>
>>>
>>> I'd prefer solution (a) with proper documentation to emphasize the
>>> rule/assumption.
>>>
>>
>> Would it make sense to also try to detect whether a lflow_ref is used
>> from multiple threads?  E.g., store the id of the last thread that
>> accessed it and assert that it's the same as the thread currently
>> accessing the lflow_ref?
>>
>> Just to make sure we crash explicitly in case such bugs are
>> inadvertently introduced?
>>
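>> A rough sketch of what I mean (untested; 'owner_thread' and the
>> sentinel are made-up names, ovsthread_id_self() is the existing OVS
>> helper):
>>
>>     #define LFLOW_REF_OWNER_UNSET UINT_MAX
>>
>>     /* New field in struct lflow_ref, set to LFLOW_REF_OWNER_UNSET
>>      * when the ref is (re)initialized. */
>>     unsigned int owner_thread;
>>
>>     /* In lflow_table_add_lflow(), in the 'if (lflow_ref)' branch:
>>      * record the first thread that touches this ref and assert that
>>      * no other thread ever does. */
>>     unsigned int self = ovsthread_id_self();
>>     if (lflow_ref->owner_thread == LFLOW_REF_OWNER_UNSET) {
>>         lflow_ref->owner_thread = self;
>>     }
>>     ovs_assert(lflow_ref->owner_thread == self);
>>
>> The owner would have to be reset whenever a new compute run starts,
>> otherwise the assert would fire spuriously.
>>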
> 
> I think it would be hard to make a decision here because I-P is
> handled in a single thread.  How do we distinguish, in lflow_ref.c,
> between a full recompute with parallel threads and single-threaded
> I-P?
> 

We could distinguish between recompute and incremental updates, but
that would probably add a connection between lflow-mgr and the I-P
engine.  That's not so nice indeed.

Let's add a clear comment warning people about the right way to use this.
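
Something along these lines maybe, next to the definition of struct
lflow_ref (exact wording to be polished in the next revision):

    /* Thread safety: a 'struct lflow_ref' has no lock of its own and
     * lflow_hash_lock only serializes access per lflow hash, not per
     * lflow_ref.  An lflow_ref must therefore only be passed to
     * lflow_table_add_lflow() by the single thread that owns the
     * object (port, datapath, lb, ...) it belongs to.  In particular,
     * never pass another object's lflow_ref (e.g.,
     * op->peer->lflow_ref) as that object may be processed in
     * parallel by another thread. */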

Regards,
Dumitru

> Numan
> 
>> Regards,
>> Dumitru
>>
>>> Thanks,
>>> Han
>>>
>>>>
>>>> Let me know what you think.  I'm fine with both, but I personally
>>>> prefer (a), as I don't see such scenarios in the near future.
>>>>
>>>>
>>>> Thanks
>>>> Numan
>>>>
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Han
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Han
>>>>>>>
>>>>>>>> That might corrupt the map.
>>>>>>>>
>>>>>>>> I guess, if we don't want to cause more performance degradation,
>>>>>>>> we should maintain as many lflow_ref instances as we do
>>>>>>>> hash_locks, i.e., LFLOW_HASH_LOCK_MASK + 1.  Will that even be
>>>>>>>> possible?
>>>>>>>>
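>>>>>>>> Very rough idea (hypothetical layout; it also ignores
>>>>>>>> lflows_ref_list, which would need similar treatment): shard the
>>>>>>>> references per hash lock, so bucket i is only ever touched while
>>>>>>>> holding hash lock i:
>>>>>>>>
>>>>>>>>     struct lflow_ref {
>>>>>>>>         /* Bucket i is only modified under the lflow hash lock
>>>>>>>>          * with (hash & LFLOW_HASH_LOCK_MASK) == i. */
>>>>>>>>         struct hmap lflow_ref_nodes[LFLOW_HASH_LOCK_MASK + 1];
>>>>>>>>     };
>>>>>>>>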
>>>>>>>> Wdyt?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dumitru
>>>>>>>>
>>>>>>
>>>
>>
> 
