On 1/25/24 06:44, Han Zhou wrote:
> On Wed, Jan 24, 2024 at 8:39 PM Numan Siddique <num...@ovn.org> wrote:
>>
>> On Wed, Jan 24, 2024 at 10:53 PM Han Zhou <hz...@ovn.org> wrote:
>>>
>>> On Wed, Jan 24, 2024 at 4:23 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>
>>>> On 1/24/24 06:01, Han Zhou wrote:
>>>>> On Fri, Jan 19, 2024 at 2:50 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>
>>>>>> On 1/11/24 16:31, num...@ovn.org wrote:
>>>>>>> +
>>>>>>> +void
>>>>>>> +lflow_table_add_lflow(struct lflow_table *lflow_table,
>>>>>>> +                      const struct ovn_datapath *od,
>>>>>>> +                      const unsigned long *dp_bitmap, size_t dp_bitmap_len,
>>>>>>> +                      enum ovn_stage stage, uint16_t priority,
>>>>>>> +                      const char *match, const char *actions,
>>>>>>> +                      const char *io_port, const char *ctrl_meter,
>>>>>>> +                      const struct ovsdb_idl_row *stage_hint,
>>>>>>> +                      const char *where,
>>>>>>> +                      struct lflow_ref *lflow_ref)
>>>>>>> +    OVS_EXCLUDED(fake_hash_mutex)
>>>>>>> +{
>>>>>>> +    struct ovs_mutex *hash_lock;
>>>>>>> +    uint32_t hash;
>>>>>>> +
>>>>>>> +    ovs_assert(!od ||
>>>>>>> +               ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
>>>>>>> +
>>>>>>> +    hash = ovn_logical_flow_hash(ovn_stage_get_table(stage),
>>>>>>> +                                 ovn_stage_get_pipeline(stage),
>>>>>>> +                                 priority, match,
>>>>>>> +                                 actions);
>>>>>>> +
>>>>>>> +    hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
>>>>>>> +    struct ovn_lflow *lflow =
>>>>>>> +        do_ovn_lflow_add(lflow_table, od, dp_bitmap,
>>>>>>> +                         dp_bitmap_len, hash, stage,
>>>>>>> +                         priority, match, actions,
>>>>>>> +                         io_port, ctrl_meter, stage_hint, where);
>>>>>>> +
>>>>>>> +    if (lflow_ref) {
>>>>>>> +        /* lflow referencing is only supported if 'od' is not NULL. */
>>>>>>> +        ovs_assert(od);
>>>>>>> +
>>>>>>> +        struct lflow_ref_node *lrn =
>>>>>>> +            lflow_ref_node_find(&lflow_ref->lflow_ref_nodes, lflow, hash);
>>>>>>> +        if (!lrn) {
>>>>>>> +            lrn = xzalloc(sizeof *lrn);
>>>>>>> +            lrn->lflow = lflow;
>>>>>>> +            lrn->dp_index = od->index;
>>>>>>> +            ovs_list_insert(&lflow_ref->lflows_ref_list,
>>>>>>> +                            &lrn->lflow_list_node);
>>>>>>> +            inc_dp_refcnt(&lflow->dp_refcnts_map, lrn->dp_index);
>>>>>>> +            ovs_list_insert(&lflow->referenced_by, &lrn->ref_list_node);
>>>>>>> +
>>>>>>> +            hmap_insert(&lflow_ref->lflow_ref_nodes, &lrn->ref_node, hash);
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        lrn->linked = true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    lflow_hash_unlock(hash_lock);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> This part is not thread safe.
>>>>>>
>>>>>> If two threads try to add logical flows that have different hashes
>>>>>> and lflow_ref is not NULL we're going to have a race condition when
>>>>>> inserting into the &lflow_ref->lflow_ref_nodes hash map because the
>>>>>> two threads will take different locks.
>>>>>>
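(To spell out the interleaving I had in mind; the hashes and threads below
are made up, just for illustration:)

    /* Thread A: lflow_table_add_lflow(od1, ..., lflow_ref) -> hash H1
     * Thread B: lflow_table_add_lflow(od2, ..., lflow_ref) -> hash H2
     *
     * H1 != H2, so A and B take different hash locks, yet both can reach
     *
     *     hmap_insert(&lflow_ref->lflow_ref_nodes, ..., hash);
     *
     * on the same hmap at the same time, unserialized. */
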
>>>>>
>>>>> I think it is safe because a lflow_ref is always associated with an
>>>>> object, e.g. port, datapath, lb, etc., and lflow generation for a
>>>>> single such object is never executed in parallel, which is how the
>>>>> parallel lflow build is designed.
>>>>> Does it make sense?
>>>>
>>>> It happens that it's safe in this current patch set because indeed we
>>>> always process individual ports, datapaths, lbs, etc., in the same
>>>> thread.  However, this code (lflow_table_add_lflow()) is generic and
>>>> there's nothing (not even a comment) that would warn developers in the
>>>> future about the potential race if the lflow_ref is shared.
>>>>
>>>> I spoke to Numan offline a bit about this and I think the current plan
>>>> is to leave it as is and add proper locking as a follow up (or in v7).
>>>> But I think we still need a clear comment here warning users about
>>>> this.  Maybe we should add a comment where the lflow_ref structure is
>>>> defined too.
>>>>
>>>> What do you think?
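(For illustration, a warning along these lines next to the 'struct
lflow_ref' definition is what I had in mind; the wording below is just a
sketch, not actual code from the series:)

    /* Thread-safety: a 'struct lflow_ref' is owned by exactly one object
     * (port, datapath, lb, ...) and lflow generation for a single object
     * is never run in parallel.  lflow_table_add_lflow() therefore inserts
     * into 'lflow_ref_nodes' without taking any lock on the lflow_ref
     * itself.  Never pass another object's lflow_ref (e.g.
     * 'P1->peer->lflow_ref') from a parallel lflow build. */
    struct lflow_ref {
        ...
    };
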
>>>
>>> I totally agree with you about adding comments to explain the thread
>>> safety considerations, and to make it clear that the lflow_ref should
>>> always be associated with the object that is being processed by the
>>> thread.
>>> With regard to any follow-up change for proper locking, I am not sure
>>> what scenario your concern is.  I think if we always make sure the
>>> lflow_ref passed in is the one owned by the object then the current
>>> locking is sufficient.  And this is the natural case.
>>>
>>> However, I did think about a situation where there might be a potential
>>> problem in the future when we need to maintain references for more than
>>> one input for the same lflow.  The "main" input, which is the object
>>> that the thread is iterating over and generating lflows for, will have
>>> its lflow_ref passed to this function, but we might also need to
>>> maintain the reference of the lflow for a secondary input (or even a
>>> third).  In that case it is not just the locking issue; firstly we need
>>> a proper way to pass in the secondary lflow_ref, which is what I had
>>> mentioned in the review comments for v3:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410269.html
>>> (the last paragraph).  Regardless of that, it is not a problem for now,
>>> and I hope there is no need to add references for more inputs for the
>>> I-P that matters for production use cases.
>>>
>>
>> I'll update the next version with the proper documentation.
>>
>> @Han Regarding your comment that we may have the requirement in the
>> future to add a logical flow to 2 or more lflow_refs, I doubt we would
>> have such a requirement or scenario.
>>
>> Because calling lflow_table_add_lflow(..., lflow_ref1, lflow_ref2)
>> is the same as calling lflow_table_add_lflow twice,
>>
>> i.e.
>> lflow_table_add_lflow(..., lflow_ref1)
>> lflow_table_add_lflow(..., lflow_ref2)
>>
>> In the second call, since the ovn_lflow already exists in the lflow
>> table, it will just be referenced in lflow_ref2.
>> It would be a problem for sure if these lflow_refs (ref1 and ref2)
>> belong to different objects.
>>
> 
> The scenario you mentioned here is more like duplicated lflows generated
> when processing different objects, which have different lflow_refs, which
> is not the scenario I was talking about.
> I am thinking about the case where a lflow is generated from multiple
> inputs.  For example (extremely simplified just to express the idea),
> when generating lflows for a LB object, we also consult other objects
> such as datapaths.  Now we have implemented I-P for LB, so for the LB
> object, for each lflow generated we will call
> lflow_table_add_lflow(..., lb->lflow_ref) to maintain the reference from
> the LB object to the lflow.  However, if in the future we want to do I-P
> for datapaths, which means when a datapath is deleted or updated we need
> to quickly find the lflows that are linked to the datapath, we need to
> maintain the reference from the datapath to the same lflow, generated by
> the same call: lflow_table_add_lflow(..., lb->lflow_ref,
> datapath->lflow_ref).  Or, do you suggest that even in this case we just
> call the function twice, with the second call's only purpose being to
> add the reference to the secondary input?  Hmm, it would actually work,
> as if we were trying to add a redundant lflow, although wasting some
> cycles in ovn_lflow_find().  Of course the locking issue would arise in
> this case.
> Anyway, I don't have a realistic example to prove this is a requirement
> of I-P for solving real production scale/performance issues.  In the
> example I provided, datapath I-P seems to be a non-goal as far as I can
> tell for now.
> So I am not too worried.
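(Just to make the two alternatives above concrete; the extra
'datapath->lflow_ref' argument below is hypothetical and does not exist in
the series:)

    /* Hypothetical single call maintaining both references: */
    lflow_table_add_lflow(..., lb->lflow_ref, datapath->lflow_ref);

    /* vs. calling the existing API twice; the second call finds the
     * already inserted ovn_lflow and only adds the extra reference, at
     * the cost of an extra ovn_lflow_find() and, as noted, the locking
     * question: */
    lflow_table_add_lflow(..., lb->lflow_ref);
    lflow_table_add_lflow(..., datapath->lflow_ref);
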
> 
>> However I do see a scenario where lflow_table_add_lflow() could be
>> called with a different lflow_ref object.
>>
>> A few scenarios I could think of:
>>
>> 1.  While generating logical flows for a logical switch port (of type
>> router) 'P1', there is a possibility that a contributor (most likely by
>> mistake) may call something like
>>
>> lflow_table_add_lflow(....,  P1->peer->lflow_ref)
>>
>> Even in this case, we don't generate logical router port flows in
>> parallel with logical switch ports and hence P1->peer->lflow_ref may
>> not be modified by multiple threads.  But it's a possibility.  And I
>> think Dumitru is perhaps concerned about such scenarios.
>>
> In this case it does seem to be a problem, because it is possible that
> thread1 is processing P1 while thread2 is processing P1->peer, and now
> both threads can access P1->peer->lflow_ref.  We should make it clear
> that when processing an object, we only add to that object's lflow_ref.
> 
>> 2.  While generating load balancer flows we may also generate logical
>> flows for the routers and store them in the od->lflow_ref  (if we
>> happen to support I-P for router ports), i.e.:
>>
>>      HMAP_FOR_EACH_PARALLEL (lb, lb_maps) {
>>          generate_lflows_for_lb(lb, lb->lflow_ref)
>>          FOR_EACH_ROUTER_OF_LB(lr, lb) {
>>              generate_lb_lflows_for_router(lr, lr->lflow_ref)
>>          }
>>      }
>>
>>      In this case, it is possible that a logical router 'lr' may be
>> associated with multiple lbs and this may corrupt the lr->lflow_ref.
>>
>>
>> We have 2 choices (after this patch series is accepted :)):
>>
>>   a.  Don't do anything.  The documentation added by this patch series
>> (which I'll update in the next version) is good enough, and whenever
>> such scenarios arise, the contributor will add the thread-safe code or
>> make sure not to use the lflow_ref of other objects while generating
>> lflows.  In scenario 2 above, the contributor should not generate
>> lflows for the routers in the lb_maps loop; instead they should
>> generate them in the lr_datapaths loop or the lr_stateful loop.
>>
>>   b.  As a follow-up patch, make lflow_table_add_lflow() thread-safe
>> for lflow_ref so that we are covered for future scenarios.
>>        The downside is that we have to maintain hash locks for each
>> lflow_ref and reconcile the lflow_ref hmap after all the threads
>> finish the lflow generation.
>>        It does incur some cost (both memory and CPU wise), but only
>> during recomputes, which should not be too frequent.
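(A rough sketch of where the protection in (b) would go; this uses a
single per-lflow_ref mutex rather than the per-hash locks Numan describes,
and the 'ref_mutex' field is made up:)

    struct lflow_ref {
        ...
        struct ovs_mutex ref_mutex;  /* Hypothetical: protects
                                      * 'lflow_ref_nodes' and
                                      * 'lflows_ref_list'. */
    };

    /* In lflow_table_add_lflow(): */
    if (lflow_ref) {
        ovs_assert(od);
        ovs_mutex_lock(&lflow_ref->ref_mutex);
        /* ... find/insert the lflow_ref_node exactly as today ... */
        ovs_mutex_unlock(&lflow_ref->ref_mutex);
    }
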
>>
> 
> I'd prefer solution (a) with proper documentation to emphasize the
> rule/assumption.
> 

Would it make sense to also try to detect whether a lflow_ref is used
from multiple threads?  E.g., store the id of the last thread that
accessed it and assert that it's the same as the thread currently
accessing the lflow_ref?

Just to make sure we crash explicitly in case such bugs are
inadvertently introduced?
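Something along these lines, roughly (all the names below are made up,
and the 'last_thread' marker would have to be reset at the start of every
lflow recompute):

    struct lflow_ref {
        ...
        unsigned int last_thread;   /* ovsthread id of the last thread
                                     * that used this lflow_ref. */
        bool last_thread_set;
    };

    /* In lflow_table_add_lflow(), before touching lflow_ref internals: */
    if (lflow_ref) {
        unsigned int self = ovsthread_id_self();
        if (!lflow_ref->last_thread_set) {
            lflow_ref->last_thread = self;
            lflow_ref->last_thread_set = true;
        }
        ovs_assert(lflow_ref->last_thread == self);
        /* ... insert into lflow_ref->lflow_ref_nodes as today ... */
    }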

Regards,
Dumitru

> Thanks,
> Han
> 
>>
>> Let me know what you think.  I'm fine with both, but personally prefer
>> (a) as I don't see such scenarios in the near future.
>>
>>
>> Thanks
>> Numan
>>
>>
>>
>>
>>> Thanks,
>>> Han
>>>
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>>>
>>>>> Thanks,
>>>>> Han
>>>>>
>>>>>> That might corrupt the map.
>>>>>>
>>>>>> I guess, if we don't want to cause more performance degradation we
>>>>>> should maintain as many lflow_ref instances as we do hash_locks,
>>>>>> i.e., LFLOW_HASH_LOCK_MASK + 1.  Will that even be possible?
>>>>>>
>>>>>> Wdyt?
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>
>>>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
