On 7/28/23 11:02, Peng He wrote:
> 
> 
> Ilya Maximets <i.maxim...@ovn.org <mailto:i.maxim...@ovn.org>> 于2023年7月28日周五 
> 00:59写道:
> 
>     On 7/27/23 04:31, Peng He wrote:
>     >
>     >
>     > Eelco Chaudron <echau...@redhat.com <mailto:echau...@redhat.com> 
> <mailto:echau...@redhat.com <mailto:echau...@redhat.com>>> 于2023年7月6日周四 
> 15:52写道:
>     >
>     >
>     >
>     >     On 6 Jul 2023, at 4:32, Peng He wrote:
>     >
>     >     > Eelco Chaudron <echau...@redhat.com <mailto:echau...@redhat.com> 
> <mailto:echau...@redhat.com <mailto:echau...@redhat.com>>> 于2023年7月5日周三 
> 22:16写道:
>     >     >
>     >     >>
>     >     >>
>     >     >> On 1 Jul 2023, at 4:43, Peng He wrote:
>     >     >>
>     >     >>> OVS allows overlapping megaflows, as long as the actions of 
> these
>     >     >>> megaflows are equal. However, the current implementation of 
> action
>     >     >>> modification relies on flow_lookup instead of ufid, this could 
> result
>     >     >>> in looking up a wrong megaflow and make the ukeys and megaflows
>     >     >> inconsistent
>     >     >>>
>     >     >>> Just like the test case in the patch, at first we have a rule 
> with the
>     >     >>> prefix:
>     >     >>>
>     >     >>> 10.1.2.0/24 <http://10.1.2.0/24> <http://10.1.2.0/24 
> <http://10.1.2.0/24>>
>     >     >>>
>     >     >>> and we will get a megaflow with prefixes 10.1.2.2/24 
> <http://10.1.2.2/24> <http://10.1.2.2/24 <http://10.1.2.2/24>> when a packet 
> with
>     >     >> IP
>     >     >>> 10.1.2.2 is received.
>     >     >>>
>     >     >>> Then suppose we change the rule into 10.1.0.0/16 
> <http://10.1.0.0/16> <http://10.1.0.0/16 <http://10.1.0.0/16>>. OVS prefers 
> to keep
>     >     >> the
>     >     >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 
> <http://10.1.2.2/24>> megaflow and just changes its action instead of 
> extending
>     >     >>> the prefix into 10.1.2.2/16 <http://10.1.2.2/16> 
> <http://10.1.2.2/16 <http://10.1.2.2/16>>.
>     >     >>>
>     >     >>> then suppose we have a 10.1.0.2 packet, since it misses the 
> megaflow,
>     >     >>> this time, we will have an overlapping megaflow with the right 
> prefix:
>     >     >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 
> <http://10.1.0.2/16>>
>     >     >>>
>     >     >>> now we have two megaflows:
>     >     >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 
> <http://10.1.2.2/24>>
>     >     >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 
> <http://10.1.0.2/16>>
>     >     >>>
>     >     >>> last, suppose we have changed the ruleset again. The 
> revalidator this
>     >     >>> time still decides to change the actions of both megaflows 
> instead of
>     >     >>> deleting them.
>     >     >>>
>     >     >>> The dpif_netdev_flow_put will search the megaflow to modify 
> with unmasked
>     >     >>> keys, however it might lookup the wrong megaflow as the key 
> 10.1.2.2
>     >     >> matches
>     >     >>> both 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 
> <http://10.1.2.2/24>> and 10.1.0.2/16 <http://10.1.0.2/16> 
> <http://10.1.0.2/16 <http://10.1.0.2/16>>!
>     >     >>>
>     >     >>> This patch changes the megaflow lookup code in modification 
> path into
>     >     >>> relying the ufid to find the correct megaflow instead of key 
> lookup.
>     >     >>
>     >     >> Thanks for fixing Ilya’s comments! I’ve also copied in some of 
> the v3
>     >     >> discussion, so we can wrap it up here.
>     >     >>
>     >     >> Acked-by: Eelco Chaudron <echau...@redhat.com 
> <mailto:echau...@redhat.com> <mailto:echau...@redhat.com 
> <mailto:echau...@redhat.com>>>
>     >     >>
>     >     >> //Eelco
>     >     >>
>     >     >>> Signed-off-by: Peng He <hepeng.0...@bytedance.com 
> <mailto:hepeng.0...@bytedance.com> <mailto:hepeng.0...@bytedance.com 
> <mailto:hepeng.0...@bytedance.com>>>
>     >     >>> ---
>     >     >>>  lib/dpif-netdev.c | 62 
> ++++++++++++++++++++++++++---------------------
>     >     >>>  tests/pmd.at <http://pmd.at> <http://pmd.at <http://pmd.at>>   
>    | 48 ++++++++++++++++++++++++++++++++++++
>     >     >>>  2 files changed, 82 insertions(+), 28 deletions(-)
>     >     >>>
>     >     >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     >     >>> index 70b953ae6..b90ed1542 100644
>     >     >>> --- a/lib/dpif-netdev.c
>     >     >>> +++ b/lib/dpif-netdev.c
>     >     >>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct 
> dp_netdev_pmd_thread *pmd,
>     >     >>>          memset(stats, 0, sizeof *stats);
>     >     >>>      }
>     >     >>>
>     >     >>> -    ovs_mutex_lock(&pmd->flow_mutex);
>     >     >>> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>     >     >>> -    if (!netdev_flow) {
>     >     >>> -        if (put->flags & DPIF_FP_CREATE) {
>     >     >>> +    if (put->flags & DPIF_FP_CREATE) {
>     >     >>
>     >     >>
>     >     >> EC> Should this not be:
>     >     >> EC>
>     >     >> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags & 
> DPIF_FP_MODIFY))
>     >     >> EC>
>     >     >> EC> I know this will break the fix, but I’m wondering what the 
> possible
>     >     >> EC> combinations are.
>     >     >> EC> Quickly looking at the code the following ones seem to exist:
>     >     >> EC>
>     >     >> EC> DPIF_FP_CREATE  -> Create a flow, if one exists return EEXIST
>     >     >> EC> DPIF_FP_MODIFY  -> Modify a flow, if it does not exist 
> return ENONT
>     >     >> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY   -> If it exists modify it, 
> if it does
>     >     >> EC> not exists create it.
>     >     >> EC>
>     >     >> EC> Could the last combination not potentially fail the lookup?
>     >     >> EC>
>     >     >> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are the 
> problem? In
>     >     >> EC> theory, it could also be the combination.
>     >     >> EC>
>     >     >>
>     >     >> PH> sorry, I was wrong. Such a combination does exist in
>     >     >> PH>the dpif_probe_feature, and it probes the datapath by trying 
> to put
>     >     >> flows:
>     >     >> PH>
>     >     >> PH>     error = dpif_flow_put(dpif, DPIF_FP_CREATE | 
> DPIF_FP_MODIFY |
>     >     >> PH>DPIF_FP_PROBE,
>     >     >> PH>                           key->data, key->size, NULL, 0,
>     >     >> PH>                           nl_actions, nl_actions_size,
>     >     >> PH>                           ufid, NON_PMD_CORE_ID, NULL);
>     >     >>
>     >     >> PH> so we CANNOT change the code into:
>     >     >>
>     >     >> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags & 
> DPIF_FP_MODIFY))
>     >     >>
>     >     >> PH> as the issue the patch tries to fix only exist in MODIFY 
> alone path.
>     >     >> PH> If CREATE bit is set, we need to go through the CREATE path 
> no matter
>     >     >> if
>     >     >> PH> MODIFY exist or not.
>     >     >>
>     >     >> Ok, if this is only used by the probe function we should be 
> fine. I did
>     >     >> quickly search the code and it seems to be the way. However, if 
> it’s ever
>     >     >> used by any part of ovs with both flags set, it might fail the 
> lookup and
>     >     >> we run into the same problem.
>     >     >>
>     >     >
>     >     > By "the same problem", you mean the problem listed in this patch 
> or this
>     >     > fix might lead to a potential failure in other part of OVS?
>     >
>     >     If you have a request with both DPIF_FP_MODIFY and DPIF_FP_CREATE 
> set, the lookup could still return the wrong entry. But if this is not used 
> it should be fine for now.
>     >
>     >
>     > So do we need to check such combination here, and just return -ENOSUPP?
> 
>     I'm not sure that's a good solution for a problem in the current version
>     of the fix.  In general, this patch seems to not comply with the 
> documented
>     behavior of the flow_put callback.  The intended behavior is described in
>     the comment for a struct dpif_flow_put in lib/dpif.h.  Both flags being
>     present is a valid case that should be handled.  Empty set of flags can
>     also be a valid case of some sort.  It will always fail, but it should
>     provide different error codes depending on the flow existence in the
>     datapath.
> 
>     Is there anything major that prevents us from using UFID always?
>     Userspace datapath seems to always have it and it will generate one if not
>     provided.  The internal generation might be a problem, I guess, due to a
>     risk of it being non-unique.  So, we probably should not use a locally
>     generated ufid for a lookup, only for dp_netdev_flow_add.
> 
>     So, the logic might be:
> 
>       if put->ufid:   /* Actually provided UFID from upper layers. */
>         Lookup by UFID
>       else:
>         Lookup by KEY
> 
>  
> In general, I think this solution is better. One thing I am not clear is 
> below:
> 
> if use KEY, how can we guarantee that we have found the right megaflow?
> 
> I think the only way to do this is to generate again the ufid using KEY, and 
> then lookup the
> flow_table using this ufid. The upper layer can decide to use ufid or not, 
> through upcall/enable/disable-ufid, 
> but the userspace datapath should always support ufid. This results in a 
> logic that no matter
> what the upper layer provide, the datapath always use ufid to locate the 
> megaflow. 
> 
> Why not use ufid at the first place? I do not know.

The problem is that locally generated UFID, despite the name, can be
not unique.  It's a hash and collisions are possible.  ukeys are checked
for UFID collisions, so externally provided UFIDs are indeed unique.
So we will have to additionally check that we found what we were looking
for anyways.

Another potential problem is manually added flows with UFID calculated
in a different way / UFID that is just random.  If user later will request
this flow without specifying UFID, we will not be able to find it.
For that reason we have to use the classifier, if UFID is not provided
in the request.  I agree that we can't guarantee that the correct flow
is found by just a classifier lookup.  So, we should likely bring back
some form of the flow_equal() check that was removed in commit:
  beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
This commit is also responsible for turning '/* Overlapping flow. */'
into incorrect comment.

So, something like:

  if put->ufid:
    Lookup by UFID
  else:
    Lookup by KEY
    if ! flow_equal():
      /* Overlapping flow. */ --> EINVAL

The problematic part is the statement in the commit message that flow_equal()
check caused false-negative results due to masking.  I'm not sure if that is
still the issue.  We fixed a few issues in that area since then.  But the
check is definitely needed, because the 'a) the dpcls lookup is sufficient to
uniquely identify a flow' in the commit message is not a correct statement.

I suppose, the mentioned commit is also a candidate for a 'Fixes' tag.

> 
> 
> 
>       Not Found:
>         DPIF_FP_CREATE is SET:       Add the flow
>         DPIF_FP_CREATE is not SET:   ENOENT
>       Found:
>         DPIF_FP_MODIFY is SET:       Modify the flow
>         DPIF_FP_MODIFY is not SET:   EEXIST 
> 
> 
>     This way, if flows are added manually it will be user's responsibility
>     to specify or not specify UFIDs, the dpctl/add-flow is a debug interface
>     after all, so should not be a problem.  And the ovs-vswitchd itself will
>     always provide UFID and it checks that ukeys have different UFIDs, unless
>     manually disabled, again, for testing or debugging purposes.
> 
>     Such logic is not that different from what we have today and should cover
>     the issue and all the combinations of flags.
> 
>     Or am I missing something?
> 
> 
>     A few other comments regarding the implementation:
> 
>     - The change reduces the critical section of the flow mutex down to the
>     DPIF_FP_CREATE case.  We can't actually do that.  The main reason will
>     be a possible race condition on the ovsrcu_set() during the actions update
>     and a subsequent potential actions leak.  RCU set operations require
>     external synchronization.  There could be other race conditions as well,
>     because multiple threads can work on the same flow at the same time.
> 
> 
> Yes, this is indeed a problem. Although I am not sure that is it possible 
> that two
> revalidator threads operate a same ukey at the same time? But better control
> this in the lower layer to make sure.

Revalidators should lock the ukey while working on it, but the dpif-provider API
doesn't require that from the caller, so we should check in the implementation.
Also, the manual flow_put from the dpctl/mod-flow may collide with a 
revalidator.

> 
>     - The '/* Overlapping flow. */' comment is actually not correct.  It's
>     not really the problem of this particular patch as it wasn't correct
>     before either, but we should probably just remove it as you're touching
>     this code anyways.
> 
> Ok, will delete it in the next version.
> 
>  
> 
> 
>     Best regards, Ilya Maximets.
> 
> 
> 
> -- 
> hepeng

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to