Ilya Maximets <i.maxim...@ovn.org> 于2023年7月28日周五 20:25写道:

> On 7/28/23 11:02, Peng He wrote:
> >
> >
> > Ilya Maximets <i.maxim...@ovn.org <mailto:i.maxim...@ovn.org>>
> 于2023年7月28日周五 00:59写道:
> >
> >     On 7/27/23 04:31, Peng He wrote:
> >     >
> >     >
> >     > Eelco Chaudron <echau...@redhat.com <mailto:echau...@redhat.com>
> <mailto:echau...@redhat.com <mailto:echau...@redhat.com>>> 于2023年7月6日周四
> 15:52写道:
> >     >
> >     >
> >     >
> >     >     On 6 Jul 2023, at 4:32, Peng He wrote:
> >     >
> >     >     > Eelco Chaudron <echau...@redhat.com <mailto:
> echau...@redhat.com> <mailto:echau...@redhat.com <mailto:
> echau...@redhat.com>>> 于2023年7月5日周三 22:16写道:
> >     >     >
> >     >     >>
> >     >     >>
> >     >     >> On 1 Jul 2023, at 4:43, Peng He wrote:
> >     >     >>
> >     >     >>> OVS allows overlapping megaflows, as long as the actions
> of these
> >     >     >>> megaflows are equal. However, the current implementation
> of action
> >     >     >>> modification relies on flow_lookup instead of ufid, this
> could result
> >     >     >>> in looking up a wrong megaflow and make the ukeys and
> megaflows
> >     >     >> inconsistent
> >     >     >>>
> >     >     >>> Just like the test case in the patch, at first we have a
> rule with the
> >     >     >>> prefix:
> >     >     >>>
> >     >     >>> 10.1.2.0/24 <http://10.1.2.0/24> <http://10.1.2.0/24 <
> http://10.1.2.0/24>>
> >     >     >>>
> >     >     >>> and we will get a megaflow with prefixes 10.1.2.2/24 <
> http://10.1.2.2/24> <http://10.1.2.2/24 <http://10.1.2.2/24>> when a
> packet with
> >     >     >> IP
> >     >     >>> 10.1.2.2 is received.
> >     >     >>>
> >     >     >>> Then suppose we change the rule into 10.1.0.0/16 <
> http://10.1.0.0/16> <http://10.1.0.0/16 <http://10.1.0.0/16>>. OVS
> prefers to keep
> >     >     >> the
> >     >     >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 <
> http://10.1.2.2/24>> megaflow and just changes its action instead of
> extending
> >     >     >>> the prefix into 10.1.2.2/16 <http://10.1.2.2/16> <
> http://10.1.2.2/16 <http://10.1.2.2/16>>.
> >     >     >>>
> >     >     >>> then suppose we have a 10.1.0.2 packet, since it misses
> the megaflow,
> >     >     >>> this time, we will have an overlapping megaflow with the
> right prefix:
> >     >     >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 <
> http://10.1.0.2/16>>
> >     >     >>>
> >     >     >>> now we have two megaflows:
> >     >     >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 <
> http://10.1.2.2/24>>
> >     >     >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 <
> http://10.1.0.2/16>>
> >     >     >>>
> >     >     >>> last, suppose we have changed the ruleset again. The
> revalidator this
> >     >     >>> time still decides to change the actions of both megaflows
> instead of
> >     >     >>> deleting them.
> >     >     >>>
> >     >     >>> The dpif_netdev_flow_put will search the megaflow to
> modify with unmasked
> >     >     >>> keys, however it might lookup the wrong megaflow as the
> key 10.1.2.2
> >     >     >> matches
> >     >     >>> both 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24
> <http://10.1.2.2/24>> and 10.1.0.2/16 <http://10.1.0.2/16> <
> http://10.1.0.2/16 <http://10.1.0.2/16>>!
> >     >     >>>
> >     >     >>> This patch changes the megaflow lookup code in
> modification path into
> >     >     >>> relying the ufid to find the correct megaflow instead of
> key lookup.
> >     >     >>
> >     >     >> Thanks for fixing Ilya’s comments! I’ve also copied in some
> of the v3
> >     >     >> discussion, so we can wrap it up here.
> >     >     >>
> >     >     >> Acked-by: Eelco Chaudron <echau...@redhat.com <mailto:
> echau...@redhat.com> <mailto:echau...@redhat.com <mailto:
> echau...@redhat.com>>>
> >     >     >>
> >     >     >> //Eelco
> >     >     >>
> >     >     >>> Signed-off-by: Peng He <hepeng.0...@bytedance.com <mailto:
> hepeng.0...@bytedance.com> <mailto:hepeng.0...@bytedance.com <mailto:
> hepeng.0...@bytedance.com>>>
> >     >     >>> ---
> >     >     >>>  lib/dpif-netdev.c | 62
> ++++++++++++++++++++++++++---------------------
> >     >     >>>  tests/pmd.at <http://pmd.at> <http://pmd.at <
> http://pmd.at>>      | 48 ++++++++++++++++++++++++++++++++++++
> >     >     >>>  2 files changed, 82 insertions(+), 28 deletions(-)
> >     >     >>>
> >     >     >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >     >     >>> index 70b953ae6..b90ed1542 100644
> >     >     >>> --- a/lib/dpif-netdev.c
> >     >     >>> +++ b/lib/dpif-netdev.c
> >     >     >>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct
> dp_netdev_pmd_thread *pmd,
> >     >     >>>          memset(stats, 0, sizeof *stats);
> >     >     >>>      }
> >     >     >>>
> >     >     >>> -    ovs_mutex_lock(&pmd->flow_mutex);
> >     >     >>> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key,
> NULL);
> >     >     >>> -    if (!netdev_flow) {
> >     >     >>> -        if (put->flags & DPIF_FP_CREATE) {
> >     >     >>> +    if (put->flags & DPIF_FP_CREATE) {
> >     >     >>
> >     >     >>
> >     >     >> EC> Should this not be:
> >     >     >> EC>
> >     >     >> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags &
> DPIF_FP_MODIFY))
> >     >     >> EC>
> >     >     >> EC> I know this will break the fix, but I’m wondering what
> the possible
> >     >     >> EC> combinations are.
> >     >     >> EC> Quickly looking at the code the following ones seem to
> exist:
> >     >     >> EC>
> >     >     >> EC> DPIF_FP_CREATE  -> Create a flow, if one exists return
> EEXIST
> >     >     >> EC> DPIF_FP_MODIFY  -> Modify a flow, if it does not exist
> return ENONT
> >     >     >> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY   -> If it exists
> modify it, if it does
> >     >     >> EC> not exists create it.
> >     >     >> EC>
> >     >     >> EC> Could the last combination not potentially fail the
> lookup?
> >     >     >> EC>
> >     >     >> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are
> the problem? In
> >     >     >> EC> theory, it could also be the combination.
> >     >     >> EC>
> >     >     >>
> >     >     >> PH> sorry, I was wrong. Such a combination does exist in
> >     >     >> PH>the dpif_probe_feature, and it probes the datapath by
> trying to put
> >     >     >> flows:
> >     >     >> PH>
> >     >     >> PH>     error = dpif_flow_put(dpif, DPIF_FP_CREATE |
> DPIF_FP_MODIFY |
> >     >     >> PH>DPIF_FP_PROBE,
> >     >     >> PH>                           key->data, key->size, NULL, 0,
> >     >     >> PH>                           nl_actions, nl_actions_size,
> >     >     >> PH>                           ufid, NON_PMD_CORE_ID, NULL);
> >     >     >>
> >     >     >> PH> so we CANNOT change the code into:
> >     >     >>
> >     >     >> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags &
> DPIF_FP_MODIFY))
> >     >     >>
> >     >     >> PH> as the issue the patch tries to fix only exist in
> MODIFY alone path.
> >     >     >> PH> If CREATE bit is set, we need to go through the CREATE
> path no matter
> >     >     >> if
> >     >     >> PH> MODIFY exist or not.
> >     >     >>
> >     >     >> Ok, if this is only used by the probe function we should be
> fine. I did
> >     >     >> quickly search the code and it seems to be the way.
> However, if it’s ever
> >     >     >> used by any part of ovs with both flags set, it might fail
> the lookup and
> >     >     >> we run into the same problem.
> >     >     >>
> >     >     >
> >     >     > By "the same problem", you mean the problem listed in this
> patch or this
> >     >     > fix might lead to a potential failure in other part of OVS?
> >     >
> >     >     If you have a request with both DPIF_FP_MODIFY and
> DPIF_FP_CREATE set, the lookup could still return the wrong entry. But if
> this is not used it should be fine for now.
> >     >
> >     >
> >     > So do we need to check such combination here, and just return
> -ENOSUPP?
> >
> >     I'm not sure that's a good solution for a problem in the current
> version
> >     of the fix.  In general, this patch seems to not comply with the
> documented
> >     behavior of the flow_put callback.  The intended behavior is
> described in
> >     the comment for a struct dpif_flow_put in lib/dpif.h.  Both flags
> being
> >     present is a valid case that should be handled.  Empty set of flags
> can
> >     also be a valid case of some sort.  It will always fail, but it
> should
> >     provide different error codes depending on the flow existence in the
> >     datapath.
> >
> >     Is there anything major that prevents us from using UFID always?
> >     Userspace datapath seems to always have it and it will generate one
> if not
> >     provided.  The internal generation might be a problem, I guess, due
> to a
> >     risk of it being non-unique.  So, we probably should not use a
> locally
> >     generated ufid for a lookup, only for dp_netdev_flow_add.
> >
> >     So, the logic might be:
> >
> >       if put->ufid:   /* Actually provided UFID from upper layers. */
> >         Lookup by UFID
> >       else:
> >         Lookup by KEY
> >
> >
> > In general, I think this solution is better. One thing I am not clear is
> below:
> >
> > if use KEY, how can we guarantee that we have found the right megaflow?
> >
> > I think the only way to do this is to generate again the ufid using KEY,
> and then lookup the
> > flow_table using this ufid. The upper layer can decide to use ufid or
> not, through upcall/enable/disable-ufid,
> > but the userspace datapath should always support ufid. This results in a
> logic that no matter
> > what the upper layer provide, the datapath always use ufid to locate the
> megaflow.
> >
> > Why not use ufid at the first place? I do not know.
>
> The problem is that locally generated UFID, despite the name, can be
> not unique.  It's a hash and collisions are possible.  ukeys are checked
> for UFID collisions, so externally provided UFIDs are indeed unique.
> So we will have to additionally check that we found what we were looking
> for anyways.
>

In fact, if the locally generated *ufid* does conflict with an exist
netdev_flow_key,
later the revalidator will dump this flow and try to recover the associated
ukey,
if it found the conflict, the recover will fail and this netdev_flow_key
will be eventually
removed.

So either generated locally or provided by a random method, if the *ufid*
is not unique,
the netdev_flow_key will not survive too long in current system.
Eventually, the *ufid*
can accurately locate the right megaflow. So I think the uniqueness is not
a big issue here.

Perhaps the real issue here is described as you write in the following.


>
> Another potential problem is manually added flows with UFID calculated
> in a different way / UFID that is just random.  If user later will request
> this flow without specifying UFID, we will not be able to find it.
> For that reason we have to use the classifier, if UFID is not provided
> in the request.  I agree that we can't guarantee that the correct flow
> is found by just a classifier lookup.  So, we should likely bring back
> some form of the flow_equal() check that was removed in commit:
>   beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
> This commit is also responsible for turning '/* Overlapping flow. */'
> into incorrect comment.
>
> So, something like:
>
>   if put->ufid:
>     Lookup by UFID
>   else:
>     Lookup by KEY
>     if ! flow_equal():
>       /* Overlapping flow. */ --> EINVAL
>
> Agree.



> The problematic part is the statement in the commit message that
> flow_equal()
> check caused false-negative results due to masking.  I'm not sure if that
> is
> still the issue.  We fixed a few issues in that area since then.  But the
> check is definitely needed, because the 'a) the dpcls lookup is sufficient
> to
> uniquely identify a flow' in the commit message is not a correct statement.
>
> I suppose, the mentioned commit is also a candidate for a 'Fixes' tag.
>
> Agree.



> >
> >
> >
> >       Not Found:
> >         DPIF_FP_CREATE is SET:       Add the flow
> >         DPIF_FP_CREATE is not SET:   ENOENT
> >       Found:
> >         DPIF_FP_MODIFY is SET:       Modify the flow
> >         DPIF_FP_MODIFY is not SET:   EEXIST
> >
> >
> >     This way, if flows are added manually it will be user's
> responsibility
> >     to specify or not specify UFIDs, the dpctl/add-flow is a debug
> interface
> >     after all, so should not be a problem.  And the ovs-vswitchd itself
> will
> >     always provide UFID and it checks that ukeys have different UFIDs,
> unless
> >     manually disabled, again, for testing or debugging purposes.
> >
> >     Such logic is not that different from what we have today and should
> cover
> >     the issue and all the combinations of flags.
> >
> >     Or am I missing something?
> >
> >
> >     A few other comments regarding the implementation:
> >
> >     - The change reduces the critical section of the flow mutex down to
> the
> >     DPIF_FP_CREATE case.  We can't actually do that.  The main reason
> will
> >     be a possible race condition on the ovsrcu_set() during the actions
> update
> >     and a subsequent potential actions leak.  RCU set operations require
> >     external synchronization.  There could be other race conditions as
> well,
> >     because multiple threads can work on the same flow at the same time.
> >
> >
> > Yes, this is indeed a problem. Although I am not sure that is it
> possible that two
> > revalidator threads operate a same ukey at the same time? But better
> control
> > this in the lower layer to make sure.
>
> Revalidators should lock the ukey while working on it, but the
> dpif-provider API
> doesn't require that from the caller, so we should check in the
> implementation.
> Also, the manual flow_put from the dpctl/mod-flow may collide with a
> revalidator.
>

Indeed, will send a new version to fix that.

>
> >
> >     - The '/* Overlapping flow. */' comment is actually not correct.
> It's
> >     not really the problem of this particular patch as it wasn't correct
> >     before either, but we should probably just remove it as you're
> touching
> >     this code anyways.
> >
> > Ok, will delete it in the next version.
> >
> >
> >
> >
> >     Best regards, Ilya Maximets.
> >
> >
> >
> > --
> > hepeng
>
>

-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to