Ilya Maximets <i.maxim...@ovn.org> 于2023年7月28日周五 20:25写道:
> On 7/28/23 11:02, Peng He wrote: > > > > > > Ilya Maximets <i.maxim...@ovn.org <mailto:i.maxim...@ovn.org>> > 于2023年7月28日周五 00:59写道: > > > > On 7/27/23 04:31, Peng He wrote: > > > > > > > > > Eelco Chaudron <echau...@redhat.com <mailto:echau...@redhat.com> > <mailto:echau...@redhat.com <mailto:echau...@redhat.com>>> 于2023年7月6日周四 > 15:52写道: > > > > > > > > > > > > On 6 Jul 2023, at 4:32, Peng He wrote: > > > > > > > Eelco Chaudron <echau...@redhat.com <mailto: > echau...@redhat.com> <mailto:echau...@redhat.com <mailto: > echau...@redhat.com>>> 于2023年7月5日周三 22:16写道: > > > > > > > >> > > > >> > > > >> On 1 Jul 2023, at 4:43, Peng He wrote: > > > >> > > > >>> OVS allows overlapping megaflows, as long as the actions > of these > > > >>> megaflows are equal. However, the current implementation > of action > > > >>> modification relies on flow_lookup instead of ufid, this > could result > > > >>> in looking up a wrong megaflow and make the ukeys and > megaflows > > > >> inconsistent > > > >>> > > > >>> Just like the test case in the patch, at first we have a > rule with the > > > >>> prefix: > > > >>> > > > >>> 10.1.2.0/24 <http://10.1.2.0/24> <http://10.1.2.0/24 < > http://10.1.2.0/24>> > > > >>> > > > >>> and we will get a megaflow with prefixes 10.1.2.2/24 < > http://10.1.2.2/24> <http://10.1.2.2/24 <http://10.1.2.2/24>> when a > packet with > > > >> IP > > > >>> 10.1.2.2 is received. > > > >>> > > > >>> Then suppose we change the rule into 10.1.0.0/16 < > http://10.1.0.0/16> <http://10.1.0.0/16 <http://10.1.0.0/16>>. OVS > prefers to keep > > > >> the > > > >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 < > http://10.1.2.2/24>> megaflow and just changes its action instead of > extending > > > >>> the prefix into 10.1.2.2/16 <http://10.1.2.2/16> < > http://10.1.2.2/16 <http://10.1.2.2/16>>. > > > >>> > > > >>> then suppose we have a 10.1.0.2 packet, since it misses > the megaflow, > > > >>> this time, we will have an overlapping megaflow with the > right prefix: > > > >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 < > http://10.1.0.2/16>> > > > >>> > > > >>> now we have two megaflows: > > > >>> 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 < > http://10.1.2.2/24>> > > > >>> 10.1.0.2/16 <http://10.1.0.2/16> <http://10.1.0.2/16 < > http://10.1.0.2/16>> > > > >>> > > > >>> last, suppose we have changed the ruleset again. The > revalidator this > > > >>> time still decides to change the actions of both megaflows > instead of > > > >>> deleting them. > > > >>> > > > >>> The dpif_netdev_flow_put will search the megaflow to > modify with unmasked > > > >>> keys, however it might lookup the wrong megaflow as the > key 10.1.2.2 > > > >> matches > > > >>> both 10.1.2.2/24 <http://10.1.2.2/24> <http://10.1.2.2/24 > <http://10.1.2.2/24>> and 10.1.0.2/16 <http://10.1.0.2/16> < > http://10.1.0.2/16 <http://10.1.0.2/16>>! > > > >>> > > > >>> This patch changes the megaflow lookup code in > modification path into > > > >>> relying the ufid to find the correct megaflow instead of > key lookup. > > > >> > > > >> Thanks for fixing Ilya’s comments! I’ve also copied in some > of the v3 > > > >> discussion, so we can wrap it up here. > > > >> > > > >> Acked-by: Eelco Chaudron <echau...@redhat.com <mailto: > echau...@redhat.com> <mailto:echau...@redhat.com <mailto: > echau...@redhat.com>>> > > > >> > > > >> //Eelco > > > >> > > > >>> Signed-off-by: Peng He <hepeng.0...@bytedance.com <mailto: > hepeng.0...@bytedance.com> <mailto:hepeng.0...@bytedance.com <mailto: > hepeng.0...@bytedance.com>>> > > > >>> --- > > > >>> lib/dpif-netdev.c | 62 > ++++++++++++++++++++++++++--------------------- > > > >>> tests/pmd.at <http://pmd.at> <http://pmd.at < > http://pmd.at>> | 48 ++++++++++++++++++++++++++++++++++++ > > > >>> 2 files changed, 82 insertions(+), 28 deletions(-) > > > >>> > > > >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > > >>> index 70b953ae6..b90ed1542 100644 > > > >>> --- a/lib/dpif-netdev.c > > > >>> +++ b/lib/dpif-netdev.c > > > >>> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct > dp_netdev_pmd_thread *pmd, > > > >>> memset(stats, 0, sizeof *stats); > > > >>> } > > > >>> > > > >>> - ovs_mutex_lock(&pmd->flow_mutex); > > > >>> - netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, > NULL); > > > >>> - if (!netdev_flow) { > > > >>> - if (put->flags & DPIF_FP_CREATE) { > > > >>> + if (put->flags & DPIF_FP_CREATE) { > > > >> > > > >> > > > >> EC> Should this not be: > > > >> EC> > > > >> EC> if (put->flags & DPIF_FP_CREATE && !(put->flags & > DPIF_FP_MODIFY)) > > > >> EC> > > > >> EC> I know this will break the fix, but I’m wondering what > the possible > > > >> EC> combinations are. > > > >> EC> Quickly looking at the code the following ones seem to > exist: > > > >> EC> > > > >> EC> DPIF_FP_CREATE -> Create a flow, if one exists return > EEXIST > > > >> EC> DPIF_FP_MODIFY -> Modify a flow, if it does not exist > return ENONT > > > >> EC> DPIF_FP_CREATE | DPIF_FP_MODIFY -> If it exists > modify it, if it does > > > >> EC> not exists create it. > > > >> EC> > > > >> EC> Could the last combination not potentially fail the > lookup? > > > >> EC> > > > >> EC> Or are we assuming only standalone DPIF_FP_MODIFY’s are > the problem? In > > > >> EC> theory, it could also be the combination. > > > >> EC> > > > >> > > > >> PH> sorry, I was wrong. Such a combination does exist in > > > >> PH>the dpif_probe_feature, and it probes the datapath by > trying to put > > > >> flows: > > > >> PH> > > > >> PH> error = dpif_flow_put(dpif, DPIF_FP_CREATE | > DPIF_FP_MODIFY | > > > >> PH>DPIF_FP_PROBE, > > > >> PH> key->data, key->size, NULL, 0, > > > >> PH> nl_actions, nl_actions_size, > > > >> PH> ufid, NON_PMD_CORE_ID, NULL); > > > >> > > > >> PH> so we CANNOT change the code into: > > > >> > > > >> PH> if (put->flags & DPIF_FP_CREATE && !(put->flags & > DPIF_FP_MODIFY)) > > > >> > > > >> PH> as the issue the patch tries to fix only exist in > MODIFY alone path. > > > >> PH> If CREATE bit is set, we need to go through the CREATE > path no matter > > > >> if > > > >> PH> MODIFY exist or not. > > > >> > > > >> Ok, if this is only used by the probe function we should be > fine. I did > > > >> quickly search the code and it seems to be the way. > However, if it’s ever > > > >> used by any part of ovs with both flags set, it might fail > the lookup and > > > >> we run into the same problem. > > > >> > > > > > > > > By "the same problem", you mean the problem listed in this > patch or this > > > > fix might lead to a potential failure in other part of OVS? > > > > > > If you have a request with both DPIF_FP_MODIFY and > DPIF_FP_CREATE set, the lookup could still return the wrong entry. But if > this is not used it should be fine for now. > > > > > > > > > So do we need to check such combination here, and just return > -ENOSUPP? > > > > I'm not sure that's a good solution for a problem in the current > version > > of the fix. In general, this patch seems to not comply with the > documented > > behavior of the flow_put callback. The intended behavior is > described in > > the comment for a struct dpif_flow_put in lib/dpif.h. Both flags > being > > present is a valid case that should be handled. Empty set of flags > can > > also be a valid case of some sort. It will always fail, but it > should > > provide different error codes depending on the flow existence in the > > datapath. > > > > Is there anything major that prevents us from using UFID always? > > Userspace datapath seems to always have it and it will generate one > if not > > provided. The internal generation might be a problem, I guess, due > to a > > risk of it being non-unique. So, we probably should not use a > locally > > generated ufid for a lookup, only for dp_netdev_flow_add. > > > > So, the logic might be: > > > > if put->ufid: /* Actually provided UFID from upper layers. */ > > Lookup by UFID > > else: > > Lookup by KEY > > > > > > In general, I think this solution is better. One thing I am not clear is > below: > > > > if use KEY, how can we guarantee that we have found the right megaflow? > > > > I think the only way to do this is to generate again the ufid using KEY, > and then lookup the > > flow_table using this ufid. The upper layer can decide to use ufid or > not, through upcall/enable/disable-ufid, > > but the userspace datapath should always support ufid. This results in a > logic that no matter > > what the upper layer provide, the datapath always use ufid to locate the > megaflow. > > > > Why not use ufid at the first place? I do not know. > > The problem is that locally generated UFID, despite the name, can be > not unique. It's a hash and collisions are possible. ukeys are checked > for UFID collisions, so externally provided UFIDs are indeed unique. > So we will have to additionally check that we found what we were looking > for anyways. > In fact, if the locally generated *ufid* does conflict with an exist netdev_flow_key, later the revalidator will dump this flow and try to recover the associated ukey, if it found the conflict, the recover will fail and this netdev_flow_key will be eventually removed. So either generated locally or provided by a random method, if the *ufid* is not unique, the netdev_flow_key will not survive too long in current system. Eventually, the *ufid* can accurately locate the right megaflow. So I think the uniqueness is not a big issue here. Perhaps the real issue here is described as you write in the following. > > Another potential problem is manually added flows with UFID calculated > in a different way / UFID that is just random. If user later will request > this flow without specifying UFID, we will not be able to find it. > For that reason we have to use the classifier, if UFID is not provided > in the request. I agree that we can't guarantee that the correct flow > is found by just a classifier lookup. So, we should likely bring back > some form of the flow_equal() check that was removed in commit: > beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") > This commit is also responsible for turning '/* Overlapping flow. */' > into incorrect comment. > > So, something like: > > if put->ufid: > Lookup by UFID > else: > Lookup by KEY > if ! flow_equal(): > /* Overlapping flow. */ --> EINVAL > > Agree. > The problematic part is the statement in the commit message that > flow_equal() > check caused false-negative results due to masking. I'm not sure if that > is > still the issue. We fixed a few issues in that area since then. But the > check is definitely needed, because the 'a) the dpcls lookup is sufficient > to > uniquely identify a flow' in the commit message is not a correct statement. > > I suppose, the mentioned commit is also a candidate for a 'Fixes' tag. > > Agree. > > > > > > > > Not Found: > > DPIF_FP_CREATE is SET: Add the flow > > DPIF_FP_CREATE is not SET: ENOENT > > Found: > > DPIF_FP_MODIFY is SET: Modify the flow > > DPIF_FP_MODIFY is not SET: EEXIST > > > > > > This way, if flows are added manually it will be user's > responsibility > > to specify or not specify UFIDs, the dpctl/add-flow is a debug > interface > > after all, so should not be a problem. And the ovs-vswitchd itself > will > > always provide UFID and it checks that ukeys have different UFIDs, > unless > > manually disabled, again, for testing or debugging purposes. > > > > Such logic is not that different from what we have today and should > cover > > the issue and all the combinations of flags. > > > > Or am I missing something? > > > > > > A few other comments regarding the implementation: > > > > - The change reduces the critical section of the flow mutex down to > the > > DPIF_FP_CREATE case. We can't actually do that. The main reason > will > > be a possible race condition on the ovsrcu_set() during the actions > update > > and a subsequent potential actions leak. RCU set operations require > > external synchronization. There could be other race conditions as > well, > > because multiple threads can work on the same flow at the same time. > > > > > > Yes, this is indeed a problem. Although I am not sure that is it > possible that two > > revalidator threads operate a same ukey at the same time? But better > control > > this in the lower layer to make sure. > > Revalidators should lock the ukey while working on it, but the > dpif-provider API > doesn't require that from the caller, so we should check in the > implementation. > Also, the manual flow_put from the dpctl/mod-flow may collide with a > revalidator. > Indeed, will send a new version to fix that. > > > > > - The '/* Overlapping flow. */' comment is actually not correct. > It's > > not really the problem of this particular patch as it wasn't correct > > before either, but we should probably just remove it as you're > touching > > this code anyways. > > > > Ok, will delete it in the next version. > > > > > > > > > > Best regards, Ilya Maximets. > > > > > > > > -- > > hepeng > > -- hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev