Hi,

On Tue, Jun 14, 2022 at 05:26:25PM +0300, Oz Shlomo wrote:
> Hi Ilya,
>
> On 6/14/2022 4:03 PM, Ilya Maximets wrote:
> > On 6/14/22 10:27, Oz Shlomo via dev wrote:
> > >
> > >
> > > On 6/8/2022 3:16 AM, Frode Nordahl wrote:
> > > > On Tue, Jun 7, 2022 at 12:16 AM Ilya Maximets <i.maxim...@ovn.org> 
> > > > wrote:
> > > > >
> > > > > On 5/31/22 23:48, Ilya Maximets wrote:
> > > > > > On 5/31/22 21:15, Frode Nordahl wrote:
> > > > > > > On Mon, May 30, 2022 at 5:25 PM Frode Nordahl
> > > > >
> > > > > <snip>
> > > > >
> > > > > > > I've pushed the first part of the fix here:
> > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-dev%2F2022-May%2F394450.html&amp;data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=pq0CABjao2UWojg6yZut7RL%2FZEeuRou0qUVZKNYP3rQ%3D&amp;reserved=0
> > > > > >
> > > > > > Thanks!  I saw that and I tend to think that it is correct.
> > > > > > I'll try to test it and apply in the next couple of days.
> > > > > >
> > > > > > One question about the test above: which entity actually adds
> > > > > > the ct_state to the packet or at which moment that happens?
> > > > > > I see it, but I'm not sure I fully understand that.  Looks
> > > > > > like I'm missing smething obvious.
> > > > >
> > > > > I found what is going on there - kernel simply tracks everything
> > > > > if not told to do so, so ICMP packets create the ct entry and
> > > > > subsequent packets re-use it, so icmp replies have +trk set while
> > > > > entering OVS.
> > > >
> > > > Great, my hunch was that something along these lines was happening as
> > > > well, I have to admit the test case was found by locating something
> > > > closest to the real life use case and it proved to work as a good test
> > > > for this condition.
> > > >
> > > > > ----
> > > > >
> > > > > Let's have some summary of the issues discovered here so far,
> > > > > including a few new issues:
> > > > >
> > > > > 1. ct states set externally are not tracked correctly by OVS.
> > > > >      Fix: 
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-dev%2F2022-May%2F394450.html&amp;data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=pq0CABjao2UWojg6yZut7RL%2FZEeuRou0qUVZKNYP3rQ%3D&amp;reserved=0
> > > > >      Status:  LGTM, will apply soon.
> > > > >      This fixes the problem originally reported by Liam, IIUC.  Right?
> > > >
> > > > Yes.
> > > >
> > > > > 2. Kernel ct() actions are trying to re-use the cached connection
> > > > >      after the tuple changes.
> > > > >      This ends up to be the OVN hairpin issue as reported here:
> > > > >        
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.launchpad.net%2Fubuntu%2F%2Bsource%2Fovn%2F%2Bbug%2F1967856&amp;data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=25B7VbtRFguupC7VoNjZK%2FWlasu%2BMSTUzJkszvEpDaQ%3D&amp;reserved=0
> > > > >
> > > > >      Proposed Fix:
> > > > >        
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fnetdev%2F20220606221140.488984-1-i.maximets%40ovn.org%2FT%2F%23u&amp;data=05%7C01%7Cozsh%40nvidia.com%7C7ab5490f9c334e9d877f08da4e0652fc%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637908086346878848%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=areRLYsAEbare7yo%2FxmIF9k2tMw2v8ZQkwHcR%2FEvV%2Bo%3D&amp;reserved=0
> > > > >
> > > > >      Status: Needs review.
> > > >
> > > > I can confirm that the proposed fix resolves the OVN hairpin issue. It
> > > > also looks simple enough to be backportable all the way to where we
> > > > would need it (kernel 5.4.0). I'll have a look at giving this wider
> > > > exposure in an internal CI environment as a canary for any unintended
> > > > consequences if that would be helpful.
> > > >
> > >
> > > Sorry for jumping in late on this, as this patch was already accepted to 
> > > the kernel.
> > > However, I have a concern that this patch would break the tc datapath 
> > > when ovs hw offload is enabled.
> > >
> > > IIUC then this patch adds an implicit ovs_ct_clear call for 5-tuple 
> > > modify actions. However, this implicit action will not apply to flows 
> > > that use the tc datapath.
> > >
> > > Forde, can you verify that indeed this fix breaks the OVN hairpin use 
> > > case when hw offload is enabled.
> > Hi, Oz.  I don't think that this kernel fix breaks the TC datapath
> > as the packets processed by the openvswitch kernel module will not
> > go back to TC for further processing, IIUC.  Also, it's not a full
> > ct_clear, because we're not clearing the flow key.
>
> A flow datapath is either in tc or in ovs.
> If hardware offload is enabled then ovs will create a tc flower entry.
> Therefore, packets for that flow will be processed by tc and not
> openvswitch.
> Note that hardware offload may be enabled even if there is no supporting
> hardware. TC software datapath is designed to be functionally equivalent to
> ovs.
>
> tc is processed before openvswitch in the kernel pipeline. Therefore, if a
> packet is matched by tc then it will not continue to openvswitch. Therefore
> my concern is that openvswitch change will not apply if ovs hardware offload
> is enabled.
>
> >
> > But I agree that the original bug exists in TC as well, since TC
> > just copied the ct() recircuation optimization from the openvswitch.
> > So, if there are subsequent ct actions with pedit in between,
> > TC will have the same problem with misclassification as OVS had
> > before the kernel fix 2061ecfdf235.
>
> Right.
>
> >
> > So, the similar fix should be implemented for TC as well.  However,
> > I'm not sure how to actually do that, because ct and pedit are
> > not really connected in the kernel.  The issue might be fixed as a
> > side effect from fixes for the issue #5 in the list here, I guess,
> > but it's not really a correct fix.  The reason why it should be
> > fixed in the kernel is because user doesn't really know that TC
> > or openvswitch module cached that connection, the user didn't ask
> > it to be cached and re-used, they only wanted to populate the
> > current flow key with the ct_{state,mark,label} or commit some
> > changes.  TC/openvswitch kernel module decided to cache the nfct,
> > so it should handle possible mismatch if the packet got changed.
> >
> > Does that make sense?
>
> Indeed changing the tc pedit action is not a possibility.
>
> We did copy the caching optimization from ovs when implementing tc act_ct.
>
> I wonder if we could remove the optimization.
> According to the comment in the code the caching mechanism was designed to
> optimize the ct(commit) execution, as ovs connections have to be explicitly
> commited.

I'm afraid removing the optimization may lead to bad matches, because
TC will use the CT entry to build the key.

https://elixir.bootlin.com/linux/latest/source/net/core/flow_dissector.c#L254

> Perhaps we can also consider the other approach that you suggested,
> verifying that the cached 5-tuples was not changed.

I'm a bit lost here. And verify that against the CT entry itself? It
would have to be a smart check, by checking if it was NATed or not and
so.

>
> However, I do remember that OVN pod to external pipeline actually relies on
> this optimization when executing ct(nat) -> recirc -> ct for identical
> zones. Without the optimization the second ct would miss because the natted
> entry was never commited to the ct table.

(This one was clarified on the other side of the thread)

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to