On 2/4/22 17:12, Gaetan Rivet wrote:
> A race condition has been identified during datapath destruction,
> along with the port offload flushes issued.
> 
> This series addresses these race conditions, cleaning up the
> port deletion process.
> 
> The last patch also cleans up the offload structure.
> It is not strictly necessary like the first two fixes,
> so I put it last. It can wait until after the code-freeze
> to be integrated.
> 
> I tested for a few hours without ASAN enabled without seeing
> issues. ASAN has been executed as part of the github CI:
> https://github.com/grivet/ovs/actions/runs/1795624401
> It is however not too relevant, as no offloads are inserted during CI.
> 
> The following patch was used to fix an unrelated CI issue:
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220204150445.1481457-1-i.maxim...@ovn.org/
> 
> I also ran datapath creation + deletion loop with ASAN on an offload
> test setup, but the execution was excruciatingly slow and could
> not progress much. It reached datapath deletion without panicking
> and no crash was seen, even though I had to interrupt the test after
> a few hours.
> 
> Gaetan Rivet (2):
>   dpif-netdev: Move port flush after datapath reconfiguration
>   dpif-netdev: Use dp_netdev reference in offload threads
> 
> Sriharsha Basavapatna (1):
>   dpif-netdev: Fix a race condition in deletion of offloaded flows
> 
>  lib/dpif-netdev.c | 71 ++++++++++++++++++++++++++---------------------
>  1 file changed, 40 insertions(+), 31 deletions(-)


Thanks!  I applied the series to master and 2.17.

I was thinking about backports and I think that it should be OK to
backport patch #2 (always enqueue deletions), because it doesn't
introduce any new issues.

But I'm not sure about the first patch.  On 2.17 flush acts like
a barrier, so no offload requests are in the offload queue after
the port removal from PMD thread and flush from the offload thread.
However, there is no such synchronization mechanism on 2.16.
Flush is performed from the main thread and some offload requests
could still be in the offload queue.  And if dp is destroyed,
processing of those offload requests will cause use-after-free.
In other words, while patch #1 can reduce the race window for flows
remaining in the hardware after the port deletion, it doesn't solve
the use-after-free problem.  So, we need some other solution.
Backporting the barrier machinery doesn't seem like a good option.

Referencing the dp on creation of the PMD thread or on creation
of the offload item might be a solution. But this will change the
time dp will actually be destroyed, so needs a careful consideration.

We may also not fix the UAF problem on 2.16 taking into account that
issue is happening only during deletion of the datapath, so should not
be very severe and 2.16 is not an LTS branch.

Thoughts?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to