Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 2/2/22 18:19, Sriharsha Basavapatna wrote: > On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets wrote: >> >> On 2/1/22 16:07, Gaëtan Rivet wrote: >>> On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote: On 2/1/22 14:38, Gaëtan Rivet wrote: > On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: >> On 2/1/22 12:38, Gaëtan Rivet wrote: >>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: On 1/31/22 11:42, Gaëtan Rivet wrote: > On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: >> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >>> offloaded flow, if a mark has been assigned to the flow. But if >>> this occurs in the window in which the offload thread completes >>> offloading the flow and assigns a mark to the flow, then we miss >>> deleting the flow. This problem has been observed while adding >>> and deleting flows in a loop. To fix this, always enqueue flow >>> deletion regardless of the flow->mark being set. >>> >>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >>> Signed-off-by: Sriharsha Basavapatna >>> >>> --- >>> lib/dpif-netdev.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index e28e0b554..22c5f19a8 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct >>> dp_netdev_pmd_thread *pmd, >>> dp_netdev_simple_match_remove(pmd, flow); >>> cmap_remove(&pmd->flow_table, node, >>> dp_netdev_flow_hash(&flow->ufid)); >>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >>> -if (flow->mark != INVALID_FLOW_MARK) { >>> -queue_netdev_flow_del(pmd, flow); >>> -} >>> +queue_netdev_flow_del(pmd, flow); >>> flow->dead = true; >>> >>> dp_netdev_flow_unref(flow); >>> >> >> Thanks for the patch, but it looks like that it's not that simple... >> A lot of tests are crashing in GHA and ASAN reports use-after-free >> on flow disassociation if the dpif already destroyed: >> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >> >> I think that the problem is that offload thread holds the ref count >> for PMD thread, but PMD thread structure doesn't hold the ref count >> for the dp, because it doesn't expect that dp_netdev_pmd structure >> will be used while PMD thread is destroyed. I guess, we should fix >> that. OTOH, I'm not sure if offload thread actually needs a >> reference >> to dp_netdev_pmd structure. If I didn't miss something, it only >> uses pmd pointer to access pmd->dp. So, maybe, >> dp_offload_thread_item >> should hold and ref the dp pointer instead? >> >> What do you think? Gaetan? >> > > Hi Ilya, > > I've been looking into this issue, you are right that the offload > threads don't > actually need the pmd pointer, only the dp one. > > The problem I see with a proper reference taking on dp to protect > against this > UAF, is that it is prohibitively slow to manage those dp references, > and it relies on > a global lock. > > I have tried to find an alternative (the flush op could be tagged to > mark that > it was issued during a dp_netdev_free), but I always end-up with a > race-condition, > even if the window is reduced. Hmm. Is it with flow API enabled or disabled? With flow api enabled we should have a flush operation that should in theory wait for all the previous offloading operations to be completed. If it's disabled though, flow deletion will initialize the offload thread and enqueue the operation while flush will not actually be requested. >>> >>> Yes, on your second point, with the API disabled the deletion path >>> relied on >>> the test flow->mark != INVALID to know whether a flow was offload or >>> not, and did not >>> care then to check the API status. >>> The API state should now be checked as well for the deletion path. >>> I missed before that we have flush on that path. Maybe it's enough to just add a !netdev_is_flow_api_enabled() check to the deletion path? Or am I missing something else? >>> >>> After the flush, there will be remaining ops in the queue that would >>> trigger the UAF. >> >> What are these ops? >> > > Any add/mod/del can b
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets wrote: > > On 2/1/22 16:07, Gaëtan Rivet wrote: > > On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote: > >> On 2/1/22 14:38, Gaëtan Rivet wrote: > >>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: > On 2/1/22 12:38, Gaëtan Rivet wrote: > > On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: > >> On 1/31/22 11:42, Gaëtan Rivet wrote: > >>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: > On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: > > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > > offloaded flow, if a mark has been assigned to the flow. But if > > this occurs in the window in which the offload thread completes > > offloading the flow and assigns a mark to the flow, then we miss > > deleting the flow. This problem has been observed while adding > > and deleting flows in a loop. To fix this, always enqueue flow > > deletion regardless of the flow->mark being set. > > > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > > Signed-off-by: Sriharsha Basavapatna > > > > --- > > lib/dpif-netdev.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index e28e0b554..22c5f19a8 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct > > dp_netdev_pmd_thread *pmd, > > dp_netdev_simple_match_remove(pmd, flow); > > cmap_remove(&pmd->flow_table, node, > > dp_netdev_flow_hash(&flow->ufid)); > > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > > -if (flow->mark != INVALID_FLOW_MARK) { > > -queue_netdev_flow_del(pmd, flow); > > -} > > +queue_netdev_flow_del(pmd, flow); > > flow->dead = true; > > > > dp_netdev_flow_unref(flow); > > > > Thanks for the patch, but it looks like that it's not that simple... > A lot of tests are crashing in GHA and ASAN reports use-after-free > on flow disassociation if the dpif already destroyed: > https://github.com/ovsrobot/ovs/actions/runs/1754866321 > > I think that the problem is that offload thread holds the ref count > for PMD thread, but PMD thread structure doesn't hold the ref count > for the dp, because it doesn't expect that dp_netdev_pmd structure > will be used while PMD thread is destroyed. I guess, we should fix > that. OTOH, I'm not sure if offload thread actually needs a > reference > to dp_netdev_pmd structure. If I didn't miss something, it only > uses pmd pointer to access pmd->dp. So, maybe, > dp_offload_thread_item > should hold and ref the dp pointer instead? > > What do you think? Gaetan? > > >>> > >>> Hi Ilya, > >>> > >>> I've been looking into this issue, you are right that the offload > >>> threads don't > >>> actually need the pmd pointer, only the dp one. > >>> > >>> The problem I see with a proper reference taking on dp to protect > >>> against this > >>> UAF, is that it is prohibitively slow to manage those dp references, > >>> and it relies on > >>> a global lock. > >>> > >>> I have tried to find an alternative (the flush op could be tagged to > >>> mark that > >>> it was issued during a dp_netdev_free), but I always end-up with a > >>> race-condition, > >>> even if the window is reduced. > >> Hmm. Is it with flow API enabled or disabled? > >> With flow api enabled we should have a flush operation that should > >> in theory wait for all the previous offloading operations to be > >> completed. If it's disabled though, flow deletion will initialize > >> the offload thread and enqueue the operation while flush will not > >> actually be requested. > > > > Yes, on your second point, with the API disabled the deletion path > > relied on > > the test flow->mark != INVALID to know whether a flow was offload or > > not, and did not > > care then to check the API status. > > The API state should now be checked as well for the deletion path. > > > >> > >> I missed before that we have flush on that path. Maybe it's enough > >> to just add a !netdev_is_flow_api_enabled() check to the deletion path? > >> Or am I missing something else? > >> > > > > After the flush, there will be remaining ops in the queue that would > > trigger the UAF. > > What are these ops? > > >>> > >>> Any add/mod/del can be found there. > >>> > >>> Flushes are scheduled b
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 2/1/22 16:07, Gaëtan Rivet wrote: > On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote: >> On 2/1/22 14:38, Gaëtan Rivet wrote: >>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: On 2/1/22 12:38, Gaëtan Rivet wrote: > On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: >> On 1/31/22 11:42, Gaëtan Rivet wrote: >>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > offloaded flow, if a mark has been assigned to the flow. But if > this occurs in the window in which the offload thread completes > offloading the flow and assigns a mark to the flow, then we miss > deleting the flow. This problem has been observed while adding > and deleting flows in a loop. To fix this, always enqueue flow > deletion regardless of the flow->mark being set. > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > Signed-off-by: Sriharsha Basavapatna > > --- > lib/dpif-netdev.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index e28e0b554..22c5f19a8 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct > dp_netdev_pmd_thread *pmd, > dp_netdev_simple_match_remove(pmd, flow); > cmap_remove(&pmd->flow_table, node, > dp_netdev_flow_hash(&flow->ufid)); > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > -if (flow->mark != INVALID_FLOW_MARK) { > -queue_netdev_flow_del(pmd, flow); > -} > +queue_netdev_flow_del(pmd, flow); > flow->dead = true; > > dp_netdev_flow_unref(flow); > Thanks for the patch, but it looks like that it's not that simple... A lot of tests are crashing in GHA and ASAN reports use-after-free on flow disassociation if the dpif already destroyed: https://github.com/ovsrobot/ovs/actions/runs/1754866321 I think that the problem is that offload thread holds the ref count for PMD thread, but PMD thread structure doesn't hold the ref count for the dp, because it doesn't expect that dp_netdev_pmd structure will be used while PMD thread is destroyed. I guess, we should fix that. OTOH, I'm not sure if offload thread actually needs a reference to dp_netdev_pmd structure. If I didn't miss something, it only uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item should hold and ref the dp pointer instead? What do you think? Gaetan? >>> >>> Hi Ilya, >>> >>> I've been looking into this issue, you are right that the offload >>> threads don't >>> actually need the pmd pointer, only the dp one. >>> >>> The problem I see with a proper reference taking on dp to protect >>> against this >>> UAF, is that it is prohibitively slow to manage those dp references, >>> and it relies on >>> a global lock. >>> >>> I have tried to find an alternative (the flush op could be tagged to >>> mark that >>> it was issued during a dp_netdev_free), but I always end-up with a >>> race-condition, >>> even if the window is reduced. >> Hmm. Is it with flow API enabled or disabled? >> With flow api enabled we should have a flush operation that should >> in theory wait for all the previous offloading operations to be >> completed. If it's disabled though, flow deletion will initialize >> the offload thread and enqueue the operation while flush will not >> actually be requested. > > Yes, on your second point, with the API disabled the deletion path relied > on > the test flow->mark != INVALID to know whether a flow was offload or not, > and did not > care then to check the API status. > The API state should now be checked as well for the deletion path. > >> >> I missed before that we have flush on that path. Maybe it's enough >> to just add a !netdev_is_flow_api_enabled() check to the deletion path? >> Or am I missing something else? >> > > After the flush, there will be remaining ops in the queue that would > trigger the UAF. What are these ops? >>> >>> Any add/mod/del can be found there. >>> >>> Flushes are scheduled by the main thread in response to user commands. >>> While it executes, PMDs et revalidators are running, potentially enqueuing >>> offload ops. The only restriction on those ops is that it's impossible to >>> have multiple >>> flush in the queue: only one can exist at a time. >>> >> >> I
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote: > On 2/1/22 14:38, Gaëtan Rivet wrote: >> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: >>> On 2/1/22 12:38, Gaëtan Rivet wrote: On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: > On 1/31/22 11:42, Gaëtan Rivet wrote: >> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: >>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: In dp_netdev_pmd_remove_flow() we schedule the deletion of an offloaded flow, if a mark has been assigned to the flow. But if this occurs in the window in which the offload thread completes offloading the flow and assigns a mark to the flow, then we miss deleting the flow. This problem has been observed while adding and deleting flows in a loop. To fix this, always enqueue flow deletion regardless of the flow->mark being set. Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") Signed-off-by: Sriharsha Basavapatna --- lib/dpif-netdev.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e28e0b554..22c5f19a8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, dp_netdev_simple_match_remove(pmd, flow); cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); -if (flow->mark != INVALID_FLOW_MARK) { -queue_netdev_flow_del(pmd, flow); -} +queue_netdev_flow_del(pmd, flow); flow->dead = true; dp_netdev_flow_unref(flow); >>> >>> Thanks for the patch, but it looks like that it's not that simple... >>> A lot of tests are crashing in GHA and ASAN reports use-after-free >>> on flow disassociation if the dpif already destroyed: >>> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >>> >>> I think that the problem is that offload thread holds the ref count >>> for PMD thread, but PMD thread structure doesn't hold the ref count >>> for the dp, because it doesn't expect that dp_netdev_pmd structure >>> will be used while PMD thread is destroyed. I guess, we should fix >>> that. OTOH, I'm not sure if offload thread actually needs a reference >>> to dp_netdev_pmd structure. If I didn't miss something, it only >>> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item >>> should hold and ref the dp pointer instead? >>> >>> What do you think? Gaetan? >>> >> >> Hi Ilya, >> >> I've been looking into this issue, you are right that the offload >> threads don't >> actually need the pmd pointer, only the dp one. >> >> The problem I see with a proper reference taking on dp to protect >> against this >> UAF, is that it is prohibitively slow to manage those dp references, and >> it relies on >> a global lock. >> >> I have tried to find an alternative (the flush op could be tagged to >> mark that >> it was issued during a dp_netdev_free), but I always end-up with a >> race-condition, >> even if the window is reduced. > Hmm. Is it with flow API enabled or disabled? > With flow api enabled we should have a flush operation that should > in theory wait for all the previous offloading operations to be > completed. If it's disabled though, flow deletion will initialize > the offload thread and enqueue the operation while flush will not > actually be requested. Yes, on your second point, with the API disabled the deletion path relied on the test flow->mark != INVALID to know whether a flow was offload or not, and did not care then to check the API status. The API state should now be checked as well for the deletion path. > > I missed before that we have flush on that path. Maybe it's enough > to just add a !netdev_is_flow_api_enabled() check to the deletion path? > Or am I missing something else? > After the flush, there will be remaining ops in the queue that would trigger the UAF. >>> >>> What are these ops? >>> >> >> Any add/mod/del can be found there. >> >> Flushes are scheduled by the main thread in response to user commands. >> While it executes, PMDs et revalidators are running, potentially enqueuing >> offload ops. The only restriction on those ops is that it's impossible to >> have multiple >> flush in the queue: only one can exist at a time. >> > > I think, revalidators should already be stopped at the point of datapath > deletion. Don't they? > > For the PMDs, I see that they are still runnin
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 2/1/22 14:38, Gaëtan Rivet wrote: > On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: >> On 2/1/22 12:38, Gaëtan Rivet wrote: >>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: On 1/31/22 11:42, Gaëtan Rivet wrote: > On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: >> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >>> offloaded flow, if a mark has been assigned to the flow. But if >>> this occurs in the window in which the offload thread completes >>> offloading the flow and assigns a mark to the flow, then we miss >>> deleting the flow. This problem has been observed while adding >>> and deleting flows in a loop. To fix this, always enqueue flow >>> deletion regardless of the flow->mark being set. >>> >>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >>> Signed-off-by: Sriharsha Basavapatna >>> >>> --- >>> lib/dpif-netdev.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index e28e0b554..22c5f19a8 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct >>> dp_netdev_pmd_thread *pmd, >>> dp_netdev_simple_match_remove(pmd, flow); >>> cmap_remove(&pmd->flow_table, node, >>> dp_netdev_flow_hash(&flow->ufid)); >>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >>> -if (flow->mark != INVALID_FLOW_MARK) { >>> -queue_netdev_flow_del(pmd, flow); >>> -} >>> +queue_netdev_flow_del(pmd, flow); >>> flow->dead = true; >>> >>> dp_netdev_flow_unref(flow); >>> >> >> Thanks for the patch, but it looks like that it's not that simple... >> A lot of tests are crashing in GHA and ASAN reports use-after-free >> on flow disassociation if the dpif already destroyed: >> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >> >> I think that the problem is that offload thread holds the ref count >> for PMD thread, but PMD thread structure doesn't hold the ref count >> for the dp, because it doesn't expect that dp_netdev_pmd structure >> will be used while PMD thread is destroyed. I guess, we should fix >> that. OTOH, I'm not sure if offload thread actually needs a reference >> to dp_netdev_pmd structure. If I didn't miss something, it only >> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item >> should hold and ref the dp pointer instead? >> >> What do you think? Gaetan? >> > > Hi Ilya, > > I've been looking into this issue, you are right that the offload threads > don't > actually need the pmd pointer, only the dp one. > > The problem I see with a proper reference taking on dp to protect against > this > UAF, is that it is prohibitively slow to manage those dp references, and > it relies on > a global lock. > > I have tried to find an alternative (the flush op could be tagged to mark > that > it was issued during a dp_netdev_free), but I always end-up with a > race-condition, > even if the window is reduced. Hmm. Is it with flow API enabled or disabled? With flow api enabled we should have a flush operation that should in theory wait for all the previous offloading operations to be completed. If it's disabled though, flow deletion will initialize the offload thread and enqueue the operation while flush will not actually be requested. >>> >>> Yes, on your second point, with the API disabled the deletion path relied on >>> the test flow->mark != INVALID to know whether a flow was offload or not, >>> and did not >>> care then to check the API status. >>> The API state should now be checked as well for the deletion path. >>> I missed before that we have flush on that path. Maybe it's enough to just add a !netdev_is_flow_api_enabled() check to the deletion path? Or am I missing something else? >>> >>> After the flush, there will be remaining ops in the queue that would >>> trigger the UAF. >> >> What are these ops? >> > > Any add/mod/del can be found there. > > Flushes are scheduled by the main thread in response to user commands. > While it executes, PMDs et revalidators are running, potentially enqueuing > offload ops. The only restriction on those ops is that it's impossible to > have multiple > flush in the queue: only one can exist at a time. > I think, revalidators should already be stopped at the point of datapath deletion. Don't they? For the PMDs, I see that they are still running and that sounds like a bug, because we're still offloading things after the flush. So, some of the flows are still in hardware after the port removal / some userspace struc
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote: > On 2/1/22 12:38, Gaëtan Rivet wrote: >> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: >>> On 1/31/22 11:42, Gaëtan Rivet wrote: On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: > On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >> offloaded flow, if a mark has been assigned to the flow. But if >> this occurs in the window in which the offload thread completes >> offloading the flow and assigns a mark to the flow, then we miss >> deleting the flow. This problem has been observed while adding >> and deleting flows in a loop. To fix this, always enqueue flow >> deletion regardless of the flow->mark being set. >> >> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >> Signed-off-by: Sriharsha Basavapatna >> --- >> lib/dpif-netdev.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index e28e0b554..22c5f19a8 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct >> dp_netdev_pmd_thread *pmd, >> dp_netdev_simple_match_remove(pmd, flow); >> cmap_remove(&pmd->flow_table, node, >> dp_netdev_flow_hash(&flow->ufid)); >> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >> -if (flow->mark != INVALID_FLOW_MARK) { >> -queue_netdev_flow_del(pmd, flow); >> -} >> +queue_netdev_flow_del(pmd, flow); >> flow->dead = true; >> >> dp_netdev_flow_unref(flow); >> > > Thanks for the patch, but it looks like that it's not that simple... > A lot of tests are crashing in GHA and ASAN reports use-after-free > on flow disassociation if the dpif already destroyed: > https://github.com/ovsrobot/ovs/actions/runs/1754866321 > > I think that the problem is that offload thread holds the ref count > for PMD thread, but PMD thread structure doesn't hold the ref count > for the dp, because it doesn't expect that dp_netdev_pmd structure > will be used while PMD thread is destroyed. I guess, we should fix > that. OTOH, I'm not sure if offload thread actually needs a reference > to dp_netdev_pmd structure. If I didn't miss something, it only > uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item > should hold and ref the dp pointer instead? > > What do you think? Gaetan? > Hi Ilya, I've been looking into this issue, you are right that the offload threads don't actually need the pmd pointer, only the dp one. The problem I see with a proper reference taking on dp to protect against this UAF, is that it is prohibitively slow to manage those dp references, and it relies on a global lock. I have tried to find an alternative (the flush op could be tagged to mark that it was issued during a dp_netdev_free), but I always end-up with a race-condition, even if the window is reduced. >>> Hmm. Is it with flow API enabled or disabled? >>> With flow api enabled we should have a flush operation that should >>> in theory wait for all the previous offloading operations to be >>> completed. If it's disabled though, flow deletion will initialize >>> the offload thread and enqueue the operation while flush will not >>> actually be requested. >> >> Yes, on your second point, with the API disabled the deletion path relied on >> the test flow->mark != INVALID to know whether a flow was offload or not, >> and did not >> care then to check the API status. >> The API state should now be checked as well for the deletion path. >> >>> >>> I missed before that we have flush on that path. Maybe it's enough >>> to just add a !netdev_is_flow_api_enabled() check to the deletion path? >>> Or am I missing something else? >>> >> >> After the flush, there will be remaining ops in the queue that would trigger >> the UAF. > > What are these ops? > Any add/mod/del can be found there. Flushes are scheduled by the main thread in response to user commands. While it executes, PMDs et revalidators are running, potentially enqueuing offload ops. The only restriction on those ops is that it's impossible to have multiple flush in the queue: only one can exist at a time. -- Gaetan Rivet ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 2/1/22 12:38, Gaëtan Rivet wrote: > On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: >> On 1/31/22 11:42, Gaëtan Rivet wrote: >>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > offloaded flow, if a mark has been assigned to the flow. But if > this occurs in the window in which the offload thread completes > offloading the flow and assigns a mark to the flow, then we miss > deleting the flow. This problem has been observed while adding > and deleting flows in a loop. To fix this, always enqueue flow > deletion regardless of the flow->mark being set. > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > Signed-off-by: Sriharsha Basavapatna > --- > lib/dpif-netdev.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index e28e0b554..22c5f19a8 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct > dp_netdev_pmd_thread *pmd, > dp_netdev_simple_match_remove(pmd, flow); > cmap_remove(&pmd->flow_table, node, > dp_netdev_flow_hash(&flow->ufid)); > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > -if (flow->mark != INVALID_FLOW_MARK) { > -queue_netdev_flow_del(pmd, flow); > -} > +queue_netdev_flow_del(pmd, flow); > flow->dead = true; > > dp_netdev_flow_unref(flow); > Thanks for the patch, but it looks like that it's not that simple... A lot of tests are crashing in GHA and ASAN reports use-after-free on flow disassociation if the dpif already destroyed: https://github.com/ovsrobot/ovs/actions/runs/1754866321 I think that the problem is that offload thread holds the ref count for PMD thread, but PMD thread structure doesn't hold the ref count for the dp, because it doesn't expect that dp_netdev_pmd structure will be used while PMD thread is destroyed. I guess, we should fix that. OTOH, I'm not sure if offload thread actually needs a reference to dp_netdev_pmd structure. If I didn't miss something, it only uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item should hold and ref the dp pointer instead? What do you think? Gaetan? >>> >>> Hi Ilya, >>> >>> I've been looking into this issue, you are right that the offload threads >>> don't >>> actually need the pmd pointer, only the dp one. >>> >>> The problem I see with a proper reference taking on dp to protect against >>> this >>> UAF, is that it is prohibitively slow to manage those dp references, and it >>> relies on >>> a global lock. >>> >>> I have tried to find an alternative (the flush op could be tagged to mark >>> that >>> it was issued during a dp_netdev_free), but I always end-up with a >>> race-condition, >>> even if the window is reduced. >> Hmm. Is it with flow API enabled or disabled? >> With flow api enabled we should have a flush operation that should >> in theory wait for all the previous offloading operations to be >> completed. If it's disabled though, flow deletion will initialize >> the offload thread and enqueue the operation while flush will not >> actually be requested. > > Yes, on your second point, with the API disabled the deletion path relied on > the test flow->mark != INVALID to know whether a flow was offload or not, and > did not > care then to check the API status. > The API state should now be checked as well for the deletion path. > >> >> I missed before that we have flush on that path. Maybe it's enough >> to just add a !netdev_is_flow_api_enabled() check to the deletion path? >> Or am I missing something else? >> > > After the flush, there will be remaining ops in the queue that would trigger > the UAF. What are these ops? > The api_enabled() check on del emission is not sufficient. > During the flush, with a special tag telling that it is part of deleting the > whole > datapath, the API could be marked disabled, and the api_enabled() could be > checked again in the offload thread, before any dereference of dp. > > I didn't consider it as the API was not meant to be disabled then possible > re-enabled > after a datapath re-creation potentially. That could work, with some changes > on > the api_enabled() flag. The current flag should hold the user intent as set > in the config, > a second flag should hold the API liveness as set related to the dp_netdev > status. > That second flag would be the one read during the checks, and written on open > and after > a flush, before getting past the barrier. All offload threads should just > discard > offload requests if this liveness flag is false. > ___ dev mailing
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote: > On 1/31/22 11:42, Gaëtan Rivet wrote: >> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: >>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: In dp_netdev_pmd_remove_flow() we schedule the deletion of an offloaded flow, if a mark has been assigned to the flow. But if this occurs in the window in which the offload thread completes offloading the flow and assigns a mark to the flow, then we miss deleting the flow. This problem has been observed while adding and deleting flows in a loop. To fix this, always enqueue flow deletion regardless of the flow->mark being set. Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") Signed-off-by: Sriharsha Basavapatna --- lib/dpif-netdev.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e28e0b554..22c5f19a8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, dp_netdev_simple_match_remove(pmd, flow); cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); -if (flow->mark != INVALID_FLOW_MARK) { -queue_netdev_flow_del(pmd, flow); -} +queue_netdev_flow_del(pmd, flow); flow->dead = true; dp_netdev_flow_unref(flow); >>> >>> Thanks for the patch, but it looks like that it's not that simple... >>> A lot of tests are crashing in GHA and ASAN reports use-after-free >>> on flow disassociation if the dpif already destroyed: >>> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >>> >>> I think that the problem is that offload thread holds the ref count >>> for PMD thread, but PMD thread structure doesn't hold the ref count >>> for the dp, because it doesn't expect that dp_netdev_pmd structure >>> will be used while PMD thread is destroyed. I guess, we should fix >>> that. OTOH, I'm not sure if offload thread actually needs a reference >>> to dp_netdev_pmd structure. If I didn't miss something, it only >>> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item >>> should hold and ref the dp pointer instead? >>> >>> What do you think? Gaetan? >>> >> >> Hi Ilya, >> >> I've been looking into this issue, you are right that the offload threads >> don't >> actually need the pmd pointer, only the dp one. >> >> The problem I see with a proper reference taking on dp to protect against >> this >> UAF, is that it is prohibitively slow to manage those dp references, and it >> relies on >> a global lock. >> >> I have tried to find an alternative (the flush op could be tagged to mark >> that >> it was issued during a dp_netdev_free), but I always end-up with a >> race-condition, >> even if the window is reduced. > Hmm. Is it with flow API enabled or disabled? > With flow api enabled we should have a flush operation that should > in theory wait for all the previous offloading operations to be > completed. If it's disabled though, flow deletion will initialize > the offload thread and enqueue the operation while flush will not > actually be requested. Yes, on your second point, with the API disabled the deletion path relied on the test flow->mark != INVALID to know whether a flow was offload or not, and did not care then to check the API status. The API state should now be checked as well for the deletion path. > > I missed before that we have flush on that path. Maybe it's enough > to just add a !netdev_is_flow_api_enabled() check to the deletion path? > Or am I missing something else? > After the flush, there will be remaining ops in the queue that would trigger the UAF. The api_enabled() check on del emission is not sufficient. During the flush, with a special tag telling that it is part of deleting the whole datapath, the API could be marked disabled, and the api_enabled() could be checked again in the offload thread, before any dereference of dp. I didn't consider it as the API was not meant to be disabled then possible re-enabled after a datapath re-creation potentially. That could work, with some changes on the api_enabled() flag. The current flag should hold the user intent as set in the config, a second flag should hold the API liveness as set related to the dp_netdev status. That second flag would be the one read during the checks, and written on open and after a flush, before getting past the barrier. All offload threads should just discard offload requests if this liveness flag is false. -- Gaetan Rivet ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 1/31/22 11:42, Gaëtan Rivet wrote: > On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: >> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >>> offloaded flow, if a mark has been assigned to the flow. But if >>> this occurs in the window in which the offload thread completes >>> offloading the flow and assigns a mark to the flow, then we miss >>> deleting the flow. This problem has been observed while adding >>> and deleting flows in a loop. To fix this, always enqueue flow >>> deletion regardless of the flow->mark being set. >>> >>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >>> Signed-off-by: Sriharsha Basavapatna >>> --- >>> lib/dpif-netdev.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index e28e0b554..22c5f19a8 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread >>> *pmd, >>> dp_netdev_simple_match_remove(pmd, flow); >>> cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); >>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >>> -if (flow->mark != INVALID_FLOW_MARK) { >>> -queue_netdev_flow_del(pmd, flow); >>> -} >>> +queue_netdev_flow_del(pmd, flow); >>> flow->dead = true; >>> >>> dp_netdev_flow_unref(flow); >>> >> >> Thanks for the patch, but it looks like that it's not that simple... >> A lot of tests are crashing in GHA and ASAN reports use-after-free >> on flow disassociation if the dpif already destroyed: >> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >> >> I think that the problem is that offload thread holds the ref count >> for PMD thread, but PMD thread structure doesn't hold the ref count >> for the dp, because it doesn't expect that dp_netdev_pmd structure >> will be used while PMD thread is destroyed. I guess, we should fix >> that. OTOH, I'm not sure if offload thread actually needs a reference >> to dp_netdev_pmd structure. If I didn't miss something, it only >> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item >> should hold and ref the dp pointer instead? >> >> What do you think? Gaetan? >> > > Hi Ilya, > > I've been looking into this issue, you are right that the offload threads > don't > actually need the pmd pointer, only the dp one. > > The problem I see with a proper reference taking on dp to protect against this > UAF, is that it is prohibitively slow to manage those dp references, and it > relies on > a global lock. > > I have tried to find an alternative (the flush op could be tagged to mark that > it was issued during a dp_netdev_free), but I always end-up with a > race-condition, > even if the window is reduced. Hmm. Is it with flow API enabled or disabled? With flow api enabled we should have a flush operation that should in theory wait for all the previous offloading operations to be completed. If it's disabled though, flow deletion will initialize the offload thread and enqueue the operation while flush will not actually be requested. I missed before that we have flush on that path. Maybe it's enough to just add a !netdev_is_flow_api_enabled() check to the deletion path? Or am I missing something else? > > So given that the refcounting is too slow but necessary, I think the only > practical > solution is to rewrite it to be fast enough. I will send an RFC to that end, > let > me know if you think of something better. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote: > On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >> offloaded flow, if a mark has been assigned to the flow. But if >> this occurs in the window in which the offload thread completes >> offloading the flow and assigns a mark to the flow, then we miss >> deleting the flow. This problem has been observed while adding >> and deleting flows in a loop. To fix this, always enqueue flow >> deletion regardless of the flow->mark being set. >> >> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >> Signed-off-by: Sriharsha Basavapatna >> --- >> lib/dpif-netdev.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index e28e0b554..22c5f19a8 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread >> *pmd, >> dp_netdev_simple_match_remove(pmd, flow); >> cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); >> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >> -if (flow->mark != INVALID_FLOW_MARK) { >> -queue_netdev_flow_del(pmd, flow); >> -} >> +queue_netdev_flow_del(pmd, flow); >> flow->dead = true; >> >> dp_netdev_flow_unref(flow); >> > > Thanks for the patch, but it looks like that it's not that simple... > A lot of tests are crashing in GHA and ASAN reports use-after-free > on flow disassociation if the dpif already destroyed: > https://github.com/ovsrobot/ovs/actions/runs/1754866321 > > I think that the problem is that offload thread holds the ref count > for PMD thread, but PMD thread structure doesn't hold the ref count > for the dp, because it doesn't expect that dp_netdev_pmd structure > will be used while PMD thread is destroyed. I guess, we should fix > that. OTOH, I'm not sure if offload thread actually needs a reference > to dp_netdev_pmd structure. If I didn't miss something, it only > uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item > should hold and ref the dp pointer instead? > > What do you think? Gaetan? > Hi Ilya, I've been looking into this issue, you are right that the offload threads don't actually need the pmd pointer, only the dp one. The problem I see with a proper reference taking on dp to protect against this UAF, is that it is prohibitively slow to manage those dp references, and it relies on a global lock. I have tried to find an alternative (the flush op could be tagged to mark that it was issued during a dp_netdev_free), but I always end-up with a race-condition, even if the window is reduced. So given that the refcounting is too slow but necessary, I think the only practical solution is to rewrite it to be fast enough. I will send an RFC to that end, let me know if you think of something better. -- Gaetan Rivet ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 1/27/22 13:09, Sriharsha Basavapatna wrote: > On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets wrote: >> >> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: >>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an >>> offloaded flow, if a mark has been assigned to the flow. But if >>> this occurs in the window in which the offload thread completes >>> offloading the flow and assigns a mark to the flow, then we miss >>> deleting the flow. This problem has been observed while adding >>> and deleting flows in a loop. To fix this, always enqueue flow >>> deletion regardless of the flow->mark being set. >>> >>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") >>> Signed-off-by: Sriharsha Basavapatna >>> --- >>> lib/dpif-netdev.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index e28e0b554..22c5f19a8 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread >>> *pmd, >>> dp_netdev_simple_match_remove(pmd, flow); >>> cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); >>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); >>> -if (flow->mark != INVALID_FLOW_MARK) { >>> -queue_netdev_flow_del(pmd, flow); >>> -} >>> +queue_netdev_flow_del(pmd, flow); >>> flow->dead = true; >>> >>> dp_netdev_flow_unref(flow); >>> >> >> Thanks for the patch, but it looks like that it's not that simple... >> A lot of tests are crashing in GHA and ASAN reports use-after-free >> on flow disassociation if the dpif already destroyed: >> https://github.com/ovsrobot/ovs/actions/runs/1754866321 >> >> I think that the problem is that offload thread holds the ref count >> for PMD thread, but PMD thread structure doesn't hold the ref count >> for the dp, because it doesn't expect that dp_netdev_pmd structure >> will be used while PMD thread is destroyed. I guess, we should fix >> that. OTOH, I'm not sure if offload thread actually needs a reference >> to dp_netdev_pmd structure. If I didn't miss something, it only >> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item >> should hold and ref the dp pointer instead? >> >> What do you think? Gaetan? >> >> Another point is that queue_netdev_flow_del() should check for >> netdev_is_flow_api_enabled() to avoid creation of offload threads >> if offloading is disabled. But that's good that we didn't have it, >> as the refcount issue got exposed. Otherwise it would be hard >> to reproduce. >> >> Best regards, Ilya Maximets. >> >> Asan report below, for convenience: >> >> = >> ==17076==ERROR: AddressSanitizer: heap-use-after-free on address >> 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8 >> READ of size 8 at 0x61603080 thread T5 (hw_offload4) >> #0 0x5e0e37 in mark_to_flow_disassociate >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62 >> #1 0x5dfaf1 in dp_offload_flow >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15 >> #2 0x5df8bd in dp_netdev_flow_offload_main >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17 >> #3 0x73a2fc in ovsthread_wrapper >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12 >> #4 0x7fe0619506da in start_thread >> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) >> #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) >> >> 0x61603080 is located 0 bytes inside of 576-byte region >> [0x61603080,0x616032c0) >> freed by thread T0 here: >> #0 0x49640d in free >> (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d) >> #1 0x5d6652 in dp_netdev_free >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5 >> #2 0x5f0722 in dp_netdev_unref >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13 >> #3 0x5cf5e5 in dpif_netdev_close >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5 >> #4 0x5fc393 in dpif_uninit >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9 >> #5 0x5fc0c0 in dpif_close >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9 >> #6 0x52a972 in close_dpif_backer >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5 >> #7 0x517f08 in destruct >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5 >> #8 0x4f09e0 in ofproto_destroy >> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5 >> #9 0x4c71e4 in bridge_destroy
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets wrote: > > On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: > > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > > offloaded flow, if a mark has been assigned to the flow. But if > > this occurs in the window in which the offload thread completes > > offloading the flow and assigns a mark to the flow, then we miss > > deleting the flow. This problem has been observed while adding > > and deleting flows in a loop. To fix this, always enqueue flow > > deletion regardless of the flow->mark being set. > > > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > > Signed-off-by: Sriharsha Basavapatna > > --- > > lib/dpif-netdev.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index e28e0b554..22c5f19a8 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread > > *pmd, > > dp_netdev_simple_match_remove(pmd, flow); > > cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); > > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > > -if (flow->mark != INVALID_FLOW_MARK) { > > -queue_netdev_flow_del(pmd, flow); > > -} > > +queue_netdev_flow_del(pmd, flow); > > flow->dead = true; > > > > dp_netdev_flow_unref(flow); > > > > Thanks for the patch, but it looks like that it's not that simple... > A lot of tests are crashing in GHA and ASAN reports use-after-free > on flow disassociation if the dpif already destroyed: > https://github.com/ovsrobot/ovs/actions/runs/1754866321 > > I think that the problem is that offload thread holds the ref count > for PMD thread, but PMD thread structure doesn't hold the ref count > for the dp, because it doesn't expect that dp_netdev_pmd structure > will be used while PMD thread is destroyed. I guess, we should fix > that. OTOH, I'm not sure if offload thread actually needs a reference > to dp_netdev_pmd structure. If I didn't miss something, it only > uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item > should hold and ref the dp pointer instead? > > What do you think? Gaetan? > > Another point is that queue_netdev_flow_del() should check for > netdev_is_flow_api_enabled() to avoid creation of offload threads > if offloading is disabled. But that's good that we didn't have it, > as the refcount issue got exposed. Otherwise it would be hard > to reproduce. > > Best regards, Ilya Maximets. > > Asan report below, for convenience: > > = > ==17076==ERROR: AddressSanitizer: heap-use-after-free on address > 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8 > READ of size 8 at 0x61603080 thread T5 (hw_offload4) > #0 0x5e0e37 in mark_to_flow_disassociate > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62 > #1 0x5dfaf1 in dp_offload_flow > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15 > #2 0x5df8bd in dp_netdev_flow_offload_main > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17 > #3 0x73a2fc in ovsthread_wrapper > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12 > #4 0x7fe0619506da in start_thread > (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) > #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) > > 0x61603080 is located 0 bytes inside of 576-byte region > [0x61603080,0x616032c0) > freed by thread T0 here: > #0 0x49640d in free > (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d) > #1 0x5d6652 in dp_netdev_free > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5 > #2 0x5f0722 in dp_netdev_unref > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13 > #3 0x5cf5e5 in dpif_netdev_close > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5 > #4 0x5fc393 in dpif_uninit > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9 > #5 0x5fc0c0 in dpif_close > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9 > #6 0x52a972 in close_dpif_backer > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5 > #7 0x517f08 in destruct > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5 > #8 0x4f09e0 in ofproto_destroy > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5 > #9 0x4c71e4 in bridge_destroy > /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3608:9 > #10 0x4c6f4a in bri
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote: > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > offloaded flow, if a mark has been assigned to the flow. But if > this occurs in the window in which the offload thread completes > offloading the flow and assigns a mark to the flow, then we miss > deleting the flow. This problem has been observed while adding > and deleting flows in a loop. To fix this, always enqueue flow > deletion regardless of the flow->mark being set. > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > Signed-off-by: Sriharsha Basavapatna > --- > lib/dpif-netdev.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index e28e0b554..22c5f19a8 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread > *pmd, > dp_netdev_simple_match_remove(pmd, flow); > cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > -if (flow->mark != INVALID_FLOW_MARK) { > -queue_netdev_flow_del(pmd, flow); > -} > +queue_netdev_flow_del(pmd, flow); > flow->dead = true; > > dp_netdev_flow_unref(flow); > Thanks for the patch, but it looks like that it's not that simple... A lot of tests are crashing in GHA and ASAN reports use-after-free on flow disassociation if the dpif already destroyed: https://github.com/ovsrobot/ovs/actions/runs/1754866321 I think that the problem is that offload thread holds the ref count for PMD thread, but PMD thread structure doesn't hold the ref count for the dp, because it doesn't expect that dp_netdev_pmd structure will be used while PMD thread is destroyed. I guess, we should fix that. OTOH, I'm not sure if offload thread actually needs a reference to dp_netdev_pmd structure. If I didn't miss something, it only uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item should hold and ref the dp pointer instead? What do you think? Gaetan? Another point is that queue_netdev_flow_del() should check for netdev_is_flow_api_enabled() to avoid creation of offload threads if offloading is disabled. But that's good that we didn't have it, as the refcount issue got exposed. Otherwise it would be hard to reproduce. Best regards, Ilya Maximets. Asan report below, for convenience: = ==17076==ERROR: AddressSanitizer: heap-use-after-free on address 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8 READ of size 8 at 0x61603080 thread T5 (hw_offload4) #0 0x5e0e37 in mark_to_flow_disassociate /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62 #1 0x5dfaf1 in dp_offload_flow /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15 #2 0x5df8bd in dp_netdev_flow_offload_main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17 #3 0x73a2fc in ovsthread_wrapper /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12 #4 0x7fe0619506da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) 0x61603080 is located 0 bytes inside of 576-byte region [0x61603080,0x616032c0) freed by thread T0 here: #0 0x49640d in free (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d) #1 0x5d6652 in dp_netdev_free /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5 #2 0x5f0722 in dp_netdev_unref /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13 #3 0x5cf5e5 in dpif_netdev_close /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5 #4 0x5fc393 in dpif_uninit /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9 #5 0x5fc0c0 in dpif_close /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9 #6 0x52a972 in close_dpif_backer /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5 #7 0x517f08 in destruct /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5 #8 0x4f09e0 in ofproto_destroy /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5 #9 0x4c71e4 in bridge_destroy /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3608:9 #10 0x4c6f4a in bridge_exit /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:553:9 #11 0x4e109a in main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/ovs-vswitchd.c:146:5 #12 0x7fe060dcfbf6 in
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Thu, Jan 27, 2022 at 2:45 PM Gaëtan Rivet wrote: > > On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote: > > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > > offloaded flow, if a mark has been assigned to the flow. But if > > this occurs in the window in which the offload thread completes > > offloading the flow and assigns a mark to the flow, then we miss > > deleting the flow. This problem has been observed while adding > > and deleting flows in a loop. To fix this, always enqueue flow > > deletion regardless of the flow->mark being set. > > > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > > Signed-off-by: Sriharsha Basavapatna > > --- > > lib/dpif-netdev.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index e28e0b554..22c5f19a8 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct > > dp_netdev_pmd_thread *pmd, > > dp_netdev_simple_match_remove(pmd, flow); > > cmap_remove(&pmd->flow_table, node, > > dp_netdev_flow_hash(&flow->ufid)); > > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > > -if (flow->mark != INVALID_FLOW_MARK) { > > -queue_netdev_flow_del(pmd, flow); > > -} > > +queue_netdev_flow_del(pmd, flow); > > Hi Sriharsha, > > It makes sense, thanks for the patch. > Additionally, the mark is being written in a thread and read in another but > is not atomic. Without fence, coherence might take time, making the described > issue more likely on some archs. Right, good point. > > Acked-by: Gaetan Rivet Thanks Gaetan. -Harsha > > -- > Gaetan Rivet -- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote: > In dp_netdev_pmd_remove_flow() we schedule the deletion of an > offloaded flow, if a mark has been assigned to the flow. But if > this occurs in the window in which the offload thread completes > offloading the flow and assigns a mark to the flow, then we miss > deleting the flow. This problem has been observed while adding > and deleting flows in a loop. To fix this, always enqueue flow > deletion regardless of the flow->mark being set. > > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") > Signed-off-by: Sriharsha Basavapatna > --- > lib/dpif-netdev.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index e28e0b554..22c5f19a8 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct > dp_netdev_pmd_thread *pmd, > dp_netdev_simple_match_remove(pmd, flow); > cmap_remove(&pmd->flow_table, node, > dp_netdev_flow_hash(&flow->ufid)); > ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); > -if (flow->mark != INVALID_FLOW_MARK) { > -queue_netdev_flow_del(pmd, flow); > -} > +queue_netdev_flow_del(pmd, flow); Hi Sriharsha, It makes sense, thanks for the patch. Additionally, the mark is being written in a thread and read in another but is not atomic. Without fence, coherence might take time, making the described issue more likely on some archs. Acked-by: Gaetan Rivet -- Gaetan Rivet ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows
In dp_netdev_pmd_remove_flow() we schedule the deletion of an offloaded flow, if a mark has been assigned to the flow. But if this occurs in the window in which the offload thread completes offloading the flow and assigns a mark to the flow, then we miss deleting the flow. This problem has been observed while adding and deleting flows in a loop. To fix this, always enqueue flow deletion regardless of the flow->mark being set. Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id") Signed-off-by: Sriharsha Basavapatna --- lib/dpif-netdev.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e28e0b554..22c5f19a8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, dp_netdev_simple_match_remove(pmd, flow); cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); ccmap_dec(&pmd->n_flows, odp_to_u32(in_port)); -if (flow->mark != INVALID_FLOW_MARK) { -queue_netdev_flow_del(pmd, flow); -} +queue_netdev_flow_del(pmd, flow); flow->dead = true; dp_netdev_flow_unref(flow); -- 2.30.0.349.g30b29f044a -- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev