Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-02 Thread Ilya Maximets
On 2/2/22 18:19, Sriharsha Basavapatna wrote:
> On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets  wrote:
>>
>> On 2/1/22 16:07, Gaëtan Rivet wrote:
>>> On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
 On 2/1/22 14:38, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>> On 2/1/22 12:38, Gaëtan Rivet wrote:
>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
 On 1/31/22 11:42, Gaëtan Rivet wrote:
> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna 
>>> 
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
>>> dp_netdev_pmd_thread *pmd,
>>>  dp_netdev_simple_match_remove(pmd, flow);
>>>  cmap_remove(&pmd->flow_table, node, 
>>> dp_netdev_flow_hash(&flow->ufid));
>>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -if (flow->mark != INVALID_FLOW_MARK) {
>>> -queue_netdev_flow_del(pmd, flow);
>>> -}
>>> +queue_netdev_flow_del(pmd, flow);
>>>  flow->dead = true;
>>>
>>>  dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a 
>> reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, 
>> dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
>
> Hi Ilya,
>
> I've been looking into this issue, you are right that the offload 
> threads don't
> actually need the pmd pointer, only the dp one.
>
> The problem I see with a proper reference taking on dp to protect 
> against this
> UAF, is that it is prohibitively slow to manage those dp references, 
> and it relies on
> a global lock.
>
> I have tried to find an alternative (the flush op could be tagged to 
> mark that
> it was issued during a dp_netdev_free), but I always end-up with a 
> race-condition,
> even if the window is reduced.
 Hmm.  Is it with flow API enabled or disabled?
 With flow api enabled we should have a flush operation that should
 in theory wait for all the previous offloading operations to be
 completed.  If it's disabled though, flow deletion will initialize
 the offload thread and enqueue the operation while flush will not
 actually be requested.
>>>
>>> Yes, on your second point, with the API disabled the deletion path 
>>> relied on
>>> the test flow->mark != INVALID to know whether a flow was offload or 
>>> not, and did not
>>> care then to check the API status.
>>> The API state should now be checked as well for the deletion path.
>>>

 I missed before that we have flush on that path.  Maybe it's enough
 to just add a !netdev_is_flow_api_enabled() check to the deletion path?
 Or am I missing something else?

>>>
>>> After the flush, there will be remaining ops in the queue that would 
>>> trigger the UAF.
>>
>> What are these ops?
>>
>
> Any add/mod/del can b

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-02 Thread Sriharsha Basavapatna via dev
On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets  wrote:
>
> On 2/1/22 16:07, Gaëtan Rivet wrote:
> > On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
> >> On 2/1/22 14:38, Gaëtan Rivet wrote:
> >>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>  On 2/1/22 12:38, Gaëtan Rivet wrote:
> > On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> >> On 1/31/22 11:42, Gaëtan Rivet wrote:
> >>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>  On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna 
> > 
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
> > dp_netdev_pmd_thread *pmd,
> >  dp_netdev_simple_match_remove(pmd, flow);
> >  cmap_remove(&pmd->flow_table, node, 
> > dp_netdev_flow_hash(&flow->ufid));
> >  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -if (flow->mark != INVALID_FLOW_MARK) {
> > -queue_netdev_flow_del(pmd, flow);
> > -}
> > +queue_netdev_flow_del(pmd, flow);
> >  flow->dead = true;
> >
> >  dp_netdev_flow_unref(flow);
> >
> 
>  Thanks for the patch, but it looks like that it's not that simple...
>  A lot of tests are crashing in GHA and ASAN reports use-after-free
>  on flow disassociation if the dpif already destroyed:
>    https://github.com/ovsrobot/ovs/actions/runs/1754866321
> 
>  I think that the problem is that offload thread holds the ref count
>  for PMD thread, but PMD thread structure doesn't hold the ref count
>  for the dp, because it doesn't expect that dp_netdev_pmd structure
>  will be used while PMD thread is destroyed.  I guess, we should fix
>  that.  OTOH, I'm not sure if offload thread actually needs a 
>  reference
>  to dp_netdev_pmd structure.  If I didn't miss something, it only
>  uses pmd pointer to access pmd->dp.  So, maybe, 
>  dp_offload_thread_item
>  should hold and ref the dp pointer instead?
> 
>  What do you think?  Gaetan?
> 
> >>>
> >>> Hi Ilya,
> >>>
> >>> I've been looking into this issue, you are right that the offload 
> >>> threads don't
> >>> actually need the pmd pointer, only the dp one.
> >>>
> >>> The problem I see with a proper reference taking on dp to protect 
> >>> against this
> >>> UAF, is that it is prohibitively slow to manage those dp references, 
> >>> and it relies on
> >>> a global lock.
> >>>
> >>> I have tried to find an alternative (the flush op could be tagged to 
> >>> mark that
> >>> it was issued during a dp_netdev_free), but I always end-up with a 
> >>> race-condition,
> >>> even if the window is reduced.
> >> Hmm.  Is it with flow API enabled or disabled?
> >> With flow api enabled we should have a flush operation that should
> >> in theory wait for all the previous offloading operations to be
> >> completed.  If it's disabled though, flow deletion will initialize
> >> the offload thread and enqueue the operation while flush will not
> >> actually be requested.
> >
> > Yes, on your second point, with the API disabled the deletion path 
> > relied on
> > the test flow->mark != INVALID to know whether a flow was offload or 
> > not, and did not
> > care then to check the API status.
> > The API state should now be checked as well for the deletion path.
> >
> >>
> >> I missed before that we have flush on that path.  Maybe it's enough
> >> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
> >> Or am I missing something else?
> >>
> >
> > After the flush, there will be remaining ops in the queue that would 
> > trigger the UAF.
> 
>  What are these ops?
> 
> >>>
> >>> Any add/mod/del can be found there.
> >>>
> >>> Flushes are scheduled b

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-02 Thread Ilya Maximets
On 2/1/22 16:07, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
>> On 2/1/22 14:38, Gaëtan Rivet wrote:
>>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
 On 2/1/22 12:38, Gaëtan Rivet wrote:
> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
 On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
>
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna 
> 
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
> dp_netdev_pmd_thread *pmd,
>  dp_netdev_simple_match_remove(pmd, flow);
>  cmap_remove(&pmd->flow_table, node, 
> dp_netdev_flow_hash(&flow->ufid));
>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -if (flow->mark != INVALID_FLOW_MARK) {
> -queue_netdev_flow_del(pmd, flow);
> -}
> +queue_netdev_flow_del(pmd, flow);
>  flow->dead = true;
>  
>  dp_netdev_flow_unref(flow);
>

 Thanks for the patch, but it looks like that it's not that simple...
 A lot of tests are crashing in GHA and ASAN reports use-after-free
 on flow disassociation if the dpif already destroyed:
   https://github.com/ovsrobot/ovs/actions/runs/1754866321

 I think that the problem is that offload thread holds the ref count
 for PMD thread, but PMD thread structure doesn't hold the ref count
 for the dp, because it doesn't expect that dp_netdev_pmd structure
 will be used while PMD thread is destroyed.  I guess, we should fix
 that.  OTOH, I'm not sure if offload thread actually needs a reference
 to dp_netdev_pmd structure.  If I didn't miss something, it only
 uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
 should hold and ref the dp pointer instead?

 What do you think?  Gaetan?

>>>
>>> Hi Ilya,
>>>
>>> I've been looking into this issue, you are right that the offload 
>>> threads don't
>>> actually need the pmd pointer, only the dp one.
>>>
>>> The problem I see with a proper reference taking on dp to protect 
>>> against this
>>> UAF, is that it is prohibitively slow to manage those dp references, 
>>> and it relies on
>>> a global lock.
>>>
>>> I have tried to find an alternative (the flush op could be tagged to 
>>> mark that
>>> it was issued during a dp_netdev_free), but I always end-up with a 
>>> race-condition,
>>> even if the window is reduced.
>> Hmm.  Is it with flow API enabled or disabled?
>> With flow api enabled we should have a flush operation that should
>> in theory wait for all the previous offloading operations to be
>> completed.  If it's disabled though, flow deletion will initialize
>> the offload thread and enqueue the operation while flush will not
>> actually be requested.
>
> Yes, on your second point, with the API disabled the deletion path relied 
> on
> the test flow->mark != INVALID to know whether a flow was offload or not, 
> and did not
> care then to check the API status.
> The API state should now be checked as well for the deletion path.
>
>>
>> I missed before that we have flush on that path.  Maybe it's enough
>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>> Or am I missing something else?
>>
>
> After the flush, there will be remaining ops in the queue that would 
> trigger the UAF.

 What are these ops?

>>>
>>> Any add/mod/del can be found there.
>>>
>>> Flushes are scheduled by the main thread in response to user commands.
>>> While it executes, PMDs et revalidators are running, potentially enqueuing
>>> offload ops. The only restriction on those ops is that it's impossible to 
>>> have multiple
>>> flush in the queue: only one can exist at a time.
>>>
>>
>> I

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-01 Thread Gaëtan Rivet
On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
> On 2/1/22 14:38, Gaëtan Rivet wrote:
>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>>> On 2/1/22 12:38, Gaëtan Rivet wrote:
 On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> On 1/31/22 11:42, Gaëtan Rivet wrote:
>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
 In dp_netdev_pmd_remove_flow() we schedule the deletion of an
 offloaded flow, if a mark has been assigned to the flow. But if
 this occurs in the window in which the offload thread completes
 offloading the flow and assigns a mark to the flow, then we miss
 deleting the flow. This problem has been observed while adding
 and deleting flows in a loop. To fix this, always enqueue flow
 deletion regardless of the flow->mark being set.

 Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
 Signed-off-by: Sriharsha Basavapatna 
 
 ---
  lib/dpif-netdev.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

 diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
 index e28e0b554..22c5f19a8 100644
 --- a/lib/dpif-netdev.c
 +++ b/lib/dpif-netdev.c
 @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
 dp_netdev_pmd_thread *pmd,
  dp_netdev_simple_match_remove(pmd, flow);
  cmap_remove(&pmd->flow_table, node, 
 dp_netdev_flow_hash(&flow->ufid));
  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
 -if (flow->mark != INVALID_FLOW_MARK) {
 -queue_netdev_flow_del(pmd, flow);
 -}
 +queue_netdev_flow_del(pmd, flow);
  flow->dead = true;
  
  dp_netdev_flow_unref(flow);

>>>
>>> Thanks for the patch, but it looks like that it's not that simple...
>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>> on flow disassociation if the dpif already destroyed:
>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>
>>> I think that the problem is that offload thread holds the ref count
>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>> should hold and ref the dp pointer instead?
>>>
>>> What do you think?  Gaetan?
>>>
>>
>> Hi Ilya,
>>
>> I've been looking into this issue, you are right that the offload 
>> threads don't
>> actually need the pmd pointer, only the dp one.
>>
>> The problem I see with a proper reference taking on dp to protect 
>> against this
>> UAF, is that it is prohibitively slow to manage those dp references, and 
>> it relies on
>> a global lock.
>>
>> I have tried to find an alternative (the flush op could be tagged to 
>> mark that
>> it was issued during a dp_netdev_free), but I always end-up with a 
>> race-condition,
>> even if the window is reduced.
> Hmm.  Is it with flow API enabled or disabled?
> With flow api enabled we should have a flush operation that should
> in theory wait for all the previous offloading operations to be
> completed.  If it's disabled though, flow deletion will initialize
> the offload thread and enqueue the operation while flush will not
> actually be requested.

 Yes, on your second point, with the API disabled the deletion path relied 
 on
 the test flow->mark != INVALID to know whether a flow was offload or not, 
 and did not
 care then to check the API status.
 The API state should now be checked as well for the deletion path.

>
> I missed before that we have flush on that path.  Maybe it's enough
> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
> Or am I missing something else?
>

 After the flush, there will be remaining ops in the queue that would 
 trigger the UAF.
>>>
>>> What are these ops?
>>>
>> 
>> Any add/mod/del can be found there.
>> 
>> Flushes are scheduled by the main thread in response to user commands.
>> While it executes, PMDs et revalidators are running, potentially enqueuing
>> offload ops. The only restriction on those ops is that it's impossible to 
>> have multiple
>> flush in the queue: only one can exist at a time.
>> 
>
> I think, revalidators should already be stopped at the point of datapath
> deletion.  Don't they?
>
> For the PMDs, I see that they are still runnin

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-01 Thread Ilya Maximets
On 2/1/22 14:38, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>> On 2/1/22 12:38, Gaëtan Rivet wrote:
>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
 On 1/31/22 11:42, Gaëtan Rivet wrote:
> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna 
>>> 
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
>>> dp_netdev_pmd_thread *pmd,
>>>  dp_netdev_simple_match_remove(pmd, flow);
>>>  cmap_remove(&pmd->flow_table, node, 
>>> dp_netdev_flow_hash(&flow->ufid));
>>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -if (flow->mark != INVALID_FLOW_MARK) {
>>> -queue_netdev_flow_del(pmd, flow);
>>> -}
>>> +queue_netdev_flow_del(pmd, flow);
>>>  flow->dead = true;
>>>  
>>>  dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
>
> Hi Ilya,
>
> I've been looking into this issue, you are right that the offload threads 
> don't
> actually need the pmd pointer, only the dp one.
>
> The problem I see with a proper reference taking on dp to protect against 
> this
> UAF, is that it is prohibitively slow to manage those dp references, and 
> it relies on
> a global lock.
>
> I have tried to find an alternative (the flush op could be tagged to mark 
> that
> it was issued during a dp_netdev_free), but I always end-up with a 
> race-condition,
> even if the window is reduced.
 Hmm.  Is it with flow API enabled or disabled?
 With flow api enabled we should have a flush operation that should
 in theory wait for all the previous offloading operations to be
 completed.  If it's disabled though, flow deletion will initialize
 the offload thread and enqueue the operation while flush will not
 actually be requested.
>>>
>>> Yes, on your second point, with the API disabled the deletion path relied on
>>> the test flow->mark != INVALID to know whether a flow was offload or not, 
>>> and did not
>>> care then to check the API status.
>>> The API state should now be checked as well for the deletion path.
>>>

 I missed before that we have flush on that path.  Maybe it's enough
 to just add a !netdev_is_flow_api_enabled() check to the deletion path?
 Or am I missing something else?

>>>
>>> After the flush, there will be remaining ops in the queue that would 
>>> trigger the UAF.
>>
>> What are these ops?
>>
> 
> Any add/mod/del can be found there.
> 
> Flushes are scheduled by the main thread in response to user commands.
> While it executes, PMDs et revalidators are running, potentially enqueuing
> offload ops. The only restriction on those ops is that it's impossible to 
> have multiple
> flush in the queue: only one can exist at a time.
> 

I think, revalidators should already be stopped at the point of datapath
deletion.  Don't they?

For the PMDs, I see that they are still running and that sounds like a
bug, because we're still offloading things after the flush.  So, some
of the flows are still in hardware after the port removal / some userspace
struc

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-01 Thread Gaëtan Rivet
On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
> On 2/1/22 12:38, Gaëtan Rivet wrote:
>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
 On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>> offloaded flow, if a mark has been assigned to the flow. But if
>> this occurs in the window in which the offload thread completes
>> offloading the flow and assigns a mark to the flow, then we miss
>> deleting the flow. This problem has been observed while adding
>> and deleting flows in a loop. To fix this, always enqueue flow
>> deletion regardless of the flow->mark being set.
>>
>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>> Signed-off-by: Sriharsha Basavapatna 
>> ---
>>  lib/dpif-netdev.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index e28e0b554..22c5f19a8 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
>> dp_netdev_pmd_thread *pmd,
>>  dp_netdev_simple_match_remove(pmd, flow);
>>  cmap_remove(&pmd->flow_table, node, 
>> dp_netdev_flow_hash(&flow->ufid));
>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>> -if (flow->mark != INVALID_FLOW_MARK) {
>> -queue_netdev_flow_del(pmd, flow);
>> -}
>> +queue_netdev_flow_del(pmd, flow);
>>  flow->dead = true;
>>  
>>  dp_netdev_flow_unref(flow);
>>
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>

 Hi Ilya,

 I've been looking into this issue, you are right that the offload threads 
 don't
 actually need the pmd pointer, only the dp one.

 The problem I see with a proper reference taking on dp to protect against 
 this
 UAF, is that it is prohibitively slow to manage those dp references, and 
 it relies on
 a global lock.

 I have tried to find an alternative (the flush op could be tagged to mark 
 that
 it was issued during a dp_netdev_free), but I always end-up with a 
 race-condition,
 even if the window is reduced.
>>> Hmm.  Is it with flow API enabled or disabled?
>>> With flow api enabled we should have a flush operation that should
>>> in theory wait for all the previous offloading operations to be
>>> completed.  If it's disabled though, flow deletion will initialize
>>> the offload thread and enqueue the operation while flush will not
>>> actually be requested.
>> 
>> Yes, on your second point, with the API disabled the deletion path relied on
>> the test flow->mark != INVALID to know whether a flow was offload or not, 
>> and did not
>> care then to check the API status.
>> The API state should now be checked as well for the deletion path.
>> 
>>>
>>> I missed before that we have flush on that path.  Maybe it's enough
>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>> Or am I missing something else?
>>>
>> 
>> After the flush, there will be remaining ops in the queue that would trigger 
>> the UAF.
>
> What are these ops?
>

Any add/mod/del can be found there.

Flushes are scheduled by the main thread in response to user commands.
While it executes, PMDs et revalidators are running, potentially enqueuing
offload ops. The only restriction on those ops is that it's impossible to have 
multiple
flush in the queue: only one can exist at a time.

-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-01 Thread Ilya Maximets
On 2/1/22 12:38, Gaëtan Rivet wrote:
> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
 On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
>
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna 
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
> dp_netdev_pmd_thread *pmd,
>  dp_netdev_simple_match_remove(pmd, flow);
>  cmap_remove(&pmd->flow_table, node, 
> dp_netdev_flow_hash(&flow->ufid));
>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -if (flow->mark != INVALID_FLOW_MARK) {
> -queue_netdev_flow_del(pmd, flow);
> -}
> +queue_netdev_flow_del(pmd, flow);
>  flow->dead = true;
>  
>  dp_netdev_flow_unref(flow);
>

 Thanks for the patch, but it looks like that it's not that simple...
 A lot of tests are crashing in GHA and ASAN reports use-after-free
 on flow disassociation if the dpif already destroyed:
   https://github.com/ovsrobot/ovs/actions/runs/1754866321

 I think that the problem is that offload thread holds the ref count
 for PMD thread, but PMD thread structure doesn't hold the ref count
 for the dp, because it doesn't expect that dp_netdev_pmd structure
 will be used while PMD thread is destroyed.  I guess, we should fix
 that.  OTOH, I'm not sure if offload thread actually needs a reference
 to dp_netdev_pmd structure.  If I didn't miss something, it only
 uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
 should hold and ref the dp pointer instead?

 What do you think?  Gaetan?

>>>
>>> Hi Ilya,
>>>
>>> I've been looking into this issue, you are right that the offload threads 
>>> don't
>>> actually need the pmd pointer, only the dp one.
>>>
>>> The problem I see with a proper reference taking on dp to protect against 
>>> this
>>> UAF, is that it is prohibitively slow to manage those dp references, and it 
>>> relies on
>>> a global lock.
>>>
>>> I have tried to find an alternative (the flush op could be tagged to mark 
>>> that
>>> it was issued during a dp_netdev_free), but I always end-up with a 
>>> race-condition,
>>> even if the window is reduced.
>> Hmm.  Is it with flow API enabled or disabled?
>> With flow api enabled we should have a flush operation that should
>> in theory wait for all the previous offloading operations to be
>> completed.  If it's disabled though, flow deletion will initialize
>> the offload thread and enqueue the operation while flush will not
>> actually be requested.
> 
> Yes, on your second point, with the API disabled the deletion path relied on
> the test flow->mark != INVALID to know whether a flow was offload or not, and 
> did not
> care then to check the API status.
> The API state should now be checked as well for the deletion path.
> 
>>
>> I missed before that we have flush on that path.  Maybe it's enough
>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>> Or am I missing something else?
>>
> 
> After the flush, there will be remaining ops in the queue that would trigger 
> the UAF.

What are these ops?

> The api_enabled() check on del emission is not sufficient.
> During the flush, with a special tag telling that it is part of deleting the 
> whole
> datapath, the API could be marked disabled, and the api_enabled() could be
> checked again in the offload thread, before any dereference of dp.
> 
> I didn't consider it as the API was not meant to be disabled then possible 
> re-enabled
> after a datapath re-creation potentially. That could work, with some changes 
> on
> the api_enabled() flag. The current flag should hold the user intent as set 
> in the config,
> a second flag should hold the API liveness as set related to the dp_netdev 
> status.
> That second flag would be the one read during the checks, and written on open 
> and after
> a flush, before getting past the barrier. All offload threads should just 
> discard
> offload requests if this liveness flag is false.
> 

___
dev mailing

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-01 Thread Gaëtan Rivet
On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> On 1/31/22 11:42, Gaëtan Rivet wrote:
>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
 In dp_netdev_pmd_remove_flow() we schedule the deletion of an
 offloaded flow, if a mark has been assigned to the flow. But if
 this occurs in the window in which the offload thread completes
 offloading the flow and assigns a mark to the flow, then we miss
 deleting the flow. This problem has been observed while adding
 and deleting flows in a loop. To fix this, always enqueue flow
 deletion regardless of the flow->mark being set.

 Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
 Signed-off-by: Sriharsha Basavapatna 
 ---
  lib/dpif-netdev.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

 diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
 index e28e0b554..22c5f19a8 100644
 --- a/lib/dpif-netdev.c
 +++ b/lib/dpif-netdev.c
 @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
 dp_netdev_pmd_thread *pmd,
  dp_netdev_simple_match_remove(pmd, flow);
  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
 -if (flow->mark != INVALID_FLOW_MARK) {
 -queue_netdev_flow_del(pmd, flow);
 -}
 +queue_netdev_flow_del(pmd, flow);
  flow->dead = true;
  
  dp_netdev_flow_unref(flow);

>>>
>>> Thanks for the patch, but it looks like that it's not that simple...
>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>> on flow disassociation if the dpif already destroyed:
>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>
>>> I think that the problem is that offload thread holds the ref count
>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>> should hold and ref the dp pointer instead?
>>>
>>> What do you think?  Gaetan?
>>>
>> 
>> Hi Ilya,
>> 
>> I've been looking into this issue, you are right that the offload threads 
>> don't
>> actually need the pmd pointer, only the dp one.
>> 
>> The problem I see with a proper reference taking on dp to protect against 
>> this
>> UAF, is that it is prohibitively slow to manage those dp references, and it 
>> relies on
>> a global lock.
>> 
>> I have tried to find an alternative (the flush op could be tagged to mark 
>> that
>> it was issued during a dp_netdev_free), but I always end-up with a 
>> race-condition,
>> even if the window is reduced.
> Hmm.  Is it with flow API enabled or disabled?
> With flow api enabled we should have a flush operation that should
> in theory wait for all the previous offloading operations to be
> completed.  If it's disabled though, flow deletion will initialize
> the offload thread and enqueue the operation while flush will not
> actually be requested.

Yes, on your second point, with the API disabled the deletion path relied on
the test flow->mark != INVALID to know whether a flow was offload or not, and 
did not
care then to check the API status.
The API state should now be checked as well for the deletion path.

>
> I missed before that we have flush on that path.  Maybe it's enough
> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
> Or am I missing something else?
>

After the flush, there will be remaining ops in the queue that would trigger 
the UAF.
The api_enabled() check on del emission is not sufficient.
During the flush, with a special tag telling that it is part of deleting the 
whole
datapath, the API could be marked disabled, and the api_enabled() could be
checked again in the offload thread, before any dereference of dp.

I didn't consider it as the API was not meant to be disabled then possible 
re-enabled
after a datapath re-creation potentially. That could work, with some changes on
the api_enabled() flag. The current flag should hold the user intent as set in 
the config,
a second flag should hold the API liveness as set related to the dp_netdev 
status.
That second flag would be the one read during the checks, and written on open 
and after
a flush, before getting past the barrier. All offload threads should just 
discard
offload requests if this liveness flag is false.

-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-31 Thread Ilya Maximets
On 1/31/22 11:42, Gaëtan Rivet wrote:
> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna 
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
>>> *pmd,
>>>  dp_netdev_simple_match_remove(pmd, flow);
>>>  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -if (flow->mark != INVALID_FLOW_MARK) {
>>> -queue_netdev_flow_del(pmd, flow);
>>> -}
>>> +queue_netdev_flow_del(pmd, flow);
>>>  flow->dead = true;
>>>  
>>>  dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
> 
> Hi Ilya,
> 
> I've been looking into this issue, you are right that the offload threads 
> don't
> actually need the pmd pointer, only the dp one.
> 
> The problem I see with a proper reference taking on dp to protect against this
> UAF, is that it is prohibitively slow to manage those dp references, and it 
> relies on
> a global lock.
> 
> I have tried to find an alternative (the flush op could be tagged to mark that
> it was issued during a dp_netdev_free), but I always end-up with a 
> race-condition,
> even if the window is reduced.
Hmm.  Is it with flow API enabled or disabled?
With flow api enabled we should have a flush operation that should
in theory wait for all the previous offloading operations to be
completed.  If it's disabled though, flow deletion will initialize
the offload thread and enqueue the operation while flush will not
actually be requested.

I missed before that we have flush on that path.  Maybe it's enough
to just add a !netdev_is_flow_api_enabled() check to the deletion path?
Or am I missing something else?

> 
> So given that the refcounting is too slow but necessary, I think the only 
> practical
> solution is to rewrite it to be fast enough. I will send an RFC to that end, 
> let
> me know if you think of something better.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-31 Thread Gaëtan Rivet
On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>> offloaded flow, if a mark has been assigned to the flow. But if
>> this occurs in the window in which the offload thread completes
>> offloading the flow and assigns a mark to the flow, then we miss
>> deleting the flow. This problem has been observed while adding
>> and deleting flows in a loop. To fix this, always enqueue flow
>> deletion regardless of the flow->mark being set.
>> 
>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>> Signed-off-by: Sriharsha Basavapatna 
>> ---
>>  lib/dpif-netdev.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index e28e0b554..22c5f19a8 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
>> *pmd,
>>  dp_netdev_simple_match_remove(pmd, flow);
>>  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>> -if (flow->mark != INVALID_FLOW_MARK) {
>> -queue_netdev_flow_del(pmd, flow);
>> -}
>> +queue_netdev_flow_del(pmd, flow);
>>  flow->dead = true;
>>  
>>  dp_netdev_flow_unref(flow);
>> 
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>

Hi Ilya,

I've been looking into this issue, you are right that the offload threads don't
actually need the pmd pointer, only the dp one.

The problem I see with a proper reference taking on dp to protect against this
UAF, is that it is prohibitively slow to manage those dp references, and it 
relies on
a global lock.

I have tried to find an alternative (the flush op could be tagged to mark that 
it was issued during a dp_netdev_free), but I always end-up with a 
race-condition, even if the window is reduced.

So given that the refcounting is too slow but necessary, I think the only 
practical solution
is to rewrite it to be fast enough. I will send an RFC to that end, let me know 
if you think
of something better.

-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Ilya Maximets
On 1/27/22 13:09, Sriharsha Basavapatna wrote:
> On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets  wrote:
>>
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna 
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
>>> *pmd,
>>>  dp_netdev_simple_match_remove(pmd, flow);
>>>  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -if (flow->mark != INVALID_FLOW_MARK) {
>>> -queue_netdev_flow_del(pmd, flow);
>>> -}
>>> +queue_netdev_flow_del(pmd, flow);
>>>  flow->dead = true;
>>>
>>>  dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
>> Another point is that queue_netdev_flow_del() should check for
>> netdev_is_flow_api_enabled() to avoid creation of offload threads
>> if offloading is disabled.  But that's good that we didn't have it,
>> as the refcount issue got exposed.  Otherwise it would be hard
>> to reproduce.
>>
>> Best regards, Ilya Maximets.
>>
>> Asan report below, for convenience:
>>
>> =
>> ==17076==ERROR: AddressSanitizer: heap-use-after-free on address 
>> 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8
>> READ of size 8 at 0x61603080 thread T5 (hw_offload4)
>> #0 0x5e0e37 in mark_to_flow_disassociate 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62
>> #1 0x5dfaf1 in dp_offload_flow 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15
>> #2 0x5df8bd in dp_netdev_flow_offload_main 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17
>> #3 0x73a2fc in ovsthread_wrapper 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12
>> #4 0x7fe0619506da in start_thread 
>> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
>> #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)
>>
>> 0x61603080 is located 0 bytes inside of 576-byte region 
>> [0x61603080,0x616032c0)
>> freed by thread T0 here:
>> #0 0x49640d in free 
>> (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d)
>> #1 0x5d6652 in dp_netdev_free 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5
>> #2 0x5f0722 in dp_netdev_unref 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13
>> #3 0x5cf5e5 in dpif_netdev_close 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5
>> #4 0x5fc393 in dpif_uninit 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9
>> #5 0x5fc0c0 in dpif_close 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9
>> #6 0x52a972 in close_dpif_backer 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5
>> #7 0x517f08 in destruct 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5
>> #8 0x4f09e0 in ofproto_destroy 
>> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5
>> #9 0x4c71e4 in bridge_destroy 

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Sriharsha Basavapatna via dev
On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets  wrote:
>
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
> > *pmd,
> >  dp_netdev_simple_match_remove(pmd, flow);
> >  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -if (flow->mark != INVALID_FLOW_MARK) {
> > -queue_netdev_flow_del(pmd, flow);
> > -}
> > +queue_netdev_flow_del(pmd, flow);
> >  flow->dead = true;
> >
> >  dp_netdev_flow_unref(flow);
> >
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>
> Another point is that queue_netdev_flow_del() should check for
> netdev_is_flow_api_enabled() to avoid creation of offload threads
> if offloading is disabled.  But that's good that we didn't have it,
> as the refcount issue got exposed.  Otherwise it would be hard
> to reproduce.
>
> Best regards, Ilya Maximets.
>
> Asan report below, for convenience:
>
> =
> ==17076==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8
> READ of size 8 at 0x61603080 thread T5 (hw_offload4)
> #0 0x5e0e37 in mark_to_flow_disassociate 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62
> #1 0x5dfaf1 in dp_offload_flow 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15
> #2 0x5df8bd in dp_netdev_flow_offload_main 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17
> #3 0x73a2fc in ovsthread_wrapper 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12
> #4 0x7fe0619506da in start_thread 
> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
> #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)
>
> 0x61603080 is located 0 bytes inside of 576-byte region 
> [0x61603080,0x616032c0)
> freed by thread T0 here:
> #0 0x49640d in free 
> (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d)
> #1 0x5d6652 in dp_netdev_free 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5
> #2 0x5f0722 in dp_netdev_unref 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13
> #3 0x5cf5e5 in dpif_netdev_close 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5
> #4 0x5fc393 in dpif_uninit 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9
> #5 0x5fc0c0 in dpif_close 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9
> #6 0x52a972 in close_dpif_backer 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5
> #7 0x517f08 in destruct 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5
> #8 0x4f09e0 in ofproto_destroy 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5
> #9 0x4c71e4 in bridge_destroy 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3608:9
> #10 0x4c6f4a in bri

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Ilya Maximets
On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
> 
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna 
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
> *pmd,
>  dp_netdev_simple_match_remove(pmd, flow);
>  cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -if (flow->mark != INVALID_FLOW_MARK) {
> -queue_netdev_flow_del(pmd, flow);
> -}
> +queue_netdev_flow_del(pmd, flow);
>  flow->dead = true;
>  
>  dp_netdev_flow_unref(flow);
> 

Thanks for the patch, but it looks like that it's not that simple...
A lot of tests are crashing in GHA and ASAN reports use-after-free
on flow disassociation if the dpif already destroyed:
  https://github.com/ovsrobot/ovs/actions/runs/1754866321

I think that the problem is that offload thread holds the ref count
for PMD thread, but PMD thread structure doesn't hold the ref count
for the dp, because it doesn't expect that dp_netdev_pmd structure
will be used while PMD thread is destroyed.  I guess, we should fix
that.  OTOH, I'm not sure if offload thread actually needs a reference
to dp_netdev_pmd structure.  If I didn't miss something, it only
uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
should hold and ref the dp pointer instead?

What do you think?  Gaetan?

Another point is that queue_netdev_flow_del() should check for
netdev_is_flow_api_enabled() to avoid creation of offload threads
if offloading is disabled.  But that's good that we didn't have it,
as the refcount issue got exposed.  Otherwise it would be hard
to reproduce.

Best regards, Ilya Maximets.

Asan report below, for convenience:

=
==17076==ERROR: AddressSanitizer: heap-use-after-free on address 0x61603080 
at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8
READ of size 8 at 0x61603080 thread T5 (hw_offload4)
#0 0x5e0e37 in mark_to_flow_disassociate 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62
#1 0x5dfaf1 in dp_offload_flow 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15
#2 0x5df8bd in dp_netdev_flow_offload_main 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17
#3 0x73a2fc in ovsthread_wrapper 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12
#4 0x7fe0619506da in start_thread 
(/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)

0x61603080 is located 0 bytes inside of 576-byte region 
[0x61603080,0x616032c0)
freed by thread T0 here:
#0 0x49640d in free 
(/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d)
#1 0x5d6652 in dp_netdev_free 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5
#2 0x5f0722 in dp_netdev_unref 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13
#3 0x5cf5e5 in dpif_netdev_close 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5
#4 0x5fc393 in dpif_uninit 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9
#5 0x5fc0c0 in dpif_close 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9
#6 0x52a972 in close_dpif_backer 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5
#7 0x517f08 in destruct 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5
#8 0x4f09e0 in ofproto_destroy 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5
#9 0x4c71e4 in bridge_destroy 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3608:9
#10 0x4c6f4a in bridge_exit 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:553:9
#11 0x4e109a in main 
/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/ovs-vswitchd.c:146:5
#12 0x7fe060dcfbf6 in

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Sriharsha Basavapatna via dev
On Thu, Jan 27, 2022 at 2:45 PM Gaëtan Rivet  wrote:
>
> On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct
> > dp_netdev_pmd_thread *pmd,
> >  dp_netdev_simple_match_remove(pmd, flow);
> >  cmap_remove(&pmd->flow_table, node,
> > dp_netdev_flow_hash(&flow->ufid));
> >  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -if (flow->mark != INVALID_FLOW_MARK) {
> > -queue_netdev_flow_del(pmd, flow);
> > -}
> > +queue_netdev_flow_del(pmd, flow);
>
> Hi Sriharsha,
>
> It makes sense, thanks for the patch.
> Additionally, the mark is being written in a thread and read in another but 
> is not atomic. Without fence, coherence might take time, making the described 
> issue more likely on some archs.

Right, good point.
>
> Acked-by: Gaetan Rivet 

Thanks Gaetan.
-Harsha
>
> --
> Gaetan Rivet

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Gaëtan Rivet
On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
>
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna 
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
> dp_netdev_pmd_thread *pmd,
>  dp_netdev_simple_match_remove(pmd, flow);
>  cmap_remove(&pmd->flow_table, node, 
> dp_netdev_flow_hash(&flow->ufid));
>  ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -if (flow->mark != INVALID_FLOW_MARK) {
> -queue_netdev_flow_del(pmd, flow);
> -}
> +queue_netdev_flow_del(pmd, flow);

Hi Sriharsha,

It makes sense, thanks for the patch.
Additionally, the mark is being written in a thread and read in another but is 
not atomic. Without fence, coherence might take time, making the described 
issue more likely on some archs.

Acked-by: Gaetan Rivet 

-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-26 Thread Sriharsha Basavapatna via dev
In dp_netdev_pmd_remove_flow() we schedule the deletion of an
offloaded flow, if a mark has been assigned to the flow. But if
this occurs in the window in which the offload thread completes
offloading the flow and assigns a mark to the flow, then we miss
deleting the flow. This problem has been observed while adding
and deleting flows in a loop. To fix this, always enqueue flow
deletion regardless of the flow->mark being set.

Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e28e0b554..22c5f19a8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
*pmd,
 dp_netdev_simple_match_remove(pmd, flow);
 cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
 ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
-if (flow->mark != INVALID_FLOW_MARK) {
-queue_netdev_flow_del(pmd, flow);
-}
+queue_netdev_flow_del(pmd, flow);
 flow->dead = true;
 
 dp_netdev_flow_unref(flow);
-- 
2.30.0.349.g30b29f044a


-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev