Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Cong Wang
On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang  wrote:
>
> On Tue, Aug 11, 2020 at 10:24 AM Cong Wang  wrote:
> >
> > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang  
> > wrote:
> > > Hi all, I send a patch to fix this. The rcu warnings disappear. I
> > > don't reproduce the double free issue.
> > > But I guess this patch may address this issue.
> > >
> > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/
> >
> > I don't see how your patch address the double-free, as we still
> > free mask array twice after your patch: once in tbl_mask_array_realloc()
> > and once in ovs_flow_tbl_destroy().
> Hi Cong.
> Before my patch, we use the ovsl_dereference
> (rcu_dereference_protected) in the rcu callback.
> ovs_flow_tbl_destroy
> ->table_instance_destroy
> ->table_instance_flow_free
> ->flow_mask_remove
> ASSERT_OVSL(will print warning)
> ->tbl_mask_array_del_mask
> ovsl_dereference(rcu usage warning)
>

I understand how your patch addresses the RCU annotation issue,
which is different from double-free.


> so we should invoke the table_instance_destroy or others under
> ovs_lock to avoid (ASSERT_OVSL and rcu usage warning).

Of course... I never doubt it.


> with this patch, we reallocate the mask_array under ovs_lock, and free
> it in the rcu callback. Without it, we  reallocate and free it in the
> rcu callback.
> I think we may fix it with this patch.

Does it matter which context tbl_mask_array_realloc() is called?
Even with ovs_lock, we can still double free:

ovs_lock()
tbl_mask_array_realloc()
 => call_rcu(>rcu, mask_array_rcu_cb);
ovs_unlock()
...
ovs_flow_tbl_destroy()
 => call_rcu(>rcu, mask_array_rcu_cb);

So still twice, right? To fix the double-free, we have to eliminate one
of them, don't we? ;)


>
> > Have you tried my patch which is supposed to address this double-free?
> I don't reproduce it. but your patch does not avoid ruc usage warning
> and ASSERT_OVSL.

Sure, I never intend to fix anything else but double-free. The $subject is
about double free, I double checked. ;)

Thanks.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Tonghao Zhang
On Tue, Aug 11, 2020 at 10:24 AM Cong Wang  wrote:
>
> On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang  
> wrote:
> > Hi all, I send a patch to fix this. The rcu warnings disappear. I
> > don't reproduce the double free issue.
> > But I guess this patch may address this issue.
> >
> > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/
>
> I don't see how your patch address the double-free, as we still
> free mask array twice after your patch: once in tbl_mask_array_realloc()
> and once in ovs_flow_tbl_destroy().
Hi Cong.
Before my patch, we use the ovsl_dereference
(rcu_dereference_protected) in the rcu callback.
ovs_flow_tbl_destroy
->table_instance_destroy
->table_instance_flow_free
->flow_mask_remove
ASSERT_OVSL(will print warning)
->tbl_mask_array_del_mask
ovsl_dereference(rcu usage warning)

so we should invoke the table_instance_destroy or others under
ovs_lock to avoid (ASSERT_OVSL and rcu usage warning).
with this patch, we reallocate the mask_array under ovs_lock, and free
it in the rcu callback. Without it, we  reallocate and free it in the
rcu callback.
I think we may fix it with this patch.

> Have you tried my patch which is supposed to address this double-free?
I don't reproduce it. but your patch does not avoid ruc usage warning
and ASSERT_OVSL.
> It simply skips the reallocation as it makes no sense to trigger reallocation
> when destroying it.
>
> Thanks.



-- 
Best regards, Tonghao
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Cong Wang
On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang  wrote:
> Hi all, I send a patch to fix this. The rcu warnings disappear. I
> don't reproduce the double free issue.
> But I guess this patch may address this issue.
>
> http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/

I don't see how your patch address the double-free, as we still
free mask array twice after your patch: once in tbl_mask_array_realloc()
and once in ovs_flow_tbl_destroy().

Have you tried my patch which is supposed to address this double-free?
It simply skips the reallocation as it makes no sense to trigger reallocation
when destroying it.

Thanks.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Tonghao Zhang
On Tue, Aug 11, 2020 at 4:28 AM Paul E. McKenney  wrote:
>
> On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > > > Hi,
> > > > Adding more of us working on RCU as well. Johan from another team at
> > > > Google discovered a likely issue in openswitch, details below:
> > > >
> > > > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös  wrote:
> > > > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose  
> > > > > wrote:
> > > > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote:
> > > > > > > Hi Open vSwitch contributors,
> > > > > > >
> > > > > > > We have found openvswitch is causing double-freeing of memory. The
> > > > > > > issue was not present in kernel version 5.5.17 but is present in
> > > > > > > 5.6.14 and newer kernels.
> > > > > > >
> > > > > > > After reverting the RCU commits below for debugging, enabling
> > > > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of 
> > > > > > > this
> > > > > > > email in the kernel log (the last one shows the double-free). 
> > > > > > > When I
> > > > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: 
> > > > > > > openvswitch:
> > > > > > > fix possible memleak on destroy flow-table"), the symptoms 
> > > > > > > disappear.
> > > > > > > While I have a reliable way to reproduce the issue, I 
> > > > > > > unfortunately
> > > > > > > don't yet have a process that's amenable to sharing. Please take a
> > > > > > > look.
> > > > > > >
> > > > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
> > > > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and 
> > > > > > > lazy-callback handling
> > > > > > > e99637becb2e rcu: Add support for debug_objects debugging for 
> > > > > > > kfree_rcu()
> > > > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() 
> > > > > > > work
> > > > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
> > > > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching
> > > >
> > > > Note that these reverts were only for testing the same code, because
> > > > he was testing 2 different kernel versions. One of them did not have
> > > > this set. So I asked him to revert. There's no known bug in the
> > > > reverted code itself. But somehow these patches do make it harder for
> > > > him to reproduce the issue.
> > >
> > > Perhaps they adjust timing?
> >
> > Yes that could be it. In my testing (which is unrelated to OVS), the issue
> > happens only with TREE02. I can reproduce the issue in [1] on just boot-up 
> > of
> > TREE02.
> >
> > I could have screwed up something in my segcblist count patch, any hints
> > would be great. I'll dig more into it as well.
>
> Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix
> possible memleak on destroy flow-table") commit?  Maybe it avoided the
> memleak so thoroughly that it did a double free?
Hi all, I send a patch to fix this. The rcu warnings disappear. I
don't reproduce the double free issue.
But I guess this patch may address this issue.

http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/
> Thanx, Paul
>
> > > > But then again, I have not heard reports of this warning firing. Paul,
> > > > has this come to your radar recently?
> > >
> > > I have not seen any recent WARNs in rcu_do_batch().  I am guessing that
> > > this is one of the last two in that function?
> > >
> > > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y?  That Kconfig
> > > option is designed to help locate double frees via RCU.
> >
> > Yes true, kfree_rcu() also has support for this. Jonathan, did you get a
> > chance to try this out in your failure scenario?
> >
> > thanks,
> >
> >  - Joel
> >
> > [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/



-- 
Best regards, Tonghao
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Paul E. McKenney
On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote:
> On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > > Hi,
> > > Adding more of us working on RCU as well. Johan from another team at
> > > Google discovered a likely issue in openswitch, details below:
> > > 
> > > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös  wrote:
> > > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose  
> > > > wrote:
> > > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote:
> > > > > > Hi Open vSwitch contributors,
> > > > > >
> > > > > > We have found openvswitch is causing double-freeing of memory. The
> > > > > > issue was not present in kernel version 5.5.17 but is present in
> > > > > > 5.6.14 and newer kernels.
> > > > > >
> > > > > > After reverting the RCU commits below for debugging, enabling
> > > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of 
> > > > > > this
> > > > > > email in the kernel log (the last one shows the double-free). When I
> > > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch:
> > > > > > fix possible memleak on destroy flow-table"), the symptoms 
> > > > > > disappear.
> > > > > > While I have a reliable way to reproduce the issue, I unfortunately
> > > > > > don't yet have a process that's amenable to sharing. Please take a
> > > > > > look.
> > > > > >
> > > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
> > > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and 
> > > > > > lazy-callback handling
> > > > > > e99637becb2e rcu: Add support for debug_objects debugging for 
> > > > > > kfree_rcu()
> > > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work
> > > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
> > > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching
> > > 
> > > Note that these reverts were only for testing the same code, because
> > > he was testing 2 different kernel versions. One of them did not have
> > > this set. So I asked him to revert. There's no known bug in the
> > > reverted code itself. But somehow these patches do make it harder for
> > > him to reproduce the issue.
> > 
> > Perhaps they adjust timing?
> 
> Yes that could be it. In my testing (which is unrelated to OVS), the issue
> happens only with TREE02. I can reproduce the issue in [1] on just boot-up of
> TREE02.
> 
> I could have screwed up something in my segcblist count patch, any hints
> would be great. I'll dig more into it as well.

Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix
possible memleak on destroy flow-table") commit?  Maybe it avoided the
memleak so thoroughly that it did a double free?

Thanx, Paul

> > > But then again, I have not heard reports of this warning firing. Paul,
> > > has this come to your radar recently?
> > 
> > I have not seen any recent WARNs in rcu_do_batch().  I am guessing that
> > this is one of the last two in that function?
> > 
> > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y?  That Kconfig
> > option is designed to help locate double frees via RCU.
> 
> Yes true, kfree_rcu() also has support for this. Jonathan, did you get a
> chance to try this out in your failure scenario?
> 
> thanks,
> 
>  - Joel
> 
> [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Double free in recent kernels after memleak fix

2020-08-10 Thread Joel Fernandes
On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> > Hi,
> > Adding more of us working on RCU as well. Johan from another team at
> > Google discovered a likely issue in openswitch, details below:
> > 
> > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös  wrote:
> > >
> > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose  wrote:
> > > >
> > > >
> > > >
> > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote:
> > > > > Hi Open vSwitch contributors,
> > > > >
> > > > > We have found openvswitch is causing double-freeing of memory. The
> > > > > issue was not present in kernel version 5.5.17 but is present in
> > > > > 5.6.14 and newer kernels.
> > > > >
> > > > > After reverting the RCU commits below for debugging, enabling
> > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of this
> > > > > email in the kernel log (the last one shows the double-free). When I
> > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch:
> > > > > fix possible memleak on destroy flow-table"), the symptoms disappear.
> > > > > While I have a reliable way to reproduce the issue, I unfortunately
> > > > > don't yet have a process that's amenable to sharing. Please take a
> > > > > look.
> > > > >
> > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
> > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and lazy-callback 
> > > > > handling
> > > > > e99637becb2e rcu: Add support for debug_objects debugging for 
> > > > > kfree_rcu()
> > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work
> > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
> > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching
> > 
> > Note that these reverts were only for testing the same code, because
> > he was testing 2 different kernel versions. One of them did not have
> > this set. So I asked him to revert. There's no known bug in the
> > reverted code itself. But somehow these patches do make it harder for
> > him to reproduce the issue.
> 
> Perhaps they adjust timing?

Yes that could be it. In my testing (which is unrelated to OVS), the issue
happens only with TREE02. I can reproduce the issue in [1] on just boot-up of
TREE02.

I could have screwed up something in my segcblist count patch, any hints
would be great. I'll dig more into it as well.

> > 
> > But then again, I have not heard reports of this warning firing. Paul,
> > has this come to your radar recently?
> 
> I have not seen any recent WARNs in rcu_do_batch().  I am guessing that
> this is one of the last two in that function?
> 
> If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y?  That Kconfig
> option is designed to help locate double frees via RCU.

Yes true, kfree_rcu() also has support for this. Jonathan, did you get a
chance to try this out in your failure scenario?

thanks,

 - Joel

[1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [OVN] not-equal in ACL

2020-08-10 Thread Tony Liu
Hi Numan,

Create a new thread here to follow up ACL questions.

> > > I think this is a big problem here. We should not use "!=" in
> > > logical flows, although OVN allows.
> >
> > Is this a generic recommendation or for certain cases?
> > Is it OK to add an ACL with "!=", like below?
> >
> > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005
> > 'ip4.dst == 192.168.0.0/16 && inport !=
> > "d93619c3-dab9-4f6d-8261-4211f6937fd1"' drop
> 
> 
> This is a generic recommendation. The above ACL would also result in
> many OF flows.
> 
> To handle cases like above, you can add a couple of ACLs like below with
> high priority flow to allow the desired inport and low priority ACL to
> drop all the traffic.
> 
>  ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport
> 1006 'ip4.dst == 192.168.0.0/16 && inport == "d93619c3-dab9-4f6d-8261-
> 4211f6937fd1"' allow  ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-
> e780d9dfdc0d from-lport
> 1005 'ip4.dst == 192.168.0.0/16"' drop

In my case, two LS connect to one LR who has external access.
There are 3 ports on each LS.
* vm_port
* gw_port (connect to LR)
* svc_port (localport for DHCP and metadata)

What I want is to disable the connection between two LS while allow
external access for them.

Option #1, create one ACL for each VM on each LS.

acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport == "$vm_port"' 
drop

This works fine for me, but the ACL has to be per VM.

Option #2, create one ACL to exclude gw_port and svc_port.

acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport != "$gw_port" 
&& inport != "svc_port"' drop

As you mentioned, this is not recommended, cause it will result many
OF flows. I actually tried, but I don't see any OF flows created for
that ACL. Is there any policy in ovn-controller to not translate such
policy to OF flow?

Option #3, as you suggested, I tried 2 ACLs.

acl-add $ls from-lport 1006 'ip4.dst == 192.168.0.0/16 && (inport == "$gw_port" 
|| inport == "svc_port")' allow
acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16' drop

On compute node, I see the "drop" OF flow only, not the "allow" flow.
Am I missing anything here?


Thanks!

Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss