Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Mon, Aug 10, 2020 at 8:27 PM Tonghao Zhang wrote: > > On Tue, Aug 11, 2020 at 10:24 AM Cong Wang wrote: > > > > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang > > wrote: > > > Hi all, I send a patch to fix this. The rcu warnings disappear. I > > > don't reproduce the double free issue. > > > But I guess this patch may address this issue. > > > > > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/ > > > > I don't see how your patch address the double-free, as we still > > free mask array twice after your patch: once in tbl_mask_array_realloc() > > and once in ovs_flow_tbl_destroy(). > Hi Cong. > Before my patch, we use the ovsl_dereference > (rcu_dereference_protected) in the rcu callback. > ovs_flow_tbl_destroy > ->table_instance_destroy > ->table_instance_flow_free > ->flow_mask_remove > ASSERT_OVSL(will print warning) > ->tbl_mask_array_del_mask > ovsl_dereference(rcu usage warning) > I understand how your patch addresses the RCU annotation issue, which is different from double-free. > so we should invoke the table_instance_destroy or others under > ovs_lock to avoid (ASSERT_OVSL and rcu usage warning). Of course... I never doubt it. > with this patch, we reallocate the mask_array under ovs_lock, and free > it in the rcu callback. Without it, we reallocate and free it in the > rcu callback. > I think we may fix it with this patch. Does it matter which context tbl_mask_array_realloc() is called? Even with ovs_lock, we can still double free: ovs_lock() tbl_mask_array_realloc() => call_rcu(>rcu, mask_array_rcu_cb); ovs_unlock() ... ovs_flow_tbl_destroy() => call_rcu(>rcu, mask_array_rcu_cb); So still twice, right? To fix the double-free, we have to eliminate one of them, don't we? ;) > > > Have you tried my patch which is supposed to address this double-free? > I don't reproduce it. but your patch does not avoid ruc usage warning > and ASSERT_OVSL. Sure, I never intend to fix anything else but double-free. The $subject is about double free, I double checked. ;) Thanks. ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Tue, Aug 11, 2020 at 10:24 AM Cong Wang wrote: > > On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang > wrote: > > Hi all, I send a patch to fix this. The rcu warnings disappear. I > > don't reproduce the double free issue. > > But I guess this patch may address this issue. > > > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/ > > I don't see how your patch address the double-free, as we still > free mask array twice after your patch: once in tbl_mask_array_realloc() > and once in ovs_flow_tbl_destroy(). Hi Cong. Before my patch, we use the ovsl_dereference (rcu_dereference_protected) in the rcu callback. ovs_flow_tbl_destroy ->table_instance_destroy ->table_instance_flow_free ->flow_mask_remove ASSERT_OVSL(will print warning) ->tbl_mask_array_del_mask ovsl_dereference(rcu usage warning) so we should invoke the table_instance_destroy or others under ovs_lock to avoid (ASSERT_OVSL and rcu usage warning). with this patch, we reallocate the mask_array under ovs_lock, and free it in the rcu callback. Without it, we reallocate and free it in the rcu callback. I think we may fix it with this patch. > Have you tried my patch which is supposed to address this double-free? I don't reproduce it. but your patch does not avoid ruc usage warning and ASSERT_OVSL. > It simply skips the reallocation as it makes no sense to trigger reallocation > when destroying it. > > Thanks. -- Best regards, Tonghao ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Mon, Aug 10, 2020 at 6:16 PM Tonghao Zhang wrote: > Hi all, I send a patch to fix this. The rcu warnings disappear. I > don't reproduce the double free issue. > But I guess this patch may address this issue. > > http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/ I don't see how your patch address the double-free, as we still free mask array twice after your patch: once in tbl_mask_array_realloc() and once in ovs_flow_tbl_destroy(). Have you tried my patch which is supposed to address this double-free? It simply skips the reallocation as it makes no sense to trigger reallocation when destroying it. Thanks. ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Tue, Aug 11, 2020 at 4:28 AM Paul E. McKenney wrote: > > On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote: > > On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote: > > > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote: > > > > Hi, > > > > Adding more of us working on RCU as well. Johan from another team at > > > > Google discovered a likely issue in openswitch, details below: > > > > > > > > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös wrote: > > > > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose > > > > > wrote: > > > > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote: > > > > > > > Hi Open vSwitch contributors, > > > > > > > > > > > > > > We have found openvswitch is causing double-freeing of memory. The > > > > > > > issue was not present in kernel version 5.5.17 but is present in > > > > > > > 5.6.14 and newer kernels. > > > > > > > > > > > > > > After reverting the RCU commits below for debugging, enabling > > > > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of > > > > > > > this > > > > > > > email in the kernel log (the last one shows the double-free). > > > > > > > When I > > > > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: > > > > > > > openvswitch: > > > > > > > fix possible memleak on destroy flow-table"), the symptoms > > > > > > > disappear. > > > > > > > While I have a reliable way to reproduce the issue, I > > > > > > > unfortunately > > > > > > > don't yet have a process that's amenable to sharing. Please take a > > > > > > > look. > > > > > > > > > > > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch() > > > > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and > > > > > > > lazy-callback handling > > > > > > > e99637becb2e rcu: Add support for debug_objects debugging for > > > > > > > kfree_rcu() > > > > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() > > > > > > > work > > > > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo > > > > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching > > > > > > > > Note that these reverts were only for testing the same code, because > > > > he was testing 2 different kernel versions. One of them did not have > > > > this set. So I asked him to revert. There's no known bug in the > > > > reverted code itself. But somehow these patches do make it harder for > > > > him to reproduce the issue. > > > > > > Perhaps they adjust timing? > > > > Yes that could be it. In my testing (which is unrelated to OVS), the issue > > happens only with TREE02. I can reproduce the issue in [1] on just boot-up > > of > > TREE02. > > > > I could have screwed up something in my segcblist count patch, any hints > > would be great. I'll dig more into it as well. > > Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix > possible memleak on destroy flow-table") commit? Maybe it avoided the > memleak so thoroughly that it did a double free? Hi all, I send a patch to fix this. The rcu warnings disappear. I don't reproduce the double free issue. But I guess this patch may address this issue. http://patchwork.ozlabs.org/project/netdev/patch/20200811011001.75690-1-xiangxia.m@gmail.com/ > Thanx, Paul > > > > > But then again, I have not heard reports of this warning firing. Paul, > > > > has this come to your radar recently? > > > > > > I have not seen any recent WARNs in rcu_do_batch(). I am guessing that > > > this is one of the last two in that function? > > > > > > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig > > > option is designed to help locate double frees via RCU. > > > > Yes true, kfree_rcu() also has support for this. Jonathan, did you get a > > chance to try this out in your failure scenario? > > > > thanks, > > > > - Joel > > > > [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/ -- Best regards, Tonghao ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Mon, Aug 10, 2020 at 04:08:59PM -0400, Joel Fernandes wrote: > On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote: > > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote: > > > Hi, > > > Adding more of us working on RCU as well. Johan from another team at > > > Google discovered a likely issue in openswitch, details below: > > > > > > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös wrote: > > > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose > > > > wrote: > > > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote: > > > > > > Hi Open vSwitch contributors, > > > > > > > > > > > > We have found openvswitch is causing double-freeing of memory. The > > > > > > issue was not present in kernel version 5.5.17 but is present in > > > > > > 5.6.14 and newer kernels. > > > > > > > > > > > > After reverting the RCU commits below for debugging, enabling > > > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of > > > > > > this > > > > > > email in the kernel log (the last one shows the double-free). When I > > > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch: > > > > > > fix possible memleak on destroy flow-table"), the symptoms > > > > > > disappear. > > > > > > While I have a reliable way to reproduce the issue, I unfortunately > > > > > > don't yet have a process that's amenable to sharing. Please take a > > > > > > look. > > > > > > > > > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch() > > > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and > > > > > > lazy-callback handling > > > > > > e99637becb2e rcu: Add support for debug_objects debugging for > > > > > > kfree_rcu() > > > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work > > > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo > > > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching > > > > > > Note that these reverts were only for testing the same code, because > > > he was testing 2 different kernel versions. One of them did not have > > > this set. So I asked him to revert. There's no known bug in the > > > reverted code itself. But somehow these patches do make it harder for > > > him to reproduce the issue. > > > > Perhaps they adjust timing? > > Yes that could be it. In my testing (which is unrelated to OVS), the issue > happens only with TREE02. I can reproduce the issue in [1] on just boot-up of > TREE02. > > I could have screwed up something in my segcblist count patch, any hints > would be great. I'll dig more into it as well. Has anyone taken a close look at 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table") commit? Maybe it avoided the memleak so thoroughly that it did a double free? Thanx, Paul > > > But then again, I have not heard reports of this warning firing. Paul, > > > has this come to your radar recently? > > > > I have not seen any recent WARNs in rcu_do_batch(). I am guessing that > > this is one of the last two in that function? > > > > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig > > option is designed to help locate double frees via RCU. > > Yes true, kfree_rcu() also has support for this. Jonathan, did you get a > chance to try this out in your failure scenario? > > thanks, > > - Joel > > [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/ ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Double free in recent kernels after memleak fix
On Fri, Aug 07, 2020 at 03:20:15PM -0700, Paul E. McKenney wrote: > On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote: > > Hi, > > Adding more of us working on RCU as well. Johan from another team at > > Google discovered a likely issue in openswitch, details below: > > > > On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös wrote: > > > > > > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose wrote: > > > > > > > > > > > > > > > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote: > > > > > Hi Open vSwitch contributors, > > > > > > > > > > We have found openvswitch is causing double-freeing of memory. The > > > > > issue was not present in kernel version 5.5.17 but is present in > > > > > 5.6.14 and newer kernels. > > > > > > > > > > After reverting the RCU commits below for debugging, enabling > > > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of this > > > > > email in the kernel log (the last one shows the double-free). When I > > > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch: > > > > > fix possible memleak on destroy flow-table"), the symptoms disappear. > > > > > While I have a reliable way to reproduce the issue, I unfortunately > > > > > don't yet have a process that's amenable to sharing. Please take a > > > > > look. > > > > > > > > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch() > > > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and lazy-callback > > > > > handling > > > > > e99637becb2e rcu: Add support for debug_objects debugging for > > > > > kfree_rcu() > > > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work > > > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo > > > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching > > > > Note that these reverts were only for testing the same code, because > > he was testing 2 different kernel versions. One of them did not have > > this set. So I asked him to revert. There's no known bug in the > > reverted code itself. But somehow these patches do make it harder for > > him to reproduce the issue. > > Perhaps they adjust timing? Yes that could be it. In my testing (which is unrelated to OVS), the issue happens only with TREE02. I can reproduce the issue in [1] on just boot-up of TREE02. I could have screwed up something in my segcblist count patch, any hints would be great. I'll dig more into it as well. > > > > But then again, I have not heard reports of this warning firing. Paul, > > has this come to your radar recently? > > I have not seen any recent WARNs in rcu_do_batch(). I am guessing that > this is one of the last two in that function? > > If so, have you tried using CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig > option is designed to help locate double frees via RCU. Yes true, kfree_rcu() also has support for this. Jonathan, did you get a chance to try this out in your failure scenario? thanks, - Joel [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/ ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
[ovs-discuss] [OVN] not-equal in ACL
Hi Numan, Create a new thread here to follow up ACL questions. > > > I think this is a big problem here. We should not use "!=" in > > > logical flows, although OVN allows. > > > > Is this a generic recommendation or for certain cases? > > Is it OK to add an ACL with "!=", like below? > > > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005 > > 'ip4.dst == 192.168.0.0/16 && inport != > > "d93619c3-dab9-4f6d-8261-4211f6937fd1"' drop > > > This is a generic recommendation. The above ACL would also result in > many OF flows. > > To handle cases like above, you can add a couple of ACLs like below with > high priority flow to allow the desired inport and low priority ACL to > drop all the traffic. > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport > 1006 'ip4.dst == 192.168.0.0/16 && inport == "d93619c3-dab9-4f6d-8261- > 4211f6937fd1"' allow ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324- > e780d9dfdc0d from-lport > 1005 'ip4.dst == 192.168.0.0/16"' drop In my case, two LS connect to one LR who has external access. There are 3 ports on each LS. * vm_port * gw_port (connect to LR) * svc_port (localport for DHCP and metadata) What I want is to disable the connection between two LS while allow external access for them. Option #1, create one ACL for each VM on each LS. acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport == "$vm_port"' drop This works fine for me, but the ACL has to be per VM. Option #2, create one ACL to exclude gw_port and svc_port. acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport != "$gw_port" && inport != "svc_port"' drop As you mentioned, this is not recommended, cause it will result many OF flows. I actually tried, but I don't see any OF flows created for that ACL. Is there any policy in ovn-controller to not translate such policy to OF flow? Option #3, as you suggested, I tried 2 ACLs. acl-add $ls from-lport 1006 'ip4.dst == 192.168.0.0/16 && (inport == "$gw_port" || inport == "svc_port")' allow acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16' drop On compute node, I see the "drop" OF flow only, not the "allow" flow. Am I missing anything here? Thanks! Tony ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss