Hi Numan, Excellent find. I missed the POP while iterating the hmap. Submitted a V2, added relevant tags as well.
Please take a look. Regards, Ankur From: Numan Siddique <nusid...@redhat.com> Sent: Thursday, May 16, 2019 11:31 AM To: Ankur Sharma <ankur.sha...@nutanix.com> Cc: Han Zhou <zhou...@gmail.com>; ovs-dev@openvswitch.org Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings On Thu, May 16, 2019 at 12:26 AM Numan Siddique <nusid...@redhat.com<mailto:nusid...@redhat.com>> wrote: On Thu, May 16, 2019 at 12:20 AM Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>> wrote: Hi Numan, Just confirming, anything else pending from my side? or given that patch has been Acked, hence someone will apply it to master? I don't think you have anything pending unless the committer has any comments. Thanks for the fix :) Numan Thanks Regards, Ankur From: Numan Siddique <nusid...@redhat.com<mailto:nusid...@redhat.com>> Sent: Wednesday, May 15, 2019 11:43 AM To: Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>> Cc: Han Zhou <zhou...@gmail.com<mailto:zhou...@gmail.com>>; ovs-dev@openvswitch.org<mailto:ovs-dev@openvswitch.org> Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings Gentle ping on this patch. This fix is a bit critical to fix the ovn-controller CPU usage. Thanks Numan On Sat, May 11, 2019 at 2:34 PM Numan Siddique <nusid...@redhat.com<mailto:nusid...@redhat.com>> wrote: On Sat, May 11, 2019 at 2:09 PM Numan Siddique <nusid...@redhat.com<mailto:nusid...@redhat.com>> wrote: On Sat, May 11, 2019 at 5:38 AM Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>> wrote: Hi Han, Thanks for review. Yup, looks like it, although I did not try the scenario myself, but yes entry are not getting removed once added, hence, new mac bindings will not be added after some time (as existing count reaches 1000). Regards, Ankur From: Han Zhou <zhou...@gmail.com<mailto:zhou...@gmail.com>> Sent: Friday, May 10, 2019 4:47 PM To: Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>> Cc: ovs-dev@openvswitch.org<mailto:ovs-dev@openvswitch.org> Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings On Fri, May 10, 2019 at 2:46 PM Ankur Sharma <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com><mailto:ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>>> wrote: > > ISSUE: > a. As soon as entries get added to put_mac_bindings in pinctrl.c, > ovn-controller CPU consumption reached 100%. > > b. This happens because in wait_put_mac_bindings, > if !hmap_is_empty(&put_mac_bindings) succeeds and > calls poll_immediat_wake(). > > ovn-controller.log: > "2019-05-10T19:43:28.035Z|00035|poll_loop|INFO|wakeup due to > 0-ms timeout at ovn/controller/pinctrl.c:2520 (99% CPU usage)" > > ROOT_CAUSE: > a. Earlier it used to work fine, because in run_put_mac_bindings > all the bindings in put_mac_bindings would be flushed. > > b. Looks like, as a part of adding a new thread in pinctrl, this > line got replaced with calling buffer_put_mac_bindings. > > " > . > . > . > /* Move the mac bindings from 'put_mac_bindings' hmap to > * 'buffered_mac_bindings' and notify the pinctrl_handler. > * pinctrl_handler will reinject the buffered packets. */ > if (!hmap_is_empty(&put_mac_bindings)) { > buffer_put_mac_bindings(); > notify_pinctrl_handler(); > } > " > > c. Because of which put_mac_bindings is never emptied and > wait_put_mac_bindings > ends up doing poll_immediate_wake. Hi Ankur, I was very curious why this 100% CPU issue came now and why we didn't see earlier. I looked into the actual commit which added the pinctrl thread [1] The function run_put_mac_bindings() didn't have any issues earlier as the code was like below ******* ... /* Move the mac bindings from 'put_mac_bindings' hmap to * 'buffered_mac_bindings' and notify the pinctrl_handler. * pinctrl_handler will reinject the buffered packets. */ if (!hmap_is_empty(&put_mac_bindings)) { buffer_put_mac_bindings(); notify_pinctrl_handler(); } ******** and the function buffer_put_mac_bindings() popped the put_mac_bindings hmap. The recent commit [2] has introduced this issue because it deleted the function buffer_put_mac_bindings(). I think its better to correct the commit message and also add "Fixes" tag. [1] - https://github.com/openvswitch/ovs/commit/3594ffab6b4b423aa635a313f6b304180d7dbaf7#diff-426c527fd2f3cf3c57def4b2747a48c3 [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_commit_3594ffab6b4b423aa635a313f6b304180d7dbaf7-23diff-2D426c527fd2f3cf3c57def4b2747a48c3&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=XA7ByJtXDRAQSk6npDCYocdoN-a3Bktrhxx0flh5B4A&s=Vxr6-c-Anu0LEhO9OwySZ5A5v_hpNVq6UQWkGy1JyMI&e=> [2] - https://github.com/openvswitch/ovs/commit/1c24b2f490ba002bbfeb23006965188a7c5b9747#diff-426c527fd2f3cf3c57def4b2747a48c3 [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_commit_1c24b2f490ba002bbfeb23006965188a7c5b9747-23diff-2D426c527fd2f3cf3c57def4b2747a48c3&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=XA7ByJtXDRAQSk6npDCYocdoN-a3Bktrhxx0flh5B4A&s=qKoaN2qjrET5DGYwQBS9Po2Bj4Bi8TIbeBE-K_BJU4Y&e=> Thanks Numan > > FIX: > a. Added call to flush_put_mac_bindings back in > run_put_mac_bindings. > > b. Additioanlly, logic mentioned in ROOT_CAUSE.b has been changed > in 1c24b2f490ba002bbfeb23006965188a7c5b9747, hence updated > the documentation in pinctrl.c accordingly. > > Signed-off-by: Ankur Sharma > <ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com><mailto:ankur.sha...@nutanix.com<mailto:ankur.sha...@nutanix.com>>> > --- > ovn/controller/pinctrl.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c > index 8ae1f9b..b7bb4c9 100644 > --- a/ovn/controller/pinctrl.c > +++ b/ovn/controller/pinctrl.c > @@ -91,11 +91,13 @@ VLOG_DEFINE_THIS_MODULE(pinctrl); > * > * - arp/nd_ns - These actions generate an ARP/IPv6 Neighbor solicit > * requests. The original packets are buffered and > - * injected back when put_arp/put_nd actions are called. > + * injected back when put_arp/put_nd resolves > + * corresponding ARP/IPv6 Neighbor solicit requests. > * When pinctrl_run(), writes the mac bindings from the > * 'put_mac_bindings' hmap to the MAC_Binding table in > - * SB DB, it moves these mac bindings to another hmap - > - * 'buffered_mac_bindings'. > + * SB DB, run_buffered_binding will add the buffered > + * packets to buffered_mac_bindings and notify > + * pinctrl_handler. > * > * The pinctrl_handler thread calls the function - > * send_mac_binding_buffered_pkts(), which uses > @@ -2468,6 +2470,7 @@ run_put_mac_bindings(struct ovsdb_idl_txn > *ovnsb_idl_txn, > sbrec_mac_binding_by_lport_ip, > pmb); > } > + "flush_put_mac_bindings(); Thanks for the patch and finding the issue. This is a big mistake from me when I added the pinctrl_thread. Instead of calling "flush_put_mac_bindings()" in the function "run_put_mac_bindings()" I would suggest to use - HMAP_FOR_EACH_POP in run_put_mac_bindings() and then free pmb after the call to run_put_mac_binding(). If you see flush_put_mac_bindings() also does the same thing. We can avoid running the for loop twice. You can ignore my suggestion. Calling flush_put_mac_bindings() seems much cleaner. Acked-by: Numan Siddique <nusid...@redhat.com<mailto:nusid...@redhat.com>> Thanks Numan Thanks Numan > } > > static void > -- > 1.8.3.1 > > _______________________________________________ > dev mailing list > d...@openvswitch.org<mailto:d...@openvswitch.org><mailto:d...@openvswitch.org<mailto:d...@openvswitch.org>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=kwv6Xpj9J47S5qVPfEF92S5wBd_7KHBtvOeOua67ijM&e=> > [mail.openvswitch.org > [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=http-3A__mail.openvswitch.org&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=jQd-m5xirQqrCOBwkOE4D0fAkYZjwy2No-tgJksKVCk&e=>]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=P-XiYg1pCA7c5_H6MKaFGMBLWLPE7GAfuUnfELvcIGY&s=6QBH6hjY9c7Vc-Bh4_AzkwWJBBGuOH3Y2v7sXIIU-kc&e=> Thanks for the fix. Does it mean the the mac-binding will not work after some time since it is never flushed, and the check "hmap_count(&put_mac_bindings) >= 1000" in pinctrl_handle_put_mac_binding() will prevent any new mac-binding being handled? The fix looks good to me. Acked-by: Han Zhou <hzh...@ebay.com<mailto:hzh...@ebay.com><mailto:hzh...@ebay.com<mailto:hzh...@ebay.com>>> _______________________________________________ dev mailing list d...@openvswitch.org<mailto:d...@openvswitch.org> https://mail.openvswitch.org/mailman/listinfo/ovs-dev [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=kwv6Xpj9J47S5qVPfEF92S5wBd_7KHBtvOeOua67ijM&e=> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev