Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
On Tue, 22 Aug, 2023, 6:11 am Han Zhou, wrote: > On Fri, Aug 18, 2023 at 4:35 AM Numan Siddique wrote: > > > > On Fri, Aug 18, 2023 at 2:33 PM Numan Siddique wrote: > > > > > > On Fri, Aug 18, 2023 at 11:38 AM Han Zhou wrote: > > > > > > > > On Wed, Aug 9, 2023 at 9:38 AM wrote: > > > > > > > > > > From: Numan Siddique > > > > > > > > > Thanks Numan for the enhancement. > > > > > > > > > Instead of maintaing the lport to lflow references using > > > > > 'struct lflow_ref_list', this patch makes use of the existing > > > > > objdep APIs. Since objdep_mgr is not thread safe, an instance > > > > > of objdep_mgr is maintained for each 'ovn_port'. > > > > > > > > > > Once we add the thread safe support to objdep APIs we can move > > > > > the objdep_mgr to 'lflow' engine date. > > > > > > > > > > Using objdep APIs does come with a cost in memory. We now > > > > > maintain an additional hmap of ovn_lflow's to look up using > > > > > uuid. But this will be useful to handle datapath and load > > > > > balancer changes in the lflow engine node. > > > > > > > > > > > > > I understand that the 'struct lflow_ref_list' may not satisfy the > needs of > > > > more complex I-P, but I wonder if objdep_mgr (without modification) > is the > > > > solution. It seems to me more complex at least for this patch without > > > > adding obvious benefits. Since I am still reviewing, maybe I missed > some > > > > points which would show the value from the rest of the patches, and > please > > > > forgive me for my slow review. > > > > > > Sure. No worries. > > > > > > Can you please continue reviewing from the latest v6 ? > > > > > > http://patchwork.ozlabs.org/project/ovn/list/?series=369384 > > > > > > I think we can discuss more once you go through all the patches in the > > > series (or until patch 14). > > > I think that would give a better idea of how it is used and we can > > > discuss if it's worth using obj dep mgr. > > > > > > Also I think the patches P1 - P8 make one set and can be considered > > > first. I submitted all these patches > > > as one series mainly because > > >- all the patches can be applied easily > > >- the soft freeze was near. > > > > > > The patches can be found here too - > > > https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6 > > > > > > Thanks > > > Numan > > > > > > > > > > > While I am still reviewing the patches, I did a performance test with > > > > https://github.com/hzhou8/ovn-test-script for the 200 chassis x 50 > lsp > > > > scale of simulated ovn-k8s topology. > > > > > > > > - Before this patch, a recompute takes 1293ms > > > > - After this patch, a recompute takes 1536ms, which is 20% more time. > > > > - I also did the same test with all the rest of the patches of this > series, > > > > which takes 1690ms. > > > > > > > > I did some testing using the OVN dbs of the ovn-heater density heavy > run. > > A triggered forced recompute with the present main of ovn-northd takes > > around 4389ms > > where as with the patch 9 of this series, it is taking around 4838ms. > > > > I hacked the 'engine_compute_log_timeout_msec' [1] value from 500ms to > > 50ms so that > > inc-eng can log the engine nodes taking more than 50ms in the recompute. > > > > With main > > --- > > 2023-08-18T11:28:20.714Z|00117|inc_proc_eng|INFO|User triggered force > recompute. > > 2023-08-18T11:28:23.189Z|00118|inc_proc_eng|INFO|node: northd, > > recompute (forced) took 2475ms > > 2023-08-18T11:28:25.102Z|00119|inc_proc_eng|INFO|node: lflow, > > recompute (forced) took 1887ms > > 2023-08-18T11:28:25.102Z|00120|timeval|WARN|Unreasonably long 4389ms > > poll interval (4373ms user, 3ms system) > > > > > > And with patch 9 > > - > > 2023-08-18T11:28:10.558Z|00220|inc_proc_eng|INFO|User triggered force > recompute. > > 2023-08-18T11:28:10.883Z|00221|inc_proc_eng|INFO|node: lb_data, > > recompute (forced) took 325ms > > 2023-08-18T11:28:13.163Z|00222|inc_proc_eng|INFO|node: northd, > > recompute (forced) took 2280ms > > 2023-08-18T11:28:13.431Z|00223|inc_proc_eng|INFO|node: sync_to_sb_lb, > > recompute (forced) took 241ms > > 2023-08-18T11:28:15.396Z|00224|inc_proc_eng|INFO|node: lflow, > > recompute (forced) took 1956ms > > 2023-08-18T11:28:15.397Z|00225|timeval|WARN|Unreasonably long 4838ms > > poll interval (4819ms user, 3ms system) > > > > > > As you can see the lflow recompute time is not very significant > > (1887 ms vs 1956 ms) > > > > It is the lb_data engine node which is taking 325ms more. But if you > > see the lb_data engine code, the lb_data handlers never return false. > > So the lb_data full engine recompute cost can be ignored as it will be > > triggered when engine aborts or when the user triggers a recompute. > > > > I don't think the lflow engine recompute cost due to objdep is > > significant if lflow recomputes are less. > > > > My point is that the performance degradation ~20% is caused by just patch > 9, because I did the same test on patch 8 v.s.
Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
On Fri, Aug 18, 2023 at 4:35 AM Numan Siddique wrote: > > On Fri, Aug 18, 2023 at 2:33 PM Numan Siddique wrote: > > > > On Fri, Aug 18, 2023 at 11:38 AM Han Zhou wrote: > > > > > > On Wed, Aug 9, 2023 at 9:38 AM wrote: > > > > > > > > From: Numan Siddique > > > > > > > Thanks Numan for the enhancement. > > > > > > > Instead of maintaing the lport to lflow references using > > > > 'struct lflow_ref_list', this patch makes use of the existing > > > > objdep APIs. Since objdep_mgr is not thread safe, an instance > > > > of objdep_mgr is maintained for each 'ovn_port'. > > > > > > > > Once we add the thread safe support to objdep APIs we can move > > > > the objdep_mgr to 'lflow' engine date. > > > > > > > > Using objdep APIs does come with a cost in memory. We now > > > > maintain an additional hmap of ovn_lflow's to look up using > > > > uuid. But this will be useful to handle datapath and load > > > > balancer changes in the lflow engine node. > > > > > > > > > > I understand that the 'struct lflow_ref_list' may not satisfy the needs of > > > more complex I-P, but I wonder if objdep_mgr (without modification) is the > > > solution. It seems to me more complex at least for this patch without > > > adding obvious benefits. Since I am still reviewing, maybe I missed some > > > points which would show the value from the rest of the patches, and please > > > forgive me for my slow review. > > > > Sure. No worries. > > > > Can you please continue reviewing from the latest v6 ? > > > > http://patchwork.ozlabs.org/project/ovn/list/?series=369384 > > > > I think we can discuss more once you go through all the patches in the > > series (or until patch 14). > > I think that would give a better idea of how it is used and we can > > discuss if it's worth using obj dep mgr. > > > > Also I think the patches P1 - P8 make one set and can be considered > > first. I submitted all these patches > > as one series mainly because > >- all the patches can be applied easily > >- the soft freeze was near. > > > > The patches can be found here too - > > https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6 > > > > Thanks > > Numan > > > > > > > > While I am still reviewing the patches, I did a performance test with > > > https://github.com/hzhou8/ovn-test-script for the 200 chassis x 50 lsp > > > scale of simulated ovn-k8s topology. > > > > > > - Before this patch, a recompute takes 1293ms > > > - After this patch, a recompute takes 1536ms, which is 20% more time. > > > - I also did the same test with all the rest of the patches of this series, > > > which takes 1690ms. > > > > > I did some testing using the OVN dbs of the ovn-heater density heavy run. > A triggered forced recompute with the present main of ovn-northd takes > around 4389ms > where as with the patch 9 of this series, it is taking around 4838ms. > > I hacked the 'engine_compute_log_timeout_msec' [1] value from 500ms to > 50ms so that > inc-eng can log the engine nodes taking more than 50ms in the recompute. > > With main > --- > 2023-08-18T11:28:20.714Z|00117|inc_proc_eng|INFO|User triggered force recompute. > 2023-08-18T11:28:23.189Z|00118|inc_proc_eng|INFO|node: northd, > recompute (forced) took 2475ms > 2023-08-18T11:28:25.102Z|00119|inc_proc_eng|INFO|node: lflow, > recompute (forced) took 1887ms > 2023-08-18T11:28:25.102Z|00120|timeval|WARN|Unreasonably long 4389ms > poll interval (4373ms user, 3ms system) > > > And with patch 9 > - > 2023-08-18T11:28:10.558Z|00220|inc_proc_eng|INFO|User triggered force recompute. > 2023-08-18T11:28:10.883Z|00221|inc_proc_eng|INFO|node: lb_data, > recompute (forced) took 325ms > 2023-08-18T11:28:13.163Z|00222|inc_proc_eng|INFO|node: northd, > recompute (forced) took 2280ms > 2023-08-18T11:28:13.431Z|00223|inc_proc_eng|INFO|node: sync_to_sb_lb, > recompute (forced) took 241ms > 2023-08-18T11:28:15.396Z|00224|inc_proc_eng|INFO|node: lflow, > recompute (forced) took 1956ms > 2023-08-18T11:28:15.397Z|00225|timeval|WARN|Unreasonably long 4838ms > poll interval (4819ms user, 3ms system) > > > As you can see the lflow recompute time is not very significant > (1887 ms vs 1956 ms) > > It is the lb_data engine node which is taking 325ms more. But if you > see the lb_data engine code, the lb_data handlers never return false. > So the lb_data full engine recompute cost can be ignored as it will be > triggered when engine aborts or when the user triggers a recompute. > > I don't think the lflow engine recompute cost due to objdep is > significant if lflow recomputes are less. > My point is that the performance degradation ~20% is caused by just patch 9, because I did the same test on patch 8 v.s. patch 9. (and patch 8 is similar to main in terms of the recompute time, although I did see a big percentage increase of time spent for VIF I-P, which I will check later separately) I did the same test for v6 (on top of the latest main) between patch 8 and patch 9. The percentage of the gap is
Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
On Fri, Aug 18, 2023 at 2:33 PM Numan Siddique wrote: > > On Fri, Aug 18, 2023 at 11:38 AM Han Zhou wrote: > > > > On Wed, Aug 9, 2023 at 9:38 AM wrote: > > > > > > From: Numan Siddique > > > > > Thanks Numan for the enhancement. > > > > > Instead of maintaing the lport to lflow references using > > > 'struct lflow_ref_list', this patch makes use of the existing > > > objdep APIs. Since objdep_mgr is not thread safe, an instance > > > of objdep_mgr is maintained for each 'ovn_port'. > > > > > > Once we add the thread safe support to objdep APIs we can move > > > the objdep_mgr to 'lflow' engine date. > > > > > > Using objdep APIs does come with a cost in memory. We now > > > maintain an additional hmap of ovn_lflow's to look up using > > > uuid. But this will be useful to handle datapath and load > > > balancer changes in the lflow engine node. > > > > > > > I understand that the 'struct lflow_ref_list' may not satisfy the needs of > > more complex I-P, but I wonder if objdep_mgr (without modification) is the > > solution. It seems to me more complex at least for this patch without > > adding obvious benefits. Since I am still reviewing, maybe I missed some > > points which would show the value from the rest of the patches, and please > > forgive me for my slow review. > > Sure. No worries. > > Can you please continue reviewing from the latest v6 ? > > http://patchwork.ozlabs.org/project/ovn/list/?series=369384 > > I think we can discuss more once you go through all the patches in the > series (or until patch 14). > I think that would give a better idea of how it is used and we can > discuss if it's worth using obj dep mgr. > > Also I think the patches P1 - P8 make one set and can be considered > first. I submitted all these patches > as one series mainly because >- all the patches can be applied easily >- the soft freeze was near. > > The patches can be found here too - > https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6 > > Thanks > Numan > > > > > While I am still reviewing the patches, I did a performance test with > > https://github.com/hzhou8/ovn-test-script for the 200 chassis x 50 lsp > > scale of simulated ovn-k8s topology. > > > > - Before this patch, a recompute takes 1293ms > > - After this patch, a recompute takes 1536ms, which is 20% more time. > > - I also did the same test with all the rest of the patches of this series, > > which takes 1690ms. > > I did some testing using the OVN dbs of the ovn-heater density heavy run. A triggered forced recompute with the present main of ovn-northd takes around 4389ms where as with the patch 9 of this series, it is taking around 4838ms. I hacked the 'engine_compute_log_timeout_msec' [1] value from 500ms to 50ms so that inc-eng can log the engine nodes taking more than 50ms in the recompute. With main --- 2023-08-18T11:28:20.714Z|00117|inc_proc_eng|INFO|User triggered force recompute. 2023-08-18T11:28:23.189Z|00118|inc_proc_eng|INFO|node: northd, recompute (forced) took 2475ms 2023-08-18T11:28:25.102Z|00119|inc_proc_eng|INFO|node: lflow, recompute (forced) took 1887ms 2023-08-18T11:28:25.102Z|00120|timeval|WARN|Unreasonably long 4389ms poll interval (4373ms user, 3ms system) And with patch 9 - 2023-08-18T11:28:10.558Z|00220|inc_proc_eng|INFO|User triggered force recompute. 2023-08-18T11:28:10.883Z|00221|inc_proc_eng|INFO|node: lb_data, recompute (forced) took 325ms 2023-08-18T11:28:13.163Z|00222|inc_proc_eng|INFO|node: northd, recompute (forced) took 2280ms 2023-08-18T11:28:13.431Z|00223|inc_proc_eng|INFO|node: sync_to_sb_lb, recompute (forced) took 241ms 2023-08-18T11:28:15.396Z|00224|inc_proc_eng|INFO|node: lflow, recompute (forced) took 1956ms 2023-08-18T11:28:15.397Z|00225|timeval|WARN|Unreasonably long 4838ms poll interval (4819ms user, 3ms system) As you can see the lflow recompute time is not very significant (1887 ms vs 1956 ms) It is the lb_data engine node which is taking 325ms more. But if you see the lb_data engine code, the lb_data handlers never return false. So the lb_data full engine recompute cost can be ignored as it will be triggered when engine aborts or when the user triggers a recompute. I don't think the lflow engine recompute cost due to objdep is significant if lflow recomputes are less. Thanks Numan > > So the major increase of the recompute time is from this patch. I didn't > > debug what's the exact cause of this increase, but I think it might be > > related to the objdep usage. > > > > > This patch does few more changes which are significant: > > > - Logical flows are now hashed without the logical > > > datapaths. If a logical flow is referenced by just one > > > datapath, we don't rehash it. > > > > > > - The synthetic 'hash' column of sbrec_logical_flow now > > > doesn't use the logical datapath too. This means that > > > when ovn-northd is updated/upgraded and has this commit, > > > all the logical flows with
Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
On Fri, Aug 18, 2023 at 11:38 AM Han Zhou wrote: > > On Wed, Aug 9, 2023 at 9:38 AM wrote: > > > > From: Numan Siddique > > > Thanks Numan for the enhancement. > > > Instead of maintaing the lport to lflow references using > > 'struct lflow_ref_list', this patch makes use of the existing > > objdep APIs. Since objdep_mgr is not thread safe, an instance > > of objdep_mgr is maintained for each 'ovn_port'. > > > > Once we add the thread safe support to objdep APIs we can move > > the objdep_mgr to 'lflow' engine date. > > > > Using objdep APIs does come with a cost in memory. We now > > maintain an additional hmap of ovn_lflow's to look up using > > uuid. But this will be useful to handle datapath and load > > balancer changes in the lflow engine node. > > > > I understand that the 'struct lflow_ref_list' may not satisfy the needs of > more complex I-P, but I wonder if objdep_mgr (without modification) is the > solution. It seems to me more complex at least for this patch without > adding obvious benefits. Since I am still reviewing, maybe I missed some > points which would show the value from the rest of the patches, and please > forgive me for my slow review. Sure. No worries. Can you please continue reviewing from the latest v6 ? http://patchwork.ozlabs.org/project/ovn/list/?series=369384 I think we can discuss more once you go through all the patches in the series (or until patch 14). I think that would give a better idea of how it is used and we can discuss if it's worth using obj dep mgr. Also I think the patches P1 - P8 make one set and can be considered first. I submitted all these patches as one series mainly because - all the patches can be applied easily - the soft freeze was near. The patches can be found here too - https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6 Thanks Numan > > While I am still reviewing the patches, I did a performance test with > https://github.com/hzhou8/ovn-test-script for the 200 chassis x 50 lsp > scale of simulated ovn-k8s topology. > > - Before this patch, a recompute takes 1293ms > - After this patch, a recompute takes 1536ms, which is 20% more time. > - I also did the same test with all the rest of the patches of this series, > which takes 1690ms. > > So the major increase of the recompute time is from this patch. I didn't > debug what's the exact cause of this increase, but I think it might be > related to the objdep usage. > > > This patch does few more changes which are significant: > > - Logical flows are now hashed without the logical > > datapaths. If a logical flow is referenced by just one > > datapath, we don't rehash it. > > > > - The synthetic 'hash' column of sbrec_logical_flow now > > doesn't use the logical datapath too. This means that > > when ovn-northd is updated/upgraded and has this commit, > > all the logical flows with 'logical_datapath' column > > set will get deleted and re-added causing some disruptions. > > > > - With the commit [1] which added I-P support for logical > > port changes, multiple logical flows with same match 'M' > > and actions 'A' are generated and stored without the > > dp groups, which was not the case prior to > > that patch. > > One example to generate these lflows is: > > ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1" > > ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1" > > ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1" > > > > Now with this patch we go back to the earlier way. i.e > > one logical flow with logical_dp_groups set. > > > Thanks for taking the effort to achieve this. The approach is indeed very > smart. I was aware of such scenarios, but it just didn''t seem to impact > scalability because the number of such lflows should be small, so it wasn't > considered a priority. > > > - With this patch any updates to a logical port which > > doesn't result in new logical flows will not result in > > deletion and addition of same logical flows. > > Eg. > > ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar > > will be a no-op to the SB logical flow table. > > > Similar to the above case, this doesn't seem to be critical for > scalability, but it is good to see the enhancement. > I will post more feedback after finishing the full review. > > Thanks, > Han > > > [1] - 8bbd67891f68("northd: Incremental processing of VIF additions in > 'lflow' node.") > > > > Suggested-by: Dumitru Ceara > > Signed-off-by: Numan Siddique > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
On Wed, Aug 9, 2023 at 9:38 AM wrote: > > From: Numan Siddique > Thanks Numan for the enhancement. > Instead of maintaing the lport to lflow references using > 'struct lflow_ref_list', this patch makes use of the existing > objdep APIs. Since objdep_mgr is not thread safe, an instance > of objdep_mgr is maintained for each 'ovn_port'. > > Once we add the thread safe support to objdep APIs we can move > the objdep_mgr to 'lflow' engine date. > > Using objdep APIs does come with a cost in memory. We now > maintain an additional hmap of ovn_lflow's to look up using > uuid. But this will be useful to handle datapath and load > balancer changes in the lflow engine node. > I understand that the 'struct lflow_ref_list' may not satisfy the needs of more complex I-P, but I wonder if objdep_mgr (without modification) is the solution. It seems to me more complex at least for this patch without adding obvious benefits. Since I am still reviewing, maybe I missed some points which would show the value from the rest of the patches, and please forgive me for my slow review. While I am still reviewing the patches, I did a performance test with https://github.com/hzhou8/ovn-test-script for the 200 chassis x 50 lsp scale of simulated ovn-k8s topology. - Before this patch, a recompute takes 1293ms - After this patch, a recompute takes 1536ms, which is 20% more time. - I also did the same test with all the rest of the patches of this series, which takes 1690ms. So the major increase of the recompute time is from this patch. I didn't debug what's the exact cause of this increase, but I think it might be related to the objdep usage. > This patch does few more changes which are significant: > - Logical flows are now hashed without the logical > datapaths. If a logical flow is referenced by just one > datapath, we don't rehash it. > > - The synthetic 'hash' column of sbrec_logical_flow now > doesn't use the logical datapath too. This means that > when ovn-northd is updated/upgraded and has this commit, > all the logical flows with 'logical_datapath' column > set will get deleted and re-added causing some disruptions. > > - With the commit [1] which added I-P support for logical > port changes, multiple logical flows with same match 'M' > and actions 'A' are generated and stored without the > dp groups, which was not the case prior to > that patch. > One example to generate these lflows is: > ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1" > ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1" > ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1" > > Now with this patch we go back to the earlier way. i.e > one logical flow with logical_dp_groups set. > Thanks for taking the effort to achieve this. The approach is indeed very smart. I was aware of such scenarios, but it just didn''t seem to impact scalability because the number of such lflows should be small, so it wasn't considered a priority. > - With this patch any updates to a logical port which > doesn't result in new logical flows will not result in > deletion and addition of same logical flows. > Eg. > ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar > will be a no-op to the SB logical flow table. > Similar to the above case, this doesn't seem to be critical for scalability, but it is good to see the enhancement. I will post more feedback after finishing the full review. Thanks, Han > [1] - 8bbd67891f68("northd: Incremental processing of VIF additions in 'lflow' node.") > > Suggested-by: Dumitru Ceara > Signed-off-by: Numan Siddique ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v5 09/14] northd: Use objdep mgr for lport to lflow references.
Bleep bloop. Greetings Numan Siddique, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line has non-spaces leading whitespace WARNING: Line has trailing whitespace #3073 FILE: northd/northd.c:17013: Lines checked: 3478, Warnings: 2, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev