On 2/5/24 11:34, Ilya Maximets wrote:
> On 2/5/24 09:23, Dumitru Ceara wrote:
>> On 2/5/24 08:13, Han Zhou wrote:
>>> On Sun, Feb 4, 2024 at 9:26 PM Numan Siddique <num...@ovn.org> wrote:
>>>> On Sun, Feb 4, 2024 at 9:53 PM Han Zhou <hz...@ovn.org> wrote:
>>>>> On Sun, Feb 4, 2024 at 5:46 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>
>>>>>>>>>> 35 files changed, 9681 insertions(+), 4645 deletions(-)
>>>>>>>>>
>>>>>>>>> I had another look at this series and acked the remaining patches.
>>>>>>>>> I just had some minor comments that can be easily fixed when
>>>>>>>>> applying the patches to the main branch.
>>>>>>>>>
>>>>>>>>> Thanks for all the work on this!  It was a very large change, but
>>>>>>>>> it improves northd performance significantly.  I just hope we
>>>>>>>>> don't introduce too many bugs.  Hopefully the time we have until
>>>>>>>>> release will allow us to further test this change on the 24.03
>>>>>>>>> branch.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Dumitru
>>>>>>>>
>>>>>>>> Thanks a lot, Dumitru and Han, for the reviews and patience.
>>>>>>>>
>>>>>>>> I addressed the comments and applied the patches to main and also
>>>>>>>> to branch-24.03.
>>>>>>>>
>>>>>>>> @Han - I know you wanted to take another look into v6.  I didn't
>>>>>>>> want to delay further as branch-24.03 was created.  I'm more than
>>>>>>>> happy to submit follow-up patches if you have any comments to
>>>>>>>> address.  Please let me know.
>>>>>>>
>>>>>>> Hi Numan,
>>>>>>>
>>>>>>> I was writing the reply and saw your email just now.  Thanks a lot
>>>>>>> for taking on a huge effort to achieve this great optimization.  I
>>>>>>> only left one comment, on the implicit dependency left for the
>>>>>>> en_lrnat -> en_lflow.  Feel free to address it with a follow-up; no
>>>>>>> need to block the branching.  And take my Ack for the series with
>>>>>>> that addressed.
>>>>>>>
>>>>>>> Acked-by: Han Zhou <hzhou at ovn.org>
>>>>>>
>>>>>> Hi, Numan, Dumitru and Han.
>>>>>>
>>>>>> I see a huge negative performance impact, most likely from this set,
>>>>>> on ovn-heater's cluster-density tests.  The memory consumption on
>>>>>> northd
>>
>> Thanks for reporting this, Ilya!
>>
>>>>>> jumped about 4x, and it constantly recomputes due to failures of the
>>>>>> port_group handler:
>>>>>>
>>>>>> 2024-02-03T11:09:12.441Z|01680|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9762ms
>>>>>> 2024-02-03T11:09:12.444Z|01681|timeval|WARN|Unreasonably long 9898ms poll interval (5969ms user, 1786ms system)
>>>>>> ...
>>>>>> 2024-02-03T11:09:23.770Z|01690|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9014ms
>>>>>> 2024-02-03T11:09:23.773Z|01691|timeval|WARN|Unreasonably long 9118ms poll interval (5376ms user, 1515ms system)
>>>>>> ...
>>>>>> 2024-02-03T11:09:36.692Z|01699|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 10695ms
>>>>>> 2024-02-03T11:09:36.696Z|01700|timeval|WARN|Unreasonably long 10890ms poll interval (6085ms user, 2745ms system)
>>>>>> ...
>>>>>> 2024-02-03T11:09:49.133Z|01708|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input port_group) took 9985ms
>>>>>> 2024-02-03T11:09:49.137Z|01709|timeval|WARN|Unreasonably long 10108ms poll interval (5521ms user, 2440ms system)
>>>>>>
>>>>>> That increases the 95% ovn-installed latency in the 500-node
>>>>>> cluster-density test from 3.6 seconds last week to 21.5 seconds this
>>>>>> week.
>>>>>>
>>>>>> I think this should be a release blocker.
>>>>>>
>>>>>> Memory usage is also very concerning.  Unfortunately, it is not tied
>>>>>> to the cluster-density test.  The same 4-5x RSS jump is also seen in
>>>>>> other tests like density-heavy.  Last week the RSS of ovn-northd in
>>>>>> the 500-node cluster-density test was between 1.5 and 2.5 GB; this
>>>>>> week we have a range between 5.5 and 8.5 GB.
>>>>>>
>>>>>> I would consider this a release blocker as well.
>>
>> I agree, we shouldn't release 24.03.0 unless these two issues are
>> (sufficiently) addressed.  We do have until March 1st (official release
>> date) to do that or to revert any patches that cause regressions.
>>
>>>>>> I don't have direct evidence that this particular series is the
>>>>>> culprit, but it looks like the most likely candidate.  I can dig
>>>>>> more into the investigation on Monday.
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>> Thanks, Ilya, for reporting this.  The 95% latency and 4x RSS
>>>>> increase is a little surprising to me.  I did test this series with
>>>>> my scale test scripts for recompute performance regression; it was a
>>>>> 10+% increase in latency.  I even dug a little into it and noticed a
>>>>> ~5% increase caused by the hmap used to maintain the lflows in each
>>>>> lflow_ref.  This was discussed in the code review for an earlier
>>>>> version (v2/v3).  Overall it didn't look too bad, given that we now
>>>>> handle the most common scenarios incrementally, and it is reasonable
>>>>> to have some cost for maintaining the references/index for
>>>>> incremental processing.  I wonder if my test scenario was too simple
>>>>> (it didn't include LBs) to find the problems, so today I did another
>>>>> test including an LB group with 1k LBs applied to 100 node-LS & GR,
>>>>> plus another 1k LBs per node-LS & GR (101k LBs in total).  I did see
>>>>> more performance penalty, but still within ~20%, and for memory I
>>>>> didn't notice a significant increase (<10%).  I believe I am missing
>>>>> some specific scenario that had the big impact in the ovn-heater
>>>>> tests.  Please share if you dig out more clues.
>>>>
>>>> Hi Ilya,
>>>>
>>>> Thanks for reporting these details.
>>>>
>>>> I had a look at this regression.  There is a significant increase in
>>>> the lflow recompute time (around 4x) in my local testing, and this is
>>>> definitely not acceptable.
>>>>
>>>> In this particular cluster-density test, whenever a port group is
>>>> created it results in a full recompute, and now that the recompute
>>>> time has increased, it has a cumulative effect on the latency of the
>>>> test.
>>>>
>>>> The dp reference counting [1] added in v4 of this series introduced
>>>> this regression (both CPU and memory).  I'm working on the fix and I
>>>> think I should be able to address this soon.
>>>>
>>>> [1] - https://github.com/ovn-org/ovn/blob/main/northd/lflow-mgr.c#L696
>>>
>>> Indeed, this is the same concern I mentioned when reviewing v5:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/411194.html
>
> Yeah.
> In this cluster-density 500-node test, dp_refcnt_use() allocates 120 M
> references and puts them into hash maps.  That alone takes several GB
> of memory, and it is the largest consumer according to massif.  It
> accounts for ~45% of total memory allocations.  Another ~25% is taken
> by the hash maps where the refcounts are stored.
>
> There is a simple optimization that can be made: do not allocate a
> refcount the first time we try to add it - the original bitmap should
> be enough.  Only if we try to use it again while it is already set in
> the bitmap do we allocate.  That saves a lot of space, because 80% of
> all refcounts in that test are never used more than once.  However, it
> might not be enough; it needs testing.  Ideally, the whole structure
> needs to go or be replaced by something different.  Frequent iterations
> over the bitmaps and per-datapath * per-lflow memory allocations do not
> scale.  Even if we can make it work reasonably well for 500 nodes, it
> will cause problems at higher scale.
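To make the idea above a bit more concrete, here is a minimal standalone
sketch of the "allocate a refcount only on second use" scheme.  All names
(dp_use_record, dp_use, dp_unuse) are hypothetical and do not correspond
to the actual northd/lflow-mgr.c code; in particular, the real
implementation would keep the extra refcounts in an hmap keyed by the
datapath index, while the sketch uses a flat array just to stay short:

/* Hypothetical sketch only - not the actual lflow-mgr.c code. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct dp_use_record {
    size_t n_datapaths;
    bool *in_use;       /* One flag per datapath ("the bitmap").        */
    unsigned *extra;    /* Extra uses beyond the first; allocated lazily
                         * only when some datapath adds the lflow twice. */
};

/* Records one more use of the lflow by datapath 'dp'.  The common case
 * (first use) only flips a flag; counters are allocated on second use. */
static void
dp_use(struct dp_use_record *rec, size_t dp)
{
    if (!rec->in_use[dp]) {
        rec->in_use[dp] = true;     /* First use: the bitmap is enough. */
        return;
    }
    if (!rec->extra) {
        rec->extra = calloc(rec->n_datapaths, sizeof *rec->extra);
    }
    rec->extra[dp]++;               /* Second and later uses are counted. */
}

/* Drops one use; returns true if 'dp' no longer references the lflow. */
static bool
dp_unuse(struct dp_use_record *rec, size_t dp)
{
    if (rec->extra && rec->extra[dp]) {
        rec->extra[dp]--;
        return false;
    }
    rec->in_use[dp] = false;
    return true;
}

int
main(void)
{
    struct dp_use_record rec = {
        .n_datapaths = 4,
        .in_use = calloc(4, sizeof(bool)),
    };

    dp_use(&rec, 1);                                  /* No counter allocated.  */
    dp_use(&rec, 1);                                  /* Counter kicks in here. */
    printf("still used: %d\n", !dp_unuse(&rec, 1));   /* -> 1 */
    printf("still used: %d\n", !dp_unuse(&rec, 1));   /* -> 0 */

    free(rec.in_use);
    free(rec.extra);
    return 0;
}

Since ~80% of the refcounts in the reported test are used only once, the
common path in such a scheme never leaves the bitmap, which is where the
expected memory saving comes from.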
FWIW, I have a small prototype for this partial solution that seems to
noticeably decrease the memory usage here:
  https://github.com/igsilya/ovn/commit/5ac3a78e287708fd571344c74a021e418fde31a4

I need to run this through the scale run to get actual results though.

Best regards, Ilya Maximets.