On 8/24/22 18:32, Ilya Maximets wrote:
> On 8/24/22 17:57, Dumitru Ceara wrote:
>> On 8/24/22 17:49, Ilya Maximets wrote:
>>> On 8/24/22 17:43, Ilya Maximets wrote:
>>>> On 8/23/22 16:42, Dumitru Ceara wrote:
>>>>> After removing the possibility of disabling logical datapath groups, the
>>>>> above mentioned state should only be relevant when ovn-northd is started
>>>>> with more than one lflow processing thread (--n-threads > 1), to avoid
>>>>> a very inefficient first lflow processing run due to a suboptimally
>>>>> sized hmap.
>>>>>
>>>>> There's no need to ever go back to this state now that we don't allow
>>>>> users to disable logical datapath groups.  Moreover, the previous code
>>>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>>>>> iteration, essentially disabling parallelization.
>>>>>
>>>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical
>>>>> datapath groups.")
>>>>> Reported-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>>>>> ---
>>>>
>>>> Hi, Dumitru.  Thanks for the patch!
>>
>> Hi Ilya,
>>
>>>> I've been using it in multiple scale tests since it was posted,
>>>> and it does re-enable parallel lflow build.
>>>>
>>
>> Thanks for trying it out!
>>
>>>> One issue, though.  I'm not sure how to fix it or what the desired
>>>> behavior is, but what I noticed is:
>>>>
>>>> 1. A cluster with 3 northd instances starts up.
>>>> 2. After a while, all of them have been active at least once, due
>>>>    to leadership transfers or other things.
>>>> 3. If one instance stays active for long, a lot of database changes
>>>>    can accumulate.
>>>> 4. Once the leadership flips, the newly active northd is not prepared
>>>>    for the amount of logical flows it suddenly needs to manage.  So it
>>>>    starts parallel processing with an hmap significantly smaller than
>>>>    it actually needs.  That iteration takes noticeably more time than
>>>>    a usual one.  After the first iteration after becoming active, it
>>>>    can process lflows at normal speed.
>>>>
>>>> So, I wonder if we can fall into a catastrophic scenario where
>>>> the current hash map size is very small and the number of flows after
>>>> re-connection is very high, so the threads are just fighting for locks
>>>> all the time.
>>>>
>>>> Do you have some thoughts on this?
>>>>
>>
>> This seems like a possible scenario (although I wonder how often we'd
>> hit it in practice).
>>
>> But this was a possible scenario before 9dea0c090e91 ("nb: Remove
>> possibility of disabling logical datapath groups.") too, right?
>
> Probably, yes.  There's no need to block this particular patch, as
> it fixes a separate bug.  So,
>
> Acked-by: Ilya Maximets <i.maxim...@ovn.org>
>
Thanks!

>>
>>>> Should we disable parallelism when northd goes into stand-up state?
>>>
>>> * I meant 'standby', of course. :/
>>>
>>
>> We could, but that means every time northd becomes active it will do
>> one first inefficient run.  Can we instead store the max_seen_lflow_size
>> value in the SB database and use the one we read from the DB?
>
> It's an interesting idea, if it can be implemented efficiently.  I'm not
> sure how much extra traffic or northd wakeups that would generate.
> It might make more sense to store it in the NB, just to avoid spamming
> all the controllers with monitor-all.  Also, the NB DB generally has
> more room performance-wise.
>

I'll try to work on a patch to store it in the NB; we already write to the
NB for other things, like Logical_Switch_Port.up, so it should be
acceptable.  I'll run some benchmarks to see if there's any measurable
negative impact because of this and report back.

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev