On 8/14/23 18:51, Han Zhou wrote:
> On Mon, Aug 14, 2023 at 4:46 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On 8/12/23 07:08, Han Zhou wrote:
>>> On Fri, Aug 11, 2023 at 6:07 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>
>>>> On 8/10/23 18:38, Ilya Maximets wrote:
>>>>> On 8/10/23 17:34, Dumitru Ceara wrote:
>>>>>> On 8/10/23 17:20, Han Zhou wrote:
>>>>>>> On Thu, Aug 10, 2023 at 6:36 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 8/10/23 15:34, Han Zhou wrote:
>>>>>>>>> On Thu, Aug 10, 2023 at 2:29 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 8/10/23 08:12, Ales Musil wrote:
>>>>>>>>>>> On Wed, Aug 9, 2023 at 5:13 PM Mark Michelson <mmich...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ales,
>>>>>>>>>>>>
>>>>>>>>>>>> I have some high-level comments/questions about this patch.
>>>>>>>>>>>
>>>>>>>>>>> Hi Mark,
>>>>>>>>>>
>>>>>>>>>> Hi Ales, Mark,
>>>>>>>>>>
>>>>>>>>>>> thank you for the review. See my answers inline below.
>>>>>>>>>>>
>>>>>>>>>>>> I have been privy to the conversations that led to this change.
>>>>>>>>>>>> My understanding is that by having ovn-northd wake up immediately,
>>>>>>>>>>>> it can be more CPU-intensive than waiting a bit for changes to
>>>>>>>>>>>> accumulate and handling all of those at once instead. However,
>>>>>>>>>>>> nothing in either the commit message or ovn-nb.xml explains what
>>>>>>>>>>>> the purpose of this new configuration option is. I think you
>>>>>>>>>>>> should add a sentence or two to explain why someone would want to
>>>>>>>>>>>> enable this option.
>>>>>>>>>>>
>>>>>>>>>>> Yeah that's my bad, I have v2 prepared with some explanation in the
>>>>>>>>>>> commit message together with results from the scale run.
>>>>>>>>>>
>>>>>>>>>> +1 we really need to explain why this change is needed.
>>>>>>>>>>
>>>>>>>>>>>> Next, the algorithm used here strikes me as odd. We use the
>>>>>>>>>>>> previous run time of ovn-northd to determine how long to wait
>>>>>>>>>>>> before running again. This delay is capped by the configured
>>>>>>>>>>>> backoff time. Let's say that we've configured the backoff
>>>>>>>>>>>> interval to be 200 ms. If ovn-northd has a super quick run and
>>>>>>>>>>>> only takes 10 ms, then we will only delay the next run by 10 ms.
>>>>>>>>>>>> IMO, this seems like it would not mitigate the original issue by
>>>>>>>>>>>> much, since we are only allowing a maximum of 20 ms (10 ms for
>>>>>>>>>>>> the run of ovn-northd + 10 ms delay) of NB changes to accumulate.
>>>>>>>>>>>> Conversely, if northd has a huge recompute and it takes 5000 ms
>>>>>>>>>>>> to complete, then we would delay the next run by 200 ms. In this
>>>>>>>>>>>> case, delaying at all seems like it's not necessary since we
>>>>>>>>>>>> potentially have 5000 ms worth of NB DB updates that have not
>>>>>>>>>>>> been addressed. I would have expected the opposite approach to be
>>>>>>>>>>>> taken. If someone configures 200 ms as their backoff interval, I
>>>>>>>>>>>> would expect us to always allow a *minimum* of 200 ms of NB
>>>>>>>>>>>> changes to accumulate before running again. So for instance, if
>>>>>>>>>>>> northd runs quickly and is done in 10 ms, then we would wait
>>>>>>>>>>>> 200 - 10 = 190 ms before processing changes again. If northd
>>>>>>>>>>>> takes a long time to recompute and takes 5000 ms, then we would
>>>>>>>>>>>> not wait at all before processing changes again. Was the
>>>>>>>>>>>> algorithm chosen based on experimentation? Is it a well-known
>>>>>>>>>>>> method I'm just not familiar with?
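For illustration only, the "minimum accumulation window" Mark describes above could be sketched like this (a hypothetical helper name; this is not code from the patch):

```c
#include <assert.h>

/* Hypothetical sketch of Mark's suggestion: always let at least
 * 'backoff_ms' worth of NB changes accumulate, counting the time the
 * previous engine run already took against that budget. */
static long long
min_accumulation_delay(long long backoff_ms, long long prev_run_ms)
{
    return prev_run_ms >= backoff_ms ? 0 : backoff_ms - prev_run_ms;
}
```

With a 200 ms interval this reproduces Mark's numbers: a 10 ms run is followed by a 190 ms wait, and a 5000 ms recompute by no wait at all.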
>>>>>>>>>>
>>>>>>>>>> I think the main assumption (that should probably be made explicit
>>>>>>>>>> in the commit log and/or documentation) is that on average changes
>>>>>>>>>> happen in a uniform way. This might not always be accurate.
>>>>>>>>>>
>>>>>>>>>> However, if we're off with the estimate, in the worst case we'd be
>>>>>>>>>> adding the configured max delay to the latency of processing
>>>>>>>>>> changes. So, as long as the value is not extremely high, the impact
>>>>>>>>>> is not that high either.
>>>>>>>>>>
>>>>>>>>>> Last but not least, as this value would be configured by the CMS,
>>>>>>>>>> we assume the CMS knows what they're doing. :)
>>>>>>>>>>
>>>>>>>>>>> I'm not sure if the algorithm is well known.
>>>>>>>>>>>
>>>>>>>>>>> The thing is that at scale we almost always cap at the backoff, so
>>>>>>>>>>> it probably has the same effect as your suggestion, with the
>>>>>>>>>>> difference that we actually delay even after long runs. And that
>>>>>>>>>>> is actually desired; it's true that, let's say, 500 ms should be
>>>>>>>>>>> enough to accumulate more changes, however that can lead to
>>>>>>>>>>> another 500 ms run and so on. That in the end means that northd
>>>>>>>>>>> will spin at 100% CPU anyway, which is what we want to avoid. So
>>>>>>>>>>> from another point of view the accumulation of IDL changes is a
>>>>>>>>>>> secondary effect which is still desired, but not the main purpose.
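As described in this thread, the patch delays the next engine run by the previous run time, capped at the configured backoff. A hypothetical sketch of that rule (illustrative only, not the actual patch code):

```c
#include <assert.h>

/* Hypothetical sketch of the delay rule discussed in this thread: sleep
 * for as long as the previous engine run took, capped at the configured
 * maximum backoff. */
static long long
capped_backoff_delay(long long prev_run_ms, long long max_backoff_ms)
{
    return prev_run_ms < max_backoff_ms ? prev_run_ms : max_backoff_ms;
}
```

A 10 ms run is followed by a 10 ms sleep; any run of 200 ms or longer is followed by the full 200 ms sleep, which is why at scale the delay almost always caps at the backoff.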
>>>>>>>>>>>
>>>>>>>>>>> Also delaying by a short time if the previous run was short is
>>>>>>>>>>> fine; you are right that we don't accumulate enough, however
>>>>>>>>>>> during short running times there is a high chance that northd
>>>>>>>>>>> would get to sleep anyway (we will help it to sleep at least a
>>>>>>>>>>> bit nevertheless).
>>>>>>>>>>>
>>>>>>>>>>>> Next, I notice that you've added new poll_timer_wait() calls but
>>>>>>>>>>>> haven't changed the ovsdb_idl_loop_run() or
>>>>>>>>>>>> ovsdb_idl_loop_commit_and_wait() calls. Is there any danger of
>>>>>>>>>>>> ovn-northd getting in a busy loop of sleeping and waking because
>>>>>>>>>>>> of this? I don't think it should, since presumably
>>>>>>>>>>>> ovsdb_idl_loop_run() should clear the conditions waited on by
>>>>>>>>>>>> ovsdb_idl_loop_commit_and_wait(), but I want to double-check.
>>>>>>>>>>>
>>>>>>>>>>> AFAIK it shouldn't cause any issues as ovsdb_idl_loop_run() will
>>>>>>>>>>> process anything that it can and the wait will be fine. The
>>>>>>>>>>> problem would be if we skipped the ovsdb_idl_loop_run() for some
>>>>>>>>>>> reason.
>>>>>>>>>>>
>>>>>>>>>>>> Next, does this have any negative impact on our ability to
>>>>>>>>>>>> perform incremental processing in ovn-northd? My concern is that
>>>>>>>>>>>> since we are still running the ovsdb IDL loop, if multiple NB
>>>>>>>>>>>> changes occur during our delay, then we might have to fall back
>>>>>>>>>>>> to a full recompute instead of being able to incrementally
>>>>>>>>>>>> process the changes. Are my concerns valid?
>>>>>>>>>>>
>>>>>>>>>>> I suppose that can happen if there are changes that could result
>>>>>>>>>>> in a "conflict" and full recompute. During the test we haven't
>>>>>>>>>>> seen anything like that.
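Ales's point that the cap bounds CPU usage (rather than maximizing accumulation) can be made concrete with a back-of-the-envelope duty-cycle model. This is an editorial illustration, not from the thread, assuming engine runs of constant length back to back:

```c
#include <assert.h>

/* Worst-case CPU duty cycle (in percent) if northd alternates an engine
 * run of 'run_ms' with a sleep of min(run_ms, cap_ms).  Illustrative
 * model only. */
static long long
duty_cycle_pct(long long run_ms, long long cap_ms)
{
    long long sleep_ms = run_ms < cap_ms ? run_ms : cap_ms;
    return 100 * run_ms / (run_ms + sleep_ms);
}
```

Runs at or below the cap keep northd near 50% of a core; only runs well above the cap push it back toward 100%. With engine runs around the ~180-200 ms mentioned later in the thread, this is roughly consistent with the ~60% CPU observed in the scale test.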
>>>>>>>>>>>
>>>>>>>>>>> The odds of that happening are small because, as stated
>>>>>>>>>>> previously, from the IDL point of view we are doing basically the
>>>>>>>>>>> same as if the engine was always running for a long time, except
>>>>>>>>>>> that we give the IDL a chance to process whatever has piled up
>>>>>>>>>>> within the sleep period.
>>>>>>>>>>>
>>>>>>>>>>>> Next, has scale testing shown that this change has made a
>>>>>>>>>>>> positive impact? If so, is there any recommendation for how to
>>>>>>>>>>>> determine what to configure the value to?
>>>>>>>>>>>
>>>>>>>>>>> It has a huge impact actually; the value tested was 200 ms, the
>>>>>>>>>>> recommendation
>>>>>>>>>>
>>>>>>>>>> This was chosen based on the historical data from similar tests
>>>>>>>>>> which showed that the I-P engine was taking ~180-200 ms to run at
>>>>>>>>>> scale.
>>>>>>>>>>
>>>>>>>>>>> would be < 500 ms. After that point the latency on component
>>>>>>>>>>> creation would be very noticeable. I will put the recommendation
>>>>>>>>>>> into the ovn-nb.xml with the latency comment. Before I post v2
>>>>>>>>>>> (which has the numbers in the commit message), these are the test
>>>>>>>>>>> results:
>>>>>>>>>>>
>>>>>>>>>>> Run without any backoff period:
>>>>>>>>>>> northd aggregate CPU 9810% avg / 12765% max
>>>>>>>>>>> northd was spinning at 100% CPU the entire second half of the test.
>>>>>>>>>>>
>>>>>>>>>>> Run with 200 ms max backoff period:
>>>>>>>>>>> northd aggregate CPU 6066% avg / 7689% max
>>>>>>>>>>> northd was around 60% for the second half of the test.
>>>
>>> Ales, could you clarify the version of OVN and the type of test for the
>>> above result? Is it on the main branch or with Numan's patches for LB
>>> I-P? Does the test include LB changes? Are there recomputes during the
>>> test? Was the 100% CPU during the second half of the test caused by
>>> recompute or by the cost of I-P?
>>>
>> I'll try to reply to these but Ales, please correct me if I'm wrong.
>>
>> Version used for testing: OVN main + Numan's "LB + lflow I-P v5".
>>
>> The test includes LB changes; it brings up a 120-node OpenShift cluster
>> using the latest OVN-IC support that was added recently to
>> ovn-kubernetes. It runs one "zone per node".
>>
>> The test then creates 14K services and 28K pods distributed across
>> 80 namespaces as fast as possible. The test measures lots of things but
>> we were interested in average and P99 time for pods to be "Ready"
>> (networking ready) and CPU usage for ovn-northd on all nodes.
>>
>> As far as I understand there were still occasional recomputes
>> triggered by missing/failing handlers, e.g. sync_to_sb_lb or lflow
>> missing a handler for SB.LB changes. In any case, the CMS (ovn-kube)
>> was continuously adding ports/LBs/resources to NB, as fast as
>> possible; northd was waking up to process those.
>>
>> An example of logs generated by northd I-P while at scale:
>>
>> 2023-08-08T19:12:27.394Z|1149658|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
>> 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
>> 2023-08-08T19:12:27.553Z|1149779|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
>> 2023-08-08T19:12:27.684Z|1149782|inc_proc_eng|DBG|node: sync_to_sb_lb, recompute (missing handler for input SB_load_balancer) took 131ms
>> 2023-08-08T19:12:27.685Z|1149794|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
>> 2023-08-08T19:12:27.846Z|1149893|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
>> 2023-08-08T19:12:27.848Z|1149907|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
>>
>>>>>>>>>>>
>>>>>>>>>>>> Finally, is this change expected to be a long-term necessity?
>>>>>>>>>>>> This option seems to be useful for cases where northd recomputes
>>>>>>>>>>>> are required. Performing recomputes less frequently seems like
>>>>>>>>>>>> it would lower the CPU usage of ovn-northd while still
>>>>>>>>>>>> processing the same amount of changes. However, once northd can
>>>>>>>>>>>> handle most changes incrementally, is there still a benefit to
>>>>>>>>>>>> delaying running? If each run of northd handles all DB changes
>>>>>>>>>>>> incrementally, then is there any point in putting delays between
>>>>>>>>>>>> those incremental runs?
>>>>>>>>>>>
>>>>>>>>>>> Ideally we won't need it in the future. However, the assumption
>>>>>>>>>>> for not needing anything like this is that northd will be fast
>>>>>>>>>>> enough to process I-P changes and be able to sleep until the next
>>>>>>>>>>> batch update arrives from the CMS. That doesn't seem likely to
>>>>>>>>>>> happen in the very near future; one thing to keep in mind is that
>>>>>>>>>>> testing happened with Numan's I-P for LBs and lflows, which makes
>>>>>>>>>>> a huge difference but is still not enough to achieve the
>>>>>>>>>>> mentioned northd state. So from my perspective it will be
>>>>>>>>>>> relevant for a few releases. And as stated above, the point is to
>>>>>>>>>>> prevent northd from spinning at 100% CPU all the time.
>>>>>>>>>>
>>>>>>>>>> +1 it's not the prettiest feature (and some might rightfully call
>>>>>>>>>> it a hack) but it seems to me like the cleanest alternative for
>>>>>>>>>> now, until northd processing is fully incremental.
>>>>>>>>>
>>>>>>>>> In most cases it may be fine, but it might be a problem for a worst
>>>>>>>>> case scenario:
>>>>>>>>>
>>>>>>>>> Assume all the changes coming in NB can be incrementally processed
>>>>>>>>> but at a very very high rate, and ovn-northd keeps processing the
>>>>>>>>> changes incrementally. Since the change rate is so high, ovn-northd
>>>>>>>>> barely keeps up with the changes at 99% CPU load. For example, I-P
>>>>>>>>> for each object takes 10ms, and the change rate is 99 objects/sec.
>>>>>>>>> According to this algorithm, ovn-northd will always sleep for the
>>>>>>>>> maximum 200ms between each IDL run, and then ovn-northd would never
>>>>>>>>> keep up with the changes any more - the backlog will become longer
>>>>>>>>> and longer because of the wasted idle time.
>>>>>>>>
>>>>>>>> IDL runs are not skipped. Just I-P engine runs. So I think this
>>>>>>>> won't be a problem, or am I missing something?
>>>>>>>
>>>>>>> Sorry about the typo, I meant to say between each "engine run"
>>>>>>> instead of "IDL run". The IDL run is not skipped, but the backlog
>>>>>>> (accumulated changes in the IDL) becomes longer and longer.
>>>>>>> E.g.:
>>>>>>>
>>>>>>> (assume change rate is 100 objects/sec)
>>>>>>>
>>>>>>> run-1: handles 1 object, takes 10ms, sleep 10ms
>>>>>>> run-2: handles 2 objects, takes 20ms, sleep 20ms
>>>>>>> run-3: handles 4 objects, takes 40ms, sleep 40ms
>>>>>>> run-4: handles 8 objects, takes 80ms, sleep 80ms
>>>>>>> run-5: handles 16 objects, takes 160ms, sleep 160ms
>>>>>>> run-6: handles 32 objects, takes 320ms, sleep 200ms
>>>>>>> run-7: handles 52 objects (accumulated in 320 + 200 ms), takes 520ms, sleep 200ms
>>>>>>> run-8: handles 72 objects, takes 720ms, sleep 200ms
>>>>>>> run-9: handles 92 objects, takes 920ms, sleep 200ms
>>>>>>> ...
>>>>>>> As we can see, the backlog grows indefinitely if the input keeps
>>>>>>> changing at the rate of 100 obj/s.
>>>>>>
>>>>>> I see now, thanks for the explanation. But isn't this possible today
>>>>>> too? Maybe less probable though.
>>>>>>
>>>>>> Also, we'll probably hit other issues (e.g., timeouts on the CMS side)
>>>>>> because of the backlog, which will (likely?) throttle the CMS.
>>>>>>
>>>>>> What would be a good solution? It seems hard to define what a "large
>>>>>> backlog" is. If we could do that, maybe triggering a full recompute
>>>>>> when the backlog is large enough might be OK.
>>>>>
>>>>> Hmm. That's an interesting problem.
>>>>> From my understanding, the cost of an incremental processing run
>>>>> consists of two parts: some more or less constant engine run cost E
>>>>> and a cost of processing actual changes P. Unless the CMS is changing
>>>>> the same value over and over again, we can't optimize P by
>>>>> accumulating more changes per run. But we can amortize the more or
>>>>> less constant cost E.
>>>>>
>>>>> We had a similar problem with the full recompute in the past. If we
>>>>> assume the cost of a recompute R to be more or less constant
>>>>> regardless of the amount of changes, then we can reduce the total cost
>>>>> by aggregating more changes per run. That's what the run_idl_loop()
>>>>> function is trying to do. It runs the IDL for as long as there are
>>>>> changes (capped at 500ms).
>>>>>
>>>>> We can't really avoid paying the cost P and it will compound with the
>>>>> increased number of changes. The only solution here, AFAICT, is to
>>>>> fall back to full recompute once P > R. Maybe that can be done by
>>>>> somehow calculating the number of tracked vs real rows in the IDL.
>>>>> Current ovn-northd has this problem and the introduction of delays
>>>>> will amplify it.
>>>>
>>>> That is still hard to do, unfortunately. The ratio of IDL tracked vs
>>>> "real" rows is still not a good enough indication. It's possible that
>>>> northd can quickly incrementally process row updates for table T1 but
>>>> that it has to do intensive computations for row updates for table T2.
>>>>
>>>> A simpler (dumb?) approach is to just periodically recompute. But that
>>>> shouldn't happen too often because then we start using a lot of CPU
>>>> again. :)
>>>>
>>>>> The problem this patch is trying to address, IIUC, is that the cost E
>>>>> is still high. So, I'm wondering if the solution can be similar to
>>>>> what we already have for a full recompute.
>>>
>>> My assumption was that the cost E is relatively small in the I-P. I
>>> thought the patch was trying to help when there were still many
>>> recomputes triggered. Please correct me if I was wrong. I posted some
>>> questions regarding Ales's test result, to get a better understanding.
>>
>> I thought E was the "constant" non-incremental cost. In any case,
>> P (the cost for I-P to successfully run) is also not negligible, e.g.,
>> even a successful I-P execution of the lb_data handler for NB.LB_Group
>> changes took 13ms at scale:
>>
>> 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
>
> My point is, the constant cost in the I-P handlers is relatively small.
> The constant cost mainly comes from the recomputes or partial
> recomputes, e.g. when the "northd" node is I-P but "lflow" or
> "sync_to_sb_lb" recomputes. If the definition of E includes the cost of
> recompute of nodes during the engine execution, I think we are on the
> same page. That part is what the delay approach would help the most. But
> if E only includes the constant part of an I-P handler, e.g.
> initializing/destroying some data structures, those shouldn't be a
> significant cost.
>
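Whichever costs E covers exactly, the batching argument can be put into a toy cost model: each engine run pays a fixed cost E plus a per-change cost p, so handling N changes in batches of b costs roughly ceil(N/b) * E + N * p. The numbers below are hypothetical, loosely based on the log excerpts earlier in the thread:

```c
#include <assert.h>

/* Toy model: total cost of handling 'n_changes' when each engine run
 * pays a fixed cost 'e' plus 'p' per change, batching 'batch' changes
 * per run.  Batching amortizes e; the n_changes * p part is unavoidable. */
static long long
total_cost_ms(long long n_changes, long long batch, long long e, long long p)
{
    long long runs = (n_changes + batch - 1) / batch; /* ceil division */
    return runs * e + n_changes * p;
}
```

With e around 131 ms (like the sync_to_sb_lb recompute in the logs) and p around 1 ms, batching 20 changes per run cuts the total for 1000 changes from 132 s to about 7.5 s in this toy model, which is the amortization effect being discussed.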
Yes, the way I see it is that E includes the cost of recomputing nodes.

> And yes, the cost of some I-P handlers is not negligible, but I don't
> think the delay approach would really help in that case.
>

In theory it wouldn't help, but in practice it really depends on how that
handler is implemented. E.g., a handler could somehow batch incremental
update processing for a given table. Regardless, we focus on the former
case, E.

> Does this make sense?
>

I think we're on the same page now.

Thanks,
Dumitru

> Thanks,
> Han
>
>>
>> Regards,
>> Dumitru
>>
>>>
>>> Thanks,
>>> Han
>>>
>>>>> Current run_idl_loop()
>>>>> runs the IDL only while there are changes and it will not continue
>>>>> running it if the next update is delayed even by a very short time
>>>>> interval. What we could potentially do here is to wait for some very
>>>>> short time interval (maybe configurable or maybe not) and check the
>>>>> IDL for changes again. This will allow us to better batch updates and
>>>>> amortize the constant engine run cost E without the need to sleep for
>>>>> an arbitrary 200ms. (We will sleep for an arbitrary 1-10ms at a time,
>>>>> but that seems less arbitrary. :D )
>>>>
>>>> I think we still need to set a max limit for this sequence of short
>>>> sleeps and that max limit needs to be configurable by the user because
>>>> it may increase end-to-end latency for propagating NB changes to
>>>> ovn-controller.
>>>>
>>>> In that case I think I prefer an explicit "arbitrary" max backoff set
>>>> by the user like this patch initially proposed. But I'm not completely
>>>> against other options.
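Han's run-1 through run-9 divergence example above can be reproduced with a short simulation of the capped-delay rule. This is an illustrative model, not northd code: objects arrive at 100/s, each takes 10 ms of I-P time, and northd sleeps min(previous run time, 200 ms) between engine runs.

```c
#include <assert.h>

/* Objects handled by the n-th engine run in Han's scenario: changes
 * arrive at 100 objects/s, each takes 10 ms to process, and northd
 * sleeps min(previous run time, 200 ms) between runs. */
static long long
objects_in_run(int run)
{
    long long backlog = 1; /* one object pending at run-1 */
    for (int i = 1; i < run; i++) {
        long long run_ms = backlog * 10;
        long long sleep_ms = run_ms < 200 ? run_ms : 200;
        backlog = (run_ms + sleep_ms) / 10; /* arrivals while busy + asleep */
    }
    return backlog;
}
```

This matches Han's numbers (run-7 handles 52 objects, run-9 handles 92), and once the 200 ms cap is hit the backlog grows by 20 objects every run, so it diverges even though every individual change is handled incrementally.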
>>>>
>>>>>
>>>>> Something like (for illustration purposes; not tested):
>>>>>
>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>>>> index 4fa1b039e..54e4ecb5f 100644
>>>>> --- a/northd/ovn-northd.c
>>>>> +++ b/northd/ovn-northd.c
>>>>> @@ -689,22 +689,39 @@ run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
>>>>>      unsigned long long duration, start = time_msec();
>>>>>      unsigned int seqno = UINT_MAX;
>>>>>      struct ovsdb_idl_txn *txn;
>>>>> +    int n_before_sleep = -1;
>>>>>      int n = 0;
>>>>>
>>>>>      /* Accumulate database changes as long as there are some,
>>>>>       * but no longer than half a second. */
>>>>> -    while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
>>>>> -           && time_msec() - start < 500) {
>>>>> -        seqno = ovsdb_idl_get_seqno(idl_loop->idl);
>>>>> -        ovsdb_idl_run(idl_loop->idl);
>>>>> -        n++;
>>>>> +    for (;;) {
>>>>> +        while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
>>>>> +               && time_msec() - start < 500) {
>>>>> +            seqno = ovsdb_idl_get_seqno(idl_loop->idl);
>>>>> +            ovsdb_idl_run(idl_loop->idl);
>>>>> +            n++;
>>>>> +        }
>>>>> +        /* If we're not out of time yet, try to sleep for a short 10ms
>>>>> +         * in case we'll have more updates.  Don't sleep again if there
>>>>> +         * were no updates after the previous short sleep. */
>>>>> +        if (n > n_before_sleep + 1 && time_msec() - start < 500) {
>>>>> +            n_before_sleep = n;
>>>>> +            poll_timer_wait(10);
>>>>> +            ovsdb_idl_wait(idl_loop->idl);
>>>>> +            poll_block();
>>>>> +            /* Reset seqno, so we try to run IDL at least one more time. */
>>>>> +            seqno = UINT_MAX;
>>>>> +        } else {
>>>>> +            /* Out of time or no updates since the last sleep. */
>>>>> +            break;
>>>>> +        }
>>>>>      }
>>>>>
>>>>>      txn = ovsdb_idl_loop_run(idl_loop);
>>>>>
>>>>>      duration = time_msec() - start;
>>>>> -    /* ovsdb_idl_run() is called at least 2 times.  Once directly and
>>>>> -     * once in the ovsdb_idl_loop_run().  n > 2 means that we received
>>>>> +    /* ovsdb_idl_run() is called at least 3 times.  Once directly and
>>>>> +     * once in the ovsdb_idl_loop_run().  n > 3 means that we received
>>>>>       * data on at least 2 subsequent calls. */
>>>>>      if (n > 2 || duration > 100) {
>>>>>          VLOG(duration > 500 ? VLL_INFO : VLL_DBG,
>>>>> ---
>>>>>
>>>>> I'm not sure how something like this will impact the total CPU usage.
>>>>> Hopefully, these short sleeps can allow accumulating more changes and
>>>>> amortize the constant costs better while also reducing perceived CPU
>>>>> usage. Needs testing, for sure. One other advantage of such a
>>>>> solution is the code locality, i.e. all the logic being in one place.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev