On Mon, Aug 14, 2023 at 4:46 AM Dumitru Ceara <dce...@redhat.com> wrote: > > On 8/12/23 07:08, Han Zhou wrote: > > On Fri, Aug 11, 2023 at 6:07 AM Dumitru Ceara <dce...@redhat.com> wrote: > >> > >> On 8/10/23 18:38, Ilya Maximets wrote: > >>> On 8/10/23 17:34, Dumitru Ceara wrote: > >>>> On 8/10/23 17:20, Han Zhou wrote: > >>>>> On Thu, Aug 10, 2023 at 6:36 AM Dumitru Ceara <dce...@redhat.com> > > wrote: > >>>>>> > >>>>>> On 8/10/23 15:34, Han Zhou wrote: > >>>>>>> On Thu, Aug 10, 2023 at 2:29 AM Dumitru Ceara <dce...@redhat.com> > > wrote: > >>>>>>>> > >>>>>>>> On 8/10/23 08:12, Ales Musil wrote: > >>>>>>>>> On Wed, Aug 9, 2023 at 5:13 PM Mark Michelson < mmich...@redhat.com > >> > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Ales, > >>>>>>>>>> > >>>>>>>>>> I have some high-level comments/questions about this patch. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Hi Mark, > >>>>>>>>> > >>>>>>>> > >>>>>>>> Hi Ales, Mark, > >>>>>>>> > >>>>>>>>> thank you for the review. See my answers inline below. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> I have been privy to the conversations that led to this change. > > My > >>>>>>>>>> understanding is that by having ovn-northd wake up immediately, > > it > >>>>> can > >>>>>>>>>> be more CPU-intensive than waiting a bit for changes to > > accumulate > >>>>> and > >>>>>>>>>> handling all of those at once instead. However, nothing in > > either the > >>>>>>>>>> commit message or ovn-nb.xml explains what the purpose of this > > new > >>>>>>>>>> configuration option is. I think you should add a sentence or > > two to > >>>>>>>>>> explain why someone would want to enable this option. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> Yeah that's my bad, I have v2 prepared with some explanation in > > the > >>>>>>> commit > >>>>>>>>> message > >>>>>>>>> together with results from scale run. > >>>>>>>>> > >>>>>>>> > >>>>>>>> +1 we really need to explain why this change is needed. 
> >>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Next, the algorithm used here strikes me as odd. We use the > > previous > >>>>>>> run > >>>>>>>>>> time of ovn-northd to determine how long to wait before running > >>>>> again. > >>>>>>>>>> This delay is capped by the configured backoff time. Let's say > > that > >>>>>>>>>> we've configured the backoff interval to be 200 ms. If ovn-northd > >>>>> has a > >>>>>>>>>> super quick run and only takes 10 ms, then we will only delay the > >>>>> next > >>>>>>>>>> run by 10 ms. IMO, this seems like it would not mitigate the > > original > >>>>>>>>>> issue by much, since we are only allowing a maximum of 20 ms (10 > > ms > >>>>> for > >>>>>>>>>> the run of ovn-northd + 10 ms delay) of NB changes to accumulate. > >>>>>>>>>> Conversely, if northd has a huge recompute and it takes 5000 ms > > to > >>>>>>>>>> complete, then we would delay the next run by 200ms. In this > > case, > >>>>>>>>>> delaying at all seems like it's not necessary since we > > potentially > >>>>> have > >>>>>>>>>> 5000 ms worth of NB DB updates that have not been addressed. I > > would > >>>>>>>>>> have expected the opposite approach to be taken. If someone > >>>>> configures > >>>>>>>>>> 200ms as their backoff interval, I would expect us to always > > allow a > >>>>>>>>>> *minimum* of 200ms of NB changes to accumulate before running > > again. > >>>>> So > >>>>>>>>>> for instance, if northd runs quickly and is done in 10 ms, then > > we > >>>>>>> would > >>>>>>>>>> wait 200 - 10 = 190 ms before processing changes again. If northd > >>>>> takes > >>>>>>>>>> a long time to recompute and takes 5000 ms, then we would not > > wait at > >>>>>>>>>> all before processing changes again. Was the algorithm chosen > > based > >>>>> on > >>>>>>>>>> experimentation? Is it a well-known method I'm just not familiar > >>>>> with? 
> >>>>>>>> > >>>>>>>> I think the main assumption (that should probably be made explicit > > in > >>>>>>>> the commit log and/or documentation) is that on average changes > > happen > >>>>>>>> in a uniform way. This might not always be accurate. > >>>>>>>> > >>>>>>>> However, if we're off with the estimate, in the worst case we'd be > >>>>>>>> adding the configured max delay to the latency of processing > > changes. > >>>>>>>> So, as long as the value is not extremely high, the impact is not > > that > >>>>>>>> high either. > >>>>>>>> > >>>>>>>> Last but not least, as this value would be configured by the CMS, > > we > >>>>>>>> assume the CMS knows what they're doing. :) > >>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> I'm not sure if the algorithm is well known. > >>>>>>>>> > >>>>>>>>> The thing is that at scale we almost always cap at the backoff so > > it > >>>>> has > >>>>>>>>> probably > >>>>>>>>> the same effect as your suggestion with the difference that we > >>>>> actually > >>>>>>>>> delay even > >>>>>>>>> after long runs. And that is actually desired, it's true that in > > the > >>>>>>> let's > >>>>>>>>> say 500 ms > >>>>>>>>> should be enough to accumulate more changes however that can lead > > to > >>>>>>> another > >>>>>>>>> 500 ms run and so on. That in the end means that northd will spin > > at > >>>>>>> 100% > >>>>>>>>> CPU > >>>>>>>>> anyway which is what we want to avoid. So from another point of > > view > >>>>> the > >>>>>>>>> accumulation > >>>>>>>>> of IDL changes is a secondary effect which is still desired, but > > not > >>>>> the > >>>>>>>>> main purpose. 
> >>>>>>>>> > >>>>>>>>> Also delaying by short time if the previous run was short is > > fine, you > >>>>>>> are > >>>>>>>>> right that we don't > >>>>>>>>> accumulate enough however during short running times there is a > > high > >>>>>>> chance > >>>>>>>>> that the > >>>>>>>>> northd would get to sleep anyway (We will help it to sleep at > > least a > >>>>>>> bit > >>>>>>>>> nevertheless). > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Next, I notice that you've added new poll_timer_wait() calls but > >>>>>>> haven't > >>>>>>>>>> changed the ovsdb_idl_loop_run() or > > ovsdb_idl_loop_commit_and_wait() > >>>>>>>>>> calls. Is there any danger of ovn-northd getting in a busy loop > > of > >>>>>>>>>> sleeping and waking because of this? I don't think it should, > > since > >>>>>>>>>> presumably ovsdb_idl_loop_run() should clear the conditions > > waited on > >>>>>>> by > >>>>>>>>>> ovsdb_idl_loop_commit_and_wait(), but I want to double-check. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> AFAIK it shouldn't cause any issues as ovsdb_idl_loop_run() will > >>>>> process > >>>>>>>>> anything > >>>>>>>>> that it can and wait will be fine. The problem would be if we > > would > >>>>>>> skip the > >>>>>>>>> ovsdb_idl_loop_run() for some reason. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Next, does this have any negative impact on our ability to > > perform > >>>>>>>>>> incremental processing in ovn-northd? My concern is that since > > we are > >>>>>>>>>> still running the ovsdb IDL loop that if multiple NB changes > > occur > >>>>>>>>>> during our delay, then we might have to fall back to a full > > recompute > >>>>>>>>>> instead of being able to incrementally process the changes. Are > > my > >>>>>>>>>> concerns valid? > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> I suppose that can happen if there are changes that could result > > in > >>>>>>>>> "conflict" > >>>>>>>>> and full recompute. During the test we haven't seen anything like > >>>>> that. 
The odds of that happening are small because, as stated previously, we
> >>>>>>>>> are doing basically the same as if the engine was running for a long
> >>>>>>>>> time, always from the IDL point of view, except that we give the IDL
> >>>>>>>>> a chance to process whatever has piled up within the sleep period.
> >>>>>>>>>
> >>>>>>>>>> Next, has scale testing shown that this change has made a positive
> >>>>>>>>>> impact? If so, is there any recommendation for how to determine
> >>>>>>>>>> what to configure the value to?
> >>>>>>>>>
> >>>>>>>>> It has a huge impact actually; the value tested was 200 ms, the
> >>>>>>>>> recommendation
> >>>>>>>>
> >>>>>>>> This was chosen based on the historical data from similar tests which
> >>>>>>>> showed that the I-P engine was taking ~180-200 ms to run at scale.
> >>>>>>>>
> >>>>>>>>> would be < 500 ms. After that point the latency on component creation
> >>>>>>>>> would be very noticeable. I will put the recommendation into
> >>>>>>>>> ovn-nb.xml with the latency comment. Before I post v2 (which has the
> >>>>>>>>> numbers in the commit message), these are the test results:
> >>>>>>>>>
> >>>>>>>>> Run without any backoff period:
> >>>>>>>>> northd aggregate CPU 9810% avg / 12765% max
> >>>>>>>>> northd was spinning at 100% CPU the entire second half of the test.
> >>>>>>>>>
> >>>>>>>>> Run with 200 ms max backoff period:
> >>>>>>>>> northd aggregate CPU 6066% avg / 7689% max
> >>>>>>>>> northd was around 60% for the second half of the test.
> >
> > Ales, could you clarify the version of OVN and the type of test for the
> > above result? Is it on the main branch or with Numan's patches for LB
> > I-P? Does the test include LB changes? Are there recomputes during the
> > test?
Was the 100% CPU during the second half of the test caused by recompute
> > or by the cost of I-P?
>
> I'll try to reply to these but Ales, please correct me if I'm wrong.
>
> Version used for testing: OVN main + Numan's "LB + lflow I-P v5".
>
> The test includes LB changes; it brings up a 120-node OpenShift cluster
> using the latest OVN-IC support that was added recently to
> ovn-kubernetes. It runs one "zone per node".
>
> The test then creates 14K services and 28K pods distributed across
> 80 namespaces as fast as possible. The test measures lots of
> things but we were interested in average and P99 time for pods to
> be "Ready" (networking ready) and CPU usage for ovn-northd on all
> nodes.
>
> As far as I understand there were still occasional recomputes
> triggered by missing/failing handlers, e.g. sync_to_sb_lb or lflow
> missing a handler for SB.LB changes. In any case, the CMS (ovn-kube)
> was continuously adding ports/LBs/resources to NB, as fast as
> possible; northd was waking up to process those.
> > An example of logs generated by northd I-P while at scale:
> >
> > 2023-08-08T19:12:27.394Z|1149658|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
> > 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
> > 2023-08-08T19:12:27.553Z|1149779|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
> > 2023-08-08T19:12:27.684Z|1149782|inc_proc_eng|DBG|node: sync_to_sb_lb, recompute (missing handler for input SB_load_balancer) took 131ms
> > 2023-08-08T19:12:27.685Z|1149794|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
> > 2023-08-08T19:12:27.846Z|1149893|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
> > 2023-08-08T19:12:27.848Z|1149907|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
> >>
> >>>>>>>>>> Finally, is this change expected to be a long-term necessity? This
> >>>>>>>>>> option seems to be useful for cases where northd recomputes are
> >>>>>>>>>> required. Performing recomputes less frequently seems like it would
> >>>>>>>>>> lower the CPU usage of ovn-northd while still processing the same
> >>>>>>>>>> amount of changes. However, once northd can handle most changes
> >>>>>>>>>> incrementally, is there still a benefit to delaying running? If
> >>>>>>>>>> each run of northd handles all DB changes incrementally, then is
> >>>>>>>>>> there any point in putting delays between those incremental runs?
> >>>>>>>>>
> >>>>>>>>> Ideally we won't need it in the future. However, the assumption for
> >>>>>>>>> not needing anything like this is that northd will be fast enough
> >>>>>>>>> to process I-P changes and be able to sleep until the next batch
> >>>>>>>>> update arrives from the CMS.
> > That > >>>>>>>>> doesn't > >>>>>>>>> seem to happen in very near future, one thing to keep in mind is > > that > >>>>>>>>> testing > >>>>>>>>> happened with Numan's I-P for LBs and lflows which make a huge > >>>>>>> difference, > >>>>>>>>> but > >>>>>>>>> still not enough to achieve the mentioned northd state. So from my > >>>>>>>>> perspective > >>>>>>>>> it will be relevant for a few releases. And as stated above the > > point > >>>>>>> is to > >>>>>>>>> prevent > >>>>>>>>> northd to spin at 100% CPU all the time. > >>>>>>>>> > >>>>>>>> > >>>>>>>> +1 it's not the prettiest feature (and some might rightfully call > > it a > >>>>>>>> hack) but it seems to me like the cleanest alternative for now, > > until > >>>>>>>> northd processing is fully incremental. > >>>>>>> > >>>>>>> In most cases it may be fine, but it might be a problem for a worst > > case > >>>>>>> scenario: > >>>>>>> > >>>>>>> Assume all the changes coming in NB can be incrementally processed > > but > >>>>> at > >>>>>>> a very very high rate, and ovn-northd keeps processing the changes > >>>>>>> incrementally. Since the change rate is so high, ovn-northd barely > >>>>> keeps up > >>>>>>> with the changes with 99% CPU load. For example, I-P for each object > >>>>> takes > >>>>>>> 10ms, and the change rate is 99 objects/sec. According to this > >>>>> algorithm, > >>>>>>> ovn-northd will always sleep for the maximum 200ms between each IDL > > run, > >>>>>>> and then ovn-northd would never keep up with the changes any more - > > the > >>>>>>> backlog will become longer and longer because of the wasted idle > > time. > >>>>>>> > >>>>>> > >>>>>> IDL runs are not skipped. Just I-P engine runs. So I think this > > won't > >>>>>> be a problem, or am I missing something? > >>>>> > >>>>> Sorry about the typo, I meant to say between each "engine run" > > instead of > >>>>> "IDL run". IDL run is not skipped, but the backlog (accumulated > > changes in > >>>>> IDL) becomes longer and longer. 
E.g.:
> >>>>>
> >>>>> (assume change rate is 100 object/sec)
> >>>>>
> >>>>> run-1: handles 1 object, takes 10ms, sleep 10ms
> >>>>> run-2: handles 2 objects, takes 20ms, sleep 20ms
> >>>>> run-3: handles 4 objects, takes 40ms, sleep 40ms
> >>>>> run-4: handles 8 objects, takes 80ms, sleep 80ms
> >>>>> run-5: handles 16 objects, takes 160ms, sleep 160ms
> >>>>> run-6: handles 32 objects, takes 320ms, sleep 200ms
> >>>>> run-7: handles 52 objects (accumulated in 320 + 200 ms), takes 520ms, sleep 200ms
> >>>>> run-8: handles 72 objects, takes 720ms, sleep 200ms
> >>>>> run-9: handles 92 objects, takes 920ms, sleep 200ms
> >>>>> ...
> >>>>> As we can see, the backlog grows indefinitely if the input keeps
> >>>>> changing at the rate of 100 obj/s.
> >>>>
> >>>> I see now, thanks for the explanation. But isn't this possible today
> >>>> too? Maybe less probable though.
> >>>>
> >>>> Also, we'll probably hit other issues (e.g., timeouts on CMS side)
> >>>> because of the backlog which will (likely?) throttle the CMS.
> >>>>
> >>>> What would be a good solution? It seems hard to define what a "large
> >>>> backlog" is. If we could do that, maybe triggering a full recompute
> >>>> when the backlog is large enough might be OK.
> >>>
> >>> Hmm. That's an interesting problem.
> >>> From my understanding, the cost of an incremental processing run
> >>> consists of two parts: some more or less constant engine run cost E and
> >>> a cost of processing the actual changes P. Unless the CMS is changing
> >>> the same value over and over again, we can't optimize P by accumulating
> >>> more changes per run. But we can amortize the more or less constant
> >>> cost E.
> >>>
> >>> We had a similar problem with the full recompute in the past. If we
> >>> assume the cost of a recompute R to be more or less constant regardless
> >>> of the amount of changes, then we can reduce the total cost by
> >>> aggregating more changes per run.
That's what run_idl_loop() function is trying > >>> to do. It runs IDL for as long as there are changes (capped at 500ms). > >>> > >>> We can't really avoid paying the cost P and it will compound with the > >>> increased number of changes. The only solution here, AFAICT, is to > >>> fall back to full recompute once P > R. Maybe that can be done by > >>> somehow calculating the number of tracked vs real rows in IDL. > >>> Current ovn-northd has this problem and introduction of delays will > >>> amplify it. > >>> > >> > >> That is still hard to do unfortunately. The ratio of IDL tracked vs > >> "real" rows is still not a good enough indication. It's possible that > >> northd can quickly incrementally process row updates for table T1 but > >> that it has to do intensive computations for row updates for table T2. > >> > >> A simpler (dumb?) approach is to just periodically recompute. But that > >> shouldn't happen too often because then we start using a lot of CPU > >> again. :) > >> > >>> > >>> The problem this patch is trying to address, IIUC, is that the cost E > >>> is still high. So, I'm wondering if the solution can be similar to > >>> what we already have for a full recompute. > > > > My assumption was that the cost E is relatively small in the I-P. I thought > > the patch was trying to help when there were still many recomputes > > triggered. Please correct me if I was wrong. I posted some questions > > regarding Ales's test result, to get a better understanding. > > I thought E was the "constant" non-incremental cost. In any case, > P (cost for I-P to successfully run) is also not negligible, e.g., > even a successful I-P execution of the lb_data handler for NB.LB_Group > changes took 13ms at scale: > > 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
My point is, the constant cost in the I-P handlers is relatively small. The constant cost mainly comes from the recomputes or partial recomputes, e.g. when "northd" node is I-P but "lflow" or "sync_to_sb_lb" recomputes. If the definition of E includes the cost of recompute of nodes during the engine execution, I think we are on the same page. That part is what the delay approach would help the most. But if E only includes the constant part of an I-P handler, e.g. initializing/destroying some data structures, those shouldn't be a significant cost. And yes, the cost of some I-P handlers is not negligible, but I don't think the delay approach would really help in that case. Does this make sense? Thanks, Han > > > Regards, > Dumitru > > > > > Thanks, > > Han > > > >>> Current run_idl_loop() > >>> runs the IDL only while there are changes and it will not continue > >>> running it if the next update is delayed even by a very short time > >>> interval. What we could potentially do here is to wait for some > >>> very short time interval (maybe configurable or maybe not) and check > >>> the IDL for changes again. This will allow us to better batch updates > >>> and amortize constant engine run cost E without need to sleep for > >>> arbitrary 200ms. (We will sleep for arbitrary 1-10ms at a time, but > >>> that seems less arbitrary. :D ). > >> > >> I think we still need to set a max limit for this sequence of short > >> sleeps and that max limit needs to be configurable by the user because > >> it may increase end-to-end latency for propagating NB changes to > >> ovn-controller. > >> > >> In that case I think I prefer an explicit "arbitrary" max backoff set by > >> the user like this patch initially proposed. But I'm not completely > >> against other options. 
> >>
> >>> Something like (for illustration purposes; not tested):
> >>>
> >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>> index 4fa1b039e..54e4ecb5f 100644
> >>> --- a/northd/ovn-northd.c
> >>> +++ b/northd/ovn-northd.c
> >>> @@ -689,22 +689,39 @@ run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
> >>>      unsigned long long duration, start = time_msec();
> >>>      unsigned int seqno = UINT_MAX;
> >>>      struct ovsdb_idl_txn *txn;
> >>> +    int n_before_sleep = -1;
> >>>      int n = 0;
> >>>
> >>>      /* Accumulate database changes as long as there are some,
> >>>       * but no longer than half a second. */
> >>> -    while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
> >>> -           && time_msec() - start < 500) {
> >>> -        seqno = ovsdb_idl_get_seqno(idl_loop->idl);
> >>> -        ovsdb_idl_run(idl_loop->idl);
> >>> -        n++;
> >>> +    for (;;) {
> >>> +        while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
> >>> +               && time_msec() - start < 500) {
> >>> +            seqno = ovsdb_idl_get_seqno(idl_loop->idl);
> >>> +            ovsdb_idl_run(idl_loop->idl);
> >>> +            n++;
> >>> +        }
> >>> +        /* If we're not out of time yet, try to sleep for a short 10ms
> >>> +         * in case we'll have more updates. Don't sleep again if there
> >>> +         * were no updates after the previous short sleep. */
> >>> +        if (n > n_before_sleep + 1 && time_msec() - start < 500) {
> >>> +            n_before_sleep = n;
> >>> +            poll_timer_wait(10);
> >>> +            ovsdb_idl_wait(idl_loop->idl);
> >>> +            poll_block();
> >>> +            /* Reset seqno, so we try to run IDL at least one more time. */
> >>> +            seqno = UINT_MAX;
> >>> +        } else {
> >>> +            /* Out of time or no updates since the last sleep. */
> >>> +            break;
> >>> +        }
> >>>      }
> >>>
> >>>      txn = ovsdb_idl_loop_run(idl_loop);
> >>>
> >>>      duration = time_msec() - start;
> >>> -    /* ovsdb_idl_run() is called at least 2 times. Once directly and
> >>> -     * once in the ovsdb_idl_loop_run(). n > 2 means that we received
> >>> +    /* ovsdb_idl_run() is called at least 3 times. Once directly and
> >>> +     * once in the ovsdb_idl_loop_run(). n > 3 means that we received
> >>>       * data on at least 2 subsequent calls. */
> >>>      if (n > 2 || duration > 100) {
> >>>          VLOG(duration > 500 ? VLL_INFO : VLL_DBG,
> >>> ---
> >>>
> >>> I'm not sure how something like this will impact the total CPU usage.
> >>> Hopefully, these short sleeps can allow accumulating more changes and
> >>> amortize the constant costs better while also reducing perceived CPU
> >>> usage. Needs testing, for sure. One other advantage of such a solution
> >>> is the code locality, i.e. all the logic being in one place.
> >>>
> >>> Thoughts?
> >>>
> >>> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev