On 8/14/23 18:51, Han Zhou wrote:
> On Mon, Aug 14, 2023 at 4:46 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On 8/12/23 07:08, Han Zhou wrote:
>>> On Fri, Aug 11, 2023 at 6:07 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>
>>>> On 8/10/23 18:38, Ilya Maximets wrote:
>>>>> On 8/10/23 17:34, Dumitru Ceara wrote:
>>>>>> On 8/10/23 17:20, Han Zhou wrote:
>>>>>>> On Thu, Aug 10, 2023 at 6:36 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 8/10/23 15:34, Han Zhou wrote:
>>>>>>>>> On Thu, Aug 10, 2023 at 2:29 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 8/10/23 08:12, Ales Musil wrote:
>>>>>>>>>>> On Wed, Aug 9, 2023 at 5:13 PM Mark Michelson <mmich...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ales,
>>>>>>>>>>>>
>>>>>>>>>>>> I have some high-level comments/questions about this patch.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Mark,
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Ales, Mark,
>>>>>>>>>>
>>>>>>>>>>> thank you for the review. See my answers inline below.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I have been privy to the conversations that led to this change.
>>>>>>>>>>>> My understanding is that by having ovn-northd wake up immediately,
>>>>>>>>>>>> it can be more CPU-intensive than waiting a bit for changes to
>>>>>>>>>>>> accumulate and handling all of those at once instead. However,
>>>>>>>>>>>> nothing in either the commit message or ovn-nb.xml explains what
>>>>>>>>>>>> the purpose of this new configuration option is. I think you
>>>>>>>>>>>> should add a sentence or two to explain why someone would want to
>>>>>>>>>>>> enable this option.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Yeah, that's my bad. I have v2 prepared with some explanation in
>>>>>>>>>>> the commit message together with results from the scale run.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 we really need to explain why this change is needed.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Next, the algorithm used here strikes me as odd. We use the
>>>>>>>>>>>> previous run time of ovn-northd to determine how long to wait
>>>>>>>>>>>> before running again. This delay is capped by the configured
>>>>>>>>>>>> backoff time. Let's say that we've configured the backoff interval
>>>>>>>>>>>> to be 200 ms. If ovn-northd has a super quick run and only takes
>>>>>>>>>>>> 10 ms, then we will only delay the next run by 10 ms. IMO, this
>>>>>>>>>>>> seems like it would not mitigate the original issue by much, since
>>>>>>>>>>>> we are only allowing a maximum of 20 ms (10 ms for the run of
>>>>>>>>>>>> ovn-northd + 10 ms delay) of NB changes to accumulate. Conversely,
>>>>>>>>>>>> if northd has a huge recompute and it takes 5000 ms to complete,
>>>>>>>>>>>> then we would delay the next run by 200 ms. In this case, delaying
>>>>>>>>>>>> at all seems like it's not necessary since we potentially have
>>>>>>>>>>>> 5000 ms worth of NB DB updates that have not been addressed. I
>>>>>>>>>>>> would have expected the opposite approach to be taken. If someone
>>>>>>>>>>>> configures 200 ms as their backoff interval, I would expect us to
>>>>>>>>>>>> always allow a *minimum* of 200 ms of NB changes to accumulate
>>>>>>>>>>>> before running again. So for instance, if northd runs quickly and
>>>>>>>>>>>> is done in 10 ms, then we would wait 200 - 10 = 190 ms before
>>>>>>>>>>>> processing changes again. If northd takes a long time to recompute
>>>>>>>>>>>> and takes 5000 ms, then we would not wait at all before processing
>>>>>>>>>>>> changes again. Was the algorithm chosen based on experimentation?
>>>>>>>>>>>> Is it a well-known method I'm just not familiar with?
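>>>>>>>>>>>>
>>>>>>>>>>>> To make sure I'm reading it right, the two policies amount to
>>>>>>>>>>>> roughly the following (sketch only, helper names invented; MIN/MAX
>>>>>>>>>>>> as in OVS's lib/util.h):
>>>>>>>>>>>>
>>>>>>>>>>>>     /* What I understand the patch to do: sleep for as long as the
>>>>>>>>>>>>      * previous run took, capped at the configured backoff. */
>>>>>>>>>>>>     static long long
>>>>>>>>>>>>     delay_as_proposed(long long run_ms, long long backoff_ms)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         return MIN(run_ms, backoff_ms);
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>>     /* What I would have expected: let at least 'backoff_ms' of
>>>>>>>>>>>>      * changes accumulate, counting the previous run itself. */
>>>>>>>>>>>>     static long long
>>>>>>>>>>>>     delay_as_expected(long long run_ms, long long backoff_ms)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         return MAX(backoff_ms - run_ms, 0);
>>>>>>>>>>>>     }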
>>>>>>>>>>
>>>>>>>>>> I think the main assumption (that should probably be made explicit
>>>>>>>>>> in the commit log and/or documentation) is that on average changes
>>>>>>>>>> happen in a uniform way.  This might not always be accurate.
>>>>>>>>>>
>>>>>>>>>> However, if we're off with the estimate, in the worst case we'd be
>>>>>>>>>> adding the configured max delay to the latency of processing
>>>>>>>>>> changes.  So, as long as the value is not extremely high, the
>>>>>>>>>> impact is not that high either.
>>>>>>>>>>
>>>>>>>>>> Last but not least, as this value would be configured by the CMS,
>>>>>>>>>> we assume the CMS knows what they're doing. :)
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure if the algorithm is well known.
>>>>>>>>>>>
>>>>>>>>>>> The thing is that at scale we almost always cap at the backoff, so
>>>>>>>>>>> it probably has the same effect as your suggestion, with the
>>>>>>>>>>> difference that we actually delay even after long runs. And that
>>>>>>>>>>> is actually desired: it's true that, let's say, 500 ms should be
>>>>>>>>>>> enough to accumulate more changes, however that can lead to
>>>>>>>>>>> another 500 ms run and so on. In the end that means northd will
>>>>>>>>>>> spin at 100% CPU anyway, which is what we want to avoid. So from
>>>>>>>>>>> another point of view, the accumulation of IDL changes is a
>>>>>>>>>>> secondary effect which is still desired, but not the main purpose.
>>>>>>>>>>>
>>>>>>>>>>> Also, delaying by a short time if the previous run was short is
>>>>>>>>>>> fine. You are right that we don't accumulate enough, however
>>>>>>>>>>> during short running times there is a high chance that northd
>>>>>>>>>>> would get to sleep anyway (we will help it to sleep at least a bit
>>>>>>>>>>> nevertheless).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Next, I notice that you've added new poll_timer_wait() calls but
>>>>>>>>>>>> haven't changed the ovsdb_idl_loop_run() or
>>>>>>>>>>>> ovsdb_idl_loop_commit_and_wait() calls. Is there any danger of
>>>>>>>>>>>> ovn-northd getting in a busy loop of sleeping and waking because
>>>>>>>>>>>> of this? I don't think it should, since presumably
>>>>>>>>>>>> ovsdb_idl_loop_run() should clear the conditions waited on by
>>>>>>>>>>>> ovsdb_idl_loop_commit_and_wait(), but I want to double-check.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> AFAIK it shouldn't cause any issues, as ovsdb_idl_loop_run() will
>>>>>>>>>>> process anything that it can, and the wait will be fine. The
>>>>>>>>>>> problem would be if we skipped ovsdb_idl_loop_run() for some
>>>>>>>>>>> reason.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Next, does this have any negative impact on our ability to
>>>>>>>>>>>> perform incremental processing in ovn-northd? My concern is that
>>>>>>>>>>>> since we are still running the ovsdb IDL loop, if multiple NB
>>>>>>>>>>>> changes occur during our delay, then we might have to fall back
>>>>>>>>>>>> to a full recompute instead of being able to incrementally
>>>>>>>>>>>> process the changes. Are my concerns valid?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I suppose that can happen if there are changes that could result
>>>>>>>>>>> in a "conflict" and a full recompute. During the test we haven't
>>>>>>>>>>> seen anything like that. The odds of that happening are small
>>>>>>>>>>> because, as stated previously, from the IDL point of view we are
>>>>>>>>>>> doing basically the same as if the engine was running for a long
>>>>>>>>>>> time, except that we give the IDL a chance to process whatever has
>>>>>>>>>>> piled up within the sleep period.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Next, has scale testing shown that this change has made a
>>>>>>>>>>>> positive impact? If so, is there any recommendation for how to
>>>>>>>>>>>> determine what to configure the value to?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> It has a huge impact actually. The value tested was 200 ms; the
>>>>>>>>>>> recommendation
>>>>>>>>>>
>>>>>>>>>> This was chosen based on the historical data from similar tests
>>>>>>>>>> which showed that the I-P engine was taking ~180-200 ms to run at
>>>>>>>>>> scale.
>>>>>>>>>>
>>>>>>>>>>> would be < 500 ms. After that point the latency on component
>>>>>>>>>>> creation would be very noticeable. I will put the recommendation
>>>>>>>>>>> into ovn-nb.xml together with the latency comment. Before I post
>>>>>>>>>>> v2 (which has the numbers in the commit message), those are the
>>>>>>>>>>> test results:
>>>>>>>>>>>
>>>>>>>>>>> Run without any backoff period:
>>>>>>>>>>> northd aggregate CPU 9810% avg / 12765% max
>>>>>>>>>>> northd was spinning at 100% CPU the entire second half of the test.
>>>>>>>>>>>
>>>>>>>>>>> Run with 200 ms max backoff period:
>>>>>>>>>>> northd aggregate CPU 6066% avg / 7689% max
>>>>>>>>>>> northd was around 60% for the second half of the test.
>>>>>>>>>>>
>>>
>>> Ales, could you clarify the version of OVN and the type of test for the
>>> above result? Is it on the main branch or with Numan's patches for LB
>>> I-P? Does the test include LB changes? Are there recomputes during the
>>> test? Was the 100% CPU during the second half of the test caused by
>>> recompute or by the cost of I-P?
>>>
>>
>> I'll try to reply to these but Ales, please correct me if I'm wrong.
>>
>> Version used for testing: OVN main + Numan's "LB + lflow I-P v5".
>>
>> The test includes LB changes; it brings up a 120-node OpenShift cluster
>> using the latest OVN-IC support that was added recently to
>> ovn-kubernetes.  It runs one "zone per node".
>>
>> The test then creates 14K services and 28K pods distributed across
>> 80 namespaces as fast as possible.  The test measures lots of
>> things but we were interested in average and P99 time for pods to
>> be "Ready" (networking ready) and cpu usage for ovn-northd on all
>> nodes.
>>
>> As far as I understand there were still occasional recomputes
>> triggered by missing/failing handlers, e.g. sync_to_sb_lb or lflow
>> missing a handler for SB.LB changes.  In any case, the CMS (ovn-kube)
>> was continuously adding ports/LBs/resources to NB, as fast as
>> possible; northd was waking up to process those.
>>
>> An example of logs generated by northd I-P while at scale:
>>
>> 2023-08-08T19:12:27.394Z|1149658|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
>> 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
>> 2023-08-08T19:12:27.553Z|1149779|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
>> 2023-08-08T19:12:27.684Z|1149782|inc_proc_eng|DBG|node: sync_to_sb_lb, recompute (missing handler for input SB_load_balancer) took 131ms
>> 2023-08-08T19:12:27.685Z|1149794|inc_proc_eng|DBG|node: lflow, handler for input northd took 1ms
>> 2023-08-08T19:12:27.846Z|1149893|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
>> 2023-08-08T19:12:27.848Z|1149907|inc_proc_eng|DBG|node: sync_to_sb_addr_set, recompute (missing handler for input northd) took 1ms
>>
>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Finally, is this change expected to be a long-term necessity?
>>>>>>>>>>>> This option seems to be useful for cases where northd recomputes
>>>>>>>>>>>> are required. Performing recomputes less frequently seems like it
>>>>>>>>>>>> would lower the CPU usage of ovn-northd while still processing
>>>>>>>>>>>> the same amount of changes. However, once northd can handle most
>>>>>>>>>>>> changes incrementally, is there still a benefit to delaying
>>>>>>>>>>>> running? If each run of northd handles all DB changes
>>>>>>>>>>>> incrementally, then is there any point in putting delays between
>>>>>>>>>>>> those incremental runs?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Ideally we won't need it in the future. However, the assumption
>>>>>>>>>>> for not needing anything like this is that northd will be fast
>>>>>>>>>>> enough to process I-P changes and be able to sleep until the next
>>>>>>>>>>> batch update arrives from the CMS. That doesn't seem likely to
>>>>>>>>>>> happen in the very near future; one thing to keep in mind is that
>>>>>>>>>>> the testing happened with Numan's I-P for LBs and lflows, which
>>>>>>>>>>> makes a huge difference, but is still not enough to achieve the
>>>>>>>>>>> mentioned northd state. So from my perspective it will be relevant
>>>>>>>>>>> for a few releases. And as stated above, the point is to prevent
>>>>>>>>>>> northd from spinning at 100% CPU all the time.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 it's not the prettiest feature (and some might rightfully call
>>>>>>>>>> it a hack) but it seems to me like the cleanest alternative for
>>>>>>>>>> now, until northd processing is fully incremental.
>>>>>>>>>
>>>>>>>>> In most cases it may be fine, but it might be a problem for a worst
>>>>>>>>> case scenario:
>>>>>>>>>
>>>>>>>>> Assume all the changes coming in NB can be incrementally processed
>>>>>>>>> but at a very very high rate, and ovn-northd keeps processing the
>>>>>>>>> changes incrementally. Since the change rate is so high, ovn-northd
>>>>>>>>> barely keeps up with the changes with 99% CPU load. For example, I-P
>>>>>>>>> for each object takes 10ms, and the change rate is 99 objects/sec.
>>>>>>>>> According to this algorithm, ovn-northd will always sleep for the
>>>>>>>>> maximum 200ms between each IDL run, and then ovn-northd would never
>>>>>>>>> keep up with the changes any more - the backlog will become longer
>>>>>>>>> and longer because of the wasted idle time.
>>>>>>>>>
>>>>>>>>
>>>>>>>> IDL runs are not skipped.  Just I-P engine runs.  So I think this
>>>>>>>> won't be a problem, or am I missing something?
>>>>>>>
>>>>>>> Sorry about the typo, I meant to say between each "engine run"
>>>>>>> instead of "IDL run". The IDL run is not skipped, but the backlog
>>>>>>> (accumulated changes in IDL) becomes longer and longer. E.g.:
>>>>>>>
>>>>>>> (assume the change rate is 100 objects/sec)
>>>>>>>
>>>>>>> run-1: handles 1 object, takes 10ms, sleep 10ms
>>>>>>> run-2: handles 2 objects, takes 20ms, sleep 20ms
>>>>>>> run-3: handles 4 objects, takes 40ms, sleep 40ms
>>>>>>> run-4: handles 8 objects, takes 80ms, sleep 80ms
>>>>>>> run-5: handles 16 objects, takes 160ms, sleep 160ms
>>>>>>> run-6: handles 32 objects, takes 320ms, sleep 200ms
>>>>>>> run-7: handles 52 objects (accumulated in 320 + 200 ms), takes 520ms,
>>>>>>> sleep 200ms
>>>>>>> run-8: handles 72 objects, takes 720ms, sleep 200ms
>>>>>>> run-9: handles 92 objects, takes 920ms, sleep 200ms
>>>>>>> ...
>>>>>>> As we can see, the backlog grows indefinitely if the input keeps
>>>>>>> changing at the rate of 100 obj/s.
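>>>>>>>
>>>>>>> For reference, a small standalone program (purely illustrative; it
>>>>>>> just replays the arithmetic above, with one object pending at t=0)
>>>>>>> reproduces the same numbers:
>>>>>>>
>>>>>>>     #include <stdio.h>
>>>>>>>
>>>>>>>     int main(void)
>>>>>>>     {
>>>>>>>         const long rate = 100;  /* incoming objects per second */
>>>>>>>         const long cost = 10;   /* I-P cost per object, in ms */
>>>>>>>         const long cap = 200;   /* max backoff, in ms */
>>>>>>>         long now = 0;           /* elapsed time, in ms */
>>>>>>>         long handled = 0;
>>>>>>>
>>>>>>>         for (int run = 1; run <= 9; run++) {
>>>>>>>             /* One object pending at t=0, then 'rate' obj/s. */
>>>>>>>             long backlog = 1 + now * rate / 1000 - handled;
>>>>>>>             long run_ms = backlog * cost;
>>>>>>>             long sleep_ms = run_ms < cap ? run_ms : cap;
>>>>>>>
>>>>>>>             printf("run-%d: handles %ld objects, takes %ldms, "
>>>>>>>                    "sleep %ldms\n", run, backlog, run_ms, sleep_ms);
>>>>>>>             handled += backlog;
>>>>>>>             now += run_ms + sleep_ms;
>>>>>>>         }
>>>>>>>         return 0;
>>>>>>>     }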
>>>>>>>
>>>>>>
>>>>>> I see now, thanks for the explanation.  But isn't this possible today
>>>>>> too?  Maybe less probable though.
>>>>>>
>>>>>> Also, we'll probably hit other issues (e.g., timeouts on CMS side)
>>>>>> because of the backlog which will (likely?) throttle the CMS.
>>>>>>
>>>>>> What would be a good solution?  It seems hard to define what a "large
>>>>>> backlog" is.  If we could do that, maybe triggering a full recompute
>>>>>> when the backlog is large enough might be OK.
>>>>>
>>>>> Hmm.  That's an interesting problem.
>>>>> From my understanding, the cost of an incremental processing run
>>>>> consists of two parts: a more or less constant engine run cost E and a
>>>>> cost of processing actual changes P.  Unless the CMS is changing the
>>>>> same value over and over again, we can't optimize P by accumulating
>>>>> more changes per run.  But we can amortize the more or less constant
>>>>> cost E.
>>>>>
>>>>> We had a similar problem with the full recompute in the past.  If we
>>>>> assume the cost of a recompute R to be more or less constant regardless
>>>>> of the amount of changes, then we can reduce the total cost by
>>>>> aggregating more changes per run.  That's what the run_idl_loop()
>>>>> function is trying to do.  It runs the IDL for as long as there are
>>>>> changes (capped at 500ms).
>>>>>
>>>>> We can't really avoid paying the cost P and it will compound with the
>>>>> increased number of changes.  The only solution here, AFAICT, is to
>>>>> fall back to a full recompute once P > R.  Maybe that can be done by
>>>>> somehow calculating the number of tracked vs real rows in the IDL.
>>>>> Current ovn-northd has this problem and the introduction of delays will
>>>>> amplify it.
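>>>>>
>>>>> E.g., something along these lines (rough sketch;
>>>>> idl_tracked_rows()/idl_total_rows() are invented names, while the
>>>>> force-recompute knob itself already exists in the I-P engine):
>>>>>
>>>>>     /* Bail out to a full recompute when the tracked changes approach
>>>>>      * the total number of rows, i.e. when P presumably exceeds R. */
>>>>>     if (idl_tracked_rows(idl) > idl_total_rows(idl) / 2) {
>>>>>         engine_set_force_recompute(true);
>>>>>     }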
>>>>>
>>>>
>>>> That is still hard to do unfortunately.  The ratio of IDL tracked vs
>>>> "real" rows is still not a good enough indication.  It's possible that
>>>> northd can quickly incrementally process row updates for table T1 but
>>>> that it has to do intensive computations for row updates for table T2.
>>>>
>>>> A simpler (dumb?) approach is to just periodically recompute.  But that
>>>> shouldn't happen too often because then we start using a lot of CPU
>>>> again. :)
>>>>
>>>>>
>>>>> The problem this patch is trying to address, IIUC, is that the cost E
>>>>> is still high.  So, I'm wondering if the solution can be similar to
>>>>> what we already have for a full recompute.
>>>
>>> My assumption was that the cost E is relatively small in the I-P. I
>>> thought the patch was trying to help when there were still many
>>> recomputes triggered. Please correct me if I was wrong. I posted some
>>> questions regarding Ales's test result, to get a better understanding.
>>
>> I thought E was the "constant" non-incremental cost.  In any case,
>> P (cost for I-P to successfully run) is also not negligible, e.g.,
>> even a successful I-P execution of the lb_data handler for NB.LB_Group
>> changes took 13ms at scale:
>>
>> 2023-08-08T19:12:27.552Z|1149766|inc_proc_eng|DBG|node: lb_data, handler for input NB_load_balancer_group took 13ms
> 
> My point is, the constant cost in the I-P handlers is relatively small. The
> constant cost mainly comes from the recomputes or partial recomputes, e.g.
> when the "northd" node is I-P but "lflow" or "sync_to_sb_lb" recomputes.
> If the definition of E includes the cost of recomputing nodes during the
> engine execution, I think we are on the same page. That part is where the
> delay approach would help the most. But if E only includes the constant
> part of an I-P handler, e.g. initializing/destroying some data structures,
> that shouldn't be a significant cost.
> 

Yes, the way I see it is that E includes the cost of recomputing nodes.

> And yes, the cost of some I-P handlers is not negligible, but I don't think
> the delay approach would really help in that case.
> 

In theory it wouldn't help but in practice it really depends on how that
handler is implemented.  E.g., a handler could somehow batch incremental
update processing for a given table.  Regardless, we focus on the former
case, E.
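
For example, such a batching handler could look roughly like this
(hypothetical node and helper names, invented for illustration; only the
*_FOR_EACH_TRACKED iteration and engine_set_node_state() are the usual
I-P idioms):

    /* Hypothetical handler that pays its setup/teardown cost once per
     * engine run instead of once per tracked row. */
    static bool
    en_foo_nb_bar_handler(struct engine_node *node, void *data)
    {
        struct foo_data *fd = data;
        const struct nbrec_bar *bar;

        foo_batch_begin(fd);                  /* one-time setup */
        NBREC_BAR_TABLE_FOR_EACH_TRACKED (bar, fd->nb_bar_table) {
            if (!foo_handle_bar_change(fd, bar)) {
                return false;                 /* fall back to recompute */
            }
        }
        foo_batch_commit(fd);                 /* one-time teardown */

        engine_set_node_state(node, EN_UPDATED);
        return true;
    }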

> Does this make sense?
> 

I think we're on the same page now.

Thanks,
Dumitru

> Thanks,
> Han
> 
>>
>>
>> Regards,
>> Dumitru
>>
>>>
>>> Thanks,
>>> Han
>>>
>>>>>  The current run_idl_loop() runs the IDL only while there are changes
>>>>> and it will not continue running it if the next update is delayed even
>>>>> by a very short time interval.  What we could potentially do here is to
>>>>> wait for some very short time interval (maybe configurable or maybe
>>>>> not) and check the IDL for changes again.  This will allow us to better
>>>>> batch updates and amortize the constant engine run cost E without the
>>>>> need to sleep for an arbitrary 200ms.  (We will sleep for an arbitrary
>>>>> 1-10ms at a time, but that seems less arbitrary. :D )
>>>>
>>>> I think we still need to set a max limit for this sequence of short
>>>> sleeps, and that max limit needs to be configurable by the user because
>>>> it may increase the end-to-end latency for propagating NB changes to
>>>> ovn-controller.
>>>>
>>>> In that case I think I prefer an explicit "arbitrary" max backoff set
>>>> by the user, like this patch initially proposed.  But I'm not
>>>> completely against other options.
>>>>
>>>>>
>>>>> Something like (for illustration purposes; not tested):
>>>>>
>>>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>>>> index 4fa1b039e..54e4ecb5f 100644
>>>>> --- a/northd/ovn-northd.c
>>>>> +++ b/northd/ovn-northd.c
>>>>> @@ -689,22 +689,39 @@ run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
>>>>>      unsigned long long duration, start = time_msec();
>>>>>      unsigned int seqno = UINT_MAX;
>>>>>      struct ovsdb_idl_txn *txn;
>>>>> +    int n_before_sleep = -1;
>>>>>      int n = 0;
>>>>>
>>>>>      /* Accumulate database changes as long as there are some,
>>>>>       * but no longer than half a second. */
>>>>> -    while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
>>>>> -           && time_msec() - start < 500) {
>>>>> -        seqno = ovsdb_idl_get_seqno(idl_loop->idl);
>>>>> -        ovsdb_idl_run(idl_loop->idl);
>>>>> -        n++;
>>>>> +    for (;;) {
>>>>> +        while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
>>>>> +               && time_msec() - start < 500) {
>>>>> +            seqno = ovsdb_idl_get_seqno(idl_loop->idl);
>>>>> +            ovsdb_idl_run(idl_loop->idl);
>>>>> +            n++;
>>>>> +        }
>>>>> +        /* If we're not out of time yet, try to sleep for a short 10ms
>>>>> +         * in case we'll have more updates.  Don't sleep again if there
>>>>> +         * were no updates after the previous short sleep. */
>>>>> +        if (n > n_before_sleep + 1 && time_msec() - start < 500) {
>>>>> +            n_before_sleep = n;
>>>>> +            poll_timer_wait(10);
>>>>> +            ovsdb_idl_wait(idl_loop->idl);
>>>>> +            poll_block();
>>>>> +            /* Reset seqno, so we try to run IDL at least one more time. */
>>>>> +            seqno = UINT_MAX;
>>>>> +        } else {
>>>>> +            /* Out of time or no updates since the last sleep. */
>>>>> +            break;
>>>>> +        }
>>>>>      }
>>>>>
>>>>>      txn = ovsdb_idl_loop_run(idl_loop);
>>>>>
>>>>>      duration = time_msec() - start;
>>>>> -    /* ovsdb_idl_run() is called at least 2 times.  Once directly and
>>>>> -     * once in the ovsdb_idl_loop_run().  n > 2 means that we received
>>>>> +    /* ovsdb_idl_run() is called at least 3 times.  Once directly and
>>>>> +     * once in the ovsdb_idl_loop_run().  n > 3 means that we received
>>>>>       * data on at least 2 subsequent calls. */
>>>>> -    if (n > 2 || duration > 100) {
>>>>> +    if (n > 3 || duration > 100) {
>>>>>          VLOG(duration > 500 ? VLL_INFO : VLL_DBG,
>>>>> ---
>>>>>
>>>>> I'm not sure how something like this will impact the total CPU usage.
>>>>> Hopefully, these short sleeps can allow accumulating more changes and
>>>>> amortize the constant costs better while also reducing perceived CPU
>>>>> usage.  Needs testing, for sure.  One other advantage of such a
>>>>> solution is the code locality, i.e. all the logic being in one place.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
