On Thu, Oct 16, 2025 at 12:43:13PM +0200, Ales Musil via dev wrote:
> On Thu, Oct 16, 2025 at 12:09 PM Felix Huettner
> <[email protected]> wrote:
> 
> > On Wed, Oct 15, 2025 at 11:12:20AM +0200, Dumitru Ceara via dev wrote:
> > > Hi Ales, Mark,
> > >
> > > On 10/15/25 10:23 AM, Ales Musil wrote:
> > > > On Fri, Oct 10, 2025 at 9:58 PM Mark Michelson <[email protected]>
> > wrote:
> > > >
> > > >> On Fri, Oct 10, 2025 at 3:43 AM Ales Musil <[email protected]> wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, Oct 10, 2025 at 2:51 AM Mark Michelson <[email protected]>
> > > >> wrote:
> > > >>>>
> > > >>>> On Thu, Oct 9, 2025 at 5:25 PM Mark Michelson <[email protected]>
> > > >> wrote:
> > > >>>>>
> > > >>>>> Hi Ales,
> > > >>>>>
> > > >>>
> > > >>>
> > > >>> Hi Mark,
> > > >>>
> > > >>> thank you for the review, but I'm probably missing something here.
> > > >>>
> > > >>>
> > > >>>>
> > > >>>>> I've been thinking through this change, and I think the idea might
> > > >> have some flaws. Let's consider an arbitrary engine node B that takes
> > > >> engine node A as input. In this scenario, engine node B writes to the
> > SB DB
> > > >> and engine node A does not. With this change, we mark engine node B as
> > > >> SB_WRITE, but we do not do the same with node A.
> > > >>>>>
> > > >>>
> > > >>>
> > > >>> Why would we need to? A state doesn't depend on SB being writable.
> > > >>>
> > > >>>>
> > > >>>>> We perform an engine run where recomputation is not allowed because
> > > >> the SB DB is read only. When engine node A runs, since it does not
> > have
> > > >> SB_WRITE specified, it is able to recompute its data. However, when
> > engine
> > > >> node B runs, it cannot recompute its data.
> > > >>>>>
> > > >>>>> Now, what happens during the next engine run?
> > > >>>>> * Let's say engine node A has no new input data. Engine node A's
> > > >> state will be EN_UNCHANGED. In this case, engine node B sees that
> > engine
> > > >> node A is unchanged, so engine node B does not run. However, B really
> > > >> should run because in the previous engine node run, A recomputed. In
> > this
> > > >> case, B's data is incorrect.
> > > >>>>> * Let's say engine node A has new data and can process it
> > > >> incrementally. Engine node A's state will be EN_UPDATED.
> > > >>>>>     * Engine node B may also try to incrementally handle A's
> > tracked
> > > >> data. However, the only tracked data that B will see is from the
> > current
> > > >> engine run. The data that A recomputed in the previous engine run
> > will be
> > > >> unknown to B. In this case, B's data is incorrect.
> > > >>>>>     * Engine node B may not have an input handler for A. In this
> > > >> case, B will recompute and all will be well.
> > > >>>>> * Let's say engine node A has new data and cannot process it
> > > >> incrementally. In this case, A will recompute, and then B will also
> > likely
> > > >> recompute. The data integrity is fine in this case. However, A has
> > had to
> > > >> recompute twice. Before this change, A would only have recomputed on
> > the
> > > >> second engine run.
> > > >>>>>
> > > >>>>> In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4
> > > >> cases, the data is correct in engine node B, but A has to recompute
> > twice.
> > > >> And in 1 of the 4 cases, the data is correct in engine node B and we
> > > >> successfully avoided an unnecessary recomputation of A. The only time
> > this
> > > >> change has a positive effect is if the SB_WRITE node has no input
> > handlers
> > > >> for its input nodes, and the input nodes are able to incrementally
> > process
> > > >> their data.
> > > >>>>
> > > >>>> I've realized that in the scenario I outlined, a couple of those
> > > >>>> outcomes can't happen. When the first engine run happens, B will
> > fail
> > > >>>> to recompute, and the engine run will be canceled. In
> > ovn-controller,
> > > >>>> if an engine run is canceled, then we call
> > > >>>> engine_set_force_recompute_immediate(). This means that the next run
> > > >>>> will force all nodes to recompute. This means that the second and
> > > >>>> third outcomes (both based on A being able to incrementally process
> > > >>>> its data) cannot happen. Also, when engine_force_recompute is true,
> > > >>>> the state of input nodes is not considered. This means that if A is
> > > >>>> unchanged, it does not matter. B will still recompute and will
> > > >>>> therefore have all of A's data. So my concerns about data integrity
> > > >>>> are not correct.
> > > >>>
> > > >>>
> > > >>>
> > > >>> So none of the outcomes highlighted cannot happen, if B is cancelled
> > > >>> we will immediately run force recompute loop that will set everything
> > > >>> straight.
> > > >>>
> > > >>>>
> > > >>>>
> > > >>>> However, the fact that the engine forces all nodes to recompute
> > means
> > > >>>> that A will recompute on both engine runs. This means that with the
> > > >>>> patch, A will recompute two times across the two engine runs.
> > Without
> > > >>>> the patch, A only recomputes once. So I still think the patch is
> > > >>>> hurting more than it is helping.
> > > >>>
> > > >>>
> > > >>> That's the case only in the worst scenario. Keep in mind that this
> > > >>> doesn't always happen. So it's like all of the sudden we will do two
> > > >>> recmputes for certain nodes back to back. In fact I see the opposite
> > > >>> happening. Only 3 nodes in ovn-controller are marked as SB_WRITE any
> > > >>> of those nodes might cancel but the rest is free to recompute however
> > > >>> they like. We saw that there has been cancels for nodes do not
> > > >>> require SB write and this was harming performance a good example are
> > > >>> nodes pflow_output and lflow_output. Those are very heavy CPU time
> > > >>> consumers, now they are free to recompute even if SB is read only
> > > >>> during the current run.
> > > >>
> > > >> Thanks for the explanation Ales. I was coming at this change with much
> > > >> more familiarity with the northd incremental engine than the
> > > >> ovn-controller incremental engine. In northd, a lot of nodes are
> > > >> eventual inputs for en_northd or en_lflow, both of which write to the
> > > >> SB DB. So in most cases, even if a node doesn't write to the SB DB,
> > > >> then it eventually will feed into a node that does write to the SB DB
> > > >> [1]. In my head, it felt like the "common" case was for this to
> > > >> happen. However, in ovn-controller, a good majority of the nodes do
> > > >> not write to the SB DB and several of those nodes are never inputs
> > > >> (directly or indirectly) to nodes that write to the SB DB. As you
> > > >> pointed out, pflow_output and lflow_output are good examples of this.
> > > >>
> > > >> The situation I've outlined is still possible in ovn-controller. Many
> > > >> nodes are inputs for en_runtime_data. Letting those nodes recompute
> > > >> when en_runtime_data may eventually be denied the right to do so could
> > > >> be a waste of CPU resources.
> > > >>
> > > >> What if the change were altered a bit: if an engine node is marked as
> > > >> SB_WRITE, then any nodes that are upstream from that node (i.e. any
> > > >> node that is a direct or indirect input to that node) is also marked
> > > >> as SB_WRITE by the incremental engine. Meanwhile, any nodes that are
> > > >> downstream from SB_WRITE nodes are not. This way, nodes like
> > > >> pflow_output and lflow_output can recompute even when the SB database
> > > >> is read-only. But nodes that input into runtime_data, route_exchange,
> > > >> and garp_rarp can avoid unnecessary recomputes.
> > > >>
> > > >> What do you think?
> > > >>
> > > >
> > > > Your suggestion will unfortunately cause the change to not have any
> > > > benefit, in other words the state would be the same as on main right
> > > > now. For example the runtime_data node has several inputs a lot of
> > > > them are DB nodes e.g. sb_chassis. By marking them also as SB write
> > > > or "canceled if recompte is not allowed" if you will, would basically
> > > > mean it would mostly cancel as without the change. The leaf nodes,
> > > > usually the DB ones are always recomputing. They cannot have any
> > > > handlers. So I'm afraid that this change would then have the only
> > > > benefit of documenting which node can write into SB DB.
> > > >
> > > > But to check if my theory is correct I did run ovn-heater with a
> > > > baseline from main collecting the engine stats. Below you can see
> > > > the results:
> > > >
> > > > ---------------------------
> > > > main-250-density-heavy-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1472 ms
> > > > avg     : 129 ms
> > > > 95%     : 537 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 328312207, Compute: 62608243, Cancel: 2067
> > > > (0.0006295836572412308%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-heavy-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1343 ms
> > > > avg     : 130 ms
> > > > 95%     : 577 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 325154128, Compute: 62002527, Cancel: 2138
> > > > (0.0006575343247679758%)
> > > >
> > > > ---------------------------
> > > > main-250-density-heavy-ipv4-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 119 ms
> > > > avg     : 26 ms
> > > > 95%     : 48 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 77772935, Compute: 4420305, Cancel: 627
> > (0.0008061930541775233%)
> > > >
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-heavy-ipv4-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 131 ms
> > > > avg     : 26 ms
> > > > 95%     : 47 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 77918353, Compute: 4434319, Cancel: 533
> > (0.0006840493663925366%)
> > > >
> > > > ---------------------------
> > > > main-250-density-heavy-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1330 ms
> > > > avg     : 134 ms
> > > > 95%     : 611 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 325845839, Compute: 62012707, Cancel: 1724
> > > > (0.0005290845527722083%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-heavy-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1376 ms
> > > > avg     : 133 ms
> > > > 95%     : 569 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 324545521, Compute: 61835316, Cancel: 2680
> > > > (0.0008257701390369827%)
> > > >
> > > > ---------------------------
> > > > main-250-density-heavy-ipv6-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 174 ms
> > > > avg     : 26 ms
> > > > 95%     : 48 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 76480809, Compute: 4401000, Cancel: 686
> > (0.0008969570392488919%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-heavy-ipv6-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 172 ms
> > > > avg     : 26 ms
> > > > 95%     : 48 ms
> > > > total   : 13750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 78371328, Compute: 4493119, Cancel: 615
> > (0.0007847257609313447%)
> > > >
> > > > ---------------------------
> > > > main-250-density-light-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1991 ms
> > > > avg     : 79 ms
> > > > 95%     : 124 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 1787284835, Compute: 339628410, Cancel: 3089
> > > > (0.0001728319929486785%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-light-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 2465 ms
> > > > avg     : 82 ms
> > > > 95%     : 135 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 1744830863, Compute: 334913315, Cancel: 2439
> > > > (0.00013978432246472707%)
> > > >
> > > > ---------------------------
> > > > main-250-density-light-ipv4-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 123 ms
> > > > avg     : 24 ms
> > > > 95%     : 34 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 47999536, Compute: 5072743, Cancel: 630
> > (0.001312512687622647%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-light-ipv4-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 129 ms
> > > > avg     : 24 ms
> > > > 95%     : 34 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 48054016, Compute: 5090679, Cancel: 522
> > (0.0010862775756348855%)
> > > >
> > > > ---------------------------
> > > > main-250-density-light-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 2181 ms
> > > > avg     : 79 ms
> > > > 95%     : 141 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 1778682312, Compute: 337995667, Cancel: 1309
> > > > (0.00007359380543499778%)
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-light-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 1993 ms
> > > > avg     : 76 ms
> > > > 95%     : 128 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 1775718929, Compute: 337788420, Cancel: 1527
> > > > (0.0000859933390956154%)
> > > >
> > > > ---------------------------
> > > > main-250-density-light-ipv6-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 179 ms
> > > > avg     : 27 ms
> > > > 95%     : 41 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 47954143, Compute: 4608282, Cancel: 728
> > (0.0015181170060739068%)
> > > >
> > > >
> > > > ---------------------------
> > > > sb-ro-250-density-light-ipv6-ic
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 169 ms
> > > > avg     : 27 ms
> > > > 95%     : 41 ms
> > > > total   : 63750
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 46764060, Compute: 4505181, Cancel: 597
> > (0.0012766214054126181%)
> > > >
> > > > ---------------------------
> > > > main-500-density-heavy-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 3291 ms
> > > > avg     : 483 ms
> > > > 95%     : 1570 ms
> > > > total   : 27500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 845825451, Compute: 166132940, Cancel: 2524
> > > > (0.00029840672174334936%)
> > > >
> > > > ---------------------------
> > > > sb-ro-500-density-heavy-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 7 ms
> > > > max     : 3223 ms
> > > > avg     : 447 ms
> > > > 95%     : 1504 ms
> > > > total   : 27500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 847419493, Compute: 166570179, Cancel: 2072
> > > > (0.0002445070023896653%)
> > > >
> > > > ---------------------------
> > > > main-500-density-heavy-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 11875 ms
> > > > avg     : 784 ms
> > > > 95%     : 2267 ms
> > > > total   : 27500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 857064532, Compute: 167714262, Cancel: 2521
> > > > (0.0002941435453077412%)
> > > >
> > > >
> > > > ---------------------------
> > > > sb-ro-500-density-heavy-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 7 ms
> > > > max     : 7056 ms
> > > > avg     : 598 ms
> > > > 95%     : 2147 ms
> > > > total   : 27500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 865415057, Compute: 169182676, Cancel: 2166
> > > > (0.0002502845290800158%)
> > > >
> > > > ---------------------------
> > > > main-500-density-light-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 6962 ms
> > > > avg     : 306 ms
> > > > 95%     : 1389 ms
> > > > total   : 127500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 5688174577, Compute: 1186841715, Cancel: 1468
> > > > (0.00002580792801148937%)
> > > >
> > > > ---------------------------
> > > > sb-ro-500-density-light-ipv4
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 5508 ms
> > > > avg     : 301 ms
> > > > 95%     : 1325 ms
> > > > total   : 127500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 5676845569, Compute: 1184598070, Cancel: 2659
> > > > (0.00004683939289312734%)
> > > >
> > > > ---------------------------
> > > > main-500-density-light-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 11675 ms
> > > > avg     : 276 ms
> > > > 95%     : 797 ms
> > > > total   : 127500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 5591661452, Compute: 1162323896, Cancel: 2665
> > > > (0.00004766025309073021%)
> > > >
> > > > ---------------------------
> > > > sb-ro-500-density-light-ipv6
> > > > ---------------------------
> > > > ovn-installed:
> > > > min     : 6 ms
> > > > max     : 6673 ms
> > > > avg     : 242 ms
> > > > 95%     : 688 ms
> > > > total   : 127500
> > > > failures: 0
> > > >
> > > > IP stats:
> > > > Recompute: 5779381065, Compute: 1201598403, Cancel: 1728
> > > > (0.0000298993954640712%)
> > > >
> > > > The numbers caught me a little by surprise, but the important thing
> > > > is that the percentage of cancels is pretty small in general. So having
> > > > 2 recomputes the input nodes for SB write nodes won't cause any
> > > > significant issue. Also on the other side there doesn't seem to be any
> > > > measurable improvement. However, unless you see a different reason
> > > > I would still like to get this patch in for the documentation purpose.
> > > >
> > > >
> > >
> > > Thanks, Ales, for testing this!  I agree, even if we don't get a
> > > significant improvement on the ovn-heater benchmarks it's probably still
> > > good to get this in for the additional semantic check it implements.
> > >
> > > From my perspective:
> > >
> > > Acked-by: Dumitru Ceara <[email protected]>
> > >
> > > But I'd love to hear from Mark too before this is merged.
> >
> > Hi everyone,
> >
> 
> Hi Felix,
> 
> 
> >
> > i think with runtime_data we the additional interesting behaviour that
> > binding_run makes a lot of effort handle cases where southbound is
> > unavailable (or we do not have a transaction for other reasons).
> > It might be the case that it could actually recompute without
> > southbound.
> >
> > I am not sure if this would actually work. But maybe it improves the
> > above statistics.
> >
> 
> I'm not sure if I follow correctly but is your suggestion that
> we should revisit binding_run and make it less dependent
> on SB being writable? Could you please elaborate a bit more?

Hi Ales,

as the code reads to me binding_run might work without the southbound
being writable at all. That would mean we could also recompute the
runtime_data engine node even if we do not have a southbound
transaction.

That would allow us to recompute in a lot more cases without the
southbound than before.

But i am not sure if:
1. binding_run completely works without a southbound transaction, or
   just certain parts of it.
2. Allowing runtime_data to recompute would actually reduce the amount
   of canceled runs in the above statistics.

Thanks,
Felix

> 
> 
> > Thanks a lot,
> > Felix
> >
> > >
> > > Thanks,
> > > Dumitru
> > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> Thanks,
> Ales
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to