On 2/23/22 07:27, Han Zhou wrote: > On Fri, Feb 11, 2022 at 12:52 AM Dumitru Ceara <dce...@redhat.com> wrote: >> >> On 2/11/22 00:54, Numan Siddique wrote: >>> On Thu, Jan 27, 2022 at 10:00 AM Dumitru Ceara <dce...@redhat.com> > wrote: >>>> >>>> We still need to try to ovsdb_idl_loop_commit_and_wait() on instances >>>> that are in standby mode. That's because they need to try to take the >>>> lock. But if they're in standby they won't actually build a > transaction >>>> json and will return early because they don't own the SB lock. >>>> >>>> That's reported as an error by ovsdb_idl_loop_commit_and_wait() but we >>>> shouldn't act on it. >>>> >>>> Also, to ensure that no DB changes are missed, ovsdb_idl_track_clear() >>>> should be called only on active instances. The standby or paused ones >>>> will get the incremental updates when they become active. Otherwise we >>>> would be forced to perform a full recompute when the instance becomes >>>> active. >>> >>> Hi Dumitru, >>> >> >> Hi Numan, >> >> Thanks for the review! >> >>> I've a question on the track clear being moved out of the standby > instances. >>> To ensure correctness, I suppose it's better to trigger a full > recompute when a >>> standby instance becomes active. What do you think? >>> >> >> I might be wrong but I don't think that's necessary. It may also be >> quite costly as full recomputes can take quite long. >> >>> Also lets say CMS does the below operations >>> - Add a logical switch S1 >>> - Add a logical port p1 in S1 >>> - Add a logical port p2 in S1 >>> - Delete logical port p2 >>> - Delete a logical switch. >>> >>> With this patch since we are not clearing the tracking information, >>> how does ovn-northd >>> process the tracked changes when it becomes active ? >> >> When ovn-northd becomes active, from a Northbound database perspective, >> there were no changes; that is, S1 didn't exist when it was last active >> and it doesn't exist now either. >> >> So, there should be no NB change to process. Accumulating tracked >> changes without calling clear() on the standby has exactly this effect. > > Hi Dumitru, >
Hi Han, > I wonder how accumulating tracked changes without calling clear() would > work. > > Firstly, I was under the impression that ovsdb_idl_track_clear() must be > called before the next ovsdb_idl_run(), and the current change tracking > implementation cannot safely carry tracking information across the > iterations. This was why in ovn-controller whenever (!engine_has_run() && > engine_need_run()) we force recompute in the next iteration. If changes > could be carried over we would have incrementally processed the accumulated > changes without forcing recompute. I can't recall the details, and I > checked the IDL again briefly but I didn't find the exact reason why it is > not safe. But I believe it was never used this way before. I should have > added a TODO for this (but somehow forgot to, sorry). > I've been looking at that code too and I don't see any reason why accumulating changes wouldn't work. The IDL code is written such that it processes multiple jsonrpc updates in a single run anyway: ovsdb_idl_run() -> ovsdb_cs_run() -> which can receive up to 50 (hardcoded) jsonrpc messages. It's possible that I missed something though, the IDL functionality is complex, so maybe it's worth documenting that we recommend calling ovsdb_idl_track_clear() every time ovsdb_idl_run() is called, and why. What do you think? > Secondly, even if it is safe to accumulate changes, it is going to be a > memory problem. Active-standby failover happens very rarely in a healthy > production environment. So even if the DB size is small, the change > tracking can grow without any limit. I tried with this patch by doing > simply a loop of adding and deleting a single logical switch 10k times on > top of an empty NB DB, and the standby northd's memory grew to 1.1GB. > This is surprising, sounds like an issue in the IDL, I would've expected the delete to just remove the record from the track list. I'll look into it. > Thirdly, processing all the tracked changes when standby becomes active is > likely taking higher cost than recompute, if the tracked change size is > bigger than the DB size. > If changes are accumulated correctly, the total number of the tracked record changes should never be larger than the size of the DB the last time changes were processed + the size of the DB now, resulting in a cost to process of O(size-of-DB), the same as for full recompute. Or am I missing something? > So for now I would suggest keeping the logic of clearing tracking on every > iteration and force recompute when failover. > At least until we figure out why the memory increase in the IDL, I agree to keep forcing a recompute on failover. That's also because we currently don't really incrementally process much in ovn-northd. I'll send a v2. > Thanks, > Han > Thanks, Dumitru >> >> From a Southbound database perspective there are two cases: >> >> a. The former active northd processed some (but not all) of the NB >> changes and executed their corresponding SB transactions. In this case, >> the standby northd also receives update messages for the SB records that >> were changed. The standby northd tracks these changes. >> >> When the standby northd becomes active it will: >> - determine that NB state didn't change >> - SB state changed and needs to be reconciled (today we do this with the >> help of a NULL change handler for SB_* tables which will trigger a full >> recompute). >> >> b. The former active northd processed all of the NB changes and executed >> their corresponding SB transactions. In this case, the final state of >> the NB and SB databases should be equivalent to their initial states. >> NB/SB changes will be accumulated by the change tracking mechanism on >> the standby resulting in empty tracked changes lists. This is fine >> because the new active northd doesn't need to do anything, the DB >> contents are already consistent. >> >> c. The former active northd processed none of the NB changes yet. This >> is very similar to case "b" above, the new active northd doesn't need to >> change anything in the NB/SB and it won't do that either. >> >>> >>> Thanks >>> Numan >>> >> >> Thanks, >> Dumitru >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev