On 2/23/22 09:24, Dumitru Ceara wrote: > On 2/23/22 07:27, Han Zhou wrote: >> On Fri, Feb 11, 2022 at 12:52 AM Dumitru Ceara <dce...@redhat.com> wrote: >>> >>> On 2/11/22 00:54, Numan Siddique wrote: >>>> On Thu, Jan 27, 2022 at 10:00 AM Dumitru Ceara <dce...@redhat.com> >> wrote: >>>>> >>>>> We still need to try to ovsdb_idl_loop_commit_and_wait() on instances >>>>> that are in standby mode. That's because they need to try to take the >>>>> lock. But if they're in standby they won't actually build a >> transaction >>>>> json and will return early because they don't own the SB lock. >>>>> >>>>> That's reported as an error by ovsdb_idl_loop_commit_and_wait() but we >>>>> shouldn't act on it. >>>>> >>>>> Also, to ensure that no DB changes are missed, ovsdb_idl_track_clear() >>>>> should be called only on active instances. The standby or paused ones >>>>> will get the incremental updates when they become active. Otherwise we >>>>> would be forced to perform a full recompute when the instance becomes >>>>> active. >>>> >>>> Hi Dumitru, >>>> >>> >>> Hi Numan, >>> >>> Thanks for the review! >>> >>>> I've a question on the track clear being moved out of the standby >> instances. >>>> To ensure correctness, I suppose it's better to trigger a full >> recompute when a >>>> standby instance becomes active. What do you think? >>>> >>> >>> I might be wrong but I don't think that's necessary. It may also be >>> quite costly as full recomputes can take quite long. >>> >>>> Also lets say CMS does the below operations >>>> - Add a logical switch S1 >>>> - Add a logical port p1 in S1 >>>> - Add a logical port p2 in S1 >>>> - Delete logical port p2 >>>> - Delete a logical switch. >>>> >>>> With this patch since we are not clearing the tracking information, >>>> how does ovn-northd >>>> process the tracked changes when it becomes active ? >>> >>> When ovn-northd becomes active, from a Northbound database perspective, >>> there were no changes; that is, S1 didn't exist when it was last active >>> and it doesn't exist now either. >>> >>> So, there should be no NB change to process. Accumulating tracked >>> changes without calling clear() on the standby has exactly this effect. >> >> Hi Dumitru, >> > > Hi Han, > >> I wonder how accumulating tracked changes without calling clear() would >> work. >> >> Firstly, I was under the impression that ovsdb_idl_track_clear() must be >> called before the next ovsdb_idl_run(), and the current change tracking >> implementation cannot safely carry tracking information across the >> iterations. This was why in ovn-controller whenever (!engine_has_run() && >> engine_need_run()) we force recompute in the next iteration. If changes >> could be carried over we would have incrementally processed the accumulated >> changes without forcing recompute. I can't recall the details, and I >> checked the IDL again briefly but I didn't find the exact reason why it is >> not safe. But I believe it was never used this way before. I should have >> added a TODO for this (but somehow forgot to, sorry). >> > > I've been looking at that code too and I don't see any reason why > accumulating changes wouldn't work. The IDL code is written such that > it processes multiple jsonrpc updates in a single run anyway: > > ovsdb_idl_run() -> ovsdb_cs_run() -> which can receive up to 50 > (hardcoded) jsonrpc messages. > > It's possible that I missed something though, the IDL functionality is > complex, so maybe it's worth documenting that we recommend calling > ovsdb_idl_track_clear() every time ovsdb_idl_run() is called, and why. > What do you think? > >> Secondly, even if it is safe to accumulate changes, it is going to be a >> memory problem. Active-standby failover happens very rarely in a healthy >> production environment. So even if the DB size is small, the change >> tracking can grow without any limit. I tried with this patch by doing >> simply a loop of adding and deleting a single logical switch 10k times on >> top of an empty NB DB, and the standby northd's memory grew to 1.1GB. >> > > This is surprising, sounds like an issue in the IDL, I would've expected > the delete to just remove the record from the track list. I'll look > into it. >
I had a quick look, it's because deleting a row clears its datum, sets tracked_old_datum and adds it to the deleted track list. The subsequent "add" will not find this row, and will insert a new one. We probably need to fix this anyway because even in the current implementation the IDL can process up to 50 consecutive "add/delete/add/delete/.." updates. This can potentially cause issues due to "spurious" delete events when the client finally processes the tracked changes. >> Thirdly, processing all the tracked changes when standby becomes active is >> likely taking higher cost than recompute, if the tracked change size is >> bigger than the DB size. >> > > If changes are accumulated correctly, the total number of the tracked > record changes should never be larger than the size of the DB the last > time changes were processed + the size of the DB now, resulting in a > cost to process of O(size-of-DB), the same as for full recompute. Or am > I missing something? > >> So for now I would suggest keeping the logic of clearing tracking on every >> iteration and force recompute when failover. >> > > At least until we figure out why the memory increase in the IDL, I agree > to keep forcing a recompute on failover. That's also because we > currently don't really incrementally process much in ovn-northd. > > I'll send a v2. > >> Thanks, >> Han >> > > Thanks, > Dumitru > >>> >>> From a Southbound database perspective there are two cases: >>> >>> a. The former active northd processed some (but not all) of the NB >>> changes and executed their corresponding SB transactions. In this case, >>> the standby northd also receives update messages for the SB records that >>> were changed. The standby northd tracks these changes. >>> >>> When the standby northd becomes active it will: >>> - determine that NB state didn't change >>> - SB state changed and needs to be reconciled (today we do this with the >>> help of a NULL change handler for SB_* tables which will trigger a full >>> recompute). >>> >>> b. The former active northd processed all of the NB changes and executed >>> their corresponding SB transactions. In this case, the final state of >>> the NB and SB databases should be equivalent to their initial states. >>> NB/SB changes will be accumulated by the change tracking mechanism on >>> the standby resulting in empty tracked changes lists. This is fine >>> because the new active northd doesn't need to do anything, the DB >>> contents are already consistent. >>> >>> c. The former active northd processed none of the NB changes yet. This >>> is very similar to case "b" above, the new active northd doesn't need to >>> change anything in the NB/SB and it won't do that either. >>> >>>> >>>> Thanks >>>> Numan >>>> >>> >>> Thanks, >>> Dumitru >>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev