Re: [ovs-dev] [PATCH ovn] ovn-northd: Don't log transaction failures on standby instances.

Han Zhou Tue, 22 Feb 2022 22:45:59 -0800

On Fri, Feb 11, 2022 at 12:52 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 2/11/22 00:54, Numan Siddique wrote:
> > On Thu, Jan 27, 2022 at 10:00 AM Dumitru Ceara <dce...@redhat.com>
wrote:
> >>
> >> We still need to try to ovsdb_idl_loop_commit_and_wait() on instances
> >> that are in standby mode.  That's because they need to try to take the
> >> lock.  But if they're in standby they won't actually build a
transaction
> >> json and will return early because they don't own the SB lock.
> >>
> >> That's reported as an error by ovsdb_idl_loop_commit_and_wait() but we
> >> shouldn't act on it.
> >>
> >> Also, to ensure that no DB changes are missed, ovsdb_idl_track_clear()
> >> should be called only on active instances.  The standby or paused ones
> >> will get the incremental updates when they become active.  Otherwise we
> >> would be forced to perform a full recompute when the instance becomes
> >> active.
> >
> > Hi Dumitru,
> >
>
> Hi Numan,
>
> Thanks for the review!
>
> > I've a question on the track clear being moved out of the standby
instances.
> > To ensure correctness,  I suppose it's better to trigger a full
recompute when a
> > standby instance becomes active. What do you think?
> >
>
> I might be wrong but I don't think that's necessary.  It may also be
> quite costly as full recomputes can take quite long.
>
> > Also lets say CMS does the below operations
> >      - Add a logical switch S1
> >      - Add a  logical port p1 in S1
> >      - Add a logical port p2 in S1
> >      - Delete logical port p2
> >      - Delete a logical switch.
> >
> > With this patch since we are not clearing the tracking information,
> > how does ovn-northd
> > process the tracked changes when it becomes active ?
>
> When ovn-northd becomes active, from a Northbound database perspective,
> there were no changes; that is, S1 didn't exist when it was last active
> and it doesn't exist now either.
>
> So, there should be no NB change to process.  Accumulating tracked
> changes without calling clear() on the standby has exactly this effect.


Hi Dumitru,

I wonder how accumulating tracked changes without calling clear() would
work.

Firstly, I was under the impression that ovsdb_idl_track_clear() must be
called before the next ovsdb_idl_run(), and the current change tracking
implementation cannot safely carry tracking information across the
iterations. This was why in ovn-controller whenever (!engine_has_run() &&
engine_need_run()) we force recompute in the next iteration. If changes
could be carried over we would have incrementally processed the accumulated
changes without forcing recompute. I can't recall the details, and I
checked the IDL again briefly but I didn't find the exact reason why it is
not safe. But I believe it was never used this way before. I should have
added a TODO for this (but somehow forgot to, sorry).

Secondly, even if it is safe to accumulate changes, it is going to be a
memory problem. Active-standby failover happens very rarely in a healthy
production environment. So even if the DB size is small, the change
tracking can grow without any limit. I tried with this patch by doing
simply a loop of adding and deleting a single logical switch 10k times on
top of an empty NB DB, and the standby northd's memory grew to 1.1GB.

Thirdly, processing all the tracked changes when standby becomes active is
likely taking higher cost than recompute, if the tracked change size is
bigger than the DB size.

So for now I would suggest keeping the logic of clearing tracking on every
iteration and force recompute when failover.

Thanks,
Han

>
> From a Southbound database perspective there are two cases:
>
> a. The former active northd processed some (but not all) of the NB
> changes and executed their corresponding SB transactions.  In this case,
> the standby northd also receives update messages for the SB records that
> were changed.  The standby northd tracks these changes.
>
> When the standby northd becomes active it will:
> - determine that NB state didn't change
> - SB state changed and needs to be reconciled (today we do this with the
> help of a NULL change handler for SB_* tables which will trigger a full
> recompute).
>
> b. The former active northd processed all of the NB changes and executed
> their corresponding SB transactions.  In this case, the final state of
> the NB and SB databases should be equivalent to their initial states.
> NB/SB changes will be accumulated by the change tracking mechanism on
> the standby resulting in empty tracked changes lists.  This is fine
> because the new active northd doesn't need to do anything, the DB
> contents are already consistent.
>
> c. The former active northd processed none of the NB changes yet.  This
> is very similar to case "b" above, the new active northd doesn't need to
> change anything in the NB/SB and it won't do that either.
>
> >
> > Thanks
> > Numan
> >
>
> Thanks,
> Dumitru
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn] ovn-northd: Don't log transaction failures on standby instances.

Reply via email to