On Fri, Jun 24, 2022 at 11:49 AM Han Zhou <zhou...@gmail.com> wrote:
>
> On Fri, Jun 24, 2022 at 1:11 AM Ales Musil <amu...@redhat.com> wrote:
> >
> > Hi Han,
> >
> > after our discussion I did he suggested test and the throughput does not
> seem to be affected,
> > I did the test with aging set to 2 sec, and during the test period (360
> sec) the MAC binding was removed multiple times.
> > There were some dropped packets, but the traffic was maintained with
> minimal disturbance.
>
> Thanks for sharing the result! I think different applications may react to
> this kind of disturbance differently. Some may be sensitive to packet loss.
> In addition, I believe this would also incur megaflow cache miss and
> trigger OVS userspace processing in the middle of a flow.
> May I know the traffic pattern of your test? Did you measure with iperf
> during the test? Could share the numbers with v.s. without the drops?
>
> On the other hand, if such random disturbance is not considered harmful for
> some deployment, then I would also question the value of doing all those
> OVS flow idle_age checkings on the *owner* chassis. There can be lots of
> chassis consuming the same mac-binding entry but we are now checking "at
> least one of them is not using the entry recently", which doesn't sound too
> different from just blindly expiring the entries without checking anything,
> and let it recreate if someone still needs it - if the minimal disturbance
> is acceptable in such environment. ovn-northd can do this periodical check
> easily and clean the expired entries, correct?
>
> Thanks,
> Han
>
> >
> >
> > Thanks,
> > Ales
> >
> > On Wed, Jun 22, 2022 at 9:51 AM Ales Musil <amu...@redhat.com> wrote:
> >>
> >>
> >>
> >> On Wed, Jun 22, 2022 at 9:21 AM Han Zhou <zhou...@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Fri, Jun 17, 2022 at 2:08 AM Ales Musil <amu...@redhat.com> wrote:
> >>> >
> >>> > Add MAC binding aging mechanism, that
> >>> > should take care of stale MAC bindings.
> >>> >
> >>> > The mechanism works on "ownership" of the
> >>> > MAC binding row. The chassis that creates
> >>> > the row is then checking if the "idle_age"
> >>> > of the flow is over the aging threshold.
> >>> > In that case the MAC binding is removed
> >>> > from database. The "owner" might change
> >>> > when another chassis saw an update of the
> >>> > MAC address.
> >>> >
> >>> > This approach has downside, the chassis
> >>> > that "owns" the MAC binding might not actually be
> >>> > the one that is using it actively. This
> >>> > might lead some delays in packet flow when
> >>> > the row is removed.
> >>> >
> >>>
> >>
> >> Hi Han, thank you for your input.
> >>
> >>>
> >>> Thanks Ales for working on this! The stale entries in MAC_Binding table
> was a big TODO of OVN and a difficult problem. It is great to see a
> solution finally, and I think utilizing the "idle_age" is brilliant. Before
> reviewing it in more detail, I'd like to discuss the "downside" first.
> >>>
> >>> I think the "downside" here is indeed a problem with this approach. The
> MAC binding in OVN is in fact the ARP cache (or neighbour table) of the
> router, but OVN logical router is distributed (except for gateway-router
> and DGP), so in most cases by nature of OVN LR the user of MAC binding
> wouldn't be the one "owns" it. It would be a big dataplane performance
> impact, thinking about a chassis that has a flow with high throughput of
> packets suddenly needs to pause and wait for ovn-controller (and SB DB) to
> complete the ARP resolution process. I saw this being pointed out and
> discussed in the first version, but I'd raise more attention to it, because
> the problem introduced would be much bigger than the stale entries in the
> MAC binding table.
> >>>
> >>>
> >>> I think the proposal from Daniel that transfers owner with "expire
> timestamp" set would help, but I am also thinking that since the logical
> router is distributed, it may be unreasonable to have an owner at all. My
> suggestion is, instead of assigning "owner" for each entry, a central
> controller can just be responsible for checking if any chassis still uses
> the entry and removing it when no one uses it anymore. Naturally the
> central controller can be hosted in ovn-northd. Here is the detailed
> algorithm I am thinking:
> >>>
> >>> * when an entry is created (by any ovn-controller), an expire_timestamp
> is set (e.g. 10 min from now - can be configurable)
> >>> * Each ovn-controller: check the entries it uses and if the
> expire_timestamp of the entry is past, but its own "idle_age" indicates the
> entry is still needed, it will update the SB DB entry with a new
> expire_timestamp. Note: before updating the SB DB, ovn-controller needs a
> random delay, to avoid update storm to SB unnecessarily - in most cases
> only one ovn-controller would update/refresh the SB DB when an entry is
> expired.
> >>> * ovn-northd periodically checks if there are entries with
> expire_timestamp past longer than 1 min (this is related to the random
> delay of ovn-controller, may be configurable, too), it will go ahead and
> delete the entry.
> >>>
> >>> What do you think?
> >>
> >>
> >> This is actually pretty close to the first approach that was suggested
> in the BZ [0] for this. However your suggestion would cause less SB traffic
> which is great. I would be still a bit worried that in case of big setups
> there could be a lot of controllers trying to postpone the deletion of the
> particular MAC binding. We are running some scale tests with the v2 patch
> set, so we should have some answers whether the downside is causing any
> visible troubles.
> >>
> >> I will definitely discuss this suggestion with the rest of the team.
> >>
> >>>
> >>> In addition, such a change may still be risky in large scale
> environments, and I think it worth experimenting first with a knob to
> enable it (and disabled by default).
> >>
> >>
> >> That would be in line with what Mark suggested, a special value that
> disables mac binding e.g. threshold=0, which could be the default.
> >>


Thanks Ales for working on this.  I haven't reviewed the patch series.
Jst providing some comments and my 2 cent thoughts

1.  If it's possible I'd avoid querying OVS to get the flow stats and
determine if a mac binding entry is stale/expired or not.
    If there is no other way, then I'm fine with it,

2. Before taking that approach we can perhaps explore another way to
do it.   My initial thought is:
      -  Each mac binding is owned by one ovn-controller  (probably
the one which learnt it)
     -  And periodically, it will generate an arp request for the
learnt IP of the mac binding entry.
     -  If that mac binding is still intact, we will receive an arp
response. And ovn-controller handling this arp response will mark that
this
        mac binding entry as still active.
   -   If no response, then this mac binding entry is deleted.

I don't think this can be easy to implement as presently we first
check if have already learnt the mac bindind entry or not (using ovn
action lookup_arp/ lookup_nd)
When we receive the arp response from the mac binding ip, then we
should still send the packet to ovn-controller even if lookup_arp/nd
returns success.

What do you all think ?  Does this seem doable ?

Numan



> >>>
> >>>
> >>> Thanks,
> >>> Han
> >>
> >>
> >>
> >> Thanks,
> >> Ales
> >>
> >> [0] https://bugzilla.redhat.com/2084668#c2
> >>
> >>>
> >>>
> >>> > The threshold can be configured in
> >>> > NB_global table with key "mac_binding_age_threshold"
> >>> > in seconds with default value being 60.
> >>> >
> >>> > The test case is present as separate patch of the series.
> >>> >
> >>> > Add delay to ARP response processing to prevent
> >>> > race condition between multiple controllers
> >>> > that received the same ARP.
> >>> >
> >>> > Ales Musil (6):
> >>> >   Add chassis column to MAC_Binding table
> >>> >   Add MAC binding aging mechanism
> >>> >   Add stopwatch for MAC binding aging
> >>> >   Allow the MAC binding age threshold to be configurable
> >>> >   ovn.at: Add test case covering the MAC binding aging
> >>> >   pinctrl.c: Add delay after ARP packet
> >>> >
> >>> >  controller/automake.mk         |   4 +-
> >>> >  controller/mac-binding-aging.c | 241
> +++++++++++++++++++++++++++++++++
> >>> >  controller/mac-binding-aging.h |  32 +++++
> >>> >  controller/ovn-controller.c    |  32 +++++
> >>> >  controller/pinctrl.c           |  73 ++++++++--
> >>> >  northd/northd.c                |  12 ++
> >>> >  northd/ovn-northd.c            |   2 +-
> >>> >  ovn-nb.xml                     |   5 +
> >>> >  ovn-sb.ovsschema               |   6 +-
> >>> >  ovn-sb.xml                     |   5 +
> >>> >  tests/ovn.at                   | 212 +++++++++++++++++++++++++++--
> >>> >  11 files changed, 595 insertions(+), 29 deletions(-)
> >>> >  create mode 100644 controller/mac-binding-aging.c
> >>> >  create mode 100644 controller/mac-binding-aging.h
> >>> >
> >>> > --
> >>> > 2.35.3
> >>> >
> >>> > _______________________________________________
> >>> > dev mailing list
> >>> > d...@openvswitch.org
> >>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >>
> >>
> >> --
> >>
> >> Ales Musil
> >>
> >> Senior Software Engineer - OVN Core
> >>
> >> Red Hat EMEA
> >>
> >> amu...@redhat.com    IM: amusil
> >
> >
> >
> > --
> >
> > Ales Musil
> >
> > Senior Software Engineer - OVN Core
> >
> > Red Hat EMEA
> >
> > amu...@redhat.com    IM: amusil
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to