Ben Pfaff <b...@ovn.org> wrote on 07/19/2016 07:08:04 PM:

> From: Ben Pfaff <b...@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: ovs-dev <dev@openvswitch.org>
> Date: 07/19/2016 07:08 PM
> Subject: Re: [ovs-dev] [ovs-dev, v23, 2/2] ovn-controller: Add
> incremental processing to lflow_run and physical_run
>
> On Tue, Jul 19, 2016 at 04:55:49PM -0500, Ryan Moats wrote:
> >
> >
> > > On Tue, Jul 19, 2016 at 02:57:43PM -0500, Ryan Moats wrote:
> > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 02:54:36 PM:
> > > >
> > > > > From: Ben Pfaff <blp at ovn.org>
> > > > To: Ryan Moats/Omaha/IBM at IBMUS>
> > > > Cc: dev at openvswitch.org
> > > > Date: 07/19/2016 02:54 PM
> > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > processing to lflow_run and physical_run
> > > > >
> > > > > On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> > > > > >
> > > > > >
> > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 01:39:43 PM:
> > > > > >
> > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > Cc: dev at openvswitch.org
> > > > > > > Date: 07/19/2016 01:41 PM
> > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
incremental
> > > > > > > processing to lflow_run and physical_run
> > > > > > >
> > > > > > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > > > > > >
> > > > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > > > Cc: dev at openvswitch.org
> > > > > > > > > Date: 07/19/2016 12:57 AM
> > > > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
> > incremental
> > > > > > > > > processing to lflow_run and physical_run
> > > > > > > > >
> > > > > > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats
wrote:
> > > > > > > > > > This code changes to allow incremental processing of
the
> > > > > > > > > > logical flow and physical binding tables whenver
possible.
> > > > > > > > > >
> > > > > > > > > > Note: flows created by physical_run for
multicast_groups
> > are
> > > > > > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > > > > > with GWs and local routers.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
> > > > > > > > >
> > > > > > > > > With this applied, I get a number of apparently
consistent
> > test
> > > > > > failures
> > > > > > > > > (literally no passes in 5+ runs) that I don't see, at
least
> > not
> > > > > > > > > consistently, if I revert it:
> > > > > > > > >
> > > > > > > > >     OVN end-to-end tests
> > > > > > > > >
> > > > > > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
> > FAILED
> > > > > > > > (ovn.at:1307)
> > > > > > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS
> > FAILED
> > > > > > > > (ovn.at:1473)
> > > > > > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR
> > FAILED
> > > > > > > > (ovn.at:1887)
> > > > > > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
> > FAILED
> > > > > > > > (ovn.at:2416)
> > > > > > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static
routes
> > > > FAILED
> > > > > > > > > (ovn.at:3053)
> > > > > > > > >
> > > > > > > > > Do these pass reasonably often for you?  (Maybe my
changes to
> > the
> > > > > > > > > previous patch screwed something up?)
> > > > > > > > >
> > > > > > > >
> > > > > > > > First, not all of the above cases failed for me even with a
> > rebase
> > > > of
> > > > > > the
> > > > > > > > last patch onto what is currently in master - the last two
> > passed
> > > > for
> > > > > > me
> > > > > > > > on either the first and second run, so I'm looking at the
first
> > > > three
> > > > > > for
> > > > > > > > the rest of this message.
> > > > > > > >
> > > > > > > > > Double checking by going back to my original first patch
> > verified
> > > > that
> > > > > > all
> > > > > > > > of the tests in question passed.  Going through and
checking
> > each
> > > > of
> > > > > > the
> > > > > > > > changes that you made to the first patch, the following is
the
> > diff
> > > > I
> > > > > > > > needed
> > > > > > > > to make things work again:
> > > > > > > >
> > > > > > > > diff --git a/ovn/controller/ofctrl.c
b/ovn/controller/ofctrl.c
> > > > > > > > index d10dcc6..e113213 100644
> > > > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id,
uint16_t
> > > > > > priority,
> > > > > > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > > > > > >      struct ovn_flow *d;
> > > > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > > > +        if (f->table_id == d->table_id && f->priority ==
d->
> > > > priority
> > > > > > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > > > +                             d->ofpacts, d->ofpacts_len)
> > > > > > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > > >              static struct vlog_rate_limit rl =
> > > > VLOG_RATE_LIMIT_INIT(5,
> > > > > > 1);
> > > > > > > >              char *s = ovn_flow_to_string(f);
> > > > > > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for
> > parent
> > > > > > > > "UUID_FMT,
> > > > > > > >
> > > > > > > > Hopefully that will get you past the test failures and you
can
> > push
> > > > > > > > both it and the rest of the last incremental processing
patch.
> > > > > > >
> > > > > > > OK...
> > > > > > >
> > > > > > > Help me understand why that change is correct.
> > > > > > >
> > > > > > > I deleted the table_id and priority checks because
> > ovn_flow_lookup()
> > > > > > > only returns flows where those are equal anyhow, so they
seemed
> > > > > > > completely redundant.  That leaves the actions comparison.
Why
> > would
> > > > we
> > > > > > > report a duplicate only if the actions are the same?  It's
> > actually
> > > > > > > worse, isn't it, if the actions are different, because then
the
> > flows
> > > > > > > aren't just duplicates, they're inconsistent with each other.
> > > > > >
> > > > > > Actually, you are correct in that I'm doing something I
shouldn't,
> > but
> > > > > > your solution is also doing something it shouldn't.  Yes, my
> > keeping
> > > > > > rules that are inconsistent is wrong, but discarding rules
where
> > > > > > the action changes (which is what you proposed) doesn't work
> > either.
> > > > > >
> > > > > > How about if we split the difference?  First, verify that the
uuid
> > > > > > is the same, then check the actions.  If they are the same,
spawn a
> > > > > > warning, and if they aren't replace the old actions with the
new
> > ones.
> > > > > > This appears to pass unit tests as well (and I think it's more
> > > > correct),
> > > > > > in which case the difference becomes:
> > > > > >
> > > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > > index d10dcc6..0b8326e 100644
> > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id,
uint16_t
> > > > priority,
> > > > > >      struct ovn_flow *d;
> > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > -            static struct vlog_rate_limit rl =
> > VLOG_RATE_LIMIT_INIT(5,
> > > > 1);
> > > > > > -            char *s = ovn_flow_to_string(f);
> > > > > > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for
parent
> > > > > > "UUID_FMT,
> > > > > > -                         s, UUID_ARGS(&f->uuid));
> > > > > > -            ovn_flow_destroy(f);
> > > > > > -            free(s);
> > > > > > -            return;
> > > > > > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > +                              d->ofpacts, d->ofpacts_len)) {
> > > > > > +                static struct vlog_rate_limit rl =
> > > > VLOG_RATE_LIMIT_INIT(5,
> > > > > > 1);
> > > > > > +                char *s = ovn_flow_to_string(f);
> > > > > > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for
> > parent
> > > > > > "UUID_FMT
> > > > > > +                             s, UUID_ARGS(&f->uuid));
> > > > > > +                ovn_flow_destroy(f);
> > > > > > +                free(s);
> > > > > > +                return;
> > > > > > +            } else {
> > > > > > +                /* Update the action. */
> > > > > > +                free(d->ofpacts);
> > > > > > +                d->ofpacts = xmemdup(f->ofpacts, f->
ofpacts_len);
> > > > > > +                d->ofpacts_len = f->ofpacts_len;
> > > > > > +            }
> > > > > >          }
> > > > > >      }
> > > > > >
> > > > >
> > > > > I'm OK with updating the actions, but I'd prefer to warn about it
> > anyway
> > > > > because I still think it's a bug; if the "new" actions happen to
work
> > > > > better than the "old" actions, I think that's just a coincidence.
> > > >
> > > > While I think that's going to be a *lot* of warnings, I'm ok with
> > warning
> > > > in that case as well for now and we can argue the point later on
after
> > we
> > > > have more operational experience with the patch set.
> > >
> > > So do we have a lot of code that's using ofctrl_add_flow() to
actually
> > > replace an existing flow instead of adding one?  Maybe that code
should
> > > be using ofctrl_set_flow() instead?
> > >
> > > I'm trying to understand the intent here instead of just the code.
> > >
> > > > You want to add the additional warning code or should I?
> > >
> > > I'm happy to write the code once I understand what's going on.
> >
> > Sorry for breaking the thread but while I can see your reply in the
> > archives,
> > something in the server chain isn't it dropping it into my mailbox,
> > and I don't want to look like I'm ignoring it.
> >
> > Based on looking through the logs when running the unit tests, I
believe
> > (but I don't know for sure) that the vast majority of warnings
> > will come from creating a completely identical flow (and I admit that I
> > don't know why *yet* - that was a topic for another day) - the reason I
> > can't say for sure is that since I've not had a separate message
handling
> > the action change case, I've had nothing to compare to.
> >
> > Since we are going to warn on both branches, I'd like to have
> > the warning message be different so that there is a canary to work
> > backwards
> > from to figure out what's driving what. Something like the following:
> >
> > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > index d10dcc6..7fd6ff8 100644
> > --- a/ovn/controller/ofctrl.c
> > +++ b/ovn/controller/ofctrl.c
> > @@ -554,13 +554,28 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
priority,
> >      struct ovn_flow *d;
> >      LIST_FOR_EACH (d, list_node, &existing) {
> >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
> > -            char *s = ovn_flow_to_string(f);
> > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT,
> > -                         s, UUID_ARGS(&f->uuid));
> > -            ovn_flow_destroy(f);
> > -            free(s);
> > -            return;
> > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > +                              d->ofpacts, d->ofpacts_len)) {
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > +                char *s = ovn_flow_to_string(f);
> > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT
> > +                             s, UUID_ARGS(&f->uuid));
> > +                ovn_flow_destroy(f);
> > +                free(s);
> > +                return;
> > +            } else {
> > +                /* Update the action. */
> > +                free(d->ofpacts);
> > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > +                d->ofpacts_len = f->ofpacts_len;
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > +                char *s = ovn_flow_to_string(f);
> > +                VLOG_WARN_RL(&rl, "found flow %s with modified action
for
> > paren
> > +                             s, UUID_ARGS(&f->uuid));
> > +                ovn_flow_destroy(f);
> > +                free(s);
> > +                return;
> > +            }
> >          }
> >
> > Does that make sense?
>
> Here's a version for your review:
>         http://openvswitch.org/pipermail/dev/2016-July/075709.html
>         http://openvswitch.org/pipermail/dev/2016-July/075710.html
>

Looking at them now

Ryan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to