Thanks Dumitru for this promising optimization!

On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 8/10/22 19:54, Mark Michelson wrote:
> > Hi Dumitru,
> >
>
> Hi Mark,
>
> > I read the patch series, and I think the idea of chassis-specific
> > variables is a good idea to reduce the number of DB records for certain
> > things. Aside from load balancers, I suspect this could have a positive
> > impact for other structures as well.
> >
>
> Thanks for taking a look!  Yes, I think this might be applicable to
> other structures too.
>

I think it is a good idea to make it more generic, but for my understanding
this template concept is for anything that's "node/chassis" specific, and
supposed to be instantiated at chassis level. Probably we should name the
DB tables as something like chassis_template_var.

> > Rather than criticizing the individual lines of code, I'll focus instead
> > on some higher-level questions/ideas.
> >
>
> Sure, thanks! :)
>
> > First, one question I had was what happens when a template variable name
> > is used in a load balancer, but there is no appropriate value to
> > substitute? For instance, what if a load balancer applies to chassis-3,
> > but you only have template variables for chassis-1 and chassis-2? This
> > might be addressed in the code but I didn't notice if it was.
> >
>
> There are actually two things to consider here:
>
> 1. there might be a logical flow that uses a template variable: in this
> case if template expansion/instantiation fails we currently leave the
> token untouched (e.g., '^variable' stays '^variable').  That will cause
> the flow action/match parsing to fail and currently logs a warning.  The
> flow itself is skipped, as it should be.  We probably need to avoid
> logging a warning though.
>
> 2. like you pointed out, there might be a load balancer using templates
> in its backends/vips: if some of those templates cannot be instantiated
> locally the backend/vip where they're added is skipped.  Unless I missed
> something, the code should already do that.
>
> > Second, it seems like template variables are a natural extension of
> > existing concepts like address sets and port groups. In those cases,
> > they were an unconditional collection of IP addresses or ports. For
>
> You're right to some extent template variables are similar to port
> groups.  The southbound database port group table splits the northbound
> port group per datapath though not per chassis like template variables.
>
> > template variables, they're a collection of untyped values with the
> > condition of only applying on certain Chassis. I wonder if this could
> > all be reconciled with a single table that uses untyped values with
> > user-specified conditions. Right now template variables have a "Chassis"
> > column, but maybe this could be replaced with a broader "condition",
> > "when", or "match" column. To get something integrated quickly, this
> > column could just accept the syntax of "chassis.name == <blah>" or
> > "chassis.uuid == <blah>" to allow for chassis-specific application of
> > the values. With this foundation, we could eventually allow
> > unconditional application of the value, or more complex conditions (e.g.
> > only apply to logical switch ports that are connected to a router with a
> > distributed gateway port). Doing this, we could deprecate address sets
> > and port groups eventually in favor of template variables.
>
> This sounds like a good idea to me.  I wasn't too happy with the
> "chassis" string column of the Template_Var table anyway.  A generic
> condition field makes more sense.
>
If it is chassis-specific template, a column "chassis" seems to be
straightforward. With a "match" column it is another burden of parsing
(which is costly and error prone). In addition, the LB object (or other
structures) is not a logical-flow, and it doesn't directly map to
logical-flows (unlike ACLs), so I didn't understand how would a match
string be applied to the template. Is there a more detailed example of
this? Maybe I am missing something, and hope we will see more details in
the formal patch.

> Regarding deprecating and replacing address sets and port groups, I'm
> not sure how easy that would be but we can try it when we get to that
point.
>

Address sets and port groups are something different in my view. Although
they can be treated as variables in a template, they are not really
chassis-specific, and each variable needs to be instantiated to a big
number of instances (sometimes huge). For this reason, fine-grained I-P
embedded in the expression parsing (for Address set) was introduced for the
performance of ovn-controller. Maybe we can say there are still some
similarities of templates, but I am not really sure if it is really helpful
to generalize them and how difficult it would be.

Thanks,
Han

> >
> > Third, I was wondering if there could be some layer that exists between
> > the IDL and the application that expands the template variables as early
> > as possible. I'm thinking the application could inject some callback in
> > the IDL layer that might allow for the values to be substituted. This
> > way, the variable substitution is taken care of in a single place, and
> > by the time the application gets the data, it knows that all
> > substitutions have been made and there is no need to special case
> > template variable names vs. plain tokens. They should all be plain
> > tokens. I don't think construction of such a layer should be a barrier
> > to merging the code, but it's something worth considering as a later
> > improvement.
>
> This could work.  I'll think more about it.  But like you said, it's
> probably a longer term goal.  It will need some significant changes in
> the IDL layer (e.g., to re-evaluate some records when template
> instantiations change).
>
> >
> > Anyway, those were my high-level thoughts on the topic. Let me know what
> > you think.
> >
>
> I can work on changing the Template_Var schema to add a broader way of
> specifying conditions (when/match/etc).  I'm already working on adding
> proper nbctl support for templated load balancers and trying to tackle
> the rest of the todos.  I can probably send a v1 sometime in the first
> half of next week.  Do you want to share any specific code related
> comments that I should already integrate or shall we start a proper
> review when v1 gets posted?
>
> Thanks again for your input on this RFC!
>
> Regards,
> Dumitru
>
> > On 8/5/22 12:26, Dumitru Ceara wrote:
> >> Sometimes network components are compute node-specific.  Sometimes such
> >> components are replicated, almost identically, for multiple nodes
> >> in the cluster.
> >>
> >> One such example is the case of Kubernetes NodePort services which
> >> translate (in the ovn-kubernetes case) to Load_Balancer
> >> objects being applied to each and every node's logical gateway router.
> >> These load balancers are almost identical, the main difference being
> >> the fact that they use different VIPs (the node's IP).
> >>
> >> With the current OVN load balancer design, this becomes a problem at
> >> scale because the number of load balancers that must be configured is
> >> N x M (N nodes times M services).
> >>
> >> This series proposes a new concept in OVN: virtual network component
> >> templates.  The goal of the templates is to help reduce resource
> >> consumption in the OVN central components in specific cases like the
one
> >> described above.
> >>
> >> To achieve that, the CMS will instead configure a "templated" load
> >> balancer for every service and apply that single template record to
> >> the cluster-wide load balancer group.  This template is then
> >> instantiated differently on different compute nodes.  This translation
> >> is controlled through per-chassis "template variables" configured by
> >> the CMS in the new NB.Template_Var table.
> >>
> >> A syntetic benchmark simulating what an OpenShift router (using Node
> >> Port services) scale test would do shows the following preliminary
> >> results:
> >> A. 120 node, 2K NodePort services:
> >> - before:
> >>    - Southbound DB size on disk (compacted): ~385MB
> >>    - Southbound DB memory usage (RSS): ~3GB
> >>    - Southbound DB logical flows: 720K
> >>
> >> - after:
> >>    - Southbound DB size on disk (compacted): ~100MB
> >>    - Southbound DB memory usage (RSS): ~250MB
> >>    - Southbound DB logical flows: 6K
> >>
> >> B. 250 node, 2K NodePort services:
> >> - after (didn't run the "before" test as it was taking way too long):
> >>    - Southbound DB size on disk (compacted): ~155MB
> >>    - Southbound DB memory usage (RSS): ~760MB
> >>    - Southbound DB logical flows: 6K
> >>
> >> The series is sent as RFC because there's still the need to add
> >> some template specific unit tests and the "ovn-nbctl lb-*" helper
> >> utilities need to be adapted to support templated load balancers.
> >>
> >> With these two items addressed the code is self can likely qualify
> >> for acceptance as a new feature in the upcoming release.
> >>
> >> There also exists a more extensive TODO list (also listed in the commit
> >> log of every patch in the series for now) but these are mainly load
> >> balancer related functionalities that are not yet implemented for
> >> templated load balancers but can definitely be implemented as follow
ups:
> >> - No support for LB health check if the LB is templated.
> >> - No support for VIP ARP responder if the LB is templated.
> >> - No support for routed VIPs if the LB is templated.
> >> - Figure out a way to deal with templates in ovn-trace
> >> - Determine if we need to allow Template_Var to match against chassis
> >>    hostname or other IDs.
> >> - Make ofctrl_inject_pkt() work with template_vars.
> >> - Make test-ovn work with template_vars.
> >>
> >> A basic example of how to configure a templated load balancer follows:
> >>    $ ovn-nbctl create load_balancer name=lb-test \
> >>        protocol=tcp options:template=true \
> >>        vips:\"^vip:4200\"="^backends"
> >>
> >>    $ ovn-nbctl ls-add ls
> >>    $ ovn-nbctl ls-lb-add ls lb-test
> >>
> >>    # Instantiate the load balancer on chassis-1
> >>    $ ovn-nbctl create template_var name=vip value=80.80.80.1
> >> chassis=chassis-1
> >>    $ ovn-nbctl create template_var name=backends
> >> value='"42.42.42.1:1000"' chassis=chassis-1
> >>
> >>    # Instantiate the load balancer on chassis-2
> >>    $ ovn-nbctl create template_var name=vip value=80.80.80.2
> >> chassis=chassis-2
> >>    $ ovn-nbctl create template_var name=backends
> >> value='"42.42.42.2:1000"' chassis=chassis-2
> >>
> >> Dumitru Ceara (5):
> >>        Add NB and SB Template_Var tables.
> >>        controller: Add support for templated actions and matches.
> >>        controller: Make resource references more generic.
> >>        lb: Support using templates.
> >>        controller: Add Template_Var <- LB references.
> >>
> >>
> >>   controller/lflow.c          | 248 ++++++++++++++++++++++++--------
> >>   controller/lflow.h          |  98 +++++++------
> >>   controller/ofctrl.c         |   9 +-
> >>   controller/ofctrl.h         |   3 +-
> >>   controller/ovn-controller.c | 277
++++++++++++++++++++++++++++++++++--
> >>   include/ovn/expr.h          |   4 +-
> >>   include/ovn/lex.h           |  14 +-
> >>   lib/actions.c               |   9 +-
> >>   lib/expr.c                  |  14 +-
> >>   lib/lb.c                    | 201 ++++++++++++++++++++++----
> >>   lib/lb.h                    |  36 +++--
> >>   lib/lex.c                   |  55 +++++++
> >>   northd/northd.c             |  64 +++++----
> >>   tests/ovn.at                |   2 +-
> >>   tests/test-ovn.c            |  16 ++-
> >>   utilities/ovn-trace.c       |  26 +++-
> >>   16 files changed, 869 insertions(+), 207 deletions(-)
> >>
> >> _______________________________________________
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to