Thanks Dumitru for this promising optimization! On Thu, Aug 11, 2022 at 1:03 AM Dumitru Ceara <dce...@redhat.com> wrote: > > On 8/10/22 19:54, Mark Michelson wrote: > > Hi Dumitru, > > > > Hi Mark, > > > I read the patch series, and I think the idea of chassis-specific > > variables is a good idea to reduce the number of DB records for certain > > things. Aside from load balancers, I suspect this could have a positive > > impact for other structures as well. > > > > Thanks for taking a look! Yes, I think this might be applicable to > other structures too. >
I think it is a good idea to make it more generic, but for my understanding this template concept is for anything that's "node/chassis" specific, and supposed to be instantiated at chassis level. Probably we should name the DB tables as something like chassis_template_var. > > Rather than criticizing the individual lines of code, I'll focus instead > > on some higher-level questions/ideas. > > > > Sure, thanks! :) > > > First, one question I had was what happens when a template variable name > > is used in a load balancer, but there is no appropriate value to > > substitute? For instance, what if a load balancer applies to chassis-3, > > but you only have template variables for chassis-1 and chassis-2? This > > might be addressed in the code but I didn't notice if it was. > > > > There are actually two things to consider here: > > 1. there might be a logical flow that uses a template variable: in this > case if template expansion/instantiation fails we currently leave the > token untouched (e.g., '^variable' stays '^variable'). That will cause > the flow action/match parsing to fail and currently logs a warning. The > flow itself is skipped, as it should be. We probably need to avoid > logging a warning though. > > 2. like you pointed out, there might be a load balancer using templates > in its backends/vips: if some of those templates cannot be instantiated > locally the backend/vip where they're added is skipped. Unless I missed > something, the code should already do that. > > > Second, it seems like template variables are a natural extension of > > existing concepts like address sets and port groups. In those cases, > > they were an unconditional collection of IP addresses or ports. For > > You're right to some extent template variables are similar to port > groups. The southbound database port group table splits the northbound > port group per datapath though not per chassis like template variables. > > > template variables, they're a collection of untyped values with the > > condition of only applying on certain Chassis. I wonder if this could > > all be reconciled with a single table that uses untyped values with > > user-specified conditions. Right now template variables have a "Chassis" > > column, but maybe this could be replaced with a broader "condition", > > "when", or "match" column. To get something integrated quickly, this > > column could just accept the syntax of "chassis.name == <blah>" or > > "chassis.uuid == <blah>" to allow for chassis-specific application of > > the values. With this foundation, we could eventually allow > > unconditional application of the value, or more complex conditions (e.g. > > only apply to logical switch ports that are connected to a router with a > > distributed gateway port). Doing this, we could deprecate address sets > > and port groups eventually in favor of template variables. > > This sounds like a good idea to me. I wasn't too happy with the > "chassis" string column of the Template_Var table anyway. A generic > condition field makes more sense. > If it is chassis-specific template, a column "chassis" seems to be straightforward. With a "match" column it is another burden of parsing (which is costly and error prone). In addition, the LB object (or other structures) is not a logical-flow, and it doesn't directly map to logical-flows (unlike ACLs), so I didn't understand how would a match string be applied to the template. Is there a more detailed example of this? Maybe I am missing something, and hope we will see more details in the formal patch. > Regarding deprecating and replacing address sets and port groups, I'm > not sure how easy that would be but we can try it when we get to that point. > Address sets and port groups are something different in my view. Although they can be treated as variables in a template, they are not really chassis-specific, and each variable needs to be instantiated to a big number of instances (sometimes huge). For this reason, fine-grained I-P embedded in the expression parsing (for Address set) was introduced for the performance of ovn-controller. Maybe we can say there are still some similarities of templates, but I am not really sure if it is really helpful to generalize them and how difficult it would be. Thanks, Han > > > > Third, I was wondering if there could be some layer that exists between > > the IDL and the application that expands the template variables as early > > as possible. I'm thinking the application could inject some callback in > > the IDL layer that might allow for the values to be substituted. This > > way, the variable substitution is taken care of in a single place, and > > by the time the application gets the data, it knows that all > > substitutions have been made and there is no need to special case > > template variable names vs. plain tokens. They should all be plain > > tokens. I don't think construction of such a layer should be a barrier > > to merging the code, but it's something worth considering as a later > > improvement. > > This could work. I'll think more about it. But like you said, it's > probably a longer term goal. It will need some significant changes in > the IDL layer (e.g., to re-evaluate some records when template > instantiations change). > > > > > Anyway, those were my high-level thoughts on the topic. Let me know what > > you think. > > > > I can work on changing the Template_Var schema to add a broader way of > specifying conditions (when/match/etc). I'm already working on adding > proper nbctl support for templated load balancers and trying to tackle > the rest of the todos. I can probably send a v1 sometime in the first > half of next week. Do you want to share any specific code related > comments that I should already integrate or shall we start a proper > review when v1 gets posted? > > Thanks again for your input on this RFC! > > Regards, > Dumitru > > > On 8/5/22 12:26, Dumitru Ceara wrote: > >> Sometimes network components are compute node-specific. Sometimes such > >> components are replicated, almost identically, for multiple nodes > >> in the cluster. > >> > >> One such example is the case of Kubernetes NodePort services which > >> translate (in the ovn-kubernetes case) to Load_Balancer > >> objects being applied to each and every node's logical gateway router. > >> These load balancers are almost identical, the main difference being > >> the fact that they use different VIPs (the node's IP). > >> > >> With the current OVN load balancer design, this becomes a problem at > >> scale because the number of load balancers that must be configured is > >> N x M (N nodes times M services). > >> > >> This series proposes a new concept in OVN: virtual network component > >> templates. The goal of the templates is to help reduce resource > >> consumption in the OVN central components in specific cases like the one > >> described above. > >> > >> To achieve that, the CMS will instead configure a "templated" load > >> balancer for every service and apply that single template record to > >> the cluster-wide load balancer group. This template is then > >> instantiated differently on different compute nodes. This translation > >> is controlled through per-chassis "template variables" configured by > >> the CMS in the new NB.Template_Var table. > >> > >> A syntetic benchmark simulating what an OpenShift router (using Node > >> Port services) scale test would do shows the following preliminary > >> results: > >> A. 120 node, 2K NodePort services: > >> - before: > >> - Southbound DB size on disk (compacted): ~385MB > >> - Southbound DB memory usage (RSS): ~3GB > >> - Southbound DB logical flows: 720K > >> > >> - after: > >> - Southbound DB size on disk (compacted): ~100MB > >> - Southbound DB memory usage (RSS): ~250MB > >> - Southbound DB logical flows: 6K > >> > >> B. 250 node, 2K NodePort services: > >> - after (didn't run the "before" test as it was taking way too long): > >> - Southbound DB size on disk (compacted): ~155MB > >> - Southbound DB memory usage (RSS): ~760MB > >> - Southbound DB logical flows: 6K > >> > >> The series is sent as RFC because there's still the need to add > >> some template specific unit tests and the "ovn-nbctl lb-*" helper > >> utilities need to be adapted to support templated load balancers. > >> > >> With these two items addressed the code is self can likely qualify > >> for acceptance as a new feature in the upcoming release. > >> > >> There also exists a more extensive TODO list (also listed in the commit > >> log of every patch in the series for now) but these are mainly load > >> balancer related functionalities that are not yet implemented for > >> templated load balancers but can definitely be implemented as follow ups: > >> - No support for LB health check if the LB is templated. > >> - No support for VIP ARP responder if the LB is templated. > >> - No support for routed VIPs if the LB is templated. > >> - Figure out a way to deal with templates in ovn-trace > >> - Determine if we need to allow Template_Var to match against chassis > >> hostname or other IDs. > >> - Make ofctrl_inject_pkt() work with template_vars. > >> - Make test-ovn work with template_vars. > >> > >> A basic example of how to configure a templated load balancer follows: > >> $ ovn-nbctl create load_balancer name=lb-test \ > >> protocol=tcp options:template=true \ > >> vips:\"^vip:4200\"="^backends" > >> > >> $ ovn-nbctl ls-add ls > >> $ ovn-nbctl ls-lb-add ls lb-test > >> > >> # Instantiate the load balancer on chassis-1 > >> $ ovn-nbctl create template_var name=vip value=80.80.80.1 > >> chassis=chassis-1 > >> $ ovn-nbctl create template_var name=backends > >> value='"42.42.42.1:1000"' chassis=chassis-1 > >> > >> # Instantiate the load balancer on chassis-2 > >> $ ovn-nbctl create template_var name=vip value=80.80.80.2 > >> chassis=chassis-2 > >> $ ovn-nbctl create template_var name=backends > >> value='"42.42.42.2:1000"' chassis=chassis-2 > >> > >> Dumitru Ceara (5): > >> Add NB and SB Template_Var tables. > >> controller: Add support for templated actions and matches. > >> controller: Make resource references more generic. > >> lb: Support using templates. > >> controller: Add Template_Var <- LB references. > >> > >> > >> controller/lflow.c | 248 ++++++++++++++++++++++++-------- > >> controller/lflow.h | 98 +++++++------ > >> controller/ofctrl.c | 9 +- > >> controller/ofctrl.h | 3 +- > >> controller/ovn-controller.c | 277 ++++++++++++++++++++++++++++++++++-- > >> include/ovn/expr.h | 4 +- > >> include/ovn/lex.h | 14 +- > >> lib/actions.c | 9 +- > >> lib/expr.c | 14 +- > >> lib/lb.c | 201 ++++++++++++++++++++++---- > >> lib/lb.h | 36 +++-- > >> lib/lex.c | 55 +++++++ > >> northd/northd.c | 64 +++++---- > >> tests/ovn.at | 2 +- > >> tests/test-ovn.c | 16 ++- > >> utilities/ovn-trace.c | 26 +++- > >> 16 files changed, 869 insertions(+), 207 deletions(-) > >> > >> _______________________________________________ > >> dev mailing list > >> d...@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev