On Tue, Oct 19, 2021 at 3:28 AM Han Zhou <hz...@ovn.org> wrote: > > On Mon, Oct 18, 2021 at 6:35 AM Odintsov Vladislav <vlodint...@croc.ru> > wrote: > > > > > > > > regards, > > Vladislav Odintsov > > > > > On 16 Oct 2021, at 03:20, Han Zhou <hz...@ovn.org> wrote: > > > > > > On Fri, Oct 15, 2021 at 2:36 AM Vladislav Odintsov <odiv...@gmail.com> > > > wrote: > > > > > >> > > >> > > >> Regards, > > >> Vladislav Odintsov > > >> > > >> On 15 Oct 2021, at 08:42, Han Zhou <hz...@ovn.org> wrote: > > >> > > >> On Thu, Oct 14, 2021 at 12:58 AM Vladislav Odintsov <odiv...@gmail.com> > > >> wrote: > > >> > > >> > > >> Hi Han, > > >> > > >> Thanks for the review. > > >> > > >> Regards, > > >> Vladislav Odintsov > > >> > > >> On 14 Oct 2021, at 08:13, Han Zhou <hz...@ovn.org> wrote: > > >> > > >> > > >> > > >> On Tue, Oct 5, 2021 at 1:26 PM Vladislav Odintsov <odiv...@gmail.com> > > >> wrote: > > >> > > >> > > >> This patch extends the Logical Router's routing functionality. > > >> Now a user may create multiple routing tables within a Logical Router > > >> and assign them to Logical Router Ports. > > >> > > >> Traffic coming from a Logical Router Port with an assigned route_table > > >> is checked against global routes, if any (Logical_Router_Static_Routes > > >> with an empty route_table field), next against directly connected routes, > > >> > > >> > > >> This is not accurate. The "directly connected routes" are NOT after the > > >> global routes. Their priority only depends on the prefix length. > > >> > > >> > > >> and then against Logical_Router_Static_Routes with the same route_table > > >> value as in the Logical_Router_Port options:route_table field. > > >> > > >> A new Logical Router ingress table #10 is added - IN_IP_ROUTING_PRE. > > >> In this table, packets which come from LRPs with a configured > > >> options:route_table field are matched on their inport, and a unique > > >> non-zero value identifying the route table is written to OVS register 7.
> > >> > > >> Then, in the 11th table, IN_IP_ROUTING, routes which have a non-empty > > >> `route_table` field are added with an additional match on the reg7 value > > >> associated with the appropriate route_table. > > >> > > >> > > >> Hi Vladislav, > > >> > > >> First of all, sorry for the delayed review, and thanks for implementing > > >> this new feature. > > >> > > >> > > >> I have some questions regarding the feature itself. I remember that we > > >> had some discussion earlier for this feature, but it seems I misunderstood > > >> the feature you are implementing here. I thought by multiple routing tables > > >> you were trying to support something like VRF in physical routers, but this > > >> seems to be something different. According to your implementation, instead > > >> of assigning LRPs to different routing tables, an LRP can actually > > >> participate in both the global routing table and the table with a specific > > >> ID. For ingress, global routes are preferred over other routes; for egress > > >> (i.e. forwarding to the next hop), it doesn't even enforce any table-id > > >> check, so a packet routed by an entry of table-X can go out of an LRP with > > >> table-id Y. Is my understanding correct about the desired behavior of this > > >> feature? > > >> > > >> > > >> > > >> Yes, your understanding is correct. > > >> This is not VRF. At first glance VRF can be done just by using LR-per-VRF > > >> without any code modifications (maybe there are corner cases, but it’s not > > >> something I’m currently working on). > > >> > > >> I agree VRF can be achieved with just separate LRs. I am trying to understand > > >> why multiple LRs wouldn't satisfy your use case here. > > >> > > >> An LRP can optionally be assigned a specific routing table name. This means > > >> that in the LR ingress pipeline, packets coming from this specific LRP are > > >> checked against routes with the same route_table value within the appropriate LR.
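[Editor's note] The configuration described above can be sketched with the generic ovn-nbctl database commands. This is only an illustrative sketch: the router/port names (lr0, lrp0) and the table name rtb-1 are made up, and it assumes the route_table column and options:route_table key introduced by this series.

```shell
# Assign LRP "lrp0" to routing table "rtb-1" (names are hypothetical).
ovn-nbctl set Logical_Router_Port lrp0 options:route_table=rtb-1

# Create a static route that belongs to table "rtb-1" and attach it to lr0.
route=$(ovn-nbctl create Logical_Router_Static_Route \
          ip_prefix=0.0.0.0/0 nexthop=192.168.0.254 route_table=rtb-1)
ovn-nbctl add Logical_Router lr0 static_routes "$route"

# Static routes left with an empty route_table stay "global" and apply to
# traffic entering from every LRP of the router.
```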
> > >> This is a kind of PBR, an analog of "ip rule add iif <interface name> > > >> lookup <id>". > > >> > > >> There is one specific use case which requires special handling: > > >> directly-connected routes (subnet CIDRs from the LRPs connected to this LR). > > >> These routes can’t be added manually by the user, though routing between LRPs > > >> belonging to different routing tables is still needed. So, these routes > > >> should be added to the global routing table to override routing for LRPs with > > >> configured routing tables. If for some reason a user wants to prohibit IP > > >> connectivity to any LRP (honestly, I can’t imagine why), it can be done by > > >> utilising OVN PBR with a drop action or a route with a "discard" nexthop. But > > >> in that case I’d question whether this LRP is needed in this LR at all. > > >> > > >> Also, if a user wants to create routes which apply to all LRPs from all > > >> routing tables, it can be done by adding new entries to the global routing > > >> table. > > >> > > >> And the third reason to have a global routing table is a fully > > >> backward-compatible solution. All users who don’t need multiple routing > > >> table support just continue using static routing in the same manner > > >> without any changes. > > >> > > >> > > >> If this is correct, it doesn't seem to be a common/standard requirement > > >> (or please educate me if I am wrong). Could you explain a little more about > > >> the actual use cases: what kind of real world problems need to be solved by > > >> this feature, or how are you going to use this? For example, why would a > > >> port need to participate in both routing tables? It looks like what you > > >> really need is policy routing instead of multiple isolated routing tables. > > >> I understand that you already use policy routing to implement ACLs, so it > > >> is not convenient to combine other policies (e.g. inport based routing) > > >> into the policy routing stage.
If that's the case, would it be more generic > > >> to support multiple policy routing stages? My concern with the current > > >> approach is that it is implemented for a very special use case. It makes > > >> the code more complex, but when there is a slightly different requirement in > > >> the future it becomes insufficient. I am thinking that policy routing seems > > >> more flexible and has more potential to be made more generic. Maybe I will > > >> have a better understanding when I hear more detailed use cases and > > >> considerations from you. > > >> > > >> > > >> > > >> I can't agree that this is an uncommon requirement. > > >> This implementation was inspired by the AWS Route Tables feature [1]: within > > >> a VPC (an LR in terms of OVN) a user may create multiple routing tables and > > >> assign them to different subnets (LRPs) in multiple availability zones. > > >> Auto-generated directly-connected routes from LRP CIDRs work in the > > >> same manner as they do in AWS - they apply to all subnets regardless of their > > >> association with a routing table. GCP has similar behaviour: [2]; Azure, I > > >> guess, too. > > >> > > >> Our public cloud (CROC Cloud Platform) supports the AWS behaviour [3], so I > > >> was primarily oriented towards it. Internally we already use this feature and > > >> currently it fits our use case, but again I can't say it is specific. > > >> > > >> If it is for AWS/GCP-alike routing features, then from what I understand > > >> what's implemented in this patch is a little different. To implement a VPC > > >> model in OVN, I think it is ok to have multiple LRs instead of only one LR > > >> for each VPC. So naturally I would use an LR to map to AWS's routing table.
> > >> In AWS's document (the link you provided), it says: > > >> > > >> "A subnet can only be associated with one route table at a time" > > >> > > >> So basically in AWS a subnet is either attached to the default/main route > > >> table or a custom table, but not both, right? However, in your use case, an > > >> LRP (which maps to a subnet) attaches to both "main" and a custom table, which > > >> seems not common to me. Or did I miss something? > > >> > > >> > > >> That’s true about AWS, but it is still a bit inaccurate about OVN. > > >> The global routing table in OVN terms is not what the AWS main route table is. > > >> The main route table is just a configuration hint for users, for implicitly > > >> associating route tables with subnets. > > >> Routing for subnets implicitly associated via the main routing table functions > > >> in the same manner as a normal explicit route_table-subnet association. > > >> > > >> The global routing table in OVN is just a list of routes with higher priority > > >> than routes with a configured "route_table". > > >> > > >> I am not proposing to configure both tables at the same time. But it is > > >> _possible_ to do so if required for some reason (for instance to configure > > >> some service chaining, or just internal VPC services like metadata/other > > >> internal APIs, access to other services).
> > >> Normally, if we talk about mapping AWS Route Tables to OVN, it is mostly a > > >> one-to-one mapping, except for the "Local route": > > >> > > > > > >> Example: > > >> AWS Route Table rtb-xxx: > > >> 172.31.0.0/16: local route (VPC CIDR for subnets) > > >> 0.0.0.0/0: igw-XXX (internet gateway) > > >> > > >> AWS Route Table rtb-yyy: > > >> 172.31.0.0/16: local route (VPC CIDR for subnets) > > >> 5.5.5.5/32: instance-xxx > > >> > > >> This maps to the OVN configuration (for one LR with one subnet 172.31.0.0/24): > > >> > > >> implicit route (not present in the logical router static routes table): > > >> 172.31.0.0/24: LRP-subnet-xxx - takes priority over other route > > >> table-oriented routes and can be treated as placed in the global routing table > > >> > > > > > > What do you mean by highest priority here? It all depends on the prefix > > > length, right? If you have a static route entry that says: 172.31.0.0/25 > > > -> xyz, then this route will have higher priority than the directly > > > connected subnet. > > > > > > > Yes, that is true, but only for routes within the “global” routing table. I > left this to preserve the previous behaviour. > > It is impossible to create a static route within any non-global routing > table which overrides directly connected routes, > > because the priority of “global” routes is increased by 100 (though it should > be increased by >2*128 to support the same behaviour for IPv6, as you already pointed out). > > > > > > > >> > > >> Normal static routes: > > >> ip_prefix: 0.0.0.0/0, nexthop: <IP of the edge LR’s LRP>, route_table: > > >> rtb-xxx > > >> ip_prefix: 5.5.5.5/32, nexthop: <IP of some LSP which belongs to the > > >> VM/container via which the route is built>, route_table: rtb-yyy > > >> > > >> I guess I understood the reason for the misunderstanding: the global routing > > >> table, which I referred to earlier, is a routing table which has no value in > > >> the "route_table" field, plus the directly-connected routes at the same time.
> The latter > > >> have no records in logical router static routes, but I still referred to them > > >> as routes from the "global routing table". I can think about the terminology > > >> here, if it’s the root cause of the misunderstanding. What do you think? > > >> > > > > > > Thanks for explaining. In fact I understand what you mean about "global > > > routing table", but I think you are implementing something different from > > > what AWS provides, primarily because you use a single LR instead of an LR per > > > routing table. I understand there are reasons why you want to do it this > > > way, but here I can think of some challenges of your approach. For example, > > > in AWS we could do: > > > > > > route table main: > > > subnet S0: > > > 172.31.0.0/24 > > > routes: > > > 172.31.0.0/16: local > > > > > > route table R1: > > > subnet S1: > > > 172.31.1.0/24 > > > routes: > > > 172.31.0.0/16: local > > > 172.31.0.0/24: <some FW/GW> > > > > Wow. It’s a brand-new behaviour for AWS, about which I was not aware, as it appeared 1.5 months ago [1]. > > > > Previously, when I was writing this patch series, such a static route was impossible to add in AWS. > > You couldn’t add routes which are the same as or more specific than the VPC > CIDR block. > > > > Though in GCP you can’t create a route the same as or more specific than a subnet route > [2]. > > I think I can try to update my patch to support the AWS approach here. > > Would it be simpler to just not make the global routes higher priority? We > could make it clear to the user that if they add a route in any route table > that overlaps with any directly connected subnets, the longer prefix > would take precedence. Would it be more flexible for the user? > > > > > But all this is relevant to a single LR per VPC (not per route table, > because of the blocker described earlier). > > Do you think it is reasonable with such inputs?
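[Editor's note] The priority overlap discussed in this thread is easy to see with a little arithmetic. The model below is only a rough illustration, not the exact ovn-northd formula: assume a route's logical-flow priority is about 2 × prefix-length, with the +100 boost applied to global-table routes.

```shell
# prio PREFIX_LEN BOOST -> priority under the assumed model
# (an illustration of the ranges involved, not the real ovn-northd formula).
prio() { echo $(( 2 * $1 + $2 )); }

prio 30 100   # global IPv6 /30 route     -> 160
prio 128 0    # table-specific /128 route -> 256
# 256 > 160: the table-specific route overrides the global one even though
# global routes are meant to win; hence the suggestion to boost by more
# than 2*128.
```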
> > > > > > > > A packet from S1 to S0 will go to the FW/GW, where it may be dropped, forwarded to > > > S0, or redirected to something outside of AWS ... > > > > > > While in your approach with OVN, both subnets will be under the same > > > logical router, with different table IDs assigned to the LRPs and > > > routes. But when a VM under S1 sends a packet to S0, it will go to the > > > destination directly, because you are prioritizing the direct-connection > > > routes and they are under the same global route table, and there is no way > > > to force it to go through some FW/GW from the custom route table. You can > > > add a route to achieve this in the global table, because in your design the > > > global routes are still applicable to all LRPs and have higher priority, but > > > it would impact all the LRPs/subnets, while in the AWS design above it is > > > supposed to affect subnets under R1 only. > > > > > > In addition, to my understanding, S0 and S1 in AWS cannot communicate > > > directly unless there are some specific routes in both route tables to > > > route them to some gateway. In other words, there is some isolation > > > between route tables. However, in your design with OVN, it is more of > > > > S0 and S1 can communicate with each other while being in different route tables > without any specific setup; > > the default unremovable “Local route” ensures this. > > A user can only override the "local" route with an existing subnet’s CIDR (more > specific than the "local route") > > to send traffic from S0 to S1 through some specific appliance(s) in S2. > > But basic IP connectivity always remains, regardless of route table > subnet associations. > > So I’m still sure that one LR per VPC suits this scenario better, > > as I can’t imagine why one would place subnets which don’t need connectivity in > one VPC. > > > Maybe I was wrong.
I am not really familiar with AWS and I wanted to try > this myself but didn't get the time :( > If default connectivity for subnets under different route tables is a > requirement (instead of the contrary), then I'd agree that one LR per VPC > seems better, because it is more like policy routing instead of for > isolation. > > > > policy routing without isolation at all. I am not saying that your > design > > > is wrong, but just saying it implements very different behavior than > AWS, > > > and I am trying to understand your real use cases to have a better > > > understanding of the feature you implemented. (You explained that you > just > > > wanted the AWS behavior, but it looks not exactly the case). > > > > > > > Will fix places where the behaviour has difference with AWS. > > And thanks you for pointing such places. > > > Well, I think it doesn't have to be all the same as AWS, but yes if AWS > design makes more sense. > > > > > > >> > > >> > > >> In my opinion having this feature to be implemented using PBR is less > > >> > > >> convenient and native for users, who are familiar with behaviour for > > >> mentioned above public cloud platforms, because configuring routes > should > > >> be done in routes section. And adding route table property seems > native in > > >> this route, not in PBR. Moreover, I _think_ using > > >> Logical_Router_Static_Route to extend this feature for > OVN-Interconnection > > >> becomes quite easy comparing to PBR (though, I didn’t try the latter). > > >> > > >> I agree if it is just AWS -like requirement, PBR is less convenient. > > >> > > >> I am trying to understand if it can be achieved with separate LRs. If > not, > > >> what's special about the requirement, and is the current approach > providing > > >> a solution common enough so that more use cases can also benefit from? > > >> Could you clarify a little more? Thanks again. > > >> > > >> That was our initial approach - to use separate LRs for each Route > Table. 
> > >> We rejected that solution because we met some difficulties and a blocker. See > below: > > >> > > >> Brief topology description if using LRs per Route Table: > > >> Imagine 2 subnets in a VPC in the 1st AZ, and one in another AZ. Each subnet is in its > > >> own Route Table (LR). > > >> All subnets must have IP connectivity, so we have to somehow interconnect > > >> these Route Table LRs. > > >> > > > > > > That's exactly the same case for AWS, right? If you want direct > > > connectivity, you would just put them under the same route table in AWS (or > > > the same LR in OVN), right? > > > > > > > I’m sorry, my example was not good enough. > > Let’s remove AZs from that case, as they’re not relevant to the discussion. > > So, we’ve got 2 subnets, S0 and S1, and suppose they’ve got different > route tables assigned to them. > > If we place them in a single LR and assign different route tables, they > would have connectivity "out of the box". > > Same in AWS: the "local route" is added automatically, can’t be > removed or modified, and ensures IP connectivity between all VPC subnets. > > > Ok, thanks for explaining. > > > > > > >> > > >> [BLOCKER] It is impossible to place a route in route table 1 via a VM from a > > >> subnet associated with route table 2 if using a per-RTB LR, because in the RTB-2 LR > > >> we would have to add a route from RTB 1, and this breaks route table isolation. > > >> The Route Table 2 LR will start looking up routes, and there could be routes > > >> from other route tables. This breaks the idea of having an LR per Route > > >> Table completely. Here we rejected this solution and moved to adding > > >> support for route tables in OVN. > > >> > > >> > > > Would it be just the same problem in AWS? Why would a user route a subnet > > > of RTB-1 to a VM under RTB-2, and at the same time want route table > > > isolation?
> > > With the single LR solution in OVN, you would not get the isolation between > > > the tables, because 1) all LRPs still share the global routing table, 2) > > > for output to the nexthop there is no distinction between the output interfaces. > > > > > > > No, such a problem does not exist. > > By route table "isolation" I meant that routing rules configured in one > routing table must not affect routing in another one. > > In this scenario a user may want to route traffic to the internet from S0 > (RTB-0) through an instance in S1 (RTB-1). > > For instance, it can be a gateway service (FW, WAF, IPS, etc.) in S1, which > provides secure access for clients from the VPC to the internet with NAT. > > So, the configuration should be: > > - 0.0.0.0/0 via GW-INSTANCE in RTB-0 > > - 0.0.0.0/0 via the edge LR in RTB-1 > > So, instances from all subnets with the rtb-0 route table assigned will be > routed to the internet through GW-INSTANCE, which can analyse/drop/do whatever it > needs with the traffic. > > This is a small example to show how different route tables can be used in > a single VPC. > > With multiple OVN LRs per VPC in an AZ such a topology can’t be created, see: > > in the LR for RTB-0 we can add the route 0.0.0.0/0 via <LR0-to-LR1 link IP>; > > but in the LR for RTB-1 we can only add a route to the internet through either > the edge LR or GW-INSTANCE. Not both. So, this case can’t be implemented > currently in OVN. > > > Thanks for the example. It helps me understand your requirements. While I > believe this can be achieved in current OVN with the policy routing > feature, I understand, as you mentioned earlier, that you need policy routing for > ACL purposes. I also agree that multi-stage policy routing would seem less > user-friendly (and maybe also harder to implement), so I support the > multi-table feature for the purpose of providing extra policy routing > capability.
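[Editor's note] The GW-INSTANCE use case described above could be sketched as two default routes on a single LR that differ only in their route_table. Every name and address here is hypothetical, and the route_table column is the one added by this series.

```shell
# rtb-0: default route via the in-VPC gateway instance (e.g. FW/WAF/IPS).
r0=$(ovn-nbctl create Logical_Router_Static_Route \
       ip_prefix=0.0.0.0/0 nexthop=172.31.1.10 route_table=rtb-0)

# rtb-1: default route via the edge logical router link.
r1=$(ovn-nbctl create Logical_Router_Static_Route \
       ip_prefix=0.0.0.0/0 nexthop=10.0.0.1 route_table=rtb-1)

# Attach both routes to the same VPC router.
ovn-nbctl add Logical_Router lr0 static_routes "$r0"
ovn-nbctl add Logical_Router lr0 static_routes "$r1"
```

With this, LRPs whose options:route_table is rtb-0 send internet-bound traffic to the gateway instance, while rtb-1 LRPs go straight to the edge router; this is the scenario that cannot be expressed with one LR per route table.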
> > While I still need to take a closer look at the code, I believe it would be > helpful to add such example use cases to the documentation for users to > better understand the feature. > > > Hope this helps you understand my problem. > > > > > > > >> But some more cons: > > >> > > >> 1. Complex network topology. Interconnecting all LRs, even with some > > >> transit switch, is harder than having one LR where all VPC-related > > >> configuration is done in one place (in an AZ). > > >> 2. Directly-connected routes. In case we have multiple Route Table LRs, we > > >> have to add a route for each subnet from another LR. In the one-LR-per-VPC case, > > >> all such routes are installed automatically and learnt via ovn-ic. > > >> 3. PBR, LBs. It’s much easier to implement PBR and configure Load > > >> Balancers in one LR than in multiple. > > >> 4. There could be very many LRs, LRPs, LSPs (datapaths in SB) - excess > > >> database records and huge database growth. > > >> 5. Extra client code complexity (updating routes requires configuring > > >> routes in many LRs in different availability zones). > > >> 6. We couldn’t use ovn-ic route learning, because a VPC requires out-of-the-box > > >> IP connectivity between subnets, and if we connect Route Table LRs between > > >> AZs, connecting multiple LRs from different Route Tables would > > >> learn routes without binding them to a route table. This requires additional > > >> isolation at the transit switch level. > > >> 7. There were some more problems. These are the ones I could > > >> recall. > > >> > > >> From our half-year experience, using one LR per VPC is very comfortable, > > >> and it looks quite extendable in terms of network features. > > >> > > >> Let me know if this makes sense. > > >> > > >> > > > Thanks for the list of the cons! Most of the cons you listed above seem > to > > > apply to the AWS model, too, right?
It is ok to have different > requirements > > > > Honestly, I can’t imagine how these bullets apply to the "AWS model". Maybe only > ovn-ic, which > > requires modifications in the ovn-ic codebase to support route_tables in > routes. > > But this work is already done. > > I think the disagreement was mainly due to the understanding of the behavior > of directly connected subnets between different routing tables. I assume my > understanding was wrong for now :) > > > > > than AWS, but we'd better define it clearly, and it would be helpful with > > > some typical use cases that solve specific problems. I have more > > > information now after your explanation, but I am still not really clear under > > > what scenarios this feature would be used. > > > > > > @Numan Siddique <num...@ovn.org> @Mark Michelson <mmich...@redhat.com> > > > could you let me know your thoughts on this, too? Would you see similar use > > > cases among Red Hat customers? Or does this design match well with your > > > potential use cases? I'd like to be careful when implementing something > > > like this and make sure we are doing it the right way.
I don't think there is any use case in ovn-k8s or openstack neutron which would use this feature. Sorry, I've not followed the discussion here closely. What is the tl;dr? @Han Zhou Are you fine with this feature, and will we be accepting this in OVN? Thanks Numan > > > > One comment here from my side: > > by "this design" we should talk about "behaviour similar to the route table > support of AWS/other public platforms". > > I think that it is a good feature for an opensource SDN, which can be further > used by opensource cloud platforms like OpenStack to extend routing > capabilities. > > > Agree. > > > > > > > Thanks, > > > Han > > > > 1: > https://aws.amazon.com/blogs/aws/inspect-subnet-to-subnet-traffic-with-amazon-vpc-more-specific-routing/ > > 2: https://cloud.google.com/vpc/docs/routes#subnet-routes > > > > > > > >> Han > > >> > > >> > > >> I haven't finished reviewing the code yet, but I have one comment > > >> regarding adding 100 to the priority of the global routes. For IPv6, the > > >> priority ranges from 0 to 128x2=256, so adding 100 is not enough. It would > > >> create overlapping priority ranges, and some table-id specific route > > >> entries may override the global routes. > > >> > > >> > > >> > > >> Thanks, I’ll dig into this when you finish the review. > > >> > > >> > > >> Let me know if I answered your questions or if you have new ones. > > >> Again, many thanks for your time and for digging into this patch series.
> > >> > > >> 1: > https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html > > >> 2: https://cloud.google.com/vpc/docs/routes#subnet-routes > > >> 3: https://docs.cloud.croc.ru/en/services/networks/routetables.html > > >> > > >> Thanks, > > >> Han > > >> > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev