On Tue, Oct 19, 2021 at 3:28 AM Han Zhou <hz...@ovn.org> wrote:
>
> On Mon, Oct 18, 2021 at 6:35 AM Odintsov Vladislav <vlodint...@croc.ru>
> wrote:
> >
> >
> >
> > regards,
> > Vladislav Odintsov
> >
> > > On 16 Oct 2021, at 03:20, Han Zhou <hz...@ovn.org> wrote:
> > >
> > > On Fri, Oct 15, 2021 at 2:36 AM Vladislav Odintsov <odiv...@gmail.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> Regards,
> > >> Vladislav Odintsov
> > >>
> > >> On 15 Oct 2021, at 08:42, Han Zhou <hz...@ovn.org> wrote:
> > >>
> > >> On Thu, Oct 14, 2021 at 12:58 AM Vladislav Odintsov <odiv...@gmail.com>
> > >> wrote:
> > >>
> > >>
> > >> Hi Han,
> > >>
> > >> Thanks for the review.
> > >>
> > >> Regards,
> > >> Vladislav Odintsov
> > >>
> > >> On 14 Oct 2021, at 08:13, Han Zhou <hz...@ovn.org> wrote:
> > >>
> > >>
> > >>
> > >> On Tue, Oct 5, 2021 at 1:26 PM Vladislav Odintsov <odiv...@gmail.com>
> > >>
> > >> wrote:
> > >>
> > >>
> > >> This patch extends Logical Router's routing functionality.
> > >> Now user may create multiple routing tables within a Logical Router
> > >> and assign them to Logical Router Ports.
> > >>
> > >> Traffic coming from Logical Router Port with assigned route_table
> > >> is checked against global routes if any (Logical_Router_Static_Routes
> > >> whith empty route_table field), next against directly connected routes
> > >>
> > >>
> > >> This is not accurate. The "directly connected routes" is NOT after the
> > >>
> > >> global routes. Their priority only depends on the prefix length.
> > >>
> > >>
> > >> and then Logical_Router_Static_Routes with same route_table value as
> > >> in Logical_Router_Port options:route_table field.
> > >>
> > >> A new Logical Router ingress table #10 is added - IN_IP_ROUTING_PRE.
> > >> In this table packets which come from LRPs with configured
> > >> options:route_table field are checked against inport and in OVS
> > >> register 7 unique non-zero value identifying route table is written.
> > >>
> > >> Then in 11th table IN_IP_ROUTING routes which have non-empty
> > >> `route_table` field are added with additional match on reg7 value
> > >> associated with appropriate route_table.
> > >>
> > >>
> > >> Hi Vladislav,
> > >>
> > >> First of all, sorry for the delayed review, and thanks for implementing
> > >>
> > >> this new feature.
> > >>
> > >>
> > >> I have some questions regarding the feature itself. I remember that we
> > >>
> > >> had some discussion earlier for this feature, but it seems I
> misunderstood
> > >> the feature you are implementing here. I thought by multiple routing
> tables
> > >> you were trying to support something like VRF in physical routers, but
> this
> > >> seems to be something different. According to your implementation,
> instead
> > >> of assigning LRPs to different routing tables, a LRP can actually
> > >> participate to both the global routing table and the table with a
> specific
> > >> ID. For ingress, global routes are prefered over other routes; for
> egress
> > >> (i.e. forwarding to the next hop), it doesn't even enforce any table-id
> > >> check, so packet routed by a entry of table-X can go out of a LRP with
> > >> table-id Y. Is my understanding correct about the desired behavior of
> this
> > >> feature?
> > >>
> > >>
> > >>
> > >> Yes, your understanding is correct.
> > >> This is not VRF. At first glance VRF can be done just by using
> LR-per-VRF
> > >>
> > >> without any code modifications (maybe there are corner cases, but it’s
> not
> > >> something I’m currently working on).
> > >>
> > >> I agree VRF can be achieved by just separate LRs. I am trying to
> understand
> > >> why multiple LRs wouldn't satisfy your use case here.
> > >>
> > >> LRP can be optionally assigned to specific routing table name. This
> means
> > >>
> > >> that for LR ingress pipeline packets coming from this specific LRP
> would be
> > >> checked against routes with same route_table value within appropriate
> LR.
> > >> This is some kind of PBR, analog of "ip rule add iif <interface name>
> > >> lookup <id>".
> > >>
> > >> There is one specific use-case, which requires special handling:
> > >>
> > >> directly-connected routes (subnet CIDRs from connected to this LR
> LRPs).
> > >> These routes can’t be added manually by user, though routing between
> LRPs
> > >> belonging to different routing tables is still needed. So, these routes
> > >> should be added to global routing table to override routing for LRPs
> with
> > >> configured routing tables. If for some reason user wants to prohibit IP
> > >> connectivity to any LRP (honestly, I can’t imagine, why), it can be
> done by
> > >> utilising OVN PBR with drop action or a route with "discard" nexthop.
> But
> > >> I’d think in this case whether this LRP is needed in this LR.
> > >>
> > >> Also, if user wants to create routes, which apply for all LRPs from all
> > >>
> > >> routing tables, it can be done by adding new entries to global routing
> > >> table.
> > >>
> > >> And the third reason to have global routing table - a fully
> > >>
> > >> backward-compatible solution. All users, who don’t need multiple
> routing
> > >> tables support just continue using static routing in the same manner
> > >> without any changes.
> > >>
> > >>
> > >>
> > >> If this is correct, it doesn't seem to be a common/standard requirement
> > >>
> > >> (or please educate me if I am wrong). Could you explain a little more
> about
> > >> the actual use cases: what kind of real world problems need to be
> solved by
> > >> this feature or how are you going to use this. For example, why would a
> > >> port need to participate in both routing tables? It looks like what you
> > >> really need is policy routing instead of multiple isolated routing
> tables.
> > >> I understand that you already use policy routing to implement ACLs, so
> it
> > >> is not convenient to combine other policies (e.g. inport based routing)
> > >> into the policy routing stage. If that's the case, would it be more
> generic
> > >> to support multiple policy routing stages? My concern to the current
> > >> approach is that it is implemented for a very special use case. It
> makes
> > >> the code more complex but when there is a slightly different
> requirement in
> > >> the future it becomes insufficient. I am thinking that policy routing
> seems
> > >> more flexible and has more potential to be made more generic. Maybe I
> will
> > >> have a better understanding when I hear more detailed use cases and
> > >> considerations from you.
> > >>
> > >>
> > >>
> > >> I can't agree here in it’s uncommon requirement.
> > >> This implementation was inspired by AWS Route Tables feature [1]:
> within
> > >>
> > >> a VPC (LR in terms of OVN) user may create multiple routing tables and
> > >> assign them to different subnets (LRPs) in multiple availability zones.
> > >> Auto-generated directly-connected routes from LRPs CIDRs are working
> in the
> > >> same manner as they do in AWS - apply to all subnets regardless of
> their
> > >> association to routing table. GCP has similar behaviour: [2], Azure, I
> > >> guess, too.
> > >>
> > >> Our public cloud (CROC Cloud Platform) supports AWS behaviour [3], so I
> > >>
> > >> primarily was oriented on it. Internally we already use this feature
> and
> > >> currently it fits our use-case, but again I can't say it is specific.
> > >>
> > >> If it is for AWS/GCP alike routing features, then from what I
> understand
> > >> what's implemented in this patch is a little different. To implement a
> VPC
> > >> model in OVN, I think it is ok to have multiple LRs instead of only
> one LR
> > >> for each VPC. So naturally I would use a LR to map to AWS's routing
> table.
> > >> In AWS's document (the link you provided), it says:
> > >>
> > >> "A subnet can only be associated with one route table at a time"
> > >>
> > >> So basically in AWS a subnet is either attached to the default/main
> route
> > >> table or a custom table, but not both, right? However, in your use
> case, a
> > >> LRP (maps to a subnet) attaches to both "main" and a custom table,
> which
> > >> seems not common to me. Or did I miss something?
> > >>
> > >>
> > >> That’s true about AWS, but there is still a bit not accurate about OVN.
> > >> Global routing table in OVN terms is not that AWS main route table is.
> > >> Main route table is just a configuration hint for users for implicit
> route
> > >> tables association with subnets.
> > >> Implicitly-associated via main routing table subnets routing functions
> the
> > >> same manner as a normal explicit route_table-subnet association.
> > >>
> > >> Global routing table in OVN is just a list of routes with higher
> priority
> > >> than routes with configured "route_table".
> > >>
> > >> I do not offer to configure both tables at the same time. But it is
> > >> _possible_ to do if required for some reason (for instance to configure
> > >> some service chaining or just internal VPC services like
> metadata/another
> > >> internal APIs, access to another services).
> > >> Normally, if we talk about AWS Route Table to OVN, it is mostly
> one-to-one
> > >> mapping, except "Local route":
> > >>
> > >
> > >> Example:
> > >> AWS Route Table rtb-xxx:
> > >> 172.31.0.0/16: local route (VPC CIDR for subnets)
> > >> 0.0.0.0/0: igw-XXX (internet gateway)
> > >>
> > >> AWS Route Table rtb-yyy:
> > >> 172.31.0.0/16: local route (VPC CIDR for subnets)
> > >> 5.5.5.5/32: instance-xxx
> > >>
> > >> This maps to OVN configuration (for one LR with one subnet
> 172.31.0.0/24):
> > >>
> > >> implicit route (not present in logical router static routes table):
> > >> 172.31.0.0/24: LRP-subnet-xxx - has highest priority over other route
> > >> table-oriented routes and can be threat as placed in global routing
> table
> > >>
> > >
> > > What do you mean by highest priority here? It all depends on the prefix
> > > length, right? If you have a static route entry that says:
> 172.31.0.0./25
> > > -> xyz, then this route will have higher priority than the directly
> > > connected subnet.
> > >
> >
> > Yes, it is true, but only for routes within “global” routing table. I
> left this to save previous behaviour.
> > It is impossible to create a static route within any non-global routing
> table which overrides directly connected routes,
> > because priority for “global” routes is added by 100 (though, it should
> add >2*128 to support same behavior for IPv6 as you already pointed).
> >
> > >
> > >>
> > >> Normal static routes:
> > >> ip_prefix: 0.0.0.0/0, nexthop: <IP for edge LR’s LRP>, route_table:
> > >> rtb-xxx
> > >> ip_prefix: 5.5.5.5/32, nexthop: <IP of some LSP, which belongs to
> > >> VM/container via which route is built>, route_table: rtb-yyy
> > >>
> > >> I guess, I understood the reason for misunderstanding: the global
> routing
> > >> table, which I referred earlier is a routing table which has no value
> in
> > >> "route_table" field and directly-connected routes at the same time.
> Latter
> > >> have no records in logical router static routes, but I still referred
> them
> > >> as a routes from "global routing table". I can think about terminology
> > >> here, if it’s a root cause for misunderstanding. What do you think?
> > >>
> > >
> > > Thanks for explaining. In fact I understand what you mean about "global
> > > routing table", but I think you are implementing something different
> from
> > > what AWS provides, primarily because you use a single LR instead of LR
> per
> > > routing table. I understand there are reasons why you want to do it this
> > > way, but here I can think of some challenges of your approach. For
> example,
> > > in AWS we could do:
> > >
> > > route table main:
> > > subnet S0:
> > > 172.31.0.0/24
> > > routes:
> > > 172.31.0.0/16: local
> > >
> > > route table R1:
> > > subnet S1:
> > > 172.31.1.0/24
> > > routes:
> > > 172.31.0.0/16: local
> > > 172.31.0.0/24: <some FW/GW>
> >
> > Wow. It’s a brand-new behaviour for AWS, about which I was not aware, as
> it appeared 1.5 months ago [1].
> >
> > Previously, when I was writing this patch series in AWS such static route
> was impossible to add.
> > You couldn’t add routes, which are the same or more specific than VPC
> CIDR block.
> >
> > Though in GCP you can’t create same or more specific than subnet route
> [2].
> > I think I can try to update my patch to support AWS approach here.
>
> Would it be simpler just don't make the global routes higher priority? We
> could make it clear to the user that if they add a route in any route table
> that is overlapping with any directly connected subnets, the longer prefix
> would take precedence. Would it be more flexible to user?
>
> >
> > But all this is relevant to single LR per VPC (not per route table
> because of described earlier blocker).
> > Do you think it is reasonable with such inputs?
> >
> > >
> > > Packet from S1 to S0 will go to FW/GW, where it may be
> dropped/forwarded to
> > > S0/or redirected to something outside of AWS ...
> > >
> > > While in your approach with OVN, both subnets will be under the same
> > > logical router, and with different table IDs assigned to the LRPs and
> > > routes. But when a VM under S1 sends a packet to S0, it will go to the
> > > destination directly because you are prioritizing the direct-connection
> > > routes and they are under the same global route table, and there is no
> way
> > > to force it to go through some FW/GW from the custom route table. You
> can
> > > add a route to achieve this in the global table, because in your design
> the
> > > global routes are still applicable to all LRPs and has higher priority,
> but
> > > it would impact all the LRPs/subnets, while in the AWS design above it
> is
> > > supposed to affect subnets under R1 only.
> > >
> > > In addition, for my understanding, S0 and S1 in AWS cannot communicate
> > > directly, unless there are some specific routes on both route tables to
> > > route them to some gateway. In other words, there are some isolations
> > > between route tables. However, in your design with OVN, it is more of
> >
> > S0 and S1 can communicate with each other being in different route tables
> without any specific setup,
> > the default unremovable “Local route” ensures this.
> > User only can override "local" route with only existing subnet’s (more
> specific than "local route") CIDR
> > to send traffic from S0 to S1 through some specific appliance(s) in S2.
> > But basic IP connectivity always remains. Regardless of route tables
> subnet associations.
> > So I’m still sure that one LR per VPC suites this scenario better,
> > as I can’t imagine why to place subnets, which don’t need connectivity in
> one VPC.
> >
> Maybe I was wrong. I am not really familiar with AWS and I wanted to try
> this myself but didn't get the time :(
> If default connectivity for subnets under different route tables is a
> requirement (instead of the contrary), then I'd agree that one LR per VPC
> seems better, because it is more like policy routing instead of for
> isolation.
>
> > > policy routing without isolation at all. I am not saying that your
> design
> > > is wrong, but just saying it implements very different behavior than
> AWS,
> > > and I am trying to understand your real use cases to have a better
> > > understanding of the feature you implemented. (You explained that you
> just
> > > wanted the AWS behavior, but it looks not exactly the case).
> > >
> >
> > Will fix places where the behaviour has difference with AWS.
> > And thanks you for pointing such places.
> >
> Well, I think it doesn't have to be all the same as AWS, but yes if AWS
> design makes more sense.
>
> > >
> > >>
> > >>
> > >> In my opinion having this feature to be implemented using PBR is less
> > >>
> > >> convenient and native for users, who are familiar with behaviour for
> > >> mentioned above public cloud platforms, because configuring routes
> should
> > >> be done in routes section. And adding route table property seems
> native in
> > >> this route, not in PBR. Moreover, I _think_ using
> > >> Logical_Router_Static_Route to extend this feature for
> OVN-Interconnection
> > >> becomes quite easy comparing to PBR (though, I didn’t try the latter).
> > >>
> > >> I agree if it is just AWS -like requirement, PBR is less convenient.
> > >>
> > >> I am trying to understand if it can be achieved with separate LRs. If
> not,
> > >> what's special about the requirement, and is the current approach
> providing
> > >> a solution common enough so that more use cases can also benefit from?
> > >> Could you clarify a little more? Thanks again.
> > >>
> > >> That was our initial approach - to use separate LRs for each Route
> Table.
> > >> We rejected that solution because met some difficulties and blocker.
> See
> > >> below:
> > >>
> > >> Brief topology description if using LRs per Route Table:
> > >> Imagine 2 subnets in VPC in 1st AZ, one in another AZ. Each subnet in
> it’s
> > >> own Route Table (LR).
> > >> All subnets must have IP connectivity, so we have to somehow
> interconnect
> > >> these Route Table LRs.
> > >>
> > >
> > > That's exactly the same case for AWS, right? If you want direct
> > > connectivity, you would just put them under the same route table in AWS
> (or
> > > same LR in OVN), right?
> > >
> >
> > I’m sorry, my example was not good enough.
> > Let’s remove AZ’s from that case as they’re not relevant to discussion.
> > So, we’ve got 2 subnets S0 and S1, and suppose, they’ve got different
> route tables assigned to them.
> > If we place them in a single LR, assign different route tables, they
> would have connectivity "out-of-box".
> > Same in AWS: because "local route" is added automatically, can’t be
> removed or modified and ensures IP connectivity between all VPC subnets.
> >
> Ok, thanks for explaining.
>
> > >
> > >>
> > >> [BLOCKER] It is impossible to place route in route table 1 via VM from
> > >> subnet assiciated to route table 2 if using per-RTB LR. Because in
> RTB-2 LR
> > >> we have to add route from RTB 1 and this breaks route table isolation.
> > >> Route Table 2 LR will start looking up routes and there could be routes
> > >> from another route tables. This breaks the idea of having LR per Route
> > >> Table completely. Here we rejected this solution and moved to adding
> > >> support for route tables in OVN.
> > >>
> > >>
> > > Would it be just the same problem in AWS? Why would a user route a
> subnet
> > > of RTB-1 to a VM under RTB-2, and at the same time want route table
> > > isolation?
> > > With the single LR solution in OVN, you would not get the isolation
> between
> > > the tables, because 1) all LRPs still share the global routing table, 2)
> > > for output to nexthop there is no distinction between the output
> interfaces.
> > >
> >
> > No, such problem does not exist.
> > By route tables "isolation" I meant that routing rules configured in one
> routing table must not affect routing in another one.
> > In this scenario user may want to route traffic to internet from S0
> (RTB-0) through instance in S1 (RTB-1).
> > For instance it can be a gateway service (FW, WAF, IPS, etc) in S1, which
> provides secure access for clients from VPC to the internet with NAT.
> > So, configuration should be:
> > - 0.0.0.0/0 via GW-INSTANCE in RTB-0
> > - 0.0.0.0/0 via edge LR in RTB-1
> > So, instances from all subnets with rtb-0 assigned route table will be
> routed to internet through GW-INSTANCE, which can analyse/drop/do whatever
> things with traffic.
> > This is a small example to show how different route tables can be used in
> a single VPC.
> > With multiple OVN LRs per VPC in AZ such topology can’t be created, see:
> > In LR for RTB-0 we can add route 0.0.0.0/0 via <LR0-to-LR1 link IP>;
> > bun in LR for RTB-1 we can only add route to internet through either
> edge-LR or GW-INSTANCE. Not both. So, this case can’t be implemented
> currently in OVN.
> >
> Thanks for the example. It helps understanding your requirements. Why I
> believe this can be achieved by current OVN with the policy routing
> feature, I understand as you mentioned earlier you need policy routing for
> ACL purpose. I also agree that multi-stage policy routing would seem less
> user-friendly (and maybe also harder to implement), so I support the
> multi-table feature for the purpose of providing extra policy routing
> capability.
>
> While I still need to take a closer look at the code, I believe it would be
> helpful to add such example use cases in the documentation for users to
> better understand the feature.
>
> > Hope this helps to understand my problem.
> >
> > >
> > >> But some more cons:
> > >>
> > >> 1. complex network topology. Interconnecting all LRs even with some
> > >> transit switch is harder than having one LR and all VPC-related
> > >> configuration is done in one place (in AZ).
> > >> 2. Directly-connected routes. In case we have multiple Route Table
> LRs, we
> > >> have to add route for each subnet from another LR. In case one LR per
> VPC
> > >> all such routes are installed automatically and learnt via ovn-ic.
> > >> 3. PBR, LBs. It’s much easier to implement PBR and configuring Load
> > >> Balancers in one LR, than in multiple.
> > >> 4. There could be very many LRs, LRPs, LSPs (datapaths in SB) - excess
> > >> database records, huge database growth.
> > >> 5. Extra client code complexity (updating routes required configuring
> > >> routes in many LRs on different availibility zones);
> > >> 5. We couldn’t use ovn-ic routes learning, because VPC requires
> out-of-box
> > >> IP connectivity between subnets, and if connect Route Table LRs between
> > >> AZs, because connecting multiple LRs from different Route Tables would
> > >> learn routes without binding to route table. This requires additional
> > >> isolation at the transit switches level.
> > >> 6. There were some more problems. Here are listed some, which I could
> > >> refresh in my mind.
> > >>
> > >> From our half-of-a-year experience using LR per VPC is very comfortable
> > >> and it looks quite extendable it terms of network features.
> > >>
> > >> Let me know if this makes sense.
> > >>
> > >>
> > > Thanks for the list of the cons! Most of the cons you listed above seem
> to
> > > apply to the AWS model, too, right? It is ok to have different
> requirements
> >
> > Honestly, I can’t imagine these bullets to "AWS model". Maybe only
> ovn-ic, which
> > requires modifications in ovn-ic codebase to support route_tables in
> routes.
> > But this work is already done.
>
> I think the disagreement was mainly from the understanding of the behavior
> of directly connected subnets between different routing tables. I assume my
> understanding was wrong for now :)
> >
> > > than AWS, but we'd better define it clearly and it would be helpful with
> > > some typical use cases that solve specific problems. I have more
> > > information now after your explanation, but still not really clear under
> > > what scenarios would this feature be used.
> > >
> > > @Numan Siddique <num...@ovn.org> @Mark Michelson <mmich...@redhat.com>
> > > could you let me know your thoughts on this, too? Would you see similar
> use
> > > cases in Redhat customers? Or does this design match well with your
> > > potential use cases? I'd like to be careful when implementing something
> > > like this and make sure we are doing it the right way.

I don't think there is any use case in ovn-k8s or openstack neutron which would
use this feature.

Sorry.  I've not followed the discussion here closely.

What is the tl;dr.

@Han Zhou You're fine with this feature and will we be accepting this in OVN ?

Thanks
Numan

> >
> > One comment here from my side:
> > by "this design" we should talk about "behaviour similar to AWS/other
> public platforms route tables support".
> > I think that it is a good feature for opensource SDN, which can be futher
> used by opensource cloud platforms like OpenStack to extend routing
> capabilities.
> >
> Agree.
>
> > >
> > > Thanks,
> > > Han
> >
> > 1:
> https://aws.amazon.com/blogs/aws/inspect-subnet-to-subnet-traffic-with-amazon-vpc-more-specific-routing/
> > 2: https://cloud.google.com/vpc/docs/routes#subnet-routes
> >
> > >
> > >> Han
> > >>
> > >>
> > >>
> > >> I haven't finished reviewing the code yet, but I have one comment
> > >>
> > >> regarding adding 100 to the priority of the global routes. For IPv6,
> the
> > >> priority range from 0 to 120x2=240, so adding 100 is not enough. It
> would
> > >> create overlapping priority ranges, and some table-id specific route
> > >> entries may override the global routes.
> > >>
> > >>
> > >>
> > >> Thanks, I’ll dig into this when you finish review.
> > >>
> > >>
> > >> Let me know if I answered your questions or if you have new ones.
> > >> Again many thanks for your time and digging into this patch series.
> > >>
> > >> 1:
> https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
> > >> 2: https://cloud.google.com/vpc/docs/routes#subnet-routes
> > >> 3: https://docs.cloud.croc.ru/en/services/networks/routetables.html
> > >>
> > >> Thanks,
> > >> Han
> > >>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to