Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/10/15, 12:13 AM, roopa wrote:
> Robert/Thomas, all my changes are in the repos below, under the
> 'mpls' branch:
>
>   https://github.com/CumulusNetworks/net-next
>   https://github.com/CumulusNetworks/iproute2
>
> The last iproute2 commit has a sample usage.
>
> The commits pushed to this tree do not yet contain support for the
> following (but I am working on it):
>
> a) Tunnel routes that work with both a tunnel RTA_OIF and a
>    non-tunnel RTA_OIF:
>    The current commits in the tree assume a non-tunnel RTA_OIF. If
>    the tunnel driver has registered a dst_output func, dst_output is
>    set to the tunnel dst output handler in the receive route lookup
>    path, which in turn does the encap and xmits.
>    Thomas had last suggested using a flag to skip the dst output
>    handler redirection for cases where RTA_OIF is a special tunnel
>    netdev and the tunnel driver xmit function can do the encap.
>    My current thinking is to pass the oif to the encap parse handler,
>    and the handler can set the flag on the tunnel state. This flag
>    can then be used to skip the dst_output redirection. This change
>    should be trivial; I will fix it soon.

I have pushed this change to my github tree.

> b) Make RTA_OIF optional and do a fib lookup.

Thinking about this some more, RTA_OIF is already optional, and
net/ipv4/fib_semantics.c:fib_check_nh will look up the dev if it is
not specified. Wouldn't that be enough? (Unless I have misunderstood
something here.)

thanks,
Roopa
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
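The flag idea in (a) can be sketched in a few lines of plain userspace C. This is only an illustration of the control flow described in the message, not code from the patches: `LWT_F_OIF_DOES_ENCAP`, `struct tunnel_state`, `struct route_dst`, `encap_parse_oif()` and `dst_select_output()` are all hypothetical names, and the real kernel structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the RTA_OIF device performs the encap itself. */
#define LWT_F_OIF_DOES_ENCAP 0x1u

struct pkt;                                 /* opaque packet type */
typedef int (*output_fn)(struct pkt *);

/* Stand-in for the per-route tunnel state; field names are assumptions. */
struct tunnel_state {
    unsigned int flags;
    output_fn output;   /* encap-and-xmit handler, NULL if none registered */
};

/* Stand-in for the dst entry attached to a looked-up route. */
struct route_dst {
    output_fn output;
};

/* Two dummy output handlers so the choice can be observed. */
int plain_output(struct pkt *p) { (void)p; return 0; }
int encap_output(struct pkt *p) { (void)p; return 1; }

/* Encap parse handler: it is told about the oif and records the decision. */
void encap_parse_oif(struct tunnel_state *ts, bool oif_is_tunnel_dev)
{
    assert(ts != NULL);
    if (oif_is_tunnel_dev)
        ts->flags |= LWT_F_OIF_DOES_ENCAP;
    else
        ts->flags &= ~LWT_F_OIF_DOES_ENCAP;
}

/* Route lookup path: redirect dst output only when the flag is clear. */
void dst_select_output(struct route_dst *dst, const struct tunnel_state *ts)
{
    assert(dst != NULL);
    dst->output = plain_output;
    if (ts != NULL && ts->output != NULL &&
        !(ts->flags & LWT_F_OIF_DOES_ENCAP))
        dst->output = ts->output;
}
```

With the flag clear, the dst output is redirected to the encap handler; with the flag set, the packet takes the normal output path and the tunnel netdev is expected to encapsulate in its own xmit function.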
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/8/15, 3:58 PM, Thomas Graf wrote:
> I'll immediately ACK any series that supports both models and rebase
> my patches on top of it. I think we are on the right track overall.
>
> > I am trying to get my code on github to collaborate better. Stay
> > tuned (hopefully end of day today).

Robert/Thomas, all my changes are in the repos below, under the
'mpls' branch:

  https://github.com/CumulusNetworks/net-next
  https://github.com/CumulusNetworks/iproute2

The last iproute2 commit has a sample usage.

The commits pushed to this tree do not yet contain support for the
following (but I am working on it):

a) Tunnel routes that work with both a tunnel RTA_OIF and a
   non-tunnel RTA_OIF:
   The current commits in the tree assume a non-tunnel RTA_OIF. If
   the tunnel driver has registered a dst_output func, dst_output is
   set to the tunnel dst output handler in the receive route lookup
   path, which in turn does the encap and xmits.
   Thomas had last suggested using a flag to skip the dst output
   handler redirection for cases where RTA_OIF is a special tunnel
   netdev and the tunnel driver xmit function can do the encap.
   My current thinking is to pass the oif to the encap parse handler,
   and the handler can set the flag on the tunnel state. This flag
   can then be used to skip the dst_output redirection. This change
   should be trivial; I will fix it soon.

b) Make RTA_OIF optional and do a fib lookup.

Keep your suggestions/feedback coming...

thanks,
Roopa
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 06/08/15 at 08:17am, roopa wrote:
> ack, that sounds intuitive.
> With RTA_ENCAP and the mpls examples I was using, it looks something
> like the below for (1):
>
>   ip route add 10.1.1.0/30 encap mpls 200 via 10.1.1.1 dev eth0
>
> The tunnel dst is parsed and understood by the light weight tunnel
> driver, which I think will end up having to do the lookup (needs
> more thought)... for (2) and (3).

I think we only want to perform the nested fib lookup if no dev is
specified. If a tunnel device is specified, that device will do the
fib lookup and can cache the route in the encap socket.

> > Your nexthop implementation seemed more correct based on the chunks
> > I went through. Can we combine the two series and make the RTA_OIF
> > in the nexthop optional if an RTA_ENCAP was provided and provide a
> > route lookup instead?
>
> Yes, we can do that.
> Robert can correct me if I misunderstood: both our patches had
> similar code to handle RTA_ENCAP.
> The only difference was in the way we stored the encap data: mine
> was a pointer to tunnel state and his was embedded in fib_nh. His
> patch today assumes there is a tunnel device, and mine assumes the
> output device is specified in the ipv4 fib route.

I'll immediately ACK any series that supports both models and rebase
my patches on top of it. I think we are on the right track overall.

> I am trying to get my code on github to collaborate better. Stay
> tuned (hopefully end of day today).

Cool

> While we are on this conversation: though the code already supports
> nested attributes (with the example Robert showed), I introduced
> explicit nested attributes for mpls in my version, and it seemed
> better to introduce two attributes, RTA_ENCAP_TYPE and RTA_ENCAP,
> where the type determines the nested policy for RTA_ENCAP.
>
>   RTA_ENCAP_TYPE /* MPLS, VXLAN etc */

+1
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/8/15, 5:33 AM, Thomas Graf wrote:
> Yes, the information used to determine the encapsulation and the
> route used to select the outgoing interface might be coming from
> different components.
>
> A simple and typical example is if you are running quagga for your
> underlay, which determines which interface to use for which tunnel
> endpoints. On top of that, somebody is maintaining your virtual
> networks who is only aware of the tunnel endpoint IP addresses but
> does not want to manage how to actually reach them.
>
> So you would have:
>
>   ip route add 10.1.1.0/24 via tunnel 20.1.1.1 id 100 [dev vxlan0]
>   ip route add 20.1.1.1/24 dev eth0
>
> I've put "dev vxlan0" in brackets for now to indicate that it is
> optional. I'm also using VXLAN as an example as I think it's easier
> to understand this separation of concerns here.
>
> The point is, whoever is adding the route with the encap information
> may not know what interface to use to reach 20.1.1.1 and we may want
> to rely on existing routes.
>
> I think we want to support three models:
>
> 1. nexthop has encap and outgoing interface
>
>      ip route add 10.1.1.0/24 via tunnel 20.1.1.1 dev eth0
>      ip route add 20.1.1.1/24 dev eth0
>
> 2. nexthop has endpoint but no dev
>
>      ip route add 10.1.1.0/24 via tunnel 20.1.1.1
>      ip route add 20.1.1.1/24 dev eth0
>
>    This would indicate to the routing subsystem to perform a fib
>    lookup on 20.1.1.1 to determine the outgoing interface.
>
> 3. virtual tunnel interface to share configuration among routes
>
>      ip route add 10.1.1.0/24 via tunnel 20.1.1.1 dev vxlan0
>      ip route add 20.1.1.1/24 dev eth0
>
> I think all of them are intuitive and easy to implement. This will
> also allow incorporating the bridge model.

ack, that sounds intuitive.
With RTA_ENCAP and the mpls examples I was using, it looks something
like the below for (1):

  ip route add 10.1.1.0/30 encap mpls 200 via 10.1.1.1 dev eth0

The tunnel dst is parsed and understood by the light weight tunnel
driver, which I think will end up having to do the lookup (needs more
thought)... for (2) and (3).

> Your nexthop implementation seemed more correct based on the chunks
> I went through. Can we combine the two series and make the RTA_OIF
> in the nexthop optional if an RTA_ENCAP was provided and provide a
> route lookup instead?

Yes, we can do that.
Robert can correct me if I misunderstood: both our patches had similar
code to handle RTA_ENCAP.
The only difference was in the way we stored the encap data: mine was
a pointer to tunnel state and his was embedded in fib_nh. His patch
today assumes there is a tunnel device, and mine assumes the output
device is specified in the ipv4 fib route.

I am trying to get my code on github to collaborate better. Stay tuned
(hopefully end of day today).

While we are on this conversation: though the code already supports
nested attributes (with the example Robert showed), I introduced
explicit nested attributes for mpls in my version, and it seemed
better to introduce two attributes, RTA_ENCAP_TYPE and RTA_ENCAP,
where the type determines the nested policy for RTA_ENCAP.

  RTA_ENCAP_TYPE /* MPLS, VXLAN etc */

  RTA_ENCAP {
          MPLS_IPTUNNEL_UNSPEC
          MPLS_IPTUNNEL_DST
  }

  RTA_ENCAP {
          /* this is also similar to the example robert posted for vxlan */
          VXLAN_TUN_UNSPEC,
          VXLAN_TUN_ID,
          VXLAN_TUN_DST,
          VXLAN_TUN_SRC,
          VXLAN_TUN_TTL,
          VXLAN_TUN_TOS,
          VXLAN_TUN_SPORT,
          VXLAN_TUN_DPORT,
          VXLAN_TUN_FLAGS,
          VXLAN_TUN_MAX,
  }

Thanks,
Roopa
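One way to read "type determines the nested policy for RTA_ENCAP" is a small per-type table that says which nested attribute numbers are meaningful. The sketch below is a userspace illustration under stated assumptions: the `ENCAP_TYPE_*` values, `MPLS_IPTUNNEL_MAX`, `struct nested_policy` and `encap_attr_known()` are hypothetical, and the `*_MAX` enumerators are treated as one past the last valid attribute. Only the `MPLS_IPTUNNEL_*` and `VXLAN_TUN_*` names come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values carried in RTA_ENCAP_TYPE. */
enum { ENCAP_TYPE_UNSPEC, ENCAP_TYPE_MPLS, ENCAP_TYPE_VXLAN, ENCAP_TYPE_MAX };

/* Nested attributes for MPLS; MPLS_IPTUNNEL_MAX is an added assumption. */
enum { MPLS_IPTUNNEL_UNSPEC, MPLS_IPTUNNEL_DST, MPLS_IPTUNNEL_MAX };

/* Nested attributes for VXLAN, in the order listed in the message. */
enum {
    VXLAN_TUN_UNSPEC, VXLAN_TUN_ID, VXLAN_TUN_DST, VXLAN_TUN_SRC,
    VXLAN_TUN_TTL, VXLAN_TUN_TOS, VXLAN_TUN_SPORT, VXLAN_TUN_DPORT,
    VXLAN_TUN_FLAGS, VXLAN_TUN_MAX
};

/* Minimal "policy": the highest nested attribute number that is valid. */
struct nested_policy {
    unsigned int max_attr;
};

static const struct nested_policy policies[ENCAP_TYPE_MAX] = {
    [ENCAP_TYPE_MPLS]  = { MPLS_IPTUNNEL_MAX - 1 },
    [ENCAP_TYPE_VXLAN] = { VXLAN_TUN_MAX - 1 },
};

/* True if `attr` is a known nested attribute for the given encap type. */
bool encap_attr_known(unsigned int encap_type, unsigned int attr)
{
    assert(sizeof(policies) / sizeof(policies[0]) == ENCAP_TYPE_MAX);
    if (encap_type == ENCAP_TYPE_UNSPEC || encap_type >= ENCAP_TYPE_MAX)
        return false;
    return attr != 0 && attr <= policies[encap_type].max_attr;
}
```

The same attribute number can then mean different things under different encap types, which is why the type has to be read before the nested payload is interpreted.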
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 06/05/15 at 07:54pm, roopa wrote:
> On 6/5/15, 8:26 AM, Robert Shearman wrote:
> >
> > It isn't clear to me what the strategy here is for dealing with
> > tunnel encaps that aren't bound to an interface.
> >
> > Thomas, I presume you would prefer not to force the user to keep
> > track of changes to the output interface and nexthop corresponding
> > to the destination of the outer IP header? And I presume that Eric
> > is opposed to the option of using a virtual interface here, i.e.
> > falling back to the approach I proposed?
> >
> > In which case, what will the nexthop output interface be set to?
> > Logically, it should have no interface. At the moment, the code
> > assumes that a nexthop will have a valid interface and I don't have
> > a feel for what the impact would be of changing that.
>
> The nexthop interface is the final output interface. Any reason it
> should not be?

Yes, the information used to determine the encapsulation and the
route used to select the outgoing interface might be coming from
different components.

A simple and typical example is if you are running quagga for your
underlay, which determines which interface to use for which tunnel
endpoints. On top of that, somebody is maintaining your virtual
networks who is only aware of the tunnel endpoint IP addresses but
does not want to manage how to actually reach them.

So you would have:

  ip route add 10.1.1.0/24 via tunnel 20.1.1.1 id 100 [dev vxlan0]
  ip route add 20.1.1.1/24 dev eth0

I've put "dev vxlan0" in brackets for now to indicate that it is
optional. I'm also using VXLAN as an example as I think it's easier
to understand this separation of concerns here.

The point is, whoever is adding the route with the encap information
may not know what interface to use to reach 20.1.1.1 and we may want
to rely on existing routes.

I think we want to support three models:

1. nexthop has encap and outgoing interface

     ip route add 10.1.1.0/24 via tunnel 20.1.1.1 dev eth0
     ip route add 20.1.1.1/24 dev eth0

2. nexthop has endpoint but no dev

     ip route add 10.1.1.0/24 via tunnel 20.1.1.1
     ip route add 20.1.1.1/24 dev eth0

   This would indicate to the routing subsystem to perform a fib
   lookup on 20.1.1.1 to determine the outgoing interface.

3. virtual tunnel interface to share configuration among routes

     ip route add 10.1.1.0/24 via tunnel 20.1.1.1 dev vxlan0
     ip route add 20.1.1.1/24 dev eth0

I think all of them are intuitive and easy to implement. This will
also allow incorporating the bridge model.

> > However, with that resolved I'd be happy to work on a series
> > together. The remaining issue is whether to optimise for small
> > encaps that reside in the same memory block as the fib_info, which
> > aren't refcounted but instead are copied around, or larger encaps
> > that reside in their own memory block, are refcounted, and have
> > only a pointer passed around.
>
> I would prefer the latter (as shown in my incomplete patch) simply
> because it stays separate from fib_info and allows for extending it
> in the future.

I'm with Roopa on this one, simply because it allows us to keep the
RX and TX paths more symmetric and it allows non-FIB users as well.

> > If the latter, then there really isn't much left in my patch series
> > that can be reused, other than references to the places in the code
> > that need to be changed to support multipath and to make fib_info
> > matching work correctly.

Your nexthop implementation seemed more correct based on the chunks
I went through. Can we combine the two series and make the RTA_OIF
in the nexthop optional if an RTA_ENCAP was provided and provide a
route lookup instead?
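Model 2 above hinges on resolving the tunnel endpoint through existing routes when no dev is given. A toy userspace sketch of that decision follows. It is not the kernel FIB: `struct toy_route`, `toy_fib_lookup()`, `resolve_encap_oif()` and the `IP4()` macro are hypothetical helpers, interfaces are plain integer indexes, and the lookup is a linear longest-prefix match over a small array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad components. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* One entry of a toy routing table (hypothetical, not struct fib_info). */
struct toy_route {
    uint32_t prefix;
    uint8_t plen;       /* prefix length, 0..32 */
    int oif;            /* outgoing interface index */
};

static uint32_t plen_to_mask(uint8_t plen)
{
    return plen == 0 ? 0 : ~(uint32_t)0 << (32 - plen);
}

/* Linear longest-prefix match; returns the oif, or -1 if nothing matches. */
int toy_fib_lookup(const struct toy_route *tbl, size_t n, uint32_t addr)
{
    int best_oif = -1;
    int best_len = -1;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask;

        if (tbl[i].plen > 32)
            continue;                       /* ignore malformed entries */
        mask = plen_to_mask(tbl[i].plen);
        if ((addr & mask) == (tbl[i].prefix & mask) &&
            tbl[i].plen > best_len) {
            best_len = tbl[i].plen;
            best_oif = tbl[i].oif;
        }
    }
    return best_oif;
}

/*
 * Models 1 and 3: an explicit oif (oif > 0) is used as given.
 * Model 2: no oif, so the tunnel endpoint is looked up instead.
 */
int resolve_encap_oif(const struct toy_route *tbl, size_t n,
                      int oif, uint32_t endpoint)
{
    assert(tbl != NULL || n == 0);
    if (oif > 0)
        return oif;
    return toy_fib_lookup(tbl, n, endpoint);
}
```

Whether an explicit oif is a plain device (model 1) or a tunnel device that does its own lookup (model 3) is a separate question that this sketch does not try to answer.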
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/5/15, 8:26 AM, Robert Shearman wrote:
> It isn't clear to me what the strategy here is for dealing with
> tunnel encaps that aren't bound to an interface.
>
> Thomas, I presume you would prefer not to force the user to keep
> track of changes to the output interface and nexthop corresponding
> to the destination of the outer IP header? And I presume that Eric
> is opposed to the option of using a virtual interface here, i.e.
> falling back to the approach I proposed?
>
> In which case, what will the nexthop output interface be set to?
> Logically, it should have no interface. At the moment, the code
> assumes that a nexthop will have a valid interface and I don't have
> a feel for what the impact would be of changing that.

The nexthop interface is the final output interface. Any reason it
should not be?

> However, with that resolved I'd be happy to work on a series
> together. The remaining issue is whether to optimise for small
> encaps that reside in the same memory block as the fib_info, which
> aren't refcounted but instead are copied around, or larger encaps
> that reside in their own memory block, are refcounted, and have
> only a pointer passed around.

I would prefer the latter (as shown in my incomplete patch) simply
because it stays separate from fib_info and allows for extending it
in the future.

> If the latter, then there really isn't much left in my patch series
> that can be reused, other than references to the places in the code
> that need to be changed to support multipath and to make fib_info
> matching work correctly.

Thanks,
Roopa
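The "own memory block, refcounted, pointer passed around" option can be pictured with a very small userspace sketch. Everything here is hypothetical (`struct encap_state`, `encap_alloc()`, `encap_get()`, `encap_put()`), the counter is a plain integer rather than an atomic one, and real kernel code would also have to deal with RCU and locking, which this ignores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Encap data living in its own allocation, separate from the route. */
struct encap_state {
    unsigned int refcnt;
    size_t len;
    unsigned char data[];   /* opaque encap payload (C99 flexible array) */
};

/* Allocate a state holding a copy of `data`; the caller owns one reference. */
struct encap_state *encap_alloc(const void *data, size_t len)
{
    struct encap_state *e = malloc(sizeof(*e) + len);

    if (e == NULL)
        return NULL;
    e->refcnt = 1;
    e->len = len;
    if (len > 0)
        memcpy(e->data, data, len);
    return e;
}

/* Take an extra reference, e.g. when a second nexthop shares the encap. */
struct encap_state *encap_get(struct encap_state *e)
{
    assert(e != NULL && e->refcnt > 0);
    e->refcnt++;
    return e;
}

/* Drop a reference; returns 1 if this was the last one and it was freed. */
int encap_put(struct encap_state *e)
{
    assert(e != NULL && e->refcnt > 0);
    if (--e->refcnt > 0)
        return 0;
    free(e);
    return 1;
}
```

The embedded alternative would instead copy the payload bytes into each route structure, trading the extra allocation and reference counting for copies whenever the route is duplicated.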
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 05/06/15 15:16, roopa wrote:
> On 6/5/15, 2:14 AM, Thomas Graf wrote:
> > On 06/03/15 at 07:21am, Roopa Prabhu wrote:
> > > From: Roopa Prabhu
> > >
> > > This is still WIP and incomplete.
> > > Posting it here because of the other discussions
> > > happening around mpls ler in the context of Robert's
> > > code and I happened to mention this implementation.
> > >
> > > This was in response to earlier email thread with Eric on
> > > net-next of possibly using xfrm style stacked destination
> > > approach.
> > >
> > > I introduce a new set of tunnel ops for light weight
> > > tunnels (lwt), but this could be merged with the
> > > other ip_tunnels code if possible.
> > >
> > > I had this code for 3.2 kernel initially, and
> > > as I was pulling out code, I realized I had to separate
> > > out some other mpls code that I have been working on
> > > and quite likely this will not even compile. Sorry about
> > > that.
> > >
> > > Signed-off-by: Roopa Prabhu
> >
> > Thanks for posting these patches Roopa!

Ditto, thanks Roopa!

> > I see that some of the edges are still a bit rough. In particular
> > the lack of sanity checking around type before indexing the array
> > with it ;-)
>
> Oh..., sorry you had to see that :)
> (In my defense, ...I did successfully get some packets into the mpls
> tunnel with this though! :) )
>
> > No question that this would make a great optimization on top of
> > existing IP tunnels though! I think this is where Eric was heading
> > to and given this implementation, I'm perfectly fine with it as it
> > does not *require* to precompute the headers for all encap types.
> >
> > This can be made compatible with the patches I have posted as
> > well. A simple flag in what you call rtencap could indicate
> > whether to perform the encap in the dst->output or merely attach
> > the metadata and forward it to RTA_OIF for postponed
> > encapsulation.
> >
> > That way, if desirable by the user, the net_device can be omitted
> > which would suit Eric's architecture while we still also support
> > the traditional net_device model which provides stats and a
> > shared set of encapsulation parameters.
> >
> > It will also allow for bridges to perform the encapsulation
> > decision if needed and we can still get rid of the OVS
> > encapsulation special handling.
>
> yeah, that's a great idea.
>
> > As I mentioned to Robert, the new RTA_ENCAP should be a list of
> > Netlink attributes from the beginning to make it extendible
> > without ever breaking user ABI.
>
> agreed.
>
> > The most overlap seems to be with Robert's series. The direction
> > seems to be very similar. How do you want to proceed? Work on
> > a series together? I'm happy to rebase my series on top of both
> > you and Robert's work and make use of a new generic per nexthop
> > encapsulation API. Let me know how you guys want to proceed.
>
> Robert, please let me know if you have a preference on how you want
> to proceed. One option is for me to use your git tree as a way to
> get my patches in. But if we agree that we don't want to introduce
> a tunnel netdevice for mpls yet (which is our vote as well), then
> it's probably better for me to rebase my changes on top of your
> series and re-submit (with proper attribution of course).

It isn't clear to me what the strategy here is for dealing with
tunnel encaps that aren't bound to an interface.

Thomas, I presume you would prefer not to force the user to keep
track of changes to the output interface and nexthop corresponding
to the destination of the outer IP header? And I presume that Eric
is opposed to the option of using a virtual interface here, i.e.
falling back to the approach I proposed?

In which case, what will the nexthop output interface be set to?
Logically, it should have no interface. At the moment, the code
assumes that a nexthop will have a valid interface and I don't have
a feel for what the impact would be of changing that.

However, with that resolved I'd be happy to work on a series
together. The remaining issue is whether to optimise for small
encaps that reside in the same memory block as the fib_info, which
aren't refcounted but instead are copied around, or larger encaps
that reside in their own memory block, are refcounted, and have
only a pointer passed around.

If the latter, then there really isn't much left in my patch series
that can be reused, other than references to the places in the code
that need to be changed to support multipath and to make fib_info
matching work correctly.

> (Happy to take Eric's feedback as well here.)
>
> Right now I am working on refining my patches and covering ipv6.
> I would be happy to make RTA_ENCAP nested... unless you would prefer
> to take that over.
> I have also been trying to see if I can reuse any infra from the
> existing ip_tunnel world.

Thanks,
Rob
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 05/06/15 10:14, Thomas Graf wrote:
> As I mentioned to Robert, the new RTA_ENCAP should be a list of
> Netlink attributes from the beginning to make it extendible
> without ever breaking user ABI.

Just to be clear: in both of our approaches, the contents of the
RTA_ENCAP data are interpreted by the encap owner. Therefore, if the
mpls encap doesn't consist of nested attributes, that doesn't
preclude vxlan, for example, from consisting of nested attributes.

I do agree though that the netlink format for specifying mpls encap
should support nested attributes from day 1 to allow it to be
extended without breaking the ABI.

Thanks,
Rob
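The reason nested attributes keep the ABI extensible is that a type-length-value parser can skip attribute types it does not know. The sketch below shows that behaviour in self-contained userspace C. It deliberately does not use the kernel's `nla_parse()` or libnl/libmnl; `struct attr_hdr`, `attr_put()` and `attr_parse()` are hypothetical stand-ins that only mimic the length/type header and 4-byte alignment of Netlink attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as a Netlink attribute header: total length, then type. */
struct attr_hdr {
    uint16_t len;       /* header + payload, excluding padding */
    uint16_t type;
};

#define ATTR_HDRLEN   ((size_t)sizeof(struct attr_hdr))
#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append one attribute at offset `off`; the caller guarantees room. */
size_t attr_put(uint8_t *buf, size_t off, uint16_t type,
                const void *data, uint16_t dlen)
{
    struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + dlen), type };
    size_t padded = ATTR_ALIGN(h.len);

    memset(buf + off, 0, padded);
    memcpy(buf + off, &h, sizeof(h));
    memcpy(buf + off + ATTR_HDRLEN, data, dlen);
    return off + padded;
}

/*
 * Walk the buffer and record payload pointer and length for every
 * attribute whose type is in 1..max. Unknown types are skipped, which
 * is what lets newer senders add attributes without breaking older
 * parsers. Returns 0 on success, -1 on a malformed attribute.
 */
int attr_parse(const uint8_t *buf, size_t buflen,
               const uint8_t **tb, uint16_t *tb_len, unsigned int max)
{
    size_t off = 0;

    assert(tb != NULL && tb_len != NULL);
    for (unsigned int i = 0; i <= max; i++) {
        tb[i] = NULL;
        tb_len[i] = 0;
    }
    while (buflen - off >= ATTR_HDRLEN) {
        struct attr_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < ATTR_HDRLEN || h.len > buflen - off)
            return -1;                      /* malformed */
        if (h.type != 0 && h.type <= max) {
            tb[h.type] = buf + off + ATTR_HDRLEN;
            tb_len[h.type] = (uint16_t)(h.len - ATTR_HDRLEN);
        }
        off += ATTR_ALIGN(h.len);
        if (off > buflen)
            off = buflen;                   /* final padding may be absent */
    }
    return 0;
}
```

A parser built with `max` set to the attributes it understands keeps working when a later sender appends, say, a new attribute type 9 to the same nested block.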
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/5/15, 2:14 AM, Thomas Graf wrote:
> On 06/03/15 at 07:21am, Roopa Prabhu wrote:
> > From: Roopa Prabhu
> >
> > This is still WIP and incomplete.
> > Posting it here because of the other discussions
> > happening around mpls ler in the context of Robert's
> > code and I happened to mention this implementation.
> >
> > This was in response to earlier email thread with Eric on
> > net-next of possibly using xfrm style stacked destination
> > approach.
> >
> > I introduce a new set of tunnel ops for light weight
> > tunnels (lwt), but this could be merged with the
> > other ip_tunnels code if possible.
> >
> > I had this code for 3.2 kernel initially, and
> > as I was pulling out code, I realized I had to separate
> > out some other mpls code that I have been working on
> > and quite likely this will not even compile. Sorry about
> > that.
> >
> > Signed-off-by: Roopa Prabhu
>
> Thanks for posting these patches Roopa!
>
> I see that some of the edges are still a bit rough. In particular
> the lack of sanity checking around type before indexing the array
> with it ;-)

Oh..., sorry you had to see that :)
(In my defense, ...I did successfully get some packets into the mpls
tunnel with this though! :) )

> No question that this would make a great optimization on top of
> existing IP tunnels though! I think this is where Eric was heading
> to and given this implementation, I'm perfectly fine with it as it
> does not *require* to precompute the headers for all encap types.
>
> This can be made compatible with the patches I have posted as
> well. A simple flag in what you call rtencap could indicate
> whether to perform the encap in the dst->output or merely attach
> the metadata and forward it to RTA_OIF for postponed
> encapsulation.
>
> That way, if desirable by the user, the net_device can be omitted
> which would suit Eric's architecture while we still also support
> the traditional net_device model which provides stats and a
> shared set of encapsulation parameters.
>
> It will also allow for bridges to perform the encapsulation
> decision if needed and we can still get rid of the OVS
> encapsulation special handling.

yeah, that's a great idea.

> As I mentioned to Robert, the new RTA_ENCAP should be a list of
> Netlink attributes from the beginning to make it extendible
> without ever breaking user ABI.

agreed.

> The most overlap seems to be with Robert's series. The direction
> seems to be very similar. How do you want to proceed? Work on
> a series together? I'm happy to rebase my series on top of both
> you and Robert's work and make use of a new generic per nexthop
> encapsulation API. Let me know how you guys want to proceed.

Robert, please let me know if you have a preference on how you want
to proceed. One option is for me to use your git tree as a way to get
my patches in. But if we agree that we don't want to introduce a
tunnel netdevice for mpls yet (which is our vote as well), then it's
probably better for me to rebase my changes on top of your series and
re-submit (with proper attribution of course).
(Happy to take Eric's feedback as well here.)

Right now I am working on refining my patches and covering ipv6.
I would be happy to make RTA_ENCAP nested... unless you would prefer
to take that over.
I have also been trying to see if I can reuse any infra from the
existing ip_tunnel world.

Thanks for the feedback Thomas!
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 06/03/15 at 07:21am, Roopa Prabhu wrote:
> From: Roopa Prabhu
>
> This is still WIP and incomplete.
> Posting it here because of the other discussions
> happening around mpls ler in the context of Robert's
> code and I happened to mention this implementation.
>
> This was in response to earlier email thread with Eric on
> net-next of possibly using xfrm style stacked destination
> approach.
>
> I introduce a new set of tunnel ops for light weight
> tunnels (lwt), but this could be merged with the
> other ip_tunnels code if possible.
>
> I had this code for 3.2 kernel initially, and
> as I was pulling out code, I realized I had to separate
> out some other mpls code that I have been working on
> and quite likely this will not even compile. Sorry about
> that.
>
> Signed-off-by: Roopa Prabhu

Thanks for posting these patches Roopa!

I see that some of the edges are still a bit rough. In particular
the lack of sanity checking around type before indexing the array
with it ;-)

No question that this would make a great optimization on top of
existing IP tunnels though! I think this is where Eric was heading
to and given this implementation, I'm perfectly fine with it as it
does not *require* to precompute the headers for all encap types.

This can be made compatible with the patches I have posted as
well. A simple flag in what you call rtencap could indicate
whether to perform the encap in the dst->output or merely attach
the metadata and forward it to RTA_OIF for postponed
encapsulation.

That way, if desirable by the user, the net_device can be omitted
which would suit Eric's architecture while we still also support
the traditional net_device model which provides stats and a
shared set of encapsulation parameters.

It will also allow for bridges to perform the encapsulation
decision if needed and we can still get rid of the OVS
encapsulation special handling.

As I mentioned to Robert, the new RTA_ENCAP should be a list of
Netlink attributes from the beginning to make it extendible
without ever breaking user ABI.

The most overlap seems to be with Robert's series. The direction
seems to be very similar. How do you want to proceed? Work on
a series together? I'm happy to rebase my series on top of both
you and Robert's work and make use of a new generic per nexthop
encapsulation API. Let me know how you guys want to proceed.
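The review comment about "sanity checking around type before indexing the array" is the usual bounds check on a per-type ops table. A minimal userspace sketch, assuming a table of encap handlers indexed by a user-supplied type; `enum encap_type`, `struct encap_ops`, `encap_ops_register()` and `encap_ops_lookup()` are hypothetical names, not the ones in the posted patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encap type numbers; 0 is reserved as "none". */
enum encap_type { ENCAP_NONE = 0, ENCAP_MPLS, ENCAP_VXLAN, ENCAP_TYPE_MAX };

/* Hypothetical per-type operations; only a name for illustration. */
struct encap_ops {
    const char *name;
};

/* One slot per type; unregistered types stay NULL. */
static const struct encap_ops *encap_ops_table[ENCAP_TYPE_MAX];

/* Register ops for a type; rejects out-of-range or already-used slots. */
int encap_ops_register(unsigned int type, const struct encap_ops *ops)
{
    assert(ops != NULL);
    if (type == ENCAP_NONE || type >= ENCAP_TYPE_MAX)
        return -1;
    if (encap_ops_table[type] != NULL)
        return -1;
    encap_ops_table[type] = ops;
    return 0;
}

/*
 * Look up ops for a type that may come straight from user input.
 * The range check happens before the array is indexed.
 */
const struct encap_ops *encap_ops_lookup(unsigned int type)
{
    if (type == ENCAP_NONE || type >= ENCAP_TYPE_MAX)
        return NULL;
    return encap_ops_table[type];
}
```

Callers then treat a NULL result as "unsupported encap type" and reject the route instead of dereferencing whatever happens to sit past the end of the table.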
[PATCH WIP RFC 0/3] mpls: support for ler
From: Roopa Prabhu

This is still WIP and incomplete.
Posting it here because of the other discussions
happening around mpls ler in the context of Robert's
code and I happened to mention this implementation.

This was in response to earlier email thread with Eric on
net-next of possibly using xfrm style stacked destination
approach.

I introduce a new set of tunnel ops for light weight
tunnels (lwt), but this could be merged with the
other ip_tunnels code if possible.

I had this code for 3.2 kernel initially, and
as I was pulling out code, I realized I had to separate
out some other mpls code that I have been working on
and quite likely this will not even compile. Sorry about
that.

Signed-off-by: Roopa Prabhu

Roopa Prabhu (3):
  lwtunnels: basic infra for light weight tunnels like mpls
  ipv4 fib: lwtunnel handling
  mpls: register lwtunnel ops

 include/linux/if_lwtunnel.h      |    8 ++
 include/net/dst.h                |    2 +
 include/net/ip_fib.h             |    5 +-
 include/net/lwtunnel.h           |   61 +
 include/uapi/linux/if_lwtunnel.h |   12 +++
 include/uapi/linux/rtnetlink.h   |    8 +-
 net/Makefile                     |    2 +-
 net/ipv4/fib_frontend.c          |    6 ++
 net/ipv4/fib_semantics.c         |   34 +++-
 net/ipv4/route.c                 |    5 ++
 net/lwtunnel.c                   |  177 ++
 net/mpls/af_mpls.c               |  143 ++
 net/mpls/internal.h              |    5 ++
 13 files changed, 464 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/if_lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/if_lwtunnel.h
 create mode 100644 net/lwtunnel.c

--
1.7.10.4