Re: Routing loops & TTL tracking with tunnel devices
On Mon, 2015-11-16 at 21:55 +0100, Jason A. Donenfeld wrote:
> Hi Sowmini,
>
> Neat. Though, in my case, I'm not actually just prepending a header.
> I'm doing some more substantial transformations of a packet. And this
> needs to work with v4 too. So I'm not sure implementing a v6 spec will
> help with things. I need to identify the right mechanism inside the
> kernel to assist with this, like, say, a member in sk_buff.

There is very little chance we'll accept a new member in sk_buff, unless
proven needed.

Yes, it is very tempting and dozens of programmers have already tried.

You'll have to demonstrate full understanding of the stack and why other
solutions do not work.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Routing loops & TTL tracking with tunnel devices
Hi Jason,

On Mon, Nov 16, 2015, at 21:14, Jason A. Donenfeld wrote:
> A few tunnel devices, like geneve or vxlan, are using
> udp_tunnel_xmit_skb, or related functions for transmitting packets,
> and are doing the usual FIB lookup to get the dst entry. I see a lot
> of code like this:
>
>     if (rt->dst.dev == dev) {
>         netdev_dbg(dev, "circular route to %pI4\n",
>                    &dst->sin.sin_addr.s_addr);
>         dev->stats.collisions++;
>         goto rt_tx_error;
>     }
>
> This one is from vxlan, but there are other similar blocks elsewhere.
> The basic idea is "am I about to send this packet to my own device?"
>
> This is a bit crude. For starters, two interfaces could be pointed at
> each other, bouncing the packet back and forth indefinitely, causing
> the feared routing loop. Hopefully as more headers got tacked on,
> allocations would eventually fail, and the queen would be saved.
>
> But what about in devices for which self-routing might actually be
> useful? For example, let's say that if an incoming skb is headed for
> dst X, it gets encapsulated and sent to dst A, and for dst Y it gets
> encapsulated and sent to dst B, and for dst Z it gets encapsulated and
> sent to dst C. I can imagine situations in which setting A==Y and B==Z
> might be useful to do multiple levels of encapsulation on one device,
> so that skbs headed for dst X get sent to dst C, but with intermediate
> transformations of dst A and dst B.
>
> This isn't merely theoretical. I'm working on a driver right now that
> could benefit from this.
>
> So, in implementing this, the question of avoiding routing loops comes
> into play. The most straightforward way to do this is to use a TTL
> value that's decreased. But we have a problem. A packet sent to dst X
> that is encapsulated and sent to dst A will have a ttl calculated for
> its journey to dst A. How do we preserve TTLs across multiple
> traversals of the networking stack? We can't simply stay with the TTL
> of the packet when it comes in, because its tunnel destination might
> require a different TTL. The best thing would be to have a "tunnel
> TTL" value as part of skb->cb, except the cb gets overwritten when
> traversing the networking stack. The best thing I can think of is some
> other member of sk_buff, but I don't see any that look good for this.
>
> So perhaps it would be worthwhile to add this to struct sk_buff? David
> - are you interested in this if I submit a patch?
>
> Or, alternatively, does a fast solution for this already exist that I
> overlooked?

Have a look at __dev_queue_xmit and the per_cpu recursion limits
implemented there:

    if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
        goto recursion_alert;

Bye,
Hannes
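The guard Hannes points to can be modelled in userspace to see how it bounds re-entry. The following is only a sketch under stated assumptions: `toy_queue_xmit` is a hypothetical stand-in for the transmit path, the plain static counter stands in for the kernel's per-CPU `xmit_recursion`, and the limit of 10 is an assumed value for illustration, not something taken from this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed value for illustration; check net/core/dev.c in your tree. */
#define RECURSION_LIMIT 10

/* Stand-in for the per-CPU xmit_recursion counter (a plain static here). */
static int xmit_recursion;

/*
 * Hypothetical transmit path of a queueless virtual device. Each call
 * re-enters the stack `reentries` more times, as a tunnel whose route
 * leads back into a tunnel device would.
 * Returns 0 if the packet got out, -1 if it was dropped by the guard.
 */
static int toy_queue_xmit(int reentries)
{
	int rc = 0;

	if (xmit_recursion > RECURSION_LIMIT) {
		fprintf(stderr, "Dead loop on virtual device, depth %d\n",
			xmit_recursion);
		return -1;	/* drop instead of recursing further */
	}

	xmit_recursion++;
	if (reentries > 0)
		rc = toy_queue_xmit(reentries - 1);
	xmit_recursion--;

	assert(xmit_recursion >= 0);	/* inc/dec must stay balanced */
	return rc;
}
```

A few levels of nesting pass through untouched, while an unbounded loop is cut off once the counter exceeds the limit. Note that the counter tracks call depth on the current CPU, not anything stored in the packet, so it needs no sk_buff field.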
Re: Routing loops & TTL tracking with tunnel devices
Hi Sowmini,

Neat. Though, in my case, I'm not actually just prepending a header.
I'm doing some more substantial transformations of a packet. And this
needs to work with v4 too. So I'm not sure implementing a v6 spec will
help with things. I need to identify the right mechanism inside the
kernel to assist with this, like, say, a member in sk_buff.

Jason
Routing loops & TTL tracking with tunnel devices
Hi folks,

A few tunnel devices, like geneve or vxlan, are using
udp_tunnel_xmit_skb, or related functions for transmitting packets,
and are doing the usual FIB lookup to get the dst entry. I see a lot
of code like this:

    if (rt->dst.dev == dev) {
        netdev_dbg(dev, "circular route to %pI4\n",
                   &dst->sin.sin_addr.s_addr);
        dev->stats.collisions++;
        goto rt_tx_error;
    }

This one is from vxlan, but there are other similar blocks elsewhere.
The basic idea is "am I about to send this packet to my own device?"

This is a bit crude. For starters, two interfaces could be pointed at
each other, bouncing the packet back and forth indefinitely, causing
the feared routing loop. Hopefully as more headers got tacked on,
allocations would eventually fail, and the queen would be saved.

But what about in devices for which self-routing might actually be
useful? For example, let's say that if an incoming skb is headed for
dst X, it gets encapsulated and sent to dst A, and for dst Y it gets
encapsulated and sent to dst B, and for dst Z it gets encapsulated and
sent to dst C. I can imagine situations in which setting A==Y and B==Z
might be useful to do multiple levels of encapsulation on one device,
so that skbs headed for dst X get sent to dst C, but with intermediate
transformations of dst A and dst B.

This isn't merely theoretical. I'm working on a driver right now that
could benefit from this.

So, in implementing this, the question of avoiding routing loops comes
into play. The most straightforward way to do this is to use a TTL
value that's decreased. But we have a problem. A packet sent to dst X
that is encapsulated and sent to dst A will have a ttl calculated for
its journey to dst A. How do we preserve TTLs across multiple
traversals of the networking stack? We can't simply stay with the TTL
of the packet when it comes in, because its tunnel destination might
require a different TTL. The best thing would be to have a "tunnel
TTL" value as part of skb->cb, except the cb gets overwritten when
traversing the networking stack. The best thing I can think of is some
other member of sk_buff, but I don't see any that look good for this.

So perhaps it would be worthwhile to add this to struct sk_buff? David
- are you interested in this if I submit a patch?

Or, alternatively, does a fast solution for this already exist that I
overlooked?

Thanks,
Jason
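The weakness of the `rt->dst.dev == dev` test described in this message can be shown with a toy model. Everything here is hypothetical: `struct toy_dev`, `is_circular_route` and `exceeds_hop_budget` are illustrative names, not kernel API, and a single `route_dev` pointer stands in for the result of the FIB lookup.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy device: route_dev is the device its encapsulated packets egress
 * on. NULL means the packet leaves through a physical interface.
 */
struct toy_dev {
	const char *name;
	const struct toy_dev *route_dev;
};

/* The check quoted above: does the route point straight back at us? */
static int is_circular_route(const struct toy_dev *dev)
{
	return dev->route_dev == dev;
}

/*
 * Follow the routes for at most max_hops encapsulations. Returns 1 if
 * the packet is still inside a virtual device afterwards, which covers
 * both a real loop and nesting deeper than the budget allows.
 */
static int exceeds_hop_budget(const struct toy_dev *dev, int max_hops)
{
	assert(max_hops >= 0);

	while (max_hops-- > 0) {
		if (dev->route_dev == NULL)
			return 0;	/* left the tunnel devices */
		dev = dev->route_dev;
	}
	return dev->route_dev != NULL;
}
```

With two devices routed at each other, `is_circular_route` returns 0 for both, yet the packet never leaves; only a bound on the number of traversals catches that case. The same bound also permits deliberate self-routing up to a chosen depth, which is the use case raised here.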
Re: Routing loops & TTL tracking with tunnel devices
On (11/16/15 21:14), Jason A. Donenfeld wrote:
>
> But what about in devices for which self-routing might actually be
> useful? For example, let's say that if an incoming skb is headed for
> dst X, it gets encapsulated and sent to dst A, and for dst Y it gets
> encapsulated and sent to dst B, and for dst Z it gets encapsulated and
> sent to dst C. I can imagine situations in which setting A==Y and B==Z
> might be useful to do multiple levels of encapsulation on one device,
> so that skbs headed for dst X get sent to dst C, but with intermediate
> transformations of dst A and dst B.

I believe that what you are talking about is basically nested
encapsulation; see https://tools.ietf.org/html/rfc2473. The tunnelling
endpoint could track the number of encapsulations and keep a limit on
that? (Conceptually this may be the same thing as your ttl proposal,
except that "ttl" has other meanings in other contexts, so it is a bit
non-intuitive.)

--Sowmini

(fwiw, RFC 2473 proposes an ipv6 option to track nested encapsulation,
and that never took off, because, among other reasons, it's hard
to offload such options to hardware. Anyway, you are not trying to carry
this around in the packet).
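The idea of tracking the number of encapsulations and keeping a limit on it can be sketched as a small budget that shrinks with every nesting level. This is loosely modelled on the encapsulation-limit idea in RFC 2473, not an implementation of it: `struct toy_pkt` and `toy_encapsulate` are hypothetical names, and where such a budget would actually live in the kernel is exactly the open question of this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-packet metadata; not an existing kernel structure. */
struct toy_pkt {
	uint8_t encap_limit;	/* further encapsulations still allowed */
	uint8_t encap_depth;	/* encapsulations applied so far */
};

/*
 * Apply one more level of encapsulation if the budget allows it.
 * Returns 0 on success, -1 if the packet must be dropped.
 */
static int toy_encapsulate(struct toy_pkt *pkt)
{
	assert(pkt);

	if (pkt->encap_limit == 0)
		return -1;	/* budget exhausted: drop, do not nest */

	pkt->encap_limit--;
	pkt->encap_depth++;
	return 0;
}
```

Unlike an IP TTL, this counter is decremented per encapsulation rather than per routed hop, so it stays independent of whatever TTL each outer header needs for its own journey.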
Re: Routing loops & TTL tracking with tunnel devices
> Neat. Though, in my case, I'm not actually just prepending a header.
> I'm doing some more substantial transformations of a packet. And this
> needs to work with v4 too. So I'm not sure implementing a v6 spec will

Understood, that spec was just referenced to indicate that there are
more issues (mtu reduction etc.) with nested encapsulation, and this
is actually applicable even without the recursion issue (i.e. even if
you don't have a tunnelling loop, and even if it is not ipv6, there
are some non-trivial problems here. Luckily, nested encaps is somewhat
uncommon).

--Sowmini
Re: Routing loops & TTL tracking with tunnel devices
On Mon, Nov 16, 2015 at 11:25 PM, Hannes Frederic Sowa wrote:
> Have a look at __dev_queue_xmit and the per_cpu recursion limits
> implemented there:
>
>     if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
>         goto recursion_alert;

Ahh, thanks for pointing that out. So this works with virtual devices
with no queue. As of some recent changes, that now applies to what I'm
doing.

Unfortunately, I get a complete hard crash, with the blinking keyboard.
The only thing written to serial before it dies is:

    [ 171.347446] Dead loop on virtual device wg0, fix it urgently!

This means it did hit that recursion condition, which is good. I assume
the recursion limit is just too high, and this has something to do with
me overflowing the stack. I'll test this hypothesis and see if I can add
a similar check inside my driver to make it lower. If this works, I'm
satisfied.

Thanks a lot for the pointer here.
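If the stack-overflow hypothesis holds, a driver-local limit could be derived from a stack budget rather than picked by hand. The sketch below only shows the arithmetic: `max_safe_depth` and `depth_allowed` are hypothetical helpers, and both the stack budget and the per-level stack cost are numbers one would have to measure for the driver in question; none are given in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest nesting depth whose cumulative stack use fits in the budget.
 * stack_budget: bytes of stack the driver may consume in total.
 * bytes_per_level: measured stack cost of one nested transmit.
 */
static unsigned int max_safe_depth(size_t stack_budget, size_t bytes_per_level)
{
	assert(bytes_per_level > 0);
	return (unsigned int)(stack_budget / bytes_per_level);
}

/* Driver-local guard: nonzero if one more nested transmit is allowed. */
static int depth_allowed(unsigned int depth, unsigned int limit)
{
	return depth < limit;
}
```

For example, with a purely illustrative budget of 8192 bytes and 2048 bytes per level, the driver would stop at a depth of 4, well before a generic limit tuned for cheaper transmit paths would trigger.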
Re: Routing loops & TTL tracking with tunnel devices
Hi Eric,

On Mon, Nov 16, 2015 at 11:28 PM, Eric Dumazet wrote:
> There is very little chance we'll accept a new member in sk_buff, unless
> proven needed.

I actually have no intention of doing this! I'm wondering if there
already is a member in sk_buff that moonlights as my desired ttl
counter, or if there's another mechanism for avoiding routing loops. I
want to work with what's already there, rather than meddling with the
innards of important and memory sensitive structures such as sk_buff.