Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Tue, 2015-02-03 at 16:19 -0800, David Miller wrote: > From: David Woodhouse > Date: Mon, 02 Feb 2015 07:27:10 + > > > I'm guessing you don't want to push the *whole* management of the TLS > > control connection *and* the UDP transport, and probing the latter with > > keepalives, into the kernel? I certainly don't :) > > Whilst Herbert Xu and I have discussed in the past supporting > automatic SSL handling of socket data during socket writes in the > kernel, doing TLS stuff would be a bit of a stretch :-) Right. For the DTLS I was thinking we'd do the handshake in userspace and then hand the UDP socket down. At that point it's basically the same as ESP with the bytes in a slightly different place. So I really am looking at an option for "here's a UDP socket to send those tun packets out on, with encryption set up" as the sanest plan I can come up with. -- dwmw2
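To make the "bytes in a slightly different place" point concrete, here is a minimal C sketch that parses the fixed header in front of an ESP-in-UDP payload (SPI and 32-bit sequence number, per RFC 4303) and in front of a DTLS 1.0 record (type, version, epoch, 48-bit sequence number, length, per RFC 4347). It assumes the standard RFC 4347 record layout; whether Cisco's pre-RFC4347 variant frames its records identically is an assumption not checked here, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ESP header as carried in ESP-in-UDP: 4-byte SPI, 4-byte sequence
 * number, then the encrypted payload. */
struct esp_view {
    uint32_t spi;
    uint32_t seq;
    size_t payload_off;
};

/* DTLS 1.0 record header: type(1) version(2) epoch(2) seq(6) length(2). */
struct dtls_view {
    uint8_t type;
    uint16_t version;
    uint16_t epoch;
    uint64_t seq;        /* only the low 48 bits are used */
    uint16_t length;
    size_t payload_off;
};

/* Read an n-byte big-endian (network order) unsigned integer. */
static uint64_t read_be(const uint8_t *p, size_t n)
{
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Both parsers return 0 on success, -1 if the datagram is too short. */
int parse_esp(const uint8_t *buf, size_t len, struct esp_view *out)
{
    if (len < 8)
        return -1;
    out->spi = (uint32_t)read_be(buf, 4);
    out->seq = (uint32_t)read_be(buf + 4, 4);
    out->payload_off = 8;
    return 0;
}

int parse_dtls(const uint8_t *buf, size_t len, struct dtls_view *out)
{
    if (len < 13)
        return -1;
    out->type = buf[0];
    out->version = (uint16_t)read_be(buf + 1, 2);
    out->epoch = (uint16_t)read_be(buf + 3, 2);
    out->seq = read_be(buf + 5, 6);
    out->length = (uint16_t)read_be(buf + 11, 2);
    out->payload_off = 13;
    return 0;
}
```

In both cases the receiver pulls the same ingredients out of the header: something that selects the keys (the SPI, or the epoch within the DTLS session), a sequence number for replay checking, and the offset where the ciphertext starts.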
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
From: David Woodhouse Date: Mon, 02 Feb 2015 07:27:10 + > I'm guessing you don't want to push the *whole* management of the TLS > control connection *and* the UDP transport, and probing the latter with > keepalives, into the kernel? I certainly don't :) Whilst Herbert Xu and I have discussed in the past supporting automatic SSL handling of socket data during socket writes in the kernel, doing TLS stuff would be a bit of a stretch :-)
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
Hi, On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote: > On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote: > > From: David Woodhouse > > Date: Sun, 01 Feb 2015 21:29:43 + > > > > > I really was looking for some way to push down something like an XFRM > > > state into the tun device and just say "shove them out here until I tell > > > you otherwise". > > > > People decided to use TUN and push VPN stuff back into userspace, > > and there are repercussions for that decision. > > > > I'm not saying this to be mean or whatever, but I was very > > disappointed when userland IPSEC solutions using TUN started showing > > up. > > Yeah. That's a valid criticism of vpnc, certainly. I never did > understand why it reimplemented the IPSec stack. > > For my OpenConnect client it's somewhat more justified though: the > initial data transport there is over TLS, which the kernel doesn't > support. And if we *can* establish UDP communication, that's over DTLS > which the kernel doesn't support either. It's not even the *standard* > version of DTLS because Cisco are still using a pre-RFC4347 version of > the protocol. And we also need to probe the UDP connectivity and do > keepalives and manage the fallback to using the TCP data transport. > > It's not like vpnc where it really is just a case of setting up the ESP > context and letting it run. > > It's only now I've added Juniper support, which uses ESP-in-UDP for the > data transport, that I'm doing something that the kernel supports at > all. And now I'm looking at how to make use of that. > > > We might as well not have implemented the IPSEC stack at all, > > because as a result of the userland VPN stuff our IPSEC stack is > > largely unused except by a very narrow group of users. > > Well, I'd love to make better use of it if I can.
I do suspect it makes > most sense for userspace to continue to manage the probing of UDP > connectivity, and the fallback to TCP mode, and I suspect it also makes > sense to continue to use tun for passing packets up to the VPN client > when it's using the TCP transport. > > So the question would be how we handle redirecting the packet flow to > the optional UDP transport, when the VPN client determines that it's > available. For the sake of the user setting up firewall and routing > rules, I do think it's important that it continues to appear to > userspace as the *same* device for the entire lifetime of the session, > regardless of which transport the packets happen to be using at a given > moment in time. It doesn't *have* to be tun, though. Since you want to provide connectivity over HTTPS which is not possible in kernel space, you are stuck with keeping the tun device. So the packet flow in that case is identical to how e.g. OpenVPN does it: - tunX holds default route - OpenConnect then: - receives packets on /dev/tun - holds TCP socket to VPN concentrator - does encapsulation into TLS Speaking of optimisation, the interesting part is the alternative flow via IPsec in UDP. AFAICT, it should be possible to set up an ESP in UDP tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try that myself. The funny thing with XFRM is, it applies before the routing decision does: If my IPsec policy matches, the packet goes that way no matter what the routing table says about the original destination. This can be used to override the default route provided via tun0 in the above case. Of course, OpenConnect has to manage all the XFRM/policy stuff on its own, since switching from ESP in UDP back to TLS would mean tearing down the XFRM tunnel. OpenConnect would have to set up (a limited) XFRM and send test traffic to decide whether to set it up fully (if limited) or tear it down (if unlimited) again so traffic arrives at tunX again.
In my opinion, this might work. The whole setup is probably about as intuitive as the fact that kernel IPsec tunnel mode does not naturally provide its own interface. Firewall setup on top of that might become a matter of trial and error. Maybe having a VTI interface and merely moving the default route instead of fiddling with policies all the time might make things a little easier to comprehend, but surely adds some performance overhead. Cheers, Phil
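Phil's point that a matching IPsec policy wins regardless of what the routing table says can be illustrated with a toy IPv4 lookup in C. This is a simplified model of the observable effect only, not how the kernel implements XFRM or route lookups; policies are reduced to bare destination prefixes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 prefix: address in host byte order plus prefix length 0..32. */
struct prefix {
    uint32_t addr;
    uint8_t len;
};

static bool prefix_match(struct prefix p, uint32_t dst)
{
    assert(p.len <= 32);
    /* Shifting a 32-bit value by 32 is undefined in C, so special-case /0. */
    uint32_t mask = p.len ? ~UINT32_C(0) << (32 - p.len) : 0;
    return (dst & mask) == (p.addr & mask);
}

#define PATH_XFRM (-1) /* packet taken by an IPsec policy */
#define PATH_NONE (-2) /* no policy and no route: unreachable */

/* Any matching policy wins; only otherwise is the routing table consulted
 * (longest prefix match). Returns PATH_XFRM, PATH_NONE, or a route index. */
int output_path(uint32_t dst,
                const struct prefix *policies, size_t npol,
                const struct prefix *routes, size_t nroutes)
{
    for (size_t i = 0; i < npol; i++)
        if (prefix_match(policies[i], dst))
            return PATH_XFRM;

    int best = PATH_NONE;
    int best_len = -1;
    for (size_t i = 0; i < nroutes; i++) {
        if (prefix_match(routes[i], dst) && routes[i].len > best_len) {
            best = (int)i;
            best_len = routes[i].len;
        }
    }
    return best;
}
```

With a default route via tun0 and a policy covering 10.0.0.0/8, traffic to 10.x bypasses tun0 entirely, which is exactly why the firewall and routing rules a user attached to tun0 would stop seeing it.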
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Mon, 2015-02-02 at 16:23 +0100, Phil Sutter wrote: > Since you want to provide connectivity over HTTPS which is not possible > in kernel space, you are stuck with keeping the tun device. So the > packet flow in that case is identical to how e.g. OpenVPN does it: > > - tunX holds default route > - OpenConnect then: > - receives packets on /dev/tun > - holds TCP socket to VPN concentrator > - does encapsulation into TLS > > Speaking of optimisation, the interesting part is the alternative flow > via IPsec in UDP. Right. The packet flow you describe is what we already have. Except of course we already *do* establish the UDP connection (which is DTLS when we're talking to a Cisco AnyConnect server, and ESP in UDP when we're talking to Juniper). If we get responses to keepalive packets, we'll send outbound packets over the UDP connection. If the UDP connectivity goes AWOL, we'll fall back to sending on TCP. Rekeying of the UDP connection is handled over the TCP control connection too. Even in the DTLS case, the master secret and session ID are exchanged over TCP and the DTLS is actually done as a 'session resume', without the normal DTLS handshake ever happening. As you say, I'm stuck with keeping the tun device (or something very much like it). This *isn't* like vpnc where I can set up an IPSec config and just let it run. > AFAICT, it should be possible to setup an ESP in UDP > tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try > that myself. The funny thing with XFRM is, it applies before the routing > decision does: If my IPsec policy matches, the packet goes that way no > matter what the routing table says about the original destination. This > can be used to override the default route provided via tun0 in the above > case. Except it isn't even the default route. We get given a bunch of split includes or split excludes from the VPN server. 
We pass them to vpnc-script or NetworkManager to actually set the routes up, and those tools may make their own tweaks to what the server requested: denying the default route and setting up explicit routes, or adding firewall rules or NAT to incoming/outgoing packets on the tun device. If it is no longer *just* the single tun device, everything gets really complicated. Even *before* we talk about changing it on the fly during normal operation. > Of course, OpenConnect has to manage all the XFRM/policy stuff on its > own, since switching from ESP in UDP back to TLS would mean tearing down > the XFRM tunnel. OpenConnect would have to set up (a limited) XFRM and > send test traffic to decide whether to set it up fully (if limited) or > tear it down (if unlimited) again so traffic arrives at tunX again. Right. And ideally without CAP_NET_ADMIN. > In my opinion, this might work. The whole setup is probably about as > intuitive as the fact that kernel IPsec tunnel mode does not naturally > provide its own interface. Firewall setup on top of that might become a > matter of trial and error. Maybe having a VTI interface and merely moving > the default route instead of fiddling with policies all the time might > make things a little easier to comprehend, but surely adds some > performance overhead. I think even the latter is sufficiently complex to manage that it's not worth pursuing. I may throw together my suggested hack using tun_get_socket() and see how much it makes *me* barf before deciding whether to show it here for more feedback :) -- dwmw2
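The keepalive-driven choice between the UDP and TCP transports described in this message (send over UDP while keepalives get answered, fall back to TCP when UDP goes AWOL) can be sketched as a small piece of state. This is a hypothetical illustration, not OpenConnect's actual dead-peer-detection code: it assumes a single silence threshold and a monotonic nanosecond clock, and the names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum transport { TRANSPORT_TCP, TRANSPORT_UDP };

/* Hypothetical per-session probe state; names are illustrative only. */
struct udp_probe {
    bool established;     /* has any keepalive reply ever arrived over UDP? */
    uint64_t last_rx_ns;  /* time of the most recent packet received over UDP */
    uint64_t dead_ns;     /* silence after which UDP is treated as gone */
};

/* Record that something (e.g. a keepalive reply) arrived over UDP. */
void udp_probe_rx(struct udp_probe *p, uint64_t now_ns)
{
    p->established = true;
    p->last_rx_ns = now_ns;
}

/* Use UDP only while it has been heard from recently; otherwise fall back
 * to the TCP (TLS) transport, which is always available. */
enum transport pick_transport(const struct udp_probe *p, uint64_t now_ns)
{
    if (!p->established)
        return TRANSPORT_TCP;
    assert(now_ns >= p->last_rx_ns); /* monotonic clock assumed */
    return (now_ns - p->last_rx_ns < p->dead_ns) ? TRANSPORT_UDP
                                                 : TRANSPORT_TCP;
}
```

Because this decision lives in the client, the tun device the user configured stays the same for the whole session; only the path behind it changes.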
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Mon, 2015-02-02 at 09:24 +0100, Steffen Klassert wrote: > > Maybe you want to use a virtual tunnel interface (vti) that we have > already. Everything that is routed through such an interface is > guaranteed to be either encrypted if a matching xfrm state is present > or dropped. Same on the receive side, everything that is received by > this interface is guaranteed to be IPsec processed. So you can do > a routing-based decision about the IPsec processing. > > While I'm sure it could handle the ESP in UDP encapsulation, I'm not that > sure about your TCP fallback because this requires a valid xfrm state > to allow packets to pass. Using the same interface for both is probably > not possible. I'm trying to imagine how we could make it work in practice if we end up exposing two *different* interfaces and having to change the kernel's routing according to whether we have UDP connectivity at any given moment in time. Given how painful it already is to maintain vpnc-script and make it do the right thing for split-include and split-exclude routing, I'm not really sure I want to go there. Even if we could get such a scheme to work, it would probably also require retaining root privileges to make the changes, and one of the security benefits over the proprietary VPN clients is that we don't *need* to run as root. We can either drop privs after running vpnc-script to do the initial routing setup, or in the NetworkManager case we *never* run with elevated privileges; we just pass the IP/routing information back over DBus to NetworkManager. It occurs to me that for the approach I was thinking about, I wouldn't even need to touch the internals of the tun driver. It could be a separate driver which just uses tun_get_socket(). Userspace could hand it the file descriptors of the tun device and the connected UDP socket, along with the encryption parameters, and then just stop reading packets from the tun device for itself.
-- dwmw2
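The proposal above hands a hypothetical driver two file descriptors. The tun side can't be demonstrated portably, but the UDP side can: below is a C sketch, in userspace terms, of how such an interface might sanity-check that it was given a connected IPv4/IPv6 datagram socket, plus the userspace step of creating and connecting one. Both function names are hypothetical; socket(), connect(), getsockopt(SO_TYPE) and getpeername() are standard POSIX calls.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* True if fd is an AF_INET/AF_INET6 datagram socket that has been
 * connect()ed to a peer. */
bool is_connected_udp(int fd)
{
    int type = 0;
    socklen_t optlen = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &optlen) != 0)
        return false;                 /* not a socket, or a bad fd */
    if (type != SOCK_DGRAM)
        return false;
    struct sockaddr_storage peer;
    socklen_t peerlen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &peerlen) != 0)
        return false;                 /* ENOTCONN: never connect()ed */
    assert(peerlen <= sizeof peer);
    return peer.ss_family == AF_INET || peer.ss_family == AF_INET6;
}

/* Userspace side: create a UDP socket and connect() it to 127.0.0.1:port.
 * Connecting a UDP socket sends nothing; it only fixes the peer address.
 * Returns the fd, or -1 on error. */
int open_connected_udp_loopback(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(fd, (struct sockaddr *)&sin, sizeof sin) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Requiring a connected socket means the peer address is fixed once by userspace, so the in-kernel path would never need to carry or choose a destination itself.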
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote: > On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote: > > > We might as well not have implemented the IPSEC stack at all, > > because as a result of the userland VPN stuff our IPSEC stack is > > largely unused except by a very narrow group of users. > > Well, I'd love to make better use of it if I can. I do suspect it makes > most sense for userspace to continue to manage the probing of UDP > connectivity, and the fallback to TCP mode, and I suspect it also makes > sense to continue to use tun for passing packets up to the VPN client > when it's using the TCP transport. > > So the question would be how we handle redirecting the packet flow to > the optional UDP transport, when the VPN client determines that it's > available. For the sake of the user setting up firewall and routing > rules, I do think it's important that it continues to appear to > userspace as the *same* device for the entire lifetime of the session, > regardless of which transport the packets happen to be using at a given > moment in time. It doesn't *have* to be tun, though. > > You don't seem to like my suggestion of somehow pushing down an XFRM > state to the tun device to direct the packets out there instead of up to > userspace. Do you have an alternative suggestion... or a specific > concern that would help me come up with something you like better? Maybe you want to use a virtual tunnel interface (vti) that we have already. Everything that is routed through such an interface is guaranteed to be either encrypted if a matching xfrm state is present or dropped. Same on the receive side, everything that is received by this interface is guaranteed to be IPsec processed. So you can do a routing-based decision about the IPsec processing. While I'm sure it could handle the ESP in UDP encapsulation, I'm not that sure about your TCP fallback because this requires a valid xfrm state to allow packets to pass.
Using the same interface for both is probably not possible.
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote: > From: David Woodhouse > Date: Sun, 01 Feb 2015 21:29:43 + > > > I really was looking for some way to push down something like an XFRM > > state into the tun device and just say "shove them out here until I tell > > you otherwise". > > People decided to use TUN and push VPN stuff back into userspace, > and there are repercussions for that decision. > > I'm not saying this to be mean or whatever, but I was very > disappointed when userland IPSEC solutions using TUN started showing > up. Yeah. That's a valid criticism of vpnc, certainly. I never did understand why it reimplemented the IPSec stack. For my OpenConnect client it's somewhat more justified though: the initial data transport there is over TLS, which the kernel doesn't support. And if we *can* establish UDP communication, that's over DTLS which the kernel doesn't support either. It's not even the *standard* version of DTLS because Cisco are still using a pre-RFC4347 version of the protocol. And we also need to probe the UDP connectivity and do keepalives and manage the fallback to using the TCP data transport. It's not like vpnc where it really is just a case of setting up the ESP context and letting it run. It's only now I've added Juniper support, which uses ESP-in-UDP for the data transport, that I'm doing something that the kernel supports at all. And now I'm looking at how to make use of that. > We might as well not have implemented the IPSEC stack at all, > because as a result of the userland VPN stuff our IPSEC stack is > largely unused except by a very narrow group of users. Well, I'd love to make better use of it if I can. I do suspect it makes most sense for userspace to continue to manage the probing of UDP connectivity, and the fallback to TCP mode, and I suspect it also makes sense to continue to use tun for passing packets up to the VPN client when it's using the TCP transport.
So the question would be how we handle redirecting the packet flow to the optional UDP transport, when the VPN client determines that it's available. For the sake of the user setting up firewall and routing rules, I do think it's important that it continues to appear to userspace as the *same* device for the entire lifetime of the session, regardless of which transport the packets happen to be using at a given moment in time. It doesn't *have* to be tun, though. You don't seem to like my suggestion of somehow pushing down an XFRM state to the tun device to direct the packets out there instead of up to userspace. Do you have an alternative suggestion... or a specific concern that would help me come up with something you like better? I'm guessing you don't want to push the *whole* management of the TLS control connection *and* the UDP transport, and probing the latter with keepalives, into the kernel? I certainly don't :) -- dwmw2
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
From: David Woodhouse Date: Sun, 01 Feb 2015 21:29:43 + > I really was looking for some way to push down something like an XFRM > state into the tun device and just say "shove them out here until I tell > you otherwise". People decided to use TUN and push VPN stuff back into userspace, and there are repercussions for that decision. I'm not saying this to be mean or whatever, but I was very disappointed when userland IPSEC solutions using TUN started showing up. We might as well not have implemented the IPSEC stack at all, because as a result of the userland VPN stuff our IPSEC stack is largely unused except by a very narrow group of users.
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Sun, 2015-02-01 at 12:19 -0800, David Miller wrote: > From: David Woodhouse > Date: Sun, 01 Feb 2015 13:33:50 + > > > Of course, now I'm looking closely at the path these packets take to > > leave the box, it starts to offend me that they're being passed up to > > userspace just to encrypt them (as DTLS or ESP) and then send them back > > down to the kernel on a UDP socket. The kernel already knows how to > > {en,de}crypt ESP, and do the sequence number checking on incoming > > packets. > > It's funny, I thought we had an IPSEC stack Right. But I'm trying to work out how we can sanely *use* that from a VPN client. The client normally sets up a tun device, configuring it with appropriate IP addresses and routes by invoking vpnc-script or passing the information back to NetworkManager. The client itself might not even have root privs, in the NetworkManager case. The initial authentication and connection are done over HTTPS, and packets *can* be passed that way if they need to be. But obviously the client *also* tries to set up a UDP data transport, which is DTLS in the case of Cisco AnyConnect, and ESP in UDP for Juniper. If it *can* get communication over UDP, it'll use it. Otherwise it just passes packets over the TCP connection. So it needs to dynamically set up and tear down the ESP/DTLS tunnels as and when they are working. Ideally we want it such that packets routed to the tun device get transparently encrypted and sent out on the UDP socket, and packets received from UDP and successfully decrypted will appear to have arrived on the tun device. The user may be manually tweaking the routing, or setting up firewall/NAT/etc. on the tun device. I can see how to set up an ESP in UDP tunnel such that it looks like the packets are actually departing on the *physical* interface (which in practice I suppose they are). But that's going to be fairly complex to set up, and extremely non-intuitive and hard to manage for the user.
So much so that I don't think it's actually deployable. I really was looking for some way to push down something like an XFRM state into the tun device and just say "shove them out here until I tell you otherwise". -- dwmw2
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
From: David Woodhouse Date: Sun, 01 Feb 2015 13:33:50 + > Of course, now I'm looking closely at the path these packets take to > leave the box, it starts to offend me that they're being passed up to > userspace just to encrypt them (as DTLS or ESP) and then send them back > down to the kernel on a UDP socket. The kernel already knows how to > {en,de}crypt ESP, and do the sequence number checking on incoming > packets. It's funny, I thought we had an IPSEC stack
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Sun, 2015-02-01 at 14:26 +0200, Michael S. Tsirkin wrote: > > When I run over the VPN, netperf thinks it sent 2½ times the amount of > > TX traffic. > > At some level, it's expected: netperf's manual actually says: > A UDP_STREAM test has no end-to-end flow control - UDP provides none and > neither does netperf. However, if you wish, you can configure netperf > with --enable-intervals=yes to enable the global command-line -b and -w > options to pace bursts of traffic onto the network. True, but UDP is just a canary for other protocols here. We *should* be able to keep track of the packets before they even leave the box, and know that we haven't even managed to send them. Even if we know it's a datagram protocol and it's potentially going to be dropped in transit later. Of course, now I'm looking closely at the path these packets take to leave the box, it starts to offend me that they're being passed up to userspace just to encrypt them (as DTLS or ESP) and then send them back down to the kernel on a UDP socket. The kernel already knows how to {en,de}crypt ESP, and do the sequence number checking on incoming packets. I'm wondering if we could bypass userspace in that case somehow: let userspace negotiate the encryption and connect the UDP socket, then just pass the socket fd and the parameters to the kernel so that incoming packets are decrypted and 'received' on the tun device, and outgoing packets on the tun device are encrypted and sent out on the UDP socket. The performance isn't too much of an issue for a VPN *client* in practice, but we have a server implementation too which would probably benefit considerably from such an offload facility. If I were to look at such a thing, would it provoke screams of horror? -- dwmw2
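The sequence number checking on incoming ESP packets mentioned here is normally a sliding anti-replay window, the mechanism RFC 4303 describes for ESP. Below is a minimal C sketch of such a window, assuming a 64-packet window and 32-bit sequence numbers (extended sequence numbers are not handled); it is an illustration, not the kernel's xfrm implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WINDOW 64u

struct replay_window {
    uint32_t top;     /* highest sequence number accepted so far (0 = none) */
    uint64_t bitmap;  /* bit i set => sequence number (top - i) already seen */
};

/* Returns true and records seq if it is new and inside the window,
 * false if it is a duplicate or too old. */
bool replay_check_update(struct replay_window *w, uint32_t seq)
{
    if (seq == 0)
        return false;                 /* ESP sequence numbers start at 1 */
    if (seq > w->top) {
        /* Slide the window forward; old bits fall off the top. */
        uint32_t shift = seq - w->top;
        w->bitmap = (shift >= REPLAY_WINDOW) ? 1 : (w->bitmap << shift) | 1;
        w->top = seq;
        return true;
    }
    uint32_t offset = w->top - seq;
    if (offset >= REPLAY_WINDOW)
        return false;                 /* left of the window: too old */
    uint64_t mask = UINT64_C(1) << offset;
    if (w->bitmap & mask)
        return false;                 /* already received: replay */
    w->bitmap |= mask;                /* late but new: accept */
    assert(w->bitmap & 1);            /* top itself is always marked */
    return true;
}
```

A receiver should only advance the window for packets that have already passed the integrity check; otherwise forged packets with large sequence numbers could push legitimate traffic out of the window.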
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Sun, Feb 01, 2015 at 11:20:33AM +, David Woodhouse wrote: > On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote: > > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote: > > > > > > Herbert Acked your patch, so I guess it's OK, but I think it can be > > > dangerous. > > > > The tun socket accounting was never designed to stop it from > > flooding another tun interface. It's there to stop it from > > transmitting above a destination interface's TX bandwidth and > > causing unnecessary packet drops. It also limits the total amount > > of kernel memory that can be pinned down by a single tun interface. > > > > In this case, all we're doing is shifting the accounting from the > > "hardware" queue to the qdisc queue. > > > > So your ability to flood a tun interface is essentially unchanged. > > I've just been looking at VPN performance, using netperf to flood an > openconnect/ocserv connection over GigE and profiling my VPN client. > > If I run netperf over the *unencrypted* link, it only sends 1Gb/s of > packets, because the packets are correctly accounted to netperf's UDP > socket until the moment they're actually transmitted on the wire, and > the backpressure works correctly. > > When I run over the VPN, netperf thinks it sent 2½ times the amount of > TX traffic. At some level, it's expected: netperf's manual actually says: A UDP_STREAM test has no end-to-end flow control - UDP provides none and neither does netperf. However, if you wish, you can configure netperf with --enable-intervals=yes to enable the global command-line -b and -w options to pace bursts of traffic onto the network. > Packets are being dropped by the tun device before even > being fed up to the VPN client to be sent, presumably because of > this skb_orphan() call. (The client itself should do the right thing, > and only suck packets out of the tun at the rate it can shove them out > *its* UDP socket.) A simple work-around is to limit the rate using a non-work-conserving qdisc.
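A non-work-conserving qdisc such as tbf holds packets back even when the device is otherwise idle. The core of that behaviour is a token bucket, sketched below in C with the rate in bytes per second and time in seconds; it illustrates the idea only and is not the tbf qdisc's implementation. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct token_bucket {
    double rate;    /* bytes per second */
    double burst;   /* bucket depth in bytes */
    double tokens;  /* bytes currently available */
    double last;    /* time (seconds) tokens were last refilled */
};

void tb_init(struct token_bucket *tb, double rate, double burst, double now)
{
    assert(rate > 0 && burst > 0);
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;   /* start with a full bucket */
    tb->last = now;
}

/* Returns true if a packet of len bytes may be dequeued at time now.
 * false means "not yet": the packet stays queued rather than being dropped. */
bool tb_may_send(struct token_bucket *tb, double now, size_t len)
{
    assert(now >= tb->last);
    tb->tokens += (now - tb->last) * tb->rate;   /* refill since last call */
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;                  /* never exceed the depth */
    tb->last = now;
    if (tb->tokens < (double)len)
        return false;
    tb->tokens -= (double)len;
    return true;
}
```

The drawback of this work-around is that the rate must be chosen in advance, whereas socket accounting adapts automatically to however fast the VPN client actually drains the tun device.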
> Did we ever look at the alternative solution of taking ownership only > after a timeout, or on demand when we need to shut down the device? I've been thinking about this on and off, but haven't found a good, safe solution yet. For timeout, the difficulty is to find a good timer value, low enough to avoid DoS attacks but high enough to avoid spurious packet drops (and expensive timer interrupts). -- MST
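The timeout variant discussed here could look roughly like this: packets stay charged to the sending socket while queued, and are orphaned once they have waited longer than a threshold, or all at once when the device is shut down. This is a hypothetical C model of the decision rule only; it does not model timers or real skb handling, and the names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a packet sitting in the tun queue. */
struct queued_pkt {
    uint64_t enqueue_ns;  /* when the packet was queued on the tun device */
    bool owned;           /* still charged to the sending socket? */
};

/* Orphan (stop charging to the sender) every packet that has waited at
 * least timeout_ns, or every packet if the device is being shut down.
 * Returns how many packets this call released. */
size_t orphan_expired(struct queued_pkt *q, size_t n, uint64_t now_ns,
                      uint64_t timeout_ns, bool shutting_down)
{
    size_t released = 0;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].owned)
            continue;
        assert(now_ns >= q[i].enqueue_ns);
        if (shutting_down || now_ns - q[i].enqueue_ns >= timeout_ns) {
            q[i].owned = false;
            released++;
        }
    }
    return released;
}
```

The trade-off MST describes shows up directly in timeout_ns: a very small value behaves like orphaning on enqueue, while a very large one lets a reader that never drains the queue keep the sender's memory pinned.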
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote: > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote: > > > > Herbert Acked your patch, so I guess it's OK, but I think it can be > > dangerous. > > The tun socket accounting was never designed to stop it from > flooding another tun interface. It's there to stop it from > transmitting above a destination interface's TX bandwidth and > causing unnecessary packet drops. It also limits the total amount > of kernel memory that can be pinned down by a single tun interface. > > In this case, all we're doing is shifting the accounting from the > "hardware" queue to the qdisc queue. > > So your ability to flood a tun interface is essentially unchanged. I've just been looking at VPN performance, using netperf to flood an openconnect/ocserv connection over GigE and profiling my VPN client. If I run netperf over the *unencrypted* link, it only sends 1Gb/s of packets, because the packets are correctly accounted to netperf's UDP socket until the moment they're actually transmitted on the wire, and the backpressure works correctly. When I run over the VPN, netperf thinks it sent 2½ times the amount of TX traffic. Packets are being dropped by the tun device before even being fed up to the VPN client to be sent, presumably because of this skb_orphan() call. (The client itself should do the right thing, and only suck packets out of the tun at the rate it can shove them out *its* UDP socket.) Did we ever look at the alternative solution of taking ownership only after a timeout, or on demand when we need to shut down the device? -- dwmw2
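The effect described above can be reproduced in a toy model. Counting in whole packets, a sender writes into a tun queue that the VPN client drains at a fixed rate, and the sender's socket is either kept charged until the client dequeues the packet (backpressure) or released at enqueue time, as skb_orphan() does. This is a hypothetical C simulation; all parameters are invented, and byte counts and real timing are ignored.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_result {
    unsigned offered;      /* packets the sender's send() "succeeded" on */
    unsigned transmitted;  /* packets the VPN client took off the queue */
    unsigned dropped;      /* packets lost because the tun queue was full */
};

/* Each tick the sender tries `burst` sends, then the client drains up to
 * `drain` packets. With orphan_on_enqueue the sender's accounting is
 * released as soon as a packet is queued, so it never blocks. */
struct sim_result simulate(bool orphan_on_enqueue, unsigned ticks,
                           unsigned burst, unsigned drain,
                           unsigned queue_cap, unsigned sndbuf_pkts)
{
    struct sim_result r = {0, 0, 0};
    unsigned queued = 0;   /* packets waiting in the tun queue */
    unsigned charged = 0;  /* packets still charged to the sender's socket */

    for (unsigned t = 0; t < ticks; t++) {
        for (unsigned i = 0; i < burst; i++) {
            if (!orphan_on_enqueue && charged >= sndbuf_pkts)
                break;                 /* send buffer full: sender blocks */
            r.offered++;
            if (queued >= queue_cap) {
                r.dropped++;           /* tun queue full: packet lost */
                continue;
            }
            queued++;
            if (!orphan_on_enqueue)
                charged++;
        }
        unsigned n = queued < drain ? queued : drain;
        queued -= n;
        r.transmitted += n;
        if (!orphan_on_enqueue)
            charged -= n;              /* accounting released on dequeue */
    }
    assert(r.offered == r.transmitted + r.dropped + queued);
    return r;
}
```

With a queue and send buffer of 8 packets, 10 send attempts and 4 dequeues per tick over 100 ticks, the orphaning variant reports 1000 packets sent while only 400 leave the queue, whereas the accounted variant never drops. These parameters are arbitrary and are not meant to reproduce the netperf measurement above.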