Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-03 Thread David Woodhouse
On Tue, 2015-02-03 at 16:19 -0800, David Miller wrote:
> From: David Woodhouse 
> Date: Mon, 02 Feb 2015 07:27:10 +
> 
> > I'm guessing you don't want to push the *whole* management of the TLS
> > control connection *and* the UDP transport, and probing the latter with
> > keepalives, into the kernel? I certainly don't :)
> 
> Whilst Herbert Xu and I have discussed in the past supporting
> automatic SSL handling of socket data during socket writes in the
> kernel, doing TLS stuff would be a bit of a stretch :-)

Right. For the DTLS I was thinking we'd do the handshake in userspace
and then hand the UDP socket down. At that point it's basically the same
as ESP with the bytes in a slightly different place.

So I really am looking at an option for "here's a UDP socket to send
those tun packets out on, with  encryption setup" as the sanest
plan I can come up with.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-03 Thread David Miller
From: David Woodhouse 
Date: Mon, 02 Feb 2015 07:27:10 +

> I'm guessing you don't want to push the *whole* management of the TLS
> control connection *and* the UDP transport, and probing the latter with
> keepalives, into the kernel? I certainly don't :)

Whilst Herbert Xu and I have discussed in the past supporting
automatic SSL handling of socket data during socket writes in the
kernel, doing TLS stuff would be a bit of a stretch :-)



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread Phil Sutter
Hi,

On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote:
> On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
> > From: David Woodhouse 
> > Date: Sun, 01 Feb 2015 21:29:43 +
> > 
> > > I really was looking for some way to push down something like an XFRM
> > > state into the tun device and just say "shove them out here until I tell
> > > you otherwise".
> > 
> > People decided to use TUN and push VPN stuff back into userspace,
> > and there are repercussions for that decision.
> > 
> > I'm not saying this to be mean or whatever, but I was very
> > disappointed when userland IPSEC solutions using TUN started showing
> > up.
> 
> Yeah. That's a valid criticism of vpnc, certainly. I never did
> understand why it reimplemented the IPSec stack.
> 
> For my OpenConnect client it's somewhat more justified though — the
> initial data transport there is over TLS, which the kernel doesn't
> support. And if we *can* establish UDP communication, that's over DTLS
> which the kernel doesn't support either. It's not even the *standard*
> version of DTLS because Cisco are still using a pre-RFC4347 version of
> the protocol. And we also need to probe the UDP connectivity and do
> keepalives and manage the fallback to using the TCP data transport.
> 
> It's not like vpnc where it really is just a case of setting up the ESP
> context and letting it run.
> 
> It's only now I've added Juniper support, which uses ESP-in-UDP for the
> data transport, that I'm doing something that the kernel supports at
> all. And now I'm looking at how to make use of that.
> 
> > We might as well have not have implemented the IPSEC stack at all,
> > because as a result of the userland VPN stuff our IPSEC stack is
> > largely unused except by a very narrow group of users.
> 
> Well, I'd love to make better use of it if I can. I do suspect it makes
> most sense for userspace to continue to manage the probing of UDP
> connectivity, and the fallback to TCP mode — and I suspect it also makes
> sense to continue to use tun for passing packets up to the VPN client
> when it's using the TCP transport.
> 
> So the question would be how we handle redirecting the packet flow to
> the optional UDP transport, when the VPN client determines that it's
> available. For the sake of the user setting up firewall and routing
> rules, I do think it's important that it continues to appear to
> userspace as the *same* device for the entire lifetime of the session,
> regardless of which transport the packets happen to be using at a given
> moment in time. It doesn't *have* to be tun, though. 

Since you want to provide connectivity over HTTPS which is not possible
in kernel space, you are stuck with keeping the tun device. So the
packet flow in that case is identical to how e.g. OpenVPN does it:

- tunX holds default route
- OpenConnect then:
  - receives packets on /dev/tun
  - holds TCP socket to VPN concentrator
  - does encapsulation into TLS

Speaking of optimisation, the interesting part is the alternative flow
via IPsec in UDP. AFAICT, it should be possible to setup an ESP in UDP
tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try
that myself. The funny thing with XFRM is, it applies before the routing
decision does: If my IPsec policy matches, the packet goes that way no
matter what the routing table says about the original destination. This
can be used to override the default route provided via tun0 in the above
case.

Of course, OpenConnect has to manage all the XFRM/policy stuff on it's
own, since switching from ESP in UDP back to TLS would mean to tear down
the XFRM tunnel. OpenConnect would have to setup (a limited) XFRM and
send test traffic to decide whether to set it up fully (if limited) or
tear it down (if unlimited) again so traffic arrives at tunX again.

In my opinion, this might work. The whole setup is probably about as
intuitive as the fact that kernel IPsec tunnel mode does not naturally
provide an own interface. Firewall setup on top of that might become a
matter of try-and-error. Maybe having a VTI interface and merely moving
the default route instead of fiddling with policies all the time might
make things a little easier to comprehend, but surely adds some
performance overhead.

Cheers, Phil



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread David Woodhouse
On Mon, 2015-02-02 at 16:23 +0100, Phil Sutter wrote:
> Since you want to provide connectivity over HTTPS which is not possible
> in kernel space, you are stuck with keeping the tun device. So the
> packet flow in that case is identical to how e.g. OpenVPN does it:
> 
> - tunX holds default route
> - OpenConnect then:
>   - receives packets on /dev/tun
>   - holds TCP socket to VPN concentrator
>   - does encapsulation into TLS
> 
> Speaking of optimisation, the interesting part is the alternative flow
> via IPsec in UDP.

Right. The packet flow you describe is what we already have. Except of
course we already *do* establish the UDP connection (which is DTLS when
we're talking to a Cisco AnyConnect server, and ESP in UDP when we're
talking to Juniper). If we get responses to keepalive packets, we'll
send outbound packets over the UDP connection. If the UDP connectivity
goes AWOL, we'll fall back to sending on TCP. Rekeying of the UDP
connection is handled over the TCP control connection too. Even in the
DTLS case, the master secret and session ID are exchanged over TCP and
the DTLS is actually done as a 'session resume', without the normal DTLS
handshake ever happening.

As you say, I'm stuck with keeping the tun device (or something very
much like it). This *isn't* like vpnc where I can set up an IPSec config
and just let it run.

>  AFAICT, it should be possible to setup an ESP in UDP
> tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try
> that myself. The funny thing with XFRM is, it applies before the routing
> decision does: If my IPsec policy matches, the packet goes that way no
> matter what the routing table says about the original destination. This
> can be used to override the default route provided via tun0 in the above
> case.

Except it isn't even the default route. We get given a bunch of split
includes or split excludes from the VPN server. We pass them to
vpnc-script or NetworkManager to actually set the routes up, and those
tools may make their own tweaks to what the server requested — denying
the default route and setting up explicit routes, or adding firewall
rules or NAT to incoming/outgoing packets on the tun device.

If it is no longer *just* the single tun device, everything gets really
complicated. Even *before* we talk about changing it on the fly during
normal operation.

> Of course, OpenConnect has to manage all the XFRM/policy stuff on it's
> own, since switching from ESP in UDP back to TLS would mean to tear down
> the XFRM tunnel. OpenConnect would have to setup (a limited) XFRM and
> send test traffic to decide whether to set it up fully (if limited) or
> tear it down (if unlimited) again so traffic arrives at tunX again.

Right. And ideally without CAP_NET_ADMIN.

> In my opinion, this might work. The whole setup is probably about as
> intuitive as the fact that kernel IPsec tunnel mode does not naturally
> provide an own interface. Firewall setup on top of that might become a
> matter of try-and-error. Maybe having a VTI interface and merely moving
> the default route instead of fiddling with policies all the time might
> make things a little easier to comprehend, but surely adds some
> performance overhead.

I think even the latter is sufficiently complex to manage that it's not
worth pursuing. I may throw together my suggested hack using
tun_get_socket() and see how much it makes *me* barf before deciding
whether to show it here for more feedback :)

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread David Woodhouse
On Mon, 2015-02-02 at 09:24 +0100, Steffen Klassert wrote:
> 
> Maybe you want to use a virtual tunnel interface (vti) what we have
> already. Everything that is routed through such an interface is
> guaranteed to be either encrypted if a matching xfrm state is present
> or dropped. Same on the rceive side, everything that is received by
> this interface is guaranteed to be IPsec processed. So you can do
> a routing based decision about the IPsec processing.
> 
> While I'm sure it could handle the ESP in UDP encapsulation, I'm not that
> sure about your TCP fallback because this requires a valid xfrm state
> to allow packets to pass. Using the same interface for both is probably
> not possible.

I'm trying to imagine how we could make it work in practice if we end up
exposing two *different* interfaces and having to change the kernel's
routing according to whether we have UDP connectivity at any given
moment in time.

Given how painful it already is to maintain vpnc-script and make it do
the right thing for split-include and split-exclude routing, I'm not
really sure I want to go there.

Even if we could get such a scheme to work, it would probably also
require retaining root privileges to make the changes — and one of the
security benefits over the proprietary VPN clients is that we don't
*need* to run as root. We can either drop privs after running
vpnc-script to do the initial routing setup, or in the NetworkManager
case we *never* run with elevated privileges; we just pass the
IP/routing information back over DBus to NetworkManager.

It occurs to me that for the approach I was thinking about, I wouldn't
even need to touch the internals of the tun driver. It could be a
separate driver which just uses tun_get_socket(). Userspace could hand
it the file descriptors of the tun device and the connected UDP socket,
along with the encryption parameters — and then just stop reading
packets from the tun device for itself.

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread Steffen Klassert
On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote:
> On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
> 
> > We might as well have not have implemented the IPSEC stack at all,
> > because as a result of the userland VPN stuff our IPSEC stack is
> > largely unused except by a very narrow group of users.
> 
> Well, I'd love to make better use of it if I can. I do suspect it makes
> most sense for userspace to continue to manage the probing of UDP
> connectivity, and the fallback to TCP mode — and I suspect it also makes
> sense to continue to use tun for passing packets up to the VPN client
> when it's using the TCP transport.
> 
> So the question would be how we handle redirecting the packet flow to
> the optional UDP transport, when the VPN client determines that it's
> available. For the sake of the user setting up firewall and routing
> rules, I do think it's important that it continues to appear to
> userspace as the *same* device for the entire lifetime of the session,
> regardless of which transport the packets happen to be using at a given
> moment in time. It doesn't *have* to be tun, though. 
> 
> You don't seem to like my suggestion of somehow pushing down an XFRM
> state to the tun device to direct the packets out there instead of up to
> userspace. Do you have an alternative suggestion... or a specific
> concern that would help me come up with something you like better?

Maybe you want to use a virtual tunnel interface (vti) what we have
already. Everything that is routed through such an interface is
guaranteed to be either encrypted if a matching xfrm state is present
or dropped. Same on the rceive side, everything that is received by
this interface is guaranteed to be IPsec processed. So you can do
a routing based decision about the IPsec processing.

While I'm sure it could handle the ESP in UDP encapsulation, I'm not that
sure about your TCP fallback because this requires a valid xfrm state
to allow packets to pass. Using the same interface for both is probably
not possible.



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
> From: David Woodhouse 
> Date: Sun, 01 Feb 2015 21:29:43 +
> 
> > I really was looking for some way to push down something like an XFRM
> > state into the tun device and just say "shove them out here until I tell
> > you otherwise".
> 
> People decided to use TUN and push VPN stuff back into userspace,
> and there are repercussions for that decision.
> 
> I'm not saying this to be mean or whatever, but I was very
> disappointed when userland IPSEC solutions using TUN started showing
> up.

Yeah. That's a valid criticism of vpnc, certainly. I never did
understand why it reimplemented the IPSec stack.

For my OpenConnect client it's somewhat more justified though — the
initial data transport there is over TLS, which the kernel doesn't
support. And if we *can* establish UDP communication, that's over DTLS
which the kernel doesn't support either. It's not even the *standard*
version of DTLS because Cisco are still using a pre-RFC4347 version of
the protocol. And we also need to probe the UDP connectivity and do
keepalives and manage the fallback to using the TCP data transport.

It's not like vpnc where it really is just a case of setting up the ESP
context and letting it run.

It's only now I've added Juniper support, which uses ESP-in-UDP for the
data transport, that I'm doing something that the kernel supports at
all. And now I'm looking at how to make use of that.

> We might as well have not have implemented the IPSEC stack at all,
> because as a result of the userland VPN stuff our IPSEC stack is
> largely unused except by a very narrow group of users.

Well, I'd love to make better use of it if I can. I do suspect it makes
most sense for userspace to continue to manage the probing of UDP
connectivity, and the fallback to TCP mode — and I suspect it also makes
sense to continue to use tun for passing packets up to the VPN client
when it's using the TCP transport.

So the question would be how we handle redirecting the packet flow to
the optional UDP transport, when the VPN client determines that it's
available. For the sake of the user setting up firewall and routing
rules, I do think it's important that it continues to appear to
userspace as the *same* device for the entire lifetime of the session,
regardless of which transport the packets happen to be using at a given
moment in time. It doesn't *have* to be tun, though. 

You don't seem to like my suggestion of somehow pushing down an XFRM
state to the tun device to direct the packets out there instead of up to
userspace. Do you have an alternative suggestion... or a specific
concern that would help me come up with something you like better?

I'm guessing you don't want to push the *whole* management of the TLS
control connection *and* the UDP transport, and probing the latter with
keepalives, into the kernel? I certainly don't :)

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Miller
From: David Woodhouse 
Date: Sun, 01 Feb 2015 21:29:43 +

> I really was looking for some way to push down something like an XFRM
> state into the tun device and just say "shove them out here until I tell
> you otherwise".

People decided to use TUN and push VPN stuff back into userspace,
and there are repercussions for that decision.

I'm not saying this to be mean or whatever, but I was very
disappointed when userland IPSEC solutions using TUN started showing
up.

We might as well have not have implemented the IPSEC stack at all,
because as a result of the userland VPN stuff our IPSEC stack is
largely unused except by a very narrow group of users.



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Sun, 2015-02-01 at 12:19 -0800, David Miller wrote:
> From: David Woodhouse 
> Date: Sun, 01 Feb 2015 13:33:50 +
> 
> > Of course, now I'm looking closely at the path these packets take to
> > leave the box, it starts to offend me that they're being passed up to
> > userspace just to encrypt them (as DTLS or ESP) and then send them back
> > down to the kernel on a UDP socket. The kernel already knows how to
> > {en,de}crypt ESP, and do the sequence number checking on incoming
> > packets.
> 
> It's funny, I thought we had an IPSEC stack

Right. But I'm trying to work out how we can sanely *use* that from a
VPN client.

The client normally sets up a tun device, configuring it with
appropriate IP addresses and routes by invoking vpnc-script or passing
the information back to NetworkManager. The client itself might not even
have root privs, in the NetworkManager case.

The initial authentication and connection are done over HTTPS, and
packets *can* be passed that way if they need to be. But obviously the
client *also* tries to set up a UDP data transport too — which is DTLS
in the case of Cisco AnyConnect, and ESP in UDP for Juniper.

If it *can* get communication over UDP, it'll use it. Otherwise it just
passes packets over the TCP connection. So it needs to dynamically set
up and tear down the ESP/DTLS tunnels as and when they are working.

Ideally we want it such that that packets routed to the tun device get
transparently encrypted and sent out on the UDP socket, and packets
received from UDP and successfully decrypted will appear to have arrived
on the tun device. The user may be manually tweaking the routing, or
setting up firewall/NAT/etc. on the tun device.

I can see how to set up an ESP in UDP tunnel such that it looks like the
packets are actually departing on the *physical* interface (which in
practice I suppose they are). But that's going to be fairly complex to
set up, and extremely non-intuitive and hard to manage for the user. To
the extent that I don't think it's actually deployable.

I really was looking for some way to push down something like an XFRM
state into the tun device and just say "shove them out here until I tell
you otherwise".

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Miller
From: David Woodhouse 
Date: Sun, 01 Feb 2015 13:33:50 +

> Of course, now I'm looking closely at the path these packets take to
> leave the box, it starts to offend me that they're being passed up to
> userspace just to encrypt them (as DTLS or ESP) and then send them back
> down to the kernel on a UDP socket. The kernel already knows how to
> {en,de}crypt ESP, and do the sequence number checking on incoming
> packets.

It's funny, I thought we had an IPSEC stack



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Sun, 2015-02-01 at 14:26 +0200, Michael S. Tsirkin wrote:
> > When I run over the VPN, netperf thinks it sent 2½ times the amount of
> > TX traffic.
> 
> At some level, it's expected: netperf's manual actually says:
>   A UDP_STREAM test has no end-to-end flow control - UDP provides none and
>   neither does netperf. However, if you wish, you can configure netperf
>   with --enable-intervals=yes to enable the global command-line -b and -w
>   options to pace bursts of traffic onto the network.

True, but UDP is just a canary for other protocols here. We *should* be
able to keep track of the packets before they even leave the box, and
know that we haven't even managed to send them. Even if we know it's a
datagram protocol and it's potentially going to be dropped in transit
later.

Of course, now I'm looking closely at the path these packets take to
leave the box, it starts to offend me that they're being passed up to
userspace just to encrypt them (as DTLS or ESP) and then send them back
down to the kernel on a UDP socket. The kernel already knows how to
{en,de}crypt ESP, and do the sequence number checking on incoming
packets.

I'm wondering if we bypass userspace in that case somehow — let
userspace negotiate the encryption and connect the UDP socket, then just
pass the socket fd and the parameters to the kernel so that incoming
packets are decrypted and 'received' on the tun device, and outgoing
packets on the tun device are encrypted and sent out on the UDP socket. 

The performance isn't too much of an issue for a VPN *client* in
practice, but we have a server implementation too which would probably
benefit quite well from such an offload facility.

If I were to look at such a thing, would it provoke screams of horror? 

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread Michael S. Tsirkin
On Sun, Feb 01, 2015 at 11:20:33AM +, David Woodhouse wrote:
> On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote:
> > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > >
> > > Herbert Acked your patch, so I guess its OK, but I think it can be
> > > dangerous.
> > 
> > The tun socket accounting was never designed to stop it from
> > flooding another tun interface.  It's there to stop it from
> > transmitting above a destination interface TX bandwidth and
> > cause unnecessary packet drops.  It also limits the total amount
> > of kernel memory that can be pinned down by a single tun interface.
> > 
> > In this case, all we're doing is shifting the accounting from the
> > "hardware" queue to the qdisc queue.
> > 
> > So your ability to flood a tun interface is essentially unchanged.
> 
> I've just been looking at VPN performance, using netperf to flood an
> openconnect/ocserv connection over GigE and profiling my VPN client.
> 
> If I run netperf over the *unencrypted* link, it only sends 1Gb/s of
> packets — because the packets are correctly accounted to netperf's UDP
> socket until the moment they're actually transmitted on the wire, and
> the backpressure works correctly.
> 
> When I run over the VPN, netperf thinks it sent 2½ times the amount of
> TX traffic.

At some level, it's expected: netperf's manual actually says:
A UDP_STREAM test has no end-to-end flow control - UDP provides none and
neither does netperf. However, if you wish, you can configure netperf
with --enable-intervals=yes to enable the global command-line -b and -w
options to pace bursts of traffic onto the network.

> Packets are being dropped by the tun device before even
> feeding them up to the VPN client to be sent — presumably because of
> this skb_orphan() call. (The client itself should do the right thing,
> and only suck packets out of the tun at the rate it can shove them out
> *its* UDP socket.)

A simple work-around is to limit the rate using a non work conservig qdisc.

> Did we ever look at the alternative solution of taking ownership only
> after a timeout, or on demand when we need to shut down the device?

I've been thinking about this on and off, but didn't find a good
safe solution yet.

For timeout, the difficulty is to find a good timer value,
low enough to avoid DOS attacks but high enough to avoid
spurious packet drops (and expensive timer interrupts).

-- 
MST



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote:
> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> >
> > Herbert Acked your patch, so I guess its OK, but I think it can be
> > dangerous.
> 
> The tun socket accounting was never designed to stop it from
> flooding another tun interface.  It's there to stop it from
> transmitting above a destination interface TX bandwidth and
> cause unnecessary packet drops.  It also limits the total amount
> of kernel memory that can be pinned down by a single tun interface.
> 
> In this case, all we're doing is shifting the accounting from the
> "hardware" queue to the qdisc queue.
> 
> So your ability to flood a tun interface is essentially unchanged.

I've just been looking at VPN performance, using netperf to flood an
openconnect/ocserv connection over GigE and profiling my VPN client.

If I run netperf over the *unencrypted* link, it only sends 1Gb/s of
packets — because the packets are correctly accounted to netperf's UDP
socket until the moment they're actually transmitted on the wire, and
the backpressure works correctly.

When I run over the VPN, netperf thinks it sent 2½ times the amount of
TX traffic. Packets are being dropped by the tun device before even
feeding them up to the VPN client to be sent — presumably because of
this skb_orphan() call. (The client itself should do the right thing,
and only suck packets out of the tun at the rate it can shove them out
*its* UDP socket.)

Did we ever look at the alternative solution of taking ownership only
after a timeout, or on demand when we need to shut down the device?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature