[PATCH net-next 0/5] net: introduce noref sk

2017-09-20 Thread Paolo Abeni
This series introduces the infrastructure to store a socket pointer inside
the skb without holding a reference count on the socket.

This infrastructure is then used in the network receive path, specifically
in the early demux operation.

This allows the UDP early demux to perform a full lookup for UDP sockets,
with many benefits:

- the UDP early demux code is now much simpler
- the early demux does not hit any performance penalty in case of UDP hash
  table collisions - previously the early demux performed a partial,
  unsuccessful lookup
- early demux is now operational for unconnected sockets, too.

This infrastructure will be used in a follow-up series to allow dst caching for
unconnected UDP sockets, and then to extend the same features to TCP listening
sockets.

Paolo Abeni (5):
  net: add support for noref skb->sk
  net: allow early demux to fetch noref socket
  udp: do not touch socket refcount in early demux
  net: add simple socket-like dst cache helpers
  udp: perform full socket lookup in early demux

 include/linux/skbuff.h           | 30 +++
 include/linux/udp.h              |  2 +
 include/net/dst.h                | 12 ++
 net/core/dst.c                   | 16
 net/core/sock.c                  |  6 +++
 net/ipv4/ip_input.c              | 12 ++
 net/ipv4/ipmr.c                  | 18 +++--
 net/ipv4/netfilter/nf_dup_ipv4.c |  3 ++
 net/ipv4/udp.c                   | 80
 net/ipv6/ip6_input.c             |  7 +++-
 net/ipv6/netfilter/nf_dup_ipv6.c |  3 ++
 net/ipv6/udp.c                   | 67 ++---
 net/netfilter/nf_queue.c         |  3 ++
 13 files changed, 159 insertions(+), 100 deletions(-)

-- 
2.13.5
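
For readers following the thread, the noref idea in the cover letter can be modeled in plain userspace C. This is only a sketch, not the code from the series: it assumes the same low-bit tagging convention the kernel already uses for noref dst entries (SKB_DST_NOREF), and every name below (sk_ref, SK_NOREF, skb_set_sk_noref, and so on) is hypothetical. In the kernel, a noref pointer would only be valid while something else, such as an RCU read-side section, keeps the socket alive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct sock and struct sk_buff. */
struct sock {
    int refcnt;             /* plain counter; the kernel uses an atomic refcount */
};

struct sk_buff {
    uintptr_t sk_ref;       /* socket pointer; low bit set means "no reference held" */
};

/* struct sock is at least int-aligned, so bit 0 of its address is always free. */
#define SK_NOREF   ((uintptr_t)1)
#define SK_PTRMASK (~SK_NOREF)

/* Attach a socket and take a reference on it (the traditional behaviour). */
void skb_set_sk_ref(struct sk_buff *skb, struct sock *sk)
{
    sk->refcnt++;
    skb->sk_ref = (uintptr_t)sk;
}

/* Attach a socket without touching its refcount; the caller must guarantee
 * the socket outlives the skb's use of it. */
void skb_set_sk_noref(struct sk_buff *skb, struct sock *sk)
{
    skb->sk_ref = (uintptr_t)sk | SK_NOREF;
}

/* Return the socket pointer with the tag bit masked off. */
struct sock *skb_sk(const struct sk_buff *skb)
{
    return (struct sock *)(skb->sk_ref & SK_PTRMASK);
}

/* Report whether the stored pointer is a noref one. */
int skb_sk_is_noref(const struct sk_buff *skb)
{
    return (skb->sk_ref & SK_NOREF) != 0;
}

/* Detach the socket, dropping the reference only if one was taken. */
void skb_release_sk(struct sk_buff *skb)
{
    struct sock *sk = skb_sk(skb);

    if (sk && !skb_sk_is_noref(skb))
        sk->refcnt--;
    skb->sk_ref = 0;
}
```

The point of the tag bit is that the release path can tell the two cases apart without extra storage in the skb, so the receive fast path can skip a refcount increment and decrement on every packet.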



Re: [PATCH net-next 0/5] net: introduce noref sk

2017-09-20 Thread David Miller
From: Paolo Abeni 
Date: Wed, 20 Sep 2017 18:54:00 +0200

> This series introduces the infrastructure to store a socket pointer inside
> the skb without holding a reference count on the socket.
> 
> This infrastructure is then used in the network receive path, specifically
> in the early demux operation.
> 
> This allows the UDP early demux to perform a full lookup for UDP sockets,
> with many benefits:
> 
> - the UDP early demux code is now much simpler
> - the early demux does not hit any performance penalty in case of UDP hash
>   table collisions - previously the early demux performed a partial,
>   unsuccessful lookup
> - early demux is now operational for unconnected sockets, too.
> 
> This infrastructure will be used in a follow-up series to allow dst caching for
> unconnected UDP sockets, and then to extend the same features to TCP listening
> sockets.

Like Eric, I find this series (while exciting) quite scary :-)

You really have to post some kind of performance numbers in your
header posting in order to justify something with these ramifications
and scale.

Thank you.


Re: [PATCH net-next 0/5] net: introduce noref sk

2017-09-21 Thread Paolo Abeni
Hi,

Thanks for the feedback!

On Wed, 2017-09-20 at 20:20 -0700, David Miller wrote:
> From: Paolo Abeni 
> Date: Wed, 20 Sep 2017 18:54:00 +0200
> 
> > This series introduces the infrastructure to store a socket pointer inside
> > the skb without holding a reference count on the socket.
> > 
> > This infrastructure is then used in the network receive path, specifically
> > in the early demux operation.
> > 
> > This allows the UDP early demux to perform a full lookup for UDP sockets,
> > with many benefits:
> > 
> > - the UDP early demux code is now much simpler
> > - the early demux does not hit any performance penalty in case of UDP hash
> >   table collisions - previously the early demux performed a partial,
> >   unsuccessful lookup
> > - early demux is now operational for unconnected sockets, too.
> > 
> > This infrastructure will be used in a follow-up series to allow dst caching for
> > unconnected UDP sockets, and then to extend the same features to TCP listening
> > sockets.
> 
> Like Eric, I find this series (while exciting) quite scary :-)
> 
> You really have to post some kind of performance numbers in your
> header posting in order to justify something with these ramifications
> and scale.

This is actually preparatory work for the next series, which will
bring in the real gain. The next patches still need polishing, so we
posted this separately to get some early feedback.

If it would help, I can post the follow-up soon as an RFC. Overall,
with the follow-up applied too, when using a single rx ingress
queue I measured a ~20% throughput gain for unconnected IPv4 sockets (with
rp_filter disabled) and ~30% for IPv6 sockets. With multiple
ingress queues, the gain is smaller but still measurable (roughly 5%).

Please let me know if you prefer to see the full work early.

Thanks,

Paolo


Re: [PATCH net-next 0/5] net: introduce noref sk

2017-09-21 Thread Eric Dumazet
On Thu, 2017-09-21 at 11:42 +0200, Paolo Abeni wrote:
> Hi,
> 
> Thanks for the feedback!
> 
> On Wed, 2017-09-20 at 20:20 -0700, David Miller wrote:
> > From: Paolo Abeni 
> > Date: Wed, 20 Sep 2017 18:54:00 +0200
> > 
> > > This series introduces the infrastructure to store a socket pointer inside
> > > the skb without holding a reference count on the socket.
> > > 
> > > This infrastructure is then used in the network receive path, specifically
> > > in the early demux operation.
> > > 
> > > This allows the UDP early demux to perform a full lookup for UDP sockets,
> > > with many benefits:
> > > 
> > > - the UDP early demux code is now much simpler
> > > - the early demux does not hit any performance penalty in case of UDP hash
> > >   table collisions - previously the early demux performed a partial,
> > >   unsuccessful lookup
> > > - early demux is now operational for unconnected sockets, too.
> > > 
> > > This infrastructure will be used in a follow-up series to allow dst caching for
> > > unconnected UDP sockets, and then to extend the same features to TCP listening
> > > sockets.
> > 
> > Like Eric, I find this series (while exciting) quite scary :-)
> > 
> > You really have to post some kind of performance numbers in your
> > header posting in order to justify something with these ramifications
> > and scale.
> 
> This is actually preparatory work for the next series, which will
> bring in the real gain. The next patches still need polishing, so we
> posted this separately to get some early feedback.
> 
> If it would help, I can post the follow-up soon as an RFC. Overall,
> with the follow-up applied too, when using a single rx ingress
> queue I measured a ~20% throughput gain for unconnected IPv4 sockets (with
> rp_filter disabled) and ~30% for IPv6 sockets. With multiple
> ingress queues, the gain is smaller but still measurable (roughly 5%).
> 
> Please let me know if you prefer to see the full work early.

I want to see the full work, yes. IPv6 and everything.

I do not want ~1000 lines of changed code in the stack for some corner
cases where people do not properly use the existing infra, such as
SO_REUSEPORT with a proper BPF filter to get as many clean silos as
possible (with proper CPU/NUMA affinities to avoid QPI traffic).

The complexity of your patches reached a point where I am extremely
nervous.

Thanks.