Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-09 Thread David Miller
From: David Howells 
Date: Thu, 09 Mar 2017 07:51:34 +

> David Miller  wrote:
> 
>> I guess this is fine, but I think you can use one of the two "sk_padding"
>> bits in struct sock instead of making the structure larger.
> 
> It shouldn't make the structure larger since there's a hole in the structure:
 ...
>   u16 sk_gso_max_segs;
>   ---> 2-byte hole here
>   unsigned long   sk_lingertime;
> 
> but I'll quite happily shift it to sk_padding.

Ok, then that's fine.


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-09 Thread David Miller
From: David Howells 
Date: Thu, 09 Mar 2017 07:51:34 +

> David Miller  wrote:
> 
>> I guess this is fine, but I think you can use one of the two "sk_padding"
>> bits in struct sock instead of making the structure larger.
> 
> It shouldn't make the structure larger since there's a hole in the structure:
 ...
>   u16 sk_gso_max_segs;
>   ---> 2-byte hole here
>   unsigned long   sk_lingertime;
> 
> but I'll quite happily shift it to sk_padding.

Ok, then that's fine.


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Howells
David Miller  wrote:

> I guess this is fine, but I think you can use one of the two "sk_padding"
> bits in struct sock instead of making the structure larger.

It shouldn't make the structure larger since there's a hole in the structure:

unsigned intsk_padding : 2,
sk_no_check_tx : 1,
sk_no_check_rx : 1,
sk_userlocks : 4,
sk_protocol  : 8,
sk_type  : 16;
#define SK_PROTOCOL_MAX U8_MAX
kmemcheck_bitfield_end(flags);

u16 sk_gso_max_segs;
  ---> 2-byte hole here
unsigned long   sk_lingertime;

but I'll quite happily shift it to sk_padding.

Thanks,
David


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Howells
David Miller  wrote:

> I guess this is fine, but I think you can use one of the two "sk_padding"
> bits in struct sock instead of making the structure larger.

It shouldn't make the structure larger since there's a hole in the structure:

unsigned intsk_padding : 2,
sk_no_check_tx : 1,
sk_no_check_rx : 1,
sk_userlocks : 4,
sk_protocol  : 8,
sk_type  : 16;
#define SK_PROTOCOL_MAX U8_MAX
kmemcheck_bitfield_end(flags);

u16 sk_gso_max_segs;
  ---> 2-byte hole here
unsigned long   sk_lingertime;

but I'll quite happily shift it to sk_padding.

Thanks,
David


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Miller
From: David Howells 
Date: Mon, 06 Mar 2017 15:04:44 +

> Fix the general case by:
> 
>  (1) Double up all the locking keys used in sockets so that one set are
>  used if the socket is created by userspace and the other set is used
>  if the socket is created by the kernel.
> 
>  (2) Store the kern parameter passed to sk_alloc() in a variable in the
>  sock struct (sk_kern_sock).  This informs sock_lock_init(),
>  sock_init_data() and sk_clone_lock() as to the lock keys to be used.
> 
>  Note that the child created by sk_clone_lock() inherits the parent's
>  kern setting.
> 
>  (3) Add a 'kern' parameter to ->accept() that is analogous to the one
>  passed in to ->create() that distinguishes whether kernel_accept() or
>  sys_accept4() was the caller and can be passed to sk_alloc().
> 
>  Note that a lot of accept functions merely dequeue an already
>  allocated socket.  I haven't touched these as the new socket already
>  exists before we get the parameter.
> 
>  Note also that there are a couple of places where I've made the accepted
>  socket unconditionally kernel-based:
> 
>   irda_accept()
>   rds_rcp_accept_one()
>   tcp_accept_from_sock()
> 
>  because they follow a sock_create_kern() and accept off of that.

I guess this is fine, but I think you can use one of the two "sk_padding"
bits in struct sock instead of making the structure larger.


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Miller
From: David Howells 
Date: Mon, 06 Mar 2017 15:04:44 +

> Fix the general case by:
> 
>  (1) Double up all the locking keys used in sockets so that one set are
>  used if the socket is created by userspace and the other set is used
>  if the socket is created by the kernel.
> 
>  (2) Store the kern parameter passed to sk_alloc() in a variable in the
>  sock struct (sk_kern_sock).  This informs sock_lock_init(),
>  sock_init_data() and sk_clone_lock() as to the lock keys to be used.
> 
>  Note that the child created by sk_clone_lock() inherits the parent's
>  kern setting.
> 
>  (3) Add a 'kern' parameter to ->accept() that is analogous to the one
>  passed in to ->create() that distinguishes whether kernel_accept() or
>  sys_accept4() was the caller and can be passed to sk_alloc().
> 
>  Note that a lot of accept functions merely dequeue an already
>  allocated socket.  I haven't touched these as the new socket already
>  exists before we get the parameter.
> 
>  Note also that there are a couple of places where I've made the accepted
>  socket unconditionally kernel-based:
> 
>   irda_accept()
>   rds_rcp_accept_one()
>   tcp_accept_from_sock()
> 
>  because they follow a sock_create_kern() and accept off of that.

I guess this is fine, but I think you can use one of the two "sk_padding"
bits in struct sock instead of making the structure larger.


[RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-06 Thread David Howells
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
 calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
 creating a call requires the socket lock:

mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
 binds the underlying UDP socket whilst holding its socket lock.
 inet_bind() takes its own socket lock:

sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
 and thus cause the kernel to take the mmap_sem, but the TCP socket is
 locked whilst doing this:

sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
 used if the socket is created by userspace and the other set is used
 if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
 sock struct (sk_kern_sock).  This informs sock_lock_init(),
 sock_init_data() and sk_clone_lock() as to the lock keys to be used.

 Note that the child created by sk_clone_lock() inherits the parent's
 kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
 passed in to ->create() that distinguishes whether kernel_accept() or
 sys_accept4() was the caller and can be passed to sk_alloc().

 Note that a lot of accept functions merely dequeue an already
 allocated socket.  I haven't touched these as the new socket already
 exists before we get the parameter.

 Note also that there are a couple of places where I've made the accepted
 socket unconditionally kernel-based:

irda_accept()
rds_rcp_accept_one()
tcp_accept_from_sock()

 because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells 
---

 crypto/af_alg.c   |9 +-
 crypto/algif_hash.c   |9 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c |4 -
 fs/dlm/lowcomms.c |2 
 fs/ocfs2/cluster/tcp.c|2 
 include/crypto/if_alg.h   |2 
 include/linux/net.h   |2 
 include/net/inet_common.h |3 -
 include/net/inet_connection_sock.h|2 
 include/net/sctp/structs.h|3 -
 include/net/sock.h|7 +-
 net/atm/svc.c |5 +
 net/ax25/af_ax25.c|3 -
 net/bluetooth/l2cap_sock.c|2 
 net/bluetooth/rfcomm/sock.c   |3 -
 net/bluetooth/sco.c   |2 
 net/core/sock.c   |  106 +
 net/decnet/af_decnet.c|5 +
 net/ipv4/af_inet.c|5 +
 net/ipv4/inet_connection_sock.c   |2 
 net/irda/af_irda.c|5 +
 net/iucv/af_iucv.c|2 
 net/llc/af_llc.c  |4 +
 net/netrom/af_netrom.c|3 -
 net/nfc/llcp_sock.c   |2 
 net/phonet/pep.c  |6 +
 net/phonet/socket.c   |4 -
 net/rds/tcp_listen.c  |2 
 net/rose/af_rose.c|3 -
 net/sctp/ipv6.c   |5 +
 net/sctp/protocol.c   |5 +
 net/sctp/socket.c |4 -
 net/smc/af_smc.c  |2 
 net/socket.c  |4 -
 net/tipc/socket.c |8 +-
 net/unix/af_unix.c|5 +
 net/vmw_vsock/af_vsock.c  |3 -
 net/x25/af_x25.c  |3 -
 38 files changed, 141 insertions(+), 107 deletions(-)

diff --git 

[RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-06 Thread David Howells
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
 calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
 creating a call requires the socket lock:

mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
 binds the underlying UDP socket whilst holding its socket lock.
 inet_bind() takes its own socket lock:

sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
 and thus cause the kernel to take the mmap_sem, but the TCP socket is
 locked whilst doing this:

sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
 used if the socket is created by userspace and the other set is used
 if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
 sock struct (sk_kern_sock).  This informs sock_lock_init(),
 sock_init_data() and sk_clone_lock() as to the lock keys to be used.

 Note that the child created by sk_clone_lock() inherits the parent's
 kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
 passed in to ->create() that distinguishes whether kernel_accept() or
 sys_accept4() was the caller and can be passed to sk_alloc().

 Note that a lot of accept functions merely dequeue an already
 allocated socket.  I haven't touched these as the new socket already
 exists before we get the parameter.

 Note also that there are a couple of places where I've made the accepted
 socket unconditionally kernel-based:

irda_accept()
rds_rcp_accept_one()
tcp_accept_from_sock()

 because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells 
---

 crypto/af_alg.c   |9 +-
 crypto/algif_hash.c   |9 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c |4 -
 fs/dlm/lowcomms.c |2 
 fs/ocfs2/cluster/tcp.c|2 
 include/crypto/if_alg.h   |2 
 include/linux/net.h   |2 
 include/net/inet_common.h |3 -
 include/net/inet_connection_sock.h|2 
 include/net/sctp/structs.h|3 -
 include/net/sock.h|7 +-
 net/atm/svc.c |5 +
 net/ax25/af_ax25.c|3 -
 net/bluetooth/l2cap_sock.c|2 
 net/bluetooth/rfcomm/sock.c   |3 -
 net/bluetooth/sco.c   |2 
 net/core/sock.c   |  106 +
 net/decnet/af_decnet.c|5 +
 net/ipv4/af_inet.c|5 +
 net/ipv4/inet_connection_sock.c   |2 
 net/irda/af_irda.c|5 +
 net/iucv/af_iucv.c|2 
 net/llc/af_llc.c  |4 +
 net/netrom/af_netrom.c|3 -
 net/nfc/llcp_sock.c   |2 
 net/phonet/pep.c  |6 +
 net/phonet/socket.c   |4 -
 net/rds/tcp_listen.c  |2 
 net/rose/af_rose.c|3 -
 net/sctp/ipv6.c   |5 +
 net/sctp/protocol.c   |5 +
 net/sctp/socket.c |4 -
 net/smc/af_smc.c  |2 
 net/socket.c  |4 -
 net/tipc/socket.c |8 +-
 net/unix/af_unix.c|5 +
 net/vmw_vsock/af_vsock.c  |3 -
 net/x25/af_x25.c  |3 -
 38 files changed, 141 insertions(+), 107 deletions(-)

diff --git a/crypto/af_alg.c