[PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-20 Thread Aaron Conole
From: Aaron Conole 

AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
is set.

This is referenced in kernel bugzilla #12323 @
https://bugzilla.kernel.org/show_bug.cgi?id=12323

As described both in the BZ and lkml thread @
http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
AF_UNIX socket only reads a single skb, where the desired effect is
to return as much skb data has been queued, until hitting the recv
buffer size (whichever comes first).

The modified MSG_PEEK path will now move to the next skb in the tree
and jump to the again: label, rather than following the natural loop
structure. This requires duplicating some of the loop head actions.

This was tested using the python socketpair python code attached to
the bugzilla issue.

Signed-off-by: Aaron Conole 
---
 net/unix/af_unix.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 03ee4d3..988fbbd4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2179,9 +2179,24 @@ unlock:
if (UNIXCB(skb).fp)
scm.fp = scm_fp_dup(UNIXCB(skb).fp);
 
-   sk_peek_offset_fwd(sk, chunk);
+   if (skip) {
+   sk_peek_offset_fwd(sk, chunk);
+   skip -= chunk;
+   }
 
-   break;
+   if (UNIXCB(skb).fp)
+   break;
+
+   /* XXX - this is ugly; a better approach would be
+* rewriting this function
+*/
+   last = skb;
+   last_len = skb->len;
+   unix_state_lock();
+   skb = skb_peek_next(skb, >sk_receive_queue);
+   if (skb)
+   goto again;
+   goto unlock;
}
} while (size);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-20 Thread Aaron Conole
Eric Dumazet  writes:

> On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote:
>> From: Aaron Conole 
>> 
>
> I am wondering what this is expected to do, and how this code would
> possibly not trigger a crash.
Are you suspecting it should crash from a possible double-lock case?
On line 2125, there is an unconditional unlock, which should be 
guaranteeing that there is no longer a condition to 'double lock' the
socket.

With my patch, I re-do a lock just before entering skb_peek_next, and
then loop to again: label (line 2078); I admit that there is a check
at the top of the loop which I do not include (the check for SOCK_DEAD).
Do you think this check is needed (and the cause for your concern on
the suspected crash)?

I will re-do the testing as you outline later, and report the results.

> Are you 100% sure you tested this patch and code path ?
Yes, 100%; I used the python code attached to the bug before hacking on
this function whatsoever to ensure that the bug still exists in current
kernel (it does). Then after my patch, I reran the same test. There 
were no oops, bugs, panics, or other errors reported.

> Before resending v3, please make sure to compile and test with
> CONFIG_LOCKDEP=y. Add a temporary (in your tree, not final patch)
>
> pr_err_once("went there at least one time\n");
>
> (to make sure this code path was tested)
I will do this testing as requested; my current config does include
LOCKDEP_SUPPORT=y.

> It might be time to get rid of unix_sk macro for a proper function to
> avoid these kind of errors.
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b30a12f..cb1b9bbda332 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -63,7 +63,11 @@ struct unix_sock {
>  #define UNIX_GC_MAYBE_CYCLE  1
>   struct socket_wqpeer_wq;
>  };
> -#define unix_sk(__sk) ((struct unix_sock *)__sk)
> +
> +static inline struct unix_sock *unix_sk(struct sock *sk)
> +{
> + return (struct unix_sock *)sk;
> +}
>  
>  #define peer_wait peer_wq.wait
If you'd like, I'll add this to a V3 version of this patch, re-do
testing with your requested config above, and report the results.

> Thanks.
Thank you for the feedback, it is very good.

-Aaron
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-20 Thread Aaron Conole

Resending, I accidentally dropped the list.

> Eric Dumazet  writes:
>
>> On Sun, 2015-09-20 at 15:07 -0400, Aaron Conole wrote:
>>> Eric Dumazet  writes:
>>> 
>>> > On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote:
>>> >> From: Aaron Conole 
>>> >> 
>>> >
>>> > I am wondering what this is expected to do, and how this code would
>>> > possibly not trigger a crash.
>>> Are you suspecting it should crash from a possible double-lock case?
>>> On line 2125, there is an unconditional unlock, which should be 
>>> guaranteeing that there is no longer a condition to 'double lock' the
>>> socket.
>>
>> Not at all.
>>
>> I am suggesting there is a big difference between
>>
>> unix_state_lock();
>>
>> and
>>
>> unix_state_lock(sk);
>>
>> Can you see it ?

Wow!

That's an excellent catch, thank you! I did test the originally
submitted patch, and got no oops, bug, panic, etc (I usually have
panic_on_oops set to 1 when first testing new code).

I guess I got very lucky, somehow. I'll change this, and make sure
to retest.

I will also try to enhance the python case attached to the bug to
include a filepointer as well, and will repost a v3 when I have done
this.

Thanks,
-Aaron

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-20 Thread Eric Dumazet
On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote:
> From: Aaron Conole 
> 
> AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
> is set.
> 
> This is referenced in kernel bugzilla #12323 @
> https://bugzilla.kernel.org/show_bug.cgi?id=12323
> 
> As described both in the BZ and lkml thread @
> http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
> AF_UNIX socket only reads a single skb, where the desired effect is
> to return as much skb data has been queued, until hitting the recv
> buffer size (whichever comes first).
> 
> The modified MSG_PEEK path will now move to the next skb in the tree
> and jump to the again: label, rather than following the natural loop
> structure. This requires duplicating some of the loop head actions.
> 
> This was tested using the python socketpair python code attached to
> the bugzilla issue.
> 
> Signed-off-by: Aaron Conole 
> ---
>  net/unix/af_unix.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..988fbbd4 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2179,9 +2179,24 @@ unlock:
>   if (UNIXCB(skb).fp)
>   scm.fp = scm_fp_dup(UNIXCB(skb).fp);
>  
> - sk_peek_offset_fwd(sk, chunk);
> + if (skip) {
> + sk_peek_offset_fwd(sk, chunk);
> + skip -= chunk;
> + }
>  
> - break;
> + if (UNIXCB(skb).fp)
> + break;
> +
> + /* XXX - this is ugly; a better approach would be
> +  * rewriting this function
> +  */
> + last = skb;
> + last_len = skb->len;
> + unix_state_lock();

I am wondering what this is expected to do, and how this code would
possibly not trigger a crash.

Are you 100% sure you tested this patch and code path ?

Before resending v3, please make sure to compile and test with
CONFIG_LOCKDEP=y. Add a temporary (in your tree, not final patch)

pr_err_once("went there at least one time\n");

(to make sure this code path was tested)

It might be time to get rid of unix_sk macro for a proper function to
avoid these kind of errors.

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 4a167b30a12f..cb1b9bbda332 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -63,7 +63,11 @@ struct unix_sock {
 #define UNIX_GC_MAYBE_CYCLE1
struct socket_wqpeer_wq;
 };
-#define unix_sk(__sk) ((struct unix_sock *)__sk)
+
+static inline struct unix_sock *unix_sk(struct sock *sk)
+{
+   return (struct unix_sock *)sk;
+}
 
 #define peer_wait peer_wq.wait
 

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-09-20 Thread Eric Dumazet
On Sun, 2015-09-20 at 15:07 -0400, Aaron Conole wrote:
> Eric Dumazet  writes:
> 
> > On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote:
> >> From: Aaron Conole 
> >> 
> >
> > I am wondering what this is expected to do, and how this code would
> > possibly not trigger a crash.
> Are you suspecting it should crash from a possible double-lock case?
> On line 2125, there is an unconditional unlock, which should be 
> guaranteeing that there is no longer a condition to 'double lock' the
> socket.

Not at all.

I am suggesting there is a big difference between

unix_state_lock();

and

unix_state_lock(sk);

Can you see it ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html