[PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
From: Aaron ConoleAF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag is set. This is referenced in kernel bugzilla #12323 @ https://bugzilla.kernel.org/show_bug.cgi?id=12323 As described both in the BZ and lkml thread @ http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an AF_UNIX socket only reads a single skb, where the desired effect is to return as much skb data has been queued, until hitting the recv buffer size (whichever comes first). The modified MSG_PEEK path will now move to the next skb in the tree and jump to the again: label, rather than following the natural loop structure. This requires duplicating some of the loop head actions. This was tested using the python socketpair python code attached to the bugzilla issue. Signed-off-by: Aaron Conole --- net/unix/af_unix.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 03ee4d3..988fbbd4 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2179,9 +2179,24 @@ unlock: if (UNIXCB(skb).fp) scm.fp = scm_fp_dup(UNIXCB(skb).fp); - sk_peek_offset_fwd(sk, chunk); + if (skip) { + sk_peek_offset_fwd(sk, chunk); + skip -= chunk; + } - break; + if (UNIXCB(skb).fp) + break; + + /* XXX - this is ugly; a better approach would be +* rewriting this function +*/ + last = skb; + last_len = skb->len; + unix_state_lock(); + skb = skb_peek_next(skb, >sk_receive_queue); + if (skb) + goto again; + goto unlock; } } while (size); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
Eric Dumazetwrites: > On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote: >> From: Aaron Conole >> > > I am wondering what this is expected to do, and how this code would > possibly not trigger a crash. Are you suspecting it should crash from a possible double-lock case? On line 2125, there is an unconditional unlock, which should be guaranteeing that there is no longer a condition to 'double lock' the socket. With my patch, I re-do a lock just before entering skb_peek_next, and then loop to again: label (line 2078); I admit that there is a check at the top of the loop which I do not include (the check for SOCK_DEAD). Do you think this check is needed (and the cause for your concern on the suspected crash)? I will re-do the testing as you outline later, and report the results. > Are you 100% sure you tested this patch and code path ? Yes, 100%; I used the python code attached to the bug before hacking on this function whatsoever to ensure that the bug still exists in current kernel (it does). Then after my patch, I reran the same test. There were no oops, bugs, panics, or other errors reported. > Before resending v3, please make sure to compile and test with > CONFIG_LOCKDEP=y. Add a temporary (in your tree, not final patch) > > pr_err_once("went there at least one time\n"); > > (to make sure this code path was tested) I will do this testing as requested; my current config does include LOCKDEP_SUPPORT=y. > It might be time to get rid of unix_sk macro for a proper function to > avoid these kind of errors. > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h > index 4a167b30a12f..cb1b9bbda332 100644 > --- a/include/net/af_unix.h > +++ b/include/net/af_unix.h > @@ -63,7 +63,11 @@ struct unix_sock { > #define UNIX_GC_MAYBE_CYCLE 1 > struct socket_wqpeer_wq; > }; > -#define unix_sk(__sk) ((struct unix_sock *)__sk) > + > +static inline struct unix_sock *unix_sk(struct sock *sk) > +{ > + return (struct unix_sock *)sk; > +} > > #define peer_wait peer_wq.wait If you'd like, I'll add this to a V3 version of this patch, re-do testing with your requested config above, and report the results. > Thanks. Thank you for the feedback, it is very good. -Aaron -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
Resending, I accidentally dropped the list. > Eric Dumazetwrites: > >> On Sun, 2015-09-20 at 15:07 -0400, Aaron Conole wrote: >>> Eric Dumazet writes: >>> >>> > On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote: >>> >> From: Aaron Conole >>> >> >>> > >>> > I am wondering what this is expected to do, and how this code would >>> > possibly not trigger a crash. >>> Are you suspecting it should crash from a possible double-lock case? >>> On line 2125, there is an unconditional unlock, which should be >>> guaranteeing that there is no longer a condition to 'double lock' the >>> socket. >> >> Not at all. >> >> I am suggesting there is a big difference between >> >> unix_state_lock(); >> >> and >> >> unix_state_lock(sk); >> >> Can you see it ? Wow! That's an excellent catch, thank you! I did test the originally submitted patch, and got no oops, bug, panic, etc (I usually have panic_on_oops set to 1 when first testing new code). I guess I got very lucky, somehow. I'll change this, and make sure to retest. I will also try to enhance the python case attached to the bug to include a filepointer as well, and will repost a v3 when I have done this. Thanks, -Aaron -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote: > From: Aaron Conole> > AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag > is set. > > This is referenced in kernel bugzilla #12323 @ > https://bugzilla.kernel.org/show_bug.cgi?id=12323 > > As described both in the BZ and lkml thread @ > http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an > AF_UNIX socket only reads a single skb, where the desired effect is > to return as much skb data has been queued, until hitting the recv > buffer size (whichever comes first). > > The modified MSG_PEEK path will now move to the next skb in the tree > and jump to the again: label, rather than following the natural loop > structure. This requires duplicating some of the loop head actions. > > This was tested using the python socketpair python code attached to > the bugzilla issue. > > Signed-off-by: Aaron Conole > --- > net/unix/af_unix.c | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 03ee4d3..988fbbd4 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -2179,9 +2179,24 @@ unlock: > if (UNIXCB(skb).fp) > scm.fp = scm_fp_dup(UNIXCB(skb).fp); > > - sk_peek_offset_fwd(sk, chunk); > + if (skip) { > + sk_peek_offset_fwd(sk, chunk); > + skip -= chunk; > + } > > - break; > + if (UNIXCB(skb).fp) > + break; > + > + /* XXX - this is ugly; a better approach would be > + * rewriting this function > + */ > + last = skb; > + last_len = skb->len; > + unix_state_lock(); I am wondering what this is expected to do, and how this code would possibly not trigger a crash. Are you 100% sure you tested this patch and code path ? Before resending v3, please make sure to compile and test with CONFIG_LOCKDEP=y. Add a temporary (in your tree, not final patch) pr_err_once("went there at least one time\n"); (to make sure this code path was tested) It might be time to get rid of unix_sk macro for a proper function to avoid these kind of errors. diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 4a167b30a12f..cb1b9bbda332 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -63,7 +63,11 @@ struct unix_sock { #define UNIX_GC_MAYBE_CYCLE1 struct socket_wqpeer_wq; }; -#define unix_sk(__sk) ((struct unix_sock *)__sk) + +static inline struct unix_sock *unix_sk(struct sock *sk) +{ + return (struct unix_sock *)sk; +} #define peer_wait peer_wq.wait Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [net] af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
On Sun, 2015-09-20 at 15:07 -0400, Aaron Conole wrote: > Eric Dumazetwrites: > > > On Sun, 2015-09-20 at 05:18 -0400, Aaron Conole wrote: > >> From: Aaron Conole > >> > > > > I am wondering what this is expected to do, and how this code would > > possibly not trigger a crash. > Are you suspecting it should crash from a possible double-lock case? > On line 2125, there is an unconditional unlock, which should be > guaranteeing that there is no longer a condition to 'double lock' the > socket. Not at all. I am suggesting there is a big difference between unix_state_lock(); and unix_state_lock(sk); Can you see it ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html