On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:

 dj> On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
 >>
 >> On Wed, 28 Sep 2011, Mikolaj Golub wrote:
 >>
 >>> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
 >>>
 >>> KM> Sorry, didn't look at the images (limited bw), I've seen something KM>
 >>> like this before in timewait. This "can't happen" with UDP so will be KM>
 >>> interested in learning more about the bug.
 >>>
 >>> The panic can be easily triggered by this:
 >>
 >> Hi:
 >>
 >> Just catching up on this thread.  I think the analysis here is generally
 >> right: in 9.0, you're much more likely to see an inpcb with its in_socket
 >> pointer cleared in the hash list than in prior releases, and
 >> in_pcbbind_setup() trips over this.
 >>
 >> However, at least on first glance (and from the perspective of invariants
 >> here), I think the bug is actualy that in_pcbbind_setup() is asking
 >> in_pcblookup_local() for an inpcb and then access the returned inpcb's
 >> in_socket pointer without acquiring a lock on the inpcb.  Structurally, it
 >> can't acquire this lock for lock order reasons -- it already holds the lock
 >> on its own inpcb.  Therefore, we should only access fields that are safe to
 >> follow in an inpcb when you hold a reference via the hash lock and not a
 >> lock on the inpcb itself, which appears generally OK (+/-) for all the
 >> fields in that clause but the t->inp_socket->so_options dereference.
 >>
 >> A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer field,
 >> such as inp_flags2, giving us access to its value without having to walk
 >> into the attached (or not) socket.
 >>
 >> This raises another structural question, which is whether we need a new
 >> inp_foo flags field that is protected explicitly by the hash lock, and not
 >> by the inpcb lock, which could hold fields relevant to address binding.  I
 >> don't think we need to solve that problem in this context, as a slightly
 >> race on SO_REUSEPORT is likely acceptable.
 >>
 >> The suggested fix does perform the desired function of explicitly detaching
 >> the inpcb from the hash list before the socket is disconnected from the
 >> inpcb. However, it's incomplete in that the invariant that's being broken is
 >> also relied on for other protocols (such as raw sockets).  The correct
 >> invariant is that inp_socket is safe to follow unconditionally if an inpcb
 >> is locked and INP_DROPPED isn't set -- the bug is in "locked" not in
 >> "INP_DROPPED", which is why I think this is the wrong fix, even though it
 >> prevents a panic :-).

 dj> Hello Robert,

 dj> Thank you for taking your valuable time to find out the problem.
 dj> Since I don't have idea about network internals, would you have a patch
 dj> about this? I'd be glad to test it, thanks again.

Here is the patch that implements what Robert suggests.

Dave, could you test it?

 >> Robert

 dj> Best regards,
 dj> Dave.

-- 
Mikolaj Golub

Index: sys/netinet/in_pcb.c
===================================================================
--- sys/netinet/in_pcb.c	(revision 225885)
+++ sys/netinet/in_pcb.c	(working copy)
@@ -575,8 +575,7 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				     ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
 				    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
 				     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
-				     (t->inp_socket->so_options &
-					 SO_REUSEPORT) == 0) &&
+				     (t->inp_flags2 & INP_REUSEPORT) == 0) &&
 				    (inp->inp_cred->cr_uid !=
 				     t->inp_cred->cr_uid))
 					return (EADDRINUSE);
@@ -595,14 +594,15 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				    (reuseport & tw->tw_so_options) == 0)
 					return (EADDRINUSE);
 			} else if (t &&
-			    (reuseport & t->inp_socket->so_options) == 0) {
+			    (reuseport == 0 ||
+			    (t->inp_flags2 & INP_REUSEPORT) == 0)) {
 #ifdef INET6
 				if (ntohl(sin->sin_addr.s_addr) !=
 				    INADDR_ANY ||
 				    ntohl(t->inp_laddr.s_addr) !=
 				    INADDR_ANY ||
-				    INP_SOCKAF(so) ==
-				    INP_SOCKAF(t->inp_socket))
+				    (inp->inp_vflag & INP_IPV6PROTO) == 0 ||
+				    (t->inp_vflag & INP_IPV6PROTO) == 0)
 #endif
 				return (EADDRINUSE);
 			}
@@ -1867,6 +1867,11 @@ in_pcbinshash_internal(struct inpcb *inp, int do_p
 	KASSERT((inp->inp_flags & INP_INHASHLIST) == 0,
 	    ("in_pcbinshash: INP_INHASHLIST"));
 
+	if ((inp->inp_socket->so_options & SO_REUSEPORT) != 0 ||
+	    (IN_MULTICAST(ntohl(inp->inp_laddr.s_addr)) &&
+	    (inp->inp_socket->so_options & SO_REUSEADDR) != 0))
+		inp->inp_flags2 |= INP_REUSEPORT;
+
 #ifdef INET6
 	if (inp->inp_vflag & INP_IPV6)
 		hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
Index: sys/netinet/in_pcb.h
===================================================================
--- sys/netinet/in_pcb.h	(revision 225885)
+++ sys/netinet/in_pcb.h	(working copy)
@@ -540,6 +540,7 @@ void 	inp_4tuple_get(struct inpcb *inp, uint32_t *
 #define	INP_LLE_VALID		0x00000001 /* cached lle is valid */	
 #define	INP_RT_VALID		0x00000002 /* cached rtentry is valid */
 #define	INP_PCBGROUPWILD	0x00000004 /* in pcbgroup wildcard list */
+#define	INP_REUSEPORT		0x00000008 /* socket SO_REUSEPORT option is set */
 
 /*
  * Flags passed to in_pcblookup*() functions.
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to