Christopher Forgeron wrote:
> On Sat, Mar 22, 2014 at 6:41 PM, Rick Macklem <rmack...@uoguelph.ca>
> wrote:
> 
> Christopher Forgeron wrote:
> > #if defined(INET) || defined(INET6)
> > /* Initialize to max value. */
> > if (ifp->if_hw_tsomax == 0)
> > ifp->if_hw_tsomax = IP_MAXPACKET;
> > KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET &&
> > ifp->if_hw_tsomax >= IP_MAXPACKET / 8,
> > ("%s: tsomax outside of range", __func__));
> > #endif
> > 
> > 
> > Should this be the location where it's being set rather than in
> > ixgbe? I would assume that other drivers could fall prey to this
> > issue.
> > 
> All of this should be prepended with "I'm an NFS guy, not a
> networking
> guy, so I might be wrong".
> 
> Other drivers (and ixgbe for the 82598 chip) can handle a packet that
> is in more than 32 mbufs. (I think the 82598 handles 100, grep for
> SCATTER
> in *.h in sys/dev/ixgbe.)
> 
> 
> [...]
> 
> 
> Yes, I agree we have to be careful about the limitations of other
> drivers, but I'm thinking setting tso to IP_MAXPACKET is a bad idea,
> unless all of the header subtractions are happening elsewhere. Then
> again, perhaps every other driver (and possibly ixgbe.. i need to
> look more) does a maxtso - various_headers to set a limit for data
> packets.
> 
> 
> I'm not familiar with the FreeBSD network conventions/styles - I'm
> just asking questions, something I have a bad habit of doing, but I'm
> in charge of code stability issues at my work so it's hard to stop.
> 
Well, IP_MAXPACKET is simply the largest number that fits in the 16-bit
length field of an IP header (65535). This limit applies to the TSO segment
(which is really just a TCP/IP packet larger than the MTU) and does not
include a MAC level (ethernet) header.

Beyond that, it is the specific hardware that limits things; in this
case the limit is 32 mbufs (which happens to imply 64K total, including
the ethernet header, when using 2K mbuf clusters).
(The 64K limit is just a quirk caused by the 32-mbuf limit and the fact
 that mbuf clusters hold 2K of data each.)

> 
> 
> Now, since several drivers do have this 32 mbufs limit, I can see an
> argument
> for making the default a little smaller to make these work, since the
> driver can override the default. (About now someone usually jumps in
> and says
> something along the lines of "You can't do that until all the drivers
> that
> can handle IP_MAXPACKET are fixed to set if_hw_tsomax" and since I
> can't fix
> drivers I can't test, that pretty much puts a stop on it.)
> 
> Testing is a problem isn't it? I once again offer my stack of network
> cards and systems for some sort of testing.. I still have coax and
> token ring around. :-)
> 
> You see the problem isn't that IP_MAXPACKET is too big, but that the
> hardware
> has a limit of 32 non-contiguous chunks (mbufs)/packet and 32 *
> MCLBYTES = 64K.
> (Hardware/network drivers that can handle 35 or more chunks (they
> like to call
> them transmit segments, although ixgbe uses the term scatter)
> shouldn't have
> any problems.)
> 
> I have an untested patch that adds a tsomaxseg count to use along
> with tsomax
> bytes so that a driver could inform tcp_output() it can only handle
> 32 mbufs
> and then tcp_output() would limit a TSO segment using both, but I
> can't test
> it, so who knows when/if that might happen.
> 
> I think you give that to me in the next email - if not, please send.
> 
> I also have a patch that modifies NFS to use pagesize clusters
> (reducing the
> mbuf count in the list), but that one causes grief when testing on an
> i386
> (seems to run out of kernel memory to the point where it can't
> allocate something
> called "boundary tags" and pretty well wedges the machine at that
> point.)
> Since I don't know how to fix this (I thought of making the patch
> "amd64 only"),
> I can't really commit this to head, either.
> 
> Send me that one too. I love NFS patches.
> 
> As such, I think it's going to be "fix the drivers one at a time" and
> tell
> folks to "disable TSO or limit rsize,wsize to 32K" when they run into
> trouble.
> (As you might have guessed, I'd rather just be "the NFS guy", but
> since NFS
> "triggers the problem" I'm kinda stuck with it ;-)
> 
> I know in some circumstances disabling TSO can be a benefit, but in
> general you'd want it on a modern system with heavy data load.
> 
> > Also should we not also subtract ETHER_VLAN_ENCAP_LEN from tsomax
> > to
> > make sure VLANs fit?
> > 
> No idea. (I wouldn't know a VLAN if it jumped up and tried to
> bite me on the nose.;-) So, I have no idea what does this, but
> if it means the total ethernet header size can be > 14bytes, then I'd
> agree.
> 
> Yeah, you need another 4 bytes for VLAN header if you're not using
> hardware that strips it before the TCP stack gets it. I have a mix
> of hardware and software VLANs running on our backbone, mostly due
> to a mixed FreeBSD/OpenBSD/Windows environment.
> 
> > Perhaps there is something in the newer network code that is
> > filling
> > up the frames to the point where they are full - thus a TSO =
> > IP_MAXPACKET is just now causing problems.
> > 
> Yea, I have no idea why this didn't bite running 9.1. (Did 9.1 have
> TSO enabled by default?)
> 
> I believe 9.0 has TSO on by default.. I seem to recall it always
> being there, but I can't easily confirm it now. My last 9.0-STABLE
> doesn't have an ixgbe card in it.
> 
Ok, I've attached 3 patches:
ixgbe.patch - A slightly updated version of the one that sets if_hw_tsomax,
              which subtracts out the additional 4 bytes for the VLAN header.
*** If you can test this, it would be nice to know if this gets rid of all
    the EFBIG replies, since I think Jack might feel it is ok to commit if
    it does do so.

4kmcl.patch - This one modifies NFS to use pagesize mbuf clusters for the
              large RPC messages. It is NOT safe to use on a small i386,
              but might be ok on a large amd64 box. On a small i386, using
              a mix of 2K and 4K mbuf clusters seems to fragment kernel memory
              enough that allocation of "boundary tags" (whatever those are?)
              fail and this trainwrecks the system.
              Using pagesize (4K) clusters reduces the mbuf count for an
              IP_MAXPACKET sized TSO segment to 19, avoiding the 32 limit
              and any need to call m_defrag() for NFS.
*** Only use on a test system, at your own risk.

tsomaxseg.patch - This one adds support for if_hw_tsomaxseg, which is a limit on
          the # of mbufs in an output TSO segment (and defaults to 32).
*** This one HAS NOT BEEN TESTED and probably doesn't even work at this point.

rick


--- dev/ixgbe/ixgbe.c.sav	2014-03-19 17:44:34.000000000 -0400
+++ dev/ixgbe/ixgbe.c	2014-03-22 22:44:53.000000000 -0400
@@ -2614,6 +2614,10 @@ ixgbe_setup_interface(device_t dev, stru
 	ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 2;
 	IFQ_SET_READY(&ifp->if_snd);
 #endif
+	if ((adapter->num_segs * MCLBYTES - (ETHER_HDR_LEN +
+	    ETHER_VLAN_ENCAP_LEN)) < IP_MAXPACKET)
+		ifp->if_hw_tsomax = adapter->num_segs * MCLBYTES -
+		    (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 
 	ether_ifattach(ifp, adapter->hw.mac.addr);
 
--- fs/nfsserver/nfs_nfsdport.c.sav2	2014-01-26 18:54:29.000000000 -0500
+++ fs/nfsserver/nfs_nfsdport.c	2014-03-16 23:22:56.000000000 -0400
@@ -566,8 +566,7 @@ nfsvno_readlink(struct vnode *vp, struct
 	len = 0;
 	i = 0;
 	while (len < NFS_MAXPATHLEN) {
-		NFSMGET(mp);
-		MCLGET(mp, M_WAITOK);
+		NFSMCLGET(mp, M_NOWAIT);
 		mp->m_len = NFSMSIZ(mp);
 		if (len == 0) {
 			mp3 = mp2 = mp;
@@ -621,7 +620,7 @@ nfsvno_read(struct vnode *vp, off_t off,
     struct thread *p, struct mbuf **mpp, struct mbuf **mpendp)
 {
 	struct mbuf *m;
-	int i;
+	int do_pagesize, i;
 	struct iovec *iv;
 	struct iovec *iv2;
 	int error = 0, len, left, siz, tlen, ioflag = 0;
@@ -630,14 +629,33 @@ nfsvno_read(struct vnode *vp, off_t off,
 	struct nfsheur *nh;
 
 	len = left = NFSM_RNDUP(cnt);
+	do_pagesize = 0;
+#if MJUMPAGESIZE != MCLBYTES
+	if (left > MCLBYTES)
+		do_pagesize = 1;
+#endif
 	m3 = NULL;
 	/*
 	 * Generate the mbuf list with the uio_iov ref. to it.
 	 */
 	i = 0;
 	while (left > 0) {
-		NFSMGET(m);
-		MCLGET(m, M_WAITOK);
+		/*
+		 * For large reads, try and acquire MJUMPAGESIZE clusters.
+		 * However, do so with M_NOWAIT so the thread can't get
+		 * stuck sleeping on "btalloc".
+		 * If this fails, use NFSMCLGET(..M_NOWAIT), which does an
+		 * MGET(..M_WAITOK) followed by a MCLGET(..M_NOWAIT).  The
+		 * MCLGET(..M_NOWAIT) may not get a cluster, but will drain
+		 * the mbuf cluster zone when it fails.
+		 * As such, an mbuf will always be allocated and most likely
+		 * it will have a cluster.
+		 */
+		m = NULL;
+		if (do_pagesize != 0)
+			m = m_getjcl(M_NOWAIT, MT_DATA, 0, MJUMPAGESIZE);
+		if (m == NULL)
+			NFSMCLGET(m, M_NOWAIT);
 		m->m_len = 0;
 		siz = min(M_TRAILINGSPACE(m), left);
 		left -= siz;
@@ -1653,10 +1671,10 @@ again:
 	if (siz == 0) {
 		vput(vp);
 		if (nd->nd_flag & ND_NFSV2) {
-			NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+			NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
 		} else {
 			nfsrv_postopattr(nd, getret, &at);
-			NFSM_BUILD(tl, u_int32_t *, 4 * NFSX_UNSIGNED);
+			NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 4 * NFSX_UNSIGNED);
 			txdr_hyper(at.na_filerev, tl);
 			tl += 2;
 		}
@@ -1708,7 +1726,7 @@ again:
 	 */
 	if (nd->nd_flag & ND_NFSV3) {
 		nfsrv_postopattr(nd, getret, &at);
-		NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+		NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
 		txdr_hyper(at.na_filerev, tl);
 		dirlen = NFSX_V3POSTOPATTR + NFSX_VERF + 2 * NFSX_UNSIGNED;
 	} else {
@@ -1734,20 +1752,24 @@ again:
 			 * the dirent entry.
 			 */
 			if (nd->nd_flag & ND_NFSV3) {
-				NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    3 * NFSX_UNSIGNED);
 				*tl++ = newnfs_true;
 				*tl++ = 0;
 			} else {
-				NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    2 * NFSX_UNSIGNED);
 				*tl++ = newnfs_true;
 			}
 			*tl = txdr_unsigned(dp->d_fileno);
 			(void) nfsm_strtom(nd, dp->d_name, nlen);
 			if (nd->nd_flag & ND_NFSV3) {
-				NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    2 * NFSX_UNSIGNED);
 				*tl++ = 0;
 			} else
-				NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    NFSX_UNSIGNED);
 			*tl = txdr_unsigned(*cookiep);
 		}
 		cpos += dp->d_reclen;
@@ -1757,7 +1779,7 @@ again:
 	}
 	if (cpos < cend)
 		eofflag = 0;
-	NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+	NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
 	*tl++ = newnfs_false;
 	if (eofflag)
 		*tl = newnfs_true;
@@ -1928,7 +1950,7 @@ again:
 		vput(vp);
 		if (nd->nd_flag & ND_NFSV3)
 			nfsrv_postopattr(nd, getret, &at);
-		NFSM_BUILD(tl, u_int32_t *, 4 * NFSX_UNSIGNED);
+		NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 4 * NFSX_UNSIGNED);
 		txdr_hyper(at.na_filerev, tl);
 		tl += 2;
 		*tl++ = newnfs_false;
@@ -2031,7 +2053,7 @@ again:
 	} else {
 		dirlen = NFSX_VERF + 2 * NFSX_UNSIGNED;
 	}
-	NFSM_BUILD(tl, u_int32_t *, NFSX_VERF);
+	NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, NFSX_VERF);
 	txdr_hyper(at.na_filerev, tl);
 
 	/*
@@ -2186,12 +2208,14 @@ again:
 			 * Build the directory record xdr
 			 */
 			if (nd->nd_flag & ND_NFSV3) {
-				NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    3 * NFSX_UNSIGNED);
 				*tl++ = newnfs_true;
 				*tl++ = 0;
 				*tl = txdr_unsigned(dp->d_fileno);
 				dirlen += nfsm_strtom(nd, dp->d_name, nlen);
-				NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    2 * NFSX_UNSIGNED);
 				*tl++ = 0;
 				*tl = txdr_unsigned(*cookiep);
 				nfsrv_postopattr(nd, 0, nvap);
@@ -2200,7 +2224,8 @@ again:
 				if (nvp != NULL)
 					vput(nvp);
 			} else {
-				NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
+				NFSM_BUILD_PAGEMBCL(tl, u_int32_t *,
+				    3 * NFSX_UNSIGNED);
 				*tl++ = newnfs_true;
 				*tl++ = 0;
 				*tl = txdr_unsigned(*cookiep);
@@ -2267,7 +2292,7 @@ again:
 	} else if (cpos < cend)
 		eofflag = 0;
 	if (!nd->nd_repstat) {
-		NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+		NFSM_BUILD_PAGEMBCL(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
 		*tl++ = newnfs_false;
 		if (eofflag)
 			*tl = newnfs_true;
--- fs/nfsclient/nfs_clcomsubs.c.sav2	2014-02-01 20:47:07.000000000 -0500
+++ fs/nfsclient/nfs_clcomsubs.c	2014-03-16 23:22:06.000000000 -0400
@@ -155,7 +155,7 @@ nfscl_reqstart(struct nfsrv_descript *nd
 	 * Get the first mbuf for the request.
 	 */
 	if (nfs_bigrequest[procnum])
-		NFSMCLGET(mb, M_WAITOK);
+		NFSMCLGET(mb, M_NOWAIT);
 	else
 		NFSMGET(mb);
 	mbuf_setlen(mb, 0);
@@ -267,9 +267,29 @@ nfsm_uiombuf(struct nfsrv_descript *nd, 
 		while (left > 0) {
 			mlen = M_TRAILINGSPACE(mp);
 			if (mlen == 0) {
-				if (clflg)
-					NFSMCLGET(mp, M_WAITOK);
-				else
+				if (clflg != 0) {
+					/*
+					 * For large writes, try and acquire
+					 * MJUMPAGESIZE clusters.
+					 * However, do so with M_NOWAIT so the
+					 * thread can't get stuck sleeping on
+					 * "btalloc".  If this fails, use
+					 * NFSMCLGET(..M_NOWAIT), which does an
+					 * MGET(..M_WAITOK) followed by a
+					 * MCLGET(..M_NOWAIT). This may not get
+					 * a cluster, but will drain the mbuf
+					 * cluster zone when it fails.
+					 * As such, an mbuf will always be
+					 * allocated and most likely it will
+					 * have a cluster.
+					 */
+#if MJUMPAGESIZE != MCLBYTES
+					mp = m_getjcl(M_NOWAIT, MT_DATA, 0,
+					    MJUMPAGESIZE);
+					if (mp == NULL)
+#endif
+						NFSMCLGET(mp, M_NOWAIT);
+				} else
 					NFSMGET(mp);
 				mbuf_setlen(mp, 0);
 				mbuf_setnext(mp2, mp);
--- fs/nfs/nfsm_subs.h.sav2	2014-02-01 19:51:12.000000000 -0500
+++ fs/nfs/nfsm_subs.h	2014-03-13 18:54:27.000000000 -0400
@@ -89,6 +89,37 @@ nfsm_build(struct nfsrv_descript *nd, in
 
 #define	NFSM_BUILD(a, c, s)	((a) = (c)nfsm_build(nd, (s)))
 
+/*
+ * Same as above, but allocates MJUMPAGESIZE mbuf clusters, if possible.
+ */
+static __inline void *
+nfsm_build_pagembcl(struct nfsrv_descript *nd, int siz)
+{
+	void *retp;
+	struct mbuf *mb2;
+
+	if (siz > M_TRAILINGSPACE(nd->nd_mb)) {
+		mb2 = NULL;
+#if MJUMPAGESIZE != MCLBYTES
+		mb2 = m_getjcl(M_NOWAIT, MT_DATA, 0, MJUMPAGESIZE);
+#endif
+		if (mb2 == NULL)
+			NFSMCLGET(mb2, M_NOWAIT);
+		if (siz > MLEN)
+			panic("build > MLEN");
+		mbuf_setlen(mb2, 0);
+		nd->nd_bpos = NFSMTOD(mb2, caddr_t);
+		nd->nd_mb->m_next = mb2;
+		nd->nd_mb = mb2;
+	}
+	retp = (void *)(nd->nd_bpos);
+	nd->nd_mb->m_len += siz;
+	nd->nd_bpos += siz;
+	return (retp);
+}
+
+#define	NFSM_BUILD_PAGEMBCL(a, c, s)	((a) = (c)nfsm_build_pagembcl(nd, (s)))
+
 static __inline void *
 nfsm_dissect(struct nfsrv_descript *nd, int siz)
 {
--- fs/nfs/nfsport.h.sav2	2014-02-13 19:03:22.000000000 -0500
+++ fs/nfs/nfsport.h	2014-02-13 19:14:24.000000000 -0500
@@ -138,6 +138,8 @@
 
 /*
  * Allocate mbufs. Must succeed and never set the mbuf ptr to NULL.
+ * Note that when NFSMCLGET(m, M_NOWAIT) is done, it still must allocate
+ * an mbuf (and can sleep), but might not get a cluster, in the worst case.
  */
 #define	NFSMGET(m)	do { 					\
 		MGET((m), M_WAITOK, MT_DATA); 			\
--- kern/uipc_sockbuf.c.sav	2014-01-30 20:27:17.000000000 -0500
+++ kern/uipc_sockbuf.c	2014-01-30 22:12:08.000000000 -0500
@@ -965,6 +965,39 @@ sbsndptr(struct sockbuf *sb, u_int off, 
 }
 
 /*
+ * Return the first mbuf for the provided offset.
+ */
+struct mbuf *
+sbsndmbuf(struct sockbuf *sb, u_int off, long *first_len)
+{
+	struct mbuf *m;
+
+	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
+
+	*first_len = 0;
+	/*
+	 * Is off below stored offset? Happens on retransmits.
+	 * If so, just use sb_mb.
+	 */
+	if (sb->sb_sndptr == NULL || sb->sb_sndptroff > off)
+		m = sb->sb_mb;
+	else {
+		m = sb->sb_sndptr;
+		off -= sb->sb_sndptroff;
+	}
+	while (off > 0 && m != NULL) {
+		if (off < m->m_len)
+			break;
+		off -= m->m_len;
+		m = m->m_next;
+	}
+	if (m != NULL)
+		*first_len = m->m_len - off;
+
+	return (m);
+}
+
+/*
  * Drop a record off the front of a sockbuf and move the next record to the
  * front.
  */
--- sys/sockbuf.h.sav	2014-01-30 20:42:28.000000000 -0500
+++ sys/sockbuf.h	2014-01-30 22:08:43.000000000 -0500
@@ -153,6 +153,8 @@ int	sbreserve_locked(struct sockbuf *sb,
 	    struct thread *td);
 struct mbuf *
 	sbsndptr(struct sockbuf *sb, u_int off, u_int len, u_int *moff);
+struct mbuf *
+	sbsndmbuf(struct sockbuf *sb, u_int off, long *first_len);
 void	sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb);
 int	sbwait(struct sockbuf *sb);
 int	sblock(struct sockbuf *sb, int flags);
--- netinet/tcp_input.c.sav	2014-01-30 19:37:52.000000000 -0500
+++ netinet/tcp_input.c	2014-01-30 19:39:07.000000000 -0500
@@ -3627,6 +3627,7 @@ tcp_mss(struct tcpcb *tp, int offer)
 	if (cap.ifcap & CSUM_TSO) {
 		tp->t_flags |= TF_TSO;
 		tp->t_tsomax = cap.tsomax;
+		tp->t_tsomaxsegs = cap.tsomaxsegs;
 	}
 }
 
--- netinet/tcp_output.c.sav	2014-01-30 18:55:15.000000000 -0500
+++ netinet/tcp_output.c	2014-01-30 22:18:56.000000000 -0500
@@ -166,8 +166,8 @@ int
 tcp_output(struct tcpcb *tp)
 {
 	struct socket *so = tp->t_inpcb->inp_socket;
-	long len, recwin, sendwin;
-	int off, flags, error = 0;	/* Keep compiler happy */
+	long len, recwin, sendwin, tso_tlen;
+	int cnt, off, flags, error = 0;	/* Keep compiler happy */
 	struct mbuf *m;
 	struct ip *ip = NULL;
 	struct ipovly *ipov = NULL;
@@ -780,6 +780,24 @@ send:
 			}
 
 			/*
+			 * Limit the number of TSO transmit segments (mbufs
+			 * in mbuf list) to tp->t_tsomaxsegs.
+			 */
+			cnt = 0;
+			m = sbsndmbuf(&so->so_snd, off, &tso_tlen);
+			while (m != NULL && cnt < tp->t_tsomaxsegs &&
+			    tso_tlen < len) {
+				if (cnt > 0)
+					tso_tlen += m->m_len;
+				cnt++;
+				m = m->m_next;
+			}
+			if (m != NULL && tso_tlen < len) {
+				len = tso_tlen;
+				sendalot = 1;
+			}
+
+			/*
 			 * Prevent the last segment from being
 			 * fractional unless the send sockbuf can
 			 * be emptied.
--- netinet/tcp_subr.c.sav	2014-01-30 19:44:35.000000000 -0500
+++ netinet/tcp_subr.c	2014-01-30 20:56:12.000000000 -0500
@@ -1800,6 +1800,12 @@ tcp_maxmtu(struct in_conninfo *inc, stru
 			    ifp->if_hwassist & CSUM_TSO)
 				cap->ifcap |= CSUM_TSO;
 				cap->tsomax = ifp->if_hw_tsomax;
+#ifdef notyet
+				cap->tsomaxsegs = ifp->if_hw_tsomaxsegs;
+#endif
+				if (cap->tsomaxsegs == 0)
+					cap->tsomaxsegs =
+					    TCPTSO_MAX_TX_SEGS_DEFAULT;
 		}
 		RTFREE(sro.ro_rt);
 	}
--- netinet/tcp_var.h.sav	2014-01-30 19:39:22.000000000 -0500
+++ netinet/tcp_var.h	2014-01-30 20:52:57.000000000 -0500
@@ -209,6 +209,7 @@ struct tcpcb {
 	u_int	t_keepcnt;		/* number of keepalives before close */
 
 	u_int	t_tsomax;		/* tso burst length limit */
+	u_int	t_tsomaxsegs;		/* tso burst segment limit */
 
 	uint32_t t_ispare[8];		/* 5 UTO, 3 TBD */
 	void	*t_pspare2[4];		/* 4 TBD */
@@ -268,6 +269,11 @@ struct tcpcb {
 #define	TCPOOB_HAVEDATA	0x01
 #define	TCPOOB_HADDATA	0x02
 
+/*
+ * Default value for TSO maximum number of transmit segments (count of mbufs).
+ */
+#define	TCPTSO_MAX_TX_SEGS_DEFAULT	30
+
 #ifdef TCP_SIGNATURE
 /*
  * Defines which are needed by the xform_tcp module and tcp_[in|out]put
@@ -333,6 +339,7 @@ struct hc_metrics_lite {	/* must stay in
 struct tcp_ifcap {
 	int	ifcap;
 	u_int	tsomax;
+	u_int	tsomaxsegs;
 };
 
 #ifndef _NETINET_IN_PCB_H_
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net