Re: svn commit: r212803 - head/sys/netinet

2010-10-24 Thread Andre Oppermann

On 23.10.2010 15:10, Bjoern A. Zeeb wrote:

On Fri, 17 Sep 2010, Andre Oppermann wrote:


Author: andre
Date: Fri Sep 17 22:05:27 2010
New Revision: 212803
URL: http://svn.freebsd.org/changeset/base/212803

Log:
Rearrange the TSO code to make it more readable and to clearly
separate the decision logic (whether we can do TSO) and the
calculation of the burst length into two distinct parts.

Change the way the TSO burst length calculation is done. TSO
could do bursts of 65535 bytes, but such a burst can't be represented
in ip_len together with the IP and TCP header. Account for that and
use IP_MAXPACKET instead of TCP_MAXWIN as the base constant (both
have the same value of 64K). When more data is available, prevent
segments smaller than the MSS from being sent during the current
TSO burst.

Add two more KASSERTs to ensure the integrity of the packets.

Tested by: Ben Wilber ben-at-desync com
MFC after: 10 days


As this hasn't happened yet, please don't do it. It breaks things. I'll
follow up later as soon as I have more details.


I was busy after the EuroBSDCon DevSummit and didn't have
time to MFC.  Incidentally, I was planning on doing it today, but will
hold off based on your request.

The version currently in 8 certainly has a bug.  For the one in head,
yours is the first report.  Others reported that all their issues were
fixed by this patch.

Can you give a high-level description of the problem you are seeing?
A detailed description is not required for a first look at whatever
issue you may have.

--
Andre




Modified:
head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c Fri Sep 17 21:53:56 2010 (r212802)
+++ head/sys/netinet/tcp_output.c Fri Sep 17 22:05:27 2010 (r212803)
@@ -465,9 +465,8 @@ after_sack_rexmit:
}

/*
- * Truncate to the maximum segment length or enable TCP Segmentation
- * Offloading (if supported by hardware) and ensure that FIN is removed
- * if the length no longer contains the last data byte.
+ * Decide if we can use TCP Segmentation Offloading (if supported by
+ * hardware).
*
* TSO may only be used if we are in a pure bulk sending state. The
* presence of TCP-MD5, SACK retransmits, SACK advertizements and
@@ -475,10 +474,6 @@ after_sack_rexmit:
* (except for the sequence number) for all generated packets. This
* makes it impossible to transmit any options which vary per generated
* segment or packet.
- *
- * The length of TSO bursts is limited to TCP_MAXWIN. That limit and
- * removal of FIN (if not already catched here) are handled later after
- * the exact length of the TCP options are known.
*/
#ifdef IPSEC
/*
@@ -487,22 +482,15 @@ after_sack_rexmit:
*/
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
#endif
- if (len > tp->t_maxseg) {
- if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
- ((tp->t_flags & TF_SIGNATURE) == 0) &&
- tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
- tp->t_inpcb->inp_options == NULL &&
- tp->t_inpcb->in6p_options == NULL
+ if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+ ((tp->t_flags & TF_SIGNATURE) == 0) &&
+ tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
#ifdef IPSEC
- && ipsec_optlen == 0
+ ipsec_optlen == 0 &&
#endif
- ) {
- tso = 1;
- } else {
- len = tp->t_maxseg;
- sendalot = 1;
- }
- }
+ tp->t_inpcb->inp_options == NULL &&
+ tp->t_inpcb->in6p_options == NULL)
+ tso = 1;

if (sack_rxmit) {
if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
@@ -732,28 +720,53 @@ send:
* bump the packet length beyond the t_maxopd length.
* Clear the FIN bit because we cut off the tail of
* the segment.
- *
- * When doing TSO limit a burst to TCP_MAXWIN minus the
- * IP, TCP and Options length to keep ip-ip_len from
- * overflowing. Prevent the last segment from being
- * fractional thus making them all equal sized and set
- * the flag to continue sending. TSO is disabled when
- * IP options or IPSEC are present.
*/
if (len + optlen + ipoptlen > tp->t_maxopd) {
flags &= ~TH_FIN;
+
if (tso) {
- if (len > TCP_MAXWIN - hdrlen - optlen) {
- len = TCP_MAXWIN - hdrlen - optlen;
- len = len - (len % (tp->t_maxopd - optlen));
+ KASSERT(ipoptlen == 0,
+ ("%s: TSO can't do IP options", __func__));
+
+ /*
+ * Limit a burst to IP_MAXPACKET minus IP,
+ * TCP and options length to keep ip->ip_len
+ * from overflowing.
+ */
+ if (len > IP_MAXPACKET - hdrlen) {
+ len = IP_MAXPACKET - hdrlen;
+ sendalot = 1;
+ }
+
+ /*
+ * Prevent the last segment from being
+ * fractional unless the send sockbuf can
+ * be emptied.
+ */
+ if (sendalot && off + len < so->so_snd.sb_cc) {
+ len -= len % (tp->t_maxopd - optlen);
sendalot = 1;
- } else if (tp->t_flags & TF_NEEDFIN)
+ }
+
+ /*
+ * Send the FIN in a separate segment
+ * after the bulk sending is done.
+ * We don't trust the TSO implementations
+ * to clear the FIN flag on all but the
+ * last segment.
+ */
+ if (tp->t_flags & TF_NEEDFIN)
sendalot = 1;
+
} else {
len = tp->t_maxopd - optlen - ipoptlen;
sendalot = 1;
}
- }
+ } else
+ tso = 0;
+
+ KASSERT(len
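The burst-length rules from the r212803 log can be tried outside the kernel. What follows is a minimal userland sketch, not the tcp_output() code itself. It only models the `if (tso)` branch, so it assumes the payload already exceeds one segment. `tso_burst_len()` and `SKETCH_IP_MAXPACKET` are hypothetical names. `hdrlen` is assumed to already include the TCP options, as the "IP, TCP and options length" comment in the diff suggests. `maxopd` stands in for `tp->t_maxopd`.

```c
#include <assert.h>

/* Hypothetical stand-in for IP_MAXPACKET (65535), the largest ip_len. */
#define SKETCH_IP_MAXPACKET	65535L

/*
 * Hypothetical helper mirroring the r212803 TSO burst-length rules.
 *   len      bytes we would like to send in this burst
 *   hdrlen   IP + TCP header length, assumed to include TCP options
 *   optlen   TCP options length
 *   maxopd   stand-in for tp->t_maxopd (MSS plus TCP options)
 *   off      offset of the burst within the send buffer
 *   sb_cc    bytes queued in the send buffer
 *   need_fin stand-in for (tp->t_flags & TF_NEEDFIN)
 *   sendalot set to 1 when another output pass is required
 */
static long
tso_burst_len(long len, long hdrlen, long optlen, long maxopd,
    long off, long sb_cc, int need_fin, int *sendalot)
{
	assert(maxopd > optlen);	/* a segment must carry payload */

	/* Keep ip_len from overflowing once the headers are added. */
	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		*sendalot = 1;
	}

	/* Avoid a fractional last segment unless the sockbuf is emptied. */
	if (*sendalot && off + len < sb_cc) {
		len -= len % (maxopd - optlen);
		*sendalot = 1;
	}

	/* Send the FIN in a separate segment after the bulk data. */
	if (need_fin)
		*sendalot = 1;

	return (len);
}
```

Here is a worked example. Assume 12 bytes of TCP options, so `hdrlen` is 20 + 20 + 12 = 52, and a `t_maxopd` of 1460, which leaves 1448 bytes of payload per segment. A 100000-byte request from offset 0 of a 200000-byte send buffer is first clamped to 65535 - 52 = 65483 bytes. It is then trimmed to 45 full segments, or 65160 bytes.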

svn commit: r241686 - in head/sys: net netgraph netgraph/atm/ccatm netgraph/atm/sscfu netgraph/atm/sscop netgraph/atm/uni netinet netinet6 netipsec

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 13:57:24 2012
New Revision: 241686
URL: http://svn.freebsd.org/changeset/base/241686

Log:
  Mechanically remove the last stray remains of spl* calls from net*/*.
  They have been no-ops for a long time now.

Modified:
  head/sys/net/if.c
  head/sys/net/if_ef.c
  head/sys/net/if_gre.c
  head/sys/net/if_spppsubr.c
  head/sys/net/if_var.h
  head/sys/net/rtsock.c
  head/sys/netgraph/atm/ccatm/ng_ccatm.c
  head/sys/netgraph/atm/sscfu/ng_sscfu.c
  head/sys/netgraph/atm/sscop/ng_sscop.c
  head/sys/netgraph/atm/uni/ng_uni.c
  head/sys/netgraph/ng_eiface.c
  head/sys/netgraph/ng_ether.c
  head/sys/netgraph/ng_fec.c
  head/sys/netgraph/ng_gif.c
  head/sys/netgraph/ng_ksocket.c
  head/sys/netgraph/ng_source.c
  head/sys/netinet/ip_ipsec.c
  head/sys/netinet6/in6.c
  head/sys/netinet6/ip6_ipsec.c
  head/sys/netinet6/nd6.c
  head/sys/netinet6/nd6_nbr.c
  head/sys/netinet6/nd6_rtr.c
  head/sys/netinet6/udp6_usrreq.c
  head/sys/netipsec/key.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Thu Oct 18 13:46:26 2012(r241685)
+++ head/sys/net/if.c   Thu Oct 18 13:57:24 2012(r241686)
@@ -691,12 +691,9 @@ static void
 if_attachdomain(void *dummy)
 {
struct ifnet *ifp;
-   int s;
 
-   s = splnet();
TAILQ_FOREACH(ifp, &V_ifnet, if_link)
if_attachdomain1(ifp);
-   splx(s);
 }
 SYSINIT(domainifattach, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_SECOND,
 if_attachdomain, NULL);
@@ -705,21 +702,15 @@ static void
 if_attachdomain1(struct ifnet *ifp)
 {
struct domain *dp;
-   int s;
-
-   s = splnet();
 
/*
 * Since dp->dom_ifattach calls malloc() with M_WAITOK, we
 * cannot lock ifp->if_afdata initialization, entirely.
 */
-   if (IF_AFDATA_TRYLOCK(ifp) == 0) {
-   splx(s);
+   if (IF_AFDATA_TRYLOCK(ifp) == 0)
return;
-   }
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   splx(s);
printf("if_attachdomain called more than once on %s\n",
ifp->if_xname);
return;
@@ -734,8 +725,6 @@ if_attachdomain1(struct ifnet *ifp)
ifp->if_afdata[dp->dom_family] =
(*dp->dom_ifattach)(ifp);
}
-
-   splx(s);
 }
 
 /*
@@ -1825,7 +1814,6 @@ link_rtrequest(int cmd, struct rtentry *
 /*
  * Mark an interface down and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 static void
 if_unroute(struct ifnet *ifp, int flag, int fam)
@@ -1849,7 +1837,6 @@ if_unroute(struct ifnet *ifp, int flag, 
 /*
  * Mark an interface up and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 static void
 if_route(struct ifnet *ifp, int flag, int fam)
@@ -1935,7 +1922,6 @@ do_link_state_change(void *arg, int pend
 /*
  * Mark an interface down and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 void
 if_down(struct ifnet *ifp)
@@ -1947,7 +1933,6 @@ if_down(struct ifnet *ifp)
 /*
  * Mark an interface up and notify protocols of
  * the transition.
- * NOTE: must be called at splnet or eqivalent.
  */
 void
 if_up(struct ifnet *ifp)
@@ -2150,14 +2135,10 @@ ifhwioctl(u_long cmd, struct ifnet *ifp,
/* Smart drivers twiddle their own routes */
} else if (ifp->if_flags & IFF_UP &&
(new_flags & IFF_UP) == 0) {
-   int s = splimp();
if_down(ifp);
-   splx(s);
} else if (new_flags & IFF_UP &&
(ifp->if_flags & IFF_UP) == 0) {
-   int s = splimp();
if_up(ifp);
-   splx(s);
}
/* See if permanently promiscuous mode bit is about to flip */
if ((ifp->if_flags ^ new_flags) & IFF_PPROMISC) {
@@ -2605,11 +2586,8 @@ ifioctl(struct socket *so, u_long cmd, c
 
if ((oif_flags ^ ifp->if_flags) & IFF_UP) {
 #ifdef INET6
-   if (ifp->if_flags & IFF_UP) {
-   int s = splimp();
+   if (ifp->if_flags & IFF_UP)
in6_if_up(ifp);
-   splx(s);
-   }
 #endif
}
if_rele(ifp);

Modified: head/sys/net/if_ef.c
==
--- head/sys/net/if_ef.cThu Oct 18 13:46:26 2012(r241685)
+++ head/sys/net/if_ef.cThu Oct 18 13:57:24 2012(r241686)
@@ -151,14 +151,10 @@ static int
 ef_detach(struct efnet *sc)
 {
struct ifnet *ifp = sc->ef_ifp;
-   int s;
-
-   s = splimp();
 
ether_ifdetach(ifp);
if_free(ifp);
 
-   splx(s);
return 0;
 }
 
@@ -172,11 

svn commit: r241688 - head/sys/net

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 14:08:26 2012
New Revision: 241688
URL: http://svn.freebsd.org/changeset/base/241688

Log:
  Use LOG_WARNING level in in_attachdomain1() instead of printf().
  
  Submitted by: vijju.singh-at-gmail.com

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Thu Oct 18 13:57:28 2012(r241687)
+++ head/sys/net/if.c   Thu Oct 18 14:08:26 2012(r241688)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
return;
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   printf("if_attachdomain called more than once on %s\n",
-   ifp->if_xname);
+   log(LOG_WARNING, "if_attachdomain called more than once "
+   "on %s\n", ifp->if_xname);
return;
}
ifp->if_afdata_initialized = domain_init_status;
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.
Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.

Note that "zero copy" isn't entirely accurate either, as it marks
the page as COW.  So when the userspace application reuses
the memory, it is copied anyway.  Also, the overhead of doing
the VM magic and mbuf attachment of a VM page isn't free
either.  To really benefit from it, an application has to be
written with COW in mind and not reuse the memory that was
just written to the socket.  For non-aware applications it
may be a net performance loss overall.

Also I don't like the name zero-copy-socket as it promises
too much for those not into socket, mbuf and VM magic.
I'd rather call it cow-socket or something like that as it
describes much better what is actually happening behind the
scenes.

--
Andre



svn commit: r241704 - head/sys/kern

2012-10-18 Thread Andre Oppermann
Author: andre
Date: Thu Oct 18 21:04:30 2012
New Revision: 241704
URL: http://svn.freebsd.org/changeset/base/241704

Log:
  Remove unnecessary includes from sosend_copyin() and fix
  a couple of style issues.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Thu Oct 18 20:22:17 2012(r241703)
+++ head/sys/kern/uipc_socket.c Thu Oct 18 21:04:30 2012(r241704)
@@ -860,12 +860,6 @@ struct so_zerocopy_stats{
int found_ifp;
 };
 struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
-#include <netinet/in.h>
-#include <net/route.h>
-#include <netinet/in_pcb.h>
-#include <vm/vm.h>
-#include <vm/vm_page.h>
-#include <vm/vm_object.h>
 
 /*
  * sosend_copyin() is only used if zero copy sockets are enabled.  Otherwise
@@ -907,9 +901,9 @@ sosend_copyin(struct uio *uio, struct mb
} else
m = m_get(M_WAITOK, MT_DATA);
if (so_zero_copy_send &&
-   resid>=PAGE_SIZE &&
-   *space>=PAGE_SIZE &&
-   uio->uio_iov->iov_len>=PAGE_SIZE) {
+   resid >= PAGE_SIZE &&
+   *space >= PAGE_SIZE &&
+   uio->uio_iov->iov_len >= PAGE_SIZE) {
so_zerocp_stats.size_ok++;
so_zerocp_stats.align_ok++;
cow_send = socow_setup(m, uio);
@@ -946,7 +940,7 @@ sosend_copyin(struct uio *uio, struct mb
if (cow_send)
error = 0;
else
-   error = uiomove(mtod(m, void *), (int)len, uio);
+   error = uiomove(mtod(m, void *), (int)len, uio);
resid = uio->uio_resid;
m->m_len = len;
*mp = m;
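The size test in sosend_copyin() decides whether a write is considered for COW page flipping at all. Below is a minimal userland restatement of that condition. `cow_send_eligible()` and `SKETCH_PAGE_SIZE` are hypothetical names, and a 4 KiB page is assumed. Passing the check only means the kernel would go on to try socow_setup(), which does the actual VM work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096	/* assumption: 4 KiB pages */

/*
 * Hypothetical restatement of the check in sosend_copyin(): page
 * flipping is only attempted when zero copy send is enabled and the
 * remaining data (resid), the socket buffer space and the current
 * iovec each cover at least one full page.
 */
static bool
cow_send_eligible(bool so_zero_copy_send, long resid, long space,
    size_t iov_len)
{
	return (so_zero_copy_send &&
	    resid >= SKETCH_PAGE_SIZE &&
	    space >= SKETCH_PAGE_SIZE &&
	    iov_len >= SKETCH_PAGE_SIZE);
}
```

A short write, or an iovec smaller than a page, always falls back to the ordinary uiomove() copy.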


Re: svn commit: r241703 - head/sys/kern

2012-10-18 Thread Andre Oppermann

On 18.10.2012 23:06, Navdeep Parhar wrote:

Hello Andre,

A couple of things if you're poking around in this area...


I didn't really mean to dive too deep into COW socket writes.


On 10/18/12 13:44, Andre Oppermann wrote:

On 18.10.2012 22:22, Andre Oppermann wrote:

Author: andre
Date: Thu Oct 18 20:22:17 2012
New Revision: 241703
URL: http://svn.freebsd.org/changeset/base/241703

Log:
   Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
   zero copy specialized sosend_copyin() helper function.


Note that I'm not saying zero copy should be used or is even
more performant than the optimized m_uiotombuf() function.


Some time back I played around with a modified m_uiotombuf() that was
aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K
mbufs).  In some cases it performed better than the stock m_uiotombuf.
I suspect this change would also help drivers that are unable to deal
with long gather lists when doing TSO.  But my testing wasn't rigorous
enough (I was merely playing around), and the drivers I work with can
mostly cope with whatever the kernel throws at them.  So nothing came
out of it.


The jumbo 16K zone is special in that the memory is actually allocated
by contigmalloc to get physically contiguous RAM. After some uptime and
heavy use this may become difficult to obtain. Also contigmalloc has to
hunt for it which may cause quite a bit of overhead.

4K mbufs, actually PAGE_SIZE mbufs, are very easily obtainable and fast.

To be honest I'm not really happy about > PAGE_SIZE mbufs.  They were
introduced at a time when DMA engines were more limited and couldn't
do S/G DMA on receive.

So performance with > PAGE_SIZE mbufs may be a little bit better, but
when you approach memory fragmentation after some heavy system usage,
it sucks up to the point where it fails most of the time.  PAGE_SIZE
mbufs always perform the same with very little deviation.

In an ideal scenario I'd like to see 9K and 16K mbufs go away and
have the RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.
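The idea of having the RX DMA ring stitch a large frame together out of PAGE_SIZE buffers comes down to simple arithmetic. The sketch below is only an illustration under stated assumptions. It assumes 4 KiB pages and uses a hypothetical helper, `split_into_pages()`, that records how many bytes land in each page-sized buffer. No real driver or mbuf API is involved.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: describe a received frame of frame_len bytes
 * as a chain of page-sized buffers.  Fills lens[] with the number of
 * bytes placed in each buffer and returns the number of buffers used,
 * or 0 if more than max buffers would be needed.
 */
static size_t
split_into_pages(size_t frame_len, size_t *lens, size_t max)
{
	size_t n = 0;

	while (frame_len > 0) {
		size_t chunk;

		if (n == max)
			return (0);	/* chain would not fit */
		chunk = frame_len < SKETCH_PAGE_SIZE ?
		    frame_len : SKETCH_PAGE_SIZE;
		lens[n++] = chunk;
		frame_len -= chunk;
	}
	return (n);
}
```

For example, a 9018-byte jumbo frame would occupy three page-sized buffers holding 4096, 4096 and 826 bytes. None of them has to be physically contiguous with the others, so no contigmalloc()-backed 9K or 16K cluster is needed.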


Actually there may be some real bit-rot to zero copy sockets.
I've just started looking into it.


I have a cxgbe(4)-specific true zero-copy implementation.  The rx side
is in head, the tx side works only for blocking sockets (the easy case)
and I haven't checked it in anywhere.  Take a look at t4_soreceive_ddp()
and m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c.  They're mostly
identical to the kernel routines they're based on (read: copy-pasted
from).  You may find them of some interest if you're working in this
area and are thinking of adding zero-copy hooks to the socket
implementation.


I'm going to have a look at it and think about how to generically support
DDP either way with our socket buffer layout.

Actually that may end up as the golden path. Do away with > PAGE_SIZE
mbufs, sink page flipping COW (incorrectly named ZERO_COPY) and use
DDP for those who need utmost performance (as I said, only COW-aware
applications gain a bit of speed; unaware ones may end up much worse).

--
Andre



svn commit: r241724 - head/sys/sys

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:04:43 2012
New Revision: 241724
URL: http://svn.freebsd.org/changeset/base/241724

Log:
  Remove splimp() comment from sysinit table and attribute SI_SUB_PROTO_BEGIN
  and SI_SUB_PROTO_END to VNET related initializations.
  
  MFC after:3 days

Modified:
  head/sys/sys/kernel.h

Modified: head/sys/sys/kernel.h
==
--- head/sys/sys/kernel.h   Fri Oct 19 09:41:45 2012(r241723)
+++ head/sys/sys/kernel.h   Fri Oct 19 10:04:43 2012(r241724)
@@ -84,12 +84,6 @@ extern int ticks;
  * The SI_SUB_SWAP values represent a value used by
  * the BSD 4.4Lite but not by FreeBSD; it is maintained in dependent
  * order to support porting.
- *
- * The SI_SUB_PROTO_BEGIN and SI_SUB_PROTO_END bracket a range of
- * initializations to take place at splimp().  This is a historical
- * wart that should be removed -- probably running everything at
- * splimp() until the first init that doesn't want it is the correct
- * fix.  They are currently present to ensure historical behavior.
  */
 enum sysinit_sub_id {
SI_SUB_DUMMY= 0x000,/* not executed; for linker*/
@@ -147,12 +141,12 @@ enum sysinit_sub_id {
SI_SUB_P1003_1B = 0x6E0,/* P1003.1B realtime */
SI_SUB_PSEUDO   = 0x700,/* pseudo devices*/
SI_SUB_EXEC = 0x740,/* execve() handlers */
-   SI_SUB_PROTO_BEGIN  = 0x800,/* XXX: set splimp (kludge)*/
+   SI_SUB_PROTO_BEGIN  = 0x800,/* VNET initialization */
SI_SUB_PROTO_IF = 0x840,/* interfaces*/
SI_SUB_PROTO_DOMAININIT = 0x860,/* domain registration system */
SI_SUB_PROTO_DOMAIN = 0x880,/* domains (address families?)*/
SI_SUB_PROTO_IFATTACHDOMAIN = 0x881,/* domain dependent 
data init*/
-   SI_SUB_PROTO_END= 0x8ff,/* XXX: set splx (kludge)*/
+   SI_SUB_PROTO_END= 0x8ff,/* VNET helper functions */
SI_SUB_KPROF= 0x900,/* kernel profiling*/
SI_SUB_KICK_SCHEDULER   = 0xa00,/* start the timeout events*/
SI_SUB_INT_CONFIG_HOOKS = 0xa80,/* Interrupts enabled config */


svn commit: r241725 - head/sys/net

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:07:55 2012
New Revision: 241725
URL: http://svn.freebsd.org/changeset/base/241725

Log:
  Update to previous r241688 to use __func__ instead of spelled out function
  name in log(9) message.
  
  Suggested by: glebius

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==
--- head/sys/net/if.c   Fri Oct 19 10:04:43 2012(r241724)
+++ head/sys/net/if.c   Fri Oct 19 10:07:55 2012(r241725)
@@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
return;
if (ifp->if_afdata_initialized >= domain_init_status) {
IF_AFDATA_UNLOCK(ifp);
-   log(LOG_WARNING, "if_attachdomain called more than once "
-   "on %s\n", ifp->if_xname);
+   log(LOG_WARNING, "%s called more than once on %s\n",
+   __func__, ifp->if_xname);
return;
}
ifp->if_afdata_initialized = domain_init_status;


Re: svn commit: r241688 - head/sys/net

2012-10-19 Thread Andre Oppermann

On 18.10.2012 16:11, Gleb Smirnoff wrote:

On Thu, Oct 18, 2012 at 02:08:26PM +, Andre Oppermann wrote:
A Author: andre
A Date: Thu Oct 18 14:08:26 2012
A New Revision: 241688
A URL: http://svn.freebsd.org/changeset/base/241688
A
A Log:
A   Use LOG_WARNING level in in_attachdomain1() instead of printf().
A
A   Submitted by:   vijju.singh-at-gmail.com
A
A Modified:
A   head/sys/net/if.c
A
A Modified: head/sys/net/if.c
A 
==
A --- head/sys/net/if.c Thu Oct 18 13:57:28 2012(r241687)
A +++ head/sys/net/if.c Thu Oct 18 14:08:26 2012(r241688)
A @@ -711,8 +711,8 @@ if_attachdomain1(struct ifnet *ifp)
A   return;
A   if (ifp->if_afdata_initialized >= domain_init_status) {
A   IF_AFDATA_UNLOCK(ifp);
A - printf("if_attachdomain called more than once on %s\n",
A - ifp->if_xname);
A + log(LOG_WARNING, "if_attachdomain called more than once "
A + "on %s\n", ifp->if_xname);
A   return;
A   }
A   ifp->if_afdata_initialized = domain_init_status;

It'll be even more perfect if done as

"%s called more than once on %s\n", __func__, ifp->if_xname


Thanks, done in r241725.
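The effect of the `__func__` suggestion can be seen with a userland stand-in. The sketch below is hypothetical. It formats the message into a buffer with snprintf(3) instead of calling the kernel's log(9). The function is simply named if_attachdomain1 so that `__func__` expands the same way it does in the kernel. The old hard-coded text said "if_attachdomain", although the warning is emitted from if_attachdomain1().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userland stand-in: only formats the warning that
 * if_attachdomain1() logs; the kernel hands the same format string
 * to log(LOG_WARNING, ...).
 */
static int
if_attachdomain1(char *buf, size_t len, const char *if_xname)
{
	/* __func__ is the C99 predefined identifier "if_attachdomain1". */
	return (snprintf(buf, len, "%s called more than once on %s\n",
	    __func__, if_xname));
}
```

Because the name comes from the compiler, the message stays correct if the function is ever renamed.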


And do we need \n for log(9)?


Yes.

--
Andre



svn commit: r241726 - head/sys/kern

2012-10-19 Thread Andre Oppermann
Author: andre
Date: Fri Oct 19 10:15:32 2012
New Revision: 241726
URL: http://svn.freebsd.org/changeset/base/241726

Log:
  Move UMA socket zone initialization from uipc_domain.c to uipc_socket.c
  into one place next to its other related functions to avoid confusion.

Modified:
  head/sys/kern/uipc_domain.c
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_domain.c
==
--- head/sys/kern/uipc_domain.c Fri Oct 19 10:07:55 2012(r241725)
+++ head/sys/kern/uipc_domain.c Fri Oct 19 10:15:32 2012(r241726)
@@ -239,28 +239,11 @@ domain_add(void *data)
mtx_unlock(&dom_mtx);
 }
 
-static void
-socket_zone_change(void *tag)
-{
-
-   uma_zone_set_max(socket_zone, maxsockets);
-}
-
 /* ARGSUSED*/
 static void
 domaininit(void *dummy)
 {
 
-   /*
-* Before we do any setup, make sure to initialize the
-* zone allocator we get struct sockets from.
-*/
-   socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL,
-   NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
-   uma_zone_set_max(socket_zone, maxsockets);
-   EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL,
-   EVENTHANDLER_PRI_FIRST);
-
if (max_linkhdr < 16)   /* XXX */
max_linkhdr = 16;
 

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Fri Oct 19 10:07:55 2012(r241725)
+++ head/sys/kern/uipc_socket.c Fri Oct 19 10:15:32 2012(r241726)
@@ -227,6 +227,29 @@ MTX_SYSINIT(so_global_mtx, so_global_mt
 SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");
 
 /*
+ * Initialize the socket subsystem and set up the socket
+ * memory allocator.
+ */
+static void
+socket_zone_change(void *tag)
+{
+
+   uma_zone_set_max(socket_zone, maxsockets);
+}
+
+static void
+socket_init(void *tag)
+{
+
+socket_zone = uma_zcreate("socket", sizeof(struct socket), NULL, NULL,
+NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
+uma_zone_set_max(socket_zone, maxsockets);
+EVENTHANDLER_REGISTER(maxsockets_change, socket_zone_change, NULL,
+EVENTHANDLER_PRI_FIRST);
+}
+SYSINIT(socket, SI_SUB_PROTO_DOMAININIT, SI_ORDER_ANY, socket_init, NULL);
+
+/*
  * Sysctl to get and set the maximum global sockets limit.  Notify protocols
  * of the change so that they can update their dependent limits as required.
  */


svn commit: r241779 - head/sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 10:51:32 2012
New Revision: 241779
URL: http://svn.freebsd.org/changeset/base/241779

Log:
  Tidy up somaxconn (accept queue limit) and related functions
  and move it together into one place.

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 10:34:55 2012(r241778)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 10:51:32 2012(r241779)
@@ -182,15 +182,37 @@ MALLOC_DEFINE(M_PCB, pcb, protocol co
VNET_ASSERT(curvnet != NULL,\
("%s:%d curvnet is NULL, so=%p", __func__, __LINE__, (so)));
 
+/*
+ * Limit on the number of connections in the listen queue waiting
+ * for accept(2).
+ */
 static int somaxconn = SOMAXCONN;
-static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS);
-/* XXX: we dont have SYSCTL_USHORT */
+
+static int
+sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
+{
+   int error;
+   int val;
+
+   val = somaxconn;
+   error = sysctl_handle_int(oidp, &val, 0, req);
+   if (error || !req->newptr )
+   return (error);
+
+   if (val < 1 || val > USHRT_MAX)
+   return (EINVAL);
+
+   somaxconn = val;
+   return (0);
+}
 SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
-0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection "
-"queue size");
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size");
+
 static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
 &numopensockets, 0, "Number of open sockets");
+
 #ifdef ZERO_COPY_SOCKETS
 /* These aren't static because they're used in other files. */
 int so_zero_copy_send = 1;
@@ -3269,24 +3291,6 @@ socheckuid(struct socket *so, uid_t uid)
return (0);
 }
 
-static int
-sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
-{
-   int error;
-   int val;
-
-   val = somaxconn;
-   error = sysctl_handle_int(oidp, &val, 0, req);
-   if (error || !req->newptr )
-   return (error);
-
-   if (val < 1 || val > USHRT_MAX)
-   return (EINVAL);
-
-   somaxconn = val;
-   return (0);
-}
-
 /*
  * These functions are used by protocols to notify the socket layer (and its
  * consumers) of state changes in the sockets driven by protocol-side events.
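The validation in sysctl_somaxconn() only accepts values from 1 to USHRT_MAX. The removed "XXX: we dont have SYSCTL_USHORT" comment hints that the limit is meant to fit in an unsigned short. The following userland restatement covers just that check. `set_accept_queue_limit()` and `sketch_somaxconn` are hypothetical names, and the initial value of 128 is an assumed stand-in for SOMAXCONN.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for somaxconn; 128 assumed for SOMAXCONN. */
static int sketch_somaxconn = 128;

/*
 * Mirrors the new-value path of sysctl_somaxconn(): reject values
 * outside 1..USHRT_MAX with EINVAL, otherwise store the new limit.
 */
static int
set_accept_queue_limit(int val)
{
	if (val < 1 || val > USHRT_MAX)
		return (EINVAL);
	sketch_somaxconn = val;
	return (0);
}
```

A rejected value leaves the previous limit untouched, just as the kernel handler returns before assigning somaxconn.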


svn commit: r241781 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 12:53:14 2012
New Revision: 241781
URL: http://svn.freebsd.org/changeset/base/241781

Log:
  Hide the unfortunately named sysctl kern.ipc.somaxconn from sysctl -a
  output and replace it with a new visible sysctl kern.ipc.soacceptqueue
  of the same functionality.  It specifies the maximum length of the
  accept queue on a listen socket.

  The old kern.ipc.somaxconn remains available for reading and writing
  for compatibility reasons so that existing programs, scripts and
  configurations continue to work.  There are no plans to ever remove the
  original and now hidden kern.ipc.somaxconn.

Modified:
  head/lib/libc/sys/listen.2
  head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2  Sat Oct 20 12:07:48 2012(r241780)
+++ head/lib/libc/sys/listen.2  Sat Oct 20 12:53:14 2012(r241781)
@@ -28,7 +28,7 @@
 .\"     From: @(#)listen.2  8.2 (Berkeley) 12/11/93
 .\" $FreeBSD$
 .\"
-.Dd August 29, 2005
+.Dd October 20, 2012
 .Dt LISTEN 2
 .Os
 .Sh NAME
@@ -102,15 +102,15 @@ of service attacks are no longer necessa
 The
 .Xr sysctl 3
 MIB variable
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
 specifies a hard limit on
 .Fa backlog ;
 if a value greater than
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
 or less than zero is specified,
 .Fa backlog
 is silently forced to
-.Va kern.ipc.somaxconn .
+.Va kern.ipc.soacceptqueue .
 .Sh INTERACTION WITH ACCEPT FILTERS
 When accept filtering is used on a socket, a second queue will
 be used to hold sockets that have connected, but have not yet
@@ -168,3 +168,17 @@ at run-time, and to use a negative
 .Fa backlog
 to request the maximum allowable value, was introduced in
 .Fx 2.2 .
+The
+.Va kern.ipc.somaxconn
+.Xr sysctl 3
+has been replaced with
+.Va kern.ipc.soacceptqueue
+in
+.Fx 10.0
+to prevent confusion its actual functionality.
+The original
+.Xr sysctl 3
+.Va kern.ipc.somaxconn
+is still available but hidden from a
+.Xr sysctl 3
+-a output so that existing applications and scripts continue to work.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 12:07:48 2012(r241780)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 12:53:14 2012(r241781)
@@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, pcb, protocol co
 /*
  * Limit on the number of connections in the listen queue waiting
  * for accept(2).
+ * NB: The orginal sysctl somaxconn is still available but hidden
+ * to prevent confusion about the actually purpose of this number.
  */
 static int somaxconn = SOMAXCONN;
 
@@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
somaxconn = val;
return (0);
 }
-SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
+SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW,
 0, sizeof(int), sysctl_somaxconn, "I",
 "Maximum listen socket pending connection accept queue size");
+SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
+CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP,
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size (compat)");
 
 static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
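The listen(2) text changed in this commit says that a negative backlog, or one larger than kern.ipc.soacceptqueue, is silently replaced by the limit. Here is a minimal sketch of that rule. `clamp_backlog()` is a hypothetical helper, not the kernel's code. On FreeBSD 10.0 and later the limit itself can be inspected with `sysctl kern.ipc.soacceptqueue`. The hidden kern.ipc.somaxconn reads the same value.

```c
#include <assert.h>

/*
 * Hypothetical helper: apply the listen(2) rule that a negative
 * backlog, or one above the kern.ipc.soacceptqueue limit, is
 * silently forced to that limit.
 */
static int
clamp_backlog(int backlog, int soacceptqueue)
{
	assert(soacceptqueue >= 1);	/* the sysctl rejects values < 1 */
	if (backlog < 0 || backlog > soacceptqueue)
		backlog = soacceptqueue;
	return (backlog);
}
```

This is why passing -1 to listen(2) is a portable way of asking for "as large as the system allows".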


svn commit: r241789 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann
Author: andre
Date: Sat Oct 20 19:38:22 2012
New Revision: 241789
URL: http://svn.freebsd.org/changeset/base/241789

Log:
  Grammar fixes to r241781.
  
  Submitted by: alc

Modified:
  head/lib/libc/sys/listen.2
  head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2  Sat Oct 20 18:13:20 2012(r241788)
+++ head/lib/libc/sys/listen.2  Sat Oct 20 19:38:22 2012(r241789)
@@ -175,7 +175,7 @@ has been replaced with
 .Va kern.ipc.soacceptqueue
 in
 .Fx 10.0
-to prevent confusion its actual functionality.
+to prevent confusion about its actual functionality.
 The original
 .Xr sysctl 3
 .Va kern.ipc.somaxconn

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 18:13:20 2012(r241788)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 19:38:22 2012(r241789)
@@ -186,7 +186,7 @@ MALLOC_DEFINE(M_PCB, pcb, protocol co
  * Limit on the number of connections in the listen queue waiting
  * for accept(2).
  * NB: The orginal sysctl somaxconn is still available but hidden
- * to prevent confusion about the actually purpose of this number.
+ * to prevent confusion about the actual purpose of this number.
  */
 static int somaxconn = SOMAXCONN;
 


Re: svn commit: r241781 - in head: lib/libc/sys sys/kern

2012-10-20 Thread Andre Oppermann

On 20.10.2012 19:23, Alan Cox wrote:

There are a couple of minor grammar issues in the text.  See below.


Thank you. Fixed in r241789.

--
Andre


Alan

On 10/20/2012 07:53, Andre Oppermann wrote:

Author: andre
Date: Sat Oct 20 12:53:14 2012
New Revision: 241781
URL: http://svn.freebsd.org/changeset/base/241781

Log:
   Hide the unfortunately named sysctl kern.ipc.somaxconn from sysctl -a
   output and replace it with a new visible sysctl kern.ipc.soacceptqueue
   of the same functionality.  It specifies the maximum length of the
   accept queue on a listen socket.

   The old kern.ipc.somaxconn remains available for reading and writing
   for compatibility reasons so that existing programs, scripts and
   configurations continue to work.  There are no plans to ever remove the
   original and now hidden kern.ipc.somaxconn.

Modified:
   head/lib/libc/sys/listen.2
   head/sys/kern/uipc_socket.c

Modified: head/lib/libc/sys/listen.2
==
--- head/lib/libc/sys/listen.2 Sat Oct 20 12:07:48 2012(r241780)
+++ head/lib/libc/sys/listen.2 Sat Oct 20 12:53:14 2012(r241781)
@@ -28,7 +28,7 @@
  .\" From: @(#)listen.2 8.2 (Berkeley) 12/11/93
  .\" $FreeBSD$
  .\"
-.Dd August 29, 2005
+.Dd October 20, 2012
  .Dt LISTEN 2
  .Os
  .Sh NAME
@@ -102,15 +102,15 @@ of service attacks are no longer necessa
  The
  .Xr sysctl 3
  MIB variable
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
  specifies a hard limit on
  .Fa backlog ;
  if a value greater than
-.Va kern.ipc.somaxconn
+.Va kern.ipc.soacceptqueue
  or less than zero is specified,
  .Fa backlog
  is silently forced to
-.Va kern.ipc.somaxconn .
+.Va kern.ipc.soacceptqueue .
  .Sh INTERACTION WITH ACCEPT FILTERS
  When accept filtering is used on a socket, a second queue will
  be used to hold sockets that have connected, but have not yet
@@ -168,3 +168,17 @@ at run-time, and to use a negative
  .Fa backlog
  to request the maximum allowable value, was introduced in
  .Fx 2.2 .
+The
+.Va kern.ipc.somaxconn
+.Xr sysctl 3
+has been replaced with
+.Va kern.ipc.soacceptqueue
+in
+.Fx 10.0
+to prevent confusion its actual functionality.


There is a missing word here: ... confusion about its ...


+The original
+.Xr sysctl 3
+.Va kern.ipc.somaxconn
+is still available but hidden from a
+.Xr sysctl 3
+-a output so that existing applications and scripts continue to work.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Sat Oct 20 12:07:48 2012(r241780)
+++ head/sys/kern/uipc_socket.c Sat Oct 20 12:53:14 2012(r241781)
@@ -185,6 +185,8 @@ MALLOC_DEFINE(M_PCB, "pcb", "protocol co
  /*
   * Limit on the number of connections in the listen queue waiting
   * for accept(2).
+ * NB: The orginal sysctl somaxconn is still available but hidden
+ * to prevent confusion about the actually purpose of this number.


actually should be actual.


   */
  static int somaxconn = SOMAXCONN;

@@ -205,9 +207,13 @@ sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
  somaxconn = val;
  return (0);
  }
-SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
+SYSCTL_PROC(_kern_ipc, OID_AUTO, soacceptqueue, CTLTYPE_UINT | CTLFLAG_RW,
  0, sizeof(int), sysctl_somaxconn, "I",
  "Maximum listen socket pending connection accept queue size");
+SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
+CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_SKIP,
+0, sizeof(int), sysctl_somaxconn, "I",
+"Maximum listen socket pending connection accept queue size (compat)");

  static int numopensockets;
  SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,

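The backlog rule in the listen(2) text quoted above reduces to a single comparison. A minimal sketch, where `effective_backlog()` is a hypothetical helper that models the documented behavior rather than the kernel's own code:

```c
/*
 * Hypothetical model of the documented listen(2) rule: a backlog greater
 * than the kern.ipc.soacceptqueue limit, or less than zero, is silently
 * forced to that limit.  A negative backlog therefore requests the
 * maximum allowable value.
 */
static int
effective_backlog(int backlog, int soacceptqueue)
{
	if (backlog < 0 || backlog > soacceptqueue)
		return (soacceptqueue);
	return (backlog);
}
```

A program that wants to know the current limit can read it on FreeBSD with sysctlbyname(3), under kern.ipc.soacceptqueue from 10.0 on, or under the hidden compatibility name kern.ipc.somaxconn.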


svn commit: r241892 - head/sys/mips/conf

2012-10-22 Thread Andre Oppermann
Author: andre
Date: Mon Oct 22 15:04:23 2012
New Revision: 241892
URL: http://svn.freebsd.org/changeset/base/241892

Log:
  Remove ZERO_COPY_SOCKETS from kernel configuration as the current
  COW based approach is not safe and should not be used in production.

Modified:
  head/sys/mips/conf/RT305X

Modified: head/sys/mips/conf/RT305X
==
--- head/sys/mips/conf/RT305X   Mon Oct 22 14:48:14 2012(r241891)
+++ head/sys/mips/conf/RT305X   Mon Oct 22 15:04:23 2012(r241892)
@@ -86,7 +86,6 @@ options   SCSI_NO_OP_STRINGS
 options         RWLOCK_NOINLINE
 options         SX_NOINLINE
 options         NO_SWAPPING
-options         ZERO_COPY_SOCKETS
 options         MROUTING        # Multicast routing
 options         IPFIREWALL_DEFAULT_TO_ACCEPT
 


Re: svn commit: r241923 - in head/sys: netinet netipsec

2012-10-23 Thread Andre Oppermann

On 23.10.2012 10:33, Gleb Smirnoff wrote:

Author: glebius
Date: Tue Oct 23 08:33:13 2012
New Revision: 241923
URL: http://svn.freebsd.org/changeset/base/241923

Log:
 Do not reduce ip_len by size of IP header in the ip_input()
   before passing a packet to protocol input routines.
 For several protocols this mean that now protocol needs to
   do subtraction itself, and for another half this means that
   we do not need to add header length back to the packet.


Yay! More Mammoth shit getting washed away! ;)

Please add an entry to UPDATING, as the convention of ip_len
subtraction has been there since forever.  That makes it easier
for third parties writing code to discover.

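With r241923 a protocol input routine receives ip_len including the IP header and has to subtract the header length itself. A minimal sketch of that subtraction; `ip_payload_len()` is a hypothetical helper, and it assumes ip_len is kept in network byte order, as in the stack after the 20121023 byte-order conversion noted in UPDATING:

```c
#include <arpa/inet.h>	/* ntohs(), htons() */
#include <stdint.h>

/*
 * Hypothetical helper: payload length seen by a protocol input routine
 * now that ip_input() no longer subtracts the header from ip_len.
 * ip_len_net is assumed to be in network byte order; ip_hl is the
 * header length in 32-bit words.  Returns -1 for inconsistent lengths.
 */
static int
ip_payload_len(uint16_t ip_len_net, unsigned int ip_hl)
{
	int total = ntohs(ip_len_net);
	int hlen = (int)(ip_hl << 2);

	if (hlen < 20 || hlen > total)	/* 20 bytes is the minimum IPv4 header */
		return (-1);
	return (total - hlen);
}
```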
--
Andre



svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 14:19:44 2012
New Revision: 241931
URL: http://svn.freebsd.org/changeset/base/241931

Log:
  Replace the ill-named ZERO_COPY_SOCKETS kernel option with two
  more appropriately named kernel options for the very distinct
  send and receive paths.
  
  options SOCKET_SEND_COW enables VM page copy-on-write based
  sending of data on an outbound socket.
  
  NB: The COW based send mechanism is not safe and may result
  in kernel crashes.
  
  options SOCKET_RECV_PFLIP enables VM kernel/userspace page
  flipping for special disposable pages attached as external
  storage to mbufs.
  
  Only the naming of the kernel options is changed and their
  corresponding #ifdef sections are adjusted.  No functionality
  is added or removed.
  
  Discussed with:   alc (mechanism and limitations of send side COW)

Modified:
  head/sys/conf/NOTES
  head/sys/conf/options
  head/sys/kern/subr_uio.c
  head/sys/kern/uipc_socket.c

Modified: head/sys/conf/NOTES
==
--- head/sys/conf/NOTES Tue Oct 23 12:39:17 2012(r241930)
+++ head/sys/conf/NOTES Tue Oct 23 14:19:44 2012(r241931)
@@ -964,12 +964,20 @@ options   TCP_SIGNATURE   #include support
 # a smooth scheduling of the traffic.
 options         DUMMYNET
 
-# Zero copy sockets support.  This enables zero copy for sending and
-# receiving data via a socket.  The send side works for any type of NIC,
-# the receive side only works for NICs that support MTUs greater than the
-# page size of your architecture and that support header splitting.  See
-# zero_copy(9) for more details.
-options         ZERO_COPY_SOCKETS
+# Zero copy sockets support is split into the send and receive path
+# which operate very differently.
+# For the send path the VM page with the data is wired into the kernel
+# and marked as COW (copy-on-write).  If the application touches the
+# data while it is still in the send socket buffer the page is copied
+# and divorced from its kernel wiring (no longer zero copy).
+# The receive side requires explicit NIC driver support to create
+# disposable pages which are flipped from kernel to user-space VM.
+# See zero_copy(9) for more details.
+# XXX: The COW based send mechanism is not safe and may result in
+# kernel crashes.
+# XXX: None of the current NIC drivers support disposeable pages.
+options         SOCKET_SEND_COW
+options         SOCKET_RECV_PFLIP
 
 #
 # FILESYSTEM OPTIONS

Modified: head/sys/conf/options
==
--- head/sys/conf/options   Tue Oct 23 12:39:17 2012(r241930)
+++ head/sys/conf/options   Tue Oct 23 14:19:44 2012(r241931)
@@ -520,7 +520,8 @@ NGATM_CCATM opt_netgraph.h
 # DRM options
 DRM_DEBUG  opt_drm.h
 
-ZERO_COPY_SOCKETS  opt_zero.h
+SOCKET_SEND_COW    opt_zero.h
+SOCKET_RECV_PFLIP  opt_zero.h
 TI_SF_BUF_JUMBO    opt_ti.h
 TI_JUMBO_HDRSPLIT  opt_ti.h
 BCE_JUMBO_HDRSPLIT opt_bce.h

Modified: head/sys/kern/subr_uio.c
==
--- head/sys/kern/subr_uio.c Tue Oct 23 12:39:17 2012(r241930)
+++ head/sys/kern/subr_uio.c Tue Oct 23 14:19:44 2012(r241931)
@@ -57,7 +57,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_extern.h>
 #include <vm/vm_page.h>
 #include <vm/vm_map.h>
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_SEND_COW
 #include <vm/vm_object.h>
 #endif
 
@@ -66,7 +66,7 @@ SYSCTL_INT(_kern, KERN_IOV_MAX, iov_max,
 
 static int uiomove_faultflag(void *cp, int n, struct uio *uio, int nofault);
 
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_SEND_COW
 /* Declared in uipc_socket.c */
 extern int so_zero_copy_receive;
 
@@ -128,7 +128,7 @@ retry:
vm_map_lookup_done(map, entry);
return(KERN_SUCCESS);
 }
-#endif /* ZERO_COPY_SOCKETS */
+#endif /* SOCKET_SEND_COW */
 
 int
 copyin_nofault(const void *udaddr, void *kaddr, size_t len)
@@ -261,7 +261,7 @@ uiomove_frombuf(void *buf, int buflen, s
return (uiomove((char *)buf + offset, n, uio));
 }
 
-#ifdef ZERO_COPY_SOCKETS
+#ifdef SOCKET_RECV_PFLIP
 /*
  * Experimental support for zero-copy I/O
  */
@@ -356,7 +356,7 @@ uiomoveco(void *cp, int n, struct uio *u
}
return (0);
 }
-#endif /* ZERO_COPY_SOCKETS */
+#endif /* SOCKET_RECV_PFLIP */
 
 /*
  * Give next character to user as result of read.

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Tue Oct 23 12:39:17 2012(r241930)
+++ head/sys/kern/uipc_socket.c Tue Oct 23 14:19:44 2012(r241931)
@@ -219,17 +219,20 @@ static int numopensockets;
 SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
 &numopensockets, 0, "Number of open sockets");
 
-#ifdef 

svn commit: r241932 - head/share/man/man9

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 14:25:37 2012
New Revision: 241932
URL: http://svn.freebsd.org/changeset/base/241932

Log:
  Update zero_copy(9) man page to note the renamed kernel options
  and to warn about unsafeness of COW based sends.

Modified:
  head/share/man/man9/zero_copy.9

Modified: head/share/man/man9/zero_copy.9
==
--- head/share/man/man9/zero_copy.9 Tue Oct 23 14:19:44 2012(r241931)
+++ head/share/man/man9/zero_copy.9 Tue Oct 23 14:25:37 2012(r241932)
@@ -25,7 +25,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd December 5, 2004
+.Dd October 23, 2012
 .Dt ZERO_COPY 9
 .Os
 .Sh NAME
@@ -33,7 +33,8 @@
 .Nm zero_copy_sockets
 .Nd zero copy sockets code
 .Sh SYNOPSIS
-.Cd options ZERO_COPY_SOCKETS
+.Cd options SOCKET_SEND_COW
+.Cd options SOCKET_RECV_PFLIP
 .Sh DESCRIPTION
 The
 .Fx
@@ -155,6 +156,8 @@ variables respectively.
 .Xr sendfile 2 ,
 .Xr socket 2 ,
 .Xr ti 4
+.Sh BUGS
+The COW based send mechanism is not safe and may result in kernel crashes.
 .Sh HISTORY
 The zero copy sockets code first appeared in
 .Fx 5.0 ,


Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 16:42, Gleb Smirnoff wrote:

On Tue, Oct 23, 2012 at 02:19:45PM +, Andre Oppermann wrote:
A Author: andre
A Date: Tue Oct 23 14:19:44 2012
A New Revision: 241931
A URL: http://svn.freebsd.org/changeset/base/241931
A
A Log:
A   Replace the ill-named ZERO_COPY_SOCKET kernel option with two
A   more appropriate named kernel options for the very distinct
A   send and receive path.
A
A   options SOCKET_SEND_COW enables VM page copy-on-write based
A   sending of data on an outbound socket.
A
A   NB: The COW based send mechanism is not safe and may result
A   in kernel crashes.
A
A   options SOCKET_RECV_PFLIP enables VM kernel/userspace page
A   flipping for special disposable pages attached as external
A   storage to mbufs.
A
A   Only the naming of the kernel options is changed and their
A   corresponding #ifdef sections are adjusted.  No functionality
A   is added or removed.
A
A   Discussed with: alc (mechanism and limitations of send side COW)

Users may call this a pointless POLA violation. IMO, the old
kernel option that we had for years, more than a decade, should remain
and just imply two new kernel options.
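The compatibility mapping suggested here would amount to a few preprocessor lines. A sketch, assuming the options end up as #defines in opt_zero.h as the conf/options hunk above shows; the `#define ZERO_COPY_SOCKETS` line merely simulates a legacy config, and the two helper functions are hypothetical:

```c
#include <stdbool.h>

/* Simulate a legacy kernel config that still says "options ZERO_COPY_SOCKETS". */
#define ZERO_COPY_SOCKETS 1

/*
 * Proposed (not adopted) compatibility mapping: the old option implies
 * both new ones.  In a real build these would come from opt_zero.h.
 */
#ifdef ZERO_COPY_SOCKETS
#ifndef SOCKET_SEND_COW
#define SOCKET_SEND_COW 1	/* NB: also enables the unsafe COW send path */
#endif
#ifndef SOCKET_RECV_PFLIP
#define SOCKET_RECV_PFLIP 1
#endif
#endif

static bool
send_cow_enabled(void)
{
#ifdef SOCKET_SEND_COW
	return (true);
#else
	return (false);
#endif
}

static bool
recv_pflip_enabled(void)
{
#ifdef SOCKET_RECV_PFLIP
	return (true);
#else
	return (false);
#endif
}
```

The shim keeps old configs building, but it would silently turn on SOCKET_SEND_COW, the mechanism the commit log says may crash the kernel, which is the reason the reply gives for not keeping the old name.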


There shouldn't be any users.  Zero copy send is broken and
responsible for random kernel crashes.  Zero copy receive isn't
supported by any modern driver.  Both are useless to dangerous.

The main problem with ZERO_COPY_SOCKETS was that it sounded great
and who wouldn't want to have zero copy sockets?  Unfortunately
it doesn't work that way.

According to alc@, even if zero copy send worked it wouldn't
be faster, because setting up page-based COW is a very expensive
operation.  Eventually he wants page-based COW to go away.

For zero copy send we're trying to come up with a sendfile-like
approach where the page is simply wired into kernel space.  The
application then is not allowed to touch it until the socket
buffer has released it again.  The main issue here is how to
provide feedback to the application when it is safe for reuse.

For zero copy receive, np@ has contacted me about finding a way
to integrate DDP into the socket buffer layer.  I'm trying to work
out something that isn't too horrible.  A generic approach would
hinge on page-sized mbufs, though.

--
Andre



Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 17:11, David Chisnall wrote:

On 23 Oct 2012, at 16:05, Andre Oppermann wrote:


For zero copy send we're trying to come up with a sendfile-like
approach where the page is simply wired into kernel space.  The
application then is not allowed to touch it until the socket
buffer has released it again.  The main issue here is how to
provide feedback to the application when it is safe for reuse.


It's been a few years since I used it, but I thought that aio_write() already 
provided this.  The application may not modify the contents of the memory 
pointed to by aio_buf until after it has received notification that the write 
has finished.  This happens either via a signal directly, a signal polled by 
kqueue, or a call to aio_return().


Indeed, that's one of the ways being explored.  It requires the
explicit cooperation of the application.  I don't think there is
any way around that.
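The aio_write() contract described here can be sketched as follows. This is a minimal example assuming a POSIX system with `<aio.h>` (older glibc releases need `-lrt` at link time); `aio_write_and_wait()` and `demo_write_to_tempfile()` are hypothetical helper names, and a temporary file stands in for a socket to keep the example portable:

```c
#define _POSIX_C_SOURCE 200809L

#include <aio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue an asynchronous write and wait for it to finish.  Between
 * aio_write() and aio_return() the caller must not modify buf: this is
 * the explicit cooperation the application has to provide.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t
aio_write_and_wait(int fd, volatile void *buf, size_t len, off_t off)
{
	struct aiocb cb;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	cb.aio_offset = off;

	if (aio_write(&cb) == -1)
		return (-1);

	/* Other work may happen here, but buf must be left untouched. */

	const struct aiocb *list[1] = { &cb };
	while (aio_error(&cb) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);	/* sleep until completion */

	/* Reap the result; only after this may buf be reused. */
	return (aio_return(&cb));
}

/* Usage example: write msg to a fresh, already unlinked temporary file. */
static ssize_t
demo_write_to_tempfile(char *msg)
{
	char path[] = "/tmp/aio_sketch_XXXXXX";
	int fd = mkstemp(path);
	ssize_t n;

	if (fd == -1)
		return (-1);
	(void)unlink(path);	/* the open descriptor stays usable */
	n = aio_write_and_wait(fd, msg, strlen(msg), 0);
	(void)close(fd);
	return (n);
}
```

Waiting right after queueing defeats the purpose in a real program; the point is only where the "do not touch buf" window opens and closes.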

--
Andre



svn commit: r241955 - head

2012-10-23 Thread Andre Oppermann
Author: andre
Date: Tue Oct 23 16:33:43 2012
New Revision: 241955
URL: http://svn.freebsd.org/changeset/base/241955

Log:
  Note the removal of the ZERO_COPY_SOCKETS kernel option in r241931
  and provide a proper explanation.

Modified:
  head/UPDATING

Modified: head/UPDATING
==
--- head/UPDATING   Tue Oct 23 16:12:17 2012(r241954)
+++ head/UPDATING   Tue Oct 23 16:33:43 2012(r241955)
@@ -25,6 +25,17 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10
ln -s 'abort:false,junk:false' /etc/malloc.conf.)
 
 20121023:
+   The ZERO_COPY_SOCKET kernel option has been removed and
+   split into SOCKET_SEND_COW and SOCKET_RECV_PFLIP.
+   NB: SOCKET_SEND_COW uses the VM page based copy-on-write
+   mechanism which is not safe and may result in kernel crashes.
+   NB: The SOCKET_RECV_PFLIP mechanism is useless as no current
+   driver supports disposeable external page sized mbuf storage.
+   Proper replacements for both zero-copy mechanisms are under
+   consideration and will eventually lead to complete removal
+   of the two kernel options.
+
+20121023:
The IPv4 network stack has been converted to network byte
order. The following modules need to be recompiled together
with kernel: carp(4), divert(4), gif(4), siftr(4), gre(4),


Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 17:21, Bryan Drewery wrote:

On 10/23/2012 10:05 AM, Andre Oppermann wrote:

There shouldn't be any users.  Zero copy send is broken and
responsible for random kernel crashes.  Zero copy receive isn't
supported by any modern driver.  Both are useless to dangerous.


I enabled this a few weeks ago, not knowing it was useless/dangerous.

Perhaps an entry in UPDATING to note that this has been renamed and that
it may not actually be useful?


Good idea.  Will do.


Also, zero_copy(9) needs updating, as it references ZERO_COPY_SOCKETS.


Already done in r241932.

--
Andre




Re: svn commit: r241931 - in head/sys: conf kern

2012-10-23 Thread Andre Oppermann

On 23.10.2012 18:05, Gleb Smirnoff wrote:

On Tue, Oct 23, 2012 at 05:05:48PM +0200, Andre Oppermann wrote:
A There shouldn't be any users.  Zero copy send is broken and
A responsible for random kernel crashes.  Zero copy receive isn't
A supported by any modern driver.  Both are useless to dangerous.
A
A The main problem with ZERO_COPY_SOCKETS was that it sounded great
A and who wouldn't want to have zero copy sockets?  Unfortunately
A it doesn't work that way.

Okay, it appeared that there are users, even on current@ mailing
list during couple of hours of exposition.

Can we keep the old option as compatibility?


No.  They are not users.  They simply fell for the promise of
zero copy, which it isn't.  It doesn't do what the users
believe it does.  It's useless for receive and dangerous for send.

I have updated NOTES and forwarded it to -current.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 20:56, Jim Harris wrote:

On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote:

On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote:


   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.


Ok, but..



 struct mtx  tdq_lock;   /* run queue lock. */
+   char    pad[64 - sizeof(struct mtx)];


.. don't we have an existing compile time macro for the cache line
size, which can be used here?


Yes, but I didn't use it for a couple of reasons:

1) struct tdq itself is currently using __aligned(64), so I wanted to
keep it consistent.
2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
NetBurst-based processors having 128-byte cache sectors a while back.
I had planned to start a separate thread on arch@ about this today on
whether this was still appropriate.
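The manual padding in the patch can be checked at compile time. A sketch using stand-ins: `struct fake_mtx` and `struct tdq_sketch` are hypothetical and only approximate struct mtx and struct tdq; 64 is the line size that the existing __aligned(64) on struct tdq already assumes, not CACHE_LINE_SIZE.

```c
#include <stddef.h>

/* Stand-in for struct mtx; the real layout differs, only its size matters. */
struct fake_mtx {
	volatile unsigned long	 mtx_lock;
	const char		*name;
	unsigned long		 flags;
};

#define TDQ_LINE	64	/* the value used by __aligned(64) on struct tdq */

/* A lock of 64 bytes or more would make the pad array invalid. */
_Static_assert(sizeof(struct fake_mtx) < TDQ_LINE,
    "lock does not fit in one cache line");

struct tdq_sketch {
	struct fake_mtx	tdq_lock;	/* run queue lock */
	char		pad[TDQ_LINE - sizeof(struct fake_mtx)];
	volatile int	tdq_load;	/* read by CPU searches */
	volatile int	tdq_cpu_idle;
} __attribute__((aligned(TDQ_LINE)));

/* The fields read by CPU searches start exactly on the next line. */
_Static_assert(offsetof(struct tdq_sketch, tdq_load) == TDQ_LINE,
    "tdq_load shares a cache line with tdq_lock");
```

The second assertion catches a later edit that inserts a field between the lock and tdq_load, which the pad alone would not.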


See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
the future possibly change to a different compiler dependent
align attribute
 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
automatically gets aligned in all cases, even when dynamically
allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:49, Jim Harris wrote:

On Wed, Oct 24, 2012 at 12:16 PM, Andre Oppermann an...@freebsd.org wrote:

snip



See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
 the future possibly change to a different compiler dependent
 align attribute
  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
 automatically gets aligned in all cases, even when dynamically
 allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.


I've run my same tests with #3 as you describe, and I did see further
noticeable improvement.  I had a difficult time though quantifying the
effect it would have on all of the different architectures.  Putting
it in ULE's tdq gained 60-70% of the overall benefit, and was well
contained.


I just experimented with different specifications of alignment
and couldn't get the globals aligned at all.  This seems to be
because of the linker not understanding or not getting passed
the alignment information when linking the kernel.


I agree that sprinkling all over the place isn't pretty.  But focused
investigations into specific locks (spin mutexes, default mutexes,
whatever) may find a few key additional ones that would benefit.  I
started down this path with the sleepq and turnstile locks, but none
of those specifically showed noticeable improvement (at least in the
tests I was running).  There's still some additional ones I want to
look at, but haven't had the time yet.


This runs a serious risk of optimizing for today's architectures
and then needing rejiggering every five years, just as you've
noticed with the 128B alignment left over from the NetBurst days.
We never know how the next micro-architecture will behave.
Micro-optimizing each individual use of common building blocks
is the wrong path to take.

I'd very much prefer the alignment *and* padding control to be done
in one place for all of them, either through a magic macro or compiler
__attribute__(whatever).

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:06, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris jim.har...@gmail.com wrote:

On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin j...@freebsd.org wrote:

On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote:

Author: jimharris
Date: Wed Oct 24 18:36:41 2012
New Revision: 242014
URL: http://svn.freebsd.org/changeset/base/242014

Log:
   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.

   This enables CPU searches (which read tdq_load) to operate independently
   of any contention on the spinlock.  Some scheduler-intensive workloads
   running on an 8C single-socket SNB Xeon show considerable improvement with
   this change (2-3% perf improvement, 5-6% decrease in CPU util).

   Sponsored by:   Intel
   Reviewed by:jeff

Modified:
   head/sys/kern/sched_ule.c

Modified: head/sys/kern/sched_ule.c


==

--- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013)
+++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014)
@@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1;
   * locking in sched_pickcpu();
   */
  struct tdq {
- /* Ordered to improve efficiency of cpu_search() and switch(). */
+ /*
+  * Ordered to improve efficiency of cpu_search() and switch().
+  * tdq_lock is padded to avoid false sharing with tdq_load and
+  * tdq_cpu_idle.
+  */
   struct mtx  tdq_lock;   /* run queue lock. */
+ char    pad[64 - sizeof(struct mtx)];


Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead?



No - that doesn't pad it.  I believe that only works if it's global,
i.e. not part of a data structure.


As I've already said in another thread __align() doesn't work on
object declaration, so what that won't pad it either if it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html


Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 21:30, Alexander Motin wrote:

On 24.10.2012 22:16, Andre Oppermann wrote:

On 24.10.2012 20:56, Jim Harris wrote:

On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org
wrote:

On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote:


   Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.


Ok, but..



 struct mtx  tdq_lock;   /* run queue lock. */
+   char    pad[64 - sizeof(struct mtx)];


.. don't we have an existing compile time macro for the cache line
size, which can be used here?


Yes, but I didn't use it for a couple of reasons:

1) struct tdq itself is currently using __aligned(64), so I wanted to
keep it consistent.
2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
NetBurst-based processors having 128-byte cache sectors a while back.
I had planned to start a separate thread on arch@ about this today on
whether this was still appropriate.


See also the discussion on svn-src-all regarding global struct mtx
alignment.

Thank you for proving my point. ;)

Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
 the future possibly change to a different compiler dependent
 align attribute
  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
 automatically gets aligned in all cases, even when dynamically
 allocated.

Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
of #3 is that there possibly isn't any case where you'd actually
want the mutex to share a cache line with anything else, even a data
structure.


I'm sorry, could you hint me with some theory? I think I can agree that
cache line sharing can be a problem in case of spin locks -- waiting
thread will constantly try to access page modified by other CPU, that I
guess will cause cache line writes to the RAM. But why is it so bad to
share lock with respective data in case of non-spin locks? Won't
benefits from free regular prefetch of the right data while grabbing
lock compensate penalties from relatively rare collisions?


Cliff Click describes it in detail:
 http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends

For a classic mutex it likely doesn't make much difference since the
cache line is exclusive anyway while the lock is held.  On LL/SC systems
there may be cache line dirtying on a failed locking attempt.

For spin mutexes it hurts badly as you noted.

It hurts especially on RW mutexes, because even taking a read lock
writes to the lock word and so dirties the cache line for all other
CPUs.  Here the RW mutex should be on its own cache line in all cases.

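The point about read locks can be seen in a toy lock word: taking a read lock is a store, not a load. A hypothetical sketch using C11 atomics; `struct rw_sketch`, `RW_WRITER` and the functions are invented and are not FreeBSD's rwlock(9) implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RW_WRITER	0x80000000u	/* writer-owned bit */

struct rw_sketch {
	_Atomic unsigned int	word;	/* writer bit + reader count */
};

static bool
rw_try_rlock(struct rw_sketch *rw)
{
	unsigned int v = atomic_load_explicit(&rw->word, memory_order_relaxed);

	if (v & RW_WRITER)
		return (false);
	/*
	 * Even a read acquisition stores to the lock word, so this CPU must
	 * own the cache line exclusively, invalidating every other copy.
	 * If another CPU changed the word first, the CAS fails and we
	 * simply report failure; a real lock would retry.
	 */
	return (atomic_compare_exchange_strong_explicit(&rw->word, &v, v + 1,
	    memory_order_acquire, memory_order_relaxed));
}

static void
rw_runlock(struct rw_sketch *rw)
{
	/* Dropping a read lock is a store as well. */
	atomic_fetch_sub_explicit(&rw->word, 1, memory_order_release);
}
```

Because every reader performs a read-modify-write on `word`, readers on different CPUs keep pulling the line away from each other, and any data sharing that line is dragged along with it.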
--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 22:29, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann an...@freebsd.org wrote:

On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread __align() doesn't work on
object declaration, so what that won't pad it either if it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html



Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.


Yes but the concept being that if you use __aligned() properly (when
defining a struct) the object will be correctly sized, so you will get
padding automatically.


Yes.  With __aligned() the start of the element/structure should
begin on an address evenly divisible by the align value *and* it
should pad out any remaining space up to the next evenly divisible
address.

The problem we have is that it apparently doesn't work correctly
within gcc when creating structs nor within the linker when placing
such supposedly aligned structs in the .bss section (at least the
padding is missing).

It seems to come down to either a) fixing gcc+ld; or b) hacking
around it by magically padding the structs that require it.

--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-24 Thread Andre Oppermann

On 24.10.2012 22:55, Andre Oppermann wrote:

On 24.10.2012 22:29, Attilio Rao wrote:

On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann an...@freebsd.org wrote:

On 24.10.2012 21:06, Attilio Rao wrote:

As I've already said in another thread __align() doesn't work on
object declaration, so what that won't pad it either if it is global
or part of a struct.
It is just implemented as __attribute__((aligned(X))):
http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html



Actually it seems gcc itself doesn't really care and it is up to the
linker to honor that.


Yes but the concept being that if you use __aligned() properly (when
defining a struct) the object will be correctly sized, so you will get
padding automatically.


Yes.  With __aligned() the start of the element/structure should
begin on an address evenly divisible by the align value *and* it
should pad out any remaining space up to the next evenly divisible
address.

The problem we have is that it apparently doesn't work correctly
within gcc when creating structs nor within the linker when placing
such supposedly aligned structs in the .bss section (at least the
padding is missing).


I spoke too soon.  Attilio is completely right in his assessment.

It does work when done on the struct definition:

struct mtx {
...
} __aligned(CACHE_LINE_SIZE);   /* works including .bss alignment & padding */

When creating a struct (in globals at least) it doesn't work:

struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx;  /* doesn't work */


It seems to come down to either a) fixing gcc+ld; or b) hacking
around it by magically padding the structs that require it.


The question now becomes whether we can (should?) make the latter
case above work or find another workaround.

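The difference between the two placements can be reproduced in userland, using `__attribute__((aligned(X)))`, which is what __aligned(X) expands to. A sketch with `LINE_SIZE`, `struct lock_plain` and `struct lock_typealigned` as hypothetical stand-ins for CACHE_LINE_SIZE and struct mtx: with a typical gcc or clang build the attribute on the type rounds sizeof up to a full line, while the attribute on the object only constrains its start address and leaves its size, and therefore its padding, unchanged.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE	64	/* stand-in for CACHE_LINE_SIZE */

/* Stand-in for a small lock structure without any attribute. */
struct lock_plain {
	volatile unsigned long	lock;
};

/* Attribute on the type definition: alignment AND size are rounded up. */
struct lock_typealigned {
	volatile unsigned long	lock;
} __attribute__((aligned(LINE_SIZE)));

/* Attribute on the object: only the start address is constrained. */
static struct lock_plain	obj_aligned __attribute__((aligned(LINE_SIZE)));
static struct lock_typealigned	type_aligned;

static int
is_line_aligned(const volatile void *p)
{
	return ((uintptr_t)p % LINE_SIZE == 0);
}
```

Whatever the linker places right after `obj_aligned` can still land in the same line, which is why putting the attribute into the struct definition itself gives padding everywhere, at the price of every instance growing to a full line.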
--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-25 Thread Andre Oppermann

On 25.10.2012 11:39, Andrey V. Elsukov wrote:

Author: ae
Date: Thu Oct 25 09:39:14 2012
New Revision: 242079
URL: http://svn.freebsd.org/changeset/base/242079

Log:
   Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
   on the related functionality in the runtime via the sysctl variable
   net.pfil.forward. It is turned off by default.

   Sponsored by:Yandex LLC
   Discussed with:  net@
   MFC after:   2 weeks


I still don't agree with naming the sysctl net.pfil.forward.  This
type of forwarding is a property of IPv4 and IPv6 and thus should
be put there.  Pfil hooks can sit on layer 2, layer 2 bridging, layer 3
and who knows where else in the future.  Forwarding works only for
IPv4 and IPv6.

You haven't even replied to my comment on net@.  Please change the
sysctl location and name to its appropriate place.

Also, an MFC after 2 weeks must ensure that compiling with
IPFIREWALL_FORWARD also enables the sysctl, so that existing kernel
configs in 9-stable keep working.

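Keeping 9-stable configs working could be as simple as deriving the default of the new runtime knob from the old option. A hypothetical sketch: `pfil_forward_enabled` is an invented name, and the `#define` at the top merely simulates a config that still contains `options IPFIREWALL_FORWARD`.

```c
/* Simulate a legacy kernel config that still sets the old option. */
#define IPFIREWALL_FORWARD 1

/*
 * The runtime switch that replaces the option defaults to on when the
 * old option is present, and stays off (the new default) otherwise.
 */
#ifdef IPFIREWALL_FORWARD
static int pfil_forward_enabled = 1;
#else
static int pfil_forward_enabled = 0;
#endif
```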
--
Andre



Re: svn commit: r242014 - head/sys/kern

2012-10-25 Thread Andre Oppermann

On 25.10.2012 05:49, Bruce Evans wrote:

On Wed, 24 Oct 2012, Attilio Rao wrote:


On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote:

...
Let's go back and see how we can do this the sanest way.  These are
the options I see at the moment:

 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place


This is wrong because it doesn't give padding.


Unless it is sprinkled in struct declarations.


 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
the future possibly change to a different compiler dependent
align attribute


What is this macro supposed to do? I don't understand that from your
description.


 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
automatically gets aligned in all cases, even when dynamically
allocated.


This works but I think it is overkill for structures including sleep
mutexes, which are the vast majority. So I certainly wouldn't be in
favor of such a patch.


This doesn't work either with fully dynamic (auto) allocations.  Stack
alignment is generally broken (limited, and pessimized for both space
and time) in gcc (it works better in clang).  On amd64, it is limited
by the default of -mpreferred-stack-boundary=4.  Since 2**4 is smaller
than the cache line size and stack alignments larger than it are broken
in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally,
16/CACHE_LINE_SIZE of the time).  On i386, we reduce the space/time
pessimizations a little by overriding the default to
-mpreferred-stack-boundary=2.  2**2 is even smaller than the cache
line size.  (The pessimizations are for both space and time, since
time and code space is wasted for the code to keep the stack aligned,
and cache space and thus also time are wasted for padding.  Most
functions don't benefit from more than sizeof(register_t) alignment.)


I'm not aware of stack allocated mutexes anywhere in the kernel.
Even if there is a case it's very special and unique.

I've verified that __aligned(CACHE_LINE_SIZE) on the definition of
struct mtx itself (in sys/_mutex.h) correctly aligns and pads the
global .bss resident mutexes for 64B and 128B cache line sizes.


Dynamic allocations via malloc() get whatever alignment malloc() gives.
This is only required to be 4 or 8 or 16 or so (the maximum for a C
object declared in conforming C (no __align())), but malloc() usually
gives more.  If it gives CACHE_LINE_SIZE, that is wasteful for most
small allocations.


Stand-alone mutexes are normally not malloc'ed.  They're always
embedded into some larger structure they protect.


__builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at
least on i386.  In gcc-3.3.3, it assumes that the stack is the default
16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to
say otherwise, and just subtracts from the stack pointer.  In gcc-4.2.1,
it does the necessary andl of the stack pointer, but only 16-byte
alignment.

It is another bug that there are no extensions of malloc() or alloca().
Since malloc() is in the library and may give CACHE_LINE_SIZE but
__builtin_alloca() is in the compiler and only gives 16, these functions
are not even as compatible as they should be.

I don't know of any mutexes allocated on the stack, but there are stack
frames with mcontexts in them that need special alignment so they cause
problems on i386.  They can't just be put on the stack due to the above
bugs.  They are laboriously allocated using malloc().  Since they are
quite large (one mcontext barely fits on the kernel stack), kib didn't
like my alloca() method for allocating them.


You lost me here.
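Bruce's point about malloc() alignment can be illustrated in userland. This is a sketch under stated assumptions, not kernel malloc(9). It uses C11 `aligned_alloc()` to request an assumed 64-byte cache-line boundary explicitly, since plain malloc() only promises alignment suitable for ordinary C objects. `cacheline_alloc()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. */
#define	CACHE_LINE_SIZE	64

/*
 * Return nbytes of memory starting on a cache-line boundary, or NULL.
 * Plain malloc() only guarantees alignment for ordinary C objects
 * (typically 8 or 16 bytes), so the boundary is requested explicitly.
 */
void *
cacheline_alloc(size_t nbytes)
{
	size_t rounded;
	void *p;

	if (nbytes > SIZE_MAX - CACHE_LINE_SIZE)
		return (NULL);
	/* C11 as first published required size to be a multiple of alignment. */
	rounded = (nbytes + CACHE_LINE_SIZE - 1) & ~(size_t)(CACHE_LINE_SIZE - 1);
	if (rounded == 0)
		rounded = CACHE_LINE_SIZE;
	p = aligned_alloc(CACHE_LINE_SIZE, rounded);
	assert(p == NULL || (uintptr_t)p % CACHE_LINE_SIZE == 0);
	return (p);
}
```

The rounding only matters for the padding at the end. The guarantee that the start of the object does not share a line with a preceding allocation comes from the alignment argument.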

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-25 Thread Andre Oppermann

On 25.10.2012 18:25, Andrey V. Elsukov wrote:

On 25.10.2012 19:54, Andre Oppermann wrote:

I still don't agree with naming the sysctl net.pfil.forward.  This
type of forwarding is a property of IPv4 and IPv6 and thus should
be put there.  Pfil hooking can be on layer 2, 2-bridging, 3 and
who knows where else in the future.  Forwarding works only for IPv46.

You haven't even replied to my comment on net@.  Please change the
sysctl location and name to its appropriate place.


Hi Andre,

There were two replies related to this subject; you did not reply to
them, and I thought that you had agreed.


I replied to your reply to mine.  Other than that I didn't find
anything else from you.


So, if not, what do you think about the name net.pfil.ipforward?


net.inet.ip.pfil_forward
net.inet6.ip6.pfil_forward

or something like that.

If you can show with your performance profiling that the sysctl
isn't even necessary, you could leave it completely away and have
pfil_forward enabled permanently.  That would be even better for
everybody.


Also, an MFC after 2 weeks must ensure that compiling with
IPFIREWALL_FORWARD enables the sysctl at the same time, to keep kernel
configs within 9-stable working.


Yes, it will work like that.


Excellent.  Thank you.

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 13:26, Gleb Smirnoff wrote:

On Thu, Oct 25, 2012 at 10:29:51PM +0200, Andre Oppermann wrote:
A On 25.10.2012 18:25, Andrey V. Elsukov wrote:
A  On 25.10.2012 19:54, Andre Oppermann wrote:
A  I still don't agree with naming the sysctl net.pfil.forward.  This
A  type of forwarding is a property of IPv4 and IPv6 and thus should
A  be put there.  Pfil hooking can be on layer 2, 2-bridging, 3 and
A  who knows where else in the future.  Forwarding works only for IPv46.
A 
A  You haven't even replied to my comment on net@.  Please change the
A  sysctl location and name to its appropriate place.
A 
A  Hi Andre,
A 
A  There were two replies related to this subject; you did not reply to
A  them, and I thought that you had agreed.
A
A I replied to your reply to mine.  Other than that I didn't find
A anything else from you.
A
A  So, if not, what do you think about the name net.pfil.ipforward?
A
A net.inet.ip.pfil_forward
A net.inet6.ip6.pfil_forward
A
A or something like that.
A
A If you can show with your performance profiling that the sysctl
A isn't even necessary, you could leave it completely away and have
A pfil_forward enabled permanently.  That would be even better for
A everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in the default case packets are tagless. But if
packets carried 1 or 2 tags each that don't actually belong
to PACKET_TAG_IPFORWARD, processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol-specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set the M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.
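The proposal can be sketched as a fast path. Whenever ipfw attaches the forwarding tag it also sets a flag, so the IP code only walks the tag list when the flag is set. Packets carrying unrelated tags (Gleb's concern) never pay for the search. Everything here is a hypothetical, simplified model: `struct pkt`, `struct tag`, the bit values and `tag_walks` are illustrative stand-ins, not the real mbuf(9) or m_tag(9) structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the real definitions live in sys/mbuf.h. */
#define	PACKET_TAG_IPFORWARD	1
#define	M_PROTO2		0x0020
#define	M_IPFORWARD		M_PROTO2	/* overlay on a protocol-local bit */

struct tag {				/* simplified m_tag */
	int		 t_id;
	struct tag	*t_next;
};

struct pkt {				/* simplified mbuf header */
	int		 m_flags;
	struct tag	*m_tags;
};

int tag_walks;				/* counts list searches, for illustration */

static struct tag *
tag_find(struct pkt *m, int id)
{
	struct tag *t;

	tag_walks++;
	for (t = m->m_tags; t != NULL; t = t->t_next)
		if (t->t_id == id)
			return (t);
	return (NULL);
}

/* ipfw side: attach the tag and set the flag in one step. */
void
ipfw_set_fwd(struct pkt *m, struct tag *t)
{
	assert(m != NULL && t != NULL);
	t->t_id = PACKET_TAG_IPFORWARD;
	t->t_next = m->m_tags;
	m->m_tags = t;
	m->m_flags |= M_IPFORWARD;
}

/* IP side: only search the tag list when the flag says a tag is present. */
struct tag *
ip_get_fwd(struct pkt *m)
{
	if ((m->m_flags & M_IPFORWARD) == 0)
		return (NULL);		/* common case: no list walk */
	return (tag_find(m, PACKET_TAG_IPFORWARD));
}
```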

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 14:29, Andrey V. Elsukov wrote:

On 26.10.2012 15:43, Andre Oppermann wrote:

A If you can show with your performance profiling that the sysctl
A isn't even necessary, you could leave it completely away and have
A pfil_forward enabled permanently.  That would be even better for
A everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in the default case packets are tagless. But if
packets carried 1 or 2 tags each that don't actually belong
to PACKET_TAG_IPFORWARD, processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol-specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set the M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.


It seems we have only one bit in m_flags that can be used, so maybe
we should leave it for things that may appear in the future?


That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD  M_PROTO2  /* ip forwarding */

Of course, you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer.  For
example they get cleared in ip_output() before the packet is handed
to layer 2.
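Below is a minimal model of the overlay described above. The bit values are hypothetical; the real ones live in sys/mbuf.h and ip_var.h. The layer-3 name is just an alias for a protocol-local bit. Clearing all M_PROTO bits at the layer boundary is what makes it safe for another layer to reuse the same bit. `hand_to_layer2()` is a hypothetical stand-in for what ip_output() does before passing the packet down.

```c
#include <assert.h>

/* Hypothetical bit values; the real ones are defined in sys/mbuf.h. */
#define	M_PROTO1	0x00000010
#define	M_PROTO2	0x00000020
#define	M_PROTO3	0x00000040
#define	M_PROTO4	0x00000080
#define	M_PROTO5	0x00000100
#define	M_PROTOFLAGS	(M_PROTO1 | M_PROTO2 | M_PROTO3 | M_PROTO4 | M_PROTO5)
#define	M_BCAST		0x00000200	/* a layer-independent flag */

/* Layer-3 names are aliases; another layer may give these bits other meanings. */
#define	M_FASTFWD_OURS	M_PROTO1
#define	M_IP_NEXTHOP	M_PROTO2	/* explicit ip nexthop */

static_assert((M_IP_NEXTHOP & M_PROTOFLAGS) == M_IP_NEXTHOP,
    "layer-3 overlays must only use protocol-local bits");

/* Drop every protocol-local bit before the packet leaves layer 3. */
int
hand_to_layer2(int m_flags)
{
	return (m_flags & ~M_PROTOFLAGS);
}
```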

--
Andre



Re: svn commit: r242079 - in head: sbin/ipfw share/man/man4 sys/conf sys/net sys/netinet sys/netinet6 sys/netpfil/ipfw

2012-10-26 Thread Andre Oppermann

On 26.10.2012 15:24, Andre Oppermann wrote:

On 26.10.2012 14:29, Andrey V. Elsukov wrote:

On 26.10.2012 15:43, Andre Oppermann wrote:

A If you can show with your performance profiling that the sysctl
A isn't even necessary, you could leave it completely away and have
A pfil_forward enabled permanently.  That would be even better for
A everybody.

I'd prefer to have the sysctl. Benchmarking will definitely show
no regression, because in the default case packets are tagless. But if
packets carried 1 or 2 tags each that don't actually belong
to PACKET_TAG_IPFORWARD, processing would be pessimized.


With M_FASTFWD_OURS I used an overlay of the protocol-specific M_PROTO[1-5]
mbuf flags.  The same can be done with M_IPFORWARD.  The ipfw code then
will not only add the m_tag but also set the M_IPFORWARD flag.  That way no
sysctl is required and the feature is always available.  The overlay
definition is in ip_var.h.


It seems we have only one bit in m_flags that can be used, so maybe
we should leave it for things that may appear in the future?


That's what the M_PROTO flags are for:

#define M_IPFW_FORWARD  M_PROTO2  /* ip forwarding */


Actually, looking at it technically, this isn't forwarding but specifying
a different nexthop.  Hence the #define and description should be more
like

#define M_IP_NEXTHOP    M_PROTO2  /* explicit ip nexthop */

Of course the userspace ipfw feature naming and usage doesn't change.
But within the kernel it's really nexthop manipulation within the
forwarding path.

--
Andre


Of course, you have to do the same for ip6.

The M_PROTO[1-5] flags are only valid within a protocol layer.  For
example they get cleared in ip_output() before the packet is handed
to layer 2.





svn commit: r242151 - in head/sys: vm xen/evtchn

2012-10-26 Thread Andre Oppermann
Author: andre
Date: Fri Oct 26 17:31:35 2012
New Revision: 242151
URL: http://svn.freebsd.org/changeset/base/242151

Log:
  Move the corresponding MTX_SYSINIT() next to their struct mtx declaration
  to make their relationship more obvious as done with the other such mutexes.

Modified:
  head/sys/vm/vm_glue.c
  head/sys/xen/evtchn/evtchn.c

Modified: head/sys/vm/vm_glue.c
==
--- head/sys/vm/vm_glue.c   Fri Oct 26 17:02:50 2012   (r242150)
+++ head/sys/vm/vm_glue.c   Fri Oct 26 17:31:35 2012   (r242151)
@@ -307,6 +307,8 @@ struct kstack_cache_entry *kstack_cache;
 static int kstack_cache_size = 128;
 static int kstacks;
 static struct mtx kstack_cache_mtx;
+MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF);
+
 SYSCTL_INT(_vm, OID_AUTO, kstack_cache_size, CTLFLAG_RW, &kstack_cache_size, 0,
 "");
 SYSCTL_INT(_vm, OID_AUTO, kstacks, CTLFLAG_RD, &kstacks, 0,
@@ -486,7 +488,6 @@ kstack_cache_init(void *nulll)
EVENTHANDLER_PRI_ANY);
 }
 
-MTX_SYSINIT(kstack_cache, &kstack_cache_mtx, "kstkch", MTX_DEF);
 SYSINIT(vm_kstacks, SI_SUB_KTHREAD_INIT, SI_ORDER_ANY, kstack_cache_init, NULL);
 
 #ifndef NO_SWAPPING

Modified: head/sys/xen/evtchn/evtchn.c
==
--- head/sys/xen/evtchn/evtchn.c   Fri Oct 26 17:02:50 2012   (r242150)
+++ head/sys/xen/evtchn/evtchn.c   Fri Oct 26 17:31:35 2012   (r242151)
@@ -44,7 +44,15 @@ static inline unsigned long __ffs(unsign
 return word;
 }
 
+/*
+ * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical
+ * section, to set pcpu->ipending (etc...) properly, we
+ * must be able to get the icu lock, so it can't be
+ * under witness.
+ */
 static struct mtx irq_mapping_update_lock;
+MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN);
+
 static struct xenpic *xp;
 struct xenpic_intsrc {
struct intsrc xp_intsrc;
@@ -1130,11 +1138,4 @@ evtchn_init(void *dummy __unused)
 }
 
 SYSINIT(evtchn_init, SI_SUB_INTR, SI_ORDER_MIDDLE, evtchn_init, NULL);
-/*
- * irq_mapping_update_lock: in order to allow an interrupt to occur in a critical
- * section, to set pcpu->ipending (etc...) properly, we
- * must be able to get the icu lock, so it can't be
- * under witness.
- */
 
-MTX_SYSINIT(irq_mapping_update_lock, &irq_mapping_update_lock, "xp", MTX_SPIN);


Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf

2012-10-27 Thread Andre Oppermann

On 26.10.2012 23:06, Gleb Smirnoff wrote:

Author: glebius
Date: Fri Oct 26 21:06:33 2012
New Revision: 242161
URL: http://svn.freebsd.org/changeset/base/242161

Log:
   o Remove last argument to ip_fragment(), and obtain all needed information
 on checksums directly from mbuf flags. This simplifies code.
   o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some drivers may not announce CSUM_IP in their if_hwassist,
  yet still try to do checksums if CSUM_IP is set on the mbuf. An example is em(4).


I don't understand your description here.  Why work around a bug in a driver
in ip_fragment() when we can fix the bug in the driver?


   o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
 After this change CSUM_DELAY_IP vanishes from the stack.


Good. :)


   Submitted by:    Sebastian Kuzminsky seb lineratesystems.com


--
Andre



svn commit: r242249 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:16:09 2012
New Revision: 242249
URL: http://svn.freebsd.org/changeset/base/242249

Log:
  Adjust the initial default CWND upon connection establishment to the
  new and increased values specified by RFC5681 Section 3.1.
  
  The even larger initial CWND per RFC3390, if enabled, is not affected.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c   Sun Oct 28 17:06:50 2012   (r242248)
+++ head/sys/netinet/tcp_input.c   Sun Oct 28 17:16:09 2012   (r242249)
@@ -351,8 +351,15 @@ cc_conn_init(struct tcpcb *tp)
if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
-   else
-   tp->snd_cwnd = tp->t_maxseg;
+   else {
+   /* Per RFC5681 Section 3.1 */
+   if (tp->t_maxseg > 2190)
+   tp->snd_cwnd = 2 * tp->t_maxseg;
+   else if (tp->t_maxseg > 1095)
+   tp->snd_cwnd = 3 * tp->t_maxseg;
+   else
+   tp->snd_cwnd = 4 * tp->t_maxseg;
+   }
 
if (CC_ALGO(tp)->conn_init != NULL)
CC_ALGO(tp)->conn_init(tp->ccv);
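The initial window selection from r242249 can be written as a standalone function, which makes it easy to check the thresholds by hand. It mirrors the branches in the diff. `initial_cwnd()` and its `rfc3390` parameter (standing in for `V_tcp_do_rfc3390`) are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long
iw_min(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
iw_max(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/* Initial congestion window in bytes, following cc_conn_init() after r242249. */
unsigned long
initial_cwnd(unsigned long maxseg, bool rfc3390)
{
	assert(maxseg > 0);
	if (rfc3390)			/* RFC 3390 */
		return (iw_min(4 * maxseg, iw_max(2 * maxseg, 4380)));
	if (maxseg > 2190)		/* RFC 5681, Section 3.1 */
		return (2 * maxseg);
	else if (maxseg > 1095)
		return (3 * maxseg);
	else
		return (4 * maxseg);
}
```

For example, an MSS of 1460 yields 4380 bytes (three segments) under both rules. An MSS of 1200 yields 3600 bytes under RFC 5681 and 4380 bytes under RFC 3390.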


svn commit: r242250 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:25:08 2012
New Revision: 242250
URL: http://svn.freebsd.org/changeset/base/242250

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment.  This reduction got lost
  some time ago due to a change in initialization ordering.
  
  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c   Sun Oct 28 17:16:09 2012   (r242249)
+++ head/sys/netinet/tcp_input.c   Sun Oct 28 17:25:08 2012   (r242250)
@@ -345,10 +345,16 @@ cc_conn_init(struct tcpcb *tp)
/*
 * Set the initial slow-start flight size.
 *
-* RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-* XXX: We currently check only in syncache_socket for that.
-*/
-   if (V_tcp_do_rfc3390)
+* RFC5681 Section 3.1 specifies the default conservative values.
+* RFC3390 specifies slightly more aggressive values.
+*
+* If a SYN or SYN/ACK was lost and retransmitted, we have to
+* reduce the initial CWND to one segment as congestion is likely
+* requiring us to be cautious.
+*/
+   if (tp->snd_cwnd == 1)
+   tp->snd_cwnd = tp->t_maxseg;   /* SYN(-ACK) lost */
+   else if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
else {

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c   Sun Oct 28 17:16:09 2012   (r242249)
+++ head/sys/netinet/tcp_syncache.c   Sun Oct 28 17:25:08 2012   (r242250)
@@ -852,11 +852,12 @@ syncache_socket(struct syncache *sc, str
tcp_mss(tp, sc->sc_peer_mss);
 
/*
-* If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
+* If the SYN,ACK was retransmitted, indicate that CWND to be
+* limited to one segment in cc_conn_init().
 * NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits.
 */
if (sc->sc_rxmits > 1)
-   tp->snd_cwnd = tp->t_maxseg;
+   tp->snd_cwnd = 1;
 
 #ifdef TCP_OFFLOAD
/*

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c   Sun Oct 28 17:16:09 2012   (r242249)
+++ head/sys/netinet/tcp_timer.c   Sun Oct 28 17:25:08 2012   (r242250)
@@ -539,7 +539,13 @@ tcp_timer_rexmt(void * xtp)
}
INP_INFO_RUNLOCK(V_tcbinfo);
headlocked = 0;
-   if (tp->t_rxtshift == 1) {
+   if (tp->t_state == TCPS_SYN_SENT) {
+   /*
+* If the SYN was retransmitted, indicate CWND to be
+* limited to 1 segment in cc_conn_init().
+*/
+   tp->snd_cwnd = 1;
+   } else if (tp->t_rxtshift == 1) {
/*
 * first retransmit; record ssthresh and cwnd so they can
 * be recovered if this turns out to be a bad retransmit.
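The diffs above use `snd_cwnd == 1` as an in-band marker. The retransmit paths store 1, and cc_conn_init() later turns it into a one-segment window. The sketch below is a reduced model. `struct tcb` is a hypothetical struct holding only the fields involved, and `syn_sent` stands in for `t_state == TCPS_SYN_SENT`. The default branch repeats the RFC 5681 table from r242249.

```c
#include <assert.h>

/* Hypothetical subset of struct tcpcb. */
struct tcb {
	unsigned long	snd_cwnd;
	unsigned long	t_maxseg;
	int		syn_sent;	/* stand-in for t_state == TCPS_SYN_SENT */
};

/* Retransmit timer: a lost SYN only leaves a marker behind. */
void
rexmt_timeout(struct tcb *tp)
{
	if (tp->syn_sent)
		tp->snd_cwnd = 1;	/* marker: SYN(-ACK) was retransmitted */
	/* normal retransmit handling omitted */
}

/* Connection establishment: resolve the marker or apply RFC 5681. */
void
conn_init(struct tcb *tp)
{
	assert(tp->t_maxseg > 0);
	if (tp->snd_cwnd == 1)
		tp->snd_cwnd = tp->t_maxseg;	/* one segment after a loss */
	else if (tp->t_maxseg > 2190)
		tp->snd_cwnd = 2 * tp->t_maxseg;
	else if (tp->t_maxseg > 1095)
		tp->snd_cwnd = 3 * tp->t_maxseg;
	else
		tp->snd_cwnd = 4 * tp->t_maxseg;
}
```

The marker presumably works because a one-byte congestion window is not a value the stack otherwise assigns before cc_conn_init() runs.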


svn commit: r242251 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:30:28 2012
New Revision: 242251
URL: http://svn.freebsd.org/changeset/base/242251

Log:
  When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
  reduce the initial CWND to one segment.  This reduction got lost
  some time ago due to a change in initialization ordering.
  
  Additionally in tcp_timer_rexmt() avoid entering fast recovery when
  we're still in TCPS_SYN_SENT state.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 28 17:25:08 2012   (r242250)
+++ head/sys/netinet/tcp_output.c   Sun Oct 28 17:30:28 2012   (r242251)
@@ -551,10 +551,14 @@ after_sack_rexmit:
 * max size segments, or at least 50% of the maximum possible
 * window, then want to send a window update to peer.
 * Skip this if the connection is in T/TCP half-open state.
-* Don't send pure window updates when the peer has closed
-* the connection and won't ever send more data.
+*
+* Don't send an independent window update if a delayed
+* ACK is pending (it will get piggy-backed on it) or the
+* remote side already has done a half-close and won't send
+* more data.
 */
if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
+   !(tp->t_flags & TF_DELACK) &&
!TCPS_HAVERCVDFIN(tp->t_state)) {
/*
 * adv is the amount we can increase the window,
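The gating condition from r242251 can be written as a predicate. The flag values are hypothetical, and `rcvd_fin` stands in for `TCPS_HAVERCVDFIN(tp->t_state)`. Only when it returns true does tcp_output() go on to compare the possible window increase against its thresholds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for those in tcp_var.h. */
#define	TF_DELACK	0x0002
#define	TF_NEEDSYN	0x0400

/*
 * Preconditions for even considering a standalone window update.
 * rcvd_fin stands in for TCPS_HAVERCVDFIN(tp->t_state).
 */
bool
may_send_window_update(long recwin, int t_flags, bool rcvd_fin)
{
	assert(recwin >= 0);
	return (recwin > 0 &&
	    !(t_flags & TF_NEEDSYN) &&
	    !(t_flags & TF_DELACK) &&	/* update rides on the pending ACK */
	    !rcvd_fin);			/* peer half-closed, no more data coming */
}
```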


svn commit: r242252 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:40:35 2012
New Revision: 242252
URL: http://svn.freebsd.org/changeset/base/242252

Log:
  Prevent a flurry of forced window updates when an application is
  doing small reads on a (partially) filled receive socket buffer.
  
  Normally one would send a window update every time the available
  space in the socket buffer increases by two times MSS.  This leads
  to a flurry of window updates that do not provide any meaningful
  new information to the sender.  There still is available space in
  the window and the sender can continue sending data.  All window
  updates then get carried by the regular ACKs.  Only when the socket
  buffer was (almost) full and the window closed accordingly does a
  window update deliver new information and allow the sender to start
  sending more data again.
  
  Send window updates only every two MSS when the socket buffer
  has less than 1/8 space available, or the available space in the
  socket buffer increased by 1/4 its full capacity, or the socket
  buffer is very small.  The next regular data ACK will carry and
  report the exact window size again.
  
  Reported by:  sbruno
  Tested by:    darrenr
  Tested by:    Darren Baginski
  PR:           kern/116335
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 28 17:30:28 2012   (r242251)
+++ head/sys/netinet/tcp_output.c   Sun Oct 28 17:40:35 2012   (r242252)
@@ -545,23 +545,39 @@ after_sack_rexmit:
}
 
/*
-* Compare available window to amount of window
-* known to peer (as advertised window less
-* next expected input).  If the difference is at least two
-* max size segments, or at least 50% of the maximum possible
-* window, then want to send a window update to peer.
-* Skip this if the connection is in T/TCP half-open state.
+* Sending of standalone window updates.
+*
+* Window updates are important when we close our window due to a full
+* socket buffer and are opening it again after the application
+* reads data from it.  Once the window has opened again and the
+* remote end starts to send again the ACK clock takes over and
+* provides the most current window information.
+*
+* We must avoid the silly window syndrome whereby every read
+* from the receive buffer, no matter how small, causes a window
+* update to be sent.  We also should avoid sending a flurry of
+* window updates when the socket buffer had queued a lot of data
+* and the application is doing small reads.
+*
+* Prevent a flurry of pointless window updates by only sending
+* an update when we can increase the advertized window by more
+* than 1/4th of the socket buffer capacity.  When the buffer is
+* getting full or is very small be more aggressive and send an
+* update whenever we can increase by two mss sized segments.
+* In all other situations the ACK's to new incoming data will
+* carry further window increases.
 *
 * Don't send an independent window update if a delayed
 * ACK is pending (it will get piggy-backed on it) or the
 * remote side already has done a half-close and won't send
-* more data.
+* more data.  Skip this if the connection is in T/TCP
+* half-open state.
 */
if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
!(tp->t_flags & TF_DELACK) &&
!TCPS_HAVERCVDFIN(tp->t_state)) {
/*
-* adv is the amount we can increase the window,
+* adv is the amount we could increase the window,
 * taking into account that we are limited by
 * TCP_MAXWIN << tp->rcv_scale.
 */
@@ -581,9 +597,11 @@ after_sack_rexmit:
 */
if (oldwin >> tp->rcv_scale == (adv + oldwin) >> tp->rcv_scale)
goto dontupdate;
-   if (adv >= (long) (2 * tp->t_maxseg))
-   goto send;
-   if (2 * adv >= (long) so->so_rcv.sb_hiwat)
+
+   if (adv >= (long)(2 * tp->t_maxseg) &&
+   (adv >= (long)(so->so_rcv.sb_hiwat / 4) ||
+recwin <= (long)(so->so_rcv.sb_hiwat / 8) ||
+so->so_rcv.sb_hiwat <= 8 * tp->t_maxseg))
goto send;
}
 dontupdate:
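The new threshold test from r242252 can be restated as a pure function so the cases can be tried with concrete numbers. Parameter names follow the diff (`adv`, `recwin`, `so_rcv.sb_hiwat`, `t_maxseg`). `window_update_worthwhile()` itself is a hypothetical name. The sketch assumes the earlier checks (delayed ACK, FIN, unchanged scaled window) have already passed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * adv:    how much the advertised window could grow
 * recwin: window we could advertise right now
 * hiwat:  receive buffer capacity (so_rcv.sb_hiwat)
 * maxseg: MSS (t_maxseg)
 */
bool
window_update_worthwhile(long adv, long recwin, long hiwat, long maxseg)
{
	assert(hiwat > 0 && maxseg > 0);
	return (adv >= 2 * maxseg &&
	    (adv >= hiwat / 4 ||	/* grew by a quarter of the buffer */
	     recwin <= hiwat / 8 ||	/* buffer (almost) full */
	     hiwat <= 8 * maxseg));	/* very small buffer */
}
```

With a 64 KB buffer and a 1460-byte MSS, a 4000-byte increase is suppressed while the window is still wide open. The same increase is sent once the advertisable window has shrunk to 8 KB or less.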


svn commit: r242253 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 17:59:46 2012
New Revision: 242253
URL: http://svn.freebsd.org/changeset/base/242253

Log:
  Simplify implementation of net.inet.tcp.reass.maxsegments and
  net.inet.tcp.reass.cursegments.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==
--- head/sys/netinet/tcp_reass.c   Sun Oct 28 17:40:35 2012   (r242252)
+++ head/sys/netinet/tcp_reass.c   Sun Oct 28 17:59:46 2012   (r242253)
@@ -74,7 +74,6 @@ __FBSDID("$FreeBSD$");
 #include <netinet/tcp_debug.h>
 #endif /* TCPDEBUG */
 
-static int tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS);
 static int tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS);
 
 static SYSCTL_NODE(_net_inet_tcp, OID_AUTO, reass, CTLFLAG_RW, 0,
@@ -82,16 +81,12 @@ static SYSCTL_NODE(_net_inet_tcp, OID_AU
 
 static VNET_DEFINE(int, tcp_reass_maxseg) = 0;
 #defineV_tcp_reass_maxseg  VNET(tcp_reass_maxseg)
-SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, maxsegments,
-CTLTYPE_INT | CTLFLAG_RDTUN,
-&VNET_NAME(tcp_reass_maxseg), 0, tcp_reass_sysctl_maxseg, "I",
+SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxsegments, CTLFLAG_RDTUN,
+&VNET_NAME(tcp_reass_maxseg), 0,
 "Global maximum number of TCP Segments in Reassembly Queue");
 
-static VNET_DEFINE(int, tcp_reass_qsize) = 0;
-#defineV_tcp_reass_qsize   VNET(tcp_reass_qsize)
 SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, cursegments,
-CTLTYPE_INT | CTLFLAG_RD,
-&VNET_NAME(tcp_reass_qsize), 0, tcp_reass_sysctl_qsize, "I",
+(CTLTYPE_INT | CTLFLAG_RD), NULL, 0, tcp_reass_sysctl_qsize, "I",
 "Global number of TCP Segments currently in Reassembly Queue");
 
 static VNET_DEFINE(int, tcp_reass_overflows) = 0;
@@ -109,8 +104,10 @@ static void
 tcp_reass_zone_change(void *tag)
 {
 
+   /* Set the zone limit and read back the effective value. */
V_tcp_reass_maxseg = nmbclusters / 16;
uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
 }
 
 void
@@ -122,7 +119,9 @@ tcp_reass_init(void)
&V_tcp_reass_maxseg);
V_tcp_reass_zone = uma_zcreate("tcpreass", sizeof (struct tseg_qent),
NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
+   /* Set the zone limit and read back the effective value. */
uma_zone_set_max(V_tcp_reass_zone, V_tcp_reass_maxseg);
+   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
EVENTHANDLER_REGISTER(nmbclusters_change,
tcp_reass_zone_change, NULL, EVENTHANDLER_PRI_ANY);
 }
@@ -156,17 +155,12 @@ tcp_reass_flush(struct tcpcb *tp)
 }
 
 static int
-tcp_reass_sysctl_maxseg(SYSCTL_HANDLER_ARGS)
-{
-   V_tcp_reass_maxseg = uma_zone_get_max(V_tcp_reass_zone);
-   return (sysctl_handle_int(oidp, arg1, arg2, req));
-}
-
-static int
 tcp_reass_sysctl_qsize(SYSCTL_HANDLER_ARGS)
 {
-   V_tcp_reass_qsize = uma_zone_get_cur(V_tcp_reass_zone);
-   return (sysctl_handle_int(oidp, arg1, arg2, req));
+   int qsize;
+
+   qsize = uma_zone_get_cur(V_tcp_reass_zone);
+   return (sysctl_handle_int(oidp, &qsize, sizeof(qsize), req));
 }
 
 int
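This commit and the next one both set a UMA limit and then read it back, because the allocator may round the requested value. The sketch below is a simplified model of that behaviour. It assumes, hypothetically, that a zone hands out memory in whole slabs of `items_per_slab` items and rounds its limit up accordingly. `struct zone`, `zone_set_max()` and `zone_get_max()` are illustrative stand-ins for uma_zone_set_max() and uma_zone_get_max(), not their implementation.

```c
#include <assert.h>

/* Hypothetical, simplified zone that allocates whole slabs of items. */
struct zone {
	int	items_per_slab;
	int	max_items;
};

/* Record a limit, rounded up to a whole number of slabs. */
void
zone_set_max(struct zone *z, int nitems)
{
	int slabs;

	assert(z->items_per_slab > 0 && nitems >= 0);
	slabs = (nitems + z->items_per_slab - 1) / z->items_per_slab;
	z->max_items = slabs * z->items_per_slab;
}

/* Report the limit actually in effect. */
int
zone_get_max(const struct zone *z)
{
	return (z->max_items);
}
```

Without the read-back, a sysctl initialised from the requested value would keep reporting 95 while this model zone actually allows 100 items.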


svn commit: r242254 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:07:34 2012
New Revision: 242254
URL: http://svn.freebsd.org/changeset/base/242254

Log:
  Change the syncache count reporting the current number of entries
  from an unprotected u_int that reports garbage on SMP to a function
  based sysctl obtaining the current value from UMA.
  
  Also read back the actual cache_limit after page size rounding by UMA.
  
  PR:   kern/165879
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_syncache.h

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c   Sun Oct 28 17:59:46 2012   (r242253)
+++ head/sys/netinet/tcp_syncache.c   Sun Oct 28 18:07:34 2012   (r242254)
@@ -123,6 +123,7 @@ struct syncache *syncache_lookup(struct 
 static int  syncache_respond(struct syncache *);
 static struct   socket *syncache_socket(struct syncache *, struct socket *,
struct mbuf *m);
+static int  syncache_sysctl_count(SYSCTL_HANDLER_ARGS);
 static void syncache_timeout(struct syncache *sc, struct syncache_head *sch,
int docallout);
 static void syncache_timer(void *);
@@ -158,8 +159,8 @@ SYSCTL_VNET_UINT(_net_inet_tcp_syncache,
 &VNET_NAME(tcp_syncache.cache_limit), 0,
 "Overall entry limit for syncache");
 
-SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, count, CTLFLAG_RD,
-&VNET_NAME(tcp_syncache.cache_count), 0,
+SYSCTL_VNET_PROC(_net_inet_tcp_syncache, OID_AUTO, count, (CTLTYPE_UINT|CTLFLAG_RD),
+NULL, 0, syncache_sysctl_count, "IU",
 "Current number of entries in syncache");
 
 SYSCTL_VNET_UINT(_net_inet_tcp_syncache, OID_AUTO, hashsize, CTLFLAG_RDTUN,
@@ -225,7 +226,6 @@ syncache_init(void)
 {
int i;
 
-   V_tcp_syncache.cache_count = 0;
V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE;
V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT;
V_tcp_syncache.rexmt_limit = SYNCACHE_MAXREXMTS;
@@ -269,6 +269,7 @@ syncache_init(void)
V_tcp_syncache.zone = uma_zcreate("syncache", sizeof(struct syncache),
NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
uma_zone_set_max(V_tcp_syncache.zone, V_tcp_syncache.cache_limit);
+   V_tcp_syncache.cache_limit = uma_zone_get_max(V_tcp_syncache.zone);
 }
 
 #ifdef VIMAGE
@@ -296,8 +297,8 @@ syncache_destroy(void)
mtx_destroy(&sch->sch_mtx);
}
 
-   KASSERT(V_tcp_syncache.cache_count == 0, ("%s: cache_count %d not 0",
-   __func__, V_tcp_syncache.cache_count));
+   KASSERT(uma_zone_get_cur(V_tcp_syncache.zone) == 0,
+   ("%s: cache_count not 0", __func__));
 
/* Free the allocated global resources. */
uma_zdestroy(V_tcp_syncache.zone);
@@ -305,6 +306,15 @@ syncache_destroy(void)
 }
 #endif
 
+static int
+syncache_sysctl_count(SYSCTL_HANDLER_ARGS)
+{
+   int count;
+
+   count = uma_zone_get_cur(V_tcp_syncache.zone);
+   return (sysctl_handle_int(oidp, &count, sizeof(count), req));
+}
+
 /*
  * Inserts a syncache entry into the specified bucket row.
  * Locks and unlocks the syncache_head autonomously.
@@ -347,7 +357,6 @@ syncache_insert(struct syncache *sc, str
 
SCH_UNLOCK(sch);
 
-   V_tcp_syncache.cache_count++;
TCPSTAT_INC(tcps_sc_added);
 }
 
@@ -373,7 +382,6 @@ syncache_drop(struct syncache *sc, struc
 #endif
 
syncache_free(sc);
-   V_tcp_syncache.cache_count--;
 }
 
 /*
@@ -958,7 +966,6 @@ syncache_expand(struct in_conninfo *inc,
tod->tod_syncache_removed(tod, sc->sc_todctx);
}
 #endif
-   V_tcp_syncache.cache_count--;
SCH_UNLOCK(sch);
}
 

Modified: head/sys/netinet/tcp_syncache.h
==
--- head/sys/netinet/tcp_syncache.h   Sun Oct 28 17:59:46 2012   (r242253)
+++ head/sys/netinet/tcp_syncache.h   Sun Oct 28 18:07:34 2012   (r242254)
@@ -112,7 +112,6 @@ struct tcp_syncache {
u_int   hashsize;
u_int   hashmask;
u_int   bucket_limit;
-   u_int   cache_count;/* XXX: unprotected */
u_int   cache_limit;
u_int   rexmt_limit;
u_int   hash_secret;


svn commit: r242255 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:33:52 2012
New Revision: 242255
URL: http://svn.freebsd.org/changeset/base/242255

Log:
  Allow arbitrary MSS sizes and don't mind about the cluster size anymore.
  We've had more cluster sizes for quite some time now and the originally
  imposed limits and the previously codified thoughts on efficiency gains
  are no longer true.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 28 18:07:34 2012 (r242254)
+++ head/sys/netinet/tcp_input.c Sun Oct 28 18:33:52 2012 (r242255)
@@ -3322,10 +3322,8 @@ tcp_xmit_timer(struct tcpcb *tp, int rtt
 /*
  * Determine a reasonable value for maxseg size.
  * If the route is known, check route for mtu.
- * If none, use an mss that can be handled on the outgoing
- * interface without forcing IP to fragment; if bigger than
- * an mbuf cluster (MCLBYTES), round down to nearest multiple of MCLBYTES
- * to utilize large mbufs.  If no route is found, route has no mtu,
+ * If none, use an mss that can be handled on the outgoing interface
+ * without forcing IP to fragment.  If no route is found, route has no mtu,
  * or the destination isn't local, use a default, hopefully conservative
  * size (usually 512 or the default IP max size, but no more than the mtu
  * of the interface), as we can't discover anything about intervening
@@ -3506,13 +3504,6 @@ tcp_mss_update(struct tcpcb *tp, int off
 (tp->t_flags & TF_RCVD_TSTMP) == TF_RCVD_TSTMP))
mss -= TCPOLEN_TSTAMP_APPA;
 
-#if (MCLBYTES & (MCLBYTES - 1)) == 0
-   if (mss > MCLBYTES)
-   mss &= ~(MCLBYTES-1);
-#else
-   if (mss > MCLBYTES)
-   mss = mss / MCLBYTES * MCLBYTES;
-#endif
tp->t_maxseg = mss;
 }
 


svn commit: r242256 - head/sys/kern

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:38:51 2012
New Revision: 242256
URL: http://svn.freebsd.org/changeset/base/242256

Log:
  Improve m_cat() by being able to also merge contents from M_EXT
  mbuf's by doing proper testing with M_WRITABLE().
  
  In m_collapse() replace an incomplete manual check for M_RDONLY
  with the M_WRITABLE() macro that also tests for shared buffers
  and other cases that make a particular mbuf immutable.
  
  MFC after: 2 weeks

Modified:
  head/sys/kern/uipc_mbuf.c

Modified: head/sys/kern/uipc_mbuf.c
==
--- head/sys/kern/uipc_mbuf.c Sun Oct 28 18:33:52 2012 (r242255)
+++ head/sys/kern/uipc_mbuf.c Sun Oct 28 18:38:51 2012 (r242256)
@@ -911,8 +911,8 @@ m_cat(struct mbuf *m, struct mbuf *n)
while (m->m_next)
m = m->m_next;
while (n) {
-   if (m->m_flags & M_EXT ||
-   m->m_data + m->m_len + n->m_len >= &m->m_dat[MLEN]) {
+   if (!M_WRITABLE(m) ||
+   M_TRAILINGSPACE(m) < n->m_len) {
/* just join the two chains */
m-m_next = n;
return;
@@ -1584,7 +1584,7 @@ again:
n = m->m_next;
if (n == NULL)
break;
-   if ((m->m_flags & M_RDONLY) == 0 &&
+   if (M_WRITABLE(m) &&
n->m_len < M_TRAILINGSPACE(m)) {
bcopy(mtod(n, void *), mtod(m, char *) + m->m_len,
n->m_len);


svn commit: r242257 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:45:04 2012
New Revision: 242257
URL: http://svn.freebsd.org/changeset/base/242257

Log:
  Remove bogus 'else' in #ifdef that prevented the rttvar from being reset
  in tcp_timer_rexmt() on retransmit for IPv6 sessions.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 18:38:51 2012 (r242256)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 18:45:04 2012 (r242257)
@@ -596,7 +596,6 @@ tcp_timer_rexmt(void * xtp)
 #ifdef INET6
if ((tp->t_inpcb->inp_vflag & INP_IPV6) != 0)
in6_losing(tp->t_inpcb);
-   else
 #endif
tp->t_rttvar += (tp->t_srtt >> TCP_RTT_SHIFT);
tp->t_srtt = 0;


svn commit: r242260 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 18:56:57 2012
New Revision: 242260
URL: http://svn.freebsd.org/changeset/base/242260

Log:
  When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE,
  the default retransmit timeout, as base to calculate the backoff
  time until next try instead of the TCP_REXMTVAL() macro which only
  works correctly when we already have measured an actual RTT+RTTVAR.
  
  Before it would cause the first retransmit at RTOBASE, the next
  four at the same time (!) about 200ms later, and then another one
  again RTOBASE later.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 18:53:28 2012 (r242259)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 18:56:57 2012 (r242260)
@@ -572,7 +572,7 @@ tcp_timer_rexmt(void * xtp)
tp->t_flags &= ~TF_PREVVALID;
TCPSTAT_INC(tcps_rexmttimeo);
if (tp->t_state == TCPS_SYN_SENT)
-   rexmt = TCP_REXMTVAL(tp) * tcp_syn_backoff[tp->t_rxtshift];
+   rexmt = TCPTV_RTOBASE * tcp_syn_backoff[tp->t_rxtshift];
else
rexmt = TCP_REXMTVAL(tp) * tcp_backoff[tp->t_rxtshift];
TCPT_RANGESET(tp->t_rxtcur, rexmt,


svn commit: r242261 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:02:07 2012
New Revision: 242261
URL: http://svn.freebsd.org/changeset/base/242261

Log:
  For retransmits of SYN|ACK from the syncache use the slightly more
  aggressive special tcp_syn_backoff[] retransmit schedule instead of
  the normal tcp_backoff[] schedule for established connections.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_timer.h

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c Sun Oct 28 18:56:57 2012 (r242260)
+++ head/sys/netinet/tcp_syncache.c Sun Oct 28 19:02:07 2012 (r242261)
@@ -391,7 +391,7 @@ static void
 syncache_timeout(struct syncache *sc, struct syncache_head *sch, int docallout)
 {
sc->sc_rxttime = ticks +
-   TCPTV_RTOBASE * (tcp_backoff[sc->sc_rxmits]);
+   TCPTV_RTOBASE * (tcp_syn_backoff[sc->sc_rxmits]);
sc->sc_rxmits++;
if (TSTMP_LT(sc->sc_rxttime, sch->sch_nextc)) {
sch->sch_nextc = sc->sc_rxttime;

Modified: head/sys/netinet/tcp_timer.h
==
--- head/sys/netinet/tcp_timer.h Sun Oct 28 18:56:57 2012 (r242260)
+++ head/sys/netinet/tcp_timer.h Sun Oct 28 19:02:07 2012 (r242261)
@@ -170,6 +170,7 @@ extern int tcp_rexmit_slop;
 extern int tcp_msl;
 extern int tcp_ttl;/* time to live for TCP segs */
 extern int tcp_backoff[];
+extern int tcp_syn_backoff[];
 
 extern int tcp_finwait2_timeout;
 extern int tcp_fast_finwait2_recycle;


svn commit: r242262 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:16:22 2012
New Revision: 242262
URL: http://svn.freebsd.org/changeset/base/242262

Log:
  Simplify and enhance the window change/update acceptance logic,
  especially in the presence of bi-directional data transfers.
  
  snd_wl1 tracks the right edge, including data in the reassembly
  queue, of valid incoming data.  This makes it like rcv_nxt plus
  reassembly.  It never goes backwards to prevent older, possibly
  reordered segments from updating the window.
  
  snd_wl2 tracks the left edge of sent data.  This makes it a duplicate
  of snd_una.  However joining them right now is difficult due to
  separate update dependencies in different places in the code flow.
  
  snd_wnd tracks the send window currently advertised by the peer.  In
  tcp_output() the effective window is calculated by subtracting the
  already in-flight data, snd_nxt less snd_una, from it.
  
  ACKs become the main clock of window updates and will always update
  the window when the left edge of what we sent is advanced.  The ACK
  clock is the primary signaling mechanism in ongoing data transfers.
  This works reliably even in the presence of reordering, reassembly
  and retransmitted segments.  The ACK clock is most important because
  it determines how much data we are allowed to inject into the network.
  
  Window updates that get us out of zero-window persist mode are crucial.
  Here a segment that moves neither ACK nor SEQ but enlarges WND is accepted.
  
  When the ACK clock is not active (that is, we are not, or are no longer,
  sending any data) any segment that moves the extended right SEQ edge,
  including out-of-order segments, updates the window.  This gives us
  updates especially during ping-pong transfers where the peer isn't
  done consuming the already acknowledged data from the receive buffer
  while responding with data.
  
  The SSH protocol is a prime candidate to benefit from the improved
  bi-directional window update logic as it has its own windowing
  mechanism on top of TCP and is frequently sending back protocol ACKs.
  
  Tcpdump provided by:  darrenr
  Tested by: darrenr
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 28 19:02:07 2012 (r242261)
+++ head/sys/netinet/tcp_input.c Sun Oct 28 19:16:22 2012 (r242262)
@@ -1714,7 +1714,7 @@ tcp_do_segment(struct mbuf *m, struct tc
 * Pull snd_wl1 up to prevent seq wrap relative to
 * th_seq.
 */
-   tp->snd_wl1 = th->th_seq;
+   tp->snd_wl1 = th->th_seq + tlen;
/*
 * Pull rcv_up up to prevent seq wrap relative to
 * rcv_nxt.
@@ -2327,7 +2327,6 @@ tcp_do_segment(struct mbuf *m, struct tc
if (tlen == 0 && (thflags & TH_FIN) == 0)
(void) tcp_reass(tp, (struct tcphdr *)0, 0,
(struct mbuf *)0);
-   tp->snd_wl1 = th->th_seq - 1;
/* FALLTHROUGH */
 
/*
@@ -2638,12 +2637,10 @@ process_ACK:
 
SOCKBUF_LOCK(&so->so_snd);
if (acked > so->so_snd.sb_cc) {
-   tp->snd_wnd -= so->so_snd.sb_cc;
sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
ourfinisacked = 1;
} else {
sbdrop_locked(&so->so_snd, acked);
-   tp->snd_wnd -= acked;
ourfinisacked = 0;
}
/* NB: sowwakeup_locked() does an implicit unlock. */
@@ -2733,24 +2730,56 @@ step6:
INP_WLOCK_ASSERT(tp->t_inpcb);
 
/*
-* Update window information.
-* Don't look at window if no ACK: TAC's send garbage on first SYN.
+* Window update acceptance logic.  We have to be careful not
+* to accept window updates from old segments in the presence
+* of reordering or duplication.
+*
+* A window update is valid when:
+*  - the segment ACK's new data.
+*  - the segment carries new data and its ACK is current.
+*  - the segment matches the current SEQ and ACK but increases
+*the window.  This is the escape from persist mode, if there
+*data to be sent.
+*
+* XXXAO: The presence of new SACK information would allow to
+* accept window updates during retransmits.  We don't have an
+* easy way to test for that the moment.
+*
+* NB: The other side isn't allowed to shrink the window when
+* not sending or acking new data.  This behavior is strongly
+* discouraged by RFC793, section 3.7, page 42 anyways.
+*
+* XXXAO: tiwin = minmss to avoid jitter?
   

svn commit: r242263 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:20:23 2012
New Revision: 242263
URL: http://svn.freebsd.org/changeset/base/242263

Log:
  Add SACK_PERMIT to the list of TCP options that are switched off after
  retransmitting a SYN three times.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 19:16:22 2012 (r242262)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 19:20:23 2012 (r242263)
@@ -585,7 +585,7 @@ tcp_timer_rexmt(void * xtp)
 * unknown-to-them TCP options.
 */
if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3))
-   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP);
+   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP|TF_SACK_PERMIT);
/*
 * If we backed off this far, our srtt estimate is probably bogus.
 * Clobber it so we'll take the next rtt measurement as our srtt;


svn commit: r242264 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:22:18 2012
New Revision: 242264
URL: http://svn.freebsd.org/changeset/base/242264

Log:
  Update comment to reflect the change made in r242263.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 19:20:23 2012 (r242263)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 19:22:18 2012 (r242264)
@@ -578,7 +578,7 @@ tcp_timer_rexmt(void * xtp)
TCPT_RANGESET(tp->t_rxtcur, rexmt,
  tp->t_rttmin, TCPTV_REXMTMAX);
/*
-* Disable rfc1323 if we haven't got any response to
+* Disable RFC1323 and SACK if we haven't got any response to
 * our third SYN to work-around some broken terminal servers
 * (most of which have hopefully been retired) that have bad VJ
 * header compression code which trashes TCP segments containing


svn commit: r242266 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:47:46 2012
New Revision: 242266
URL: http://svn.freebsd.org/changeset/base/242266

Log:
  Increase the initial CWND to 10 segments as defined in IETF TCPM
  draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
  window improves the overall performance of many web services without
  risking congestion collapse.
  
  As long as it remains a draft it is placed under a sysctl marking it
  as experimental:
   net.inet.tcp.experimental.initcwnd10 = 1
  When it becomes an official RFC soon the sysctl will be changed to
  the RFC number and moved to net.inet.tcp.
  
  This implementation differs from the RFC draft in that it is a bit
  more conservative in the case of packet loss on SYN or SYN|ACK because
  we haven't reduced the default RTO to 1 second yet.  Also the restart
  window isn't yet increased as allowed.  Both will be adjusted with
  upcoming changes.
  
  It is enabled by default.  In Linux it has been enabled since kernel 3.0.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 28 19:38:42 2012 (r242265)
+++ head/sys/netinet/tcp_input.c Sun Oct 28 19:47:46 2012 (r242266)
@@ -159,6 +159,14 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 VNET_NAME(tcp_do_rfc3390), 0,
 "Enable RFC 3390 (Increasing TCP's Initial Congestion Window)");
 
+SYSCTL_NODE(_net_inet_tcp, OID_AUTO, experimental, CTLFLAG_RW, 0,
+"Experimental TCP extensions");
+
+VNET_DEFINE(int, tcp_do_initcwnd10) = 1;
+SYSCTL_VNET_INT(_net_inet_tcp_experimental, OID_AUTO, initcwnd10, CTLFLAG_RW,
+VNET_NAME(tcp_do_initcwnd10), 0,
+"Enable draft-ietf-tcpm-initcwnd-05 (Increasing initial CWND to 10)");
+
 VNET_DEFINE(int, tcp_do_rfc3465) = 1;
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, rfc3465, CTLFLAG_RW,
 VNET_NAME(tcp_do_rfc3465), 0,
@@ -347,6 +355,7 @@ cc_conn_init(struct tcpcb *tp)
 *
 * RFC5681 Section 3.1 specifies the default conservative values.
 * RFC3390 specifies slightly more aggressive values.
+* Draft-ietf-tcpm-initcwnd-05 increases it to ten segments.
 *
 * If a SYN or SYN/ACK was lost and retransmitted, we have to
 * reduce the initial CWND to one segment as congestion is likely
@@ -354,6 +363,9 @@ cc_conn_init(struct tcpcb *tp)
 */
if (tp->snd_cwnd == 1)
tp->snd_cwnd = tp->t_maxseg;  /* SYN(-ACK) lost */
+   else if (V_tcp_do_initcwnd10)
+   tp->snd_cwnd = min(10 * tp->t_maxseg,
+   max(2 * tp->t_maxseg, 14600));
else if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h Sun Oct 28 19:38:42 2012 (r242265)
+++ head/sys/netinet/tcp_var.h Sun Oct 28 19:47:46 2012 (r242266)
@@ -611,6 +611,7 @@ VNET_DECLARE(int, tcp_mssdflt); /* XXX *
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
 VNET_DECLARE(int, tcp_do_rfc3390);
+VNET_DECLARE(int, tcp_do_initcwnd10);
 VNET_DECLARE(int, tcp_sendspace);
 VNET_DECLARE(int, tcp_recvspace);
 VNET_DECLARE(int, path_mtu_discovery);
@@ -623,6 +624,7 @@ VNET_DECLARE(int, tcp_abc_l_var);
 #define V_tcp_minmss VNET(tcp_minmss)
 #define V_tcp_delack_enabled VNET(tcp_delack_enabled)
 #define V_tcp_do_rfc3390 VNET(tcp_do_rfc3390)
+#define V_tcp_do_initcwnd10 VNET(tcp_do_initcwnd10)
 #define V_tcp_sendspace VNET(tcp_sendspace)
 #define V_tcp_recvspace VNET(tcp_recvspace)
 #define V_path_mtu_discovery VNET(path_mtu_discovery)


svn commit: r242267 - head/sys/netinet

2012-10-28 Thread Andre Oppermann
Author: andre
Date: Sun Oct 28 19:58:20 2012
New Revision: 242267
URL: http://svn.freebsd.org/changeset/base/242267

Log:
  If the user has closed the socket then drop a persisting connection
  after a much reduced timeout.
  
  Typically web servers close their sockets quickly under the assumption
  that the TCP connection goes away as well.  That is not entirely true
  however.  If the peer closed the window we're going to wait for a long
  time with lots of data in the send buffer.
  
  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 19:47:46 2012 (r242266)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 19:58:20 2012 (r242267)
@@ -447,6 +447,16 @@ tcp_timer_persist(void *xtp)
tp = tcp_drop(tp, ETIMEDOUT);
goto out;
}
+   /*
+* If the user has closed the socket then drop a persisting
+* connection after a much reduced timeout.
+*/
+   if (tp->t_state > TCPS_CLOSE_WAIT &&
+   (ticks - tp->t_rcvtime) >= TCPTV_PERSMAX) {
+   TCPSTAT_INC(tcps_persistdrop);
+   tp = tcp_drop(tp, ETIMEDOUT);
+   goto out;
+   }
tcp_setpersist(tp);
tp->t_flags |= TF_FORCEDATA;
(void) tcp_output(tp);


Re: svn commit: r242254 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 21:07, Gleb Smirnoff wrote:

On Sun, Oct 28, 2012 at 06:07:34PM +, Andre Oppermann wrote:
A @@ -296,8 +297,8 @@ syncache_destroy(void)
A   mtx_destroy(&sch->sch_mtx);
A   }
A
A - KASSERT(V_tcp_syncache.cache_count == 0, ("%s: cache_count %d not 0",
A - __func__, V_tcp_syncache.cache_count));
A + KASSERT(uma_zone_get_cur(V_tcp_syncache.zone) == 0,
A + ("%s: cache_count not 0", __func__));
A
A   /* Free the allocated global resources. */
A   uma_zdestroy(V_tcp_syncache.zone);

btw, keg_dtor() which is called in uma_zdestroy() printfs a warning
(even on a non-INVARIANTS kernel) if the keg had items in it. So a leak
won't go unnoticed.


Thanks, didn't know that.  I'll leave the KASSERT() in, if you don't
mind, to make it a bit more forceful than a printf that gets overlooked
too easily.

--
Andre



Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf

2012-10-28 Thread Andre Oppermann

On 28.10.2012 00:01, Gleb Smirnoff wrote:

On Sat, Oct 27, 2012 at 12:58:52PM +0200, Andre Oppermann wrote:
A On 26.10.2012 23:06, Gleb Smirnoff wrote:
A  Author: glebius
A  Date: Fri Oct 26 21:06:33 2012
A  New Revision: 242161
A  URL: http://svn.freebsd.org/changeset/base/242161
A 
A  Log:
A o Remove last argument to ip_fragment(), and obtain all needed information
A   on checksums directly from mbuf flags. This simplifies code.
A o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
A   hardware. Some driver may not announce CSUM_IP in their if_hwassist,
A   although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
A
A I'm not getting your description here?  Why work around a bug in a driver
A in ip_fragment() when we can fix the bug in the driver?

Well, that was actually a bug in the stack and a very special driver that
demonstrates it. I may even agree that driver is incorrect, but the stack was
incorrect, too.


Ah, OK.  Do you intend to fix the driver as well?

--
Andre



Re: svn commit: r242266 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 22:03, Rui Paulo wrote:

On 28 Oct 2012, at 12:47, Andre Oppermann an...@freebsd.org wrote:


Author: andre
Date: Sun Oct 28 19:47:46 2012
New Revision: 242266
URL: http://svn.freebsd.org/changeset/base/242266

Log:
  Increase the initial CWND to 10 segments as defined in IETF TCPM
  draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
  window improves the overall performance of many web services without
  risking congestion collapse.

  As long as it remains a draft it is placed under a sysctl marking it
  as experimental:
   net.inet.tcp.experimental.initcwnd10 = 1
  When it becomes an official RFC soon the sysctl will be changed to
  the RFC number and moved to net.inet.tcp.

  This implementation differs from the RFC draft in that it is a bit
  more conservative in the case of packet loss on SYN or SYN|ACK because
  we haven't reduced the default RTO to 1 second yet.  Also the restart
  window isn't yet increased as allowed.  Both will be adjusted with
  upcoming changes.

  It is enabled by default.  In Linux it has been enabled since kernel 3.0.



Didn't you also forget to point out the problems associated with it?

http://tools.ietf.org/html/draft-gettys-iw10-considered-harmful-00


IW10 has been heavily discussed on IETF TCPM.  A lot of research on
the impact has been done and the overall result has been a significant
improvement with very little downside.  Linux has adopted it for quite
some time already as default setting.

The bufferbloat issue is certainly real and should not be neglected.
However the solution to bufferbloat is not to send fewer packets into
the network.  In fact that doesn't even make a difference simply because
other packets will take their place.  Buffer bloat can only be fixed
in the devices that actually do the buffering.  A much discussed and
apparently good approach seems to be the Codel algorithm for active
buffer management.

--
Andre



Re: svn commit: r242263 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 22:26, Rui Paulo wrote:

On 28 Oct 2012, at 12:20, Andre Oppermann an...@freebsd.org wrote:


Author: andre
Date: Sun Oct 28 19:20:23 2012
New Revision: 242263
URL: http://svn.freebsd.org/changeset/base/242263

Log:
  Add SACK_PERMIT to the list of TCP options that are switched off after
  retransmitting a SYN three times.

  MFC after: 2 weeks

Modified:
  head/sys/netinet/tcp_timer.c

Modified: head/sys/netinet/tcp_timer.c
==
--- head/sys/netinet/tcp_timer.c Sun Oct 28 19:16:22 2012 (r242262)
+++ head/sys/netinet/tcp_timer.c Sun Oct 28 19:20:23 2012 (r242263)
@@ -585,7 +585,7 @@ tcp_timer_rexmt(void * xtp)
 * unknown-to-them TCP options.
 */
if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3))
-   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP);
+   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP|TF_SACK_PERMIT);
/*
 * If we backed off this far, our srtt estimate is probably bogus.
 * Clobber it so we'll take the next rtt measurement as our srtt;


Do you have any data regarding this commit, or are you just trying to make sure
the SACK option follows the same behaviour as the WSCALE/TSTMP options?

The latter.  For the purpose of turning off the options after three tries
it is contradictory to leave SACK on.

There is discussion of scrapping this whole option disabling altogether.
Until then better have the 'correct' behavior.

--
Andre



Re: svn commit: r242261 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 22:34, Rui Paulo wrote:

On 28 Oct 2012, at 12:02, Andre Oppermann an...@freebsd.org wrote:


Author: andre
Date: Sun Oct 28 19:02:07 2012
New Revision: 242261
URL: http://svn.freebsd.org/changeset/base/242261

Log:
  For retransmits of SYN|ACK from the syncache use the slightly more
  aggressive special tcp_syn_backoff[] retransmit schedule instead of
  the normal tcp_backoff[] schedule for established connections.



How did you come up with the values for tcp_syn_backoff? I obviously
understand the aggressiveness, but did you measure any significant
improvement in connection establishment time and if so, on what type of links?

I didn't come up with the values.  tcp_syn_backoff[] was introduced
almost 12 years ago by jlemon.  For syncache it got lost somewhere
along the line.

There has been recent talk by some large FreeBSD web server operators
of reducing SYN|ACK retransmit timeouts.  This change fixes a part of
the problem.  The recent RFC on reducing the RTO will fix the other
part.

--
Andre



Re: svn commit: r242261 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 23:01, Rui Paulo wrote:

On Oct 28, 2012, at 14:56, Andre Oppermann an...@freebsd.org wrote:


On 28.10.2012 22:34, Rui Paulo wrote:

On 28 Oct 2012, at 12:02, Andre Oppermann an...@freebsd.org wrote:


Author: andre
Date: Sun Oct 28 19:02:07 2012
New Revision: 242261
URL: http://svn.freebsd.org/changeset/base/242261

Log:
  For retransmits of SYN|ACK from the syncache use the slightly more
  aggressive special tcp_syn_backoff[] retransmit schedule instead of
  the normal tcp_backoff[] schedule for established connections.



How did you come up with the values for tcp_syn_backoff? I obviously
understand the aggressiveness, but did you measure any significant
improvement in connection establishment time and if so, on what type of links?


I didn't come up with the values.  tcp_syn_backoff[] was introduced
almost 12 years ago by jlemon.  For syncache it got lost somewhere
along the line.


Oh, I see. I read it backwards.



There has been recent talk by some large FreeBSD web server operators
of reducing SYN|ACK retransmit timeouts.  This change fixes a part of
the problem.  The recent RFC on reducing the RTO will fix the other
part.


Which RFC? I'm only aware of draft-hurtig-tcpm-rtorestart.


RFC6298.

--
Andre



Re: svn commit: r242266 - head/sys/netinet

2012-10-28 Thread Andre Oppermann

On 28.10.2012 22:44, Rui Paulo wrote:

On 28 Oct 2012, at 14:33, Andre Oppermann an...@freebsd.org wrote:

IW10 has been heavily discussed on IETF TCPM.  A lot of research on
the impact has been done and the overall result has been a significant
improvement with very little downside.  Linux has adopted it for quite
some time already as default setting.


I have followed the discussions at tcpm, but I did not find any conclusive 
evidence of the benefit of IW10. I'm sure it can help in multiple situations 
but, as always, there are tradeoffs. Section 6 of draft-ietf-tcpm-initcwnd 
never convinced me.


Then please raise your points on TCPM.


The bufferbloat issue is certainly real and should not be neglected.
However the solution to bufferbloat is not to send fewer packets into
the network.  In fact that doesn't even make a difference simply because
other packets will take their place.


Right, my point is that sending more packets in an already congested link will 
negatively affect the throughput / latency of the network. I'm not saying that 
it won't help you download a YouTube video faster, but the overall fairness of 
TCP will be reduced.


That's always the case.  Reality is that the majority of links these
days are very fast compared to twenty years ago.  We can afford to be
a bit more aggressive here.  Otherwise taking your point to the extreme
would mean that IW can only ever be 1 MSS.

Then there is the unfairness of low RTT to high RTT transfers.  But that's
inherent in any end to end feedback system.


  Buffer bloat can only be fixed
in the devices that actually do the buffering.  A much discussed and
apparently good approach seems to be the Codel algorithm for active
buffer management.


Are you working on CoDel? :-)


I'm looking into how the whole interface stuff including ALTQ can be
improved in an SMP world.

--
Andre



Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf

2012-10-29 Thread Andre Oppermann

On 29.10.2012 22:40, YongHyeon PYUN wrote:

On Mon, Oct 29, 2012 at 09:21:00AM +0400, Gleb Smirnoff wrote:

On Mon, Oct 29, 2012 at 01:41:04PM -0700, YongHyeon PYUN wrote:
Y On Sun, Oct 28, 2012 at 02:01:37AM +0400, Gleb Smirnoff wrote:
Y  On Sat, Oct 27, 2012 at 12:58:52PM +0200, Andre Oppermann wrote:
Y  A On 26.10.2012 23:06, Gleb Smirnoff wrote:
Y  A  Author: glebius
Y  A  Date: Fri Oct 26 21:06:33 2012
Y  A  New Revision: 242161
Y  A  URL: http://svn.freebsd.org/changeset/base/242161
Y  A 
Y  A  Log:
Y  A o Remove last argument to ip_fragment(), and obtain all needed information
Y  A   on checksums directly from mbuf flags. This simplifies code.
Y  A o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
Y
Y I'm not sure whether ti(4)'s checksum offloading for IP fragmented
Y packets(CSUM_IP_FRAGS) still works after this change.  ti(4)
Y requires CSUM_IP should be set for IP fragmented packets. Not sure
Y whether it's a bug or not. I have a ti(4) controller but I don't
Y remember where I can find it and don't have a link
Y parter(1000baseSX) to test it. :-(

ti(4) declares both CSUM_IP and CSUM_IP_FRAGS, so ip_fragment() won't do


Because it supports both CSUM_IP and CSUM_IP_FRAGS. Probably ti(4)
is the only controller that supports TCP/UDP checksum offloading
for an IP fragmented packet.


This is a bit weird if it doesn't do the fragmentation itself.
Computing the IP header checksum doesn't differ for normal and
fragmented packets.  The protocol checksum (TCP or UDP) stays
the same in the case of IP level fragmentation.  It is only
visible in the first fragment which includes the protocol header.


software checksums, and thus won't clear these flags.

Potentially a driver that announces one flag in if_hwassist but relies on
a couple of flags being set on the mbuf is not correct. If a driver can't do
one kind of checksum processing independently of the others, then it should
set or clear the appropriate flags in if_hwassist as a group.


Hmm, then what would be the best way to achieve CSUM_IP_FRAGS in the
driver? I don't have a clear idea how to utilize the hardware
feature. The stack should tell that the mbuf needs TCP/UDP checksum
offloading for IP fragmented packet(i.e. CSUM_IP_FRAGS is not set by
upper stack).


As I said there can't be fragment checksumming without hardware
based fragmentation.  We have three cases here:

 1. TSO where the hardware does the segmentation, TCP and IP header
checksums for each generated packet.
 2. IP packet fragmentation where a packet is split, the IP header
checksum is recomputed for each fragment, but the protocol csum
stays the same and is not modified.
 3. UDP fragmentation where a large packet is sent to the hardware
and it generates first the UDP checksum and then splits it into
IP fragments each with its own IP header checksum.

So we end up with these possible large send hardware offload capabilities:
 TSO: including IPv4hdr and TCP checksumming
 UDP fragmentation: including IPv4hdr and UDP checksumming
 IP fragmentation: including IPv4hdr checksumming

Besides that we have the packet <= MTU sized offload capabilities:
 TCP checksumming
 UDP checksumming
 SCTP checksumming
 IPv4hdr checksumming
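A minimal C sketch of the classification above, applied to a locally generated UDP datagram: given a set of hardware capabilities, it reports what the stack would still have to do in software. The CAP_* and SW_* names and bit values are hypothetical and are not FreeBSD's CSUM_* or IFCAP_* flags; TCP is left out because large TCP sends would go through TSO rather than IP fragmentation.

```c
#include <assert.h>

/* Hypothetical capability bits (NOT FreeBSD's CSUM_* / IFCAP_* values). */
enum hw_cap {
	CAP_IPHDR_CSUM	= 1 << 0,	/* IPv4 header checksum */
	CAP_TCP_CSUM	= 1 << 1,	/* TCP checksum, packet <= MTU */
	CAP_UDP_CSUM	= 1 << 2,	/* UDP checksum, packet <= MTU */
	CAP_UFO		= 1 << 3,	/* UDP csum, then IP frag + IPv4 hdr csums */
	CAP_IPFRAG	= 1 << 4	/* IP frag + IPv4 hdr csums only */
};

/* Work left to the stack in software. */
enum sw_task {
	SW_NONE		= 0,
	SW_L4_CSUM	= 1 << 0,
	SW_FRAGMENT	= 1 << 1,
	SW_IPHDR_CSUM	= 1 << 2
};

/*
 * For a UDP datagram of 'len' bytes (IPv4 header included) on a link
 * with 'mtu', return the SW_* tasks the stack still has to do itself.
 */
static unsigned
udp_sw_tasks(unsigned caps, unsigned len, unsigned mtu)
{
	unsigned tasks = SW_NONE;

	assert(mtu > 0);
	if (len <= mtu) {
		/* Fits in one packet: plain per-packet offloads apply. */
		if (!(caps & CAP_UDP_CSUM))
			tasks |= SW_L4_CSUM;
		if (!(caps & CAP_IPHDR_CSUM))
			tasks |= SW_IPHDR_CSUM;
		return (tasks);
	}
	if (caps & CAP_UFO)
		return (SW_NONE);	/* hardware checksums, then fragments */
	/*
	 * The UDP checksum covers the whole datagram, so it must be final
	 * before anything splits it; per-packet UDP offload cannot help.
	 */
	tasks |= SW_L4_CSUM;
	if (caps & CAP_IPFRAG)
		return (tasks);		/* hardware splits, fixes IPv4 headers */
	tasks |= SW_FRAGMENT;
	if (!(caps & CAP_IPHDR_CSUM))
		tasks |= SW_IPHDR_CSUM;
	return (tasks);
}
```

Note how per-packet UDP checksum offload drops out of the picture as soon as the stack itself has to fragment, which is the point made above about fragment checksumming requiring hardware-based fragmentation.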


Y  A   hardware. Some drivers may not announce CSUM_IP in their if_hwassist,


Oh, that was a typo! Software was meant.


That explains quite a bit of confusion.

--
Andre

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


svn commit: r242306 - head/sys/kern

2012-10-29 Thread Andre Oppermann
Author: andre
Date: Mon Oct 29 12:14:57 2012
New Revision: 242306
URL: http://svn.freebsd.org/changeset/base/242306

Log:
  Add logging for socket attach failures in sonewconn() during accept(2).
  Include the pointer to the PCB so it can be attributed to a particular
  application by correlating it with netstat -A output.
  
  MFC after:    2 weeks

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Mon Oct 29 10:22:00 2012 (r242305)
+++ head/sys/kern/uipc_socket.c Mon Oct 29 12:14:57 2012 (r242306)
@@ -135,6 +135,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/sysctl.h>
 #include <sys/uio.h>
 #include <sys/jail.h>
+#include <sys/syslog.h>
 
 #include <net/vnet.h>
 
@@ -500,16 +501,24 @@ sonewconn(struct socket *head, int conns
over = (head->so_qlen > 3 * head->so_qlimit / 2);
ACCEPT_UNLOCK();
 #ifdef REGRESSION
-   if (regression_sonewconn_earlytest && over)
+   if (regression_sonewconn_earlytest && over) {
 #else
-   if (over)
+   if (over) {
 #endif
+   log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
+   "%i already in queue awaiting acceptance\n",
+   __func__, head->so_pcb, over);
return (NULL);
+   }
VNET_ASSERT(head->so_vnet != NULL, ("%s:%d so_vnet is NULL, head=%p",
__func__, __LINE__, head));
so = soalloc(head->so_vnet);
-   if (so == NULL)
+   if (so == NULL) {
+   log(LOG_DEBUG, "%s: pcb %p: New socket allocation failure: "
+   "limit reached or out of memory\n",
+   __func__, head->so_pcb);
return (NULL);
+   }
if ((head->so_options & SO_ACCEPTFILTER) != 0)
connstatus = 0;
so->so_head = head;
@@ -526,9 +535,16 @@ sonewconn(struct socket *head, int conns
knlist_init_mtx(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv));
knlist_init_mtx(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd));
VNET_SO_ASSERT(head);
-   if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
-   (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
+   if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) {
+   sodealloc(so);
+   log(LOG_DEBUG, "%s: pcb %p: soreserve() failed\n",
+   __func__, head->so_pcb);
+   return (NULL);
+   }
+   if ((*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
sodealloc(so);
+   log(LOG_DEBUG, "%s: pcb %p: pru_attach() failed\n",
+   __func__, head->so_pcb);
return (NULL);
}
so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
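The overflow test that gates the first new log message can be tried in isolation. This is a userland sketch; the helper name listen_queue_over() is hypothetical and only mirrors the integer expression from the diff. Because `over` holds the result of this comparison, the `%i` in the "Listen queue overflow" message prints 1 (the value of `over`) rather than the number of queued connections.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors 'over = (head->so_qlen > 3 * head->so_qlimit / 2)' from
 * sonewconn(): no new socket is created once the queue holds more
 * than 1.5 times the listen limit (integer arithmetic, as in the kernel).
 */
static bool
listen_queue_over(int qlen, int qlimit)
{
	assert(qlen >= 0 && qlimit >= 0);
	return (qlen > 3 * qlimit / 2);
}
```

With a listen limit of 128 the cutoff is 192 queued connections; with an odd limit such as 5, the integer division puts it at 7.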


svn commit: r242308 - head/sys/netinet

2012-10-29 Thread Andre Oppermann
Author: andre
Date: Mon Oct 29 12:17:02 2012
New Revision: 242308
URL: http://svn.freebsd.org/changeset/base/242308

Log:
  Define the delayed ACK timeout value directly as hz/10 instead of
  obfuscating it by going through PR_FASTHZ.  No functional change.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_timer.h

Modified: head/sys/netinet/tcp_timer.h
==
--- head/sys/netinet/tcp_timer.h Mon Oct 29 12:16:19 2012 (r242307)
+++ head/sys/netinet/tcp_timer.h Mon Oct 29 12:17:02 2012 (r242308)
@@ -118,7 +118,7 @@
 
 #define TCP_MAXRXTSHIFT 12  /* maximum retransmits */
 
-#define TCPTV_DELACK (hz / PR_FASTHZ / 2) /* 100ms timeout */
+#define TCPTV_DELACK ( hz/10 )   /* 100ms timeout */
 
 #ifdef TCPTIMERS
 static const char *tcptimers[] =
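A quick check that the rewrite is indeed not a functional change, assuming PR_FASTHZ is 5 as in FreeBSD's sys/protosw.h: with integer division, (hz / 5) / 2 equals hz / 10 for every positive hz, so both forms yield the same tick count, which is exactly 100 ms whenever hz is divisible by 10. The helper names below are hypothetical.

```c
#include <assert.h>

#define PR_FASTHZ	5	/* assumed value, as in sys/protosw.h */

/* Old and new spellings of TCPTV_DELACK, in ticks. */
static int delack_old(int hz) { return (hz / PR_FASTHZ / 2); }
static int delack_new(int hz) { return (hz / 10); }

/* Convert a tick count to milliseconds for a given hz. */
static int
ticks_to_ms(int ticks, int hz)
{
	assert(hz > 0);
	return (ticks * 1000 / hz);
}
```

At hz=1024, for example, both forms give 102 ticks, a little under 100 ms.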


svn commit: r242311 - head/sys/netinet

2012-10-29 Thread Andre Oppermann
Author: andre
Date: Mon Oct 29 13:16:33 2012
New Revision: 242311
URL: http://svn.freebsd.org/changeset/base/242311

Log:
  Forced commit to provide the correct commit message to r242251:
  
Defer sending an independent window update if a delayed ACK is pending,
saving a packet.  The window update then gets piggy-backed on the next
already scheduled ACK.
  
  Added grammar fixes as well.
  
  MFC after:    2 weeks

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Mon Oct 29 12:37:39 2012 (r242310)
+++ head/sys/netinet/tcp_output.c   Mon Oct 29 13:16:33 2012 (r242311)
@@ -547,13 +547,13 @@ after_sack_rexmit:
/*
 * Sending of standalone window updates.
 *
-* Window updates important when we close our window due to a full
-* socket buffer and are opening it again after the application
+* Window updates are important when we close our window due to a
+* full socket buffer and are opening it again after the application
 * reads data from it.  Once the window has opened again and the
 * remote end starts to send again the ACK clock takes over and
 * provides the most current window information.
 *
-* We must avoid to the silly window syndrome whereas every read
+* We must avoid the silly window syndrome whereas every read
 * from the receive buffer, no matter how small, causes a window
 * update to be sent.  We also should avoid sending a flurry of
 * window updates when the socket buffer had queued a lot of data


Re: svn commit: r242251 - head/sys/netinet

2012-10-29 Thread Andre Oppermann

On 28.10.2012 18:30, Andre Oppermann wrote:

Author: andre
Date: Sun Oct 28 17:30:28 2012
New Revision: 242251
URL: http://svn.freebsd.org/changeset/base/242251

Log:
   When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
   reduce the initial CWND to one segment.  This reduction got lost
   some time ago due to a change in initialization ordering.

   Additionally in tcp_timer_rexmt() avoid entering fast recovery when
   we're still in TCPS_SYN_SENT state.


Oops, this was the wrong commit message for this change.  Here is the
correct one:

  Defer sending an independent window update if a delayed ACK is pending,
  saving a packet.  The window update then gets piggy-backed on the next
  already scheduled ACK.

I've forced commit r242311 with some grammar fixes to provide this information.

--
Andre


   MFC after:   2 weeks

Modified:
   head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 28 17:25:08 2012 (r242250)
+++ head/sys/netinet/tcp_output.c   Sun Oct 28 17:30:28 2012 (r242251)
@@ -551,10 +551,14 @@ after_sack_rexmit:
 * max size segments, or at least 50% of the maximum possible
 * window, then want to send a window update to peer.
 * Skip this if the connection is in T/TCP half-open state.
-* Don't send pure window updates when the peer has closed
-* the connection and won't ever send more data.
+*
+* Don't send an independent window update if a delayed
+* ACK is pending (it will get piggy-backed on it) or the
+* remote side already has done a half-close and won't send
+* more data.
 */
if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
+   !(tp->t_flags & TF_DELACK) &&
!TCPS_HAVERCVDFIN(tp->t_state)) {
/*
 * adv is the amount we can increase the window,






Re: svn commit: r242161 - in head/sys: net netinet netpfil/pf

2012-10-31 Thread Andre Oppermann

On 30.10.2012 03:25, YongHyeon PYUN wrote:

On Mon, Oct 29, 2012 at 09:20:59AM +0100, Andre Oppermann wrote:

On 29.10.2012 22:40, YongHyeon PYUN wrote:

On Mon, Oct 29, 2012 at 09:21:00AM +0400, Gleb Smirnoff wrote:

On Mon, Oct 29, 2012 at 01:41:04PM -0700, YongHyeon PYUN wrote:
Y On Sun, Oct 28, 2012 at 02:01:37AM +0400, Gleb Smirnoff wrote:
Y  On Sat, Oct 27, 2012 at 12:58:52PM +0200, Andre Oppermann wrote:
Y  A On 26.10.2012 23:06, Gleb Smirnoff wrote:
Y  A  Author: glebius
Y  A  Date: Fri Oct 26 21:06:33 2012
Y  A  New Revision: 242161
Y  A  URL: http://svn.freebsd.org/changeset/base/242161
Y  A 
Y  A  Log:
Y  A o Remove last argument to ip_fragment(), and obtain all needed information
Y  A   on checksums directly from mbuf flags. This simplifies code.
Y  A o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
Y
Y I'm not sure whether ti(4)'s checksum offloading for IP fragmented
Y packets(CSUM_IP_FRAGS) still works after this change.  ti(4)
Y requires CSUM_IP should be set for IP fragmented packets. Not sure
Y whether it's a bug or not. I have a ti(4) controller but I don't
Y remember where I can find it and don't have a link
Y parter(1000baseSX) to test it. :-(

ti(4) declares both CSUM_IP and CSUM_IP_FRAGS, so ip_fragment() won't do


Because it supports both CSUM_IP and CSUM_IP_FRAGS. Probably ti(4)
is the only controller that supports TCP/UDP checksum offloading
for an IP fragmented packet.


This is a bit weird if it doesn't do the fragmentation itself.
Computing the IP header checksum doesn't differ for normal and
fragmented packets.  The protocol checksum (TCP or UDP) stays
the same in the case of IP-level fragmentation.  It is only
visible in the first fragment which includes the protocol header.


My interpretation of CSUM_IP_FRAGS is the following.
  - Only the pseudo header checksum for TCP/UDP is computed by the upper
stack.
  - The controller has no ability to fragment the packet, so that has to
be done in the upper stack (i.e. ip_output()).
  - When ip_output() has to fragment the packet, it just fragments
the packet without completing the TCP/UDP and IP checksums. If the
controller does not support the CSUM_IP_FRAGS feature, ip_output()
can't delay the TCP/UDP checksum at this stage.
  - The fragmented packets are sent to the driver. The driver sets the
appropriate bits of the DMA descriptor based on the fragmentation flags
of the mbuf (M_FRAG, M_LASTFRAG) and hands the frame to the controller.
  - The controller firmware queues the fragmented frames in its
internal memory and holds off sending them out, since it has to
compute the TCP/UDP checksum. When it sees the frame that marks the
end of the fragmented datagram, it finally computes the TCP/UDP
checksum and sends each frame out to the wire, computing the
IP checksum on the fly.
The difference is which one (upper stack vs. controller) computes the
TCP/UDP/IP checksum.
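Whoever fills in the TCP/UDP checksum of a fragmented datagram needs the data from every fragment. A minimal RFC 1071 style sketch shows why the controller in this scheme has to wait for the last frame: since IPv4 fragment offsets are multiples of 8 bytes, each fragment starts on a 16-bit boundary, so partial sums can be accumulated fragment by fragment, but the result can only be folded and written once all of them have been seen. The function names are hypothetical and the pseudo header is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add the big-endian 16-bit words of buf to a running 32-bit sum.
 * A trailing odd byte is padded with zero, so only the last chunk
 * may have odd length.  32 bits is ample for datagrams up to 64 KB.
 */
static uint32_t
csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return (sum);
}

/* Fold the carries back in and complement: the final 16-bit checksum. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Feeding the fragments' payloads through csum_partial() one after another and folding at the end gives the same value as checksumming the whole datagram at once; the bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071's worked example fold to 0x220d.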


Such a behavior doesn't make much sense and probably wasn't used at all
in practice.  It's very complex as well.  Plus you can't guarantee that
there won't be other packets slipping into the interface queue in an SMP
world.

IP fragmentation really isn't done for TCP within the kernel.  We try
to prevent it as it would have a huge performance impact. Hence the
internal MTU discovery and the Don't Fragment bit set on TCP packets.

IP fragmentation does happen for large locally generated UDP packets.
There, however, the past absence of UDP fragmentation offload
coupled with UDP checksum offloading caused all fragmentation to be
done at the UDP level before the packet hits ip_output.

The remaining use of IP fragmentation is when the machine is acting
as a router and it has to send packets out on an interface with a
smaller MTU than the one it came in on.

So the only two useful features regarding UDP+IP fragmentation are:

 1. IP fragmentation including UDP checksum calculation for locally
generated large UDP packets.  This is the TSO for UDP.

 2. Pure IP fragmentation for in-transit packets.  Here only the
IP header checksum needs to be recalculated for each fragment.
The layer 4 checksums (UDP, TCP and others) stay the same.
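For case 2, a sketch of how a datagram's payload could be split for a smaller outgoing MTU, assuming a 20-byte IPv4 header without options and a datagram that is not already a fragment (a forwarded fragment would also have to carry over its existing offset and MF bit). Every fragment gets its own ip_len, ip_off and MF bit, which is why its header checksum must be recomputed, while the layer 4 checksum inside the first fragment is left untouched. struct frag and plan_fragments() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN	20	/* assumes no IP options */

struct frag {
	unsigned off8;		/* offset in 8-byte units, as in ip_off */
	unsigned datalen;	/* payload bytes in this fragment */
	int	 more;		/* MF (more fragments) bit */
};

/*
 * Split 'payload' bytes for a link with 'mtu'.  Returns the number of
 * fragments written to 'out', or 0 if 'max' entries are not enough.
 */
static size_t
plan_fragments(unsigned payload, unsigned mtu, struct frag *out, size_t max)
{
	unsigned chunk, done = 0;
	size_t n = 0;

	assert(mtu >= IPV4_HDR_LEN + 8);
	/* All fragments but the last must carry a multiple of 8 bytes. */
	chunk = (mtu - IPV4_HDR_LEN) & ~7u;
	while (done < payload) {
		unsigned left = payload - done;
		unsigned len = left < chunk ? left : chunk;

		if (n == max)
			return (0);
		out[n].off8 = done / 8;
		out[n].datalen = len;
		out[n].more = (done + len < payload);
		done += len;
		n++;
	}
	return (n);
}
```

For example, 4000 payload bytes on a 1500-byte MTU link become fragments of 1480, 1480 and 1040 bytes at offsets 0, 185 and 370 (in 8-byte units), with MF set on all but the last.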

--
Andre




software checksums, and thus won't clear these flags.

Potentially a driver that announces one flag in if_hwassist but relies on
a couple of flags being set on the mbuf is not correct. If a driver can't
perform one kind of checksum processing independently of the others, then it
should set or clear the related flags in if_hwassist as a group.


Hmm, then what would be the best way to support CSUM_IP_FRAGS in a
driver? I don't have a clear idea how to utilize the hardware
feature. The stack should tell the driver that the mbuf needs TCP/UDP checksum
offloading for an IP fragmented packet (i.e. CSUM_IP_FRAGS is not set by the
upper stack).


As I said there can't be fragment checksumming without hardware


It's up to the controller's firmware. It does not send the fragmented
frames until it has computed the TCP/UDP checksum.


based fragmentation.  We have three cases

Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Andre Oppermann

On 31.10.2012 20:40, Ian Lepore wrote:

On Thu, 2012-11-01 at 06:30 +1100, Peter Jeremy wrote:

On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:

On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:

Right, but you didn't make it configurable for us embedded peeps who
still care about memory usage.


How is this possible without breaking the module/kernel ABI?


Memory usage may override ABI compatibility in an embedded environment.


All that assuming you can actually prove a real performance loss even
in the new cases.


The issue with padding on embedded systems is memory utilisation rather
than performance.



There are potential performance hits too, in that embedded systems tend
to have tiny caches (16K L1 with no L2, that sort of thing), so
purposely padding things so that large parts of a cache line aren't used
for anything wastes a scarce resource.


You can define CACHE_LINE_SIZE to 0 on those platforms.
Or to make it even more granular there could be a CACHE_LINE_SIZE_LOCKS
that is used for lock padding.
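A sketch of the trade-off, assuming a 64-byte cache line and using the hypothetical CACHE_LINE_SIZE_LOCKS knob suggested here: aligning the lock word forces each instance onto its own line, at the cost of 64 bytes per lock instead of one machine word. The structs are toy stand-ins, not struct mtx or struct mtx_padalign.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#ifndef CACHE_LINE_SIZE_LOCKS
#define CACHE_LINE_SIZE_LOCKS	64	/* assumed line size */
#endif

/* Toy unpadded lock: just the lock word. */
struct toy_mtx {
	volatile unsigned long	mtx_lock;
};

/* Toy padded lock: alignment rounds the size up to a whole line. */
struct toy_mtx_padalign {
	alignas(CACHE_LINE_SIZE_LOCKS) volatile unsigned long mtx_lock;
};

static_assert(sizeof(struct toy_mtx_padalign) % CACHE_LINE_SIZE_LOCKS == 0,
    "padded lock must occupy whole cache lines");
```

A platform with tiny caches could define CACHE_LINE_SIZE_LOCKS to a smaller value before this point, trading false-sharing protection for memory.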

--
Andre


That said, I think a point Attilio was trying to make is that we won't
see a large hit because this doesn't affect a large number of mutex
instances.  I'm willing to accept his expert advice on that, not in
small part because I'm not sure how I'd go about disputing it. :)

I'm really busy with $work right now, but things should calm down in a
couple weeks, and I'd be willing to do some measurements on arm systems
then, if I can get some help on how to generate useful data.

-- Ian








Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Andre Oppermann

On 31.10.2012 19:10, Attilio Rao wrote:

On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:

Author: attilio
Date: Wed Oct 31 18:07:18 2012
New Revision: 242402
URL: http://svn.freebsd.org/changeset/base/242402

Log:
   Rework the known mutexes to benefit about staying on their own
   cache line in order to avoid manual frobbing but using
   struct mtx_padalign.


Interested developers can now dig and look for other mutexes to
convert and just do it.
Please, however, try to enclose a description of the benchmark
which led you to believe it necessary to pad the mutex and possibly
some numbers, in particular when the lock belongs to structures or the
ABI itself.

Next steps involve porting the same mtx(9) changes to rwlock(9) and
port pvh global pmap lock to rwlock_padalign.


I'd say for an rwlock you can make it unconditional.  The very purpose
of it is to be acquired by multiple CPUs, causing cache line dirtying
for every concurrent reader.  Rwlocks are only ever used because multiple
concurrent readers are expected.

--
Andre



Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Andre Oppermann

On 01.11.2012 12:53, Attilio Rao wrote:

On 10/31/12, Andre Oppermann an...@freebsd.org wrote:

On 31.10.2012 19:10, Attilio Rao wrote:

On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:

Author: attilio
Date: Wed Oct 31 18:07:18 2012
New Revision: 242402
URL: http://svn.freebsd.org/changeset/base/242402

Log:
Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.


Interested developers can now dig and look for other mutexes to
convert and just do it.
Please, however, try to enclose a description of the benchmark
which led you to believe it necessary to pad the mutex and possibly
some numbers, in particular when the lock belongs to structures or the
ABI itself.

Next steps involve porting the same mtx(9) changes to rwlock(9) and
port pvh global pmap lock to rwlock_padalign.


I'd say for an rwlock you can make it unconditional.  The very purpose
of it is to be acquired by multiple CPUs, causing cache line dirtying
for every concurrent reader.  Rwlocks are only ever used because multiple
concurrent readers are expected.


I thought about it, but I think the same arguments as for mutexes remain.
The real problem is that having rwlocks pad-aligned by default would be a
showstopper for their use in sensitive structures. For example, I
have plans to use them in vm_object at some point to replace
VM_OBJECT_LOCK and I do want to avoid the extra bloat for such
structures.

Also, please keep in mind that there is no direct relation between
read acquisition and high contention with the latter being the
real reason for having pad-aligned locks.


I do not agree.  If there is no contention then there is no need for
a rwlock, a normal mutex would be sufficient.  A rwlock is used when
multiple concurrent readers are expected.  Each read lock and unlock
dirties the cache line for all other CPUs.

Please note that I don't want to prevent you from doing the work all
over for rwlocks.  It's just that the use case for a non-padded rwlock
is very narrow.

--
Andre



Re: svn commit: r242421 - head/sys/dev/ixgbe

2012-11-01 Thread Andre Oppermann

On 01.11.2012 00:50, Jack F Vogel wrote:

Author: jfv
Date: Wed Oct 31 23:50:36 2012
New Revision: 242421
URL: http://svn.freebsd.org/changeset/base/242421

Log:
   A few important fixes:
 - Testing TSO6 has led me to discover that HW RSC is
   a problematic feature: it is ONLY designed to work
   with IPv4 in the first place, and if IP forwarding
   is done it can't be disabled the way LRO in the stack
   can. Also, initial testing we've done at Intel shows
   equal performance using TSO[46] on TX and LRO
   on RX; if you ran older code on 82599 or later hardware
   you actually could have had detrimental performance for
   this reason. So I am disabling the feature by default
   and all our adapters will now use LRO instead.


Yes, it's very important that LRO is *not* used when forwarding
is enabled (= acting as a router).


 - If you have flow control off and multiple queues, it
   was possible, when the buffer of one queue became
   full, for all RX movement to stall. To eliminate
   this problem a feature bit is now set that allows
   packets to be dropped when full rather than stall.
   Note, the default is to have flow control on, which
   keeps this from happening.

 - Because of the recent fixes in the stack, LRO is now
   auto-disabled when problematic, so I have decided to
   enable it by default in the capabilities in the driver.


A very important cautionary note here: LRO is only good when combined
with very low RTTs (that is, in LAN environments).  With anything over
5ms it breaks the TCP ACK clock badly and performance will suffer
greatly.  This is because every ACK increases the congestion window.
With a greatly reduced ACK rate the ramping up of CWND on startup and
after a loss event is severely limited.  Combined with ABC (appropriate
byte counting), where the CWND increases only once per ACK by at most
one MSS, the effect is even more pronounced.  The higher the RTT
goes the worse the effects become.  I haven't checked yet whether our
soft-LRO does ACK compression or not.  If it does, we need a workaround
and some tcp_input magic to reduce the negative impact.  I'm looking
into it.
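A toy slow-start model illustrates the ACK-clock argument: with ABC capping growth at one MSS per ACK, aggregating k segments into one ACK cuts per-RTT growth from doubling to roughly a factor of (1 + 1/k). This is a deliberate simplification (no losses, no ssthresh, cwnd counted in segments, one ACK per k segments rounded up) and not FreeBSD's tcp_input logic; cwnd_after() is a hypothetical name.

```c
#include <assert.h>

/*
 * Slow-start cwnd (in segments) after 'rtts' round trips when the
 * receiver side (delayed ACK, LRO) yields one ACK per k segments and
 * each ACK grows cwnd by at most one segment (ABC, L = 1 MSS).
 */
static unsigned long
cwnd_after(unsigned long cwnd, unsigned k, unsigned rtts)
{
	assert(k >= 1 && cwnd >= 1);
	while (rtts-- > 0)
		cwnd += (cwnd + k - 1) / k;	/* ACKs this RTT, rounded up */
	return (cwnd);
}
```

Starting at 10 segments, three RTTs take cwnd to 80 with one ACK per segment, but only to 16 with one ACK per 8 segments.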


 - There are some 1G modules used by some customers; this adds a
   couple of small tweaks to properly support those in the media code.

 - A note: we have now done some testing of TSO6 and using
   LRO with IPv6 and it all works great!! Seeing line rate
   in both directions in best cases. Thanks bz for your
   excellent work!!


Indeed!


Modified:
   head/sys/dev/ixgbe/ixgbe.c


--
Andre



svn commit: r211315 - head/sys/netinet

2010-08-14 Thread Andre Oppermann
Author: andre
Date: Sat Aug 14 20:40:55 2010
New Revision: 211315
URL: http://svn.freebsd.org/changeset/base/211315

Log:
  Disable TCP inflight limiter by default.
  
  It was experimental and interferes with the normal congestion control
  algorithms by instating a separate, possibly lower, ceiling for the
  amount of data that is in flight to the remote host.  With high speed
  internet connections the inflight limit frequently has been estimated
  too low due to the noisy nature of the RTT measurements.
  
  This code gives way for the upcoming pluggable congestion control
  framework.  It is the task of the congestion control algorithm to
  set the congestion window and amount of inflight data without external
  interference.
  
  Reviewed by:  lstewart
  MFC after:    1 week
  Removal after:    1 month

Modified:
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Sat Aug 14 20:12:10 2010 (r211314)
+++ head/sys/netinet/tcp_subr.c Sat Aug 14 20:40:55 2010 (r211315)
@@ -221,7 +221,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, inflight, CTLFLAG_RW, 0,
 "TCP inflight data limiting");
 
-static VNET_DEFINE(int, tcp_inflight_enable) = 1;
+static VNET_DEFINE(int, tcp_inflight_enable) = 0;
 #define V_tcp_inflight_enable   VNET(tcp_inflight_enable)
 SYSCTL_VNET_INT(_net_inet_tcp_inflight, OID_AUTO, enable, CTLFLAG_RW,
 &VNET_NAME(tcp_inflight_enable), 0,


svn commit: r211316 - head/sys/netinet

2010-08-14 Thread Andre Oppermann
Author: andre
Date: Sat Aug 14 21:04:27 2010
New Revision: 211316
URL: http://svn.freebsd.org/changeset/base/211316

Log:
  Change the messages of the ICMP bad port bandwidth limiter from
  a kernel printf to a log output with the priority of LOG_NOTICE.
  
  This way the messages still show up in /var/log/messages but no
  longer spam the console every other second on busy servers that
  are port scanned:
   Limiting open port RST response from 114 to 100 packets/sec
  
  PR:   kern/147352
  Submitted by: Eugene Grosbein eugen-at-eg sd rdtc ru
  MFC after:    1 week

Modified:
  head/sys/netinet/ip_icmp.c

Modified: head/sys/netinet/ip_icmp.c
==
--- head/sys/netinet/ip_icmp.c  Sat Aug 14 20:40:55 2010 (r211315)
+++ head/sys/netinet/ip_icmp.c  Sat Aug 14 21:04:27 2010 (r211316)
@@ -42,6 +42,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/time.h>
 #include <sys/kernel.h>
 #include <sys/sysctl.h>
+#include <sys/syslog.h>
 
 #include <net/if.h>
 #include <net/if_types.h>
@@ -975,7 +976,7 @@ badport_bandlim(int which)
 * the previous behaviour at the expense of added complexity.
 */
if (V_icmplim_output && opps > V_icmplim)
-   printf("Limiting %s from %d to %d packets/sec\n",
+   log(LOG_NOTICE, "Limiting %s from %d to %d packets/sec\n",
r->type, opps, V_icmplim);
}
return 0;   /* okay to send packet */


svn commit: r211327 - head/sys/netinet

2010-08-15 Thread Andre Oppermann
Author: andre
Date: Sun Aug 15 09:30:13 2010
New Revision: 211327
URL: http://svn.freebsd.org/changeset/base/211327

Log:
  Add more logging points for failures in syncache_socket() to
  report when a new socket couldn't be created because one of
  in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.
  
  Logging is conditional on net.inet.tcp.log_debug being enabled.
  
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==
--- head/sys/netinet/tcp_syncache.c Sun Aug 15 08:49:07 2010 (r211326)
+++ head/sys/netinet/tcp_syncache.c Sun Aug 15 09:30:13 2010 (r211327)
@@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
struct inpcb *inp = NULL;
struct socket *so;
struct tcpcb *tp;
+   int error = 0;
char *s;
 
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
@@ -675,7 +676,7 @@ syncache_socket(struct syncache *sc, str
}
 #endif
inp->inp_lport = sc->sc_inc.inc_lport;
-   if (in_pcbinshash(inp) != 0) {
+   if ((error = in_pcbinshash(inp)) != 0) {
/*
 * Undo the assignments above if we failed to
 * put the PCB on the hash lists.
@@ -687,6 +688,12 @@ syncache_socket(struct syncache *sc, str
 #endif
inp->inp_laddr.s_addr = INADDR_ANY;
inp->inp_lport = 0;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in_pcbinshash failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
 #ifdef IPSEC
@@ -721,9 +728,15 @@ syncache_socket(struct syncache *sc, str
laddr6 = inp->in6p_laddr;
if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
inp->in6p_laddr = sc->sc_inc.inc6_laddr;
-   if (in6_pcbconnect(inp, (struct sockaddr *)&sin6,
-   thread0.td_ucred)) {
+   if ((error = in6_pcbconnect(inp, (struct sockaddr *)&sin6,
+   thread0.td_ucred)) != 0) {
inp->in6p_laddr = laddr6;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in6_pcbconnect failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
/* Override flowlabel from in6_pcbconnect. */
@@ -750,9 +763,15 @@ syncache_socket(struct syncache *sc, str
laddr = inp->inp_laddr;
if (inp->inp_laddr.s_addr == INADDR_ANY)
inp->inp_laddr = sc->sc_inc.inc_laddr;
-   if (in_pcbconnect(inp, (struct sockaddr *)&sin,
-   thread0.td_ucred)) {
+   if ((error = in_pcbconnect(inp, (struct sockaddr *)&sin,
+   thread0.td_ucred)) != 0) {
inp->inp_laddr = laddr;
+   if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: in_pcbconnect failed "
+   "with error %i\n",
+   s, __func__, error);
+   free(s, M_TCPLOG);
+   }
goto abort;
}
}


svn commit: r211333 - head/sys/netinet

2010-08-15 Thread Andre Oppermann
Author: andre
Date: Sun Aug 15 13:25:18 2010
New Revision: 211333
URL: http://svn.freebsd.org/changeset/base/211333

Log:
  Fix the interaction between 'ICMP fragmentation needed' MTU updates,
  path MTU discovery and the tcp_minmss limiter for very small MTUs.
  
  When the MTU suggested by the gateway via ICMP (or, if there isn't
  any, the next smaller step from ip_next_mtu()) is lower than the
  floor enforced by net.inet.tcp.minmss (default 216), the value is
  ignored and the default MSS (512) is used instead.  However the
  DF flag in the IP header is still set in tcp_output(), preventing
  fragmentation by the gateway.
  
  Fix this by using tcp_minmss as the MSS and clearing the DF flag if
  the suggested MTU is too low.  This turns off path MTU discovery
  for the remainder of the session and allows fragmentation to be
  done by the gateway.
  
  Only MTUs smaller than 256 are affected.  The smallest official
  MTU specified is for AX.25 packet radio at 256 octets.
  
  PR:   kern/146628
  Tested by:    Matthew Luckie mjl-at-luckie org nz
  MFC after:    1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Aug 15 13:07:08 2010 (r211332)
+++ head/sys/netinet/tcp_output.c   Sun Aug 15 13:25:18 2010 (r211333)
@@ -1186,8 +1186,10 @@ timer:
 * This might not be the best thing to do according to RFC3390
 * Section 2. However the tcp hostcache migitates the problem
 * so it affects only the first tcp connection with a host.
+*
+* NB: Don't set DF on small MTU/MSS to have a safe fallback.
 */
-   if (V_path_mtu_discovery)
+   if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
ip->ip_off |= IP_DF;
 
error = ip_output(m, tp->t_inpcb->inp_options, NULL,

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Sun Aug 15 13:07:08 2010 (r211332)
+++ head/sys/netinet/tcp_subr.c Sun Aug 15 13:25:18 2010 (r211333)
@@ -1339,11 +1339,9 @@ tcp_ctlinput(int cmd, struct sockaddr *s
if (!mtu)
mtu = ip_next_mtu(ip->ip_len,
 1);
-   if (mtu < max(296, V_tcp_minmss
-+ sizeof(struct tcpiphdr)))
-   mtu = 0;
-   if (!mtu)
-   mtu = V_tcp_mssdflt
+   if (mtu < V_tcp_minmss
++ sizeof(struct tcpiphdr))
+   mtu = V_tcp_minmss
 + sizeof(struct tcpiphdr);
/*
 * Only cache the the MTU if it
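The two halves of this change can be condensed into a userland sketch, assuming sizeof(struct tcpiphdr) is 40 (a 20-byte IPv4 header plus a 20-byte TCP header). clamp_icmp_mtu() and set_df() are hypothetical helpers mirroring the tcp_ctlinput() and tcp_output() hunks; the ip_next_mtu() fallback for a missing ICMP MTU is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TCPIP_HDR_LEN	40	/* assumed sizeof(struct tcpiphdr) */

/* tcp_ctlinput(): raise a too-small suggested MTU to the minmss floor. */
static unsigned
clamp_icmp_mtu(unsigned mtu, unsigned minmss)
{
	assert(minmss > 0);
	if (mtu < minmss + TCPIP_HDR_LEN)
		mtu = minmss + TCPIP_HDR_LEN;
	return (mtu);
}

/*
 * tcp_output(): set DF only while the segment size is above the floor,
 * so a connection clamped to minmss can fall back to fragmentation.
 */
static bool
set_df(bool pmtud_enabled, unsigned maxopd, unsigned minmss)
{
	return (pmtud_enabled && maxopd > minmss);
}
```

With the default minmss of 216, any suggested MTU below 256 is raised to 256, and a connection whose t_maxopd sits at the 216-byte floor no longer gets DF set.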


Re: svn commit: r211327 - head/sys/netinet

2010-08-15 Thread Andre Oppermann

On 15.08.2010 11:41, Bjoern A. Zeeb wrote:

On Sun, 15 Aug 2010, Andre Oppermann wrote:


Author: andre
Date: Sun Aug 15 09:30:13 2010
New Revision: 211327
URL: http://svn.freebsd.org/changeset/base/211327

Log:
Add more logging points for failures in syncache_socket() to
report when a new socket couldn't be created because one of
in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.

Logging is conditional on net.inet.tcp.log_debug being enabled.

MFC after: 1 week

Modified:
head/sys/netinet/tcp_syncache.c

Modified: head/sys/netinet/tcp_syncache.c
==

--- head/sys/netinet/tcp_syncache.c Sun Aug 15 08:49:07 2010 (r211326)
+++ head/sys/netinet/tcp_syncache.c Sun Aug 15 09:30:13 2010 (r211327)
@@ -627,6 +627,7 @@ syncache_socket(struct syncache *sc, str
struct inpcb *inp = NULL;
struct socket *so;
struct tcpcb *tp;
+ int error = 0;



Is there any need to initialize here?


No.  Actually not.  Was just my style of using safe initial values.
But here the return value is the socket pointer or NULL.  The error
is not passed back directly.

Fixed in r211332.

Thanks for noticing and reporting.

--
Andre


svn commit: r211396 - head/sys/vm

2010-08-16 Thread Andre Oppermann
Author: andre
Date: Mon Aug 16 14:24:00 2010
New Revision: 211396
URL: http://svn.freebsd.org/changeset/base/211396

Log:
  Add uma_zone_get_max() to obtain the effective limit after a call
  to uma_zone_set_max().
  
  The UMA zone limit is not exactly set to the value supplied but
  rounded up to completely fill the backing store increment (a page
  normally).  This can lead to surprising situations where the number
  of elements allocated from UMA is higher than the supplied limit
  value.  The new get function reads back the effective value so that
  the supplied limit value can be adjusted to the real limit.
  
  Reviewed by:  jeffr
  MFC after:	1 week

Modified:
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c

Modified: head/sys/vm/uma.h
==
--- head/sys/vm/uma.h   Mon Aug 16 12:37:17 2010(r211395)
+++ head/sys/vm/uma.h   Mon Aug 16 14:24:00 2010(r211396)
@@ -459,6 +459,18 @@ int uma_zone_set_obj(uma_zone_t zone, st
 void uma_zone_set_max(uma_zone_t zone, int nitems);
 
 /*
+ * Obtains the effective limit on the number of items in a zone
+ *
+ * Arguments:
+ * zone  The zone to obtain the effective limit from
+ *
+ * Return:
+ * 0  No limit
+ * int  The effective limit of the zone
+ */
+int uma_zone_get_max(uma_zone_t zone);
+
+/*
  * The following two routines (uma_zone_set_init/fini)
  * are used to set the backend init/fini pair which acts on an
  * object as it becomes allocated and is placed in a slab within

Modified: head/sys/vm/uma_core.c
==
--- head/sys/vm/uma_core.c  Mon Aug 16 12:37:17 2010(r211395)
+++ head/sys/vm/uma_core.c  Mon Aug 16 14:24:00 2010(r211396)
@@ -2797,6 +2797,24 @@ uma_zone_set_max(uma_zone_t zone, int ni
 }
 
 /* See uma.h */
+int
+uma_zone_get_max(uma_zone_t zone)
+{
+   int nitems;
+   uma_keg_t keg;
+
+   ZONE_LOCK(zone);
+   keg = zone_first_keg(zone);
+   if (keg->uk_maxpages)
+   nitems = keg->uk_maxpages * keg->uk_ipers;
+   else
+   nitems = 0;
+   ZONE_UNLOCK(zone);
+
+   return (nitems);
+}
+
+/* See uma.h */
 void
 uma_zone_set_init(uma_zone_t zone, uma_init uminit)
 {
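The rounding described in the log can be illustrated with a small arithmetic model. This sketch assumes each slab occupies exactly one page and uses hypothetical names; it is not UMA's implementation, only the arithmetic that makes the effective limit a whole multiple of the items per slab, which is what uma_zone_get_max() reports back.

```c
#include <assert.h>

/*
 * Hypothetical model of a keg limit: the requested item count is
 * rounded up to whole slabs, assuming one page per slab.
 */
struct sketch_keg {
	int ipers;    /* items per slab (assumed one page per slab) */
	int maxpages; /* page limit, 0 means unlimited */
};

void
sketch_set_max(struct sketch_keg *keg, int nitems)
{
	assert(keg->ipers > 0 && nitems >= 0);
	keg->maxpages = nitems / keg->ipers;
	if (keg->maxpages * keg->ipers < nitems)
		keg->maxpages++;	/* round up to fill the last page */
}

int
sketch_get_max(const struct sketch_keg *keg)
{
	/* Same shape as uma_zone_get_max(): 0 means no limit. */
	return (keg->maxpages ? keg->maxpages * keg->ipers : 0);
}
```

With 30 items per page, asking for a limit of 100 yields 4 pages and an effective limit of 120 items, which is the kind of surprise the new getter lets callers detect and compensate for.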


svn commit: r211464 - head/sys/netinet

2010-08-18 Thread Andre Oppermann
Author: andre
Date: Wed Aug 18 18:05:54 2010
New Revision: 211464
URL: http://svn.freebsd.org/changeset/base/211464

Log:
  If a TCP connection has been idle for one retransmit timeout or more
  it must reset its congestion window back to the initial window.
  
  RFC3390 has increased the initial window from 1 segment to up to
  4 segments.
  
  The initial window increase of RFC3390 wasn't reflected into the
  restart window which remained at its original defaults of 4 segments
  for local and 1 segment for all other connections.  Both values are
  controllable through sysctl net.inet.tcp.local_slowstart_flightsize
  and net.inet.tcp.slowstart_flightsize.
  
  The increase helps TCP's slow start algorithm to open up the congestion
  window much faster.
  
  Reviewed by:  lstewart
  MFC after:	1 week

Modified:
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Wed Aug 18 17:40:10 2010 (r211463)
+++ head/sys/netinet/tcp_output.c   Wed Aug 18 18:05:54 2010 (r211464)
@@ -140,7 +140,7 @@ tcp_output(struct tcpcb *tp)
 {
struct socket *so = tp->t_inpcb->inp_socket;
long len, recwin, sendwin;
-   int off, flags, error;
+   int off, flags, error, rw;
struct mbuf *m;
struct ip *ip = NULL;
struct ipovly *ipov = NULL;
@@ -176,23 +176,34 @@ tcp_output(struct tcpcb *tp)
idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur) {
/*
-* We have been idle for a while and no acks are
-* expected to clock out any data we send --
-* slow start to get ack clock running again.
+* If we've been idle for more than one retransmit
+* timeout the old congestion window is no longer
+* current and we have to reduce it to the restart
+* window before we can transmit again.
 *
-* Set the slow-start flight size depending on whether
-* this is a local network or not.
+* The restart window is the initial window or the last
+* CWND, whichever is smaller.
+* 
+* This is done to prevent us from flooding the path with
+* a full CWND at wirespeed, overloading router and switch
+* buffers along the way.
+*
+* See RFC5681 Section 4.1. Restarting Idle Connections.
 */
-   int ss = V_ss_fltsz;
+   if (V_tcp_do_rfc3390)
+   rw = min(4 * tp->t_maxseg,
+max(2 * tp->t_maxseg, 4380));
 #ifdef INET6
-   if (isipv6) {
-   if (in6_localaddr(&tp->t_inpcb->in6p_faddr))
-   ss = V_ss_fltsz_local;
-   } else
-#endif /* INET6 */
-   if (in_localaddr(tp->t_inpcb->inp_faddr))
-   ss = V_ss_fltsz_local;
-   tp->snd_cwnd = tp->t_maxseg * ss;
+   else if ((isipv6 ? in6_localaddr(&tp->t_inpcb->in6p_faddr) :
+ in_localaddr(tp->t_inpcb->inp_faddr)))
+#else
+   else if (in_localaddr(tp->t_inpcb->inp_faddr))
+#endif
+   rw = V_ss_fltsz_local * tp->t_maxseg;
+   else
+   rw = V_ss_fltsz * tp->t_maxseg;
+
+   tp->snd_cwnd = min(rw, tp->snd_cwnd);
}
tp->t_flags &= ~TF_LASTIDLE;
if (idle) {

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Wed Aug 18 17:40:10 2010(r211463)
+++ head/sys/netinet/tcp_var.h  Wed Aug 18 18:05:54 2010(r211464)
@@ -565,6 +565,7 @@ extern  int tcp_log_in_vain;
 VNET_DECLARE(int, tcp_mssdflt);/* XXX */
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
+VNET_DECLARE(int, tcp_do_rfc3390);
 VNET_DECLARE(int, tcp_do_newreno);
 VNET_DECLARE(int, path_mtu_discovery);
 VNET_DECLARE(int, ss_fltsz);
@@ -575,6 +576,7 @@ VNET_DECLARE(int, ss_fltsz_local);
 #define V_tcp_mssdflt          VNET(tcp_mssdflt)
 #define V_tcp_minmss           VNET(tcp_minmss)
 #define V_tcp_delack_enabled   VNET(tcp_delack_enabled)
+#define V_tcp_do_rfc3390       VNET(tcp_do_rfc3390)
 #define V_tcp_do_newreno       VNET(tcp_do_newreno)
 #define V_path_mtu_discovery   VNET(path_mtu_discovery)
 #define V_ss_fltsz             VNET(ss_fltsz)
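The restart-window rule from r211464 can be modelled as a pure function. This is a hedged sketch with hypothetical names: do_rfc3390, is_local, ss_fltsz and ss_fltsz_local stand in for V_tcp_do_rfc3390, the in_localaddr()/in6_localaddr() test, and the net.inet.tcp.slowstart_flightsize and local_slowstart_flightsize sysctls.

```c
#include <assert.h>

static long
rw_min(long a, long b)
{
	return (a < b ? a : b);
}

static long
rw_max(long a, long b)
{
	return (a > b ? a : b);
}

/* Returns the congestion window to use after an idle period. */
long
restart_cwnd(long cwnd, long maxseg, int do_rfc3390, int is_local,
    int ss_fltsz, int ss_fltsz_local)
{
	long rw;

	assert(maxseg > 0);
	if (do_rfc3390)
		/* RFC 3390 initial window: min(4*MSS, max(2*MSS, 4380)). */
		rw = rw_min(4 * maxseg, rw_max(2 * maxseg, 4380));
	else if (is_local)
		rw = (long)ss_fltsz_local * maxseg;
	else
		rw = (long)ss_fltsz * maxseg;

	/* The restart window only ever shrinks CWND, never grows it. */
	return (rw_min(rw, cwnd));
}
```

For a 1460-byte MSS with RFC 3390 enabled the restart window is 4380 bytes (three segments); with a 536-byte MSS it is 2144 bytes (four segments). A connection whose CWND had already shrunk below that keeps its smaller value.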


Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 13:53, Adrian Chadd wrote:

Author: adrian
Date: Thu Aug 19 11:53:55 2010
New Revision: 211503
URL: http://svn.freebsd.org/changeset/base/211503

Log:
   Add some initial AR724X chipset support.

   This is untested but should at least allow an AR724X to boot.


Isn't this something that should be done on a project branch and
merged back when in a good working state?


   The current code is lacking the detail needed to expose the PCIe bus.
   It is also lacking any NIC, PLL or flush/WB code.


--
Andre


Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 19:20, M. Warner Losh wrote:

In message: <4c6d2933.9020...@freebsd.org>
 Andre Oppermann <an...@freebsd.org> writes:
: On 19.08.2010 13:53, Adrian Chadd wrote:
:  Author: adrian
:  Date: Thu Aug 19 11:53:55 2010
:  New Revision: 211503
:  URL: http://svn.freebsd.org/changeset/base/211503
:
:  Log:
: Add some initial AR724X chipset support.
:
: This is untested but should at least allow an AR724X to boot.
:
: Isn't this something that should be done on a project branch and
: merged back when in a good working state?

We don't have a branch for mips stuff these days.  This stuff is OK,
since the AR724X is just being rolled out right now...  For non AR724x
systems, this won't affect anything...


I was more concerned about tree breakage for non-tested code.  When
developing something bleeding edge it is often useful to just commit
some stuff and have it sorted out later.  In head this is more
dangerous.  A small AR724X development branch would be ideal for
this.  Branching is cheap with SVN these days.

--
Andre


Re: svn commit: r211503 - head/sys/mips/atheros

2010-08-19 Thread Andre Oppermann

On 19.08.2010 20:42, M. Warner Losh wrote:

In message: <4c6d6fd7.7060...@freebsd.org>
 Andre Oppermann <an...@freebsd.org> writes:
: On 19.08.2010 19:20, M. Warner Losh wrote:
:  In message: <4c6d2933.9020...@freebsd.org>
:   Andre Oppermann <an...@freebsd.org> writes:
:  : On 19.08.2010 13:53, Adrian Chadd wrote:
:  :   Author: adrian
:  :   Date: Thu Aug 19 11:53:55 2010
:  :   New Revision: 211503
:  :   URL: http://svn.freebsd.org/changeset/base/211503
:  :
:  :   Log:
:  :  Add some initial AR724X chipset support.
:  :
:  :  This is untested but should at least allow an AR724X to boot.
:  :
:  : Isn't this something that should be done on a project branch and
:  : merged back when in a good working state?
:
:  We don't have a branch for mips stuff these days.  This stuff is OK,
:  since the AR724X is just being rolled out right now...  For non AR724x
:  systems, this won't affect anything...
:
: I was more concerned about tree breakage for non-tested code.  When
: developing something bleeding edge it is often useful to just commit
: some stuff and have it sorted out later.  In head this is more
: dangerous.  A small AR724X development branch would be ideal for
: this.  Branching is cheap with SVN these days.

Merging isn't that cheap with svn.  The svn:mergeinfo properties make
them a pita.  Given that this code won't break anything, except
possibly the now-unsupported AR724x, I think a branch would be
overkill.  We'd have to drag that branch along all the time until we
can get actual hardware to test it on, which is a high overhead.


Didn't know that branching and merging isn't that easy with SVN after
all.  This was one of the supposed benefits for switching from CVS.
If there is no risk of head breakage I don't mind at all.

--
Andre


svn commit: r211874 - head/sys/netinet

2010-08-27 Thread Andre Oppermann
Author: andre
Date: Fri Aug 27 12:34:53 2010
New Revision: 211874
URL: http://svn.freebsd.org/changeset/base/211874

Log:
  Use timestamp modulo comparison macro for automatic receive buffer
  scaling to correctly handle wrapping of ticks value.
  
  MFC after:	1 week

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c    Fri Aug 27 11:08:11 2010 (r211873)
+++ head/sys/netinet/tcp_input.c    Fri Aug 27 12:34:53 2010 (r211874)
@@ -1441,7 +1441,7 @@ tcp_do_segment(struct mbuf *m, struct tc
if (V_tcp_do_autorcvbuf &&
to.to_tsecr &&
(so->so_rcv.sb_flags & SB_AUTOSIZE)) {
-   if (to.to_tsecr > tp->rfbuf_ts &&
+   if (TSTMP_GT(to.to_tsecr, tp->rfbuf_ts) &&
to.to_tsecr - tp->rfbuf_ts < hz) {
if (tp->rfbuf_cnt >
(so->so_rcv.sb_hiwat / 8 * 7) &&
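The fix matters because ticks-based timestamps are 32-bit counters that wrap. A minimal sketch of a modulo comparison in the style of the TSTMP_GT() macro (the exact kernel definition may differ; the function names here are hypothetical) shows where a plain > goes wrong:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Modulo-2^32 "greater than": the unsigned difference is reinterpreted
 * as signed (wrapping on the usual two's-complement targets), so a
 * value just past the wrap still compares as later.  Valid while the
 * two stamps are less than 2^31 apart.
 */
int
sketch_tstmp_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

/* Naive comparison as used before r211874, shown for contrast. */
int
sketch_naive_gt(uint32_t a, uint32_t b)
{
	return (a > b);
}
```

If the echoed timestamp is 5 and rfbuf_ts was 0xfffffff0 just before the wrap, the naive test says "not later" and autoscaling stalls, while the modulo test correctly sees a difference of 21 ticks.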


svn commit: r212731 - head/sys/netinet

2010-09-16 Thread Andre Oppermann
Author: andre
Date: Thu Sep 16 12:13:06 2010
New Revision: 212731
URL: http://svn.freebsd.org/changeset/base/212731

Log:
  Improve comment to TCP_MINMSS by taking the wording from lstewart (with
  a small difference in the last paragraph though) as suggested by jhb.
  
  Clarify that the 'reviewed by' in r212653 by lstewart was for the
  functional change, not the comments in the committed version.

Modified:
  head/sys/netinet/tcp.h

Modified: head/sys/netinet/tcp.h
==
--- head/sys/netinet/tcp.h  Thu Sep 16 12:05:46 2010(r212730)
+++ head/sys/netinet/tcp.h  Thu Sep 16 12:13:06 2010(r212731)
@@ -120,18 +120,18 @@ struct tcphdr {
 #define TCP6_MSS        1220
 
 /*
- * Limit the lowest MSS we accept from path MTU discovery and the TCP SYN MSS
- * option.  Allowing too low values of MSS can consume significant amounts of
- * resources and be used as a form of a resource exhaustion attack.
+ * Limit the lowest MSS we accept for path MTU discovery and the TCP SYN MSS
+ * option.  Allowing low values of MSS can consume significant resources and
+ * be used to mount a resource exhaustion attack.
  * Connections requesting lower MSS values will be rounded up to this value
- * and the IP_DF flag is cleared to allow fragmentation along the path.
+ * and the IP_DF flag will be cleared to allow fragmentation along the path.
  *
  * See tcp_subr.c tcp_minmss SYSCTL declaration for more comments.  Setting
  * it to 0 disables the minmss check.
  *
- * The default value is fine for the smallest official link MTU (256 bytes,
- * AX.25 packet radio) in the Internet.  However it is very unlikely to come
- * across such low MTU interfaces these days (anno domini 2003).
+ * The default value is fine for TCP across the Internet's smallest official
+ * link MTU (256 bytes for AX.25 packet radio).  However, a connection is very
+ * unlikely to come across such low MTU interfaces these days (anno domini 2003).
  */
 #define TCP_MINMSS      216
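The value 216 lines up with the 256-byte AX.25 MTU mentioned in the comment once 20 bytes each of IPv4 and TCP header are removed. The sketch below makes that arithmetic explicit and models the "round up and clear IP_DF" policy the comment describes; the macro and function names are hypothetical, and the header sizes assume no IP or TCP options.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AX25_MTU   256 /* smallest official link MTU (AX.25) */
#define SKETCH_IP_HDR     20  /* IPv4 header without options */
#define SKETCH_TCP_HDR    20  /* TCP header without options */
#define SKETCH_TCP_MINMSS (SKETCH_AX25_MTU - SKETCH_IP_HDR - SKETCH_TCP_HDR)

/*
 * Returns the MSS to use; *clear_df is set when the peer's value had
 * to be raised, since such segments may then need fragmenting.
 */
int
sketch_clamp_mss(int peer_mss, int *clear_df)
{
	assert(clear_df != NULL);
	*clear_df = 0;
	if (peer_mss < SKETCH_TCP_MINMSS) {
		peer_mss = SKETCH_TCP_MINMSS;
		*clear_df = 1;
	}
	return (peer_mss);
}
```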
 


svn commit: r212769 - head/share/man/man4

2010-09-16 Thread Andre Oppermann
Author: andre
Date: Thu Sep 16 22:11:55 2010
New Revision: 212769
URL: http://svn.freebsd.org/changeset/base/212769

Log:
  The inflight bandwidth limiter was removed in r212765.

Modified:
  head/share/man/man4/tcp.4

Modified: head/share/man/man4/tcp.4
==
--- head/share/man/man4/tcp.4   Thu Sep 16 21:18:25 2010(r212768)
+++ head/share/man/man4/tcp.4   Thu Sep 16 22:11:55 2010(r212769)
@@ -32,7 +32,7 @@
 .\" From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
 .\" $FreeBSD$
 .\"
-.Dd August 16, 2008
+.Dd September 16, 2010
 .Dt TCP 4
 .Os
 .Sh NAME
@@ -383,72 +383,6 @@ code.
 For this reason, we use 200ms of slop and a near-0
 minimum, which gives us an effective minimum of 200ms (similar to
 .Tn Linux ) .
-.It Va inflight.enable
-Enable
-.Tn TCP
-bandwidth-delay product limiting.
-An attempt will be made to calculate
-the bandwidth-delay product for each individual
-.Tn TCP
-connection, and limit
-the amount of inflight data being transmitted, to avoid building up
-unnecessary packets in the network.
-This option is recommended if you
-are serving a lot of data over connections with high bandwidth-delay
-products, such as modems, GigE links, and fast long-haul WANs, and/or
-you have configured your machine to accommodate large
-.Tn TCP
-windows.
-In such
-situations, without this option, you may experience high interactive
-latencies or packet loss due to the overloading of intermediate routers
-and switches.
-Note that bandwidth-delay product limiting only effects
-the transmit side of a
-.Tn TCP
-connection.
-.It Va inflight.debug
-Enable debugging for the bandwidth-delay product algorithm.
-.It Va inflight.min
-This puts a lower bound on the bandwidth-delay product window, in bytes.
-A value of 1024 is typically used for debugging.
-6000-16000 is more typical in a production installation.
-Setting this value too low may result in
-slow ramp-up times for bursty connections.
-Setting this value too high effectively disables the algorithm.
-.It Va inflight.max
-This puts an upper bound on the bandwidth-delay product window, in bytes.
-This value should not generally be modified, but may be used to set a
-global per-connection limit on queued data, potentially allowing you to
-intentionally set a less than optimum limit, to smooth data flow over a
-network while still being able to specify huge internal
-.Tn TCP
-buffers.
-.It Va inflight.stab
-The bandwidth-delay product algorithm requires a slightly larger window
-than it otherwise calculates for stability.
-This parameter determines the extra window in maximal packets / 10.
-The default value of 20 represents 2 maximal packets.
-Reducing this value is not recommended, but you may
-come across a situation with very slow links where the
-.Xr ping 8
-time
-reduction of the default inflight code is not sufficient.
-If this case occurs, you should first try reducing
-.Va inflight.min
-and, if that does not
-work, reduce both
-.Va inflight.min
-and
-.Va inflight.stab ,
-trying values of
-15, 10, or 5 for the latter.
-Never use a value less than 5.
-Reducing
-.Va inflight.stab
-can lead to upwards of a 20% underutilization of the link
-as well as reducing the algorithm's ability to adapt to changing
-situations and should only be done as a last resort.
 .It Va rfc3042
 Enable the Limited Transmit algorithm as described in RFC 3042.
 It helps avoid timeouts on lossy links and also when the congestion window


svn commit: r212803 - head/sys/netinet

2010-09-17 Thread Andre Oppermann
Author: andre
Date: Fri Sep 17 22:05:27 2010
New Revision: 212803
URL: http://svn.freebsd.org/changeset/base/212803

Log:
  Rearrange the TSO code to make it more readable and to clearly
  separate the decision logic, of whether we can do TSO, and the
  calculation of the burst length into two distinct parts.
  
  Change the way the TSO burst length calculation is done. While
  TSO could do bursts of 65535 bytes that can't be represented in
  ip_len together with the IP and TCP header. Account for that and
  use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
  have the same value of 64K). When more data is available prevent
  less than MSS sized segments from being sent during the current
  TSO burst.
  
  Add two more KASSERTs to ensure the integrity of the packets.
  
  Tested by:	Ben Wilber ben-at-desync com
  MFC after:	10 days

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Fri Sep 17 21:53:56 2010 (r212802)
+++ head/sys/netinet/tcp_output.c   Fri Sep 17 22:05:27 2010 (r212803)
@@ -465,9 +465,8 @@ after_sack_rexmit:
}
 
/*
-* Truncate to the maximum segment length or enable TCP Segmentation
-* Offloading (if supported by hardware) and ensure that FIN is removed
-* if the length no longer contains the last data byte.
+* Decide if we can use TCP Segmentation Offloading (if supported by
+* hardware).
 *
 * TSO may only be used if we are in a pure bulk sending state.  The
 * presence of TCP-MD5, SACK retransmits, SACK advertizements and
@@ -475,10 +474,6 @@ after_sack_rexmit:
 * (except for the sequence number) for all generated packets.  This
 * makes it impossible to transmit any options which vary per generated
 * segment or packet.
-*
-* The length of TSO bursts is limited to TCP_MAXWIN.  That limit and
-* removal of FIN (if not already catched here) are handled later after
-* the exact length of the TCP options are known.
 */
 #ifdef IPSEC
/*
@@ -487,22 +482,15 @@ after_sack_rexmit:
 */
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
 #endif
-   if (len > tp->t_maxseg) {
-   if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
-   ((tp->t_flags & TF_SIGNATURE) == 0) &&
-   tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
-   tp->t_inpcb->inp_options == NULL &&
-   tp->t_inpcb->in6p_options == NULL
+   if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+   ((tp->t_flags & TF_SIGNATURE) == 0) &&
+   tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
 #ifdef IPSEC
-   && ipsec_optlen == 0
+   ipsec_optlen == 0 &&
 #endif
-   ) {
-   tso = 1;
-   } else {
-   len = tp->t_maxseg;
-   sendalot = 1;
-   }
-   }
+   tp->t_inpcb->inp_options == NULL &&
+   tp->t_inpcb->in6p_options == NULL)
+   tso = 1;
 
if (sack_rxmit) {
if (SEQ_LT(p-rxmit + len, tp-snd_una + so-so_snd.sb_cc))
@@ -732,28 +720,53 @@ send:
 * bump the packet length beyond the t_maxopd length.
 * Clear the FIN bit because we cut off the tail of
 * the segment.
-*
-* When doing TSO limit a burst to TCP_MAXWIN minus the
-* IP, TCP and Options length to keep ip->ip_len from
-* overflowing.  Prevent the last segment from being
-* fractional thus making them all equal sized and set
-* the flag to continue sending.  TSO is disabled when
-* IP options or IPSEC are present.
 */
if (len + optlen + ipoptlen > tp->t_maxopd) {
flags &= ~TH_FIN;
+
if (tso) {
-   if (len > TCP_MAXWIN - hdrlen - optlen) {
-   len = TCP_MAXWIN - hdrlen - optlen;
-   len = len - (len % (tp->t_maxopd - optlen));
+   KASSERT(ipoptlen == 0,
+   ("%s: TSO can't do IP options", __func__));
+
+   /*
+* Limit a burst to IP_MAXPACKET minus IP,
+* TCP and options length to keep ip->ip_len
+* from overflowing.
+*/
+   if (len > IP_MAXPACKET - hdrlen) {
+   len = IP_MAXPACKET - hdrlen;
+   sendalot = 1;
+   }
+
+   /*
+* Prevent the last segment from being
+* fractional unless the send sockbuf can
+* be emptied.
+*/
+   if (sendalot && off + len <

Re: svn commit: r212803 - head/sys/netinet

2010-09-18 Thread Andre Oppermann

On 18.09.2010 13:34, Bjoern A. Zeeb wrote:

On Fri, 17 Sep 2010, Andre Oppermann wrote:

@@ -487,22 +482,15 @@ after_sack_rexmit:
*/
ipsec_optlen = ipsec_hdrsiz_tcp(tp);
#endif
- if (len > tp->t_maxseg) {
- if ((tp->t_flags & TF_TSO) && V_tcp_do_tso &&
- ((tp->t_flags & TF_SIGNATURE) == 0) &&
- tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
- tp->t_inpcb->inp_options == NULL &&
- tp->t_inpcb->in6p_options == NULL
+ if ((tp->t_flags & TF_TSO) && V_tcp_do_tso && len > tp->t_maxseg &&
+ ((tp->t_flags & TF_SIGNATURE) == 0) &&
+ tp->rcv_numsacks == 0 && sack_rxmit == 0 &&
#ifdef IPSEC
- && ipsec_optlen == 0
+ ipsec_optlen == 0 &&
#endif
- ) {
- tso = 1;
- } else {
- len = tp->t_maxseg;
- sendalot = 1;
- }
- }
+ tp->t_inpcb->inp_options == NULL &&
+ tp->t_inpcb->in6p_options == NULL)
+ tso = 1;


In the non-TSO case you are no longer reducing len to tp->t_maxseg
here, if it's larger, which I think breaks assumptions all the way down.


No assumptions are broken for the non-TSO case.  The value of len is
only tested against t_maxseg for being equal or greater.  This always
holds true.  When the decision to send has been made, len is correctly
limited in both the non-TSO and the TSO case.  Before, a bit of either
was done in both places.  That is now merged into one spot.

--
Andre
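The single place where len is limited for a TSO burst can be modelled as follows. This sketch follows the log message and the visible part of the diff, but the quoted diff is cut off in the middle of the final condition, so the trimming step is an assumption, as are all names: hdrlen is taken to be the IP + TCP + option header length, seglen the payload per generated segment (t_maxopd - optlen in the kernel), off the offset into the send buffer and sb_cc the bytes queued in it.

```c
#include <assert.h>

#define SKETCH_IP_MAXPACKET 65535 /* same value as IP_MAXPACKET */

long
sketch_tso_len(long len, long hdrlen, long seglen, long off, long sb_cc,
    int *sendalot)
{
	assert(seglen > 0 && hdrlen < SKETCH_IP_MAXPACKET);

	/* Keep ip_len (header plus payload) within 16 bits. */
	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		*sendalot = 1;
	}

	/*
	 * If data remains queued after this burst, trim it to a whole
	 * number of segments so no short segment is sent mid-stream.
	 */
	if (*sendalot && off + len < sb_cc)
		len -= len % seglen;
	return (len);
}
```

With a 52-byte header and 1448-byte segments, a 100000-byte request is first cut to 65483 bytes and then trimmed to 65160 (45 full segments) while more data is queued; if those 65483 bytes empty the send buffer, they go out untrimmed.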


Re: svn commit: r236959 - in head: share/man/man4 sys/netinet

2012-06-13 Thread Andre Oppermann

On 12.06.2012 16:02, Michael Tuexen wrote:

Author: tuexen
Date: Tue Jun 12 14:02:38 2012
New Revision: 236959
URL: http://svn.freebsd.org/changeset/base/236959

Log:
   Add a IP_RECVTOS socket option to receive for received UDP/IPv4
   packets a cmsg of type IP_RECVTOS which contains the TOS byte.
   Much like IP_RECVTTL does for TTL. This allows to implement a
   protocol on top of UDP and implementing ECN.


You may want to consider aliasing IP_RECVTOS to IP_TOS, as is done with
IP_SENDSRCADDR and IP_RECVDSTADDR, to allow for simpler replies to
received UDP packets.  That way IP_RECVTOS has the same IP socket
option number and can be used for direct TOS reflection.

--
Andre
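On the receive side, the TOS byte arrives as ancillary data from recvmsg() once IP_RECVTOS has been enabled with setsockopt(). The hedged sketch below walks the control buffer with the standard CMSG macros; the function name is hypothetical, and because FreeBSD labels the control message IP_RECVTOS (per the log above) while Linux labels it IP_TOS, it accepts either.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Returns the received TOS byte, or -1 if no TOS cmsg is present. */
int
sketch_cmsg_tos(struct msghdr *msg)
{
	struct cmsghdr *cm;
	unsigned char tos;

	assert(msg != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    (cm->cmsg_type == IP_RECVTOS || cm->cmsg_type == IP_TOS) &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(tos))) {
			/* memcpy avoids alignment assumptions on the data. */
			memcpy(&tos, CMSG_DATA(cm), sizeof(tos));
			return (tos);
		}
	}
	return (-1);
}
```

For ECN, the low two bits of the returned value (tos & 0x03) hold the ECN codepoint, which is what a UDP-based protocol would inspect.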


svn commit: r227499 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4Mon Nov 14 15:10:01 2011(r227498)
+++ head/share/man/man4/ip.4Mon Nov 14 15:10:42 2011(r227499)
@@ -32,7 +32,7 @@
 .\"	@(#)ip.4	8.2 (Berkeley) 11/30/93
 .\" $FreeBSD$
 .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
 .Dt IP 4
 .Os
 .Sh NAME
@@ -847,3 +847,9 @@ The
 .Vt ip_mreqn
 structure appeared in
 .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.
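The BUGS entry describes a length field that older releases reported without the header. A hedged sketch of how a user-space consumer might normalise it, assuming ip_hl counts 32-bit words (so the removed amount is ip_hl * 4 bytes) and leaving byte-order handling out; the helper and its pre_fbsd10 flag are hypothetical.

```c
#include <assert.h>

/* Recover the total datagram length seen on a raw IP socket. */
unsigned int
sketch_raw_total_len(unsigned int ip_len, unsigned int ip_hl, int pre_fbsd10)
{
	assert(ip_hl >= 5 && ip_hl <= 15);	/* valid IPv4 header lengths */
	return (pre_fbsd10 ? ip_len + ip_hl * 4 : ip_len);
}
```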


svn commit: r227500 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:14:42 2011
New Revision: 227500
URL: http://svn.freebsd.org/changeset/base/227500

Log:
  Remove mention of ss_fltsz and ss_fltsz_local which were retired in r226447.

Modified:
  head/share/man/man4/tcp.4

Modified: head/share/man/man4/tcp.4
==
--- head/share/man/man4/tcp.4   Mon Nov 14 15:10:42 2011(r227499)
+++ head/share/man/man4/tcp.4   Mon Nov 14 15:14:42 2011(r227500)
@@ -38,7 +38,7 @@
 .\" From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
 .\" $FreeBSD$
 .\"
-.Dd September 15, 2011
+.Dd November 14, 2011
 .Dt TCP 4
 .Os
 .Sh NAME
@@ -290,14 +290,6 @@ That of 2 results in any
 packets to closed ports being logged.
 Any value unlisted above disables the logging
 (default is 0, i.e., the logging is disabled).
-.It Va slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase on a non-local network.
-.It Va local_slowstart_flightsize
-The number of packets allowed to be in-flight during the
-.Tn TCP
-slow-start phase to local machines in the same subnet.
 .It Va msl
 The Maximum Segment Lifetime, in milliseconds, for a packet.
 .It Va keepinit
@@ -411,15 +403,6 @@ maximum segment size.
 This helps throughput in general, but
 particularly affects short transfers and high-bandwidth large
 propagation-delay connections.
-.Pp
-When this feature is enabled, the
-.Va slowstart_flightsize
-and
-.Va local_slowstart_flightsize
-settings are not observed for new
-connection slow starts, but they are still used for slow starts
-that occur when the connection has been idle and starts sending
-again.
 .It Va sack.enable
 Enable support for RFC 2018, TCP Selective Acknowledgment option,
 which allows the receiver to inform the sender about all successfully
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r227499 - head/share/man/man4

2011-11-14 Thread Andre Oppermann

On 14.11.2011 16:38, Garrett Cooper wrote:

On Mon, Nov 14, 2011 at 7:10 AM, Andre Oppermann <an...@freebsd.org> wrote:

Author: andre
Date: Mon Nov 14 15:10:42 2011
New Revision: 227499
URL: http://svn.freebsd.org/changeset/base/227499

Log:
  Note the ip_len bug fixed in r226105 in the BUGS section.

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4Mon Nov 14 15:10:01 2011(r227498)
+++ head/share/man/man4/ip.4Mon Nov 14 15:10:42 2011(r227499)
@@ -32,7 +32,7 @@
  .\"	@(#)ip.4	8.2 (Berkeley) 11/30/93
  .\" $FreeBSD$
  .\"
-.Dd June 1, 2009
+.Dd November 14, 2011
  .Dt IP 4
  .Os
  .Sh NAME
@@ -847,3 +847,9 @@ The
  .Vt ip_mreqn
  structure appeared in
  .Tn Linux 2.4 .
+.Sh BUGS
+Before
+.Fx 10.0 packets received on raw IP sockets had the
+.Va ip_hl
+subtracted from the
+.Va ip_len field.


Isn't the fix going to be MFCed?


It was.  However, some ports depend on this bug, and due to the late
stage of the release cycle we decided to back out the MFC.

--
Andre


svn commit: r227501 - head/share/man/man4

2011-11-14 Thread Andre Oppermann
Author: andre
Date: Mon Nov 14 15:57:03 2011
New Revision: 227501
URL: http://svn.freebsd.org/changeset/base/227501

Log:
  mdoc fix for r227499.
  
  Reported by:  brueffer

Modified:
  head/share/man/man4/ip.4

Modified: head/share/man/man4/ip.4
==
--- head/share/man/man4/ip.4Mon Nov 14 15:14:42 2011(r227500)
+++ head/share/man/man4/ip.4Mon Nov 14 15:57:03 2011(r227501)
@@ -849,7 +849,8 @@ structure appeared in
 .Tn Linux 2.4 .
 .Sh BUGS
 Before
-.Fx 10.0 packets received on raw IP sockets had the
+.Fx 10.0
+packets received on raw IP sockets had the
 .Va ip_hl
 subtracted from the
 .Va ip_len field.


svn commit: r223839 - in head/sys: conf kern netinet

2011-07-07 Thread Andre Oppermann
Author: andre
Date: Thu Jul  7 10:37:14 2011
New Revision: 223839
URL: http://svn.freebsd.org/changeset/base/223839

Log:
  Remove the TCP_SORECEIVE_STREAM compile time option.  The use of
  soreceive_stream() for TCP still has to be enabled with the loader
  tuneable net.inet.tcp.soreceive_stream.
  
  Suggested by: trociny and others

Modified:
  head/sys/conf/options
  head/sys/kern/uipc_socket.c
  head/sys/netinet/tcp_subr.c

Modified: head/sys/conf/options
==
--- head/sys/conf/options   Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/conf/options   Thu Jul  7 10:37:14 2011(r223839)
@@ -427,7 +427,6 @@ SLIP_IFF_OPTS   opt_slip.h
 TCPDEBUG
 TCP_OFFLOAD_DISABLE	opt_inet.h #Disable code to dispatch tcp offloading
 TCP_SIGNATURE  opt_inet.h
-TCP_SORECEIVE_STREAM   opt_inet.h
 VLAN_ARRAY opt_vlan.h
 XBONEHACK
 FLOWTABLE  opt_route.h

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/kern/uipc_socket.c Thu Jul  7 10:37:14 2011(r223839)
@@ -1915,7 +1915,6 @@ release:
 /*
  * Optimized version of soreceive() for stream (TCP) sockets.
  */
-#ifdef TCP_SORECEIVE_STREAM
 int
 soreceive_stream(struct socket *so, struct sockaddr **psa, struct uio *uio,
 struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
@@ -2109,7 +2108,6 @@ out:
sbunlock(sb);
return (error);
 }
-#endif /* TCP_SORECEIVE_STREAM */
 
 /*
  * Optimized version of soreceive() for simple datagram cases from userspace.

Modified: head/sys/netinet/tcp_subr.c
==
--- head/sys/netinet/tcp_subr.c Thu Jul  7 09:51:31 2011(r223838)
+++ head/sys/netinet/tcp_subr.c Thu Jul  7 10:37:14 2011(r223839)
@@ -206,11 +206,9 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_isn_reseed_interval), 0,
 "Seconds between reseeding of ISN secret");
 
-#ifdef TCP_SORECEIVE_STREAM
 static int tcp_soreceive_stream = 0;
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, soreceive_stream, CTLFLAG_RDTUN,
 &tcp_soreceive_stream, 0, "Using soreceive_stream for TCP sockets");
-#endif
 
 #ifdef TCP_SIGNATURE
 static int tcp_sig_checksigs = 1;
@@ -337,13 +335,13 @@ tcp_init(void)
tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT;
tcp_tcbhashsize = hashsize;
 
-#ifdef TCP_SORECEIVE_STREAM
TUNABLE_INT_FETCH("net.inet.tcp.soreceive_stream",
&tcp_soreceive_stream);
if (tcp_soreceive_stream) {
tcp_usrreqs.pru_soreceive = soreceive_stream;
+#ifdef INET6
tcp6_usrreqs.pru_soreceive = soreceive_stream;
+#endif /* INET6 */
}
-#endif
 
 #ifdef INET6
 #define TCP_MINPROTOHDR (sizeof(struct ip6_hdr) + sizeof(struct tcphdr))
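After this change the choice of receive routine is made purely at run time: tcp_init() reads the loader tunable once and swaps the protocol switch's function pointer. A user-space model of that pattern, with hypothetical stand-in types and functions rather than the kernel's struct pr_usrreqs:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_recv_fn)(void);

/* Stand-ins that just identify which routine was selected. */
static int sketch_soreceive_generic(void) { return (0); }
static int sketch_soreceive_stream(void) { return (1); }

struct sketch_usrreqs {
	sketch_recv_fn pru_soreceive;
};

void
sketch_tcp_init(struct sketch_usrreqs *v4, struct sketch_usrreqs *v6,
    int soreceive_stream_tunable)
{
	assert(v4 != NULL);
	v4->pru_soreceive = sketch_soreceive_generic;
	if (v6 != NULL)
		v6->pru_soreceive = sketch_soreceive_generic;
	if (soreceive_stream_tunable) {
		v4->pru_soreceive = sketch_soreceive_stream;
		if (v6 != NULL)	/* the kernel guards this with #ifdef INET6 */
			v6->pru_soreceive = sketch_soreceive_stream;
	}
}
```

Passing NULL for the IPv6 switch plays the role of a kernel built without INET6, which is why the diff wraps the tcp6_usrreqs assignment in #ifdef INET6.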


Re: svn commit: r223862 - in head/sys: net netinet netinet6

2011-07-08 Thread Andre Oppermann

On 08.07.2011 11:38, Marko Zec wrote:

Author: zec
Date: Fri Jul  8 09:38:33 2011
New Revision: 223862
URL: http://svn.freebsd.org/changeset/base/223862

Log:
   Permit ARP to proceed for IPv4 host routes for which the gateway is the
   same as the host address.  This already works fine for INET6 and ND6.


Can you give an example of what this does?  Is it some sort of proxy ARP?


   While here, remove two function pointers from struct lltable which are
   only initialized but never used.


Ideally this would have been a separate commit because it has nothing to
do with the primary functional change.

--
Andre


   MFC after:   3 days

Modified:
   head/sys/net/if_llatbl.h
   head/sys/netinet/in.c
   head/sys/netinet6/in6.c



svn commit: r223863 - head/sys/kern

2011-07-08 Thread Andre Oppermann
Author: andre
Date: Fri Jul  8 10:50:13 2011
New Revision: 223863
URL: http://svn.freebsd.org/changeset/base/223863

Log:
  In the experimental soreceive_stream():
  
   o Move the non-blocking socket test below the SBS_CANTRCVMORE so that EOF
     is correctly returned on a remote connection close.
   o In the non-blocking socket test compare SS_NBIO against the so->so_state
     field instead of the incorrect sb->sb_state field.
   o Simplify the ENOTCONN test by removing cases that can't occur.
  
  Submitted by: trociny (with some further tweaks by committer)
  Tested by: trociny
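The order of checks described in the log can be sketched as a pure decision function. This is a simplified model, not the kernel code. struct demo_sock and demo_recv_check() are hypothetical. Locking and so_error handling are omitted. The booleans stand in for the SS_ISCONNECTED, SS_ISDISCONNECTED, SS_NBIO and SBS_CANTRCVMORE bits, and dontwait stands in for MSG_DONTWAIT|MSG_NBIO. Testing for a closed peer before the non-blocking test is what lets an empty non-blocking socket report EOF instead of EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened socket state; not the kernel's struct socket. */
struct demo_sock {
	bool connected;		/* SS_ISCONNECTED in so_state */
	bool disconnected;	/* SS_ISDISCONNECTED in so_state */
	bool cantrcvmore;	/* SBS_CANTRCVMORE: the peer has closed */
	bool nbio;		/* SS_NBIO, which lives in so_state */
	int cc;			/* bytes queued in the receive buffer */
};

/*
 * Returns an errno value, or 0 when the caller should go on to deliver
 * data or block.  *eof is set when the caller must report end-of-file.
 */
static int
demo_recv_check(const struct demo_sock *so, bool dontwait, bool *eof)
{
	assert(so != NULL && eof != NULL);
	*eof = false;

	/* Never connected: there is nothing to receive. */
	if (!so->connected && !so->disconnected)
		return (ENOTCONN);

	/* Peer closed and buffer drained: EOF, even when non-blocking. */
	if (so->cantrcvmore && so->cc == 0) {
		*eof = true;
		return (0);
	}

	/* Empty buffer and we must not block. */
	if (so->cc == 0 && (so->nbio || dontwait))
		return (EAGAIN);

	return (0);
}
```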

Modified:
  head/sys/kern/uipc_socket.c

Modified: head/sys/kern/uipc_socket.c
==
--- head/sys/kern/uipc_socket.c Fri Jul  8 09:38:33 2011 (r223862)
+++ head/sys/kern/uipc_socket.c Fri Jul  8 10:50:13 2011 (r223863)
@@ -1954,20 +1954,9 @@ soreceive_stream(struct socket *so, stru
}
oresid = uio->uio_resid;
 
-   /* We will never ever get anything unless we are connected. */
+   /* We will never ever get anything unless we are or were connected. */
if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
-   /* When disconnecting there may be still some data left. */
-   if (sb->sb_cc > 0)
-   goto deliver;
-   if (!(so->so_state & SS_ISDISCONNECTED))
-   error = ENOTCONN;
-   goto out;
-   }
-
-   /* Socket buffer is empty and we shall not block. */
-   if (sb->sb_cc == 0 &&
-   ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
-   error = EAGAIN;
+   error = ENOTCONN;
goto out;
}
 
@@ -1994,6 +1983,13 @@ restart:
goto out;
}
 
+   /* Socket buffer is empty and we shall not block. */
+   if (sb->sb_cc == 0 &&
+   ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
+   error = EAGAIN;
+   goto out;
+   }
+
/* Socket buffer got some data that we shall deliver now. */
if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) &&
((sb->sb_flags & SS_NBIO) ||


svn commit: r226105 - head/sys/netinet

2011-10-07 Thread Andre Oppermann
Author: andre
Date: Fri Oct  7 13:43:01 2011
New Revision: 226105
URL: http://svn.freebsd.org/changeset/base/226105

Log:
  Add back the IP header length to the total packet length field on
  raw IP sockets.  It was deducted in ip_input() in preparation for
  protocols interested only in the payload.
  
  On raw sockets the IP header should be delivered as it came in
  from the network except for the byte order swaps in some fields.
  
  This brings us in line with all other OS'es that provide raw
  IP sockets.
  
  Reported by: Matthew Cini Sarreo mcins1-at-gmail.com
  MFC after: 3 days
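The length bookkeeping can be modelled in a few lines. The model assumes a hypothetical struct demo_ip that keeps only ip_hl and an ip_len already converted to host byte order. demo_ip_input() plays the role of ip_input() subtracting the header length before demultiplexing. demo_rip_input() adds off (the IP header length in bytes passed to rip_input()) back, so a raw socket sees the total length as it arrived.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal header: only the fields the sketch needs. */
struct demo_ip {
	uint8_t ip_hl;		/* header length in 32-bit words */
	uint16_t ip_len;	/* total length, host byte order */
};

/* Models ip_input(): drop the header length before handing off. */
static void
demo_ip_input(struct demo_ip *ip)
{
	ip->ip_len -= (uint16_t)(ip->ip_hl << 2);
}

/* Models rip_input() after r226105: restore the full length. */
static void
demo_rip_input(struct demo_ip *ip, int off)
{
	assert(off == ip->ip_hl << 2);
	ip->ip_len += (uint16_t)off;
}
```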

Modified:
  head/sys/netinet/raw_ip.c

Modified: head/sys/netinet/raw_ip.c
==
--- head/sys/netinet/raw_ip.c   Fri Oct  7 13:16:21 2011 (r226104)
+++ head/sys/netinet/raw_ip.c   Fri Oct  7 13:43:01 2011 (r226105)
@@ -289,6 +289,13 @@ rip_input(struct mbuf *m, int off)
last = NULL;
 
ifp = m->m_pkthdr.rcvif;
+   /*
+* Add back the IP header length which was
+* removed by ip_input().  Raw sockets do
+* not modify the packet except for some
+* byte order swaps.
+*/
+   ip->ip_len += off;
 
hash = INP_PCBHASH_RAW(proto, ip->ip_src.s_addr,
ip->ip_dst.s_addr, V_ripcbinfo.ipi_hashmask);


svn commit: r226113 - head/sys/netinet

2011-10-07 Thread Andre Oppermann
Author: andre
Date: Fri Oct  7 16:39:03 2011
New Revision: 226113
URL: http://svn.freebsd.org/changeset/base/226113

Log:
  Prevent TCP sessions from stalling indefinitely in reassembly
  when reaching the zone limit of reassembly queue entries.
  
  When the zone limit was reached not even the missing segment
  that would complete the sequence space could be processed
  preventing the TCP session forever from making any further
  progress.
  
  Solve this deadlock by using a temporary on-stack queue entry
  for the missing segment followed by an immediate dequeue again
  by delivering the contiguous sequence space to the socket.
  
  Add logging under net.inet.tcp.log_debug for reassembly queue
  issues.
  
  Reviewed by: lsteward (previous version)
  Tested by:   Steven Hartland killing-at-multiplay.co.uk
  MFC after:   3 days
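The allocation policy described above can be sketched as follows. The sketch assumes a hypothetical fixed-size pool (struct demo_zone) in place of the UMA zone and a reduced struct demo_qent in place of struct tseg_qent. Only the segment that starts at rcv_nxt may fall back to the caller's on-stack entry, and only after allocation has actually failed. Every other segment is dropped and left to retransmission. The sketch omits the queue insertion and the immediate delivery that, in the commit, make sure the stack entry is gone before tcp_reass() returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ZONE_LIMIT	2	/* hypothetical stand-in for the zone limit */

/* Reduced stand-in for struct tseg_qent. */
struct demo_qent {
	uint32_t seq;
};

/* Fixed-size pool standing in for V_tcp_reass_zone. */
struct demo_zone {
	struct demo_qent pool[DEMO_ZONE_LIMIT];
	int used;
};

static struct demo_qent *
demo_zalloc(struct demo_zone *z)
{
	return (z->used < DEMO_ZONE_LIMIT ? &z->pool[z->used++] : NULL);
}

/*
 * Pick storage for a segment starting at seq.  Returns NULL when the
 * segment has to be dropped.
 */
static struct demo_qent *
demo_reass_entry(struct demo_zone *z, uint32_t seq, uint32_t rcv_nxt,
    struct demo_qent *stack_ent)
{
	struct demo_qent *te;

	assert(stack_ent != NULL);
	te = demo_zalloc(z);
	if (te == NULL) {
		if (seq != rcv_nxt)
			return (NULL);	/* out of order: drop, peer retransmits */
		te = stack_ent;		/* fills the hole: use the stack entry */
	}
	te->seq = seq;
	return (te);
}
```

As the diff reads here, the else if (th->th_seq == tp->rcv_nxt) branch is also reached when uma_zalloc() succeeded, which would overwrite the freshly allocated te. The sketch tests for allocation failure explicitly to avoid that case.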

Modified:
  head/sys/netinet/tcp_reass.c

Modified: head/sys/netinet/tcp_reass.c
==
--- head/sys/netinet/tcp_reass.c Fri Oct  7 16:09:44 2011 (r226112)
+++ head/sys/netinet/tcp_reass.c Fri Oct  7 16:39:03 2011 (r226113)
@@ -177,7 +177,9 @@ tcp_reass(struct tcpcb *tp, struct tcphd
struct tseg_qent *nq;
struct tseg_qent *te = NULL;
struct socket *so = tp->t_inpcb->inp_socket;
+   char *s = NULL;
int flags;
+   struct tseg_qent tqs;
 
INP_WLOCK_ASSERT(tp->t_inpcb);
 
@@ -215,19 +217,40 @@ tcp_reass(struct tcpcb *tp, struct tcphd
TCPSTAT_INC(tcps_rcvmemdrop);
m_freem(m);
*tlenp = 0;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: queue limit reached, "
+   "segment dropped\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
return (0);
}
 
/*
 * Allocate a new queue entry. If we can't, or hit the zone limit
 * just drop the pkt.
+*
+* Use a temporary structure on the stack for the missing segment
+* when the zone is exhausted. Otherwise we may get stuck.
 */
te = uma_zalloc(V_tcp_reass_zone, M_NOWAIT);
-   if (te == NULL) {
+   if (te == NULL && th->th_seq != tp->rcv_nxt) {
TCPSTAT_INC(tcps_rcvmemdrop);
m_freem(m);
*tlenp = 0;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: global zone limit reached, "
+   "segment dropped\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
return (0);
+   } else if (th->th_seq == tp->rcv_nxt) {
+   bzero(&tqs, sizeof(struct tseg_qent));
+   te = &tqs;
+   if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: global zone limit reached, "
+   "using stack for missing segment\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
}
tp->t_segqlen++;
 
@@ -304,6 +327,8 @@ tcp_reass(struct tcpcb *tp, struct tcphd
if (p == NULL) {
LIST_INSERT_HEAD(&tp->t_segq, te, tqe_q);
} else {
+   KASSERT(te != &tqs, ("%s: temporary stack based entry not "
+   "first element in queue", __func__));
LIST_INSERT_AFTER(p, te, tqe_q);
}
 
@@ -327,7 +352,8 @@ present:
m_freem(q->tqe_m);
else
sbappendstream_locked(&so->so_rcv, q->tqe_m);
-   uma_zfree(V_tcp_reass_zone, q);
+   if (q != &tqs)
+   uma_zfree(V_tcp_reass_zone, q);
tp->t_segqlen--;
q = nq;
} while (q && q->tqe_th->th_seq == tp->rcv_nxt);


svn commit: r226433 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 13:54:46 2011
New Revision: 226433
URL: http://svn.freebsd.org/changeset/base/226433

Log:
  Update the comment and description of tcp_sendspace and tcp_recvspace
  to better reflect their purpose.
  MFC after:   1 week

Modified:
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 11:08:51 2011 (r226432)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 13:54:46 2011 (r226433)
@@ -1498,16 +1498,15 @@ tcp_ctloutput(struct socket *so, struct 
 #undef INP_WLOCK_RECHECK
 
 /*
- * tcp_sendspace and tcp_recvspace are the default send and receive window
- * sizes, respectively.  These are obsolescent (this information should
- * be set by the route).
+ * Set the initial send and receive socket buffer sizes for
+ * newly created TCP sockets.
  */
 u_long tcp_sendspace = 1024*32;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
-    &tcp_sendspace , 0, "Maximum outgoing TCP datagram size");
+    &tcp_sendspace , 0, "Initial send socket buffer size");
 u_long tcp_recvspace = 1024*64;
 SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
-    &tcp_recvspace , 0, "Maximum incoming TCP datagram size");
+    &tcp_recvspace , 0, "Initial receive socket buffer size");
 
 /*
  * Attach TCP protocol to socket, allocating


svn commit: r226437 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 15:08:43 2011
New Revision: 226437
URL: http://svn.freebsd.org/changeset/base/226437

Log:
  VNET virtualize tcp_sendspace/tcp_recvspace and change the
  type to INT.  A long is not necessary as the TCP window is
  limited to 2**30.  A larger initial window isn't useful.
  
  MFC after:   1 week
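A rough userland model of the VNET_DEFINE()/V_ pattern adopted here. It assumes a hypothetical struct demo_vnet and demo_curvnet pointer in place of the kernel's per-stack storage and current-vnet context. The real mechanism (a dedicated linker set, switched with CURVNET_SET()/CURVNET_RESTORE()) is not reproduced. The constants restate the log's sizing argument: with the maximum window scale shift of 14, the largest TCP window is 65535 << 14 = 1073725440 bytes, just under 2**30, so an int is ample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-stack storage; one instance per virtual network stack. */
struct demo_vnet {
	int tcp_sendspace;
	int tcp_recvspace;
};

static struct demo_vnet demo_vnet0 = { 1024*32, 1024*64 };
static struct demo_vnet *demo_curvnet = &demo_vnet0;

/* Accessors in the spirit of VNET(x) and V_x. */
#define DEMO_VNET(n)		(demo_curvnet->n)
#define DV_tcp_sendspace	DEMO_VNET(tcp_sendspace)
#define DV_tcp_recvspace	DEMO_VNET(tcp_recvspace)

/* Largest scaled window, mirroring TCP_MAXWIN << TCP_MAX_WINSHIFT. */
#define DEMO_TCP_MAXWIN		65535L
#define DEMO_TCP_MAX_WINSHIFT	14
static const long demo_max_window = DEMO_TCP_MAXWIN << DEMO_TCP_MAX_WINSHIFT;

/* Switch the current stack and return the previous one. */
static struct demo_vnet *
demo_curvnet_set(struct demo_vnet *v)
{
	struct demo_vnet *old = demo_curvnet;

	assert(v != NULL);
	demo_curvnet = v;
	return (old);
}
```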

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_usrreq.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 15:08:43 2011 (r226437)
@@ -3517,7 +3517,7 @@ tcp_mss(struct tcpcb *tp, int offer)
 */
so = inp->inp_socket;
SOCKBUF_LOCK(&so->so_snd);
-   if ((so->so_snd.sb_hiwat == tcp_sendspace) && metrics.rmx_sendpipe)
+   if ((so->so_snd.sb_hiwat == V_tcp_sendspace) && metrics.rmx_sendpipe)
bufsize = metrics.rmx_sendpipe;
else
bufsize = so->so_snd.sb_hiwat;
@@ -3534,7 +3534,7 @@ tcp_mss(struct tcpcb *tp, int offer)
tp->t_maxseg = mss;
 
SOCKBUF_LOCK(&so->so_rcv);
-   if ((so->so_rcv.sb_hiwat == tcp_recvspace) && metrics.rmx_recvpipe)
+   if ((so->so_rcv.sb_hiwat == V_tcp_recvspace) && metrics.rmx_recvpipe)
bufsize = metrics.rmx_recvpipe;
else
bufsize = so->so_rcv.sb_hiwat;

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 15:08:43 2011 (r226437)
@@ -1501,12 +1501,15 @@ tcp_ctloutput(struct socket *so, struct 
  * Set the initial send and receive socket buffer sizes for
  * newly created TCP sockets.
  */
-u_long tcp_sendspace = 1024*32;
-SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
-    &tcp_sendspace , 0, "Initial send socket buffer size");
-u_long tcp_recvspace = 1024*64;
-SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
-    &tcp_recvspace , 0, "Initial receive socket buffer size");
+VNET_DEFINE(int, tcp_sendspace) = 1024*32;
+#define V_tcp_sendspace VNET(tcp_sendspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
+    &VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
+
+VNET_DEFINE(int, tcp_recvspace) = 1024*64
+#define V_tcp_recvspace VNET(tcp_recvspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
+    &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
 
 /*
  * Attach TCP protocol to socket, allocating
@@ -1521,7 +1524,7 @@ tcp_attach(struct socket *so)
int error;
 
if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) {
-   error = soreserve(so, tcp_sendspace, tcp_recvspace);
+   error = soreserve(so, V_tcp_sendspace, V_tcp_recvspace);
if (error)
return (error);
}

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Sun Oct 16 14:30:28 2011 (r226436)
+++ head/sys/netinet/tcp_var.h  Sun Oct 16 15:08:43 2011 (r226437)
@@ -606,6 +606,8 @@ VNET_DECLARE(int, tcp_mssdflt); /* XXX *
 VNET_DECLARE(int, tcp_minmss);
 VNET_DECLARE(int, tcp_delack_enabled);
 VNET_DECLARE(int, tcp_do_rfc3390);
+VNET_DECLARE(int, tcp_sendspace);
+VNET_DECLARE(int, tcp_recvspace);
 VNET_DECLARE(int, path_mtu_discovery);
 VNET_DECLARE(int, ss_fltsz);
 VNET_DECLARE(int, ss_fltsz_local);
@@ -618,6 +620,8 @@ VNET_DECLARE(int, tcp_abc_l_var);
 #define V_tcp_minmss VNET(tcp_minmss)
 #define V_tcp_delack_enabled VNET(tcp_delack_enabled)
 #define V_tcp_do_rfc3390 VNET(tcp_do_rfc3390)
+#define V_tcp_sendspace VNET(tcp_sendspace)
+#define V_tcp_recvspace VNET(tcp_recvspace)
 #define V_path_mtu_discovery VNET(path_mtu_discovery)
 #define V_ss_fltsz VNET(ss_fltsz)
 #define V_ss_fltsz_local VNET(ss_fltsz_local)
@@ -716,8 +720,6 @@ void tcp_hc_updatemtu(struct in_conninf
 void tcp_hc_update(struct in_conninfo *, struct hc_metrics_lite *);
 
 extern struct pr_usrreqs tcp_usrreqs;
-extern u_long tcp_sendspace;
-extern u_long tcp_recvspace;
 tcp_seq tcp_new_isn(struct tcpcb *);
 
 void tcp_sack_doack(struct tcpcb *, struct tcpopt *, tcp_seq);


svn commit: r226447 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 20:06:44 2011
New Revision: 226447
URL: http://svn.freebsd.org/changeset/base/226447

Log:
  Remove the ss_fltsz and ss_fltsz_local sysctl's which have
  long been superseded by the RFC3390 initial CWND sizing.
  
  Also remove the remnants of TCP_METRICS_CWND which used the
  TCP hostcache to set the initial CWND in a non-RFC compliant
  way.
  
  MFC after:   1 week
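With the sysctls gone, the initial window set in cc_conn_init() reduces to the RFC 3390 formula min(4*MSS, max(2*MSS, 4380 bytes)) when V_tcp_do_rfc3390 is set, and to a single segment otherwise. Below is a standalone sketch of just that arithmetic. It uses hypothetical helper names (demo_min(), demo_max(), demo_initial_cwnd()) in place of the kernel's min()/max().

```c
#include <assert.h>

static unsigned long
demo_min(unsigned long a, unsigned long b)
{
	return (a < b ? a : b);
}

static unsigned long
demo_max(unsigned long a, unsigned long b)
{
	return (a > b ? a : b);
}

/* Initial congestion window in bytes for a given maximum segment size. */
static unsigned long
demo_initial_cwnd(unsigned long maxseg, int do_rfc3390)
{
	assert(maxseg > 0);
	if (do_rfc3390)
		return (demo_min(4 * maxseg, demo_max(2 * maxseg, 4380)));
	return (maxseg);
}
```

For a 1460-byte MSS this gives 4380 bytes (three segments). For a 536-byte MSS the 4*MSS cap applies and it gives 2144 bytes.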

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 20:06:44 2011 (r226447)
@@ -301,9 +301,6 @@ cc_conn_init(struct tcpcb *tp)
struct hc_metrics_lite metrics;
struct inpcb *inp = tp->t_inpcb;
int rtt;
-#ifdef INET6
-   int isipv6 = ((inp->inp_vflag & INP_IPV6) != 0) ? 1 : 0;
-#endif
 
INP_WLOCK_ASSERT(tp->t_inpcb);
 
@@ -337,49 +334,16 @@ cc_conn_init(struct tcpcb *tp)
}
 
/*
-* Set the slow-start flight size depending on whether this
-* is a local network or not.
-*
-* Extend this so we cache the cwnd too and retrieve it here.
-* Make cwnd even bigger than RFC3390 suggests but only if we
-* have previous experience with the remote host. Be careful
-* not make cwnd bigger than remote receive window or our own
-* send socket buffer. Maybe put some additional upper bound
-* on the retrieved cwnd. Should do incremental updates to
-* hostcache when cwnd collapses so next connection doesn't
-* overloads the path again.
-*
-* XXXAO: Initializing the CWND from the hostcache is broken
-* and in its current form not RFC conformant.  It is disabled
-* until fixed or removed entirely.
+* Set the initial slow-start flight size.
 *
 * RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-* We currently check only in syncache_socket for that.
+* XXX: We currently check only in syncache_socket for that.
 */
-/* #define TCP_METRICS_CWND */
-#ifdef TCP_METRICS_CWND
-   if (metrics.rmx_cwnd)
-   tp->snd_cwnd = max(tp->t_maxseg, min(metrics.rmx_cwnd / 2,
-   min(tp->snd_wnd, so->so_snd.sb_hiwat)));
-   else
-#endif
if (V_tcp_do_rfc3390)
tp->snd_cwnd = min(4 * tp->t_maxseg,
max(2 * tp->t_maxseg, 4380));
-#ifdef INET6
-   else if (isipv6 && in6_localaddr(&inp->in6p_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
-#if defined(INET) && defined(INET6)
-   else if (!isipv6 && in_localaddr(inp->inp_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
-#ifdef INET
-   else if (in_localaddr(inp->inp_faddr))
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz_local;
-#endif
else
-   tp->snd_cwnd = tp->t_maxseg * V_ss_fltsz;
+   tp->snd_cwnd = tp->t_maxseg;
 
if (CC_ALGO(tp)->conn_init != NULL)
CC_ALGO(tp)->conn_init(tp->ccv);

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_output.c   Sun Oct 16 20:06:44 2011 (r226447)
@@ -89,16 +89,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
&VNET_NAME(path_mtu_discovery), 1,
"Enable Path MTU Discovery");
 
-VNET_DEFINE(int, ss_fltsz) = 1;
-SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, slowstart_flightsize, CTLFLAG_RW,
-   &VNET_NAME(ss_fltsz), 1,
-   "Slow start flight size");
-
-VNET_DEFINE(int, ss_fltsz_local) = 4;
-SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, local_slowstart_flightsize,
-   CTLFLAG_RW, &VNET_NAME(ss_fltsz_local), 1,
-   "Slow start flight size for local networks");
-
 VNET_DEFINE(int, tcp_do_tso) = 1;
 #define V_tcp_do_tso VNET(tcp_do_tso)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, tso, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_var.h
==
--- head/sys/netinet/tcp_var.h  Sun Oct 16 19:46:52 2011 (r226446)
+++ head/sys/netinet/tcp_var.h  Sun Oct 16 20:06:44 2011 (r226447)
@@ -609,8 +609,6 @@ VNET_DECLARE(int, tcp_do_rfc3390);
 VNET_DECLARE(int, tcp_sendspace);
 VNET_DECLARE(int, tcp_recvspace);
 VNET_DECLARE(int, path_mtu_discovery);
-VNET_DECLARE(int, ss_fltsz);
-VNET_DECLARE(int, ss_fltsz_local);
 VNET_DECLARE(int, tcp_do_rfc3465);
 VNET_DECLARE(int, tcp_abc_l_var);
 #define V_tcb VNET(tcb)
@@ -623,8 +621,6 @@ VNET_DECLARE(int, tcp_abc_l_var);
 #define V_tcp_sendspace VNET(tcp_sendspace)
 #define V_tcp_recvspace

svn commit: r226448 - head/sys/netinet

2011-10-16 Thread Andre Oppermann
Author: andre
Date: Sun Oct 16 20:18:39 2011
New Revision: 226448
URL: http://svn.freebsd.org/changeset/base/226448

Log:
  Move the tcp_sendspace and tcp_recvspace sysctl's from
  the middle of tcp_usrreq.c to the top of tcp_output.c
  and tcp_input.c respectively next to the socket buffer
  autosizing controls.
  
  MFC after:   1 week

Modified:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_usrreq.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_input.c Sun Oct 16 20:18:39 2011 (r226448)
@@ -183,6 +183,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_insecure_rst), 0,
 "Follow the old (insecure) criteria for accepting RST packets");
 
+VNET_DEFINE(int, tcp_recvspace) = 1024*64
+#define V_tcp_recvspace VNET(tcp_recvspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
+    &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
+
 VNET_DEFINE(int, tcp_do_autorcvbuf) = 1;
 #define V_tcp_do_autorcvbuf VNET(tcp_do_autorcvbuf)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, recvbuf_auto, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_output.c   Sun Oct 16 20:18:39 2011 (r226448)
@@ -95,6 +95,11 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
&VNET_NAME(tcp_do_tso), 0,
"Enable TCP Segmentation Offload");
 
+VNET_DEFINE(int, tcp_sendspace) = 1024*32;
+#define V_tcp_sendspace VNET(tcp_sendspace)
+SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
+   &VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
+
 VNET_DEFINE(int, tcp_do_autosndbuf) = 1;
 #define V_tcp_do_autosndbuf VNET(tcp_do_autosndbuf)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, sendbuf_auto, CTLFLAG_RW,

Modified: head/sys/netinet/tcp_usrreq.c
==
--- head/sys/netinet/tcp_usrreq.c   Sun Oct 16 20:06:44 2011 (r226447)
+++ head/sys/netinet/tcp_usrreq.c   Sun Oct 16 20:18:39 2011 (r226448)
@@ -1498,20 +1498,6 @@ tcp_ctloutput(struct socket *so, struct 
 #undef INP_WLOCK_RECHECK
 
 /*
- * Set the initial send and receive socket buffer sizes for
- * newly created TCP sockets.
- */
-VNET_DEFINE(int, tcp_sendspace) = 1024*32;
-#define V_tcp_sendspace VNET(tcp_sendspace)
-SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_SENDSPACE, tcp_sendspace, CTLFLAG_RW,
-    &VNET_NAME(tcp_sendspace), 0, "Initial send socket buffer size");
-
-VNET_DEFINE(int, tcp_recvspace) = 1024*64
-#define V_tcp_recvspace VNET(tcp_recvspace)
-SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
-    &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");
-
-/*
  * Attach TCP protocol to socket, allocating
  * internet protocol control block, tcp control block,
  * bufer space, and entering LISTEN state if to accept connections.


Re: svn commit: r226454 - head/sys/netinet

2011-10-17 Thread Andre Oppermann

On 17.10.2011 02:16, Bjoern A. Zeeb wrote:


On 17. Oct 2011, at 00:05 , Bjoern A. Zeeb wrote:


Author: bz
Date: Mon Oct 17 00:05:31 2011
New Revision: 226454
URL: http://svn.freebsd.org/changeset/base/226454

Log:
  Add syntactic sugar missed in r226437 and then not added either when moving
  things around in r226448 but desperately needed to always make things
  compile successfully.




GENERIC and LINT did not fail on it, as it expanded to:

int tcp_recvspace = 1024*64

followed by:

#define SYSCTL_VNET_INT(parent, nbr, name, access, ptr, val, descr) \
 SYSCTL_INT(parent, nbr, name, access, ptr, val, descr)

=

#define SYSCTL_INT(parent, nbr, name, access, ptr, val, descr)  \
 SYSCTL_ASSERT_TYPE(INT, ptr, parent, name); \
 SYSCTL_OID(parent, nbr, name,   \
 CTLTYPE_INT | CTLFLAG_MPSAFE | (access),\
 ptr, val, sysctl_handle_int, "I", descr)

and the SYSCTL_ASSERT_TYPE() expanding to nothing in

#define SYSCTL_ASSERT_TYPE(type, ptr, parent, name)

leaving just the ';' around;  so it ended up as:

int tcp_recvspace = 1024*64

;
and an expanded SYSCTL_OID(...);


Oops, sorry for missing that one. And thanks for committing the fix.

--
Andre


  MFC after:   1 week

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.c Sun Oct 16 22:24:04 2011 (r226453)
+++ head/sys/netinet/tcp_input.c Mon Oct 17 00:05:31 2011 (r226454)
@@ -183,7 +183,7 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
 &VNET_NAME(tcp_insecure_rst), 0,
 "Follow the old (insecure) criteria for accepting RST packets");

-VNET_DEFINE(int, tcp_recvspace) = 1024*64
+VNET_DEFINE(int, tcp_recvspace) = 1024*64;
#define V_tcp_recvspace VNET(tcp_recvspace)
SYSCTL_VNET_INT(_net_inet_tcp, TCPCTL_RECVSPACE, tcp_recvspace, CTLFLAG_RW,
 &VNET_NAME(tcp_recvspace), 0, "Initial receive socket buffer size");





