Re: weird network problems on current since 10/28/2012
On 05.11.2012 02:39, Manfred Antar wrote:
> At 01:57 PM 11/4/2012, you wrote:
>> On 04.11.2012 21:15, Andreas Tobler wrote:
>>> On 04.11.12 14:57, Andre Oppermann wrote:
>>>> On 04.11.2012 13:11, Kim Culhan wrote:
>>>>> On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote:
>>>>>> On 2012-11-04 02:13, Manfred Antar wrote:
>>>>>>> At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>>>>>> After the commit, there was a small discussion thread on svn-src-head@
>>>>>> about the possible problems with the approach. Maybe you are
>>>>>> experiencing those?
>>>>>>
>>>>>> As the commit message says, you should be able to turn the feature off
>>>>>> using:
>>>>>>
>>>>>>   sysctl net.inet.tcp.experimental.initcwnd10=0
>>>>>>
>>>>>> Can you please try that, and see if the problems go away?
>>>>>
>>>>> FWIW this did not make the problem go away on 2 machines.
>>>>
>>>> Yes, this very much looks like the same problem as in PR/173309.
>>>>
>>>> Please try the attached patch. It fixes the connection hang issue.
>>>> There may be a second issue I am currently debugging based on the
>>>> feedback from Fabian Keil.
>>>
>>> I jump into this thread since I have a similar network issue.
>>>
>>> My scenario:
>>>
>>> 'make installkernel DESTDIR=/netboot/test' to a nfs mounted drive.
>>> The nfs drive on the server is an ufs fs. No zfs.
>>>
>>> Up to r242261 I can install the kernel (or world) in a fluent way to the
>>> nfs destination.
>>>
>>> From r242262 it doesn't work smoothly. I have stalls, sometimes my
>>> patience is not enough and I kill the process.
>>>
>>> I tried 242266 with the above mentioned patch. No real success.
>>>
>>> How can I help/test?
>>
>> Please try the attached patch instead of the above mentioned one.
>>
>> --
>> Andre
>>
>> Index: netinet/tcp_output.c
>> ===================================================================
>> --- netinet/tcp_output.c (revision 242577)
>> +++ netinet/tcp_output.c (working copy)
>> @@ -228,7 +228,7 @@
>>      tso = 0;
>>      mtu = 0;
>>      off = tp->snd_nxt - tp->snd_una;
>> -    sendwin = min(tp->snd_wnd, tp->snd_cwnd);
>> +    sendwin = ulmax(ulmin(tp->snd_wnd - off, tp->snd_cwnd), 0);
>>
>>      flags = tcp_outflags[tp->t_state];
>>      /*
>> @@ -249,7 +249,7 @@
>>          (p = tcp_sack_output(tp, &sack_bytes_rxmt))) {
>>          long cwin;
>>
>> -        cwin = min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt;
>> +        cwin = ulmin(tp->snd_wnd - off, tp->snd_cwnd) - sack_bytes_rxmt;
>>          if (cwin < 0)
>>              cwin = 0;
>>          /* Do not retransmit SACK segments beyond snd_recover */
>> @@ -355,7 +355,7 @@
>>           * sending new data, having retransmitted all the
>>           * data possible in the scoreboard.
>>           */
>> -        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd)
>> +        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd - off)
>>              - off);
>>          /*
>>           * Don't remove this (len > 0) check !
>
> This doesn't seem to make a difference.
> I have a ssh window that's been trying to connect for the past 5 minutes.
> This is on a local network 192.168.0.4 >===SSH==> 192.168.0.5
> Also pop from the same machines is endlessly trying to connect.
> Hopefully this mail will get thru, otherwise I will need to reboot to the
> old kernel.

I've backed out the change with r242601 as it still exhibits too many
problems. I'll fix these problems in the next days, but in the meantime
HEAD should be in a working state.

I'm sorry for the trouble.

--
Andre
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: weird network problems on current since 10/28/2012
At 01:57 PM 11/4/2012, you wrote: >On 04.11.2012 21:15, Andreas Tobler wrote: >>On 04.11.12 14:57, Andre Oppermann wrote: >>>On 04.11.2012 13:11, Kim Culhan wrote: On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: >On 2012-11-04 02:13, Manfred Antar wrote: >>At 03:29 PM 11/3/2012, Adrian Chadd wrote: >After the commit, there was a small discussion thread on svn-src-head@ >about the possible problems with the approach. Maybe you are >experiencing those? > >As the commit message says, you should be able to turn the feature off >using: > > sysctl net.inet.tcp.experimental.initcwnd10=0 > >Can you please try that, and see if the problems go away? FWIW this did not make the problem go away on 2 machines. >>> >>>Yes, this very much looks like the same problem as in PR/173309. >>> >>>Please try the attached patch. It fixes the connection hang issue. >>>There may be a second issue I debugging currently base on the feedback >>>from Fabian Keil. >> >>I jump into this thread since I have a similar network issue. >> >>My scenario: >> >>'make installkernel DESTDIR=/netboot/test' to a nfs mounted drive. >>The nfs drive on the server is an ufs fs. No zfs. >> >>Up to r242261 I can install the kernel (or world) in a fluent way to the >>nfs destination. >> >>>From r242262 it doesn't work smooth. I have stalls, sometimes my >>patience is not enough and I kill the process. >> >>I tried 242266 with the above mentioned patch. No real success. >> >>How can I help/test? > >Please try the attach patch instead of the above mentioned one. 
>
>--
>Andre
>
>Index: netinet/tcp_output.c
>===================================================================
>--- netinet/tcp_output.c (revision 242577)
>+++ netinet/tcp_output.c (working copy)
>@@ -228,7 +228,7 @@
>     tso = 0;
>     mtu = 0;
>     off = tp->snd_nxt - tp->snd_una;
>-    sendwin = min(tp->snd_wnd, tp->snd_cwnd);
>+    sendwin = ulmax(ulmin(tp->snd_wnd - off, tp->snd_cwnd), 0);
>
>     flags = tcp_outflags[tp->t_state];
>     /*
>@@ -249,7 +249,7 @@
>         (p = tcp_sack_output(tp, &sack_bytes_rxmt))) {
>         long cwin;
>
>-        cwin = min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt;
>+        cwin = ulmin(tp->snd_wnd - off, tp->snd_cwnd) - sack_bytes_rxmt;
>         if (cwin < 0)
>             cwin = 0;
>         /* Do not retransmit SACK segments beyond snd_recover */
>@@ -355,7 +355,7 @@
>          * sending new data, having retransmitted all the
>          * data possible in the scoreboard.
>          */
>-        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd)
>+        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd - off)
>             - off);
>         /*
>          * Don't remove this (len > 0) check !

This doesn't seem to make a difference.
I have a ssh window that's been trying to connect for the past 5 minutes.
This is on a local network 192.168.0.4 >===SSH==> 192.168.0.5
Also pop from the same machines is endlessly trying to connect.
Hopefully this mail will get thru, otherwise I will need to reboot to the
old kernel.

Manfred

||      n...@pozo.com           ||
||      Ph. (415) 681-6235      ||

--
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.
Re: weird network problems on current since 10/28/2012
On 04.11.12 22:57, Andre Oppermann wrote: > On 04.11.2012 21:15, Andreas Tobler wrote: >> On 04.11.12 14:57, Andre Oppermann wrote: >>> On 04.11.2012 13:11, Kim Culhan wrote: On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: > On 2012-11-04 02:13, Manfred Antar wrote: >> At 03:29 PM 11/3/2012, Adrian Chadd wrote: > After the commit, there was a small discussion thread on svn-src-head@ > about the possible problems with the approach. Maybe you are > experiencing those? > > As the commit message says, you should be able to turn the feature off > using: > > sysctl net.inet.tcp.experimental.initcwnd10=0 > > Can you please try that, and see if the problems go away? FWIW this did not make the problem go away on 2 machines. >>> >>> Yes, this very much looks like the same problem as in PR/173309. >>> >>> Please try the attached patch. It fixes the connection hang issue. >>> There may be a second issue I debugging currently base on the feedback >>> from Fabian Keil. >> >> I jump into this thread since I have a similar network issue. >> >> My scenario: >> >> 'make installkernel DESTDIR=/netboot/test' to a nfs mounted drive. >> The nfs drive on the server is an ufs fs. No zfs. >> >> Up to r242261 I can install the kernel (or world) in a fluent way to the >> nfs destination. >> >> >From r242262 it doesn't work smooth. I have stalls, sometimes my >> patience is not enough and I kill the process. >> >> I tried 242266 with the above mentioned patch. No real success. >> >> How can I help/test? > > Please try the attach patch instead of the above mentioned one. Test run based on 242266. It starts much smoother. But it stalls later on. Continues, stalls for several seconds, cont. thx so far. 
Andreas

1391  0  D+     0:00.00 install -o root -g wheel -m 555 crypto.ko /netboot/test_install

procstat -kk 1391
  PID    TID COMM     TDNAME  KSTACK
 1391 100099 install  -       mi_switch+0x186 sleepq_timedwait+0x42 _sleep+0x1c9 clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb nfscl_request+0x72 nfsrpc_setattr+0x28f nfs_setattr+0x2b0 VOP_SETATTR_APV+0x31 setfmode+0x101 vn_chmod+0x8a sys_fchmod+0x8b amd64_syscall+0x55f Xfast_syscall+0xf7
Re: weird network problems on current since 10/28/2012
On 04.11.2012 21:15, Andreas Tobler wrote: On 04.11.12 14:57, Andre Oppermann wrote: On 04.11.2012 13:11, Kim Culhan wrote: On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: On 2012-11-04 02:13, Manfred Antar wrote: At 03:29 PM 11/3/2012, Adrian Chadd wrote: After the commit, there was a small discussion thread on svn-src-head@ about the possible problems with the approach. Maybe you are experiencing those? As the commit message says, you should be able to turn the feature off using: sysctl net.inet.tcp.experimental.initcwnd10=0 Can you please try that, and see if the problems go away? FWIW this did not make the problem go away on 2 machines. Yes, this very much looks like the same problem as in PR/173309. Please try the attached patch. It fixes the connection hang issue. There may be a second issue I debugging currently base on the feedback from Fabian Keil. I jump into this thread since I have a similar network issue. My scenario: 'make installkernel DESTDIR=/netboot/test' to a nfs mounted drive. The nfs drive on the server is an ufs fs. No zfs. Up to r242261 I can install the kernel (or world) in a fluent way to the nfs destination. From r242262 it doesn't work smooth. I have stalls, sometimes my patience is not enough and I kill the process. I tried 242266 with the above mentioned patch. No real success. How can I help/test? Please try the attach patch instead of the above mentioned one. 
--
Andre

Index: netinet/tcp_output.c
===================================================================
--- netinet/tcp_output.c (revision 242577)
+++ netinet/tcp_output.c (working copy)
@@ -228,7 +228,7 @@
     tso = 0;
     mtu = 0;
     off = tp->snd_nxt - tp->snd_una;
-    sendwin = min(tp->snd_wnd, tp->snd_cwnd);
+    sendwin = ulmax(ulmin(tp->snd_wnd - off, tp->snd_cwnd), 0);

     flags = tcp_outflags[tp->t_state];
     /*
@@ -249,7 +249,7 @@
         (p = tcp_sack_output(tp, &sack_bytes_rxmt))) {
         long cwin;

-        cwin = min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt;
+        cwin = ulmin(tp->snd_wnd - off, tp->snd_cwnd) - sack_bytes_rxmt;
         if (cwin < 0)
             cwin = 0;
         /* Do not retransmit SACK segments beyond snd_recover */
@@ -355,7 +355,7 @@
          * sending new data, having retransmitted all the
          * data possible in the scoreboard.
          */
-        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd)
+        len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd - off)
             - off);
         /*
          * Don't remove this (len > 0) check !
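Editorial sketch, not part of the posted patch: the patched sendwin line subtracts off from snd_wnd before clamping. Assuming both operands end up as unsigned long (which the ulmin()/ulmax() names suggest, but which is an assumption here), the subtraction wraps around whenever off exceeds snd_wnd, and taking ulmax(..., 0) of an unsigned value cannot undo that. The names sketch_ulmin, sketch_ulmax, sendwin_as_patched and sendwin_guarded are hypothetical stand-ins written for this illustration, not kernel code, and nothing in the thread establishes that this is the cause of the reported stalls.

```c
/* Hypothetical stand-ins for the kernel's ulmin()/ulmax() helpers. */
static unsigned long sketch_ulmin(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

static unsigned long sketch_ulmax(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Same shape as the patched line, with every operand assumed unsigned.
 * When off > snd_wnd the subtraction wraps to a huge value, so the
 * result collapses to snd_cwnd instead of 0.
 */
unsigned long sendwin_as_patched(unsigned long snd_wnd,
    unsigned long snd_cwnd, unsigned long off)
{
    return sketch_ulmax(sketch_ulmin(snd_wnd - off, snd_cwnd), 0UL);
}

/* Guarded variant (an assumption of this sketch): clamp before subtracting. */
unsigned long sendwin_guarded(unsigned long snd_wnd,
    unsigned long snd_cwnd, unsigned long off)
{
    unsigned long usable = off < snd_wnd ? snd_wnd - off : 0UL;

    return sketch_ulmin(usable, snd_cwnd);
}
```

With snd_wnd = 1000, snd_cwnd = 14600 and off = 1500, sendwin_as_patched() returns the full 14600 while sendwin_guarded() returns 0; when off is smaller than snd_wnd the two agree.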
Re: weird network problems on current since 10/28/2012
On 04.11.12 14:57, Andre Oppermann wrote: > On 04.11.2012 13:11, Kim Culhan wrote: >> On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: >>> On 2012-11-04 02:13, Manfred Antar wrote: At 03:29 PM 11/3/2012, Adrian Chadd wrote: > On 3 November 2012 10:40, Manfred Antar wrote: >> i have problem connecting to freebsd box on local network since last >> sunday. >> the last kernel that works: >>FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012 >> anything after that, sometimes i can connect, other times just hangs. >> any network connection hangs = pop httpd ssh etc etc. >> anyone have any ideas ? >> i can checkout different sources and see if i can locate the changes >> that cause >> this. > > Please do! >>> ... Here is what I found doing : setenv CVSROOT /usr/home/ncvs cvs co -D"October 28, 2012 12:14:38 PDT" sys A kernel from that time works fine. doing: cvs up -D"October 28, 2012 13:14:38 PDT" sys1 hour later the following files were changed: sys/netinet/tcp_input.c sys/netinet/tcp_timer.c sys/netinet/tcp_var.h Building a kernel from these new files is when the problem starts. >>> >>> So, your problems seem to have been introduced by this commit by Andre: >>> >>> http://svn.freebsd.org/changeset/base/242266 >>> >>> Increase the initial CWND to 10 segments as defined in IETF TCPM >>> draft-ietf-tcpm-initcwnd-05. It explains why the increased initial >>> window improves the overall performance of many web services without >>> risking congestion collapse. >>> >>> As long as it remains a draft it is placed under a sysctl marking it >>> as experimental: >>> net.inet.tcp.experimental.initcwnd10 = 1 >>> When it becomes an official RFC soon the sysctl will be changed to >>> the RFC number and moved to net.inet.tcp. >>> >>> This implementation differs from the RFC draft in that it is a bit >>> more conservative in the case of packet loss on SYN or SYN|ACK because >>> we haven't reduced the default RTO to 1 second yet. 
>>> Also the restart window isn't yet increased as allowed. Both will be
>>> adjusted with upcoming changes.
>>>
>>> It is enabled by default. In Linux it is enabled since kernel 3.0.
>>>
>>> After the commit, there was a small discussion thread on svn-src-head@
>>> about the possible problems with the approach. Maybe you are
>>> experiencing those?
>>>
>>> As the commit message says, you should be able to turn the feature off
>>> using:
>>>
>>>   sysctl net.inet.tcp.experimental.initcwnd10=0
>>>
>>> Can you please try that, and see if the problems go away?
>>
>> FWIW this did not make the problem go away on 2 machines.
>
> Yes, this very much looks like the same problem as in PR/173309.
>
> Please try the attached patch. It fixes the connection hang issue.
> There may be a second issue I am currently debugging based on the
> feedback from Fabian Keil.

I jump into this thread since I have a similar network issue.

My scenario:

'make installkernel DESTDIR=/netboot/test' to a nfs mounted drive.
The nfs drive on the server is an ufs fs. No zfs.

Up to r242261 I can install the kernel (or world) in a fluent way to the
nfs destination.

From r242262 it doesn't work smoothly. I have stalls, sometimes my
patience is not enough and I kill the process.

I tried 242266 with the above mentioned patch. No real success.

How can I help/test?

TIA,
Andreas
Re: weird network problems on current since 10/28/2012
At 05:57 AM 11/4/2012, you wrote: >On 04.11.2012 13:11, Kim Culhan wrote: >>On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: >>>On 2012-11-04 02:13, Manfred Antar wrote: At 03:29 PM 11/3/2012, Adrian Chadd wrote: >On 3 November 2012 10:40, Manfred Antar wrote: >>i have problem connecting to freebsd box on local network since last >>sunday. >>the last kernel that works: >> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012 >>anything after that, sometimes i can connect, other times just hangs. >>any network connection hangs = pop httpd ssh etc etc. >>anyone have any ideas ? >>i can checkout different sources and see if i can locate the changes that >>cause >>this. > >Please do! >>>... Here is what I found doing : setenv CVSROOT /usr/home/ncvs cvs co -D"October 28, 2012 12:14:38 PDT" sys A kernel from that time works fine. doing: cvs up -D"October 28, 2012 13:14:38 PDT" sys1 hour later the following files were changed: sys/netinet/tcp_input.c sys/netinet/tcp_timer.c sys/netinet/tcp_var.h Building a kernel from these new files is when the problem starts. >>> >>>So, your problems seem to have been introduced by this commit by Andre: >>> >>>http://svn.freebsd.org/changeset/base/242266 >>> >>>Increase the initial CWND to 10 segments as defined in IETF TCPM >>>draft-ietf-tcpm-initcwnd-05. It explains why the increased initial >>>window improves the overall performance of many web services without >>>risking congestion collapse. >>> >>>As long as it remains a draft it is placed under a sysctl marking it >>>as experimental: >>> net.inet.tcp.experimental.initcwnd10 = 1 >>>When it becomes an official RFC soon the sysctl will be changed to >>>the RFC number and moved to net.inet.tcp. >>> >>>This implementation differs from the RFC draft in that it is a bit >>>more conservative in the case of packet loss on SYN or SYN|ACK because >>>we haven't reduced the default RTO to 1 second yet. Also the restart >>>window isn't yet increased as allowed. 
>>>Both will be adjusted with upcoming changes.
>>>
>>>It is enabled by default. In Linux it is enabled since kernel 3.0.
>>>
>>>After the commit, there was a small discussion thread on svn-src-head@
>>>about the possible problems with the approach. Maybe you are
>>>experiencing those?
>>>
>>>As the commit message says, you should be able to turn the feature off
>>>using:
>>>
>>>  sysctl net.inet.tcp.experimental.initcwnd10=0
>>>
>>>Can you please try that, and see if the problems go away?
>>
>>FWIW this did not make the problem go away on 2 machines.
>
>Yes, this very much looks like the same problem as in PR/173309.
>
>Please try the attached patch. It fixes the connection hang issue.
>There may be a second issue I am currently debugging based on the
>feedback from Fabian Keil.
>
>--
>Andre
>
>Index: tcp_input.c
>===================================================================
>--- tcp_input.c (revision 242494)
>+++ tcp_input.c (working copy)
>@@ -2650,10 +2652,12 @@
>
>         SOCKBUF_LOCK(&so->so_snd);
>         if (acked > so->so_snd.sb_cc) {
>+            tp->snd_wnd -= so->so_snd.sb_cc;
>             sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
>             ourfinisacked = 1;
>         } else {
>             sbdrop_locked(&so->so_snd, acked);
>+            tp->snd_wnd -= acked;
>             ourfinisacked = 0;
>         }
>         /* NB: sowwakeup_locked() does an implicit unlock. */

This patch improves the connection issue: ssh and pop no longer hang while
trying to connect.
It still seems to take longer to connect, though, but in the end the
connection goes through.
I can capture a tcpdump and put it at http://pozo.com/tcpdump/tpdump.txt
if that will help. I'll let it run for about 1/2 hour.

Manfred
Re: weird network problems on current since 10/28/2012
At 03:21 AM 11/4/2012, Dimitry Andric wrote: >On 2012-11-04 02:13, Manfred Antar wrote: >>At 03:29 PM 11/3/2012, Adrian Chadd wrote: >>>On 3 November 2012 10:40, Manfred Antar wrote: i have problem connecting to freebsd box on local network since last sunday. the last kernel that works: FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012 anything after that, sometimes i can connect, other times just hangs. any network connection hangs = pop httpd ssh etc etc. anyone have any ideas ? i can checkout different sources and see if i can locate the changes that cause this. >>> >>>Please do! >... >>Here is what I found doing : >>setenv CVSROOT /usr/home/ncvs >> >>cvs co -D"October 28, 2012 12:14:38 PDT" sys >> >>A kernel from that time works fine. >> >>doing: >> >>cvs up -D"October 28, 2012 13:14:38 PDT" sys1 hour later >>the following files were changed: >>sys/netinet/tcp_input.c >>sys/netinet/tcp_timer.c >>sys/netinet/tcp_var.h >> >>Building a kernel from these new files is when the problem starts. > >So, your problems seem to have been introduced by this commit by Andre: > > http://svn.freebsd.org/changeset/base/242266 > > Increase the initial CWND to 10 segments as defined in IETF TCPM > draft-ietf-tcpm-initcwnd-05. It explains why the increased initial > window improves the overall performance of many web services without > risking congestion collapse. > > As long as it remains a draft it is placed under a sysctl marking it > as experimental: > net.inet.tcp.experimental.initcwnd10 = 1 > When it becomes an official RFC soon the sysctl will be changed to > the RFC number and moved to net.inet.tcp. > > This implementation differs from the RFC draft in that it is a bit > more conservative in the case of packet loss on SYN or SYN|ACK because > we haven't reduced the default RTO to 1 second yet. Also the restart > window isn't yet increased as allowed. Both will be adjusted with > upcoming changes. > > Is is enabled by default. In Linux it is enabled since kernel 3.0. 
>
>After the commit, there was a small discussion thread on svn-src-head@
>about the possible problems with the approach. Maybe you are
>experiencing those?
>
>As the commit message says, you should be able to turn the feature off
>using:
>
>  sysctl net.inet.tcp.experimental.initcwnd10=0
>
>Can you please try that, and see if the problems go away?

I read the commit log and tried that. It didn't change.
I will try the patch from Andre and enable the debug log.

Manfred

||      n...@pozo.com           ||
||      Ph. (415) 681-6235      ||
Re: weird network problems on current since 10/28/2012
On 04.11.2012 13:11, Kim Culhan wrote: On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: On 2012-11-04 02:13, Manfred Antar wrote: At 03:29 PM 11/3/2012, Adrian Chadd wrote: On 3 November 2012 10:40, Manfred Antar wrote: i have problem connecting to freebsd box on local network since last sunday. the last kernel that works: FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012 anything after that, sometimes i can connect, other times just hangs. any network connection hangs = pop httpd ssh etc etc. anyone have any ideas ? i can checkout different sources and see if i can locate the changes that cause this. Please do! ... Here is what I found doing : setenv CVSROOT /usr/home/ncvs cvs co -D"October 28, 2012 12:14:38 PDT" sys A kernel from that time works fine. doing: cvs up -D"October 28, 2012 13:14:38 PDT" sys1 hour later the following files were changed: sys/netinet/tcp_input.c sys/netinet/tcp_timer.c sys/netinet/tcp_var.h Building a kernel from these new files is when the problem starts. So, your problems seem to have been introduced by this commit by Andre: http://svn.freebsd.org/changeset/base/242266 Increase the initial CWND to 10 segments as defined in IETF TCPM draft-ietf-tcpm-initcwnd-05. It explains why the increased initial window improves the overall performance of many web services without risking congestion collapse. As long as it remains a draft it is placed under a sysctl marking it as experimental: net.inet.tcp.experimental.initcwnd10 = 1 When it becomes an official RFC soon the sysctl will be changed to the RFC number and moved to net.inet.tcp. This implementation differs from the RFC draft in that it is a bit more conservative in the case of packet loss on SYN or SYN|ACK because we haven't reduced the default RTO to 1 second yet. Also the restart window isn't yet increased as allowed. Both will be adjusted with upcoming changes. Is is enabled by default. In Linux it is enabled since kernel 3.0. 
>> After the commit, there was a small discussion thread on svn-src-head@
>> about the possible problems with the approach. Maybe you are
>> experiencing those?
>>
>> As the commit message says, you should be able to turn the feature off
>> using:
>>
>>   sysctl net.inet.tcp.experimental.initcwnd10=0
>>
>> Can you please try that, and see if the problems go away?
>
> FWIW this did not make the problem go away on 2 machines.

Yes, this very much looks like the same problem as in PR/173309.

Please try the attached patch. It fixes the connection hang issue.
There may be a second issue I am currently debugging based on the
feedback from Fabian Keil.

--
Andre

Index: tcp_input.c
===================================================================
--- tcp_input.c (revision 242494)
+++ tcp_input.c (working copy)
@@ -2650,10 +2652,12 @@

         SOCKBUF_LOCK(&so->so_snd);
         if (acked > so->so_snd.sb_cc) {
+            tp->snd_wnd -= so->so_snd.sb_cc;
             sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
             ourfinisacked = 1;
         } else {
             sbdrop_locked(&so->so_snd, acked);
+            tp->snd_wnd -= acked;
             ourfinisacked = 0;
         }
         /* NB: sowwakeup_locked() does an implicit unlock. */
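Editorial sketch of the accounting the tcp_input.c patch adds: whenever an ACK drops bytes from the send buffer, the tracked send window is reduced by the same number of bytes. The struct sketch_conn and the function sketch_process_ack are hypothetical simplifications for illustration only; they mirror the two branches of the patch but leave out locking, the real socket buffer, and everything else the kernel does at that point. Like the patch, the sketch does not guard against snd_wnd being smaller than the amount subtracted.

```c
/* Hypothetical, heavily simplified connection state. */
struct sketch_conn {
    unsigned long sb_cc;   /* bytes currently queued in the send buffer */
    unsigned long snd_wnd; /* tracked send window, in bytes */
};

/*
 * Apply an ACK covering 'acked' bytes.
 * Returns 1 when the ACK covers more than the queued data (the branch
 * where the patch sets ourfinisacked = 1), otherwise 0.
 */
int sketch_process_ack(struct sketch_conn *c, unsigned long acked)
{
    if (acked > c->sb_cc) {
        /* Shrink the window by what is actually dropped, then drop it. */
        c->snd_wnd -= c->sb_cc;
        c->sb_cc = 0;
        return 1;
    }
    /* Partial ACK: drop 'acked' bytes and shrink the window to match. */
    c->sb_cc -= acked;
    c->snd_wnd -= acked;
    return 0;
}
```

Starting from 3000 queued bytes and an 8000-byte window, an ACK for 1000 bytes leaves 2000 queued and a 7000-byte window; a following ACK for 5000 bytes empties the buffer, leaves a 5000-byte window, and returns 1.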
Re: weird network problems on current since 10/28/2012
On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote: > On 2012-11-04 02:13, Manfred Antar wrote: >> At 03:29 PM 11/3/2012, Adrian Chadd wrote: >>> On 3 November 2012 10:40, Manfred Antar wrote: i have problem connecting to freebsd box on local network since last sunday. the last kernel that works: FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012 anything after that, sometimes i can connect, other times just hangs. any network connection hangs = pop httpd ssh etc etc. anyone have any ideas ? i can checkout different sources and see if i can locate the changes that cause this. >>> >>> Please do! > ... >> Here is what I found doing : >> setenv CVSROOT /usr/home/ncvs >> >> cvs co -D"October 28, 2012 12:14:38 PDT" sys >> >> A kernel from that time works fine. >> >> doing: >> >> cvs up -D"October 28, 2012 13:14:38 PDT" sys1 hour later >> the following files were changed: >> sys/netinet/tcp_input.c >> sys/netinet/tcp_timer.c >> sys/netinet/tcp_var.h >> >> Building a kernel from these new files is when the problem starts. > > So, your problems seem to have been introduced by this commit by Andre: > >http://svn.freebsd.org/changeset/base/242266 > >Increase the initial CWND to 10 segments as defined in IETF TCPM >draft-ietf-tcpm-initcwnd-05. It explains why the increased initial >window improves the overall performance of many web services without >risking congestion collapse. > >As long as it remains a draft it is placed under a sysctl marking it >as experimental: > net.inet.tcp.experimental.initcwnd10 = 1 >When it becomes an official RFC soon the sysctl will be changed to >the RFC number and moved to net.inet.tcp. > >This implementation differs from the RFC draft in that it is a bit >more conservative in the case of packet loss on SYN or SYN|ACK because >we haven't reduced the default RTO to 1 second yet. Also the restart >window isn't yet increased as allowed. Both will be adjusted with >upcoming changes. > >Is is enabled by default. 
> In Linux it is enabled since kernel 3.0.
>
> After the commit, there was a small discussion thread on svn-src-head@
> about the possible problems with the approach. Maybe you are
> experiencing those?
>
> As the commit message says, you should be able to turn the feature off
> using:
>
>   sysctl net.inet.tcp.experimental.initcwnd10=0
>
> Can you please try that, and see if the problems go away?

FWIW this did not make the problem go away here.

thanks
-kim
Re: weird network problems on current since 10/28/2012
Could this be the same problem as PR/173309?

--
Regards,
Alexander Yerenkow
Re: weird network problems on current since 10/28/2012
On 04.11.2012 02:13, Manfred Antar wrote:
> At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>> On 3 November 2012 10:40, Manfred Antar wrote:
>>> i have problem connecting to freebsd box on local network since last
>>> sunday.
>>> the last kernel that works:
>>> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
>>> anything after that, sometimes i can connect, other times just hangs.
>>> any network connection hangs = pop httpd ssh etc etc.
>>> anyone have any ideas ?
>>> i can checkout different sources and see if i can locate the changes
>>> that cause this.
>>
>> Please do!
>>
>> adrian
>
> OK
> Here is what I found doing :
> setenv CVSROOT /usr/home/ncvs
>
> cvs co -D"October 28, 2012 12:14:38 PDT" sys
>
> A kernel from that time works fine.
>
> doing:
>
> cvs up -D"October 28, 2012 13:14:38 PDT" sys
>
> 1 hour later the following files were changed:
> sys/netinet/tcp_input.c
> sys/netinet/tcp_timer.c
> sys/netinet/tcp_var.h
>
> Building a kernel from these new files is when the problem starts.

Can you please provide one or more tcpdump from a failing kernel?

Also please enable sysctl net.inet.tcp.logdebug=1 and capture LOG_DEBUG
output from syslogd. That may give some important information as well.

--
Andre
Re: weird network problems on current since 10/28/2012
On 2012-11-04 02:13, Manfred Antar wrote:
> At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>> On 3 November 2012 10:40, Manfred Antar wrote:
>>> i have problem connecting to freebsd box on local network since last
>>> sunday.
>>> the last kernel that works:
>>> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
>>> anything after that, sometimes i can connect, other times just hangs.
>>> any network connection hangs = pop httpd ssh etc etc.
>>> anyone have any ideas ?
>>> i can checkout different sources and see if i can locate the changes
>>> that cause this.
>>
>> Please do!
...
> Here is what I found doing :
> setenv CVSROOT /usr/home/ncvs
>
> cvs co -D"October 28, 2012 12:14:38 PDT" sys
>
> A kernel from that time works fine.
>
> doing:
>
> cvs up -D"October 28, 2012 13:14:38 PDT" sys
>
> 1 hour later the following files were changed:
> sys/netinet/tcp_input.c
> sys/netinet/tcp_timer.c
> sys/netinet/tcp_var.h
>
> Building a kernel from these new files is when the problem starts.

So, your problems seem to have been introduced by this commit by Andre:

  http://svn.freebsd.org/changeset/base/242266

  Increase the initial CWND to 10 segments as defined in IETF TCPM
  draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
  window improves the overall performance of many web services without
  risking congestion collapse.

  As long as it remains a draft it is placed under a sysctl marking it
  as experimental:
    net.inet.tcp.experimental.initcwnd10 = 1
  When it becomes an official RFC soon the sysctl will be changed to
  the RFC number and moved to net.inet.tcp.

  This implementation differs from the RFC draft in that it is a bit
  more conservative in the case of packet loss on SYN or SYN|ACK because
  we haven't reduced the default RTO to 1 second yet. Also the restart
  window isn't yet increased as allowed. Both will be adjusted with
  upcoming changes.

  It is enabled by default. In Linux it is enabled since kernel 3.0.

After the commit, there was a small discussion thread on svn-src-head@
about the possible problems with the approach. Maybe you are
experiencing those?

As the commit message says, you should be able to turn the feature off
using:

  sysctl net.inet.tcp.experimental.initcwnd10=0

Can you please try that, and see if the problems go away?
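Editorial sketch of the window sizes the commit message refers to. The draft's upper bound on the initial window is min(10*MSS, max(2*MSS, 14600)) bytes, compared with min(4*MSS, max(2*MSS, 4380)) in the older RFC 3390. The function names below are hypothetical, and this is not the code from r242266, which by its own commit message is more conservative than the draft in some cases.

```c
/* Small helpers so the formulas read like the specification text. */
static unsigned long sketch_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

static unsigned long sketch_max(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/* RFC 3390 upper bound: min(4*MSS, max(2*MSS, 4380)) bytes. */
unsigned long initial_window_rfc3390(unsigned long mss)
{
    return sketch_min(4UL * mss, sketch_max(2UL * mss, 4380UL));
}

/* draft-ietf-tcpm-initcwnd upper bound: min(10*MSS, max(2*MSS, 14600)) bytes. */
unsigned long initial_window_iw10(unsigned long mss)
{
    return sketch_min(10UL * mss, sketch_max(2UL * mss, 14600UL));
}
```

For a 1460-byte MSS this gives 14600 bytes (ten segments) under the draft versus 4380 bytes (three segments) under RFC 3390.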
Re: weird network problems on current since 10/28/2012
At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>On 3 November 2012 10:40, Manfred Antar wrote:
>> i have problem connecting to freebsd box on local network since last sunday.
>> the last kernel that works:
>> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
>> anything after that, sometimes i can connect, other times just hangs.
>> any network connection hangs = pop httpd ssh etc etc.
>> anyone have any ideas ?
>> i can checkout different sources and see if i can locate the changes that
>> cause this.
>
>Please do!
>
>adrian

OK
Here is what I found doing :
setenv CVSROOT /usr/home/ncvs

cvs co -D"October 28, 2012 12:14:38 PDT" sys

A kernel from that time works fine.

doing:

cvs up -D"October 28, 2012 13:14:38 PDT" sys

1 hour later the following files were changed:
sys/netinet/tcp_input.c
sys/netinet/tcp_timer.c
sys/netinet/tcp_var.h

Building a kernel from these new files is when the problem starts.

||      n...@pozo.com           ||
||      Ph. (415) 681-6235      ||
Re: weird network problems on current since 10/28/2012
On 3 November 2012 10:40, Manfred Antar wrote:
> i have problem connecting to freebsd box on local network since last sunday.
> the last kernel that works:
> FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
> anything after that, sometimes i can connect, other times just hangs.
> any network connection hangs = pop httpd ssh etc etc.
> anyone have any ideas ?
> i can checkout different sources and see if i can locate the changes that
> cause this.

Please do!

adrian
weird network problems on current since 10/28/2012
i have problem connecting to freebsd box on local network since last sunday.
the last kernel that works:
FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
anything after that, sometimes i can connect, other times just hangs.
any network connection hangs = pop httpd ssh etc etc.
anyone have any ideas ?
i can checkout different sources and see if i can locate the changes that
cause this.

thanks
manfred

||      n...@pozo.com           ||
||      Ph. (415) 681-6235      ||