Re: Cas driver fails to load first time after boot.
On 01/24/13 15:50, Marius Strobl wrote:
> On Thu, Jan 24, 2013 at 12:39:44PM -0600, Paul Keusemann wrote:
>> On 01/24/13 09:09, Marius Strobl wrote:
>>> On Tue, Jan 22, 2013 at 02:46:48PM -0600, Paul Keusemann wrote:
>>>> Hi,
>>>>
>>>> I've got a Dell R200 which I'm trying to build into a gateway with a Sun QGE (501-6738-10). The cas driver fails to load the first time I try to load it but succeeds the second time. Is this a problem with the card, the driver, my karma?
>>> Wrong phase of the moon, apparently :)
>>> The MII setup of these chips is a bit tricky and I'm not sure whether I've hit all code paths during development of the driver. I certainly didn't test with a 501-6738, these have been reported as working before, though. It also doesn't make much sense that attaching the devices succeeds on the second attempt. Could you please use an if_cas.ko built with the attached patch and report the debug output for one of the interfaces in both the working and the non-working case?
>>
>> I would love to give you output from the working and non-working case but apparently the phase of the moon has changed, I can't get it to fail now. The messages output from the working case is attached.
>
> Thanks but unfortunately this doesn't make any sense either. In general, printf()s cause delays which can be relevant. In the locations I've put them they hardly can make such a difference though. If you haven't already done so, could you please power off the machine before doing the test with the patched module? Is the problem still gone if you revert to the original module?

OK, power-cycling makes a difference. The driver fails to attach all of the devices after power-cycling most of the time, if not all of the time. The number of devices attached varies; the attached message file fragment is from my last test. Three of the devices were attached on the first load attempt and all four of them on the second attempt.

In the interest of full disclosure, I did build a new kernel but it is just a copy of GENERIC.
This is a
> Marius

--
Paul Keusemann pkeu...@visi.com
4266 Joppa Court (952) 894-7805
Savage, MN 55378

Jan 24 20:32:32 lucid kernel: Copyright (c) 1992-2012 The FreeBSD Project.
Jan 24 20:32:32 lucid kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 24 20:32:32 lucid kernel: The Regents of the University of California. All rights reserved.
Jan 24 20:32:32 lucid kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jan 24 20:32:32 lucid kernel: FreeBSD 8.3-RELEASE #0: Thu Jan 24 11:15:13 CST 2013
Jan 24 20:32:32 lucid kernel: toor@lucid:/usr/obj/usr/src/sys/LUCID amd64
Jan 24 20:32:32 lucid kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Jan 24 20:32:32 lucid kernel: CPU: Intel(R) Xeon(R) CPU X3210 @ 2.13GHz (2133.42-MHz K8-class CPU)
Jan 24 20:32:32 lucid kernel: Origin = "GenuineIntel" Id = 0x6fb Family = 6 Model = f Stepping = 11
Jan 24 20:32:32 lucid kernel: Features=0xbfebfbff
Jan 24 20:32:32 lucid kernel: Features2=0xe3bd
Jan 24 20:32:32 lucid kernel: AMD Features=0x20100800
Jan 24 20:32:32 lucid kernel: AMD Features2=0x1
Jan 24 20:32:32 lucid kernel: TSC: P-state invariant
Jan 24 20:32:32 lucid kernel: real memory = 4294967296 (4096 MB)
Jan 24 20:32:32 lucid kernel: avail memory = 4099231744 (3909 MB)
Jan 24 20:32:32 lucid kernel: ACPI APIC Table:
Jan 24 20:32:32 lucid kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Jan 24 20:32:32 lucid kernel: FreeBSD/SMP: 1 package(s) x 4 core(s)
Jan 24 20:32:32 lucid kernel: cpu0 (BSP): APIC ID: 0
Jan 24 20:32:32 lucid kernel: cpu1 (AP): APIC ID: 1
Jan 24 20:32:32 lucid kernel: cpu2 (AP): APIC ID: 2
Jan 24 20:32:32 lucid kernel: cpu3 (AP): APIC ID: 3
Jan 24 20:32:32 lucid kernel: ioapic0: Changing APIC ID to 4
Jan 24 20:32:32 lucid kernel: ioapic1: Changing APIC ID to 5
Jan 24 20:32:32 lucid kernel: ioapic0 irqs 0-23 on motherboard
Jan 24 20:32:32 lucid kernel: ioapic1 irqs 32-55 on motherboard
Jan 24 20:32:32 lucid kernel: kbd1 at kbdmux0
Jan 24 20:32:32 lucid kernel: acpi0: on motherboard
Jan 24 20:32:32 lucid kernel: acpi0: [ITHREAD]
Jan 24 20:32:32 lucid kernel: acpi0: Power Button (fixed)
Jan 24 20:32:32 lucid kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Jan 24 20:32:32 lucid kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
Jan 24 20:32:32 lucid kernel: cpu0: on acpi0
Jan 24 20:32:32 lucid kernel: cpu1: on acpi0
Jan 24 20:32:32 lucid kernel: cpu2: on acpi0
Jan 24 20:32:32 lucid kernel: cpu3: on acpi0
Jan 24 20:32:32 lucid kernel: pcib0: port 0xcf8-0xcff on acpi0
Jan 24 20:32:32 lucid kernel: pci0: on pcib0
Jan 24 20:32:32 lucid kernel: pcib1: irq 16 at device 1.0 on pci0
Jan 24 20:32:32 lucid kernel: pci1: on pcib1
Jan 24 20:32:32 lucid kernel: pcib2: irq 16 at device 28.0 on pci0
Jan 24 20:32:32 lucid kernel: pci2: on pcib2
Jan 24 20:
Re: how to completely makes an interface down?
On Thu, 24 Jan 2013, h bagade wrote:
> I'm searching for a method or configuration that makes the LED go off when I bring the interface down. Currently the LED still remains on when I shut down the interface! Is there any way to do this?

em(4) mentions controlling the card LEDs. I have not tried it, though.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: Tov?bb?t?s: [Ipsec-tools-users] freebsd & linux setup question
Dear Yvan,

I've found a strange line in racoon's output:

Either family (2 - 2), types (4 - 1) of ID from initiator differ or matching sainfo has no id_i defined for the peer. Not filling iph2->sa_src and iph2->sa_dst.

This is missing in linux's instance. Could this be a clue for my problem?

Thanks in advance,
Kojedzinszky Richard

On Tue, 22 Jan 2013, Richard Kojedzinszky wrote:
> Dear Yvan,
>
> I've recompiled racoon with NATT, but as you've said, only pure Internet is between A and B without NAT, and thus it did not solve my problem. I've attached racoon's output from
> # racoon -ddd -F
> on the freebsd's side.
>
> I can confirm that setkey -D and -DP's output were full, so only the two entries existed for the SAs and policies.
>
> I've tried a simple road-warrior setup, with transport mode, thus only traffic between A and B was protected, but that worked.
>
> My server's racoon.conf is simple:
> --
> path certificate "/usr/local/etc/racoon/certs";
>
> remote anonymous {
>     exchange_mode main,aggressive;
>     # nat_traversal off;
>     certificate_type x509 "A.crt" "A.key";
>     ca_type x509 "ca.crt";
>     my_identifier asn1dn;
>     peers_identifier asn1dn;
>     proposal_check strict;
>     lifetime time 24 hour;
>     proposal {
>         encryption_algorithm aes256;
>         hash_algorithm sha1;
>         authentication_method rsasig;
>         dh_group 2;
>     }
>     generate_policy on;
>     passive on;
>     dpd_delay 60;
> }
>
> sainfo anonymous {
>     lifetime time 4 hour;
>     encryption_algorithm aes128;
>     authentication_algorithm hmac_md5;
>     compression_algorithm deflate;
> }
>
> log debug;
> --
>
> And the client's is the same except the generate_policy and passive statements.
>
> Thanks in advance,
> Kojedzinszky Richard
>
> On Tue, 22 Jan 2013, VANHULLEBUS Yvan wrote:
>> Hi.
>>
>> On Mon, Jan 21, 2013 at 05:53:49PM +0100, kri...@cflinux.hu wrote:
>>> Dear users,
>>>
>>> I've a working tunnel setup between two linux hosts. One end (A) has a fixed address, while the other (B) has a dynamic one. A is my server, B is my home router. Behind B, I've a private network. What I've set up is that my private network reaches A through an IPSEC tunnel.
>>> [...]
>>> Now, I've decided to switch to freebsd on server side, and the same configuration on the server simply does not work. It installs the policies, and the tunnels, but it seems that when a reply packet is leaving the server, it tries to initiate a new tunnel. If I've "passive on" on my server's remote section, then I've the following error:
>>>
>>> Jan 21 16:06:11 pi racoon: ERROR: no configuration found for B.
>>> Jan 21 16:06:11 pi racoon: ERROR: failed to begin ipsec sa negotication.
>>>
>>> If I disable passive mode, then racoon tries to establish another tunnel, but for some reason it does not succeed either. But I think that, as on linux, it should work with passive on.
>>>
>>> FreeBSD is 9.1-RELEASE, the linux side is a linux 3.5.4.
>>>
>>> racoon on linux is:
>>> # racoon -V
>>> @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
>>> Compiled with:
>>> - OpenSSL 1.0.0e 6 Sep 2011 (http://www.openssl.org/)
>>> - Dead Peer Detection
>>> - IKE fragmentation
>>> - NAT Traversal
>>> - Monotonic clock
>>>
>>> racoon on freebsd is:
>>> # racoon -V
>>> @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
>>> Compiled with:
>>> - OpenSSL 0.9.8x 10 May 2012 (http://www.openssl.org/)
>>> - Dead Peer Detection
>>> - IKE fragmentation
>>> - Hybrid authentication
>>> - Monotonic clock
>>
>> You have NAT-T compiled/enabled on Linux side, but not on FreeBSD side (probably because it is not activated as a kernel option). If you have "something that does NAT" on the wire between A and B, it is probably the origin of your problem. However, as it seems that there is only "Internet" between A and B, I'll suppose that the issue is somewhere else...
>>
>>> Unfortunately I've no idea.
>>>
>>> Before the first packet, on the server:
>>> # setkey -D
>>> No SAD entries.
>>>
>>> After an icmp packet sent from my private network to A:
>>> # setkey -D
>>> A B
>>>     esp mode=tunnel spi=76859998(0x0494ca5e) reqid=0(0x)
>>>     E: rijndael-cbc 1c80b80d b006e3a3 772c2a9b 5c475213
>>>     A: hmac-md5 d43ff29c 034c896a fb2e7d1c 95f73ff5
>>>     seq=0x replay=4 flags=0x state=mature
>>>     created: Jan 21 17:03:39 2013 current: Jan 21 17:05:54 2013
>>>     diff: 135(s) hard: 14400(s) soft: 11520(s)
>>>     last: hard: 0(s) soft: 0(s)
>>>     current: 0(bytes) hard: 0(bytes) soft: 0(bytes)
>>>     allocated: 0 hard: 0 soft: 0
>>>     sadb_seq=1 pid=93091 refcnt=1
>>> B A
>>>     esp mode=tunnel spi=14479(0x08a151f0) reqid=0(0x)
>>>     E: rijndael-cbc 8bd59c29 9800d10f 8f9d7e84 a720aa9c
>>>     A: hmac-md5 188070e2 a3220772 78efcb06 3457db62
>>>     seq=0x0037 replay=4 flags=0x state=mature
>>>     created: Jan 21 17:03:39 2013 current: Jan 21 17:05:54 2013
>>>     diff: 135(s) hard: 14400(s) soft: 11520(s)
>>>     last: Jan 21 17:04:50 2013 hard: 0(s)
Re: Some questions about the new TCP congestion control code
On 01/25/13 01:12, Andre Oppermann wrote:
> On 24.01.2013 14:28, Lawrence Stewart wrote:
>> On 01/16/13 06:27, John Baldwin wrote:
>>> One other thing I noticed which may or may not be odd during this, is that if you have a connection with TCP_NODELAY enabled and you fill your cwnd and then you get an ACK back for an earlier small segment (less than MSS), TCP will not send out a "short" segment for the amount of window space released. Instead, it will wait until a full MSS of space is available before sending a packet. I'm not sure if that is the correct behavior with TCP_NODELAY or if we should send "short" segments in that case.
>>
>> We try fairly hard not to send runt segments irrespective of NODELAY, but I would be happy to see that change. I'm not aware of any "correct behaviour" we have to adhere to - I think it would be perfectly reasonable to have a sysctl set the lowest number of bytes we'd be willing to send a runt segment for and then key off TCP_NODELAY as to whether we try hard to send an MSS worth or send as soon as we have the min number of bytes worth of window available.
>
> This is classic silly window syndrome prevention applied to the CWND.

Yes, but I think we could provide knobs to relax the behaviour where the latency vs header/payload overhead tradeoff swings in favour of latency.

I guess, John, I should first ask if you know why you were only getting such small ACKs back? Were you sending full MSS segments in the first place or doing some sort of PUSH to try and expedite getting some smaller chunk of data to the other end which triggered a small segment and corresponding small ACK?

> Sending a small segment when the window opens just a bit isn't going to help
> much and

I wouldn't be game to make such a blanket statement - that very much depends on the situation. I think John's use case is relevant and we currently aren't very helpful towards it.

> mostly clogs the network.

How so? We're not in the 80's any more. If I pay for X MBps of service, I expect to be able to use it in any way I choose. Packet size is irrelevant, but there are obvious efficiencies to be gained by maximising the amount of payload in each segment.

> This is actually a side effect of ABC (appropriate byte counting) where not
> the ACK's are counted but the bytes ACK'ed. Disabling ABC will solve this
> problem.

I don't follow. How is what John described above related to ABC?

Cheers,
Lawrence
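The knob discussed in this message can be illustrated with a small user-space sketch. This is not FreeBSD's tcp_output() logic: the function name `segment_len`, its parameters, and the `min_runt` threshold are hypothetical, and Nagle's algorithm is deliberately left out. It only shows how a minimum-runt threshold keyed off TCP_NODELAY might choose between sending a short segment and waiting for a full MSS of window.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: decide how many bytes to put in the next segment.
 *   space    - free congestion window (cwnd minus bytes in flight)
 *   pending  - bytes the application has queued
 *   mss      - maximum segment size
 *   nodelay  - whether TCP_NODELAY is set on the socket
 *   min_runt - assumed tunable: smallest window-limited segment we
 *              would send; 0 disables runt sending entirely
 * Returns 0 when the sender should keep waiting.
 */
static size_t
segment_len(size_t space, size_t pending, size_t mss, bool nodelay,
    size_t min_runt)
{
	/* What we would like to send if the window were no obstacle. */
	size_t want = (pending < mss) ? pending : mss;

	if (want == 0)
		return (0);
	if (space >= want)
		return (want);	/* the window is not the limiting factor */

	/* Window-limited: sending now means a runt of `space` bytes. */
	if (nodelay && min_runt > 0 && space >= min_runt)
		return (space);

	/* Behaviour described in the thread: wait for more window. */
	return (0);
}
```

With `min_runt` set to 0 the sketch reproduces the behaviour John observed (no window-limited short segments), so such a knob could default to the existing behaviour and only change things for sockets that opt in.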
Re: Block ACK in Ralink RT2860
> Message: 6
> Date: Thu, 24 Jan 2013 12:23:55 -0500
> From: Ramanujan Seshadri
> To: freebsd-net@freebsd.org
> Subject: Block ACK in Ralink RT2860
> Message-ID:
>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
> I am trying to read the contents of block ack's in a Ralink RT2860 driver.
> Can you please help me to know which function i should be looking into ?

By default, all BA packets are dropped by h/w. Clear the RT2860_DROP_BA flag at
http://fxr.watson.org/fxr/source/dev/ral/rt2860.c#L3559
Then the driver should receive BA packets, and you can read them.

AK
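Clearing a flag like this is an ordinary bit-mask operation on the RX filter value before it is written to the hardware. The following is a minimal user-space sketch of that step only; the macro names and bit positions are placeholders chosen for illustration (the real definitions live in the driver's register header and may differ), and `rx_filter_allow_ba` is a hypothetical helper, not a function in the ral driver.

```c
#include <stdint.h>

/*
 * Placeholder bit positions for illustration only; check the driver's
 * register header for the actual RX filter flag values.
 */
#define DROP_ACK	(UINT32_C(1) << 10)
#define DROP_BA		(UINT32_C(1) << 14)
#define DROP_BAR	(UINT32_C(1) << 15)

/*
 * Return the RX filter value with the "drop Block ACK" bit cleared,
 * leaving every other drop flag untouched.
 */
static uint32_t
rx_filter_allow_ba(uint32_t filter)
{
	return (filter & ~DROP_BA);
}
```

In the driver itself the equivalent change would be made to the value that is written to the RX filter register at the location linked above.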
Re: Cas driver fails to load first time after boot.
On Thu, Jan 24, 2013 at 12:39:44PM -0600, Paul Keusemann wrote:
>
> On 01/24/13 09:09, Marius Strobl wrote:
> > On Tue, Jan 22, 2013 at 02:46:48PM -0600, Paul Keusemann wrote:
> >> Hi,
> >>
> >> I've got a Dell R200 which I'm trying to build into a gateway with a Sun
> >> QGE (501-6738-10). The cas driver fails to load the first time I try to
> >> load it but succeeds the second time. Is this a problem with the card,
> >> the driver, my karma?
> > Wrong phase of the moon, apparently :)
> > The MII setup of these chips is a bit tricky and I'm not sure whether
> > I've hit all code paths during development of the driver. I certainly
> > didn't test with a 501-6738, these have been reported as working before,
> > though. It also doesn't make much sense that attaching the devices
> > succeeds on the second attempt. Could you please use an if_cas.ko built
> > with the attached patch and report the debug output for one of the
> > interfaces in both the working and the non-working case?
>
> I would love to give you output from the working and non-working case
> but apparently the phase of the moon has changed, I can't get it to fail
> now. The messages output from the working case is attached.
>

Thanks but unfortunately this doesn't make any sense either. In general, printf()s cause delays which can be relevant. In the locations I've put them they hardly can make such a difference though. If you haven't already done so, could you please power off the machine before doing the test with the patched module? Is the problem still gone if you revert to the original module?

Marius
Re: 9.1-stable crashes while copying data from a NFS mounted directory
On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer wrote:
> On Thursday 24 January 2013 20:37:09 Konstantin Belousov wrote:
> > On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian Gusenbauer wrote:
> > > On Thursday 24 January 2013 19:07:23 Konstantin Belousov wrote:
> > > > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
> > > > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> > > > > > Hi!
> > > > > >
> > > > > > I'm using 9.1 stable svn revision 245605 and I get the panic below
> > > > > > if I execute the following commands (as single user):
> > > > > >
> > > > > > # swapon -a
> > > > > > # dumpon /dev/ada0s3b
> > > > > > # mount -u /
> > > > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > > > > > # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> > > > > > # cp /mnt/Movies/test/a.m2ts /tmp
> > > > > >
> > > > > > then the system panics almost immediately. I'll attach the stack
> > > > > > trace.
> > > > > >
> > > > > > Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network,
> > > > > > maybe that's the cause for the panic, because the bcopy (see stack
> > > > > > frame #15) fails.
> > > > > >
> > > > > > Any clues?
> > > > >
> > > > > I tried a similar operation with the nfs mount of rsize=32768 and mtu
> > > > > 6144, but the machine runs HEAD and em instead of age. I was unable
> > > > > to reproduce the panic on the copy of the 5GB file from nfs mount.
> > >
> > > Hmmm, I did a quick test. If I do not change the MTU, so just configuring
> > > age0 with
> > >
> > > # ifconfig age0 inet 192.168.2.2 up
> > >
> > > then I can copy all files from the mounted directory without any
> > > problems, too. So it's probably age0 related?
> >
> > From your backtrace and the buffer printout, I see a somewhat strange thing.
> > The buffer data address is 0xff8171418000, while the kernel faulted
> > at the attempt to write at 0xff8171413000, which is lower than
> > the buffer data pointer, at the attempt to bcopy to the buffer.
> >
> > The other data suggests that there was no overflow of the data from the
> > server response. So it might be that mbuf_len(mp) returned a negative number?
> > I am not sure whether that is possible at all.
> >
> > Try this debugging patch, please. You need to add INVARIANTS etc to the
> > kernel config.
> >
> > diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c
> > index efc0786..9a6bda5 100644
> > --- a/sys/fs/nfs/nfs_commonsubs.c
> > +++ b/sys/fs/nfs/nfs_commonsubs.c
> > @@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> >  				}
> >  				mbufcp = NFSMTOD(mp, caddr_t);
> >  				len = mbuf_len(mp);
> > +				KASSERT(len > 0, ("len %d", len));
> >  			}
> >  			xfer = (left > len) ? len : left;
> >  #ifdef notdef
> > @@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> >  			uiop->uio_resid -= xfer;
> >  		}
> >  		if (uiop->uio_iov->iov_len <= siz) {
> > +			KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
> > +			    uiop->uio_iovcnt));
> >  			uiop->uio_iovcnt--;
> >  			uiop->uio_iov++;
> >  		} else {
> >
> > I thought that the server had returned too long a response, but it seems to
> > be not the case from your data. Still, I think the patch below might be
> > due.
> >
> > diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c
> > index be0476a..a89b907 100644
> > --- a/sys/fs/nfsclient/nfs_clrpcops.c
> > +++ b/sys/fs/nfsclient/nfs_clrpcops.c
> > @@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio *uiop, struct ucred *cred,
> >  			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
> >  			eof = fxdr_unsigned(int, *tl);
> >  		}
> > -		NFSM_STRSIZ(retlen, rsize);
> > +		NFSM_STRSIZ(retlen, len);
> >  		error = nfsm_mbufuio(nd, uiop, retlen);
> >  		if (error)
> >  			goto nfsmout;
>
> I applied your patches and now I get a
>
> panic: len -4
> cpuid = 1
> KDB: enter: panic
> Dumping 377 out of 6116 MB:..5%..13%..22%..34%..43%..51%..64%..73%..81%..94%
>

This means that the age driver either produced a corrupted mbuf chain, or filled a wrong, negative value into the mbuf len field. I am quite certain that the issue is in the driver.

I added net@ to Cc:, hopefully you can get help there.

>
> #0 doadump (textdump=0)
> at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> 265 if (textdump && textdump_pending) {
> (kgdb) #0 doadump (textdump=0)
> at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> #1 0x802a7490 in db_dump (dummy=,
> dummy2=, dummy3=,
> dummy4=)
> at /spare/tmp/src-stable9/sys/ddb/db_command.c:538
> #2 0x802a6a7e in db_command (last_cmdp=0x808ca140
Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
On 1/24/13 11:14 AM, John Baldwin wrote:
> On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote:
>> On 24.01.2013 03:31, Sepherosa Ziehau wrote:
>>> On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote:
>>>> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
>>>>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
>>>>>> As I mentioned in an earlier thread, I recently had to debug an issue we were seeing across a link with a high bandwidth-delay product (both high bandwidth and high RTT). Our specific use case was to use a TCP connection to reliably forward a latency-sensitive datagram stream across a WAN connection. We would often see spikes in the latency of individual datagrams. I eventually tracked this down to the connection entering slow start when it would transmit data after being idle. The data stream was quite bursty and would often attempt to transmit a burst of data after being idle for far longer than a retransmit timeout.
>>>>>>
>>>>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking the slow start window size up via a sysctl. On 8.x this no longer worked. The solution I came up with was to add a new socket option to disable idle handling completely. That is, when an idle connection restarts with this new option enabled, it keeps its current congestion window and doesn't enter slow start.
>>>>>>
>>>>>> There are only a few cases where such an option is useful, but if anyone else thinks this might be useful I'd be happy to add the option to FreeBSD.
>>>>>
>>>>> I think what you need is the RFC2861, however, you probably should ignore the "application-limited period" part of RFC2861.
>>>>
>>>> Hummm. It appears btw, that Linux uses RFC 2861, but has a global knob to disable it due to applications having problems. When it is disabled, it doesn't decay the congestion window at all during idle handling. That is, it appears to act the same as if TCP_IGNOREIDLE were enabled.
>>>>
>>>> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
>>>>
>>>> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
>>>>     If enabled, provide RFC 2861 behavior and time out the congestion window after an idle period. An idle period is defined as the current RTO (retransmission timeout). If disabled, the congestion window will not be timed out after an idle period.
>>>>
>>>> Also, in this thread on tcp-m it appears no one on that list realizes that there are any implementations which follow the "SHOULD" in RFC 2581 for idle handling (which is what we do currently):
>>>
>>> Nah, I don't think the idle detection in FreeBSD follows the RFC2581/RFC5681 4.1 (the paragraph before the "SHOULD"). IMHO, that's probably why the author in the following email requestioned about the implementation of "SHOULD" in RFC2581/RFC5681.
>>>
>>>> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
>>>>
>>>> So if we were to implement RFC 2861, the new socket option would be equivalent to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket basis rather than globally.
>>>
>>> Agree, a per-socket option could be more useful than global sysctls under certain situations. However, in addition to the per-socket option, could global sysctl nodes to disable idle_restart/idle_cwv help too?
>>
>> No. This is far too dangerous once it makes it into some tuning guide. The threat of congestion breakdown is real. The Internet, or any packet network, can only survive in the long term if almost all follow the rules and self-constrain to remain fair to the others. What would happen if nobody would respect the traffic lights anymore?
>
> The problem with this argument is Linux has already had this as a tunable option for years and the Internet hasn't melted as a result.
>
>> Besides that bursting into unknown network conditions is very likely to result in burst losses as well. TCP isn't good at recovering from it. In the end you most likely come out ahead if you decay the restartCWND.
>>
>> We have two cases primarily: a) long distance, medium to high RTT, and wildly varying bandwidth (a.k.a. the Internet); b) short distance, low RTT and mostly plenty of bandwidth (a.k.a. Datacenter). The former absolutely definitely requires a decayed restartCWND. The latter less so but even there bursting at 10Gig TSO assisted wirespeed isn't going to end too happy more often than not.
>
> You forgot my case: c) dedicated long distance links with high bandwidth.
>
>> Since this seems to be a burning issue I'll come up with a patch in the next days to add a decaying restartCWND that'll be fair and allow a very quick ramp up if no loss occurs.
>
> I think this could be useful. OTOH, I still think the TCP_IGNOREIDLE option is useful both with and without a decaying restartCWND?

Linux seems to be doing just fine with it for what seems to be a long while. Can we get this committed?

-Alfred
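The three positions in this thread (restart from a small window after idle, decay the window, or ignore idle time altogether) can be compared with a small user-space sketch. It is not the FreeBSD implementation and not the proposed patch: the enum, the function name `cwnd_after_idle`, and the parameter names are hypothetical, and the decay branch is only loosely modelled on the halving-per-RTO idea from RFC 2861 rather than an exact rendering of it.

```c
#include <stdint.h>

/* Hypothetical policy names for the three options discussed above. */
enum idle_policy {
	IDLE_SLOW_START,	/* restart from the restart window */
	IDLE_DECAY,		/* halve cwnd once per RTO spent idle */
	IDLE_IGNORE		/* keep cwnd, as TCP_IGNOREIDLE would */
};

/*
 * Return the congestion window (bytes) to use when a connection sends
 * again after `idle_ms` of silence. `restart_wnd` is the window that a
 * conventional restart would fall back to.
 */
static uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t restart_wnd, uint32_t idle_ms,
    uint32_t rto_ms, enum idle_policy policy)
{
	/* Never grow the window here: the floor is min(restart_wnd, cwnd). */
	uint32_t floor = (restart_wnd < cwnd) ? restart_wnd : cwnd;
	uint32_t n;

	/* Not idle for a full RTO, or idle handling disabled: no change. */
	if (rto_ms == 0 || idle_ms < rto_ms || policy == IDLE_IGNORE)
		return (cwnd);

	if (policy == IDLE_SLOW_START)
		return (floor);

	/* IDLE_DECAY: one halving per elapsed RTO, never below the floor. */
	for (n = idle_ms / rto_ms; n > 0 && cwnd > floor; n--)
		cwnd /= 2;
	return ((cwnd > floor) ? cwnd : floor);
}
```

For a connection with a 64000-byte window that has been idle for two RTOs, the sketch yields the restart window under `IDLE_SLOW_START`, 16000 bytes under `IDLE_DECAY`, and the unchanged 64000 bytes under `IDLE_IGNORE`, which is the latency-versus-burst-risk trade-off the participants are arguing about.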
Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote:
> On 24.01.2013 03:31, Sepherosa Ziehau wrote:
> > On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote:
> >> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
> >>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
> >>>> As I mentioned in an earlier thread, I recently had to debug an issue we were seeing across a link with a high bandwidth-delay product (both high bandwidth and high RTT). Our specific use case was to use a TCP connection to reliably forward a latency-sensitive datagram stream across a WAN connection. We would often see spikes in the latency of individual datagrams. I eventually tracked this down to the connection entering slow start when it would transmit data after being idle. The data stream was quite bursty and would often attempt to transmit a burst of data after being idle for far longer than a retransmit timeout.
> >>>>
> >>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking the slow start window size up via a sysctl. On 8.x this no longer worked. The solution I came up with was to add a new socket option to disable idle handling completely. That is, when an idle connection restarts with this new option enabled, it keeps its current congestion window and doesn't enter slow start.
> >>>>
> >>>> There are only a few cases where such an option is useful, but if anyone else thinks this might be useful I'd be happy to add the option to FreeBSD.
> >>>
> >>> I think what you need is the RFC2861, however, you probably should
> >>> ignore the "application-limited period" part of RFC2861.
> >>
> >> Hummm. It appears btw, that Linux uses RFC 2861, but has a global knob to
> >> disable it due to applications having problems. When it is disabled,
> >> it doesn't decay the congestion window at all during idle handling. That is,
> >> it appears to act the same as if TCP_IGNOREIDLE were enabled.
> >>
> >> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
> >>
> >> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
> >>     If enabled, provide RFC 2861 behavior and time out the congestion
> >>     window after an idle period. An idle period is defined as the current
> >>     RTO (retransmission timeout). If disabled, the congestion window will
> >>     not be timed out after an idle period.
> >>
> >> Also, in this thread on tcp-m it appears no one on that list realizes that
> >> there are any implementations which follow the "SHOULD" in RFC 2581 for idle
> >> handling (which is what we do currently):
> >
> > Nah, I don't think the idle detection in FreeBSD follows the
> > RFC2581/RFC5681 4.1 (the paragraph before the "SHOULD"). IMHO, that's
> > probably why the author in the following email requestioned about the
> > implementation of "SHOULD" in RFC2581/RFC5681.
> >
> >>
> >> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
> >>
> >> So if we were to implement RFC 2861, the new socket option would be equivalent
> >> to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
> >> basis rather than globally.
> >
> > Agree, a per-socket option could be more useful than global sysctls under
> > certain situations. However, in addition to the per-socket option,
> > could global sysctl nodes to disable idle_restart/idle_cwv help too?
>
> No. This is far too dangerous once it makes it into some tuning guide.
> The threat of congestion breakdown is real. The Internet, or any packet
> network, can only survive in the long term if almost all follow the rules
> and self-constrain to remain fair to the others. What would happen if
> nobody would respect the traffic lights anymore?

The problem with this argument is Linux has already had this as a tunable option for years and the Internet hasn't melted as a result.

> Besides that bursting into unknown network conditions is very likely to
> result in burst losses as well. TCP isn't good at recovering from it.
> In the end you most likely come out ahead if you decay the restartCWND.
>
> We have two cases primarily: a) long distance, medium to high RTT, and
> wildly varying bandwidth (a.k.a. the Internet); b) short distance, low
> RTT and mostly plenty of bandwidth (a.k.a. Datacenter). The former
> absolutely definitely requires a decayed restartCWND. The latter less
> so but even there bursting at 10Gig TSO assisted wirespeed isn't going
> to end too happy more often than not.

You forgot my case: c) dedicated long distance links with high bandwidth.

> Since this seems to be a burning issue I'll come up with a patch in the
> next days to add a decaying r
Re: how to completely makes an interface down?
h bagade wrote this message on Thu, Jan 24, 2013 at 16:59 +0330:
> I'm searching for a method or configuration that makes the LED go off when I
> bring the interface down. Currently the LED still remains on when I shut down
> the interface! Is there any way to do this?

Not all ethernet drivers disable the PHY when you down the interface... You can try to use:
ifconfig media none

to shut down the PHY, but the em driver on 9.1 doesn't have it, while re (7.2-R and -current) and msk (-current) seem to have it...

Also, why do you want the LED to go off? Remember, the LED is just an indication of whether a link is established, not of what will happen to the packets that are received...

--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
Re: Cas driver fails to load first time after boot.
On 01/24/13 09:09, Marius Strobl wrote: On Tue, Jan 22, 2013 at 02:46:48PM -0600, Paul Keusemann wrote: Hi, I've got a Dell R200 which I'm trying to build into a gateway with a Sun QGE (501-6738-10). The cas driver fails to load the first time I try to load it but succeeds the second time. Is this a problem with the card, the driver, my karma? Wrong phase of the moon, apparently :) The MII setup of these chips is a bit tricky and I'm not sure whether I've hit all code paths during development of the driver. I certainly didn't test with a 501-6738, these have been reported as working before, though. It also doesn't make much sense that attaching the devices succeeds on the second attempt. Could you please use a if_cas.ko built with the attached patch and report the debug output for one of the interfaces in both the working and the non-working case? I would love to give you output from the working and non-working case but apparently the phase of the moon has changed, I can't get it to fail now. The messages output from the working case is attached. Let me know if there's anything else I can do. Marius -- Paul Keusemannpkeu...@visi.com 4266 Joppa Court (952) 894-7805 Savage, MN 55378 Jan 24 11:00:01 lucid newsyslog[2087]: logfile turned over due to size>100K Jan 24 11:47:39 lucid shutdown: reboot by toor: Jan 24 11:47:41 lucid syslogd: exiting on signal 15 Jan 24 11:48:51 lucid syslogd: kernel boot file is /boot/kernel/kernel Jan 24 11:48:51 lucid kernel: Copyright (c) 1992-2012 The FreeBSD Project. Jan 24 11:48:51 lucid kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 Jan 24 11:48:51 lucid kernel: The Regents of the University of California. All rights reserved. Jan 24 11:48:51 lucid kernel: FreeBSD is a registered trademark of The FreeBSD Foundation. 
Jan 24 11:48:51 lucid kernel: FreeBSD 8.3-RELEASE #0: Thu Jan 24 11:15:13 CST 2013
Jan 24 11:48:51 lucid kernel: toor@lucid:/usr/obj/usr/src/sys/LUCID amd64
Jan 24 11:48:51 lucid kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Jan 24 11:48:51 lucid kernel: CPU: Intel(R) Xeon(R) CPU X3210 @ 2.13GHz (2133.42-MHz K8-class CPU)
Jan 24 11:48:51 lucid kernel: Origin = "GenuineIntel" Id = 0x6fb Family = 6 Model = f Stepping = 11
Jan 24 11:48:51 lucid kernel: Features=0xbfebfbff
Jan 24 11:48:51 lucid kernel: Features2=0xe3bd
Jan 24 11:48:51 lucid kernel: AMD Features=0x20100800
Jan 24 11:48:51 lucid kernel: AMD Features2=0x1
Jan 24 11:48:51 lucid kernel: TSC: P-state invariant
Jan 24 11:48:51 lucid kernel: real memory = 4294967296 (4096 MB)
Jan 24 11:48:51 lucid kernel: avail memory = 4099231744 (3909 MB)
Jan 24 11:48:51 lucid kernel: ACPI APIC Table:
Jan 24 11:48:51 lucid kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Jan 24 11:48:51 lucid kernel: FreeBSD/SMP: 1 package(s) x 4 core(s)
Jan 24 11:48:51 lucid kernel: cpu0 (BSP): APIC ID: 0
Jan 24 11:48:51 lucid kernel: cpu1 (AP): APIC ID: 1
Jan 24 11:48:51 lucid kernel: cpu2 (AP): APIC ID: 2
Jan 24 11:48:51 lucid kernel: cpu3 (AP): APIC ID: 3
Jan 24 11:48:51 lucid kernel: ioapic0: Changing APIC ID to 4
Jan 24 11:48:51 lucid kernel: ioapic1: Changing APIC ID to 5
Jan 24 11:48:51 lucid kernel: ioapic0 irqs 0-23 on motherboard
Jan 24 11:48:51 lucid kernel: ioapic1 irqs 32-55 on motherboard
Jan 24 11:48:51 lucid kernel: kbd1 at kbdmux0
Jan 24 11:48:51 lucid kernel: acpi0: on motherboard
Jan 24 11:48:51 lucid kernel: acpi0: [ITHREAD]
Jan 24 11:48:51 lucid kernel: acpi0: Power Button (fixed)
Jan 24 11:48:51 lucid kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Jan 24 11:48:51 lucid kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
Jan 24 11:48:51 lucid kernel: cpu0: on acpi0
Jan 24 11:48:51 lucid kernel: cpu1: on acpi0
Jan 24 11:48:51 lucid kernel: cpu2: on acpi0
Jan 24 11:48:51 lucid kernel: cpu3: on acpi0
Jan 24 11:48:51 lucid kernel: pcib0: port 0xcf8-0xcff on acpi0
Jan 24 11:48:51 lucid kernel: pci0: on pcib0
Jan 24 11:48:51 lucid kernel: pcib1: irq 16 at device 1.0 on pci0
Jan 24 11:48:51 lucid kernel: pci1: on pcib1
Jan 24 11:48:51 lucid kernel: pcib2: irq 16 at device 28.0 on pci0
Jan 24 11:48:51 lucid kernel: pci2: on pcib2
Jan 24 11:48:51 lucid kernel: pcib3: at device 0.0 on pci2
Jan 24 11:48:51 lucid kernel: pci3: on pcib3
Jan 24 11:48:51 lucid kernel: pcib4: at device 2.0 on pci3
Jan 24 11:48:51 lucid kernel: pci4: on pcib4
Jan 24 11:48:51 lucid kernel: pci4: at device 0.0 (no driver attached)
Jan 24 11:48:51 lucid kernel: pci4: at device 1.0 (no driver attached)
Jan 24 11:48:51 lucid kernel: pci4: at device 2.0 (no driver attached)
Jan 24 11:48:51 lucid kernel: pci4: at device 3.0 (no driver attached)
Jan 24 11:48:51 lucid kernel: pcib5: irq 16 at device 28.4 on pci0
Jan 24 11:48:51 lucid kernel: pci5: on pcib5
Jan 24 11:48:51 lucid kernel: bge0: mem 0xd
Re: how to completely makes an interface down?
On Thu, Jan 24, 2013 at 5:29 AM, h bagade wrote:
> Hi all,
>
> I'm searching for a method or configuration so that when I bring the
> interface down, the LED goes off. Currently the LED remains on when I
> shut down the interface! Is there any way to do this?

Depends on the interface, but on many devices the only way to turn off the LED is to unplug the cable or turn off the device at the other end. The LED is lit by the power on the receive pair, and the LED will remain on even if the system is turned off and the power cord pulled, as the remote end is really lighting the LED.

--
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com
Carp configuration errors
Hello,

After upgrading to 9.1 it seems like carp doesn't pay attention to advskew anymore. I have two boxes each setup with carp0 and carp1; the intention is that in regular operation proxy1 is master for carp0 and proxy2 for carp1. However, whichever box comes up second is BACKUP for both.

To make IPv6 CARP work I am using the patch from http://www.freebsd.org/cgi/query-pr.cgi?pr=127050

I don't know if it is related, but when booting I see a lot of messages like

ifa_del_loopback_route: deletion failed
ifa_add_loopback_route: insertion failed
ifa_del_loopback_route: deletion failed
ifa_add_loopback_route: insertion failed
ifa_del_loopback_route: deletion failed
ifa_add_loopback_route: insertion failed
ifa_del_loopback_route: deletion failed
ifa_add_loopback_route: insertion failed

in dmesg.

I am including my rc.conf files for each host below. Any hints or suggestions will be appreciated.

Ask

# proxy1
sshd_enable="YES"
ntpd_enable="YES"
ntpd_flags="-p /var/run/ntpd.pid -f /etc/ntp/ntpd.drift -g"
hostname="proxy1.dev"
ifconfig_vr0="inet 10.0.100.31/24"
ifconfig_vr2="inet 207.171.7.31/24"
ifconfig_vr2_ipv6="inet6 2607:f238:3::1:1/64"
ifconfig_carp0="vhid 40 advskew 50 pass y4t8gwtgjkq4g 207.171.7.40"
ipv4_addrs_carp0="207.171.7.41-49/24"
ifconfig_carp0_ipv6="inet6 2607:f238:3::1:41/64"
ifconfig_carp0_alias0="inet6 2607:f238:3::1:40/64"
ifconfig_carp0_alias1="inet6 2607:f238:3::1:42/64"
ifconfig_carp0_alias2="inet6 2607:f238:3::1:43/64"
ifconfig_carp0_alias3="inet6 2607:f238:3::1:44/64"
ifconfig_carp0_alias4="inet6 2607:f238:3::1:45/64"
ifconfig_carp0_alias5="inet6 2607:f238:3::1:46/64"
ifconfig_carp0_alias6="inet6 2607:f238:3::1:47/64"
ifconfig_carp0_alias7="inet6 2607:f238:3::1:48/64"
ifconfig_carp0_alias8="inet6 2607:f238:3::1:49/64"
ifconfig_carp1="vhid 50 advskew 250 pass hsjrthvruwybwt 207.171.7.50"
ipv4_addrs_carp1="207.171.7.51-59/24"
ifconfig_carp1_ipv6="inet6 2607:f238:3::1:51/64"
ifconfig_carp1_alias0="inet6 2607:f238:3::1:50/64"
ifconfig_carp1_alias1="inet6 2607:f238:3::1:52/64"
ifconfig_carp1_alias2="inet6 2607:f238:3::1:53/64"
ifconfig_carp1_alias3="inet6 2607:f238:3::1:54/64"
ifconfig_carp1_alias4="inet6 2607:f238:3::1:55/64"
ifconfig_carp1_alias5="inet6 2607:f238:3::1:56/64"
ifconfig_carp1_alias6="inet6 2607:f238:3::1:57/64"
ifconfig_carp1_alias7="inet6 2607:f238:3::1:58/64"
ifconfig_carp1_alias8="inet6 2607:f238:3::1:59/64"
ifconfig_vr1="down"
defaultrouter="207.171.7.1"
ipv6_defaultrouter="2607:F238:3::1"
ifconfig_lo0_alias0="inet 127.0.0.2"
ifconfig_lo0_alias1="inet 127.0.0.3"
cloned_interfaces="carp0 carp1"
static_routes="${static_routes} vpn"
route_vpn="-net 10.0.0.0/16 10.0.100.1"
pf_enable="NO"
pflog_enable="NO"
haproxy_enable="YES"
haproxy_config="/etc/haproxy.conf"

###

# proxy2
sshd_enable="YES"
ntpd_enable="YES"
ntpd_flags="-p /var/run/ntpd.pid -f /etc/ntp/ntpd.drift -g"
hostname="proxy2.dev"
ifconfig_vr0="inet 10.0.100.32/24"
ifconfig_vr2="inet 207.171.7.32/24"
ifconfig_vr2_ipv6="inet6 2607:f238:3::1:2/64"
ifconfig_carp0="vhid 40 advskew 150 pass y4t8gwtgjkq4g 207.171.7.40"
ipv4_addrs_carp0="207.171.7.41-49/24"
ifconfig_carp0_ipv6="inet6 2607:f238:3::1:41/64"
ifconfig_carp0_alias0="inet6 2607:f238:3::1:40/64"
ifconfig_carp0_alias1="inet6 2607:f238:3::1:42/64"
ifconfig_carp0_alias2="inet6 2607:f238:3::1:43/64"
ifconfig_carp0_alias3="inet6 2607:f238:3::1:44/64"
ifconfig_carp0_alias4="inet6 2607:f238:3::1:45/64"
ifconfig_carp0_alias5="inet6 2607:f238:3::1:46/64"
ifconfig_carp0_alias6="inet6 2607:f238:3::1:47/64"
ifconfig_carp0_alias7="inet6 2607:f238:3::1:48/64"
ifconfig_carp0_alias8="inet6 2607:f238:3::1:49/64"
ifconfig_carp1="vhid 50 advskew 100 pass hsjrthvruwybwt 207.171.7.50"
ipv4_addrs_carp1="207.171.7.51-59/24"
ifconfig_carp1_ipv6="inet6 2607:f238:3::1:51/64"
ifconfig_carp1_alias0="inet6 2607:f238:3::1:50/64"
ifconfig_carp1_alias1="inet6 2607:f238:3::1:52/64"
ifconfig_carp1_alias2="inet6 2607:f238:3::1:53/64"
ifconfig_carp1_alias3="inet6 2607:f238:3::1:54/64"
ifconfig_carp1_alias4="inet6 2607:f238:3::1:55/64"
ifconfig_carp1_alias5="inet6 2607:f238:3::1:56/64"
ifconfig_carp1_alias6="inet6 2607:f238:3::1:57/64"
ifconfig_carp1_alias7="inet6 2607:f238:3::1:58/64"
ifconfig_carp1_alias8="inet6 2607:f238:3::1:59/64"
ifconfig_vr1="down"
defaultrouter="207.171.7.1"
ipv6_defaultrouter="2607:F238:3::1"
ifconfig_lo0_alias0="inet 127.0.0.2"
ifconfig_lo0_alias1="inet 127.0.0.3"
cloned_interfaces="carp0 carp1"
static_routes="${static_routes} vpn"
route_vpn="-net 10.0.0.0/16 10.0.100.1"
pf_enable="NO"
pflog_enable="NO"
haproxy_enable="YES"
haproxy_config="/etc/haproxy.conf"
svscan_enable="NO"
svscan_servicedir="/etc/svscan"
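For comparing the advskew values in the two rc.conf files above: per carp(4), a host sends advertisements every advbase + advskew/256 seconds, and the host that advertises more frequently is preferred as master. The sketch below only does that arithmetic. It assumes the default advbase of 1 (neither rc.conf sets it), and the function names are made up for illustration. It does not model net.inet.carp.preempt, which is off by default and which, as far as I know, has to be enabled before a lower-skew host that comes up second will take over from an already advertising master.

```c
/*
 * Hypothetical helper: CARP advertisement interval in microseconds,
 * i.e. advbase seconds plus advskew/256 of a second.
 */
long carp_adv_interval_us(int advbase, int advskew)
{
	return (long)advbase * 1000000L + (long)advskew * 1000000L / 256;
}

/*
 * Returns 1 if host A advertises more frequently than host B,
 * assuming both use the same advbase.
 */
int carp_a_preferred(int advbase, int advskew_a, int advskew_b)
{
	return carp_adv_interval_us(advbase, advskew_a) <
	    carp_adv_interval_us(advbase, advskew_b);
}
```

With the posted values, proxy1 (advskew 50) is preferred over proxy2 (150) for vhid 40, and proxy2 (100) over proxy1 (250) for vhid 50, which matches the stated intention.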
Block ACK in Ralink RT2860
Hi all,

I am trying to read the contents of block ACKs in a Ralink RT2860 driver. Can you please tell me which function I should be looking into?

Thanks,
ram
Re: Cas driver fails to load first time after boot.
On Tue, Jan 22, 2013 at 02:46:48PM -0600, Paul Keusemann wrote:
> Hi,
>
> I've got a Dell R200 which I'm trying to build into a gateway with a Sun
> QGE (501-6738-10). The cas driver fails to load the first time I try to
> load it but succeeds the second time. Is this a problem with the card,
> the driver, my karma?

Wrong phase of the moon, apparently :)
The MII setup of these chips is a bit tricky and I'm not sure whether I've hit all code paths during development of the driver. I certainly didn't test with a 501-6738, these have been reported as working before, though. It also doesn't make much sense that attaching the devices succeeds on the second attempt.
Could you please use an if_cas.ko built with the attached patch and report the debug output for one of the interfaces in both the working and the non-working case?

Marius

Index: if_cas.c
===
--- if_cas.c (revision 245046)
+++ if_cas.c (working copy)
@@ -332,6 +332,8 @@ cas_attach(struct cas_softc *sc)
 		 */
 		error = ENXIO;
 		v = CAS_READ_4(sc, CAS_MIF_CONF);
+		device_printf(sc->sc_dev, "MIF=0x%x PCFG=0x%x\n", v,
+		    CAS_READ_4(sc, CAS_SATURN_PCFG));
 		if ((v & CAS_MIF_CONF_MDI1) != 0) {
 			v |= CAS_MIF_CONF_PHY_SELECT;
 			CAS_WRITE_4(sc, CAS_MIF_CONF, v);
@@ -347,6 +349,8 @@ cas_attach(struct cas_softc *sc)
 			error = mii_attach(sc->sc_dev, &sc->sc_miibus, ifp,
 			    cas_mediachange, cas_mediastatus, BMSR_DEFCAPMASK,
 			    MII_PHY_ANY, MII_OFFSET_ANY, MIIF_DOPAUSE);
+			if (error == 0)
+				device_printf(sc->sc_dev, "external PHY\n");
 		}
 		/*
 		 * Fall back on an internal PHY if no external PHY was found.
@@ -367,6 +371,8 @@ cas_attach(struct cas_softc *sc)
 			error = mii_attach(sc->sc_dev, &sc->sc_miibus, ifp,
 			    cas_mediachange, cas_mediastatus, BMSR_DEFCAPMASK,
 			    MII_PHY_ANY, MII_OFFSET_ANY, MIIF_DOPAUSE);
+			if (error == 0)
+				device_printf(sc->sc_dev, "internal PHY\n");
 		}
 	} else {
 		/*
Re: Some questions about the new TCP congestion control code
On 24.01.2013 14:28, Lawrence Stewart wrote:
> On 01/16/13 06:27, John Baldwin wrote:
>> One other thing I noticed which may or may not be odd during this is that
>> if you have a connection with TCP_NODELAY enabled and you fill your cwnd
>> and then you get an ACK back for an earlier small segment (less than MSS),
>> TCP will not send out a "short" segment for the amount of window space
>> released. Instead, it will wait until a full MSS of space is available
>> before sending a packet. I'm not sure if that is the correct behavior with
>> TCP_NODELAY or if we should send "short" segments in that case.
>
> We try fairly hard not to send runt segments irrespective of NODELAY, but
> I would be happy to see that change. I'm not aware of any "correct
> behaviour" we have to adhere to - I think it would be perfectly
> reasonable to have a sysctl set the lowest number of bytes we'd be
> willing to send a runt segment for and then key off TCP_NODELAY as to
> whether we try hard to send an MSS worth or send as soon as we have the
> min number of bytes worth of window available.

This is classic silly window syndrome prevention applied to the CWND. Sending a small segment when the window opens just a bit isn't going to help much and mostly clogs the network.

This is actually a side effect of ABC (appropriate byte counting), where it is not the ACKs that are counted but the bytes ACKed. Disabling ABC will solve this problem.

--
Andre
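To make the ABC point concrete: with Appropriate Byte Counting (RFC 3465) the slow-start increase per ACK is min(bytes newly acknowledged, L * SMSS), whereas classic slow start adds up to one full SMSS for every ACK of new data. An ACK that covers only a small segment therefore opens cwnd by just that many bytes. This is a minimal user-space sketch of the two rules, not the FreeBSD code; the function names are invented for illustration and limit_l is the RFC's L parameter.

```c
/* Classic slow start: each ACK of new data grows cwnd by one full segment. */
unsigned long ss_increase_per_ack(unsigned long smss)
{
	return smss;
}

/*
 * RFC 3465 (ABC) slow start: grow cwnd by the number of bytes newly
 * acknowledged, capped at limit_l * smss.
 */
unsigned long ss_increase_abc(unsigned long bytes_acked, unsigned long smss,
    unsigned long limit_l)
{
	unsigned long cap = limit_l * smss;

	return bytes_acked < cap ? bytes_acked : cap;
}
```

For a 100-byte segment and an SMSS of 1460, the ABC rule releases 100 bytes of window where per-ACK counting would release 1460, which is the situation John describes.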
how to completely makes an interface down?
Hi all,

I'm searching for a method or configuration so that when I bring the interface down, the LED goes off. Currently the LED remains on when I shut down the interface! Is there any way to do this?
Re: Some questions about the new TCP congestion control code
On 01/16/13 06:27, John Baldwin wrote:
> On Tuesday, January 15, 2013 3:29:51 am Lawrence Stewart wrote:
>> Hi John,
>>
>> On 01/15/13 08:04, John Baldwin wrote:
>>> I was looking at TCP congestion control at work recently and noticed a few
>>
>> Poor you ;)
>>
>>> "odd" things in the current code. First, there is this chunk of code in
>>> cc_ack_received() in tcp_input.c:
>>>
>>> static void inline
>>> cc_ack_received(struct tcpcb *tp, struct tcphdr *th, uint16_t type)
>>> {
>>> 	INP_WLOCK_ASSERT(tp->t_inpcb);
>>>
>>> 	tp->ccv->bytes_this_ack = BYTES_THIS_ACK(tp, th);
>>> 	if (tp->snd_cwnd == min(tp->snd_cwnd, tp->snd_wnd))
>>> 		tp->ccv->flags |= CCF_CWND_LIMITED;
>>> 	else
>>> 		tp->ccv->flags &= ~CCF_CWND_LIMITED;
>>>
>>>
>>> Due to hysterical raisins, snd_cwnd and snd_wnd are u_long values, not
>>> integers, so the call to min() results in truncation on 64-bit hosts.
>>
>> Good catch, but I don't think it matters in practice as neither snd_cwnd
>> or snd_wnd will grow past the 32-bit boundary.
>
> I have a psychotic case using cc_cubic where it seems to grow without bound,
> though that is a bug in and of itself (and this change did not fix that
> issue). I ended up not using cc_cubic (more below) and haven't been able
> to track down the root cause of the delay. I can probably provide a test case
> to reproduce this if you are interested.

hmm I'd certainly be interested in hearing more about this issue with cubic. If you think a test case is easy to come up with, please shoot it through to me when you have the chance.

>>> It should probably be ulmin() instead. However, this line seems to be a
>>> really obfuscated way to just write:
>>>
>>> 	if (tp->snd_cwnd <= tp->snd_wnd)
>>
>> You are correct, though I'd argue the meaning of the existing code as
>> written is clearer compared to your suggested change.
>>
>>> If that is correct, I would vote for changing this to use the much simpler
>>> logic.
>>
>> Agreed. While I find the existing code slightly clearer in meaning, it's
>> not significant enough to warrant keeping it as is when your suggested
>> change is simpler, fixes a bug and achieves the same thing. Happy for
>> you to change it or I can do it if you prefer.
>
> I'll leave that to you, thanks.

Committed as r245783.

>>> Secondly, in the particular case I was investigating at work (restart of an
>>> idle connection), the newreno congestion control code in 8.x and later
>>> uses a different algorithm than in 7. Specifically, in 7 TCP would reuse
>>> the same logic used for an initial cwnd (honoring ss_fltsz). In 8 this no
>>> longer happens (instead, 2 is hardcoded). A guess at a possible fix might
>>> look something like this:
>>>
>>> Index: cc_newreno.c
>>> ===
>>> --- cc_newreno.c(revision 243660)
>>> +++ cc_newreno.c(working copy)
>>> @@ -169,8 +169,21 @@ newreno_after_idle(struct cc_var *ccv)
>>>  	if (V_tcp_do_rfc3390)
>>>  		rw = min(4 * CCV(ccv, t_maxseg),
>>>  		    max(2 * CCV(ccv, t_maxseg), 4380));
>>> +#if 1
>>>  	else
>>>  		rw = CCV(ccv, t_maxseg) * 2;
>>> +#else
>>> +	/* XXX: This is missing a lot of stuff that used to be in 7. */
>>> +#ifdef INET6
>>> +	else if ((isipv6 ? in6_localaddr(&CCV(ccv, t_inpcb->in6p_faddr)) :
>>> +	    in_localaddr(CCV(ccv, t_inpcb->inp_faddr))))
>>> +#else
>>> +	else if (in_localaddr(CCV(ccv, t_inpcb->inp_faddr)))
>>> +#endif
>>> +		rw = V_ss_fltsz_local * CCV(ccv, t_maxseg);
>>> +	else
>>> +		rw = V_ss_fltsz * CCV(ccv, t_maxseg);
>>> +#endif
>>>
>>>  	CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
>>>  }
>>>
>>> (But using the #else clause instead of the current #if 1 code). Was this
>>> change in 8 intentional?
>>
>> It was. Unlike connection initialisation which still honours ss_fltsz in
>> cc_conn_init(), restarting an idle connection based on ss_fltsz seemed
>> particularly dubious and as such was omitted from the refactored code.
>>
>> The ultimate goal was to remove the ss_fltsz hack completely and
>> implement a smarter mechanism, but that hasn't quite happened yet. The
>> removal of ss_fltsz from 10.x without providing a replacement mechanism
>> is not ideal and should probably be addressed.
>>
>> I'm guessing you're not using rfc3390 because you want to override the
>> initial window based on specific local knowledge of the path between
>> sender and receiver?
>
> Correct, in 7.x we had cranked ss_fltsz up to a really high number to prevent
> the congestion window from collapsing when the connection was idle. We have
> a bit of a unique workload in that we are using TCP to reliably forward a
> latency-sensitive datagram stream across a WAN connection with high bandwidth
> and high RTT. Most of congestion control seems tuned to bulk transfers rather
> than this sort of use case. The solution we have settled on here is to add a
>
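The min() truncation discussed in this message can be reproduced in user space. The sketch below uses fixed-width types to model a 64-bit host: uint64_t stands in for the u_long window fields, and the hypothetical min_uint() stands in for the kernel's min(), which takes u_int arguments. It is an illustration of the arithmetic only, not the tcp_input.c code.

```c
#include <stdint.h>

/*
 * Stand-in for the kernel's min(): it works on 32-bit unsigned values,
 * so wider arguments are truncated before the comparison.
 */
uint32_t min_uint(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Original test: 64-bit snd_cwnd compared against a truncated minimum. */
int cwnd_limited_old(uint64_t snd_cwnd, uint64_t snd_wnd)
{
	return snd_cwnd == min_uint((uint32_t)snd_cwnd, (uint32_t)snd_wnd);
}

/* Simplified test suggested in the thread. */
int cwnd_limited_new(uint64_t snd_cwnd, uint64_t snd_wnd)
{
	return snd_cwnd <= snd_wnd;
}
```

The two tests agree while both windows fit in 32 bits and diverge once either value exceeds that, for example with snd_cwnd = 2^32 + 10 and snd_wnd = 2^33.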
Re: [PATCH] Don't imply TCP and UDP socket options are bitmasks
On 01/23/13 07:28, John Baldwin wrote:
> On Tuesday, January 22, 2013 3:57:23 am Lawrence Stewart wrote:
>> On 01/16/13 06:16, John Baldwin wrote:
>>> On Tuesday, January 15, 2013 3:49:33 am Lawrence Stewart wrote:
>>>> On 01/15/13 07:50, John Baldwin wrote:
>>>>> The constants used for TCP and UDP socket options (TCP_NODELAY, etc.) are
>>>>> currently defined as hex values that are individual bits. However, socket
>>>>> options are never masked together, they are used as a simple enumeration
>>>>> of discrete values. Using a bitmask forces us to run out of bits and
>>>>> makes it harder for vendors to try to use a high range of values for
>>>>> local custom options (hoping that they never conflict with a new option
>>>>> value added in stock FreeBSD).
>>>>
>>>> Yup. Should we be explicitly #defining the boundary between "bits
>>>> reserved for FreeBSD" and "bits for private vendor use"?
>>>
>>> Oh, we could if you wanted. I'm using 0x1000 locally for both TCP and UDP,
>>> but those are completely arbitrary values. Saner ones might be 0x800 if
>>> we want to do that explicitly. We could perhaps just say that is true for
>>> all socket option levels (that is, just define one SO_VENDOR constant or
>>> some such but say it applies to all levels)?
>>
>> A single SO_VENDOR applied to all levels sounds good to me.
>
> Ok, how about this for wording:
>
> Index: sys/socket.h
> ===
> --- socket.h (revision 245742)
> +++ socket.h (working copy)
> @@ -143,6 +143,15 @@ typedef __uid_t uid_t;
>  #endif
>
>  /*
> + * Space reserved for new socket options added by third-party vendors.
> + * This range applies to all socket option levels. New socket options
> + * in FreeBSD should always use an option value less than SO_VENDOR.
> + */
> +#if __BSD_VISIBLE
> +#define SO_VENDOR 0x8000
> +#endif
> +
> +/*
>   * Structure used for manipulating linger option.
>   */
>  struct linger {

Two thumbs up from me.

We might also want to

#define TCP_VENDOR SO_VENDOR /* FreeBSD TCP socket options must be numerically less than this. */

and so on in each file that defines option levels to provide some hint to people that SO_VENDOR exists? Maybe we don't need the define and just need to put the one line comment at the end of each set of options in each file where a particular level's options are specified.

Cheers,
Lawrence
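A vendor would then number private options upward from SO_VENDOR, and any code that needs to tell stock options from vendor ones can compare against that boundary. In the sketch below only SO_VENDOR and its 0x8000 value come from the proposed patch; TCP_ACME_FASTPATH and sockopt_is_vendor() are hypothetical names used purely for illustration.

```c
/* Boundary value taken from the proposed sys/socket.h patch. */
#ifndef SO_VENDOR
#define SO_VENDOR 0x8000
#endif

/* Hypothetical private option, numbered from the start of the vendor range. */
#define TCP_ACME_FASTPATH (SO_VENDOR + 1)

/* Returns 1 if optname lies in the range reserved for third-party vendors. */
int sockopt_is_vendor(int optname)
{
	return optname >= SO_VENDOR;
}
```

Because the options are a plain enumeration rather than a bitmask, the check is a single comparison and the vendor range never collides with values below 0x8000.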
Re: [PATCH] Add a new TCP_IGNOREIDLE socket option
On 24.01.2013 03:31, Sepherosa Ziehau wrote:
> On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin wrote:
>> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
>>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin wrote:
>>>> As I mentioned in an earlier thread, I recently had to debug an issue we
>>>> were seeing across a link with a high bandwidth-delay product (both high
>>>> bandwidth and high RTT). Our specific use case was to use a TCP
>>>> connection to reliably forward a latency-sensitive datagram stream across
>>>> a WAN connection. We would often see spikes in the latency of individual
>>>> datagrams. I eventually tracked this down to the connection entering slow
>>>> start when it would transmit data after being idle. The data stream was
>>>> quite bursty and would often attempt to transmit a burst of data after
>>>> being idle for far longer than a retransmit timeout.
>>>>
>>>> In 7.x we had worked around this in the past by disabling RFC 3390 and
>>>> jacking the slow start window size up via a sysctl. On 8.x this no longer
>>>> worked. The solution I came up with was to add a new socket option to
>>>> disable idle handling completely. That is, when an idle connection
>>>> restarts with this new option enabled, it keeps its current congestion
>>>> window and doesn't enter slow start.
>>>>
>>>> There are only a few cases where such an option is useful, but if anyone
>>>> else thinks this might be useful I'd be happy to add the option to
>>>> FreeBSD.
>>>
>>> I think what you need is the RFC2861, however, you probably should
>>> ignore the "application-limited period" part of RFC2861.
>>
>> Hummm. It appears btw, that Linux uses RFC 2861, but has a global knob to
>> disable it due to applications having problems. When it is disabled,
>> it doesn't decay the congestion window at all during idle handling. That is,
>> it appears to act the same as if TCP_IGNOREIDLE were enabled.
>>
>> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
>>
>> tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
>> If enabled, provide RFC 2861 behavior and time out the congestion
>> window after an idle period. An idle period is defined as the
>> current RTO (retransmission timeout). If disabled, the
>> congestion window will not be timed out after an idle period.
>>
>> Also, in this thread on tcp-m it appears no one on that list realizes that
>> there are any implementations which follow the "SHOULD" in RFC 2581 for idle
>> handling (which is what we do currently):
>
> Nah, I don't think the idle detection in FreeBSD follows the
> RFC2581/RFC5681 4.1 (the paragraph before the "SHOULD"). IMHO, that's
> probably why the author in the following email requestioned about the
> implementation of "SHOULD" in RFC2581/RFC5681.
>
>> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
>>
>> So if we were to implement RFC 2861, the new socket option would be
>> equivalent to setting Linux's 'tcp_slow_start_after_idle' to false, but on
>> a per-socket basis rather than globally.
>
> Agree, per-socket option could be useful than global sysctls under
> certain situation. However, in addition to the per-socket option,
> could global sysctl nodes to disable idle_restart/idle_cwv help too?

No. This is far too dangerous once it makes it into some tuning guide. The threat of congestion breakdown is real. The Internet, or any packet network, can only survive in the long term if almost all follow the rules and self-constrain to remain fair to the others. What would happen if nobody respected the traffic lights anymore?

Besides that, bursting into unknown network conditions is very likely to result in burst losses as well. TCP isn't good at recovering from it. In the end you most likely come out ahead if you decay the restartCWND.

We have two cases primarily: a) long distance, medium to high RTT, and wildly varying bandwidth (a.k.a. the Internet); b) short distance, low RTT and mostly plenty of bandwidth (a.k.a. Datacenter). The former definitely requires a decayed restartCWND. The latter less so, but even there bursting at 10Gig TSO assisted wirespeed isn't going to end well more often than not.

Since this seems to be a burning issue I'll come up with a patch in the next few days to add a decaying restartCWND that'll be fair and allow a very quick ramp up if no loss occurs.

--
Andre
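The behaviour under discussion can be sketched in user space. The restart window formula is the RFC 3390 branch of newreno_after_idle() quoted earlier in this archive, min(4 * MSS, max(2 * MSS, 4380)), and the idle test (idle for at least one RTO) follows the Linux man page text quoted above. The ignore_idle flag stands in for the proposed TCP_IGNOREIDLE option. All names and the structure are illustrative assumptions, not the kernel implementation, and the sketch does not include the decaying restart window Andre proposes.

```c
/* RFC 3390 restart window: min(4 * MSS, max(2 * MSS, 4380)) bytes. */
unsigned long rfc3390_window(unsigned long mss)
{
	unsigned long lower = 2 * mss > 4380 ? 2 * mss : 4380;
	unsigned long upper = 4 * mss;

	return upper < lower ? upper : lower;
}

/*
 * Congestion window to use when a connection sends again after a pause.
 * If the pause was shorter than the RTO, or the (hypothetical) per-socket
 * ignore_idle flag is set, cwnd is kept; otherwise it is clamped to the
 * restart window.
 */
unsigned long cwnd_after_idle(unsigned long cwnd, unsigned long mss,
    unsigned long idle_ms, unsigned long rto_ms, int ignore_idle)
{
	unsigned long rw;

	if (ignore_idle || idle_ms < rto_ms)
		return cwnd;
	rw = rfc3390_window(mss);
	return cwnd < rw ? cwnd : rw;
}
```

With a 1460-byte MSS, a connection that had grown its window to 100000 bytes restarts at 4380 bytes after an idle period, which is the latency spike John reports; with ignore_idle set it would send the whole burst at once, which is the behaviour Andre warns against as a global default.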