Re: Rename of MSIZE kernel option.
On 2002-10-15 00:12, Nicolas Christin <[EMAIL PROTECTED]> wrote:

> > On Mon, 14 Oct 2002, Andrew Gallatin wrote:
> > > Would people be open to renaming the 'MSIZE' kernel option to
> > > something more specific such as 'MBUF_SIZE' or 'MBUFSIZE'? Using
> > > 'MSIZE' can
> >
> > No. MSIZE is a traditional BSDism. Everybody else still uses it.
> > Even AIX and MacOS. I really don't like the idea of changing this.
>
> True, but John is right, it's too generic a name. The argument "it's
> been forever so we can't change it" seems a bit fallacious to me.

True. But that sort of reasoning might lead us one day to rename macros
and functions like m_get() to mbuf_get() or similar. That doesn't seem
like a good idea :-/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message
Re: How to add bpf support to if_atmsubr.c?
On Tue, 15 Oct 2002, Bruce M Simpson wrote:

BMS> On Mon, Oct 14, 2002 at 11:13:05PM -0700, Guy Harris wrote:
BMS> > The current CVS versions of libpcap and tcpdump, and the current
BMS> > released version of Ethereal, support a DLT_SUNATM DLT_ type. SunATM's
BMS> > DLPI interface supplies packets with a 4-byte pseudo-header, consisting of:
BMS> [snip]
BMS>
BMS> Just FYI...
BMS>
BMS> This sounds very similar to the promiscuous cell receive option on ENI's
BMS> SpeedStream 5861 router. I found the raw hex cell output was essentially
BMS> a 4 byte ATM UNI header omitting the CRC byte, and the 48 bytes of the raw
BMS> AAL5 cell payload.

The Marconi HE cards have the same format, although they have no
promiscuous mode (it would be easy to configure all unused connections
to receive into a free receive group; the question is whether you want
this (35/packets per second for OC3)). My driver allows you to receive
cells (i.e. AAL0) on any of the supported connections.

BMS> Is there any open source support for the SunATM PCI cards? I see a few of
BMS> them cropping up on eBay from time to time. It might be worth finding out
BMS> which ASICs they use, I doubt Sun would engineer their own.

Does Sun still make ATM cards? As far as I remember, I saw the last SBUS
cards a couple of years ago.

harti
--
harti brandt,
http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private
[EMAIL PROTECTED], [EMAIL PROTECTED]
which L2TP server ?
Hello!

I'm looking for a good L2TP server for FreeBSD; does anyone know of one?
If I'm right, MPD does not (yet?) support L2TP.

Thanks in advance!

--
bye!
Ale
RFR: ping(8) patches: do not fragment, TOS, maximum payload
Hello,

I have made a patch set for ping(8). I'd appreciate your comments. I did
not include patches #3 and #4; they are mostly stylistic (based on BDE's
style patch). A cumulative patch is here:

http://people.freebsd.org/~maxim/p.cumulative

#1, Print strict source routing option. Requested by David Wang
<[EMAIL PROTECTED]>.

Index: ping.c
===================================================================
RCS file: /home/maxim/cvs/ping/ping.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- ping.c	15 Oct 2002 11:56:58 -0000	1.1
+++ ping.c	15 Oct 2002 11:57:53 -0000	1.2
@@ -953,7 +953,9 @@
 			hlen = 0;
 			break;
 		case IPOPT_LSRR:
-			(void)printf("\nLSRR: ");
+		case IPOPT_SSRR:
+			(void)printf(*cp == IPOPT_LSRR ?
+			    "\nLSRR: " : "\nSSRR: ");
 			j = cp[IPOPT_OLEN] - IPOPT_MINOFF + 1;
 			hlen -= 2;
 			cp += 2;
%%%

#2, Implement -D (do not fragment) and -z (TOS) options. Obtained from
OpenBSD, bin/35843.

Index: ping.c
===================================================================
RCS file: /home/maxim/cvs/ping/ping.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- ping.c	15 Oct 2002 11:57:53 -0000	1.2
+++ ping.c	15 Oct 2002 12:04:10 -0000	1.3
@@ -67,6 +67,7 @@
  */
 
 #include <sys/param.h>		/* NB: we rely on this for <sys/types.h> */
+#include <sys/sysctl.h>
 
 #include <ctype.h>
 #include <err.h>
@@ -107,6 +108,7 @@
 #define	MAXPAYLOAD	(IP_MAXPACKET - MAXIPLEN - MINICMPLEN)
 #define	MAXWAIT		10	/* max seconds to wait for response */
 #define	MAXALARM	(60 * 60)	/* max seconds for alarm timeout */
+#define	MAXTOS		255
 
 #define	A(bit)		rcvd_tbl[(bit)>>3]	/* identify byte in array */
 #define	B(bit)		(1 << ((bit) & 0x07))	/* identify bit in byte */
@@ -138,6 +140,7 @@
 #define	F_TTL		0x8000
 #define	F_MISSED	0x10000
 #define	F_ONCE		0x20000
+#define	F_HDRINCL	0x40000
 
 /*
  * MAX_DUP_CHK is the number of bits in received table, i.e. the maximum
@@ -151,7 +154,7 @@
 struct sockaddr_in whereto;	/* who to ping */
 int datalen = DEFDATALEN;
 int s;				/* socket file descriptor */
-u_char outpack[MINICMPLEN + MAXPAYLOAD];
+u_char outpackhdr[IP_MAXPACKET], *outpack;
 char BSPACE = '\b';		/* characters written for flood */
 char BBELL = '\a';		/* characters written for MISSED and AUDIBLE */
 char DOT = '.';
@@ -201,6 +204,7 @@
 {
 	struct in_addr ifaddr;
 	struct iovec iov;
+	struct ip *ip;
 	struct msghdr msg;
 	struct sigaction si_sa;
 	struct sockaddr_in from, sin;
@@ -209,13 +213,15 @@
 	struct hostent *hp;
 	struct sockaddr_in *to;
 	double t;
+	size_t sz;
 	u_char *datap, packet[IP_MAXPACKET];
 	char *ep, *source, *target;
 #ifdef IPSEC_POLICY_IPSEC
 	char *policy_in, *policy_out;
 #endif
 	u_long alarmtimeout, ultmp;
-	int ch, hold, i, packlen, preload, sockerrno, almost_done = 0, ttl;
+	int ch, df, hold, i, mib[4], packlen, preload, sockerrno,
+	    almost_done = 0, tos, ttl;
 	char ctrl[CMSG_SPACE(sizeof(struct timeval))];
 	char hnamebuf[MAXHOSTNAMELEN], snamebuf[MAXHOSTNAMELEN];
 #ifdef IP_OPTIONS
@@ -239,11 +245,12 @@
 	setuid(getuid());
 	uid = getuid();
 
-	alarmtimeout = preload = 0;
+	alarmtimeout = df = preload = tos = 0;
+	outpack = outpackhdr + sizeof(struct ip);
 	datap = &outpack[MINICMPLEN + PHDR_LEN];
 	while ((ch = getopt(argc, argv,
-	    "AI:LQRS:T:c:adfi:l:m:nop:qrs:t:v"
+	    "ADI:LQRS:T:c:adfi:l:m:nop:qrs:t:vz:"
 #ifdef IPSEC
 #ifdef IPSEC_POLICY_IPSEC
 	    "P:"
@@ -266,6 +273,10 @@
 			    optarg);
 			npackets = ultmp;
 			break;
+		case 'D':
+			options |= F_HDRINCL;
+			df = 1;
+			break;
 		case 'd':
 			options |= F_SO_DEBUG;
 			break;
@@ -390,6 +401,13 @@
 			else
 				errx(1, "invalid security policy");
 			break;
+		case 'z':
+			options |= F_HDRINCL;
+			ultmp = strtoul(optarg, &ep, 0);
+			if (*ep || ep == optarg || ultmp > MAXTOS)
+				errx(EX_USAGE, "invalid TOS: `%s'", optarg);
+			tos = ultmp;
+			break;
 #endif /*IPSEC_POLICY_IPSEC*/
 #endif /*IPSEC*/
 		default:
@@ -509,6 +527,28 @@
 #endif
Re: which L2TP server ?
Alessandro de Manzano wrote:
> I'm looking for a good L2TP server for FreeBSD, someone knows it ?
> If I'm right MPD does not (yet?) support L2TP.

man ng_l2tp

DESCRIPTION
     The ng_l2tp node type implements the encapsulation layer of the L2TP
     protocol as described in RFC 2661.  This includes adding the L2TP
     packet header for outgoing packets and verifying and removing it for
     incoming packets.  The node maintains the L2TP sequence number state
     and handles control session packet acknowledgment and retransmission.
Re: which L2TP server ?
On Tue, Oct 15, 2002 at 07:10:29AM -0700, Michael Sierchio wrote:
> man ng_l2tp
>
> DESCRIPTION
>      The ng_l2tp node type implements the encapsulation layer of the L2TP
>      protocol as described in RFC 2661.  This includes adding the L2TP packet

Thanks, but I'm looking for something higher level and easier to set up,
like MPD (which actually uses ng_ppp and others), for example.

tnx!

--
bye!
Ale
Re: which L2TP server ?
In arved.freebsd.net, you wrote:
> thanks, but I'm looking for something higher level, also easier to set
> up. Like MPD (which actually uses ng_ppp and others), for example.

I once compiled the Linux one from www.l2tpd.org (port at
http://stud3.tuwien.ac.at/~e0025974/bsdsrc/l2tpd.shar), but never tested
whether it really worked on FreeBSD.

regards
arved
Re: How to add bpf support to if_atmsubr.c?
On Tue, Oct 15, 2002 at 11:54:52AM +0100, Bruce M Simpson wrote:
> This sounds very similar to the promiscuous cell receive option on ENI's
> SpeedStream 5861 router. I found the raw hex cell output was essentially
> a 4 byte ATM UNI header omitting the CRC byte, and the 48 bytes of the raw
> AAL5 cell payload.

Similar, but not the same; I doubt there's any hardware significance to
the VPI/VCI part of the header, and the type field is probably put there
by the driver. (Also, the DLPI interface supplies reassembled AAL5 PDUs,
not raw cells; I don't know what it does for other AALs, except for the
signalling AAL, where it again supplies reassembled packets.)
Re: How to add bpf support to if_atmsubr.c?
On Tue, Oct 15, 2002 at 01:01:05PM +0200, Harti Brandt wrote:
> Does Sun still make ATM cards? As far as I remember I saw the last SBUS
> cards a couple of years ago.

They still have a Web page for SunATM:

http://www.sun.com/products-n-solutions/hw/networking/connectivity/sunatm/index.html

They say that they've introduced a 4.0 version of SunATM (which runs in
64-bit mode on Solaris 7) and also list PCI adapters in addition to the
SBus adapters.
Re: which L2TP server ?
A year and a half ago, the l2tpd interface and code was still in its
infancy. If all you seek is to create tunnels/sessions, and you don't
care about security or other more complex L2TP issues, it should work OK.

I developed my own L2TP stack for Linux with a much higher level of
functionality. It would take some porting effort. Only a small effort
was made on usability, so that could be an issue too.

http://sourceforge.net/projects/l2tp/

I would also suggest going to http://sourceforge.net and searching for
l2tp. There are a few other projects out there besides these two.

Regards,
Bill Baumann

On Tue, 15 Oct 2002, Tilman Linneweh wrote:
> In arved.freebsd.net, you wrote:
> > thanks, but I'm looking for something at higher level, also easier to
> > setup. As MPD (actually it use ng_ppp and others), for example.
>
> I once compiled the Linux one from www.l2tpd.org (port at
> http://stud3.tuwien.ac.at/~e0025974/bsdsrc/l2tpd.shar), but never
> tested, if it really worked on FreeBSD.
>
> regards
> arved
Re: delayed ACK
On Mon, 14 Oct 2002, Steve Francis wrote:
> Kirill Ponomarew wrote:
> > is it recommended to use net.inet.tcp.delayed_ack=0 on the machines
> > with heavy network traffic ?
>
> If you want to increase your network traffic for no particular reason,
> and increase load on your server, then yes. Otherwise no.

Not true. Although some bugs have been fixed in 4.3, FreeBSD's delayed
ACKs will still degrade your performance dramatically in some cases.

For now, the best advice I could give is to benchmark your client
machine with and without delayed ACKs and see which works best for your
environment.

-Paul.
Re: which L2TP server ?
There is a new L2TP project from Roaring Penguin. It supports both LAC
and LNS features:

http://sourceforge.net/projects/rp-l2tp

It requires pppd. It was written for Linux; however, it should support
FreeBSD easily.

Vincent

On Tuesday, 15 October 2002 at 14:15, Alessandro de Manzano wrote:
> I'm looking for a good L2TP server for FreeBSD, someone knows it ?
> If I'm right MPD does not (yet?) support L2TP.
dynamic load of em/fxp/bge
I am trying to load the if_em, if_fxp, and if_bge drivers via
/boot/loader.conf. I've added:

if_fxp_load="YES"
if_bge_load="YES"
if_em_load="YES"

The problem is that the bge driver doesn't load. It will if I manually
load it after startup with kldload. The issue seems to be a dependency
on miibus: both fxp and bge want to load it, and bge gets an error that
it's already loaded. I tried putting miibus_load="YES" in loader.conf,
but the same effect is seen. I've tried explicit manual loads of these
from the boot prompt, in each order, but to no avail.

As a work-around, I've placed a "kldload if_bge" in rc.network before
the 'ifconfig -l'.

Any suggestions on why fxp/bge don't play nice when loaded
automatically, but work if loaded manually? Is there a timing issue,
where fxp hasn't initialised its miibus yet? I have fxp0, fxp1, and bge0
in this particular machine. The bge will get miibus2 (eventually),
leaving fxp0 with miibus0 and fxp1 with miibus1, I think.

Suggestions?

--don
Re: RFC: eliminating the _IP_VHL hack.
On Wed, Oct 16, 2002 at 12:17:13AM +0200, Poul-Henning Kamp wrote:
...
> I would therefore propose to eliminate the _IP_VHL hack from the kernel

yes, go for it.

cheers
luigi
RFC: eliminating the _IP_VHL hack.
On Wed, 16 Oct 2002 00:17:13 +0200, Poul-Henning Kamp <[EMAIL PROTECTED]> said:

> In the meantime absolutely no code has picked up on this idea,

It was copied in spirit from OSF/1.

> The side effect of having some source-files using the _IP_VHL hack and
> some not is that sizeof(struct ip) varies from file to file,

Not so. Any compiler which allocates different amounts of storage to one
eight-bit member versus two four-bit bitfield members is seriously
broken (and would defeat the whole purpose).

> I would therefore propose to eliminate the _IP_VHL hack from the kernel
> to end this state of (potential) confusion, and invite comments to the
> following patch:

Much better to delete the bogus BYTE_ORDER kluge from <ip.h>. (Note that
the definition of the bitfields in question has nothing whatsoever to do
with the actual byte order in use; it simply relies on the historical
behavior of compilers which allocated space for bitfields in BYTE_ORDER
order.)

-GAWollman
ENOBUFS
On Wed, 16 Oct 2002 00:53:46 +0300, Petri Helenius <[EMAIL PROTECTED]> said:

> My processes writing to SOCK_DGRAM sockets are getting ENOBUFS

Probably means that your outgoing interface queue is filling up. ENOBUFS
is the only way the kernel has to tell you ``slow down!''.

-GAWollman
Re: delayed ACK
Lars Eggert wrote:
> Paul Herman wrote:
> > Not true. Although some bugs have been fixed in 4.3, FreeBSD's
> > delayed ACKs will still degrade your performance dramatically in
> > some cases.
>
> I'm sorry, but such statements without a packet trace that exhibits
> the problem are just not useful.
>
> Lars

He's probably referring to poorly behaved Windows clients, on certain
applications, if you leave net.inet.tcp.slowstart_flightsize at the
default.

Incidentally, why aren't the defaults for
net.inet.tcp.slowstart_flightsize higher? RFC 2414 seems to indicate it
should be higher. Solaris 8 and later default to 4 for this value.
Re: delayed ACK
Steve Francis wrote:
> He's probably referring to poorly behaved windows clients, on certain
> applications, if you leave net.inet.tcp.slowstart_flightsize at
> default.

Ah. Well, that's a Windows problem :-)

> Incidentally, why are not the defaults on
> net.inet.tcp.slowstart_flightsize higher? RFC2414 seems to indicate it
> should be higher. Solaris in version 8 and later default to 4 for this
> value.

I've been running with 4 for years w/o problems, so I'm all for the
change.

Lars
--
Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute
Re: ENOBUFS
> What rate are you sending these packets at? A standard interface queue
> length is 50 packets; you get ENOBUFS when it's full. This might
> explain the phenomenon.

(Packets are going out bursty, with the average hovering at ~500Mbps:ish.)

I recompiled the kernel with IFQ_MAXLEN of 5000, but there seems to be
no change in the behaviour. How do I make sure that the em interface is
running 66/64, and is there a way to see the interface queue depth?

em0: <Intel(R) PRO/1000 Network Connection, Version - 1.3.14> port
0x3040-0x307f mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2
em0:  Speed:1000 Mbps  Duplex:Full
pcib2: <PCI to PCI bridge (vendor=8086 device=1460)> at device 29.0 on pci1
IOAPIC #2 intpin 0 -> irq 16
IOAPIC #2 intpin 6 -> irq 17
IOAPIC #2 intpin 7 -> irq 18
pci2: <PCI bus> on pcib2

The OS is 4.7-RELEASE.

Pete
Re: ENOBUFS
> Probably means that your outgoing interface queue is filling up.
> ENOBUFS is the only way the kernel has to tell you ``slow down!''.

How much should I be able to send to two em interfaces on one 66/64 PCI
bus?

Pete
Re: ENOBUFS
On Wed, Oct 16, 2002 at 02:04:11AM +0300, Petri Helenius wrote:
> > What rate are you sending these packets at? A standard interface
> > queue length is 50 packets, you get ENOBUFS when it's full. This
> > might explain the phenomenon.
>
> (packets are going out bursty, with average hovering at ~500Mbps:ish)
>
> I recompiled the kernel with IFQ_MAXLEN of 5000 but there seems to be
> no change in the behaviour.
> [...]

How large are the packets, and how fast is the box? On a fast box you
should be able to generate packets faster than wire speed for sizes
around 500 bytes, meaning that you are going to saturate the queue no
matter how large it is.

cheers
luigi
Re: ENOBUFS
Petri Helenius wrote:
> > Probably means that your outgoing interface queue is filling up.
> > ENOBUFS is the only way the kernel has to tell you ``slow down!''.
>
> How much should I be able to send to two em interfaces on one 66/64
> PCI ?

I've seen netperf UDP throughputs of ~950Mbps with a fiber em card and
4K datagrams on a 2.4GHz P4.

Lars
--
Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute
Re: RFC: eliminating the _IP_VHL hack.
On Wed, 16 Oct 2002, Poul-Henning Kamp wrote:
> almost 7 years ago, this commit introduced the _IP_VHL hack in our
> IP-stack:
>
> ] revision 1.7
> ] date: 1995/12/21 21:20:27; author: wollman; state: Exp; lines: +5 -1
> ] If _IP_VHL is defined, declare a single ip_vhl member in struct ip rather
> ] than separate ip_v and ip_hl members. Should have no effect on current code,
> ] but I'd eventually like to get rid of those obnoxious bitfields completely.
>
> We can argue a lot about how long we should wait for "eventually", but
> I would say that 7 years is far too long, considering the status:

Fine by me.

> RCS file: /home/ncvs/src/sys/netinet/ip_icmp.c,v
> retrieving revision 1.70
> diff -u -r1.70 ip_icmp.c
> --- ip_icmp.c	1 Aug 2002 03:53:04 -0000	1.70
> +++ ip_icmp.c	15 Oct 2002 22:05:23 -0000
> @@ -51,7 +51,6 @@
>  #include <net/if_types.h>
>  #include <net/route.h>
>
> -#define _IP_VHL
>  #include <netinet/in.h>
>  #include <netinet/in_systm.h>
>  #include <netinet/in_var.h>
> @@ -128,7 +127,7 @@
>  	struct ifnet *destifp;
>  {
>  	register struct ip *oip = mtod(n, struct ip *), *nip;
> -	register unsigned oiplen = IP_VHL_HL(oip->ip_vhl) << 2;
> +	register unsigned oiplen = oip->ip_hl << 2;
>  	register struct icmp *icp;
>  	register struct mbuf *m;
>  	unsigned icmplen;
> @@ -214,7 +213,8 @@
>  	nip = mtod(m, struct ip *);
>  	bcopy((caddr_t)oip, (caddr_t)nip, sizeof(struct ip));
>  	nip->ip_len = m->m_len;
> -	nip->ip_vhl = IP_VHL_BORING;
> +	nip->ip_v = IPVERSION;
> +	nip->ip_hl = 5;

I think there is a manifest constant for the default IPv4 header size,
but I can't remember it right now.

-Nate
Re: delayed ACK
On Tue, 15 Oct 2002, Lars Eggert wrote:
> Paul Herman wrote:
> > Not true. Although some bugs have been fixed in 4.3, FreeBSD's
> > delayed ACKs will still degrade your performance dramatically in
> > some cases.
>
> I'm sorry, but such statements without a packet trace that exhibits
> the problem are just not useful.

/me reels line back in

Aha! Another victim who is willing to take a look at this! :-)

It's an issue that was left unresolved in kern/24645. Bruce Evans
brought this to my attention back during the unrelated "I have delayed
ACK problems" thread on -net in January of 2001, and I then passed it on
to jlemon. If you need a packet trace, let me know, but you should be
able to reproduce it yourself. Even today on my 4.7-PRERELEASE I still
get:

mammoth# sysctl net.inet.tcp.delayed_ack=0
net.inet.tcp.delayed_ack: 1 -> 0
mammoth# time tar cf 127.0.0.1:/tmp/foo /kernel
0.000u 0.041s 0:00.33 12.1%     350+300k 0+0io 0pf+0w
mammoth# sysctl net.inet.tcp.delayed_ack=1
net.inet.tcp.delayed_ack: 0 -> 1
mammoth# time tar cf 127.0.0.1:/tmp/foo /kernel
0.014u 0.033s 0:45.90 0.0%      700+600k 0+0io 0pf+0w
                ^^^^^^^

It seems that lowering the lo0 MTU to 1500 makes this particular problem
go away; the magic MTU size is 2100. This makes me think that this could
be a big problem across GigE using 8K jumbo frames, but I'm not sure.
Also, taring over the IPv6 lo0 interface seems to work OK.

No idea what causes this.

-Paul.
Re: RFC: eliminating the _IP_VHL hack.
> The side effect of having some source-files using the _IP_VHL hack and
> some not is that sizeof(struct ip) varies from file to file, which at
> best is confusing and at worst the source of some really evil bugs.
>
> I would therefore propose to eliminate the _IP_VHL hack from the
> kernel to end this state of (potential) confusion

This problem could be solved more easily by changing the u_int back to a
u_char, as it used to be before rev 1.15:

Index: ip.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip.h,v
retrieving revision 1.19
diff -u -r1.19 ip.h
--- ip.h	14 Dec 2001 19:37:32 -0000	1.19
+++ ip.h	16 Oct 2002 01:15:48 -0000
@@ -51,11 +51,11 @@
 	u_char	ip_vhl;			/* version << 4 | header length >> 2 */
 #else
 #if BYTE_ORDER == LITTLE_ENDIAN
-	u_int	ip_hl:4,		/* header length */
+	u_char	ip_hl:4,		/* header length */
 		ip_v:4;			/* version */
 #endif
 #if BYTE_ORDER == BIG_ENDIAN
-	u_int	ip_v:4,			/* version */
+	u_char	ip_v:4,			/* version */
 		ip_hl:4;		/* header length */
 #endif
 #endif /* not _IP_VHL */

But, if we were to pick one or the other to discard, I would keep
IP_VHL, because that field really is a byte in the IP header.
Re: delayed ACK
This smells a lot like a bad interaction between the default window size
and the MTU -- loopback has a 16k default, and maybe tar uses a smallish
window (32k is the default now for net.inet.tcp.sendspace, but it used
to be 16k at the time), which means only 1 or 2 packets in flight at
once, meaning that many times you get the 200ms delay and your
throughput goes way down.

cheers
luigi

On Tue, Oct 15, 2002 at 05:25:42PM -0700, Paul Herman wrote:
> ...
> It seems that lowering lo0 mtu to 1500 makes this particular problem
> go away. The magic mtu size is 2100. This makes me think that this is
> a big problem across GigE using 8K jumbo frames, not sure. Also,
> taring over the IPv6 lo0 interface seems to work OK.
>
> No idea what causes this.
>
> -Paul.
Re: delayed ACK
On Tue, 15 Oct 2002, Luigi Rizzo wrote:
> this smells a lot as a bad interaction between default window size and
> mtu -- loopback has 16k default, maybe tar uses a smallish window (32k
> is default now for net.inet.tcp.sendspace, but used to be 16k at the
> time), which means only 1 or 2 packets in flight at once, meaning that
> many times you get the 200ms delay and your throughput goes way down.

NetBSD introduced a fix for this recently. It seems sorta hackish, but
maybe we need to do something similar. The diff reminds me why FreeBSD
has a policy of separating style and functional commits, fwiw. :)

http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/sys/netinet/tcp_output.c.diff?r1=1.84&r2=1.85

Revision 1.85 / (download) - annotate - [select for diffs], Tue Aug 20
16:29:42 2002 UTC (8 weeks ago) by thorpej
Branch: MAIN
CVS Tags: gehenna-devsw-base
Changes since 1.84: +18 -4 lines
Diff to previous 1.84 (colored)

    Never send more than half a socket buffer of data. This insures
    that we can always keep 2 packets on the wire, no matter what
    SO_SNDBUF is, and therefore ACKs will never be delayed unless we
    run out of data to transmit. The problem is quite easy to tickle
    when the MTU of the outgoing interface is larger than the socket
    buffer size (e.g. loopback). Fix from Charles Hannum.
Re: delayed ACK
On Tue, Oct 15, 2002 at 08:52:49PM -0500, Mike Silbersack wrote:
> ...
> NetBSD introduced a fix for this recently, it seems sorta hackish, but
> maybe we need to do something similar.

This helps you if the other side has delayed ACKs, but halves the
throughput if you are window-limited and the other side does not use
delayed ACKs. (Can you force immediate ACKs by setting the PUSH flag in
the TCP header?)

cheers
luigi
Re: delayed ACK
On Tue, 15 Oct 2002, Luigi Rizzo wrote:
> this helps you if the other side has delayed acks, but halves the
> throughput if you are being window limited and the other side does not
> use delayed acks (can you force immediate acks by setting the PUSH
> flag in the tcp header ?)

I think the comment is slightly misleading, and that it won't actually
cause any performance problems as you suggest.

From what I recall, immediate acking of PUSH packets varies... Linux
appears to have changed back and forth on whether it does so or not. I
also seem to recall Windows making a change too. Either way, we probably
shouldn't rely on that behavior alone.

> > Never send more than half a socket buffer of data. This insures
> > that we can always keep 2 packets on the wire, no matter what
> > SO_SNDBUF is, and therefore ACKs will never be delayed unless we
> > run out of data to transmit. The problem is quite easy to tickle
> > when the MTU of the outgoing interface is larger than the socket
> > buffer size (e.g. loopback). Fix from Charles Hannum.

If I'm reading the implementation correctly, what this means is that if
you have a single packet > .5*socketbuffer, you reduce the maximum
*segment* size, causing two smaller packets to be sent instead of one
large packet. (Smaller still being 8K in size.) While such a change
might help with localhost, I have this sneaky suspicion that it falls
apart when applied to jumbo frames and 32K send buffers. Someone well
motivated should be able to come up with a more general heuristic.

Mike "Silby" Silbersack
Re: ENOBUFS
> how large are the packets and how fast is the box ?

Packets go out at an average size of 1024 bytes. The box is a dual P4
Xeon 2400/400, so I think it should qualify as fast? I disabled
hyperthreading to figure out if it was causing problems.

I seem to be able to send packets at a rate in the 900Mbps range when
just sending them out with a process. If I do similar sending on two
interfaces at the same time, it tops out at 600Mbps.

The information I'm looking for is how to instrument where the
bottleneck is, to either tune the parameters or report a bug in the PCI
or em code (or just simply swap the GE hardware for something that works
better).

Pete
Re: ENOBUFS
Petri Helenius wrote:
> > how large are the packets and how fast is the box ?
>
> Packets go out at an average size of 1024 bytes. The box is dual P4
> Xeon 2400/400 so I think it should qualify as fast ? I disabled
> hyperthreading to figure out if it was causing problems. I seem to be
> able to send packets at a rate in the 900Mbps when just sending them
> out with a process. If I do similar sending on two interfaces at same
> time, it tops out at 600Mbps.

The 900Mbps are similar to what I see here on similar hardware.

For your two-interface setup, are the 600Mbps an aggregate send rate
over both interfaces, or do you see 600Mbps per interface? In the latter
case, is your CPU maxed out? Only one CPU can be in the kernel under
-stable, so the second one won't help much. With small packets like
that, you may be interrupt-bound. (Until Luigi releases polling for em
interfaces... :-)

Lars
--
Lars Eggert <[EMAIL PROTECTED]> USC Information Sciences Institute
Re: ENOBUFS
> The 900Mbps are similar to what I see here on similar hardware.

What kind of receive performance do you observe? I haven't got that far
yet.

> For your two-interface setup, are the 600Mbps aggregate send rate on
> both interfaces, or do you see 600Mbps per interface? In the latter

600Mbps per interface. I'm going to try this out also on -CURRENT to see
if it changes anything. Interrupts do not seem to pose a big problem,
because I'm seeing only a few thousand em interrupts a second, but since
every packet involves a write call, there are 100k syscalls a second.

> case, is your CPU maxed out? Only one can be in the kernel under
> -stable, so the second one won't help much. With small packets like
> that, you may be interrupt-bound. (Until Luigi releases polling for em
> interfaces... :-)

I'll try changing the packet sizes to figure out the optimum.

Pete