Re: SOLVED [was: firewall is very slow, something's wrong]
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-17 00:16]: > HOLY SH*T! I tried 4.2. It rocks! > > Just the first test that I tried after installing it: > - switched gigabit network > - web server behind 1:1 NATing firewall > - firewall is AMD64 X2 2.4GHz > - downloading 2GB file via HTTP through the firewall in infinite loop > - flooding the firewall with small UDP packets, random source IPs, > generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI > Express card, Linux Fedora 7, running the kernel-level "pktgen" packet > generator which is very fast) can crank them out. The packets are directed > to the NATed address of the web server, to a port that's blocked by the > firewall. > > Under these conditions, OpenBSD 4.1 as a firewall just keels over and dies. > All traffic through the firewall just stops in an instant. > Linux 2.6.18 fares slightly better, the current download finishes up, but > another one won't start. > > But the default OpenBSD 4.2 i386 uniprocessor kernel doesn't seem to care. lovely :) -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: SOLVED [was: firewall is very slow, something's wrong]
Stuart Henderson wrote: On 2007/10/16 15:27, James Hartley wrote: Secondly, does anyone on the mailing list know of an OpenBSD equivalent to pktgen? Not in-kernel, but netblast from the netrate package is somewhat useful. If anybody has a same-hardware performance comparison between pktgen and netblast, please post it. I'm especially interested in generating lots of small packets, which is difficult. -- Florin Andrei http://florin.myip.org/
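For anyone who wants a rough userland comparison point before installing either tool, a small-packet UDP sender can be sketched in a few lines of Python. This is neither pktgen nor netblast: the function name `blast` and its parameters are made up for illustration, and an interpreted sender will be far slower than an in-kernel generator. The 18-byte default payload is chosen because 18 bytes of data plus 8 bytes of UDP header and 20 bytes of IPv4 header fills the 46-byte minimum Ethernet payload, i.e. a minimum-size 64-byte frame.

```python
import socket
import time


def blast(host, port, payload_size=18, duration=1.0):
    """Send fixed-size UDP datagrams to host:port for `duration` seconds.

    Returns (packets_sent, packets_per_second). Hypothetical helper for
    illustration only; unrelated to the real pktgen or netblast tools.
    """
    payload = b"\x00" * payload_size
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    start = time.monotonic()
    deadline = start + duration
    try:
        while time.monotonic() < deadline:
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # For example a full local send queue; skip this packet.
                continue
    finally:
        sock.close()
    elapsed = time.monotonic() - start
    rate = sent / elapsed if elapsed > 0 else 0.0
    return sent, rate
```

Only point it at a machine on your own test network, e.g. `blast("192.0.2.10", 9, duration=5.0)` where the address is a placeholder.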
Re: SOLVED [was: firewall is very slow, something's wrong]
On 2007/10/16 15:27, James Hartley wrote: > On 10/16/07, Florin Andrei <[EMAIL PROTECTED]> wrote: > > - flooding the firewall with small UDP packets, random source IPs, > > generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI > > Express card, Linux Fedora 7, running the kernel-level "pktgen" packet > > generator which is very fast) can crank them out. > > First, thanks for sharing your findings. > > Secondly, does anyone on the mailing list know of an OpenBSD > equivalent to pktgen? Not in-kernel, but netblast from the netrate package is somewhat useful.
Re: SOLVED [was: firewall is very slow, something's wrong]
On 10/16/07, Florin Andrei <[EMAIL PROTECTED]> wrote: > - flooding the firewall with small UDP packets, random source IPs, > generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI > Express card, Linux Fedora 7, running the kernel-level "pktgen" packet > generator which is very fast) can crank them out. First, thanks for sharing your findings. Secondly, does anyone on the mailing list know of an OpenBSD equivalent to pktgen? Thanks. Jim
SOLVED [was: firewall is very slow, something's wrong]
Florin Andrei wrote: ## Huge performance improvements in the network stack, including: * In pf, store routing table ID, queue ID etc directly in the packet header mbuf instead of using mbuf tags (which use malloc'd memory). This yields a 100% improvement in pf performance. * Skip TCP/UDP/ICMP/ICMP6 checksumming when not necessary. This yields a further 10% improvement in pf performance. * A change in the way the kernel random pool is stirred greatly increases performance with network interface cards that support interrupt mitigation, especially on architectures where reading the clock is expensive (such as amd64). ## I'll try 4.2. HOLY SH*T! I tried 4.2. It rocks! Just the first test that I tried after installing it: - switched gigabit network - web server behind 1:1 NATing firewall - firewall is AMD64 X2 2.4GHz - downloading 2GB file via HTTP through the firewall in infinite loop - flooding the firewall with small UDP packets, random source IPs, generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI Express card, Linux Fedora 7, running the kernel-level "pktgen" packet generator which is very fast) can crank them out. The packets are directed to the NATed address of the web server, to a port that's blocked by the firewall. Under these conditions, OpenBSD 4.1 as a firewall just keels over and dies. All traffic through the firewall just stops in an instant. Linux 2.6.18 fares slightly better, the current download finishes up, but another one won't start. But the default OpenBSD 4.2 i386 uniprocessor kernel doesn't seem to care. The download just keeps going. New downloads are initiated OK through the firewall. There are even spare CPU cycles left :-) not many (10%) but still. There's a very large percentage of CPU (80...90%) used for interrupts. Good job folks, I'm impressed. Anyone building gigabit routers and firewalls, don't delay, upgrade to 4.2. Heck, do that even for 100Mbit systems, this type of DoS doesn't need much bandwidth to be effective. 
I'll keep doing tests. If anything interesting shows up, I'll post the results in a new thread. -- Florin Andrei http://florin.myip.org/
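The remark that this kind of flood "doesn't need much bandwidth" can be made concrete with a little arithmetic. The sketch below assumes minimum-size 64-byte Ethernet frames, each of which also occupies 8 bytes of preamble and 12 bytes of inter-frame gap on the wire; the function name is made up for illustration.

```python
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter, in bytes
INTERFRAME_GAP = 12   # mandatory idle time between frames, expressed in bytes
MIN_FRAME = 64        # minimum Ethernet frame, including header and FCS


def max_packets_per_second(link_bits_per_second, frame_bytes=MIN_FRAME):
    """Upper bound on frames per second a link can carry at a given frame size."""
    wire_bytes = frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bits_per_second / (wire_bytes * 8)


for name, rate in (("100 Mbit", 100e6), ("1 Gbit", 1e9)):
    print(f"{name}: {max_packets_per_second(rate):,.0f} packets/s")
# 100 Mbit: 148,810 packets/s
# 1 Gbit: 1,488,095 packets/s
```

Every one of those packets has to be received and evaluated by the firewall, which is why packet rate rather than raw bandwidth is the number to watch in this kind of test.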
Re: firewall is very slow, something's wrong
* Robert C Wittig <[EMAIL PROTECTED]> [2007-10-10 20:45]: > If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig RAM > and a 64 bit CPU, which would be a better choice, in general? for a packet filter/router/...? 32bit 2Gig and take a gig out. for a database server? 64bit and add ram when required. there is no "in general". -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: firewall is very slow, something's wrong
On Wed, Oct 10, 2007 at 12:34:48PM -0500, Robert C Wittig wrote: | If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig | RAM and a 64 bit CPU, which would be a better choice, in general? There is no such generalization. The amount of RAM you need depends on the task. For firewalling, you don't need lots. For a high-traffic, caching webserver you do need much. If, in general, you are firewalling .. you won't need much RAM. If, in general, you are doing something else, you might need it. Like I said in my previous mail, there is no short answer. No quick solution. Everything has advantages and disadvantages. In some cases you may not even want to run OpenBSD (*shock* !). In general, you should look at the specific problem at hand and solve it with the means available. Cheers, Paul 'WEiRD' de Weerd -- >[<++>-]<+++.>+++[<-->-]<.>+++[<+ +++>-]<.>++[<>-]<+.--.[-] http://www.weirdnet.nl/
Re: firewall is very slow, something's wrong
On 10/10/07, Robert C Wittig <[EMAIL PROTECTED]> wrote: > If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig > RAM and a 64 bit CPU, which would be a better choice, in general? 64-bit and 1 GB. it's much easier to add another GB RAM later than to add 32-bits.
Re: firewall is very slow, something's wrong
Paul de Weerd wrote: wittig wrote: | 64 bit processors (combined with 64 bit capable operating systems) have | the ability to address more RAM than 32 bit processors because 64^2 is a | much larger number than 32^2... lots more RAM addresses). Oops! that should have read: 2^64 and 2^32 Depending on your software, 64 bit processors can be quite a bit faster. If you're dealing with 64bit integers, using 64bit registers, etc., a lower clocked 64bit CPU might be faster than a 32bit CPU clocking at a higher rate. In short: There is no short answer. It depends on what you're doing. Point taken, particularly where big integers are concerned. From what Henning tells us (and what sounds logical to me), grabbing an Ethernet frame from a NIC and putting it on another NIC doesn't really change much from 32bit to 64bit. Your compiler also comes into play. If that is more tuned towards a certain 32bit architecture (such as i386) than a certain 64bit arch (because it's less popular, such as sparc64 or hppa64 or mips64), this will impact your performance quite a bit. If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig RAM and a 64 bit CPU, which would be a better choice, in general? -- -wittig http://www.robertwittig.com/ http://robertwittig.net/ http://robertwittig.org/ .
Re: firewall is very slow, something's wrong
On 2007/10/10 11:20, Tony Abernethy wrote: > Siju George wrote: > > > > so you think a 20 ton truck is twice as fast as a 10 ton truck? > > O.K I get it :-) > > So when does changing from 32 bit to a 64-bit processor actually help? > > Quoting Paul de Weerd, > "In short: There is no short answer. It depends on what you're doing." > ( Not to mention how you do it ;-) There are other changes between i386/amd64 than the number of bits (e.g. amd64 has more registers, which allows some other changes that can improve performance for some things), so it depends a lot on the code being run. You can't even always say, "software X is faster on arch Y", since the way you use that software can give different results. If you're looking for "fastest", just benchmark as close to real-life use on both, it's the easiest way. You also often need to test whether what you're trying to run does work correctly on !i386 arch (it's not uncommon for code to make assumptions which don't hold true on !i386). Of course, there are reasons other than "fastest" you might choose a particular arch. > Short answer: > When you *might* need more than a GB or so of RAM/swap. > Most anything is faster than stuck. > > Easy: 2:1 ratio *either direction* which is faster. > Hard: 10:1 ratio (again either direction). I'm not too sure I understand what you're saying here.
Re: firewall is very slow, something's wrong
Siju George wrote: > > so you think a 20 ton truck is twice as fast as a 10 ton truck? > O.K I get it :-) > So when does changing from 32 bit to a 64-bit processor actually help? Quoting Paul de Weerd, "In short: There is no short answer. It depends on what you're doing." ( Not to mention how you do it ;-) Short answer: When you *might* need more than a GB or so of RAM/swap. Most anything is faster than stuck. Easy: 2:1 ratio *either direction* which is faster. Hard: 10:1 ratio (again either direction). (figure in loading/unloading times on the truck analogy)
Re: firewall is very slow, something's wrong
And is it in a vacuum? Peter N. M. Hansteen wrote: Henning Brauer <[EMAIL PROTECTED]> writes: so you think a 20 ton truck is twice as fast as a 10 ton truck? horizontal or vertical motion? assuming a perfectly spherical truck?
Re: firewall is very slow, something's wrong
On 10/10/07, Henning Brauer <[EMAIL PROTECTED]> wrote: > * Siju George <[EMAIL PROTECTED]> [2007-10-10 15:10]: > > On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote: > > > * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: > > > >> then, an i386 kernel should perform considerably better than amd64 for > > > >> firewalling/routing/... > > > > That is surprising. What is the reason? > > > we dunno really. it hasn't been benched in some time so it might not even > > > be true any more, but last time the difference was dramatic. > > I thought by running an amd64 kernel will get me twice the speed than > > an i386 on an amd64 machine since one is 64 bit processing and the > > other is just 32 bit :-( > > so you think a 20 ton truck is twice as fast as a 10 ton truck? > O.K I get it :-) So when does changing from 32 bit to a 64-bit processor actually help? Kind Regards Siju
Re: firewall is very slow, something's wrong
Robert C Wittig wrote: > Siju George wrote: > > > I thought by running an amd64 kernel will get me twice the > speed than > > an i386 on an amd64 machine since one is 64 bit processing and the > > other is just 32 bit :-( > > > > 64 bit processors (combined with 64 bit capable operating > systems) have > the ability to address more RAM than 32 bit processors > because 64^2 is a > much larger number than 32^2... lots more RAM addresses). Actually 2^64 vs 2^32 (64^2 is 2^12, 64 is 2^6, 32 is 2^5) Other things equal, 64-bit should take twice as long because it takes 64 bits to do anything instead of 32 bits. Not really that simple, because accessing 32 bits can involve 1) accessing the 64 bits that the 32 bits are in. 2) selecting the appropriate 32 bits of the 64 bits. > > This does not speed things up, though, until you run out of RAM, and > start having to access the swapfile. The 64-bits does affect how big the swap file can be without resorting to Rube Goldberg contraptions to identify what is what. > > The processor's speed... MHz, GHz, etc., will determine how fast the > processor itself can process instructions. > > > -- > -wittig http://www.robertwittig.com/ > http://robertwittig.net/ > http://robertwittig.org/ > .
Re: firewall is very slow, something's wrong
Robert C Wittig wrote: > 64 bit processors (combined with 64 bit capable operating systems) have > the ability to address more RAM than 32 bit processors because 64^2 is a > much larger number than 32^2... lots more RAM addresses). The increase from 2^32 to 2^64 is even more impressive. ;-) --Jon Radel
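For reference, the numbers behind the 64^2 versus 2^64 mix-up can be checked directly. A small Python sketch (the function name is made up for illustration) shows how different squaring the bit width is from raising 2 to it, and where the 4GB ceiling for 32-bit addressing mentioned elsewhere in this thread comes from:

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with a `bits`-wide address."""
    return 2 ** bits


# Squaring the bit width gives small, unrelated numbers.
print(32 ** 2, 64 ** 2)                       # 1024 4096

# Raising 2 to the bit width gives the actual address-space sizes.
print(address_space_bytes(32))                # 4294967296
print(address_space_bytes(64))                # 18446744073709551616

# The same sizes in binary units.
print(address_space_bytes(32) // 2 ** 30, "GiB")   # 4 GiB
print(address_space_bytes(64) // 2 ** 60, "EiB")   # 16 EiB
```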
Re: firewall is very slow, something's wrong
On Wed, Oct 10, 2007 at 09:24:25AM -0500, Robert C Wittig wrote: | Siju George wrote: | | >I thought by running an amd64 kernel will get me twice the speed than | >an i386 on an amd64 machine since one is 64 bit processing and the | >other is just 32 bit :-( | > | | 64 bit processors (combined with 64 bit capable operating systems) have | the ability to address more RAM than 32 bit processors because 64^2 is a | much larger number than 32^2... lots more RAM addresses). | | This does not speed things up, though, until you run out of RAM, and | start having to access the swapfile. | | The processor's speed... MHz, GHz, etc., will determine how fast the | processor itself can process instructions. Depending on your software, 64 bit processors can be quite a bit faster. If you're dealing with 64bit integers, using 64bit registers, etc., a lower clocked 64bit CPU might be faster than a 32bit CPU clocking at a higher rate. In short: There is no short answer. It depends on what you're doing. From what Henning tells us (and what sounds logical to me), grabbing an Ethernet frame from a NIC and putting it on another NIC doesn't really change much from 32bit to 64bit. Your compiler also comes into play. If that is more tuned towards a certain 32bit architecture (such as i386) than a certain 64bit arch (because it's less popular, such as sparc64 or hppa64 or mips64), this will impact your performance quite a bit. Cheers, Paul 'WEiRD' de Weerd -- >[<++>-]<+++.>+++[<-->-]<.>+++[<+ +++>-]<.>++[<>-]<+.--.[-] http://www.weirdnet.nl/
Re: firewall is very slow, something's wrong
Siju George wrote: I thought by running an amd64 kernel will get me twice the speed than an i386 on an amd64 machine since one is 64 bit processing and the other is just 32 bit :-( 64 bit processors (combined with 64 bit capable operating systems) have the ability to address more RAM than 32 bit processors because 64^2 is a much larger number than 32^2... lots more RAM addresses). This does not speed things up, though, until you run out of RAM, and start having to access the swapfile. The processor's speed... MHz, GHz, etc., will determine how fast the processor itself can process instructions. -- -wittig http://www.robertwittig.com/ http://robertwittig.net/ http://robertwittig.org/ .
Re: firewall is very slow, something's wrong
Henning Brauer <[EMAIL PROTECTED]> writes: > so you think a 20 ton truck is twice as fast as a 10 ton truck? horizontal or vertical motion? assuming a perfectly spherical truck? -- Peter N. M. Hansteen, member of the first RFC 1149 implementation team http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/ "Remember to set the evil bit on all malicious network traffic" delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
Re: firewall is very slow, something's wrong
* Siju George <[EMAIL PROTECTED]> [2007-10-10 15:10]: > On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote: > > * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: > > >> then, an i386 kernel should perform considerably better than amd64 for > > >> firewalling/routing/... > > > That is surprising. What is the reason? > > we dunno really. it hasn't been benched in some time so it might not even > > be true any more, but last time the difference was dramatic. > I thought by running an amd64 kernel will get me twice the speed than > an i386 on an amd64 machine since one is 64 bit processing and the > other is just 32 bit :-( so you think a 20 ton truck is twice as fast as a 10 ton truck? -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: firewall is very slow, something's wrong
On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote: > * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: > >> then, an i386 kernel should perform considerably better than amd64 for > >> firewalling/routing/... > > > > That is surprising. What is the reason? > > we dunno really. it hasn't been benched in some time so it might not even > be true any more, but last time the difference was dramatic. > I thought by running an amd64 kernel will get me twice the speed than an i386 on an amd64 machine since one is 64 bit processing and the other is just 32 bit :-( How about on sparc64 systems? do you get twice the speed compared to its 32 bit counterpart? Thank you so much Kind Regards Siju
Re: firewall is very slow, something's wrong
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 22:54]: > Henning Brauer wrote: >> * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: then, an i386 kernel should perform considerably better than amd64 for firewalling/routing/... >>> That is surprising. What is the reason? >> we dunno really. it hasn't been benched in some time so it might not even be >> true any more, but last time the difference was dramatic. > > Then I will do some tests with 4.2 on gigabit-capable hardware. If anything > noteworthy comes out, I'll post the results. > Don't expect something too fancy, but I guess anything is better than > nothing. > >>> How much RAM can the i386 kernel use on an amd64 machine? >> 4GB minus pci space > > Hmmm. > > Please correct me if I'm wrong: > Let's say a firewall is connected to a pretty fast Internet pipe (in the > gigabit range). Let's say there's a DDoS against this environment. In > theory, the firewall would need lots of RAM so that it can deal with the > incoming nasty packets, create an entry for each packet in the state table > (don't know the correct name for it in OpenBSD, sorry), then expire it > after a while. > In theory, the firewall could be tweaked to expire unused states quickly, > but still, more RAM is better when dealing with a DDoS. nope. the kernel will not ever use more than 1 GB (or was it 768MB? memory fuzzy). more than 1 GB of memory on a firewall even hurts. ok, not much. but a bit. > What's still not clear to me is how much RAM I should provision per 1Gb of > bandwidth on OpenBSD, assuming there's an incoming worst-case-scenario > DDoS, that consumes RAM (and other resources) on the firewall yet leaves > some bandwidth open for legitimate traffic (so the firewall must be able to > continue to let the good traffic pass through). Also assuming some tweaking > has been done on the firewall to expire the bad stuff quickly without > affecting legitimate traffic. RAM is not your concern on a firewall. 
>>> If the SMP kernel does not actually hurt performance, I might have to use >>> it. >> it does. seriously. locking is not free. > > Aw, damn. I was hoping that's not quite the case. > > Well, then hopefully the dynamic routing daemons won't get too greedy and > DoS the firewall from within. :-) no, they won't. they only get the cpu cycles not required for packet forwarding (well, interrupts + softint handling really) anyway. > Or I may have to re-think the whole > environment and forget the idea of doing any kind of dynamic routing on the > firewall - from a security perspective, dynamic routing on the firewall > sucks anyway. no, not really, not if done right. -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: firewall is very slow, something's wrong
Henning Brauer wrote: * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: then, an i386 kernel should perform considerably better than amd64 for firewalling/routing/... That is surprising. What is the reason? we dunno really. it hasn't been benched in some time so it might not even be true any more, but last time the difference was dramatic. Then I will do some tests with 4.2 on gigabit-capable hardware. If anything noteworthy comes out, I'll post the results. Don't expect something too fancy, but I guess anything is better than nothing. How much RAM can the i386 kernel use on an amd64 machine? 4GB minus pci space Hmmm. Please correct me if I'm wrong: Let's say a firewall is connected to a pretty fast Internet pipe (in the gigabit range). Let's say there's a DDoS against this environment. In theory, the firewall would need lots of RAM so that it can deal with the incoming nasty packets, create an entry for each packet in the state table (don't know the correct name for it in OpenBSD, sorry), then expire it after a while. In theory, the firewall could be tweaked to expire unused states quickly, but still, more RAM is better when dealing with a DDoS. What's still not clear to me is how much RAM I should provision per 1Gb of bandwidth on OpenBSD, assuming there's an incoming worst-case-scenario DDoS, that consumes RAM (and other resources) on the firewall yet leaves some bandwidth open for legitimate traffic (so the firewall must be able to continue to let the good traffic pass through). Also assuming some tweaking has been done on the firewall to expire the bad stuff quickly without affecting legitimate traffic. But all that depends on the actual legitimate traffic and on the firewall rules. I guess that's another way of saying "more tests are needed". :-/ If the SMP kernel does not actually hurt performance, I might have to use it. it does. seriously. locking is not free. Aw, damn. I was hoping that's not quite the case. 
Well, then hopefully the dynamic routing daemons won't get too greedy and DoS the firewall from within. :-) Or I may have to re-think the whole environment and forget the idea of doing any kind of dynamic routing on the firewall - from a security perspective, dynamic routing on the firewall sucks anyway. Looks like my performance test matrix just got bigger by a factor of 2x. :-/ But the bad combinations should get pruned pretty quickly, I guess.

+-----+------+-------+
|  \  | i386 | amd64 |
+-----+------+-------+
| SMP |      |       |
+-----+------+-------+
| UP  |      |       |
+-----+------+-------+

-- Florin Andrei http://florin.myip.org/
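The test matrix above is just the cross product of the two kernel architectures and the two kernel types. A minimal sketch of enumerating it so that no combination is forgotten (the names `build_matrix`, `arch`, `kernel` and `result` are made up for illustration):

```python
from itertools import product

ARCHES = ("i386", "amd64")
KERNELS = ("SMP", "UP")


def build_matrix(arches=ARCHES, kernels=KERNELS):
    """Return one empty result slot per (kernel type, architecture) combination."""
    return [
        {"arch": arch, "kernel": kernel, "result": None}
        for kernel, arch in product(kernels, arches)
    ]


for cell in build_matrix():
    print(f"{cell['kernel']:>3} / {cell['arch']}")
```

Adding another dimension later, such as the OpenBSD release, only means adding one more tuple to the `product` call.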
Re: firewall is very slow, something's wrong
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]: >> then, an i386 kernel should perform considerably better than amd64 for >> firewalling/routing/... > > That is surprising. What is the reason? we dunno really. it hasn't been benched in some time so it might not even be true any more, but last time the difference was dramatic. > How much RAM can the i386 kernel use on an amd64 machine? 4GB minus pci space >> next, you don't want SMP for such tasks. take out the second CPU and give >> it to somebody who can use it, and run the uniprocessor kernel. > So, assuming the box is a pure firewall / static router (so just pf and > static routes), even with multiple interfaces, all those tasks run in a > single kernel thread? yup > Now here's the second thing: if this firewall needs to be integrated in an > environment with dynamic routing, it will need to run some kind of dynamic > routing daemon(s). For that, I'd like to have at least two cores on the > system, and a kernel that can take advantage of them. the required locking will cost you more than the second cpu/core will ever gain you. > If the SMP kernel does not actually hurt performance, I might have to use > it. it does. seriously. locking is not free. -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: firewall is very slow, something's wrong
Henning Brauer wrote: First, you want to run 4.2 or -current, that should about double your throughput. Yes, I was looking at a paragraph in the 4.2 release notes and I thought all those things might be related exactly to the problem I'm seeing: ## Huge performance improvements in the network stack, including: * In pf, store routing table ID, queue ID etc directly in the packet header mbuf instead of using mbuf tags (which use malloc'd memory). This yields a 100% improvement in pf performance. * Skip TCP/UDP/ICMP/ICMP6 checksumming when not necessary. This yields a further 10% improvement in pf performance. * A change in the way the kernel random pool is stirred greatly increases performance with network interface cards that support interrupt mitigation, especially on architectures where reading the clock is expensive (such as amd64). ## I'll try 4.2. then, an i386 kernel should perform considerably better than amd64 for firewalling/routing/... That is surprising. What is the reason? How much RAM can the i386 kernel use on an amd64 machine? next, you don't want SMP for such tasks. take out the second CPU and give it to somebody who can use it, and run the uniprocessor kernel. So, assuming the box is a pure firewall / static router (so just pf and static routes), even with multiple interfaces, all those tasks run in a single kernel thread? Now here's the second thing: if this firewall needs to be integrated in an environment with dynamic routing, it will need to run some kind of dynamic routing daemon(s). For that, I'd like to have at least two cores on the system, and a kernel that can take advantage of them. If the SMP kernel does not actually hurt performance, I might have to use it. -- Florin Andrei http://florin.myip.org/
Re: firewall is very slow, something's wrong
Karsten McMinn wrote: while it is dreadfully obvious that there is some weirdness happening, you'll definitely get more performance by switching to the latest snapshot or wait for your 4.2 cd Just ordered it yesterday. ;-) if it hasn't come yet. What model transport do you have and what's the mainboard's BIOS rev? Tyan Transport GT24-B3992 BIOS Date: 03/06/07 09:36:13 Ver: 08.00.11 -- Florin Andrei http://florin.myip.org/
Re: firewall is very slow, something's wrong
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-05 03:55]: > The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using the > SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual gigabit > port, it carries both em0 and em1. First, you want to run 4.2 or -current, that should about double your throughput. then, an i386 kernel should perform considerably better than amd64 for firewalling/routing/... next, you don't want SMP for such tasks. take out the second CPU and give it to somebody who can use it, and run the uniprocessor kernel. last, increase net.inet.ip.ifq.maxlen until you see the congestion counter not increasing much any more under load. should not exceed 2500 by too much. as a rule of thumb, 256 per gigE interface aren't too far off. -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
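The last piece of advice is effectively a small tuning loop: start at about 256 per gigE interface, raise the value while the congestion counter keeps climbing under load, and stop near 2500. A rough Python sketch of that logic follows. The function names, the step size of 256 and the `tolerance` threshold for "not increasing much" are assumptions made for illustration, and the 2500 ceiling is treated as a hard cap for simplicity even though the advice only says not to exceed it by much. The resulting number is what would be set as net.inet.ip.ifq.maxlen.

```python
PER_GIGE_INTERFACE = 256   # rule of thumb from the message above
SOFT_CEILING = 2500        # "should not exceed 2500 by too much"


def starting_ifq_maxlen(gige_interfaces):
    """First value to try, based on the number of gigabit interfaces."""
    if gige_interfaces < 1:
        raise ValueError("need at least one interface")
    return min(gige_interfaces * PER_GIGE_INTERFACE, SOFT_CEILING)


def next_ifq_maxlen(current, congestion_delta, step=256, tolerance=10):
    """Next value to try after one load test.

    congestion_delta is how much the congestion counter grew during the
    test. step and tolerance are hypothetical choices, not OpenBSD defaults.
    """
    if congestion_delta <= tolerance:
        return current                      # counter is quiet: stop here
    return min(current + step, SOFT_CEILING)


print(starting_ifq_maxlen(2))        # 512
print(next_ifq_maxlen(512, 1000))    # 768
print(next_ifq_maxlen(768, 3))       # 768
```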
Re: firewall is very slow, something's wrong
On 10/8/07, Florin Andrei <[EMAIL PROTECTED]> wrote: > > The UDP flood still freezes the system solid (but I discovered that the > system clock continues to work more or less fine, it's just the text > console and the firewall that are not responsive). > > I still can't match the performance I get from Linux. Any suggestion is > appreciated. while it is dreadfully obvious that there is some weirdness happening, you'll definitely get more performance by switching to the latest snapshot or wait for your 4.2 cd if it hasn't come yet. What model transport do you have and what's the mainboard's BIOS rev?
Re: firewall is very slow, something's wrong
knitti wrote:
> there were in the past postings on this list about problems with
> quad-port em NICs. I am absolutely not in a position to tell whether
> they are relevant for this situation. If I remember correctly, there
> was a problem with TCP checksum offloading, and a suggested fix in one
> instance was jumpering the card down to 66 MHz. I can't tell if this
> is related in *any* way. I think there are some people here who
> *could* tell if you'd post a dmesg.

# dmesg
OpenBSD 4.1 (GENERIC.MP) #1152: Sat Mar 10 19:22:57 MST 2007
    [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3220754432 (3145268K)
avail mem = 2757828608 (2693192K)
using 22937 buffers containing 322281472 bytes (314728K) of memory
mainbus0 (root)
bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf97e0 (61 entries)
bios0: empty empty
acpi0 at mainbus0: rev 2
acpi0: tables DSDT FACP APIC OEMB SRAT
acpitimer at acpi0 not configured
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Dual-Core AMD Opteron(tm) Processor 2216, 2394.33 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: apic clock running at 205MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
ioapic0 at mainbus0 apid 4 pa 0xfec0, version 11, 16 pins
ioapic1 at mainbus0 apid 5 pa 0xfec01000, version 11, 16 pins
ioapic2 at mainbus0 apid 6 pa 0xfec02000, version 11, 16 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (P0P1)
acpiprt2 at acpi0: bus 2 (P1P2)
acpiprt3 at acpi0: bus 3 (BR14)
acpiprt4 at acpi0: bus 4 (BR1E)
acpiprt5 at acpi0: bus 5 (BR28)
acpiprt6 at acpi0: bus 6 (BR32)
acpiprt7 at acpi0: bus 7 (BR3C)
acpibtn at acpi0 not configured
ipmi0 at mainbus0: reserve send fails
pci0 at mainbus0 bus 0: configuration mode 1
ppb0 at pci0 dev 1 function 0 "ServerWorks HT-1000 PCI" rev 0x00
pci1 at ppb0 bus 1
ppb1 at pci1 dev 13 function 0 "ServerWorks HT-1000 PCIX" rev 0xc0
pci2 at ppb1 bus 2
pciide0 at pci1 dev 14 function 0 "ServerWorks HT-1000 SATA" rev 0x00: DMA
pciide0: using apic 4 int 11 (irq 11) for native-PCI interrupt
pciide0: port 0: device present, speed: 1.5Gb/s
wd0 at pciide0 channel 0 drive 0:
wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
pciide0: port 1: PHY offline
pciide0: port 2: PHY offline
pciide0: port 3: PHY offline
piixpm0 at pci0 dev 2 function 0 "ServerWorks HT-1000" rev 0x00: polling
iic0 at piixpm0
adt0 at iic0 addr 0x2e: emc6d100 rev 0x68
pciide1 at pci0 dev 2 function 1 "ServerWorks HT-1000 IDE" rev 0x00: DMA
atapiscsi0 at pciide1 channel 0 drive 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: SCSI0 5/cdrom removable
cd0(pciide1:0:1): using PIO mode 4, DMA mode 2, Ultra-DMA mode 0
pcib0 at pci0 dev 2 function 2 "ServerWorks HT-1000 LPC" rev 0x00
ohci0 at pci0 dev 3 function 0 "ServerWorks HT-1000 USB" rev 0x01: apic 4 int 10 (irq 10), version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ohci1 at pci0 dev 3 function 1 "ServerWorks HT-1000 USB" rev 0x01: apic 4 int 10 (
Re: firewall is very slow, something's wrong
On 10/8/07, Florin Andrei <[EMAIL PROTECTED]> wrote:
> I still can't match the performance I get from Linux. Any suggestion is
> appreciated.

There have been postings on this list in the past about problems with quad-port em NICs. I am absolutely not in a position to tell whether they are relevant for this situation. If I remember correctly, there was a problem with TCP checksum offloading, and a suggested fix in one instance was jumpering the card down to 66 MHz. I can't tell if this is related in *any* way. I think there are some people here who *could* tell if you'd post a dmesg.

greetings, knitti
Re: firewall is very slow, something's wrong
Florin Andrei wrote:
> I expected OpenBSD 4.1 to do better. But the thing is, even without the
> UDP flood, the OpenBSD firewall is very slow. I am downloading a huge
> file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I
> get 112 Mbyte / sec.
>
> Something's wrong. Or I'm doing something wrong.

I disabled all pf rules including NAT, so now it's just "pass in ; pass out".

Now the download is able to saturate the gig ports, at about 112 Mbyte / sec. But it's still not constantly at 112; it sometimes drops about 10% below that. When that happens, CPU0 has 0% idle cycles. There are a lot of interrupts, always above 70% on CPU0, going to 99% when the download slows down.

The congestion counter is now 0.

The UDP flood still freezes the system solid (but I discovered that the system clock continues to work more or less fine, it's just the text console and the firewall that are not responsive).

I still can't match the performance I get from Linux. Any suggestion is appreciated.

--
Florin Andrei
http://florin.myip.org/
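As a sanity check on the 112 Mbyte / sec figure quoted in this thread, here is a rough sketch that estimates the TCP payload ceiling of a gigabit Ethernet link. It is not from the thread: the constant names and the function `max_payload_rate` are made up for illustration, and it assumes IPv4, a 1500-byte MTU, and no TCP options unless a larger header size is passed in. The thread does not say whether "Mbyte" means 10^6 or 2^20 bytes, so both are printed.

```python
# Rough ceiling for TCP payload throughput over gigabit Ethernet.
# All names here are illustrative assumptions, not taken from the thread.

LINE_RATE_BITS = 1_000_000_000  # nominal gigabit line rate
MTU = 1500                      # assumed IP MTU, matching the em0/em1 output
ETH_OVERHEAD = 38               # preamble+SFD 8, header 14, FCS 4, inter-frame gap 12
IP_HEADER = 20                  # IPv4 header without options


def max_payload_rate(mtu=MTU, tcp_header=20):
    """Return the best-case TCP payload rate in bytes per second."""
    payload = mtu - IP_HEADER - tcp_header  # application bytes per frame
    on_wire = mtu + ETH_OVERHEAD            # bytes the frame occupies on the wire
    return LINE_RATE_BITS / 8 * payload / on_wire


for label, header in (("no TCP options", 20), ("with TCP timestamps", 32)):
    rate = max_payload_rate(tcp_header=header)
    print(f"{label}: {rate / 1e6:.1f} x 10^6 bytes/s, {rate / 2**20:.1f} x 2^20 bytes/s")
```

Under these assumptions the ceiling is a little under 119 million bytes per second, so a steady 112 Mbyte / sec is consistent with a saturated link, while 4 Mbyte / sec is far below it.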
Re: firewall is very slow, something's wrong
Stuart Henderson wrote:
> On 2007/10/04 17:48, Florin Andrei wrote:
>> All firewall rules are written as stateless as possible - I don't need
>> stateful filtering, the setup is very simple (allow HTTP inbound, allow
>> a few ICMP types, and that's it).
>>
>> congestion 116169 197.2/s
>
> Try setting net.inet.ip.ifq.maxlen to 256 (sysctl/sysctl.conf), if you
> still see the congestion count increasing then search for
> net.inet.ip.ifq.maxlen in the list archives and have a read.

I raised maxlen to 300. I also enabled ACPI.

It's still slow. The congestion counter is still not zero; it is currently at 386.5/s.

One good thing is that there used to be a big pause when the kernel was booting up, probably waiting for some device or something. Now, with ACPI, the pause is smaller. It's still waiting for something, just not as long.

I am watching the system with top, set to update every 1s, and I noticed there are a lot of interrupt load bursts on CPU0. The percentage of interrupt load is very uneven, sometimes as low as 15%, sometimes as high as 75%.

I unleashed the UDP flood and the firewall is totally frozen: I can't do anything, even on the local keyboard. Not even the display (running top) gets updated anymore. The machine is frozen solid. All network traffic stops immediately. Kill the UDP flood and OpenBSD resumes normal operation.

I tried the uniprocessor kernel and it's exactly the same.

Comparison with Linux on the exact same hardware: HTTP download speed through the firewall is 112 Mbyte / sec (saturating the GigE ports) and the interrupt load is relatively low and constant, about 30%. Under UDP flood with Linux as a firewall, the current download finishes up, but a new one cannot get started. The system is not frozen at all, it's quite usable. In fact, I can heavily overload it (running a bunch of CPU hogs) to the point where userspace becomes sluggish and the load average is up to 250 or so, yet the firewall is not influenced at all.

So what's the deal here? The heavy interrupt load percentage seems to indicate an issue with the network driver, if I'm not mistaken. But these are good and quite popular network cards: Intel Pro/1000 PCI Express 4x dual-port gigabit, seen by the kernel as em0 and em1.

--
Florin Andrei
http://florin.myip.org/
Re: firewall is very slow, something's wrong
On Thu, Oct 04, 2007 at 05:48:50PM -0700, Florin Andrei wrote:
> Dual-homed firewall, web server on the private network, firewall is
> doing 1:1 NAT for the web server to the public interface of the
> firewall. em0 is the public interface, em1 is the private one.
>
> In the exact same setup (same hardware even) I am comparing Linux and
> OpenBSD for a firewall. Installed Linux on a hard-disc, OpenBSD on
> another disc, and I'm just swapping discs while I'm testing.
> All firewall rules are written as stateless as possible - I don't need
> stateful filtering, the setup is very simple (allow HTTP inbound, allow
> a few ICMP types, and that's it).
>
> With Linux, I achieve gigabit transfer speeds through the firewall
> (saturating the network ports), but the firewall refuses to let any new
> connection through when I flood it with a bunch of small UDP packets
> with random source addresses.
>
> I expected OpenBSD 4.1 to do better. But the thing is, even without the
> UDP flood, the OpenBSD firewall is very slow. I am downloading a huge
> file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I
> get 112 Mbyte / sec.
>
> Something's wrong. Or I'm doing something wrong.
>
> The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using
> the SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual
> gigabit port, it carries both em0 and em1.
>

I guess you need to "enable acpi" with config(8), as the system is quite new and most newer systems have busted MP BIOS info. The effect is bad interrupt routing and other craziness, which often shows up as a slow system.

--
:wq Claudio
Re: firewall is very slow, something's wrong
On 2007/10/04 17:48, Florin Andrei wrote:
> All firewall rules are written as stateless as possible - I don't need
> stateful filtering, the setup is very simple (allow HTTP inbound, allow a
> few ICMP types, and that's it).

You might want to rethink this; stateless rulesets are usually slower. This is interesting: http://www.undeadly.org/cgi?action=article&sid=20060927091645

> congestion 116169 197.2/s

Try setting net.inet.ip.ifq.maxlen to 256 (sysctl/sysctl.conf). If you still see the congestion count increasing, search for net.inet.ip.ifq.maxlen in the list archives and have a read.
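A note on reading that counter: the pf status output in the original post reports "Enabled for 0 days 00:09:49", and the rate column there is consistent with each total divided by that uptime. The sketch below checks this arithmetic using only figures quoted in the thread. The helper names `parse_uptime` and `average_rate` are hypothetical, and the interpretation of the rate as a since-enabled average is an inference from the numbers rather than something stated in the thread.

```python
# Check that the pf "Rate" column matches Total / seconds-since-enabled.
# Helper names are hypothetical; the figures are the ones quoted in the thread.

def parse_uptime(text):
    """Convert a string like '0 days 00:09:49' into seconds."""
    days, _, clock = text.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def average_rate(total, uptime_seconds):
    """Average events per second over the whole uptime."""
    return total / uptime_seconds


uptime = parse_uptime("0 days 00:09:49")
counters = [("searches", 3809717), ("match", 1812847), ("congestion", 116169)]
for name, total in counters:
    print(f"{name:<12}{total:>10}{average_rate(total, uptime):>10.1f}/s")
```

If the rate is indeed an average since pf was enabled, it moves slowly, so checking whether congestion is "still increasing" is more reliable when done by comparing the Total column across two samples taken some seconds apart.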
firewall is very slow, something's wrong
Dual-homed firewall, web server on the private network, firewall is doing 1:1 NAT for the web server to the public interface of the firewall. em0 is the public interface, em1 is the private one.

In the exact same setup (same hardware even) I am comparing Linux and OpenBSD for a firewall. Installed Linux on a hard-disc, OpenBSD on another disc, and I'm just swapping discs while I'm testing.
All firewall rules are written as stateless as possible - I don't need stateful filtering, the setup is very simple (allow HTTP inbound, allow a few ICMP types, and that's it).

With Linux, I achieve gigabit transfer speeds through the firewall (saturating the network ports), but the firewall refuses to let any new connection through when I flood it with a bunch of small UDP packets with random source addresses.

I expected OpenBSD 4.1 to do better. But the thing is, even without the UDP flood, the OpenBSD firewall is very slow. I am downloading a huge file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I get 112 Mbyte / sec.

Something's wrong. Or I'm doing something wrong.

The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using the SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual gigabit port, it carries both em0 and em1.
=
lo0: flags=8049 mtu 33192
        groups: lo
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
fxp0: flags=8802 mtu 1500
        lladdr 00:e0:81:4a:0a:7f
        media: Ethernet autoselect (none)
        status: no carrier
bge0: flags=8802 mtu 1500
        lladdr 00:e0:81:4a:0a:a8
        media: Ethernet autoselect (none)
        status: no carrier
bge1: flags=8802 mtu 1500
        lladdr 00:e0:81:4a:0a:a9
        media: Ethernet autoselect (none)
        status: no carrier
em0: flags=8843 mtu 1500
        lladdr 00:15:17:37:e9:fa
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet 10.123.0.10 netmask 0xff00 broadcast 10.123.0.255
        inet6 fe80::215:17ff:fe37:e9fa%em0 prefixlen 64 scopeid 0x4
        inet 10.123.0.253 netmask 0x broadcast 10.123.0.253
em1: flags=8843 mtu 1500
        lladdr 00:15:17:37:e9:fb
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet 10.123.1.10 netmask 0xff00 broadcast 10.123.1.255
        inet6 fe80::215:17ff:fe37:e9fb%em1 prefixlen 64 scopeid 0x5
pflog0: flags=141 mtu 33192
enc0: flags=0<> mtu 1536
==
TRANSLATION RULES:
binat on em0 inet from 10.123.1.253 to any -> 10.123.0.253

FILTER RULES:
pass quick on em1 all no state
pass in quick on em0 inet proto tcp from any to 10.123.1.253 port = www no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type echoreq no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type echorep no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type unreach no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type paramprob no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type trace no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type timex no state
pass in quick on em0 inet from any to 10.123.0.10 no state
block drop in quick all
pass out all no state
No queue in use

STATES:
all tcp 10.123.1.253:80 <- 10.123.0.253:80 <- 10.123.0.251:47108 ESTABLISHED:ESTABLISHED

INFO:
Status: Enabled for 0 days 00:09:49           Debug: Urgent

State Table                          Total             Rate
  current entries                        1
  searches                         3809717         6468.1/s
  inserts                                6            0.0/s
  removals                               5            0.0/s
Counters
  match                            1812847         3077.8/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                        116169          197.2/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

TIMEOUTS:
tcp.first                   30s
tcp.opening                  5s
tcp.established          18000s
tcp.closing                 60s
tcp.finwait