Re: SOLVED [was: firewall is very slow, something's wrong]

2007-10-17 Thread Henning Brauer
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-17 00:16]:
> HOLY SH*T! I tried 4.2. It rocks!
>
> Just the first test that I tried after installing it:
> - switched gigabit network
> - web server behind 1:1 NATing firewall
> - firewall is AMD64 X2 2.4GHz
> - downloading 2GB file via HTTP through the firewall in infinite loop
> - flooding the firewall with small UDP packets, random source IPs, 
> generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI 
> Express card, Linux Fedora 7, running the kernel-level "pktgen" packet 
> generator which is very fast) can crank them out. The packets are directed 
> to the NATed address of the web server, to a port that's blocked by the 
> firewall.
>
> Under these conditions, OpenBSD 4.1 as a firewall just keels over and dies. 
> All traffic through the firewall just stops in an instant.
> Linux 2.6.18 fares slightly better, the current download finishes up, but 
> another one won't start.
>
> But the default OpenBSD 4.2 i386 uniprocessor kernel doesn't seem to care. 

lovely :)

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: SOLVED [was: firewall is very slow, something's wrong]

2007-10-16 Thread Florin Andrei

Stuart Henderson wrote:
> On 2007/10/16 15:27, James Hartley wrote:
> > Secondly, does anyone on the mailing list know of an OpenBSD
> > equivalent to pktgen?
>
> Not in-kernel, but netblast from the netrate package is somewhat
> useful.


If anybody has a same-hardware performance comparison between pktgen and 
netblast, please post it. I'm especially interested in generating lots 
of small packets, which is difficult.


--
Florin Andrei

http://florin.myip.org/



Re: SOLVED [was: firewall is very slow, something's wrong]

2007-10-16 Thread Stuart Henderson
On 2007/10/16 15:27, James Hartley wrote:
> On 10/16/07, Florin Andrei <[EMAIL PROTECTED]> wrote:
> > - flooding the firewall with small UDP packets, random source IPs,
> > generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI
> > Express card, Linux Fedora 7, running the kernel-level "pktgen" packet
> > generator which is very fast) can crank them out.
> 
> First, thanks for sharing your findings.
> 
> Secondly, does anyone on the mailing list know of an OpenBSD
> equivalent to pktgen?

Not in-kernel, but netblast from the netrate package is somewhat
useful.



Re: SOLVED [was: firewall is very slow, something's wrong]

2007-10-16 Thread James Hartley
On 10/16/07, Florin Andrei <[EMAIL PROTECTED]> wrote:
> - flooding the firewall with small UDP packets, random source IPs,
> generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI
> Express card, Linux Fedora 7, running the kernel-level "pktgen" packet
> generator which is very fast) can crank them out.

First, thanks for sharing your findings.

Secondly, does anyone on the mailing list know of an OpenBSD
equivalent to pktgen?

Thanks.

Jim



SOLVED [was: firewall is very slow, something's wrong]

2007-10-16 Thread Florin Andrei

Florin Andrei wrote:
> ##
> Huge performance improvements in the network stack, including:
> * In pf, store routing table ID, queue ID etc directly in the packet
> header mbuf instead of using mbuf tags (which use malloc'd memory). This
> yields a 100% improvement in pf performance.
> * Skip TCP/UDP/ICMP/ICMP6 checksumming when not necessary. This
> yields a further 10% improvement in pf performance.
> * A change in the way the kernel random pool is stirred greatly
> increases performance with network interface cards that support
> interrupt mitigation, especially on architectures where reading the
> clock is expensive (such as amd64).
> ##
>
> I'll try 4.2.


HOLY SH*T! I tried 4.2. It rocks!

Just the first test that I tried after installing it:
- switched gigabit network
- web server behind 1:1 NATing firewall
- firewall is AMD64 X2 2.4GHz
- downloading 2GB file via HTTP through the firewall in infinite loop
- flooding the firewall with small UDP packets, random source IPs, 
generated as fast as my workstation (AMD64 X2 6400, Intel Pro/1000 PCI 
Express card, Linux Fedora 7, running the kernel-level "pktgen" packet 
generator which is very fast) can crank them out. The packets are 
directed to the NATed address of the web server, to a port that's 
blocked by the firewall.


Under these conditions, OpenBSD 4.1 as a firewall just keels over and 
dies. All traffic through the firewall just stops in an instant.
Linux 2.6.18 fares slightly better, the current download finishes up, 
but another one won't start.


But the default OpenBSD 4.2 i386 uniprocessor kernel doesn't seem to 
care. The download just keeps going. New downloads are initiated OK 
through the firewall. There are even spare CPU cycles left :-) not many 
(10%) but still. There's a very large percentage of CPU (80...90%) used 
for interrupts.


Good job folks, I'm impressed.

Anyone building gigabit routers and firewalls, don't delay, upgrade to 
4.2. Heck, do that even for 100Mbit systems, this type of DoS doesn't 
need much bandwidth to be effective.
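
For a sense of scale on why a small-packet flood needs little bandwidth, here is a rough sketch of the packet rate a link can carry at minimum Ethernet frame size. The 84-byte figure (64-byte frame plus 8 bytes of preamble and 12 bytes of inter-frame gap) is standard Ethernet framing, not something measured in this test, and the function name is made up for illustration.

```python
# Rough upper bound on packets per second for minimum-size Ethernet frames.
# Assumption: each frame occupies 84 bytes on the wire
# (64-byte minimum frame + 8-byte preamble/SFD + 12-byte inter-frame gap).
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12


def max_small_packet_rate(link_bits_per_second: int) -> int:
    """Return the theoretical maximum packets/second at minimum frame size."""
    bits_per_frame = WIRE_BYTES_PER_MIN_FRAME * 8
    return link_bits_per_second // bits_per_frame


if __name__ == "__main__":
    for name, rate in (("100 Mbit/s", 100_000_000), ("1 Gbit/s", 1_000_000_000)):
        print(f"{name}: about {max_small_packet_rate(rate)} packets/s")
```

Every one of those packets has to be received and evaluated by the firewall regardless of its size, so the packet count matters more than the bandwidth here.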


I'll keep doing tests. If anything interesting shows up, I'll post the 
results in a new thread.


--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-10 Thread Henning Brauer
* Robert C Wittig <[EMAIL PROTECTED]> [2007-10-10 20:45]:
> If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig RAM 
> and a 64 bit CPU, which would be a better choice, in general?

for a packet filter/router/...? 32bit 2Gig and take a gig out.
for a database server? 64bit and add ram when required.
there is no "in general".

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: firewall is very slow, something's wrong

2007-10-10 Thread Paul de Weerd
On Wed, Oct 10, 2007 at 12:34:48PM -0500, Robert C Wittig wrote:
| If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig
| RAM and a 64 bit CPU, which would be a better choice, in general?

There is no such generalization. The amount of RAM you need depends on
the task. For firewalling, you don't need lots. For a high-traffic,
caching webserver you do need much.

If, in general, you are firewalling .. you won't need much RAM. If, in
general, you are doing something else, you might need it. Like I said
in my previous mail, there is no short answer. No quick solution.
Everything has advantages and disadvantages. In some cases you may not
even want to run OpenBSD (*shock* !).

In general, you should look at the specific problem at hand and solve
it with the means available.

Cheers,

Paul 'WEiRD' de Weerd

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/




Re: firewall is very slow, something's wrong

2007-10-10 Thread Ted Unangst
On 10/10/07, Robert C Wittig <[EMAIL PROTECTED]> wrote:
> If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig
> RAM and a 64 bit CPU, which would be a better choice, in general?

64-bit and 1 GB.  it's much easier to add another GB RAM later than to
add 32-bits.



Re: firewall is very slow, something's wrong

2007-10-10 Thread Robert C Wittig

Paul de Weerd wrote:
> wittig wrote:
> | 64 bit processors (combined with 64 bit capable operating systems) have
> | the ability to address more RAM than 32 bit processors because 64^2 is a
> | much larger number than 32^2... lots more RAM addresses).

Oops! that should have read:

2^64 and 2^32

> Depending on your software, 64 bit processors can be quite a bit
> faster. If you're dealing with 64bit integers, using 64bit registers,
> etc., a lower clocked 64bit CPU might be faster than a 32bit CPU
> clocking at a higher rate. In short: There is no short answer. It
> depends on what you're doing.

Point taken, particularly where big integers are concerned.

> From what Henning tells us (and what sounds logical to me), grabbing an
> ethernet frame from a NIC and putting it on another NIC doesn't really
> change much from 32bit to 64bit.
>
> Your compiler also comes into play. If that is more tuned towards a
> certain 32bit architecture (such as i386) than a certain 64bit arch
> (because it's less popular, such as sparc64 or hppa64 or mips64),
> this will impact your performance quite a bit.

If you had to choose between, say, 2 gig RAM and a 32 bit CPU, or 1 gig
RAM and a 64 bit CPU, which would be a better choice, in general?



--
-wittig http://www.robertwittig.com/
http://robertwittig.net/
http://robertwittig.org/
.



Re: firewall is very slow, something's wrong

2007-10-10 Thread Stuart Henderson
On 2007/10/10 11:20, Tony Abernethy wrote:
> Siju George wrote:
> 
> > > so you think a 20 ton truck is twice as fast as a 10 ton truck?
> > O.K I get it :-)
> > So when does changing from 32 bit to a 64-bit processor actually help?
> 
> Quoting Paul de Weerd,
> "In short: There is no short answer. It depends on what you're doing."
> ( Not to mention how you do it ;-)

There are other changes between i386/amd64 than the number of bits
(e.g. amd64 has more registers, which allows some other changes that
can improve performance for some things), so it depends a lot on
the code being run.

You can't even always say, "software X is faster on arch Y", since
the way you use that software can give different results.

If you're looking for "fastest", just benchmark as close to real-life
use on both, it's the easiest way. You also often need to test whether
what you're trying to run does work correctly on !i386 arch (it's not
uncommon for code to make assumptions which don't hold true on !i386).

Of course, there are reasons other than "fastest" you might choose
a particular arch.

> Short answer:
> When you *might* need more than a GB or so of RAM/swap. 
> Most anything is faster than stuck.
>
> Easy: 2:1 ratio *either direction* which is faster.
> Hard: 10:1 ratio (again either direction).

I'm not too sure I understand what you're saying here.



Re: firewall is very slow, something's wrong

2007-10-10 Thread Tony Abernethy
Siju George wrote:

> > so you think a 20 ton truck is twice as fast as a 10 ton truck?
> O.K I get it :-)
> So when does changing from 32 bit to a 64-bit processor actually help?

Quoting Paul de Weerd,
"In short: There is no short answer. It depends on what you're doing."
( Not to mention how you do it ;-)

Short answer:
When you *might* need more than a GB or so of RAM/swap. 
Most anything is faster than stuck.

Easy: 2:1 ratio *either direction* which is faster.
Hard: 10:1 ratio (again either direction).
(figure in loading/unloading times on the truck analogy)



Re: firewall is very slow, something's wrong

2007-10-10 Thread Scott Wells

And is it in a vacuum?

Peter N. M. Hansteen wrote:
> Henning Brauer <[EMAIL PROTECTED]> writes:
>
> > so you think a 20 ton truck is twice as fast as a 10 ton truck?
>
> horizontal or vertical motion? assuming a perfectly spherical truck?




Re: firewall is very slow, something's wrong

2007-10-10 Thread Siju George
On 10/10/07, Henning Brauer <[EMAIL PROTECTED]> wrote:
> * Siju George <[EMAIL PROTECTED]> [2007-10-10 15:10]:
> > On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote:
> > > * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
> > > >> then, an i386 kernel should perform considerably better than amd64 for
> > > >> firewalling/routing/...
> > > > That is surprising. What is the reason?
> > > we dunno really. it hasn't been benched in some time, so it might not even
> > > be true any more, but last time the difference was dramatic.
> > I thought by running an amd64 kernel will get me twice the speed than
> > an i386 on an amd64 machine since one is 64 bit processing and the
> > other is just 32 bit :-(
>
> so you think a 20 ton truck is twice as fast as a 10 ton truck?
>

O.K I get it :-)
So when does changing from 32 bit to a 64-bit processor actually help?

Kind Regards

Siju



Re: firewall is very slow, something's wrong

2007-10-10 Thread Tony Abernethy
Robert C Wittig wrote:
> Siju George wrote:
> 
> > I thought by running an amd64 kernel will get me twice the 
> speed than
> > an i386 on an amd64 machine since one is 64 bit processing and the
> > other is just 32 bit :-(
> > 
> 
> 64 bit processors (combined with 64 bit capable operating 
> systems) have 
> the ability to address more RAM than 32 bit processors 
> because 64^2 is a 
> much larger number than 32^2... lots more RAM addresses).

Actually 2^64 vs 2^32  (64^2 is 2^12, 64 is 2^6, 32 is 2^5)

Other things equal, 64-bit should take twice as long because it 
takes 64 bits to do anything instead of 32 bits.

Not really that simple, because accessing 32 bits can involve
1) accessing the 64 bits that the 32 bits are in.
2) selecting the appropriate 32 bits of the 64 bits.

> 
> This does not speed things up, though, until you run out of RAM, and 
> start having to access the swapfile.
The 64-bits does affect how big the swap file can be without
resorting to Rube Goldberg contraptions to identify what is what.

> 
> The processor's speed... MHz, GHz, etc., will determine how fast the 
> processor itself can process instructions.
> 
> 
> -- 
> -wittig http://www.robertwittig.com/
>  http://robertwittig.net/
>  http://robertwittig.org/
> .



Re: firewall is very slow, something's wrong

2007-10-10 Thread Jon Radel
Robert C Wittig wrote:

> 64 bit processors (combined with 64 bit capable operating systems) have
> the ability to address more RAM than 32 bit processors because 64^2 is a
> much larger number than 32^2... lots more RAM addresses).

The increase from 2^32 to 2^64 is even more impressive.  ;-)
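
To make the correction concrete, a small sketch of the arithmetic follows. It only evaluates the four expressions from this thread; the variable names are illustrative, and reading the result as bytes (one address per byte) is an assumption about what was meant.

```python
# Compare the mistaken squares (64^2, 32^2) with the intended powers of two.
squares = {"32^2": 32 ** 2, "64^2": 64 ** 2}
powers = {"2^32": 2 ** 32, "2^64": 2 ** 64}

GIB = 2 ** 30  # bytes in one gibibyte

if __name__ == "__main__":
    for label, value in {**squares, **powers}.items():
        print(f"{label} = {value}")
    # Assuming one address per byte, 2^32 addresses cover 4 GiB.
    print(f"2^32 bytes = {powers['2^32'] // GIB} GiB")
    # 2^64 is 2^32 times larger than 2^32, not merely 4 times larger.
    ratio = powers["2^64"] // powers["2^32"]
    print(f"2^64 / 2^32 = 2^{ratio.bit_length() - 1}")
```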

--Jon Radel




Re: firewall is very slow, something's wrong

2007-10-10 Thread Paul de Weerd
On Wed, Oct 10, 2007 at 09:24:25AM -0500, Robert C Wittig wrote:
| Siju George wrote:
|
| >I thought by running an amd64 kernel will get me twice the speed than
| >an i386 on an amd64 machine since one is 64 bit processing and the
| >other is just 32 bit :-(
| >
|
| 64 bit processors (combined with 64 bit capable operating systems) have
| the ability to address more RAM than 32 bit processors because 64^2 is a
| much larger number than 32^2... lots more RAM addresses).
|
| This does not speed things up, though, until you run out of RAM, and
| start having to access the swapfile.
|
| The processor's speed... MHz, GHz, etc., will determine how fast the
| processor itself can process instructions.

Depending on your software, 64 bit processors can be quite a bit
faster. If you're dealing with 64bit integers, using 64bit registers,
etc., a lower clocked 64bit CPU might be faster than a 32bit CPU
clocking at a higher rate. In short: There is no short answer. It
depends on what you're doing.

From what Henning tells us (and what sounds logical to me), grabbing an
ethernet frame from a NIC and putting it on another NIC doesn't really
change much from 32bit to 64bit.

Your compiler also comes into play. If that is more tuned towards a
certain 32bit architecture (such as i386) than a certain 64bit arch
(because it's less popular, such as sparc64 or hppa64 or mips64),
this will impact your performance quite a bit.

Cheers,

Paul 'WEiRD' de Weerd

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/




Re: firewall is very slow, something's wrong

2007-10-10 Thread Robert C Wittig

Siju George wrote:
> I thought by running an amd64 kernel will get me twice the speed than
> an i386 on an amd64 machine since one is 64 bit processing and the
> other is just 32 bit :-(



64 bit processors (combined with 64 bit capable operating systems) have 
the ability to address more RAM than 32 bit processors because 64^2 is a 
much larger number than 32^2... lots more RAM addresses).


This does not speed things up, though, until you run out of RAM, and 
start having to access the swapfile.


The processor's speed... MHz, GHz, etc., will determine how fast the 
processor itself can process instructions.



--
-wittig http://www.robertwittig.com/
http://robertwittig.net/
http://robertwittig.org/
.



Re: firewall is very slow, something's wrong

2007-10-10 Thread Peter N. M. Hansteen
Henning Brauer <[EMAIL PROTECTED]> writes:

> so you think a 20 ton truck is twice as fast as a 10 ton truck?

horizontal or vertical motion? assuming a perfectly spherical truck?

-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.datadok.no/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.



Re: firewall is very slow, something's wrong

2007-10-10 Thread Henning Brauer
* Siju George <[EMAIL PROTECTED]> [2007-10-10 15:10]:
> On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote:
> > * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
> > >> then, an i386 kernel should perform considerably better than amd64 for
> > >> firewalling/routing/...
> > > That is surprising. What is the reason?
> > we dunno really. it hasn't been benched in some time, so it might not even
> > be true any more, but last time the difference was dramatic.
> I thought by running an amd64 kernel will get me twice the speed than
> an i386 on an amd64 machine since one is 64 bit processing and the
> other is just 32 bit :-(

so you think a 20 ton truck is twice as fast as a 10 ton truck?

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: firewall is very slow, something's wrong

2007-10-10 Thread Siju George
On 10/9/07, Henning Brauer <[EMAIL PROTECTED]> wrote:
> * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
> >> then, an i386 kernel should perform considerably better than amd64 for
> >> firewalling/routing/...
> >
> > That is surprising. What is the reason?
>
> we dunno really. it hasn't been benched in some time, so it might not even
> be true any more, but last time the difference was dramatic.
>

I thought by running an amd64 kernel will get me twice the speed than
an i386 on an amd64 machine since one is 64 bit processing and the
other is just 32 bit :-(

How about on sparc64 systems? do you get twice the speed compared to
its 32 bit counterpart?

Thank you so much

Kind Regards

Siju



Re: firewall is very slow, something's wrong

2007-10-10 Thread Henning Brauer
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 22:54]:
> Henning Brauer wrote:
>> * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
>>>> then, an i386 kernel should perform considerably better than amd64 for
>>>> firewalling/routing/...
>>> That is surprising. What is the reason?
>> we dunno really. it hasn't been benched in some time, so it might not even be
>> true any more, but last time the difference was dramatic.
>
> Then I will do some tests with 4.2 on gigabit-capable hardware. If anything 
> noteworthy comes out, I'll post the results.
> Don't expect something too fancy, but I guess anything is better than 
> nothing.
>
>>> How much RAM can the i386 kernel use on an amd64 machine?
>> 4GB minus pci space
>
> Hmmm.
>
> Please correct me if I'm wrong:
> Let's say a firewall is connected to a pretty fast Internet pipe (in the 
> gigabit range). Let's say there's a DDoS against this environment. In 
> theory, the firewall would need lots of RAM so that it can deal with the 
> incoming nasty packets, create an entry for each packet in the state table 
> (don't know the correct name for it in OpenBSD, sorry), then expire it 
> after a while.
> In theory, the firewall could be tweaked to expire unused states quickly, 
> but still, more RAM is better when dealing with a DDoS.

nope.
the kernel will not ever use more than 1 GB (or was it 768MB? memory is
fuzzy).
more than 1 GB of memory on a firewall even hurts. ok, not much. but a
bit.

> What's still not clear to me is how much RAM I should provision per 1Gb of 
> bandwidth on OpenBSD, assuming there's an incoming worst-case-scenario 
> DDoS, that consumes RAM (and other resources) on the firewall yet leaves 
> some bandwidth open for legitimate traffic (so the firewall must be able to 
> continue to let the good traffic pass through). Also assuming some tweaking 
> has been done on the firewall to expire the bad stuff quickly without 
> affecting legitimate traffic.

RAM is not your concern on a firewall.

>>> If the SMP kernel does not actually hurt performance, I might have to use 
>>> it.
>> it does. seriously. locking is not free.
>
> Aw, damn. I was hoping that's not quite the case.
>
> Well, then hopefully the dynamic routing daemons won't get too greedy and 
> DoS the firewall from within. :-)

no, they won't.
they only get the cpu cycles not required for packet forwarding (well, 
interrupts + softint handling really) anyway.

> Or I may have to re-think the whole 
> environment and forget the idea of doing any kind of dynamic routing on the 
> firewall - from a security perspective, dynamic routing on the firewall 
> sucks anyway.

no, not really, not if done right.

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: firewall is very slow, something's wrong

2007-10-09 Thread Florin Andrei

Henning Brauer wrote:
> * Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
>>> then, an i386 kernel should perform considerably better than amd64 for
>>> firewalling/routing/...
>> That is surprising. What is the reason?
>
> we dunno really. it hasn't been benched in some time, so it might not even
> be true any more, but last time the difference was dramatic.


Then I will do some tests with 4.2 on gigabit-capable hardware. If 
anything noteworthy comes out, I'll post the results.
Don't expect something too fancy, but I guess anything is better than 
nothing.



>> How much RAM can the i386 kernel use on an amd64 machine?
>
> 4GB minus pci space


Hmmm.

Please correct me if I'm wrong:
Let's say a firewall is connected to a pretty fast Internet pipe (in the 
gigabit range). Let's say there's a DDoS against this environment. In 
theory, the firewall would need lots of RAM so that it can deal with the 
incoming nasty packets, create an entry for each packet in the state 
table (don't know the correct name for it in OpenBSD, sorry), then 
expire it after a while.
In theory, the firewall could be tweaked to expire unused states 
quickly, but still, more RAM is better when dealing with a DDoS.


What's still not clear to me is how much RAM I should provision per 1Gb 
of bandwidth on OpenBSD, assuming there's an incoming 
worst-case-scenario DDoS, that consumes RAM (and other resources) on the 
firewall yet leaves some bandwidth open for legitimate traffic (so the 
firewall must be able to continue to let the good traffic pass through). 
Also assuming some tweaking has been done on the firewall to expire the 
bad stuff quickly without affecting legitimate traffic.


But all that depends on the actual legitimate traffic and on the 
firewall rules.

I guess that's another way of saying "more tests are needed". :-/

>> If the SMP kernel does not actually hurt performance, I might have to use
>> it.
>
> it does. seriously. locking is not free.


Aw, damn. I was hoping that's not quite the case.

Well, then hopefully the dynamic routing daemons won't get too greedy 
and DoS the firewall from within. :-) Or I may have to re-think the 
whole environment and forget the idea of doing any kind of dynamic 
routing on the firewall - from a security perspective, dynamic routing 
on the firewall sucks anyway.


Looks like my performance test matrix just got bigger by a factor of 2x. 
:-/ But the bad combinations should get pruned pretty quickly, I guess.


+-----+-------+-------+
|  \  | i386  | amd64 |
+-----+-------+-------+
| SMP |       |       |
+-----+-------+-------+
| UP  |       |       |
+-----+-------+-------+
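
As a sketch of how the four cells of that matrix could be enumerated for a test run, assuming nothing beyond the two axes named in this message (the function and constant names are hypothetical, and no benchmark is actually executed):

```python
from itertools import product

# The two axes of the test matrix: kernel flavour and architecture.
ARCHITECTURES = ("i386", "amd64")
KERNELS = ("SMP", "UP")


def build_matrix(architectures=ARCHITECTURES, kernels=KERNELS):
    """Return every (kernel, architecture) combination to benchmark."""
    return list(product(kernels, architectures))


if __name__ == "__main__":
    for kernel, arch in build_matrix():
        print(f"{arch:5s} {kernel}")
```

Adding a third axis (for example the OpenBSD release) would only mean passing one more sequence to product(), which is where the "factor of 2x" growth comes from.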

--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-09 Thread Henning Brauer
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-09 19:34]:
>> then, an i386 kernel should perform considerably better than amd64 for 
>> firewalling/routing/...
>
> That is surprising. What is the reason?

we dunno really. it hasn't been benched in some time, so it might not even
be true any more, but last time the difference was dramatic.

> How much RAM can the i386 kernel use on an amd64 machine?

4GB minus pci space

>> next, you don't want SMP for such tasks. take out the second CPU and give 
>> it to somebody who can use it, and run the uniprocessor kernel.
> So, assuming the box is a pure firewall / static router (so just pf and 
> static routes), even with multiple interfaces, all those tasks run in a 
> single kernel thread?

yup

> Now here's the second thing: if this firewall needs to be integrated in an 
> environment with dynamic routing, it will need to run some kind of dynamic 
> routing daemon(s). For that, I'd like to have at least two cores on the 
> system, and a kernel that can take advantage of them.

the required locking will cost you more than the second cpu/core 
will ever gain you.

> If the SMP kernel does not actually hurt performance, I might have to use 
> it.

it does. seriously. locking is not free.

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: firewall is very slow, something's wrong

2007-10-09 Thread Florin Andrei

Henning Brauer wrote:
> First, you want to run 4.2 or -current, that should about double your
> throughput.


Yes, I was looking at a paragraph in the 4.2 release notes and I thought 
all those things might be related exactly to the problem I'm seeing:


##
Huge performance improvements in the network stack, including:
* In pf, store routing table ID, queue ID etc directly in the 
packet header mbuf instead of using mbuf tags (which use malloc'd 
memory). This yields a 100% improvement in pf performance.
* Skip TCP/UDP/ICMP/ICMP6 checksumming when not necessary. This 
yields a further 10% improvement in pf performance.
* A change in the way the kernel random pool is stirred greatly 
increases performance with network interface cards that support 
interrupt mitigation, especially on architectures where reading the 
clock is expensive (such as amd64).

##

I'll try 4.2.

> then, an i386 kernel should perform considerably better than amd64 for
> firewalling/routing/...


That is surprising. What is the reason?

How much RAM can the i386 kernel use on an amd64 machine?

> next, you don't want SMP for such tasks. take out the second CPU and
> give it to somebody who can use it, and run the uniprocessor kernel.


So, assuming the box is a pure firewall / static router (so just pf and 
static routes), even with multiple interfaces, all those tasks run in a 
single kernel thread?


Now here's the second thing: if this firewall needs to be integrated in 
an environment with dynamic routing, it will need to run some kind of 
dynamic routing daemon(s). For that, I'd like to have at least two cores 
on the system, and a kernel that can take advantage of them.
If the SMP kernel does not actually hurt performance, I might have to 
use it.


--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-09 Thread Florin Andrei

Karsten McMinn wrote:
> while it is dreadfully obvious that there is some weirdness
> happening, you'll definitely get more performance by
> switching to the latest snapshot or wait for your 4.2 cd

Just ordered it yesterday. ;-)

> if it hasn't come yet.  What model transport do you have
> and what's the mainboard's BIOS rev?


Tyan Transport GT24-B3992
BIOS Date: 03/06/07 09:36:13 Ver: 08.00.11

--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-09 Thread Henning Brauer
* Florin Andrei <[EMAIL PROTECTED]> [2007-10-05 03:55]:
> The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using the 
> SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual gigabit 
> port, it carries both em0 and em1.

First, you want to run 4.2 or -current, that should about double your 
throughput.
then, an i386 kernel should perform considerably better than amd64 for 
firewalling/routing/...
next, you don't want SMP for such tasks. take out the second CPU and 
give it to somebody who can use it, and run the uniprocessor kernel.
last, increase net.inet.ip.ifq.maxlen until you see the congestion 
counter not increasing much any more under load. should not exceed 2500 
by too much. as a rule of thumb, 256 per gigE interface aren't too far 
off.
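
A minimal sketch of that rule of thumb, for picking a starting value before tuning against the congestion counter. Treating 2500 as a hard ceiling is an assumption (the advice above only says not to exceed it by much), and the function names are invented for illustration. The sysctl line is only formatted, not executed.

```python
# Rule-of-thumb starting point for net.inet.ip.ifq.maxlen, per the advice above:
# about 256 per gigabit interface, and not much beyond 2500 in total.
PER_GIGE_INTERFACE = 256
SOFT_CEILING = 2500  # assumption: used here as a hard cap for simplicity


def suggested_ifq_maxlen(gige_interfaces: int) -> int:
    """Return a starting value; tune further by watching the congestion counter."""
    if gige_interfaces < 1:
        raise ValueError("need at least one interface")
    return min(PER_GIGE_INTERFACE * gige_interfaces, SOFT_CEILING)


def sysctl_command(value: int) -> str:
    """Format the sysctl invocation that would apply the value (not executed here)."""
    return f"sysctl net.inet.ip.ifq.maxlen={value}"


if __name__ == "__main__":
    # Two gigabit ports (em0 and em1), as on the machine in this thread.
    print(sysctl_command(suggested_ifq_maxlen(2)))
```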

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: firewall is very slow, something's wrong

2007-10-08 Thread Karsten McMinn
On 10/8/07, Florin Andrei <[EMAIL PROTECTED]> wrote:
> 
> The UDP flood still freezes the system solid (but I discovered that the
> system clock continues to work more or less fine, it's just the text
> console and the firewall that are not responsive).
>
> I still can't match the performance I get from Linux. Any suggestion is
> appreciated.

while it is dreadfully obvious that there is some weirdness
happening, you'll definitely get more performance by
switching to the latest snapshot or wait for your 4.2 cd
if it hasn't come yet.  What model transport do you have
and what's the mainboard's BIOS rev?



Re: firewall is very slow, something's wrong

2007-10-08 Thread Florin Andrei

knitti wrote:
> there were in the past postings on this list about problems with quad-port
> em NICs. I am absolutely not in a position to tell whether they are relevant
> for this situation.  If I remember correctly, there was a problem with TCP
> checksum offloading, and a suggested fix in one instance was jumpering
> the card down to 66 MHz. I can't tell if this is related in *any* way.
>
> I think there are some people here who *could* tell if you'd post a dmesg.


# dmesg 



OpenBSD 4.1 (GENERIC.MP) #1152: Sat Mar 10 19:22:57 MST 2007
[EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3220754432 (3145268K)
avail mem = 2757828608 (2693192K)
using 22937 buffers containing 322281472 bytes (314728K) of memory
mainbus0 (root)
bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf97e0 (61 entries)
bios0: empty empty
acpi0 at mainbus0: rev 2
acpi0: tables DSDT FACP APIC OEMB SRAT
acpitimer at acpi0 not configured
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Dual-Core AMD Opteron(tm) Processor 2216, 2394.33 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: apic clock running at 205MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu2: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Dual-Core AMD Opteron(tm) Processor 2216, 2465.82 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu3: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
ioapic0 at mainbus0 apid 4 pa 0xfec0, version 11, 16 pins
ioapic1 at mainbus0 apid 5 pa 0xfec01000, version 11, 16 pins
ioapic2 at mainbus0 apid 6 pa 0xfec02000, version 11, 16 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (P0P1)
acpiprt2 at acpi0: bus 2 (P1P2)
acpiprt3 at acpi0: bus 3 (BR14)
acpiprt4 at acpi0: bus 4 (BR1E)
acpiprt5 at acpi0: bus 5 (BR28)
acpiprt6 at acpi0: bus 6 (BR32)
acpiprt7 at acpi0: bus 7 (BR3C)
acpibtn at acpi0 not configured
ipmi0 at mainbus0: reserve send fails
pci0 at mainbus0 bus 0: configuration mode 1
ppb0 at pci0 dev 1 function 0 "ServerWorks HT-1000 PCI" rev 0x00
pci1 at ppb0 bus 1
ppb1 at pci1 dev 13 function 0 "ServerWorks HT-1000 PCIX" rev 0xc0
pci2 at ppb1 bus 2
pciide0 at pci1 dev 14 function 0 "ServerWorks HT-1000 SATA" rev 0x00: DMA
pciide0: using apic 4 int 11 (irq 11) for native-PCI interrupt
pciide0: port 0: device present, speed: 1.5Gb/s
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
pciide0: port 1: PHY offline
pciide0: port 2: PHY offline
pciide0: port 3: PHY offline
piixpm0 at pci0 dev 2 function 0 "ServerWorks HT-1000" rev 0x00: polling
iic0 at piixpm0
adt0 at iic0 addr 0x2e: emc6d100 rev 0x68
pciide1 at pci0 dev 2 function 1 "ServerWorks HT-1000 IDE" rev 0x00: DMA
atapiscsi0 at pciide1 channel 0 drive 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  SCSI0 5/cdrom removable
cd0(pciide1:0:1): using PIO mode 4, DMA mode 2, Ultra-DMA mode 0
pcib0 at pci0 dev 2 function 2 "ServerWorks HT-1000 LPC" rev 0x00
ohci0 at pci0 dev 3 function 0 "ServerWorks HT-1000 USB" rev 0x01: apic 4 int 10 (irq 10), version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ohci1 at pci0 dev 3 function 1 "ServerWorks HT-1000 USB" rev 0x01: apic 
4 int 10 (

Re: firewall is very slow, something's wrong

2007-10-08 Thread knitti
On 10/8/07, Florin Andrei <[EMAIL PROTECTED]> wrote:
> I still can't match the performance I get from Linux. Any suggestion is
> appreciated.

There have been postings on this list in the past about problems with
quad-port em NICs. I am absolutely not in a position to tell whether they
are relevant to this situation.  If I remember correctly, there was a
problem with TCP checksum offloading, and a suggested fix in one instance
was jumpering the card down to 66 MHz. I can't tell if this is related in
*any* way.

I think there are some people here who *could* tell if you'd post a dmesg.

greetings,
knitti



Re: firewall is very slow, something's wrong

2007-10-08 Thread Florin Andrei

Florin Andrei wrote:

> I expected OpenBSD 4.1 to do better. But the thing is, even without the
> UDP flood, the OpenBSD firewall is very slow. I am downloading a huge
> file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I
> get 112 Mbyte / sec.
>
> Something's wrong. Or I'm doing something wrong.


I disabled all pf rules including NAT; now it's just "pass in ; pass out".
Now the download is able to saturate the gig ports, at about 112 Mbyte / sec.
But it's not constantly at 112; it sometimes drops about 10% below that.
When that happens, CPU0 has 0% idle cycles. There are a lot of interrupts,
always above 70% on CPU0, going to 99% when the download slows down.

The congestion counter is now 0.

The UDP flood still freezes the system solid (but I discovered that the 
system clock continues to work more or less fine, it's just the text 
console and the firewall that are not responsive).


I still can't match the performance I get from Linux. Any suggestion is 
appreciated.


--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-08 Thread Florin Andrei

Stuart Henderson wrote:

> On 2007/10/04 17:48, Florin Andrei wrote:
>> All firewall rules are written as stateless as possible - I don't need
>> stateful filtering, the setup is very simple (allow HTTP inbound, allow a
>> few ICMP types, and that's it).
>>
>>   congestion 116169 197.2/s
>
> Try setting net.inet.ip.ifq.maxlen to 256 (sysctl/sysctl.conf),
> if you still see the congestion count increasing then search for
> net.inet.ip.ifq.maxlen in the list archives and have a read.


I raised maxlen to 300. I also enabled ACPI. It's still slow. The 
congestion counter is still not zero - currently at 386.5/s
One good thing is that there used to be a big pause when the kernel was 
booting up, probably waiting for some device or something - now with 
ACPI the pause is smaller. It's still waiting for something, just not as 
much.
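
A note on reading these numbers: the Rate column in "pfctl -si" output is the counter total divided by the time pf has been enabled, so it is an average since pf was started rather than a current reading. Below is a minimal sketch of that arithmetic, using the figures from the original post (congestion total 116169, pf enabled for 0 days 00:09:49). The helper names uptime_secs and avg_rate are made up for illustration.

```shell
# Hypothetical helpers, not part of pfctl.

# Convert "days HH:MM:SS" (as shown on the pfctl Status line) to seconds.
uptime_secs() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        split(t, p, ":")
        print d * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
    }'
}

# Average events per second over the whole uptime, one decimal place.
avg_rate() {
    awk -v total="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", total / secs }'
}

secs=$(uptime_secs 0 00:09:49)   # 589 seconds
avg_rate 116169 "$secs"          # prints 197.2, matching the reported rate
```

To tell whether congestion is still accumulating after a tuning change, comparing the Total column between two runs of "pfctl -si" a few seconds apart is likely more telling than the averaged rate.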


I am watching the system with top, set to update every 1s, and I noticed 
there are a lot of interrupt load bursts on CPU0. The percentage of 
interrupt load is very uneven, sometimes as low as 15%, sometimes as 
high as 75%.
I unleashed the UDP flood and the firewall is totally frozen - can't do 
anything even on the local keyboard. Not even the display (running top) 
gets updated anymore. The machine is frozen solid. All network traffic 
stops immediately.

Kill the UDP flood and OpenBSD resumes normal operations.

I tried the uniprocessor kernel and it's exactly the same.

Comparison with Linux on the exact same hardware:
HTTP download speed through the firewall is 112 Mbyte / sec (saturating 
the GigE ports) and the interrupt load is relatively low and constant - 
about 30%.
Under UDP flood with Linux as a firewall, the current download finishes 
up, but a new one cannot get started. The system is not frozen at all, 
it's quite usable, in fact I can heavily overload it (running a bunch of 
CPU hogs) to the point where userspace becomes sluggish and load average 
is up to 250 or so, yet the firewall is not influenced at all.


So what's the deal here? The heavy interrupt load percentage seems to
indicate an issue with the network driver, if I'm not mistaken. But these
are good and quite popular network cards - Intel Pro/1000 PCI Express 4x
dual-port gigabit, seen by the kernel as em0 and em1.


--
Florin Andrei

http://florin.myip.org/



Re: firewall is very slow, something's wrong

2007-10-07 Thread Claudio Jeker
On Thu, Oct 04, 2007 at 05:48:50PM -0700, Florin Andrei wrote:
> Dual-homed firewall, web server on the private network, firewall is 
> doing 1:1 NAT for the web server to the public interface of the 
> firewall. em0 is the public interface, em1 is the private one.
> 
> In the exact same setup (same hardware even) I am comparing Linux and 
> OpenBSD for a firewall. Installed Linux on a hard-disc, OpenBSD on 
> another disc, and I'm just swapping discs while I'm testing.
> All firewall rules are written as stateless as possible - I don't need 
> stateful filtering, the setup is very simple (allow HTTP inbound, allow 
> a few ICMP types, and that's it).
> 
> With Linux, I achieve gigabit transfer speeds through the firewall 
> (saturating the network ports), but the firewall refuses to let any new 
> connection through when I flood it with a bunch of small UDP packets 
> with random source addresses.
> 
> I expected OpenBSD 4.1 to do better. But the thing is, even without the 
> UDP flood, the OpenBSD firewall is very slow. I am downloading a huge 
> file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I 
> get 112 Mbyte / sec.
> 
> Something's wrong. Or I'm doing something wrong.
> 
> The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using 
> the SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual 
> gigabit port, it carries both em0 and em1.
> 

I guess you need to "enable acpi" with config(8), as the system is quite
new and most newer systems have busted MP BIOS info. The effect is bad
interrupt routing and other craziness -- which often shows up as a slow
system.
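
For reference, a rough sketch of what that looks like in practice. "config -e" opens an interactive editor on a kernel image, where "enable acpi" followed by "quit" saves the change. The helper below only prints those editor commands. ukc_enable is an invented name, and feeding its output to config on standard input is an assumption worth checking on a test box first.

```shell
# Hypothetical helper: print the config(8) editor commands that enable
# each named device, followed by "quit" (which saves and exits).
ukc_enable() {
    for dev in "$@"; do
        printf 'enable %s\n' "$dev"
    done
    printf 'quit\n'
}

ukc_enable acpi
```

On the firewall this might be used as root along the lines of "ukc_enable acpi | config -e -o /bsd.acpi /bsd", which writes a modified copy and leaves the original kernel bootable. Typing the same two commands at the interactive prompt works as well.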

-- 
:wq Claudio



Re: firewall is very slow, something's wrong

2007-10-05 Thread Stuart Henderson
On 2007/10/04 17:48, Florin Andrei wrote:
> All firewall rules are written as stateless as possible - I don't need 
> stateful filtering, the setup is very simple (allow HTTP inbound, allow a 
> few ICMP types, and that's it).

You might want to re-think this; stateless rulesets are usually
slower. This is interesting:

http://www.undeadly.org/cgi?action=article&sid=20060927091645

>   congestion 116169 197.2/s

Try setting net.inet.ip.ifq.maxlen to 256 (sysctl/sysctl.conf),
if you still see the congestion count increasing then search for
net.inet.ip.ifq.maxlen in the list archives and have a read.
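
A sketch of the suggested change, for anyone following along. sysctl(8) changes the running kernel, and a matching line in /etc/sysctl.conf makes the value stick across reboots. set_ifq_maxlen is an invented helper name, and its optional second argument (an alternate file path) exists only so the function can be tried without touching the real config.

```shell
# Hypothetical helper: apply and persist an IP input queue limit.
# Usage: set_ifq_maxlen VALUE [CONF_FILE]
set_ifq_maxlen() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    line="net.inet.ip.ifq.maxlen=$value"

    # Change the running kernel only if this system has the variable.
    if sysctl -n net.inet.ip.ifq.maxlen >/dev/null 2>&1; then
        sysctl "$line"
    fi

    # Persist the setting once; skip if the exact line is already there.
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

Run as root, "set_ifq_maxlen 256" would apply the value from the advice above. Afterwards the congestion line of "pfctl -si" and the net.inet.ip.ifq.drops counter are the things to watch.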



firewall is very slow, something's wrong

2007-10-04 Thread Florin Andrei
Dual-homed firewall, web server on the private network, firewall is 
doing 1:1 NAT for the web server to the public interface of the 
firewall. em0 is the public interface, em1 is the private one.


In the exact same setup (same hardware even) I am comparing Linux and 
OpenBSD for a firewall. Installed Linux on a hard-disc, OpenBSD on 
another disc, and I'm just swapping discs while I'm testing.
All firewall rules are written as stateless as possible - I don't need 
stateful filtering, the setup is very simple (allow HTTP inbound, allow 
a few ICMP types, and that's it).


With Linux, I achieve gigabit transfer speeds through the firewall 
(saturating the network ports), but the firewall refuses to let any new 
connection through when I flood it with a bunch of small UDP packets 
with random source addresses.


I expected OpenBSD 4.1 to do better. But the thing is, even without the 
UDP flood, the OpenBSD firewall is very slow. I am downloading a huge 
file through it, via HTTP, and all I get is 4 Mbyte / sec. With Linux I 
get 112 Mbyte / sec.
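
To put those two figures on the same scale as the link speed, here is a short sketch of the conversion to megabits per second. It assumes "Mbyte" means 10^6 bytes, which the post does not say, and mbyte_to_mbit is an invented helper name.

```shell
# Hypothetical helper: convert megabytes per second to megabits per second,
# assuming decimal units (1 Mbyte = 10^6 bytes, 8 bits per byte).
mbyte_to_mbit() {
    awk -v mb="$1" 'BEGIN { printf "%d\n", mb * 8 }'
}

mbyte_to_mbit 112   # Linux result: prints 896
mbyte_to_mbit 4     # OpenBSD 4.1 result: prints 32
```

Under that assumption, 112 Mbyte / sec is roughly 896 Mbit/s of payload on a 1000 Mbit/s link, which is close to what the wire can carry once framing and header overhead are counted, while 4 Mbyte / sec is only about 32 Mbit/s.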


Something's wrong. Or I'm doing something wrong.

The hardware is AMD64, Tyan Transport, 2 CPUs 2 cores each. I am using 
the SMP kernel. The network card is Intel Pro/1000 PCI Express 4x dual 
gigabit port, it carries both em0 and em1.


=

lo0: flags=8049 mtu 33192
groups: lo
inet 127.0.0.1 netmask 0xff00
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
fxp0: flags=8802 mtu 1500
lladdr 00:e0:81:4a:0a:7f
media: Ethernet autoselect (none)
status: no carrier
bge0: flags=8802 mtu 1500
lladdr 00:e0:81:4a:0a:a8
media: Ethernet autoselect (none)
status: no carrier
bge1: flags=8802 mtu 1500
lladdr 00:e0:81:4a:0a:a9
media: Ethernet autoselect (none)
status: no carrier
em0: flags=8843 mtu 1500
lladdr 00:15:17:37:e9:fa
groups: egress
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 10.123.0.10 netmask 0xff00 broadcast 10.123.0.255
inet6 fe80::215:17ff:fe37:e9fa%em0 prefixlen 64 scopeid 0x4
inet 10.123.0.253 netmask 0x broadcast 10.123.0.253
em1: flags=8843 mtu 1500
lladdr 00:15:17:37:e9:fb
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 10.123.1.10 netmask 0xff00 broadcast 10.123.1.255
inet6 fe80::215:17ff:fe37:e9fb%em1 prefixlen 64 scopeid 0x5
pflog0: flags=141 mtu 33192
enc0: flags=0<> mtu 1536

==

TRANSLATION RULES:
binat on em0 inet from 10.123.1.253 to any -> 10.123.0.253

FILTER RULES:
pass quick on em1 all no state
pass in quick on em0 inet proto tcp from any to 10.123.1.253 port = www no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type echoreq no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type echorep no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type unreach no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type paramprob no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type trace no state
pass in quick on em0 inet proto icmp from any to 10.123.1.253 icmp-type timex no state

pass in quick on em0 inet from any to 10.123.0.10 no state
block drop in quick all
pass out all no state
No queue in use

STATES:
all tcp 10.123.1.253:80 <- 10.123.0.253:80 <- 10.123.0.251:47108       ESTABLISHED:ESTABLISHED


INFO:
Status: Enabled for 0 days 00:09:49   Debug: Urgent

State Table                          Total             Rate
  current entries                        1
  searches                         3809717         6468.1/s
  inserts                                6            0.0/s
  removals                               5            0.0/s
Counters
  match                            1812847         3077.8/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                        116169          197.2/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

TIMEOUTS:
tcp.first                    30s
tcp.opening                   5s
tcp.established           18000s
tcp.closing                  60s
tcp.finwait