Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-19 Thread Kip Macy
On Fri, Jul 11, 2008 at 11:44 PM, Brian McGinty [EMAIL PROTECTED] wrote:
 Hi Brian
 I very much doubt that this is ceteris paribus. This is 384 random IPs
 -> 384 random IP addresses with a flow lookup for each packet. Also,
 I've read through igb on Linux - it has a lot of optimizations that
 the FreeBSD driver lacks and I have yet to implement.

 Hey Kip,
 when will you push the optimization into FreeBSD?

Hi Brian,

I'm hoping to get to it some time in August. I'm a bit behind in my
contracts at the moment.

FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces
on an 8-core system. I'm hoping to push it up to 3Mpps.

Thanks,
Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-19 Thread Brian McGinty
G'day Kip,

 I'm hoping to get to it some time in August. I'm a bit behind in my
 contracts at the moment.

A few weeks ago, I did a quick comparison of the driver between
FreeBSD and Linux, and found quite a few differences that are worth
pulling over.  The guy from Intel working on FreeBSD, Jack?, is he the
one that does this sort of sync-up of the drivers between the two
distributions, or you?  There have been a lot of changes recently,
including full support for multiple Rx/Tx queues that significantly
ups the ante on performance. FreeBSD doesn't support multiple Rx/Tx,
or does something half-arsed.

 FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces
 on an 8-core system. I'm hoping to push it up to 3Mpps.

Is this a no-loss number, and how did you test it?  I don't have
throughput numbers for the Oplin.  I'm waiting to get some time on the
Ixia at work to generate performance numbers for 1G and 10G for all
packet sizes, on FreeBSD and Linux, on a 16 core system, and blast it
to the list. I expect Linux to do 2-3 times better :-)

Later,
Brian
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-19 Thread Kip Macy
On Sat, Jul 19, 2008 at 7:17 PM, Brian McGinty [EMAIL PROTECTED] wrote:
 G'day Kip,

 I'm hoping to get to it some time in August. I'm a bit behind in my
 contracts at the moment.

 A few weeks ago, I did a quick comparison of the driver between
 FreeBSD and Linux, and found quite a few differences that are worth
 pulling over.  The guy from Intel working on FreeBSD, Jack?, is he the
 one that does this sort of sync-up of the drivers between the two
 distributions, or you?  There have been a lot of changes recently,
 including full support for multiple Rx/Tx queues that significantly
 ups the ante on performance. FreeBSD doesn't support multiple Rx/Tx,
 or does something half-arsed.

This is on a variant of RELENG_6 FreeBSD with a recent version of ULE
and running the Checkpoint firewall. It also uses the full number of
queues available to igb (4) and #queues == #cores (8 in this case) for
ixgbe. The drivers in CVS have some bugs that I have fixed in this
FreeBSD variant. FreeBSD's CVS version of the Intel drivers definitely
lags Linux in terms of some optimizations. Even my version doesn't
have some of the linux optimizations.

 FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces
 on an 8-core system. I'm hoping to push it up to 3Mpps.

This is testing with an IXIA; I don't currently have zero-loss numbers.
This is not fully loaded. However, ixgbe spews out pause frames when
rx gets backed up, so losses never get much above 0.1%.

 Is this a no-loss number, and how did you test it?  I don't have
 throughput numbers for the Oplin.  I'm waiting to get some time on the
 Ixia at work to generate performance numbers for 1G and 10G for all
 packet sizes, on FreeBSD and Linux, on a 16 core system, and blast it
 to the list. I expect Linux to do 2-3 times better :-)

Sure, if you don't care about packet reordering. On their own box
Checkpoint claims that Linux is currently able to do 20% better than
we are seeing. Even they don't claim 200% - 300%. I know people who
are switching off of Linux for memcache because they simply can't make
it perform. So your mileage really varies depending on the workload.
I'm not sure where you get your numbers from. I would really like to
get a hold of this magical Linux distribution to do a side by side
comparison on the same workload. A 200% - 300% performance delta would
definitely justify switching.

Thanks,
Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-14 Thread Bruce Evans

On Mon, 7 Jul 2008, Robert Watson wrote:


On Mon, 7 Jul 2008, Bruce Evans wrote:

(1) sendto() to a specific address and port on a socket that has been bound to
    INADDR_ANY and a specific port.

(2) sendto() on a specific address and port on a socket that has been bound to
    a specific IP address (not INADDR_ANY) and a specific port.

(3) send() on a socket that has been connect()'d to a specific IP address and
    a specific port, and bound to INADDR_ANY and a specific port.

(4) send() on a socket that has been connect()'d to a specific IP address
    and a specific port, and bound to a specific IP address (not INADDR_ANY)
    and a specific port.

The last of these should really be quite a bit faster than the first of 
these, but I'd be interested in seeing specific measurements for each if 
that's possible!


Not sure if I understand networking well enough to set these up quickly. 
Does netrate use one of (3) or (4) now?


(3) and (4) are effectively the same thing, I think, since connect(2) should 
force the selection of a source IP address, but I think it's not a bad idea 
to confirm that. :-)
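
For concreteness, case (4) above would look roughly like this in C (an
illustrative sketch, not code from this thread; the addresses and ports are
placeholders):

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of case (4): bind the UDP socket to a specific local IP and
 * port, connect() it to a specific remote IP and port, then use plain
 * send().  The addresses below are documentation-range placeholders.
 */
static int
make_bound_connected_socket(void)
{
    struct sockaddr_in local, remote;
    int s;

    if ((s = socket(PF_INET, SOCK_DGRAM, 0)) < 0)
        return (-1);

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(5000);                       /* specific port */
    inet_pton(AF_INET, "192.0.2.1", &local.sin_addr);   /* specific IP */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(s);
        return (-1);
    }

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(9);                         /* discard port */
    inet_pton(AF_INET, "192.0.2.2", &remote.sin_addr);
    if (connect(s, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(s);
        return (-1);
    }
    return (s);     /* now just send(s, buf, len, 0) per packet */
}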


The structure of the desired micro-benchmark here is basically:
...


I hacked netblast.c to do this:

% --- /usr/src/tools/tools/netrate/netblast/netblast.c  Fri Dec 16 17:02:44 2005
% +++ netblast.c    Mon Jul 14 21:26:52 2008
% @@ -44,9 +44,11 @@
%  {
% 
% -    fprintf(stderr, "netblast [ip] [port] [payloadsize] [duration]\n");
% -    exit(-1);
% +    fprintf(stderr, "netblast ip port payloadsize duration bind connect\n");
% +    exit(1);
%  }
% 
% +static int    gconnected;
%  static int    global_stop_flag;
% +static struct sockaddr_in *gsin;
% 
%  static void
% @@ -116,6 +118,13 @@
%              counter++;
%          }
% -        if (send(s, packet, packet_len, 0) < 0)
% +        if (gconnected && send(s, packet, packet_len, 0) < 0) {
%              send_errors++;
% +            usleep(1000);
% +        }
% +        if (!gconnected && sendto(s, packet, packet_len, 0,
% +            (struct sockaddr *)gsin, sizeof(*gsin)) < 0) {
% +            send_errors++;
% +            usleep(1000);
% +        }
%          send_calls++;
%      }
% @@ -146,9 +155,10 @@
%      struct sockaddr_in sin;
%      char *dummy, *packet;
% -    int s;
% +    int bind_desired, connect_desired, s;
% 
% -    if (argc != 5)
% +    if (argc != 7)
%          usage();
% 
% +    gsin = &sin;
%      bzero(&sin, sizeof(sin));
%      sin.sin_len = sizeof(sin);
% @@ -176,4 +186,7 @@
%          usage();
% 
% +    bind_desired = (strcmp(argv[5], "b") == 0);
% +    connect_desired = (strcmp(argv[6], "c") == 0);
% +
%      packet = malloc(payloadsize);
%      if (packet == NULL) {
% @@ -189,7 +202,19 @@
%      }
% 
% -    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
% -        perror("connect");
% -        return (-1);
% +    if (bind_desired) {
% +        struct sockaddr_in osin;
% +
% +        osin = sin;
% +        if (inet_aton("0", &sin.sin_addr) == 0)
% +            perror("inet_aton(0)");
% +        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
% +            err(-1, "bind");
% +        sin = osin;
% +    }
% +
% +    if (connect_desired) {
% +        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
% +            err(-1, "connect");
% +        gconnected = 1;
%      }
%

This also fixes some bugs in usage() (bogus [] around non-optional args and
bogus exit code) and adds a sleep after send failure.  Without the sleep,
netblast distorts the measurements by taking 100% CPU.  This depends on
kernel queues having enough buffering to not run dry during the sleep
time (rounded up to a tick boundary).  I use ifq_maxlen =
DRIVER_TX_RING_CNT + imax(2 * tick / 4, 1) = 10512 for DRIVER = bge
and HZ = 100.  This is actually wrong now.  The magic 2 is to round up to
a tick boundary and the magic 4 is for bge taking a minimum of 4 usec per
packet on old hardware, but bge actually takes about 1.5 usec on the test
hardware and I'd like it to take 0.66 usec.  The queues rarely run dry in
practice, but running dry just a few times for a few msec each would
explain some anomalies.  Old SGI ttcp uses a select timeout of 18 msec here.
nttcp and netsend use more sophisticated methods that don't work unless HZ
is too small.  It's just impossible for a program to schedule its sleeps
with a fine enough resolution to ensure waking up before the queue runs
dry, unless HZ is too small or the queue is too large.  select() for
writing doesn't work for the queue part of socket i/o.

Results:
~5.2 sendto (1):  630 kpps   98% CPU  11   cm/p (cache misses/packet (min))
-cur sendto:  590 kpps  100% CPU  10   cm/p (July 8 -current)
(2):  no significant difference - see below
~5.2 send   (3):  620 kpps   75% CPU   9.5 cm/p
-cur send:520 kpps   60% CPU   8   

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-12 Thread Brian McGinty
 Hi Brian
 I very much doubt that this is ceteris paribus. This is 384 random IPs
 -> 384 random IP addresses with a flow lookup for each packet. Also,
 I've read through igb on Linux - it has a lot of optimizations that
 the FreeBSD driver lacks and I have yet to implement.

Hey Kip,
when will you push the optimization into FreeBSD?

Cheers,
Brian
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-11 Thread Stefan Lambrev

Hi Paul,

Paul wrote:
I tested Linux in bridge configuration with the same machine and it 
CPUed out at about 600kpps through the bridge..

600kpps incoming or 600kpps incoming+ outgoing ?
That's a bit low :/   Soft interrupt using all the cpu.  Same opteron 
, 82571EB Pci express NIC.

Tried SMP/ non-smp , load balanced irqs, etc..

Does hwpmc work out of the box (FreeBSD) with those CPUs?


Good news is using iptables only adds a few percentage onto the CPU 
usage.   But still, what's with that..
So far FreeBSD got the highest pps rating for forwarding. I  haven't 
tried bridge mode.  Ipfw probably takes a big hit in that too though.


Looking for an 82575 to test..


P.S. It was a nice chat, but what we can expect from the future? Any 
plans, patches etc?
Someone suggested to install 8-current and test with it as this is the 
fast way to have something included in FreeBSD.
I can do this - I can install 8-current, patch it and put it under load 
and report results, but need patches :)

I guess Paul is in the same situation ..

--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-11 Thread Bart Van Kerckhove
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

 Good news is using iptables only adds a few percentage onto the CPU
 usage.   But still, what's with that..
 So far FreeBSD got the highest pps rating for forwarding. I  haven't
 tried bridge mode.  Ipfw probably takes a big hit in that too though.
 Looking for an 82575 to test..
 
 P.S. It was a nice chat, but what we can expect from the future? Any
 plans, patches etc?
 Someone suggested to install 8-current and test with it as this is the
 fast way to have something included in FreeBSD.
 I can do this - I can install 8-current, patch it and put it under
 load and report results, but need patches :)
 I guess Paul is in the same situation ..

I'm in the same situation as well.
Would anyone be interested in very specific work aimed at improving IP
forwarding?
I would happily put out a bounty for this, and I'm quite sure I'm not
alone.

PS Paul: did you get around to testing C2D?

Kind regards,

Met vriendelijke groet / With kind regards,

Bart Van Kerckhove
http://friet.net/pgp.txt
There are 10 kinds of ppl; those who read binary and those who don't

-BEGIN PGP SIGNATURE-

iQA/AwUBSHdqOQoIFchBM0BKEQJSPQCfQKKgD8+xrX088+o0IKmPDdDD0XoAnAv+
SqgNdjkKsEstDYqnFDNUQuK3
=ft58
-END PGP SIGNATURE-

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-10 Thread Paul
I tested Linux in bridge configuration with the same machine and it 
CPUed out at about 600kpps through the bridge..
That's a bit low :/   Soft interrupt using all the cpu.  
Same opteron , 82571EB Pci express NIC.

Tried SMP/ non-smp , load balanced irqs, etc..

Good news is using iptables only adds a few percentage onto the CPU 
usage.   But still, what's with that..
So far FreeBSD got the highest pps rating for forwarding. I  haven't 
tried bridge mode.  Ipfw probably takes a big hit in that too though.


Looking for an 82575 to test..


Paul
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Kip Macy
On Mon, Jul 7, 2008 at 6:07 PM, Mike Tancsa [EMAIL PROTECTED] wrote:
 At 02:44 PM 7/7/2008, Paul wrote:

Also my 82571 NIC supports multiple receive queues and multiple transmit
 queues so why hasn't
 anyone written the driver to support this?  It's not a 10gb card and it
 still supports it and it's widely
 available and not too expensive either.   The new 82575/6 chips support
 even more queues and the
 two port version will be out this month and the 4 port in october (PCI-E
 cards).  Motherboards are
 already shipping with the 82576..   (82571 supports 2x/2x  575/6 support
 4x/4x)




 Actually, do any of your NICs attach via the igb driver ?


I have a pre-production card. With some bug fixes and some tuning of
interrupt handling (custom stack - I've been asked to push the changes
back in to CVS, I just don't have time right now) an otherwise
unoptimized igb can forward 1.04Mpps from one port to another (1.04
Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
core system.

-Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Kip Macy
On Mon, Jul 7, 2008 at 6:22 PM, Paul [EMAIL PROTECTED] wrote:
 I read through the IGB driver, and it says 82575/6 only...  which is the new
 chip Intel is releasing on the cards this month 2 port
 and october 4 port, but the chips are on some of the motherboards right now.
 Why can't it also use the 82571 ? doesn't make any sense.. I haven't tried
 it but just browsing the driver source
 doesn't look like it will work.

The igb driver has been written to remove a lot of the cruft that has
accumulated to work around  deficiencies in earlier 8257x hardware.
Although it supports legacy descriptor handling it has a new mode of
descriptor handling that is ostensibly better. I don't have access to
the data sheets for pre-zoar hardware so I'm not sure what it would
take to support multiple queues on that hardware.

-Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Robert Watson


On Mon, 7 Jul 2008, Artem Belevich wrote:

As was already mentioned, we can't avoid all cache misses as there's data 
that's recently been updated in memory via DMA and therefore kicked out of 
cache.


However, we may hide some of the latency penalty by prefetching 
'interesting' data early. I.e. we know that we want to access some ethernet 
headers, so we may start pulling relevant data into cache early. Ideally, by 
the time we need to access the field, it will already be in the cache. When 
we're counting nanoseconds per packet this may bring some performance gain.


There were some patches floating around for if_em to do a prefetch of the 
first bit of packet data on packets before handing them up the stack.  My 
understanding is that they moved the hot spot earlier, but didn't make a huge 
difference because it doesn't really take that long to get to the point where 
you're processing the IP header in our current stack (a downside to 
optimization...).  However, that's a pretty anecdotal story, and a proper 
study of the effects of prefetching would be most welcome.  One thing that I'd 
really like to see someone look at is whether, by doing a bit of appropriately 
timed prefetching, we can move cache misses out from under hot locks that 
don't really relate to the data being prefetched.
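
As a rough illustration of the idea (a sketch only, not the if_em patch
being discussed; the cache-line size and the driver loop it would sit in are
assumptions):

#include <stdint.h>

#define CACHE_LINE_SIZE 64      /* assumed x86 cache line size */

/*
 * Prefetch the first one or two cache lines of a received frame as soon
 * as its descriptor is seen, so the Ethernet/IP headers are hopefully in
 * cache by the time the stack dereferences them.
 */
static inline void
prefetch_packet_headers(const void *data)
{
    __builtin_prefetch(data, 0, 3);     /* read, high temporal locality */
    __builtin_prefetch((const uint8_t *)data + CACHE_LINE_SIZE, 0, 3);
}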


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Stefan Lambrev

Hi,

Kip Macy wrote:

On Mon, Jul 7, 2008 at 6:07 PM, Mike Tancsa [EMAIL PROTECTED] wrote:
  

At 02:44 PM 7/7/2008, Paul wrote:



Also my 82571 NIC supports multiple receive queues and multiple transmit
queues so why hasn't
anyone written the driver to support this?  It's not a 10gb card and it
still supports it and it's widely
available and not too expensive either.   The new 82575/6 chips support
even more queues and the
two port version will be out this month and the 4 port in october (PCI-E
cards).  Motherboards are
already shipping with the 82576..   (82571 supports 2x/2x  575/6 support
4x/4x)
  



Actually, do any of your NICs attach via the igb driver ?




I have a pre-production card. With some bug fixes and some tuning of
interrupt handling (custom stack - I've been asked to push the changes
back in to CVS, I just don't have time right now) an otherwise
unoptimized igb can forward 1.04Mpps from one port to another (1.04
Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
core system.

  

Is this on 1gbps or on 10gbps NIC?

-Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]
  


--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Kip Macy
 I have a pre-production card. With some bug fixes and some tuning of
 interrupt handling (custom stack - I've been asked to push the changes
 back in to CVS, I just don't have time right now) an otherwise
 unoptimized igb can forward 1.04Mpps from one port to another (1.04
 Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
 core system.



 Is this on 1gbps or on 10gbps NIC?


Hi Stefan,
The hardware that igb supports is just the latest revision of the
hardware supported by em, i.e. it is 1gbps.

Cheers,
Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Paul
Will someone confirm if it will support the 82571EB ?  I don't see a 
reason why not as it's very similar hardware
and it's available now in large quantities so making 82571 part of igb I 
think would be a good idea.



Kip Macy wrote:

I have a pre-production card. With some bug fixes and some tuning of
interrupt handling (custom stack - I've been asked to push the changes
back in to CVS, I just don't have time right now) an otherwise
unoptimized igb can forward 1.04Mpps from one port to another (1.04
Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
core system.


  

Is this on 1gbps or on 10gbps NIC?



Hi Stefan,
The hardware that igb supports is just the latest revision of the
hardware supported by em, i.e. it is 1gbps.

Cheers,
Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]

  


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Bruce Evans

On Mon, 7 Jul 2008, Erik Trulsson wrote:


On Mon, Jul 07, 2008 at 10:30:53PM +1000, Bruce Evans wrote:

On Mon, 7 Jul 2008, Andre Oppermann wrote:

The theoretical maximum at 64byte frames is 1,488,100.  I've looked
up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.


Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
1.488095 is for 64 bits extra (10^9/(8*64+96+64)).


A standard ethernet frame (on the wire) consists of:
7 octets preamble
1 octet  Start Frame Delimiter
6 octets destination address
6 octets source address
2 octets length/type
46-1500 octets  data (+padding if needed)
4 octets Frame Check Sequence

Followed by (at least) 96 bits interFrameGap, before the next frame starts.

For minimal packet size this gives a maximum packet rate at 1Gbit/s of
1e9/((7+1+6+6+2+46+4)*8+96) = 1488095 packets/second

You probably missed the preamble and start frame delimiter in your
calculation.
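
For reference, the same arithmetic in runnable form (a worked check added
here, not part of the original message):

#include <stdio.h>

/* Maximum frame rate for minimum-size Ethernet frames at 1 Gbit/s:
 * preamble + SFD + header + 46-byte payload + FCS, plus the 96-bit
 * inter-frame gap. */
int
main(void)
{
    const double octets = 7 + 1 + 6 + 6 + 2 + 46 + 4;   /* on-wire frame */
    const double bits_per_frame = octets * 8 + 96;      /* + inter-frame gap */

    printf("%.0f frames/second\n", 1e9 / bits_per_frame);  /* ~1488095 */
    return (0);
}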


Thanks.  Yes, that was it.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Stefan Lambrev

Hi,

Kip Macy wrote:

On Mon, Jul 7, 2008 at 6:22 PM, Paul [EMAIL PROTECTED] wrote:
  

I read through the IGB driver, and it says 82575/6 only...  which is the new
chip Intel is releasing on the cards this month 2 port
and october 4 port, but the chips are on some of the motherboards right now.
Why can't it also use the 82571 ? doesn't make any sense.. I haven't tried
it but just browsing the driver source
doesn't look like it will work.



The igb driver has been written to remove a lot of the cruft that has
accumulated to work around  deficiencies in earlier 8257x hardware.
Although it supports legacy descriptor handling it has a new mode of
descriptor handling that is ostensibly better. I don't have access to
the data sheets for pre-zoar hardware so I'm not sure what it would
take to support multiple queues on that hardware.
  

Maybe we should ask Jack Vogel? He will probably have some news.

-Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]
  


--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Artem Belevich
On 7/8/08, Robert Watson [EMAIL PROTECTED] wrote:
  There were some patches floating around for if_em to do a prefetch of the
 first bit of packet data on packets before handing them up the stack.  My

I found Andre Oppermann's optimization patch mentioned in july 2005
status report:
http://lists.freebsd.org/pipermail/freebsd-announce/2005-July/001012.html
http://www.nrg4u.com/freebsd/tcp_reass+prefetch-20041216.patch

Is that the patch you had in mind?

In the report Andre says: "Use [of prefetch] in both of these places
show a very significant performance gain but not yet fully
quantified."

The "very significant" bit looks promising. Unfortunately, it does not
look like prefetch changes in the patch made it into official kernel.
I wonder why.

It should be easy enough to apply prefetch-related changes and see
if/how it affects forwarding performance.

--Artem
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Paul
But this is probably no routing table, and single source and dst IPs or a
very limited number of IPs and ports.
The entire problem with Linux is the route cache; try to generate
random source IPs and random source/dst ports
and it won't even do 100kpps without problems.

I would like to log into the machine and see 1.4Mpps going through 3 nics :)



Brian McGinty wrote:

I have a pre-production card. With some bug fixes and some tuning of
interrupt handling (custom stack - I've been asked to push the changes
back in to CVS, I just don't have time right now) an otherwise
unoptimized igb can forward 1.04Mpps from one port to another (1.04
Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
core system.



I have an 8 core system running stock Linux that easily does line rate
(ie, 1.488 Mpps) on 3 (82575) interfaces. Ie, 3 * 1.48 Mpps!

Cheers,
Brian.

  

-Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]




  


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Kip Macy
On Tue, Jul 8, 2008 at 1:46 PM, Brian McGinty [EMAIL PROTECTED] wrote:
 I have a pre-production card. With some bug fixes and some tuning of
 interrupt handling (custom stack - I've been asked to push the changes
 back in to CVS, I just don't have time right now) an otherwise
 unoptimized igb can forward 1.04Mpps from one port to another (1.04
 Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
 core system.

 I have an 8 core system running stock Linux that easily does line rate
 (ie, 1.488 Mpps) on 3 (82575) interfaces. Ie, 3 * 1.48 Mpps!

Hi Brian
I very much doubt that this is ceteris paribus. This is 384 random IPs
-> 384 random IP addresses with a flow lookup for each packet. Also,
I've read through igb on Linux - it has a lot of optimizations that
the FreeBSD driver lacks and I have yet to implement.

Thanks,
Kip
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-08 Thread Brian McGinty
 I have a pre-production card. With some bug fixes and some tuning of
 interrupt handling (custom stack - I've been asked to push the changes
 back in to CVS, I just don't have time right now) an otherwise
 unoptimized igb can forward 1.04Mpps from one port to another (1.04
 Mpps in  on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8
 core system.

I have an 8 core system running stock Linux that easily does line rate
(ie, 1.488 Mpps) on 3 (82575) interfaces. Ie, 3 * 1.48 Mpps!

Cheers,
Brian.


 -Kip
 ___
 freebsd-net@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to [EMAIL PROTECTED]

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock 
contention in the routing and transmit queue code.  The former needs 
some kernel hacking to address in order to improve parallelism for 
routing lookups.  The latter is harder to address given the hardware 
you're using: modern 10gbps cards frequently offer multiple transmit 
queues that can be used independently (which our cxgb driver supports), 
but 1gbps cards generally don't.


Actually the routing code is not contended.  The workload in router
is mostly serialized without much opportunity for contention.  With
many interfaces and any-to-any traffic patterns it may get some
contention.  The locking overhead per packet is always there and has
some impact though.

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Robert Watson


On Mon, 7 Jul 2008, Andre Oppermann wrote:


Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock 
contention in the routing and transmit queue code.  The former needs some 
kernel hacking to address in order to improve parallelism for routing 
lookups.  The latter is harder to address given the hardware you're using: 
modern 10gbps cards frequently offer multiple transmit queues that can be 
used independently (which our cxgb driver supports), but 1gbps cards 
generally don't.


Actually the routing code is not contended.  The workload in router is 
mostly serialized without much opportunity for contention.  With many 
interfaces and any-to-any traffic patterns it may get some contention.  The 
locking overhead per packet is always there and has some impact though.


Yes, I don't see any real sources of contention until we reach the output 
code, which will run in the input if_em taskqueue threads, as the input path 
generates little or no contention if the packets are not destined for local 
delivery.  I was a little concerned about mention of degrading performance as 
firewall complexity grows -- I suspect there's a nice project for someone to 
do looking at why this is the case.  I was under the impression that, in 7.x 
and later, we use rwlocks to protect firewall state, and that unless stateful 
firewall rules are used, these are locked read-only rather than writable...


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Ingo Flaschberger wrote:

Dear Paul,


I tried all of this :/  still, 256/512 descriptors seem to work the best.
Happy to let you log into the machine and fiddle around if you want :)


yes, but I'm sure I will also not be able to achieve much more pps.
As it seems that you hit hardware-software-level-barriers, my only idea 
is to test dragonfly bsd, which seems to have less software overhead.


I tested DragonFly some time ago with an Agilent N2X tester and it
was by far the slowest of the pack.

I don't think you will be able to route 64byte packets at 1gbit 
wirespeed (2Mpps) with a current x86 platform.


You have to take the inter-frame gap and other overheads into account too.  That gives
about 1.244Mpps max on a 1GigE interface.

In general the chipsets and buses are able to transfer quite a bit of
data.  On a dual-opteron 848 I was able to sink 2.5Mpps into the machine
with ifconfig em[01] monitor without hitting the cpu ceiling.  This
means that the bus and interrupt handling is not where most of the time
is spent.

When I did my profiling the saturation point was the cache miss penalty
for accessing the packet headers.  At saturation point about 50% of the
time was spent waiting for the memory to make its way into the CPU.

I hoped to reach 1Mpps with the hardware I mentioned some mails before, 
but 2Mpps is far far away.

Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium.


This is more or less expected.  PCI32 is not able to sustain high
packet rates.  The bus setup times kill the speed.  For larger packets
the ratio gets much better and some reasonable throughput can be achieved.


Perhaps you have some better luck at some different hardware systems
(ppc, mips, ..?) or use freebsd only for routing-table-updates and 
special network-cards (netfpga) for real routing.


NetFPGA doesn't have enough TCAM space to be useful for real routing
(as in Internet sized routing table).  The trick many embedded networking
CPUs use is cache prefetching that is integrated with the network
controller.  The first 64-128bytes of every packet are transferred
automatically into the L2 cache by the hardware.  This allows relatively
slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale
7448 in NPE-G2) to get more than 1Mpps.  Until something like this is
possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed.

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Robert Watson wrote:


On Mon, 7 Jul 2008, Andre Oppermann wrote:


Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock 
contention in the routing and transmit queue code.  The former needs 
some kernel hacking to address in order to improve parallelism for 
routing lookups.  The latter is harder to address given the hardware 
you're using: modern 10gbps cards frequently offer multiple transmit 
queues that can be used independently (which our cxgb driver 
supports), but 1gbps cards generally don't.


Actually the routing code is not contended.  The workload in router is 
mostly serialized without much opportunity for contention.  With many 
interfaces and any-to-any traffic patterns it may get some 
contention.  The locking overhead per packet is always there and has 
some impact though.


Yes, I don't see any real sources of contention until we reach the 
output code, which will run in the input if_em taskqueue threads, as the 
input path generates little or no contention if the packets are not 
destined for local delivery.  I was a little concerned about mention of 


The interface output was the second largest block after the cache misses
IIRC.  The output part seems to have received only moderate attention
and detailed performance analysis compared to the interface input path.
Most network drivers do a write to the hardware for every packet sent
in addition to other overhead that may be necessary for their transmit
DMA rings.  That adds significant overhead compared to the RX path where
those costs are amortized over a larger number of packets.

degrading performance as firewall complexity grows -- I suspect there's 
a nice project for someone to do looking at why this is the case.  I was 
under the impression that, in 7.x and later, we use rwlocks to protect 
firewall state, and that unless stateful firewall rules are used, these 
are locked read-only rather than writable...


Just looking at the packet (twice) in ipfw or other firewall packages
is a huge overhead on its own.  The main loop of ipfw is a very
large block of code.  Unless one implements compilation of the firewall
rules to native machine code there is not much that can be done.  With
LLVM we will see some very interesting opportunities in that area.  Other
than that, the ipfw instruction overhead per rule seems to be quite close
to the optimum.  I'm not saying one shouldn't take a close look with a
profiler to verify this is actually the case.

--
Andre

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Paul,

to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:

 1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 monitor)

 2. inbound pps w/ fastforward into a single blackhole route

 3. inbound pps w/ fastforward into a single blackhole route w/ ipfw and
    just one allow all rule

 4. inbound pps w/ fastforward into a single blackhole route w/ ipfw and
    just one deny all rule

 5. inbound pps w/ fastforward into the disc(4) discard network interface

 6. inbound pps w/ fastforward into the disc(4) discard network interface
    w/ ipfw and just one allow all rule

All surrounding parameters like RX/TX interface queue length, scheduler
and so on may be varied but should be noted.

--
Andre

Paul wrote:

UP 32 bit test vs 64 bit:
negligible difference in forwarding performance without polling
slightly better polling performance but still errors at lower packet rates
same massive hit with ipfw loaded

Installing dragonfly in a bit..
If anyone has a really fast PPC type system or SUN or something i'd love 
to try it :)

Something with a really big L1 cache :P


Paul wrote:

ULE + PREEMPTION for non SMP
no major differences with SMP with ULE/4BSD and preemption ON/OFF

32 bit UP test coming up with new cpu
and I'm installing dragonfly sometime this weekend :]
UP: 1mpps in one direction with no firewall/no routing table is not 
too bad, but 1mpps both directions is the goal here

700kpps with full bgp table in one direction is not too bad
Ipfw needs a lot of work, barely gets 500kpps with no routing table 
with a few ipfw rules loaded.. that's horrible
Linux barely takes a hit when you start loading iptables rules, but
then again Linux has a HUGE problem with routing
random packet sources/ports .. grr
My problem is I need some box to do fast routing and some to do 
firewall.. :/
I'll have 32 bit 7-stable UP test with ipfw/routing table and then 
move on to dragonfly.
I'll post the dragonfly results here as well as sign up for their 
mailing list.



Bart Van Kerckhove wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Paul / Ingo,
 

I tried all of this :/  still, 256/512 descriptors seem to work the
best. Happy to let you log into the machine and fiddle around if you
want :)   

I've been watching this thread closely, since I'm in a very similair
situation.
A few questions/remarks:

Does ULE provide better performance than 4BSD for forwarding?
Did you try freebsd4 as well? This thread had a report about that quite
opposite to my own experiences, -4 seemed to be a lot faster at forwarding
than anything else I've tried so far.
Obviously the thing I'm interested in is IMIX - and 64byte packets.
Does anyone have any benchmarks for DragonFly? I asked around on IRC, but
neither that nor google turned up any useful results.

snip 

I don't think you will be able to route 64byte packets at 1gbit
wirespeed (2Mpps) with a current x86 platform.

Are there actual hardware related reasons this should not be possible, or
is this purely lack of dedicated work towards this goal?

snip
 

There's a Sun used at quagga dev as a bgp-route-server.
http://quagga.net/route-server.php
(but they didn't answer my question regarding fw-performance).




the Quagga guys are running a sun T1000 (niagara 1) route server - I happen
to have the machine in my racks,
please let me know if you want to run some tests on it, I'm sure they won't
mind ;-)
It should also make a great testbed for SMP performance testing imho (and
they're pretty cheap these days)
Also, feel free to use me as a relay for your questions, they're not always
very reachable.
snap

 

Perhaps you have some better luck at some different hardware systems
(ppc, mips, ..?) or use freebsd only for routing-table-updates and
special network-cards (netfpga) for real routing.


The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently quite
linux-centric (and according to docs doesn't have VLAN nor ip6 support, the
former being quite a dealbreaker)

Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a
freebsd4 and/or dragonfly bench if you can..) - appreciate all the
information you are providing us :)

Met vriendelijke groet / With kind regards,

Bart Van Kerckhove
http://friet.net/pgp.txt

-BEGIN PGP SIGNATURE-

iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi
eca31f7WQ/oXq9tJ8TEDN3CA
=YGYq
-END PGP SIGNATURE-


  


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]





Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Paul wrote:

SMP DISABLED on my Opteron 2212  (ULE, Preemption on)
Yields ~750kpps in em0 and out em1  (one direction)
I am miffed why this yields more pps than
a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3 
incoming connections so 3 taskq threads


SMP adds quite some overhead in the generic case and is currently not
well suited for high performance packet forwarding.

On SMP interrupts are delivered to one CPU but not necessarily the
one that will later on handle the taskqueue to process the packets.
That adds overhead.  Ideally the interrupt for each network interface
is bound to exactly one pre-determined CPU and the taskqueue is bound
to the same CPU.  That way the overhead for interrupt and taskqueue
scheduling can be kept at a minimum.  Most of the infrastructure to
do this binding already exists in the kernel but is not yet exposed
to the outside for us to make use of it.  I'm also not sure if the
ULE scheduler skips the more global locks when interrupt and the
thread are on the same CPU.

Distributing the interrupts and taskqueues among the available CPUs
gives concurrent forwarding with bi- or multi-directional traffic.
All incoming traffic from any particular interface is still serialized
though.

--
Andre

I would be willing to set up test equipment (several servers plugged 
into a switch) with ipkvm and power port access
if someone or a group of people want to figure out ways to improve the 
routing process, ipfw, and lagg.


Maximum PPS with one ipfw rule on UP:
tops out about 570Kpps.. almost 200kpps lower ? (frown)

I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in 
here and see how that scales, using UP same kernel etc I have now.






Julian Elischer wrote:

Paul wrote:

ULE without PREEMPTION is now yielding better results.
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    571595 40639   34564108          1     0        226     0
    577892 48865   34941908          1     0        178     0
    545240 84744   32966404          1     0        178     0
    587661 44691   35534512          1     0        178     0
    587839 38073   35544904          1     0        178     0
    587787 43556   35540360          1     0        178     0
    540786 39492   32712746          1     0        178     0
    572071 55797   34595650          1     0        178     0
 
*OUCH, IPFW HURTS..
loading ipfw, and adding one ipfw rule allow ip from any to any drops 
100Kpps off :/ what's up with THAT?
unloaded ipfw module and back 100kpps more again, that's not right 
with ONE rule.. :/


ipfw needs to gain a lock on the firewall before running,
and is quite complex..  I can believe it..

in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two 
interfaces (bridged) but I think it has slowed down since then due to 
the SMP locking.





em0 taskq is still jumping cpus.. is there any way to lock it to one 
cpu or is this just a function of ULE


running a tar czpvf all.tgz *  and seeing if pps changes..
negligible.. guess scheduler is doing it's job at least..

Hmm. even when it's getting 50-60k errors per second on the interface 
I can still SCP a file through that interface although it's not 
fast.. 3-4MB/s..


You know, I wouldn't care if it added 5ms latency to the packets when 
it was doing 1mpps as long as it didn't drop any.. Why can't it do 
that? Queue them up and do them in bi chunks so none are 
droppedhmm?


32 bit system is compiling now..  won't do > 400kpps with GENERIC 
kernel, as with 64 bit did 450k with GENERIC, although that could be

the difference between opteron 270 and opteron 2212..

Paul

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]





___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Robert Watson


On Mon, 7 Jul 2008, Andre Oppermann wrote:

Distributing the interrupts and taskqueues among the available CPUs gives 
concurrent forwarding with bi- or multi-directional traffic. All incoming 
traffic from any particular interface is still serialized though.


... although not on multiple input queue-enabled hardware and drivers.  While 
I've really only focused on local traffic performance with my 10gbps Chelsio 
setup, it should be possible to do packet forwarding from multiple input 
queues using that hardware and driver today.


I'll update the netisr2 patches, which allow work to be pushed to multiple 
CPUs from a single input queue.  However, these necessarily take a cache miss 
or two on packet header data in order to break out the packets from the input 
queue into flows that can be processed independently without ordering 
constraints, so if those cache misses on header data are a big part of the 
performance of a configuration, load balancing in this manner may not help. 
What would be neat is if the cards without multiple input queues could still 
tag receive descriptors with a flow identifier generated from the IP/TCP/etc 
layers that could be used for work placement.


Robert N M Watson
Computer Laboratory
University of Cambridge



--
Andre

I would be willing to set up test equipment (several servers plugged into a 
switch) with ipkvm and power port access
if someone or a group of people want to figure out ways to improve the 
routing process, ipfw, and lagg.


Maximum PPS with one ipfw rule on UP:
tops out about 570Kpps.. almost 200kpps lower ? (frown)

I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in here 
and see how that scales, using UP same kernel etc I have now.






Julian Elischer wrote:

Paul wrote:

ULE without PREEMPTION is now yielding better results.
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    571595 40639   34564108          1     0        226     0
    577892 48865   34941908          1     0        178     0
    545240 84744   32966404          1     0        178     0
    587661 44691   35534512          1     0        178     0
    587839 38073   35544904          1     0        178     0
    587787 43556   35540360          1     0        178     0
    540786 39492   32712746          1     0        178     0
    572071 55797   34595650          1     0        178     0
 *OUCH, IPFW HURTS..
loading ipfw, and adding one ipfw rule allow ip from any to any drops 
100Kpps off :/ what's up with THAT?
unloaded ipfw module and back 100kpps more again, that's not right with 
ONE rule.. :/


ipfw needs to gain a lock on the firewall before running,
and is quite complex..  I can believe it..

in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two 
interfaces (bridged) but I think it has slowed down since then due to the 
SMP locking.





em0 taskq is still jumping cpus.. is there any way to lock it to one cpu 
or is this just a function of ULE


running a tar czpvf all.tgz *  and seeing if pps changes..
negligible.. guess scheduler is doing it's job at least..

Hmm. even when it's getting 50-60k errors per second on the interface I 
can still SCP a file through that interface although it's not fast.. 
3-4MB/s..


You know, I wouldn't care if it added 5ms latency to the packets when it 
was doing 1mpps as long as it didn't drop any.. Why can't it do that? 
Queue them up and do them in big chunks so none are
dropped... hmm?


32 bit system is compiling now..  won't do > 400kpps with GENERIC kernel, 
as with 64 bit did 450k with GENERIC, although that could be

the difference between opteron 270 and opteron 2212..

Paul

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]





___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Robert Watson wrote:


On Mon, 7 Jul 2008, Andre Oppermann wrote:

Distributing the interrupts and taskqueues among the available CPUs 
gives concurrent forwarding with bi- or multi-directional traffic. All 
incoming traffic from any particular interface is still serialized 
though.


... although not on multiple input queue-enabled hardware and drivers.  
While I've really only focused on local traffic performance with my 
10gbps Chelsio setup, it should be possible to do packet forwarding from 
multiple input queues using that hardware and driver today.


I'll update the netisr2 patches, which allow work to be pushed to 
multiple CPUs from a single input queue.  However, these necessarily 
take a cache miss or two on packet header data in order to break out the 
packets from the input queue into flows that can be processed 
independently without ordering constraints, so if those cache misses on 
header data are a big part of the performance of a configuration, load 
balancing in this manner may not help. What would be neat is if the 
cards without multiple input queues could still tag receive descriptors 
with a flow identifier generated from the IP/TCP/etc layers that could 
be used for work placement.


The cache miss is really the elephant in the room.  If the network card
supports multiple RX rings with separate interrupts and a stable hash-based
distribution (that includes IP+Port src+dst) they can be bound to
different CPUs.  It is very important to maintain the packet order for
flows that go through the router.  Otherwise TCP and VoIP will suffer.
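
A minimal sketch of that kind of flow-stable distribution (an illustration
only, not the netisr2 or driver code; a real implementation would use a
stronger hash such as Toeplitz):

#include <stdint.h>

/*
 * Map a packet's 4-tuple to a CPU/queue index so that all packets of a
 * flow land on the same CPU and per-flow ordering is preserved.
 */
static inline uint32_t
flow_to_cpu(uint32_t src_ip, uint32_t dst_ip, uint16_t src_port,
    uint16_t dst_port, uint32_t ncpus)
{
    uint32_t h;

    h = src_ip ^ dst_ip ^ ((uint32_t)src_port << 16) ^ dst_port;
    h ^= h >> 16;           /* cheap mixing step */
    h *= 0x9e3779b1u;       /* Fibonacci hashing constant */
    return ((h >> 16) % ncpus);
}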

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Bruce Evans wrote:

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Ingo Flaschberger wrote:
I don't think you will be able to route 64byte packets at 1gbit 
wirespeed (2Mpps) with a current x86 platform.


You have to take the inter-frame gap and other overheads into account too.  That gives
about 1.244Mpps max on a 1GigE interface.


What are the other overheads?  I calculate 1.644Mpps counting the inter-frame
gap, with 64-byte packets and 64-header_size payloads.  If the 64 bytes
is for the payload, then the max is much lower.


The theoretical maximum at 64byte frames is 1,488,100.  I've looked
up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.

I hoped to reach 1Mpps with the hardware I mentioned some mails 
before, but 2Mpps is far far away.

Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium.


This is more or less expected.  PCI32 is not able to sustain high
packet rates.  The bus setup times kill the speed.  For larger packets
the ratio gets much better and some reasonable throughput can be 
achieved.


I get about 640 kpps without forwarding (sendto: slightly faster;
recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the memory
from 200MHz to 100MHz only reduces the speed by about 10%, while not
overclocking the CPU by 10% reduces the speed by the same 10%, so the
system is apparently still mainly CPU-bound.


On [EMAIL PROTECTED]  He's using a 1.2GHz Mobile Pentium on top of that.


NetFPGA doesn't have enough TCAM space to be useful for real routing
(as in Internet sized routing table).  The trick many embedded networking
CPUs use is cache prefetching that is integrated with the network
controller.  The first 64-128bytes of every packet are transferred
automatically into the L2 cache by the hardware.  This allows relatively
slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale
7448 in NPE-G2) to get more than 1Mpps.  Until something like this is
possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed.


Does using fa$ter memory (speed and/or latency) help here?  64 bytes
is so small that latency may be more of a problem, especially without
a prefetch.


Latency.  For IPv4 packet forwarding only one cache line per packet
is fetched.  More memory speed only helps with the DMA from/to the
network card.

--
Andre

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Kris Kennaway

Andre Oppermann wrote:

Robert Watson wrote:
Experience suggests that forwarding workloads see significant lock 
contention in the routing and transmit queue code.  The former needs 
some kernel hacking to address in order to improve parallelism for 
routing lookups.  The latter is harder to address given the hardware 
you're using: modern 10gbps cards frequently offer multiple transmit 
queues that can be used independently (which our cxgb driver 
supports), but 1gbps cards generally don't.


Actually the routing code is not contended.  The workload in router
is mostly serialized without much opportunity for contention.  With
many interfaces and any-to-any traffic patterns it may get some
contention.  The locking overhead per packet is always there and has
some impact though.



Actually contention from route locking is a major bottleneck even on 
packet generation from multiple CPUs on a single host.  It is becoming 
increasingly necessary that someone look into fixing this.


Kris
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Bruce Evans

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Bruce Evans wrote:
What are the other overheads?  I calculate 1.644Mpps counting the inter-frame
gap, with 64-byte packets and 64-header_size payloads.  If the 64 bytes
is for the payload, then the max is much lower.


The theoretical maximum at 64byte frames is 1,488,100.  I've looked
up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.


Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
1.488095 is for 64 bits extra (10^9/(8*64+96+64)).

I hoped to reach 1Mpps with the hardware I mentioned some mails before, 
but 2Mpps is far far away.

Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium.


This is more or less expected.  PCI32 is not able to sustain high
packet rates.  The bus setup times kill the speed.  For larger packets
the ratio gets much better and some reasonable throughput can be achieved.


I get about 640 kpps without forwarding (sendto: slightly faster;
recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the memory
from 200MHz to 100MHz only reduces the speed by about 10%, while not
overclocking the CPU by 10% reduces the speed by the same 10%, so the
system is apparently still mainly CPU-bound.


On [EMAIL PROTECTED]  He's using a 1.2GHz Mobile Pentium on top of that.


Yes.  My example shows that FreeBSD is more CPU-bound than I/O bound up
to CPUs considerably faster than a 1.2GHz Pentium (though PentiumM is
fast relative to its clock speed).  The memory interface may matter more
than the CPU clock.


NetFPGA doesn't have enough TCAM space to be useful for real routing
(as in Internet sized routing table).  The trick many embedded networking
CPUs use is cache prefetching that is integrated with the network
controller.  The first 64-128bytes of every packet are transferred
automatically into the L2 cache by the hardware.  This allows relatively
slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale
7448 in NPE-G2) to get more than 1Mpps.  Until something like this is
possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed.


Does using fa$ter memory (speed and/or latency) help here?  64 bytes
is so small that latency may be more of a problem, especially without
a prefetch.


Latency.  For IPv4 packet forwarding only one cache line per packet
is fetched.  More memory speed only helps with the DMA from/to the
network card.


I use low-end memory, but on the machine that does 640 kpps it somehow
has latency almost 4 times as low as on new FreeBSD cluster machines
(~42 nsec instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.  11 * 42 accounts for 442 nsec out of the 1562 per
packet at this rate.  11 * 150 = 1650 would probably make this rate
unachievable despite the system having 20 times as much CPU and bus.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Robert Watson


On Mon, 7 Jul 2008, Bruce Evans wrote:

I use low-end memory, but on the machine that does 640 kpps it somehow has 
latency almost 4 times as low as on new FreeBSD cluster machines (~42 nsec 
instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc report an 
average of 11 k8-dc-misses per sendto() while sending via bge at 640 kpps. 
11 * 42 accounts for 442 nsec out of the 1562 per packet at this rate.  11 * 
150 = 1650 would probably make this rate unachievable despite the system 
having 20 times as much CPU and bus.


Since you're doing fine-grained performance measurements of a code path that 
interests me a lot, could you compare the cost per-send on UDP for the 
following four cases:


(1) sendto() to a specific address and port on a socket that has been bound to
INADDR_ANY and a specific port.

(2) sendto() on a specific address and port on a socket that has been bound to
a specific IP address (not INADDR_ANY) and a specific port.

(3) send() on a socket that has been connect()'d to a specific IP address and
a specific port, and bound to INADDR_ANY and a specific port.

(4) send() on a socket that has been connect()'d to a specific IP address
and a specific port, and bound to a specific IP address (not INADDR_ANY)
and a specific port.

The last of these should really be quite a bit faster than the first of these, 
but I'd be interested in seeing specific measurements for each if that's 
possible!


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Robert Watson


On Mon, 7 Jul 2008, Robert Watson wrote:

The last of these should really be quite a bit faster than the first of 
these, but I'd be interested in seeing specific measurements for each if 
that's possible!


And, if you're feeling particularly subject to suggestion, you might consider
comparing 7.0 and recent 8.x along the same dimensions :-).


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Bruce Evans

On Mon, 7 Jul 2008, Robert Watson wrote:

Since you're doing fine-grained performance measurements of a code path that 
interests me a lot, could you compare the cost per-send on UDP for the 
following four cases:


(1) sendto() to a specific address and port on a socket that has been bound
    to INADDR_ANY and a specific port.

(2) sendto() on a specific address and port on a socket that has been bound
    to a specific IP address (not INADDR_ANY) and a specific port.

(3) send() on a socket that has been connect()'d to a specific IP address and
    a specific port, and bound to INADDR_ANY and a specific port.

(4) send() on a socket that has been connect()'d to a specific IP address
    and a specific port, and bound to a specific IP address (not INADDR_ANY)
    and a specific port.

The last of these should really be quite a bit faster than the first of 
these, but I'd be interested in seeing specific measurements for each if 
that's possible!


Not sure if I understand networking well enough to set these up quickly.
Does netrate use one of (3) or (4) now?

I can tell you vaguely about old results for netrate (send()) vs ttcp
(sendto()).  send() is lighter weight of course, and this made a difference
of 10-20%, but after further tuning the difference became smaller, which
suggests that everything ends up waiting for something in common.

Now I can measure cache misses better and hope that a simple count of
cache misses will be a more reproducible indicator of significant
bottlenecks than pps.  I got nowhere trying to reduce instruction
counts, possibly because it would take avoiding 100's of instructions
to get the same benefit as avoiding a single cache miss.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Erik Trulsson
On Mon, Jul 07, 2008 at 10:30:53PM +1000, Bruce Evans wrote:
 On Mon, 7 Jul 2008, Andre Oppermann wrote:
 
  Bruce Evans wrote:
  What are the other overheads?  I calculate 1.644Mpps counting the 
  inter-frame
  gap, with 64-byte packets and 64-header_size payloads.  If the 64 bytes
  is for the payload, then the max is much lower.
 
  The theoretical maximum at 64byte frames is 1,488,100.  I've looked
  up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.
 
 Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
 1.488095 is for 64 bits extra (10^9/(8*64+96+64)).

A standard ethernet frame (on the wire) consists of:
7 octets preamble
1 octet  Start Frame Delimiter
6 octets destination address
6 octets source address
2 octets length/type
46-1500 octets  data (+padding if needed)
4 octets Frame Check Sequence

Followed by (at least) 96 bits interFrameGap, before the next frame starts.

For minimal packet size this gives a maximum packet rate at 1Gbit/s of
1e9 / ((7+1+6+6+2+46+4)*8 + 96) = 1488095 packets/second

You probably missed the preamble and start frame delimiter in your
calculation.
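
To make the arithmetic easy to reproduce, here is a small stand-alone
program (a sketch for illustration only, not one of the test tools used
in this thread) that computes the limit from the frame layout above:

/*
 * Theoretical maximum frame rate on 1Gbit/s Ethernet for a given
 * payload size, counting preamble, SFD, addresses, type, FCS and the
 * 96-bit inter-frame gap.
 */
#include <stdio.h>

int
main(void)
{
        const double link_bps = 1e9;                    /* 1 Gbit/s */
        const int overhead = 7 + 1 + 6 + 6 + 2 + 4;     /* preamble+SFD+addrs+type+FCS */
        const int ifg_bits = 96;                        /* inter-frame gap */
        const int payloads[] = { 46, 1500 };            /* min and max data sizes */

        for (int i = 0; i < 2; i++) {
                double bits = (overhead + payloads[i]) * 8.0 + ifg_bits;
                printf("%4d-byte payload: %.0f frames/second\n",
                    payloads[i], link_bps / bits);
        }
        return (0);
}

This prints 1488095 for minimum-size frames and 81274 for 1500-byte
payloads.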




-- 
Insert your favourite quote here.
Erik Trulsson
[EMAIL PROTECTED]
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Bruce Evans wrote:

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Bruce Evans wrote:
What are the other overheads?  I calculate 1.644Mpps counting the 
inter-frame

gap, with 64-byte packets and 64-header_size payloads.  If the 64 bytes
is for the payload, then the max is much lower.


The theoretical maximum at 64byte frames is 1,488,100.  I've looked
up my notes the 1.244Mpps number can be ajusted to 1.488Mpps.


Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
1.488095 is for 64 bits extra (10^9/(8*64+96+64)).


The preamble has 64 bits and is in addition to the inter-frame gap.

I hoped to reach 1Mpps with the hardware I mentioned some mails 
before, but 2Mpps is far far away.

Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium.


This is more or less expected.  PCI32 is not able to sustain high
packet rates.  The bus setup times kill the speed.  For larger packets
the ratio gets much better and some reasonable throughput can be 
achieved.


I get about 640 kpps without forwarding (sendto: slightly faster;
recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the memory
from 200MHz to 100MHz only reduces the speed by about 10%, while not
overclocking the CPU by 10% reduces the speed by the same 10%, so the
system is apparently still mainly CPU-bound.


On [EMAIL PROTECTED]  He's using a 1.2GHz Mobile Pentium on top of that.


Yes.  My example shows that FreeBSD is more CPU-bound than I/O bound up
to CPUs considerably faster than a 1.2GHz Pentium (though PentiumM is
fast relative to its clock speed).  The memory interface may matter more
than the CPU clock.


NetFPGA doesn't have enough TCAM space to be useful for real routing
(as in Internet sized routing table).  The trick many embedded 
networking

CPUs use is cache prefetching that is integrated with the network
controller.  The first 64-128bytes of every packet are transferred
automatically into the L2 cache by the hardware.  This allows 
relatively
slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz 
Freescale

7448 in NPE-G2) to get more than 1Mpps.  Until something like this is
possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM 
speed.


Does using fa$ter memory (speed and/or latency) help here?  64 bytes
is so small that latency may be more of a problem, especially without
a prefetch.


Latency.  For IPv4 packet forwarding only one cache line per packet
is fetched.  More memory speed only helps with the DMA from/to the
network card.


I use low-end memory, but on the machine that does 640 kpps it somehow
has latency almost 4 times as low as on new FreeBSD cluster machines
(~42 nsec instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.  11 * 42 accounts for 442 nsec out of the 1562 per
packet at this rate.  11 * 150 = 1650 would probably make this rate
unachievable despite the system having 20 times as much CPU and bus.


We were talking routing here.  That is a packet received via network
interface and sent out on another.  Crosses the PCI bus twice.

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Robert Watson


On Mon, 7 Jul 2008, Bruce Evans wrote:

(1) sendto() to a specific address and port on a socket that has been bound
    to INADDR_ANY and a specific port.

(2) sendto() on a specific address and port on a socket that has been bound
    to a specific IP address (not INADDR_ANY) and a specific port.

(3) send() on a socket that has been connect()'d to a specific IP address and
    a specific port, and bound to INADDR_ANY and a specific port.

(4) send() on a socket that has been connect()'d to a specific IP address
    and a specific port, and bound to a specific IP address (not INADDR_ANY)
    and a specific port.

The last of these should really be quite a bit faster than the first of 
these, but I'd be interested in seeing specific measurements for each if 
that's possible!


Not sure if I understand networking well enough to set these up quickly. 
Does netrate use one of (3) or (4) now?


(3) and (4) are effectively the same thing, I think, since connect(2) should 
force the selection of a source IP address, but I think it's not a bad idea to 
confirm that. :-)
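
One quick way to confirm it (a throw-away sketch, not netrate code; the
destination address and port are placeholders) is to connect(2) an unbound
UDP socket and print what getsockname(2) reports afterwards:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        struct sockaddr_in dst, local;
        socklen_t len = sizeof(local);
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
                err(-1, "socket");

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);                     /* placeholder */
        dst.sin_addr.s_addr = inet_addr("192.0.2.1");   /* placeholder */
        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                err(-1, "connect");

        if (getsockname(s, (struct sockaddr *)&local, &len) < 0)
                err(-1, "getsockname");
        printf("source address chosen by connect(): %s:%u\n",
            inet_ntoa(local.sin_addr), (unsigned)ntohs(local.sin_port));
        return (0);
}

If connect(2) does select the source address, cases (3) and (4) should
behave identically.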


The structure of the desired micro-benchmark here is basically:

int
main(int argc, char *argv[])
{
        struct sockaddr_in sin;

        /* Parse command line arguments such as addresses and ports. */
        if (bind_desired) {
                /* Set up sockaddr_in. */
                if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        err(-1, "bind");
        }

        /* Set up destination sockaddr_in. */
        if (connect_desired) {
                if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        err(-1, "connect");
        }

        while (appropriate_condition) {
                if (connect_desired) {
                        if (send(s, ...) < 0)
                                errors++;
                } else {
                        if (sendto(s, ..., (struct sockaddr *)&sin,
                            sizeof(sin)) < 0)
                                errors++;
                }
        }
}

I can tell you vaguely about old results for netrate (send()) vs ttcp 
(sendto()).  send() is lighter weight of course, and this made a difference 
of 10-20%, but after further tuning the difference became smaller, which 
suggests that everything ends up waiting for something in common.


Now I can measure cache misses better and hope that a simple count of cache 
misses will be a more reproducible indicator of significant bottlenecks than 
pps.  I got nowhere trying to reduce instruction counts, possibly because it 
would take avoiding 100's of instructions to get the same benefit as 
avoiding a single cache miss.


If you look at the design of the higher performance UDP applications, they 
will generally bind a specific IP (perhaps every IP on the host with its own 
socket), and if they do sustained communication to a specific endpoint they 
will use connect(2) rather than providing an address for each send(2) system 
call to the kernel.


udp_output(2) makes the trade-offs there fairly clear: with the most recent
rev, the optimal case is one where connect(2) has been called, allowing a
single inpcb read lock and no global data structure access, vs. an
application calling sendto(2) for each system call and the local binding
remaining INADDR_ANY.  Middle ground applications, such as named(8), will
force a local binding using bind(2), but then still have to pass an address
to each sendto(2).  In the future, this case will be further optimized in
our code by using a global read lock rather than a global write lock: we
have to check for collisions, but we don't actually have to reserve the new
4-tuple for the UDP socket as it's an ephemeral association rather than a
connect(2).
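
As a concrete illustration of that optimal case (a sketch only, not netrate
or any committed code; the addresses, port and packet count below are
placeholders): bind(2) a specific local address, connect(2) once, and the
inner loop is then a bare send(2) with no per-packet address.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_in local, peer;
        char payload[18];       /* 18 + UDP + IP headers = 46 bytes of frame data */
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
                err(-1, "socket");
        memset(payload, 0, sizeof(payload));

        /* Bind a specific local IP and port (placeholders). */
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_port = htons(9000);
        local.sin_addr.s_addr = inet_addr("10.0.0.1");
        if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0)
                err(-1, "bind");

        /* Connect once; later send(2) calls carry no address. */
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(9000);
        peer.sin_addr.s_addr = inet_addr("10.0.0.2");
        if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0)
                err(-1, "connect");

        for (int i = 0; i < 1000000; i++) {
                /* Ignore ENOBUFS when the interface queue fills. */
                (void)send(s, payload, sizeof(payload), 0);
        }
        close(s);
        return (0);
}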


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Bruce Evans

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Paul,

to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:

1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 
monitor)

...


I won't be running many of these tests, but found this one interesting --
I didn't know about monitor mode.  It gives the following behaviour:

-monitor ttcp receiving on bge0 at 397 kpps: 35% idle (8.0-CURRENT) 13.6 cm/p
 monitor ttcp receiving on bge0 at 397 kpps: 83% idle (8.0-CURRENT)  5.8 cm/p
-monitor ttcp receiving on em0  at 580 kpps:  5% idle (~5.2)        12.5 cm/p
 monitor ttcp receiving on em0  at 580 kpps: 65% idle (~5.2)         4.8 cm/p

cm/p = k8-dc-misses (bge0 system)
cm/p = k7-dc-misses (em0 system)

So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor mode, so I think most of
the cache misses in upper layers are not related to the packet header.
Maybe they are due mainly to perfect non-locality for mbufs.

Other cm/p numbers:

ttcp sending on bge0 at 640 kpps: (~5.2)11 cm/p
ttcp sending on bge0 at 580 kpps: (8.0-CURRENT)  9 cm/p
(-current is 10% slower despite having lower cm/p.  This seems to be
due to extra instructions executed)
ping -fq -c100 localhost at 171 kpps: (8.0-CURRENT) 12-33 cm/p
(This is certainly CPU-bound.  lo0 is much slower than bge0.
Latency (rtt) is 2 us.  It is 3 us in ~5.2 and was 4 in -current until
very recently.)
ping -fq -c100 etherhost at  40 kpps: (8.0-CURRENT)55 cm/p
(The rate is quite low because flood ping doesn't actually flood.
It tries to limit the rate to max(100, 1/latency), but it tends to
go at a rate of ql(t)/latency where ql(t) is the average hardware
queue length at the current time t.  ql(t) starts at 1 and builds up
after a minute or 2 to a maximum of about 10 on my hardware.
Latency is always ~100 us, so the average ql(t) must have been ~4.)
ping -fq -c100 etherhost at  20 kpps: (8.0-CURRENT)45 cm/p
(Another run to record the average latency (it was 121) showed high
variance.)
netblast sending on bge0 at 582 kpps: (8.0-CURRENT)  9.8 cm/p
(Packet blasting benchmarks actually flood, unlike flood ping.
This is hard to implement, since select() for output-ready doesn't
work.  netblast has to busy wait, while ttcp guesses how long to
sleep but cannot sleep for a short enough interval unless queues
are too large or hz is too small.  My systems are configured with
HZ = 100 and snd.ifq too large so that sleeping for 1/Hz works for
ttcp.  netblast still busy-waits.

This gives an interesting difference for netblast.  It tries to send
800 k packets in 1 second but only successfully sends 582 k.  9.8
cm/p is for #misses / 582k.  The 300k unsuccessful sends apparently
don't cause many cache misses.  But variance is high...)
ttcp sending on bge0 at 577 kpps: (8.0-CURRENT) 15.5 cm/p
(Another run shows high variance.)
ttcp rates have low variance for a given kernel but high variance for
different kernels (an extra unrelated byte in the text section can
cause a 30% change).

High variance would also be explained by non-locality of mbufs.  Cycling
through lots of mbufs would maximize cache misses but random reuse of
mbufs would give variance.  Or the cycling and variance might be more
in general allocation.  There is silliness in getsockaddr():  sendit()
calls getsockaddr() and getsockaddr() always uses malloc(), but
allocation on the stack works for the call from sendit().  This
malloc() seemed to be responsible for a cache miss or two, but when I
changed it to use the stack the results were inconclusive.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Andre Oppermann

Bruce Evans wrote:

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Paul,

to get a systematic analysis of the performance please do the following
tests and put them into a table for easy comparison:

1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 
monitor)

...


I won't be running many of these tests, but found this one interesting --
I didn't know about monitor mode.  It gives the following behaviour:

-monitor ttcp receiving on bge0 at 397 kpps: 35% idle (8.0-CURRENT) 13.6 cm/p
 monitor ttcp receiving on bge0 at 397 kpps: 83% idle (8.0-CURRENT)  5.8 cm/p
-monitor ttcp receiving on em0  at 580 kpps:  5% idle (~5.2)        12.5 cm/p
 monitor ttcp receiving on em0  at 580 kpps: 65% idle (~5.2)         4.8 cm/p


cm/p = k8-dc-misses (bge0 system)
cm/p = k7-dc-misses (em0 system)

So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor mode, so I think most of
the cache misses in upper layers are not related to the packet header.
Maybe they are due mainly to perfect non-locality for mbufs.


Monitor mode doesn't access the payload packet header.  It only looks
at the mbuf (which has a structure called mbuf packet header).  The mbuf
header is hot in the cache because the driver just touched it and filled
in the information.  The packet content (the payload) is cold and just
arrived via DMA in DRAM.

--
Andre
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Bruce Evans

On Mon, 7 Jul 2008, Andre Oppermann wrote:


Bruce Evans wrote:

So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor mode, so I think most of
the cache misses in upper layers are not related to the packet header.
Maybe they are due mainly to perfect non-locality for mbufs.


Monitor mode doesn't access the payload packet header.  It only looks
at the mbuf (which has a structure called mbuf packet header).  The mbuf
header it hot in the cache because the driver just touched it and filled
in the information.  The packet content (the payload) is cold and just
arrived via DMA in DRAM.


Why does it use ntohs() then? :-).  From if_ethersubr.c:

% static void
% ether_input(struct ifnet *ifp, struct mbuf *m)
% {
%         struct ether_header *eh;
%         u_short etype;
%
%         if ((ifp->if_flags & IFF_UP) == 0) {
%                 m_freem(m);
%                 return;
%         }
% #ifdef DIAGNOSTIC
%         if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
%                 if_printf(ifp, "discard frame at !IFF_DRV_RUNNING\n");
%                 m_freem(m);
%                 return;
%         }
% #endif
%         /*
%          * Do consistency checks to verify assumptions
%          * made by code past this point.
%          */
%         if ((m->m_flags & M_PKTHDR) == 0) {
%                 if_printf(ifp, "discard frame w/o packet header\n");
%                 ifp->if_ierrors++;
%                 m_freem(m);
%                 return;
%         }
%         if (m->m_len < ETHER_HDR_LEN) {
%                 /* XXX maybe should pullup? */
%                 if_printf(ifp, "discard frame w/o leading ethernet "
%                                 "header (len %u pkt len %u)\n",
%                                 m->m_len, m->m_pkthdr.len);
%                 ifp->if_ierrors++;
%                 m_freem(m);
%                 return;
%         }
%         eh = mtod(m, struct ether_header *);

Point outside of mbuf header.

%         etype = ntohs(eh->ether_type);

First access outside of mbuf header.

But this seems to be bogus and might be fixed by compiler optimization,
since etype is not used until after the monitor mode returns.  This may
have been broken by debugging cruft -- in 5.2, etype is used immediately
after here in a printf about discarding oversize frames.  The compiler
might also pessimize things by reordering code.

%         if (m->m_pkthdr.rcvif == NULL) {
%                 if_printf(ifp, "discard frame w/o interface pointer\n");
%                 ifp->if_ierrors++;
%                 m_freem(m);
%                 return;
%         }
% #ifdef DIAGNOSTIC
%         if (m->m_pkthdr.rcvif != ifp) {
%                 if_printf(ifp, "Warning, frame marked as received on %s\n",
%                         m->m_pkthdr.rcvif->if_xname);
%         }
% #endif
%
%         if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
%                 if (ETHER_IS_BROADCAST(eh->ether_dhost))
%                         m->m_flags |= M_BCAST;
%                 else
%                         m->m_flags |= M_MCAST;
%                 ifp->if_imcasts++;
%         }

Another dereference of eh (2 unless optimizable and optimized).  Here
the result is actually used early, but I think you don't care enough
about maintaining if_imcasts to do this.

%
% #ifdef MAC
%         /*
%          * Tag the mbuf with an appropriate MAC label before any other
%          * consumers can get to it.
%          */
%         mac_ifnet_create_mbuf(ifp, m);
% #endif
%
%         /*
%          * Give bpf a chance at the packet.
%          */
%         ETHER_BPF_MTAP(ifp, m);

I think this can access the whole packet, but usually doesn't.

%
%         /*
%          * If the CRC is still on the packet, trim it off. We do this once
%          * and once only in case we are re-entered. Nothing else on the
%          * Ethernet receive path expects to see the FCS.
%          */
%         if (m->m_flags & M_HASFCS) {
%                 m_adj(m, -ETHER_CRC_LEN);
%                 m->m_flags &= ~M_HASFCS;
%         }
%
%         ifp->if_ibytes += m->m_pkthdr.len;
%
%         /* Allow monitor mode to claim this frame, after stats are updated. */
%         if (ifp->if_flags & IFF_MONITOR) {
%                 m_freem(m);
%                 return;
%         }

Finally return in monitor mode.

I don't see any stats update before here except for the stray if_imcasts
one.

BTW, stats behave strangely in monitor mode:
- netstat -I interface 1 works except:
  - the byte counts are 0 every second second (the next second counts the
    previous 2), while the packet counts are updated every second
  - one system started showing bge0 stats for all interfaces.  Perhaps
    unrelated.
- systat -ip shows all counts 0.
I think this is due to stats maintained by the driver working but other
stats not.  The mixture seems strange at user level.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Paul



one that will later on handle the taskqueue to process the packets.
That adds overhead.  Ideally the interrupt for each network interface
is bound to exactly one pre-determined CPU and the taskqueue is bound
to the same CPU.  That way the overhead for interrupt and taskqueue
scheduling can be kept at a minimum.  Most of the infrastructure to
do this binding already exists in the kernel but is not yet exposed
to the outside for us to make use of it.  I'm also not sure if the
ULE scheduler skips the more global locks when interrupt and the
thread are on the same CPU.

Distributing the interrupts and taskqueues among the available CPUs
gives concurrent forwarding with bi- or multi-directional traffic.
All incoming traffic from any particular interface is still serialized
though.




I used etherchannel to distribute incoming packets evenly over 3 separate
cpus, but the output was on one interface..  What I got was less performance
than with one cpu, and all three cpus were close to 100% utilized.
em0, em1 and em2 were all receiving packets and sending them out em3.  The
machine had 4 cpus in it.  em3 taskq was low cpu usage while em0,1,2 were
using cpu0,1,2 (for example) almost fully.  With all that cpu power being
used I got less performance than with 1 cpu :/ Obviously in SMP there is
a big issue somewhere.


Also my 82571 NIC supports multiple receive queues and multiple transmit
queues, so why hasn't anyone written the driver to support this?  It's not
a 10gb card and it still supports it, and it's widely available and not too
expensive either.   The new 82575/6 chips support even more queues; the
two port version will be out this month and the 4 port in october (PCI-E
cards).  Motherboards are already shipping with the 82576..   (82571
supports 2x/2x, 575/6 support 4x/4x)


Paul








___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Paul



I use low-end memory, but on the machine that does 640 kpps it somehow
has latency almost 4 times as low as on new FreeBSD cluster machines
(~42 nsec instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.  11 * 42 accounts for 442 nsec out of the 1562 per
packet at this rate.  11 * 150 = 1650 would probably make this rate
unachievable despite the system having 20 times as much CPU and bus.

Any of the buffered DIMMs, DDR3, or high-CAS DDR2 are going to have a lot
more latency than older parts, because the frequency is so high or because
of the buffering.  The best is to use DDR2 with the lowest timings it
supports at the highest frequency, not the highest frequency it supports at
higher timings.  For instance, I have some 1100MHz DDR2 RAM that is 5-5-5-15,
but it will do 5-4-4-12 at 1000 or 900MHz, so I think the latency may have
more impact on the speed than the actual MHz of the RAM itself.  This works
for several benchmarks which I have tested before, running the RAM at 1:1
with the FSB (400 FSB (1600 FSB actual) with RAM at 800; the latency is
a lot lower than RAM at 1:1.20 FSB even though the bandwidth is higher).
With higher latency in the 'server' machines we probably need to do
things in bigger chunks.. Anyone using a FBSD router isn't going to care
about a 1ms delay in the packet but they will care if packets are
dropped or reordered.


Paul
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Bruce Evans

On Tue, 8 Jul 2008, Bruce Evans wrote:


On Mon, 7 Jul 2008, Andre Oppermann wrote:


Bruce Evans wrote:

So it seems that the major overheads are not near the driver (as I already
knew), and upper layers are responsible for most of the cache misses.
The packet header is accessed even in monitor mode, so I think most of
the cache misses in upper layers are not related to the packet header.
Maybe they are due mainly to perfect non-locality for mbufs.


Monitor mode doesn't access the payload packet header.  It only looks
at the mbuf (which has a structure called mbuf packet header).  The mbuf
header it hot in the cache because the driver just touched it and filled
in the information.  The packet content (the payload) is cold and just
arrived via DMA in DRAM.


Why does it use ntohs() then? :-).  From if_ethersubr.c:



...
%   eh = mtod(m, struct ether_header *);

Point outside of mbuf header.

%         etype = ntohs(eh->ether_type);

First access outside of mbuf header.
...
%         /* Allow monitor mode to claim this frame, after stats are updated. */
%         if (ifp->if_flags & IFF_MONITOR) {
%                 m_freem(m);
%                 return;
%         }

Finally return in monitor mode.

I don't see any stats update before here except for the stray if_imcasts
one.


There are some error stats with printfs, but I've never seen these do
anything except with a buggy sk driver.

Testing verifies that accessing eh above gives a cache miss.  Under
~5.2 receiving on bge0 at 397 kpps:

-monitor: 17% idle 19 cm/p  (18% less idle than under -current)
 monitor: 66% idle  8 cm/p  (17% less idle than under -current)
+monitor: 71% idle  7 cm/p  (idle time under -current not measured)

+monitor is monitor mode with the exit moved to the top of ether_input().
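
Spelled out, the change amounts to roughly the following (a sketch of the
idea rather than the actual patch): claim the frame at the very top of
ether_input(), before eh = mtod(m, ...) is dereferenced, so a monitor-mode
interface never touches the packet payload and avoids that cache miss.  The
byte counter only reads the mbuf packet header, which is already hot.

static void
ether_input(struct ifnet *ifp, struct mbuf *m)
{

        /* Early exit: claim the frame before any payload access. */
        if (ifp->if_flags & IFF_MONITOR) {
                ifp->if_ibytes += m->m_pkthdr.len;
                m_freem(m);
                return;
        }

        /* ... the rest of ether_input() as quoted earlier ... */
}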

If the cache miss takes the time measured by lmbench2 (42 ns), then
397 k of these per second gives 17 ms or 1.7% CPU, which is vaguely
consistent with the improvement of 5% by not taking this cache miss.
Avoiding most of the 19 cache misses should give much more than a
5% improvement.  Maybe -current gets its 17% improvement by avoiding
some.

More stats weirdness in userland:
- in monitor mode, em0 gives byte counts delayed while bge0 gives byte
  counts always 0.
- netstat -I interface 1 seems to be broken in ~5.2 in all modes -- it
  gives output for interfaces with drivers but no hardware.

All this is for UP.  An SMP kernel on the same UP system loses > 5% for at
least tx.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Artem Belevich
Hi,

As was already mentioned, we can't avoid all cache misses as there's
data that's recently been updated in memory via DMA and therefore
kicked out of cache.

However, we may hide some of the latency penalty by prefetching
'interesting' data early. I.e. we know that we want to access some
ethernet headers, so we may start pulling relevant data into cache
early. Ideally, by the time we need to access the field, it will
already be in the cache. When we're counting nanoseconds per packet
this may bring some performance gain.

Just my $0.02.
--Artem
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Mike Tancsa

At 02:44 PM 7/7/2008, Paul wrote:
Also my 82571 NIC supports multiple received queues and multiple 
transmit queues so why hasn't
anyone written the driver to support this?  It's not a 10gb card and 
it still supports it and it's widely



Intel actually maintains the driver. Not sure if there are plans or 
not, but perhaps they can comment ?


---Mike

available and not too expensive either.   The new 82575/6 chips 
support even more queues and the
two port version will be out this month and the 4 port in october 
(PCI-E cards).  Motherboards are
already shipping with the 82576..   (82571 supports 2x/2x  575/6 
support 4x/4x)


Paul








___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Julian Elischer

Artem Belevich wrote:

Hi,

As was already mentioned, we can't avoid all cache misses as there's
data that's recently been updated in memory via DMA and therefor
kicked out of cache.

However, we may hide some of the latency penalty by prefetching
'interesting' data early. I.e. we know that we want to access some
ethernet headers, so we may start pulling relevant data into cache
early. Ideally, by the time we need to access the field, it will
already be in the cache. When we're counting nanoseconds per packet
this may bring some performance gain.


Prefetching when you are waiting for the data isn't a help.
What you need is a speculative prefetch where you can tell the
processor "We will probably need the following address so start
getting it while we go do other stuff."


As far as I know we have no capacity to do that..



Just my $0.02.
--Artem
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Peter Jeremy
On 2008-Jul-07 13:25:13 -0700, Julian Elischer [EMAIL PROTECTED] wrote:
what you need is a speculative prefetch where you can tell the
processor "We will probably need the following address so start
getting it while we go do other stuff."

This looks like the PREFETCH instructions that exist in at least amd64
and SPARC.  Unfortunately, their optimal use is very implementation-
dependent and the AMD documentation suggests that incorrect use can
degrade performance.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.


pgptHl9lUe7aI.pgp
Description: PGP signature


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Julian Elischer

Peter Jeremy wrote:

On 2008-Jul-07 13:25:13 -0700, Julian Elischer [EMAIL PROTECTED] wrote:
what you need is a speculative prefetch where you can tell the
processor "We will probably need the following address so start
getting it while we go do other stuff."


This looks like the PREFETCH instructions that exist in at least amd64
and SPARC.  Unfortunately, their optimal use is very implementation-
dependent and the AMD documentation suggests that incorrect use can
degrade performance.



It might be worth looking to see if the network processing threads 
might be able to prefetch the IP header at least :-)


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Artem Belevich
  Prefetching when you are waiting for the data isn't a help.

Agreed. Got to start the prefetch around <put your memory latency here> ns
before you actually need the data and move on doing other things that
do not depend on the data you've just started prefetching.

  what you need is a speculative prefetch where you can tell the processor "We
 will probably need the following address so start getting it while we go do
 other stuff."

It does not have to be 'speculative' either. In this particular case
we have a very good idea that we *will* need some data from the ethernet
header and, probably, the IP and TCP headers as well. We might as well tell
the hardware to start pulling data in without stalling the CPU. Intel
has instructions specifically for this purpose. I assume AMD has them
too.
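
A minimal sketch of the idea (for illustration only, not code from any
FreeBSD driver) using the GCC/Clang __builtin_prefetch() builtin, which maps
to the x86 prefetch instructions; handle_packet() is a placeholder for the
real per-packet work:

#include <stddef.h>

struct pkt {
        unsigned char hdr[64];          /* first cache line: Ethernet/IP headers */
        unsigned char rest[1454];
};

void handle_packet(struct pkt *);       /* placeholder for the real work */

void
process_burst(struct pkt **pkts, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                /* Start pulling the next header in while we work on this one. */
                if (i + 1 < n)
                        __builtin_prefetch(pkts[i + 1]->hdr, 0, 1);
                handle_packet(pkts[i]);
        }
}

Whether this actually helps depends on having enough independent work
between the prefetch and the use, which is exactly the problem Julian
points out.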

--Artem
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Mike Tancsa

At 02:44 PM 7/7/2008, Paul wrote:

Also my 82571 NIC supports multiple received queues and multiple 
transmit queues so why hasn't
anyone written the driver to support this?  It's not a 10gb card and 
it still supports it and it's widely
available and not too expensive either.   The new 82575/6 chips 
support even more queues and the
two port version will be out this month and the 4 port in october 
(PCI-E cards).  Motherboards are
already shipping with the 82576..   (82571 supports 2x/2x  575/6 
support 4x/4x)





Actually, do any of your NICs attach via the igb driver ?

---Mike 


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-07 Thread Paul
I read through the IGB driver, and it says 82575/6 only...  which is the
new chip Intel is releasing on the cards this month (2 port) and in october
(4 port), but the chips are on some of the motherboards right now.
Why can't it also use the 82571?  It doesn't make any sense.. I haven't
tried it, but from just browsing the driver source it doesn't look like it
will work.



Mike Tancsa wrote:

At 02:44 PM 7/7/2008, Paul wrote:

Also my 82571 NIC supports multiple received queues and multiple 
transmit queues so why hasn't
anyone written the driver to support this?  It's not a 10gb card and 
it still supports it and it's widely
available and not too expensive either.   The new 82575/6 chips 
support even more queues and the
two port version will be out this month and the 4 port in october 
(PCI-E cards).  Motherboards are
already shipping with the 82576..   (82571 supports 2x/2x  575/6 
support 4x/4x)





Actually, do any of your NICs attach via the igb driver ?

---Mike
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-06 Thread Andrew Snow


I'm no expert, but I imagine the problem is because the net processing 
of FreeBSD is not pipelined enough.  We are now able to affordably throw 
many gigabytes of RAM into a machine, as well as 2 to 8 CPUs.  So why not
allow for big buffers and multiple processing steps?


I'd be happy to give up a bit of latency in order to increase the parallel
processing ability of packets travelling through the system.  I could be
wrong, but I imagine it would be better to treat the processing of
packets as a series of stages with queues (that can grow quite large if
necessary).


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-05 Thread Paul

ULE + PREEMPTION for non SMP
no major differences with SMP with ULE/4BSD and preemption ON/OFF

32 bit UP test coming up with new cpu
and I'm installing dragonfly sometime this weekend :]
UP: 1mpps in one direction with no firewall/no routing table is not too 
bad, but 1mpps both directions is the goal here

700kpps with full bgp table in one direction is not too bad
Ipfw needs a lot of work, barely gets 500kpps with no routing table with 
a few ipfw rules loaded.. that's horrible
Linux barely takes a hit when you start loading iptables rules, but
then again linux has a HUGE problem with routing random packet
sources/ports .. grr
My problem Is I need some box to do fast routing and some to do 
firewall.. :/
I'll have 32 bit 7-stable UP test with ipfw/routing table and then move 
on to dragonfly.
I'll post the dragonfly results here as well as sign up for their 
mailing list.



Bart Van Kerckhove wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Paul / Ingo,
  

I tried all of this :/  still, 256/512 descriptors seem to work the
best. Happy to let you log into the machine and fiddle around if you
want :) 
  

I've been watching this thread closely, since I'm in a very similair
situation.
A few questions/remarks:

Does ULE provide better performance than 4BSD for forwarding?
Did you try freebsd4 as well? This thread had a report about that quite
opposite to my own experiences, -4 seemed to be a lot faster at forwarding
than anything else I 've tried so far.
Obviously the thing I'm interested in is IMIX - and 64byte packets.
Does anyone have any benchmarks for DragonFly? I asked around on IRC, but
that nor google turned up any useful results.

snip 
  

I don't think you will be able to route 64byte packets at 1gbit
wirespeed (2Mpps) with a current x86 platform.


Are there actual hardware related reasons this should not be possible, or
is this purely lack of dedicated work towards this goal?

snip
  

Theres a sun used at quagga dev as bgp-route-server.
http://quagga.net/route-server.php
(but they don't answered my question regarding fw-performance).




the Quagga guys are running a sun T1000 (niagara 1) route server - I happen
to have the machine in my racks,
please let me know if you want to run some tests on it, I'm sure they won't
mind ;-)
It should also make a great testbed for SMP performance testing imho (and
they're pretty cheap these days)
Also, feel free to use me as a relay for your questions, they're not always
very reachable.
snap

  

Perhaps you have some better luck at some different hardware systems
(ppc, mips, ..?) or use freebsd only for routing-table-updates and
special network-cards (netfpga) for real routing.


The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently quite
linux-centric (and according to docs doesn't have VLAN nor ip6 support, the
former being quite a dealbreaker)

Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a
freebsd4 and/or dragonfly bench if you can..) - appreciate the lots of
information you are providing us :)

Met vriendelijke groet / With kind regards,

Bart Van Kerckhove
http://friet.net/pgp.txt

-BEGIN PGP SIGNATURE-

iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi
eca31f7WQ/oXq9tJ8TEDN3CA
=YGYq
-END PGP SIGNATURE-


  


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-05 Thread Paul

UP 32 bit test vs 64 bit:
negligible difference in forwarding performance without polling
slightly better polling performance but still errors at lower packet rates
same massive hit with ipfw loaded

Installing dragonfly in a bit..
If anyone has a really fast PPC type system or SUN or something i'd love 
to try it :)

Something with a really big L1 cache :P


Paul wrote:

ULE + PREEMPTION for non SMP
no major differences with SMP with ULE/4BSD and preemption ON/OFF

32 bit UP test coming up with new cpu
and I'm installing dragonfly sometime this weekend :]
UP: 1mpps in one direction with no firewall/no routing table is not 
too bad, but 1mpps both directions is the goal here

700kpps with full bgp table in one direction is not too bad
Ipfw needs a lot of work, barely gets 500kpps with no routing table 
with a few ipfw rules loaded.. that's horrible
Linux barely takes a hit when you start loading iptables rules , but 
then again linux has a HUGE problem with routing

random packet sources/ports .. grr
My problem Is I need some box to do fast routing and some to do 
firewall.. :/
I'll have 32 bit 7-stable UP test with ipfw/routing table and then 
move on to dragonfly.
I'll post the dragonfly results here as well as sign up for their 
mailing list.



Bart Van Kerckhove wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Paul / Ingo,
 

I tried all of this :/  still, 256/512 descriptors seem to work the
best. Happy to let you log into the machine and fiddle around if you
want :)   

I've been watching this thread closely, since I'm in a very similair
situation.
A few questions/remarks:

Does ULE provide better performance than 4BSD for forwarding?
Did you try freebsd4 as well? This thread had a report about that quite
opposite to my own experiences, -4 seemed to be a lot faster at 
forwarding

than anything else I 've tried so far.
Obviously the thing I'm interested in is IMIX - and 64byte packets.
Does anyone have any benchmarks for DragonFly? I asked around on IRC, 
but

that nor google turned up any useful results.

snip  

I don't think you will be able to route 64byte packets at 1gbit
wirespeed (2Mpps) with a current x86 platform.

Are there actual hardware related reasons this should not be 
possible, or

is this purely lack of dedicated work towards this goal?

snip
 

Theres a sun used at quagga dev as bgp-route-server.
http://quagga.net/route-server.php
(but they don't answered my question regarding fw-performance).




the Quagga guys are running a sun T1000 (niagara 1) route server - I 
happen

to have the machine in my racks,
please let me know if you want to run some tests on it, I'm sure they 
won't

mind ;-)
It should also make a great testbed for SMP performance testing imho 
(and

they're pretty cheap these days)
Also, feel free to use me as a relay for your questions, they're not 
always

very reachable.
snap

 

Perhaps you have some better luck at some different hardware systems
(ppc, mips, ..?) or use freebsd only for routing-table-updates and
special network-cards (netfpga) for real routing.


The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently 
quite
linux-centric (and according to docs doesn't have VLAN nor ip6 
support, the

former being quite a dealbreaker)

Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a
freebsd4 and/or dragonfly bench if you can..) - appreciate the lots of
information you are providing us :)

Met vriendelijke groet / With kind regards,

Bart Van Kerckhove
http://friet.net/pgp.txt

-BEGIN PGP SIGNATURE-

iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi
eca31f7WQ/oXq9tJ8TEDN3CA
=YGYq
-END PGP SIGNATURE-


  


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-05 Thread Bart Van Kerckhove
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Paul / Ingo,
 
 I tried all of this :/  still, 256/512 descriptors seem to work the
 best. Happy to let you log into the machine and fiddle around if you
 want :) 
I've been watching this thread closely, since I'm in a very similar
situation.
A few questions/remarks:

Does ULE provide better performance than 4BSD for forwarding?
Did you try freebsd4 as well? This thread had a report about that quite
opposite to my own experiences; -4 seemed to be a lot faster at forwarding
than anything else I've tried so far.
Obviously the thing I'm interested in is IMIX - and 64byte packets.
Does anyone have any benchmarks for DragonFly? I asked around on IRC, but
neither that nor google turned up any useful results.

snip 
 I don't think you will be able to route 64byte packets at 1gbit
 wirespeed (2Mpps) with a current x86 platform.
Are there actual hardware related reasons this should not be possible, or
is this purely lack of dedicated work towards this goal?

snip
Theres a sun used at quagga dev as bgp-route-server.
http://quagga.net/route-server.php
(but they don't answered my question regarding fw-performance).


the Quagga guys are running a sun T1000 (niagara 1) route server - I happen
to have the machine in my racks,
please let me know if you want to run some tests on it, I'm sure they won't
mind ;-)
It should also make a great testbed for SMP performance testing imho (and
they're pretty cheap these days)
Also, feel free to use me as a relay for your questions, they're not always
very reachable.
snap

 Perhaps you have some better luck at some different hardware systems
 (ppc, mips, ..?) or use freebsd only for routing-table-updates and
 special network-cards (netfpga) for real routing.
The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently quite
linux-centric (and according to docs doesn't have VLAN nor ip6 support, the
former being quite a dealbreaker)

Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a
freebsd4 and/or dragonfly bench if you can..) - appreciate the lots of
information you are providing us :)

Met vriendelijke groet / With kind regards,

Bart Van Kerckhove
http://friet.net/pgp.txt

-BEGIN PGP SIGNATURE-

iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi
eca31f7WQ/oXq9tJ8TEDN3CA
=YGYq
-END PGP SIGNATURE-

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-05 Thread Chris Marlatt

Bart Van Kerckhove wrote:

The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently quite
linux-centric (and according to docs doesn't have VLAN nor ip6 support, the
former being quite a dealbreaker)



Just last Thursday they made another release so it certainly doesn't 
look dead. I've been following the project for awhile now to see where 
it's going to go. The lack of FreeBSD support isn't great but I doubt 
it's going to happen until someone steps up and makes it so. The same is 
likely true for VLAN support. So far it's primarily been a proof of 
concept from what I can tell and could be molded into any number of 
different applications with the appropriate support.


Considering all high performance routing platforms separate the 
management and routing/switching into two (or more) different hardware 
sections it wouldn't surprise me at all to see this as the only real 
option to get some serious routing and firewalling performance out of 
i386/amd64 type servers. Throwing faster and faster cpus at it is only 
going to get you so far (re: opteron 2212 vs ). Even so, 1.1Mpps is 
a considerable rate.


Regards,

Chris
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-05 Thread Julian Elischer

Bart Van Kerckhove wrote:




Perhaps you have some better luck at some different hardware systems
(ppc, mips, ..?) or use freebsd only for routing-table-updates and
special network-cards (netfpga) for real routing.

The netfpga site seems more or less dead - is this project still alive?
It does look like a very interesting idea, even though it's currently quite
linux-centric (and according to docs doesn't have VLAN nor ip6 support, the
former being quite a dealbreaker)


netfpga is very much alive. I'm on the mailing lists..

but it is summer break and it's an academically driven project.



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-04 Thread Bruce Evans

On Fri, 4 Jul 2008, Paul wrote:

Numbers are maximum with near 100% cpu usage and some errors occurring, just
for testing.
FreeBSD  7.0-STABLE FreeBSD 7.0-STABLE #6: Thu Jul  3 19:32:38 CDT 2008 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ROUTER  amd64

CPU: Dual-Core AMD Opteron(tm) Processor  (3015.47-MHz K8-class CPU)
NON-SMP KERNEL  em driver, intel 82571EB NICs
fastforwarding on, isr.direct on, ULE, Preemption (NOTE: Interesting thing, 
without preemption gets errors similar to polling)


PREEMPTION is certainly needed with UP.  Without it, interrupts don't
actually work (to work, they need to preempt the running thread, but
they often (usually?) don't do that).  Then with UP, there is a good
chance that the interrupt thread doesn't get scheduled to run for a
long time, but with SMP (especially with lots of CPUs) there is a good
chance that another CPU gets scheduled to run the interrupt thread.
em (unless misconfigured) doesn't have an interrupt thread; it uses a
taskq which might take even longer to be scheduled than an interrupt
thread.  I use PREEMPTION with UP and !PREEMPTION with SMP.

With polling, missed polls cause the same packet loss as not preempting.

I tried polling, and I tried the polling patch that was posted to the list 
and both work but generate too many errors (missed packets).

Without polling the packet errors ONLY occur when the cpu is near 100% usage


Polling should also only cause packet loss when the CPU is near 100% usage,
but now transients of near 100% usually cause packet loss, while with
interrupts it takes a transient of > 100% on the competing interrupt-
driven resources to cause packet loss.

Please trim quotes.

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-04 Thread Ingo Flaschberger

Dear Paul,


Opteron  UP mode, no polling

  input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
 1071020 0   66403248  2 0404 0


that looks good. (but seems to be near the limit).

Polling turned on provided better performance on 32 bit, but it gets strange 
errors on 64 bit..
Even at low pps I get small amounts of errors, and high pps same thing.. you 
would think that if
it got errors at low pps it would get more errors at high pps but that isn't 
the case..

Polling on:
packets  errs  bytespackets  errs  bytes colls
  979736   963   60743636  1 0226 0
  991838   496   61493960  1 0178 0
  996125   460   61759754  1 0178 0
  979381   326   60721626  1 0178 0
 1022249   379   63379442  1 0178 0
  991468   557   61471020  1 0178 0

lowering pps a little...
 input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  818688   151   50758660  1 0226 0
  837920   179   51951044  1 0178 0
  826217   168   51225458  1 0178 0
  801017   100   49663058  1 0178 0
  761857   287   47235138  1 0178 0


what could cause this?


*) kern.polling.idle_poll enabled?
*) kern.polling.user_frac ?
*) kern.polling.reg_frac ?
*) kern.polling.burst_max ?
*) kern.polling.each_burst ?
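
Assuming DEVICE_POLLING is compiled in, the whole subtree can be dumped in one
go, which makes the current values easy to paste into a reply:

sysctl kern.polling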

Kind regards,
Ingo Flaschberger
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-04 Thread Ingo Flaschberger

Dear Paul,


what could cause this?


*) kern.polling.idle_poll enabled?
*) kern.polling.user_frac ?
*) kern.polling.reg_frac ?
*) kern.polling.burst_max ?
*) kern.polling.each_burst ?


I tried tons of different values for these and nothing made any significant 
difference.

Idle polling makes a difference, allows more pps, but still errors.
Without idle polling it seems PPS is limited to HZ * descriptors, i.e. 1000 * 256
or 1000 * 512,
but 1000 * 1024 is the same as 512..  4000 * 256 or 2000 * 512 work but 
start erroring above 600kpps (SMP right now but it happens in UP too)


I have patched src/sys/kern/kern_poll.c to support higher burst_max 
values:

#define MAX_POLL_BURST_MAX  1

When setting kern.polling.burst_max to higher values, the server reaches a 
point where cpu usage goes up without load, so try to stay below that 
value. I have also set the network card to 4096 rx descriptors, to have more 
room for late polls.
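
Roughly, such a patch is a one-line change plus a runtime knob; a sketch, where
the raised ceiling is only illustrative (the stock value and the sysctl name
are the real ones):

in src/sys/kern/kern_poll.c, change
#define MAX_POLL_BURST_MAX  1000
to something like
#define MAX_POLL_BURST_MAX  10000       /* illustrative higher ceiling */

then rebuild the kernel and raise the value at runtime, e.g.:

sysctl kern.polling.burst_max=2000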


Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-04 Thread Paul

I tried all of this :/  still, 256/512 descriptors seem to work the best.
Happy to let you log into the machine and fiddle around if you want :)

Paul

Ingo Flaschberger wrote:

Dear Paul,


what could cause this?


*) kern.polling.idle_poll enabled?
*) kern.polling.user_frac ?
*) kern.polling.reg_frac ?
*) kern.polling.burst_max ?
*) kern.polling.each_burst ?


I tried tons of different values for these and nothing made any 
significant difference.

Idle polling makes a difference, allows more pps, but still errors.
Without idle polling it seems PPS is limited to HZ * descriptors, i.e. 
1000 * 256 or 1000 * 512,
but 1000 * 1024 is the same as 512..  4000 * 256 or 2000 * 512 work 
but start erroring above 600kpps (SMP right now but it happens in UP too)


I have patched src/sys/kern/kern_poll.c to support higher burst_max 
values:

#define MAX_POLL_BURST_MAX  1

When setting kern.polling.burst_max to higher values, the server reaches 
a point where cpu usage goes up without load, so try to stay below 
that value. I have also set the network card to 4096 rx descriptors, to 
have more room for late polls.


Kind regards,
Ingo Flaschberger




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Paul

Preliminary 32 bit results...
When I started out it looked like 32 bit was worse than 64 bit, but it's 
just that the timers are different.

For instance, 4000 hz in 64 bit gives better results than 4000hz in 32 bit.
Low HZ gives better result with polling on in 32 bit

Bottom line, so far I'm not able to get any better performance out of 32 
bit at all.  In fact I think it might be even a tad slower. I didn't see 
as high of bursts like I did on 64 bit so far but I'm still testing.


Tomorrow comes opteron  so it's 1ghz faster than this one, and I can 
see if it scales directly with cpu speed or what happens.


I did another SMP test with an interesting results. I took one of the 
cpus out of the machine, so it was just left with a single 2212 (dual core)

and it performed better.  Less contention I suppose?




some results:
kern.hz=4000
hw.em.rxd=512
hw.em.txd=512

polling on, idle polling on (only way I can get a reliable netstat output)

  input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   681961 117612   42281586  1 0226 0
   655095 83418   40615892  2 0220 0
   683881 93559   42400626  1 0178 0
   683637 90452   42385498  1 0178 0
   683345 87471   42367394  1 0178 0
   682737 81483   42329696  2 0220 0
   683154 95413   4232  1 0178 0
   684556 111013   42442476  1 0178 0
   684365 110960   42430634  1 0178 0
   679089 116440   42103518  3 0534 0
   684328 122713   42428340  1 0178 0
   684852 121387   42460828  1 0178 0
   685358 113256   42492200  1 0178 0
   685060 123110   42473724  1 0178 0
   684463 118335   42436710  1 0178 0
   677182 127788   41985300  2 0356 0
   685920 126144   42527044  1 0178 0
   684946 107034   42466656  1 0178 0


(reboot)
kern.hz=1000
   input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   679611 97394   42136046  5 0762 0
   663939 104714   41164254  5 0   1322 0
   685538 91102   42503412  4 0536 0
   676704 94629   41955668  2 0404 0
   685323 115060   42490030  1 0178 0
   675954 105506   41909164  2 0356 0
   655321 92118   40629906  1 0178 0
   686826 85674   42583228  2 0356 0
   686378 89983   42555440  1 0178 0
   685539 80180   42503422  1 0178 0
   686704 88626   42575652  1 0178 0
   686567 88596   42567158  1 0178 0
   687031 82640   42595936  3 0398 0
sysctl -w kern.polling.each_burst=50
kern.polling.each_burst: 256 -> 50
[EMAIL PROTECTED] ~]# netstat -w1 -I em0
   input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   693036 39992   42968315  3 0400 0
   695538 58189   43123360  1 0178 0
   692670 62765   42945544  1 0178 0
   693219 60755   42979580  2 0220 0
   692637 64761   42943498  1


sysctl -w kern.polling.each_burst=33
kern.polling.each_burst: 50 -> 33
[EMAIL PROTECTED] ~]# netstat -w1 -I em0
   input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   690530 63359   42812868  1 0226 0
   689748 57670   42764380  1 0178 0
   690489 57874   42810322  1 0178 0
   689655 60606   42758614  1 0178 0
^C
[EMAIL PROTECTED] ~]# sysctl -w kern.polling.each_burst=3
kern.polling.each_burst: 33 -> 3
[EMAIL PROTECTED] ~]# netstat -w1 -I em0
   input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   612234 110896   37958512  1 0226 0
   614391 112506   38092246  1 0178 0
^C
[EMAIL PROTECTED] ~]# sysctl -w kern.polling.each_burst=800
kern.polling.each_burst: 3 -> 800
[EMAIL PROTECTED] ~]# netstat -w1 -I em0
   input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   668057 76496   41419538  1 0226 0
   667689 88674   41396720  2 0220 0
   670526 106654   41572616  1 0178 0
   667326 97832   41374216  1 0178 0
^C
[EMAIL PROTECTED] ~]# sysctl -w 

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Bruce Evans

On Wed, 2 Jul 2008, Paul wrote:


...
---Reboot with 4096/4096(my guess is that it will be a lot 
worse, more errors..)


Without polling, 4096 is horrible, about 200kpps less ... :/
Turning on polling..
polling on, 4096 is bad,
  input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  622379 307753   38587506  1 0178 0
  635689 277303   39412718  1 0178 0
...
--Rebooting with 256/256 descriptors..
..
No polling:
843762 25337   52313248  1 0178 0
  763555 0   47340414  1 0178 0
  830189 0   51471722  1 0178 0
  838724 0   52000892  1 0178 0
  813594   939   50442832  1 0178 0
  807303   763   50052790  1 0178 0
  791024 0   49043492  1 0178 0
  768316  1106   47635596  1 0178 0
Machine is maxed and is unresponsive..


That's the most interesting one.  Even 1% packet loss would probably
destroy performance, so the benchmarks that give 10-50% packet loss
are uninteresting.

All indications are that you are running out of CPU and memory (DMA
and/or cache fills) throughput.  The above apparently hits both limits
at the same time, while with more descriptors memory throughput runs
out first.  1 CPU is apparently barely enough for 800 kpps (is this
all with UP now?), and I think more CPUs could only be slower, as you
saw with SMP, especially using multiple em taskqs, since memory traffic
would be higher.  I wouldn't expect this to be fixed soon (except by
throwing better/different hardware at it).

The CPU/DMA balance can probably be investigated by slowing down the CPU/
memory system.

You may remember my previous mail about getting higher pps on bge.
Again, all indications are that I'm running out of CPU, memory, and
bus throughput too since the bus is only PCI 33MHz.  These interact
in a complicated way which I haven't been able to untangle.  -current
is fairly consistently slower than my ~5.2 by about 10%, apparently
due to code bloat (extra CPU and related extra cache misses).  OTOH,
like you I've seen huge variations for changes that should be null
(e.g., disturbing the alignment of the text section without changing
anything else).  My ~5.2 is very consistent since I rarely change it,
while -current changes a lot and shows more variation, but with no
sign of getting near the ~5.2 plateau or even its old peaks.


Polling ON:
input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  784138 179079   48616564  1 0226 0
  788815 129608   48906530  2 0356 0
  75 142997   46844426  2 0468 0
  803670 144459   49827544  1 0178 0
  777649 147120   48214242  1 0178 0
  779539 146820   48331422  1 0178 0
  786201 148215   48744478  2 0356 0
  776013 101660   48112810  1 0178 0
  774239 145041   48002834  2 0356 0
  771774 102969   47850004  1 0178 0

Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ?  I'm really 
mystified by this..


Is this with hz=2000 and 256/256 and no polling in idle?  40% is easy
to explain (perhaps incorrectly).  Polling can then read at most 256
descriptors every 1/2000 second, giving a max throughput of 512 kpps.
Packets < descriptors in general but might be equal here (for small
packets).  You seem to actually get 784 kpps, which is too high even
in descriptors unless, but matches exactly if the errors are counted
twice (784 - 179 - 505 ~= 512).  CPU is getting short too, but 40%
still happens to be left over after giving up at 512 kpps.  Most of
the errors are probably handled by the hardware at low cost in CPU by
dropping packets.  There are other types of errors but none except
dropped packets is likely.
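
Spelling that ceiling out, assuming one rx descriptor per (small) packet:

    max pps ~= hz * rxd = 2000/s * 256 = 512,000 ~= 512 kpps

and 1000 Hz with 512 descriptors lands on the same bound.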


Every time it maxes out and gets errors, top reports:
CPU:  0.0% user,  0.0% nice, 10.1% system, 45.3% interrupt, 44.6% idle
pretty much the same line every time

256/256 blows away 4096 , probably fits the descriptors into the cache lines 
on the cpu and 4096 has too many cache misses and causes worse performance.


Quite likely.  Maybe your systems have memory systems that are weak relative
to other resources, so that they hit this limit sooner than expected.

I should look at my fixes for bge, one that changes rxd from 256 to 512,
and one that increases the ifq tx length from txd = 512 to about 2.
Both of these might thrash caches.  The former makes little difference
except for polling at > 4000 Hz, but I don't believe in or use polling.
The latter works around select() for write descriptors not working on 
sockets, so that high frequency 

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Paul

Bruce Evans wrote:

On Wed, 2 Jul 2008, Paul wrote:


...
---Reboot with 4096/4096(my guess is that it will be 
a lot worse, more errors..)


Without polling, 4096 is horrible, about 200kpps less ... :/
Turning on polling..
polling on, 4096 is bad,
  input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  622379 307753   38587506  1 0178 0
  635689 277303   39412718  1 0178 0
...
--Rebooting with 256/256 descriptors..
..
No polling:
843762 25337   52313248  1 0178 0
  763555 0   47340414  1 0178 0
  830189 0   51471722  1 0178 0
  838724 0   52000892  1 0178 0
  813594   939   50442832  1 0178 0
  807303   763   50052790  1 0178 0
  791024 0   49043492  1 0178 0
  768316  1106   47635596  1 0178 0
Machine is maxed and is unresponsive..


That's the most interesting one.  Even 1% packet loss would probably
destroy performance, so the benchmarks that give 10-50% packet loss
are uninteresting.

But you realize that it's outputting all of these packets on em3  and 
I'm watching them coming out
and they are consistent with the packets received on em0 that netstat 
shows are 'good' packets.

All indications are that you are running out of CPU and memory (DMA
and/or cache fills) throughput.  The above apparently hits both limits
at the same time, while with more descriptors memory throughput runs
out first.  1 CPU is apparently barely enough for 800 kpps (is this
all with UP now?), and I think more CPUs could only be slower, as you
saw with SMP, especially using multiple em taskqs, since memory traffic
would be higher.  I wouldn't expect this to be fixed soon (except by
throwing better/different hardware at it).

The CPU/DMA balance can probably be investigated by slowing down the CPU/
memory system.

I'm using a server opteron which supposedly has the best memory 
performance out of any CPU right now.
Plus opterons have the biggest l1 cache, but small l2 cache.  Do you 
think larger l2 cache on the Xeon (6mb for 2 core) would be better?
I have a  opteron coming which is 1ghz faster so we will see what 
happens :

My NIC is PCI-E 4x so there's no bottleneck there.

You may remember my previous mail about getting higher pps on bge.
Again, all indications are that I'm running out of CPU, memory, and
bus throughput too since the bus is only PCI 33MHz.  These interact
in a complicated way which I haven't been able to untangle.  -current
is fairly consistently slower than my ~5.2 by about 10%, apparently
due to code bloat (extra CPU and related extra cache misses).  OTOH,
like you I've seen huge variations for changes that should be null
(e.g., disturbing the alignment of the text section without changing
anything else).  My ~5.2 is very consistent since I rarely change it,
while -current changes a lot and shows more variation, but with no
sign of getting near the ~5.2 plateau or even its old peaks.


Polling ON:
input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  784138 179079   48616564  1 0226 0
  788815 129608   48906530  2 0356 0
  75 142997   46844426  2 0468 0
  803670 144459   49827544  1 0178 0
  777649 147120   48214242  1 0178 0
  779539 146820   48331422  1 0178 0
  786201 148215   48744478  2 0356 0
  776013 101660   48112810  1 0178 0
  774239 145041   48002834  2 0356 0
  771774 102969   47850004  1 0178 0

Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ?  I'm 
really mystified by this..


Is this with hz=2000 and 256/256 and no polling in idle?  40% is easy
to explain (perhaps incorrectly).  Polling can then read at most 256
descriptors every 1/2000 second, giving a max throughput of 512 kpps.
Packets < descriptors in general but might be equal here (for small
packets).  You seem to actually get 784 kpps, which is too high even
in descriptors unless, but matches exactly if the errors are counted
twice (784 - 179 - 505 ~= 512).  CPU is getting short too, but 40%
still happens to be left over after giving up at 512 kpps.  Most of
the errors are probably handled by the hardware at low cost in CPU by
dropping packets.  There are other types of errors but none except
dropped packets is likely.

Read above, it's actually transmitting 770kpps out of em3 so it can't 
just be 512kpps.
I suppose multiple packets can fit in 1 descriptor? I am using VERY 
small tcp packets..



Every time it maxes out and gets errors, top reports:
CPU:  0.0% user,  0.0% 

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Bruce Evans

On Thu, 3 Jul 2008, Paul wrote:


Bruce Evans wrote:

No polling:
843762 25337   52313248  1 0178 0
  763555 0   47340414  1 0178 0
  830189 0   51471722  1 0178 0
  838724 0   52000892  1 0178 0
  813594   939   50442832  1 0178 0
  807303   763   50052790  1 0178 0
  791024 0   49043492  1 0178 0
  768316  1106   47635596  1 0178 0
Machine is maxed and is unresponsive..


That's the most interesting one.  Even 1% packet loss would probably
destroy performance, so the benchmarks that give 10-50% packet loss
are uninteresting.

But you realize that it's outputting all of these packets on em3  and I'm 
watching them coming out
and they are consistent with the packets received on em0 that netstat shows 
are 'good' packets.


Well, output is easier.  I don't remember seeing the load on a taskq for
em3.  If there is a memory bottleneck, it might or might not be more related
to running only 1 taskq per interrupt, depending on how independent the
memory system is for different CPUs.  I think Opterons have more independence
here than most x86's.

I'm using a server opteron which supposedly has the best memory performance 
out of any CPU right now.
Plus opterons have the biggest l1 cache, but small l2 cache.  Do you think 
larger l2 cache on the Xeon (6mb for 2 core) would be better?

I have a  opteron coming which is 1ghz faster so we will see what happens


I suspect lower latency memory would help more.  Big memory systems
have inherently higher latency.  My little old A64 workstation and
laptop have main memory latencies 3 times smaller than freebsd.org's
new Core2 servers according to lmbench2 (42 nsec for the overclocked
DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec).
If there are a lot of cache misses, then the extra 100 nsec can be
important.  Profiling of sendto() using hwpmc or perfmon shows a
significant number of cache misses per packet (2 or 10?).


Polling ON:
input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  784138 179079   48616564  1 0226 0
  788815 129608   48906530  2 0356 0
Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ?  I'm really 
mystified by this..


Is this with hz=2000 and 256/256 and no polling in idle?  40% is easy
to explain (perhaps incorrectly).  Polling can then read at most 256
descriptors every 1/2000 second, giving a max throughput of 512 kpps.
Packets < descriptors in general but might be equal here (for small
packets).  You seem to actually get 784 kpps, which is too high even
in descriptors unless, but matches exactly if the errors are counted
twice (784 - 179 - 505 ~= 512).  CPU is getting short too, but 40%
still happens to be left over after giving up at 512 kpps.  Most of
the errors are probably handled by the hardware at low cost in CPU by
dropping packets.  There are other types of errors but none except
dropped packets is likely.

Read above, it's actually transmitting 770kpps out of em3 so it can't just be 
512kpps.


Transmitting is easier, but with polling it's even harder to send faster than
hz * queue_length than it is to receive.  This is without polling in idle.

I was thinking of trying 4 or 5.. but how would that work with this new 
hardware?


Poorly, except possibly with polling in FreeBSD-4.  FreeBSD-4 generally
has lower overheads and latency, but is missing important improvements
(mainly tcp optimizations in upper layers, better DMA and/or mbuf
handling, and support for newer NICs).  FreeBSD-5 is also missing the
overhead+latency advantage.

Here are some benchmarks (ttcp mainly tests sendto(); a sketch of typical
invocations follows the summary).  4.10 em needed a 2-line change to support
a not-so-new PCI em NIC.  Summary:
- my bge NIC can handle about 600 kpps on my faster machine, but only
  achieves 300 in 4.10 unpatched.
- my em NIC can handle about 400 kpps on my slower machine, except in
  later versions it can receive at about 600 kpps.
- only 6.x and later can achieve near wire throughput for 1500-MTU
  packets (81 kpps vs 76 kpps).  This depends on better DMA or mbuf
  handling...  I now remember the details -- it is mainly better mbuf
  handling: old versions split the 1500-MTU packets into 2 mbufs and
  this causes 2 descriptors per packet, which causes extra software
  overheads and even larger overheads for the hardware.
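
A sketch of the invocations behind these numbers (the -t/-u/-l flags are the
ones shown below; the receive-side flags, buffer count and host name are only
illustrative, from memory of the classic ttcp):

ttcp -r -u -s                           # receiver
ttcp -t -u -s -l5 -n 1000000 receiver   # sender, 5-byte UDP payloads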

%%%
Results of benchmarks run on 23 Feb 2007:

my~5.2 bge -- ~4.10 em
 tx  rx
 kpps   load%ipskppsload%ips
ttcp -l5-u -t 639 981660 398* 77  8k
ttcp -l5   -t 6.01003960 6.0   65900
ttcp -l1472 -u -t  76 27 395  76  40  8k
ttcp -l1472-t  51 40 11k  51  

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Ingo Flaschberger

Dear Stefan,

So my maximum without polling is close to 800kpps but if I push that it 
starts locking me from doing things, or


how many kpps do you want to achieve?
Do not know for Paul but, I want to be able to route (and/or bridge to 
handle) 600-700mbps syn flood,

which is something like 1500kpps in every direction. Is it unrealistic?


yes, I think so.

look at this project:
http://yuba.stanford.edu/NetFPGA/

This card(s) could do that.
Maximum count of routes seems to be limited, but with lpf it should work.
A freebsd-kernel interface is missing.

If the code is optimized to fully utilize MP I do not see a reason why quad 
core processor should not be able to do this.
After all single core seems to handle 500kpps, if we utilize four, eight or 
even more cores we should be able to route 1500kpps + ?


There's a Sun used at quagga dev as a BGP route server.
http://quagga.net/route-server.php
(but they didn't answer my question regarding fw-performance).


I hope TOE once MFCed to 7-STABLE will help too?


I don't think toe will help.


Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Ingo Flaschberger

Dear Paul,

Tomorrow comes opteron  so it's 1ghz faster than this one, and I can see 
if it scales directly with cpu speed or what happens.


can you send me a lspci -v?

I did another SMP test with an interesting results. I took one of the cpus 
out of the machine, so it was just left with a single 2212 (dual core)

and it performed better.  Less contention I suppose?


In SMP, locking is a performance killer.

My next router appliance will be:
http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429

Kind regards,
Ingo Flaschberger
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Ingo Flaschberger

Dear Stefan,

So my maximum without polling is close to 800kpps but if I push that it 
starts locking me from doing things, or


how many kpps do you want to achieve?
Do not know for Paul but, I want to be able to route (and/or bridge to 
handle) 600-700mbps syn flood,

which is something like 1500kpps in every direction. Is it unrealistic?


I would also give Dragonfly bsd a try, as Mike had the best results with 
it.


Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Paul

Bruce Evans wrote:

On Thu, 3 Jul 2008, Paul wrote:


Bruce Evans wrote:

No polling:
843762 25337   52313248  1 0178 0
  763555 0   47340414  1 0178 0
  830189 0   51471722  1 0178 0
  838724 0   52000892  1 0178 0
  813594   939   50442832  1 0178 0
  807303   763   50052790  1 0178 0
  791024 0   49043492  1 0178 0
  768316  1106   47635596  1 0178 0
Machine is maxed and is unresponsive..


That's the most interesting one.  Even 1% packet loss would probably
destroy performance, so the benchmarks that give 10-50% packet loss
are uninteresting.

But you realize that it's outputting all of these packets on em3  and 
I'm watching them coming out
and they are consistent with the packets received on em0 that netstat 
shows are 'good' packets.


Well, output is easier.  I don't remember seeing the load on a taskq for
em3.  If there is a memory bottleneck, it might or might not be more related
to running only 1 taskq per interrupt, depending on how independent the
memory system is for different CPUs.  I think Opterons have more independence
here than most x86's.


Opterons have an on-CPU memory controller.. That should help a little. :P
But I must be getting more than 1 packet per descriptor because I can do 
HZ=100 and still get it without polling..
idle polling helps in all cases of polling that I have tested it with, 
seems moreso on 32 bit
I'm using a server opteron which supposedly has the best memory 
performance out of any CPU right now.
Plus opterons have the biggest l1 cache, but small l2 cache.  Do you 
think larger l2 cache on the Xeon (6mb for 2 core) would be better?
I have a  opteron coming which is 1ghz faster so we will see what 
happens


I suspect lower latency memory would help more.  Big memory systems
have inherently higher latency.  My little old A64 workstation and
laptop have main memory latencies 3 times smaller than freebsd.org's
new Core2 servers according to lmbench2 (42 nsec for the overclocked
DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec).
If there are a lot of cache misses, then the extra 100 nsec can be
important.  Profiling of sendto() using hwpmc or perfmon shows a
significant number of cache misses per packet (2 or 10?).

The opterons are 667mhz DDR2 [registered], I have a Xeon that is ddr3 
but i think the latency is higher than ddr2.

I'll look up those programs you mentioned and see If I can run some tests.


Polling ON:
input  (em0)   output
 packets  errs  bytespackets  errs  bytes colls
  784138 179079   48616564  1 0226 0
  788815 129608   48906530  2 0356 0
Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ?  I'm 
really mystified by this..


Is this with hz=2000 and 256/256 and no polling in idle?  40% is easy
to explain (perhaps incorrectly).  Polling can then read at most 256
descriptors every 1/2000 second, giving a max throughput of 512 kpps.
Packets < descriptors in general but might be equal here (for small
packets).  You seem to actually get 784 kpps, which is too high even
in descriptors unless, but matches exactly if the errors are counted
twice (784 - 179 - 505 ~= 512).  CPU is getting short too, but 40%
still happens to be left over after giving up at 512 kpps.  Most of
the errors are probably handled by the hardware at low cost in CPU by
dropping packets.  There are other types of errors but none except
dropped packets is likely.

Read above, it's actually transmitting 770kpps out of em3 so it can't 
just be 512kpps.


Transmitting is easier, but with polling its even harder to send 
faster than

hz * queue_length than to receive.  This is without polling in idle.

What i'm saying though, it that it's not giving up at 512kpps because 
784kpps is coming in em0
and going out em3 so obviously it's reading more than 256 every 1/2000th 
of a second (packets).

What would be the best settings (theoretical) for 1mpps processing?
I actually don't have a problem 'receiving' more than 800kpps with much 
lower CPU usage if it's going to blackhole .
so obviously it can receive a lot more, maybe even line rate pps but i 
can't generate that much.
I was thinking of trying 4 or 5.. but how would that work with this 
new hardware?


Poorly, except possibly with polling in FreeBSD-4.  FreeBSD-4 generally
has lower overheads and latency, but is missing important improvements
(mainly tcp optimizations in upper layers, better DMA and/or mbuf
handling, and support for newer NICs).  FreeBSD-5 is also missing the
overhead+latency advantage.

Here are some benchmarks. (ttcp mainly tests sendto().  4.10 em needed a
2-line change to support a not-so-new PCI em NIC.  Summary:
- my bge NIC can handle about 600 kpps on my faster machine, but only
  achieves 

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Paul

Err.. pciconf -lv ?


[EMAIL PROTECTED]:0:0:0:   class=0x05 card=0x151115d9 chip=0x036910de 
rev=0xa2 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 Memory Controller'
   class  = memory
   subclass   = RAM
[EMAIL PROTECTED]:0:1:0:   class=0x060100 card=0x151115d9 chip=0x036410de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 LPC Bridge'
   class  = bridge
   subclass   = PCI-ISA
[EMAIL PROTECTED]:0:1:1:   class=0x0c0500 card=0x151115d9 chip=0x036810de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 SMBus'
   class  = serial bus
   subclass   = SMBus
[EMAIL PROTECTED]:0:2:0:   class=0x0c0310 card=0x151115d9 chip=0x036c10de 
rev=0xa1 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 USB Controller'
   class  = serial bus
   subclass   = USB
[EMAIL PROTECTED]:0:2:1:   class=0x0c0320 card=0x151115d9 chip=0x036d10de 
rev=0xa2 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 USB Controller'
   class  = serial bus
   subclass   = USB
[EMAIL PROTECTED]:0:4:0: class=0x01018a card=0x151115d9 chip=0x036e10de 
rev=0xa1 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 IDE'
   class  = mass storage
   subclass   = ATA
[EMAIL PROTECTED]:0:5:0: class=0x010185 card=0x151115d9 chip=0x037f10de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 SATA Controller'
   class  = mass storage
   subclass   = ATA
[EMAIL PROTECTED]:0:5:1: class=0x010185 card=0x151115d9 chip=0x037f10de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 SATA Controller'
   class  = mass storage
   subclass   = ATA
[EMAIL PROTECTED]:0:5:2: class=0x010185 card=0x151115d9 chip=0x037f10de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 SATA Controller'
   class  = mass storage
   subclass   = ATA
[EMAIL PROTECTED]:0:6:0:   class=0x060401 card=0x151115d9 chip=0x037010de 
rev=0xa2 hdr=0x01

   vendor = 'Nvidia Corp'
   device = 'MCP55 PCI bridge'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:0:8:0:class=0x02 card=0x151115d9 chip=0x037210de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 Ethernet'
   class  = network
   subclass   = ethernet
[EMAIL PROTECTED]:0:9:0:class=0x02 card=0x151115d9 chip=0x037210de 
rev=0xa3 hdr=0x00

   vendor = 'Nvidia Corp'
   device = 'MCP55 Ethernet'
   class  = network
   subclass   = ethernet
[EMAIL PROTECTED]:0:10:0:  class=0x060400 card=0x10de chip=0x037610de 
rev=0xa3 hdr=0x01

   vendor = 'Nvidia Corp'
   device = 'MCP55 PCIe bridge'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:0:13:0:  class=0x060400 card=0x10de chip=0x037810de 
rev=0xa3 hdr=0x01

   vendor = 'Nvidia Corp'
   device = 'MCP55 PCIe bridge'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:0:14:0:  class=0x060400 card=0x10de chip=0x037510de 
rev=0xa3 hdr=0x01

   vendor = 'Nvidia Corp'
   device = 'MCP55 PCIe bridge'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:0:15:0:  class=0x060400 card=0x10de chip=0x037710de 
rev=0xa3 hdr=0x01

   vendor = 'Nvidia Corp'
   device = 'MCP55 PCIe bridge'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:0:24:0: class=0x06 card=0x chip=0x11001022 
rev=0x00 hdr=0x00

   vendor = 'Advanced Micro Devices (AMD)'
   device = '(K8) Athlon 64/Opteron HyperTransport Technology 
Configuration'

   class  = bridge
   subclass   = HOST-PCI
[EMAIL PROTECTED]:0:24:1: class=0x06 card=0x chip=0x11011022 
rev=0x00 hdr=0x00

   vendor = 'Advanced Micro Devices (AMD)'
   device = '(K8) Athlon 64/Opteron Address Map'
   class  = bridge
   subclass   = HOST-PCI
[EMAIL PROTECTED]:0:24:2: class=0x06 card=0x chip=0x11021022 
rev=0x00 hdr=0x00

   vendor = 'Advanced Micro Devices (AMD)'
   device = '(K8) Athlon 64/Opteron DRAM Controller'
   class  = bridge
   subclass   = HOST-PCI
[EMAIL PROTECTED]:0:24:3: class=0x06 card=0x chip=0x11031022 
rev=0x00 hdr=0x00

   vendor = 'Advanced Micro Devices (AMD)'
   device = '(K8) Athlon 64/Opteron Miscellaneous Control'
   class  = bridge
   subclass   = HOST-PCI
[EMAIL PROTECTED]:1:6:0: class=0x03 card=0x151115d9 chip=0x515e1002 
rev=0x02 hdr=0x00

   vendor = 'ATI Technologies Inc'
   device = 'Radeon ES1000 Radeon ES1000'
   class  = display
   subclass   = VGA
[EMAIL PROTECTED]:2:0:0:   class=0x060400 card=0x chip=0x01251033 
rev=0x07 hdr=0x01

   vendor = 'NEC Electronics Hong Kong'
   class  = bridge
   subclass   = PCI-PCI
[EMAIL PROTECTED]:2:0:1:   class=0x060400 card=0x chip=0x01251033 
rev=0x07 hdr=0x01

   vendor = 'NEC Electronics Hong Kong'
   class  

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-03 Thread Steve Bertrand

Ingo Flaschberger wrote:


My next router appliance will be:
http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429


This is exactly the device that I have been testing with (just rebranded).

Steve
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Paul

SMP DISABLED on my Opteron 2212  (ULE, Preemption on)
Yields ~750kpps in em0 and out em1  (one direction)
I am miffed why this yields more pps than
a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3 
incoming connections so 3 taskq threads


I would be willing to set up test equipment (several servers plugged 
into a switch) with ipkvm and power port access
if someone or a group of people want to figure out ways to improve the 
routing process, ipfw, and lagg.


Maximum PPS with one ipfw rule on UP:
tops out about 570Kpps.. almost 200kpps lower ? (frown)

I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in 
here and see how that scales, using UP same kernel etc I have now.






Julian Elischer wrote:

Paul wrote:

ULE without PREEMPTION is now yielding better results.
input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   571595 40639   34564108  1 0226 0
   577892 48865   34941908  1 0178 0
   545240 84744   32966404  1 0178 0
   587661 44691   35534512  1 0178 0
   587839 38073   35544904  1 0178 0
   587787 43556   35540360  1 0178 0
   540786 39492   32712746  1 0178 0
   572071 55797   34595650  1 0178 0
 
*OUCH, IPFW HURTS..
loading ipfw, and adding one ipfw rule allow ip from any to any drops 
100Kpps off :/ what's up with THAT?
unloaded ipfw module and back 100kpps more again, that's not right 
with ONE rule.. :/


ipfw needs to gain a lock on the firewall before running,
and is quite complex..  I can believe it..

in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two 
interfaces (bridged) but I think it has slowed down since then due to 
the SMP locking.





em0 taskq is still jumping cpus.. is there any way to lock it to one 
cpu or is this just a function of ULE


running a tar czpvf all.tgz *  and seeing if pps changes..
negligible.. guess scheduler is doing it's job at least..

Hmm. even when it's getting 50-60k errors per second on the interface 
I can still SCP a file through that interface although it's not 
fast.. 3-4MB/s..


You know, I wouldn't care if it added 5ms latency to the packets when 
it was doing 1mpps as long as it didn't drop any.. Why can't it do 
that? Queue them up and do them in big chunks so none are 
dropped... hmm?


32 bit system is compiling now..  won't do  400kpps with GENERIC 
kernel, as with 64 bit did 450k with GENERIC, although that could be

the difference between opteron 270 and opteron 2212..

Paul

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]





___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Ingo Flaschberger

Dear Paul,


SMP DISABLED on my Opteron 2212  (ULE, Preemption on)
Yields ~750kpps in em0 and out em1  (one direction)
I am miffed why this yields more pps than
a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3 
incoming connections so 3 taskq threads


because less locking, less synchronisation, 

I would be willing to set up test equipment (several servers plugged into a 
switch) with ipkvm and power port access
if someone or a group of people want to figure out ways to improve the 
routing process, ipfw, and lagg.


Maximum PPS with one ipfw rule on UP:
tops out about 570Kpps.. almost 200kpps lower ? (frown)


can you post the rule here?

I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in here 
and see how that scales, using UP same kernel etc I have now.


really, please try 32bit and 1 cpu.

Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Paul

Ipfw rule was simply allow ip from any to any :)
This is 64bit i'm testing now.. I have a 32 bit install I tested on 
another machine but it only has bge NIC and wasn't performing as well
so I'll reinstall 32 bit on this 2212 and test then drop in the  
(3ghz) and test.

I still don't like the huge hit ipfw and lagg take :/

** I tried polling in UP mode and I got some VERY interesting results..
CPU is 44% idle (idle polling isn't on)  but I'm getting errors!  It's 
doing 530kpps with ipfw loaded, which without polling uses 100% cpu but 
now it says my cpu is 44% idle? that makes no sense.. If it was idle why 
am I getting errors?  I only get errors when em taskq was eating 100% cpu..

Idle polling on/off makes no difference.
user_frac is set to 5 ..
last pid:  1598;  load averages:  0.01,  0.16,  0.43  up 0+00:34:41  04:04:43

66 processes:  2 running, 46 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice,  7.3% system, 46.5% interrupt, 46.2% idle
Mem: 8064K Active, 6808K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

 PID USERNAME PRI NICE   SIZERES STATETIME   WCPU COMMAND
  10 root 171 ki31 0K16K RUN 10:10 88.87% idle
1598 root  450  8084K  2052K RUN  0:00  1.12% top
  11 root -32- 0K16K WAIT 0:02  0.24% swi4: clock sio
  13 root -44- 0K16K WAIT14:13  0.15% swi1: net
1329 root  440 33732K  4572K select   0:00  0.05% sshd

 input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   541186 68741   33107504  1 0  0 0
   540036 70611   33044632  1 0178 0
   540470 66493   33043148  1 0178 0
   541903 67981   33125414  1 0178 0
   541238 84979   33105898  1 0178 0
   541338 74067   33115984  2 0356 0
   539116 49286   32991516  2 0220 0


kldunload ipfw...

 input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   600589 0   36751064  1 0226 0
   606294 0   37102868  2 0220 0
   616802 0   37733866  1 0178 0
   623017 0   38117436  1 0178 0
   624800 0   38225470  1 0178 0
   626791 0   38347426  1 0178 0

last pid:  1605;  load averages:  0.00,  0.13,  0.40  up 0+00:35:30  04:05:32

66 processes:  2 running, 46 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice,  7.1% system, 36.0% interrupt, 56.9% idle
Mem: 8064K Active, 6812K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

 PID USERNAME PRI NICE   SIZERES STATETIME   WCPU COMMAND
  10 root 171 ki31 0K16K RUN 10:16 95.36% idle
  13 root -44- 0K16K WAIT14:53  0.24% swi1: net
  36 root -68- 0K16K -1:03  0.10% em3 taskq
1605 root  440  8084K  2052K RUN  0:00  0.10% top
  11 root -32- 0K16K WAIT 0:02  0.05% swi4: clock sio



add some more PPS..
 input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   749015 169684   46438936  1 0 42 0
   749176 184574   46448916  1 0178 0
   759576 188462   47093716  1 0178 0
   762904 182854   47300052  1 0178 0
   798039 147509   49478422  1 0178 0
   759528 194297   47090740  1 0178 0
   746849 195935   46304642  1 0178 0
   747566 186703   46349096  1 0178 0
   750011 181630   46500702  2


last pid:  1607;  load averages:  0.19,  0.17,  0.40  up 0+00:36:18  04:06:20

66 processes:  2 running, 46 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 12.5% system, 45.4% interrupt, 42.1% idle
Mem: 8068K Active, 6808K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

 PID USERNAME PRI NICE   SIZERES STATETIME   WCPU COMMAND
  10 root 171 ki31 0K16K RUN 10:21 85.64% idle
  36 root -68- 0K16K -1:07  3.61% em3 taskq
1607 root  440  8084K  2052K RUN  0:00  0.93% top
  13 root -44- 0K16K WAIT15:32  0.20% swi1: net
  11 root -32- 0K16K WAIT 0:02  0.05% swi4: clock sio



So my maximum without polling is close to 800kpps but if I push that it 
starts locking me from doing things, or
my maximum is 750kpps with polling and the console is very responsive? 
How on EARTH can my 

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Ingo Flaschberger

Dear Paul,


I still don't like the huge hit ipfw and lagg take :/


I think you can't use fastforwarding with lagg.


** I tried polling in UP mode and I got some VERY interesting results..
CPU is 44% idle (idle polling isn't on)  but I'm getting errors!  It's doing 
530kpps with ipfw loaded, which without polling uses 100% cpu but now it says 
my cpu is 44% idle? that makes no sense.. If it was idle why am I getting 
errors?  I only get errors when em taskq was eating 100% cpu..

Idle polling on/off makes no difference.
user_frac is set to 5 ..


what are your values:
kern.polling.reg_frac=
kern.polling.user_frac=
kern.polling.burst_max=

I use:
kern.polling.reg_frac=20
kern.polling.user_frac=20
kern.polling.burst_max=512
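
To make those survive a reboot they can go into /etc/sysctl.conf; a sketch
using the same values:

kern.polling.reg_frac=20
kern.polling.user_frac=20
kern.polling.burst_max=512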

if you need more than 1000, you need to change the code:
src/sys/kern/kern_poll.c
#define MAX_POLL_BURST_MAX  1000

So my maximum without polling is close to 800kpps but if I push that it 
starts locking me from doing things, or


how many kpps do you want to achieve?


HZ=2000 for this test (512/512 descriptors)


you mean:
hw.em.rxd=512
hw.em.txd=512
?

can you try with polling:
hw.em.rxd=4096
hw.em.txd=4096
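
Note that hw.em.rxd/hw.em.txd are boot-time tunables rather than runtime
sysctls, so (as far as I know) they belong in /boot/loader.conf and need a
reboot, e.g.:

hw.em.rxd=4096
hw.em.txd=4096
kern.hz=2000            # HZ is likewise a loader tunable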

Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Stefan Lambrev

Hi

Ingo Flaschberger wrote:

Dear Paul,


I still don't like the huge hit ipfw and lagg take :/

You have to try PF, then you will respect IPFW again ;)
-cut-


So my maximum without polling is close to 800kpps but if I push that 
it starts locking me from doing things, or


how many kpps do you want to achieve?
I do not know about Paul, but I want to be able to route (and/or bridge to 
handle) a 600-700mbps syn flood,

which is something like 1500kpps in every direction. Is it unrealistic?
If the code is optimized to fully utilize MP I do not see a reason why 
quad core processor should not be able to do this.
After all single core seems to handle 500kpps, if we utilize four, eight 
or even more cores we should be able to route 1500kpps + ?

I hope TOE once MFCed to 7-STABLE will help too?

--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Adrian Chadd
2008/7/2 Stefan Lambrev [EMAIL PROTECTED]:

 Do not know for Paul but, I want to be able to route (and/or bridge to
 handle) 600-700mbps syn flood,
 which is something like 1500kpps in every direction. Is it unrealistic?
 If the code is optimized to fully utilize MP I do not see a reason why quad
 core processor should not be able to do this.
 After all single core seems to handle 500kpps, if we utilize four, eight or
 even more cores we should be able to route 1500kpps + ?
 I hope TOE once MFCed to 7-STABLE will help too?

But it's not just about CPU use, it's about your NIC, your IO bus path,
your memory interface, your caches .. things get screwy. Especially if
you're holding a full internet routing table.

If you're interested in participating in a group funding project to
make this happen then let me know. The more the merrier (read: the
more that can be achieved :)



Adrian
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-02 Thread Freddie Cash
On Mon, Jun 30, 2008 at 6:39 PM, Ingo Flaschberger [EMAIL PROTECTED] wrote:
 I'm curious now... how do you change individual device polling via sysctl?

 not via sysctl, via ifconfig:

 # enable interface polling
 /sbin/ifconfig em0 polling
 /sbin/ifconfig em1 polling
 /sbin/ifconfig em2 polling
 /sbin/ifconfig em3 polling

 (and via /etc/rc.local also across reboots)

No, you put it into the ifconfig_X lines in /etc/rc.conf as the last
option.  Or -polling to disable it.

ifconfig_em0='inet 1.2.3.4/24 polling'
ifconfig_em2='inet 1.2.3.5/24 -polling'
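
Either way, a quick check that it took effect is to look for POLLING in the
interface options, e.g.:

ifconfig em0 | grep options     # POLLING should appear in the options=... list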

-- 
Freddie Cash
[EMAIL PROTECTED]
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Stefan Lambrev

Hi,

Ingo Flaschberger wrote:

Dear Rudy,

I used polling in FreeBSD 5.x and it helped a bunch.  I set up a new 
router with 7.0 and MSI was recommended to me.  (I noticed no 
difference when moving from polling - MSI, however, on 5.4 polling 
seemed to help a lot.  What are people using in 7.0?

polling or MSI?


if you have an inet router with gige uplinks, it is possible that there 
will be (d)dos attacks.
only polling helps you then to keep the router manageable (but 
dropping packets).

Let me disagree :)
I'm experimenting with bridge and Intel 82571EB Gigabit Ethernet Controller.
On quad core system I have no problems with the stability of the bridge 
without polling.
taskq em0 takes 100% CPU, but I have another three (cpus/cores) that are 
free and the router is very very stable, no lag on other interfaces

and the average load is not very high too.


Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Ingo Flaschberger

Dear Paul,

I have been unable to even come close to livelocking the machine with the em 
driver interrupt moderation.
So that to me throws polling out the window.  I tried 8000hz with polling 
modified to allow 1 burst and it makes no difference


higher hz values give you better latency but less overall speed.
2000hz should be enough.

Kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Ingo Flaschberger

Dear Paul,



Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot of junk 
in the config, added
options NO_ADAPTIVE_MUTEXES   not sure if that makes any difference 
or not, will test without.

Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf
7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul  1 01:22:18 CDT 2008 amd64
Max input rate .. 587kpps?   Take into consideration that these packets are 
being forwarded out em1 interface which
causes a great impact on cpu usage.  If I set up a firewall rule to block the 
packets it can do over 1mpps on em0 input.


would be great if you can also test with 32bit.

what value do you have at net.inet.ip.intr_queue_maxlen?
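
That one is a runtime sysctl; checking it, raising it (the value below is only
an example) and watching the matching drops counter looks like:

sysctl net.inet.ip.intr_queue_maxlen
sysctl net.inet.ip.intr_queue_maxlen=2048
sysctl net.inet.ip.intr_queue_drops     # climbing drops mean the queue really is overflowing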

kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul

Thanks.. I was hoping I wasn't seeing things :
I do not like inconsistencies.. :/

Stefan Lambrev wrote:



Greetings Paul,



--OK I'm stumped now.. Rebuilt with preemption and ULE and 
preemption again and it's not doing what it did before..
I saw this in my configuration too :) Just leave your test running for a 
longer time and you will see this strange inconsistency in action.
In my configuration I almost always have better throughput after a 
reboot, which drops later (5-10min under flood) by 50-60kpps, and 
after another 10-15min the number of correctly passed packets increases 
again. Looks like auto tuning of which I'm not aware :)



How could that be? Now about 500kpps..

That kind of inconsistency almost invalidates all my testing.. why 
would it be so much different after trying a bunch of kernel options 
and rebooting a bunch of times and then going back to the original 
config doesn't get you what it did in the beginning..


I'll have to dig into this further.. never seen anything like it :)

Hopefully the ip_input fix will help free up a few cpu cycles.


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul
I am going to.. I have an opteron 270 dual set up on 32 bit and the 2212 
is set up on 64 bit :)

Today should bring some 32 bit results as well as etherchannel results.


Ingo Flaschberger wrote:

Dear Paul,



Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot 
of junk in the config, added
options NO_ADAPTIVE_MUTEXES   not sure if that makes any 
difference or not, will test without.

Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf
7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul  1 01:22:18 CDT 2008 amd64
Max input rate .. 587kpps?   Take into consideration that these 
packets are being forwarded out em1 interface which
causes a great impact on cpu usage.  If I set up a firewall rule to 
block the packets it can do over 1mpps on em0 input.


would be great if you can also test with 32bit.

what value do you have at net.inet.ip.intr_queue_maxlen?

kind regards,
Ingo Flaschberger




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul
I can't reproduce the 580kpps maximum that I saw when I first compiled 
for some reason, I don't understand, the max I get even with ULE and 
preemption
is now about 530 and it dips to 480 a lot.. The first time I tried it it 
was at 580 and dipped to 520...what the?.. (kernel config attached at end)
* noticed that SOMETIMES the em0 taskq jumps around cpus and doesn't use 
100% of any one cpu
* noticed that the netstat packets per second rate varies explicitly 
with the CPU usage of em0 taskq

(top output with ULE/PREEMPTION compiled in):
PID USERNAME PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
  10 root 171 ki31 0K16K RUN3  64:12 94.09% idle: cpu3
  36 root -68- 0K16K CPU1   1   5:43 89.75% em0 taskq
  13 root 171 ki31 0K16K CPU0   0  63:21 87.30% idle: cpu0
  12 root 171 ki31 0K16K RUN1  62:44 66.75% idle: cpu1
  11 root 171 ki31 0K16K CPU2   2  62:17 56.49% idle: cpu2
  39 root -68- 0K16K -  0   0:54 10.64% em3 taskq

this is about 480-500kpps rate.
now I wait a minute and

PID USERNAME PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
  10 root 171 ki31 0K16K CPU3   3  64:56 100.00% idle: cpu3
  36 root -68- 0K16K CPU2   2   6:21 94.14% em0 taskq
  13 root 171 ki31 0K16K RUN0  63:55 80.18% idle: cpu0
  11 root 171 ki31 0K16K RUN2  62:48 67.38% idle: cpu2
  12 root 171 ki31 0K16K CPU1   1  63:04 58.40% idle: cpu1
  39 root -68- 0K16K -  1   1:00 10.21% em3 taskq


530kpps rate...


drops to 85%.. 480kpps rate
goes back up to 95% 530kpps

it keeps flopping like this...

none of the CPUs are at 100% use and none of the cpu numbers add up; e.g. the cpu 
time of em0 taskq is 94%, so one of the cpus should be 6% idle, but it's not.
This is with ULE/PREEMPTION.. I see different behavior without 
preemption and with 4bsd..

and I also see different behavior depending on the time of day lol :)
Figure that one out

I'll post back without preemption and with 4bsd in a min
then i'll move on to the 32 bit platform tests


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul

ULE without PREEMPTION is now yielding better results.
input  (em0)   output
  packets  errs  bytespackets  errs  bytes colls
   571595 40639   34564108  1 0226 0
   577892 48865   34941908  1 0178 0
   545240 84744   32966404  1 0178 0
   587661 44691   35534512  1 0178 0
   587839 38073   35544904  1 0178 0
   587787 43556   35540360  1 0178 0
   540786 39492   32712746  1 0178 0
   572071 55797   34595650  1 0178 0
 
*OUCH, IPFW HURTS..
loading ipfw, and adding one ipfw rule allow ip from any to any drops 
100Kpps off :/ what's up with THAT?
unloaded ipfw module and back 100kpps more again, that's not right with 
ONE rule.. :/
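
For the record, the whole test amounts to something like this (rule number
arbitrary; traffic is measured with the same netstat -w1 -I em0 loop as above):

kldload ipfw
ipfw add 100 allow ip from any to any
# ...blast packets, watch netstat -w1 -I em0...
kldunload ipfw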


em0 taskq is still jumping cpus.. is there any way to lock it to one cpu 
or is this just a function of ULE


running a tar czpvf all.tgz *  and seeing if pps changes..
negligible.. guess scheduler is doing it's job at least..

Hmm. even when it's getting 50-60k errors per second on the interface I 
can still SCP a file through that interface although it's not fast.. 
3-4MB/s..


You know, I wouldn't care if it added 5ms latency to the packets when it 
was doing 1mpps as long as it didn't drop any.. Why can't it do that? 
Queue them up and do them in big chunks so none are dropped... hmm?


32 bit system is compiling now..  won't do  400kpps with GENERIC 
kernel, as with 64 bit did 450k with GENERIC, although that could be

the difference between opteron 270 and opteron 2212..

Paul

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul

Ok, now THIS is absolutely a whole bunch of ridiculousness..
I set up etherchannel, and I'm evenly distributing packets over em0 em1 
and em2 to lagg0
and i get WORSE performance than with a single interface..  Can anyone 
explain this one? This is horrible.
I got em0-em2 taskq's using 80% cpu EACH and they are only doing 100kpps 
EACH


looks:

packets  errs  bytespackets  errs  bytes colls
   105050 110666303000  0 0  0 0
   104952 139696297120  0 0  0 0
   104331 121216259860  0 0  0 0

  input  (em1)   output
  packets  errs  bytespackets  errs  bytes colls
   103734 706586223998  0 0  0 0
   103483 757036209046  0 0  0 0
   103848 761956230886  0 0  0 0


  input  (em2)   output
  packets  errs  bytespackets  errs  bytes colls
   103299 629576197940  1 0226 0
   106388 730716383280  1 0178 0
   104503 705736270180  4 0712 0

last pid:  1378;  load averages:  2.31,  1.28,  0.57  up 0+00:06:27  17:42:32

68 processes:  8 running, 42 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 58.9% system,  0.0% interrupt, 41.1% idle
Mem: 7980K Active, 5932K Inact, 47M Wired, 16K Cache, 8512K Buf, 1920M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root      171 ki31     0K    16K RUN    2   5:18 80.47% idle: cpu2
   38 root      -68    -     0K    16K CPU3   3   2:30 80.18% em2 taskq
   37 root      -68    -     0K    16K CPU1   1   2:28 76.90% em1 taskq
   36 root      -68    -     0K    16K CPU2   2   2:28 72.56% em0 taskq
   13 root      171 ki31     0K    16K RUN    0   3:32 29.20% idle: cpu0
   12 root      171 ki31     0K    16K RUN    1   3:29 27.88% idle: cpu1
   10 root      171 ki31     0K    16K RUN    3   3:21 25.63% idle: cpu3
   39 root      -68    -     0K    16K -      3   0:32 17.68% em3 taskq


See, that's total wrongness.. something is very wrong here.  Does anyone
have any ideas? I really need to get this working.
I figured that if I evenly distributed the packets over 3 interfaces it
would simulate having 3 rx queues, since there is a separate process for
each interface,
and the result is WAY more CPU usage and a little over half the pps
throughput of a single port..


If anyone is interested in tackling some of these issues, please e-mail me.
It would be greatly appreciated.



Paul



Julian Elischer wrote:

Paul wrote:

ULE without PREEMPTION is now yielding better results.
            input          (em0)           output
   packets  errs     bytes    packets  errs  bytes colls
    571595 40639  34564108          1     0    226     0
    577892 48865  34941908          1     0    178     0
    545240 84744  32966404          1     0    178     0
    587661 44691  35534512          1     0    178     0
    587839 38073  35544904          1     0    178     0
    587787 43556  35540360          1     0    178     0
    540786 39492  32712746          1     0    178     0
    572071 55797  34595650          1     0    178     0
 
OUCH, IPFW HURTS..
Loading ipfw and adding one rule, 'allow ip from any to any', drops
100Kpps off :/ What's up with THAT?
Unloading the ipfw module gets the 100Kpps back again; that's not right
with ONE rule.. :/


ipfw needs to gain a lock on the firewall before running,
and is quite complex..  I can believe it..

in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two 
interfaces (bridged) but I think it has slowed down since then due to 
the SMP locking.





The em0 taskq is still jumping between CPUs.. is there any way to lock it
to one CPU, or is this just a function of ULE?


Running a tar czpvf all.tgz * and watching whether the pps changes..
negligible.. guess the scheduler is doing its job at least..

Hmm. even when it's getting 50-60k errors per second on the interface
I can still SCP a file through that interface, although it's not
fast.. 3-4MB/s..


You know, I wouldn't care if it added 5ms of latency to the packets when
it was doing 1Mpps, as long as it didn't drop any.. Why can't it do
that? Queue them up and process them in big chunks so none are
dropped... hmm?


The 32-bit system is compiling now.. it won't do more than 400Kpps with a
GENERIC kernel, whereas the 64-bit box did 450K with GENERIC, although that
could be the difference between the Opteron 270 and the Opteron 2212..

Paul

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-07-01 Thread Paul
Apparently lagg hasn't been Giant-fixed :/   Can we do something about
this quickly?
With adaptive Giant I get more performance on lagg, but the CPU usage is
pegged at 100%.
I get about 50k more pps per interface (so 150Kpps total, which STILL is
less than a single gigabit port).

Check it out

68 processes:  9 running, 41 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 89.5% system,  0.0% interrupt, 10.5% idle
Mem: 8016K Active, 6192K Inact, 47M Wired, 108K Cache, 9056K Buf, 1919M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C    TIME    WCPU COMMAND
   38 root      -68    -     0K    16K CPU1   1    3:29 100.00% em2 taskq
   37 root      -68    -     0K    16K CPU0   0    3:31  98.78% em1 taskq
   36 root      -68    -     0K    16K CPU3   3    2:53  82.42% em0 taskq
   11 root      171 ki31     0K    16K RUN    2   22:48  79.00% idle: cpu2
   10 root      171 ki31     0K    16K RUN    3   20:51  22.90% idle: cpu3
   39 root      -68    -     0K    16K RUN    2    0:32  16.60% em3 taskq
   12 root      171 ki31     0K    16K RUN    1   20:16   2.05% idle: cpu1
   13 root      171 ki31     0K    16K RUN    0   20:25   1.90% idle: cpu0

            input          (em0)           output
   packets  errs    bytes    packets  errs  bytes colls
    122588     0  7355280          0     0      0     0
    123057     0  7383420          0     0      0     0

            input          (em1)           output
   packets  errs    bytes    packets  errs  bytes colls
    174917 11899 10495032          2     0    178     0
    173967 11697 10438038          2     0    356     0
    174630 10603 10477806          2     0    268     0

            input          (em2)           output
   packets  errs    bytes    packets  errs  bytes colls
    175843  3928 10550580          0     0      0     0
    175952  5750 10557120          0     0      0     0


Still less performance than a single gig-e.. that Giant lock really sucks,
and why on earth would lagg require it.. It seems so simple to fix :/
Anyone up for it? :)  I wish I were a programmer sometimes, but network
engineering will have to do. :D




Julian Elischer wrote:

Paul wrote:
Is PF better than ipfw?  iptables has almost no impact on routing
performance unless I add a swath of rules to it, and then it bombs.
I need maybe 10 rules max, and I don't want a 20% performance drop for
that.. :P


Well, lots of people have wanted to fix it, and I've investigated
quite a lot, but it takes someone with 2 weeks of free time and
all the right clue. It's not inherent in ipfw, but it needs some
TLC from someone who cares :-).



Ouch! :)  Is this going to be fixed any time soon?  We have some
money that can be used for development costs to fix things like this,
because
we use Linux and FreeBSD machines as firewalls for a lot of customers,
and with increasing bandwidth and pps the customers are demanding
more, and I
can't give them better performance with a brand new dual Xeon or
Opteron machine vs the old P4 machines I have them running on now :/
The only difference
in the new machine vs the old machine is that the new one can take in
more pps and drop it, but it can't route a whole lot more.
Routing/firewalling must still not be lock-free, ugh.. :P


Thanks




Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Stefan Lambrev



Paul wrote:
The higher I set the buffers the worse it is.. with 256 and 512 I get about
50-60k more pps than I do with 2048 or 4096.. You
would think it would be the other way around, but obviously there is
some contention going on. :/
Looks like in bridge mode hw.em.rxd=512 and hw.em.txd=512 yield the best
results too; reducing or increasing those leads to worse performance.
BTW, is there any news on hwpmc for new CPUs? Last time I checked it was a
real pain to get it working with Core 2 CPUs :(
I'm sticking with 512 for now, as it seems to get worse with
anything higher.
Keep in mind, I'm using random source IPs and random source and
destination ports.. That should have zero impact on the
amount of pps it can route, but for some reason it seems to.. ? Any
ideas on that one?  A single stream from one source IP/port to one
destination IP/port seems to use less CPU, although I haven't
generated the same pps with that yet.. I am going to test it soon.


Ingo Flaschberger wrote:

Dear Paul,

I tried this.. I put 6-STABLE (6.3) on, and using the default driver it
was slower than FreeBSD 7.


have you set the rx/tx buffers?

/boot/loader.conf
hw.em.rxd=4096
hw.em.txd=4096

bye,
Ingo



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


--

Best Wishes,
Stefan Lambrev
ICQ# 24134177

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Mike Tancsa

At 04:04 AM 6/29/2008, Paul wrote:
This is just a question but who can get more than 400k pps 
forwarding performance ?



OK, I set up 2 boxes on either end of a RELENG_7 box from about May
7th just now, to see how it would do with 2 boxes blasting across it.
*However*, this is with no firewall loaded, and I must enable
IP fast forwarding. Without that enabled, the box just falls over.


even at 20Kpps, I start seeing all sorts of messages spewing to route 
-n monitor



got message of size 96 on Mon Jun 30 15:39:10 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0,
flags:<DONE>

locks:  inits:
sockaddrs: <DST>
 default

I am starting to wonder whether those messages are the result of
corrupted packets that the machine just can't keep up with?



CPU is

CPU: Intel(R) Xeon(R) CPU3070  @ 2.66GHz (2660.01-MHz 
686-class CPU)



            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
611945 0   77892098 611955 0   77013002 0
616727 0   78215508 616742 0   77303454 0
617066 0   78162130 617082 0   77238434 0
618238 0   78302314 618225 0   77377582 0
617035 0   78141000 617038 0   77215672 0
617625 0   78225600 617588 0   77301734 0
616190 0   78017320 616165 0   77091774 0
615583 0   78064130 615628 0   77152800 0
617662 0   78254388 617658 0   77332340 0
618000 0   78269912 617950 0   77344554 0
617248 0   78183136 617315 0   77259588 0
617325 0   78204566 617289 0   77282094 0
618391 0   78337734 618357 0   77413756 0
616025 0   78116070 616082 0   77203116 0


To generate the packets, I am just using 
/usr/src/tools/tools/netblast  on 2 endpoints starting at about the same time


# ./netblast 10.10.1.2 500 100 40

start: 1214854131.083679919
finish:1214854171.084668592
send calls:20139141
send errors:   0
approx send rate:  503478
approx error rate: 0


# ./netblast 10.10.1.3 500 10 40

start: 1214854273.882202815
finish:1214854313.882319031
send calls:23354971
send errors:   18757223
approx send rate:  114943
approx error rate: 0
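(For anyone repeating the test, my reading of the netblast arguments -- worth
double-checking against the tool's usage message -- is:)

./netblast 10.10.1.2 500 100 40   # destination, UDP port, payload bytes, seconds
                                  # i.e. 100-byte payloads to port 500 for 40 s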

The box in the middle doing the forwarding

1[spare-r7]# ifconfig -u
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:08:32:a8
        inet 10.20.1.1 netmask 0xffffff00 broadcast 10.20.1.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:1b:21:08:32:a9
        inet 192.168.43.193 netmask 0xffffff00 broadcast 192.168.43.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:30:48:90:4c:ff
        inet 10.10.1.1 netmask 0xffffff00 broadcast 10.10.1.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet 127.0.0.1 netmask 0xff000000


I am going to try a few more tests with and without firewall rules, etc.,
as well as an updated RELENG_7 kernel as of today, and see how that goes.


---Mike

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Paul
With hours and days of tweaking I can't even get 500Kpps :/ no firewall,
no anything else..

What is your kernel config? Sysctl settings?
The machine I'm testing on is a dual Opteron 2212 with an Intel 2-port
82571 NIC.. using 7-STABLE, and I tried 6-STABLE and -CURRENT.
I get the RTM_MISS with 7 and CURRENT, but only with certain types of
packets at a certain rate.. :/
I can not get more than 500Kpps.. I tried everything I could think of...
Lowering the rx descriptors on em to 512 instead of 2048 gave me some
more.. I was stuck at 400Kpps until I changed those and lowered the rx
processing limit.
My tests send traffic in on em0 and out on em1, in one direction only,
and it gets major errors when the em0 taskq gets close to 80% CPU..
I am pretty disappointed that it maxes out a little over 400Kpps, and
even then it gets some errors here and there, mainly missed packets due
to no buffer and rx overruns (dev.em.0.stats=1).
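(The knobs referred to above, as they would go in /boot/loader.conf; the
rx_process_limit tunable name is how I remember it from the em driver, and
since the post doesn't say what it was lowered to, that value is only a
placeholder:)

hw.em.rxd=512
hw.em.txd=512
hw.em.rx_process_limit=50    # placeholder; the post only says it was lowered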






___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Steve Bertrand

Mike Tancsa wrote:

At 04:04 AM 6/29/2008, Paul wrote:
This is just a question but who can get more than 400k pps forwarding 
performance ?



OK, I setup 2 boxes on either end of a RELENG_7 box from about May 7th 
just now, to see with 2 boxes blasting across it how it would work.  
*However*, this is with no firewall loaded and, I must enable ip fast 
forwarding. Without that enabled, the box just falls over.


even at 20Kpps, I start seeing all sorts of messages spewing to route -n 
monitor



got message of size 96 on Mon Jun 30 15:39:10 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, 
flags:DONE

locks:  inits:
sockaddrs: DST
 default


Mike,

Is the monitor running on the 7.0 box in the middle you are testing?

I set up the same configuration, and even with almost no load (< 1Kpps) I
can replicate these error messages by making the remote IP address (in
your case 'default') disappear (i.e. unplug the cable, DDoS, etc.).


...to go further, I can even replicate the problem at a single packet per
second by trying to ping an IP address that I know for a fact the
router cannot get to.
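(A minimal way to reproduce that, assuming an address the router genuinely
has no route to; 192.0.2.1 here is just a placeholder:)

route -n monitor &
ping -c 3 192.0.2.1    # each failed lookup should show up as an RTM_MISS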


Do you see these error messages if you set up a loopback address with an 
IP on the router, and effectively chop your test environment in half? In 
your case, can the router in the middle actually get to a default 
gateway for external addresses (when I perform the test, your 'default' 
is substituted with the IP I am trying to reach, so I am only assuming 
that 'default' is implying default gateway).


Steve
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Paul

I am getting this message with normal routing.

say...

em0 10.1.1.1/24

em1 10.2.2.1/24

using a box 10.1.1.2 on em0
and having another box on 10.2.2.2 on em1

I send packets from 10.1.1.2, which come in on em0 and have a route to
10.2.2.2 out em1 of course, and I get MASSIVE RTM_MISS messages, but ONLY
with certain packets.. I don't get it.  I posted a tcpdump of
the types of packets that generate them and the ones that don't.
RTM_MISS is normal if the box can't find a route; it's the
'destination unreachable' message.
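(The topology in question, as a config sketch using the addresses above:)

# router
ifconfig em0 inet 10.1.1.1/24
ifconfig em1 inet 10.2.2.1/24
sysctl net.inet.ip.forwarding=1

# sender at 10.1.1.2: reach 10.2.2.0/24 via the router
route add -net 10.2.2.0/24 10.1.1.1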


I would prefer a kernel option to disable this message to save CPU
cycles, though, as it is completely unnecessary to generate.


I even set the default gateway to the loopback interface and I STILL get
the message.. Something is wrong in the code somewhere.
Does anyone have any idea how to disable this message? It's causing
major CPU usage in my zebra daemon, which watches the route messages,
and is most likely severely limiting pps throughput :/


It generates the messages with only the IPs on em1 and em0, nothing else
in the routing table, and a default gateway set.  So it has nothing to do
with zebra.  It happens on 7-STABLE and (8-)CURRENT; I tested both.


There are no RTM_MISS messages in 7-RELEASE, so something changed
in -STABLE :/


Paul




___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Ingo Flaschberger

Dear Paul,


I am getting this message with normal routing.

say...

em0 10.1.1.1/24

em1 10.2.2.1/24

using a box 10.1.1.2 on em0
and having another box on 10.2.2.2 on em1

I send packet from 10.1.1.2 which goes through em0 and has a route to 
10.2.2.2 out em1 of course and I get MASSIVE RTM_MISS messages but ONLY with 
this certain packets.. I don't get it?   I posted the tcpdump of the types of


There is an open bug report:
http://www.freebsd.org/cgi/query-pr.cgi?pr=124540

Perhaps it has something to do with the multiple-FIB stuff?

kind regards,
Ingo Flaschberger
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Wilkinson, Alex
0n Mon, Jun 30, 2008 at 03:44:48PM -0400, Mike Tancsa wrote: 

OK, I setup 2 boxes on either end of a RELENG_7 box from about May 
7th just now, to see with 2 boxes blasting across it how it would 
work.  *However*, this is with no firewall loaded and, I must enable 
ip fast forwarding. Without that enabled, the box just falls over.

What is ip fast forwarding ?

 -aW

IMPORTANT: This email remains the property of the Australian Defence 
Organisation and is subject to the jurisdiction of section 70 of the CRIMES ACT 
1914.  If you have received this email in error, you are requested to contact 
the sender and delete the email.


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Ingo Flaschberger

Dear Alex,


   OK, I setup 2 boxes on either end of a RELENG_7 box from about May
   7th just now, to see with 2 boxes blasting across it how it would
   work.  *However*, this is with no firewall loaded and, I must enable
   ip fast forwarding. Without that enabled, the box just falls over.

What is ip fast forwarding ?


Instead of copying the whole IP packet into system memory, only the IP
header is copied, and then a fast path determines whether the packet can
be fast-forwarded.

If possible, a new header is created in the other network card's buffer
and the IP data is copied from network-card buffer to network-card buffer
directly.


Kind regards,
   Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Wilkinson, Alex
0n Tue, Jul 01, 2008 at 03:00:31AM +0200, Ingo Flaschberger wrote: 

Dear Alex,

OK, I setup 2 boxes on either end of a RELENG_7 box from about May
7th just now, to see with 2 boxes blasting across it how it would
work.  *However*, this is with no firewall loaded and, I must enable
ip fast forwarding. Without that enabled, the box just falls over.

 What is ip fast forwarding ?

Instead of copying the whole IP packet into system memory, only the IP
header is copied, and then a fast path determines whether the packet can
be fast-forwarded.
If possible, a new header is created in the other network card's buffer
and the IP data is copied from network-card buffer to network-card buffer
directly.

So how does one enable ip fast forwarding on FreeBSD ?

 -aW

IMPORTANT: This email remains the property of the Australian Defence 
Organisation and is subject to the jurisdiction of section 70 of the CRIMES ACT 
1914.  If you have received this email in error, you are requested to contact 
the sender and delete the email.


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Ingo Flaschberger

Dear Alex,


   if possible, a new header is created in the other network card's buffer
   and the IP data is copied from network-card buffer to network-card buffer
   directly.

So how does one enable ip fast forwarding on FreeBSD ?


sysctl -w net.inet.ip.fastforwarding=1

usually interface polling is also chosen to prevent lock-ups.
man polling
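(For the polling side, a sketch of what that involves on 7.x per polling(4);
the interface name is just an example:)

# kernel config
options DEVICE_POLLING
options HZ=1000          # commonly raised when polling

# runtime, per interface
ifconfig em0 polling
ifconfig em0 -polling    # to turn it back off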

kind regards,
Ingo Flaschberger

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Support (Rudy)

Ingo Flaschberger wrote:

usually interface polling is also chosen to prevent lock-ups.
man polling



I used polling in FreeBSD 5.x and it helped a bunch.  I set up a new router with 7.0 and
MSI was recommended to me.  (I noticed no difference when moving from polling to MSI;
however, on 5.4 polling seemed to help a lot.)  What are people using in 7.0?

 polling or MSI?

Rudy
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Steve Bertrand

Wilkinson, Alex wrote:


So how does one enable ip fast forwarding on FreeBSD ?


Not to take anything away from Ingo's response, but to make the setting
persist across reboots, add the following line to
/etc/sysctl.conf:


net.inet.ip.fastforwarding=1

Steve
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

2008-06-30 Thread Steve Bertrand

Support (Rudy) wrote:

Ingo Flaschberger wrote:

usually interface polling is also chosen to prevent lock-ups.
man polling



I used polling in FreeBSD 5.x and it helped a bunch.  I set up a new
router with 7.0 and MSI was recommended to me.  (I noticed no difference
when moving from polling to MSI; however, on 5.4 polling seemed to help
a lot.)


I'm curious now... how do you change individual device polling via sysctl?

Steve
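(As far as I know, the per-device on/off switch in 7.x is the ifconfig flag
sketched a few messages up rather than a sysctl; what remains under sysctl
are the global polling knobs, e.g.:)

sysctl kern.polling.user_frac    # share of CPU reserved for userland under load
sysctl kern.polling.burst_max    # cap on packets handled per poll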
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to [EMAIL PROTECTED]

