Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Wed, Jul 04, 2012 at 12:31:56AM +0400, Alexander V. Chernikov wrote:
> On 04.07.2012 00:27, Luigi Rizzo wrote:
> >On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
> >...
> >>Thanks, another good point. I forgot to merge this option from andre's
> >>patch.
> >>
> >>Another 30-40-50kpps to win.
> >
> >not much gain though.
> >What about the other IPSTAT_INC counters ?
> Well, we should then remove all such counters (total, forwarded) and 
> per-interface statistics (at least for forwarded packets).

I am not saying to remove them for good, but at least have a
try at what we can hope to save by implementing them
on a per-cpu basis.

There is a chance that one will not
see big gains until the majority of such shared counters
are fixed (there are probably 3-4 at least on the non-error
path for forwarded packets), plus the per-interface ones
that are not even wrapped in macros (see if_ethersubr.c).
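
The idea in sketch form (the names here are made up, not an existing
API): give each counter a private slot per CPU, padded out to a cache
line, so the forwarding path never writes shared memory and only the
occasional reader pays for the summation:

    /* sketch only; needs sys/param.h, sys/pcpu.h, sys/smp.h */
    struct ips_slot {
            uint64_t        forwarded;
    } __aligned(CACHE_LINE_SIZE);   /* one cache line per CPU */

    static struct ips_slot ips_slots[MAXCPU];

    /*
     * Fast path: no lock, no shared cache line.  A migration race can
     * drop an update now and then, which is fine for statistics.
     */
    #define IPS_FORWARDED_INC()     (ips_slots[curcpu].forwarded++)

    /* slow path (e.g. a sysctl handler): fold the slots together */
    static uint64_t
    ips_forwarded_read(void)
    {
            uint64_t sum = 0;
            int i;

            CPU_FOREACH(i)
                    sum += ips_slots[i].forwarded;
            return (sum);
    }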

> >I think the IPSTAT_INC macros were introduced (by rwatson ?)
> >following a discussion on how to make the counters per-cpu
> >and avoid the contention on cache lines.
> >But they are still implemented as a single instance,
> >and neither volatile nor atomic, so it is not even clear
> >that they can give reliable results, let alone the fact
> >that you are likely to get some cache misses.
> >
> >the relevant macro is in ip_var.h.
> Hm. This seems to be just a per-vnet structure instance.

yes, but essentially they are still shared by all threads within a vnet
(besides, you probably ran your tests in the main instance)

> We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c) 
> which can be used for the global ipstat structure; however, since it is 
> allocated from a single area with no possibility to free, we can't use 
> it for per-interface counters.

yes, those should be moved to a private, dynamically allocated
region of the ifnet (the number of CPUs is known at driver init
time, i hope). But again for a quick test disabling the
if_{i|o}{bytes|packets} should do the job, if you can count
the received rate by some other means.
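
Something along these lines, as a sketch only (the pcpu_stats softc
field is hypothetical; mp_ncpus, curcpu and malloc(9) are the real
kernel interfaces):

    /* hypothetical per-CPU interface counters, sized at attach time */
    struct if_pcpu_stats {
            uint64_t        ipackets, ibytes, opackets, obytes;
    } __aligned(CACHE_LINE_SIZE);

    /* at driver attach, once the number of CPUs is known: */
    sc->pcpu_stats = malloc(mp_ncpus * sizeof(struct if_pcpu_stats),
        M_DEVBUF, M_WAITOK | M_ZERO);

    /* in the rx path, touch only the current CPU's slot: */
    sc->pcpu_stats[curcpu].ipackets++;
    sc->pcpu_stats[curcpu].ibytes += m->m_pkthdr.len;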

> I'll try to run tests without any possibly contested counters and report 
> the results on Thursday.

great, that would be really useful info.

cheers
luigi


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Alexander V. Chernikov

On 04.07.2012 00:27, Luigi Rizzo wrote:

>On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
>...

>>Thanks, another good point. I forgot to merge this option from andre's
>>patch.
>>
>>Another 30-40-50kpps to win.


>not much gain though.
>What about the other IPSTAT_INC counters ?
Well, we should then remove all such counters (total, forwarded) and 
per-interface statistics (at least for forwarded packets).

>I think the IPSTAT_INC macros were introduced (by rwatson ?)
>following a discussion on how to make the counters per-cpu
>and avoid the contention on cache lines.
>But they are still implemented as a single instance,
>and neither volatile nor atomic, so it is not even clear
>that they can give reliable results, let alone the fact
>that you are likely to get some cache misses.
>
>the relevant macro is in ip_var.h.

Hm. This seems to be just a per-vnet structure instance.
We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c)
which can be used for the global ipstat structure; however, since it is
allocated from a single area with no possibility to free, we can't use
it for per-interface counters.
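
For the global structure the usage would look roughly like this
(DPCPU_DEFINE/DPCPU_PTR/DPCPU_ID_PTR and CPU_FOREACH are the real
sys/pcpu.h and sys/smp.h primitives; the ipstat_pcpu name and the fold
helper are made up for illustration):

    DPCPU_DEFINE(struct ipstat, ipstat_pcpu);

    /* writer: bump only this CPU's copy, pinned against migration */
    #define IPSTAT_PCPU_INC(name)   do {                            \
            critical_enter();                                       \
            DPCPU_PTR(ipstat_pcpu)->name++;                         \
            critical_exit();                                        \
    } while (0)

    /* reader: fold all per-CPU copies into one aggregate */
    static void
    ipstat_fold(struct ipstat *dst)
    {
            struct ipstat *is;
            int cpu;

            bzero(dst, sizeof(*dst));
            CPU_FOREACH(cpu) {
                    is = DPCPU_ID_PTR(cpu, ipstat_pcpu);
                    dst->ips_forward += is->ips_forward;
                    /* ... likewise for the remaining fields ... */
            }
    }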


I'll try to run tests without any possibly contested counters and report 
the results on Thursday.


>Cheers
>luigi



>>+u_int rt_count  = 1;
>>+SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
>>
>>@@ -601,17 +625,20 @@ passout:
>> if (error != 0)
>>         IPSTAT_INC(ips_odropped);
>> else {
>>-        ro.ro_rt->rt_rmx.rmx_pksent++;
>>+        if (rt_count)
>>+                ro.ro_rt->rt_rmx.rmx_pksent++;
>>         IPSTAT_INC(ips_forward);
>>         IPSTAT_INC(ips_fastforward);



>>>cheers
>>>luigi








--
WBR, Alexander


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
...
> Thanks, another good point. I forgot to merge this option from andre's 
> patch.
> 
> Another 30-40-50kpps to win.

not much gain though.
What about the other IPSTAT_INC counters ?
I think the IPSTAT_INC macros were introduced (by rwatson ?)
following a discussion on how to make the counters per-cpu
and avoid the contention on cache lines.
But they are still implemented as a single instance,
and neither volatile nor atomic, so it is not even clear
that they can give reliable results, let alone the fact
that you are likely to get some cache misses.

the relevant macro is in ip_var.h.
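
For reference, it expands to a plain unsynchronized increment on the
single per-vnet structure, approximately:

    /* sys/netinet/ip_var.h (FreeBSD 8/9 era, approximately) */
    #define IPSTAT_ADD(name, val)   V_ipstat.name += (val)
    #define IPSTAT_INC(name)        IPSTAT_ADD(name, 1)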

Cheers
luigi

> 
> +u_int rt_count  = 1;
> +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
> 
> @@ -601,17 +625,20 @@ passout:
> if (error != 0)
> IPSTAT_INC(ips_odropped);
> else {
> -   ro.ro_rt->rt_rmx.rmx_pksent++;
> +   if (rt_count)
> +   ro.ro_rt->rt_rmx.rmx_pksent++;
> IPSTAT_INC(ips_forward);
> IPSTAT_INC(ips_fastforward);
> 
> 
> >
> >cheers
> >luigi
> >
> 
> 
> -- 
> WBR, Alexander


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Alexander V. Chernikov

On 03.07.2012 20:55, Luigi Rizzo wrote:

>On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:

>>Hello list!

>>I'm quite stuck with bad forwarding performance on many FreeBSD boxes
>>doing firewalling.

>...

>>In most cases the system can forward no more than 700 (or 1400) kpps,
>>which is quite a bad number (Linux does, say, 5 Mpps on nearly the same
>>hardware).


>among the many interesting tests you have run, i am curious
>if you have tried to remove the update of the counters on route
>entries. They might be another severe contention point.


21:47 [0] m@test15 netstat -I ix0 -w 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
   1785514 52785      0  121318340    1784650     0  117874854     0
   1773126 52437      0  120701470    1772977     0  117584736     0
   1781948 52154      0  121060126    1778271     0   75029554     0
   1786169 52982      0  121451160    1787312     0  160967392     0
21:47 [0] test15# sysctl net.rt_count=0
net.rt_count: 1 -> 0
   1814465 22546      0  121302076    1814291     0   76860092     0
   1817769 14272      0  120984922    1816254     0  163643534     0
   1815311 13113      0  120831970    1815340     0  120159118     0
   1814059 13698      0  120799132    1813738     0  120172092     0
   1818030 13513      0  120960140    1814578     0  120332662     0
   1814169 14351      0  120836182    1814003     0  120164310     0

Thanks, another good point. I forgot to merge this option from andre's 
patch.


Another 30-40-50kpps to win.


+u_int rt_count  = 1;
+SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");

@@ -601,17 +625,20 @@ passout:
if (error != 0)
IPSTAT_INC(ips_odropped);
else {
-   ro.ro_rt->rt_rmx.rmx_pksent++;
+   if (rt_count)
+   ro.ro_rt->rt_rmx.rmx_pksent++;
IPSTAT_INC(ips_forward);
IPSTAT_INC(ips_fastforward);




>cheers
>luigi




--
WBR, Alexander


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
> Hello list!
> 
> I'm quite stuck with bad forwarding performance on many FreeBSD boxes 
> doing firewalling.
...
> In most cases the system can forward no more than 700 (or 1400) kpps,
> which is quite a bad number (Linux does, say, 5 Mpps on nearly the same
> hardware).

among the many interesting tests you have run, i am curious
if you have tried to remove the update of the counters on route
entries. They might be another severe contention point.
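
(The counter in question is rt_rmx.rmx_pksent, the per-route packet
count bumped for every forwarded packet; since all CPUs forwarding over
the same route write the same cache line, this single line in the
fast-forwarding path is enough to cause contention:)

    ro.ro_rt->rt_rmx.rmx_pksent++;  /* shared cache line, one write per packet */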

cheers
luigi


FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Alexander V. Chernikov

Hello list!

I'm quite stuck with bad forwarding performance on many FreeBSD boxes 
doing firewalling.


Typical configuration is E5645 / E5675 @ Intel 82599 NIC.
HT is turned off.
(Configs and tunables below).

I'm mostly concerned with unidirectional traffic flowing to a single 
interface (e.g. using a single route entry).


In most cases the system can forward no more than 700 (or 1400) kpps, 
which is quite a bad number (Linux does, say, 5 Mpps on nearly the same 
hardware).



Test scenario:

Ixia XM2 (traffic generator) <-> ix0 (FreeBSD).

Ixia sends 64-byte IP packets from vlan10 (10.100.0.64 - 10.100.0.156) to
destinations in vlan11 (10.100.1.128 - 10.100.1.192).

Static arps are configured for all destination addresses.

Traffic level is slightly above or slightly below system performance.


=== Test 1 ===
Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, 
no firewall


Traffic: 1-1 flow (1 src, 1 dst)
(This is actually a bit different from what is described above)

Result:
            input          (ix0)           output
   packets  errs idrops  bytes    packets  errs  bytes colls
      878k   48k      0    59M       878k     0    56M     0
      874k   48k      0    59M       874k     0    56M     0
      875k   48k      0    59M       875k     0    56M     0

16:41 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf "  %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'

   STATE  C   TIME      CPU  COMMAND
    CPU6  6  17:28  100.00%  kernel{ix0 que}
    CPU9  9  20:42   60.06%  intr{irq265: ix0:que

16:41 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0     500796        167
irq257: ix0:que 1    6693573       2245
irq258: ix0:que 2    2572380        862
irq259: ix0:que 3    3166273       1062
irq260: ix0:que 4    9691706       3251
irq261: ix0:que 5   10766434       3611
irq262: ix0:que 6    8933774       2996
irq263: ix0:que 7    5246879       1760
irq264: ix0:que 8    3548930       1190
irq265: ix0:que 9   11817986       3964
irq266: ix0:que 10    227561         76
irq267: ix0:link           1          0

Note that the system is using 2 cores to forward, so 12 cores should be 
able to forward 4+ Mpps, which is more or less consistent with the Linux 
results. Note that interrupts on all queues appear to be throttled (as 
far as I understand from the fact that AIM is turned off and the 
interrupt rates are the same as in the previous test). Additionally, 
despite hw.intr_storm_threshold = 200k, I'm constantly getting the

interrupt storm detected on "irq265:"; throttling interrupt source

message.


=== Test 2 ===
Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, 
no firewall


Traffic: Unidirectional many-2-many

16:20 [0] test15# netstat -I ix0 -hw 1
            input          (ix0)           output
   packets  errs idrops  bytes    packets  errs  bytes colls
      507k  651k      0    74M       508k     0    32M     0
      506k  652k      0    74M       507k     0    28M     0
      509k  652k      0    74M       508k     0    37M     0


16:28 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf "  %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'

   STATE  C   TIME      CPU  COMMAND
   CPU10  6   0:40  100.00%  kernel{ix0 que}
    CPU2  2  11:47   84.86%  intr{irq258: ix0:que
    CPU3  3  11:50   81.88%  intr{irq259: ix0:que
    CPU8  8  11:38   77.69%  intr{irq264: ix0:que
    CPU7  7  11:24   77.10%  intr{irq263: ix0:que
    WAIT  1  10:10   74.76%  intr{irq257: ix0:que
    CPU4  4   8:57   63.48%  intr{irq260: ix0:que
    CPU6  6   8:35   61.96%  intr{irq262: ix0:que
    CPU9  9  14:01   60.79%  intr{irq265: ix0:que
     RUN  0   9:07   59.67%  intr{irq256: ix0:que
    WAIT  5   6:13   43.26%  intr{irq261: ix0:que
   CPU11 11   5:19   35.89%  kernel{ix0 que}
       -  4   3:41   25.49%  kernel{ix0 que}
       -  1   3:22   21.78%  kernel{ix0 que}
       -  1   2:55   17.68%  kernel{ix0 que}
       -  4   2:24   16.55%  kernel{ix0 que}
       -  1   9:54   14.99%  kernel{ix0 que}
    CPU0 11   2:13   14.26%  kernel{ix0 que}


16:07 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0      13654         15
irq257: ix0:que 1      87043         96
irq258: ix0:que 2      39604         44
irq259: ix0:que 3      48308         53
irq260: ix0:que 4     138002        153
irq261: ix0:que 5     169596        188
irq262: ix0:que 6     107679        119
irq263: ix0:que 7      72769         81
irq264: ix0:que 8      30878         34
irq265: ix0:que