Re: FreeBSD 10G forwarding performance @Intel
On Wed, Jul 04, 2012 at 12:31:56AM +0400, Alexander V. Chernikov wrote:
> On 04.07.2012 00:27, Luigi Rizzo wrote:
> > On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
> > ...
> > > Thanks, another good point. I forgot to merge this option from andre's
> > > patch.
> > >
> > > Another 30-40-50 kpps to win.
> >
> > not much gain though.
> > What about the other IPSTAT_INC counters ?
> Well, we should then remove all such counters (total, forwarded) and
> per-interface statistics (at least for forwarded packets).

I am not saying to remove them for good, but at least have a try at what
we can hope to save by implementing them on a per-CPU basis. There is a
chance that one will not see big gains until the majority of such shared
counters are fixed (there are probably 3-4 at least on the non-error path
for forwarded packets), plus the per-interface ones that are not even
wrapped in macros (see if_ethersubr.c).

> > I think the IPSTAT_INC macros were introduced (by rwatson ?)
> > following a discussion on how to make the counters per-cpu
> > and avoid the contention on cache lines.
> > But they are still implemented as a single instance,
> > and neither volatile nor atomic, so it is not even clear
> > that they can give reliable results, let alone the fact
> > that you are likely to get some cache misses.
> >
> > the relevant macro is in ip_var.h.
> Hm. This seems to be just a per-vnet structure instance.

Yes, but essentially they are still shared by all threads within a vnet
(besides, you probably ran your tests in the main instance).

> We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c)
> which can be used for the global ipstat structure; however, since it is
> allocated from a single area without the possibility to free, we can't
> use it for per-interface counters.

Yes, those should be moved to a private, dynamically allocated region of
the ifnet (the number of CPUs is known at driver init time, I hope). But
again, for a quick test, disabling the if_{i|o}{bytes|packets} counters
should do the job, if you can count the received rate by some other means.

> I'll try to run tests without any possibly contested counters and report
> the results on Thursday.

great, that would be really useful info.

cheers
luigi
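As a concrete illustration of the per-interface counters discussed above,
here is a minimal sketch of per-CPU counters allocated per ifnet at driver
attach time, once mp_ncpus is known. All struct, field and function names
are hypothetical (this is not the layout of struct ifnet nor any committed
code), and the hot-path helper assumes the caller does not migrate CPUs in
the middle of an update:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>

    /* One cache-line-aligned slot per CPU, so CPUs never share a line. */
    struct if_pcpu_stats {
            uint64_t        ipackets;
            uint64_t        ibytes;
            uint64_t        opackets;
            uint64_t        obytes;
    } __aligned(CACHE_LINE_SIZE);

    /* Called from the driver attach path, where mp_ncpus is already known. */
    static struct if_pcpu_stats *
    if_pcpu_stats_alloc(void)
    {

            return (malloc(sizeof(struct if_pcpu_stats) * mp_ncpus, M_DEVBUF,
                M_WAITOK | M_ZERO));
    }

    /* Hot path: touch only the current CPU's slot. */
    static __inline void
    if_pcpu_count_output(struct if_pcpu_stats *st, u_int len)
    {
            struct if_pcpu_stats *s = &st[curcpu];

            s->opackets++;
            s->obytes += len;
    }

    /* Slow path (ioctl/netstat): fold the per-CPU slots into one snapshot. */
    static void
    if_pcpu_stats_collect(const struct if_pcpu_stats *st,
        struct if_pcpu_stats *sum)
    {
            u_int i;

            bzero(sum, sizeof(*sum));
            for (i = 0; i < (u_int)mp_ncpus; i++) {
                    sum->ipackets += st[i].ipackets;
                    sum->ibytes += st[i].ibytes;
                    sum->opackets += st[i].opackets;
                    sum->obytes += st[i].obytes;
            }
    }

The folded snapshot is only approximate, since the hot path does not
synchronize with readers, but that is normally acceptable for statistics.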
Re: FreeBSD 10G forwarding performance @Intel
On 04.07.2012 00:27, Luigi Rizzo wrote:
> On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
> ...
> > Thanks, another good point. I forgot to merge this option from andre's
> > patch.
> >
> > Another 30-40-50 kpps to win.
>
> not much gain though.
> What about the other IPSTAT_INC counters ?

Well, we should then remove all such counters (total, forwarded) and
per-interface statistics (at least for forwarded packets).

> I think the IPSTAT_INC macros were introduced (by rwatson ?)
> following a discussion on how to make the counters per-cpu
> and avoid the contention on cache lines.
> But they are still implemented as a single instance,
> and neither volatile nor atomic, so it is not even clear
> that they can give reliable results, let alone the fact
> that you are likely to get some cache misses.
>
> the relevant macro is in ip_var.h.

Hm. This seems to be just a per-vnet structure instance.

We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c)
which can be used for the global ipstat structure; however, since it is
allocated from a single area without the possibility to free, we can't
use it for per-interface counters.

I'll try to run tests without any possibly contested counters and report
the results on Thursday.

> Cheers
> luigi
>
> > +u_int rt_count = 1;
> > +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
> >
> > @@ -601,17 +625,20 @@ passout:
> >  	if (error != 0)
> >  		IPSTAT_INC(ips_odropped);
> >  	else {
> > -		ro.ro_rt->rt_rmx.rmx_pksent++;
> > +		if (rt_count)
> > +			ro.ro_rt->rt_rmx.rmx_pksent++;
> >  		IPSTAT_INC(ips_forward);
> >  		IPSTAT_INC(ips_fastforward);

--
WBR, Alexander
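To illustrate the DPCPU idea mentioned above, here is a rough sketch of how
the static per-CPU area from sys/pcpu.h could hold a per-CPU copy of struct
ipstat. This is only a sketch of the approach (names such as ipstat_pcpu,
IPSTAT_PCPU_INC and ipstat_fold are made up, VNET handling is omitted, and
only two fields are summed for brevity), not necessarily what was committed
to the tree:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>

    #include <netinet/ip_var.h>

    /* One struct ipstat per CPU, carved out of the static DPCPU area. */
    DPCPU_DEFINE(struct ipstat, ipstat_pcpu);

    /*
     * Hot path: bump only this CPU's copy.  Strictly, the caller should be
     * pinned or in a critical section so it cannot migrate mid-increment.
     */
    #define IPSTAT_PCPU_INC(name)   (DPCPU_PTR(ipstat_pcpu)->name++)

    /* Slow path: fold the per-CPU copies into one struct for sysctl/netstat. */
    static void
    ipstat_fold(struct ipstat *dst)
    {
            const struct ipstat *src;
            u_int i;

            bzero(dst, sizeof(*dst));
            CPU_FOREACH(i) {
                    src = DPCPU_ID_PTR(i, ipstat_pcpu);
                    dst->ips_total += src->ips_total;
                    dst->ips_forward += src->ips_forward;
                    /* ... likewise for the remaining fields ... */
            }
    }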
Re: FreeBSD 10G forwarding performance @Intel
On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
...
> Thanks, another good point. I forgot to merge this option from andre's
> patch.
>
> Another 30-40-50 kpps to win.

not much gain though.
What about the other IPSTAT_INC counters ?

I think the IPSTAT_INC macros were introduced (by rwatson ?)
following a discussion on how to make the counters per-cpu
and avoid the contention on cache lines.
But they are still implemented as a single instance,
and neither volatile nor atomic, so it is not even clear
that they can give reliable results, let alone the fact
that you are likely to get some cache misses.

the relevant macro is in ip_var.h.

Cheers
luigi

> +u_int rt_count = 1;
> +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
>
> @@ -601,17 +625,20 @@ passout:
>  	if (error != 0)
>  		IPSTAT_INC(ips_odropped);
>  	else {
> -		ro.ro_rt->rt_rmx.rmx_pksent++;
> +		if (rt_count)
> +			ro.ro_rt->rt_rmx.rmx_pksent++;
>  		IPSTAT_INC(ips_forward);
>  		IPSTAT_INC(ips_fastforward);
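For reference, the macro in question was, at the time, roughly the following
(paraphrased from sys/netinet/ip_var.h of that era, possibly not verbatim).
The point Luigi makes is that it is a plain, unsynchronized increment of a
single per-vnet instance shared by every forwarding thread:

    VNET_DECLARE(struct ipstat, ipstat);    /* one shared instance per vnet */
    #define V_ipstat                VNET(ipstat)

    #define IPSTAT_ADD(name, val)   V_ipstat.name += (val)  /* not atomic */
    #define IPSTAT_INC(name)        IPSTAT_ADD(name, 1)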
Re: FreeBSD 10G forwarding performance @Intel
On 03.07.2012 20:55, Luigi Rizzo wrote:
> On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
> > Hello list!
> >
> > I'm quite stuck with bad forwarding performance on many FreeBSD boxes
> > doing firewalling.
> ...
> > In most cases the system can forward no more than 700 (or 1400) kpps,
> > which is quite a bad number (Linux does, say, 5 Mpps on nearly the same
> > hardware).
>
> among the many interesting tests you have run, i am curious
> if you have tried to remove the update of the counters on
> route entries. They might be another severe contention point.

21:47 [0] m@test15 netstat -I ix0 -w 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
   1785514 52785      0  121318340    1784650     0  117874854     0
   1773126 52437      0  120701470    1772977     0  117584736     0
   1781948 52154      0  121060126    1778271     0   75029554     0
   1786169 52982      0  121451160    1787312     0  160967392     0

21:47 [0] test15# sysctl net.rt_count=0
net.rt_count: 1 -> 0

   1814465 22546      0  121302076    1814291     0   76860092     0
   1817769 14272      0  120984922    1816254     0  163643534     0
   1815311 13113      0  120831970    1815340     0  120159118     0
   1814059 13698      0  120799132    1813738     0  120172092     0
   1818030 13513      0  120960140    1814578     0  120332662     0
   1814169 14351      0  120836182    1814003     0  120164310     0

Thanks, another good point. I forgot to merge this option from andre's
patch.

Another 30-40-50 kpps to win.

+u_int rt_count = 1;
+SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");

@@ -601,17 +625,20 @@ passout:
 	if (error != 0)
 		IPSTAT_INC(ips_odropped);
 	else {
-		ro.ro_rt->rt_rmx.rmx_pksent++;
+		if (rt_count)
+			ro.ro_rt->rt_rmx.rmx_pksent++;
 		IPSTAT_INC(ips_forward);
 		IPSTAT_INC(ips_fastforward);

> cheers
> luigi

--
WBR, Alexander
Re: FreeBSD 10G forwarding performance @Intel
On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
> Hello list!
>
> I'm quite stuck with bad forwarding performance on many FreeBSD boxes
> doing firewalling.
...
> In most cases the system can forward no more than 700 (or 1400) kpps,
> which is quite a bad number (Linux does, say, 5 Mpps on nearly the same
> hardware).

among the many interesting tests you have run, i am curious
if you have tried to remove the update of the counters on
route entries. They might be another severe contention point.

cheers
luigi
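The per-route counters Luigi refers to are the route metrics on the shared
rtentry; with a single route entry, every forwarding core ends up writing
the same cache line on every packet. This is the line that the rt_count
patch in Alexander's reply above wraps, apparently in the fast-forwarding
output path given the ips_fastforward counter next to it:

    ro.ro_rt->rt_rmx.rmx_pksent++;  /* shared rtentry: contended cache line */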
FreeBSD 10G forwarding performance @Intel
Hello list!

I'm quite stuck with bad forwarding performance on many FreeBSD boxes
doing firewalling.

Typical configuration is E5645 / E5675 @ Intel 82599 NIC. HT is turned
off. (Configs and tunables below.)

I'm mostly concerned with unidirectional traffic flowing to a single
interface (e.g. using a single route entry).

In most cases the system can forward no more than 700 (or 1400) kpps,
which is quite a bad number (Linux does, say, 5 Mpps on nearly the same
hardware).

Test scenario:

Ixia XM2 (traffic generator) <> ix0 (FreeBSD).

Ixia sends 64-byte IP packets from vlan10 (10.100.0.64 - 10.100.0.156)
to destinations in vlan11 (10.100.1.128 - 10.100.1.192). Static ARP
entries are configured for all destination addresses. Traffic level is
slightly above or slightly below system performance.

= Test 1 ===

Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE,
no firewall
Traffic: 1-1 flow (1 src, 1 dst) (this is actually a bit different from
what is described above)

Result:
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      878k   48k      0        59M       878k     0        56M     0
      874k   48k      0        59M       874k     0        56M     0
      875k   48k      0        59M       875k     0        56M     0

16:41 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf " %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'
   STATE  C   TIME        CPU COMMAND
    CPU6  6  17:28    100.00% kernel{ix0 que}
    CPU9  9  20:42     60.06% intr{irq265: ix0:que

16:41 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0       500796        167
irq257: ix0:que 1      6693573       2245
irq258: ix0:que 2      2572380        862
irq259: ix0:que 3      3166273       1062
irq260: ix0:que 4      9691706       3251
irq261: ix0:que 5     10766434       3611
irq262: ix0:que 6      8933774       2996
irq263: ix0:que 7      5246879       1760
irq264: ix0:que 8      3548930       1190
irq265: ix0:que 9     11817986       3964
irq266: ix0:que 10      227561         76
irq267: ix0:link             1          0

Note that the system is using 2 cores to forward, so 12 cores should be
able to forward 4+ Mpps, which is more or less consistent with the Linux
results. Note that interrupts on all queues are (as far as I understand,
from the fact that AIM is turned off and interrupt rates are the same as
in the previous test). Additionally, despite hw.intr_storm_threshold =
200k, I'm constantly getting the 'interrupt storm detected on "irq265:";
throttling interrupt source' message.
= Test 2 ===

Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE,
no firewall
Traffic: Unidirectional many-2-many

16:20 [0] test15# netstat -I ix0 -hw 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      507k  651k      0        74M       508k     0        32M     0
      506k  652k      0        74M       507k     0        28M     0
      509k  652k      0        74M       508k     0        37M     0

16:28 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf " %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'
   STATE  C   TIME        CPU COMMAND
   CPU10  6   0:40    100.00% kernel{ix0 que}
    CPU2  2  11:47     84.86% intr{irq258: ix0:que
    CPU3  3  11:50     81.88% intr{irq259: ix0:que
    CPU8  8  11:38     77.69% intr{irq264: ix0:que
    CPU7  7  11:24     77.10% intr{irq263: ix0:que
    WAIT  1  10:10     74.76% intr{irq257: ix0:que
    CPU4  4   8:57     63.48% intr{irq260: ix0:que
    CPU6  6   8:35     61.96% intr{irq262: ix0:que
    CPU9  9  14:01     60.79% intr{irq265: ix0:que
     RUN  0   9:07     59.67% intr{irq256: ix0:que
    WAIT  5   6:13     43.26% intr{irq261: ix0:que
   CPU11 11   5:19     35.89% kernel{ix0 que}
       -  4   3:41     25.49% kernel{ix0 que}
       -  1   3:22     21.78% kernel{ix0 que}
       -  1   2:55     17.68% kernel{ix0 que}
       -  4   2:24     16.55% kernel{ix0 que}
       -  1   9:54     14.99% kernel{ix0 que}
    CPU0 11   2:13     14.26% kernel{ix0 que}

16:07 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0        13654         15
irq257: ix0:que 1        87043         96
irq258: ix0:que 2        39604         44
irq259: ix0:que 3        48308         53
irq260: ix0:que 4       138002        153
irq261: ix0:que 5       169596        188
irq262: ix0:que 6       107679        119
irq263: ix0:que 7        72769         81
irq264: ix0:que 8        30878         34
irq265: ix0:que