Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
I am laughing so hard that I had to open some windows to get more oxygen!

On Friday, August 14, 2015 1:30 PM, Maxim Sobolev sobo...@freebsd.org wrote: Hi guys, unfortunately no, neither reduction of the number of queues from 8 to 6 nor pinning the interrupt rate at 2 per queue has made any difference. The card still goes kaboom at about 200Kpps no matter what. In fact I've gone a bit further and, after the first spike, pushed the interrupt rate even further down to 1, but again no difference; it still blows up at the same mark. It did reduce the interrupt rate from 190K to some 130K according to systat -vm, so the moderation itself seems to be working fine. We will try disabling IXGBE_FDIR tomorrow and see if it helps. http://sobomax.sippysoft.com/ScreenShot391.png - systat -vm with max_interrupt_rate = 2 right before overload http://sobomax.sippysoft.com/ScreenShot392.png - systat -vm during issue unfolding (max_interrupt_rate = 1) http://sobomax.sippysoft.com/ScreenShot394.png - cpu/net monitoring, first two spikes are with max_interrupt_rate = 2, the third one with max_interrupt_rate = 1 -Max

On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo ri...@iet.unipi.it wrote: As I was telling Maxim, you should disable AIM because it only matches the max interrupt rate to the average packet size, which is the last thing you want. Setting the interrupt rate with sysctl (one per queue) gives you precise control over the max rate (and hence the extra latency). 20k interrupts/s give you 50us of latency, and the 2k slots in the queue are still enough to absorb a burst of min-sized frames hitting a single queue (the OS will start dropping long before that level, but that's another story). Cheers Luigi

On Wednesday, August 12, 2015, Babak Farrokhi farro...@freebsd.org wrote: I ran into the same problem with almost the same hardware (Intel X520) on 10-STABLE. HT/SMT is disabled and the cards are configured with 8 queues, with the same sysctl tunings as sobomax@ did. I am not using lagg, no FLOWTABLE. I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1] [2] you can see the results, including pmc output, callchain, flamegraph and gprof output. I am experiencing a huge number of interrupts with a 200kpps load:
# sysctl dev.ix | grep interrupt_rate
dev.ix.1.queue7.interrupt_rate: 125000
dev.ix.1.queue6.interrupt_rate: 6329
dev.ix.1.queue5.interrupt_rate: 50
dev.ix.1.queue4.interrupt_rate: 10
dev.ix.1.queue3.interrupt_rate: 5
dev.ix.1.queue2.interrupt_rate: 50
dev.ix.1.queue1.interrupt_rate: 50
dev.ix.1.queue0.interrupt_rate: 10
dev.ix.0.queue7.interrupt_rate: 50
dev.ix.0.queue6.interrupt_rate: 6097
dev.ix.0.queue5.interrupt_rate: 10204
dev.ix.0.queue4.interrupt_rate: 5208
dev.ix.0.queue3.interrupt_rate: 5208
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue1.interrupt_rate: 5494
dev.ix.0.queue0.interrupt_rate: 6250
[1] http://farrokhi.net/~farrokhi/pmc/6/ [2] http://farrokhi.net/~farrokhi/pmc/7/ Regards, Babak

Alexander V. Chernikov wrote: 12.08.2015, 02:28, Maxim Sobolev sobo...@freebsd.org: Olivier, keep in mind that we are not kernel forwarding packets, but app forwarding, i.e. the packet goes the full way net-kernel-recvfrom-app-sendto-kernel-net, which is why we have much lower PPS limits and which is why I think we are actually benefiting from the extra queues.
Single-thread sendto() in a loop is CPU-bound at about 220K PPS, and while running the test I am observing that outbound traffic from one thread is mapped into a specific queue (well, a pair of queues on two separate adaptors, due to the lagg load balancing action). The peak performance of that test is at 7 threads, which I believe corresponds to the number of queues. We have plenty of CPU cores in the box (24) with HTT/SMT disabled and one CPU mapped to each queue. This leaves us with at least 8 CPUs fully capable of running our app. If you look at the CPU utilization, we are at about 10% when the issue hits.

In any case, it would be great if you could provide some profiling info, since there could be plenty of problematic places, starting from TX ring contention to some locks inside UDP or even the (in)famous random entropy harvester... E.g. something like pmcstat -TS instructions -w1 might be sufficient to determine the reason.

ix0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15 port 0x6020-0x603f mem 0xc7c0-0xc7df,0xc7e04000-0xc7e07fff irq 40 at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Bound queue 0 to cpu 0
ix0: Bound queue 1 to cpu 1
ix0: Bound queue 2 to cpu 2
ix0: Bound queue 3 to cpu 3
ix0: Bound queue 4 to cpu 4
ix0: Bound queue 5 to cpu 5
ix0: Bound queue 6 to cpu 6
ix0: Bound queue 7 to cpu 7
ix0: Ethernet address: 0c:c4:7a:5e:be:64
ix0: PCI Express Bus: Speed
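As a point of reference for the single-thread sendto() test described above, a minimal sketch of such a UDP blaster might look like the following. This is not Maxim's actual tool; the destination address, port and payload size are made-up placeholders.

/*
 * Minimal single-thread UDP sendto() loop, similar in spirit to the
 * test described above.  Destination address/port and payload size are
 * placeholders; adjust for your own lab setup.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0) {
		perror("socket");
		return (1);
	}

	struct sockaddr_in dst;
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(5060);			/* placeholder port */
	inet_pton(AF_INET, "10.0.0.2", &dst.sin_addr);	/* placeholder address */

	char payload[64];				/* small, VoIP-like datagram */
	memset(payload, 'x', sizeof(payload));

	unsigned long sent = 0;
	time_t start = time(NULL);
	for (;;) {
		if (sendto(s, payload, sizeof(payload), 0,
		    (struct sockaddr *)&dst, sizeof(dst)) < 0)
			continue;	/* e.g. ENOBUFS under overload */
		if (++sent % 1000000 == 0) {
			long elapsed = (long)(time(NULL) - start);
			if (elapsed <= 0)
				elapsed = 1;
			printf("%lu packets in %ld s (~%ld pps)\n",
			    sent, elapsed, (long)(sent / (unsigned long)elapsed));
		}
	}
	/* NOTREACHED */
	close(s);
	return (0);
}

Running one copy per thread (and watching which NIC queue the flow lands on) reproduces the per-queue mapping behaviour Maxim describes.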
Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Also, using a slow-ass cpu like the Atom is completely absurd; first, no-one would ever use them. You have to test under 60% cpu usage, because as you get to higher cpu usage levels the lock contention increases exponentially. You're increasing lock contention by having more queues, so more queues at higher cpu % usage will perform increasingly badly as usage increases. You'd never run a system at 95% usage (i.e. totally hammering it) in real world usage, so why would you benchmark at such a high usage? Everything changes as available cpu becomes scarce. What the pps is at 50% cpu usage is a better question to ask than the one you're asking. BC

On Tuesday, August 11, 2015 9:29 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Wow, this is really important! If this is a college project, I give you a D. Maybe a D- because it's almost useless information. You ignore the most important aspect of performance. Efficiency is arguably the most important aspect of performance. 1M pps at 20% cpu usage is much better performance than 1.2M pps at 85%. Why don't any of you understand this simple thing? Why does spreading equally really matter, unless you are hitting a wall with your cpus? I don't care which cpu processes which packet. If you weren't doing moronic things like binding to a cpu, then you'd never have to care about distribution unless it was extremely unbalanced. BC

On Tuesday, August 11, 2015 7:15 PM, Olivier Cochard-Labbé oliv...@cochard.me wrote: On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev sobo...@freebsd.org wrote: Hi folks, Hi, We've been trying to migrate some of our high-PPS systems to new hardware that has four X540-AT2 10G NICs and observed that interrupt time goes through the roof after we cross around 200K PPS in and 200K out (two ports in LACP). The previous hardware was stable up to about 350K PPS in and 350K out. I believe the old one was equipped with the I350 and had an identical LACP configuration. The new box also has a better CPU with more cores (i.e. 24 cores vs. 16 cores before). The CPU itself is 2 x E5-2690 v3. 200K PPS, and even 350K PPS, are very low values indeed. On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (thus 2 10-Gigabit ports) I've reached about 1.8Mpps (fastforwarding enabled) [1]. But my setup didn't use lagg(4): can you disable the lagg configuration and re-measure your performance without lagg? Do you let the Intel NIC drivers use 8 queues per port too? In my use case (forwarding the smallest UDP packet size), I obtained better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or hw.ixgbe.num_queues, I don't remember which) even though my system had 8 cores. And this with Gigabit Intel [2] or Chelsio NICs [3]. Don't forget to disable TSO and LRO too.
Regards, Olivier [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1 [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Exposing full 32bit RSS hash from card for ixgbe(4)
On Wednesday, August 5, 2015 4:28 PM, Kevin Oberman rkober...@gmail.com wrote: On Wed, Aug 5, 2015 at 7:10 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: On Wednesday, August 5, 2015 2:19 AM, Olivier Cochard-Labbé oliv...@cochard.me wrote: On Wed, Aug 5, 2015 at 1:15 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: What's the point of all of this gobbledygook anyway? Seriously, 99% of the world needs a driver that passes packets in the most efficient way, and every time I look at igb and ixgbe it has another 2 heads. It's up to 8 heads, and none of the things wrong with it have been fixed. This is now even uglier than Kip Macy's cxgb abortion. I'm not trying to be snarky here. I wrote a simple driver 3 years ago that runs and runs and uses little cpu; maybe 8% for a full gig load on an E3. Hi, I will be very happy to bench your simple driver. Where can I download the sources ? Thanks, Olivier ___ Another unproductive dick head on the FreeBSD team? Figures.

A typical Barney thread. First he calls the developers incompetent and says he has done better. Then someone who has experience in real world benchmarking (not a trivial thing) offers to evaluate Barney's code, and gets a quick, rude, obscene dismissal. Is it any wonder that, even though he made some valid arguments (at least for some workloads), almost everyone just dismisses him as too obnoxious to try to deal with? Based on my pre-retirement work with high-performance networking, in some cases it was clear that it would be better to lock things down to a single CPU with FreeBSD or Linux. I can further state that this was NOT true for all workloads, so it is quite possible that Barney's code works for some cases (perhaps his) and would be bad in others. But without good benchmarking, it's hard to tell. I will say that for large volume data transfers (very large flows), a single CPU solution does work best. But if Barney is going at this with his usual attitude, it's probably not worth it to continue the discussion. --

The "give us the source and we'll test it" nonsense is kindergarten stuff. As if my code is open source and you can just have it, and like you know how to benchmark anything since you can't even benchmark what you have. Some advice is to ignore guys like Oberman who spent their lives randomly pounding networks on slow machines with slow busses and bad NICs on OSes that couldn't do SMP properly. Because he'll just lead you down the road to dusty death. Multicore design isn't simple math; it's about efficiency, lock minimization and the understanding that shifting memory between cpus unnecessarily is costly. Today's CPUs and NICs can't be judged using test methods of the past. You'll just end up playing the Microsoft Windows game; get bigger machines and more memory and don't worry about the fact that the code is junk. It's just that the default in these drivers is so obviously wrong that it's mind-boggling. The argument to use 1, 2 or 4 queues is one worth having; using all of the cpus, including the hyperthreads, is just plain incompetent. I will contribute one possibly useful tidbit: disable_queue() only disables receive interrupts. Both tx and rx ints are effectively tied together by moderation so you'll just get an interrupt at the next slot anyway. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Exposing full 32bit RSS hash from card for ixgbe(4)
What's the point of all of this gobbledygook anyway? Seriously, 99% of the world needs a driver that passes packets in the most efficient way, and every time I look at igb and ixgbe it has another 2 heads. It's up to 8 heads, and none of the things wrong with it have been fixed. This is now even uglier than Kip Macy's cxgb abortion. I'm not trying to be snarky here. I wrote a simple driver 3 years ago that runs and runs and uses little cpu; maybe 8% for a full gig load on an E3. What is the benefit of implementing all of these stupid offload and RSS hashes? Spreading across cpus is incredibly inefficient; running 8 'queues' on a quad core cpu with hyperthreading is incredibly stupid. 1 cpu can easily handle a full gig, so why are you dirtying the code with 8000 features when it runs just fine without any of them? you're subjecting 1000s of users to constant instability (and fear in upgrading at all) for what amounts to a college science project. I know you haven't benchmarked it, so why are you doing it? hell, you added that buf_ring stuff without even making any determination that it was beneficial to use it, just because it was there. You're trying to steal a handful of cycles with these hokey features, and then you're losing buckets of cycles (maybe wheelbarrows) by unnecessarily spreading the processes across too many cpus. It just makes no sense at all. If you want to play, that's fine. But there should be simple I/O drivers for em, igb and ixgbe available as alternatives for the 99% of users who just want to run a router, a bridge/filter or a web server. Drivers that don't break features A and C when you make a change to Q and Z because you can't possibly test all 8000 features every time you do something. Im horrified that some poor schlub with a 1 gig webserver is losing half of his cpu power because of the ridiculous defaults in the igb driver. On Wednesday, July 15, 2015 2:01 PM, hiren panchasara hi...@freebsd.org wrote: On 07/14/15 at 02:18P, hiren panchasara wrote: On 07/14/15 at 12:38P, Eric Joyner wrote: Sorry for the delay; it looked fine to me, but I never got back to you. - Eric On Mon, Jul 13, 2015 at 3:16 PM Adrian Chadd adrian.ch...@gmail.com wrote: Hi, It's fine by me. Please do it! Thanks Adrian and Eric. Committed as r285528. FYI: I am planning to do a partial mfc of this to stable10. Here is the patch: https://people.freebsd.org/~hiren/patches/ix_expose_rss_hash_stable10.patch (I did the same for igb(4), r282831) Cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: netmap-ipfw on em0 em1
Are you NOT SHARP ENOUGH to understand that my proposal DOESN'T USE THE NETWORK STACK? OMFG Julian, perhaps if people weren't so hostile towards commercial companies providing ideas for alternative ways of doing things you'd get more input and more help. Why would I want to help these people? BC

On Monday, May 4, 2015 11:55 PM, Jim Thompson j...@netgate.com wrote: On May 4, 2015, at 10:07 PM, Julian Elischer jul...@freebsd.org wrote: Jim, and Barney. I hate to sound like a broken record, but we really need interested people in the network stack. The people who make the decisions about this are the people who stand up and say I have a few hours I can spend on this. If you were to do so too, then really, all these issues could be worked on. Get in there and help rather than standing on the bleachers and offering advice. There is no person working against you here. By my count the current active networking crew is about 10 people, with another 10 doing drivers. You would have a lot of sway in a group that small, but you have to be in it first, and the way to do that is to simply start doing stuff. No-one was ever sent an invitation. They just turned up.

I am (and we are) interested. I’m a bit short on time, and I have a project/product (pfSense) to maintain, so I keep other people busy on the stack. Examples include: We co-sponsored the AES-GCM work. Unfortunately, the process stopped before the IPsec work we did to leverage this made it upstream. As a partial remedy, gnn is currently evaluating all the patches from pfSense for inclusion into the FreeBSD mainline. I was involved in the work to replace the hash function used in pf. This is (only) a 3% minimum gain, more if you carry large state tables. There was a paper presented at AsiaBSDcon, so at least we have a methodology to speak about performance increases. (Is the methodology in the paper perfect? No. But at least it’s a stake in the ground.) We’re currently working with Intel to bring support for QuickAssist to FreeBSD. (Linux has it.) While that’s not ‘networking’ per se, the larger consumers of the technology are various components in the stack. The other flaws I pointed out are on the list of things for us to work on / fix. Someone might get there first, but … that’s good. I only care about getting things fixed. Jim p.s. yes, I'm working on a commit bit. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Fwd: netmap-ipfw on em0 em1
It's not faster than wedging into the if_input()s. It simply can't be. You're getting packets at interrupt time as soon as they're processed, there's no network stack involved, and you're able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit of the context of my comment). You can do anything in the kernel that you can do in user space. The reason a kernel module with if_input() hooks is better is that you can use the standard kernel without all of the netmap hacks. You can just pop it into any kernel and it works. BC

On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. Because it is much faster. The motivation for netmap-like solutions (that includes Intel's DPDK, PF_RING/DNA and several proprietary implementations) is speed: they bypass the entire network stack, and a good part of the device drivers, so you can access packets 10+ times faster. So things are actually the other way around: the 1000s of unnecessary lines of code (not really thousands, though) are those that you'd pay going through the standard network stack when you don't need any of its services. Going to userspace is just a side effect -- it turns out to be easier to develop and run your packet processing code in userspace, but there are netmap clients (e.g. the VALE software switch) which run entirely in the kernel. cheers luigi

On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC.

On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest. This system is an 8-core at 2.4GHz, but only one core is in use in this next round of tests; my NIC is now an Avoton with the igb(4) driver, currently with 4 queues per NIC (total 8 queues for the kipfw bridge). i have read in your papers we should expect something similar to 1.48Mpps. How can I benefit from the other CPUs which are completely idle? I tried CPU affinity (cpuset) on kipfw, but system CPU usage follows userland kipfw, so I could not set one CPU to userland while another handles the system.

All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing.
If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone will end up using the host stack? And by generating, usually the papers means: minimum sized UDP packets. can you please enlighten? For everyone: read the manuals, read related and indicated materials (papers, web sites, etc), and, as a least resource, read the code. Within netmap's codes, it's more easy than it sounds. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org -- -+--- Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/ . Universita` di Pisa TEL +39-050-2217533 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -+--- ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any
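To make the if_input() hook idea from the message above concrete, here is a rough sketch of a kernel module that saves and replaces an interface's if_input pointer. It only illustrates the technique being argued about; the interface name, the drop policy and the absent locking/teardown handling are placeholders, not production code.

/*
 * Rough sketch of an if_input() interposer as a FreeBSD kernel module.
 * "em0" and the length-based drop policy are placeholders, and real
 * code would need locking against interface departure; this only
 * shows the hook mechanism discussed in the thread.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/ethernet.h>

static struct ifnet *hook_ifp;
static void (*saved_if_input)(struct ifnet *, struct mbuf *);

static void
hook_if_input(struct ifnet *ifp, struct mbuf *m)
{
	/* Inspect/filter the frame here; pass survivors up the stack. */
	if (m->m_pkthdr.len < ETHER_HDR_LEN) {
		m_freem(m);		/* drop */
		return;
	}
	saved_if_input(ifp, m);		/* normal path */
}

static int
hook_modevent(module_t mod, int type, void *arg)
{
	switch (type) {
	case MOD_LOAD:
		hook_ifp = ifunit("em0");	/* placeholder interface */
		if (hook_ifp == NULL)
			return (ENXIO);
		saved_if_input = hook_ifp->if_input;
		hook_ifp->if_input = hook_if_input;
		return (0);
	case MOD_UNLOAD:
		if (hook_ifp != NULL)
			hook_ifp->if_input = saved_if_input;
		return (0);
	default:
		return (EOPNOTSUPP);
	}
}

static moduledata_t hook_mod = {
	"if_input_hook",
	hook_modevent,
	NULL
};

DECLARE_MODULE(if_input_hook, hook_mod, SI_SUB_PSEUDO, SI_ORDER_ANY);

Whether this ends up faster or slower than a netmap-based filter is exactly the disagreement in this thread; the sketch only shows that the hook itself is small.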
Re: Fwd: netmap-ipfw on em0 em1
Nothing freely available. Many commercial companies have done such things. Why limit the general community by force-feeding a really fast packet generator into the mainstream by squashing other ideas in their infancy? Anyone who understands how the kernel works understands what I'm saying. A packet forwarder is a 3 day project (which means 2 weeks as we all know). When you're can't debate the merits of an implementation without having some weenie ask if you have a finished implementation to offer up for free, you end up stuck with misguided junk like netgraph and flowtables. The mediocrity of freebsd network utilities is a function of the collective imagination of its users. Its unfortunate that these lists can't be used to brainstorm better potential better ideas. Luigi's efforts are not diminished by arguing that there is a better way to do something that he recommends to be done with netmap. BC On Monday, May 4, 2015 11:52 AM, Ian Smith smi...@nimnet.asn.au wrote: On Mon, 4 May 2015 15:29:13 +, Barney Cordoba via freebsd-net wrote: It's not faster than wedging into the if_input()s. It simply can't be. Your getting packets at interrupt time as soon as their processed and you there's no network stack involved, and your able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit of the context of my comment). You can do anything in the kernel that you can do in user space. The reason a kernel module with if_input() hooks is better is that you can use the standard kernel without all of the netmap hacks. You can just pop it into any kernel and it works. Barney, do you have a working alternative implementation you can share with us to help put this silly inferior netmap thingy out of business? Thanks, Ian [I'm sorry, pine doesn't quote messages from some yahoo users properly:] On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. Because it is much faster. The motivation for netmap-like solutions (that includes Intel's DPDK, PF_RING/DNA and several proprietary implementations) is speed: they bypass the entire network stack, and a good part of the device drivers, so you can access packets 10+ times faster. So things are actually the other way around: the 1000's of unnecessary lines of code (not really thousands, though) are those that you'd pay going through the standard network stack when you don't need any of its services. Going to userspace is just a side effect -- turns out to be easier to develop and run your packet processing code in userspace, but there are netmap clients (e.g. the VALE software switch) which run entirely in the kernel. cheers luigi On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. 
kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC. On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest, this system is a 8core at 2.4Ghz, but only one core is in use in this next round of tests, my NIC is now an avoton with igb(4) driver, currently with 4 queues per NIC (total 8 queues for kipfw bridge) i have read in your papers we should expect something similar to 1.48Mpps how can I benefit from the other CPUs which are completely idle? I tried CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so I could not set one CPU to userland while other for system All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing. If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone
Re: netmap-ipfw on em0 em1
I'll assume you're just not that clear on specific implementation. Hooking directly into if_input() bypasses all of the cruft. It basically uses the driver as-is, so any driver can be used and it will be as good as the driver. The bloat starts in if_ethersubr.c, which is easily completely avoided. Most drivers need to be tuned (or modified a bit) as most freebsd drivers are full of bloat and forced into a bad, cookie-cutter type way of doing things. The problem with doing things in user space is that user space is unpredictable. Things work just dandily when nothing else is going on, but you can't control when a user space program gets context under heavy loads. In the kernel you can control almost exactly what the polling interval is through interrupt moderation on most modern controllers. Many otherwise credible programmers argued for years that polling was faster, but it was only faster in artificially controlled environment. Its mainly because 1) they're not thinking about the entire context of what can happen, and 2) because they test under unrealistic conditions that don't represent real world events, and 3) they don't have properly tuned ethernet drivers. BC On Monday, May 4, 2015 12:37 PM, Jim Thompson j...@netgate.com wrote: While it is a true statement that, You can do anything in the kernel that you can do in user space.”, it is not a helpful statement. Yes, the kernel is just a program. In a similar way, “You can just pop it into any kernel and it works.” is also not helpful. It works, but it doesn’t work well, because of other infrastructure issues. Both of your statements reduce to the age-old, “proof is left as an exercise for the student”. There is a lot of kernel infrastructure that is just plain crusty(*) and which directly impedes performance in this area. But there is plenty of cruft, Barney. Here are two threads which are three years old, with the issues it points out still unresolved, and multiple places where 100ns or more is lost: https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html 100ns is death at 10Gbps with min-sized packets. quoting: http://luca.ntop.org/10g.pdf --- Taking as a reference a 10 Gbit/s link, the raw throughput is well below the memory bandwidth of modern systems (between 6 and 8 GBytes/s for CPU to memory, up to 5 GBytes/s on PCI-Express x16). How- ever a 10Gbit/s link can generate up to 14.88 million Packets Per Second (pps), which means that the system must be able to process one packet every 67.2 ns. This translates to about 200 clock cycles even for the faster CPUs, and might be a challenge considering the per- packet overheads normally involved by general-purpose operating systems. The use of large frames reduces the pps rate by a factor of 20..50, which is great on end hosts only concerned in bulk data transfer. Monitoring systems and traffic generators, however, must be able to deal with worst case conditions.” Forwarding and filtering must also be able to deal with worst case, and nobody does well with kernel-based networking here. https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2015Paper.pdf 10Gbps NICs are $200-$300 today, and they’ll be included on the motherboard during the next hardware refresh. Broadwell-DE (Xeon-D) has 10G in the SoC, and others are coming. 10Gbps switches can be had at around $100/port. This is exactly the point at which the adoption curve for 1Gbps Ethernet ramped over a decade ago. 
(*) A few more simple examples of cruft: Why, in 2015 does the kernel have a ‘fast forwarding’ option, and worse, one that isn’t enabled by default? Shouldn’t “fast forwarding be the default? Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC? (Reason: each and every time this has come up in recent memory, someone has pointed out that it impacts performance. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=128030) Why, in 2015, does anyone think it’s acceptable for “fast forwarding” to break IPSEC? Why, in 2015, does anyone think it’s acceptable that the setkey(8) man page documents, of all things, DES-CBC and HMAC-MD5 for a SA? That’s some kind of sick joke, right? This completely flies in the face of RFC 4835. On May 4, 2015, at 10:29 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: It's not faster than wedging into the if_input()s. It simply can't be. Your getting packets at interrupt time as soon as their processed and you there's no network stack involved, and your able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit
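For readers who want to check the arithmetic quoted above: a minimum-size Ethernet frame occupies 64 bytes plus 8 bytes of preamble and 12 bytes of inter-frame gap, i.e. 84 bytes or 672 bits on the wire, so a 10 Gbit/s link carries at most 10^10 / 672, about 14.88 million frames per second, which is one frame every 1 / 14.88e6, about 67.2 ns, or roughly 200 cycles on a ~3 GHz core.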
Re: Fwd: netmap-ipfw on em0 em1
Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC. On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest, this system is a 8core at 2.4Ghz, but only one core is in use in this next round of tests, my NIC is now an avoton with igb(4) driver, currently with 4 queues per NIC (total 8 queues for kipfw bridge) i have read in your papers we should expect something similar to 1.48Mpps how can I benefit from the other CPUs which are completely idle? I tried CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so I could not set one CPU to userland while other for system All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing. If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone will end up using the host stack? And by generating, usually the papers means: minimum sized UDP packets. can you please enlighten? For everyone: read the manuals, read related and indicated materials (papers, web sites, etc), and, as a least resource, read the code. Within netmap's codes, it's more easy than it sounds. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
Ok. It was a lot more convenient when it was a standalone module/tarball so you didn't have to surgically extract it from the tree and spend a week trying to get it to compile with whatever version you happened to be running. So if you're running 9.1 or 9.2 you could still use it seamlessly. Negative Progress is inevitable. BC On Tuesday, August 12, 2014 9:57 PM, Mike Tancsa m...@sentex.net wrote: On 8/12/2014 9:16 PM, Barney Cordoba via freebsd-net wrote: I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? Hi, The latest code is committed directly into the tree by Intel eg http://lists.freebsd.org/pipermail/svn-src-head/2014-July/060947.html and http://lists.freebsd.org/pipermail/svn-src-head/2014-June/059904.html They have been MFC'd to RELENG_10 a few weeks ago ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
It's not an either/or. Until last July there was both. Like F'ing Intel isn't making enough money to pay someone to maintain a FreeBSD version. On Wednesday, August 13, 2014 2:24 PM, Jim Thompson j...@netgate.com wrote: On Aug 13, 2014, at 8:24, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Negative Progress is inevitable. Many here undoubtedly consider the referenced effort to be the opposite. Jim ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
This kind of stupidity really irritates me. The commercial use of FreeBSD is the only reason that there is a project, and anyone with 1/2 a brain knows that companies with products based on freebsd can't just upgrade their tree every time some geek gets around to writing a patch. Maybe its the reason that linux sucks but everyone uses it? 10 years later, some old brain dead mentality. On Wednesday, August 13, 2014 2:49 PM, John-Mark Gurney j...@funkthat.com wrote: Barney Cordoba via freebsd-net wrote this message on Wed, Aug 13, 2014 at 06:24 -0700: Ok. It was a lot more convenient when it was a standalone module/tarball so you didn't have to surgically extract it from the tree and spend a week trying to get it to compile with whatever version you happened to be running. So if you're running 9.1 or 9.2 you could still use it seamlessly. Negative Progress is inevitable. The problem is that you are using an old version of FreeBSD that only provides security update... The correct solution is to update your machines... I'd much rather have Intel support it in tree, meaning that supported versions of FreeBSD have an up to date driver, than to cater to your wants of using older releases of FreeBSD... Thanks. On Tuesday, August 12, 2014 9:57 PM, Mike Tancsa m...@sentex.net wrote: On 8/12/2014 9:16 PM, Barney Cordoba via freebsd-net wrote: I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? Hi, The latest code is committed directly into the tree by Intel eg http://lists.freebsd.org/pipermail/svn-src-head/2014-July/060947.html and http://lists.freebsd.org/pipermail/svn-src-head/2014-June/059904.html They have been MFC'd to RELENG_10 a few weeks ago -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Intel Support for FreeBSD
I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? We liked to use the Intel stuff as an alternative to the latest freebsd code, but it doesn't compile. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: TSO help or hindrance ? (was Re: TSO and FreeBSD vs Linux)
Didn't read down far enough. There are def issues and gains are probably mostly in a lab with 9k frames. Turn it off. CPUs and buses are fast. BC On Sep 10, 2013, at 6:52 PM, Mike Tancsa m...@sentex.net wrote: On 9/10/2013 6:42 PM, Barney Cordoba wrote: NFS has been broken since Day 1, so lets not come to conclusions about anything as it relates to NFS. iSCSI is NFS ? ---Mike BC *From:* Mike Tancsa m...@sentex.net *To:* Rick Macklem rmack...@uoguelph.ca *Cc:* FreeBSD Net n...@freebsd.org; David Wolfskill da...@catwhisker.org *Sent:* Wednesday, September 4, 2013 11:26 AM *Subject:* TSO help or hindrance ? (was Re: TSO and FreeBSD vs Linux) On 9/4/2013 8:50 AM, Rick Macklem wrote: David Wolfskill wrote: I noticed that when I tried to write files to NFS, I could write small files OK, but larger ones seemed to ... hang. * ifconfig -v em0 showed flags TSO4 VLAN_HWTSO turned on. * sysctl net.inet.tcp.tso showed 1 -- enabled. As soon as I issued sudo net.inet.tcp.tso=0 ... the copy worked without a hitch or a whine. And I was able to copy all 117709618 bytes, not just 2097152 (2^21). Is the above expected? It came rather as a surprise to me. Not surprising to me, I'm afraid. When there are serious NFS problems like this, it is often caused by a network fabric issue and broken TSO is at the top of the list w.r.t. cause. I was just experimenting a bit with iSCSI via FreeNAS and was a little disappointed at the speeds I was getting. So, I tried disabling tso on both boxes and it did seem to speed things up a bit. Data and testing methods attached in a txt file. I did 3 cases. Just boot up FreeNAS and the initiator without tweaks. That had the worst performance. disable tso on the nic as well as via sysctl on both boxes. That had the best performance. re-enable tso on both boxes. That had better performance than the first case, but still not as good as totally disabling it. I am guessing something is not quite being re-enabled properly ? But its different than the other two cases ?!? tgt is FreeNAS-9.1.1-RELEASE-x64 (a752d35) and initiator is r254328 9.2 AMD64 The FreeNAS box has 16G of RAM, so the file is being served out of cache as gstat shows no activity when sending out the file ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net mailto:m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org mailto:freebsd-net-unsubscr...@freebsd.org -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Flow ID, LACP, and igb
Are you using a pcie3 bus? Of course this is only an issue for 10g; what pct of FreeBSD users have a load over 9.5Gb/s? It's completely unnecessary for igb or em driver, so why is it used? because it's there. Here's my argument against it. The handful of brains capable of doing driver development become consumed with BS like LRO and the things that need to be fixed, like buffer management and basic driver design flaws, never get fixed. The offload code makes the driver code a virtual mess that can only be maintained by Jack and 1 other guy in the entire world. And it takes 10 times longer to make a simple change or to add support for a new NIC. In a week I ripped out the offload crap and the 9000 sysctls, eliminated the consumer buffer problem, reduced locking by 40% and now the igb driver uses 20% less cpu with a full gig load. And the code is cleaner and more easily maintained. BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Andre Oppermann an...@freebsd.org; Alan Somers asom...@freebsd.org; n...@freebsd.org n...@freebsd.org; Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; T.C. Gubatayao tgubata...@barracuda.com Sent: Sunday, September 1, 2013 4:51 PM Subject: Re: Flow ID, LACP, and igb Yo, LRO is an interesting hack that seems to do a good trick of hiding the ridiculous locking and unfriendly cache behaviour that we do per-packet. It helps with LAN test traffic where things are going out in batches from the TCP layer so the RX layer sees these frames in-order and can do LRO. When you disable it, I don't easily get 10GE LAN TCP performance. That has to be fixed. Given how fast the CPU cores, bus interconnect and memory interconnects are, I don't think there should be any reason why we can't hit 10GE traffic on a LAN with LRO disabled (in both software and hardware.) Now that I have the PMC sandy bridge stuff working right (but no PEBS, I have to talk to Intel about that in a bit more detail before I think about hacking that in) we can get actual live information about this stuff. But the last time I looked, there's just too much per-packet latency going on. The root cause looks like it's a toss up between scheduling, locking and just lots of code running to completion per-frame. As I said, that all has to die somehow. 2c, -adrian On 1 September 2013 08:45, Barney Cordoba barney_cord...@yahoo.com wrote: Comcast sends packets OOO. With any decent number of internet hops you're likely to encounter a load balancer or packet shaper that sends packets OOO, so you just can't be worried about it. In fact, your designs MUST work with OOO packets. Getting balance on your load balanced lines is certainly a bigger upside than the additional CPU used. You can buy a faster processor for your stack for a lot less than you can buy bandwidth. Frankly my opinion of LRO is that it's a science project suitable for labs only. It's a trick to get more bandwidth than your bus capacity; the answer is to not run PCIe2 if you need pcie3. You can use it internally if you have control of all of the machines. When I modify a driver the first thing that I do is rip it out. BC From: Luigi Rizzo ri...@iet.unipi.it To: Barney Cordoba barney_cord...@yahoo.com Cc: Andre Oppermann an...@freebsd.org; Alan Somers asom...@freebsd.org; n...@freebsd.org n...@freebsd.org; Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; T.C. 
Gubatayao tgubata...@barracuda.com Sent: Saturday, August 31, 2013 10:27 PM Subject: Re: Flow ID, LACP, and igb On Sun, Sep 1, 2013 at 4:15 AM, Barney Cordoba barney_cord...@yahoo.com wrote: ... [your point on testing with realistic assumptions is surely a valid one] Of course there's nothing really wrong with OOO packets. We had this discussion before; lots of people have round robin dual homing without any ill effects. It's just not an issue. It depends on where you are. It may not be an issue if the reordering is not large enough to trigger retransmissions, but even then it is annoying as it causes more work in the endpoint -- it prevents LRO from working, and even on the host stack it takes more work to sort where an out of order segment goes than appending an in-order one to the socket buffer. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo
Re: Flow ID, LACP, and igb
May I express my glee and astonishment that you're debating the use of complicated hash functions for something that's likely to have from 2-8 slots? Also, the *most* important thing is distribution with realistic data. The goal should be to use the most trivial function that gives the most balanced distribution with real numbers. Faster is not better if the result is an unbalanced distribution. Many of your ports will be 80 and 53, and if you're going through a router your ethernets may not be very unique, so why even bother to include them? Does getting a good distribution require that you hash every element individually, or can you get the same distribution with a faster, simpler way of creating the seed? There's also the other consideration of packet size. Packets on port 53 are likely to be smaller than packets on port 80. What you want is equal distribution PER PORT on the ports that will carry the vast majority of your traffic. When designing efficient systems, you must not assume that ports and IPs are random, because they're not. 99% of your load will be on a small number of destination ports and a limited range of source ports. For a web server application, getting a perfect distribution on the http ports is most crucial. The hash function in if_lagg.c looks like more of a classroom exercise than a practical implementation.

If you're going to consider 100M iterations, consider that much of the time is wasted parsing the packet (again). Why not add a simple sysctl that enables a hash that is created in the ip parser, when all of the pieces are available without having to re-parse the mbuf? Or better yet, use the same number of queues on igb as you have LAGG ports, and use the queue id (or RSS) as the hash, so that your traffic is sync'd between the ethernet adapter queues and the LAGG ports. The card has already done the work for you. BC

From: Luigi Rizzo ri...@iet.unipi.it To: Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; n...@freebsd.org n...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; T.C. Gubatayao tgubata...@barracuda.com Sent: Friday, August 30, 2013 8:04 PM Subject: Re: Flow ID, LACP, and igb

Alan, On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers asom...@freebsd.org wrote: ... I pulled all four hash functions out into userland and microbenchmarked them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is slower, and siphash24 is the slowest. Also, Clang resulted in much faster code than gcc. I missed this part of your message, but if I read your code well, you are running 100M iterations and the numbers below are in seconds, so if you multiply the numbers by 10 you have the cost per hash in nanoseconds. What CPU did you use for your tests? Also some of the numbers (FNV and hash32) are suspiciously low. I believe that the compilers (both of them) have figured out that everything is constant in these functions, and fnv_32_buf() and hash32_buf() are inline, hence they can be optimized to just return a constant. This does not happen for siphash and jenkins because they are defined externally. Can you please re-run the tests in a way that defeats the optimization? (e.g. pass a non-constant argument to the hashes so you actually need to run the code).
cheers luigi http://people.freebsd.org/~asomers/lagg_hash/ [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8 FNV: 0.76 hash32: 1.18 SipHash24: 44.39 Jenkins: 6.20 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1 FNV: 0.74 hash32: 1.35 SipHash24: 55.25 Jenkins: 7.37 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3 FNV: 0.30 hash32: 0.30 SipHash24: 55.97 Jenkins: 6.45 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2 FNV: 0.30 hash32: 0.30 SipHash24: 44.52 Jenkins: 6.48 T.C. [1] http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
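To make Luigi's request concrete, here is a minimal userland sketch of a benchmark loop that defeats constant folding: the input buffer is seeded from argv, mutated every iteration, and the results are chained and printed so the calls cannot be optimized away. The fnv1a_32() routine below is only a plain FNV-1a stand-in for whichever kernel hash (fnv_32_buf(), hash32_buf(), ...) is being timed; it is not the harness from the thread.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Plain FNV-1a, used here only as a stand-in for the hash under test. */
static uint32_t
fnv1a_32(const unsigned char *buf, size_t len, uint32_t seed)
{
	uint32_t h = seed ? seed : 2166136261u;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;			/* FNV prime */
	}
	return (h);
}

int
main(int argc, char **argv)
{
	unsigned char buf[36];		/* roughly a 5-tuple worth of bytes */
	uint32_t acc = 0;
	long i, iters = 100000000L;	/* 100M iterations, as in the thread */

	/* Seed the buffer from argv so its contents are unknown at compile time. */
	memset(buf, 0, sizeof(buf));
	if (argc > 1)
		strncpy((char *)buf, argv[1], sizeof(buf) - 1);

	for (i = 0; i < iters; i++) {
		buf[0] = (unsigned char)i;		/* vary the input each pass */
		acc ^= fnv1a_32(buf, sizeof(buf), acc);	/* chain results */
	}

	/* Observable side effect defeats dead-code elimination. */
	printf("%u\n", (unsigned)acc);
	return (0);
}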
Re: Flow ID, LACP, and igb
And another thing; the use of modulo is very expensive when the number of ports used in LAGG is *usually* a power of 2. foo & (SLOTS-1) is a lot faster than (foo % SLOTS). if (SLOTS == 2 || SLOTS == 4 || SLOTS == 8) hash = hash & (SLOTS-1); else hash = hash % SLOTS; is more than twice as fast as hash % SLOTS; BC From: Luigi Rizzo ri...@iet.unipi.it To: Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; n...@freebsd.org n...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; T.C. Gubatayao tgubata...@barracuda.com Sent: Friday, August 30, 2013 8:04 PM Subject: Re: Flow ID, LACP, and igb Alan, On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers asom...@freebsd.org wrote: ... I pulled all four hash functions out into userland and microbenchmarked them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is slower, and siphash24 is the slowest. Also, Clang resulted in much faster code than gcc. i missed this part of your message, but if i read your code well, you are running 100M iterations and the numbers below are in seconds, so if you multiply the numbers by 10 you have the cost per hash in nanoseconds. What CPU did you use for your tests? Also some of the numbers (FNV and hash32) are suspiciously low. I believe that the compilers (both of them) have figured out that everything is constant in these functions, and fnv_32_buf() and hash32_buf() are inline, hence they can be optimized to just return a constant. This does not happen for siphash and jenkins because they are defined externally. Can you please re-run the tests in a way that defeats the optimization? (e.g. pass a non-constant argument to the hashes so you actually need to run the code). cheers luigi http://people.freebsd.org/~asomers/lagg_hash/ [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8 FNV: 0.76 hash32: 1.18 SipHash24: 44.39 Jenkins: 6.20 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1 FNV: 0.74 hash32: 1.35 SipHash24: 55.25 Jenkins: 7.37 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3 FNV: 0.30 hash32: 0.30 SipHash24: 55.97 Jenkins: 6.45 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2 FNV: 0.30 hash32: 0.30 SipHash24: 44.52 Jenkins: 6.48 T.C. [1] http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
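A compilable version of the same trick, with the power-of-two test done at run time rather than spelled out per slot count; the function name is illustrative only:

#include <assert.h>
#include <stdint.h>

/*
 * For a power-of-two number of slots, hash & (SLOTS - 1) selects the
 * same slot as hash % SLOTS without the divide.  A value n is a power
 * of two exactly when it has a single bit set, i.e. (n & (n - 1)) == 0.
 */
static inline uint32_t
select_slot(uint32_t hash, uint32_t slots)
{
	assert(slots > 0);

	if ((slots & (slots - 1)) == 0)
		return (hash & (slots - 1));	/* cheap mask */
	return (hash % slots);			/* fall back to modulo */
}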
Re: Intel 4-port ethernet adaptor link aggregation issue
That's way too high. Your base rx requirement is Ports * queues * rxd. With a quad card you shouldn't be using more than 2 queues, so your requirement with 5 ports is 10,240 just for the receive setup. If you're using 4 queues that number doubles, which would make 25,600 not enough. Note that setting mbufs to a huge number doesn't allocate the buffers; they'll be allocated as needed. It's a ceiling. The reason for the ceiling is so that you don't blow up your memory. If your system is using 2 million mbuf clusters then you have much bigger problems than LAGG. Anyone who recommends 2 million clearly has no idea what they're doing. BC From: Joe Moog joem...@ebureau.com To: freebsd-net freebsd-net@freebsd.org Sent: Wednesday, August 28, 2013 9:36 AM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue All: Thanks again to everybody for the responses and suggestions to our 4-port lagg issue. The solution (for those that may find the information of some value) was to set the value for kern.ipc.nmbclusters to a higher value than we had initially. Our previous tuning had this value set at 25600, but following a recommendation from the good folks at iXSystems we bumped this to a value closer to 2 million, and the 4-port lagg is functioning as expected now. Thank you all. Joe ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
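The arithmetic behind Ports * queues * rxd, as a small stand-alone program; the port, queue, and descriptor counts are the examples used in the message, not values read from the driver:

#include <stdio.h>

int
main(void)
{
	int ports = 5;		/* the five ports in the arithmetic above */
	int queues = 2;		/* queues per port */
	int rxd = 1024;		/* RX descriptors per queue */

	/* Clusters pinned by pre-filled RX rings alone. */
	int rx_clusters = ports * queues * rxd;

	printf("baseline RX clusters (2 queues): %d\n", rx_clusters);	/* 10240 */
	printf("with 4 queues:                   %d\n", ports * 4 * rxd);	/* 20480 */
	return (0);
}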
Re: Flow ID, LACP, and igb
No, no. The entire point of the hash is to separate the connections. But when testing you should use realistic assumptions. You're not splitting packets, so the big packets will mess up your distribution if you don't get it right. Of course there's nothing really wrong with OOO packets. We had this discussion before; lots of people have round robin dual homing without any ill effects. It's just not an issue. BC From: T.C. Gubatayao tgubata...@barracuda.com To: Barney Cordoba barney_cord...@yahoo.com; Luigi Rizzo ri...@iet.unipi.it; Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; n...@freebsd.org n...@freebsd.org Sent: Saturday, August 31, 2013 9:38 PM Subject: RE: Flow ID, LACP, and igb On Sat, Aug 31, 2013 at 8:41 AM, Barney Cordoba barney_cord...@yahoo.com wrote: Also, the *most* important thing is distribution with realistic data. The goal should be to use the most trivial function that gives the most balanced distribution with real numbers. Faster is not better if the result is an unbalanced distribution. Agreed, with a caveat. It's critical that this distribution be by flow, so that out of order packet delivery is minimized. Many of your ports will be 80 and 53, and if you're going through a router your ethernets may not be very unique, so why even bother to include them? Does getting a good distribution require that you hash every element individually, or can you get the same distribution with a faster, simpler way of creating the seed? There's also the other consideration of packet size. Packets on port 53 are likely to be smaller than packets on port 80. What you want is equal distribution PER PORT on the ports that will carry that vast majority of your traffic. Unfortunately, trying to evenly distribute traffic per port based on packet size will likely result in the reordering of packets, and bandwidth wasted on TCP retransmissions. Or better yet, use the same number of queues on igb as you have LAGG ports, and use the queue id (or RSS) as the hash, so that your traffic is sync'd between the ethernet adapter queues and the LAGG ports. The card has already done the work for you. Isn't this hash for selecting an outbound link? The ingress adapter hash (RSS) won't help for packets originating from the host, or for packets that may have been translated or otherwise modified while traversing the stack. T.C. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
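For illustration, a minimal per-flow hash of the kind being argued for, assuming the IP parser has already produced the 5-tuple fields; the structure and names are hypothetical and are not taken from if_lagg.c:

#include <stdint.h>

/* Pre-parsed flow fields, as the IP parser would hand them over. */
struct flow_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

static inline uint32_t
trivial_flow_hash(const struct flow_tuple *ft)
{
	uint32_t h;

	/* Fold the tuple; ports share one word to keep it cheap. */
	h = ft->src_ip ^ ft->dst_ip;
	h ^= ((uint32_t)ft->src_port << 16) | ft->dst_port;

	/* One multiply to spread the low bits before masking. */
	h *= 0x9e3779b1u;	/* golden-ratio constant */
	return (h);
}

/* Pick one of a power-of-two number of lagg ports with a mask, not modulo. */
static inline int
pick_port(const struct flow_tuple *ft, int nports_pow2)
{
	return (trivial_flow_hash(ft) & (nports_pow2 - 1));
}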
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Andre Oppermann an...@freebsd.org To: Adrian Chadd adr...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; Luigi Rizzo ri...@iet.unipi.it; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Wednesday, August 21, 2013 2:19 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On 18.08.2013 23:54, Adrian Chadd wrote: Hi, I think the UNIX architecture is a bit broken for anything other than the occasional (for various traffic levels defining occasional!) traffic connection. It's serving us well purely through the sheer force of will of modern CPU power but I think we can do a lot better. I do not agree with you here. The UNIX architecture is fine but of course as with anything you're not going to get the full raw and theoretically possible performance for every special case out of it. It is extremely versatile and performs rather good over a broad set of applications. _I_ think the correct model is a netmap model - batched packet handling, lightweight drivers pushing and pulling batches of things, with some lightweight plugins to service that inside the kernel and/or push into the netmap ring buffer in userland. Interfacing into the ethernet and socket layer should be something that bolts on the side, kind of netgraph style. It would likely look a lot more like a switching backplane with socket IO being one of many processing possibilities. If socket IO stays packet at a time than great; but that's messing up the ability to do a lot of other interesting things. Sure, lets go back to MS-DOS with interrupt wedges. First of all, the Unix model has long been abandoned. System V Streams and all that classroom stuff (which is why I dislike netgraph) proved useless once we got beyond Token Ring. All you heard about in the old days was the OSI model; thank god the OSIs and CCITTs have become little more than noise as people started to really need to do things. How's that ISDN thing working out? As much as I complain, FreeBSD is far superior to other camps in their discipline and conformance to sanity. Play around with linux internals and you see what happens when you build an OS by an undisciplined committee. There's no bigger abortion in computing than the sk_buff. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Luigi Rizzo ri...@iet.unipi.it To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org; Adrian Chadd adr...@freebsd.org Sent: Sunday, August 18, 2013 5:16 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Sun, Aug 18, 2013 at 11:01 PM, Barney Cordoba barney_cord...@yahoo.comwrote: That's fine, it's a test tool, not a solution. It just seems that it gets pushed as if it's some sort of real world solution, which it's not. The idea that bringing packets into user space to forward them rather than just replacing the bridge module with something more efficient is just silliness. you might want to have a look at the VALE switch http://info.iet.unipi.it/~luigi/vale/ the upcoming version can attach physical interfaces to the switch and keep all the processing within the kernel. If pushing packets was a useful task, the solution would be easy. Unfortunately you need to do something useful with the packets in between. there are different definitions of what is useful: sources, sinks, forwarding, dropping (anti DoS), logging, ids, are all useful for different people. The mistake, i think, is to expect that there is one magic solution to handle all the useful cases. cheers luigi ___ Nobody claimed that there was a magic solution. But when so much time and brainpower is spent working on kludges (instead of doing things that have mainstream usefulness), it results in either 1) fewer people using it or 2) the kludges become accepted solutions, simply because someone did it. Polling, dummynet, netgraph, flowtable and buf_ring are all good examples. It's the big negative of open source, particularly for the bigger projects. Once someone has done something, it not worth the effort in most cases to do it in a more correct way; and the something becomes all that's available. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org; FreeBSD Net n...@freebsd.org Sent: Saturday, August 17, 2013 11:59 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) ... we get perfectly good throughput without 400k ints a second on the ixgbe driver. As in, I can easily saturate 2 x 10GE on ixgbe hardware with a handful of flows. That's not terribly difficult. However, there's a few interesting problems that need addressing: * There's lock contention between the transmit side from userland and the TCP timers, and the receive side with ACK processing. Under very high traffic load a lot of lock contention stalls things. We (the royal we, I'm mostly just doing tooling at the moment) working on that. * There's lock contention on the ARP, routing table and PCB lookups. The latter will go away when we've finally implemented RSS for transmit and receive and then moved things over to using PCB groups on CPUs which have NIC driver threads bound to them. * There's increasing cache thrashing from a larger workload, causing the expensive lookups to be even more expensive. * All the list walks suck. We need to be batching things so we use CPU caches much more efficiently. The idea of using TSO on the transmit side and generic LRO on the receive side is to make the per-packet overhead less. I think we can be much more efficient in general in packet processing, but that's a big task. :-) So, using at least TSO is a big benefit if purely to avoid decomposing things into smaller mbufs and contending on those locks in a very big way. I'm working on PMC to make it easier to use to find these bottlenecks and make the code and data more efficient. Then, likely, I'll end up hacking on generic TSO/LRO, TX/RX RSS queue management and make the PCB group thing default on for SMP machines. I may even take a knife to some of the packet processing overhead. --- The ints/sec reference was based on Luigi's implication that turning off moderation was some sort of performance choice. Again, you're talking throughput and not efficiency. I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Great. Never has there been a better explanation for the word Kludge than netmap. From: Adrian Chadd adr...@freebsd.org To: Jim Thompson j...@netgate.com Cc: Barney Cordoba barney_cord...@yahoo.com; FreeBSD Net n...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org Sent: Sunday, August 18, 2013 11:57 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Right. Well, post some profiling data, let's figure this out sometime. Luigi can do bridging with 2 cores using netmap. So it's technically possible. There's just a lot of kernel gunk in the way of doing it ye olde way. -adrian On 18 August 2013 07:25, Jim Thompson j...@netgate.com wrote: On Aug 18, 2013, at 8:48 AM, Barney Cordoba barney_cord...@yahoo.com wrote: I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is Or forward at layer 3. Or filter packets. Or IPSEC. Or... ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
That's fine, it's a test tool, not a solution. It just seems that it gets pushed as if it's some sort of real world solution, which it's not. The idea that bringing packets into user space to forward them rather than just replacing the bridge module with something more efficient is just silliness. If pushing packets was a useful task, the solution would be easy. Unfortunately you need to do something useful with the packets in between. Reminds me of polling. The problem is that over time, people actually view it as a solution, when it was never more than a kludge in the first place. BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Sunday, August 18, 2013 3:18 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On 18 August 2013 11:39, Barney Cordoba barney_cord...@yahoo.com wrote: Great. Never has the been a better explanation for the word Kludge than netmap. Nah. Netmap is a reimplementation of some reasonably well known ways of pushing bits. Luigi just pushed it up to eleven and demonstrated what current hardware is capable of. I have never bought the We need eleventy cores just to push 10ge of real traffic! before. Luigi did note down where the per-packet inefficiencies were. What we have to do now is sit down and for each of those, figure out what the root causes are and how to mitigate it. There's some architectural things that need tidying up (read: CPU pinning, queue handling, some locking hilarity) but if they're solved, we'll end up having dual core boxes push line rate packets for routing. So the gauntlet has been thrown. Let's fix this shit up. -adrian ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Criticism is the bedrock of innovation. From: Vijay Singh vijju.si...@gmail.com To: Barney Cordoba barney_cord...@yahoo.com Cc: Adrian Chadd adr...@freebsd.org; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Sunday, August 18, 2013 3:46 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Barney, did you get picked on a lot as a kid? Wonder why you're so caustic and negative all the time? Sent from my iPhone On Aug 18, 2013, at 11:39 AM, Barney Cordoba barney_cord...@yahoo.com wrote: Great. Never has the been a better explanation for the word Kludge than netmap. From: Adrian Chadd adr...@freebsd.org To: Jim Thompson j...@netgate.com Cc: Barney Cordoba barney_cord...@yahoo.com; FreeBSD Net n...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org Sent: Sunday, August 18, 2013 11:57 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Right. Well, post some profiling data, let's figure this out sometime. Luigi can do bridging with 2 cores using netmap. So it's technically possible. There's just a lot of kernel gunk in the way of doing it ye olde way. -adrian On 18 August 2013 07:25, Jim Thompson j...@netgate.com wrote: On Aug 18, 2013, at 8:48 AM, Barney Cordoba barney_cord...@yahoo.com wrote: I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is Or forward at layer 3. Or filter packets. Or IPSEC. Or... ___ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Horsehockey. What are you guys running with, P4s? Modern cpus are magnificently fast. The triviality of lookups is a non-issue in almost all cases. The ability of modern cpus to fill a transmit queue faster than the data can be transmitted is incontrovertible. With TCP you have windows and things; trying to drill down to hardware inefficiencies as if you're running on a 200Mhz P4 is just silly. I abandoned hardware offloads back when someone tried to sell me on data compression boards; the truth is that the IO overhead of copying to and from the board was higher than the cpu cycles needed to compress the data. The failure to understand how IO and locks interfere with traffic flow on multicore systems is the biggest problem with driver development; all of this chatter about moderation is simply a waste of time; such things are completely tunable; a task that gets far too little attention IMO. Tuning can make a world of difference if you understand what you're doing. The idea that having 400K ints/second to gain a tock of throughput is an acceptable trade-off is patently absurd. EFFICIENCY is tantamount. Throughput is almost always a tuning issue. BC From: Luigi Rizzo ri...@iet.unipi.it To: Lawrence Stewart lstew...@freebsd.org Cc: FreeBSD Net n...@freebsd.org Sent: Wednesday, August 14, 2013 6:21 AM Subject: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote: On 08/14/13 16:33, Julian Elischer wrote: On 8/14/13 11:39 AM, Lawrence Stewart wrote: On 08/14/13 03:29, Julian Elischer wrote: I have been tracking down a performance embarrassment on AMAZON EC2 and have found it I think. Let us please avoid conflating performance with throughput. The behaviour you go on to describe as a performance embarrassment is actually a throughput difference, and the FreeBSD behaviour you're describing is essentially sacrificing throughput and CPU cycles for lower latency. That may not be a trade-off you like, but it is an important factor in this discussion. ... Sure, there's nothing wrong with holding throughput up as a key performance metric for your use case. I'm just trying to pre-empt a discussion that focuses on one metric and fails to consider the bigger picture. ... I could see no latency reversion. You wouldn't because it would be practically invisible in the sorts of tests/measurements you're doing. Our good friends over at HRT on the other hand would be far more likely to care about latency on the order of microseconds. Again, the use case matters a lot. ... so, does Software LRO mean that LRO on hte NIC should be ON or OFF to see this? I think (check the driver code in question as I'm not sure) that if you ifconfig if lro and the driver has hardware support or has been made aware of our software implementation, it should DTRT. The lower throughput than linux that julian was seeing is either because of a slow (CPU-bound) sender or slow receiver. Given that the FreeBSD tx path is quite expensive (redoing route and arp lookups on every packet, etc.) I highly suspect the sender side is at fault. Ack coalescing, LRO, GRO are limited to the set of packets that you receive in the same batch, which in turn is upper bounded by the interrupt moderation delay. Apart from simple benchmarks with only a few flows, it is very hard that ack/lro/gro can coalesce more than a few segments for the same flow. But the real fix is in tcp_output. 
In fact, it has never been the case that an ack (single or coalesced) triggers an immediate transmission in the output path. We had this in the past (Silly Window Syndrome) and there is code that avoids sending less than 1-mtu under appropriate conditions (there is more data to push out anyways, no NODELAY, there are outstanding acks, the window can open further). In all these cases there is no reasonable way to experience the difference in terms of latency. If one really cares, e.g. the High Speed Trading example, this is a non issue because any reasonable person would run with TCP_NODELAY (and possibly disable interrupt moderation), and optimize for latency even on a per flow basis. In terms of coding effort, i suspect that by replacing the 1-mtu limit (t_maxseg i believe is the variable that we use in the SWS avoidance code) with 1-max-tso-segment we can probably achieve good results with little programming effort. Then the problem remains that we should keep a copy of route and arp information in the socket instead of redoing the lookups on every single transmission, as they consume some 25% of the time of a sendto(), and probably even more when it comes to large tcp segments, sendfile() and the like. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to
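A very rough sketch of the tcp_output() change being suggested, not the actual FreeBSD code: in the SWS-avoidance style decision, compare the data available to send against the largest chunk TSO can emit in one shot instead of against a single MSS. All names and the surrounding conditions are illustrative stand-ins.

#include <stdbool.h>
#include <stdint.h>

static bool
worth_sending_now(uint32_t avail, uint32_t t_maxseg, uint32_t tso_max_len,
    bool tso_enabled, bool nodelay, bool idle)
{
	/* Today the threshold is one MSS; the proposal raises it to one TSO burst. */
	uint32_t thresh = tso_enabled ? tso_max_len : t_maxseg;

	if (nodelay || idle)		/* latency-sensitive, or nothing outstanding: send */
		return (true);
	return (avail >= thresh);	/* otherwise wait until a full burst is queued */
}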
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
EFFICIENCY is tantamount. Throughput is almost always a tuning issue. Of course I meant paramount. Coffee matters :-| From: Luigi Rizzo ri...@iet.unipi.it To: Lawrence Stewart lstew...@freebsd.org Cc: FreeBSD Net n...@freebsd.org Sent: Wednesday, August 14, 2013 6:21 AM Subject: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote: On 08/14/13 16:33, Julian Elischer wrote: On 8/14/13 11:39 AM, Lawrence Stewart wrote: On 08/14/13 03:29, Julian Elischer wrote: I have been tracking down a performance embarrassment on AMAZON EC2 and have found it I think. Let us please avoid conflating performance with throughput. The behaviour you go on to describe as a performance embarrassment is actually a throughput difference, and the FreeBSD behaviour you're describing is essentially sacrificing throughput and CPU cycles for lower latency. That may not be a trade-off you like, but it is an important factor in this discussion. ... Sure, there's nothing wrong with holding throughput up as a key performance metric for your use case. I'm just trying to pre-empt a discussion that focuses on one metric and fails to consider the bigger picture. ... I could see no latency reversion. You wouldn't because it would be practically invisible in the sorts of tests/measurements you're doing. Our good friends over at HRT on the other hand would be far more likely to care about latency on the order of microseconds. Again, the use case matters a lot. ... so, does Software LRO mean that LRO on hte NIC should be ON or OFF to see this? I think (check the driver code in question as I'm not sure) that if you ifconfig if lro and the driver has hardware support or has been made aware of our software implementation, it should DTRT. The lower throughput than linux that julian was seeing is either because of a slow (CPU-bound) sender or slow receiver. Given that the FreeBSD tx path is quite expensive (redoing route and arp lookups on every packet, etc.) I highly suspect the sender side is at fault. Ack coalescing, LRO, GRO are limited to the set of packets that you receive in the same batch, which in turn is upper bounded by the interrupt moderation delay. Apart from simple benchmarks with only a few flows, it is very hard that ack/lro/gro can coalesce more than a few segments for the same flow. But the real fix is in tcp_output. In fact, it has never been the case that an ack (single or coalesced) triggers an immediate transmission in the output path. We had this in the past (Silly Window Syndrome) and there is code that avoids sending less than 1-mtu under appropriate conditions (there is more data to push out anyways, no NODELAY, there are outstanding acks, the window can open further). In all these cases there is no reasonable way to experience the difference in terms of latency. If one really cares, e.g. the High Speed Trading example, this is a non issue because any reasonable person would run with TCP_NODELAY (and possibly disable interrupt moderation), and optimize for latency even on a per flow basis. In terms of coding effort, i suspect that by replacing the 1-mtu limit (t_maxseg i believe is the variable that we use in the SWS avoidance code) with 1-max-tso-segment we can probably achieve good results with little programming effort. 
Then the problem remains that we should keep a copy of route and arp information in the socket instead of redoing the lookups on every single transmission, as they consume some 25% of the time of a sendto(), and probably even more when it comes to large tcp segments, sendfile() and the like. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel 4-port ethernet adaptor link aggregation issue
You can create your own pipeline with some minor modifications. Why wait months for the guys who did it wrong to make changes? BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Zaphod Beeblebrox zbee...@gmail.com; Freddie Cash fjwc...@gmail.com; Steve Read steve.r...@netasq.com; freebsd-net freebsd-net@freebsd.org Sent: Saturday, August 3, 2013 12:21 AM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue On 2 August 2013 16:35, Barney Cordoba barney_cord...@yahoo.com wrote: The stock igb driver binds to all cores, so with multiple igbs you have multiple nics binding to the same cores. I suppose that might create issues in a lagg setup. Try 1 queue and/or comment out the bind code. I have thrashed the hell out of 2-port ixgbe and 4-port chelsio (cxgbe) on 4-core device all with lagg. All is great. There's apparently some more igb improvements coming in the pipeline. Fear not! -adrian ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel 4-port ethernet adaptor link aggregation issue
The stock igb driver binds to all cores, so with multiple igbs you have multiple nics binding to the same cores. I suppose that might create issues in a lagg setup. Try 1 queue and/or comment out the bind code. BC From: Zaphod Beeblebrox zbee...@gmail.com To: Freddie Cash fjwc...@gmail.com Cc: Steve Read steve.r...@netasq.com; freebsd-net freebsd-net@freebsd.org Sent: Friday, August 2, 2013 5:41 PM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue On several machines with large numbers of IGBx interfaces, I've found that hw.igb.enable_msix=0 is necessary to ensure proper operation. On Fri, Aug 2, 2013 at 11:49 AM, Freddie Cash fjwc...@gmail.com wrote: On Fri, Aug 2, 2013 at 12:36 AM, Steve Read steve.r...@netasq.com wrote: On 01.08.2013 20:07, Joe Moog wrote: We have an iXsystems 1U server (E5) with an Intel 4-port ethernet NIC installed, model I350-T4 (manufactured May of 2013). We're trying to bind the 4 ports on this NIC together into a single lagg port, connected LACP to a distribution switch (Cisco 4900-series). We are able to successfully bind the 2 on-board ethernet ports to a single lagg, however the NIC is not so cooperative. At first we thought we had a bad NIC, but a replacement has not fixed the issue. We are thinking there may be a driver limitation with these Intel ethernet NICs when attempting to bind more than 2 ports to a lagg. FreeBSD version: FreeBSD 9.1-PRERELEASE #0 r244125: Wed Dec 12 11:47:47 CST 2012 rc.conf: # LINK AGGREGATION ifconfig_igb2=UP ifconfig_igb3=UP ifconfig_igb4=UP ifconfig_igb5=UP cloned_interfaces=lagg0 ifconfig_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 ifconfig_lagg0=inet 192.168.1.14 netmask 255.255.255.0 Am I the only one who noticed that you replaced the value of $ifconfig_lagg0 that specifies the proto and the ports with one that specifies just the address? Good catch! Merge the two ifconfig_lagg0 lines into one, and it will work infinitely better, or at least no worse. ifconfig_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 inet 192.168.1.14 netmask 255.255.255.0 Or, if you want to keep them split into two parts (initialise lagg0, then add IP): create_args_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 ifconfig_lagg0=inet 192.168.1.14 netmask 255.255.255.0 create_args_* are run first, then ifconfig_* are run. I like this setup, as it separates create and initialise from configure for cloned/virtual interfaces like vlans, laggs, etc. -- Freddie Cash fjwc...@gmail.com ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
On Fri, 26 Jul 2013 15:14:17 -0700 (PDT) Barney Cordoba barney_cord...@yahoo.com wrote about Re: Recommendations for 10gbps NIC: BC I don't really understand why nearly all 10GBE cards are dual-port. BC Surely there is a market for NICs between 1 gigabit and 20 gigabit. Myricom has single port 10G cards. However, I only use them on Linux and cannot comment on FreeBSD usage here. cu Gerrit I didn't write/ask that; but Intel makes a single-port X540 card that's available through popular online outlets. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: bce(4) panics, 9.2rc1 [redux]
From: Sean Bruno sean_br...@yahoo.com To: freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Monday, July 29, 2013 8:56 PM Subject: Re: bce(4) panics, 9.2rc1 [redux] On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote: Running 9.2 in production load mail servers. We're hitting the watchdog message and crashing with the stable/9 version. We're reverting the change from 2 weeks ago and seeing if it still happens. We didn't see this from stable/9 from about a month ago. Sean Not seeing any changes to core dumps, or crashes after updating the bce(4) interface on these Dell R410s. IPMI was a definite false hope. No changes noted after I modified the ipmi_attach code. stable/7 works just fine and stable/9 fails with NMI errors on the console very badly. It fails so badly that it won't come into service at all. I've reverted stable/9 back to August of 2012 with no changes. It sort of looks like r236216 is causing severe issues with my configuration. The Dell R410 has a 3rd ethernet interface for the BMC only, not sure if that is meaningful in this context. The 3rd interface is *not* visible from the o/s and is dedicated to the BMC interface. Doing more testing at this time to validate. Sean -- FWIW, I have an R210 with a BCM5716 running 9.1 RELEASE without any problems. I have customized the driver a bit. Try turning off the features and running it raw without any checksum or tso gobbledygook. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Luigi Rizzo ri...@iet.unipi.it To: Alexander V. Chernikov melif...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; Daniel Feenberg feenb...@nber.org; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Saturday, July 27, 2013 4:15 AM Subject: Re: Recommendations for 10gbps NIC On Sat, Jul 27, 2013 at 10:02 AM, Alexander V. Chernikov melif...@freebsd.org wrote: On 27.07.2013 02:14, Barney Cordoba wrote: *From:* Daniel Feenberg feenb...@nber.org *To:* Alexander V. Chernikov melif...@freebsd.org *Cc:* Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org *Sent:* Friday, July 26, 2013 4:59 PM *Subject:* Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org mailto:melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu mailto:sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. It depends on what kind of performance you are talking about. All NICs are capable of doing linerate RX/TX for both small/big packets. this is actually not true. I have direct experience with Intel, Mellanox and Broadcom, and small packets are a problem across the board even with 1 port. From my experience only intel can do line rate (14.88Mpps) with 64-byte frames, but suffers a bit with sizes that are not multiple of 64. Mellanox peaks at around 7Mpps. Broadcom is limited to some 2.5Mpps. This is all with netmap, using the regular stack you are going to see much much less. Large frames (1400+) are probably not a problem for anyone, but since the original post asked for packet capture, i thought the small-frame case is a relevant one. 
The only notable exception I'm aware of are Intel 82598-based NICs which advertise PCI-E X8 gen2 with _2.5GT_ link speed, giving you maximum ~14Gbit/s bw for 2 ports instead of 20. This makes me curious because i believe people have used netmap with the 82598 and achieved close to line rate even with 64-byte frames/one port, and i thought (maybe I am wrong ?) the various 2-port NICs use 4 lanes per port. So the number i remember does not match with your quote of 2.5Gt/s. Are all 82598 using 2.5GT/s (which is a gen.1 speed) instead of 5 ? cheers luigi ___ 64 byte frames rarely require that 64 bytes be transferred across the bus. Depending on your offloads the bus requirement can be quite a bit less than the line speed. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
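As a sanity check on the 14.88 Mpps figure quoted above: a minimum-size Ethernet frame occupies 64 bytes (payload plus FCS) plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire, so line rate at 10 Gbit/s works out as below.

#include <stdio.h>

int
main(void)
{
	double link_bps = 10e9;			/* 10GbE */
	double wire_bytes = 64.0 + 8.0 + 12.0;	/* frame + preamble/SFD + IFG = 84 bytes */
	double pps = link_bps / (wire_bytes * 8.0);

	printf("max 64-byte frames/sec: %.2f Mpps\n", pps / 1e6);	/* ~14.88 */
	return (0);
}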
Re: Recommendations for 10gbps NIC
From: Alexander V. Chernikov melif...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org; Daniel Feenberg feenb...@nber.org Sent: Saturday, July 27, 2013 4:02 AM Subject: Re: Recommendations for 10gbps NIC On 27.07.2013 02:14, Barney Cordoba wrote: *From:* Daniel Feenberg feenb...@nber.org *To:* Alexander V. Chernikov melif...@freebsd.org *Cc:* Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org *Sent:* Friday, July 26, 2013 4:59 PM *Subject:* Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org mailto:melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu mailto:sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. It depends on what kind of performance you are talking about. All NICs are capable of doing linerate RX/TX for both small/big packets. The only notable exception I;m aware of are Intel 82598-based NICs which advertise PCI-E X8 gen2 with _2.5GT_ link speed, giving you maximum ~14Gbit/s bw for 2 ports instead of 20. This statement is sort of like saying all cars can do 65MPH or whatever the speed limit is, so therefore all cars are equal. If one device can forward 2Mpps at 20% cpu and other used 45%, obvious there is a preference to use the more efficient driver/controller. BC The x540 with RJ45 has the obvious advantage of being compatible with regular gigabit cards, and single port adapters are about $325 in the US. When cheap(er) 10g RJ45 switches become available, it will start to be used more and more. Very soon. 
BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Alexander V. Chernikov melif...@freebsd.org To: Boris Kochergin sp...@acm.poly.edu Cc: freebsd-net@freebsd.org Sent: Thursday, July 25, 2013 2:10 PM Subject: Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: - PCIe - LC connectors - 10GBASE-SR - Either single- or dual-port - Multiqueue Intel 82598/99/X520 Emulex OCe10102-NM Mellanox ConnectX Chelsio T4 Do they all cost the same, have the exact same features and have equally well-written drivers? Which do you recommend and why? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Daniel Feenberg feenb...@nber.org To: Alexander V. Chernikov melif...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Friday, July 26, 2013 4:59 PM Subject: Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. The x540 with RJ45 has the obvious advantage of being compatible with regular gigabit cards, and single port adapters are about $325 in the US. When cheap(er) 10g RJ45 switches become available, it will start to be used more and more. Very soon. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: LACP LAGG device problems
On Sat, 7/20/13, isp ml...@ukr.net wrote: Subject: LACP LAGG device problems To: freebsd-net@freebsd.org Date: Saturday, July 20, 2013, 10:04 AM Hi! Can anybody tell me, are there any plans to improve the LAGG (802.3ad) device driver in FreeBSD? It would be great to have the possibility to set LACP mode (active/passive) and system priority. Also there is no way to set the hashing algorithm and master interface (port). And we can't see any information about our neighbor. The same function in Linux is named Bonding and it is much better. I really can donate some money to those who can make these improvements. Best regards. ___ Why are you using LAGG when 10g cards are like $350? It's not a peering protocol nor is it PTP; can you see your peer info on an ethernet? Bonding is a late 90s concept designed to connect 2 slow links to get higher speeds, back in the day when 100Mb/s was ambitious. The point of LAGG is that it's transparent; you can load balance traffic to multiple hosts or create a redundant link without having to have equipment running some special applications, or any special logic above the LAGG device. Describing how you are using LAGG (and why) might be better than just asking for improvements. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: LACP LAGG device problems
I wasn't referring to science projects. Nor did I say it wasn't useful. Only that 10g is cheap now and quite a bit better. LAGG isn't perfect. - Original Message - From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org; isp ml...@ukr.net Sent: Sunday, July 21, 2013 9:49 AM Subject: Re: LACP LAGG device problems Hah! I'm pushing 20GE out using lagg right now (and fixing the er, amusing behaviour of doing so.) I'm aiming to hit 40 once I get hardware that doesn't get upset pushing that many bits. The netops people at ${JOB} also point out that even today switches occasionally get confused and crash a switchport. Ew. So yes, there are people using lagg, both for failover and throughput reasons. I'm working on debugging/statistics right now as part of general why are things behaving crappy debugging. I'll see about improving some of the peer reporting at the same time. -adrian On 21 July 2013 06:03, Barney Cordoba barney_cord...@yahoo.com wrote: On Sat, 7/20/13, isp ml...@ukr.net wrote: Subject: LACP LAGG device problems To: freebsd-net@freebsd.org Date: Saturday, July 20, 2013, 10:04 AM Hi! Can anybody tell me, is there any plans to improve LAGG(802.3ad) device driver in FreeBSD? It will be greate to have a possibility to set LACP mode (active/passive) and system priority. Also there is no way to set hashing algorithm and master interface (port). And we can't see any information about our neighbor. The same function in Linux is named Bonding and it is much more better. I realy can donate some money to those who can make this improvements. Best regards. ___ Why are you using LAGG when 10g cards are like $350? It's not a peering protocol nor it is PTP; can you see your peer info on an ethernet? Bonding is a late 90s concept designed to connect 2 slow links to get higher speeds, back in the day when 100Mb/s was ambitious. The point of LAGG is that it's transparent; you can load balance traffic to multiple hosts or create a redundant link without having to have equipment running some special applications, or any special logic above the LAGG device. Describing how you are using LAGG (and why) might be better than just asking for improvements. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Tue, 7/16/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Tuesday, July 16, 2013, 1:10 AM On 15.07.2013 22:04, Barney Cordoba wrote: Also, IP fragmentation and TCP segments are not the same thing. TCP segments regularly will come in out of order, NFS is too stupid to do things correctly; IP fragmentation should not be done unless necessary to accommodate a smaller mtu. The PR is about NFS over UDP, not TCP. -- Ok, so is there evidence that it's UDP and not an IP fragmenting problem? Out of Order UDP is the same issue; its common for packets to traverse different paths through the internet in the same connection, so OOO packets are normal. IP fragmentation is rare, except for NFS. A lot of ISPs will block fragmentation because it's difficult to shape or filter fragments; they're often used to defeat simple firewalls and filters. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Sun, 7/14/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: isp ml...@ukr.net, freebsd-net@freebsd.org Date: Sunday, July 14, 2013, 1:17 PM On 14.07.2013 23:14, Barney Cordoba wrote: So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. Except of FreeBSD's packet reassembly is broken for long time. For example, http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/167603 - NFS has been broken since the beginning of time. NFS has always had problems sending segments the packet size. There are a lot of ISPs that load balance multiple feeds so OOO packets are a normal occurrence. A stack that doesn't handle out of order tcp packets doesn't work in today's world. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Sun, 7/14/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, isp ml...@ukr.net Date: Sunday, July 14, 2013, 1:17 PM On 14.07.2013 23:14, Barney Cordoba wrote: So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. Except of FreeBSD's packet reassembly is broken for long time. For example, http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/167603 ___ Also, IP fragmentation and TCP segments are not the same thing. TCP segments regularly will come in out of order, NFS is too stupid to do things correctly; IP fragmentation should not be done unless necessary to accommodate a smaller mtu. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Re[2]: FreeBSD router problems
So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. ifconfig lagg0 laggproto roundrobin laggport em0 laggport em1 BC On Thu, 7/11/13, isp ml...@ukr.net wrote: Subject: Re[2]: FreeBSD router problems To: Alan Somers asom...@freebsd.org Cc: freebsd-net@freebsd.org Date: Thursday, July 11, 2013, 2:11 PM I have a real network with more than 4 000 users. In normal case, when I have two 1Gbps routers, and I split VLAN's between them total bandwidth if growing up to 1.7 Gbps. --- Incoming mail --- From: Alan Somers asom...@freebsd.org Date: 11 July 2013, 21:00:41 How are you benchmarking it? Each TCP connection only uses one member of a lagg port. So if you want to see 1 Gbps, you'll need to benchmark with multiple TCP connections. You may also need multiple systems; I don't know the full details of LACP. On Thu, Jul 11, 2013 at 11:32 AM, isp ml...@ukr.net wrote: Hi! I have a problem with my FreeBSD router, I can't get more than 1 Gbps throught it, but I have 2 Gbps LAGG on it. There are only 27 IPFW rules (NAT+Shaping). IPoE only. lagg0 (VLAN's + shaping) - two 'igb' adapters lagg1 (NAT, tso if off) - two 'em' adapters I tried to switch off dummynet, but it doesn't helps. # uname -a [code]FreeBSD router 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Tue Apr 30 20:02:00 EEST 2013 root@south:/usr/obj/usr/src/sys/ROUTER amd64 # top -aSPHI last pid: 91712; load averages: 2.18, 2.06, 1.97 up 20+22:28:36 17:40:22 120 processes: 7 running, 87 sleeping, 26 waiting CPU 0: 0.0% user, 0.0% nice, 1.6% system, 38.6% interrupt, 59.8% idle CPU 1: 0.0% user, 0.0% nice, 7.1% system, 37.0% interrupt, 55.9% idle CPU 2: 0.0% user, 0.0% nice, 3.9% system, 38.6% interrupt, 57.5% idle CPU 3: 0.0% user, 0.0% nice, 15.7% system, 26.8% interrupt, 57.5% idle Mem: 59M Active, 1102M Inact, 942M Wired, 800M Buf, 5529M Free Swap: 16G Total, 16G Free PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 12 root -72 - 0K 448K RUN 1 153:39 72.22% [intr{swi1: netisr 0}] 11 root 155 ki31 0K 64K RUN 1 494.2H 65.19% [idle{idle: cpu1}] 11 root 155 ki31 0K 64K CPU2 2 494.3H 64.65% [idle{idle: cpu2}] 11 root 155 ki31 0K 64K RUN 0 493.3H 63.38% [idle{idle: cpu0}] 11 root 155 ki31 0K 64K CPU3 3 496.4H 62.55% [idle{idle: cpu3}] 12 root -92 - 0K 448K WAIT 2 58:49 9.38% [intr{irq266: igb0:que}] 12 root -92 - 0K 448K WAIT 2 59:32 9.03% [intr{irq271: igb1:que}] 12 root -92 - 0K 448K CPU1 1 59:09 8.94% [intr{irq265: igb0:que}] 12 root -92 - 0K 448K WAIT 3 57:52 8.01% [intr{irq272: igb1:que}] 12 root -92 - 0K 448K WAIT 1 59:32 7.96% [intr{irq270: igb1:que}] 12 root -92 - 0K 448K WAIT 3 55:47 7.81% [intr{irq267: igb0:que}] 12 root -92 - 0K 448K WAIT 0 55:24 7.23% [intr{irq264: igb0:que}] 12 root -92 - 0K 448K WAIT 0 56:57 6.69% [intr{irq269: igb1:que}] 12 root -92 - 0K 448K WAIT 3 203:34 4.74% [intr{irq275: em1:rx 0}] 0 root -92 0 0K 336K - 2 427:03 2.64% [kernel{dummynet}] 0 root -92 0 0K 336K - 3 206:57 2.54% [kernel{em0 que}] 86278 root 20 0 33348K 8588K select 0 8:35 0.54% /usr/local/sbin/snmpd -p /var/run/net_snmpd.pid -r 12 root -92 - 0K 448K WAIT 2 7:56 0.20% [intr{irq276: em1:tx 0}] # cat /etc/sysctl.conf dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.em.0.rx_int_delay=200 dev.em.0.tx_int_delay=200 
dev.em.0.rx_abs_int_delay=4000 dev.em.0.tx_abs_int_delay=4000 dev.em.0.rx_processing_limit=4096 dev.em.1.rx_int_delay=200 dev.em.1.tx_int_delay=200 dev.em.1.rx_abs_int_delay=4000 dev.em.1.tx_abs_int_delay=4000 dev.em.1.rx_processing_limit=4096 net.inet.ip.forwarding=1 net.inet.ip.fastforwarding=1 net.inet.tcp.blackhole=2 net.inet.udp.blackhole=0 net.inet.ip.redirect=0 net.inet.tcp.delayed_ack=0 net.inet.tcp.recvbuf_max=4194304 net.inet.tcp.sendbuf_max=4194304 net.inet.tcp.sack.enable=0 net.inet.tcp.drop_synfin=1 net.inet.tcp.nolocaltimewait=1 net.inet.ip.ttl=255 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.udp.recvspace=64080 net.inet.ip.rtmaxcache=1024 net.inet.ip.intr_queue_maxlen=5120 kern.ipc.nmbclusters=824288 kern.ipc.maxsockbuf=83886080
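A quick way to exercise multiple TCP connections across the lagg, as Alan suggests above, is a parallel-stream benchmark. A minimal sketch, assuming iperf is installed on a host behind each side of the router (iperf and the 192.0.2.10 address are illustrative, not part of the original thread):
# on a host behind the far side of the router
iperf -s
# on a host behind the near side: 8 parallel TCP streams for 30 seconds
iperf -c 192.0.2.10 -P 8 -t 30
With LACP each stream hashes onto a single lagg member, so several concurrent streams are needed before the aggregate can exceed one 1Gb port.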
Re: Inconsistent NIC behavior
On Mon, 7/1/13, Zaphod Beeblebrox zbee...@gmail.com wrote: Subject: Re: Inconsistent NIC behavior To: Barney Cordoba barney_cord...@yahoo.com Date: Monday, July 1, 2013, 7:38 PM On Sun, Jun 30, 2013 at 12:04 PM, Barney Cordoba barney_cord...@yahoo.com wrote: One particular annoyance with FreeBSD is that different NICs have different dormant behavior. On this we agree. For example, em and igb both will show the link being active or not on boot whether the interface has been UPed or not, while ixgbe and bce do not. I think it's a worthy goal to have NICs work the same in this manner. It's very valuable to know that a nic is connected without having to UP it. And an annoyance when you fire up a new box with a new nic that shows No Carrier when the link light is on. I disagree here. If an interface is shut down, it should give no link to the far end. I consider it an error that many FreeBSD NIC drivers cannot shut down the link. -- I think that's a different issue. The ability to shut down a link could easily be a feature. However, when you boot a machine, say with a 4-port NIC, having to UP them all to see which one is plugged in is simply a logistical disaster, particularly for admins with marginal skills. While shutting down a link may occasionally be useful, the preponderance of uses would lean towards having some way of knowing when a nic is plugged into a switch regardless of whether it's been fully initialized. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Inconsistent NIC behavior
One particular annoyance with FreeBSD is that different NICs have different dormant behavior. For example, em and igb both will show the link being active or not on boot whether the interface has been UPed or not, while ixgbe and bce do not. I think it's a worthy goal to have NICs work the same in this manner. It's very valuable to know that a nic is connected without having to UP it. And an annoyance when you fire up a new box with a new nic that shows No Carrier when the link light is on. It's really too much of a project for one person to have enough knowledge of multiple drivers to make the changes, so it would be best if the maintainers would do it. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: hw.igb.num_queues default
--- On Thu, 6/20/13, Andre Oppermann wrote: From: Andre Oppermann Subject: Re: hw.igb.num_queues default To: "Eugene Grosbein" Cc: "freebsd-net@freebsd.org", "Eggert, Lars", "Jack Vogel" Date: Thursday, June 20, 2013, 10:29 AM On 20.06.2013 15:37, Eugene Grosbein wrote: On 20.06.2013 17:34, Eggert, Lars wrote: real memory = 8589934592 (8192 MB) avail memory = 8239513600 (7857 MB) By default, the igb driver seems to set up one queue per detected CPU. Googling around, people seemed to suggest that limiting the number of queues makes things work better. I can confirm that setting hw.igb.num_queues=2 seems to have fixed the issue. (Two was the first value I tried, maybe other values other than 0 would work, too.) In order to uphold POLA, should the igb driver maybe default to a conservative value for hw.igb.num_queues that may not deliver optimal performance, but at least works out of the box? Or, better, make nmbclusters auto-tuning smarter, if any. I mean, use more nmbclusters for machines with large amounts of memory. That has already been done in HEAD. The other problem is the pre-filling of the large rings for all queues stranding large amounts of mbuf clusters. OpenBSD starts with a small number of filled mbufs in the RX ring and then dynamically adjusts the number upwards if there is enough traffic to maintain deep buffers. I don't know if it always quickly scales in practice though. You're probably not running with 512MB these days, so pre-filling isn't much of an issue. 4 queues is only 8MB of ram with 1024 descriptors per queue, and 4MB with 512. Think about the # of queues issue. In order to have acceptable latency, you need to do 6k-10k interrupts per second per queue. So with 4 queues you have to process 40K ints/second and with 2 you only process 20k. For a gig link 2 queues is much more efficient. "Spreading" for the sake of spreading is more about Intel marketing than it is about practical computing. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
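The queue arithmetic above can be pinned down with the two loader(8) tunables already mentioned in these threads. A minimal /boot/loader.conf sketch; the values are illustrative, not a recommendation:
# /boot/loader.conf
hw.igb.num_queues=2            # 2 queues x ~8k ints/sec each = ~16k ints/sec worst case at the default rate
hw.igb.max_interrupt_rate=8000 # per-queue cap; 8000/sec adds at most ~125 us of coalescing latency
# Ring memory: 2 queues x 1024 descriptors x 2 KB clusters = ~4 MB of pre-filled RX buffers per port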
Re: netmap bridge can transmit big packets at line rate?
--- On Tue, 5/21/13, liujie liu...@263.net wrote: From: liujie liu...@263.net Subject: Re: netmap bridge can transmit big packets at line rate? To: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 5:25 AM Hi, Prof. Luigi Rizzo. Firstly, I should thank you for netmap. I tried to send an e-mail to you yesterday, but it was rejected. I used two machines to test the netmap bridge, both with an i7-2600 CPU and an Intel 82599 dual-interface card. One worked as sender and receiver with pkt-gen, the other worked as a bridge with bridge.c. As you said, I felt confused too when I saw the big-packet performance drop; I tried to change the memory parameters of netmap (netmap_mem1.c, netmap_mem2.c), but that did not seem to resolve the problem. 60-byte packet send 14882289 pps recv 13994753 pps 124-byte send 8445770 pps recv 7628942 pps 252-byte send 4529819 pps recv 3757843 pps 508-byte send 2350815 pps recv 1645647 pps 1514-byte send 814288 pps recv 489133 pps These numbers indicate you're tx'ing 7.2Gb/s with 60-byte packets and 9.8Gb/s with 1514, so maybe you just need a new calculator? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
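The arithmetic behind those figures is just pps x frame bytes x 8. A quick check with bc(1), counting payload bytes only and then adding the 4-byte CRC plus 20 bytes of preamble/inter-frame gap that make 14.88 Mpps the 10GbE line rate for minimum-size frames:
echo "14882289 * 60 * 8 / 1000000000" | bc -l    # ~7.14 Gb/s of payload at 60 bytes
echo "814288 * 1514 * 8 / 1000000000" | bc -l    # ~9.86 Gb/s of payload at 1514 bytes
echo "14882289 * 84 * 8 / 1000000000" | bc -l    # ~10.0 Gb/s on the wire (60 + 4 CRC + 20 overhead)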
Re: netmap bridge can transmit big packets at line rate?
--- On Tue, 5/21/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: Re: netmap bridge can transmit big packets at line rate? To: Hooman Fazaeli hoomanfaza...@gmail.com Cc: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 10:39 AM On Tue, May 21, 2013 at 06:51:12PM +0430, Hooman Fazaeli wrote: On 5/21/2013 5:10 PM, Barney Cordoba wrote: --- On Tue, 5/21/13, liujie liu...@263.net wrote: From: liujie liu...@263.net Subject: Re: netmap bridge can transmit big packets at line rate? To: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 5:25 AM Hi, Prof. Luigi Rizzo. Firstly, I should thank you for netmap. I tried to send an e-mail to you yesterday, but it was rejected. I used two machines to test the netmap bridge, both with an i7-2600 CPU and an Intel 82599 dual-interface card. One worked as sender and receiver with pkt-gen, the other worked as a bridge with bridge.c. As you said, I felt confused too when I saw the big-packet performance drop; I tried to change the memory parameters of netmap (netmap_mem1.c, netmap_mem2.c), but that did not seem to resolve the problem. 60-byte packet send 14882289 pps recv 13994753 pps 124-byte send 8445770 pps recv 7628942 pps 252-byte send 4529819 pps recv 3757843 pps 508-byte send 2350815 pps recv 1645647 pps 1514-byte send 814288 pps recv 489133 pps These numbers indicate you're tx'ing 7.2Gb/s with 60-byte packets and 9.8Gb/s with 1514, so maybe you just need a new calculator? BC ___ As Barney pointed out already, your numbers are reasonable. You have almost saturated the link with 1514-byte packets. In the case of 64-byte packets, you do not achieve line rate, probably because of congestion on the bus. Can you show us top -SI output on the sender machine? The OP is commenting that on the receive side he is seeing a much lower number than on the tx side (A:ix1 489Kpps vs A:ix0 814Kpps). [pkt-gen -f tx ix0][ix0 bridge ] [ HOST A ] [ HOST B ] [pkt-gen -f rx ix1][ix1 ] What is unclear is where the loss occurs. cheers luigi The ixgbe driver has mac stats that will answer that. Just look at the sysctl output. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
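One hedged way to follow up on that suggestion to check the MAC statistics: the exact counter names vary between ixgbe driver versions, so the sketch below just filters the per-device sysctl tree for drop/miss/error counters (dev.ix.0 and dev.ix.1 are assumed to be the bridge's two ports):
sysctl dev.ix.0 | egrep -i 'miss|drop|err'
sysctl dev.ix.1 | egrep -i 'miss|drop|err'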
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
You have to admit there's a problem before you can fix it. If Eugene is going to blame the bottleneck and no one is going to tell him he's wrong, then there is no discussion. The solution in this case is to use 1 queue, which was my suggestion many days ago. The defaults are broken. The driver should default to 1 queue, and be tuned to the system environment. With 2 NICs in the box, the defaults are defective. 1 queue should always work. Other settings require tuning and an understanding of how things work. I've had to support the i350, so I've been playing with the driver a bit. It works fine with lots of cores. But you have to have more cores than queues. 2 cards with 4 queues on a 6-physical-core system gets into a contention problem at certain loads. I've also removed the cpu bindings, which is about all I'm free to disclose. The driver needs a tuning doc as much as anything else. BC --- On Sat, 5/11/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Hooman Fazaeli hoomanfaza...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clément Hermann (nodens) nodens2...@gmail.com, Eugene Grosbein egrosb...@rdtc.ru, freebsd-net@freebsd.org Date: Saturday, May 11, 2013, 6:16 PM Hi, The motivation behind the locking scheme in igb and friends is for a very specific, userland-traffic-origin workload. Sure, it may or may not work well for forwarding/filtering workloads. If you want to fix it, let's have a discussion about how to do it, followed by some patches to do so. Adrian On 11 May 2013 13:12, Hooman Fazaeli hoomanfaza...@gmail.com wrote: On 5/11/2013 8:26 PM, Barney Cordoba wrote: Clearly you don't understand the problem. Your logic is that because other drivers are defective also, therefore it's not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exacerbates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or it's fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. It's the driver's responsibility to behave and to drop packets if necessary. BC And how should the driver behave? You suggest dropping the packets. Even if we accept that dropping packets is a good strategy in all configurations (which I doubt), the driver is definitely not the best place to implement it, since that involves duplication of similar code between drivers. Somewhere like the Ethernet layer is a much better choice to watch the packet load and drop packets to prevent them from eating all the cores.
Furthermore, ignoring the fact that pf is not optimized for multi-processor systems and blaming drivers for not adjusting themselves to what is really pf's fault is a bit unfair, I believe. -- Best regards. Hooman Fazaeli ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
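A simple way to see how many queue vectors each igb port actually allocated, and how hard each one is interrupting, is the system interrupt summary; a minimal sketch (the interrupt names match the igbX:que labels seen in the top output earlier in this archive, but the exact format can vary by release):
vmstat -i | grep igb       # one irqNNN: igbX:que line per queue, plus a link interrupt
sysctl dev.igb.0.queue0    # per-queue counters for the first queue of igb0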
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Fri, 5/10/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Friday, May 10, 2013, 8:56 AM On 10.05.2013 05:16, Barney Cordoba wrote: Network device driver is not guilty here, that's just pf's contention running in igb's context. They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. Again, the problem is within pf(4)'s global lock, not in the igb(4). Again, you're wrong. It's not the bottleneck's fault; it's the fault of the multi-threaded code for only working properly when there are no bottlenecks. In practice, the problem is easily solved without any change in the igb code. The same problem will occur for other NIC drivers too - if several NICs were combined within one lagg(4). So, driver is not guilty and solution would be same - eliminate bottleneck and you will be fine and capable to spread the load on several CPU cores. Therefore, I don't care of CS theory for this particular case. Clearly you don't understand the problem. Your logic is that because other drivers are defective also; therefore its not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exasperates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or its fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. Its the driver's responsibility to behave and to drop packets if necessary. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Sat, 5/11/13, Hooman Fazaeli hoomanfaza...@gmail.com wrote: From: Hooman Fazaeli hoomanfaza...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Eugene Grosbein egrosb...@rdtc.ru, freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Saturday, May 11, 2013, 4:12 PM On 5/11/2013 8:26 PM, Barney Cordoba wrote: Clearly you don't understand the problem. Your logic is that because other drivers are defective also; therefore its not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exasperates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or its fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. Its the driver's responsibility to behave and to drop packets if necessary. BC And how the driver should behave? You suggest dropping the packets. Even if we accept that dropping packets is a good strategy in all configurations (which I doubt), the driver is definitely not the best place to implement it, since that involves duplication of similar code between drivers. Somewhere like the Ethernet layer is a much better choice to watch load of packets and drop them to prevent them to eat all the cores. Furthermore, ignoring the fact that pf is not optimized for multi-processors and blaming drivers for not adjusting themselves with the this pf's fault, is a bit unfair, I believe. It's easier to make excuses than to write a really good driver. I'll grant you that. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Sun, 4/28/13, Barney Cordoba barney_cord...@yahoo.com wrote: From: Barney Cordoba barney_cord...@yahoo.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Jack Vogel jfvo...@gmail.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 2:59 PM The point of lists is to be able to benefit from other's experiences so you don't have to waste your time trying things that others have already done. I'm not pontificating. I've done the tests. There's no reason for every person who is having to exact same problem to do the same tests over and over, hoping for somemagically different result. The result will always be the same. Because there's no chance of it working properly by chance. BC --- On Sun, 4/28/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 1:07 PM Try setting your queues to 1, run some tests, then try settingyour queues to 2, then to 4... its called tuning, and rather thanjust pontificating about it, which Barney so loves to do, you can discover what works best. I ran tests last week preparing for anew driver version and found the best results came not only whiletweaking queues, but also ring size, and I could see changes based on the buf ring size There are lots of things that may improve ordegrade performance depending on the workload. Jack On Sun, Apr 28, 2013 at 7:21 AM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. % We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. 
We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Thu, 5/9/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Clément Hermann (nodens) nodens2...@gmail.com Cc: freebsd-net@freebsd.org Date: Thursday, May 9, 2013, 10:55 AM On 26.04.2013 18:31, Clément Hermann (nodens) wrote: Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. It seems you suffer from pf lock contention. You should stop using pf with multi-core systems with 8.3. Move to ipfw+dummynet or ng_car for 8.3 or move to 10.0-CURRENT having new, rewritten pf that does not have this problem. Network device driver is not guilty here, that's just pf's contention running in igb's context. Eugene Grosbein They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Thu, 5/9/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Clément Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org Date: Thursday, May 9, 2013, 12:30 PM On 09.05.2013 23:25, Barney Cordoba wrote: Network device driver is not guilty here, that's just pf's contention running in igb's context. Eugene Grosbein They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. Again, the problem is within pf(4)'s global lock, not in the igb(4). Again, you're wrong. It's not the bottleneck's fault; it's the fault of the multi-threaded code for only working properly when there are no bottlenecks. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Capture packets before kernel process
--- On Tue, 4/30/13, w...@sourcearmory.com w...@sourcearmory.com wrote: From: w...@sourcearmory.com w...@sourcearmory.com Subject: Capture packets before kernel process To: freebsd-net@freebsd.org Date: Tuesday, April 30, 2013, 11:24 AM Hi! I need some help. Currently I'm working on a project where I want to capture and process some network packets before the kernel does. I have searched but have found nothing. Is there some way to capture the packets before the kernel? You want to wedge your code into the if_input routine. Then pass the mbuf to the original if_input routine. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. 
relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1.enable_aim: 1 dev.igb.1.fc: 3 dev.igb.1.rx_processing_limit: 4096 dev.igb.1.eee_disabled: 1 dev.igb.1.link_irq: 2 dev.igb.1.device_control: 1209795137 dev.igb.1.rx_control: 67141658 dev.igb.1.interrupt_mask: 4 dev.igb.1.extended_int_mask: 2147483981 dev.igb.1.fc_high_water: 33168 dev.igb.1.fc_low_water: 33152 dev.igb.1.queue0.interrupt_rate: 71428 dev.igb.1.queue0.txd_head: 1318 dev.igb.1.queue0.txd_tail: 1318 dev.igb.1.queue0.tx_packets: 84663594 dev.igb.1.queue0.rxd_head: 717 dev.igb.1.queue0.rxd_tail: 715 dev.igb.1.queue0.rx_packets: 43899597 dev.igb.1.queue0.rx_bytes: 8905556030 dev.igb.1.queue1.interrupt_rate: 90909 dev.igb.1.queue1.txd_head: 693 dev.igb.1.queue1.txd_tail: 693 dev.igb.1.queue1.tx_packets: 57543349 dev.igb.1.queue1.rxd_head: 1033 dev.igb.1.queue1.rxd_tail: 1032 dev.igb.1.queue1.rx_packets: 54821897 dev.igb.1.queue1.rx_bytes: 9944955108 dev.igb.1.queue2.interrupt_rate: 10 dev.igb.1.queue2.txd_head: 350 dev.igb.1.queue2.txd_tail: 350 dev.igb.1.queue2.tx_packets: 62320990 dev.igb.1.queue2.rxd_head: 1962 dev.igb.1.queue2.rxd_tail: 1939 dev.igb.1.queue2.rx_packets: 43909016 dev.igb.1.queue2.rx_bytes: 8673941461 dev.igb.1.queue3.interrupt_rate: 14925 dev.igb.1.queue3.txd_head: 647 dev.igb.1.queue3.txd_tail: 647 dev.igb.1.queue3.tx_packets: 58776199 dev.igb.1.queue3.rxd_head: 692 dev.igb.1.queue3.rxd_tail: 691 dev.igb.1.queue3.rx_packets: 55138996 dev.igb.1.queue3.rx_bytes: 9310217354 dev.igb.1.queue4.interrupt_rate: 10 dev.igb.1.queue4.txd_head: 1721 dev.igb.1.queue4.txd_tail: 1721 dev.igb.1.queue4.tx_packets: 54337209 dev.igb.1.queue4.rxd_head: 1609 dev.igb.1.queue4.rxd_tail: 1598 dev.igb.1.queue4.rx_packets: 46546503 dev.igb.1.queue4.rx_bytes: 8818182840 dev.igb.1.queue5.interrupt_rate: 11627 dev.igb.1.queue5.txd_head: 254 dev.igb.1.queue5.txd_tail: 254 dev.igb.1.queue5.tx_packets: 53117182 dev.igb.1.queue5.rxd_head: 701 dev.igb.1.queue5.rxd_tail: 685 dev.igb.1.queue5.rx_packets: 43014837 dev.igb.1.queue5.rx_bytes: 8699057447
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
The point of lists is to be able to benefit from other's experiences so you don't have to waste your time trying things that others have already done. I'm not pontificating. I've done the tests. There's no reason for every person who is having to exact same problem to do the same tests over and over, hoping for somemagically different result. The result will always be the same. Because there's no chance of it working properly by chance. BC --- On Sun, 4/28/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 1:07 PM Try setting your queues to 1, run some tests, then try settingyour queues to 2, then to 4... its called tuning, and rather thanjust pontificating about it, which Barney so loves to do, you can discover what works best. I ran tests last week preparing for anew driver version and found the best results came not only whiletweaking queues, but also ring size, and I could see changes based on the buf ring size There are lots of things that may improve ordegrade performance depending on the workload. Jack On Sun, Apr 28, 2013 at 7:21 AM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? 
should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1.enable_aim: 1 dev.igb.1.fc: 3 dev.igb.1.rx_processing_limit: 4096 dev.igb.1.eee_disabled: 1 dev.igb.1.link_irq: 2 dev.igb.1.device_control: 1209795137 dev.igb.1.rx_control: 67141658 dev.igb.1.interrupt_mask: 4 dev.igb.1.extended_int_mask: 2147483981 dev.igb.1.fc_high_water: 33168 dev.igb.1.fc_low_water: 33152 dev.igb.1.queue0.interrupt_rate: 71428 dev.igb.1.queue0.txd_head: 1318 dev.igb.1.queue0.txd_tail: 1318 dev.igb.1.queue0.tx_packets: 84663594 dev.igb.1.queue0.rxd_head: 717 dev.igb.1.queue0.rxd_tail: 715 dev.igb.1.queue0.rx_packets: 43899597 dev.igb
Re: pf performance?
--- On Fri, 4/26/13, Erich Weiler wei...@soe.ucsc.edu wrote: From: Erich Weiler wei...@soe.ucsc.edu Subject: Re: pf performance? To: Andre Oppermann an...@freebsd.org Cc: Paul Tatarsky p...@soe.ucsc.edu, freebsd-net@freebsd.org Date: Friday, April 26, 2013, 12:04 PM But the work pf does would show up in 'system' on top right? So if I see all my CPUs tied up 100% in 'interrupts' and very little 'system', would it be a reasonable assumption to think that if I got more CPU cores to handle the interrupts that eventually I would see 'system' load increase as the interrupt load became faster to be handled? And thus increase my bandwidth? Having the work of pf show up in 'interrupts' or 'system' depends on the network driver and how it handles sending packets up the stack. In most cases drivers deliver packets from interrupt context. Ah, I see. Definitely appears for me in interrupts then. I've got the mxge driver doing the work here. So, given that I can spread out the interrupts to every core (like, pin an interrupt queue to each core), I can have all my cores work on the process. But seeing as though the pf bit is still serialized I'm not sure that I understand how it is serialized when many CPUs are handling interrupts, and hence doing the work of pf as well? Wouldn't that indicate that the work of pf is being handled by many cores, as many cores are handling the interrupts? you're thinking exactly backwards. You're creating lock contention by having a bunch of receive processes going into a single threaded pf process. Think of it like a six lane highway that has 5 lanes closed a mile up the road. The result isn't that you go the same speed as a 1 lane highway; what you have is a parking lot. The only thing you're doing by spreading the interrupts is using up more cycles on more cores. What you *should* be doing, if you can engineer it, is use 1 path through the pf filter. You could have 4 queues feed a single process that dequeues and runs through the filter. The problem with that is that the pf process IS the bottleneck in that its slower than the receive processes, so you'd best just use the other cores to do userland stuff. You could use cpuset to make sure that no userland process uses the interrupt core, and dedicate 1 cpu to packet filtering. 1 modern CPU can easily handle a gig of traffic. There's no need to spread in most case. BC Or are you saying that pf *is* being handled by many cores, but just in a serialized nature? Like, packet 1 is handled by core 0, then packet 2 is handled by core 1, packet 3 is handled by core 4, etc? Such that even though multiple cores are handling it, they are just doing so serially and not in parallel? And if so, maybe it still helps to have many cores? Thanks for all the excellent info! In other words, until I see like 100% system usage in one core, I would have room to grow? You have room to grow if 'idle' is more than 0% and the interrupts of the networks cards are running on different cores. If one core gets all the interrupts a second idle core doesn't get the chance to help out. IIRC the interrupt allocation to cores is done at interrupt registration time or driver attach time. It can be re-distributed at run time on most architecture but I'm not sure we have an easily accessible API for that. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
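A rough sketch of the cpuset(1) approach described above, assuming the NIC's queue interrupt is to be kept on CPU 0 and userland is pushed elsewhere; the IRQ number 256 and pid 1234 are purely illustrative, read the real ones from vmstat -i and ps:
vmstat -i                  # find the irq numbers used by the NIC queues
cpuset -l 0 -x 256         # pin that interrupt to CPU 0
cpuset -l 1-5 -p 1234      # keep a userland process off the interrupt core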
Re: igb and ALTQ in 9.1-rc3
Firstly, my OP was not intended to have anything to do with Jack. Frankly, he's just a mechanical programmer and not a designer, so its others that should be responsible for guiding him. There *should* be someone at FreeBSD who is responsible for taking mechanically sound drivers and optimizing them. But in the open source world, that function doesn't usually exist. But the idea that the FreeBSD community refuses to point out that some things just don't work well hurts people in the community. The portrayal of every feature in FreeBSD as being equally useful and well done doesn't provide the information that users need to make decisions about what to use to run their businesses. It also hurts people that you've made IGB worse in FreeBSD 9. There's *some* expectation that it should be better in 9 than 7 or 8, and that it should have fewer bugs. But in an effort to force in a rickety implementation of multi-queue, you've converted the driver into something that is guaranteed to rob any system of cpu cycles. I wrote a multiqueue driver for 7 for igb that works very well, and I'd hoped to be able to use igb in 9 without having to port it, even if it wasn't as good. But it's not just not as good; it's unusable in a heavy production environment. While it's noble (and convenient) for you folks who have jobs where you get paid to write open source code to rip others for not contributing, Im sure that some of you with real jobs know that when someone pays you a lot of money to write code, you're not free to share the code or even the specific techniques publicly. After all, technique is the difference between a good and a bad driver. I try to drop hints, but Jack's lack of curiosity as to how to make the driver better is particularly troubling. So I just have to recommend that igb cards not be used for production flows, because there is little hope that it will improve any time soon. BC --- On Sun, 3/31/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jeffrey EPieper jeffrey.e.pie...@intel.com, Nick Rogers ncrog...@gmail.com, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Sunday, March 31, 2013, 2:48 PM Barney, As much as we'd like it, Jack's full time job involves other things besides supporting FreeBSD. If you want to see it done better, please work with the FreeBSD developer community and improve the intel driver. No-one is stopping you from stepping in. In fact, this would be _beneficial_ to Jack's work inside Intel with FreeBSD. If there is more of an active community participating with the intel drivers and more companies choosing intel hardware for FreeBSD network services, Intel will likely dump more effort into FreeBSD. So please, stop your non-constructive trolling and complaining and put your skills to use for the greater good. Sheesh. Intel have supplied a very thorough, detailed driver as well as programming and errata datasheets for their chips. We aren't in the dark here. There's more than enough rope to hang ourselves with. Please focus on making it better. Adrian On 31 March 2013 05:35, Barney Cordoba barney_cord...@yahoo.com wrote: The reason that Jack is a no better programmer now than he was in 2009 might have something to do with the fact that he hides when his work is criticized. Why not release the benchmarks you did while designing the igb driver, Jack? 
Say what, you didn't do any benchmarking? How does the default driver perform, say in a firewall, with a 1000-user load? What's the optimum number of queues to use in such a system? What's the effect of CPU binding? What's the effect with multiple cards when you have more queues than you have physical cpus? What made you decide to use buf_ring? Something new to play with? I'm guessing that you have no idea. BC --- On Fri, 3/29/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Cc: Barney Cordoba barney_cord...@yahoo.com, Nick Rogers ncrog...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:36 PM Fortunately, Barney doesn't speak for me, or for Intel, and I've long ago realized it's pointless to attempt anything like a fair conversation with him. The only thing he's ever contributed is slander and pseudo-critique... another poison thread I'm done with. Jack On Fri, Mar 29, 2013 at 8:45 AM, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 4/2/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Nick Rogers ncrog...@gmail.com Cc: Karim Fodil-Lemelin fodillemlinka...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Tuesday, April 2, 2013, 6:39 PM Yes: * you need to add it to conf/options - see if there's an opt_igb.h to add it to, otherwise you'll need to add one; * Make sure the driver code includes opt_igb.h; * Then make sure you make kernel modules using either make buildkernel KERNCONF=X, or you set the environment appropriately so the build scripts can find your kernel build directory (where it populates all the opt_xxx.h includes) and it'll have this module set. Hopefully Jack will do this. Yes, we need a better queue management discipline API in the kernel. Jack's just falling afoul of the fact we don't have one. It's not his fault. That's not true at all. For a bridged system running a firewall or doing filtering, virtually all of the proper design can be done in the ethernet driver. Of course if you have 2 different drivers then you need a different scheme, but if the input and the output are the same driver you can manage virtually all of the contention. You can't just randomly do things; you have to design to minimize lock contention. Drivers that seem to work fine at low volume blow up quickly as contention increases. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
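To make those steps concrete, here is a hedged sketch of what such a build-time knob could look like, using IGB_LEGACY_TX as the option name floated later in this thread; the exact name, file locations, and kernel config name are assumptions, not a committed interface:
# sys/conf/options: declare the option so config(8) emits it into opt_igb.h
IGB_LEGACY_TX        opt_igb.h
# kernel config file (e.g. sys/amd64/conf/MYKERNEL):
options IGB_LEGACY_TX
# then, from /usr/src:
make buildkernel KERNCONF=MYKERNEL && make installkernel KERNCONF=MYKERNEL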
Re: igb and ALTQ in 9.1-rc3
Do you know anything about the subject, Scott? I'd be interested in seeing your benchmarks with various queue counts, binding to cpus vs not binding, and the numbers comparing the pre multiqueue driver to the current one. It's the minimum that any marginally competent network driver developer would do. Or are you just hurling insults because you're devoid of actual ideas? BC --- On Fri, 3/29/13, Scott Long scott4l...@yahoo.com wrote: From: Scott Long scott4l...@yahoo.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Nick Rogers ncrog...@gmail.com, Adrian Chadd adr...@freebsd.org, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com Date: Friday, March 29, 2013, 12:42 PM Comedy gold. It's been a while since I've seen this much idiocy from you, Barney. Hopefully the rest of the mailing list will blackhole you, as I'm about to, and we can all get back to real work. Scott On Mar 29, 2013, at 10:38 AM, Barney Cordoba barney_cord...@yahoo.com wrote: it needs a lot more than a patch. It needs to be completely re-thunk --- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:07 PM Barney, Patches gratefully accepted. Adrian On 29 March 2013 08:54, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. 
Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from
Re: igb and ALTQ in 9.1-rc3
The reason that Jack is a no better programmer now than he was in 2009 might have something to do with the fact that he hides when his work is criticized. Why not release the benchmarks you did while designing the igb driver, Jack? Say what,you didn't do any benchmarking? How does the default driver perform, say in a firewall,with 1000 user load? What's the optimum number of queues to use in such a system?What's the effect of CPU binding? What's the effect with multiple cards when you havemore queues than you have physical cpus? What made you decide to use buf_ring? Something new to play with? I'm guessing that you have no idea. BC--- On Fri, 3/29/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Cc: Barney Cordoba barney_cord...@yahoo.com, Nick Rogers ncrog...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:36 PM Fortunately, Barney doesn't speak for me, or for Intel, and I've long ago realized its pointless to attempt anything like a fair conversation with him. The only thing he's ever contributed is slander and pseudo-critique... another poison thread I'm done with. Jack On Fri, Mar 29, 2013 at 8:45 AM, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. 
Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder
Re: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Nick Rogers ncrog...@gmail.com Cc: Pieper, Jeffrey E jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com Date: Friday, March 29, 2013, 1:10 PM On 29 March 2013 10:04, Nick Rogers ncrog...@gmail.com wrote: Multiqueue or not, I would appreciate any help with this thread's original issue. Whether or not its the ideal thing to do, I cannot simply just replace the NICs with an em(4) variant, as I have hundreds of customers/systems already in production running 8.3 and relying on the igb driver + ALTQ. I need to be able to upgrade these systems to 9.1 without making hardware changes. If it's that critical, have you thought about contracting out that task to a developer? You have 100s of systems/customers using 1990s-class traffic shaping and you have no programmer on staff with the skills to patch and test an ethernet driver? the igb driver has always sucked rocks, why did you use them in the first place. Or did they just happen to be on the MB you use? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: igb and ALTQ in 9.1-rc3
--- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: vlan with modified MAC fails to communicate
--- On Fri, 3/29/13, Pablo Ribalta Lorenzo r...@semihalf.com wrote: From: Pablo Ribalta Lorenzo r...@semihalf.com Subject: vlan with modified MAC fails to communicate To: freebsd-net@freebsd.org Date: Friday, March 29, 2013, 7:53 AM Hi there! Lately I've been investigating an issue that I would like to share, as I feel I may have to attack it from a different end. I have an ethernet interface from which I create a vlan. Once I set up the IP address on the vlan I can ping correctly on both sides. The issue arises when I try to change the MAC address of the vlan, as from then on it fails to communicate unless: - I restore the vlan's MAC address to its previous value - I enable promisc mode. It's also worth mentioning that my current setup is FreeBSD 8.3 and the NIC driver I'm using is not fully mature. I was wondering if this behavior is due to some limitation in the NIC driver I'm using or if in fact it's the correct way to proceed, as it was possible to reproduce this same issue on FreeBSD 8.3 and FreeBSD CURRENT, even using more mature NIC drivers such as 'em' and 're'. Could somebody please shed some light on this? Thank you. Without looking at the code, it's likely that you should be changing the MAC address BEFORE you set up the VLAN. The MAC is probably being mapped into some table that is being used to track the vlans. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
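A quick way to check the ordering theory is to bring the vlan up both ways and see which variant passes traffic. One reading of BC's suggestion is to change the parent's MAC before creating the vlan so the vlan inherits it; the interface names and addresses below are examples, not from Pablo's setup:
    # order being suggested: change the parent's MAC first, then create the vlan
    ifconfig em0 ether 02:00:00:00:00:01
    ifconfig vlan100 create vlan 100 vlandev em0 inet 192.0.2.10/24
    # order that reportedly fails: create the vlan, then change its MAC afterwards
    ifconfig vlan100 create vlan 100 vlandev em0 inet 192.0.2.10/24
    ifconfig vlan100 ether 02:00:00:00:00:02
If the second variant only works with promiscuous mode enabled, that points at the parent driver programming its receive filter from the MAC it knew about at vlan-attach time.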
RE: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. 
Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC I may be misunderstanding what you are saying, but if the solution is, as you say, not to hack the igb driver, then how is it defective in this case? Or are you just directing vitriol toward Intel? Multi-queue is working fine in igb. Jeff It's defective because it's been poorly implemented and has more bugs than a Manhattan hotel bed. Adding queues without a proper plan just adds more lock contention. It's not a production-ready driver. As Jack once said, Intel doesn't care about performance, they're just example drivers. igb is an example of how not to do things. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
RE: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. 
Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC I may be misunderstanding what you are saying, but if the solution is, as you say, not to hack the igb driver, then how is it defective in this case? Or are you just directing vitriol toward Intel? Multi-queue is working fine in igb. Jeff It works like crap. Your definition of "works" is that it doesn't crash. Mine is that it works better with multiple queues than with 1, which it doesn't. And if you load a system up, it will blow up with multiqueue before it will with 1 queue. The point of using multiqueue isn't to exhaust all of the cpus instead of just 2. It's to get past the wall of using only 2 cpus when they're exhausted. The goal is to increase the capacity of the system; not to make it look like you're using more cpus without any actual gain in capacity.
Re: igb and ALTQ in 9.1-rc3
it needs a lot more than a patch. It needs to be completely re-thunk --- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:07 PM Barney, Patches gratefully accepted. Adrian On 29 March 2013 08:54, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? 
I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all
Re: igb network lockups
--- On Mon, 3/4/13, Zaphod Beeblebrox zbee...@gmail.com wrote: From: Zaphod Beeblebrox zbee...@gmail.com Subject: Re: igb network lockups To: Jack Vogel jfvo...@gmail.com Cc: Nick Rogers ncrog...@gmail.com, Sepherosa Ziehau sepher...@gmail.com, Christopher D. Harrison harri...@biostat.wisc.edu, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Monday, March 4, 2013, 1:58 PM For everyone having lockup problems with IGB, I'd like to ask if they could try disabling hyperthreads --- this worked for me on one system but has been unnecessary on others. Gee, maybe binding an interrupt to a virtual cpu isn't a good idea? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
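For anyone who wants to try the disable-hyperthreads experiment without a trip into the BIOS, FreeBSD has long exposed this as a loader tunable; to the best of my recollection the knob is the one below (double-check the name against your release before relying on it):
    # /boot/loader.conf -- keep the scheduler (and interrupt binding) off the HTT siblings
    machdep.hyperthreading_allowed="0"
After a reboot, read the value back with sysctl machdep.hyperthreading_allowed to confirm the tunable took effect, and then see whether the igb lockups still occur.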
Re: igb network lockups
--- On Mon, 2/25/13, Christopher D. Harrison harri...@biostat.wisc.edu wrote: From: Christopher D. Harrison harri...@biostat.wisc.edu Subject: Re: igb network lockups To: Jack Vogel jfvo...@gmail.com Cc: freebsd-net@freebsd.org Date: Monday, February 25, 2013, 1:38 PM Sure, The problem appears on both systems running with ALTQ and vanilla. -C On 02/25/13 12:29, Jack Vogel wrote: I've not heard of this problem, but I think most users do not use ALTQ, and we (Intel) do not test using it. Can it be eliminated from the equation? Jack On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison harri...@biostat.wisc.edu mailto:harri...@biostat.wisc.edu wrote: I recently have been experiencing network freezes and network lockups on our Freebsd 9.1 systems which are running zfs and nfs file servers. I upgraded from 9.0 to 9.1 about 2 months ago and we have been having issues with almost bi-monthly. The issue manifests in the system becomes unresponsive to any/all nfs clients. The system is not resource bound as our I/O is low to disk and our network is usually in the 20mbit/40mbit range. We do notice a correlation between temporary i/o spikes and network freezes but not enough to send our system in to lockup mode for the next 5min. Currently we have 4 igb nics in 2 aggr's with 8 queue's per nic and our dev.igb reports: dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4 I am almost certain the problem is with the ibg driver as a friend is also experiencing the same problem with the same intel igb nic. He has addressed the issue by restarting the network using netif on his systems. According to my friend, once the network interfaces get cleared, everything comes back and starts working as expected. I have noticed an issue with the igb driver and I was looking for thoughts on how to help address this problem. http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html Thoughts/Ideas are greatly appreciated!!! -C Do you have 32 cpus in the system? You've created a lock contention nightmare; frankly Im surprised that the system runs at all. Try running with 1 queue per nic. The point of using queues is to spread the load; the fact that you're even using queues with such a minuscule load is a commentary on the blind use of features without any explanation or understanding of what they do. Does igb still bind to CPUs without regard to whether its a real cpu or a hyper thread? This needs to be removed. I wish that someone who understood this stuff would have a beer with Jack and explain to him why this design is defective. The default for this driver is almost always the wrong configuration. You don't need to spread the load with 40Mb/s throughput, and using multiple queues will use a lot more CPU than using just 1. do you really want 4 cpus using 10% instead of 1 using 14%? You also should consider increasing your tx buffers; a property of applications like ALTQ is that they tend to send out big bursts of packets and they can overflow the rings. I'm not specifically familiar with ALTQ so Im not sure how it handles such things; nor am I sure of how it handles multiple tx queues, if at all. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
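For what it's worth, the tuning BC describes does not require touching the driver source; the queue count and ring sizes are loader tunables on igb of that vintage. Tunable names are as documented for the driver of that era, so verify against the version actually in use, and treat the values as starting points only:
    # /boot/loader.conf
    hw.igb.num_queues="1"   # one RX/TX queue per port instead of one per CPU
    hw.igb.rxd="4096"       # larger receive ring
    hw.igb.txd="4096"       # larger transmit ring, to absorb bursty ALTQ output
A single queue sidesteps the cross-CPU lock contention being described, and the bigger TX ring addresses the burst-overflow concern at the cost of some memory.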
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Fri, 1/18/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Luigi Rizzo ri...@iet.unipi.it Date: Friday, January 18, 2013, 3:09 PM On 18 January 2013 06:30, Barney Cordoba barney_cord...@yahoo.com wrote: I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. The problem with interrupt moderation combined with enabling/disabling interrupts is that if you get it even slightly wrong, you won't run the packet processing thread(s) until the next interrupt occurs - even if something is in the queue. Which is the point of interrupt moderation. Your argument is that I only want 6000 interrupts per second, but I'm willing to launch N tasks that have the exact same processing load where N = 20. So you're willing to have 12 interrupts/task_queues per second (its only possible to get about 2000pps in 1/6000th of a second on a gigabit link unless you're fielding runts). This all comes down, again, to tuning. Luigi's example would result in 39 tasks being queued to process his 3900 backup with a process limit of 100. This would bypass the next interrupt by a wide margin. Is the point of moderation to not have the network task take over your system? If you don't care, then why not just set moderation to 20Kpps? The work should be the amount of time you're willing to process packets within the interrupt moderation window. The settings go hand in hand. I'm not saying that the task_queue idea is wrong; however in Luigi's example it will cause substantially more overhead by launching too many tasks. Unless you're still running a 700Mhz P3 100 is way too low for a workload limit. It's just arbitrarily silly. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
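To put rough numbers on the moderation-window argument (back-of-the-envelope only, using standard gigabit line rates and a 6,000 interrupts/second moderation setting):
    minimum-size (64-byte) frames:  10^9 / ((64 + 20) * 8)   =~ 1,488,000 pps
                                    1,488,000 / 6,000        =~ 248 frames per window
    full-size (1518-byte) frames:   10^9 / ((1518 + 20) * 8) =~ 81,000 pps
                                    81,000 / 6,000           =~ 14 frames per window
So a work limit of 100 per pass covers a full moderation window of ordinary traffic on a single queue, but a sustained burst of minimum-size frames can outrun it, which is exactly why the work limit and the moderation interval have to be chosen together.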
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Fri, 1/18/13, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: freebsd-net@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com, Adrian Chadd adr...@freebsd.org, Luigi Rizzo ri...@iet.unipi.it Date: Friday, January 18, 2013, 11:49 AM On Friday, January 18, 2013 9:30:40 am Barney Cordoba wrote: --- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 11:48 AM There's also the subtle race condition in TX and RX handling that re-queuing the taskqueue gets around. Which is: * The hardware is constantly receiving frames , right until you blow the FIFO away by filling it up; * The RX thread receives a bunch of frames; * .. and processes them; * .. once it's done processing, the hardware may have read some more frames in the meantime; * .. and the hardware may have generated a mitigated interrupt which you're ignoring, since you're processing frames; * So if your architecture isn't 100% paranoid, you may end up having to wait for the next interrupt to handle what's currently in the queue. Now if things are done correct: * The hardware generates a mitigated interrupt * The mask register has that bit disabled, so you don't end up receiving it; * You finish your RX queue processing, and there's more stuff that's appeared in the FIFO (hence why the hardware has generated another mitigated interrupt); * You unmask the interrupt; * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. However as the poster(s) have said, the interrupt mask/unmask in the intel driver(s) may not be 100% correct, so you're going to end up with situations where interrupts are missed. The reason why this wasn't a big deal in the deep/distant past is because we didn't used to have kernel preemption, or multiple kernel threads running, or an overly aggressive scheduler trying to parallelise things as much as possible. A lot of net80211/ath bugs have popped out of the woodwork specifically because of the above changes to the kernel. They were bugs before, but people didn't hit them. I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. You might as well just not have a work limit and process until your done. The idea that gee, I've been taking up too much cpu, I'd better yield to just queue a task and continue soon after doesn't make much sense to me. If there are multiple threads with the same priority then batching the work up into chunks allows the scheduler to round-robin among them. However, when a task requeues itself that doesn't actually work since the taskqueue thread will see the requeued task before it yields the CPU. Alternatively, if you force all the relevant interrupt handlers to use the same thread pool and instead of requeueing a separate task you requeue your handler in the ithread pool then you can get the desired round-robin behavior. 
(I have changes to the ithread stuff that get us part of the way there in that handlers can reschedule themselves and much of the plumbing is in place for shared thread pools among different interrupts.) I don't see any round-robin effect here. You have:
    Repeat:
        - Process 100 frames
        - if (more) Queue a Task
There's only 1 task at a time. All it's really doing is yielding and rescheduling itself to resume the loop. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 11:48 AM There's also the subtle race condition in TX and RX handling that re-queuing the taskqueue gets around. Which is: * The hardware is constantly receiving frames , right until you blow the FIFO away by filling it up; * The RX thread receives a bunch of frames; * .. and processes them; * .. once it's done processing, the hardware may have read some more frames in the meantime; * .. and the hardware may have generated a mitigated interrupt which you're ignoring, since you're processing frames; * So if your architecture isn't 100% paranoid, you may end up having to wait for the next interrupt to handle what's currently in the queue. Now if things are done correct: * The hardware generates a mitigated interrupt * The mask register has that bit disabled, so you don't end up receiving it; * You finish your RX queue processing, and there's more stuff that's appeared in the FIFO (hence why the hardware has generated another mitigated interrupt); * You unmask the interrupt; * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. However as the poster(s) have said, the interrupt mask/unmask in the intel driver(s) may not be 100% correct, so you're going to end up with situations where interrupts are missed. The reason why this wasn't a big deal in the deep/distant past is because we didn't used to have kernel preemption, or multiple kernel threads running, or an overly aggressive scheduler trying to parallelise things as much as possible. A lot of net80211/ath bugs have popped out of the woodwork specifically because of the above changes to the kernel. They were bugs before, but people didn't hit them. I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. You might as well just not have a work limit and process until your done. The idea that gee, I've been taking up too much cpu, I'd better yield to just queue a task and continue soon after doesn't make much sense to me. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Luigi Rizzo ri...@iet.unipi.it Cc: Barney Cordoba barney_cord...@yahoo.com, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 2:04 PM On 17 January 2013 09:44, Luigi Rizzo ri...@iet.unipi.it wrote: (in the lem driver this cannot happen until you release some rx slots, which only happens once at the end of the lem_rxeof() routine, not long before re-enabling interrupts) Right. * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. the problem i was actually seeing are slightly different, namely: - once the driver lags behind, it does not have a chance to recover even if there are CPU cycles available, because both interrupt rate and packets per interrupt are capped. Right, but the interrupt isn't being continuously asserted whilst there are packets there. You just get a single interrupt when the queue has frames in it, and you won't get a further interrupt for whatever the mitigation period is (or ever, if you fill up the RX FIFO, right?) - much worse, once the input stream stops, you have a huge backlog that is not drained. And if, say, you try to ping the machine, the incoming packet is behind another 3900 packets, so the first interrupt drains 100 (but not the ping request, so no response), you keep going for a while, eventually the external world sees the machine as not responding and stops even trying to talk to it. Right, so you do need to do what you're doing - but I still think there's a possibility of a race there. Namely that your queue servicing does reach the end of the list (and so you don't immediately reschedule the taskqueue) but some more frames have arrived. You have to wait for the next mitigated interrupt for that. i don't think that's the case. The mitigation is a minimum delay. If the delay is longer than the minimum, you'd get an interrupt as soon as you enable it, which is clearly better than scheduling a task. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
the problem i was actually seeing are slightly different, namely: - once the driver lags behind, it does not have a chance to recover even if there are CPU cycles available, because both interrupt rate and packets per interrupt are capped. - much worse, once the input stream stops, you have a huge backlog that is not drained. And if, say, you try to ping the machine, the incoming packet is behind another 3900 packets, so the first interrupt drains 100 (but not the ping request, so no response), you keep going for a while, eventually the external world sees the machine as not responding and stops even trying to talk to it. This is a silly example. As I said before, the 100 work limit is arbitrary and too low for a busy network. If you have a backlog of 3900 packets with a workload of 100, then your system is so incompetently tuned that it's not even worthy of discussion. If you're using workload and task queues because you don't know how to tune moderation and the process_limit, that's one discussion. But if you can't process all of the packets in your RX queue in the interrupt window, then you either need to tune your machine better or get a faster machine. When you tune the work limit you're making a decision about the trade-off between livelock and dropping packets. It's not an arbitrary decision. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Wed, 1/16/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 16, 2013, 9:55 PM On Wed, Jan 16, 2013 at 06:19:01AM -0800, Barney Cordoba wrote: --- On Tue, 1/15/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: h...@freebsd.org, freebsd-net@freebsd.org freebsd-net@freebsd.org Cc: Jack Vogel jfvo...@gmail.com Date: Tuesday, January 15, 2013, 8:23 PM Hi, i found a couple of problems in ? ? ? ? dev/e1000/if_lem.c::lem_handle_rxtx() , (compare with dev/e1000/if_em.c::em_handle_que() for better understanding): 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the ? rx_process_limit, the task is rescheduled so it can complete the work. ? Conversely, in if_lem.c::lem_handle_rxtx() the lem_rxeof() is ? only run once, and if there are more pending packets the only ? chance to drain them is to receive (many) more interrupts. ? This is a relatively serious problem, because the receiver has ? a hard time recovering. ? I'd like to commit a fix to this same as it is done in e1000. 2. in if_em.c::em_handle_que(), interrupts are reenabled unconditionally, ???whereas lem_handle_rxtx() only enables them if IFF_DRV_RUNNING is set. ???I cannot really tell what is the correct way here, so I'd like ???to put a comment there unless there is a good suggestion on ???what to do. ???Accesses to the intr register are race-prone anyways ???(disabled in fastintr, enabled in the rxtx task without ???holding any lock, and generally accessed under EM_CORE_LOCK ???in other places), and presumably enabling/disabling the ???interrupts around activations of the taks is just an ???optimization (and on a VM, it is actually a pessimization ???due to the huge cost of VM exits). cheers luigi This is not really a big deal; this is how things works for a million years before we had task queues. i agree that the second issue is not a big deal. The first one, on the contrary, is a real problem no matter how you set the 'work' parameter (unless you make it large enough to drain the entire queue in one call). Which should be the goal, except in extreme circumstances. Having more packets than work should be the extreme case and not the norm. All work should do is normalize bursts of packets. If you're consistently over work then either your work parameter is too low, or your interrupt moderation is too wide. Adding a cleanup task simply compensates for bad tuning. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Tue, 1/15/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: h...@freebsd.org, freebsd-net@freebsd.org freebsd-net@freebsd.org Cc: Jack Vogel jfvo...@gmail.com Date: Tuesday, January 15, 2013, 8:23 PM Hi, i found a couple of problems in dev/e1000/if_lem.c::lem_handle_rxtx() , (compare with dev/e1000/if_em.c::em_handle_que() for better understanding): 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the rx_process_limit, the task is rescheduled so it can complete the work. Conversely, in if_lem.c::lem_handle_rxtx() the lem_rxeof() is only run once, and if there are more pending packets the only chance to drain them is to receive (many) more interrupts. This is a relatively serious problem, because the receiver has a hard time recovering. I'd like to commit a fix to this same as it is done in e1000. 2. in if_em.c::em_handle_que(), interrupts are reenabled unconditionally, whereas lem_handle_rxtx() only enables them if IFF_DRV_RUNNING is set. I cannot really tell what is the correct way here, so I'd like to put a comment there unless there is a good suggestion on what to do. Accesses to the intr register are race-prone anyways (disabled in fastintr, enabled in the rxtx task without holding any lock, and generally accessed under EM_CORE_LOCK in other places), and presumably enabling/disabling the interrupts around activations of the taks is just an optimization (and on a VM, it is actually a pessimization due to the huge cost of VM exits). cheers luigi This is not really a big deal; this is how things works for a million years before we had task queues. Intel controllers have built in interrupt moderation (unless you're on an ISA bus or something), so interrupt storms aren't possible. Typical default is 6K ints per second, so you can't get another interrupt for 1/6000th of a second whether there's more work to do or not. The work parameter should be an indicator that something is happening too slow, which can happen with a shaper that's taking a lot more time than normal to process packets. Systems should have a maximum pps engineered into its tuning depending on the cpu to avoid live-lock on legacy systems. the default work limit of 100 is too low on a gigabit system. queuing tasks actually creates more overhead in the system, not less. The issue is whether the process_limit * interrupt_moderation is set to a pps that's suitable for your system. Setting low work limits isn't really a good idea unless you have some other time sensitive kernel task. Usually networking is a priority, so setting arbitrary work limits makes less sense than queuing an additional task, which defeats the purpose of interrupt moderation. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
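For anyone who would rather experiment with the trade-off being argued here than take either side's word for it, the work limit on em/lem of that era is a loader tunable, so it can be changed without rebuilding anything. The exact sysctl name has moved around between driver versions, so check what your system exposes; the value is illustrative:
    # /boot/loader.conf -- raise the per-pass work limit (the default is 100; -1 removes the cap)
    hw.em.rx_process_limit="500"
    # after reboot, confirm what the running driver picked up, e.g.
    sysctl dev.em.0.rx_processing_limit
Raising the limit trades longer time spent in the RX path per interrupt against fewer deferred-task passes; which direction helps depends on the interrupt moderation setting, exactly as the thread says.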
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:01 AM Hi, On Tue, 08 Jan 2013 08:29:51 -0800 Mark Atkinson atkin...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I would not say so. The machine behaves different with the two schedulers. It depends mostly what you want to do with the machine. I forgot which scheduler I finally left in the single CPU kernel. Erich 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A non-SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. Rather than say "it depends what you want to do", perhaps an explanation of which cases you might choose one or the other would be helpful. So can anyone in the know confirm that the kernel really isn't smart enough to know that there's only 1 core so that most of the SMP overhead is avoided? It seems to me that SMP scheduling should only be enabled if there is more than 1 core as part of the scheduler initialization. It's arrogant indeed to assume that just because SMP support is compiled in that there are multiple cores. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
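For reference, the kernels being compared in this thread differ by only a couple of config(8) lines; a minimal config for the single-core test, sketched from the GENERIC of that branch (adjust to taste), would be:
    # UP4BSD -- GENERIC minus SMP, with the traditional scheduler
    include         GENERIC
    ident           UP4BSD
    nooptions       SMP
    nooptions       SCHED_ULE
    options         SCHED_4BSD
Leave the scheduler lines out to test a uniprocessor kernel with ULE, or keep SMP and swap only the scheduler, to reproduce the other combinations discussed later in the thread.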
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: Mark Atkinson atkin...@gmail.com, freebsd-net@freebsd.org, jack.vo...@gmail.com Date: Wednesday, January 9, 2013, 9:14 AM Hi, On Wed, 9 Jan 2013 05:40:13 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: --- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:01 AM Hi, On Tue, 08 Jan 2013 08:29:51 -0800 Mark Atkinson atkin...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I would not say so. The machine behaves different with the two schedulers. It depends mostly what you want to do with the machine. I forgot which scheduler I finally left in the single CPU kernel. Erich 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A no SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. this is bad luck. I know of the kernels as I have had SMP and single CPU machines since 4.x times. Rather than say depends what you want to do, perhaps an explanation of which cases you might choose one or the other would be helpful. So can anyone in the know confirm that the kernel really isn't smart enough to know there there's only 1 core so that most of the SMP The kernel does not think like this. It is a fixed program flow. overhead is avoided? It seems to me that SMP scheduling should only be enabled if there is more than 1 core as part of the scheduler initialization. Its arrogant indeed to assume that just because SMP support is compiled in that there are multiple cores. I compile my own kernels and set the parameters as needed. Erich This explanation defies the possibility of a GENERIC kernel, which of course is an important element of a GPOS. Its too bad that smp support can't be done with logic rather than a kernel option. The big thing I see is the use of legacy interrupts vs msix. Its not like flipping off SMP support only changes the scheduler behavior. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Wed, 1/9/13, sth...@nethelp.no sth...@nethelp.no wrote: From: sth...@nethelp.no sth...@nethelp.no Subject: Re: To SMP or not to SMP To: erichsfreebsdl...@alogt.com Cc: barney_cord...@yahoo.com, freebsd-net@freebsd.org, jack.vo...@gmail.com, atkin...@gmail.com Date: Wednesday, January 9, 2013, 9:32 AM 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A no SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. this is bad luck. I know of the kernels as I have had SMP and single CPU machines since 4.x times. I have had igb working with both SMP and non-SMP kernel for at least a year or two, 8.x-STABLE. No specific problems. Steinar Haug, Nethelp consulting, sth...@nethelp.no Maybe a problem with legacy interrupts on more modern processors? I'm using an E5520 and while the NIC inits ok, it just doesnt seem to gen interrupts. I can't spend much time debugging it I notice that HAMMER kernels use MSI/X interrupts whether SMP is enabled or not, while i386 kernels seem to require APIC. Is there some physical reason for this? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Tue, 1/8/13, Mark Atkinson atkin...@gmail.com wrote: From: Mark Atkinson atkin...@gmail.com Subject: Re: To SMP or not to SMP To: freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:29 AM -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE: idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD: idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% Note that the https is a monitor I'm running. so it appears that theres 7% of usage missing (all other apps show 0% usage). If i had to guess just looking at the numbers, it seems that 4BSD might do better with the interrupt level stuff, and not as good with user level context switching. I think they're close enough to stick with ULE so I can just use a stock kernel. One thing that bothers me is the idle sits at 100% when other tasks are registering values under light loads, so it's certainly not all that accurate. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Barney Cordoba barney_cord...@yahoo.com wrote: From: Barney Cordoba barney_cord...@yahoo.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:08 PM --- On Tue, 1/8/13, Mark Atkinson atkin...@gmail.com wrote: From: Mark Atkinson atkin...@gmail.com Subject: Re: To SMP or not to SMP To: freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:29 AM -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE: idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD: idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% Note that the https is a monitor I'm running. so it appears that theres 7% of usage missing (all other apps show 0% usage). If i had to guess just looking at the numbers, it seems that 4BSD might do better with the interrupt level stuff, and not as good with user level context switching. I think they're close enough to stick with ULE so I can just use a stock kernel. One thing that bothers me is the idle sits at 100% when other tasks are registering values under light loads, so it's certainly not all that accurate. BC Ok, thanks to J Baldwin's tip I got a NON-SMP kernel running with some interesting results. Here's all 4 tests: I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE (SMP): idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD (SMP): idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% 4BSD (NON-SMP): idle: 72.95% kernel {em1 que} 15.04% kernel {em0 que} 6.10% httpd: 1.17% ULE (NON-SMP): idle: 76.17% kernel {em1 que} 16.99% kernel {em0 que} 5.18% httpd: 1.66% A kernel with SMP off seems to be a bit more efficient. A better test would be to have more stuff running, but Im about out of time on this project. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Mon, 1/7/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 10:56 PM Hi, On Mon, 7 Jan 2013 18:25:58 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. I ran a single CPU version of FreeBSD until my last single CPU got hit by a lightning last April or May without any problems. I never saw a reason to include the overhead of SMP for this kind of machine and I also never ran into problems with this. Another assumption based on logic rather than empirical evidence. I think I'll test it. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Tue, 1/8/13, Ian Smith smi...@nimnet.asn.au wrote: From: Ian Smith smi...@nimnet.asn.au Subject: Re: To SMP or not to SMP To: Garrett Cooper yaneg...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Erich Dollansky erichsfreebsdl...@alogt.com, freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:34 AM On Tue, 8 Jan 2013 07:57:04 -0800, Garrett Cooper wrote: On Jan 8, 2013, at 7:50 AM, Barney Cordoba wrote: --- On Mon, 1/7/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 10:56 PM Hi, On Mon, 7 Jan 2013 18:25:58 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. I ran a single CPU version of FreeBSD until my last single CPU got hit by a lightning last April or May without any problems. I never saw a reason to include the overhead of SMP for this kind of machine and I also never ran into problems with this. Another assumption based on logic rather than empirical evidence. It isn't really an offhanded assumption because there _is_ additional overhead added into the kernel structures to make things work SMP with locking :). Whether or not it's measurable for you and your applications, I have no idea. HTH, -Garrett Where's Kris Kennaway when you need something compared, benchmarked under N different types of loads, and nicely graphed? Do we have a contender? :) cheers, Ian I don't need no stinking graphs. I'll do some testing. bc ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
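[Editorial note: Garrett's point about SMP locking overhead is easy to make concrete in userland, even without a kernel benchmark: an uncontended atomic read-modify-write is noticeably slower than a plain increment on one core with no contention, which is a rough proxy for the extra cost SMP-safe structures carry. The sketch below is my own illustration, not from the thread; the iteration count is arbitrary and the ratio varies by CPU, so treat it as a shape, not a number.]

/* lockcost.c - compare a plain increment with an atomic (lock-prefixed)
 * increment, as a crude stand-in for uncontended SMP locking overhead.
 * Illustrative sketch only: cc -O2 -o lockcost lockcost.c
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define ITERS 100000000UL	/* arbitrary iteration count */

static double
elapsed(const struct timespec *s, const struct timespec *e)
{
	return ((e->tv_sec - s->tv_sec) + (e->tv_nsec - s->tv_nsec) / 1e9);
}

int
main(void)
{
	struct timespec s, e;
	volatile uint64_t plain = 0;	/* volatile so the loop isn't elided */
	uint64_t atomic = 0;
	unsigned long i;

	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++)
		plain++;
	clock_gettime(CLOCK_MONOTONIC, &e);
	printf("plain increment:  %.1f ns/op\n",
	    elapsed(&s, &e) / ITERS * 1e9);

	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&atomic, 1);	/* gcc/clang builtin */
	clock_gettime(CLOCK_MONOTONIC, &e);
	printf("atomic increment: %.1f ns/op\n",
	    elapsed(&s, &e) / ITERS * 1e9);

	return (0);
}

[A real mutex acquire/release does more than one atomic op, so this only bounds the effect from below; the scheduler comparisons above remain the more interesting measurement.]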
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 1/7/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Monday, January 7, 2013, 3:20 AM On 2013-01-05 16:17, Barney Cordoba wrote: --- On Fri, 1/4/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Friday, January 4, 2013, 9:41 AM On 2013-01-01 0:04, Barney Cordoba wrote: The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. Forgoing all the discussions on performance and possible penalties in drivers: I think there is a large (and growing) set of UDP streams that do use larger packets. The video streaming we did used a size of header(14)+7*188, which is the max number of MPEG packets that fit into anything with an MTU of 1500. Receiving those on small embedded devices which can do HW check-summing is very beneficial there. On the large servers we would generate up to 5Gbit of outgoing streams. I'm sure that offloading UDP checksums would be an advantage as well. (They ran mainly Linux, but FreeBSD would also work.) Unfortunately most of the infrastructure has been taken down, so it is no longer possible to verify any of the assumptions. --WjW If you haven't benchmarked it, then you're just guessing. That's my point. It's like SMP in FreeBSD 4. People bought big, honking machines, and the big expensive machines were slower than a single core system at less than half the price. Just because something sounds better doesn't mean that it is better. I completely agree. A Dutch proverb goes: To measure is to know. It was the subtitle of my graduation report, and my professional motto when working as a systems architect. That's why it is sad that the system is no longer up and running, because a 0-order check would have been no more than one ifconfig to see whether it made a difference. But that is all water under the bridge. --WjW You can't really benchmark on a live network; you need a control. It's easy enough to generate controlled UDP streams. And of course every NIC would be a new deal. I'm sure that UDP offload is a checklist feature and not something that the Intels and Broadcoms of the world do a lot of performance testing for. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
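[Editorial note: Barney's point that controlled UDP streams are easy to generate can be sketched as below; this is my own illustration, not a tool from the thread. It sends fixed-size datagrams at a fixed packet rate so the same load can be replayed with offload enabled and disabled. The 1316-byte payload matches the 7*188 MPEG-TS sizing Willem mentions; the target address, port, and rate are placeholders.]

/* udpgen.c - send fixed-size UDP datagrams at a fixed packet rate.
 * Illustrative sketch only: cc -o udpgen udpgen.c
 * Target address, port, rate and payload size below are placeholders.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <errno.h>
#include <string.h>
#include <time.h>

#define TARGET   "192.0.2.1"	/* placeholder (TEST-NET) address */
#define PORT     5004
#define PPS      1000		/* packets per second to generate */
#define PAYLOAD  1316		/* 7 * 188-byte MPEG TS packets */

int
main(void)
{
	char buf[PAYLOAD];
	struct sockaddr_in sin;
	struct timespec gap;
	int s;

	memset(buf, 0xa5, sizeof(buf));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);
	if (inet_pton(AF_INET, TARGET, &sin.sin_addr) != 1)
		errx(1, "bad address");
	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");

	gap.tv_sec = 0;
	gap.tv_nsec = 1000000000L / PPS;	/* inter-packet gap */

	for (;;) {
		if (sendto(s, buf, sizeof(buf), 0,
		    (struct sockaddr *)&sin, sizeof(sin)) == -1 &&
		    errno != ENOBUFS)
			err(1, "sendto");
		nanosleep(&gap, NULL);	/* coarse pacing; timer granularity
					 * limits the achievable rate */
	}
}

[With a fixed source port the stream also hashes to a single NIC queue, which keeps the comparison between offload on/off runs fair.]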
To SMP or not to SMP
I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Mon, 1/7/13, Garrett Cooper yaneg...@gmail.com wrote: From: Garrett Cooper yaneg...@gmail.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 9:38 PM On Jan 7, 2013, at 6:25 PM, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. Non-SMP. I don't see why it would be wise to involve the standard locking structure overhead for a single-core box. It might not be wise, but I'd guess that 99% of the development work is being done on SMP systems, so who knows what weirdness non-smp systems might have. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Fri, 1/4/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Friday, January 4, 2013, 9:41 AM On 2013-01-01 0:04, Barney Cordoba wrote: The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. Forgoing all the discussions on performance and possible penalties in drivers: I think there is a large (and growing) set of UDP streams that do use larger packets. The video streaming we did used a size of header(14)+7*188, which is the max number of MPEG packets that fit into anything with an MTU of 1500. Receiving those on small embedded devices which can do HW check-summing is very beneficial there. On the large servers we would generate up to 5Gbit of outgoing streams. I'm sure that offloading UDP checksums would be an advantage as well. (They ran mainly Linux, but FreeBSD would also work.) Unfortunately most of the infrastructure has been taken down, so it is no longer possible to verify any of the assumptions. --WjW If you haven't benchmarked it, then you're just guessing. That's my point. It's like SMP in FreeBSD 4. People bought big, honking machines, and the big expensive machines were slower than a single core system at less than half the price. Just because something sounds better doesn't mean that it is better. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 12/31/12, lini...@freebsd.org lini...@freebsd.org wrote: From: lini...@freebsd.org lini...@freebsd.org Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: lini...@freebsd.org, freebsd-b...@freebsd.org, freebsd-net@FreeBSD.org Date: Monday, December 31, 2012, 2:28 AM Old Synopsis: UDP checksum offload is wrong in bxe driver New Synopsis: [bxe] [patch] UDP checksum offload is wrong in bxe driver Responsible-Changed-From-To: freebsd-bugs-freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Mon Dec 31 07:28:11 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=174851 ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org Has anyone done an analysis on modern hardware as to whether udp csum offloading is actually beneficial? Even on 2007 hardware I came to the conclusion that using offloading was a negative. Reminds me of the days when people were using intelligent ethernet cards that were slower than the host cpu. The handshaking cost you more than just using shared memory. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
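[Editorial note: the question of whether UDP checksum offload still pays off on a modern CPU can be bounded from the software side by timing the plain RFC 1071 Internet checksum over a representative payload. The sketch below is my own, not from the PR or the thread; the 1472-byte payload and iteration count are arbitrary choices.]

/* cksumbench.c - time the RFC 1071 Internet checksum in software.
 * Illustrative sketch only: cc -O2 -o cksumbench cksumbench.c
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PKTLEN 1472		/* max UDP payload in a 1500-byte MTU */
#define ITERS  1000000UL	/* arbitrary */

/* Straightforward 16-bit one's-complement sum (RFC 1071), no unrolling. */
static uint16_t
in_cksum(const void *data, size_t len)
{
	const uint16_t *p = data;
	uint32_t sum = 0;

	while (len > 1) {
		sum += *p++;
		len -= 2;
	}
	if (len == 1)
		sum += *(const uint8_t *)p;
	sum = (sum >> 16) + (sum & 0xffff);
	sum += (sum >> 16);
	return (~sum & 0xffff);
}

int
main(void)
{
	static uint8_t pkt[PKTLEN];
	struct timespec s, e;
	volatile uint16_t sink;
	unsigned long i;
	double ns;

	memset(pkt, 0x5a, sizeof(pkt));
	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++) {
		pkt[0] = (uint8_t)i;	/* defeat loop-invariant hoisting */
		sink = in_cksum(pkt, sizeof(pkt));
	}
	clock_gettime(CLOCK_MONOTONIC, &e);
	ns = ((e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec)) / ITERS;
	printf("%.1f ns per %d-byte checksum (%.2f Mpkt/s)\n",
	    ns, PKTLEN, 1000.0 / ns);
	(void)sink;
	return (0);
}

[Compare the per-packet figure this prints against the per-packet cost of the driver's offload bookkeeping; for the short VoIP packets discussed below, the payload is a fraction of this size and the software cost shrinks accordingly.]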
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 12/31/12, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Garrett Cooper yaneg...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, David Christensen davi...@freebsd.org, lini...@freebsd.org, freebsd-net@freebsd.org Date: Monday, December 31, 2012, 2:00 PM On 31 December 2012 07:58, Garrett Cooper yaneg...@gmail.com wrote: I would ask David about whether or not there was a performance difference, because they might have some numbers for if_bxe. Not sure about the concept in general, but it seems like a reasonable application-protocol-specific request. But by and large, I agree that UDP checksumming doesn't make logical sense, because it adds unnecessary overhead on an L3 protocol that's assumed to be unreliable. People are terminating millions of VoIP calls on FreeBSD devices. All using UDP. I can imagine large scale VoIP gateways wanting to try and benefit from this. The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
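[Editorial note: for context on what "managing the offload code" means on the FreeBSD side, a driver advertises checksum offload through if_hwassist/IFCAP_TXCSUM, looks at the mbuf's csum_flags on transmit, and marks validated receive checksums back onto the mbuf. The fragment below is a generic, hypothetical sketch of those touch points; it is not the bxe code or the patch from this PR, and the hw_* names are invented stand-ins for device-specific descriptor handling.]

/* Hypothetical driver fragments showing the usual checksum-offload hooks.
 * Not the bxe driver; hw_tx_desc_set_l4csum()/hw_rx_l4csum_ok() and the
 * hw_*_desc structs are invented placeholders.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct hw_tx_desc;			/* invented, device-specific */
struct hw_rx_desc;			/* invented, device-specific */
void hw_tx_desc_set_l4csum(struct hw_tx_desc *, int is_udp);
int  hw_rx_l4csum_ok(const struct hw_rx_desc *);

void
xx_attach_caps(struct ifnet *ifp)
{
	/* Tell the stack which checksums the hardware can compute. */
	ifp->if_capabilities |= IFCAP_TXCSUM | IFCAP_RXCSUM;
	ifp->if_capenable = ifp->if_capabilities;
	ifp->if_hwassist = CSUM_IP | CSUM_TCP | CSUM_UDP;
}

void
xx_tx_csum(struct mbuf *m, struct hw_tx_desc *txd)
{
	/* Only ask the NIC to checksum what the stack requested. */
	if (m->m_pkthdr.csum_flags & CSUM_UDP)
		hw_tx_desc_set_l4csum(txd, 1);
	else if (m->m_pkthdr.csum_flags & CSUM_TCP)
		hw_tx_desc_set_l4csum(txd, 0);
}

void
xx_rx_csum(struct mbuf *m, const struct hw_rx_desc *rxd)
{
	/* Mark a hardware-verified L4 checksum so the stack skips it. */
	if (hw_rx_l4csum_ok(rxd)) {
		m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
		m->m_pkthdr.csum_data = 0xffff;
	}
}

[Getting the UDP case of the middle function wrong, i.e. telling the hardware to insert a TCP-style checksum or none at all, is the general class of bug this PR is about; the exact bxe fix is in the referenced patch.]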
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 12/11/12, Gleb Smirnoff gleb...@freebsd.org wrote: From: Gleb Smirnoff gleb...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Clement Hermann (nodens) nodens2...@gmail.com, Barney Cordoba barney_cord...@yahoo.com, freebsd-net@FreeBSD.org Date: Tuesday, December 11, 2012, 2:58 AM On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable of switching to queueing mode if SIOCADDALTQ is cast upon them. It seems pretty difficult to say that something is compatible with something else if it hasn't been tested in a few years. It seems to me that ALTQ is the one that should handle if_transmit, although it's a good argument for having a raw send function in drivers. Ethernet drivers don't need more than a send() routine that loads a packet into the ring. The decision on what to do if you can't queue a packet should be in the network layer, if we must still call things layers. if_start is a leftover from a day when you stuffed a buffer and waited for an interrupt to stuff in another. The whole idea is antiquated. Imagine drivers that pull packets off of a card and simply queue them, and that you simply submit a packet to be queued for transmit. Instead of trying to find 35 programmers that understand all of the lock BS, you only need to have a couple. I always disable all of the gobbledegook like checksum offloading. They just muddy the water and have very little effect on performance. A modern CPU can do a checksum as fast as you can manage the offload capabilities without disrupting the processing path. With FreeBSD, every driver is an experience. Some suck so bad that they should come with a warning. The MSK driver is completely useless, as an example. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
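[Editorial note: for readers following the if_start vs. if_transmit point, with if_transmit the driver, not the stack's IFQ, owns the software queue, which is why ALTQ (which hooks the IFQ) silently falls out of the picture. A stripped-down, hypothetical if_transmit method in the general style of the multiqueue Intel drivers might look like the sketch below; xx_softc, xx_queue and xx_encap are invented names, there is a single queue, and locking and error handling are reduced to the minimum.]

/* Hypothetical if_transmit path: enqueue onto a buf_ring via drbr_*,
 * then try to drain into the hardware TX ring.  Names beginning with
 * xx_ are invented; this is a sketch, not a real driver.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/buf_ring.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct xx_queue {
	struct mtx	 mtx;
	struct buf_ring	*br;		/* software ring (drbr) */
	struct xx_softc	*sc;
};

struct xx_softc {
	struct ifnet	*ifp;
	struct xx_queue	 q;		/* one queue, for simplicity */
};

int xx_encap(struct xx_queue *, struct mbuf *);	/* device-specific: stuff one
						 * frame into the HW ring */

static void
xx_drain(struct ifnet *ifp, struct xx_queue *q)
{
	struct mbuf *m;

	mtx_assert(&q->mtx, MA_OWNED);
	while ((m = drbr_dequeue(ifp, q->br)) != NULL) {
		if (xx_encap(q, m) != 0) {
			/* HW ring full: put it back and stop for now
			 * (assumes xx_encap leaves m intact on failure). */
			drbr_enqueue(ifp, q->br, m);
			break;
		}
	}
}

static int
xx_transmit(struct ifnet *ifp, struct mbuf *m)
{
	struct xx_softc *sc = ifp->if_softc;
	struct xx_queue *q = &sc->q;
	int error;

	error = drbr_enqueue(ifp, q->br, m);	/* may return ENOBUFS */
	if (mtx_trylock(&q->mtx)) {
		xx_drain(ifp, q);
		mtx_unlock(&q->mtx);
	}
	return (error);
}

[Nothing in this path ever touches ifp->if_snd, which is the queue ALTQ classifies into; that is the disconnect Gleb describes.]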
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 12/11/12, Karim Fodil-Lemelin fodillemlinka...@gmail.com wrote: From: Karim Fodil-Lemelin fodillemlinka...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: freebsd-net@freebsd.org Cc: nodens2...@gmail.com Date: Tuesday, December 11, 2012, 9:56 AM On 11/12/2012 9:15 AM, Ermal Luçi wrote: On Tue, Dec 11, 2012 at 2:05 PM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Tue, 12/11/12, Gleb Smirnoff gleb...@freebsd.org wrote: From: Gleb Smirnoff gleb...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Clement Hermann (nodens) nodens2...@gmail.com, Barney Cordoba barney_cord...@yahoo.com, freebsd-net@FreeBSD.org Date: Tuesday, December 11, 2012, 2:58 AM On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable of switching to queueing mode if SIOCADDALTQ is cast upon them. It seems pretty difficult to say that something is compatible with something else if it hasn't been tested in a few years. It seems to me that ALTQ is the one that should handle if_transmit, although it's a good argument for having a raw send function in drivers. Ethernet drivers don't need more than a send() routine that loads a packet into the ring. The decision on what to do if you can't queue a packet should be in the network layer, if we must still call things layers. if_start is a leftover from a day when you stuffed a buffer and waited for an interrupt to stuff in another. The whole idea is antiquated. Imagine drivers that pull packets off of a card and simply queue them, and that you simply submit a packet to be queued for transmit. Instead of trying to find 35 programmers that understand all of the lock BS, you only need to have a couple. I always disable all of the gobbledegook like checksum offloading. They just muddy the water and have very little effect on performance. A modern CPU can do a checksum as fast as you can manage the offload capabilities without disrupting the processing path. With FreeBSD, every driver is an experience. Some suck so bad that they should come with a warning. The MSK driver is completely useless, as an example. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org During implementation of if_transmit, ALTQ was not considered at all. The default if_transmit provides some compatibility, but that is void since ALTQ has not been converted to call if_transmit after processing the mbuf. ALTQ can be adapted quite easily to the if_transmit model; it just wasn't done at the time. With the if_transmit model it can even be modularized and not be a kernel compile option, since the queue of the iface is abstracted now. I have always wanted to do a diff but have not yet got to it. The change is quite simple: just provide an altq_transmit default method and hook into the if_transmit model on the fly. You surely need to handle some iface events and enable ALTQ based on request, but it is not hard to implement.
I will always have this in my TODO, but I'm not sure when I can get to it. The issue is not only that igb doesn't support the if_transmit or if_start method, but that ALTQ isn't multiqueue-ready and still uses the IFQ_LOCK for all of its enqueue/dequeue operations. A simple drop-in of if_transmit is bound to cause race conditions on any multiqueue driver with ALTQ. I do have a patch set for this on igb, but it's ugly and needs more work, although it should get you going. Let me know if you're interested; I will clean it up and send it over. For more information on the ALTQ discussion and igb, please read this thread: http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html Best regards, Karim. At minimum, the drivers should make multiqueue an option, at least until it works better than a single queue driver. Many motherboards have igb NICs on board, and such a mainstream NIC shouldn't be strapped with experimental code that clearly isn't ready for prime time. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
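[Editorial note: the "magic implemented in other drivers" Gleb mentions, and the altq_transmit default Ermal proposes, both amount to checking at the top of if_transmit whether ALTQ has been enabled on the interface and, if so, falling back to the classic IFQ/if_start path so the shaper still sees every packet. Below is a hypothetical sketch of roughly that kind of shim; it is not Karim's igb patch. xx_mq_transmit and xx_start are invented names, the example assumes a single queue, and, as Karim notes, a real multiqueue driver still ends up serialized on the one IFQ lock while ALTQ is active.]

/* Hypothetical ALTQ-compatibility shim for an if_transmit driver.
 * When ALTQ is active, feed the legacy if_snd queue and drain it with an
 * if_start-style routine; otherwise use the multiqueue drbr path.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

int  xx_mq_transmit(struct ifnet *ifp, struct mbuf *m);  /* drbr path */
void xx_start(struct ifnet *ifp);	/* legacy dequeue-and-encap routine */

int
xx_transmit_compat(struct ifnet *ifp, struct mbuf *m)
{
	int error;

#ifdef ALTQ
	if (ALTQ_IS_ENABLED(&ifp->if_snd)) {
		/* Let ALTQ classify and queue the packet, then kick the
		 * legacy start routine to drain if_snd into the HW ring. */
		IFQ_ENQUEUE(&ifp->if_snd, m, error);
		if (error == 0)
			xx_start(ifp);
		return (error);
	}
#endif
	return (xx_mq_transmit(ifp, m));
}

[The shim only restores correctness; making ALTQ itself multiqueue-aware, which is the part Karim's patch set and the linked thread are about, is a separate and larger change.]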
Re: igb and ALTQ in 9.1-rc3
--- On Mon, 12/10/12, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: igb and ALTQ in 9.1-rc3 To: freebsd-net@freebsd.org Date: Monday, December 10, 2012, 6:03 AM Hi there, I'm trying to install a new pf/altq router. I needed to use 9.1-rc3 due to RAID driver issues. Everything works fine on my quad-port Intel card (igb), but when I try to load my ruleset I get the following error: pfctl: igb0: driver does not support ALTQ altq(4) states that igb is supported. There are some references to altq in if_igb.c (it includes opt_altq in an ifdef), but they are not in the em driver (though my ruleset loads fine with an em card). Could anyone tell me if igb is supposed to support altq or not? Thanks, Clément (nodens) I'll take a guess that the ALTQ description was written before igb stopped supporting it. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Latency issues with buf_ring
--- On Thu, 12/6/12, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: Latency issues with buf_ring To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Robert Watson rwat...@freebsd.org Date: Thursday, December 6, 2012, 1:31 PM There've been plenty of discussions about better ways of doing this networking stuff. Barney, are you able to make it to any of the developer summits? Perhaps the summits are part of the problem? The goal should be to get the best ideas, not just the best ideas of those with the time, resources, and desire to attend a summit. Lists are the best summit. You can get ideas from people who may not be allowed by their contractual obligations to attend such a summit. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Latency issues with buf_ring
--- On Thu, 12/6/12, Robert Watson rwat...@freebsd.org wrote: From: Robert Watson rwat...@freebsd.org Subject: Re: Latency issues with buf_ring To: Andre Oppermann opperm...@networx.ch Cc: Barney Cordoba barney_cord...@yahoo.com, Adrian Chadd adr...@freebsd.org, John Baldwin j...@freebsd.org, freebsd-net@freebsd.org Date: Thursday, December 6, 2012, 4:39 AM On Tue, 4 Dec 2012, Andre Oppermann wrote: For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. So it could simply directly put everything into the TX DMA and not even try to soft-queue. If the TX DMA ring is full ENOBUFS is returned instead of filling yet another queue. However there are ALTQ interactions and other mechanisms which have to be considered too making it a bit more involved. I asserted for many years that software-side queueing would be subsumed by increasingly large DMA descriptor rings for the majority of devices and configurations. However, this turns out not to have happened in a number of scenarios, and so I've revised my conclusions there. I think we will continue to need to support transmit-side buffering, ideally in the form of a set of libraries that device drivers can use to avoid code replication and integrate queue management features fairly transparently. I'm a bit worried by the level of copy-and-paste between 10gbps device drivers right now -- for 10/100/1000 drivers, the network stack contains the majority of the code, and the responsibility of the device driver is to advertise hardware features and manage interactions with rings, interrupts, etc. On the 10gbps side, we see lots of code replication, especially in queue management, and it suggests to me (as discussed for several years in a row at BSDCan and elsewhere) that it's time to do a bit of revisiting of ifnet, pull more code back into the central stack and out of device drivers, etc. That doesn't necessarily mean changing notions of ownership of event models, rather, centralising code in libraries rather than all over the place. This is something to do with some care, of course. Robert More troubling than that is the notion that the same code that's suitable for 10/100/1000 should be used in a 10Gb/s environment. 10Gb/s requires a completely different way of thinking. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
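[Editorial note: Robert's suggestion of pulling the copy-and-pasted queue management into a shared library can be pictured as a small helper around buf_ring(9) that any driver could embed: it owns the software ring, hands the driver one packet at a time, and reports ENOBUFS when full, which is the behaviour Andre suggests propagating upward. The sketch below is purely illustrative; the xxq_* names are invented and this is not an existing FreeBSD API, with ALTQ and watermarks left out.]

/* Sketch of a tiny shared soft-TX-queue helper built on buf_ring(9).
 * The xxq_* names are invented; this is only the shape such a library
 * might take, not kernel code that exists today.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/buf_ring.h>
#include <sys/mbuf.h>

struct xxq {
	struct mtx	 lock;		/* serializes dequeue/encap */
	struct buf_ring	*br;
};

int
xxq_init(struct xxq *q, int entries)	/* entries must be a power of 2 */
{
	mtx_init(&q->lock, "xxq", NULL, MTX_DEF);
	q->br = buf_ring_alloc(entries, M_DEVBUF, M_NOWAIT, &q->lock);
	return (q->br == NULL ? ENOMEM : 0);
}

int
xxq_enqueue(struct xxq *q, struct mbuf *m)
{
	/* Lock-free on the producer side; returns ENOBUFS when the
	 * ring is full so the caller can push back on the stack. */
	return (buf_ring_enqueue(q->br, m));
}

struct mbuf *
xxq_dequeue(struct xxq *q)
{
	/* Single consumer: the driver's TX drain path, under q->lock. */
	mtx_assert(&q->lock, MA_OWNED);
	return (buf_ring_dequeue_sc(q->br));
}

[Whether the ring should exist at all, versus relying on the hardware descriptor ring as Andre proposes, is exactly the point Bruce takes up below.]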
Re: Latency issues with buf_ring
--- On Tue, 12/4/12, Bruce Evans b...@optusnet.com.au wrote: From: Bruce Evans b...@optusnet.com.au Subject: Re: Latency issues with buf_ring To: Andre Oppermann opperm...@networx.ch Cc: Adrian Chadd adr...@freebsd.org, Barney Cordoba barney_cord...@yahoo.com, John Baldwin j...@freebsd.org, freebsd-net@FreeBSD.org Date: Tuesday, December 4, 2012, 10:31 PM On Tue, 4 Dec 2012, Andre Oppermann wrote: For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. I found sort of the opposite for bge at 1Gbps. Most or all bge NICs have a tx ring size of 512. The ifq length is the tx ring size minus 1 (511). I needed to expand this to imax(2 * tick / 4, 1) to maximize pps. This does bad things to latency and worse things to caching (512 buffers might fit in the L2 cache, but 1 buffers bust any reasonable cache as they are cycled through), but I only tried to optimize tx pps. So it could simply directly put everything into the TX DMA and not even try to soft-queue. If the TX DMA ring is full ENOBUFS is returned instead of filling yet another queue. That could work, but upper layers currently don't understand ENOBUFS at all, so it would work poorly now. Also, 512 entries is not many, so even if upper layers understood ENOBUFS it is not easy for them to _always_ respond fast enough to keep the tx active, unless there are upstream buffers with many more than 512 entries. There needs to be enough buffering somewhere so that the tx ring can be replenished almost instantly from the buffer, to handle the worst-case latency for the threads generating new (unbuffered) packets. At the line rate of ~1.5 Mpps for 1 Gbps, the maximum latency that can be covered by 512 entries is only 340 usec. However there are ALTQ interactions and other mechanisms which have to be considered too making it a bit more involved. I didn't try to handle ALTQ or even optimize for TCP.
More details: to maximize pps, the main detail is to ensure that the tx ring never becomes empty. The tx then transmits as fast as possible. This requires some watermark processing, but FreeBSD has almost none for tx rings. The following normally happens for packet generators like ttcp and netsend:
- loop calling send() or sendto() until the tx ring (and also any upstream buffers) fill up. Then ENOBUFS is returned.
- watermark processing is broken in the user API at this point. There is no way for the application to wait for the ENOBUFS condition to go away (select() and poll() don't work). Applications use poor workarounds:
- old (~1989) ttcp sleeps for 18 msec when send() returns ENOBUFS. This was barely good enough for 1 Mbps ethernet (line rate ~1500 pps is 27 per 18 msec, so IFQ_MAXLEN = 50 combined with just a 1-entry tx ring provides a safety factor of about 2). Expansion of the tx ring size to 512 makes this work with 10 Mbps ethernet too. Expansion of the ifq to 511 gives another factor of 2. After losing the safety factor of 2, we can now handle 40 Mbps ethernet, and are only a factor of 25 short for 1 Gbps. My hardware can't do line rate for small packets -- it can only do 640 kpps. Thus ttcp is only a factor of 11 short of supporting the hardware at 1 Gbps. This assumes that sleeps of 18 msec are actually possible, which they aren't with HZ = 100 giving a granularity of 10 msec so that sleep(18 msec) actually sleeps for an average of 23 msec. -current uses the bad default of HZ = 1000. With that sleep(18 msec) would average 18.5 msec.
Of course, ttcp should sleep for more like 1 msec if that is possible. Then the average sleep is 1.5 msec. ttcp can keep up with the hardware with that, and is only slightly behind the hardware with the worst-case sleep of 2 msec (512+511 packets generated every 2 msec is 511.5 kpps). I normally use old ttcp, except I modify it to sleep for 1 msec instead of 18 in one version, and in another version I remove the sleep so that it busy-waits in a loop that calls send() which almost always returns ENOBUFS. The latter wastes a lot of CPU, but is almost good enough for throughput testing.
- newer ttcp tries to program the sleep time in microseconds. This doesn't really work, since the sleep granularity is normally at least a millisecond, and even if it could be the 340 microseconds needed by bge with no ifq (see above, and better divide the 340 by 2), then this is quite short and would take almost as much CPU as busy-waiting. I consider HZ = 1000 to be another form of polling/busy-waiting and don't use it except for testing.
- netrate/netsend also uses a programmed sleep time. This doesn't really work, as above. netsend also tries to limit its rate
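[Editorial note: to make the ENOBUFS/back-off behaviour Bruce describes above concrete, here is a hypothetical userland flooder, not ttcp or netsend themselves: it calls sendto() in a loop, naps for a configurable interval when the stack returns ENOBUFS, and reports the average nap it actually achieved, so the HZ-driven granularity is visible. The address, port, packet count, and the 1 ms nap are placeholders.]

/* floodsend.c - sendto() flood with a nap on ENOBUFS, reporting both the
 * requested and the achieved nap so timer granularity is visible.
 * Illustrative sketch only: cc -o floodsend floodsend.c
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define TARGET  "192.0.2.1"	/* placeholder address */
#define PORT    9		/* discard */
#define NAP_NS  1000000L	/* request a 1 ms nap on ENOBUFS */
#define PAYLOAD 18		/* fills a minimum-size Ethernet frame */
#define COUNT   10000000UL	/* packets to send before reporting */

int
main(void)
{
	char buf[PAYLOAD];
	struct sockaddr_in sin;
	struct timespec nap = { 0, NAP_NS }, t0, t1;
	double slept = 0;
	unsigned long sent = 0, naps = 0;
	int s;

	memset(buf, 0, sizeof(buf));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);
	if (inet_pton(AF_INET, TARGET, &sin.sin_addr) != 1)
		errx(1, "bad address");
	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");

	while (sent < COUNT) {
		if (sendto(s, buf, sizeof(buf), 0,
		    (struct sockaddr *)&sin, sizeof(sin)) != -1) {
			sent++;
			continue;
		}
		if (errno != ENOBUFS)
			err(1, "sendto");
		clock_gettime(CLOCK_MONOTONIC, &t0);
		nanosleep(&nap, NULL);		/* the ttcp-style back-off */
		clock_gettime(CLOCK_MONOTONIC, &t1);
		slept += (t1.tv_sec - t0.tv_sec) * 1e3 +
		    (t1.tv_nsec - t0.tv_nsec) / 1e6;
		naps++;
	}
	printf("%lu packets, %lu naps, %.2f ms average achieved nap "
	    "(%.2f ms requested)\n", sent, naps,
	    naps ? slept / naps : 0.0, NAP_NS / 1e6);
	return (0);
}

[On a HZ=100 kernel the achieved nap should come out well above the 1 ms requested, which is the granularity problem Bruce points to; setting NAP_NS to 0 turns this into the busy-wait variant he also describes.]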