On 08/11/15 at 03:01P, Adrian Chadd wrote:
> hi,
>
> Are you able to graph per-queue interrupt rates?
>
> It looks like the traffic is distributed differently (the first two
> queues are taking interrupts).

Yeah, also check out "# sysctl dev.ix | grep packets"

>
> Does 10.1 have the flow director code disabled? I remember there was
> some .. interesting behaviour with ixgbe where it'd look at traffic
> and set up flow director rules to try and "balance" things. It was
> buggy and programmed the hardware badly, so we disabled it in at least
> -HEAD.

Looks like we don't build with IXGBE_FDIR by default on 10 so I assume
it's off.

There were some lagg/hashing related changes recently so let us know if
that is hurting you.

Cheers,
Hiren

>
>
> -adrian
>
>
> On 11 August 2015 at 14:18, Maxim Sobolev <sobo...@freebsd.org> wrote:
> > Hi folks,
> >
> > We've been trying to migrate some of our high-PPS systems to new hardware
> > that has four X540-AT2 10G NICs and observed that interrupt time goes
> > through the roof after we cross around 200K PPS in and 200K out (two
> > ports in LACP). The previous hardware was stable up to about 350K PPS in
> > and 350K out. I believe the old one was equipped with the I350 and had
> > the identical LACP configuration. The new box also has a better CPU with
> > more cores (i.e. 24 cores vs. 16 cores before). CPU itself is
> > 2 x E5-2690 v3.
> >
> > After hitting this limit with the default settings, I've tried to tweak
> > the following settings:
> >
> > hw.ix.rx_process_limit="-1"
> > hw.ix.tx_process_limit="-1"
> > hw.ix.enable_aim="0"
> > hw.ix.max_interrupt_rate="-1"
> > hw.ix.rxd="4096"
> > hw.ix.txd="4096"
> >
> > dev.ix.0.fc=0
> > dev.ix.1.fc=0
> > dev.ix.2.fc=0
> > dev.ix.3.fc=0
> >
> > hw.intr_storm_threshold=0
> >
> > But there is little or no effect on the performance. The workload is just
> > a lot of small UDP packets being relayed between a bunch of hosts. The
> > symptoms are always the same - the box runs nice and cool until it hits
> > the said PPS threshold, with the kernel spending just a few percent in
> > the interrupts, and then it jumps straight to 100% interrupt time,
> > thereby scaring some traffic away due to packet loss and such, so that
> > the load drops and the system goes into the "cool" state again. It looks
> > very much like some contention in the driver or in the hardware. Linked
> > are some monitoring screenshots displaying the issue unfolding as well
> > as systat -vm screenshots from the "cool" state.
> >
> > http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization right
> > before the "bang event"
> > http://sobomax.sippysoft.com/ScreenShot382.png <- issue itself
> > http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm few minutes
> > after traffic declined somewhat
> >
> > We are now trying to get the customer to install a 1Gig NIC so that we
> > can run it and compare performance with the rest of the hardware and
> > software being essentially the same.
> >
> > Any ideas on how to improve/resolve this problem are welcome. Thanks!
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
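
For the per-queue question and the "sysctl dev.ix | grep packets" check
above, here is a rough sketch of how two snapshots of those counters could
be turned into per-queue packet rates, to see whether only the first two
queues are busy. It is only an illustration: the OID layout
(dev.ix.<port>.queue<N>.rx_packets / tx_packets) is an assumption about the
ix(4) sysctl tree on this release, the function names are made up for the
example, and counter wraparound is ignored.

```python
import re
import subprocess
import time

# Matches lines such as "dev.ix.0.queue3.rx_packets: 123456".
# The exact OID layout is an assumption about the ix(4) sysctl tree.
COUNTER_RE = re.compile(
    r"^dev\.ix\.(\d+)\.queue(\d+)\.(rx_packets|tx_packets):\s*(\d+)\s*$"
)


def parse_queue_counters(sysctl_text):
    """Return {(port, queue, counter_name): value} parsed from sysctl output."""
    counters = {}
    for line in sysctl_text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            port, queue, name, value = match.groups()
            counters[(int(port), int(queue), name)] = int(value)
    return counters


def queue_rates(before, after, interval_s):
    """Return per-queue packets/second between two counter snapshots."""
    rates = {}
    for key, new_value in after.items():
        if key in before:
            rates[key] = (new_value - before[key]) / interval_s
    return rates


def snapshot():
    """Read the live counters; only works on a host that has ix(4) NICs."""
    out = subprocess.run(
        ["sysctl", "dev.ix"], capture_output=True, text=True, check=True
    ).stdout
    return parse_queue_counters(out)


def main(interval_s=1.0):
    """Take two snapshots interval_s apart and print per-queue rates."""
    first = snapshot()
    time.sleep(interval_s)
    second = snapshot()
    rates = queue_rates(first, second, interval_s)
    for (port, queue, name), pps in sorted(rates.items()):
        print(f"ix{port} queue{queue} {name}: {pps:.0f} pps")
```

Calling main() on the affected box, once in the "cool" state and once
during the spike, would show whether the load is spread across all queues
or concentrated on a few of them.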