Re: [E1000-devel] rx_no_dma_resources - Issue on newer hardware (not on older hardware)

Scott Silverman Thu, 23 Jan 2014 14:02:11 -0800

I answered my own question using xeon-e5-1600-2600-vol-2-datasheet.pdf,
where the hex values line up with the options I see. It seems my board
(SuperMicro X9DRW-iF) defaults to 0x2 (128us). I will try 0x1 (32us) and
report back.
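
To compare BIOS settings, one option is to sample the counter from
`ethtool -S` at intervals and look at the delta between samples. The sketch
below is a minimal, hypothetical helper: the function names and the
interface name `eth2` are assumptions made for illustration and are not part
of the driver tooling. It relies only on `ethtool -S <iface>` printing one
`name: value` pair per line.

```python
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output ("name: value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line
        value = value.strip()
        if value.lstrip("-").isdigit():  # skips the "NIC statistics:" header
            stats[name.strip()] = int(value)
    return stats


def read_counter(iface, counter="rx_no_dma_resources"):
    """Run `ethtool -S <iface>` and return one counter (0 if it is absent)."""
    out = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ethtool_stats(out).get(counter, 0)


def watch(iface, interval=10.0):
    """Print how much the drop counter grew during each interval."""
    prev = read_counter(iface)
    while True:
        time.sleep(interval)
        cur = read_counter(iface)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} rx_no_dma_resources +{cur - prev} (total {cur})")
        prev = cur
```

Calling `watch("eth2")` before and after changing the BIOS value should show
whether the counter grows more slowly under one setting than the other.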


In the meantime, I've tried to do some googling to determine what this
function actually controls, but can't seem to find anything helpful. Can
you point me to a resource that describes what this setting controls for my
own understanding?




Thanks,

Scott Silverman | IT | Simplex Investments | 312-360-2444
230 S. LaSalle St., Suite 4-100, Chicago, IL 60604


On Thu, Jan 23, 2014 at 3:53 PM, Scott Silverman <
ssilver...@simplexinvestments.com> wrote:

> I do not have an "Extended ATR" setting but I do have a "Ageing Timer
> Rollover"(sp) setting.
>
> The default for that is 128us with options of: Disabled, 32us, 128us and
> 512us.
>
> According to the "xeon-35-family-spec-update.pdf" from Intel: (page 95)
> 0 Disabled
> 1 32us
> 2 4us
> 3 2us.
>
> As my options don't really match those on the spec, I thought I'd ask what
> you suggest I try here.
>
>
>
>
> Thanks,
>
> Scott Silverman | IT | Simplex Investments | 312-360-2444
> 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>
>
> On Thu, Jan 23, 2014 at 3:25 PM, Alexander Duyck <
> alexander.h.du...@intel.com> wrote:
>
>> One other thing you may want to check is your BIOS configuration.
>> Specifically check to see if you have an option to modify a value called
>> "Extended ATR" in your BIOS.  It is usually somewhere in the
>> advanced CPU options.  The default value on many systems is 0x3 and we
>> have seen that changing it to 0x1 can sometimes improve the system
>> performance in cases such as this.
>>
>> Thanks,
>>
>> Alex
>>
>> On 01/23/2014 01:03 PM, Scott Silverman wrote:
>> > I now have one of the older (dual Xeon X5670) systems running CentOS6
>> > like the newer hardware. It remains free of any drops incrementing the
>> > "rx_no_dma_resources" counter. The newer (E5-2670 and E5-2680 v2)
>> > hardware still drops.
>> >
>> > Various tuning measures have had varying amounts of success in
>> > reducing the number of drops on the newer hardware (things like
>> > limiting RSS to the number of physical cores on the CPU package
>> > connected to the NIC, turning off ATR, using numactl to move processes
>> > closer to interrupts, etc) but none of them have been necessary on the
>> > older "slower" hardware.
>> >
>> > All systems (new and old) have their C-states disabled and only use C0
>> > and C1. turbostat reports that they stay, consistently, at their turbo
>> > frequencies, all right around 3Ghz.
>> >
>> > Adjusting the rx-usecs value to 0, disabling interrupt moderation,
>> > seems like it may have reduced the drops a bit, but I can't say that
>> > conclusively yet.
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >
>> >
>> > On Thu, Dec 26, 2013 at 10:15 AM, Duyck, Alexander H
>> > <alexander.h.du...@intel.com> wrote:
>> >
>> >     Normally any other issues such as ASPM would show up as Rx missed
>> >     errors without the no_dma_resources error.  This is because ASPM
>> >     normally affects DMA latency, not CPU performance.
>> >
>> >
>> >
>> >     One other thing that occurred to me that you might want to check
>> >     is the interrupt moderation configuration.  This can be controlled
>> >     via the "ethtool -C/-c" interface.  Normally the rx-usecs value is
>> >     defaulted to 1 if I recall which is a dynamic interrupt moderation
>> >     value.  One thing you might try is setting it to a static value
>> >     such as 40us to see if this helps to reduce the drops.
>> >
>> >
>> >
>> >     Thanks,
>> >
>> >
>> >
>> >     Alex
>> >
>> >
>> >
>> >     *From:* Scott Silverman [mailto:ssilver...@simplexinvestments.com]
>> >     *Sent:* Tuesday, December 24, 2013 10:09 AM
>> >     *To:* Brandeburg, Jesse
>> >     *Cc:* Duyck, Alexander H; e1000-devel@lists.sourceforge.net
>> >     *Subject:* Re: [E1000-devel] rx_no_dma_resources - Issue on newer
>> >     hardware (not on older hardware)
>> >
>> >
>> >
>> >     I haven't been able to get a system out on the older hardware
>> >     running CentOS6 yet.
>> >
>> >
>> >
>> >     In the meantime I did want to confirm that, according to turbostat
>> >     (and i7z) my cores never leave C0/C1. They also stay at a
>> >     consistent frequency (3.0-3.2Ghz depending on the processor). I am
>> >     fairly confident that the information reported by those tools is
>> >     accurate and that there are no sleep/wakeup issues in terms of CPU
>> >     power management.
>> >
>> >
>> >
>> >     Are there other sleep/wake issues on the newer hardware I need to
>> >     be aware of, other than the CPU power state? As far as I know,
>> >     ASPM is also disabled (as reported by lspci -vv LnkCtl: ASPM
>> >     Disabled).
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Thanks,
>> >
>> >
>> >
>> >     Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >     230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >
>> >
>> >
>> >     On Thu, Dec 19, 2013 at 5:32 PM, Brandeburg, Jesse
>> >     <jesse.brandeb...@intel.com> wrote:
>> >
>> >     Scott, be sure to try running turbostat on both old and new servers
>> >     as I suspect the 50us wake latency of C6 power state may cause drops.
>> >
>> >     The new kernels enable deeper sleep.
>> >
>> >     You can also try a bios setting to disable deep sleep states,
>> >     leave on C1 only.
>> >
>> >     There was a program called cpudmalatency.c or something that may
>> >     be able to help you keep system more awake.
>> >
>> >     --
>> >     Jesse Brandeburg
>> >
>> >
>> >
>> >     On Dec 19, 2013, at 2:57 PM, "Scott Silverman"
>> >     <ssilver...@simplexinvestments.com> wrote:
>> >
>> >     > Alex,
>> >     >
>> >     > Thanks for the response, I'll attempt to reproduce with a consistent
>> >     > OS release and re-open the discussion at that time.
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     > Thanks,
>> >     >
>> >     > Scott Silverman
>> >     >
>> >     >
>> >     > On Thu, Dec 19, 2013 at 4:52 PM, Alexander Duyck
>> >     > <alexander.h.du...@intel.com> wrote:
>> >     >
>> >     >> On 12/19/2013 10:31 AM, Scott Silverman wrote:
>> >     >>> We have three generations of servers running nearly identical
>> >     >>> software. Each subscribes to a variety of multicast groups taking
>> >     >>> in, on average, 200-300Mbps of data.
>> >     >>>
>> >     >>> The oldest generation (2x Xeon X5670, SuperMicro 6016T-NTRF, Intel
>> >     >>> X520-DA2) has no issues handling all the incoming data. (zero
>> >     >>> rx_no_dma_resources)
>> >     >>>
>> >     >>> The middle generation (2x Xeon E5-2670, SuperMicro 6017R-WRF, Intel
>> >     >>> X520-DA2) and the newest generation (2x Xeon E5-2680v2, SuperMicro
>> >     >>> 6017R-WRF, Intel X520-DAs) both have issues handling the incoming
>> >     >>> data (indicated by increasing rx_no_dma_resources counter).
>> >     >>>
>> >     >>> The oldest generation of servers is running CentOS5 on a newer
>> >     >>> kernel (3.4.41), the others are running CentOS6 on the exact same
>> >     >>> kernel (3.4.41).
>> >     >>>
>> >     >>> The oldest generation is using ixgbe 3.13.10, the middle generation
>> >     >>> 3.13.10 and the newest are on 3.18.7. All machines are using the
>> >     >>> set_irq_affinity script to spread queue interrupts across available
>> >     >>> cores. All machines are configured with C1 as the maximum C-state
>> >     >>> and CPU clocks are all steady between 3-3.2Ghz depending on the
>> >     >>> processor model.
>> >     >>>
>> >     >>> On the middle/newer boxes, lowering the number of RSS queues
>> >     >>> manually (i.e. RSS=8,8) seems to help reduce the amount of
>> >     >>> dropping, but it does not eliminate it.
>> >     >>>
>> >     >>> The ring buffer drops do not seem to correlate with data rates,
>> >     >>> either. It does not seem that it is an issue of keeping up. In
>> >     >>> addition, the boxes are not under particularly heavy load. The CPU
>> >     >>> usage is generally between 3-5% and rarely spikes much higher than
>> >     >>> 15%. The load average is generally around 2.
>> >     >>>
>> >     >>> I am at a loss for what else to try to diagnose and/or fix this. In
>> >     >>> my mind, the newer boxes should have no problem at all keeping up
>> >     >>> with the older ones.
>> >     >>>
>> >     >>> I've attached the output of ethtool -S, one from each generation of
>> >     >>> server.
>> >     >>>
>> >     >>>
>> >     >>>
>> >     >>> Thanks,
>> >     >>>
>> >     >>> Scott Silverman
>> >     >>
>> >     >> Scott,
>> >     >>
>> >     >> Have you tried running the CentOS5 w/ newer kernel on any of your
>> >     >> newer servers, or CentOS6 on one of the older ones?  I ask because
>> >     >> this would seem to be one of the most significant differences
>> >     >> between the servers that are not dropping frames and those that
>> >     >> are.  I suspect you may have something in the CentOS6 configuration
>> >     >> that is responsible for the drops that is not present in the
>> >     >> CentOS5 configuration.  We really need to eliminate any OS based
>> >     >> issues before we can really even hope to start chasing this issue
>> >     >> down into the driver and/or device configuration.
>> >     >>
>> >     >> Thanks,
>> >     >>
>> >     >> Alex
>> >
>> >
>> >
>> >
>> >
>>
>>
>
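
The program Jesse recalls in the quoted thread as "cpudmalatency.c or
something" is probably a small utility built on the kernel's PM QoS device,
/dev/cpu_dma_latency. That identification is an assumption, as the thread
does not name it precisely. A process that writes a 32-bit latency bound (in
microseconds) to that device and keeps the file open asks the kernel to
avoid C-states with a longer exit latency; the request is dropped when the
file is closed. A minimal sketch of the idea, with hypothetical function
names, that needs root to run:

```python
import os
import struct
import time

PM_QOS_DEVICE = "/dev/cpu_dma_latency"


def encode_latency(max_latency_us):
    """Pack the latency bound as a native 32-bit signed integer."""
    return struct.pack("i", max_latency_us)


def hold_cpu_dma_latency(max_latency_us=0, hold_seconds=None):
    """Request a CPU wake-latency bound and hold it while the fd stays open."""
    fd = os.open(PM_QOS_DEVICE, os.O_WRONLY)
    try:
        os.write(fd, encode_latency(max_latency_us))
        if hold_seconds is None:
            while True:  # hold until the process is killed
                time.sleep(3600)
        else:
            time.sleep(hold_seconds)
    finally:
        os.close(fd)  # closing the fd releases the request
```

Running `hold_cpu_dma_latency(0)` in the background while watching turbostat
is one way to check whether deeper C-states are really being avoided.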

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
