Re: [5/5] e1000e: Avoid receiver overrun interrupt bursts

2017-10-24 Thread Philip Prindeville

> On Oct 24, 2017, at 11:20 AM, Lennart Sorensen  
> wrote:
> 
> On Tue, Sep 19, 2017 at 09:41:02PM +0200, Benjamin Poirier wrote:
>> On 2017/09/19 12:38, Philip Prindeville wrote:
>>> Hi.
>>> 
>>> We’ve been running this patchset (all 5) for about as long as they’ve been 
>>> under review… about 2 months.  And in a burn-in lab with heavy traffic.
>>> 
>>> We’ve not seen a single link-flap in hundreds of ours of saturated traffic.
>>> 
>>> Would love to see some resolution soon on this as we don’t want to ship a 
>>> release with unsanctioned patches.
>>> 
>>> Is there an estimate on when that might be?
>> 
>> The patches have been added to Jeff Kirsher's next-queue tree. I guess
>> they will be submitted for v4.15 which might be released in early
>> 2018...
>> http://phb-crystal-ball.org/
> 
> And then they will be submitted to linux-stable so this long standing
> regression can be fixed, right?
> 


Let’s hope!

-Philip



Re: [5/5] e1000e: Avoid receiver overrun interrupt bursts

2017-09-19 Thread Philip Prindeville
Hi.

We’ve been running this patchset (all 5) for about as long as they’ve been 
under review… about 2 months.  And in a burn-in lab with heavy traffic.

We’ve not seen a single link-flap in hundreds of ours of saturated traffic.

Would love to see some resolution soon on this as we don’t want to ship a 
release with unsanctioned patches.

Is there an estimate on when that might be?

Thanks,

-Philip



> On Jul 21, 2017, at 12:36 PM, Benjamin Poirier  wrote:
> 
> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
> adapter (when operating in msix mode) raises the Other interrupt to signal
> Receiver Overrun.
> 
> This is a double problem because 1) at the moment e1000_msix_other()
> assumes that it is only called in case of Link Status Change and 2) if the
> condition persists, the interrupt is repeatedly raised again in quick
> succession.
> 
> Ideally we would configure the Other interrupt to not be raised in case of
> receiver overrun but this doesn't seem possible on this adapter. Instead,
> we handle the first part of the problem by reverting to the practice of
> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
> anymore. We handle the second part of the problem by not re-enabling the
> Other interrupt right away when there is overrun. Instead, we wait until
> traffic subsides, napi polling mode is exited and interrupts are
> re-enabled.
> 
> Reported-by: Lennart Sorensen 
> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
> Signed-off-by: Benjamin Poirier 
> Tested-by: Aaron Brown 
> ---
> drivers/net/ethernet/intel/e1000e/defines.h |  1 +
> drivers/net/ethernet/intel/e1000e/netdev.c  | 33 +++--
> 2 files changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
> b/drivers/net/ethernet/intel/e1000e/defines.h
> index 0641c0098738..afb7ebe20b24 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -398,6 +398,7 @@
> #define E1000_ICR_LSC   0x0004 /* Link Status Change */
> #define E1000_ICR_RXSEQ 0x0008 /* Rx sequence error */
> #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */
> +#define E1000_ICR_RXO   0x0040 /* Receiver Overrun */
> #define E1000_ICR_RXT0  0x0080 /* Rx timer intr (ring 0) */
> #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */
> /* If this bit asserted, the driver should claim the interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 5a8ab1136566..803edd1a6401 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1910,12 +1910,30 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
>   struct net_device *netdev = data;
>   struct e1000_adapter *adapter = netdev_priv(netdev);
>   struct e1000_hw *hw = &adapter->hw;
> + u32 icr;
> + bool enable = true;
> +
> + icr = er32(ICR);
> + if (icr & E1000_ICR_RXO) {
> + ew32(ICR, E1000_ICR_RXO);
> + enable = false;
> + /* napi poll will re-enable Other, make sure it runs */
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> + }
> + if (icr & E1000_ICR_LSC) {
> + ew32(ICR, E1000_ICR_LSC);
> + hw->mac.get_link_status = true;
> + /* guard against interrupt when we're going down */
> + if (!test_bit(__E1000_DOWN, &adapter->state)) {
> + mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + }
> + }
> 
> - hw->mac.get_link_status = true;
> -
> - /* guard against interrupt when we're going down */
> - if (!test_bit(__E1000_DOWN, &adapter->state)) {
> - mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + if (enable && !test_bit(__E1000_DOWN, &adapter->state)) {
>   ew32(IMS, E1000_IMS_OTHER);
>   }
> 
> @@ -2687,7 +2705,8 @@ static int e1000e_poll(struct napi_struct *napi, int 
> weight)
>   napi_complete_done(napi, work_done);
>   if (!test_bit(__E1000_DOWN, &adapter->state)) {
>   if (adapter->msix_entries)
> - ew32(IMS, adapter->rx_ring->ims_val);
> + ew32(IMS, adapter->rx_ring->ims_val |
> +  E1000_IMS_OTHER);
>   else
>

Re: [RFC] watchdog: iTCO_wdt: Introduce panic_on_timeout module param

2017-09-18 Thread Philip Prindeville
Sorry for being late to the party…

I tried to use this patch backported to 4.9.49 but could only get the system to 
reset (presumably the SMI but not certain since I’m not that current on BMC/SMI 
architecture) when it hung.

It’s a Lanner FW-8771 with AMI BIOS 4.6.5 on a Xeon E3-1225 v3.

Also, it seems to me that the TCO stuff has several versions (that affect bit 
placement and semantics) but I’m assuming that this code hasn’t been tested on 
all versions?

Thus special handling might be required based on the generation of TCO that’s 
on-chip which I’m not seeing here.  What processor was this code 
developed/tested on?

Comments on the particulars of the code itself inline.


> On Dec 5, 2016, at 3:48 PM, Julius Hemanth Pitti  wrote:
> 
> Currently iTCO_wdt silently resets the board when timeout
> occurs.
> 
> This patch introduces new "panic_on_timeout" module param,
> which when set allows the iTCO_wdt to call panic when
> watchdog timeout occurs, this help to boot to crash
> kernel and collect core dump for further analysis.
> 
> Cc: xe-ker...@external.cisco.com
> Cc: jpi...@mvista.com
> Signed-off-by: Julius Hemanth Pitti 
> ---
> drivers/watchdog/iTCO_wdt.c | 61 +
> 1 file changed, 61 insertions(+)
> 
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index 06fcb6c..23ddcf4 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -66,6 +66,7 @@
> #include/* For spin_lock/spin_unlock/... */
> #include /* For copy_to_user/put_user/... */
> #include  /* For inb/outb/... */
> +#include 
> #include 
> 
> #include "iTCO_vendor.h"
> @@ -76,6 +77,24 @@
> /* SMI Control and Enable Register */
> #define SMI_EN(iTCO_wdt_private.smi_res->start)
> 
> +static int panic_on_timeout;


Don’t seem the point of making this ‘int’ rather than ‘bool’.


> +module_param(panic_on_timeout, int, 0);


Ditto


> +MODULE_PARM_DESC(panic_on_timeout,
> +  "Panic on NMI instead of Reset (1 = panic), default=0.");
> +
> +/* NMI2SMI_EN is bit 9 of TCO1_CNT register
> + * Read/Write
> + * 0 = Normal NMI functionality.
> + * 1 = Forces all NMIs to instead cause SMIs
> + * This depends on NMI_EN and GBL_SMI_EN bits.
> + */
> +#define NMI2SMI_EN   (1 << 9)
> +
> +/* NMI_NOW is bit 8 of TCO1_CNT register.
> + * Read/'Write to Clear'
> + */
> +#define NMI_NOW  (1 << 8)
> +
> #define TCO_RLD   (TCOBASE + 0x00) /* TCO Timer Reload and Curr. 
> Value */
> #define TCOv1_TMR (TCOBASE + 0x01) /* TCOv1 Timer Initial Value   */
> #define TCO_DAT_IN(TCOBASE + 0x02) /* TCO Data In Register*/
> @@ -236,6 +255,15 @@ static int iTCO_wdt_start(struct watchdog_device *wd_dev)
>   val &= 0xf7ff;
>   outw(val, TCO1_CNT);
>   val = inw(TCO1_CNT);
> +
> + if (panic_on_timeout) {
> + /* Make sure NMIs are allowed to fire */
> + if (NMI2SMI_EN & val) {


The Yoda notation was already called out…


> + val &= ~(NMI2SMI_EN);


… as were the extraneous parens.


> + outw(val, TCO1_CNT);
> + pr_info("NMIs are no longer routed to SMIs\n");
> + }
> + }
>   spin_unlock(&iTCO_wdt_private.io_lock);
> 
>   if (val & 0x0800)


Do we need:

+@@ -372,6 +373,7 @@ static unsigned int iTCO_wdt_get_timelef
+ 
+ static const struct watchdog_info ident = {
+   .options =  WDIOF_SETTIMEOUT |
++  WDIOF_PRETIMEOUT |
+   WDIOF_KEEPALIVEPING |
+   WDIOF_MAGICCLOSE,
+   .firmware_version = 0,

also?  Because without WDIOF_PRETIMEOUT, it seems that 
watchdog_set_pretimeout() would return
-EOPNOTSUPP when you try to do the ioctl(WDIOC_SETPRETIMEOUT)…


> @@ -422,6 +450,26 @@ static void iTCO_wdt_cleanup(void)
>   iTCO_wdt_private.gcs_pmc = NULL;
> }
> 
> +/*
> + * iTCO_wdt_timeout_handler: Handler for watchdog timeout NMI event.
> + */
> +int iTCO_wdt_timeout_handler(unsigned int ulReason, struct pt_regs *regs)
> +{
> + unsigned long val32 = inw(TCO1_CNT);
> +
> + if (val32 & NMI_NOW) {
> + /* Clear NMI - Bit 8 within TCO1_CNT is write to clear */
> + outw(val32, TCO1_CNT);
> +
> + /* Crash the system */
> + nmi_panic(regs, "iTCO_wdt: Watchdog timeout");
> +
> + /* Never returns */
> + return NMI_HANDLED;
> + }
> + return NMI_DONE;
> +}
> +
> static int iTCO_wdt_probe(struct platform_device *dev)
> {
>   int ret = -ENODEV;
> @@ -552,11 +600,21 @@ static int iTCO_wdt_probe(struct platform_device *dev)
>   goto unreg_tco;
>   }
> 
> + if (panic_on_timeout) {
> + ret = register_nmi_handler(NMI_UNKNOWN, 
> iTCO_wdt_timeout_handler, 0, "iTCO_wdt");
> + if (ret != 0) {
> + pr_err("cannot register NMI

Re: [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts

2017-08-11 Thread Philip Prindeville

> On Aug 11, 2017, at 8:13 PM, Philip Prindeville 
>  wrote:
> 
>> 
>> On Jul 21, 2017, at 12:48 PM, Lennart Sorensen 
>>  wrote:
>> 
>> On Fri, Jul 21, 2017 at 11:36:27AM -0700, Benjamin Poirier wrote:
>>> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
>>> adapter (when operating in msix mode) raises the Other interrupt to signal
>>> Receiver Overrun.
>>> 
>>> This is a double problem because 1) at the moment e1000_msix_other()
>>> assumes that it is only called in case of Link Status Change and 2) if the
>>> condition persists, the interrupt is repeatedly raised again in quick
>>> succession.
>>> 
>>> Ideally we would configure the Other interrupt to not be raised in case of
>>> receiver overrun but this doesn't seem possible on this adapter. Instead,
>>> we handle the first part of the problem by reverting to the practice of
>>> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
>>> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
>>> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
>>> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
>>> anymore. We handle the second part of the problem by not re-enabling the
>>> Other interrupt right away when there is overrun. Instead, we wait until
>>> traffic subsides, napi polling mode is exited and interrupts are
>>> re-enabled.
>>> 
>>> Reported-by: Lennart Sorensen 
>>> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
>>> Signed-off-by: Benjamin Poirier 
>> 
>> Any chance of this fix hitting -stable?  After all adapter reset under
>> load is not nice.
>> 
> 
> 
> I tried this patch sequence and I’m seeing a 2% drop in throughput.  CPU 
> utilization at softIRQ is also about 8% higher.  The previous single patch 
> that went out to fix this problem had better performance.
> 
> This is on an Atom D525 with an 82574L and running 2 GB streams across a pair 
> of interfaces with iperf3.
> 
> -Philip


Actually, after turning off MSI-X mode (and using MSI mode instead), and 
setting InterruptRateThrottle to “4” (conservative dynamic mode) across all 
interfaces, I’m actually seeing slightly better throughput than the earlier 
patch… with comparable overall CPU utilization and SoftIRQ utilization.

So setting the module parameters correctly for routing (and not end-system 
parameters) makes a big difference when routing.

-Philip



Re: [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts

2017-08-11 Thread Philip Prindeville

> On Jul 21, 2017, at 12:48 PM, Lennart Sorensen  
> wrote:
> 
> On Fri, Jul 21, 2017 at 11:36:27AM -0700, Benjamin Poirier wrote:
>> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
>> adapter (when operating in msix mode) raises the Other interrupt to signal
>> Receiver Overrun.
>> 
>> This is a double problem because 1) at the moment e1000_msix_other()
>> assumes that it is only called in case of Link Status Change and 2) if the
>> condition persists, the interrupt is repeatedly raised again in quick
>> succession.
>> 
>> Ideally we would configure the Other interrupt to not be raised in case of
>> receiver overrun but this doesn't seem possible on this adapter. Instead,
>> we handle the first part of the problem by reverting to the practice of
>> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
>> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
>> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
>> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
>> anymore. We handle the second part of the problem by not re-enabling the
>> Other interrupt right away when there is overrun. Instead, we wait until
>> traffic subsides, napi polling mode is exited and interrupts are
>> re-enabled.
>> 
>> Reported-by: Lennart Sorensen 
>> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
>> Signed-off-by: Benjamin Poirier 
> 
> Any chance of this fix hitting -stable?  After all adapter reset under
> load is not nice.
> 


I tried this patch sequence and I’m seeing a 2% drop in throughput.  CPU 
utilization at softIRQ is also about 8% higher.  The previous single patch that 
went out to fix this problem had better performance.

This is on an Atom D525 with an 82574L and running 2 GB streams across a pair 
of interfaces with iperf3.

-Philip




Re: commit 16ecba59 breaks 82574L under heavy load.

2017-07-24 Thread Philip Prindeville

> On Jul 20, 2017, at 5:44 PM, Benjamin Poirier  wrote:
> 
> [snip]
> Could you please test the following patch and let me know if it:
> 1) reduces the interrupt rate of the Other msi-x vector
> 2) avoids the link flaps
> or
> 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> In this case, please paste icr values printed.
> 
> Thanks
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
> b/drivers/net/ethernet/intel/e1000e/defines.h
> index 0641c0098738..afb7ebe20b24 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -398,6 +398,7 @@
> #define E1000_ICR_LSC   0x0004 /* Link Status Change */
> #define E1000_ICR_RXSEQ 0x0008 /* Rx sequence error */
> #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */
> +#define E1000_ICR_RXO   0x0040 /* Receiver Overrun */
> #define E1000_ICR_RXT0  0x0080 /* Rx timer intr (ring 0) */
> #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */
> /* If this bit asserted, the driver should claim the interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
> b/drivers/net/ethernet/intel/e1000e/e1000.h
> index c7c994eb410e..f7b46eba3efb 100644
> --- a/drivers/net/ethernet/intel/e1000e/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -351,6 +351,10 @@ struct e1000_adapter {
>   s32 ptp_delta;
> 
>   u16 eee_advert;
> +
> + unsigned int uh_count;
> + u32 uh_values[16];
> + unsigned int uh_values_nb;
> };
> 
> struct e1000_info {
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b3679728caac..46697338c0e1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -46,6 +46,8 @@
> 
> #include "e1000.h"
> 
> +DEFINE_RATELIMIT_STATE(other_uh_ratelimit_state, HZ, 1);
> +
> #define DRV_EXTRAVERSION "-k"
> 
> #define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
> @@ -1904,12 +1906,60 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
>   struct net_device *netdev = data;
>   struct e1000_adapter *adapter = netdev_priv(netdev);
>   struct e1000_hw *hw = &adapter->hw;
> + u32 icr;
> + bool enable = true;
> + bool handled = false;
> + unsigned int i;
> 
> - hw->mac.get_link_status = true;
> + icr = er32(ICR);
> + if (icr & E1000_ICR_RXO) {
> + ew32(ICR, E1000_ICR_RXO);
> + enable = false;
> + /* napi poll will re-enable Other, make sure it runs */
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> + handled = true;
> + }
> + if (icr & E1000_ICR_LSC) {
> + ew32(ICR, E1000_ICR_LSC);
> + hw->mac.get_link_status = true;
> + /* guard against interrupt when we're going down */
> + if (!test_bit(__E1000_DOWN, &adapter->state)) {
> + mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + }
> + handled = true;
> + }
> 
> - /* guard against interrupt when we're going down */
> - if (!test_bit(__E1000_DOWN, &adapter->state)) {
> - mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + if (!handled) {
> + adapter->uh_count++;
> + /* only print unseen icr values */
> + if (adapter->uh_values_nb < ARRAY_SIZE(adapter->uh_values)) {
> + for (i = 0; i < ARRAY_SIZE(adapter->uh_values); i++) {
> + if (adapter->uh_values[i] == icr) {
> + break;
> + }
> + }
> + if (i == ARRAY_SIZE(adapter->uh_values)) {
> + adapter->uh_values[adapter->uh_values_nb] =
> + icr;
> + adapter->uh_values_nb++;
> + netdev_warn(netdev,
> + "Other interrupt with unhandled icr 
> 0x%08x\n",
> + icr);
> + }
> + }
> + }
> + if (adapter->uh_count && __ratelimit(&other_uh_ratelimit_state)) {
> + netdev_warn(netdev,
> + "Other interrupt with unhandled cause, count %u\n",
> + adapter->uh_count);
> + adapter->uh_count = 0;
> + }
> +
> + if (enable && !test_bit(__E1000_DOWN, &adapter->state)) {
>   ew32(IMS, E1000_IMS_OTHER);
>   }
> 
> @@ -2681,7 +2731,8 @@ static int e1000e_poll(struct napi_struct *napi, int 
> weight)
>   napi_complete_done(napi