Re: [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems

2018-11-02 Thread Ville Syrjälä
On Fri, Oct 26, 2018 at 06:49:09PM -0400, Lyude Paul wrote:
> Unfortunately, it seems that the HPD IRQ storm problem from the early
> days of Intel GPUs was never entirely solved, only mostly. Within the
> last couple of days, I got a bug report from one of our customers who
> had been having issues with their machine suddenly booting up very
> slowly after having updated. The amount of time it took to boot went
> from around 30 seconds, to over 6 minutes consistently.
> 
> After some investigation, I discovered that i915 was reporting massive
> amounts of short HPD IRQ spam on this system from the DisplayPort port,
> despite there not being anything actually connected. The symptoms would
> start with one "long" HPD IRQ being detected at boot:
> 
> [1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0044, dig 0x0044, pins 0x00a0
> [1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
> [1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on 
> PIN 5 - cnt: 0
> [1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
> [1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on 
> PIN 7 - cnt: 0
> [1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
> [1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
> …
> 
> followed by constant short IRQs afterwards:
> 
> [1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status 
> updated from unknown to disconnected
> [1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) 
> received hotplug event.
> [1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
> [1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0020, dig 0x0020, pins 0x0080
> [1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0020, dig 0x0020, pins 0x0080
> [1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 
> 0x71450085
> [1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0020, dig 0x0020, pins 0x0080
> [1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0020, dig 0x0020, pins 0x0080
> [1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
> [1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
> 0x0020, dig 0x0020, pins 0x0080
> 
> The customer's system in question has a GM45 GPU, which is apparently
> well known for hotplugging storms.
> 
> So, workaround this impressively broken hardware by changing the default
> HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
> short IRQs count for 1. This makes it so that 5 long IRQs will trigger
> an HPD storm, and on systems with short HPD storm detection 50 short
> IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
> constant pulsing, which seems like a good middleground between being too
> sensitive and not being sensitive enough (which would cause visible
> stutters in userspace every time a storm occurs).
> 
> And just to be extra safe: we don't enable this by default on systems
> with MST support. There's too high of a chance of MST support triggering
> storm detection, and systems that are new enough to support MST are a
> lot less likely to have issues with IRQ storms anyway.
> 
> As a note: this patch was tested using a ThinkPad T450s and a Chamelium
> to simulate the short IRQ storms.
> 
> Changes since v1:
> - Don't use two separate thresholds, just make long IRQs count for 10
>   each and short IRQs count for 1. This simplifies the code a bit
>   - Ville Syrjälä
> 
> Signed-off-by: Lyude Paul 
> Cc: Ville Syrjälä 
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  | 74 
>  drivers/gpu/drm/i915/i915_drv.h  |  5 +-
>  drivers/gpu/drm/i915/i915_irq.c  |  7 +++
>  drivers/gpu/drm/i915/intel_hotplug.c | 47 +++---
>  4 files changed, 113 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> b/drivers/gpu/drm/i915/i915_debugfs.c
> index b4744a68cd88..1595b8565875 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4641,6 +4641,79 @@ static const struct file_operations 
> i915_hpd_storm_ctl_fops = {
>   .write = i915_hpd_storm_ctl_write
>  };
>  
> +static int i915_hpd_short_storm_ctl_show(struct seq_file *m, void *data)
> +{
> + struct drm_i915_private *dev_priv = m->private;
> +
> + seq_printf(m, "Enabled: %s\n",
> +yesno(dev_priv->hotplug.hpd_short_storm_enabled));
> +
> + return 0;
> 

Re: [Intel-gfx] [PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems

2018-10-26 Thread kbuild test robot
Hi Lyude,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Lyude-Paul/drm-i915-HPD-IRQ-storm-detection-fixes/20181027-085424
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/net/mac80211.h:977: warning: Function parameter or member 
'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'pad' not 
described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'rate_driver_data' not described in 'ieee80211_tx_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'rx_stats_avg' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'rx_stats_avg.signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'rx_stats_avg.chain_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.filtered' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.retry_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.retry_count' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.lost_packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.last_tdls_pkt_time' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.msdu_retries' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.msdu_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.last_ack' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.last_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.ack_signal_filled' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'status_stats.avg_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'tx_stats.packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'tx_stats.bytes' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'tx_stats.last_rate' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 
'tx_stats.msdu' not described in 'sta_info'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_excl.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_excl.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_excl.active' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_shared.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_shared.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:304: warning: Function parameter or member 
'cb_shared.active' not described in 'dma_buf'
   include/linux/dma-fence-array.h:54: warning: Function parameter or member 
'work' not described in 'dma_fence_array'
   include/linux/gpio/driver.h:142: warning: Function param

[PATCH v2 3/3] drm/i915: Add short HPD IRQ storm detection for non-MST systems

2018-10-26 Thread Lyude Paul
Unfortunately, it seems that the HPD IRQ storm problem from the early
days of Intel GPUs was never entirely solved, only mostly. Within the
last couple of days, I got a bug report from one of our customers who
had been having issues with their machine suddenly booting up very
slowly after having updated. The amount of time it took to boot went
from around 30 seconds, to over 6 minutes consistently.

After some investigation, I discovered that i915 was reporting massive
amounts of short HPD IRQ spam on this system from the DisplayPort port,
despite there not being anything actually connected. The symptoms would
start with one "long" HPD IRQ being detected at boot:

[1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0044, dig 0x0044, pins 0x00a0
[1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
[1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 
5 - cnt: 0
[1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
[1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 
7 - cnt: 0
[1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
[1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
…

followed by constant short IRQs afterwards:

[1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status 
updated from unknown to disconnected
[1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) 
received hotplug event.
[1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
[1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0020, dig 0x0020, pins 0x0080
[1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0020, dig 0x0020, pins 0x0080
[1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 
0x71450085
[1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0020, dig 0x0020, pins 0x0080
[1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0020, dig 0x0020, pins 0x0080
[1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 
0x0020, dig 0x0020, pins 0x0080

The customer's system in question has a GM45 GPU, which is apparently
well known for hotplugging storms.

So, workaround this impressively broken hardware by changing the default
HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
short IRQs count for 1. This makes it so that 5 long IRQs will trigger
an HPD storm, and on systems with short HPD storm detection 50 short
IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
constant pulsing, which seems like a good middleground between being too
sensitive and not being sensitive enough (which would cause visible
stutters in userspace every time a storm occurs).

And just to be extra safe: we don't enable this by default on systems
with MST support. There's too high of a chance of MST support triggering
storm detection, and systems that are new enough to support MST are a
lot less likely to have issues with IRQ storms anyway.

As a note: this patch was tested using a ThinkPad T450s and a Chamelium
to simulate the short IRQ storms.

Changes since v1:
- Don't use two separate thresholds, just make long IRQs count for 10
  each and short IRQs count for 1. This simplifies the code a bit
  - Ville Syrjälä

Signed-off-by: Lyude Paul 
Cc: Ville Syrjälä 
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 74 
 drivers/gpu/drm/i915/i915_drv.h  |  5 +-
 drivers/gpu/drm/i915/i915_irq.c  |  7 +++
 drivers/gpu/drm/i915/intel_hotplug.c | 47 +++---
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index b4744a68cd88..1595b8565875 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4641,6 +4641,79 @@ static const struct file_operations 
i915_hpd_storm_ctl_fops = {
.write = i915_hpd_storm_ctl_write
 };
 
+static int i915_hpd_short_storm_ctl_show(struct seq_file *m, void *data)
+{
+   struct drm_i915_private *dev_priv = m->private;
+
+   seq_printf(m, "Enabled: %s\n",
+  yesno(dev_priv->hotplug.hpd_short_storm_enabled));
+
+   return 0;
+}
+
+static int
+i915_hpd_short_storm_ctl_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, i915_hpd_short_storm_ctl_show,
+  inode->i_private);
+}
+
+static ssize_t i915_hpd_short_storm_ctl_write