date:20220113

Re: [Intel-gfx] [RFC 7/7] drm/i915/guc: Print the GuC error capture output register list.

2022-01-13 Thread Teres Alexis, Alan Previn

> This made me think guc engine reset notification is a "handshake" 
> operation and not a pure notification? Does it imply GuC will wait for
> i915 to reply what to do next meaning it won't continue to execute ContextA 
> before i915 replies to engine reset notification?

> If so that would resolve my concern.

Yes: The GuC to host action is used to report a hung context to the VF host if 
engine reset was triggered and a hung context was detected during engine reset. 
This context is automatically put in a non-runnable state.
Apologies for the delay - some task IRQs.

...alan

-----Original Message-----
From: Tvrtko Ursulin  
Sent: Tuesday, January 11, 2022 2:09 AM
To: Teres Alexis, Alan Previn ; Brost, 
Matthew 
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFC 7/7] drm/i915/guc: Print the GuC error capture 
output register list.


On 10/01/2022 18:19, Teres Alexis, Alan Previn wrote:
> 
> On Mon, 2022-01-10 at 08:07 +, Tvrtko Ursulin wrote:
>> On 07/01/2022 17:03, Teres Alexis, Alan Previn wrote:
>>> On Fri, 2022-01-07 at 09:03 +, Tvrtko Ursulin wrote:
 On 06/01/2022 18:33, Teres Alexis, Alan Previn wrote:
> On Thu, 2022-01-06 at 09:38 +, Tvrtko Ursulin wrote:
>> On 05/01/2022 17:30, Teres Alexis, Alan Previn wrote:
>>> On Tue, 2022-01-04 at 13:56 +, Tvrtko Ursulin wrote:
> The flow of events is as below:
>
> 1. guc sends notification that an error capture was done and ready to 
> take.
>   - at this point we copy the guc error captured dump into an 
> interim store
> (larger buffer that can hold multiple captures).
> 2. guc sends notification that a context was reset (after the prior)
>   - this triggers a call to i915_gpu_coredump with the 
> corresponding engine-mask
>from the context that was reset
>   - i915_gpu_coredump proceeds to gather entire gpu state 
> including driver state,
>global gpu state, engine state, context vmas and also 
> engine registers. For the
>engine registers now call into the guc_capture code 
> which merely needs to verify
> that GuC had already done a step 1 and we have data ready to 
> be parsed.

 What about the time between the actual reset and receiving the 
 context reset notification? Latter will contain 
 intel_context->guc_id - can that be re-assigned or "retired" in 
 between the two and so cause problems for matching the correct (or 
 any) vmas?

>>> No, it cannot, because it's only after the context reset 
>>> notification that i915 starts taking action against that context - and 
>>> even that happens after the i915_gpu_coredump (engine-mask-of-context) 
>>> happens.
>>> That's what I've observed in the code flow.
>>
>> The fact it is "only after" is exactly why I asked.
>>
>> Reset notification is in a CT queue with other stuff, right? So 
>> can be some unrelated time after the actual reset. Could have 
>> context be retired in the meantime and guc_id released is the question.
>>
>> Because i915 has no idea there was a reset until this delayed 
>> message comes over, but it could see user interrupt signaling end 
>> of batch, after the reset has happened, unbeknown to i915, right?
>>
>> Perhaps the answer is guc_id cannot be released via the request 
>> retire flows. Or GuC signaling release of guc_id is a thing, 
>> which is then ordered via the same CT buffer.
>>
>> I don't know, just asking.
>>
> As long as the context is pinned, the guc-id won't be re-assigned. 
> After a bit of offline brain-dump from John Harrison, there are 
> many factors that can keep the context pinned (refcounts) including 
> new or outstanding requests. So a guc-id can't get re-assigned 
> between a capture-notify and a context-reset even if that 
> outstanding request is the only refcount left since it would still 
> be considered outstanding by the driver. I also think we may be 
> talking past each other in the sense that the guc-id is something the 
> driver assigns to a context being pinned and only the driver can 
> un-assign it (both assigning and unassigning are via H2G interactions).
> I get the sense you are assuming the GuC can un-assign the 
> guc-id's on its own - which isn't the case. Apologies if I mis-assumed.

 I did not think GuC can re-assign ce->guc_id. I asked about 
 request/context complete/retire happening before reset/capture 
 notification is received.

 That would be the time window between the last intel_context_put, so last 
 i915_request_put from retire, at which point AFAICT GuC code releases the 
 guc_id. Execution timeline like:

> -- rq1

[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [v5,1/5] x86/quirks: Fix stolen detection with integrated + discrete GPU

2022-01-13 Thread Patchwork

== Series Details ==

Series: series starting with [v5,1/5] x86/quirks: Fix stolen detection with 
integrated + discrete GPU
URL   : https://patchwork.freedesktop.org/series/98864/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11081_full -> Patchwork_22001_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Participating hosts (10 -> 10)
--

  No changes in participating hosts

Known issues


  Here are the changes found in Patchwork_22001_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@gem_create@create-massive:
- shard-apl:  NOTRUN -> [DMESG-WARN][1] ([i915#3002])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-apl7/igt@gem_cre...@create-massive.html

  * igt@gem_eio@kms:
- shard-tglb: [PASS][2] -> [TIMEOUT][3] ([i915#3063])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-tglb7/igt@gem_...@kms.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-tglb3/igt@gem_...@kms.html

  * igt@gem_eio@unwedge-stress:
- shard-skl:  [PASS][4] -> [TIMEOUT][5] ([i915#3063])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-skl7/igt@gem_...@unwedge-stress.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-skl6/igt@gem_...@unwedge-stress.html

  * igt@gem_exec_balancer@parallel-contexts:
- shard-iclb: [PASS][6] -> [SKIP][7] ([i915#4525])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-iclb1/igt@gem_exec_balan...@parallel-contexts.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-iclb8/igt@gem_exec_balan...@parallel-contexts.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
- shard-tglb: [PASS][8] -> [FAIL][9] ([i915#2842])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-tglb8/igt@gem_exec_fair@basic-none-sh...@rcs0.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-tglb7/igt@gem_exec_fair@basic-none-sh...@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
- shard-kbl:  NOTRUN -> [FAIL][10] ([i915#2842])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-kbl3/igt@gem_exec_fair@basic-n...@vcs0.html

  * igt@gem_exec_fair@basic-pace@rcs0:
- shard-kbl:  [PASS][11] -> [FAIL][12] ([i915#2842])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-kbl4/igt@gem_exec_fair@basic-p...@rcs0.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-kbl3/igt@gem_exec_fair@basic-p...@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
- shard-glk:  [PASS][13] -> [FAIL][14] ([i915#2842]) +1 similar 
issue
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-glk1/igt@gem_exec_fair@basic-p...@vcs0.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-glk1/igt@gem_exec_fair@basic-p...@vcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
- shard-iclb: NOTRUN -> [FAIL][15] ([i915#2842]) +1 similar issue
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-iclb2/igt@gem_exec_fair@basic-p...@vcs1.html

  * igt@gem_lmem_swapping@random:
- shard-apl:  NOTRUN -> [SKIP][16] ([fdo#109271] / [i915#4613]) +1 
similar issue
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-apl7/igt@gem_lmem_swapp...@random.html

  * igt@gem_lmem_swapping@smem-oom:
- shard-kbl:  NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-kbl3/igt@gem_lmem_swapp...@smem-oom.html
- shard-skl:  NOTRUN -> [SKIP][18] ([fdo#109271] / [i915#4613]) +1 
similar issue
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-skl9/igt@gem_lmem_swapp...@smem-oom.html

  * igt@gem_pwrite@basic-exhaustion:
- shard-kbl:  NOTRUN -> [WARN][19] ([i915#2658])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-kbl6/igt@gem_pwr...@basic-exhaustion.html

  * igt@gem_render_copy@x-tiled-to-vebox-yf-tiled:
- shard-kbl:  NOTRUN -> [SKIP][20] ([fdo#109271]) +194 similar 
issues
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-kbl6/igt@gem_render_c...@x-tiled-to-vebox-yf-tiled.html

  * igt@gem_softpin@allocator-evict-all-engines:
- shard-glk:  [PASS][21] -> [DMESG-WARN][22] ([i915#118]) +2 
similar issues
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/shard-glk3/igt@gem_soft...@allocator-evict-all-engines.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/shard-glk8/igt@gem_soft...@allocator-evict-all-engines.html

  * igt@gem_userptr_blits@dmabuf-sync:
- shard-iclb: NOTRUN -> [SKIP][23] ([i915#3323])
   [23]:

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [v5,1/5] x86/quirks: Fix stolen detection with integrated + discrete GPU

2022-01-13 Thread Patchwork

== Series Details ==

Series: series starting with [v5,1/5] x86/quirks: Fix stolen detection with 
integrated + discrete GPU
URL   : https://patchwork.freedesktop.org/series/98864/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11081 -> Patchwork_22001


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/index.html

Participating hosts (43 -> 40)
--

  Missing(3): fi-bsw-cyan fi-icl-u2 fi-pnv-d510 

Known issues


  Here are the changes found in Patchwork_22001 that come from known issues:

### CI changes ###

 Possible fixes 

  * boot:
- fi-bxt-dsi: [FAIL][1] ([i915#4912]) -> [PASS][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/fi-bxt-dsi/boot.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/boot.html

  

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@cs-gfx:
- fi-hsw-4770:NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-hsw-4770/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
- fi-snb-2600:NOTRUN -> [SKIP][4] ([fdo#109271]) +17 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-snb-2600/igt@amdgpu/amd_cs_...@sync-fork-compute0.html

  * igt@gem_huc_copy@huc-copy:
- fi-bxt-dsi: NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@verify-random:
- fi-bxt-dsi: NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/igt@gem_lmem_swapp...@verify-random.html

  * igt@i915_pm_rpm@module-reload:
- fi-kbl-soraka:  [PASS][7] -> [DMESG-WARN][8] ([i915#1982])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/fi-kbl-soraka/igt@i915_pm_...@module-reload.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-kbl-soraka/igt@i915_pm_...@module-reload.html

  * igt@kms_chamelium@common-hpd-after-suspend:
- fi-bxt-dsi: NOTRUN -> [SKIP][9] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/igt@kms_chamel...@common-hpd-after-suspend.html

  * igt@kms_force_connector_basic@force-load-detect:
- fi-bxt-dsi: NOTRUN -> [SKIP][10] ([fdo#109271]) +30 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/igt@kms_force_connector_ba...@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
- fi-bxt-dsi: NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#533])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bxt-dsi/igt@kms_pipe_crc_ba...@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_psr@primary_page_flip:
- fi-skl-6600u:   [PASS][12] -> [FAIL][13] ([i915#4547])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/fi-skl-6600u/igt@kms_psr@primary_page_flip.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-skl-6600u/igt@kms_psr@primary_page_flip.html

  * igt@runner@aborted:
- fi-skl-6600u:   NOTRUN -> [FAIL][14] ([i915#4312])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-skl-6600u/igt@run...@aborted.html
- fi-bdw-5557u:   NOTRUN -> [FAIL][15] ([i915#2426] / [i915#4312])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-bdw-5557u/igt@run...@aborted.html

  
 Possible fixes 

  * igt@i915_selftest@live@hangcheck:
- fi-hsw-4770:[INCOMPLETE][16] ([i915#4785]) -> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html
- bat-dg1-6:  [DMESG-FAIL][18] ([i915#4494]) -> [PASS][19]
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html
- fi-snb-2600:[INCOMPLETE][20] ([i915#3921]) -> [PASS][21]
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11081/fi-snb-2600/igt@i915_selftest@l...@hangcheck.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22001/fi-snb-2600/igt@i915_selftest@l...@hangcheck.html

  * igt@kms_frontbuffer_tracking@basic:
- fi-cml-u2:  [DMESG-WARN][22] ([i915#4269]) -> [PASS][23]
   [22]:

[Intel-gfx] [PATCH v5 4/5] x86/quirks: Remove unused logic for flags

2022-01-13 Thread Lucas De Marchi

The flags were only used to mark the quirk as applied when it was
requested to be called only once. Now all the users were converted to
use a static local variable, so this logic can be removed.

Signed-off-by: Lucas De Marchi 
---
 arch/x86/kernel/early-quirks.c | 35 ++
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 7c70977737de..1db4d92f8a85 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -695,37 +695,33 @@ static void __init apple_airport_reset(int bus, int slot, 
int func)
early_iounmap(mmio, BCM4331_MMIO_SIZE);
 }
 
-#define QFLAG_APPLY_ONCE   0x1
-#define QFLAG_APPLIED  0x2
-#define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
 struct chipset {
u32 vendor;
u32 device;
u32 class;
u32 class_mask;
-   u32 flags;
void (*f)(int num, int slot, int func);
 };
 
 static struct chipset early_qrk[] __initdata = {
{ PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
- PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, 0, nvidia_bugs },
+ PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, nvidia_bugs },
{ PCI_VENDOR_ID_VIA, PCI_ANY_ID,
- PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, 0, via_bugs },
+ PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, via_bugs },
{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB,
- PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, fix_hypertransport_config },
+ PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_IXP400_SMBUS,
- PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
+ PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, ati_bugs },
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
- PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
+ PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, ati_bugs_contd },
{ PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
- PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
+ PCI_BASE_CLASS_BRIDGE, intel_remapping_check },
{ PCI_VENDOR_ID_INTEL, 0x3405, PCI_CLASS_BRIDGE_HOST,
- PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
+ PCI_BASE_CLASS_BRIDGE, intel_remapping_check },
{ PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
- PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
+ PCI_BASE_CLASS_BRIDGE, intel_remapping_check },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, PCI_ANY_ID,
- 0, intel_graphics_quirks },
+ intel_graphics_quirks },
/*
 * HPET on the current version of the Baytrail platform has accuracy
 * problems: it will halt in deep idle state - so we disable it.
@@ -735,9 +731,9 @@ static struct chipset early_qrk[] __initdata = {
 *
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-z8000-datasheet-vol-1.pdf
 */
{ PCI_VENDOR_ID_INTEL, 0x0f00,
-   PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
+ PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, force_disable_hpet},
{ PCI_VENDOR_ID_BROADCOM, 0x4331,
- PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
+ PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, apple_airport_reset},
{}
 };
 
@@ -778,12 +774,9 @@ static int __init check_dev_quirk(int num, int slot, int 
func)
((early_qrk[i].device == PCI_ANY_ID) ||
(early_qrk[i].device == device)) &&
(!((early_qrk[i].class ^ class) &
-   early_qrk[i].class_mask))) {
-   if ((early_qrk[i].flags &
-QFLAG_DONE) != QFLAG_DONE)
-   early_qrk[i].f(num, slot, func);
-   early_qrk[i].flags |= QFLAG_APPLIED;
-   }
+   early_qrk[i].class_mask)))
+   early_qrk[i].f(num, slot, func);
+
}
 
type = read_pci_config_byte(num, slot, func,
-- 
2.34.1

[Intel-gfx] [PATCH v5 5/5] x86/quirks: Improve line wrap on quirk conditions

2022-01-13 Thread Lucas De Marchi

Remove extra parentheses and wrap lines so it's easier to read which
conditions are being checked. The call to the hook also had an extra
level of indentation: remove it here to conform to coding style.

Signed-off-by: Lucas De Marchi 
Reviewed-by: Rodrigo Vivi 
---
 arch/x86/kernel/early-quirks.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 1db4d92f8a85..996e3cbc1c5f 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -769,14 +769,12 @@ static int __init check_dev_quirk(int num, int slot, int 
func)
device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
 
for (i = 0; early_qrk[i].f != NULL; i++) {
-   if (((early_qrk[i].vendor == PCI_ANY_ID) ||
-   (early_qrk[i].vendor == vendor)) &&
-   ((early_qrk[i].device == PCI_ANY_ID) ||
-   (early_qrk[i].device == device)) &&
-   (!((early_qrk[i].class ^ class) &
-   early_qrk[i].class_mask)))
-   early_qrk[i].f(num, slot, func);
-
+   if ((early_qrk[i].vendor == PCI_ANY_ID ||
+early_qrk[i].vendor == vendor) &&
+   (early_qrk[i].device == PCI_ANY_ID ||
+early_qrk[i].device == device) &&
+   !((early_qrk[i].class ^ class) & early_qrk[i].class_mask))
+   early_qrk[i].f(num, slot, func);
}
 
type = read_pci_config_byte(num, slot, func,
-- 
2.34.1

[Intel-gfx] [PATCH v5 3/5] x86/quirks: Stop using QFLAG_APPLY_ONCE in nvidia_bugs()

2022-01-13 Thread Lucas De Marchi

Adopt the same approach as in intel_graphics_quirks(), with a static
local variable, to control when the quirk has already been applied.
However, contrary to intel_graphics_quirks(), here we always set it as
applied as soon as it's called to avoid changing the current behavior
that is not failing.

This is the last user of the flags, so we can clean up early-quirks
later, removing all the flags logic.

Signed-off-by: Lucas De Marchi 
---
 arch/x86/kernel/early-quirks.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 59cc67aace93..7c70977737de 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -88,6 +88,13 @@ static void __init nvidia_bugs(int num, int slot, int func)
 {
 #ifdef CONFIG_ACPI
 #ifdef CONFIG_X86_IO_APIC
+   static bool quirk_applied __initdata;
+
+   if (quirk_applied)
+   return;
+
+   quirk_applied = true;
+
/*
 * Only applies to Nvidia root ports (bus 0) and not to
 * Nvidia graphics cards with PCI ports on secondary buses.
@@ -702,7 +709,7 @@ struct chipset {
 
 static struct chipset early_qrk[] __initdata = {
{ PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
- PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, QFLAG_APPLY_ONCE, nvidia_bugs },
+ PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, 0, nvidia_bugs },
{ PCI_VENDOR_ID_VIA, PCI_ANY_ID,
  PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, 0, via_bugs },
{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB,
-- 
2.34.1

[Intel-gfx] [PATCH v5 1/5] x86/quirks: Fix stolen detection with integrated + discrete GPU

2022-01-13 Thread Lucas De Marchi

early_pci_scan_bus() does a depth-first traversal, possibly calling
the quirk functions for each device based on vendor, device and class
from early_qrk table. intel_graphics_quirks() however uses PCI_ANY_ID
and does additional filtering in the quirk.

If there is an Intel integrated + discrete GPU the quirk may be called
first for the discrete GPU based on the PCI topology. Then we will fail
to reserve the system stolen memory for the integrated GPU, because we
will already have marked the quirk as "applied".

This was reproduced in a setup with Alderlake-P (integrated) + DG2
(discrete), with the following PCI topology:

- 00:01.0 Bridge
  `- 03:00.0 DG2
- 00:02.0 Integrated GPU

So, stop using the QFLAG_APPLY_ONCE flag, replacing it with a static
local variable. We can set this variable in the right place, inside
intel_graphics_quirks(), only when the quirk was actually applied, i.e.
when we find the integrated GPU based on the intel_early_ids table.

Cc: sta...@vger.kernel.org
Signed-off-by: Lucas De Marchi 
---

v5: apply fix before the refactor

 arch/x86/kernel/early-quirks.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 1ca3a56fdc2d..de9a76eb544e 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -589,10 +589,14 @@ intel_graphics_stolen(int num, int slot, int func,
 
 static void __init intel_graphics_quirks(int num, int slot, int func)
 {
+   static bool quirk_applied __initdata;
const struct intel_early_ops *early_ops;
u16 device;
int i;
 
+   if (quirk_applied)
+   return;
+
device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
 
for (i = 0; i < ARRAY_SIZE(intel_early_ids); i++) {
@@ -605,6 +609,8 @@ static void __init intel_graphics_quirks(int num, int slot, 
int func)
 
intel_graphics_stolen(num, slot, func, early_ops);
 
+   quirk_applied = true;
+
return;
}
 }
@@ -705,7 +711,7 @@ static struct chipset early_qrk[] __initdata = {
{ PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
  PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, PCI_ANY_ID,
- QFLAG_APPLY_ONCE, intel_graphics_quirks },
+ 0, intel_graphics_quirks },
/*
 * HPET on the current version of the Baytrail platform has accuracy
 * problems: it will halt in deep idle state - so we disable it.
-- 
2.34.1
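
The ordering problem described in the commit message above can be reproduced outside the kernel with a small model. This is a sketch under assumptions, not the kernel code: struct fake_dev, the topology array and both helper functions are hypothetical, and the "integrated" field merely stands in for a device ID being found in intel_early_ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device seen during the early scan. */
struct fake_dev {
	const char *bdf;	/* bus:device.function, for readability only */
	bool integrated;	/* stand-in for "ID is in intel_early_ids" */
};

/* Depth-first order from the commit message: DG2 is visited first. */
static const struct fake_dev topology[] = {
	{ "03:00.0", false },	/* DG2 (discrete), behind bridge 00:01.0 */
	{ "00:02.0", true },	/* integrated GPU */
};

/*
 * Old behaviour: the caller marks the quirk as applied after the first
 * call, whether or not the quirk did anything.
 * Returns how many times stolen memory would have been reserved.
 */
static int reserve_with_table_flag(const struct fake_dev *devs, size_t n)
{
	bool applied = false;	/* models QFLAG_APPLIED in the table */
	int reserved = 0;
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++) {
		if (!applied && devs[i].integrated)
			reserved++;
		applied = true;	/* set even if the device was filtered out */
	}
	return reserved;
}

/*
 * New behaviour: the quirk owns the flag and sets it only after it has
 * actually matched the integrated GPU.
 */
static int reserve_with_local_flag(const struct fake_dev *devs, size_t n)
{
	bool quirk_applied = false;	/* a static local in the patch */
	int reserved = 0;
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++) {
		if (quirk_applied)
			continue;	/* early return in the real quirk */
		if (!devs[i].integrated)
			continue;	/* filtered out: flag stays clear */
		reserved++;
		quirk_applied = true;
	}
	return reserved;
}
```

Calling reserve_with_table_flag(topology, 2) yields zero reservations because the discrete GPU consumes the one-shot flag, while reserve_with_local_flag(topology, 2) reserves exactly once, for the integrated GPU.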

[Intel-gfx] [PATCH v5 2/5] x86/quirks: Stop using QFLAG_APPLY_ONCE in via_bugs()

2022-01-13 Thread Lucas De Marchi

Adopt the same approach as in intel_graphics_quirks(), with a static
local variable, to control when the quirk has already been applied.
However, contrary to intel_graphics_quirks() here we always set it as
applied as soon as it's called to avoid changing the current behavior
that is not failing.

After converting other users, it will allow us to remove all the logic
handling the flags.

Signed-off-by: Lucas De Marchi 
---
 arch/x86/kernel/early-quirks.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index de9a76eb544e..59cc67aace93 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -57,6 +57,13 @@ static void __init fix_hypertransport_config(int num, int 
slot, int func)
 static void __init via_bugs(int  num, int slot, int func)
 {
 #ifdef CONFIG_GART_IOMMU
+   static bool quirk_applied __initdata;
+
+   if (quirk_applied)
+   return;
+
+   quirk_applied = true;
+
if ((max_pfn > MAX_DMA32_PFN ||  force_iommu) &&
!gart_iommu_aperture_allowed) {
printk(KERN_INFO
@@ -697,7 +704,7 @@ static struct chipset early_qrk[] __initdata = {
{ PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
  PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, QFLAG_APPLY_ONCE, nvidia_bugs },
{ PCI_VENDOR_ID_VIA, PCI_ANY_ID,
- PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, QFLAG_APPLY_ONCE, via_bugs },
+ PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, 0, via_bugs },
{ PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB,
  PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, fix_hypertransport_config },
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_IXP400_SMBUS,
-- 
2.34.1

[Intel-gfx] [PATCH v4 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The hang test was relying on context persistence for no particular
reason. That is, it would set a bunch of background spinners running
then immediately destroy the active contexts but expect the spinners
to keep spinning. With the current implementation of context
persistence in i915, that means that super high priority pings are
sent to each engine at the start of the test. Depending upon the
timing and platform, one of those unexpected pings could cause test
failures.

There is no need to require context persistence in this test. So change
to managing the contexts cleanly and only destroying them when they
are no longer in use.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 73a86ec9e..24087931c 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -289,27 +289,29 @@ test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
 {
const struct intel_execution_engine2 *other;
-   const intel_ctx_t *tmp_ctx;
+   const intel_ctx_t *local_ctx[GEM_MAX_ENGINES];
igt_spin_t *spin, *next;
IGT_LIST_HEAD(list);
uint64_t ahnd = get_reloc_ahnd(device, ctx->id), ahndN;
+   int num_ctx;
 
igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
 
/* Fill all the other engines with background load */
+   num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
if (other->flags == e->flags)
continue;
 
-   tmp_ctx = intel_ctx_create(device, &ctx->cfg);
-   ahndN = get_reloc_ahnd(device, tmp_ctx->id);
+   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
- .ctx = tmp_ctx,
+ .ctx = local_ctx[num_ctx],
  .engine = other->flags,
  .flags = IGT_SPIN_FENCE_OUT);
-   intel_ctx_destroy(device, tmp_ctx);
+   num_ctx++;
 
igt_list_move(&spin->link, &list);
}
@@ -339,7 +341,10 @@ test_engine_hang(const intel_ctx_t *ctx,
igt_spin_free(device, spin);
put_ahnd(ahndN);
}
+
put_ahnd(ahnd);
+   while (num_ctx)
+   intel_ctx_destroy(device, local_ctx[--num_ctx]);
 
check_alive();
 }
-- 
2.25.1
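
The lifetime rule this patch adopts (keep every background context alive until its spinner is gone, then destroy them all) can be sketched in isolation. The sketch below is a user-space model under assumptions: struct fake_ctx, ctx_create(), ctx_destroy() and run_background_load() are hypothetical names, MAX_ENGINES stands in for GEM_MAX_ENGINES, and the "users" counter replaces a real spinner.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ENGINES 8	/* hypothetical bound; the patch uses GEM_MAX_ENGINES */

/* Hypothetical stand-in for intel_ctx_t: only tracks active users. */
struct fake_ctx {
	int users;
};

/* Counts contexts destroyed while something was still using them. */
static int destroyed_while_busy;

static struct fake_ctx *ctx_create(void)
{
	return calloc(1, sizeof(struct fake_ctx));
}

static void ctx_destroy(struct fake_ctx *ctx)
{
	if (ctx->users)
		destroyed_while_busy++;
	free(ctx);
}

/* Returns the number of contexts that were destroyed while still in use. */
static int run_background_load(int num_engines)
{
	struct fake_ctx *local_ctx[MAX_ENGINES];
	int num_ctx = 0;
	int i;

	assert(num_engines >= 0 && num_engines <= MAX_ENGINES);
	destroyed_while_busy = 0;

	/* One context per background engine, kept in an array. */
	for (i = 0; i < num_engines; i++) {
		local_ctx[num_ctx] = ctx_create();
		assert(local_ctx[num_ctx] != NULL);
		local_ctx[num_ctx]->users++;	/* spinner starts running */
		num_ctx++;
	}

	/* ... the hang test itself would run here ... */

	/* Spinners are freed first, so no context is in use any more. */
	for (i = 0; i < num_ctx; i++)
		local_ctx[i]->users--;

	/* Only now are the contexts destroyed, as in the patch. */
	while (num_ctx)
		ctx_destroy(local_ctx[--num_ctx]);

	return destroyed_while_busy;
}
```

In this model run_background_load(4) reports zero contexts destroyed while busy; destroying each context immediately after starting its spinner, as the old code did, would instead hit the busy case on every iteration.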

[Intel-gfx] [PATCH v4 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The test was updating some engine properties but not restoring them
afterwards. That would leave the system in a non-default state which
could potentially affect subsequent tests. Fix it by using the new
save/restore engine properties helper functions.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/gem_exec_capture.c | 37 ++-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 9beb36fc7..51db07c41 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -209,14 +209,21 @@ static int check_error_state(int dir, struct offset 
*obj_offsets, int obj_count,
return blobs;
 }
 
-static void configure_hangs(int fd, const struct intel_execution_engine2 *e, 
int ctxt_id)
+static struct gem_engine_properties
+configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
 {
+   struct gem_engine_properties props;
+
/* Ensure fast hang detection */
-   gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 
250);
-   gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 
500);
+   props.engine = e;
+   props.preempt_timeout = 250;
+   props.heartbeat_interval = 500;
+   gem_engine_properties_configure(fd, &props);
 
/* Allow engine based resets and disable banning */
igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | 
HANG_WANT_ENGINE_RESET);
+
+   return props;
 }
 
 static bool fence_busy(int fence)
@@ -256,8 +263,9 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset offset;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
memset(obj, 0, sizeof(obj));
obj[SCRATCH].handle = gem_create_in_memory_regions(fd, 4096, region);
@@ -371,6 +379,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
gem_close(fd, obj[BATCH].handle);
gem_close(fd, obj[NOCAPTURE].handle);
gem_close(fd, obj[SCRATCH].handle);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void capture(int fd, int dir, const intel_ctx_t *ctx,
@@ -417,8 +427,9 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset *offsets;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
offsets = calloc(count, sizeof(*offsets));
igt_assert(offsets);
@@ -559,10 +570,12 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
 
qsort(offsets, count, sizeof(*offsets), cmp);
igt_assert(offsets[0].addr <= offsets[count-1].addr);
+
+   gem_engine_properties_restore(fd, &saved_engine);
return offsets;
 }
 
-#define find_first_available_engine(fd, ctx, e) \
+#define find_first_available_engine(fd, ctx, e, saved) \
do { \
ctx = intel_ctx_create_all_physical(fd); \
igt_assert(ctx); \
@@ -570,7 +583,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
for_each_if(gem_class_can_store_dword(fd, e->class)) \
break; \
igt_assert(e); \
-   configure_hangs(fd, e, ctx->id); \
+   saved = configure_hangs(fd, e, ctx->id); \
} while(0)
 
 static void many(int fd, int dir, uint64_t size, unsigned int flags)
@@ -580,8 +593,9 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
uint64_t ram, gtt, ahnd;
unsigned long count, blobs;
struct offset *offsets;
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
gtt = gem_aperture_size(fd) / size;
ram = (intel_get_avail_ram_mb() << 20) / size;
@@ -602,6 +616,8 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
 
free(offsets);
put_ahnd(ahnd);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
@@ -697,8 +713,9 @@ static void userptr(int fd, int dir)
void *ptr;
int obj_size = 4096;
uint32_t system_region = INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0);
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
memset(ptr, 0, obj_size);
@@ -710,6 +727,8 @@ static void userptr(int fd, int dir)

[Intel-gfx] [PATCH v4 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Although the hangman test was ensuring that *some* reset functionality
was enabled, it did not differentiate what kind. The infrastructure
required to choose between per engine reset or full GT reset was
recently added. So update this test to use it as well.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 76 +--
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 280eac197..7b8390a6c 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -323,40 +323,26 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
}
 }
 
-igt_main
+static void do_tests(const char *name, const char *prefix,
+const intel_ctx_t *ctx)
 {
const struct intel_execution_engine2 *e;
-   const intel_ctx_t *ctx;
-   igt_hang_t hang = {};
-
-   igt_fixture {
-   device = drm_open_driver(DRIVER_INTEL);
-   igt_require_gem(device);
-
-   ctx = intel_ctx_create_all_physical(device);
-
-   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
-
-   sysfs = igt_sysfs_open(device);
-   igt_assert(sysfs != -1);
-
-   igt_require(has_error_state(sysfs));
-   }
+   char buff[256];
 
-   igt_describe("Basic error capture");
-   igt_subtest("error-state-basic")
-   test_error_state_basic();
-
-   igt_describe("Per engine error capture");
-   igt_subtest_with_dynamic("error-state-capture") {
+   snprintf(buff, sizeof(buff), "Per engine error capture (%s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-error-state-capture", prefix);
+   igt_subtest_with_dynamic(buff) {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
test_error_state_capture(ctx, e);
}
}
 
-   igt_describe("Per engine hang recovery (spin)");
-   igt_subtest_with_dynamic("engine-hang") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (spin, %s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-hang", prefix);
+   igt_subtest_with_dynamic(buff) {
 int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -374,8 +360,10 @@ igt_main
}
}
 
-   igt_describe("Per engine hang recovery (invalid CS)");
-   igt_subtest_with_dynamic("engine-error") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (invalid CS, %s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-error", prefix);
+   igt_subtest_with_dynamic(buff) {
int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -391,11 +379,45 @@ igt_main
test_engine_hang(ctx, e, IGT_SPIN_INVALID_CS);
}
}
+}
+
+igt_main
+{
+   const intel_ctx_t *ctx;
+   igt_hang_t hang = {};
+
+   igt_fixture {
+   device = drm_open_driver(DRIVER_INTEL);
+   igt_require_gem(device);
+
+   ctx = intel_ctx_create_all_physical(device);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
+
+   sysfs = igt_sysfs_open(device);
+   igt_assert(sysfs != -1);
+
+   igt_require(has_error_state(sysfs));
+   }
+
+   igt_describe("Basic error capture");
+   igt_subtest("error-state-basic")
+   test_error_state_basic();
 
igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated(ctx);
 
+   do_tests("GT", "gt", ctx);
+
+   igt_fixture {
+   igt_disallow_hang(device, hang);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
+   }
+
+   do_tests("engine", "engine", ctx);
+
igt_fixture {
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
-- 
2.25.1
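
A note on the naming scheme in the patch above: do_tests() formats each subtest name from a prefix, so one test body can be registered once per reset mode ("gt" and "engine"). The following is only a standalone sketch of that idea with no IGT dependency. build_subtest_name() is a hypothetical helper, not an IGT function, and the explicit length check is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of IGT): format "<prefix>-<base>" into out.
 * Returns 0 on success, or -1 if the result would not fit in len bytes.
 */
int build_subtest_name(char *out, size_t len,
		       const char *prefix, const char *base)
{
	assert(out && prefix && base);

	/* Need room for prefix, '-', base and the terminating NUL. */
	if (strlen(prefix) + 1 + strlen(base) + 1 > len)
		return -1;

	snprintf(out, len, "%s-%s", prefix, base);
	return 0;
}
```

Calling it with ("gt", "engine-hang") and then ("engine", "engine-hang") gives two distinct names for the same test body, which is what lets the results for full GT reset and per engine reset be told apart.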

[Intel-gfx] [PATCH v4 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Various tests want to configure engine properties such as pre-emption
timeout and heartbeat interval. Some don't bother to restore the
original values again afterwards. So, add a helper to make it easier
to do this.

v2: Fix for platforms with no pre-emption capability.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 lib/i915/gem_engine_topology.c | 46 ++
 lib/i915/gem_engine_topology.h |  9 +++
 2 files changed, 55 insertions(+)

diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
index 729f42b0a..bd12d0bc9 100644
--- a/lib/i915/gem_engine_topology.c
+++ b/lib/i915/gem_engine_topology.c
@@ -488,6 +488,52 @@ int gem_engine_property_printf(int i915, const char *engine, const char *attr,
return ret;
 }
 
+/* Ensure fast hang detection */
+void gem_engine_properties_configure(int fd, struct gem_engine_properties *params)
+{
+   int ret;
+   struct gem_engine_properties write = *params;
+
+   ret = gem_engine_property_scanf(fd, write.engine->name,
+   "heartbeat_interval_ms",
+   "%d", &params->heartbeat_interval);
+   igt_assert_eq(ret, 1);
+
+   ret = gem_engine_property_printf(fd, write.engine->name,
+"heartbeat_interval_ms", "%d",
+write.heartbeat_interval);
+   igt_assert_lt(0, ret);
+
+   if (gem_scheduler_has_preemption(fd)) {
+   ret = gem_engine_property_scanf(fd, write.engine->name,
+   "preempt_timeout_ms",
+   "%d", &params->preempt_timeout);
+   igt_assert_eq(ret, 1);
+
+   ret = gem_engine_property_printf(fd, write.engine->name,
+"preempt_timeout_ms", "%d",
+write.preempt_timeout);
+   igt_assert_lt(0, ret);
+   }
+}
+
+void gem_engine_properties_restore(int fd, const struct gem_engine_properties *saved)
+{
+   int ret;
+
+   ret = gem_engine_property_printf(fd, saved->engine->name,
+"heartbeat_interval_ms", "%d",
+saved->heartbeat_interval);
+   igt_assert_lt(0, ret);
+
+   if (gem_scheduler_has_preemption(fd)) {
+   ret = gem_engine_property_printf(fd, saved->engine->name,
+"preempt_timeout_ms", "%d",
+saved->preempt_timeout);
+   igt_assert_lt(0, ret);
+   }
+}
+
 uint32_t gem_engine_mmio_base(int i915, const char *engine)
 {
unsigned int mmio = 0;
diff --git a/lib/i915/gem_engine_topology.h b/lib/i915/gem_engine_topology.h
index 4cfab560b..b413aa8ab 100644
--- a/lib/i915/gem_engine_topology.h
+++ b/lib/i915/gem_engine_topology.h
@@ -115,6 +115,15 @@ struct intel_execution_engine2 gem_eb_flags_to_engine(unsigned int flags);
 ((e__) = intel_get_current_physical_engine(__##e__)); \
 intel_next_engine(__##e__))
 
+struct gem_engine_properties {
+   const struct intel_execution_engine2 *engine;
+   int preempt_timeout;
+   int heartbeat_interval;
+};
+
+void gem_engine_properties_configure(int fd, struct gem_engine_properties *params);
+void gem_engine_properties_restore(int fd, const struct gem_engine_properties *saved);
+
 __attribute__((format(scanf, 4, 5)))
 int gem_engine_property_scanf(int i915, const char *engine, const char *attr,
  const char *fmt, ...);
-- 
2.25.1
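
For readers skimming the helper above: gem_engine_properties_configure() uses one struct in both directions. The caller passes in the values it wants, and on return the same struct holds the values that were previously set, ready to be handed to gem_engine_properties_restore(). The sketch below imitates that swap against an in-memory stand-in rather than sysfs. The names fake_engine, engine_props, props_configure and props_restore are all hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Stand-in for the per-engine sysfs attributes; purely illustrative. */
struct fake_engine {
	int heartbeat_interval_ms;
	int preempt_timeout_ms;
	int has_preemption;
};

/* Mirrors the shape of struct gem_engine_properties from the patch. */
struct engine_props {
	struct fake_engine *engine;
	int preempt_timeout;
	int heartbeat_interval;
};

/*
 * Apply the values in *params to the engine. On return *params holds the
 * previous values, so the caller can pass it to props_restore() later.
 */
void props_configure(struct engine_props *params)
{
	struct engine_props write = *params;

	assert(params->engine);

	params->heartbeat_interval = write.engine->heartbeat_interval_ms;
	write.engine->heartbeat_interval_ms = write.heartbeat_interval;

	/* Engines without pre-emption keep their timeout untouched. */
	if (write.engine->has_preemption) {
		params->preempt_timeout = write.engine->preempt_timeout_ms;
		write.engine->preempt_timeout_ms = write.preempt_timeout;
	}
}

/* Write the saved values back to the engine. */
void props_restore(const struct engine_props *saved)
{
	assert(saved->engine);

	saved->engine->heartbeat_interval_ms = saved->heartbeat_interval;
	if (saved->engine->has_preemption)
		saved->engine->preempt_timeout_ms = saved->preempt_timeout;
}
```

The in/out struct keeps the call sites short, at the cost that the caller's requested values are overwritten by the call. A test that needs them again has to keep its own copy.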

[Intel-gfx] [PATCH v4 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added an extra step to the i915_hangman tests to check that the
system is still alive after the hang and recovery. This submits a
simple batch to each engine which does a write to memory and checks
that the write occurred.

v2: Use _device_coherent instead of _wc for mapping memory to support
discrete boards.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 5a0c9497c..73a86ec9e 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -48,8 +48,57 @@
 static int device = -1;
 static int sysfs = -1;
 
+#define OFFSET_ALIVE   10
+
 IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
 
+static void check_alive(void)
+{
+   const struct intel_execution_engine2 *engine;
+   const intel_ctx_t *ctx;
+   uint32_t scratch, *out;
+   int fd, i = 0;
+   uint64_t ahnd, scratch_addr;
+
+   fd = drm_open_driver(DRIVER_INTEL);
+   igt_require(gem_class_can_store_dword(fd, 0));
+
+   ctx = intel_ctx_create_all_physical(fd);
+   ahnd = get_reloc_ahnd(fd, ctx->id);
+   scratch = gem_create(fd, 4096);
+   scratch_addr = get_offset(ahnd, scratch, 4096, 0);
+   out = gem_mmap__device_coherent(fd, scratch, 0, 4096, PROT_WRITE | PROT_READ);
+   gem_set_domain(fd, scratch,
+   I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
+
+   for_each_physical_engine(fd, engine) {
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], 0);
+   i++;
+   }
+
+   i = 0;
+   for_each_ctx_engine(fd, ctx, engine) {
+   if (!gem_class_can_store_dword(fd, engine->class))
+   continue;
+
+   /* +OFFSET_ALIVE to ensure engine zero doesn't get a false negative */
+   igt_store_word(fd, ahnd, ctx, engine, -1, scratch, scratch_addr,
+  i + OFFSET_ALIVE, i + OFFSET_ALIVE);
+   i++;
+   }
+
+   gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
+
+   while (i--)
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], i + OFFSET_ALIVE);
+
+   munmap(out, 4096);
+   gem_close(fd, scratch);
+   put_ahnd(ahnd);
+   intel_ctx_destroy(fd, ctx);
+   close(fd);
+}
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -231,6 +280,8 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static void
@@ -289,6 +340,8 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahndN);
}
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static int hang_count;
@@ -321,6 +374,8 @@ static void test_hang_detector(const intel_ctx_t *ctx,
 
/* Did it work? */
igt_assert(hang_count == 1);
+
+   check_alive();
 }
 
 /* This test covers the case where we end up in an uninitialised area of the
@@ -356,6 +411,8 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
igt_force_gpu_reset(device);
igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
+
+   check_alive();
 }
 
 static void do_tests(const char *name, const char *prefix,
@@ -433,6 +490,8 @@ igt_main
igt_assert(sysfs != -1);
 
igt_require(has_error_state(sysfs));
+
+   gem_require_mmap_device_coherent(device);
}
 
igt_describe("Basic error capture");
-- 
2.25.1
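
On the OFFSET_ALIVE constant used by check_alive() above: the scratch buffer starts out as zeros, so if engine 0 were asked to write the value 0, an engine that never executed would look the same as one that did. Shifting both the slot and the value by a non-zero offset avoids that. The sketch below simulates the idea in plain memory, with no GPU involved. count_dead_engines() and its dead_engine parameter are hypothetical and only illustrate the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_ALIVE 10

/*
 * Simulate each engine storing a marker into a zeroed scratch buffer.
 * Engine i writes the value (i + OFFSET_ALIVE) to dword (i + OFFSET_ALIVE),
 * so even engine 0 leaves a non-zero marker. The engine whose index equals
 * dead_engine performs no write (pass -1 for "all alive").
 * Returns the number of engines whose marker is missing.
 */
int count_dead_engines(uint32_t *scratch, size_t num_dwords,
		       int num_engines, int dead_engine)
{
	int i, missing = 0;

	assert(num_engines >= 0);
	assert((size_t)num_engines + OFFSET_ALIVE <= num_dwords);

	memset(scratch, 0, num_dwords * sizeof(*scratch));

	/* "Submit" one store per engine, skipping the dead one. */
	for (i = 0; i < num_engines; i++) {
		if (i == dead_engine)
			continue;
		scratch[i + OFFSET_ALIVE] = (uint32_t)(i + OFFSET_ALIVE);
	}

	/* Read back and count the slots that still hold zero. */
	for (i = 0; i < num_engines; i++) {
		if (scratch[i + OFFSET_ALIVE] != (uint32_t)(i + OFFSET_ALIVE))
			missing++;
	}

	return missing;
}
```

With the offset in place, a dead engine 0 is reported as missing. With an offset of zero, its expected value would equal the untouched buffer contents and the failure would go unnoticed.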

[Intel-gfx] [PATCH v4 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

As opposed to only on the non-target engines. This means that there is
some other workload present for the scheduler to switch between and so
detect the hang immediately.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 24087931c..a1aeeba6d 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -298,12 +298,14 @@ test_engine_hang(const intel_ctx_t *ctx,
igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
 
-   /* Fill all the other engines with background load */
+   /*
+* Fill all engines with background load.
+* This verifies that independent engines are unaffected and gives
+* the target engine something to switch between so it notices the
+* hang.
+*/
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
-   if (other->flags == e->flags)
-   continue;
-
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Some platforms have very long timeouts configured for some engines.
Some have them disabled completely. That makes for a very slow (or
broken) hangman test. So explicitly configure the engines to have
reasonable settings first.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index e661b8ad0..23055c271 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -496,8 +496,12 @@ igt_main
 {
const intel_ctx_t *ctx;
igt_hang_t hang = {};
+   struct gem_engine_properties saved_params[GEM_MAX_ENGINES];
+   int num_engines = 0;
 
igt_fixture {
+   const struct intel_execution_engine2 *e;
+
device = drm_open_driver(DRIVER_INTEL);
igt_require_gem(device);
 
@@ -511,6 +515,13 @@ igt_main
igt_require(has_error_state(sysfs));
 
gem_require_mmap_device_coherent(device);
+
+   for_each_physical_engine(device, e) {
+   saved_params[num_engines].engine = e;
+   saved_params[num_engines].preempt_timeout = 500;
+   saved_params[num_engines].heartbeat_interval = 1000;
+   gem_engine_properties_configure(device, saved_params + num_engines++);
+   }
}
 
igt_describe("Basic error capture");
@@ -542,6 +553,11 @@ igt_main
do_tests("engine", "engine", ctx);
 
igt_fixture {
+   int i;
+
+   for (i = 0; i < num_engines; i++)
+   gem_engine_properties_restore(device, saved_params + i);
+
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
close(device);
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The update to use intel_ctx_t missed a line that configures the
context to allow hanging. Fix that.

Fixes: 09c36188b ("tests/i915/gem_exec_fence: Convert to intel_ctx_t (v2)")
Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/gem_exec_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
index 196236b27..5e45d0518 100644
--- a/tests/i915/gem_exec_fence.c
+++ b/tests/i915/gem_exec_fence.c
@@ -3139,7 +3139,7 @@ igt_main
igt_hang_t hang;
 
igt_fixture {
-   hang = igt_allow_hang(i915, 0, 0);
+   hang = igt_allow_hang(i915, ctx->id, 0);
intel_allocator_multiprocess_start();
}
 
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The hangman framework sets up a context that is valid for all engines
and has things like banning disabled. The 'unterminated' test then
ignores it and uses the default context. Fix that.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 354769f39..6656b3fcd 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -347,6 +347,7 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
memset(&execbuf, 0, sizeof(execbuf));
execbuf.buffers_ptr = (uintptr_t)&gem_exec;
execbuf.buffer_count = 1;
+   execbuf.rsvd1 = ctx->id;
 
gem_execbuf(device, &execbuf);
if (gem_wait(device, handle, &timeout_ns) != 0) {
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The capture test was still using old style ring_id and ring_name
(derived from the engine structure at the higher level). Update it to
just take the engine structure directly.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index f64b8819d..280eac197 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -207,8 +207,8 @@ static void check_error_state(const char *expected_ring_name,
igt_assert(found);
 }
 
-static void test_error_state_capture(const intel_ctx_t *ctx, unsigned ring_id,
-const char *ring_name)
+static void test_error_state_capture(const intel_ctx_t *ctx,
+const struct intel_execution_engine2 *e)
 {
uint32_t *batch;
igt_hang_t hang;
@@ -217,7 +217,7 @@ static void test_error_state_capture(const intel_ctx_t *ctx, unsigned ring_id,
 
clear_error_state();
 
-   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, ring_id,
+   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, e->flags,
  HANG_ALLOW_CAPTURE);
offset = hang.spin->obj[IGT_SPIN_BATCH].offset;
 
@@ -226,7 +226,7 @@ static void test_error_state_capture(const intel_ctx_t *ctx, unsigned ring_id,
 
igt_post_hang_ring(device, hang);
 
-   check_error_state(ring_name, offset, batch);
+   check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
 }
@@ -351,7 +351,7 @@ igt_main
igt_subtest_with_dynamic("error-state-capture") {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
-   test_error_state_capture(ctx, e->flags, e->name);
+   test_error_state_capture(ctx, e);
}
}
 
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The above function was checking for valid rings via the old interface.
The new scheme is to check for engines on contexts as there are now
more engines than could be supported.

Signed-off-by: John Harrison 
---
 lib/igt_gt.c  | 6 +++---
 lib/igt_gt.h  | 2 +-
 tests/i915/i915_hangman.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 7c7df95ee..50da512f2 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -122,12 +122,12 @@ static void eat_error_state(int dev)
  * to be done under hang injection.
  * Default: false
  */
-void igt_require_hang_ring(int fd, int ring)
+void igt_require_hang_ring(int fd, uint32_t ctx, int ring)
 {
if (!igt_check_boolean_env_var("IGT_HANG", true))
igt_skip("hang injection disabled by user [IGT_HANG=0]\n");
 
-   gem_require_ring(fd, ring);
+igt_require(gem_context_has_engine(fd, ctx, ring));
gem_context_require_bannable(fd);
if (!igt_check_boolean_env_var("IGT_HANG_WITHOUT_RESET", false))
igt_require(has_gpu_reset(fd));
@@ -290,7 +290,7 @@ static igt_hang_t __igt_hang_ctx(int fd, uint64_t ahnd, uint32_t ctx, int ring,
igt_spin_t *spin;
unsigned ban;
 
-   igt_require_hang_ring(fd, ring);
+   igt_require_hang_ring(fd, ctx, ring);
 
/* check if non-default ctx submission is allowed */
igt_require(ctx == 0 || has_ctx_exec(fd, ring, ctx));
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index c5059817b..3d10349e4 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -31,7 +31,7 @@
 #include "i915/i915_drm_local.h"
 #include "i915_drm.h"
 
-void igt_require_hang_ring(int fd, int ring);
+void igt_require_hang_ring(int fd, uint32_t ctx, int ring);
 
 typedef struct igt_hang {
igt_spin_t *spin;
diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index b9c4d9983..f64b8819d 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -295,7 +295,7 @@ test_engine_hang(const intel_ctx_t *ctx,
  * case and it takes a lot more time to wrap, so the acthd can potentially keep
  * increasing for a long time
  */
-static void hangcheck_unterminated(void)
+static void hangcheck_unterminated(const intel_ctx_t *ctx)
 {
/* timeout needs to be greater than ~5*hangcheck */
int64_t timeout_ns = 100ull * NSEC_PER_SEC; /* 100 seconds */
@@ -304,7 +304,7 @@ static void hangcheck_unterminated(void)
uint32_t handle;
 
igt_require(gem_uses_full_ppgtt(device));
-   igt_require_hang_ring(device, 0);
+   igt_require_hang_ring(device, ctx->id, 0);
 
handle = gem_create(device, 4096);
 
@@ -394,7 +394,7 @@ igt_main
 
igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
-   hangcheck_unterminated();
+   hangcheck_unterminated(ctx);
 
igt_fixture {
igt_disallow_hang(device, hang);
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Some of the IGT framework relies on receiving a uevent when a hang
occurs. So add a test that this actually works.

While testing this, noticed that hangs could sometimes be missed
because the uevent was (presumably) still in flight by the time the
handler was de-registered. So add an extra delay during cleanup to
give the uevent a chance to arrive.

Signed-off-by: John Harrison 
---
 lib/igt_aux.c |  7 +++
 tests/i915/i915_hangman.c | 43 +++
 2 files changed, 50 insertions(+)

diff --git a/lib/igt_aux.c b/lib/igt_aux.c
index c247a1aa4..03cc38c93 100644
--- a/lib/igt_aux.c
+++ b/lib/igt_aux.c
@@ -523,6 +523,13 @@ void igt_fork_hang_detector(int fd)
 
 void igt_stop_hang_detector(void)
 {
+   /*
+* Give the uevent time to arrive. No sleep at all misses about 20% of
+* hangs (at least, in the i915_hangman/detector test). A sleep of 1ms
+* seems to miss about 2%, 10ms loses <1%, so 100ms should be safe.
+*/
+   usleep(100 * 1000);
+
igt_stop_helper(&hang_detector);
 }
 
diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 7b8390a6c..354769f39 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i915/gem.h"
 #include "i915/gem_create.h"
@@ -289,6 +290,38 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahnd);
 }
 
+static int hang_count;
+
+static void sig_io(int sig)
+{
+   hang_count++;
+}
+
+static void test_hang_detector(const intel_ctx_t *ctx,
+  const struct intel_execution_engine2 *e)
+{
+   igt_hang_t hang;
+   uint64_t ahnd = get_reloc_ahnd(device, ctx->id);
+
+   hang_count = 0;
+
+   igt_fork_hang_detector(device);
+
+   /* Steal the signal handler */
+   signal(SIGIO, sig_io);
+
+   /* Make a hang... */
+   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, e->flags, 0);
+
+   igt_post_hang_ring(device, hang);
+   put_ahnd(ahnd);
+
+   igt_stop_hang_detector();
+
+   /* Did it work? */
+   igt_assert(hang_count == 1);
+}
+
 /* This test covers the case where we end up in an uninitialised area of the
  * ppgtt and keep executing through it. This is particularly relevant if 48b
  * ppgtt is enabled because the ppgtt is massively bigger compared to the 32b
@@ -408,6 +441,16 @@ igt_main
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated(ctx);
 
+   igt_describe("Check that hang detector works");
+   igt_subtest_with_dynamic("detector") {
+   const struct intel_execution_engine2 *e;
+
+   for_each_ctx_engine(device, ctx, e) {
+   igt_dynamic_f("%s", e->name)
+   test_hang_detector(ctx, e);
+   }
+   }
+
do_tests("GT", "gt", ctx);
 
igt_fixture {
-- 
2.25.1
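
The detector test above works by replacing the signal handler and counting deliveries, then asserting that exactly one arrived. A minimal standalone sketch of that counting pattern follows, using only ISO C signal() and raise(). The names watch_signal and hang_events_seen are hypothetical, and the signal number is a parameter here, whereas the patch hard-codes SIGIO.

```c
#include <assert.h>
#include <signal.h>

/* Incremented from the handler; sig_atomic_t keeps the access signal-safe. */
static volatile sig_atomic_t hang_events;

static void count_hang_event(int sig)
{
	(void)sig;
	hang_events++;
}

/* Reset the counter and install the counting handler for signo. */
void watch_signal(int signo)
{
	void (*prev)(int);

	hang_events = 0;
	prev = signal(signo, count_hang_event);
	assert(prev != SIG_ERR);
}

/* Number of signals counted since the last watch_signal() call. */
int hang_events_seen(void)
{
	return (int)hang_events;
}
```

A caller would run watch_signal() before provoking the event and compare hang_events_seen() against the expected count afterwards. This sketch does not model the race described in the commit message, where the event is still in flight when the listener is torn down. It only shows the bookkeeping.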

[Intel-gfx] [PATCH v4 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The global context used by all the subtests for causing hangs is
marked as unbannable. However, some of the subtests set background
spinners running on all engines using a freshly created context. If
there is a test failure for any reason, all of those spinners can be
killed off as hanging contexts. On systems with lots of engines, that
can result in the test being banned from creating any new contexts.

So make the spinner contexts unbannable as well. That way if one
subtest fails it won't necessarily bring down all subsequent subtests.

v2: Simplify anti-banning code (review feedback from Matthew Brost).

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index a1aeeba6d..e661b8ad0 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -284,6 +284,17 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_alive();
 }
 
+static void context_unban(int fd, unsigned ctx)
+{
+   struct drm_i915_gem_context_param param = {
+   .ctx_id = ctx,
+   .param = I915_CONTEXT_PARAM_BANNABLE,
+   .value = 0,
+   };
+
+   gem_context_set_param(fd, &param);
+}
+
 static void
 test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
@@ -307,6 +318,7 @@ test_engine_hang(const intel_ctx_t *ctx,
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   context_unban(device, local_ctx[num_ctx]->id);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison

From: John Harrison 

A lot of tests use almost identical code for creating a batch buffer
which does a single write to memory and another is about to be added.
Instead, move the most generic version into a common helper function.
Unfortunately, the other instances are all subtly different enough to
make it not so trivial to try to use the helper. It could be done but
it is unclear if it is worth the effort at this point. This patch
proves the concept, if people like it enough then it can be extended.

v2: Fix up object address vs store offset confusion (with help from
Zbigniew K).
v3: Cope with >32bit store_offset (review feedback from Matthew Brost).

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 lib/igt_store.c | 100 
 lib/igt_store.h |  12 +
 lib/meson.build |   1 +
 tests/i915/gem_exec_fence.c |  77 ++-
 tests/i915/i915_hangman.c   |   1 +
 5 files changed, 119 insertions(+), 72 deletions(-)
 create mode 100644 lib/igt_store.c
 create mode 100644 lib/igt_store.h

diff --git a/lib/igt_store.c b/lib/igt_store.c
new file mode 100644
index 0..98c6c4fbd
--- /dev/null
+++ b/lib/igt_store.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915/gem_create.h"
+#include "igt_core.h"
+#include "drmtest.h"
+#include "igt_store.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+#include "lib/intel_allocator.h"
+
+/**
+ * SECTION:igt_store_word
+ * @short_description: Library for writing a value to memory
+ * @title: StoreWord
+ * @include: igt.h
+ *
+ * A lot of igt testcases need some mechanism for writing a value to memory
+ * as a test that a batch buffer has executed.
+ *
+ * NB: Requires master for STORE_DWORD on gen4/5.
+ */
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t target_handle,
+   uint64_t target_gpu_addr,
+   uint64_t store_offset, uint32_t store_value)
+{
+   const int SCRATCH = 0;
+   const int BATCH = 1;
+   const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
+   struct drm_i915_gem_exec_object2 obj[2];
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   uint32_t batch[16];
+   uint64_t bb_offset, delta;
+   int i;
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = to_user_pointer(obj);
+   execbuf.buffer_count = ARRAY_SIZE(obj);
+   execbuf.flags = e->flags;
+   execbuf.rsvd1 = ctx->id;
+   if (fence != -1) {
+   execbuf.flags |= I915_EXEC_FENCE_IN;
+   execbuf.rsvd2 = fence;
+   }
+   if (gen < 6)
+   execbuf.flags |= I915_EXEC_SECURE;
+
+   memset(obj, 0, sizeof(obj));
+   obj[SCRATCH].handle = target_handle;
+
+   obj[BATCH].handle = gem_create(fd, 4096);
+   obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
+   obj[BATCH].relocation_count = !ahnd ? 1 : 0;
+   bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
+   memset(&reloc, 0, sizeof(reloc));
+
+   i = 0;
+   delta = sizeof(uint32_t) * store_offset;
+   if (!ahnd) {
+   reloc.target_handle = obj[SCRATCH].handle;
+   reloc.presumed_offset = -1;
+   reloc.offset = sizeof(uint32_t) * (i + 1);
+   reloc.delta = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+   reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
+   } else {
+   obj[SCRATCH].offset = target_gpu_addr;
+   obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+   obj[BATCH].offset = bb_offset;
+   obj[BATCH].flags |= EXEC_OBJECT_PINNED;
+   }
+   batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+   if (gen >= 8) {
+   uint64_t addr = target_gpu_addr + delta;
+   batch[++i] = lower_32_bits(addr);
+   batch[++i] = upper_32_bits(addr);
+   } else if (gen >= 4) {
+   batch[++i] = 0;
+   batch[++i] = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   reloc.offset += sizeof(uint32_t);
+   } else {
+   batch[i]--;
+   batch[++i] = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   }
+   batch[++i] = store_value;
+   batch[++i] = MI_BATCH_BUFFER_END;
+   gem_write(fd, obj[BATCH].handle, 0, batch, sizeof(batch));
+   gem_execbuf(fd, &execbuf);
+   gem_close(fd, obj[BATCH].handle);
+   put_offset(ahnd, obj[BATCH].handle);
+}
diff --git a/lib/igt_store.h b/lib/igt_store.h
new file mode 100644
index
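
Regarding the v3 note about store offsets larger than 32 bits: igt_store_word() above computes a 64-bit byte delta from the dword index, emits the full address as two dwords on gen8+, and asserts that the upper half is zero wherever only a 32-bit field is available. The sketch below isolates that arithmetic. lo32() and hi32() are hypothetical stand-ins for the lower_32_bits() and upper_32_bits() helpers the patch calls, and emit_store_address() and delta_fits_reloc() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for lower_32_bits() / upper_32_bits(). */
uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }
uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/*
 * Emit the two address dwords for a store to (base + dword_index * 4),
 * low half first. Returns the number of dwords written (always 2).
 */
int emit_store_address(uint32_t *batch, uint64_t base, uint64_t dword_index)
{
	uint64_t addr = base + sizeof(uint32_t) * dword_index;

	assert(batch);

	batch[0] = lo32(addr);
	batch[1] = hi32(addr);
	return 2;
}

/* A 32-bit delta field can only carry byte offsets whose upper half is 0. */
int delta_fits_reloc(uint64_t dword_index)
{
	return hi32(sizeof(uint32_t) * dword_index) == 0;
}
```

For example, a dword index of 10 on an object at 0x100000000 yields the dwords 40 and 1, while an index of 0x40000000 produces a byte delta of exactly 4 GiB and no longer fits a 32-bit field.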

[Intel-gfx] [PATCH v4 i-g-t 00/15] Fixes for i915_hangman and gem_exec_capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Fix a bunch of issues with i915_hangman and gem_exec_capture with the
ultimate aim of making them pass on GuC enabled platforms.

v2: Fixes to the store code. Add engine properties management.
v3: Fix for platforms without pre-emption.
v4: Simplify anti-ban code, support >32bit store offsets and fix
memory mapping on discrete platforms.

Signed-off-by: John Harrison 


John Harrison (15):
  tests/i915/i915_hangman: Add descriptions
  lib/hang: Fix igt_require_hang_ring to work with all engines
  tests/i915/i915_hangman: Update capture test to use engine structure
  tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU
reset
  tests/i915/i915_hangman: Add uevent test & fix detector
  tests/i915/i915_hangman: Use the correct context in
hangcheck_unterminated
  lib/store: Refactor common store code into helper function
  tests/i915/i915_hangman: Add alive-ness test after error capture
  tests/i915/i915_hangman: Remove reliance on context persistance
  tests/i915/i915_hangman: Run background task on all engines
  tests/i915/i915_hangman: Don't let background contexts cause a ban
  tests/i915/gem_exec_fence: Configure correct context
  lib/i915: Add helper for non-destructive engine property updates
  tests/i915/i915_hangman: Configure engine properties for quicker hangs
  tests/i915/gem_exec_capture: Restore engines

 lib/i915/gem_engine_topology.c |  46 ++
 lib/i915/gem_engine_topology.h |   9 ++
 lib/igt_aux.c  |   7 +
 lib/igt_gt.c   |   6 +-
 lib/igt_gt.h   |   2 +-
 lib/igt_store.c| 100 +
 lib/igt_store.h|  12 ++
 lib/meson.build|   1 +
 tests/i915/gem_exec_capture.c  |  37 +++--
 tests/i915/gem_exec_fence.c|  79 +--
 tests/i915/i915_hangman.c  | 252 +++--
 11 files changed, 423 insertions(+), 128 deletions(-)
 create mode 100644 lib/igt_store.c
 create mode 100644 lib/igt_store.h

-- 
2.25.1

[Intel-gfx] [PATCH v4 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added descriptions of the various sub-tests and the test as a whole.

v2: Added missing linefeed (spotted by Petri)

Signed-off-by: John Harrison 
Reviewed-by: Petri Latvala 
---
 tests/i915/i915_hangman.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 4c18c22db..b9c4d9983 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -46,6 +46,8 @@
 static int device = -1;
 static int sysfs = -1;
 
+IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -315,9 +317,9 @@ static void hangcheck_unterminated(void)
 
gem_execbuf(device, &execbuf);
if (gem_wait(device, handle, &timeout_ns) != 0) {
-   /* need to manually trigger an hang to clean before failing */
+   /* need to manually trigger a hang to clean before failing */
igt_force_gpu_reset(device);
-   igt_assert_f(0, "unterminated batch did not trigger an hang!");
+   igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
 }
 
@@ -341,9 +343,11 @@ igt_main
igt_require(has_error_state(sysfs));
}
 
+   igt_describe("Basic error capture");
igt_subtest("error-state-basic")
test_error_state_basic();
 
+   igt_describe("Per engine error capture");
igt_subtest_with_dynamic("error-state-capture") {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
@@ -351,6 +355,7 @@ igt_main
}
}
 
+   igt_describe("Per engine hang recovery (spin)");
igt_subtest_with_dynamic("engine-hang") {
 int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
@@ -369,6 +374,7 @@ igt_main
}
}
 
+   igt_describe("Per engine hang recovery (invalid CS)");
igt_subtest_with_dynamic("engine-error") {
int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
@@ -386,6 +392,7 @@ igt_main
}
}
 
+   igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated();
 
-- 
2.25.1

[Intel-gfx] [PATCH i-g-t] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added an extra step to the i915_hangman tests to check that the
system is still alive after the hang and recovery. This submits a
simple batch to each engine which does a write to memory and checks
that the write occurred.

v2: Use _device_coherent instead of _wc for mapping memory to support
discrete boards.

Signed-off-by: John Harrison 
Reviewed-by: Matthew Brost 
---
 tests/i915/i915_hangman.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 5a0c9497c..73a86ec9e 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -48,8 +48,57 @@
 static int device = -1;
 static int sysfs = -1;
 
+#define OFFSET_ALIVE   10
+
 IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
 
+static void check_alive(void)
+{
+   const struct intel_execution_engine2 *engine;
+   const intel_ctx_t *ctx;
+   uint32_t scratch, *out;
+   int fd, i = 0;
+   uint64_t ahnd, scratch_addr;
+
+   fd = drm_open_driver(DRIVER_INTEL);
+   igt_require(gem_class_can_store_dword(fd, 0));
+
+   ctx = intel_ctx_create_all_physical(fd);
+   ahnd = get_reloc_ahnd(fd, ctx->id);
+   scratch = gem_create(fd, 4096);
+   scratch_addr = get_offset(ahnd, scratch, 4096, 0);
+   out = gem_mmap__device_coherent(fd, scratch, 0, 4096, PROT_WRITE | PROT_READ);
+   gem_set_domain(fd, scratch,
+   I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
+
+   for_each_physical_engine(fd, engine) {
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], 0);
+   i++;
+   }
+
+   i = 0;
+   for_each_ctx_engine(fd, ctx, engine) {
+   if (!gem_class_can_store_dword(fd, engine->class))
+   continue;
+
+   /* +OFFSET_ALIVE to ensure engine zero doesn't get a false negative */
+   igt_store_word(fd, ahnd, ctx, engine, -1, scratch, scratch_addr,
+  i + OFFSET_ALIVE, i + OFFSET_ALIVE);
+   i++;
+   }
+
+   gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
+
+   while (i--)
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], i + OFFSET_ALIVE);
+
+   munmap(out, 4096);
+   gem_close(fd, scratch);
+   put_ahnd(ahnd);
+   intel_ctx_destroy(fd, ctx);
+   close(fd);
+}
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -231,6 +280,8 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static void
@@ -289,6 +340,8 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahndN);
}
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static int hang_count;
@@ -321,6 +374,8 @@ static void test_hang_detector(const intel_ctx_t *ctx,
 
/* Did it work? */
igt_assert(hang_count == 1);
+
+   check_alive();
 }
 
 /* This test covers the case where we end up in an uninitialised area of the
@@ -356,6 +411,8 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
igt_force_gpu_reset(device);
igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
+
+   check_alive();
 }
 
 static void do_tests(const char *name, const char *prefix,
@@ -433,6 +490,8 @@ igt_main
igt_assert(sysfs != -1);
 
igt_require(has_error_state(sysfs));
+
+   gem_require_mmap_device_coherent(device);
}
 
igt_describe("Basic error capture");
-- 
2.25.1
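
The alive-ness check described in the commit message above can be sketched without a GPU. This is a minimal illustration only: fake_store() is a hypothetical stand-in for igt_store_word(), MAX_ENGINES is an assumed limit, and a plain array replaces the mapped scratch buffer. Only OFFSET_ALIVE and the three phases (verify zeroed, one write per engine, read back) come from the patch.

```c
#include <stdint.h>
#include <string.h>

#define OFFSET_ALIVE 10 /* same bias as the patch: engine 0 must not write 0 */
#define MAX_ENGINES  16 /* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for igt_store_word(): one dword write per "engine". */
static void fake_store(uint32_t *scratch, unsigned int offset, uint32_t value)
{
	scratch[offset] = value;
}

/* Returns the number of engines verified, or -1 if any check fails. */
static int check_alive_sketch(unsigned int num_engines)
{
	uint32_t scratch[OFFSET_ALIVE + MAX_ENGINES];
	unsigned int i;

	if (num_engines > MAX_ENGINES)
		return -1;

	memset(scratch, 0, sizeof(scratch));

	/* Phase 1: every target slot must start out as zero. */
	for (i = 0; i < num_engines; i++)
		if (scratch[i + OFFSET_ALIVE] != 0)
			return -1;

	/* Phase 2: each engine writes a distinct, non-zero value. */
	for (i = 0; i < num_engines; i++)
		fake_store(scratch, i + OFFSET_ALIVE, i + OFFSET_ALIVE);

	/* Phase 3: read back in reverse, as the patch does with while (i--). */
	i = num_engines;
	while (i--)
		if (scratch[i + OFFSET_ALIVE] != i + OFFSET_ALIVE)
			return -1;

	return (int)num_engines;
}
```

Because the stored value equals the biased index, a slot that was never written still reads as zero and cannot be mistaken for a successful write from engine zero.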

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:47AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The test was updating some engine properties but not restoring them
> afterwards. That would leave the system in a non-default state which
> could potentially affect subsequent tests. Fix it by using the new
> save/restore engine properties helper functions.
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  tests/i915/gem_exec_capture.c | 37 ++-
>  1 file changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
> index 9beb36fc7..51db07c41 100644
> --- a/tests/i915/gem_exec_capture.c
> +++ b/tests/i915/gem_exec_capture.c
> @@ -209,14 +209,21 @@ static int check_error_state(int dir, struct offset *obj_offsets, int obj_count,
>   return blobs;
>  }
>  
> -static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
> +static struct gem_engine_properties
> +configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
>  {
> + struct gem_engine_properties props;
> +
>   /* Ensure fast hang detection */
> - gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
> - gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
> + props.engine = e;
> + props.preempt_timeout = 250;
> + props.heartbeat_interval = 500;
> + gem_engine_properties_configure(fd, &props);
>  
>   /* Allow engine based resets and disable banning */
>   igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
> +
> + return props;
>  }
>  
>  static bool fence_busy(int fence)
> @@ -256,8 +263,9 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>   uint32_t *batch, *seqno;
>   struct offset offset;
>   int i, fence_out;
> + struct gem_engine_properties saved_engine;
>  
> - configure_hangs(fd, e, ctx->id);
> + saved_engine = configure_hangs(fd, e, ctx->id);
>  
>   memset(obj, 0, sizeof(obj));
>   obj[SCRATCH].handle = gem_create_in_memory_regions(fd, 4096, region);
> @@ -371,6 +379,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>   gem_close(fd, obj[BATCH].handle);
>   gem_close(fd, obj[NOCAPTURE].handle);
>   gem_close(fd, obj[SCRATCH].handle);
> +
> + gem_engine_properties_restore(fd, &saved_engine);
>  }
>  
>  static void capture(int fd, int dir, const intel_ctx_t *ctx,
> @@ -417,8 +427,9 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>   uint32_t *batch, *seqno;
>   struct offset *offsets;
>   int i, fence_out;
> + struct gem_engine_properties saved_engine;
>  
> - configure_hangs(fd, e, ctx->id);
> + saved_engine = configure_hangs(fd, e, ctx->id);
>  
>   offsets = calloc(count, sizeof(*offsets));
>   igt_assert(offsets);
> @@ -559,10 +570,12 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>  
>   qsort(offsets, count, sizeof(*offsets), cmp);
>   igt_assert(offsets[0].addr <= offsets[count-1].addr);
> +
> + gem_engine_properties_restore(fd, &saved_engine);
>   return offsets;
>  }
>  
> -#define find_first_available_engine(fd, ctx, e) \
> +#define find_first_available_engine(fd, ctx, e, saved) \
>   do { \
>   ctx = intel_ctx_create_all_physical(fd); \
>   igt_assert(ctx); \
> @@ -570,7 +583,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const intel_ctx_t *ctx,
>   for_each_if(gem_class_can_store_dword(fd, e->class)) \
>   break; \
>   igt_assert(e); \
> - configure_hangs(fd, e, ctx->id); \
> + saved = configure_hangs(fd, e, ctx->id); \
>   } while(0)
>  
>  static void many(int fd, int dir, uint64_t size, unsigned int flags)
> @@ -580,8 +593,9 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>   uint64_t ram, gtt, ahnd;
>   unsigned long count, blobs;
>   struct offset *offsets;
> + struct gem_engine_properties saved_engine;
>  
> - find_first_available_engine(fd, ctx, e);
> + find_first_available_engine(fd, ctx, e, saved_engine);
>  
>   gtt = gem_aperture_size(fd) / size;
>   ram = (intel_get_avail_ram_mb() << 20) / size;
> @@ -602,6 +616,8 @@ static void many(int fd, int dir, uint64_t size, unsigned int flags)
>  
>   free(offsets);
>   put_ahnd(ahnd);
> +
> + gem_engine_properties_restore(fd, &saved_engine);
>  }
>  
>  static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
> @@ -697,8 +713,9 @@ static void userptr(int fd, int dir)
>   void *ptr;
>   int obj_size = 4096;
>   uint32_t system_region = INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0);
> + struct gem_engine_properties saved_engine;
>  
> - find_first_available_engine(fd, ctx, e);
>

Re: [Intel-gfx] [PATCH v3 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:46AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Some platforms have very long timeouts configured for some engines.
> Some have them disabled completely. That makes for a very slow (or
> broken) hangman test. So explicitly configure the engines to have
> reasonable settings first.
> 
> Signed-off-by: John Harrison 
> ---
>  tests/i915/i915_hangman.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 567eb71ee..1a2b2cf7a 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -500,8 +500,12 @@ igt_main
>  {
>   const intel_ctx_t *ctx;
>   igt_hang_t hang = {};
> + struct gem_engine_properties saved_params[GEM_MAX_ENGINES];
> + int num_engines = 0;
>  
>   igt_fixture {
> + const struct intel_execution_engine2 *e;
> +
>   device = drm_open_driver(DRIVER_INTEL);
>   igt_require_gem(device);
>  
> @@ -515,6 +519,13 @@ igt_main
>   igt_require(has_error_state(sysfs));
>  
>   gem_require_mmap_wc(device);
> +
> + for_each_physical_engine(device, e) {
> + saved_params[num_engines].engine = e;
> + saved_params[num_engines].preempt_timeout = 500;
> + saved_params[num_engines].heartbeat_interval = 1000;
> + gem_engine_properties_configure(device, saved_params + num_engines++);
> + }
>   }
>  
>   igt_describe("Basic error capture");
> @@ -546,6 +557,11 @@ igt_main
>   do_tests("engine", "engine", ctx);
>  
>   igt_fixture {
> + int i;
> +
> + for (i = 0; i < num_engines; i++)
> + gem_engine_properties_restore(device, saved_params + i);

If you wanted to be clever:

while (num_engines--)
gem_engine_properties_restore(device, saved_params + num_engines);

Regardless:
Reviewed-by: Matthew Brost 

> +
>   igt_disallow_hang(device, hang);
>   intel_ctx_destroy(device, ctx);
>   close(device);
> -- 
> 2.25.1
>
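
The two restore loops discussed in the review above differ only in visiting order. A minimal sketch, assuming plain integers in place of struct gem_engine_properties and recording the visit order instead of writing to sysfs; restore_forward() and restore_reverse() are hypothetical names used only for illustration:

```c
/* Forward loop, as in the patch: visits index 0 .. n-1. */
static int restore_forward(const int *saved, int n, int *order)
{
	int i, visited = 0;

	for (i = 0; i < n; i++)
		order[visited++] = saved[i];

	return visited;
}

/*
 * Reverse loop, as suggested in the review: the post-decrement tests the
 * old count and leaves n as the index to use, so it visits n-1 .. 0.
 * It assumes n >= 0; a negative count would not terminate correctly.
 */
static int restore_reverse(const int *saved, int n, int *order)
{
	int visited = 0;

	while (n--)
		order[visited++] = saved[n];

	return visited;
}
```

Both forms restore every saved entry exactly once. The reverse form needs no separate index variable, but it consumes the counter, which is harmless in a teardown fixture.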

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:45AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Various tests want to configure engine properties such as pre-emption
> timeout and heartbeat interval. Some don't bother to restore the
> original values again afterwards. So, add a helper to make it easier
> to do this.
> 
> v2: Fix for platforms with no pre-emption capability.
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  lib/i915/gem_engine_topology.c | 46 ++
>  lib/i915/gem_engine_topology.h |  9 +++
>  2 files changed, 55 insertions(+)
> 
> diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
> index 729f42b0a..bd12d0bc9 100644
> --- a/lib/i915/gem_engine_topology.c
> +++ b/lib/i915/gem_engine_topology.c
> @@ -488,6 +488,52 @@ int gem_engine_property_printf(int i915, const char *engine, const char *attr,
>   return ret;
>  }
>  
> +/* Ensure fast hang detection */
> +void gem_engine_properties_configure(int fd, struct gem_engine_properties *params)
> +{
> + int ret;
> + struct gem_engine_properties write = *params;
> +
> + ret = gem_engine_property_scanf(fd, write.engine->name,
> + "heartbeat_interval_ms",
> + "%d", &params->heartbeat_interval);
> + igt_assert_eq(ret, 1);
> +
> + ret = gem_engine_property_printf(fd, write.engine->name,
> +  "heartbeat_interval_ms", "%d",
> +  write.heartbeat_interval);
> + igt_assert_lt(0, ret);
> +
> + if (gem_scheduler_has_preemption(fd)) {
> + ret = gem_engine_property_scanf(fd, write.engine->name,
> + "preempt_timeout_ms",
> + "%d", &params->preempt_timeout);
> + igt_assert_eq(ret, 1);
> +
> + ret = gem_engine_property_printf(fd, write.engine->name,
> +  "preempt_timeout_ms", "%d",
> +  write.preempt_timeout);
> + igt_assert_lt(0, ret);
> + }
> +}
> +
> +void gem_engine_properties_restore(int fd, const struct gem_engine_properties *saved)
> +{
> + int ret;
> +
> + ret = gem_engine_property_printf(fd, saved->engine->name,
> +  "heartbeat_interval_ms", "%d",
> +  saved->heartbeat_interval);
> + igt_assert_lt(0, ret);
> +
> + if (gem_scheduler_has_preemption(fd)) {
> + ret = gem_engine_property_printf(fd, saved->engine->name,
> +  "preempt_timeout_ms", "%d",
> +  saved->preempt_timeout);
> + igt_assert_lt(0, ret);
> + }
> +}
> +
>  uint32_t gem_engine_mmio_base(int i915, const char *engine)
>  {
>   unsigned int mmio = 0;
> diff --git a/lib/i915/gem_engine_topology.h b/lib/i915/gem_engine_topology.h
> index 4cfab560b..b413aa8ab 100644
> --- a/lib/i915/gem_engine_topology.h
> +++ b/lib/i915/gem_engine_topology.h
> @@ -115,6 +115,15 @@ struct intel_execution_engine2 gem_eb_flags_to_engine(unsigned int flags);
>((e__) = intel_get_current_physical_engine(__##e__)); \
>intel_next_engine(__##e__))
>  
> +struct gem_engine_properties {
> + const struct intel_execution_engine2 *engine;
> + int preempt_timeout;
> + int heartbeat_interval;
> +};
> +
> +void gem_engine_properties_configure(int fd, struct gem_engine_properties *params);
> +void gem_engine_properties_restore(int fd, const struct gem_engine_properties *saved);
> +
>  __attribute__((format(scanf, 4, 5)))
>  int gem_engine_property_scanf(int i915, const char *engine, const char *attr,
> const char *fmt, ...);
> -- 
> 2.25.1
>
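
The save-then-write pattern in the helper above can be shown in isolation. This is a sketch under stated assumptions: struct fake_engine is a hypothetical in-memory stand-in for the per-engine sysfs attributes, and props_configure()/props_restore() are illustrative names, not the IGT functions. The pre-emption capability check from the real patch is omitted.

```c
/* Hypothetical stand-in for the per-engine sysfs attributes. */
struct fake_engine {
	int preempt_timeout_ms;
	int heartbeat_interval_ms;
};

/* Same shape as struct gem_engine_properties in the patch. */
struct engine_props {
	struct fake_engine *engine;
	int preempt_timeout;
	int heartbeat_interval;
};

/*
 * Apply the requested values. On return, *params no longer holds the
 * requested values: it holds the values that were in effect before the
 * call, ready to be handed to props_restore().
 */
static void props_configure(struct engine_props *params)
{
	struct engine_props write = *params; /* copy of the requested values */

	params->heartbeat_interval = write.engine->heartbeat_interval_ms;
	write.engine->heartbeat_interval_ms = write.heartbeat_interval;

	params->preempt_timeout = write.engine->preempt_timeout_ms;
	write.engine->preempt_timeout_ms = write.preempt_timeout;
}

/* Write the previously saved values back. */
static void props_restore(const struct engine_props *saved)
{
	saved->engine->heartbeat_interval_ms = saved->heartbeat_interval;
	saved->engine->preempt_timeout_ms = saved->preempt_timeout;
}
```

Reusing one struct for both the request and the saved state keeps call sites short, at the cost that the caller must not reread the struct expecting the values it originally asked for.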

Re: [Intel-gfx] [PATCH i-g-t] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 01:26:53PM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The global context used by all the subtests for causing hangs is
> marked as unbannable. However, some of the subtests set background
> spinners running on all engines using a freshly created context. If
> there is a test failure for any reason, all of those spinners can be
> killed off as hanging contexts. On systems with lots of engines, that
> can result in the test being banned from creating any new contexts.
> 
> So make the spinner contexts unbannable as well. That way if one
> subtest fails it won't necessarily bring down all subsequent subtests.
> 
> v2: Simplify anti-banning code (review feedback from Matthew Brost).
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  tests/i915/i915_hangman.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 9f7f8062c..537ed35a5 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -284,6 +284,17 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
>   check_alive();
>  }
>  
> +static void context_unban(int fd, unsigned ctx)
> +{
> + struct drm_i915_gem_context_param param = {
> + .ctx_id = ctx,
> + .param = I915_CONTEXT_PARAM_BANNABLE,
> + .value = 0,
> + };
> +
> + gem_context_set_param(fd, &param);
> +}
> +
>  static void
>  test_engine_hang(const intel_ctx_t *ctx,
>const struct intel_execution_engine2 *e, unsigned int flags)
> @@ -307,6 +318,7 @@ test_engine_hang(const intel_ctx_t *ctx,
>   num_ctx = 0;
>   for_each_ctx_engine(device, ctx, other) {
>   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
> + context_unban(device, local_ctx[num_ctx]->id);
>   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
>   spin = __igt_spin_new(device,
> .ahnd = ahndN,
> -- 
> 2.25.1
>

[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Flip guc_id allocation partition (rev5)

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915: Flip guc_id allocation partition (rev5)
URL   : https://patchwork.freedesktop.org/series/98751/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_22000_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_22000_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22000_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_22000_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_workarounds@suspend-resume:
- shard-skl:  NOTRUN -> [INCOMPLETE][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-skl10/igt@gem_workarou...@suspend-resume.html

  
Known issues


  Here are the changes found in Patchwork_22000_full that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@feature_discovery@psr2:
- shard-iclb: [PASS][2] -> [SKIP][3] ([i915#658])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb2/igt@feature_discov...@psr2.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb7/igt@feature_discov...@psr2.html

  * igt@gem_exec_balancer@parallel-bb-first:
- shard-iclb: [PASS][4] -> [SKIP][5] ([i915#4525])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb4/igt@gem_exec_balan...@parallel-bb-first.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb5/igt@gem_exec_balan...@parallel-bb-first.html

  * igt@gem_exec_fair@basic-deadline:
- shard-skl:  NOTRUN -> [FAIL][6] ([i915#2846])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-skl4/igt@gem_exec_f...@basic-deadline.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
- shard-iclb: NOTRUN -> [FAIL][7] ([i915#2852])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb7/igt@gem_exec_fair@basic-none-r...@rcs0.html
- shard-glk:  NOTRUN -> [FAIL][8] ([i915#2842])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-glk7/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none@vecs0:
- shard-glk:  [PASS][9] -> [FAIL][10] ([i915#2842])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk4/igt@gem_exec_fair@basic-n...@vecs0.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-glk8/igt@gem_exec_fair@basic-n...@vecs0.html

  * igt@gem_lmem_swapping@basic:
- shard-skl:  NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#4613]) +2 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-skl7/igt@gem_lmem_swapp...@basic.html

  * igt@gem_lmem_swapping@random-engines:
- shard-kbl:  NOTRUN -> [SKIP][12] ([fdo#109271] / [i915#4613])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-kbl6/igt@gem_lmem_swapp...@random-engines.html

  * igt@gem_pxp@reject-modify-context-protection-off-3:
- shard-tglb: NOTRUN -> [SKIP][13] ([i915#4270])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-tglb3/igt@gem_...@reject-modify-context-protection-off-3.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled:
- shard-iclb: NOTRUN -> [SKIP][14] ([i915#768])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb7/igt@gem_render_c...@y-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gem_userptr_blits@dmabuf-sync:
- shard-iclb: NOTRUN -> [SKIP][15] ([i915#3323])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb7/igt@gem_userptr_bl...@dmabuf-sync.html
- shard-tglb: NOTRUN -> [SKIP][16] ([i915#3323])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-tglb7/igt@gem_userptr_bl...@dmabuf-sync.html

  * igt@gen9_exec_parse@shadow-peek:
- shard-tglb: NOTRUN -> [SKIP][17] ([i915#2527] / [i915#2856])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-tglb7/igt@gen9_exec_pa...@shadow-peek.html
- shard-iclb: NOTRUN -> [SKIP][18] ([i915#2856])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-iclb7/igt@gen9_exec_pa...@shadow-peek.html

  * igt@i915_pm_rpm@modeset-pc8-residency-stress:
- shard-tglb: NOTRUN -> [SKIP][19] ([fdo#109506] / [i915#2411])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/shard-tglb3/igt@i915_pm_...@modeset-pc8-residency-stress.html


[Intel-gfx] [PATCH i-g-t] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The global context used by all the subtests for causing hangs is
marked as unbannable. However, some of the subtests set background
spinners running on all engines using a freshly created context. If
there is a test failure for any reason, all of those spinners can be
killed off as hanging contexts. On systems with lots of engines, that
can result in the test being banned from creating any new contexts.

So make the spinner contexts unbannable as well. That way if one
subtest fails it won't necessarily bring down all subsequent subtests.

v2: Simplify anti-banning code (review feedback from Matthew Brost).

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 9f7f8062c..537ed35a5 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -284,6 +284,17 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_alive();
 }
 
+static void context_unban(int fd, unsigned ctx)
+{
+   struct drm_i915_gem_context_param param = {
+   .ctx_id = ctx,
+   .param = I915_CONTEXT_PARAM_BANNABLE,
+   .value = 0,
+   };
+
+   gem_context_set_param(fd, &param);
+}
+
 static void
 test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
@@ -307,6 +318,7 @@ test_engine_hang(const intel_ctx_t *ctx,
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   context_unban(device, local_ctx[num_ctx]->id);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
-- 
2.25.1

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John Harrison


On 1/13/2022 13:06, Matthew Brost wrote:
> On Thu, Jan 13, 2022 at 11:59:44AM -0800, john.c.harri...@intel.com wrote:
>> From: John Harrison 
>>
>> The update to use intel_ctx_t missed a line that configures the
>> context to allow hanging. Fix that.
>>
>> Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f
> Typically I thought the Fixes comment was the sha from "git log
> --oneline" + first line of commit message from that surrounded by ("").
>
> So:
> Fixes:  ("")

Oops, yeah. Not sure what happened here. Brain fart most likely ;).

John.

> With that fixed:
> Reviewed-by: Matthew Brost 
>
>> Signed-off-by: John Harrison 
>> ---
>>  tests/i915/gem_exec_fence.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
>> index 196236b27..5e45d0518 100644
>> --- a/tests/i915/gem_exec_fence.c
>> +++ b/tests/i915/gem_exec_fence.c
>> @@ -3139,7 +3139,7 @@ igt_main
>>   igt_hang_t hang;
>>  
>>   igt_fixture {
>> - hang = igt_allow_hang(i915, 0, 0);
>> + hang = igt_allow_hang(i915, ctx->id, 0);
>>   intel_allocator_multiprocess_start();
>>   }
>>  
>> --
>> 2.25.1

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John Harrison


On 1/13/2022 13:01, Matthew Brost wrote:
> On Thu, Jan 13, 2022 at 11:59:43AM -0800, john.c.harri...@intel.com wrote:
>> From: John Harrison 
>>
>> The global context used by all the subtests for causing hangs is
>> marked as unbannable. However, some of the subtests set background
>> spinners running on all engines using a freshly created context. If
>> there is a test failure for any reason, all of those spinners can be
>> killed off as hanging contexts. On systems with lots of engines, that
>> can result in the test being banned from creating any new contexts.
>>
>> So make the spinner contexts unbannable as well. That way if one
>> subtest fails it won't necessarily bring down all subsequent subtests.
>>
>> Signed-off-by: John Harrison 
>> ---
>>  tests/i915/i915_hangman.c | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
>> index 9f7f8062c..567eb71ee 100644
>> --- a/tests/i915/i915_hangman.c
>> +++ b/tests/i915/i915_hangman.c
>> @@ -284,6 +284,21 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
>>   check_alive();
>>  }
>>  
>> +static void context_unban(int fd, unsigned ctx)
>> +{
>> + struct drm_i915_gem_context_param param = {
>> + .ctx_id = ctx,
>> + .param = I915_CONTEXT_PARAM_BANNABLE,
> Looking at the kernel I don't see how I915_CONTEXT_PARAM_BANNABLE can
> return -EINVAL unless it is protected context.
>
>> + .value = 0,
>> + };
>> +
>> + if(__gem_context_set_param(fd, &param) == -EINVAL) {
>> + igt_assert_eq(param.value, 0);
>> + param.param = I915_CONTEXT_PARAM_BAN_PERIOD;
> Also this always returns -EINVAL.
>
> Probably can just go with:
>
> gem_context_set_param on original parameters.
>
> Matt

The code was just copied from 'context_set_ban' in igt_gt.c. Can't 
recall offhand why I didn't just call that function instead. There was 
some reason why it seemed better to clone it than to export the helper.

Just did a quick check of other tests that disable banning (sysfs_*, 
gem_exec_balancer, gem_exec_isolation) and they all just do a simple 
set_param(BANNABLE) and leave it at that. So I guess I'll just update 
this one to match as well.

John.

>> + gem_context_set_param(fd, &param);
>> + }
>> +}
>> +
>>  static void
>>  test_engine_hang(const intel_ctx_t *ctx,
>>    const struct intel_execution_engine2 *e, unsigned int flags)
>> @@ -307,6 +322,7 @@ test_engine_hang(const intel_ctx_t *ctx,
>>   num_ctx = 0;
>>   for_each_ctx_engine(device, ctx, other) {
>>   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
>> + context_unban(device, local_ctx[num_ctx]->id);
>>   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
>>   spin = __igt_spin_new(device,
>>     .ahnd = ahndN,
>> --
>> 2.25.1

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:44AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The update to use intel_ctx_t missed a line that configures the
> context to allow hanging. Fix that.
> 
> Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f

Typically I thought the Fixes comment was the sha from "git log
--oneline" + first line of commit message from that surrounded by ("").

So:
Fixes:  ("")

With that fixed:
Reviewed-by: Matthew Brost 

> 
> Signed-off-by: John Harrison 
> ---
>  tests/i915/gem_exec_fence.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
> index 196236b27..5e45d0518 100644
> --- a/tests/i915/gem_exec_fence.c
> +++ b/tests/i915/gem_exec_fence.c
> @@ -3139,7 +3139,7 @@ igt_main
>   igt_hang_t hang;
>  
>   igt_fixture {
> - hang = igt_allow_hang(i915, 0, 0);
> + hang = igt_allow_hang(i915, ctx->id, 0);
>   intel_allocator_multiprocess_start();
>   }
>  
> -- 
> 2.25.1
>
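
The tag layout described in the review above (abbreviated sha, then the commit's subject line wrapped in parentheses and double quotes) can be sketched as a small formatter. The function name format_fixes_tag() is hypothetical, and any sha or subject passed to it in examples is a placeholder, not the real commit referenced by the patch.

```c
#include <stdio.h>

/*
 * Build a "Fixes:" trailer of the form:
 *   Fixes: <abbreviated sha> ("<subject line>")
 * short_sha is expected to be the abbreviated hash as printed by
 * "git log --oneline"; subject is the first line of that commit message.
 * Returns the snprintf() result: the length the full tag needs, which is
 * >= len when the output was truncated.
 */
static int format_fixes_tag(char *buf, size_t len,
			    const char *short_sha, const char *subject)
{
	return snprintf(buf, len, "Fixes: %s (\"%s\")", short_sha, subject);
}
```

Callers should compare the return value against the buffer size to detect truncation before using the result.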

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:43AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The global context used by all the subtests for causing hangs is
> marked as unbannable. However, some of the subtests set background
> spinners running on all engines using a freshly created context. If
> there is a test failure for any reason, all of those spinners can be
> killed off as hanging contexts. On systems with lots of engines, that
> can result in the test being banned from creating any new contexts.
> 
> So make the spinner contexts unbannable as well. That way if one
> subtest fails it won't necessarily bring down all subsequent subtests.
> 
> Signed-off-by: John Harrison 
> ---
>  tests/i915/i915_hangman.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 9f7f8062c..567eb71ee 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -284,6 +284,21 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
>   check_alive();
>  }
>  
> +static void context_unban(int fd, unsigned ctx)
> +{
> + struct drm_i915_gem_context_param param = {
> + .ctx_id = ctx,
> + .param = I915_CONTEXT_PARAM_BANNABLE,

Looking at the kernel I don't see how I915_CONTEXT_PARAM_BANNABLE can
return -EINVAL unless it is protected context.

> + .value = 0,
> + };
> +
> + if(__gem_context_set_param(fd, &param) == -EINVAL) {
> + igt_assert_eq(param.value, 0);
> + param.param = I915_CONTEXT_PARAM_BAN_PERIOD;

Also this always returns -EINVAL.

Probably can just go with:

gem_context_set_param on original parameters.

Matt

> + gem_context_set_param(fd, &param);
> + }
> +}
> +
>  static void
>  test_engine_hang(const intel_ctx_t *ctx,
>const struct intel_execution_engine2 *e, unsigned int flags)
> @@ -307,6 +322,7 @@ test_engine_hang(const intel_ctx_t *ctx,
>   num_ctx = 0;
>   for_each_ctx_engine(device, ctx, other) {
>   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
> + context_unban(device, local_ctx[num_ctx]->id);
>   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
>   spin = __igt_spin_new(device,
> .ahnd = ahndN,
> -- 
> 2.25.1
>

Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] lib/store: Refactor common store code into helper function

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 12:50:29PM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> A lot of tests use almost identical code for creating a batch buffer
> which does a single write to memory and another is about to be added.
> Instead, move the most generic version into a common helper function.
> Unfortunately, the other instances are all subtly different enough to
> make it not so trivial to try to use the helper. It could be done but
> it is unclear if it is worth the effort at this point. This patch
> proves the concept, if people like it enough then it can be extended.
> 
> v2: Fix up object address vs store offset confusion (with help from
> Zbigniew K).
> v3: Cope with >32bit store_offset (review feedback from Matthew Brost).
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  lib/igt_store.c | 100 
>  lib/igt_store.h |  12 +
>  lib/meson.build |   1 +
>  tests/i915/gem_exec_fence.c |  77 ++-
>  tests/i915/i915_hangman.c   |   1 +
>  5 files changed, 119 insertions(+), 72 deletions(-)
>  create mode 100644 lib/igt_store.c
>  create mode 100644 lib/igt_store.h
> 
> diff --git a/lib/igt_store.c b/lib/igt_store.c
> new file mode 100644
> index 0..98c6c4fbd
> --- /dev/null
> +++ b/lib/igt_store.c
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include "i915/gem_create.h"
> +#include "igt_core.h"
> +#include "drmtest.h"
> +#include "igt_store.h"
> +#include "intel_chipset.h"
> +#include "intel_reg.h"
> +#include "ioctl_wrappers.h"
> +#include "lib/intel_allocator.h"
> +
> +/**
> + * SECTION:igt_store_word
> + * @short_description: Library for writing a value to memory
> + * @title: StoreWord
> + * @include: igt.h
> + *
> + * A lot of igt testcases need some mechanism for writing a value to memory
> + * as a test that a batch buffer has executed.
> + *
> + * NB: Requires master for STORE_DWORD on gen4/5.
> + */
> +void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
> + const struct intel_execution_engine2 *e,
> + int fence, uint32_t target_handle,
> + uint64_t target_gpu_addr,
> + uint64_t store_offset, uint32_t store_value)
> +{
> + const int SCRATCH = 0;
> + const int BATCH = 1;
> + const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
> + struct drm_i915_gem_exec_object2 obj[2];
> + struct drm_i915_gem_relocation_entry reloc;
> + struct drm_i915_gem_execbuffer2 execbuf;
> + uint32_t batch[16];
> + uint64_t bb_offset, delta;
> + int i;
> +
> + memset(&execbuf, 0, sizeof(execbuf));
> + execbuf.buffers_ptr = to_user_pointer(obj);
> + execbuf.buffer_count = ARRAY_SIZE(obj);
> + execbuf.flags = e->flags;
> + execbuf.rsvd1 = ctx->id;
> + if (fence != -1) {
> + execbuf.flags |= I915_EXEC_FENCE_IN;
> + execbuf.rsvd2 = fence;
> + }
> + if (gen < 6)
> + execbuf.flags |= I915_EXEC_SECURE;
> +
> + memset(obj, 0, sizeof(obj));
> + obj[SCRATCH].handle = target_handle;
> +
> + obj[BATCH].handle = gem_create(fd, 4096);
> + obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
> + obj[BATCH].relocation_count = !ahnd ? 1 : 0;
> + bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
> + memset(&reloc, 0, sizeof(reloc));
> +
> + i = 0;
> + delta = sizeof(uint32_t) * store_offset;
> + if (!ahnd) {
> + reloc.target_handle = obj[SCRATCH].handle;
> + reloc.presumed_offset = -1;
> + reloc.offset = sizeof(uint32_t) * (i + 1);
> + reloc.delta = lower_32_bits(delta);
> + igt_assert_eq(upper_32_bits(delta), 0);
> + reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> + reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
> + } else {
> + obj[SCRATCH].offset = target_gpu_addr;
> + obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> + obj[BATCH].offset = bb_offset;
> + obj[BATCH].flags |= EXEC_OBJECT_PINNED;
> + }
> + batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> + if (gen >= 8) {
> + uint64_t addr = target_gpu_addr + delta;
> + batch[++i] = lower_32_bits(addr);
> + batch[++i] = upper_32_bits(addr);
> + } else if (gen >= 4) {
> + batch[++i] = 0;
> + batch[++i] = lower_32_bits(delta);
> + igt_assert_eq(upper_32_bits(delta), 0);
> + reloc.offset += sizeof(uint32_t);
> + } else {
> + batch[i]--;
> + batch[++i] = lower_32_bits(delta);
> + igt_assert_eq(upper_32_bits(delta), 0);
> + }
> + batch[++i] = store_value;
> + batch[++i] = MI_BATCH_BUFFER_END;
> + gem_write(fd, obj[BATCH].handle, 0, batch,

Re: [Intel-gfx] [PATCH v3 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:42AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> As opposed to only on the non-target engines. This means that there is
> some other workload present for the scheduler to switch between and so
> detect the hang immediately.
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  tests/i915/i915_hangman.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 6601db5f6..9f7f8062c 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -298,12 +298,14 @@ test_engine_hang(const intel_ctx_t *ctx,
>   igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
>   gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
>  
> - /* Fill all the other engines with background load */
> + /*
> +  * Fill all engines with background load.
> +  * This verifies that independent engines are unaffected and gives
> +  * the target engine something to switch between so it notices the
> +  * hang.
> +  */
>   num_ctx = 0;
>   for_each_ctx_engine(device, ctx, other) {
> - if (other->flags == e->flags)
> - continue;
> -
>   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
>   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
>   spin = __igt_spin_new(device,
> -- 
> 2.25.1
>

[Intel-gfx] ✗ Fi.CI.IGT: failure for Remove some hacks required for GuC 62.0.0 (rev2)

2022-01-13 Thread Patchwork

== Series Details ==

Series: Remove some hacks required for GuC 62.0.0 (rev2)
URL   : https://patchwork.freedesktop.org/series/98773/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_21999_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_21999_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21999_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_21999_full:

### IGT changes ###

 Possible regressions 

  * igt@i915_suspend@fence-restore-tiled2untiled:
- shard-skl:  NOTRUN -> [INCOMPLETE][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-skl9/igt@i915_susp...@fence-restore-tiled2untiled.html

  
Known issues


  Here are the changes found in Patchwork_21999_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@feature_discovery@psr2:
- shard-iclb: [PASS][2] -> [SKIP][3] ([i915#658])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb2/igt@feature_discov...@psr2.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-iclb8/igt@feature_discov...@psr2.html

  * igt@gem_eio@in-flight-contexts-immediate:
- shard-skl:  [PASS][4] -> [TIMEOUT][5] ([i915#3063])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-skl10/igt@gem_...@in-flight-contexts-immediate.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-skl5/igt@gem_...@in-flight-contexts-immediate.html

  * igt@gem_exec_balancer@parallel-keep-submit-fence:
- shard-iclb: [PASS][6] -> [SKIP][7] ([i915#4525]) +1 similar issue
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb1/igt@gem_exec_balan...@parallel-keep-submit-fence.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-iclb7/igt@gem_exec_balan...@parallel-keep-submit-fence.html

  * igt@gem_exec_capture@pi@rcs0:
- shard-skl:  [PASS][8] -> [INCOMPLETE][9] ([i915#4547])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-skl4/igt@gem_exec_capture@p...@rcs0.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-skl7/igt@gem_exec_capture@p...@rcs0.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
- shard-iclb: NOTRUN -> [FAIL][10] ([i915#2842])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-iclb1/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
- shard-apl:  [PASS][11] -> [FAIL][12] ([i915#2842])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-apl4/igt@gem_exec_fair@basic-none-s...@rcs0.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-apl1/igt@gem_exec_fair@basic-none-s...@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
- shard-tglb: NOTRUN -> [FAIL][13] ([i915#2842])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-tglb8/igt@gem_exec_fair@basic-pace-sh...@rcs0.html

  * igt@gem_lmem_swapping@heavy-verify-random:
- shard-kbl:  NOTRUN -> [SKIP][14] ([fdo#109271] / [i915#4613]) +2 
similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-kbl7/igt@gem_lmem_swapp...@heavy-verify-random.html
- shard-skl:  NOTRUN -> [SKIP][15] ([fdo#109271] / [i915#4613])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-skl9/igt@gem_lmem_swapp...@heavy-verify-random.html

  * igt@gem_pread@exhaustion:
- shard-skl:  NOTRUN -> [WARN][16] ([i915#2658])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-skl2/igt@gem_pr...@exhaustion.html

  * igt@gem_pwrite@basic-exhaustion:
- shard-kbl:  NOTRUN -> [WARN][17] ([i915#2658])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-kbl7/igt@gem_pwr...@basic-exhaustion.html

  * igt@gem_pxp@reject-modify-context-protection-off-3:
- shard-tglb: NOTRUN -> [SKIP][18] ([i915#4270])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-tglb8/igt@gem_...@reject-modify-context-protection-off-3.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-vebox-y-tiled:
- shard-iclb: NOTRUN -> [SKIP][19] ([i915#768])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/shard-iclb1/igt@gem_render_c...@y-tiled-mc-ccs-to-vebox-y-tiled.html

  * igt@gem_userptr_blits@dmabuf-sync:
- shard-iclb: NOTRUN ->

[Intel-gfx] [PATCH i-g-t] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison

From: John Harrison 

A lot of tests use almost identical code for creating a batch buffer
which does a single write to memory and another is about to be added.
Instead, move the most generic version into a common helper function.
Unfortunately, the other instances are all subtly different enough to
make it not so trivial to try to use the helper. It could be done but
it is unclear if it is worth the effort at this point. This patch
proves the concept, if people like it enough then it can be extended.

v2: Fix up object address vs store offset confusion (with help from
Zbigniew K).
v3: Cope with >32bit store_offset (review feedback from Matthew Brost).

Signed-off-by: John Harrison 
---
 lib/igt_store.c | 100 
 lib/igt_store.h |  12 +
 lib/meson.build |   1 +
 tests/i915/gem_exec_fence.c |  77 ++-
 tests/i915/i915_hangman.c   |   1 +
 5 files changed, 119 insertions(+), 72 deletions(-)
 create mode 100644 lib/igt_store.c
 create mode 100644 lib/igt_store.h

diff --git a/lib/igt_store.c b/lib/igt_store.c
new file mode 100644
index 0..98c6c4fbd
--- /dev/null
+++ b/lib/igt_store.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915/gem_create.h"
+#include "igt_core.h"
+#include "drmtest.h"
+#include "igt_store.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+#include "lib/intel_allocator.h"
+
+/**
+ * SECTION:igt_store_word
+ * @short_description: Library for writing a value to memory
+ * @title: StoreWord
+ * @include: igt.h
+ *
+ * A lot of igt testcases need some mechanism for writing a value to memory
+ * as a test that a batch buffer has executed.
+ *
+ * NB: Requires master for STORE_DWORD on gen4/5.
+ */
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t target_handle,
+   uint64_t target_gpu_addr,
+   uint64_t store_offset, uint32_t store_value)
+{
+   const int SCRATCH = 0;
+   const int BATCH = 1;
+   const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
+   struct drm_i915_gem_exec_object2 obj[2];
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   uint32_t batch[16];
+   uint64_t bb_offset, delta;
+   int i;
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = to_user_pointer(obj);
+   execbuf.buffer_count = ARRAY_SIZE(obj);
+   execbuf.flags = e->flags;
+   execbuf.rsvd1 = ctx->id;
+   if (fence != -1) {
+   execbuf.flags |= I915_EXEC_FENCE_IN;
+   execbuf.rsvd2 = fence;
+   }
+   if (gen < 6)
+   execbuf.flags |= I915_EXEC_SECURE;
+
+   memset(obj, 0, sizeof(obj));
+   obj[SCRATCH].handle = target_handle;
+
+   obj[BATCH].handle = gem_create(fd, 4096);
+   obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
+   obj[BATCH].relocation_count = !ahnd ? 1 : 0;
+   bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
+   memset(&reloc, 0, sizeof(reloc));
+
+   i = 0;
+   delta = sizeof(uint32_t) * store_offset;
+   if (!ahnd) {
+   reloc.target_handle = obj[SCRATCH].handle;
+   reloc.presumed_offset = -1;
+   reloc.offset = sizeof(uint32_t) * (i + 1);
+   reloc.delta = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+   reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
+   } else {
+   obj[SCRATCH].offset = target_gpu_addr;
+   obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+   obj[BATCH].offset = bb_offset;
+   obj[BATCH].flags |= EXEC_OBJECT_PINNED;
+   }
+   batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+   if (gen >= 8) {
+   uint64_t addr = target_gpu_addr + delta;
+   batch[++i] = lower_32_bits(addr);
+   batch[++i] = upper_32_bits(addr);
+   } else if (gen >= 4) {
+   batch[++i] = 0;
+   batch[++i] = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   reloc.offset += sizeof(uint32_t);
+   } else {
+   batch[i]--;
+   batch[++i] = lower_32_bits(delta);
+   igt_assert_eq(upper_32_bits(delta), 0);
+   }
+   batch[++i] = store_value;
+   batch[++i] = MI_BATCH_BUFFER_END;
+   gem_write(fd, obj[BATCH].handle, 0, batch, sizeof(batch));
+   gem_execbuf(fd, &execbuf);
+   gem_close(fd, obj[BATCH].handle);
+   put_offset(ahnd, obj[BATCH].handle);
+}
diff --git a/lib/igt_store.h b/lib/igt_store.h
new file mode 100644
index 0..5c6c8263c
---
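
For anyone following the v2 -> v3 change, here is a minimal standalone sketch of the arithmetic involved: keep the byte delta in 64 bits, check that it fits before writing it into a 32-bit relocation field, and split the final address into two dwords. The helper names (lo32, hi32, store_delta, delta_fits_reloc, emit_addr64) are hypothetical stand-ins made up for this illustration, not IGT API, and the dword order simply mirrors the gen8+ branch of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-ins for lower_32_bits()/upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Byte offset of a dword index, kept in 64 bits so it cannot truncate. */
static inline uint64_t store_delta(uint64_t store_offset)
{
	return (uint64_t)sizeof(uint32_t) * store_offset;
}

/* Non-zero if the delta can be stored in a 32-bit relocation delta field. */
static inline int delta_fits_reloc(uint64_t delta)
{
	return hi32(delta) == 0;
}

/* Write address + delta as two dwords, low half first. */
static inline void emit_addr64(uint32_t *out, uint64_t target_gpu_addr,
			       uint64_t delta)
{
	uint64_t addr = target_gpu_addr + delta;

	out[0] = lo32(addr);
	out[1] = hi32(addr);
}
```

With a 32-bit delta (as in v2), a dword index of 0x40000000 would silently wrap to a byte offset of zero; with the 64-bit version the carry into the upper half is preserved and can be asserted on.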

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 12:42:28PM -0800, John Harrison wrote:
> On 1/13/2022 12:30, Matthew Brost wrote:
> > On Thu, Jan 13, 2022 at 11:59:41AM -0800, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > The hang test was relying on context persistence for no particular
> > > reason. That is, it would set a bunch of background spinners running
> > > then immediately destroy the active contexts but expect the spinners
> > > to keep spinning. With the current implementation of context
> > > persistence in i915, that means that super high priority pings are
> > > sent to each engine at the start of the test. Depending upon the
> > > timing and platform, one of those unexpected pings could cause test
> > > failures.
> > > 
> > > There is no need to require context persistence in this test. So change
> > > to managing the contexts cleanly and only destroying them when they
> > > are no longer in use.
> > > 
> > > Signed-off-by: John Harrison 
> > > ---
> > >   tests/i915/i915_hangman.c | 15 ++-
> > >   1 file changed, 10 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> > > index 918418760..6601db5f6 100644
> > > --- a/tests/i915/i915_hangman.c
> > > +++ b/tests/i915/i915_hangman.c
> > > @@ -289,27 +289,29 @@ test_engine_hang(const intel_ctx_t *ctx,
> > >const struct intel_execution_engine2 *e, unsigned int 
> > > flags)
> > >   {
> > >   const struct intel_execution_engine2 *other;
> > > - const intel_ctx_t *tmp_ctx;
> > > + const intel_ctx_t *local_ctx[GEM_MAX_ENGINES];
> > This is fine for now as GEM_MAX_ENGINES is relatively small but what if
> > we change this to large value, let's say 4k? I think the stack could
> > overflow then. Maybe not a concern, maybe it is? I'll leave this up to
> > if this should be kmalloc'd or not in the next rev.
> Seems unlikely we are going that big any time soon. And such stack reduction
> can always be done as part of any huge engine count update. Although, this
> is userland not kernel - you can slap gigabytes on the stack and it won't
> blow up ;).
> 

Right, I realized after I sent this that the stack in userland matters far
less. Should be fine.

Matt

> John.
> 
> > Everything else looks good.
> > 
> > With that:
> > Reviewed-by: Matthew Brost 
> > 
> > >   igt_spin_t *spin, *next;
> > >   IGT_LIST_HEAD(list);
> > >   uint64_t ahnd = get_reloc_ahnd(device, ctx->id), ahndN;
> > > + int num_ctx;
> > >   igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
> > >   gem_engine_has_cmdparser(device, >cfg, 
> > > e->flags));
> > >   /* Fill all the other engines with background load */
> > > + num_ctx = 0;
> > >   for_each_ctx_engine(device, ctx, other) {
> > >   if (other->flags == e->flags)
> > >   continue;
> > > - tmp_ctx = intel_ctx_create(device, >cfg);
> > > - ahndN = get_reloc_ahnd(device, tmp_ctx->id);
> > > + local_ctx[num_ctx] = intel_ctx_create(device, >cfg);
> > > + ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
> > >   spin = __igt_spin_new(device,
> > > .ahnd = ahndN,
> > > -   .ctx = tmp_ctx,
> > > +   .ctx = local_ctx[num_ctx],
> > > .engine = other->flags,
> > > .flags = IGT_SPIN_FENCE_OUT);
> > > - intel_ctx_destroy(device, tmp_ctx);
> > > + num_ctx++;
> > >   igt_list_move(>link, );
> > >   }
> > > @@ -339,7 +341,10 @@ test_engine_hang(const intel_ctx_t *ctx,
> > >   igt_spin_free(device, spin);
> > >   put_ahnd(ahndN);
> > >   }
> > > +
> > >   put_ahnd(ahnd);
> > > + while (num_ctx)
> > > + intel_ctx_destroy(device, local_ctx[--num_ctx]);
> > >   check_alive();
> > >   }
> > > -- 
> > > 2.25.1
> > > 
>
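
To make the lifetime change in this patch concrete, below is a rough userspace sketch of the pattern: keep every context created in the loop in a fixed-size array, let the work run, and only destroy the contexts afterwards in reverse order. Everything here (fake_ctx, fake_ctx_create, fake_ctx_destroy, MAX_ENGINES, run_with_contexts) is a hypothetical stand-in invented for illustration and does not touch the real intel_ctx API.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ENGINES 64 /* hypothetical bound standing in for GEM_MAX_ENGINES */

struct fake_ctx { int id; };	/* stand-in for a GPU context */
static int live_ctx;		/* number of contexts currently alive */

static struct fake_ctx *fake_ctx_create(int id)
{
	struct fake_ctx *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	live_ctx++;
	return c;
}

static void fake_ctx_destroy(struct fake_ctx *c)
{
	free(c);
	live_ctx--;
}

/*
 * Create one context per engine, keep them all alive while the "work"
 * phase runs, then destroy them. Returns how many contexts were alive
 * during the work phase.
 */
static int run_with_contexts(int num_engines)
{
	struct fake_ctx *local_ctx[MAX_ENGINES];
	int num_ctx = 0, alive_during_work;

	for (int e = 0; e < num_engines && num_ctx < MAX_ENGINES; e++) {
		struct fake_ctx *c = fake_ctx_create(e);

		if (!c)
			break;
		local_ctx[num_ctx++] = c;
	}

	alive_during_work = live_ctx;	/* spinners would run here */

	while (num_ctx)
		fake_ctx_destroy(local_ctx[--num_ctx]);

	return alive_during_work;
}
```

The array costs one pointer per engine on the stack, which matches the conclusion above that the fixed-size array is fine for a userspace test.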

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John Harrison


On 1/13/2022 12:30, Matthew Brost wrote:

On Thu, Jan 13, 2022 at 11:59:41AM -0800, john.c.harri...@intel.com wrote:

From: John Harrison 

The hang test was relying on context persistence for no particular
reason. That is, it would set a bunch of background spinners running
then immediately destroy the active contexts but expect the spinners
to keep spinning. With the current implementation of context
persistence in i915, that means that super high priority pings are
sent to each engine at the start of the test. Depending upon the
timing and platform, one of those unexpected pings could cause test
failures.

There is no need to require context persistence in this test. So change
to managing the contexts cleanly and only destroying them when they
are no longer in use.

Signed-off-by: John Harrison 
---
  tests/i915/i915_hangman.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 918418760..6601db5f6 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -289,27 +289,29 @@ test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
  {
const struct intel_execution_engine2 *other;
-   const intel_ctx_t *tmp_ctx;
+   const intel_ctx_t *local_ctx[GEM_MAX_ENGINES];

This is fine for now as GEM_MAX_ENGINES is relatively small but what if
we change this to large value, let's say 4k? I think the stack could
overflow then. Maybe not a concern, maybe it is? I'll leave this up to
if this should be kmalloc'd or not in the next rev.

Seems unlikely we are going that big any time soon. And such stack
reduction can always be done as part of any huge engine count update.
Although, this is userland not kernel - you can slap gigabytes on the
stack and it won't blow up ;).


John.


Everything else looks good.

With that:
Reviewed-by: Matthew Brost 


igt_spin_t *spin, *next;
IGT_LIST_HEAD(list);
uint64_t ahnd = get_reloc_ahnd(device, ctx->id), ahndN;
+   int num_ctx;
  
  	igt_skip_on(flags & IGT_SPIN_INVALID_CS &&

gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
  
  	/* Fill all the other engines with background load */

+   num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
if (other->flags == e->flags)
continue;
  
-		tmp_ctx = intel_ctx_create(device, &ctx->cfg);

-   ahndN = get_reloc_ahnd(device, tmp_ctx->id);
+   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
- .ctx = tmp_ctx,
+ .ctx = local_ctx[num_ctx],
  .engine = other->flags,
  .flags = IGT_SPIN_FENCE_OUT);
-   intel_ctx_destroy(device, tmp_ctx);
+   num_ctx++;
  
  		igt_list_move(&spin->link, &list);

}
@@ -339,7 +341,10 @@ test_engine_hang(const intel_ctx_t *ctx,
igt_spin_free(device, spin);
put_ahnd(ahndN);
}
+
put_ahnd(ahnd);
+   while (num_ctx)
+   intel_ctx_destroy(device, local_ctx[--num_ctx]);
  
  	check_alive();

  }
--
2.25.1

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John Harrison


On 1/13/2022 12:23, Matthew Brost wrote:

On Thu, Jan 13, 2022 at 12:27:00PM -0800, John Harrison wrote:

On 1/13/2022 12:10, Matthew Brost wrote:

On Thu, Jan 13, 2022 at 11:59:39AM -0800, john.c.harri...@intel.com wrote:

From: John Harrison 

A lot of tests use almost identical code for creating a batch buffer
which does a single write to memory and another is about to be added.
Instead, move the most generic version into a common helper function.
Unfortunately, the other instances are all subtly different enough to
make it not so trivial to try to use the helper. It could be done but
it is unclear if it is worth the effort at this point. This patch
proves the concept, if people like it enough then it can be extended.

v2: Fix up object address vs store offset confusion (with help from
Zbigniew K).

Signed-off-by: John Harrison 
---
   lib/igt_store.c | 96 +
   lib/igt_store.h | 12 +
   lib/meson.build |  1 +
   tests/i915/gem_exec_fence.c | 77 ++---
   tests/i915/i915_hangman.c   |  1 +
   5 files changed, 115 insertions(+), 72 deletions(-)
   create mode 100644 lib/igt_store.c
   create mode 100644 lib/igt_store.h

diff --git a/lib/igt_store.c b/lib/igt_store.c
new file mode 100644
index 0..42c888b55
--- /dev/null
+++ b/lib/igt_store.c
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915/gem_create.h"
+#include "igt_core.h"
+#include "drmtest.h"
+#include "igt_store.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+#include "lib/intel_allocator.h"
+
+/**
+ * SECTION:igt_store_word
+ * @short_description: Library for writing a value to memory
+ * @title: StoreWord
+ * @include: igt.h
+ *
+ * A lot of igt testcases need some mechanism for writing a value to memory
+ * as a test that a batch buffer has executed.
+ *
+ * NB: Requires master for STORE_DWORD on gen4/5.
+ */
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t target_handle,
+   uint64_t target_gpu_addr,
+   uint64_t store_offset, uint32_t store_value)
+{
+   const int SCRATCH = 0;
+   const int BATCH = 1;
+   const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
+   struct drm_i915_gem_exec_object2 obj[2];
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   uint32_t batch[16], delta;
+   uint64_t bb_offset;
+   int i;
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = to_user_pointer(obj);
+   execbuf.buffer_count = ARRAY_SIZE(obj);
+   execbuf.flags = e->flags;
+   execbuf.rsvd1 = ctx->id;
+   if (fence != -1) {
+   execbuf.flags |= I915_EXEC_FENCE_IN;
+   execbuf.rsvd2 = fence;
+   }
+   if (gen < 6)
+   execbuf.flags |= I915_EXEC_SECURE;
+
+   memset(obj, 0, sizeof(obj));
+   obj[SCRATCH].handle = target_handle;
+
+   obj[BATCH].handle = gem_create(fd, 4096);
+   obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
+   obj[BATCH].relocation_count = !ahnd ? 1 : 0;
+   bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
+   memset(&reloc, 0, sizeof(reloc));
+
+   i = 0;
+   delta = sizeof(uint32_t) * store_offset;

Can't this overflow the delta as store_offset is a u64?

Oops.

Yeah, this code was a right mess of data words being used as addresses and
random copies supporting 64bit or only 32bit offsets. I believe it's
currently fine as even platforms which can theoretically support >32bits
don't actually use it. But yes, will repost with a 64bit version of delta.


+   if (!ahnd) {
+   reloc.target_handle = obj[SCRATCH].handle;
+   reloc.presumed_offset = -1;
+   reloc.offset = sizeof(uint32_t) * (i + 1);

Then just be safe, probably assert the upper 32 bits of delta are clear too.

Indeed. And in the 
Matt


+   reloc.delta = delta;
+   reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+   reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
+   } else {
+   obj[SCRATCH].offset = target_gpu_addr;
+   obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+   obj[BATCH].offset = bb_offset;
+   obj[BATCH].flags |= EXEC_OBJECT_PINNED;
+   }
+   batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+   if (gen >= 8) {
+   batch[++i] = target_gpu_addr + delta;
+   batch[++i] = (target_gpu_addr + delta) >> 32;

This is different from the previous code, presumably this is fixing a
bug where delta + bits 31:0 of target_gpu_addr overflows into the upper
32 bits?

Matt

Yeah, some copies of this code were definitely broken for >32bit

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:41AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The hang test was relying on context persistence for no particular
> reason. That is, it would set a bunch of background spinners running
> then immediately destroy the active contexts but expect the spinners
> to keep spinning. With the current implementation of context
> persistence in i915, that means that super high priority pings are
> sent to each engine at the start of the test. Depending upon the
> timing and platform, one of those unexpected pings could cause test
> failures.
> 
> There is no need to require context persistence in this test. So change
> to managing the contexts cleanly and only destroying them when they
> are no longer in use.
> 
> Signed-off-by: John Harrison 
> ---
>  tests/i915/i915_hangman.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 918418760..6601db5f6 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -289,27 +289,29 @@ test_engine_hang(const intel_ctx_t *ctx,
>const struct intel_execution_engine2 *e, unsigned int flags)
>  {
>   const struct intel_execution_engine2 *other;
> - const intel_ctx_t *tmp_ctx;
> + const intel_ctx_t *local_ctx[GEM_MAX_ENGINES];

This is fine for now as GEM_MAX_ENGINES is relatively small but what if
we change this to large value, let's say 4k? I think the stack could
overflow then. Maybe not a concern, maybe it is? I'll leave this up to
if this should be kmalloc'd or not in the next rev.

Everything else looks good.

With that:
Reviewed-by: Matthew Brost 

>   igt_spin_t *spin, *next;
>   IGT_LIST_HEAD(list);
>   uint64_t ahnd = get_reloc_ahnd(device, ctx->id), ahndN;
> + int num_ctx;
>  
>   igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
>   gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
>  
>   /* Fill all the other engines with background load */
> + num_ctx = 0;
>   for_each_ctx_engine(device, ctx, other) {
>   if (other->flags == e->flags)
>   continue;
>  
> - tmp_ctx = intel_ctx_create(device, &ctx->cfg);
> - ahndN = get_reloc_ahnd(device, tmp_ctx->id);
> + local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
> + ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
>   spin = __igt_spin_new(device,
> .ahnd = ahndN,
> -   .ctx = tmp_ctx,
> +   .ctx = local_ctx[num_ctx],
> .engine = other->flags,
> .flags = IGT_SPIN_FENCE_OUT);
> - intel_ctx_destroy(device, tmp_ctx);
> + num_ctx++;
>  
>   igt_list_move(&spin->link, &list);
>   }
> @@ -339,7 +341,10 @@ test_engine_hang(const intel_ctx_t *ctx,
>   igt_spin_free(device, spin);
>   put_ahnd(ahndN);
>   }
> +
>   put_ahnd(ahnd);
> + while (num_ctx)
> + intel_ctx_destroy(device, local_ctx[--num_ctx]);
>  
>   check_alive();
>  }
> -- 
> 2.25.1
>

Re: [Intel-gfx] [PATCH v2 1/1] drm/i915/pxp: Hold RPM wakelock during PXP unbind

2022-01-13 Thread Rodrigo Vivi

On Thu, Jan 06, 2022 at 12:02:36PM -0800, Juston Li wrote:
> Similar to commit b8d8436840ca ("drm/i915/gt: Hold RPM wakelock during
> PXP suspend") but to fix the same warning for unbind during shutdown:
> 
> [ cut here ]
> RPM wakelock ref not held during HW access
> WARNING: CPU: 0 PID: 4139 at drivers/gpu/drm/i915/intel_runtime_pm.h:115
> gen12_fwtable_write32+0x1b7/0
> Modules linked in: 8021q ccm rfcomm cmac algif_hash algif_skcipher
> af_alg uinput snd_hda_codec_hdmi vf industrialio iwl7000_mac80211
> cros_ec_sensorhub lzo_rle lzo_compress zram iwlwifi cfg80211 joydev
> CPU: 0 PID: 4139 Comm: halt Tainted: G U  W
> 5.10.84 #13 344e11e079c4a03940d949e537eab645f6
> RIP: 0010:gen12_fwtable_write32+0x1b7/0x200
> Code: 48 c7 c7 fc b3 b5 89 31 c0 e8 2c f3 ad ff 0f 0b e9 04 ff ff ff c6
> 05 71 e9 1d 01 01 48 c7 c7 d67
> RSP: 0018:a09ec0bb3bb0 EFLAGS: 00010246
> RAX: 12dde97bbd260300 RBX: 000320f0 RCX: 89e60ea0
> RDX:  RSI: dfff RDI: 89e60e70
> RBP: a09ec0bb3bd8 R08:  R09: a09ec0bb3950
> R10: dfff R11: 89e91160 R12: 
> R13: 28121969 R14: 9515c32f0990 R15: 4000
> FS:  790dcf225740() GS:95173780() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 58b25efae147 CR3: 000133ea6001 CR4: 00770ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 07f0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  intel_pxp_fini_hw+0x2f/0x39
>  i915_pxp_tee_component_unbind+0x1c/0x42
>  component_unbind+0x32/0x48
>  component_unbind_all+0x80/0x9d
>  take_down_master+0x24/0x36
>  component_master_del+0x56/0x70
>  mei_pxp_remove+0x2c/0x68
>  mei_cl_device_remove+0x35/0x68
>  device_release_driver_internal+0x100/0x1a1
>  mei_cl_bus_remove_device+0x21/0x79
>  mei_cl_bus_remove_devices+0x3b/0x51
>  mei_stop+0x3b/0xae
>  mei_me_shutdown+0x23/0x58
>  device_shutdown+0x144/0x1d3
>  kernel_power_off+0x13/0x4c
>  __se_sys_reboot+0x1d4/0x1e9
>  do_syscall_64+0x43/0x55
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x790dcf316273
> Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> 00 89 fa be 69 19 12 28 bf ad8
> RSP: 002b:7ffca0df9198 EFLAGS: 0202 ORIG_RAX: 00a9
> RAX: ffda RBX: 4321fedc RCX: 790dcf316273
> RDX: 4321fedc RSI: 28121969 RDI: fee1dead
> RBP: 7ffca0df9200 R08: 0007 R09: 563ce8cd8970
> R10:  R11: 0202 R12: 7ffca0df9308
> R13: 0001 R14:  R15: 0003
> ---[ end trace 2f501b01b348f114 ]---
> ACPI: Preparing to enter system sleep state S5
> reboot: Power down
> 
> Changes since v1:
>  - Rebase to latest drm-tip
> 
> Fixes: 0cfab4cb3c4e ("drm/i915/pxp: Enable PXP power management")
> Suggested-by: Lee Shawn C 
> Signed-off-by: Juston Li 
> Reviewed-by: Daniele Ceraolo Spurio 

Reviewed-by: Rodrigo Vivi 

and pushing to drm-intel-next right now.

Thanks for the patch.

> ---
>  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> index 195b2323ec00..4b6f5655fab5 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> @@ -107,9 +107,12 @@ static int i915_pxp_tee_component_bind(struct device 
> *i915_kdev,
>  static void i915_pxp_tee_component_unbind(struct device *i915_kdev,
> struct device *tee_kdev, void *data)
>  {
> + struct drm_i915_private *i915 = kdev_to_i915(i915_kdev);
>   struct intel_pxp *pxp = i915_dev_to_pxp(i915_kdev);
> + intel_wakeref_t wakeref;
>  
> - intel_pxp_fini_hw(pxp);
> + with_intel_runtime_pm_if_in_use(&i915->runtime_pm, wakeref)
> + intel_pxp_fini_hw(pxp);
>  
>   mutex_lock(&pxp->tee_mutex);
>   pxp->pxp_component = NULL;
> -- 
> 2.31.1
>

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 12:27:00PM -0800, John Harrison wrote:
> On 1/13/2022 12:10, Matthew Brost wrote:
> > On Thu, Jan 13, 2022 at 11:59:39AM -0800, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > A lot of tests use almost identical code for creating a batch buffer
> > > which does a single write to memory and another is about to be added.
> > > Instead, move the most generic version into a common helper function.
> > > Unfortunately, the other instances are all subtly different enough to
> > > make it not so trivial to try to use the helper. It could be done but
> > > it is unclear if it is worth the effort at this point. This patch
> > > proves the concept, if people like it enough then it can be extended.
> > > 
> > > v2: Fix up object address vs store offset confusion (with help from
> > > Zbigniew K).
> > > 
> > > Signed-off-by: John Harrison 
> > > ---
> > >   lib/igt_store.c | 96 +
> > >   lib/igt_store.h | 12 +
> > >   lib/meson.build |  1 +
> > >   tests/i915/gem_exec_fence.c | 77 ++---
> > >   tests/i915/i915_hangman.c   |  1 +
> > >   5 files changed, 115 insertions(+), 72 deletions(-)
> > >   create mode 100644 lib/igt_store.c
> > >   create mode 100644 lib/igt_store.h
> > > 
> > > diff --git a/lib/igt_store.c b/lib/igt_store.c
> > > new file mode 100644
> > > index 0..42c888b55
> > > --- /dev/null
> > > +++ b/lib/igt_store.c
> > > @@ -0,0 +1,96 @@
> > > +/* SPDX-License-Identifier: MIT */
> > > +/*
> > > + * Copyright © 2021 Intel Corporation
> > > + */
> > > +
> > > +#include "i915/gem_create.h"
> > > +#include "igt_core.h"
> > > +#include "drmtest.h"
> > > +#include "igt_store.h"
> > > +#include "intel_chipset.h"
> > > +#include "intel_reg.h"
> > > +#include "ioctl_wrappers.h"
> > > +#include "lib/intel_allocator.h"
> > > +
> > > +/**
> > > + * SECTION:igt_store_word
> > > + * @short_description: Library for writing a value to memory
> > > + * @title: StoreWord
> > > + * @include: igt.h
> > > + *
> > > + * A lot of igt testcases need some mechanism for writing a value to 
> > > memory
> > > + * as a test that a batch buffer has executed.
> > > + *
> > > + * NB: Requires master for STORE_DWORD on gen4/5.
> > > + */
> > > +void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
> > > + const struct intel_execution_engine2 *e,
> > > + int fence, uint32_t target_handle,
> > > + uint64_t target_gpu_addr,
> > > + uint64_t store_offset, uint32_t store_value)
> > > +{
> > > + const int SCRATCH = 0;
> > > + const int BATCH = 1;
> > > + const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
> > > + struct drm_i915_gem_exec_object2 obj[2];
> > > + struct drm_i915_gem_relocation_entry reloc;
> > > + struct drm_i915_gem_execbuffer2 execbuf;
> > > + uint32_t batch[16], delta;
> > > + uint64_t bb_offset;
> > > + int i;
> > > +
> > > + memset(, 0, sizeof(execbuf));
> > > + execbuf.buffers_ptr = to_user_pointer(obj);
> > > + execbuf.buffer_count = ARRAY_SIZE(obj);
> > > + execbuf.flags = e->flags;
> > > + execbuf.rsvd1 = ctx->id;
> > > + if (fence != -1) {
> > > + execbuf.flags |= I915_EXEC_FENCE_IN;
> > > + execbuf.rsvd2 = fence;
> > > + }
> > > + if (gen < 6)
> > > + execbuf.flags |= I915_EXEC_SECURE;
> > > +
> > > + memset(obj, 0, sizeof(obj));
> > > + obj[SCRATCH].handle = target_handle;
> > > +
> > > + obj[BATCH].handle = gem_create(fd, 4096);
> > > + obj[BATCH].relocs_ptr = to_user_pointer();
> > > + obj[BATCH].relocation_count = !ahnd ? 1 : 0;
> > > + bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
> > > + memset(, 0, sizeof(reloc));
> > > +
> > > + i = 0;
> > > + delta = sizeof(uint32_t) * store_offset;
> > Can't this overflow the delta as store_offset is a u64?
> Oops.
> 
> Yeah, this code was a right mess of data words being used as addresses and
> random copies supporting 64bit or only 32bit offsets. I believe it's
> currently fine as even platforms which can theoretically support >32bits
> don't actually use it. But yes, will repost with a 64bit version of delta.
> 
> > 
> > > + if (!ahnd) {
> > > + reloc.target_handle = obj[SCRATCH].handle;
> > > + reloc.presumed_offset = -1;
> > > + reloc.offset = sizeof(uint32_t) * (i + 1);

Then, just to be safe, probably assert that the upper 32 bits of delta are
clear too.

Matt
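
To make the truncation concern above concrete, here is a minimal standalone
sketch. It is not IGT code: truncated_delta(), delta_fits_u32() and
checked_delta() are hypothetical names used only for illustration. The first
helper mirrors the arithmetic in the quoted patch, the other two show one way
the suggested assertion could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors "delta = sizeof(uint32_t) * store_offset" with a 32-bit delta:
 * the 64-bit product is silently truncated to its low 32 bits.
 */
static uint32_t truncated_delta(uint64_t store_offset)
{
	return (uint32_t)(sizeof(uint32_t) * store_offset);
}

/* Hypothetical check: true if the byte offset fits in 32 bits. */
static bool delta_fits_u32(uint64_t store_offset)
{
	/* Compare before multiplying so the check itself cannot overflow. */
	return store_offset <= UINT32_MAX / sizeof(uint32_t);
}

/* Hypothetical variant that asserts instead of truncating silently. */
static uint32_t checked_delta(uint64_t store_offset)
{
	assert(delta_fits_u32(store_offset));
	return (uint32_t)(sizeof(uint32_t) * store_offset);
}
```

With a dword index of 0x40000000 the byte offset is exactly 2^32, so
truncated_delta() returns 0 while delta_fits_u32() reports the problem.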

> > > + reloc.delta = delta;
> > > + reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> > > + reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
> > > + } else {
> > > + obj[SCRATCH].offset = target_gpu_addr;
> > > + obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> > > + obj[BATCH].offset = bb_offset;
> > > + obj[BATCH].flags |= EXEC_OBJECT_PINNED;
> > > + }
> > > + batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > > + if (gen >= 8) {
> > > +

Re: [Intel-gfx] [PATCH v4] x86/quirks: Replace QFLAG_APPLY_ONCE with static locals

2022-01-13 Thread Rodrigo Vivi

On Wed, Jan 12, 2022 at 05:28:29PM -0800, Lucas De Marchi wrote:
> On Wed, Jan 12, 2022 at 07:06:45PM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 12, 2022 at 04:21:28PM -0800, Lucas De Marchi wrote:
> > > On Wed, Jan 12, 2022 at 06:08:05PM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Jan 12, 2022 at 03:30:43PM -0800, Lucas De Marchi wrote:
> > > > > The flags are only used to mark a quirk to be called once and nothing
> > > > > else. Also, that logic may not be appropriate if the quirk wants to
> > > > > do additional filtering and set quirk as applied by itself.
> > > > >
> > > > > So replace the uses of QFLAG_APPLY_ONCE with static local variables in
> > > > > the few quirks that use this logic and remove all the flags logic.
> > > > >
> > > > > Signed-off-by: Lucas De Marchi 
> > > > > Reviewed-by: Bjorn Helgaas 
> > > >
> > > > Only occurred to me now, but another, less intrusive approach would be
> > > > to just remove QFLAG_APPLY_ONCE from intel_graphics_quirks() and do
> > > > its bookkeeping internally, e.g.,
> > > 
> > > that is actually what I suggested after your comment in v2: this would
> > > be the first patch with "minimal fix". But then to keep it consistent
> > > with the other calls to follow up with additional patches on top
> > > converting them as well.  Maybe what I wrote wasn't clear in the
> > > direction? Copying it here:
> > > 
> > >   1) add the static local only to intel graphics quirk  and remove the
> > >   flag from this item
> > >   2 and 3) add the static local to other functions and remove the flag
> > >   from those items
> > >   4) remove the flag from the table, the defines and its usage.
> > >   5) fix the coding style (to be clear, it's already wrong, not
> > >   something wrong introduced here... maybe could be squashed in (4)?)
> > 
> > Oh, sorry, I guess I just skimmed over that without really
> > comprehending it.
> > 
> > Although the patch below is basically just 1 from above and doesn't
> > require any changes to the other functions or the flags themselves
> > (2-4 above).
> 
> Yes, but I would do the rest of the conversion anyway. It would be odd
> to be inconsistent with just a few functions. So in the end I think we
> would achieve the same goal.
> 
> I would really prefer this approach, having the bug fix first, if I was
> concerned about having to backport this to linux-stable beyond 5.10.y
> (we have a trivial conflict on 5.10).
> 
> However, given this situation is new (Intel GPU + Intel Discrete GPU) and
> rare (it also needs a particular PCI topology to reproduce it),
> I'm not too concerned. Not even sure if it's worth submitting to
> linux-stable.

+1 on the minimal fix approach first and send that to stable 5.10+.
We will hit this case for sure.

also +1 on the discussed ideas as a follow up.

> 
> I'll wait for others to chime in on one way vs the other.
> 
> thanks
> Lucas De Marchi
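
For readers following the thread, the "static local" idea discussed above can
be sketched in plain C as below. This is only an illustration under stated
assumptions: example_quirk(), its device_id parameter, the 0x1234 filter value
and the applied_count counter are all made up here and are not the kernel's
quirk code.

```c
#include <stdbool.h>

/* Counts how often the quirk body really ran (illustration only). */
static int applied_count;

/*
 * Hypothetical quirk: the function keeps its own "already applied" state in
 * a static local, so no external apply-once flag is needed. Because the
 * state is set by the quirk itself, it can filter first and only mark
 * itself as applied once a matching device was actually handled.
 */
static void example_quirk(unsigned int device_id)
{
	static bool applied;

	if (applied)
		return;

	/* Extra filtering: devices that do not match leave the state alone. */
	if (device_id != 0x1234)
		return;

	applied = true;
	applied_count++;
}
```

Calling example_quirk() for a non-matching device first and then twice for the
matching one runs the body exactly once.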

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John Harrison


On 1/13/2022 12:10, Matthew Brost wrote:

On Thu, Jan 13, 2022 at 11:59:39AM -0800, john.c.harri...@intel.com wrote:

From: John Harrison 

A lot of tests use almost identical code for creating a batch buffer
which does a single write to memory and another is about to be added.
Instead, move the most generic version into a common helper function.
Unfortunately, the other instances are all subtly different enough to
make it not so trivial to try to use the helper. It could be done but
it is unclear if it is worth the effort at this point. This patch
proves the concept, if people like it enough then it can be extended.

v2: Fix up object address vs store offset confusion (with help from
Zbigniew K).

Signed-off-by: John Harrison 
---
  lib/igt_store.c | 96 +
  lib/igt_store.h | 12 +
  lib/meson.build |  1 +
  tests/i915/gem_exec_fence.c | 77 ++---
  tests/i915/i915_hangman.c   |  1 +
  5 files changed, 115 insertions(+), 72 deletions(-)
  create mode 100644 lib/igt_store.c
  create mode 100644 lib/igt_store.h

diff --git a/lib/igt_store.c b/lib/igt_store.c
new file mode 100644
index 0..42c888b55
--- /dev/null
+++ b/lib/igt_store.c
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915/gem_create.h"
+#include "igt_core.h"
+#include "drmtest.h"
+#include "igt_store.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+#include "lib/intel_allocator.h"
+
+/**
+ * SECTION:igt_store_word
+ * @short_description: Library for writing a value to memory
+ * @title: StoreWord
+ * @include: igt.h
+ *
+ * A lot of igt testcases need some mechanism for writing a value to memory
+ * as a test that a batch buffer has executed.
+ *
+ * NB: Requires master for STORE_DWORD on gen4/5.
+ */
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t target_handle,
+   uint64_t target_gpu_addr,
+   uint64_t store_offset, uint32_t store_value)
+{
+   const int SCRATCH = 0;
+   const int BATCH = 1;
+   const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
+   struct drm_i915_gem_exec_object2 obj[2];
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   uint32_t batch[16], delta;
+   uint64_t bb_offset;
+   int i;
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = to_user_pointer(obj);
+   execbuf.buffer_count = ARRAY_SIZE(obj);
+   execbuf.flags = e->flags;
+   execbuf.rsvd1 = ctx->id;
+   if (fence != -1) {
+   execbuf.flags |= I915_EXEC_FENCE_IN;
+   execbuf.rsvd2 = fence;
+   }
+   if (gen < 6)
+   execbuf.flags |= I915_EXEC_SECURE;
+
+   memset(obj, 0, sizeof(obj));
+   obj[SCRATCH].handle = target_handle;
+
+   obj[BATCH].handle = gem_create(fd, 4096);
+   obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
+   obj[BATCH].relocation_count = !ahnd ? 1 : 0;
+   bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
+   memset(&reloc, 0, sizeof(reloc));
+
+   i = 0;
+   delta = sizeof(uint32_t) * store_offset;

Can't this overflow the delta as store_offset is a u64?

Oops.

Yeah, this code was a right mess of data words being used as addresses 
and random copies supporting 64bit or only 32bit offsets. I believe it's 
currently fine as even platforms which can theoretically support >32bits 
don't actually use it. But yes, will repost with a 64bit version of delta.





+   if (!ahnd) {
+   reloc.target_handle = obj[SCRATCH].handle;
+   reloc.presumed_offset = -1;
+   reloc.offset = sizeof(uint32_t) * (i + 1);
+   reloc.delta = delta;
+   reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+   reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
+   } else {
+   obj[SCRATCH].offset = target_gpu_addr;
+   obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+   obj[BATCH].offset = bb_offset;
+   obj[BATCH].flags |= EXEC_OBJECT_PINNED;
+   }
+   batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+   if (gen >= 8) {
+   batch[++i] = target_gpu_addr + delta;
+   batch[++i] = (target_gpu_addr + delta) >> 32;

This is different from the previous code, presumably this is fixing a
bug where delta + bits 31:0 of target_gpu_addr overflows into the upper
32 bits?

Matt

Yeah, some copies of this code were definitely broken for >32bit addresses.

John.
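
As a standalone illustration of the carry issue discussed above (not taken
from any of the existing test copies), the sketch below compares adding the
delta in 64 bits before splitting with adding it to the low dword only. Both
function names are hypothetical.

```c
#include <stdint.h>

/*
 * Add in 64 bits first, then split: a carry out of bits 31:0 ends up in
 * the high dword, as in the gen >= 8 path of the quoted patch.
 */
static void emit_addr64(uint32_t *batch, uint64_t gpu_addr, uint64_t delta)
{
	uint64_t addr = gpu_addr + delta;

	batch[0] = (uint32_t)addr;         /* bits 31:0 */
	batch[1] = (uint32_t)(addr >> 32); /* bits 63:32, including any carry */
}

/*
 * One way such a bug can arise (an assumption for illustration): the delta
 * is added to the low dword only, so the sum wraps and the carry is lost.
 */
static void emit_addr64_no_carry(uint32_t *batch, uint64_t gpu_addr,
				 uint32_t delta)
{
	batch[0] = (uint32_t)gpu_addr + delta;  /* wraps modulo 2^32 */
	batch[1] = (uint32_t)(gpu_addr >> 32);  /* carry never reaches here */
}
```

For gpu_addr 0x1fffffff0 and delta 0x20 the first version yields the dwords
0x10 and 2, while the second yields 0x10 and 1, i.e. an address 4GiB too low.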




+   } else if (gen >= 4) {
+   batch[++i] = 0;
+   batch[++i] = delta;
+   reloc.offset += sizeof(uint32_t);
+   } else {
+   batch[i]--;
+

Re: [Intel-gfx] [PATCH v3 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:40AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Added an extra step to the i915_hangman tests to check that the
> system is still alive after the hang and recovery. This submits a
> simple batch to each engine which does a write to memory and checks
> that the write occurred.
> 
> Signed-off-by: John Harrison 

Looks good to me but can't help but think this could be a library
function as I really doubt this is the only test where at the end of the
test we want to verify all engines are alive. Something to keep an eye /
do in a follow up.

With that:
Reviewed-by: Matthew Brost 

> ---
>  tests/i915/i915_hangman.c | 59 +++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 5a0c9497c..918418760 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -48,8 +48,57 @@
>  static int device = -1;
>  static int sysfs = -1;
>  
> +#define OFFSET_ALIVE 10
> +
>  IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
>  
> +static void check_alive(void)
> +{
> + const struct intel_execution_engine2 *engine;
> + const intel_ctx_t *ctx;
> + uint32_t scratch, *out;
> + int fd, i = 0;
> + uint64_t ahnd, scratch_addr;
> +
> + fd = drm_open_driver(DRIVER_INTEL);
> + igt_require(gem_class_can_store_dword(fd, 0));
> +
> + ctx = intel_ctx_create_all_physical(fd);
> + ahnd = get_reloc_ahnd(fd, ctx->id);
> + scratch = gem_create(fd, 4096);
> + scratch_addr = get_offset(ahnd, scratch, 4096, 0);
> + out = gem_mmap__wc(fd, scratch, 0, 4096, PROT_WRITE);
> + gem_set_domain(fd, scratch,
> + I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> +
> + for_each_physical_engine(fd, engine) {
> + igt_assert_eq_u32(out[i + OFFSET_ALIVE], 0);
> + i++;
> + }
> +
> + i = 0;
> + for_each_ctx_engine(fd, ctx, engine) {
> + if (!gem_class_can_store_dword(fd, engine->class))
> + continue;
> +
> + /* +OFFSET_ALIVE to ensure engine zero doesn't get a false negative */
> + igt_store_word(fd, ahnd, ctx, engine, -1, scratch, scratch_addr,
> +i + OFFSET_ALIVE, i + OFFSET_ALIVE);
> + i++;
> + }
> +
> + gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
> +
> + while (i--)
> + igt_assert_eq_u32(out[i + OFFSET_ALIVE], i + OFFSET_ALIVE);
> +
> + munmap(out, 4096);
> + gem_close(fd, scratch);
> + put_ahnd(ahnd);
> + intel_ctx_destroy(fd, ctx);
> + close(fd);
> +}
> +
>  static bool has_error_state(int dir)
>  {
>   bool result;
> @@ -231,6 +280,8 @@ static void test_error_state_capture(const intel_ctx_t 
> *ctx,
>   check_error_state(e->name, offset, batch);
>   munmap(batch, 4096);
>   put_ahnd(ahnd);
> +
> + check_alive();
>  }
>  
>  static void
> @@ -289,6 +340,8 @@ test_engine_hang(const intel_ctx_t *ctx,
>   put_ahnd(ahndN);
>   }
>   put_ahnd(ahnd);
> +
> + check_alive();
>  }
>  
>  static int hang_count;
> @@ -321,6 +374,8 @@ static void test_hang_detector(const intel_ctx_t *ctx,
>  
>   /* Did it work? */
>   igt_assert(hang_count == 1);
> +
> + check_alive();
>  }
>  
>  /* This test covers the case where we end up in an uninitialised area of the
> @@ -356,6 +411,8 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
>   igt_force_gpu_reset(device);
>   igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
>   }
> +
> + check_alive();
>  }
>  
>  static void do_tests(const char *name, const char *prefix,
> @@ -433,6 +490,8 @@ igt_main
>   igt_assert(sysfs != -1);
>  
>   igt_require(has_error_state(sysfs));
> +
> + gem_require_mmap_wc(device);
>   }
>  
>   igt_describe("Basic error capture");
> -- 
> 2.25.1
>

Re: [Intel-gfx] [PATCH v3 00/11] Start cleaning up register definitions

2022-01-13 Thread Rodrigo Vivi

On Thu, Jan 13, 2022 at 06:58:47PM +0200, Jani Nikula wrote:
> On Wed, 12 Jan 2022, Rodrigo Vivi  wrote:
> > I understand that I'm late to the fun here, but I got myself wondering if
> > we couldn't separate the registers in a "regs" directory
> > and find some way to organize them in IP blocks matching the hw...
> >
> > mainly thinking about 2 cases:
> >
> > 1. searching for registers usages...
> > 2. the idea of having some sort of auto generation from spec...
> 
> At least to me it's more important to split these between display and
> gt, and I'd prefer not to have them in the same directory.

yeap, it makes sense...

> 
> BR,
> Jani.
> 
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center

[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Flip guc_id allocation partition (rev5)

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915: Flip guc_id allocation partition (rev5)
URL   : https://patchwork.freedesktop.org/series/98751/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_22000


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/index.html

Participating hosts (40 -> 40)
--

  Additional (5): fi-kbl-soraka bat-dg1-6 bat-adlp-6 bat-jsl-2 bat-jsl-1 
  Missing(5): shard-tglu fi-bsw-cyan fi-icl-u2 shard-rkl shard-dg1 

Known issues


  Here are the changes found in Patchwork_22000 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][1] ([fdo#109271]) +31 similar issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html
- fi-hsw-4770:NOTRUN -> [SKIP][2] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-hsw-4770/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@nullptr:
- bat-dg1-6:  NOTRUN -> [SKIP][3] ([i915#2582]) +4 similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@fb...@nullptr.html

  * igt@gem_exec_fence@basic-busy@bcs0:
- fi-kbl-soraka:  NOTRUN -> [SKIP][4] ([fdo#109271]) +8 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-kbl-soraka/igt@gem_exec_fence@basic-b...@bcs0.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][5] ([i915#4086])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@gem_exec_gttf...@basic.html

  * igt@gem_huc_copy@huc-copy:
- fi-kbl-soraka:  NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#2190])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-kbl-soraka/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
- fi-kbl-soraka:  NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-kbl-soraka/igt@gem_lmem_swapp...@parallel-random-engines.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][8] ([i915#4083])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][9] ([i915#4077]) +2 similar issues
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-6:  NOTRUN -> [SKIP][10] ([i915#4079]) +1 similar issue
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-6:  NOTRUN -> [SKIP][11] ([i915#1155])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_selftest@live@gt_pm:
- fi-kbl-soraka:  NOTRUN -> [DMESG-FAIL][12] ([i915#1886] / [i915#2291])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg1-6:  NOTRUN -> [DMESG-FAIL][13] ([i915#4494])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][14] ([i915#4212]) +7 similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@kms_addfb_ba...@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][15] ([i915#4215])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@kms_addfb_ba...@basic-y-tiled-legacy.html

  * igt@kms_busy@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][16] ([i915#4303])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@kms_b...@basic.html

  * igt@kms_chamelium@dp-edid-read:
- fi-kbl-soraka:  NOTRUN -> [SKIP][17] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/fi-kbl-soraka/igt@kms_chamel...@dp-edid-read.html

  * igt@kms_chamelium@hdmi-edid-read:
- bat-dg1-6:  NOTRUN -> [SKIP][18] ([fdo#111827]) +8 similar issues
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22000/bat-dg1-6/igt@kms_chamel...@hdmi-edid-read.html

  * igt@kms_chamelium@vga-edid-read:
- fi-bdw-5557u:   NOTRUN -> [SKIP][19] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [19]:

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:39AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> A lot of tests use almost identical code for creating a batch buffer
> which does a single write to memory and another is about to be added.
> Instead, move the most generic version into a common helper function.
> Unfortunately, the other instances are all subtly different enough to
> make it not so trivial to try to use the helper. It could be done but
> it is unclear if it is worth the effort at this point. This patch
> proves the concept, if people like it enough then it can be extended.
> 
> v2: Fix up object address vs store offset confusion (with help from
> Zbigniew K).
> 
> Signed-off-by: John Harrison 
> ---
>  lib/igt_store.c | 96 +
>  lib/igt_store.h | 12 +
>  lib/meson.build |  1 +
>  tests/i915/gem_exec_fence.c | 77 ++---
>  tests/i915/i915_hangman.c   |  1 +
>  5 files changed, 115 insertions(+), 72 deletions(-)
>  create mode 100644 lib/igt_store.c
>  create mode 100644 lib/igt_store.h
> 
> diff --git a/lib/igt_store.c b/lib/igt_store.c
> new file mode 100644
> index 0..42c888b55
> --- /dev/null
> +++ b/lib/igt_store.c
> @@ -0,0 +1,96 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include "i915/gem_create.h"
> +#include "igt_core.h"
> +#include "drmtest.h"
> +#include "igt_store.h"
> +#include "intel_chipset.h"
> +#include "intel_reg.h"
> +#include "ioctl_wrappers.h"
> +#include "lib/intel_allocator.h"
> +
> +/**
> + * SECTION:igt_store_word
> + * @short_description: Library for writing a value to memory
> + * @title: StoreWord
> + * @include: igt.h
> + *
> + * A lot of igt testcases need some mechanism for writing a value to memory
> + * as a test that a batch buffer has executed.
> + *
> + * NB: Requires master for STORE_DWORD on gen4/5.
> + */
> +void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
> + const struct intel_execution_engine2 *e,
> + int fence, uint32_t target_handle,
> + uint64_t target_gpu_addr,
> + uint64_t store_offset, uint32_t store_value)
> +{
> + const int SCRATCH = 0;
> + const int BATCH = 1;
> + const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
> + struct drm_i915_gem_exec_object2 obj[2];
> + struct drm_i915_gem_relocation_entry reloc;
> + struct drm_i915_gem_execbuffer2 execbuf;
> + uint32_t batch[16], delta;
> + uint64_t bb_offset;
> + int i;
> +
> + memset(&execbuf, 0, sizeof(execbuf));
> + execbuf.buffers_ptr = to_user_pointer(obj);
> + execbuf.buffer_count = ARRAY_SIZE(obj);
> + execbuf.flags = e->flags;
> + execbuf.rsvd1 = ctx->id;
> + if (fence != -1) {
> + execbuf.flags |= I915_EXEC_FENCE_IN;
> + execbuf.rsvd2 = fence;
> + }
> + if (gen < 6)
> + execbuf.flags |= I915_EXEC_SECURE;
> +
> + memset(obj, 0, sizeof(obj));
> + obj[SCRATCH].handle = target_handle;
> +
> + obj[BATCH].handle = gem_create(fd, 4096);
> + obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
> + obj[BATCH].relocation_count = !ahnd ? 1 : 0;
> + bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
> + memset(&reloc, 0, sizeof(reloc));
> +
> + i = 0;
> + delta = sizeof(uint32_t) * store_offset;

Can't this overflow the delta as store_offset is a u64?

> + if (!ahnd) {
> + reloc.target_handle = obj[SCRATCH].handle;
> + reloc.presumed_offset = -1;
> + reloc.offset = sizeof(uint32_t) * (i + 1);
> + reloc.delta = delta;
> + reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
> + reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
> + } else {
> + obj[SCRATCH].offset = target_gpu_addr;
> + obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
> + obj[BATCH].offset = bb_offset;
> + obj[BATCH].flags |= EXEC_OBJECT_PINNED;
> + }
> + batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> + if (gen >= 8) {
> + batch[++i] = target_gpu_addr + delta;
> + batch[++i] = (target_gpu_addr + delta) >> 32;

This is different from the previous code, presumably this is fixing a
bug where delta + bits 31:0 of target_gpu_addr overflows into the upper
32 bits?

Matt

> + } else if (gen >= 4) {
> + batch[++i] = 0;
> + batch[++i] = delta;
> + reloc.offset += sizeof(uint32_t);
> + } else {
> + batch[i]--;
> + batch[++i] = delta;
> + }
> + batch[++i] = store_value;
> + batch[++i] = MI_BATCH_BUFFER_END;
> + gem_write(fd, obj[BATCH].handle, 0, batch, sizeof(batch));
> + gem_execbuf(fd, &execbuf);
> + gem_close(fd, obj[BATCH].handle);
> + put_offset(ahnd, obj[BATCH].handle);
> +}
> diff --git

Re: [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Souza, Jose

On Thu, 2022-01-13 at 19:59 +, Patchwork wrote:
Patch Details
Series: drm/i915/display/adlp: Implement new step in the TC voltage swing prog 
sequence
URL:https://patchwork.freedesktop.org/series/98853/
State:  success
Details:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/index.html
CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_21997_full
Summary

SUCCESS

No regressions found.

pushed, thanks for the review Clint.

Participating hosts (10 -> 10)

No changes in participating hosts

Known issues

Here are the changes found in Patchwork_21997_full that come from known issues:

CI changes
Issues hit

  *   boot:
     *   shard-glk: (PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
         PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
         PASS, PASS, PASS, PASS, PASS)
         -> (PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
         PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
         FAIL, PASS, PASS, PASS) ([i915#4392])

IGT changes
Issues hit

  *

Re: [Intel-gfx] [PATCH v3 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:38AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The hangman framework sets up a context that is valid for all engines
> and has things like banning disabled. The 'unterminated' test then
> ignores it and uses the default context. Fix that.
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  tests/i915/i915_hangman.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index 354769f39..6656b3fcd 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -347,6 +347,7 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
>   memset(&execbuf, 0, sizeof(execbuf));
>   execbuf.buffers_ptr = (uintptr_t)_exec;
>   execbuf.buffer_count = 1;
> + execbuf.rsvd1 = ctx->id;
>  
>   gem_execbuf(device, &execbuf);
>   if (gem_wait(device, handle, _ns) != 0) {
> -- 
> 2.25.1
>

Re: [Intel-gfx] [igt-dev] [PATCH v3 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 11:59:35AM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The capture test was still using old style ring_id and ring_name
> (derived from the engine structure at the higher level). Update it to
> just take the engine structure directly.
> 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  tests/i915/i915_hangman.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
> index f64b8819d..280eac197 100644
> --- a/tests/i915/i915_hangman.c
> +++ b/tests/i915/i915_hangman.c
> @@ -207,8 +207,8 @@ static void check_error_state(const char 
> *expected_ring_name,
>   igt_assert(found);
>  }
>  
> -static void test_error_state_capture(const intel_ctx_t *ctx, unsigned 
> ring_id,
> -  const char *ring_name)
> +static void test_error_state_capture(const intel_ctx_t *ctx,
> +  const struct intel_execution_engine2 *e)
>  {
>   uint32_t *batch;
>   igt_hang_t hang;
> @@ -217,7 +217,7 @@ static void test_error_state_capture(const intel_ctx_t 
> *ctx, unsigned ring_id,
>  
>   clear_error_state();
>  
> - hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, ring_id,
> + hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, e->flags,
> HANG_ALLOW_CAPTURE);
>   offset = hang.spin->obj[IGT_SPIN_BATCH].offset;
>  
> @@ -226,7 +226,7 @@ static void test_error_state_capture(const intel_ctx_t 
> *ctx, unsigned ring_id,
>  
>   igt_post_hang_ring(device, hang);
>  
> - check_error_state(ring_name, offset, batch);
> + check_error_state(e->name, offset, batch);
>   munmap(batch, 4096);
>   put_ahnd(ahnd);
>  }
> @@ -351,7 +351,7 @@ igt_main
>   igt_subtest_with_dynamic("error-state-capture") {
>   for_each_ctx_engine(device, ctx, e) {
>   igt_dynamic_f("%s", e->name)
> - test_error_state_capture(ctx, e->flags, 
> e->name);
> + test_error_state_capture(ctx, e);
>   }
>   }
>  
> -- 
> 2.25.1
>

Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/display/ehl: Update voltage swing table

2022-01-13 Thread Souza, Jose

On Thu, 2022-01-13 at 17:56 +, Patchwork wrote:
Patch Details
Series: drm/i915/display/ehl: Update voltage swing table
URL:https://patchwork.freedesktop.org/series/98844/
State:  failure
Details:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/index.html
CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_21994_full
Summary

FAILURE

Serious unknown changes coming with Patchwork_21994_full absolutely need to be
verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_21994_full, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.

Participating hosts (10 -> 10)

No changes in participating hosts

Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_21994_full:

IGT changes
Possible regressions

  *   igt@gem_exec_await@wide-contexts:
     *   shard-iclb: PASS -> INCOMPLETE
Not related.

Patch pushed, thanks for the review Clint.

Known issues

Here are the changes found in Patchwork_21994_full that come from known issues:

IGT changes
Issues hit

  *   igt@feature_discovery@psr2:
     *   shard-iclb: PASS -> SKIP ([i915#658])
  *   igt@gem_eio@unwedge-stress:
     *   shard-skl: PASS -> TIMEOUT ([i915#3063])
  *   igt@gem_exec_balancer@parallel-keep-submit-fence:
     *   shard-iclb: PASS -> SKIP ([i915#4525]) +1 similar issue
  *   igt@gem_exec_fair@basic-flow@rcs0:
     *   shard-tglb: PASS -> FAIL ([i915#2842])
  *   igt@gem_exec_fair@basic-none-rrul@rcs0:
     *   shard-iclb: NOTRUN -> FAIL ([i915#2852])
     *   shard-glk: NOTRUN -> FAIL ([i915#2842])
  *   igt@gem_exec_fair@basic-none@vcs0:
     *   shard-kbl: NOTRUN -> FAIL ([i915#2842]) +1 similar issue
  *   igt@gem_exec_fair@basic-none@vecs0:
     *   shard-apl: PASS -> FAIL ([i915#2842])
  *   igt@gem_exec_fair@basic-pace@vcs0:
     *   shard-glk: PASS -> FAIL ([i915#2842])
  *   igt@gem_exec_fair@basic-pace@vcs1:
     *   shard-iclb: NOTRUN -> FAIL ([i915#2842])
  *   igt@gem_huc_copy@huc-copy:
     *   shard-skl: NOTRUN -> SKIP ([fdo#109271] / [i915#2190])
  *   igt@gem_lmem_swapping@basic:
     *   shard-apl: NOTRUN -> SKIP ([fdo#109271] / [i915#4613])
  *   igt@gem_lmem_swapping@heavy-verify-random:
     *   shard-kbl: NOTRUN -> SKIP ([fdo#109271] / [i915#4613]) +4 similar issues
     *   shard-skl: NOTRUN -> SKIP ([fdo#109271] / [i915#4613]) +2 similar issues
  *   igt@gem_pread@exhaustion:
     *   shard-skl: NOTRUN ->

[Intel-gfx] [PATCH v3 i-g-t 07/15] lib/store: Refactor common store code into helper function

2022-01-13 Thread John . C . Harrison

From: John Harrison 

A lot of tests use almost identical code for creating a batch buffer
which does a single write to memory and another is about to be added.
Instead, move the most generic version into a common helper function.
Unfortunately, the other instances are all subtly different enough to
make it not so trivial to try to use the helper. It could be done but
it is unclear if it is worth the effort at this point. This patch
proves the concept, if people like it enough then it can be extended.

v2: Fix up object address vs store offset confusion (with help from
Zbigniew K).
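
As a rough illustration of what the helper in the diff below assembles, here is a minimal standalone sketch of the per-generation command layout. It does not use the IGT library: the two opcode constants are placeholders (not the real MI_STORE_DWORD_IMM / MI_BATCH_BUFFER_END encodings from intel_reg.h), and sketch_build_store() is a hypothetical name. The offset is taken in bytes, as 'delta' is in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder opcodes, NOT the real encodings from intel_reg.h. */
#define SKETCH_STORE_DWORD 0x10000002u
#define SKETCH_BATCH_END   0x05000000u

/*
 * Fill 'batch' with one store command followed by the end marker,
 * mirroring the gen-dependent layout used in the patch.
 * Returns the number of dwords written.
 */
static size_t sketch_build_store(uint32_t *batch, size_t len, unsigned int gen,
				 uint64_t target_addr, uint32_t delta,
				 uint32_t value)
{
	size_t i = 0;

	assert(len >= 5);

	batch[i] = SKETCH_STORE_DWORD | (gen < 6 ? 1u << 22 : 0);
	if (gen >= 8) {
		/* 64-bit address split into low and high dwords */
		uint64_t addr = target_addr + delta;

		batch[++i] = (uint32_t)addr;
		batch[++i] = (uint32_t)(addr >> 32);
	} else if (gen >= 4) {
		/* one reserved dword, then the 32-bit offset */
		batch[++i] = 0;
		batch[++i] = delta;
	} else {
		/* oldest layout: command is one dword shorter */
		batch[i]--;
		batch[++i] = delta;
	}
	batch[++i] = value;
	batch[++i] = SKETCH_BATCH_END;

	return i + 1;
}
```

The sketch leaves out relocations and execbuf submission entirely; it only shows why the batch is five dwords long on gen4 and later and four dwords on older hardware.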

Signed-off-by: John Harrison 
---
 lib/igt_store.c | 96 +
 lib/igt_store.h | 12 +
 lib/meson.build |  1 +
 tests/i915/gem_exec_fence.c | 77 ++---
 tests/i915/i915_hangman.c   |  1 +
 5 files changed, 115 insertions(+), 72 deletions(-)
 create mode 100644 lib/igt_store.c
 create mode 100644 lib/igt_store.h

diff --git a/lib/igt_store.c b/lib/igt_store.c
new file mode 100644
index 0..42c888b55
--- /dev/null
+++ b/lib/igt_store.c
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915/gem_create.h"
+#include "igt_core.h"
+#include "drmtest.h"
+#include "igt_store.h"
+#include "intel_chipset.h"
+#include "intel_reg.h"
+#include "ioctl_wrappers.h"
+#include "lib/intel_allocator.h"
+
+/**
+ * SECTION:igt_store_word
+ * @short_description: Library for writing a value to memory
+ * @title: StoreWord
+ * @include: igt.h
+ *
+ * A lot of igt testcases need some mechanism for writing a value to memory
+ * as a test that a batch buffer has executed.
+ *
+ * NB: Requires master for STORE_DWORD on gen4/5.
+ */
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t target_handle,
+   uint64_t target_gpu_addr,
+   uint64_t store_offset, uint32_t store_value)
+{
+   const int SCRATCH = 0;
+   const int BATCH = 1;
+   const unsigned int gen = intel_gen(intel_get_drm_devid(fd));
+   struct drm_i915_gem_exec_object2 obj[2];
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   uint32_t batch[16], delta;
+   uint64_t bb_offset;
+   int i;
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = to_user_pointer(obj);
+   execbuf.buffer_count = ARRAY_SIZE(obj);
+   execbuf.flags = e->flags;
+   execbuf.rsvd1 = ctx->id;
+   if (fence != -1) {
+   execbuf.flags |= I915_EXEC_FENCE_IN;
+   execbuf.rsvd2 = fence;
+   }
+   if (gen < 6)
+   execbuf.flags |= I915_EXEC_SECURE;
+
+   memset(obj, 0, sizeof(obj));
+   obj[SCRATCH].handle = target_handle;
+
+   obj[BATCH].handle = gem_create(fd, 4096);
+   obj[BATCH].relocs_ptr = to_user_pointer(&reloc);
+   obj[BATCH].relocation_count = !ahnd ? 1 : 0;
+   bb_offset = get_offset(ahnd, obj[BATCH].handle, 4096, 0);
+   memset(&reloc, 0, sizeof(reloc));
+
+   i = 0;
+   delta = sizeof(uint32_t) * store_offset;
+   if (!ahnd) {
+   reloc.target_handle = obj[SCRATCH].handle;
+   reloc.presumed_offset = -1;
+   reloc.offset = sizeof(uint32_t) * (i + 1);
+   reloc.delta = delta;
+   reloc.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
+   reloc.write_domain = I915_GEM_DOMAIN_INSTRUCTION;
+   } else {
+   obj[SCRATCH].offset = target_gpu_addr;
+   obj[SCRATCH].flags |= EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+   obj[BATCH].offset = bb_offset;
+   obj[BATCH].flags |= EXEC_OBJECT_PINNED;
+   }
+   batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+   if (gen >= 8) {
+   batch[++i] = target_gpu_addr + delta;
+   batch[++i] = (target_gpu_addr + delta) >> 32;
+   } else if (gen >= 4) {
+   batch[++i] = 0;
+   batch[++i] = delta;
+   reloc.offset += sizeof(uint32_t);
+   } else {
+   batch[i]--;
+   batch[++i] = delta;
+   }
+   batch[++i] = store_value;
+   batch[++i] = MI_BATCH_BUFFER_END;
+   gem_write(fd, obj[BATCH].handle, 0, batch, sizeof(batch));
+   gem_execbuf(fd, &execbuf);
+   gem_close(fd, obj[BATCH].handle);
+   put_offset(ahnd, obj[BATCH].handle);
+}
diff --git a/lib/igt_store.h b/lib/igt_store.h
new file mode 100644
index 0..5c6c8263c
--- /dev/null
+++ b/lib/igt_store.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "igt_gt.h"
+
+void igt_store_word(int fd, uint64_t ahnd, const intel_ctx_t *ctx,
+   const struct intel_execution_engine2 *e,
+   int fence, uint32_t

[Intel-gfx] [PATCH v3 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The global context used by all the subtests for causing hangs is
marked as unbannable. However, some of the subtests set background
spinners running on all engines using a freshly created context. If
there is a test failure for any reason, all of those spinners can be
killed off as hanging contexts. On systems with lots of engines, that
can result in the test being banned from creating any new contexts.

So make the spinner contexts unbannable as well. That way if one
subtest fails it won't necessarily bring down all subsequent subtests.
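
The context_unban() helper added by this patch tries the newer "bannable" parameter first and only falls back to zeroing the ban period when the kernel rejects it with -EINVAL. A minimal standalone sketch of that fallback pattern follows; the sketch_kernel struct and sketch_* functions are hypothetical stand-ins for the real context-param ioctl, not IGT or i915 APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_param { SKETCH_BANNABLE, SKETCH_BAN_PERIOD };

/* Hypothetical model of a kernel that may or may not know "bannable". */
struct sketch_kernel {
	bool supports_bannable;
	bool bannable;
	int ban_period;
};

static int sketch_set_param(struct sketch_kernel *k, enum sketch_param p,
			    int value)
{
	switch (p) {
	case SKETCH_BANNABLE:
		if (!k->supports_bannable)
			return -EINVAL;
		k->bannable = value != 0;
		return 0;
	case SKETCH_BAN_PERIOD:
		k->ban_period = value;
		return 0;
	}
	return -EINVAL;
}

/* Prefer the newer parameter; fall back to the legacy one on -EINVAL. */
static void sketch_unban(struct sketch_kernel *k)
{
	if (sketch_set_param(k, SKETCH_BANNABLE, 0) == -EINVAL) {
		int ret = sketch_set_param(k, SKETCH_BAN_PERIOD, 0);

		assert(ret == 0);
		(void)ret;
	}
}
```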

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 9f7f8062c..567eb71ee 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -284,6 +284,21 @@ static void test_error_state_capture(const intel_ctx_t 
*ctx,
check_alive();
 }
 
+static void context_unban(int fd, unsigned ctx)
+{
+   struct drm_i915_gem_context_param param = {
+   .ctx_id = ctx,
+   .param = I915_CONTEXT_PARAM_BANNABLE,
+   .value = 0,
+   };
+
+   if(__gem_context_set_param(fd, &param) == -EINVAL) {
+   igt_assert_eq(param.value, 0);
+   param.param = I915_CONTEXT_PARAM_BAN_PERIOD;
+   gem_context_set_param(fd, &param);
+   }
+}
+
 static void
 test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
@@ -307,6 +322,7 @@ test_engine_hang(const intel_ctx_t *ctx,
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   context_unban(device, local_ctx[num_ctx]->id);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Some platforms have very long timeouts configured for some engines.
Some have them disabled completely. That makes for a very slow (or
broken) hangman test. So explicitly configure the engines to have
reasonable settings first.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 567eb71ee..1a2b2cf7a 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -500,8 +500,12 @@ igt_main
 {
const intel_ctx_t *ctx;
igt_hang_t hang = {};
+   struct gem_engine_properties saved_params[GEM_MAX_ENGINES];
+   int num_engines = 0;
 
igt_fixture {
+   const struct intel_execution_engine2 *e;
+
device = drm_open_driver(DRIVER_INTEL);
igt_require_gem(device);
 
@@ -515,6 +519,13 @@ igt_main
igt_require(has_error_state(sysfs));
 
gem_require_mmap_wc(device);
+
+   for_each_physical_engine(device, e) {
+   saved_params[num_engines].engine = e;
+   saved_params[num_engines].preempt_timeout = 500;
+   saved_params[num_engines].heartbeat_interval = 1000;
+   gem_engine_properties_configure(device, saved_params + 
num_engines++);
+   }
}
 
igt_describe("Basic error capture");
@@ -546,6 +557,11 @@ igt_main
do_tests("engine", "engine", ctx);
 
igt_fixture {
+   int i;
+
+   for (i = 0; i < num_engines; i++)
+   gem_engine_properties_restore(device, saved_params + i);
+
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
close(device);
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 12/15] tests/i915/gem_exec_fence: Configure correct context

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The update to use intel_ctx_t missed a line that configures the
context to allow hanging. Fix that.

Fixes: 09c36188b23f83ef9a7b5414e2a10100adc4291f

Signed-off-by: John Harrison 
---
 tests/i915/gem_exec_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
index 196236b27..5e45d0518 100644
--- a/tests/i915/gem_exec_fence.c
+++ b/tests/i915/gem_exec_fence.c
@@ -3139,7 +3139,7 @@ igt_main
igt_hang_t hang;
 
igt_fixture {
-   hang = igt_allow_hang(i915, 0, 0);
+   hang = igt_allow_hang(i915, ctx->id, 0);
intel_allocator_multiprocess_start();
}
 
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 09/15] tests/i915/i915_hangman: Remove reliance on context persistance

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The hang test was relying on context persistence for no particular
reason. That is, it would set a bunch of background spinners running
then immediately destroy the active contexts but expect the spinners
to keep spinning. With the current implementation of context
persistence in i915, that means that super high priority pings are
sent to each engine at the start of the test. Depending upon the
timing and platform, one of those unexpected pings could cause test
failures.

There is no need to require context persistence in this test. So change
to managing the contexts cleanly and only destroying them when they
are no longer in use.
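
The lifetime change can be pictured with a small standalone sketch: every background context is kept in an array while its workload runs and is only destroyed once the workload has been torn down. The sketch_ctx type and helpers below are hypothetical stand-ins for intel_ctx_create() / intel_ctx_destroy(), and the "workload" is just a comment.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_ENGINES 8

struct sketch_ctx {
	int id;
};

/* Number of contexts currently alive, for illustration only. */
static int sketch_live_contexts;

static struct sketch_ctx *sketch_ctx_create(int id)
{
	struct sketch_ctx *c = malloc(sizeof(*c));

	assert(c);
	c->id = id;
	sketch_live_contexts++;
	return c;
}

static void sketch_ctx_destroy(struct sketch_ctx *c)
{
	free(c);
	sketch_live_contexts--;
}

/* Returns how many contexts were alive while the workload was running. */
static int sketch_run_background(int num_engines)
{
	struct sketch_ctx *local_ctx[SKETCH_MAX_ENGINES];
	int num_ctx = 0, peak;

	assert(num_engines >= 0 && num_engines <= SKETCH_MAX_ENGINES);

	for (int i = 0; i < num_engines; i++)
		local_ctx[num_ctx++] = sketch_ctx_create(i);

	/* ... spinners would run and be freed here, contexts still valid ... */
	peak = sketch_live_contexts;

	/* Only now, with nothing using them, are the contexts destroyed. */
	while (num_ctx)
		sketch_ctx_destroy(local_ctx[--num_ctx]);

	return peak;
}
```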

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 918418760..6601db5f6 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -289,27 +289,29 @@ test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
 {
const struct intel_execution_engine2 *other;
-   const intel_ctx_t *tmp_ctx;
+   const intel_ctx_t *local_ctx[GEM_MAX_ENGINES];
igt_spin_t *spin, *next;
IGT_LIST_HEAD(list);
uint64_t ahnd = get_reloc_ahnd(device, ctx->id), ahndN;
+   int num_ctx;
 
igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
 
/* Fill all the other engines with background load */
+   num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
if (other->flags == e->flags)
continue;
 
-   tmp_ctx = intel_ctx_create(device, &ctx->cfg);
-   ahndN = get_reloc_ahnd(device, tmp_ctx->id);
+   local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
- .ctx = tmp_ctx,
+ .ctx = local_ctx[num_ctx],
  .engine = other->flags,
  .flags = IGT_SPIN_FENCE_OUT);
-   intel_ctx_destroy(device, tmp_ctx);
+   num_ctx++;
 
igt_list_move(&spin->link, &list);
}
@@ -339,7 +341,10 @@ test_engine_hang(const intel_ctx_t *ctx,
igt_spin_free(device, spin);
put_ahnd(ahndN);
}
+
put_ahnd(ahnd);
+   while (num_ctx)
+   intel_ctx_destroy(device, local_ctx[--num_ctx]);
 
check_alive();
 }
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 10/15] tests/i915/i915_hangman: Run background task on all engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Run the background task on all engines, as opposed to only on the
non-target engines. This means that there is some other workload present
for the scheduler to switch between and so detect the hang immediately.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 6601db5f6..9f7f8062c 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -298,12 +298,14 @@ test_engine_hang(const intel_ctx_t *ctx,
igt_skip_on(flags & IGT_SPIN_INVALID_CS &&
gem_engine_has_cmdparser(device, &ctx->cfg, e->flags));
 
-   /* Fill all the other engines with background load */
+   /*
+* Fill all engines with background load.
+* This verifies that independent engines are unaffected and gives
+* the target engine something to switch between so it notices the
+* hang.
+*/
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
-   if (other->flags == e->flags)
-   continue;
-
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 03/15] tests/i915/i915_hangman: Update capture test to use engine structure

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The capture test was still using old style ring_id and ring_name
(derived from the engine structure at the higher level). Update it to
just take the engine structure directly.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index f64b8819d..280eac197 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -207,8 +207,8 @@ static void check_error_state(const char 
*expected_ring_name,
igt_assert(found);
 }
 
-static void test_error_state_capture(const intel_ctx_t *ctx, unsigned ring_id,
-const char *ring_name)
+static void test_error_state_capture(const intel_ctx_t *ctx,
+const struct intel_execution_engine2 *e)
 {
uint32_t *batch;
igt_hang_t hang;
@@ -217,7 +217,7 @@ static void test_error_state_capture(const intel_ctx_t 
*ctx, unsigned ring_id,
 
clear_error_state();
 
-   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, ring_id,
+   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, e->flags,
  HANG_ALLOW_CAPTURE);
offset = hang.spin->obj[IGT_SPIN_BATCH].offset;
 
@@ -226,7 +226,7 @@ static void test_error_state_capture(const intel_ctx_t 
*ctx, unsigned ring_id,
 
igt_post_hang_ring(device, hang);
 
-   check_error_state(ring_name, offset, batch);
+   check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
 }
@@ -351,7 +351,7 @@ igt_main
igt_subtest_with_dynamic("error-state-capture") {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
-   test_error_state_capture(ctx, e->flags, 
e->name);
+   test_error_state_capture(ctx, e);
}
}
 
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 04/15] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Although the hangman test was ensuring that *some* reset functionality
was enabled, it did not differentiate what kind. The infrastructure
required to choose between per engine reset or full GT reset was
recently added. So update this test to use it as well.
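
The patch below builds each subtest name from a reset-type prefix and a fixed suffix, so the same set of subtests runs once per reset mode. A minimal standalone sketch of that naming scheme, using only snprintf (sketch_subtest_name() is a hypothetical helper, not part of IGT):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix>-<base>" into buf, e.g. "gt-engine-hang".
 * Asserts that the result was not truncated.
 */
static void sketch_subtest_name(char *buf, size_t len,
				const char *prefix, const char *base)
{
	int n = snprintf(buf, len, "%s-%s", prefix, base);

	assert(n >= 0 && (size_t)n < len);
	(void)n;
}
```

With the prefixes used in the patch ("gt" and "engine"), this yields names such as "gt-error-state-capture" and "engine-engine-hang".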

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 76 +--
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 280eac197..7b8390a6c 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -323,40 +323,26 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
}
 }
 
-igt_main
+static void do_tests(const char *name, const char *prefix,
+const intel_ctx_t *ctx)
 {
const struct intel_execution_engine2 *e;
-   const intel_ctx_t *ctx;
-   igt_hang_t hang = {};
-
-   igt_fixture {
-   device = drm_open_driver(DRIVER_INTEL);
-   igt_require_gem(device);
-
-   ctx = intel_ctx_create_all_physical(device);
-
-   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
-
-   sysfs = igt_sysfs_open(device);
-   igt_assert(sysfs != -1);
-
-   igt_require(has_error_state(sysfs));
-   }
+   char buff[256];
 
-   igt_describe("Basic error capture");
-   igt_subtest("error-state-basic")
-   test_error_state_basic();
-
-   igt_describe("Per engine error capture");
-   igt_subtest_with_dynamic("error-state-capture") {
+   snprintf(buff, sizeof(buff), "Per engine error capture (%s reset)", 
name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-error-state-capture", prefix);
+   igt_subtest_with_dynamic(buff) {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
test_error_state_capture(ctx, e);
}
}
 
-   igt_describe("Per engine hang recovery (spin)");
-   igt_subtest_with_dynamic("engine-hang") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (spin, %s 
reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-hang", prefix);
+   igt_subtest_with_dynamic(buff) {
 int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -374,8 +360,10 @@ igt_main
}
}
 
-   igt_describe("Per engine hang recovery (invalid CS)");
-   igt_subtest_with_dynamic("engine-error") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (invalid CS, %s 
reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-error", prefix);
+   igt_subtest_with_dynamic(buff) {
int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -391,11 +379,45 @@ igt_main
test_engine_hang(ctx, e, IGT_SPIN_INVALID_CS);
}
}
+}
+
+igt_main
+{
+   const intel_ctx_t *ctx;
+   igt_hang_t hang = {};
+
+   igt_fixture {
+   device = drm_open_driver(DRIVER_INTEL);
+   igt_require_gem(device);
+
+   ctx = intel_ctx_create_all_physical(device);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
+
+   sysfs = igt_sysfs_open(device);
+   igt_assert(sysfs != -1);
+
+   igt_require(has_error_state(sysfs));
+   }
+
+   igt_describe("Basic error capture");
+   igt_subtest("error-state-basic")
+   test_error_state_basic();
 
igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated(ctx);
 
+   do_tests("GT", "gt", ctx);
+
+   igt_fixture {
+   igt_disallow_hang(device, hang);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE | 
HANG_WANT_ENGINE_RESET);
+   }
+
+   do_tests("engine", "engine", ctx);
+
igt_fixture {
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 13/15] lib/i915: Add helper for non-destructive engine property updates

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Various tests want to configure engine properties such as pre-emption
timeout and heartbeat interval. Some don't bother to restore the
original values again afterwards. So, add a helper to make it easier
to do this.
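
The save/restore pattern of the new helper can be sketched without sysfs: the caller passes in the values it wants, and the same struct comes back holding the previous values so they can be written back later. In the standalone sketch below the "engine" is a plain global whose starting values (640 and 2500) are made up for illustration; sketch_configure() and sketch_restore() are hypothetical stand-ins for the helpers in the diff.

```c
#include <assert.h>

struct sketch_props {
	int preempt_timeout;
	int heartbeat_interval;
};

/* Stand-in for the per-engine sysfs attributes; values are arbitrary. */
static struct sketch_props sketch_engine = { 640, 2500 };

/*
 * Apply the requested values and hand the previous ones back to the
 * caller through the same struct.
 */
static void sketch_configure(struct sketch_props *params)
{
	struct sketch_props write = *params;

	assert(write.preempt_timeout >= 0 && write.heartbeat_interval >= 0);

	*params = sketch_engine;	/* save current settings */
	sketch_engine = write;		/* apply requested settings */
}

/* Write the saved values back. */
static void sketch_restore(const struct sketch_props *saved)
{
	sketch_engine = *saved;
}
```

Reusing one struct for both request and saved state keeps the caller's bookkeeping to a single variable per engine.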

v2: Fix for platforms with no pre-emption capability.

Signed-off-by: John Harrison 
---
 lib/i915/gem_engine_topology.c | 46 ++
 lib/i915/gem_engine_topology.h |  9 +++
 2 files changed, 55 insertions(+)

diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
index 729f42b0a..bd12d0bc9 100644
--- a/lib/i915/gem_engine_topology.c
+++ b/lib/i915/gem_engine_topology.c
@@ -488,6 +488,52 @@ int gem_engine_property_printf(int i915, const char 
*engine, const char *attr,
return ret;
 }
 
+/* Ensure fast hang detection */
+void gem_engine_properties_configure(int fd, struct gem_engine_properties 
*params)
+{
+   int ret;
+   struct gem_engine_properties write = *params;
+
+   ret = gem_engine_property_scanf(fd, write.engine->name,
+   "heartbeat_interval_ms",
+   "%d", &params->heartbeat_interval);
+   igt_assert_eq(ret, 1);
+
+   ret = gem_engine_property_printf(fd, write.engine->name,
+"heartbeat_interval_ms", "%d",
+write.heartbeat_interval);
+   igt_assert_lt(0, ret);
+
+   if (gem_scheduler_has_preemption(fd)) {
+   ret = gem_engine_property_scanf(fd, write.engine->name,
+   "preempt_timeout_ms",
+   "%d", &params->preempt_timeout);
+   igt_assert_eq(ret, 1);
+
+   ret = gem_engine_property_printf(fd, write.engine->name,
+"preempt_timeout_ms", "%d",
+write.preempt_timeout);
+   igt_assert_lt(0, ret);
+   }
+}
+
+void gem_engine_properties_restore(int fd, const struct gem_engine_properties 
*saved)
+{
+   int ret;
+
+   ret = gem_engine_property_printf(fd, saved->engine->name,
+"heartbeat_interval_ms", "%d",
+saved->heartbeat_interval);
+   igt_assert_lt(0, ret);
+
+   if (gem_scheduler_has_preemption(fd)) {
+   ret = gem_engine_property_printf(fd, saved->engine->name,
+"preempt_timeout_ms", "%d",
+saved->preempt_timeout);
+   igt_assert_lt(0, ret);
+   }
+}
+
 uint32_t gem_engine_mmio_base(int i915, const char *engine)
 {
unsigned int mmio = 0;
diff --git a/lib/i915/gem_engine_topology.h b/lib/i915/gem_engine_topology.h
index 4cfab560b..b413aa8ab 100644
--- a/lib/i915/gem_engine_topology.h
+++ b/lib/i915/gem_engine_topology.h
@@ -115,6 +115,15 @@ struct intel_execution_engine2 
gem_eb_flags_to_engine(unsigned int flags);
 ((e__) = intel_get_current_physical_engine(__##e__)); \
 intel_next_engine(__##e__))
 
+struct gem_engine_properties {
+   const struct intel_execution_engine2 *engine;
+   int preempt_timeout;
+   int heartbeat_interval;
+};
+
+void gem_engine_properties_configure(int fd, struct gem_engine_properties 
*params);
+void gem_engine_properties_restore(int fd, const struct gem_engine_properties 
*saved);
+
 __attribute__((format(scanf, 4, 5)))
 int gem_engine_property_scanf(int i915, const char *engine, const char *attr,
  const char *fmt, ...);
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The test was updating some engine properties but not restoring them
afterwards. That would leave the system in a non-default state which
could potentially affect subsequent tests. Fix it by using the new
save/restore engine properties helper functions.

Signed-off-by: John Harrison 
---
 tests/i915/gem_exec_capture.c | 37 ++-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 9beb36fc7..51db07c41 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -209,14 +209,21 @@ static int check_error_state(int dir, struct offset 
*obj_offsets, int obj_count,
return blobs;
 }
 
-static void configure_hangs(int fd, const struct intel_execution_engine2 *e, 
int ctxt_id)
+static struct gem_engine_properties
+configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
 {
+   struct gem_engine_properties props;
+
/* Ensure fast hang detection */
-   gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 
250);
-   gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 
500);
+   props.engine = e;
+   props.preempt_timeout = 250;
+   props.heartbeat_interval = 500;
+   gem_engine_properties_configure(fd, &props);
 
/* Allow engine based resets and disable banning */
igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | 
HANG_WANT_ENGINE_RESET);
+
+   return props;
 }
 
 static bool fence_busy(int fence)
@@ -256,8 +263,9 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset offset;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
memset(obj, 0, sizeof(obj));
obj[SCRATCH].handle = gem_create_in_memory_regions(fd, 4096, region);
@@ -371,6 +379,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
gem_close(fd, obj[BATCH].handle);
gem_close(fd, obj[NOCAPTURE].handle);
gem_close(fd, obj[SCRATCH].handle);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void capture(int fd, int dir, const intel_ctx_t *ctx,
@@ -417,8 +427,9 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset *offsets;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
offsets = calloc(count, sizeof(*offsets));
igt_assert(offsets);
@@ -559,10 +570,12 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
 
qsort(offsets, count, sizeof(*offsets), cmp);
igt_assert(offsets[0].addr <= offsets[count-1].addr);
+
+   gem_engine_properties_restore(fd, &saved_engine);
return offsets;
 }
 
-#define find_first_available_engine(fd, ctx, e) \
+#define find_first_available_engine(fd, ctx, e, saved) \
do { \
ctx = intel_ctx_create_all_physical(fd); \
igt_assert(ctx); \
@@ -570,7 +583,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
for_each_if(gem_class_can_store_dword(fd, e->class)) \
break; \
igt_assert(e); \
-   configure_hangs(fd, e, ctx->id); \
+   saved = configure_hangs(fd, e, ctx->id); \
} while(0)
 
 static void many(int fd, int dir, uint64_t size, unsigned int flags)
@@ -580,8 +593,9 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
uint64_t ram, gtt, ahnd;
unsigned long count, blobs;
struct offset *offsets;
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
gtt = gem_aperture_size(fd) / size;
ram = (intel_get_avail_ram_mb() << 20) / size;
@@ -602,6 +616,8 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
 
free(offsets);
put_ahnd(ahnd);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
@@ -697,8 +713,9 @@ static void userptr(int fd, int dir)
void *ptr;
int obj_size = 4096;
uint32_t system_region = INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0);
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
memset(ptr, 0, obj_size);
@@ -710,6 +727,8 @@ static void userptr(int fd, int dir)
gem_close(fd, handle);

[Intel-gfx] [PATCH v3 i-g-t 06/15] tests/i915/i915_hangman: Use the correct context in hangcheck_unterminated

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The hangman framework sets up a context that is valid for all engines
and has things like banning disabled. The 'unterminated' test then
ignores it and uses the default context. Fix that.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 354769f39..6656b3fcd 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -347,6 +347,7 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
memset(&execbuf, 0, sizeof(execbuf));
execbuf.buffers_ptr = (uintptr_t)_exec;
execbuf.buffer_count = 1;
+   execbuf.rsvd1 = ctx->id;
 
gem_execbuf(device, &execbuf);
if (gem_wait(device, handle, _ns) != 0) {
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 01/15] tests/i915/i915_hangman: Add descriptions

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added descriptions of the various sub-tests and the test as a whole.

v2: Added missing linefeed (spotted by Petri)

Signed-off-by: John Harrison 
Reviewed-by: Petri Latvala 
---
 tests/i915/i915_hangman.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 4c18c22db..b9c4d9983 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -46,6 +46,8 @@
 static int device = -1;
 static int sysfs = -1;
 
+IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -315,9 +317,9 @@ static void hangcheck_unterminated(void)
 
gem_execbuf(device, &execbuf);
if (gem_wait(device, handle, _ns) != 0) {
-   /* need to manually trigger an hang to clean before failing */
+   /* need to manually trigger a hang to clean before failing */
igt_force_gpu_reset(device);
-   igt_assert_f(0, "unterminated batch did not trigger an hang!");
+   igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
 }
 
@@ -341,9 +343,11 @@ igt_main
igt_require(has_error_state(sysfs));
}
 
+   igt_describe("Basic error capture");
igt_subtest("error-state-basic")
test_error_state_basic();
 
+   igt_describe("Per engine error capture");
igt_subtest_with_dynamic("error-state-capture") {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
@@ -351,6 +355,7 @@ igt_main
}
}
 
+   igt_describe("Per engine hang recovery (spin)");
igt_subtest_with_dynamic("engine-hang") {
 int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
@@ -369,6 +374,7 @@ igt_main
}
}
 
+   igt_describe("Per engine hang recovery (invalid CS)");
igt_subtest_with_dynamic("engine-error") {
int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
@@ -386,6 +392,7 @@ igt_main
}
}
 
+   igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated();
 
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added an extra step to the i915_hangman tests to check that the
system is still alive after the hang and recovery. This submits a
simple batch to each engine which does a write to memory and checks
that the write occurred.
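
The reason for the OFFSET_ALIVE bias in the check below can be shown with a small standalone model: the scratch buffer starts out zeroed, so if engine 0 wrote the value 0 to slot 0, a store that never happened would look identical to one that did. The sketch uses a plain array instead of a GPU buffer, and sketch_all_alive() is a hypothetical function, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OFFSET_ALIVE 10
#define SKETCH_SLOTS 64

/*
 * engine_ran[i] says whether engine i actually executed its store.
 * Each engine that ran writes (i + offset) to slot (i + offset);
 * afterwards every slot is checked against its expected value.
 */
static bool sketch_all_alive(const bool *engine_ran, int num_engines,
			     int offset)
{
	uint32_t scratch[SKETCH_SLOTS];
	int i;

	assert(offset >= 0 && num_engines >= 0);
	assert(num_engines + offset <= SKETCH_SLOTS);

	memset(scratch, 0, sizeof(scratch));	/* fresh buffer reads as zero */

	for (i = 0; i < num_engines; i++)
		if (engine_ran[i])
			scratch[i + offset] = (uint32_t)(i + offset);

	for (i = 0; i < num_engines; i++)
		if (scratch[i + offset] != (uint32_t)(i + offset))
			return false;

	return true;
}
```

With an offset of zero, a dead engine 0 goes unnoticed because its expected value equals the buffer's initial contents; with the bias, the mismatch is caught.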

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 5a0c9497c..918418760 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -48,8 +48,57 @@
 static int device = -1;
 static int sysfs = -1;
 
+#define OFFSET_ALIVE   10
+
 IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
 
+static void check_alive(void)
+{
+   const struct intel_execution_engine2 *engine;
+   const intel_ctx_t *ctx;
+   uint32_t scratch, *out;
+   int fd, i = 0;
+   uint64_t ahnd, scratch_addr;
+
+   fd = drm_open_driver(DRIVER_INTEL);
+   igt_require(gem_class_can_store_dword(fd, 0));
+
+   ctx = intel_ctx_create_all_physical(fd);
+   ahnd = get_reloc_ahnd(fd, ctx->id);
+   scratch = gem_create(fd, 4096);
+   scratch_addr = get_offset(ahnd, scratch, 4096, 0);
+   out = gem_mmap__wc(fd, scratch, 0, 4096, PROT_WRITE);
+   gem_set_domain(fd, scratch,
+   I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
+
+   for_each_physical_engine(fd, engine) {
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], 0);
+   i++;
+   }
+
+   i = 0;
+   for_each_ctx_engine(fd, ctx, engine) {
+   if (!gem_class_can_store_dword(fd, engine->class))
+   continue;
+
+   /* +OFFSET_ALIVE to ensure engine zero doesn't get a false negative */
+   igt_store_word(fd, ahnd, ctx, engine, -1, scratch, scratch_addr,
+  i + OFFSET_ALIVE, i + OFFSET_ALIVE);
+   i++;
+   }
+
+   gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
+
+   while (i--)
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], i + OFFSET_ALIVE);
+
+   munmap(out, 4096);
+   gem_close(fd, scratch);
+   put_ahnd(ahnd);
+   intel_ctx_destroy(fd, ctx);
+   close(fd);
+}
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -231,6 +280,8 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static void
@@ -289,6 +340,8 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahndN);
}
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static int hang_count;
@@ -321,6 +374,8 @@ static void test_hang_detector(const intel_ctx_t *ctx,
 
/* Did it work? */
igt_assert(hang_count == 1);
+
+   check_alive();
 }
 
 /* This test covers the case where we end up in an uninitialised area of the
@@ -356,6 +411,8 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
igt_force_gpu_reset(device);
igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
+
+   check_alive();
 }
 
 static void do_tests(const char *name, const char *prefix,
@@ -433,6 +490,8 @@ igt_main
igt_assert(sysfs != -1);
 
igt_require(has_error_state(sysfs));
+
+   gem_require_mmap_wc(device);
}
 
igt_describe("Basic error capture");
-- 
2.25.1
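
For readers following the alive-ness patch above, here is a minimal, CPU-only sketch of the write-and-verify pattern it describes. It is an illustration, not the IGT implementation: fake_engine_store() and check_alive_sketch() are hypothetical names, the "engines" are plain array writes, and the broken_engine parameter is an assumption added to simulate an engine that never executes its store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_DWORDS 1024	/* 4096-byte scratch buffer, as in the patch */

/* Hypothetical stand-in for a GPU store: writes one dword into scratch. */
static void fake_engine_store(uint32_t *scratch, unsigned int slot,
			      uint32_t value)
{
	scratch[slot] = value;
}

/*
 * Each "engine" i writes the value (i + offset) to slot (i + offset) of a
 * zeroed buffer, then every slot is checked. broken_engine simulates an
 * engine that never performs its write; pass -1 for "all engines alive".
 * Returns true when every expected value is found.
 */
static bool check_alive_sketch(unsigned int num_engines, int broken_engine,
			       unsigned int offset)
{
	uint32_t scratch[SCRATCH_DWORDS];
	unsigned int i;

	assert(num_engines + offset <= SCRATCH_DWORDS);
	memset(scratch, 0, sizeof(scratch));

	for (i = 0; i < num_engines; i++) {
		if ((int)i == broken_engine)
			continue;
		fake_engine_store(scratch, i + offset, i + offset);
	}

	for (i = 0; i < num_engines; i++)
		if (scratch[i + offset] != i + offset)
			return false;

	return true;
}
```

With an offset of zero, a dead engine zero is indistinguishable from a live one, because the value it should have written (0) matches the zero-initialised buffer. That appears to be the false result the OFFSET_ALIVE comment in the patch guards against: check_alive_sketch(4, 0, 0) passes even though engine zero wrote nothing, while check_alive_sketch(4, 0, 10) fails as it should.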

[Intel-gfx] [PATCH v3 i-g-t 05/15] tests/i915/i915_hangman: Add uevent test & fix detector

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Some of the IGT framework relies on receiving a uevent when a hang
occurs. So add a test that this actually works.

While testing this, noticed that hangs could sometimes be missed
because the uevent was (presumably) still in flight by the time the
handler was de-registered. So add an extra delay during cleanup to
give the uevent a chance to arrive.

Signed-off-by: John Harrison 
---
 lib/igt_aux.c |  7 +++
 tests/i915/i915_hangman.c | 43 +++
 2 files changed, 50 insertions(+)

diff --git a/lib/igt_aux.c b/lib/igt_aux.c
index c247a1aa4..03cc38c93 100644
--- a/lib/igt_aux.c
+++ b/lib/igt_aux.c
@@ -523,6 +523,13 @@ void igt_fork_hang_detector(int fd)
 
 void igt_stop_hang_detector(void)
 {
+   /*
+* Give the uevent time to arrive. No sleep at all misses about 20% of
+* hangs (at least, in the i915_hangman/detector test). A sleep of 1ms
+* seems to miss about 2%, 10ms loses <1%, so 100ms should be safe.
+*/
+   usleep(100 * 1000);
+
igt_stop_helper(&hang_detector);
 }
 
diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 7b8390a6c..354769f39 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i915/gem.h"
 #include "i915/gem_create.h"
@@ -289,6 +290,38 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahnd);
 }
 
+static int hang_count;
+
+static void sig_io(int sig)
+{
+   hang_count++;
+}
+
+static void test_hang_detector(const intel_ctx_t *ctx,
+  const struct intel_execution_engine2 *e)
+{
+   igt_hang_t hang;
+   uint64_t ahnd = get_reloc_ahnd(device, ctx->id);
+
+   hang_count = 0;
+
+   igt_fork_hang_detector(device);
+
+   /* Steal the signal handler */
+   signal(SIGIO, sig_io);
+
+   /* Make a hang... */
+   hang = igt_hang_ctx_with_ahnd(device, ahnd, ctx->id, e->flags, 0);
+
+   igt_post_hang_ring(device, hang);
+   put_ahnd(ahnd);
+
+   igt_stop_hang_detector();
+
+   /* Did it work? */
+   igt_assert(hang_count == 1);
+}
+
 /* This test covers the case where we end up in an uninitialised area of the
  * ppgtt and keep executing through it. This is particularly relevant if 48b
  * ppgtt is enabled because the ppgtt is massively bigger compared to the 32b
@@ -408,6 +441,16 @@ igt_main
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated(ctx);
 
+   igt_describe("Check that hang detector works");
+   igt_subtest_with_dynamic("detector") {
+   const struct intel_execution_engine2 *e;
+
+   for_each_ctx_engine(device, ctx, e) {
+   igt_dynamic_f("%s", e->name)
+   test_hang_detector(ctx, e);
+   }
+   }
+
do_tests("GT", "gt", ctx);
 
igt_fixture {
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 02/15] lib/hang: Fix igt_require_hang_ring to work with all engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The above function was checking for valid rings via the old interface.
The new scheme is to check for engines on contexts as there are now
more engines than could be supported.

Signed-off-by: John Harrison 
---
 lib/igt_gt.c  | 6 +++---
 lib/igt_gt.h  | 2 +-
 tests/i915/i915_hangman.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 7c7df95ee..50da512f2 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -122,12 +122,12 @@ static void eat_error_state(int dev)
  * to be done under hang injection.
  * Default: false
  */
-void igt_require_hang_ring(int fd, int ring)
+void igt_require_hang_ring(int fd, uint32_t ctx, int ring)
 {
if (!igt_check_boolean_env_var("IGT_HANG", true))
igt_skip("hang injection disabled by user [IGT_HANG=0]\n");
 
-   gem_require_ring(fd, ring);
+igt_require(gem_context_has_engine(fd, ctx, ring));
gem_context_require_bannable(fd);
if (!igt_check_boolean_env_var("IGT_HANG_WITHOUT_RESET", false))
igt_require(has_gpu_reset(fd));
@@ -290,7 +290,7 @@ static igt_hang_t __igt_hang_ctx(int fd, uint64_t ahnd, uint32_t ctx, int ring,
igt_spin_t *spin;
unsigned ban;
 
-   igt_require_hang_ring(fd, ring);
+   igt_require_hang_ring(fd, ctx, ring);
 
/* check if non-default ctx submission is allowed */
igt_require(ctx == 0 || has_ctx_exec(fd, ring, ctx));
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index c5059817b..3d10349e4 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -31,7 +31,7 @@
 #include "i915/i915_drm_local.h"
 #include "i915_drm.h"
 
-void igt_require_hang_ring(int fd, int ring);
+void igt_require_hang_ring(int fd, uint32_t ctx, int ring);
 
 typedef struct igt_hang {
igt_spin_t *spin;
diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index b9c4d9983..f64b8819d 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -295,7 +295,7 @@ test_engine_hang(const intel_ctx_t *ctx,
  * case and it takes a lot more time to wrap, so the acthd can potentially keep
  * increasing for a long time
  */
-static void hangcheck_unterminated(void)
+static void hangcheck_unterminated(const intel_ctx_t *ctx)
 {
/* timeout needs to be greater than ~5*hangcheck */
int64_t timeout_ns = 100ull * NSEC_PER_SEC; /* 100 seconds */
@@ -304,7 +304,7 @@ static void hangcheck_unterminated(void)
uint32_t handle;
 
igt_require(gem_uses_full_ppgtt(device));
-   igt_require_hang_ring(device, 0);
+   igt_require_hang_ring(device, ctx->id, 0);
 
handle = gem_create(device, 4096);
 
@@ -394,7 +394,7 @@ igt_main
 
igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
-   hangcheck_unterminated();
+   hangcheck_unterminated(ctx);
 
igt_fixture {
igt_disallow_hang(device, hang);
-- 
2.25.1

[Intel-gfx] [PATCH v3 i-g-t 00/15] Fixes for i915_hangman and gem_exec_capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Fix a bunch of issues with i915_hangman and gem_exec_capture with the
ultimate aim of making them pass on GuC enabled platforms.

v2: Fixes to the store code. Add engine properties management.
v3: Fix for platforms without pre-emption.

Signed-off-by: John Harrison 


John Harrison (15):
  tests/i915/i915_hangman: Add descriptions
  lib/hang: Fix igt_require_hang_ring to work with all engines
  tests/i915/i915_hangman: Update capture test to use engine structure
  tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU
reset
  tests/i915/i915_hangman: Add uevent test & fix detector
  tests/i915/i915_hangman: Use the correct context in
hangcheck_unterminated
  lib/store: Refactor common store code into helper function
  tests/i915/i915_hangman: Add alive-ness test after error capture
  tests/i915/i915_hangman: Remove reliance on context persistance
  tests/i915/i915_hangman: Run background task on all engines
  tests/i915/i915_hangman: Don't let background contexts cause a ban
  tests/i915/gem_exec_fence: Configure correct context
  lib/i915: Add helper for non-destructive engine property updates
  tests/i915/i915_hangman: Configure engine properties for quicker hangs
  tests/i915/gem_exec_capture: Restore engines

 lib/i915/gem_engine_topology.c |  46 ++
 lib/i915/gem_engine_topology.h |   9 ++
 lib/igt_aux.c  |   7 +
 lib/igt_gt.c   |   6 +-
 lib/igt_gt.h   |   2 +-
 lib/igt_store.c|  96 +
 lib/igt_store.h|  12 ++
 lib/meson.build|   1 +
 tests/i915/gem_exec_capture.c  |  37 +++--
 tests/i915/gem_exec_fence.c|  79 +-
 tests/i915/i915_hangman.c  | 256 +++--
 11 files changed, 423 insertions(+), 128 deletions(-)
 create mode 100644 lib/igt_store.c
 create mode 100644 lib/igt_store.h

-- 
2.25.1

[Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence
URL   : https://patchwork.freedesktop.org/series/98853/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_21997_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Participating hosts (10 -> 10)
--

  No changes in participating hosts

Known issues


  Here are the changes found in Patchwork_21997_full that come from known 
issues:

### CI changes ###

 Issues hit 

  * boot:
- shard-glk:  ([PASS][1], [PASS][2], [PASS][3], [PASS][4], 
[PASS][5], [PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], 
[PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], 
[PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], 
[PASS][24], [PASS][25]) -> ([PASS][26], [PASS][27], [PASS][28], [PASS][29], 
[PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], 
[PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], 
[PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [FAIL][47], 
[PASS][48], [PASS][49], [PASS][50]) ([i915#4392])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk4/boot.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk5/boot.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk5/boot.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk5/boot.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk6/boot.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk6/boot.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk6/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk7/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk7/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk7/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk8/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk8/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk8/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk9/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk9/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk9/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk1/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk1/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk1/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk2/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk2/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk3/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk3/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk4/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk4/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk9/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk9/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk9/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk8/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk8/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk8/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk7/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk7/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk6/boot.html
   [35]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk6/boot.html
   [36]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk6/boot.html
   [37]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk5/boot.html
   [38]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk5/boot.html
   [39]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk4/boot.html
   [40]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk4/boot.html
   [41]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk4/boot.html
   [42]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk3/boot.html
   [43]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/shard-glk3/boot.html
   [44]:

[Intel-gfx] ✓ Fi.CI.BAT: success for Remove some hacks required for GuC 62.0.0 (rev2)

2022-01-13 Thread Patchwork

== Series Details ==

Series: Remove some hacks required for GuC 62.0.0 (rev2)
URL   : https://patchwork.freedesktop.org/series/98773/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_21999


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/index.html

Participating hosts (40 -> 41)
--

  Additional (6): fi-kbl-soraka bat-dg1-6 bat-adlp-6 bat-rpls-1 bat-jsl-2 
bat-jsl-1 
  Missing(5): shard-tglu fi-bsw-cyan fi-pnv-d510 shard-rkl shard-dg1 

Known issues


  Here are the changes found in Patchwork_21999 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][1] ([fdo#109271]) +31 similar issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html
- fi-hsw-4770:NOTRUN -> [SKIP][2] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-hsw-4770/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@nullptr:
- bat-dg1-6:  NOTRUN -> [SKIP][3] ([i915#2582]) +4 similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@fb...@nullptr.html

  * igt@gem_exec_fence@basic-busy@bcs0:
- fi-kbl-soraka:  NOTRUN -> [SKIP][4] ([fdo#109271]) +8 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-kbl-soraka/igt@gem_exec_fence@basic-b...@bcs0.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][5] ([i915#4086])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@gem_exec_gttf...@basic.html

  * igt@gem_huc_copy@huc-copy:
- fi-kbl-soraka:  NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#2190])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-kbl-soraka/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
- fi-kbl-soraka:  NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-kbl-soraka/igt@gem_lmem_swapp...@parallel-random-engines.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][8] ([i915#4083])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][9] ([i915#4077]) +2 similar issues
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-6:  NOTRUN -> [SKIP][10] ([i915#4079]) +1 similar issue
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-6:  NOTRUN -> [SKIP][11] ([i915#1155])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_selftest@live@gt_pm:
- fi-kbl-soraka:  NOTRUN -> [DMESG-FAIL][12] ([i915#1886] / [i915#2291])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg1-6:  NOTRUN -> [DMESG-FAIL][13] ([i915#4494])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][14] ([i915#4212]) +7 similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@kms_addfb_ba...@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][15] ([i915#4215])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@kms_addfb_ba...@basic-y-tiled-legacy.html

  * igt@kms_busy@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][16] ([i915#4303])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@kms_b...@basic.html

  * igt@kms_chamelium@dp-edid-read:
- fi-kbl-soraka:  NOTRUN -> [SKIP][17] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/fi-kbl-soraka/igt@kms_chamel...@dp-edid-read.html

  * igt@kms_chamelium@hdmi-edid-read:
- bat-dg1-6:  NOTRUN -> [SKIP][18] ([fdo#111827]) +8 similar issues
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21999/bat-dg1-6/igt@kms_chamel...@hdmi-edid-read.html

  * igt@kms_chamelium@vga-edid-read:
- fi-bdw-5557u:   NOTRUN -> [SKIP][19] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [19]:

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Remove some hacks required for GuC 62.0.0 (rev2)

2022-01-13 Thread Patchwork

== Series Details ==

Series: Remove some hacks required for GuC 62.0.0 (rev2)
URL   : https://patchwork.freedesktop.org/series/98773/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
0bbf21e5b612 drm/i915/selftests: Add a cancel request selftest that triggers a 
reset
6c1f06c0b615 drm/i915/guc: Remove hacks for reset and schedule disable G2H 
being received out of order
-:14: WARNING:TYPO_SPELLING: 'cancelation' may be misspelled - perhaps 
'cancellation'?
#14: 
  - s/cancelation/cancellation
  ^^^

total: 0 errors, 1 warnings, 0 checks, 63 lines checked

[Intel-gfx] ✗ Fi.CI.BAT: failure for Flush G2H handler during a GT reset

2022-01-13 Thread Patchwork

== Series Details ==

Series: Flush G2H handler during a GT reset
URL   : https://patchwork.freedesktop.org/series/98855/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_21998


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_21998 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21998, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/index.html

Participating hosts (39 -> 42)
--

  Additional (6): bat-dg1-6 bat-dg1-5 bat-adlp-6 bat-rpls-1 bat-jsl-2 bat-jsl-1 
  Missing(3): fi-bsw-cyan shard-rkl shard-tglu 

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_21998:

### IGT changes ###

 Possible regressions 

  * igt@i915_selftest@live@hangcheck:
- bat-dg1-5:  NOTRUN -> [INCOMPLETE][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@i915_selftest@l...@hangcheck.html
- bat-dg1-6:  NOTRUN -> [INCOMPLETE][2]
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html

  
Known issues


  Here are the changes found in Patchwork_21998 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@cs-gfx:
- fi-hsw-4770:NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/fi-hsw-4770/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][4] ([fdo#109271]) +31 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@info:
- bat-dg1-6:  NOTRUN -> [SKIP][5] ([i915#2582]) +4 similar issues
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@fb...@info.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][6] ([i915#4086])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@gem_exec_gttf...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][7] ([i915#4086])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@gem_exec_gttf...@basic.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][8] ([i915#4083])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@gem_m...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][9] ([i915#4083])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][10] ([i915#4077]) +2 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_fence_blits@basic:
- bat-dg1-5:  NOTRUN -> [SKIP][11] ([i915#4077]) +2 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@gem_tiled_fence_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-5:  NOTRUN -> [SKIP][12] ([i915#4079]) +1 similar issue
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@gem_tiled_pread_basic.html
- bat-dg1-6:  NOTRUN -> [SKIP][13] ([i915#4079]) +1 similar issue
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-5:  NOTRUN -> [SKIP][14] ([i915#1155])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-5/igt@i915_pm_backli...@basic-brightness.html
- bat-dg1-6:  NOTRUN -> [SKIP][15] ([i915#1155])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
- fi-skl-6600u:   [PASS][16] -> [FAIL][17] ([i915#3239])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/fi-skl-6600u/igt@i915_pm_...@basic-pci-d3-state.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/fi-skl-6600u/igt@i915_pm_...@basic-pci-d3-state.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][18] ([i915#4212]) +7 similar issues
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21998/bat-dg1-6/igt@kms_addfb_ba...@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
- bat-dg1-5:  NOTRUN -> [SKIP][19] ([i915#4215])
   [19]:

Re: [Intel-gfx] [PATCH] drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Clint Taylor


Matches BSPEC for DKL Phy.

Reviewed-by: Clint Taylor 

-Clint


On 1/13/22 9:48 AM, José Roberto de Souza wrote:

TC voltage swing programming sequence was updated with a new step.

BSpec: 54956
Cc: sta...@vger.kernel.org
Cc: Jani Nikula 
Cc: Clint Taylor 
Cc: Imre Deak 
Signed-off-by: José Roberto de Souza 
---
  drivers/gpu/drm/i915/display/intel_ddi.c | 22 ++
  drivers/gpu/drm/i915/i915_reg.h  |  8 ++--
  2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index 6ee0f77b79274..4e93eac926a56 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -1300,6 +1300,28 @@ static void tgl_dkl_phy_set_signal_levels(struct intel_encoder *encoder,
  
  		intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),

 DKL_TX_DP20BITMODE, 0);
+
+   if (IS_ALDERLAKE_P(dev_priv)) {
+   u32 val;
+
+   if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI)) {
+   if (ln == 0) {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(2);
+   } else {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(3);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(3);
+   }
+   } else {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(0);
+   }
+
+   intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),
+DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK |
+DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK,
+val);
+   }
}
  }
  
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7c4013a0db615..ef6bc81800738 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -10085,8 +10085,12 @@ enum skl_power_gate {
 _DKL_PHY2_BASE) + \
 _DKL_TX_DPCNTL1)
  
-#define _DKL_TX_DPCNTL2    0x2C8
-#define  DKL_TX_DP20BITMODE    (1 << 2)
+#define _DKL_TX_DPCNTL2    0x2C8
+#define  DKL_TX_DP20BITMODE    REG_BIT(2)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK REG_GENMASK(4, 3)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK, (val))
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK REG_GENMASK(6, 5)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK, (val))
  #define DKL_TX_DPCNTL2(tc_port) _MMIO(_PORT(tc_port, \
 _DKL_PHY1_BASE, \
 _DKL_PHY2_BASE) + \
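
As an aside on the patch quoted above, the following user-space sketch works through the bitfield arithmetic it relies on. It is illustrative only: SKETCH_GENMASK(), field_prep(), loadgen_bits() and rmw() are local stand-ins written for this note, not the kernel's REG_GENMASK(), REG_FIELD_PREP() or intel_de_rmw(), and no hardware is touched. The field positions (bits 4:3 for TX1, bits 6:5 for TX2) and the per-lane values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for REG_GENMASK(): bits high..low set, 0 <= low <= high <= 31. */
#define SKETCH_GENMASK(high, low) \
	((uint32_t)((~UINT32_C(0) >> (31 - (high))) & (~UINT32_C(0) << (low))))

#define TX1_MASK SKETCH_GENMASK(4, 3)
#define TX2_MASK SKETCH_GENMASK(6, 5)

/* Local stand-in for REG_FIELD_PREP(): shift val into the field given by mask. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	assert(mask != 0);
	/* mask & (~mask + 1) isolates the lowest set bit, i.e. 1 << shift. */
	return (val * (mask & (~mask + 1))) & mask;
}

/* Value selection as in the patch: HDMI lane 0, other HDMI lanes, non-HDMI. */
static uint32_t loadgen_bits(bool is_hdmi, int lane)
{
	if (is_hdmi) {
		if (lane == 0)
			return field_prep(TX1_MASK, 0) | field_prep(TX2_MASK, 2);
		return field_prep(TX1_MASK, 3) | field_prep(TX2_MASK, 3);
	}
	return field_prep(TX1_MASK, 0) | field_prep(TX2_MASK, 0);
}

/* Read-modify-write on a plain value: clear the masked bits, then set new ones. */
static uint32_t rmw(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}
```

Under these definitions the two masks come out as 0x18 and 0x60, HDMI lane 0 programs 0x40, the other HDMI lanes program 0x78, and non-HDMI outputs clear both fields. Clearing both masks before OR-ing in the new value is what leaves the remaining register bits, such as DKL_TX_DP20BITMODE, untouched.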

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Flush G2H handler during a GT reset

2022-01-13 Thread Patchwork

== Series Details ==

Series: Flush G2H handler during a GT reset
URL   : https://patchwork.freedesktop.org/series/98855/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
df8c84b965e7 drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL
6af3f3db9124 drm/i915/guc: Flush G2H handler during a GT reset
-:49: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address 
mismatch: 'From: Matthew Brost ' != 'Signed-off-by: 
Matthew Brost '

total: 0 errors, 1 warnings, 0 checks, 30 lines checked

[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence
URL   : https://patchwork.freedesktop.org/series/98853/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_21997


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/index.html

Participating hosts (39 -> 41)
--

  Additional (6): bat-dg1-6 bat-dg1-5 bat-adlp-6 bat-rpls-1 bat-jsl-2 bat-jsl-1 
  Missing(4): fi-bsw-cyan shard-rkl shard-tglu fi-pnv-d510 

Known issues


  Here are the changes found in Patchwork_21997 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@cs-gfx:
- fi-hsw-4770:NOTRUN -> [SKIP][1] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/fi-hsw-4770/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][2] ([fdo#109271]) +31 similar issues
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@info:
- bat-dg1-6:  NOTRUN -> [SKIP][3] ([i915#2582]) +4 similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@fb...@info.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][4] ([i915#4086])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@gem_exec_gttf...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][5] ([i915#4086])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@gem_exec_gttf...@basic.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][6] ([i915#4083])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@gem_m...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][7] ([i915#4083])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][8] ([i915#4077]) +2 similar issues
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_fence_blits@basic:
- bat-dg1-5:  NOTRUN -> [SKIP][9] ([i915#4077]) +2 similar issues
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@gem_tiled_fence_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-5:  NOTRUN -> [SKIP][10] ([i915#4079]) +1 similar issue
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@gem_tiled_pread_basic.html
- bat-dg1-6:  NOTRUN -> [SKIP][11] ([i915#4079]) +1 similar issue
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-5:  NOTRUN -> [SKIP][12] ([i915#1155])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@i915_pm_backli...@basic-brightness.html
- bat-dg1-6:  NOTRUN -> [SKIP][13] ([i915#1155])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][14] ([i915#4212]) +7 similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@kms_addfb_ba...@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
- bat-dg1-5:  NOTRUN -> [SKIP][15] ([i915#4215])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@kms_addfb_ba...@basic-y-tiled-legacy.html
- bat-dg1-6:  NOTRUN -> [SKIP][16] ([i915#4215])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@kms_addfb_ba...@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
- bat-dg1-5:  NOTRUN -> [SKIP][17] ([i915#4212]) +7 similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-5/igt@kms_addfb_ba...@tile-pitch-mismatch.html

  * igt@kms_busy@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][18] ([i915#4303])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/bat-dg1-6/igt@kms_b...@basic.html

  * igt@kms_chamelium@dp-crc-fast:
- fi-bdw-5557u:   NOTRUN -> [SKIP][19] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21997/fi-bdw-5557u/igt@kms_chamel...@dp-crc-fast.html

  * igt@kms_chamelium@hdmi-edid-read:
- bat-dg1-6:  NOTRUN -> [SKIP][20] ([fdo#111827]) +8 similar issues
   [20]:

Re: [Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread John Harrison


On 1/13/2022 10:13, Matthew Brost wrote:

Add a cancel request selftest that results in an engine reset to cancel
the request as it is non-preemptable. Also insert a NOP request after
the cancelled request and confirm that it completes successfully.

v2:
  (Tvrtko)
   - Skip test if preemption timeout compiled out
   - Skip test if engine reset isn't supported
   - Update debug prints to be more descriptive
v3:
   - Add comment explaining test
v4:
  (John Harrison)
   - Fix typos in comment explaining test
   - goto out_rq if NOP creation fails

Signed-off-by: Matthew Brost 

---
  drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
  1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7f66f6d299b26..2a99dd7c2fe8a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -782,6 +782,115 @@ static int __cancel_completed(struct intel_engine_cs 
*engine)
return err;
  }
  
+/*
+ * Test to prove a non-preemptable request can be cancelled and a subsequent

Oops - another 'preemptible'. Maybe fix that one during merge?

Reviewed-by: John Harrison 

+ * request on the same context can successfully complete after cancellation.
+ *
+ * Testing methodology is to create a non-preemptible request and submit it,
+ * wait for spinner to start, create a NOP request and submit it, cancel the
+ * spinner, wait for spinner to complete and verify it failed with an error,
+ * finally wait for NOP request to complete verify it succeeded without an
+ * error. Preemption timeout also reduced / restored so test runs in a timely
+ * maner.
+ */
+static int __cancel_reset(struct drm_i915_private *i915,
+ struct intel_engine_cs *engine)
+{
+   struct intel_context *ce;
+   struct igt_spinner spin;
+   struct i915_request *rq, *nop;
+   unsigned long preempt_timeout_ms;
+   int err = 0;
+
+   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||
+   !intel_has_reset_engine(engine->gt))
+   return 0;
+
+   preempt_timeout_ms = engine->props.preempt_timeout_ms;
+   engine->props.preempt_timeout_ms = 100;
+
+   if (igt_spinner_init(&spin, engine->gt))
+   goto out_restore;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto out_spin;
+   }
+
+   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_ce;
+   }
+
+   pr_debug("%s: Cancelling active non-preemptable request\n",
+engine->name);
+   i915_request_get(rq);
+   i915_request_add(rq);
+   if (!igt_wait_for_spinner(&spin, rq)) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("Failed to start spinner on %s\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_rq;
+   }
+
+   nop = intel_context_create_request(ce);
+   if (IS_ERR(nop))
+   goto out_rq;
+   i915_request_get(nop);
+   i915_request_add(nop);
+
+   i915_request_cancel(rq, -EINTR);
+
+   if (i915_request_wait(rq, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to cancel hung request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (rq->fence.error != -EINTR) {
+   pr_err("%s: fence not cancelled (%u)\n",
+  engine->name, rq->fence.error);
+   err = -EINVAL;
+   goto out_nop;
+   }
+
+   if (i915_request_wait(nop, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to complete nop request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (nop->fence.error != 0) {
+   pr_err("%s: Nop request errored (%u)\n",
+  engine->name, nop->fence.error);
+   err = -EINVAL;
+   }
+
+out_nop:
+   i915_request_put(nop);
+out_rq:
+   i915_request_put(rq);
+out_ce:
+   intel_context_put(ce);
+out_spin:
+   igt_spinner_fini(&spin);
+out_restore:
+   engine->props.preempt_timeout_ms = preempt_timeout_ms;
+   if (err)
+   pr_err("%s: %s error %d\n", __func__, engine->name, err);
+   return err;
+}
+
  static int live_cancel_request(void *arg)
  {
struct

[Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread Matthew Brost

Add a cancel request selftest that results in an engine reset to cancel
the request as it is non-preemptable. Also insert a NOP request after
the cancelled request and confirm that it completes successfully.

v2:
 (Tvrtko)
  - Skip test if preemption timeout compiled out
  - Skip test if engine reset isn't supported
  - Update debug prints to be more descriptive
v3:
  - Add comment explaining test
v4:
 (John Harrison)
  - Fix typos in comment explaining test
  - goto out_rq if NOP creation fails

Signed-off-by: Matthew Brost 

---
 drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
 1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7f66f6d299b26..2a99dd7c2fe8a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -782,6 +782,115 @@ static int __cancel_completed(struct intel_engine_cs 
*engine)
return err;
 }
 
+/*
+ * Test to prove a non-preemptable request can be cancelled and a subsequent
+ * request on the same context can successfully complete after cancellation.
+ *
+ * Testing methodology is to create a non-preemptible request and submit it,
+ * wait for spinner to start, create a NOP request and submit it, cancel the
+ * spinner, wait for spinner to complete and verify it failed with an error,
+ * finally wait for NOP request to complete verify it succeeded without an
+ * error. Preemption timeout also reduced / restored so test runs in a timely
+ * maner.
+ */
+static int __cancel_reset(struct drm_i915_private *i915,
+ struct intel_engine_cs *engine)
+{
+   struct intel_context *ce;
+   struct igt_spinner spin;
+   struct i915_request *rq, *nop;
+   unsigned long preempt_timeout_ms;
+   int err = 0;
+
+   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||
+   !intel_has_reset_engine(engine->gt))
+   return 0;
+
+   preempt_timeout_ms = engine->props.preempt_timeout_ms;
+   engine->props.preempt_timeout_ms = 100;
+
+   if (igt_spinner_init(&spin, engine->gt))
+   goto out_restore;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto out_spin;
+   }
+
+   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_ce;
+   }
+
+   pr_debug("%s: Cancelling active non-preemptable request\n",
+engine->name);
+   i915_request_get(rq);
+   i915_request_add(rq);
+   if (!igt_wait_for_spinner(&spin, rq)) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("Failed to start spinner on %s\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_rq;
+   }
+
+   nop = intel_context_create_request(ce);
+   if (IS_ERR(nop))
+   goto out_rq;
+   i915_request_get(nop);
+   i915_request_add(nop);
+
+   i915_request_cancel(rq, -EINTR);
+
+   if (i915_request_wait(rq, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to cancel hung request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (rq->fence.error != -EINTR) {
+   pr_err("%s: fence not cancelled (%u)\n",
+  engine->name, rq->fence.error);
+   err = -EINVAL;
+   goto out_nop;
+   }
+
+   if (i915_request_wait(nop, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to complete nop request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (nop->fence.error != 0) {
+   pr_err("%s: Nop request errored (%u)\n",
+  engine->name, nop->fence.error);
+   err = -EINVAL;
+   }
+
+out_nop:
+   i915_request_put(nop);
+out_rq:
+   i915_request_put(rq);
+out_ce:
+   intel_context_put(ce);
+out_spin:
+   igt_spinner_fini(&spin);
+out_restore:
+   engine->props.preempt_timeout_ms = preempt_timeout_ms;
+   if (err)
+   pr_err("%s: %s error %d\n", __func__, engine->name, err);
+   return err;
+}
+
 static int live_cancel_request(void *arg)
 {
struct drm_i915_private *i915 = arg;
@@ -814,6 +923,14 @@ static int live_cancel_request(void *arg)
return err;
if (err2)

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Remove hacks for reset and schedule disable G2H being received out of order

2022-01-13 Thread Matthew Brost

In the i915 there are several hacks in place to make request cancellation
work with an old version of the GuC which delivered the G2H indicating
schedule disable is done before G2H indicating a context reset. Version
69 fixes this, so we can remove these hacks.

v2:
 (Checkpatch)
  - s/cancelation/cancellation

Reviewed-by: John Harrison 
Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 30 ++-
 1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a40f10d376d..3918f1be114fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1533,7 +1533,6 @@ static void __guc_reset_context(struct intel_context *ce, 
bool stalled)
unsigned long flags;
u32 head;
int i, number_children = ce->parallel.number_children;
-   bool skip = false;
struct intel_context *parent = ce;
 
GEM_BUG_ON(intel_context_is_child(ce));
@@ -1544,23 +1543,10 @@ static void __guc_reset_context(struct intel_context 
*ce, bool stalled)
 * GuC will implicitly mark the context as non-schedulable when it sends
 * the reset notification. Make sure our state reflects this change. The
 * context will be marked enabled on resubmission.
-*
-* XXX: If the context is reset as a result of the request cancellation
-* this G2H is received after the schedule disable complete G2H which is
-* wrong as this creates a race between the request cancellation code
-* re-submitting the context and this G2H handler. This is a bug in the
-* GuC but can be worked around in the meantime but converting this to a
-* NOP if a pending enable is in flight as this indicates that a request
-* cancellation has occurred.
 */
spin_lock_irqsave(&ce->guc_state.lock, flags);
-   if (likely(!context_pending_enable(ce)))
-   clr_context_enabled(ce);
-   else
-   skip = true;
+   clr_context_enabled(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-   if (unlikely(skip))
-   goto out_put;
 
/*
 * For each context in the relationship find the hanging request
@@ -1592,7 +1578,6 @@ static void __guc_reset_context(struct intel_context *ce, 
bool stalled)
}
 
__unwind_incomplete_requests(parent);
-out_put:
intel_context_put(parent);
 }
 
@@ -2531,12 +2516,6 @@ static void guc_context_cancel_request(struct 
intel_context *ce,
true);
}
 
-   /*
-* XXX: Racey if context is reset, see comment in
-* __guc_reset_context().
-*/
-   flush_work(&ce_to_guc(ce)->ct.requests.worker);
-
guc_context_unblock(block_context);
intel_context_put(ce);
}
@@ -3971,12 +3950,7 @@ static void guc_handle_context_reset(struct intel_guc 
*guc,
 {
trace_intel_context_reset(ce);
 
-   /*
-* XXX: Racey if request cancellation has occurred, see comment in
-* __guc_reset_context().
-*/
-   if (likely(!intel_context_is_banned(ce) &&
-  !context_blocked(ce))) {
+   if (likely(!intel_context_is_banned(ce))) {
capture_error_state(guc, ce);
guc_context_replay(ce);
} else {
-- 
2.34.1
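
The ordering problem this patch addresses can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: struct model_ctx
and every model_* function are hypothetical stand-ins, not i915 code, and the
G2H traffic is reduced to two events (schedule disable done, context reset).
It illustrates why the removed check treated a reset arriving while an enable
was pending as a NOP, and why that check is no longer needed once the reset
G2H is delivered first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-context scheduling state. */
struct model_ctx {
	bool enabled;        /* host believes the context is schedulable */
	bool pending_enable; /* a re-enable from resubmission is in flight */
};

/* Cancellation path: once "schedule disable done" arrives, resubmit. */
static void model_sched_disable_done(struct model_ctx *ce)
{
	assert(ce != NULL);
	ce->pending_enable = true;
}

/* The in-flight enable completes. */
static void model_enable_done(struct model_ctx *ce)
{
	assert(ce != NULL);
	ce->pending_enable = false;
	ce->enabled = true;
}

/*
 * Reset notification handler. With workaround == true it mirrors the removed
 * hack: do nothing if an enable is already in flight. Returns true if the
 * reset was actually processed.
 */
static bool model_context_reset(struct model_ctx *ce, bool workaround)
{
	assert(ce != NULL);
	if (workaround && ce->pending_enable)
		return false;
	ce->enabled = false;
	return true;
}

/* Returns true if the reset handler ran while a re-enable was in flight. */
static bool model_reset_raced(bool reset_first, bool workaround)
{
	struct model_ctx ce = { .enabled = true, .pending_enable = false };
	bool raced = false;

	if (reset_first) {
		/* Fixed firmware ordering: reset G2H, then disable done. */
		model_context_reset(&ce, workaround);
		model_sched_disable_done(&ce);
	} else {
		/* Old ordering: the handler sees an enable already pending. */
		model_sched_disable_done(&ce);
		raced = model_context_reset(&ce, workaround);
	}
	model_enable_done(&ce);
	return raced;
}
```

In this model only the old ordering without the workaround reports a race;
either the workaround or the corrected ordering avoids it.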

[Intel-gfx] [PATCH 0/2] Remove some hacks required for GuC 62.0.0

2022-01-13 Thread Matthew Brost

Remove a hack required because schedule disable done G2H was received
before context reset G2H in GuC firmware 62.0.0. Since we have upgraded
to 69.0.3, this is no longer required.

Also revive a selftest which proves this works before / after the change.

v2:
  - Address John Harrison's comments

Signed-off-by: Matthew Brost 

Matthew Brost (2):
  drm/i915/selftests: Add a cancel request selftest that triggers a
reset
  drm/i915/guc: Remove hacks for reset and schedule disable G2H being
received out of order

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  30 +
 drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
 2 files changed, 119 insertions(+), 28 deletions(-)

-- 
2.34.1

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915/display/adlp: Implement new step in the TC voltage swing prog 
sequence
URL   : https://patchwork.freedesktop.org/series/98853/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915/display/adlp: Implement new step in the TC voltage swing prog 
sequence
URL   : https://patchwork.freedesktop.org/series/98853/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
79db9538db06 drm/i915/display/adlp: Implement new step in the TC voltage swing 
prog sequence
-:65: WARNING:LONG_LINE: line length of 120 exceeds 100 columns
#65: FILE: drivers/gpu/drm/i915/i915_reg.h:10091:
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(val) 
REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK, (val))

-:67: WARNING:LONG_LINE: line length of 120 exceeds 100 columns
#67: FILE: drivers/gpu/drm/i915/i915_reg.h:10093:
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(val) 
REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK, (val))

total: 0 errors, 2 warnings, 0 checks, 42 lines checked

[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Flip guc_id allocation partition (rev4)

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915: Flip guc_id allocation partition (rev4)
URL   : https://patchwork.freedesktop.org/series/98751/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_21996


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_21996 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21996, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/index.html

Participating hosts (39 -> 42)
--

  Additional (6): bat-dg1-6 bat-dg1-5 bat-adlp-6 bat-rpls-1 bat-jsl-2 bat-jsl-1 
  Missing(3): fi-bsw-cyan shard-rkl shard-tglu 

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_21996:

### CI changes ###

#### Possible regressions ####

  * boot:
- fi-skl-6700k2:  [PASS][1] -> [FAIL][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/fi-skl-6700k2/boot.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/fi-skl-6700k2/boot.html

  
Known issues


  Here are the changes found in Patchwork_21996 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-gfx:
- fi-hsw-4770:NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#109315]) +17 similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/fi-hsw-4770/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][4] ([fdo#109271]) +31 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@info:
- bat-dg1-6:  NOTRUN -> [SKIP][5] ([i915#2582]) +4 similar issues
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@fb...@info.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][6] ([i915#4086])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@gem_exec_gttf...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][7] ([i915#4086])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@gem_exec_gttf...@basic.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][8] ([i915#4083])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@gem_m...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][9] ([i915#4083])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][10] ([i915#4077]) +2 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_fence_blits@basic:
- bat-dg1-5:  NOTRUN -> [SKIP][11] ([i915#4077]) +2 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@gem_tiled_fence_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-5:  NOTRUN -> [SKIP][12] ([i915#4079]) +1 similar issue
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@gem_tiled_pread_basic.html
- bat-dg1-6:  NOTRUN -> [SKIP][13] ([i915#4079]) +1 similar issue
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-5:  NOTRUN -> [SKIP][14] ([i915#1155])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@i915_pm_backli...@basic-brightness.html
- bat-dg1-6:  NOTRUN -> [SKIP][15] ([i915#1155])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg1-6:  NOTRUN -> [DMESG-FAIL][16] ([i915#4494])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
- bat-dg1-6:  NOTRUN -> [SKIP][17] ([i915#4212]) +7 similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-6/igt@kms_addfb_ba...@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
- bat-dg1-5:  NOTRUN -> [SKIP][18] ([i915#4215])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21996/bat-dg1-5/igt@kms_addfb_ba...@basic-y-tiled-legacy.html
- bat-dg1-6:  NOTRUN -> [SKIP][19] ([i915#4215])
   [19]:

Re: [Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 09:59:35AM -0800, John Harrison wrote:
> On 1/13/2022 09:34, Matthew Brost wrote:
> > On Thu, Jan 13, 2022 at 09:33:12AM -0800, John Harrison wrote:
> > > On 1/11/2022 15:11, Matthew Brost wrote:
> > > > Add a cancel request selftest that results in an engine reset to cancel
> > > > the request as it is non-preemptable. Also insert a NOP request after
> > > > the cancelled request and confirm that it completes successfully.
> > > > 
> > > > v2:
> > > >(Tvrtko)
> > > > - Skip test if preemption timeout compiled out
> > > > - Skip test if engine reset isn't supported
> > > > - Update debug prints to be more descriptive
> > > > v3:
> > > > - Add comment explaining test
> > > > 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >drivers/gpu/drm/i915/selftests/i915_request.c | 117 
> > > > ++
> > > >1 file changed, 117 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
> > > > b/drivers/gpu/drm/i915/selftests/i915_request.c
> > > > index 7f66f6d299b26..f78de99d5ae1e 100644
> > > > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > > > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > > > @@ -782,6 +782,115 @@ static int __cancel_completed(struct 
> > > > intel_engine_cs *engine)
> > > > return err;
> > > >}
> > > > +/*
> > > > + * Test to prove a non-preemptable request can be cancelled and a 
> > > > subsequent
> > > > + * request on the same context can successfully complete after 
> > > > cancallation.
> > > cancellation
> > > 
> > Yep.
> > 
> > > > + *
> > > > + * Testing methodology is to create non-preemptable request and submit 
> > > > it,
> > > a non-preemptible
> > > 
> > Yep.
> > 
> > > > + * wait for spinner to start, create a NOP request and submit it, 
> > > > cancel the
> > > > + * spinner, wait for spinner to complete and verify it failed with an 
> > > > error,
> > > > + * finally wait for NOP request to complete verify it succeeded 
> > > > without an
> > > > + * error. Preemption timeout also reduced / restored so test runs in a 
> > > > timely
> > > > + * maner.
> > > > + */
> > > > +static int __cancel_reset(struct drm_i915_private *i915,
> > > > + struct intel_engine_cs *engine)
> > > > +{
> > > > +   struct intel_context *ce;
> > > > +   struct igt_spinner spin;
> > > > +   struct i915_request *rq, *nop;
> > > > +   unsigned long preempt_timeout_ms;
> > > > +   int err = 0;
> > > > +
> > > > +   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||
> > > Does this matter? The test is overriding the default anyway.
> > > 
> > Yes. Execlists don't try to preempt anything if
> > CONFIG_DRM_I915_PREEMPT_TIMEOUT is turned off. If we want to avoid the
> > cancellation doing a full GT reset, CONFIG_DRM_I915_PREEMPT_TIMEOUT
> > should be turned on.
> Hmm, I would read that as a bug. The description for the config parameter
> is:
>   "This is adjustable via
>   /sys/class/drm/card?/engine/*/preempt_timeout_ms
> 
>   May be 0 to disable the timeout.
> 
>   The compiled in default may get overridden at driver probe time on
>   certain platforms and certain engines which will be reflected in the
>   sysfs control."
> 
> I would take that as meaning that even if the compiled in default is zero,
> the user or even the i915 driver itself could override that at runtime and
> enable pre-emption again. So having any code use this as a flag is broken.
> Indeed, any code other than 'engine->default_preempt_timeout =
> CONFIG_PREEMPT_TIMEOUT' is broken, IMHO.
> 

Can't really argue against you here.

> But maybe that's for a different patch. If the driver is already behaving
> badly and doing the correct thing here will actually cause test failures
> then you can't really do much other than follow the existing bad behaviour.
>

Yeah, agreed, it is out of scope for this patch / series. We can clean up the
execlists code in a follow-up patch if needed and loop in an execlists
expert as a reviewer. Maybe there is an unknown reason that code is
doing this?

Matt
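
The distinction John draws above, between a compiled-in default and the value
actually in effect at runtime, can be sketched in standalone C. Everything
here is hypothetical: MODEL_DEFAULT_PREEMPT_TIMEOUT_MS stands in for the
Kconfig value, struct model_engine for the runtime-tunable engine property,
and neither helper is an i915 function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a build-time default such as a Kconfig value. */
#define MODEL_DEFAULT_PREEMPT_TIMEOUT_MS 0UL

/* Hypothetical, reduced engine state: only the runtime-tunable timeout. */
struct model_engine {
	unsigned long preempt_timeout_ms;
};

/* The build-time default only seeds the runtime value. */
static void model_engine_init(struct model_engine *e)
{
	assert(e != NULL);
	e->preempt_timeout_ms = MODEL_DEFAULT_PREEMPT_TIMEOUT_MS;
}

/* Check criticised in the thread: keys off the build-time default only. */
static bool model_preempt_timeout_by_config(void)
{
	return MODEL_DEFAULT_PREEMPT_TIMEOUT_MS != 0;
}

/* Alternative: consult the value currently in effect on this engine. */
static bool model_preempt_timeout_by_runtime(const struct model_engine *e)
{
	assert(e != NULL);
	return e->preempt_timeout_ms != 0;
}
```

If the runtime value is later raised (for example through a sysfs-style
control), only the second helper notices; the first keeps reporting the
build-time answer.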

> John.
> 
> 
> > > > +   !intel_has_reset_engine(engine->gt))
> > > > +   return 0;
> > > > +
> > > > +   preempt_timeout_ms = engine->props.preempt_timeout_ms;
> > > > +   engine->props.preempt_timeout_ms = 100;
> > > > +
> > > > +   if (igt_spinner_init(&spin, engine->gt))
> > > > +   goto out_restore;
> > > > +
> > > > +   ce = intel_context_create(engine);
> > > > +   if (IS_ERR(ce)) {
> > > > +   err = PTR_ERR(ce);
> > > > +   goto out_spin;
> > > > +   }
> > > > +
> > > > +   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > > > +   if (IS_ERR(rq)) {
> > > > +   err = PTR_ERR(rq);
> > > > +   goto out_ce;
> > > > +   }
> > > > +
> > > > +   pr_debug("%s: Cancelling active non-preemptable request\n",
> > > > +

Re: [Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread John Harrison


On 1/13/2022 09:34, Matthew Brost wrote:

On Thu, Jan 13, 2022 at 09:33:12AM -0800, John Harrison wrote:

On 1/11/2022 15:11, Matthew Brost wrote:

Add a cancel request selftest that results in an engine reset to cancel
the request as it is non-preemptable. Also insert a NOP request after
the cancelled request and confirm that it completes successfully.

v2:
   (Tvrtko)
- Skip test if preemption timeout compiled out
- Skip test if engine reset isn't supported
- Update debug prints to be more descriptive
v3:
- Add comment explaining test

Signed-off-by: Matthew Brost 
---
   drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
   1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7f66f6d299b26..f78de99d5ae1e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -782,6 +782,115 @@ static int __cancel_completed(struct intel_engine_cs 
*engine)
return err;
   }
+/*
+ * Test to prove a non-preemptable request can be cancelled and a subsequent
+ * request on the same context can successfully complete after cancallation.

cancellation


Yep.


+ *
+ * Testing methodology is to create non-preemptable request and submit it,

a non-preemptible


Yep.


+ * wait for spinner to start, create a NOP request and submit it, cancel the
+ * spinner, wait for spinner to complete and verify it failed with an error,
+ * finally wait for NOP request to complete verify it succeeded without an
+ * error. Preemption timeout also reduced / restored so test runs in a timely
+ * maner.
+ */
+static int __cancel_reset(struct drm_i915_private *i915,
+ struct intel_engine_cs *engine)
+{
+   struct intel_context *ce;
+   struct igt_spinner spin;
+   struct i915_request *rq, *nop;
+   unsigned long preempt_timeout_ms;
+   int err = 0;
+
+   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||

Does this matter? The test is overriding the default anyway.


Yes. Execlists don't try to preempt anything if
CONFIG_DRM_I915_PREEMPT_TIMEOUT is turned off. If we want to avoid the
cancellation doing a full GT reset, CONFIG_DRM_I915_PREEMPT_TIMEOUT
should be turned on.
  
Hmm, I would read that as a bug. The description for the config
parameter is:

  "This is adjustable via
  /sys/class/drm/card?/engine/*/preempt_timeout_ms

  May be 0 to disable the timeout.

  The compiled in default may get overridden at driver probe time on
  certain platforms and certain engines which will be reflected in the
  sysfs control."

I would take that as meaning that even if the compiled in default is 
zero, the user or even the i915 driver itself could override that at 
runtime and enable pre-emption again. So having any code use this as a 
flag is broken. Indeed, any code other than 
'engine->default_preempt_timeout = CONFIG_PREEMPT_TIMEOUT' is broken, IMHO.


But maybe that's for a different patch. If the driver is already 
behaving badly and doing the correct thing here will actually cause test 
failures then you can't really do much other than follow the existing 
bad behaviour.


John.



+   !intel_has_reset_engine(engine->gt))
+   return 0;
+
+   preempt_timeout_ms = engine->props.preempt_timeout_ms;
+   engine->props.preempt_timeout_ms = 100;
+
+   if (igt_spinner_init(&spin, engine->gt))
+   goto out_restore;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto out_spin;
+   }
+
+   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_ce;
+   }
+
+   pr_debug("%s: Cancelling active non-preemptable request\n",
+engine->name);
+   i915_request_get(rq);
+   i915_request_add(rq);
+   if (!igt_wait_for_spinner(&spin, rq)) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("Failed to start spinner on %s\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_rq;
+   }
+
+   nop = intel_context_create_request(ce);
+   if (IS_ERR(nop))
+   goto out_nop;

Should be out_rq?


Yes, it should.

Matt


John.



+   i915_request_get(nop);
+   i915_request_add(nop);
+
+   i915_request_cancel(rq, -EINTR);
+
+   if (i915_request_wait(rq, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to cancel hung request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (rq->fence.error != -EINTR)
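
The label mix-up raised in the review above (out_nop versus out_rq) is a
general property of goto-based unwinding: an error path must enter the
cleanup ladder at the label matching what has been acquired so far. A minimal
userspace sketch follows; struct model_ref, model_get, model_put and
model_cancel_flow are hypothetical names, and unlike the posted hunk the
sketch also records an error code when the second object cannot be created.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a request. */
struct model_ref {
	int count;
};

static void model_get(struct model_ref *r)
{
	assert(r != NULL);
	r->count++;
}

static void model_put(struct model_ref *r)
{
	assert(r != NULL);
	r->count--;
}

/*
 * Take a reference on 'rq', then on 'nop'. If creating 'nop' fails, only
 * 'rq' is held, so the error path enters the ladder at out_rq. Jumping to
 * out_nop there would drop a reference that was never taken.
 */
static int model_cancel_flow(struct model_ref *rq, struct model_ref *nop,
			     int nop_create_err, int wait_err)
{
	int err = 0;

	model_get(rq);

	if (nop_create_err) {
		err = nop_create_err;
		goto out_rq;	/* 'nop' holds no reference yet */
	}
	model_get(nop);

	if (wait_err) {
		err = wait_err;
		goto out_nop;	/* both references are held here */
	}

out_nop:
	model_put(nop);
out_rq:
	model_put(rq);
	return err;
}
```

Whichever path is taken, both counters return to their starting value, which
is the invariant the out_rq fix restores.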

[Intel-gfx] [PATCH 2/2] drm/i915/guc: Flush G2H handler during a GT reset

2022-01-13 Thread Matthew Brost

Now that the error capture is fully decoupled from fence signalling
(request retirement to free memory, which in turn depends on resets) we
can safely flush the G2H handler during a GT reset. This eliminates
corner cases where GuC generated G2H (e.g. engine resets) race with a GT
reset.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c  | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a40f10d376d..f8614ff904b2b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1396,8 +1396,6 @@ static void guc_flush_destroyed_contexts(struct intel_guc 
*guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
-   int i;
-
if (unlikely(!guc_submission_initialized(guc))) {
/* Reset called during driver load? GuC not yet initialised! */
return;
@@ -1414,21 +1412,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc 
*guc)
 
guc_flush_submissions(guc);
guc_flush_destroyed_contexts(guc);
-
-   /*
-* Handle any outstanding G2Hs before reset. Call IRQ handler directly
-* each pass as interrupt have been disabled. We always scrub for
-* outstanding G2H as it is possible for outstanding_submission_g2h to
-* be incremented after the context state update.
-*/
-   for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
-   intel_guc_to_host_event_handler(guc);
-#define wait_for_reset(guc, wait_var) \
-   intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
-   do {
-   wait_for_reset(guc, &guc->outstanding_submission_g2h);
-   } while (!list_empty(&guc->ct.requests.incoming));
-   }
+   flush_work(&guc->ct.requests.worker);
 
scrub_guc_desc_for_outstanding_g2h(guc);
 }
-- 
2.34.1

[Intel-gfx] [PATCH 0/2] Flush G2H handler during a GT reset

2022-01-13 Thread Matthew Brost

After a small fix to error capture code, we now can flush G2H during a
GT reset which simplifies code and seals some extreme corner case races. 

Signed-off-by: Matthew Brost 

Matthew Brost (2):
  drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL
  drm/i915/guc: Flush G2H handler during a GT reset

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c  | 18 +-
 drivers/gpu/drm/i915/i915_gpu_error.c  |  2 +-
 2 files changed, 2 insertions(+), 18 deletions(-)

-- 
2.34.1

[Intel-gfx] [PATCH 1/2] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL

2022-01-13 Thread Matthew Brost

Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
GFP_KERNEL to fully decouple the error capture from fence signalling.

Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 67f3515f07e7a..aee42eae4729f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1516,7 +1516,7 @@ capture_engine(struct intel_engine_cs *engine,
struct i915_request *rq = NULL;
unsigned long flags;
 
-   ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
+   ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL);
if (!ee)
return NULL;
 
-- 
2.34.1

[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/display/ehl: Update voltage swing table

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915/display/ehl: Update voltage swing table
URL   : https://patchwork.freedesktop.org/series/98844/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079_full -> Patchwork_21994_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_21994_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21994_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (10 -> 10)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_21994_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_exec_await@wide-contexts:
- shard-iclb: [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb6/igt@gem_exec_aw...@wide-contexts.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-iclb2/igt@gem_exec_aw...@wide-contexts.html

  
Known issues


  Here are the changes found in Patchwork_21994_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@feature_discovery@psr2:
- shard-iclb: [PASS][3] -> [SKIP][4] ([i915#658])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb2/igt@feature_discov...@psr2.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-iclb6/igt@feature_discov...@psr2.html

  * igt@gem_eio@unwedge-stress:
- shard-skl:  [PASS][5] -> [TIMEOUT][6] ([i915#3063])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-skl3/igt@gem_...@unwedge-stress.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-skl5/igt@gem_...@unwedge-stress.html

  * igt@gem_exec_balancer@parallel-keep-submit-fence:
- shard-iclb: [PASS][7] -> [SKIP][8] ([i915#4525]) +1 similar issue
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-iclb1/igt@gem_exec_balan...@parallel-keep-submit-fence.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-iclb3/igt@gem_exec_balan...@parallel-keep-submit-fence.html

  * igt@gem_exec_fair@basic-flow@rcs0:
- shard-tglb: [PASS][9] -> [FAIL][10] ([i915#2842])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-tglb5/igt@gem_exec_fair@basic-f...@rcs0.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-tglb7/igt@gem_exec_fair@basic-f...@rcs0.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
- shard-iclb: NOTRUN -> [FAIL][11] ([i915#2852])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-iclb4/igt@gem_exec_fair@basic-none-r...@rcs0.html
- shard-glk:  NOTRUN -> [FAIL][12] ([i915#2842])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-glk5/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
- shard-kbl:  NOTRUN -> [FAIL][13] ([i915#2842]) +1 similar issue
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-kbl7/igt@gem_exec_fair@basic-n...@vcs0.html

  * igt@gem_exec_fair@basic-none@vecs0:
- shard-apl:  [PASS][14] -> [FAIL][15] ([i915#2842])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-apl3/igt@gem_exec_fair@basic-n...@vecs0.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-apl3/igt@gem_exec_fair@basic-n...@vecs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
- shard-glk:  [PASS][16] -> [FAIL][17] ([i915#2842])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/shard-glk1/igt@gem_exec_fair@basic-p...@vcs0.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-glk7/igt@gem_exec_fair@basic-p...@vcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
- shard-iclb: NOTRUN -> [FAIL][18] ([i915#2842])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-iclb1/igt@gem_exec_fair@basic-p...@vcs1.html

  * igt@gem_huc_copy@huc-copy:
- shard-skl:  NOTRUN -> [SKIP][19] ([fdo#109271] / [i915#2190])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-skl10/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@basic:
- shard-apl:  NOTRUN -> [SKIP][20] ([fdo#109271] / [i915#4613])
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21994/shard-apl6/igt@gem_lmem_swapp...@basic.html

  * igt@gem_lmem_swapping@heavy-verify-random:
- shard-kbl:  NOTRUN -> [SKIP][21] ([fdo#109271] / [i915#4613]) +4 
similar issues
   [21]:

[Intel-gfx] [PATCH] drm/i915/display/adlp: Implement new step in the TC voltage swing prog sequence

2022-01-13 Thread José Roberto de Souza

TC voltage swing programming sequence was updated with a new step.

BSpec: 54956
Cc: sta...@vger.kernel.org
Cc: Jani Nikula 
Cc: Clint Taylor 
Cc: Imre Deak 
Signed-off-by: José Roberto de Souza 
---
 drivers/gpu/drm/i915/display/intel_ddi.c | 22 ++
 drivers/gpu/drm/i915/i915_reg.h  |  8 ++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c 
b/drivers/gpu/drm/i915/display/intel_ddi.c
index 6ee0f77b79274..4e93eac926a56 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -1300,6 +1300,28 @@ static void tgl_dkl_phy_set_signal_levels(struct intel_encoder *encoder,
 
intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),
 DKL_TX_DP20BITMODE, 0);
+
+   if (IS_ALDERLAKE_P(dev_priv)) {
+   u32 val;
+
+   if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI)) {
+   if (ln == 0) {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(2);
+   } else {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(3);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(3);
+   }
+   } else {
+   val = DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(0);
+   val |= DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(0);
+   }
+
+   intel_de_rmw(dev_priv, DKL_TX_DPCNTL2(tc_port),
+DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK |
+DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK,
+val);
+   }
}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7c4013a0db615..ef6bc81800738 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -10085,8 +10085,12 @@ enum skl_power_gate {
 _DKL_PHY2_BASE) + \
 _DKL_TX_DPCNTL1)
 
-#define _DKL_TX_DPCNTL2                            0x2C8
-#define  DKL_TX_DP20BITMODE                        (1 << 2)
+#define _DKL_TX_DPCNTL2                            0x2C8
+#define  DKL_TX_DP20BITMODE                        REG_BIT(2)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK REG_GENMASK(4, 3)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX1_MASK, (val))
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK REG_GENMASK(6, 5)
+#define  DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2(val) REG_FIELD_PREP(DKL_TX_DPCNTL2_CFG_LOADGENSELECT_TX2_MASK, (val))
 #define DKL_TX_DPCNTL2(tc_port) _MMIO(_PORT(tc_port, \
 _DKL_PHY1_BASE, \
 _DKL_PHY2_BASE) + \
-- 
2.34.1
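
For readers who want to sanity-check the values this patch programs, here is a rough user-space sketch of the field packing and the read-modify-write. The helpers `GENMASK32()`, `field_prep()`, `loadgen_select()` and `rmw()` are made-up stand-ins for illustration, not the kernel's REG_GENMASK()/REG_FIELD_PREP()/intel_de_rmw(); only the bit positions (4:3 and 6:5) and the per-lane values are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space equivalent of a 32-bit "bits h..l set" mask. */
#define GENMASK32(h, l) ((uint32_t)((~0u >> (31 - (h))) & (~0u << (l))))

/* Field positions as given in the patch: TX1 is bits 4:3, TX2 is bits 6:5. */
#define LOADGEN_TX1_MASK GENMASK32(4, 3)
#define LOADGEN_TX2_MASK GENMASK32(6, 5)

/* Shift val into the position described by mask (stand-in for REG_FIELD_PREP). */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1))
		shift++;
	/* The value must fit inside the field. */
	assert((val & ~(mask >> shift)) == 0);
	return (val << shift) & mask;
}

/* Mirror of the selection logic in the diff: HDMI lane 0, HDMI other lanes, non-HDMI. */
static uint32_t loadgen_select(bool is_hdmi, int ln)
{
	uint32_t tx1 = 0, tx2 = 0;

	if (is_hdmi) {
		if (ln == 0) {
			tx1 = 0;
			tx2 = 2;
		} else {
			tx1 = 3;
			tx2 = 3;
		}
	}
	return field_prep(LOADGEN_TX1_MASK, tx1) |
	       field_prep(LOADGEN_TX2_MASK, tx2);
}

/* Read-modify-write on a plain value: clear the given bits, then set the new ones. */
static uint32_t rmw(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}
```

Under these assumptions, an HDMI lane 0 ends up with only the TX2 field set to 2, other HDMI lanes get 3 in both fields, and any non-HDMI output clears both fields while leaving the remaining register bits untouched.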

Re: [Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread Matthew Brost

On Thu, Jan 13, 2022 at 09:33:12AM -0800, John Harrison wrote:
> On 1/11/2022 15:11, Matthew Brost wrote:
> > Add a cancel request selftest that results in an engine reset to cancel
> > the request as it is non-preemptable. Also insert a NOP request after
> > the cancelled request and confirm that it completes successfully.
> > 
> > v2:
> >   (Tvrtko)
> >- Skip test if preemption timeout compiled out
> >- Skip test if engine reset isn't supported
> >- Update debug prints to be more descriptive
> > v3:
> >- Add comment explaining test
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
> >   1 file changed, 117 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
> > b/drivers/gpu/drm/i915/selftests/i915_request.c
> > index 7f66f6d299b26..f78de99d5ae1e 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > @@ -782,6 +782,115 @@ static int __cancel_completed(struct intel_engine_cs 
> > *engine)
> > return err;
> >   }
> > +/*
> > + * Test to prove a non-preemptable request can be cancelled and a 
> > subsequent
> > + * request on the same context can successfully complete after 
> > cancallation.
> cancellation
> 

Yep.

> > + *
> > + * Testing methodology is to create non-preemptable request and submit it,
> a non-preemptible
>

Yep.

> > + * wait for spinner to start, create a NOP request and submit it, cancel 
> > the
> > + * spinner, wait for spinner to complete and verify it failed with an 
> > error,
> > + * finally wait for NOP request to complete verify it succeeded without an
> > + * error. Preemption timeout also reduced / restored so test runs in a 
> > timely
> > + * maner.
> > + */
> > +static int __cancel_reset(struct drm_i915_private *i915,
> > + struct intel_engine_cs *engine)
> > +{
> > +   struct intel_context *ce;
> > +   struct igt_spinner spin;
> > +   struct i915_request *rq, *nop;
> > +   unsigned long preempt_timeout_ms;
> > +   int err = 0;
> > +
> > +   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||
> Does this matter? The test is overriding the default anyway.
>

Yes. Execlists don't try to preempt anything if
CONFIG_DRM_I915_PREEMPT_TIMEOUT is turned off. If we want to avoid the
cancellation doing a full GT reset, CONFIG_DRM_I915_PREEMPT_TIMEOUT
should be turned on.
 
> > +   !intel_has_reset_engine(engine->gt))
> > +   return 0;
> > +
> > +   preempt_timeout_ms = engine->props.preempt_timeout_ms;
> > +   engine->props.preempt_timeout_ms = 100;
> > +
> > +   if (igt_spinner_init(&spin, engine->gt))
> > +   goto out_restore;
> > +
> > +   ce = intel_context_create(engine);
> > +   if (IS_ERR(ce)) {
> > +   err = PTR_ERR(ce);
> > +   goto out_spin;
> > +   }
> > +
> > +   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > +   if (IS_ERR(rq)) {
> > +   err = PTR_ERR(rq);
> > +   goto out_ce;
> > +   }
> > +
> > +   pr_debug("%s: Cancelling active non-preemptable request\n",
> > +engine->name);
> > +   i915_request_get(rq);
> > +   i915_request_add(rq);
> > +   if (!igt_wait_for_spinner(&spin, rq)) {
> > +   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
> > +
> > +   pr_err("Failed to start spinner on %s\n", engine->name);
> > +   intel_engine_dump(engine, &p, "%s\n", engine->name);
> > +   err = -ETIME;
> > +   goto out_rq;
> > +   }
> > +
> > +   nop = intel_context_create_request(ce);
> > +   if (IS_ERR(nop))
> > +   goto out_nop;
> Should be out_rq?
>

Yes, it should.

Matt

> John.
> 
> 
> > +   i915_request_get(nop);
> > +   i915_request_add(nop);
> > +
> > +   i915_request_cancel(rq, -EINTR);
> > +
> > +   if (i915_request_wait(rq, 0, HZ) < 0) {
> > +   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
> > +
> > +   pr_err("%s: Failed to cancel hung request\n", engine->name);
> > +   intel_engine_dump(engine, &p, "%s\n", engine->name);
> > +   err = -ETIME;
> > +   goto out_nop;
> > +   }
> > +
> > +   if (rq->fence.error != -EINTR) {
> > +   pr_err("%s: fence not cancelled (%u)\n",
> > +  engine->name, rq->fence.error);
> > +   err = -EINVAL;
> > +   goto out_nop;
> > +   }
> > +
> > +   if (i915_request_wait(nop, 0, HZ) < 0) {
> > +   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
> > +
> > +   pr_err("%s: Failed to complete nop request\n", engine->name);
> > +   intel_engine_dump(engine, &p, "%s\n", engine->name);
> > +   err = -ETIME;
> > +   goto out_nop;
> > +   }
> > +
> > +   if (nop->fence.error != 0) {
> > +   pr_err("%s: Nop request errored (%u)\n",
> > +  engine->name, nop->fence.error);
> > +   err = -EINVAL;
> > +   }
> > +
> > +out_nop:
> > +   i915_request_put(nop);
> >

Re: [Intel-gfx] [PATCH 1/2] drm/i915/selftests: Add a cancel request selftest that triggers a reset

2022-01-13 Thread John Harrison


On 1/11/2022 15:11, Matthew Brost wrote:

Add a cancel request selftest that results in an engine reset to cancel
the request as it is non-preemptable. Also insert a NOP request after
the cancelled request and confirm that it completes successfully.

v2:
  (Tvrtko)
   - Skip test if preemption timeout compiled out
   - Skip test if engine reset isn't supported
   - Update debug prints to be more descriptive
v3:
   - Add comment explaining test

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/selftests/i915_request.c | 117 ++
  1 file changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c 
b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7f66f6d299b26..f78de99d5ae1e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -782,6 +782,115 @@ static int __cancel_completed(struct intel_engine_cs 
*engine)
return err;
  }
  
+/*

+ * Test to prove a non-preemptable request can be cancelled and a subsequent
+ * request on the same context can successfully complete after cancallation.

cancellation


+ *
+ * Testing methodology is to create non-preemptable request and submit it,

a non-preemptible


+ * wait for spinner to start, create a NOP request and submit it, cancel the
+ * spinner, wait for spinner to complete and verify it failed with an error,
+ * finally wait for NOP request to complete verify it succeeded without an
+ * error. Preemption timeout also reduced / restored so test runs in a timely
+ * maner.
+ */
+static int __cancel_reset(struct drm_i915_private *i915,
+ struct intel_engine_cs *engine)
+{
+   struct intel_context *ce;
+   struct igt_spinner spin;
+   struct i915_request *rq, *nop;
+   unsigned long preempt_timeout_ms;
+   int err = 0;
+
+   if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT ||

Does this matter? The test is overriding the default anyway.


+   !intel_has_reset_engine(engine->gt))
+   return 0;
+
+   preempt_timeout_ms = engine->props.preempt_timeout_ms;
+   engine->props.preempt_timeout_ms = 100;
+
+   if (igt_spinner_init(&spin, engine->gt))
+   goto out_restore;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce)) {
+   err = PTR_ERR(ce);
+   goto out_spin;
+   }
+
+   rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_ce;
+   }
+
+   pr_debug("%s: Cancelling active non-preemptable request\n",
+engine->name);
+   i915_request_get(rq);
+   i915_request_add(rq);
+   if (!igt_wait_for_spinner(&spin, rq)) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("Failed to start spinner on %s\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_rq;
+   }
+
+   nop = intel_context_create_request(ce);
+   if (IS_ERR(nop))
+   goto out_nop;

Should be out_rq?

John.



+   i915_request_get(nop);
+   i915_request_add(nop);
+
+   i915_request_cancel(rq, -EINTR);
+
+   if (i915_request_wait(rq, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to cancel hung request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (rq->fence.error != -EINTR) {
+   pr_err("%s: fence not cancelled (%u)\n",
+  engine->name, rq->fence.error);
+   err = -EINVAL;
+   goto out_nop;
+   }
+
+   if (i915_request_wait(nop, 0, HZ) < 0) {
+   struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+   pr_err("%s: Failed to complete nop request\n", engine->name);
+   intel_engine_dump(engine, &p, "%s\n", engine->name);
+   err = -ETIME;
+   goto out_nop;
+   }
+
+   if (nop->fence.error != 0) {
+   pr_err("%s: Nop request errored (%u)\n",
+  engine->name, nop->fence.error);
+   err = -EINVAL;
+   }
+
+out_nop:
+   i915_request_put(nop);
+out_rq:
+   i915_request_put(rq);
+out_ce:
+   intel_context_put(ce);
+out_spin:
+   igt_spinner_fini(&spin);
+out_restore:
+   engine->props.preempt_timeout_ms = preempt_timeout_ms;
+   if (err)
+   pr_err("%s: %s error %d\n", __func__, engine->name, err);
+   return err;
+}
+
  static int live_cancel_request(void *arg)
  {
struct drm_i915_private *i915 = arg;
@@ -814,6 +923,14 @@ static int live_cancel_request(void *arg)
return err;
if (err2)
return err2;
+
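
On the out_nop versus out_rq question raised in the review: the concern is the usual one with goto-based unwinding, namely that a failure path must jump to the label that releases only what has already been acquired. Below is a small stand-alone sketch of that pattern. Everything in it (`struct ref`, `ref_get()`, `ref_put()`, `run_test()` and the error numbers) is hypothetical and only mimics the shape of the selftest; it is not i915 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a request. */
struct ref {
	int count;
};

static void ref_get(struct ref *r)
{
	r->count++;
}

static void ref_put(struct ref *r)
{
	/* Dropping a reference that was never taken is a bug. */
	assert(r != NULL && r->count > 0);
	r->count--;
}

/*
 * fail_create simulates the second request failing to be created,
 * fail_wait simulates a later failure once both references are held.
 */
static int run_test(struct ref *rq, struct ref *nop,
		    int fail_create, int fail_wait)
{
	int err = 0;

	ref_get(rq);

	if (fail_create) {
		err = -12;	/* stand-in for PTR_ERR(nop) */
		goto out_rq;	/* nop was never acquired, so skip its put */
	}
	ref_get(nop);

	if (fail_wait) {
		err = -62;	/* stand-in for -ETIME */
		goto out_nop;	/* both references are held here */
	}

out_nop:
	ref_put(nop);
out_rq:
	ref_put(rq);
	return err;
}
```

Jumping to out_nop from the creation-failure path would call the put on something that was never obtained, which is the reason for jumping to out_rq instead. The sketch also records an error code on that path, which is an assumption about the intended behaviour rather than something stated in the thread.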

Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Remove hacks for reset and schedule disable G2H being received out of order

2022-01-13 Thread John Harrison


On 1/11/2022 15:11, Matthew Brost wrote:

In the i915 there are several hacks in place to make request cancelation
work with an old version of the GuC which delivered the G2H indicating
schedule disable is done before G2H indicating a context reset. Version
69 fixes this, so we can remove these hacks.

Signed-off-by: Matthew Brost 

Reviewed-by: John Harrison 


---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 30 ++-
  1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a40f10d376d..3918f1be114fa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1533,7 +1533,6 @@ static void __guc_reset_context(struct intel_context *ce, 
bool stalled)
unsigned long flags;
u32 head;
int i, number_children = ce->parallel.number_children;
-   bool skip = false;
struct intel_context *parent = ce;
  
  	GEM_BUG_ON(intel_context_is_child(ce));

@@ -1544,23 +1543,10 @@ static void __guc_reset_context(struct intel_context 
*ce, bool stalled)
 * GuC will implicitly mark the context as non-schedulable when it sends
 * the reset notification. Make sure our state reflects this change. The
 * context will be marked enabled on resubmission.
-*
-* XXX: If the context is reset as a result of the request cancellation
-* this G2H is received after the schedule disable complete G2H which is
-* wrong as this creates a race between the request cancellation code
-* re-submitting the context and this G2H handler. This is a bug in the
-* GuC but can be worked around in the meantime but converting this to a
-* NOP if a pending enable is in flight as this indicates that a request
-* cancellation has occurred.
 */
spin_lock_irqsave(&ce->guc_state.lock, flags);
-   if (likely(!context_pending_enable(ce)))
-   clr_context_enabled(ce);
-   else
-   skip = true;
+   clr_context_enabled(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-   if (unlikely(skip))
-   goto out_put;
  
  	/*

 * For each context in the relationship find the hanging request
@@ -1592,7 +1578,6 @@ static void __guc_reset_context(struct intel_context *ce, 
bool stalled)
}
  
  	__unwind_incomplete_requests(parent);

-out_put:
intel_context_put(parent);
  }
  
@@ -2531,12 +2516,6 @@ static void guc_context_cancel_request(struct intel_context *ce,

true);
}
  
-		/*

-* XXX: Racey if context is reset, see comment in
-* __guc_reset_context().
-*/
-   flush_work(&ce_to_guc(ce)->ct.requests.worker);
-
guc_context_unblock(block_context);
intel_context_put(ce);
}
@@ -3971,12 +3950,7 @@ static void guc_handle_context_reset(struct intel_guc 
*guc,
  {
trace_intel_context_reset(ce);
  
-	/*

-* XXX: Racey if request cancellation has occurred, see comment in
-* __guc_reset_context().
-*/
-   if (likely(!intel_context_is_banned(ce) &&
-  !context_blocked(ce))) {
+   if (likely(!intel_context_is_banned(ce))) {
capture_error_state(guc, ce);
guc_context_replay(ce);
} else {

[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Flip guc_id allocation partition (rev3)

2022-01-13 Thread Patchwork

== Series Details ==

Series: drm/i915: Flip guc_id allocation partition (rev3)
URL   : https://patchwork.freedesktop.org/series/98751/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11079 -> Patchwork_21995


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_21995 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21995, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/index.html

Participating hosts (39 -> 40)
--

  Additional (6): bat-dg1-6 bat-dg1-5 bat-adlp-6 bat-rpls-1 bat-jsl-2 bat-jsl-1 
  Missing(5): shard-tglu fi-bsw-cyan fi-apl-guc shard-rkl fi-snb-2600 

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_21995:

### IGT changes ###

 Possible regressions 

  * igt@i915_pm_rpm@module-reload:
- fi-bsw-kefka:   [PASS][1] -> [DMESG-WARN][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11079/fi-bsw-kefka/igt@i915_pm_...@module-reload.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/fi-bsw-kefka/igt@i915_pm_...@module-reload.html

  
Known issues


  Here are the changes found in Patchwork_21995 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@amdgpu/amd_basic@cs-gfx:
- fi-hsw-4770:NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#109315]) +17 
similar issues
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/fi-hsw-4770/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@amdgpu/amd_basic@semaphore:
- fi-bdw-5557u:   NOTRUN -> [SKIP][4] ([fdo#109271]) +31 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/fi-bdw-5557u/igt@amdgpu/amd_ba...@semaphore.html

  * igt@fbdev@info:
- bat-dg1-6:  NOTRUN -> [SKIP][5] ([i915#2582]) +4 similar issues
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@fb...@info.html

  * igt@gem_exec_gttfill@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][6] ([i915#4086])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@gem_exec_gttf...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][7] ([i915#4086])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@gem_exec_gttf...@basic.html

  * igt@gem_exec_suspend@basic-s3:
- fi-skl-6600u:   NOTRUN -> [INCOMPLETE][8] ([i915#4547])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/fi-skl-6600u/igt@gem_exec_susp...@basic-s3.html

  * igt@gem_mmap@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][9] ([i915#4083])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@gem_m...@basic.html
- bat-dg1-5:  NOTRUN -> [SKIP][10] ([i915#4083])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@gem_m...@basic.html

  * igt@gem_tiled_blits@basic:
- bat-dg1-6:  NOTRUN -> [SKIP][11] ([i915#4077]) +2 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@gem_tiled_bl...@basic.html

  * igt@gem_tiled_fence_blits@basic:
- bat-dg1-5:  NOTRUN -> [SKIP][12] ([i915#4077]) +2 similar issues
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@gem_tiled_fence_bl...@basic.html

  * igt@gem_tiled_pread_basic:
- bat-dg1-5:  NOTRUN -> [SKIP][13] ([i915#4079]) +1 similar issue
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@gem_tiled_pread_basic.html
- bat-dg1-6:  NOTRUN -> [SKIP][14] ([i915#4079]) +1 similar issue
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- bat-dg1-5:  NOTRUN -> [SKIP][15] ([i915#1155])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@i915_pm_backli...@basic-brightness.html
- bat-dg1-6:  NOTRUN -> [SKIP][16] ([i915#1155])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg1-5:  NOTRUN -> [DMESG-FAIL][17] ([i915#4494])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-5/igt@i915_selftest@l...@hangcheck.html
- bat-dg1-6:  NOTRUN -> [DMESG-FAIL][18] ([i915#4494])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21995/bat-dg1-6/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_selftest@live@requests:
- fi-pnv-d510:[PASS][19] -> [DMESG-FAIL][20] ([i915#2927] / 
[i915#4528])
   [19]:

[Intel-gfx] [PATCH v2 i-g-t 15/15] tests/i915/gem_exec_capture: Restore engines

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The test was updating some engine properties but not restoring them
afterwards. That would leave the system in a non-default state which
could potentially affect subsequent tests. Fix it by using the new
save/restore engine properties helper functions.

Signed-off-by: John Harrison 
---
 tests/i915/gem_exec_capture.c | 37 ++-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/i915/gem_exec_capture.c b/tests/i915/gem_exec_capture.c
index 9beb36fc7..51db07c41 100644
--- a/tests/i915/gem_exec_capture.c
+++ b/tests/i915/gem_exec_capture.c
@@ -209,14 +209,21 @@ static int check_error_state(int dir, struct offset 
*obj_offsets, int obj_count,
return blobs;
 }
 
-static void configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
+static struct gem_engine_properties
+configure_hangs(int fd, const struct intel_execution_engine2 *e, int ctxt_id)
 {
+   struct gem_engine_properties props;
+
/* Ensure fast hang detection */
-   gem_engine_property_printf(fd, e->name, "preempt_timeout_ms", "%d", 250);
-   gem_engine_property_printf(fd, e->name, "heartbeat_interval_ms", "%d", 500);
+   props.engine = e;
+   props.preempt_timeout = 250;
+   props.heartbeat_interval = 500;
+   gem_engine_properties_configure(fd, &props);
 
/* Allow engine based resets and disable banning */
igt_allow_hang(fd, ctxt_id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
+
+   return props;
 }
 
 static bool fence_busy(int fence)
@@ -256,8 +263,9 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset offset;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
memset(obj, 0, sizeof(obj));
obj[SCRATCH].handle = gem_create_in_memory_regions(fd, 4096, region);
@@ -371,6 +379,8 @@ static void __capture1(int fd, int dir, uint64_t ahnd, 
const intel_ctx_t *ctx,
gem_close(fd, obj[BATCH].handle);
gem_close(fd, obj[NOCAPTURE].handle);
gem_close(fd, obj[SCRATCH].handle);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void capture(int fd, int dir, const intel_ctx_t *ctx,
@@ -417,8 +427,9 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
uint32_t *batch, *seqno;
struct offset *offsets;
int i, fence_out;
+   struct gem_engine_properties saved_engine;
 
-   configure_hangs(fd, e, ctx->id);
+   saved_engine = configure_hangs(fd, e, ctx->id);
 
offsets = calloc(count, sizeof(*offsets));
igt_assert(offsets);
@@ -559,10 +570,12 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
 
qsort(offsets, count, sizeof(*offsets), cmp);
igt_assert(offsets[0].addr <= offsets[count-1].addr);
+
+   gem_engine_properties_restore(fd, &saved_engine);
return offsets;
 }
 
-#define find_first_available_engine(fd, ctx, e) \
+#define find_first_available_engine(fd, ctx, e, saved) \
do { \
ctx = intel_ctx_create_all_physical(fd); \
igt_assert(ctx); \
@@ -570,7 +583,7 @@ __captureN(int fd, int dir, uint64_t ahnd, const 
intel_ctx_t *ctx,
for_each_if(gem_class_can_store_dword(fd, e->class)) \
break; \
igt_assert(e); \
-   configure_hangs(fd, e, ctx->id); \
+   saved = configure_hangs(fd, e, ctx->id); \
} while(0)
 
 static void many(int fd, int dir, uint64_t size, unsigned int flags)
@@ -580,8 +593,9 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
uint64_t ram, gtt, ahnd;
unsigned long count, blobs;
struct offset *offsets;
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
gtt = gem_aperture_size(fd) / size;
ram = (intel_get_avail_ram_mb() << 20) / size;
@@ -602,6 +616,8 @@ static void many(int fd, int dir, uint64_t size, unsigned 
int flags)
 
free(offsets);
put_ahnd(ahnd);
+
+   gem_engine_properties_restore(fd, &saved_engine);
 }
 
 static void prioinv(int fd, int dir, const intel_ctx_t *ctx,
@@ -697,8 +713,9 @@ static void userptr(int fd, int dir)
void *ptr;
int obj_size = 4096;
uint32_t system_region = INTEL_MEMORY_REGION_ID(I915_SYSTEM_MEMORY, 0);
+   struct gem_engine_properties saved_engine;
 
-   find_first_available_engine(fd, ctx, e);
+   find_first_available_engine(fd, ctx, e, saved_engine);
 
igt_assert(posix_memalign(&ptr, obj_size, obj_size) == 0);
memset(ptr, 0, obj_size);
@@ -710,6 +727,8 @@ static void userptr(int fd, int dir)
gem_close(fd, handle);
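
The save/restore idea the commit message describes can be sketched in isolation roughly as follows. This is a toy model under stated assumptions: `struct engine_props`, the `live` variable (standing in for the per-engine sysfs files), `props_configure()` and `props_restore()` are all invented for illustration, and the assumption that the configure step hands the previous values back through the same struct is a guess at how such a helper could work, not a description of the IGT library.

```c
#include <assert.h>

/* Hypothetical mirror of the two properties the test tweaks. */
struct engine_props {
	int preempt_timeout;
	int heartbeat_interval;
};

/* Stand-in for the engine's current settings; the starting values are arbitrary. */
static struct engine_props live = { 640, 2500 };

/*
 * Apply the requested values and overwrite *p with whatever was
 * configured before, so the caller ends up holding the saved state.
 */
static void props_configure(struct engine_props *p)
{
	struct engine_props wanted = *p;

	*p = live;
	live = wanted;
}

/* Put the previously saved values back. */
static void props_restore(const struct engine_props *saved)
{
	assert(saved != 0);
	live = *saved;
}
```

In this model a test sets the struct to the values it wants, calls the configure helper once at the start, and passes the same struct to the restore helper on the way out, so later tests see the original settings again.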

[Intel-gfx] [PATCH v2 i-g-t 08/15] tests/i915/i915_hangman: Add alive-ness test after error capture

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Added an extra step to the i915_hangman tests to check that the
system is still alive after the hang and recovery. This submits a
simple batch to each engine which does a write to memory and checks
that the write occurred.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 5a0c9497c..918418760 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -48,8 +48,57 @@
 static int device = -1;
 static int sysfs = -1;
 
+#define OFFSET_ALIVE   10
+
 IGT_TEST_DESCRIPTION("Tests for hang detection and recovery");
 
+static void check_alive(void)
+{
+   const struct intel_execution_engine2 *engine;
+   const intel_ctx_t *ctx;
+   uint32_t scratch, *out;
+   int fd, i = 0;
+   uint64_t ahnd, scratch_addr;
+
+   fd = drm_open_driver(DRIVER_INTEL);
+   igt_require(gem_class_can_store_dword(fd, 0));
+
+   ctx = intel_ctx_create_all_physical(fd);
+   ahnd = get_reloc_ahnd(fd, ctx->id);
+   scratch = gem_create(fd, 4096);
+   scratch_addr = get_offset(ahnd, scratch, 4096, 0);
+   out = gem_mmap__wc(fd, scratch, 0, 4096, PROT_WRITE);
+   gem_set_domain(fd, scratch,
+   I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
+
+   for_each_physical_engine(fd, engine) {
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], 0);
+   i++;
+   }
+
+   i = 0;
+   for_each_ctx_engine(fd, ctx, engine) {
+   if (!gem_class_can_store_dword(fd, engine->class))
+   continue;
+
+   /* +OFFSET_ALIVE to ensure engine zero doesn't get a false negative */
+   igt_store_word(fd, ahnd, ctx, engine, -1, scratch, scratch_addr,
+  i + OFFSET_ALIVE, i + OFFSET_ALIVE);
+   i++;
+   }
+
+   gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
+
+   while (i--)
+   igt_assert_eq_u32(out[i + OFFSET_ALIVE], i + OFFSET_ALIVE);
+
+   munmap(out, 4096);
+   gem_close(fd, scratch);
+   put_ahnd(ahnd);
+   intel_ctx_destroy(fd, ctx);
+   close(fd);
+}
+
 static bool has_error_state(int dir)
 {
bool result;
@@ -231,6 +280,8 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_error_state(e->name, offset, batch);
munmap(batch, 4096);
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static void
@@ -289,6 +340,8 @@ test_engine_hang(const intel_ctx_t *ctx,
put_ahnd(ahndN);
}
put_ahnd(ahnd);
+
+   check_alive();
 }
 
 static int hang_count;
@@ -321,6 +374,8 @@ static void test_hang_detector(const intel_ctx_t *ctx,
 
/* Did it work? */
igt_assert(hang_count == 1);
+
+   check_alive();
 }
 
 /* This test covers the case where we end up in an uninitialised area of the
@@ -356,6 +411,8 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
igt_force_gpu_reset(device);
igt_assert_f(0, "unterminated batch did not trigger a hang!\n");
}
+
+   check_alive();
 }
 
 static void do_tests(const char *name, const char *prefix,
@@ -433,6 +490,8 @@ igt_main
igt_assert(sysfs != -1);
 
igt_require(has_error_state(sysfs));
+
+   gem_require_mmap_wc(device);
}
 
igt_describe("Basic error capture");
-- 
2.25.1
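
The reason for the OFFSET_ALIVE comment in check_alive() can be shown in isolation. The following is only a minimal sketch, not part of the patch: it replaces the GPU store with a plain array write, and NUM_ENGINES, SLOTS and alive_check_passes() are hypothetical names made up for the illustration. It assumes, as the patch does, that the scratch buffer starts out zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_ENGINES 4	/* hypothetical engine count for this sketch */
#define SLOTS 64	/* hypothetical scratch size in dwords */

/*
 * Simulate the liveness check: the scratch buffer starts zeroed, each
 * engine i that is still working stores (i + offset) into slot
 * (i + offset), and the checker then compares every slot against the
 * value it expects. Returns true if the check would pass.
 */
static bool alive_check_passes(int offset, const bool *engine_alive)
{
	uint32_t scratch[SLOTS];
	int i;

	memset(scratch, 0, sizeof(scratch));

	/* Stand-in for the per-engine store; a dead engine writes nothing. */
	for (i = 0; i < NUM_ENGINES; i++)
		if (engine_alive[i])
			scratch[i + offset] = i + offset;

	/* Stand-in for the final igt_assert_eq_u32() loop. */
	for (i = 0; i < NUM_ENGINES; i++)
		if (scratch[i + offset] != (uint32_t)(i + offset))
			return false;

	return true;
}
```

With an offset of 0, engine zero is expected to write 0 into a slot that already holds 0, so the check passes even if that engine never executed the store. With any non-zero offset, an untouched slot no longer matches the expected value and the dead engine is caught.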

[Intel-gfx] [PATCH v2 i-g-t 14/15] tests/i915/i915_hangman: Configure engine properties for quicker hangs

2022-01-13 Thread John . C . Harrison

From: John Harrison 

Some platforms have very long timeouts configured for some engines.
Some have them disabled completely. That makes for a very slow (or
broken) hangman test. So explicitly configure the engines to have
reasonable settings first.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 567eb71ee..1a2b2cf7a 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -500,8 +500,12 @@ igt_main
 {
const intel_ctx_t *ctx;
igt_hang_t hang = {};
+   struct gem_engine_properties saved_params[GEM_MAX_ENGINES];
+   int num_engines = 0;
 
igt_fixture {
+   const struct intel_execution_engine2 *e;
+
device = drm_open_driver(DRIVER_INTEL);
igt_require_gem(device);
 
@@ -515,6 +519,13 @@ igt_main
igt_require(has_error_state(sysfs));
 
gem_require_mmap_wc(device);
+
+   for_each_physical_engine(device, e) {
+   saved_params[num_engines].engine = e;
+   saved_params[num_engines].preempt_timeout = 500;
+   saved_params[num_engines].heartbeat_interval = 1000;
+   gem_engine_properties_configure(device, saved_params + num_engines++);
+   }
}
 
igt_describe("Basic error capture");
@@ -546,6 +557,11 @@ igt_main
do_tests("engine", "engine", ctx);
 
igt_fixture {
+   int i;
+
+   for (i = 0; i < num_engines; i++)
+   gem_engine_properties_restore(device, saved_params + i);
+
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
close(device);
-- 
2.25.1

[Intel-gfx] [PATCH v2 i-g-t 11/15] tests/i915/i915_hangman: Don't let background contexts cause a ban

2022-01-13 Thread John . C . Harrison

From: John Harrison 

The global context used by all the subtests for causing hangs is
marked as unbannable. However, some of the subtests set background
spinners running on all engines using a freshly created context. If
there is a test failure for any reason, all of those spinners can be
killed off as hanging contexts. On systems with lots of engines, that
can result in the test being banned from creating any new contexts.

So make the spinner contexts unbannable as well. That way if one
subtest fails it won't necessarily bring down all subsequent subtests.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index 9f7f8062c..567eb71ee 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -284,6 +284,21 @@ static void test_error_state_capture(const intel_ctx_t *ctx,
check_alive();
 }
 
+static void context_unban(int fd, unsigned ctx)
+{
+   struct drm_i915_gem_context_param param = {
+   .ctx_id = ctx,
+   .param = I915_CONTEXT_PARAM_BANNABLE,
+   .value = 0,
+   };
+
+   if(__gem_context_set_param(fd, &param) == -EINVAL) {
+   igt_assert_eq(param.value, 0);
+   param.param = I915_CONTEXT_PARAM_BAN_PERIOD;
+   gem_context_set_param(fd, &param);
+   }
+}
+
 static void
 test_engine_hang(const intel_ctx_t *ctx,
 const struct intel_execution_engine2 *e, unsigned int flags)
@@ -307,6 +322,7 @@ test_engine_hang(const intel_ctx_t *ctx,
num_ctx = 0;
for_each_ctx_engine(device, ctx, other) {
local_ctx[num_ctx] = intel_ctx_create(device, &ctx->cfg);
+   context_unban(device, local_ctx[num_ctx]->id);
ahndN = get_reloc_ahnd(device, local_ctx[num_ctx]->id);
spin = __igt_spin_new(device,
  .ahnd = ahndN,
-- 
2.25.1
