Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-14 Thread Andi Shyti
Hi,

[...]

> > struct i915_address_space *vm;
> > +   unsigned long flags;
> > trace_i915_context_free(ctx);
> > GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
> > +   spin_lock_irqsave(>i915->gem.contexts.lock, flags);
> 
> Why irqsave and the conversion to irq safe elsewhere? Worker context does
> not require it and I don't see the connection to the change of list_del
> location.

yah! I think there is no reason, this is just inherited from
other code.

Andi

> Regards,
> 
> Tvrtko
> 
> > +   list_del(>link);
> > +   spin_unlock_irqrestore(>i915->gem.contexts.lock, flags);
> > +
> > if (ctx->syncobj)
> > drm_syncobj_put(ctx->syncobj);

...


Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-14 Thread Janusz Krzysztofik
Hi Tvrtko,

Thanks for review.

On Wednesday, 14 September 2022 17:37:19 CEST Tvrtko Ursulin wrote:
> 
> On 13/09/2022 17:10, Janusz Krzysztofik wrote:
> > From: Chris Wilson 
> > 
> > i915_perf assumes that it can use the i915_gem_context reference to
> > protect its i915->gem.contexts.list iteration. However, this requires
> > that we do not remove the context from the list until after we drop the
> > final reference and release the struct. If, as currently, we remove the
> > context from the list during context_close(), the link.next pointer may
> > be poisoned while we are holding the context reference and cause a GPF:
> > 
> > [ 4070.573157] i915 :00:02.0: [drm:i915_perf_open_ioctl [i915]] 
> > filtering on ctx_id=0x1f ctx_id_mask=0x1f
> > [ 4070.574881] general protection fault, probably for non-canonical address 
> > 0xdead0100:  [#1] PREEMPT SMP
> > [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G  
> >   E 5.17.9 #180
> > [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS 
> > BNKBL357.86A.0052.2017.0918.1346 09/18/2017
> > [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
> > [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 
> > 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 
> > <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
> > [ 4070.574990] RSP: 0018:c90002077b78 EFLAGS: 00010202
> > [ 4070.574995] RAX: 0002 RBX: 0002 RCX: 
> > 
> > [ 4070.575000] RDX: 0001 RSI: c90002077b20 RDI: 
> > 88810ddc7c68
> > [ 4070.575004] RBP: 0001 R08: 888103242648 R09: 
> > fffc
> > [ 4070.575008] R10: 82c50bc0 R11: 00025c80 R12: 
> > 888101bf1860
> > [ 4070.575012] R13: dead00b0 R14: c90002077c04 R15: 
> > 88810be5cabc
> > [ 4070.575016] FS:  7f1ed50c0780() GS:5ec8() 
> > knlGS:
> > [ 4070.575021] CS:  0010 DS:  ES:  CR0: 80050033
> > [ 4070.575025] CR2: 7f1ed5590280 CR3: 00010ef6f005 CR4: 
> > 003706e0
> > [ 4070.575029] Call Trace:
> > [ 4070.575033]  
> > [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
> > [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
> > [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
> > [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
> > [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> > [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
> > [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
> > [ 4070.575302]  drm_ioctl+0x1d3/0x370
> > [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> > [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
> > [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
> > [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
> > [ 4070.575456]  do_syscall_64+0x35/0x80
> > [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 4070.575467] RIP: 0033:0x7f1ed5c10397
> > [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 
> > 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 
> > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
> > [ 4070.575478] RSP: 002b:7ffd65c8d7a8 EFLAGS: 0246 ORIG_RAX: 
> > 0010
> > [ 4070.575484] RAX: ffda RBX: 0006 RCX: 
> > 7f1ed5c10397
> > [ 4070.575488] RDX: 7ffd65c8d7c0 RSI: 40106476 RDI: 
> > 0006
> > [ 4070.575492] RBP: 5620972f9c60 R08: 000a R09: 
> > 0005
> > [ 4070.575496] R10: 000d R11: 0246 R12: 
> > 000a
> > [ 4070.575500] R13: 000d R14:  R15: 
> > 7ffd65c8d7c0
> > [ 4070.575505]  
> > [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) 
> > i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) 
> > crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) 
> > cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) 
> > syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) 
> > sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) 
> > i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
> > [ 4070.575549] ---[ end trace  ]---
> > 
> > Reported-by: Mark Janes 
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
> > References: a4e7ccdac38e ("drm/i915: Move context management under GEM")
> > Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts")
> > Signed-off-by: Chris Wilson 
> > Reviewed-by: Andi Shyti 
> > Signed-off-by: Andi Shyti 
> > Signed-off-by: Janusz Krzysztofik 
> > Cc: Tvrtko Ursulin 
> > Cc:  # v5.12+
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++---
> >   drivers/gpu/drm/i915/i915_perf.c

Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-14 Thread Janusz Krzysztofik
On Wednesday, 14 September 2022 17:15:02 CEST Andi Shyti wrote:
> Hi Krzysztofik,
> 
> if you are going to resend it, I just have a little thing if you
> don't mind,
> 
> On Tue, Sep 13, 2022 at 06:10:39PM +0200, Janusz Krzysztofik wrote:
> > From: Chris Wilson 
> > 
> > i915_perf assumes that it can use the i915_gem_context reference to
> > protect its i915->gem.contexts.list iteration. However, this requires
> > that we do not remove the context from the list until after we drop the
> > final reference and release the struct. If, as currently, we remove the
> > context from the list during context_close(), the link.next pointer may
> > be poisoned while we are holding the context reference and cause a GPF:
> > 
> > [ 4070.573157] i915 :00:02.0: [drm:i915_perf_open_ioctl [i915]] 
> > filtering on ctx_id=0x1f ctx_id_mask=0x1f
> > [ 4070.574881] general protection fault, probably for non-canonical address 
> > 0xdead0100:  [#1] PREEMPT SMP
> > [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G  
> >   E 5.17.9 #180
> > [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS 
> > BNKBL357.86A.0052.2017.0918.1346 09/18/2017
> > [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
> > [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 
> > 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 
> > <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
> > [ 4070.574990] RSP: 0018:c90002077b78 EFLAGS: 00010202
> > [ 4070.574995] RAX: 0002 RBX: 0002 RCX: 
> > 
> > [ 4070.575000] RDX: 0001 RSI: c90002077b20 RDI: 
> > 88810ddc7c68
> > [ 4070.575004] RBP: 0001 R08: 888103242648 R09: 
> > fffc
> > [ 4070.575008] R10: 82c50bc0 R11: 00025c80 R12: 
> > 888101bf1860
> > [ 4070.575012] R13: dead00b0 R14: c90002077c04 R15: 
> > 88810be5cabc
> > [ 4070.575016] FS:  7f1ed50c0780() GS:5ec8() 
> > knlGS:
> > [ 4070.575021] CS:  0010 DS:  ES:  CR0: 80050033
> > [ 4070.575025] CR2: 7f1ed5590280 CR3: 00010ef6f005 CR4: 
> > 003706e0
> > [ 4070.575029] Call Trace:
> > [ 4070.575033]  
> > [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
> > [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
> > [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
> > [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
> > [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> > [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
> > [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
> > [ 4070.575302]  drm_ioctl+0x1d3/0x370
> > [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> > [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
> > [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
> > [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
> > [ 4070.575456]  do_syscall_64+0x35/0x80
> > [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 4070.575467] RIP: 0033:0x7f1ed5c10397
> > [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 
> > 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 
> > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
> > [ 4070.575478] RSP: 002b:7ffd65c8d7a8 EFLAGS: 0246 ORIG_RAX: 
> > 0010
> > [ 4070.575484] RAX: ffda RBX: 0006 RCX: 
> > 7f1ed5c10397
> > [ 4070.575488] RDX: 7ffd65c8d7c0 RSI: 40106476 RDI: 
> > 0006
> > [ 4070.575492] RBP: 5620972f9c60 R08: 000a R09: 
> > 0005
> > [ 4070.575496] R10: 000d R11: 0246 R12: 
> > 000a
> > [ 4070.575500] R13: 000d R14:  R15: 
> > 7ffd65c8d7c0
> > [ 4070.575505]  
> > [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) 
> > i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) 
> > crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) 
> > cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) 
> > syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) 
> > sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) 
> > i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
> > [ 4070.575549] ---[ end trace  ]---
> > 
> > Reported-by: Mark Janes 
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
> > References: a4e7ccdac38e ("drm/i915: Move context management under GEM")
> > Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts")
> > Signed-off-by: Chris Wilson 
> > Reviewed-by: Andi Shyti 
> > Signed-off-by: Andi Shyti 
> > Signed-off-by: Janusz Krzysztofik 
> > Cc: Tvrtko Ursulin 
> > Cc:  # v5.12+
> > ---
> >  

Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-14 Thread Tvrtko Ursulin



On 13/09/2022 17:10, Janusz Krzysztofik wrote:

From: Chris Wilson 

i915_perf assumes that it can use the i915_gem_context reference to
protect its i915->gem.contexts.list iteration. However, this requires
that we do not remove the context from the list until after we drop the
final reference and release the struct. If, as currently, we remove the
context from the list during context_close(), the link.next pointer may
be poisoned while we are holding the context reference and cause a GPF:

[ 4070.573157] i915 :00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering 
on ctx_id=0x1f ctx_id_mask=0x1f
[ 4070.574881] general protection fault, probably for non-canonical address 
0xdead0100:  [#1] PREEMPT SMP
[ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: GE 
5.17.9 #180
[ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS 
BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
[ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 
0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 
8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
[ 4070.574990] RSP: 0018:c90002077b78 EFLAGS: 00010202
[ 4070.574995] RAX: 0002 RBX: 0002 RCX: 
[ 4070.575000] RDX: 0001 RSI: c90002077b20 RDI: 88810ddc7c68
[ 4070.575004] RBP: 0001 R08: 888103242648 R09: fffc
[ 4070.575008] R10: 82c50bc0 R11: 00025c80 R12: 888101bf1860
[ 4070.575012] R13: dead00b0 R14: c90002077c04 R15: 88810be5cabc
[ 4070.575016] FS:  7f1ed50c0780() GS:5ec8() 
knlGS:
[ 4070.575021] CS:  0010 DS:  ES:  CR0: 80050033
[ 4070.575025] CR2: 7f1ed5590280 CR3: 00010ef6f005 CR4: 003706e0
[ 4070.575029] Call Trace:
[ 4070.575033]  
[ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
[ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
[ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
[ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
[ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
[ 4070.575290]  drm_ioctl_kernel+0x85/0x110
[ 4070.575296]  ? update_load_avg+0x5f/0x5e0
[ 4070.575302]  drm_ioctl+0x1d3/0x370
[ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
[ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
[ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
[ 4070.575451]  ? __do_softirq+0xaa/0x1d2
[ 4070.575456]  do_syscall_64+0x35/0x80
[ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4070.575467] RIP: 0033:0x7f1ed5c10397
[ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 
89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff 
ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
[ 4070.575478] RSP: 002b:7ffd65c8d7a8 EFLAGS: 0246 ORIG_RAX: 
0010
[ 4070.575484] RAX: ffda RBX: 0006 RCX: 7f1ed5c10397
[ 4070.575488] RDX: 7ffd65c8d7c0 RSI: 40106476 RDI: 0006
[ 4070.575492] RBP: 5620972f9c60 R08: 000a R09: 0005
[ 4070.575496] R10: 000d R11: 0246 R12: 000a
[ 4070.575500] R13: 000d R14:  R15: 7ffd65c8d7c0
[ 4070.575505]  
[ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) 
i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) 
crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) 
cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) 
syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) 
sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) 
i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
[ 4070.575549] ---[ end trace  ]---

Reported-by: Mark Janes 
Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
References: a4e7ccdac38e ("drm/i915: Move context management under GEM")
Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts")
Signed-off-by: Chris Wilson 
Reviewed-by: Andi Shyti 
Signed-off-by: Andi Shyti 
Signed-off-by: Janusz Krzysztofik 
Cc: Tvrtko Ursulin 
Cc:  # v5.12+
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++---
  drivers/gpu/drm/i915/i915_perf.c| 18 ++
  drivers/gpu/drm/i915/i915_sysfs.c   |  8 
  3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dabdfe09f5e51..9d7142ab63c05 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1133,7 +1133,6 @@ static struct i915_gem_engines *default_engines(struct 

Re: [Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-14 Thread Andi Shyti
Hi Krzysztofik,

if you are going to resend it, I just have a little thing if you
don't mind,

On Tue, Sep 13, 2022 at 06:10:39PM +0200, Janusz Krzysztofik wrote:
> From: Chris Wilson 
> 
> i915_perf assumes that it can use the i915_gem_context reference to
> protect its i915->gem.contexts.list iteration. However, this requires
> that we do not remove the context from the list until after we drop the
> final reference and release the struct. If, as currently, we remove the
> context from the list during context_close(), the link.next pointer may
> be poisoned while we are holding the context reference and cause a GPF:
> 
> [ 4070.573157] i915 :00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering 
> on ctx_id=0x1f ctx_id_mask=0x1f
> [ 4070.574881] general protection fault, probably for non-canonical address 
> 0xdead0100:  [#1] PREEMPT SMP
> [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G
> E 5.17.9 #180
> [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS 
> BNKBL357.86A.0052.2017.0918.1346 09/18/2017
> [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
> [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 
> f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 
> 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
> [ 4070.574990] RSP: 0018:c90002077b78 EFLAGS: 00010202
> [ 4070.574995] RAX: 0002 RBX: 0002 RCX: 
> 
> [ 4070.575000] RDX: 0001 RSI: c90002077b20 RDI: 
> 88810ddc7c68
> [ 4070.575004] RBP: 0001 R08: 888103242648 R09: 
> fffc
> [ 4070.575008] R10: 82c50bc0 R11: 00025c80 R12: 
> 888101bf1860
> [ 4070.575012] R13: dead00b0 R14: c90002077c04 R15: 
> 88810be5cabc
> [ 4070.575016] FS:  7f1ed50c0780() GS:5ec8() 
> knlGS:
> [ 4070.575021] CS:  0010 DS:  ES:  CR0: 80050033
> [ 4070.575025] CR2: 7f1ed5590280 CR3: 00010ef6f005 CR4: 
> 003706e0
> [ 4070.575029] Call Trace:
> [ 4070.575033]  
> [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
> [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
> [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
> [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
> [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
> [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
> [ 4070.575302]  drm_ioctl+0x1d3/0x370
> [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
> [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
> [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
> [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
> [ 4070.575456]  do_syscall_64+0x35/0x80
> [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 4070.575467] RIP: 0033:0x7f1ed5c10397
> [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 
> 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 
> 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
> [ 4070.575478] RSP: 002b:7ffd65c8d7a8 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [ 4070.575484] RAX: ffda RBX: 0006 RCX: 
> 7f1ed5c10397
> [ 4070.575488] RDX: 7ffd65c8d7c0 RSI: 40106476 RDI: 
> 0006
> [ 4070.575492] RBP: 5620972f9c60 R08: 000a R09: 
> 0005
> [ 4070.575496] R10: 000d R11: 0246 R12: 
> 000a
> [ 4070.575500] R13: 000d R14:  R15: 
> 7ffd65c8d7c0
> [ 4070.575505]  
> [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) 
> i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) 
> crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) 
> cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) 
> syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) 
> sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) 
> i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
> [ 4070.575549] ---[ end trace  ]---
> 
> Reported-by: Mark Janes 
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
> References: a4e7ccdac38e ("drm/i915: Move context management under GEM")
> Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts")
> Signed-off-by: Chris Wilson 
> Reviewed-by: Andi Shyti 
> Signed-off-by: Andi Shyti 
> Signed-off-by: Janusz Krzysztofik 
> Cc: Tvrtko Ursulin 
> Cc:  # v5.12+
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++---
>  drivers/gpu/drm/i915/i915_perf.c| 18 ++
>  drivers/gpu/drm/i915/i915_sysfs.c   |  8 
>  3 files changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git 

[Intel-gfx] [PATCH 2/2] drm/i915/gem: Really move i915_gem_context.link under ref protection

2022-09-13 Thread Janusz Krzysztofik
From: Chris Wilson 

i915_perf assumes that it can use the i915_gem_context reference to
protect its i915->gem.contexts.list iteration. However, this requires
that we do not remove the context from the list until after we drop the
final reference and release the struct. If, as currently, we remove the
context from the list during context_close(), the link.next pointer may
be poisoned while we are holding the context reference and cause a GPF:

[ 4070.573157] i915 :00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering 
on ctx_id=0x1f ctx_id_mask=0x1f
[ 4070.574881] general protection fault, probably for non-canonical address 
0xdead0100:  [#1] PREEMPT SMP
[ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: GE 
5.17.9 #180
[ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS 
BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
[ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 
f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 
50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
[ 4070.574990] RSP: 0018:c90002077b78 EFLAGS: 00010202
[ 4070.574995] RAX: 0002 RBX: 0002 RCX: 
[ 4070.575000] RDX: 0001 RSI: c90002077b20 RDI: 88810ddc7c68
[ 4070.575004] RBP: 0001 R08: 888103242648 R09: fffc
[ 4070.575008] R10: 82c50bc0 R11: 00025c80 R12: 888101bf1860
[ 4070.575012] R13: dead00b0 R14: c90002077c04 R15: 88810be5cabc
[ 4070.575016] FS:  7f1ed50c0780() GS:5ec8() 
knlGS:
[ 4070.575021] CS:  0010 DS:  ES:  CR0: 80050033
[ 4070.575025] CR2: 7f1ed5590280 CR3: 00010ef6f005 CR4: 003706e0
[ 4070.575029] Call Trace:
[ 4070.575033]  
[ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
[ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
[ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
[ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
[ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
[ 4070.575290]  drm_ioctl_kernel+0x85/0x110
[ 4070.575296]  ? update_load_avg+0x5f/0x5e0
[ 4070.575302]  drm_ioctl+0x1d3/0x370
[ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
[ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
[ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
[ 4070.575451]  ? __do_softirq+0xaa/0x1d2
[ 4070.575456]  do_syscall_64+0x35/0x80
[ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4070.575467] RIP: 0033:0x7f1ed5c10397
[ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 
5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
[ 4070.575478] RSP: 002b:7ffd65c8d7a8 EFLAGS: 0246 ORIG_RAX: 
0010
[ 4070.575484] RAX: ffda RBX: 0006 RCX: 7f1ed5c10397
[ 4070.575488] RDX: 7ffd65c8d7c0 RSI: 40106476 RDI: 0006
[ 4070.575492] RBP: 5620972f9c60 R08: 000a R09: 0005
[ 4070.575496] R10: 000d R11: 0246 R12: 000a
[ 4070.575500] R13: 000d R14:  R15: 7ffd65c8d7c0
[ 4070.575505]  
[ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) 
i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) 
crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) 
cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) 
syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) 
sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) 
i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
[ 4070.575549] ---[ end trace  ]---

Reported-by: Mark Janes 
Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
References: a4e7ccdac38e ("drm/i915: Move context management under GEM")
Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts")
Signed-off-by: Chris Wilson 
Reviewed-by: Andi Shyti 
Signed-off-by: Andi Shyti 
Signed-off-by: Janusz Krzysztofik 
Cc: Tvrtko Ursulin 
Cc:  # v5.12+
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++---
 drivers/gpu/drm/i915/i915_perf.c| 18 ++
 drivers/gpu/drm/i915/i915_sysfs.c   |  8 
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dabdfe09f5e51..9d7142ab63c05 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1133,7 +1133,6 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx,
err =