Re: REGRESSION: in intel video driver following introduction of mm_struct.has_pinned

2020-09-29 Thread Joonas Lahtinen
(+ intel-gfx for being i915 related)
(+ Chris who has looked into the issue)

Hi,

Thanks for reporting!

Could you open a bug report according to the following instructions:

https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs

A full dmesg of a bad boot and git bisect logs will be helpful.

Also, please describe when the problem happens. Is it at boot? Are you
getting the OOPS on every boot?

For future reference, replying to a single thread helps keep
attention focused.

Regards, Joonas

Quoting Tony Fischetti (2020-09-28 21:14:16)
> After a lengthy git bisection, I determined that the commit which
> introduced the change that ultimately caused a NULL pointer
> dereference oops (see below for relevant syslog entries) was
> 008cfe4418b3dbda2ff.. (mm: Introduce mm_struct.has_pinned).
> 
> The RIP (according to syslog) is in the function
> `__get_user_pages_remote`, and the last i915 function in the call
> trace is `__i915_gem_userptr_get_pages_worker`.
> More specifically, it appears to be the call to
> `pin_user_pages_remote` in `__i915_gem_userptr_get_pages_worker` in
> drivers/gpu/drm/i915/gem/i915_gem_userptr.c that directly leads to the
> oops.
> 
> Unfortunately, I don't know enough to try to fix and share the fix
> myself, but I hope the information I provided is helpful. Please let
> me know if there is any further information I can provide that might
> be of use.
> 
> BUG: kernel NULL pointer dereference, address: 0054
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 8 PID: 497 Comm: kworker/u25:0 Not tainted
> 5.9.0-rc7-alice-investigate-3+ #2
> Hardware name: LENOVO 10ST001QUS/312A, BIOS M1UKT4BA 11/11/2019
> Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915]
> RIP: 0010:__get_user_pages_remote+0xa0/0x2d0
> Code: 85 e7 01 00 00 83 3b 01 0f 85 e0 01 00 00 f7 c1 00 00 04 00 0f
> 84 12 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 58 03 00 00  40
> 54 01 00 00 00 c6 04 24 00 4d 8d 6f 68 48 c7 44 24 10 00 00
> RSP: 0018:a1a58086bde0 EFLAGS: 00010206
> RAX:  RBX: a1a58086be64 RCX: 00040001
> RDX: 07e9 RSI: 7f532f80 RDI: 92f22d89c480
> RBP: 7f532f80 R08: 92f23a188000 R09: 
> R10:  R11: a1a58086bcfd R12: 92f23a188000
> R13: 92f22d89c480 R14: 00042003 R15: 92f22d89c480
> FS:  () GS:92f23e40() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0054 CR3: 16c0a002 CR4: 001706e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  __i915_gem_userptr_get_pages_worker+0x1ec/0x392 [i915]
>  process_one_work+0x1c7/0x310
>  worker_thread+0x28/0x3c0
>  ? set_worker_desc+0xb0/0xb0
>  kthread+0x123/0x140
>  ? kthread_use_mm+0xe0/0xe0
>  ret_from_fork+0x1f/0x30
> Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio iwlmvm mac80211 libarc4
> x86_pkg_temp_thermal intel_powerclamp iwlwifi coretemp i915
> crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit
> ghash_clmulni_intel drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops cec mei_hdcp wmi_bmof snd_hda_intel drm tpm_crb
> snd_intel_dspcfg intel_wmi_thunderbolt snd_hda_codec snd_hwdep
> aesni_intel crypto_simd glue_helper snd_hda_core cfg80211 i2c_i801
> snd_pcm intel_cstate pcspkr snd_timer mei_me i2c_smbus mei i2c_core
> thermal wmi tpm_tis tpm_tis_core tpm rng_core acpi_pad ppdev lp
> ip_tables x_tables
> CR2: 0054
> ---[ end trace 8d080e8b96289c9e ]---


Re: REGRESSION: in intel video driver following introduction of mm_struct.has_pinned

2020-09-29 Thread Chris Wilson
Quoting Joonas Lahtinen (2020-09-29 09:18:34)
> (+ intel-gfx for being i915 related)
> (+ Chris who has looked into the issue)
> 
> Hi,
> 
> Thanks for reporting!

Fixed in:

commit a4d63c3732f1a0c91abcf5b7f32b4ef7dcd82025
Author: Jason A. Donenfeld
Date:   Mon Sep 28 12:35:07 2020 +0200

    mm: do not rely on mm == current->mm in __get_user_pages_locked

-Chris
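
For readers following along, here is a rough user-space model of what the
title of that fix implies. It is only a sketch, not the kernel code: the
classes `MM` and `Task` and both functions are hypothetical stand-ins, and
it assumes the regression set has_pinned through current->mm while the
caller (here a kworker with no mm of its own) passed a different mm. The
FOLL_PIN value is the one I believe 5.9-era kernels use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: FOLL_PIN as defined in 5.9-era include/linux/mm.h.
FOLL_PIN = 0x40000


@dataclass
class MM:
    """Hypothetical stand-in for struct mm_struct."""
    has_pinned: int = 0


@dataclass
class Task:
    """Hypothetical stand-in for struct task_struct."""
    mm: Optional[MM]  # None models a kernel thread without its own mm


def gup_locked_buggy(current: Task, mm: MM, flags: int) -> None:
    """Model of the regression: ignores the mm argument, uses current.mm."""
    if flags & FOLL_PIN:
        # Raises AttributeError when current.mm is None, the Python
        # analogue of the NULL pointer write in the oops.
        current.mm.has_pinned = 1


def gup_locked_fixed(current: Task, mm: MM, flags: int) -> None:
    """Model of the fix: marks the mm that was actually passed in."""
    if flags & FOLL_PIN:
        mm.has_pinned = 1


def demo():
    user_mm = MM()            # address space of the process owning the userptr
    kworker = Task(mm=None)   # worker thread doing the pinning
    try:
        gup_locked_buggy(kworker, user_mm, FOLL_PIN)
        crashed = False
    except AttributeError:
        crashed = True
    gup_locked_fixed(kworker, user_mm, FOLL_PIN)
    return crashed, user_mm.has_pinned
```

In this model, demo() reports that the buggy variant fails for the kworker
while the fixed variant sets has_pinned on the mm that was passed in.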


Re: REGRESSION: in intel video driver following introduction of mm_struct.has_pinned

2020-09-28 Thread Peter Xu
On Mon, Sep 28, 2020 at 02:14:16PM -0400, Tony Fischetti wrote:
> After a lengthy git bisection, I determined that the commit which
> introduced the change that ultimately caused a NULL pointer
> dereference oops (see below for relevant syslog entries) was
> 008cfe4418b3dbda2ff.. (mm: Introduce mm_struct.has_pinned).
> 
> The RIP (according to syslog) is in the function
> `__get_user_pages_remote`, and the last i915 function in the call
> trace is `__i915_gem_userptr_get_pages_worker`.
> More specifically, it appears to be the call to
> `pin_user_pages_remote` in `__i915_gem_userptr_get_pages_worker` in
> drivers/gpu/drm/i915/gem/i915_gem_userptr.c that directly leads to the
> oops.
> 
> Unfortunately, I don't know enough to try to fix and share the fix
> myself, but I hope the information I provided is helpful. Please let
> me know if there is any further information I can provide that might
> be of use.
> 
> BUG: kernel NULL pointer dereference, address: 0054
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 8 PID: 497 Comm: kworker/u25:0 Not tainted
> 5.9.0-rc7-alice-investigate-3+ #2
> Hardware name: LENOVO 10ST001QUS/312A, BIOS M1UKT4BA 11/11/2019
> Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915]
> RIP: 0010:__get_user_pages_remote+0xa0/0x2d0
> Code: 85 e7 01 00 00 83 3b 01 0f 85 e0 01 00 00 f7 c1 00 00 04 00 0f
> 84 12 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 58 03 00 00  40
> 54 01 00 00 00 c6 04 24 00 4d 8d 6f 68 48 c7 44 24 10 00 00
> RSP: 0018:a1a58086bde0 EFLAGS: 00010206
> RAX:  RBX: a1a58086be64 RCX: 00040001
> RDX: 07e9 RSI: 7f532f80 RDI: 92f22d89c480
> RBP: 7f532f80 R08: 92f23a188000 R09: 
> R10:  R11: a1a58086bcfd R12: 92f23a188000
> R13: 92f22d89c480 R14: 00042003 R15: 92f22d89c480
> FS:  () GS:92f23e40() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0054 CR3: 16c0a002 CR4: 001706e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  __i915_gem_userptr_get_pages_worker+0x1ec/0x392 [i915]
>  process_one_work+0x1c7/0x310
>  worker_thread+0x28/0x3c0
>  ? set_worker_desc+0xb0/0xb0
>  kthread+0x123/0x140
>  ? kthread_use_mm+0xe0/0xe0
>  ret_from_fork+0x1f/0x30
> Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio iwlmvm mac80211 libarc4
> x86_pkg_temp_thermal intel_powerclamp iwlwifi coretemp i915
> crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit
> ghash_clmulni_intel drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops cec mei_hdcp wmi_bmof snd_hda_intel drm tpm_crb
> snd_intel_dspcfg intel_wmi_thunderbolt snd_hda_codec snd_hwdep
> aesni_intel crypto_simd glue_helper snd_hda_core cfg80211 i2c_i801
> snd_pcm intel_cstate pcspkr snd_timer mei_me i2c_smbus mei i2c_core
> thermal wmi tpm_tis tpm_tis_core tpm rng_core acpi_pad ppdev lp
> ip_tables x_tables
> CR2: 0054
> ---[ end trace 8d080e8b96289c9e ]---
> 

Hi, Tony,

This is also reported elsewhere and the proper fix should be here:

https://lore.kernel.org/intel-gfx/20200928134915.GA5904@xz-x1

Thanks for the report, and sorry for the trouble!

-- 
Peter Xu



REGRESSION: in intel video driver following introduction of mm_struct.has_pinned

2020-09-28 Thread Tony Fischetti
After a lengthy git bisection, I determined that the commit which
introduced the change that ultimately caused a NULL pointer
dereference oops (see below for relevant syslog entries) was
008cfe4418b3dbda2ff.. (mm: Introduce mm_struct.has_pinned).

The RIP (according to syslog) is in the function
`__get_user_pages_remote`, and the last i915 function in the call
trace is `__i915_gem_userptr_get_pages_worker`.
More specifically, it appears to be the call to
`pin_user_pages_remote` in `__i915_gem_userptr_get_pages_worker` in
drivers/gpu/drm/i915/gem/i915_gem_userptr.c that directly leads to the
oops.

Unfortunately, I don't know enough to try to fix and share the fix
myself, but I hope the information I provided is helpful. Please let
me know if there is any further information I can provide that might
be of use.

BUG: kernel NULL pointer dereference, address: 0054
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 8 PID: 497 Comm: kworker/u25:0 Not tainted
5.9.0-rc7-alice-investigate-3+ #2
Hardware name: LENOVO 10ST001QUS/312A, BIOS M1UKT4BA 11/11/2019
Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915]
RIP: 0010:__get_user_pages_remote+0xa0/0x2d0
Code: 85 e7 01 00 00 83 3b 01 0f 85 e0 01 00 00 f7 c1 00 00 04 00 0f
84 12 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 58 03 00 00  40
54 01 00 00 00 c6 04 24 00 4d 8d 6f 68 48 c7 44 24 10 00 00
RSP: 0018:a1a58086bde0 EFLAGS: 00010206
RAX:  RBX: a1a58086be64 RCX: 00040001
RDX: 07e9 RSI: 7f532f80 RDI: 92f22d89c480
RBP: 7f532f80 R08: 92f23a188000 R09: 
R10:  R11: a1a58086bcfd R12: 92f23a188000
R13: 92f22d89c480 R14: 00042003 R15: 92f22d89c480
FS:  () GS:92f23e40() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0054 CR3: 16c0a002 CR4: 001706e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 __i915_gem_userptr_get_pages_worker+0x1ec/0x392 [i915]
 process_one_work+0x1c7/0x310
 worker_thread+0x28/0x3c0
 ? set_worker_desc+0xb0/0xb0
 kthread+0x123/0x140
 ? kthread_use_mm+0xe0/0xe0
 ret_from_fork+0x1f/0x30
Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio iwlmvm mac80211 libarc4
x86_pkg_temp_thermal intel_powerclamp iwlwifi coretemp i915
crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit
ghash_clmulni_intel drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops cec mei_hdcp wmi_bmof snd_hda_intel drm tpm_crb
snd_intel_dspcfg intel_wmi_thunderbolt snd_hda_codec snd_hwdep
aesni_intel crypto_simd glue_helper snd_hda_core cfg80211 i2c_i801
snd_pcm intel_cstate pcspkr snd_timer mei_me i2c_smbus mei i2c_core
thermal wmi tpm_tis tpm_tis_core tpm rng_core acpi_pad ppdev lp
ip_tables x_tables
CR2: 0054
---[ end trace 8d080e8b96289c9e ]---
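
As an aid to reading the "#PF" lines in the log above, the following is a
small sketch that decodes an x86 page-fault error code into the same kind
of description the kernel prints. The function names are made up for
illustration; the bit meanings are the architectural ones (bit 0 present,
bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction
fetch).

```python
def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page-fault error code into its low flag bits."""
    return {
        "present": bool(code & 0x01),            # 0 = not-present page
        "write": bool(code & 0x02),              # 0 = read access
        "user": bool(code & 0x04),               # 0 = supervisor mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


def describe_pf(code: int) -> str:
    """Render the flags roughly the way the oops text words them."""
    f = decode_pf_error_code(code)
    mode = "user" if f["user"] else "supervisor"
    if f["instruction_fetch"]:
        access = "instruction fetch"
    elif f["write"]:
        access = "write"
    else:
        access = "read"
    cause = "protection violation" if f["present"] else "not-present page"
    return f"{mode} {access} access, {cause}"
```

For the error code in this report, describe_pf(0x0002) gives "supervisor
write access, not-present page", which matches the two "#PF" lines. A
faulting address as small as the one in CR2 is what one would expect from
writing to a field at a small offset through a NULL structure pointer.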