Re: [Nouveau] [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50
[+cc Rafael]

On Wed, Mar 13, 2019 at 06:25:02PM -0400, Lyude Paul wrote:
> On Fri, 2019-02-15 at 16:17 -0500, Lyude Paul wrote:
> > On Thu, 2019-02-14 at 18:43 -0600, Bjorn Helgaas wrote:
> > > On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> > > > On a very specific subset of ThinkPad P50 SKUs, particularly
> > > > ones that come with a Quadro M1000M chip instead of the M2000M
> > > > variant, the BIOS seems to have a very nasty habit of not
> > > > always resetting the secondary Nvidia GPU between full reboots
> > > > if the laptop is configured in Hybrid Graphics mode. The
> > > > reason for this happening is unknown, but the following steps
> > > > and possibly a good bit of patience will reproduce the issue:
> > > >
> > > > 1. Boot up the laptop normally in Hybrid graphics mode
> > > > 2. Make sure nouveau is loaded and that the GPU is awake
> > > > 2. Allow the nvidia GPU to runtime suspend itself after being idle
> > > > 3. Reboot the machine, the more sudden the better (e.g sysrq-b may help)
> > > > 4. If nouveau loads up properly, reboot the machine again and go back to
> > > > step 2 until you reproduce the issue
> > > >
> > > > This results in some very strange behavior: the GPU will quite
> > > > literally be left in exactly the same state it was in when the
> > > > previously booted kernel started the reboot. This has all
> > > > sorts of bad sideaffects: for starters, this completely breaks
> > > > nouveau starting with a mysterious EVO channel failure that
> > > > happens well before we've actually used the EVO channel for
> > > > anything:

Thanks for the hybrid tutorial (snipped from this response). IIUC,
what you said was that in hybrid mode, the Intel GPU drives the
built-in display and the Nvidia GPU drives any external displays and
may be used for DRI PRIME rendering (whatever that is). But since you
say the Nvidia device gets runtime suspended, I assume there's no
external display here and you're not using DRI PRIME.

I wonder if it's related to the fact that the Nvidia GPU has been
runtime suspended before you do the reboot. Can you try turning off
runtime power management for the GPU by setting the runpm module
parameter to 0? I *think* this would be booting with
"nouveau.runpm=0".

> > > Is there a bug report for this? Bugzilla.kernel.org would be ideal,
> > > including "lspci -vvxxx" and dmidecode for the system.
> > >
> > Not yet, but there has been discussion about this between nouveau
> > developers on our IRC channel.
>
> I lied: yes there actually is a bug report for this, but it's
> currently on the Red Hat bugzilla. I can get more information from
> it if you need (with lenovo's approval of course).

Can you please make a bugzilla.kernel.org entry with as much
information (dmesg, "lspci -vvxxx", dmidecode, etc.) as you can get
approval for? You can include the Red Hat bugzilla URL in the commit
log, too, but that's not quite as good because we have no control over
whether it's public.

> And additionally: I've been working with Lenovo on this issue for a
> couple of months now, and we've gone through dozens of different
> trial BIOSes with no success thus far. However, Lenovo is currently
> working on trying to add this workaround into their BIOS but I've
> been told that this change is going to take a decent amount of time
> since they need to test it across multiple operating systems. I'd be
> happy to come back and add a conditional later to turn this
> workaround off for later BIOS versions once Lenovo has released a
> proper fix.

Sounds like Lenovo is going to a lot of trouble for this. The ideal
thing from my point of view would be if they could figure out why this
works on Windows but not on Linux. I doubt Windows has a quirk like
this, so if we could figure out why it works on Windows, we could
likely do something similar in Linux.

> > > > So to do this, we add a new pci quirk using
> > > > DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> > > > at boot finishes. From there, we check to make sure that this is indeed
> > > > the specific P50 variant of this GPU. We also make sure that the GPU PCI
> > > > device is advertising NoReset- in order to prevent us from trying to
> > > > reset the GPU when the machine is in Dedicated graphics mode (where the
> > > > GPU being initialized by the BIOS is normal and expected). Finally, we
> > > > try mapping the MMIO space for the GPU which should only work if the GPU
> > > > is actually active in D0 mode. We can then read the magic 0x2240c
> > > > register on the GPU, which will have bit 1 set if the GPU's firmware has
> > > > already been posted during a previous boot. Once we've confirmed all of
> > > > this, we reset the PCI device and re-disable it - bringing the GPU back
> > > > into a healthy state.
> > > >
> > > > Signed-off-by: Lyude Paul
> > > > Cc: nouveau@lists.freedesktop.org
> > > > Cc: dri-de...@lists.freedesktop.org
> > > > Cc: Karol Herbst
> > > > Cc: Ben
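
The chain of checks described in the quoted commit message can be modelled in plain user-space C. This is only a sketch of the decision logic, not the quirk itself: `struct gpu_boot_state`, its fields and `p50_gpu_needs_reset()` are hypothetical names invented for illustration, and "bit 1" is assumed to mean the zero-based bit with value 0x2. Only the register offset 0x2240c and the order of the checks come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register offset and bit as named in the commit message. */
#define P50_POSTED_REG 0x2240cu
/* Assumption: "bit 1" is zero-based, i.e. mask 0x2. */
#define P50_POSTED_BIT (1u << 1)

/* Hypothetical snapshot of what the quirk would observe at boot. */
struct gpu_boot_state {
	bool is_affected_p50;   /* device/subsystem IDs match the P50 variant */
	bool no_reset_flag;     /* false corresponds to "NoReset-" */
	bool mmio_mapped;       /* mapping MMIO worked, i.e. GPU is in D0 */
	uint32_t reg_2240c;     /* value read from P50_POSTED_REG */
};

/* Returns true only when every condition from the message holds. */
bool p50_gpu_needs_reset(const struct gpu_boot_state *s)
{
	assert(s != NULL);

	if (!s->is_affected_p50)
		return false;   /* not the specific P50 variant */
	if (s->no_reset_flag)
		return false;   /* Dedicated mode: BIOS init is expected */
	if (!s->mmio_mapped)
		return false;   /* GPU not active, nothing stale to clear */

	/* Firmware already posted during a previous boot? */
	return (s->reg_2240c & P50_POSTED_BIT) != 0;
}
```

In this model a stale GPU in Hybrid mode (`{ true, false, true, 0x2 }`) is the only combination that asks for a reset; the real quirk would then reset the PCI device and disable it again.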
[Nouveau] [PATCH] gpu/nouveau: empty chunk do not have a buffer object associated with them.
From: Jérôme Glisse

Empty chunks do not have a bo associated with them, so there is no need
to pin/unpin them on suspend/resume.

This fixes suspend/resume on 5.1-rc1 when NOUVEAU_SVM is enabled.

Signed-off-by: Jérôme Glisse
Reviewed-by: Tobias Klausmann
Tested-by: Tobias Klausmann
Cc: Ben Skeggs
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie
Cc: Daniel Vetter
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..a510dbe9a9cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -456,11 +456,6 @@ nouveau_dmem_resume(struct nouveau_drm *drm)
 		/* FIXME handle pin failure */
 		WARN_ON(ret);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		ret = nouveau_bo_pin(chunk->bo, TTM_PL_FLAG_VRAM, false);
-		/* FIXME handle pin failure */
-		WARN_ON(ret);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }

@@ -479,9 +474,6 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	list_for_each_entry (chunk, &drm->dmem->chunk_full, list) {
 		nouveau_bo_unpin(chunk->bo);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		nouveau_bo_unpin(chunk->bo);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }

--
2.17.1

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
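
The idea behind the patch can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver code: `struct model_bo`, `struct model_chunk`, `struct model_dmem` and `model_dmem_suspend()` are hypothetical stand-ins, plain singly linked lists replace the kernel's `list_head`, and only the `chunk_full` and `chunk_empty` lists named in the diff are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pinned buffer object. */
struct model_bo {
	int pin_count;
};

/* A chunk may or may not have a buffer object attached. */
struct model_chunk {
	struct model_bo *bo;        /* NULL for empty chunks */
	struct model_chunk *next;
};

struct model_dmem {
	struct model_chunk *chunk_full;   /* chunks backed by a bo */
	struct model_chunk *chunk_empty;  /* chunks without a bo */
};

/* Unpin every chunk on a list; returns how many were unpinned. */
static int model_unpin_list(struct model_chunk *head)
{
	int count = 0;

	for (struct model_chunk *c = head; c != NULL; c = c->next) {
		assert(c->bo != NULL);  /* would be the NULL deref in the report */
		c->bo->pin_count--;
		count++;
	}
	return count;
}

/*
 * Suspend as it looks after the patch: only lists whose chunks own a
 * buffer object are walked; chunk_empty is deliberately left alone.
 */
int model_dmem_suspend(struct model_dmem *dmem)
{
	if (dmem == NULL)
		return 0;
	return model_unpin_list(dmem->chunk_full);
}
```

Walking `chunk_empty` with the same helper would trip the assertion in this model, which mirrors why the loops over that list were dropped from both suspend and resume.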
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On Thu, Mar 21, 2019 at 08:30:28PM +0100, Tobias Klausmann wrote:
> On 21.03.19 18:12, Jerome Glisse wrote:
> > On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > > Hi,
> > >
> > > just for your information and maybe for some help: with 5.1rc1 and SVM
> > > enabled i see the following backtrace [1] when the nouveau card (reverse
> > > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > > with userspace hangs. Any pointers where to look for the actual culprit?
> > >
> > > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> > >
> > > Greetings,
> > >
> > > Tobias
> > Can you check if attached patch fix the issue ?
> >
> > Cheers,
> > Jérôme
> >
>
> Hi,
>
> the patch is fine, you can add my R-b & Tested-by!

Thank you for the quick testing! I will post the patch with your R-b.

>
> PS: yet i have another unrelated error keeping my card from beeing happy,
> thats now the next on my todo list:

For secure-boot-related issues, Ben would know this a lot better than I do :)

>
> [ 1102.004901] ------------[ cut here ]------------
> [ 1102.004902] nouveau :01:00.0: timeout
> [ 1102.004948] WARNING: CPU: 2 PID: 55 at
> drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183
> acr_ls_sec2_post_run+0x139/0x190 [nouveau]
> [ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl
> btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops
> mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common
> ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437
> i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi
> snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper
> crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm
> hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt
> aesni_intel
mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211 > crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt > glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr > sparse_keymap intel_lpss_pci intel_wmi_thunderbolt soundcore > [ 1102.004965] intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof > thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel > battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore > i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc > scsi_dh_alua efivarfs autofs4 > [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted > 5.1.0-rc1-desktop-debug+ #80 > [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 > 08/01/2018 > [ 1102.004976] Workqueue: pm pm_runtime_work > [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau] > [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 > 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f> > 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48 > [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296 > [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: > 0006 > [ 1102.005010] RDX: 0007 RSI: 0086 RDI: > 912f3ec963f0 > [ 1102.005010] RBP: R08: 03cb R09: > 0004 > [ 1102.005011] R10: R11: 0001 R12: > 912f330cc400 > [ 1102.005011] R13: 0040 R14: 912df09f0060 R15: > 912df09f80b0 > [ 1102.005012] FS: () GS:912f3ec8() > knlGS: > [ 1102.005012] CS: 0010 DS: ES: CR0: 80050033 > [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: > 003606e0 > [ 1102.005013] Call Trace: > [ 1102.005044] acr_r352_bootstrap+0x16e/0x1d0 [nouveau] > [ 1102.005073] acr_r352_reset+0x21/0x190 [nouveau] > [ 1102.005105] gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau] > [ 1102.005136] gf100_gr_init_ctxctl+0x19/0x270 [nouveau] > [ 1102.005167] ? 
gf100_gr_init+0x533/0x570 [nouveau] > [ 1102.005181] nvkm_engine_init+0xa2/0x120 [nouveau] > [ 1102.005196] nvkm_subdev_init+0x8d/0xc0 [nouveau] > [ 1102.005226] nvkm_device_init+0x107/0x190 [nouveau] > [ 1102.005255] nvkm_udevice_init+0x3c/0x60 [nouveau] > [ 1102.005269] nvkm_object_init+0x39/0x100 [nouveau] > [ 1102.005284] nvkm_object_init+0x6c/0x100 [nouveau] > [ 1102.005299] nvkm_object_init+0x6c/0x100 [nouveau] > [ 1102.005328] nouveau_do_resume+0x23/0xb0 [nouveau] > [ 1102.005357] nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau] > [ 1102.005360] ? pci_restore_standard_config+0x40/0x40 > [ 1102.005361] pci_pm_runtime_resume+0x6f/0xc0 > [ 1102.005362] ? pci_restore_standard_config+0x40/0x40 > [ 1102.005363] __rpm_callback+0x76/0x120 > [ 1102.005365] ?
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On 21.03.19 20:30, Tobias Klausmann wrote:
> On 21.03.19 18:12, Jerome Glisse wrote:
> > On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > > Hi,
> > >
> > > just for your information and maybe for some help: with 5.1rc1 and SVM
> > > enabled i see the following backtrace [1] when the nouveau card (reverse
> > > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > > with userspace hangs. Any pointers where to look for the actual culprit?
> > >
> > > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> > >
> > > Greetings,
> > >
> > > Tobias
> > Can you check if attached patch fix the issue ?
> >
> > Cheers,
> > Jérôme
>
> Hi,
>
> the patch is fine, you can add my R-b & Tested-by!

Of course I tested the second patch you sent out, not the first one!

> PS: yet i have another unrelated error keeping my card from beeing happy,
> thats now the next on my todo list:
>
> [ 1102.004901] ------------[ cut here ]------------
> [ 1102.004902] nouveau :01:00.0: timeout
> [ 1102.004948] WARNING: CPU: 2 PID: 55 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 acr_ls_sec2_post_run+0x139/0x190 [nouveau]
> [ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci
intel_wmi_thunderbolt soundcore [ 1102.004965] intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4 [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 5.1.0-rc1-desktop-debug+ #80 [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 [ 1102.004976] Workqueue: pm pm_runtime_work [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48 [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296 [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 0006 [ 1102.005010] RDX: 0007 RSI: 0086 RDI: 912f3ec963f0 [ 1102.005010] RBP: R08: 03cb R09: 0004 [ 1102.005011] R10: R11: 0001 R12: 912f330cc400 [ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 912df09f80b0 [ 1102.005012] FS: () GS:912f3ec8() knlGS: [ 1102.005012] CS: 0010 DS: ES: CR0: 80050033 [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 003606e0 [ 1102.005013] Call Trace: [ 1102.005044] acr_r352_bootstrap+0x16e/0x1d0 [nouveau] [ 1102.005073] acr_r352_reset+0x21/0x190 [nouveau] [ 1102.005105] gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau] [ 1102.005136] gf100_gr_init_ctxctl+0x19/0x270 [nouveau] [ 1102.005167] ? 
gf100_gr_init+0x533/0x570 [nouveau] [ 1102.005181] nvkm_engine_init+0xa2/0x120 [nouveau] [ 1102.005196] nvkm_subdev_init+0x8d/0xc0 [nouveau] [ 1102.005226] nvkm_device_init+0x107/0x190 [nouveau] [ 1102.005255] nvkm_udevice_init+0x3c/0x60 [nouveau] [ 1102.005269] nvkm_object_init+0x39/0x100 [nouveau] [ 1102.005284] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005299] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005328] nouveau_do_resume+0x23/0xb0 [nouveau] [ 1102.005357] nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau] [ 1102.005360] ? pci_restore_standard_config+0x40/0x40 [ 1102.005361] pci_pm_runtime_resume+0x6f/0xc0 [ 1102.005362] ? pci_restore_standard_config+0x40/0x40 [ 1102.005363] __rpm_callback+0x76/0x120 [ 1102.005365] ? pci_restore_standard_config+0x40/0x40 [ 1102.005366] rpm_callback+0x1a/0x70 [ 1102.005367] ? pci_restore_standard_config+0x40/0x40 [ 1102.005368] rpm_resume+0x3f5/0x5f0 [ 1102.005369] pm_runtime_work+0x4e/0xa0 [ 1102.005370] process_one_work+0x1d4/0x360 [ 1102.005372] worker_thread+0x28/0x3c0 [ 1102.005372] ?
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On 21.03.19 18:12, Jerome Glisse wrote:
> On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > Hi,
> >
> > just for your information and maybe for some help: with 5.1rc1 and SVM
> > enabled i see the following backtrace [1] when the nouveau card (reverse
> > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > with userspace hangs. Any pointers where to look for the actual culprit?
> >
> > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> >
> > Greetings,
> >
> > Tobias
> Can you check if attached patch fix the issue ?
>
> Cheers,
> Jérôme

Hi,

the patch is fine, you can add my R-b & Tested-by!

PS: I have yet another, unrelated error keeping my card from being happy;
that's now next on my todo list:

[ 1102.004901] ------------[ cut here ]------------
[ 1102.004902] nouveau :01:00.0: timeout
[ 1102.004948] WARNING: CPU: 2 PID: 55 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci intel_wmi_thunderbolt soundcore
[ 1102.004965] intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof thermal tpm_crb
tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4 [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 5.1.0-rc1-desktop-debug+ #80 [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 [ 1102.004976] Workqueue: pm pm_runtime_work [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau] [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48 [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296 [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 0006 [ 1102.005010] RDX: 0007 RSI: 0086 RDI: 912f3ec963f0 [ 1102.005010] RBP: R08: 03cb R09: 0004 [ 1102.005011] R10: R11: 0001 R12: 912f330cc400 [ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 912df09f80b0 [ 1102.005012] FS: () GS:912f3ec8() knlGS: [ 1102.005012] CS: 0010 DS: ES: CR0: 80050033 [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 003606e0 [ 1102.005013] Call Trace: [ 1102.005044] acr_r352_bootstrap+0x16e/0x1d0 [nouveau] [ 1102.005073] acr_r352_reset+0x21/0x190 [nouveau] [ 1102.005105] gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau] [ 1102.005136] gf100_gr_init_ctxctl+0x19/0x270 [nouveau] [ 1102.005167] ? 
gf100_gr_init+0x533/0x570 [nouveau] [ 1102.005181] nvkm_engine_init+0xa2/0x120 [nouveau] [ 1102.005196] nvkm_subdev_init+0x8d/0xc0 [nouveau] [ 1102.005226] nvkm_device_init+0x107/0x190 [nouveau] [ 1102.005255] nvkm_udevice_init+0x3c/0x60 [nouveau] [ 1102.005269] nvkm_object_init+0x39/0x100 [nouveau] [ 1102.005284] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005299] nvkm_object_init+0x6c/0x100 [nouveau] [ 1102.005328] nouveau_do_resume+0x23/0xb0 [nouveau] [ 1102.005357] nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau] [ 1102.005360] ? pci_restore_standard_config+0x40/0x40 [ 1102.005361] pci_pm_runtime_resume+0x6f/0xc0 [ 1102.005362] ? pci_restore_standard_config+0x40/0x40 [ 1102.005363] __rpm_callback+0x76/0x120 [ 1102.005365] ? pci_restore_standard_config+0x40/0x40 [ 1102.005366] rpm_callback+0x1a/0x70 [ 1102.005367] ? pci_restore_standard_config+0x40/0x40 [ 1102.005368] rpm_resume+0x3f5/0x5f0 [ 1102.005369] pm_runtime_work+0x4e/0xa0 [ 1102.005370] process_one_work+0x1d4/0x360 [ 1102.005372] worker_thread+0x28/0x3c0 [ 1102.005372] ? process_one_work+0x360/0x360 [ 1102.005374] kthread+0x10d/0x130 [ 1102.005375] ? kthread_create_worker_on_cpu+0x40/0x40 [ 1102.005377]
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On Thu, Mar 21, 2019 at 01:12:07PM -0400, Jerome Glisse wrote:
> On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > Hi,
> >
> > just for your information and maybe for some help: with 5.1rc1 and SVM
> > enabled i see the following backtrace [1] when the nouveau card (reverse
> > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > with userspace hangs. Any pointers where to look for the actual culprit?
> >
> > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> >
> > Greetings,
> >
> > Tobias
>
> Can you check if attached patch fix the issue ?

Sorry, I sent a bogus patch; here is a good one ...

Cheers,
Jérôme

From 5b413953ba7abd3f92f46ef8261cd64368f0ae84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?=
Date: Thu, 21 Mar 2019 13:08:46 -0400
Subject: [PATCH] gpu/nouveau: empty chunk do not have a buffer object
 associated with them.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Empty chunks do not have a bo associated with them, so there is no need
to pin/unpin them on suspend/resume.

Signed-off-by: Jérôme Glisse
Cc: Ben Skeggs
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie
Cc: Daniel Vetter
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..a510dbe9a9cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -456,11 +456,6 @@ nouveau_dmem_resume(struct nouveau_drm *drm)
 		/* FIXME handle pin failure */
 		WARN_ON(ret);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		ret = nouveau_bo_pin(chunk->bo, TTM_PL_FLAG_VRAM, false);
-		/* FIXME handle pin failure */
-		WARN_ON(ret);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }

@@ -479,9 +474,6 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	list_for_each_entry (chunk, &drm->dmem->chunk_full, list) {
 		nouveau_bo_unpin(chunk->bo);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		nouveau_bo_unpin(chunk->bo);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }

--
2.17.1
Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)
On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> Hi,
>
> just for your information and maybe for some help: with 5.1rc1 and SVM
> enabled i see the following backtrace [1] when the nouveau card (reverse
> prime) goes to sleep, for now i have papered over with [2] which leaves me
> with userspace hangs. Any pointers where to look for the actual culprit?
>
> PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
>
> Greetings,
>
> Tobias

Can you check whether the attached patch fixes the issue?

Cheers,
Jérôme

From 0304725edbaa3b828598a3babb785e6b9555af0b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?=
Date: Thu, 21 Mar 2019 13:08:46 -0400
Subject: [PATCH] gpu/nouveau: initialize some fields of dmem no matter what
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On GPUs that do not support device memory we left the dmem fields
uninitialized, and this leads to trouble in suspend/resume, which tries
to use those fields. It seems best to initialize those fields no matter
what.

Signed-off-by: Jérôme Glisse
Cc: Ben Skeggs
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie
Cc: Daniel Vetter
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..35b6e83ead8a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -593,6 +593,11 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 	unsigned long i, size;
 	int ret;

+	mutex_init(&drm->dmem->mutex);
+	INIT_LIST_HEAD(&drm->dmem->chunk_free);
+	INIT_LIST_HEAD(&drm->dmem->chunk_full);
+	INIT_LIST_HEAD(&drm->dmem->chunk_empty);
+
 	/* This only make sense on PASCAL or newer */
 	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_PASCAL)
 		return;
@@ -600,11 +605,6 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 	if (!(drm->dmem = kzalloc(sizeof(*drm->dmem), GFP_KERNEL)))
 		return;

-	mutex_init(&drm->dmem->mutex);
-	INIT_LIST_HEAD(&drm->dmem->chunk_free);
-	INIT_LIST_HEAD(&drm->dmem->chunk_full);
-	INIT_LIST_HEAD(&drm->dmem->chunk_empty);
-
 	size = ALIGN(drm->client.device.info.ram_user, DMEM_CHUNK_SIZE);

 	/* Initialize migration dma helpers before registering memory */
--
2.17.1
[Nouveau] Nouveau dmem NULL Pointer deref (SVM)
Hi,

just for your information and maybe for some help: with 5.1-rc1 and SVM
enabled I see the following backtrace [1] when the nouveau card (reverse
prime) goes to sleep. For now I have papered over it with [2], which
leaves me with userspace hangs. Any pointers on where to look for the
actual culprit?

PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)

Greetings,

Tobias

[1]:

BUG: unable to handle kernel NULL pointer dereference at 0028
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: [#1] PREEMPT SMP PTI
CPU: 3 PID: 435 Comm: kworker/3:4 Not tainted 5.1.0-rc1-desktop-debug+ #80
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
Workqueue: pm pm_runtime_work
RIP: 0010:nouveau_bo_unpin (linux/./include/linux/compiler.h:193 linux/./arch/x86/include/asm/atomic.h:31 linux/./include/asm-generic/atomic-instrumented.h:27 linux/./include/linux/refcount.h:43 linux/./include/linux/kref.h:38 linux/./include/drm/ttm/ttm_bo_driver.h:721 linux/drivers/gpu/drm/nouveau/nouveau_bo.c:454) nouveau
Code: 89 d9 48 c7 c6 50 04 e5 c0 c4 42 79 f7 c0 bd f0 ff ff ff e8 42 d5 7a c6 ff 83 00 04 00 00 e9 17 ff ff ff 41 54 55 53 48 89 fb <8b> 47 28 85 c0 0f 84 cf 00 00 00 48 8b bb c0 01 00 00 31 f6 4c 8b
All code
========
   0:   89 d9                   mov    %ebx,%ecx
   2:   48 c7 c6 50 04 e5 c0    mov    $0xc0e50450,%rsi
   9:   c4 42 79 f7 c0          shlx   %eax,%r8d,%r8d
   e:   bd f0 ff ff ff          mov    $0xfff0,%ebp
  13:   e8 42 d5 7a c6          callq  0xc67ad55a
  18:   ff 83 00 04 00 00       incl   0x400(%rbx)
  1e:   e9 17 ff ff ff          jmpq   0xff3a
  23:   41 54                   push   %r12
  25:   55                      push   %rbp
  26:   53                      push   %rbx
  27:   48 89 fb                mov    %rdi,%rbx
  2a:*  8b 47 28                mov    0x28(%rdi),%eax    <-- trapping instruction
  2d:   85 c0                   test   %eax,%eax
  2f:   0f 84 cf 00 00 00       je     0x104
  35:   48 8b bb c0 01 00 00    mov    0x1c0(%rbx),%rdi
  3c:   31 f6                   xor    %esi,%esi
  3e:   4c                      rex.WR
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   8b 47 28                mov    0x28(%rdi),%eax
   3:   85 c0                   test   %eax,%eax
   5:   0f 84 cf 00 00 00       je     0xda
   b:   48 8b bb c0 01 00 00    mov    0x1c0(%rbx),%rdi
  12:   31 f6                   xor    %esi,%esi
  14:   4c                      rex.WR
  15:   8b                      .byte 0x8b
RSP: 0018:bf0b41237d20 EFLAGS: 00010216
RAX: 9dfe0ba2ec00 RBX: RCX: c0ceb630
RDX: 9dfe0ba2ec38 RSI: 7fff RDI:
RBP: 9dfe0a07e000 R08: R09: c0d4a9a0
R10: 8080808080808080 R11: 1800 R12: 0001
R13: R14: R15: 0008
FS: () GS:9dfe3ecc() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0028 CR3: 0001a500e002 CR4: 003606e0
Call Trace:
 nouveau_dmem_suspend (linux/drivers/gpu/drm/nouveau/nouveau_dmem.c:482 (discriminator 9)) nouveau
 nouveau_do_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:748) nouveau
 nouveau_pmops_runtime_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:915) nouveau
 pci_pm_runtime_suspend (linux/drivers/pci/pci-driver.c:1262)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
 __rpm_callback (linux/drivers/base/power/runtime.c:357)
 ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
 rpm_callback (linux/drivers/base/power/runtime.c:490)
 ? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
 rpm_suspend (linux/drivers/base/power/runtime.c:629)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 ? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
 pm_runtime_work (linux/drivers/base/power/runtime.c:922)
 process_one_work (linux/./arch/x86/include/asm/preempt.h:26 linux/kernel/workqueue.c:2278)
 worker_thread (linux/./include/linux/compiler.h:193 linux/./include/linux/list.h:237 linux/kernel/workqueue.c:2416)
 ? process_one_work (linux/kernel/workqueue.c:2358)
 kthread (linux/kernel/kthread.c:253)
 ? kthread_create_worker_on_cpu (linux/kernel/kthread.c:213)
 ret_from_fork (linux/arch/x86/entry/entry_64.S:358)
Modules linked in: rfcomm af_packet snd_hda_codec_hdmi bnep uvcvideo videobuf2_vmalloc rtsx_usb_sdmmc videobuf2_memops btusb rtsx_usb_ms videobuf2_v4l2 btrtl mmc_core memstick btbcm videodev btintel videobuf2_common rtsx_usb bluetooth usbhid
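
The fault address in the report lines up with the trapping instruction `mov 0x28(%rdi),%eax`: a load from offset 0x28 of whatever `%rdi` points at. The sketch below shows the arithmetic in user-space C. `struct model_obj` is a hypothetical layout padded so that one field sits at offset 0x28; it is not the real driver structure. The base is assumed to be NULL, which matches the "NULL pointer dereference" wording even though the RDI value itself is blank in the archived register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding places `field` at offset 0x28. */
struct model_obj {
	unsigned char pad[0x28];
	uint32_t field;
};

/*
 * Address that a load of `field` would touch for a given object base.
 * With base == 0 (a NULL object pointer) this is the field offset itself.
 */
uintptr_t model_fault_address(uintptr_t base)
{
	return base + offsetof(struct model_obj, field);
}
```

Under these assumptions `model_fault_address(0)` yields 0x28, the same value the report shows as the faulting address and in CR2, which is the usual signature of reading a member through a NULL structure pointer.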