Re: [Nouveau] [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50

2019-03-21 Thread Bjorn Helgaas
[+cc Rafael]

On Wed, Mar 13, 2019 at 06:25:02PM -0400, Lyude Paul wrote:
> On Fri, 2019-02-15 at 16:17 -0500, Lyude Paul wrote:
> > On Thu, 2019-02-14 at 18:43 -0600, Bjorn Helgaas wrote:
> > > On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> > > > On a very specific subset of ThinkPad P50 SKUs, particularly
> > > > ones that come with a Quadro M1000M chip instead of the M2000M
> > > > variant, the BIOS seems to have a very nasty habit of not
> > > > always resetting the secondary Nvidia GPU between full reboots
> > > > if the laptop is configured in Hybrid Graphics mode. The
> > > > reason for this happening is unknown, but the following steps
> > > > and possibly a good bit of patience will reproduce the issue:
> > > > 
> > > > 1. Boot up the laptop normally in Hybrid graphics mode
> > > > 2. Make sure nouveau is loaded and that the GPU is awake
> > > > 3. Allow the nvidia GPU to runtime suspend itself after being idle
> > > > 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
> > > > 5. If nouveau loads up properly, reboot the machine again and go back to
> > > > step 2 until you reproduce the issue
> > > > 
> > > > This results in some very strange behavior: the GPU will quite
> > > > literally be left in exactly the same state it was in when the
> > > > previously booted kernel started the reboot. This has all
> > > > sorts of bad side effects: for starters, this completely breaks
> > > > nouveau starting with a mysterious EVO channel failure that
> > > > happens well before we've actually used the EVO channel for
> > > > anything:

Thanks for the hybrid tutorial (snipped from this response).  IIUC,
what you said was that in hybrid mode, the Intel GPU drives the
built-in display and the Nvidia GPU drives any external displays and
may be used for DRI PRIME rendering (whatever that is).  But since you
say the Nvidia device gets runtime suspended, I assume there's no
external display here and you're not using DRI PRIME.

I wonder if it's related to the fact that the Nvidia GPU has been
runtime suspended before you do the reboot.  Can you try turning off
runtime power management for the GPU by setting the runpm module
parameter to 0?  I *think* this would be booting with
"nouveau.runpm=0".

> > > Is there a bug report for this?  Bugzilla.kernel.org would be ideal,
> > > including "lspci -vvxxx" and dmidecode for the system.
> > > 
> > Not yet, but there has been discussion about this between nouveau
> > developers on our IRC channel.
>
> I lied: yes there actually is a bug report for this, but it's
> currently on the Red Hat bugzilla. I can get more information from
> it if you need (with lenovo's approval of course).

Can you please make a bugzilla.kernel.org entry with as much
information (dmesg, "lspci -vvxxx", dmidecode, etc) as you can get
approval for?  You can include the Red Hat bugzilla URL in the commit
log, too, but that's not quite as good because we have no control over
whether it's public.

> And additionally: I've been working with Lenovo on this issue for a
> couple of months now, and we've gone through dozens of different
> trial BIOSes with no success thus far. However, Lenovo is currently
> working on trying to add this workaround into their BIOS but I've
> been told that this change is going to take a decent amount of time
> since they need to test it across multiple operating systems. I'd be
> happy to come back and add a conditional later to turn this
> workaround off for later BIOS versions once Lenovo has released a
> proper fix.

Sounds like Lenovo is going to a lot of trouble for this.  The ideal
thing from my point of view would be if they could figure out why this
works on Windows but not on Linux.  I doubt Windows has a quirk like
this, so if we could figure out why it works on Windows, we could
likely do something similar in Linux.

> > > > So to do this, we add a new pci quirk using
> > > > DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> > > > at boot finishes. From there, we check to make sure that this is indeed
> > > > the specific P50 variant of this GPU. We also make sure that the GPU PCI
> > > > device is advertising NoReset- in order to prevent us from trying to
> > > > reset the GPU when the machine is in Dedicated graphics mode (where the
> > > > GPU being initialized by the BIOS is normal and expected). Finally, we
> > > > try mapping the MMIO space for the GPU which should only work if the GPU
> > > > is actually active in D0 mode. We can then read the magic 0x2240c
> > > > register on the GPU, which will have bit 1 set if the GPU's firmware has
> > > > already been posted during a previous boot. Once we've confirmed all of
> > > > this, we reset the PCI device and re-disable it - bringing the GPU back
> > > > into a healthy state.
> > > > 
> > > > Signed-off-by: Lyude Paul 
> > > > Cc: nouveau@lists.freedesktop.org
> > > > Cc: dri-de...@lists.freedesktop.org
> > > > Cc: Karol Herbst 
> > > > Cc: Ben 

[Nouveau] [PATCH] gpu/nouveau: empty chunk do not have a buffer object associated with them.

2019-03-21 Thread jglisse
From: Jérôme Glisse 

Empty chunks do not have a bo associated with them, so there is no need to
pin/unpin them on suspend/resume.

This fixes suspend/resume on 5.1-rc1 when NOUVEAU_SVM is enabled.

Signed-off-by: Jérôme Glisse 
Reviewed-by: Tobias Klausmann 
Tested-by: Tobias Klausmann 
Cc: Ben Skeggs 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..a510dbe9a9cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -456,11 +456,6 @@ nouveau_dmem_resume(struct nouveau_drm *drm)
 		/* FIXME handle pin failure */
 		WARN_ON(ret);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		ret = nouveau_bo_pin(chunk->bo, TTM_PL_FLAG_VRAM, false);
-		/* FIXME handle pin failure */
-		WARN_ON(ret);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }
 
@@ -479,9 +474,6 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	list_for_each_entry (chunk, &drm->dmem->chunk_full, list) {
 		nouveau_bo_unpin(chunk->bo);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		nouveau_bo_unpin(chunk->bo);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }
 
-- 
2.17.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
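
A rough user-space model of the commit message's reasoning, assuming (as the message states) that chunks on the empty list carry no buffer object while chunks on the other lists do. The types model_bo and model_chunk and the functions below are hypothetical stand-ins, not the nouveau structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pinned buffer object. */
struct model_bo {
	int pin_count;
};

/* Hypothetical stand-in for a dmem chunk on a singly linked list. */
struct model_chunk {
	struct model_bo *bo;      /* NULL for an empty chunk in this model */
	struct model_chunk *next;
};

/* Unpin every chunk on a list and return how many objects were touched. */
static int model_unpin_list(struct model_chunk *head)
{
	int touched = 0;

	for (struct model_chunk *c = head; c != NULL; c = c->next) {
		/* Chunks on the lists walked here are assumed to own a bo. */
		assert(c->bo != NULL);
		c->bo->pin_count--;
		touched++;
	}
	return touched;
}

/* Suspend in the spirit of the patch: the empty list is never walked. */
static int model_suspend(struct model_chunk *free_list,
			 struct model_chunk *full_list)
{
	return model_unpin_list(free_list) + model_unpin_list(full_list);
}
```

In this model, passing the empty list to model_unpin_list() would trip the assertion, which mirrors why the patch drops those loops instead of adding a NULL check inside them.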

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Jerome Glisse
On Thu, Mar 21, 2019 at 08:30:28PM +0100, Tobias Klausmann wrote:
> On 21.03.19 18:12, Jerome Glisse wrote:
> > On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > > Hi,
> > > 
> > > just for your information and maybe for some help: with 5.1rc1 and SVM
> > > enabled i see the following backtrace [1] when the nouveau card (reverse
> > > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > > with userspace hangs. Any pointers where to look for the actual culprit?
> > > 
> > > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> > > 
> > > Greetings,
> > > 
> > > Tobias
> > Can you check if attached patch fix the issue ?
> > 
> > Cheers,
> > Jérôme
> > 
> 
> Hi,
> 
> the patch is fine, you can add my R-b & Tested-by!

Thank you for the quick testing ! I will post the patch with your rb.

> 
> PS: yet i have another unrelated error keeping my card from being happy,
> that's now the next on my todo list:

For secure boot related issues Ben would know this a lot better than I do :)

> 
> [ 1102.004901] ------------[ cut here ]------------
> [ 1102.004902] nouveau :01:00.0: timeout
> [ 1102.004948] WARNING: CPU: 2 PID: 55 at
> drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183
> acr_ls_sec2_post_run+0x139/0x190 [nouveau]
> [ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo btrtl
> btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms videobuf2_memops
> mmc_core bluetooth memstick videobuf2_v4l2 videodev videobuf2_common
> ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio nouveau arc4 nls_iso8859_1 nls_cp437
> i915 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel ath10k_pci msr kvm ath10k_core snd_hda_intel irqbypass ath mxm_wmi
> snd_hda_codec ttm joydev mac80211 snd_hda_core drm_kms_helper
> crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm crc32c_intel drm
> hid_multitouch ghash_clmulni_intel snd_timer hid_generic iTCO_wdt
> aesni_intel mei_hdcp iTCO_vendor_support snd aes_x86_64 fb_sys_fops cfg80211
> crypto_simd acerfan syscopyarea r8169 sysfillrect cryptd sysimgblt
> glue_helper realtek idma64 acer_wmi i2c_algo_bit mei_me libphy pcspkr
> sparse_keymap intel_lpss_pci intel_wmi_thunderbolt soundcore
> [ 1102.004965]  intel_pch_thermal mei i2c_i801 intel_lpss rfkill wmi_bmof
> thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac pinctrl_intel
> battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd serio_raw usbcore
> i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua efivarfs autofs4
> [ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted
> 5.1.0-rc1-desktop-debug+ #80
> [ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11
> 08/01/2018
> [ 1102.004976] Workqueue: pm pm_runtime_work
> [ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau]
> [ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74
> 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 dd <0f>
> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48
> [ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296
> [ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX:
> 0006
> [ 1102.005010] RDX: 0007 RSI: 0086 RDI:
> 912f3ec963f0
> [ 1102.005010] RBP:  R08: 03cb R09:
> 0004
> [ 1102.005011] R10:  R11: 0001 R12:
> 912f330cc400
> [ 1102.005011] R13: 0040 R14: 912df09f0060 R15:
> 912df09f80b0
> [ 1102.005012] FS:  () GS:912f3ec8()
> knlGS:
> [ 1102.005012] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4:
> 003606e0
> [ 1102.005013] Call Trace:
> [ 1102.005044]  acr_r352_bootstrap+0x16e/0x1d0 [nouveau]
> [ 1102.005073]  acr_r352_reset+0x21/0x190 [nouveau]
> [ 1102.005105]  gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau]
> [ 1102.005136]  gf100_gr_init_ctxctl+0x19/0x270 [nouveau]
> [ 1102.005167]  ? gf100_gr_init+0x533/0x570 [nouveau]
> [ 1102.005181]  nvkm_engine_init+0xa2/0x120 [nouveau]
> [ 1102.005196]  nvkm_subdev_init+0x8d/0xc0 [nouveau]
> [ 1102.005226]  nvkm_device_init+0x107/0x190 [nouveau]
> [ 1102.005255]  nvkm_udevice_init+0x3c/0x60 [nouveau]
> [ 1102.005269]  nvkm_object_init+0x39/0x100 [nouveau]
> [ 1102.005284]  nvkm_object_init+0x6c/0x100 [nouveau]
> [ 1102.005299]  nvkm_object_init+0x6c/0x100 [nouveau]
> [ 1102.005328]  nouveau_do_resume+0x23/0xb0 [nouveau]
> [ 1102.005357]  nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau]
> [ 1102.005360]  ? pci_restore_standard_config+0x40/0x40
> [ 1102.005361]  pci_pm_runtime_resume+0x6f/0xc0
> [ 1102.005362]  ? pci_restore_standard_config+0x40/0x40
> [ 1102.005363]  __rpm_callback+0x76/0x120
> [ 1102.005365]  ? 

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann


On 21.03.19 20:30, Tobias Klausmann wrote:
> On 21.03.19 18:12, Jerome Glisse wrote:
> > On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > > Hi,
> > >
> > > just for your information and maybe for some help: with 5.1rc1 and SVM
> > > enabled i see the following backtrace [1] when the nouveau card (reverse
> > > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > > with userspace hangs. Any pointers where to look for the actual culprit?
> > >
> > > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> > >
> > > Greetings,
> > >
> > > Tobias
> >
> > Can you check if attached patch fix the issue ?
> >
> > Cheers,
> > Jérôme
>
> Hi,
>
> the patch is fine, you can add my R-b & Tested-by!


Of course i tested the second patch you sent out, not the first one!




PS: yet i have another unrelated error keeping my card from being
happy, that's now the next on my todo list:


[ 1102.004901] ------------[ cut here ]------------
[ 1102.004902] nouveau :01:00.0: timeout
[ 1102.004948] WARNING: CPU: 2 PID: 55 at 
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 
acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo 
btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms 
videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev 
videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 
nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core 
snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 
snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul 
snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer 
hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd 
aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 
sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi 
i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci 
intel_wmi_thunderbolt soundcore
[ 1102.004965]  intel_pch_thermal mei i2c_i801 intel_lpss rfkill 
wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac 
pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci 
xhci_hcd serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs autofs4
[ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 
5.1.0-rc1-desktop-debug+ #80
[ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS 
V1.11 08/01/2018

[ 1102.004976] Workqueue: pm pm_runtime_work
[ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 
f6 74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c 
b6 20 dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 
10 48

[ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296
[ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 
0006
[ 1102.005010] RDX: 0007 RSI: 0086 RDI: 
912f3ec963f0
[ 1102.005010] RBP:  R08: 03cb R09: 
0004
[ 1102.005011] R10:  R11: 0001 R12: 
912f330cc400
[ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 
912df09f80b0
[ 1102.005012] FS:  () GS:912f3ec8() 
knlGS:

[ 1102.005012] CS:  0010 DS:  ES:  CR0: 80050033
[ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 
003606e0

[ 1102.005013] Call Trace:
[ 1102.005044]  acr_r352_bootstrap+0x16e/0x1d0 [nouveau]
[ 1102.005073]  acr_r352_reset+0x21/0x190 [nouveau]
[ 1102.005105]  gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau]
[ 1102.005136]  gf100_gr_init_ctxctl+0x19/0x270 [nouveau]
[ 1102.005167]  ? gf100_gr_init+0x533/0x570 [nouveau]
[ 1102.005181]  nvkm_engine_init+0xa2/0x120 [nouveau]
[ 1102.005196]  nvkm_subdev_init+0x8d/0xc0 [nouveau]
[ 1102.005226]  nvkm_device_init+0x107/0x190 [nouveau]
[ 1102.005255]  nvkm_udevice_init+0x3c/0x60 [nouveau]
[ 1102.005269]  nvkm_object_init+0x39/0x100 [nouveau]
[ 1102.005284]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005299]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005328]  nouveau_do_resume+0x23/0xb0 [nouveau]
[ 1102.005357]  nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau]
[ 1102.005360]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005361]  pci_pm_runtime_resume+0x6f/0xc0
[ 1102.005362]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005363]  __rpm_callback+0x76/0x120
[ 1102.005365]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005366]  rpm_callback+0x1a/0x70
[ 1102.005367]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005368]  rpm_resume+0x3f5/0x5f0
[ 1102.005369]  pm_runtime_work+0x4e/0xa0
[ 1102.005370]  process_one_work+0x1d4/0x360
[ 1102.005372]  worker_thread+0x28/0x3c0
[ 1102.005372]  ? 

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann

On 21.03.19 18:12, Jerome Glisse wrote:
> On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > Hi,
> >
> > just for your information and maybe for some help: with 5.1rc1 and SVM
> > enabled i see the following backtrace [1] when the nouveau card (reverse
> > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > with userspace hangs. Any pointers where to look for the actual culprit?
> >
> > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> >
> > Greetings,
> >
> > Tobias
>
> Can you check if attached patch fix the issue ?
>
> Cheers,
> Jérôme

Hi,

the patch is fine, you can add my R-b & Tested-by!

PS: yet i have another unrelated error keeping my card from being
happy, that's now the next on my todo list:


[ 1102.004901] ------------[ cut here ]------------
[ 1102.004902] nouveau :01:00.0: timeout
[ 1102.004948] WARNING: CPU: 2 PID: 55 at 
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:183 
acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.004949] Modules linked in: rfcomm af_packet bnep btusb uvcvideo 
btrtl btbcm rtsx_usb_sdmmc btintel videobuf2_vmalloc rtsx_usb_ms 
videobuf2_memops mmc_core bluetooth memstick videobuf2_v4l2 videodev 
videobuf2_common ecdh_generic rtsx_usb snd_hda_codec_hdmi usbhid 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau arc4 
nls_iso8859_1 nls_cp437 i915 vfat fat intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel ath10k_pci msr kvm ath10k_core 
snd_hda_intel irqbypass ath mxm_wmi snd_hda_codec ttm joydev mac80211 
snd_hda_core drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul 
snd_pcm crc32c_intel drm hid_multitouch ghash_clmulni_intel snd_timer 
hid_generic iTCO_wdt aesni_intel mei_hdcp iTCO_vendor_support snd 
aes_x86_64 fb_sys_fops cfg80211 crypto_simd acerfan syscopyarea r8169 
sysfillrect cryptd sysimgblt glue_helper realtek idma64 acer_wmi 
i2c_algo_bit mei_me libphy pcspkr sparse_keymap intel_lpss_pci 
intel_wmi_thunderbolt soundcore
[ 1102.004965]  intel_pch_thermal mei i2c_i801 intel_lpss rfkill 
wmi_bmof thermal tpm_crb tpm_tis pinctrl_sunrisepoint tpm_tis_core ac 
pinctrl_intel battery tpm button acpi_pad pcc_cpufreq xhci_pci xhci_hcd 
serio_raw usbcore i2c_hid wmi video sg dm_multipath dm_mod scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua efivarfs autofs4
[ 1102.004972] CPU: 2 PID: 55 Comm: kworker/2:1 Not tainted 
5.1.0-rc1-desktop-debug+ #80
[ 1102.004973] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 
08/01/2018

[ 1102.004976] Workqueue: pm pm_runtime_work
[ 1102.005007] RIP: 0010:acr_ls_sec2_post_run+0x139/0x190 [nouveau]
[ 1102.005008] Code: 04 24 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 
74 1e e8 b9 2d 6a dd 48 89 c6 4c 89 f2 48 c7 c7 39 15 fb c0 e8 8c b6 20 
dd <0f> 0b e9 4c ff ff ff 4c 8b 77 10 eb dc 48 8b 04 24 48 8b 40 10 48

[ 1102.005009] RSP: 0018:a45c00ee7ab8 EFLAGS: 00010296
[ 1102.005009] RAX: 001d RBX: 912f0e366900 RCX: 
0006
[ 1102.005010] RDX: 0007 RSI: 0086 RDI: 
912f3ec963f0
[ 1102.005010] RBP:  R08: 03cb R09: 
0004
[ 1102.005011] R10:  R11: 0001 R12: 
912f330cc400
[ 1102.005011] R13: 0040 R14: 912df09f0060 R15: 
912df09f80b0
[ 1102.005012] FS:  () GS:912f3ec8() 
knlGS:

[ 1102.005012] CS:  0010 DS:  ES:  CR0: 80050033
[ 1102.005013] CR2: 7fed2968e020 CR3: 00028a728004 CR4: 
003606e0

[ 1102.005013] Call Trace:
[ 1102.005044]  acr_r352_bootstrap+0x16e/0x1d0 [nouveau]
[ 1102.005073]  acr_r352_reset+0x21/0x190 [nouveau]
[ 1102.005105]  gf100_gr_init_ctxctl_ext+0x59/0x500 [nouveau]
[ 1102.005136]  gf100_gr_init_ctxctl+0x19/0x270 [nouveau]
[ 1102.005167]  ? gf100_gr_init+0x533/0x570 [nouveau]
[ 1102.005181]  nvkm_engine_init+0xa2/0x120 [nouveau]
[ 1102.005196]  nvkm_subdev_init+0x8d/0xc0 [nouveau]
[ 1102.005226]  nvkm_device_init+0x107/0x190 [nouveau]
[ 1102.005255]  nvkm_udevice_init+0x3c/0x60 [nouveau]
[ 1102.005269]  nvkm_object_init+0x39/0x100 [nouveau]
[ 1102.005284]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005299]  nvkm_object_init+0x6c/0x100 [nouveau]
[ 1102.005328]  nouveau_do_resume+0x23/0xb0 [nouveau]
[ 1102.005357]  nouveau_pmops_runtime_resume+0x7c/0x150 [nouveau]
[ 1102.005360]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005361]  pci_pm_runtime_resume+0x6f/0xc0
[ 1102.005362]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005363]  __rpm_callback+0x76/0x120
[ 1102.005365]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005366]  rpm_callback+0x1a/0x70
[ 1102.005367]  ? pci_restore_standard_config+0x40/0x40
[ 1102.005368]  rpm_resume+0x3f5/0x5f0
[ 1102.005369]  pm_runtime_work+0x4e/0xa0
[ 1102.005370]  process_one_work+0x1d4/0x360
[ 1102.005372]  worker_thread+0x28/0x3c0
[ 1102.005372]  ? process_one_work+0x360/0x360
[ 1102.005374]  kthread+0x10d/0x130
[ 1102.005375]  ? kthread_create_worker_on_cpu+0x40/0x40
[ 1102.005377]  

Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Jerome Glisse
On Thu, Mar 21, 2019 at 01:12:07PM -0400, Jerome Glisse wrote:
> On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> > Hi,
> > 
> > just for your information and maybe for some help: with 5.1rc1 and SVM
> > enabled i see the following backtrace [1] when the nouveau card (reverse
> > prime) goes to sleep, for now i have papered over with [2] which leaves me
> > with userspace hangs. Any pointers where to look for the actual culprit?
> > 
> > PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> > 
> > Greetings,
> > 
> > Tobias
> 
> Can you check if attached patch fix the issue ?

Sorry sent bogus patch here is a good one ...

Cheers,
Jérôme


From 5b413953ba7abd3f92f46ef8261cd64368f0ae84 Mon Sep 17 00:00:00 2001
From: Jérôme Glisse
Date: Thu, 21 Mar 2019 13:08:46 -0400
Subject: [PATCH] gpu/nouveau: empty chunk do not have a buffer object
 associated with them.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Empty chunks do not have a bo associated with them, so there is no need to
pin/unpin them on suspend/resume.

Signed-off-by: Jérôme Glisse 
Cc: Ben Skeggs 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..a510dbe9a9cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -456,11 +456,6 @@ nouveau_dmem_resume(struct nouveau_drm *drm)
 		/* FIXME handle pin failure */
 		WARN_ON(ret);
 	}
-	list_for_each_entry (chunk, &drm->dmem->chunk_empty, list) {
-		ret = nouveau_bo_pin(chunk->bo, TTM_PL_FLAG_VRAM, false);
-		/* FIXME handle pin failure */
-		WARN_ON(ret);
-	}
 	mutex_unlock(&drm->dmem->mutex);
 }
 
-- 
2.17.1


Re: [Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Jerome Glisse
On Thu, Mar 21, 2019 at 04:59:14PM +0100, Tobias Klausmann wrote:
> Hi,
> 
> just for your information and maybe for some help: with 5.1rc1 and SVM
> enabled i see the following backtrace [1] when the nouveau card (reverse
> prime) goes to sleep, for now i have papered over with [2] which leaves me
> with userspace hangs. Any pointers where to look for the actual culprit?
> 
> PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)
> 
> Greetings,
> 
> Tobias

Can you check if attached patch fix the issue ?

Cheers,
Jérôme

From 0304725edbaa3b828598a3babb785e6b9555af0b Mon Sep 17 00:00:00 2001
From: Jérôme Glisse
Date: Thu, 21 Mar 2019 13:08:46 -0400
Subject: [PATCH] gpu/nouveau: initialize some fields of dmem no matter what
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On GPUs that do not support device memory we left the dmem fields
uninitialized, and this leads to trouble in suspend/resume, which try to use
those fields. It seems best to initialize those fields no matter what.

Signed-off-by: Jérôme Glisse 
Cc: Ben Skeggs 
Cc: dri-de...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..35b6e83ead8a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -593,6 +593,11 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 	unsigned long i, size;
 	int ret;
 
+	mutex_init(&drm->dmem->mutex);
+	INIT_LIST_HEAD(&drm->dmem->chunk_free);
+	INIT_LIST_HEAD(&drm->dmem->chunk_full);
+	INIT_LIST_HEAD(&drm->dmem->chunk_empty);
+
 	/* This only make sense on PASCAL or newer */
 	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_PASCAL)
 		return;
@@ -600,11 +605,6 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 	if (!(drm->dmem = kzalloc(sizeof(*drm->dmem), GFP_KERNEL)))
 		return;
 
-	mutex_init(&drm->dmem->mutex);
-	INIT_LIST_HEAD(&drm->dmem->chunk_free);
-	INIT_LIST_HEAD(&drm->dmem->chunk_full);
-	INIT_LIST_HEAD(&drm->dmem->chunk_empty);
-
 	size = ALIGN(drm->client.device.info.ram_user, DMEM_CHUNK_SIZE);
 
 	/* Initialize migration dma helpers before registering memory */
-- 
2.17.1


[Nouveau] Nouveau dmem NULL Pointer deref (SVM)

2019-03-21 Thread Tobias Klausmann

Hi,

just for your information and maybe for some help: with 5.1rc1 and SVM 
enabled i see the following backtrace [1] when the nouveau card (reverse 
prime) goes to sleep, for now i have papered over with [2] which leaves 
me with userspace hangs. Any pointers where to look for the actual culprit?


PS: Card is: nouveau :01:00.0: NVIDIA GP106 (136000a1)

Greetings,

Tobias


[1]:

BUG: unable to handle kernel NULL pointer dereference at 0028
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 3 PID: 435 Comm: kworker/3:4 Not tainted 5.1.0-rc1-desktop-debug+ #80
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
Workqueue: pm pm_runtime_work
RIP: 0010:nouveau_bo_unpin (linux/./include/linux/compiler.h:193 linux/./arch/x86/include/asm/atomic.h:31 linux/./include/asm-generic/atomic-instrumented.h:27 linux/./include/linux/refcount.h:43 linux/./include/linux/kref.h:38 linux/./include/drm/ttm/ttm_bo_driver.h:721 linux/drivers/gpu/drm/nouveau/nouveau_bo.c:454) nouveau
Code: 89 d9 48 c7 c6 50 04 e5 c0 c4 42 79 f7 c0 bd f0 ff ff ff e8 42 d5 7a c6 ff 83 00 04 00 00 e9 17 ff ff ff 41 54 55 53 48 89 fb <8b> 47 28 85 c0 0f 84 cf 00 00 00 48 8b bb c0 01 00 00 31 f6 4c 8b

All code
========
   0:    89 d9        mov    %ebx,%ecx
   2:    48 c7 c6 50 04 e5 c0     mov    $0xc0e50450,%rsi
   9:    c4 42 79 f7 c0       shlx   %eax,%r8d,%r8d
   e:    bd f0 ff ff ff       mov    $0xfff0,%ebp
  13:    e8 42 d5 7a c6       callq  0xc67ad55a
  18:    ff 83 00 04 00 00        incl   0x400(%rbx)
  1e:    e9 17 ff ff ff       jmpq   0xff3a
  23:    41 54        push   %r12
  25:    55       push   %rbp
  26:    53       push   %rbx
  27:    48 89 fb     mov    %rdi,%rbx
  2a:*    8b 47 28     mov    0x28(%rdi),%eax <-- trapping instruction
  2d:    85 c0        test   %eax,%eax
  2f:    0f 84 cf 00 00 00        je 0x104
  35:    48 8b bb c0 01 00 00     mov    0x1c0(%rbx),%rdi
  3c:    31 f6        xor    %esi,%esi
  3e:    4c       rex.WR
  3f:    8b       .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:    8b 47 28     mov    0x28(%rdi),%eax
   3:    85 c0        test   %eax,%eax
   5:    0f 84 cf 00 00 00        je 0xda
   b:    48 8b bb c0 01 00 00     mov    0x1c0(%rbx),%rdi
  12:    31 f6        xor    %esi,%esi
  14:    4c       rex.WR
  15:    8b       .byte 0x8b
RSP: 0018:bf0b41237d20 EFLAGS: 00010216
RAX: 9dfe0ba2ec00 RBX:  RCX: c0ceb630
RDX: 9dfe0ba2ec38 RSI: 7fff RDI: 
RBP: 9dfe0a07e000 R08:  R09: c0d4a9a0
R10: 8080808080808080 R11: 1800 R12: 0001
R13:  R14:  R15: 0008
FS:  () GS:9dfe3ecc() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 0001a500e002 CR4: 003606e0
Call Trace:
nouveau_dmem_suspend (linux/drivers/gpu/drm/nouveau/nouveau_dmem.c:482 (discriminator 9)) nouveau
nouveau_do_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:748) nouveau
nouveau_pmops_runtime_suspend (linux/drivers/gpu/drm/nouveau/nouveau_drm.c:915) nouveau
pci_pm_runtime_suspend (linux/drivers/pci/pci-driver.c:1262)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
__rpm_callback (linux/drivers/base/power/runtime.c:357)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
rpm_callback (linux/drivers/base/power/runtime.c:490)
? pci_has_legacy_pm_support (linux/drivers/pci/pci-driver.c:1238)
rpm_suspend (linux/drivers/base/power/runtime.c:629)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
? __switch_to_asm (linux/arch/x86/entry/entry_64.S:312)
pm_runtime_work (linux/drivers/base/power/runtime.c:922)
process_one_work (linux/./arch/x86/include/asm/preempt.h:26 linux/kernel/workqueue.c:2278)
worker_thread (linux/./include/linux/compiler.h:193 linux/./include/linux/list.h:237 linux/kernel/workqueue.c:2416)
? process_one_work (linux/kernel/workqueue.c:2358)
kthread (linux/kernel/kthread.c:253)
? kthread_create_worker_on_cpu (linux/kernel/kthread.c:213)
ret_from_fork (linux/arch/x86/entry/entry_64.S:358)
Modules linked in: rfcomm af_packet snd_hda_codec_hdmi bnep uvcvideo 
videobuf2_vmalloc rtsx_usb_sdmmc videobuf2_memops btusb rtsx_usb_ms 
videobuf2_v4l2 btrtl mmc_core memstick btbcm videodev btintel 
videobuf2_common rtsx_usb bluetooth usbhid
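
The decoded oops in [1] can be read as a field load through a NULL pointer: the trapping instruction is `mov 0x28(%rdi),%eax` and CR2 reports 0028. A minimal sketch of that arithmetic, assuming a hypothetical struct whose only known property is a 4-byte field at offset 0x28; the real layout of the object passed to nouveau_bo_unpin is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x28 offset is taken from the oops. */
struct model_obj {
	unsigned char pad[0x28];
	int counter; /* stands in for the field the trapping load reads */
};

/* The model only makes sense if the field really sits at offset 0x28. */
static_assert(offsetof(struct model_obj, counter) == 0x28,
	      "counter must sit at offset 0x28");

/* Address a load of obj->counter would touch, computed without a dereference. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct model_obj, counter);
}
```

With a base of 0 the computed address is 0x28, which matches the faulting address in the report and is consistent with nouveau_dmem_suspend handing a NULL chunk->bo to nouveau_bo_unpin.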