Re: amd-staging-drm-next breaks suspend

2022-01-20 Thread Bert Karwatzki
Tested amd-staging-drm-next (HEAD
e5c18a35031963eb22bfabf84cce3545da56a8ee) and suspend/resume works
despite the warnings. So the amdgpu_gart_bind warning did not cause
problems.

On Thursday, 20.01.2022 at 01:52 +, Kim, Jonathan wrote:
> [Public]
>
> This should fix the issue by getting rid of the unneeded flag check
> during gart bind:
> https://patchwork.freedesktop.org/patch/469907/
>
> Thanks,
>
> Jon
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Bert
> > Karwatzki
> > Sent: January 19, 2022 8:12 PM
> > To: Alex Deucher 
> > Cc: Chris Hixon ; Zhuo, Qingqing
> > (Lillian) ; Das, Nirmoy
> > ; amd-gfx@lists.freedesktop.org; Scott Bruce
> > ; Limonciello, Mario
> > ; Kazlauskas, Nicholas
> > 
> > Subject: Re: amd-staging-drm-next breaks suspend
> >
> > [CAUTION: External Email]
> >
> > Unfortunately this does not work either:
> >
> > [    0.859998] [ cut here ]
> > [    0.859998] trying to bind memory to uninitialized GART !
> > [    0.860003] WARNING: CPU: 13 PID: 235 at
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254
> > amdgpu_gart_bind+0x29/0x40 [amdgpu]
> > [    0.860099] Modules linked in: amdgpu(+) drm_ttm_helper ttm
> > gpu_sched i2c_algo_bit drm_kms_helper syscopyarea hid_sensor_hub
> > sysfillrect mfd_core sysimgblt hid_generic fb_sys_fops cec xhci_pci
> > xhci_hcd nvme drm r8169 nvme_core psmouse crc32c_intel realtek
> > amd_sfh usbcore i2c_hid_acpi mdio_devres t10_pi crc_t10dif i2c_hid
> > i2c_piix4 crct10dif_generic libphy crct10dif_common hid backlight
> > i2c_designware_platform i2c_designware_core
> > [    0.860113] CPU: 13 PID: 235 Comm: systemd-udevd Not tainted
> > 5.13.0+
> > #15
> > [    0.860115] Hardware name: Micro-Star International Co., Ltd.
> > Alpha
> > 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> > [    0.860116] RIP: 0010:amdgpu_gart_bind+0x29/0x40 [amdgpu]
> > [    0.860210] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25
> > 00 00
> > 4d 85 c9 74 05 e9 16 ff ff ff 31 c0 c3 48 c7 c7 08 06 7d c0 e8 8e
> > cc 31
> > e2 <0f> 0b b8 ea ff ff ff c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> > 40
> > [    0.860212] RSP: 0018:bb9e80b6f968 EFLAGS: 00010286
> > [    0.860213] RAX:  RBX: 0067 RCX:
> > a3080968
> > [    0.860214] RDX:  RSI: efff RDI:
> > a3028960
> > [    0.860215] RBP: 947c91e49a80 R08:  R09:
> > bb9e80b6f798
> > [    0.860215] R10: bb9e80b6f790 R11: a30989a8 R12:
> > 
> > [    0.860216] R13: 947c8a74 R14: 947c8a74 R15:
> > 
> > [    0.860216] FS:  7f60a3c918c0()
> > GS:947f5e94()
> > knlGS:
> > [    0.860217] CS:  0010 DS:  ES:  CR0: 80050033
> > [    0.860218] CR2: 7f60a4213480 CR3: 000135ee2000 CR4:
> > 00550ee0
> > [    0.860218] PKRU: 5554
> > [    0.860219] Call Trace:
> > [    0.860221]  amdgpu_ttm_gart_bind+0x74/0xc0 [amdgpu]
> > [    0.860305]  amdgpu_ttm_alloc_gart+0x13e/0x190 [amdgpu]
> > [    0.860385]  amdgpu_bo_create_reserved.part.0+0xf3/0x1b0
> > [amdgpu]
> > [    0.860465]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> > [    0.860554]  amdgpu_bo_create_kernel+0x36/0xa0 [amdgpu]
> > [    0.860641]  amdgpu_ttm_init.cold+0x167/0x181 [amdgpu]
> > [    0.860784]  gmc_v10_0_sw_init+0x2d7/0x430 [amdgpu]
> > [    0.860889]  amdgpu_device_init.cold+0x147f/0x1ad7 [amdgpu]
> > [    0.861007]  ? acpi_ns_get_node+0x4a/0x55
> > [    0.861011]  ? acpi_get_handle+0x89/0xb2
> > [    0.861012]  amdgpu_driver_load_kms+0x55/0x290 [amdgpu]
> > [    0.861098]  amdgpu_pci_probe+0x181/0x250 [amdgpu]
> > [    0.861188]  pci_device_probe+0xcd/0x140
> > [    0.861191]  really_probe+0xed/0x460
> > [    0.861193]  driver_probe_device+0xe3/0x150
> > [    0.861195]  device_driver_attach+0x9c/0xb0
> > [    0.861196]  __driver_attach+0x8a/0x150
> > [    0.861197]  ? device_driver_attach+0xb0/0xb0
> > [    0.861198]  ? device_driver_attach+0xb0/0xb0
> > [    0.861198]  bus_for_each_dev+0x73/0xb0
> > [    0.861200]  bus_add_driver+0x121/0x1e0
> > [    0.861201]  driver_register+0x8a/0xe0
> > [    0.861202]  ? 0xc1117000
> > [    0.861203]  do_one_initcall+0x47/0x180
> > [    0.861205]  ? do_init_module+0x19/0x230
> > [    0.861208]  ? kmem_cache_alloc+0x182/0x260
> > [    0.861210]  do_init_module+0x51/0x230

Re: amd-staging-drm-next breaks suspend

2022-01-20 Thread Ma, Jun
The WARN_ON is still triggered because gart.ptr is empty in the function
amdgpu_gart_bind.

On 1/20/2022 10:56 AM, Chen, Guchun wrote:
> [Public]
> 
> [ 1.310551] trying to bind memory to uninitialized GART !
> 
> This is only a warning; it should not break suspend/resume. There is a fix on
> drm-next for this, "drm/amdgpu: remove gart.ready flag"; please give it a try.
> If you still observe the suspend issue, I guess it is caused by another
> regression. Could you then please bisect it?
> 
> Regards,
> Guchun
> 
> -Original Message-
> From: amd-gfx  On Behalf Of Bert 
> Karwatzki
> Sent: Thursday, January 20, 2022 5:52 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chris Hixon ; Zhuo, Qingqing (Lillian) 
> ; Scott Bruce ; Limonciello, Mario 
> ; Alex Deucher ; 
> Kazlauskas, Nicholas 
> Subject: amd-staging-drm-next breaks suspend
> 
> I just tested amd-staging-drm-next with HEAD
> f1b2924ee6929cb431440e6f961f06eb65d52beb:
> Going into suspend leads to a hang again:
> This is probably caused by
> [ 1.310551] trying to bind memory to uninitialized GART !
> and/or
> [ 3.976438] trying to bind memory to uninitialized GART !
> 
> 
> Here's the complete dmesg:
> [ 0.00] Linux version 5.13.0+ (bert@lisa) (gcc (Debian 11.2.0-14)
> 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.50.20220106) #4 SMP Wed
> Jan 19 22:19:19 CET 2022
> [ 0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0+
> root=UUID=78dcbf14-902d-49c0-9d4d-b7ad84550d9a ro
> mt7921e.disable_aspm=1 quiet
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
> point registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys
> User registers'
> [ 0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.00] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
> [ 0.00] x86/fpu: Enabled xstate features 0x207, context size is 840
> bytes, using 'compacted' format.
> [ 0.00] BIOS-provided physical RAM map:
> [ 0.00] BIOS-e820: [mem 0x-0x0009]
> usable
> [ 0.00] BIOS-e820: [mem 0x000a-0x000f]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0010-0x09bfefff]
> usable
> [ 0.00] BIOS-e820: [mem 0x09bff000-0x0a000fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0a001000-0x0a1f]
> usable
> [ 0.00] BIOS-e820: [mem 0x0a20-0x0a20efff] ACPI
> NVS
> [ 0.00] BIOS-e820: [mem 0x0a20f000-0xe9e1]
> usable
> [ 0.00] BIOS-e820: [mem 0xe9e2-0xeb33efff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xeb33f000-0xeb39efff] ACPI
> data
> [ 0.00] BIOS-e820: [mem 0xeb39f000-0xeb556fff] ACPI
> NVS
> [ 0.00] BIOS-e820: [mem 0xeb557000-0xed17cfff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xed17d000-0xed1fefff] type
> 20
> [ 0.00] BIOS-e820: [mem 0xed1ff000-0xedff]
> usable
> [ 0.00] BIOS-e820: [mem 0xee00-0xf7ff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfd00-0xfdff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfec1-0xfec10fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed0-0xfed00fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed4-0xfed44fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed8-0xfed8]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedc4000-0xfedc9fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedcc000-0xfedcefff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedd5000-0xfedd5fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xff00-0x]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0001-0x0003ee2f]
> usable
> [ 0.00] BIOS-e820: [mem 0x0003ee30-0x00040fff]
> reserved
> [ 0.00] NX (Execute Disable) protection: active
> [ 0.00] efi: EFI v2.70 by American Megatrends
> [ 0.00] efi: ACPI=0xeb54 ACPI 2.0=0xeb540014
> TPMFinalLog=0xeb50c000 SMBIOS=0xed02 SMBIOS 3.0=0xed01f000
> MEMATTR=0xe6fa3018 ESRT=0xe87cb918 MOKvar=0xe6fa
> [ 0.00] SMBIOS 3.3.0 present.
> [ 0.00] DMI: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-
> 158L, BIOS E158LAMS.107 11/10/2021
> [ 0.00] tsc: Fast TSC calibration using PIT
> [ 0.00] tsc: Detected 3194.034 MHz processor
> [ 0.000125] e820: update [mem 0x-0x0fff] usable ==>
> reserved
> [ 0.000126] e820: remove [mem 0x000a-0x000f] usable
> [ 0.000131] last_pfn = 0x3ee300 max_arch_pfn = 0x4
> [ 0.000363] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> [ 0.000577] e820: update [mem 

RE: amd-staging-drm-next breaks suspend

2022-01-19 Thread Chen, Guchun
[Public]

[ 1.310551] trying to bind memory to uninitialized GART !

This is only a warning; it should not break suspend/resume. There is a fix on
drm-next for this, "drm/amdgpu: remove gart.ready flag"; please give it a try.
If you still observe the suspend issue, I guess it is caused by another
regression. Could you then please bisect it?

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Bert 
Karwatzki
Sent: Thursday, January 20, 2022 5:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Chris Hixon ; Zhuo, Qingqing (Lillian) 
; Scott Bruce ; Limonciello, Mario 
; Alex Deucher ; Kazlauskas, 
Nicholas 
Subject: amd-staging-drm-next breaks suspend

I just tested amd-staging-drm-next with HEAD
f1b2924ee6929cb431440e6f961f06eb65d52beb:
Going into suspend leads to a hang again:
This is probably caused by
[ 1.310551] trying to bind memory to uninitialized GART !
and/or
[ 3.976438] trying to bind memory to uninitialized GART !


Here's the complete dmesg:
[ 0.00] Linux version 5.13.0+ (bert@lisa) (gcc (Debian 11.2.0-14)
11.2.0, GNU ld (GNU Binutils for Debian) 2.37.50.20220106) #4 SMP Wed
Jan 19 22:19:19 CET 2022
[ 0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0+
root=UUID=78dcbf14-902d-49c0-9d4d-b7ad84550d9a ro
mt7921e.disable_aspm=1 quiet
[ 0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
point registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys
User registers'
[ 0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.00] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
[ 0.00] x86/fpu: Enabled xstate features 0x207, context size is 840
bytes, using 'compacted' format.
[ 0.00] BIOS-provided physical RAM map:
[ 0.00] BIOS-e820: [mem 0x-0x0009]
usable
[ 0.00] BIOS-e820: [mem 0x000a-0x000f]
reserved
[ 0.00] BIOS-e820: [mem 0x0010-0x09bfefff]
usable
[ 0.00] BIOS-e820: [mem 0x09bff000-0x0a000fff]
reserved
[ 0.00] BIOS-e820: [mem 0x0a001000-0x0a1f]
usable
[ 0.00] BIOS-e820: [mem 0x0a20-0x0a20efff] ACPI
NVS
[ 0.00] BIOS-e820: [mem 0x0a20f000-0xe9e1]
usable
[ 0.00] BIOS-e820: [mem 0xe9e2-0xeb33efff]
reserved
[ 0.00] BIOS-e820: [mem 0xeb33f000-0xeb39efff] ACPI
data
[ 0.00] BIOS-e820: [mem 0xeb39f000-0xeb556fff] ACPI
NVS
[ 0.00] BIOS-e820: [mem 0xeb557000-0xed17cfff]
reserved
[ 0.00] BIOS-e820: [mem 0xed17d000-0xed1fefff] type
20
[ 0.00] BIOS-e820: [mem 0xed1ff000-0xedff]
usable
[ 0.00] BIOS-e820: [mem 0xee00-0xf7ff]
reserved
[ 0.00] BIOS-e820: [mem 0xfd00-0xfdff]
reserved
[ 0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfec1-0xfec10fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed0-0xfed00fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed4-0xfed44fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed8-0xfed8]
reserved
[ 0.00] BIOS-e820: [mem 0xfedc4000-0xfedc9fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfedcc000-0xfedcefff]
reserved
[ 0.00] BIOS-e820: [mem 0xfedd5000-0xfedd5fff]
reserved
[ 0.00] BIOS-e820: [mem 0xff00-0x]
reserved
[ 0.00] BIOS-e820: [mem 0x0001-0x0003ee2f]
usable
[ 0.00] BIOS-e820: [mem 0x0003ee30-0x00040fff]
reserved
[ 0.00] NX (Execute Disable) protection: active
[ 0.00] efi: EFI v2.70 by American Megatrends
[ 0.00] efi: ACPI=0xeb54 ACPI 2.0=0xeb540014
TPMFinalLog=0xeb50c000 SMBIOS=0xed02 SMBIOS 3.0=0xed01f000
MEMATTR=0xe6fa3018 ESRT=0xe87cb918 MOKvar=0xe6fa
[ 0.00] SMBIOS 3.3.0 present.
[ 0.00] DMI: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-
158L, BIOS E158LAMS.107 11/10/2021
[ 0.00] tsc: Fast TSC calibration using PIT
[ 0.00] tsc: Detected 3194.034 MHz processor
[ 0.000125] e820: update [mem 0x-0x0fff] usable ==>
reserved
[ 0.000126] e820: remove [mem 0x000a-0x000f] usable
[ 0.000131] last_pfn = 0x3ee300 max_arch_pfn = 0x4
[ 0.000363] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000577] e820: update [mem 0xf000-0x] usable ==>
reserved
[ 0.000582] last_pfn = 0xee000 max_arch_pfn = 0x4
[ 0.003213] esrt: Reserving ESRT space from 0xe87cb918 to
0xe87cb950.
[ 0.003217] e820: update [mem 0xe87cb000-0xe87cbfff] usable ==>
reserved
[ 0.003225] e820: update [mem 0xe6fa-0xe6fa2fff] usable ==>
reserved
[ 0.003235] Using GB pages for direct mapping
[ 

RE: amd-staging-drm-next breaks suspend

2022-01-19 Thread Kim, Jonathan
[Public]

This should fix the issue by getting rid of the unneeded flag check during gart 
bind:
https://patchwork.freedesktop.org/patch/469907/

Thanks,

Jon
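
For readers following along, here is a minimal userspace sketch of what "getting rid of the flag check during gart bind" could look like. It is not the driver code: `struct mock_gart`, `mock_gart_map` and both `mock_gart_bind_*` functions are hypothetical stand-ins, and the variant without the flag is an assumption about the shape of the fix, not a copy of the linked patch. The only parts taken from this thread are the warning text and the failing bind reported in the logs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the GART state kept per device. */
struct mock_gart {
	bool ready; /* flag that the bind path checks before mapping */
	void *ptr;  /* CPU pointer to the GART table; NULL until set up */
};

/* Hypothetical stand-in for the real page-table update. */
int mock_gart_map(struct mock_gart *gart, int pages)
{
	(void)gart;
	(void)pages;
	return 0;
}

/* Bind with the flag check: an early bind fails and warns. */
int mock_gart_bind_with_flag(struct mock_gart *gart, int pages)
{
	if (!gart->ready) {
		fprintf(stderr, "trying to bind memory to uninitialized GART !\n");
		return -EINVAL;
	}
	if (!gart->ptr)
		return 0; /* no table to write to yet */
	return mock_gart_map(gart, pages);
}

/* Assumed shape without the flag: only the table pointer is consulted. */
int mock_gart_bind_no_flag(struct mock_gart *gart, int pages)
{
	if (!gart->ptr)
		return 0;
	return mock_gart_map(gart, pages);
}
```

With `ready` false and `ptr` NULL, the first variant returns `-EINVAL`, which is consistent with the "bind failed" lines in the log, while the second variant simply does nothing.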

> -Original Message-
> From: amd-gfx  On Behalf Of Bert
> Karwatzki
> Sent: January 19, 2022 8:12 PM
> To: Alex Deucher 
> Cc: Chris Hixon ; Zhuo, Qingqing
> (Lillian) ; Das, Nirmoy
> ; amd-gfx@lists.freedesktop.org; Scott Bruce
> ; Limonciello, Mario
> ; Kazlauskas, Nicholas
> 
> Subject: Re: amd-staging-drm-next breaks suspend
>
> [CAUTION: External Email]
>
> Unfortunately this does not work either:
>
> [0.859998] [ cut here ]
> [0.859998] trying to bind memory to uninitialized GART !
> [0.860003] WARNING: CPU: 13 PID: 235 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254
> amdgpu_gart_bind+0x29/0x40 [amdgpu]
> [0.860099] Modules linked in: amdgpu(+) drm_ttm_helper ttm
> gpu_sched i2c_algo_bit drm_kms_helper syscopyarea hid_sensor_hub
> sysfillrect mfd_core sysimgblt hid_generic fb_sys_fops cec xhci_pci
> xhci_hcd nvme drm r8169 nvme_core psmouse crc32c_intel realtek
> amd_sfh usbcore i2c_hid_acpi mdio_devres t10_pi crc_t10dif i2c_hid
> i2c_piix4 crct10dif_generic libphy crct10dif_common hid backlight
> i2c_designware_platform i2c_designware_core
> [0.860113] CPU: 13 PID: 235 Comm: systemd-udevd Not tainted 5.13.0+
> #15
> [0.860115] Hardware name: Micro-Star International Co., Ltd. Alpha
> 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> [0.860116] RIP: 0010:amdgpu_gart_bind+0x29/0x40 [amdgpu]
> [0.860210] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00 00
> 4d 85 c9 74 05 e9 16 ff ff ff 31 c0 c3 48 c7 c7 08 06 7d c0 e8 8e cc 31
> e2 <0f> 0b b8 ea ff ff ff c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> [0.860212] RSP: 0018:bb9e80b6f968 EFLAGS: 00010286
> [0.860213] RAX:  RBX: 0067 RCX:
> a3080968
> [0.860214] RDX:  RSI: efff RDI:
> a3028960
> [0.860215] RBP: 947c91e49a80 R08:  R09:
> bb9e80b6f798
> [0.860215] R10: bb9e80b6f790 R11: a30989a8 R12:
> 
> [0.860216] R13: 947c8a74 R14: 947c8a74 R15:
> 
> [0.860216] FS:  7f60a3c918c0() GS:947f5e94()
> knlGS:
> [0.860217] CS:  0010 DS:  ES:  CR0: 80050033
> [0.860218] CR2: 7f60a4213480 CR3: 000135ee2000 CR4:
> 00550ee0
> [0.860218] PKRU: 5554
> [0.860219] Call Trace:
> [0.860221]  amdgpu_ttm_gart_bind+0x74/0xc0 [amdgpu]
> [0.860305]  amdgpu_ttm_alloc_gart+0x13e/0x190 [amdgpu]
> [0.860385]  amdgpu_bo_create_reserved.part.0+0xf3/0x1b0 [amdgpu]
> [0.860465]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> [0.860554]  amdgpu_bo_create_kernel+0x36/0xa0 [amdgpu]
> [0.860641]  amdgpu_ttm_init.cold+0x167/0x181 [amdgpu]
> [0.860784]  gmc_v10_0_sw_init+0x2d7/0x430 [amdgpu]
> [0.860889]  amdgpu_device_init.cold+0x147f/0x1ad7 [amdgpu]
> [0.861007]  ? acpi_ns_get_node+0x4a/0x55
> [0.861011]  ? acpi_get_handle+0x89/0xb2
> [0.861012]  amdgpu_driver_load_kms+0x55/0x290 [amdgpu]
> [0.861098]  amdgpu_pci_probe+0x181/0x250 [amdgpu]
> [0.861188]  pci_device_probe+0xcd/0x140
> [0.861191]  really_probe+0xed/0x460
> [0.861193]  driver_probe_device+0xe3/0x150
> [0.861195]  device_driver_attach+0x9c/0xb0
> [0.861196]  __driver_attach+0x8a/0x150
> [0.861197]  ? device_driver_attach+0xb0/0xb0
> [0.861198]  ? device_driver_attach+0xb0/0xb0
> [0.861198]  bus_for_each_dev+0x73/0xb0
> [0.861200]  bus_add_driver+0x121/0x1e0
> [0.861201]  driver_register+0x8a/0xe0
> [0.861202]  ? 0xc1117000
> [0.861203]  do_one_initcall+0x47/0x180
> [0.861205]  ? do_init_module+0x19/0x230
> [0.861208]  ? kmem_cache_alloc+0x182/0x260
> [0.861210]  do_init_module+0x51/0x230
> [0.861211]  __do_sys_finit_module+0xb1/0x110
> [0.861213]  do_syscall_64+0x40/0xb0
> [0.861216]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [0.861218] RIP: 0033:0x7f60a4149679
> [0.861220] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
> [0.861221] RSP: 002b:7ffe25f17ea8 EFLAGS: 0246 ORIG_RAX:
> 0139
> [0.861223] RAX: ffda RBX: 56004a10a660 RCX:
> 7f60a4149679
> [0.861224] RDX:  RSI: 7f60a42e9eed RDI:
> 0016
> [0.861224] RBP: 0002 R08:  R09:

Re: amd-staging-drm-next breaks suspend

2022-01-19 Thread Bert Karwatzki
Unfortunately this does not work either:

[0.859998] [ cut here ]
[0.859998] trying to bind memory to uninitialized GART !
[0.860003] WARNING: CPU: 13 PID: 235 at
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254 amdgpu_gart_bind+0x29/0x40
[amdgpu]
[0.860099] Modules linked in: amdgpu(+) drm_ttm_helper ttm
gpu_sched i2c_algo_bit drm_kms_helper syscopyarea hid_sensor_hub
sysfillrect mfd_core sysimgblt hid_generic fb_sys_fops cec xhci_pci
xhci_hcd nvme drm r8169 nvme_core psmouse crc32c_intel realtek amd_sfh
usbcore i2c_hid_acpi mdio_devres t10_pi crc_t10dif i2c_hid i2c_piix4
crct10dif_generic libphy crct10dif_common hid backlight
i2c_designware_platform i2c_designware_core
[0.860113] CPU: 13 PID: 235 Comm: systemd-udevd Not tainted 5.13.0+
#15
[0.860115] Hardware name: Micro-Star International Co., Ltd. Alpha
15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[0.860116] RIP: 0010:amdgpu_gart_bind+0x29/0x40 [amdgpu]
[0.860210] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00 00
4d 85 c9 74 05 e9 16 ff ff ff 31 c0 c3 48 c7 c7 08 06 7d c0 e8 8e cc 31
e2 <0f> 0b b8 ea ff ff ff c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
[0.860212] RSP: 0018:bb9e80b6f968 EFLAGS: 00010286
[0.860213] RAX:  RBX: 0067 RCX:
a3080968
[0.860214] RDX:  RSI: efff RDI:
a3028960
[0.860215] RBP: 947c91e49a80 R08:  R09:
bb9e80b6f798
[0.860215] R10: bb9e80b6f790 R11: a30989a8 R12:

[0.860216] R13: 947c8a74 R14: 947c8a74 R15:

[0.860216] FS:  7f60a3c918c0() GS:947f5e94()
knlGS:
[0.860217] CS:  0010 DS:  ES:  CR0: 80050033
[0.860218] CR2: 7f60a4213480 CR3: 000135ee2000 CR4:
00550ee0
[0.860218] PKRU: 5554
[0.860219] Call Trace:
[0.860221]  amdgpu_ttm_gart_bind+0x74/0xc0 [amdgpu]
[0.860305]  amdgpu_ttm_alloc_gart+0x13e/0x190 [amdgpu]
[0.860385]  amdgpu_bo_create_reserved.part.0+0xf3/0x1b0 [amdgpu]
[0.860465]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
[0.860554]  amdgpu_bo_create_kernel+0x36/0xa0 [amdgpu]
[0.860641]  amdgpu_ttm_init.cold+0x167/0x181 [amdgpu]
[0.860784]  gmc_v10_0_sw_init+0x2d7/0x430 [amdgpu]
[0.860889]  amdgpu_device_init.cold+0x147f/0x1ad7 [amdgpu]
[0.861007]  ? acpi_ns_get_node+0x4a/0x55
[0.861011]  ? acpi_get_handle+0x89/0xb2
[0.861012]  amdgpu_driver_load_kms+0x55/0x290 [amdgpu]
[0.861098]  amdgpu_pci_probe+0x181/0x250 [amdgpu]
[0.861188]  pci_device_probe+0xcd/0x140
[0.861191]  really_probe+0xed/0x460
[0.861193]  driver_probe_device+0xe3/0x150
[0.861195]  device_driver_attach+0x9c/0xb0
[0.861196]  __driver_attach+0x8a/0x150
[0.861197]  ? device_driver_attach+0xb0/0xb0
[0.861198]  ? device_driver_attach+0xb0/0xb0
[0.861198]  bus_for_each_dev+0x73/0xb0
[0.861200]  bus_add_driver+0x121/0x1e0
[0.861201]  driver_register+0x8a/0xe0
[0.861202]  ? 0xc1117000
[0.861203]  do_one_initcall+0x47/0x180
[0.861205]  ? do_init_module+0x19/0x230
[0.861208]  ? kmem_cache_alloc+0x182/0x260
[0.861210]  do_init_module+0x51/0x230
[0.861211]  __do_sys_finit_module+0xb1/0x110
[0.861213]  do_syscall_64+0x40/0xb0
[0.861216]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[0.861218] RIP: 0033:0x7f60a4149679
[0.861220] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
[0.861221] RSP: 002b:7ffe25f17ea8 EFLAGS: 0246 ORIG_RAX:
0139
[0.861223] RAX: ffda RBX: 56004a10a660 RCX:
7f60a4149679
[0.861224] RDX:  RSI: 7f60a42e9eed RDI:
0016
[0.861224] RBP: 0002 R08:  R09:
56004a105980
[0.861225] R10: 0016 R11: 0246 R12:
7f60a42e9eed
[0.861225] R13:  R14: 56004a0efdd0 R15:
56004a10a660
[0.861226] ---[ end trace 0319f26df48f8ef0 ]---
[0.861228] [drm:amdgpu_ttm_gart_bind [amdgpu]] *ERROR* failed to
bind 1 pages at 0x0040
[0.861540] amdgpu :03:00.0: amdgpu: a9dfe17c bind
failed


On Wednesday, 19.01.2022 at 19:54 -0500, Alex Deucher wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Bert Karwatzki 
> wrote:
> >
> > Bisected the error and found the first bad commit to be
> > d015e9861e55928a78137a2c95897bc50637fc47 is the first bad commit
> > commit d015e9861e55928a78137a2c95897bc50637fc47
> > Author: Jonathan Kim 
> > Date:   Thu Dec 9 16:48:56 2021 -0500
> >
> >     drm/amdgpu: improve debug VRAM access performance using sdma
> >
> >     For better performance during VRAM access for debugged
> > processes,
> > do
> >     read/write copies over SDMA.
> >
> >     In 

Re: amd-staging-drm-next breaks suspend

2022-01-19 Thread Alex Deucher
On Wed, Jan 19, 2022 at 7:48 PM Bert Karwatzki  wrote:
>
> Bisected the error and found the first bad commit to be
> d015e9861e55928a78137a2c95897bc50637fc47 is the first bad commit
> commit d015e9861e55928a78137a2c95897bc50637fc47
> Author: Jonathan Kim 
> Date:   Thu Dec 9 16:48:56 2021 -0500
>
>     drm/amdgpu: improve debug VRAM access performance using sdma
>
>     For better performance during VRAM access for debugged processes, do
>     read/write copies over SDMA.
>
>     In order to fulfill post mortem debugging on a broken device, fallback to
>     stable MMIO access when gpu recovery is disabled or when job submission
>     time outs are set to max.  Failed SDMA access should automatically fall
>     back to MMIO access.
>
>     Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
>     page-table updates and TLB flushes on access.
>
> Signed-off-by: Jonathan Kim 
> Reviewed-by: Felix Kuehling 
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78
> +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++
>  2 files changed, 82 insertions(+)

Should be fixed with:
https://patchwork.freedesktop.org/patch/470069/

Alex

>
>
> On Thursday, 20.01.2022 at 00:22 +0100, Bert Karwatzki wrote:
> > Reverting commit 72f686438de13f121c52f58d7445570a33dfdc61 does not
> > change the errors:
> > [1.310550] [ cut here ]
> > [1.310551] trying to bind memory to uninitialized GART !
> > [1.310556] WARNING: CPU: 9 PID: 252 at
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254
> > amdgpu_gart_bind+0x2e/0x40
> > [amdgpu]
> > [1.310659] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit
> > drm_ttm_helper hid_sensor_hub ttm hid_generic nvme drm_kms_helper
> > nvme_core cec xhci_pci t10_pi r8169 rc_core crc32_pclmul crc_t10dif
> > i2c_hid_acpi realtek xhci_hcd psmouse crc32c_intel crct10dif_generic
> > i2c_hid amd_sfh mdio_devres crct10dif_pclmul drm i2c_piix4 usbcore
> > libphy crct10dif_common wmi button battery video fjes(-) hid
> > [1.310672] CPU: 9 PID: 252 Comm: systemd-udevd Not tainted
> > 5.13.0+
> > #4
> > [1.310673] Hardware name: Micro-Star International Co., Ltd.
> > Alpha
> > 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> > [1.310674] RIP: 0010:amdgpu_gart_bind+0x2e/0x40 [amdgpu]
> > [1.310762] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00
> > 00
> > 4d 85 c9 74 05 e9 01 ff ff ff 31 c0 c3 48 c7 c7 68 36 dd c0 e8 86 db
> > 19
> > e8 <0f> 0b b8 ea ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
> > 00
> > [1.310763] RSP: 0018:b19d00c33920 EFLAGS: 00010282
> > [1.310764] RAX:  RBX: 0067 RCX:
> > a9abb208
> > [1.310765] RDX:  RSI: efff RDI:
> > a9a63200
> > [1.310766] RBP: 985ce2a796c0 R08:  R09:
> > b19d00c33748
> > [1.310766] R10: b19d00c33740 R11: a9ad3248 R12:
> > 
> > [1.310766] R13: 985cd45a R14: 985cd45a R15:
> > 
> > [1.310767] FS:  7f69fabdc8c0() GS:985f9e64()
> > knlGS:
> > [1.310768] CS:  0010 DS:  ES:  CR0: 80050033
> > [1.310768] CR2: 7f69fabc5dca CR3: 0001139ec000 CR4:
> > 00750ee0
> > [1.310769] PKRU: 5554
> > [1.310770] Call Trace:
> > [1.310772]  amdgpu_ttm_gart_bind+0x79/0xc0 [amdgpu]
> > [1.310858]  amdgpu_ttm_alloc_gart+0x146/0x1a0 [amdgpu]
> > [1.310942]  amdgpu_bo_create_reserved.part.0+0xf8/0x1b0 [amdgpu]
> > [1.311025]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> > [1.311145]  amdgpu_bo_create_kernel+0x3b/0xa0 [amdgpu]
> > [1.311229]  amdgpu_ttm_init.cold+0x165/0x17f [amdgpu]
> > [1.311349]  gmc_v10_0_sw_init+0x2dc/0x430 [amdgpu]
> > [1.311455]  amdgpu_device_init.cold+0x1544/0x1b54 [amdgpu]
> > [1.311570]  ? acpi_ns_get_node+0x4f/0x5a
> > [1.311574]  ? acpi_get_handle+0x8e/0xb7
> > [1.311576]  amdgpu_driver_load_kms+0x67/0x320 [amdgpu]
> > [1.311664]  amdgpu_pci_probe+0x1bc/0x290 [amdgpu]
> > [1.311750]  local_pci_probe+0x42/0x80
> > [1.311753]  ? __cond_resched+0x16/0x40
> > [1.311755]  pci_device_probe+0xfd/0x1b0
> > [1.311756]  really_probe+0xf2/0x460
> > [1.311759]  driver_probe_device+0xe8/0x160
> > [1.311760]  device_driver_attach+0xa1/0xb0
> > [1.311761]  __driver_attach+0x8f/0x150
> > [1.311763]  ? device_driver_attach+0xb0/0xb0
> > [1.311764]  ? device_driver_attach+0xb0/0xb0
> > [1.311765]  bus_for_each_dev+0x78/0xc0
> > [1.311766]  bus_add_driver+0x12b/0x1e0
> > [1.311768]  driver_register+0x8f/0xe0
> > [1.311769]  ? 0xc1828000
> > [1.311770]  do_one_initcall+0x44/0x1d0
> > [1.311772]  ? kmem_cache_alloc_trace+0x103/0x240
> > [1.311775]  do_init_module+0x5c/0x270
> > [1.311777]  __do_sys_finit_module+0xb1/0x110
> > [1.311779]  

Re: amd-staging-drm-next breaks suspend

2022-01-19 Thread Bert Karwatzki
Bisected the error and found the first bad commit to be
d015e9861e55928a78137a2c95897bc50637fc47 is the first bad commit
commit d015e9861e55928a78137a2c95897bc50637fc47
Author: Jonathan Kim 
Date:   Thu Dec 9 16:48:56 2021 -0500

    drm/amdgpu: improve debug VRAM access performance using sdma

    For better performance during VRAM access for debugged processes, do
    read/write copies over SDMA.

    In order to fulfill post mortem debugging on a broken device, fallback to
    stable MMIO access when gpu recovery is disabled or when job submission
    time outs are set to max.  Failed SDMA access should automatically fall
    back to MMIO access.

    Use a pre-allocated GTT bounce buffer pre-mapped into GART to avoid
    page-table updates and TLB flushes on access.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 78
+
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  4 ++
 2 files changed, 82 insertions(+)
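
The fallback rule described in the commit message above can be pictured with a small userspace sketch. This is only an illustration under assumptions: `struct debug_access_cfg`, `choose_path`, `access_vram` and the two mock copy functions are hypothetical names, and `LONG_MAX` stands in for "job submission time outs are set to max". It does not reproduce the code in amdgpu_ttm.c.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum access_path { ACCESS_SDMA, ACCESS_MMIO };

/* Hypothetical configuration consulted before a debug VRAM access. */
struct debug_access_cfg {
	bool gpu_recovery_enabled;
	long job_timeout; /* LONG_MAX models "set to max" */
};

/* Hypothetical SDMA copy hook; returns 0 on success. */
typedef int (*sdma_copy_fn)(void *buf, size_t len);

int mock_sdma_ok(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

int mock_sdma_fail(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -1;
}

/* Prefer MMIO when recovery is off or the timeout is at its maximum. */
enum access_path choose_path(const struct debug_access_cfg *cfg)
{
	if (!cfg->gpu_recovery_enabled || cfg->job_timeout == LONG_MAX)
		return ACCESS_MMIO;
	return ACCESS_SDMA;
}

/* Try SDMA when allowed; a failed SDMA copy falls back to MMIO. */
enum access_path access_vram(const struct debug_access_cfg *cfg,
			     sdma_copy_fn sdma_copy, void *buf, size_t len)
{
	if (choose_path(cfg) == ACCESS_SDMA && sdma_copy(buf, len) == 0)
		return ACCESS_SDMA;
	/* The MMIO copy itself is omitted in this sketch. */
	return ACCESS_MMIO;
}
```

Returning the path that was actually used keeps the sketch easy to check: a caller can see whether the SDMA attempt succeeded or the MMIO fallback was taken.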


On Thursday, 20.01.2022 at 00:22 +0100, Bert Karwatzki wrote:
> Reverting commit 72f686438de13f121c52f58d7445570a33dfdc61 does not
> change the errors:
> [    1.310550] [ cut here ]
> [    1.310551] trying to bind memory to uninitialized GART !
> [    1.310556] WARNING: CPU: 9 PID: 252 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254
> amdgpu_gart_bind+0x2e/0x40
> [amdgpu]
> [    1.310659] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit
> drm_ttm_helper hid_sensor_hub ttm hid_generic nvme drm_kms_helper
> nvme_core cec xhci_pci t10_pi r8169 rc_core crc32_pclmul crc_t10dif
> i2c_hid_acpi realtek xhci_hcd psmouse crc32c_intel crct10dif_generic
> i2c_hid amd_sfh mdio_devres crct10dif_pclmul drm i2c_piix4 usbcore
> libphy crct10dif_common wmi button battery video fjes(-) hid
> [    1.310672] CPU: 9 PID: 252 Comm: systemd-udevd Not tainted
> 5.13.0+
> #4
> [    1.310673] Hardware name: Micro-Star International Co., Ltd.
> Alpha
> 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
> [    1.310674] RIP: 0010:amdgpu_gart_bind+0x2e/0x40 [amdgpu]
> [    1.310762] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00
> 00
> 4d 85 c9 74 05 e9 01 ff ff ff 31 c0 c3 48 c7 c7 68 36 dd c0 e8 86 db
> 19
> e8 <0f> 0b b8 ea ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
> 00
> [    1.310763] RSP: 0018:b19d00c33920 EFLAGS: 00010282
> [    1.310764] RAX:  RBX: 0067 RCX:
> a9abb208
> [    1.310765] RDX:  RSI: efff RDI:
> a9a63200
> [    1.310766] RBP: 985ce2a796c0 R08:  R09:
> b19d00c33748
> [    1.310766] R10: b19d00c33740 R11: a9ad3248 R12:
> 
> [    1.310766] R13: 985cd45a R14: 985cd45a R15:
> 
> [    1.310767] FS:  7f69fabdc8c0() GS:985f9e64()
> knlGS:
> [    1.310768] CS:  0010 DS:  ES:  CR0: 80050033
> [    1.310768] CR2: 7f69fabc5dca CR3: 0001139ec000 CR4:
> 00750ee0
> [    1.310769] PKRU: 5554
> [    1.310770] Call Trace:
> [    1.310772]  amdgpu_ttm_gart_bind+0x79/0xc0 [amdgpu]
> [    1.310858]  amdgpu_ttm_alloc_gart+0x146/0x1a0 [amdgpu]
> [    1.310942]  amdgpu_bo_create_reserved.part.0+0xf8/0x1b0 [amdgpu]
> [    1.311025]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
> [    1.311145]  amdgpu_bo_create_kernel+0x3b/0xa0 [amdgpu]
> [    1.311229]  amdgpu_ttm_init.cold+0x165/0x17f [amdgpu]
> [    1.311349]  gmc_v10_0_sw_init+0x2dc/0x430 [amdgpu]
> [    1.311455]  amdgpu_device_init.cold+0x1544/0x1b54 [amdgpu]
> [    1.311570]  ? acpi_ns_get_node+0x4f/0x5a
> [    1.311574]  ? acpi_get_handle+0x8e/0xb7
> [    1.311576]  amdgpu_driver_load_kms+0x67/0x320 [amdgpu]
> [    1.311664]  amdgpu_pci_probe+0x1bc/0x290 [amdgpu]
> [    1.311750]  local_pci_probe+0x42/0x80
> [    1.311753]  ? __cond_resched+0x16/0x40
> [    1.311755]  pci_device_probe+0xfd/0x1b0
> [    1.311756]  really_probe+0xf2/0x460
> [    1.311759]  driver_probe_device+0xe8/0x160
> [    1.311760]  device_driver_attach+0xa1/0xb0
> [    1.311761]  __driver_attach+0x8f/0x150
> [    1.311763]  ? device_driver_attach+0xb0/0xb0
> [    1.311764]  ? device_driver_attach+0xb0/0xb0
> [    1.311765]  bus_for_each_dev+0x78/0xc0
> [    1.311766]  bus_add_driver+0x12b/0x1e0
> [    1.311768]  driver_register+0x8f/0xe0
> [    1.311769]  ? 0xc1828000
> [    1.311770]  do_one_initcall+0x44/0x1d0
> [    1.311772]  ? kmem_cache_alloc_trace+0x103/0x240
> [    1.311775]  do_init_module+0x5c/0x270
> [    1.311777]  __do_sys_finit_module+0xb1/0x110
> [    1.311779]  do_syscall_64+0x40/0xb0
> [    1.311781]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    1.311783] RIP: 0033:0x7f69fb094679
> [    1.311785] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00
> 00
> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08
> 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01
> 48
> [    

Re: amd-staging-drm-next breaks suspend

2022-01-19 Thread Bert Karwatzki
Reverting commit 72f686438de13f121c52f58d7445570a33dfdc61 does not
change the errors:
[1.310550] [ cut here ]
[1.310551] trying to bind memory to uninitialized GART !
[1.310556] WARNING: CPU: 9 PID: 252 at
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:254 amdgpu_gart_bind+0x2e/0x40
[amdgpu]
[1.310659] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit
drm_ttm_helper hid_sensor_hub ttm hid_generic nvme drm_kms_helper
nvme_core cec xhci_pci t10_pi r8169 rc_core crc32_pclmul crc_t10dif
i2c_hid_acpi realtek xhci_hcd psmouse crc32c_intel crct10dif_generic
i2c_hid amd_sfh mdio_devres crct10dif_pclmul drm i2c_piix4 usbcore
libphy crct10dif_common wmi button battery video fjes(-) hid
[1.310672] CPU: 9 PID: 252 Comm: systemd-udevd Not tainted 5.13.0+
#4
[1.310673] Hardware name: Micro-Star International Co., Ltd. Alpha
15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[1.310674] RIP: 0010:amdgpu_gart_bind+0x2e/0x40 [amdgpu]
[1.310762] Code: 00 80 bf 34 25 00 00 00 74 14 4c 8b 8f 20 25 00 00
4d 85 c9 74 05 e9 01 ff ff ff 31 c0 c3 48 c7 c7 68 36 dd c0 e8 86 db 19
e8 <0f> 0b b8 ea ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[1.310763] RSP: 0018:b19d00c33920 EFLAGS: 00010282
[1.310764] RAX:  RBX: 0067 RCX:
a9abb208
[1.310765] RDX:  RSI: efff RDI:
a9a63200
[1.310766] RBP: 985ce2a796c0 R08:  R09:
b19d00c33748
[1.310766] R10: b19d00c33740 R11: a9ad3248 R12:

[1.310766] R13: 985cd45a R14: 985cd45a R15:

[1.310767] FS:  7f69fabdc8c0() GS:985f9e64()
knlGS:
[1.310768] CS:  0010 DS:  ES:  CR0: 80050033
[1.310768] CR2: 7f69fabc5dca CR3: 0001139ec000 CR4:
00750ee0
[1.310769] PKRU: 5554
[1.310770] Call Trace:
[1.310772]  amdgpu_ttm_gart_bind+0x79/0xc0 [amdgpu]
[1.310858]  amdgpu_ttm_alloc_gart+0x146/0x1a0 [amdgpu]
[1.310942]  amdgpu_bo_create_reserved.part.0+0xf8/0x1b0 [amdgpu]
[1.311025]  ? amdgpu_ttm_debugfs_init+0x110/0x110 [amdgpu]
[1.311145]  amdgpu_bo_create_kernel+0x3b/0xa0 [amdgpu]
[1.311229]  amdgpu_ttm_init.cold+0x165/0x17f [amdgpu]
[1.311349]  gmc_v10_0_sw_init+0x2dc/0x430 [amdgpu]
[1.311455]  amdgpu_device_init.cold+0x1544/0x1b54 [amdgpu]
[1.311570]  ? acpi_ns_get_node+0x4f/0x5a
[1.311574]  ? acpi_get_handle+0x8e/0xb7
[1.311576]  amdgpu_driver_load_kms+0x67/0x320 [amdgpu]
[1.311664]  amdgpu_pci_probe+0x1bc/0x290 [amdgpu]
[1.311750]  local_pci_probe+0x42/0x80
[1.311753]  ? __cond_resched+0x16/0x40
[1.311755]  pci_device_probe+0xfd/0x1b0
[1.311756]  really_probe+0xf2/0x460
[1.311759]  driver_probe_device+0xe8/0x160
[1.311760]  device_driver_attach+0xa1/0xb0
[1.311761]  __driver_attach+0x8f/0x150
[1.311763]  ? device_driver_attach+0xb0/0xb0
[1.311764]  ? device_driver_attach+0xb0/0xb0
[1.311765]  bus_for_each_dev+0x78/0xc0
[1.311766]  bus_add_driver+0x12b/0x1e0
[1.311768]  driver_register+0x8f/0xe0
[1.311769]  ? 0xc1828000
[1.311770]  do_one_initcall+0x44/0x1d0
[1.311772]  ? kmem_cache_alloc_trace+0x103/0x240
[1.311775]  do_init_module+0x5c/0x270
[1.311777]  __do_sys_finit_module+0xb1/0x110
[1.311779]  do_syscall_64+0x40/0xb0
[1.311781]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[1.311783] RIP: 0033:0x7f69fb094679
[1.311785] Code: 48 8d 3d 9a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 57 0c 00 f7 d8 64 89 01 48
[1.311786] RSP: 002b:7ffce4131708 EFLAGS: 0246 ORIG_RAX:
0139
[1.311788] RAX: ffda RBX: 55d71350a3a0 RCX:
7f69fb094679
[1.311788] RDX:  RSI: 7f69fb234eed RDI:
0013
[1.311789] RBP: 0002 R08:  R09:
55d7134f3930
[1.311789] R10: 0013 R11: 0246 R12:
7f69fb234eed
[1.311790] R13:  R14: 55d7134da0f0 R15:
55d71350a3a0
[1.311791] ---[ end trace ff47998e3140e95d ]---
[1.311793] [drm:amdgpu_ttm_gart_bind [amdgpu]] *ERROR* failed to
bind 1 pages at 0x0040
[1.312100] amdgpu :03:00.0: amdgpu: 989bdfac bind
failed

and using https://patchwork.freedesktop.org/patch/469907/
gives this message:

[1.311502] [ cut here ]
[1.311502] WARNING: CPU: 9 PID: 221 at
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:244 amdgpu_gart_bind+0x16/0x20
[amdgpu]
[1.311602] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit
drm_ttm_helper hid_sensor_hub ttm hid_generic nvme xhci_pci
drm_kms_helper nvme_core t10_pi xhci_hcd crc_t10dif r8169 cec
crct10dif_generic i2c_hid_acpi amd_sfh rc_core crct10dif_pclmul realtek
i2c_hid crc32_pclmul 

Re: amd-staging-drm-next breaks suspend

2022-01-19 Thread Das, Nirmoy



On 1/19/2022 10:59 PM, Limonciello, Mario wrote:

[Public]


-Original Message-
From: Bert Karwatzki 
Sent: Wednesday, January 19, 2022 15:52
To: amd-gfx@lists.freedesktop.org
Cc: Limonciello, Mario ; Kazlauskas, Nicholas
; Zhuo, Qingqing (Lillian)
; Scott Bruce ; Alex Deucher
; Chris Hixon 
Subject: amd-staging-drm-next breaks suspend

I just tested amd-staging-drm-next with HEAD
f1b2924ee6929cb431440e6f961f06eb65d52beb:
Going into suspend leads to a hang again:
This is probably caused by
[ 1.310551] trying to bind memory to uninitialized GART !
and/or
[ 3.976438] trying to bind memory to uninitialized GART !
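
To check whether a given boot still produces the warning quoted above, the dmesg output can be scanned for it. The following is only a minimal sketch, not part of any kernel tooling: the helper name find_gart_warnings and the sample text are hypothetical, and it assumes timestamps of the form "[ 1.310551]" as they appear in this thread (optionally prefixed by mail quote markers).

```python
import re

# Warning text as it appears in the logs quoted in this thread.
GART_WARNING = "trying to bind memory to uninitialized GART"

# Matches "[ 1.310551] message" or "[1.310551] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_gart_warnings(dmesg_text):
    """Return (timestamp, message) pairs for lines containing the GART warning.

    Hypothetical helper for illustration only.
    """
    hits = []
    for line in dmesg_text.splitlines():
        # Drop leading mail quote markers ("> ") and surrounding whitespace.
        line = line.lstrip("> ").rstrip()
        match = LINE_RE.match(line)
        if match and GART_WARNING in match.group(2):
            hits.append((float(match.group(1)), match.group(2)))
    return hits


# Sample built from the two warning lines quoted above.
sample = """\
[1.310550] [ cut here ]
[1.310551] trying to bind memory to uninitialized GART !
> [ 3.976438] trying to bind memory to uninitialized GART !
"""

for timestamp, message in find_gart_warnings(sample):
    print(f"{timestamp:.6f}: {message}")
```

On a real system the input would come from the saved output of dmesg rather than the inline sample; an empty result after applying a patch would suggest the warning is gone for that boot, although it says nothing about whether suspend itself works.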

Could you please also try https://patchwork.freedesktop.org/patch/469907/ ?

Regards,

Nirmoy

+@Das, Nirmoy

The only thing that touched that file recently was
72f686438de13f121c52f58d7445570a33dfdc61

Could you see if backing that out helps?


Here's the complete dmesg:
[ 0.00] Linux version 5.13.0+ (bert@lisa) (gcc (Debian 11.2.0-14)
11.2.0, GNU ld (GNU Binutils for Debian) 2.37.50.20220106) #4 SMP Wed
Jan 19 22:19:19 CET 2022
[ 0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0+
root=UUID=78dcbf14-902d-49c0-9d4d-b7ad84550d9a ro
mt7921e.disable_aspm=1 quiet
[ 0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
point registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.00] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys
User registers'
[ 0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.00] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
[ 0.00] x86/fpu: Enabled xstate features 0x207, context size is 840
bytes, using 'compacted' format.
[ 0.00] BIOS-provided physical RAM map:
[ 0.00] BIOS-e820: [mem 0x-0x0009]
usable
[ 0.00] BIOS-e820: [mem 0x000a-0x000f]
reserved
[ 0.00] BIOS-e820: [mem 0x0010-0x09bfefff]
usable
[ 0.00] BIOS-e820: [mem 0x09bff000-0x0a000fff]
reserved
[ 0.00] BIOS-e820: [mem 0x0a001000-0x0a1f]
usable
[ 0.00] BIOS-e820: [mem 0x0a20-0x0a20efff] ACPI
NVS
[ 0.00] BIOS-e820: [mem 0x0a20f000-0xe9e1]
usable
[ 0.00] BIOS-e820: [mem 0xe9e2-0xeb33efff]
reserved
[ 0.00] BIOS-e820: [mem 0xeb33f000-0xeb39efff] ACPI
data
[ 0.00] BIOS-e820: [mem 0xeb39f000-0xeb556fff] ACPI
NVS
[ 0.00] BIOS-e820: [mem 0xeb557000-0xed17cfff]
reserved
[ 0.00] BIOS-e820: [mem 0xed17d000-0xed1fefff] type
20
[ 0.00] BIOS-e820: [mem 0xed1ff000-0xedff]
usable
[ 0.00] BIOS-e820: [mem 0xee00-0xf7ff]
reserved
[ 0.00] BIOS-e820: [mem 0xfd00-0xfdff]
reserved
[ 0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfec1-0xfec10fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed0-0xfed00fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed4-0xfed44fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfed8-0xfed8]
reserved
[ 0.00] BIOS-e820: [mem 0xfedc4000-0xfedc9fff]
reserved
[ 0.00] BIOS-e820: [mem 0xfedcc000-0xfedcefff]
reserved
[ 0.00] BIOS-e820: [mem 0xfedd5000-0xfedd5fff]
reserved
[ 0.00] BIOS-e820: [mem 0xff00-0x]
reserved
[ 0.00] BIOS-e820: [mem 0x0001-0x0003ee2f]
usable
[ 0.00] BIOS-e820: [mem 0x0003ee30-0x00040fff]
reserved
[ 0.00] NX (Execute Disable) protection: active
[ 0.00] efi: EFI v2.70 by American Megatrends
[ 0.00] efi: ACPI=0xeb54 ACPI 2.0=0xeb540014
TPMFinalLog=0xeb50c000 SMBIOS=0xed02 SMBIOS 3.0=0xed01f000
MEMATTR=0xe6fa3018 ESRT=0xe87cb918 MOKvar=0xe6fa
[ 0.00] SMBIOS 3.3.0 present.
[ 0.00] DMI: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-
158L, BIOS E158LAMS.107 11/10/2021
[ 0.00] tsc: Fast TSC calibration using PIT
[ 0.00] tsc: Detected 3194.034 MHz processor
[ 0.000125] e820: update [mem 0x-0x0fff] usable ==>
reserved
[ 0.000126] e820: remove [mem 0x000a-0x000f] usable
[ 0.000131] last_pfn = 0x3ee300 max_arch_pfn = 0x4
[ 0.000363] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000577] e820: update [mem 0xf000-0x] usable ==>
reserved
[ 0.000582] last_pfn = 0xee000 max_arch_pfn = 0x4
[ 0.003213] esrt: Reserving ESRT space from 0xe87cb918 to
0xe87cb950.
[ 0.003217] e820: update [mem 0xe87cb000-0xe87cbfff] usable ==>
reserved
[ 0.003225] e820: update [mem 0xe6fa-0xe6fa2fff] usable ==>
reserved
[ 0.003235] Using GB pages for direct mapping
[ 0.003498] Secure boot disabled
[ 0.003499] RAMDISK: [mem 

RE: amd-staging-drm-next breaks suspend

2022-01-19 Thread Limonciello, Mario
[Public]

> -Original Message-
> From: Bert Karwatzki 
> Sent: Wednesday, January 19, 2022 15:52
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ; Kazlauskas, Nicholas
> ; Zhuo, Qingqing (Lillian)
> ; Scott Bruce ; Alex Deucher
> ; Chris Hixon 
> Subject: amd-staging-drm-next breaks suspend
> 
> I just tested amd-staging-drm-next with HEAD
> f1b2924ee6929cb431440e6f961f06eb65d52beb:
> Going into suspend leads to a hang again:
> This is probably caused by
> [ 1.310551] trying to bind memory to uninitialized GART !
> and/or
> [ 3.976438] trying to bind memory to uninitialized GART !
> 

+@Das, Nirmoy

The only thing that touched that file recently was
72f686438de13f121c52f58d7445570a33dfdc61

Could you see if backing that out helps?

> 
> Here's the complete dmesg:
> [ 0.00] Linux version 5.13.0+ (bert@lisa) (gcc (Debian 11.2.0-14)
> 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.50.20220106) #4 SMP Wed
> Jan 19 22:19:19 CET 2022
> [ 0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0+
> root=UUID=78dcbf14-902d-49c0-9d4d-b7ad84550d9a ro
> mt7921e.disable_aspm=1 quiet
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
> point registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [ 0.00] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys
> User registers'
> [ 0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.00] x86/fpu: xstate_offset[9]: 832, xstate_sizes[9]: 8
> [ 0.00] x86/fpu: Enabled xstate features 0x207, context size is 840
> bytes, using 'compacted' format.
> [ 0.00] BIOS-provided physical RAM map:
> [ 0.00] BIOS-e820: [mem 0x-0x0009]
> usable
> [ 0.00] BIOS-e820: [mem 0x000a-0x000f]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0010-0x09bfefff]
> usable
> [ 0.00] BIOS-e820: [mem 0x09bff000-0x0a000fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0a001000-0x0a1f]
> usable
> [ 0.00] BIOS-e820: [mem 0x0a20-0x0a20efff] ACPI
> NVS
> [ 0.00] BIOS-e820: [mem 0x0a20f000-0xe9e1]
> usable
> [ 0.00] BIOS-e820: [mem 0xe9e2-0xeb33efff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xeb33f000-0xeb39efff] ACPI
> data
> [ 0.00] BIOS-e820: [mem 0xeb39f000-0xeb556fff] ACPI
> NVS
> [ 0.00] BIOS-e820: [mem 0xeb557000-0xed17cfff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xed17d000-0xed1fefff] type
> 20
> [ 0.00] BIOS-e820: [mem 0xed1ff000-0xedff]
> usable
> [ 0.00] BIOS-e820: [mem 0xee00-0xf7ff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfd00-0xfdff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfec1-0xfec10fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed0-0xfed00fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed4-0xfed44fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfed8-0xfed8]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedc4000-0xfedc9fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedcc000-0xfedcefff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xfedd5000-0xfedd5fff]
> reserved
> [ 0.00] BIOS-e820: [mem 0xff00-0x]
> reserved
> [ 0.00] BIOS-e820: [mem 0x0001-0x0003ee2f]
> usable
> [ 0.00] BIOS-e820: [mem 0x0003ee30-0x00040fff]
> reserved
> [ 0.00] NX (Execute Disable) protection: active
> [ 0.00] efi: EFI v2.70 by American Megatrends
> [ 0.00] efi: ACPI=0xeb54 ACPI 2.0=0xeb540014
> TPMFinalLog=0xeb50c000 SMBIOS=0xed02 SMBIOS 3.0=0xed01f000
> MEMATTR=0xe6fa3018 ESRT=0xe87cb918 MOKvar=0xe6fa
> [ 0.00] SMBIOS 3.3.0 present.
> [ 0.00] DMI: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-
> 158L, BIOS E158LAMS.107 11/10/2021
> [ 0.00] tsc: Fast TSC calibration using PIT
> [ 0.00] tsc: Detected 3194.034 MHz processor
> [ 0.000125] e820: update [mem 0x-0x0fff] usable ==>
> reserved
> [ 0.000126] e820: remove [mem 0x000a-0x000f] usable
> [ 0.000131] last_pfn = 0x3ee300 max_arch_pfn = 0x4
> [ 0.000363] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> [ 0.000577] e820: update [mem 0xf000-0x] usable ==>
> reserved
> [ 0.000582] last_pfn = 0xee000 max_arch_pfn = 0x4
> [ 0.003213] esrt: Reserving ESRT space from 0xe87cb918 to
> 0xe87cb950.
> [ 0.003217] e820: update [mem 0xe87cb000-0xe87cbfff] usable ==>
> reserved
> [ 0.003225] e820: update [mem 0xe6fa-0xe6fa2fff] usable ==>
> reserved
> [ 0.003235] Using GB pages for