https://bugs.freedesktop.org/show_bug.cgi?id=105018

            Bug ID: 105018
           Summary: Kernel panic when waking up after screen goes blank.
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: ragnaros39...@yandex.com

I'm currently running the latest Manjaro XFCE with the just-released 4.15
kernel, and I found that the system crashes when I try to wake it up after the
screen has gone blank.

The system is an AMD laptop (ASUS ROG STRIX GL702ZC), and the problem is 100%
reproducible with the following steps:

- Lock the screen and leave it blank for at least 3-5 minutes.
- Try to wake the screen up, e.g. by moving the mouse cursor.

At first I could not find the cause, but after looking through the journalctl
output I found what appears to be a kernel panic. The problem has been present
from the start, with the 4.14 kernel, and remains unsolved after upgrading to
the 4.15 kernel.

Feb 07 11:48:59 linuxsys kernel: BUG: unable to handle kernel NULL pointer
dereference at           (null)
Feb 07 11:48:59 linuxsys kernel: IP: dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 07 11:48:59 linuxsys kernel: PGD 7e2ac2067 P4D 7e2ac2067 PUD 7e2a7e067 PMD
0 
Feb 07 11:48:59 linuxsys kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 07 11:48:59 linuxsys kernel: Modules linked in: vmw_vsock_vmci_transport
vsock rfcomm fuse bnep vmnet(O) arc4 amdkfd nls_iso8859_1 amd_iommu_v2
nls_cp437 vfat fat amdgpu iwlmvm uvcvideo mac80211 videobuf2_vmalloc
edac_mce_amd btusb vide
Feb 07 11:48:59 linuxsys kernel:  rng_core cryptd pcspkr k10temp i2c_piix4
shpchp battery wmi thermal ac tpm_crb tpm_tis tpm_tis_core video tpm
asus_wireless i2c_hid button acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci
vboxnetflt(O) vboxnetad
Feb 07 11:48:59 linuxsys kernel: CPU: 15 PID: 1467 Comm: xfwm4 Tainted: G      
 W  O     4.15.0-1-MANJARO #1
Feb 07 11:48:59 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC.
GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 07 11:48:59 linuxsys kernel: RIP: 0010:dce110_vblank_set+0x4f/0xb0 [amdgpu]
Feb 07 11:48:59 linuxsys kernel: RSP: 0018:ffffb4e388c7bbe0 EFLAGS: 00010002
Feb 07 11:48:59 linuxsys kernel: RAX: ffff9b458850c000 RBX: 0000000000000001
RCX: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: RDX: 0000000000000000 RSI: 000000000000000c
RDI: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: RBP: ffff9b4b2f4168e0 R08: 0000000000000000
R09: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: R10: 00007fff89afe9f0 R11: ffff9b4b2b86ac40
R12: ffff9b4b38511a80
Feb 07 11:48:59 linuxsys kernel: R13: ffffffffc12bbba0 R14: ffff9b4b281f0000
R15: ffff9b4b3ab4cb68
Feb 07 11:48:59 linuxsys kernel: FS:  00007f0bdae66980(0000)
GS:ffff9b4b3e9c0000(0000) knlGS:0000000000000000
Feb 07 11:48:59 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Feb 07 11:48:59 linuxsys kernel: CR2: 0000000000000000 CR3: 00000007d96c8000
CR4: 00000000003406e0
Feb 07 11:48:59 linuxsys kernel: Call Trace:
Feb 07 11:48:59 linuxsys kernel:  amdgpu_dm_set_crtc_irq_state+0x31/0x60
[amdgpu]
Feb 07 11:48:59 linuxsys kernel:  amdgpu_irq_update+0x55/0x90 [amdgpu]
Feb 07 11:48:59 linuxsys kernel:  drm_vblank_enable+0x84/0x100 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x690 [drm]
Feb 07 11:48:59 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 07 11:48:59 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100
[drm]
Feb 07 11:48:59 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 07 11:48:59 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 07 11:48:59 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100
[drm]
Feb 07 11:48:59 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 07 11:48:59 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 07 11:48:59 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 07 11:48:59 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 07 11:48:59 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 07 11:48:59 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 07 11:48:59 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 07 11:48:59 linuxsys kernel:  entry_SYSCALL_64_fastpath+0x20/0x83
Feb 07 11:48:59 linuxsys kernel: RIP: 0033:0x7f0bd74b3d87
Feb 07 11:48:59 linuxsys kernel: RSP: 002b:00007fff89afea38 EFLAGS: 00000246
Feb 07 11:48:59 linuxsys kernel: Code: e8 17 20 04 00 83 e8 4e 0f b6 d0 48 89
d0 48 c1 e0 05 48 01 d0 48 c1 e0 05 49 03 86 60 01 00 00 84 db 48 8b b8 78 02
00 00 74 18 <48> 8b 07 be 02 00 00 00 48 8b 80 d8 00 00 00 e8 6d 43 7e ee 84 
Feb 07 11:48:59 linuxsys kernel: RIP: dce110_vblank_set+0x4f/0xb0 [amdgpu] RSP:
ffffb4e388c7bbe0
Feb 07 11:48:59 linuxsys kernel: CR2: 0000000000000000
Feb 07 11:48:59 linuxsys kernel: ---[ end trace 36522610c84ff0f3 ]---
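
Entries like the ones quoted in this report can be picked out of a long kernel
log with a small filter. The following is only a sketch: the signature list is
taken from the messages in this report, and the names SIGNATURES, interesting
and SAMPLE are made up for illustration.

```python
import re

# Signatures taken from the kernel messages quoted in this report.
SIGNATURES = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference"
    r"|Oops: "
    r"|WARNING: CPU: "
    r"|dc_stream_state is NULL"
)


def interesting(lines):
    """Keep only the kernel log lines that match one of the signatures."""
    return [ln for ln in lines if SIGNATURES.search(ln)]


# A few lines from this report, with the mail line wrapping removed.
SAMPLE = [
    "Feb 07 11:38:04 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}",
    "Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] "
    "*ERROR* dc_stream_state is NULL for crtc '1'!",
    "Feb 07 11:48:59 linuxsys kernel: BUG: unable to handle kernel NULL "
    "pointer dereference at           (null)",
    "Feb 07 11:48:59 linuxsys kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI",
    "Feb 07 11:48:59 linuxsys kernel: Call Trace:",
]

hits = interesting(SAMPLE)
for line in hits:
    print(line)
```

On a real system the input could be the output of
"journalctl -k -b -1 --no-pager" (kernel messages from the previous boot),
assuming the journal is persistent across reboots.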

The fault occurs at dce110_vblank_set+0x4f/0xb0 [amdgpu] (the RIP above), and
the topmost call trace entry is amdgpu_dm_set_crtc_irq_state+0x31/0x60 [amdgpu].
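
The "Code:" line of the oops points the same way. The kernel marks the first
byte of the faulting instruction with <..>, so a minimal sketch like the one
below can split the dump at that point (the names CODE and split_at_fault are
made up for illustration). By my reading of the x86-64 encoding, the faulting
bytes 48 8b 07 are "mov (%rdi),%rax", and the register dump shows RDI = 0,
which matches CR2 = 0. The bytes just before it, 48 8b b8 78 02 00 00, would
be "mov 0x278(%rax),%rdi", so the NULL pointer appears to have been loaded
from offset 0x278 of some structure; which field that is cannot be told from
the dump alone.

```python
# "Code:" bytes copied from the oops in this report (mail wrapping removed).
CODE = (
    "e8 17 20 04 00 83 e8 4e 0f b6 d0 48 89 d0 48 c1 e0 05 48 01 d0 48 c1 e0 "
    "05 49 03 86 60 01 00 00 84 db 48 8b b8 78 02 00 00 74 18 <48> 8b 07 be "
    "02 00 00 00 48 8b 80 d8 00 00 00 e8 6d 43 7e ee 84"
)


def split_at_fault(code):
    """Return (bytes before the fault, bytes from the faulting instruction on)."""
    tokens = code.split()
    # The first byte of the faulting instruction is wrapped in <..>.
    idx = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    clean = [tok.strip("<>") for tok in tokens]
    return clean[:idx], clean[idx:]


before, after = split_at_fault(CODE)
faulting = " ".join(after[:3])
preceding = " ".join(before[-9:])

print(faulting)   # 48 8b 07
print(preceding)  # 48 8b b8 78 02 00 00 74 18
```

For a full disassembly, the kernel source tree ships scripts/decodecode, which
reads an oops on standard input.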

The bug report here, which was closed last December, resembles my current
issue:
https://lists.freedesktop.org/archives/amd-gfx/2017-November/016236.html

I considered whether it might be DC-related, as I had seen similar bug
reports, but that seems wrong: at one point I was able to reproduce it even
after passing amdgpu.dc=0 at boot. The modules don't seem to be related either,
as it also happened on fresh installs, where I left the screen blank (before I
had adjusted the power management options) while the packages I wanted were
downloading and installing in the background.

Additionally, I found some further errors logged before the crash, which might
have been triggered when the screen went blank. They can be reproduced by
simply locking the screen and leaving it as is. (NOTE: When I locked the screen
and then immediately moved the mouse cursor to wake it up, the crash did not
occur. It only occurs if the screen has been blank for at least 3-5 minutes.)

Feb 07 11:38:04 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}
Feb 07 11:38:12 linuxsys kernel: [drm] RBRx2 pass VS=1, PE=0
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: WARNING: CPU: 12 PID: 1467 at
drivers/gpu/drm/drm_vblank.c:612
drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 07 11:38:12 linuxsys kernel: Modules linked in: vmw_vsock_vmci_transport
vsock rfcomm fuse bnep vmnet(O) arc4 amdkfd nls_iso8859_1 amd_iommu_v2
nls_cp437 vfat fat amdgpu iwlmvm uvcvideo mac80211 videobuf2_vmalloc
edac_mce_amd btusb vide
Feb 07 11:38:12 linuxsys kernel:  rng_core cryptd pcspkr k10temp i2c_piix4
shpchp battery wmi thermal ac tpm_crb tpm_tis tpm_tis_core video tpm
asus_wireless i2c_hid button acpi_cpufreq sch_fq_codel vmmon(O) vmw_vmci
vboxnetflt(O) vboxnetad
Feb 07 11:38:12 linuxsys kernel: CPU: 12 PID: 1467 Comm: xfwm4 Tainted: G      
    O     4.15.0-1-MANJARO #1
Feb 07 11:38:12 linuxsys kernel: Hardware name: ASUSTeK COMPUTER INC.
GL702ZC/GL702ZC, BIOS GL702ZC.303 12/15/2017
Feb 07 11:38:12 linuxsys kernel: RIP:
0010:drm_calc_vbltimestamp_from_scanoutpos+0x2c5/0x340 [drm]
Feb 07 11:38:12 linuxsys kernel: RSP: 0018:ffffb4e388c7bb50 EFLAGS: 00010086
Feb 07 11:38:12 linuxsys kernel: RAX: ffffffffc12b04c0 RBX: ffff9b4b3ab4c800
RCX: 0000000000000001
Feb 07 11:38:12 linuxsys kernel: RDX: ffffffffc0941068 RSI: 0000000000000001
RDI: ffffffffc093f0d8
Feb 07 11:38:12 linuxsys kernel: RBP: ffffb4e388c7bbb8 R08: 0000000000000000
R09: ffffffffc09214a0
Feb 07 11:38:12 linuxsys kernel: R10: ffffffffc10d6320 R11: ffffffffb056c36d
R12: 0000000000000001
Feb 07 11:38:12 linuxsys kernel: R13: ffffb4e388c7bbcc R14: ffffb4e388c7bc00
R15: ffff9b4b2ba84000
Feb 07 11:38:12 linuxsys kernel: FS:  00007f0bdae66980(0000)
GS:ffff9b4b3e900000(0000) knlGS:0000000000000000
Feb 07 11:38:12 linuxsys kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Feb 07 11:38:12 linuxsys kernel: CR2: 00007f9ee41080b0 CR3: 00000007d96c8000
CR4: 00000000003406e0
Feb 07 11:38:12 linuxsys kernel: Call Trace:
Feb 07 11:38:12 linuxsys kernel:  drm_get_last_vbltimestamp+0x54/0x90 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_update_vblank_count+0x77/0x250 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_vblank_enable+0xbd/0x100 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_vblank_get+0x8d/0xb0 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_wait_vblank_ioctl+0x12a/0x690 [drm]
Feb 07 11:38:12 linuxsys kernel:  ? unix_stream_recvmsg+0x53/0x70
Feb 07 11:38:12 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100
[drm]
Feb 07 11:38:12 linuxsys kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Feb 07 11:38:12 linuxsys kernel:  drm_ioctl+0x2d5/0x370 [drm]
Feb 07 11:38:12 linuxsys kernel:  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100
[drm]
Feb 07 11:38:12 linuxsys kernel:  ? do_iter_write+0xdc/0x190
Feb 07 11:38:12 linuxsys kernel:  ? vfs_writev+0xb9/0x110
Feb 07 11:38:12 linuxsys kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Feb 07 11:38:12 linuxsys kernel:  do_vfs_ioctl+0xa4/0x630
Feb 07 11:38:12 linuxsys kernel:  ? __sys_recvmsg+0x4e/0x90
Feb 07 11:38:12 linuxsys kernel:  ? __sys_recvmsg+0x7d/0x90
Feb 07 11:38:12 linuxsys kernel:  SyS_ioctl+0x74/0x80
Feb 07 11:38:12 linuxsys kernel:  entry_SYSCALL_64_fastpath+0x20/0x83
Feb 07 11:38:12 linuxsys kernel: RIP: 0033:0x7f0bd74b3d87
Feb 07 11:38:12 linuxsys kernel: RSP: 002b:00007fff89afea38 EFLAGS: 00000246
Feb 07 11:38:12 linuxsys kernel: Code: e1 48 c7 c2 68 10 94 c0 be 01 00 00 00
48 c7 c7 d8 f0 93 c0 e8 1d 66 fe ff 48 8b 83 98 03 00 00 48 83 78 20 00 0f 84
6f fd ff ff <0f> ff e9 68 fd ff ff 48 c7 c2 30 10 94 c0 31 f6 48 c7 c7 d5 f0 
Feb 07 11:38:12 linuxsys kernel: ---[ end trace 36522610c84ff0f2 ]---
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_crtc_get_scanoutpos [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
Feb 07 11:38:20 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] *ERROR*
dc_stream_state is NULL for crtc '1'!
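
For reference, the timestamps in the two excerpts give the gap between the
first dc_stream_state error and the NULL pointer dereference. A small sketch of
the arithmetic, assuming journalctl's default short timestamp format (which
carries no year); the helper name stamp is made up for illustration:

```python
from datetime import datetime

FMT = "%b %d %H:%M:%S"  # e.g. "Feb 07 11:38:12", as in the excerpts


def stamp(line):
    """Parse the 15-character timestamp at the start of a journal line."""
    return datetime.strptime(line[:15], FMT)


# Lines from this report, with the mail line wrapping removed.
modeline = "Feb 07 11:38:04 linuxsys kernel: [drm] {1920x1080, 2080x1111@138700Khz}"
first_error = (
    "Feb 07 11:38:12 linuxsys kernel: [drm:dm_vblank_get_counter [amdgpu]] "
    "*ERROR* dc_stream_state is NULL for crtc '1'!"
)
oops = (
    "Feb 07 11:48:59 linuxsys kernel: BUG: unable to handle kernel NULL "
    "pointer dereference at           (null)"
)

to_first_error = stamp(first_error) - stamp(modeline)
to_oops = stamp(oops) - stamp(first_error)

print(to_first_error)  # 0:00:08
print(to_oops)         # 0:10:47
```

So in this run the errors started 8 seconds after the mode line was logged, and
the crash followed 10 minutes 47 seconds after the first error.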

For now, the only way I can prevent the panic is to disable the power saving
functions, especially anything that would turn off the screen. I also can't
lock the screen, as that would blank it too, and the GTK+ greeter can blank the
screen in its own way.

However, it's not feasible to run the system on battery for an extended period
without any power saving features, given the laptop's high total TDP, and
leaving the system unlocked is a bad idea in terms of security and privacy.

By the way, given that a few similar bug reports were closed in the past, I
believe the problem might be a regression.

