Re: [PATCH] drm/amdgpu: fix the hw hang during perform system reboot and reset
On 2020 Apr 13, Prike Liang wrote:
> Unify set device CGPG to ungate state before enter poweroff or reset.
>
> Signed-off-by: Prike Liang
> Tested-by: Mengbing Wang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 87f7c12..bbe090a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2413,6 +2413,8 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>  {
>      int i, r;
>
> +    amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> +    amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
>
>      for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
>          if (!adev->ip_blocks[i].status.valid)
> --
> 2.7.4
>

I can confirm that this fixes the shutdown/reboot hang on my raven.

--
Regards,
Johannes Hirte

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

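For readers following along, here is a minimal userspace sketch of the ordering the patch restores: gating is lifted first, and only then are the IP blocks walked in reverse order. Everything in it (the mock_ types, MOCK_MAX_IP_BLOCKS, and the idea that suspending under active gating fails with -1) is a hypothetical stand-in chosen for illustration. It is not the amdgpu implementation and does not model the actual hardware hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_IP_BLOCKS 4

/* Hypothetical stand-ins; none of these names exist in amdgpu. */
enum mock_gate_state { MOCK_GATED, MOCK_UNGATED };

struct mock_ip_block {
    bool valid;
    bool suspended;
};

struct mock_device {
    enum mock_gate_state pg_state;
    enum mock_gate_state cg_state;
    int num_ip_blocks;
    struct mock_ip_block ip_blocks[MOCK_MAX_IP_BLOCKS];
};

/* Assumption for the sketch: touching a block while gating is still
 * active is reported as an error instead of hanging. */
static int mock_ip_block_suspend(struct mock_device *dev, int i)
{
    if (dev->pg_state != MOCK_UNGATED || dev->cg_state != MOCK_UNGATED)
        return -1;
    dev->ip_blocks[i].suspended = true;
    return 0;
}

int mock_suspend_phase1(struct mock_device *dev)
{
    int i, r;

    assert(dev != NULL);
    assert(dev->num_ip_blocks <= MOCK_MAX_IP_BLOCKS);

    /* Ungate power gating and clock gating before any block is touched. */
    dev->pg_state = MOCK_UNGATED;
    dev->cg_state = MOCK_UNGATED;

    /* Walk the blocks in reverse order, skipping invalid ones. */
    for (i = dev->num_ip_blocks - 1; i >= 0; i--) {
        if (!dev->ip_blocks[i].valid)
            continue;
        r = mock_ip_block_suspend(dev, i);
        if (r)
            return r;
    }
    return 0;
}
```

In this toy model, every caller of mock_suspend_phase1() gets the ungating for free, which mirrors why placing the two calls inside the shared phase-1 helper covers more than one entry path.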
Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)
On 2020 Apr 12, Liang, Prike wrote:
> Thanks update and verify. Could you give more detail information and error log message
> about you observed issue?
>
> Thanks,
> Prike

There is no error log; the system just doesn't power off or reboot.

lspci:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev d1)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
05:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

--
Regards,
Johannes Hirte

Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)
On 2020 Apr 07, Prike Liang wrote:
> The system hangs during S3 suspend because the SMU is left waiting on
> the GC, which does not respond to the CP_HQD_ACTIVE register access request.
> The root cause is accessing the GC register while GFX CGPG is entered; it
> can be fixed by disabling GFX CGPG before performing suspend.
>
> v2: Disable GFX CGPG instead of using the RLC safe mode guard.
>
> Signed-off-by: Prike Liang
> Tested-by: Mengbing Wang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2e1f955..bf8735b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2440,8 +2440,6 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>  {
>      int i, r;
>
> -    amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> -    amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
>
>      for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
>          if (!adev->ip_blocks[i].status.valid)
> @@ -3470,6 +3468,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>          }
>      }
>
> +    amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> +    amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> +
>      amdgpu_amdkfd_suspend(adev, !fbcon);
>
>      amdgpu_ras_suspend(adev);

This breaks shutdown/reboot on my system (Dell Latitude 5495).

--
Regards,
Johannes Hirte

Re: [PATCH xf86-video-amdgpu] Store FB for each CRTC in drmmode_flipdata_rec
On 2018 Aug 16, Michel Dänzer wrote:
> On 2018-08-10 09:06 AM, Johannes Hirte wrote:
> > On 2018 Jul 27, Michel Dänzer wrote:
> >> From: Michel Dänzer
> >>
> >> We were only storing the FB provided by the client, but on CRTCs with
> >> TearFree enabled, we use a separate FB. This could cause
> >> drmmode_flip_handler to fail to clear drmmode_crtc->flip_pending, which
> >> could result in a hang when waiting for the pending flip to complete. We
> >> were trying to avoid that by always clearing drmmode_crtc->flip_pending
> >> when TearFree is enabled, but that wasn't reliable, because
> >> drmmode_crtc->tear_free can already be FALSE at this point when
> >> disabling TearFree.
> >>
> >> Now that we're keeping track of each CRTC's flip FB separately,
> >> drmmode_flip_handler can reliably clear flip_pending, and we no longer
> >> need the TearFree hack.
> >>
> >> Signed-off-by: Michel Dänzer
> >
> > Since this change I get a black screen when logging into KDE Plasma. I
> > have to switch to the Linux console and back to get the X11 screen.
> > Additionally, the Xorg.log is spammed with:
> >
> > [ 189.744] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
> > [ 189.828] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Device or resource busy, TearFree inactive until next modeset
> > [ 189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
> > [ 189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
> >
> > The "flip queue failed" message appears only once; the other two appear
> > much more often.
> >
> > System is a Carrizo A10-8700B, kernel 4.17.13 + this patch:
> > https://bugzilla.kernel.org/attachment.cgi?id=276173
>
> Does https://patchwork.freedesktop.org/patch/244860/ fix it?
>

Yes, this fixed it.

--
Regards,
Johannes

Re: [PATCH xf86-video-amdgpu] Store FB for each CRTC in drmmode_flipdata_rec
On 2018 Jul 27, Michel Dänzer wrote:
> From: Michel Dänzer
>
> We were only storing the FB provided by the client, but on CRTCs with
> TearFree enabled, we use a separate FB. This could cause
> drmmode_flip_handler to fail to clear drmmode_crtc->flip_pending, which
> could result in a hang when waiting for the pending flip to complete. We
> were trying to avoid that by always clearing drmmode_crtc->flip_pending
> when TearFree is enabled, but that wasn't reliable, because
> drmmode_crtc->tear_free can already be FALSE at this point when
> disabling TearFree.
>
> Now that we're keeping track of each CRTC's flip FB separately,
> drmmode_flip_handler can reliably clear flip_pending, and we no longer
> need the TearFree hack.
>
> Signed-off-by: Michel Dänzer

Since this change I get a black screen when logging into KDE Plasma. I
have to switch to the Linux console and back to get the X11 screen.
Additionally, the Xorg.log is spammed with:

[ 189.744] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[ 189.828] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Device or resource busy, TearFree inactive until next modeset
[ 189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
[ 189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument

The "flip queue failed" message appears only once; the other two appear
much more often.

System is a Carrizo A10-8700B, kernel 4.17.13 + this patch:
https://bugzilla.kernel.org/attachment.cgi?id=276173

--
Regards,
Johannes

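To make the commit message's idea concrete, here is a small sketch of per-CRTC flip FB tracking. It only illustrates the bookkeeping described above: the flip data remembers which FB was queued on each CRTC, so the handler can compare against the right one even when a TearFree CRTC uses its own FB. All mock_ names and MOCK_MAX_CRTCS are assumptions made for this example; the real drmmode_flipdata_rec layout and the driver's FB reference counting are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_CRTCS 4

/* Hypothetical stand-ins for the driver structures. */
struct mock_fb {
    int id;
};

struct mock_crtc {
    int index;                     /* position of this CRTC in the flip data */
    struct mock_fb *flip_pending;  /* FB of the flip still in flight, if any */
};

struct mock_flipdata {
    /* One slot per CRTC instead of a single client FB. */
    struct mock_fb *fb[MOCK_MAX_CRTCS];
};

/* Record the FB actually flipped to on this CRTC (client FB or a
 * separate TearFree FB) and mark the flip as pending. */
void mock_queue_flip(struct mock_flipdata *flipdata, struct mock_crtc *crtc,
                     struct mock_fb *fb)
{
    assert(crtc->index >= 0 && crtc->index < MOCK_MAX_CRTCS);
    flipdata->fb[crtc->index] = fb;
    crtc->flip_pending = fb;
}

/* On completion, clear flip_pending only if it matches the FB that was
 * recorded for this particular CRTC. */
void mock_flip_handler(const struct mock_flipdata *flipdata,
                       struct mock_crtc *crtc)
{
    assert(crtc->index >= 0 && crtc->index < MOCK_MAX_CRTCS);
    if (crtc->flip_pending == flipdata->fb[crtc->index])
        crtc->flip_pending = NULL;
}
```

With a single shared FB pointer, the comparison for a TearFree CRTC would be made against the client FB and could fail, leaving flip_pending set; the per-CRTC slot avoids that in this model.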
Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 14, Grodzovsky, Andrey wrote:
> To be sure it was inserted at the correct place please send me output of git
> diff on your modified branch.
>
> Thanks,
> Andrey
>

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index bb5fa895fb64..bc2050a5a5c6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,7 +4802,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
      * synchronization events.
      */

-    if (lock_and_validation_needed) {
+    if (lock_and_validation_needed || state->legacy_cursor_update == true) {
         ret = do_aquire_global_lock(dev, state);
         if (ret)

In case it matters: I've applied the patch on top of 4.15-rc7 together with
your "Fix: Save job's priority on it's creation instead of accessing it from
s_entity later on." patch. That one is still not upstream, but without it I
see the other use-after-free.

--
Regards,
Johannes

Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 12, Andrey Grodzovsky wrote:
> Yea, I know , just dumped diff of one file into it, please search in
> code for
>
> "ret = do_aquire_global_lock(dev, state);" it appears only in one place
> in entire code base, and manually apply the one line change.
>

with patch applied:

[ 6887.679618] [drm] {1920x1080, 2250x1132@152840Khz}
[ 6887.806430] [drm] HBRx2 pass VS=1, PE=0
[12432.070076] [drm] {1920x1080, 2250x1132@152840Khz}
[12432.194472] [drm] HBRx2 pass VS=1, PE=0
[13677.257767] ==
[13677.257812] BUG: KASAN: use-after-free in drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257820] Read of size 8 at addr 8803f0533388 by task kworker/u8:6/22172

[13677.257832] CPU: 2 PID: 22172 Comm: kworker/u8:6 Not tainted 4.15.0-rc7-2-g617b2907a7aa #445
[13677.257837] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017
[13677.257848] Workqueue: events_unbound commit_work
[13677.257853] Call Trace:
[13677.257867] dump_stack+0x99/0x11e
[13677.257874] ? _atomic_dec_and_lock+0x152/0x152
[13677.257886] print_address_description+0x65/0x270
[13677.257892] kasan_report+0x272/0x360
[13677.257898] ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257903] drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257913] amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
[13677.257923] ? dm_crtc_duplicate_state+0x130/0x130
[13677.257931] ? trace_raw_output_rcu_utilization+0xa0/0xa0
[13677.257939] ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
[13677.257945] commit_tail+0x92/0xe0
[13677.257953] process_one_work+0x84b/0x1600
[13677.257961] ? tick_nohz_dep_clear_signal+0x20/0x20
[13677.257969] ? _raw_spin_unlock_irq+0xbe/0x120
[13677.257973] ? _raw_spin_unlock+0x120/0x120
[13677.257977] ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[13677.257984] ? arch_vtime_task_switch+0xee/0x190
[13677.257991] ? finish_task_switch+0x27d/0x7f0
[13677.257995] ? wq_worker_waking_up+0xc0/0xc0
[13677.258000] ? copy_overflow+0x20/0x20
[13677.258010] ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258014] ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258022] ? schedule+0xfb/0x3b0
[13677.258027] ? __schedule+0x19b0/0x19b0
[13677.258031] ? preempt_schedule_common+0x30/0xb0
[13677.258038] ? ___preempt_schedule+0x16/0x18
[13677.258043] ? _raw_spin_unlock_irq+0xfa/0x120
[13677.258047] ? _raw_spin_unlock+0x120/0x120
[13677.258052] worker_thread+0x211/0x1790
[13677.258060] ? pick_next_task_fair+0x313/0x10f0
[13677.258065] ? trace_event_raw_event_workqueue_work+0x170/0x170
[13677.258073] ? cyc2ns_read_end+0x20/0x20
[13677.258078] ? tick_nohz_dep_clear_signal+0x20/0x20
[13677.258083] ? get_vtime_delta+0x16/0xd0
[13677.258087] ? _raw_spin_unlock_irq+0xbe/0x120
[13677.258091] ? _raw_spin_unlock+0x120/0x120
[13677.258098] ? finish_task_switch+0x27d/0x7f0
[13677.258104] ? sched_clock_cpu+0x18/0x1e0
[13677.258110] ? ret_from_fork+0x1f/0x30
[13677.258116] ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258120] ? get_vtime_delta+0x16/0xd0
[13677.258125] ? cyc2ns_read_end+0x20/0x20
[13677.258131] ? schedule+0xfb/0x3b0
[13677.258136] ? __schedule+0x19b0/0x19b0
[13677.258141] ? remove_wait_queue+0x2b0/0x2b0
[13677.258146] ? arch_vtime_task_switch+0xee/0x190
[13677.258151] ? _raw_spin_unlock_irqrestore+0xc2/0x130
[13677.258156] ? _raw_spin_unlock_irq+0x120/0x120
[13677.258162] ? trace_event_raw_event_workqueue_work+0x170/0x170
[13677.258167] kthread+0x2d4/0x390
[13677.258172] ? kthread_create_worker+0xd0/0xd0
[13677.258177] ret_from_fork+0x1f/0x30

[13677.258188] Allocated by task 2377:
[13677.258196] kasan_kmalloc+0xa0/0xd0
[13677.258202] kmem_cache_alloc_trace+0xd1/0x1e0
[13677.258208] dm_crtc_duplicate_state+0x73/0x130
[13677.258214] drm_atomic_get_crtc_state+0x13c/0x400
[13677.258218] page_flip_common+0x52/0x230
[13677.258223] drm_atomic_helper_page_flip+0xa1/0x100
[13677.258230] drm_mode_page_flip_ioctl+0xc10/0x1030
[13677.258236] drm_ioctl_kernel+0x1b5/0x2c0
[13677.258240] drm_ioctl+0x709/0xa00
[13677.258245] amdgpu_drm_ioctl+0x118/0x280
[13677.258250] do_vfs_ioctl+0x18a/0x1260
[13677.258254] SyS_ioctl+0x6f/0x80
[13677.258258] do_syscall_64+0x220/0x670
[13677.258262] return_from_SYSCALL_64+0x0/0x65

[13677.258267] Freed by task 2523:
[13677.258273] kasan_slab_free+0x71/0xc0
[13677.258276] kfree+0x88/0x1b0
[13677.258280] drm_atomic_state_default_clear+0x2c8/0xa00
[13677.258285] __drm_atomic_state_free+0x30/0xd0
[13677.258289] drm_atomic_helper_update_plane+0xb6/0x350
[13677.258293] __setplane_internal+0x5b4/0x9d0
[13677.258297] drm_mode_cursor_universal+0x412/0xc60
[13677.258301] drm_mode_cursor_common+0x4b6/0x890
[13677.258305] drm_mode_cursor_ioctl+0xd3/0x120
[13677.258309] drm_ioctl_kernel+0x1b5/0x2c0
[13677.258313] drm_ioctl+0x709/0xa00
[13677.258316] amdgpu_drm_ioctl+0x118/0x280
[13677.258319] do_vfs_ioctl+0x18a/0x1260
[13677.258323] SyS_ioctl+0x6f/0x80
[13677.258326] do_syscall_
Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 12, Andrey Grodzovsky wrote:
> Hi, looks to me like a different issue (not related) then the one
> Johannes, reports, your issue was already reported by some one (can't
> remember the thread of hand) and looks like in shader hang or GPU
> scheduler synchronization issue while Johannes's use after free is pure
> software logic issue in either KMS atomic framework or more probably in
> AMDGPU/DC (DAL).
>
>
> Johanes, I attached a debug patch which forces the cursor update to wait
> for any page flip in progress, can you give it a try and see if the
> issue is gone ? This is not an actual fix but just to evaluate the reason.
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5a70682..323d020 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4908,7 +4908,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>       * synchronization events.
>       */
>
> -    if (lock_and_validation_needed) {
> +    if (lock_and_validation_needed || state->legacy_cursor_update == true) {
>
>          ret = do_aquire_global_lock(dev, state);
>          if (ret)
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index a1a751b..6d6ffdf 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c

The patch seems incomplete.

--
Regards,
Johannes

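The intent of the one-line debug change can be modelled in a few lines of plain C: a legacy cursor update is made to take the same serialising path as a full validation, so it waits for a flip in flight instead of racing with it. This is only a toy model under stated assumptions. The mock_ types, the flip_in_flight flag and the waits counter are all hypothetical, and the real do_aquire_global_lock() works on DRM commit objects rather than on a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the DRM or amdgpu structures. */
struct mock_atomic_state {
    bool lock_and_validation_needed;
    bool legacy_cursor_update;
};

struct mock_crtc {
    bool flip_in_flight;  /* a page flip has been queued but not completed */
    int waits;            /* how often a caller had to wait for a flip */
};

/* Model of "wait for any flip in progress": the flip is considered
 * finished once the caller has waited for it. */
static void mock_acquire_global_lock(struct mock_crtc *crtc)
{
    if (crtc->flip_in_flight) {
        crtc->waits++;
        crtc->flip_in_flight = false;
    }
}

/* With the debug change, cursor updates also take the serialising path. */
int mock_atomic_check(struct mock_crtc *crtc,
                      const struct mock_atomic_state *state)
{
    assert(crtc != NULL && state != NULL);
    if (state->lock_and_validation_needed || state->legacy_cursor_update)
        mock_acquire_global_lock(crtc);
    return 0;
}
```

In this model an update with neither flag set leaves the flip in flight untouched, which corresponds to the unsynchronised case the debug patch is meant to rule out.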
Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 11, Andrey Grodzovsky wrote:
> Thanks for the dmesg, unfortunately nothing suspicious from there.
>
> Looking again at KASAN it hints at a race between cursor update and non
> blocking part of flip with regard to accessing CRTC states, maybe cursor
> update is not properly synchronized against a flip in flight on same CRTC...
>
> P.S What is your setup ? How many displays ?
>

It's a Carrizo A10-8700B R6 with 16G RAM, 512M assigned to the graphics card.
Only the laptop display (1920x1080) is connected via eDP, so nothing
special.

--
Regards,
Johannes

Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 10, Andrey Grodzovsky wrote:
>
> Hi, is there a particular scenario when this happens ,

Unfortunately no, I'm still searching for a reproducer. Sometimes it takes
several days until the next use-after-free.

> can you add dmesg with echo 0x10 > /sys/module/drm/parameters/debug?

I assume you want the debug output from when a use-after-free happened. Here it is:

Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_init] Allocated atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_plane_state] Added [PLANE:40:plane-4] 9b693a40 state to a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_crtc_state] Added [CRTC:41:crtc-0] fd68d0e6 state to a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_crtc_for_plane] Link plane state 9b693a40 to [CRTC:41:crtc-0]
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_fb_for_plane] Set [FB:48] for plane state 9b693a40
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_check_only] checking a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_commit] committing a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_default_clear] Clearing atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:__drm_atomic_state_free] Freeing atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_init] Allocated atomic state aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_plane_state] Added [PLANE:40:plane-4] bef4ac0a state to aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_crtc_state] Added [CRTC:41:crtc-0] 487e5e13 state to aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_crtc_for_plane] Link plane state bef4ac0a to [CRTC:41:crtc-0]
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_fb_for_plane] Set [FB:48] for plane state bef4ac0a
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_check_only] checking aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_commit] committing aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_default_clear] Clearing atomic state aff36e64
Jan 11 23:21:33 probook kernel: [drm:__drm_atomic_state_free] Freeing atomic state aff36e64
Jan 11 23:21:33 probook kernel: ==
Jan 11 23:21:33 probook kernel: BUG: KASAN: use-after-free in drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel: Read of size 8 at addr 8801e020d788 by task kworker/u8:6/18738
Jan 11 23:21:33 probook kernel:
Jan 11 23:21:33 probook kernel: CPU: 2 PID: 18738 Comm: kworker/u8:6 Not tainted 4.15.0-rc7-1-gd24b113b5c00 #444
Jan 11 23:21:33 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017
Jan 11 23:21:33 probook kernel: Workqueue: events_unbound commit_work
Jan 11 23:21:33 probook kernel: Call Trace:
Jan 11 23:21:33 probook kernel: dump_stack+0x99/0x11e
Jan 11 23:21:33 probook kernel: ? _atomic_dec_and_lock+0x152/0x152
Jan 11 23:21:33 probook kernel: print_address_description+0x65/0x270
Jan 11 23:21:33 probook kernel: kasan_report+0x272/0x360
Jan 11 23:21:33 probook kernel: ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel: drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel: amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
Jan 11 23:21:33 probook kernel: ? dm_crtc_duplicate_state+0x130/0x130
Jan 11 23:21:33 probook kernel: ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
Jan 11 23:21:33 probook kernel: commit_tail+0x92/0xe0
Jan 11 23:21:33 probook kernel: process_one_work+0x84b/0x1600
Jan 11 23:21:33 probook kernel: ? tick_nohz_dep_clear_signal+0x20/0x20
Jan 11 23:21:33 probook kernel: ? _raw_spin_unlock_irq+0xbe/0x120
Jan 11 23:21:33 probook kernel: ? _raw_spin_unlock+0x120/0x120
Jan 11 23:21:33 probook kernel: ? pwq_dec_nr_in_flight+0x3c0/0x3c0
Jan 11 23:21:33 probook kernel: ? arch_vtime_task_switch+0xee/0x190
Jan 11 23:21:33 probook kernel: ? finish_task_switch+0x27d/0x7f0
Jan 11 23:21:33 probook kernel: ? wq_worker_waking_up+0xc0/0xc0
Jan 11 23:21:33 probook kernel: ? copy_overflow+0x20/0x20
Jan 11 23:21:33 probook kernel: ? sched_clock_cpu+0x18/0x1e0
Jan 11 23:21:33 probook kernel: ? pci_mmcfg_check_reserved+0x100/0x100
Jan 11 23:21:33 probook kernel: ? preempt_schedule_irq+0x4e/0xb0
Jan 11 23:21:33 probook kernel: ? schedule+0xfb/0x3b0
Jan 11 23:21:33 probook kernel: ? __schedule+0x19b0/0x19b0
Jan 11 23:21:33 probook kernel: ? _raw_spin_unlock_irq+0xb9/0x120
Jan 11 23:21:33 probook kernel: ? _raw_spin_unlock_irq+0xbe/0x120
Jan 11 23:21:33 probook kernel: ? _raw_spin_unlock+0x120/0x120
Jan 11 23:21:33 probook kernel: worker_thread+0x211/0x1790
Jan 11 23:21:33 probook kernel: ? trace_event_raw_event_workqueue_work+0x170/0x170
Jan 11 23:21:33 prob
Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 03, Johannes Hirte wrote: > On 2018 Jan 03, Johannes Hirte wrote: > > This should be fixed already with > > https://lists.freedesktop.org/archives/amd-gfx/2017-October/014932.html > > but's still missing upstream. > > > > With this patch, the use-after-free in amdgpu_job_free_cb seems to be > gone. But now I get an use-after-free in > drm_atomic_helper_wait_for_flip_done: > > [89387.069387] > == > [89387.069407] BUG: KASAN: use-after-free in > drm_atomic_helper_wait_for_flip_done+0x24f/0x270 > [89387.069413] Read of size 8 at addr 880124df0688 by task > kworker/u8:3/31426 > > [89387.069423] CPU: 1 PID: 31426 Comm: kworker/u8:3 Not tainted > 4.15.0-rc6-1-ge0895ba8d88e #442 > [89387.069427] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 > 10/12/2017 > [89387.069435] Workqueue: events_unbound commit_work > [89387.069440] Call Trace: > [89387.069448] dump_stack+0x99/0x11e > [89387.069453] ? _atomic_dec_and_lock+0x152/0x152 > [89387.069460] print_address_description+0x65/0x270 > [89387.069465] kasan_report+0x272/0x360 > [89387.069470] ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270 > [89387.069475] drm_atomic_helper_wait_for_flip_done+0x24f/0x270 > [89387.069483] amdgpu_dm_atomic_commit_tail+0x185e/0x2b90 > [89387.069492] ? dm_crtc_duplicate_state+0x130/0x130 > [89387.069498] ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800 > [89387.069504] commit_tail+0x92/0xe0 > [89387.069511] process_one_work+0x84b/0x1600 > [89387.069517] ? tick_nohz_dep_clear_signal+0x20/0x20 > [89387.069522] ? _raw_spin_unlock_irq+0xbe/0x120 > [89387.069525] ? _raw_spin_unlock+0x120/0x120 > [89387.069529] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 > [89387.069534] ? arch_vtime_task_switch+0xee/0x190 > [89387.069539] ? finish_task_switch+0x27d/0x7f0 > [89387.069542] ? wq_worker_waking_up+0xc0/0xc0 > [89387.069547] ? copy_overflow+0x20/0x20 > [89387.069550] ? sched_clock_cpu+0x18/0x1e0 > [89387.069558] ? pci_mmcfg_check_reserved+0x100/0x100 > [89387.069562] ? 
pci_mmcfg_check_reserved+0x100/0x100 > [89387.069569] ? schedule+0xfb/0x3b0 > [89387.069574] ? __schedule+0x19b0/0x19b0 > [89387.069578] ? _raw_spin_unlock_irq+0xb9/0x120 > [89387.069582] ? _raw_spin_unlock_irq+0xbe/0x120 > [89387.069585] ? _raw_spin_unlock+0x120/0x120 > [89387.069590] worker_thread+0x211/0x1790 > [89387.069597] ? pick_next_task_fair+0x313/0x10f0 > [89387.069601] ? trace_event_raw_event_workqueue_work+0x170/0x170 > [89387.069606] ? __read_once_size_nocheck.constprop.6+0x10/0x10 > [89387.069612] ? tick_nohz_dep_clear_signal+0x20/0x20 > [89387.069616] ? account_idle_time+0x94/0x1f0 > [89387.069620] ? _raw_spin_unlock_irq+0xbe/0x120 > [89387.069623] ? _raw_spin_unlock+0x120/0x120 > [89387.069628] ? finish_task_switch+0x27d/0x7f0 > [89387.069633] ? sched_clock_cpu+0x18/0x1e0 > [89387.069639] ? ret_from_fork+0x1f/0x30 > [89387.069644] ? pci_mmcfg_check_reserved+0x100/0x100 > [89387.069650] ? cyc2ns_read_end+0x20/0x20 > [89387.069657] ? schedule+0xfb/0x3b0 > [89387.069662] ? __schedule+0x19b0/0x19b0 > [89387.069666] ? remove_wait_queue+0x2b0/0x2b0 > [89387.069670] ? arch_vtime_task_switch+0xee/0x190 > [89387.069675] ? _raw_spin_unlock_irqrestore+0xc2/0x130 > [89387.069679] ? _raw_spin_unlock_irq+0x120/0x120 > [89387.069683] ? trace_event_raw_event_workqueue_work+0x170/0x170 > [89387.069688] kthread+0x2d4/0x390 > [89387.069693] ? 
kthread_create_worker+0xd0/0xd0 > [89387.069697] ret_from_fork+0x1f/0x30 > > [89387.069705] Allocated by task 2387: > [89387.069712] kasan_kmalloc+0xa0/0xd0 > [89387.069717] kmem_cache_alloc_trace+0xd1/0x1e0 > [89387.069722] dm_crtc_duplicate_state+0x73/0x130 > [89387.069726] drm_atomic_get_crtc_state+0x13c/0x400 > [89387.069730] page_flip_common+0x52/0x230 > [89387.069734] drm_atomic_helper_page_flip+0xa1/0x100 > [89387.069739] drm_mode_page_flip_ioctl+0xc10/0x1030 > [89387.069744] drm_ioctl_kernel+0x1b5/0x2c0 > [89387.069748] drm_ioctl+0x709/0xa00 > [89387.069752] amdgpu_drm_ioctl+0x118/0x280 > [89387.069756] do_vfs_ioctl+0x18a/0x1260 > [89387.069760] SyS_ioctl+0x6f/0x80 > [89387.069764] do_syscall_64+0x220/0x670 > [89387.069768] return_from_SYSCALL_64+0x0/0x65 > > [89387.069772] Freed by task 2533: > [89387.069776] kasan_slab_free+0x71/0xc0 > [89387.069780] kfree+0x88/0x1b0 > [89387.069784] drm_atomic_state_default_clear+0x2c8/0xa00 > [89387.069787] __drm_atomic_state_free+0x30/0xd0 > [89387.069791] drm_atomic_helper_update_plane+0xb6/0x350 > [89387.069794] __setplane_internal+0x5b4/0x9d0 > [89387.069798] drm_mode_cursor_universal+0x412/0xc60 > [89
Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb
On 2018 Jan 03, Johannes Hirte wrote:
> This should be fixed already with
> https://lists.freedesktop.org/archives/amd-gfx/2017-October/014932.html
> but it's still missing upstream.
>

With this patch, the use-after-free in amdgpu_job_free_cb seems to be
gone. But now I get a use-after-free in
drm_atomic_helper_wait_for_flip_done:

[89387.069387] ==
[89387.069407] BUG: KASAN: use-after-free in drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069413] Read of size 8 at addr 880124df0688 by task kworker/u8:3/31426

[89387.069423] CPU: 1 PID: 31426 Comm: kworker/u8:3 Not tainted 4.15.0-rc6-1-ge0895ba8d88e #442
[89387.069427] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017
[89387.069435] Workqueue: events_unbound commit_work
[89387.069440] Call Trace:
[89387.069448] dump_stack+0x99/0x11e
[89387.069453] ? _atomic_dec_and_lock+0x152/0x152
[89387.069460] print_address_description+0x65/0x270
[89387.069465] kasan_report+0x272/0x360
[89387.069470] ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069475] drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069483] amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
[89387.069492] ? dm_crtc_duplicate_state+0x130/0x130
[89387.069498] ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
[89387.069504] commit_tail+0x92/0xe0
[89387.069511] process_one_work+0x84b/0x1600
[89387.069517] ? tick_nohz_dep_clear_signal+0x20/0x20
[89387.069522] ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069525] ? _raw_spin_unlock+0x120/0x120
[89387.069529] ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[89387.069534] ? arch_vtime_task_switch+0xee/0x190
[89387.069539] ? finish_task_switch+0x27d/0x7f0
[89387.069542] ? wq_worker_waking_up+0xc0/0xc0
[89387.069547] ? copy_overflow+0x20/0x20
[89387.069550] ? sched_clock_cpu+0x18/0x1e0
[89387.069558] ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069562] ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069569] ? schedule+0xfb/0x3b0
[89387.069574] ? __schedule+0x19b0/0x19b0
[89387.069578] ? _raw_spin_unlock_irq+0xb9/0x120
[89387.069582] ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069585] ? _raw_spin_unlock+0x120/0x120
[89387.069590] worker_thread+0x211/0x1790
[89387.069597] ? pick_next_task_fair+0x313/0x10f0
[89387.069601] ? trace_event_raw_event_workqueue_work+0x170/0x170
[89387.069606] ? __read_once_size_nocheck.constprop.6+0x10/0x10
[89387.069612] ? tick_nohz_dep_clear_signal+0x20/0x20
[89387.069616] ? account_idle_time+0x94/0x1f0
[89387.069620] ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069623] ? _raw_spin_unlock+0x120/0x120
[89387.069628] ? finish_task_switch+0x27d/0x7f0
[89387.069633] ? sched_clock_cpu+0x18/0x1e0
[89387.069639] ? ret_from_fork+0x1f/0x30
[89387.069644] ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069650] ? cyc2ns_read_end+0x20/0x20
[89387.069657] ? schedule+0xfb/0x3b0
[89387.069662] ? __schedule+0x19b0/0x19b0
[89387.069666] ? remove_wait_queue+0x2b0/0x2b0
[89387.069670] ? arch_vtime_task_switch+0xee/0x190
[89387.069675] ? _raw_spin_unlock_irqrestore+0xc2/0x130
[89387.069679] ? _raw_spin_unlock_irq+0x120/0x120
[89387.069683] ? trace_event_raw_event_workqueue_work+0x170/0x170
[89387.069688] kthread+0x2d4/0x390
[89387.069693] ? kthread_create_worker+0xd0/0xd0
[89387.069697] ret_from_fork+0x1f/0x30

[89387.069705] Allocated by task 2387:
[89387.069712] kasan_kmalloc+0xa0/0xd0
[89387.069717] kmem_cache_alloc_trace+0xd1/0x1e0
[89387.069722] dm_crtc_duplicate_state+0x73/0x130
[89387.069726] drm_atomic_get_crtc_state+0x13c/0x400
[89387.069730] page_flip_common+0x52/0x230
[89387.069734] drm_atomic_helper_page_flip+0xa1/0x100
[89387.069739] drm_mode_page_flip_ioctl+0xc10/0x1030
[89387.069744] drm_ioctl_kernel+0x1b5/0x2c0
[89387.069748] drm_ioctl+0x709/0xa00
[89387.069752] amdgpu_drm_ioctl+0x118/0x280
[89387.069756] do_vfs_ioctl+0x18a/0x1260
[89387.069760] SyS_ioctl+0x6f/0x80
[89387.069764] do_syscall_64+0x220/0x670
[89387.069768] return_from_SYSCALL_64+0x0/0x65

[89387.069772] Freed by task 2533:
[89387.069776] kasan_slab_free+0x71/0xc0
[89387.069780] kfree+0x88/0x1b0
[89387.069784] drm_atomic_state_default_clear+0x2c8/0xa00
[89387.069787] __drm_atomic_state_free+0x30/0xd0
[89387.069791] drm_atomic_helper_update_plane+0xb6/0x350
[89387.069794] __setplane_internal+0x5b4/0x9d0
[89387.069798] drm_mode_cursor_universal+0x412/0xc60
[89387.069801] drm_mode_cursor_common+0x4b6/0x890
[89387.069805] drm_mode_cursor_ioctl+0xd3/0x120
[89387.069809] drm_ioctl_kernel+0x1b5/0x2c0
[89387.069813] drm_ioctl+0x709/0xa00
[89387.069816] amdgpu_drm_ioctl+0x118/0x280
[89387.069819] do_vfs_ioctl+0x18a/0x1260
[89387.069822] SyS_ioctl+0x6f/0x80
[89387.069824] do_syscall_64+0x220/0x670
[89387.069828] return_from_SYSCALL_64+0x0/0x65

[89387.069834] The buggy address belongs to the object at 880124df0480
[89387.069839] The buggy address is located 520 bytes inside of
[89387.
BUG: KASAN: use-after-free in amdgpu_job_free_cb
I still get a use-after-free with linux-4.15-rc6: [ 16.788943] == [ 16.788968] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x140/0x150 [ 16.788975] Read of size 8 at addr 8803dfe4b3c8 by task kworker/0:2/1355 [ 16.788986] CPU: 0 PID: 1355 Comm: kworker/0:2 Not tainted 4.15.0-rc6 #438 [ 16.788990] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017 [ 16.788998] Workqueue: events amd_sched_job_finish [ 16.789003] Call Trace: [ 16.789012] dump_stack+0x99/0x11e [ 16.789018] ? _atomic_dec_and_lock+0x152/0x152 [ 16.789026] print_address_description+0x65/0x270 [ 16.789032] kasan_report+0x272/0x360 [ 16.789038] ? amdgpu_job_free_cb+0x140/0x150 [ 16.789043] amdgpu_job_free_cb+0x140/0x150 [ 16.789049] amd_sched_job_finish+0x288/0x560 [ 16.789055] ? amd_sched_process_job+0x220/0x220 [ 16.789061] ? __queue_delayed_work+0x211/0x360 [ 16.789067] ? pick_next_task_fair+0xcff/0x10f0 [ 16.789073] ? _raw_spin_unlock_irq+0xbe/0x120 [ 16.789077] ? _raw_spin_unlock+0x120/0x120 [ 16.789082] process_one_work+0x84b/0x1600 [ 16.789088] ? tick_nohz_dep_clear_signal+0x20/0x20 [ 16.789093] ? _raw_spin_unlock_irq+0xbe/0x120 [ 16.789097] ? _raw_spin_unlock+0x120/0x120 [ 16.789101] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 [ 16.789107] ? compat_start_thread+0x70/0x70 [ 16.789111] ? cyc2ns_read_end+0x20/0x20 [ 16.789117] ? finish_task_switch+0x27d/0x7f0 [ 16.789121] ? wq_worker_waking_up+0xc0/0xc0 [ 16.789127] ? sched_clock_cpu+0x18/0x1e0 [ 16.789133] ? task_change_group_fair+0x7e0/0x7e0 [ 16.789139] ? pci_mmcfg_check_reserved+0x100/0x100 [ 16.789143] ? load_balance+0x3120/0x3120 [ 16.789148] ? perf_event_exit_task+0x91f/0xe20 [ 16.789156] ? schedule+0xfb/0x3b0 [ 16.789160] ? __schedule+0x19b0/0x19b0 [ 16.789165] ? _raw_spin_unlock_irq+0xb9/0x120 [ 16.789169] ? _raw_spin_unlock_irq+0xbe/0x120 [ 16.789172] ? _raw_spin_unlock+0x120/0x120 [ 16.789177] worker_thread+0x211/0x1790 [ 16.789184] ? pick_next_task_fair+0x97d/0x10f0 [ 16.789188] ? 
trace_event_raw_event_workqueue_work+0x170/0x170 [ 16.789194] ? tick_nohz_dep_clear_signal+0x20/0x20 [ 16.789199] ? _raw_spin_unlock_irq+0xbe/0x120 [ 16.789202] ? _raw_spin_unlock+0x120/0x120 [ 16.789207] ? compat_start_thread+0x70/0x70 [ 16.789212] ? finish_task_switch+0x27d/0x7f0 [ 16.789217] ? sched_clock_cpu+0x18/0x1e0 [ 16.789223] ? ret_from_fork+0x1f/0x30 [ 16.789228] ? pci_mmcfg_check_reserved+0x100/0x100 [ 16.789233] ? get_task_cred+0x210/0x210 [ 16.789238] ? cyc2ns_read_end+0x20/0x20 [ 16.789245] ? schedule+0xfb/0x3b0 [ 16.789249] ? __schedule+0x19b0/0x19b0 [ 16.789254] ? remove_wait_queue+0x2b0/0x2b0 [ 16.789258] ? arch_vtime_task_switch+0xee/0x190 [ 16.789263] ? _raw_spin_unlock_irqrestore+0xc2/0x130 [ 16.789267] ? _raw_spin_unlock_irq+0x120/0x120 [ 16.789273] ? trace_event_raw_event_workqueue_work+0x170/0x170 [ 16.789277] kthread+0x2d4/0x390 [ 16.789282] ? kthread_create_worker+0xd0/0xd0 [ 16.789286] ? umh_complete+0x60/0x60 [ 16.789290] ret_from_fork+0x1f/0x30 [ 16.789298] Allocated by task 2385: [ 16.789304] kasan_kmalloc+0xa0/0xd0 [ 16.789309] kmem_cache_alloc_trace+0xd1/0x1e0 [ 16.789314] amdgpu_driver_open_kms+0x12b/0x4d0 [ 16.789320] drm_open+0x7c3/0x1100 [ 16.789324] drm_stub_open+0x2a8/0x400 [ 16.789329] chrdev_open+0x1eb/0x5a0 [ 16.789333] do_dentry_open+0x5a1/0xc50 [ 16.789337] path_openat+0x11d3/0x4e90 [ 16.789341] do_filp_open+0x239/0x3c0 [ 16.789344] do_sys_open+0x402/0x630 [ 16.789349] do_syscall_64+0x220/0x670 [ 16.789353] return_from_SYSCALL_64+0x0/0x65 [ 16.789357] Freed by task 2541: [ 16.789362] kasan_slab_free+0x71/0xc0 [ 16.789365] kfree+0x88/0x1b0 [ 16.789369] amdgpu_driver_postclose_kms+0x469/0x860 [ 16.789373] drm_release+0x8a8/0x1180 [ 16.789377] __fput+0x2ab/0x730 [ 16.789380] task_work_run+0x14b/0x200 [ 16.789384] exit_to_usermode_loop+0x151/0x180 [ 16.789387] do_syscall_64+0x4ed/0x670 [ 16.789391] return_from_SYSCALL_64+0x0/0x65 [ 16.789397] The buggy address belongs to the object at 8803dfe4b300 [ 16.789403] The buggy 
address is located 200 bytes inside of [ 16.789406] The buggy address belongs to the page: [ 16.789413] page:4ccd276f count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 [ 16.789421] flags: 0x20008100(slab|head) [ 16.789428] raw: 20008100 0001000f000f [ 16.789433] raw: dead0100 dead0200 8803f3002a80 [ 16.789436] page dumped because: kasan: bad access detected [ 16.789441] Memory state around the buggy address: [ 16.789445] 8803dfe4b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 16.789449] 8803dfe4b300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
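A note on reading the report above: the offset KASAN prints can be cross-checked by hand from the two addresses in the log, and the two-digit values in the "Memory state" rows are shadow bytes. The sketch below is a userspace illustration only, not kernel code. The function and enum names are made up for this note, the addresses are used exactly as they appear in this archive (their upper bits look truncated), and the shadow-byte meanings assume the generic KASAN encoding of kernels from this period (0xfc for a slab redzone, 0xfb for a freed slab object).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical classification of a KASAN shadow byte (names made up here). */
enum shadow_kind {
	SHADOW_ACCESSIBLE,	/* 0x00: all 8 bytes of the granule are valid */
	SHADOW_PARTIAL,		/* 0x01..0x07: only the first N bytes are valid */
	SHADOW_REDZONE,		/* 0xfc: redzone around a slab object (assumed) */
	SHADOW_FREED,		/* 0xfb: slab object was freed (assumed) */
	SHADOW_OTHER		/* any other poison value */
};

/* Offset of the faulting address inside the reported object, in bytes. */
uint64_t offset_in_object(uint64_t access_addr, uint64_t object_base)
{
	assert(access_addr >= object_base);
	return access_addr - object_base;
}

/* Map one shadow byte from the "Memory state" dump to its meaning. */
enum shadow_kind classify_shadow(uint8_t shadow)
{
	if (shadow == 0x00)
		return SHADOW_ACCESSIBLE;
	if (shadow <= 0x07)
		return SHADOW_PARTIAL;
	if (shadow == 0xfc)
		return SHADOW_REDZONE;
	if (shadow == 0xfb)
		return SHADOW_FREED;
	return SHADOW_OTHER;
}

/* Print the offset cross-check for one report. */
void print_report_check(uint64_t access_addr, uint64_t object_base)
{
	printf("access is %llu bytes into the object\n",
	       (unsigned long long)offset_in_object(access_addr, object_base));
}
```

With the values from the 4.15-rc6 report, offset_in_object(0x8803dfe4b3c8, 0x8803dfe4b300) gives 200, which agrees with "located 200 bytes inside of". Under the assumed encoding, the row of fb bytes starting at 8803dfe4b300 marks the object as already freed, and the fc bytes in the row before it are redzone.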
Re: Fixes for 4.15-rc1
On 2017 Nov 28, Harry Wentland wrote:
> Hi Alex,
>
> I cherry-picked a bunch of fixes for 4.15. These can be found at
> hwentlan/4.15-rc1-fixes.
>
> Of the changes the highlighted ones (with *) in particular are highly
> recommended, but even the other ones are probably good to have.
>
> * af54c36e0c30 drm/amd/display: Do not put drm_atomic_state on resume

This one is really needed, because it fixes a use-after-free. See this thread:
https://lists.freedesktop.org/archives/amd-gfx/2017-November/016236.html

Additionally, another use-after-free is still waiting for a fix in 4.15-rc:
https://lists.freedesktop.org/archives/amd-gfx/2017-October/014827.html

-- 
Regards,
Johannes
Re: Kernel crash/Null pointer dereference on vblank
On 2017 Nov 23, Leo Li wrote:
> Hi Johannes,
>
> The s3 resume issue looks to be a problem with amdgpu/display. Could you
> give the attached patch a try?
>
> Thanks,
> Leo
>
> On 2017-11-23 07:27 AM, Johannes Hirte wrote:
> > On 2017 Nov 23, Chunming Zhou wrote:
> >> See the attached email, they fixed same issue, each of them is ok to fix
> >> your issue, your calltrace is same as the second.
> >>
> >> We should already push the first patch in early time, could you check if
> >> the first patch is in your branch?
> >>
> >
> > This patch (series) is not upstream yet. Just tested it, but this doesn't
> > fix the
> > use-after-free on S3 resume with dc enabled.
> >

> From 8656ef112d53f8c08f6571dd0d093f03d2e6cc30 Mon Sep 17 00:00:00 2001
> From: "Leo (Sunpeng) Li"
> Date: Thu, 16 Nov 2017 15:17:27 -0500
> Subject: [PATCH] drm/amdgpu/display: Do not put drm_atomic_state on resume
>
> drm_atomic_helper_resume now puts it for us. See relevant patch here:
> https://lists.freedesktop.org/archives/dri-devel/2017-October/154268.html
>
> Change-Id: Ief246492f721a1cf281d48e9d1a7029e5cefc2da
> Signed-off-by: Leo (Sunpeng) Li
> ---
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5731167..951ea77 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -688,7 +688,6 @@ int amdgpu_dm_display_resume(struct amdgpu_device *adev)
>
>  	ret = drm_atomic_helper_resume(ddev, adev->dm.cached_state);
>
> -	drm_atomic_state_put(adev->dm.cached_state);
>  	adev->dm.cached_state = NULL;
>
>  	amdgpu_dm_irq_resume_late(adev);
> --
> 2.7.4
>

Looks good, with this patch the use-after-free is gone and S3 resume works as expected. You can add my Tested-by.

-- 
Regards,
Johannes
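For readers following the patch: its description says drm_atomic_helper_resume now puts the state itself, so the extra drm_atomic_state_put() in amdgpu_dm_display_resume() dropped a reference the caller no longer owned. Below is a minimal userspace sketch of that ownership rule. Everything in it (toy_state, toy_helper_resume and the other toy_* names) is hypothetical and only mimics the shape of the bug; it is not the DRM implementation. Instead of freeing memory it sets a flag, so the double put can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as the cached atomic state. */
struct toy_state {
	int refcount;
	int released;	/* set where real code would free the object */
};

void toy_state_init(struct toy_state *s)
{
	assert(s != NULL);
	s->refcount = 1;	/* the creator owns one reference */
	s->released = 0;
}

/* Drop one reference. Returns -1 if the object was already released,
 * which is the point where real code would touch freed memory. */
int toy_state_put(struct toy_state *s)
{
	if (s->released)
		return -1;
	if (--s->refcount == 0)
		s->released = 1;
	return 0;
}

/* Toy helper that consumes the caller's reference, as the patch
 * description says the resume helper now does. */
int toy_helper_resume(struct toy_state *s)
{
	/* ... restore the state here ... */
	return toy_state_put(s);
}

/* Shape of the code before the patch: helper puts, then caller puts again. */
int toy_resume_before_patch(struct toy_state *s)
{
	toy_helper_resume(s);
	return toy_state_put(s);	/* second put on a released object */
}

/* Shape of the code after the patch: only the helper puts. */
int toy_resume_after_patch(struct toy_state *s)
{
	return toy_helper_resume(s);
}
```

In this toy model, toy_resume_before_patch() returns -1 because the second put hits an object that was already released, while toy_resume_after_patch() returns 0 and leaves the object released exactly once.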
Re: Kernel crash/Null pointer dereference on vblank
On 2017 Nov 23, Chunming Zhou wrote:
> See the attached email, they fixed same issue, each of them is ok to fix
> your issue, your calltrace is same as the second.
>
> We should already push the first patch in early time, could you check if
> the first patch is in your branch?
>

This patch (series) is not upstream yet. Just tested it, but this doesn't fix the use-after-free on S3 resume with dc enabled.

-- 
Regards,
Johannes
Re: Kernel crash/Null pointer dereference on vblank
On 2017 Nov 23, Chunming Zhou wrote:
> Which driver are you using?
>
> I guess your driver is a bit old, the issue should be fixed before.
>

This was with git master from Linus. But even with the latest changes from agd5f/drm-next-4.15, both use-after-frees still persist. If there are fixes for this, they're not available upstream.

-- 
Regards,
Johannes
Re: Kernel crash/Null pointer dereference on vblank
OK, now I have another use-after-free report, this time without dc. I don't know if this is related, but until now I hadn't seen any runtime errors without dc.

KASAN report:

[22697.845475] == [22697.845495] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x140/0x150 [22697.845500] Read of size 8 at addr 8801c02e91c8 by task kworker/0:2/22547 [22697.845509] CPU: 0 PID: 22547 Comm: kworker/0:2 Not tainted 4.14.0-11095-g0c86a6bd85ff #404 [22697.845513] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.09 06/09/2017 [22697.845520] Workqueue: events amd_sched_job_finish [22697.845525] Call Trace: [22697.845534] dump_stack+0x99/0x11e [22697.845541] ? _atomic_dec_and_lock+0x152/0x152 [22697.845548] print_address_description+0x65/0x270 [22697.845553] kasan_report+0x272/0x360 [22697.845557] ? amdgpu_job_free_cb+0x140/0x150 [22697.845562] amdgpu_job_free_cb+0x140/0x150 [22697.845566] amd_sched_job_finish+0x288/0x560 [22697.845571] ? amd_sched_process_job+0x220/0x220 [22697.845576] ? amdgpu_unpin_work_func+0x266/0x460 [22697.845582] ? _raw_spin_unlock_irq+0xbe/0x120 [22697.845587] ? _raw_spin_unlock+0x120/0x120 [22697.845593] process_one_work+0x84b/0x1600 [22697.845599] ? tick_nohz_dep_clear_signal+0x20/0x20 [22697.845603] ? _raw_spin_unlock_irq+0xbe/0x120 [22697.845607] ? _raw_spin_unlock+0x120/0x120 [22697.845611] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 [22697.845617] ? release_thread+0xa0/0xe0 [22697.845621] ? cyc2ns_read_end+0x20/0x20 [22697.845626] ? finish_task_switch+0x27d/0x7f0 [22697.845630] ? wq_worker_waking_up+0xc0/0xc0 [22697.845640] ? pci_mmcfg_check_reserved+0x100/0x100 [22697.845644] ? pci_mmcfg_check_reserved+0x100/0x100 [22697.845648] ? preempt_schedule_irq+0x4e/0xb0 [22697.845653] ? retint_kernel+0x1b/0x1d [22697.845659] ? schedule+0xfb/0x3b0 [22697.845663] ? __schedule+0x19b0/0x19b0 [22697.845669] ? _raw_spin_unlock_irq+0xb9/0x120 [22697.845674] ? _raw_spin_unlock_irq+0xbe/0x120 [22697.845678] ? 
_raw_spin_unlock+0x120/0x120 [22697.845683] worker_thread+0x211/0x1790 [22697.845692] ? pick_next_task_fair+0x97d/0x10f0 [22697.845697] ? trace_event_raw_event_workqueue_work+0x170/0x170 [22697.845703] ? tick_nohz_dep_clear_signal+0x20/0x20 [22697.845708] ? _raw_spin_unlock_irq+0xbe/0x120 [22697.845713] ? _raw_spin_unlock+0x120/0x120 [22697.845718] ? compat_start_thread+0x70/0x70 [22697.845722] ? finish_task_switch+0x27d/0x7f0 [22697.845727] ? sched_clock_cpu+0x18/0x1e0 [22697.845733] ? ret_from_fork+0x1f/0x30 [22697.845739] ? pci_mmcfg_check_reserved+0x100/0x100 [22697.845744] ? unix_write_space+0x410/0x410 [22697.845749] ? cyc2ns_read_end+0x20/0x20 [22697.845755] ? schedule+0xfb/0x3b0 [22697.845759] ? __schedule+0x19b0/0x19b0 [22697.845765] ? remove_wait_queue+0x2b0/0x2b0 [22697.845770] ? arch_vtime_task_switch+0xee/0x190 [22697.845774] ? _raw_spin_unlock_irqrestore+0xc2/0x130 [22697.845778] ? _raw_spin_unlock_irq+0x120/0x120 [22697.845783] ? trace_event_raw_event_workqueue_work+0x170/0x170 [22697.845788] kthread+0x2d4/0x390 [22697.845793] ? 
kthread_create_worker+0xd0/0xd0 [22697.845797] ret_from_fork+0x1f/0x30 [22697.845809] Allocated by task 2378: [22697.845817] kasan_kmalloc+0xa0/0xd0 [22697.845822] kmem_cache_alloc_trace+0xd1/0x1e0 [22697.845829] amdgpu_driver_open_kms+0x12b/0x4d0 [22697.845839] drm_open+0x7c3/0x1100 [22697.845843] drm_stub_open+0x2a8/0x400 [22697.845851] chrdev_open+0x1eb/0x5a0 [22697.845857] do_dentry_open+0x5a1/0xc50 [22697.845865] path_openat+0x11d3/0x4e90 [22697.845868] do_filp_open+0x239/0x3c0 [22697.845872] do_sys_open+0x402/0x630 [22697.845878] do_syscall_64+0x220/0x670 [22697.845881] return_from_SYSCALL_64+0x0/0x65 [22697.845887] Freed by task 24090: [22697.845892] kasan_slab_free+0x71/0xc0 [22697.845895] kfree+0x88/0x1b0 [22697.845900] amdgpu_driver_postclose_kms+0x469/0x860 [22697.845904] drm_release+0x8a8/0x1180 [22697.845909] __fput+0x2ab/0x730 [22697.845913] task_work_run+0x14b/0x200 [22697.845919] do_exit+0x7c6/0x13a0 [22697.845922] do_group_exit+0x121/0x340 [22697.845926] SyS_exit_group+0x14/0x20 [22697.845929] do_syscall_64+0x220/0x670 [22697.845932] return_from_SYSCALL_64+0x0/0x65 [22697.845940] The buggy address belongs to the object at 8801c02e9100 [22697.845946] The buggy address is located 200 bytes inside of [22697.845949] The buggy address belongs to the page: [22697.845958] page:ea000700ba00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 [22697.845967] flags: 0x20008100(slab|head) [22697.845977] raw: 20008100 0001000f000f [22697.845982] raw: dead0100 dead0200 8803f3402a80 [22697.845985] page dumped because: kasan: bad access detected [22697.845990] Memory state around the buggy address: [22697.845995] 8801c02e9080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [22697.845999]
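Both amdgpu_job_free_cb reports in this archive have the same shape: the object is allocated in amdgpu_driver_open_kms, freed in amdgpu_driver_postclose_kms, and read afterwards from amdgpu_job_free_cb running on a workqueue worker. The sketch below models that ordering in userspace to show why a deferred callback holding a plain pointer breaks once the file is closed first, and how holding an extra reference would avoid it. This is an illustration of the general pattern under my own assumptions; the toy_* names are hypothetical, and nothing in the thread says this is how the driver was eventually fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-open-file data, standing in for the freed object. */
struct toy_ctx {
	int refcount;
	int freed;	/* set where real code would kfree() */
};

/* Hypothetical job whose cleanup callback runs later on a worker. */
struct toy_job {
	struct toy_ctx *ctx;
	int pinned;	/* nonzero if the job holds its own reference */
};

void toy_ctx_init(struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->refcount = 1;	/* reference owned by the open file */
	ctx->freed = 0;
}

void toy_ctx_put(struct toy_ctx *ctx)
{
	if (--ctx->refcount == 0)
		ctx->freed = 1;
}

void toy_job_init(struct toy_job *job, struct toy_ctx *ctx, int pin)
{
	job->ctx = ctx;
	job->pinned = pin;
	if (pin)
		ctx->refcount++;	/* job keeps ctx alive until its callback */
}

/* Deferred cleanup. Returns -1 if ctx was already freed when it ran
 * (the use-after-free case), 0 otherwise. */
int toy_job_free_cb(struct toy_job *job)
{
	int stale = job->ctx->freed;

	if (job->pinned)
		toy_ctx_put(job->ctx);
	return stale ? -1 : 0;
}

/* Replay the order seen in the traces: close the file, then finish the job. */
int toy_close_then_finish(int pin)
{
	struct toy_ctx ctx;
	struct toy_job job;

	toy_ctx_init(&ctx);
	toy_job_init(&job, &ctx, pin);
	toy_ctx_put(&ctx);		/* postclose drops the file's reference */
	return toy_job_free_cb(&job);	/* worker runs afterwards */
}
```

In this model toy_close_then_finish(0) returns -1, because the callback runs after the last reference is gone, and toy_close_then_finish(1) returns 0, because the job's own reference keeps the object alive until the callback has finished with it.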
Re: Kernel crash/Null pointer dereference on vblank
On 2017 Nov 22, Martin Babutzka wrote: >Dear AMD Developers, >At first congratulations for the DC code submission to the 4.15 kernel. >Unfortunately the major regression which I reported on 29.09., 06.10., >02.11. and 05.11. still exists. But this time I got additional >debugging information maybe this helps to fix it. > >Summary: I am running Xubuntu 17.10 with the amd-staging-drm-next >kernel patched to 4.14.0. The latest build which I tested is from >includes all commits up to now (including 2017-11-17 19:51:57 (GMT) >commit 85d09ce5e5039644487e9508d6359f9f4cf64427). > >Some vblank operations make the kernel crash and hang up the whole >system. The error is reproducible by enabling the screen lock or the >suspend mode. The system can not return to proper state from either of >these (after all I am not 100% sure it is the same error). Debugging is > easier with screen lock. Attached you can find the kernel crash and >the dce110_vblank_set function modified by some kernel prints. It looks >like the function is called twice and does not work the second time. >The whole code around dce110_vblank_set also looks interrupt-ish - >could this be a race condition or timing problem? Objects being cleared >from memory and then accessed by dce110_vblank_set? > >Bug reports on this issue: >https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/37 >https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/29 > >Many regards, >Martin (M-bab) I'm having the same problem on Carrizo. The system crashes when resuming from S3 and dc is on. With dc off, everything works fine. I was able to catch some debug info with kasan: Nov 22 15:52:19 probook kernel: PM: suspend entry (deep) Nov 22 15:52:19 probook kernel: PM: Syncing filesystems ... done. Nov 22 15:52:28 probook kernel: Freezing user space processes ... (elapsed 0.002 seconds) done. Nov 22 15:52:28 probook kernel: OOM killer disabled. Nov 22 15:52:28 probook kernel: Freezing remaining freezable tasks ... 
(elapsed 0.001 seconds) done. Nov 22 15:52:28 probook kernel: Suspending console(s) (use no_console_suspend to debug) Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Stopping disk Nov 22 15:52:28 probook kernel: amdgpu :00:01.0: 8803e8075500 unpin not necessary Nov 22 15:52:28 probook kernel: ACPI: Preparing to enter system sleep state S3 Nov 22 15:52:28 probook kernel: ACPI: EC: event blocked Nov 22 15:52:28 probook kernel: ACPI: EC: EC stopped Nov 22 15:52:28 probook kernel: PM: Saving platform NVS memory Nov 22 15:52:28 probook kernel: Disabling non-boot CPUs ... Nov 22 15:52:28 probook kernel: smpboot: CPU 1 is now offline Nov 22 15:52:28 probook kernel: smpboot: CPU 2 is now offline Nov 22 15:52:28 probook kernel: smpboot: CPU 3 is now offline Nov 22 15:52:28 probook kernel: ACPI: Low-level resume complete Nov 22 15:52:28 probook kernel: ACPI: EC: EC started Nov 22 15:52:28 probook kernel: PM: Restoring platform NVS memory Nov 22 15:52:28 probook kernel: LVT offset 0 assigned for vector 0x400 Nov 22 15:52:28 probook kernel: Enabling non-boot CPUs ... 
Nov 22 15:52:28 probook kernel: x86: Booting SMP configuration: Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 1 APIC 0x11 Nov 22 15:52:28 probook kernel: cache: parent cpu1 should not be sleeping Nov 22 15:52:28 probook kernel: CPU1 is up Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 2 APIC 0x12 Nov 22 15:52:28 probook kernel: cache: parent cpu2 should not be sleeping Nov 22 15:52:28 probook kernel: CPU2 is up Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 3 APIC 0x13 Nov 22 15:52:28 probook kernel: cache: parent cpu3 should not be sleeping Nov 22 15:52:28 probook kernel: CPU3 is up Nov 22 15:52:28 probook kernel: ACPI: Waking up from system sleep state S3 Nov 22 15:52:28 probook kernel: ACPI: EC: event unblocked Nov 22 15:52:28 probook kernel: [drm] PCIE GART of 1024M enabled (table at 0x00F40004). Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Starting disk Nov 22 15:52:28 probook kernel: r8169 :01:00.0 enp1s0: link down Nov 22 15:52:28 probook kernel: ACPI: button: The lid device is not compliant to SW_LID. Nov 22 15:52:28 probook kernel: usb 3-1.1: reset high-speed USB device number 3 using ehci-pci Nov 22 15:52:28 probook kernel: [drm:hwss_wait_for_blank_complete] *ERROR* DC: failed to blank crtc! Nov 22 15:52:28 probook kernel: [drm] ring test on 0 succeeded in 11 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 9 succeeded in 8 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 1 succeeded in 4 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 2 succeeded in 2 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 3 succeeded in 2 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 4 succeeded in 2 usecs Nov 22 15:52:28 probook kernel: [drm] ring test on 5 succeeded in 7 usecs Nov 22 15:52:28 probook kernel
[BUG] X broken on Carrizo when GFX_PG enabled
Since nobody has reacted to the bug report, I'm trying this way. As mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=196337, my system becomes unusable with GFX_PG enabled, because X doesn't start anymore.
[drm-next-4.9-wip] this function not implement on Carrizo
With commit fad2af195f1abaada473f4f9e9a554c1e4db768b PowerPlay was enabled by default for Carrizo, so I assumed it was complete and tested again. But I still get this in dmesg:

[ powerplay ] this function not implement!
[ powerplay ] min_core_set_clock not set

There are multiple entries on startup, and it also happens occasionally during runtime. Is PowerPlay on Carrizo still incomplete, or is this a problem specific to my system (HP ProBook 645 G2)?

regards,
Johannes