On Fri, Jan 16, 2026 at 5:42 AM Christian König <[email protected]> wrote: > > On 1/16/26 02:20, Tiezhu Yang wrote: > > On 2026/1/15 下午9:47, Christian König wrote: > >> On 1/15/26 02:28, Tiezhu Yang wrote: > >>> Currently, there are many Call Traces when booting kernel on LoongArch, > >>> here are the trimmed kernel log messages: > >>> > >>> amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110 > >>> amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed > >>> amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init > >>> amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device. > >>> ------------[ cut here ]------------ > >>> WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at > >>> amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9 > >>> ... > >>> Call Trace: > >>> [<90000000047a8524>] show_stack+0x64/0x190 > >>> [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c > >>> [<90000000047cef34>] __warn+0xa4/0x1b0 > >>> [<90000000060a4884>] __report_bug+0xa4/0x1d0 > >>> [<90000000060a4a88>] report_bug+0x38/0xd0 > >>> [<90000000060df330>] do_bp+0x260/0x410 > >>> [<90000000047a6bc0>] handle_bp+0x120/0x1c0 > >>> [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu] > >>> [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu] > >>> [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu] > >>> [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu] > >>> [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu] > >>> [<90000000054d04a4>] local_pci_probe+0x44/0xc0 > >>> [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40 > >>> [<90000000047f93e0>] process_one_work+0x170/0x4e0 > >>> [<90000000047fa14c>] worker_thread+0x3ac/0x4e0 > >>> [<9000000004806824>] kthread+0x154/0x170 > >>> [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0 > >>> [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88 > >>> > >>> ---[ end trace 0000000000000000 ]--- > >>> amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110 > >>> amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized > >>> > >>> This is because amdgpu_irq_enabled() is false in amdgpu_irq_put(), then > >>> the condition of WARN_ON() is true. > >>> > >>> In order to avoid the unnecessary Call Traces, it can remove the check of > >>> amdgpu_irq_enabled() and only check atomic_read(&src->enabled_types[type] > >>> for three reasons: > >>> > >>> (1) The aim is to prevent refcount from being less than 0, it was added in > >>> commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset"). > >>> (2) There are already many useful failed log before the Call Trace, there > >>> is no need to WARN_ON(). > >> > >> Well completely disagree. The call trace here is absolutely intentional. > > > > If so, since the call trace is same, is it enough to use WARN_ON_ONCE() > > here? > > I also don't see a justification for that. > > >> That you get a lot of other backtraces because the driver doesn't > >> initialize at all isn't a good rational to remove this one here. > >> > >> Regards, > >> Christian. > >> > >>> (3) The following checks in amdgpu_irq_put() are same with the checks in > >>> amdgpu_irq_enabled(), there is no need to do the redundant operations. > >>> > >>> if (!adev->irq.installed) > >>> return -ENOENT; > >>> > >>> if (type >= src->num_types) > >>> return -EINVAL; > >>> > >>> if (!src->enabled_types || !src->funcs->set) > >>> return -EINVAL; > > > > Is this reasonable? Only check atomic_read(&src->enabled_types[type]? > > No, absolutely not. That are two completely different things. > > > That is to say, does it make sense to do the following change? > > The warning can basically only be triggered by two conditions: > 1. A fatal problem while loading the driver and the error handling is not > 100% clean. > 2. A driver coding error. > > And we really need to catch all of those, so there is no real rational to > limit the warning. > > I mean when you run into any of those they should potentially be fixed at > some point.
The problem is there are a lot of error paths which don't get this right that no one has fixed for years. I'm not sure if there is much value anymore; most people just ignore them. Alex > > Regards, > Christian. > > > > > ----->8----- > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > > index 8112ffc85995..d10d6fcc525e 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > > @@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct > > amdgpu_irq_src *src, > > if (!src->enabled_types || !src->funcs->set) > > return -EINVAL; > > > > - if (WARN_ON(!amdgpu_irq_enabled(adev, src, type))) > > + if (WARN_ON_ONCE(!atomic_read(&src->enabled_types[type]))) > > return -EINVAL; > > > > if (atomic_dec_and_test(&src->enabled_types[type])) > > > > Thanks, > > Tiezhu > > >
