Re: Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")

2023-06-20 Thread Christian Kastner
On 2023-06-19 16:05, Alex Deucher wrote:
> On Mon, Jun 19, 2023 at 9:05 AM Christian Kastner wrote:
>> On a Debian 12 ("bookworm") system, I observed a new warning when I
>> upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
>> 6800 XT GPU and 3500X processor.
> 
> The warnings are harmless, but they have been fixed[1] and the fixes
> are making their way back to stable kernels.
> 
> [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08c677cb0b436a96a836792bb35a8ec5de4999c2

That was quick. Thank you for pointing out the resolution.

Best,
Christian


Re: Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")

2023-06-19 Thread Alex Deucher
On Mon, Jun 19, 2023 at 9:05 AM Christian Kastner wrote:
>
> Hi,
>
> On a Debian 12 ("bookworm") system, I observed a new warning when I
> upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
> 6800 XT GPU and 3500X processor.
>
> I've traced it down to commit c8b5a95 ("drm/amdgpu: Fix desktop freezed
> after gpu-reset"). Rebuilding the 6.1.27 kernel without this change
> makes the warning disappear.
>
> I can reliably trigger this (and another) warning with
>
>   $ sudo cat /sys/kernel/debug/dri/0/amdgpu_test_ib
>   run ib test:
>   ib ring tests passed.
>
> 5 or 6 seconds after this, two warnings are printed. I see these same
> two warnings on system shutdown (or, at least, they looked similar
> enough to the above that I didn't check for identity).
>
> I've attached
>   (1) the dmesg output after modprobe'ing amdgpu
>   (2) the dmesg output after triggering amdgpu_test_ib
>
> The system in question is only used for ROCm development. I haven't
> observed any other side effects there, other than the warning. There's
> no monitor attached. So I can't speak to the effect of a desktop freeze.

The warnings are harmless, but they have been fixed[1] and the fixes
are making their way back to stable kernels.

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08c677cb0b436a96a836792bb35a8ec5de4999c2

Alex


Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")

2023-06-19 Thread Christian Kastner
Hi,

On a Debian 12 ("bookworm") system, I observed a new warning when I
upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
6800 XT GPU and 3500X processor.

I've traced it down to commit c8b5a95 ("drm/amdgpu: Fix desktop freezed
after gpu-reset"). Rebuilding the 6.1.27 kernel without this change
makes the warning disappear.

I can reliably trigger this (and another) warning with

  $ sudo cat /sys/kernel/debug/dri/0/amdgpu_test_ib
  run ib test:
  ib ring tests passed.

5 or 6 seconds after this, two warnings are printed. I see these same
two warnings on system shutdown (or, at least, they looked similar
enough to the above that I didn't check for identity).
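For comparing the warnings seen after the IB test with those seen at shutdown, the WARNING lines can be pulled out of saved dmesg output with a short script. The following is a minimal sketch and not part of the original report: it assumes the "WARNING: CPU: ... PID: ... at file:line symbol+offset" layout seen in the attached log, and the names `WARNING_RE` and `find_warnings` are hypothetical.

```python
import re

# Assumed layout, taken from the attached log:
#   WARNING: CPU: <n> PID: <n> at <file>:<line> <symbol>+<offset>/<size> [module]
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(dmesg_text):
    """Return a (file, line, symbol) tuple for each WARNING in dmesg text."""
    # Mail clients may wrap long log lines, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(dmesg_text.split())
    return [
        (m["file"], int(m["line"]), m["symbol"])
        for m in WARNING_RE.finditer(flat)
    ]


# Sample copied from the attached log, including the mail line wrap.
sample = """[  272.409891] WARNING: CPU: 1 PID: 259 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:656 amdgpu_irq_put+0x45/0x70 [amdgpu]"""

print(find_warnings(sample))
```

Running the same function over both captures and comparing the resulting tuples would show whether the two sets of warnings originate from the same source location.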

I've attached
  (1) the dmesg output after modprobe'ing amdgpu
  (2) the dmesg output after triggering amdgpu_test_ib

The system in question is only used for ROCm development. I haven't
observed any other side effects there, other than the warning. There's
no monitor attached. So I can't speak to the effect of a desktop freeze.

Best,
Christian

[  266.669251] [drm] PCIE GART of 512M enabled (table at 0x0083FEB0).
[  266.669268] [drm] PSP is resuming...
[  266.739148] [drm] reserve 0xa0 from 0x83fd00 for PSP TMR
[  266.876401] amdgpu :09:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  266.876404] amdgpu :09:00.0: amdgpu: SMU is resuming...
[  266.876407] amdgpu :09:00.0: amdgpu: smu driver if version = 0x0040, smu fw if version = 0x0041, smu fw program = 0, version = 0x003a5600 (58.86.0)
[  266.876410] amdgpu :09:00.0: amdgpu: SMU driver if version not matched
[  266.876428] amdgpu :09:00.0: amdgpu: dpm has been enabled
[  266.879972] amdgpu :09:00.0: amdgpu: SMU is resumed successfully!
[  266.881457] [drm] DMUB hardware initialized: version=0x02020017
[  266.904086] [drm] kiq ring mec 2 pipe 1 q 0
[  266.910932] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  266.911082] [drm] JPEG decode initialized successfully.
[  266.911104] amdgpu :09:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  266.911106] amdgpu :09:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  266.911107] amdgpu :09:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  266.911108] amdgpu :09:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  266.911109] amdgpu :09:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  266.90] amdgpu :09:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  266.90] amdgpu :09:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  266.91] amdgpu :09:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  266.92] amdgpu :09:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  266.93] amdgpu :09:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[  266.94] amdgpu :09:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  266.95] amdgpu :09:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  266.96] amdgpu :09:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[  266.97] amdgpu :09:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[  266.97] amdgpu :09:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[  266.98] amdgpu :09:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[  266.99] amdgpu :09:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[  266.911120] amdgpu :09:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[  266.911121] amdgpu :09:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[  266.911122] amdgpu :09:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[  266.911123] amdgpu :09:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[  266.916173] amdgpu :09:00.0: [drm] Cannot find any crtc or sizes
[  266.916177] amdgpu :09:00.0: [drm] Cannot find any crtc or sizes
[  272.409887] [ cut here ]
[  272.409891] WARNING: CPU: 1 PID: 259 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:656 amdgpu_irq_put+0x45/0x70 [amdgpu]
[  272.410166] Modules linked in: amdgpu gpu_sched drm_buddy drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper i2c_algo_bit ipt_REJECT xt_multiport nft_compat ctr ccm wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink overlay binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd iwlmvm kvm mac80211 snd_hda_codec_realtek irqbypass snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi sha512_ssse3 sha512_generic libarc4 snd_hda_intel