On Nov 28, 2018, at 00:11, Alex Deucher <alexdeuc...@gmail.com> wrote:
>
> On Tue, Nov 27, 2018 at 4:56 AM Christian König
> <ckoenig.leichtzumer...@gmail.com> wrote:
>>
>> On 27.11.18 at 02:47, Zhang, Jerry(Junwei) wrote:
>>
>> On 11/26/18 5:28 PM, Christian König wrote:
>>
>> On 26.11.18 at 03:38, Zhang, Jerry(Junwei) wrote:
>>
>> On 11/24/18 3:32 AM, Deucher, Alexander wrote:
>>
>> Is this required? Are the harvesting fuses incorrect? If the blocks are
>> harvested, we should bail out of the blocks properly during init. Also,
>> please make this more explicit if we still need it. E.g.,
>>
>>
>> The harvest fuse is indeed disabling UVD and VCE, as it's a mining card.
>> Then any command to UVD/VCE causes a NULL pointer issue, e.g. in amdgpu_test.
>>
>>
>> In this case we should fix the NULL pointer issue instead. Do you have a
>> backtrace for this?
>>
>>
>> Sorry for missing the detail.
>> The NULL pointer is caused by UVD not being initialized, as it's disabled in
>> the VBIOS for this kind of card.
>>
>>
>> Yeah, but that should be handled correctly.
>>
>>
>> On cs submit, ring->funcs->parse_cs is checked in amdgpu_cs_ib_fill().
>> However, uvd_v6_0_early_init() skips setting the ring functions, as
>> CC_HARVEST_FUSES marks UVD/VCE as disabled.
>> Any access to the UVD/VCE ring's funcs then causes a NULL pointer issue.
>>
>> BTW, the Windows driver disables UVD/VCE for it as well.
>>
>>
>> You are approaching this from the wrong side. The fact that UVD/VCE is
>> disabled should already be handled correctly.
>>
>> The problem is rather that in a couple of places (amdgpu_ctx_init for
>> example) we assume that we have at least one UVD/VCE ring.
>>
>> Alex is right that checking the fuses should be sufficient and we rather
>> need to fix the handling here instead of adding another workaround.
>
> Exactly. There are already cards out there with no UVD or VCE, so we
> need to fix this if it's a problem. It sounds like userspace is
> submitting work to the VCE or UVD rings without checking whether or
> not the device supports them in the first place. We should do a
> better job of guarding against that in the kernel.
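
As a rough illustration of the kind of guard described above, here is a minimal userspace C sketch. It is not amdgpu code: `demo_ring`, `demo_ring_funcs`, `demo_uvd_funcs` and `demo_submit` are hypothetical stand-ins, and the choice of `-ENODEV` as the error is an assumption. It only models the idea that a ring whose `funcs` pointer was never set (because the block is harvested and early init skipped it) should be rejected at submission time instead of being dereferenced.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the real amdgpu types. */
struct demo_ring;

struct demo_ring_funcs {
	/* Returns 0 on success, a negative errno value on failure. */
	int (*parse_cs)(struct demo_ring *ring, unsigned int ib_idx);
};

struct demo_ring {
	const char *name;
	/* NULL when the block was harvested and early init skipped the setup. */
	const struct demo_ring_funcs *funcs;
};

/* Trivial parser used only for the demonstration; accepts every IB. */
int demo_parse_cs(struct demo_ring *ring, unsigned int ib_idx)
{
	(void)ring;
	(void)ib_idx;
	return 0;
}

const struct demo_ring_funcs demo_uvd_funcs = {
	.parse_cs = demo_parse_cs,
};

/*
 * Reject a submission to a ring that was never set up, rather than
 * dereferencing ring->funcs unconditionally.
 */
int demo_submit(struct demo_ring *ring, unsigned int ib_idx)
{
	if (ring == NULL || ring->funcs == NULL)
		return -ENODEV;	/* block absent: return an error, do not oops */

	if (ring->funcs->parse_cs != NULL)
		return ring->funcs->parse_cs(ring, ib_idx);

	return 0;	/* ring has no CS parser, nothing to validate */
}
```

Calling `demo_submit()` on a ring initialized as `{ "uvd", NULL }` returns `-ENODEV`, while a ring pointing at `demo_uvd_funcs` goes through the parser as usual. Where such a check would belong in the real driver (submission path, context init, or both) is exactly what the thread is discussing, so the sketch takes no position on that.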

Thanks, all. I understand now.

We could also print a message saying that UVD/VCE is not initialized, since the log currently makes it look as if it was initialized successfully:
```
[ 15.730219] [drm] add ip block number 7 <uvd_v6_0>
```

I can check this after my vacation (back next week).

BTW, is that handled by the patch series
[PATCH 1/6] drm/amdgpu: add VCN JPEG support
(amdgpu_ctx_num_entities)?

I tried applying the patches; amdgpu_test seems to hang at the Userptr Test (verified on the latest staging build).
Please confirm that.

[ 4388.759743] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 4388.759782] IP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.759807] PGD 0 P4D 0
[ 4388.759820] Oops: 0000 [#1] SMP PTI
[ 4388.759834] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt nls_utf8 cifs ccm rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel irqbypass crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel snd_seq_midi snd_seq_midi_event pcbc snd_pcm snd_rawmidi snd_seq snd_seq_device snd_timer aesni_intel aes_x86_64 crypto_simd eeepc_wmi glue_helper snd cryptd asus_wmi intel_cstate soundcore shpchp intel_rapl_perf mei_me wmi_bmof intel_wmi_thunderbolt sparse_keymap serio_raw mei acpi_pad mac_hid sch_fq_codel
[ 4388.760141] nfsd auth_rpcgss nfs_acl parport_pc lockd ppdev grace lp sunrpc parport ip_tables x_tables autofs4 mxm_wmi e1000e psmouse ptp pps_core ahci libahci wmi video
[ 4388.760212] CPU: 7 PID: 915 Comm: amdgpu_test Tainted: G OE 4.15.0-39-generic #42-Ubuntu
[ 4388.760250] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
[ 4388.760287] RIP: 0010:amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.760314] RSP: 0018:ffffa37b8166bd38 EFLAGS: 00010246
[ 4388.760337] RAX: 0000000000000000 RBX: ffff88776740e5f8 RCX: 0000000000000000
[ 4388.760366] RDX: 0000000000000000 RSI: 00000000000000fa RDI: ffff88776740e5f8
[ 4388.760396] RBP: ffffa37b8166bd88 R08: ffff8877765dab10 R09: 0000000000000000
[ 4388.760425] R10: 0000000000000000 R11: 0000000000000064 R12: 00000000000000fa
[ 4388.760455] R13: ffff8877606fdf18 R14: ffff8877606fdef8 R15: 00000000000000fa
[ 4388.760484] FS: 00007f05b21a1580(0000) GS:ffff8877765c0000(0000) knlGS:0000000000000000
[ 4388.760518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4388.760542] CR2: 0000000000000008 CR3: 000000003020a005 CR4: 00000000003606e0
[ 4388.760572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4388.760601] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4388.760630] Call Trace:
[ 4388.760644] ? wait_woken+0x80/0x80
[ 4388.760701] amdgpu_ctx_mgr_entity_flush+0x7b/0xc0 [amdgpu]
[ 4388.760747] amdgpu_flush+0x23/0x30 [amdgpu]
[ 4388.760767] filp_close+0x2f/0x80
[ 4388.760782] put_files_struct+0x78/0xf0
[ 4388.760967] exit_files+0x49/0x50
[ 4388.760976] do_exit+0x2ca/0xb40
[ 4388.760985] ? __do_page_fault+0x270/0x4d0
[ 4388.760994] do_group_exit+0x43/0xb0
[ 4388.761003] SyS_exit_group+0x14/0x20
[ 4388.761013] do_syscall_64+0x73/0x130
[ 4388.761023] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 4388.761034] RIP: 0033:0x7f05b143fe06
[ 4388.761043] RSP: 002b:00007ffd0fde5fa8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 4388.761059] RAX: ffffffffffffffda RBX: 00007f05b1742740 RCX: 00007f05b143fe06
[ 4388.761074] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 4388.761088] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
[ 4388.761103] R10: 00007f05b135a140 R11: 0000000000000246 R12: 00007f05b1742740
[ 4388.761117] R13: 0000000000000001 R14: 00007f05b174b628 R15: 0000000000000000
[ 4388.761132] Code: 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 49 89 f4 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 47 10 <4c> 8b 68 08 65 48 8b 04 25 00 5c 01 00 f6 40 24 04 0f 84 1b 01
[ 4388.761188] RIP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched] RSP: ffffa37b8166bd38
[ 4388.761204] CR2: 0000000000000008
[ 4388.761212] ---[ end trace 7f1dd38e3cb86992 ]---
[ 4388.761222] Fixing recursive fault but reboot is needed!

Regards,
Jerry

>
> Alex
>
>>
>> Regards,
>> Christian.
>>
>>
>> Regards,
>> Jerry
>>
>>
>> Regards,
>> Christian.
>>
>>
>> AFAIK, Windows also disables UVD and VCE during initialization.
>>
>> if ((adev->pdev->device == 0x67df) &&
>>     (adev->pdev->revision == 0xf7)) {
>>         /* Some polaris12 variants don't support UVD/VCE */
>> } else {
>>         amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>>         amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>> }
>>
>> OK, I will make the process explicit.
>>
>> Regards,
>> Jerry
>>
>> That way if we re-arrange the order later, it will be easier to track.
>>
>> Alex
>>
>> ________________________________
>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Junwei
>> Zhang <jerry.zh...@amd.com>
>> Sent: Friday, November 23, 2018 3:32:27 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Jerry
>> Subject: [PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants
>>
>> Some variants don't support UVD and VCE.
>>
>> Signed-off-by: Junwei Zhang <jerry.zh...@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/vi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index f3a4cf1f013a..3338b013ded4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -1660,6 +1660,10 @@ int vi_set_ip_blocks(struct amdgpu_device *adev)
>>                 amdgpu_device_ip_block_add(adev,
>> &dce_v11_2_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &gfx_v8_0_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &sdma_v3_1_ip_block);
>> +               /* Some polaris12 variants don't support UVD/VCE */
>> +               if ((adev->pdev->device == 0x67df) &&
>> +                   (adev->pdev->revision == 0xf7))
>> +                       break;
>>                 amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>>                 break;
>> --
>> 2.17.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx