Re: [PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants

Zhang, Jerry Tue, 27 Nov 2018 19:48:48 -0800

On Nov 28, 2018, at 00:11, Alex Deucher <alexdeuc...@gmail.com> wrote:
> 
> On Tue, Nov 27, 2018 at 4:56 AM Christian König
> <ckoenig.leichtzumer...@gmail.com> wrote:
>> 
>> On 27.11.18 at 02:47, Zhang, Jerry(Junwei) wrote:
>> 
>> On 11/26/18 5:28 PM, Christian König wrote:
>> 
>> On 26.11.18 at 03:38, Zhang, Jerry(Junwei) wrote:
>> 
>> On 11/24/18 3:32 AM, Deucher, Alexander wrote:
>> 
>> Is this required?  Are the harvesting fuses incorrect?  If the blocks are 
>> harvested, we should bail out of the blocks properly during init.  Also, 
>> please make this more explicit if we still need it.  E.g.,
>> 
>> 
>> 
>> The harvest fuse is indeed disabling UVD and VCE, as it's a mining card.
>> Any command sent to UVD/VCE then causes a NULL pointer issue, e.g. with 
>> amdgpu_test.
>> 
>> 
>> In this case we should fix the NULL pointer issue instead. Do you have a 
>> backtrace for this?
>> 
>> 
>> Sorry for missing the detail.
>> The NULL pointer is caused by UVD not being initialized, as it's disabled in 
>> VBIOS for this kind of card.
>> 
>> 
>> Yeah, but that should be handled correctly.
>> 
>> 
>> On CS submit, amdgpu_cs_ib_fill() checks ring->funcs->parse_cs.
>> However, uvd_v6_0_early_init() skips setting the ring functions, because 
>> CC_HARVEST_FUSES marks UVD/VCE as disabled.
>> Any access to the UVD/VCE ring's funcs then causes the NULL pointer issue.
>> 
>> BTW, the Windows driver disables UVD/VCE for it as well.
>> 
>> 
>> You are approaching this from the wrong side. The fact that UVD/VCE is 
>> disabled should already be handled correctly.
>> 
>> The problem is rather that in a couple of places (amdgpu_ctx_init for 
>> example) we assume that we have at least one UVD/VCE ring.
>> 
>> Alex is right that checking the fuses should be sufficient and we rather 
>> need to fix the handling here instead of adding another workaround.
> 
> Exactly.  There are already cards out there with no UVD or VCE, so we
> need to fix this if it's a problem.  It sounds like userspace is
> submitting work to the VCE or UVD rings without checking whether or
> not the device supports them in the first place.  We should do a
> better job of guarding against that in the kernel.
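
The kind of kernel-side guard discussed above could look roughly like the following. This is only a minimal userspace sketch under stated assumptions: `struct ring`, `struct ring_funcs`, `ring_check_usable()` and `ring_needs_parse()` are simplified, hypothetical stand-ins, not the actual amdgpu structures or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ring_funcs {
	int (*parse_cs)(void *parser, unsigned int ib_idx);
};

struct ring {
	/* Stays NULL when early init skips a harvested block. */
	const struct ring_funcs *funcs;
};

/* Return 0 if work may be submitted to the ring, -EINVAL otherwise. */
int ring_check_usable(const struct ring *ring)
{
	assert(ring != NULL); /* callers always pass a valid ring slot */

	if (ring->funcs == NULL)
		return -EINVAL;
	return 0;
}

/*
 * Look at parse_cs only once the ring is known to be usable.
 * Returns 1 if a parse step is needed, 0 if not, negative on error.
 */
int ring_needs_parse(const struct ring *ring)
{
	int ret = ring_check_usable(ring);

	if (ret)
		return ret;
	return ring->funcs->parse_cs != NULL;
}
```

With a check like this, a submission to a harvested UVD/VCE ring would be rejected with an error instead of dereferencing a NULL funcs pointer.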


Thanks to you all.
I get the point now.

We may also print a message that UVD/VCE is not initialized, since the log 
currently makes it look as if it was initialized successfully:
```
[   15.730219] [drm] add ip block number 7 <uvd_v6_0>
```
I can check this after my vacation (back next week).
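
A rough illustration of such a message, as a userspace sketch only: the fuse bit values and the `describe_block()` helper are hypothetical, since the real CC_HARVEST_FUSES layout is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fuse bits; the real register layout is an assumption. */
#define FUSE_UVD_DISABLED (1u << 0)
#define FUSE_VCE_DISABLED (1u << 1)

/*
 * Format a one-line status for an IP block based on the fuse value.
 * Returns 1 if the block is usable, 0 if it was harvested.
 */
int describe_block(unsigned int fuses, unsigned int disable_bit,
		   const char *name, char *buf, size_t len)
{
	int usable = (fuses & disable_bit) == 0;

	assert(name != NULL && buf != NULL && len > 0);

	snprintf(buf, len, "[drm] %s %s", name,
		 usable ? "initialized" : "is harvested, skipping init");
	return usable;
}
```

The point is only that the log line should distinguish a harvested block from one that was really brought up.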

BTW, is that handled by the patch series starting with "[PATCH 1/6] 
drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities"?
I tried applying the patches, and amdgpu_test seems to hang at the Userptr 
Test (verified on the latest staging build).
Please confirm.

[ 4388.759743] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[ 4388.759782] IP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.759807] PGD 0 P4D 0
[ 4388.759820] Oops: 0000 [#1] SMP PTI
[ 4388.759834] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) 
amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit 
fb_sys_fops syscopyarea sysfillrect sysimgblt nls_utf8 cifs ccm rpcsec_gss_krb5 
nfsv4 nfs fscache binfmt_misc nls_iso8859_1 snd_hda_codec_realtek 
snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp 
kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel irqbypass crct10dif_pclmul 
snd_hda_codec crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel 
snd_seq_midi snd_seq_midi_event pcbc snd_pcm snd_rawmidi snd_seq snd_seq_device 
snd_timer aesni_intel aes_x86_64 crypto_simd eeepc_wmi glue_helper snd cryptd 
asus_wmi intel_cstate soundcore shpchp intel_rapl_perf mei_me wmi_bmof 
intel_wmi_thunderbolt sparse_keymap serio_raw mei acpi_pad mac_hid sch_fq_codel
[ 4388.760141]  nfsd auth_rpcgss nfs_acl parport_pc lockd ppdev grace lp sunrpc 
parport ip_tables x_tables autofs4 mxm_wmi e1000e psmouse ptp pps_core ahci 
libahci wmi video
[ 4388.760212] CPU: 7 PID: 915 Comm: amdgpu_test Tainted: G           OE    
4.15.0-39-generic #42-Ubuntu
[ 4388.760250] Hardware name: System manufacturer System Product Name/Z170-A, 
BIOS 1302 11/09/2015
[ 4388.760287] RIP: 0010:amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.760314] RSP: 0018:ffffa37b8166bd38 EFLAGS: 00010246
[ 4388.760337] RAX: 0000000000000000 RBX: ffff88776740e5f8 RCX: 0000000000000000
[ 4388.760366] RDX: 0000000000000000 RSI: 00000000000000fa RDI: ffff88776740e5f8
[ 4388.760396] RBP: ffffa37b8166bd88 R08: ffff8877765dab10 R09: 0000000000000000
[ 4388.760425] R10: 0000000000000000 R11: 0000000000000064 R12: 00000000000000fa
[ 4388.760455] R13: ffff8877606fdf18 R14: ffff8877606fdef8 R15: 00000000000000fa
[ 4388.760484] FS:  00007f05b21a1580(0000) GS:ffff8877765c0000(0000) 
knlGS:0000000000000000
[ 4388.760518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4388.760542] CR2: 0000000000000008 CR3: 000000003020a005 CR4: 00000000003606e0
[ 4388.760572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4388.760601] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4388.760630] Call Trace:
[ 4388.760644]  ? wait_woken+0x80/0x80
[ 4388.760701]  amdgpu_ctx_mgr_entity_flush+0x7b/0xc0 [amdgpu]
[ 4388.760747]  amdgpu_flush+0x23/0x30 [amdgpu]
[ 4388.760767]  filp_close+0x2f/0x80
[ 4388.760782]  put_files_struct+0x78/0xf0
[ 4388.760967]  exit_files+0x49/0x50
[ 4388.760976]  do_exit+0x2ca/0xb40
[ 4388.760985]  ? __do_page_fault+0x270/0x4d0
[ 4388.760994]  do_group_exit+0x43/0xb0
[ 4388.761003]  SyS_exit_group+0x14/0x20
[ 4388.761013]  do_syscall_64+0x73/0x130
[ 4388.761023]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 4388.761034] RIP: 0033:0x7f05b143fe06
[ 4388.761043] RSP: 002b:00007ffd0fde5fa8 EFLAGS: 00000246 ORIG_RAX: 
00000000000000e7
[ 4388.761059] RAX: ffffffffffffffda RBX: 00007f05b1742740 RCX: 00007f05b143fe06
[ 4388.761074] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 4388.761088] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
[ 4388.761103] R10: 00007f05b135a140 R11: 0000000000000246 R12: 00007f05b1742740
[ 4388.761117] R13: 0000000000000001 R14: 00007f05b174b628 R15: 0000000000000000
[ 4388.761132] Code: 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 49 89 
f4 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 47 10 <4c> 8b 
68 08 65 48 8b 04 25 00 5c 01 00 f6 40 24 04 0f 84 1b 01 
[ 4388.761188] RIP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched] RSP: 
ffffa37b8166bd38
[ 4388.761204] CR2: 0000000000000008
[ 4388.761212] ---[ end trace 7f1dd38e3cb86992 ]---
[ 4388.761222] Fixing recursive fault but reboot is needed!
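
For reading the trace above: CR2 is 0000000000000008 and RAX is 0, and the bytes `48 8b 47 10 <4c> 8b 68 08` appear to decode as a load from 0x10(%rdi) into %rax followed by the faulting load from 0x8(%rax). That pattern suggests a NULL pointer stored at offset 0x10 of the object in RDI, with a field at offset 0x8 read through it. The sketch below reproduces only that arithmetic on a 64-bit (LP64) build; the struct layouts are hypothetical and chosen to match the offsets, they are not the real amd_sched structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, picked only so the offsets match the trace. */
struct inner {
	void *first;        /* offset 0x0 */
	void *second;       /* offset 0x8: the field the faulting load reads */
};

struct outer {
	void *a;            /* offset 0x00 */
	void *b;            /* offset 0x08 */
	struct inner *ptr;  /* offset 0x10: loaded into RAX, NULL in the trace */
};

/* Address the CPU would touch when reading o->ptr->second. */
uintptr_t fault_address(const struct outer *o)
{
	/* Assumes an LP64 target, where pointers are 8 bytes wide. */
	assert(offsetof(struct outer, ptr) == 0x10);

	return (uintptr_t)o->ptr + offsetof(struct inner, second);
}
```

With `ptr` set to NULL the computed address is 0x8, which matches the CR2 value in the oops.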


Regards,
Jerry


> 
> Alex
> 
>> 
>> Regards,
>> Christian.
>> 
>> 
>> Regards,
>> Jerry
>> 
>> 
>> Regards,
>> Christian.
>> 
>> 
>> AFAIK, Windows also disables UVD and VCE during initialization.
>> 
>>         if ((adev->pdev->device == 0x67df) &&
>>             (adev->pdev->revision == 0xf7)) {
>>                 /* Some polaris12 variants don't support UVD/VCE */
>>         } else {
>>                 amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>>         }
>> 
>> 
>> 
>> OK, I will make the process explicit.
>> 
>> Regards,
>> Jerry
>> 
>> That way if we re-arrange the order later, it will be easier to track.
>> 
>> 
>> Alex
>> 
>> ________________________________
>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Junwei 
>> Zhang <jerry.zh...@amd.com>
>> Sent: Friday, November 23, 2018 3:32:27 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Jerry
>> Subject: [PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants
>> 
>> Some variants don't support UVD and VCE.
>> 
>> Signed-off-by: Junwei Zhang <jerry.zh...@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/vi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index f3a4cf1f013a..3338b013ded4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -1660,6 +1660,10 @@ int vi_set_ip_blocks(struct amdgpu_device *adev)
>>                         amdgpu_device_ip_block_add(adev, &dce_v11_2_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &gfx_v8_0_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &sdma_v3_1_ip_block);
>> +               /* Some polaris12 variants don't support UVD/VCE */
>> +               if ((adev->pdev->device == 0x67df) &&
>> +                     (adev->pdev->revision == 0xf7))
>> +                       break;
>>                 amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>>                 break;
>> --
>> 2.17.1
>> 
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

