[PATCH 2/2] MAINTAINERS: add docs entry to AMDGPU
To make sure maintainers of amdgpu drivers are aware of any changes in
their documentation, add its entry to MAINTAINERS.

Signed-off-by: Tales Lelo da Aparecida
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d54b9f15ffce..b3594b2a09de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16449,6 +16449,7 @@ S:	Supported
 T:	git https://gitlab.freedesktop.org/agd5f/linux.git
 B:	https://gitlab.freedesktop.org/drm/amd/-/issues
 C:	irc://irc.oftc.net/radeon
+F:	Documentation/gpu/amdgpu/
 F:	drivers/gpu/drm/amd/
 F:	drivers/gpu/drm/radeon/
 F:	include/uapi/drm/amdgpu_drm.h
--
2.35.1
[PATCH 1/2] Documentation/gpu: Add entries to amdgpu glossary
Add missing acronyms to the amdgpu glossary.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1939#note_1309737
Signed-off-by: Tales Lelo da Aparecida
---
 Documentation/gpu/amdgpu/amdgpu-glossary.rst | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
index 859dcec6c6f9..48829d097f40 100644
--- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
+++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
@@ -8,12 +8,19 @@ we have a dedicated glossary for Display Core at

 .. glossary::

+    active_cu_number
+      The number of CUs that are active on the system.  The number of active
+      CUs may be less than SE * SH * CU depending on the board configuration.
+
     CP
       Command Processor

     CPLIB
       Content Protection Library

+    CU
+      Compute unit
+
     DFS
       Digital Frequency Synthesizer

@@ -74,6 +81,12 @@ we have a dedicated glossary for Display Core at
     SDMA
       System DMA

+    SE
+      Shader Engine
+
+    SH
+      SHader array
+
     SMU
       System Management Unit

--
2.35.1
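As a rough illustration of the active_cu_number entry in the patch above, the sketch below computes the theoretical maximum SE * SH * CU and counts the CUs that remain enabled. The struct, its field names, and the per-array enable bitmask are hypothetical and are not taken from the amdgpu driver; they only show why the active count can be lower than the product.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SE 4
#define MAX_SH 2

/* Hypothetical topology description; not the driver's real structure. */
struct gpu_topology {
	unsigned int num_se;        /* shader engines (SE) */
	unsigned int num_sh_per_se; /* shader arrays (SH) per engine */
	unsigned int num_cu_per_sh; /* compute units (CU) per array */
	/* One bit per CU; a set bit means the CU is enabled. */
	uint32_t cu_enabled_mask[MAX_SE][MAX_SH];
};

/* Upper bound: every CU of every array of every engine is present. */
static unsigned int max_cu_number(const struct gpu_topology *t)
{
	return t->num_se * t->num_sh_per_se * t->num_cu_per_sh;
}

/* Count only the CUs whose enable bit is set. */
static unsigned int active_cu_number(const struct gpu_topology *t)
{
	unsigned int se, sh, cu, active = 0;

	assert(t->num_se <= MAX_SE && t->num_sh_per_se <= MAX_SH);
	assert(t->num_cu_per_sh <= 32);

	for (se = 0; se < t->num_se; se++)
		for (sh = 0; sh < t->num_sh_per_se; sh++)
			for (cu = 0; cu < t->num_cu_per_sh; cu++)
				if (t->cu_enabled_mask[se][sh] & (UINT32_C(1) << cu))
					active++;
	return active;
}
```

With 2 engines, 2 arrays per engine, and 8 CUs per array, the maximum is 32; if a few enable bits are cleared on a given board, the active count drops below that.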
[PATCH 0/2] Update AMDGPU glossary and MAINTAINERS
I was handling the request from [0] and then I noticed that some AMD
developers were missing from get_maintainers output due to the lack of a
reference to their documentation in the MAINTAINERS file.

[0] https://gitlab.freedesktop.org/drm/amd/-/issues/1939#note_1309737

Tales Lelo da Aparecida (2):
  Documentation/gpu: Add entries to amdgpu glossary
  MAINTAINERS: add docs entry to AMDGPU

 Documentation/gpu/amdgpu/amdgpu-glossary.rst | 13 +
 MAINTAINERS                                  |  1 +
 2 files changed, 14 insertions(+)

--
2.35.1
[PATCH] drm/amd/display: make hubp1_wait_pipe_read_start() static
It's a local function, let's make it static.

Signed-off-by: Tales Lelo da Aparecida
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
index fbff6beb78be..3a7f76e2c598 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
@@ -1316,7 +1316,7 @@ void hubp1_set_flip_int(struct hubp *hubp)
  *
  * @hubp: hubp struct reference.
  */
-void hubp1_wait_pipe_read_start(struct hubp *hubp)
+static void hubp1_wait_pipe_read_start(struct hubp *hubp)
 {
 	struct dcn10_hubp *hubp1 = TO_DCN10_HUBP(hubp);

--
2.35.1
Re: Unable to boot 5.18-rc kernel on amdgpu legacy (si) hardware.
On Fri, Apr 15, 2022 at 12:00 PM Paul Blazejowski wrote:
>
> Hello,
>
> I am unable to boot 5.18-rc1(2) kernel with my rather old (si) XFX card,
> which works fine under 5.17.3 and previous kernels.
>
> The card is identified as:
>
> 1:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
>
> Running on a Asus M5A99FX PRO R2.0 mainboard.
>
> The boot process stops at "fb0: EFI VGA frame buffer device" step and
> the hdmi connected monitor just shuts off, there's not oops or any other
> debug output captured by serial console.
>
> I was able to bisect the kernel tracing this to a bad commit:
>
> git bisect bad 3712e7a494596b26861f4dc9b81676d1d0272eaf
> # first bad commit: [3712e7a494596b26861f4dc9b81676d1d0272eaf]
> drm/amd/pm: unified lock protections in amdgpu_dpm.c
>
> And reverting this commit on 5.18-rc2 kernel makes my system bootable again.
>
> Please let me know if there's anything else i could provide to help fix
> this issue. I have access to serial console if needed.

Fixed with this patch:
https://patchwork.freedesktop.org/patch/481477/

Alex

>
> Thank you.
Unable to boot 5.18-rc kernel on amdgpu legacy (si) hardware.
Hello,

I am unable to boot the 5.18-rc1(2) kernel with my rather old (si) XFX card,
which works fine under 5.17.3 and previous kernels.

The card is identified as:

1:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]

Running on an Asus M5A99FX PRO R2.0 mainboard.

The boot process stops at the "fb0: EFI VGA frame buffer device" step and
the HDMI-connected monitor just shuts off. There is no oops or any other
debug output captured by the serial console.

I was able to bisect the kernel, tracing this to a bad commit:

git bisect bad 3712e7a494596b26861f4dc9b81676d1d0272eaf
# first bad commit: [3712e7a494596b26861f4dc9b81676d1d0272eaf]
drm/amd/pm: unified lock protections in amdgpu_dpm.c

Reverting this commit on the 5.18-rc2 kernel makes my system bootable again.

Please let me know if there's anything else I could provide to help fix
this issue. I have access to a serial console if needed.

Thank you.
Re: Vega 56 failing to process EDID from VR Headset
Hi,

I have raised a bug:
https://gitlab.freedesktop.org/drm/amd/-/issues/1975

I have also attached a patch to the bug, that fixes the CR negotiation
problem. My Oculus Rift S headset now works, with the included patch.
The patch fixes DP CR and EQ negotiation.

Kind Regards

James
[pull] amdgpu, amdkfd, radeon drm-next-5.19
Hi Dave, Daniel,

New features for 5.19.  There is a new DP define added in drm_dp_helper.h
to support some new DC code.  That file has moved in drm-misc.  Just a
heads up there when you merge.

The following changes since commit 15f9cd4334c83716fa32647652a609e3ba6c998d:

  drm/amdgpu/gfx10: enable gfx1037 clock counter retrieval function (2022-03-25 12:40:25 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.19-2022-04-15

for you to fetch changes up to d68cf992ded575928cf4ddf7c64faff0d8dcce14:

  drm/amd/amdgpu: Remove static from variable in RLCG Reg RW (2022-04-14 15:29:20 -0400)

amd-drm-next-5.19-2022-04-15:

amdgpu:
- USB-C updates
- GPUVM updates
- TMZ fixes for RV
- DCN 3.1 pstate fixes
- Display z state fixes
- RAS fixes
- Misc code cleanups and spelling fixes
- More DC FP rework
- GPUVM TLB handling rework
- Power management sysfs code cleanup
- Add RAS support for VCN
- Backlight fix
- Add unique id support for more asics
- Misc display updates
- SR-IOV fixes
- Extend CG and PG flags to 64 bits
- Enable VCN clk sysfs nodes for navi12

amdkfd:
- Fix IO link cleanup during device removal
- RAS fixes
- Retry fault fixes
- Asynchronously free events
- SVM fixes

radeon:
- Drop some dead code
- Misc code cleanups

Aashish Sharma (1):
      drm/amd/display: Fix unused-but-set-variable warning

Ahmad Othman (1):
      drm/amd/display: Fix HDCP SEND AKI INIT error

Alex Deucher (7):
      drm/amdgpu: make amdgpu_display_framebuffer_init() static
      drm/amdgpu: drop amdgpu_display_gem_fb_init()
      drm/amdgpu: make amdgpu_display_gem_fb_verify_and_init() static
      drm/amdgpu: don't use BACO for reset in S3
      drm/amdgpu/smu10: fix SoC/fclk units in auto mode
      drm/amdgpu: fix VCN 3.1.2 firmware name
      drm/amd/display: fix 64 bit divide in freesync code

Angus Wang (4):
      drm/amd/display: Create underflow interrupt IRQ type
      drm/amd/display: Add flip interval workaround
      drm/amd/display: Remove underflow IRQ type
      drm/amd/display: Fix inconsistent timestamp type

Anthony Koo (3):
      drm/amd/display: [FW Promotion] Release 0.0.109.0
      drm/amd/display: [FW Promotion] Release 0.0.110.0
      drm/amd/display: [FW Promotion] Release 0.0.111.0

Aric Cyr (4):
      drm/amd/display: 3.2.178
      drm/amd/display: 3.2.179
      drm/amd/display: 3.2.180
      drm/amd/display: 3.2.181

Becle Lee (1):
      drm/amd/display: fix missing-prototypes warning

Benjamin Marty (1):
      drm/amdgpu/display: change pipe policy for DCN 2.1

Boyuan Zhang (1):
      drm/amdgpu/vcn3: send smu interface type

CHANDAN VURDIGERE NATARAJ (1):
      drm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw

Charlene Liu (2):
      drm/amd/display: Clear optc false state when disable otg
      drm/amd/display: remove dtbclk_ss compensation for dcn316

Chris Park (1):
      drm/amd/display: Correct Slice reset calculation

Christian König (10):
      drm/amdgpu: move VM PDEs to idle after update
      drm/amdgpu: separate VM PT handling into amdgpu_vm_pt.c
      drm/amdgpu: simplify VM update tracking a bit
      drm/amdgpu: rework TLB flushing
      drm/amdkfd: start using tlb_seq from the VM subsystem
      drm/amdkfd: use tlb_seq from the VM subsystem for SVM as well v2
      drm/amdgpu: remove table_freed param from the VM code
      drm/amdgpu: fix some kerneldoc in the VM code v2
      drm/amdgpu: fix incorrect size printing in error msg
      drm/amdgpu: fix TLB flushing during eviction

Colin Ian King (1):
      drm/amdgpu: Fix spelling mistake "regiser" -> "register"

Dan Carpenter (1):
      drm/amdkfd: potential NULL dereference in kfd_set/reset_event()

Darren Powell (2):
      amdgpu/pm: Add new hwmgr API function "emit_clock_levels"
      amdgpu/pm: Implement emit_clk_levels for vega10

David Zhang (2):
      drm: add PSR2 support and capability definition as per eDP 1.5
      drm/amd/display: implement shared PSR-SU sink validation helper

Dillon Varone (2):
      drm/amd/display: Add dtb clock to dc_clocks
      drm/amd/display: Select correct DTO source

Dmytro Laktyushkin (1):
      drm/amd/display: update dcn315 clock table read

Duncan Ma (1):
      drm/amd/display: Add odm seamless boot support

Eric Bernstein (1):
      drm/amd/display: remove assert for odm transition case

Eric Yang (1):
      drm/amd/display: undo clearing of z10 related function pointers

Evan Quan (1):
      drm/amdgpu: expand cg_flags from u32 to u64

Evgenii Krasnikov (1):
      drm/amd/display: ensure PSR force_static flag can always be set

Felix Kuehling (4):
      drm/amdkfd: Improve concurrency of event handling
      drm/amdkfd: Fix NULL pointer dereference
      drm/amdkfd: Asynchronously free events
      drm/amdkfd: fix race condition in kfd_wait_on_events

Gavin Wan
Re: Vega 56 failing to process EDID from VR Headset
On Wed, 13 Apr 2022 at 08:11, Paul Menzel wrote:
>
> Dear James,
>
> > I will do some more investigation. In addition to it not processing
> > the EDID particularly well...
> > Since my email, I have found out that it is failing to complete CR
> > (Clock Recovery) on Link 0,2, but it works on 1,3 at HBR2. All 4 Links
> > work at HBR1. (I need the HBR2 working)
> > The CR negotiation in the code looks a bit wrong to me, so I will look
> > into that a bit more.
> > Looking at the current amdgpu source code (I am using Mainline
> > kernel version 5.17.1), it appears to retry CR negotiation, but each
> > time it uses the same settings, rather than try different driver
> > parameters, as recommended in the DP standards and compliance test
> > documents.
>
> […]
>
> Awesome, that you review the code with your expertise. Though I suggest
> to look at and work on agd5f/amd-staging-drm-next [1], having the latest
> code for the driver.
>

Just a small update.
I have CR negotiation working now. I have found out what I suspected.
Although the amdgpu driver has code in it that looks like CR
negotiation is implemented, it does not actually do any negotiation at
all.
I still have some work to do, to also get EQ negotiation working.
Once I have a patch that is tidy enough, I will raise a bug request
and attach the patch to it.
I am just pleased that this is a software problem, and not my screen at fault.

Someone mentioned that the amdgpu driver is DP compliance tested periodically.
In short, I don't think any of those Link negotiation DP compliance
tests are actually valid, based on what I have found, so it might be
sensible for the person who runs the Link negotiation DP compliance
test suite to double check it is actually doing its job.

Summary:
During CR negotiation, it only ever outputs a swing of 0, and never
1,2 or 3 as it should.
So, my guess is only Screens on relatively short cables have ever
worked with the Linux amdgpu driver.
Although longer cables might work for some, it probably gets a bit hit
and miss for them.

Kind Regards

James
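The difference James describes, between retrying clock recovery with the same settings and stepping the voltage swing through levels 0 to 3, can be sketched as a toy model. Everything here is an assumption made for illustration: the `sink_model` struct, the `sink_cr_done()` helper, and the retry budget are hypothetical, and a real DisplayPort source also adjusts pre-emphasis and reads the sink's adjust request over AUX. This is not the amdgpu implementation or the patch attached to the bug.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SWING_LEVEL 3
#define MAX_CR_ATTEMPTS 10 /* assumed retry budget for this sketch */

/* Hypothetical sink: clock recovery locks once the swing is high enough. */
struct sink_model {
	int required_swing;
};

static bool sink_cr_done(const struct sink_model *sink, int swing)
{
	return swing >= sink->required_swing;
}

/* Retry with a fixed swing of 0, which is the behaviour reported above. */
static bool cr_fixed_swing(const struct sink_model *sink)
{
	for (int attempt = 0; attempt < MAX_CR_ATTEMPTS; attempt++)
		if (sink_cr_done(sink, 0))
			return true;
	return false;
}

/*
 * Raise the swing after each failed attempt, up to the maximum level.
 * Returns the swing level that locked, or -1 if none did.
 */
static int cr_stepped_swing(const struct sink_model *sink)
{
	int swing = 0;

	assert(sink->required_swing >= 0);

	for (int attempt = 0; attempt < MAX_CR_ATTEMPTS; attempt++) {
		if (sink_cr_done(sink, swing))
			return swing;
		if (swing == MAX_SWING_LEVEL)
			break; /* nothing left to try at this link rate */
		swing++;
	}
	return -1;
}
```

In this model a sink that locks at swing 0 (a short cable, say) works with either loop, while one that needs swing 2 only works with the stepped loop, which matches the "short cables only" guess in the summary.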
[PATCH 2/2] drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too
The idea is from commit a50fe7078035 ("drm/amdkfd: Only apply heavy-weight
TLB flush on Aldebaran") and commit f61c40c0757a ("drm/amdkfd: enable
heavy-weight TLB flush on Arcturus"). At the moment, a heavy-weight TLB
flush could cause problems on ASICs other than Aldebaran and Arcturus.
A simple hipMallocManaged/hipFree program could trigger this issue.

[ 97.787657] amdgpu :01:00.0: amdgpu: wait for kiq fence error: 0.
[ 106.868758] amdgpu: qcm fence wait loop timeout expired
[ 106.868966] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 106.869203] amdgpu: Failed to evict process queues
[ 106.869261] amdgpu: Failed to quiesce KFD

Signed-off-by: Lang Yu
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 459fa07a3bcc..5afe216cf099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1229,7 +1229,9 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start,
 			if (r)
 				break;
 		}
-		kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
+
+		if (kfd_flush_tlb_after_unmap(pdd->dev))
+			kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
 	}

 	return r;
--
2.25.1
[PATCH 1/2] drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h
To make kfd_flush_tlb_after_unmap visible in kfd_svm.c, move it into
kfd_priv.h and change it to an inline function.

Signed-off-by: Lang Yu
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 8
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 91f82a9ccdaf..459f59e3d0ed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1128,14 +1128,6 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	return ret;
 }

-static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
-{
-	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
-		(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
-		dev->adev->sdma.instance[0].fw_version >= 18) ||
-		KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
-}
-
 static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 					struct kfd_process *p, void *data)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8a43def1f638..aff6f598ff2c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1328,6 +1328,14 @@ void kfd_signal_poison_consumed_event(struct kfd_dev *dev, u32 pasid);

 void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type);

+static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
+{
+	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
+		(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+		dev->adev->sdma.instance[0].fw_version >= 18) ||
+		KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
+}
+
 bool kfd_is_locked(void);

 /* Compute profile */
--
2.25.1
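For readers outside the kernel tree, the predicate moved by this patch can be exercised in isolation with a small user-space sketch. The `mock_kfd_dev` struct and the `mock_flush_tlb_after_unmap()` name are hypothetical stand-ins, and the `IP_VERSION()` packing shown here (major, minor, revision in one integer) is an assumption about how the kernel macro is defined. Only the boolean logic is copied from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of major/minor/revision into one integer. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical stand-in for the fields the real helper reads. */
struct mock_kfd_dev {
	uint32_t gc_version;      /* what KFD_GC_VERSION(dev) would return */
	uint32_t sdma_fw_version; /* dev->adev->sdma.instance[0].fw_version */
};

/*
 * Same boolean logic as kfd_flush_tlb_after_unmap() in the patch:
 * GC 9.4.2 always, GC 9.4.1 only with SDMA firmware >= 18, GC 9.4.0 always.
 */
static inline bool mock_flush_tlb_after_unmap(const struct mock_kfd_dev *dev)
{
	assert(dev);

	return dev->gc_version == IP_VERSION(9, 4, 2) ||
	       (dev->gc_version == IP_VERSION(9, 4, 1) &&
		dev->sdma_fw_version >= 18) ||
	       dev->gc_version == IP_VERSION(9, 4, 0);
}
```

Defining such a helper as `static inline` in a shared header gives each including .c file its own internal-linkage copy, so no separate exported definition is needed when a second file starts calling it.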
Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga
On 15.04.22 at 07:38, Mikhail Gavrilov wrote:
> On Sat, Apr 9, 2022 at 7:27 PM Christian König wrote:
> > That's unfortunately not the end of the story.
> >
> > This is fixing your problem, but reintroducing the original problem
> > that we call the syncobj with a lock held which can crash badly as well.
> >
> > Going to take a closer look on Monday. I hope you can test a few more
> > patches to help narrow down what's actually going wrong here.
> >
> > Thanks,
> > Christian.
>
> Hi Christian.
> I'm sorry to trouble you.
> Have you forgotten about this issue?

No, I just couldn't find time during all that bug fixing :)

Sorry for the delay, going to take a look after the Easter holiday here.

Christian.
Re: [PATCH] drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too
On 04/15/ , Paul Menzel wrote:
> Dear Lang,
>
>
> On 15.04.22 at 05:20, Lang Yu wrote:
> > On 04/14/ , Paul Menzel wrote:
>
> > > On 14.04.22 at 10:19, Lang Yu wrote:
> > > > The idea is from commit a50fe7078035 ("drm/amdkfd: Only apply
> > > > heavy-weight TLB flush on Aldebaran") and commit f61c40c0757a
> > > > ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus").
> > > > Otherwise, we will run into problems on some ASICs when running
> > > > SVM applications.
> > >
> > > Please list the ASICs, you know of having problems, and even add how to
> > > reproduce this.
> >
> > Actually, this is ported from previous commits. You can find more details
> > from the commits I mentioned. At the moment the ASICs except Aldebaran
> > and Arcturus probably have the problem.
>
> I think, it’s always good to make it as easy as possible for reviewers and,
> later, people reading a commit, and include the necessary information
> directly in the commit message. It’d be great if you amended the commit
> message.

Yes, I agree with you. Will amend the commit message.

> > And running a SVM application could reproduce the issue.
>
> Thanks. How will it fail though?

Will describe more details in the commit message.

> (Also, a small implementation note would be nice to have. Maybe: Move the
> helper function into the header `kfd_priv.h`, and use in
> `svm_range_unmap_from_gpus()`.)

Will separate this change into another patch, as suggested by Eric.

Thanks,
Lang

> Kind regards,
>
> Paul
>
>
> > > > Signed-off-by: Lang Yu
> > > > ---
> > > >  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8
> > > >  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 8
> > > >  drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 4 +++-
> > > >  3 files changed, 11 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > index 91f82a9ccdaf..459f59e3d0ed 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > @@ -1128,14 +1128,6 @@ static int kfd_ioctl_free_memory_of_gpu(struct
> > > > file *filep,
> > > >  	return ret;
> > > >  }
> > > > -static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
> > > > -{
> > > > -	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> > > > -		(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> > > > -		dev->adev->sdma.instance[0].fw_version >= 18) ||
> > > > -		KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
> > > > -}
> > > > -
> > > >  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
> > > >  					struct kfd_process *p, void
> > > > *data)
> > > >  {
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > index 8a43def1f638..aff6f598ff2c 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > @@ -1328,6 +1328,14 @@ void kfd_signal_poison_consumed_event(struct
> > > > kfd_dev *dev, u32 pasid);
> > > >  void kfd_flush_tlb(struct kfd_process_device *pdd, enum
> > > > TLB_FLUSH_TYPE type);
> > > > +static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
> > > > +{
> > > > +	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> > > > +		(KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> > > > +		dev->adev->sdma.instance[0].fw_version >= 18) ||
> > > > +		KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0);
> > > > +}
> > > > +
> > > >  bool kfd_is_locked(void);
> > > >  /* Compute profile */
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > index 459fa07a3bcc..5afe216cf099 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > @@ -1229,7 +1229,9 @@ svm_range_unmap_from_gpus(struct svm_range
> > > > *prange, unsigned long start,
> > > >  			if (r)
> > > >  				break;
> > > >  		}
> > > > -		kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
> > > > +
> > > > +		if (kfd_flush_tlb_after_unmap(pdd->dev))
> > > > +			kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
> > > >  	}
> > > >  	return r;