RE: [PATCH] drm/amdgpu: sw pstate switch should only be for vega20

2020-04-24 Thread Kim, Jonathan
[AMD Official Use Only - Internal Distribution Only]

Yes that is correct.

Thank you,

Jon

-----Original Message-----
From: Kuehling, Felix  
Sent: Friday, April 24, 2020 6:51 PM
To: Kim, Jonathan 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: sw pstate switch should only be for vega20

On 2020-04-24 at 6:22 p.m., Jonathan Kim wrote:
> Driver-steered p-state switching is designed for Vega20 only.
> Also simplify early return for temporary disable due to SMU FW bug.
>
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 

I assume the early return would be qualified with a firmware version check once 
a firmware fix is available.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 54d8a3e7e75c..48c0ce13f68e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -395,7 +395,9 @@ int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, 
> int pstate)
>   bool init_low = hive->pstate == AMDGPU_XGMI_PSTATE_UNKNOWN;
>  
>   /* fw bug so temporarily disable pstate switching */
> - if (!hive || adev->asic_type == CHIP_VEGA20)
> + return 0;
> +
> + if (!hive || adev->asic_type != CHIP_VEGA20)
>   return 0;
>  
>   mutex_lock(&hive->hive_lock);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: sw pstate switch should only be for vega20

2020-04-24 Thread Felix Kuehling
On 2020-04-24 at 6:22 p.m., Jonathan Kim wrote:
> Driver-steered p-state switching is designed for Vega20 only.
> Also simplify early return for temporary disable due to SMU FW
> bug.
>
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 

I assume the early return would be qualified with a firmware version
check once a firmware fix is available.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 54d8a3e7e75c..48c0ce13f68e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -395,7 +395,9 @@ int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, 
> int pstate)
>   bool init_low = hive->pstate == AMDGPU_XGMI_PSTATE_UNKNOWN;
>  
>   /* fw bug so temporarily disable pstate switching */
> - if (!hive || adev->asic_type == CHIP_VEGA20)
> + return 0;
> +
> + if (!hive || adev->asic_type != CHIP_VEGA20)
>   return 0;
>  
>   mutex_lock(&hive->hive_lock);


[PATCH] drm/amdgpu: sw pstate switch should only be for vega20

2020-04-24 Thread Jonathan Kim
Driver-steered p-state switching is designed for Vega20 only.
Also simplify early return for temporary disable due to SMU FW
bug.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 54d8a3e7e75c..48c0ce13f68e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -395,7 +395,9 @@ int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, int 
pstate)
bool init_low = hive->pstate == AMDGPU_XGMI_PSTATE_UNKNOWN;
 
/* fw bug so temporarily disable pstate switching */
-   if (!hive || adev->asic_type == CHIP_VEGA20)
+   return 0;
+
+   if (!hive || adev->asic_type != CHIP_VEGA20)
return 0;
 
mutex_lock(&hive->hive_lock);
-- 
2.17.1



Re: [PATCH] drm/amdgpu: Remove unneeded semicolon

2020-04-24 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Apr 24, 2020 at 3:56 AM Christian König
 wrote:
>
> On 24.04.20 at 09:56, Zheng Bin wrote:
> > Fixes coccicheck warning:
> >
> > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon
> >
> > Reported-by: Hulk Robot 
> > Signed-off-by: Zheng Bin 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 09aa5f509bd2..43d84214995c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -2531,7 +2531,7 @@ static void gfx_v9_0_init_sq_config(struct 
> > amdgpu_device *adev)
> >   break;
> >   default:
> >   break;
> > - };
> > + }
> >   }
> >
> >   static void gfx_v9_0_constants_init(struct amdgpu_device *adev)
> > --
> > 2.26.0.106.g9fadedd
> >
>


[pull] amdgpu, amdkfd, radeon drm-next-5.8

2020-04-24 Thread Alex Deucher
Hi Dave, Daniel,

First pull for 5.8.

The following changes since commit 3148a6a0ef3cf93570f30a477292768f7eb5d3c3:

  drm/amdkfd: kfree the wrong pointer (2020-04-01 14:44:22 -0400)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-next-5.8-2020-04-24

for you to fetch changes up to e748f07d00c1c4a9106acafac52df7ea4ecf6264:

  drm/amdgpu: retire legacy vega10 sos version check (2020-04-23 15:41:06 -0400)


amd-drm-next-5.8-2020-04-24:

amdgpu:
- Documentation improvements
- Enable FRU chip access on boards that support it
- RAS updates
- SR-IOV updates
- Powerplay locking fixes for older SMU versions
- VCN DPG (dynamic powergating) cleanup
- VCN 2.5 DPG enablement
- Rework GPU scheduler handling
- Improve scheduler priority handling
- Add SPM (streaming performance monitor) golden settings for navi
- GFX10 clockgating fixes
- DC ABM (automatic backlight modulation) fixes
- DC cursor and plane fixes
- DC watermark fixes
- DC clock handling fixes
- DC color management fixes
- GPU reset fixes
- Clean up MMIO access macros
- EEPROM access fixes
- Misc code cleanups

amdkfd:
- Misc code cleanups

radeon:
- Clean up safe reg list generation
- Misc code cleanups


Aaron Liu (1):
  drm/amdgpu: unify fw_write_wait for new gfx9 asics

Aaron Ma (1):
  drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event

Alex Deucher (8):
  drm/amdgpu/psp: dont warn on missing optional TA's
  drm/amdgpu/display: fix warning when compiling without debugfs
  drm/amdgpu/ring: add no_scheduler flag
  drm/amdgpu/kiq: add no_scheduler flag to KIQ
  drm/amdgpu/ring: simplify scheduler setup logic
  drm/amdgpu/gfx9: add gfxoff quirk
  drm/amdgpu/display: fix aux registration (v2)
  drm/amdgpu/display: give aux i2c buses more meaningful names

Alex Sierra (6):
  drm/amdgpu: infinite retries fix from UTLC1 RB SDMA
  drm/amdgpu: ih doorbell size of range changed for nbio v7.4
  drm/amdgpu: enable IH ring 1 and ring 2 for navi
  drm/amdgpu: call psp to program ih cntl in SR-IOV for Navi
  drm/amdgpu: reroute VMC and UMD to IH ring 1 for oss v5
  amdgpu/drm: remove psp access on navi10 for sriov

Alvin Lee (1):
  drm/amd/display: Revert to old formula in set_vtg_params

Anthony Koo (5):
  drm/amd/display: make all backlight calls link based
  drm/amd/display: move panel power seq to new panel struct
  drm/amd/display: destroy panel on link destruct
  drm/amd/display: change from panel to panel cntl
  drm/amd/display: fix bug in the logic for panel power control

Aric Cyr (7):
  drm/amd/display: 3.2.77
  drm/amd/display: 3.2.78
  drm/amd/display: 3.2.79
  drm/amd/display: 3.2.80
  drm/amd/display: Fix HDR visual confirm
  drm/amd/display: Update MPCC if requested
  drm/amd/display: 3.2.81

Aurabindo Pillai (5):
  amdgpu_kms: Remove unnecessary condition check
  drm/amd/amdgpu: add prefix for pr_* prints
  drm/amd/amdgpu: add print prefix for dev_* variants
  drm/amd/amdgpu: remove hardcoded module name in prints
  drm/amd/display: DispalyPort: Write OUI only if panel supports it

Bernard Zhao (2):
  drm/amdgpu: cleanup coding style in amdkfd a bit
  drm/amdgpu: shrink critical section in amdgpu_amdkfd_gpuvm_free_memory_of_gpu

Bhawanpreet Lakha (4):
  drm/amd/display: remove mod_hdcp_hdcp2_get_link_encryption_status()
  drm/amd/display: Guard calls to hdcp_ta and dtm_ta
  drm/amd/display: query hdcp capability during link detect
  drm/amd/display: add HDCP caps debugfs

Charlene Liu (1):
  drm/amd/display: initialize get_max_link_cap

Chen Zhou (1):
  drm/amdgpu/uvd7: remove unnecessary conversion to bool

Chengming Gui (1):
  drm/amd/amdgpu: Correct gfx10's CG sequence

Christian König (3):
  drm/amdgpu: stop disable the scheduler during HW fini
  drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4
  drm/amdgpu: change how we update mmRLC_SPM_MC_CNTL

Colin Ian King (2):
  drm/amdgpu/vcn: fix spelling mistake "fimware" -> "firmware"
  drm/amd/display: remove redundant assignment to variable dp_ref_clk_khz

Dale Zhao (1):
  drm/amd/display: Correct updating logic of dcn21's pipe VM flags

Dennis Li (2):
  drm/amdgpu: replace DRM prefix with PCI device info for gfx/mmhub
  drm/amdgpu: set error query ready after all IPs late init

Dmytro Laktyushkin (3):
  drm/amd/display: fix dml pipe merge logic
  drm/amd/display: fix stream setting for diags on silicon
  drm/amd/display: fix virtual signal dsc setup

Emily Deng (4):
  drm/amdgpu: Virtual display need to support multiple ctrcs
  drm/amdgpu: Add 4k resolution for virtual display
  drm/amdgpu: Ignore the not supported error from psp
  drm/amdgpu: No need support vcn decode

Eric Yang (1)

Re: FreeBSD / amdgpu / Vega 3, pstate TEST_DEBUG_DATA: 0x0

2020-04-24 Thread Harry Wentland
On 2020-04-24 8:34 a.m., Andriy Gapon wrote:
> 
> First, I understand that my platform is not directly supported and probably 
> not
> very interesting, but I still hope to get some tips or pointers.
> 

Hi Andriy,

yeah, limited insight here since FreeBSD isn't something I'm familiar
with. :)

> I am trying to get amdgpu/FreeBSD going on Motile M141 laptop with Ryzen 3 
> 3200U
> CPU that has Vega 3 graphics integrated.  When amdgpu starts loading the 
> screen
> goes black and never lights up again.  I am not sure whether there is no 
> signal
> at all or whether the backlight is turned off, but the screen is completely
> dark.  I can blindly interact with the system, so it's not crashed or hung.
> From system logs I can see that the driver attaches successfully.  It 
> recognizes
> the hardware, loads its firmware, detects the eDP screen and so on.
> 

Does BSD have a way to check or set your backlight value manually (a la
/sys/class/backlight on linux)? If so I'd suggest checking and setting
it to non-zero values, ideally to max_brightness.
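On the Linux side, the check suggested here is a quick sysfs loop; this is only a sketch (sysfs paths vary by platform, and the FreeBSD port exposes its backlight controls elsewhere):

```shell
# Minimal sketch: list each backlight device and its current/max brightness.
for d in /sys/class/backlight/*; do
    [ -d "$d" ] || continue          # no backlight device registered
    printf '%s: %s / max %s\n' "$d" \
        "$(cat "$d/brightness")" "$(cat "$d/max_brightness")"
done
```

Writing `max_brightness`'s value into `brightness` (as root) is the "set it to max" step.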

Have you tried an external display?

> The FreeBSD's amdgpu port that I am trying is based on code circa 5.0.
> There is no newer version ported.
> I tried a couple of Linux distros with 5.3.x kernels and they worked without 
> any
> problems. So that gives me some hope.
> 
> I compared driver messages (with drm_debug set to 0xfff) between Linux and
> FreeBSD and they look quite similar.  Except for one thing.
> In the FreeBSD case there are these error messages that are not seen with 
> Linux:
> 
> [drm] pstate TEST_DEBUG_DATA: 0x0
> WARNING !(0) failed at
> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:868
> #0 0x83451633 at linux_dump_stack+0x23
> #1 0x8325a9ee at dcn10_verify_allow_pstate_change_high+0x4e
> #2 0x8325e925 at dcn10_wait_for_mpcc_disconnect+0x25
> #3 0x8325de53 at dcn10_disable_plane+0x53
> #4 0x8325c9f5 at dcn10_init_hw+0x755
> #5 0x83295ca8 at dc_create+0x538
> #6 0x8327a8da at dm_hw_init+0x1ea
> #7 0x831701d1 at amdgpu_device_init+0x1b11
> #8 0x83185177 at amdgpu_driver_load_kms+0xd7
> #9 0x833f138e at drm_dev_register+0x17e
> #10 0x83178dea at amdgpu_pci_probe+0x18a
> #11 0x83456f40 at linux_pci_attach+0x560
> #12 0x80bf68ea at device_attach+0x3ca
> #13 0x80bf6490 at device_probe_and_attach+0x70
> #14 0x80bf8358 at bus_generic_driver_added+0x58
> #15 0x80bf4289 at devclass_driver_added+0x39
> #16 0x80bf41c7 at devclass_add_driver+0x147
> #17 0x83455ae9 at _linux_pci_register_driver+0xc9
> 
> That warning plus stack trace is actually BREAK_TO_DEBUGGER() in the original
> Linux code.
> So, that makes me think that the problem is pretty serious.

BREAK_TO_DEBUGGER is probably overly scary here. It's somewhat a concern
as this means power consumption might be higher than expected. We've
seen this issue on several systems without any other adverse effects to
usability.

Harry

> I tried searching for "TEST_DEBUG_DATA: 0x0" and I could not find a single
> result with "0x0" in it.  Usually there is some non-zero value.
> To me this looks like maybe some hardware component is not turned on...
> Perhaps this is something relatively obvious for people that hack on the 
> driver
> and the hardware.
> I hope to receive some hint about what to look for.
> I can cherry-pick commits from Linux, apply patches, add additional debugging
> logs, etc.
> 
> FreeBSD amdgpu dmesg: https://people.freebsd.org/~avg/amdgpu.dmesg.txt
> Full Linux dmesg: https://people.freebsd.org/~avg/linux-5.3.0-28.dmesg.out
> 
> And with drm_debug=0xfff.
> FreeBSD: https://people.freebsd.org/~avg/fbsd-dmesg.txt
> Linux: https://people.freebsd.org/~avg/linux-5.3.9-dmesg.txt
> 
> I see that both Linux and FreeBSD have similar messages about failing to load
> some microcode components, but I guess that it must be okay since Linux works:
> [4.487381] [drm] reserve 0x40 from 0xf400c0 for PSP TMR
> [4.564893] [drm] failed to load ucode id (12)
> [4.564894] [drm] psp command failed and response status is (-53233)
> [4.567891] [drm] failed to load ucode id (13)
> [4.567892] [drm] psp command failed and response status is (-65521)
> [4.570891] [drm] failed to load ucode id (14)
> [4.570892] [drm] psp command failed and response status is (-65521)
> 
> Thank you!
> 


Re: [PATCH] amdgpu/dc: remove redundant assignment to variable 'option'

2020-04-24 Thread Alex Deucher
On Fri, Apr 24, 2020 at 7:12 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> The variable option is being initialized with a value that is
> never read and it is being updated later with a new value.  The
> initialization is redundant and can be removed.
>
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c 
> b/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
> index 4245e1f818a3..e096d2b95ef9 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
> @@ -679,8 +679,7 @@ void dce110_opp_v_set_csc_default(
> if (default_adjust->force_hw_default == false) {
> const struct out_csc_color_matrix *elm;
> /* currently parameter not in use */
> -   enum grph_color_adjust_option option =
> -   GRPH_COLOR_MATRIX_HW_DEFAULT;
> +   enum grph_color_adjust_option option;
> uint32_t i;
> /*
>  * HW default false we program locally defined matrix
> --
> 2.25.1
>


Re: [PATCH][next] drm/amdgpu: fix unlocks on error return path

2020-04-24 Thread Alex Deucher
On Fri, Apr 24, 2020 at 8:56 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> Currently the error return paths unlock kiq->ring_lock; however, it
> seems this should be adev->gfx.kiq.ring_lock, as that is the lock
> being locked and unlocked around the ring operations.  This looks
> like a bug; fix it by unlocking the correct lock.
>
> [ Note: untested ]
>
> Addresses-Coverity: ("Missing unlock")
> Fixes: 82478876eaac ("drm/amdgpu: protect ring overrun")
> Signed-off-by: Colin Ian King 

It's the same lock, just accessed via a local pointer.  I'll take the
patch and update the commit message when I apply it to avoid confusion
in the future.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index b120f9160f13..edaa50d850a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -430,7 +430,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
> r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
> if (r) {
> amdgpu_ring_undo(ring);
> -   spin_unlock(&kiq->ring_lock);
> +   spin_unlock(&adev->gfx.kiq.ring_lock);
> return -ETIME;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 0a6026308343..055ecba754ff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -624,7 +624,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
> amdgpu_device *adev,
> r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
> if (r) {
> amdgpu_ring_undo(ring);
> -   spin_unlock(&kiq->ring_lock);
> +   spin_unlock(&adev->gfx.kiq.ring_lock);
> return -ETIME;
> }
>
> --
> 2.25.1
>


[PATCH] drm/amdgpu: re-enable runtime pm on vega10

2020-04-24 Thread Alex Deucher
It was disabled due to a ROCm failure.  I think that should
be fixed.  Re-enable it.

Signed-off-by: Alex Deucher 
---

Can anyone verify if the runtime pm issues with ROCm have been fixed?
I'd like to re-enable runtime pm for vega10.

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index ea7e72ecaefa..c7f42ff6ab5e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -172,7 +172,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
else if (amdgpu_device_supports_baco(dev) &&
 (amdgpu_runtime_pm != 0) &&
 (adev->asic_type >= CHIP_TOPAZ) &&
-(adev->asic_type != CHIP_VEGA10) &&
 (adev->asic_type != CHIP_VEGA20) &&
 (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
adev->runpm = true;
-- 
2.25.3



Re: [PATCH 2/2] drma/dmgpu: drop redundant cg/pg ungate on runpm enter

2020-04-24 Thread Alex Deucher
On Fri, Apr 24, 2020 at 10:02 AM Alex Deucher  wrote:
>
> Also, I just noticed a typo in the patch title:
>
> drma/dmgpu -> drm/admgpu

Can't type.

drma/dmgpu -> drm/amdgpu

Alex

>
> Alex
>
> On Fri, Apr 24, 2020 at 4:06 AM Evan Quan  wrote:
> >
> > CG/PG ungate is already performed in ip_suspend_phase1; otherwise it
> > would be performed twice. That causes gfxoff disablement to run twice
> > on runpm enter while gfxoff enablement runs only once on runpm exit,
> > leaving gfxoff in the disabled state.
> >
> > Change-Id: I489ca456770d3fe482b685f132400202467f712b
> > Signed-off-by: Evan Quan 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 08eeb0d2c149..71278942f9f0 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3453,9 +3453,6 @@ int amdgpu_device_suspend(struct drm_device *dev, 
> > bool fbcon)
> > }
> > }
> >
> > -   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> > -   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> > -
> > amdgpu_ras_suspend(adev);
> >
> > r = amdgpu_device_ip_suspend_phase1(adev);
> > --
> > 2.26.2
> >


Re: [PATCH 2/2] drma/dmgpu: drop redundant cg/pg ungate on runpm enter

2020-04-24 Thread Alex Deucher
Also, I just noticed a typo in the patch title:

drma/dmgpu -> drm/admgpu

Alex

On Fri, Apr 24, 2020 at 4:06 AM Evan Quan  wrote:
>
> CG/PG ungate is already performed in ip_suspend_phase1; otherwise it
> would be performed twice. That causes gfxoff disablement to run twice
> on runpm enter while gfxoff enablement runs only once on runpm exit,
> leaving gfxoff in the disabled state.
>
> Change-Id: I489ca456770d3fe482b685f132400202467f712b
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 08eeb0d2c149..71278942f9f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3453,9 +3453,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
> fbcon)
> }
> }
>
> -   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> -   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> -
> amdgpu_ras_suspend(adev);
>
> r = amdgpu_device_ip_suspend_phase1(adev);
> --
> 2.26.2
>


Re: [PATCH] drm/amdgpu: Need to power up SDMA for #34 sdma firmware after do mode2 reset on Renoir

2020-04-24 Thread Alex Deucher
On Fri, Apr 24, 2020 at 7:09 AM chen gong  wrote:
>
> [why]
> Previously I found that powering up SDMA after a mode2 reset could
> cause the ring-sdma0 test to fail.
> Perhaps the mode2 reset does not reset the SDMA PG state, so SDMA is
> already powered up; there is no need to ask the SMU to power it up
> again, and doing so causes other problems.
> So I skipped the call 'amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_SDMA, false)' with '!adev->in_gpu_reset'.
>
> But now, with the #34 sdma firmware, the dmesg log shows "ring-sdma0
> test failed" after the mode2 reset, and scan data show SDMA does not
> have power. Re-enabling 'amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_SDMA, false)' during mode2 reset makes the issue
> disappear.
>
> I also ran more experiments with the previous sdma firmware on top of
> this patch; everything behaved normally.
>
> [how]
> Remove "!adev->in_gpu_reset"

Do we need a firmware version check?  How recent is the #34 firmware?
If there are older renoir SDMA firmwares out in the wild we may need
the check for them.  With that confirmed,
Reviewed-by: Alex Deucher 

>
> Signed-off-by: chen gong 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index c0ca9a82..0ecab41 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1914,7 +1914,7 @@ static int sdma_v4_0_hw_init(void *handle)
>
> if ((adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs &&
> adev->powerplay.pp_funcs->set_powergating_by_smu) ||
> -   (adev->asic_type == CHIP_RENOIR && 
> !adev->in_gpu_reset))
> +   adev->asic_type == CHIP_RENOIR)
> amdgpu_dpm_set_powergating_by_smu(adev, 
> AMD_IP_BLOCK_TYPE_SDMA, false);
>
> if (!amdgpu_sriov_vf(adev))
> --
> 2.7.4
>


Re: [PATCH 2/2] drma/dmgpu: drop redundant cg/pg ungate on runpm enter

2020-04-24 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Series is:
Acked-by: Alex Deucher 

From: Quan, Evan 
Sent: Friday, April 24, 2020 4:05 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Kuehling, Felix 
; Liang, Prike ; Quan, Evan 

Subject: [PATCH 2/2] drma/dmgpu: drop redundant cg/pg ungate on runpm enter

CG/PG ungate is already performed in ip_suspend_phase1; otherwise it
would be performed twice. That causes gfxoff disablement to run twice
on runpm enter while gfxoff enablement runs only once on runpm exit,
leaving gfxoff in the disabled state.

Change-Id: I489ca456770d3fe482b685f132400202467f712b
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 08eeb0d2c149..71278942f9f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3453,9 +3453,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
 }
 }

-   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
-   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
-
 amdgpu_ras_suspend(adev);

 r = amdgpu_device_ip_suspend_phase1(adev);
--
2.26.2



[PATCH][next] drm/amdgpu: fix unlocks on error return path

2020-04-24 Thread Colin King
From: Colin Ian King 

Currently the error return paths unlock kiq->ring_lock; however, it
seems this should be adev->gfx.kiq.ring_lock, as that is the lock
being locked and unlocked around the ring operations.  This looks
like a bug; fix it by unlocking the correct lock.

[ Note: untested ]

Addresses-Coverity: ("Missing unlock")
Fixes: 82478876eaac ("drm/amdgpu: protect ring overrun")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b120f9160f13..edaa50d850a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -430,7 +430,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
amdgpu_device *adev,
r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
if (r) {
amdgpu_ring_undo(ring);
-   spin_unlock(&kiq->ring_lock);
+   spin_unlock(&adev->gfx.kiq.ring_lock);
return -ETIME;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 0a6026308343..055ecba754ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -624,7 +624,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
amdgpu_device *adev,
r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
if (r) {
amdgpu_ring_undo(ring);
-   spin_unlock(&kiq->ring_lock);
+   spin_unlock(&adev->gfx.kiq.ring_lock);
return -ETIME;
}
 
-- 
2.25.1



[PATCH] drm/amdgpu: Remove unneeded semicolon

2020-04-24 Thread Zheng Bin
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: Zheng Bin 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 09aa5f509bd2..43d84214995c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2531,7 +2531,7 @@ static void gfx_v9_0_init_sq_config(struct amdgpu_device 
*adev)
break;
default:
break;
-   };
+   }
 }

 static void gfx_v9_0_constants_init(struct amdgpu_device *adev)
--
2.26.0.106.g9fadedd



[PATCH] amdgpu/dc: remove redundant assignment to variable 'option'

2020-04-24 Thread Colin King
From: Colin Ian King 

The variable option is being initialized with a value that is
never read and it is being updated later with a new value.  The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
index 4245e1f818a3..e096d2b95ef9 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_opp_csc_v.c
@@ -679,8 +679,7 @@ void dce110_opp_v_set_csc_default(
if (default_adjust->force_hw_default == false) {
const struct out_csc_color_matrix *elm;
/* currently parameter not in use */
-   enum grph_color_adjust_option option =
-   GRPH_COLOR_MATRIX_HW_DEFAULT;
+   enum grph_color_adjust_option option;
uint32_t i;
/*
 * HW default false we program locally defined matrix
-- 
2.25.1



[PATCH 4/5] drm/amdgpu: utilize subconnector property for DP through atombios

2020-04-24 Thread Jeevan B
From: Oleg Vasilev 

Since DP-specific information is stored in the driver's structures, every
driver needs to implement the subconnector property by itself.

v2: rebase

v3: renamed a function call

Cc: Alex Deucher 
Cc: Christian König 
Cc: David (ChunMing) Zhou 
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Jeevan B 
Signed-off-by: Oleg Vasilev 
Reviewed-by: Emil Velikov 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/atombios_dp.c   | 18 +-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
index f355d9a..8955c4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "amdgpu.h"
@@ -1405,6 +1406,10 @@ amdgpu_connector_dp_detect(struct drm_connector 
*connector, bool force)
pm_runtime_put_autosuspend(connector->dev->dev);
}
 
+   drm_dp_set_subconnector_property(&amdgpu_connector->base,
+ret,
+amdgpu_dig_connector->dpcd,
+
amdgpu_dig_connector->downstream_ports);
return ret;
 }
 
@@ -1951,6 +1956,11 @@ amdgpu_connector_add(struct amdgpu_device *adev,
if (has_aux)
amdgpu_atombios_dp_aux_init(amdgpu_connector);
 
+   if (connector_type == DRM_MODE_CONNECTOR_DisplayPort ||
+   connector_type == DRM_MODE_CONNECTOR_eDP) {
+   
drm_connector_attach_dp_subconnector_property(&amdgpu_connector->base);
+   }
+
return;
 
 failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
index 37ba07e..04a430e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
@@ -469,6 +469,7 @@ struct amdgpu_encoder {
 struct amdgpu_connector_atom_dig {
/* displayport */
u8 dpcd[DP_RECEIVER_CAP_SIZE];
+   u8 downstream_ports[DP_MAX_DOWNSTREAM_PORTS];
u8 dp_sink_type;
int dp_clock;
int dp_lane_count;
diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
index 9b74cfd..900b272 100644
--- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
@@ -328,6 +328,22 @@ static void amdgpu_atombios_dp_probe_oui(struct amdgpu_connector *amdgpu_connect
  buf[0], buf[1], buf[2]);
 }
 
+static void amdgpu_atombios_dp_ds_ports(struct amdgpu_connector *amdgpu_connector)
+{
+   struct amdgpu_connector_atom_dig *dig_connector = amdgpu_connector->con_priv;
+   int ret;
+
+   if (dig_connector->dpcd[DP_DPCD_REV] > 0x10) {
+   ret = drm_dp_dpcd_read(&amdgpu_connector->ddc_bus->aux,
+  DP_DOWNSTREAM_PORT_0,
+  dig_connector->downstream_ports,
+  DP_MAX_DOWNSTREAM_PORTS);
+   if (ret)
+   memset(dig_connector->downstream_ports, 0,
+  DP_MAX_DOWNSTREAM_PORTS);
+   }
+}
+
 int amdgpu_atombios_dp_get_dpcd(struct amdgpu_connector *amdgpu_connector)
 {
	struct amdgpu_connector_atom_dig *dig_connector = amdgpu_connector->con_priv;
@@ -343,7 +359,7 @@ int amdgpu_atombios_dp_get_dpcd(struct amdgpu_connector *amdgpu_connector)
  dig_connector->dpcd);
 
amdgpu_atombios_dp_probe_oui(amdgpu_connector);
-
+   amdgpu_atombios_dp_ds_ports(amdgpu_connector);
return 0;
}
 
-- 
2.7.4



[PATCH 5/5] drm/amdgpu: utilize subconnector property for DP through DisplayManager

2020-04-24 Thread Jeevan B
From: Oleg Vasilev 

Since DP-specific information is stored in the driver's structures, every
driver needs to implement the subconnector property itself. Display
Core already has the subconnector information; we only need to
expose it through a DRM property.

v2: rebase

v3: renamed a function call

Cc: Alex Deucher 
Cc: Christian König 
Cc: David (ChunMing) Zhou 
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Jeevan B 
Signed-off-by: Oleg Vasilev 
Tested-by: Oleg Vasilev 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 41 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c|  3 ++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index f7c5cdc..16bdd20 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -121,6 +121,42 @@ MODULE_FIRMWARE(FIRMWARE_NAVI12_DMCU);
 static int amdgpu_dm_init(struct amdgpu_device *adev);
 static void amdgpu_dm_fini(struct amdgpu_device *adev);
 
+static enum drm_mode_subconnector get_subconnector_type(struct dc_link *link)
+{
+   switch (link->dpcd_caps.dongle_type) {
+   case DISPLAY_DONGLE_NONE:
+   return DRM_MODE_SUBCONNECTOR_Native;
+   case DISPLAY_DONGLE_DP_VGA_CONVERTER:
+   return DRM_MODE_SUBCONNECTOR_VGA;
+   case DISPLAY_DONGLE_DP_DVI_CONVERTER:
+   case DISPLAY_DONGLE_DP_DVI_DONGLE:
+   return DRM_MODE_SUBCONNECTOR_DVID;
+   case DISPLAY_DONGLE_DP_HDMI_CONVERTER:
+   case DISPLAY_DONGLE_DP_HDMI_DONGLE:
+   return DRM_MODE_SUBCONNECTOR_HDMIA;
+   case DISPLAY_DONGLE_DP_HDMI_MISMATCHED_DONGLE:
+   default:
+   return DRM_MODE_SUBCONNECTOR_Unknown;
+   }
+}
+
+static void update_subconnector_property(struct amdgpu_dm_connector *aconnector)
+{
+   struct dc_link *link = aconnector->dc_link;
+   struct drm_connector *connector = &aconnector->base;
+   enum drm_mode_subconnector subconnector = DRM_MODE_SUBCONNECTOR_Unknown;
+
+   if (connector->connector_type != DRM_MODE_CONNECTOR_DisplayPort)
+   return;
+
+   if (aconnector->dc_sink)
+   subconnector = get_subconnector_type(link);
+
+   drm_object_property_set_value(&connector->base,
+   connector->dev->mode_config.dp_subconnector_property,
+   subconnector);
+}
+
 /*
  * initializes drm_device display related structures, based on the information
  * provided by DAL. The drm strcutures are: drm_crtc, drm_connector,
@@ -1917,7 +1953,6 @@ void amdgpu_dm_update_connector_after_detect(
if (aconnector->mst_mgr.mst_state == true)
return;
 
-
sink = aconnector->dc_link->local_sink;
if (sink)
dc_sink_retain(sink);
@@ -2038,6 +2073,8 @@ void amdgpu_dm_update_connector_after_detect(
 
mutex_unlock(&dev->mode_config.mutex);
 
+   update_subconnector_property(aconnector);
+
if (sink)
dc_sink_release(sink);
 }
@@ -4521,6 +4558,8 @@ amdgpu_dm_connector_detect(struct drm_connector *connector, bool force)
else
connected = (aconnector->base.force == DRM_FORCE_ON);
 
+   update_subconnector_property(aconnector);
+
return (connected ? connector_status_connected :
connector_status_disconnected);
 }
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 3db1ec3..6a2562d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dm_services.h"
 #include "amdgpu.h"
 #include "amdgpu_dm.h"
@@ -472,6 +473,8 @@ void amdgpu_dm_initialize_dp_connector(struct amdgpu_display_manager *dm,
16,
4,
aconnector->connector_id);
+
+   drm_connector_attach_dp_subconnector_property(&aconnector->base);
 }
 
 int dm_mst_get_pbn_divider(struct dc_link *link)
-- 
2.7.4



FreeBSD / amdgpu / Vega 3, pstate TEST_DEBUG_DATA: 0x0

2020-04-24 Thread Andriy Gapon


First, I understand that my platform is not directly supported and probably not
very interesting, but I still hope to get some tips or pointers.

I am trying to get amdgpu/FreeBSD going on Motile M141 laptop with Ryzen 3 3200U
CPU that has Vega 3 graphics integrated.  When amdgpu starts loading the screen
goes black and never lights up again.  I am not sure whether there is no signal
at all or whether the backlight is turned off, but the screen is completely
dark.  I can blindly interact with the system, so it's not crashed or hung.
From system logs I can see that the driver attaches successfully.  It recognizes
the hardware, loads its firmware, detects the eDP screen and so on.

The FreeBSD's amdgpu port that I am trying is based on code circa 5.0.
There is no newer version ported.
I tried a couple of Linux distros with 5.3.x kernels and they worked without any
problems. So that gives me some hope.

I compared driver messages (with drm_debug set to 0xfff) between Linux and
FreeBSD and they look quite similar.  Except for one thing.
In the FreeBSD case there are these error messages that are not seen with Linux:

[drm] pstate TEST_DEBUG_DATA: 0x0
WARNING !(0) failed at
/usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:868
#0 0x83451633 at linux_dump_stack+0x23
#1 0x8325a9ee at dcn10_verify_allow_pstate_change_high+0x4e
#2 0x8325e925 at dcn10_wait_for_mpcc_disconnect+0x25
#3 0x8325de53 at dcn10_disable_plane+0x53
#4 0x8325c9f5 at dcn10_init_hw+0x755
#5 0x83295ca8 at dc_create+0x538
#6 0x8327a8da at dm_hw_init+0x1ea
#7 0x831701d1 at amdgpu_device_init+0x1b11
#8 0x83185177 at amdgpu_driver_load_kms+0xd7
#9 0x833f138e at drm_dev_register+0x17e
#10 0x83178dea at amdgpu_pci_probe+0x18a
#11 0x83456f40 at linux_pci_attach+0x560
#12 0x80bf68ea at device_attach+0x3ca
#13 0x80bf6490 at device_probe_and_attach+0x70
#14 0x80bf8358 at bus_generic_driver_added+0x58
#15 0x80bf4289 at devclass_driver_added+0x39
#16 0x80bf41c7 at devclass_add_driver+0x147
#17 0x83455ae9 at _linux_pci_register_driver+0xc9

That warning plus stack trace is actually BREAK_TO_DEBUGGER() in the original
Linux code.
So, that makes me think that the problem is pretty serious.
I tried searching for "TEST_DEBUG_DATA: 0x0" and I could not find a single
result with "0x0" in it.  Usually there is some non-zero value.
To me this looks like maybe some hardware component is not turned on...
Perhaps this is something relatively obvious for people that hack on the driver
and the hardware.
I hope to receive some hint about what to look for.
I can cherry-pick commits from Linux, apply patches, add additional debugging
logs, etc.

FreeBSD amdgpu dmesg: https://people.freebsd.org/~avg/amdgpu.dmesg.txt
Full Linux dmesg: https://people.freebsd.org/~avg/linux-5.3.0-28.dmesg.out

And with drm_debug=0xfff.
FreeBSD: https://people.freebsd.org/~avg/fbsd-dmesg.txt
Linux: https://people.freebsd.org/~avg/linux-5.3.9-dmesg.txt

I see that both Linux and FreeBSD have similar messages about failing to load
some microcode components, but I guess that it must be okay since Linux works:
[4.487381] [drm] reserve 0x40 from 0xf400c0 for PSP TMR
[4.564893] [drm] failed to load ucode id (12)
[4.564894] [drm] psp command failed and response status is (-53233)
[4.567891] [drm] failed to load ucode id (13)
[4.567892] [drm] psp command failed and response status is (-65521)
[4.570891] [drm] failed to load ucode id (14)
[4.570892] [drm] psp command failed and response status is (-65521)

Thank you!

-- 
Andriy Gapon




[PATCH AUTOSEL 5.6 14/38] drm/amdgpu: fix wrong vram lost counter increment V2

2020-04-24 Thread Sasha Levin
From: Evan Quan 

[ Upstream commit 028cfb2444b94d4f394a6fa4ca46182481236e91 ]

The VRAM lost counter is wrongly increased by two during a BACO reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++--
 drivers/gpu/drm/amd/amdgpu/cik.c   |  2 --
 drivers/gpu/drm/amd/amdgpu/nv.c|  4 
 drivers/gpu/drm/amd/amdgpu/soc15.c |  4 
 drivers/gpu/drm/amd/amdgpu/vi.c|  2 --
 5 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c8bf9cb3cebf2..f184cdca938de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1953,8 +1953,24 @@ static void amdgpu_device_fill_reset_magic(struct amdgpu_device *adev)
  */
 static bool amdgpu_device_check_vram_lost(struct amdgpu_device *adev)
 {
-   return !!memcmp(adev->gart.ptr, adev->reset_magic,
-   AMDGPU_RESET_MAGIC_NUM);
+   if (memcmp(adev->gart.ptr, adev->reset_magic,
+   AMDGPU_RESET_MAGIC_NUM))
+   return true;
+
+   if (!adev->in_gpu_reset)
+   return false;
+
+   /*
+* For all ASICs with baco/mode1 reset, the VRAM is
+* always assumed to be lost.
+*/
+   switch (amdgpu_asic_reset_method(adev)) {
+   case AMD_RESET_METHOD_BACO:
+   case AMD_RESET_METHOD_MODE1:
+   return true;
+   default:
+   return false;
+   }
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index 006f21ef7ddf0..62635e58e45ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1358,8 +1358,6 @@ static int cik_asic_reset(struct amdgpu_device *adev)
int r;
 
if (cik_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
r = amdgpu_dpm_baco_reset(adev);
} else {
r = cik_asic_pci_config_reset(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2d1bebdf1603d..cc3a79029376b 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -351,8 +351,6 @@ static int nv_asic_reset(struct amdgpu_device *adev)
struct smu_context *smu = &adev->smu;
 
if (nv_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
ret = smu_baco_enter(smu);
if (ret)
return ret;
@@ -360,8 +358,6 @@ static int nv_asic_reset(struct amdgpu_device *adev)
if (ret)
return ret;
} else {
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
ret = nv_asic_mode1_reset(adev);
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index d8945c31b622c..132a67a041a24 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -569,14 +569,10 @@ static int soc15_asic_reset(struct amdgpu_device *adev)
 
switch (soc15_asic_reset_method(adev)) {
case AMD_RESET_METHOD_BACO:
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
return soc15_asic_baco_reset(adev);
case AMD_RESET_METHOD_MODE2:
return amdgpu_dpm_mode2_reset(adev);
default:
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
return soc15_asic_mode1_reset(adev);
}
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 78b35901643bc..3ce10e05d0d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -765,8 +765,6 @@ static int vi_asic_reset(struct amdgpu_device *adev)
int r;
 
if (vi_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
-   if (!adev->in_suspend)
-   amdgpu_inc_vram_lost(adev);
r = amdgpu_dpm_baco_reset(adev);
} else {
r = vi_asic_pci_config_reset(adev);
-- 
2.20.1



[PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v3

2020-04-24 Thread Christian König
From: Jiange Zhao 

When the GPU hits a timeout, it notifies an interested party of an
opportunity to dump info before the actual GPU reset.

A usermode app would open the 'autodump' node under debugfs
and poll() for it to become readable/writable. When a GPU reset is due,
amdgpu notifies the usermode app through the wait_queue_head and gives
it 10 minutes to dump info.

After the usermode app has done its work, the 'autodump' node is closed.
On node closure, amdgpu learns that the dump is done through the
completion that is triggered in release().

No bulk data needs to pass through the node itself, because the
necessary info can be obtained through dmesg and umr. Messages back
and forth between the usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..6f8ef98c4b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -990,6 +990,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..b1029d12a971 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #include "amdgpu.h"
@@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
 }
 
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   const unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping,
+   timeout);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done) {
+   reinit_completion(&adev->autodump.dumping);
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode,
+  struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.dumping);
+   return 0;
+}
+
+static ssize_t amdgpu_debugfs_autodump_read(struct file *file, char __user *buf,
+   size_t size, loff_t *pos)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   wait_event_interruptible(adev->autodump.gpu_hang, adev->in_gpu_reset);
+   return 0;
+}
+
+unsigned int amdgpu_debugfs_autodump_poll(struct file *file,
+ struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .read = amdgpu_debugfs_autodump_read,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static void amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   init_completion(&adev->autodump.dumping);
+   init_waitqueue_head(&adev->autodump.gpu_hang);
+
+   complete(&adev->autodump.dumping);
+
+   debugfs_create_file("amdgpu_autodump", 0600,
+   adev->ddev->primary->debugfs_root,
+   adev, &autodump_debug_fops);

[PATCH] drm/amdgpu: Need to power up SDMA for #34 sdma firmware after do mode2 reset on Renoir

2020-04-24 Thread chen gong
[why]
Previously I found that powering up SDMA after a mode2 reset could cause
the ring-sdma0 test to fail.
Perhaps the mode2 reset does not reset the SDMA PG state, so SDMA is
already powered up; there is no need to ask the SMU to power it up
again, and doing so caused some other problem.
So I skipped the call to 'amdgpu_dpm_set_powergating_by_smu(adev,
AMD_IP_BLOCK_TYPE_SDMA, false)' with '!adev->in_gpu_reset'.

But now, with the #34 SDMA firmware, the dmesg log shows "ring-sdma0
test failed" after a mode2 reset, and scan data shows SDMA is not
powered. Re-enabling "amdgpu_dpm_set_powergating_by_smu(adev,
AMD_IP_BLOCK_TYPE_SDMA, false)" during mode2 reset makes the issue
disappear.

Besides, I ran more experiments based on the previous SDMA firmware for
this patch. Everything behaved normally.

[how]
Remove "!adev->in_gpu_reset"

Signed-off-by: chen gong 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index c0ca9a82..0ecab41 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1914,7 +1914,7 @@ static int sdma_v4_0_hw_init(void *handle)
 
if ((adev->asic_type == CHIP_RAVEN && adev->powerplay.pp_funcs &&
adev->powerplay.pp_funcs->set_powergating_by_smu) ||
-   (adev->asic_type == CHIP_RENOIR && !adev->in_gpu_reset))
+   adev->asic_type == CHIP_RENOIR)
	amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, false);
 
if (!amdgpu_sriov_vf(adev))
-- 
2.7.4



Re: [PATCH] drm/amdgpu: address the static checker warnings

2020-04-24 Thread Dan Carpenter
On Fri, Apr 24, 2020 at 06:41:15PM +0800, Evan Quan wrote:
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4199 amdgpu_device_gpu_recover()
> error: we previously assumed 'hive' could be null (see line 4196)
> 
> This is introduced by "drm/amdgpu: optimize the gpu reset for XGMI setup V2".
> 
> Change-Id: I9c22b57abc9f512114112f93fb035f1fecf26beb
> Signed-off-by: Evan Quan 
> Reported-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 71278942f9f0..898338dc9605 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4274,7 +4274,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   if (!amdgpu_device_lock_adev(tmp_adev, !hive)) {
>   DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress",
> job ? job->base.id : -1);
> - mutex_unlock(&hive->hive_lock);
> + if (hive)
> + mutex_unlock(&hive->hive_lock);

In the current code, we know for a fact that "hive" is NULL at this
point.  Presumably this will be changed in the future?  Otherwise why
not just delete the mutex_unlock() because it is dead code.

regards,
dan carpenter



[PATCH] drm/amdgpu: address the static checker warnings

2020-04-24 Thread Evan Quan
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4199 amdgpu_device_gpu_recover()
error: we previously assumed 'hive' could be null (see line 4196)

This is introduced by "drm/amdgpu: optimize the gpu reset for XGMI setup V2".

Change-Id: I9c22b57abc9f512114112f93fb035f1fecf26beb
Signed-off-by: Evan Quan 
Reported-by: Dan Carpenter 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 71278942f9f0..898338dc9605 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4274,7 +4274,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (!amdgpu_device_lock_adev(tmp_adev, !hive)) {
		DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress",
  job ? job->base.id : -1);
-   mutex_unlock(&hive->hive_lock);
+   if (hive)
+   mutex_unlock(&hive->hive_lock);
return 0;
}
 
-- 
2.26.2



[bug report] drm/amdgpu: optimize the gpu reset for XGMI setup V2

2020-04-24 Thread Dan Carpenter
Hello Evan Quan,

This is a semi-automatic email about new static checker warnings.

The patch 9e94d22c0085: "drm/amdgpu: optimize the gpu reset for XGMI
setup V2" from Apr 16, 2020, leads to the following Smatch complaint:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4199 amdgpu_device_gpu_recover()
error: we previously assumed 'hive' could be null (see line 4196)

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
  4195  list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
  4196  if (!amdgpu_device_lock_adev(tmp_adev, !hive)) {
   ^
There are a ton of double negatives in this snippet.

Assume "hive" is NULL; then "!hive" is true, so in amdgpu_device_lock_adev()
we try to take the lock, and let's assume that fails.  That's the only path
which returns false.  In other words, we know that "hive" is NULL inside
this condition.

  4197  DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress",
  4198job ? job->base.id : -1);
  4199  mutex_unlock(&hive->hive_lock);
  
NULL dereference.

  4200  return 0;
  4201  }

regards,
dan carpenter


RE: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

2020-04-24 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Sure, go ahead.

-Original Message-
From: Christian König  
Sent: Friday, April 24, 2020 4:58 PM
To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
Cc: Pelloux-prayer, Pierre-eric ; Zhao, 
Jiange ; Kuehling, Felix ; 
Deucher, Alexander ; Koenig, Christian 
; Liu, Monk ; Zhang, Hawking 

Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

On 24.04.20 at 10:49, jia...@amd.com wrote:
> From: Jiange Zhao 
>
> When GPU got timeout, it would notify an interested part of an 
> opportunity to dump info before actual GPU reset.
>
> A usermode app would open 'autodump' node under debugfs system and 
> poll() for readable/writable. When a GPU reset is due, amdgpu would 
> notify usermode app through wait_queue_head and give it 10 minutes to 
> dump info.
>
> After usermode app has done its work, this 'autodump' node is closed.
> On node closure, amdgpu gets to know the dump is done through the 
> completion that is triggered in release().
>
> There is no write or read callback because necessary info can be 
> obtained through dmesg and umr. Messages back and forth between 
> usermode app and amdgpu are unnecessary.
>
> v2: (1) changed 'registered' to 'app_listening'
>  (2) add a mutex in open() to prevent race condition

May I pick this one up and modify it a bit?

There is still a race which needs to be fixed and it is a bit hard to explain 
what's going wrong here.

Christian.

>
> Signed-off-by: Jiange Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
>   4 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index bc1e0fd71a09..34b8ce9fba47 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -724,6 +724,14 @@ struct amd_powerplay {
>   const struct amd_pm_funcs *pp_funcs;
>   };
>   
> +struct amdgpu_autodump {
> + boolapp_listening;
> + struct completion   completed;
> + struct dentry   *dentry;
> + struct wait_queue_head  gpu_hang_wait;
> + struct mutexmutex;
> +};
> +
>   #define AMDGPU_RESET_MAGIC_NUM 64
>   #define AMDGPU_MAX_DF_PERFMONS 4
>   struct amdgpu_device {
> @@ -990,6 +998,8 @@ struct amdgpu_device {
>   charproduct_number[16];
>   charproduct_name[32];
>   charserial[16];
> +
> + struct amdgpu_autodump  autodump;
>   };
>   
>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 1a4894fa3693..693bfcaad312 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -27,7 +27,7 @@
>   #include 
>   #include 
>   #include 
> -
> +#include 
>   #include 
>   
>   #include "amdgpu.h"
> @@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
>   return 0;
>   }
>   
> +int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev) { #if 
> +defined(CONFIG_DEBUG_FS)
> + unsigned long tmo = 600*HZ;
> + int ret;
> +
> + if (!adev->autodump.app_listening)
> + return 0;
> +
> + wake_up_interruptible(&adev->autodump.gpu_hang_wait);
> +
> + ret = 
> wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
> + if (ret == 0) {
> + pr_err("autodump: timeout, move on to gpu recovery\n");
> + return -ETIMEDOUT;
> + }
> +#endif
> + return 0;
> +}
> +
>   #if defined(CONFIG_DEBUG_FS)
>   
> +static int amdgpu_debugfs_autodump_open(struct inode *inode, struct 
> +file *file) {
> + int ret = 0;
> + struct amdgpu_device *adev;
> +
> + ret = simple_open(inode, file);
> + if (ret)
> + return ret;
> +
> + adev = file->private_data;
> +
> + mutex_lock(&adev->autodump.mutex);
> + if (adev->autodump.app_listening == true) {
> + ret = -EBUSY;
> + } else {
> + adev->autodump.app_listening = true;
> + }
> + mutex_unlock(&adev->autodump.mutex);
> +
> + return ret;
> +}
> +
> +static int amdgpu_debugfs_autodump_release(struct inode *inode, 
> +struct file *file) {
> + struct amdgpu_device *adev = file->private_data;
> +
> + complete(&adev->autodump.completed);
> + adev->autodump.app_listening = false;
> +
> + return 0;
> +}
> +
> +unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct poll_table_struct *poll_table)
> +{
> + struct amdgpu_device *adev = file->private_data;
> +
>

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

2020-04-24 Thread Christian König

Am 24.04.20 um 10:49 schrieb jia...@amd.com:

From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party,
giving it an opportunity to dump info before the actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
 (2) add a mutex in open() to prevent race condition


May I pick this one up and modify it a bit?

There is still a race which needs to be fixed and it is a bit hard to 
explain what's going wrong here.


Christian.



Signed-off-by: Jiange Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
  4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..34b8ce9fba47 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -724,6 +724,14 @@ struct amd_powerplay {
const struct amd_pm_funcs *pp_funcs;
  };
  
+struct amdgpu_autodump {

+   boolapp_listening;
+   struct completion   completed;
+   struct dentry   *dentry;
+   struct wait_queue_head  gpu_hang_wait;
+   struct mutexmutex;
+};
+
  #define AMDGPU_RESET_MAGIC_NUM 64
  #define AMDGPU_MAX_DF_PERFMONS 4
  struct amdgpu_device {
@@ -990,6 +998,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
  };
  
  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..693bfcaad312 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include 
  
  #include "amdgpu.h"

@@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
  }
  
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)

+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long tmo = 600*HZ;
+   int ret;
+
+   if (!adev->autodump.app_listening)
+   return 0;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang_wait);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
  #if defined(CONFIG_DEBUG_FS)
  
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)

+{
+   int ret = 0;
+   struct amdgpu_device *adev;
+
+   ret = simple_open(inode, file);
+   if (ret)
+   return ret;
+
+   adev = file->private_data;
+
+   mutex_lock(&adev->autodump.mutex);
+   if (adev->autodump.app_listening == true) {
+   ret = -EBUSY;
+   } else {
+   adev->autodump.app_listening = true;
+   }
+   mutex_unlock(&adev->autodump.mutex);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file 
*file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.completed);
+   adev->autodump.app_listening = false;
+
+   return 0;
+}
+
+unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct 
poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang_wait, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static int amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   struct dentry *entry;
+
+   init_completion(&adev->autodump.completed);
+   init_waitqueue_head(

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

2020-04-24 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Hi @Koenig, Christian,

For the completion restore, I have tested the patch and it is working.

It seems that in release() function, complete() would restore the completion.

Jiange
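The "restore" Jiange describes follows from completion semantics: each complete() banks one "done" event, so the wait in amdgpu_debugfs_wait_dump() returns immediately once release() has run, and the object is reusable afterwards. A rough single-threaded userspace model of that counting behaviour (illustrative only — not the kernel implementation):

```c
/* Rough userspace model of a kernel struct completion: complete()
 * banks one "done" event; a wait consumes one if available. */
struct model_completion {
    unsigned int done;   /* number of banked complete() events */
};

static void model_init_completion(struct model_completion *c)
{
    c->done = 0;
}

static void model_complete(struct model_completion *c)
{
    c->done++;           /* what release() does via complete() */
}

/* Non-blocking model of wait_for_completion(): returns 1 if an event
 * was banked (the real wait would return at once), 0 if it would block. */
static int model_try_wait(struct model_completion *c)
{
    if (c->done == 0)
        return 0;
    c->done--;
    return 1;
}
```

In this model, the complete() issued in release() is exactly what lets the next wait in the reset path proceed — matching the observation above.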

From: Zhao, Jiange 
Sent: Friday, April 24, 2020 4:49 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Koenig, Christian ; Kuehling, Felix 
; Pelloux-prayer, Pierre-eric 
; Deucher, Alexander 
; Zhang, Hawking ; Liu, Monk 
; Zhao, Jiange 
Subject: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party,
giving it an opportunity to dump info before the actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..34b8ce9fba47 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -724,6 +724,14 @@ struct amd_powerplay {
 const struct amd_pm_funcs *pp_funcs;
 };

+struct amdgpu_autodump {
+   boolapp_listening;
+   struct completion   completed;
+   struct dentry   *dentry;
+   struct wait_queue_head  gpu_hang_wait;
+   struct mutexmutex;
+};
+
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
 struct amdgpu_device {
@@ -990,6 +998,8 @@ struct amdgpu_device {
 charproduct_number[16];
 charproduct_name[32];
 charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };

 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..693bfcaad312 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 

 #include "amdgpu.h"
@@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
 return 0;
 }

+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long tmo = 600*HZ;
+   int ret;
+
+   if (!adev->autodump.app_listening)
+   return 0;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang_wait);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)

+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   int ret = 0;
+   struct amdgpu_device *adev;
+
+   ret = simple_open(inode, file);
+   if (ret)
+   return ret;
+
+   adev = file->private_data;
+
+   mutex_lock(&adev->autodump.mutex);
+   if (adev->autodump.app_listening == true) {
+   ret = -EBUSY;
+   } else {
+   adev->autodump.app_listening = true;
+   }
+   mutex_unlock(&adev->autodump.mutex);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file 
*file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.completed);
+   adev->autodump.app_listening = false;
+
+   return 0;
+}
+
+unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct 
poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang_wait, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops =

[PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset (v2)

2020-04-24 Thread jianzh
From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party,
giving it an opportunity to dump info before the actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 92 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..34b8ce9fba47 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -724,6 +724,14 @@ struct amd_powerplay {
const struct amd_pm_funcs *pp_funcs;
 };
 
+struct amdgpu_autodump {
+   boolapp_listening;
+   struct completion   completed;
+   struct dentry   *dentry;
+   struct wait_queue_head  gpu_hang_wait;
+   struct mutexmutex;
+};
+
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
 struct amdgpu_device {
@@ -990,6 +998,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..693bfcaad312 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #include "amdgpu.h"
@@ -74,8 +74,96 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
 }
 
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long tmo = 600*HZ;
+   int ret;
+
+   if (!adev->autodump.app_listening)
+   return 0;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang_wait);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   int ret = 0;
+   struct amdgpu_device *adev;
+
+   ret = simple_open(inode, file);
+   if (ret)
+   return ret;
+
+   adev = file->private_data;
+
+   mutex_lock(&adev->autodump.mutex);
+   if (adev->autodump.app_listening == true) {
+   ret = -EBUSY;
+   } else {
+   adev->autodump.app_listening = true;
+   }
+   mutex_unlock(&adev->autodump.mutex);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file 
*file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete(&adev->autodump.completed);
+   adev->autodump.app_listening = false;
+
+   return 0;
+}
+
+unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct 
poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang_wait, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static int amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   struct dentry *entry;
+
+   init_completion(&adev->autodump.completed);
+   init_waitqueue_head(&adev->autodump.gpu_hang_wait);
+   mutex_init(&adev->autodump.mutex);
+   adev->autodump.app_listening = false;
+
+   entry = debugfs_create_file("autodump", 0600,
+   adev->ddev->primary->debugfs_root,
+ 

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

2020-04-24 Thread Christian König
I agree. Closing the node is fine, we just need to make sure that 
reopening it gives us another try again.


Shouldn't be to much of a problem, except for a single line the last 
version already looked like it should have worked for this.


Christian.

Am 24.04.20 um 10:30 schrieb Pierre-Eric Pelloux-Prayer:

Running "umr --autodump" currently does the following:
   - open the debugfs autodump file
   - wait for a hang (using poll)
   - dump gfx rings
   - close the file and exit

I don't think adding support for a "Done" message is necessary, but I'd expect 
to
be able to restart "umr --autodump" and be able to perform another dump.

Pierre-Eric

On 24/04/2020 10:24, Zhao, Jiange wrote:

[AMD Official Use Only - Internal Distribution Only]


Hi,

Of course, considering all the suggestions, I will implement a write callback for 
usermode app to notify KMD that a dump is finished by sending "Done".

In this way, usermode app can do multiple dumps without closing the node,

Jiange
--
*From:* Pelloux-prayer, Pierre-eric 
*Sent:* Friday, April 24, 2020 3:46 PM
*To:* Zhao, Jiange ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org 

*Cc:* Deucher, Alexander ; Pelloux-prayer, Pierre-eric 
; Kuehling, Felix ; Liu, Monk 
; Zhang, Hawking 
*Subject:* Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset
  
Hi Jiange,


FYI I'm working on adding a new "--autodump" command to umr that uses this 
feature.
This is not yet merged but you can find the code here: 
https://gitlab.freedesktop.org/pepp/umr/-/tree/autodump


(3) At the same time, considering the use case of this node, I believe that 
only the first GPU reset is worthy of a dump.

If it's possible I'd like to be able to do multiple dumps instead of limiting 
ourselves to only the first one.

Thanks!
Pierre-Eric




(4) I didn't implement race condition guard because I believe that this node 
caters for a cautious super-user and a single client is enough to do all the 
work. I can add the logic if you think it is necessary.

Jiange

-Original Message-
From: Koenig, Christian 
Sent: Thursday, April 23, 2020 4:53 PM
To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix ; Pelloux-prayer, Pierre-eric 
; Deucher, Alexander ; Zhang, Hawking 
; Liu, Monk ; Zhao, Jiange 
Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

Am 23.04.20 um 09:19 schrieb jia...@amd.com:

From: Jiange Zhao 

When GPU got timeout, it would notify an interested part of an
opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system and
poll() for readable/writable. When a GPU reset is due, amdgpu would
notify usermode app through wait_queue_head and give it 10 minutes to
dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through the
completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

Signed-off-by: Jiange Zhao 
---
    drivers/gpu/drm/amd/amdgpu/amdgpu.h |  9 +++
    drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 85 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
    4 files changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index bc1e0fd71a09..a505b547f242 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -724,6 +724,13 @@ struct amd_powerplay {
    const struct amd_pm_funcs *pp_funcs;
    };

+struct amdgpu_autodump {

+    bool    registered;
+    struct completion   completed;

Registered and completed seem to have the same meaning.


+    struct dentry   *dentry;
+    struct wait_queue_head  gpu_hang_wait;

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

2020-04-24 Thread Pierre-Eric Pelloux-Prayer
Running "umr --autodump" currently does the following:
  - open the debugfs autodump file
  - wait for a hang (using poll)
  - dump gfx rings
  - close the file and exit
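The "wait for a hang" step of that flow boils down to a single poll() call; a minimal helper might look like this (a sketch only — the real implementation is in the umr branch linked elsewhere in this thread):

```c
#include <poll.h>

/* Block on fd until the kernel signals readability (how the autodump
 * node reports a hang) or timeout_ms elapses.
 * Returns 1 on "hang" (readable), 0 on timeout, -1 on error. */
static int wait_for_gpu_hang(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A consumer would open the autodump node read-only (the debugfs path depends on the local mount and card index), call wait_for_gpu_hang(fd, -1), dump the rings, then close(fd) so release() completes the handshake and the driver proceeds with the reset.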

I don't think adding support for a "Done" message is necessary, but I'd expect 
to
be able to restart "umr --autodump" and be able to perform another dump.

Pierre-Eric

On 24/04/2020 10:24, Zhao, Jiange wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> 
> Hi,
> 
> Of course, considering all the suggestions, I will implement a write callback 
> for usermode app to notify KMD that a dump is finished by sending "Done".
> 
> In this way, usermode app can do multiple dumps without closing the node,
> 
> Jiange
> --
> *From:* Pelloux-prayer, Pierre-eric 
> *Sent:* Friday, April 24, 2020 3:46 PM
> *To:* Zhao, Jiange ; Koenig, Christian 
> ; amd-gfx@lists.freedesktop.org 
> 
> *Cc:* Deucher, Alexander ; Pelloux-prayer, 
> Pierre-eric ; Kuehling, Felix 
> ; Liu, Monk ; Zhang, Hawking 
> 
> *Subject:* Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset
>  
> Hi Jiange,
> 
> FYI I'm working on adding a new "--autodump" command to umr that uses this 
> feature.
> This is not yet merged but you can find the code here: 
> https://gitlab.freedesktop.org/pepp/umr/-/tree/autodump
> 
>> (3) At the same time, considering the use case of this node, I believe that 
>> only the first GPU reset is worthy of a dump.
> 
> If it's possible I'd like to be able to do multiple dumps instead of limiting 
> ourselves to only the first one.
> 
> Thanks!
> Pierre-Eric
> 
> 
> 
>> (4) I didn't implement race condition guard because I believe that this node 
>> caters for a cautious super-user and a single client is enough to do all the 
>> work. I can add the logic if you think it is necessary.
>> 
>> Jiange
>> 
>> -Original Message-
>> From: Koenig, Christian  
>> Sent: Thursday, April 23, 2020 4:53 PM
>> To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix ; Pelloux-prayer, Pierre-eric 
>> ; Deucher, Alexander 
>> ; Zhang, Hawking ; Liu, 
>> Monk ; Zhao, Jiange 
>> Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset
>> 
>> Am 23.04.20 um 09:19 schrieb jia...@amd.com:
>>> From: Jiange Zhao 
>>>
>>> When GPU got timeout, it would notify an interested part of an 
>>> opportunity to dump info before actual GPU reset.
>>>
>>> A usermode app would open 'autodump' node under debugfs system and 
>>> poll() for readable/writable. When a GPU reset is due, amdgpu would 
>>> notify usermode app through wait_queue_head and give it 10 minutes to 
>>> dump info.
>>>
>>> After usermode app has done its work, this 'autodump' node is closed.
>>> On node closure, amdgpu gets to know the dump is done through the 
>>> completion that is triggered in release().
>>>
>>> There is no write or read callback because necessary info can be 
>>> obtained through dmesg and umr. Messages back and forth between 
>>> usermode app and amdgpu are unnecessary.
>>>
>>> Signed-off-by: Jiange Zhao 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  9 +++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 85 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
>>>   4 files changed, 97 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index bc1e0fd71a09..a505b547f242 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -724,6 +724,13 @@ struct amd_powerplay {
>>>   const struct amd_pm_funcs *pp_funcs;
>>>   };
>>>   
>>> +struct amdgpu_autodump {
>>> +    bool    registered;
>>> +    struct completion   completed;
>> 
>> Registered and completed seem to have the same meaning.
>> 
>>> +    struct dentry   *dentry;
>>> +    struct wait_queue_head  gpu_hang_wait;
>>> +};
>>> +
>>>   #define AM

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

2020-04-24 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Hi,

Of course, considering all the suggestions, I will implement a write callback 
for usermode app to notify KMD that a dump is finished by sending "Done".

In this way, usermode app can do multiple dumps without closing the node,

Jiange
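If that write callback lands, the kernel side only needs to recognize the "Done" marker in the user buffer. A hypothetical sketch of the check (neither this helper nor the exact protocol exists in the driver — both are only proposed in this thread):

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical check for the proposed "Done" message from userspace,
 * signalling that the app has finished its dump. */
static int autodump_msg_is_done(const char *buf, size_t len)
{
    if (len < 4 || strncmp(buf, "Done", 4) != 0)
        return 0;
    /* accept "Done", "Done\n" and "Done\0" */
    return len == 4 || buf[4] == '\n' || buf[4] == '\0';
}
```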

From: Pelloux-prayer, Pierre-eric 
Sent: Friday, April 24, 2020 3:46 PM
To: Zhao, Jiange ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander ; Pelloux-prayer, Pierre-eric 
; Kuehling, Felix ; 
Liu, Monk ; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

Hi Jiange,

FYI I'm working on adding a new "--autodump" command to umr that uses this 
feature.
This is not yet merged but you can find the code here: 
https://gitlab.freedesktop.org/pepp/umr/-/tree/autodump

> (3) At the same time, considering the use case of this node, I believe that 
> only the first GPU reset is worthy of a dump.

If it's possible I'd like to be able to do multiple dumps instead of limiting 
ourselves to only the first one.

Thanks!
Pierre-Eric



> (4) I didn't implement race condition guard because I believe that this node 
> caters for a cautious super-user and a single client is enough to do all the 
> work. I can add the logic if you think it is necessary.
>
> Jiange
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, April 23, 2020 4:53 PM
> To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix ; Pelloux-prayer, Pierre-eric 
> ; Deucher, Alexander 
> ; Zhang, Hawking ; Liu, 
> Monk ; Zhao, Jiange 
> Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset
>
> Am 23.04.20 um 09:19 schrieb jia...@amd.com:
>> From: Jiange Zhao 
>>
>> When GPU got timeout, it would notify an interested part of an
>> opportunity to dump info before actual GPU reset.
>>
>> A usermode app would open 'autodump' node under debugfs system and
>> poll() for readable/writable. When a GPU reset is due, amdgpu would
>> notify usermode app through wait_queue_head and give it 10 minutes to
>> dump info.
>>
>> After usermode app has done its work, this 'autodump' node is closed.
>> On node closure, amdgpu gets to know the dump is done through the
>> completion that is triggered in release().
>>
>> There is no write or read callback because necessary info can be
>> obtained through dmesg and umr. Messages back and forth between
>> usermode app and amdgpu are unnecessary.
>>
>> Signed-off-by: Jiange Zhao 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  9 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 85 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
>>   4 files changed, 97 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index bc1e0fd71a09..a505b547f242 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -724,6 +724,13 @@ struct amd_powerplay {
>>   const struct amd_pm_funcs *pp_funcs;
>>   };
>>
>> +struct amdgpu_autodump {
>> +boolregistered;
>> +struct completion   completed;
>
> Registered and completed seem to have the same meaning.
>
>> +struct dentry   *dentry;
>> +struct wait_queue_head  gpu_hang_wait;
>> +};
>> +
>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>   struct amdgpu_device {
>> @@ -990,6 +997,8 @@ struct amdgpu_device {
>>   charproduct_number[16];
>>   charproduct_name[32];
>>   charserial[16];
>> +
>> +struct amdgpu_autodump  autodump;
>>   };
>>
>>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct
>> ttm_bo_device *bdev) diff --git
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 1a4894fa3693..cdd4bf00adee 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -74,8 +74,91 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
>>   return 0;
>>   }
>>
>> +int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
>> +{
>> +#if defined(CONFIG_DEBUG_FS)
>> +int ret;
>> +unsigned long tmo = 600*HZ;
>
> In general please declare constant lines first and variable like "i" or "r" 
> last.
>
>> +
>> +if (!adev->autodump.registered)
>> +return 0;
>> +
>> +wake_up_interruptible(&adev->autodump.gpu_hang_wait);
>> +
>> +ret = wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
>
> This is racy, in other words it can happen that a new client opens up the 
> debugfs file without being signaled but blocks the reset here.
>
> You could use two completion structures to avoid that.
>

[PATCH 2/2] drm/amdgpu: drop redundant cg/pg ungate on runpm enter

2020-04-24 Thread Evan Quan
CG/PG ungate is already performed in ip_suspend_phase1. Without
this change it would be performed twice, so gfxoff disablement
would also happen twice on runpm enter while gfxoff enablement
happens only once on runpm exit. That would leave gfxoff in a
disabled state.

Change-Id: I489ca456770d3fe482b685f132400202467f712b
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 08eeb0d2c149..71278942f9f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3453,9 +3453,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
}
}
 
-   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
-   amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
-
amdgpu_ras_suspend(adev);
 
r = amdgpu_device_ip_suspend_phase1(adev);
-- 
2.26.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: move kfd suspend after ip_suspend_phase1

2020-04-24 Thread Evan Quan
This sequence change should be safe because ip_suspend_phase1 only
suspends DCE. It is also a prerequisite for the upcoming removal
of the redundant CG/PG ungate.

Change-Id: Ie125e234f8f743d327cf8e0389e4872a312184c0
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8eb339aff14d..08eeb0d2c149 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3456,12 +3456,12 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
 
-   amdgpu_amdkfd_suspend(adev, !fbcon);
-
amdgpu_ras_suspend(adev);
 
r = amdgpu_device_ip_suspend_phase1(adev);
 
+   amdgpu_amdkfd_suspend(adev, !fbcon);
+
/* evict vram memory */
amdgpu_bo_evict_vram(adev);
 
-- 
2.26.2



Re: [PATCH] drm/amdgpu: Remove unneeded semicolon

2020-04-24 Thread Christian König

Am 24.04.20 um 09:56 schrieb Zheng Bin:

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: Zheng Bin 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 09aa5f509bd2..43d84214995c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2531,7 +2531,7 @@ static void gfx_v9_0_init_sq_config(struct amdgpu_device 
*adev)
break;
default:
break;
-   };
+   }
  }

  static void gfx_v9_0_constants_init(struct amdgpu_device *adev)
--
2.26.0.106.g9fadedd





Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset

2020-04-24 Thread Pierre-Eric Pelloux-Prayer
Hi Jiange,

FYI I'm working on adding a new "--autodump" command to umr that uses this 
feature.
This is not yet merged but you can find the code here: 
https://gitlab.freedesktop.org/pepp/umr/-/tree/autodump

> (3) At the same time, considering the use case of this node, I believe that 
> only the first GPU reset is worthy of a dump.

If it's possible I'd like to be able to do multiple dumps instead of limiting 
ourselves to only the first one.

Thanks!
Pierre-Eric



> (4) I didn't implement race condition guard because I believe that this node 
> caters for a cautious super-user and a single client is enough to do all the 
> work. I can add the logic if you think it is necessary.
> 
> Jiange
> 
> -Original Message-
> From: Koenig, Christian  
> Sent: Thursday, April 23, 2020 4:53 PM
> To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix ; Pelloux-prayer, Pierre-eric 
> ; Deucher, Alexander 
> ; Zhang, Hawking ; Liu, 
> Monk ; Zhao, Jiange 
> Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset
> 
> Am 23.04.20 um 09:19 schrieb jia...@amd.com:
>> From: Jiange Zhao 
>>
>> When GPU got timeout, it would notify an interested part of an 
>> opportunity to dump info before actual GPU reset.
>>
>> A usermode app would open 'autodump' node under debugfs system and 
>> poll() for readable/writable. When a GPU reset is due, amdgpu would 
>> notify usermode app through wait_queue_head and give it 10 minutes to 
>> dump info.
>>
>> After usermode app has done its work, this 'autodump' node is closed.
>> On node closure, amdgpu gets to know the dump is done through the 
>> completion that is triggered in release().
>>
>> There is no write or read callback because necessary info can be 
>> obtained through dmesg and umr. Messages back and forth between 
>> usermode app and amdgpu are unnecessary.
>>
>> Signed-off-by: Jiange Zhao 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  9 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 85 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
>>   4 files changed, 97 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index bc1e0fd71a09..a505b547f242 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -724,6 +724,13 @@ struct amd_powerplay {
>>  const struct amd_pm_funcs *pp_funcs;
>>   };
>>   
>> +struct amdgpu_autodump {
>> +	bool			registered;
>> +	struct completion	completed;
> 
> Registered and completed seems to have the same meaning.
> 
>> +	struct dentry		*dentry;
>> +	struct wait_queue_head	gpu_hang_wait;
>> +};
>> +
>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>   struct amdgpu_device {
>> @@ -990,6 +997,8 @@ struct amdgpu_device {
>>  	char			product_number[16];
>>  	char			product_name[32];
>>  	char			serial[16];
>> +
>> +	struct amdgpu_autodump	autodump;
>>   };
>>   
>>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct 
>> ttm_bo_device *bdev) diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 1a4894fa3693..cdd4bf00adee 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -74,8 +74,91 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
>>  return 0;
>>   }
>>   
>> +int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
>> +{
>> +#if defined(CONFIG_DEBUG_FS)
>> +	int ret;
>> +	unsigned long tmo = 600*HZ;
> 
> In general please declare constant values first and variables like "i" or "r" 
> last.
> 
>> +
>> +	if (!adev->autodump.registered)
>> +		return 0;
>> +
>> +	wake_up_interruptible(&adev->autodump.gpu_hang_wait);
>> +
>> +	ret = wait_for_completion_interruptible_timeout(&adev->autodump.completed, tmo);
> 
> This is racy; in other words, it can happen that a new client opens up the 
> debugfs file without being signaled but still blocks the reset here.
> 
> You could use two completion structures to avoid that.
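As a rough shape of this two-completion suggestion, a sketch only — field names (`dumping`, `done`) are invented here, this is not buildable as-is and not the actual follow-up patch:

```c
/* Sketch: split the handshake into two completions so a client that opens
 * the node *after* the wake-up cannot block an in-flight reset.
 * open() would arm "dumping"; release() would complete "done". */
struct amdgpu_autodump {
	struct completion	dumping;	/* armed by open() */
	struct completion	done;		/* completed in release() */
	struct wait_queue_head	gpu_hang_wait;
};

int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
{
	const unsigned long tmo = 600 * HZ;
	int ret;

	/* only wait if a client was armed before this reset started */
	if (!completion_done(&adev->autodump.dumping))
		return 0;

	reinit_completion(&adev->autodump.dumping);
	wake_up_interruptible(&adev->autodump.gpu_hang_wait);

	ret = wait_for_completion_interruptible_timeout(&adev->autodump.done, tmo);
	return ret == 0 ? -ETIMEDOUT : 0;
}
```

A client that opens the node while the reset is already waiting on `done` re-arms `dumping` for the *next* reset instead of stalling the current one, which is the race Christian points out above.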
> 
>> +	if (ret == 0) { /* timed out and the dump tool still hasn't finished its dump */
>> +		pr_err("autodump: timeout before dump finished, move on to gpu recovery\n");
>> +		return -ETIMEDOUT;
>> +	}
>> +#endif
>> +	return 0;
>> +}
>> +
>>   #if defined(CONFIG_DEBUG_FS)
>>   
>> +static int amdgpu_debugfs_autodump_open(struct inode *inode,
>> +					struct file *file)
>> +{
>> +	int ret;
>> +	struct amdgpu_device *adev;
>> +
>> +	ret = simple_open(inode, file);
>> +	if (ret)
>> +		return ret;
>> +
>> +	adev = file->private_data;
>> +if (adev->autodump.registered == true