RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
Could you help to review this patch for multiple vf?

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily 
>Sent: Wednesday, June 10, 2020 7:01 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, June 2, 2020 8:40 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>index 5294aa7..8ed6c90 100644
>>--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>>@@ -1311,8 +1311,10 @@ static int smu_hw_init(void *handle)
>> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> struct smu_context *smu = &adev->smu;
>>
>>-if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) {
>>+smu->pm_enabled = false;
>> return 0;
>>+}
>>
>> ret = smu_start_smc_engine(smu);
>> if (ret) {
>>--
>>2.7.4
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
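For readers skimming the thread, the control flow the patch above adds to smu_hw_init() can be sketched as a self-contained userspace mock. All mock_* names and parameters are hypothetical stand-ins for the kernel symbols, not the driver code itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the flow the v2 patch adds to smu_hw_init():
 * under multi-VF SR-IOV the host owns power management, so the guest
 * must both clear pm_enabled and skip starting the SMC engine. */
struct mock_smu {
    bool pm_enabled;
    bool smc_started;
};

static int mock_smu_hw_init(struct mock_smu *smu, bool is_sriov_vf,
                            bool is_pp_one_vf)
{
    if (is_sriov_vf && !is_pp_one_vf) {
        smu->pm_enabled = false;   /* the line the patch adds */
        return 0;                  /* early out, as before */
    }
    smu->smc_started = true;       /* stands in for smu_start_smc_engine() */
    return 0;
}
```

The point of the change is visible in the mock: before the patch, the early-out left pm_enabled at whatever value it held, so later pm code could still believe power management was active on a multi-VF function.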


RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, June 2, 2020 8:40 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>index 5294aa7..8ed6c90 100644
>--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>@@ -1311,8 +1311,10 @@ static int smu_hw_init(void *handle)
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> struct smu_context *smu = &adev->smu;
>
>-if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) {
>+smu->pm_enabled = false;
> return 0;
>+}
>
> ret = smu_start_smc_engine(smu);
> if (ret) {
>--
>2.7.4



RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-02 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Thanks Frank, already sent out the modified patch, please help review again.

Best wishes
Emily Deng



>-Original Message-
>From: Min, Frank 
>Sent: Tuesday, June 2, 2020 8:34 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Hi Emily,
>How about to move it into smu_hw_init()?
>
>Best Regards,
>Frank
>
>-Original Message-
>From: Deng, Emily 
>Sent: Tuesday, June 2, 2020 8:08 PM
>To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>-Original Message-
>>From: Emily Deng 
>>Sent: Tuesday, June 2, 2020 7:54 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>>
>>Change-Id: Ic010440ef625f6f29e91f267a6f284f9b6554e1f
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>index b6331712..fcbd875 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>@@ -2004,6 +2004,9 @@ static int amdgpu_device_ip_init(struct
>>amdgpu_device *adev)  if (amdgpu_sriov_vf(adev))
>>amdgpu_virt_init_data_exchange(adev);
>>
>>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>>+adev->smu.pm_enabled = 0;
>>+
>> r = amdgpu_ib_pool_init(adev);
>> if (r) {
>> dev_err(adev->dev, "IB initialization failed (%d).\n", r);
>>--
>>2.7.4
>
>



RE: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov

2020-06-02 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

>-Original Message-
>From: Emily Deng 
>Sent: Tuesday, June 2, 2020 7:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/sriov: Disable pm for multiple vf sriov
>
>Change-Id: Ic010440ef625f6f29e91f267a6f284f9b6554e1f
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index b6331712..fcbd875 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2004,6 +2004,9 @@ static int amdgpu_device_ip_init(struct
>amdgpu_device *adev)
> if (amdgpu_sriov_vf(adev))
> amdgpu_virt_init_data_exchange(adev);
>
>+if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>+adev->smu.pm_enabled = 0;
>+
> r = amdgpu_ib_pool_init(adev);
> if (r) {
> dev_err(adev->dev, "IB initialization failed (%d).\n", r);
>--
>2.7.4



RE: [PATCH] SWDEV-231280 CentOS-AWS Guest driver reload 3 failed with call trace in guest

2020-04-20 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Gu,
>JiaWei (Will)
>Sent: Tuesday, April 21, 2020 11:56 AM
>To: Gu, JiaWei (Will) ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] SWDEV-231280 CentOS-AWS Guest driver reload 3 failed
>with call trace in guest
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping..
>
>-Original Message-
>From: Jiawei 
>Sent: Monday, April 20, 2020 7:34 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Gu, JiaWei (Will) 
>Subject: [PATCH] SWDEV-231280 CentOS-AWS Guest driver reload 3 failed
>with call trace in guest
>
>root cause:
>X enables vblank, but the driver skips shutting down vblank during unload
>under sriov, which causes a kernel call trace
>
>solution:
>move the vblank shutdown logic into dce_virtual_crtc_disable() so the sriov
>path cannot skip it
>
>Signed-off-by: Jiawei 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 3f739efead61..c02797f2ee7f 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -191,8 +191,9 @@ static void dce_virtual_crtc_disable(struct drm_crtc
>*crtc)  {
>   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>
>-  dce_virtual_crtc_dpms(crtc, DRM_MODE_DPMS_OFF);
>+  drm_crtc_vblank_off(crtc);
>
>+  amdgpu_crtc->enabled = false;
>   amdgpu_crtc->pll_id = ATOM_PPLL_INVALID;
>   amdgpu_crtc->encoder = NULL;
>   amdgpu_crtc->connector = NULL;
>--
>2.20.1
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
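The essence of the fix above is an ordering/conditionality change: vblank was previously only shut off via the DPMS path, which (per the commit message) is skipped under SR-IOV at unload. A hedged userspace sketch, with all names as mock stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the bug and the fix: the old flow shut vblank
 * off only through the DPMS path, which the driver skips under SR-IOV
 * during unload; the fixed flow turns vblank off unconditionally in
 * crtc_disable() itself. */
struct mock_crtc {
    bool enabled;
    bool vblank_on;
};

/* old flow: vblank off happens only when the dpms path runs */
static void old_crtc_disable(struct mock_crtc *c, bool is_sriov)
{
    if (!is_sriov)
        c->vblank_on = false;  /* dce_virtual_crtc_dpms(DPMS_OFF) */
    c->enabled = false;
}

/* fixed flow: drm_crtc_vblank_off() is called directly, no skip */
static void new_crtc_disable(struct mock_crtc *c, bool is_sriov)
{
    (void)is_sriov;
    c->vblank_on = false;      /* drm_crtc_vblank_off() */
    c->enabled = false;
}
```

Under the old flow an SR-IOV guest tears down the crtc with vblank still armed, which is what triggered the call trace on unload.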


RE: [PATCH] drm/amd/powerplay: avoid using pm_en before it is initialized

2020-04-03 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Zhou,
>Tiecheng
>Sent: Friday, April 3, 2020 12:42 PM
>To: Zhou, Tiecheng ; amd-
>g...@lists.freedesktop.org
>Cc: Tao, Yintian 
>Subject: RE: [PATCH] drm/amd/powerplay: avoid using pm_en before it is
>initialized
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping...
>
>-Original Message-
>From: Tiecheng Zhou 
>Sent: Thursday, April 2, 2020 5:29 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhou, Tiecheng ; Tao, Yintian
>
>Subject: [PATCH] drm/amd/powerplay: avoid using pm_en before it is
>initialized
>
>hwmgr->pm_en is initialized at hwmgr_hw_init.
>during amdgpu_device_init, there is amdgpu_asic_reset that calls to
>pp_get_asic_baco_capability, while hwmgr->pm_en has not yet been
>initialized.
>
>so avoid using pm_en in pp_get_asic_baco_capability.
>
>Signed-off-by: Tiecheng Zhou 
>Signed-off-by: Yintian Tao 
>---
> drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>index 71b843f542d8..fdff3e1c5e95 100644
>--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>@@ -1455,7 +1455,8 @@ static int pp_get_asic_baco_state(void *handle, int
>*state)
>   if (!hwmgr)
>   return -EINVAL;
>
>-  if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_state)
>+  if (!(hwmgr->not_vf && amdgpu_dpm) ||
>+  !hwmgr->hwmgr_func->get_asic_baco_state)
>   return 0;
>
>   mutex_lock(&hwmgr->smu_lock);
>--
>2.17.1
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
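The patch above is an initialization-ordering fix: hwmgr->pm_en is only assigned in hwmgr_hw_init(), yet the baco query can arrive earlier via amdgpu_device_init() -> amdgpu_asic_reset(), so the guard is rewritten in terms of state that is valid from the start. A minimal sketch, all names hypothetical mocks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the ordering bug fixed above: pm_en is only
 * meaningful after hw init, so the baco-state guard is rewritten to
 * use inputs (not_vf and the amdgpu_dpm module parameter) that are
 * valid from device-init time. */
struct mock_hwmgr {
    bool pm_en;       /* only set during hwmgr_hw_init() */
    bool not_vf;
    bool has_baco_cb; /* hwmgr_func->get_asic_baco_state != NULL */
};

static int mock_amdgpu_dpm = 1;   /* stand-in for the module parameter */

static int mock_get_baco_state(struct mock_hwmgr *h, int *state)
{
    if (!(h->not_vf && mock_amdgpu_dpm) || !h->has_baco_cb)
        return 0;   /* quietly report nothing, as the real guard does */
    *state = 1;     /* pretend the callback reported a baco state */
    return 0;
}
```

With the old `!pm_en` guard, the same early call would always take the bail-out branch and the reset path could never learn the real baco state.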


RE: [PATCH] drm/amdgpu: skip access sdma_v5_0 registers under SRIOV

2020-03-30 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

Best wishes
Emily Deng
>-Original Message-
>From: amd-gfx  On Behalf Of Yintian
>Tao
>Sent: Monday, March 30, 2020 4:50 PM
>To: Koenig, Christian ; Deucher, Alexander
>
>Cc: amd-gfx@lists.freedesktop.org; Tao, Yintian 
>Subject: [PATCH] drm/amdgpu: skip access sdma_v5_0 registers under SRIOV
>
>Due to the new L1.0b0c011b policy, many SDMA registers are blocked, which
>raises a violation warning. In total, 6 register pairs need to be skipped
>during driver init and de-init:
>mmSDMA0/1_CNTL
>mmSDMA0/1_F32_CNTL
>mmSDMA0/1_UTCL1_PAGE
>mmSDMA0/1_UTCL1_CNTL
>mmSDMA0/1_CHICKEN_BITS,
>mmSDMA0/1_SEM_WAIT_FAIL_TIMER_CNTL
>
>Signed-off-by: Yintian Tao 
>Change-Id: I9d5087582ceb5f629d37bf856533d00c179e6de3
>---
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 110 +
> 1 file changed, 75 insertions(+), 35 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>index b3c30616d6b4..d7c0269059b0 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>@@ -88,6 +88,29 @@ static const struct soc15_reg_golden
>golden_settings_sdma_5[] = {
>   SOC15_REG_GOLDEN_VALUE(GC, 0, mmSDMA1_UTCL1_PAGE,
>0x00ff, 0x000c5c00)  };
>
>+static const struct soc15_reg_golden golden_settings_sdma_5_sriov[] = {
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_GFX_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_PAGE_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC0_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC1_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC2_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC3_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC4_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC5_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC6_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC7_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_GFX_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_PAGE_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC0_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC1_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC2_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC3_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC4_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC5_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC6_RB_WPTR_POLL_CNTL, 0xfff7, 0x00403000),
>+  SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC7_RB_WPTR_POLL_CNTL,
>+0xfff7, 0x00403000), };
>+
> static const struct soc15_reg_golden golden_settings_sdma_nv10[] = {
>   SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA0_RLC3_RB_WPTR_POLL_CNTL, 0xfff0, 0x00403000),
>   SOC15_REG_GOLDEN_VALUE(GC, 0,
>mmSDMA1_RLC3_RB_WPTR_POLL_CNTL, 0xfff0, 0x00403000), @@ -141,9
>+164,14 @@ static void sdma_v5_0_init_golden_registers(struct
>amdgpu_device *adev)
>   (const
>u32)ARRAY_SIZE(golden_settings_sdma_nv14));
>   break;
>   case CHIP_NAVI12:
>-  soc15_program_register_sequence(adev,
>-  golden_settings_sdma_5,
>-  (const
>u32)ARRAY_SIZE(golden_settings_sdma_5));
>+  if (amdgpu_sriov_vf(adev))
>+  soc15_program_register_sequence(adev,
>+
>   golden_settings_sdma_5_sriov,
>+  (const
>u32)ARRAY_SIZE(golden_settings_sdma_5_sriov));
>+  else
>+  soc15_program_register_sequence(adev,
>+
>   golden_settings_sdma_5,
>+  (const
>u32)ARRAY_SIZE(golden_settings_sdma_5));
>   soc15_program_register_sequence(adev,
>   golden_settings_sdma_nv12,
>   (const
>u32)ARRAY_SIZE(golden_settings_sdma_nv12));
>@@ -557,9 +585,12 @@ static void sdma_v5_0_ctx_switch_enable(struct
>amdgpu_device *adev, bool enable)
>   }
>
>   for (i = 0; i < adev->sdma.num_instances; i++) {
>-  f32_cntl = 

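The NAVI12 hunk above (truncated by the archive) selects a dedicated golden-settings table under SR-IOV so that only registers the new L1 policy still permits get programmed. The selection logic can be sketched as a hedged userspace mock, with dummy table entries and hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the table-selection logic in
 * sdma_v5_0_init_golden_registers(): SR-IOV VFs get a table restricted
 * to allowed registers, bare metal keeps the original one. */
struct mock_reg_golden { unsigned int reg, and_mask, or_val; };

static const struct mock_reg_golden golden_sdma[]       = { { 1u, 0u, 0u } };
static const struct mock_reg_golden golden_sdma_sriov[] = { { 2u, 0u, 0u } };

static const struct mock_reg_golden *
pick_sdma_golden(bool sriov_vf, size_t *count)
{
    if (sriov_vf) {
        *count = sizeof(golden_sdma_sriov) / sizeof(golden_sdma_sriov[0]);
        return golden_sdma_sriov;   /* restricted, policy-safe table */
    }
    *count = sizeof(golden_sdma) / sizeof(golden_sdma[0]);
    return golden_sdma;             /* full bare-metal table */
}
```

The design choice mirrors the diff: rather than filtering entries at write time, the driver swaps in an entirely separate table whose entries never touch the blocked registers.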
RE: [PATCH] SWDEV-227226 [AWS][Linux]ReallyQuick test failed, guest dmesg and host dmesg have error

2020-03-26 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Gu,
>JiaWei (Will)
>Sent: Thursday, March 26, 2020 1:58 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] SWDEV-227226 [AWS][Linux]ReallyQuick test failed, guest
>dmesg and host dmesg have error
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping..
>
>-Original Message-
>From: Jiawei 
>Sent: Wednesday, March 25, 2020 4:32 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Gu, JiaWei (Will) 
>Subject: [PATCH] SWDEV-227226 [AWS][Linux]ReallyQuick test failed, guest
>dmesg and host dmesg have error
>
>root cause: the compute job timeout for sriov/passthrough is 1 ms, which is too
>short for some compute benchmarks
>
>solution: extend the default compute lockup timeout to 6 ms
>
>Signed-off-by: Jiawei 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 3607a63f48bb..88360b220a8f 100755
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2680,12 +2680,12 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>* By default timeout for non compute jobs is 1.
>* And there is no timeout enforced on compute jobs.
>* In SR-IOV or passthrough mode, timeout for compute
>-   * jobs are 1 by default.
>+   * jobs are 6 by default.
>*/
>   adev->gfx_timeout = msecs_to_jiffies(1);
>   adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>   if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>-  adev->compute_timeout = adev->gfx_timeout;
>+  adev->compute_timeout =  msecs_to_jiffies(6);
>   else
>   adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>
>--
>2.20.1
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
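The timeout selection the patch modifies can be sketched as follows. Note the archive has abbreviated the actual millisecond values in the quoted patch, so GFX_TIMEOUT_MS and COMPUTE_TIMEOUT_MS below are placeholder constants, not the real defaults, and msecs_to_jiffies() is mocked as identity:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of compute-timeout selection in
 * amdgpu_device_get_job_timeout_settings(): bare metal enforces no
 * compute timeout, while SR-IOV/passthrough gets a long but finite one
 * so a hung benchmark cannot stall the ring forever.  The constants
 * are placeholders; the archive swallowed the real values. */
#define GFX_TIMEOUT_MS       10000L  /* placeholder */
#define COMPUTE_TIMEOUT_MS   60000L  /* placeholder: longer than gfx */
#define MAX_SCHEDULE_TIMEOUT ((long)(~0UL >> 1))

static long mock_msecs_to_jiffies(long ms) { return ms; }  /* identity mock */

static long pick_compute_timeout(bool sriov_vf, bool passthrough)
{
    if (sriov_vf || passthrough)
        return mock_msecs_to_jiffies(COMPUTE_TIMEOUT_MS);
    return MAX_SCHEDULE_TIMEOUT;
}
```

Before the patch, the virtualized branch reused the (much shorter) gfx timeout; the change decouples the two so compute jobs get their own, longer budget.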


RE: [PATCH 2/4] SWDEV-227334 - No need support vcn decode

2020-03-25 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping.

Best wishes
Emily Deng
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Wednesday, March 25, 2020 4:33 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/4] SWDEV-227334 - No need support vcn decode
>
>As there is no need to support the vcn decode feature, disable the ring.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 4 
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>index ec8091a..febd4c2 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>@@ -223,6 +223,10 @@ static int vcn_v2_0_hw_init(void *handle)
>   if (r)
>   goto done;
>
>+  //Disable vcn decode for sriov
>+  if (amdgpu_sriov_vf(adev))
>+  ring->sched.ready = false;
>+
>   for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>   ring = &adev->vcn.inst->ring_enc[i];
>   r = amdgpu_ring_test_helper(ring);
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 7/7] drm/amdgpu: postpone entering fullaccess mode

2020-03-25 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Monk Liu
>Sent: Wednesday, March 25, 2020 11:59 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH 7/7] drm/amdgpu: postpone entering fullaccess mode
>
>if the host supports the new handshake we only need to enter fullaccess_mode
>in the ip_init() part; otherwise we need to do it before reading the vbios
>(because the host prepares the vbios for the VF only after receiving the
>REQ_GPU_INIT event under the legacy handshake)
>vbios for VF only after received REQ_GPU_INIT event under legacy handshake)
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 724ad84..b61161a 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1814,10 +1814,14 @@ static int amdgpu_device_ip_early_init(struct
>amdgpu_device *adev)
>   return r;
>   }
>   }
>+  }
>
>+  /* we need to send REQ_GPU here for legacy handshaker otherwise the
>vbios
>+   * will not be prepared by host for this VF */
>+  if (amdgpu_sriov_vf(adev) && adev->virt.req_init_data_ver < 1) {
>   r = amdgpu_virt_request_full_gpu(adev, true);
>   if (r)
>-  return -EAGAIN;
>+  return r;
>   }
>
>   adev->pm.pp_feature = amdgpu_pp_feature_mask; @@ -1977,6
>+1981,12 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>   if (r)
>   return r;
>
>+  if (amdgpu_sriov_vf(adev) && adev->virt.req_init_data_ver > 0) {
>+  r = amdgpu_virt_request_full_gpu(adev, true);
>+  if (r)
>+  return -EAGAIN;
>+  }
>+
>   for (i = 0; i < adev->num_ip_blocks; i++) {
>   if (!adev->ip_blocks[i].status.valid)
>   continue;
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
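The handshake-ordering decision in the patch above reduces to: legacy hosts (req_init_data_ver < 1) need the full-GPU request before the vbios is read, new hosts can defer it to ip_init(). A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the full-access ordering logic: with the
 * legacy handshake the VF must request full GPU access early (the host
 * only prepares the vbios after REQ_GPU); with the new handshake the
 * request is postponed until ip_init(). */
enum req_point { REQ_NONE, REQ_EARLY, REQ_AT_IP_INIT };

static enum req_point full_access_request_point(bool sriov_vf,
                                                int req_init_data_ver)
{
    if (!sriov_vf)
        return REQ_NONE;                 /* bare metal never requests */
    return (req_init_data_ver < 1) ? REQ_EARLY : REQ_AT_IP_INIT;
}
```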


RE: [PATCH 4/4] drm/amdgpu: cleanup all virtualization detection routine

2020-03-24 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Monk Liu
>Sent: Tuesday, March 24, 2020 6:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH 4/4] drm/amdgpu: cleanup all virtualization detection routine
>
>we need to move virt detection much earlier because:
>1) HW team confirms that RCC_IOV_FUNC_IDENTIFIER will always be at the DE5
>(dw) mmio offset from vega10 onward, so there is no need to implement a
>detect_hw_virt() routine in each nbio/chip file.
>for VI SRIOV chips (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
>0x1503
>
>2) we need to acknowledge that we are an SRIOV VF before we do IP discovery,
>because the IP discovery content will be updated by the host every time it
>receives a "REQ_GPU_INIT_DATA" request from the guest (there will be patches
>for this new handshake soon).
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h   |  1 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 33
>++
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  6 
> drivers/gpu/drm/amd/amdgpu/cik.c   |  8 --
> drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 18 
> drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 18 
> drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c |  7 -
> drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 18 
> drivers/gpu/drm/amd/amdgpu/nv.c|  2 --
> drivers/gpu/drm/amd/amdgpu/si.c|  8 --
> drivers/gpu/drm/amd/amdgpu/soc15.c |  1 -
> drivers/gpu/drm/amd/amdgpu/vi.c| 24 
> .../amd/include/asic_reg/nbif/nbif_6_1_offset.h|  2 ++
> .../amd/include/asic_reg/nbio/nbio_7_0_offset.h|  2 ++
> .../amd/include/asic_reg/nbio/nbio_7_4_offset.h|  2 ++
> 16 files changed, 48 insertions(+), 105 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index e55dbcd..ca609b6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3057,6 +3057,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   if (amdgpu_mes && adev->asic_type >= CHIP_NAVI10)
>   adev->enable_mes = true;
>
>+  /* detect hw virtualization here */
>+  amdgpu_detect_virtualization(adev);
>+
>   if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10) {
>   r = amdgpu_discovery_init(adev);
>   if (r) {
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
>index 919bd56..edaac24 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
>@@ -77,7 +77,6 @@ struct amdgpu_nbio_funcs {
> u32 *flags);
>   void (*ih_control)(struct amdgpu_device *adev);
>   void (*init_registers)(struct amdgpu_device *adev);
>-  void (*detect_hw_virt)(struct amdgpu_device *adev);
>   void (*remap_hdp_registers)(struct amdgpu_device *adev);
>   void (*handle_ras_controller_intr_no_bifring)(struct amdgpu_device
>*adev);
>   void (*handle_ras_err_event_athub_intr_no_bifring)(struct
>amdgpu_device *adev); diff --git
>a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>index adc813c..43a1ee3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>@@ -287,3 +287,36 @@ void amdgpu_virt_init_data_exchange(struct
>amdgpu_device *adev)
>   }
>   }
> }
>+
>+void amdgpu_detect_virtualization(struct amdgpu_device *adev) {
>+  uint32_t reg;
>+
>+  switch (adev->asic_type) {
>+  case CHIP_TONGA:
>+  case CHIP_FIJI:
>+  reg = RREG32(mmBIF_IOV_FUNC_IDENTIFIER);
>+  break;
>+  case CHIP_VEGA10:
>+  case CHIP_VEGA20:
>+  case CHIP_NAVI10:
>+  case CHIP_NAVI12:
>+  case CHIP_ARCTURUS:
>+  reg = RREG32(mmRCC_IOV_FUNC_IDENTIFIER);
>+  break;
>+  default: /* other chip doesn't support SRIOV */
>+  reg = 0;
>+  break;
>+  }
>+
>+  if (reg & 1)
>+  adev->virt.caps |= AMDGPU_SRIOV_CAPS_IS_VF;
>+
>+  if (reg & 0x8000)
>+  adev->virt.caps |= AMDGPU_SRIOV_CAPS_ENABLE_IOV;
>+
>+  if (!reg) {
>+  if (is_virtual_machine())   /* passthrough mode excludes sriov mode */
>+  adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE;
>+  }
>+}
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>index 0a95b13..74f9843 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>@@ -30,6 +30,11 @@
> #define 

RE: [PATCH] drm/amdgpu: revise RLCG access path

2020-03-15 Thread Deng, Emily
Reviewed-by: Emily Deng 

-Original Message-
From: amd-gfx  On Behalf Of Monk Liu
Sent: Monday, March 16, 2020 12:06 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: [PATCH] drm/amdgpu: revise RLCG access path

what changed:
1) provide a new implementation interface for the rlcg access path
2) put SQ_CMD/SQ_IND_INDEX on the GFX9 RLCG path so that debugfs's reg_op
function can access registers that need the RLCG path

now even debugfs's reg_op can be used to dump waves.

tested-by: Monk Liu 
tested-by: Zhou pengju 
Signed-off-by: Zhou pengju 
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-  
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 50 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h|  3 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 74 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 95 -
 drivers/gpu/drm/amd/amdgpu/soc15.h  |  7 +++
 drivers/gpu/drm/amd/amdgpu/soc15_common.h   |  5 +-
 9 files changed, 221 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c00831f..87c25230 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -999,6 +999,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
uint32_t reg,
uint32_t acc_flags);
 void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v,
uint32_t acc_flags);
+void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev, uint32_t reg, 
uint32_t v,
+   uint32_t acc_flags);
 void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t 
value);  uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 02bb1be1..c0f9a65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -179,7 +179,7 @@ static int  amdgpu_debugfs_process_reg_op(bool read, struct 
file *f,
} else {
r = get_user(value, (uint32_t *)buf);
if (!r)
-   WREG32(*pos >> 2, value);
+   amdgpu_mm_wreg_mmio_rlc(adev, *pos >> 2, value, 
0);
}
if (r) {
result = r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a35c899..729565f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -306,6 +306,26 @@ void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t 
offset, uint8_t value)
BUG();
 }
 
+void static inline amdgpu_mm_wreg_mmio(struct amdgpu_device *adev, 
+uint32_t reg, uint32_t v, uint32_t acc_flags) {
+   trace_amdgpu_mm_wreg(adev->pdev->device, reg, v);
+
+   if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
+   writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
+   else {
+   unsigned long flags;
+
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   writel((reg * 4), ((void __iomem *)adev->rmmio) + (mmMM_INDEX * 
4));
+   writel(v, ((void __iomem *)adev->rmmio) + (mmMM_DATA * 4));
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+   }
+
+   if (adev->asic_type >= CHIP_VEGA10 && reg == 1 && adev->last_mm_index 
== 0x5702C) {
+   udelay(500);
+   }
+}
+
 /**
  * amdgpu_mm_wreg - write to a memory mapped IO register
  *
@@ -319,8 +339,6 @@ void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t 
offset, uint8_t value)  void amdgpu_mm_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32_t v,
uint32_t acc_flags)
 {
-   trace_amdgpu_mm_wreg(adev->pdev->device, reg, v);
-
if (adev->asic_type >= CHIP_VEGA10 && reg == 0) {
adev->last_mm_index = v;
}
@@ -328,20 +346,26 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
reg, uint32_t v,
if ((acc_flags & AMDGPU_REGS_KIQ) || (!(acc_flags & AMDGPU_REGS_NO_KIQ) 
&& amdgpu_sriov_runtime(adev)))
return amdgpu_kiq_wreg(adev, reg, v);
 
-   if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
-   writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
-   else {
-   unsigned long flags;
+   amdgpu_mm_wreg_mmio(adev, reg, v, acc_flags); }
 
-   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
-   writel((reg * 4), ((void __iomem *)adev->rmmio) + (mmMM_INDEX * 
4));
-   writel(v, ((void __iomem *)adev->rmmio) + (mmMM_DATA * 4));
-   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
-   }
+/*
+ * 

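The refactor above factors the raw MMIO write out of amdgpu_mm_wreg() into a separate helper so that debugfs can bypass the KIQ routing. The dispatch decision itself is small enough to sketch; flag values and names below are mocks, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the write-path dispatch in amdgpu_mm_wreg():
 * route to the KIQ when the KIQ flag is set, or when running as an
 * SR-IOV VF at runtime without the NO_KIQ override; otherwise fall
 * through to the raw MMIO helper the patch factors out. */
#define REGS_KIQ     (1u << 0)   /* mock of AMDGPU_REGS_KIQ */
#define REGS_NO_KIQ  (1u << 1)   /* mock of AMDGPU_REGS_NO_KIQ */

enum wr_path { WR_KIQ, WR_MMIO };

static enum wr_path pick_write_path(uint32_t acc_flags, bool sriov_runtime)
{
    if ((acc_flags & REGS_KIQ) ||
        (!(acc_flags & REGS_NO_KIQ) && sriov_runtime))
        return WR_KIQ;   /* amdgpu_kiq_wreg() */
    return WR_MMIO;      /* amdgpu_mm_wreg_mmio() */
}
```

Factoring the MMIO leg into its own function is what lets the new amdgpu_mm_wreg_mmio_rlc() entry point reuse it from the debugfs reg_op path without re-entering the KIQ decision.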
RE: [enable VCN2.0 for NV12 SRIOV 6/6] drm/amdgpu: clear warning on unused var

2020-03-05 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>Christian König
>Sent: Thursday, March 5, 2020 9:37 PM
>To: Liu, Monk ; amd-gfx@lists.freedesktop.org
>Subject: Re: [enable VCN2.0 for NV12 SRIOV 6/6] drm/amdgpu: clear warning
>on unused var
>
>Am 05.03.20 um 14:33 schrieb Monk Liu:
>> Signed-off-by: Monk Liu 
>
>Acked-by: Christian König 
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> index ae9754f..a41272f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> @@ -493,7 +493,6 @@ static int amdgpu_vcn_dec_get_destroy_msg(struct
>amdgpu_ring *ring, uint32_t han
>>
>>   int amdgpu_vcn_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>   {
>> -struct amdgpu_device *adev = ring->adev;
>>  struct dma_fence *fence;
>>  long r;
>>
>> @@ -655,7 +654,6 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct
>amdgpu_ring *ring, uint32_t han
>>
>>   int amdgpu_vcn_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>   {
>> -struct amdgpu_device *adev = ring->adev;
>>  struct dma_fence *fence = NULL;
>>  struct amdgpu_bo *bo = NULL;
>>  long r;
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu/sriov: Use kiq to copy the gpu clock

2020-02-26 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Thanks Alex and Christian. I have already renamed it according to your request.
Please help review.

Best wishes
Emily Deng



>-Original Message-
>From: Alex Deucher 
>Sent: Wednesday, February 26, 2020 10:30 PM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH] drm/amdgpu/sriov: Use kiq to copy the gpu clock
>
>On Tue, Feb 25, 2020 at 11:34 PM Emily Deng  wrote:
>>
>> For vega10 sriov, the register is blocked, use copy data command to
>> fix the issue.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 68
>> +--
>>  1 file changed, 58 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 1c7a16b..71df0d9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3963,6 +3963,63 @@ static int gfx_v9_0_soft_reset(void *handle)
>> return 0;
>>  }
>>
>> +static uint64_t amdgpu_kiq_read_clock(struct amdgpu_device *adev)
>
>Please name this function gfx_v9_0_kiq_read_clock for consistency.
>
>
>> +{
>> +   signed long r, cnt = 0;
>> +   unsigned long flags;
>> +   uint32_t seq;
>> +   struct amdgpu_kiq *kiq = >gfx.kiq;
>> +   struct amdgpu_ring *ring = >ring;
>> +
>> +   BUG_ON(!ring->funcs->emit_rreg);
>> +
>> +   spin_lock_irqsave(>ring_lock, flags);
>> +   amdgpu_ring_alloc(ring, 32);
>> +   amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> +   amdgpu_ring_write(ring, 9 | /* src: register*/
>
>Is src 9 the counter?
Yes, it is the GPU counter.
>
>Assuming that is correct, with the naming fixed:
>Reviewed-by: Alex Deucher 
>
>> +   (5 << 8) |  /* dst: memory */
>> +   (1 << 16) | /* count sel */
>> +   (1 << 20)); /* write confirm */
>> +   amdgpu_ring_write(ring, 0);
>> +   amdgpu_ring_write(ring, 0);
>> +   amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> +   kiq->reg_val_offs * 4));
>> +   amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> +   kiq->reg_val_offs * 4));
>> +   amdgpu_fence_emit_polling(ring, );
>> +   amdgpu_ring_commit(ring);
>> +   spin_unlock_irqrestore(>ring_lock, flags);
>> +
>> +   r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>> +
>> +   /* don't wait anymore for gpu reset case because this way may
>> +* block gpu_recover() routine forever, e.g. this virt_kiq_rreg
>> +* is triggered in TTM and ttm_bo_lock_delayed_workqueue() will
>> +* never return if we keep waiting in virt_kiq_rreg, which cause
>> +* gpu_recover() hang there.
>> +*
>> +* also don't wait anymore for IRQ context
>> +* */
>> +   if (r < 1 && (adev->in_gpu_reset || in_interrupt()))
>> +   goto failed_kiq_read;
>> +
>> +   might_sleep();
>> +   while (r < 1 && cnt++ < MAX_KIQ_REG_TRY) {
>> +   msleep(MAX_KIQ_REG_BAILOUT_INTERVAL);
>> +   r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>> +   }
>> +
>> +   if (cnt > MAX_KIQ_REG_TRY)
>> +   goto failed_kiq_read;
>> +
>> +   return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
>> +   (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] <<
>> + 32ULL;
>> +
>> +failed_kiq_read:
>> +   pr_err("failed to read gpu clock\n");
>> +   return ~0;
>> +}
>> +
>>  static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
>> *adev)  {
>> uint64_t clock;
>> @@ -3970,16 +4027,7 @@ static uint64_t
>gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev)
>> amdgpu_gfx_off_ctrl(adev, false);
>> mutex_lock(>gfx.gpu_clock_mutex);
>> if (adev->asic_type == CHIP_VEGA10 && amdgpu_sriov_runtime(adev)) {
>> -   uint32_t tmp, lsb, msb, i = 0;
>> -   do {
>> -   if (i != 0)
>> -   udelay(1);
>> -   tmp = RREG32_SOC15(GC, 0,
>mmRLC_REFCLOCK_TIMESTAMP_MSB);
>> -  

RE: [PATCH 5/5] drm/amd/amdgpu: L1 Policy(5/5) - removed IH_CHICKEN from VF

2020-01-03 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series are Reviewed-by: Emily Deng 

>-Original Message-
>From: Jane Jian 
>Sent: Friday, January 3, 2020 5:57 PM
>To: amd-gfx@lists.freedesktop.org; Deng, Emily ; Liu, Leo
>
>Cc: Jian,JaneQiang) ; Luo, Zhigang
>; Jian,JaneQiang) 
>Subject: [PATCH 5/5] drm/amd/amdgpu: L1 Policy(5/5) - removed IH_CHICKEN
>from VF
>
>From: Zhigang Luo 
>
>Signed-off-by: Zhigang Luo 
>Signed-off-by: Jane Jian 
>---
> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 22 --
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>index 5cb7e231de5f..d9e331084ea0 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>@@ -234,16 +234,9 @@ static int vega10_ih_irq_init(struct amdgpu_device
>*adev)
>   WREG32_SOC15(OSSSYS, 0, mmIH_RB_BASE_HI, (ih->gpu_addr >> 40) &
>0xff);
>
>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL);
>-  ih_chicken = RREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN);
>   ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl);
>-  if (adev->irq.ih.use_bus_addr) {
>-  ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
>MC_SPACE_GPA_ENABLE, 1);
>-  } else {
>-  ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
>MC_SPACE_FBPA_ENABLE, 1);
>-  }
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RPTR_REARM,
>  !!adev->irq.msi_enabled);
>-
>   if (amdgpu_sriov_vf(adev)) {
>   if (psp_reg_program(>psp, PSP_REG_IH_RB_CNTL,
>ih_rb_cntl)) {
>   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>@@ -253,10 +246,19 @@ static int vega10_ih_irq_init(struct amdgpu_device
>*adev)
>   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl);
>   }
>
>-  if ((adev->asic_type == CHIP_ARCTURUS
>-  && adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT)
>-  || adev->asic_type == CHIP_RENOIR)
>+  if ((adev->asic_type == CHIP_ARCTURUS &&
>+   adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) ||
>+  adev->asic_type == CHIP_RENOIR) {
>+  ih_chicken = RREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN);
>+  if (adev->irq.ih.use_bus_addr) {
>+  ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
>+ MC_SPACE_GPA_ENABLE, 1);
>+  } else {
>+  ih_chicken = REG_SET_FIELD(ih_chicken, IH_CHICKEN,
>+ MC_SPACE_FBPA_ENABLE, 1);
>+  }
>   WREG32_SOC15(OSSSYS, 0, mmIH_CHICKEN, ih_chicken);
>+  }
>
>   /* set the writeback address whether it's enabled or not */
>   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_ADDR_LO,
>--
>2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF

2019-12-26 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]



>-Original Message-
>From: Zhang, Jack (Jian) 
>Sent: Friday, December 27, 2019 3:00 PM
>To: Feng, Kenneth ; Deucher, Alexander
>; Quan, Evan ; Wang,
>Kevin(Yang) ; Tao, Yintian ;
>Deng, Emily ; Min, Frank ; Liu,
>Monk ; amd-gfx@lists.freedesktop.org; Zhang, Jack (Jian)
>
>Subject: RE: [PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS
>VF
>
>
>
>-Original Message-
>From: Jack Zhang 
>Sent: Friday, December 27, 2019 2:57 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhang, Jack (Jian) 
>Subject: [PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF
>
>Before, initialization of the smu ip block would be skipped for sriov ASICs. But if
>there's only one VF being used, the guest driver should be able to dump some HW
>info such as clks, temperature, etc.
>
>To solve this, now after onevf mode is enabled, host driver will notify guest. 
>If
>it's onevf mode, guest will do smu hw_init and skip some steps in normal smu
>hw_init flow because host driver has already done it for smu.
>
>With this fix, guest app can talk with smu and dump hw information from smu.
>
>v2: refine the logic for pm_enabled.Skip hw_init by not changing pm_enabled.
>
>Signed-off-by: Jack Zhang 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  3 +-
> drivers/gpu/drm/amd/amdgpu/soc15.c |  3 +-
> drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 45 +--
>---
> 3 files changed, 29 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index 8469834..08130a6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -1448,7 +1448,8 @@ static int psp_np_fw_load(struct psp_context *psp)
> || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_G
>   || ucode->ucode_id ==
>AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL
>   || ucode->ucode_id ==
>AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM
>-  || ucode->ucode_id ==
>AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM))
>+  || ucode->ucode_id ==
>AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM
>+  || ucode->ucode_id == AMDGPU_UCODE_ID_SMC))
>   /*skip ucode loading in SRIOV VF */
>   continue;
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>b/drivers/gpu/drm/amd/amdgpu/soc15.c
>index b53d401..a271496 100644
>--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>@@ -827,8 +827,7 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev,
>_virtual_ip_block);
>   amdgpu_device_ip_block_add(adev, _v9_0_ip_block);
>   amdgpu_device_ip_block_add(adev, _v4_0_ip_block);
>-  if (!amdgpu_sriov_vf(adev))
>-  amdgpu_device_ip_block_add(adev,
>_v11_0_ip_block);
>+  amdgpu_device_ip_block_add(adev, _v11_0_ip_block);
>
>   if (amdgpu_sriov_vf(adev)) {
>   if (likely(adev->firmware.load_type ==
>AMDGPU_FW_LOAD_PSP)) diff --git
>a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>index 936c682..42c0a6d 100644
>--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
>@@ -531,10 +531,14 @@ bool is_support_sw_smu(struct amdgpu_device
>*adev)
>   if (adev->asic_type == CHIP_VEGA20)
>   return (amdgpu_dpm == 2) ? true : false;
>   else if (adev->asic_type >= CHIP_ARCTURUS) {
>-  if (amdgpu_sriov_vf(adev))
>-  return false;
>-  else
>+  if (amdgpu_sriov_vf(adev)) {
>+  if(amdgpu_sriov_is_pp_one_vf(adev))
>+  return true;
>+  else
>+  return false;
>+  } else {
>   return true;
>+  }
>   } else
>   return false;
> }
>@@ -1062,20 +1066,19 @@ static int smu_smc_table_hw_init(struct
>smu_context *smu,
>   }
>
>   /* smu_dump_pptable(smu); */
>+  if(amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)){
>+  /*
>+   * Copy pptable bo in the vram to smc with SMU MSGs such as
>+   * SetDriverDramAddr and TransferTableDram2Smu.
>+   */
>+  ret = smu_write_pptable(smu);
>+  if (ret)
>+  return ret;
Why only sriov and non one v
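The one-VF gating added to is_support_sw_smu() in the quoted hunk reduces to a small decision table, restated below with the driver state collapsed to plain ints. The enum values and the function name are illustrative stand-ins, not the real amdgpu types.

```c
enum chip { CHIP_OLD, CHIP_VEGA20, CHIP_ARCTURUS };	/* illustrative order */

/* Restates the is_support_sw_smu() logic quoted above. */
static int is_support_sw_smu_sketch(enum chip asic, int amdgpu_dpm,
				    int is_sriov_vf, int is_pp_one_vf)
{
	if (asic == CHIP_VEGA20)
		return amdgpu_dpm == 2;		/* opt-in via module parameter */
	if (asic >= CHIP_ARCTURUS) {
		if (is_sriov_vf)
			return is_pp_one_vf;	/* sw SMU only in one-VF mode */
		return 1;
	}
	return 0;
}
```

The net change of the patch is the `is_pp_one_vf` branch: before it, any SRIOV VF on Arcturus and later returned false unconditionally.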

RE: [PATCH 1/2] drm/amdgpu: remove FB location config for sriov

2019-12-20 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of
>Frank.Min
>Sent: Thursday, December 19, 2019 7:44 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: [PATCH 1/2] drm/amdgpu: remove FB location config for sriov
>
>FB location is already programmed by the HV driver for Arcturus, so remove this part
>
>Change-Id: Ia357ae716bfc3084a4dd277ade219e57092f9b42
>Signed-off-by: Frank.Min 
>---
> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c |  2 +-
>drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c  | 16 
> 2 files changed, 1 insertion(+), 17 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>index e91bd7945777..e9a9d24c2b7f 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>@@ -264,7 +264,7 @@ static void gfxhub_v1_0_program_invalidation(struct
>amdgpu_device *adev)
>
> int gfxhub_v1_0_gart_enable(struct amdgpu_device *adev)  {
>-  if (amdgpu_sriov_vf(adev)) {
>+  if (amdgpu_sriov_vf(adev) && adev->asic_type != CHIP_ARCTURUS) {
>   /*
>* MC_VM_FB_LOCATION_BASE/TOP is NULL for VF, becuase
>they are
>* VF copy registers so vbios post doesn't program them, for 
> diff
>--git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
>b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
>index d9301e80522a..ac61206c4ce6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
>@@ -368,22 +368,6 @@ int mmhub_v9_4_gart_enable(struct amdgpu_device
>*adev)
>   int i;
>
>   for (i = 0; i < MMHUB_NUM_INSTANCES; i++) {
>-  if (amdgpu_sriov_vf(adev)) {
>-  /*
>-   * MC_VM_FB_LOCATION_BASE/TOP is NULL for VF,
>becuase
>-   * they are VF copy registers so vbios post doesn't
>-   * program them, for SRIOV driver need to program
>them
>-   */
>-  WREG32_SOC15_OFFSET(MMHUB, 0,
>-
>mmVMSHAREDVC0_MC_VM_FB_LOCATION_BASE,
>-   i * MMHUB_INSTANCE_REGISTER_OFFSET,
>-   adev->gmc.vram_start >> 24);
>-  WREG32_SOC15_OFFSET(MMHUB, 0,
>-
>mmVMSHAREDVC0_MC_VM_FB_LOCATION_TOP,
>-   i * MMHUB_INSTANCE_REGISTER_OFFSET,
>-   adev->gmc.vram_end >> 24);
>-  }
>-
>   /* GART Enable. */
>   mmhub_v9_4_init_gart_aperture_regs(adev, i);
>   mmhub_v9_4_init_system_aperture_regs(adev, i);
>--
>2.17.1
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfxdata=02%7C01%7CEmily.Deng%40amd.com%7C564215fe64b245954
>97408d78478b687%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C
>637123526651355885sdata=Aeag0%2FF6lQHe70aot5ZzB5UP1rZxsEZO2
>WfLyJv1njg%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 4/4] drm/amdkfd: Avoid hanging hardware in stop_cpsch

2019-12-20 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Series Tested-by: Emily Deng  on an SRIOV environment with
Vega 10, covering the TDR-1, TDR-2 and TDR-3 test cases.

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Felix
>Kuehling
>Sent: Friday, December 20, 2019 4:30 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: [PATCH 4/4] drm/amdkfd: Avoid hanging hardware in stop_cpsch
>
>Don't use the HWS if it's known to be hanging. In a reset also don't try to
>destroy the HIQ because that may hang on SRIOV if the KIQ is unresponsive.
>
>Signed-off-by: Felix Kuehling 
>---
> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c| 12 
> drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c|  8 
> drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c  |  4 ++--
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 ++--
> .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c   |  2 +-
> 5 files changed, 17 insertions(+), 13 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>index a7e9ec1b3ce3..d7eb6ac37f62 100644
>--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>@@ -946,7 +946,7 @@ static int start_nocpsch(struct device_queue_manager
>*dqm)  static int stop_nocpsch(struct device_queue_manager *dqm)  {
>   if (dqm->dev->device_info->asic_family == CHIP_HAWAII)
>-  pm_uninit(>packets);
>+  pm_uninit(>packets, false);
>   dqm->sched_running = false;
>
>   return 0;
>@@ -1114,20 +1114,24 @@ static int start_cpsch(struct
>device_queue_manager *dqm)
>   return 0;
> fail_allocate_vidmem:
> fail_set_sched_resources:
>-  pm_uninit(>packets);
>+  pm_uninit(>packets, false);
> fail_packet_manager_init:
>   return retval;
> }
>
> static int stop_cpsch(struct device_queue_manager *dqm)  {
>+  bool hanging;
>+
>   dqm_lock(dqm);
>-  unmap_queues_cpsch(dqm,
>KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>+  if (!dqm->is_hws_hang)
>+  unmap_queues_cpsch(dqm,
>KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>+  hanging = dqm->is_hws_hang || dqm->is_resetting;
>   dqm->sched_running = false;
>   dqm_unlock(dqm);
>
>   kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
>-  pm_uninit(>packets);
>+  pm_uninit(>packets, hanging);
>
>   return 0;
> }
>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>index 2d56dc534459..bae706462f96 100644
>--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>@@ -195,9 +195,9 @@ static bool kq_initialize(struct kernel_queue *kq, struct
>kfd_dev *dev,  }
>
> /* Uninitialize a kernel queue and free all its memory usages. */ -static void
>kq_uninitialize(struct kernel_queue *kq)
>+static void kq_uninitialize(struct kernel_queue *kq, bool hanging)
> {
>-  if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
>+  if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ && !hanging)
>   kq->mqd_mgr->destroy_mqd(kq->mqd_mgr,
>   kq->queue->mqd,
>
>   KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
>@@ -337,9 +337,9 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev
>*dev,
>   return NULL;
> }
>
>-void kernel_queue_uninit(struct kernel_queue *kq)
>+void kernel_queue_uninit(struct kernel_queue *kq, bool hanging)
> {
>-  kq_uninitialize(kq);
>+  kq_uninitialize(kq, hanging);
>   kfree(kq);
> }
>
>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
>b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
>index 6cabed06ef5d..dc406e6dee23 100644
>--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
>@@ -264,10 +264,10 @@ int pm_init(struct packet_manager *pm, struct
>device_queue_manager *dqm)
>   return 0;
> }
>
>-void pm_uninit(struct packet_manager *pm)
>+void pm_uninit(struct packet_manager *pm, bool hanging)
> {
>   mutex_destroy(>lock);
>-  kernel_queue_uninit(pm->priv_queue);
>+  kernel_queue_uninit(pm->priv_queue, hanging);
> }
>
> int pm_send_set_resources(struct packet_manager *pm, diff --git
>a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>index 087e96838997..8ac680dc90f1 100644
>--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>@@ -883,7 +883,7 @@ struct device_queue_manager
>*device_queue_manager_init(struct kfd_dev *dev);  void
>device_queue_manager_uninit(struct device_queue_manager *dqm);  struct
>kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>   enum kfd_queue_type type);
>-void kernel_queue_uninit(struct kernel_queue *kq);
>+void kernel_queue_uninit(struct kernel_queue *kq, bool hanging);
> int kfd_process_vm_fault(struct device_queue_manager 
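The pattern in the series above is a `hanging` flag threaded through the teardown path so that hardware-touching cleanup is skipped when the HWS is known to be unresponsive, while memory is still freed. A minimal sketch, with all names as illustrative stand-ins for the kfd functions:

```c
/* Counts calls that would submit a destroy packet to hardware. */
static int destroy_mqd_calls;

static void destroy_mqd_stub(void)
{
	destroy_mqd_calls++;	/* would touch hardware in the real driver */
}

/* Mirrors kq_uninitialize(): only the HIQ needs a hardware destroy, and
 * only when the hardware scheduler is not hanging. */
static void kq_uninitialize_sketch(int is_hiq, int hanging)
{
	if (is_hiq && !hanging)
		destroy_mqd_stub();
	/* ...free MQD/ring memory unconditionally here... */
}
```

The flag originates in stop_cpsch() (`dqm->is_hws_hang || dqm->is_resetting`) and is passed down through pm_uninit() and kernel_queue_uninit(), which is why the series changes those signatures.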

RE: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Monk Liu
>Sent: Tuesday, December 17, 2019 6:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
>
>issues:
>MEC is ruined by the amdkfd_pre_reset after VF FLR done
>
>fix:
>amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR, the
>correct sequence is do amdkfd_pre_reset before VF FLR but there is a limitation
>to block this sequence:
>if we do pre_reset() before VF FLR, it would go KIQ way to do register access 
>and
>stuck there, because KIQ probably won't work by that time (e.g. you already
>made GFX hang)
>
>so the best way right now is to simply remove it.
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 --
> 1 file changed, 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 605cef6..ae962b9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3672,8 +3672,6 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   if (r)
>   return r;
>
>-  amdgpu_amdkfd_pre_reset(adev);
>-
>   /* Resume IP prior to SMC */
>   r = amdgpu_device_ip_reinit_early_sriov(adev);
>   if (r)
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfxdata=02%7C01%7CEmily.Deng%40amd.com%7C74408803b49e4f328
>f7708d782daba6c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C6
>37121748318124859sdata=4YbyHwEEGxVLEhuOg%2Frc%2FxdhFRwrdm
>FuZ4vpHx%2FApAE%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amdgpu: fix double gpu_recovery for NV of SRIOV

2019-12-17 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Monk Liu
>Sent: Tuesday, December 17, 2019 6:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH 1/2] drm/amdgpu: fix double gpu_recovery for NV of SRIOV
>
>issues:
>gpu_recover() is re-entered by the mailbox interrupt handler mxgpu_nv.c
>
>fix:
>we need to bypass the gpu_recover() invoke in mailbox interrupt as long as the
>timeout is not infinite (thus the TDR will be triggered automatically after 
>time
>out, no need to invoke
>gpu_recover() through mailbox interrupt.
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 6 +-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>index 0d8767e..1c3a7d4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>@@ -269,7 +269,11 @@ static void xgpu_nv_mailbox_flr_work(struct
>work_struct *work)
>   }
>
>   /* Trigger recovery for world switch failure if no TDR */
>-  if (amdgpu_device_should_recover_gpu(adev))
>+  if (amdgpu_device_should_recover_gpu(adev)
>+  && (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
>+  adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
>+  adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
>+  adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
>   amdgpu_device_gpu_recover(adev, NULL);  }
>
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfxdata=02%7C01%7CEmily.Deng%40amd.com%7C029ef88677e744f2ad
>8f08d782dab68c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63
>7121748276776005sdata=IiRwMTw6DQW8sh8Y7SkZ2PehohwnH6gSqkt
>t64a73UU%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
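The guard added in the mxgpu_nv patch above can be read as a small predicate: the FLR interrupt handler only triggers recovery itself when at least one engine timeout is infinite, because any finite timeout will fire TDR and run gpu_recover() on its own. A hedged sketch, with the kernel constant redefined only to keep it self-contained:

```c
#include <limits.h>

/* The kernel's MAX_SCHEDULE_TIMEOUT is LONG_MAX. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Returns nonzero when the mailbox FLR handler must invoke recovery
 * itself, i.e. when some engine can never time out by itself. */
static int should_trigger_flr_recovery(long sdma_t, long gfx_t,
				       long compute_t, long video_t)
{
	return sdma_t == MAX_SCHEDULE_TIMEOUT ||
	       gfx_t == MAX_SCHEDULE_TIMEOUT ||
	       compute_t == MAX_SCHEDULE_TIMEOUT ||
	       video_t == MAX_SCHEDULE_TIMEOUT;
}
```

This is why the fix avoids the double gpu_recover(): with finite timeouts the mailbox path stays passive and only the scheduler's TDR path recovers.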


RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-12-03 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Andrey,
Thanks very much.

Best wishes
Emily Deng
From: Grodzovsky, Andrey 
Sent: Tuesday, December 3, 2019 12:33 PM
To: Deucher, Alexander ; Deng, Emily 

Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; steven.pr...@arm.com
Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.


Turns out Steven's patch was already in, so I just cherry-picked the change from
drm-next-misc.



Emily - it's in.



Andrey


On 12/3/19 2:59 PM, Deucher, Alexander wrote:

[AMD Official Use Only - Internal Distribution Only]

Cherry pick whatever dependencies you need or pick the older version of the 
patch.  Either way works.

Alex

From: Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>
Sent: Tuesday, December 3, 2019 2:57 PM
To: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>; Deng, Emily 
<mailto:emily.d...@amd.com>
Cc: dri-de...@lists.freedesktop.org<mailto:dri-de...@lists.freedesktop.org> 
<mailto:dri-de...@lists.freedesktop.org>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>; Koenig, 
Christian <mailto:christian.koe...@amd.com>; 
steven.pr...@arm.com<mailto:steven.pr...@arm.com> 
<mailto:steven.pr...@arm.com>
Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.


I don't think I can apply this patch 'as is', as it has a dependency on a patch by
Steven which also wasn't applied yet - 588b982 Steven Price (6 weeks ago):
drm: Don't free jobs in wait_event_interruptible()



Andrey


On 12/3/19 2:44 PM, Deucher, Alexander wrote:

[AMD Official Use Only - Internal Distribution Only]

Please go ahead an apply whatever version is necessary for amd-staging-drm-next.

Alex


From: Grodzovsky, Andrey 
<mailto:andrey.grodzov...@amd.com>
Sent: Tuesday, December 3, 2019 2:10 PM
To: Deng, Emily <mailto:emily.d...@amd.com>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>
Cc: dri-de...@lists.freedesktop.org<mailto:dri-de...@lists.freedesktop.org> 
<mailto:dri-de...@lists.freedesktop.org>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>; Koenig, 
Christian <mailto:christian.koe...@amd.com>; 
steven.pr...@arm.com<mailto:steven.pr...@arm.com> 
<mailto:steven.pr...@arm.com>
Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

Yes - Christian just pushed it to drm-next-misc - I guess Alex/Christian
didn't pull to amd-staging-drm-next yet.

Andrey

On 12/2/19 2:24 PM, Deng, Emily wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Andrey,
>      Seems this patch is still not in amd-staging-drm-next?
>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-
>> From: Deng, Emily
>> Sent: Tuesday, November 26, 2019 4:41 PM
>> To: Grodzovsky, Andrey 
>> <mailto:andrey.grodzov...@amd.com>
>> Cc: dri-de...@lists.freedesktop.org<mailto:dri-de...@lists.freedesktop.org>; 
>> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Koenig,
>> Christian <mailto:christian.koe...@amd.com>; 
>> steven.pr...@arm.com<mailto:steven.pr...@arm.com>
>> Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Reviewed-by: Emily Deng <mailto:emily.d...@amd.com>
>>
>>> -Original Message-
>>> From: Grodzovsky, Andrey 
>>> <mailto:andrey.grodzov...@amd.com>
>>> Sent: Tuesday, November 26, 2019 7:37 AM
>>> Cc: 
>>> dri-de...@lists.freedesktop.org<mailto:dri-de...@lists.freedesktop.org>; 
>>> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>;
>>> Koenig, Christian 
>>> <mailto:christian.koe...@amd.com>; Deng, Emily
>>> <mailto:emily.d...@amd.com>; 
>>> steven.pr...@arm.com<mailto:steven.pr...@arm.com>
>>> Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>
>>> Ping
>>>
>>> Andrey
>>>
>>> On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>>>> Problem:
>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>>> drm_sched_job_timedout in timeout work there is a possiblity that bad
>>>> job was already freed while still being accessed from the timeout
>>>> thread.
>>>>
>>>> Fix:
>>>> Instead of just peeking at the bad job in the mirror list remove it
>>>> from the list

RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-12-03 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Alex,
When will we cherry-pick those patches to drm-next?

>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Tuesday, December 3, 2019 11:10 AM
>To: Deng, Emily ; Deucher, Alexander
>
>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian ; steven.pr...@arm.com
>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Yes - Christian just pushed it to drm-next-misc - I guess Alex/Christian 
>didn't pull
>to amd-staging-drm-next yet.
>
>Andrey
>
>On 12/2/19 2:24 PM, Deng, Emily wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Andrey,
>>  Seems this patch is still not in amd-staging-drm-next?
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Deng, Emily
>>> Sent: Tuesday, November 26, 2019 4:41 PM
>>> To: Grodzovsky, Andrey 
>>> Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>>> Koenig, Christian ; steven.pr...@arm.com
>>> Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Reviewed-by: Emily Deng 
>>>
>>>> -Original Message-
>>>> From: Grodzovsky, Andrey 
>>>> Sent: Tuesday, November 26, 2019 7:37 AM
>>>> Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>>>> Koenig, Christian ; Deng, Emily
>>>> ; steven.pr...@arm.com
>>>> Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>>
>>>> Ping
>>>>
>>>> Andrey
>>>>
>>>> On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>>>>> Problem:
>>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>>>> drm_sched_job_timedout in timeout work there is a possiblity that
>>>>> bad job was already freed while still being accessed from the
>>>>> timeout thread.
>>>>>
>>>>> Fix:
>>>>> Instead of just peeking at the bad job in the mirror list remove it
>>>>> from the list under lock and then put it back later when we are
>>>>> garanteed no race with main sched thread is possible which is after
>>>>> the thread is parked.
>>>>>
>>>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>>>>
>>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>>
>>>>> v4: Fix comments to relfect latest code in drm-misc.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky 
>>>>> Reviewed-by: Christian König 
>>>>> Tested-by: Emily Deng 
>>>>> ---
>>>>>drivers/gpu/drm/scheduler/sched_main.c | 27
>>>> +++
>>>>>1 file changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 6774955..1bf9c40 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct
>>>> work_struct *work)
>>>>>   unsigned long flags;
>>>>>
>>>>>   sched = container_of(work, struct drm_gpu_scheduler,
>>>>> work_tdr.work);
>>>>> +
>>>>> + /* Protects against concurrent deletion in
>>>> drm_sched_get_cleanup_job */
>>>>> + spin_lock_irqsave(>job_list_lock, flags);
>>>>>   job = list_first_entry_or_null(>ring_mirror_list,
>>>>>  struct drm_sched_job, node);
>>>>>
>>>>>   if (job) {
>>>>> + /*
>>>>> +  * Remove the bad job so it cannot be freed by concurrent
>>>>> +  * drm_sched_cleanup_jobs. It will be reinserted back after
>>>> sched->thread
>>>>> +  * is parked at which point it's safe.
>>>>> +  */
>>>>> + list_del_init(>node);
>>>>> + spin_unlock_irqrestore(>job_list_l

RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-12-02 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Andrey,
Seems this patch is still not in amd-staging-drm-next?

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily
>Sent: Tuesday, November 26, 2019 4:41 PM
>To: Grodzovsky, Andrey 
>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian ; steven.pr...@arm.com
>Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Reviewed-by: Emily Deng 
>
>>-Original Message-
>>From: Grodzovsky, Andrey 
>>Sent: Tuesday, November 26, 2019 7:37 AM
>>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>>Koenig, Christian ; Deng, Emily
>>; steven.pr...@arm.com
>>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>
>>Ping
>>
>>Andrey
>>
>>On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>>> Problem:
>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>> drm_sched_job_timedout in timeout work there is a possiblity that bad
>>> job was already freed while still being accessed from the timeout
>>> thread.
>>>
>>> Fix:
>>> Instead of just peeking at the bad job in the mirror list remove it
>>> from the list under lock and then put it back later when we are
>>>>> guaranteed no race with main sched thread is possible which is after
>>> the thread is parked.
>>>
>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>>
>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>> drm_sched_get_cleanup_job already has a lock there.
>>>
>>>>> v4: Fix comments to reflect latest code in drm-misc.
>>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> Reviewed-by: Christian König 
>>> Tested-by: Emily Deng 
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_main.c | 27
>>+++
>>>   1 file changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 6774955..1bf9c40 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct
>>work_struct *work)
>>> unsigned long flags;
>>>
>>> sched = container_of(work, struct drm_gpu_scheduler,
>>> work_tdr.work);
>>> +
>>> +   /* Protects against concurrent deletion in
>>drm_sched_get_cleanup_job */
>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>   job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>struct drm_sched_job, node);
>>>
>>> if (job) {
>>> +   /*
>>> +* Remove the bad job so it cannot be freed by concurrent
>>> +* drm_sched_cleanup_jobs. It will be reinserted back after
>>sched->thread
>>> +* is parked at which point it's safe.
>>> +*/
>>>>> + list_del_init(&job->node);
>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> +
>>> job->sched->ops->timedout_job(job);
>>>
>>> /*
>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct
>>work_struct *work)
>>> job->sched->ops->free_job(job);
>>> sched->free_guilty = false;
>>> }
>>> +   } else {
>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> }
>>>
>>>>> spin_lock_irqsave(&sched->job_list_lock, flags); @@ -370,6 +383,20
>>> @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>drm_sched_job *bad)
>>> kthread_park(sched->thread);
>>>
>>> /*
>>> +* Reinsert back the bad job here - now it's safe as
>>> +* drm_sched_get_cleanup_job cannot race against us and release the
>>> +* bad job at this point - we parked (waited for) any in progress
>>> +* (earlier) cleanups and drm_sched_get_cleanup_job will not be
>>called
>>> +* now until the scheduler thread is unparked.
>>> +*/
>>> +   if (bad && bad->sched == sched)
>>> +   /*
>>> +* Add at the head of the queue to reflect it was the earliest
>>> +* job extracted.
>>> +*/
>>>>> + list_add(&bad->node, &sched->ring_mirror_list);
>>> +
>>> +   /*
>>>>>  * Iterate the job list from later to earlier one and either deactivate
>>>  * their HW callbacks or remove them from mirror list if they already
>>>  * signaled.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-11-26 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Emily Deng 

>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Tuesday, November 26, 2019 7:37 AM
>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian ; Deng, Emily
>; steven.pr...@arm.com
>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Ping
>
>Andrey
>
>On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>> Problem:
>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>> drm_sched_job_timedout in timeout work there is a possibility that bad
>> job was already freed while still being accessed from the timeout
>> thread.
>>
>> Fix:
>> Instead of just peeking at the bad job in the mirror list remove it
>> from the list under lock and then put it back later when we are
>> guaranteed no race with main sched thread is possible which is after
>> the thread is parked.
>>
>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>
>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>> drm_sched_get_cleanup_job already has a lock there.
>>
>> v4: Fix comments to reflect latest code in drm-misc.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> Reviewed-by: Christian König 
>> Tested-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 27
>+++
>>   1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 6774955..1bf9c40 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>>  unsigned long flags;
>>
>>  sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>> +
>> +/* Protects against concurrent deletion in
>drm_sched_get_cleanup_job */
>> +spin_lock_irqsave(&sched->job_list_lock, flags);
>>  job = list_first_entry_or_null(&sched->ring_mirror_list,
>> struct drm_sched_job, node);
>>
>>  if (job) {
>> +/*
>> + * Remove the bad job so it cannot be freed by concurrent
>> + * drm_sched_cleanup_jobs. It will be reinserted back after
>sched->thread
>> + * is parked at which point it's safe.
>> + */
>> +list_del_init(&job->node);
>> +spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +
>>  job->sched->ops->timedout_job(job);
>>
>>  /*
>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>>  job->sched->ops->free_job(job);
>>  sched->free_guilty = false;
>>  }
>> +} else {
>> +spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>  }
>>
>>  spin_lock_irqsave(&sched->job_list_lock, flags); @@ -370,6 +383,20
>> @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>drm_sched_job *bad)
>>  kthread_park(sched->thread);
>>
>>  /*
>> + * Reinsert back the bad job here - now it's safe as
>> + * drm_sched_get_cleanup_job cannot race against us and release the
>> + * bad job at this point - we parked (waited for) any in progress
>> + * (earlier) cleanups and drm_sched_get_cleanup_job will not be
>called
>> + * now until the scheduler thread is unparked.
>> + */
>> +if (bad && bad->sched == sched)
>> +/*
>> + * Add at the head of the queue to reflect it was the earliest
>> + * job extracted.
>> + */
>> +list_add(&bad->node, &sched->ring_mirror_list);
>> +
>> +/*
>>   * Iterate the job list from later to earlier one and either deactivate
>>   * their HW callbacks or remove them from mirror list if they already
>>   * signaled.

RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-11-25 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Hi Andrey,
Seems you didn't submit this patch?

Best wishes
Emily Deng



>-Original Message-
>From: Andrey Grodzovsky 
>Sent: Monday, November 25, 2019 12:51 PM
>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian ; Deng, Emily
>; steven.pr...@arm.com; Grodzovsky, Andrey
>
>Subject: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Problem:
>Due to a race between drm_sched_cleanup_jobs in sched thread and
>drm_sched_job_timedout in timeout work there is a possibility that bad job
>was already freed while still being accessed from the timeout thread.
>
>Fix:
>Instead of just peeking at the bad job in the mirror list remove it from the 
>list
>under lock and then put it back later when we are guaranteed no race with
>main sched thread is possible which is after the thread is parked.
>
>v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>
>v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>drm_sched_get_cleanup_job already has a lock there.
>
>v4: Fix comments to reflect latest code in drm-misc.
>
>Signed-off-by: Andrey Grodzovsky 
>Reviewed-by: Christian König 
>Tested-by: Emily Deng 
>---
> drivers/gpu/drm/scheduler/sched_main.c | 27
>+++
> 1 file changed, 27 insertions(+)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>b/drivers/gpu/drm/scheduler/sched_main.c
>index 6774955..1bf9c40 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>   unsigned long flags;
>
>   sched = container_of(work, struct drm_gpu_scheduler,
>work_tdr.work);
>+
>+  /* Protects against concurrent deletion in
>drm_sched_get_cleanup_job */
>+  spin_lock_irqsave(&sched->job_list_lock, flags);
>   job = list_first_entry_or_null(&sched->ring_mirror_list,
>  struct drm_sched_job, node);
>
>   if (job) {
>+  /*
>+   * Remove the bad job so it cannot be freed by concurrent
>+   * drm_sched_cleanup_jobs. It will be reinserted back after
>sched->thread
>+   * is parked at which point it's safe.
>+   */
>+  list_del_init(&job->node);
>+  spin_unlock_irqrestore(&sched->job_list_lock, flags);
>+
>   job->sched->ops->timedout_job(job);
>
>   /*
>@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>   job->sched->ops->free_job(job);
>   sched->free_guilty = false;
>   }
>+  } else {
>+  spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   }
>
>   spin_lock_irqsave(&sched->job_list_lock, flags); @@ -370,6 +383,20
>@@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>drm_sched_job *bad)
>   kthread_park(sched->thread);
>
>   /*
>+   * Reinsert back the bad job here - now it's safe as
>+   * drm_sched_get_cleanup_job cannot race against us and release the
>+   * bad job at this point - we parked (waited for) any in progress
>+   * (earlier) cleanups and drm_sched_get_cleanup_job will not be
>called
>+   * now until the scheduler thread is unparked.
>+   */
>+  if (bad && bad->sched == sched)
>+  /*
>+   * Add at the head of the queue to reflect it was the earliest
>+   * job extracted.
>+   */
>+  list_add(&bad->node, &sched->ring_mirror_list);
>+
>+  /*
>* Iterate the job list from later to earlier one and either deactivate
>* their HW callbacks or remove them from mirror list if they already
>* signaled.
>--
>2.7.4

RE: [PATCH v2] drm/scheduler: Avoid accessing freed bad job.

2019-11-19 Thread Deng, Emily
Tested-by: Emily Deng 

>-Original Message-
>From: Andrey Grodzovsky 
>Sent: Tuesday, November 19, 2019 1:52 AM
>Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>Christian ; Deng, Emily
>; Grodzovsky, Andrey
>
>Subject: [PATCH v2] drm/scheduler: Avoid accessing freed bad job.
>
>Problem:
>Due to a race between drm_sched_cleanup_jobs in sched thread and
>drm_sched_job_timedout in timeout work there is a possibility that bad job
>was already freed while still being accessed from the timeout thread.
>
>Fix:
>Instead of just peeking at the bad job in the mirror list remove it from the 
>list
>under lock and then put it back later when we are guaranteed no race with
>main sched thread is possible which is after the thread is parked.
>
>v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>
>Signed-off-by: Andrey Grodzovsky 
>---
> drivers/gpu/drm/scheduler/sched_main.c | 44
>+-
> 1 file changed, 38 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>b/drivers/gpu/drm/scheduler/sched_main.c
>index 80ddbdf..b05b210 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -287,10 +287,24 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>   unsigned long flags;
>
>   sched = container_of(work, struct drm_gpu_scheduler,
>work_tdr.work);
>+
>+  /*
>+   * Protects against concurrent deletion in drm_sched_cleanup_jobs
>that
>+   * is already in progress.
>+   */
>+  spin_lock_irqsave(&sched->job_list_lock, flags);
>   job = list_first_entry_or_null(&sched->ring_mirror_list,
>  struct drm_sched_job, node);
>
>   if (job) {
>+  /*
>+   * Remove the bad job so it cannot be freed by already in
>progress
>+   * drm_sched_cleanup_jobs. It will be reinserted back after
>sched->thread
>+   * is parked at which point it's safe.
>+   */
>+  list_del_init(&job->node);
>+  spin_unlock_irqrestore(&sched->job_list_lock, flags);
>+
>   job->sched->ops->timedout_job(job);
>
>   /*
>@@ -302,6 +316,8 @@ static void drm_sched_job_timedout(struct
>work_struct *work)
>   sched->free_guilty = false;
>   }
>   }
>+  else
>+  spin_unlock_irqrestore(&sched->job_list_lock, flags);
>
>   spin_lock_irqsave(&sched->job_list_lock, flags);
>   drm_sched_start_timeout(sched);
>@@ -373,6 +389,19 @@ void drm_sched_stop(struct drm_gpu_scheduler
>*sched, struct drm_sched_job *bad)
>   kthread_park(sched->thread);
>
>   /*
>+   * Reinsert back the bad job here - now it's safe as
>drm_sched_cleanup_jobs
>+   * cannot race against us and release the bad job at this point - we
>parked
>+   * (waited for) any in progress (earlier) cleanups and any later ones 
>will
>+   * bail out due to sched->thread being parked.
>+   */
>+  if (bad && bad->sched == sched)
>+  /*
>+   * Add at the head of the queue to reflect it was the earliest
>+   * job extracted.
>+   */
>+  list_add(&bad->node, &sched->ring_mirror_list);
>+
>+  /*
>* Iterate the job list from later to earlier one and either deactivate
>* their HW callbacks or remove them from mirror list if they already
>* signaled.
>@@ -656,16 +685,19 @@ static void drm_sched_cleanup_jobs(struct
>drm_gpu_scheduler *sched)
>   __kthread_should_park(sched->thread))
>   return;
>
>-
>-  while (!list_empty(>ring_mirror_list)) {
>+  /* See drm_sched_job_timedout for why the locking is here */
>+  while (true) {
>   struct drm_sched_job *job;
>
>-  job = list_first_entry(&sched->ring_mirror_list,
>- struct drm_sched_job, node);
>-  if (!dma_fence_is_signaled(&job->s_fence->finished))
>+  spin_lock_irqsave(&sched->job_list_lock, flags);
>+  job = list_first_entry_or_null(&sched->ring_mirror_list,
>+ struct drm_sched_job, node);
>+
>+  if (!job || !dma_fence_is_signaled(&job->s_fence->finished)) {
>+  spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   break;
>+  }
>
>-  spin_lock_irqsave(&sched->job_list_lock, flags);
>   /* remove job from ring_mirror_list */
>   list_del_init(&job->node);
>   spin_unlock_irqrestore(&sched->job_list_lock, flags);
>--
>2.7.4


RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-14 Thread Deng, Emily
Hi Andrey,
 Currently, I am busying with another issue, maybe will try next week.

Best wishes
Emily Deng



>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Friday, November 15, 2019 6:14 AM
>To: Koenig, Christian ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Attached.
>
>Emily - can you give it a try ?
>
>Andrey
>
>On 11/14/19 3:12 AM, Christian König wrote:
>>> What about instead of peeking at the job to actually remove it from
>>> ring_mirror_list right there,
>> Also an interesting idea. We would need to protect the mirror list
>> with a lock again, but that should be the lesser evil.
>>
>> Maybe prototype that and see if it works or not.
>>
>> Regards,
>> Christian.
>>
>> Am 13.11.19 um 17:00 schrieb Andrey Grodzovsky:
>>>
>>>
>>> On 11/13/19 9:20 AM, Christian König wrote:
>>>> Another more fundamental question: Could we get rid of the timeout
>>>> job at all?
>>>
>>>
>>> There are other stuff there besides picking the first unfinished job
>>> which is common for all the drivers - such as freeing guilty signaled
>>> job and rearming the timeout work timer.
>>>
>>>
>>>>
>>>> I mean we used to give this as parameter to the scheduler callback
>>>> because we had the timeout worker in the job, but that is no longer
>>>> the case.
>>>>
>>>> E.g. in drm_sched_job_timedout() we do the following:
>>>>>     job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>    struct drm_sched_job, node);
>>>>
>>>> Why don't we just remove that here and only get the first job after
>>>> we have stopped the scheduler?
>>>
>>>
>>> Should be ok since we have the extra check for __kthread_should_park
>>> in drm_sched_cleanup_jobs which will protect us in this case from a
>>> wakeup of sched thread and execution of in drm_sched_cleanup_jobs
>>> after we already parked it. The problem here is we need the
>>> drm_sched_job to access the private data for each client driver (see
>>> amdgpu_job_timedout for example). What about instead of peeking at
>>> the job to actually remove it from ring_mirror_list right there, go
>>> ahead with it through the reset routine, if it's signaled in the
>>> meanwhile that great - release it, otherwise put it back into
>>> ring_mirror_list in drm_sched_resubmit_jobs.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 13.11.19 um 15:12 schrieb Andrey Grodzovsky:
>>>>>
>>>>> This is why I asked for a trace with timer enabled, but since there is
>>>>> a finite number of places we touch the timer Emily can just put
>>>>> prints there. Also, I wonder if this temp fix helps her with the
>>>>> issue or not.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 11/13/19 2:36 AM, Christian König wrote:
>>>>>> The question is where do we rearm the timer for this problem to
>>>>>> occur?
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 12.11.19 um 20:21 schrieb Andrey Grodzovsky:
>>>>>>>
>>>>>>> I was able to reproduce the crash by using the attached
>>>>>>> simulate_crash.patch - waiting on guilty job to signal in reset
>>>>>>> work and artificially rearming the timeout timer just before the
>>>>>>> check for !cancel_delayed_work(&sched->work_tdr) in
>>>>>>> drm_sched_cleanup_jobs - crash log attached in crash.log. This I
>>>>>>> think confirms my theory i described earlier in this thread.
>>>>>>>
>>>>>>> basic_fix.patch handles this by testing whether another timer
>>>>>>> already armed ob this scheduler or is there a timeout work in
>>>>>>> execution right now (see documentation for work_busy) - obviously
>>>>>>> this is not a full solution as this will not protect from races
>>>>>>> if for example there is immediate work scheduling such as in
>>>>>>> drm_sched_fault -  so we probably need to account for this by
>>>>>>> making drm_sched_cleanup

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-11 Thread Deng, Emily
Hi Christian,
I added the following print in function drm_sched_cleanup_jobs. From the log it 
shows that using cancel_delayed_work alone could not avoid freeing the job while 
the sched is in reset. But I don't know exactly where the driver goes wrong. 
Do you have any suggestion about this?

+   printk("Emily:drm_sched_cleanup_jobs:begin,tid:%lu, pid:%lu\n", 
current->tgid, current->pid);

/*
 * Don't destroy jobs while the timeout worker is running  OR thread
 * is being parked and hence assumed to not touch ring_mirror_list
 */
 if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
!cancel_delayed_work(&sched->work_tdr)))
return;
+   printk("Emily:drm_sched_cleanup_jobs,tid:%lu, pid:%lu\n", 
current->tgid, current->pid);


Best wishes
Emily Deng

Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11380.695091] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11380.695104] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11380.695105] 
Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11380.695107] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11380.695107] 
Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.222954] 
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled 
seq=78585, emitted seq=78587
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.224275] 
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 
thread  pid 0, s_job:fe75ab36,tid=15603, pid=15603
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225413] 
amdgpu :00:08.0: GPU reset begin!
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225417] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225425] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225425] 
Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225428] 
Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread  pid 0, 
s_job:fe75ab36, tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225429] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225430] 
Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225473] 
Emily:drm_sched_cleanup_jobs:begin,tid:2253, pid:2253
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225486] 
Emily:drm_sched_cleanup_jobs:begin,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225489] 
Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel: [11381.225494] 
Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread  pid 0, 
s_job:f086ec84, tid:2262, pid:2262
>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Tuesday, November 12, 2019 11:28 AM
>To: Koenig, Christian ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Thinking more about this claim - we assume here that if cancel_delayed_work
>returned true it guarantees that timeout work is not running but, it merely
>means there was a pending timeout work which was removed from the
>workqueue before its timer elapsed and so it didn't have a chance to be
>dequeued and executed, it doesn't cover already executing work. So there is a
>possibility where while timeout work started executing another timeout work
>already got enqueued (maybe through earlier cleanup jobs or through
>drm_sched_fault) and if at this point another drm_sched_cleanup_jobs runs
>cancel_delayed_work(&sched->work_tdr) will return true even while there is a
>timeout job in progress.
>Unfortunately we cannot change cancel_delayed_work to
>cancel_delayed_work_sync to flush the timeout work as timeout work itself
>waits for schedule thread  to be parked again when calling park_thread.
>
>Andrey
>
>________
>From: amd-gfx  on behalf of
>Koenig, Christian 
>Sent: 08 November 2019 05:35:18
>To: Deng, Emily; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Hi Emily,
>
>exactly that can't happen. See here:

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-11 Thread Deng, Emily
Hi Andrey,
 On my side, it doesn't need a specific scenario; I only run the quark test 
with a slow job. Then sometimes it will have a fake hang and the hardware fence 
will come back. In this case, it will randomly hit the NULL pointer issue in 
amdgpu_device_gpu_recover.

>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Tuesday, November 12, 2019 5:35 AM
>To: Deng, Emily ; Koenig, Christian
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Emily - is there a particular scenario to reproduce this ? I am trying with 
>libdrm
>deadlock test and artificially delaying the GPU reset logic until after the 
>guilty
>job is signaling but indeed nothing bad happens as drm_sched_cleanup_jobs
>returns early because there is a reset in progress and so the bad job is not
>getting released while GPU reset is running.
>
>Can you provide event tracing for timer, dma_fence and gpu_scheduler for
>when the problem happens ?
>
>Andrey
>
>On 11/11/19 4:05 AM, Deng, Emily wrote:
>> Hi Christian and Andrey,
>>   The issue I encountered is that the bad job is being freed after entering
>amdgpu_device_gpu_recover. I don't know why; as per Christian said, it will
>call cancel_delayed_work in drm_sched_cleanup_jobs.
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Deng, Emily
>>> Sent: Monday, November 11, 2019 3:19 PM
>>> To: Grodzovsky, Andrey ; Koenig,
>Christian
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>> Hi Andrey,
>>> I don't think your patch will help for this. As it may call
>>> kthread_should_park in drm_sched_cleanup_jobs first, and then call
>>> kcl_kthread_park. And then it still has a race between the 2 threads.
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -Original Message-
>>>> From: Grodzovsky, Andrey 
>>>> Sent: Saturday, November 9, 2019 3:01 AM
>>>> To: Koenig, Christian ; Deng, Emily
>>>> ; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>>
>>>> On 11/8/19 5:35 AM, Koenig, Christian wrote:
>>>>> Hi Emily,
>>>>>
>>>>> exactly that can't happen. See here:
>>>>>
>>>>>>       /* Don't destroy jobs while the timeout worker is
>>>>>> running */
>>>>>>       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>       !cancel_delayed_work(&sched->work_tdr))
>>>>>>       return NULL;
>>>>> We never free jobs while the timeout worker is running to prevent
>>>>> exactly that issue.
>>>>
>>>> I don't think this protects us if drm_sched_cleanup_jobs is called
>>>> for scheduler which didn't experience a timeout, in
>>>> amdgpu_device_gpu_recover we access
>>>> sched->ring_mirror_list for all the schedulers on a device so this
>>>> condition
>>>> above won't protect us. What in fact could help maybe is my recent
>>>> patch
>>>> 541c521 drm/sched: Avoid job cleanup if sched thread is parked.
>>>> because we do park each of the scheduler threads during tdr job
>>>> before trying to access
>>>> sched->ring_mirror_list.
>>>>
>>>> Emily - did you see this problem with that patch in place ? I only
>>>> pushed it yesterday.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 08.11.19 um 11:32 schrieb Deng, Emily:
>>>>>> Hi Christian,
>>>>>> The drm_sched_job_timedout-> amdgpu_job_timedout calls
>>>> amdgpu_device_gpu_recover. I mean the main scheduler frees the jobs
>>>> while in amdgpu_device_gpu_recover, and before calling
>drm_sched_stop.
>>>>>> Best wishes
>>>>>> Emily Deng
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -Original Message-
>>>>>>> From: Koenig, Christian 
>>>>>>> Sent: Friday, November 8, 2019 6:26 PM
>>>>>>> To: Deng, Emily ; amd-
>>> g...@lists.freedesktop.org
>>>>>

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-11 Thread Deng, Emily
Hi Christian and Andrey,
 The issue I encountered is that the bad job is being freed after entering 
amdgpu_device_gpu_recover. I don't know why; as per Christian said, it will call 
cancel_delayed_work in drm_sched_cleanup_jobs.

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Monday, November 11, 2019 3:19 PM
>To: Grodzovsky, Andrey ; Koenig, Christian
>; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Hi Andrey,
>I don't think your patch will help for this. As it may call
>kthread_should_park in drm_sched_cleanup_jobs first, and then call
>kcl_kthread_park. And then it still has a race between the 2 threads.
>
>Best wishes
>Emily Deng
>
>
>
>>-Original Message-
>>From: Grodzovsky, Andrey 
>>Sent: Saturday, November 9, 2019 3:01 AM
>>To: Koenig, Christian ; Deng, Emily
>>; amd-gfx@lists.freedesktop.org
>>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>>
>>On 11/8/19 5:35 AM, Koenig, Christian wrote:
>>> Hi Emily,
>>>
>>> exactly that can't happen. See here:
>>>
>>>>      /* Don't destroy jobs while the timeout worker is running
>>>> */
>>>>      if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>      !cancel_delayed_work(&sched->work_tdr))
>>>>      return NULL;
>>> We never free jobs while the timeout worker is running to prevent
>>> exactly that issue.
>>
>>
>>I don't think this protects us if drm_sched_cleanup_jobs is called for
>>scheduler which didn't experience a timeout, in
>>amdgpu_device_gpu_recover we access
>>sched->ring_mirror_list for all the schedulers on a device so this
>>condition
>>above won't protect us. What in fact could help maybe is my recent
>>patch
>>541c521 drm/sched: Avoid job cleanup if sched thread is parked. because
>>we do park each of the scheduler threads during tdr job before trying
>>to access
>>sched->ring_mirror_list.
>>
>>Emily - did you see this problem with that patch in place ? I only
>>pushed it yesterday.
>>
>>Andrey
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 08.11.19 um 11:32 schrieb Deng, Emily:
>>>> Hi Christian,
>>>>The drm_sched_job_timedout-> amdgpu_job_timedout calls
>>amdgpu_device_gpu_recover. I mean the main scheduler frees the jobs
>>while in amdgpu_device_gpu_recover, and before calling drm_sched_stop.
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Koenig, Christian 
>>>>> Sent: Friday, November 8, 2019 6:26 PM
>>>>> To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>>
>>>>> Hi Emily,
>>>>>
>>>>> well who is calling amdgpu_device_gpu_recover() in this case?
>>>>>
>>>>> When it's not the scheduler we shouldn't have a guilty job in the first
>place.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 08.11.19 um 11:22 schrieb Deng, Emily:
>>>>>> Hi Christian,
>>>>>> No, I am on the new branch and also have the patch. Even
>>>>>> if they are freed by the
>>>>> main scheduler, how could we avoid the main scheduler freeing jobs
>>>>> while entering the function amdgpu_device_gpu_recover?
>>>>>> Best wishes
>>>>>> Emily Deng
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -Original Message-
>>>>>>> From: Koenig, Christian 
>>>>>>> Sent: Friday, November 8, 2019 6:15 PM
>>>>>>> To: Deng, Emily ;
>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for
>>>>>>> tdr
>>>>>>>
>>>>>>> Hi Emily,
>>>>>>>
>>>>>>> in this case you are on an old code branch.
>>>>>>>
>>>>>>> Jobs are freed now by the main scheduler thread and only if no
>>>>>>> timeout handler is running.
>>>>>>

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-10 Thread Deng, Emily
Hi Andrey,
I don't think your patch will help for this. As it may call 
kthread_should_park in drm_sched_cleanup_jobs first, and then call 
kcl_kthread_park. And then it still has a race between the 2 threads.

Best wishes
Emily Deng



>-Original Message-
>From: Grodzovsky, Andrey 
>Sent: Saturday, November 9, 2019 3:01 AM
>To: Koenig, Christian ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>
>On 11/8/19 5:35 AM, Koenig, Christian wrote:
>> Hi Emily,
>>
>> exactly that can't happen. See here:
>>
>>>      /* Don't destroy jobs while the timeout worker is running */
>>>      if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>      !cancel_delayed_work(&sched->work_tdr))
>>>      return NULL;
>> We never free jobs while the timeout worker is running to prevent
>> exactly that issue.
>
>
>I don't think this protects us if drm_sched_cleanup_jobs is called for a
>scheduler which didn't experience a timeout: in amdgpu_device_gpu_recover we
>access sched->ring_mirror_list for all the schedulers on a device, so the
>condition above won't protect us. What might in fact help is my recent patch
>541c521 ("drm/sched: Avoid job cleanup if sched thread is parked"), because we
>do park each of the scheduler threads during the TDR job before trying to
>access sched->ring_mirror_list.
>
>Emily - did you see this problem with that patch in place ? I only pushed it
>yesterday.
>
>Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>> Am 08.11.19 um 11:32 schrieb Deng, Emily:
>>> Hi Christian,
>>>The drm_sched_job_timedout-> amdgpu_job_timedout call
>amdgpu_device_gpu_recover. I mean the main scheduler free the jobs while
>in amdgpu_device_gpu_recover, and before calling drm_sched_stop.
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -Original Message-
>>>> From: Koenig, Christian 
>>>> Sent: Friday, November 8, 2019 6:26 PM
>>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>> Hi Emily,
>>>>
>>>> well who is calling amdgpu_device_gpu_recover() in this case?
>>>>
>>>> When it's not the scheduler we shouldn't have a guilty job in the first 
>>>> place.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 08.11.19 um 11:22 schrieb Deng, Emily:
>>>>> Hi Chrisitan,
>>>>> No, I am with the new branch and also has the patch. Even
>>>>> it are freed by
>>>> main scheduler, how we could avoid main scheduler to free jobs while
>>>> enter to function amdgpu_device_gpu_recover?
>>>>> Best wishes
>>>>> Emily Deng
>>>>>
>>>>>
>>>>>
>>>>>> -Original Message-
>>>>>> From: Koenig, Christian 
>>>>>> Sent: Friday, November 8, 2019 6:15 PM
>>>>>> To: Deng, Emily ;
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for
>>>>>> tdr
>>>>>>
>>>>>> Hi Emily,
>>>>>>
>>>>>> in this case you are on an old code branch.
>>>>>>
>>>>>> Jobs are freed now by the main scheduler thread and only if no
>>>>>> timeout handler is running.
>>>>>>
>>>>>> See this patch here:
>>>>>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>>>>>> Author: Christian König 
>>>>>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>>>>>
>>>>>>>    drm/scheduler: rework job destruction
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>>>>>> Hi Christian,
>>>>>>>  Please refer to follow log, when it enter to
>>>>>>> amdgpu_device_gpu_recover
>>>>>> function, the bad job 5086879e is freeing in function
>>>>>> amdgpu_job_free_cb  at the same time, because of the hardware
>>>>>> fence
>>>> signal.
>>>>>> But amdgpu_device_gpu_recover goe

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Hi Christian,
 Sorry, it seems I understood wrong. From the prints, the thread that frees the 
job is the same as the job-timeout thread, so there seems to be some issue in 
function amdgpu_device_gpu_recover.


Best wishes
Emily Deng



>-Original Message-
>From: Koenig, Christian 
>Sent: Friday, November 8, 2019 6:35 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Hi Emily,
>
>exactly that can't happen. See here:
>
>>     /* Don't destroy jobs while the timeout worker is running */
>>     if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>     !cancel_delayed_work(&sched->work_tdr))
>>     return NULL;
>
>We never free jobs while the timeout working is running to prevent exactly
>that issue.
>
>Regards,
>Christian.
>
>Am 08.11.19 um 11:32 schrieb Deng, Emily:
>> Hi Christian,
>>   The drm_sched_job_timedout-> amdgpu_job_timedout call
>amdgpu_device_gpu_recover. I mean the main scheduler free the jobs while
>in amdgpu_device_gpu_recover, and before calling drm_sched_stop.
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Friday, November 8, 2019 6:26 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>> Hi Emily,
>>>
>>> well who is calling amdgpu_device_gpu_recover() in this case?
>>>
>>> When it's not the scheduler we shouldn't have a guilty job in the first 
>>> place.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 08.11.19 um 11:22 schrieb Deng, Emily:
>>>> Hi Chrisitan,
>>>>No, I am with the new branch and also has the patch. Even it
>>>> are freed by
>>> main scheduler, how we could avoid main scheduler to free jobs while
>>> enter to function amdgpu_device_gpu_recover?
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Koenig, Christian 
>>>>> Sent: Friday, November 8, 2019 6:15 PM
>>>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>>
>>>>> Hi Emily,
>>>>>
>>>>> in this case you are on an old code branch.
>>>>>
>>>>> Jobs are freed now by the main scheduler thread and only if no
>>>>> timeout handler is running.
>>>>>
>>>>> See this patch here:
>>>>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>>>>> Author: Christian König 
>>>>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>>>>
>>>>>>       drm/scheduler: rework job destruction
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>>>>> Hi Christian,
>>>>>> Please refer to follow log, when it enter to
>>>>>> amdgpu_device_gpu_recover
>>>>> function, the bad job 5086879e is freeing in function
>>>>> amdgpu_job_free_cb  at the same time, because of the hardware fence
>>> signal.
>>>>> But amdgpu_device_gpu_recover goes faster, at this case, the
>>>>> s_fence is already freed, but job is not freed in time. Then this issue
>occurs.
>>>>>> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
>>> sdma0
>>>>>> timeout, signaled seq=2481, emitted seq=2483 [  449.793202]
>>>>>> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
>>>>> process  pid 0 thread  pid 0, s_job:5086879e [  449.794163]
>>>>> amdgpu
>>>>> :00:08.0: GPU reset begin!
>>>>>> [  449.794175] Emily:amdgpu_job_free_cb,Process information:
>>>>>> process pid 0 thread  pid 0, s_job:5086879e [  449.794221]
>>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0
>>>>>> thread pid 0, s_job:66eb74ab [  449.794222]
>>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0
>>>>>> thread pid 0, s_job:d4438ad9 [  449.794255]
>>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0
>&

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Hi Christian,
 The drm_sched_job_timedout -> amdgpu_job_timedout path calls 
amdgpu_device_gpu_recover. I mean the main scheduler frees the jobs while we are 
in amdgpu_device_gpu_recover, before drm_sched_stop is called. 

Best wishes
Emily Deng



>-Original Message-
>From: Koenig, Christian 
>Sent: Friday, November 8, 2019 6:26 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Hi Emily,
>
>well who is calling amdgpu_device_gpu_recover() in this case?
>
>When it's not the scheduler we shouldn't have a guilty job in the first place.
>
>Regards,
>Christian.
>
>Am 08.11.19 um 11:22 schrieb Deng, Emily:
>> Hi Chrisitan,
>>   No, I am with the new branch and also has the patch. Even it are freed 
>> by
>main scheduler, how we could avoid main scheduler to free jobs while enter
>to function amdgpu_device_gpu_recover?
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Friday, November 8, 2019 6:15 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>> Hi Emily,
>>>
>>> in this case you are on an old code branch.
>>>
>>> Jobs are freed now by the main scheduler thread and only if no
>>> timeout handler is running.
>>>
>>> See this patch here:
>>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>>> Author: Christian König 
>>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>>
>>>>      drm/scheduler: rework job destruction
>>> Regards,
>>> Christian.
>>>
>>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>>> Hi Christian,
>>>>Please refer to follow log, when it enter to
>>>> amdgpu_device_gpu_recover
>>> function, the bad job 5086879e is freeing in function
>>> amdgpu_job_free_cb  at the same time, because of the hardware fence
>signal.
>>> But amdgpu_device_gpu_recover goes faster, at this case, the s_fence
>>> is already freed, but job is not freed in time. Then this issue occurs.
>>>> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
>sdma0
>>>> timeout, signaled seq=2481, emitted seq=2483 [  449.793202]
>>>> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
>>> process  pid 0 thread  pid 0, s_job:5086879e [  449.794163]
>>> amdgpu
>>> :00:08.0: GPU reset begin!
>>>> [  449.794175] Emily:amdgpu_job_free_cb,Process information: process
>>>> pid 0 thread  pid 0, s_job:5086879e [  449.794221]
>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>> pid 0, s_job:66eb74ab [  449.794222]
>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>> pid 0, s_job:d4438ad9 [  449.794255]
>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>> pid 0, s_job:b6d69c65 [  449.794257]
>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>> pid 0,
>>> s_job:ea85e922 [  449.794287]
>>> Emily:amdgpu_job_free_cb,Process
>>> information: process  pid 0 thread  pid 0, s_job:ed3a5ac6 [
>>> 449.794366] BUG: unable to handle kernel NULL pointer dereference at
>>> 00c0 [  449.800818] PGD 0 P4D 0 [  449.801040] Oops: 
>>> [#1] SMP PTI
>>>> [  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE
>>> 4.18.0-15-generic #16~18.04.1-Ubuntu
>>>> [  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>>> 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [  449.802944]
>>>> Workqueue: events drm_sched_job_timedout [amd_sched] [  449.803488]
>RIP:
>>> 0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
>>>> [  449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85 56 ff ff ff
>>>> 45 85 e4 0f
>>> 85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48 8b 40 10 <48> 8b
>98
>>> c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43 48 a8 01
>>>> [  449.805593] RSP: 0018:b4c7c08f7d68 EFLAGS: 00010286 [
>>>> 449.806032] RAX:  RBX:  RCX:
>>>>  [  449.806625] RDX: b4c7c08f5ac0 RSI:
>>>> 000fffe0 RDI: 0246 [  449.807224] RBP:
>>>> b4c7c0

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Hi Chrisitan,
 No, I am on the new branch and it also has this patch. Even if jobs are freed 
by the main scheduler, how can we keep the main scheduler from freeing jobs 
while we are inside amdgpu_device_gpu_recover?

Best wishes
Emily Deng

  

>-Original Message-
>From: Koenig, Christian 
>Sent: Friday, November 8, 2019 6:15 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Hi Emily,
>
>in this case you are on an old code branch.
>
>Jobs are freed now by the main scheduler thread and only if no timeout
>handler is running.
>
>See this patch here:
>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>> Author: Christian König 
>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>
>>     drm/scheduler: rework job destruction
>
>Regards,
>Christian.
>
>Am 08.11.19 um 11:11 schrieb Deng, Emily:
>> Hi Christian,
>>   Please refer to follow log, when it enter to amdgpu_device_gpu_recover
>function, the bad job 5086879e is freeing in function
>amdgpu_job_free_cb  at the same time, because of the hardware fence signal.
>But amdgpu_device_gpu_recover goes faster, at this case, the s_fence is
>already freed, but job is not freed in time. Then this issue occurs.
>>
>> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
>> timeout, signaled seq=2481, emitted seq=2483 [  449.793202]
>> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
>process  pid 0 thread  pid 0, s_job:5086879e [  449.794163] amdgpu
>:00:08.0: GPU reset begin!
>> [  449.794175] Emily:amdgpu_job_free_cb,Process information: process
>> pid 0 thread  pid 0, s_job:5086879e [  449.794221]
>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>> pid 0, s_job:66eb74ab [  449.794222]
>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>> pid 0, s_job:d4438ad9 [  449.794255]
>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>> pid 0, s_job:b6d69c65 [  449.794257]
>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread  pid 0,
>s_job:ea85e922 [  449.794287] Emily:amdgpu_job_free_cb,Process
>information: process  pid 0 thread  pid 0, s_job:ed3a5ac6
>[  449.794366] BUG: unable to handle kernel NULL pointer dereference at
>00c0 [  449.800818] PGD 0 P4D 0 [  449.801040] Oops: 
>[#1] SMP PTI
>> [  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE
>4.18.0-15-generic #16~18.04.1-Ubuntu
>> [  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [  449.802944] Workqueue: events
>> drm_sched_job_timedout [amd_sched] [  449.803488] RIP:
>0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
>> [  449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85 56 ff ff ff 45 85 
>> e4 0f
>85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48 8b 40 10 <48> 8b 98
>c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43 48 a8 01
>> [  449.805593] RSP: 0018:b4c7c08f7d68 EFLAGS: 00010286 [
>> 449.806032] RAX:  RBX:  RCX:
>>  [  449.806625] RDX: b4c7c08f5ac0 RSI:
>> 000fffe0 RDI: 0246 [  449.807224] RBP:
>> b4c7c08f7de0 R08: 0068b9d54000 R09:  [
>> 449.807818] R10:  R11: 0148 R12:
>>  [  449.808411] R13: b4c7c08f7da0 R14:
>> 8d82b8525d40 R15: 8d82b8525d40 [  449.809004] FS:
>> () GS:8d82bfd8()
>> knlGS: [  449.809674] CS:  0010 DS:  ES:  CR0:
>> 80050033 [  449.810153] CR2: 00c0 CR3:
>> 3cc0a001 CR4: 003606e0 [  449.810747] DR0:
> DR1:  DR2: 
>[  449.811344] DR3:  DR6: fffe0ff0 DR7:
>0400 [  449.811937] Call Trace:
>> [  449.812206]  amdgpu_job_timedout+0x114/0x140 [amdgpu] [
>> 449.812635]  drm_sched_job_timedout+0x44/0x90 [amd_sched] [
>> 449.813139]  ? amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu] [
>> 449.813609]  ? drm_sched_job_timedout+0x44/0x90 [amd_sched] [
>> 449.814077]  process_one_work+0x1fd/0x3f0 [  449.814417]
>> worker_thread+0x34/0x410 [  449.814728]  kthread+0x121/0x140 [
>> 449.815004]  ? process_one_work+0x3f0/0x3f0 [  449.815374]  ?
>> kthread_create_worker_on_cpu+0x70/0x70
>> [  449.815799]  ret_from_fork+0x35/0x40
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sen

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Hi Christian,
 Please refer to the following log: when it enters amdgpu_device_gpu_recover, 
the bad job 5086879e is being freed in amdgpu_job_free_cb at the same time, 
because the hardware fence signaled. But amdgpu_device_gpu_recover goes faster; 
in this case the s_fence is already freed while the job itself is not freed in 
time, and then this issue occurs.

[  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, 
signaled seq=2481, emitted seq=2483
[  449.793202] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: 
process  pid 0 thread  pid 0, s_job:5086879e
[  449.794163] amdgpu :00:08.0: GPU reset begin!
[  449.794175] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:5086879e
[  449.794221] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:66eb74ab
[  449.794222] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:d4438ad9
[  449.794255] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:b6d69c65
[  449.794257] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:ea85e922
[  449.794287] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
thread  pid 0, s_job:ed3a5ac6
[  449.794366] BUG: unable to handle kernel NULL pointer dereference at 
00c0
[  449.800818] PGD 0 P4D 0
[  449.801040] Oops:  [#1] SMP PTI
[  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE 
4.18.0-15-generic #16~18.04.1-Ubuntu
[  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  449.802944] Workqueue: events drm_sched_job_timedout [amd_sched]
[  449.803488] RIP: 0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
[  449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85 56 ff ff ff 45 85 e4 
0f 85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48 8b 40 10 <48> 8b 98 
c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43 48 a8 01
[  449.805593] RSP: 0018:b4c7c08f7d68 EFLAGS: 00010286
[  449.806032] RAX:  RBX:  RCX: 
[  449.806625] RDX: b4c7c08f5ac0 RSI: 000fffe0 RDI: 0246
[  449.807224] RBP: b4c7c08f7de0 R08: 0068b9d54000 R09: 
[  449.807818] R10:  R11: 0148 R12: 
[  449.808411] R13: b4c7c08f7da0 R14: 8d82b8525d40 R15: 8d82b8525d40
[  449.809004] FS:  () GS:8d82bfd8() 
knlGS:
[  449.809674] CS:  0010 DS:  ES:  CR0: 80050033
[  449.810153] CR2: 00c0 CR3: 3cc0a001 CR4: 003606e0
[  449.810747] DR0:  DR1:  DR2: 
[  449.811344] DR3:  DR6: fffe0ff0 DR7: 0400
[  449.811937] Call Trace:
[  449.812206]  amdgpu_job_timedout+0x114/0x140 [amdgpu]
[  449.812635]  drm_sched_job_timedout+0x44/0x90 [amd_sched]
[  449.813139]  ? amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu]
[  449.813609]  ? drm_sched_job_timedout+0x44/0x90 [amd_sched]
[  449.814077]  process_one_work+0x1fd/0x3f0
[  449.814417]  worker_thread+0x34/0x410
[  449.814728]  kthread+0x121/0x140
[  449.815004]  ? process_one_work+0x3f0/0x3f0
[  449.815374]  ? kthread_create_worker_on_cpu+0x70/0x70
[  449.815799]  ret_from_fork+0x35/0x40

>-Original Message-
>From: Koenig, Christian 
>Sent: Friday, November 8, 2019 5:43 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Am 08.11.19 um 10:39 schrieb Deng, Emily:
>> Sorry, please take your time.
>
>Have you seen my other response a bit below?
>
>I can't follow how it would be possible for job->s_fence to be NULL without
>the job also being freed.
>
>So it looks like this patch is just papering over some bigger issues.
>
>Regards,
>Christian.
>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Friday, November 8, 2019 5:08 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>> Am 08.11.19 um 09:52 schrieb Deng, Emily:
>>>> Ping.
>>> You need to give me at least enough time to wake up :)
>>>
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Deng, Emily
>>>>> Sent: Friday, November 8, 2019 10:56 AM
>>>>> To: Koeni

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Sorry, please take your time.

Best wishes
Emily Deng



>-Original Message-
>From: Koenig, Christian 
>Sent: Friday, November 8, 2019 5:08 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Am 08.11.19 um 09:52 schrieb Deng, Emily:
>> Ping.
>
>You need to give me at least enough time to wake up :)
>
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Deng, Emily
>>> Sent: Friday, November 8, 2019 10:56 AM
>>> To: Koenig, Christian ; amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>>> -Original Message-
>>>> From: Christian König 
>>>> Sent: Thursday, November 7, 2019 7:28 PM
>>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>> Am 07.11.19 um 11:25 schrieb Emily Deng:
>>>>> When the job is already signaled, the s_fence is freed. Then it
>>>>> will has null pointer in amdgpu_device_gpu_recover.
>>>> NAK, the s_fence is only set to NULL when the job is destroyed. See
>>>> drm_sched_job_cleanup().
>>> I know it is set to NULL in drm_sched_job_cleanup. But in one case,
>>> when it enter into the amdgpu_device_gpu_recover, it already in
>>> drm_sched_job_cleanup, and at this time, it will go to free job. But
>>> the amdgpu_device_gpu_recover sometimes is faster. At that time, job
>>> is not freed, but s_fence is already NULL.
>
>No, that case can't happen. See here:
>
>>     drm_sched_job_cleanup(s_job);
>>
>>     amdgpu_ring_priority_put(ring, s_job->s_priority);
>>     dma_fence_put(job->fence);
>>     amdgpu_sync_free(&job->sync);
>>     amdgpu_sync_free(&job->sched_sync);
>>     kfree(job);
>
>The job itself is freed up directly after freeing the reference to the s_fence.
>
>So you are just papering over a much bigger problem here. This patch is a
>clear NAK.
>
>Regards,
>Christian.
>
>>>> When you see a job without an s_fence then that means the problem is
>>>> somewhere else.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Signed-off-by: Emily Deng 
>>>>> ---
>>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>>>>drivers/gpu/drm/scheduler/sched_main.c | 11 ++-
>>>>>2 files changed, 7 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index e6ce949..5a8f08e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct
>>>> amdgpu_device *adev,
>>>>>*
>>>>>* job->base holds a reference to parent fence
>>>>>*/
>>>>> - if (job && job->base.s_fence->parent &&
>>>>> + if (job && job->base.s_fence && job->base.s_fence->parent &&
>>>>>   dma_fence_is_signaled(job->base.s_fence->parent))
>>>>>   job_signaled = true;
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 31809ca..56cc10e 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -334,8 +334,8 @@ void drm_sched_increase_karma(struct
>>>> drm_sched_job
>>>>> *bad)
>>>>>
>>>>>  spin_lock(&rq->lock);
>>>>>  list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>>>>> -if (bad->s_fence->scheduled.context ==
>>>>> -entity->fence_context) {
>>>>> +if (bad->s_fence && (bad->s_fence->scheduled.cont

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Deng, Emily
Ping.


Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Friday, November 8, 2019 10:56 AM
>To: Koenig, Christian ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>>-Original Message-
>>From: Christian König 
>>Sent: Thursday, November 7, 2019 7:28 PM
>>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>>Am 07.11.19 um 11:25 schrieb Emily Deng:
>>> When the job is already signaled, the s_fence is freed. Then it will
>>> has null pointer in amdgpu_device_gpu_recover.
>>
>>NAK, the s_fence is only set to NULL when the job is destroyed. See
>>drm_sched_job_cleanup().
>I know it is set to NULL in drm_sched_job_cleanup. But in one case, when it
>enter into the amdgpu_device_gpu_recover, it already in
>drm_sched_job_cleanup, and at this time, it will go to free job. But the
>amdgpu_device_gpu_recover sometimes is faster. At that time, job is not
>freed, but s_fence is already NULL.
>>
>>When you see a job without an s_fence then that means the problem is
>>somewhere else.
>>
>>Regards,
>>Christian.
>>
>>>
>>> Signed-off-by: Emily Deng 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>>   drivers/gpu/drm/scheduler/sched_main.c | 11 ++-
>>>   2 files changed, 7 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index e6ce949..5a8f08e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct
>>amdgpu_device *adev,
>>>  *
>>>  * job->base holds a reference to parent fence
>>>  */
>>> -   if (job && job->base.s_fence->parent &&
>>> +   if (job && job->base.s_fence && job->base.s_fence->parent &&
>>> dma_fence_is_signaled(job->base.s_fence->parent))
>>> job_signaled = true;
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 31809ca..56cc10e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -334,8 +334,8 @@ void drm_sched_increase_karma(struct
>>drm_sched_job
>>> *bad)
>>>
>>> spin_lock(&rq->lock);
>>> list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>>> -   if (bad->s_fence->scheduled.context ==
>>> -   entity->fence_context) {
>>> +   if (bad->s_fence && (bad->s_fence->scheduled.context ==
>>> +   entity->fence_context)) {
>>> if (atomic_read(&bad->karma) >
>>> bad->sched->hang_limit)
>>> if (entity->guilty)
>>> @@ -376,7 +376,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>  * This iteration is thread safe as sched thread is stopped.
>>>  */
>>> list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>>> -   if (s_job->s_fence->parent &&
>>> +   if (s_job->s_fence && s_job->s_fence->parent &&
>>> dma_fence_remove_callback(s_job->s_fence->parent,
>>>   &s_job->cb)) {
>>> atomic_dec(&sched->hw_rq_count);
>>> @@ -395,7 +395,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>  *
>>>  * Job is still alive so fence refcount at least 1
>>>  */
>>> -   dma_fence_wait(&s_job->s_fence->finished, false);
>>> +   if (s_job->s_fence)
>>> +   dma_fence_wait(&s_job->s_fence->finished, false);
>>>
>>> /*
>>>  * We must keep bad job alive for later use during
>>> @@ -438,7 +439,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>>  * GPU recovers can't run in parallel.
>>>  */
>>> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>> -   struct dma_fence *fence = s_job->s_fence->parent;
>>> +   struct dma_fence *fence = s_job->s_fence ? s_job->s_fence->parent : NULL;
>>>
>>> atomic_inc(&sched->hw_rq_count);
>>>
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-07 Thread Deng, Emily
>-Original Message-
>From: Christian König 
>Sent: Thursday, November 7, 2019 7:28 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>Am 07.11.19 um 11:25 schrieb Emily Deng:
>> When the job is already signaled, the s_fence is freed. Then it will
>> has null pointer in amdgpu_device_gpu_recover.
>
>NAK, the s_fence is only set to NULL when the job is destroyed. See
>drm_sched_job_cleanup().
I know it is set to NULL in drm_sched_job_cleanup. But in one case, when it 
enters amdgpu_device_gpu_recover, it is already inside drm_sched_job_cleanup 
and about to free the job. Sometimes amdgpu_device_gpu_recover is faster: at 
that point the job is not freed yet, but s_fence is already NULL.
>
>When you see a job without an s_fence then that means the problem is
>somewhere else.
>
>Regards,
>Christian.
>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>   drivers/gpu/drm/scheduler/sched_main.c | 11 ++-
>>   2 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e6ce949..5a8f08e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct
>amdgpu_device *adev,
>>   *
>>   * job->base holds a reference to parent fence
>>   */
>> -if (job && job->base.s_fence->parent &&
>> +if (job && job->base.s_fence && job->base.s_fence->parent &&
>>  dma_fence_is_signaled(job->base.s_fence->parent))
>>  job_signaled = true;
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 31809ca..56cc10e 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -334,8 +334,8 @@ void drm_sched_increase_karma(struct
>drm_sched_job
>> *bad)
>>
>>  spin_lock(&rq->lock);
>>  list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>> -if (bad->s_fence->scheduled.context ==
>> -entity->fence_context) {
>> +if (bad->s_fence && (bad->s_fence->scheduled.context ==
>> +entity->fence_context)) {
>>  if (atomic_read(&bad->karma) >
>>  bad->sched->hang_limit)
>>  if (entity->guilty)
>> @@ -376,7 +376,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>   * This iteration is thread safe as sched thread is stopped.
>>   */
>>  list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>> -if (s_job->s_fence->parent &&
>> +if (s_job->s_fence && s_job->s_fence->parent &&
>>  dma_fence_remove_callback(s_job->s_fence->parent,
>>  &s_job->cb)) {
>>  atomic_dec(&sched->hw_rq_count);
>> @@ -395,7 +395,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>   *
>>   * Job is still alive so fence refcount at least 1
>>   */
>> -dma_fence_wait(&s_job->s_fence->finished, false);
>> +if (s_job->s_fence)
>> +dma_fence_wait(&s_job->s_fence->finished, false);
>>
>>  /*
>>   * We must keep bad job alive for later use during
>> @@ -438,7 +439,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>   * GPU recovers can't run in parallel.
>>   */
>>  list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>> -struct dma_fence *fence = s_job->s_fence->parent;
>> +struct dma_fence *fence = s_job->s_fence ? s_job->s_fence->parent : NULL;
>>
>>  atomic_inc(&sched->hw_rq_count);
>>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Need to disable msix when unloading driver

2019-11-06 Thread Deng, Emily
Hi Christian,
We use pci_alloc_irq_vectors() in amdgpu_irq_init(), and this patch uses 
pci_free_irq_vectors() in amdgpu_irq_fini().

Hi Alex,
Could you help to review this?

Best wishes
Emily Deng



>-Original Message-
>From: Christian König 
>Sent: Wednesday, November 6, 2019 5:32 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Need to disable msix when unloading
>driver
>
>Not an expert on the PCI IRQ stuff, but from what I know that looks correct to
>me.
>
>Only question I can see is why don't we use pci_alloc_irq_vectors()?
>Alex probably needs to take a look.
>
>Regards,
>Christian.
>
>Am 06.11.19 um 07:28 schrieb Deng, Emily:
>> Hi all,
>>  Please help to review this. This is to fix driver reload issue.
>>
>> Best wishes
>> Emily Deng
>>
>>
>>> -Original Message-----
>>> From: Emily Deng 
>>> Sent: Wednesday, November 6, 2019 2:24 PM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily 
>>> Subject: [PATCH] drm/amdgpu: Need to disable msix when unloading
>>> driver
>>>
>>> For driver reload test, it will report "can't enable MSI (MSI-X already
>enabled)".
>>>
>>> Signed-off-by: Emily Deng 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> index 6f3b03f..30d540d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> @@ -311,7 +311,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>> drm_irq_uninstall(adev->ddev);
>>> adev->irq.installed = false;
>>> if (adev->irq.msi_enabled)
>>> -   pci_disable_msi(adev->pdev);
>>> +   pci_free_irq_vectors(adev->pdev);
>>> if (!amdgpu_device_has_dc_support(adev))
>>> flush_work(>hotplug_work);
>>> }
>>> --
>>> 2.7.4
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Need to disable msix when unloading driver

2019-11-05 Thread Deng, Emily
Hi all,
Please help to review this. It fixes a driver reload issue.

Best wishes
Emily Deng


>-Original Message-
>From: Emily Deng 
>Sent: Wednesday, November 6, 2019 2:24 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Need to disable msix when unloading driver
>
>For driver reload test, it will report "can't enable MSI (MSI-X already 
>enabled)".
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index 6f3b03f..30d540d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -311,7 +311,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>   drm_irq_uninstall(adev->ddev);
>   adev->irq.installed = false;
>   if (adev->irq.msi_enabled)
>-  pci_disable_msi(adev->pdev);
>+  pci_free_irq_vectors(adev->pdev);
>   if (!amdgpu_device_has_dc_support(adev))
>   flush_work(>hotplug_work);
>   }
>--
>2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH v2] drm/amdgpu: Need to free discovery memory

2019-11-03 Thread Deng, Emily
Thanks, done.

Best wishes
Emily Deng



>-Original Message-
>From: Yuan, Xiaojie 
>Sent: Monday, November 4, 2019 11:41 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH v2] drm/amdgpu: Need to free discovery memory
>
>Please use the 'drm/amdgpu/discovery: ' prefix in the commit message to let us
>easily track all discovery-related changes.
>Other than this, patch is Reviewed-by: Xiaojie Yuan 
>
>BR,
>Xiaojie
>
>
>From: amd-gfx  on behalf of Emily
>Deng 
>Sent: Monday, November 4, 2019 11:03 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily
>Subject: [PATCH v2] drm/amdgpu: Need to free discovery memory
>
>When unloading driver, need to free discovery memory.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>index 28b09f6..7cbe6d9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>@@ -2106,9 +2106,6 @@ void amdgpu_ttm_late_init(struct amdgpu_device
>*adev)
>void *stolen_vga_buf;
>/* return the VGA stolen memory (if any) back to VRAM */
>amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL, &stolen_vga_buf);
>-
>-   /* return the IP Discovery TMR memory back to VRAM */
>-   amdgpu_bo_free_kernel(&adev->discovery_memory, NULL, NULL);
> }
>
> /**
>@@ -2121,7 +2118,10 @@ void amdgpu_ttm_fini(struct amdgpu_device
>*adev)
>
>amdgpu_ttm_debugfs_fini(adev);
>amdgpu_ttm_training_reserve_vram_fini(adev);
>+   /* return the IP Discovery TMR memory back to VRAM */
>+   amdgpu_bo_free_kernel(&adev->discovery_memory, NULL, NULL);
>amdgpu_ttm_fw_reserve_vram_fini(adev);
>+
>if (adev->mman.aper_base_kaddr)
>iounmap(adev->mman.aper_base_kaddr);
>adev->mman.aper_base_kaddr = NULL;
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH v2] drm/sched: Fix passing zero to 'PTR_ERR' warning

2019-11-03 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>Andrey Grodzovsky
>Sent: Wednesday, October 30, 2019 2:08 AM
>To: dan.carpen...@oracle.com
>Cc: Grodzovsky, Andrey ; amd-
>g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
>Subject: [PATCH v2] drm/sched: Fix passing zero to 'PTR_ERR' warning
>
>Fix a static code checker warning.
>
>v2: Drop PTR_ERR_OR_ZERO.
>
>Signed-off-by: Andrey Grodzovsky 
>---
> drivers/gpu/drm/scheduler/sched_main.c | 7 +--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>b/drivers/gpu/drm/scheduler/sched_main.c
>index f39b97e..dba4390 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -496,8 +496,10 @@ void drm_sched_resubmit_jobs(struct
>drm_gpu_scheduler *sched)
>   fence = sched->ops->run_job(s_job);
>
>   if (IS_ERR_OR_NULL(fence)) {
>+  if (IS_ERR(fence))
>+  dma_fence_set_error(&s_job->s_fence->finished, PTR_ERR(fence));
>+
>   s_job->s_fence->parent = NULL;
>-  dma_fence_set_error(&s_job->s_fence->finished, PTR_ERR(fence));
>   } else {
>   s_job->s_fence->parent = fence;
>   }
>@@ -741,8 +743,9 @@ static int drm_sched_main(void *param)
> r);
>   dma_fence_put(fence);
>   } else {
>+  if (IS_ERR(fence))
>+  dma_fence_set_error(&s_job->s_fence->finished, PTR_ERR(fence));
>
>-  dma_fence_set_error(&s_job->s_fence->finished, PTR_ERR(fence));
>   drm_sched_process_job(NULL, &s_job->cb);
>   }
>
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] SWDEV-206718 drm/amdgpu: Fix tdr3 could hang with slow compute issue

2019-10-11 Thread Deng, Emily
Ping

Best wishes
Emily Deng



>-Original Message-
>From: Emily Deng 
>Sent: Wednesday, October 9, 2019 6:52 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] SWDEV-206718 drm/amdgpu: Fix tdr3 could hang with slow
>compute issue
>
>When index is 1, need to set compute ring timeout for sriov and passthrough.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 6 --
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 53ce227..2f5a015 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2664,8 +2664,11 @@ static int
>amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>* There is only one value specified and
>* it should apply to all non-compute jobs.
>*/
>-  if (index == 1)
>+  if (index == 1) {
>   adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>+  if (amdgpu_sriov_vf(adev) ||
>amdgpu_passthrough(adev))
>+  adev->compute_timeout = adev->gfx_timeout;
>+  }
>   }
>
>   return ret;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>index a88ea74..311abc8 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>@@ -250,9 +250,11 @@ module_param_named(msi, amdgpu_msi, int, 0444);
>  * By default(with no lockup_timeout settings), the timeout for all non-
>compute(GFX, SDMA and Video)
>  * jobs is 10000. And there is no timeout enforced on compute jobs.
>  */
>-MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default:
>10000 for non-compute jobs and infinity timeout for compute jobs."
>+MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (default:
>for bare metal 10000 for non-compute jobs and infinity timeout for compute
>jobs; "
>+  "for passthrough or sriov, 10000 for all jobs."
>   " 0: keep default value. negative: infinity timeout), "
>-  "format is [Non-Compute] or [GFX,Compute,SDMA,Video]");
>+  "format: for bare metal [Non-Compute] or
>[GFX,Compute,SDMA,Video]; "
>+  "for passthrough or sriov [all jobs] or
>[GFX,Compute,SDMA,Video].");
> module_param_string(lockup_timeout, amdgpu_lockup_timeout,
>sizeof(amdgpu_lockup_timeout), 0444);
>
> /**
>--
>2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm7amdgpu: once more fix amdgpu_bo_create_kernel_at

2019-09-25 Thread Deng, Emily
Yes, I have already tested it.

Best wishes
Emily Deng



>-Original Message-
>From: Christian König 
>Sent: Wednesday, September 25, 2019 5:36 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm7amdgpu: once more fix
>amdgpu_bo_create_kernel_at
>
>Hi Emily,
>
>have you also tested this? I don't have the hardware to test it so that would
>be rather nice to have.
>
>Thanks,
>Christian.
>
>> On 25.09.19 at 11:31, Deng, Emily wrote:
>> Reviewed-by: Emily Deng 
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Tuesday, September 24, 2019 7:56 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: [PATCH] drm7amdgpu: once more fix
>amdgpu_bo_create_kernel_at
>>>
>>> When CPU access is needed we should tell that to
>>> amdgpu_bo_create_reserved() or otherwise the access is denied later on.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 ++---
>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 12d2adcdf14e..f10b6175e20c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -369,7 +369,7 @@ int amdgpu_bo_create_kernel_at(struct
>>> amdgpu_device *adev,
>>> size = ALIGN(size, PAGE_SIZE);
>>>
>>> r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE, domain,
>bo_ptr,
>>> - NULL, NULL);
>>> + NULL, cpu_addr);
>>> if (r)
>>> return r;
>>>
>>> @@ -377,12 +377,15 @@ int amdgpu_bo_create_kernel_at(struct
>>> amdgpu_device *adev,
>>>  * Remove the original mem node and create a new one at the
>request
>>>  * position.
>>>  */
>>> +   if (cpu_addr)
>>> +   amdgpu_bo_kunmap(*bo_ptr);
>>> +
>>> +   ttm_bo_mem_put(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.mem);
>>> +
>>> for (i = 0; i < (*bo_ptr)->placement.num_placement; ++i) {
>>> (*bo_ptr)->placements[i].fpfn = offset >> PAGE_SHIFT;
>>> (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
>>> }
>>> -
>>> -   ttm_bo_mem_put(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.mem);
>>> r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
>>>  &(*bo_ptr)->tbo.mem, &ctx);
>>> if (r)
>>> --
>>> 2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm7amdgpu: once more fix amdgpu_bo_create_kernel_at

2019-09-25 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: Christian König 
>Sent: Tuesday, September 24, 2019 7:56 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm7amdgpu: once more fix amdgpu_bo_create_kernel_at
>
>When CPU access is needed we should tell that to
>amdgpu_bo_create_reserved() or otherwise the access is denied later on.
>
>Signed-off-by: Christian König 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 ++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>index 12d2adcdf14e..f10b6175e20c 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>@@ -369,7 +369,7 @@ int amdgpu_bo_create_kernel_at(struct
>amdgpu_device *adev,
>   size = ALIGN(size, PAGE_SIZE);
>
>   r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE, domain,
>bo_ptr,
>-NULL, NULL);
>+NULL, cpu_addr);
>   if (r)
>   return r;
>
>@@ -377,12 +377,15 @@ int amdgpu_bo_create_kernel_at(struct
>amdgpu_device *adev,
>* Remove the original mem node and create a new one at the
>request
>* position.
>*/
>+  if (cpu_addr)
>+  amdgpu_bo_kunmap(*bo_ptr);
>+
>+  ttm_bo_mem_put(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.mem);
>+
>   for (i = 0; i < (*bo_ptr)->placement.num_placement; ++i) {
>   (*bo_ptr)->placements[i].fpfn = offset >> PAGE_SHIFT;
>   (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
>   }
>-
>-  ttm_bo_mem_put(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.mem);
>   r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
>&(*bo_ptr)->tbo.mem, &ctx);
>   if (r)
>--
>2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: restrict hotplug error message

2019-09-25 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: Christian König 
>Sent: Thursday, September 19, 2019 9:17 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily ; Zhang, Jack (Jian)
>
>Subject: [PATCH] drm/amdgpu: restrict hotplug error message
>
>We should print the error only when we are hotplugged and crash basically all
>userspace applications.
>
>Signed-off-by: Christian König 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>index 6978d17a406b..5cb808cb8108 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>@@ -1098,7 +1098,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)  {
>   struct drm_device *dev = pci_get_drvdata(pdev);
>
>-  DRM_ERROR("Device removal is currently not supported outside of
>fbcon\n");
>+#ifdef MODULE
>+  if (THIS_MODULE->state != MODULE_STATE_GOING)
>+#endif
>+  DRM_ERROR("Hotplug removal is not supported\n");
>   drm_dev_unplug(dev);
>   drm_dev_put(dev);
>   pci_disable_device(pdev);
>--
>2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-19 Thread Deng, Emily
Ok, thanks very much.

Best wishes
Emily Deng
From: Koenig, Christian 
Sent: Thursday, September 19, 2019 5:06 PM
To: Deng, Emily 
Cc: Zhang, Jack (Jian) ; amd-gfx@lists.freedesktop.org; 
Teng, Rui ; Cui, Flora 
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

I can create a patch based on this today and push it on Monday.

Christian.

On 19.09.2019 at 11:05, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:
Hi Christian,
Could you please help to push the code?

Best wishes
Emily Deng
From: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>
Sent: Thursday, September 19, 2019 11:33 AM
To: Deng, Emily mailto:emily.d...@amd.com>>; Koenig, 
Christian mailto:christian.koe...@amd.com>>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, 
Rui mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough




Reviewed & Tested-by: Jack Zhang 
mailto:jack.zha...@amd.com>>

BR,
Jack
From: Deng, Emily mailto:emily.d...@amd.com>>
Sent: Thursday, September 19, 2019 10:58 AM
To: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Jack,
Could you please give this a try, both for passthrough and SR-IOV?

Best wishes
Emily Deng
From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 6:47 PM
To: Deng, Emily mailto:emily.d...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: Re: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Jack & Emily,

asking around a bit helped: we should be able to take a look at the module
state to distinguish the two use cases like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6b96a5738e57..0af134eb03e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1074,7 +1074,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
struct drm_device *dev = pci_get_drvdata(pdev);

-   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
+#ifdef MODULE
+   if (THIS_MODULE->state != MODULE_STATE_GOING)
+#endif
+   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
drm_dev_unplug(dev);
drm_dev_put(dev);
pci_disable_device(pdev);

It's a bit of a hack, but I think that this should work.

Regards,
Christian.

On 18.09.19 at 12:29, Christian König wrote:
Hi Emily,
Do you think this is because the wrong use case?
Well Jack's use case is correct, but the PCIe hot plug removal use case is 
incorrect.

Changing it to a warning is most likely not a good idea either because it is 
indeed an error to try to use PCIe hot plug removal.

What we need to distinguish is why the function is called: if it's because of
pci_unregister_driver(&amdgpu_kms_pci_driver) in amdgpu_exit(), then the use
case is valid and we should not print the error.

But if it's because somebody does something like "echo 1 > 
/sys/bus/pci/devices/\:0b\:00.1/remove" then that is invalid and we should 
print it.

We could do some hack and look at the stack trace, but that is probably not 
reliable either.

Maybe we can look at the module reference count or something like that.

Regards,
Christian.

On 18.09.19 at 12:04, Deng, Emily wrote:
Hi Christian,
Do you think this is because of the wrong use case? Even when we add the error
log, the issue still happens. Could we change this error to a warning? With the
right method of removing the driver, it shouldn't cause issues.

Best wishes
Emily Deng
From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Wednesday, September 18, 2019 5:45 PM
To: Deng, Emily <mailto:emily.d...@amd.com>
Cc: Zhang, Jack (Jian) <mailto:jack.zha...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
<mailto:rui.t...@amd.com>; Cui, Flora 
<mailto:flora@amd.com>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Yes, exactly.

The problem is that even when the output is qxl or the Intel driver, our driver
is still loaded, and forcefully removing it renders the desktop unusable.

Christian.


On 18.09.2019 at 11:24, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:

Hi Christ

RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-19 Thread Deng, Emily
Hi Christian,
Could you please help to push the code?

Best wishes
Emily Deng
From: Zhang, Jack (Jian) 
Sent: Thursday, September 19, 2019 11:33 AM
To: Deng, Emily ; Koenig, Christian 

Cc: amd-gfx@lists.freedesktop.org; Teng, Rui ; Cui, Flora 

Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough




Reviewed & Tested-by: Jack Zhang 
mailto:jack.zha...@amd.com>>

BR,
Jack
From: Deng, Emily mailto:emily.d...@amd.com>>
Sent: Thursday, September 19, 2019 10:58 AM
To: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Jack,
Could you please give a try about this? Both for passthrough and sriov.

Best wishes
Emily Deng
From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 6:47 PM
To: Deng, Emily mailto:emily.d...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: Re: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Jack & Emily,

asking around a bit helped: we should be able to take a look at the module
state to distinguish the two use cases like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6b96a5738e57..0af134eb03e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1074,7 +1074,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
struct drm_device *dev = pci_get_drvdata(pdev);

-   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
+#ifdef MODULE
+   if (THIS_MODULE->state != MODULE_STATE_GOING)
+#endif
+   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
drm_dev_unplug(dev);
drm_dev_put(dev);
pci_disable_device(pdev);

It's a bit of a hack, but I think that this should work.

Regards,
Christian.

On 18.09.19 at 12:29, Christian König wrote:
Hi Emily,
Do you think this is because the wrong use case?
Well Jack's use case is correct, but the PCIe hot plug removal use case is 
incorrect.

Changing it to a warning is most likely not a good idea either because it is 
indeed an error to try to use PCIe hot plug removal.

What we need to distinguish is why the function is called: if it's because of
pci_unregister_driver(&amdgpu_kms_pci_driver) in amdgpu_exit(), then the use
case is valid and we should not print the error.

But if it's because somebody does something like "echo 1 > 
/sys/bus/pci/devices/\:0b\:00.1/remove" then that is invalid and we should 
print it.

We could do some hack and look at the stack trace, but that is probably not 
reliable either.

Maybe we can look at the module reference count or something like that.

Regards,
Christian.

On 18.09.19 at 12:04, Deng, Emily wrote:
Hi Christian,
Do you think this is because of the wrong use case? Even when we add the error
log, the issue still happens. Could we change this error to a warning? With the
right method of removing the driver, it shouldn't cause issues.

Best wishes
Emily Deng
From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Wednesday, September 18, 2019 5:45 PM
To: Deng, Emily <mailto:emily.d...@amd.com>
Cc: Zhang, Jack (Jian) <mailto:jack.zha...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
<mailto:rui.t...@amd.com>; Cui, Flora 
<mailto:flora@amd.com>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Yes, exactly.

The problem is that even when the output is qxl or the Intel driver, our driver
is still loaded, and forcefully removing it renders the desktop unusable.

Christian.


Am 18.09.2019 11:24 schrieb "Deng, Emily" 
mailto:emily.d...@amd.com>>:

Hi Christian,

Do you mean that in passthrough mode the desktop is rendered by our ASIC, but
the end user tries to remove the driver via hot plug?



Best wishes

Emily Deng

From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 4:44 PM
To: Deng, Emily mailto:emily.d...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough



Hi Emily,



Yeah, exactly that's the pro

RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-18 Thread Deng, Emily
Hi Jack,
Could you please give this a try, both for passthrough and SR-IOV?

Best wishes
Emily Deng
From: Koenig, Christian 
Sent: Wednesday, September 18, 2019 6:47 PM
To: Deng, Emily 
Cc: Zhang, Jack (Jian) ; amd-gfx@lists.freedesktop.org; 
Teng, Rui ; Cui, Flora 
Subject: Re: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Jack & Emily,

asking around a bit helped: we should be able to take a look at the module
state to distinguish the two use cases like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6b96a5738e57..0af134eb03e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1074,7 +1074,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
struct drm_device *dev = pci_get_drvdata(pdev);

-   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
+#ifdef MODULE
+   if (THIS_MODULE->state != MODULE_STATE_GOING)
+#endif
+   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
drm_dev_unplug(dev);
drm_dev_put(dev);
pci_disable_device(pdev);

It's a bit of a hack, but I think that this should work.

Regards,
Christian.

On 18.09.19 at 12:29, Christian König wrote:
Hi Emily,


Do you think this is because the wrong use case?
Well Jack's use case is correct, but the PCIe hot plug removal use case is 
incorrect.

Changing it to a warning is most likely not a good idea either because it is 
indeed an error to try to use PCIe hot plug removal.

What we need to distinguish is why the function is called: if it's because of
pci_unregister_driver(&amdgpu_kms_pci_driver) in amdgpu_exit(), then the use
case is valid and we should not print the error.

But if it's because somebody does something like "echo 1 > 
/sys/bus/pci/devices/\:0b\:00.1/remove" then that is invalid and we should 
print it.

We could do some hack and look at the stack trace, but that is probably not 
reliable either.

Maybe we can look at the module reference count or something like that.

Regards,
Christian.

On 18.09.19 at 12:04, Deng, Emily wrote:
Hi Christian,
Do you think this is because of the wrong use case? Even when we add the error
log, the issue still happens. Could we change this error to a warning? With the
right method of removing the driver, it shouldn't cause issues.

Best wishes
Emily Deng
From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Wednesday, September 18, 2019 5:45 PM
To: Deng, Emily <mailto:emily.d...@amd.com>
Cc: Zhang, Jack (Jian) <mailto:jack.zha...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
<mailto:rui.t...@amd.com>; Cui, Flora 
<mailto:flora@amd.com>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Yes, exactly.

The problem is that even when the output is qxl or the Intel driver, our driver
is still loaded, and forcefully removing it renders the desktop unusable.

Christian.


On 18.09.2019 at 11:24, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:

Hi Christian,

Do you mean that in passthrough mode the desktop is rendered by our ASIC, but
the end user tries to remove the driver via hot plug?



Best wishes

Emily Deng

From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 4:44 PM
To: Deng, Emily mailto:emily.d...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough



Hi Emily,



Yeah, exactly that's the problem: In some cases the driver can be unloaded 
while it is still in use!



See, we added this error message because end users tried to use PCIe hot plug to
unload the driver so they could use the hardware for passthrough.



But this will completely nuke your desktop, even when amdgpu is only a 
secondary device like in the qxl case.



Jack is using the correct way of doing it, e.g. using "modprobe -r" or rmmod. 
Both commands check the use count before unloading the driver instances.



I don't see a way to distinguish the two cases, and what Jack is doing is
unfortunately not the common one.



Regards,

Christian.





On 18.09.2019 at 10:08, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:

Hi Christian,

 Before unloading the driver, the user must make sure no userspace is still
using amdgpu; otherwise it will report that the driver is in use. Driver
unloading is an amdgpu feature that makes it easier to replace the driver
without rebooting the VM. Do you see any issue with the driver unload path?



Best wishes

Emily Deng

From: Koenig, Christian 
mailto:ch

RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-18 Thread Deng, Emily
Hi Christian,
Do you think this is because of the wrong use case? Even when we add the error
log, the issue still happens. Could we change this error to a warning? With the
right method of removing the driver, it shouldn't cause issues.

Best wishes
Emily Deng
From: Koenig, Christian 
Sent: Wednesday, September 18, 2019 5:45 PM
To: Deng, Emily 
Cc: Zhang, Jack (Jian) ; amd-gfx@lists.freedesktop.org; 
Teng, Rui ; Cui, Flora 
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Yes, exactly.

The problem is that even when the output is qxl or the Intel driver, our driver
is still loaded, and forcefully removing it renders the desktop unusable.

Christian.


On 18.09.2019 at 11:24, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:

Hi Christian,

Do you mean, for passthrough mode, the desktop is rendered by our asic, but 
enduser is trying to remove the driver by hot plug?



Best wishes

Emily Deng

From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 4:44 PM
To: Deng, Emily mailto:emily.d...@amd.com>>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, Rui 
mailto:rui.t...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough



Hi Emily,



Yeah, exactly that's the problem: In some cases the driver can be unloaded 
while it is still in use!



See, we added this error message because end users tried to use PCIe hot plug to
unload the driver so they could use the hardware for passthrough.



But this will completely nuke your desktop, even when amdgpu is only a 
secondary device like in the qxl case.



Jack is using the correct way of doing it, e.g. using "modprobe -r" or rmmod. 
Both commands check the use count before unloading the driver instances.



I don't see a way to distinguish the two cases, and what Jack is doing is
unfortunately not the common one.



Regards,

Christian.





On 18.09.2019 at 10:08, "Deng, Emily" mailto:emily.d...@amd.com>> wrote:

Hi Christian,

 Before unloading the driver, the user must make sure no userspace is still
using amdgpu; otherwise it will report that the driver is in use. Driver
unloading is an amdgpu feature that makes it easier to replace the driver
without rebooting the VM. Do you see any issue with the driver unload path?



Best wishes

Emily Deng

From: Koenig, Christian 
mailto:christian.koe...@amd.com>>
Sent: Wednesday, September 18, 2019 3:54 PM
To: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>
Cc: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Teng, 
Rui mailto:rui.t...@amd.com>>; Deng, Emily 
mailto:emily.d...@amd.com>>; Cui, Flora 
mailto:flora@amd.com>>
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough



Hi Jack,



Well, that belief is unfortunately completely wrong.



The point is that ANY use of amdgpu by userspace will prevent correct driver 
unload, that qxl is used for the fbcon doesn't change anything here.



So the patch is a clear NAK. Driver unload is not supposed to work even under 
SRIOV.



Regards,

Christian.







On 18.09.2019 at 09:32, "Zhang, Jack (Jian)" mailto:jack.zha...@amd.com>> wrote:

Hi, Christian and folks,

In virtual machines (such as those managed by virt-manager), there is always a
virtual graphics device like "qxl" present as the default gfx device.
Under that condition, we believe it's reasonable to unload the amdgpu driver,
since it is not treated as the default fbcon device.

Would you please help to review this patch?

Best wishes,
Jack

-Original Message-
From: Jack Zhang mailto:jack.zha...@amd.com>>
Sent: Wednesday, September 18, 2019 3:25 PM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Zhang, Jack (Jian) mailto:jack.zha...@amd.com>>
Subject: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

In virtual machine, there would be a qxl or cirrus graphics device as the 
default master fbcon device.

So for PF(passthrough mode) or SRIOV VF, it is reasonable to unload amdgpu 
driver. Amdgpu doesn't have to be the only fbcon device under this condition.

Signed-off-by: Jack Zhang mailto:jack.zha...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 420888e..ada2b25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1103,8 +1103,9 @@ static void
 amdgpu_pci_remove(struct pci_dev *pdev)  {
 struct drm_device *dev = pci_get_drvdata(pdev);
-
-   DRM_ERROR("Device removal is currently not supported outside of 
fbcon\n");

RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-18 Thread Deng, Emily
Hi Christian,
Do you mean that, in passthrough mode, the desktop is rendered by our ASIC, but 
the end user is trying to remove the driver via hot plug?

Best wishes
Emily Deng
From: Koenig, Christian 
Sent: Wednesday, September 18, 2019 4:44 PM
To: Deng, Emily 
Cc: Zhang, Jack (Jian) ; amd-gfx@lists.freedesktop.org; 
Teng, Rui ; Cui, Flora 
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough

Hi Emily,

Yeah, exactly that's the problem: In some cases the driver can be unloaded 
while it is still in use!

See, we added this error message because end users tried to use PCIe hot plug to 
unload the driver so they could use the hardware for passthrough.

But this will completely nuke your desktop, even when amdgpu is only a 
secondary device like in the qxl case.

Jack is using the correct way of doing it, e.g. "modprobe -r" or rmmod. 
Both commands check the use count before unloading the driver instance.

I don't see a way to distinguish the two cases, and what Jack is doing is 
unfortunately not the common one.

Regards,
Christian.


On 18.09.2019 at 10:08, "Deng, Emily" wrote:

Hi Christian,

 Before unloading the driver, the user must make sure no userspace is still 
using amdgpu; otherwise it will report that the driver is in use. Driver unloading 
is an amdgpu feature that makes it easier to replace the driver without 
rebooting the VM. Do you see any issue with the driver unloading path?



Best wishes

Emily Deng

From: Koenig, Christian
Sent: Wednesday, September 18, 2019 3:54 PM
To: Zhang, Jack (Jian)
Cc: amd-gfx@lists.freedesktop.org; Teng, Rui; Deng, Emily; Cui, Flora
Subject: RE: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or 
passthrough



Hi Jack,



Well, that belief is unfortunately completely wrong.



The point is that ANY use of amdgpu by userspace will prevent correct driver 
unload; that qxl is used for the fbcon doesn't change anything here.



So the patch is a clear NAK. Driver unload is not supposed to work even under 
SRIOV.



Regards,

Christian.







On 18.09.2019 at 09:32, "Zhang, Jack (Jian)" wrote:

Hi, Christian and folks,

In virtual machines (such as those managed by virt-manager), there is always a 
virtual graphics device like "qxl" present as the default gfx device.
Under such conditions, we believe it's reasonable to unload the amdgpu driver, 
as it is not treated as the default fbcon device.

Would you please help to review this patch?

Best wishes,
Jack

-Original Message-
From: Jack Zhang
Sent: Wednesday, September 18, 2019 3:25 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Jack (Jian)
Subject: [PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

In a virtual machine, there is typically a qxl or cirrus graphics device as the 
default master fbcon device.

So for a PF (passthrough mode) or an SRIOV VF, it is reasonable to unload the 
amdgpu driver; amdgpu doesn't have to be the only fbcon device under this 
condition.

Signed-off-by: Jack Zhang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 420888e..ada2b25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1103,8 +1103,9 @@ static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 struct drm_device *dev = pci_get_drvdata(pdev);
-
-   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
+   struct amdgpu_device *adev = dev->dev_private;
+   if (!amdgpu_sriov_vf(adev) && !amdgpu_passthrough(adev))
+   DRM_ERROR("Device removal is currently not supported outside of fbcon\n");
 drm_dev_unplug(dev);
 drm_dev_put(dev);
 pci_disable_device(pdev);
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
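Christian's point about "modprobe -r"/rmmod vs. PCIe hot plug comes down to the module use count: the former refuse to unload while the count is non-zero, the latter yanks the device regardless. A toy sketch of that check (illustrative C only, not the kernel's actual module loader):

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model of a module use count: every opener takes a reference, and
 * unload is refused while any reference is held -- the check that
 * "modprobe -r"/rmmod perform and that PCIe hot-unplug skips. */
struct toy_module {
    atomic_int refcnt;
};

void toy_module_get(struct toy_module *m) { atomic_fetch_add(&m->refcnt, 1); }
void toy_module_put(struct toy_module *m) { atomic_fetch_sub(&m->refcnt, 1); }

/* Returns 1 when unloading is allowed, 0 while the module is in use. */
int toy_module_try_unload(struct toy_module *m)
{
    return atomic_load(&m->refcnt) == 0;
}
```

This is why Emily's statement holds for the modprobe path: any userspace still holding the device open keeps the count non-zero, while hot-unplug bypasses the check entirely.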


RE: [PATCH] drm/amdgpu: Navi10/12 VF doesn't support SMU

2019-09-12 Thread Deng, Emily
Reviewed-by: Emily Deng 

Best wishes
Emily Deng



>-Original Message-
>From: Zhao, Jiange 
>Sent: Thursday, September 12, 2019 11:46 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Koenig, Christian ;
>Zhao, Jiange 
>Subject: [PATCH] drm/amdgpu: Navi10/12 VF doesn't support SMU
>
>From: Jiange Zhao 
>
>In SRIOV case, SMU and powerplay are handled in HV.
>
>VF shouldn't have control over SMU and powerplay.
>
>Signed-off-by: Jiange Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/nv.c | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>index 4c24672be12a..fb097aa089da 100644
>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>@@ -438,7 +438,7 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &navi10_ih_ip_block);
>   amdgpu_device_ip_block_add(adev, &psp_v11_0_ip_block);
>   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP &&
>-  is_support_sw_smu(adev))
>+  is_support_sw_smu(adev) && !amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
>   if (adev->enable_virtual_display || amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &dce_virtual_ip_block);
>@@ -449,7 +449,7 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &gfx_v10_0_ip_block);
>   amdgpu_device_ip_block_add(adev, &sdma_v5_0_ip_block);
>   if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT &&
>-  is_support_sw_smu(adev))
>+  is_support_sw_smu(adev) && !amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
>   amdgpu_device_ip_block_add(adev, &vcn_v2_0_ip_block);
>   if (adev->enable_mes)
>@@ -461,7 +461,7 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &navi10_ih_ip_block);
>   amdgpu_device_ip_block_add(adev, &psp_v11_0_ip_block);
>   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP &&
>-  is_support_sw_smu(adev))
>+  is_support_sw_smu(adev) && !amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
>   if (adev->enable_virtual_display || amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &dce_virtual_ip_block);
>@@ -472,7 +472,7 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &gfx_v10_0_ip_block);
>   amdgpu_device_ip_block_add(adev, &sdma_v5_0_ip_block);
>   if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT &&
>-  is_support_sw_smu(adev))
>+  is_support_sw_smu(adev) && !amdgpu_sriov_vf(adev))
>   amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
>   amdgpu_device_ip_block_add(adev, &vcn_v2_0_ip_block);
>   break;
>--
>2.20.1


RE: [PATCH] drm/amdgpu: Navi12 SRIOV VF doesn't load TOC

2019-09-11 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: Zhao, Jiange 
>Sent: Thursday, September 12, 2019 1:22 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Koenig, Christian ;
>Zhao, Jiange 
>Subject: [PATCH] drm/amdgpu: Navi12 SRIOV VF doesn't load TOC
>
>From: Jiange Zhao 
>
>In the SRIOV case, the autoload sequence is the same as on bare metal,
>except that the VF won't load the TOC.
>
>Signed-off-by: Jiange Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 ++
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>index f90a0cd12827..762c97ce8251 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>@@ -253,7 +253,8 @@ static int psp_tmr_init(struct psp_context *psp)
>
>   /* For ASICs support RLC autoload, psp will parse the toc
>* and calculate the total size of TMR needed */
>-  if (psp->toc_start_addr &&
>+  if (!amdgpu_sriov_vf(psp->adev) &&
>+  psp->toc_start_addr &&
>   psp->toc_bin_size &&
>   psp->fw_pri_buf) {
>   ret = psp_load_toc(psp, &tmr_size);
>@@ -1305,9 +1306,6 @@ int psp_rlc_autoload_start(struct psp_context *psp)
>   int ret;
>   struct psp_gfx_cmd_resp *cmd;
>
>-  if (amdgpu_sriov_vf(psp->adev))
>-  return 0;
>-
>   cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
>   if (!cmd)
>   return -ENOMEM;
>--
>2.20.1


RE: [PATCH] drm/amdgpu: For Navi12 SRIOV VF, register mailbox functions

2019-09-11 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: Zhao, Jiange 
>Sent: Wednesday, September 11, 2019 6:25 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Nieto, David M ; Deng, Emily
>; Koenig, Christian ;
>Zhao, Jiange 
>Subject: [PATCH] drm/amdgpu: For Navi12 SRIOV VF, register mailbox
>functions
>
>From: Jiange Zhao 
>
>Mailbox functions and interrupts are only for Navi12 VF.
>
>Register functions and irqs during initialization.
>
>Signed-off-by: Jiange Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/nv.c | 19 +++
> 1 file changed, 19 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>b/drivers/gpu/drm/amd/amdgpu/nv.c index a61f43c0c9df..4c24672be12a
>100644
>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>@@ -53,6 +53,7 @@
> #include "vcn_v2_0.h"
> #include "dce_virtual.h"
> #include "mes_v10_1.h"
>+#include "mxgpu_nv.h"
>
> static const struct amd_ip_funcs nv_common_ip_funcs;
>
>@@ -426,6 +427,9 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>
>   adev->nbio.funcs->detect_hw_virt(adev);
>
>+  if (amdgpu_sriov_vf(adev))
>+  adev->virt.ops = &xgpu_nv_virt_ops;
>+
>   switch (adev->asic_type) {
>   case CHIP_NAVI10:
>   case CHIP_NAVI14:
>@@ -666,16 +670,31 @@ static int nv_common_early_init(void *handle)
>   return -EINVAL;
>   }
>
>+  if (amdgpu_sriov_vf(adev)) {
>+  amdgpu_virt_init_setting(adev);
>+  xgpu_nv_mailbox_set_irq_funcs(adev);
>+  }
>+
>   return 0;
> }
>
> static int nv_common_late_init(void *handle)  {
>+  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>+
>+  if (amdgpu_sriov_vf(adev))
>+  xgpu_nv_mailbox_get_irq(adev);
>+
>   return 0;
> }
>
> static int nv_common_sw_init(void *handle)  {
>+  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>+
>+  if (amdgpu_sriov_vf(adev))
>+  xgpu_nv_mailbox_add_irq_id(adev);
>+
>   return 0;
> }
>
>--
>2.20.1


RE: [PATCH] drm/amdgpu: Add SRIOV mailbox backend for Navi1x

2019-09-11 Thread Deng, Emily
Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>jia...@amd.com
>Sent: Monday, September 9, 2019 6:37 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Jiange 
>Subject: [PATCH] drm/amdgpu: Add SRIOV mailbox backend for Navi1x
>
>From: Jiange Zhao 
>
>Mimic the ones for Vega10, add mailbox backend for Navi1x
>
>Signed-off-by: Jiange Zhao 
>---
> drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 380 ++
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h |  41 +++
> 3 files changed, 422 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
> create mode 100644 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>b/drivers/gpu/drm/amd/amdgpu/Makefile
>index 84614a71bb4d..43dc4aa18930 100644
>--- a/drivers/gpu/drm/amd/amdgpu/Makefile
>+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>@@ -68,7 +68,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o
>gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce  amdgpu-y += \
>   vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o
>nbio_v7_0.o vega10_reg_init.o \
>   vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o navi10_reg_init.o
>navi14_reg_init.o \
>-  arct_reg_init.o navi12_reg_init.o
>+  arct_reg_init.o navi12_reg_init.o mxgpu_nv.o
>
> # add DF block
> amdgpu-y += \
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>new file mode 100644
>index ..0d8767eb7a70
>--- /dev/null
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>@@ -0,0 +1,380 @@
>+/*
>+ * Copyright 2014 Advanced Micro Devices, Inc.
>+ *
>+ * Permission is hereby granted, free of charge, to any person
>+obtaining a
>+ * copy of this software and associated documentation files (the
>+"Software"),
>+ * to deal in the Software without restriction, including without
>+limitation
>+ * the rights to use, copy, modify, merge, publish, distribute,
>+sublicense,
>+ * and/or sell copies of the Software, and to permit persons to whom
>+the
>+ * Software is furnished to do so, subject to the following conditions:
>+ *
>+ * The above copyright notice and this permission notice shall be
>+included in
>+ * all copies or substantial portions of the Software.
>+ *
>+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>+EXPRESS OR
>+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>+MERCHANTABILITY,
>+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>EVENT
>+SHALL
>+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>+DAMAGES OR
>+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>+OTHERWISE,
>+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
>THE USE
>+OR
>+ * OTHER DEALINGS IN THE SOFTWARE.
>+ *
>+ */
>+
>+#include "amdgpu.h"
>+#include "nbio/nbio_2_3_offset.h"
>+#include "nbio/nbio_2_3_sh_mask.h"
>+#include "gc/gc_10_1_0_offset.h"
>+#include "gc/gc_10_1_0_sh_mask.h"
>+#include "soc15.h"
>+#include "navi10_ih.h"
>+#include "soc15_common.h"
>+#include "mxgpu_nv.h"
>+#include "mxgpu_ai.h"
>+
>+static void xgpu_nv_mailbox_send_ack(struct amdgpu_device *adev) {
>+  WREG8(NV_MAIBOX_CONTROL_RCV_OFFSET_BYTE, 2); }
>+
>+static void xgpu_nv_mailbox_set_valid(struct amdgpu_device *adev, bool
>+val) {
>+  WREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE, val ? 1 : 0); }
>+
>+/*
>+ * this peek_msg may *only* be called in an IRQ routine, because in an
>+ * IRQ routine the RCV_MSG_VALID field of BIF_BX_PF_MAILBOX_CONTROL must
>+ * already be set to 1 by the host.
>+ *
>+ * if not called in an IRQ routine, this peek_msg cannot be guaranteed
>+ * to return the correct value, since it doesn't return the RCV_DW0
>+ * under the case that RCV_MSG_VALID is set by the host.
>+ */
>+static enum idh_event xgpu_nv_mailbox_peek_msg(struct amdgpu_device
>+*adev) {
>+  return RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
>+
>   mmBIF_BX_PF_MAILBOX_MSGBUF_RCV_DW0));
>+}
>+
>+
>+static int xgpu_nv_mailbox_rcv_msg(struct amdgpu_device *adev,
>+ enum idh_event event)
>+{
>+  u32 reg;
>+
>+  reg = RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
>+
>mmBIF_BX_PF_MAILBOX_MSGBUF_RCV_DW0));
>+  if (reg != event)
>+  return -ENOENT;
>+
>+  xgpu_nv_mailbox_send_ack(adev);
>+
>+  return 0;
>+}
>+
>+static uint8_t xgpu_nv_peek_ack(struct amdgpu_device *adev) {
>+  return RREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE) & 2; }
>+
>+static int xgpu_nv_poll_ack(struct amdgpu_device *adev) {
>+  int timeout  = NV_MAILBOX_POLL_ACK_TIMEDOUT;
>+  u8 reg;
>+
>+  do {
>+  reg = RREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE);
>+  if (reg & 2)
>+  return 0;
>+
>+  mdelay(5);
>+  timeout -= 5;
>+  } while (timeout > 1);
>+
>+  pr_err("Doesn't get TRN_MSG_ACK from pf in %d msec\n",
>+NV_MAILBOX_POLL_ACK_TIMEDOUT);
>+
>+  return -ETIME;
>+}
>+
>+static int 

RE: [PATCH 1/2] drm/amdgpu: unify mc base address for arcturus

2019-08-21 Thread Deng, Emily
Series 
Reviewed-by: Emily Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of
>Frank.Min
>Sent: Wednesday, August 21, 2019 6:01 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Min, Frank 
>Subject: [PATCH 1/2] drm/amdgpu: unify mc base address for arcturus
>
>arcturus for sriov would use the unified mc base address
>
>Change-Id: I3f10f88877aa38145a259b88c11a6aa2329f3fe2
>---
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++--
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>index 6de1726..683f47d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>@@ -920,12 +920,12 @@ static void gmc_v9_0_vram_gtt_location(struct
>amdgpu_device *adev,
>   struct amdgpu_gmc *mc)
> {
>   u64 base = 0;
>-  if (!amdgpu_sriov_vf(adev)) {
>-  if (adev->asic_type == CHIP_ARCTURUS)
>-  base = mmhub_v9_4_get_fb_location(adev);
>-  else
>-  base = mmhub_v1_0_get_fb_location(adev);
>-  }
>+
>+  if (adev->asic_type == CHIP_ARCTURUS)
>+  base = mmhub_v9_4_get_fb_location(adev);
>+  else if (!amdgpu_sriov_vf(adev))
>+  base = mmhub_v1_0_get_fb_location(adev);
>+
>   /* add the xgmi offset of the physical node */
>   base += adev->gmc.xgmi.physical_node_id * adev->gmc.xgmi.node_segment_size;
>   amdgpu_gmc_vram_location(adev, mc, base);
>--
>2.7.4
>
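The base computed in the hunk above is simply the FB location plus this node's slice of the XGMI segmented address space. As a sketch (illustrative standalone function, not the driver's code):

```c
#include <assert.h>
#include <stdint.h>

/* FB base for one physical node in an XGMI hive: the common FB location
 * plus this node's offset within the segmented address space, mirroring
 * "base += physical_node_id * node_segment_size" in the patch. */
uint64_t vram_base_for_node(uint64_t fb_location,
                            uint32_t physical_node_id,
                            uint64_t node_segment_size)
{
    return fb_location + (uint64_t)physical_node_id * node_segment_size;
}
```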

RE: [PATCH] SWDEV-197284 - drm/amdgpu: Only use the peek function in producer side is not correct

2019-08-12 Thread Deng, Emily
Ok, please ignore this patch.


Best wishes
Emily Deng

>-Original Message-
>From: Christian König 
>Sent: Tuesday, August 13, 2019 1:00 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] SWDEV-197284 - drm/amdgpu: Only use the peek
>function in producer side is not correct
>
>On 12.08.19 at 09:42, Emily Deng wrote:
>> For the spsc queue, using only the peek function on the producer side can
>> cause an error. For the last element, when a push occurs concurrently with
>> a pop, the peek result is not updated in time and will return NULL.
>>
>> So use queue count for double confirming the job queue is empty.
>
>For the upstream branch I'm going to push my patch which is not as invasive
>as this one.
>
>Christian.
>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
>>   include/drm/spsc_queue.h | 7 +++
>>   2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 35ddbec..e74894f 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -95,7 +95,7 @@ static bool drm_sched_entity_is_idle(struct
>drm_sched_entity *entity)
>>  rmb(); /* for list_empty to work without lock */
>>
>>  if (list_empty(&entity->list) ||
>> -spsc_queue_peek(&entity->job_queue) == NULL)
>> +((spsc_queue_peek(&entity->job_queue) == NULL) &&
>> +!spsc_queue_count(&entity->job_queue)))
>>  return true;
>>
>>  return false;
>> @@ -281,7 +281,7 @@ void drm_sched_entity_fini(struct drm_sched_entity
>*entity)
>>  /* Consumption of existing IBs wasn't completed. Forcefully
>>   * remove them here.
>>   */
>> -if (spsc_queue_peek(&entity->job_queue)) {
>> +if ((spsc_queue_peek(&entity->job_queue) == NULL) &&
>> +!spsc_queue_count(&entity->job_queue)) {
>>  if (sched) {
>>  /* Park the kernel for a moment to make sure it isn't
>processing
>>   * our enity.
>> diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h index
>> 125f096..78092b9 100644
>> --- a/include/drm/spsc_queue.h
>> +++ b/include/drm/spsc_queue.h
>> @@ -54,9 +54,8 @@ static inline void spsc_queue_init(struct spsc_queue
>> *queue)
>>
>>   static inline struct spsc_node *spsc_queue_peek(struct spsc_queue *queue)
>>   {
>> -return queue->head;
>> +return READ_ONCE(queue->head);
>>   }
>> -
>>   static inline int spsc_queue_count(struct spsc_queue *queue)
>>   {
>>  return atomic_read(&queue->job_count);
>> @@ -70,9 +69,9 @@ static inline bool spsc_queue_push(struct spsc_queue *queue, struct spsc_node *node)
>>
>>  preempt_disable();
>>
>> +atomic_inc(&queue->job_count);
>>  tail = (struct spsc_node **)atomic_long_xchg(&queue->tail, (long)&node->next);
>>  WRITE_ONCE(*tail, node);
>> -atomic_inc(&queue->job_count);
>>
>>  /*
>>   * In case of first element verify new node will be visible to the
>> consumer @@ -93,6 +92,7 @@ static inline struct spsc_node
>*spsc_queue_pop(struct spsc_queue *queue)
>>  /* Verify reading from memory and not the cache */
>>  smp_rmb();
>>
>> +atomic_dec(&queue->job_count);
>>  node = READ_ONCE(queue->head);
>>
>>  if (!node)
>> @@ -113,7 +113,6 @@ static inline struct spsc_node
>*spsc_queue_pop(struct spsc_queue *queue)
>>  }
>>  }
>>
>> -atomic_dec(&queue->job_count);
>>  return node;
>>   }
>>
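The race the commit message describes, peek returning NULL while a push is still in flight, is why the patch also consults the atomic job_count. A single-threaded toy sketch of that idea (hypothetical, only loosely modeled on drm's spsc_queue, not the actual implementation):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy single-producer/single-consumer queue.  The patch's idea: judge
 * emptiness by the atomic job_count as well as by peeking at head,
 * since during a concurrent push head can still read as NULL after the
 * count was already bumped. */
struct spsc_node  { struct spsc_node *next; };
struct spsc_queue {
    struct spsc_node *head;
    atomic_long job_count;
};

void spsc_init(struct spsc_queue *q)
{
    q->head = NULL;
    atomic_init(&q->job_count, 0);
}

void spsc_push(struct spsc_queue *q, struct spsc_node *n)
{
    /* count first, link second: a consumer seeing head == NULL can
     * still tell that a push is in flight */
    atomic_fetch_add(&q->job_count, 1);
    n->next = q->head;        /* simplified, single-threaded linking */
    q->head = n;
}

struct spsc_node *spsc_pop(struct spsc_queue *q)
{
    struct spsc_node *n = q->head;
    if (!n)
        return NULL;
    q->head = n->next;
    atomic_fetch_sub(&q->job_count, 1);
    return n;
}

/* Emptiness check combining both views, as the patch does. */
int spsc_is_idle(struct spsc_queue *q)
{
    return q->head == NULL && atomic_load(&q->job_count) == 0;
}
```

The real queue does the linking lock-free with an atomic exchange on the tail pointer; the sketch keeps only the count-plus-peek emptiness logic that the patch is about.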


RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

2019-07-31 Thread Deng, Emily
All looks good to me. Reviewed-by: Emily Deng .

>-Original Message-
>From: amd-gfx  On Behalf Of Monk
>Liu
>Sent: Wednesday, July 31, 2019 4:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)
>
>previously the ucode loading of PSP was repeated: one execution in
>phase_1 init/re-init/resume and the other in the fw_loading routine
>
>Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
>prior to the FW loading and any block's hw_init/resume
>
>v2:
>still do the smu fw loading since it is needed by bare-metal
>
>v3:
>drop the change in reinit_early_sriov, just clear all block's status.hw in the
>head place and set the status.hw after hw_init done is enough
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 59 +++---
> 1 file changed, 38 insertions(+), 21 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 6cb358c..30436ba 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct
>amdgpu_device *adev)
>
>   if (adev->asic_type >= CHIP_VEGA10) {
>   for (i = 0; i < adev->num_ip_blocks; i++) {
>-  if (adev->ip_blocks[i].version->type ==
>AMD_IP_BLOCK_TYPE_PSP) {
>-  if (adev->in_gpu_reset || adev->in_suspend) {
>-  if (amdgpu_sriov_vf(adev) && adev-
>>in_gpu_reset)
>-  break; /* sriov gpu reset, psp
>need to do hw_init before IH because of hw limit */
>-  r = adev->ip_blocks[i].version->funcs-
>>resume(adev);
>-  if (r) {
>-  DRM_ERROR("resume of IP
>block <%s> failed %d\n",
>+  if (adev->ip_blocks[i].version->type !=
>AMD_IP_BLOCK_TYPE_PSP)
>+  continue;
>+
>+  /* no need to do the fw loading again if already
>done*/
>+  if (adev->ip_blocks[i].status.hw == true)
>+  break;
>+
>+  if (adev->in_gpu_reset || adev->in_suspend) {
>+  r = adev->ip_blocks[i].version->funcs-
>>resume(adev);
>+  if (r) {
>+  DRM_ERROR("resume of IP block <%s>
>failed %d\n",
> adev-
>>ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>-  } else {
>-  r = adev->ip_blocks[i].version->funcs-
>>hw_init(adev);
>-  if (r) {
>-  DRM_ERROR("hw_init of IP
>block <%s> failed %d\n",
>-adev->ip_blocks[i].version-
>>funcs->name, r);
>-  return r;
>-  }
>+  return r;
>+  }
>+  } else {
>+  r = adev->ip_blocks[i].version->funcs-
>>hw_init(adev);
>+  if (r) {
>+  DRM_ERROR("hw_init of IP block <%s>
>failed %d\n",
>+adev-
>>ip_blocks[i].version->funcs->name, r);
>+  return r;
>   }
>-  adev->ip_blocks[i].status.hw = true;
>   }
>+
>+  adev->ip_blocks[i].status.hw = true;
>+  break;
>   }
>   }
>+
>   r = amdgpu_pm_load_smu_firmware(adev, &smu_version);
>
>   return r;
>@@ -2136,7 +2142,9 @@ static int
>amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>   if (r) {
>   DRM_ERROR("suspend of IP block <%s>
>failed %d\n",
> adev->ip_blocks[i].version->funcs-
>>name, r);
>+  return r;
>   }
>+  adev->ip_blocks[i].status.hw = false;
>   }
>   }
>
>@@ -2176,14 +2184,16 @@ static int
>amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
>   if (is_support_sw_smu(adev)) {
>   /* todo */
>   } else if (adev->powerplay.pp_funcs &&
>- adev->powerplay.pp_funcs->set_mp1_state)
>{
>+ adev->powerplay.pp_funcs-
>>set_mp1_state) 
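The core of the fix, clearing status.hw up front and skipping the PSP firmware load when it is already set, is a standard idempotent-init guard. A minimal sketch (hypothetical toy code, not the driver's):

```c
#include <assert.h>

/* plays the role of ip_blocks[i].status.hw */
struct toy_ip_block {
    int hw;
};

static int demo_init_calls;
static int demo_init(void) { demo_init_calls++; return 0; }

/* Run init_fn at most once per hw-up cycle: <0 on failure, 0 if the
 * block was already initialized (duplicate fw load skipped), 1 if init
 * actually ran. */
int hw_init_once(struct toy_ip_block *blk, int (*init_fn)(void))
{
    if (blk->hw)
        return 0;
    if (init_fn())
        return -1;
    blk->hw = 1;
    return 1;
}
```

Suspend or reset corresponds to clearing the flag again, which is exactly what v3 of the patch does in the suspend path before the next firmware load.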

RE: [PATCH] drm/ttm: Fix the memory delay free issue

2019-07-15 Thread Deng, Emily
Hi Christian,
 Do you think we could free all the BOs that are on the current destroy list 
once the current resv signals in ttm_bo_cleanup_refs?

Best wishes
Emily Deng
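The delayed-free behavior under discussion, where a BO sharing a per-VM reservation object can only be freed once the newest fence on that object signals, so frees lag while submissions keep appending fences, can be modeled with a toy sketch (illustrative only, not TTM's actual fence tracking):

```c
#include <assert.h>
#include <stdint.h>

/* Toy reservation object: remembers only the newest fence seqno. */
struct toy_resv { uint64_t newest_fence; };
struct toy_bo   { struct toy_resv *resv; int freed; };

void toy_resv_add_fence(struct toy_resv *r, uint64_t seqno)
{
    if (seqno > r->newest_fence)
        r->newest_fence = seqno;
}

/* Deferred free: succeeds only once every fence on the shared resv has
 * signaled (i.e. the hardware reached 'signaled_seqno'). */
int toy_bo_try_free(struct toy_bo *bo, uint64_t signaled_seqno)
{
    if (bo->resv->newest_fence > signaled_seqno)
        return 0;          /* still busy -- the free stays delayed */
    bo->freed = 1;
    return 1;
}
```

Each new submission pushes newest_fence forward, so as long as the test keeps creating BOs faster than fences signal, no BO on the shared resv ever becomes freeable, which is the exhaustion Emily observes.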

>-Original Message-
>From: Koenig, Christian 
>Sent: Monday, July 15, 2019 5:41 PM
>To: Deng, Emily ; Zhou, David(ChunMing)
>
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>
>> Do you think we don't need to fix it?
>No, when the application is exhausting memory then we can't expect anything
>else here.
>
>See memory freeing is always delayed until it isn't used any more or when the
>process is killed after access is prevented (by clearing page tables for 
>example).
>
>What we could do is maybe look into why we don't block until the memory is
>freed during command submission, but apart from that this sounds like
>perfectly expected behavior.
>
>Regards,
>Christian.
>
>On 15.07.19 at 11:34, Deng, Emily wrote:
>> Hi Christian,
>>  Given this behavior, when running the Vulkan CTS allocation test, it
>> will exhaust memory and cause an out-of-memory condition. Do you think
>> we don't need to fix it?
>>
>> Best wishes
>> Emily Deng
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Monday, July 15, 2019 5:31 PM
>>> To: Deng, Emily ; Zhou, David(ChunMing)
>>> 
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>>>
>>> Hi guys,
>>>
>>>> Do you have any suggestion about this? For per vm bo, it seems
>>>> always
>>> delay to free the ttm bo.
>>> Yeah, and that is correct behavior.
>>>
>>> Since we don't know who is using a per-vm BO we need to wait for all
>>> command submissions in flight when it is freed.
>>>
>>> For this we copy the current state of the shared reservation object
>>> into the private one in ttm_bo_individualize_resv.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 15.07.19 at 08:49, Deng, Emily wrote:
>>>> Hi David,
>>>>You are right, it will copy per-vm resv.
>>>>But currently, it still has the delayed-free issue which a non
>>>> per-vm bo doesn't
>>>> have. Maybe new fences were already appended to this resv object before
>>>> the copy.
>>>> Hi Christian,
>>>>   Do you have any suggestion about this? For per vm bo, it seems
>>>> always
>>> delay to free the ttm bo.
>>>> Best wishes
>>>> Emily Deng
>>>>> -Original Message-
>>>>> From: Zhou, David(ChunMing) 
>>>>> Sent: Wednesday, July 10, 2019 9:28 PM
>>>>> To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>>>>>
>>>>> It doesn't make sense that freeing BO still uses per-vm resv.
>>>>>
>>>>> I remember when BO is in release list, its resv will be from per-vm resv
>copy.
>>>>> Could you check it?
>>>>>
>>>>> -David
>>>>>
>>>>> On 2019/7/10 at 17:29, Emily Deng wrote:
>>>>>> For vulkan cts allocation test cases, they will create a series of
>>>>>> bos, and then free them. As it has lots of allocation test cases
>>>>>> with the same vm, as per vm bo feature enable, all of those bos'
>>>>>> resv are the same. But the bo free is quite slow, as they use the
>>>>>> same resv object, for every time, free a bo, it will check the
>>>>>> resv whether signal, if it signal, then will free it. But as the
>>>>>> test cases will continue to create bo, and the resv fence is
>>>>>> increasing. So the free is more
>>>>> slower than creating. It will cause memory exhausting.
>>>>>> Method:
>>>>>> When the resv signal, release all the bos which are use the same
>>>>>> resv object.
>>>>>>
>>>>>> Signed-off-by: Emily Deng 
>>>>>> ---
>>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 29 
>-
>>>>>> 1 file changed, 24 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index f9a3d4c..57ec59b 100644
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
&

RE: [PATCH] drm/ttm: Fix the memory delay free issue

2019-07-15 Thread Deng, Emily
Hi Christian,
With this behavior, running the Vulkan CTS allocation tests exhausts 
memory and leads to out-of-memory failures. Do you think this doesn't need 
to be fixed?

Best wishes
Emily Deng
>-Original Message-
>From: Koenig, Christian 
>Sent: Monday, July 15, 2019 5:31 PM
>To: Deng, Emily ; Zhou, David(ChunMing)
>
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>
>Hi guys,
>
>> Do you have any suggestion about this? For per vm bo, it seems always
>delay to free the ttm bo.
>Yeah, and that is correct behavior.
>
>Since we don't know who is using a per-vm BO we need to wait for all
>command submissions in flight when it is freed.
>
>For this we copy the current state of the shared reservation object into the
>private one in ttm_bo_individualize_resv.
>
>Regards,
>Christian.
>
>Am 15.07.19 um 08:49 schrieb Deng, Emily:
>> Hi David,
>>   You are right, it will copy per-vm resv.
>>   But currently, it still has the delay free issue which non per vm bo 
>> doesn't
>has. Maybe it already has new fences append to this resv object before copy.
>>
>> Hi Christian,
>>  Do you have any suggestion about this? For per vm bo, it seems always
>delay to free the ttm bo.
>>
>> Best wishes
>> Emily Deng
>>> -Original Message-
>>> From: Zhou, David(ChunMing) 
>>> Sent: Wednesday, July 10, 2019 9:28 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>>>
>>> It doesn't make sense that freeing BO still uses per-vm resv.
>>>
>>> I remember when BO is in release list, its resv will be from per-vm resv 
>>> copy.
>>> Could you check it?
>>>
>>> -David
>>>
>>> 在 2019/7/10 17:29, Emily Deng 写道:
>>>> For vulkan cts allocation test cases, they will create a series of
>>>> bos, and then free them. As it has lots of alloction test cases with
>>>> the same vm, as per vm bo feature enable, all of those bos' resv are
>>>> the same. But the bo free is quite slow, as they use the same resv
>>>> object, for every time, free a bo, it will check the resv whether
>>>> signal, if it signal, then will free it. But as the test cases will
>>>> continue to create bo, and the resv fence is increasing. So the free
>>>> is more
>>> slower than creating. It will cause memory exhausting.
>>>> Method:
>>>> When the resv signal, release all the bos which are use the same
>>>> resv object.
>>>>
>>>> Signed-off-by: Emily Deng 
>>>> ---
>>>>drivers/gpu/drm/ttm/ttm_bo.c | 29 -
>>>>1 file changed, 24 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index f9a3d4c..57ec59b 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -543,6 +543,7 @@ static int ttm_bo_cleanup_refs(struct
>>> ttm_buffer_object *bo,
>>>>{
>>>>struct ttm_bo_global *glob = bo->bdev->glob;
>>>>struct reservation_object *resv;
>>>> +  struct ttm_buffer_object *resv_bo, *resv_bo_next;
>>>>int ret;
>>>>
>>>>if (unlikely(list_empty(&bo->ddestroy)))
>>>> @@ -566,10 +567,14 @@ static int ttm_bo_cleanup_refs(struct
>>> ttm_buffer_object *bo,
>>>>   
>>>> interruptible,
>>>>   30 * HZ);
>>>>
>>>> -  if (lret < 0)
>>>> +  if (lret < 0) {
>>>> +  kref_put(&bo->list_kref, ttm_bo_release_list);
>>>>return lret;
>>>> -  else if (lret == 0)
>>>> +  }
>>>> +  else if (lret == 0) {
>>>> +  kref_put(&bo->list_kref, ttm_bo_release_list);
>>>>return -EBUSY;
>>>> +  }
>>>>
>>>>spin_lock(&glob->lru_lock);
>>>>if (unlock_resv && 
>>>> !kcl_reservation_object_trylock(bo->resv))
>>> { @@
>>>> -582,6 +587,7 @@ static int ttm_bo_cleanup_refs(struct
>>>> ttm_buffer_o

RE: [PATCH] drm/ttm: Fix the memory delay free issue

2019-07-15 Thread Deng, Emily
Hi David,
 You are right, it will copy the per-vm resv.
 But currently it still has the delayed-free issue, which non-per-vm BOs 
don't have. Maybe new fences were already appended to this resv object before 
the copy.

Hi Christian,
Do you have any suggestion about this? For per-vm BOs, freeing the ttm BO 
always seems to be delayed.

Best wishes
Emily Deng
>-Original Message-
>From: Zhou, David(ChunMing) 
>Sent: Wednesday, July 10, 2019 9:28 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/ttm: Fix the memory delay free issue
>
>It doesn't make sense that freeing BO still uses per-vm resv.
>
>I remember when BO is in release list, its resv will be from per-vm resv copy.
>Could you check it?
>
>-David
>
>在 2019/7/10 17:29, Emily Deng 写道:
>> The Vulkan CTS allocation test cases create a series of BOs and then
>> free them. Since there are many allocation test cases using the same
>> VM, and the per-VM BO feature is enabled, all of those BOs share the
>> same resv object. But freeing a BO is quite slow: each time a BO is
>> freed, the shared resv is checked, and the BO is released only once
>> the resv has signaled. As the test cases keep creating BOs, the number
>> of fences on the resv keeps growing, so freeing becomes slower than
>creation. This exhausts memory.
>>
>> Method:
>> When the resv signals, release all the BOs that use the same resv
>> object.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c | 29 -
>>   1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>> b/drivers/gpu/drm/ttm/ttm_bo.c index f9a3d4c..57ec59b 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -543,6 +543,7 @@ static int ttm_bo_cleanup_refs(struct
>ttm_buffer_object *bo,
>>   {
>>  struct ttm_bo_global *glob = bo->bdev->glob;
>>  struct reservation_object *resv;
>> +struct ttm_buffer_object *resv_bo, *resv_bo_next;
>>  int ret;
>>
>>  if (unlikely(list_empty(&bo->ddestroy)))
>> @@ -566,10 +567,14 @@ static int ttm_bo_cleanup_refs(struct
>ttm_buffer_object *bo,
>> interruptible,
>> 30 * HZ);
>>
>> -if (lret < 0)
>> +if (lret < 0) {
>> +kref_put(&bo->list_kref, ttm_bo_release_list);
>>  return lret;
>> -else if (lret == 0)
>> +}
>> +else if (lret == 0) {
>> +kref_put(&bo->list_kref, ttm_bo_release_list);
>>  return -EBUSY;
>> +}
>>
>>  spin_lock(&glob->lru_lock);
>>  if (unlock_resv && !kcl_reservation_object_trylock(bo->resv))
>{ @@
>> -582,6 +587,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object
>*bo,
>>   * here.
>>   */
>>  spin_unlock(&glob->lru_lock);
>> +kref_put(&bo->list_kref, ttm_bo_release_list);
>>  return 0;
>>  }
>>  ret = 0;
>> @@ -591,15 +597,29 @@ static int ttm_bo_cleanup_refs(struct
>ttm_buffer_object *bo,
>>  if (unlock_resv)
>>  kcl_reservation_object_unlock(bo->resv);
>>  spin_unlock(&glob->lru_lock);
>> +kref_put(&bo->list_kref, ttm_bo_release_list);
>>  return ret;
>>  }
>>
>>  ttm_bo_del_from_lru(bo);
>>  list_del_init(&bo->ddestroy);
>>  kref_put(&bo->list_kref, ttm_bo_ref_bug);
>> -
>>  spin_unlock(&glob->lru_lock);
>>  ttm_bo_cleanup_memtype_use(bo);
>> +kref_put(&bo->list_kref, ttm_bo_release_list);
>> +
>> +spin_lock(&glob->lru_lock);
>> +list_for_each_entry_safe(resv_bo, resv_bo_next, &bo->bdev->ddestroy, ddestroy) {
>> +if (resv_bo->resv == bo->resv) {
>> +ttm_bo_del_from_lru(resv_bo);
>> +list_del_init(&resv_bo->ddestroy);
>> +spin_unlock(&glob->lru_lock);
>> +ttm_bo_cleanup_memtype_use(resv_bo);
>> +kref_put(&resv_bo->list_kref, ttm_bo_release_list);
>> +spin_lock(&glob->lru_lock);
>> +}
>> +}
>> +spin_unlock(&glob->lru_lock);
>>
>>  if (un
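[Editor's note] The thread above describes many BOs sharing one reservation object, with each free checking an ever-growing fence list while new BOs keep arriving. The following toy C model is only an illustration of that idea — the structures and names (`toy_bo`, `toy_resv`, `cleanup_one`, `cleanup_batch`) are invented and are not the real TTM API. It contrasts one-at-a-time cleanup with the batched release the patch proposes:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: many BOs share one reservation object, and a BO may only
 * be released once that resv has signalled (pending_fences == 0). */
struct toy_resv {
    int pending_fences;        /* fences still in flight */
};

struct toy_bo {
    struct toy_resv *resv;
    struct toy_bo *next;       /* delayed-destroy list linkage */
};

/* Naive cleanup: releases at most one BO per call, so releasing N BOs
 * that share one resv needs N passes over the list. */
static struct toy_bo *cleanup_one(struct toy_bo *list, int *freed)
{
    struct toy_bo **p = &list;
    while (*p) {
        if ((*p)->resv->pending_fences == 0) {
            *p = (*p)->next;
            (*freed)++;
            return list;
        }
        p = &(*p)->next;
    }
    return list;
}

/* Batched cleanup, as the patch proposes: once one BO's resv has
 * signalled, release every BO on the list that shares that resv. */
static struct toy_bo *cleanup_batch(struct toy_bo *list, int *freed)
{
    struct toy_bo **p = &list;
    while (*p) {
        if ((*p)->resv->pending_fences == 0) {
            struct toy_resv *resv = (*p)->resv;
            struct toy_bo **q = p;
            while (*q) {
                if ((*q)->resv == resv) {
                    *q = (*q)->next;   /* unlink and count as freed */
                    (*freed)++;
                } else {
                    q = &(*q)->next;
                }
            }
            return list;
        }
        p = &(*p)->next;
    }
    return list;
}
```

With three BOs on one signalled resv, `cleanup_one` releases only the first, while `cleanup_batch` drains all three in a single pass — the behavior difference the commit message is after.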

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-06-16 Thread Deng, Emily
Hi Philip,
 Could you help to try whether the attached patch resolves the issue you 
encountered? If it works, I will send it out for review.

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily
>Sent: Monday, June 17, 2019 10:50 AM
>To: Yang, Philip ; Russell, Kent
>; Quan, Evan ; amd-
>g...@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>
>Hi Philip,
> Sorry for introduce this issue for you. From the code, I couldn't see any
>issue. And I have tested the code in my Vega10, it is OK.  So I think this is 
>the
>kfd specific issue, but I couldn't reproduce issue on my platform. Could you
>create an ticket, and assign to me, and share me your platform, so I could
>debug it and fix it today.
>
>Best wishes
>Emily Deng
>
>>-Original Message-
>>From: Yang, Philip 
>>Sent: Friday, June 14, 2019 10:16 PM
>>To: Deng, Emily ; Russell, Kent
>>; Quan, Evan ; amd-
>>g...@lists.freedesktop.org
>>Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>>reset
>>
>>Hi Emily,
>>
>>I am not familiar with vbios and driver init part, just based on my
>>experience, the patch don't modify amdgpu_get_bios but move
>>amdgpu_get_bios to amdgpu_device_ip_early_init from amdgpu_device_init,
>>so amdgpu_get_bios is executed earlier. The kernel error message "BUG:
>>kernel NULL pointer dereference" means something is not initialized.
>>Please review the change. This issue blocks rocm release now.
>>
>>Regards,
>>Philip
>>
>>On 2019-06-13 11:19 p.m., Deng, Emily wrote:
>>> Hi Russell,
>>>   This patch will read vbios, and parse vbios to get the baco
>>> reset feature
>>bit.  From the call trace, it shows error in " amdgpu_get_bios ", but
>>this patch don't modify amdgpu_get_bios, and code before
>>amdgpu_get_bios. Please first check why it will has error when read vbios.
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -Original Message-
>>>> From: Russell, Kent 
>>>> Sent: Thursday, June 13, 2019 7:11 PM
>>>> To: Quan, Evan ; Deng, Emily
>>;
>>>> amd-gfx@lists.freedesktop.org
>>>> Cc: Deng, Emily 
>>>> Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before
>>>> baco reset
>>>>
>>>> Hi Emily,
>>>>
>>>> This patch caused a regression on MI25 (Chip 6860, VBIOS
>>>> 113-D0513700-001) machines where the driver would not boot. Note
>>>> that this was not seen on
>>>> Vega10 Frontier (Chip 6863, VBIOS 113-D0501100-X09) or Radeon64
>>>> (Chip 697f). Reverting this patch resolved the issue with no other
>>>> work required and was confirmed on all 3 machines.
>>>>
>>>> Here is the dmesg:
>>>>
>>>> [3.908653] amdgpu :23:00.0: BAR 6: can't assign [??? 0x
>>flags
>>>> 0x2000] (bogus alignment)
>>>> [3.908692] BUG: kernel NULL pointer dereference, address:
>>>> 0008
>>>> [3.908716] #PF: supervisor read access in kernel mode
>>>> [3.908734] #PF: error_code(0x) - not-present page
>>>> [3.908753] PGD 0 P4D 0
>>>> [3.908767] Oops:  [#1] SMP NOPTI
>>>> [3.909293] CPU: 8 PID: 409 Comm: kworker/8:1 Not tainted 5.2.0-rc1-
>kfd-
>>>> compute-roc-master-10734 #1
>>>> [3.909753] Hardware name: Inventec P47
>>WC2071019001
>>>> /P47 , BIOS 0.64 04/09/2018
>>>> [3.910534] Workqueue: events work_for_cpu_fn
>>>> [3.910953] RIP: 0010:amdgpu_get_bios+0x3aa/0x580 [amdgpu]
>>>> [3.911314] Code: c0 48 c7 44 24 5c 00 00 00 00 48 c7 84 24 90 00 00 00
>00
>>00
>>>> 00 00 48 89 d9 48 29 f9 83 c1 3c c1 e9 03 f3 48 ab 49 8b 44 24 38
>>>> <48> 8b 40 08
>>>> 48 85 c0 74 24 ba 3c 00 00 00 48 89 de 4c 89 e7 ff d0
>>>> [3.912069] RSP: 0018:a27dce28fc50 EFLAGS: 00010212
>>>> [3.912502] RAX:  RBX: a27dce28fcac RCX:
>>>> 
>>>> [3.912980] RDX:  RSI: 0082 RDI:
>>>> a27dce28fce8
>>>> [3.913467] RBP:  R08: 0001 R09:
>>>> 079a
>>>> [3.913940] R10:  R11: 0001 R12:
>>>> 88d657af
>>&

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-06-16 Thread Deng, Emily
Hi Philip,
 Sorry for introducing this issue. From the code, I couldn't see any 
problem, and I have tested it on my Vega10, where it is OK. So I think this is 
a kfd-specific issue, but I couldn't reproduce it on my platform. Could 
you create a ticket, assign it to me, and share your platform with me, so that 
I can debug and fix it today.

Best wishes
Emily Deng

>-Original Message-
>From: Yang, Philip 
>Sent: Friday, June 14, 2019 10:16 PM
>To: Deng, Emily ; Russell, Kent
>; Quan, Evan ; amd-
>g...@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>
>Hi Emily,
>
>I am not familiar with vbios and driver init part, just based on my experience,
>the patch don't modify amdgpu_get_bios but move amdgpu_get_bios to
>amdgpu_device_ip_early_init from amdgpu_device_init, so amdgpu_get_bios
>is executed earlier. The kernel error message "BUG:
>kernel NULL pointer dereference" means something is not initialized.
>Please review the change. This issue blocks rocm release now.
>
>Regards,
>Philip
>
>On 2019-06-13 11:19 p.m., Deng, Emily wrote:
>> Hi Russell,
>>   This patch will read vbios, and parse vbios to get the baco reset 
>> feature
>bit.  From the call trace, it shows error in " amdgpu_get_bios ", but this 
>patch
>don't modify amdgpu_get_bios, and code before amdgpu_get_bios. Please
>first check why it will has error when read vbios.
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -Original Message-
>>> From: Russell, Kent 
>>> Sent: Thursday, June 13, 2019 7:11 PM
>>> To: Quan, Evan ; Deng, Emily
>;
>>> amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily 
>>> Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>>> reset
>>>
>>> Hi Emily,
>>>
>>> This patch caused a regression on MI25 (Chip 6860, VBIOS
>>> 113-D0513700-001) machines where the driver would not boot. Note that
>>> this was not seen on
>>> Vega10 Frontier (Chip 6863, VBIOS 113-D0501100-X09) or Radeon64 (Chip
>>> 697f). Reverting this patch resolved the issue with no other work
>>> required and was confirmed on all 3 machines.
>>>
>>> Here is the dmesg:
>>>
>>> [3.908653] amdgpu :23:00.0: BAR 6: can't assign [??? 0x
>flags
>>> 0x2000] (bogus alignment)
>>> [3.908692] BUG: kernel NULL pointer dereference, address:
>>> 0008
>>> [3.908716] #PF: supervisor read access in kernel mode
>>> [3.908734] #PF: error_code(0x) - not-present page
>>> [3.908753] PGD 0 P4D 0
>>> [3.908767] Oops:  [#1] SMP NOPTI
>>> [3.909293] CPU: 8 PID: 409 Comm: kworker/8:1 Not tainted 5.2.0-rc1-kfd-
>>> compute-roc-master-10734 #1
>>> [3.909753] Hardware name: Inventec P47
>WC2071019001
>>> /P47 , BIOS 0.64 04/09/2018
>>> [3.910534] Workqueue: events work_for_cpu_fn
>>> [3.910953] RIP: 0010:amdgpu_get_bios+0x3aa/0x580 [amdgpu]
>>> [3.911314] Code: c0 48 c7 44 24 5c 00 00 00 00 48 c7 84 24 90 00 00 00 
>>> 00
>00
>>> 00 00 48 89 d9 48 29 f9 83 c1 3c c1 e9 03 f3 48 ab 49 8b 44 24 38
>>> <48> 8b 40 08
>>> 48 85 c0 74 24 ba 3c 00 00 00 48 89 de 4c 89 e7 ff d0
>>> [3.912069] RSP: 0018:a27dce28fc50 EFLAGS: 00010212
>>> [3.912502] RAX:  RBX: a27dce28fcac RCX:
>>> 
>>> [3.912980] RDX:  RSI: 0082 RDI:
>>> a27dce28fce8
>>> [3.913467] RBP:  R08: 0001 R09:
>>> 079a
>>> [3.913940] R10:  R11: 0001 R12:
>>> 88d657af
>>> [3.914349] R13: c0c38120 R14: a27dce28fc68 R15:
>>> 88d657af
>>> [3.914767] FS:  () GS:88d65f40()
>>> knlGS:
>>> [3.915203] CS:  0010 DS:  ES:  CR0: 80050033
>>> [3.915637] CR2: 0008 CR3: 003e7540a000 CR4:
>>> 003406e0
>>> [3.916075] Call Trace:
>>> [3.916522]  ? pcie_capability_clear_and_set_word+0x53/0x80
>>> [3.917014]  amdgpu_device_init+0x923/0x1820 [amdgpu]
>>> [3.917515]  amdgpu_driver_load_kms+0x71/0x310 [amdgpu]
>>> [3.917997]  drm_dev_register+0x113/0x1a0 [drm]
>>> [3.918514]  amdgpu_pci_probe+0xb8/0x150 [amdgpu]
>>> [3.919003]  ? __pm_runtime_resum

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-06-13 Thread Deng, Emily
Hi Russell,
 This patch reads the vbios and parses it to get the baco reset feature 
bit. The call trace shows an error in amdgpu_get_bios, but this patch 
doesn't modify amdgpu_get_bios, nor any code that runs before it. Please 
first check why reading the vbios fails.

Best wishes
Emily Deng



>-Original Message-
>From: Russell, Kent 
>Sent: Thursday, June 13, 2019 7:11 PM
>To: Quan, Evan ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>
>Hi Emily,
>
>This patch caused a regression on MI25 (Chip 6860, VBIOS 113-D0513700-001)
>machines where the driver would not boot. Note that this was not seen on
>Vega10 Frontier (Chip 6863, VBIOS 113-D0501100-X09) or Radeon64 (Chip
>697f). Reverting this patch resolved the issue with no other work required and
>was confirmed on all 3 machines.
>
>Here is the dmesg:
>
>[3.908653] amdgpu :23:00.0: BAR 6: can't assign [??? 0x flags
>0x2000] (bogus alignment)
>[3.908692] BUG: kernel NULL pointer dereference, address:
>0008
>[3.908716] #PF: supervisor read access in kernel mode
>[3.908734] #PF: error_code(0x) - not-present page
>[3.908753] PGD 0 P4D 0
>[3.908767] Oops:  [#1] SMP NOPTI
>[3.909293] CPU: 8 PID: 409 Comm: kworker/8:1 Not tainted 5.2.0-rc1-kfd-
>compute-roc-master-10734 #1
>[3.909753] Hardware name: Inventec P47  
>WC2071019001
>/P47 , BIOS 0.64 04/09/2018
>[3.910534] Workqueue: events work_for_cpu_fn
>[3.910953] RIP: 0010:amdgpu_get_bios+0x3aa/0x580 [amdgpu]
>[3.911314] Code: c0 48 c7 44 24 5c 00 00 00 00 48 c7 84 24 90 00 00 00 00 
>00
>00 00 48 89 d9 48 29 f9 83 c1 3c c1 e9 03 f3 48 ab 49 8b 44 24 38 <48> 8b 40 08
>48 85 c0 74 24 ba 3c 00 00 00 48 89 de 4c 89 e7 ff d0
>[3.912069] RSP: 0018:a27dce28fc50 EFLAGS: 00010212
>[3.912502] RAX:  RBX: a27dce28fcac RCX:
>
>[3.912980] RDX:  RSI: 0082 RDI:
>a27dce28fce8
>[3.913467] RBP:  R08: 0001 R09:
>079a
>[3.913940] R10:  R11: 0001 R12:
>88d657af
>[3.914349] R13: c0c38120 R14: a27dce28fc68 R15:
>88d657af
>[3.914767] FS:  () GS:88d65f40()
>knlGS:
>[3.915203] CS:  0010 DS:  ES:  CR0: 80050033
>[3.915637] CR2: 0008 CR3: 003e7540a000 CR4:
>003406e0
>[3.916075] Call Trace:
>[3.916522]  ? pcie_capability_clear_and_set_word+0x53/0x80
>[3.917014]  amdgpu_device_init+0x923/0x1820 [amdgpu]
>[3.917515]  amdgpu_driver_load_kms+0x71/0x310 [amdgpu]
>[3.917997]  drm_dev_register+0x113/0x1a0 [drm]
>[3.918514]  amdgpu_pci_probe+0xb8/0x150 [amdgpu]
>[3.919003]  ? __pm_runtime_resume+0x54/0x70
>[3.919270] usb 1-2: New USB device found, idVendor=14dd, idProduct=1005,
>bcdDevice= 0.00
>[3.919498]  local_pci_probe+0x3d/0x90
>[3.919503]  ? __schedule+0x3de/0x690
>[3.920374] usb 1-2: New USB device strings: Mfr=1, Product=2,
>SerialNumber=3
>[3.921137]  work_for_cpu_fn+0x10/0x20
>[3.922028] usb 1-2: Product: D2CIM-VUSB
>[3.922815]  process_one_work+0x159/0x3e0
>[3.923633] usb 1-2: Manufacturer: Raritan
>[3.923635] usb 1-2: SerialNumber: EFFB212D0A6E32F
>[3.924416]  worker_thread+0x22b/0x440
>[3.924419]  ? rescuer_thread+0x350/0x350
>[3.927812]  kthread+0xf6/0x130
>[3.928157] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>[3.928365]  ? kthread_destroy_worker+0x40/0x40
>[3.929401] ata1.00: ATA-10: INTEL SSDSC2KG960G7, SCV10100, max
>UDMA/133
>[3.930101]  ret_from_fork+0x1f/0x30
>[3.930103] Modules linked in: amdgpu(+) crct10dif_pclmul crc32_pclmul
>ghash_clmulni_intel ast amd_iommu_v2 aesni_intel gpu_sched i2c_algo_bit
>aes_x86_64 ttm crypto_simd drm_kms_helper cryptd glue_helper
>syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops ixgbe(+) drm dca
>mdio
>[3.930965] ata1.00: 1875385008 sectors, multi 1: LBA48 NCQ (depth 32)
>[3.931085] ata1.00: configured for UDMA/133
>[3.931809] CR2: 0008
>[    3.934723] scsi 0:0:0:0: Direct-Access ATA  INTEL SSDSC2KG96 0100 
>PQ:
>0 ANSI: 5
>
>
>Thanks!
>
> Kent
>
>-Original Message-
>From: amd-gfx  On Behalf Of Quan,
>Evan
>Sent: Monday, May 27, 2019 9:17 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>

RE: [PATCH v2] drm/amdgpu/display: Fix reload driver error

2019-05-29 Thread Deng, Emily
Hi Kazlauskas,
I have modified the patch as you suggested; could you please help to 
review it again?

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Wednesday, May 29, 2019 11:12 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH v2] drm/amdgpu/display: Fix reload driver error
>
>Issue:
>The following error appears when reloading the driver:
>[ 3986.567739] sysfs: cannot create duplicate filename
>'/devices/pci:00/:00:07.0/drm_dp_aux_dev'
>[ 3986.567743] CPU: 6 PID: 1767 Comm: modprobe Tainted: G   OE 
>5.0.0-
>rc1-custom #1
>[ 3986.567745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3986.567746] Call Trace:
>..
>[ 3986.567808]  drm_dp_aux_register_devnode+0xdc/0x140
>[drm_kms_helper] ..
>[ 3986.569081] kobject_add_internal failed for drm_dp_aux_dev with -EEXIST,
>don't try to register things with the same name in the same directory.
>
>Reproduce sequences:
>1.modprobe amdgpu
>2.modprobe -r amdgpu
>3.modprobe amdgpu
>
>Root cause:
>When unloading the driver, the aux is not unregistered.
>
>v2: Don't use has_aux
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15
>++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>index 8fe1685..941313b 100644
>--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>@@ -3760,6 +3760,13 @@ int
>amdgpu_dm_connector_atomic_get_property(struct drm_connector
>*connector,
>   return ret;
> }
>
>+static void amdgpu_dm_connector_unregister(struct drm_connector
>+*connector) {
>+  struct amdgpu_dm_connector *amdgpu_dm_connector =
>+to_amdgpu_dm_connector(connector);
>+
>+  drm_dp_aux_unregister(&amdgpu_dm_connector->dm_dp_aux.aux);
>+}
>+
> static void amdgpu_dm_connector_destroy(struct drm_connector
>*connector)  {
>   struct amdgpu_dm_connector *aconnector =
>to_amdgpu_dm_connector(connector);
>@@ -3788,6 +3795,11 @@ static void amdgpu_dm_connector_destroy(struct
>drm_connector *connector)
>   drm_dp_cec_unregister_connector(&aconnector->dm_dp_aux.aux);
>   drm_connector_unregister(connector);
>   drm_connector_cleanup(connector);
>+  if (aconnector->i2c) {
>+  i2c_del_adapter(&aconnector->i2c->base);
>+  kfree(aconnector->i2c);
>+  }
>+
>   kfree(connector);
> }
>
>@@ -3846,7 +3858,8 @@ static const struct drm_connector_funcs
>amdgpu_dm_connector_funcs = {
>   .atomic_duplicate_state =
>amdgpu_dm_connector_atomic_duplicate_state,
>   .atomic_destroy_state =
>drm_atomic_helper_connector_destroy_state,
>   .atomic_set_property =
>amdgpu_dm_connector_atomic_set_property,
>-  .atomic_get_property =
>amdgpu_dm_connector_atomic_get_property
>+  .atomic_get_property =
>amdgpu_dm_connector_atomic_get_property,
>+  .early_unregister = amdgpu_dm_connector_unregister
> };
>
> static int get_modes(struct drm_connector *connector)
>--
>2.7.4
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
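[Editor's note] The "cannot create duplicate filename" failure in the log above is the classic sysfs pattern: a node registered on load is never unregistered on unload, so the next load collides with the stale name. As a rough illustration only — `toy_register`/`toy_unregister` are invented names, not the sysfs or DRM API — a tiny C registry reproduces the reload behavior:

```c
#include <assert.h>
#include <string.h>

#define EEXIST 17
#define MAX_NODES 8

/* Toy name registry that, like sysfs, refuses duplicate names. */
static const char *nodes[MAX_NODES];

static int toy_register(const char *name)
{
    int i, slot = -1;
    for (i = 0; i < MAX_NODES; i++) {
        if (nodes[i] && strcmp(nodes[i], name) == 0)
            return -EEXIST;            /* "duplicate filename" */
        if (!nodes[i] && slot < 0)
            slot = i;                  /* remember first free slot */
    }
    if (slot < 0)
        return -1;                     /* table full */
    nodes[slot] = name;
    return 0;
}

static void toy_unregister(const char *name)
{
    int i;
    for (i = 0; i < MAX_NODES; i++)
        if (nodes[i] && strcmp(nodes[i], name) == 0)
            nodes[i] = NULL;
}
```

Registering the same name twice without an intervening unregister fails with -EEXIST, which is exactly the modprobe / modprobe -r / modprobe sequence in the commit message; the patch's `early_unregister` hook plays the role of `toy_unregister` here.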

RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer

2019-05-29 Thread Deng, Emily
No problem. Thanks for your reviewing.

Best wishes
Emily Deng
From: Christian König 
Sent: Wednesday, May 29, 2019 3:54 PM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Quan, Evan 
Subject: Re: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer

Sorry for the delay, your patch simply got stuck in the daily wave of mails.

Reviewed-by: Christian König 
<mailto:christian.koe...@amd.com>

Regards,
Christian.

Am 29.05.19 um 05:07 schrieb Deng, Emily:

Hi Christian,

 I have reverted the previous change as you suggested and sent this new 
patch; could you help to review it?



Best wishes

Emily Deng







-Original Message-

From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Deng,

Emily

Sent: Wednesday, May 29, 2019 10:52 AM

To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>

Subject: RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer



Ping..



Best wishes

Emily Deng







-Original Message-

From: Deng, Emily <mailto:emily.d...@amd.com>

Sent: Tuesday, May 28, 2019 6:14 PM

To: Deng, Emily <mailto:emily.d...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>

Subject: RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer



Ping ..



Best wishes

Emily Deng







-Original Message-

From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of

Emily Deng

Sent: Tuesday, May 28, 2019 4:06 PM

To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>

Cc: Deng, Emily <mailto:emily.d...@amd.com>

Subject: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer



As clear_state_obj is destroyed, and also unpinned, in gfx_v9_0_sw_fini,

there is no need to call amdgpu_bo_free_kernel there as well, or an

unpin warning will be triggered.



Signed-off-by: Emily Deng <mailto:emily.d...@amd.com>

---

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +---

1 file changed, 1 insertion(+), 3 deletions(-)



diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

index c763733..cc5a382 100644

--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

@@ -1794,9 +1794,7 @@ static int gfx_v9_0_sw_fini(void *handle)



  gfx_v9_0_mec_fini(adev);

  gfx_v9_0_ngg_fini(adev);

- amdgpu_bo_free_kernel(&adev->gfx.rlc.clear_state_obj,

-&adev->gfx.rlc.clear_state_gpu_addr,

-(void **)&adev->gfx.rlc.cs_ptr);

+ amdgpu_bo_unref(&adev->gfx.rlc.clear_state_obj);

  if (adev->asic_type == CHIP_RAVEN) {

   amdgpu_bo_free_kernel(&adev->gfx.rlc.cp_table_obj,

   &adev->gfx.rlc.cp_table_gpu_addr,

--

2.7.4




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
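[Editor's note] The thread below converges on replacing `amdgpu_bo_free_kernel` (which unpins as part of freeing) with `amdgpu_bo_unref`, because the csb buffer was already unpinned elsewhere in teardown. A minimal sketch — with invented names (`toy_bo`, `toy_unpin`, etc.) standing in for the real amdgpu helpers — shows why the extra unpin trips a warning:

```c
#include <assert.h>

/* Toy model of a pinned buffer object: unpinning below a pin count of
 * zero is the "unpin warning" condition from the driver log. */
struct toy_bo {
    int pin_count;
    int warned;
};

static void toy_unpin(struct toy_bo *bo)
{
    if (bo->pin_count == 0) {
        bo->warned = 1;        /* would WARN in the real driver */
        return;
    }
    bo->pin_count--;
}

/* free_kernel-style teardown: unpins as part of freeing. */
static void toy_free_kernel(struct toy_bo *bo)
{
    toy_unpin(bo);
}

/* plain unref-style teardown: drops the reference without touching the
 * pin count (the actual refcount drop is elided in this sketch). */
static void toy_unref(struct toy_bo *bo)
{
    (void)bo;
}
```

If hw_fini has already unpinned the buffer, a free_kernel-style sw_fini unpins a second time and warns; an unref-style sw_fini does not, which is the shape of the fix in this thread.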

RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer

2019-05-28 Thread Deng, Emily
Hi Christian,
 I have reverted the previous change as you suggested and sent this new 
patch; could you help to review it?

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Wednesday, May 29, 2019 10:52 AM
>To: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>
>Ping..
>
>Best wishes
>Emily Deng
>
>
>
>>-Original Message-
>>From: Deng, Emily 
>>Sent: Tuesday, May 28, 2019 6:14 PM
>>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>Subject: RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>>
>>Ping ..
>>
>>Best wishes
>>Emily Deng
>>
>>
>>
>>>-Original Message-
>>>From: amd-gfx  On Behalf Of
>>>Emily Deng
>>>Sent: Tuesday, May 28, 2019 4:06 PM
>>>To: amd-gfx@lists.freedesktop.org
>>>Cc: Deng, Emily 
>>>Subject: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>>>
>>>As it will destroy clear_state_obj, and also will unpin it in the
>>>gfx_v9_0_sw_fini, so don't need to call amdgpu_bo_free_kernel in
>>>gfx_v9_0_sw_fini, or it will have unpin warning.
>>>
>>>Signed-off-by: Emily Deng 
>>>---
>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +---
>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>index c763733..cc5a382 100644
>>>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>@@ -1794,9 +1794,7 @@ static int gfx_v9_0_sw_fini(void *handle)
>>>
>>> gfx_v9_0_mec_fini(adev);
>>> gfx_v9_0_ngg_fini(adev);
>>>-amdgpu_bo_free_kernel(&adev->gfx.rlc.clear_state_obj,
>>>-&adev->gfx.rlc.clear_state_gpu_addr,
>>>-(void **)&adev->gfx.rlc.cs_ptr);
>>>+amdgpu_bo_unref(&adev->gfx.rlc.clear_state_obj);
>>> if (adev->asic_type == CHIP_RAVEN) {
>>> amdgpu_bo_free_kernel(&adev->gfx.rlc.cp_table_obj,
>>> &adev->gfx.rlc.cp_table_gpu_addr,
>>>--
>>>2.7.4
>>>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
--- Begin Message ---
>-Original Message-
>From: Koenig, Christian 
>Sent: Tuesday, May 28, 2019 3:43 PM
>To: Deng, Emily ; Quan, Evan
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>
>Am 28.05.19 um 09:38 schrieb Deng, Emily:
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Tuesday, May 28, 2019 3:04 PM
>>> To: Quan, Evan ; Deng, Emily
>;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>>>
>>> Ok in this case the patch is a NAK.
>>>
>>> The correct solution is to stop using amdgpu_bo_free_kernel in
>>> gfx_v9_0_sw_fini.
>> So we just lead the memory leak here and not destroy the bo? I don't think
>it is correct.
>
>Oh, no. That's not what I meant.
>
>We should stop using amdgpu_bo_free_kernel and instead use
>amdgpu_bo_free!

>Sorry for not being clear here,
>Christian.
Thanks for your good suggestion. I will revert this patch and submit another 
one.

Best wishes
Emily Deng
>
>>> BTW: Are we using the kernel pointer somewhere? Cause that one
>became
>>> completely invalid because of patch "drm/amdgpu: pin the csb buffer
>>> on hw init".
>>>
>>> Christian.
>>>
>>> Am 28.05.19 um 03:42 schrieb Quan, Evan:
>>>> The original unpin in hw_fini was introduced by
>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-July/023681.html
>>>>
>>>> Evan
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian K?nig
>>>>> Sent: Monday, May 27, 2019 7:02 PM
>>>>> To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>>>>>
>>>>> Am 27.05

RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer

2019-05-28 Thread Deng, Emily
Ping..

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily 
>Sent: Tuesday, May 28, 2019 6:14 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>
>Ping ..
>
>Best wishes
>Emily Deng
>
>
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Emily Deng
>>Sent: Tuesday, May 28, 2019 4:06 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>>
>>As it will destroy clear_state_obj, and also will unpin it in the
>>gfx_v9_0_sw_fini, so don't need to call amdgpu_bo_free_kernel in
>>gfx_v9_0_sw_fini, or it will have unpin warning.
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +---
>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>index c763733..cc5a382 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>@@ -1794,9 +1794,7 @@ static int gfx_v9_0_sw_fini(void *handle)
>>
>>  gfx_v9_0_mec_fini(adev);
>>  gfx_v9_0_ngg_fini(adev);
>>- amdgpu_bo_free_kernel(&adev->gfx.rlc.clear_state_obj,
>>- &adev->gfx.rlc.clear_state_gpu_addr,
>>- (void **)&adev->gfx.rlc.cs_ptr);
>>+ amdgpu_bo_unref(&adev->gfx.rlc.clear_state_obj);
>>  if (adev->asic_type == CHIP_RAVEN) {
>>  amdgpu_bo_free_kernel(&adev->gfx.rlc.cp_table_obj,
>>  &adev->gfx.rlc.cp_table_gpu_addr,
>>--
>>2.7.4
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer

2019-05-28 Thread Deng, Emily
Ping ..

Best wishes
Emily Deng



>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, May 28, 2019 4:06 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu:Fix the unpin warning about csb buffer
>
>Since destroying clear_state_obj in gfx_v9_0_sw_fini will also unpin it,
>there is no need to call amdgpu_bo_free_kernel there, or it will have an
>unpin warning. Use amdgpu_bo_unref instead.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>index c763733..cc5a382 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>@@ -1794,9 +1794,7 @@ static int gfx_v9_0_sw_fini(void *handle)
>
>   gfx_v9_0_mec_fini(adev);
>   gfx_v9_0_ngg_fini(adev);
>-  amdgpu_bo_free_kernel(&adev->gfx.rlc.clear_state_obj,
>-  &adev->gfx.rlc.clear_state_gpu_addr,
>-  (void **)&adev->gfx.rlc.cs_ptr);
>+  amdgpu_bo_unref(&adev->gfx.rlc.clear_state_obj);
>   if (adev->asic_type == CHIP_RAVEN) {
>   amdgpu_bo_free_kernel(&adev->gfx.rlc.cp_table_obj,
>   &adev->gfx.rlc.cp_table_gpu_addr,
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin

2019-05-28 Thread Deng, Emily
>-Original Message-
>From: Koenig, Christian 
>Sent: Tuesday, May 28, 2019 3:43 PM
>To: Deng, Emily ; Quan, Evan
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>
>On 28.05.19 at 09:38, Deng, Emily wrote:
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Tuesday, May 28, 2019 3:04 PM
>>> To: Quan, Evan ; Deng, Emily
>;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>>>
>>> Ok in this case the patch is a NAK.
>>>
>>> The correct solution is to stop using amdgpu_bo_free_kernel in
>>> gfx_v9_0_sw_fini.
>> So we would just leak the memory here and not destroy the bo? I don't think
>> that is correct.
>
>Oh, no. That's not what I meant.
>
>We should stop using amdgpu_bo_free_kernel and instead use
>amdgpu_bo_free!

>Sorry for not being clear here,
>Christian.
Thanks for your good suggestion.  Will revert this patch, and submit another 
patch.

Best wishes
Emily Deng
>
>>> BTW: Are we using the kernel pointer somewhere? Cause that one
>became
>>> completely invalid because of patch "drm/amdgpu: pin the csb buffer
>>> on hw init".
>>>
>>> Christian.
>>>
>>>> On 28.05.19 at 03:42, Quan, Evan wrote:
>>>> The original unpin in hw_fini was introduced by
>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-July/023681.html
>>>>
>>>> Evan
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian König
>>>>> Sent: Monday, May 27, 2019 7:02 PM
>>>>> To: Deng, Emily ; amd-
>g...@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>>>>>
>>>>> On 27.05.19 at 10:41, Emily Deng wrote:
>>>>>> Because gfx_v9_0_sw_fini destroys clear_state_obj and also unpins it,
>>>>>> there is no need to call csb_vram unpin in gfx_v9_0_hw_fini as well,
>>>>>> or it will have an unpin warning.
>>>>>>
>>>>>> v2: For suspend, still need to do unpin
>>>>>>
>>>>>> Signed-off-by: Emily Deng 
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> index 5eb70e8..5b1ff48 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> @@ -3395,7 +3395,8 @@ static int gfx_v9_0_hw_fini(void *handle)
>>>>>>  gfx_v9_0_cp_enable(adev, false);
>>>>>>  adev->gfx.rlc.funcs->stop(adev);
>>>>>>
>>>>>> -gfx_v9_0_csb_vram_unpin(adev);
>>>>>> +if (adev->in_suspend)
>>>>>> +gfx_v9_0_csb_vram_unpin(adev);
>>>>> That doesn't look like a good idea to me.
>>>>>
>>>>> Why do we unpin in both the sw_fini as well as the hw_fini code
>>>>> paths?
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>  return 0;
>>>>>> }
>>>>> ___
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin

2019-05-28 Thread Deng, Emily
>-Original Message-
>From: Koenig, Christian 
>Sent: Tuesday, May 28, 2019 3:04 PM
>To: Quan, Evan ; Deng, Emily
>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>
>Ok in this case the patch is a NAK.
>
>The correct solution is to stop using amdgpu_bo_free_kernel in
>gfx_v9_0_sw_fini.
So we would just leak the memory here and not destroy the bo? I don't think
that is correct.
>
>BTW: Are we using the kernel pointer somewhere? Cause that one became
>completely invalid because of patch "drm/amdgpu: pin the csb buffer on hw
>init".
>
>Christian.
>
>On 28.05.19 at 03:42, Quan, Evan wrote:
>> The original unpin in hw_fini was introduced by
>> https://lists.freedesktop.org/archives/amd-gfx/2018-July/023681.html
>>
>> Evan
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Christian König
>>> Sent: Monday, May 27, 2019 7:02 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>>>
>>> On 27.05.19 at 10:41, Emily Deng wrote:
>>>> Because gfx_v9_0_sw_fini destroys clear_state_obj and also unpins it,
>>>> there is no need to call csb_vram unpin in gfx_v9_0_hw_fini as well,
>>>> or it will have an unpin warning.
>>>>
>>>> v2: For suspend, still need to do unpin
>>>>
>>>> Signed-off-by: Emily Deng 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
>>>>1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> index 5eb70e8..5b1ff48 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> @@ -3395,7 +3395,8 @@ static int gfx_v9_0_hw_fini(void *handle)
>>>>gfx_v9_0_cp_enable(adev, false);
>>>>adev->gfx.rlc.funcs->stop(adev);
>>>>
>>>> -  gfx_v9_0_csb_vram_unpin(adev);
>>>> +  if (adev->in_suspend)
>>>> +  gfx_v9_0_csb_vram_unpin(adev);
>>> That doesn't look like a good idea to me.
>>>
>>> Why do we unpin in both the sw_fini as well as the hw_fini code
>>> paths?
>>>
>>> Regards,
>>> Christian.
>>>
>>>>return 0;
>>>>}
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin

2019-05-26 Thread Deng, Emily
Ping..

Best wishes
Emily Deng
>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Friday, May 24, 2019 6:33 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu: Don't need to call csb_vram_unpin
>
>[CAUTION: External Email]
>
>Because gfx_v9_0_sw_fini destroys clear_state_obj and also unpins it, there
>is no need to call csb_vram unpin in gfx_v9_0_hw_fini as well, or it will
>have an unpin warning.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 16 ----------------
> 1 file changed, 16 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>index c763733..231b9e0 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>@@ -1154,20 +1154,6 @@ static int gfx_v9_0_csb_vram_pin(struct
>amdgpu_device *adev)
>return r;
> }
>
>-static void gfx_v9_0_csb_vram_unpin(struct amdgpu_device *adev)
>-{
>-   int r;
>-
>-   if (!adev->gfx.rlc.clear_state_obj)
>-   return;
>-
>-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
>-   if (likely(r == 0)) {
>-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
>-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>-   }
>-}
>-
> static void gfx_v9_0_mec_fini(struct amdgpu_device *adev)
> {
>	amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
>@@ -3385,8 +3371,6 @@ static int gfx_v9_0_hw_fini(void *handle)
>gfx_v9_0_cp_enable(adev, false);
>adev->gfx.rlc.funcs->stop(adev);
>
>-   gfx_v9_0_csb_vram_unpin(adev);
>-
>return 0;
> }
>
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-05-26 Thread Deng, Emily


>-Original Message-
>From: Alex Deucher 
>Sent: Saturday, May 25, 2019 12:59 AM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>reset
>
>[CAUTION: External Email]
>
>On Thu, May 23, 2019 at 10:29 PM Deng, Emily  wrote:
>>
>>
>>
>> >-Original Message-
>> >From: Alex Deucher 
>> >Sent: Friday, May 24, 2019 12:09 AM
>> >To: Deng, Emily 
>> >Cc: amd-gfx list 
>> >Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>> >reset
>> >
>> >[CAUTION: External Email]
>> >
>> >On Thu, May 23, 2019 at 6:22 AM Emily Deng 
>wrote:
>> >>
>> >> For passthrough, after the VM is rebooted, the driver will do a baco
>> >> reset before doing other driver initialization during driver load.
>> >> Before doing the baco reset, it first checks the baco reset capability.
>> >> So the cap needs to be set from the vbios information first, or baco
>> >> reset won't be enabled.
>> >>
>> >> Signed-off-by: Emily Deng 
>> >> ---
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 
>> >>  drivers/gpu/drm/amd/amdgpu/soc15.c |  3 ++-
>> >>  drivers/gpu/drm/amd/include/kgd_pp_interface.h |  1 +
>> >>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 16
>> >+++
>> >>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  1 +
>> >>  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 24
>> >++
>> >>  .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>> >>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  1 +
>> >>  8 files changed, 54 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> index bdd1fe73..2dde672 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> @@ -2611,6 +2611,14 @@ int amdgpu_device_init(struct
>amdgpu_device
>> >*adev,
>> >>  *  E.g., driver was not cleanly unloaded previously, etc.
>> >>  */
>> >> if (!amdgpu_sriov_vf(adev) &&
>> >> amdgpu_asic_need_reset_on_init(adev)) {
>> >> +   if (amdgpu_passthrough(adev) &&
>> >> + adev->powerplay.pp_funcs &&
>> >adev->powerplay.pp_funcs->set_asic_baco_cap) {
>> >> +   r =
>> >> + adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>> >>powerplay.pp_handle);
>> >> +   if (r) {
>> >> +   dev_err(adev->dev, "set baco capability 
>> >> failed\n");
>> >> +   goto failed;
>> >> +   }
>> >> +   }
>> >> +
>> >
>> >I think it would be cleaner to add this to hwmgr_early_init() or
>> >something called from early init for powerplay.
>> I would also have preferred to put it in hwmgr_early_init, but since
>> set_asic_baco_cap needs to get the vbios info, amdgpu_get_bios would have
>> to be moved before early init, and that change would be too big.
>
>I think this change is all you need:
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index bdd1fe73f14b..952f61e28d42 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2564,6 +2564,12 @@ int amdgpu_device_init(struct amdgpu_device
>*adev,
>
>amdgpu_device_get_pcie_info(adev);
>
>+   /* Read BIOS */
>+   if (!amdgpu_get_bios(adev)) {
>+   r = -EINVAL;
>+   goto failed;
>+   }
>+
>/* early init functions */
>r = amdgpu_device_ip_early_init(adev);
>if (r)
>@@ -2591,12 +2597,6 @@ int amdgpu_device_init(struct amdgpu_device
>*adev,
>goto fence_driver_init;
>}
>
>-   /* Read BIOS */
>-   if (!amdgpu_get_bios(adev)) {
>-   r = -EINVAL;
>-   goto failed;
>-   }
>-
>r = amdgpu_atombios_init(adev);
>if (r) {
>dev_err(adev->dev, "amdgpu_atombios_init failed\n");
>
>I guess that could be a follow-up.

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-05-24 Thread Deng, Emily
Ping ..

Best wishes
Emily Deng
>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Friday, May 24, 2019 10:29 AM
>To: Alex Deucher 
>Cc: amd-gfx list 
>Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>reset
>
>[CAUTION: External Email]
>
>>-Original Message-
>>From: Alex Deucher 
>>Sent: Friday, May 24, 2019 12:09 AM
>>To: Deng, Emily 
>>Cc: amd-gfx list 
>>Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>>reset
>>
>>[CAUTION: External Email]
>>
>>On Thu, May 23, 2019 at 6:22 AM Emily Deng  wrote:
>>>
>>> For passthrough, after the VM is rebooted, the driver will do a baco
>>> reset before doing other driver initialization during driver load.
>>> Before doing the baco reset, it first checks the baco reset capability.
>>> So the cap needs to be set from the vbios information first, or baco
>>> reset won't be enabled.
>>>
>>> Signed-off-by: Emily Deng 
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 
>>>  drivers/gpu/drm/amd/amdgpu/soc15.c |  3 ++-
>>>  drivers/gpu/drm/amd/include/kgd_pp_interface.h |  1 +
>>>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 16
>>+++
>>>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  1 +
>>>  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 24
>>++
>>>  .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>>>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  1 +
>>>  8 files changed, 54 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index bdd1fe73..2dde672 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2611,6 +2611,14 @@ int amdgpu_device_init(struct amdgpu_device
>>*adev,
>>>  *  E.g., driver was not cleanly unloaded previously, etc.
>>>  */
>>> if (!amdgpu_sriov_vf(adev) &&
>>> amdgpu_asic_need_reset_on_init(adev)) {
>>> +   if (amdgpu_passthrough(adev) &&
>>> + adev->powerplay.pp_funcs &&
>>adev->powerplay.pp_funcs->set_asic_baco_cap) {
>>> +   r =
>>> + adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>>>powerplay.pp_handle);
>>> +   if (r) {
>>> +   dev_err(adev->dev, "set baco capability 
>>> failed\n");
>>> +   goto failed;
>>> +   }
>>> +   }
>>> +
>>
>>I think it would be cleaner to add this to hwmgr_early_init() or
>>something called from early init for powerplay.
>I would also have preferred to put it in hwmgr_early_init, but since
>set_asic_baco_cap needs to get the vbios info, amdgpu_get_bios would have
>to be moved before early init, and that change would be too big.
>>
>>Alex
>>
>>> r = amdgpu_asic_reset(adev);
>>> if (r) {
>>> dev_err(adev->dev, "asic reset on init
>>> failed\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 78bd4fc..d9fdd95 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -764,7 +764,8 @@ static bool soc15_need_reset_on_init(struct
>>amdgpu_device *adev)
>>> /* Just return false for soc15 GPUs.  Reset does not seem to
>>>  * be necessary.
>>>  */
>>> -   return false;
>>> +   if (!amdgpu_passthrough(adev))
>>> +   return false;
>>>
>>> if (adev->flags & AMD_IS_APU)
>>> return false;
>>> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>>> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>>> index 9f661bf..c6e2a51 100644
>>> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>>> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>>> @@ -296,6 +296,7 @@ struct amd_pm_funcs {
>>> int (*set_hard_min_fclk_by_freq)(void *handle, uint32_t clock);
>>> int (*set_min_deep_sleep_dcefclk)(void *handle, uint32_t clock);
>>> int (

RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-05-23 Thread Deng, Emily


>-Original Message-
>From: Alex Deucher 
>Sent: Friday, May 24, 2019 12:09 AM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>reset
>
>[CAUTION: External Email]
>
>On Thu, May 23, 2019 at 6:22 AM Emily Deng  wrote:
>>
>> For passthrough, after the VM is rebooted, the driver will do a baco
>> reset before doing other driver initialization during driver load.
>> Before doing the baco reset, it first checks the baco reset capability.
>> So the cap needs to be set from the vbios information first, or baco
>> reset won't be enabled.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 
>>  drivers/gpu/drm/amd/amdgpu/soc15.c |  3 ++-
>>  drivers/gpu/drm/amd/include/kgd_pp_interface.h |  1 +
>>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 16
>+++
>>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  1 +
>>  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 24
>++
>>  .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  1 +
>>  8 files changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bdd1fe73..2dde672 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2611,6 +2611,14 @@ int amdgpu_device_init(struct amdgpu_device
>*adev,
>>  *  E.g., driver was not cleanly unloaded previously, etc.
>>  */
>> if (!amdgpu_sriov_vf(adev) &&
>> amdgpu_asic_need_reset_on_init(adev)) {
>> +   if (amdgpu_passthrough(adev) && adev->powerplay.pp_funcs &&
>adev->powerplay.pp_funcs->set_asic_baco_cap) {
>> +   r = adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>>powerplay.pp_handle);
>> +   if (r) {
>> +   dev_err(adev->dev, "set baco capability 
>> failed\n");
>> +   goto failed;
>> +   }
>> +   }
>> +
>
>I think it would be cleaner to add this to hwmgr_early_init() or something
>called from early init for powerplay.
I would also have preferred to put it in hwmgr_early_init, but since 
set_asic_baco_cap needs to get the vbios info, amdgpu_get_bios would have to 
be moved before early init, and that change would be too big.
>
>Alex
>
>> r = amdgpu_asic_reset(adev);
>> if (r) {
>> dev_err(adev->dev, "asic reset on init
>> failed\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index 78bd4fc..d9fdd95 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -764,7 +764,8 @@ static bool soc15_need_reset_on_init(struct
>amdgpu_device *adev)
>> /* Just return false for soc15 GPUs.  Reset does not seem to
>>  * be necessary.
>>  */
>> -   return false;
>> +   if (!amdgpu_passthrough(adev))
>> +   return false;
>>
>> if (adev->flags & AMD_IS_APU)
>> return false;
>> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> index 9f661bf..c6e2a51 100644
>> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> @@ -296,6 +296,7 @@ struct amd_pm_funcs {
>> int (*set_hard_min_fclk_by_freq)(void *handle, uint32_t clock);
>> int (*set_min_deep_sleep_dcefclk)(void *handle, uint32_t clock);
>> int (*get_asic_baco_capability)(void *handle, bool *cap);
>> +   int (*set_asic_baco_cap)(void *handle);
>> int (*get_asic_baco_state)(void *handle, int *state);
>> int (*set_asic_baco_state)(void *handle, int state);
>> int (*get_ppfeature_status)(void *handle, char *buf); diff
>> --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> index bea1587..9856760 100644
>> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> @@ -1404,6 +1404,21 @@ static int pp_set_active_display_count(void
>*handle

RE: [PATCH v2] drm/amdgpu: Need to set the baco cap before baco reset

2019-05-22 Thread Deng, Emily
Sorry, I have already pushed the change since Evan gave his reviewed-by. I 
will send another patch to address your review comments; is that OK?
>-Original Message-
>From: Alex Deucher 
>Sent: Thursday, May 23, 2019 11:54 AM
>To: Deng, Emily 
>Cc: amd-gfx list 
>Subject: Re: [PATCH v2] drm/amdgpu: Need to set the baco cap before baco
>reset
>
>[CAUTION: External Email]
>
>On Wed, May 22, 2019 at 11:48 PM Alex Deucher 
>wrote:
>>
>> On Wed, May 22, 2019 at 11:27 PM Emily Deng 
>wrote:
>> >
>>
>> Please include a patch description.
The commit description was lost when using "git send-email"; I will check why.
>>
>> > Signed-off-by: Emily Deng 
>> > ---
>> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
>> >  drivers/gpu/drm/amd/include/kgd_pp_interface.h |  1 +
>> >  drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 16
>
>> >  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  1 +
>> >  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 22
>++
>> >  .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>> >  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  1 +
>> >  7 files changed, 51 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > index d6286ed..5288763 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > @@ -2605,7 +2605,15 @@ int amdgpu_device_init(struct amdgpu_device
>*adev,
>> > /* check if we need to reset the asic
>> >  *  E.g., driver was not cleanly unloaded previously, etc.
>> >  */
>> > -   if (!amdgpu_sriov_vf(adev) &&
>amdgpu_asic_need_reset_on_init(adev)) {
>> > +   if (amdgpu_passthrough(adev) &&
>> > + amdgpu_asic_need_reset_on_init(adev)) {
>>
>> This will change the current behavior on baremetal.
Ok, I will move the passthrough check into "if (adev->powerplay.pp_funcs && 
adev->powerplay.pp_funcs->set_asic_baco_cap)".
>>
>> > +   if (adev->powerplay.pp_funcs && adev->powerplay.pp_funcs-
>>set_asic_baco_cap) {
>> > +   r = 
>> > adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>>powerplay.pp_handle);
>> > +   if (r) {
>> > +   dev_err(adev->dev, "set baco capability 
>> > failed\n");
>> > +   goto failed;
>> > +   }
>> > +   }
>> > +
>>
>> This will also change the current behavior on bare metal.
>>
>> I think you may want to rework this a bit otherwise this change won't
>> really make any difference due to this patch:
>> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-ne
>> xt=60ae2cd5aec94dc6459bdee5c610bb5c76a1d0ae
>> We need to avoid breaking module reload (modeprobe amdgpu; modprobe
>-r
>> amdgpu; modeprobe amdgpu).
>> I think it would be cleaner to call set_asic_baco_cap() in
>> hwmgr_early_init() and then return true in soc15_need_reset_on_init()
>> is it's passthrough mode.  then everything should just work as is.
Ok, will call set_asic_baco_cap in vega10_hwmgr_init
>Assuming you set the set_asic_baco_cap callbacks for vega20 and vega12 as
>well. Otherwise, you'll have to limit it to vega10 which starts to get messy
>IMHO.
I didn't add the callbacks for vega20 and vega12, as I have no platform to 
test them. If vega20 or vega12 need this, the callbacks can be added and 
tested then. Currently this is only for vega10.
>Alex
>
>>
>> > r = amdgpu_asic_reset(adev);
>> > if (r) {
>> > dev_err(adev->dev, "asic reset on init
>> > failed\n"); diff --git
>> > a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> > index 2b579ba..0dcc18d 100644
>> > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> > @@ -285,6 +285,7 @@ struct amd_pm_funcs {
>> > int (*set_hard_min_fclk_by_freq)(void *handle, uint32_t clock);
>> > int (*set_min_deep_sleep_dcefclk)(void *handle, uint32_t clock);
>> > int (*get_asic_baco_capability)(void *handle, bool *cap);
>> > +   in

RE: [PATCH v2] drm/amdgpu: Need to set the baco cap before baco reset

2019-05-22 Thread Deng, Emily
Hi Evan,
 If set_asic_baco_cap is not called, baco cannot be enabled, so 
amdgpu_sriov_vf needs to be modified to limit this to passthrough. And the 
commit description is in the patch, but I don't know why it was lost when 
using "git send-email".
 And as you gave the reviewed-by, I have already pushed the patch to 
drm-next.

Best wishes
Emily Deng



>-Original Message-
>From: Quan, Evan 
>Sent: Thursday, May 23, 2019 11:46 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: RE: [PATCH v2] drm/amdgpu: Need to set the baco cap before baco
>reset
>
>I would actually expect the followings
>
>   if (!amdgpu_sriov_vf(adev) && amdgpu_asic_need_reset_on_init(adev))
>{ --> no touch for this
>+if (amdgpu_passthrough(adev) && adev->powerplay.pp_funcs &&
>adev->powerplay.pp_funcs->set_asic_baco_cap) {
>+   r = adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>>powerplay.pp_handle);
>+   if (r) {
>+   dev_err(adev->dev, "set baco capability 
>failed\n");
>+   goto failed;
>+   }
>+   }
>+
>
>And btw the commit description was lost compared with v1.
>
>Regards,
>Evan
>> -----Original Message-
>> From: amd-gfx  On Behalf Of
>> Emily Deng
>> Sent: Thursday, May 23, 2019 11:27 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily 
>> Subject: [PATCH v2] drm/amdgpu: Need to set the baco cap before baco
>> reset
>>
>> [CAUTION: External Email]
>>
>> Signed-off-by: Emily Deng 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
>>  drivers/gpu/drm/amd/include/kgd_pp_interface.h |  1 +
>>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 16
>> 
>>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  1 +
>>  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 22
>> ++
>>  .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  1 +
>>  7 files changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index d6286ed..5288763 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2605,7 +2605,15 @@ int amdgpu_device_init(struct amdgpu_device
>> *adev,
>> /* check if we need to reset the asic
>>  *  E.g., driver was not cleanly unloaded previously, etc.
>>  */
>> -   if (!amdgpu_sriov_vf(adev) &&
>> amdgpu_asic_need_reset_on_init(adev)) {
>> +   if (amdgpu_passthrough(adev) &&
>> amdgpu_asic_need_reset_on_init(adev)) {
>> +   if (adev->powerplay.pp_funcs &&
>> + adev->powerplay.pp_funcs-
>> >set_asic_baco_cap) {
>> +   r =
>> + adev->powerplay.pp_funcs->set_asic_baco_cap(adev-
>> >powerplay.pp_handle);
>> +   if (r) {
>> +   dev_err(adev->dev, "set baco capability 
>> failed\n");
>> +   goto failed;
>> +   }
>> +   }
>> +
>> r = amdgpu_asic_reset(adev);
>> if (r) {
>> dev_err(adev->dev, "asic reset on init
>> failed\n"); diff --git
>> a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> index 2b579ba..0dcc18d 100644
>> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
>> @@ -285,6 +285,7 @@ struct amd_pm_funcs {
>> int (*set_hard_min_fclk_by_freq)(void *handle, uint32_t clock);
>> int (*set_min_deep_sleep_dcefclk)(void *handle, uint32_t clock);
>> int (*get_asic_baco_capability)(void *handle, bool *cap);
>> +   int (*set_asic_baco_cap)(void *handle);
>> int (*get_asic_baco_state)(void *handle, int *state);
>> int (*set_asic_baco_state)(void *handle, int state);
>> int (*get_ppfeature_status)(void *handle, char *buf); diff
>> --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> index bea1587..9856760 100644
>> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
>> +++ b/drivers/gpu/d

RE: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF

2019-04-24 Thread Deng, Emily
Hi Emil,
I don't fully understand your idea below. What about the case where there is 
only one GPU and pci_domain is not supported? In that case it still needs to 
fall back to pci_domain_ok=0.
>That aside, I think we can do a slightly better fix. Have you tried:
> - resetting the pci_domain_ok=1 on each iteration, and
> - continuing to the next device when the second
>drmSetInterfaceVersion() call fails

Best wishes
Emily Deng


>-Original Message-
>From: Emil Velikov 
>Sent: Friday, February 15, 2019 11:02 PM
>To: Deng, Emily 
>Cc: amd-gfx mailing list 
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about different domainID but
>same BDF
>
>Hi Emily,
>
>Please note that code outside of amdgpu/ is used by all open source drivers.
>Thus patches should have dri-deve@ in to/cc as mentioned in CONTRIBUTING
>
>On Thu, 14 Feb 2019 at 07:53, Emily Deng  wrote:
>>
>> For multiple GPUs which have the same BDF but different domain IDs,
>> drmOpenByBusid will return the wrong fd when starting X.
>>
>> The reproduce sequence is as below:
>> 1. Call drmOpenByBusid to open Card0; it returns the right fd0, and
>> fd0 has master privilege.
>> 2. Call drmOpenByBusid to open Card1. In drmOpenByBusid it opens Card0
>> first; this time the fd1 for Card0 does not have master privilege. It
>> then calls drmSetInterfaceVersion to identify the domain ID feature;
>> since fd1 is not master, drmSetInterfaceVersion fails, the domain ID is
>> not compared, and the wrong fd is returned for Card1.
>>
>> Solution:
>> First, loop to search for the best-matching fd using drm interface 1.4.
>>
>First and foremost, I wish we can stop using using these legacy APIs.
>They're fairly fragile and as you can see the are strange things happening.
>We could instead use drmGetDevices2() to gather a list of devices and pick the
>one we're interested in.
>
>That aside, I think we can do a slightly better fix. Have you tried:
> - resetting the pci_domain_ok=1 on each iteration, and
> - continuing to the next device when the second
>drmSetInterfaceVersion() call fails
>
>AFAICT it should produce the same result, while being shorter and faster.
>
>Thanks
>-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF

2019-04-24 Thread Deng, Emily
Hi Emil and Alex,
Sorry for miss your emails. I will update a new patch as Emil's suggestion.

Best wishes
Emily Deng



>-Original Message-
>From: Emil Velikov 
>Sent: Thursday, April 18, 2019 2:26 AM
>To: Deucher, Alexander 
>Cc: Alex Deucher ; Deng, Emily
>; Maling list - DRI developers de...@lists.freedesktop.org>; amd-gfx list 
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about different domainID but
>same BDF
>
>On Mon, 25 Feb 2019 at 19:53, Deucher, Alexander
> wrote:
>>
>> > -Original Message-
>> > From: amd-gfx  On Behalf Of
>> > Emil Velikov
>> > Sent: Monday, February 25, 2019 8:09 AM
>> > To: Alex Deucher 
>> > Cc: Deng, Emily ; Maling list - DRI developers
>> > ; amd-gfx list
>> > 
>> > Subject: Re: [PATCH libdrm] libdrm: Fix issue about different
>> > domainID but same BDF
>> >
>> > Hi all,
>> >
>This patch causes unnecessary round trips by opening the nodes. As
>> > mentioned previously this could be trivially fixed [1].
>> >
>> > Even Emily acknowledged that [1], yet the sub-par fix was merged.
>> > Can we
>> > revert+fixup this properly?
>> >
>>
>> Sorry, I totally missed your reply.  I'm having Internet issues at the moment
>so if you want to revert for now, I'll work with Emily to address your
>suggestions later in the week or next.  Emily, can you take a look at
>addressing Emil's concerns with an updated patch?
>>
>
>Doesn't seem like there's any follow-up work on this, so I've reverted this for
>now.
>I'm a sad panda for doing it - hopefully v2 will come shortly.
>
>
>-Emil

RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

2019-04-01 Thread Deng, Emily
Maybe it would be better to add the following check, and change "if (r <= 0 || tmo <= 0)" to "if (r < 0 || tmo <= 0)":
r = dma_fence_wait_timeout(f, false, timeout);
if (r == 0) {
        r = -ETIMEDOUT;
        break;
} else if (r < 0) {
        break;
}

Best wishes
Emily Deng


>-Original Message-
>From: amd-gfx  On Behalf Of wentalou
>Sent: Monday, April 1, 2019 4:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Lou, Wentao 
>Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if
>only one node in shadow_list
>
>amdgpu_bo_restore_shadow would assign zero to r if succeeded.
>r would remain zero if there is only one node in shadow_list.
>current code would always return failure when r <= 0.
>
>Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5
>Signed-off-by: Wentao Lou 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index c4c61e9..5cf21a4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct
>amdgpu_device *adev)
>   struct dma_fence *fence = NULL, *next = NULL;
>   struct amdgpu_bo *shadow;
>   long r = 1, tmo;
>+  bool single_shadow = false;
>
>   if (amdgpu_sriov_runtime(adev))
>   tmo = msecs_to_jiffies(8000);
>@@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct
>amdgpu_device *adev)
>   r = dma_fence_wait_timeout(fence, false, tmo);
>   dma_fence_put(fence);
>   fence = next;
>+  single_shadow = false;
>   if (r <= 0)
>   break;
>   } else {
>   fence = next;
>+  single_shadow = true;
>   }
>   }
>   mutex_unlock(&adev->shadow_list_lock);
>@@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct
>amdgpu_device *adev)
>   tmo = dma_fence_wait_timeout(fence, false, tmo);
>   dma_fence_put(fence);
>
>-  if (r <= 0 || tmo <= 0) {
>+  /* r would be zero even if amdgpu_bo_restore_shadow succeeded when
>single shadow in list */
>+  if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) {
>   DRM_ERROR("recover vram bo from shadow failed\n");
>   return -EIO;
>   }
>--
>2.7.4
>

RE: [PATCH] drm/amdgpu: Correct the irq types' num of sdma

2019-03-27 Thread Deng, Emily
Thanks, I will modify the patch per your suggestion.

Best wishes
Emily Deng



>-Original Message-
>From: Christian König 
>Sent: Wednesday, March 27, 2019 7:40 PM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Correct the irq types' num of sdma
>
>Am 27.03.19 um 08:38 schrieb Emily Deng:
>> Fix the issue that TDR-2 will hit "fallback timer expired on ring sdma1".
>> It is caused by setting the wrong number of irq types.
>
>Good catch, but the solution is not really clean. The correct approach
>would be to fix the amdgpu_sdma_irq enum.
>
>See the definition:
>> enum amdgpu_sdma_irq {
>>     AMDGPU_SDMA_IRQ_TRAP0 = 0,
>>     AMDGPU_SDMA_IRQ_TRAP1,
>>     AMDGPU_SDMA_IRQ_ECC0,
>>     AMDGPU_SDMA_IRQ_ECC1,
>>
>>     AMDGPU_SDMA_IRQ_LAST
>> };
>
>As far as I can see that doesn't make any sense, because it denotes the
>source of the interrupt and not the type.
>
>The AMDGPU_SDMA_IRQ_TRAP0 and AMDGPU_SDMA_IRQ_TRAP1 should be
>renamed to
>AMDGPU_SDMA_IRQ_INSTANCE0 and AMDGPU_SDMA_IRQ_INSTANCE1 and the
>_ECC values removed altogether.
>
>Christian.
>
>>
>> Signed-off-by: Emily Deng 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index 3ac5abe..72ec51a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -1908,7 +1908,7 @@ static int sdma_v4_0_set_ecc_irq_state(struct
>amdgpu_device *adev,
>>   {
>>  u32 sdma_edc_config;
>>
>> -u32 reg_offset = (type == AMDGPU_SDMA_IRQ_ECC0) ?
>> +u32 reg_offset = (type == 0) ?
>>  sdma_v4_0_get_reg_offset(adev, 0, mmSDMA0_EDC_CONFIG) :
>>  sdma_v4_0_get_reg_offset(adev, 1, mmSDMA0_EDC_CONFIG);
>>
>> @@ -2196,10 +2196,10 @@ static const struct amdgpu_irq_src_funcs
>> sdma_v4_0_ecc_irq_funcs = {
>>
>>   static void sdma_v4_0_set_irq_funcs(struct amdgpu_device *adev)
>>   {
>> -adev->sdma.trap_irq.num_types = AMDGPU_SDMA_IRQ_LAST;
>> +adev->sdma.trap_irq.num_types = 2;
>>  adev->sdma.trap_irq.funcs = &sdma_v4_0_trap_irq_funcs;
>>  adev->sdma.illegal_inst_irq.funcs = &sdma_v4_0_illegal_inst_irq_funcs;
>> -adev->sdma.ecc_irq.num_types = AMDGPU_SDMA_IRQ_LAST;
>> +adev->sdma.ecc_irq.num_types = 2;
>>  adev->sdma.ecc_irq.funcs = &sdma_v4_0_ecc_irq_funcs;
>>   }
>>


RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-24 Thread Deng, Emily
Thank you very much.

Best wishes
Emily Deng



>-Original Message-
>From: Alex Deucher 
>Sent: Saturday, February 23, 2019 5:05 AM
>To: Deng, Emily 
>Cc: Maling list - DRI developers ; amd-gfx 
>list
>
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>Pushed.  Thanks!
>
>Alex
>
>On Thu, Feb 21, 2019 at 9:36 PM Deng, Emily  wrote:
>>
>> Hi Alex,
>> Please help, thanks.
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>> >-Original Message-
>> >From: Alex Deucher 
>> >Sent: Friday, February 22, 2019 12:13 AM
>> >To: Deng, Emily ; Maling list - DRI developers
>> >
>> >Cc: amd-gfx list 
>> >Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent
>> >domainID but same BDF
>> >
>> >On Thu, Feb 14, 2019 at 2:53 AM Emily Deng  wrote:
>> >>
>> >> For multiple GPUs which has the same BDF, but has different domain
>> >> ID, the drmOpenByBusid will return the wrong fd when startx.
>> >>
>> >> The reproduce sequence as below:
>> >> 1. Call drmOpenByBusid to open Card0, then will return the right
>> >> fd0, and the
>> >> fd0 is master privilege;
>> >> 2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid,
>> >> it will open Card0 first, this time, the fd1 for opening Card0 is
>> >> not master privilege, and will call drmSetInterfaceVersion to
>> >> identify the domain ID feature, as the fd1 is not master privilege,
>> >> then drmSetInterfaceVersion will fail, and then won't compare
>> >> domain ID, then
>> >return the wrong fd for Card1.
>> >>
>> >> Solution:
>> >> First loop search the best match fd about drm 1.4.
>> >>
>> >> Signed-off-by: Emily Deng 
>> >
>> >Reviewed-by: Alex Deucher 
>> >
>> >Do you need someone to commit this for you?
>> >
>> >Alex
>> >
>> >> ---
>> >>  xf86drm.c | 23 +++
>> >>  1 file changed, 23 insertions(+)
>> >>
>> >> diff --git a/xf86drm.c b/xf86drm.c
>> >> index 336d64d..b60e029 100644
>> >> --- a/xf86drm.c
>> >> +++ b/xf86drm.c
>> >> @@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid,
>> >> int
>> >type)
>> >>  if (base < 0)
>> >>  return -1;
>> >>
>> >> +/* We need to try for 1.4 first for proper PCI domain support
>> >> + */
>> >>  drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>> >>  for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> >>  fd = drmOpenMinor(i, 1, type);
>> >>  drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> >>  if (fd >= 0) {
>> >> +sv.drm_di_major = 1;
>> >> +sv.drm_di_minor = 4;
>> >> +sv.drm_dd_major = -1;/* Don't care */
>> >> +sv.drm_dd_minor = -1;/* Don't care */
>> >> +if (!drmSetInterfaceVersion(fd, &sv)) {
>> >> +buf = drmGetBusid(fd);
>> >> +drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>> >> +if (buf && drmMatchBusID(buf, busid, 1)) {
>> >> +drmFreeBusid(buf);
>> >> +return fd;
>> >> +}
>> >> +if (buf)
>> >> +drmFreeBusid(buf);
>> >> +}
>> >> +close(fd);
>> >> +}
>> >> +}
>> >> +
>> >> +   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> >> +fd = drmOpenMinor(i, 1, type);
>> >> +drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> >> +if (fd >= 0) {
>> >>  /* We need to try for 1.4 first for proper PCI domain support
>> >>   * and if that fails, we know the kernel is busted
>> >>   */
>> >> --
>> >> 2.7.4
>> >>

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-21 Thread Deng, Emily
Hi Alex,
Please help, thanks.

Best wishes
Emily Deng



>-Original Message-
>From: Alex Deucher 
>Sent: Friday, February 22, 2019 12:13 AM
>To: Deng, Emily ; Maling list - DRI developers de...@lists.freedesktop.org>
>Cc: amd-gfx list 
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>On Thu, Feb 14, 2019 at 2:53 AM Emily Deng  wrote:
>>
>> For multiple GPUs which has the same BDF, but has different domain ID,
>> the drmOpenByBusid will return the wrong fd when startx.
>>
>> The reproduce sequence as below:
>> 1. Call drmOpenByBusid to open Card0, then will return the right fd0,
>> and the
>> fd0 is master privilege;
>> 2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it
>> will open Card0 first, this time, the fd1 for opening Card0 is not
>> master privilege, and will call drmSetInterfaceVersion to identify the
>> domain ID feature, as the fd1 is not master privilege, then
>> drmSetInterfaceVersion will fail, and then won't compare domain ID, then
>return the wrong fd for Card1.
>>
>> Solution:
>> First loop search the best match fd about drm 1.4.
>>
>> Signed-off-by: Emily Deng 
>
>Reviewed-by: Alex Deucher 
>
>Do you need someone to commit this for you?
>
>Alex
>
>> ---
>>  xf86drm.c | 23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/xf86drm.c b/xf86drm.c
>> index 336d64d..b60e029 100644
>> --- a/xf86drm.c
>> +++ b/xf86drm.c
>> @@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int
>type)
>>  if (base < 0)
>>  return -1;
>>
>> +/* We need to try for 1.4 first for proper PCI domain support */
>>  drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>>  for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>  fd = drmOpenMinor(i, 1, type);
>>  drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>  if (fd >= 0) {
>> +sv.drm_di_major = 1;
>> +sv.drm_di_minor = 4;
>> +sv.drm_dd_major = -1;/* Don't care */
>> +sv.drm_dd_minor = -1;/* Don't care */
>> +if (!drmSetInterfaceVersion(fd, &sv)) {
>> +buf = drmGetBusid(fd);
>> +drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>> +if (buf && drmMatchBusID(buf, busid, 1)) {
>> +drmFreeBusid(buf);
>> +return fd;
>> +}
>> +if (buf)
>> +drmFreeBusid(buf);
>> +}
>> +close(fd);
>> +}
>> +}
>> +
>> +   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> +fd = drmOpenMinor(i, 1, type);
>> +drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> +if (fd >= 0) {
>>  /* We need to try for 1.4 first for proper PCI domain support
>>   * and if that fails, we know the kernel is busted
>>   */
>> --
>> 2.7.4
>>

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-19 Thread Deng, Emily
Ping ..

>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Monday, February 18, 2019 10:17 AM
>To: Alex Deucher ; Maling list - DRI developers de...@lists.freedesktop.org>
>Cc: amd-gfx list 
>Subject: RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>Thanks Alex to help to add the dri-devel.
>
>Best wishes
>Emily Deng
>
>
>>-Original Message-
>>From: Alex Deucher 
>>Sent: Friday, February 15, 2019 11:14 PM
>>To: Deng, Emily ; Maling list - DRI developers
>>
>>Cc: amd-gfx list 
>>Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID
>>but same BDF
>>
>>Adding dri-devel.
>>
>>On Thu, Feb 14, 2019 at 2:53 AM Emily Deng  wrote:
>>>
>>> For multiple GPUs which has the same BDF, but has different domain
>>> ID, the drmOpenByBusid will return the wrong fd when startx.
>>>
>>> The reproduce sequence as below:
>>> 1. Call drmOpenByBusid to open Card0, then will return the right fd0,
>>> and the
>>> fd0 is master privilege;
>>> 2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it
>>> will open Card0 first, this time, the fd1 for opening Card0 is not
>>> master privilege, and will call drmSetInterfaceVersion to identify
>>> the domain ID feature, as the fd1 is not master privilege, then
>>> drmSetInterfaceVersion will fail, and then won't compare domain ID,
>>> then
>>return the wrong fd for Card1.
>>>
>>> Solution:
>>> First loop search the best match fd about drm 1.4.
>>>
>>> Signed-off-by: Emily Deng 
>>> ---
>>>  xf86drm.c | 23 +++
>>>  1 file changed, 23 insertions(+)
>>>
>>> diff --git a/xf86drm.c b/xf86drm.c
>>> index 336d64d..b60e029 100644
>>> --- a/xf86drm.c
>>> +++ b/xf86drm.c
>>> @@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid,
>>> int
>>type)
>>>  if (base < 0)
>>>  return -1;
>>>
>>> +/* We need to try for 1.4 first for proper PCI domain support */
>>>  drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>>>  for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>>  fd = drmOpenMinor(i, 1, type);
>>>  drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>>  if (fd >= 0) {
>>> +sv.drm_di_major = 1;
>>> +sv.drm_di_minor = 4;
>>> +sv.drm_dd_major = -1;/* Don't care */
>>> +sv.drm_dd_minor = -1;/* Don't care */
>>> +if (!drmSetInterfaceVersion(fd, &sv)) {
>>> +buf = drmGetBusid(fd);
>>> +drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>>> +if (buf && drmMatchBusID(buf, busid, 1)) {
>>> +drmFreeBusid(buf);
>>> +return fd;
>>> +}
>>> +if (buf)
>>> +drmFreeBusid(buf);
>>> +}
>>> +close(fd);
>>> +}
>>> +}
>>> +
>>> +   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>> +fd = drmOpenMinor(i, 1, type);
>>> +drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>> +if (fd >= 0) {
>>>  /* We need to try for 1.4 first for proper PCI domain support
>>>   * and if that fails, we know the kernel is busted
>>>   */
>>> --
>>> 2.7.4
>>>

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-17 Thread Deng, Emily
Thanks Alex to help to add the dri-devel. 

Best wishes
Emily Deng


>-Original Message-
>From: Alex Deucher 
>Sent: Friday, February 15, 2019 11:14 PM
>To: Deng, Emily ; Maling list - DRI developers de...@lists.freedesktop.org>
>Cc: amd-gfx list 
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>Adding dri-devel.
>
>On Thu, Feb 14, 2019 at 2:53 AM Emily Deng  wrote:
>>
>> For multiple GPUs which has the same BDF, but has different domain ID,
>> the drmOpenByBusid will return the wrong fd when startx.
>>
>> The reproduce sequence as below:
>> 1. Call drmOpenByBusid to open Card0, then will return the right fd0,
>> and the
>> fd0 is master privilege;
>> 2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it
>> will open Card0 first, this time, the fd1 for opening Card0 is not
>> master privilege, and will call drmSetInterfaceVersion to identify the
>> domain ID feature, as the fd1 is not master privilege, then
>> drmSetInterfaceVersion will fail, and then won't compare domain ID, then
>return the wrong fd for Card1.
>>
>> Solution:
>> First loop search the best match fd about drm 1.4.
>>
>> Signed-off-by: Emily Deng 
>> ---
>>  xf86drm.c | 23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/xf86drm.c b/xf86drm.c
>> index 336d64d..b60e029 100644
>> --- a/xf86drm.c
>> +++ b/xf86drm.c
>> @@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int
>type)
>>  if (base < 0)
>>  return -1;
>>
>> +/* We need to try for 1.4 first for proper PCI domain support */
>>  drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>>  for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>  fd = drmOpenMinor(i, 1, type);
>>  drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>  if (fd >= 0) {
>> +sv.drm_di_major = 1;
>> +sv.drm_di_minor = 4;
>> +sv.drm_dd_major = -1;/* Don't care */
>> +sv.drm_dd_minor = -1;/* Don't care */
>> +if (!drmSetInterfaceVersion(fd, &sv)) {
>> +buf = drmGetBusid(fd);
>> +drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>> +if (buf && drmMatchBusID(buf, busid, 1)) {
>> +drmFreeBusid(buf);
>> +return fd;
>> +}
>> +if (buf)
>> +drmFreeBusid(buf);
>> +}
>> +close(fd);
>> +}
>> +}
>> +
>> +   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> +fd = drmOpenMinor(i, 1, type);
>> +drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> +if (fd >= 0) {
>>  /* We need to try for 1.4 first for proper PCI domain support
>>   * and if that fails, we know the kernel is busted
>>   */
>> --
>> 2.7.4
>>

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-17 Thread Deng, Emily
Hi Emil,
Understood, thanks.

Best wishes
Emily Deng

>-Original Message-
>From: Emil Velikov 
>Sent: Friday, February 15, 2019 11:02 PM
>To: Deng, Emily 
>Cc: amd-gfx mailing list 
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>Hi Emily,
>
>Please note that code outside of amdgpu/ is used by all open source drivers.
>Thus patches should have dri-deve@ in to/cc as mentioned in CONTRIBUTING
>
>On Thu, 14 Feb 2019 at 07:53, Emily Deng  wrote:
>>
>> For multiple GPUs which has the same BDF, but has different domain ID,
>> the drmOpenByBusid will return the wrong fd when startx.
>>
>> The reproduce sequence as below:
>> 1. Call drmOpenByBusid to open Card0, then will return the right fd0,
>> and the
>> fd0 is master privilege;
>> 2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it
>> will open Card0 first, this time, the fd1 for opening Card0 is not
>> master privilege, and will call drmSetInterfaceVersion to identify the
>> domain ID feature, as the fd1 is not master privilege, then
>> drmSetInterfaceVersion will fail, and then won't compare domain ID, then
>return the wrong fd for Card1.
>>
>> Solution:
>> First loop search the best match fd about drm 1.4.
>>
>First and foremost, I wish we could stop using these legacy APIs.
>They're fairly fragile and, as you can see, strange things happen.
>We could instead use drmGetDevices2() to gather a list of devices and pick the
>one we're interested in.
>
>That aside, I think we can do a slightly better fix. Have you tried:
> - resetting the pci_domain_ok=1 on each iteration, and
> - continuing to the next device when the second
>drmSetInterfaceVersion() call fails
>
>AFAICT it should produce the same result, while being shorter and faster.
>
>Thanks
>-Emil

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-15 Thread Deng, Emily
Ping ..

Best wishes
Emily Deng

>-Original Message-
>From: Deng, Emily 
>Sent: Friday, February 15, 2019 11:51 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but 
>same
>BDF
>
>Ping ..
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Emily Deng
>>Sent: Thursday, February 14, 2019 3:54 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH libdrm] libdrm: Fix issue about differrent domainID but
>>same BDF
>>
>>For multiple GPUs which has the same BDF, but has different domain ID,
>>the drmOpenByBusid will return the wrong fd when startx.
>>
>>The reproduce sequence as below:
>>1. Call drmOpenByBusid to open Card0, then will return the right fd0,
>>and the
>>fd0 is master privilege;
>>2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it
>>will open
>>Card0 first, this time, the fd1 for opening Card0 is not master
>>privilege, and will call drmSetInterfaceVersion to identify the domain
>>ID feature, as the fd1 is not master privilege, then
>>drmSetInterfaceVersion will fail, and then won't compare domain ID, then
>return the wrong fd for Card1.
>>
>>Solution:
>>First loop search the best match fd about drm 1.4.
>>
>>Signed-off-by: Emily Deng 
>>---
>> xf86drm.c | 23 +++
>> 1 file changed, 23 insertions(+)
>>
>>diff --git a/xf86drm.c b/xf86drm.c
>>index 336d64d..b60e029 100644
>>--- a/xf86drm.c
>>+++ b/xf86drm.c
>>@@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int
>type)
>> if (base < 0)
>> return -1;
>>
>>+/* We need to try for 1.4 first for proper PCI domain support */
>> drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>> for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> fd = drmOpenMinor(i, 1, type);
>> drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> if (fd >= 0) {
>>+sv.drm_di_major = 1;
>>+sv.drm_di_minor = 4;
>>+sv.drm_dd_major = -1;/* Don't care */
>>+sv.drm_dd_minor = -1;/* Don't care */
>>+if (!drmSetInterfaceVersion(fd, &sv)) {
>>+buf = drmGetBusid(fd);
>>+drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>>+if (buf && drmMatchBusID(buf, busid, 1)) {
>>+drmFreeBusid(buf);
>>+return fd;
>>+}
>>+if (buf)
>>+drmFreeBusid(buf);
>>+}
>>+close(fd);
>>+}
>>+}
>>+
>>+   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>+fd = drmOpenMinor(i, 1, type);
>>+drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>+if (fd >= 0) {
>> /* We need to try for 1.4 first for proper PCI domain support
>>  * and if that fails, we know the kernel is busted
>>  */
>>--
>>2.7.4
>>

RE: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-15 Thread Deng, Emily
Ping ..

>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Thursday, February 14, 2019 3:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same
>BDF
>
>For multiple GPUs which has the same BDF, but has different domain ID, the
>drmOpenByBusid will return the wrong fd when startx.
>
>The reproduce sequence as below:
>1. Call drmOpenByBusid to open Card0, then will return the right fd0, and the
>fd0 is master privilege;
>2. Call drmOpenByBusid to open Card1. In function drmOpenByBusid, it will open
>Card0 first, this time, the fd1 for opening Card0 is not master privilege, and 
>will
>call drmSetInterfaceVersion to identify the domain ID feature, as the fd1 is 
>not
>master privilege, then drmSetInterfaceVersion will fail, and then won't compare
>domain ID, then return the wrong fd for Card1.
>
>Solution:
>First loop search the best match fd about drm 1.4.
>
>Signed-off-by: Emily Deng 
>---
> xf86drm.c | 23 +++
> 1 file changed, 23 insertions(+)
>
>diff --git a/xf86drm.c b/xf86drm.c
>index 336d64d..b60e029 100644
>--- a/xf86drm.c
>+++ b/xf86drm.c
>@@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int type)
> if (base < 0)
> return -1;
>
>+/* We need to try for 1.4 first for proper PCI domain support */
> drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
> for (i = base; i < base + DRM_MAX_MINOR; i++) {
> fd = drmOpenMinor(i, 1, type);
> drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
> if (fd >= 0) {
>+sv.drm_di_major = 1;
>+sv.drm_di_minor = 4;
>+sv.drm_dd_major = -1;/* Don't care */
>+sv.drm_dd_minor = -1;/* Don't care */
>+if (!drmSetInterfaceVersion(fd, &sv)) {
>+buf = drmGetBusid(fd);
>+drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>+if (buf && drmMatchBusID(buf, busid, 1)) {
>+drmFreeBusid(buf);
>+return fd;
>+}
>+if (buf)
>+drmFreeBusid(buf);
>+}
>+close(fd);
>+}
>+}
>+
>+   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>+fd = drmOpenMinor(i, 1, type);
>+drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>+if (fd >= 0) {
> /* We need to try for 1.4 first for proper PCI domain support
>  * and if that fails, we know the kernel is busted
>  */
>--
>2.7.4
>

RE: [PATCH] drm/amdgpu/sriov:Correct pfvf exchange logic

2019-01-01 Thread Deng, Emily
Ping ..

>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Saturday, December 29, 2018 5:56 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] drm/amdgpu/sriov:Correct pfvf exchange logic
>
>The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu 
>reset.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  | 2 +-
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 98df8e4..7ff3a28 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1701,8 +1701,10 @@ static int amdgpu_device_ip_init(struct
>amdgpu_device *adev)
>   amdgpu_xgmi_add_device(adev);
>   amdgpu_amdkfd_device_init(adev);
>
>-  if (amdgpu_sriov_vf(adev))
>+  if (amdgpu_sriov_vf(adev)) {
>+  amdgpu_virt_init_data_exchange(adev);
>   amdgpu_virt_release_full_gpu(adev, true);
>+  }
>
>   return 0;
> }
>@@ -2632,9 +2634,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   goto failed;
>   }
>
>-  if (amdgpu_sriov_vf(adev))
>-  amdgpu_virt_init_data_exchange(adev);
>-
>   amdgpu_fbdev_init(adev);
>
>   r = amdgpu_pm_sysfs_init(adev);
>@@ -3226,6 +3225,7 @@ static int amdgpu_device_reset_sriov(struct
>amdgpu_device *adev,
>   r = amdgpu_ib_ring_tests(adev);
>
> error:
>+  amdgpu_virt_init_data_exchange(adev);
>   amdgpu_virt_release_full_gpu(adev, true);
>   if (!r && adev->virt.gim_feature &
>AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
>   atomic_inc(>vram_lost_counter);
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index 8cbb465..b11a1c17 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -174,7 +174,7 @@ static int xgpu_ai_send_access_requests(struct
>amdgpu_device *adev,
>   return r;
>   }
>   /* Retrieve checksum from mailbox2 */
>-  if (req == IDH_REQ_GPU_INIT_ACCESS) {
>+  if (req == IDH_REQ_GPU_INIT_ACCESS || req ==
>+IDH_REQ_GPU_RESET_ACCESS) {
>   adev->virt.fw_reserve.checksum_key =
>   RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
>
>   mmBIF_BX_PF0_MAILBOX_MSGBUF_RCV_DW2));
>--
>2.7.4
>


RE: [PATCH 2/2] drm/amdgpu/virtual_dce: No need to pin the cursor bo

2018-12-25 Thread Deng, Emily
Ping..

Best wishes
Emily Deng



>-Original Message-
>From: Deng, Emily 
>Sent: Tuesday, December 25, 2018 11:53 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 2/2] drm/amdgpu/virtual_dce: No need to pin the cursor bo
>
>Ping..
>Please help to review. This issue will gate promotion to mainline, as startx 
>will
>have call trace when use virtual display.
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Emily Deng
>>Sent: Monday, December 24, 2018 2:09 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH 2/2] drm/amdgpu/virtual_dce: No need to pin the cursor
>>bo
>>
>>For virtual display feature, no need to pin cursor bo.
>>
>>Signed-off-by: Emily Deng 
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>index 8a078f4..98df8e4 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>@@ -2798,7 +2798,7 @@ int amdgpu_device_suspend(struct drm_device *dev,
>>bool suspend, bool fbcon)
>>  struct drm_framebuffer *fb = crtc->primary->fb;
>>  struct amdgpu_bo *robj;
>>
>>- if (amdgpu_crtc->cursor_bo) {
>>+ if (amdgpu_crtc->cursor_bo && !adev->enable_virtual_display) {
>>  struct amdgpu_bo *aobj =
>>gem_to_amdgpu_bo(amdgpu_crtc->cursor_bo);
>>  r = amdgpu_bo_reserve(aobj, true);
>>  if (r == 0) {
>>@@ -2906,7 +2906,7 @@ int amdgpu_device_resume(struct drm_device *dev,
>>bool resume, bool fbcon)
>>  list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>>  struct amdgpu_crtc *amdgpu_crtc =
>>to_amdgpu_crtc(crtc);
>>
>>- if (amdgpu_crtc->cursor_bo) {
>>+ if (amdgpu_crtc->cursor_bo && !adev->enable_virtual_display) {
>>  struct amdgpu_bo *aobj =
>>gem_to_amdgpu_bo(amdgpu_crtc->cursor_bo);
>>  r = amdgpu_bo_reserve(aobj, true);
>>  if (r == 0) {
>>--
>>2.7.4
>>


RE: [PATCH] Revert "drm/amdgpu: WARN once if amdgpu_bo_unpin is called for an unpinned BO"

2018-12-24 Thread Deng, Emily
Hi,
Please ignore this patch.

>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Tuesday, December 25, 2018 1:47 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH] Revert "drm/amdgpu: WARN once if amdgpu_bo_unpin is
>called for an unpinned BO"
>
>This reverts commit 8870ff7fe439aa9d7a542579c4508ea50c0f5b6e.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>index 52fc6ba..959e244 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>@@ -949,7 +949,7 @@ int amdgpu_bo_unpin(struct amdgpu_bo *bo)
>   struct ttm_operation_ctx ctx = { false, false };
>   int r, i;
>
>-  if (WARN_ON_ONCE(!bo->pin_count)) {
>+  if (!bo->pin_count) {
>   dev_warn(adev->dev, "%p unpin not necessary\n", bo);
>   return 0;
>   }
>--
>2.7.4
>


RE: [PATCH 2/2] drm/amdgpu/virtual_dce: No need to pin the cursor bo

2018-12-24 Thread Deng, Emily
Ping..
Please help to review. This issue will gate promotion to mainline, as startx
hits a call trace when using virtual display.

>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Monday, December 24, 2018 2:09 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH 2/2] drm/amdgpu/virtual_dce: No need to pin the cursor bo
>
>For virtual display feature, no need to pin cursor bo.
>
>Signed-off-by: Emily Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 8a078f4..98df8e4 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2798,7 +2798,7 @@ int amdgpu_device_suspend(struct drm_device *dev,
>bool suspend, bool fbcon)
>   struct drm_framebuffer *fb = crtc->primary->fb;
>   struct amdgpu_bo *robj;
>
>-  if (amdgpu_crtc->cursor_bo) {
>+  if (amdgpu_crtc->cursor_bo && !adev->enable_virtual_display) {
>   struct amdgpu_bo *aobj =
>gem_to_amdgpu_bo(amdgpu_crtc->cursor_bo);
>   r = amdgpu_bo_reserve(aobj, true);
>   if (r == 0) {
>@@ -2906,7 +2906,7 @@ int amdgpu_device_resume(struct drm_device *dev,
>bool resume, bool fbcon)
>   list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>   struct amdgpu_crtc *amdgpu_crtc =
>to_amdgpu_crtc(crtc);
>
>-  if (amdgpu_crtc->cursor_bo) {
>+  if (amdgpu_crtc->cursor_bo && !adev->enable_virtual_display) {
>   struct amdgpu_bo *aobj =
>gem_to_amdgpu_bo(amdgpu_crtc->cursor_bo);
>   r = amdgpu_bo_reserve(aobj, true);
>   if (r == 0) {
>--
>2.7.4
>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: Michel Dänzer 
>Sent: Friday, December 21, 2018 6:08 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>On 2018-12-21 10:55 a.m., Deng, Emily wrote:
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Deng, Emily
>>> Sent: Friday, December 21, 2018 5:41 PM
>>> To: Michel Dänzer 
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>
>>>> -----Original Message-
>>>> From: Michel Dänzer 
>>>> Sent: Friday, December 21, 2018 5:37 PM
>>>> To: Deng, Emily 
>>>> Cc: amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>>
>>>> On 2018-12-21 10:32 a.m., Deng, Emily wrote:
>>>>>> -Original Message-
>>>>>> From: Michel Dänzer 
>>>>>> Sent: Friday, December 21, 2018 5:28 PM
>>>>>> To: Deng, Emily 
>>>>>> Cc: amd-gfx@lists.freedesktop.org
>>>>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's
>>>>>> bo
>>>>>>
>>>>>> On 2018-12-21 10:17 a.m., Deng, Emily wrote:
>>>>>>>> From: amd-gfx  On Behalf
>>>>>>>> Of Deng, Emily
>>>>>>>>> From: Michel Dänzer  On 2018-12-21 9:45
>>>>>>>>> a.m., Deng, Emily wrote:
>>>>>>>>>>> From: Michel Dänzer  On 2018-12-21 8:26
>>>>>>>>>>> a.m., Emily Deng wrote:
>>>>>>>>>>>> When the bo is used to set mode, the bo need to be pinned.
>>>>>>>>>>>
>>>>>>>>>>> On second thought, why does the BO need to be pinned? When
>>>>>>>>>>> using the display hardware, the BO needs to be pinned to
>>>>>>>>>>> prevent it from being moved while the hardware is scanning
>>>>>>>>>>> out from it, but that shouldn't be
>>>>>>>>> necessary here.
>>>>>>>>>> The pin here is used for scan out the buffer by remote display app.
>>>>>>>>>
>>>>>>>>> I still don't understand why pinning is needed. What mechanism
>>>>>>>>> does the remote display app use to access the BO contents?
>>>>>>>> Sorry, I am not familiar with the remote display app. Maybe it
>>>>>>>> will use drm ioctl function to get the current crtc's fb's
>>>>>>>> information, and get the content in the fb's buffer object by
>>>>>>>> mmap or translate the bo to an OpenGL texture for next
>>>>>>>> rendering. Maybe don't need to pin the bo here, as the use has
>>>>>>>> no different with other
>>> normal bos.
>>>>>>>> So please ignore the patch, and will send another patch to
>>>>>>>> remove the unpin
>>>>>> the fb's bo code.
>>>>>>> It seems to be hard to remove all the pin for virtual_dce, as it
>>>>>>> uses some
>>>>>> common code in amdgpu_display.c.
>>>>>>
>>>>>> Because of amdgpu_display_unpin_work_func? That might be as simple
>>>>>> as replacing
>>>>>>
>>>>>>  schedule_work(&works->unpin_work);
>>>>>>
>>>>>> with
>>>>>>
>>>>>>  kfree(works->shared);
>>>>>>  kfree(works);
>>>>>>
>>>>>> in dce_virtual_pageflip.
>>>>> But the amdgpu_display_crtc_page_flip_target will pin the new_bo,
>>>>> then we don't need to unpin it?
>>>>
>>>> Ah, right, but then dce_virtual_pageflip could just unpin it? Not
>>>> ideal, but better than leaving it pinned unnecessarily.
>>> Yes, it is not a good idea to leave it pinned. Then will need lots of
>>> "if (amdgpu_virtual_display)", don't know whether it could be accept?
>
>Should rather be if (adev->enable_virtual_display). If you want to never pin, it's
>probably worth giving this a shot and seeing how bad it gets.
>BTW, amdgpu_ttm_alloc_gart can probably also be skipped for virtual display,
>maybe more.
Ok, then I will try, but the code may be ugly.
>
>> Another method is let the logical stay no change, just use if
>> (!amdgpu_virtual_display) before WARN_ON_ONCE of amdgpu_bo_unpin to
>remove the virtual_dce's call trace.
>
>That would be ugly IMHO.
>
>
>But none of that is necessary if dce_virtual_pageflip simply unpins the BO and
>skips unpin_work.
Yes, but there is still the issue that it won't unpin the bo that was
pinned by amdgpu_display_crtc_page_flip_target.

Best wishes
Emily Deng


>
>
>--
>Earthling Michel Dänzer   |   http://www.amd.com
>Libre software enthusiast | Mesa and X developer


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Friday, December 21, 2018 5:41 PM
>To: Michel Dänzer 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>>-Original Message-
>>From: Michel Dänzer 
>>Sent: Friday, December 21, 2018 5:37 PM
>>To: Deng, Emily 
>>Cc: amd-gfx@lists.freedesktop.org
>>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>
>>On 2018-12-21 10:32 a.m., Deng, Emily wrote:
>>>> -Original Message-----
>>>> From: Michel Dänzer 
>>>> Sent: Friday, December 21, 2018 5:28 PM
>>>> To: Deng, Emily 
>>>> Cc: amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>>
>>>> On 2018-12-21 10:17 a.m., Deng, Emily wrote:
>>>>>> From: amd-gfx  On Behalf Of
>>>>>> Deng, Emily
>>>>>>> From: Michel Dänzer  On 2018-12-21 9:45 a.m.,
>>>>>>> Deng, Emily wrote:
>>>>>>>>> From: Michel Dänzer  On 2018-12-21 8:26
>>>>>>>>> a.m., Emily Deng wrote:
>>>>>>>>>> When the bo is used to set mode, the bo need to be pinned.
>>>>>>>>>
>>>>>>>>> On second thought, why does the BO need to be pinned? When
>>>>>>>>> using the display hardware, the BO needs to be pinned to
>>>>>>>>> prevent it from being moved while the hardware is scanning out
>>>>>>>>> from it, but that shouldn't be
>>>>>>> necessary here.
>>>>>>>> The pin here is used for scan out the buffer by remote display app.
>>>>>>>
>>>>>>> I still don't understand why pinning is needed. What mechanism
>>>>>>> does the remote display app use to access the BO contents?
>>>>>> Sorry, I am not familiar with the remote display app. Maybe it
>>>>>> will use drm ioctl function to get the current crtc's fb's
>>>>>> information, and get the content in the fb's buffer object by mmap
>>>>>> or translate the bo to an OpenGL texture for next rendering. Maybe
>>>>>> don't need to pin the bo here, as the use has no different with other
>normal bos.
>>>>>> So please ignore the patch, and will send another patch to remove
>>>>>> the unpin
>>>> the fb's bo code.
>>>>> It seems to be hard to remove all the pin for virtual_dce, as it
>>>>> uses some
>>>> common code in amdgpu_display.c.
>>>>
>>>> Because of amdgpu_display_unpin_work_func? That might be as simple
>>>> as replacing
>>>>
>>>>schedule_work(&works->unpin_work);
>>>>
>>>> with
>>>>
>>>>kfree(works->shared);
>>>>kfree(works);
>>>>
>>>> in dce_virtual_pageflip.
>>> But the amdgpu_display_crtc_page_flip_target will pin the new_bo,
>>> then we don't need to unpin it?
>>
>>Ah, right, but then dce_virtual_pageflip could just unpin it? Not
>>ideal, but better than leaving it pinned unnecessarily.
>Yes, it is not a good idea to leave it pinned. Then will need lots of "if
>(amdgpu_virtual_display)", don't know whether it could be accept?
Another method is to leave the logic unchanged and just check
if (!amdgpu_virtual_display) before the WARN_ON_ONCE in amdgpu_bo_unpin,
to remove virtual_dce's call trace.
Which method do you think is better?
>>
>>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: Michel Dänzer 
>Sent: Friday, December 21, 2018 5:37 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>On 2018-12-21 10:32 a.m., Deng, Emily wrote:
>>> -Original Message-
>>> From: Michel Dänzer 
>>> Sent: Friday, December 21, 2018 5:28 PM
>>> To: Deng, Emily 
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>
>>> On 2018-12-21 10:17 a.m., Deng, Emily wrote:
>>>>> From: amd-gfx  On Behalf Of
>>>>> Deng, Emily
>>>>>> From: Michel Dänzer  On 2018-12-21 9:45 a.m.,
>>>>>> Deng, Emily wrote:
>>>>>>>> From: Michel Dänzer  On 2018-12-21 8:26
>>>>>>>> a.m., Emily Deng wrote:
>>>>>>>>> When the bo is used to set mode, the bo need to be pinned.
>>>>>>>>
>>>>>>>> On second thought, why does the BO need to be pinned? When using
>>>>>>>> the display hardware, the BO needs to be pinned to prevent it
>>>>>>>> from being moved while the hardware is scanning out from it, but
>>>>>>>> that shouldn't be
>>>>>> necessary here.
>>>>>>> The pin here is used for scan out the buffer by remote display app.
>>>>>>
>>>>>> I still don't understand why pinning is needed. What mechanism
>>>>>> does the remote display app use to access the BO contents?
>>>>> Sorry, I am not familiar with the remote display app. Maybe it will
>>>>> use drm ioctl function to get the current crtc's fb's information,
>>>>> and get the content in the fb's buffer object by mmap or translate
>>>>> the bo to an OpenGL texture for next rendering. Maybe don't need to
>>>>> pin the bo here, as the use has no different with other normal bos.
>>>>> So please ignore the patch, and will send another patch to remove
>>>>> the unpin
>>> the fb's bo code.
>>>> It seems to be hard to remove all the pin for virtual_dce, as it
>>>> uses some
>>> common code in amdgpu_display.c.
>>>
>>> Because of amdgpu_display_unpin_work_func? That might be as simple as
>>> replacing
>>>
>>> schedule_work(&works->unpin_work);
>>>
>>> with
>>>
>>> kfree(works->shared);
>>> kfree(works);
>>>
>>> in dce_virtual_pageflip.
>> But the amdgpu_display_crtc_page_flip_target will pin the new_bo, then
>> we don't need to unpin it?
>
>Ah, right, but then dce_virtual_pageflip could just unpin it? Not ideal, but better
>than leaving it pinned unnecessarily.
Yes, it is not a good idea to leave it pinned. But then we would need lots of
"if (amdgpu_virtual_display)" checks; I don't know whether that would be
acceptable?
>
>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: Michel Dänzer 
>Sent: Friday, December 21, 2018 5:28 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>On 2018-12-21 10:17 a.m., Deng, Emily wrote:
>>> From: amd-gfx  On Behalf Of
>>> Deng, Emily
>>>> From: Michel Dänzer  On 2018-12-21 9:45 a.m.,
>>>> Deng, Emily wrote:
>>>>>> From: Michel Dänzer  On 2018-12-21 8:26 a.m.,
>>>>>> Emily Deng wrote:
>>>>>>> When the bo is used to set mode, the bo need to be pinned.
>>>>>>
>>>>>> On second thought, why does the BO need to be pinned? When using
>>>>>> the display hardware, the BO needs to be pinned to prevent it from
>>>>>> being moved while the hardware is scanning out from it, but that
>>>>>> shouldn't be
>>>> necessary here.
>>>>> The pin here is used for scan out the buffer by remote display app.
>>>>
>>>> I still don't understand why pinning is needed. What mechanism does
>>>> the remote display app use to access the BO contents?
>>> Sorry, I am not familiar with the remote display app. Maybe it will
>>> use drm ioctl function to get the current crtc's fb's information,
>>> and get the content in the fb's buffer object by mmap or translate
>>> the bo to an OpenGL texture for next rendering. Maybe don't need to
>>> pin the bo here, as the use has no different with other normal bos.
>>> So please ignore the patch, and will send another patch to remove the unpin
>the fb's bo code.
>> It seems to be hard to remove all the pin for virtual_dce, as it uses some
>common code in amdgpu_display.c.
>
>Because of amdgpu_display_unpin_work_func? That might be as simple as
>replacing
>
>   schedule_work(&works->unpin_work);
>
>with
>
>   kfree(works->shared);
>   kfree(works);
>
>in dce_virtual_pageflip.
But amdgpu_display_crtc_page_flip_target will pin the new_bo; then don't we
need to unpin it?

Best wishes
Emily Deng


>
>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: amd-gfx  On Behalf Of Deng,
>Emily
>Sent: Friday, December 21, 2018 5:10 PM
>To: Michel Dänzer 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>>-Original Message-
>>From: Michel Dänzer 
>>Sent: Friday, December 21, 2018 4:52 PM
>>To: Deng, Emily 
>>Cc: amd-gfx@lists.freedesktop.org
>>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>
>>On 2018-12-21 9:45 a.m., Deng, Emily wrote:
>>>> -Original Message-----
>>>> From: Michel Dänzer 
>>>> Sent: Friday, December 21, 2018 4:38 PM
>>>> To: Deng, Emily 
>>>> Cc: amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>>
>>>> On 2018-12-21 8:26 a.m., Emily Deng wrote:
>>>>> When the bo is used to set mode, the bo need to be pinned.
>>>>
>>>> On second thought, why does the BO need to be pinned? When using the
>>>> display hardware, the BO needs to be pinned to prevent it from being
>>>> moved while the hardware is scanning out from it, but that shouldn't
>>>> be
>>necessary here.
>>> The pin here is used for scan out the buffer by remote display app.
>>
>>I still don't understand why pinning is needed. What mechanism does the
>>remote display app use to access the BO contents?
>Sorry, I am not familiar with the remote display app. Maybe it will use drm ioctl
>function to get the current crtc's fb's information, and get the content in the fb's
>buffer object by mmap or translate the bo to an OpenGL texture for next
>rendering. Maybe don't need to pin the bo here, as the use has no different with
>other normal bos. So please ignore the patch, and will send another patch to
>remove the unpin the fb's bo code.
It seems hard to remove all the pinning for virtual_dce, as it uses some
common code in amdgpu_display.c.
So for code consistency, maybe we still need to add the pin here.

Best wishes
Emily Deng
>>
>>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: Michel Dänzer 
>Sent: Friday, December 21, 2018 4:52 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>On 2018-12-21 9:45 a.m., Deng, Emily wrote:
>>> -Original Message-
>>> From: Michel Dänzer 
>>> Sent: Friday, December 21, 2018 4:38 PM
>>> To: Deng, Emily 
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>>>
>>> On 2018-12-21 8:26 a.m., Emily Deng wrote:
>>>> When the bo is used to set mode, the bo need to be pinned.
>>>
>>> On second thought, why does the BO need to be pinned? When using the
>>> display hardware, the BO needs to be pinned to prevent it from being
>>> moved while the hardware is scanning out from it, but that shouldn't be
>necessary here.
>> The pin here is used for scan out the buffer by remote display app.
>
>I still don't understand why pinning is needed. What mechanism does the remote
>display app use to access the BO contents?
Sorry, I am not familiar with the remote display app. Maybe it uses a drm ioctl
to get the current crtc's fb information, and gets the contents of the fb's buffer
object by mmap, or translates the bo to an OpenGL texture for the next rendering.
Maybe the bo doesn't need to be pinned here, as its use is no different from other
normal bos. So please ignore this patch; I will send another patch to remove the
code that unpins the fb's bo.
>
>


RE: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo

2018-12-21 Thread Deng, Emily
>-Original Message-
>From: Michel Dänzer 
>Sent: Friday, December 21, 2018 4:38 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu/virtual_dce: Need to pin the fb's bo
>
>On 2018-12-21 8:26 a.m., Emily Deng wrote:
>> When the bo is used to set mode, the bo need to be pinned.
>
>On second thought, why does the BO need to be pinned? When using the display
>hardware, the BO needs to be pinned to prevent it from being moved while the
>hardware is scanning out from it, but that shouldn't be necessary here.
The pin here is so the remote display app can scan out the buffer.

Best wishes
Emily Deng


>

