Re: [PATCH] drm/amdgpu/jpeg: remove redundant check when it returns

2020-08-14 Thread Christian König

Am 13.08.20 um 21:40 schrieb Leo Liu:

Fix warning from kernel test robot

Signed-off-by: Leo Liu 
---
  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
index c41e5590a701..f4ba423af051 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
@@ -465,8 +465,6 @@ static int jpeg_v3_0_wait_for_idle(void *handle)
ret = SOC15_WAIT_ON_RREG(JPEG, 0, mmUVD_JRBC_STATUS,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK);
-   if (ret)
-   return ret;
  
  	return ret;


You will probably get the next warning from the test robot with that,
because it can be simplified to "return SOC15_...".


Maybe even the local variable ret can be dropped, depending on what else 
it is used for in the function.


Christian.


  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/4] drm/amd/powerplay: put those exposed power interfaces in amdgpu_dpm.c

2020-08-14 Thread Nirmoy



On 8/14/20 4:56 AM, Quan, Evan wrote:

[AMD Official Use Only - Internal Distribution Only]

Yes, I would like to make another patch to address Nirmoy's comments.
@Das, Nirmoy is that OK?



Yes.


Nirmoy



BR
Evan
-Original Message-
From: Alex Deucher 
Sent: Thursday, August 13, 2020 9:54 PM
To: Das, Nirmoy 
Cc: Quan, Evan ; amd-gfx list ; 
Deucher, Alexander 
Subject: Re: [PATCH 4/4] drm/amd/powerplay: put those exposed power interfaces 
in amdgpu_dpm.c

Series is:
Reviewed-by: Alex Deucher 

Nirmoy makes some good points below, but I think those should be a
follow-up patch, as this patch is just moving code around; no actual
changes.

Alex

On Thu, Aug 13, 2020 at 6:52 AM Nirmoy  wrote:

Acked-by: Nirmoy Das  for the first 3 patches. Check below for
more comments.

On 8/13/20 11:08 AM, Evan Quan wrote:

As other power interfaces.

Change-Id: I5e3b85ae21c4b1d0239f54fa75247b33cfdb7ddc
Signed-off-by: Evan Quan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c | 425 
   drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h |  14 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c  | 423 ---
   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.h  |   8 -
   4 files changed, 439 insertions(+), 431 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
index 2198148319e2..e114b5cbd8b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
@@ -28,6 +28,11 @@
   #include "amdgpu_dpm.h"
   #include "atom.h"
   #include "amd_pcie.h"
+#include "amdgpu_display.h"
+#include "hwmgr.h"
+#include 
+
+#define WIDTH_4K 3840

   void amdgpu_dpm_print_class_info(u32 class, u32 class2)
   {
@@ -1262,3 +1267,423 @@ int amdgpu_dpm_smu_i2c_bus_access(struct
amdgpu_device *adev,

   return ret;
   }
+
+void amdgpu_pm_acpi_event_handler(struct amdgpu_device *adev)
+{
+ if (adev->pm.dpm_enabled) {
+ mutex_lock(&adev->pm.mutex);
+ if (power_supply_is_system_supplied() > 0)
+ adev->pm.ac_power = true;
+ else
+ adev->pm.ac_power = false;
+ if (adev->powerplay.pp_funcs &&
+ adev->powerplay.pp_funcs->enable_bapm)
+ amdgpu_dpm_enable_bapm(adev, adev->pm.ac_power);
+ mutex_unlock(&adev->pm.mutex);
+
+ if (is_support_sw_smu(adev))
+ smu_set_ac_dc(&adev->smu);
+ }
+}
+
+int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors sensor,
+			   void *data, uint32_t *size)
+{
+ int ret = 0;
+
+ if (!data || !size)
+ return -EINVAL;
+
+ if (is_support_sw_smu(adev))
+ ret = smu_read_sensor(&adev->smu, sensor, data, size);
+ else {
+		if (adev->powerplay.pp_funcs && adev->powerplay.pp_funcs->read_sensor)
+			ret = adev->powerplay.pp_funcs->read_sensor((adev)->powerplay.pp_handle,
+								    sensor, data, size);
+ else
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+void amdgpu_dpm_thermal_work_handler(struct work_struct *work)
+{
+ struct amdgpu_device *adev =
+ container_of(work, struct amdgpu_device,
+  pm.dpm.thermal.work);
+ /* switch to the thermal state */
+ enum amd_pm_state_type dpm_state = POWER_STATE_TYPE_INTERNAL_THERMAL;
+ int temp, size = sizeof(temp);
+
+ if (!adev->pm.dpm_enabled)
+ return;
+
+ if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_TEMP,
+ (void *)&temp, &size)) {
+ if (temp < adev->pm.dpm.thermal.min_temp)
+ /* switch back the user state */
+ dpm_state = adev->pm.dpm.user_state;
+ } else {
+ if (adev->pm.dpm.thermal.high_to_low)
+ /* switch back the user state */
+ dpm_state = adev->pm.dpm.user_state;
+ }
+ mutex_lock(&adev->pm.mutex);
+ if (dpm_state == POWER_STATE_TYPE_INTERNAL_THERMAL)
+ adev->pm.dpm.thermal_active = true;
+ else
+ adev->pm.dpm.thermal_active = false;
+ adev->pm.dpm.state = dpm_state;
+ mutex_unlock(&adev->pm.mutex);
+
+	amdgpu_pm_compute_clocks(adev);
+}
+
+static struct amdgpu_ps *amdgpu_dpm_pick_power_state(struct amdgpu_device *adev,
+						     enum amd_pm_state_type dpm_state)
+{
+ int i;
+ struct amdgpu_ps *ps;
+ u32 ui_class;
+ bool single_display = (adev->pm.dpm.new_active_crtc_count < 2) ?
+ true : false;
+
+ /* check if the vblank period is too short to adjust the mclk */
+ if (single_display && adev->powerplay.pp_funcs->vblank_too_short) {
+ if (amdgpu_dpm_vblank_too_short(adev))
+ single_display = false;
+ }
+
+ /* certain older

[PATCH] drm/amd/pm: drop redundant MEM_TYPE_* macros

2020-08-14 Thread Evan Quan
As these are already defined in amdgpu_atombios.h. Otherwise, we may
hit a "redefined" compile warning.

Change-Id: Ia2a9e10b35173fedcbbd8e0abb8ad38dd231baf4
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
index 3ee54f182943..76ed2e413594 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
@@ -26,15 +26,6 @@
 
 #include "hwmgr.h"
 
-#define MEM_TYPE_GDDR5  0x50
-#define MEM_TYPE_GDDR4  0x40
-#define MEM_TYPE_GDDR3  0x30
-#define MEM_TYPE_DDR2   0x20
-#define MEM_TYPE_GDDR1  0x10
-#define MEM_TYPE_DDR3   0xb0
-#define MEM_TYPE_MASK   0xF0
-
-
 /* As returned from PowerConnectorDetectionTable. */
 #define PP_ATOM_POWER_BUDGET_DISABLE_OVERDRIVE  0x80
 #define PP_ATOM_POWER_BUDGET_SHOW_WARNING   0x40
-- 
2.28.0



RE: [PATCH] drm/amd/pm: drop redundant MEM_TYPE_* macros

2020-08-14 Thread Xu, Feifei
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Friday, August 14, 2020 4:44 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 

Subject: [PATCH] drm/amd/pm: drop redundant MEM_TYPE_* macros

As these are already defined in amdgpu_atombios.h. Otherwise, we may hit 
"redefined" compile warning.

Change-Id: Ia2a9e10b35173fedcbbd8e0abb8ad38dd231baf4
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
index 3ee54f182943..76ed2e413594 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
@@ -26,15 +26,6 @@

 #include "hwmgr.h"

-#define MEM_TYPE_GDDR5  0x50
-#define MEM_TYPE_GDDR4  0x40
-#define MEM_TYPE_GDDR3  0x30
-#define MEM_TYPE_DDR2   0x20
-#define MEM_TYPE_GDDR1  0x10
-#define MEM_TYPE_DDR3   0xb0
-#define MEM_TYPE_MASK   0xF0
-
-
 /* As returned from PowerConnectorDetectionTable. */
 #define PP_ATOM_POWER_BUDGET_DISABLE_OVERDRIVE  0x80
 #define PP_ATOM_POWER_BUDGET_SHOW_WARNING   0x40
--
2.28.0



Re: [PATCH] drm/amd/pm: drop redundant MEM_TYPE_* macros

2020-08-14 Thread Nirmoy



Reviewed-by: Nirmoy Das 


Nirmoy

On 8/14/20 10:43 AM, Evan Quan wrote:

As these are already defined in amdgpu_atombios.h. Otherwise, we may
hit "redefined" compile warning.

Change-Id: Ia2a9e10b35173fedcbbd8e0abb8ad38dd231baf4
Signed-off-by: Evan Quan 
---
  drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h | 9 -
  1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
index 3ee54f182943..76ed2e413594 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.h
@@ -26,15 +26,6 @@
  
  #include "hwmgr.h"
  
-#define MEM_TYPE_GDDR5  0x50

-#define MEM_TYPE_GDDR4  0x40
-#define MEM_TYPE_GDDR3  0x30
-#define MEM_TYPE_DDR2   0x20
-#define MEM_TYPE_GDDR1  0x10
-#define MEM_TYPE_DDR3   0xb0
-#define MEM_TYPE_MASK   0xF0
-
-
  /* As returned from PowerConnectorDetectionTable. */
  #define PP_ATOM_POWER_BUDGET_DISABLE_OVERDRIVE  0x80
  #define PP_ATOM_POWER_BUDGET_SHOW_WARNING   0x40



[PATCH] drm/amdgpu/jpeg: remove redundant check when it returns

2020-08-14 Thread Leo Liu
Fix warning from kernel test robot
v2: remove the local variable as well

Signed-off-by: Leo Liu 
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
index c41e5590a701..3a0dff53654d 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
@@ -460,15 +460,10 @@ static bool jpeg_v3_0_is_idle(void *handle)
 static int jpeg_v3_0_wait_for_idle(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   int ret;
 
-   ret = SOC15_WAIT_ON_RREG(JPEG, 0, mmUVD_JRBC_STATUS,
+   return SOC15_WAIT_ON_RREG(JPEG, 0, mmUVD_JRBC_STATUS,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK);
-   if (ret)
-   return ret;
-
-   return ret;
 }
 
 static int jpeg_v3_0_set_clockgating_state(void *handle,
-- 
2.25.1



Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Nirmoy


On 8/13/20 11:17 PM, Luben Tuikov wrote:

I support having AER handling.

However, I feel it should be offloaded to the DRM layer.
The PCI driver gets the AER callback and immediately
offloads into DRM, as "return drm_aer_recover(dev); }".
The DRM layer does a top-down approach into the error
recovery procedure.

The PCI device driver provides (would/should?) a capability
(at registration) which the DRM layer would inspect and
subsequently call into the PCI driver's low-level methods
to recover the error or to reset the device.

But it should be a top-down approach. I believe the thread
below has somehow hinted at this.

The implementation below boils down to:

If recoverable error, all is good.
If unrecoverable error, then
disable power management,
suspend I/O operations,
cancel pending work,
call into PCI driver to clear
any state it keeps,
call into PCI driver to reset display control.
Etc.

And this regime could be performed by DRM.

And as you can see now, the function implemented below,
*calls into* (that's the key here!) DRM, and it should be
the other way around--the DRM should call into the PCI driver,
after the PCI driver's callback immediately calls into DRM,
as outlined above.

This abstraction could be expanded to more concepts of PCIe GPU drivers,
and it would scale well, beyond PCIe as a protocol for graphics.


If drm handles pci error callbacks then it should also handle power
management, because power management also calls into drm in a very
similar way. I think these are very low-level device tasks for drm.



+static const struct pci_error_handlers amdgpu_err_handler = {

That's too generic a name for this. I'd rather add "pci" in there,

static const struct pci_error_handlers amdgpu_pci_err_handler =  {



True, thanks for the name suggestion.


Nirmoy



.element = init,
...
};

Being a singular noun from the outset is good and this is preserved.


+   .error_detected = amdgpu_pci_err_detected,
+};
+
+
  static struct pci_driver amdgpu_kms_pci_driver = {
.name = DRIVER_NAME,
.id_table = pciidlist,
@@ -1523,10 +1576,9 @@ static struct pci_driver amdgpu_kms_pci_driver = {
.remove = amdgpu_pci_remove,
.shutdown = amdgpu_pci_shutdown,
.driver.pm = &amdgpu_pm_ops,
+   .err_handler = &amdgpu_err_handler,

".err_handler = amdgpu_pci_err_handler,"


Regards,
Luben

On 2020-08-13 2:18 p.m., Andrey Grodzovsky wrote:

On 8/13/20 11:06 AM, Nirmoy wrote:

On 8/13/20 3:38 PM, Andrey Grodzovsky wrote:

On 8/13/20 7:09 AM, Nirmoy wrote:

On 8/12/20 4:52 PM, Andrey Grodzovsky wrote:

On 8/11/20 9:30 AM, Nirmoy Das wrote:

This patch will ignore non-fatal errors and try to
stop amdgpu's sw stack on fatal errors.

Signed-off-by: Nirmoy Das 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 -
   1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c1219af2e7d6..2b9ede3000ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -35,6 +35,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
     #include "amdgpu.h"
@@ -1516,6 +1517,58 @@ static struct drm_driver kms_driver = {
   .patchlevel = KMS_DRIVER_PATCHLEVEL,
   };
   +static pci_ers_result_t amdgpu_pci_err_detected(struct pci_dev *pdev,
+    pci_channel_state_t state)
+{
+    struct drm_device *dev = pci_get_drvdata(pdev);
+    struct amdgpu_device *adev = dev->dev_private;
+    int i;
+    int ret = PCI_ERS_RESULT_DISCONNECT;
+
+    switch (state) {
+    case pci_channel_io_normal:
+    ret = PCI_ERS_RESULT_CAN_RECOVER;
+    break;
+    default:
+    /* Disable power management */
+    adev->runpm = 0;
+    /* Suspend all IO operations */
+    amdgpu_fbdev_set_suspend(adev, 1);
+ cancel_delayed_work_sync(&adev->delayed_init_work);
+    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+    struct amdgpu_ring *ring = adev->rings[i];
+
+    if (!ring || !ring->sched.thread)
+    continue;
+
+ amdgpu_job_stop_all_jobs_on_sched(&ring->sched);


You need to call drm_sched_stop first before calling this


+    }
+
+    if (adev->mode_info.mode_config_initialized) {
+    if (!amdgpu_device_has_dc_support(adev))
+ drm_helper_force_disable_all(adev->ddev);
+    else
+ drm_atomic_helper_shutdown(adev->ddev);
+    }
+
+    amdgpu_fence_driver_fini(adev);
+    amdgpu_fbdev_fini(adev);
+    /* Try to close drm device to stop applications
+ * from opening dri files for further IO operations.
+ * TODO: This will throw a warning as ttm is not
+ * cleaned up properly */
+    drm_dev_fini(dev);


I think user mode applications might still hold references

Re: TTM/nouveau conflict in drm-misc-next

2020-08-14 Thread Christian König

Hey Thomas & Alex,

well the TTM and Nouveau changes look good to me, but this completely 
broke amdgpu.


Alex any idea what is going on here?

Regards,
Christian.

Am 12.08.20 um 21:10 schrieb Thomas Zimmermann:

Hi Christian and Ben,

I backmerged drm-next into drm-misc-next and had a conflict between ttm
and nouveau. struct ttm_mem_res got renamed to struct ttm_resource. I
updated nouveau to the new name, test-built, and pushed the result to
drm-misc-next. If either of you has a minute, you may want to double
check the merge.

Best regards
Thomas





Re: [PATCH] drm/amdgpu/jpeg: remove redundant check when it returns

2020-08-14 Thread Nirmoy

Acked-by: Nirmoy Das 

On 8/14/20 5:14 PM, Leo Liu wrote:

Fix warning from kernel test robot
v2: remove the local variable as well

Signed-off-by: Leo Liu 
Reviewed-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
index c41e5590a701..3a0dff53654d 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
@@ -460,15 +460,10 @@ static bool jpeg_v3_0_is_idle(void *handle)
  static int jpeg_v3_0_wait_for_idle(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   int ret;
  
-	ret = SOC15_WAIT_ON_RREG(JPEG, 0, mmUVD_JRBC_STATUS,

+   return SOC15_WAIT_ON_RREG(JPEG, 0, mmUVD_JRBC_STATUS,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK,
UVD_JRBC_STATUS__RB_JOB_DONE_MASK);
-   if (ret)
-   return ret;
-
-   return ret;
  }
  
  static int jpeg_v3_0_set_clockgating_state(void *handle,



Re: TTM/nouveau conflict in drm-misc-next

2020-08-14 Thread Alex Deucher
On Fri, Aug 14, 2020 at 11:22 AM Christian König
 wrote:
>
> Hey Thomas & Alex,
>
> well the TTM and Nouveau changes look good to me, but this completely
> broke amdgpu.
>
> Alex any idea what is going on here?

What's broken in amdgpu?  There shouldn't be any ttm changes in amdgpu
for drm-next.  Those all go through drm-misc.

Alex

>
> Regards,
> Christian.
>
> Am 12.08.20 um 21:10 schrieb Thomas Zimmermann:
> > Hi Christian and Ben,
> >
> > I backmerged drm-next into drm-misc-next and had a conflict between ttm
> > and nouveau. struct ttm_mem_res got renamed to struct ttm_resource. I
> > updated nouveau to the new name, test-built, and pushed the result to
> > drm-misc-next. If either of you has a minute, you may want to double
> > check the merge.
> >
> > Best regards
> > Thomas
> >
>


Re: TTM/nouveau conflict in drm-misc-next

2020-08-14 Thread Koenig, Christian


Am 14.08.2020 17:53 schrieb Alex Deucher :
On Fri, Aug 14, 2020 at 11:22 AM Christian König
 wrote:
>
> Hey Thomas & Alex,
>
> well the TTM and Nouveau changes look good to me, but this completely
> broke amdgpu.
>
> Alex any idea what is going on here?

What's broken in amdgpu?  There shouldn't be any ttm changes in amdgpu
for drm-next.  Those all go through drm-misc.

It's not a TTM change.

The backmerge of drm-next into drm-misc-next broke amdgpu so that even glxgears 
doesn't work anymore.

But each individual merge head still works fine as far as I can say.

Any idea how to track that down?

Christian.


Alex

>
> Regards,
> Christian.
>
> Am 12.08.20 um 21:10 schrieb Thomas Zimmermann:
> > Hi Christian and Ben,
> >
> > I backmerged drm-next into drm-misc-next and had a conflict between ttm
> > and nouveau. struct ttm_mem_res got renamed to struct ttm_resource. I
> > updated nouveau to the new name, test-built, and pushed the result to
> > drm-misc-next. If either of you has a minute, you may want to double
> > check the merge.
> >
> > Best regards
> > Thomas
> >
>


[PATCH] drm/amd/display: Add DSC_DBG_EN shift/mask for dcn3

2020-08-14 Thread Bhawanpreet Lakha
This field is not defined for DCN3

Signed-off-by: Bhawanpreet Lakha 
---
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h  |  1 +
 .../include/asic_reg/dcn/dcn_3_0_0_sh_mask.h  | 22 +++
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
index 667640c4b288..1118e33aaa2c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
@@ -94,6 +94,7 @@
DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_CLOCK_EN, mask_sh), \
DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_DISPCLK_R_GATE_DIS, mask_sh), \
DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_DSCCLK_R_GATE_DIS, mask_sh), \
+   DSC_SF(DSC_TOP0_DSC_DEBUG_CONTROL, DSC_DBG_EN, mask_sh), \
DSC_SF(DSCC0_DSCC_CONFIG0, ICH_RESET_AT_END_OF_LINE, mask_sh), \
DSC_SF(DSCC0_DSCC_CONFIG0, NUMBER_OF_SLICES_PER_LINE, mask_sh), \
DSC_SF(DSCC0_DSCC_CONFIG0, ALTERNATE_ICH_ENCODING_EN, mask_sh), \
diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
index 0e0319e98c07..ea683f452bb3 100755
--- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
@@ -50271,6 +50271,10 @@
 #define DSC_TOP0_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK
   0x0001L
 #define DSC_TOP0_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK  
   0x0010L
 #define DSC_TOP0_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK   
   0x0100L
+//DSC_TOP0_DSC_DEBUG_CONTROL
+#define DSC_TOP0_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT  
   0x0
+#define DSC_TOP0_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK
   0x0001L
+
 
 // addressBlock: dce_dc_dsc0_dispdec_dsccif_dispdec
 //DSCCIF0_DSCCIF_CONFIG0
@@ -50789,6 +50793,9 @@
 #define DSC_TOP1_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK
   0x0001L
 #define DSC_TOP1_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK  
   0x0010L
 #define DSC_TOP1_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK   
   0x0100L
+//DSC_TOP1_DSC_DEBUG_CONTROL
+#define DSC_TOP1_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT  
   0x0
+#define DSC_TOP1_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK
   0x0001L
 
 
 // addressBlock: dce_dc_dsc1_dispdec_dsccif_dispdec
@@ -51308,6 +51315,10 @@
 #define DSC_TOP2_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK
   0x0001L
 #define DSC_TOP2_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK  
   0x0010L
 #define DSC_TOP2_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK   
   0x0100L
+//DSC_TOP2_DSC_DEBUG_CONTROL
+#define DSC_TOP2_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT  
   0x0
+#define DSC_TOP2_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK
   0x0001L
+
 
 // addressBlock: dce_dc_dsc2_dispdec_dsccif_dispdec
 //DSCCIF2_DSCCIF_CONFIG0
@@ -51826,6 +51837,9 @@
 #define DSC_TOP3_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK
   0x0001L
 #define DSC_TOP3_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK  
   0x0010L
 #define DSC_TOP3_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK   
   0x0100L
+//DSC_TOP3_DSC_DEBUG_CONTROL
+#define DSC_TOP3_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT  
   0x0
+#define DSC_TOP3_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK
   0x0001L
 
 
 // addressBlock: dce_dc_dsc3_dispdec_dsccif_dispdec
@@ -52346,6 +52360,10 @@
 #define DSC_TOP4_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK
   0x0001L
 #define DSC_TOP4_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK  
   0x0010L
 #define DSC_TOP4_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK   
   0x0100L
+//DSC_TOP4_DSC_DEBUG_CONTROL
+#define DSC_TOP4_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT  
   0x0
+#define DSC_TOP4_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK
   0x0001L
+
 
 // addressBlock: dce_dc_dsc4_dispdec_dsccif

[PATCH] drm/dp_mst: Don't return error code when crtc is null

2020-08-14 Thread Bhawanpreet Lakha
[Why]
In certain cases the crtc can be NULL, and returning -EINVAL causes the
atomic check to fail when it shouldn't. This leads to valid
configurations being rejected.

[How]
Don't early return if crtc is null

Signed-off-by: Bhawanpreet Lakha 
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 70c4b7afed12..bc90a1485699 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5037,8 +5037,8 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct 
drm_atomic_state *state, struct drm
 
crtc = conn_state->crtc;
 
-   if (WARN_ON(!crtc))
-   return -EINVAL;
+   if (!crtc)
+   continue;
 
if (!drm_dp_mst_dsc_aux_for_port(pos->port))
continue;
-- 
2.17.1



RE: [PATCH] drm/dp_mst: Don't return error code when crtc is null

2020-08-14 Thread Ruhl, Michael J
>-Original Message-
>From: dri-devel  On Behalf Of
>Bhawanpreet Lakha
>Sent: Friday, August 14, 2020 1:02 PM
>To: mikita.lip...@amd.com; nicholas.kazlaus...@amd.com;
>alexander.deuc...@amd.com
>Cc: Bhawanpreet Lakha ; dri-
>de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>Subject: [PATCH] drm/dp_mst: Don't return error code when crtc is null
>
>[Why]
>In certain cases the crtc can be NULL and returning -EINVAL causes
>atomic check to fail when it shouln't. This leads to valid
>configurations failing because atomic check fails.

So is this a bug fix, an exception case, or an expected possibility?

From my reading of the function comments, it is not clear that
pos->port->connector might be NULL for some reason.

A better explanation of why this would occur would make this a much more
useful commit message.

My reading is that you ran into this issue and are masking it with this fix.

Rather than this being a real possibility and this being the correct fix.

Mike

>[How]
>Don't early return if crtc is null
>
>Signed-off-by: Bhawanpreet Lakha 
>---
> drivers/gpu/drm/drm_dp_mst_topology.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
>b/drivers/gpu/drm/drm_dp_mst_topology.c
>index 70c4b7afed12..bc90a1485699 100644
>--- a/drivers/gpu/drm/drm_dp_mst_topology.c
>+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>@@ -5037,8 +5037,8 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct
>drm_atomic_state *state, struct drm
>
>   crtc = conn_state->crtc;
>
>-  if (WARN_ON(!crtc))
>-  return -EINVAL;
>+  if (!crtc)
>+  continue;
>
>   if (!drm_dp_mst_dsc_aux_for_port(pos->port))
>   continue;
>--
>2.17.1
>


Re: TTM/nouveau conflict in drm-misc-next

2020-08-14 Thread Alex Deucher
On Fri, Aug 14, 2020 at 12:21 PM Koenig, Christian
 wrote:
>
>
>
> Am 14.08.2020 17:53 schrieb Alex Deucher :
>
> On Fri, Aug 14, 2020 at 11:22 AM Christian König
>  wrote:
> >
> > Hey Thomas & Alex,
> >
> > well the TTM and Nouveau changes look good to me, but this completely
> > broke amdgpu.
> >
> > Alex any idea what is going on here?
>
> What's broken in amdgpu?  There shouldn't be any ttm changes in amdgpu
> for drm-next.  Those all go through drm-misc.
>
>
> It's not a TTM change.
>
> The backmerge of drm-next into drm-misc-next broke amdgpu so that even 
> glxgears doesn't work anymore.
>
> But each individual merge head still works fine as far as I can say.
>
> Any idea how to track that down?

Weird.  Anything in the logs?

Alex


>
> Christian.
>
>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> > Am 12.08.20 um 21:10 schrieb Thomas Zimmermann:
> > > Hi Christian and Ben,
> > >
> > > I backmerged drm-next into drm-misc-next and had a conflict between ttm
> > > and nouveau. struct ttm_mem_res got renamed to struct ttm_resource. I
> > > updated nouveau to the new name, test-built, and pushed the result to
> > > drm-misc-next. If either of you has a minute, you may want to double
> > > check the merge.
> > >
> > > Best regards
> > > Thomas
> > >
> >


Re: [PATCH] drm/amd/display: Add DSC_DBG_EN shift/mask for dcn3

2020-08-14 Thread Rodrigo Siqueira
Reviewed-by: Rodrigo Siqueira 

On 08/14, Bhawanpreet Lakha wrote:
> This field is not defined for DCN3
> 
> Signed-off-by: Bhawanpreet Lakha 
> ---
>  .../gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h  |  1 +
>  .../include/asic_reg/dcn/dcn_3_0_0_sh_mask.h  | 22 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h 
> b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
> index 667640c4b288..1118e33aaa2c 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.h
> @@ -94,6 +94,7 @@
>   DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_CLOCK_EN, mask_sh), \
>   DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_DISPCLK_R_GATE_DIS, mask_sh), \
>   DSC_SF(DSC_TOP0_DSC_TOP_CONTROL, DSC_DSCCLK_R_GATE_DIS, mask_sh), \
> + DSC_SF(DSC_TOP0_DSC_DEBUG_CONTROL, DSC_DBG_EN, mask_sh), \
>   DSC_SF(DSCC0_DSCC_CONFIG0, ICH_RESET_AT_END_OF_LINE, mask_sh), \
>   DSC_SF(DSCC0_DSCC_CONFIG0, NUMBER_OF_SLICES_PER_LINE, mask_sh), \
>   DSC_SF(DSCC0_DSCC_CONFIG0, ALTERNATE_ICH_ENCODING_EN, mask_sh), \
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h 
> b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
> index 0e0319e98c07..ea683f452bb3 100755
> --- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
> +++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
> @@ -50271,6 +50271,10 @@
>  #define DSC_TOP0_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK  
>  0x0001L
>  #define DSC_TOP0_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK
>  0x0010L
>  #define DSC_TOP0_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK 
>  0x0100L
> +//DSC_TOP0_DSC_DEBUG_CONTROL
> +#define DSC_TOP0_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT
>  0x0
> +#define DSC_TOP0_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK  
>  0x0001L
> +
>  
>  // addressBlock: dce_dc_dsc0_dispdec_dsccif_dispdec
>  //DSCCIF0_DSCCIF_CONFIG0
> @@ -50789,6 +50793,9 @@
>  #define DSC_TOP1_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK  
>  0x0001L
>  #define DSC_TOP1_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK
>  0x0010L
>  #define DSC_TOP1_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK 
>  0x0100L
> +//DSC_TOP1_DSC_DEBUG_CONTROL
> +#define DSC_TOP1_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT
>  0x0
> +#define DSC_TOP1_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK  
>  0x0001L
>  
>  
>  // addressBlock: dce_dc_dsc1_dispdec_dsccif_dispdec
> @@ -51308,6 +51315,10 @@
>  #define DSC_TOP2_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK  
>  0x0001L
>  #define DSC_TOP2_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK
>  0x0010L
>  #define DSC_TOP2_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK 
>  0x0100L
> +//DSC_TOP2_DSC_DEBUG_CONTROL
> +#define DSC_TOP2_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT
>  0x0
> +#define DSC_TOP2_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK  
>  0x0001L
> +
>  
>  // addressBlock: dce_dc_dsc2_dispdec_dsccif_dispdec
>  //DSCCIF2_DSCCIF_CONFIG0
> @@ -51826,6 +51837,9 @@
>  #define DSC_TOP3_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK  
>  0x0001L
>  #define DSC_TOP3_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK
>  0x0010L
>  #define DSC_TOP3_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK 
>  0x0100L
> +//DSC_TOP3_DSC_DEBUG_CONTROL
> +#define DSC_TOP3_DSC_DEBUG_CONTROL__DSC_DBG_EN__SHIFT
>  0x0
> +#define DSC_TOP3_DSC_DEBUG_CONTROL__DSC_DBG_EN_MASK  
>  0x0001L
>  
>  
>  // addressBlock: dce_dc_dsc3_dispdec_dsccif_dispdec
> @@ -52346,6 +52360,10 @@
>  #define DSC_TOP4_DSC_TOP_CONTROL__DSC_CLOCK_EN_MASK  
>  0x0001L
>  #define DSC_TOP4_DSC_TOP_CONTROL__DSC_DISPCLK_R_GATE_DIS_MASK
>  0x0010L
>  #define DSC_TOP4_DSC_TOP_CONTROL__DSC_DSCCLK_R_GATE_DIS_MASK 
>  0x0100L
> +//DSC_TOP4_DSC_DEBUG_CONTROL
> +#define DSC_TOP4_DSC_DEBUG_CONTROL__DSC_

[PATCH v2] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state

2020-08-14 Thread Nicholas Kazlauskas
[Why]
DM atomic check was structured in a way that we required old DC state
in order to dynamically add and remove planes and streams from the
context to build the DC state context for validation.

DRM private objects were used to carry over the last DC state and
were added to the context on nearly every commit - regardless of fast
or full so we could check whether or not the new state could affect
bandwidth.

The problem with this model is that DRM private objects do not
implicitly stall out other commits.

So if you have two commits touching separate DRM objects they could
run concurrently and potentially execute out of order - leading to a
use-after-free.

If we want this to be safe we have two options:
1. Stall out concurrent commits since they touch the same private object
2. Refactor DM to not require old DC state and drop private object usage

[How]
This implements approach #2 since it still allows for judder-free
updates in multi-display scenarios.

I'll list the big changes in order at a high level:

1. Subclass DRM atomic state instead of using DRM private objects.

DC relied on the old state to determine which changes cause bandwidth
updates but now we have DM perform similar checks based on DRM state
instead - dropping the requirement for old state to exist at all.

This means that we can now build a new DC context from scratch whenever
we have something that DM thinks could affect bandwidth.

Whenever we need to rebuild bandwidth we now add all CRTCs and planes
to the DRM state in order to get the absolute set of DC streams and
DC planes.

This introduces a stall on other commits, but this stall already
exists because of the lock_and_validation logic and it's necessary
since updates may move around pipes and require full reprogramming.

2. Drop workarounds to add planes to maintain z-order early in atomic
check. This is no longer needed because of the changes for (1).

This also involves fixing up should_plane_reset checks since we can just
avoid resetting streams and planes when they haven't actually changed.

3. Rework dm_update_crtc_state and dm_update_plane_state to be single
pass instead of two pass.

This is necessary since we no longer have the dc_state in which to add
and remove planes from the context, and we want to defer creation to the
end of commit_check.

It also makes the logic a lot simpler to follow, as an added bonus.

v2: rebase, naming, comments and spelling fixes

Cc: Rodrigo Siqueira 
Cc: Bhawanpreet Lakha 
Cc: Harry Wentland 
Cc: Leo Li 
Signed-off-by: Nicholas Kazlauskas 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 700 +++---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  11 +-
 2 files changed, 279 insertions(+), 432 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ff5f7f7ceec6..d76927bac737 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1839,7 +1839,6 @@ static int dm_resume(void *handle)
struct drm_plane *plane;
struct drm_plane_state *new_plane_state;
struct dm_plane_state *dm_new_plane_state;
-   struct dm_atomic_state *dm_state = 
to_dm_atomic_state(dm->atomic_obj.state);
enum dc_connection_type new_connection_type = dc_connection_none;
struct dc_state *dc_state;
int i, r, j;
@@ -1879,11 +1878,6 @@ static int dm_resume(void *handle)
 
return 0;
}
-   /* Recreate dc_state - DC invalidates it when setting power state to 
S3. */
-   dc_release_state(dm_state->context);
-   dm_state->context = dc_create_state(dm->dc);
-   /* TODO: Remove dc_state->dccg, use dc->dccg directly. */
-   dc_resource_state_construct(dm->dc, dm_state->context);
 
/* Before powering on DC we need to re-initialize DMUB. */
r = dm_dmub_hw_init(adev);
@@ -2019,11 +2013,52 @@ const struct amdgpu_ip_block_version dm_ip_block =
  * *WIP*
  */
 
+static struct drm_atomic_state *
+amdgpu_dm_atomic_state_alloc(struct drm_device *dev)
+{
+   struct dm_atomic_state *dm_state;
+
+   dm_state = kzalloc(sizeof(*dm_state), GFP_KERNEL);
+
+   if (!dm_state)
+   return NULL;
+
+   if (drm_atomic_state_init(dev, &dm_state->base) < 0) {
+   kfree(dm_state);
+   return NULL;
+   }
+
+   return &dm_state->base;
+}
+
+static void amdgpu_dm_atomic_state_free(struct drm_atomic_state *state)
+{
+   struct dm_atomic_state *dm_state = to_dm_atomic_state(state);
+
+   if (dm_state->context) {
+   dc_release_state(dm_state->context);
+   dm_state->context = NULL;
+   }
+
+   drm_atomic_state_default_release(state);
+   kfree(state);
+}
+
+static void amdgpu_dm_atomic_state_clear(struct drm_atomic_state *state)
+{
+   struct dm_atomic_state *dm_state = to_dm_atomic_state(state);
+
+   drm_atomic_state_default_clear(&dm_sta

Re: [PATCH] drm/dp_mst: Don't return error code when crtc is null

2020-08-14 Thread Lakha, Bhawanpreet
[AMD Official Use Only - Internal Distribution Only]


pos->port->connector?
This is checking the crtc, not the connector. The crtc can be null if it's
disabled.

Since it is happening after an unplug->hotplug, I guess we are missing something
in the disable sequence and the old connector is still in the list.

Bhawan

>>-Original Message-
>>From: dri-devel  On Behalf Of
>>Bhawanpreet Lakha
>>Sent: Friday, August 14, 2020 1:02 PM
>>To: mikita.lip...@amd.com; nicholas.kazlaus...@amd.com;
>>alexander.deuc...@amd.com
>>Cc: Bhawanpreet Lakha ; dri-
>>de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>>Subject: [PATCH] drm/dp_mst: Don't return error code when crtc is null
>>
>>[Why]
>>In certain cases the crtc can be NULL and returning -EINVAL causes
>>atomic check to fail when it shouldn't. This leads to valid
>>configurations failing because atomic check fails.
>
>So is this a bug fix or an exception case, or an expected possibility?
>
>From my reading of the function comments, it is not clear that 
>pos->port->connector
>might be NULL for some reason.

>A better explanation of why this would occur would make this a much more
>useful commit message.
>

>My reading is that you ran into this issue and are masking it with this fix.
>
>Rather than this is a real possibility and this is the correct fix.
>
>Mike
>
>>[How]
>>Don't early return if crtc is null
>>
>>Signed-off-by: Bhawanpreet Lakha 
>>---
>> drivers/gpu/drm/drm_dp_mst_topology.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
>>b/drivers/gpu/drm/drm_dp_mst_topology.c
>>index 70c4b7afed12..bc90a1485699 100644
>>--- a/drivers/gpu/drm/drm_dp_mst_topology.c
>>+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>>@@ -5037,8 +5037,8 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct
>>drm_atomic_state *state, struct drm
>>
>>crtc = conn_state->crtc;
>>
>>-  if (WARN_ON(!crtc))
>>-  return -EINVAL;
>>+  if (!crtc)
>>+  continue;
>>
>>if (!drm_dp_mst_dsc_aux_for_port(pos->port))
>>continue;
>>--
>>2.17.1
>>
>>___
>>dri-devel mailing list
>>dri-de...@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/scheduler: Scheduler priority fixes

2020-08-14 Thread Luben Tuikov
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.

Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (With 0-based indexing,
the count is one past the last index.)

Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c|  8 
 include/drm/gpu_scheduler.h   | 15 +--
 8 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index d85d13f7a043..fd97ac34277b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -46,7 +46,7 @@ const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] 
= {
 static int amdgpu_ctx_priority_permit(struct drm_file *filp,
  enum drm_sched_priority priority)
 {
-   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
+   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_COUNT)
return -EINVAL;
 
/* NORMAL and below are accessible by everyone */
@@ -65,8 +65,8 @@ static int amdgpu_ctx_priority_permit(struct drm_file *filp,
 static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
drm_sched_priority prio)
 {
switch (prio) {
-   case DRM_SCHED_PRIORITY_HIGH_HW:
-   case DRM_SCHED_PRIORITY_KERNEL:
+   case DRM_SCHED_PRIORITY_HW:
+   case DRM_SCHED_PRIORITY_HIGH:
return AMDGPU_GFX_PIPE_PRIO_HIGH;
default:
return AMDGPU_GFX_PIPE_PRIO_NORMAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 75d37dfb51aa..bb9e5481ff3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -251,7 +251,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
int i;
 
/* Signal all jobs not yet scheduled */
-   for (i = DRM_SCHED_PRIORITY_MAX - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
i--) {
struct drm_sched_rq *rq = &sched->sched_rq[i];
 
if (!rq)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13ea8ebc421c..6d4fc79bf84a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -267,7 +267,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
&ring->sched;
}
 
-   for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
+   for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; ++i)
atomic_set(&ring->num_jobs[i], 0);
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index da871d84b742..7112137689db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -243,7 +243,7 @@ struct amdgpu_ring {
boolhas_compute_vm_bug;
boolno_scheduler;
 
-   atomic_tnum_jobs[DRM_SCHED_PRIORITY_MAX];
+   atomic_tnum_jobs[DRM_SCHED_PRIORITY_COUNT];
struct mutexpriority_mutex;
/* protected by priority_mutex */
int priority;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index c799691dfa84..e05bc22a0a49 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -36,14 +36,14 @@ enum drm_sched_priority amdgpu_to_sched_priority(int 
amdgpu_priority)
 {
switch (amdgpu_priority) {
case AMDGPU_CTX_PRIORITY_VERY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH_HW;
+   return DRM_SCHED_PRIORITY_HW;
case AMDGPU_CTX_PRIORITY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH_SW;
+   return DRM_SCHED_PRIORITY_SW;
case AMDGPU_CTX_PRIORITY_NORMAL:
return DRM_SCHED_PRIORITY_NORMAL;
case AMDGPU_CTX_PRIORITY_LOW:
case AMDGPU_CTX_PRIORITY_VERY_LOW:
-   return DRM_SCHED_PRIORITY_LOW;
+   return DRM_SCHED_PRIORITY_MIN;
case AMDGPU_CTX_PRIORITY_UNSET:
return DRM_SCHED_PRIORITY_UNSET;
default:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_tt

Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-14 Thread Matt Coffin
Hi all,

As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix
system hang issue during GPU reset""), this patch is no longer needed,
and won't apply, because the badly-behaving code was removed; I'll
withdraw this patch for now.

Sorry to those who wasted their time reviewing.

Cheers,
Matt

On 8/13/20 7:15 PM, Matt Coffin wrote:
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an
> issue with the sysfs interface for pp_od_clk_voltage. It overwrites the
> return value to 0 when it calls another function, then returns 0. The
> intended behavior is that a positive return value indicates the number
> of bytes from the buffer that you processed in that call.
> 
> With the 0 return value, clients would submit the same value to be
> written over and over again, resulting in an infinite loop.
> 
> This is resolved by returning the count of bytes read (in this case the
> whole message), when the desired return is 0 (success).
> 
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1245
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct 
> device *dev,
>  
>  pro_end:
>   up_read(&adev->reset_sem);
> - return ret;
> + if (ret) {
> + return ret;
> + } else {
> + return count;
> + }
>  }
>  
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
> 


[PATCH 2/2] drm/scheduler: Remove priority macro INVALID

2020-08-14 Thread Luben Tuikov
Remove DRM_SCHED_PRIORITY_INVALID. We no longer
carry around an invalid priority and cut it off
at the source.

Backwards compatibility behaviour of AMDGPU CTX
IOCTL passing in garbage for context priority
from user space and then mapping that to
DRM_SCHED_PRIORITY_NORMAL is preserved.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 21 
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 62 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
 include/drm/gpu_scheduler.h   |  1 -
 4 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index fd97ac34277b..10d3bfa416c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -384,42 +384,41 @@ static int amdgpu_ctx_query2(struct amdgpu_device *adev,
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 struct drm_file *filp)
 {
-   int r;
+   int res;
uint32_t id;
-   enum drm_sched_priority priority;
+   enum drm_sched_priority prio;
 
union drm_amdgpu_ctx *args = data;
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_fpriv *fpriv = filp->driver_priv;
 
-   r = 0;
id = args->in.ctx_id;
-   priority = amdgpu_to_sched_priority(args->in.priority);
+   res = amdgpu_to_sched_priority(args->in.priority, &prio);
 
/* For backwards compatibility reasons, we need to accept
 * ioctls with garbage in the priority field */
-   if (priority == DRM_SCHED_PRIORITY_INVALID)
-   priority = DRM_SCHED_PRIORITY_NORMAL;
+   if (res == -EINVAL)
+   prio = DRM_SCHED_PRIORITY_NORMAL;
 
switch (args->in.op) {
case AMDGPU_CTX_OP_ALLOC_CTX:
-   r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
+   res = amdgpu_ctx_alloc(adev, fpriv, filp, prio, &id);
args->out.alloc.ctx_id = id;
break;
case AMDGPU_CTX_OP_FREE_CTX:
-   r = amdgpu_ctx_free(fpriv, id);
+   res = amdgpu_ctx_free(fpriv, id);
break;
case AMDGPU_CTX_OP_QUERY_STATE:
-   r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
+   res = amdgpu_ctx_query(adev, fpriv, id, &args->out);
break;
case AMDGPU_CTX_OP_QUERY_STATE2:
-   r = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
+   res = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
break;
default:
return -EINVAL;
}
 
-   return r;
+   return res;
 }
 
 struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index e05bc22a0a49..2398eb646043 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -32,24 +32,32 @@
 
 #include "amdgpu_vm.h"
 
-enum drm_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
+int amdgpu_to_sched_priority(int amdgpu_priority,
+enum drm_sched_priority *prio)
 {
switch (amdgpu_priority) {
case AMDGPU_CTX_PRIORITY_VERY_HIGH:
-   return DRM_SCHED_PRIORITY_HW;
+   *prio = DRM_SCHED_PRIORITY_HW;
+   break;
case AMDGPU_CTX_PRIORITY_HIGH:
-   return DRM_SCHED_PRIORITY_SW;
+   *prio = DRM_SCHED_PRIORITY_SW;
+   break;
case AMDGPU_CTX_PRIORITY_NORMAL:
-   return DRM_SCHED_PRIORITY_NORMAL;
+   *prio = DRM_SCHED_PRIORITY_NORMAL;
+   break;
case AMDGPU_CTX_PRIORITY_LOW:
case AMDGPU_CTX_PRIORITY_VERY_LOW:
-   return DRM_SCHED_PRIORITY_MIN;
+   *prio = DRM_SCHED_PRIORITY_MIN;
+   break;
case AMDGPU_CTX_PRIORITY_UNSET:
-   return DRM_SCHED_PRIORITY_UNSET;
+   *prio = DRM_SCHED_PRIORITY_UNSET;
+   break;
default:
WARN(1, "Invalid context priority %d\n", amdgpu_priority);
-   return DRM_SCHED_PRIORITY_INVALID;
+   return -EINVAL;
}
+
+   return 0;
 }
 
 static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
@@ -116,30 +124,42 @@ int amdgpu_sched_ioctl(struct drm_device *dev, void *data,
 {
union drm_amdgpu_sched *args = data;
struct amdgpu_device *adev = dev->dev_private;
-   enum drm_sched_priority priority;
-   int r;
+   enum drm_sched_priority prio;
+   int res;
 
-   priority = amdgpu_to_sched_priority(args->in.priority);
-   if (priority == DRM_SCHED_PRIORITY_INVALID)
+   /* First check the op, then the op's argument.
+*/
+   switch (args->in.op) {
+   case AMDGPU_SCHED_OP_PROCESS_PRIORI

[PATCH 0/2] Fixes to drm_sched_priority

2020-08-14 Thread Luben Tuikov
Some fixes to enum drm_sched_priority which I'd wanted to do
since last year.

For instance, renaming MAX to COUNT, as usually a maximum value
is a value which is part of the set of values (e.g. a maximum of
a function), and thus assignable, whereby a count is the size of
a set (the enumeration in this case). It also makes it clearer
when used to define the size of arrays.

Removing some redundant naming and perhaps better naming could be
had for PRIORITY_SW and PRIORITY_HW, maybe "moderate" and
"temperate" respectively. However, I've left them as "sw" and
"hw".

Also remove a macro which was used in only a single place.

In the second patch, remove priority INVALID, since it is not a
priority, e.g. a scheduler cannot be assigned and run at priority
"invalid". It seems to be more of a directive, a status, of user
input, and as such has no place in the enumeration of priority
levels.

Something else I'd like to do, is to eliminate priority
enumeration "UNSET", since it is not really a priority level,
but a directive, instructing the code to "reset the priority
of a context to the initial priority".

At the moment, this functionality is overloaded to this priority
value, and it should be an IOCTL op, as opposed to a priority value.

Tested on my desktop system, which is running a kernel with
this patch applied.

Luben Tuikov (2):
  drm/scheduler: Scheduler priority fixes
  drm/scheduler: Remove priority macro INVALID

 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 27 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 62 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c|  8 +--
 include/drm/gpu_scheduler.h   | 16 +++---
 9 files changed, 73 insertions(+), 51 deletions(-)

-- 
2.28.0.215.g878e727637



Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-14 Thread Deucher, Alexander
[AMD Public Use]

No need to apologize, it was a good catch.  Something to double check in the 
future if something similar ends up getting applied again.

Thanks!

Alex

From: amd-gfx  on behalf of Matt Coffin 

Sent: Friday, August 14, 2020 3:13 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Quan, Evan ; Koenig, Christian 
; Li, Dennis 
Subject: Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for 
pp_od_clk_voltage

Hi all,

As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix
system hang issue during GPU reset""), this patch is no longer needed,
and won't apply, because the badly-behaving code was removed; I'll
withdraw this patch for now.

Sorry to those who wasted their time reviewing.

Cheers,
Matt

On 8/13/20 7:15 PM, Matt Coffin wrote:
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an
> issue with the sysfs interface for pp_od_clk_voltage. It overwrites the
> return value to 0 when it calls another function, then returns 0. The
> intended behavior is that a positive return value indicates the number
> of bytes from the buffer that you processed in that call.
>
> With the 0 return value, clients would submit the same value to be
> written over and over again, resulting in an infinite loop.
>
> This is resolved by returning the count of bytes read (in this case the
> whole message), when the desired return is 0 (success).
>
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1245
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct 
> device *dev,
>
>  pro_end:
>up_read(&adev->reset_sem);
> - return ret;
> + if (ret) {
> + return ret;
> + } else {
> + return count;
> + }
>  }
>
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
>


Re: [PATCH 0/2] Fixes to drm_sched_priority

2020-08-14 Thread Alex Deucher
+ dri-devel

On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
>
> Some fixes to enum drm_sched_priority which I'd wanted to do
> since last year.
>
> For instance, renaming MAX to COUNT, as usually a maximum value
> is a value which is part of the set of values, (e.g. a maxima of
> a function), and thus assignable, whereby a count is the size of
> a set (the enumeration in this case). It also makes it clearer
> when used to define size of arrays.
>
> Removing some redundant naming and perhaps better naming could be
> had for PRIORITY_SW and PRIORITY_HW, maybe "moderate" and
> "temperate" respectively. However, I've left them as "sw" and
> "hw".
>
> Also remove a macro which was used in only a single place.
>
> In the second patch, remove priority INVALID, since it is not a
> priority, e.g. a scheduler cannot be assigned and run at priority
> "invalid". It seems to be more of a directive, a status, of user
> input, and as such has no place in the enumeration of priority
> levels.
>
> Something else I'd like to do, is to eliminate priority
> enumeration "UNSET", since it is not really a priority level,
> but  a directive, instructing the code to "reset the priority
> of a context to the initial priority".
>
> At the moment, this functionality is overloaded to this priority
> value, and it should be an IOCTL op, as opposed to a priority value.
>
> Tested on my desktop system, which is running a kernel with
> this patch applied.
>
> Luben Tuikov (2):
>   drm/scheduler: Scheduler priority fixes
>   drm/scheduler: Remove priority macro INVALID
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 27 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 62 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
>  drivers/gpu/drm/scheduler/sched_main.c|  8 +--
>  include/drm/gpu_scheduler.h   | 16 +++---
>  9 files changed, 73 insertions(+), 51 deletions(-)
>
> --
> 2.28.0.215.g878e727637
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/scheduler: Remove priority macro INVALID

2020-08-14 Thread Alex Deucher
+ dri-devel


On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
>
> Remove DRM_SCHED_PRIORITY_INVALID. We no longer
> carry around an invalid priority and cut it off
> at the source.
>
> Backwards compatibility behaviour of AMDGPU CTX
> IOCTL passing in garbage for context priority
> from user space and then mapping that to
> DRM_SCHED_PRIORITY_NORMAL is preserved.
>
> Signed-off-by: Luben Tuikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 21 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 62 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
>  include/drm/gpu_scheduler.h   |  1 -
>  4 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index fd97ac34277b..10d3bfa416c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -384,42 +384,41 @@ static int amdgpu_ctx_query2(struct amdgpu_device *adev,
>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>  struct drm_file *filp)
>  {
> -   int r;
> +   int res;
> uint32_t id;
> -   enum drm_sched_priority priority;
> +   enum drm_sched_priority prio;

The variable renaming is not directly related to the functional change
in the patch and should be split out as a separate patch if you think
it should be applied.  I personally prefer the original naming.  The
driver uses r or ret for return value checking pretty consistently.  I
also prefer to spell things out unless the names are really long.
Makes it more obvious what they are for.  Same comment on the
functions below as well.

Alex

>
> union drm_amdgpu_ctx *args = data;
> struct amdgpu_device *adev = dev->dev_private;
> struct amdgpu_fpriv *fpriv = filp->driver_priv;
>
> -   r = 0;
> id = args->in.ctx_id;
> -   priority = amdgpu_to_sched_priority(args->in.priority);
> +   res = amdgpu_to_sched_priority(args->in.priority, &prio);
>
> /* For backwards compatibility reasons, we need to accept
>  * ioctls with garbage in the priority field */
> -   if (priority == DRM_SCHED_PRIORITY_INVALID)
> -   priority = DRM_SCHED_PRIORITY_NORMAL;
> +   if (res == -EINVAL)
> +   prio = DRM_SCHED_PRIORITY_NORMAL;
>
> switch (args->in.op) {
> case AMDGPU_CTX_OP_ALLOC_CTX:
> -   r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
> +   res = amdgpu_ctx_alloc(adev, fpriv, filp, prio, &id);
> args->out.alloc.ctx_id = id;
> break;
> case AMDGPU_CTX_OP_FREE_CTX:
> -   r = amdgpu_ctx_free(fpriv, id);
> +   res = amdgpu_ctx_free(fpriv, id);
> break;
> case AMDGPU_CTX_OP_QUERY_STATE:
> -   r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
> +   res = amdgpu_ctx_query(adev, fpriv, id, &args->out);
> break;
> case AMDGPU_CTX_OP_QUERY_STATE2:
> -   r = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
> +   res = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
> break;
> default:
> return -EINVAL;
> }
>
> -   return r;
> +   return res;
>  }
>
>  struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> index e05bc22a0a49..2398eb646043 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> @@ -32,24 +32,32 @@
>
>  #include "amdgpu_vm.h"
>
> -enum drm_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
> +int amdgpu_to_sched_priority(int amdgpu_priority,
> +enum drm_sched_priority *prio)
>  {
> switch (amdgpu_priority) {
> case AMDGPU_CTX_PRIORITY_VERY_HIGH:
> -   return DRM_SCHED_PRIORITY_HW;
> +   *prio = DRM_SCHED_PRIORITY_HW;
> +   break;
> case AMDGPU_CTX_PRIORITY_HIGH:
> -   return DRM_SCHED_PRIORITY_SW;
> +   *prio = DRM_SCHED_PRIORITY_SW;
> +   break;
> case AMDGPU_CTX_PRIORITY_NORMAL:
> -   return DRM_SCHED_PRIORITY_NORMAL;
> +   *prio = DRM_SCHED_PRIORITY_NORMAL;
> +   break;
> case AMDGPU_CTX_PRIORITY_LOW:
> case AMDGPU_CTX_PRIORITY_VERY_LOW:
> -   return DRM_SCHED_PRIORITY_MIN;
> +   *prio = DRM_SCHED_PRIORITY_MIN;
> +   break;
> case AMDGPU_CTX_PRIORITY_UNSET:
> -   return DRM_SCHED_PRIORITY_UNSET;
> +   *prio = DRM_SCHED_PRIORITY_UNSET;
> +   break;
> default:
> WARN(1, "Invalid context priority %d\n", amdgpu_priority);
> -   return DRM_SCHED_P

Re: [PATCH 1/2] drm/scheduler: Scheduler priority fixes

2020-08-14 Thread Alex Deucher
+ dri-devel

On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
>
> Remove DRM_SCHED_PRIORITY_LOW, as it was used
> in only one place.
>
> Rename and separate by a line
> DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
> as it represents a (total) count of said
> priorities and it is used as such in loops
> throughout the code. (With 0-based indexing,
> the count is one past the last index.)
>
> Remove redundant word HIGH in priority names,
> and rename *KERNEL* to *HIGH*, as it really
> means that, high.
>
> Signed-off-by: Luben Tuikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  6 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
>  drivers/gpu/drm/scheduler/sched_main.c|  8 
>  include/drm/gpu_scheduler.h   | 15 +--
>  8 files changed, 23 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index d85d13f7a043..fd97ac34277b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -46,7 +46,7 @@ const unsigned int 
> amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
>  static int amdgpu_ctx_priority_permit(struct drm_file *filp,
>   enum drm_sched_priority priority)
>  {
> -   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
> +   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_COUNT)
> return -EINVAL;
>
> /* NORMAL and below are accessible by everyone */
> @@ -65,8 +65,8 @@ static int amdgpu_ctx_priority_permit(struct drm_file *filp,
>  static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
> drm_sched_priority prio)
>  {
> switch (prio) {
> -   case DRM_SCHED_PRIORITY_HIGH_HW:
> -   case DRM_SCHED_PRIORITY_KERNEL:
> +   case DRM_SCHED_PRIORITY_HW:
> +   case DRM_SCHED_PRIORITY_HIGH:
> return AMDGPU_GFX_PIPE_PRIO_HIGH;
> default:
> return AMDGPU_GFX_PIPE_PRIO_NORMAL;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 75d37dfb51aa..bb9e5481ff3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -251,7 +251,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
> drm_gpu_scheduler *sched)
> int i;
>
> /* Signal all jobs not yet scheduled */
> -   for (i = DRM_SCHED_PRIORITY_MAX - 1; i >= DRM_SCHED_PRIORITY_MIN; 
> i--) {
> +   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
> i--) {
> struct drm_sched_rq *rq = &sched->sched_rq[i];
>
> if (!rq)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 13ea8ebc421c..6d4fc79bf84a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -267,7 +267,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ring *ring,
> &ring->sched;
> }
>
> -   for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
> +   for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; ++i)
> atomic_set(&ring->num_jobs[i], 0);
>
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index da871d84b742..7112137689db 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -243,7 +243,7 @@ struct amdgpu_ring {
> boolhas_compute_vm_bug;
> boolno_scheduler;
>
> -   atomic_tnum_jobs[DRM_SCHED_PRIORITY_MAX];
> +   atomic_tnum_jobs[DRM_SCHED_PRIORITY_COUNT];
> struct mutexpriority_mutex;
> /* protected by priority_mutex */
> int priority;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> index c799691dfa84..e05bc22a0a49 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> @@ -36,14 +36,14 @@ enum drm_sched_priority amdgpu_to_sched_priority(int 
> amdgpu_priority)
>  {
> switch (amdgpu_priority) {
> case AMDGPU_CTX_PRIORITY_VERY_HIGH:
> -   return DRM_SCHED_PRIORITY_HIGH_HW;
> +   return DRM_SCHED_PRIORITY_HW;
> case AMDGPU_CTX_PRIORITY_HIGH:
> -   return DRM_SCHED_PRIORITY_HIGH_SW;
> +   return DRM_SCHED_PRIORITY_SW;
> case AMDGPU_CTX_PRIORITY_NORMAL:
> return DRM_SCHED_PRIORITY_NORMAL;
> case AMDGPU_CTX_PRIORITY_LOW:
>

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Luben Tuikov
On 2020-08-14 11:23 a.m., Nirmoy wrote:
> 
> On 8/13/20 11:17 PM, Luben Tuikov wrote:
>> I support having AER handling.
>>
>> However, I feel it should be offloaded to the DRM layer.
>> The PCI driver gets the AER callback and immediately
>> offloads into DRM, as "return drm_aer_recover(dev); }".
>> The DRM layer does a top-down approach into the error
>> recovery procedure.
>>
>> The PCI device driver provides (would/should?) a capability
>> (at registration) which the DRM layer would inspect and
>> subsequently call into the PCI driver's low-level methods
>> to recover the error or to reset the device.
>>
>> But it should be a top-down approach. I believe the thread
>> below has somehow hinted at this.
>>
>> The implementation below boils down to:
>>
>>  If recoverable error, all is good.
>>  If unrecoverable error, then
>>  disable power management,
>>  suspend I/O operations,
>>  cancel pending work,
>>  call into PCI driver to clear
>>  any state it keeps,
>>  call into PCI driver to reset display control.
>>  Etc.
>>
>> And this regime could be performed by DRM.
>>
>> And as you can see now, the function implemented below,
>> *calls into* (that's the key here!) DRM, and it should be
>> the other way around--the DRM should call into the PCI driver,
>> after the PCI driver's callback immediately calls into DRM,
>> as outlined above.
>>
>> This abstraction could be expanded to more concepts of PCIe GPU drivers,
>> and it would scale well, beyond PCIe as a protocol for graphics.
> 
> If drm handles pci error callbacks then it should also handle power 
> management

Yes. LLDDs should have a method for DRM to call to control power
management. For instance the first thing a DRM unifying layer would
do is suspend issuing tasks to the LLDD, then recall (pause?) issued tasks,
do some memory management housekeeping related to the LLDD and then
invoke power management.

Once DRM is aware of power management it can better control
memory and task dependencies.

> 
> because power management also calls into drm in a very similar way. I 
> think these are very
> 
> low device level tasks for drm.

I disagree and think that it is not "very low device level tasks".
Look at the regime posted above. All those directives should/could?
be known to DRM and LLDDs should have entries for DRM to call
into.

I see DRM as more of a unifying layer (perhaps long term), as opposed
to a *library* which LLDDs call into. Then LLDDs would provide an interface
to the hardware. This will help us avoid many of the deadlocks and
synchronization issues which were outlined in a recent memory presentation.
It will also help us avoid code redundancy in LLDDs.

Regards,
Luben

> 
> 
>>> +static const struct pci_error_handlers amdgpu_err_handler = {
>> That's too generic a name for this. I'd rather add "pci" in there,
>>
>> static const struct pci_error_handlers amdgpu_pci_err_handler =  {
> 
> 
> True, thanks for the name suggestion.
> 
> 
> Nirmoy
> 
> 
>>  .element = init,
>>  ...
>> };
>>
>> Being a singular noun from the outset is good and this is preserved.
>>
>>> +   .error_detected = amdgpu_pci_err_detected,
>>> +};
>>> +
>>> +
>>>   static struct pci_driver amdgpu_kms_pci_driver = {
>>> .name = DRIVER_NAME,
>>> .id_table = pciidlist,
>>> @@ -1523,10 +1576,9 @@ static struct pci_driver amdgpu_kms_pci_driver = {
>>> .remove = amdgpu_pci_remove,
>>> .shutdown = amdgpu_pci_shutdown,
>>> .driver.pm = &amdgpu_pm_ops,
>>> +   .err_handler = &amdgpu_err_handler,
>> ".err_handler = amdgpu_pci_err_handler,"
>>
>>
>> Regards,
>> Luben
>>
>> On 2020-08-13 2:18 p.m., Andrey Grodzovsky wrote:
>>> On 8/13/20 11:06 AM, Nirmoy wrote:
 On 8/13/20 3:38 PM, Andrey Grodzovsky wrote:
> On 8/13/20 7:09 AM, Nirmoy wrote:
>> On 8/12/20 4:52 PM, Andrey Grodzovsky wrote:
>>> On 8/11/20 9:30 AM, Nirmoy Das wrote:
 This patch will ignore non-fatal errors and try to
 stop amdgpu's sw stack on fatal errors.

 Signed-off-by: Nirmoy Das 
 ---
    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 56 
 -
    1 file changed, 54 insertions(+), 2 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
 index c1219af2e7d6..2b9ede3000ee 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
 @@ -35,6 +35,7 @@
    #include 
    #include 
    #include 
 +#include 
    #include 
      #include "amdgpu.h"
 @@ -1516,6 +1517,58 @@ static struct drm_driver kms_driver = {
    .patchlevel = KMS_DRIVER_PATCHLEVEL,
    };
    +static pci_ers_result_t amdgpu_pci_err_detected(struct pci_dev 
 *pdev,
 +

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Alex Deucher
On Fri, Aug 14, 2020 at 3:52 PM Luben Tuikov  wrote:
>
> On 2020-08-14 11:23 a.m., Nirmoy wrote:
> >
> > On 8/13/20 11:17 PM, Luben Tuikov wrote:
> >> I support having AER handling.
> >>
> >> However, I feel it should be offloaded to the DRM layer.
> >> The PCI driver gets the AER callback and immediately
> >> offloads into DRM, as "return drm_aer_recover(dev); }".
> >> The DRM layer does a top-down approach into the error
> >> recovery procedure.
> >>
> >> The PCI device driver provides (would/should?) a capability
> >> (at registration) which the DRM layer would inspect and
> >> subsequently call into the PCI driver's low-level methods
> >> to recover the error or to reset the device.
> >>
> >> But it should be a top-down approach. I believe the thread
> >> below has somehow hinted at this.
> >>
> >> The implementation below boils down to:
> >>
> >>  If recoverable error, all is good.
> >>  If unrecoverable error, then
> >>  disable power management,
> >>  suspend I/O operations,
> >>  cancel pending work,
> >>  call into PCI driver to clear
> >>  any state it keeps,
> >>  call into PCI driver to reset display control.
> >>  Etc.
> >>
> >> And this regime could be performed by DRM.
> >>
> >> And as you can see now, the function implemented below,
> >> *calls into* (that's the key here!) DRM, and it should be
> >> the other way around--the DRM should call into the PCI driver,
> >> after the PCI driver's callback immediately calls into DRM,
> >> as outlined above.
> >>
> >> This abstraction could be expanded to more concepts of PCIe GPU drivers,
> >> and it would scale well, beyond PCIe as a protocol for graphics.
> >
> > If drm handles pci error callbacks then it should also handle power
> > management
>
> Yes. LLDDs should have a method for DRM to call to control power
> management. For instance the first thing a DRM unifying layer would
> do is suspend issuing tasks to the LLDD, then recall (pause?) issued tasks,
> do some memory management housekeeping related to the LLDD and then
> invoke power management.
>
> Once DRM is aware of power management it can better control
> memory and task dependencies.
>
> >
> > because power management also calls into drm in a very similar way. I
> > think these are very
> >
> > low device level tasks for drm.
>
> I disagree and think that it is not "very low device level tasks".
> Look at the regime posted above. All those directives should/could?
> be known to DRM and LLDDs should have entries for DRM to call
> into.
>
> I see DRM as more of a unifying layer (perhaps long term), as opposed
> to a *library* which LLDDs call into. Then LLDDs would provide an interface
> to the hardware. This will help us avoid many of the deadlocks and
> synchronization issues which were outlined in a recent memory presentation.
> It will also help us avoid code redundancy in LLDDs.

We are actually trying to move away from that model.  drm started out
as a midlayer and it caused all kinds of pain.  E.g., some drm devices
are usb devices, some are pci, some are platform devices, etc.  A flow
that worked for one didn't always work well for another.  In general
we should follow the driver model for the specific device model of
your driver and then use drm helper functions to deal with the core
drm functionality.  E.g., drm shouldn't be in the business of dealing
with pci power management.  The driver should interact directly with
the pci subsystem for that.

Alex

>
> Regards,
> Luben
>
> >
> >
> >>> +static const struct pci_error_handlers amdgpu_err_handler = {
> >> That's too generic a name for this. I'd rather add "pci" in there,
> >>
> >> static const struct pci_error_handlers amdgpu_pci_err_handler =  {
> >
> >
> > True, thanks for the name suggestion.
> >
> >
> > Nirmoy
> >
> >
> >>  .element = init,
> >>  ...
> >> };
> >>
> >> Being a singular noun from the outset is good and this is preserved.
> >>
> >>> +   .error_detected = amdgpu_pci_err_detected,
> >>> +};
> >>> +
> >>> +
> >>>   static struct pci_driver amdgpu_kms_pci_driver = {
> >>> .name = DRIVER_NAME,
> >>> .id_table = pciidlist,
> >>> @@ -1523,10 +1576,9 @@ static struct pci_driver amdgpu_kms_pci_driver = {
> >>> .remove = amdgpu_pci_remove,
> >>> .shutdown = amdgpu_pci_shutdown,
> >>> .driver.pm = &amdgpu_pm_ops,
> >>> +   .err_handler = &amdgpu_err_handler,
> >> ".err_handler = amdgpu_pci_err_handler,"
> >>
> >>
> >> Regards,
> >> Luben
> >>
> >> On 2020-08-13 2:18 p.m., Andrey Grodzovsky wrote:
> >>> On 8/13/20 11:06 AM, Nirmoy wrote:
>  On 8/13/20 3:38 PM, Andrey Grodzovsky wrote:
> > On 8/13/20 7:09 AM, Nirmoy wrote:
> >> On 8/12/20 4:52 PM, Andrey Grodzovsky wrote:
> >>> On 8/11/20 9:30 AM, Nirmoy Das wrote:
>  This patch will ignore non-fatal errors and try to
>  stop amdgpu's sw stack on fatal errors.
> 
>  Signed-off-by: Nirmoy Das 
> >

Re: [RFC PATCH 1/1] drm/amdgpu: add initial support for pci error handler

2020-08-14 Thread Luben Tuikov
On 2020-08-14 4:10 p.m., Alex Deucher wrote:
>> I see DRM as more of a unifying layer (perhaps long term), as opposed
>> to a *library* which LLDDs call into. Then LLDDs would provide an interface
>> to the hardware. This will help us avoid many of the deadlocks and
>> synchronization issues which were outlined in a recent memory presentation.
>> It will also help us avoid code redundancy in LLDDs.
> 
> We are actually trying to move away from that model.  drm started out
> as a midlayer and it caused all kinds of pain.  E.g., some drm devices
> are usb devices, some are pci, some are platform devices, etc.  A flow
> that worked for one didn't always work well for another.  In general
> we should follow the driver model for the specific device model of
> your driver and then use drm helper functions to deal with the core
> drm functionality.  E.g., drm shouldn't be in the business of dealing
> with pci power management.  The driver should interact directly with
> the pci subsystem for that.

Ah, thanks Alex for the clarification. So a library of sorts--got it.

Regards,
Luben
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/scheduler: Scheduler priority fixes

2020-08-14 Thread Alex Deucher
On Fri, Aug 14, 2020 at 3:33 PM Alex Deucher  wrote:
>
> + dri-devel
>
> On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
> >
> > Remove DRM_SCHED_PRIORITY_LOW, as it was used
> > in only one place.
> >
> > Rename and separate by a line
> > DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
> > as it represents a (total) count of said
> > priorities and it is used as such in loops
> > throughout the code. (0-based indexing is the
> > count number.)
> >
> > Remove redundant word HIGH in priority names,
> > and rename *KERNEL* to *HIGH*, as it really
> > means that, high.
> >
> > Signed-off-by: Luben Tuikov 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  6 +++---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  6 +++---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
> >  drivers/gpu/drm/scheduler/sched_main.c|  8 
> >  include/drm/gpu_scheduler.h   | 15 +--
> >  8 files changed, 23 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > index d85d13f7a043..fd97ac34277b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > @@ -46,7 +46,7 @@ const unsigned int 
> > amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
> >  static int amdgpu_ctx_priority_permit(struct drm_file *filp,
> >   enum drm_sched_priority priority)
> >  {
> > -   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
> > +   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_COUNT)
> > return -EINVAL;
> >
> > /* NORMAL and below are accessible by everyone */
> > @@ -65,8 +65,8 @@ static int amdgpu_ctx_priority_permit(struct drm_file 
> > *filp,
> >  static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
> > drm_sched_priority prio)
> >  {
> > switch (prio) {
> > -   case DRM_SCHED_PRIORITY_HIGH_HW:
> > -   case DRM_SCHED_PRIORITY_KERNEL:
> > +   case DRM_SCHED_PRIORITY_HW:
> > +   case DRM_SCHED_PRIORITY_HIGH:
> > return AMDGPU_GFX_PIPE_PRIO_HIGH;
> > default:
> > return AMDGPU_GFX_PIPE_PRIO_NORMAL;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 75d37dfb51aa..bb9e5481ff3c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -251,7 +251,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
> > drm_gpu_scheduler *sched)
> > int i;
> >
> > /* Signal all jobs not yet scheduled */
> > -   for (i = DRM_SCHED_PRIORITY_MAX - 1; i >= DRM_SCHED_PRIORITY_MIN; 
> > i--) {
> > +   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
> > i--) {
> > struct drm_sched_rq *rq = &sched->sched_rq[i];
> >
> > if (!rq)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index 13ea8ebc421c..6d4fc79bf84a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -267,7 +267,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
> > amdgpu_ring *ring,
> > &ring->sched;
> > }
> >
> > -   for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
> > +   for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; ++i)
> > atomic_set(&ring->num_jobs[i], 0);
> >
> > return 0;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index da871d84b742..7112137689db 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -243,7 +243,7 @@ struct amdgpu_ring {
> > boolhas_compute_vm_bug;
> > boolno_scheduler;
> >
> > -   atomic_tnum_jobs[DRM_SCHED_PRIORITY_MAX];
> > +   atomic_tnum_jobs[DRM_SCHED_PRIORITY_COUNT];
> > struct mutexpriority_mutex;
> > /* protected by priority_mutex */
> > int priority;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > index c799691dfa84..e05bc22a0a49 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > @@ -36,14 +36,14 @@ enum drm_sched_priority amdgpu_to_sched_priority(int 
> > amdgpu_priority)
> >  {
> > switch (amdgpu_priority) {
> > case AMDGPU_CTX_PRIORITY_VERY_HIGH:
> > -   return DRM_SCHED_PRIORITY_HIGH_HW;
> > +   return DRM_SCHED_PRIORI

Re: [PATCH 1/2] drm/scheduler: Scheduler priority fixes

2020-08-14 Thread Luben Tuikov
On 2020-08-14 4:58 p.m., Alex Deucher wrote:
> On Fri, Aug 14, 2020 at 3:33 PM Alex Deucher  wrote:
>>
>> + dri-devel
>>
>> On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
>>>
>>> Remove DRM_SCHED_PRIORITY_LOW, as it was used
>>> in only one place.
>>>
>>> Rename and separate by a line
>>> DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
>>> as it represents a (total) count of said
>>> priorities and it is used as such in loops
>>> throughout the code. (0-based indexing is the
>>> count number.)
>>>
>>> Remove redundant word HIGH in priority names,
>>> and rename *KERNEL* to *HIGH*, as it really
>>> means that, high.
>>>
>>> Signed-off-by: Luben Tuikov 
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  6 +++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  6 +++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
>>>  drivers/gpu/drm/scheduler/sched_main.c|  8 
>>>  include/drm/gpu_scheduler.h   | 15 +--
>>>  8 files changed, 23 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index d85d13f7a043..fd97ac34277b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -46,7 +46,7 @@ const unsigned int 
>>> amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
>>>  static int amdgpu_ctx_priority_permit(struct drm_file *filp,
>>>   enum drm_sched_priority priority)
>>>  {
>>> -   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
>>> +   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_COUNT)
>>> return -EINVAL;
>>>
>>> /* NORMAL and below are accessible by everyone */
>>> @@ -65,8 +65,8 @@ static int amdgpu_ctx_priority_permit(struct drm_file 
>>> *filp,
>>>  static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
>>> drm_sched_priority prio)
>>>  {
>>> switch (prio) {
>>> -   case DRM_SCHED_PRIORITY_HIGH_HW:
>>> -   case DRM_SCHED_PRIORITY_KERNEL:
>>> +   case DRM_SCHED_PRIORITY_HW:
>>> +   case DRM_SCHED_PRIORITY_HIGH:
>>> return AMDGPU_GFX_PIPE_PRIO_HIGH;
>>> default:
>>> return AMDGPU_GFX_PIPE_PRIO_NORMAL;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 75d37dfb51aa..bb9e5481ff3c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -251,7 +251,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
>>> drm_gpu_scheduler *sched)
>>> int i;
>>>
>>> /* Signal all jobs not yet scheduled */
>>> -   for (i = DRM_SCHED_PRIORITY_MAX - 1; i >= DRM_SCHED_PRIORITY_MIN; 
>>> i--) {
>>> +   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
>>> i--) {
>>> struct drm_sched_rq *rq = &sched->sched_rq[i];
>>>
>>> if (!rq)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> index 13ea8ebc421c..6d4fc79bf84a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> @@ -267,7 +267,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
>>> amdgpu_ring *ring,
>>> &ring->sched;
>>> }
>>>
>>> -   for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
>>> +   for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; ++i)
>>> atomic_set(&ring->num_jobs[i], 0);
>>>
>>> return 0;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index da871d84b742..7112137689db 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -243,7 +243,7 @@ struct amdgpu_ring {
>>> boolhas_compute_vm_bug;
>>> boolno_scheduler;
>>>
>>> -   atomic_tnum_jobs[DRM_SCHED_PRIORITY_MAX];
>>> +   atomic_tnum_jobs[DRM_SCHED_PRIORITY_COUNT];
>>> struct mutexpriority_mutex;
>>> /* protected by priority_mutex */
>>> int priority;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>>> index c799691dfa84..e05bc22a0a49 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>>> @@ -36,14 +36,14 @@ enum drm_sched_priority amdgpu_to_sched_priority(int 
>>> amdgpu_priority)
>>>  {
>>> switch (amdgpu_priority) {
>>> case AMDGPU_CTX_PRIORITY_VERY_HIGH:
>>> -   return DRM_SCHED_PRIORITY_H

Re: [PATCH 2/2] drm/scheduler: Remove priority macro INVALID

2020-08-14 Thread Luben Tuikov
On 2020-08-14 3:28 p.m., Alex Deucher wrote:
> + dri-devel
> 
> 
> On Fri, Aug 14, 2020 at 3:14 PM Luben Tuikov  wrote:
>>
>> Remove DRM_SCHED_PRIORITY_INVALID. We no longer
>> carry around an invalid priority and cut it off
>> at the source.
>>
>> Backwards compatibility behaviour of AMDGPU CTX
>> IOCTL passing in garbage for context priority
>> from user space and then mapping that to
>> DRM_SCHED_PRIORITY_NORMAL is preserved.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 21 
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 62 +++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
>>  include/drm/gpu_scheduler.h   |  1 -
>>  4 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index fd97ac34277b..10d3bfa416c0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -384,42 +384,41 @@ static int amdgpu_ctx_query2(struct amdgpu_device 
>> *adev,
>>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>  struct drm_file *filp)
>>  {
>> -   int r;
>> +   int res;
>> uint32_t id;
>> -   enum drm_sched_priority priority;
>> +   enum drm_sched_priority prio;
> 
> The variable renaming is not directly related to the functional change
> in the patch and should be split out as a separate patch is you think
> it should be applied.  I personally prefer the original naming.  The
> driver uses r or ret for return value checking pretty consistently.  I
> also prefer to spell things out unless the names are really long.
> Makes it more obvious what they are for.  Same comment on the
> functions below as well.

Sure, I can revert this. Personally, I don't like single letter
variables as they are very inexpressive and hard to search for.
I thought "prio" would be easier to type than "priority", but I can
revert this.

Regards,
Luben

> 
> Alex
> 
>>
>> union drm_amdgpu_ctx *args = data;
>> struct amdgpu_device *adev = dev->dev_private;
>> struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>
>> -   r = 0;
>> id = args->in.ctx_id;
>> -   priority = amdgpu_to_sched_priority(args->in.priority);
>> +   res = amdgpu_to_sched_priority(args->in.priority, &prio);
>>
>> /* For backwards compatibility reasons, we need to accept
>>  * ioctls with garbage in the priority field */
>> -   if (priority == DRM_SCHED_PRIORITY_INVALID)
>> -   priority = DRM_SCHED_PRIORITY_NORMAL;
>> +   if (res == -EINVAL)
>> +   prio = DRM_SCHED_PRIORITY_NORMAL;
>>
>> switch (args->in.op) {
>> case AMDGPU_CTX_OP_ALLOC_CTX:
>> -   r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
>> +   res = amdgpu_ctx_alloc(adev, fpriv, filp, prio, &id);
>> args->out.alloc.ctx_id = id;
>> break;
>> case AMDGPU_CTX_OP_FREE_CTX:
>> -   r = amdgpu_ctx_free(fpriv, id);
>> +   res = amdgpu_ctx_free(fpriv, id);
>> break;
>> case AMDGPU_CTX_OP_QUERY_STATE:
>> -   r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
>> +   res = amdgpu_ctx_query(adev, fpriv, id, &args->out);
>> break;
>> case AMDGPU_CTX_OP_QUERY_STATE2:
>> -   r = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
>> +   res = amdgpu_ctx_query2(adev, fpriv, id, &args->out);
>> break;
>> default:
>> return -EINVAL;
>> }
>>
>> -   return r;
>> +   return res;
>>  }
>>
>>  struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>> index e05bc22a0a49..2398eb646043 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
>> @@ -32,24 +32,32 @@
>>
>>  #include "amdgpu_vm.h"
>>
>> -enum drm_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
>> +int amdgpu_to_sched_priority(int amdgpu_priority,
>> +enum drm_sched_priority *prio)
>>  {
>> switch (amdgpu_priority) {
>> case AMDGPU_CTX_PRIORITY_VERY_HIGH:
>> -   return DRM_SCHED_PRIORITY_HW;
>> +   *prio = DRM_SCHED_PRIORITY_HW;
>> +   break;
>> case AMDGPU_CTX_PRIORITY_HIGH:
>> -   return DRM_SCHED_PRIORITY_SW;
>> +   *prio = DRM_SCHED_PRIORITY_SW;
>> +   break;
>> case AMDGPU_CTX_PRIORITY_NORMAL:
>> -   return DRM_SCHED_PRIORITY_NORMAL;
>> +   *prio = DRM_SCHED_PRIORITY_NORMAL;
>> +   break;
>> case AMDGPU_CTX_PRIORITY_LOW:
>> case AMDGPU_CTX_PRIORITY_VERY_LOW:
>> -   

Re: [PATCH] drm/dp_mst: Don't return error code when crtc is null

2020-08-14 Thread Lakha, Bhawanpreet
[AMD Official Use Only - Internal Distribution Only]

I took a closer look at this and there seems to be an issue in the function.

Crtc being null is a valid case here. The sequence that leads to this is: 
unplug -> disable crtc/release vcpi slots, then hotplug. The issue is that 
pos->port is not guaranteed to be released in 
drm_dp_atomic_release_vcpi_slots(), so list_for_each_entry(pos, 
&mst_state->vcpis, next) {} might still iterate through it.

So, when a hotplug is done we still loop through the old port, which has port != 
NULL, crtc == NULL, and vcpi == 0. I didn't find anything that I can use to 
remove the port from the list. So, a potential solution to this would be to add 
a check for vcpi == 0 and skip that port.

Thoughts/Suggestions?

Bhawan




From: amd-gfx  on behalf of Lakha, 
Bhawanpreet 
Sent: August 14, 2020 2:52 PM
To: Ruhl, Michael J ; Lipski, Mikita 
; Kazlauskas, Nicholas ; 
Deucher, Alexander 
Cc: amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/dp_mst: Don't return error code when crtc is null


pos->port->connector?
This is checking the crtc, not the connector. The crtc can be null if it's 
disabled.

Since it is happening after an unplug->hotplug, I guess we are missing something 
in the disable sequence and the old connector is still in the list.

Bhawan

>>-Original Message-
>>From: dri-devel  On Behalf Of
>>Bhawanpreet Lakha
>>Sent: Friday, August 14, 2020 1:02 PM
>>To: mikita.lip...@amd.com; nicholas.kazlaus...@amd.com;
>>alexander.deuc...@amd.com
>>Cc: Bhawanpreet Lakha ; dri-
>>de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>>Subject: [PATCH] drm/dp_mst: Don't return error code when crtc is null
>>
>>[Why]
>>In certain cases the crtc can be NULL and returning -EINVAL causes
>>atomic check to fail when it shouldn't. This leads to valid
>>configurations failing because atomic check fails.
>
>So is this a bug fix or an exception case, or an expected possibility?
>
>From my reading of the function comments, it is not clear that 
>pos->port->connector
>might be NULL for some reason.

>A better explanation of why this would occur would make this a much more
>useful commit message.
>

>My reading is that you ran into this issue an are masking it with this fix.
>
>Rather than this is a real possibility and this is the correct fix.
>
>Mike
>
>>[How]
>>Don't early return if crtc is null
>>
>>Signed-off-by: Bhawanpreet Lakha 
>>---
>> drivers/gpu/drm/drm_dp_mst_topology.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
>>b/drivers/gpu/drm/drm_dp_mst_topology.c
>>index 70c4b7afed12..bc90a1485699 100644
>>--- a/drivers/gpu/drm/drm_dp_mst_topology.c
>>+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>>@@ -5037,8 +5037,8 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct
>>drm_atomic_state *state, struct drm
>>
>>crtc = conn_state->crtc;
>>
>>-  if (WARN_ON(!crtc))
>>-  return -EINVAL;
>>+  if (!crtc)
>>+  continue;
>>
>>if (!drm_dp_mst_dsc_aux_for_port(pos->port))
>>continue;
>>--
>>2.17.1
>>
>>___
>>dri-devel mailing list
>>dri-de...@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/2] drm/amdgpu: Embed drm_device into amdgpu_device

2020-08-14 Thread Luben Tuikov
Embed struct drm_device into struct amdgpu_device.
Modify the macro DRM_TO_ADEV()
accordingly. Introduce a new macro to yield the
DRM device from amdgpu_device, ADEV_TO_DRM().
Eliminate the use of drm_device.dev_private,
in amdgpu.
Add a DRM driver release function, which frees
the amdgpu_device after all krefs on the DRM
device have been released.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |   6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 178 
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  41 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  37 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  51 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   |  18 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 194 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c  |   2 +-
 .../gpu/drm/amd/amdgpu/atombios_encoders.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c|  46 ++---
 drivers/gpu/drm/amd/amdgpu/dce_v11_0.c|  46 ++---
 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c |  46 ++---
 drivers/gpu/drm/amd/amdgpu/dce_v8_0.c |  46 ++---
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c  |  36 ++--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  78 +++
 .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |   4 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c |   4 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |   4 +-
 35 files changed, 475 insertions(+), 467 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0268bb1da82b..5581ba958fca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -724,8 +724,8 @@ struct amd_powerplay {
 #define AMDGPU_MAX_DF_PERFMONS 4
 struct amdgpu_device {
struct device   *dev;
-   struct drm_device   *ddev;
struct pci_dev  *pdev;
+   struct drm_device   ddev;
 
 #ifdef CONFIG_DRM_AMD_ACP
struct amdgpu_acp   acp;
@@ -988,7 +988,8 @@ struct amdgpu_device {
struct ratelimit_state  throttling_logging_rs;
 };
 
-#define DRM_TO_ADEV(_drmdevp)  ((struct amdgpu_device 
*)(_drmdevp)->dev_private)
+#define DRM_TO_ADEV(_drmdevp) container_of(_drmdevp, struct amdgpu_device, 
ddev)
+#define ADEV_TO_DRM(_amddevp) ((struct drm_device *)(&((_amddevp)->ddev)))
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 {
@@ -996,8 +997,6 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct 
ttm_bo_device *bdev)
 }
 
 int amdgpu_device_init(struct amdgpu_device *adev,
-  struct drm_device *ddev,
-  struct pci_dev *pdev,
   uint32_t flags);
 void amdgpu_device_fini(struct amdgpu_device *adev);
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
@@ -1187,7 +1186,7 @@ static inline void *amdgpu_atpx_get_dhandle(void) { 
return NULL; }
 extern const struct drm_ioctl_desc amdgpu_ioctls_kms[];
 extern const int amdgpu_max_kms_ioctl;
 
-int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags);
+int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags);
 void amdgpu_driver_unload_kms(struct drm_device *dev);
 void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 913c8f0513bd..0d81fd70775a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -463,11 +463,11 @@ static int amdgpu_atif_handler(struct amdgpu_device *adev,
 
if (req.pending & ATIF_DGPU_DISPLAY_EVENT) {
if (adev->flags & AMD_IS_PX) {
-   pm_runtime_get_sync(adev->ddev->dev);
+   pm_runtime_get_sync(adev->ddev.dev);
   

[PATCH 0/2] Embed drm_device and eliminate use of dev_private

2020-08-14 Thread Luben Tuikov
As per the comments in include/drm/drm_device.h, struct
drm_device::dev_private seems to be obsolete and it is
recommended that drivers embed struct drm_device into their
larger per-device structure.

This patch-set embeds struct drm_device into struct
amdgpu_device, adds accessor macros for both structures from
one another, adds a DRM driver release callback to free the
container struct amdgpu_device, and eliminates using struct
drm_device::dev_private.

Luben Tuikov (2):
  drm/amdgpu: amdgpu_device from DRM dev by macro
  drm/amdgpu: Embed drm_device into amdgpu_device

 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 186 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  61 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  45 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  61 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  36 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 274 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c  |   6 +-
 drivers/gpu/drm/amd/amdgpu/atombios_crtc.c|  22 +-
 drivers/gpu/drm/amd/amdgpu/atombios_dp.c  |   6 +-
 .../gpu/drm/amd/amdgpu/atombios_encoders.c|  36 +--
 drivers/gpu/drm/amd/amdgpu/atombios_i2c.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c|  94 +++---
 drivers/gpu/drm/amd/amdgpu/dce_v11_0.c|  96 +++---
 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 104 +++
 drivers/gpu/drm/amd/amdgpu/dce_v8_0.c |  94 +++---
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c  |  40 +--
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c  |   2 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 139 +
 .../amd/display/amdgpu_dm/amdgpu_dm_color.c   |   3 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c |   2 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |   8 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c |   4 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |   8 +-
 54 files changed, 774 insertions(+), 766 deletions(-)

-- 
2.28.0.215.g878e727637



[PATCH 1/2] drm/amdgpu: amdgpu_device from DRM dev by macro

2020-08-14 Thread Luben Tuikov
Get the amdgpu_device from the DRM device by use
of a macro, DRM_TO_ADEV(). The macro resolves
a pointer to struct drm_device to a pointer to
struct amdgpu_device.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c  | 10 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 18 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 20 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 12 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_encoders.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 10 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c   | 12 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   | 16 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 80 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 10 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c  |  4 +-
 drivers/gpu/drm/amd/amdgpu/atombios_crtc.c| 22 ++---
 drivers/gpu/drm/amd/amdgpu/atombios_dp.c  |  6 +-
 .../gpu/drm/amd/amdgpu/atombios_encoders.c| 34 
 drivers/gpu/drm/amd/amdgpu/atombios_i2c.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 48 +--
 drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 50 ++--
 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 58 +++---
 drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 48 +--
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c  |  4 +-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c  |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 61 +++---
 .../amd/display/amdgpu_dm/amdgpu_dm_color.c   |  3 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c |  2 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |  4 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  4 +-
 44 files changed, 301 insertions(+), 301 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 1f9d97f61aa5..0268bb1da82b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -988,6 +988,8 @@ struct amdgpu_device {
struct ratelimit_state  throttling_logging_rs;
 };
 
+#define DRM_TO_ADEV(_drmdevp)  ((struct amdgpu_device 
*)(_drmdevp)->dev_private)
+
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 {
return container_of(bdev, struct amdgpu_device, mman.bdev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 9738dccb1c2c..cf185f3b8ae8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -498,7 +498,7 @@ int amdgpu_amdkfd_get_dmabuf_info(struct kgd_dev *kgd, int 
dma_buf_fd,
/* Can't handle buffers from different drivers */
goto out_put;
 
-   adev = obj->dev->dev_private;
+   adev = DRM_TO_ADEV(obj->dev);
bo = gem_to_amdgpu_bo(obj);
if (!(bo->preferred_domains & (AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0d75726bd228..aed2bd9e845e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1681,7 +1681,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
return -EINVAL;
 
obj = dma_buf->priv;
-   if (obj->dev->dev_private != adev)
+   if (DRM_TO_ADEV(obj->dev) != adev)
/* Can't handle buffers from other devices */
return -EINVAL;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index e33f63712b46..351ad1945199 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1882,7 +1882,7 @@ static void cail_mc_write(struct card_info *info, 
uint32_t reg, uint32_t val)
  */
 static void cail_reg_write(struct card_info *info, uint32_t re

[PATCH 2/2] drm/scheduler: Remove priority macro INVALID (v2)

2020-08-14 Thread Luben Tuikov
Remove DRM_SCHED_PRIORITY_INVALID. We no longer
carry around an invalid priority and cut it off
at the source.

Backwards compatibility behaviour of AMDGPU CTX
IOCTL passing in garbage for context priority
from user space and then mapping that to
DRM_SCHED_PRIORITY_NORMAL is preserved.

v2: Revert "res"  --> "r" and
   "prio" --> "priority".

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  5 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 40 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
 include/drm/gpu_scheduler.h   |  1 -
 4 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 68eaa4f687a6..0a980f59cfd0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -392,13 +392,12 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_fpriv *fpriv = filp->driver_priv;
 
-   r = 0;
id = args->in.ctx_id;
-   priority = amdgpu_to_sched_priority(args->in.priority);
+   r = amdgpu_to_sched_priority(args->in.priority, &priority);
 
/* For backwards compatibility reasons, we need to accept
 * ioctls with garbage in the priority field */
-   if (priority == DRM_SCHED_PRIORITY_INVALID)
+   if (r == -EINVAL)
priority = DRM_SCHED_PRIORITY_NORMAL;
 
switch (args->in.op) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index 17661ede9488..9581283a4c78 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -32,24 +32,32 @@
 
 #include "amdgpu_vm.h"
 
-enum drm_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
+int amdgpu_to_sched_priority(int amdgpu_priority,
+enum drm_sched_priority *prio)
 {
switch (amdgpu_priority) {
case AMDGPU_CTX_PRIORITY_VERY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH;
+   *prio = DRM_SCHED_PRIORITY_HIGH;
+   break;
case AMDGPU_CTX_PRIORITY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH;
+   *prio = DRM_SCHED_PRIORITY_HIGH;
+   break;
case AMDGPU_CTX_PRIORITY_NORMAL:
-   return DRM_SCHED_PRIORITY_NORMAL;
+   *prio = DRM_SCHED_PRIORITY_NORMAL;
+   break;
case AMDGPU_CTX_PRIORITY_LOW:
case AMDGPU_CTX_PRIORITY_VERY_LOW:
-   return DRM_SCHED_PRIORITY_MIN;
+   *prio = DRM_SCHED_PRIORITY_MIN;
+   break;
case AMDGPU_CTX_PRIORITY_UNSET:
-   return DRM_SCHED_PRIORITY_UNSET;
+   *prio = DRM_SCHED_PRIORITY_UNSET;
+   break;
default:
WARN(1, "Invalid context priority %d\n", amdgpu_priority);
-   return DRM_SCHED_PRIORITY_INVALID;
+   return -EINVAL;
}
+
+   return 0;
 }
 
 static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
@@ -119,9 +127,20 @@ int amdgpu_sched_ioctl(struct drm_device *dev, void *data,
enum drm_sched_priority priority;
int r;
 
-   priority = amdgpu_to_sched_priority(args->in.priority);
-   if (priority == DRM_SCHED_PRIORITY_INVALID)
+   /* First check the op, then the op's argument.
+*/
+   switch (args->in.op) {
+   case AMDGPU_SCHED_OP_PROCESS_PRIORITY_OVERRIDE:
+   case AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE:
+   break;
+   default:
+   DRM_ERROR("Invalid sched op specified: %d\n", args->in.op);
return -EINVAL;
+   }
+
+   r = amdgpu_to_sched_priority(args->in.priority, &priority);
+   if (r)
+   return r;
 
switch (args->in.op) {
case AMDGPU_SCHED_OP_PROCESS_PRIORITY_OVERRIDE:
@@ -136,7 +155,8 @@ int amdgpu_sched_ioctl(struct drm_device *dev, void *data,
   priority);
break;
default:
-   DRM_ERROR("Invalid sched op specified: %d\n", args->in.op);
+   /* Impossible.
+*/
r = -EINVAL;
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h
index 12299fd95691..67e5b2472f6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h
@@ -30,7 +30,8 @@ enum drm_sched_priority;
 struct drm_device;
 struct drm_file;
 
-enum drm_sched_priority amdgpu_to_sched_priority(int amdgpu_priority);
+int amdgpu_to_sched_priority(int amdgpu_priority,
+enum drm_sched_priority *prio);
 int amdgpu_sched_ioctl(struct drm_device *dev, void *data,
   struct drm_file *fil

[PATCH 1/2] drm/scheduler: Scheduler priority fixes (v2)

2020-08-14 Thread Luben Tuikov
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.

Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (With 0-based indexing, the
count is one past the last valid priority.)

Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.

v2: Add back KERNEL and remove SW and HW,
in lieu of a single HIGH between NORMAL and KERNEL.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c|  4 ++--
 include/drm/gpu_scheduler.h   | 12 +++-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index d85d13f7a043..68eaa4f687a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -46,7 +46,7 @@ const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] 
= {
 static int amdgpu_ctx_priority_permit(struct drm_file *filp,
  enum drm_sched_priority priority)
 {
-   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
+   if (priority < 0 || priority >= DRM_SCHED_PRIORITY_COUNT)
return -EINVAL;
 
/* NORMAL and below are accessible by everyone */
@@ -65,7 +65,7 @@ static int amdgpu_ctx_priority_permit(struct drm_file *filp,
 static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
drm_sched_priority prio)
 {
switch (prio) {
-   case DRM_SCHED_PRIORITY_HIGH_HW:
+   case DRM_SCHED_PRIORITY_HIGH:
case DRM_SCHED_PRIORITY_KERNEL:
return AMDGPU_GFX_PIPE_PRIO_HIGH;
default:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 75d37dfb51aa..bb9e5481ff3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -251,7 +251,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
int i;
 
/* Signal all jobs not yet scheduled */
-   for (i = DRM_SCHED_PRIORITY_MAX - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
+   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
i--) {
struct drm_sched_rq *rq = &sched->sched_rq[i];
 
if (!rq)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13ea8ebc421c..6d4fc79bf84a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -267,7 +267,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
&ring->sched;
}
 
-   for (i = 0; i < DRM_SCHED_PRIORITY_MAX; ++i)
+   for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; ++i)
atomic_set(&ring->num_jobs[i], 0);
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index da871d84b742..7112137689db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -243,7 +243,7 @@ struct amdgpu_ring {
boolhas_compute_vm_bug;
boolno_scheduler;
 
-   atomic_tnum_jobs[DRM_SCHED_PRIORITY_MAX];
+   atomic_tnum_jobs[DRM_SCHED_PRIORITY_COUNT];
struct mutexpriority_mutex;
/* protected by priority_mutex */
int priority;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index c799691dfa84..17661ede9488 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -36,14 +36,14 @@ enum drm_sched_priority amdgpu_to_sched_priority(int 
amdgpu_priority)
 {
switch (amdgpu_priority) {
case AMDGPU_CTX_PRIORITY_VERY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH_HW;
+   return DRM_SCHED_PRIORITY_HIGH;
case AMDGPU_CTX_PRIORITY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH_SW;
+   return DRM_SCHED_PRIORITY_HIGH;
case AMDGPU_CTX_PRIORITY_NORMAL:
return DRM_SCHED_PRIORITY_NORMAL;
case AMDGPU_CTX_PRIORITY_LOW:
case AMDGPU_CTX_PRIORITY_VERY_LOW:
-   return DRM_SCHED_PRIORITY_LOW;
+   return DRM_SCHED_PRIORITY_MIN;
case AMDGPU_CTX_PRIORITY_UNSET:
return DRM_SCHED_PRIORITY_UNSET;
default:
diff --git a/drivers/gpu/drm/amd/amdgpu/amd

[PATCH 0/2] Fixes to drm_sched_priority (v2)

2020-08-14 Thread Luben Tuikov
Some fixes to enum drm_sched_priority which I'd wanted to do
since last year.

For instance, renaming MAX to COUNT, as usually a maximum value
is a value which is part of the set of values, (e.g. a maxima of
a function), and thus assignable, whereby a count is the size of
a set (the enumeration in this case). It also makes it clearer
when used to define size of arrays.

Remove some redundant naming. Perhaps better names could be had
for PRIORITY_SW and PRIORITY_HW, maybe "moderate" and
"temperate" respectively; however, I've left them as "sw" and
"hw".

Also remove a macro which was used in only a single place.

In the second patch, remove priority INVALID, since it is not a
priority, e.g. a scheduler cannot be assigned and run at priority
"invalid". It seems to be more of a directive, a status, of user
input, and as such has no place in the enumeration of priority
levels.

Something else I'd like to do is to eliminate the priority
enumeration "UNSET", since it is not really a priority level,
but a directive, instructing the code to "reset the priority
of a context to the initial priority".

At the moment, this functionality is overloaded to this priority
value, and it should be an IOCTL op, as opposed to a priority value.

Tested on my desktop system, which is running a kernel with
this patch applied.

v2: Some reverts; eliminate SW and HW for HIGH, and bring back
KERNEL--the highest!

Luben Tuikov (2):
  drm/scheduler: Scheduler priority fixes (v2)
  drm/scheduler: Remove priority macro INVALID (v2)

 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  9 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 40 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.h |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c|  4 +--
 include/drm/gpu_scheduler.h   | 13 
 9 files changed, 49 insertions(+), 28 deletions(-)

-- 
2.28.0.215.g878e727637
