RE: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Yu, Lang


>-Original Message-
>From: Kuehling, Felix 
>Sent: Thursday, September 30, 2021 11:26 AM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>
>Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>
>On 2021-09-29 10:38 p.m., Yu, Lang wrote:
>>> -Original Message-
>>> From: Kuehling, Felix 
>>> Sent: Thursday, September 30, 2021 10:28 AM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> 
>>> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>>>
>>> On 2021-09-29 10:23 p.m., Yu, Lang wrote:
> -Original Message-
> From: Kuehling, Felix 
> Sent: Thursday, September 30, 2021 9:47 AM
> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory
> leak
>
> On 2021-09-29 7:32 p.m., Yu, Lang wrote:
>> [AMD Official Use Only]
>>
>>
>>
>>> -Original Message-
>>> From: Kuehling, Felix 
>>> Sent: Wednesday, September 29, 2021 11:25 PM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang,
>Ray
>>> 
>>> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory
>>> leak
>>>
>>> On 2021-09-29 at 4:22 a.m., Lang Yu wrote:
 If the user doesn't explicitly call kfd_ioctl_destroy_queue to
 destroy all created queues, when the kfd process is destroyed,
 some queues' cu_mask memory is not freed.

 To avoid forgetting to free them in some places, free them
 immediately after use.

 Signed-off-by: Lang Yu 
 ---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 --
 2 files changed, 8 insertions(+), 10 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 index 4de907f3e66a..5c0e6dcf692a 100644
 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 @@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file
 *filp, struct
>>> kfd_process *p,
retval = copy_from_user(properties.cu_mask, cu_mask_ptr,
>>> cu_mask_size);
if (retval) {
pr_debug("Could not copy CU mask from userspace");
 -  kfree(properties.cu_mask);
 -  return -EFAULT;
 +  retval = -EFAULT;
 +  goto out;
}

	mutex_lock(&p->mutex);
 @@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
 *filp, struct kfd_process *p,

	mutex_unlock(&p->mutex);

 -  if (retval)
 -  kfree(properties.cu_mask);
 +out:
 +  kfree(properties.cu_mask);

return retval;
 }
 diff --git
 a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 index 243dd1efcdbf..4c81d690f31a 100644
 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 @@ -394,8 +394,6 @@ int pqm_destroy_queue(struct
>>> process_queue_manager *pqm, unsigned int qid)
pdd->qpd.num_gws = 0;
}

 -  kfree(pqn->q->properties.cu_mask);
 -  pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}

 @@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct
>>> process_queue_manager *pqm, unsigned int qid,
return -EFAULT;
}

 -  /* Free the old CU mask memory if it is already allocated, then
 -   * allocate memory for the new CU mask.
 -   */
 -  kfree(pqn->q->properties.cu_mask);
 +  WARN_ON_ONCE(pqn->q->properties.cu_mask);

pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;

	retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
			pqn->q);
 +
 +  pqn->q->properties.cu_mask = NULL;
 +
>>> This won't work correctly. We need to save the cu_mask for later.
>>> Otherwise the next time dqm->ops.update_queue is called, for
>>> example in pqm_update_queue or pqm_set_gws, it will wipe out the
>>> CU mask in the
> 

Re: [PATCH 2/4] amdgpu_ucode: reduce number of pr_debug calls

2021-09-29 Thread jim . cromie
On Wed, Sep 29, 2021 at 8:08 PM Joe Perches  wrote:
>
> On Wed, 2021-09-29 at 19:44 -0600, Jim Cromie wrote:
> > There are blocks of DRM_DEBUG calls, consolidate their args into
> > single calls.  With dynamic-debug in use, each callsite consumes 56
> > bytes of callsite data, and this patch removes about 65 calls, so
> > it saves ~3.5kb.
> >
> > no functional changes.
>
> No functional change, but an output logging content change.
>
> > RFC: this creates multi-line log messages, does that break any syslog
> > conventions ?
>
> It does change the output as each individual DRM_DEBUG is a call to
> __drm_dbg which is effectively:
>
> printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
>__builtin_return_address(0), &vaf);
>
>

ok.  that would disqualify the nouveau patch too.


Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Felix Kuehling

On 2021-09-29 10:38 p.m., Yu, Lang wrote:

-Original Message-
From: Kuehling, Felix 
Sent: Thursday, September 30, 2021 10:28 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

On 2021-09-29 10:23 p.m., Yu, Lang wrote:

-Original Message-
From: Kuehling, Felix 
Sent: Thursday, September 30, 2021 9:47 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

On 2021-09-29 7:32 p.m., Yu, Lang wrote:

[AMD Official Use Only]




-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, September 29, 2021 11:25 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory
leak

On 2021-09-29 at 4:22 a.m., Lang Yu wrote:

If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy
all created queues, when the kfd process is destroyed, some queues'
cu_mask memory is not freed.

To avoid forgetting to free them in some places, free them
immediately after use.

Signed-off-by: Lang Yu 
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
--
2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4de907f3e66a..5c0e6dcf692a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file
*filp, struct

kfd_process *p,

retval = copy_from_user(properties.cu_mask, cu_mask_ptr,

cu_mask_size);

if (retval) {
pr_debug("Could not copy CU mask from userspace");
-   kfree(properties.cu_mask);
-   return -EFAULT;
+   retval = -EFAULT;
+   goto out;
}

mutex_lock(&p->mutex);
@@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
*filp, struct kfd_process *p,

mutex_unlock(&p->mutex);

-   if (retval)
-   kfree(properties.cu_mask);
+out:
+   kfree(properties.cu_mask);

return retval;
}
diff --git
a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 243dd1efcdbf..4c81d690f31a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -394,8 +394,6 @@ int pqm_destroy_queue(struct

process_queue_manager *pqm, unsigned int qid)

pdd->qpd.num_gws = 0;
}

-   kfree(pqn->q->properties.cu_mask);
-   pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}

@@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct

process_queue_manager *pqm, unsigned int qid,

return -EFAULT;
}

-   /* Free the old CU mask memory if it is already allocated, then
-* allocate memory for the new CU mask.
-*/
-   kfree(pqn->q->properties.cu_mask);
+   WARN_ON_ONCE(pqn->q->properties.cu_mask);

pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;

retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
		pqn->q);
+
+   pqn->q->properties.cu_mask = NULL;
+

This won't work correctly. We need to save the cu_mask for later.
Otherwise the next time dqm->ops.update_queue is called, for
example in pqm_update_queue or pqm_set_gws, it will wipe out the CU
mask in the MQD.

Let's just return when meeting a null cu_mask in update_cu_mask() to
avoid that.

Like the following,

static void update_cu_mask(struct mqd_manager *mm, void *mqd,
   struct queue_properties *q)
{
struct v10_compute_mqd *m;
uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */

if (!q->cu_mask || q->cu_mask_count == 0)
return;
..
}

Is this fine with you? Thanks!

I think that could work. I still don't like it. It leaves the CU mask
in the q->properties structure, but it's only ever used temporarily and
doesn't need to be persistent. I'd argue, in this case, the cu_mask
shouldn't be in the q->properties structure at all, but should be passed
as an optional parameter into the dqm->ops.update_queue call.

The cu_mask is originally in the q->properties structure. I didn't change that.
What I want to do is keep the cu_mask memory allocation and deallocation
just in kfd_ioctl_set_cu_mask, instead of everywhere.

You're not changing where it is stored. But you're changing it from something
persistent (while the queue exists) to something ephemeral (while the ioctl
call is executing in the kernel). 

RE: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Yu, Lang


>-Original Message-
>From: Kuehling, Felix 
>Sent: Thursday, September 30, 2021 10:28 AM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>
>Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>
>On 2021-09-29 10:23 p.m., Yu, Lang wrote:
>>> -Original Message-
>>> From: Kuehling, Felix 
>>> Sent: Thursday, September 30, 2021 9:47 AM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> 
>>> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>>>
>>> On 2021-09-29 7:32 p.m., Yu, Lang wrote:
 [AMD Official Use Only]



> -Original Message-
> From: Kuehling, Felix 
> Sent: Wednesday, September 29, 2021 11:25 PM
> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> 
> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory
> leak
>
> On 2021-09-29 at 4:22 a.m., Lang Yu wrote:
>> If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy
>> all created queues, when the kfd process is destroyed, some queues'
>> cu_mask memory is not freed.
>>
>> To avoid forgetting to free them in some places, free them
>> immediately after use.
>>
>> Signed-off-by: Lang Yu 
>> ---
>>drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
>>drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
>> --
>>2 files changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 4de907f3e66a..5c0e6dcf692a 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file
>> *filp, struct
> kfd_process *p,
>>  retval = copy_from_user(properties.cu_mask, cu_mask_ptr,
> cu_mask_size);
>>  if (retval) {
>>  pr_debug("Could not copy CU mask from userspace");
>> -kfree(properties.cu_mask);
>> -return -EFAULT;
>> +retval = -EFAULT;
>> +goto out;
>>  }
>>
>>  mutex_lock(&p->mutex);
>> @@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
>> *filp, struct kfd_process *p,
>>
>>  mutex_unlock(&p->mutex);
>>
>> -if (retval)
>> -kfree(properties.cu_mask);
>> +out:
>> +kfree(properties.cu_mask);
>>
>>  return retval;
>>}
>> diff --git
>> a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 243dd1efcdbf..4c81d690f31a 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -394,8 +394,6 @@ int pqm_destroy_queue(struct
> process_queue_manager *pqm, unsigned int qid)
>>  pdd->qpd.num_gws = 0;
>>  }
>>
>> -kfree(pqn->q->properties.cu_mask);
>> -pqn->q->properties.cu_mask = NULL;
>>  uninit_queue(pqn->q);
>>  }
>>
>> @@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct
> process_queue_manager *pqm, unsigned int qid,
>>  return -EFAULT;
>>  }
>>
>> -/* Free the old CU mask memory if it is already allocated, then
>> - * allocate memory for the new CU mask.
>> - */
>> -kfree(pqn->q->properties.cu_mask);
>> +WARN_ON_ONCE(pqn->q->properties.cu_mask);
>>
>>  pqn->q->properties.cu_mask_count = p->cu_mask_count;
>>  pqn->q->properties.cu_mask = p->cu_mask;
>>
>>  retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>>  pqn->q);
>> +
>> +pqn->q->properties.cu_mask = NULL;
>> +
> This won't work correctly. We need to save the cu_mask for later.
> Otherwise the next time dqm->ops.update_queue is called, for
> example in pqm_update_queue or pqm_set_gws, it will wipe out the CU
> mask in the MQD.
 Let's just return when meeting a null cu_mask in update_cu_mask() to
 avoid that.
 Like the following,

 static void update_cu_mask(struct mqd_manager *mm, void *mqd,
   struct queue_properties *q)
 {
struct v10_compute_mqd *m;
uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */

if (!q->cu_mask || q->cu_mask_count == 0)
return;
..
 }

 Is this fine with you? Thanks!
>>> I 

[pull] amdgpu drm-fixes-5.15

2021-09-29 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.15.

The following changes since commit 05812b971c6d605c00987750f422918589aa4486:

  Merge tag 'drm/tegra/for-5.15-rc3' of 
ssh://git.freedesktop.org/git/tegra/linux into drm-fixes (2021-09-28 17:08:44 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-5.15-2021-09-29

for you to fetch changes up to 26db706a6d77b9e184feb11725e97e53b7a89519:

  drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix (2021-09-28 
14:40:27 -0400)


amd-drm-fixes-5.15-2021-09-29:

amdgpu:
- gart pin count fix
- eDP flicker fix
- GFX9 MQD fix
- Display fixes
- Tiling flags fix for pre-GFX9
- SDMA resume fix for S0ix


Charlene Liu (1):
  drm/amd/display: Pass PCI deviceid into DC

Hawking Zhang (1):
  drm/amdgpu: correct initial cp_hqd_quantum for gfx9

Josip Pavic (1):
  drm/amd/display: initialize backlight_ramping_override to false

Leslie Shi (1):
  drm/amdgpu: fix gart.bo pin_count leak

Praful Swarnakar (1):
  drm/amd/display: Fix Display Flicker on embedded panels

Prike Liang (1):
  drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix

Simon Ser (1):
  drm/amdgpu: check tiling flags when creating FB on GFX8-

 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   | 31 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c|  3 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c|  8 ++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 ++
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c  | 15 +--
 7 files changed, 53 insertions(+), 11 deletions(-)


Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Felix Kuehling

On 2021-09-29 10:23 p.m., Yu, Lang wrote:

-Original Message-
From: Kuehling, Felix 
Sent: Thursday, September 30, 2021 9:47 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

On 2021-09-29 7:32 p.m., Yu, Lang wrote:

[AMD Official Use Only]




-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, September 29, 2021 11:25 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

On 2021-09-29 at 4:22 a.m., Lang Yu wrote:

If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy
all created queues, when the kfd process is destroyed, some queues'
cu_mask memory is not freed.

To avoid forgetting to free them in some places, free them
immediately after use.

Signed-off-by: Lang Yu 
---
   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
--
   2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4de907f3e66a..5c0e6dcf692a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file
*filp, struct

kfd_process *p,

retval = copy_from_user(properties.cu_mask, cu_mask_ptr,

cu_mask_size);

if (retval) {
pr_debug("Could not copy CU mask from userspace");
-   kfree(properties.cu_mask);
-   return -EFAULT;
+   retval = -EFAULT;
+   goto out;
}

mutex_lock(&p->mutex);
@@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
*filp, struct kfd_process *p,

mutex_unlock(&p->mutex);

-   if (retval)
-   kfree(properties.cu_mask);
+out:
+   kfree(properties.cu_mask);

return retval;
   }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 243dd1efcdbf..4c81d690f31a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -394,8 +394,6 @@ int pqm_destroy_queue(struct

process_queue_manager *pqm, unsigned int qid)

pdd->qpd.num_gws = 0;
}

-   kfree(pqn->q->properties.cu_mask);
-   pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}

@@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct

process_queue_manager *pqm, unsigned int qid,

return -EFAULT;
}

-   /* Free the old CU mask memory if it is already allocated, then
-* allocate memory for the new CU mask.
-*/
-   kfree(pqn->q->properties.cu_mask);
+   WARN_ON_ONCE(pqn->q->properties.cu_mask);

pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;

retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
pqn->q);
+
+   pqn->q->properties.cu_mask = NULL;
+

This won't work correctly. We need to save the cu_mask for later.
Otherwise the next time dqm->ops.update_queue is called, for example
in pqm_update_queue or pqm_set_gws, it will wipe out the CU mask in the MQD.

Let's just return when meeting a null cu_mask in update_cu_mask() to avoid
that.

Like the following,

static void update_cu_mask(struct mqd_manager *mm, void *mqd,
   struct queue_properties *q)
{
struct v10_compute_mqd *m;
uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */

if (!q->cu_mask || q->cu_mask_count == 0)
return;
..
}

Is this fine with you? Thanks!

I think that could work. I still don't like it. It leaves the CU mask in the
q->properties structure, but it's only ever used temporarily and doesn't need
to be persistent. I'd argue, in this case, the cu_mask shouldn't be in the
q->properties structure at all, but should be passed as an optional parameter
into the dqm->ops.update_queue call.

The cu_mask is originally in the q->properties structure. I didn't change that.
What I want to do is keep the cu_mask memory allocation and deallocation
just in kfd_ioctl_set_cu_mask, instead of everywhere.


You're not changing where it is stored. But you're changing it from 
something persistent (while the queue exists) to something ephemeral 
(while the ioctl call is executing in the kernel). I think that would 
justify removing the persistent pointer from the q->properties structure.



  

But I think a simpler fix would be to move the freeing of the CU mask into
uninit_queue. That would catch all cases where a queue gets destroyed, including
the process termination case.

RE: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Yu, Lang


>-Original Message-
>From: Kuehling, Felix 
>Sent: Thursday, September 30, 2021 9:47 AM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>
>Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>
>On 2021-09-29 7:32 p.m., Yu, Lang wrote:
>> [AMD Official Use Only]
>>
>>
>>
>>> -Original Message-
>>> From: Kuehling, Felix 
>>> Sent: Wednesday, September 29, 2021 11:25 PM
>>> To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Huang, Ray
>>> 
>>> Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>>>
 On 2021-09-29 at 4:22 a.m., Lang Yu wrote:
 If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy
 all created queues, when the kfd process is destroyed, some queues'
 cu_mask memory is not freed.

 To avoid forgetting to free them in some places, free them
 immediately after use.

 Signed-off-by: Lang Yu 
 ---
   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
 --
   2 files changed, 8 insertions(+), 10 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 index 4de907f3e66a..5c0e6dcf692a 100644
 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
 @@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file
 *filp, struct
>>> kfd_process *p,
retval = copy_from_user(properties.cu_mask, cu_mask_ptr,
>>> cu_mask_size);
if (retval) {
pr_debug("Could not copy CU mask from userspace");
 -  kfree(properties.cu_mask);
 -  return -EFAULT;
 +  retval = -EFAULT;
 +  goto out;
}

 mutex_lock(&p->mutex);
 @@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
 *filp, struct kfd_process *p,

 mutex_unlock(&p->mutex);

 -  if (retval)
 -  kfree(properties.cu_mask);
 +out:
 +  kfree(properties.cu_mask);

return retval;
   }
 diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 index 243dd1efcdbf..4c81d690f31a 100644
 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
 @@ -394,8 +394,6 @@ int pqm_destroy_queue(struct
>>> process_queue_manager *pqm, unsigned int qid)
pdd->qpd.num_gws = 0;
}

 -  kfree(pqn->q->properties.cu_mask);
 -  pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}

 @@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct
>>> process_queue_manager *pqm, unsigned int qid,
return -EFAULT;
}

 -  /* Free the old CU mask memory if it is already allocated, then
 -   * allocate memory for the new CU mask.
 -   */
 -  kfree(pqn->q->properties.cu_mask);
 +  WARN_ON_ONCE(pqn->q->properties.cu_mask);

pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;

retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
pqn->q);
 +
 +  pqn->q->properties.cu_mask = NULL;
 +
>>> This won't work correctly. We need to save the cu_mask for later.
>>> Otherwise the next time dqm->ops.update_queue is called, for example
>>> in pqm_update_queue or pqm_set_gws, it will wipe out the CU mask in the MQD.
>> Let's just return when meeting a null cu_mask in update_cu_mask() to avoid
>> that.
>> Like the following,
>>
>> static void update_cu_mask(struct mqd_manager *mm, void *mqd,
>> struct queue_properties *q)
>> {
>>  struct v10_compute_mqd *m;
>>  uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */
>>
>>  if (!q->cu_mask || q->cu_mask_count == 0)
>>  return;
>>  ..
>> }
>>
>> Is this fine with you? Thanks!
>
>I think that could work. I still don't like it. It leaves the CU mask in the
>q->properties structure, but it's only ever used temporarily and doesn't need
>to be persistent. I'd argue, in this case, the cu_mask shouldn't be in the
>q->properties structure at all, but should be passed as an optional parameter
>into the dqm->ops.update_queue call.

The cu_mask is originally in the q->properties structure. I didn't change that.
What I want to do is keep the cu_mask memory allocation and deallocation
just in kfd_ioctl_set_cu_mask, instead of everywhere.
 
>But I think a simpler fix would be to move the freeing of the CU mask into
>uninit_queue. That would catch all cases where a queue gets destroyed, 
>including
>the process termination case.

Yes, 

Re: [PATCH 2/4] amdgpu_ucode: reduce number of pr_debug calls

2021-09-29 Thread Joe Perches
On Wed, 2021-09-29 at 19:44 -0600, Jim Cromie wrote:
> There are blocks of DRM_DEBUG calls, consolidate their args into
> single calls.  With dynamic-debug in use, each callsite consumes 56
> bytes of callsite data, and this patch removes about 65 calls, so
> it saves ~3.5kb.
> 
> no functional changes.

No functional change, but an output logging content change.

> RFC: this creates multi-line log messages, does that break any syslog
> conventions ?

It does change the output as each individual DRM_DEBUG is a call to
__drm_dbg which is effectively:

printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
   __builtin_return_address(0), &vaf);
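
To illustrate the concern, a hedged sketch (header values invented): a
consolidated call such as

	DRM_DEBUG("size_bytes: %u\n"
		  "header_size_bytes: %u\n",
		  le32_to_cpu(hdr->size_bytes),
		  le32_to_cpu(hdr->header_size_bytes));

emits the "[drm:amdgpu_ucode_print_common_hdr]" prefix only on the first
line; the continuation lines reach syslog without any prefix, whereas
separate DRM_DEBUG calls prefix every line.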


> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
[]
> @@ -30,17 +30,26 @@
>  
> 
>  static void amdgpu_ucode_print_common_hdr(const struct 
> common_firmware_header *hdr)
>  {
> - DRM_DEBUG("size_bytes: %u\n", le32_to_cpu(hdr->size_bytes));
> - DRM_DEBUG("header_size_bytes: %u\n", 
> le32_to_cpu(hdr->header_size_bytes));
[]
> + DRM_DEBUG("size_bytes: %u\n"
> +   "header_size_bytes: %u\n"

etc...




Re: [PATCH] drm/amdkfd: avoid conflicting address mappings

2021-09-29 Thread Felix Kuehling

On 2021-09-29 7:35 p.m., Mike Lothian wrote:

Hi

This patch is causing a compile failure for me

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:1254:25: error:
unused variable 'svms' [-Werror,-Wunused-variable]
struct svm_range_list *svms = &p->svms;
   ^
1 error generated.

I'll turn off Werror
I guess the struct svm_range_list *svms declaration should be under #if 
IS_ENABLED(CONFIG_HSA_AMD_SVM). Alternatively, we could get rid of it 
and use p->svms directly (it's used in 3 places in that function).
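
For illustration, a minimal sketch of the second option (hypothetical,
based on the hunk quoted below; it drops the local and dereferences
p->svms only inside the guarded block):

#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
	/* No local svms variable, so nothing is left unused when
	 * CONFIG_HSA_AMD_SVM is disabled.
	 */
	mutex_lock(&p->svms.lock);
	if (interval_tree_iter_first(&p->svms.objects,
				     args->va_addr >> PAGE_SHIFT,
				     (args->va_addr + args->size - 1) >> PAGE_SHIFT)) {
		pr_err("Address: 0x%llx already allocated by SVM\n",
		       args->va_addr);
		mutex_unlock(&p->svms.lock);
		return -EADDRINUSE;
	}
	mutex_unlock(&p->svms.lock);
#endif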


Would you like to propose a patch for that?

Thanks,
  Felix




On Mon, 19 Jul 2021 at 22:19, Alex Sierra  wrote:

[Why]
Avoid conflict with address ranges mapped by SVM
mechanism that try to be allocated again through
ioctl_alloc in the same process. And viceversa.

[How]
For ioctl_alloc_memory_of_gpu allocations
Check if the address range passed into ioctl memory
alloc does not exist already in the kfd_process
svms->objects interval tree.

For SVM allocations
Look for the address range into the interval tree VA from
the VM inside of each pdds used in a kfd_process.

Signed-off-by: Alex Sierra 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 79 +++-
  2 files changed, 75 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 67541c30327a..f39baaa22a62 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1251,6 +1251,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
 struct kfd_process_device *pdd;
 void *mem;
 struct kfd_dev *dev;
+   struct svm_range_list *svms = &p->svms;
 int idr_handle;
 long err;
 uint64_t offset = args->mmap_offset;
@@ -1259,6 +1260,18 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
 if (args->size == 0)
 return -EINVAL;

+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+   mutex_lock(&svms->lock);
+   if (interval_tree_iter_first(&svms->objects,
+args->va_addr >> PAGE_SHIFT,
+(args->va_addr + args->size - 1) >> 
PAGE_SHIFT)) {
+   pr_err("Address: 0x%llx already allocated by SVM\n",
+   args->va_addr);
+   mutex_unlock(&svms->lock);
+   return -EADDRINUSE;
+   }
+   mutex_unlock(&svms->lock);
+#endif
 dev = kfd_device_by_id(args->gpu_id);
 if (!dev)
 return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 31f3f24cef6a..043ee0467916 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2581,9 +2581,54 @@ int svm_range_list_init(struct kfd_process *p)
 return 0;
  }

+/**
+ * svm_range_is_vm_bo_mapped - check if virtual address range mapped already
+ * @p: current kfd_process
+ * @start: range start address, in pages
+ * @last: range last address, in pages
+ *
+ * The purpose is to avoid virtual address ranges already allocated by
+ * kfd_ioctl_alloc_memory_of_gpu ioctl.
+ * It looks for each pdd in the kfd_process.
+ *
+ * Context: Process context
+ *
+ * Return 0 - OK, if the range is not mapped.
+ * Otherwise error code:
+ * -EADDRINUSE - if address is mapped already by kfd_ioctl_alloc_memory_of_gpu
+ * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted by
+ * a signal. Release all buffer reservations and return to user-space.
+ */
+static int
+svm_range_is_vm_bo_mapped(struct kfd_process *p, uint64_t start, uint64_t last)
+{
+   uint32_t i;
+   int r;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   struct amdgpu_vm *vm;
+
+   if (!p->pdds[i]->drm_priv)
+   continue;
+
+   vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
+   r = amdgpu_bo_reserve(vm->root.bo, false);
+   if (r)
+   return r;
+   if (interval_tree_iter_first(&vm->va, start, last)) {
+   pr_debug("Range [0x%llx 0x%llx] already mapped\n", 
start, last);
+   amdgpu_bo_unreserve(vm->root.bo);
+   return -EADDRINUSE;
+   }
+   amdgpu_bo_unreserve(vm->root.bo);
+   }
+
+   return 0;
+}
+
  /**
   * svm_range_is_valid - check if virtual address range is valid
- * @mm: current process mm_struct
+ * @mm: current kfd_process
   * @start: range start address, in pages
   * @size: range size, in pages
   *
@@ -2592,28 +2637,27 @@ int svm_range_list_init(struct kfd_process *p)
   * Context: Process context
   *
   * Return:
- *  true - valid svm range
- *  false - invalid svm range
+ *  0 - OK, otherwise error code
   */
-static bool
-svm_range_is_valid(struct mm_struct *mm, uint64_t start, uint64_t size)
+static int

Re: amdgpu driver halted on suspend of shutdown

2021-09-29 Thread 李真能
So, can I remove the suspend process from amdgpu_pci_shutdown if I don't
use the amdgpu driver in a VM?


Thank you so much for your reply!

On 2021/9/30 at 5:12 AM, Alex Deucher wrote:

On Wed, Sep 29, 2021 at 3:25 AM 李真能  wrote:

Hello:

  When I run an automated reboot loop test, I found the kernel may halt
on memcpy_fromio in amdgpu's amdgpu_uvd_suspend, so I removed the suspend
process from amdgpu_pci_shutdown, which fixes this bug.

I have 3 questions to ask:

1. In amdgpu_pci_shutdown, the comment explains why we must execute
suspend, so I'd like to know in which situations a VM will call the
amdgpu driver; as far as I know, a VM's graphics card is a virtual card.

2. I see a patch that was committed by Alex Deucher; the commit message
is as follows:

drm/amdgpu: just suspend the hw on pci shutdown

We can't just reuse pci_remove as there may be userspace still
  doing things.

My question is: in which situations may there be userspace still doing
things?

3. Why is the amdgpu driver halted on memcpy_fromio in
amdgpu_uvd_suspend? I haven't launched any video app during the reboot
test; is it a bug in the PCI bus?

Test environment:

CPU: arm64

I suspect the problem is something ARM specific.  IIRC, we added the
memcpy_fromio() to work around a limitation in ARM related to CPU
mappings of PCI BAR memory.  The whole point of the PCI shutdown
callback is to put the device into a quiescent state (e.g., stop all
DMAs and asynchronous engines, etc.).  Some of that tear down requires
access to PCI BARs.

Alex



Graphics card: r7340(amdgpu), rx550

OS: ubuntu 2004



Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Felix Kuehling

On 2021-09-29 7:32 p.m., Yu, Lang wrote:

[AMD Official Use Only]




-Original Message-
From: Kuehling, Felix 
Sent: Wednesday, September 29, 2021 11:25 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray

Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

On 2021-09-29 at 4:22 a.m., Lang Yu wrote:

If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy all
created queues, when the kfd process is destroyed, some queues'
cu_mask memory is not freed.

To avoid forgetting to free them in some places, free them immediately
after use.

Signed-off-by: Lang Yu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
--
  2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4de907f3e66a..5c0e6dcf692a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct

kfd_process *p,

retval = copy_from_user(properties.cu_mask, cu_mask_ptr,

cu_mask_size);

if (retval) {
pr_debug("Could not copy CU mask from userspace");
-   kfree(properties.cu_mask);
-   return -EFAULT;
+   retval = -EFAULT;
+   goto out;
}

mutex_lock(&p->mutex);
@@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
*filp, struct kfd_process *p,

mutex_unlock(&p->mutex);

-   if (retval)
-   kfree(properties.cu_mask);
+out:
+   kfree(properties.cu_mask);

return retval;
  }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 243dd1efcdbf..4c81d690f31a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -394,8 +394,6 @@ int pqm_destroy_queue(struct

process_queue_manager *pqm, unsigned int qid)

pdd->qpd.num_gws = 0;
}

-   kfree(pqn->q->properties.cu_mask);
-   pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}

@@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct

process_queue_manager *pqm, unsigned int qid,

return -EFAULT;
}

-   /* Free the old CU mask memory if it is already allocated, then
-* allocate memory for the new CU mask.
-*/
-   kfree(pqn->q->properties.cu_mask);
+   WARN_ON_ONCE(pqn->q->properties.cu_mask);

pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;

retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
pqn->q);
+
+   pqn->q->properties.cu_mask = NULL;
+

This won't work correctly. We need to save the cu_mask for later.
Otherwise the next time dqm->ops.update_queue is called, for example in
pqm_update_queue or pqm_set_gws, it will wipe out the CU mask in the MQD.
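
For context, the v10 update_cu_mask() unconditionally recomputes the
per-SE masks from q->properties on every queue update (a rough sketch of
the relevant lines; compare the snippet below):

	mqd_symmetrically_map_cu_mask(mm, q->cu_mask, q->cu_mask_count, se_mask);

	m = get_mqd(mqd);
	/* These MQD registers are rewritten on every update, so a cu_mask
	 * that has already been freed or NULLed wipes the programmed mask.
	 */
	m->compute_static_thread_mgmt_se0 = se_mask[0];
	m->compute_static_thread_mgmt_se1 = se_mask[1];
	m->compute_static_thread_mgmt_se2 = se_mask[2];
	m->compute_static_thread_mgmt_se3 = se_mask[3];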

Let's just return when meeting a null cu_mask in update_cu_mask() to avoid that.
Like the following,

static void update_cu_mask(struct mqd_manager *mm, void *mqd,
   struct queue_properties *q)
{
struct v10_compute_mqd *m;
uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */

if (!q->cu_mask || q->cu_mask_count == 0)
return;
..
}

Is this fine with you? Thanks!


I think that could work. I still don't like it. It leaves the CU mask in 
the q->properties structure, but it's only ever used temporarily and 
doesn't need to be persistent. I'd argue, in this case, the cu_mask 
shouldn't be in the q->properties structure at all, but should be passed 
as an optional parameter into the dqm->ops.update_queue call.
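
As a rough sketch of what that could look like (the update-info struct
and its name are invented here for illustration; this is not existing
KFD API):

	struct mqd_update_info {
		uint32_t *cu_mask;	/* NULL when the CU mask is unchanged */
		uint32_t cu_mask_count;
	};

	int (*update_queue)(struct device_queue_manager *dqm, struct queue *q,
			    struct mqd_update_info *minfo);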


But I think a simpler fix would be to move the freeing of the CU mask 
into uninit_queue. That would catch all cases where a queue gets 
destroyed, including the process termination case.
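
A minimal sketch of that simpler fix (assuming the uninit_queue() in
kfd_queue.c, which currently just frees the queue struct):

	void uninit_queue(struct queue *q)
	{
		/* Freeing here covers every destruction path, including
		 * process teardown, not only kfd_ioctl_destroy_queue.
		 */
		kfree(q->properties.cu_mask);
		kfree(q);
	}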


Regards,
  Felix




Regards,
Lang
  

Regards,
   Felix



if (retval != 0)
return retval;



[PATCH 4/4] i915/gvt: remove spaces in pr_debug "gvt: core:" etc prefixes

2021-09-29 Thread Jim Cromie
Taking embedded spaces out of existing prefixes makes them better
class-prefixes; simplifying the extra quoting needed otherwise:

  $> echo format "^gvt: core:" +p >control
vs
  $> echo format ^gvt:core: +p >control

Dropping the internal spaces means that quotes are only needed when
the trailing space is required; they more distinctively signal that
requirement.

Consider a generic drm-debug example:

  # turn off ATOMIC reports
  echo format "^drm:atomic: " -p > control

  # turn off all ATOMIC:* reports, including any sub-categories
  echo format "^drm:atomic:" -p > control

  # turn on ATOMIC:FAIL: reports
  echo format "^drm:atomic:fail: " +p > control

Removing embedded spaces in the class-prefixes simplifies the
corresponding match-prefix.  This means that "quoted" match-prefixes
are only needed when the trailing space is desired, in order to
exclude explicitly sub-categorized pr-debugs; in this example,
"drm:atomic:fail:".

Signed-off-by: Jim Cromie 
---
---
 drivers/gpu/drm/i915/gvt/debug.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/debug.h b/drivers/gpu/drm/i915/gvt/debug.h
index c6027125c1ec..bbecc279e077 100644
--- a/drivers/gpu/drm/i915/gvt/debug.h
+++ b/drivers/gpu/drm/i915/gvt/debug.h
@@ -36,30 +36,30 @@ do {
\
 } while (0)
 
 #define gvt_dbg_core(fmt, args...) \
-   pr_debug("gvt: core: "fmt, ##args)
+   pr_debug("gvt:core: " fmt, ##args)
 
 #define gvt_dbg_irq(fmt, args...) \
-   pr_debug("gvt: irq: "fmt, ##args)
+   pr_debug("gvt:irq: " fmt, ##args)
 
 #define gvt_dbg_mm(fmt, args...) \
-   pr_debug("gvt: mm: "fmt, ##args)
+   pr_debug("gvt:mm: " fmt, ##args)
 
 #define gvt_dbg_mmio(fmt, args...) \
-   pr_debug("gvt: mmio: "fmt, ##args)
+   pr_debug("gvt:mmio: " fmt, ##args)
 
 #define gvt_dbg_dpy(fmt, args...) \
-   pr_debug("gvt: dpy: "fmt, ##args)
+   pr_debug("gvt:dpy: " fmt, ##args)
 
 #define gvt_dbg_el(fmt, args...) \
-   pr_debug("gvt: el: "fmt, ##args)
+   pr_debug("gvt:el: " fmt, ##args)
 
 #define gvt_dbg_sched(fmt, args...) \
-   pr_debug("gvt: sched: "fmt, ##args)
+   pr_debug("gvt:sched: " fmt, ##args)
 
 #define gvt_dbg_render(fmt, args...) \
-   pr_debug("gvt: render: "fmt, ##args)
+   pr_debug("gvt:render: " fmt, ##args)
 
 #define gvt_dbg_cmd(fmt, args...) \
-   pr_debug("gvt: cmd: "fmt, ##args)
+   pr_debug("gvt:cmd: " fmt, ##args)
 
 #endif
-- 
2.31.1



[PATCH 3/4] nouveau: fold multiple DRM_DEBUG_DRIVERs together

2021-09-29 Thread Jim Cromie
With DRM_USE_DYNAMIC_DEBUG, each callsite record requires 56 bytes.
We can combine 12 calls into one here, dropping 11 callsite records
and saving ~620 bytes (11 x 56 = 616).

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 36 +--
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 1f828c9f691c..d9fbd249dbaa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1242,19 +1242,29 @@ nouveau_drm_pci_table[] = {
 
 static void nouveau_display_options(void)
 {
-   DRM_DEBUG_DRIVER("Loading Nouveau with parameters:\n");
-
-   DRM_DEBUG_DRIVER("... tv_disable   : %d\n", nouveau_tv_disable);
-   DRM_DEBUG_DRIVER("... ignorelid: %d\n", nouveau_ignorelid);
-   DRM_DEBUG_DRIVER("... duallink : %d\n", nouveau_duallink);
-   DRM_DEBUG_DRIVER("... nofbaccel: %d\n", nouveau_nofbaccel);
-   DRM_DEBUG_DRIVER("... config   : %s\n", nouveau_config);
-   DRM_DEBUG_DRIVER("... debug: %s\n", nouveau_debug);
-   DRM_DEBUG_DRIVER("... noaccel  : %d\n", nouveau_noaccel);
-   DRM_DEBUG_DRIVER("... modeset  : %d\n", nouveau_modeset);
-   DRM_DEBUG_DRIVER("... runpm: %d\n", nouveau_runtime_pm);
-   DRM_DEBUG_DRIVER("... vram_pushbuf : %d\n", nouveau_vram_pushbuf);
-   DRM_DEBUG_DRIVER("... hdmimhz  : %d\n", nouveau_hdmimhz);
+   DRM_DEBUG_DRIVER("Loading Nouveau with parameters:\n"
+"... tv_disable   : %d\n"
+"... ignorelid: %d\n"
+"... duallink : %d\n"
+"... nofbaccel: %d\n"
+"... config   : %s\n"
+"... debug: %s\n"
+"... noaccel  : %d\n"
+"... modeset  : %d\n"
+"... runpm: %d\n"
+"... vram_pushbuf : %d\n"
+"... hdmimhz  : %d\n"
+, nouveau_tv_disable
+, nouveau_ignorelid
+, nouveau_duallink
+, nouveau_nofbaccel
+, nouveau_config
+, nouveau_debug
+, nouveau_noaccel
+, nouveau_modeset
+, nouveau_runtime_pm
+, nouveau_vram_pushbuf
+, nouveau_hdmimhz);
 }
 
 static const struct dev_pm_ops nouveau_pm_ops = {
-- 
2.31.1



[PATCH 2/4] amdgpu_ucode: reduce number of pr_debug calls

2021-09-29 Thread Jim Cromie
There are blocks of DRM_DEBUG calls; consolidate their args into
single calls.  With dynamic-debug in use, each callsite consumes 56
bytes of callsite data, and this patch removes about 65 calls, so
it saves ~3.5kb (65 x 56 = 3640 bytes).

no functional changes.

RFC: this creates multi-line log messages, does that break any syslog
conventions ?

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 293 --
 1 file changed, 158 insertions(+), 135 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index abd8469380e5..411179142a6e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -30,17 +30,26 @@
 
 static void amdgpu_ucode_print_common_hdr(const struct common_firmware_header 
*hdr)
 {
-   DRM_DEBUG("size_bytes: %u\n", le32_to_cpu(hdr->size_bytes));
-   DRM_DEBUG("header_size_bytes: %u\n", 
le32_to_cpu(hdr->header_size_bytes));
-   DRM_DEBUG("header_version_major: %u\n", 
le16_to_cpu(hdr->header_version_major));
-   DRM_DEBUG("header_version_minor: %u\n", 
le16_to_cpu(hdr->header_version_minor));
-   DRM_DEBUG("ip_version_major: %u\n", le16_to_cpu(hdr->ip_version_major));
-   DRM_DEBUG("ip_version_minor: %u\n", le16_to_cpu(hdr->ip_version_minor));
-   DRM_DEBUG("ucode_version: 0x%08x\n", le32_to_cpu(hdr->ucode_version));
-   DRM_DEBUG("ucode_size_bytes: %u\n", le32_to_cpu(hdr->ucode_size_bytes));
-   DRM_DEBUG("ucode_array_offset_bytes: %u\n",
- le32_to_cpu(hdr->ucode_array_offset_bytes));
-   DRM_DEBUG("crc32: 0x%08x\n", le32_to_cpu(hdr->crc32));
+   DRM_DEBUG("size_bytes: %u\n"
+ "header_size_bytes: %u\n"
+ "header_version_major: %u\n"
+ "header_version_minor: %u\n"
+ "ip_version_major: %u\n"
+ "ip_version_minor: %u\n"
+ "ucode_version: 0x%08x\n"
+ "ucode_size_bytes: %u\n"
+ "ucode_array_offset_bytes: %u\n"
+ "crc32: 0x%08x\n",
+ le32_to_cpu(hdr->size_bytes),
+ le32_to_cpu(hdr->header_size_bytes),
+ le16_to_cpu(hdr->header_version_major),
+ le16_to_cpu(hdr->header_version_minor),
+ le16_to_cpu(hdr->ip_version_major),
+ le16_to_cpu(hdr->ip_version_minor),
+ le32_to_cpu(hdr->ucode_version),
+ le32_to_cpu(hdr->ucode_size_bytes),
+ le32_to_cpu(hdr->ucode_array_offset_bytes),
+ le32_to_cpu(hdr->crc32));
 }
 
 void amdgpu_ucode_print_mc_hdr(const struct common_firmware_header *hdr)
@@ -55,9 +64,9 @@ void amdgpu_ucode_print_mc_hdr(const struct 
common_firmware_header *hdr)
const struct mc_firmware_header_v1_0 *mc_hdr =
container_of(hdr, struct mc_firmware_header_v1_0, 
header);
 
-   DRM_DEBUG("io_debug_size_bytes: %u\n",
- le32_to_cpu(mc_hdr->io_debug_size_bytes));
-   DRM_DEBUG("io_debug_array_offset_bytes: %u\n",
+   DRM_DEBUG("io_debug_size_bytes: %u\n"
+ "io_debug_array_offset_bytes: %u\n",
+ le32_to_cpu(mc_hdr->io_debug_size_bytes),
  le32_to_cpu(mc_hdr->io_debug_array_offset_bytes));
} else {
DRM_ERROR("Unknown MC ucode version: %u.%u\n", version_major, 
version_minor);
@@ -82,13 +91,17 @@ void amdgpu_ucode_print_smc_hdr(const struct 
common_firmware_header *hdr)
switch (version_minor) {
case 0:
v2_0_hdr = container_of(hdr, struct 
smc_firmware_header_v2_0, v1_0.header);
-   DRM_DEBUG("ppt_offset_bytes: %u\n", 
le32_to_cpu(v2_0_hdr->ppt_offset_bytes));
-   DRM_DEBUG("ppt_size_bytes: %u\n", 
le32_to_cpu(v2_0_hdr->ppt_size_bytes));
+   DRM_DEBUG("ppt_offset_bytes: %u\n"
+ "ppt_size_bytes: %u\n",
+ le32_to_cpu(v2_0_hdr->ppt_offset_bytes),
+ le32_to_cpu(v2_0_hdr->ppt_size_bytes));
break;
case 1:
v2_1_hdr = container_of(hdr, struct 
smc_firmware_header_v2_1, v1_0.header);
-   DRM_DEBUG("pptable_count: %u\n", 
le32_to_cpu(v2_1_hdr->pptable_count));
-   DRM_DEBUG("pptable_entry_offset: %u\n", 
le32_to_cpu(v2_1_hdr->pptable_entry_offset));
+   DRM_DEBUG("pptable_count: %u\n"
+ "pptable_entry_offset: %u\n",
+ le32_to_cpu(v2_1_hdr->pptable_count),
+ le32_to_cpu(v2_1_hdr->pptable_entry_offset));
break;
default:
break;
@@ -111,10 +124,12 @@ void 

[PATCH 1/4] drm: fix doc grammar error

2021-09-29 Thread Jim Cromie
no code changes, good for rc

Signed-off-by: Jim Cromie 
---
 include/drm/drm_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 0cd95953cdf5..4b29261c4537 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -486,7 +486,7 @@ void *__devm_drm_dev_alloc(struct device *parent,
  * @type: the type of the struct which contains struct _device
  * @member: the name of the _device within @type.
  *
- * This allocates and initialize a new DRM device. No device registration is 
done.
+ * This allocates and initializes a new DRM device. No device registration is 
done.
  * Call drm_dev_register() to advertice the device to user space and register 
it
  * with other core subsystems. This should be done last in the device
  * initialization sequence to make sure userspace can't access an inconsistent
-- 
2.31.1



[PATCH 0/4] drm: maintenance patches for 5.15-rcX

2021-09-29 Thread Jim Cromie
hi drm folks,

Here's a small set of assorted patches which are IMO suitable for rcX:
one doc fix, 2 patches folding multiple DBGs together, and a format
string modification.

Jim Cromie (4):
  drm: fix doc grammar error
  amdgpu_ucode: reduce number of pr_debug calls
  nouveau: fold multiple DRM_DEBUG_DRIVERs together
  i915/gvt: remove spaces in pr_debug "gvt: core:" etc prefixes

 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 293 --
 drivers/gpu/drm/i915/gvt/debug.h  |  18 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c |  36 ++-
 include/drm/drm_drv.h |   2 +-
 4 files changed, 191 insertions(+), 158 deletions(-)

-- 
2.31.1



Re: [PATCH] drm/amdkfd: avoid conflicting address mappings

2021-09-29 Thread Mike Lothian
Hi

This patch is causing a compile failure for me

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:1254:25: error:
unused variable 'svms' [-Werror,-Wunused-variable]
   struct svm_range_list *svms = &p->svms;
  ^
1 error generated.

I'll turn off Werror

On Mon, 19 Jul 2021 at 22:19, Alex Sierra  wrote:
>
> [Why]
> Avoid conflict with address ranges mapped by SVM
> mechanism that try to be allocated again through
> ioctl_alloc in the same process. And viceversa.
>
> [How]
> For ioctl_alloc_memory_of_gpu allocations
> Check if the address range passed into ioctl memory
> alloc does not exist already in the kfd_process
> svms->objects interval tree.
>
> For SVM allocations
> Look for the address range into the interval tree VA from
> the VM inside of each pdds used in a kfd_process.
>
> Signed-off-by: Alex Sierra 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 79 +++-
>  2 files changed, 75 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 67541c30327a..f39baaa22a62 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1251,6 +1251,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
> struct kfd_process_device *pdd;
> void *mem;
> struct kfd_dev *dev;
> +   struct svm_range_list *svms = &p->svms;
> int idr_handle;
> long err;
> uint64_t offset = args->mmap_offset;
> @@ -1259,6 +1260,18 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
> if (args->size == 0)
> return -EINVAL;
>
> +#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
> +   mutex_lock(&svms->lock);
> +   if (interval_tree_iter_first(&svms->objects,
> +args->va_addr >> PAGE_SHIFT,
> +(args->va_addr + args->size - 1) >> 
> PAGE_SHIFT)) {
> +   pr_err("Address: 0x%llx already allocated by SVM\n",
> +   args->va_addr);
> +   mutex_unlock(&svms->lock);
> +   return -EADDRINUSE;
> +   }
> +   mutex_unlock(&svms->lock);
> +#endif
> dev = kfd_device_by_id(args->gpu_id);
> if (!dev)
> return -EINVAL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 31f3f24cef6a..043ee0467916 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -2581,9 +2581,54 @@ int svm_range_list_init(struct kfd_process *p)
> return 0;
>  }
>
> +/**
> + * svm_range_is_vm_bo_mapped - check if virtual address range mapped already
> + * @p: current kfd_process
> + * @start: range start address, in pages
> + * @last: range last address, in pages
> + *
> + * The purpose is to avoid virtual address ranges already allocated by
> + * kfd_ioctl_alloc_memory_of_gpu ioctl.
> + * It looks for each pdd in the kfd_process.
> + *
> + * Context: Process context
> + *
> + * Return 0 - OK, if the range is not mapped.
> + * Otherwise error code:
> + * -EADDRINUSE - if address is mapped already by 
> kfd_ioctl_alloc_memory_of_gpu
> + * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted 
> by
> + * a signal. Release all buffer reservations and return to user-space.
> + */
> +static int
> +svm_range_is_vm_bo_mapped(struct kfd_process *p, uint64_t start, uint64_t 
> last)
> +{
> +   uint32_t i;
> +   int r;
> +
> +   for (i = 0; i < p->n_pdds; i++) {
> +   struct amdgpu_vm *vm;
> +
> +   if (!p->pdds[i]->drm_priv)
> +   continue;
> +
> +   vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
> +   r = amdgpu_bo_reserve(vm->root.bo, false);
> +   if (r)
> +   return r;
> +   if (interval_tree_iter_first(&vm->va, start, last)) {
> +   pr_debug("Range [0x%llx 0x%llx] already mapped\n", 
> start, last);
> +   amdgpu_bo_unreserve(vm->root.bo);
> +   return -EADDRINUSE;
> +   }
> +   amdgpu_bo_unreserve(vm->root.bo);
> +   }
> +
> +   return 0;
> +}
> +
>  /**
>   * svm_range_is_valid - check if virtual address range is valid
> - * @mm: current process mm_struct
> + * @mm: current kfd_process
>   * @start: range start address, in pages
>   * @size: range size, in pages
>   *
> @@ -2592,28 +2637,27 @@ int svm_range_list_init(struct kfd_process *p)
>   * Context: Process context
>   *
>   * Return:
> - *  true - valid svm range
> - *  false - invalid svm range
> + *  0 - OK, otherwise error code
>   */
> -static bool
> -svm_range_is_valid(struct mm_struct *mm, uint64_t start, uint64_t size)
> +static int
> +svm_range_is_valid(struct kfd_process *p, uint64_t start, uint64_t size)
>  {
>

RE: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Yu, Lang
[AMD Official Use Only]



>-Original Message-
>From: Kuehling, Felix 
>Sent: Wednesday, September 29, 2021 11:25 PM
>To: Yu, Lang ; amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Huang, Ray
>
>Subject: Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak
>
>On 2021-09-29 at 4:22 a.m., Lang Yu wrote:
>> If the user doesn't explicitly call kfd_ioctl_destroy_queue to destroy
>> all created queues, when the kfd process is destroyed, some queues'
>> cu_mask memory is not freed.
>>
>> To avoid forgetting to free them in some places, free them immediately
>> after use.
>>
>> Signed-off-by: Lang Yu 
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
>>  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
>> --
>>  2 files changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 4de907f3e66a..5c0e6dcf692a 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, 
>> struct
>kfd_process *p,
>>  retval = copy_from_user(properties.cu_mask, cu_mask_ptr,
>cu_mask_size);
>>  if (retval) {
>>  pr_debug("Could not copy CU mask from userspace");
>> -kfree(properties.cu_mask);
>> -return -EFAULT;
>> +retval = -EFAULT;
>> +goto out;
>>  }
>>
>>  mutex_lock(&p->mutex);
>> @@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file
>> *filp, struct kfd_process *p,
>>
>>  mutex_unlock(&p->mutex);
>>
>> -if (retval)
>> -kfree(properties.cu_mask);
>> +out:
>> +kfree(properties.cu_mask);
>>
>>  return retval;
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 243dd1efcdbf..4c81d690f31a 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -394,8 +394,6 @@ int pqm_destroy_queue(struct
>process_queue_manager *pqm, unsigned int qid)
>>  pdd->qpd.num_gws = 0;
>>  }
>>
>> -kfree(pqn->q->properties.cu_mask);
>> -pqn->q->properties.cu_mask = NULL;
>>  uninit_queue(pqn->q);
>>  }
>>
>> @@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct
>process_queue_manager *pqm, unsigned int qid,
>>  return -EFAULT;
>>  }
>>
>> -/* Free the old CU mask memory if it is already allocated, then
>> - * allocate memory for the new CU mask.
>> - */
>> -kfree(pqn->q->properties.cu_mask);
>> +WARN_ON_ONCE(pqn->q->properties.cu_mask);
>>
>>  pqn->q->properties.cu_mask_count = p->cu_mask_count;
>>  pqn->q->properties.cu_mask = p->cu_mask;
>>
>>  retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>>  pqn->q);
>> +
>> +pqn->q->properties.cu_mask = NULL;
>> +
>
>This won't work correctly. We need to save the cu_mask for later.
>Otherwise the next time dqm->ops.update_queue is called, for example in
>pqm_update_queue or pqm_set_gws, it will wipe out the CU mask in the MQD.

Let's just return when meeting a null cu_mask in update_cu_mask() to avoid that.
Like the following,

static void update_cu_mask(struct mqd_manager *mm, void *mqd,
   struct queue_properties *q)
{
struct v10_compute_mqd *m;
uint32_t se_mask[4] = {0}; /* 4 is the max # of SEs */

if (!q->cu_mask || q->cu_mask_count == 0)
return;
..
}

Is this fine with you? Thanks!

Regards,
Lang
 
>Regards,
>  Felix
>
>
>>  if (retval != 0)
>>  return retval;
>>


Re: amdgpu driver halted on suspend of shutdown

2021-09-29 Thread Alex Deucher
On Wed, Sep 29, 2021 at 3:25 AM 李真能  wrote:
>
> Hello:
>
>  When I do loop  auto test of reboot, I found  kernel may halt
> on memcpy_fromio of amdgpu's amdgpu_uvd_suspend, so I remove suspend
> process in amdgpu_pci_shutdown, and it will fix this bug.
>
> I have 3 questions to ask:
>
> 1. In amdgpu_pci_shutdown, the comment explains why we must execute
> suspend,  so I know VM will call amdgpu driver in which situations, as I
> know, VM's graphics card is a virtual card;
>
> 2. I see a path that is commited by Alex Deucher, the commit message is
> as follows:
>
> drm/amdgpu: just suspend the hw on pci shutdown
>
> We can't just reuse pci_remove as there may be userspace still
>  doing things.
>
> My question is:In which situations, there may be  userspace till doing
> things.
>
> 3. Why amdgpu driver is halted on memcpy_fromio of amdgpu_uvd_suspend, I
> haven't launch any video app during reboot test, is it the bug of pci bus?
>
> Test environment:
>
> CPU: arm64

I suspect the problem is something ARM specific.  IIRC, we added the
memcpy_fromio() to work around a limitation in ARM related to CPU
mappings of PCI BAR memory.  The whole point of the PCI shutdown
callback is to put the device into a quiescent state (e.g., stop all
DMAs and asynchronous engines, etc.).  Some of that tear down requires
access to PCI BARs.
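
For reference, the pattern in question looks roughly like this (a sketch
of the save path in amdgpu_uvd_suspend; exact field names may differ by
kernel version):

	/* UVD firmware state sits in VRAM behind a PCI BAR, so it must be
	 * copied out with memcpy_fromio(); a plain memcpy() on such an
	 * ioremapped mapping can fault or hang on some ARM64 systems.
	 */
	void *ptr = adev->uvd.inst[j].cpu_addr;

	memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);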

Alex


>
> Graphics card: r7340(amdgpu), rx550
>
> OS: ubuntu 2004
>


Re: [PATCH] drm/amdgpu/display: protect DCN specific stuff in process_deferred_updates

2021-09-29 Thread Harry Wentland
On 2021-09-29 16:36, Alex Deucher wrote:
> Need to protect this function with CONFIG_DRM_AMD_DC_DCN.
> 
> Fixes: bfd34644dedb ("drm/amd/display: Defer LUT memory powerdown until LUT 
> bypass latches")
> Cc: Michael Strauss 
> Cc: Eric Yang 
> Cc: Anson Jacob 
> Reported-by: Stephen Rothwell 
> Signed-off-by: Alex Deucher 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 0f0440408a16..b113e7e74ded 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -1802,12 +1802,14 @@ static bool is_flip_pending_in_pipes(struct dc *dc, 
> struct dc_state *context)
>   */
>  static void process_deferred_updates(struct dc *dc)
>  {
> +#ifdef CONFIG_DRM_AMD_DC_DCN
>   int i;
>  
>   if (dc->debug.enable_mem_low_power.bits.cm)
>   for (i = 0; i < dc->dcn_ip->max_num_dpp; i++)
>   if (dc->res_pool->dpps[i]->funcs->dpp_deferred_update)
>   
> dc->res_pool->dpps[i]->funcs->dpp_deferred_update(dc->res_pool->dpps[i]);
> +#endif
>  }
>  
>  void dc_post_update_surfaces_to_stream(struct dc *dc)
> 



[PATCH] drm/amdgpu/display: protect DCN specific stuff in process_deferred_updates

2021-09-29 Thread Alex Deucher
Need to protect this function with CONFIG_DRM_AMD_DC_DCN.

Fixes: bfd34644dedb ("drm/amd/display: Defer LUT memory powerdown until LUT 
bypass latches")
Cc: Michael Strauss 
Cc: Eric Yang 
Cc: Anson Jacob 
Reported-by: Stephen Rothwell 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 0f0440408a16..b113e7e74ded 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1802,12 +1802,14 @@ static bool is_flip_pending_in_pipes(struct dc *dc, 
struct dc_state *context)
  */
 static void process_deferred_updates(struct dc *dc)
 {
+#ifdef CONFIG_DRM_AMD_DC_DCN
int i;
 
if (dc->debug.enable_mem_low_power.bits.cm)
for (i = 0; i < dc->dcn_ip->max_num_dpp; i++)
if (dc->res_pool->dpps[i]->funcs->dpp_deferred_update)

dc->res_pool->dpps[i]->funcs->dpp_deferred_update(dc->res_pool->dpps[i]);
+#endif
 }
 
 void dc_post_update_surfaces_to_stream(struct dc *dc)
-- 
2.31.1



[PATCH 2/2] amd/amdgpu_dm: Verify Gamma and Degamma LUT sizes using DRM Core check

2021-09-29 Thread Mark Yacoub
From: Mark Yacoub 

[Why]
drm_atomic_helper_check_crtc now verifies both legacy and non-legacy LUT
sizes. There is no need to check it within amdgpu_dm_atomic_check.

[How]
Remove the local call to verify LUT sizes and use DRM Core function
instead.

Tested on ChromeOS Zork.

Signed-off-by: Mark Yacoub 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 07adac1a8c42b..96a1d006b777e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10683,6 +10683,10 @@ static int amdgpu_dm_atomic_check(struct drm_device 
*dev,
}
}
 #endif
+   ret = drm_atomic_helper_check_crtc(state);
+   if (ret)
+   return ret;
+
for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, 
new_crtc_state, i) {
dm_old_crtc_state = to_dm_crtc_state(old_crtc_state);
 
@@ -10692,10 +10696,6 @@ static int amdgpu_dm_atomic_check(struct drm_device 
*dev,
dm_old_crtc_state->dsc_force_changed == false)
continue;
 
-   ret = amdgpu_dm_verify_lut_sizes(new_crtc_state);
-   if (ret)
-   goto fail;
-
if (!new_crtc_state->enable)
continue;
 
-- 
2.33.0.685.g46640cef36-goog



[PATCH 1/2] drm: Add Gamma and Degamma LUT sizes props to drm_crtc to validate.

2021-09-29 Thread Mark Yacoub
From: Mark Yacoub 

[Why]
1. drm_atomic_helper_check doesn't check for the LUT sizes of either Gamma
or Degamma props in the new CRTC state, allowing any invalid size to
be passed on.
2. Each driver has its own LUT size, which could also be different for
legacy users.

[How]
1. Create |degamma_lut_size| and |gamma_lut_size| to save the LUT sizes
assigned by the driver when it's initializing its color and CTM
management.
2. Create drm_atomic_helper_check_crtc which is called by
drm_atomic_helper_check to check the LUT sizes saved in drm_crtc that
they match the sizes in the new CRTC state.

Fixes: igt@kms_color@pipe-A-invalid-gamma-lut-sizes on MTK
Tested on Zork(amdgpu) and Jacuzzi(mediatek)
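
For example, a driver that registers its LUT sizes during color
management init would get this validation from the core automatically
(a sketch with hypothetical sizes, not taken from a real driver):

	/* legacy gamma ramp size; also accepted for GAMMA_LUT blobs */
	drm_mode_crtc_set_gamma_size(crtc, 256);
	/* degamma LUT size, CTM supported, gamma LUT size */
	drm_crtc_enable_color_mgmt(crtc, 512, true, 4096);
	/* drm_atomic_helper_check() now rejects a GAMMA_LUT blob whose
	 * drm_color_lut_size() is neither 4096 nor 256 entries. */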

Signed-off-by: Mark Yacoub
---
 drivers/gpu/drm/drm_atomic_helper.c | 56 +
 drivers/gpu/drm/drm_color_mgmt.c|  2 ++
 include/drm/drm_atomic_helper.h |  1 +
 include/drm/drm_crtc.h  | 11 ++
 4 files changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 2c0c6ec928200..265b9747250d1 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -930,6 +930,58 @@ drm_atomic_helper_check_planes(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_check_planes);
 
+/**
+ * drm_atomic_helper_check_crtc - validate state object for CRTC changes
+ * @state: the driver state object
+ *
+ * Check the CRTC state object such as the Gamma/Degamma LUT sizes if the new
+ * state holds them.
+ *
+ * RETURNS:
+ * Zero for success or -errno
+ */
+int drm_atomic_helper_check_crtc(struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *new_crtc_state;
+   int i;
+
+   for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
+   if (new_crtc_state->gamma_lut) {
+			uint32_t supported_lut_size = crtc->gamma_lut_size;
+   uint32_t supported_legacy_lut_size = crtc->gamma_size;
+   uint32_t new_state_lut_size =
+   drm_color_lut_size(new_crtc_state->gamma_lut);
+
+   if (new_state_lut_size != supported_lut_size &&
+   new_state_lut_size != supported_legacy_lut_size) {
+   DRM_DEBUG_DRIVER(
+   "Invalid Gamma LUT size. Should be %u 
(or %u for legacy) but got %u.\n",
+   supported_lut_size,
+   supported_legacy_lut_size,
+   new_state_lut_size);
+   return -EINVAL;
+   }
+   }
+
+   if (new_crtc_state->degamma_lut) {
+   uint32_t new_state_lut_size =
+   drm_color_lut_size(new_crtc_state->degamma_lut);
+			uint32_t supported_lut_size = crtc->degamma_lut_size;
+
+   if (new_state_lut_size != supported_lut_size) {
+   DRM_DEBUG_DRIVER(
+   "Invalid Degamma LUT size. Should be %u 
but got %u.\n",
+   supported_lut_size, new_state_lut_size);
+   return -EINVAL;
+   }
+   }
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check_crtc);
+
 /**
  * drm_atomic_helper_check - validate state object
  * @dev: DRM device
@@ -975,6 +1027,10 @@ int drm_atomic_helper_check(struct drm_device *dev,
if (ret)
return ret;
 
+   ret = drm_atomic_helper_check_crtc(state);
+   if (ret)
+   return ret;
+
if (state->legacy_cursor_update)
state->async_update = !drm_atomic_helper_async_check(dev, 
state);
 
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index bb14f488c8f6c..72a1b628e7cdd 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -166,6 +166,7 @@ void drm_crtc_enable_color_mgmt(struct drm_crtc *crtc,
	struct drm_mode_config *config = &dev->mode_config;
 
 	if (degamma_lut_size) {
+		crtc->degamma_lut_size = degamma_lut_size;
 		drm_object_attach_property(&crtc->base,
 					   config->degamma_lut_property, 0);
 		drm_object_attach_property(&crtc->base,
@@ -178,6 +179,7 @@ void drm_crtc_enable_color_mgmt(struct drm_crtc *crtc,
 					   config->ctm_property, 0);
 
 	if (gamma_lut_size) {
+		crtc->gamma_lut_size = gamma_lut_size;
 		drm_object_attach_property(&crtc->base,
 					   config->gamma_lut_property, 0);
 		drm_object_attach_property(&crtc->base,
diff --git a/include/drm/drm_atomic_helper.h 

[PATCH 2/2] drm/amdgpu/jpeg: add jpeg2.6 start/end

2021-09-29 Thread James Zhu
Add jpeg2.6 with updated PCTL0_MMHUB_DEEPSLEEP_IB address in start/end.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 40 --
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
index 46096ad..a29c866 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
@@ -423,6 +423,42 @@ static void jpeg_v2_5_dec_ring_set_wptr(struct amdgpu_ring 
*ring)
}
 }
 
+/**
+ * jpeg_v2_6_dec_ring_insert_start - insert a start command
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Write a start command to the ring.
+ */
+static void jpeg_v2_6_dec_ring_insert_start(struct amdgpu_ring *ring)
+{
+   amdgpu_ring_write(ring, PACKETJ(mmUVD_JRBC_EXTERNAL_REG_INTERNAL_OFFSET,
+   0, 0, PACKETJ_TYPE0));
+   amdgpu_ring_write(ring, 0x6aa04); /* PCTL0_MMHUB_DEEPSLEEP_IB */
+
+   amdgpu_ring_write(ring, PACKETJ(JRBC_DEC_EXTERNAL_REG_WRITE_ADDR,
+   0, 0, PACKETJ_TYPE0));
+   amdgpu_ring_write(ring, 0x80000000 | (1 << (ring->me * 2 + 14)));
+}
+
+/**
+ * jpeg_v2_6_dec_ring_insert_end - insert a end command
+ *
+ * @ring: amdgpu_ring pointer
+ *
+ * Write a end command to the ring.
+ */
+static void jpeg_v2_6_dec_ring_insert_end(struct amdgpu_ring *ring)
+{
+   amdgpu_ring_write(ring, PACKETJ(mmUVD_JRBC_EXTERNAL_REG_INTERNAL_OFFSET,
+   0, 0, PACKETJ_TYPE0));
+   amdgpu_ring_write(ring, 0x6aa04); /* PCTL0_MMHUB_DEEPSLEEP_IB */
+
+   amdgpu_ring_write(ring, PACKETJ(JRBC_DEC_EXTERNAL_REG_WRITE_ADDR,
+   0, 0, PACKETJ_TYPE0));
+   amdgpu_ring_write(ring, (1 << (ring->me * 2 + 14)));
+}
+
 static bool jpeg_v2_5_is_idle(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -633,8 +669,8 @@ static const struct amdgpu_ring_funcs 
jpeg_v2_6_dec_ring_vm_funcs = {
.test_ring = amdgpu_jpeg_dec_ring_test_ring,
.test_ib = amdgpu_jpeg_dec_ring_test_ib,
.insert_nop = jpeg_v2_0_dec_ring_nop,
-   .insert_start = jpeg_v2_0_dec_ring_insert_start,
-   .insert_end = jpeg_v2_0_dec_ring_insert_end,
+   .insert_start = jpeg_v2_6_dec_ring_insert_start,
+   .insert_end = jpeg_v2_6_dec_ring_insert_end,
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_jpeg_ring_begin_use,
.end_use = amdgpu_jpeg_ring_end_use,
-- 
2.7.4



[PATCH 1/2] drm/amdgpu/jpeg2: move jpeg2 shared macro to header file

2021-09-29 Thread James Zhu
Move jpeg2 shared macro to header file

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 20 
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.h | 20 
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
index 85967a5..299de1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
@@ -32,26 +32,6 @@
 #include "vcn/vcn_2_0_0_sh_mask.h"
 #include "ivsrcid/vcn/irqsrcs_vcn_2_0.h"
 
-#define mmUVD_JRBC_EXTERNAL_REG_INTERNAL_OFFSET
0x1bfff
-#define mmUVD_JPEG_GPCOM_CMD_INTERNAL_OFFSET   0x4029
-#define mmUVD_JPEG_GPCOM_DATA0_INTERNAL_OFFSET 0x402a
-#define mmUVD_JPEG_GPCOM_DATA1_INTERNAL_OFFSET 0x402b
-#define mmUVD_LMI_JRBC_RB_MEM_WR_64BIT_BAR_LOW_INTERNAL_OFFSET 0x40ea
-#define mmUVD_LMI_JRBC_RB_MEM_WR_64BIT_BAR_HIGH_INTERNAL_OFFSET
0x40eb
-#define mmUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET 0x40cf
-#define mmUVD_LMI_JPEG_VMID_INTERNAL_OFFSET			0x40d1
-#define mmUVD_LMI_JRBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET
0x40e8
-#define mmUVD_LMI_JRBC_IB_64BIT_BAR_HIGH_INTERNAL_OFFSET   0x40e9
-#define mmUVD_JRBC_IB_SIZE_INTERNAL_OFFSET 0x4082
-#define mmUVD_LMI_JRBC_RB_MEM_RD_64BIT_BAR_LOW_INTERNAL_OFFSET 0x40ec
-#define mmUVD_LMI_JRBC_RB_MEM_RD_64BIT_BAR_HIGH_INTERNAL_OFFSET
0x40ed
-#define mmUVD_JRBC_RB_COND_RD_TIMER_INTERNAL_OFFSET		0x4085
-#define mmUVD_JRBC_RB_REF_DATA_INTERNAL_OFFSET 0x4084
-#define mmUVD_JRBC_STATUS_INTERNAL_OFFSET  0x4089
-#define mmUVD_JPEG_PITCH_INTERNAL_OFFSET   0x401f
-
-#define JRBC_DEC_EXTERNAL_REG_WRITE_ADDR   0x18000
-
 static void jpeg_v2_0_set_dec_ring_funcs(struct amdgpu_device *adev);
 static void jpeg_v2_0_set_irq_funcs(struct amdgpu_device *adev);
 static int jpeg_v2_0_set_powergating_state(void *handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.h 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.h
index 15a344e..1a03baa 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.h
@@ -24,6 +24,26 @@
 #ifndef __JPEG_V2_0_H__
 #define __JPEG_V2_0_H__
 
+#define mmUVD_JRBC_EXTERNAL_REG_INTERNAL_OFFSET
0x1bfff
+#define mmUVD_JPEG_GPCOM_CMD_INTERNAL_OFFSET   0x4029
+#define mmUVD_JPEG_GPCOM_DATA0_INTERNAL_OFFSET 0x402a
+#define mmUVD_JPEG_GPCOM_DATA1_INTERNAL_OFFSET 0x402b
+#define mmUVD_LMI_JRBC_RB_MEM_WR_64BIT_BAR_LOW_INTERNAL_OFFSET 0x40ea
+#define mmUVD_LMI_JRBC_RB_MEM_WR_64BIT_BAR_HIGH_INTERNAL_OFFSET
0x40eb
+#define mmUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET 0x40cf
+#define mmUVD_LMI_JPEG_VMID_INTERNAL_OFFSET			0x40d1
+#define mmUVD_LMI_JRBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET
0x40e8
+#define mmUVD_LMI_JRBC_IB_64BIT_BAR_HIGH_INTERNAL_OFFSET   0x40e9
+#define mmUVD_JRBC_IB_SIZE_INTERNAL_OFFSET 0x4082
+#define mmUVD_LMI_JRBC_RB_MEM_RD_64BIT_BAR_LOW_INTERNAL_OFFSET 0x40ec
+#define mmUVD_LMI_JRBC_RB_MEM_RD_64BIT_BAR_HIGH_INTERNAL_OFFSET
0x40ed
+#define mmUVD_JRBC_RB_COND_RD_TIMER_INTERNAL_OFFSET		0x4085
+#define mmUVD_JRBC_RB_REF_DATA_INTERNAL_OFFSET 0x4084
+#define mmUVD_JRBC_STATUS_INTERNAL_OFFSET  0x4089
+#define mmUVD_JPEG_PITCH_INTERNAL_OFFSET   0x401f
+
+#define JRBC_DEC_EXTERNAL_REG_WRITE_ADDR   0x18000
+
 void jpeg_v2_0_dec_ring_insert_start(struct amdgpu_ring *ring);
 void jpeg_v2_0_dec_ring_insert_end(struct amdgpu_ring *ring);
 void jpeg_v2_0_dec_ring_emit_fence(struct amdgpu_ring *ring, u64 addr, u64 seq,
-- 
2.7.4



[PATCH v3 2/2] amd/display: only require overlay plane to cover whole CRTC on ChromeOS

2021-09-29 Thread Simon Ser
Commit ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when
using overlay") changed the atomic validation code to forbid the
overlay plane from being used if it doesn't cover the whole CRTC. The
motivation is that ChromeOS uses the atomic API for everything except
the cursor plane (which uses the legacy API). Thus amdgpu must always
be prepared to enable/disable/move the cursor plane at any time without
failing (or else ChromeOS will trip over).

As discussed in [1], there's no reason why the ChromeOS limitation
should prevent other fully atomic users from taking advantage of the
overlay plane. Let's limit the check to ChromeOS.

[1]: 
https://lore.kernel.org/amd-gfx/JIQ_93_cHcshiIDsrMU1huBzx9P9LVQxucx8hQArpQu7Wk5DrCl_vTXj_Q20m_L-8C8A5dSpNcSJ8ehfcCrsQpfB5QG_Spn14EYkH9chtg0=@emersion.fr/

Signed-off-by: Simon Ser 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Bas Nieuwenhuizen 
Fixes: ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when using 
overlay")
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6472c0032b54..f06d6e794721 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10590,6 +10590,10 @@ static int validate_overlay(struct drm_atomic_state 
*state)
struct drm_plane_state *new_plane_state;
struct drm_plane_state *primary_state, *overlay_state = NULL;
 
+   /* This is a workaround for ChromeOS only */
+   if (strcmp(current->comm, "chrome") != 0)
+   return 0;
+
/* Check if primary plane is contained inside overlay */
for_each_new_plane_in_state_reverse(state, plane, new_plane_state, i) {
if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
-- 
2.33.0




[PATCH v3 1/2] amd/display: check cursor plane matches underlying plane

2021-09-29 Thread Simon Ser
The current logic checks whether the cursor plane's blending
properties match the primary plane's. However that's wrong,
because the cursor is painted on all planes underneath. If
the cursor is over the primary plane and an overlay plane,
it's painted on both pipes.

Iterate over the CRTC planes and check that their scaling matches
the cursor's.

Signed-off-by: Simon Ser 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Bas Nieuwenhuizen 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 49 +--
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3c7a8f869b40..6472c0032b54 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10505,18 +10505,18 @@ static int dm_check_crtc_cursor(struct 
drm_atomic_state *state,
struct drm_crtc *crtc,
struct drm_crtc_state *new_crtc_state)
 {
-   struct drm_plane_state *new_cursor_state, *new_primary_state;
-   int cursor_scale_w, cursor_scale_h, primary_scale_w, primary_scale_h;
+   struct drm_plane *cursor = crtc->cursor, *underlying;
+   struct drm_plane_state *new_cursor_state, *new_underlying_state;
+   int i;
+   int cursor_scale_w, cursor_scale_h, underlying_scale_w, 
underlying_scale_h;
 
/* On DCE and DCN there is no dedicated hardware cursor plane. We get a
 * cursor per pipe but it's going to inherit the scaling and
 * positioning from the underlying pipe. Check the cursor plane's
-* blending properties match the primary plane's. */
+* blending properties match the underlying planes'. */
 
-   new_cursor_state = drm_atomic_get_new_plane_state(state, crtc->cursor);
-   new_primary_state = drm_atomic_get_new_plane_state(state, 
crtc->primary);
-   if (!new_cursor_state || !new_primary_state ||
-   !new_cursor_state->fb || !new_primary_state->fb) {
+   new_cursor_state = drm_atomic_get_new_plane_state(state, cursor);
+   if (!new_cursor_state || !new_cursor_state->fb) {
return 0;
}
 
@@ -10525,15 +10525,34 @@ static int dm_check_crtc_cursor(struct 
drm_atomic_state *state,
cursor_scale_h = new_cursor_state->crtc_h * 1000 /
 (new_cursor_state->src_h >> 16);
 
-   primary_scale_w = new_primary_state->crtc_w * 1000 /
-(new_primary_state->src_w >> 16);
-   primary_scale_h = new_primary_state->crtc_h * 1000 /
-(new_primary_state->src_h >> 16);
+   for_each_new_plane_in_state_reverse(state, underlying, 
new_underlying_state, i) {
+   /* Narrow down to non-cursor planes on the same CRTC as the 
cursor */
+   if (new_underlying_state->crtc != crtc || underlying == 
crtc->cursor)
+   continue;
 
-   if (cursor_scale_w != primary_scale_w ||
-   cursor_scale_h != primary_scale_h) {
-   drm_dbg_atomic(crtc->dev, "Cursor plane scaling doesn't match 
primary plane\n");
-   return -EINVAL;
+   /* Ignore disabled planes */
+   if (!new_underlying_state->fb)
+   continue;
+
+   underlying_scale_w = new_underlying_state->crtc_w * 1000 /
+(new_underlying_state->src_w >> 16);
+   underlying_scale_h = new_underlying_state->crtc_h * 1000 /
+(new_underlying_state->src_h >> 16);
+
+   if (cursor_scale_w != underlying_scale_w ||
+   cursor_scale_h != underlying_scale_h) {
+   drm_dbg_atomic(crtc->dev,
+  "Cursor [PLANE:%d:%s] scaling doesn't 
match underlying [PLANE:%d:%s]\n",
+  cursor->base.id, cursor->name, 
underlying->base.id, underlying->name);
+   return -EINVAL;
+   }
+
+   /* If this plane covers the whole CRTC, no need to check planes 
underneath */
+   if (new_underlying_state->crtc_x <= 0 &&
+   new_underlying_state->crtc_y <= 0 &&
+   new_underlying_state->crtc_x + new_underlying_state->crtc_w 
>= new_crtc_state->mode.hdisplay &&
+   new_underlying_state->crtc_y + new_underlying_state->crtc_h 
>= new_crtc_state->mode.vdisplay)
+   break;
}
 
return 0;
-- 
2.33.0




RE: [PATCH] drm/amdgpu/display: remove unused variable

2021-09-29 Thread Ma, Hanghong
[AMD Official Use Only]

Hi Alex,
This looks good to me, and thanks for the clean up.
Reviewed-by: Leo (Hanghong) Ma 

-Leo
-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Wednesday, September 29, 2021 1:47 PM
To: Deucher, Alexander 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdgpu/display: remove unused variable

Ping?

On Mon, Sep 27, 2021 at 3:08 PM Alex Deucher  wrote:
>
> No longer used, drop it.
>
> Fixes: 1e07005161fc ("drm/amd/display: add function to convert hw to 
> dpcd lane settings")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index 029cc78bc9e9..5eb40dcff315 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -520,7 +520,6 @@ static void dpcd_set_lt_pattern_and_lane_settings(
>
> uint8_t dpcd_lt_buffer[5] = {0};
> union dpcd_training_pattern dpcd_pattern = { {0} };
> -   uint32_t lane;
> uint32_t size_in_bytes;
> bool edp_workaround = false; /* TODO link_prop.INTERNAL */
> dpcd_base_lt_offset = DP_TRAINING_PATTERN_SET;
> @@ -1020,7 +1019,6 @@ enum dc_status dpcd_set_lane_settings(
> uint32_t offset)
>  {
> union dpcd_training_lane dpcd_lane[LANE_COUNT_DP_MAX] = {{{0}}};
> -   uint32_t lane;
> unsigned int lane0_set_address;
> enum dc_status status;
>
> --
> 2.31.1
>


Re: [PATCH] drm/amdgpu/display: remove unused variable

2021-09-29 Thread Alex Deucher
Ping?

On Mon, Sep 27, 2021 at 3:08 PM Alex Deucher  wrote:
>
> No longer used, drop it.
>
> Fixes: 1e07005161fc ("drm/amd/display: add function to convert hw to dpcd 
> lane settings")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index 029cc78bc9e9..5eb40dcff315 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -520,7 +520,6 @@ static void dpcd_set_lt_pattern_and_lane_settings(
>
> uint8_t dpcd_lt_buffer[5] = {0};
> union dpcd_training_pattern dpcd_pattern = { {0} };
> -   uint32_t lane;
> uint32_t size_in_bytes;
> bool edp_workaround = false; /* TODO link_prop.INTERNAL */
> dpcd_base_lt_offset = DP_TRAINING_PATTERN_SET;
> @@ -1020,7 +1019,6 @@ enum dc_status dpcd_set_lane_settings(
> uint32_t offset)
>  {
> union dpcd_training_lane dpcd_lane[LANE_COUNT_DP_MAX] = {{{0}}};
> -   uint32_t lane;
> unsigned int lane0_set_address;
> enum dc_status status;
>
> --
> 2.31.1
>


[PATCH] Documentation/gpu: remove spurious "+" in amdgpu.rst

2021-09-29 Thread Alex Deucher
Not sure why that was there.  Remove it.

Signed-off-by: Alex Deucher 
---
 Documentation/gpu/amdgpu.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/amdgpu.rst b/Documentation/gpu/amdgpu.rst
index 364680cdad2e..8ba72e898099 100644
--- a/Documentation/gpu/amdgpu.rst
+++ b/Documentation/gpu/amdgpu.rst
@@ -300,8 +300,8 @@ pcie_replay_count
 .. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
:doc: pcie_replay_count
 
-+GPU SmartShift Information
-===========================
+GPU SmartShift Information
+==========================
 
 GPU SmartShift information via sysfs
 
-- 
2.31.1



[PATCH] drm/amdgpu: consolidate case statements

2021-09-29 Thread Alex Deucher
IP_VERSION(11, 0, 13) does the exact same thing as
IP_VERSION(11, 0, 12) so squash them together.
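
After the squash, all four versions share one case body in
psp_v11_0_init_microcode() (a sketch of the resulting code,
reconstructed from the hunk below):

	case IP_VERSION(11, 0, 7):
	case IP_VERSION(11, 0, 11):
	case IP_VERSION(11, 0, 12):
	case IP_VERSION(11, 0, 13):
		err = psp_init_sos_microcode(psp, chip_name);
		if (err)
			return err;
		err = psp_init_ta_microcode(psp, chip_name);
		if (err)
			return err;
		break;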

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 382cebfc2069..aaf200ec982b 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -216,13 +216,6 @@ static int psp_v11_0_init_microcode(struct psp_context 
*psp)
case IP_VERSION(11, 0, 7):
case IP_VERSION(11, 0, 11):
case IP_VERSION(11, 0, 12):
-   err = psp_init_sos_microcode(psp, chip_name);
-   if (err)
-   return err;
-   err = psp_init_ta_microcode(psp, chip_name);
-   if (err)
-   return err;
-   break;
case IP_VERSION(11, 0, 13):
err = psp_init_sos_microcode(psp, chip_name);
if (err)
-- 
2.31.1



Re: [PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Felix Kuehling
Am 2021-09-29 um 4:22 a.m. schrieb Lang Yu:
> If user doesn't explicitly call kfd_ioctl_destroy_queue
> to destroy all created queues, when the kfd process is
> destroyed, some queues' cu_mask memory are not freed.
>
> To avoid forgetting to free them in some places,
> free them immediately after use.
>
> Signed-off-by: Lang Yu 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
>  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 --
>  2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 4de907f3e66a..5c0e6dcf692a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, 
> struct kfd_process *p,
>   retval = copy_from_user(properties.cu_mask, cu_mask_ptr, cu_mask_size);
>   if (retval) {
>   pr_debug("Could not copy CU mask from userspace");
> - kfree(properties.cu_mask);
> - return -EFAULT;
> + retval = -EFAULT;
> + goto out;
>   }
>  
>   mutex_lock(&p->mutex);
> @@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, 
> struct kfd_process *p,
>  
>   mutex_unlock(&p->mutex);
>  
> - if (retval)
> - kfree(properties.cu_mask);
> +out:
> + kfree(properties.cu_mask);
>  
>   return retval;
>  }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 243dd1efcdbf..4c81d690f31a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -394,8 +394,6 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, 
> unsigned int qid)
>   pdd->qpd.num_gws = 0;
>   }
>  
> - kfree(pqn->q->properties.cu_mask);
> - pqn->q->properties.cu_mask = NULL;
>   uninit_queue(pqn->q);
>   }
>  
> @@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct process_queue_manager *pqm, 
> unsigned int qid,
>   return -EFAULT;
>   }
>  
> - /* Free the old CU mask memory if it is already allocated, then
> -  * allocate memory for the new CU mask.
> -  */
> - kfree(pqn->q->properties.cu_mask);
> + WARN_ON_ONCE(pqn->q->properties.cu_mask);
>  
>   pqn->q->properties.cu_mask_count = p->cu_mask_count;
>   pqn->q->properties.cu_mask = p->cu_mask;
>  
>   retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>   pqn->q);
> +
> + pqn->q->properties.cu_mask = NULL;
> +

This won't work correctly. We need to save the cu_mask for later.
Otherwise the next time dqm->ops.update_queue is called, for example in
pqm_update_queue or pqm_set_gws, it will wipe out the CU mask in the MQD.
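
To illustrate with a hypothetical call sequence (not actual kernel code;
update_queue() re-programs the MQD from properties.cu_mask each time):

	pqm_set_cu_mask(pqm, qid, &props);	/* MQD now carries the CU mask */
	/* properties.cu_mask set to NULL here, as in this patch */
	pqm_set_gws(pqm, qid, gws);	/* calls update_queue() again, which
					 * sees cu_mask == NULL and wipes the
					 * CU mask in the MQD */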

Regards,
  Felix


>   if (retval != 0)
>   return retval;
>  


Re: [PATCH] drm/amdkfd: fix a potential ttm->sg memory leak

2021-09-29 Thread Felix Kuehling
Am 2021-09-29 um 4:19 a.m. schrieb Lang Yu:
> Memory is allocated for ttm->sg by kmalloc in kfd_mem_dmamap_userptr,
> but isn't freed by kfree in kfd_mem_dmaunmap_userptr. Free it!
>
> Signed-off-by: Lang Yu 

Please add

Fixes: 264fb4d332f5 ("drm/amdgpu: Add multi-GPU DMA mapping helpers")

Reviewed-by: Felix Kuehling 


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 2d6b2d77b738..054c1a224def 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -563,6 +563,7 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>  
>   dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
>   sg_free_table(ttm->sg);
> + kfree(ttm->sg);
>   ttm->sg = NULL;
>  }
>  
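
For context, the matching allocation side looks roughly like this (a
trimmed sketch of kfd_mem_dmamap_userptr(); sg_free_table() only frees
the table's internal page list, so the kmalloc'ed sg_table struct itself
still needs the kfree() added above):

	ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
	if (unlikely(!ttm->sg))
		return -ENOMEM;

	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
					ttm->num_pages, 0,
					(u64)ttm->num_pages << PAGE_SHIFT,
					GFP_KERNEL);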


Re: [PATCH] drm/amd/amdgpu: Do irq_fini_hw after ip_fini_early

2021-09-29 Thread Andrey Grodzovsky

Can you test this change with the hotunplug tests in libdrm?
Since the tests are still disabled until the latest fixes propagate
to drm-next upstream, you will need to comment out
https://gitlab.freedesktop.org/mesa/drm/-/blob/main/tests/amdgpu/hotunplug_tests.c#L65
I recently fixed a few regressions in amdgpu, so hopefully there are no
more regressions that will interfere with your testing.

Andrey

On 2021-09-29 5:22 a.m., YuBiao Wang wrote:

Some IPs, such as SMU, need irq_put to perform hw_fini.
So move irq_fini_hw after ip_fini_early.

Signed-off-by: YuBiao Wang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4c8f2f4647c0..18e26a78ef82 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3864,10 +3864,10 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_ucode_sysfs_fini(adev);
 	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 
-	amdgpu_irq_fini_hw(adev);
-
 	amdgpu_device_ip_fini_early(adev);
 
+	amdgpu_irq_fini_hw(adev);
+
 	ttm_device_clear_dma_mappings(&adev->mman.bdev);
 
 	amdgpu_gart_dummy_page_fini(adev);


Re: [PATCH v2] drm/amd/display: Only define DP 2.0 symbols if not already defined

2021-09-29 Thread Harry Wentland



On 2021-09-28 23:58, Navare, Manasi D wrote:
> [AMD Official Use Only]
> 
> We have merged such DRM definition dependencies previously through a topic 
> branch in order to avoid redefining inside the driver.
> But yes guarding this with ifdef is good.
> 
> Reviewed-by: Manasi Navare 
> 

Ah, I merged it already. But thanks for your review.

I agree these are better defined in drm headers, with a preparatory
patch if needed by the driver. We're working on cleaning it up and
dropping the driver defines.

Harry

> Manasi
> 
> -Original Message-
> From: Zuo, Jerry 
> Sent: Tuesday, September 28, 2021 11:11 PM
> To: Wentland, Harry ; Deucher, Alexander 
> ; amd-gfx@lists.freedesktop.org
> Cc: Nikula, Jani ; Li, Sun peng (Leo) 
> ; nat...@kernel.org; intel-...@lists.freedesktop.org; 
> dri-de...@lists.freedesktop.org; ville.syrj...@linux.intel.com; Navare, 
> Manasi D ; Koenig, Christian 
> ; Pan, Xinhui ; 
> s...@canb.auug.org.au; linux-n...@vger.kernel.org; airl...@gmail.com; 
> daniel.vet...@ffwll.ch; Wentland, Harry 
> Subject: RE: [PATCH v2] drm/amd/display: Only define DP 2.0 symbols if not 
> already defined
> 
> [AMD Official Use Only]
> 
>> -Original Message-
>> From: Harry Wentland 
>> Sent: September 28, 2021 1:08 PM
>> To: Deucher, Alexander ; amd-
>> g...@lists.freedesktop.org; Zuo, Jerry 
>> Cc: jani.nik...@intel.com; Li, Sun peng (Leo) ;
>> nat...@kernel.org; intel-...@lists.freedesktop.org; dri-
>> de...@lists.freedesktop.org; ville.syrj...@linux.intel.com;
>> manasi.d.nav...@intel.com; Koenig, Christian
>> ; Pan, Xinhui ;
>> s...@canb.auug.org.au; linux- n...@vger.kernel.org; airl...@gmail.com;
>> daniel.vet...@ffwll.ch; Wentland, Harry 
>> Subject: [PATCH v2] drm/amd/display: Only define DP 2.0 symbols if not
>> already defined
>>
>> [Why]
>> For some reason we're defining DP 2.0 definitions inside our driver.
>> Now that patches to introduce relevant definitions are slated to be
>> merged into drm- next this is causing conflicts.
>>
>> In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c:33:
>> In file included
>> from ./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:70:
>> In file included
>> from ./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_mode.h:36:
>> ./include/drm/drm_dp_helper.h:1322:9: error:
>> 'DP_MAIN_LINK_CHANNEL_CODING_PHY_REPEATER' macro redefined [-
>> Werror,-Wmacro-redefined]
>> ^
>> ./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dp_types.h:881:9: note:
>> previous definition is here
>> ^
>> 1 error generated.
>>
>> v2: Add one missing endif
>>
>> [How]
>> Guard all display driver defines with #ifndef for now. Once we pull in
>> the new definitions into amd-staging-drm-next we will follow up and
>> drop definitions from our driver and provide follow-up header updates
>> for any addition DP
>> 2.0 definitions required by our driver.
>>
>> Signed-off-by: Harry Wentland 
> 
> Reviewed-by: Fangzhi Zuo 
> 
>> ---
>>  drivers/gpu/drm/amd/display/dc/dc_dp_types.h | 54
>> ++--
>>  1 file changed, 49 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
>> b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
>> index a5e798b5da79..9de86ff5ef1b 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
>> +++ b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
>> @@ -860,28 +860,72 @@ struct psr_caps {
>>  };
>>
>>  #if defined(CONFIG_DRM_AMD_DC_DCN)
>> +#ifndef DP_MAIN_LINK_CHANNEL_CODING_CAP
>>  #define DP_MAIN_LINK_CHANNEL_CODING_CAP  0x006
>> +#endif
>> +#ifndef DP_SINK_VIDEO_FALLBACK_FORMATS
>>  #define DP_SINK_VIDEO_FALLBACK_FORMATS   0x020
>> +#endif
>> +#ifndef DP_FEC_CAPABILITY_1
>>  #define DP_FEC_CAPABILITY_1  0x091
>> +#endif
>> +#ifndef DP_DFP_CAPABILITY_EXTENSION_SUPPORT
>>  #define DP_DFP_CAPABILITY_EXTENSION_SUPPORT  0x0A3
>> +#endif
>> +#ifndef DP_DSC_CONFIGURATION
>>  #define DP_DSC_CONFIGURATION 0x161
>> +#endif
>> +#ifndef DP_PHY_SQUARE_PATTERN
>>  #define DP_PHY_SQUARE_PATTERN			0x249
>> +#endif
>> +#ifndef DP_128b_132b_SUPPORTED_LINK_RATES
>>  #define DP_128b_132b_SUPPORTED_LINK_RATES	0x2215
>> +#endif
>> +#ifndef DP_128b_132b_TRAINING_AUX_RD_INTERVAL
>>  #define DP_128b_132b_TRAINING_AUX_RD_INTERVAL
>>   0x2216
>> +#endif
>> +#ifndef DP_TEST_264BIT_CUSTOM_PATTERN_7_0
>>  #define DP_TEST_264BIT_CUSTOM_PATTERN_7_0	0X2230
>> +#endif
>> +#ifndef DP_TEST_264BIT_CUSTOM_PATTERN_263_256
>>  #define DP_TEST_264BIT_CUSTOM_PATTERN_263_256
>>   0X2250
>> +#endif
>> +#ifndef DP_DSC_SUPPORT_AND_DECODER_COUNT
>>  #define DP_DSC_SUPPORT_AND_DECODER_COUNT 0x2260
>> +#endif
>> +#ifndef DP_DSC_MAX_SLICE_COUNT_AND_AGGREGATION_0
>>  #define DP_DSC_MAX_SLICE_COUNT_AND_AGGREGATION_0
>>   0x2270
>> -# define DP_DSC_DECODER_0_MAXIMUM_SLICE_COUNT_MASK	(1 << 0)
>> -# define DP_DSC_DECODER_0_AGGREGATION_SUPPORT_MASK
>>   

Re: [PATCH] drm/amd/amdgpu: Do irq_fini_hw after ip_fini_early

2021-09-29 Thread Alex Deucher
On Wed, Sep 29, 2021 at 5:22 AM YuBiao Wang  wrote:
>
> Some IPs, such as SMU, need irq_put to perform hw_fini.
> So move irq_fini_hw after ip_fini_early.
>
> Signed-off-by: YuBiao Wang 

This looks correct in general, but will this code:
if (!amdgpu_device_has_dc_support(adev))
flush_work(&adev->hotplug_work);
in amdgpu_irq_fini_hw() cause any problems if it gets executed after
the DCE hw has been stopped?  I guess it should be ok.

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 4c8f2f4647c0..18e26a78ef82 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3864,10 +3864,10 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> amdgpu_ucode_sysfs_fini(adev);
> sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>
> -   amdgpu_irq_fini_hw(adev);
> -
> amdgpu_device_ip_fini_early(adev);
>
> +   amdgpu_irq_fini_hw(adev);
> +
> ttm_device_clear_dma_mappings(&adev->mman.bdev);
>
> amdgpu_gart_dummy_page_fini(adev);
> --
> 2.25.1
>


Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init

2021-09-29 Thread Zhu, James
[AMD Official Use Only]

Hi Felix,

Since the previous patch helps with the PCO suspend/resume hang issue,
let me work with YiFan to see if
there is a proper way to cover both cases.


Thanks & Best Regards!


James Zhu


From: Kuehling, Felix 
Sent: Tuesday, September 28, 2021 11:41 AM
To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org 
; Zhu, James 
Subject: Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init

[+James]

This basically undoes James's change "drm/amdgpu: move iommu_resume
before ip init/resume". I assume James made his change for a reason. Can
you please discuss the issue with him and determine a solution that
solves both your problem and his?

If James' patch series was a mistake, I'd prefer to revert his patches,
because his patches complicated the initialization sequence and exposed
the iommu init sequence in amdgpu.

Thanks,
  Felix


Am 2021-09-28 um 4:28 a.m. schrieb Yifan Zhang:
> This patch is to fix clinfo failure in Raven/Picasso:
>
> Number of platforms: 1
>   Platform Profile: FULL_PROFILE
>   Platform Version: OpenCL 2.2 AMD-APP (3364.0)
>   Platform Name: AMD Accelerated Parallel Processing
>   Platform Vendor: Advanced Micro Devices, Inc.
>   Platform Extensions: cl_khr_icd cl_amd_event_callback
>
>   Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
>
> Signed-off-by: Yifan Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 4c8f2f4647c0..89ed9b091386 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2393,10 +2393,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
> *adev)
>if (r)
>goto init_failed;
>
> - r = amdgpu_amdkfd_resume_iommu(adev);
> - if (r)
> - goto init_failed;
> -
>r = amdgpu_device_ip_hw_init_phase1(adev);
>if (r)
>goto init_failed;
> @@ -2435,6 +2431,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
> *adev)
>if (!adev->gmc.xgmi.pending_reset)
>amdgpu_amdkfd_device_init(adev);
>
> + r = amdgpu_amdkfd_resume_iommu(adev);
> + if (r)
> + goto init_failed;
> +
>amdgpu_fru_get_product_info(adev);
>
>  init_failed:


[PATCH v3 11/16] drm/amdkfd: CRIU restore queue doorbell id

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.

Signed-off-by: David Yat Sin 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c29dbc529548..30ee22562329 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -153,7 +153,13 @@ static void decrement_queue_count(struct 
device_queue_manager *dqm,
dqm->active_cp_queue_count--;
 }
 
-static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+/*
+ * Allocate a doorbell ID to this queue.
+ * If doorbell_id is passed in, make sure requested ID is valid then allocate 
it.
+ */
+static int allocate_doorbell(struct qcm_process_device *qpd,
+struct queue *q,
+uint32_t const *restore_id)
 {
struct kfd_dev *dev = qpd->dqm->dev;
 
@@ -161,6 +167,10 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
/* On pre-SOC15 chips we need to use the queue ID to
 * preserve the user mode ABI.
 */
+
+   if (restore_id && *restore_id != q->properties.queue_id)
+   return -EINVAL;
+
q->doorbell_id = q->properties.queue_id;
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
@@ -169,25 +179,37 @@ static int allocate_doorbell(struct qcm_process_device 
*qpd, struct queue *q)
	 * The doorbell index distance between RLC (2*i) and (2*i+1)
 * for a SDMA engine is 512.
 */
-   uint32_t *idx_offset =
-   dev->shared_resources.sdma_doorbell_idx;
 
-   q->doorbell_id = idx_offset[q->properties.sdma_engine_id]
-   + (q->properties.sdma_queue_id & 1)
-   * KFD_QUEUE_DOORBELL_MIRROR_OFFSET
-   + (q->properties.sdma_queue_id >> 1);
+   uint32_t *idx_offset = dev->shared_resources.sdma_doorbell_idx;
+   uint32_t valid_id = idx_offset[q->properties.sdma_engine_id]
+   + (q->properties.sdma_queue_id 
& 1)
+   * 
KFD_QUEUE_DOORBELL_MIRROR_OFFSET
+   + (q->properties.sdma_queue_id 
>> 1);
+
+   if (restore_id && *restore_id != valid_id)
+   return -EINVAL;
+   q->doorbell_id = valid_id;
} else {
-   /* For CP queues on SOC15 reserve a free doorbell ID */
-   unsigned int found;
-
-   found = find_first_zero_bit(qpd->doorbell_bitmap,
-   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
-   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
-   pr_debug("No doorbells available");
-   return -EBUSY;
+   /* For CP queues on SOC15 */
+   if (restore_id) {
+   /* make sure that ID is free  */
+   if (__test_and_set_bit(*restore_id, 
qpd->doorbell_bitmap))
+   return -EINVAL;
+
+   q->doorbell_id = *restore_id;
+   } else {
+   /* or reserve a free doorbell ID */
+   unsigned int found;
+
+   found = find_first_zero_bit(qpd->doorbell_bitmap,
+   
KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+   pr_debug("No doorbells available");
+   return -EBUSY;
+   }
+   set_bit(found, qpd->doorbell_bitmap);
+   q->doorbell_id = found;
}
-   set_bit(found, qpd->doorbell_bitmap);
-   q->doorbell_id = found;
}
 
q->properties.doorbell_off =
@@ -356,7 +378,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
}
 
-   retval = allocate_doorbell(qpd, q);
+	retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : NULL);
if (retval)
goto out_deallocate_hqd;
 
@@ -1333,7 +1355,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
goto out;
}
 
-   retval = allocate_doorbell(qpd, q);
+	retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : NULL);
if 

[PATCH v3 15/16] drm/amdkfd: CRIU implement gpu_id remapping

2021-09-29 Thread David Yat Sin
When doing a restore on a different node, the gpu_id's on the restore
node may be different. But the user space application will still use
the original gpu_id's in the ioctl calls. Add code to create a
gpu_id mapping so that kfd can determine the actual gpu_id during the
user ioctls.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 416 ---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  18 +
 4 files changed, 329 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index de0e28f90159..10f08aa26fac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -294,18 +294,19 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
return err;
 
pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev) {
-   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
-   return -EINVAL;
-   }
 
	mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   goto err_unlock;
+   }
+   dev = pdd->dev;
 
pdd = kfd_bind_process_to_device(dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
-   goto err_bind_process;
+   goto err_unlock;
}
 
pr_debug("Creating queue for PASID 0x%x on gpu 0x%x\n",
@@ -315,7 +316,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id,
			NULL, NULL, NULL,
			&doorbell_offset_in_process);
if (err != 0)
-   goto err_create_queue;
+   goto err_unlock;
 
args->queue_id = queue_id;
 
@@ -344,8 +345,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
 
return 0;
 
-err_create_queue:
-err_bind_process:
+err_unlock:
	mutex_unlock(&p->mutex);
return err;
 }
@@ -491,7 +491,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_memory_policy_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
enum cache_policy default_policy, alternate_policy;
@@ -506,13 +505,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
return -EINVAL;
}
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
	mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+   err = -EINVAL;
+   goto out;
+   }
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -525,7 +526,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
   ? cache_policy_coherent : cache_policy_noncoherent;
 
-   if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
+   if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
					&pdd->qpd,
default_policy,
alternate_policy,
@@ -543,17 +544,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_set_trap_handler_args *args = data;
-   struct kfd_dev *dev;
int err = 0;
struct kfd_process_device *pdd;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
-
	mutex_lock(&p->mutex);
 
-   pdd = kfd_bind_process_to_device(dev, p);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   err = -EINVAL;
+   goto out;
+   }
+
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
if (IS_ERR(pdd)) {
err = -ESRCH;
goto out;
@@ -577,16 +579,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,
bool create_ok;
long status = 0;
 
-   dev = kfd_device_by_id(args->gpu_id);
-   if (!dev)
-   return -EINVAL;
+	mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+   if (!pdd) {
+   status = 

[PATCH v3 09/16] drm/amdkfd: CRIU restore queue ids

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 ++
 .../amd/amdkfd/kfd_process_queue_manager.c| 24 +--
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 542a77b7f449..8bb470b1ee93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id,
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL,
			&doorbell_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 159add0f5aaa..749a7a3bf191 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-			&properties, &qid, NULL);
+			&properties, &qid, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index c5329d843ffb..7e52ef69636a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -470,6 +470,7 @@ enum KFD_QUEUE_PRIORITY {
  * it's user mode or kernel mode queue.
  *
  */
+
 struct queue_properties {
enum kfd_queue_type type;
enum kfd_queue_format format;
@@ -1128,6 +1129,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index f1ec644acdf7..e3cf99dfe352 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
return NULL;
 }
 
+static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
+   unsigned int qid)
+{
+   if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+   return -EINVAL;
+
+   if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
+   pr_err("Cannot create new queue because requested qid(%u) is in 
use\n", qid);
+   return -ENOSPC;
+   }
+
+   return 0;
+}
+
 static int find_available_queue_slot(struct process_queue_manager *pqm,
unsigned int *qid)
 {
@@ -194,6 +208,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
struct file *f,
struct queue_properties *properties,
unsigned int *qid,
+   const struct kfd_criu_queue_priv_data *q_data,
uint32_t *p_doorbell_offset_in_process)
 {
int retval;
@@ -225,7 +240,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
if (pdd->qpd.queue_count >= max_queues)
return -ENOSPC;
 
-   retval = find_available_queue_slot(pqm, qid);
+   if (q_data) {
+   retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
+   *qid = q_data->q_id;
+   } else
+   retval = find_available_queue_slot(pqm, qid);
+
if (retval != 0)
return retval;
 
@@ -774,7 +794,7 @@ static int criu_restore_queue(struct kfd_process *p,
 
	print_queue_properties(&qp);
 
-	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, NULL);
+	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, NULL);
if (ret) {
pr_err("Failed to create new queue err:%d\n", ret);
ret = -EINVAL;
-- 
2.17.1



[PATCH v3 13/16] drm/amdkfd: CRIU dump/restore queue control stack

2021-09-29 Thread David Yat Sin
Dump contents of queue control stacks on CRIU dump and restore them
during CRIU restore.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 23 +---
 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 11 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |  8 ++-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 10 ++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 11 ++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 26 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 19 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 ++--
 .../amd/amdkfd/kfd_process_queue_manager.c| 54 ---
 11 files changed, 125 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d2130c5a947e..e684fa87cfce 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL, NULL,
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL, NULL, NULL,
			&doorbell_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index c6c0cd47e7f7..3c29e60b967f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   , , NULL, NULL, NULL);
+   , , NULL, NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 3e1a6a9b..3f394e039791 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -333,7 +333,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
struct queue *q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void 
*restore_ctl_stack)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -395,7 +395,8 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
 
if (qd)
mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, 
>gart_mqd_addr,
->properties, restore_mqd);
+>properties, restore_mqd, 
restore_ctl_stack,
+qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1342,7 +1343,7 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
const struct kfd_criu_queue_priv_data *qd,
-   const void *restore_mqd)
+   const void *restore_mqd, const void *restore_ctl_stack)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1388,9 +1389,11 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
+
if (qd)
 		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj, &q->gart_mqd_addr,
-				     &q->properties, restore_mqd);
+				     &q->properties, restore_mqd, restore_ctl_stack,
+				     qd->ctl_stack_size);
else
mqd_mgr->init_mqd(mqd_mgr, >mqd, q->mqd_mem_obj,
>gart_mqd_addr, >properties);
@@ -1783,7 +1786,8 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
 
 static void get_queue_dump_info(struct device_queue_manager *dqm,
const struct queue *q,
-   u32 *mqd_size)
+   u32 *mqd_size,
+   u32 *ctl_stack_size)
 {
struct mqd_manager *mqd_mgr;
 

[PATCH v3 14/16] drm/amdkfd: CRIU dump and restore events

2021-09-29 Thread David Yat Sin
Add support to existing CRIU ioctl's to save and restore events during
criu checkpoint and restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  61 +
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 322 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  20 +-
 3 files changed, 324 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index e684fa87cfce..de0e28f90159 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1008,51 +1008,11 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
 * through the event_page_offset field.
 */
if (args->event_page_offset) {
-   struct kfd_dev *kfd;
-   struct kfd_process_device *pdd;
-   void *mem, *kern_addr;
-   uint64_t size;
-
-   if (p->signal_page) {
-   pr_err("Event page is already set\n");
-   return -EINVAL;
-   }
-
-   kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
-   if (!kfd) {
-   pr_err("Getting device by id failed in %s\n", __func__);
-   return -EINVAL;
-   }
-
	mutex_lock(&p->mutex);
-   pdd = kfd_bind_process_to_device(kfd, p);
-   if (IS_ERR(pdd)) {
-   err = PTR_ERR(pdd);
-   goto out_unlock;
-   }
-
-   mem = kfd_process_device_translate_handle(pdd,
-   GET_IDR_HANDLE(args->event_page_offset));
-   if (!mem) {
-   pr_err("Can't find BO, offset is 0x%llx\n",
-  args->event_page_offset);
-   err = -EINVAL;
-   goto out_unlock;
-   }
+   err = kfd_kmap_event_page(p, args->event_page_offset);
	mutex_unlock(&p->mutex);
-
-   err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
-			mem, &kern_addr, &size);
-   if (err) {
-   pr_err("Failed to map event page to kernel\n");
+   if (err)
return err;
-   }
-
-   err = kfd_event_page_set(p, kern_addr, size);
-   if (err) {
-   pr_err("Failed to set event page\n");
-   return err;
-   }
}
 
err = kfd_event_create(filp, p, args->event_type,
@@ -1061,10 +1021,7 @@ static int kfd_ioctl_create_event(struct file *filp, 
struct kfd_process *p,
			&args->event_page_offset,
			&args->event_slot_index);
 
-   return err;
-
-out_unlock:
-	mutex_unlock(&p->mutex);
+   pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
return err;
 }
 
@@ -2046,6 +2003,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
ret = kfd_criu_dump_queues(p, args);
break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
+   ret = kfd_criu_dump_events(p, args);
+   break;
case KFD_CRIU_OBJECT_TYPE_DEVICE:
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
default:
@@ -2355,6 +2314,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
ret = kfd_criu_restore_queues(p, args);
break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
+   ret = kfd_criu_restore_events(filep, p, args);
+   break;
case KFD_CRIU_OBJECT_TYPE_DEVICE:
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
default:
@@ -2455,9 +2416,13 @@ static int kfd_ioctl_criu_process_info(struct file 
*filep,
args->queues_priv_data_size = queues_extra_data_size +
(args->total_queues * sizeof(struct 
kfd_criu_queue_priv_data));
 
-   dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
+   args->total_events = kfd_get_num_events(p);
+   args->events_priv_data_size = args->total_events * sizeof(struct 
kfd_criu_event_priv_data);
+
+   dev_dbg(kfd_device, "Num of bos:%llu queues:%u events:%u\n",
args->total_bos,
-   args->total_queues);
+   args->total_queues,
+   args->total_events);
 err_unlock:
mutex_unlock(>mutex);
return ret;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 3eea4edee355..2a1451857f05 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -53,9 +53,9 @@ struct kfd_signal_page {
uint64_t *kernel_address;
uint64_t __user *user_address;
bool need_to_free_pages;
+  

[PATCH v3 12/16] drm/amdkfd: CRIU dump and restore queue mqds

2021-09-29 Thread David Yat Sin
Dump contents of queue MQD's on CRIU dump and restore them during CRIU
restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 72 --
 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 12 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 67 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 68 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 68 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 69 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  4 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 97 ---
 11 files changed, 444 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 8bb470b1ee93..d2130c5a947e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);
 
-   err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, 
NULL,
+   err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, 
NULL, NULL,
&doorbell_offset_in_process);
if (err != 0)
goto err_create_queue;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 749a7a3bf191..c6c0cd47e7f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
properties.type = KFD_QUEUE_TYPE_DIQ;
 
status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-   &properties, &qid, NULL, NULL);
+   &properties, &qid, NULL, NULL, NULL);
 
if (status) {
pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 30ee22562329..3e1a6a9b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -332,7 +332,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -391,8 +392,14 @@ static int create_queue_nocpsch(struct 
device_queue_manager *dqm,
retval = -ENOMEM;
goto out_deallocate_doorbell;
}
-   mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-   &q->gart_mqd_addr, &q->properties);
+
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj, 
&q->gart_mqd_addr,
+   &q->properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+   &q->gart_mqd_addr, &q->properties);
+
if (q->properties.is_active) {
if (!dqm->sched_running) {
WARN_ONCE(1, "Load non-HWS mqd while stopped\n");
@@ -1334,7 +1341,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue 
*q,
struct qcm_process_device *qpd,
-   const struct kfd_criu_queue_priv_data *qd)
+   const struct kfd_criu_queue_priv_data *qd,
+   const void *restore_mqd)
 {
int retval;
struct mqd_manager *mqd_mgr;
@@ -1380,8 +1388,12 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
 * updates the is_evicted flag but is a no-op otherwise.
 */
q->properties.is_evicted = !!qpd->evicted;
-   mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-   &q->gart_mqd_addr, &q->properties);
+   if (qd)
+   mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj, 
&q->gart_mqd_addr,
+   &q->properties, restore_mqd);
+   else
+   mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+   &q->gart_mqd_addr, &q->properties);
 
	list_add(&q->list, &qpd->queues_list);
qpd->queue_count++;
@@ -1769,6 +1781,50 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
   

[PATCH v3 07/16] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-09-29 Thread David Yat Sin
Introduce a pause IOCTL. The CRIU amdgpu plugin needs
to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting the dump and
AMDKFD_IOC_CRIU_PAUSE(pause = 0) when the dump is complete. This ensures
that the queues are not modified between the individual CRIU dump ioctls.
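
As an illustration (not part of the patch), the plugin-side sequence
would look roughly like this, assuming kfd_fd is an open /dev/kfd file
descriptor with the ptrace-attached privileges introduced earlier in
this series:

	#include <errno.h>
	#include <sys/ioctl.h>
	#include "kfd_ioctl.h"	/* AMDKFD_IOC_CRIU_PAUSE from this series */

	struct kfd_ioctl_criu_pause_args pause_args = { .pause = 1 };

	/* Evict the queues so they stay stable for the whole dump */
	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause_args))
		return -errno;

	/* ... AMDKFD_IOC_CRIU_DUMPER calls for each object type ... */

	/* Restore (unpause) the queues once the dump is complete */
	pause_args.pause = 0;
	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause_args))
		return -errno;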

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 23 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 668772a67f7a..791cb1555413 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2027,6 +2027,14 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
goto err_unlock;
}
 
+   /* Confirm all process queues are evicted */
+   if (!p->queues_paused) {
+   pr_err("Cannot dump process when queues are not in evicted 
state\n");
+   /* CRIU plugin did not call AMDKFD_IOC_CRIU_PAUSE before 
dumping */
+   ret = -EINVAL;
+   goto err_unlock;
+   }
+
switch (args->type) {
case KFD_CRIU_OBJECT_TYPE_PROCESS:
ret = criu_dump_process(p, args);
@@ -2363,7 +2371,20 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 
 static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, 
void *data)
 {
-   return 0;
+   int ret;
+   struct kfd_ioctl_criu_pause_args *args = data;
+
+   if (args->pause)
+   ret = kfd_process_evict_queues(p);
+   else
+   ret = kfd_process_restore_queues(p);
+
+   if (ret)
+   pr_err("Failed to %s queues ret:%d\n", args->pause ? "evict" : 
"restore", ret);
+   else
+   p->queues_paused = !!(args->pause);
+
+   return ret;
 }
 
 static int kfd_ioctl_criu_resume(struct file *filep,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 881af8e1b06c..e0601bfbcbf2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -868,6 +868,9 @@ struct kfd_process {
struct svm_range_list svms;
 
bool xnack_enabled;
+
+   /* Queues are in paused state because we are in the process of doing a 
CRIU checkpoint */
+   bool queues_paused;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 65a389fb97ce..0f7c4c63ee99 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1314,6 +1314,7 @@ static struct kfd_process *create_process(const struct 
task_struct *thread)
process->mm = thread->mm;
process->lead_thread = thread->group_leader;
process->n_pdds = 0;
+   process->queues_paused = false;
	INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
process->last_restore_timestamp = get_jiffies_64();
-- 
2.17.1



[PATCH v3 16/16] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

KFD buffer objects do not have a GEM handle associated with them, so
they cannot be used directly with libdrm to initiate a system DMA (sDMA)
operation to speed up the checkpoint and restore operations. Export them
as dmabuf objects instead and use the libdrm helper (amdgpu_bo_import)
to further process the sDMA command submissions.

With sDMA, we see a huge improvement in checkpoint and restore
operations compared to the generic PCI-based access via the host data
path.
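
As a sketch of the plugin side (import_checkpointed_bo is a
hypothetical helper; dmabuf_fd is the value the dump below stores in
the kfd_criu_bo_bucket):

	#include <stdint.h>
	#include <amdgpu.h>	/* libdrm_amdgpu */

	/* Turn a checkpointed BO's dmabuf fd into a libdrm BO handle that
	 * sDMA copy submissions can then reference. */
	static int import_checkpointed_bo(amdgpu_device_handle dev,
					  int dmabuf_fd,
					  amdgpu_bo_handle *bo,
					  uint64_t *size)
	{
		struct amdgpu_bo_import_result res = {0};
		int ret;

		ret = amdgpu_bo_import(dev, amdgpu_bo_handle_type_dma_buf_fd,
				       dmabuf_fd, &res);
		if (ret)
			return ret;

		*bo = res.buf_handle;
		*size = res.alloc_size;
		return 0;
	}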

Suggested-by: Felix Kuehling 
Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 10f08aa26fac..75fbdd84d2ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
@@ -43,6 +44,7 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_object.h"
+#include "amdgpu_dma_buf.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1931,6 +1933,33 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags,
+ u32 *shared_fd)
+{
+   struct dma_buf *dmabuf;
+   int ret;
+
+   dmabuf = amdgpu_gem_prime_export(gobj, flags);
+   if (IS_ERR(dmabuf)) {
+   ret = PTR_ERR(dmabuf);
+   pr_err("dmabuf export failed for the BO\n");
+   return ret;
+   }
+
+   ret = dma_buf_fd(dmabuf, flags);
+   if (ret < 0) {
+   pr_err("dmabuf create fd failed, ret:%d\n", ret);
+   goto out_free_dmabuf;
+   }
+
+   *shared_fd = ret;
+   return 0;
+
+out_free_dmabuf:
+   dma_buf_put(dmabuf);
+   return ret;
+}
+
 static int criu_dump_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_args *args)
 {
struct kfd_criu_bo_bucket *bo_buckets;
@@ -2000,6 +2029,14 @@ static int criu_dump_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_arg
goto exit;
}
}
+   if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+   ret = 
criu_get_prime_handle(&dumper_bo->tbo.base,
+   bo_bucket->alloc_flags &
+   
KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0,
+   &bo_bucket->dmabuf_fd);
+   if (ret)
+   goto exit;
+   }
if (bo_bucket->alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
KFD_MMAP_GPU_ID(pdd->dev->id);
@@ -2029,6 +2066,11 @@ static int criu_dump_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_arg
}
 
 exit:
+   while (ret && bo_index--) {
+   if (bo_buckets[bo_index].alloc_flags & 
KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   close_fd(bo_buckets[bo_index].dmabuf_fd);
+   }
+
kvfree(bo_buckets);
return ret;
 }
@@ -2276,6 +2318,7 @@ static int criu_restore_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_restore
struct kfd_criu_bo_priv_data *bo_priv;
struct kfd_dev *dev;
struct kfd_process_device *pdd;
+   struct kgd_mem *kgd_mem;
void *mem;
u64 offset;
int idr_handle;
@@ -2427,6 +2470,15 @@ static int criu_restore_bos(struct kfd_process *p, 
struct kfd_ioctl_criu_restore
}
 
pr_debug("map memory was successful for the BO\n");
+   /* create the dmabuf object and export the bo */
+   kgd_mem = (struct kgd_mem *)mem;
+   if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+   ret = criu_get_prime_handle(&kgd_mem->bo->tbo.base,
+   DRM_RDWR,
+   &bo_bucket->dmabuf_fd);
+   if (ret)
+   goto exit;
+   }
} /* done */
 
if (flush_tlbs) {
@@ -2454,6 +2506,10 @@ static int criu_restore_bos(struct kfd_process *p, 
struct kfd_ioctl_criu_restore
ret = -EFAULT;
 
 exit:
+   while (ret && i--) {
+   if (bo_buckets[i].alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   

[PATCH v3 08/16] drm/amdkfd: CRIU add queues support

2021-09-29 Thread David Yat Sin
Add support to the existing CRIU ioctls to save the number of queues and
the queue properties for each queue during checkpoint, and to re-create
the queues on restore.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  25 +-
 .../amd/amdkfd/kfd_process_queue_manager.c| 363 ++
 3 files changed, 402 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 791cb1555413..542a77b7f449 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2043,6 +2043,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
ret = criu_dump_bos(p, args);
break;
case KFD_CRIU_OBJECT_TYPE_QUEUE:
+   ret = kfd_criu_dump_queues(p, args);
+   break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
case KFD_CRIU_OBJECT_TYPE_DEVICE:
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
@@ -2350,6 +2352,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
ret = criu_restore_bos(p, args);
break;
case KFD_CRIU_OBJECT_TYPE_QUEUE:
+   ret = kfd_criu_restore_queues(p, args);
+   break;
case KFD_CRIU_OBJECT_TYPE_EVENT:
case KFD_CRIU_OBJECT_TYPE_DEVICE:
case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
@@ -2425,6 +2429,7 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
struct kfd_process *p, void *data)
 {
struct kfd_ioctl_criu_process_info_args *args = data;
+   uint32_t queues_extra_data_size;
int ret = 0;
 
	mutex_lock(&p->mutex);
@@ -2443,7 +2448,16 @@ static int kfd_ioctl_criu_process_info(struct file 
*filep,
args->total_bos = get_process_num_bos(p);
args->bos_priv_data_size = args->total_bos * sizeof(struct 
kfd_criu_bo_priv_data);
 
-   dev_dbg(kfd_device, "Num of bos:%llu\n", args->total_bos);
+   ret = kfd_process_get_queue_info(p, &args->total_queues, 
&queues_extra_data_size);
+   if (ret)
+   goto err_unlock;
+
+   args->queues_priv_data_size = queues_extra_data_size +
+   (args->total_queues * sizeof(struct 
kfd_criu_queue_priv_data));
+
+   dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
+   args->total_bos,
+   args->total_queues);
 err_unlock:
	mutex_unlock(&p->mutex);
return ret;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e0601bfbcbf2..c5329d843ffb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1055,13 +1055,36 @@ struct kfd_criu_svm_range_priv_data {
 };
 
 struct kfd_criu_queue_priv_data {
-   uint64_t reserved;
+   uint64_t q_address;
+   uint64_t q_size;
+   uint64_t read_ptr_addr;
+   uint64_t write_ptr_addr;
+   uint64_t doorbell_off;
+   uint64_t eop_ring_buffer_address;
+   uint64_t ctx_save_restore_area_address;
+   uint32_t gpu_id;
+   uint32_t type;
+   uint32_t format;
+   uint32_t q_id;
+   uint32_t priority;
+   uint32_t q_percent;
+   uint32_t doorbell_id;
+   uint32_t is_gws;
+   uint32_t sdma_id;
+   uint32_t eop_ring_buffer_size;
+   uint32_t ctx_save_restore_area_size;
+   uint32_t ctl_stack_size;
+   uint32_t cu_mask_size;
+   uint32_t mqd_size;
 };
 
 struct kfd_criu_event_priv_data {
uint64_t reserved;
 };
 
+int kfd_process_get_queue_info(struct kfd_process *p, uint32_t *num_queues, 
uint32_t *q_data_sizes);
+int kfd_criu_dump_queues(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_args *args);
+int kfd_criu_restore_queues(struct kfd_process *p, struct 
kfd_ioctl_criu_restorer_args *args);
 /* CRIU - End */
 
 /* Queue Context Management */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 243dd1efcdbf..f1ec644acdf7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -508,6 +508,369 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
   save_area_used_size);
 }
 
+
+static void get_queue_data_sizes(struct kfd_process_device *pdd,
+   struct queue *q,
+   uint32_t *cu_mask_size)
+{
+   *cu_mask_size = sizeof(uint32_t) * (q->properties.cu_mask_count / 32);
+}
+
+int kfd_process_get_queue_info(struct kfd_process *p, uint32_t *num_queues, 
uint32_t *q_data_sizes)
+{
+   u32 data_sizes = 0;
+   u32 q_index = 0;
+   struct queue *q;
+   int i;
+
+   /* Run over all PDDs of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct 

[PATCH v3 10/16] drm/amdkfd: CRIU restore sdma id for queues

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.

Signed-off-by: David Yat Sin 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f8fce9d05f50..c29dbc529548 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager 
*dqm,
struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q);
+   struct queue *q, const uint32_t 
*restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -309,7 +309,8 @@ static void deallocate_vmid(struct device_queue_manager 
*dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
-   struct qcm_process_device *qpd)
+   struct qcm_process_device *qpd,
+   const struct kfd_criu_queue_priv_data *qd)
 {
struct mqd_manager *mqd_mgr;
int retval;
@@ -349,7 +350,7 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,
q->pipe, q->queue);
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-   retval = allocate_sdma_queue(dqm, q);
+   retval = allocate_sdma_queue(dqm, q, qd ? &qd->sdma_id : NULL);
if (retval)
goto deallocate_vmid;
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1040,7 +1041,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-   struct queue *q)
+   struct queue *q, const uint32_t 
*restore_sdma_id)
 {
int bit;
 
@@ -1050,9 +1051,21 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
return -ENOMEM;
}
 
-   bit = __ffs64(dqm->sdma_bitmap);
-   dqm->sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   /* Find first available sdma_id */
+   bit = __ffs64(dqm->sdma_bitmap);
+   dqm->sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
+
q->properties.sdma_engine_id = q->sdma_id %
get_num_sdma_engines(dqm);
q->properties.sdma_queue_id = q->sdma_id /
@@ -1062,9 +1075,19 @@ static int allocate_sdma_queue(struct 
device_queue_manager *dqm,
pr_err("No more XGMI SDMA queue to allocate\n");
return -ENOMEM;
}
-   bit = __ffs64(dqm->xgmi_sdma_bitmap);
-   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-   q->sdma_id = bit;
+   if (restore_sdma_id) {
+   /* Re-use existing sdma_id */
+   if (!(dqm->xgmi_sdma_bitmap & (1ULL << 
*restore_sdma_id))) {
+   pr_err("SDMA queue already in use\n");
+   return -EBUSY;
+   }
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+   q->sdma_id = *restore_sdma_id;
+   } else {
+   bit = __ffs64(dqm->xgmi_sdma_bitmap);
+   dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+   q->sdma_id = bit;
+   }
/* sdma_engine_id is sdma id including
 * both PCIe-optimized SDMAs and XGMI-
 * optimized SDMAs. The calculation below
@@ -1288,7 +1311,8 @@ static void destroy_kernel_queue_cpsch(struct 
device_queue_manager *dqm,
 }
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct 

[PATCH v3 04/16] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to the user space plugin running under the CRIU
master context, which then stores this info to recreate these buffer
objects during a restore operation.
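
For orientation, a rough sketch of how the plugin is expected to drive
this pass (kfd_fd and buf are assumed; the counts come from the
process_info ioctl elsewhere in this series, and KFD_CRIU_OBJECT_TYPE_BO
is assumed to be the BO object type):

	struct kfd_ioctl_criu_dumper_args args = {0};

	args.type = KFD_CRIU_OBJECT_TYPE_BO;
	args.num_objects = total_bos;	/* from AMDKFD_IOC_CRIU_PROCESS_INFO */
	args.objects_size = total_bos * (sizeof(struct kfd_criu_bo_bucket) +
					 sizeof(struct kfd_criu_bo_priv_data));
	args.objects = (uintptr_t)buf;	/* bo_buckets, then their priv data */

	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_DUMPER, &args))
		return -errno;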

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  20 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 188 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   3 +-
 4 files changed, 211 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e2896ac2c9ce..5d557180cd49 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1160,6 +1160,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device 
*bdev,
	return ttm_pool_free(&adev->mman.bdev.pool, ttm);
 }
 
+/**
+ * amdgpu_ttm_tt_get_userptr - Return the userptr of the GTT ttm_tt for the
+ * current task
+ *
+ * @tbo: The ttm_buffer_object that contains the userptr
+ * @user_addr: The returned userptr address
+ */
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr)
+{
+   struct amdgpu_ttm_tt *gtt;
+
+   if (!tbo->ttm)
+   return -EINVAL;
+
+   gtt = (void *)tbo->ttm;
+   *user_addr = gtt->userptr;
+   return 0;
+}
+
 /**
  * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
  * task
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index e69f3e8e06e5..a7c0e6372339 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -177,6 +177,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct 
ttm_tt *ttm)
 #endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+ uint64_t *user_addr);
 int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
  uint64_t addr, uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1906ded40698..cc3d8fd1d26f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -42,6 +42,7 @@
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu_object.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1841,6 +1842,44 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return -EPERM;
 }
 #endif
+static int criu_dump_process(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_args *args)
+{
+   int ret;
+   struct kfd_criu_process_bucket *process_bucket;
+   struct kfd_criu_process_priv_data *process_priv;
+
+   if (args->num_objects != 1) {
+   pr_err("Only 1 process supported\n");
+   return -EINVAL;
+   }
+
+   if (args->objects_size != sizeof(*process_bucket) + 
sizeof(*process_priv)) {
+   pr_err("Invalid objects size for process\n");
+   return -EINVAL;
+   }
+
+   process_bucket = kzalloc(args->objects_size, GFP_KERNEL);
+   if (!process_bucket)
+   return -ENOMEM;
+
+   /* Private data starts after process bucket */
+   process_priv = (void *)(process_bucket + 1);
+
+   process_priv->version = KFD_CRIU_PRIV_VERSION;
+
+   process_bucket->priv_data_offset = 0;
+   process_bucket->priv_data_size = sizeof(*process_priv);
+
+   ret = copy_to_user((void __user *)args->objects, process_bucket, 
args->objects_size);
+   if (ret) {
+   pr_err("Failed to copy process information to user\n");
+   ret = -EFAULT;
+   }
+
+   kfree(process_bucket);
+   return ret;
+}
+
 uint64_t get_process_num_bos(struct kfd_process *p)
 {
uint64_t num_of_bos = 0, i;
@@ -1861,10 +1900,157 @@ uint64_t get_process_num_bos(struct kfd_process *p)
return num_of_bos;
 }
 
+static int criu_dump_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_dumper_args *args)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   struct kfd_criu_bo_priv_data *bo_privs;
+   uint64_t num_bos;
+
+   int ret = 0, pdd_index, bo_index = 0, id;
+   void *mem;
+
+   num_bos = get_process_num_bos(p);
+
+   if (args->num_objects != num_bos) {
+   pr_err("Mismatch with number of BOs (current:%lld user:%lld)\n",
+   num_bos, args->num_objects);
+   return -EINVAL;
+   }
+
+   if 

[PATCH v3 00/16] CHECKPOINT RESTORE WITH ROCm

2021-09-29 Thread David Yat Sin
CRIU is a user space tool which is very popular for container live migration in 
datacentres. It can checkpoint a running application, save its complete state, 
memory contents and all system resources to images on disk which can be 
migrated to another machine and restored later. More information on CRIU can 
be found at https://criu.org/Main_Page

CRIU currently does not support Checkpoint / Restore with applications that 
have device files open, so it cannot perform checkpoint and restore on GPU 
devices, which are very complex and have their own privately managed VRAM. 
CRIU, however, can support external devices by using a plugin architecture. 
This patch series adds initial support for ROCm applications, with more 
features to follow. We welcome feedback, especially regarding the APIs, 
before involving a larger audience.

Our changes to CRIU can be obtained from here:
https://github.com/RadeonOpenCompute/criu/tree/amdgpu_rfc-210715-2

We have tested the following scenarios:
-Checkpoint / Restore of a Pytorch (BERT) workload
-kfdtests with queues and events
-Gfx9 and Gfx10 based multi GPU test systems
-On baremetal and inside a docker container
-Restoring on a different system

V1: Initial
V2: Addressed review comments
V3: Rebased on latest amd-staging-drm-next

PS: There will be an upcoming V4 patch series with minor additions to the APIs 
to support HMM.

David Yat Sin (9):
  drm/amdkfd: CRIU Implement KFD pause ioctl
  drm/amdkfd: CRIU add queues support
  drm/amdkfd: CRIU restore queue ids
  drm/amdkfd: CRIU restore sdma id for queues
  drm/amdkfd: CRIU restore queue doorbell id
  drm/amdkfd: CRIU dump and restore queue mqds
  drm/amdkfd: CRIU dump/restore queue control stack
  drm/amdkfd: CRIU dump and restore events
  drm/amdkfd: CRIU implement gpu_id remapping

Rajneesh Bhardwaj (7):
  x86/configs: CRIU update debug rock defconfig
  drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  drm/amdkfd: CRIU Implement KFD process_info ioctl
  drm/amdkfd: CRIU Implement KFD dumper ioctl
  drm/amdkfd: CRIU Implement KFD restore ioctl
  drm/amdkfd: CRIU Implement KFD resume ioctl
  drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

 arch/x86/configs/rock-dbg_defconfig   |   53 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|6 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   51 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   20 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 1179 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  185 ++-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  323 -
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   11 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |   69 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |   71 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |   86 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |   78 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  137 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   68 +-
 .../amd/amdkfd/kfd_process_queue_manager.c|  474 ++-
 include/uapi/linux/kfd_ioctl.h|  221 ++-
 19 files changed, 2792 insertions(+), 262 deletions(-)

-- 
2.17.1



[PATCH v3 01/16] x86/configs: CRIU update debug rock defconfig

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

 - Update debug config for Checkpoint-Restore (CR) support
 - Also include necessary options for CR with docker containers.

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 arch/x86/configs/rock-dbg_defconfig | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig 
b/arch/x86/configs/rock-dbg_defconfig
index 4877da183599..bc2a34666c1d 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -249,6 +249,7 @@ CONFIG_KALLSYMS_ALL=y
 CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
 CONFIG_KALLSYMS_BASE_RELATIVE=y
 # CONFIG_USERFAULTFD is not set
+CONFIG_USERFAULTFD=y
 CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
 CONFIG_KCMP=y
 CONFIG_RSEQ=y
@@ -1015,6 +1016,11 @@ CONFIG_PACKET_DIAG=y
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=y
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=y
@@ -1052,15 +1058,17 @@ CONFIG_SYN_COOKIES=y
 # CONFIG_NET_IPVTI is not set
 # CONFIG_NET_FOU is not set
 # CONFIG_NET_FOU_IP_TUNNELS is not set
-# CONFIG_INET_AH is not set
-# CONFIG_INET_ESP is not set
-# CONFIG_INET_IPCOMP is not set
-CONFIG_INET_TUNNEL=y
-CONFIG_INET_DIAG=y
-CONFIG_INET_TCP_DIAG=y
-# CONFIG_INET_UDP_DIAG is not set
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_ESP_OFFLOAD=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_DIAG=m
+CONFIG_INET_TCP_DIAG=m
+CONFIG_INET_UDP_DIAG=m
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=y
 CONFIG_TCP_CONG_ADVANCED=y
 # CONFIG_TCP_CONG_BIC is not set
 CONFIG_TCP_CONG_CUBIC=y
@@ -1085,12 +1093,14 @@ CONFIG_TCP_MD5SIG=y
 CONFIG_IPV6=y
 # CONFIG_IPV6_ROUTER_PREF is not set
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-# CONFIG_INET6_ESP_OFFLOAD is not set
-# CONFIG_INET6_ESPINTCP is not set
-# CONFIG_INET6_IPCOMP is not set
-# CONFIG_IPV6_MIP6 is not set
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_ESP_OFFLOAD=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET_DCCP_DIAG=m
+CONFIG_INET_SCTP_DIAG=m
 # CONFIG_IPV6_ILA is not set
 # CONFIG_IPV6_VTI is not set
 CONFIG_IPV6_SIT=y
@@ -1146,8 +1156,13 @@ CONFIG_NF_CT_PROTO_UDPLITE=y
 # CONFIG_NF_CONNTRACK_SANE is not set
 # CONFIG_NF_CONNTRACK_SIP is not set
 # CONFIG_NF_CONNTRACK_TFTP is not set
-# CONFIG_NF_CT_NETLINK is not set
-# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_SCSI_NETLINK=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_NF_NAT=m
 CONFIG_NF_NAT_REDIRECT=y
 CONFIG_NF_NAT_MASQUERADE=y
@@ -1992,7 +2007,7 @@ CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETPOLL=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_RIONET is not set
-# CONFIG_TUN is not set
+CONFIG_TUN=y
 # CONFIG_TUN_VNET_CROSS_LE is not set
 CONFIG_VETH=y
 # CONFIG_NLMON is not set
@@ -3990,7 +4005,7 @@ CONFIG_MANDATORY_FILE_LOCKING=y
 CONFIG_FSNOTIFY=y
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY_USER=y
-# CONFIG_FANOTIFY is not set
+CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
2.17.1



[PATCH v3 05/16] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to Non-Paged system memory
mapped for GPU and/or CPU access and lays basic foundation for the
userptrs buffer objects which will be added in a separate patch.
This ioctl creates various types of buffer objects such as VRAM,
MMIO, Doorbell, GTT based on the data sent from the userspace plugin.
The data mostly contains the previously checkpointed KFD images from
some KFD process.

While restoring a CRIU process, attach the old IDR values to the newly
created BOs. This also adds minimal GPU mapping support for the
single-GPU checkpoint/restore use case.
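
Note the layout of the objects buffer that the restorer parses (this
mirrors the pointer arithmetic in criu_restore_bos below): all
kfd_criu_bo_buckets come first, followed by the private data, with each
bucket's priv_data_offset relative to the start of the private-data
region:

	struct kfd_criu_bo_bucket *bo_buckets = (void *)objects;
	/* Private data for the first BO starts after all the buckets */
	uint8_t *private_data = (uint8_t *)(bo_buckets + args->num_objects);
	struct kfd_criu_bo_priv_data *bo_priv = (void *)
		(private_data + bo_buckets[i].priv_data_offset);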

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 297 ++-
 1 file changed, 296 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cc3d8fd1d26f..e5a6a98eae45 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2053,10 +2053,305 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
return ret;
 }
 
+static int criu_restore_process(struct kfd_process *p, struct 
kfd_ioctl_criu_restorer_args *args)
+{
+   int ret = 0;
+   uint8_t *objects;
+   struct kfd_criu_process_bucket *process_bucket;
+   struct kfd_criu_process_priv_data *process_priv;
+
+   if (args->num_objects != 1) {
+   pr_err("Only 1 process supported\n");
+   return -EINVAL;
+   }
+
+   if (args->objects_size != sizeof(*process_bucket) + 
sizeof(*process_priv)) {
+   pr_err("Invalid objects size for process\n");
+   return -EINVAL;
+   }
+
+   objects = kmalloc(args->objects_size, GFP_KERNEL);
+   if (!objects)
+   return -ENOMEM;
+
+   ret = copy_from_user(objects, (void __user *)args->objects, 
args->objects_size);
+   if (ret) {
+   pr_err("Failed to copy process information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+
+   process_bucket = (struct kfd_criu_process_bucket *)objects;
+   /* Private data starts after process bucket */
+   process_priv = (struct kfd_criu_process_priv_data *)
+   (objects + sizeof(*process_bucket) + 
process_bucket->priv_data_offset);
+
+   if (process_priv->version != KFD_CRIU_PRIV_VERSION) {
+   pr_err("Invalid CRIU API version (checkpointed:%d 
current:%d)\n",
+   process_priv->version, KFD_CRIU_PRIV_VERSION);
+   ret = -EINVAL;
+   goto exit;
+   }
+
+exit:
+   kfree(objects);
+   return ret;
+}
+
+static int criu_restore_bos(struct kfd_process *p, struct 
kfd_ioctl_criu_restorer_args *args)
+{
+   struct kfd_criu_bo_bucket *bo_buckets;
+   uint8_t *objects, *private_data;
+   bool flush_tlbs = false;
+   int ret = 0, i, j = 0;
+
+   if (args->objects_size != args->num_objects *
+   (sizeof(*bo_buckets) + sizeof(struct kfd_criu_bo_priv_data))) {
+   pr_err("Invalid objects size for BOs\n");
+   return -EINVAL;
+   }
+
+   objects = kmalloc(args->objects_size, GFP_KERNEL);
+   if (!objects)
+   return -ENOMEM;
+
+   ret = copy_from_user(objects, (void __user *)args->objects, 
args->objects_size);
+   if (ret) {
+   pr_err("Failed to copy BOs information from user\n");
+   ret = -EFAULT;
+   goto exit;
+   }
+
+   bo_buckets = (struct kfd_criu_bo_bucket *) objects;
+   /* Private data for first BO starts after all bo_buckets */
+   private_data = (void *)(bo_buckets + args->num_objects);
+
+   /* Create and map new BOs */
+   for (i = 0; i < args->num_objects; i++) {
+   struct kfd_criu_bo_bucket *bo_bucket;
+   struct kfd_criu_bo_priv_data *bo_priv;
+   struct kfd_dev *dev;
+   struct kfd_process_device *pdd;
+   void *mem;
+   u64 offset;
+   int idr_handle;
+
+   bo_bucket = &bo_buckets[i];
+   bo_priv = (struct kfd_criu_bo_priv_data *)
+   (private_data + bo_bucket->priv_data_offset);
+
+   dev = kfd_device_by_id(bo_bucket->gpu_id);
+   if (!dev) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+   pdd = kfd_get_process_device_data(dev, p);
+   if (!pdd) {
+   ret = -EINVAL;
+   pr_err("Failed to get pdd\n");
+   goto exit;
+   }
+
+   pr_debug("kfd restore ioctl - bo_bucket[%d]:\n", i);
+   

[PATCH v3 06/16] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
During CRIU restore, MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of
the restore process, i.e. until the criu_resume ioctl is received and
the process is ready to be resumed. This ioctl is different from the
other KFD CRIU ioctls since it is called by the CRIU master restore
process for all the target processes being resumed by CRIU.

Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  6 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 51 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 45 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 35 +++--
 5 files changed, 124 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 3bc52b2c604f..3837cec6617d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -131,6 +131,7 @@ struct amdkfd_process_info {
atomic_t evicted_bos;
struct delayed_work restore_userptr_work;
struct pid *pid;
+   bool block_mmu_notifications;
 };
 
 int amdgpu_amdkfd_init(void);
@@ -267,7 +268,7 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct kgd_dev *kgd, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
-   uint64_t *offset, uint32_t flags);
+   uint64_t *offset, uint32_t flags, bool criu_resume);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv,
uint64_t *size);
@@ -290,6 +291,9 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
  uint64_t *mmap_offset);
 int amdgpu_amdkfd_get_tile_config(struct kgd_dev *kgd,
struct tile_config *config);
+void amdgpu_amdkfd_block_mmu_notifications(void *p);
+int amdgpu_amdkfd_criu_resume(void *p);
+
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..8465361ce716 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -819,7 +819,8 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem 
*mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
+  bool criu_resume)
 {
struct amdkfd_process_info *process_info = mem->process_info;
struct amdgpu_bo *bo = mem->bo;
@@ -841,6 +842,17 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
goto out;
}
 
+   if (criu_resume) {
+   /*
+* During a CRIU restore operation, the userptr buffer objects
+* will be validated in the restore_userptr_work worker at a
+* later stage when it is scheduled by another ioctl called by
+* CRIU master process for the target pid for restore.
+*/
+   atomic_inc(&mem->invalid);
+   mutex_unlock(&process_info->lock);
+   return 0;
+   }
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
@@ -1213,6 +1225,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void 
**process_info,
	INIT_DELAYED_WORK(&info->restore_userptr_work,
  amdgpu_amdkfd_restore_userptr_worker);
 
+   info->block_mmu_notifications = false;
*process_info = info;
	*ef = dma_fence_get(&info->eviction_fence->base);
}
@@ -1381,10 +1394,37 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void 
*drm_priv)
return avm->pd_phys_addr;
 }
 
+void amdgpu_amdkfd_block_mmu_notifications(void *p)
+{
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   pinfo->block_mmu_notifications = true;
+}
+
+int amdgpu_amdkfd_criu_resume(void *p)
+{
+   int ret = 0;
+   struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+   mutex_lock(&pinfo->lock);
+   pr_debug("scheduling work\n");
+   atomic_inc(&pinfo->evicted_bos);
+   if (!pinfo->block_mmu_notifications) {
+   ret = 

[PATCH v3 02/16] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on the same or a remote
machine, but it expects processes that have a device file (e.g. GPU)
associated with them to provide the necessary driver support to assist
CRIU and its extensible plugin interface. Thus, in order to support
Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
kernel driver needs to provide a set of new APIs that provide the
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in the form of image files.

This introduces some new ioctls which will be used to checkpoint-restore
any KFD-bound user process. KFD doesn't allow arbitrary ioctl calls
unless they come from the group leader process. Since these ioctls are
expected to be called from a KFD CRIU plugin, which has elevated
ptrace-attached privileges and the CAP_SYS_ADMIN capability attached to
its file descriptors, modify KFD to allow such calls.

(API redesigned by David Yat Sin)

Suggested-by: Felix Kuehling 
Signed-off-by: David Yat Sin 
Signed-off-by: Rajneesh Bhardwaj 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  59 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  69 +++
 include/uapi/linux/kfd_ioctl.h   | 221 ++-
 3 files changed, 347 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4de907f3e66a..231f8e3b43f6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "kfd_priv.h"
@@ -1840,6 +1841,34 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return -EPERM;
 }
 #endif
+static int kfd_ioctl_criu_dumper(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu_restorer(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, 
void *data)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu_resume(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return 0;
+}
+
+static int kfd_ioctl_criu_process_info(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return 0;
+}
 
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
@@ -1944,6 +1973,20 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
kfd_ioctl_set_xnack_mode, 0),
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_DUMPER,
+kfd_ioctl_criu_dumper, KFD_IOC_FLAG_PTRACE_ATTACHED),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESTORER,
+kfd_ioctl_criu_restorer, KFD_IOC_FLAG_ROOT_ONLY),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PROCESS_INFO,
+kfd_ioctl_criu_process_info, 
KFD_IOC_FLAG_PTRACE_ATTACHED),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESUME,
+kfd_ioctl_criu_resume, KFD_IOC_FLAG_ROOT_ONLY),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PAUSE,
+kfd_ioctl_criu_pause, KFD_IOC_FLAG_PTRACE_ATTACHED),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
@@ -1958,6 +2001,7 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
char *kdata = NULL;
unsigned int usize, asize;
int retcode = -EINVAL;
+   bool ptrace_attached = false;
 
if (nr >= AMDKFD_CORE_IOCTL_COUNT)
goto err_i1;
@@ -1983,7 +2027,15 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
 * processes need to create their own KFD device context.
 */
process = filep->private_data;
-   if (process->lead_thread != current->group_leader) {
+
+   rcu_read_lock();
+   if ((ioctl->flags & KFD_IOC_FLAG_PTRACE_ATTACHED) &&
+   ptrace_parent(process->lead_thread) == current)
+   ptrace_attached = true;
+   rcu_read_unlock();
+
+   if (process->lead_thread != current->group_leader
+   && !ptrace_attached) {
dev_dbg(kfd_device, "Using KFD FD in wrong process\n");
retcode = -EBADF;
goto err_i1;
@@ -1998,6 +2050,11 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
goto err_i1;
}
 
+   /* KFD_IOC_FLAG_ROOT_ONLY is only for CAP_SYS_ADMIN */
+   if (unlikely((ioctl->flags & KFD_IOC_FLAG_ROOT_ONLY) &&
+!capable(CAP_SYS_ADMIN)))
+   

[PATCH v3 03/16] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj 

This IOCTL is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint operation via another dedicated
IOCTL.

The process_info IOCTL determines the number of GPUs and buffer objects
that are associated with the target process, as well as the target's
process id in the caller's namespace, since the /proc/pid/mem interface
may be used to drain the contents of the discovered buffer objects in
userspace and getpid returns the pid of the CRIU dumper process. Also,
the pid of a process inside a container might be different than its
global pid, so return the ns pid.
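
In other words, the plugin calls this first and uses the returned
counts to size its dump buffers, roughly (kfd_fd is an assumed open
/dev/kfd file descriptor):

	struct kfd_ioctl_criu_process_info_args info = {0};

	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PROCESS_INFO, &info))
		return -errno;

	/* info.task_pid: target pid in the caller's namespace, usable with
	 * /proc/<pid>/mem to drain BO contents.
	 * info.total_bos and info.bos_priv_data_size: used to size the
	 * buffers for the AMDKFD_IOC_CRIU_DUMPER BO pass. */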

Signed-off-by: Rajneesh Bhardwaj 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 231f8e3b43f6..1906ded40698 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1841,6 +1841,26 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return -EPERM;
 }
 #endif
+uint64_t get_process_num_bos(struct kfd_process *p)
+{
+   uint64_t num_of_bos = 0, i;
+
+   /* Run over all PDDs of the process */
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+   void *mem;
+   int id;
+
	idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+   struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
+
+   if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+   num_of_bos++;
+   }
+   }
+   return num_of_bos;
+}
+
 static int kfd_ioctl_criu_dumper(struct file *filep,
struct kfd_process *p, void *data)
 {
@@ -1867,7 +1887,29 @@ static int kfd_ioctl_criu_resume(struct file *filep,
 static int kfd_ioctl_criu_process_info(struct file *filep,
struct kfd_process *p, void *data)
 {
-   return 0;
+   struct kfd_ioctl_criu_process_info_args *args = data;
+   int ret = 0;
+
	mutex_lock(&p->mutex);
+
+   if (!kfd_has_process_device_data(p)) {
+   pr_err("No pdd for given process\n");
+   ret = -ENODEV;
+   goto err_unlock;
+   }
+
+   args->task_pid = task_pid_nr_ns(p->lead_thread,
+   task_active_pid_ns(p->lead_thread));
+
+   args->process_priv_data_size = sizeof(struct 
kfd_criu_process_priv_data);
+
+   args->total_bos = get_process_num_bos(p);
+   args->bos_priv_data_size = args->total_bos * sizeof(struct 
kfd_criu_bo_priv_data);
+
+   dev_dbg(kfd_device, "Num of bos:%llu\n", args->total_bos);
+err_unlock:
	mutex_unlock(&p->mutex);
+   return ret;
 }
 
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index da70c96e5bb0..914306209c9c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -943,6 +943,8 @@ void *kfd_process_device_translate_handle(struct 
kfd_process_device *p,
 void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
int handle);
 
+bool kfd_has_process_device_data(struct kfd_process *p);
+
 /* PASIDs */
 int kfd_pasid_init(void);
 void kfd_pasid_exit(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 21ec8a18cad2..9f2b4d8a5247 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1406,6 +1406,20 @@ static int init_doorbell_bitmap(struct 
qcm_process_device *qpd,
return 0;
 }
 
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+   int i;
+
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (pdd)
+   return true;
+   }
+
+   return false;
+}
+
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p)
 {
-- 
2.17.1



Re: [PATCH] drm/amd/pm: Fix that RPM cannot be obtained for specific GPU

2021-09-29 Thread Christian König

Am 28.09.21 um 23:50 schrieb Alex Deucher:

On Tue, Sep 28, 2021 at 2:29 AM Christian König
 wrote:

Am 28.09.21 um 02:49 schrieb huangyizhi:

The current mechanism for obtaining RPM is to read tach_period from
the register, and then calculate the RPM together with the frequency.
But we found that on specific GPUs, such as RX 550 and RX 560D,
tach_period always reads as 0 and smu7_fan_ctrl_get_fan_speed_rpm
returns -EINVAL.

To solve this problem, when tach_period reads as 0, we try to
estimate the current RPM using the percentage of the current PWM and
the maximum and minimum RPM.

Well that is most likely a bad idea.

When the fan speed is not available, faking some value is certainly not 
the right solution, especially when you don't know the topology of the
DC conversion driven by the PWM.


I think there is a flag in the vbios to determine whether a specific
board supports rpm based fan control.  This used to be an AIB specific
option.  If the flag is not set, the driver should not expose the rpm
interface for fan control, only the PWM interface.  I think at some
point rpm fan control became mandatory, but maybe it was still an
option on polaris and we are missing a check for that flag.


Yeah, that sounds totally sane to me as well.

Let's ask for a volunteer for the job on Thursday if nobody from 
the community speaks up first.


Christian.



Alex



Christian.


Signed-off-by: huangyizhi 
---
   .../drm/amd/pm/powerplay/hwmgr/smu7_thermal.c | 28 ---
   1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_thermal.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_thermal.c
index a6c3610db23e..307dd87d6882 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_thermal.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_thermal.c
@@ -81,6 +81,11 @@ int smu7_fan_ctrl_get_fan_speed_rpm(struct pp_hwmgr *hwmgr, 
uint32_t *speed)
   {
   uint32_t tach_period;
   uint32_t crystal_clock_freq;
+ uint32_t duty100;
+ uint32_t duty;
+ uint32_t speed_percent;
+ uint64_t tmp64;
+

   if (hwmgr->thermal_controller.fanInfo.bNoFan ||
   !hwmgr->thermal_controller.fanInfo.ucTachometerPulsesPerRevolution)
@@ -89,13 +94,28 @@ int smu7_fan_ctrl_get_fan_speed_rpm(struct pp_hwmgr *hwmgr, 
uint32_t *speed)
   tach_period = PHM_READ_VFPF_INDIRECT_FIELD(hwmgr->device, 
CGS_IND_REG__SMC,
   CG_TACH_STATUS, TACH_PERIOD);

- if (tach_period == 0)
- return -EINVAL;
+ if (tach_period == 0) {

- crystal_clock_freq = amdgpu_asic_get_xclk((struct amdgpu_device 
*)hwmgr->adev);
+ duty100 = PHM_READ_VFPF_INDIRECT_FIELD(hwmgr->device, 
CGS_IND_REG__SMC,
+ CG_FDO_CTRL1, FMAX_DUTY100);
+ duty = PHM_READ_VFPF_INDIRECT_FIELD(hwmgr->device, 
CGS_IND_REG__SMC,
+ CG_THERMAL_STATUS, FDO_PWM_DUTY);

- *speed = 60 * crystal_clock_freq * 1 / tach_period;
+ if (duty100 == 0)
+ return -EINVAL;

+ tmp64 = (uint64_t)duty * 100;
+ do_div(tmp64, duty100);
+ speed_percent = MIN((uint32_t)tmp64, 100);
+
+ *speed = speed_percent * 
(hwmgr->thermal_controller.fanInfo.ulMaxRPM
+ - hwmgr->thermal_controller.fanInfo.ulMinRPM) / 100;
+ } else {
+
+ crystal_clock_freq = amdgpu_asic_get_xclk((struct amdgpu_device 
*)hwmgr->adev);
+
+ *speed = 60 * crystal_clock_freq * 1 / tach_period;
+ }
   return 0;
   }





Re: [PATCH 61/64] drm/amdgpu: add support for SRIOV in IP discovery path

2021-09-29 Thread Christian König

Am 28.09.21 um 18:42 schrieb Alex Deucher:

Handle SRIOV requirements when adding IP blocks.

v2: add comment about UVD/VCE support on vega20 SR-IOV

Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 34 ++-
  1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index d9c2a7210a1b..091ded38545f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -820,7 +820,9 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
switch (adev->ip_versions[UVD_HWIP][0]) {
case IP_VERSION(7, 0, 0):
case IP_VERSION(7, 2, 0):
-   amdgpu_device_ip_block_add(adev, &uvd_v7_0_ip_block);
+   /* UVD is not supported on vega20 SR-IOV */
+   if (!(adev->asic_type == CHIP_VEGA20 && 
amdgpu_sriov_vf(adev)))
+   amdgpu_device_ip_block_add(adev, 
&uvd_v7_0_ip_block);
break;
default:
return -EINVAL;
@@ -828,7 +830,9 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
switch (adev->ip_versions[VCE_HWIP][0]) {
case IP_VERSION(4, 0, 0):
case IP_VERSION(4, 1, 0):
-   amdgpu_device_ip_block_add(adev, &vce_v4_0_ip_block);
+   /* VCE is not supported on vega20 SR-IOV */
+   if (!(adev->asic_type == CHIP_VEGA20 && 
amdgpu_sriov_vf(adev)))
+   amdgpu_device_ip_block_add(adev, 
&vce_v4_0_ip_block);
break;
default:
return -EINVAL;
@@ -860,7 +864,8 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
	amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
-   amdgpu_device_ip_block_add(adev, &jpeg_v3_0_ip_block);
+   if (!amdgpu_sriov_vf(adev))
+   amdgpu_device_ip_block_add(adev, 
&jpeg_v3_0_ip_block);
break;
case IP_VERSION(3, 0, 33):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
@@ -1202,14 +1207,24 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
if (r)
return r;
  
-	r = amdgpu_discovery_set_ih_ip_blocks(adev);

-   if (r)
-   return r;
-
-   if (likely(adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)) {
+   /* For SR-IOV, PSP needs to be initialized before IH */
+   if (amdgpu_sriov_vf(adev)) {
r = amdgpu_discovery_set_psp_ip_blocks(adev);
if (r)
return r;
+   r = amdgpu_discovery_set_ih_ip_blocks(adev);
+   if (r)
+   return r;
+   } else {
+   r = amdgpu_discovery_set_ih_ip_blocks(adev);
+   if (r)
+   return r;
+
+   if (likely(adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)) {
+   r = amdgpu_discovery_set_psp_ip_blocks(adev);
+   if (r)
+   return r;
+   }
}
  
  	if (likely(adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)) {

@@ -1230,7 +1245,8 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
if (r)
return r;
  
-	if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {

+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT &&
+   !amdgpu_sriov_vf(adev)) {
r = amdgpu_discovery_set_smu_ip_blocks(adev);
if (r)
return r;




Re: [PATCH 59/64] drm/amdgpu: convert IP version array to include instances

2021-09-29 Thread Christian König

Am 28.09.21 um 18:42 schrieb Alex Deucher:

Allow us to query instance versions more cleanly.

Instancing support is not consistent unfortunately. SDMA is a
good example.  Sienna cichlid has 4 total SDMA instances, each
enumerated separately (HWIDs 42, 43, 68, 69).  Arcturus has 8
total SDMA instances, but they are enumerated as multiple
instances of the same HWIDs (4x HWID 42, 4x HWID 43).  UMC
is another example.  On most chips there are multiple
instances with the same HWID.  This allows us to support both
forms.

v2: rebase
v3: clarify instancing support

Signed-off-by: Alex Deucher 


Yes, that comment makes it much easier to grab what is happening here.

Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 271 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   |  34 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   |   4 +-
  drivers/gpu/drm/amd/amdgpu/athub_v2_0.c   |   2 +-
  drivers/gpu/drm/amd/amdgpu/athub_v2_1.c   |   2 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c|  80 +++---
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |  72 ++---
  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c  |   4 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c|  16 +-
  drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c |  14 +-
  drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c   |  14 +-
  drivers/gpu/drm/amd/amdgpu/mmhub_v2_3.c   |   2 +-
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c|   4 +-
  drivers/gpu/drm/amd/amdgpu/nv.c   |   8 +-
  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c|   4 +-
  drivers/gpu/drm/amd/amdgpu/psp_v13_0.c|   4 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c|  34 +--
  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c|   8 +-
  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c|  10 +-
  drivers/gpu/drm/amd/amdgpu/soc15.c|  24 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c |   4 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c |   6 +-
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  26 +-
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  18 +-
  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  32 +--
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  24 +-
  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c|  28 +-
  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  10 +-
  29 files changed, 385 insertions(+), 376 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b153c3740307..f4bceb2624fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1096,7 +1096,7 @@ struct amdgpu_device {
struct pci_saved_state  *pci_state;
  
  	struct amdgpu_reset_control *reset_cntl;

-	uint32_t			ip_versions[HW_ID_MAX];
+	uint32_t			ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
  };
  
  static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index dbaa238a4620..dd2c7b2bae68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -384,7 +384,16 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
 					hw_id_names[le16_to_cpu(ip->hw_id)]);
 				adev->reg_offset[hw_ip][ip->number_instance] =
 					ip->base_address;
-				adev->ip_versions[hw_ip] =
+				/* Instance support is somewhat inconsistent.
+				 * SDMA is a good example.  Sienna cichlid has
+				 * 4 total SDMA instances, each enumerated
+				 * separately (HWIDs 42, 43, 68, 69).  Arcturus
+				 * has 8 total SDMA instances, but they are
+				 * enumerated as multiple instances of the
+				 * same HWIDs (4x HWID 42, 4x HWID 43).  UMC
+				 * is another example.  On most chips there
+				 * are multiple instances with the same HWID.
+				 */
+				adev->ip_versions[hw_ip][ip->number_instance] =
 					IP_VERSION(ip->major, ip->minor, ip->revision);
 			}
 		}
@@ -539,139 +548,139 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 	case CHIP_VEGA10:
 		vega10_reg_base_init(adev);
 		adev->sdma.num_instances = 2;
-		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 0, 0);
-

[PATCH] drm/amd/amdgpu: Do irq_fini_hw after ip_fini_early

2021-09-29 Thread YuBiao Wang
Some IPs, such as the SMU, need irq_put to perform hw_fini,
so move amdgpu_irq_fini_hw() after amdgpu_device_ip_fini_early().

Signed-off-by: YuBiao Wang 
---
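A hedged sketch of the ordering problem described above (my_irq_src is
a stand-in name; only amdgpu_irq_put() is from the driver):

/* An IP block whose hw_fini drops an interrupt reference, as the SMU
 * does, must run while the IRQ bookkeeping is still alive.  That is
 * why amdgpu_irq_fini_hw() has to come after
 * amdgpu_device_ip_fini_early() below.
 */
static int example_hw_fini(struct amdgpu_device *adev,
			   struct amdgpu_irq_src *my_irq_src)
{
	return amdgpu_irq_put(adev, my_irq_src, 0);
}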
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4c8f2f4647c0..18e26a78ef82 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3864,10 +3864,10 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_ucode_sysfs_fini(adev);
	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 
-   amdgpu_irq_fini_hw(adev);
-
amdgpu_device_ip_fini_early(adev);
 
+   amdgpu_irq_fini_hw(adev);
+
	ttm_device_clear_dma_mappings(&adev->mman.bdev);
 
amdgpu_gart_dummy_page_fini(adev);
-- 
2.25.1



Re: [PATCH 52/64] drm/amdgpu: get VCN and SDMA instances from IP discovery table

2021-09-29 Thread Christian König

On 28.09.21 at 18:42, Alex Deucher wrote:

Rather than hardcoding it.  We already have the number of VCN
instances from a previous patch, so just update the VCN
instances for chips with static tables.

v2: squash in checks for SDMA3,4 (Guchun)
v3: clarify VCN changes

Signed-off-by: Alex Deucher 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index d3069841ff79..13cd814f2626 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -363,6 +363,11 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
 
 			if (le16_to_cpu(ip->hw_id) == VCN_HWID)
 				adev->vcn.num_vcn_inst++;
+			if (le16_to_cpu(ip->hw_id) == SDMA0_HWID ||
+			    le16_to_cpu(ip->hw_id) == SDMA1_HWID ||
+			    le16_to_cpu(ip->hw_id) == SDMA2_HWID ||
+			    le16_to_cpu(ip->hw_id) == SDMA3_HWID)
+				adev->sdma.num_instances++;
 
 			for (k = 0; k < num_base_address; k++) {
 				/*
@@ -529,6 +534,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 	switch (adev->asic_type) {
 	case CHIP_VEGA10:
 		vega10_reg_base_init(adev);
+		adev->sdma.num_instances = 2;
 		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 0, 0);
 		adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 0, 0);
 		adev->ip_versions[OSSSYS_HWIP] = IP_VERSION(4, 0, 0);
@@ -548,6 +554,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 		break;
 	case CHIP_VEGA12:
 		vega10_reg_base_init(adev);
+		adev->sdma.num_instances = 2;
 		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 3, 0);
 		adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 3, 0);
 		adev->ip_versions[OSSSYS_HWIP] = IP_VERSION(4, 0, 1);
@@ -567,6 +574,8 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 		break;
 	case CHIP_RAVEN:
 		vega10_reg_base_init(adev);
+		adev->sdma.num_instances = 1;
+		adev->vcn.num_vcn_inst = 1;
 		if (adev->apu_flags & AMD_APU_IS_RAVEN2) {
 			adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 2, 0);
 			adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 2, 0);
@@ -603,6 +612,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 		break;
 	case CHIP_VEGA20:
 		vega20_reg_base_init(adev);
+		adev->sdma.num_instances = 2;
 		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 4, 0);
 		adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 4, 0);
 		adev->ip_versions[OSSSYS_HWIP] = IP_VERSION(4, 2, 0);
@@ -622,6 +632,8 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 		break;
 	case CHIP_ARCTURUS:
 		arct_reg_base_init(adev);
+		adev->sdma.num_instances = 8;
+		adev->vcn.num_vcn_inst = 2;
 		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 4, 1);
 		adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 4, 1);
 		adev->ip_versions[OSSSYS_HWIP] = IP_VERSION(4, 2, 1);
@@ -639,6 +651,8 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 		break;
 	case CHIP_ALDEBARAN:
 		aldebaran_reg_base_init(adev);
+		adev->sdma.num_instances = 5;
+		adev->vcn.num_vcn_inst = 2;
 		adev->ip_versions[MMHUB_HWIP] = IP_VERSION(9, 4, 2);
 		adev->ip_versions[ATHUB_HWIP] = IP_VERSION(9, 4, 2);
 		adev->ip_versions[OSSSYS_HWIP] = IP_VERSION(4, 4, 0);




Re: [PATCH 50/64] drm/amdgpu: add VCN1 hardware IP

2021-09-29 Thread Christian König

On 28.09.21 at 18:42, Alex Deucher wrote:

So we can store the VCN IP revision for each instance of VCN.

Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 815db33190ca..b153c3740307 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -744,6 +744,7 @@ enum amd_hw_ip_block_type {
UVD_HWIP,
VCN_HWIP = UVD_HWIP,
JPEG_HWIP = VCN_HWIP,
+   VCN1_HWIP,
VCE_HWIP,
DF_HWIP,
DCE_HWIP,




Re: [PATCH 31/64] drm/amdgpu/soc15: export common IP functions

2021-09-29 Thread Christian König

On 28.09.21 at 18:42, Alex Deucher wrote:

So they can be driven by IP discovery table.

Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/soc15.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/soc15.h | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 1b1e9bfd20f1..dffe7d7ff9e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -706,7 +706,7 @@ static void soc15_enable_doorbell_aperture(struct amdgpu_device *adev,
 	adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, enable);
 }
 
-static const struct amdgpu_ip_block_version vega10_common_ip_block =
+const struct amdgpu_ip_block_version vega10_common_ip_block =
 {
 	.type = AMD_IP_BLOCK_TYPE_COMMON,
 	.major = 2,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.h b/drivers/gpu/drm/amd/amdgpu/soc15.h
index a025339ac5e9..f9359003385d 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.h
@@ -28,6 +28,8 @@
 #include "nbio_v7_0.h"
 #include "nbio_v7_4.h"
 
+extern const struct amdgpu_ip_block_version vega10_common_ip_block;
+
 #define SOC15_FLUSH_GPU_TLB_NUM_WREG		6
 #define SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT	3
  




Re: [PATCH 27/64] drm/amdgpu/nv: convert to IP version checking

2021-09-29 Thread Christian König

On 28.09.21 at 18:42, Alex Deucher wrote:

Use IP versions rather than asic_type to differentiate
IP version specific features.

Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
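For readers new to the scheme: IP_VERSION() packs major/minor/revision
into one integer so versions compare and order naturally.  A sketch of
the idea (the exact bit layout here is an assumption, not quoted from
the series, and the helper is hypothetical):

/* Assumed packing for illustration; the series defines the real macro.
 * With it, both equality and range checks on versions read naturally.
 */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* e.g. one branch covering every VCN 3.x part: */
static bool example_is_vcn3(struct amdgpu_device *adev)
{
	return adev->ip_versions[UVD_HWIP] >= IP_VERSION(3, 0, 0) &&
	       adev->ip_versions[UVD_HWIP] < IP_VERSION(4, 0, 0);
}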
  drivers/gpu/drm/amd/amdgpu/nv.c | 75 +
  1 file changed, 38 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 0dc390a7509f..57be517d70bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -180,8 +180,8 @@ static const struct amdgpu_video_codecs yc_video_codecs_decode = {
 static int nv_query_video_codecs(struct amdgpu_device *adev, bool encode,
				 const struct amdgpu_video_codecs **codecs)
 {
-	switch (adev->asic_type) {
-	case CHIP_SIENNA_CICHLID:
+	switch (adev->ip_versions[UVD_HWIP]) {
+	case IP_VERSION(3, 0, 0):
 		if (amdgpu_sriov_vf(adev)) {
 			if (encode)
 				*codecs = &sriov_sc_video_codecs_encode;
@@ -194,29 +194,27 @@ static int nv_query_video_codecs(struct amdgpu_device *adev, bool encode,
 				*codecs = &sc_video_codecs_decode;
 		}
 		return 0;
-	case CHIP_NAVY_FLOUNDER:
-	case CHIP_DIMGREY_CAVEFISH:
-	case CHIP_VANGOGH:
+	case IP_VERSION(3, 0, 16):
+	case IP_VERSION(3, 0, 2):
 		if (encode)
 			*codecs = &nv_video_codecs_encode;
 		else
 			*codecs = &sc_video_codecs_decode;
 		return 0;
-	case CHIP_YELLOW_CARP:
+	case IP_VERSION(3, 1, 1):
 		if (encode)
 			*codecs = &nv_video_codecs_encode;
 		else
 			*codecs = &yc_video_codecs_decode;
 		return 0;
-	case CHIP_BEIGE_GOBY:
+	case IP_VERSION(3, 0, 33):
 		if (encode)
 			*codecs = &bg_video_codecs_encode;
 		else
 			*codecs = &bg_video_codecs_decode;
 		return 0;
-	case CHIP_NAVI10:
-	case CHIP_NAVI14:
-	case CHIP_NAVI12:
+	case IP_VERSION(2, 0, 0):
+	case IP_VERSION(2, 0, 2):
 		if (encode)
 			*codecs = &nv_video_codecs_encode;
 		else
@@ -511,14 +509,15 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 		dev_warn(adev->dev, "Specified reset method:%d isn't supported, using AUTO instead.\n",
			 amdgpu_reset_method);
 
-	switch (adev->asic_type) {
-	case CHIP_VANGOGH:
-	case CHIP_YELLOW_CARP:
+	switch (adev->ip_versions[MP1_HWIP]) {
+	case IP_VERSION(11, 5, 0):
+	case IP_VERSION(13, 0, 1):
+	case IP_VERSION(13, 0, 3):
 		return AMD_RESET_METHOD_MODE2;
-	case CHIP_SIENNA_CICHLID:
-	case CHIP_NAVY_FLOUNDER:
-	case CHIP_DIMGREY_CAVEFISH:
-	case CHIP_BEIGE_GOBY:
+	case IP_VERSION(11, 0, 7):
+	case IP_VERSION(11, 0, 11):
+	case IP_VERSION(11, 0, 12):
+	case IP_VERSION(11, 0, 13):
 		return AMD_RESET_METHOD_MODE1;
 	default:
 		if (amdgpu_dpm_is_baco_supported(adev))
@@ -1042,8 +1041,11 @@ static int nv_common_early_init(void *handle)
 
 	adev->rev_id = nv_get_rev_id(adev);
 	adev->external_rev_id = 0xff;
-	switch (adev->asic_type) {
-	case CHIP_NAVI10:
+	/* TODO: split the GC and PG flags based on the relevant IP version
+	 * for which they are relevant.
+	 */
+	switch (adev->ip_versions[GC_HWIP]) {
+	case IP_VERSION(10, 1, 10):
adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG |
AMD_CG_SUPPORT_GFX_CGCG |
AMD_CG_SUPPORT_IH_CG |
@@ -1065,7 +1067,7 @@ static int nv_common_early_init(void *handle)
AMD_PG_SUPPORT_ATHUB;
adev->external_rev_id = adev->rev_id + 0x1;
break;
-   case CHIP_NAVI14:
+   case IP_VERSION(10, 1, 1):
adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG |
AMD_CG_SUPPORT_GFX_CGCG |
AMD_CG_SUPPORT_IH_CG |
@@ -1086,7 +1088,7 @@ static int nv_common_early_init(void *handle)
AMD_PG_SUPPORT_VCN_DPG;
adev->external_rev_id = adev->rev_id + 20;
break;
-   case CHIP_NAVI12:
+   case IP_VERSION(10, 1, 2):
adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG |
AMD_CG_SUPPORT_GFX_MGLS |
AMD_CG_SUPPORT_GFX_CGCG |
@@ -1115,7 +1117,7 @@ static int nv_common_early_init(void *handle)
adev->rev_id = 0;
adev->external_rev_id = adev->rev_id + 0xa;
break;
-   case CHIP_SIENNA_CICHLID:
+   case IP_VERSION(10, 3, 0):
adev->cg_flags = 

[PATCH] drm/amdkfd: fix a potential cu_mask memory leak

2021-09-29 Thread Lang Yu
If the user doesn't explicitly call kfd_ioctl_destroy_queue
to destroy all created queues, some queues' cu_mask memory
is not freed when the kfd process is destroyed.

To avoid forgetting to free them in some places,
free them immediately after use.

Signed-off-by: Lang Yu 
---
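The ownership pattern the patch moves to, as a standalone sketch (the
function and names here are illustrative, not from the driver):

/* The ioctl path owns the buffer for exactly one call.  Every exit
 * funnels through a single kfree(), so no path can leak it; a consumer
 * that needs the data longer must take its own copy.
 */
static int example_set_mask(void __user *uptr, size_t size)
{
	int ret = 0;
	u32 *mask = kzalloc(size, GFP_KERNEL);

	if (!mask)
		return -ENOMEM;

	if (copy_from_user(mask, uptr, size)) {
		ret = -EFAULT;
		goto out;
	}

	/* ... hand the mask to its consumer here ... */
out:
	kfree(mask);
	return ret;
}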
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  8 
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 --
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4de907f3e66a..5c0e6dcf692a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -451,8 +451,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
retval = copy_from_user(properties.cu_mask, cu_mask_ptr, cu_mask_size);
if (retval) {
pr_debug("Could not copy CU mask from userspace");
-   kfree(properties.cu_mask);
-   return -EFAULT;
+   retval = -EFAULT;
+   goto out;
}
 
	mutex_lock(&p->mutex);
@@ -461,8 +461,8 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
 
	mutex_unlock(&p->mutex);
 
-   if (retval)
-   kfree(properties.cu_mask);
+out:
+   kfree(properties.cu_mask);
 
return retval;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 243dd1efcdbf..4c81d690f31a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -394,8 +394,6 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
pdd->qpd.num_gws = 0;
}
 
-   kfree(pqn->q->properties.cu_mask);
-   pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
}
 
@@ -448,16 +446,16 @@ int pqm_set_cu_mask(struct process_queue_manager *pqm, unsigned int qid,
return -EFAULT;
}
 
-   /* Free the old CU mask memory if it is already allocated, then
-* allocate memory for the new CU mask.
-*/
-   kfree(pqn->q->properties.cu_mask);
+   WARN_ON_ONCE(pqn->q->properties.cu_mask);
 
pqn->q->properties.cu_mask_count = p->cu_mask_count;
pqn->q->properties.cu_mask = p->cu_mask;
 
retval = pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
pqn->q);
+
+   pqn->q->properties.cu_mask = NULL;
+
if (retval != 0)
return retval;
 
-- 
2.25.1



[PATCH] drm/amdkfd: fix a potential ttm->sg memory leak

2021-09-29 Thread Lang Yu
Memory is allocated for ttm->sg by kmalloc in kfd_mem_dmamap_userptr,
but isn't freed by kfree in kfd_mem_dmaunmap_userptr. Free it!

Signed-off-by: Lang Yu 
---
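For context: sg_free_table() only releases the scatterlist entries
inside the table; a struct sg_table that was itself kmalloc'ed still
needs its own kfree().  A sketch of the paired teardown (the helper
name is hypothetical):

/* sg_free_table() frees the scatterlist chunks inside the table;
 * kfree() then releases the struct sg_table allocation itself, which
 * is the line the patch adds.
 */
static void example_put_sgt(struct sg_table *sgt)
{
	if (!sgt)
		return;
	sg_free_table(sgt);
	kfree(sgt);
}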
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..054c1a224def 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -563,6 +563,7 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
 
dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
sg_free_table(ttm->sg);
+   kfree(ttm->sg);
ttm->sg = NULL;
 }
 
-- 
2.25.1



amdgpu driver halted on suspend of shutdown

2021-09-29 Thread 李真能

Hello:

        While running an automated reboot loop test, I found that the kernel 
may halt in memcpy_fromio() in amdgpu's amdgpu_uvd_suspend(). Removing the 
suspend step from amdgpu_pci_shutdown() fixes the hang.


I have three questions:

1. The comment in amdgpu_pci_shutdown explains why we must still execute 
the suspend path for the VM use case. In which situations does a VM 
actually call into the amdgpu driver? As far as I know, a VM's graphics 
card is a virtual one.


2. I see a patch committed by Alex Deucher whose commit message is 
as follows:


drm/amdgpu: just suspend the hw on pci shutdown

We can't just reuse pci_remove as there may be userspace still
    doing things.

My question is: in which situations may userspace still be doing 
things?


3. Why does the amdgpu driver halt in memcpy_fromio() in 
amdgpu_uvd_suspend()? I didn't launch any video application during the 
reboot test. Is this a bug in the PCI bus?


Test environment:

CPU: arm64

Graphics card: r7340 (amdgpu), rx550

OS: Ubuntu 20.04



RE: [PATCH] V2: drm/amdgpu: resolve RAS query bug

2021-09-29 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking
From: Clements, John 
Sent: Wednesday, September 29, 2021 15:03
To: Clements, John ; amd-gfx@lists.freedesktop.org; 
Zhang, Hawking 
Subject: RE: [PATCH] V2: drm/amdgpu: resolve RAS query bug


[AMD Official Use Only]

Updated patch with simpler solution

From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Clements, John
Sent: Wednesday, September 29, 2021 2:07 PM
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <hawking.zh...@amd.com>
Subject: [PATCH] drm/amdgpu: resolve RAS query bug


[AMD Official Use Only]

Submitting patch to clear RAS error encounters during error query if persistent 
harvesting is not enabled

Thank you,
John Clements


RE: [PATCH] V2: drm/amdgpu: resolve RAS query bug

2021-09-29 Thread Clements, John
[AMD Official Use Only]

Updated patch with simpler solution

From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Clements, John
Sent: Wednesday, September 29, 2021 2:07 PM
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: resolve RAS query bug


[AMD Official Use Only]

Submitting patch to clear RAS error encounters during error query if persistent 
harvesting is not enabled

Thank you,
John Clements


0001-drm-amdgpu-resolve-RAS-query-bug.patch
Description: 0001-drm-amdgpu-resolve-RAS-query-bug.patch


RE: [PATCH] drm/amdgpu: resolve RAS query bug

2021-09-29 Thread Zhang, Hawking
Thanks John! Let's try to use amdgpu_ras_query_error_status for that purpose.

Regards,
Hawking
From: Clements, John 
Sent: Wednesday, September 29, 2021 14:07
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: resolve RAS query bug


[AMD Official Use Only]

Submitting patch to clear RAS error encounters during error query if persistent 
harvesting is not enabled

Thank you,
John Clements


[PATCH] drm/amdgpu: resolve RAS query bug

2021-09-29 Thread Clements, John
[AMD Official Use Only]

Submitting patch to clear RAS error encounters during error query if persistent 
harvesting is not enabled

Thank you,
John Clements


0001-drm-amdgpu-resolve-RAS-query-bug.patch
Description: 0001-drm-amdgpu-resolve-RAS-query-bug.patch