Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Marek Olšák
It's possible that what they really needed was the SDMA fix, so I don't
know if they need this flag anymore.

Marek

On Tue., Apr. 28, 2020, 01:12 Marek Olšák,  wrote:

> No, Mesa won't use it. It's only necessary for the constant engine. My
> understanding from various discussions with the PAL team is that they need
> it, but I don't know if they even understand what the MEM_SYNC flag does.
>
> Marek
>
> On Mon., Apr. 27, 2020, 10:53 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Yeah, but is Mesa going to use it?
>>
>> Christian.
>>
>> Am 27.04.20 um 15:54 schrieb Marek Olšák:
>>
>> PAL requested it and they are going to use it. (it looks like they have
>> to use it for correctness)
>>
>> Marek
>>
>> On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander <
>> alexander.deuc...@amd.com> wrote:
>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Do we have open source code UMD code which uses this?
>>>
>>> Alex
>>> --
>>> *From:* Christian König 
>>> *Sent:* Sunday, April 26, 2020 4:55 AM
>>> *To:* Marek Olšák ; Koenig, Christian <
>>> christian.koe...@amd.com>
>>> *Cc:* Deucher, Alexander ; amd-gfx mailing
>>> list 
>>> *Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to
>>> compute IBs too
>>>
>>> Thanks for that explanation. I suspected that there was a good reason to
>>> have that in the kernel, but couldn't find one.
>>>
>>> In this case the patch is Reviewed-by: Christian König
>>>  
>>>
>>> We should probably add this explanation as comment to the flag as well.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 26.04.20 um 02:43 schrieb Marek Olšák:
>>>
>>> It was merged into amd-staging-drm-next.
>>>
>>> I'm not absolutely sure, but I think we need to invalidate before IBs if
>>> an IB is cached in L2 and the CPU has updated it. It can only be cached in
>>> L2 if something other than CP has read it or written to it without
>>> invalidation. CP reads don't cache it but they can hit the cache if it's
>>> already cached.
>>>
>>> For CE, we need to invalidate before the IB in the kernel, because CE
>>> IBs can't do cache invalidations IIRC. This is the number one reason for
>>> merging the already pushed commits.
>>>
>>> Marek
>>>
>>> On Sat., Apr. 25, 2020, 11:03 Christian König, <
>>> ckoenig.leichtzumer...@gmail.com> wrote:
>>>
>>> Was that patch set actually merged upstream? My last status is that we
>>> couldn't find a reason why we need to do this in the kernel.
>>>
>>> Christian.
>>>
>>> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>>>
>>> This was missed.
>>>
>>> Marek
>>>
>>> ___
>>> amd-gfx mailing 
>>> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>  
>>> 
>>>
>>>
>>>
>>> ___
>>> amd-gfx mailing 
>>> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>  
>>> 
>>>
>>>
>>>
>> ___
>> amd-gfx mailing 
>> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Marek Olšák
No, Mesa won't use it. It's only necessary for the constant engine. My
understanding from various discussions with the PAL team is that they need
it, but I don't know if they even understand what the MEM_SYNC flag does.

Marek

On Mon., Apr. 27, 2020, 10:53 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Yeah, but is Mesa going to use it?
>
> Christian.
>
> Am 27.04.20 um 15:54 schrieb Marek Olšák:
>
> PAL requested it and they are going to use it. (it looks like they have to
> use it for correctness)
>
> Marek
>
> On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander <
> alexander.deuc...@amd.com> wrote:
>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Do we have open source code UMD code which uses this?
>>
>> Alex
>> --
>> *From:* Christian König 
>> *Sent:* Sunday, April 26, 2020 4:55 AM
>> *To:* Marek Olšák ; Koenig, Christian <
>> christian.koe...@amd.com>
>> *Cc:* Deucher, Alexander ; amd-gfx mailing
>> list 
>> *Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute
>> IBs too
>>
>> Thanks for that explanation. I suspected that there was a good reason to
>> have that in the kernel, but couldn't find one.
>>
>> In this case the patch is Reviewed-by: Christian König
>>  
>>
>> We should probably add this explanation as comment to the flag as well.
>>
>> Thanks,
>> Christian.
>>
>> Am 26.04.20 um 02:43 schrieb Marek Olšák:
>>
>> It was merged into amd-staging-drm-next.
>>
>> I'm not absolutely sure, but I think we need to invalidate before IBs if
>> an IB is cached in L2 and the CPU has updated it. It can only be cached in
>> L2 if something other than CP has read it or written to it without
>> invalidation. CP reads don't cache it but they can hit the cache if it's
>> already cached.
>>
>> For CE, we need to invalidate before the IB in the kernel, because CE IBs
>> can't do cache invalidations IIRC. This is the number one reason for
>> merging the already pushed commits.
>>
>> Marek
>>
>> On Sat., Apr. 25, 2020, 11:03 Christian König, <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>> Was that patch set actually merged upstream? My last status is that we
>> couldn't find a reason why we need to do this in the kernel.
>>
>> Christian.
>>
>> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>>
>> This was missed.
>>
>> Marek
>>
>> ___
>> amd-gfx mailing 
>> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>  
>> 
>>
>>
>>
>> ___
>> amd-gfx mailing 
>> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>  
>> 
>>
>>
>>
> ___
> amd-gfx mailing 
> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: put the audio codec into suspend state before gpu reset V2

2020-04-27 Thread Quan, Evan
Hi Alex,

The pm_runtime_autosuspend_expiration() return 0 due to ->use_autosuspend and 
autosuspend_delay are all zeros.
This seems not kernel specific. As I can see this on 5.6-drm-next kernel and 
ubuntu original 5.3.46 kernel.
Any insights why that happened?
And maybe a compromise is: try the pm_runtime_autosuspend_expiration() first. 
And if failed(report 0), use a fixed interval(3S).

Regards,
Evan
-Original Message-
From: Alex Deucher  
Sent: Wednesday, April 22, 2020 9:35 PM
To: Quan, Evan 
Cc: amd-gfx list ; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: put the audio codec into suspend state before 
gpu reset V2

On Tue, Apr 21, 2020 at 10:42 PM Evan Quan  wrote:
>
> At default, the autosuspend delay of audio controller is 3S. If the 
> gpu reset is triggered within 3S(after audio controller idle), the 
> audio controller may be unable into suspended state. Then the sudden 
> gpu reset will cause some audio errors. The change here is targeted to 
> resolve this.
>
> However if the audio controller is in use when the gpu reset 
> triggered, this change may be still not enough to put the audio 
> controller into suspend state. Under this case, the gpu reset will 
> still proceed but there will be a warning message printed("failed to 
> suspend display audio").
>
> V2: limit this for BACO and mode1 reset only
>
> Change-Id: I33d85e6fcad1882eb33f9cde8916d57be8d5a87a
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 70 
> ++
>  1 file changed, 70 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2d4b78d96426..70f43b1aed78 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -69,6 +69,7 @@
>
>  #include 
>  #include 
> +#include 
>
>  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> @@ -4146,6 +4147,59 @@ static void amdgpu_device_unlock_adev(struct 
> amdgpu_device *adev)
> mutex_unlock(>lock_reset);  }
>
> +static void amdgpu_device_resume_display_audio(struct amdgpu_device 
> +*adev) {
> +   struct pci_dev *p = NULL;
> +
> +   p = pci_get_domain_bus_and_slot(pci_domain_nr(adev->pdev->bus),
> +   adev->pdev->bus->number, 1);
> +   if (p) {
> +   pm_runtime_enable(&(p->dev));
> +   pm_runtime_resume(&(p->dev));
> +   }
> +}
> +
> +static int amdgpu_device_suspend_display_audio(struct amdgpu_device 
> +*adev) {
> +   enum amd_reset_method reset_method;
> +   struct pci_dev *p = NULL;
> +   unsigned long end_jiffies;
> +
> +   /*
> +* For now, only BACO and mode1 reset are confirmed
> +* to suffer the audio issue without proper suspended.
> +*/
> +   reset_method = amdgpu_asic_reset_method(adev);
> +   if ((reset_method != AMD_RESET_METHOD_BACO) &&
> +(reset_method != AMD_RESET_METHOD_MODE1))
> +   return -EINVAL;
> +
> +   p = pci_get_domain_bus_and_slot(pci_domain_nr(adev->pdev->bus),
> +   adev->pdev->bus->number, 1);
> +   if (!p)
> +   return -ENODEV;
> +
> +   /*
> +* 3S is the audio controller default autosuspend delay setting.
> +* 4S used here is guaranteed to cover that.
> +*/

Instead of hardcoding 3S, we should probably use
pm_runtime_autosuspend_expiration() to query how much time is left and then use 
that.  That way this will work even if userspace has changed the delay.  With 
that fixed:
Reviewed-by: Alex Deucher 

Alex


> +   end_jiffies = msecs_to_jiffies(4000) + jiffies;
> +   while (!pm_runtime_status_suspended(&(p->dev))) {
> +   if (!pm_runtime_suspend(&(p->dev)))
> +   break;
> +
> +   if (time_after(jiffies, end_jiffies)) {
> +   dev_warn(adev->dev, "failed to suspend display 
> audio\n");
> +   /* TODO: abort the succeeding gpu reset? */
> +   return -ETIMEDOUT;
> +   }
> +   }
> +
> +   pm_runtime_disable(&(p->dev));
> +
> +   return 0;
> +}
> +
>  /**
>   * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>   *
> @@ -4170,6 +4224,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
> bool use_baco =
> (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) ?
> true : false;
> +   bool audio_suspended = false;
>
> /*
>  * Flush RAM to disk so that after reboot @@ -4227,6 +4282,19 
> @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> return 0;
> }
>
> +   /*
> +* Try to put the audio codec into suspend state
> +* before gpu reset started.
> +*
> +* Due to the power 

Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Chunming Zhou

Yes, same question.

In fact, PAL cmd stream has itself Relase/Acquire packets. That we use 
the flag is per your request.


-David

在 2020/4/27 22:53, Christian König 写道:

Yeah, but is Mesa going to use it?

Christian.

Am 27.04.20 um 15:54 schrieb Marek Olšák:
PAL requested it and they are going to use it. (it looks like they 
have to use it for correctness)


Marek

On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander 
mailto:alexander.deuc...@amd.com>> wrote:


[AMD Official Use Only - Internal Distribution Only]


Do we have open source code UMD code which uses this?

Alex

*From:* Christian König mailto:ckoenig.leichtzumer...@gmail.com>>
*Sent:* Sunday, April 26, 2020 4:55 AM
*To:* Marek Olšák mailto:mar...@gmail.com>>;
Koenig, Christian mailto:christian.koe...@amd.com>>
*Cc:* Deucher, Alexander mailto:alexander.deuc...@amd.com>>; amd-gfx mailing list
mailto:amd-gfx@lists.freedesktop.org>>
*Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to
compute IBs too
Thanks for that explanation. I suspected that there was a good
reason to have that in the kernel, but couldn't find one.

In this case the patch is Reviewed-by: Christian König
 

We should probably add this explanation as comment to the flag as
well.

Thanks,
Christian.

Am 26.04.20 um 02:43 schrieb Marek Olšák:

It was merged into amd-staging-drm-next.

I'm not absolutely sure, but I think we need to invalidate
before IBs if an IB is cached in L2 and the CPU has updated it.
It can only be cached in L2 if something other than CP has read
it or written to it without invalidation. CP reads don't cache
it but they can hit the cache if it's already cached.

For CE, we need to invalidate before the IB in the kernel,
because CE IBs can't do cache invalidations IIRC. This is the
number one reason for merging the already pushed commits.

Marek

On Sat., Apr. 25, 2020, 11:03 Christian König,
mailto:ckoenig.leichtzumer...@gmail.com>> wrote:

Was that patch set actually merged upstream? My last status
is that we couldn't find a reason why we need to do this in
the kernel.

Christian.

Am 25.04.20 um 10:52 schrieb Marek Olšák:

This was missed.

Marek

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org  
https://lists.freedesktop.org/mailman/listinfo/amd-gfx  




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org  
https://lists.freedesktop.org/mailman/listinfo/amd-gfx  




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Cdavid1.zhou%40amd.com%7Ced56cca1a5214cf9132808d7eabac6d9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637235960880895689sdata=6p%2BAuZXHiUrO8wElftOqsJzHF%2BVLe5TMDIF%2BbJNV6ac%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v4] drm/amdkfd: Track GPU memory utilization per process

2020-04-27 Thread Mukul Joshi
Track GPU VRAM usage on a per process basis and report it through
sysfs.

v2:
   - Handle AMDGPU BO-specific details in
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
   - Return size of VRAM BO being freed from
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
   - Do not consider imported memory for VRAM
 usage calculations.

v3:
   - Move handling of imported BO size from
 kfd_ioctl_free_memory_of_gpu() to
 amdgpu_amdkfd_gpuvm_free_memory_of_gpu().

v4:
   - Add READ_ONCE() and WRITE_ONCE() around 
 reading and writing vram_usage count.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 13 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  7 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 57 ---
 5 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index d065c50582eb..a501026e829c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -65,6 +65,7 @@ struct kgd_mem {
struct amdgpu_sync sync;
 
bool aql_queue;
+   bool is_imported;
 };
 
 /* KFD Memory Eviction */
@@ -219,7 +220,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
void *vm, struct kgd_mem **mem,
uint64_t *offset, uint32_t flags);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
-   struct kgd_dev *kgd, struct kgd_mem *mem);
+   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size);
 int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0768b7eb7683..1247938b1ec1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1277,7 +1277,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 }
 
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
-   struct kgd_dev *kgd, struct kgd_mem *mem)
+   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size)
 {
struct amdkfd_process_info *process_info = mem->process_info;
unsigned long bo_size = mem->bo->tbo.mem.size;
@@ -1286,9 +1286,11 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct ttm_validate_buffer *bo_list_entry;
unsigned int mapped_to_gpu_memory;
int ret;
+   bool is_imported = 0;
 
mutex_lock(>lock);
mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
+   is_imported = mem->is_imported;
mutex_unlock(>lock);
/* lock is not needed after this, since mem is unused and will
 * be freed anyway
@@ -1340,6 +1342,17 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
kfree(mem->bo->tbo.sg);
}
 
+   /* Update the size of the BO being freed if it was allocated from
+* VRAM and is not imported.
+*/
+   if (size) {
+   if ((mem->bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM) &&
+   (!is_imported))
+   *size = bo_size;
+   else
+   *size = 0;
+   }
+
/* Free the BO*/
amdgpu_bo_unref(>bo);
mutex_destroy(>lock);
@@ -1694,6 +1707,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
(*mem)->process_info = avm->process_info;
add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, false);
amdgpu_sync_create(&(*mem)->sync);
+   (*mem)->is_imported = true;
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f8fa03a12add..72c5843d9861 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1322,6 +1322,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
goto err_free;
}
 
+   /* Update the VRAM usage count */
+   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+   WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
+
mutex_unlock(>mutex);
 
args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
@@ -1337,7 +1341,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
return 0;
 
 err_free:
-   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, 
NULL);
 err_unlock:
mutex_unlock(>mutex);
return err;
@@ -1351,6 +1355,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 
*filep,
void *mem;
struct kfd_dev *dev;
int ret;
+   uint64_t size = 0;
 
dev = 

RE: [PATCH] drm/amdgpu: address the static checker warnings

2020-04-27 Thread Quan, Evan
Ping..

-Original Message-
From: Quan, Evan 
Sent: Sunday, April 26, 2020 11:19 AM
To: Dan Carpenter 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 

Subject: RE: [PATCH] drm/amdgpu: address the static checker warnings



-Original Message-
From: Dan Carpenter 
Sent: Friday, April 24, 2020 7:02 PM
To: Quan, Evan 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: address the static checker warnings

On Fri, Apr 24, 2020 at 06:41:15PM +0800, Evan Quan wrote:
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4199 
> amdgpu_device_gpu_recover()
> error: we previously assumed 'hive' could be null (see line 4196)
> 
> This is introduced by "drm/amdgpu: optimize the gpu reset for XGMI setup V2".
> 
> Change-Id: I9c22b57abc9f512114112f93fb035f1fecf26beb
> Signed-off-by: Evan Quan 
> Reported-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 71278942f9f0..898338dc9605 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4274,7 +4274,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   if (!amdgpu_device_lock_adev(tmp_adev, !hive)) {
>   DRM_INFO("Bailing on TDR for s_job:%llx, as another 
> already in progress",
> job ? job->base.id : -1);
> - mutex_unlock(>hive_lock);
> + if (hive)
> + mutex_unlock(>hive_lock);

In the current code, we know for a fact that "hive" is NULL at this point. 
[Evan] No, that's true for SGPU setup only. For XGMI setup(multiple dgpus 
interconnected with bridges), the "hive" here is not NULL.
Presumably this will be changed in the future?  Otherwise why not just delete 
the mutex_unlock() because it is dead code.

regards,
dan carpenter

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is initialized revised

2020-04-27 Thread Quan, Evan
Series s reviewed-by: Evan Quan 

-Original Message-
From: amd-gfx  On Behalf Of Tiecheng Zhou
Sent: Monday, April 27, 2020 11:30 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhou, Tiecheng 
Subject: [PATCH 2/2] drm/amd/powerplay: avoid using pm_en before it is 
initialized revised

hwmgr->pm_en is initialized at hwmgr_hw_init.

during amdgpu_device_init, there is amdgpu_asic_reset that calls to 
soc15_asic_reset (for V320 usecase, Vega10 asic), in which:
1) soc15_asic_reset_method calls to pp_get_asic_baco_capability (pm_en)
2) soc15_asic_baco_reset calls to pp_set_asic_baco_state (pm_en)

pm_en is used in the above two cases while it has not yet been initialized

So avoid using pm_en in the above two functions for V320 passthrough.

Signed-off-by: Tiecheng Zhou 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 71b843f542d8..fc31499c2e5c 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -1438,7 +1438,8 @@ static int pp_get_asic_baco_capability(void *handle, bool 
*cap)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->get_asic_baco_capability)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->get_asic_baco_capability)
return 0;
 
mutex_lock(>smu_lock);
@@ -1472,7 +1473,8 @@ static int pp_set_asic_baco_state(void *handle, int state)
if (!hwmgr)
return -EINVAL;
 
-   if (!hwmgr->pm_en || !hwmgr->hwmgr_func->set_asic_baco_state)
+   if (!(hwmgr->not_vf && amdgpu_dpm) ||
+   !hwmgr->hwmgr_func->set_asic_baco_state)
return 0;
 
mutex_lock(>smu_lock);
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Cevan.quan%40amd.com%7C083ee458638840b718b708d7ea5b3934%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637235550456598209sdata=hn2iTGG4U5RgB9QEwYbjZkSwbcpo3wjMsiMu%2BJc%2BbuA%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: FreeBSD / amdgpu / Vega 3, pstate TEST_DEBUG_DATA: 0x0

2020-04-27 Thread Harry Wentland



On 2020-04-27 4:43 p.m., Andriy Gapon wrote:
> On 24/04/2020 20:22, Harry Wentland wrote:
>> On 2020-04-24 8:34 a.m., Andriy Gapon wrote:
>>>
>>> First, I understand that my platform is not directly supported and probably 
>>> not
>>> very interesting, but I still hope to get some tips or pointers.
>>>
>>
>> Hi Andriy,
>>
>> yeah, limited insight here since FreeBSD isn't something I'm familiar
>> with. :)
> 
> Harry,
> 
> thanks a lot for your help.  It set me on a right path, although a bit
> indirectly :-)
> 
> Let me start by saying that I was able to fix the driver.
> 
> After looking at the laptop in a dark room I could see that the backlight was
> actually on, but there was no video output.
> 
> So, I went back to comparing FreeBSD and Linux logs, especially around the 
> place
> where "TEST_DEBUG_DATA: 0x0" was reported.  And then suddenly I spotted what I
> missed before.  Linux reported 3 pipes while FreeBSD reported 4.  And the 
> errors
> were about the forth, non-existent, pipe -- I guess, not surprising at all.
> With some additional printfs I could confirm it for sure.  So, then I looked 
> for
> the code where the number of pipes is set and almost immediately could see the
> problem.
> 
> FreeBSD amdgpu has DCN_VERSION_1_01 support ifdef-ed out, for whatever reason.
> Your commit "drm/amd/display: Drop DCN1_01 guards" has not been ported yet and
> CONFIG_DRM_AMD_DC_DCN1_01 is not defined.  Of course, the number of pipes was
> not the only thing that did not match the actual hardware/firmware because of
> that.  Once I set CONFIG_DRM_AMD_DC_DCN1_01 everything just worked.
> 
> Thank you again!
> 

Awesome. Glad my comment helped and glad you found the problem.

Harry

>>> I am trying to get amdgpu/FreeBSD going on Motile M141 laptop with Ryzen 3 
>>> 3200U
>>> CPU that has Vega 3 graphics integrated.  When amdgpu starts loading the 
>>> screen
>>> goes black and never lights up again.  I am not sure whether there is no 
>>> signal
>>> at all or whether the backlight is turned off, but the screen is completely
>>> dark.  I can blindly interact with the system, so it's not crashed or hung.
>>> From system logs I can see that the driver attaches successfully.  It 
>>> recognizes
>>> the hardware, loads its firmware, detects the eDP screen and so on.
>>>
>>
>> Does BSD have a way to check or set your backlight value manually (a la
>> /sys/class/backlight on linux)? If so I'd suggest checking and setting
>> it to non-zero values, ideally to max_brightness.
>>
>> Have you tried an external display?
>>
>>> The FreeBSD's amdgpu port that I am trying is based on code circa 5.0.
>>> There is no newer version ported.
>>> I tried a couple of Linux distros with 5.3.x kernels and they worked 
>>> without any
>>> problems. So that gives me some hope.
>>>
>>> I compared driver messages (with drm_debug set to 0xfff) between Linux and
>>> FreeBSD and they look quite similar.  Except for one thing.
>>> In the FreeBSD case there are these error messages that are not seen with 
>>> Linux:
>>>
>>> [drm] pstate TEST_DEBUG_DATA: 0x0
>>> WARNING !(0) failed at
>>> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:868
>>> #0 0x83451633 at linux_dump_stack+0x23
>>> #1 0x8325a9ee at dcn10_verify_allow_pstate_change_high+0x4e
>>> #2 0x8325e925 at dcn10_wait_for_mpcc_disconnect+0x25
>>> #3 0x8325de53 at dcn10_disable_plane+0x53
>>> #4 0x8325c9f5 at dcn10_init_hw+0x755
>>> #5 0x83295ca8 at dc_create+0x538
>>> #6 0x8327a8da at dm_hw_init+0x1ea
>>> #7 0x831701d1 at amdgpu_device_init+0x1b11
>>> #8 0x83185177 at amdgpu_driver_load_kms+0xd7
>>> #9 0x833f138e at drm_dev_register+0x17e
>>> #10 0x83178dea at amdgpu_pci_probe+0x18a
>>> #11 0x83456f40 at linux_pci_attach+0x560
>>> #12 0x80bf68ea at device_attach+0x3ca
>>> #13 0x80bf6490 at device_probe_and_attach+0x70
>>> #14 0x80bf8358 at bus_generic_driver_added+0x58
>>> #15 0x80bf4289 at devclass_driver_added+0x39
>>> #16 0x80bf41c7 at devclass_add_driver+0x147
>>> #17 0x83455ae9 at _linux_pci_register_driver+0xc9
>>>
>>> That warning plus stack trace is actually BREAK_TO_DEBUGGER() in the 
>>> original
>>> Linux code.
>>> So, that makes me think that the problem is pretty serious.
>>
>> BREAK_TO_DEBUGGER is probably overly scary here. It's somewhat a concern
>> as this means power consumption might be higher than expected. We've
>> seen this issue on several systems without any other adverse effects to
>> usability.
>>
>> Harry
>>
>>> I tried searching for "TEST_DEBUG_DATA: 0x0" and I could not find a single
>>> result with "0x0" in it.  Usually there is some non-zero value.
>>> To me this looks like maybe some hardware component is not turned on...
>>> Perhaps this is something relatively obvious for people that hack on the 
>>> driver
>>> and the hardware.
>>> I hope to receive some hint about 

Re: FreeBSD / amdgpu / Vega 3, pstate TEST_DEBUG_DATA: 0x0

2020-04-27 Thread Andriy Gapon
On 24/04/2020 20:22, Harry Wentland wrote:
> On 2020-04-24 8:34 a.m., Andriy Gapon wrote:
>>
>> First, I understand that my platform is not directly supported and probably 
>> not
>> very interesting, but I still hope to get some tips or pointers.
>>
> 
> Hi Andriy,
> 
> yeah, limited insight here since FreeBSD isn't something I'm familiar
> with. :)

Harry,

thanks a lot for your help.  It set me on a right path, although a bit
indirectly :-)

Let me start by saying that I was able to fix the driver.

After looking at the laptop in a dark room I could see that the backlight was
actually on, but there was no video output.

So, I went back to comparing FreeBSD and Linux logs, especially around the place
where "TEST_DEBUG_DATA: 0x0" was reported.  And then suddenly I spotted what I
missed before.  Linux reported 3 pipes while FreeBSD reported 4.  And the errors
were about the forth, non-existent, pipe -- I guess, not surprising at all.
With some additional printfs I could confirm it for sure.  So, then I looked for
the code where the number of pipes is set and almost immediately could see the
problem.

FreeBSD amdgpu has DCN_VERSION_1_01 support ifdef-ed out, for whatever reason.
Your commit "drm/amd/display: Drop DCN1_01 guards" has not been ported yet and
CONFIG_DRM_AMD_DC_DCN1_01 is not defined.  Of course, the number of pipes was
not the only thing that did not match the actual hardware/firmware because of
that.  Once I set CONFIG_DRM_AMD_DC_DCN1_01 everything just worked.

Thank you again!

>> I am trying to get amdgpu/FreeBSD going on Motile M141 laptop with Ryzen 3 
>> 3200U
>> CPU that has Vega 3 graphics integrated.  When amdgpu starts loading the 
>> screen
>> goes black and never lights up again.  I am not sure whether there is no 
>> signal
>> at all or whether the backlight is turned off, but the screen is completely
>> dark.  I can blindly interact with the system, so it's not crashed or hung.
>> From system logs I can see that the driver attaches successfully.  It 
>> recognizes
>> the hardware, loads its firmware, detects the eDP screen and so on.
>>
> 
> Does BSD have a way to check or set your backlight value manually (a la
> /sys/class/backlight on linux)? If so I'd suggest checking and setting
> it to non-zero values, ideally to max_brightness.
> 
> Have you tried an external display?
> 
>> The FreeBSD's amdgpu port that I am trying is based on code circa 5.0.
>> There is no newer version ported.
>> I tried a couple of Linux distros with 5.3.x kernels and they worked without 
>> any
>> problems. So that gives me some hope.
>>
>> I compared driver messages (with drm_debug set to 0xfff) between Linux and
>> FreeBSD and they look quite similar.  Except for one thing.
>> In the FreeBSD case there are these error messages that are not seen with 
>> Linux:
>>
>> [drm] pstate TEST_DEBUG_DATA: 0x0
>> WARNING !(0) failed at
>> /usr/home/avg/devel/kms-drm/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:868
>> #0 0x83451633 at linux_dump_stack+0x23
>> #1 0x8325a9ee at dcn10_verify_allow_pstate_change_high+0x4e
>> #2 0x8325e925 at dcn10_wait_for_mpcc_disconnect+0x25
>> #3 0x8325de53 at dcn10_disable_plane+0x53
>> #4 0x8325c9f5 at dcn10_init_hw+0x755
>> #5 0x83295ca8 at dc_create+0x538
>> #6 0x8327a8da at dm_hw_init+0x1ea
>> #7 0x831701d1 at amdgpu_device_init+0x1b11
>> #8 0x83185177 at amdgpu_driver_load_kms+0xd7
>> #9 0x833f138e at drm_dev_register+0x17e
>> #10 0x83178dea at amdgpu_pci_probe+0x18a
>> #11 0x83456f40 at linux_pci_attach+0x560
>> #12 0x80bf68ea at device_attach+0x3ca
>> #13 0x80bf6490 at device_probe_and_attach+0x70
>> #14 0x80bf8358 at bus_generic_driver_added+0x58
>> #15 0x80bf4289 at devclass_driver_added+0x39
>> #16 0x80bf41c7 at devclass_add_driver+0x147
>> #17 0x83455ae9 at _linux_pci_register_driver+0xc9
>>
>> That warning plus stack trace is actually BREAK_TO_DEBUGGER() in the original
>> Linux code.
>> So, that makes me think that the problem is pretty serious.
> 
> BREAK_TO_DEBUGGER is probably overly scary here. It's somewhat a concern
> as this means power consumption might be higher than expected. We've
> seen this issue on several systems without any other adverse effects to
> usability.
> 
> Harry
> 
>> I tried searching for "TEST_DEBUG_DATA: 0x0" and I could not find a single
>> result with "0x0" in it.  Usually there is some non-zero value.
>> To me this looks like maybe some hardware component is not turned on...
>> Perhaps this is something relatively obvious for people that hack on the 
>> driver
>> and the hardware.
>> I hope to receive some hint about what to look for.
>> I can cherry-pick commits from Linux, apply patches, add additional debugging
>> logs, etc.
>>
>> FreeBSD amdgpu dmesg: https://people.freebsd.org/~avg/amdgpu.dmesg.txt
>> Full Linux dmesg: 

[PATCH] drm/amdgpu/vcn2.5: wait for tiles off after unpause

2020-04-27 Thread James Zhu
Wait for tiles off after unpause to fix transcode timeout issue.
It is a work around.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
index 0fa1c5c..38ca4a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
@@ -1404,7 +1404,7 @@ static int vcn_v2_5_pause_dpg_mode(struct amdgpu_device 
*adev,
 {
struct amdgpu_ring *ring;
uint32_t reg_data = 0;
-   int ret_code;
+   int ret_code = 0;
 
/* pause/unpause if state is changed */
if (adev->vcn.inst[inst_idx].pause_state.fw_based != 
new_state->fw_based) {
@@ -1414,7 +1414,6 @@ static int vcn_v2_5_pause_dpg_mode(struct amdgpu_device 
*adev,
(~UVD_DPG_PAUSE__NJ_PAUSE_DPG_ACK_MASK);
 
if (new_state->fw_based == VCN_DPG_STATE__PAUSE) {
-   ret_code = 0;
SOC15_WAIT_ON_RREG(UVD, inst_idx, mmUVD_POWER_STATUS, 
0x1,
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK, 
ret_code);
 
@@ -1469,9 +1468,10 @@ static int vcn_v2_5_pause_dpg_mode(struct amdgpu_device 
*adev,
   UVD_PGFSM_CONFIG__UVDM_UVDU_PWR_ON, 
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK, ret_code);
}
} else {
-   /* unpause dpg, no need to wait */
reg_data &= ~UVD_DPG_PAUSE__NJ_PAUSE_DPG_REQ_MASK;
WREG32_SOC15(UVD, inst_idx, mmUVD_DPG_PAUSE, reg_data);
+   SOC15_WAIT_ON_RREG(UVD, inst_idx, mmUVD_POWER_STATUS, 
0x1,
+   UVD_POWER_STATUS__UVD_POWER_STATUS_MASK, 
ret_code);
}
adev->vcn.inst[inst_idx].pause_state.fw_based = 
new_state->fw_based;
}
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: remove conversion to bool in dc_link_ddc.c

2020-04-27 Thread Alex Deucher
Applied.  Thanks!

Alex

On Mon, Apr 27, 2020 at 4:03 AM Christian König
 wrote:
>
> Am 27.04.20 um 08:37 schrieb Jason Yan:
> > The '>' expression itself is bool, no need to convert it to bool again.
> > This fixes the following coccicheck warning:
> >
> > drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c:602:28-33: WARNING:
> > conversion to bool not needed here
> >
> > Signed-off-by: Jason Yan 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c 
> > b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
> > index 256889eed93e..aefd29a440b5 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
> > @@ -599,7 +599,7 @@ bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
> >   do {
> >   struct aux_payload current_payload;
> >   bool is_end_of_payload = (retrieved + 
> > DEFAULT_AUX_MAX_DATA_SIZE) >
> > - payload->length ? true : false;
> > + payload->length;
> >
> >   current_payload.address = payload->address;
> >   current_payload.data = >data[retrieved];
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: remove conversion to bool in dcn20_mpc.c

2020-04-27 Thread Alex Deucher
Applied.  Thanks!

Alex

On Mon, Apr 27, 2020 at 4:03 AM Christian König
 wrote:
>
> Am 27.04.20 um 08:37 schrieb Jason Yan:
> > The '==' expression itself is bool, no need to convert it to bool again.
> > This fixes the following coccicheck warning:
> >
> > drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c:455:70-75: WARNING:
> > conversion to bool not needed here
> >
> > Signed-off-by: Jason Yan 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c 
> > b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
> > index de9c857ab3e9..9d7432f3fb16 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
> > @@ -452,7 +452,7 @@ void mpc2_set_output_gamma(
> >   next_mode = LUT_RAM_A;
> >
> >   mpc20_power_on_ogam_lut(mpc, mpcc_id, true);
> > - mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A ? 
> > true:false);
> > + mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A);
> >
> >   if (next_mode == LUT_RAM_A)
> >   mpc2_program_luta(mpc, mpcc_id, params);
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove conversion to bool in amdgpu_device.c

2020-04-27 Thread Alex Deucher
Applied.  thanks!

Alex

On Mon, Apr 27, 2020 at 4:02 AM Christian König
 wrote:
>
> Am 27.04.20 um 08:36 schrieb Jason Yan:
> > The '>' expression itself is bool, no need to convert it to bool again.
> > This fixes the following coccicheck warning:
> >
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING:
> > conversion to bool not needed here
> >
> > Signed-off-by: Jason Yan 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 3d601d5dd5af..ad94de3632d8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3000,7 +3000,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >   INIT_WORK(>xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> >
> >   adev->gfx.gfx_off_req_count = 1;
> > - adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : 
> > false;
> > + adev->pm.ac_power = power_supply_is_system_supplied() > 0;
> >
> >   /* Registers mapping */
> >   /* TODO: block userspace mapping of io register */
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3] drm/amdkfd: Track GPU memory utilization per process

2020-04-27 Thread Felix Kuehling
Thanks Mukul.

One last thing I noticed. It seems the update the the vram_usage is
protected by the process->mutex. That means the variable will always be
coherent while that lock is held. However, the sysfs-show function
doesn't hold that lock. So if the compiler decides to break the update
into multiple instructions, or sysfs-show does the read with multiple
instructions, you may see inconsistent results. You can ensure that
doesn't happen by only updating the size atomically with WRITE_ONCE and
only reading it atomically with READ_ONCE in the sysfs-show function.

For reference, this is a good article on the subject on lwn.net: [Who's
afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/)

Regards,
  Felix

Am 2020-04-26 um 11:45 p.m. schrieb Mukul Joshi:
> Track GPU VRAM usage on a per process basis and report it through
> sysfs.
>
> v2:
>- Handle AMDGPU BO-specific details in
>  amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
>- Return size of VRAM BO being freed from
>  amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
>- Do not consider imported memory for VRAM
>  usage calculations.
>
> v3:
>- Move handling of imported BO size from
>  kfd_ioctl_free_memory_of_gpu() to  
>  amdgpu_amdkfd_gpuvm_free_memory_of_gpu().
>
> Signed-off-by: Mukul Joshi 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  3 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 16 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 13 -
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  7 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 57 ---
>  5 files changed, 84 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index d065c50582eb..a501026e829c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -65,6 +65,7 @@ struct kgd_mem {
>   struct amdgpu_sync sync;
>  
>   bool aql_queue;
> + bool is_imported;
>  };
>  
>  /* KFD Memory Eviction */
> @@ -219,7 +220,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   void *vm, struct kgd_mem **mem,
>   uint64_t *offset, uint32_t flags);
>  int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> - struct kgd_dev *kgd, struct kgd_mem *mem);
> + struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size);
>  int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>   struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
>  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 0768b7eb7683..1247938b1ec1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1277,7 +1277,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>  }
>  
>  int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> - struct kgd_dev *kgd, struct kgd_mem *mem)
> + struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size)
>  {
>   struct amdkfd_process_info *process_info = mem->process_info;
>   unsigned long bo_size = mem->bo->tbo.mem.size;
> @@ -1286,9 +1286,11 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   struct ttm_validate_buffer *bo_list_entry;
>   unsigned int mapped_to_gpu_memory;
>   int ret;
> + bool is_imported = 0;
>  
>   mutex_lock(>lock);
>   mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
> + is_imported = mem->is_imported;
>   mutex_unlock(>lock);
>   /* lock is not needed after this, since mem is unused and will
>* be freed anyway
> @@ -1340,6 +1342,17 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   kfree(mem->bo->tbo.sg);
>   }
>  
> + /* Update the size of the BO being freed if it was allocated from
> +  * VRAM and is not imported.
> +  */
> + if (size) {
> + if ((mem->bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM) &&
> + (!is_imported))
> + *size = bo_size;
> + else
> + *size = 0;
> + }
> +
>   /* Free the BO*/
>   amdgpu_bo_unref(>bo);
>   mutex_destroy(>lock);
> @@ -1694,6 +1707,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
> *kgd,
>   (*mem)->process_info = avm->process_info;
>   add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, false);
>   amdgpu_sync_create(&(*mem)->sync);
> + (*mem)->is_imported = true;
>  
>   return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index f8fa03a12add..ede84f76397f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1322,6 +1322,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
>   goto err_free;
>   }

Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Alex Deucher
On Mon, Apr 27, 2020 at 2:39 PM Takashi Iwai  wrote:
>
> On Mon, 27 Apr 2020 20:28:12 +0200,
> Alex Deucher wrote:
> >
> > On Mon, Apr 27, 2020 at 2:07 PM Nicholas Johnson
> >  wrote:
> > >
> > > On Mon, Apr 27, 2020 at 05:15:55PM +0200, Takashi Iwai wrote:
> > > > On Mon, 27 Apr 2020 16:22:21 +0200,
> > > > Deucher, Alexander wrote:
> > > > >
> > > > > [AMD Public Use]
> > > > >
> > > > > > -Original Message-
> > > > > > From: Nicholas Johnson 
> > > > > > Sent: Sunday, April 26, 2020 12:02 PM
> > > > > > To: linux-ker...@vger.kernel.org
> > > > > > Cc: Deucher, Alexander ; Koenig, 
> > > > > > Christian
> > > > > > ; Zhou, David(ChunMing)
> > > > > > ; Nicholas Johnson  > > > > > opensou...@outlook.com.au>
> > > > > > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO 
> > > > > > state
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: 
> > > > > > enable
> > > > > > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using 
> > > > > > runpm /
> > > > > > BACO. You can tell visually when it sleeps, because the fan on the 
> > > > > > graphics
> > > > > > card is switched off to save power. It did not spin down the fan in 
> > > > > > v5.6.x.
> > > > > >
> > > > > > This is great (I love it), except that when it is sleeping, the 
> > > > > > PCIe audio function
> > > > > > of the GPU has issues if anything tries to access it. You get dmesg 
> > > > > > errors such
> > > > > > as these:
> > > > > >
> > > > > > snd_hda_intel :08:00.1: spurious response 0x0:0x0, last 
> > > > > > cmd=0x170500
> > > > > > snd_hda_intel :08:00.1: azx_get_response timeout, switching to 
> > > > > > polling
> > > > > > mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response 
> > > > > > from
> > > > > > codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel 
> > > > > > :08:00.1: No
> > > > > > response from codec, resetting bus: last cmd=0x001f0500
> > > > > > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. 
> > > > > > -11
> > > > > >
> > > > > > The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt 
> > > > > > enclosure
> > > > > > (not that Thunderbolt should affect it, but I feel I should mention 
> > > > > > it just in
> > > > > > case). I dropped a lot of duplicate dmesg lines, as some of them 
> > > > > > repeated a
> > > > > > lot of times before the driver gave up.
> > > > > >
> > > > > > I offer this patch to disable runpm for Fiji while a fix is found, 
> > > > > > if you decide
> > > > > > that is the best approach. Regardless, I will gladly test any 
> > > > > > patches you come
> > > > > > up with instead and confirm that the above issue has been fixed.
> > > > > >
> > > > > > I cannot tell if any other GPUs are affected. The only other cards 
> > > > > > to which I
> > > > > > have access are a couple of AMD R9 280X (Tahiti XT), which use 
> > > > > > radeon driver
> > > > > > instead of amdgpu driver.
> > > > >
> > > > > Adding a few more people.  Do you know what is accessing the audio?  
> > > > > The audio should have a dependency on the GPU device.  The GPU won't 
> > > > > enter runtime pm until the audio has entered runtime pm and vice 
> > > > > versa on resume. Please attach a copy of your dmesg output and lspci 
> > > > > output.
> > >
> > > pci :08:00.1: D0 power state depends on :08:00.0
> > > The above must be the dependency of which you speak from dmesg.
> > >
> > > Accessing the audio? I did not have a single method for triggering it.
> > > Sometimes it happened on shutdown. Sometimes when restarting gdm.
> > > Sometimes when playing with audio settings in Cinnamon Desktop. But most
> > > often when changing displays. It might have something to do with the
> > > audio device associated with a monitor being created when the monitor is
> > > found. If an audio device is created, then pulseaudio might touch it.
> > > Sorry, this is a very verbose "not quite sure".
> > >
> > > To trigger the bug, this time I did the following:
> > >
> > > 1. Boot laptop without Fiji and log in
> > >
> > > 2. Attach Fiji via Thunderbolt (no displays attached to Fiji) and
> > > approve Thunderbolt device
> > >
> > > 3. Log in again because the session gets killed when GPU is hot-added
> > >
> > > 4. Wait for Fiji to fall asleep (fan stops)
> > >
> > > 5. Open "dmesg -w" on laptop display
> > >
> > > 6. Attach display to DisplayPort on Fiji (it should still stay asleep)
> > >
> > > 7. Do WindowsKey+P to activate external display. The error appears in
> > > dmesg window that instant.
> > >
> > > Could it be a race condition when waking the card up?
> > >
> > > I cannot get the graphics card fan to spin down if the Thunderbolt
> > > enclosure is attached at boot time. It only does it if hot-added.
> > >
> > > If you think it will help, I can take out the Fiji and put it in a test
> > > rig and try to replicate the issue without Thunderbolt, but it looks
> > > 

Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Takashi Iwai
On Mon, 27 Apr 2020 20:28:12 +0200,
Alex Deucher wrote:
> 
> On Mon, Apr 27, 2020 at 2:07 PM Nicholas Johnson
>  wrote:
> >
> > On Mon, Apr 27, 2020 at 05:15:55PM +0200, Takashi Iwai wrote:
> > > On Mon, 27 Apr 2020 16:22:21 +0200,
> > > Deucher, Alexander wrote:
> > > >
> > > > [AMD Public Use]
> > > >
> > > > > -Original Message-
> > > > > From: Nicholas Johnson 
> > > > > Sent: Sunday, April 26, 2020 12:02 PM
> > > > > To: linux-ker...@vger.kernel.org
> > > > > Cc: Deucher, Alexander ; Koenig, Christian
> > > > > ; Zhou, David(ChunMing)
> > > > > ; Nicholas Johnson  > > > > opensou...@outlook.com.au>
> > > > > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO 
> > > > > state
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > > > > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using 
> > > > > runpm /
> > > > > BACO. You can tell visually when it sleeps, because the fan on the 
> > > > > graphics
> > > > > card is switched off to save power. It did not spin down the fan in 
> > > > > v5.6.x.
> > > > >
> > > > > This is great (I love it), except that when it is sleeping, the PCIe 
> > > > > audio function
> > > > > of the GPU has issues if anything tries to access it. You get dmesg 
> > > > > errors such
> > > > > as these:
> > > > >
> > > > > snd_hda_intel :08:00.1: spurious response 0x0:0x0, last 
> > > > > cmd=0x170500
> > > > > snd_hda_intel :08:00.1: azx_get_response timeout, switching to 
> > > > > polling
> > > > > mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response from
> > > > > codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel :08:00.1: 
> > > > > No
> > > > > response from codec, resetting bus: last cmd=0x001f0500
> > > > > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> > > > >
> > > > > The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt 
> > > > > enclosure
> > > > > (not that Thunderbolt should affect it, but I feel I should mention 
> > > > > it just in
> > > > > case). I dropped a lot of duplicate dmesg lines, as some of them 
> > > > > repeated a
> > > > > lot of times before the driver gave up.
> > > > >
> > > > > I offer this patch to disable runpm for Fiji while a fix is found, if 
> > > > > you decide
> > > > > that is the best approach. Regardless, I will gladly test any patches 
> > > > > you come
> > > > > up with instead and confirm that the above issue has been fixed.
> > > > >
> > > > > I cannot tell if any other GPUs are affected. The only other cards to 
> > > > > which I
> > > > > have access are a couple of AMD R9 280X (Tahiti XT), which use radeon 
> > > > > driver
> > > > > instead of amdgpu driver.
> > > >
> > > > Adding a few more people.  Do you know what is accessing the audio?  
> > > > The audio should have a dependency on the GPU device.  The GPU won't 
> > > > enter runtime pm until the audio has entered runtime pm and vice versa 
> > > > on resume. Please attach a copy of your dmesg output and lspci output.
> >
> > pci :08:00.1: D0 power state depends on :08:00.0
> > The above must be the dependency of which you speak from dmesg.
> >
> > Accessing the audio? I did not have a single method for triggering it.
> > Sometimes it happened on shutdown. Sometimes when restarting gdm.
> > Sometimes when playing with audio settings in Cinnamon Desktop. But most
> > often when changing displays. It might have something to do with the
> > audio device associated with a monitor being created when the monitor is
> > found. If an audio device is created, then pulseaudio might touch it.
> > Sorry, this is a very verbose "not quite sure".
> >
> > To trigger the bug, this time I did the following:
> >
> > 1. Boot laptop without Fiji and log in
> >
> > 2. Attach Fiji via Thunderbolt (no displays attached to Fiji) and
> > approve Thunderbolt device
> >
> > 3. Log in again because the session gets killed when GPU is hot-added
> >
> > 4. Wait for Fiji to fall asleep (fan stops)
> >
> > 5. Open "dmesg -w" on laptop display
> >
> > 6. Attach display to DisplayPort on Fiji (it should still stay asleep)
> >
> > 7. Do WindowsKey+P to activate external display. The error appears in
> > dmesg window that instant.
> >
> > Could it be a race condition when waking the card up?
> >
> > I cannot get the graphics card fan to spin down if the Thunderbolt
> > enclosure is attached at boot time. It only does it if hot-added.
> >
> > If you think it will help, I can take out the Fiji and put it in a test
> > rig and try to replicate the issue without Thunderbolt, but it looks
> > like it will not spin the fan down if Fiji is attached at boot time.
> >
> > Question, why would the fan not spin down if Fiji is attached at boot
> > time, and how would one make the said fan turn off? Aside from being
> > useful for pinning down the audio register issue, I would like to make
> > sure the power savings are realised 

Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Alex Deucher
On Mon, Apr 27, 2020 at 2:07 PM Nicholas Johnson
 wrote:
>
> On Mon, Apr 27, 2020 at 05:15:55PM +0200, Takashi Iwai wrote:
> > On Mon, 27 Apr 2020 16:22:21 +0200,
> > Deucher, Alexander wrote:
> > >
> > > [AMD Public Use]
> > >
> > > > -Original Message-
> > > > From: Nicholas Johnson 
> > > > Sent: Sunday, April 26, 2020 12:02 PM
> > > > To: linux-ker...@vger.kernel.org
> > > > Cc: Deucher, Alexander ; Koenig, Christian
> > > > ; Zhou, David(ChunMing)
> > > > ; Nicholas Johnson  > > > opensou...@outlook.com.au>
> > > > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> > > >
> > > > Hi all,
> > > >
> > > > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > > > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> > > > BACO. You can tell visually when it sleeps, because the fan on the 
> > > > graphics
> > > > card is switched off to save power. It did not spin down the fan in 
> > > > v5.6.x.
> > > >
> > > > This is great (I love it), except that when it is sleeping, the PCIe 
> > > > audio function
> > > > of the GPU has issues if anything tries to access it. You get dmesg 
> > > > errors such
> > > > as these:
> > > >
> > > > snd_hda_intel :08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> > > > snd_hda_intel :08:00.1: azx_get_response timeout, switching to 
> > > > polling
> > > > mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response from
> > > > codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel :08:00.1: No
> > > > response from codec, resetting bus: last cmd=0x001f0500
> > > > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> > > >
> > > > The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt 
> > > > enclosure
> > > > (not that Thunderbolt should affect it, but I feel I should mention it 
> > > > just in
> > > > case). I dropped a lot of duplicate dmesg lines, as some of them 
> > > > repeated a
> > > > lot of times before the driver gave up.
> > > >
> > > > I offer this patch to disable runpm for Fiji while a fix is found, if 
> > > > you decide
> > > > that is the best approach. Regardless, I will gladly test any patches 
> > > > you come
> > > > up with instead and confirm that the above issue has been fixed.
> > > >
> > > > I cannot tell if any other GPUs are affected. The only other cards to 
> > > > which I
> > > > have access are a couple of AMD R9 280X (Tahiti XT), which use radeon 
> > > > driver
> > > > instead of amdgpu driver.
> > >
> > > Adding a few more people.  Do you know what is accessing the audio?  The 
> > > audio should have a dependency on the GPU device.  The GPU won't enter 
> > > runtime pm until the audio has entered runtime pm and vice versa on 
> > > resume. Please attach a copy of your dmesg output and lspci output.
>
> pci :08:00.1: D0 power state depends on :08:00.0
> The above must be the dependency of which you speak from dmesg.
>
> Accessing the audio? I did not have a single method for triggering it.
> Sometimes it happened on shutdown. Sometimes when restarting gdm.
> Sometimes when playing with audio settings in Cinnamon Desktop. But most
> often when changing displays. It might have something to do with the
> audio device associated with a monitor being created when the monitor is
> found. If an audio device is created, then pulseaudio might touch it.
> Sorry, this is a very verbose "not quite sure".
>
> To trigger the bug, this time I did the following:
>
> 1. Boot laptop without Fiji and log in
>
> 2. Attach Fiji via Thunderbolt (no displays attached to Fiji) and
> approve Thunderbolt device
>
> 3. Log in again because the session gets killed when GPU is hot-added
>
> 4. Wait for Fiji to fall asleep (fan stops)
>
> 5. Open "dmesg -w" on laptop display
>
> 6. Attach display to DisplayPort on Fiji (it should still stay asleep)
>
> 7. Do WindowsKey+P to activate external display. The error appears in
> dmesg window that instant.
>
> Could it be a race condition when waking the card up?
>
> I cannot get the graphics card fan to spin down if the Thunderbolt
> enclosure is attached at boot time. It only does it if hot-added.
>
> If you think it will help, I can take out the Fiji and put it in a test
> rig and try to replicate the issue without Thunderbolt, but it looks
> like it will not spin the fan down if Fiji is attached at boot time.
>
> Question, why would the fan not spin down if Fiji is attached at boot
> time, and how would one make the said fan turn off? Aside from being
> useful for pinning down the audio register issue, I would like to make
> sure the power savings are realised whenever the GPU is not being used.

Presumably something is using the device.  Maybe a framebuffer console
or X?  Or maybe the something like tlp has disabled runtime pm on your
device?  You can see the current status by reading the files in
/sys/class/drm/cardX/device/power/ .  Replace cardX with card0, card1,
etc. depending 

Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Nicholas Johnson
On Mon, Apr 27, 2020 at 05:15:55PM +0200, Takashi Iwai wrote:
> On Mon, 27 Apr 2020 16:22:21 +0200,
> Deucher, Alexander wrote:
> > 
> > [AMD Public Use]
> > 
> > > -Original Message-
> > > From: Nicholas Johnson 
> > > Sent: Sunday, April 26, 2020 12:02 PM
> > > To: linux-ker...@vger.kernel.org
> > > Cc: Deucher, Alexander ; Koenig, Christian
> > > ; Zhou, David(ChunMing)
> > > ; Nicholas Johnson  > > opensou...@outlook.com.au>
> > > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> > > 
> > > Hi all,
> > > 
> > > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> > > BACO. You can tell visually when it sleeps, because the fan on the 
> > > graphics
> > > card is switched off to save power. It did not spin down the fan in 
> > > v5.6.x.
> > > 
> > > This is great (I love it), except that when it is sleeping, the PCIe 
> > > audio function
> > > of the GPU has issues if anything tries to access it. You get dmesg 
> > > errors such
> > > as these:
> > > 
> > > snd_hda_intel :08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> > > snd_hda_intel :08:00.1: azx_get_response timeout, switching to polling
> > > mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response from
> > > codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel :08:00.1: No
> > > response from codec, resetting bus: last cmd=0x001f0500
> > > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> > > 
> > > The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt 
> > > enclosure
> > > (not that Thunderbolt should affect it, but I feel I should mention it 
> > > just in
> > > case). I dropped a lot of duplicate dmesg lines, as some of them repeated 
> > > a
> > > lot of times before the driver gave up.
> > > 
> > > I offer this patch to disable runpm for Fiji while a fix is found, if you 
> > > decide
> > > that is the best approach. Regardless, I will gladly test any patches you 
> > > come
> > > up with instead and confirm that the above issue has been fixed.
> > > 
> > > I cannot tell if any other GPUs are affected. The only other cards to 
> > > which I
> > > have access are a couple of AMD R9 280X (Tahiti XT), which use radeon 
> > > driver
> > > instead of amdgpu driver.
> > 
> > Adding a few more people.  Do you know what is accessing the audio?  The 
> > audio should have a dependency on the GPU device.  The GPU won't enter 
> > runtime pm until the audio has entered runtime pm and vice versa on resume. 
> > Please attach a copy of your dmesg output and lspci output.

pci :08:00.1: D0 power state depends on :08:00.0
The above must be the dependency of which you speak from dmesg.

Accessing the audio? I did not have a single method for triggering it. 
Sometimes it happened on shutdown. Sometimes when restarting gdm. 
Sometimes when playing with audio settings in Cinnamon Desktop. But most 
often when changing displays. It might have something to do with the 
audio device associated with a monitor being created when the monitor is 
found. If an audio device is created, then pulseaudio might touch it. 
Sorry, this is a very verbose "not quite sure".

To trigger the bug, this time I did the following:

1. Boot laptop without Fiji and log in

2. Attach Fiji via Thunderbolt (no displays attached to Fiji) and 
approve Thunderbolt device

3. Log in again because the session gets killed when GPU is hot-added

4. Wait for Fiji to fall asleep (fan stops)

5. Open "dmesg -w" on laptop display

6. Attach display to DisplayPort on Fiji (it should still stay asleep)

7. Do WindowsKey+P to activate external display. The error appears in 
dmesg window that instant.

Could it be a race condition when waking the card up?

I cannot get the graphics card fan to spin down if the Thunderbolt 
enclosure is attached at boot time. It only does it if hot-added.

If you think it will help, I can take out the Fiji and put it in a test 
rig and try to replicate the issue without Thunderbolt, but it looks 
like it will not spin the fan down if Fiji is attached at boot time.

Question, why would the fan not spin down if Fiji is attached at boot 
time, and how would one make the said fan turn off? Aside from being 
useful for pinning down the audio register issue, I would like to make 
sure the power savings are realised whenever the GPU is not being used.

> 
> Also, please retest with the fresh 5.7-rc3.  There was a known
> regression regarding HD-audio PM in 5.7-rc1/rc2, and it's been fixed
> there (commit 8d6762af302d).
Linux v5.7-rc3 still has the same problem, unfortunately.

The dmesg is attached.

Thanks for your replies. Kind regards,
Nicholas

> 
> 
> thanks,
> 
> Takashi
[0.00] microcode: microcode updated early to revision 0xca, date = 
2019-10-03
[0.00] Linux version 5.7.0-rc3+ (nicholas@nicholas-dell-linux) (gcc 
version 9.3.0 (Arch Linux 9.3.0-1), 

[PATCH 1/2] drm/amdgpu: Add ReadSerial defines for Arcturus

2020-04-27 Thread Kent Russell
Add the ReadSerial definitions for Arcturus to the arcturus_ppsmc.h
header for use with unique_id

Signed-off-by: Kent Russell 
Change-Id: I71849ec381730b1803e54cd6040aa3b245b53de8
---
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c   | 2 ++
 drivers/gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 1c66b7d7139c..e98d92ec1eac 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -128,6 +128,8 @@ static struct smu_11_0_cmn2aisc_mapping 
arcturus_message_map[SMU_MSG_MAX_COUNT]
MSG_MAP(SetXgmiMode, PPSMC_MSG_SetXgmiMode),
MSG_MAP(SetMemoryChannelEnable,  
PPSMC_MSG_SetMemoryChannelEnable),
MSG_MAP(DFCstateControl, PPSMC_MSG_DFCstateControl),
+   MSG_MAP(ReadSerialNumTop32,  
PPSMC_MSG_ReadSerialNumTop32),
+   MSG_MAP(ReadSerialNumBottom32,   
PPSMC_MSG_ReadSerialNumBottom32),
 };
 
 static struct smu_11_0_cmn2aisc_mapping arcturus_clk_map[SMU_CLK_COUNT] = {
diff --git a/drivers/gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h 
b/drivers/gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h
index f736d773f9d6..b426be7c76c6 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h
@@ -116,6 +116,9 @@
 #define PPSMC_MSG_DFCstateControl   0x3B
 #define PPSMC_Message_Count 0x3C
 
+#define PPSMC_MSG_ReadSerialNumTop320x49
+#define PPSMC_MSG_ReadSerialNumBottom32 0x4A
+
 typedef uint32_t PPSMC_Result;
 typedef uint32_t PPSMC_Msg;
 #pragma pack(pop)
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amdgpu: Add unique_id for Arcturus

2020-04-27 Thread Kent Russell
Add support for unique_id for Arcturus, since we only have the ppsmc
definitions for that added at the moment

Signed-off-by: Kent Russell 
Change-Id: I66f8e9ff41521d6c13ff673587d6061c1f3f4b7a
---
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index e98d92ec1eac..f55f9b371bf2 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -293,6 +293,7 @@ static int arcturus_get_workload_type(struct smu_context 
*smu, enum PP_SMC_POWER
 static int arcturus_tables_init(struct smu_context *smu, struct smu_table 
*tables)
 {
struct smu_table_context *smu_table = >smu_table;
+   uint32_t top32, bottom32;
 
SMU_TABLE_INIT(tables, SMU_TABLE_PPTABLE, sizeof(PPTable_t),
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM);
@@ -315,6 +316,15 @@ static int arcturus_tables_init(struct smu_context *smu, 
struct smu_table *table
return -ENOMEM;
smu_table->metrics_time = 0;
 
+   if (smu->adev->asic_type == CHIP_ARCTURUS) {
+   /* Get the SN to turn into a Unique ID */
+   smu_send_smc_msg(smu, SMU_MSG_ReadSerialNumTop32,
+);
+   smu_send_smc_msg(smu, SMU_MSG_ReadSerialNumBottom32,
+);
+
+   smu->adev->unique_id = ((uint64_t)bottom32 << 32) | top32;
+   }
return 0;
 }
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Takashi Iwai
On Mon, 27 Apr 2020 16:22:21 +0200,
Deucher, Alexander wrote:
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Nicholas Johnson 
> > Sent: Sunday, April 26, 2020 12:02 PM
> > To: linux-ker...@vger.kernel.org
> > Cc: Deucher, Alexander ; Koenig, Christian
> > ; Zhou, David(ChunMing)
> > ; Nicholas Johnson  > opensou...@outlook.com.au>
> > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> > 
> > Hi all,
> > 
> > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> > BACO. You can tell visually when it sleeps, because the fan on the graphics
> > card is switched off to save power. It did not spin down the fan in v5.6.x.
> > 
> > This is great (I love it), except that when it is sleeping, the PCIe audio 
> > function
> > of the GPU has issues if anything tries to access it. You get dmesg errors 
> > such
> > as these:
> > 
> > snd_hda_intel :08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> > snd_hda_intel :08:00.1: azx_get_response timeout, switching to polling
> > mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response from
> > codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel :08:00.1: No
> > response from codec, resetting bus: last cmd=0x001f0500
> > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> > 
> > The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt enclosure
> > (not that Thunderbolt should affect it, but I feel I should mention it just 
> > in
> > case). I dropped a lot of duplicate dmesg lines, as some of them repeated a
> > lot of times before the driver gave up.
> > 
> > I offer this patch to disable runpm for Fiji while a fix is found, if you 
> > decide
> > that is the best approach. Regardless, I will gladly test any patches you 
> > come
> > up with instead and confirm that the above issue has been fixed.
> > 
> > I cannot tell if any other GPUs are affected. The only other cards to which 
> > I
> > have access are a couple of AMD R9 280X (Tahiti XT), which use radeon driver
> > instead of amdgpu driver.
> 
> Adding a few more people.  Do you know what is accessing the audio?  The 
> audio should have a dependency on the GPU device.  The GPU won't enter 
> runtime pm until the audio has entered runtime pm and vice versa on resume. 
> Please attach a copy of your dmesg output and lspci output.

Also, please retest with the fresh 5.7-rc3.  There was a known
regression regarding HD-audio PM in 5.7-rc1/rc2, and it's been fixed
there (commit 8d6762af302d).


thanks,

Takashi
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Christian König

Yeah, but is Mesa going to use it?

Christian.

Am 27.04.20 um 15:54 schrieb Marek Olšák:
PAL requested it and they are going to use it. (it looks like they 
have to use it for correctness)


Marek

On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander 
mailto:alexander.deuc...@amd.com>> wrote:


[AMD Official Use Only - Internal Distribution Only]


Do we have open source code UMD code which uses this?

Alex

*From:* Christian König mailto:ckoenig.leichtzumer...@gmail.com>>
*Sent:* Sunday, April 26, 2020 4:55 AM
*To:* Marek Olšák mailto:mar...@gmail.com>>;
Koenig, Christian mailto:christian.koe...@amd.com>>
*Cc:* Deucher, Alexander mailto:alexander.deuc...@amd.com>>; amd-gfx mailing list
mailto:amd-gfx@lists.freedesktop.org>>
*Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to
compute IBs too
Thanks for that explanation. I suspected that there was a good
reason to have that in the kernel, but couldn't find one.

In this case the patch is Reviewed-by: Christian König
 

We should probably add this explanation as comment to the flag as
well.

Thanks,
Christian.

Am 26.04.20 um 02:43 schrieb Marek Olšák:

It was merged into amd-staging-drm-next.

I'm not absolutely sure, but I think we need to invalidate before
IBs if an IB is cached in L2 and the CPU has updated it. It can
only be cached in L2 if something other than CP has read it or
written to it without invalidation. CP reads don't cache it but
they can hit the cache if it's already cached.

For CE, we need to invalidate before the IB in the kernel,
because CE IBs can't do cache invalidations IIRC. This is the
number one reason for merging the already pushed commits.

Marek

On Sat., Apr. 25, 2020, 11:03 Christian König,
mailto:ckoenig.leichtzumer...@gmail.com>> wrote:

Was that patch set actually merged upstream? My last status
is that we couldn't find a reason why we need to do this in
the kernel.

Christian.

Am 25.04.20 um 10:52 schrieb Marek Olšák:

This was missed.

Marek

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org  
https://lists.freedesktop.org/mailman/listinfo/amd-gfx  




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org  
https://lists.freedesktop.org/mailman/listinfo/amd-gfx  




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-27 Thread Deucher, Alexander
[AMD Public Use]

> -Original Message-
> From: Nicholas Johnson 
> Sent: Sunday, April 26, 2020 12:02 PM
> To: linux-ker...@vger.kernel.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Zhou, David(ChunMing)
> ; Nicholas Johnson  opensou...@outlook.com.au>
> Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> 
> Hi all,
> 
> Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> BACO. You can tell visually when it sleeps, because the fan on the graphics
> card is switched off to save power. It did not spin down the fan in v5.6.x.
> 
> This is great (I love it), except that when it is sleeping, the PCIe audio 
> function
> of the GPU has issues if anything tries to access it. You get dmesg errors 
> such
> as these:
> 
> snd_hda_intel :08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> snd_hda_intel :08:00.1: azx_get_response timeout, switching to polling
> mode: last cmd=0x001f0500 snd_hda_intel :08:00.1: No response from
> codec, disabling MSI: last cmd=0x001f0500 snd_hda_intel :08:00.1: No
> response from codec, resetting bus: last cmd=0x001f0500
> snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> 
> The above is with the Fiji XT GPU at :08:00.0 in a Thunderbolt enclosure
> (not that Thunderbolt should affect it, but I feel I should mention it just in
> case). I dropped a lot of duplicate dmesg lines, as some of them repeated a
> lot of times before the driver gave up.
> 
> I offer this patch to disable runpm for Fiji while a fix is found, if you 
> decide
> that is the best approach. Regardless, I will gladly test any patches you come
> up with instead and confirm that the above issue has been fixed.
> 
> I cannot tell if any other GPUs are affected. The only other cards to which I
> have access are a couple of AMD R9 280X (Tahiti XT), which use radeon driver
> instead of amdgpu driver.

Adding a few more people.  Do you know what is accessing the audio?  The audio 
should have a dependency on the GPU device.  The GPU won't enter runtime pm 
until the audio has entered runtime pm and vice versa on resume. Please attach 
a copy of your dmesg output and lspci output.

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Marek Olšák
PAL requested it and they are going to use it. (it looks like they have to
use it for correctness)

Marek

On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander <
alexander.deuc...@amd.com> wrote:

> [AMD Official Use Only - Internal Distribution Only]
>
> Do we have open source code UMD code which uses this?
>
> Alex
> --
> *From:* Christian König 
> *Sent:* Sunday, April 26, 2020 4:55 AM
> *To:* Marek Olšák ; Koenig, Christian <
> christian.koe...@amd.com>
> *Cc:* Deucher, Alexander ; amd-gfx mailing
> list 
> *Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute
> IBs too
>
> Thanks for that explanation. I suspected that there was a good reason to
> have that in the kernel, but couldn't find one.
>
> In this case the patch is Reviewed-by: Christian König
>  
>
> We should probably add this explanation as comment to the flag as well.
>
> Thanks,
> Christian.
>
> Am 26.04.20 um 02:43 schrieb Marek Olšák:
>
> It was merged into amd-staging-drm-next.
>
> I'm not absolutely sure, but I think we need to invalidate before IBs if
> an IB is cached in L2 and the CPU has updated it. It can only be cached in
> L2 if something other than CP has read it or written to it without
> invalidation. CP reads don't cache it but they can hit the cache if it's
> already cached.
>
> For CE, we need to invalidate before the IB in the kernel, because CE IBs
> can't do cache invalidations IIRC. This is the number one reason for
> merging the already pushed commits.
>
> Marek
>
> On Sat., Apr. 25, 2020, 11:03 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
> Was that patch set actually merged upstream? My last status is that we
> couldn't find a reason why we need to do this in the kernel.
>
> Christian.
>
> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>
> This was missed.
>
> Marek
>
> ___
> amd-gfx mailing 
> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>  
> 
>
>
>
> ___
> amd-gfx mailing 
> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>  
> 
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

2020-04-27 Thread Marek Olšák
I've already pushed the patch and marked it for stable.

I added a note into the commit message to discard the version bump for
stable.

The GCR packet is always supported in the ring regardless of firmware.

Marek

On Mon, Apr 27, 2020 at 9:01 AM Deucher, Alexander <
alexander.deuc...@amd.com> wrote:

> [AMD Official Use Only - Internal Distribution Only]
>
> Please split the patch into two, one to update the emit IB sequence, and
> one to bump the minor version.  That way we can make sure older kernels get
> the new sequence.  Also do we need an sdma fw version check for the new
> packet in the emi IB function?
>
> Alex
> --
> *From:* amd-gfx  on behalf of
> Christian König 
> *Sent:* Sunday, April 26, 2020 4:52 AM
> *To:* Marek Olšák 
> *Cc:* amd-gfx mailing list 
> *Subject:* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
> Thought so, we should try to get this get it committed ASAP. And maybe add
> a CC: stable tag as well.
>
> Patch is Reviewed-by: Christian König 
> .
>
> Thanks,
> Christian.
>
> Am 26.04.20 um 02:28 schrieb Marek Olšák:
>
> Not without a mandatory firmware update. The gcr packet support for IBs
> was added into the sdma firmware just two weeks ago.
>
> Marek
>
> On Sat., Apr. 25, 2020, 11:04 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
> Could we do this in userspace as well?
>
> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>
> This should fix SDMA hangs on gfx10.
>
> Marek
>
> ___
> amd-gfx mailing 
> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>  
> 
>
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Do we have open source code UMD code which uses this?

Alex

From: Christian König 
Sent: Sunday, April 26, 2020 4:55 AM
To: Marek Olšák ; Koenig, Christian 
Cc: Deucher, Alexander ; amd-gfx mailing list 

Subject: Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

Thanks for that explanation. I suspected that there was a good reason to have 
that in the kernel, but couldn't find one.

In this case the patch is Reviewed-by: Christian König 


We should probably add this explanation as comment to the flag as well.

Thanks,
Christian.

Am 26.04.20 um 02:43 schrieb Marek Olšák:
It was merged into amd-staging-drm-next.

I'm not absolutely sure, but I think we need to invalidate before IBs if an IB 
is cached in L2 and the CPU has updated it. It can only be cached in L2 if 
something other than CP has read it or written to it without invalidation. CP 
reads don't cache it but they can hit the cache if it's already cached.

For CE, we need to invalidate before the IB in the kernel, because CE IBs can't 
do cache invalidations IIRC. This is the number one reason for merging the 
already pushed commits.

Marek

On Sat., Apr. 25, 2020, 11:03 Christian König, 
mailto:ckoenig.leichtzumer...@gmail.com>> 
wrote:
Was that patch set actually merged upstream? My last status is that we couldn't 
find a reason why we need to do this in the kernel.

Christian.

Am 25.04.20 um 10:52 schrieb Marek Olšák:
This was missed.

Marek



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

2020-04-27 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Please split the patch into two, one to update the emit IB sequence, and one to 
bump the minor version.  That way we can make sure older kernels get the new 
sequence.  Also do we need an sdma fw version check for the new packet in the 
emi IB function?

Alex

From: amd-gfx  on behalf of Christian 
König 
Sent: Sunday, April 26, 2020 4:52 AM
To: Marek Olšák 
Cc: amd-gfx mailing list 
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

Thought so, we should try to get this get it committed ASAP. And maybe add a 
CC: stable tag as well.

Patch is Reviewed-by: Christian König 
.

Thanks,
Christian.

Am 26.04.20 um 02:28 schrieb Marek Olšák:
Not without a mandatory firmware update. The gcr packet support for IBs was 
added into the sdma firmware just two weeks ago.

Marek

On Sat., Apr. 25, 2020, 11:04 Christian König, 
mailto:ckoenig.leichtzumer...@gmail.com>> 
wrote:
Could we do this in userspace as well?

Am 25.04.20 um 10:52 schrieb Marek Olšák:
This should fix SDMA hangs on gfx10.

Marek



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: remove conversion to bool in dc_link_ddc.c

2020-04-27 Thread Christian König

Am 27.04.20 um 08:37 schrieb Jason Yan:

The '>' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c:602:28-33: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
index 256889eed93e..aefd29a440b5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
@@ -599,7 +599,7 @@ bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
do {
struct aux_payload current_payload;
bool is_end_of_payload = (retrieved + DEFAULT_AUX_MAX_DATA_SIZE) 
>
-   payload->length ? true : false;
+   payload->length;
  
  		current_payload.address = payload->address;

current_payload.data = >data[retrieved];


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: remove conversion to bool in dcn20_mpc.c

2020-04-27 Thread Christian König

Am 27.04.20 um 08:37 schrieb Jason Yan:

The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c:455:70-75: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
index de9c857ab3e9..9d7432f3fb16 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
@@ -452,7 +452,7 @@ void mpc2_set_output_gamma(
next_mode = LUT_RAM_A;
  
  	mpc20_power_on_ogam_lut(mpc, mpcc_id, true);

-   mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A ? 
true:false);
+   mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A);
  
  	if (next_mode == LUT_RAM_A)

mpc2_program_luta(mpc, mpcc_id, params);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove conversion to bool in amdgpu_device.c

2020-04-27 Thread Christian König

Am 27.04.20 um 08:36 schrieb Jason Yan:

The '>' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3d601d5dd5af..ad94de3632d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3000,7 +3000,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_WORK(>xgmi_reset_work, amdgpu_device_xgmi_reset_func);
  
  	adev->gfx.gfx_off_req_count = 1;

-   adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : 
false;
+   adev->pm.ac_power = power_supply_is_system_supplied() > 0;
  
  	/* Registers mapping */

/* TODO: block userspace mapping of io register */


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: remove conversion to bool in amdgpu_device.c

2020-04-27 Thread Jason Yan
The '>' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3d601d5dd5af..ad94de3632d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3000,7 +3000,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_WORK(>xgmi_reset_work, amdgpu_device_xgmi_reset_func);
 
adev->gfx.gfx_off_req_count = 1;
-   adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : 
false;
+   adev->pm.ac_power = power_supply_is_system_supplied() > 0;
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
-- 
2.21.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: cleanup coding style a bit

2020-04-27 Thread Joe Perches
On Sun, 2020-04-26 at 15:18 +0200, Christian König wrote:
> Am 26.04.20 um 15:12 schrieb Bernard Zhao:
> > Maybe no need to check ws before kmalloc, kmalloc will check
> > itself, kmalloc`s logic is if ptr is NULL, kmalloc will just
> > return
> > 
> > Signed-off-by: Bernard Zhao 
> 
> Reviewed-by: Christian König 
> 
> I'm wondering why the automated scripts haven't found that one before.

because this pattern is

if (foo)
kfree(bar);

and the pattern looked for is:

if (foo)
kfree(foo);

> > diff --git a/drivers/gpu/drm/radeon/atom.c b/drivers/gpu/drm/radeon/atom.c
[]
> > @@ -1211,8 +1211,7 @@ static int atom_execute_table_locked(struct 
> > atom_context *ctx, int index, uint32
> > SDEBUG("<<\n");
> >   
> >   free:
> > -   if (ws)
> > -   kfree(ectx.ws);
> > +   kfree(ectx.ws);
> > return ret;
> >   }

I'm wondering if this removal is correct as the function
is named _locked and it may be recursive or called under
some external lock.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/radeon: cleanup coding style a bit

2020-04-27 Thread Bernard Zhao
Maybe no need to check ws before kmalloc, kmalloc will check
itself, kmalloc`s logic is if ptr is NULL, kmalloc will just
return

Signed-off-by: Bernard Zhao 
---
 drivers/gpu/drm/radeon/atom.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atom.c b/drivers/gpu/drm/radeon/atom.c
index 2c27627b6659..f15b20da5315 100644
--- a/drivers/gpu/drm/radeon/atom.c
+++ b/drivers/gpu/drm/radeon/atom.c
@@ -1211,8 +1211,7 @@ static int atom_execute_table_locked(struct atom_context 
*ctx, int index, uint32
SDEBUG("<<\n");
 
 free:
-   if (ws)
-   kfree(ectx.ws);
+   kfree(ectx.ws);
return ret;
 }
 
-- 
2.26.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: cleanup coding style a bit

2020-04-27 Thread Joe Perches
btw:  the debugging macros in atom.c are not good.

It could be something like the below as the output logging is
at best poorly formatted due to the many individual printks
without KERN_ that are emitted on separate lines.

#define ATOM_DEBUG

should probably be commented out.

The debugging macros and #include file should be better formatted.

The no_printk macro is useful to verify formats and arguments when
not debugging and removing the ATOM_DEBUG from atom-names.h does
not cause the unused char *arrays to be added to the object file
as the compiler elides unused arrays.

---
 drivers/gpu/drm/radeon/atom-names.h | 266 +++-
 drivers/gpu/drm/radeon/atom.c   |  23 ++--
 2 files changed, 218 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atom-names.h 
b/drivers/gpu/drm/radeon/atom-names.h
index 6f907a5..055775 100644
--- a/drivers/gpu/drm/radeon/atom-names.h
+++ b/drivers/gpu/drm/radeon/atom-names.h
@@ -27,74 +27,218 @@
 
 #include "atom.h"
 
-#ifdef ATOM_DEBUG
-
 #define ATOM_OP_NAMES_CNT 123
-static char *atom_op_names[ATOM_OP_NAMES_CNT] = {
-"RESERVED", "MOVE_REG", "MOVE_PS", "MOVE_WS", "MOVE_FB", "MOVE_PLL",
-"MOVE_MC", "AND_REG", "AND_PS", "AND_WS", "AND_FB", "AND_PLL", "AND_MC",
-"OR_REG", "OR_PS", "OR_WS", "OR_FB", "OR_PLL", "OR_MC", "SHIFT_LEFT_REG",
-"SHIFT_LEFT_PS", "SHIFT_LEFT_WS", "SHIFT_LEFT_FB", "SHIFT_LEFT_PLL",
-"SHIFT_LEFT_MC", "SHIFT_RIGHT_REG", "SHIFT_RIGHT_PS", "SHIFT_RIGHT_WS",
-"SHIFT_RIGHT_FB", "SHIFT_RIGHT_PLL", "SHIFT_RIGHT_MC", "MUL_REG",
-"MUL_PS", "MUL_WS", "MUL_FB", "MUL_PLL", "MUL_MC", "DIV_REG", "DIV_PS",
-"DIV_WS", "DIV_FB", "DIV_PLL", "DIV_MC", "ADD_REG", "ADD_PS", "ADD_WS",
-"ADD_FB", "ADD_PLL", "ADD_MC", "SUB_REG", "SUB_PS", "SUB_WS", "SUB_FB",
-"SUB_PLL", "SUB_MC", "SET_ATI_PORT", "SET_PCI_PORT", "SET_SYS_IO_PORT",
-"SET_REG_BLOCK", "SET_FB_BASE", "COMPARE_REG", "COMPARE_PS",
-"COMPARE_WS", "COMPARE_FB", "COMPARE_PLL", "COMPARE_MC", "SWITCH",
-"JUMP", "JUMP_EQUAL", "JUMP_BELOW", "JUMP_ABOVE", "JUMP_BELOW_OR_EQUAL",
-"JUMP_ABOVE_OR_EQUAL", "JUMP_NOT_EQUAL", "TEST_REG", "TEST_PS", "TEST_WS",
-"TEST_FB", "TEST_PLL", "TEST_MC", "DELAY_MILLISEC", "DELAY_MICROSEC",
-"CALL_TABLE", "REPEAT", "CLEAR_REG", "CLEAR_PS", "CLEAR_WS", "CLEAR_FB",
-"CLEAR_PLL", "CLEAR_MC", "NOP", "EOT", "MASK_REG", "MASK_PS", "MASK_WS",
-"MASK_FB", "MASK_PLL", "MASK_MC", "POST_CARD", "BEEP", "SAVE_REG",
-"RESTORE_REG", "SET_DATA_BLOCK", "XOR_REG", "XOR_PS", "XOR_WS", "XOR_FB",
-"XOR_PLL", "XOR_MC", "SHL_REG", "SHL_PS", "SHL_WS", "SHL_FB", "SHL_PLL",
-"SHL_MC", "SHR_REG", "SHR_PS", "SHR_WS", "SHR_FB", "SHR_PLL", "SHR_MC",
-"DEBUG", "CTB_DS",
+static const char * const atom_op_names[ATOM_OP_NAMES_CNT] = {
+   "RESERVED",
+   "MOVE_REG",
+   "MOVE_PS",
+   "MOVE_WS",
+   "MOVE_FB",
+   "MOVE_PLL",
+   "MOVE_MC",
+   "AND_REG",
+   "AND_PS",
+   "AND_WS",
+   "AND_FB",
+   "AND_PLL",
+   "AND_MC",
+   "OR_REG",
+   "OR_PS",
+   "OR_WS",
+   "OR_FB",
+   "OR_PLL",
+   "OR_MC",
+   "SHIFT_LEFT_REG",
+   "SHIFT_LEFT_PS",
+   "SHIFT_LEFT_WS",
+   "SHIFT_LEFT_FB",
+   "SHIFT_LEFT_PLL",
+   "SHIFT_LEFT_MC",
+   "SHIFT_RIGHT_REG",
+   "SHIFT_RIGHT_PS",
+   "SHIFT_RIGHT_WS",
+   "SHIFT_RIGHT_FB",
+   "SHIFT_RIGHT_PLL",
+   "SHIFT_RIGHT_MC",
+   "MUL_REG",
+   "MUL_PS",
+   "MUL_WS",
+   "MUL_FB",
+   "MUL_PLL",
+   "MUL_MC",
+   "DIV_REG",
+   "DIV_PS",
+   "DIV_WS",
+   "DIV_FB",
+   "DIV_PLL",
+   "DIV_MC",
+   "ADD_REG",
+   "ADD_PS",
+   "ADD_WS",
+   "ADD_FB",
+   "ADD_PLL",
+   "ADD_MC",
+   "SUB_REG",
+   "SUB_PS",
+   "SUB_WS",
+   "SUB_FB",
+   "SUB_PLL",
+   "SUB_MC",
+   "SET_ATI_PORT",
+   "SET_PCI_PORT",
+   "SET_SYS_IO_PORT",
+   "SET_REG_BLOCK",
+   "SET_FB_BASE",
+   "COMPARE_REG",
+   "COMPARE_PS",
+   "COMPARE_WS",
+   "COMPARE_FB",
+   "COMPARE_PLL",
+   "COMPARE_MC",
+   "SWITCH",
+   "JUMP",
+   "JUMP_EQUAL",
+   "JUMP_BELOW",
+   "JUMP_ABOVE",
+   "JUMP_BELOW_OR_EQUAL",
+   "JUMP_ABOVE_OR_EQUAL",
+   "JUMP_NOT_EQUAL",
+   "TEST_REG",
+   "TEST_PS",
+   "TEST_WS",
+   "TEST_FB",
+   "TEST_PLL",
+   "TEST_MC",
+   "DELAY_MILLISEC",
+   "DELAY_MICROSEC",
+   "CALL_TABLE",
+   "REPEAT",
+   "CLEAR_REG",
+   "CLEAR_PS",
+   "CLEAR_WS",
+   "CLEAR_FB",
+   "CLEAR_PLL",
+   "CLEAR_MC",
+   "NOP",
+   "EOT",
+   "MASK_REG",
+   "MASK_PS",
+   "MASK_WS",
+   "MASK_FB",
+   "MASK_PLL",
+   "MASK_MC",
+   "POST_CARD",
+   "BEEP",
+   "SAVE_REG",
+   "RESTORE_REG",
+   "SET_DATA_BLOCK",
+   "XOR_REG",
+   "XOR_PS",
+   "XOR_WS",
+   "XOR_FB",
+   "XOR_PLL",
+   "XOR_MC",
+   "SHL_REG",
+   "SHL_PS",
+   

[PATCH] drm/amd/display: remove conversion to bool in dcn20_mpc.c

2020-04-27 Thread Jason Yan
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c:455:70-75: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 
---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
index de9c857ab3e9..9d7432f3fb16 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
@@ -452,7 +452,7 @@ void mpc2_set_output_gamma(
next_mode = LUT_RAM_A;
 
mpc20_power_on_ogam_lut(mpc, mpcc_id, true);
-   mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A ? 
true:false);
+   mpc20_configure_ogam_lut(mpc, mpcc_id, next_mode == LUT_RAM_A);
 
if (next_mode == LUT_RAM_A)
mpc2_program_luta(mpc, mpcc_id, params);
-- 
2.21.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/display: remove conversion to bool in dc_link_ddc.c

2020-04-27 Thread Jason Yan
The '>' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c:602:28-33: WARNING:
conversion to bool not needed here

Signed-off-by: Jason Yan 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
index 256889eed93e..aefd29a440b5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c
@@ -599,7 +599,7 @@ bool dal_ddc_submit_aux_command(struct ddc_service *ddc,
do {
struct aux_payload current_payload;
bool is_end_of_payload = (retrieved + 
DEFAULT_AUX_MAX_DATA_SIZE) >
-   payload->length ? true : false;
+   payload->length;
 
current_payload.address = payload->address;
current_payload.data = >data[retrieved];
-- 
2.21.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: remove conversion to bool in dcn20_mpc.c

2020-04-27 Thread Joe Perches
On Mon, 2020-04-27 at 14:37 +0800, Jason Yan wrote:
> The '==' expression itself is bool, no need to convert it to bool again.

trivia:

These descriptions are not quite correct.
The operators return an int, either 0 or 1.

-
6.5.8 Relational operators

6 Each of the operators < (less than), > (greater than), <= (less than or equal 
to), and >=
(greater than or equal to) shall yield 1 if the specified relation is true and 
0 if it is false. 90)
The result has type int

6.5.9 Equality operators

3 The == (equal to) and != (not equal to) operators are analogous to the 
relational
operators except for their lower precedence. 91) Each of the operators yields 1 
if the
specified relation is true and 0 if it is false. The result has type int. For 
any pair of
operands, exactly one of the relations is true.
-




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx