RE: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

2019-11-28 Thread Liu, Monk
It was tested/verified by Xiaojie on nv14 against the S3 issues.

Thanks 

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Koenig, Christian  
Sent: Friday, November 29, 2019 3:46 PM
To: Liu, Monk ; Yuan, Xiaojie ; 
Deucher, Alexander 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

In this case feel free to add an Acked-by: Christian König 
 to your patch.

But I would recommend getting an rb from somebody who knows that stuff better 
than I do.

Regards,
Christian.

On 29.11.19 at 02:58, Liu, Monk wrote:
> The content of the CSIB is always static, so I submitted a patch to use the 
> re-init and get rid of the CSIB pin/unpin in hw_init/fini (my purpose is 
> to fix the double-unpin warning during unload).
> _
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -Original Message-
> From: Christian König 
> Sent: Thursday, November 28, 2019 7:51 PM
> To: Liu, Monk ; Yuan, Xiaojie 
> ; Deucher, Alexander 
> ; Koenig, Christian 
> 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload
>
> Hi Monk,
>
> if the content of the CSIB is constant then it is certainly better to just 
> re-initialize it.
>
> This also prevents corruption due to VRAM loss.
>
> Christian.
>
> On 28.11.19 at 03:53, Liu, Monk wrote:
>> Hi Xiaojie
>>
>> For SRIOV we don't use suspend, so I didn't think of that part. Thanks for 
>> the reminder!
>> But we still need to fix this call trace issue anyway (our Jenkins 
>> testing system considers such a call trace an error).
>>
>> How about we do "adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);" in 
>> hw_init()? That way you don't need to evict the CSIB during suspend, and 
>> the CSIB will always be re-initialized after S3 resume.
>>
>> @Deucher, Alexander @Koenig, Christian what's your opinion ?
>> _
>> Monk Liu|GPU Virtualization Team |AMD
>>
>>
>> -Original Message-
>> From: Yuan, Xiaojie 
>> Sent: Tuesday, November 26, 2019 9:10 PM
>> To: Liu, Monk 
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload
>>
>> Hi Monk,
>>
>> hw_fini() is called in the suspend code path as well. I'm wondering how the csb 
>> can be evicted if it's not unpinned before suspend.
>>
>> BR,
>> Xiaojie
>>
>>> On Nov 26, 2019, at 7:50 PM, Monk Liu  wrote:
>>>
>>> The kernel would report a warning about a double unpin on the csb BO because 
>>> we unpin it during hw_fini, but we actually don't need to pin/unpin 
>>> it during hw_init/fini since it is created kernel-pinned.
>>>
>>> Remove all that useless code for gfx9/10.
>>>
>>> Signed-off-by: Monk Liu 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c |  1 -
>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 38 -
>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 39 -
>>> 3 files changed, 78 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>>> index c8793e6..289fada 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>>> @@ -145,7 +145,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
>>>  dst_ptr = adev->gfx.rlc.cs_ptr;
>>>  adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
>>>  amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
>>> -amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
>>>  amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>>>
>>>  return 0;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> index a56cba9..5ee7467 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> @@ -996,39 +996,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device 
>>> *adev)
>>>  return 0;
>>> }
>>>
>>> -static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
>>> -{
>>> -int r;
>>> -
>>> -r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
>>> -if (unlikely(r != 0))
>>> -return r;
>>> -
>>> -r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
>>> -AMDGPU_GEM_DOMAIN_VRAM);
>>> -if (!r)
>>> -adev->gfx.rlc.clear_state_gpu_addr =
>>> -amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
>>> -
>>> -amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>>> -
>>> -return r;
>>> -}
>>> -
>>> -static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
>>> -{
>>> -int r;
>>> -
>>> -if (!adev->gfx.rlc.clear_state_obj)
>>> -return;
>>> -
>>> -r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
>>> -if (likely(r == 0)) {
>>> -amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
>>> -amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>>> - 

RE: [PATCH 2/5] drm/amdgpu: skip rlc ucode loading for SRIOV gfx10

2019-11-28 Thread Liu, Monk
@Zhang, Hawking @Deucher, Alexander
Can you help to review it for me?

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: amd-gfx  On Behalf Of Liu, Monk
Sent: Thursday, November 28, 2019 10:21 AM
To: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 2/5] drm/amdgpu: skip rlc ucode loading for SRIOV gfx10



_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Monk Liu  
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: [PATCH 2/5] drm/amdgpu: skip rlc ucode loading for SRIOV gfx10

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 80 +-
 1 file changed, 41 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 879c0a1..a56cba9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -691,59 +691,61 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
*adev)
adev->gfx.ce_fw_version = le32_to_cpu(cp_hdr->header.ucode_version);
adev->gfx.ce_feature_version = 
le32_to_cpu(cp_hdr->ucode_feature_version);
 
-   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
-   err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
-   if (err)
-   goto out;
-   err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
-   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
-   version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
-   version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
-   if (version_major == 2 && version_minor == 1)
-   adev->gfx.rlc.is_rlc_v2_1 = true;
-
-   adev->gfx.rlc_fw_version = le32_to_cpu(rlc_hdr->header.ucode_version);
-   adev->gfx.rlc_feature_version = 
le32_to_cpu(rlc_hdr->ucode_feature_version);
-   adev->gfx.rlc.save_and_restore_offset =
+   if (!amdgpu_sriov_vf(adev)) {
+   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", 
chip_name);
+   err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
+   if (err)
+   goto out;
+   err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
+   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
+   version_major = 
le16_to_cpu(rlc_hdr->header.header_version_major);
+   version_minor = 
le16_to_cpu(rlc_hdr->header.header_version_minor);
+   if (version_major == 2 && version_minor == 1)
+   adev->gfx.rlc.is_rlc_v2_1 = true;
+
+   adev->gfx.rlc_fw_version = 
le32_to_cpu(rlc_hdr->header.ucode_version);
+   adev->gfx.rlc_feature_version = 
le32_to_cpu(rlc_hdr->ucode_feature_version);
+   adev->gfx.rlc.save_and_restore_offset =
le32_to_cpu(rlc_hdr->save_and_restore_offset);
-   adev->gfx.rlc.clear_state_descriptor_offset =
+   adev->gfx.rlc.clear_state_descriptor_offset =
le32_to_cpu(rlc_hdr->clear_state_descriptor_offset);
-   adev->gfx.rlc.avail_scratch_ram_locations =
+   adev->gfx.rlc.avail_scratch_ram_locations =
le32_to_cpu(rlc_hdr->avail_scratch_ram_locations);
-   adev->gfx.rlc.reg_restore_list_size =
+   adev->gfx.rlc.reg_restore_list_size =
le32_to_cpu(rlc_hdr->reg_restore_list_size);
-   adev->gfx.rlc.reg_list_format_start =
+   adev->gfx.rlc.reg_list_format_start =
le32_to_cpu(rlc_hdr->reg_list_format_start);
-   adev->gfx.rlc.reg_list_format_separate_start =
+   adev->gfx.rlc.reg_list_format_separate_start =
le32_to_cpu(rlc_hdr->reg_list_format_separate_start);
-   adev->gfx.rlc.starting_offsets_start =
+   adev->gfx.rlc.starting_offsets_start =
le32_to_cpu(rlc_hdr->starting_offsets_start);
-   adev->gfx.rlc.reg_list_format_size_bytes =
+   adev->gfx.rlc.reg_list_format_size_bytes =
le32_to_cpu(rlc_hdr->reg_list_format_size_bytes);
-   adev->gfx.rlc.reg_list_size_bytes =
+   adev->gfx.rlc.reg_list_size_bytes =
le32_to_cpu(rlc_hdr->reg_list_size_bytes);
-   adev->gfx.rlc.register_list_format =
+   adev->gfx.rlc.register_list_format =
kmalloc(adev->gfx.rlc.reg_list_format_size_bytes +
-   adev->gfx.rlc.reg_list_size_bytes, GFP_KERNEL);
-   if (!adev->gfx.rlc.register_list_format) {
-   err = -ENOMEM;
-   goto out;
-   }
+   adev->gfx.rlc.reg_list_size_bytes, 
GFP_KERNEL);
+   if 

RE: [PATCH 3/5] drm/amdgpu: do autoload right after MEC loaded for SRIOV VF

2019-11-28 Thread Liu, Monk
@Zhang, Hawking@Deucher, Alexander

Can you help to review it for me ? 

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: amd-gfx  On Behalf Of Liu, Monk
Sent: Thursday, November 28, 2019 10:22 AM
To: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 3/5] drm/amdgpu: do autoload right after MEC loaded for 
SRIOV VF

ping
_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Monk Liu  
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: [PATCH 3/5] drm/amdgpu: do autoload right after MEC loaded for SRIOV VF

since we don't have RLCG ucode loading and no SR list either

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 96a6b00..b65fda9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1490,8 +1490,7 @@ static int psp_np_fw_load(struct psp_context *psp)
return ret;
 
	/* Start rlc autoload after psp received all the gfx firmware */
-   if (psp->autoload_supported && ucode->ucode_id ==
-   AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM) {
+   if (psp->autoload_supported && ucode->ucode_id ==
+   (amdgpu_sriov_vf(adev) ? AMDGPU_UCODE_ID_CP_MEC2 :
+   AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM)) {
ret = psp_rlc_autoload(psp);
if (ret) {
DRM_ERROR("Failed to start rlc autoload\n");
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

2019-11-28 Thread Christian König
In this case feel free to add an Acked-by: Christian König 
 to your patch.


But I would recommend getting an rb from somebody who knows that stuff 
better than I do.


Regards,
Christian.

On 29.11.19 at 02:58, Liu, Monk wrote:

The content of the CSIB is always static, so I submitted a patch to use the 
re-init and get rid of the CSIB pin/unpin in hw_init/fini (my purpose is to fix 
the double-unpin warning during unload).
_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Christian König 
Sent: Thursday, November 28, 2019 7:51 PM
To: Liu, Monk ; Yuan, Xiaojie ; Deucher, Alexander 
; Koenig, Christian 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

Hi Monk,

if the content of the CSIB is constant then it is certainly better to just 
re-initialize it.

This also prevents corruption due to VRAM loss.

Christian.

On 28.11.19 at 03:53, Liu, Monk wrote:

Hi Xiaojie

For SRIOV we don't use suspend, so I didn't think of that part. Thanks for the 
reminder!
But we still need to fix this call trace issue anyway (our Jenkins
testing system considers such a call trace an error).

How about we do "adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);" in 
hw_init()? That way you don't need to evict the CSIB during suspend, and the 
CSIB will always be re-initialized after S3 resume.

@Deucher, Alexander @Koenig, Christian what's your opinion ?
_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Yuan, Xiaojie 
Sent: Tuesday, November 26, 2019 9:10 PM
To: Liu, Monk 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

Hi Monk,

hw_fini() is called in the suspend code path as well. I'm wondering how the csb can 
be evicted if it's not unpinned before suspend.

BR,
Xiaojie


On Nov 26, 2019, at 7:50 PM, Monk Liu  wrote:

The kernel would report a warning about a double unpin on the csb BO because
we unpin it during hw_fini, but we actually don't need to pin/unpin it
during hw_init/fini since it is created kernel-pinned.

Remove all that useless code for gfx9/10.

Signed-off-by: Monk Liu 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c |  1 -
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 38 -
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 39 -
3 files changed, 78 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..289fada 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -145,7 +145,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
 dst_ptr = adev->gfx.rlc.cs_ptr;
 adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
 amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
 amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);

 return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a56cba9..5ee7467 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -996,39 +996,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)
 return 0;
}

-static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
-{
-int r;
-
-r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-if (unlikely(r != 0))
-return r;
-
-r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-AMDGPU_GEM_DOMAIN_VRAM);
-if (!r)
-adev->gfx.rlc.clear_state_gpu_addr =
-amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-return r;
-}
-
-static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
-{
-int r;
-
-if (!adev->gfx.rlc.clear_state_obj)
-return;
-
-r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
-if (likely(r == 0)) {
-amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-}
-}
-
 static void gfx_v10_0_mec_fini(struct amdgpu_device *adev)
 {
 amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
@@ -3780,10 +3747,6 @@ static int gfx_v10_0_hw_init(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-r = gfx_v10_0_csb_vram_pin(adev);
-if (r)
-return r;
-
 if (!amdgpu_emu_mode)
 gfx_v10_0_init_golden_registers(adev);

@@ -3871,7 +3834,6 @@ static int gfx_v10_0_hw_fini(void *handle)
 }
 gfx_v10_0_cp_enable(adev, false);
 gfx_v10_0_enable_gui_idle_interrupt(adev, false);
-gfx_v10_0_csb_vram_unpin(adev);

 return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 4cc2e50..524a7ba 100644
--- 

Re: Deadlock on PTEs update for HMM

2019-11-28 Thread Christian König

Hi Felix,

yes that is exactly my thinking as well. Problem is that getting this to 
work was much harder than I thought.


We can't use a mutex because TTM calls the eviction callback in 
atomic context. A spinlock doesn't look like a good idea either, 
because we potentially need to wait for the hardware with a fixed IB pool.


Because of this I've started to rewrite the TTM handling to not call the 
driver in an atomic context any more, but that took me way longer than 
expected as well.


I'm currently experimenting with a trylock on the driver mutex; at 
least that should work for now until we have something better.


Regards,
Christian.

On 28.11.19 at 21:30, Felix Kuehling wrote:

Hi Christian,

I'm thinking about this problem, trying to come up with a solution. 
The fundamental problem is that we need low-overhead access to the 
page table in the MMU notifier, without much memory management or 
locking.


There is one "driver lock" that we're supposed to take in the MMU 
notifier as well as when we update page tables that is prescribed by 
the HMM documentation (Documentation/vm/hmm.rst). I don't currently 
see such a lock in amdgpu. We'll probably need to add that anyway, 
with all the usual precautions about lock dependencies around MMU 
notifiers. Then we could use that lock to protect page table residency 
state, in addition to the reservation of the top-level page directory.


We don't want to block eviction of page tables unconditionally, so the 
MMU notifier must be able to deal with the situation that page tables 
are not resident at the moment. But the lock can delay page tables 
from being evicted while an MMU notifier is in progress and protect us 
from race conditions between MMU notifiers invalidating PTEs, and page 
tables getting evicted.


amdgpu_vm_bo_invalidate could detect when a page table is being 
evicted, and update a new "vm_resident" flag inside the amdgpu_vm 
while holding the "HMM driver lock". If an MMU notifier is in 
progress, trying to take the "HMM driver lock" will delay the eviction 
long enough for any pending PTE invalidation to complete.


In the case that page tables are not resident (vm_resident flag is 
false), it means the GPU is currently not accessing any memory in that 
amdgpu_vm address space. So we don't need to invalidate the PTEs right 
away. I think we could implement a deferred invalidation mechanism for 
this case, that delays the invalidation until the next time the page 
tables are made resident. amdgpu_amdkfd_gpuvm_restore_process_bos 
would replay any deferred PTE invalidations after validating the page 
tables but before restarting the user mode queues for the process. If 
graphics ever implements page-fault-based memory management, you'd 
need to do something similar in amdgpu_cs.


Once all that is in place, we should be able to update PTEs in MMU 
notifiers without reserving the page tables.


If we use SDMA for updating page tables, we may need a pre-allocated 
IB for use in MMU notifiers. And there are probably other details to 
be worked out about exactly how we implement the PTE invalidation in 
MMU notifiers and reflect that in the state of the amdgpu_vm and 
amdgpu_bo_va_mapping.


Does this idea sound reasonable to you? Can you think of a simpler 
solution?


Thanks,
  Felix

On 2019-11-27 10:02 a.m., Christian König wrote:

Hi Alejandro,

yes I'm very aware of this issue, but unfortunately can't give an 
easy solution either.


I'm working for over a year now on getting this fixed, but 
unfortunately it turned out that this problem is much bigger than 
initially thought.


Setting the appropriate GFP flags for the job allocation is actually 
the trivial part.


The really really hard thing is that we somehow need to add a lock to 
prevent the page tables from being evicted. And as you also figured 
out that lock can't be taken easily anywhere else.


I've already wrote a prototype for this, but didn't had time to 
hammer it into shape for upstreaming yet.


Regards,
Christian.

On 27.11.19 at 15:55, Sierra Guiza, Alejandro (Alex) wrote:


Hi Christian,

As you know, we're working on the HMM enablement. I'm working on the 
dGPU page table entry invalidation for the userptr mapping case. 
Currently, the MMU notifier handler stops all user mode queues, 
schedules a delayed worker to re-validate userptr mappings, and 
restarts the queues.


Part of the HMM functionality, we need to invalidate the page table 
entries instead of stopping the queues. At the same time we need to 
move the revalidation of the userptr mappings into the page fault 
handler.


We're seeing a deadlock warning after we try to invalidate the PTEs 
inside the MMU notifier handler. More specifically, when we try to 
update the BOs to invalidate PTEs using amdgpu_vm_bo_update. This 
uses kmalloc in amdgpu_job_alloc, which seems to be causing this 
problem.


Based on @Kuehling, Felix  comments, 
kmalloc without any special 

RE: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

2019-11-28 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Liu, Monk  
Sent: November 29, 2019 14:43
To: Yuan, Xiaojie ; amd-gfx@lists.freedesktop.org; 
Koenig, Christian ; Deucher, Alexander 
; Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

@Koenig, Christian @Deucher, Alexander@Zhang, Hawking

Can you guys help to review the patch for me? Thanks to Xiaojie, who helped 
verify the S3 issue by testing against this patch.

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Yuan, Xiaojie  
Sent: Friday, November 29, 2019 2:31 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

[AMD Official Use Only - Internal Distribution Only]

Tested on navi14; s3/baco works fine with the patch applied.

Reviewed-by: Xiaojie Yuan 

BR,
Xiaojie


From: amd-gfx  on behalf of Monk Liu 

Sent: Thursday, November 28, 2019 2:57 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

The kernel would report a warning about a double unpin on the csb BO because we 
unpin it during hw_fini, but we actually don't need to pin/unpin it during 
hw_init/fini since it is created kernel-pinned.

v2:
do get_csb in init_rlc so hw_init() will bring the CSIB content back even after 
reset or S3.
Take care of gfx7/8 as well.

v3:
use bo_create_kernel instead of bo_create_reserved for the CSB, otherwise the 
bo_free_kernel() on the CSB is mismatched and would lead to its internal 
reservation staying pending forever.

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 58 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 40 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 40 +--
 5 files changed, 6 insertions(+), 144 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..6373bfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -124,13 +124,12 @@ int amdgpu_gfx_rlc_init_sr(struct amdgpu_device *adev, 
u32 dws)
  */
 int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
 {
-   volatile u32 *dst_ptr;
u32 dws;
int r;

/* allocate clear state block */
adev->gfx.rlc.clear_state_size = dws = 
adev->gfx.rlc.funcs->get_csb_size(adev);
-   r = amdgpu_bo_create_reserved(adev, dws * 4, PAGE_SIZE,
+   r = amdgpu_bo_create_kernel(adev, dws * 4, PAGE_SIZE,
  AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.rlc.clear_state_obj,
  &adev->gfx.rlc.clear_state_gpu_addr,
@@ -141,13 +140,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
return r;
}

-   /* set up the cs buffer */
-   dst_ptr = adev->gfx.rlc.cs_ptr;
-   adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
-   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7372904..7703b25 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -991,39 +991,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)
return 0;
 }

-static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-   if (unlikely(r != 0))
-   return r;
-
-   r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-   AMDGPU_GEM_DOMAIN_VRAM);
-   if (!r)
-   adev->gfx.rlc.clear_state_gpu_addr =
-   amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-   return r;
-}
-
-static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
-
-   if (!adev->gfx.rlc.clear_state_obj)
-   return;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
-   if (likely(r == 0)) {
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-   }
-}
-
 static void gfx_v10_0_mec_fini(struct amdgpu_device *adev)
 {
	amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
@@ -1785,25 +1752,7 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,

 static int gfx_v10_0_init_csb(struct amdgpu_device *adev)  

Re: [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit

2019-11-28 Thread Kenny Ho
On Tue, Oct 1, 2019 at 10:30 AM Michal Koutný  wrote:
> On Thu, Aug 29, 2019 at 02:05:24AM -0400, Kenny Ho  wrote:
> > drm.buffer.default
> > A read-only flat-keyed file which exists on the root cgroup.
> > Each entry is keyed by the drm device's major:minor.
> >
> > Default limits on the total GEM buffer allocation in bytes.
> What is the purpose of this attribute (and alikes for other resources)?
> I can't see it being set differently but S64_MAX in
> drmcg_device_early_init.

cgroup has a number of conventions, and one of them is the idea of a
default.  The idea here is to allow for device-specific defaults.  For
this specific resource, I can probably not expose it since it's not
particularly useful, but for other resources (such as the lgpu
resource) the concept of a default is useful (for example, different
devices can have different numbers of lgpus.)


> > +static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > [...]
> > + switch (type) {
> > + case DRMCG_TYPE_BO_TOTAL:
> > + p_max = parent == NULL ? S64_MAX :
> > + parent->dev_resources[minor]->
> > + bo_limits_total_allocated;
> > +
> > + rc = drmcg_process_limit_s64_val(sattr, true,
> > + props->bo_limits_total_allocated_default,
> > + p_max,
> > + );
> IIUC, this allows initiating the particular limit value based either on
> parent or the default per-device value. This is alas rather an
> antipattern. The most stringent limit on the path from a cgroup to the
> root should be applied at the charging time. However, the child should
> not inherit the verbatim value from the parent (may race with parent and
> it won't be updated upon parent change).
I think this was a mistake during one of my refactors, when I shrunk the
critical section protected by a mutex a bit too much.  But you are
right in the sense that I don't propagate the limits downward to the
children when the parent's limit is updated.  But from the user
interface perspective, wouldn't this be confusing?  When a sysadmin
sets a limit using the 'max' keyword, the value would be a global one
even though the actual allowable maximum for the particular cgroup is
less in reality because of the ancestor cgroups.  (If this is the
established norm, I am OK to go along, but it seems confusing to me.)  I
am probably missing something, because as I implemented this, the 'max'
and 'default' semantics have been confusing to me, especially for
children cgroups, due to the context of the ancestors.

> You already do the appropriate hierarchical check in
> drmcg_try_chb_bo_alloc, so the parent propagation could be simply
> dropped if I'm not mistaken.
I will need to double check.  But I think interaction between parent
and children (or perhaps between siblings) will be needed eventually,
because there seems to be a desire to implement a "weight" type of
resource.  Also, from a performance perspective, wouldn't it make more
sense to make sure the limits are set correctly during configuration
than to have to check all the cgroups up through the parents?  I don't
have comprehensive knowledge of the implementation of other cgroup
controllers, so if more experienced folks can comment that would be
great.  (Although I probably should just do one approach instead of
doing both... or 1.5.)

>
> Also, I can't find how the read of
> parent->dev_resources[minor]->bo_limits_total_allocated and its
> concurrent update are synchronized (i.e. someone writing
> buffer.total.max for parent and child in parallel). (It may just my
> oversight.)
This is probably the refactor mistake I mentioned earlier.

Regards,
Kenny

RE: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

2019-11-28 Thread Liu, Monk
@Koenig, Christian @Deucher, Alexander@Zhang, Hawking

Can you guys help to review the patch for me? Thanks to Xiaojie, who helped 
verify the S3 issue by testing against this patch.

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Yuan, Xiaojie  
Sent: Friday, November 29, 2019 2:31 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

[AMD Official Use Only - Internal Distribution Only]

Tested on navi14; s3/baco works fine with the patch applied.

Reviewed-by: Xiaojie Yuan 

BR,
Xiaojie


From: amd-gfx  on behalf of Monk Liu 

Sent: Thursday, November 28, 2019 2:57 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

The kernel would report a warning about a double unpin on the csb BO because we 
unpin it during hw_fini, but we actually don't need to pin/unpin it during 
hw_init/fini since it is created kernel-pinned.

v2:
do get_csb in init_rlc so hw_init() will bring the CSIB content back even after 
reset or S3.
Take care of gfx7/8 as well.

v3:
use bo_create_kernel instead of bo_create_reserved for the CSB, otherwise the 
bo_free_kernel() on the CSB is mismatched and would lead to its internal 
reservation staying pending forever.

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 58 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 40 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 40 +--
 5 files changed, 6 insertions(+), 144 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..6373bfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -124,13 +124,12 @@ int amdgpu_gfx_rlc_init_sr(struct amdgpu_device *adev, 
u32 dws)
  */
 int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
 {
-   volatile u32 *dst_ptr;
u32 dws;
int r;

/* allocate clear state block */
adev->gfx.rlc.clear_state_size = dws = 
adev->gfx.rlc.funcs->get_csb_size(adev);
-   r = amdgpu_bo_create_reserved(adev, dws * 4, PAGE_SIZE,
+   r = amdgpu_bo_create_kernel(adev, dws * 4, PAGE_SIZE,
  AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.rlc.clear_state_obj,
  &adev->gfx.rlc.clear_state_gpu_addr,
@@ -141,13 +140,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
return r;
}

-   /* set up the cs buffer */
-   dst_ptr = adev->gfx.rlc.cs_ptr;
-   adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
-   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7372904..7703b25 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -991,39 +991,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)
return 0;
 }

-static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev) -{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-   if (unlikely(r != 0))
-   return r;
-
-   r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-   AMDGPU_GEM_DOMAIN_VRAM);
-   if (!r)
-   adev->gfx.rlc.clear_state_gpu_addr =
-   amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-   return r;
-}
-
-static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev) -{
-   int r;
-
-   if (!adev->gfx.rlc.clear_state_obj)
-   return;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
-   if (likely(r == 0)) {
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-   }
-}
-
 static void gfx_v10_0_mec_fini(struct amdgpu_device *adev)  {
amdgpu_bo_free_kernel(>gfx.mec.hpd_eop_obj, NULL, NULL); @@ 
-1785,25 +1752,7 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,

 static int gfx_v10_0_init_csb(struct amdgpu_device *adev)  {
-   int r;
-
-   if (adev->in_gpu_reset) {
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
-  (void **)>gfx.rlc.cs_ptr);
-   if (!r) {
-  

Re: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

2019-11-28 Thread Yuan, Xiaojie
[AMD Official Use Only - Internal Distribution Only]

Tested on navi14 and s3/baco works fine with patch applied.

Reviewed-by: Xiaojie Yuan 

BR,
Xiaojie


From: amd-gfx  on behalf of Monk Liu 

Sent: Thursday, November 28, 2019 2:57 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH] drm/amdgpu: fix calltrace during kmd unload(v2)

kernel would report a warning on a double unpin
of the csb BO, because we unpin it during hw_fini;
but actually we don't need to pin/unpin it during
hw_init/fini since it is created kernel-pinned

v2:
call get_csb in init_rlc so hw_init() brings the CSIB content
back even after reset or S3.
take care of gfx7/8 as well

v3:
use bo_create_kernel instead of bo_create_reserved for the CSB;
otherwise the bo_free_kernel() on the CSB does not pair up and
would leave its internal reservation pending there forever

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 58 +
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 40 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 40 +--
 5 files changed, 6 insertions(+), 144 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..6373bfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -124,13 +124,12 @@ int amdgpu_gfx_rlc_init_sr(struct amdgpu_device *adev, 
u32 dws)
  */
 int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
 {
-   volatile u32 *dst_ptr;
u32 dws;
int r;

/* allocate clear state block */
adev->gfx.rlc.clear_state_size = dws = 
adev->gfx.rlc.funcs->get_csb_size(adev);
-   r = amdgpu_bo_create_reserved(adev, dws * 4, PAGE_SIZE,
+   r = amdgpu_bo_create_kernel(adev, dws * 4, PAGE_SIZE,
  AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.rlc.clear_state_obj,
  &adev->gfx.rlc.clear_state_gpu_addr,
@@ -141,13 +140,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
return r;
}

-   /* set up the cs buffer */
-   dst_ptr = adev->gfx.rlc.cs_ptr;
-   adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
-   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7372904..7703b25 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -991,39 +991,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)
return 0;
 }

-static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
-{
-   int r;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-   if (unlikely(r != 0))
-   return r;
-
-   r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-   AMDGPU_GEM_DOMAIN_VRAM);
-   if (!r)
-   adev->gfx.rlc.clear_state_gpu_addr =
-   amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-   return r;
-}
-
-static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
-{
-   int r;
-
-   if (!adev->gfx.rlc.clear_state_obj)
-   return;
-
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
-   if (likely(r == 0)) {
-   amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-   }
-}
-
 static void gfx_v10_0_mec_fini(struct amdgpu_device *adev)
 {
amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
@@ -1785,25 +1752,7 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,

 static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 {
-   int r;
-
-   if (adev->in_gpu_reset) {
-   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
-  (void **)>gfx.rlc.cs_ptr);
-   if (!r) {
-   adev->gfx.rlc.funcs->get_csb_buffer(adev,
-   adev->gfx.rlc.cs_ptr);
-   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-   }
-
-   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-   if (r)
-   return r;
-   }
+   adev->gfx.rlc.funcs->get_csb_buffer(adev, adev->gfx.rlc.cs_ptr);

/* csib */
WREG32_SOC15(GC, 0, 

Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-11-28 Thread Kenny Ho
On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný  wrote:
> On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho  wrote:
> > +struct cgroup_subsys drm_cgrp_subsys = {
> > + .css_alloc  = drmcg_css_alloc,
> > + .css_free   = drmcg_css_free,
> > + .early_init = false,
> > + .legacy_cftypes = files,
> Do you really want to expose the DRM controller on v1 hierarchies (where
> threads of one process can be in different cgroups, or children cgroups
> compete with their parents)?

(Sorry for the delay, I have been distracted by something else.)
Yes, I am hoping to make the functionality as widely available as
possible since the ecosystem is still transitioning to v2.  Do you see
inherent problem with this approach?

Regards,
Kenny


>
> > + .dfl_cftypes= files,
> > +};
>
> Just asking,
> Michal
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup

2019-11-28 Thread Kenny Ho
Reducing audience since this is AMD specific.

On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix  wrote:
>
> On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > The number of logical gpu (lgpu) is defined to be the number of compute
> > unit (CU) for a device.  The lgpu allocation limit only applies to
> > compute workload for the moment (enforced via kfd queue creation.)  Any
> > cu_mask update is validated against the availability of the compute unit
> > as defined by the drmcg the kfd process belongs to.
>
> There is something missing here. There is an API for the application to
> specify a CU mask. Right now it looks like the application-specified and
> CGroup-specified CU masks would clobber each other. Instead the two
> should be merged.
>
> The CGroup-specified mask should specify a subset of CUs available for
> application-specified CU masks. When the cgroup CU mask changes, you'd
> need to take any application-specified CU masks into account before
> updating the hardware.
The idea behind the current implementation is to give sysadmin
priority over user application (as that is the definition of control
group.)  A mask specified by the application/user is validated by
pqm_drmcg_lgpu_validate and rejected with EACCES if it is not
compatible.  The alternative is to ignore the difference and have the
kernel guess/redistribute the assignment but I am not sure if this is
a good approach since there is not enough information to allow the
kernel to guess the user's intention correctly and consistently.  (This is
based on multiple conversations with you and Joe that led me to
believe there are situations where spreading CU assignment across
multiple SEs is a good thing, but not always.)

If the cgroup-specified mask is changed after the application has set
the mask, the intersection of the two masks will be set instead.  It
is possible to have no intersection and in this case no CU is made
available to the application (just like the possibility for memcgroup
to starve the amount of memory needed by an application.)

> The KFD topology APIs report the number of available CUs to the
> application. CGroups would change that number at runtime and
> applications would not expect that. I think the best way to deal with
> that would be to have multiple bits in the application-specified CU mask
> map to the same CU. How to do that in a fair way is not obvious. I guess
> a more coarse-grain division of the GPU into LGPUs would make this
> somewhat easier.
Another possibility is to add namespace to the topology sysfs such
that the correct number of CUs changes accordingly.  Although that
wouldn't give the user the available mask that is made available by
this implementation via the cgroup sysfs.  Another possibility is to
modify the thunk similar to what was done for device cgroup (device
re-mapping.)

> How is this problem handled for CPU cores and the interaction with CPU
> pthread_setaffinity_np?
Per the documentation of pthread_setaffinity_np, "If the call is
successful, and the thread is not currently running on one of the CPUs
in cpuset, then it is migrated to one of those CPUs."
http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html

Regards,
Kenny



> Regards,
>Felix
>
>
> >
> > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> > Signed-off-by: Kenny Ho 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  21 +++
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   6 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
> >   .../amd/amdkfd/kfd_process_queue_manager.c| 140 ++
> >   5 files changed, 174 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > index 55cb1b2094fd..369915337213 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct 
> > kgd_dev *dst, struct kgd_dev *s
> >   valid;  \
> >   })
> >
> > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > + struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> > + unsigned int nbits);
> > +
> >   /* GPUVM API */
> >   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned 
> > int pasid,
> >   void **vm, void **process_info,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 163a4fbf0611..8abeffdd2e5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device 
> > *dev, unsigned int pipe,
> >   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
> >   struct drmcg_props *props)
> 

RE: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

2019-11-28 Thread Liu, Monk
The content of the CSIB is always static. I submitted a patch to use re-init 
and get rid of the pin/unpin of the CSIB in hw_init/fini (my purpose is to fix the 
double-unpin warning during unload).
_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Christian König  
Sent: Thursday, November 28, 2019 7:51 PM
To: Liu, Monk ; Yuan, Xiaojie ; 
Deucher, Alexander ; Koenig, Christian 

Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

Hi Monk,

if the content of the CSIB is constant then it is certainly better to just 
re-initialize it.

This also prevents corruption due to VRAM loss.

Christian.

Am 28.11.19 um 03:53 schrieb Liu, Monk:
> Hi Xiaojie
>
> For SRIOV we don't use suspend, so I didn't think of that part; thanks for the 
> reminder!
> But we still need to fix this call-trace issue anyway (our Jenkins 
> testing system considers such a call trace an error).
>
> How about we do "adev->gfx.rlc.funcs->get_csb_buffer(adev, 
> dst_ptr);" in hw_init()? This way you don't need to evict the CSIB 
> during suspend, and the CSIB will always be re-initialized after S3 resume.
>
> @Deucher, Alexander @Koenig, Christian what's your opinion ?
> _
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: Tuesday, November 26, 2019 9:10 PM
> To: Liu, Monk 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload
>
> Hi Monk,
>
> hw_fini() is called in the suspend code path as well. I'm wondering how the CSB can 
> be evicted if it's not unpinned before suspend.
>
> BR,
> Xiaojie
>
>> On Nov 26, 2019, at 7:50 PM, Monk Liu  wrote:
>>
>> kernel would report a warning on double unpin on the csb BO because 
>> we unpin it during hw_fini but actually we don't need to pin/unpin it 
>> during hw_init/fini since it is created with kernel pinned
>>
>> remove all those useless code for gfx9/10
>>
>> Signed-off-by: Monk Liu 
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c |  1 - 
>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 38 
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 39 
>> -
>> 3 files changed, 78 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>> index c8793e6..289fada 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
>> @@ -145,7 +145,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
>> dst_ptr = adev->gfx.rlc.cs_ptr;
>> adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
>> amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
>> -amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
>> amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>>
>> return 0;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index a56cba9..5ee7467 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -996,39 +996,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device 
>> *adev)
>> return 0;
>> }
>>
>> -static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
>> -{
>> -int r;
>> -
>> -r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
>> -if (unlikely(r != 0))
>> -return r;
>> -
>> -r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
>> -AMDGPU_GEM_DOMAIN_VRAM);
>> -if (!r)
>> -adev->gfx.rlc.clear_state_gpu_addr =
>> -amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
>> -
>> -amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>> -
>> -return r;
>> -}
>> -
>> -static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
>> -{
>> -int r;
>> -
>> -if (!adev->gfx.rlc.clear_state_obj)
>> -return;
>> -
>> -r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
>> -if (likely(r == 0)) {
>> -amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
>> -amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
>> -}
>> -}
>> -
>> static void gfx_v10_0_mec_fini(struct amdgpu_device *adev)
>> {
>> amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
>> @@ -3780,10 +3747,6 @@ static int gfx_v10_0_hw_init(void *handle)
>> int r;
>> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>
>> -r = gfx_v10_0_csb_vram_pin(adev);
>> -if (r)
>> -return r;
>> -
>> if (!amdgpu_emu_mode)
>> gfx_v10_0_init_golden_registers(adev);
>>
>> @@ -3871,7 +3834,6 @@ static int gfx_v10_0_hw_fini(void *handle)
>> }
>> gfx_v10_0_cp_enable(adev, false);
>> gfx_v10_0_enable_gui_idle_interrupt(adev, false);
>> -gfx_v10_0_csb_vram_unpin(adev);
>>
>> return 0;
>> }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> 

RE: Deadlock on PTEs update for HMM

2019-11-28 Thread Zeng, Oak
[AMD Official Use Only - Internal Distribution Only]

Is kmalloc with GFP_NOWAIT an option here?

Regards,
Oak

From: amd-gfx  On Behalf Of Sierra 
Guiza, Alejandro (Alex)
Sent: Wednesday, November 27, 2019 9:55 AM
To: Koenig, Christian ; Kuehling, Felix 

Cc: amd-gfx@lists.freedesktop.org
Subject: Deadlock on PTEs update for HMM

Hi Christian,
As you know, we're working on the HMM enablement. Im working on the dGPU page 
table entries invalidation on the userptr mapping case. Currently, the MMU 
notifiers handle stops all user mode queues, schedule a delayed worker to 
re-validate userptr mappings and restart the queues.
As part of the HMM functionality, we need to invalidate the page table entries 
instead of stopping the queues. At the same time we need to move the 
revalidation of the userptr mappings into the page fault handler.
We're seeing a deadlock warning after we try to invalidate the PTEs inside the 
MMU notifier handler. More specific, when we try to update the BOs to 
invalidate PTEs using amdgpu_vm_bo_update. This uses kmalloc on the 
amdgpu_job_alloc which seems to be causing this problem.
Based on @Kuehling, Felix comments, kmalloc 
without any special flags can cause memory reclaim. Doing that inside an MMU 
notifier is problematic, because an MMU notifier may be called inside a 
memory-reclaim operation itself. That would result in recursion. Also, reclaim 
shouldn't be done while holding a lock that can be taken in an MMU notifier for 
the same reason. If you cause a reclaim while holding that lock, then an MMU 
notifier called by the reclaim can deadlock trying to take the same lock.
Please let us know if you have any advice on how to enable this the right way.

Thanks in advance,
Alejandro

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Deadlock on PTEs update for HMM

2019-11-28 Thread Felix Kuehling

Hi Christian,

I'm thinking about this problem, trying to come up with a solution. The 
fundamental problem is that we need low-overhead access to the page 
table in the MMU notifier, without much memory management or locking.


There is one "driver lock" that we're supposed to take in the MMU 
notifier as well as when we update page tables that is prescribed by the 
HMM documentation (Documentation/vm/hmm.rst). I don't currently see such 
a lock in amdgpu. We'll probably need to add that anyway, with all the 
usual precautions about lock dependencies around MMU notifiers. Then we 
could use that lock to protect page table residency state, in addition 
to the reservation of the top-level page directory.


We don't want to block eviction of page tables unconditionally, so the 
MMU notifier must be able to deal with the situation that page tables 
are not resident at the moment. But the lock can delay page tables from 
being evicted while an MMU notifier is in progress and protect us from 
race conditions between MMU notifiers invalidating PTEs, and page tables 
getting evicted.


amdgpu_vm_bo_invalidate could detect when a page table is being evicted, 
and update a new "vm_resident" flag inside the amdgpu_vm while holding 
the "HMM driver lock". If an MMU notifier is in progress, trying to take 
the "HMM driver lock" will delay the eviction long enough for any 
pending PTE invalidation to complete.


In the case that page tables are not resident (vm_resident flag is 
false), it means the GPU is currently not accessing any memory in that 
amdgpu_vm address space. So we don't need to invalidate the PTEs right 
away. I think we could implement a deferred invalidation mechanism for 
this case, that delays the invalidation until the next time the page 
tables are made resident. amdgpu_amdkfd_gpuvm_restore_process_bos would 
replay any deferred PTE invalidations after validating the page tables 
but before restarting the user mode queues for the process. If graphics 
ever implements page-fault-based memory management, you'd need to do 
something similar in amdgpu_cs.


Once all that is in place, we should be able to update PTEs in MMU 
notifiers without reserving the page tables.


If we use SDMA for updating page tables, we may need a pre-allocated IB 
for use in MMU notifiers. And there is problably other details to be 
worked out about exactly how we implement the PTE invalidation in MMU 
notifiers and reflecting that in the state of the amdgpu_vm and 
amdgpu_bo_va_mapping.


Does this idea sound reasonable to you? Can you think of a simpler solution?

Thanks,
  Felix

On 2019-11-27 10:02 a.m., Christian König wrote:

Hi Alejandro,

yes I'm very aware of this issue, but unfortunately can't give an easy 
solution either.


I'm working for over a year now on getting this fixed, but 
unfortunately it turned out that this problem is much bigger than 
initially thought.


Setting the appropriate GFP flags for the job allocation is actually 
the trivial part.


The really really hard thing is that we somehow need to add a lock to 
prevent the page tables from being evicted. And as you also figured 
out that lock can't be taken easily anywhere else.


I've already written a prototype for this, but didn't have time to hammer 
it into shape for upstreaming yet.


Regards,
Christian.

Am 27.11.19 um 15:55 schrieb Sierra Guiza, Alejandro (Alex):


Hi Christian,

As you know, we’re working on the HMM enablement. Im working on the 
dGPU page table entries invalidation on the userptr mapping case. 
Currently, the MMU notifiers handle stops all user mode queues, 
schedule a delayed worker to re-validate userptr mappings and restart 
the queues.


Part of the HMM functionality, we need to invalidate the page table 
entries instead of stopping the queues. At the same time we need to 
move the revalidation of the userptr mappings into the page fault 
handler.


We’re seeing a deadlock warning after we try to invalidate the PTEs 
inside the MMU notifier handler. More specific, when we try to update 
the BOs to invalidate PTEs using amdgpu_vm_bo_update. This uses 
kmalloc on the amdgpu_job_alloc which seems to be causing this problem.


Based on @Kuehling, Felix  comments, 
kmalloc without any special flags can cause memory reclaim. Doing 
that inside an MMU notifier is problematic, because an MMU notifier 
may be called inside a memory-reclaim operation itself. That would 
result in recursion. Also, reclaim shouldn't be done while holding a 
lock that can be taken in an MMU notifier for the same reason. If you 
cause a reclaim while holding that lock, then an MMU notifier called 
by the reclaim can deadlock trying to take the same lock.


Please let us know if you have any advice to enable this the right way

Thanks in advance,

Alejandro




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Load TA firmware for navi10/12/14

2019-11-28 Thread Harry Wentland


On 2019-11-28 3:20 p.m., Bhawanpreet Lakha wrote:
> Hey Harry can you please take a look thanks
> 
> On 2019-11-08 5:01 p.m., Deucher, Alexander wrote:
>> Acked-by: Alex Deucher 
>> 
>> *From:* amd-gfx  on behalf of
>> Bhawanpreet Lakha 
>> *Sent:* Friday, November 8, 2019 4:57 PM
>> *To:* amd-gfx@lists.freedesktop.org 
>> *Cc:* Lakha, Bhawanpreet 
>> *Subject:* [PATCH] drm/amd/display: Load TA firmware for navi10/12/14
>>  
>> load the ta firmware for navi10/12/14.
>> This is already being done for raven/picasso and
>> is needed for supporting hdcp on navi
>>
>> Signed-off-by: Bhawanpreet Lakha 

Reviewed-by: Harry Wentland 

Harry

>> ---
>>  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 25 +
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>> b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>> index ffeaa2f5588d..daf11be5416f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
>> @@ -186,6 +186,31 @@ static int psp_v11_0_init_microcode(struct
>> psp_context *psp)
>>  case CHIP_NAVI10:
>>  case CHIP_NAVI14:
>>  case CHIP_NAVI12:
>> +   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ta.bin",
>> chip_name);
>> +   err = request_firmware(&adev->psp.ta_fw, fw_name,
>> adev->dev);
>> +   if (err) {
>> +   release_firmware(adev->psp.ta_fw);
>> +   adev->psp.ta_fw = NULL;
>> +   dev_info(adev->dev,
>> +    "psp v11.0: Failed to load firmware
>> \"%s\"\n", fw_name);
>> +   } else {
>> +   err = amdgpu_ucode_validate(adev->psp.ta_fw);
>> +   if (err)
>> +   goto out2;
>> +
>> +   ta_hdr = (const struct ta_firmware_header_v1_0
>> *)adev->psp.ta_fw->data;
>> +   adev->psp.ta_hdcp_ucode_version =
>> le32_to_cpu(ta_hdr->ta_hdcp_ucode_version);
>> +   adev->psp.ta_hdcp_ucode_size =
>> le32_to_cpu(ta_hdr->ta_hdcp_size_bytes);
>> +   adev->psp.ta_hdcp_start_addr = (uint8_t *)ta_hdr +
>> +  
>> le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
>> +
>> +   adev->psp.ta_fw_version =
>> le32_to_cpu(ta_hdr->header.ucode_version);
>> +
>> +   adev->psp.ta_dtm_ucode_version =
>> le32_to_cpu(ta_hdr->ta_dtm_ucode_version);
>> +   adev->psp.ta_dtm_ucode_size =
>> le32_to_cpu(ta_hdr->ta_dtm_size_bytes);
>> +   adev->psp.ta_dtm_start_addr = (uint8_t
>> *)adev->psp.ta_hdcp_start_addr +
>> +   le32_to_cpu(ta_hdr->ta_dtm_offset_bytes);
>> +   }
>>  break;
>>  default:
>>  BUG();
>> -- 
>> 2.17.1
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Load TA firmware for navi10/12/14

2019-11-28 Thread Bhawanpreet Lakha

Hey Harry, can you please take a look? Thanks.

On 2019-11-08 5:01 p.m., Deucher, Alexander wrote:

Acked-by: Alex Deucher 

*From:* amd-gfx  on behalf of 
Bhawanpreet Lakha 

*Sent:* Friday, November 8, 2019 4:57 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Lakha, Bhawanpreet 
*Subject:* [PATCH] drm/amd/display: Load TA firmware for navi10/12/14
load the ta firmware for navi10/12/14.
This is already being done for raven/picasso and
is needed for supporting hdcp on navi

Signed-off-by: Bhawanpreet Lakha 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index ffeaa2f5588d..daf11be5416f 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -186,6 +186,31 @@ static int psp_v11_0_init_microcode(struct psp_context *psp)
 case CHIP_NAVI10:
 case CHIP_NAVI14:
 case CHIP_NAVI12:
+   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ta.bin", chip_name);
+   err = request_firmware(&adev->psp.ta_fw, fw_name, adev->dev);
+   if (err) {
+   release_firmware(adev->psp.ta_fw);
+   adev->psp.ta_fw = NULL;
+   dev_info(adev->dev,
+    "psp v11.0: Failed to load firmware \"%s\"\n", fw_name);
+   } else {
+   err = amdgpu_ucode_validate(adev->psp.ta_fw);
+   if (err)
+   goto out2;
+
+   ta_hdr = (const struct ta_firmware_header_v1_0 *)adev->psp.ta_fw->data;
+   adev->psp.ta_hdcp_ucode_version = le32_to_cpu(ta_hdr->ta_hdcp_ucode_version);
+   adev->psp.ta_hdcp_ucode_size = le32_to_cpu(ta_hdr->ta_hdcp_size_bytes);
+   adev->psp.ta_hdcp_start_addr = (uint8_t *)ta_hdr +
+   le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
+
+   adev->psp.ta_fw_version = le32_to_cpu(ta_hdr->header.ucode_version);
+
+   adev->psp.ta_dtm_ucode_version = le32_to_cpu(ta_hdr->ta_dtm_ucode_version);
+   adev->psp.ta_dtm_ucode_size = le32_to_cpu(ta_hdr->ta_dtm_size_bytes);
+   adev->psp.ta_dtm_start_addr = (uint8_t *)adev->psp.ta_hdcp_start_addr +
+   le32_to_cpu(ta_hdr->ta_dtm_offset_bytes);
+   }
 break;
 default:
 BUG();
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Drop AMD_EDID_UTILITY defines

2019-11-28 Thread Kazlauskas, Nicholas

On 2019-11-28 11:33 a.m., Harry Wentland wrote:

We don't use this upstream in the Linux kernel.

Signed-off-by: Harry Wentland 


Reviewed-by: Nicholas Kazlauskas 


---
  drivers/gpu/drm/amd/display/dc/dc_dsc.h  | 2 --
  drivers/gpu/drm/amd/display/dc/dc_hw_types.h | 8 
  drivers/gpu/drm/amd/display/dc/dc_types.h| 4 
  3 files changed, 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_dsc.h 
b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
index a782ae18a1c5..cc9915e545cd 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dsc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
@@ -41,10 +41,8 @@ struct dc_dsc_bw_range {
  
  struct display_stream_compressor {

const struct dsc_funcs *funcs;
-#ifndef AMD_EDID_UTILITY
struct dc_context *ctx;
int inst;
-#endif
  };
  
  bool dc_dsc_parse_dsc_dpcd(const uint8_t *dpcd_dsc_basic_data,

diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index 86043d431d40..25c50bcab9e9 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -26,8 +26,6 @@
  #ifndef DC_HW_TYPES_H
  #define DC_HW_TYPES_H
  
-#ifndef AMD_EDID_UTILITY

-
  #include "os_types.h"
  #include "fixed31_32.h"
  #include "signal_types.h"
@@ -584,8 +582,6 @@ struct scaling_taps {
bool integer_scaling;
  };
  
-#endif /* AMD_EDID_UTILITY */

-
  enum dc_timing_standard {
DC_TIMING_STANDARD_UNDEFINED,
DC_TIMING_STANDARD_DMT,
@@ -742,8 +738,6 @@ struct dc_crtc_timing {
struct dc_dsc_config dsc_cfg;
  };
  
-#ifndef AMD_EDID_UTILITY

-
  enum trigger_delay {
TRIGGER_DELAY_NEXT_PIXEL = 0,
TRIGGER_DELAY_NEXT_LINE,
@@ -837,7 +831,5 @@ struct tg_color {
uint16_t color_b_cb;
  };
  
-#endif /* AMD_EDID_UTILITY */

-
  #endif /* DC_HW_TYPES_H */
  
diff --git a/drivers/gpu/drm/amd/display/dc/dc_types.h b/drivers/gpu/drm/amd/display/dc/dc_types.h

index 1363e8907fbf..2b92bfa28bde 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_types.h
@@ -25,7 +25,6 @@
  #ifndef DC_TYPES_H_
  #define DC_TYPES_H_
  
-#ifndef AMD_EDID_UTILITY

  /* AND EdidUtility only needs a portion
   * of this file, including the rest only
   * causes additional issues.
@@ -781,9 +780,6 @@ struct dc_clock_config {
uint32_t current_clock_khz;/*current clock in use*/
  };
  
-#endif /*AMD_EDID_UTILITY*/

-//AMD EDID UTILITY does not need any of the above structures
-
  /* DSC DPCD capabilities */
  union dsc_slice_caps1 {
struct {



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/display: Drop AMD_EDID_UTILITY defines

2019-11-28 Thread Harry Wentland
We don't use this upstream in the Linux kernel.

Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dc_dsc.h  | 2 --
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h | 8 
 drivers/gpu/drm/amd/display/dc/dc_types.h| 4 
 3 files changed, 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_dsc.h 
b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
index a782ae18a1c5..cc9915e545cd 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dsc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dsc.h
@@ -41,10 +41,8 @@ struct dc_dsc_bw_range {
 
 struct display_stream_compressor {
const struct dsc_funcs *funcs;
-#ifndef AMD_EDID_UTILITY
struct dc_context *ctx;
int inst;
-#endif
 };
 
 bool dc_dsc_parse_dsc_dpcd(const uint8_t *dpcd_dsc_basic_data,
diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index 86043d431d40..25c50bcab9e9 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -26,8 +26,6 @@
 #ifndef DC_HW_TYPES_H
 #define DC_HW_TYPES_H
 
-#ifndef AMD_EDID_UTILITY
-
 #include "os_types.h"
 #include "fixed31_32.h"
 #include "signal_types.h"
@@ -584,8 +582,6 @@ struct scaling_taps {
bool integer_scaling;
 };
 
-#endif /* AMD_EDID_UTILITY */
-
 enum dc_timing_standard {
DC_TIMING_STANDARD_UNDEFINED,
DC_TIMING_STANDARD_DMT,
@@ -742,8 +738,6 @@ struct dc_crtc_timing {
struct dc_dsc_config dsc_cfg;
 };
 
-#ifndef AMD_EDID_UTILITY
-
 enum trigger_delay {
TRIGGER_DELAY_NEXT_PIXEL = 0,
TRIGGER_DELAY_NEXT_LINE,
@@ -837,7 +831,5 @@ struct tg_color {
uint16_t color_b_cb;
 };
 
-#endif /* AMD_EDID_UTILITY */
-
 #endif /* DC_HW_TYPES_H */
 
diff --git a/drivers/gpu/drm/amd/display/dc/dc_types.h b/drivers/gpu/drm/amd/display/dc/dc_types.h
index 1363e8907fbf..2b92bfa28bde 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_types.h
@@ -25,7 +25,6 @@
 #ifndef DC_TYPES_H_
 #define DC_TYPES_H_
 
-#ifndef AMD_EDID_UTILITY
 /* AND EdidUtility only needs a portion
  * of this file, including the rest only
  * causes additional issues.
@@ -781,9 +780,6 @@ struct dc_clock_config {
uint32_t current_clock_khz;/*current clock in use*/
 };
 
-#endif /*AMD_EDID_UTILITY*/
-//AMD EDID UTILITY does not need any of the above structures
-
 /* DSC DPCD capabilities */
 union dsc_slice_caps1 {
struct {
-- 
2.24.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
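As an aside, the pattern being dropped above — a guard macro that lets a stripped-down tool compile only part of a shared header, at the cost of struct members appearing and disappearing between builds — can be sketched in isolation. Names here are illustrative, not the actual DC headers:

```c
/* Sketch of the removed pattern: when EDID_TOOL_BUILD is defined,
 * driver-only members disappear from the struct, so the struct layout
 * differs between the two builds (made-up names, not the DC code). */
#ifndef EDID_TOOL_BUILD
#define TOY_DRIVER_BUILD 1
#else
#define TOY_DRIVER_BUILD 0
#endif

struct toy_compressor {
	int inst;
#if TOY_DRIVER_BUILD
	void *ctx;              /* only present in the full driver build */
#endif
};

int toy_driver_build(void)
{
	return TOY_DRIVER_BUILD;
}
```

Dropping the guard means there is only one layout to reason about, which is why removing an unused build mode simplifies the headers.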

Re: [PATCH 0/4] drm/amd/display: Remove unneeded semicolon

2019-11-28 Thread Harry Wentland
Series is
Reviewed-by: Harry Wentland 

Harry

On 2019-11-27 9:31 p.m., zhengbin wrote:
> zhengbin (4):
>   drm/amd/display: Remove unneeded semicolon in bios_parser.c
>   drm/amd/display: Remove unneeded semicolon in bios_parser2.c
>   drm/amd/display: Remove unneeded semicolon in hdcp.c
>   drm/amd/display: Remove unneeded semicolon in display_rq_dlg_calc_21.c
> 
>  drivers/gpu/drm/amd/display/dc/bios/bios_parser.c | 2 +-
>  drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c| 2 +-
>  drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c | 4 ++--
>  drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c   | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> --
> 2.7.4
> 

Re: [PATCH] drm/amd/display: Fix wrongly passed static prefix

2019-11-28 Thread Harry Wentland
On 2019-11-28 3:27 a.m., Takashi Iwai wrote:
> Currently, gcc spews a warning as:
>   drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hubbub.c: In function 
> ‘hubbub1_verify_allow_pstate_change_high’:
>   ./include/drm/drm_print.h:316:2: warning: ‘debug_data’ may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
> 
> This is because the code checks against a static value although it's
> basically a constant and guaranteed to be set.
> 
> This patch changes the type prefix from static to const for addressing
> the compile warning above and also for letting the compiler optimize
> better.
> 
> Fixes: 62d591a8e00c ("drm/amd/display: create new files for hubbub functions")
> Signed-off-by: Takashi Iwai 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
> index a02c10e23e0d..b5c44c3bdb98 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
> @@ -128,8 +128,8 @@ bool hubbub1_verify_allow_pstate_change_high(
>* pstate takes around ~100us on linux. Unknown currently as to
>* why it takes that long on linux
>*/
> - static unsigned int pstate_wait_timeout_us = 200;
> - static unsigned int pstate_wait_expected_timeout_us = 40;
> + const unsigned int pstate_wait_timeout_us = 200;
> + const unsigned int pstate_wait_expected_timeout_us = 40;
>   static unsigned int max_sampled_pstate_wait_us; /* data collection */
>   static bool forced_pstate_allow; /* help with revert wa */
>  
> 
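The storage-class distinction behind this fix can be sketched outside the kernel: a block-scope `static` has static storage duration (one shared object, initialized once), while a `const` local is a fresh read-only per-call object the compiler may fold to a compile-time constant — which is what silences the bogus `-Wmaybe-uninitialized` warning. Names below are illustrative, not the hubbub code:

```c
/* A 'static' local has static storage duration: one object shared by
 * all calls, zero-initialized once.  A 'const' local is a read-only
 * per-call object that can be constant-folded by the compiler. */
int bump_static(void)
{
	static int counter;     /* initialized once, persists across calls */
	return ++counter;
}

int bump_const(void)
{
	const int start = 0;    /* conceptually re-created on every call */
	return start + 1;
}
```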

Re: [PATCH v2] drm/dsc: Return unsigned long on compute offset

2019-11-28 Thread Jani Nikula
On Fri, 22 Nov 2019,  wrote:
> From: Mikita Lipski 
>
> We shouldn't compare int with unsigned long to find the max value
> and since we are not expecting negative value returned from
> compute_offset we should make this function return unsigned long
> so we can compare the values when computing rc parameters.
>
> v2: Modified function parameters to unsigned type for type
> consistency

I don't think that really addresses the review.

But all the same, current drm-tip does not have a compute_offset()
function, and the only references to it in my email are your patches, so
this is completely unactionable anyway.

In any case I think the root cause for your issues is the unfounded use
of unsigned longs in drm_dsc_compute_rc_parameters(). Fix that and many
of your problems go away.

BR,
Jani.

>
> Cc: Ville Syrjälä 
> Cc: Nikola Cornij 
> Cc: Harry Wentland 
> Signed-off-by: Mikita Lipski 
> ---
>  drivers/gpu/drm/drm_dsc.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_dsc.c b/drivers/gpu/drm/drm_dsc.c
> index 74f3527f567d..ccce0297da64 100644
> --- a/drivers/gpu/drm/drm_dsc.c
> +++ b/drivers/gpu/drm/drm_dsc.c
> @@ -245,11 +245,11 @@ void drm_dsc_pps_payload_pack(struct drm_dsc_picture_parameter_set *pps_payload,
>  }
>  EXPORT_SYMBOL(drm_dsc_pps_payload_pack);
>  
> -static int compute_offset(struct drm_dsc_config *vdsc_cfg, int pixels_per_group,
> -		int groups_per_line, int grpcnt)
> +static unsigned long compute_offset(struct drm_dsc_config *vdsc_cfg, unsigned int pixels_per_group,
> +		unsigned long groups_per_line, unsigned long grpcnt)
>  {
> -	int offset = 0;
> -	int grpcnt_id = DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay, pixels_per_group);
> +	unsigned long offset = 0;
> +	unsigned long grpcnt_id = DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay, pixels_per_group);
>  
>  	if (grpcnt <= grpcnt_id)
>  		offset = DIV_ROUND_UP(grpcnt * pixels_per_group * vdsc_cfg->bits_per_pixel, 16);

-- 
Jani Nikula, Intel Open Source Graphics Center
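The signed/unsigned comparison pitfall Jani refers to reduces to a few lines of plain C: under the usual arithmetic conversions, an `int` compared against an `unsigned long` is converted to `unsigned long` first, so a negative value compares as a huge positive one. Illustrative helper, not the DSC code:

```c
/* In 'a > b', the int operand is converted to unsigned long before the
 * comparison, so a negative 'a' wraps to a huge positive value and the
 * comparison gives a surprising result. */
int mixed_gt(int a, unsigned long b)
{
	return a > b;
}
```

For example, `mixed_gt(-1, 100UL)` is true, because `(unsigned long)-1` is `ULONG_MAX`. This is why mixing `int` and `unsigned long` in max/min computations is dangerous, and why the types should be made consistent at the root rather than papered over.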

Re: [PATCH] drm/amdgpu: move CS secure flag next the structs where it's used

2019-11-28 Thread Christian König

Am 27.11.19 um 21:56 schrieb Alex Deucher:

So it's not mixed up with the CTX stuff.

Signed-off-by: Alex Deucher 


Reviewed-by: Christian König 


---
  include/uapi/drm/amdgpu_drm.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f75c6957064d..918ac3548cd3 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -207,9 +207,6 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_OP_QUERY_STATE 3
  #define AMDGPU_CTX_OP_QUERY_STATE24
  
-/* Flag the command submission as secure */
-#define AMDGPU_CS_FLAGS_SECURE  (1 << 0)
-
  /* GPU reset status */
  #define AMDGPU_CTX_NO_RESET   0
  /* this the context caused it */
@@ -559,6 +556,9 @@ struct drm_amdgpu_cs_chunk {
__u64   chunk_data;
  };
  
+/* Flag the command submission as secure */
+#define AMDGPU_CS_FLAGS_SECURE  (1 << 0)
+
  struct drm_amdgpu_cs_in {
/** Rendering context id */
__u32   ctx_id;



Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

2019-11-28 Thread Christian König

Hi Monk,

if the content of the CSIB is constant then it is certainly better to 
just re-initialize it.

This also prevents corruption due to VRAM loss.

Christian.

Am 28.11.19 um 03:53 schrieb Liu, Monk:

Hi Xiaojie

For SRIOV we don't use suspend so I didn't think of that part, thanks for the 
reminder!
But we still need to fix this call trace issue anyway (our jenkins testing 
system considers such a call trace an error).

How about we do "adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);" in the 
hw_init()? This way you don't need to evict the CSIB during suspend and the 
CSIB will always be re-initialized after S3 resume?

@Deucher, Alexander @Koenig, Christian what's your opinion ?
_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Yuan, Xiaojie 
Sent: Tuesday, November 26, 2019 9:10 PM
To: Liu, Monk 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload

Hi Monk,

hw_fini() is called in the suspend code path as well. I'm wondering how the csb 
can be evicted if it's not unpinned before suspend.

BR,
Xiaojie


On Nov 26, 2019, at 7:50 PM, Monk Liu  wrote:

kernel would report a warning on double unpin on the csb BO because we
unpin it during hw_fini but actually we don't need to pin/unpin it
during hw_init/fini since it is created with kernel pinned

remove all those useless code for gfx9/10

Signed-off-by: Monk Liu 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c |  1 -
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 38 
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 39 -
3 files changed, 78 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..289fada 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -145,7 +145,6 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
dst_ptr = adev->gfx.rlc.cs_ptr;
adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);

return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a56cba9..5ee7467 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -996,39 +996,6 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)
return 0;
}

-static int gfx_v10_0_csb_vram_pin(struct amdgpu_device *adev)
-{
-int r;
-
-r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-if (unlikely(r != 0))
-return r;
-
-r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-AMDGPU_GEM_DOMAIN_VRAM);
-if (!r)
-adev->gfx.rlc.clear_state_gpu_addr =
-amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-return r;
-}
-
-static void gfx_v10_0_csb_vram_unpin(struct amdgpu_device *adev)
-{
-int r;
-
-if (!adev->gfx.rlc.clear_state_obj)
-return;
-
-r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, true);
-if (likely(r == 0)) {
-amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
-amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-}
-}
-
static void gfx_v10_0_mec_fini(struct amdgpu_device *adev) {
	amdgpu_bo_free_kernel(&adev->gfx.mec.hpd_eop_obj, NULL, NULL);
@@ -3780,10 +3747,6 @@ static int gfx_v10_0_hw_init(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;

-r = gfx_v10_0_csb_vram_pin(adev);
-if (r)
-return r;
-
if (!amdgpu_emu_mode)
gfx_v10_0_init_golden_registers(adev);

@@ -3871,7 +3834,6 @@ static int gfx_v10_0_hw_fini(void *handle)
}
gfx_v10_0_cp_enable(adev, false);
gfx_v10_0_enable_gui_idle_interrupt(adev, false);
-gfx_v10_0_csb_vram_unpin(adev);

return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 4cc2e50..524a7ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1683,39 +1683,6 @@ static int gfx_v9_0_rlc_init(struct amdgpu_device *adev)
return 0;
}

-static int gfx_v9_0_csb_vram_pin(struct amdgpu_device *adev)
-{
-int r;
-
-r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
-if (unlikely(r != 0))
-return r;
-
-r = amdgpu_bo_pin(adev->gfx.rlc.clear_state_obj,
-AMDGPU_GEM_DOMAIN_VRAM);
-if (!r)
-adev->gfx.rlc.clear_state_gpu_addr =
-amdgpu_bo_gpu_offset(adev->gfx.rlc.clear_state_obj);
-
-amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
-
-return r;
-}
-
-static void gfx_v9_0_csb_vram_unpin(struct amdgpu_device *adev)
-{
-int r;
-
-if (!adev->gfx.rlc.clear_state_obj)
-return;
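The warning this patch removes comes down to pin-count bookkeeping: the BO is created kernel-pinned, so the extra unpin in hw_fini() drops the count below its matching pins and the kernel WARNs. A toy user-space sketch of that imbalance (made-up names, not the amdgpu API):

```c
/* Toy pin-count bookkeeping: a second unpin without a matching pin is
 * exactly the kind of imbalance the kernel flags with a WARN. */
struct toy_bo {
	int pin_count;
};

int toy_bo_pin(struct toy_bo *bo)
{
	return ++bo->pin_count;
}

int toy_bo_unpin(struct toy_bo *bo)
{
	if (bo->pin_count == 0)
		return -1;      /* double unpin: would trigger the warning */
	return --bo->pin_count;
}

int toy_double_unpin_demo(void)
{
	struct toy_bo bo = { 0 };

	toy_bo_pin(&bo);          /* pinned once, e.g. at creation time */
	toy_bo_unpin(&bo);        /* balanced unpin */
	return toy_bo_unpin(&bo); /* unbalanced second unpin: -1 */
}
```

Removing the redundant pin/unpin pair, as the patch does, keeps the count balanced by construction.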

[PATCH] drm/amgdpu: add cache flush workaround to gfx8 emit_fence

2019-11-28 Thread Pierre-Eric Pelloux-Prayer
The same workaround is used for gfx7.
Both PAL and Mesa use it for gfx8 too, so port this commit to
gfx_v8_0_ring_emit_fence_gfx.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 80b79583dffe..dcd747bef391 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6183,7 +6183,23 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
 
-   /* EVENT_WRITE_EOP - flush caches, send int */
+   /* Workaround for cache flush problems. First send a dummy EOP
+* event down the pipe with seq one below.
+*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
+   amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
+EOP_TC_ACTION_EN |
+EOP_TC_WB_ACTION_EN |
+EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
+EVENT_INDEX(5)));
+   amdgpu_ring_write(ring, addr & 0xfffc);
+   amdgpu_ring_write(ring, (upper_32_bits(addr) & 0x) |
+   DATA_SEL(1) | INT_SEL(0));
+   amdgpu_ring_write(ring, lower_32_bits(seq - 1));
+   amdgpu_ring_write(ring, upper_32_bits(seq - 1));
+
+   /* Then send the real EOP event down the pipe:
+* EVENT_WRITE_EOP - flush caches, send int */
amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
 EOP_TC_ACTION_EN |
@@ -6926,7 +6942,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {
5 +  /* COND_EXEC */
7 +  /* PIPELINE_SYNC */
VI_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* VM_FLUSH */
-   8 +  /* FENCE for VM_FLUSH */
+   12 +  /* FENCE for VM_FLUSH */
20 + /* GDS switch */
4 + /* double SWITCH_BUFFER,
   the first COND_EXEC jump to the place just
@@ -6938,7 +6954,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {
31 + /* DE_META */
3 + /* CNTX_CTRL */
5 + /* HDP_INVL */
-   8 + 8 + /* FENCE x2 */
+   12 + 12 + /* FENCE x2 */
2, /* SWITCH_BUFFER */
.emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */
.emit_ib = gfx_v8_0_ring_emit_ib_gfx,
-- 
2.24.0.rc0

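The ordering trick in the patch above — a dummy EOP at `seq - 1` (cache flush only, no interrupt) followed by the real EOP at `seq` — can be sketched with a toy ring buffer. This models only the write ordering, nothing like the real packet encoding:

```c
#include <stdint.h>

/* Toy ring: each fence emits a dummy entry at seq - 1 (the flush-only
 * EOP) followed by the real entry at seq, mirroring the two-packet
 * sequence of the workaround. */
struct toy_ring {
	uint64_t buf[16];
	int wptr;
};

void toy_ring_write(struct toy_ring *r, uint64_t v)
{
	r->buf[r->wptr++] = v;
}

void toy_emit_fence(struct toy_ring *r, uint64_t seq)
{
	toy_ring_write(r, seq - 1); /* dummy EOP: flush caches only */
	toy_ring_write(r, seq);     /* real EOP: flush + signal fence */
}

int toy_fence_demo(void)
{
	struct toy_ring r = { { 0 }, 0 };

	toy_emit_fence(&r, 10);
	return r.buf[0] == 9 && r.buf[1] == 10 && r.wptr == 2;
}
```

Doubling the emitted packets is also why the patch bumps the reserved ring size for fences from 8 to 12 dwords.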

RE: [PATCH 07/10] drm/amdgpu: add concurrent baco reset support for XGMI

2019-11-28 Thread Ma, Le




-----Original Message-----
From: Grodzovsky, Andrey 
Sent: Wednesday, November 27, 2019 11:46 PM
To: Ma, Le ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Zhou1, Tao ; Deucher, Alexander ; Li, Dennis ; Zhang, Hawking 
Subject: Re: [PATCH 07/10] drm/amdgpu: add concurrent baco reset support for XGMI

On 11/27/19 4:15 AM, Le Ma wrote:
> Currently each XGMI node reset wq does not run in parallel because
> same work item bound to same cpu runs in sequence. So change to bound
> the xgmi_reset_work item to different cpus.

It's not the same work item, see more below

> XGMI requires all nodes enter into baco within very close proximity
> before any node exit baco. So schedule the xgmi_reset_work wq twice
> for enter/exit baco respectively.
>
> The default reset code path and methods do not change for vega20 production:
>   - baco reset without xgmi/ras
>   - psp reset with xgmi/ras
>
> To enable baco for XGMI/RAS case, both 2 conditions below are needed:
>   - amdgpu_ras_enable=2
>   - baco-supported smu firmware
>
> The case that PSP reset and baco reset coexist within an XGMI hive is
> not in the consideration.
>
> Change-Id: I9c08cf90134f940b42e20d2129ff87fba761c532
> Signed-off-by: Le Ma <le...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 78 ++
>  2 files changed, 70 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d120fe5..08929e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -998,6 +998,8 @@ struct amdgpu_device {
> 	int	pstate;
> 	/* enable runtime pm on the device */
> 	bool	runpm;
> +
> +	bool	in_baco;
>  };
>
>  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index bd387bb..71abfe9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2654,7 +2654,13 @@ static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
> 	struct amdgpu_device *adev =
> 		container_of(__work, struct amdgpu_device, xgmi_reset_work);
>
> -	adev->asic_reset_res =  amdgpu_asic_reset(adev);
> +	if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)
> +		adev->asic_reset_res = (adev->in_baco == false) ?
> +				amdgpu_device_baco_enter(adev->ddev) :
> +				amdgpu_device_baco_exit(adev->ddev);
> +	else
> +		adev->asic_reset_res = amdgpu_asic_reset(adev);
> +
> 	if (adev->asic_reset_res)
> 		DRM_WARN("ASIC reset failed with error, %d for drm dev, %s",
> 			 adev->asic_reset_res, adev->ddev->unique);
> @@ -3796,6 +3802,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
> 	struct amdgpu_device *tmp_adev = NULL;
> 	bool need_full_reset = *need_full_reset_arg, vram_lost = false;
> 	int r = 0;
> +	int cpu = smp_processor_id();
>
> 	/*
> 	 * ASIC reset has to be done on all HGMI hive nodes ASAP
> @@ -3803,21 +3810,24 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
> 	 */
> 	if (need_full_reset) {
> 		list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> -			/* For XGMI run all resets in parallel to speed up the process */
> +			/*
> +			* For XGMI run all resets in parallel to speed up the
> +			* process by scheduling the highpri wq on different
> +			* cpus. For XGMI with baco reset, all nodes must enter
> +			* baco within close proximity before anyone exit.
> +			*/
> 			if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
> -				if (!queue_work(system_highpri_wq, &tmp_adev->xgmi_reset_work))

Note that tmp_adev->xgmi_reset_work (the work item) is per device in the XGMI hive and not the same work item. So I don't see why you need to explicitly queue them on different CPUs, they should run in parallel already.

Andrey

[Le]: It’s also beyond my understanding that the 2 node reset work items scheduled to same cpu does not run in parallel. But from the experiment result in my side, the 2nd work item always 

[PATCH 4/4] drm/amd/display: Remove unneeded semicolon in display_rq_dlg_calc_21.c

2019-11-28 Thread zhengbin
Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1525:144-145: Unneeded semicolon
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1526:142-143: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: zhengbin 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
index a4b103e..e60af38 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
@@ -1522,8 +1522,8 @@ static void dml_rq_dlg_get_dlg_params(

 	disp_dlg_regs->refcyc_per_vm_group_vblank   = get_refcyc_per_vm_group_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
 	disp_dlg_regs->refcyc_per_vm_group_flip = get_refcyc_per_vm_group_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
-	disp_dlg_regs->refcyc_per_vm_req_vblank = get_refcyc_per_vm_req_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;;
-	disp_dlg_regs->refcyc_per_vm_req_flip   = get_refcyc_per_vm_req_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;;
+	disp_dlg_regs->refcyc_per_vm_req_vblank = get_refcyc_per_vm_req_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;
+	disp_dlg_regs->refcyc_per_vm_req_flip   = get_refcyc_per_vm_req_flip(mode_lib, e2e_pipe_param, num_pipes, pipe_idx) * refclk_freq_in_mhz;

// Clamp to max for now
if (disp_dlg_regs->refcyc_per_vm_group_vblank >= (unsigned 
int)dml_pow(2, 23))
--
2.7.4


[PATCH 2/4] drm/amd/display: Remove unneeded semicolon in bios_parser2.c

2019-11-28 Thread zhengbin
Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c:995:2-3: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: zhengbin 
---
 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index eb06ee7..4e24717 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -992,7 +992,7 @@ static uint32_t get_support_mask_for_device_id(struct device_id device_id)
break;
default:
break;
-   };
+   }

/* Unidentified device ID, return empty support mask. */
return 0;
--
2.7.4


[PATCH 3/4] drm/amd/display: Remove unneeded semicolon in hdcp.c

2019-11-28 Thread zhengbin
Fixes coccicheck warning:

drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c:506:2-3: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: zhengbin 
---
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
index cbb5e9c..8aa528e 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c
@@ -503,7 +503,7 @@ enum mod_hdcp_operation_mode mod_hdcp_signal_type_to_operation_mode(
break;
default:
break;
-   };
+   }

return mode;
 }
--
2.7.4


[PATCH 1/4] drm/amd/display: Remove unneeded semicolon in bios_parser.c

2019-11-28 Thread zhengbin
Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/bios/bios_parser.c:2192:2-3: Unneeded semicolon

Reported-by: Hulk Robot 
Signed-off-by: zhengbin 
---
 drivers/gpu/drm/amd/display/dc/bios/bios_parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
index 27451f2..008d4d1 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
@@ -2189,7 +2189,7 @@ static uint32_t get_support_mask_for_device_id(struct device_id device_id)
break;
default:
break;
-   };
+   }

/* Unidentified device ID, return empty support mask. */
return 0;
--
2.7.4


[PATCH 0/4] drm/amd/display: Remove unneeded semicolon

2019-11-28 Thread zhengbin
zhengbin (4):
  drm/amd/display: Remove unneeded semicolon in bios_parser.c
  drm/amd/display: Remove unneeded semicolon in bios_parser2.c
  drm/amd/display: Remove unneeded semicolon in hdcp.c
  drm/amd/display: Remove unneeded semicolon in display_rq_dlg_calc_21.c

 drivers/gpu/drm/amd/display/dc/bios/bios_parser.c | 2 +-
 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c| 2 +-
 drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c | 4 ++--
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

--
2.7.4


[PATCH] drm/amd/display: Fix wrongly passed static prefix

2019-11-28 Thread Takashi Iwai
Currently, gcc spews a warning as:
  drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hubbub.c: In function ‘hubbub1_verify_allow_pstate_change_high’:
  ./include/drm/drm_print.h:316:2: warning: ‘debug_data’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This is because the code checks against a static value although it's
basically a constant and guaranteed to be set.

This patch changes the type prefix from static to const for addressing
the compile warning above and also for letting the compiler optimize
better.

Fixes: 62d591a8e00c ("drm/amd/display: create new files for hubbub functions")
Signed-off-by: Takashi Iwai 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
index a02c10e23e0d..b5c44c3bdb98 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
@@ -128,8 +128,8 @@ bool hubbub1_verify_allow_pstate_change_high(
 * pstate takes around ~100us on linux. Unknown currently as to
 * why it takes that long on linux
 */
-   static unsigned int pstate_wait_timeout_us = 200;
-   static unsigned int pstate_wait_expected_timeout_us = 40;
+   const unsigned int pstate_wait_timeout_us = 200;
+   const unsigned int pstate_wait_expected_timeout_us = 40;
static unsigned int max_sampled_pstate_wait_us; /* data collection */
static bool forced_pstate_allow; /* help with revert wa */
 
-- 
2.16.4
