[PATCH] drm/amdgpu: Change AID detection logic

2024-04-14 Thread Lijo Lazar
On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be
considered as a valid AID.

Signed-off-by: Lijo Lazar 
Reviewed-by: Asad Kamal 
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c 
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index fbf5f65ab091..bdab65bc3105 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -649,7 +649,7 @@ static void aqua_vanjaram_down_config(struct amdgpu_device 
*adev)
 
 int aqua_vanjaram_init_soc_config(struct amdgpu_device *adev)
 {
-   u32 mask, inst_mask = adev->sdma.sdma_mask, sdma_pres;
+   u32 mask, avail_inst, inst_mask = adev->sdma.sdma_mask;
int ret, i;
 
aqua_vanjaram_down_config(adev);
@@ -662,8 +662,9 @@ int aqua_vanjaram_init_soc_config(struct amdgpu_device 
*adev)
 
for (mask = (1 << adev->sdma.num_inst_per_aid) - 1; inst_mask;
 inst_mask >>= adev->sdma.num_inst_per_aid, ++i) {
-   sdma_pres = inst_mask & mask;
-   if (sdma_pres == mask || sdma_pres == 0x3 || sdma_pres == 0xc)
+   avail_inst = inst_mask & mask;
+   if (avail_inst == mask || avail_inst == 0x3 ||
+   avail_inst == 0xc)
adev->aid_mask |= (1 << i);
}
 
-- 
2.25.1



RE: [PATCH V2] drm/amdgpu: Fix incorrect return value

2024-04-14 Thread Chai, Thomas
[AMD Official Use Only - General]

Hi Christian:
   If an ecc error occurs at an address, HW will generate an interrupt to SW to 
retire all pages located in the same physical row as the error address based on 
the physical characteristics of the memory device.
   Therefore, if other pages located on the same physical row as the error 
address also occur ecc errors later, HW will also generate multiple interrupts 
to SW to retire these same pages again, so that amdgpu_vram_mgr_reserve_range 
will be called multiple times to reserve the same pages.

I think it's more appropriate to do the status check inside the function. 
If the function entry is not checked, people who are not familiar with this 
part of the code can easily make mistakes when calling the function.


-
Best Regards,
Thomas

-Original Message-
From: Christian König 
Sent: Friday, April 12, 2024 5:24 PM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Zhou1, Tao ; Li, Candice ; Wang, 
Yang(Kevin) ; Yang, Stanley 
Subject: Re: [PATCH V2] drm/amdgpu: Fix incorrect return value

Am 12.04.24 um 10:55 schrieb YiPeng Chai:
> [Why]
>After calling amdgpu_vram_mgr_reserve_range multiple times with the
> same address, calling amdgpu_vram_mgr_query_page_status will always
> return -EBUSY.
>From the second call to amdgpu_vram_mgr_reserve_range, the same
> address will be added to the reservations_pending list again and is
> never moved to the reserved_pages list because the address had been
> reserved.

Well just to make it clear that approach is a NAK until my concerns are solved.

Regards,
Christian.

>
> [How]
>First add the address status check before calling
> amdgpu_vram_mgr_do_reserve, if the address is already reserved, do
> nothing; If the address is already in the reservations_pending list,
> directly reserve memory; only add new nodes for the addresses that are
> not in the reserved_pages list and reservations_pending list.
>
> V2:
>   Avoid repeated locking/unlocking.
>
> Signed-off-by: YiPeng Chai 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +---
>   1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 1e36c428d254..a636d3f650b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -317,7 +317,6 @@ static void amdgpu_vram_mgr_do_reserve(struct
> ttm_resource_manager *man)
>
>   dev_dbg(adev->dev, "Reservation 0x%llx - %lld, Succeeded\n",
>   rsv->start, rsv->size);
> -
>   vis_usage = amdgpu_vram_mgr_vis_size(adev, block);
>   atomic64_add(vis_usage, >vis_usage);
>   spin_lock(>bdev->lru_lock);
> @@ -340,19 +339,27 @@ int amdgpu_vram_mgr_reserve_range(struct 
> amdgpu_vram_mgr *mgr,
> uint64_t start, uint64_t size)
>   {
>   struct amdgpu_vram_reservation *rsv;
> + int ret = 0;
>
> - rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> - if (!rsv)
> - return -ENOMEM;
> + ret = amdgpu_vram_mgr_query_page_status(mgr, start);
> + if (!ret)
> + return 0;
>
> - INIT_LIST_HEAD(>allocated);
> - INIT_LIST_HEAD(>blocks);
> + if (ret == -ENOENT) {
> + rsv = kzalloc(sizeof(*rsv), GFP_KERNEL);
> + if (!rsv)
> + return -ENOMEM;
>
> - rsv->start = start;
> - rsv->size = size;
> + INIT_LIST_HEAD(>allocated);
> + INIT_LIST_HEAD(>blocks);
> +
> + rsv->start = start;
> + rsv->size = size;
> + }
>
>   mutex_lock(>lock);
> - list_add_tail(>blocks, >reservations_pending);
> + if (ret == -ENOENT)
> + list_add_tail(>blocks, >reservations_pending);
>   amdgpu_vram_mgr_do_reserve(>manager);
>   mutex_unlock(>lock);
>



Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-04-14 Thread Lin, Wayne
[Public]

Ping for code review. Thanks!

Regards,
Wayne


From: Wayne Lin 
Sent: Thursday, March 7, 2024 14:29
To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
intel-...@lists.freedesktop.org
Cc: ly...@redhat.com; Wentland, Harry; imre.d...@intel.com; Lin, Wayne; Leon 
Weiß; sta...@vger.kernel.org; regressi...@lists.linux.dev
Subject: [PATCH] drm/mst: Fix NULL pointer dereference at 
drm_dp_add_payload_part2

[Why]
Commit:
- commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
allocation/removement")
accidently overwrite the commit
- commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in 
drm_dp_add_payload_part2")
which cause regression.

[How]
Recover the original NULL fix and remove the unnecessary input parameter 
'state' for
drm_dp_add_payload_part2().

Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
allocation/removement")
Reported-by: Leon Weiß 
Link: 
https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.ca...@ruhr-uni-bochum.de/
Cc: ly...@redhat.com
Cc: imre.d...@intel.com
Cc: sta...@vger.kernel.org
Cc: regressi...@lists.linux.dev
Signed-off-by: Wayne Lin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +-
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   | 2 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c   | 2 +-
 include/drm/display/drm_dp_mst_helper.h   | 1 -
 5 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index c27063305a13..2c36f3d00ca2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -363,7 +363,7 @@ void dm_helpers_dp_mst_send_payload_allocation(
mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
new_payload = drm_atomic_get_mst_payload_state(mst_state, 
aconnector->mst_output_port);

-   ret = drm_dp_add_payload_part2(mst_mgr, mst_state->base.state, 
new_payload);
+   ret = drm_dp_add_payload_part2(mst_mgr, new_payload);

if (ret) {
amdgpu_dm_set_mst_status(>mst_status,
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 03d528209426..95fd18f24e94 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3421,7 +3421,6 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
 /**
  * drm_dp_add_payload_part2() - Execute payload update part 2
  * @mgr: Manager to use.
- * @state: The global atomic state
  * @payload: The payload to update
  *
  * If @payload was successfully assigned a starting time slot by 
drm_dp_add_payload_part1(), this
@@ -3430,14 +3429,13 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
  * Returns: 0 on success, negative error code on failure.
  */
 int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
-struct drm_atomic_state *state,
 struct drm_dp_mst_atomic_payload *payload)
 {
int ret = 0;

/* Skip failed payloads */
if (payload->payload_allocation_status != 
DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
-   drm_dbg_kms(state->dev, "Part 1 of payload creation for %s 
failed, skipping part 2\n",
+   drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s 
failed, skipping part 2\n",
payload->port->connector->name);
return -EIO;
}
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c 
b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 53aec023ce92..2fba66aec038 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1160,7 +1160,7 @@ static void intel_mst_enable_dp(struct intel_atomic_state 
*state,
if (first_mst_stream)
intel_ddi_wait_for_fec_status(encoder, pipe_config, true);

-   drm_dp_add_payload_part2(_dp->mst_mgr, >base,
+   drm_dp_add_payload_part2(_dp->mst_mgr,
 drm_atomic_get_mst_payload_state(mst_state, 
connector->port));

if (DISPLAY_VER(dev_priv) >= 12)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 0c3d88ad0b0e..88728a0b2c25 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -915,7 +915,7 @@ nv50_msto_cleanup(struct drm_atomic_state *state,
msto->disabled = false;
drm_dp_remove_payload_part2(mgr, new_mst_state, old_payload, 
new_payload);
} else if (msto->enabled) {
-   drm_dp_add_payload_part2(mgr, state, new_payload);
+   drm_dp_add_payload_part2(mgr, new_payload);

Re: [PATCH v2] drm/amdgpu: refactoring the runtime pm mode detection code

2024-04-14 Thread Ma, Jun
ping...

Regards,
Ma Jun

On 4/3/2024 10:57 AM, Ma Jun wrote:
> refactor the code of runtime pm mode detection to support
> amdgpu_runtime_pm =2 and 1 two cases
> 
> Signed-off-by: Ma Jun 
> Reviewed-by: Yang Wang 
> ---
> v1->v2:
> - Fix logic and output info (Lijo)
> - Fix code style (Kevin)
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 75 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 48 +-
>  3 files changed, 77 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 65c17c59c152..e0d7f4ee7e16 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1409,6 +1409,7 @@ bool amdgpu_device_supports_px(struct drm_device *dev);
>  bool amdgpu_device_supports_boco(struct drm_device *dev);
>  bool amdgpu_device_supports_smart_shift(struct drm_device *dev);
>  int amdgpu_device_supports_baco(struct drm_device *dev);
> +void amdgpu_device_detect_runtime_pm_mode(struct amdgpu_device *adev);
>  bool amdgpu_device_is_peer_accessible(struct amdgpu_device *adev,
> struct amdgpu_device *peer_adev);
>  int amdgpu_device_baco_enter(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f830024cffca..2ea1f47df79c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -350,6 +350,81 @@ int amdgpu_device_supports_baco(struct drm_device *dev)
>   return amdgpu_asic_supports_baco(adev);
>  }
>  
> +void amdgpu_device_detect_runtime_pm_mode(struct amdgpu_device *adev)
> +{
> + struct drm_device *dev;
> + int bamaco_support;
> +
> + dev = adev_to_drm(adev);
> +
> + adev->pm.rpm_mode = AMDGPU_RUNPM_NONE;
> + bamaco_support = amdgpu_device_supports_baco(dev);
> +
> + switch (amdgpu_runtime_pm) {
> + case 2:
> + if (bamaco_support & MACO_SUPPORT) {
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BAMACO;
> + dev_info(adev->dev, "Forcing BAMACO for runtime pm\n");
> + } else if (bamaco_support == BACO_SUPPORT) {
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO;
> + dev_info(adev->dev, "Requested mode BAMACO not 
> available,fallback to use BACO\n");
> + }
> + break;
> + case 1:
> + if (bamaco_support & BACO_SUPPORT) {
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO;
> + dev_info(adev->dev, "Forcing BACO for runtime pm\n");
> + }
> + break;
> + case -1:
> + case -2:
> + if (amdgpu_device_supports_px(dev)) { /* enable PX as runtime 
> mode */
> + adev->pm.rpm_mode = AMDGPU_RUNPM_PX;
> + dev_info(adev->dev, "Using ATPX for runtime pm\n");
> + } else if (amdgpu_device_supports_boco(dev)) { /* enable boco 
> as runtime mode */
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BOCO;
> + dev_info(adev->dev, "Using BOCO for runtime pm\n");
> + } else {
> + if (!bamaco_support)
> + goto no_runtime_pm;
> +
> + switch (adev->asic_type) {
> + case CHIP_VEGA20:
> + case CHIP_ARCTURUS:
> + /* BACO are not supported on vega20 and arctrus 
> */
> + break;
> + case CHIP_VEGA10:
> + /* enable BACO as runpm mode if noretry=0 */
> + if (!adev->gmc.noretry)
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO;
> + break;
> + default:
> + /* enable BACO as runpm mode on CI+ */
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BACO;
> + break;
> + }
> +
> + if (adev->pm.rpm_mode == AMDGPU_RUNPM_BACO) {
> + if (bamaco_support & MACO_SUPPORT) {
> + adev->pm.rpm_mode = AMDGPU_RUNPM_BAMACO;
> + dev_info(adev->dev, "Using BAMACO for 
> runtime pm\n");
> + } else {
> + dev_info(adev->dev, "Using BACO for 
> runtime pm\n");
> + }
> + }
> + }
> + break;
> + case 0:
> + dev_info(adev->dev, "runtime pm is manually disabled\n");
> + break;
> + default:
> + break;
> + }
> +
> +no_runtime_pm:
> + if (adev->pm.rpm_mode == AMDGPU_RUNPM_NONE)
> + 

[PATCH v11 1/3] drm/buddy: Implement tracking clear page feature

2024-04-14 Thread Arunpravin Paneer Selvam
- Add tracking clear page feature.

- Driver should enable the DRM_BUDDY_CLEARED flag if it
  successfully clears the blocks in the free path. On the otherhand,
  DRM buddy marks each block as cleared.

- Track the available cleared pages size

- If driver requests cleared memory we prefer cleared memory
  but fallback to uncleared if we can't find the cleared blocks.
  when driver requests uncleared memory we try to use uncleared but
  fallback to cleared memory if necessary.

- When a block gets freed we clear it and mark the freed block as cleared,
  when there are buddies which are cleared as well we can merge them.
  Otherwise, we prefer to keep the blocks as separated.

- Add a function to support defragmentation.

v1:
  - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as
cleared. Else, reset the clear flag for each block in the list(Christian)
  - For merging the 2 cleared blocks compare as below,
drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian)
  - Defragment the memory beginning from min_order
till the required memory space is available.

v2: (Matthew)
  - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks
operation within drm buddy.
  - Write a macro block_incompatible() to allocate the required blocks.
  - Update the xe driver for the drm_buddy_free_list change in arguments.
  - add a warning if the two blocks are incompatible on
defragmentation
  - call full defragmentation in the fini() function
  - place a condition to test if min_order is equal to 0
  - replace the list with safe_reverse() variant as we might
remove the block from the list.

v3:
  - fix Gitlab user reported lockup issue.
  - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew)
  - modify to pass the root order instead max_order in fini()
function(Matthew)
  - change bool 1 to true(Matthew)
  - add check if min_block_size is power of 2(Matthew)
  - modify the min_block_size datatype to u64(Matthew)

v4:
  - rename the function drm_buddy_defrag with __force_merge.
  - Include __force_merge directly in drm buddy file and remove
the defrag use in amdgpu driver.
  - Remove list_empty() check(Matthew)
  - Remove unnecessary space, headers and placement of new variables(Matthew)
  - Add a unit test case(Matthew)

v5:
  - remove force merge support to actual range allocation and not to bail
out when contains && split(Matthew)
  - add range support to force merge function.

v6:
  - modify the alloc_range() function clear page non merged blocks
allocation(Matthew)
  - correct the list_insert function name(Matthew).

Signed-off-by: Arunpravin Paneer Selvam 
Signed-off-by: Matthew Auld 
Suggested-by: Christian König 
Suggested-by: Matthew Auld 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +-
 drivers/gpu/drm/drm_buddy.c   | 427 ++
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   6 +-
 drivers/gpu/drm/tests/drm_buddy_test.c|  28 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c  |   4 +-
 include/drm/drm_buddy.h   |  16 +-
 6 files changed, 363 insertions(+), 124 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..c0c851409241 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
return 0;
 
 error_free_blocks:
-   drm_buddy_free_list(mm, >blocks);
+   drm_buddy_free_list(mm, >blocks, 0);
mutex_unlock(>lock);
 error_fini:
ttm_resource_fini(man, >base);
@@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager 
*man,
 
amdgpu_vram_mgr_do_reserve(man);
 
-   drm_buddy_free_list(mm, >blocks);
+   drm_buddy_free_list(mm, >blocks, 0);
mutex_unlock(>lock);
 
atomic64_sub(vis_usage, >vis_usage);
@@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
kfree(rsv);
 
list_for_each_entry_safe(rsv, temp, >reserved_pages, blocks) {
-   drm_buddy_free_list(>mm, >allocated);
+   drm_buddy_free_list(>mm, >allocated, 0);
kfree(rsv);
}
if (!adev->gmc.is_app_apu)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 5ebdd6f8f36e..284ebae71cc4 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm,
__list_add(>link, node->link.prev, >link);
 }
 
+static void clear_reset(struct drm_buddy_block *block)
+{
+   block->header &= ~DRM_BUDDY_HEADER_CLEAR;
+}
+
+static void mark_cleared(struct drm_buddy_block *block)
+{
+   block->header |= DRM_BUDDY_HEADER_CLEAR;
+}
+
 static void mark_allocated(struct drm_buddy_block *block)
 {

[PATCH v11 3/3] drm/tests: Add a test case for drm buddy clear allocation

2024-04-14 Thread Arunpravin Paneer Selvam
Add a new test case for the drm buddy clear and dirty
allocation.

v2:(Matthew)
  - make size as u32
  - rename PAGE_SIZE with SZ_4K
  - dont fragment the address space for all the order allocation
iterations. we can do it once and just increment and allocate
the size.
  - create new mm with non power-of-two size to ensure the multi-root
force_merge during fini.

v3:
  - add randomness in size calculation(Matthew)

Signed-off-by: Arunpravin Paneer Selvam 
Reviewed-by: Matthew Auld 
Suggested-by: Matthew Auld 
---
 drivers/gpu/drm/tests/drm_buddy_test.c | 143 +
 1 file changed, 143 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c
index 4621a860cb05..e3b50e240d36 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -224,6 +224,148 @@ static void drm_test_buddy_alloc_range_bias(struct kunit 
*test)
drm_buddy_fini();
 }
 
+static void drm_test_buddy_alloc_clear(struct kunit *test)
+{
+   unsigned long n_pages, total, i = 0;
+   DRM_RND_STATE(prng, random_seed);
+   const unsigned long ps = SZ_4K;
+   struct drm_buddy_block *block;
+   const int max_order = 12;
+   LIST_HEAD(allocated);
+   struct drm_buddy mm;
+   unsigned int order;
+   u32 mm_size, size;
+   LIST_HEAD(dirty);
+   LIST_HEAD(clean);
+
+   mm_size = SZ_4K << max_order;
+   KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+   KUNIT_EXPECT_EQ(test, mm.max_order, max_order);
+
+   /*
+* Idea is to allocate and free some random portion of the address 
space,
+* returning those pages as non-dirty and randomly alternate between
+* requesting dirty and non-dirty pages (not going over the limit
+* we freed as non-dirty), putting that into two separate lists.
+* Loop over both lists at the end checking that the dirty list
+* is indeed all dirty pages and vice versa. Free it all again,
+* keeping the dirty/clear status.
+*/
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   5 * ps, ps, 
,
+   
DRM_BUDDY_TOPDOWN_ALLOCATION),
+   "buddy_alloc hit an error size=%lu\n", 5 * ps);
+   drm_buddy_free_list(, , DRM_BUDDY_CLEARED);
+
+   n_pages = 10;
+   do {
+   unsigned long flags;
+   struct list_head *list;
+   int slot = i % 2;
+
+   if (slot == 0) {
+   list = 
+   flags = 0;
+   } else {
+   list = 
+   flags = DRM_BUDDY_CLEAR_ALLOCATION;
+   }
+
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, 
mm_size,
+   ps, ps, 
list,
+   flags),
+   "buddy_alloc hit an error size=%lu\n", 
ps);
+   } while (++i < n_pages);
+
+   list_for_each_entry(block, , link)
+   KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), true);
+
+   list_for_each_entry(block, , link)
+   KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false);
+
+   drm_buddy_free_list(, , DRM_BUDDY_CLEARED);
+
+   /*
+* Trying to go over the clear limit for some allocation.
+* The allocation should never fail with reasonable page-size.
+*/
+   KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(, 0, mm_size,
+   10 * ps, ps, ,
+   
DRM_BUDDY_CLEAR_ALLOCATION),
+   "buddy_alloc hit an error size=%lu\n", 10 * ps);
+
+   drm_buddy_free_list(, , DRM_BUDDY_CLEARED);
+   drm_buddy_free_list(, , 0);
+   drm_buddy_fini();
+
+   KUNIT_EXPECT_FALSE(test, drm_buddy_init(, mm_size, ps));
+
+   /*
+* Create a new mm. Intentionally fragment the address space by creating
+* two alternating lists. Free both lists, one as dirty the other as 
clean.
+* Try to allocate double the previous size with matching 
min_page_size. The
+* allocation should never fail as it calls the force_merge. Also check 
that
+* the page is always dirty after force_merge. Free the page as dirty, 
then
+* repeat the whole thing, increment the order until we hit the 
max_order.
+*/
+
+   i = 0;
+   n_pages = mm_size / ps;
+   do {
+   struct list_head *list;
+   int slot = i % 2;
+
+   if (slot == 0)
+   list = 
+   else
+   list = 
+
+   

[PATCH v11 2/3] drm/amdgpu: Enable clear page functionality

2024-04-14 Thread Arunpravin Paneer Selvam
Add clear page support in vram memory region.

v1(Christian):
  - Dont handle clear page as TTM flag since when moving the BO back
in from GTT again we don't need that.
  - Make a specialized version of amdgpu_fill_buffer() which only
clears the VRAM areas which are not already cleared
  - Drop the TTM_PL_FLAG_WIPE_ON_RELEASE check in
amdgpu_object.c

v2:
  - Modify the function name amdgpu_ttm_* (Alex)
  - Drop the delayed parameter (Christian)
  - handle amdgpu_res_cleared() just above the size
calculation (Christian)
  - Use AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE for clearing the buffers
in the free path to properly wait for fences etc.. (Christian)

v3(Christian):
  - Remove buffer clear code in VRAM manager instead change the
AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE handling to set
the DRM_BUDDY_CLEARED flag.
  - Remove ! from amdgpu_res_cleared() check.

v4(Christian):
  - vres flag setting move to vram manager file
  - use dma_fence_get_stub in amdgpu_ttm_clear_buffer function
  - make fence a mandatory parameter and drop the if and the get/put dance

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
Acked-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|  9 +--
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 70 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  | 10 +++
 6 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..e011a326fe86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -39,6 +39,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
+#include "amdgpu_vram_mgr.h"
 
 /**
  * DOC: amdgpu_object
@@ -601,8 +602,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (!amdgpu_bo_support_uswc(bo->flags))
bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
 
-   if (adev->ras_enabled)
-   bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
+   bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 
bo->tbo.bdev = >mman.bdev;
if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
@@ -634,7 +634,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;
 
-   r = amdgpu_fill_buffer(bo, 0, bo->tbo.base.resv, , true);
+   r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, );
if (unlikely(r))
goto fail_unreserve;
 
@@ -1365,8 +1365,9 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
*bo)
if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))
return;
 
-   r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, , true);
+   r = amdgpu_fill_buffer(abo, 0, bo->base.resv, , true);
if (!WARN_ON(r)) {
+   amdgpu_vram_mgr_set_cleared(bo->resource);
amdgpu_bo_fence(abo, fence, false);
dma_fence_put(fence);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 381101d2bf05..50fcd86e1033 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -164,4 +164,29 @@ static inline void amdgpu_res_next(struct 
amdgpu_res_cursor *cur, uint64_t size)
}
 }
 
+/**
+ * amdgpu_res_cleared - check if blocks are cleared
+ *
+ * @cur: the cursor to extract the block
+ *
+ * Check if the @cur block is cleared
+ */
+static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
+{
+   struct drm_buddy_block *block;
+
+   switch (cur->mem_type) {
+   case TTM_PL_VRAM:
+   block = cur->node;
+
+   if (!amdgpu_vram_mgr_is_cleared(block))
+   return false;
+   break;
+   default:
+   return false;
+   }
+
+   return true;
+}
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index fc418e670fda..f0b42b567357 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -378,11 +378,12 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
struct dma_fence *wipe_fence = NULL;
 
-   r = amdgpu_fill_buffer(abo, AMDGPU_POISON, NULL, _fence,
-   false);
+   r = amdgpu_fill_buffer(abo, 0, NULL, _fence,
+  false);
if (r) {

Re: [PATCH v10 1/3] drm/buddy: Implement tracking clear page feature

2024-04-14 Thread Paneer Selvam, Arunpravin

Hi Matthew,

On 4/10/2024 6:22 PM, Matthew Auld wrote:

On 08/04/2024 16:16, Arunpravin Paneer Selvam wrote:

- Add tracking clear page feature.

- Driver should enable the DRM_BUDDY_CLEARED flag if it
   successfully clears the blocks in the free path. On the otherhand,
   DRM buddy marks each block as cleared.

- Track the available cleared pages size

- If driver requests cleared memory we prefer cleared memory
   but fallback to uncleared if we can't find the cleared blocks.
   when driver requests uncleared memory we try to use uncleared but
   fallback to cleared memory if necessary.

- When a block gets freed we clear it and mark the freed block as 
cleared,

   when there are buddies which are cleared as well we can merge them.
   Otherwise, we prefer to keep the blocks as separated.

- Add a function to support defragmentation.

v1:
   - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as
 cleared. Else, reset the clear flag for each block in the 
list(Christian)

   - For merging the 2 cleared blocks compare as below,
 drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian)
   - Defragment the memory beginning from min_order
 till the required memory space is available.

v2: (Matthew)
   - Add a wrapper drm_buddy_free_list_internal for the freeing of 
blocks

 operation within drm buddy.
   - Write a macro block_incompatible() to allocate the required blocks.
   - Update the xe driver for the drm_buddy_free_list change in 
arguments.

   - add a warning if the two blocks are incompatible on
 defragmentation
   - call full defragmentation in the fini() function
   - place a condition to test if min_order is equal to 0
   - replace the list with safe_reverse() variant as we might
 remove the block from the list.

v3:
   - fix Gitlab user reported lockup issue.
   - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew)
   - modify to pass the root order instead max_order in fini()
 function(Matthew)
   - change bool 1 to true(Matthew)
   - add check if min_block_size is power of 2(Matthew)
   - modify the min_block_size datatype to u64(Matthew)

v4:
   - rename the function drm_buddy_defrag with __force_merge.
   - Include __force_merge directly in drm buddy file and remove
 the defrag use in amdgpu driver.
   - Remove list_empty() check(Matthew)
   - Remove unnecessary space, headers and placement of new 
variables(Matthew)

   - Add a unit test case(Matthew)

v5:
   - remove force merge support to actual range allocation and not to 
bail

 out when contains && split(Matthew)
   - add range support to force merge function.

Signed-off-by: Arunpravin Paneer Selvam 


Signed-off-by: Matthew Auld 
Suggested-by: Christian König 
Suggested-by: Matthew Auld 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +-
  drivers/gpu/drm/drm_buddy.c   | 430 ++
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   6 +-
  drivers/gpu/drm/tests/drm_buddy_test.c    |  28 +-
  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c  |   4 +-
  include/drm/drm_buddy.h   |  16 +-
  6 files changed, 368 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

index 8db880244324..c0c851409241 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,

  return 0;
    error_free_blocks:
-    drm_buddy_free_list(mm, >blocks);
+    drm_buddy_free_list(mm, >blocks, 0);
  mutex_unlock(>lock);
  error_fini:
  ttm_resource_fini(man, >base);
@@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,

    amdgpu_vram_mgr_do_reserve(man);
  -    drm_buddy_free_list(mm, >blocks);
+    drm_buddy_free_list(mm, >blocks, 0);
  mutex_unlock(>lock);
    atomic64_sub(vis_usage, >vis_usage);
@@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device 
*adev)

  kfree(rsv);
    list_for_each_entry_safe(rsv, temp, >reserved_pages, 
blocks) {

-    drm_buddy_free_list(>mm, >allocated);
+    drm_buddy_free_list(>mm, >allocated, 0);
  kfree(rsv);
  }
  if (!adev->gmc.is_app_apu)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 5ebdd6f8f36e..83dbe252f727 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -38,8 +38,8 @@ static void drm_block_free(struct drm_buddy *mm,
  kmem_cache_free(slab_blocks, block);
  }
  -static void list_insert_sorted(struct drm_buddy *mm,
-   struct drm_buddy_block *block)
+static void list_insert(struct drm_buddy *mm,
+    struct drm_buddy_block *block)


Why the change here?


  {
  struct drm_buddy_block *node;
  struct list_head *head;
@@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm,
  

[PATCH] drm/amdgpu: Modify the contiguous flags behaviour

2024-04-14 Thread Arunpravin Paneer Selvam
Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. Say if the
  allocation fails, we fallback to allocate the individual pages.

When we setTTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
  and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously.
  the allocation fails, we return -ENOSPC.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 57 +++-
 2 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8bc79924d171..41926d631563 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -153,8 +153,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
else
places[c].flags |= TTM_PL_FLAG_TOPDOWN;
 
-   if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
-   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
c++;
}
 
@@ -899,6 +897,8 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = { false, false };
+   struct ttm_place *places = bo->placements;
+   u32 c = 0;
int r, i;
 
if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
@@ -921,16 +921,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 
if (bo->tbo.pin_count) {
uint32_t mem_type = bo->tbo.resource->mem_type;
-   uint32_t mem_flags = bo->tbo.resource->placement;
 
if (!(domain & amdgpu_mem_type_to_domain(mem_type)))
return -EINVAL;
 
-   if ((mem_type == TTM_PL_VRAM) &&
-   (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) &&
-   !(mem_flags & TTM_PL_FLAG_CONTIGUOUS))
-   return -EINVAL;
-
ttm_bo_pin(>tbo);
 
if (max_offset != 0) {
@@ -968,6 +962,10 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
bo->placements[i].lpfn = lpfn;
}
 
+   if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
+   !WARN_ON(places[c].mem_type != TTM_PL_VRAM))
+   places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
r = ttm_bo_validate(>tbo, >placement, );
if (unlikely(r)) {
dev_err(adev->dev, "%p pin failed\n", bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..ddbf302878f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -88,6 +88,30 @@ static inline u64 amdgpu_vram_mgr_blocks_size(struct 
list_head *head)
return size;
 }
 
+static inline unsigned long
+amdgpu_vram_find_pages_per_block(struct ttm_buffer_object *tbo,
+const struct ttm_place *place,
+unsigned long bo_flags)
+{
+   unsigned long pages_per_block;
+
+   if (bo_flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS ||
+   place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+   pages_per_block = ~0ul;
+   } else {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   pages_per_block = HPAGE_PMD_NR;
+#else
+   /* default to 2MB */
+   pages_per_block = 2UL << (20UL - PAGE_SHIFT);
+#endif
+   pages_per_block = max_t(uint32_t, pages_per_block,
+   tbo->page_alignment);
+   }
+
+   return pages_per_block;
+}
+
 /**
  * DOC: mem_info_vram_total
  *
@@ -451,8 +475,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
u64 vis_usage = 0, max_bytes, min_block_size;
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
struct amdgpu_vram_mgr_resource *vres;
u64 size, remaining_size, lpfn, fpfn;
+   unsigned long bo_flags = bo->flags;
struct drm_buddy *mm = >mm;
struct drm_buddy_block *block;
unsigned long pages_per_block;
@@ -468,18 +494,8 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
if (tbo->type