Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Tvrtko Ursulin



On 15/05/2024 15:31, Christian König wrote:

Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin:


On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with 
buffer

allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.


Which part is not correct, that bo->preferred_domains and 
bo->allower_domains are the same bitmask?


Sorry totally forgot to explain that.

This rate limit here was specially made for OpenGL applications which 
over commit VRAM. In those case preferred_domains will be VRAM only and 
allowed_domains will be VRAM|GTT.


RADV always uses VRAM|GTT for both (which is correct).


Got it, thanks!

As such, when the code decides to throttle the migration for a 
client, it
is in fact not achieving anything. Buffers can still be either 
migrated or
not migrated based on the external (to this function and facility) 
logic.


Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    bool migration_allowed = true;
  struct ttm_resource *old_res;
  uint32_t domain;
  int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

   * visible VRAM if we've depleted our allowance to do
   * that.
   */
-    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
  domain = bo->preferred_domains;
-    else
+    } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
+    }
  } else {
  domain = bo->preferred_domains;
  }
  } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
  }
  retry:
-    amdgpu_bo_placement_from_domain(bo, domain);
+    if (migration_allowed)
+    amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() 
is a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usually code fow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


I am not clear if you agree or not that the current method of trying 
to avoid migration doesn't really do anything?


I totally agree, but the approach you taken to fix it is just quite 
broken. You can't leave bo->placement uninitialized and expect that 
ttm_bo_validate() won't move the BO.


Yep, that much was clear, sorry that I did not explicitly acknowledge 
but just moved on to discussing how to fix it properly.


On stack placements sounds plausible to force migration avoidance by 
putting a single current object placement in that list, if that is 
what you have in mind? Or a specialized flag/version of 
amdgpu_bo_placement_from_domain with an bool input like 
"allow_placement_change"?


A very rough idea with no guarantee that it actually works:

Add a TTM_PL_FLAG_RATE_LIMITED with all the TTM code to actually figure 
out how many bytes have been moved and how many bytes the current 
operation can move etc...


Friedrich's patches actually looked like quite a step into the right 
direction for that already, so I would start from there.


Then always feed amdgpu_bo_placement_from_domain() with the 
allowed_domains in the CS path and VM validation.


Finally extend amdgpu_bo_placement_from_domain() to take a closer look 
at bo->preferred_domains, similar to how we do for the 
TTM_PL_FLAG_FALLBACK already and set the TTM_PL_FLAG_RATE_LIMITED flag 
as appropriate.


Two things which I kind of don't like with the placement flag idea is 
that a) typically two placements are involved in a move so semanti

Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Christian König

Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin:


On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with 
buffer

allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.


Which part is not correct, that bo->preferred_domains and 
bo->allower_domains are the same bitmask?


Sorry totally forgot to explain that.

This rate limit here was specially made for OpenGL applications which 
over commit VRAM. In those case preferred_domains will be VRAM only and 
allowed_domains will be VRAM|GTT.


RADV always uses VRAM|GTT for both (which is correct).





As such, when the code decides to throttle the migration for a 
client, it
is in fact not achieving anything. Buffers can still be either 
migrated or
not migrated based on the external (to this function and facility) 
logic.


Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    bool migration_allowed = true;
  struct ttm_resource *old_res;
  uint32_t domain;
  int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

   * visible VRAM if we've depleted our allowance to do
   * that.
   */
-    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
  domain = bo->preferred_domains;
-    else
+    } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
+    }
  } else {
  domain = bo->preferred_domains;
  }
  } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
  }
  retry:
-    amdgpu_bo_placement_from_domain(bo, domain);
+    if (migration_allowed)
+    amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() 
is a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usually code fow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


I am not clear if you agree or not that the current method of trying 
to avoid migration doesn't really do anything?


I totally agree, but the approach you taken to fix it is just quite 
broken. You can't leave bo->placement uninitialized and expect that 
ttm_bo_validate() won't move the BO.




On stack placements sounds plausible to force migration avoidance by 
putting a single current object placement in that list, if that is 
what you have in mind? Or a specialized flag/version of 
amdgpu_bo_placement_from_domain with an bool input like 
"allow_placement_change"?


A very rough idea with no guarantee that it actually works:

Add a TTM_PL_FLAG_RATE_LIMITED with all the TTM code to actually figure 
out how many bytes have been moved and how many bytes the current 
operation can move etc...


Friedrich's patches actually looked like quite a step into the right 
direction for that already, so I would start from there.


Then always feed amdgpu_bo_placement_from_domain() with the 
allowed_domains in the CS path and VM validation.


Finally extend amdgpu_bo_placement_from_domain() to take a closer look 
at bo->preferred_domains, similar to how we do for the 
TTM_PL_FLAG_FALLBACK already and set the TTM_PL_FLAG_RATE_LIMITED flag 
as appropriate.


Regards,
Christian.


Regards,

Tvrtko



Regards,
Christian.


+
  r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {






Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Tvrtko Ursulin



On 15/05/2024 08:20, Christian König wrote:

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.


Which part is not correct, that bo->preferred_domains and 
bo->allower_domains are the same bitmask?




As such, when the code decides to throttle the migration for a client, it
is in fact not achieving anything. Buffers can still be either 
migrated or

not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

  .no_wait_gpu = false,
  .resv = bo->tbo.base.resv
  };
+    bool migration_allowed = true;
  struct ttm_resource *old_res;
  uint32_t domain;
  int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
struct amdgpu_bo *bo)

   * visible VRAM if we've depleted our allowance to do
   * that.
   */
-    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+    if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
  domain = bo->preferred_domains;
-    else
+    } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
+    }
  } else {
  domain = bo->preferred_domains;
  }
  } else {
  domain = bo->allowed_domains;
+    migration_allowed = false;
  }
  retry:
-    amdgpu_bo_placement_from_domain(bo, domain);
+    if (migration_allowed)
+    amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() is 
a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usually code fow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


I am not clear if you agree or not that the current method of trying to 
avoid migration doesn't really do anything?


On stack placements sounds plausible to force migration avoidance by 
putting a single current object placement in that list, if that is what 
you have in mind? Or a specialized flag/version of 
amdgpu_bo_placement_from_domain with an bool input like 
"allow_placement_change"?


Regards,

Tvrtko



Regards,
Christian.


+
  r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {




Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Christian König

Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.


That's not correct for the use case handled here, but see below.



As such, when the code decides to throttle the migration for a client, it
is in fact not achieving anything. Buffers can still be either migrated or
not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
.no_wait_gpu = false,
.resv = bo->tbo.base.resv
};
+   bool migration_allowed = true;
struct ttm_resource *old_res;
uint32_t domain;
int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
 * visible VRAM if we've depleted our allowance to do
 * that.
 */
-   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
domain = bo->preferred_domains;
-   else
+   } else {
domain = bo->allowed_domains;
+   migration_allowed = false;
+   }
} else {
domain = bo->preferred_domains;
}
} else {
domain = bo->allowed_domains;
+   migration_allowed = false;
}
  
  retry:

-   amdgpu_bo_placement_from_domain(bo, domain);
+   if (migration_allowed)
+   amdgpu_bo_placement_from_domain(bo, domain);


That's completely invalid. Calling amdgpu_bo_placement_from_domain() is 
a mandatory prerequisite for calling ttm_bo_validate();


E.g. the usually code fow is:

/* This initializes bo->placement */
amdgpu_bo_placement_from_domain()

/* Eventually modify bo->placement to fit special requirements */


/* Apply the placement to the BO */
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)

To sum it up bo->placement should be a variable on the stack instead, 
but we never bothered to clean that up.


Regards,
Christian.


+
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  
  	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {




[RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Current code appears to live in a misconception that playing with buffer
allowed and preferred placements can control the decision on whether
backing store migration will be attempted or not.

Both from code inspection and from empirical experiments I see that not
being true, and that both allowed and preferred placement are typically
set to the same bitmask.

As such, when the code decides to throttle the migration for a client, it
is in fact not achieving anything. Buffers can still be either migrated or
not migrated based on the external (to this function and facility) logic.

Fix it by not changing the buffer object placements if the migration
budget has been spent.

FIXME:
Is it still required to call validate is the question..

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Friedrich Vock 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 22708954ae68..d07a1dd7c880 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
.no_wait_gpu = false,
.resv = bo->tbo.base.resv
};
+   bool migration_allowed = true;
struct ttm_resource *old_res;
uint32_t domain;
int r;
@@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, struct 
amdgpu_bo *bo)
 * visible VRAM if we've depleted our allowance to do
 * that.
 */
-   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
+   if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
domain = bo->preferred_domains;
-   else
+   } else {
domain = bo->allowed_domains;
+   migration_allowed = false;
+   }
} else {
domain = bo->preferred_domains;
}
} else {
domain = bo->allowed_domains;
+   migration_allowed = false;
}
 
 retry:
-   amdgpu_bo_placement_from_domain(bo, domain);
+   if (migration_allowed)
+   amdgpu_bo_placement_from_domain(bo, domain);
+
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
-- 
2.44.0