from:"Christian König"

Re: [PATCH v2 2/2] drm/ttm: Add a device flag to propagate -ENOSPC on OOM

2024-10-02 Thread Christian König


On 02.10.24 at 14:24, Thomas Hellström wrote:

Some graphics APIs differentiate between out-of-graphics-memory and
out-of-host-memory (system memory). Add a device init flag to
have -ENOSPC propagated from the resource managers instead of being
converted to -ENOMEM, to aid driver stacks in determining what
error code to return or whether corrective action can be taken at
the driver level.

Cc: Christian König 
Cc: Matthew Brost 
Signed-off-by: Thomas Hellström 


Independent of how we communicate flags to the TTM device init function 
this looks like the right approach to me.


So feel free to add Reviewed-by: Christian König .

Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c |  2 +-
  drivers/gpu/drm/ttm/ttm_device.c |  1 +
  include/drm/ttm/ttm_device.h | 13 +
  3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 320592435252..c4bec2ad301b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -835,7 +835,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
  
  	/* For backward compatibility with userspace */

if (ret == -ENOSPC)
-   return -ENOMEM;
+   return bo->bdev->propagate_enospc ? ret : -ENOMEM;
  
  	/*

 * We might need to add a TTM.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 0c85d10e5e0b..aee9d52d745b 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -203,6 +203,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
}
  
  	bdev->funcs = funcs;

+   bdev->propagate_enospc = flags.propagate_enospc;
  
  	ttm_sys_man_init(bdev);
  
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h

index 1534bd946c78..f9da78bbd925 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -266,6 +266,13 @@ struct ttm_device {
 * @wq: Work queue structure for the delayed delete workqueue.
 */
struct workqueue_struct *wq;
+
+   /**
+* @propagate_enospc: Whether -ENOSPC should be propagated to the caller after
+* graphics memory allocation failure. If false, this will be converted to
+* -ENOMEM, which is the default behaviour.
+*/
+   bool propagate_enospc;
  };
  
  int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);

@@ -295,6 +302,12 @@ struct ttm_device_init_flags {
u32 use_dma_alloc : 1;
/** @use_dma32: If we should use GFP_DMA32 for device memory allocations. */
u32 use_dma32 : 1;
+   /**
+* @propagate_enospc: Whether -ENOSPC should be propagated to the caller after
+* graphics memory allocation failure. If false, this will be converted to
+* -ENOMEM, which is the default behaviour.
+*/
+   u32 propagate_enospc : 1;
  };
  
  int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
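
The error translation changed in the ttm_bo_validate() hunk above can be sketched in isolation. This is a minimal userspace model, not the TTM code: struct toy_ttm_device and toy_translate_error() are hypothetical names, and only the propagate_enospc field comes from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct ttm_device. */
struct toy_ttm_device {
	bool propagate_enospc;
};

/*
 * Mirrors the translation shown in the diff: -ENOSPC coming from a
 * resource manager is reported as -ENOMEM unless the device opted in
 * to propagation. All other return values pass through unchanged.
 */
static int toy_translate_error(const struct toy_ttm_device *bdev, int ret)
{
	if (ret == -ENOSPC)
		return bdev->propagate_enospc ? ret : -ENOMEM;
	return ret;
}
```

A driver stack that wants to tell out-of-graphics-memory apart from out-of-host-memory would set the flag and check for -ENOSPC; existing callers keep seeing -ENOMEM.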

Re: [PATCH v2 1/2] drm/ttm: Change ttm_device_init to use a struct instead of multiple bools

2024-10-02 Thread Christian König


On 02.10.24 at 14:24, Thomas Hellström wrote:

The ttm_device_init function uses multiple bool arguments. That means
readability in the caller becomes poor, and all callers need to change if
yet another bool is added.

Instead use a struct with multiple single-bit flags. This addresses both
problems. Prefer it over using defines or enums with explicit bit shifts,
since converting to and from these bit values uses logical operations or
tests which are implicit with the struct usage, and ofc type-checking.


Ok, that style is new to me. I've mostly seen defined parameter flags in 
the kernel.


It obviously has some advantages, but do we have any precedent in the 
kernel for using that approach?


Regards,
Christian.



This is in preparation of adding yet another bool flag parameter to the
function.
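
The calling convention described in this commit message, a struct of single-bit flags passed as a compound literal, can be sketched as follows. It is a self-contained userspace illustration: toy_init_flags, toy_bdev and toy_bdev_init() are hypothetical names modelled on the patch, with uint32_t standing in for the kernel's u32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags struct modelled on struct ttm_device_init_flags. */
struct toy_init_flags {
	uint32_t use_dma_alloc : 1;
	uint32_t use_dma32 : 1;
};

/* Hypothetical device that stores the decoded flags. */
struct toy_bdev {
	bool use_dma_alloc;
	bool use_dma32;
};

/*
 * Hypothetical init function. Flags left out of the caller's compound
 * literal are zero-initialized, so adding a new flag does not force
 * existing callers to change.
 */
static int toy_bdev_init(struct toy_bdev *bdev, const struct toy_init_flags flags)
{
	bdev->use_dma_alloc = flags.use_dma_alloc;
	bdev->use_dma32 = flags.use_dma32;
	return 0;
}
```

A call then names only the flags it sets, for example toy_bdev_init(&bdev, (struct toy_init_flags){ .use_dma32 = true }), instead of passing a positional pair such as "false, true".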

Cc: Christian König 
Cc: amd-...@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Cc: spice-de...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: Zack Rusin 
Cc: 
Cc: Sui Jingfeng 
Cc: 
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  6 --
  drivers/gpu/drm/drm_gem_vram_helper.c |  7 ---
  drivers/gpu/drm/i915/intel_region_ttm.c   |  3 ++-
  drivers/gpu/drm/loongson/lsdc_ttm.c   |  5 -
  drivers/gpu/drm/nouveau/nouveau_ttm.c |  7 +--
  drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
  drivers/gpu/drm/radeon/radeon_ttm.c   |  6 --
  drivers/gpu/drm/ttm/tests/ttm_bo_test.c   | 16 +++
  .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  3 ++-
  drivers/gpu/drm/ttm/tests/ttm_device_test.c   | 16 ---
  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 20 ---
  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |  6 ++
  drivers/gpu/drm/ttm/ttm_device.c  |  7 +++
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  4 ++--
  drivers/gpu/drm/xe/xe_device.c|  3 ++-
  include/drm/ttm/ttm_device.h  | 12 ++-
  16 files changed, 71 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 74adb983ab03..e43635ac54fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1853,8 +1853,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
   adev_to_drm(adev)->anon_inode->i_mapping,
   adev_to_drm(adev)->vma_offset_manager,
-  adev->need_swiotlb,
-  dma_addressing_limited(adev->dev));
+  (struct ttm_device_init_flags){
+  .use_dma_alloc = adev->need_swiotlb,
+  .use_dma32 = dma_addressing_limited(adev->dev)
+  });
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index 22b1fe9c03b8..7c3165b00378 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -931,9 +931,10 @@ static int drm_vram_mm_init(struct drm_vram_mm *vmm, struct drm_device *dev,
vmm->vram_size = vram_size;
  
  	ret = ttm_device_init(&vmm->bdev, &bo_driver, dev->dev,

-dev->anon_inode->i_mapping,
-dev->vma_offset_manager,
-false, true);
+ dev->anon_inode->i_mapping,
+ dev->vma_offset_manager,
+ (struct ttm_device_init_flags)
+ {.use_dma32 = true});
if (ret)
return ret;
  
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c

index 04525d92bec5..db34da63814c 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -34,7 +34,8 @@ int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
  
  	return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),

   drm->dev, drm->anon_inode->i_mapping,
-  drm->vma_offset_manager, false, false);
+  drm->vma_offset_manager,
+  (struct ttm_device_init_flags){});
  }
  
  /**

diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
index 2e42c6970c9f..c684f1636f3f 100644
--- a/drivers/gpu/drm/loongson/lsdc_ttm.c
+++ b/drivers/gpu/drm/loongson/lsdc_ttm

Re: RESEND Re: [RFC PATCH] dma-buf/dma-fence: Use a successful read_trylock() annotation for dma_fence_begin_signalling()

2024-09-18 Thread Christian König

Sorry, somehow completely missed that. Feel free to push it to 
drm-misc-next.


Christian.

On 18.09.24 at 14:34, Thomas Hellström wrote:

Christian,

Ping?


On Wed, 2024-08-14 at 10:37 +0200, Thomas Hellström wrote:

Christian,

Ack to merge this through drm-misc-next, or do you want to pick it up
for dma-buf?

Thanks,
Thomas


On Wed, 2024-08-14 at 09:10 +0200, Daniel Vetter wrote:

On Fri, May 26, 2023 at 01:11:28PM +0200, Thomas Hellström wrote:

Daniel,

On 4/28/23 14:52, Thomas Hellström wrote:

Consider the following call sequence:

/* Upper layer */
dma_fence_begin_signalling();
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
...

The driver might here use a utility that is annotated as intended for the
dma-fence signalling critical path. Now if the upper layer isn't correctly
annotated yet for whatever reason, resulting in

/* Upper layer */
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();

We will receive a false lockdep locking order violation notification from
dma_fence_begin_signalling(). However entering a dma-fence signalling
critical section itself doesn't block and could not cause a deadlock.

So use a successful read_trylock() annotation instead for
dma_fence_begin_signalling(). That will make sure that the locking order
is correctly registered in the first case, and doesn't register any
locking order in the second case.

The alternative is of course to make sure that the "Upper layer" is always
correctly annotated. But experience shows that's not easily achievable
in all cases.
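
The difference between the two annotations can be illustrated with a toy order tracker. This is only a sketch of the rule the commit message relies on (a successful trylock is not checked against, or recorded as depending on, the locks already held); it is not how lockdep is implemented, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_LOCKS 4

/* Toy order tracker; real lockdep keeps far more state than this. */
struct toy_tracker {
	/* edge[a][b] is set once b was acquired (blocking) while a was held. */
	bool edge[TOY_MAX_LOCKS][TOY_MAX_LOCKS];
	int held[TOY_MAX_LOCKS];
	int nheld;
};

/*
 * Record an acquisition. Returns true if it contradicts a recorded order.
 * A successful trylock cannot block, so it is neither checked against the
 * locks already held nor recorded as depending on them. It is still
 * pushed on the held stack so later blocking acquisitions depend on it.
 */
static bool toy_acquire(struct toy_tracker *t, int lock, bool trylock)
{
	bool violation = false;

	assert(t->nheld < TOY_MAX_LOCKS);
	if (!trylock) {
		for (int i = 0; i < t->nheld; i++) {
			int h = t->held[i];

			if (t->edge[lock][h])
				violation = true;
			t->edge[h][lock] = true;
		}
	}
	t->held[t->nheld++] = lock;
	return violation;
}

static void toy_release_all(struct toy_tracker *t)
{
	t->nheld = 0;
}

/* Runs both call sequences; returns whether the second one is flagged. */
static bool toy_second_case_flagged(bool fence_is_trylock)
{
	enum { FENCE, SHARED_LOCK };
	struct toy_tracker t = { 0 };
	bool flagged;

	/* Annotated upper layer: signalling section, then the lock. */
	toy_acquire(&t, FENCE, fence_is_trylock);
	toy_acquire(&t, SHARED_LOCK, false);
	toy_release_all(&t);

	/* Unannotated upper layer: lock first, driver enters the section. */
	toy_acquire(&t, SHARED_LOCK, false);
	flagged = toy_acquire(&t, FENCE, fence_is_trylock);
	toy_release_all(&t);

	return flagged;
}
```

In this model the FENCE -> SHARED_LOCK order is recorded in both variants, but the second sequence is reported only when the signalling section is annotated as a blocking acquisition.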

Signed-off-by: Thomas Hellström


Resurrecting the discussion on this one. I can't see a situation where we
would miss *relevant* locking order violation warnings with this patch.
Ofc if we have a scheduler annotation patch that would work fine as well,
but the lack of annotation in the scheduler callbacks is really starting
to hurt us.

Yeah this is just a bit too brain-melting to review, but I concur now.

Reviewed-by: Daniel Vetter 











I think what would help is some lockdep selftests to check that we both
catch the stuff we want to, and don't incur false positives. Maybe with a
plea that lockdep should have some native form of cross-release
annotations ...

But definitely separate patch set, since it might take a few rounds of
review by lockdep folks.
-Sima


Thanks,

Thomas




---
   drivers/dma-buf/dma-fence.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..17f632768ef9 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void)
    if (in_atomic())
    return true;
-   /* ... and non-recursive readlock */
-   lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
+   /* ... and non-recursive successful read_trylock */
+   lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_);
    return false;
   }
@@ -340,7 +340,7 @@ void __dma_fence_might_wait(void)
    lock_map_acquire(&dma_fence_lockdep_map);
    lock_map_release(&dma_fence_lockdep_map);
    if (tmp)
-   lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
+   lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_);
   }
   #endif

Re: [PATCH v7 0/3] drm: Use full allocated minor range for DRM

2024-08-26 Thread Christian König


On 23.08.24 at 18:30, Michał Winiarski wrote:

64 DRM device nodes is not enough for everyone.
Upgrade it to ~512K (which definitely is more than enough).

Additionally, convert minors to use XArray instead of IDR to simplify
the locking.

Corresponding libdrm changes were merged in:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305


Added my Acked-by and pushed the result to drm-misc-next.

Regards,
Christian.



v1 -> v2:
Don't touch DRM_MINOR_CONTROL and its range (Simon Ser)

v2 -> v3:
Don't use legacy scheme for >=192 minor range (Dave Airlie)
Add modparam for testing (Dave Airlie)
Add lockdep annotation for IDR (Daniel Vetter)

v3 -> v4:
Convert from IDR to XArray (Matthew Wilcox)

v4 -> v5:
Fixup IDR to XArray conversion (Matthew Wilcox)

v5 -> v6:
Also convert Accel to XArray
Rename skip_legacy_minors to force_extended_minors

v6 -> v7:
Drop the force_extended_minors patch intended for debug
Rebase on latest drm-tip
Update the cover letter, pointing out libdrm changes

Michał Winiarski (3):
   drm: Use XArray instead of IDR for minors
   accel: Use XArray instead of IDR for minors
   drm: Expand max DRM device number to full MINORBITS

  drivers/accel/drm_accel.c  | 110 +++--
  drivers/gpu/drm/drm_drv.c  |  97 ++---
  drivers/gpu/drm/drm_file.c |   2 +-
  drivers/gpu/drm/drm_internal.h |   4 --
  include/drm/drm_accel.h|  18 +-
  include/drm/drm_file.h |   5 ++
  6 files changed, 62 insertions(+), 174 deletions(-)

Re: [PATCH 1/2] drm/ttm: fix kernel-doc typo for @trylock_only

2024-08-23 Thread Christian König


On 23.08.24 at 16:11, Jani Nikula wrote:

s/tryock_only/trylock_only/

Fixes: da966b82bf3d ("drm/ttm: Provide a generic LRU walker helper")
Cc: Thomas Hellström 
Cc: Christian König 
Signed-off-by: Jani Nikula 


Reviewed-by: Christian König 


---
  include/drm/ttm/ttm_bo.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index d1a732d56259..7294dde240fb 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -222,7 +222,7 @@ struct ttm_lru_walk {
struct ttm_operation_ctx *ctx;
/** @ticket: The struct ww_acquire_ctx if any. */
struct ww_acquire_ctx *ticket;
-   /** @tryock_only: Only use trylock for locking. */
+   /** @trylock_only: Only use trylock for locking. */
bool trylock_only;
  };

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-24 Thread Christian König


On 24.07.24 at 11:37, Matthew Auld wrote:

On 24/07/2024 02:35, Marek Olšák wrote:
The reason is that our DCC requires 768K alignment in some cases. I 
haven't read this patch series, but one way to do that is to align to 
256K, overallocate by 512K, and then not use either 0, 256K, or 512K 
at the beginning to get to 768K alignment.


Ah, so we need a non power-of-two alignment. That makes sense, thanks.


Well actually the requirement is that memory reads for scanout need to 
be distributed over the memory channels in a certain way.


Our hw guys just expressed that as a rather strange non-power of two 
alignment :)


Christian.
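
The arithmetic Marek describes (align to 256K, overallocate by 512K, then skip 0, 256K or 512K at the start) can be checked with a short sketch. The helper name and the TOY_SZ_* macros are hypothetical and the code is a userspace illustration, not the amdgpu implementation.

```c
#include <stdint.h>

#define TOY_SZ_256K (256u * 1024u)
#define TOY_SZ_512K (512u * 1024u)
#define TOY_SZ_768K (768u * 1024u)

/*
 * Hypothetical helper: number of bytes to skip at the front of an
 * allocation so that the usable start address is a multiple of 768K.
 * For a 256K-aligned start the result is always 0, 256K or 512K, which
 * is why overallocating by 512K is sufficient.
 */
static uint64_t toy_skip_to_768k(uint64_t start)
{
	uint64_t rem = start % TOY_SZ_768K;

	return rem ? TOY_SZ_768K - rem : 0;
}
```

For example, a block starting at 256K needs a 512K skip to reach 768K, and a block starting at 512K needs a 256K skip.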





Marek

On Tue, Jul 23, 2024, 11:04 Matthew Auld <matthew.a...@intel.com> wrote:


    On 23/07/2024 14:43, Paneer Selvam, Arunpravin wrote:
 > Hi Matthew,
 >
 > Can we push this version for now as we need to mainline the DCC changes
 > ASAP, while we continue our discussion and proceed to implement the
 > permanent solution for address alignment?

    Yeah, we can always merge now and circle back around later, if this for
    sure helps your usecase and is needed asap. I just didn't fully get the
    idea for needing this interface, but likely I am missing something.

 >
 > Thanks,
 > Arun.
 >
 > On 7/23/2024 6:55 PM, Arunpravin Paneer Selvam wrote:
 >> - Add a new start parameter in trim function to specify exact
 >>    address from where to start the trimming. This would help us
 >>    in situations like if drivers would like to do address alignment
 >>    for specific requirements.
 >>
 >> - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this
 >>    flag to disable the allocator trimming part. This patch enables
 >>    the drivers control trimming and they can do it themselves
 >>    based on the application requirements.
 >>
 >> v1:(Matthew)
 >>    - check new_start alignment with min chunk_size
 >>    - use range_overflows()
 >>
 >> Signed-off-by: Arunpravin Paneer Selvam <arunpravin.paneersel...@amd.com>
 >> Acked-by: Alex Deucher <alexander.deuc...@amd.com>
 >> Acked-by: Christian König <christian.koe...@amd.com>
 >> ---
 >>   drivers/gpu/drm/drm_buddy.c  | 25
    +++--
 >>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c |  2 +-
 >>   include/drm/drm_buddy.h  |  2 ++
 >>   3 files changed, 26 insertions(+), 3 deletions(-)
 >>
 >> diff --git a/drivers/gpu/drm/drm_buddy.c
    b/drivers/gpu/drm/drm_buddy.c
 >> index 6a8e45e9d0ec..103c185bb1c8 100644
 >> --- a/drivers/gpu/drm/drm_buddy.c
 >> +++ b/drivers/gpu/drm/drm_buddy.c
 >> @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct
 >> drm_buddy *mm,
 >>    * drm_buddy_block_trim - free unused pages
 >>    *
 >>    * @mm: DRM buddy manager
 >> + * @start: start address to begin the trimming.
 >>    * @new_size: original size requested
 >>    * @blocks: Input and output list of allocated blocks.
 >>    * MUST contain single block as input to be trimmed.
 >> @@ -866,11 +867,13 @@ static int 
__alloc_contig_try_harder(struct

 >> drm_buddy *mm,
 >>    * 0 on success, error code on failure.
 >>    */
 >>   int drm_buddy_block_trim(struct drm_buddy *mm,
 >> + u64 *start,
 >>    u64 new_size,
 >>    struct list_head *blocks)
 >>   {
 >>   struct drm_buddy_block *parent;
 >>   struct drm_buddy_block *block;
 >> +    u64 block_start, block_end;
 >>   LIST_HEAD(dfs);
 >>   u64 new_start;
 >>   int err;
 >> @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy 
*mm,

 >>    struct drm_buddy_block,
 >>    link);
 >> +    block_start = drm_buddy_block_offset(block);
 >> +    block_end = block_start + drm_buddy_block_size(mm, block);
 >> +
 >>   if (WARN_ON(!drm_buddy_block_is_allocated(block)))
 >>   return -EINVAL;
 >> @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct 
drm_buddy *mm,

 >>   if (new_size == drm_buddy_block_size(mm, block))
 >>   return 0;
 >> +    new_start = block_start;
 >> +    if (start) {
 >> +    new_start = *start;
 >> +
 >> +    if (new_start < block_start)
 >> +    return -EINVA

Re: [PATCH v6 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-18 Thread Christian König


On 18.07.24 at 12:32, Arunpravin Paneer Selvam wrote:

Add address alignment support to the DCC VRAM buffers.

v2:
   - adjust size based on the max_texture_channel_caches values
 only for GFX12 DCC buffers.
   - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
 for DCC buffers.
   - roundup non power of two DCC buffer adjusted size to nearest
 power of two number as the buddy allocator does not support non
 power of two alignments. This applies only to the contiguous
 DCC buffers.

v3:(Alex)
   - rewrite the max texture channel caches comparison code in an
 algorithmic way to determine the alignment size.

v4:(Alex)
   - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
 and add a new gmc func callback for dcc alignment. If the callback
 is non-NULL, call it to get the alignment, otherwise, use the default.

v5:(Alex)
   - Set the Alignment to a default value if the callback doesn't exist.
   - Add the callback to amdgpu_gmc_funcs.

v6:
   - Fix checkpatch warning reported by Intel CI.

v7:(Christian)
   - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that
 checks the BO pinning and for a specific hw generation.
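
The size adjustment from the v2 note (add the alignment slack, then round up to a power of two because the buddy allocator only handles power-of-two sizes) can be sketched as below. toy_roundup_pow2() is a hypothetical userspace stand-in for the kernel's roundup_pow_of_two(), and the values used with it are illustrative only.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's roundup_pow_of_two().
 * Returns 0 if the result would not fit in 64 bits.
 */
static uint64_t toy_roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p && p < n)
		p <<= 1;
	return p;
}

/* Size requested from the allocator: buffer size plus alignment slack. */
static uint64_t toy_adjusted_size(uint64_t size, uint64_t adjust)
{
	return toy_roundup_pow2(size + adjust);
}
```

With a 3 MiB buffer and 512 KiB of slack the request becomes 4 MiB; the cost of the rounding is the unused tail, which is why the patch applies it only to contiguous DCC buffers.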

Signed-off-by: Arunpravin Paneer Selvam 
Acked-by: Alex Deucher 
Acked-by: Christian König 
Reviewed-by: Frank Min 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  6 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 39 +++-
  drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c   | 15 
  3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index febca3130497..654d0548a3f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs {
  uint64_t addr, uint64_t *flags);
/* get the amount of memory used by the vbios for pre-OS console */
unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev);
+   /* get the DCC buffer alignment */
+   u64 (*get_dcc_alignment)(struct amdgpu_device *adev);
  
  	enum amdgpu_memory_partition (*query_mem_partition_mode)(

struct amdgpu_device *adev);
@@ -363,6 +365,10 @@ struct amdgpu_gmc {
(adev)->gmc.gmc_funcs->override_vm_pte_flags  \
((adev), (vm), (addr), (pte_flags))
  #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev))
+#define amdgpu_gmc_get_dcc_alignment(_adev) ({ \
+   typeof(_adev) (adev) = (_adev); \
+   ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev)));   \
+})
  
  /**

   * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index f91cc149d06c..ace9d61fc512 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -512,6 +512,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
  
  	remaining_size = (u64)vres->base.size;




+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+   (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) ||
+amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) {


I think you should move this check into gmc_v12_0_get_dcc_alignment.

E.g. here you just check if adev->gmc.gmc_funcs->get_dcc_alignment is 
not NULL.


Then call the function and if it returns a non zero value apply it.

Regards,
Christian.
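
The structure suggested in this reply (the generic code only tests that the callback exists and that it returns a non-zero value, while the hardware-generation check lives inside the callback) might look roughly like this. It is a userspace sketch with hypothetical types and a placeholder alignment value, and it leaves out the power-of-two rounding done in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_adev;

/* Hypothetical function table modelled on struct amdgpu_gmc_funcs. */
struct toy_gmc_funcs {
	uint64_t (*get_dcc_alignment)(struct toy_adev *adev);
};

/* Hypothetical device; the bool stands in for the IP version check. */
struct toy_adev {
	const struct toy_gmc_funcs *gmc_funcs;
	bool needs_dcc_workaround;
};

/* Generation-specific callback: returns 0 when no workaround applies. */
static uint64_t toy_v12_get_dcc_alignment(struct toy_adev *adev)
{
	if (!adev->needs_dcc_workaround)
		return 0;
	return 256 * 1024; /* placeholder value, not taken from the patch */
}

/* Generic caller: no version checks, only the callback and its result. */
static uint64_t toy_apply_dcc_alignment(struct toy_adev *adev, uint64_t size)
{
	uint64_t align;

	if (!adev->gmc_funcs->get_dcc_alignment)
		return size;

	align = adev->gmc_funcs->get_dcc_alignment(adev);
	if (!align)
		return size;

	return size + align;
}
```

Keeping the version test inside the callback means a future generation with the same quirk only needs to provide its own callback, and the VRAM manager code stays unchanged.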



+   u64 adjust_size;
+
+   if (adev->gmc.gmc_funcs->get_dcc_alignment) {
+   adjust_size = amdgpu_gmc_get_dcc_alignment(adev);
+   remaining_size = roundup_pow_of_two(remaining_size + adjust_size);
+   vres->flags |= DRM_BUDDY_TRIM_DISABLE;
+   }
+   }
  
  	mutex_lock(&mgr->lock);

while (remaining_size) {
@@ -521,8 +532,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
min_block_size = mgr->default_page_size;
  
  		size = remaining_size;

-   if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
-   !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
+
+   if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+   (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) ||
+amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1)))
+   min_block_size = size;
+   else if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
+

Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-17 Thread Christian König


As far as I know, yes.

Regards,
Christian.

On 17.07.24 at 16:38, Paneer Selvam, Arunpravin wrote:

Hi Christian,

Can we use the below combination of flags to kick in the hardware workaround 
while pinning BOs for this specific hw generation?


if ((place->flags & TTM_PL_FLAG_CONTIGUOUS) &&
    (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) ||
     amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) {
}

Regards,
Arun.

On 7/17/2024 2:38 PM, Christian König wrote:

Well that approach was discussed before and seemed to be too complicated.

But I totally agree that the AMDGPU_GEM_CREATE_GFX12_DCC flag is a 
bad idea. This isn't anything userspace should need to specify in the 
first place.


All we need is a hardware workaround which kicks in all the time 
while pinning BOs for this specific hw generation and texture channel 
configuration.


Please remove the AMDGPU_GEM_CREATE_GFX12_DCC flag again if possible 
or specify why it is actually necessary?


Regards,
Christian.

On 17.07.24 at 05:44, Marek Olšák wrote:
AMDGPU_GEM_CREATE_GFX12_DCC is set on 90% of all memory allocations, 
and almost all of them are not displayable. Shouldn't we use a 
different way to indicate that we need a non-power-of-two alignment, 
such as looking at the alignment field directly?


Marek

On Tue, Jul 16, 2024, 11:45 Arunpravin Paneer Selvam wrote:


Add address alignment support to the DCC VRAM buffers.

v2:
  - adjust size based on the max_texture_channel_caches values
    only for GFX12 DCC buffers.
  - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
    for DCC buffers.
  - roundup non power of two DCC buffer adjusted size to nearest
    power of two number as the buddy allocator does not support non
    power of two alignments. This applies only to the contiguous
    DCC buffers.

v3:(Alex)
  - rewrite the max texture channel caches comparison code in an
    algorithmic way to determine the alignment size.

v4:(Alex)
  - Move the logic from amdgpu_vram_mgr_dcc_alignment() to
gmc_v12_0.c
    and add a new gmc func callback for dcc alignment. If the
callback
    is non-NULL, call it to get the alignment, otherwise, use
the default.

v5:(Alex)
  - Set the Alignment to a default value if the callback doesn't
exist.
  - Add the callback to amdgpu_gmc_funcs.

v6:
  - Fix checkpatch error reported by Intel CI.

Signed-off-by: Arunpravin Paneer Selvam

Acked-by: Alex Deucher 
Acked-by: Christian König 
Reviewed-by: Frank Min 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h      |  6 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 36
++--
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c       | 15 
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index febca3130497..654d0548a3f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs {
                                      uint64_t addr, uint64_t
*flags);
        /* get the amount of memory used by the vbios for pre-OS
console */
        unsigned int (*get_vbios_fb_size)(struct amdgpu_device
*adev);
+       /* get the DCC buffer alignment */
+       u64 (*get_dcc_alignment)(struct amdgpu_device *adev);

        enum amdgpu_memory_partition (*query_mem_partition_mode)(
                struct amdgpu_device *adev);
@@ -363,6 +365,10 @@ struct amdgpu_gmc {
(adev)->gmc.gmc_funcs->override_vm_pte_flags                 \
                ((adev), (vm), (addr), (pte_flags))
 #define amdgpu_gmc_get_vbios_fb_size(adev)
(adev)->gmc.gmc_funcs->get_vbios_fb_size((adev))
+#define amdgpu_gmc_get_dcc_alignment(_adev) ({          \
+       typeof(_adev) (adev) = (_adev);          \
+  ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev)));    \
+})

 /**
  * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible
through the BAR
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index f91cc149d06c..aa9dca12371c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -512,6 +512,16 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
                vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;

        remaining_size = (u64)vres->base.size;
+       if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+           bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC) {
+               u64 adjust_size;
+
+               if (adev->gmc.gmc_funcs->get_dcc_align

Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-17 Thread Christian König


Well that approach was discussed before and seemed to be too complicated.

But I totally agree that the AMDGPU_GEM_CREATE_GFX12_DCC flag is a bad 
idea. This isn't anything userspace should need to specify in the first 
place.


All we need is a hardware workaround which kicks in all the time while 
pinning BOs for this specific hw generation and texture channel 
configuration.


Please remove the AMDGPU_GEM_CREATE_GFX12_DCC flag again if possible or 
specify why it is actually necessary?


Regards,
Christian.

On 17.07.24 at 05:44, Marek Olšák wrote:
AMDGPU_GEM_CREATE_GFX12_DCC is set on 90% of all memory allocations, 
and almost all of them are not displayable. Shouldn't we use a 
different way to indicate that we need a non-power-of-two alignment, 
such as looking at the alignment field directly?


Marek

On Tue, Jul 16, 2024, 11:45 Arunpravin Paneer Selvam wrote:


Add address alignment support to the DCC VRAM buffers.

v2:
  - adjust size based on the max_texture_channel_caches values
    only for GFX12 DCC buffers.
  - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
    for DCC buffers.
  - roundup non power of two DCC buffer adjusted size to nearest
    power of two number as the buddy allocator does not support non
    power of two alignments. This applies only to the contiguous
    DCC buffers.

v3:(Alex)
  - rewrite the max texture channel caches comparison code in an
    algorithmic way to determine the alignment size.

v4:(Alex)
  - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
    and add a new gmc func callback for dcc alignment. If the callback
    is non-NULL, call it to get the alignment, otherwise, use the
default.

v5:(Alex)
  - Set the Alignment to a default value if the callback doesn't
exist.
  - Add the callback to amdgpu_gmc_funcs.

v6:
  - Fix checkpatch error reported by Intel CI.

Signed-off-by: Arunpravin Paneer Selvam

Acked-by: Alex Deucher 
Acked-by: Christian König 
Reviewed-by: Frank Min 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h      |  6 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 36
++--
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c       | 15 
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index febca3130497..654d0548a3f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs {
                                      uint64_t addr, uint64_t *flags);
        /* get the amount of memory used by the vbios for pre-OS
console */
        unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev);
+       /* get the DCC buffer alignment */
+       u64 (*get_dcc_alignment)(struct amdgpu_device *adev);

        enum amdgpu_memory_partition (*query_mem_partition_mode)(
                struct amdgpu_device *adev);
@@ -363,6 +365,10 @@ struct amdgpu_gmc {
(adev)->gmc.gmc_funcs->override_vm_pte_flags             \
                ((adev), (vm), (addr), (pte_flags))
 #define amdgpu_gmc_get_vbios_fb_size(adev)
(adev)->gmc.gmc_funcs->get_vbios_fb_size((adev))
+#define amdgpu_gmc_get_dcc_alignment(_adev) ({      \
+       typeof(_adev) (adev) = (_adev);      \
+  ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev)));    \
+})

 /**
  * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible
through the BAR
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index f91cc149d06c..aa9dca12371c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -512,6 +512,16 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
                vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;

        remaining_size = (u64)vres->base.size;
+       if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
+           bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC) {
+               u64 adjust_size;
+
+               if (adev->gmc.gmc_funcs->get_dcc_alignment) {
+                       adjust_size =
amdgpu_gmc_get_dcc_alignment(adev);
+                       remaining_size =
roundup_pow_of_two(remaining_size + adjust_size);
+                       vres->flags |= DRM_BUDDY_TRIM_DISABLE;
+               }
+       }

        mutex_lock(&mgr->lock);
        while (remaining_size) {
@@ -521,8 +531,12 @@ static int amdgpu_vram_mgr_new(struct
ttm_resource_manager *man,
                        min_block_size = mgr->default_page_size;

Re: [PATCH 1/2] drm: Add might_fault to drm_modeset_lock priming

2024-07-10 Thread Christian König


On 10.07.24 at 13:58, Daniel Vetter wrote:

On Wed, 10 Jul 2024 at 13:39, Christian König  wrote:

On 10.07.24 at 11:31, Daniel Vetter wrote:

We already teach lockdep that dma_resv nests within drm_modeset_lock,
but there's a lot more: All drm kms ioctl rely on being able to
put/get_user while holding modeset locks, so we really need a
might_fault in there too to complete the picture. Add it.

Mhm, lockdep should be able to deduce that when there might be faults
under the dma_resv lock there might also be faults under the
drm_modeset_lock.

You're not allowed to take a fault under dma_resv, because drivers
might need to take that lock to handle faults. So unfortunately in our
combined lockdep priming, there really seems to be no chain yet that
teaches about faults possibly happening while holding
drm_modeset_lock.


Ah, of course! You are right, it was just the other way around.

Thanks,
Christian.
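
The reasoning in this exchange can be pictured with a small user-space sketch of a lock-dependency graph. It is not lockdep and not the DRM code: the lock classes, struct dep_graph and dep_add() are hypothetical names chosen for illustration, and the console/filesystem edges are an assumed example chain. What it tries to show is that once the "modeset lock held while faulting" edge is recorded up front (which is what priming with might_fault() is meant to achieve), a later edge that closes a loop is rejected the moment it is added, rather than only when a DRM ioctl actually faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, for illustration only. */
enum lock_class {
	LC_MODESET,	/* stands in for drm_modeset_lock */
	LC_DMA_RESV,	/* stands in for dma_resv */
	LC_FAULT,	/* locks taken while handling a page fault */
	LC_FS,		/* some filesystem lock (assumed) */
	LC_CONSOLE,	/* console_lock (assumed to nest outside modeset) */
	LC_COUNT
};

/* edge[a][b] == true means "b was acquired while a was held". */
struct dep_graph {
	bool edge[LC_COUNT][LC_COUNT];
};

static void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool dep_reachable(const struct dep_graph *g, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < LC_COUNT; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    dep_reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "inner acquired while outer held". Returns false, and records
 * nothing, if the new edge would close a dependency loop.
 */
static bool dep_add(struct dep_graph *g, int outer, int inner)
{
	bool seen[LC_COUNT] = { false };

	assert(outer != inner);
	if (dep_reachable(g, inner, outer, seen))
		return false;
	g->edge[outer][inner] = true;
	return true;
}
```

With the LC_MODESET to LC_FAULT edge primed, adding LC_FS to LC_CONSOLE fails immediately in this model. Without that edge the same addition is accepted, and the loop would only show up once some path really faults under the modeset lock.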


-Sima


Motivated by a syzbot report that blew up on bcachefs doing an
unconditional console_lock way deep in the locking hierarchy, and
lockdep only noticing the dependency loop in a drm ioctl instead of much
earlier. This annotation will make sure such issues have a much harder
time escaping.

References: 
https://lore.kernel.org/dri-devel/73db8b061cd43...@google.com/
Signed-off-by: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org

On the other hand pointing it out explicitly doesn't hurt us at all, so
Reviewed-by: Christian König.

Regards,
Christian.


---
   drivers/gpu/drm/drm_mode_config.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/drm_mode_config.c 
b/drivers/gpu/drm/drm_mode_config.c
index 568972258222..37d2e0a4ef4b 100644
--- a/drivers/gpu/drm/drm_mode_config.c
+++ b/drivers/gpu/drm/drm_mode_config.c
@@ -456,6 +456,8 @@ int drmm_mode_config_init(struct drm_device *dev)
   if (ret == -EDEADLK)
   ret = drm_modeset_backoff(&modeset_ctx);

+ might_fault();
+
   ww_acquire_init(&resv_ctx, &reservation_ww_class);
   ret = dma_resv_lock(&resv, &resv_ctx);
   if (ret == -EDEADLK)

Re: [PATCH 1/2] drm: Add might_fault to drm_modeset_lock priming

2024-07-10 Thread Christian König


On 10.07.24 at 11:31, Daniel Vetter wrote:

We already teach lockdep that dma_resv nests within drm_modeset_lock,
but there's a lot more: All drm kms ioctl rely on being able to
put/get_user while holding modeset locks, so we really need a
might_fault in there too to complete the picture. Add it.


Mhm, lockdep should be able to deduce that when there might be faults 
under the dma_resv lock there might also be faults under the 
drm_modeset_lock.




Motivated by a syzbot report that blew up on bcachefs doing an
unconditional console_lock way deep in the locking hierarchy, and
lockdep only noticing the dependency loop in a drm ioctl instead of much
earlier. This annotation will make sure such issues have a much harder
time escaping.

References: 
https://lore.kernel.org/dri-devel/73db8b061cd43...@google.com/
Signed-off-by: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org


On the other hand pointing it out explicitly doesn't hurt us at all, so 
Reviewed-by: Christian König.


Regards,
Christian.


---
  drivers/gpu/drm/drm_mode_config.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/drm_mode_config.c 
b/drivers/gpu/drm/drm_mode_config.c
index 568972258222..37d2e0a4ef4b 100644
--- a/drivers/gpu/drm/drm_mode_config.c
+++ b/drivers/gpu/drm/drm_mode_config.c
@@ -456,6 +456,8 @@ int drmm_mode_config_init(struct drm_device *dev)
if (ret == -EDEADLK)
ret = drm_modeset_backoff(&modeset_ctx);
  
+		might_fault();
+
ww_acquire_init(&resv_ctx, &reservation_ww_class);
ret = dma_resv_lock(&resv, &resv_ctx);
if (ret == -EDEADLK)

Re: [PATCH 2/6] drm/ttm: Store the bo_kmap_type in struct iosys_map

2024-06-17 Thread Christian König


Hi,

On 17.06.24 at 14:32, Thomas Zimmermann wrote:

Hi

On 14.06.24 at 16:31, Christian König wrote:

On 14.06.24 at 15:21, Thomas Zimmermann wrote:

For each instance of struct iosys_map set up by ttm_bo_vmap(), store
the type of allocation in the instance. Use this information to unmap
the memory in ttm_bo_vunmap(). This change simplifies the unmap code
and puts the complicated logic entirely into the map code.


I'm not sure that's a good idea.

The mapping information should already be available in the resource 
and storing it in the iosys_map structures duplicates that information.


So we might run into the issue that the resource has changed and so 
we need a different approach now, but the iosys_map will say that we 
should unmap things for example.


Patches 1 and 2 are only here to make patch 4 (add the kmap special 
case) work. How can I distinguish between vmap'ed and kmap'ed memory? 
It's not clear to me, whether there is a benefit from patch 4. The xe 
driver makes it sound like that, but the rest of the kernel appears to 
disagree.


Yeah, exactly that's the point.

The last time we talked about that we came to the conclusion that the 
kmap approach of mapping only a single page or range of pages isn't that 
useful in general.


The only use case where you actually need this is the 
ttm_bo_vm_access_kmap() helper and that is static and internal to TTM.


So what exactly is the use case xe tries to handle here?

Regards,
Christian.



Best regards
Thomas



Regards,
Christian.



Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/ttm/ttm_bo_util.c | 46 +--
  1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c

index 0b3f4267130c4..a9df0deff2deb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -36,6 +36,7 @@
  #include 
    #include 
+#include 
    struct ttm_transfer_obj {
  struct ttm_buffer_object base;
@@ -479,24 +480,29 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, 
struct iosys_map *map)

    if (mem->bus.is_iomem) {
  void __iomem *vaddr_iomem;
+    u16 alloc_flags;
  -    if (mem->bus.addr)
+    if (mem->bus.addr) {
  vaddr_iomem = (void __iomem *)mem->bus.addr;
-    else if (mem->bus.caching == ttm_write_combined)
-    vaddr_iomem = ioremap_wc(mem->bus.offset,
- bo->base.size);
+    alloc_flags = ttm_bo_map_premapped;
+    } else if (mem->bus.caching == ttm_write_combined) {
+    vaddr_iomem = ioremap_wc(mem->bus.offset, bo->base.size);
+    alloc_flags = ttm_bo_map_iomap;
  #ifdef CONFIG_X86
-    else if (mem->bus.caching == ttm_cached)
-    vaddr_iomem = ioremap_cache(mem->bus.offset,
-  bo->base.size);
+    } else if (mem->bus.caching == ttm_cached) {
+    vaddr_iomem = ioremap_cache(mem->bus.offset, 
bo->base.size);

+    alloc_flags = ttm_bo_map_iomap;
  #endif
-    else
+    } else {
  vaddr_iomem = ioremap(mem->bus.offset, bo->base.size);
+    alloc_flags = ttm_bo_map_iomap;
+    }
    if (!vaddr_iomem)
  return -ENOMEM;
    iosys_map_set_vaddr_iomem(map, vaddr_iomem);
+    map->alloc_flags = alloc_flags;
    } else {
  struct ttm_operation_ctx ctx = {
@@ -506,6 +512,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, 
struct iosys_map *map)

  struct ttm_tt *ttm = bo->ttm;
  pgprot_t prot;
  void *vaddr;
+    u16 alloc_flags;
    ret = ttm_tt_populate(bo->bdev, ttm, &ctx);
  if (ret)
@@ -519,8 +526,10 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, 
struct iosys_map *map)

  vaddr = vmap(ttm->pages, ttm->num_pages, 0, prot);
  if (!vaddr)
  return -ENOMEM;
+    alloc_flags = ttm_bo_map_vmap;
    iosys_map_set_vaddr(map, vaddr);
+    map->alloc_flags = alloc_flags;
  }
    return 0;
@@ -537,20 +546,27 @@ EXPORT_SYMBOL(ttm_bo_vmap);
   */
  void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map 
*map)

  {
-    struct ttm_resource *mem = bo->resource;
-
  dma_resv_assert_held(bo->base.resv);
    if (iosys_map_is_null(map))
  return;
  -    if (!map->is_iomem)
-    vunmap(map->vaddr);
-    else if (!mem->bus.addr)
+    switch (map->alloc_flags) {
+    case ttm_bo_map_iomap:
  iounmap(map->vaddr_iomem);
-    iosys_map_clear(map);
-
+    break;
+    case ttm_bo_map_vmap:
+    vunmap(map->vaddr);
+    break;
+    case ttm_bo_map_premapped:
+    break;
+    default:
+    drm_err(bo->base.dev, "Unsupported alloc_flags 0x%x\n", 
map->alloc_flags);

+    return;
+    }
  ttm_mem_io_free(bo->bdev, bo->resource);
+
+    iosys_map_clear(map);
  }
  EXPORT_SYMBOL(ttm_bo_vunmap);

Re: [PATCH 3/6] drm/ttm: Support partial buffer mappings for ttm_bo_vmap()

2024-06-14 Thread Christian König


On 14.06.24 at 15:21, Thomas Zimmermann wrote:

Add offset and size parameters to ttm_bo_vmap() to allow for partial
mappings of a buffer object. This brings the functionality on par with
ttm_bo_kmap().


Well the long term plan was to remove this functionality from 
ttm_bo_kmap() and nuke that function sooner or later.


What exactly is the use case for partial mappings?

Regards,
Christian.



Callers pass the byte offset and size within the buffer object and
receive a page-aligned mapping of the buffer object's memory for the
specified area.

Also update all callers of ttm_bo_vmap() for the new parameters. As
before, existing callers map the buffer object's complete memory.
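
The translation from a byte range to a page-aligned mapping described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the patch: SKETCH_PAGE_SIZE (4 KiB), struct sketch_page_range and sketch_byte_range_to_pages() are hypothetical names, and the range check merely mirrors the -EINVAL case listed in the kernel-doc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical result type: which pages to map, and where the data starts. */
struct sketch_page_range {
	unsigned long first_page;	/* index of the first page to map */
	unsigned long num_pages;	/* number of pages covering the range */
	unsigned long offset_in_page;	/* byte offset of the data in page 0 */
};

/*
 * Convert a byte range inside a buffer of bo_size bytes into the set of
 * whole pages that covers it. Returns false for an invalid range (the
 * case a kernel implementation would report as -EINVAL).
 */
static bool sketch_byte_range_to_pages(unsigned long bo_size,
				       unsigned long offset,
				       unsigned long size,
				       struct sketch_page_range *out)
{
	unsigned long last_byte;

	assert(out != NULL);

	/* Written so that offset + size cannot overflow. */
	if (size == 0 || offset >= bo_size || size > bo_size - offset)
		return false;

	last_byte = offset + size - 1;
	out->first_page = offset / SKETCH_PAGE_SIZE;
	out->num_pages = last_byte / SKETCH_PAGE_SIZE - out->first_page + 1;
	out->offset_in_page = offset % SKETCH_PAGE_SIZE;
	return true;
}
```

Because the mapping starts at the next lower page boundary, a caller in this model would add offset_in_page to the mapped address to reach the first requested byte.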

Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/drm_gem_ttm_helper.c  |  2 +-
  drivers/gpu/drm/drm_gem_vram_helper.c |  2 +-
  drivers/gpu/drm/loongson/lsdc_gem.c   |  2 +-
  drivers/gpu/drm/qxl/qxl_object.c  |  2 +-
  drivers/gpu/drm/ttm/ttm_bo_util.c | 21 +++--
  drivers/gpu/drm/xe/xe_lrc.c   |  2 +-
  drivers/gpu/drm/xe/xe_vm.c|  2 +-
  include/drm/ttm/ttm_bo.h  |  4 +++-
  8 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_ttm_helper.c 
b/drivers/gpu/drm/drm_gem_ttm_helper.c
index 3734aa2d1c5b5..f26b7c9077a68 100644
--- a/drivers/gpu/drm/drm_gem_ttm_helper.c
+++ b/drivers/gpu/drm/drm_gem_ttm_helper.c
@@ -67,7 +67,7 @@ int drm_gem_ttm_vmap(struct drm_gem_object *gem,
  {
struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(gem);
  
-	return ttm_bo_vmap(bo, map);
+   return ttm_bo_vmap(bo, 0, gem->size, map);
  }
  EXPORT_SYMBOL(drm_gem_ttm_vmap);
  
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c

index 6027584406af6..1670f9a459a9d 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -398,7 +398,7 @@ int drm_gem_vram_vmap(struct drm_gem_vram_object *gbo, 
struct iosys_map *map)
 * no mapping present.
 */
if (iosys_map_is_null(&gbo->map)) {
-   ret = ttm_bo_vmap(&gbo->bo, &gbo->map);
+   ret = ttm_bo_vmap(&gbo->bo, 0, gbo->bo.base.size, &gbo->map);
if (ret)
return ret;
}
diff --git a/drivers/gpu/drm/loongson/lsdc_gem.c 
b/drivers/gpu/drm/loongson/lsdc_gem.c
index a720d8f532093..f709960c781b9 100644
--- a/drivers/gpu/drm/loongson/lsdc_gem.c
+++ b/drivers/gpu/drm/loongson/lsdc_gem.c
@@ -77,7 +77,7 @@ static int lsdc_gem_object_vmap(struct drm_gem_object *obj, 
struct iosys_map *ma
return ret;
}
  
-	ret = ttm_bo_vmap(tbo, &lbo->map);
+   ret = ttm_bo_vmap(tbo, 0, tbo->base.size, &lbo->map);
if (ret) {
drm_err(obj->dev, "ttm bo vmap failed\n");
lsdc_bo_unpin(lbo);
diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index 5893e27a7ae50..9f06d5e26a32c 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -164,7 +164,7 @@ int qxl_bo_vmap_locked(struct qxl_bo *bo, struct iosys_map 
*map)
goto out;
}
  
-	r = ttm_bo_vmap(&bo->tbo, &bo->map);
+   r = ttm_bo_vmap(&bo->tbo, 0, bo->tbo.base.size, &bo->map);
if (r) {
qxl_bo_unpin_locked(bo);
return r;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index a9df0deff2deb..31f9772f05dac 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -457,17 +457,23 @@ EXPORT_SYMBOL(ttm_bo_kunmap);
   * ttm_bo_vmap
   *
   * @bo: The buffer object.
+ * @offset: Byte offset into the buffer.
+ * @size: Number of bytes to map.
   * @map: pointer to a struct iosys_map representing the map.
   *
   * Sets up a kernel virtual mapping, using ioremap or vmap to the
   * data in the buffer object. The parameter @map returns the virtual
   * address as struct iosys_map. Unmap the buffer with ttm_bo_vunmap().
+ * The address stored in @map will be aligned to the next lower page
+ * boundaries.
   *
   * Returns
   * -ENOMEM: Out of memory.
   * -EINVAL: Invalid range.
   */
-int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map)
+int ttm_bo_vmap(struct ttm_buffer_object *bo,
+   unsigned long offset, unsigned long size,
+   struct iosys_map *map)
  {
struct ttm_resource *mem = bo->resource;
int ret;
@@ -483,18 +489,18 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct 
iosys_map *map)
u16 alloc_flags;
  
  		if (mem->bus.addr) {

-   vaddr_iomem = (void __iomem *)mem->bus.addr;
+   vaddr_iomem = (u8 __iomem *)mem->bus.addr + offset;
alloc_flags = ttm_bo_map_premapped;
} else if (mem->bus.caching == ttm_write_combined) {
-   vaddr_iomem = ioremap_wc(mem->bus.offset, 
bo->base.size);
+

Re: [PATCH 2/6] drm/ttm: Store the bo_kmap_type in struct iosys_map

2024-06-14 Thread Christian König


On 14.06.24 at 15:21, Thomas Zimmermann wrote:

For each instance of struct iosys_map set up by ttm_bo_vmap(), store
the type of allocation in the instance. Use this information to unmap
the memory in ttm_bo_vunmap(). This change simplifies the unmap code
and puts the complicated logic entirely into the map code.


I'm not sure that's a good idea.

The mapping information should already be available in the resource and 
storing it in the iosys_map structures duplicates that information.


So we might run into the issue that the resource has changed and so we 
need a different approach now, but the iosys_map will say that we should 
unmap things for example.
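
The duplication concern can be pictured with a small stand-alone sketch. Everything in it is hypothetical (struct sketch_resource, struct sketch_map and the helpers are not TTM types); it only models the general problem that a mapping type cached at map time may disagree with what the resource describes later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping kinds, loosely following the names in the patch. */
enum sketch_map_kind {
	SKETCH_MAP_PREMAPPED,
	SKETCH_MAP_IOMAP,
	SKETCH_MAP_VMAP
};

/* Stand-in for the resource: the authoritative placement description. */
struct sketch_resource {
	bool is_iomem;
	void *premapped_addr;
};

/* Stand-in for the map object that caches a copy of the kind. */
struct sketch_map {
	enum sketch_map_kind cached_kind;
};

/* Derive the mapping kind from the resource as it looks right now. */
static enum sketch_map_kind
sketch_kind_from_resource(const struct sketch_resource *res)
{
	assert(res != NULL);
	if (!res->is_iomem)
		return SKETCH_MAP_VMAP;
	return res->premapped_addr ? SKETCH_MAP_PREMAPPED : SKETCH_MAP_IOMAP;
}

/* At map time, store a second copy of that information in the map. */
static void sketch_map_setup(struct sketch_map *map,
			     const struct sketch_resource *res)
{
	map->cached_kind = sketch_kind_from_resource(res);
}

/* True if the cached copy no longer matches the resource. */
static bool sketch_map_is_stale(const struct sketch_map *map,
				const struct sketch_resource *res)
{
	return map->cached_kind != sketch_kind_from_resource(res);
}
```

In this model, once the resource changes after sketch_map_setup(), the cached kind and the resource give different answers about how to unmap, and the code has to decide which one to trust.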


Regards,
Christian.



Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/ttm/ttm_bo_util.c | 46 +--
  1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 0b3f4267130c4..a9df0deff2deb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -36,6 +36,7 @@
  #include 
  
  #include 

+#include 
  
  struct ttm_transfer_obj {

struct ttm_buffer_object base;
@@ -479,24 +480,29 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct 
iosys_map *map)
  
  	if (mem->bus.is_iomem) {

void __iomem *vaddr_iomem;
+   u16 alloc_flags;
  
-		if (mem->bus.addr)

+   if (mem->bus.addr) {
vaddr_iomem = (void __iomem *)mem->bus.addr;
-   else if (mem->bus.caching == ttm_write_combined)
-   vaddr_iomem = ioremap_wc(mem->bus.offset,
-bo->base.size);
+   alloc_flags = ttm_bo_map_premapped;
+   } else if (mem->bus.caching == ttm_write_combined) {
+   vaddr_iomem = ioremap_wc(mem->bus.offset, 
bo->base.size);
+   alloc_flags = ttm_bo_map_iomap;
  #ifdef CONFIG_X86
-   else if (mem->bus.caching == ttm_cached)
-   vaddr_iomem = ioremap_cache(mem->bus.offset,
- bo->base.size);
+   } else if (mem->bus.caching == ttm_cached) {
+   vaddr_iomem = ioremap_cache(mem->bus.offset, 
bo->base.size);
+   alloc_flags = ttm_bo_map_iomap;
  #endif
-   else
+   } else {
vaddr_iomem = ioremap(mem->bus.offset, bo->base.size);
+   alloc_flags = ttm_bo_map_iomap;
+   }
  
  		if (!vaddr_iomem)

return -ENOMEM;
  
  		iosys_map_set_vaddr_iomem(map, vaddr_iomem);

+   map->alloc_flags = alloc_flags;
  
  	} else {

struct ttm_operation_ctx ctx = {
@@ -506,6 +512,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct 
iosys_map *map)
struct ttm_tt *ttm = bo->ttm;
pgprot_t prot;
void *vaddr;
+   u16 alloc_flags;
  
  		ret = ttm_tt_populate(bo->bdev, ttm, &ctx);

if (ret)
@@ -519,8 +526,10 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct 
iosys_map *map)
vaddr = vmap(ttm->pages, ttm->num_pages, 0, prot);
if (!vaddr)
return -ENOMEM;
+   alloc_flags = ttm_bo_map_vmap;
  
  		iosys_map_set_vaddr(map, vaddr);

+   map->alloc_flags = alloc_flags;
}
  
  	return 0;

@@ -537,20 +546,27 @@ EXPORT_SYMBOL(ttm_bo_vmap);
   */
  void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map)
  {
-   struct ttm_resource *mem = bo->resource;
-
dma_resv_assert_held(bo->base.resv);
  
  	if (iosys_map_is_null(map))

return;
  
-	if (!map->is_iomem)

-   vunmap(map->vaddr);
-   else if (!mem->bus.addr)
+   switch (map->alloc_flags) {
+   case ttm_bo_map_iomap:
iounmap(map->vaddr_iomem);
-   iosys_map_clear(map);
-
+   break;
+   case ttm_bo_map_vmap:
+   vunmap(map->vaddr);
+   break;
+   case ttm_bo_map_premapped:
+   break;
+   default:
+   drm_err(bo->base.dev, "Unsupported alloc_flags 0x%x\n", 
map->alloc_flags);
+   return;
+   }
ttm_mem_io_free(bo->bdev, bo->resource);
+
+   iosys_map_clear(map);
  }
  EXPORT_SYMBOL(ttm_bo_vunmap);

Re: [PATCH V2] drm/ttm: remove unused paramter

2024-04-11 Thread Christian König


On 01.04.24 at 05:04, jesse.zh...@amd.com wrote:

From: Jesse Zhang 

Remove the unused parameter in the functions
ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence.
  V2: rebase the patch on top of drm-misc-next (Christian)


And pushed to drm-misc-next.

Thanks,
Christian.



Signed-off-by: Jesse Zhang 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e059b1e1b13b..6396dece0db1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
  EXPORT_SYMBOL(ttm_bo_put);
  
  static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,

-struct ttm_resource **mem,
 struct ttm_operation_ctx *ctx,
 struct ttm_place *hop)
  {
@@ -469,7 +468,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
if (ret != -EMULTIHOP)
break;
  
-		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);

+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
} while (!ret);
  
  	if (ret) {

@@ -698,7 +697,6 @@ EXPORT_SYMBOL(ttm_bo_unpin);
   */
  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,
-struct ttm_resource *mem,
 bool no_wait_gpu)
  {
struct dma_fence *fence;
@@ -787,7 +785,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object 
*bo,
if (ret)
continue;
  
-		ret = ttm_bo_add_move_fence(bo, man, *res, ctx->no_wait_gpu);

+   ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
if (unlikely(ret)) {
ttm_resource_free(bo, res);
if (ret == -EBUSY)
@@ -894,7 +892,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
  bounce:
ret = ttm_bo_handle_move_mem(bo, res, false, ctx, &hop);
if (ret == -EMULTIHOP) {
-   ret = ttm_bo_bounce_temp_buffer(bo, &res, ctx, &hop);
+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
/* try and move to final place now. */
if (!ret)
goto bounce;

Re: [PATCH] drm/ttm: remove unused paramter

2024-03-29 Thread Christian König


On 29.03.24 at 12:10, Christian König wrote:

On 25.03.24 at 08:45, Jesse Zhang wrote:

Remove the unused parameter in the functions
ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence.

Signed-off-by: Jesse Zhang 


Good catch, Reviewed-by: Christian König 


Please rebase that patch on top of drm-misc-next.

Regards,
Christian.



Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..7f08787687a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
  EXPORT_SYMBOL(ttm_bo_put);
    static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
- struct ttm_resource **mem,
   struct ttm_operation_ctx *ctx,
   struct ttm_place *hop)
  {
@@ -470,7 +469,7 @@ static int ttm_bo_evict(struct ttm_buffer_object 
*bo,

  if (ret != -EMULTIHOP)
  break;
  -    ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
+    ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
  } while (!ret);
    if (ret) {
@@ -699,7 +698,6 @@ EXPORT_SYMBOL(ttm_bo_unpin);
   */
  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
   struct ttm_resource_manager *man,
- struct ttm_resource *mem,
   bool no_wait_gpu)
  {
  struct dma_fence *fence;
@@ -753,7 +751,7 @@ static int ttm_bo_mem_force_space(struct 
ttm_buffer_object *bo,

  return ret;
  } while (1);
  -    return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
+    return ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
  }
    /**
@@ -802,7 +800,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
  if (unlikely(ret))
  goto error;
  -    ret = ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
+    ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
  if (unlikely(ret)) {
  ttm_resource_free(bo, mem);
  if (ret == -EBUSY)
@@ -866,7 +864,7 @@ static int ttm_bo_move_buffer(struct 
ttm_buffer_object *bo,

  bounce:
  ret = ttm_bo_handle_move_mem(bo, mem, false, ctx, &hop);
  if (ret == -EMULTIHOP) {
-    ret = ttm_bo_bounce_temp_buffer(bo, &mem, ctx, &hop);
+    ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
  if (ret)
  goto out;
  /* try and move to final place now. */

Re: [PATCH] drm/ttm: remove unused paramter

2024-03-29 Thread Christian König


On 25.03.24 at 08:45, Jesse Zhang wrote:

Remove the unused parameter in the functions
ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence.

Signed-off-by: Jesse Zhang 


Good catch, Reviewed-by: Christian König 

Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..7f08787687a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
  EXPORT_SYMBOL(ttm_bo_put);
  
  static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,

-struct ttm_resource **mem,
 struct ttm_operation_ctx *ctx,
 struct ttm_place *hop)
  {
@@ -470,7 +469,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
if (ret != -EMULTIHOP)
break;
  
-		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);

+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
} while (!ret);
  
  	if (ret) {

@@ -699,7 +698,6 @@ EXPORT_SYMBOL(ttm_bo_unpin);
   */
  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,
-struct ttm_resource *mem,
 bool no_wait_gpu)
  {
struct dma_fence *fence;
@@ -753,7 +751,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
return ret;
} while (1);
  
-	return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);

+   return ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
  }
  
  /**

@@ -802,7 +800,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (unlikely(ret))
goto error;
  
-		ret = ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);

+   ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
if (unlikely(ret)) {
ttm_resource_free(bo, mem);
if (ret == -EBUSY)
@@ -866,7 +864,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
  bounce:
ret = ttm_bo_handle_move_mem(bo, mem, false, ctx, &hop);
if (ret == -EMULTIHOP) {
-   ret = ttm_bo_bounce_temp_buffer(bo, &mem, ctx, &hop);
+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop);
if (ret)
goto out;
/* try and move to final place now. */

Re: [PATCH v2 09/16] drm/ttm: fix ttm_bo.h kernel-doc warnings

2024-03-08 Thread Christian König


On 08.03.24 at 12:55, Jani Nikula wrote:

Some renames, some formatting fixes, add some missing documentation.

Cc: Christian Koenig 
Cc: Huang Rui 
Acked-by: Thomas Zimmermann 
Signed-off-by: Jani Nikula 
---
  include/drm/ttm/ttm_bo.h | 18 --
  1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 0223a41a64b2..59151ecb2db3 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -83,6 +83,9 @@ enum ttm_bo_type {
   * @resource: structure describing current placement.
   * @ttm: TTM structure holding system pages.
   * @deleted: True if the object is only a zombie and already deleted.
+ * @bulk_move: The bulk move object.
+ * @priority: Priority for LRU, BOs with lower priority are evicted first.
+ * @pin_count: Pin count.
   *
   * Base class for TTM buffer object, that deals with data placement and CPU
   * mappings. GPU mappings are really up to the driver, but for simpler GPUs
@@ -128,26 +131,28 @@ struct ttm_buffer_object {
struct work_struct delayed_delete;
  
  	/**

-* Special members that are protected by the reserve lock
-* and the bo::lock when written to. Can be read with
-* either of these locks held.
+* @sg: Special members that are protected by the reserve lock and the
+* bo::lock when written to. Can be read with either of these locks
+* held.


Actually that is completely outdated since the bo::lock was removed 
years ago.


I would just write that as "@sg: external source of pages and DMA 
addresses, protected by the reservation lock." (or something like this).


With that fixed, feel free to add Reviewed-by: Christian König
to this patch and the other TTM cleanup patches in this series.


Regards,
Christian.


 */
struct sg_table *sg;
  };
  
+#define TTM_BO_MAP_IOMEM_MASK 0x80

+
  /**
   * struct ttm_bo_kmap_obj
   *
   * @virtual: The current kernel virtual address.
   * @page: The page when kmap'ing a single page.
   * @bo_kmap_type: Type of bo_kmap.
+ * @bo: The TTM BO.
   *
   * Object describing a kernel mapping. Since a TTM bo may be located
   * in various memory types with various caching policies, the
   * mapping can either be an ioremap, a vmap, a kmap or part of a
   * premapped region.
   */
-#define TTM_BO_MAP_IOMEM_MASK 0x80
  struct ttm_bo_kmap_obj {
void *virtual;
struct page *page;
@@ -171,6 +176,7 @@ struct ttm_bo_kmap_obj {
   * @force_alloc: Don't check the memory account during suspend or CPU page
   * faults. Should only be used by TTM internally.
   * @resv: Reservation object to allow reserved evictions with.
+ * @bytes_moved: Statistics on how many bytes have been moved.
   *
   * Context for TTM operations like changing buffer placement or general memory
   * allocation.
@@ -264,7 +270,7 @@ static inline int ttm_bo_reserve(struct ttm_buffer_object 
*bo,
   * ttm_bo_reserve_slowpath:
   * @bo: A pointer to a struct ttm_buffer_object.
   * @interruptible: Sleep interruptible if waiting.
- * @sequence: Set (@bo)->sequence to this value after lock
+ * @ticket: Ticket used to acquire the ww_mutex.
   *
   * This is called after ttm_bo_reserve returns -EAGAIN and we backed off
   * from all our other reservations. Because there are no other reservations
@@ -303,7 +309,7 @@ static inline void ttm_bo_assign_mem(struct 
ttm_buffer_object *bo,
  }
  
  /**

- * ttm_bo_move_null = assign memory for a buffer object.
+ * ttm_bo_move_null - assign memory for a buffer object.
   * @bo: The bo to assign the memory to
   * @new_mem: The memory to be assigned.
   *

Re: [PATCH v8 3/3] drm/buddy: Add user for defragmentation

2024-03-05 Thread Christian König


On 05.03.24 at 12:14, Paneer Selvam, Arunpravin wrote:

On 3/5/2024 4:33 PM, Paneer Selvam, Arunpravin wrote:

Hi Christian,

On 3/4/2024 10:09 PM, Christian König wrote:

On 04.03.24 at 17:32, Arunpravin Paneer Selvam wrote:

Add amdgpu driver as user for the drm buddy
defragmentation.

Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++--
  drivers/gpu/drm/drm_buddy.c  |  1 +
  2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

index e494f5bf136a..cff8a526c622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,

 min_block_size,
 &vres->blocks,
 vres->flags);
-    if (unlikely(r))
-    goto error_free_blocks;
+    if (unlikely(r)) {
+    if (r == -ENOSPC) {
+    drm_buddy_defrag(mm, min_block_size);
+    r = drm_buddy_alloc_blocks(mm, fpfn,
+   lpfn,
+   size,
+   min_block_size,
+   &vres->blocks,
+   vres->flags);


That doesn't look like something we should do.

We might fall back when contiguous memory is requested, but certainly 
not on normal allocation failure.

Yes, defrag here is not useful for normal allocations, but I am worried 
about normal allocations with a bigger min_block_size.
In such cases, I think we should move this drm_buddy_defrag() call 
into the buddy allocator file. For example, if the required
size is 1024KiB and min_block_size is 256KiB, the allocator first 
tries to find a 1024KiB block. When there is no single 1024KiB block,
the allocator goes one level down in the freelist and tries to find 
two 512KiB blocks, and so on. At some point, if we have 
less space,
we might go further levels down and search for four 256KiB blocks to 
satisfy the request.


If the allocator cannot find the first 256KiB block, I think we 
would need to merge two 128KiB blocks
through the defragmentation function. For the second 256KiB 
block, we might need to call the defragmentation again to
merge two 128KiB blocks or four 64KiB blocks to form the minimum 
alignment size of 256KiB. This goes on for the third and fourth
256KiB blocks to complete the required allocation of 1024KiB. 
Please let me know if my understanding is not correct.


I don't think we should do that. We essentially have to support two 
different use cases:


1. Non-contiguous allocation with 2MiB min_block_size for everything 
larger than 2MiB. Using a block size as large as possible is desirable, 
but not something we enforce.


2. Contiguous allocations for display, firmware etc. Here we need to 
enforce a large block size and can live with the additional overhead 
caused by force merging.
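
The two use cases above suggest a retry policy that can be sketched as follows. This is a hedged user-space model, not the amdgpu or drm_buddy code: struct sketch_mm, sketch_alloc() and sketch_force_merge() are hypothetical stand-ins for the allocator, the allocation call and the force-merge (defragmentation) step. The only thing it shows is that the merge-and-retry path is taken for contiguous requests and skipped for normal ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator state: either fragmented or not. */
struct sketch_mm {
	bool fragmented;	/* true: large blocks are currently unavailable */
	int merge_calls;	/* how often the force-merge step ran */
};

/* Stand-in for the block allocation: fails with -ENOSPC when fragmented. */
static int sketch_alloc(struct sketch_mm *mm)
{
	return mm->fragmented ? -ENOSPC : 0;
}

/* Stand-in for the force-merge step: makes large blocks available again. */
static void sketch_force_merge(struct sketch_mm *mm)
{
	mm->fragmented = false;
	mm->merge_calls++;
}

/*
 * Policy from the discussion: only contiguous requests pay for the
 * force merge and a second attempt. Normal allocations simply report
 * the failure.
 */
static int sketch_alloc_with_policy(struct sketch_mm *mm, bool contiguous)
{
	int r;

	assert(mm != NULL);

	r = sketch_alloc(mm);
	if (r != -ENOSPC || !contiguous)
		return r;

	sketch_force_merge(mm);
	return sketch_alloc(mm);
}
```

A non-contiguous request on a fragmented allocator returns -ENOSPC without merging in this model, while a contiguous request triggers exactly one merge before retrying.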




As you have suggested, we can also rename this to force merge or some 
other name.


Yeah, but just a suggestion. You are way deeper in the code and 
handling than I am, so feel free to name it whatever you think fits best.


Regards,
Christian.




Thanks,
Arun.


Thanks,
Arun.


Regards,
Christian.


+    if (unlikely(r))
+    goto error_free_blocks;
+    } else {
+    goto error_free_blocks;
+    }
+    }
    if (size > remaining_size)
  remaining_size = 0;
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 40131ed9b0cd..19440f8caec0 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -396,6 +396,7 @@ void drm_buddy_defrag(struct drm_buddy *mm,
  }
  }
  }
+EXPORT_SYMBOL(drm_buddy_defrag);
    /**
   * drm_buddy_free_block - free a block

Re: [PATCH v8 3/3] drm/buddy: Add user for defragmentation

2024-03-04 Thread Christian König


On 04.03.24 at 17:32, Arunpravin Paneer Selvam wrote:

Add amdgpu driver as user for the drm buddy
defragmentation.

Signed-off-by: Arunpravin Paneer Selvam 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++--
  drivers/gpu/drm/drm_buddy.c  |  1 +
  2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index e494f5bf136a..cff8a526c622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
   min_block_size,
   &vres->blocks,
   vres->flags);
-   if (unlikely(r))
-   goto error_free_blocks;
+   if (unlikely(r)) {
+   if (r == -ENOSPC) {
+   drm_buddy_defrag(mm, min_block_size);
+   r = drm_buddy_alloc_blocks(mm, fpfn,
+  lpfn,
+  size,
+  min_block_size,
+  &vres->blocks,
+  vres->flags);


That doesn't look like something we should do.

We might fall back when contiguous memory is requested, but certainly not 
on normal allocation failure.


Regards,
Christian.


+   if (unlikely(r))
+   goto error_free_blocks;
+   } else {
+   goto error_free_blocks;
+   }
+   }
  
  		if (size > remaining_size)

remaining_size = 0;
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 40131ed9b0cd..19440f8caec0 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -396,6 +396,7 @@ void drm_buddy_defrag(struct drm_buddy *mm,
}
}
  }
+EXPORT_SYMBOL(drm_buddy_defrag);
  
  /**

   * drm_buddy_free_block - free a block

Re: [PATCH] drm/i915: fix applying placement flag

2024-03-01 Thread Christian König


On 01.03.24 at 17:04, Lucas De Marchi wrote:

On Thu, Feb 29, 2024 at 02:01:05PM +0100, Christian König wrote:

Gentle ping. Can I get an rb for that?

Thanks,
Christian.


Reviewed-by: Lucas De Marchi 


Thanks!



For some reason CI failed, but that can't be related to this change.
I re-triggered it to see if we can get a green run before merging.


Do you want to pick it into an i915 branch or should I push it to 
drm-misc-next(-fixes) then?


Christian.



thanks
Lucas De Marchi



On 26.02.24 at 15:27, Christian König wrote:

Switching from a separate list to flags introduced a bug here.

We were accidentally ORing the flag before initializing the placement
and not after. So this code did nothing except produce a
warning.

Signed-off-by: Christian König 
Reported-by: Stephen Rothwell 
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index a6b0aaf30cbe..7264fb08eee8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct 
drm_i915_gem_object *obj,

 unsigned int flags = obj->flags;
 unsigned int i;
-    places[0].flags |= TTM_PL_FLAG_DESIRED;
 i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
    obj->mm.region, &places[0], obj->bo_offset,
    obj->base.size, flags);
+    places[0].flags |= TTM_PL_FLAG_DESIRED;
 /* Cache this on object? */
 for (i = 0; i < num_allowed; ++i) {

[PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2024-02-29 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that once VRAM has been evacuated for suspend, it is
never filled up again unless the application is restarted.

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 010b0cb7693c..8bc79924d171 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_FALLBACK;
c++;
}
 
-- 
2.34.1
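
The condition added in this patch can be modelled in isolation. The sketch below is a userspace illustration only; the `DOMAIN_*` and `PL_FLAG_FALLBACK` values are placeholders chosen for the example and `gtt_place_flags()` is a hypothetical helper, not a function from the driver.

```c
#define DOMAIN_GTT       0x2u      /* illustrative domain bits for this sketch */
#define DOMAIN_VRAM      0x4u
#define PL_FLAG_FALLBACK (1u << 0) /* illustrative value, not the kernel's */

/*
 * Flags for the GTT placement, given the requested domains and the
 * buffer's preferred domains. GTT becomes a fallback only when VRAM
 * is both requested and preferred.
 */
static unsigned int gtt_place_flags(unsigned int domain, unsigned int preferred)
{
	unsigned int flags = 0;

	if (domain & preferred & DOMAIN_VRAM)
		flags |= PL_FLAG_FALLBACK;
	return flags;
}
```

A buffer that prefers GTT, or that only asks for GTT, keeps an unflagged GTT placement and is therefore not treated as a fallback.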

[PATCH 1/2] drm/ttm: improve idle/busy handling v5

2024-02-29 Thread Christian König

Previously we would never try to move a BO into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
v4: keep the ttm_bo_mem_space functionality as it is for now, only add
new handling for ttm_bo_validate as suggested by Thomas
v5: fix bug pointed out by Matthew

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin  v3
---
 drivers/gpu/drm/ttm/ttm_bo.c   | 231 +
 drivers/gpu/drm/ttm/ttm_resource.c |  16 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 3 files changed, 121 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 96a724e8f3ff..e059b1e1b13b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object 
*bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
- * ttm_bo_mem_space
+ * ttm_bo_alloc_resource - Allocate backing store for a BO
  *
- * @bo: Pointer to a struct ttm_buffer_object. the data of which
- * we want to allocate space for.
- * @placement: Proposed new placement for the buffer object.
- * @mem: A struct ttm_resource.
+ * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for
+ * @placement: Proposed new placement for the buffer object
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
+ * @res: The resulting struct ttm_resource.
  *
- * Allocate memory space for the buffer object pointed to by @bo, using
- * the placement flags in @placement, potentially evicting other idle buffer 
objects.
- * This function may sleep while waiting for space to become available.
+ * Allocates a resource for the buffer object pointed to by @bo, using the
+ * placement flags in @placement, potentially evicting other buffer objects 
when
+ * @force_space is true.
+ * This function may sleep while waiting for resources to become available.
  * Returns:
- * -EBUSY: No space available (only if no_wait == 1).
+ * -EBUSY: No space available (only if no_wait == true).
  * -ENOSPC: Could not allocate space for the buffer object, either due to
  * fragmentation or concurrent allocators.
  * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
  */
-int ttm_bo_mem_space(struct ttm_buffer_object *bo,
-   struct ttm_placement *placement,
-   struct ttm_resource **mem,
-   struct ttm_operation_ctx *ctx)
+static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
+struct ttm_placement *placement,
+struct ttm_operation_ctx *ctx,
+bool force_space,
+struct ttm_resource **res)
 {
struct ttm_device *bdev = bo->bdev;
-   bool type_found = false;
+   struct ww_acquire_ctx *ticket;
int i, ret;
 
+   ticket = dma_resv_locking_ctx(bo->base.resv);
ret = dma_resv_reserve_fences(bo->base.resv, 1);
if (unlikely(ret))
return ret;
@@ -790,98 +762,73 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
const struct ttm_place *place = &placement->placement[i];
struct ttm_resource_manager *man;
 
-
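
The two-pass behaviour described in the commit message can be sketched in userspace as follows. The diff above is cut off before the placement loop, so the skip rule used here (fallback placements ignored in the first pass, desired placements ignored in the second) is an assumption, and `struct place`, `alloc_pass()` and `validate()` are hypothetical names for this illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PL_FLAG_DESIRED  (1u << 0) /* only used in the first, non-evicting pass */
#define PL_FLAG_FALLBACK (1u << 1) /* only used in the second, evicting pass */

struct place {
	unsigned int flags;
	bool has_free_space; /* allocation works without eviction */
	bool can_evict;      /* eviction would make room */
};

/* One pass over the placements; returns the chosen index or -ENOSPC. */
static int alloc_pass(const struct place *places, size_t n, bool force_space)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int skip = force_space ? PL_FLAG_DESIRED : PL_FLAG_FALLBACK;

		if (places[i].flags & skip)
			continue;
		if (places[i].has_free_space)
			return (int)i;
		if (force_space && places[i].can_evict)
			return (int)i;
	}
	return -ENOSPC;
}

/* Try every placement without eviction first, then allow eviction. */
static int validate(const struct place *places, size_t n)
{
	int ret = alloc_pass(places, n, false);

	if (ret == -ENOSPC)
		ret = alloc_pass(places, n, true);
	return ret;
}
```

With a full but evictable first placement and a fallback second placement, the second pass picks the first placement again instead of settling for the fallback.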

Re: [PATCH] drm/i915: fix applying placement flag

2024-02-29 Thread Christian König


Gentle ping. Can I get an rb for that?

Thanks,
Christian.

On 26.02.24 at 15:27, Christian König wrote:

Switching from a separate list to flags introduced a bug here.

We were accidentally ORing the flag before initializing the placement
and not after. So this code did nothing except produce a
warning.

Signed-off-by: Christian König 
Reported-by: Stephen Rothwell 
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index a6b0aaf30cbe..7264fb08eee8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct 
drm_i915_gem_object *obj,
unsigned int flags = obj->flags;
unsigned int i;
  
-	places[0].flags |= TTM_PL_FLAG_DESIRED;

i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
   obj->mm.region, &places[0], obj->bo_offset,
   obj->base.size, flags);
+   places[0].flags |= TTM_PL_FLAG_DESIRED;
  
  	/* Cache this on object? */

for (i = 0; i < num_allowed; ++i) {

Re: [PATCH] drm/i915/ttm: Fix TTM_PL_FLAG_DESIRED

2024-02-27 Thread Christian König


On 27.02.24 at 21:26, Ville Syrjala wrote:

From: Ville Syrjälä 

inlined from ‘i915_ttm_get_pages’ at 
../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:847:2:
../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:165:18: warning: ‘places[0].flags’ 
is used uninitialized [-Wuninitialized]
   165 | places[0].flags |= TTM_PL_FLAG_DESIRED;
   | ~^~
../drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function ‘i915_ttm_get_pages’:
../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:837:26: note: ‘places’ declared here
   837 | struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1];
   |  ^~

Furthermore we then proceed to call i915_ttm_place_from_region() which
memset()s the whole thing back to zero anyway. So in the end we lose
the TTM_PL_FLAG_DESIRED flag (and fortunately also whatever else stack
garbage happened to be in the flags at this point).

No idea what functional changes this will result in...


I already sent out the same patch yesterday. Please review that one.

Sorry for the noise, I didn't realize that i915_ttm_place_from_region() 
was initializing the flags and not the caller while converting this.


Thanks,
Christian.



Cc: Somalapuram Amaranath 
Cc: Christian König 
Cc: Zack Rusin 
Cc: Thomas Zimmermann 
Cc: Thomas Hellström 
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
Signed-off-by: Ville Syrjälä 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 27dcfd8a34bb..e6f177183c0f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct 
drm_i915_gem_object *obj,
unsigned int flags = obj->flags;
unsigned int i;
  
-	places[0].flags |= TTM_PL_FLAG_DESIRED;

i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
   obj->mm.region, &places[0], obj->bo_offset,
   obj->base.size, flags);
+   places[0].flags |= TTM_PL_FLAG_DESIRED;
  
  	/* Cache this on object? */

for (i = 0; i < num_allowed; ++i) {
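
The ordering problem described in this thread can be reproduced in a small userspace model. The sketch below assumes an initializer that clears the whole structure, as the message says i915_ttm_place_from_region() does; `struct place`, `place_init()` and the flag value are simplified placeholders, not the kernel definitions.

```c
#include <string.h>

#define PL_FLAG_DESIRED (1u << 0) /* illustrative value, not the kernel's */

struct place {
	unsigned int mem_type;
	unsigned int flags;
};

/* Stand-in for an initializer that clears the whole struct first. */
static void place_init(struct place *p, unsigned int mem_type)
{
	memset(p, 0, sizeof(*p));
	p->mem_type = mem_type;
}

/* Buggy order: the flag is set, then wiped by place_init(). */
static unsigned int flags_or_before(void)
{
	/* Zero-initialized here only to keep the sketch well-defined. */
	struct place p = { 0, 0 };

	p.flags |= PL_FLAG_DESIRED;
	place_init(&p, 1);
	return p.flags;
}

/* Fixed order: initialize first, then OR in the flag. */
static unsigned int flags_or_after(void)
{
	struct place p;

	place_init(&p, 1);
	p.flags |= PL_FLAG_DESIRED;
	return p.flags;
}
```

In the buggy order the flag never survives the initializer, which matches the observation that the original line had no effect beyond the compiler warning.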

Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4

2024-02-27 Thread Christian König


On 27.02.24 at 09:12, Matthew Auld wrote:

On 26/02/2024 20:21, Thomas Hellström wrote:

Hi, Christian

On Fri, 2024-02-23 at 15:30 +0100, Christian König wrote:

On 06.02.24 at 13:56, Christian König wrote:

On 06.02.24 at 13:53, Thomas Hellström wrote:

Hi, Christian,

On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote:

Previously we would never try to move a BO into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
v4: keep the ttm_bo_mem_space functionality as it is for now, only add
new handling for ttm_bo_validate as suggested by Thomas

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin  v3

Sending this through xe CI, will try to review asap.


Take your time. At the moment people are bombarding me with work and I
have only two hands and one head as well :(


So I've dug myself out of that hole and would like to get this
new feature into 6.9.

Any time to review it? I can also plan some time to review your LRU
changes next week.

Thanks,
Christian.


Sorry for the late response. Was planning to review but saw that there
was still an xe CI failure.

https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-129579v1/bat-atsm-2/igt@xe_evict_...@evict-overcommit-parallel-nofree-samefd.html 



I haven't really had time to look into what might be causing this,
though.

Maybe in ttm_bo_alloc_resource():

@@ -772,7 +772,7 @@ static int ttm_bo_alloc_resource(struct 
ttm_buffer_object *bo,


    do {
    ret = ttm_resource_alloc(bo, place, res);
-   if (unlikely(ret != -ENOSPC))
+   if (unlikely(ret && ret != -ENOSPC))
    return ret;
    if (likely(!ret) || !force_space)
    break;

Otherwise we allocate VRAM but never correctly synchronise against the 
move fence, since we missed adding it to the BO. When we trigger async 
evictions that would explain the above test failure where we detect 
VRAM corruption, since someone else is still using the VRAM we 
allocated. What do you think?


Yup, that looks like the right thing to do. Thanks.

Give me a moment to fix that.

Christian.
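
The effect of the one-line change discussed above can be shown with a toy model of the allocate-or-evict loop. This is a userspace sketch under stated assumptions: `struct toy_man`, `resource_alloc()`, `evict_first()` and the `fixed` switch are hypothetical, and setting `fence_added` merely stands in for adding the move fence to the BO.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_man {
	int free_slots;   /* slots available right now */
	int evictable;    /* slots that eviction can still free */
	bool fence_added; /* set when the move fence step ran */
};

static int resource_alloc(struct toy_man *man)
{
	if (man->free_slots == 0)
		return -ENOSPC;
	man->free_slots--;
	return 0;
}

static int evict_first(struct toy_man *man)
{
	if (man->evictable == 0)
		return -EBUSY;
	man->evictable--;
	man->free_slots++;
	return 0;
}

/* 'fixed' selects the corrected early-return check from the reply above. */
static int alloc_resource(struct toy_man *man, bool force_space, bool fixed)
{
	int ret;

	do {
		ret = resource_alloc(man);
		/* fixed: bail out only on real errors; buggy: also on success */
		if (fixed ? (ret && ret != -ENOSPC) : (ret != -ENOSPC))
			return ret;
		if (!ret || !force_space)
			break;
		ret = evict_first(man);
		if (ret)
			return ret;
	} while (1);

	if (ret)
		return ret;

	man->fence_added = true; /* stands in for adding the move fence */
	return 0;
}
```

With the original check, a successful allocation returns from inside the loop and the fence step is never reached; with the corrected check the loop breaks and the fence step runs.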





/Thomas





Christian.



/Thomas



---
   drivers/gpu/drm/ttm/ttm_bo.c   | 231 +---
---
--
   drivers/gpu/drm/ttm/ttm_resource.c |  16 +-
   include/drm/ttm/ttm_resource.h |   3 +-
   3 files changed, 121 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index ba3f09e2d7e6..b12f435542a9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct
ttm_buffer_object *bo,
   return ret;
   }
   -/*
- * Repeatedly evict memory from the LRU for @mem_type until we
create enough
- * space, or we've evicted everything and there isn't enough
space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object
*bo,
-      const struct ttm_place *place,
-      struct ttm_resource **mem,
-      struct ttm_operation_ctx *ctx)
-{
-    struct ttm_device *bdev = bo->bdev;
-    struct ttm_resource_manager *man;
-    struct ww_acquire_ctx *ticket;
-    int ret;
-
-    man = ttm_manager_type(bdev, place->mem_type);
-    ticket = dma_resv_locking_ctx(bo->base.resv);
-    do {
-    ret = ttm_resource_alloc(bo, place, mem);
-    if (likely(!ret))
-    break;
-    if (unlikely(ret != -ENOSPC))
-    return ret;
-    ret = ttm_mem_evict_first(bdev, man, place, ctx,
-      ticket);
-    if (unlikely(ret != 0))
-    return ret;
-    } while (1);
-
-    return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
   /**
- * ttm_bo_mem_space
+ * ttm_bo_alloc_resource - Allocate backing store for a BO
    *
- * @bo: Pointer to a struct ttm_buffer_object. the data of
which
- * we want to allocate space for.
- * @placement: Proposed new placement for the buffer object.
- * @mem: A struct ttm_resource.
+ * @bo: Pointer to a struct ttm_buffer_object of which we want
a
resource for
+ * @placement: Proposed new placement for the buffer object
    * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
+ * @res: The resulting struct ttm_resource.
    *
- * Allocate memory space for the buffer object pointed to by
@bo,
using
- * the placement flags in @placement, pote

[PATCH] drm/i915: fix applying placement flag

2024-02-26 Thread Christian König

Switching from a separate list to flags introduced a bug here.

We were accidentally ORing the flag before initializing the placement
and not after. So this code did nothing except produce a
warning.

Signed-off-by: Christian König 
Reported-by: Stephen Rothwell 
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index a6b0aaf30cbe..7264fb08eee8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct 
drm_i915_gem_object *obj,
unsigned int flags = obj->flags;
unsigned int i;
 
-   places[0].flags |= TTM_PL_FLAG_DESIRED;
i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
   obj->mm.region, &places[0], obj->bo_offset,
   obj->base.size, flags);
+   places[0].flags |= TTM_PL_FLAG_DESIRED;
 
/* Cache this on object? */
for (i = 0; i < num_allowed; ++i) {
-- 
2.34.1

Re: linux-next: build failure after merge of the drm-misc tree

2024-02-26 Thread Christian König


On 25.02.24 at 22:47, Stephen Rothwell wrote:

Hi all,

On Mon, 26 Feb 2024 08:41:16 +1100 Stephen Rothwell wrote:

On Tue, 20 Feb 2024 08:48:21 +1100 Stephen Rothwell wrote:

On Mon, 12 Feb 2024 15:15:54 +0200 Jani Nikula wrote:

On Tue, 06 Feb 2024, Stephen Rothwell wrote:

After merging the drm-misc tree, today's linux-next build (i386 defconfig)
failed like this:

In function 'i915_ttm_placement_from_obj',
 inlined from 'i915_ttm_get_pages' at 
drivers/gpu/drm/i915/gem/i915_gem_ttm.c:847:2:
drivers/gpu/drm/i915/gem/i915_gem_ttm.c:165:18: error: 'places[0].flags' is 
used uninitialized [-Werror=uninitialized]
   165 | places[0].flags |= TTM_PL_FLAG_DESIRED;
   | ~^~
drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function 'i915_ttm_get_pages':
drivers/gpu/drm/i915/gem/i915_gem_ttm.c:837:26: note: 'places' declared here
   837 | struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1];
   |  ^~

Caused by commit

   a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")

Cc: more people.
 

I applied the following hack for today:

From: Stephen Rothwell 
Date: Tue, 6 Feb 2024 15:17:54 +1100
Subject: [PATCH] drm/ttm: initialise places

Signed-off-by: Stephen Rothwell 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 80c6cafc8887..34e699e67c25 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -834,7 +834,7 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
 
 static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
 {
-	struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1];
+	struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1] = {};
 	struct ttm_placement placement;
 
 	/* restricted by sg_alloc_table */
--
2.43.0

I am still applying the above patch ...

Any progress?

And this commit is now in the drm tree.


Sorry for the delay. ORing in the flag needs to come after the call and 
not before it.


Going to fix this.

Thanks,
Christian.

Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4

2024-02-23 Thread Christian König


On 06.02.24 at 13:56, Christian König wrote:

On 06.02.24 at 13:53, Thomas Hellström wrote:

Hi, Christian,

On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote:

Previously we would never try to move a BO into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
v4: keep the ttm_bo_mem_space functionality as it is for now, only add
new handling for ttm_bo_validate as suggested by Thomas

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin  v3

Sending this through xe CI, will try to review asap.


Take your time. At the moment people are bombarding me with work and I 
have only two hands and one head as well :(


So I've dug myself out of that hole and would like to get this 
new feature into 6.9.


Any time to review it? I can also plan some time to review your LRU 
changes next week.


Thanks,
Christian.



Christian.



/Thomas



---
  drivers/gpu/drm/ttm/ttm_bo.c   | 231 +--
--
  drivers/gpu/drm/ttm/ttm_resource.c |  16 +-
  include/drm/ttm/ttm_resource.h |   3 +-
  3 files changed, 121 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index ba3f09e2d7e6..b12f435542a9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct
ttm_buffer_object *bo,
  return ret;
  }
  -/*
- * Repeatedly evict memory from the LRU for @mem_type until we
create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
-      const struct ttm_place *place,
-      struct ttm_resource **mem,
-      struct ttm_operation_ctx *ctx)
-{
-    struct ttm_device *bdev = bo->bdev;
-    struct ttm_resource_manager *man;
-    struct ww_acquire_ctx *ticket;
-    int ret;
-
-    man = ttm_manager_type(bdev, place->mem_type);
-    ticket = dma_resv_locking_ctx(bo->base.resv);
-    do {
-    ret = ttm_resource_alloc(bo, place, mem);
-    if (likely(!ret))
-    break;
-    if (unlikely(ret != -ENOSPC))
-    return ret;
-    ret = ttm_mem_evict_first(bdev, man, place, ctx,
-      ticket);
-    if (unlikely(ret != 0))
-    return ret;
-    } while (1);
-
-    return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
  /**
- * ttm_bo_mem_space
+ * ttm_bo_alloc_resource - Allocate backing store for a BO
   *
- * @bo: Pointer to a struct ttm_buffer_object. the data of which
- * we want to allocate space for.
- * @placement: Proposed new placement for the buffer object.
- * @mem: A struct ttm_resource.
+ * @bo: Pointer to a struct ttm_buffer_object of which we want a
resource for
+ * @placement: Proposed new placement for the buffer object
   * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
+ * @res: The resulting struct ttm_resource.
   *
- * Allocate memory space for the buffer object pointed to by @bo,
using
- * the placement flags in @placement, potentially evicting other
idle buffer objects.
- * This function may sleep while waiting for space to become
available.
+ * Allocates a resource for the buffer object pointed to by @bo,
using the
+ * placement flags in @placement, potentially evicting other buffer
objects when
+ * @force_space is true.
+ * This function may sleep while waiting for resources to become
available.
   * Returns:
- * -EBUSY: No space available (only if no_wait == 1).
+ * -EBUSY: No space available (only if no_wait == true).
   * -ENOSPC: Could not allocate space for the buffer object, either
due to
   * fragmentation or concurrent allocators.
   * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
   */
-int ttm_bo_mem_space(struct ttm_buffer_object *bo,
-    struct ttm_placement *placement,
-    struct ttm_resource **mem,
-    struct ttm_operation_ctx *ctx)
+static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
+ struct ttm_placement *placement,
+ struct ttm_operation_ctx *ctx,
+ bool force_space,
+ struct ttm_resource **res)
  {
  struct ttm_device *bdev = bo->bdev;
-    bool type_found = false;
+    struct ww_acquire_ctx *ticket;
  int i, ret;
  +    ticket = dma_resv_locking_ctx(bo->base.resv);
  ret = dma_resv_reserve_fences(bo->base.resv, 1);

Re: [PATCH] drm/ttm: Fix an invalid freeing on already freed page in error path

2024-02-22 Thread Christian König


On 22.02.24 at 08:34, Thomas Hellström wrote:

On Wed, 2024-02-21 at 11:26 +0100, Christian König wrote:

On 21.02.24 at 08:33, Thomas Hellström wrote:

If caching mode change fails due to, for example, OOM we
free the allocated pages in a two-step process. First the pages
for which the caching change has already succeeded. Secondly
the pages for which a caching change did not succeed.

However the second step was incorrectly freeing the pages already
freed in the first step.

Fix.

Signed-off-by: Thomas Hellström 
Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
Cc: Christian König 
Cc: Dave Airlie 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: dri-de...@lists.freedesktop.org
Cc:  # v6.4+

You don't know how much time I've spent staring at this line trying to
find the bug in it without seeing it. I've been getting bug reports about
that for months as well.


Yeah, sorry about that. We should probably have Kunit tests exercising
OOM in the pool code involving WC pages.

I'll push this to drm-misc-next.


drm-misc-fixes please! That needs to be backported ASAP.

Need to dig up the bug report for this again.

Thanks,
Christian.



/Thomas


Reviewed-by: Christian König 


---
   drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
b/drivers/gpu/drm/ttm/ttm_pool.c
index b62f420a9f96..112438d965ff 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool
*pool, struct ttm_tt *tt,
    enum ttm_caching caching,
    pgoff_t start_page, pgoff_t
end_page)
   {
-   struct page **pages = tt->pages;
+   struct page **pages = &tt->pages[start_page];
    unsigned int order;
    pgoff_t i, nr;

Re: [PATCH v7 3/3] drm/buddy: Add defragmentation support

2024-02-21 Thread Christian König


On 21.02.24 at 13:18, Arunpravin Paneer Selvam wrote:

Add a function to support defragmentation.


Thinking more about it, maybe you want to name this function differently.

Essentially we are force merging pages even if their cleared flag 
doesn't match; that is something different from defragmentation, I think.


Maybe rename it to force_merge or something similar. Not mandatory, 
just an idea for how to improve the readability of the code.


Apart from that just let me know when I can push it to drm-misc-next.

Christian.



v1:
   - Defragment the memory beginning from min_order
 till the required memory space is available.

v2(Matthew):
   - add amdgpu user for defragmentation
   - add a warning if the two blocks are incompatible on
 defragmentation
   - call full defragmentation in the fini() function
   - place a condition to test if min_order is equal to 0
   - replace the list with safe_reverse() variant as we might
 remove the block from the list.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Matthew Auld 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++-
  drivers/gpu/drm/drm_buddy.c  | 93 +---
  include/drm/drm_buddy.h  |  3 +
  3 files changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index e494f5bf136a..cff8a526c622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
   min_block_size,
   &vres->blocks,
   vres->flags);
-   if (unlikely(r))
-   goto error_free_blocks;
+   if (unlikely(r)) {
+   if (r == -ENOSPC) {
+   drm_buddy_defrag(mm, min_block_size);
+   r = drm_buddy_alloc_blocks(mm, fpfn,
+  lpfn,
+  size,
+  min_block_size,
+  &vres->blocks,
+  vres->flags);
+   if (unlikely(r))
+   goto error_free_blocks;
+   } else {
+   goto error_free_blocks;
+   }
+   }
  
  		if (size > remaining_size)

remaining_size = 0;
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 18e004fa39d3..56bd1560fbcd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -203,6 +203,8 @@ void drm_buddy_fini(struct drm_buddy *mm)
drm_block_free(mm, mm->roots[i]);
}
  
+	drm_buddy_defrag(mm, mm->chunk_size << mm->max_order);

+
WARN_ON(mm->avail != mm->size);
  
  	kfree(mm->roots);

@@ -276,25 +278,39 @@ drm_get_buddy(struct drm_buddy_block *block)
  }
  EXPORT_SYMBOL(drm_get_buddy);
  
-static void __drm_buddy_free(struct drm_buddy *mm,

-struct drm_buddy_block *block)
+static unsigned int __drm_buddy_free(struct drm_buddy *mm,
+struct drm_buddy_block *block,
+bool defrag)
  {
+   unsigned int order, block_order;
struct drm_buddy_block *parent;
  
+	block_order = drm_buddy_block_order(block);

+
while ((parent = block->parent)) {
-   struct drm_buddy_block *buddy;
+   struct drm_buddy_block *buddy = NULL;
  
  		buddy = __get_buddy(block);
  
  		if (!drm_buddy_block_is_free(buddy))

break;
  
-		if (drm_buddy_block_is_clear(block) !=

-   drm_buddy_block_is_clear(buddy))
-   break;
+   if (!defrag) {
+   /*
+* Check the block and its buddy clear state and exit
+* the loop if they both have the dissimilar state.
+*/
+   if (drm_buddy_block_is_clear(block) !=
+   drm_buddy_block_is_clear(buddy))
+   break;
  
-		if (drm_buddy_block_is_clear(block))

-   mark_cleared(parent);
+   if (drm_buddy_block_is_clear(block))
+   mark_cleared(parent);
+   }
+
+   WARN_ON(defrag &&
+   (drm_buddy_block_is_clear(block) ==
+drm_buddy_block_is_clear(buddy)));
  
  		list_del(&buddy->link);
  
@@ -304,8 +320,57 @@ static void __drm_buddy_free(struct

Re: [PATCH] drm/ttm: Fix an invalid freeing on already freed page in error path

2024-02-21 Thread Christian König


On 21.02.24 at 08:33, Thomas Hellström wrote:

If caching mode change fails due to, for example, OOM we
free the allocated pages in a two-step process. First the pages
for which the caching change has already succeeded. Secondly
the pages for which a caching change did not succeed.

However the second step was incorrectly freeing the pages already
freed in the first step.

Fix.

Signed-off-by: Thomas Hellström 
Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
Cc: Christian König 
Cc: Dave Airlie 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: dri-de...@lists.freedesktop.org
Cc:  # v6.4+


You don't know how much time I've spent staring at this line trying to find 
the bug in it without seeing it. I've been getting bug reports about that for 
months as well.


Reviewed-by: Christian König 


---
  drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index b62f420a9f96..112438d965ff 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, 
struct ttm_tt *tt,
enum ttm_caching caching,
pgoff_t start_page, pgoff_t end_page)
  {
-   struct page **pages = tt->pages;
+   struct page **pages = &tt->pages[start_page];
unsigned int order;
pgoff_t i, nr;
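
The one-line fix above changes where the page pointer starts. A small userspace model makes the double free visible; `struct toy_tt`, `free_range()` and the `fixed` switch are hypothetical, and incrementing a counter stands in for freeing a page.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES 8

/* Each entry records how often that page was "freed". */
struct toy_tt {
	int free_count[NUM_PAGES];
};

/* Free pages [start, end); 'fixed' selects the corrected base pointer. */
static void free_range(struct toy_tt *tt, size_t start, size_t end, bool fixed)
{
	int *pages = fixed ? &tt->free_count[start] : tt->free_count;
	size_t i;

	/* The index starts at 'start', so the pointer has to start there too. */
	for (i = start; i < end; i++, pages++)
		(*pages)++;
}
```

Freeing the ranges [0, 4) and [4, 8) with the old base pointer touches the first four pages twice and the last four never, which is the double free described in the commit message.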

Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support

2024-02-16 Thread Christian König


On 16.02.24 at 15:47, Matthew Auld wrote:

On 16/02/2024 14:02, Christian König wrote:

On 16.02.24 at 14:21, Matthew Auld wrote:

On 16/02/2024 12:33, Christian König wrote:

On 16.02.24 at 13:23, Matthew Auld wrote:

On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote:

Add a function to support defragmentation.

v1: Defragment the memory beginning from min_order
 till the required memory space is available.

Signed-off-by: Arunpravin Paneer Selvam 


Suggested-by: Matthew Auld 
---
  drivers/gpu/drm/drm_buddy.c | 67 
+++--

  include/drm/drm_buddy.h |  3 ++


No users?


Other question is how can a buddy allocator fragment in the first 
place?


The fragmentation is due to pages now being tracked as dirty/clear. 
Should the allocator merge together a page that is dirty with a page 
that is cleared? When should it do that? User wants to be able to 
keep the two separate if possible. For example, freeing one single 
dirty page can dirty a huge swathe of your already cleared pages if 
they are merged together. Or do you have some other ideas here?


Sorry, that was not what I meant. I should probably have been clearer.

That dirty and clean pages are now kept separated is obvious, but why 
do you need to de-fragment them at some point?


Ah, right. At the very least we need to do something similar to this 
at fini(), just to ensure we properly merge everything back together 
so we can correctly tear down the mm. Outside of that the thinking was 
that it might be useful to call when allocating larger min page-sizes. 
You might now be failing the allocation due to fragmentation, and so 
in some cases might be better off running some kind of defrag step 
first, instead of failing the allocation and trying to evict stuff. 
Anyway, if that is not a concern for amdgpu, then we just need to 
handle the fini() case and can keep this internal.


Ah, yes that makes more sense.

So you basically force merge the pages before fini to avoid warnings 
that the buddy isn't empty.
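
A minimal sketch of that merge rule, assuming a much simplified model: struct toy_block and toy_merge() are hypothetical and only mirror the two checks visible in the quoted hunk (both buddies free, and matching clear state unless the merge is forced). As in the quoted patch, a forced merge does not mark the parent as cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a buddy block. */
struct toy_block {
	bool free;
	bool clear;
};

/*
 * Try to merge two sibling blocks into *parent.
 * Returns true when the merge happened.
 */
static bool toy_merge(const struct toy_block *a, const struct toy_block *b,
		      bool force, struct toy_block *parent)
{
	assert(a && b && parent);

	/* Only two free buddies can be merged. */
	if (!a->free || !b->free)
		return false;

	/* Normal frees keep dirty and cleared blocks apart. */
	if (!force && a->clear != b->clear)
		return false;

	parent->free = true;
	/* A forced merge leaves the parent dirty; a normal one inherits the state. */
	parent->clear = !force && a->clear;
	return true;
}
```

With force set, a dirty block and a cleared block collapse into one dirty parent, which is what a teardown path needs in order to end up with fully merged blocks.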


Thanks, that answers my curiosity. But I unfortunately still don't have 
time to dig deep enough into this for a review.


Thanks,
Christian.





Christian.





Christian.




  2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c 
b/drivers/gpu/drm/drm_buddy.c

index 33ad0cfbd54c..fac423d2cb73 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block)
  }
  EXPORT_SYMBOL(drm_get_buddy);
  -static void __drm_buddy_free(struct drm_buddy *mm,
- struct drm_buddy_block *block)
+static unsigned int __drm_buddy_free(struct drm_buddy *mm,
+ struct drm_buddy_block *block,
+ bool defrag)
  {
  struct drm_buddy_block *parent;
+    unsigned int order;
    while ((parent = block->parent)) {
  struct drm_buddy_block *buddy;
@@ -289,12 +291,14 @@ static void __drm_buddy_free(struct 
drm_buddy *mm,

  if (!drm_buddy_block_is_free(buddy))
  break;
  -    if (drm_buddy_block_is_clear(block) !=
-    drm_buddy_block_is_clear(buddy))
-    break;
+    if (!defrag) {
+    if (drm_buddy_block_is_clear(block) !=
+    drm_buddy_block_is_clear(buddy))
+    break;
  -    if (drm_buddy_block_is_clear(block))
-    mark_cleared(parent);
+    if (drm_buddy_block_is_clear(block))
+    mark_cleared(parent);
+    }


Maybe check if the two blocks are incompatible and chuck a warn if 
they are not? Main thing is not to hide issues with split blocks 
that should have been merged before.



list_del(&buddy->link);
  @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct 
drm_buddy *mm,

  block = parent;
  }
  +    order = drm_buddy_block_order(block);
  mark_free(mm, block);
+
+    return order;
+}
+
+/**
+ * drm_buddy_defrag - Defragmentation routine
+ *
+ * @mm: DRM buddy manager
+ * @min_order: minimum order in the freelist to begin
+ * the defragmentation process
+ *
+ * Driver calls the defragmentation function when the
+ * requested memory allocation returns -ENOSPC.
+ */
+void drm_buddy_defrag(struct drm_buddy *mm,
+  unsigned int min_order)


Just wondering if we need "full defrag" also? We would probably 
need to call this at fini() anyway.



+{
+    struct drm_buddy_block *block;
+    struct list_head *list;
+    unsigned int order;
+    int i;
+
+    if (min_order > mm->max_order)
+    return;
+
+    for (i = min_order - 1; i >= 0; i--) {


Need to be careful with min_order = 0 ?


+    list = &mm->free_list[i];
+    if (list_empty(list))
+    continue;
+
+    list_for_each_entry_reverse(block, list, link) {


Don't we need the safe_reverse() variant here, since this is 
removing from the list?
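
The concern about removing entries while iterating can be shown with a self-contained user-space sketch. The list type and helpers below are hypothetical toys, not the kernel's list.h; the loop only imitates what a "safe" reverse iterator such as list_for_each_entry_safe_reverse() does, namely caching the predecessor before the loop body runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy list node; not the kernel's struct list_head. */
struct toy_node {
	struct toy_node *prev, *next;
	int value;
};

static void toy_list_init(struct toy_node *head)
{
	head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_node *head, struct toy_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void toy_list_del(struct toy_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL; /* stale links must not be followed */
}

/*
 * Walk the list back to front and unlink every odd-valued node.
 * `tmp` caches the predecessor before the body runs, so unlinking
 * `pos` cannot break the iteration.
 */
static int toy_remove_odd_reverse(struct toy_node *head)
{
	struct toy_node *pos, *tmp;
	int removed = 0;

	assert(head && head->prev && head->next);

	for (pos = head->prev, tmp = pos->prev; pos != head;
	     pos = tmp, tmp = pos->prev) {
		if (pos->value % 2) {
			toy_list_del(pos);
			removed++;
		}
	}
	return removed;
}
```

A non-safe loop would read pos->prev after toy_list_del() has already reset it, which is the failure mode the review comment is asking about.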



+

Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support

2024-02-16 Thread Christian König


Am 16.02.24 um 14:21 schrieb Matthew Auld:

On 16/02/2024 12:33, Christian König wrote:

Am 16.02.24 um 13:23 schrieb Matthew Auld:

On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote:

Add a function to support defragmentation.

v1: Defragment the memory beginning from min_order
 till the required memory space is available.

Signed-off-by: Arunpravin Paneer Selvam 


Suggested-by: Matthew Auld 
---
  drivers/gpu/drm/drm_buddy.c | 67 
+++--

  include/drm/drm_buddy.h |  3 ++


No users?


Other question is how can a buddy allocator fragment in the first place?


The fragmentation is due to pages now being tracked as dirty/clear. 
Should the allocator merge together a page that is dirty with a page 
that is cleared? When should it do that? User wants to be able to keep 
the two separate if possible. For example, freeing one single dirty 
page can dirty a huge swathe of your already cleared pages if they are 
merged together. Or do you have some other ideas here?


Sorry, that was not what I meant. I should probably have been clearer.

That dirty and clean pages are now kept separated is obvious, but why do 
you need to de-fragment them at some point?


Christian.





Christian.




  2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 33ad0cfbd54c..fac423d2cb73 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block)
  }
  EXPORT_SYMBOL(drm_get_buddy);
  -static void __drm_buddy_free(struct drm_buddy *mm,
- struct drm_buddy_block *block)
+static unsigned int __drm_buddy_free(struct drm_buddy *mm,
+ struct drm_buddy_block *block,
+ bool defrag)
  {
  struct drm_buddy_block *parent;
+    unsigned int order;
    while ((parent = block->parent)) {
  struct drm_buddy_block *buddy;
@@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy 
*mm,

  if (!drm_buddy_block_is_free(buddy))
  break;
  -    if (drm_buddy_block_is_clear(block) !=
-    drm_buddy_block_is_clear(buddy))
-    break;
+    if (!defrag) {
+    if (drm_buddy_block_is_clear(block) !=
+    drm_buddy_block_is_clear(buddy))
+    break;
  -    if (drm_buddy_block_is_clear(block))
-    mark_cleared(parent);
+    if (drm_buddy_block_is_clear(block))
+    mark_cleared(parent);
+    }


Maybe check if the two blocks are incompatible and chuck a warn if 
they are not? Main thing is not to hide issues with split blocks 
that should have been merged before.



list_del(&buddy->link);
  @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct 
drm_buddy *mm,

  block = parent;
  }
  +    order = drm_buddy_block_order(block);
  mark_free(mm, block);
+
+    return order;
+}
+
+/**
+ * drm_buddy_defrag - Defragmentation routine
+ *
+ * @mm: DRM buddy manager
+ * @min_order: minimum order in the freelist to begin
+ * the defragmentation process
+ *
+ * Driver calls the defragmentation function when the
+ * requested memory allocation returns -ENOSPC.
+ */
+void drm_buddy_defrag(struct drm_buddy *mm,
+  unsigned int min_order)


Just wondering if we need "full defrag" also? We would probably need 
to call this at fini() anyway.



+{
+    struct drm_buddy_block *block;
+    struct list_head *list;
+    unsigned int order;
+    int i;
+
+    if (min_order > mm->max_order)
+    return;
+
+    for (i = min_order - 1; i >= 0; i--) {


Need to be careful with min_order = 0 ?


+    list = &mm->free_list[i];
+    if (list_empty(list))
+    continue;
+
+    list_for_each_entry_reverse(block, list, link) {


Don't we need the safe_reverse() variant here, since this is 
removing from the list?



+    if (!block->parent)
+    continue;
+
+    order = __drm_buddy_free(mm, block, 1);
+    if (order >= min_order)
+    return;
+    }
+    }
  }
+EXPORT_SYMBOL(drm_buddy_defrag);
    /**
   * drm_buddy_free_block - free a block
@@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm,
  if (drm_buddy_block_is_clear(block))
  mm->clear_avail += drm_buddy_block_size(mm, block);
  -    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
  }
  EXPORT_SYMBOL(drm_buddy_free_block);
  @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm,
  if (buddy &&
  (drm_buddy_block_is_free(block) &&
   drm_buddy_block_is_free(buddy)))
-    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
  return ERR_PTR(err);
  }
  @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm,
    err_undo:
  if (tmp != order)
-    __drm_buddy_free

Re: [PATCH 0/6 V4] fdinfo shared stats

2024-02-16 Thread Christian König


Am 15.02.24 um 15:20 schrieb Alex Deucher:

On Thu, Feb 15, 2024 at 9:18 AM Christian König
 wrote:

Am 12.02.24 um 22:04 schrieb Alex Deucher:

We had a request to add shared buffer stats to fdinfo for amdgpu and
while implementing that, Christian mentioned that just looking at
the GEM handle count doesn't take into account buffers shared with other
subsystems like V4L or RDMA.  Those subsystems don't use GEM, so it
doesn't really matter from a GPU top perspective, but it's more
correct if you actually want to see shared buffers.

After further discussions, add a helper and update all fdinfo
implementations to use that helper for consistency.

v4: switch drm_gem_object_is_shared_for_memory_stats() to an inline function

I'm still not sure if looking at the actual handle count is the right
approach, but it's certainly better than before.

Well, it's consistent across drivers.


Yeah, which makes it easy to change if we find something better.




So Reviewed-by: Christian König  for the
entire series.

Should I take this through drm-misc-next?

Yes, please.


Done.

Regards,
Christian.



Thanks,

Alex


Regards,
Christian.


Alex Deucher (6):
Documentation/gpu: Update documentation on drm-shared-*
drm: add drm_gem_object_is_shared_for_memory_stats() helper
drm: update drm_show_memory_stats() for dma-bufs
drm/amdgpu: add shared fdinfo stats
drm/i915: Update shared stats to use the new gem helper
drm/xe: Update shared stats to use the new gem helper

   Documentation/gpu/drm-usage-stats.rst  |  2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
   drivers/gpu/drm/drm_file.c |  2 +-
   drivers/gpu/drm/i915/i915_drm_client.c |  2 +-
   drivers/gpu/drm/xe/xe_drm_client.c |  2 +-
   include/drm/drm_gem.h  | 13 +
   8 files changed, 38 insertions(+), 4 deletions(-)
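
As a rough sketch of the idea behind the helper, assume that "shared" means either more than one GEM handle or an attached dma-buf. That definition is an assumption based on the cover letter, and the struct and function below are hypothetical stand-ins rather than the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of a GEM object. */
struct toy_gem_object {
	unsigned int handle_count; /* number of GEM handles referring to the object */
	const void *dma_buf;       /* non-NULL once the object is tied to a dma-buf */
};

/*
 * Treat an object as shared when it has more than one handle, or when it
 * is attached to a dma-buf and may therefore be used by another subsystem
 * that never creates a GEM handle.
 */
static bool toy_is_shared_for_memory_stats(const struct toy_gem_object *obj)
{
	assert(obj);
	return obj->handle_count > 1 || obj->dma_buf != NULL;
}
```

Keeping the rule in one helper means every fdinfo implementation reports the same thing, and a better definition only has to be changed in one place.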

Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support

2024-02-16 Thread Christian König


Am 16.02.24 um 13:23 schrieb Matthew Auld:

On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote:

Add a function to support defragmentation.

v1: Defragment the memory beginning from min_order
 till the required memory space is available.

Signed-off-by: Arunpravin Paneer Selvam 


Suggested-by: Matthew Auld 
---
  drivers/gpu/drm/drm_buddy.c | 67 +++--
  include/drm/drm_buddy.h |  3 ++


No users?


Other question is how can a buddy allocator fragment in the first place?

Christian.




  2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 33ad0cfbd54c..fac423d2cb73 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block)
  }
  EXPORT_SYMBOL(drm_get_buddy);
  -static void __drm_buddy_free(struct drm_buddy *mm,
- struct drm_buddy_block *block)
+static unsigned int __drm_buddy_free(struct drm_buddy *mm,
+ struct drm_buddy_block *block,
+ bool defrag)
  {
  struct drm_buddy_block *parent;
+    unsigned int order;
    while ((parent = block->parent)) {
  struct drm_buddy_block *buddy;
@@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm,
  if (!drm_buddy_block_is_free(buddy))
  break;
  -    if (drm_buddy_block_is_clear(block) !=
-    drm_buddy_block_is_clear(buddy))
-    break;
+    if (!defrag) {
+    if (drm_buddy_block_is_clear(block) !=
+    drm_buddy_block_is_clear(buddy))
+    break;
  -    if (drm_buddy_block_is_clear(block))
-    mark_cleared(parent);
+    if (drm_buddy_block_is_clear(block))
+    mark_cleared(parent);
+    }


Maybe check if the two blocks are incompatible and chuck a warn if 
they are not? Main thing is not to hide issues with split blocks that 
should have been merged before.



    list_del(&buddy->link);
  @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy 
*mm,

  block = parent;
  }
  +    order = drm_buddy_block_order(block);
  mark_free(mm, block);
+
+    return order;
+}
+
+/**
+ * drm_buddy_defrag - Defragmentation routine
+ *
+ * @mm: DRM buddy manager
+ * @min_order: minimum order in the freelist to begin
+ * the defragmentation process
+ *
+ * Driver calls the defragmentation function when the
+ * requested memory allocation returns -ENOSPC.
+ */
+void drm_buddy_defrag(struct drm_buddy *mm,
+  unsigned int min_order)


Just wondering if we need "full defrag" also? We would probably need to 
call this at fini() anyway.



+{
+    struct drm_buddy_block *block;
+    struct list_head *list;
+    unsigned int order;
+    int i;
+
+    if (min_order > mm->max_order)
+    return;
+
+    for (i = min_order - 1; i >= 0; i--) {


Need to be careful with min_order = 0 ?


+    list = &mm->free_list[i];
+    if (list_empty(list))
+    continue;
+
+    list_for_each_entry_reverse(block, list, link) {


Don't we need the safe_reverse() variant here, since this is removing 
from the list?



+    if (!block->parent)
+    continue;
+
+    order = __drm_buddy_free(mm, block, 1);
+    if (order >= min_order)
+    return;
+    }
+    }
  }
+EXPORT_SYMBOL(drm_buddy_defrag);
    /**
   * drm_buddy_free_block - free a block
@@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm,
  if (drm_buddy_block_is_clear(block))
  mm->clear_avail += drm_buddy_block_size(mm, block);
  -    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
  }
  EXPORT_SYMBOL(drm_buddy_free_block);
  @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm,
  if (buddy &&
  (drm_buddy_block_is_free(block) &&
   drm_buddy_block_is_free(buddy)))
-    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
  return ERR_PTR(err);
  }
  @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm,
    err_undo:
  if (tmp != order)
-    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
  return ERR_PTR(err);
  }
  @@ -668,7 +713,7 @@ static int __alloc_range(struct drm_buddy *mm,
  if (buddy &&
  (drm_buddy_block_is_free(block) &&
   drm_buddy_block_is_free(buddy)))
-    __drm_buddy_free(mm, block);
+    __drm_buddy_free(mm, block, 0);
    err_free:
  if (err == -ENOSPC && total_allocated_on_err) {
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index d81c596dfa38..d0f63e7b5915 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -166,6 +166,9 @@ void drm_buddy_free_list(struct drm_buddy *mm,
   struct list_head *objects,
   unsigned int flags);
  +void drm_buddy_defrag(struct drm_buddy *mm,
+

Re: [PATCH] drm/buddy: Modify duplicate list_splice_tail call

2024-02-16 Thread Christian König





Am 16.02.24 um 12:46 schrieb Arunpravin Paneer Selvam:



On 2/16/2024 4:41 PM, Matthew Auld wrote:

On 16/02/2024 10:00, Arunpravin Paneer Selvam wrote:

Remove the duplicate list_splice_tail call when the
total_allocated < size condition is true.

Cc:  # 6.7+
Fixes: 8746c6c9dfa3 ("drm/buddy: Fix alloc_range() error handling 
code")

Reported-by: Bert Karwatzki 
Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/drm_buddy.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index c1a99bf4dffd..c4222b886db7 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -538,13 +538,13 @@ static int __alloc_range(struct drm_buddy *mm,
  list_add(&block->left->tmp_link, dfs);
  } while (1);
  -    list_splice_tail(&allocated, blocks);
-
  if (total_allocated < size) {
  err = -ENOSPC;
  goto err_free;
  }
  +    list_splice_tail(&allocated, blocks);


Sigh. Can we extend the unit test(s) to catch this?

Sure, let me check.


In the meantime I'm going to push this one to drm-misc-fixes.

Regards,
Christian.



Regards,
Arun.


Reviewed-by: Matthew Auld 


+
  return 0;
    err_undo:

base-commit: a64056bb5a3215bd31c8ce17d609ba0f4d5c55ea
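
The ordering this fix restores can be sketched in user space: check the allocated total first, and hand the result to the caller only on success. The toy allocator below is hypothetical; plain counters stand in for the block lists, and `*out` plays the role of the caller's list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical toy allocator: hands out units from a pool counter.
 * `*out` is only updated on success, mirroring the idea of splicing the
 * temporary list into the caller's list after the size check.
 */
static int toy_alloc_range(size_t *pool_avail, size_t size, size_t *out)
{
	size_t total_allocated;

	assert(pool_avail && out);

	/* Take as much as is available, up to the requested size. */
	total_allocated = *pool_avail < size ? *pool_avail : size;
	*pool_avail -= total_allocated;

	if (total_allocated < size) {
		/* Undo the partial allocation and report the failure. */
		*pool_avail += total_allocated;
		return -ENOSPC;
	}

	/* Publish to the caller only once the request is fully satisfied. */
	*out += total_allocated;
	return 0;
}
```

If the publish step ran before the check, a failed request would leave partially allocated units visible to the caller while the error path releases them again.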

Re: [PATCH 0/6 V4] fdinfo shared stats

2024-02-15 Thread Christian König


Am 12.02.24 um 22:04 schrieb Alex Deucher:

We had a request to add shared buffer stats to fdinfo for amdgpu and
while implementing that, Christian mentioned that just looking at
the GEM handle count doesn't take into account buffers shared with other
subsystems like V4L or RDMA.  Those subsystems don't use GEM, so it
doesn't really matter from a GPU top perspective, but it's more
correct if you actually want to see shared buffers.

After further discussions, add a helper and update all fdinfo
implementations to use that helper for consistency.

v4: switch drm_gem_object_is_shared_for_memory_stats() to an inline function


I'm still not sure if looking at the actual handle count is the right 
approach, but it's certainly better than before.


So Reviewed-by: Christian König  for the 
entire series.


Should I take this through drm-misc-next?

Regards,
Christian.



Alex Deucher (6):
   Documentation/gpu: Update documentation on drm-shared-*
   drm: add drm_gem_object_is_shared_for_memory_stats() helper
   drm: update drm_show_memory_stats() for dma-bufs
   drm/amdgpu: add shared fdinfo stats
   drm/i915: Update shared stats to use the new gem helper
   drm/xe: Update shared stats to use the new gem helper

  Documentation/gpu/drm-usage-stats.rst  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
  drivers/gpu/drm/drm_file.c |  2 +-
  drivers/gpu/drm/i915/i915_drm_client.c |  2 +-
  drivers/gpu/drm/xe/xe_drm_client.c |  2 +-
  include/drm/drm_gem.h  | 13 +
  8 files changed, 38 insertions(+), 4 deletions(-)

Re: [PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test

2024-02-13 Thread Christian König


Am 13.02.24 um 15:28 schrieb Matthew Auld:

On 13/02/2024 13:52, Arunpravin Paneer Selvam wrote:

Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Signed-off-by: Matthew Auld 
Reviewed-by: Arunpravin Paneer Selvam 


It looks like you changed the patch authorship here.


Going to fix this if I get tasked with pushing this to drm-misc-fixes.

But I still have hope that Arun will figure out how to do this himself.

Christian.




Cc: Arunpravin Paneer Selvam 
Cc: Limonciello 
Cc: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++
  1 file changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c 
b/drivers/gpu/drm/tests/drm_buddy_test.c

index ea2af6bd9abe..fee6bec757d1 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,6 +8,7 @@
    #include 
  #include 
+#include 
    #include 
  @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 
chunk_size)

  return (1 << order) * chunk_size;
  }
  +static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+    u64 mm_size, ps = SZ_4K, i, n_pages, total;
+    struct drm_buddy_block *block;
+    struct drm_buddy mm;
+    LIST_HEAD(left);
+    LIST_HEAD(middle);
+    LIST_HEAD(right);
+    LIST_HEAD(allocated);
+
+    mm_size = 16 * 3 * SZ_4K;
+
+    KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+
+    /*
+ * Idea is to fragment the address space by alternating block
+ * allocations between three different lists; one for left, 
middle and

+ * right. We can then free a list to simulate fragmentation. In
+ * particular we want to exercise the 
DRM_BUDDY_CONTIGUOUS_ALLOCATION,

+ * including the try_harder path.
+ */
+
+    i = 0;
+    n_pages = mm_size / ps;
+    do {
+    struct list_head *list;
+    int slot = i % 3;
+
+    if (slot == 0)
+    list = &left;
+    else if (slot == 1)
+    list = &middle;
+    else
+    list = &right;
+    KUNIT_ASSERT_FALSE_MSG(test,
+   drm_buddy_alloc_blocks(&mm, 0, mm_size,
+  ps, ps, list, 0),
+   "buddy_alloc hit an error size=%d\n",
+   ps);
+    } while (++i < n_pages);
+
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+   3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%d\n", 3 * ps);
+
+    drm_buddy_free_list(&mm, &middle);
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+   3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 3 * ps);
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+   2 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 2 * ps);
+
+    drm_buddy_free_list(&mm, &right);
+    KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+   3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc didn't error size=%llu\n", 3 * ps);
+    /*
+ * At this point we should have enough contiguous space for 2 
blocks,
+ * however they are never buddies (since we freed middle and 
right) so

+ * will require the try_harder logic to find them.
+ */
+    KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, 
mm_size,

+    2 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc hit an error size=%d\n", 2 * ps);
+
+    drm_buddy_free_list(&mm, &left);
+    KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, 
mm_size,

+    3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+   "buddy_alloc hit an error size=%d\n", 3 * ps);
+
+    total = 0;
+    list_for_each_entry(block, &allocated, link)
+    total += drm_buddy_block_size(&mm, block);
+
+    KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+    drm_buddy_free_list(&mm, &allocated);
+    drm_buddy_fini(&mm);
+}
+
  static void drm_test_buddy_alloc_pathological(struct kunit *test)
  {
  u64 mm_size, size, start = 0;
@@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = {
  KUNIT_CASE(drm_test_buddy_alloc_optimistic),
  KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
  KUNIT_CASE(drm_test_buddy_alloc_pathological),
+    KUNIT_CASE(drm_test_buddy_alloc_contiguous),
  {}
  };

Re: [PATCH] drm/buddy: Fix alloc_range() error handling code

2024-02-07 Thread Christian König


Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam:

A few users have observed display corruption when booting
the machine to KDE Plasma or when playing games. We have root
caused the problem: whenever alloc_range() couldn't
find the required memory blocks, the function was returning
SUCCESS in some corner cases.

The right approach is to return -ENOSPC when the total
allocated size is less than the required size.

Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Signed-off-by: Arunpravin Paneer Selvam 
Tested-by: Mario Limonciello 


Acked-by: Christian König 

CC: stable.. ?


---
  drivers/gpu/drm/drm_buddy.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..c1a99bf4dffd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
} while (1);
  
  	list_splice_tail(&allocated, blocks);

+
+   if (total_allocated < size) {
+   err = -ENOSPC;
+   goto err_free;
+   }
+
return 0;
  
  err_undo:

Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4

2024-02-06 Thread Christian König


Am 06.02.24 um 13:53 schrieb Thomas Hellström:

Hi, Christian,

On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote:

Previously we would never try to move a BO into the preferred
placements once it had landed in a busy placement, since those
were considered compatible.

Rework the whole handling and finally unify the idle and busy
handling. ttm_bo_validate() is now responsible for trying the idle
placement first and then using the busy placement if that didn't
work.

Drawback is that we now always try the idle placement first for each
validation which might cause some additional CPU overhead on
overcommit.

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
v4: keep the ttm_bo_mem_space functionality as it is for now, only
    add new handling for ttm_bo_validate as suggested by Thomas

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin  v3

Sending this through xe CI, will try to review asap.


Take your time. At the moment people are bombarding me with work and I 
have only two hands and one head as well :(


Christian.



/Thomas



---
  drivers/gpu/drm/ttm/ttm_bo.c   | 231 +--
--
  drivers/gpu/drm/ttm/ttm_resource.c |  16 +-
  include/drm/ttm/ttm_resource.h |   3 +-
  3 files changed, 121 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index ba3f09e2d7e6..b12f435542a9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct
ttm_buffer_object *bo,
    return ret;
  }
  
-/*

- * Repeatedly evict memory from the LRU for @mem_type until we
create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
-     const struct ttm_place *place,
-     struct ttm_resource **mem,
-     struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
-     ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx-

no_wait_gpu);

-}
-
  /**
- * ttm_bo_mem_space
+ * ttm_bo_alloc_resource - Allocate backing store for a BO
   *
- * @bo: Pointer to a struct ttm_buffer_object. the data of which
- * we want to allocate space for.
- * @placement: Proposed new placement for the buffer object.
- * @mem: A struct ttm_resource.
+ * @bo: Pointer to a struct ttm_buffer_object of which we want a
resource for
+ * @placement: Proposed new placement for the buffer object
   * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
+ * @res: The resulting struct ttm_resource.
   *
- * Allocate memory space for the buffer object pointed to by @bo,
using
- * the placement flags in @placement, potentially evicting other
idle buffer objects.
- * This function may sleep while waiting for space to become
available.
+ * Allocates a resource for the buffer object pointed to by @bo,
using the
+ * placement flags in @placement, potentially evicting other buffer
objects when
+ * @force_space is true.
+ * This function may sleep while waiting for resources to become
available.
   * Returns:
- * -EBUSY: No space available (only if no_wait == 1).
+ * -EBUSY: No space available (only if no_wait == true).
   * -ENOSPC: Could not allocate space for the buffer object, either
due to
   * fragmentation or concurrent allocators.
   * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
   */
-int ttm_bo_mem_space(struct ttm_buffer_object *bo,
-   struct ttm_placement *placement,
-   struct ttm_resource **mem,
-   struct ttm_operation_ctx *ctx)
+static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
+struct ttm_placement *placement,
+struct ttm_operation_ctx *ctx,
+bool force_space,
+struct ttm_resource **res)
  {
    struct ttm_device *bdev = bo->bdev;
-   bool type_found = false;
+   struct ww_acquire_ctx *ticket;
    int i, ret;
  
+	ticket = dma_resv_locking_ctx(bo->base.resv);

    ret = dma_resv_reserve_fences(bo->base.r

[PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2024-01-26 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that VRAM which was evacuated for suspend is never
filled up again unless the application is restarted.

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b671b0665492..0eac179a387c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_FALLBACK;
c++;
}
 
-- 
2.34.1
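
The condition added by this patch is a three-way bitmask test. A small sketch, with hypothetical bit values and a hypothetical struct (the real AMDGPU_GEM_DOMAIN_* constants and struct amdgpu_bo are deliberately not reproduced here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not the real uapi constants. */
#define TOY_DOMAIN_GTT  (1u << 0)
#define TOY_DOMAIN_VRAM (1u << 1)

/* Hypothetical stand-in for the buffer object's preferred domains. */
struct toy_bo {
	uint32_t preferred_domains;
};

/*
 * GTT is treated as a fallback only when VRAM is both requested in
 * `domain` and listed in the buffer's preferred domains.
 */
static bool toy_gtt_is_fallback(const struct toy_bo *bo, uint32_t domain)
{
	assert(bo);
	return (domain & bo->preferred_domains & TOY_DOMAIN_VRAM) != 0;
}
```

A buffer that only ever wants GTT, or a request that does not include VRAM, keeps GTT as a regular placement.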

[PATCH 1/2] drm/ttm: improve idle/busy handling v4

2024-01-26 Thread Christian König

Previously we would never try to move a BO into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

Drawback is that we now always try the idle placement first for each
validation which might cause some additional CPU overhead on overcommit.
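
The two-pass behaviour described above can be sketched as a toy model. Everything below is hypothetical: struct toy_place, the eviction rule and the assumption that fallback placements are only considered in the second pass are simplifications, not the TTM implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placement: a pool with free space, not struct ttm_place. */
struct toy_place {
	size_t free;      /* space available without eviction */
	size_t evictable; /* extra space obtainable by evicting other buffers */
	bool fallback;    /* only considered in the second ("busy") pass */
};

/* One pass over all placements; returns the chosen index or -ENOSPC. */
static int toy_alloc_pass(struct toy_place *places, size_t n, size_t size,
			  bool force_space)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_place *p = &places[i];

		/* Fallback placements are skipped in the first ("idle") pass. */
		if (p->fallback && !force_space)
			continue;

		/* Evict only when forced and when it actually makes room. */
		if (force_space && p->free < size &&
		    p->free + p->evictable >= size) {
			p->free += p->evictable;
			p->evictable = 0;
		}

		if (p->free >= size) {
			p->free -= size;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Try the idle pass first, then retry with eviction and fallbacks enabled. */
static int toy_validate(struct toy_place *places, size_t n, size_t size)
{
	int ret;

	assert(places);

	ret = toy_alloc_pass(places, n, size, false);
	if (ret == -ENOSPC)
		ret = toy_alloc_pass(places, n, size, true);
	return ret;
}
```

The extra CPU cost mentioned above corresponds to the first pass: under overcommit it fails every time before the second pass does the real work.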

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
v4: keep the ttm_bo_mem_space functionality as it is for now, only add
new handling for ttm_bo_validate as suggested by Thomas

Signed-off-by: Christian König 
Reviewed-by: Zack Rusin  v3
---
 drivers/gpu/drm/ttm/ttm_bo.c   | 231 +
 drivers/gpu/drm/ttm/ttm_resource.c |  16 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 3 files changed, 121 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ba3f09e2d7e6..b12f435542a9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object 
*bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
- * ttm_bo_mem_space
+ * ttm_bo_alloc_resource - Allocate backing store for a BO
  *
- * @bo: Pointer to a struct ttm_buffer_object. the data of which
- * we want to allocate space for.
- * @placement: Proposed new placement for the buffer object.
- * @mem: A struct ttm_resource.
+ * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for
+ * @placement: Proposed new placement for the buffer object
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
+ * @res: The resulting struct ttm_resource.
  *
- * Allocate memory space for the buffer object pointed to by @bo, using
- * the placement flags in @placement, potentially evicting other idle buffer objects.
- * This function may sleep while waiting for space to become available.
+ * Allocates a resource for the buffer object pointed to by @bo, using the
+ * placement flags in @placement, potentially evicting other buffer objects when
+ * @force_space is true.
+ * This function may sleep while waiting for resources to become available.
  * Returns:
- * -EBUSY: No space available (only if no_wait == 1).
+ * -EBUSY: No space available (only if no_wait == true).
  * -ENOSPC: Could not allocate space for the buffer object, either due to
  * fragmentation or concurrent allocators.
  * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
  */
-int ttm_bo_mem_space(struct ttm_buffer_object *bo,
-   struct ttm_placement *placement,
-   struct ttm_resource **mem,
-   struct ttm_operation_ctx *ctx)
+static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
+struct ttm_placement *placement,
+struct ttm_operation_ctx *ctx,
+bool force_space,
+struct ttm_resource **res)
 {
struct ttm_device *bdev = bo->bdev;
-   bool type_found = false;
+   struct ww_acquire_ctx *ticket;
int i, ret;
 
+   ticket = dma_resv_locking_ctx(bo->base.resv);
ret = dma_resv_reserve_fences(bo->base.resv, 1);
if (unlikely(ret))
return ret;
@@ -790,98 +762,73 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
const struct ttm_place *place = &placement->placement[i];
struct ttm_resource_manager *man;
 
-   if (place->flags & TTM_PL_FLAG_FALLBACK)
-

Rework TTMs busy handling

2024-01-26 Thread Christian König

Hi guys,

so pushed the first few patches from this series. I hope that I
correctly managed to resolve the silent Xe merge conflict in drm-tip,
but would be nice if somebody could double check.

Then for the two remaining patches I've implemented most of what
Thomas suggested, e.g. the existing functionality sticks around for
eviction and hops, but ttm_bo_validate will now always try to move
things into the non-fallback placements on validation first.

What I haven't done yet is to split up the preferred placement since
I couldn't immediately see a use case for this, but it's really
something we might do in the future as well.

Please review and comment,
Christian.
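
The validation order described in this mail can be sketched as a toy model in plain C. This is not the TTM code: struct place, PL_FLAG_DESIRED, PL_FLAG_FALLBACK, try_alloc() and validate() are hypothetical stand-ins, and the has_room/evictable tables replace the real resource managers. The sketch assumes that a pass without eviction skips fallback placements and that the eviction pass skips desired placements.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM placement flags. */
#define PL_FLAG_DESIRED  (1u << 0) /* use only while no eviction is needed */
#define PL_FLAG_FALLBACK (1u << 1) /* use only once eviction is allowed */

struct place {
	int domain;          /* index into the has_room/evictable tables */
	unsigned int flags;
};

/* One pass over the placements; returns the chosen domain or -ENOSPC. */
static int try_alloc(const struct place *places, size_t n,
		     const bool *has_room, const bool *evictable,
		     bool force_space)
{
	unsigned int skip = force_space ? PL_FLAG_DESIRED : PL_FLAG_FALLBACK;

	assert(places != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int d = places[i].domain;

		if (places[i].flags & skip)
			continue;
		/* Eviction is modelled as a yes/no property of the domain. */
		if (has_room[d] || (force_space && evictable[d]))
			return d;
	}
	return -ENOSPC;
}

/* Non-fallback placements first, then retry with eviction allowed. */
static int validate(const struct place *places, size_t n,
		    const bool *has_room, const bool *evictable)
{
	int ret = try_alloc(places, n, has_room, evictable, false);

	if (ret == -ENOSPC)
		ret = try_alloc(places, n, has_room, evictable, true);
	return ret;
}
```

With a placement list of { domain 0, no flags } and { domain 1, PL_FLAG_FALLBACK }, the model only lands in domain 1 once domain 0 has neither room nor anything to evict.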

Re: [PATCH] mm: Remove double faults once write a device pfn

2024-01-25 Thread Christian König


Am 24.01.24 um 12:04 schrieb Alistair Popple:

"Zhou, Xianrong"  writes:


[AMD Official Use Only - General]


The vmf_insert_pfn_prot could cause unnecessary double faults on a
device pfn. Because currently the vmf_insert_pfn_prot does not
make the pfn writable so the pte entry is normally read-only or
dirty catching.

What? How did you get to this conclusion?

Sorry. I did not mention that this problem only exists on arm64 platform.

Ok, that makes at least a little bit more sense.


Because on the arm64 platform PTE_RDONLY is automatically attached
to the userspace pte entries even with VM_WRITE + VM_SHARED set.
PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However,
vmf_insert_pfn_prot does not make the pte writable because it passes
false for @mkwrite to insert_pfn.

Question is why is arm64 doing this? As far as I can see they must
have some hardware reason for that.

The mkwrite parameter to insert_pfn() was added by commit
b2770da642540 to make insert_pfn() look more like insert_pfn_pmd() so
that the DAX code can insert PTEs which are writable and dirty at the same
time.

This is one scenario for doing so. In fact, on arm64 there are many
scenarios where this could apply. So we can let vmf_insert_pfn_prot
support @mkwrite for drivers at the core layer and let drivers
decide whether or not to make the pte writable and dirty at the same
time. The patch does this. Otherwise arm64 double faults when calling
vmf_insert_pfn_prot.

Well, that doesn't answer my question why arm64 is double faulting in the
first place.



Eh.

On arm64, when userspace calls mmap() with PROT_WRITE and MAP_SHARED, the
vma->vm_page_prot has PTE_RDONLY and PTE_WRITE set within
PAGE_SHARED_EXEC (see the arm64 protection_map).


Well that's your observation, but not the explanation why arm64 is doing 
this.


See this would have quite some negative impact on performance, not only 
for gfx drivers but in general.


So either the observation is incorrect or there is a *really* good 
reason why arm64 is taking this performance penalty.



When writing to the userspace virtual address the first fault happens and calls
into the driver's .fault->ttm_bo_vm_fault_reserved->vmf_insert_pfn_prot->insert_pfn.
insert_pfn will establish the pte entry. However, vmf_insert_pfn_prot
passes false for @mkwrite to insert_pfn by default, so insert_pfn cannot make
the pfn writable and does not call maybe_mkwrite(pte_mkdirty(entry), vma)
to clear the PTE_RDONLY bit. So the pte entry is actually write-protected for the
mmu.
So when the first fault returns and the store instruction is re-executed, the second
fault happens. In the second fault it only does pte_mkdirty(entry), which
clears PTE_RDONLY.

It depends if the ARM64 CPU in question supports hardware dirty bit
management (DBM). If that is the case and PTE_DBM (ie. PTE_WRITE) is set
HW will automatically clear PTE_RDONLY bit to mark the entry dirty
instead of raising a write fault. So you shouldn't see a double fault if
PTE_DBM/WRITE is set.

On ARM64 you can kind of think of PTE_RDONLY as the HW dirty bit and
PTE_DBM as the read/write permission bit with SW being responsible for
updating PTE_RDONLY via the fault handler if DBM is not supported by HW.

At least that's my understanding from having hacked on this in the
past. You can see all this weirdness happening in the definitions of
pte_dirty() and pte_write() for ARM64.
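
The relationship between PTE_RDONLY and PTE_DBM described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the bit positions reflect my reading of the arm64 headers and should be treated as illustrative, toy_store() is a hypothetical helper, and the model counts only write-protection faults on an already present pte (not the initial fault that inserts the pte).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as I understand the arm64 headers; treat them as illustrative. */
#define PTE_RDONLY (UINT64_C(1) << 7)   /* clear means "dirty" for a writable pte */
#define PTE_DBM    (UINT64_C(1) << 51)  /* dirty bit management */
#define PTE_WRITE  PTE_DBM              /* doubles as the write permission */

static bool pte_write(uint64_t pte)
{
	return (pte & PTE_WRITE) != 0;
}

static bool pte_hw_dirty(uint64_t pte)
{
	return pte_write(pte) && !(pte & PTE_RDONLY);
}

/*
 * Hypothetical model of one store through @pte. Returns the number of
 * write faults taken, or -1 if the mapping is not writable at all.
 */
static int toy_store(uint64_t *pte, bool hw_has_dbm)
{
	int faults = 0;

	assert(pte != NULL);
	if (!pte_write(*pte))
		return -1;
	if (*pte & PTE_RDONLY) {
		if (!hw_has_dbm)
			faults++;        /* software fault handler marks the pte dirty */
		*pte &= ~PTE_RDONLY;     /* done by hardware when DBM is available */
	}
	return faults;
}
```

In this model a pte with PTE_WRITE | PTE_RDONLY takes no write fault when the hardware manages the dirty state, and exactly one when software has to clear PTE_RDONLY.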


+1

Thanks a lot for that, this was exactly the information I was looking for.

In this light it makes this patch here look unnecessary and questionable 
at best.


Xianrong if you have an arm64 platform which really double faults 
(confirmed through a debugger for example) then you have to ask why this 
platform shows this behavior and not try to work around it.


Behaviors like those usually have a very, very good reason, and without a 
confirmed explanation I'm not going to allow any patch which would disable 
stuff like that.


Regards,
Christian.




I think so, and I hope I am not wrong.


So as long as this isn't sorted out I'm going to reject this patch.

Regards,
Christian.


This is a completely different use case to what you try to use it
here for and that looks extremely fishy to me.

Regards,
Christian.


The first fault only sets up the pte entry which actually is dirty
catching. And the second immediate fault to the pfn due to first
dirty catching when the cpu re-execute the store instruction.

It could be that this is done to work around some hw behavior, but
not because of dirty catching.


Normally if the drivers call vmf_insert_pfn_prot and also supply
'pfn_mkwrite' callback within vm_operations_struct which requires
the pte to be dirty catching then the vmf_insert_pfn_prot and the
double fault are reasonable. It is not a problem.

Well, as far as I can see that behavior absolutely doesn't make sense.

When pfn_mkwrite is requested then the driver should use PAGE_COPY,
which is exactly what VMWGFX (the only driver using dirty tracking)
is doing.

Everybody else uses PAGE_SHARED which should make the pte w

Re: [PATCH] mm: Remove double faults once write a device pfn

2024-01-23 Thread Christian König


Am 24.01.24 um 03:43 schrieb Zhou, Xianrong:

[AMD Official Use Only - General]


The vmf_insert_pfn_prot could cause unnecessary double faults on a
device pfn. Because currently the vmf_insert_pfn_prot does not make
the pfn writable so the pte entry is normally read-only or dirty
catching.

What? How did you get to this conclusion?

Sorry. I did not mention that this problem only exists on arm64 platform.

Ok, that makes at least a little bit more sense.


Because on the arm64 platform PTE_RDONLY is automatically attached to
the userspace pte entries even with VM_WRITE + VM_SHARED set.
PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However,
vmf_insert_pfn_prot does not make the pte writable because it passes
false for @mkwrite to insert_pfn.

Question is why is arm64 doing this? As far as I can see they must have some
hardware reason for that.

The mkwrite parameter to insert_pfn() was added by commit
b2770da642540 to make insert_pfn() look more like insert_pfn_pmd() so that
the DAX code can insert PTEs which are writable and dirty at the same time.


This is one scenario for doing so. In fact, on arm64 there are many scenarios
where this could apply. So we can let vmf_insert_pfn_prot support @mkwrite for
drivers at the core layer and let drivers decide whether or not to make the pte
writable and dirty at the same time. The patch does this. Otherwise arm64 double
faults when calling vmf_insert_pfn_prot.


Well, that doesn't answer my question why arm64 is double faulting in 
the first place.


So as long as this isn't sorted out I'm going to reject this patch.

Regards,
Christian.




This is a completely different use case to what you try to use it here for and
that looks extremely fishy to me.

Regards,
Christian.


The first fault only sets up the pte entry which actually is dirty
catching. And the second immediate fault to the pfn due to first
dirty catching when the cpu re-execute the store instruction.

It could be that this is done to work around some hw behavior, but
not because of dirty catching.


Normally if the drivers call vmf_insert_pfn_prot and also supply
'pfn_mkwrite' callback within vm_operations_struct which requires
the pte to be dirty catching then the vmf_insert_pfn_prot and the
double fault are reasonable. It is not a problem.

Well, as far as I can see that behavior absolutely doesn't make sense.

When pfn_mkwrite is requested then the driver should use PAGE_COPY,
which is exactly what VMWGFX (the only driver using dirty tracking) is
doing.

Everybody else uses PAGE_SHARED which should make the pte writeable
immediately.

Regards,
Christian.


However the most of drivers calling vmf_insert_pfn_prot do not
supply the 'pfn_mkwrite' callback so that the second fault is unnecessary.

So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair, we
should also supply vmf_insert_pfn_mkwrite for drivers as well.

Signed-off-by: Xianrong Zhou 
---
arch/x86/entry/vdso/vma.c  |  3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  2 +-
drivers/gpu/drm/nouveau/nouveau_gem.c  |  2 +-
drivers/gpu/drm/radeon/radeon_gem.c|  2 +-
drivers/gpu/drm/ttm/ttm_bo_vm.c|  8 +---
drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 +---
include/drm/ttm/ttm_bo.h   |  3 ++-
include/linux/mm.h |  2 +-
mm/memory.c| 14 +++---
10 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 7645730dc228..dd2431c2975f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
  if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) {
  return vmf_insert_pfn_prot(vma, vmf->address,
  __pa(pvti) >> PAGE_SHIFT,
-   pgprot_decrypted(vma->vm_page_prot));
+   pgprot_decrypted(vma->vm_page_prot),
+   true);
  }
  } else if (sym_offset == image->sym_hvclock_page) {
  pfn = hv_get_tsc_pfn();
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 49a5f1c73b3e..adcb20d9e624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
  }

  ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-  TTM_BO_VM_NUM_PREFAULT);
+  TTM_BO_VM_NUM_PREFAULT, true);
  drm_dev_exit(idx);
  } else {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 9227f8146a58..c6f1

Re: [PATCH] mm: Remove double faults once write a device pfn

2024-01-23 Thread Christian König


Am 23.01.24 um 09:33 schrieb Zhou, Xianrong:

[AMD Official Use Only - General]


The vmf_insert_pfn_prot could cause unnecessary double faults on a
device pfn. Because currently the vmf_insert_pfn_prot does not make
the pfn writable so the pte entry is normally read-only or dirty
catching.

What? How did you get to this conclusion?

Sorry. I did not mention that this problem only exists on arm64 platform.


Ok, that makes at least a little bit more sense.


Because on the arm64 platform PTE_RDONLY is automatically attached to
the userspace pte entries even with VM_WRITE + VM_SHARED set.
PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However,
vmf_insert_pfn_prot does not make the pte writable because it passes
false for @mkwrite to insert_pfn.


Question is why is arm64 doing this? As far as I can see they must have 
some hardware reason for that.


The mkwrite parameter to insert_pfn() was added by commit b2770da642540 
to make insert_pfn() look more like insert_pfn_pmd() so that the DAX 
code can insert PTEs which are writable and dirty at the same time.


This is a completely different use case to what you try to use it here 
for and that looks extremely fishy to me.


Regards,
Christian.




The first fault only sets up the pte entry which actually is dirty
catching. And the second immediate fault to the pfn due to first dirty
catching when the cpu re-execute the store instruction.

It could be that this is done to work around some hw behavior, but not
because of dirty catching.


Normally if the drivers call vmf_insert_pfn_prot and also supply
'pfn_mkwrite' callback within vm_operations_struct which requires the
pte to be dirty catching then the vmf_insert_pfn_prot and the double
fault are reasonable. It is not a problem.

Well, as far as I can see that behavior absolutely doesn't make sense.

When pfn_mkwrite is requested then the driver should use PAGE_COPY, which
is exactly what VMWGFX (the only driver using dirty tracking) is doing.

Everybody else uses PAGE_SHARED which should make the pte writeable
immediately.

Regards,
Christian.


However the most of drivers calling vmf_insert_pfn_prot do not supply
the 'pfn_mkwrite' callback so that the second fault is unnecessary.

So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair, we
should also supply vmf_insert_pfn_mkwrite for drivers as well.

Signed-off-by: Xianrong Zhou 
---
   arch/x86/entry/vdso/vma.c  |  3 ++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
   drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  2 +-
   drivers/gpu/drm/nouveau/nouveau_gem.c  |  2 +-
   drivers/gpu/drm/radeon/radeon_gem.c|  2 +-
   drivers/gpu/drm/ttm/ttm_bo_vm.c|  8 +---
   drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 +---
   include/drm/ttm/ttm_bo.h   |  3 ++-
   include/linux/mm.h |  2 +-
   mm/memory.c| 14 +++---
   10 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 7645730dc228..dd2431c2975f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) {
 return vmf_insert_pfn_prot(vma, vmf->address,
 __pa(pvti) >> PAGE_SHIFT,
-   pgprot_decrypted(vma->vm_page_prot));
+   pgprot_decrypted(vma->vm_page_prot),
+   true);
 }
 } else if (sym_offset == image->sym_hvclock_page) {
 pfn = hv_get_tsc_pfn();
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 49a5f1c73b3e..adcb20d9e624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
 }

 ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-  TTM_BO_VM_NUM_PREFAULT);
+  TTM_BO_VM_NUM_PREFAULT, true);
 drm_dev_exit(idx);
 } else {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 9227f8146a58..c6f13ae6c308 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1114,7 +1114,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)

 if (drm_dev_enter(dev, &idx)) {
 ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-  TTM_BO_VM_NUM_PREFAULT);
+  TTM_BO_VM_NUM_PREFAULT, true);
 drm_dev_exit(idx);
 } else {
 ret = ttm_bo_vm_dummy_page(vmf, vmf->vma

Re: [PATCH] mm: Remove double faults once write a device pfn

2024-01-22 Thread Christian König


Am 22.01.24 um 04:32 schrieb Xianrong Zhou:

The vmf_insert_pfn_prot could cause unnecessary double faults
on a device pfn. Because currently the vmf_insert_pfn_prot does
not make the pfn writable so the pte entry is normally read-only
or dirty catching.


What? How did you get to this conclusion?


The first fault only sets up the pte entry which actually is
dirty catching. And the second immediate fault to the pfn due
to first dirty catching when the cpu re-execute the store
instruction.


It could be that this is done to work around some hw behavior, but not 
because of dirty catching.



Normally if the drivers call vmf_insert_pfn_prot and also supply
'pfn_mkwrite' callback within vm_operations_struct which requires
the pte to be dirty catching then the vmf_insert_pfn_prot and the
double fault are reasonable. It is not a problem.


Well, as far as I can see that behavior absolutely doesn't make sense.

When pfn_mkwrite is requested then the driver should use PAGE_COPY, 
which is exactly what VMWGFX (the only driver using dirty tracking) is 
doing.


Everybody else uses PAGE_SHARED which should make the pte writeable 
immediately.


Regards,
Christian.



However the most of drivers calling vmf_insert_pfn_prot do not
supply the 'pfn_mkwrite' callback so that the second fault is
unnecessary.

So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair,
we should also supply vmf_insert_pfn_mkwrite for drivers as well.

Signed-off-by: Xianrong Zhou 
---
  arch/x86/entry/vdso/vma.c  |  3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  2 +-
  drivers/gpu/drm/nouveau/nouveau_gem.c  |  2 +-
  drivers/gpu/drm/radeon/radeon_gem.c|  2 +-
  drivers/gpu/drm/ttm/ttm_bo_vm.c|  8 +---
  drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 +---
  include/drm/ttm/ttm_bo.h   |  3 ++-
  include/linux/mm.h |  2 +-
  mm/memory.c| 14 +++---
  10 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 7645730dc228..dd2431c2975f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct 
vm_special_mapping *sm,
if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) {
return vmf_insert_pfn_prot(vma, vmf->address,
__pa(pvti) >> PAGE_SHIFT,
-   pgprot_decrypted(vma->vm_page_prot));
+   pgprot_decrypted(vma->vm_page_prot),
+   true);
}
} else if (sym_offset == image->sym_hvclock_page) {
pfn = hv_get_tsc_pfn();
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 49a5f1c73b3e..adcb20d9e624 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
}
  
  		ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,

-  TTM_BO_VM_NUM_PREFAULT);
+  TTM_BO_VM_NUM_PREFAULT, true);
  
  		drm_dev_exit(idx);

} else {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 9227f8146a58..c6f13ae6c308 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1114,7 +1114,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
  
  	if (drm_dev_enter(dev, &idx)) {

ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-  TTM_BO_VM_NUM_PREFAULT);
+  TTM_BO_VM_NUM_PREFAULT, true);
drm_dev_exit(idx);
} else {
ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 49c2bcbef129..7e1453762ec9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -56,7 +56,7 @@ static vm_fault_t nouveau_ttm_fault(struct vm_fault *vmf)
  
  	nouveau_bo_del_io_reserve_lru(bo);

prot = vm_get_page_prot(vma->vm_flags);
-   ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
+   ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, true);
nouveau_bo_add_io_reserve_lru(bo);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c 
b/drivers/gpu/drm/radeon/radeon_gem.c
index 3fec3acdaf28..b21cf00ae162 100644
--- a/drivers/gpu/drm/rade

Re: Rework TTMs busy handling

2024-01-16 Thread Christian König


Am 12.01.24 um 13:51 schrieb Christian König:

Hi guys,


just a gentle ping on this.

Zack any more comments for the VMWGFX parts?

Thanks,
Christian.



same as the last time. Things I've changed:

Implemented the requirements from Zack to correctly fill in the busy
placements for VMWGFX.

Renamed the placement flags to desired and fallback as suggested by
Michel.

Rebased on drm-tip instead of drm-misc-next and fixed XE as well.

Please review and comment,
Christian.

[PATCH 4/5] drm/ttm: improve idle/busy handling v3

2024-01-12 Thread Christian König

Previously we would never try to move a BO into the preferred placements
when it ever landed in a busy placement since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

Drawback is that we now always try the idle placement first for each
validation which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style
v3: take care of XE as well
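
As a rough illustration of the eviction behaviour this patch gates behind the new force_space argument, the loop removed with ttm_bo_mem_force_space() can be modelled in user space. This is a sketch under assumptions, not the kernel code: struct toy_manager, toy_alloc(), toy_evict_first() and toy_alloc_resource() are hypothetical names, memory is a plain byte budget, and the LRU is a fixed array of buffer sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LRU_MAX 8

/* Hypothetical toy manager: a byte budget plus an LRU of evictable sizes. */
struct toy_manager {
	size_t free;
	size_t lru[TOY_LRU_MAX]; /* sizes of evictable buffers, oldest first */
	size_t lru_head;
	size_t lru_count;
};

static int toy_alloc(struct toy_manager *man, size_t size)
{
	if (man->free < size)
		return -ENOSPC;
	man->free -= size;
	return 0;
}

static int toy_evict_first(struct toy_manager *man)
{
	if (man->lru_head == man->lru_count)
		return -EBUSY; /* nothing left to evict */
	man->free += man->lru[man->lru_head++];
	return 0;
}

/* Allocate, evicting from the LRU on -ENOSPC only when @force_space is set. */
static int toy_alloc_resource(struct toy_manager *man, size_t size,
			      bool force_space)
{
	int ret;

	assert(man->lru_count <= TOY_LRU_MAX);
	for (;;) {
		ret = toy_alloc(man, size);
		if (ret != -ENOSPC || !force_space)
			return ret;
		ret = toy_evict_first(man);
		if (ret)
			return ret;
	}
}
```

A caller that wants the idle behaviour passes force_space = false and simply gets -ENOSPC back. Passing true keeps evicting until the allocation fits or the LRU is empty.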

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c   | 131 -
 drivers/gpu/drm/ttm/ttm_resource.c |  16 ++-
 drivers/gpu/drm/xe/xe_bo.c |   4 +-
 include/drm/ttm/ttm_bo.h   |   3 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 7 files changed, 68 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b671b0665492..06fb3fc47eaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
(*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
}
r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
-&(*bo_ptr)->tbo.resource, &ctx);
+&(*bo_ptr)->tbo.resource, &ctx, false);
if (r)
goto error;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8722beba494e..f23cdc7c5b08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -966,7 +966,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
placements.mem_type = TTM_PL_TT;
placements.flags = bo->resource->placement;
 
-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true);
if (unlikely(r))
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a5e11a92e0b9..3783be24d832 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct 
ttm_buffer_object *bo,
hop_placement.placement = hop;
 
/* find space in the bounce domain */
-   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true);
if (ret)
return ret;
/* move to the bounce domain */
@@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
return ttm_bo_pipeline_gutting(bo);
}
 
-   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true);
if (ret) {
if (ret != -ERESTARTSYS) {
pr_err("Failed to find memory space for buffer 0x%p 
eviction\n",
@@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object 
*bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
  * ttm_bo_mem_space
  *
@@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
  * @placement: Proposed new placement for the buffer object.
  * @mem: A struct ttm_resource.
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space

[PATCH 5/5] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2024-01-12 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that when VRAM was evacuated for suspend it's never
filled up again unless the application is restarted.
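
The condition added by this patch can be tried out in isolation. The sketch is a user-space model only: DOMAIN_GTT, DOMAIN_VRAM, PL_FLAG_FALLBACK and gtt_place_flags() are hypothetical stand-ins with illustrative values, not the amdgpu or TTM definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in the amdgpu UAPI and TTM headers. */
#define DOMAIN_GTT       (1u << 1)
#define DOMAIN_VRAM      (1u << 2)
#define PL_FLAG_FALLBACK (1u << 0)

/* Flags for the GTT placement of a BO that is being placed into @domain. */
static uint32_t gtt_place_flags(uint32_t domain, uint32_t preferred_domains)
{
	uint32_t flags = 0;

	assert(domain & DOMAIN_GTT);
	/*
	 * GTT is only a fallback when VRAM is both part of the current
	 * request and one of the BO's preferred domains.
	 */
	if (domain & preferred_domains & DOMAIN_VRAM)
		flags |= PL_FLAG_FALLBACK;
	return flags;
}
```

The three-way AND is what keeps a GTT-only request, or a BO that prefers GTT, from being marked as fallback.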

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 06fb3fc47eaa..2752f2a67a44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_FALLBACK;
c++;
}
 
-- 
2.34.1

[PATCH 3/5] drm/ttm: replace busy placement with flags v6

2024-01-12 Thread Christian König

From: Somalapuram Amaranath 

Instead of a separate list of busy placements, add flags which indicate
that a placement should only be used when there is room or if we need to
evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip
v4: fix some typos pointed out by checkpatch
v5: cleanup some rebase problems with VMWGFX
v6: implement some missing VMWGFX functionality pointed out by Zack,
rename the flags as suggested by Michel, rebase on drm-tip and
adjust XE as well

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +---
 drivers/gpu/drm/drm_gem_vram_helper.c  |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 +--
 drivers/gpu/drm/loongson/lsdc_ttm.c|  2 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 59 +++--
 drivers/gpu/drm/nouveau/nouveau_bo.h   |  1 -
 drivers/gpu/drm/qxl/qxl_object.c   |  2 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  2 -
 drivers/gpu/drm/radeon/radeon_object.c |  2 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  8 +--
 drivers/gpu/drm/radeon/radeon_uvd.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo.c   | 21 ---
 drivers/gpu/drm/ttm/ttm_resource.c | 73 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 33 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  4 --
 drivers/gpu/drm/xe/xe_bo.c | 33 +-
 include/drm/ttm/ttm_placement.h| 10 +--
 include/drm/ttm/ttm_resource.h |  8 +--
 19 files changed, 118 insertions(+), 197 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 425cebcc5cbf..b671b0665492 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
 
placement->num_placement = c;
placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
 }
 
 /**
@@ -1397,8 +1394,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
AMDGPU_GEM_DOMAIN_GTT);
 
/* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_DESIRED;
 
r = ttm_bo_validate(bo, &abo->placement, &ctx);
if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 75c9fd2c6c2a..8722beba494e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
/* Don't handle scatter gather BOs */
if (bo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
/* Object isn't an AMDGPU object so ignore */
if (!amdgpu_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
-   placement->busy_placement = &placements;
placement->num_placement = 1;
-   placement->num_busy_placement = 1;
return;
}
 
abo = ttm_to_amdgpu_bo(bo);
if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
case AMDGPU_PL_OA:
case AMDGPU_PL_DOORBELL:
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
 
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
} else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
   !(abo->flags & 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
   amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
AMDGPU_GEM_DOMAIN_CPU);
abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
abo->placements[0].lpf

[PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space

2024-01-12 Thread Christian König

Only convert it to ENOMEM in ttm_bo_validate.

This allows ttm_bo_validate to distinguish between an out of memory
situation and just out of space in a placement domain.
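
The split between the inner and outer error codes can be sketched as follows. This is a minimal user-space model, assuming two hypothetical helpers toy_mem_space() and toy_validate() that stand in for the TTM functions; only the error translation is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inner helper: a full placement domain is reported as -ENOSPC. */
static int toy_mem_space(bool domain_full, bool host_oom)
{
	if (host_oom)
		return -ENOMEM;
	return domain_full ? -ENOSPC : 0;
}

/* Hypothetical outer entry point: userspace keeps seeing -ENOMEM. */
static int toy_validate(bool domain_full, bool host_oom)
{
	int ret = toy_mem_space(domain_full, host_oom);

	assert(ret <= 0);
	/* For backward compatibility with userspace */
	if (ret == -ENOSPC)
		return -ENOMEM;
	return ret;
}
```

Code between the two layers can branch on -ENOSPC, for example to retry with another placement, while the value returned to the caller of the outer function does not change.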

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..8c1eaa74fa21 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
goto error;
}
 
-   ret = -ENOMEM;
+   ret = -ENOSPC;
if (!type_found) {
pr_err(TTM_PFX "No compatible memory type found\n");
ret = -EINVAL;
@@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
return -EINVAL;
 
ret = ttm_bo_move_buffer(bo, placement, ctx);
+   /* For backward compatibility with userspace */
+   if (ret == -ENOSPC)
+   return -ENOMEM;
if (ret)
return ret;
 
-- 
2.34.1

Rework TTMs busy handling

2024-01-12 Thread Christian König

Hi guys,

Same as last time. Things I've changed:

Implemented the requirements from Zack to correctly fill in the busy
placements for VMWGFX.

Renamed the placement flags to desired and fallback as suggested by
Michel.

Rebased on drm-tip instead of drm-misc-next and fixed XE as well.

Please review and comment,
Christian.

[PATCH 1/5] drm/vmwgfx: remove vmw_vram_gmr_placement

2024-01-12 Thread Christian König

Seems to be unused.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 28 --
 2 files changed, 29 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 3cd5090dedfc..12efecc17df6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -942,7 +942,6 @@ vmw_is_cursor_bypass3_enabled(const struct vmw_private *dev_priv)
 
 extern const size_t vmw_tt_size;
 extern struct ttm_placement vmw_vram_placement;
-extern struct ttm_placement vmw_vram_gmr_placement;
 extern struct ttm_placement vmw_sys_placement;
 extern struct ttm_device_funcs vmw_bo_driver;
 extern const struct vmw_sg_table *
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index af8562c95cc3..a84fffcef8e1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -43,13 +43,6 @@ static const struct ttm_place sys_placement_flags = {
.flags = 0
 };
 
-static const struct ttm_place gmr_placement_flags = {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = VMW_PL_GMR,
-   .flags = 0
-};
-
 struct ttm_placement vmw_vram_placement = {
.num_placement = 1,
.placement = &vram_placement_flags,
@@ -57,27 +50,6 @@ struct ttm_placement vmw_vram_placement = {
.busy_placement = &vram_placement_flags
 };
 
-static const struct ttm_place vram_gmr_placement_flags[] = {
-   {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = TTM_PL_VRAM,
-   .flags = 0
-   }, {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = VMW_PL_GMR,
-   .flags = 0
-   }
-};
-
-struct ttm_placement vmw_vram_gmr_placement = {
-   .num_placement = 2,
-   .placement = vram_gmr_placement_flags,
-   .num_busy_placement = 1,
-   .busy_placement = &gmr_placement_flags
-};
-
 struct ttm_placement vmw_sys_placement = {
.num_placement = 1,
.placement = &sys_placement_flags,
-- 
2.34.1

Re: Rework TTMs busy handling

2024-01-09 Thread Christian König


On 09.01.24 at 09:14, Thomas Hellström wrote:

Hi, Christian

On Tue, 2024-01-09 at 08:47 +0100, Christian König wrote:

Hi guys,

I've been trying for years now to make this functionality a bit more useful,
since we have had multiple reports that the behavior of drivers can be
suboptimal when multiple placements are given.

So basically instead of hacking around the TTM behavior in the driver
once more I've gone ahead and changed the idle/busy placement list
into idle/busy placement flags. This not only saves a bunch of code,
but also allows setting some placements as fallback which are used if
allocating from the preferred ones didn't work.

Zack pointed out that some removed VMWGFX code was brought back because
of rebasing; this is fixed in this version.

Intel CI seems to be happy with those patches, so any more comments?

Looks like Xe changes are missing? (xe is now in drm-tip).

I also have some doubts about the naming "idle" vs "busy", since an
elaborate eviction mechanism would probably at some point want to check
for gpu idle vs gpu busy, and this might create some confusion moving
forward for people confusing busy as in memory overcommit with busy as
in gpu activity.

I can't immediately think of something better, though.


Yeah, I was wondering about that as well, especially since I want to add 
some more flags in the future, for example for when a bandwidth quota on 
how much memory can be moved in/out is exceeded.


Something like phase1, phase2, phase3 etc. would work, but that's not very 
descriptive either.


I'm going to take a look at XE as well, thanks for pointing that out.

Regards,
Christian.



/Thomas



Regards,
Christian.

[PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space

2024-01-08 Thread Christian König

Only convert it to ENOMEM in ttm_bo_validate.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..8c1eaa74fa21 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
goto error;
}
 
-   ret = -ENOMEM;
+   ret = -ENOSPC;
if (!type_found) {
pr_err(TTM_PFX "No compatible memory type found\n");
ret = -EINVAL;
@@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
return -EINVAL;
 
ret = ttm_bo_move_buffer(bo, placement, ctx);
+   /* For backward compatibility with userspace */
+   if (ret == -ENOSPC)
+   return -ENOMEM;
if (ret)
return ret;
 
-- 
2.34.1

[PATCH 5/5] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2024-01-08 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that VRAM, once evacuated for suspend, is never
filled up again unless the application is restarted.
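
The decision described above can be sketched in isolation as follows. This is
only an illustration under stated assumptions: the SKETCH_* constants and
sketch_gtt_place_flags() are hypothetical and their values are arbitrary, they
merely mirror the shape of the check this patch adds for the GTT placement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_DOMAIN_GTT	(1u << 1)
#define SKETCH_DOMAIN_VRAM	(1u << 2)
#define SKETCH_PL_FLAG_BUSY	(1u << 1)	/* use only as fallback */

/* Compute the flags for the GTT placement of a buffer.
 * @domain: domains the buffer may currently be placed in.
 * @preferred_domains: domains the buffer would ideally live in. */
static uint32_t sketch_gtt_place_flags(uint32_t domain,
				       uint32_t preferred_domains)
{
	uint32_t flags = 0;

	/* This helper only makes sense when GTT is an allowed domain. */
	assert(domain & SKETCH_DOMAIN_GTT);

	/* VRAM is both allowed and preferred: GTT is merely an alternative,
	 * so mark it as fallback and keep trying to fill up VRAM first. */
	if (domain & preferred_domains & SKETCH_DOMAIN_VRAM)
		flags |= SKETCH_PL_FLAG_BUSY;

	return flags;
}
```

A buffer that allows VRAM|GTT and prefers VRAM gets the fallback flag on its
GTT placement, while a GTT-only buffer does not.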

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index f110dfdc4feb..979cecf18f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_BUSY;
c++;
}
 
-- 
2.34.1

[PATCH 3/5] drm/ttm: replace busy placement with flags v5

2024-01-08 Thread Christian König

From: Somalapuram Amaranath 

Instead of a separate list of busy placements, add flags which indicate
that a placement should only be used when there is room or only if we need
to evict.
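
To make the flag semantics concrete, here is a small standalone sketch. It is
an assumption-laden illustration and not the TTM implementation: struct
sketch_place, the SKETCH_PL_FLAG_* values and sketch_pick_place() are
hypothetical, and the has_room field stands in for a real allocation attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define SKETCH_PL_FLAG_IDLE	(1u << 0)	/* use only while there is room */
#define SKETCH_PL_FLAG_BUSY	(1u << 1)	/* use only once we must evict */

struct sketch_place {
	unsigned int mem_type;
	unsigned int flags;
	bool has_room;		/* stand-in for "allocation would succeed" */
};

/* Return the index of the place to use, or -1 if none qualifies.
 * *needs_evict tells the caller whether eviction would be required. */
static int sketch_pick_place(const struct sketch_place *places, size_t n,
			     bool *needs_evict)
{
	size_t i;

	assert(places != NULL && needs_evict != NULL);

	/* Pass 1: take any place with room, skipping eviction-only ones. */
	for (i = 0; i < n; i++) {
		if (places[i].flags & SKETCH_PL_FLAG_BUSY)
			continue;
		if (places[i].has_room) {
			*needs_evict = false;
			return (int)i;
		}
	}

	/* Pass 2: we are willing to evict, so skip room-only places. */
	for (i = 0; i < n; i++) {
		if (places[i].flags & SKETCH_PL_FLAG_IDLE)
			continue;
		*needs_evict = true;
		return (int)i;
	}

	return -1;
}
```

With a single array of places, one flag per entry replaces the second list: a
VRAM entry marked room-only followed by an unflagged GTT entry means "use VRAM
if it has space, otherwise evict in GTT only".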

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip
v4: fix some typos pointed out by checkpatch
v5: cleanup some rebase problems with VMWGFX

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +---
 drivers/gpu/drm/drm_gem_vram_helper.c  |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 +--
 drivers/gpu/drm/loongson/lsdc_ttm.c|  2 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 59 +++--
 drivers/gpu/drm/nouveau/nouveau_bo.h   |  1 -
 drivers/gpu/drm/qxl/qxl_object.c   |  2 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  2 -
 drivers/gpu/drm/radeon/radeon_object.c |  2 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  8 +--
 drivers/gpu/drm/radeon/radeon_uvd.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo.c   | 21 ---
 drivers/gpu/drm/ttm/ttm_resource.c | 73 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  4 --
 include/drm/ttm/ttm_placement.h| 10 +--
 include/drm/ttm/ttm_resource.h |  8 +--
 18 files changed, 79 insertions(+), 172 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..aa0dd6dad068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 
placement->num_placement = c;
placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
 }
 
 /**
@@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
AMDGPU_GEM_DOMAIN_GTT);
 
/* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
 
r = ttm_bo_validate(bo, &abo->placement, &ctx);
if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..9a6a00b1af40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
/* Don't handle scatter gather BOs */
if (bo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
/* Object isn't an AMDGPU object so ignore */
if (!amdgpu_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
-   placement->busy_placement = &placements;
placement->num_placement = 1;
-   placement->num_busy_placement = 1;
return;
}
 
abo = ttm_to_amdgpu_bo(bo);
if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
case AMDGPU_PL_OA:
case AMDGPU_PL_DOORBELL:
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
 
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
} else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
   !(abo->flags & 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
   amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
AMDGPU_GEM_DOMAIN_CPU);
abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
abo->placements[0].lpfn = 0;
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.num_busy_placement = 1;
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;

[PATCH 4/5] drm/ttm: improve idle/busy handling v2

2024-01-08 Thread Christian König

Previously we would never try to move a BO back into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c   | 131 -
 drivers/gpu/drm/ttm/ttm_resource.c |  15 ++-
 include/drm/ttm/ttm_bo.h   |   3 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 6 files changed, 65 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index aa0dd6dad068..f110dfdc4feb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
(*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
}
r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
-&(*bo_ptr)->tbo.resource, &ctx);
+&(*bo_ptr)->tbo.resource, &ctx, false);
if (r)
goto error;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9a6a00b1af40..00da9a81cf6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
placements.mem_type = TTM_PL_TT;
placements.flags = bo->resource->placement;
 
-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true);
if (unlikely(r))
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index aa12bd5cfd17..17bfc252f76d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
hop_placement.placement = hop;
 
/* find space in the bounce domain */
-   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true);
if (ret)
return ret;
/* move to the bounce domain */
@@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
return ttm_bo_pipeline_gutting(bo);
}
 
-   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true);
if (ret) {
if (ret != -ERESTARTSYS) {
pr_err("Failed to find memory space for buffer 0x%p 
eviction\n",
@@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
  * ttm_bo_mem_space
  *
@@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
  * @placement: Proposed new placement for the buffer object.
  * @mem: A struct ttm_resource.
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
  *
  * Allocate memory space for the b

[PATCH 1/5] drm/vmwgfx: remove vmw_vram_gmr_placement

2024-01-08 Thread Christian König

Seems to be unused.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 28 --
 2 files changed, 29 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 3cd5090dedfc..12efecc17df6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -942,7 +942,6 @@ vmw_is_cursor_bypass3_enabled(const struct vmw_private *dev_priv)
 
 extern const size_t vmw_tt_size;
 extern struct ttm_placement vmw_vram_placement;
-extern struct ttm_placement vmw_vram_gmr_placement;
 extern struct ttm_placement vmw_sys_placement;
 extern struct ttm_device_funcs vmw_bo_driver;
 extern const struct vmw_sg_table *
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index af8562c95cc3..a84fffcef8e1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -43,13 +43,6 @@ static const struct ttm_place sys_placement_flags = {
.flags = 0
 };
 
-static const struct ttm_place gmr_placement_flags = {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = VMW_PL_GMR,
-   .flags = 0
-};
-
 struct ttm_placement vmw_vram_placement = {
.num_placement = 1,
.placement = &vram_placement_flags,
@@ -57,27 +50,6 @@ struct ttm_placement vmw_vram_placement = {
.busy_placement = &vram_placement_flags
 };
 
-static const struct ttm_place vram_gmr_placement_flags[] = {
-   {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = TTM_PL_VRAM,
-   .flags = 0
-   }, {
-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = VMW_PL_GMR,
-   .flags = 0
-   }
-};
-
-struct ttm_placement vmw_vram_gmr_placement = {
-   .num_placement = 2,
-   .placement = vram_gmr_placement_flags,
-   .num_busy_placement = 1,
-   .busy_placement = &gmr_placement_flags
-};
-
 struct ttm_placement vmw_sys_placement = {
.num_placement = 1,
.placement = &sys_placement_flags,
-- 
2.34.1

Rework TTMs busy handling

2024-01-08 Thread Christian König

Hi guys,

I've been trying for years now to make this functionality a bit more useful,
since we have had multiple reports that the behavior of drivers can be
suboptimal when multiple placements are given.

So basically instead of hacking around the TTM behavior in the driver
once more I've gone ahead and changed the idle/busy placement list
into idle/busy placement flags. This not only saves a bunch of code,
but also allows setting some placements as fallback which are used if
allocating from the preferred ones didn't work.

Zack pointed out that some removed VMWGFX code was brought back because
of rebasing, fixed in this version.

Intel CI seems to be happy with those patches, so any more comments?

Regards,
Christian.

Re: [PATCH 2/4] drm/ttm: replace busy placement with flags v4

2024-01-05 Thread Christian König


On 04.01.24 at 21:02, Zack Rusin wrote:

On Thu, Jan 4, 2024 at 10:05 AM Christian König
 wrote:

From: Somalapuram Amaranath 

Instead of a separate list of busy placements, add flags which indicate
that a placement should only be used when there is room or only if we need
to evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip
v4: fix some typos pointed out by checkpatch

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--
  drivers/gpu/drm/drm_gem_vram_helper.c  |  2 -
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 
  drivers/gpu/drm/loongson/lsdc_ttm.c|  2 -
  drivers/gpu/drm/nouveau/nouveau_bo.c   | 59 +
  drivers/gpu/drm/nouveau/nouveau_bo.h   |  1 -
  drivers/gpu/drm/qxl/qxl_object.c   |  2 -
  drivers/gpu/drm/qxl/qxl_ttm.c  |  2 -
  drivers/gpu/drm/radeon/radeon_object.c |  2 -
  drivers/gpu/drm/radeon/radeon_ttm.c|  8 +-
  drivers/gpu/drm/radeon/radeon_uvd.c|  1 -
  drivers/gpu/drm/ttm/ttm_bo.c   | 21 +++--
  drivers/gpu/drm/ttm/ttm_resource.c | 73 
  drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  2 -
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +-
  include/drm/ttm/ttm_placement.h| 10 ++-
  include/drm/ttm/ttm_resource.h |  8 +-
  18 files changed, 159 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..aa0dd6dad068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)

 placement->num_placement = c;
 placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
  }

  /**
@@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 AMDGPU_GEM_DOMAIN_GTT);

 /* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;

 r = ttm_bo_validate(bo, &abo->placement, &ctx);
 if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..9a6a00b1af40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 /* Don't handle scatter gather BOs */
 if (bo->type == ttm_bo_type_sg) {
 placement->num_placement = 0;
-   placement->num_busy_placement = 0;
 return;
 }

 /* Object isn't an AMDGPU object so ignore */
 if (!amdgpu_bo_is_amdgpu_bo(bo)) {
 placement->placement = &placements;
-   placement->busy_placement = &placements;
 placement->num_placement = 1;
-   placement->num_busy_placement = 1;
 return;
 }

 abo = ttm_to_amdgpu_bo(bo);
 if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
 placement->num_placement = 0;
-   placement->num_busy_placement = 0;
 return;
 }

@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 case AMDGPU_PL_OA:
 case AMDGPU_PL_DOORBELL:
 placement->num_placement = 0;
-   placement->num_busy_placement = 0;
 return;

 case TTM_PL_VRAM:
 if (!adev->mman.buffer_funcs_enabled) {
 /* Move to system memory */
 amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
 } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
!(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) 
&&
amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 AMDGPU_GEM_DOMAIN_CPU);
 abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
 abo->placements[0].lpfn = 0;
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.n

[PATCH 2/4] drm/ttm: replace busy placement with flags v4

2024-01-04 Thread Christian König

From: Somalapuram Amaranath 

Instead of a separate list of busy placements, add flags which indicate
that a placement should only be used when there is room or only if we need
to evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip
v4: fix some typos pointed out by checkpatch

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--
 drivers/gpu/drm/drm_gem_vram_helper.c  |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 
 drivers/gpu/drm/loongson/lsdc_ttm.c|  2 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 59 +
 drivers/gpu/drm/nouveau/nouveau_bo.h   |  1 -
 drivers/gpu/drm/qxl/qxl_object.c   |  2 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  2 -
 drivers/gpu/drm/radeon/radeon_object.c |  2 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  8 +-
 drivers/gpu/drm/radeon/radeon_uvd.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo.c   | 21 +++--
 drivers/gpu/drm/ttm/ttm_resource.c | 73 
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +-
 include/drm/ttm/ttm_placement.h| 10 ++-
 include/drm/ttm/ttm_resource.h |  8 +-
 18 files changed, 159 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..aa0dd6dad068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 
placement->num_placement = c;
placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
 }
 
 /**
@@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
AMDGPU_GEM_DOMAIN_GTT);
 
/* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
 
r = ttm_bo_validate(bo, &abo->placement, &ctx);
if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..9a6a00b1af40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
/* Don't handle scatter gather BOs */
if (bo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
/* Object isn't an AMDGPU object so ignore */
if (!amdgpu_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
-   placement->busy_placement = &placements;
placement->num_placement = 1;
-   placement->num_busy_placement = 1;
return;
}
 
abo = ttm_to_amdgpu_bo(bo);
if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
case AMDGPU_PL_OA:
case AMDGPU_PL_DOORBELL:
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
 
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
} else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
   !(abo->flags & 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
   amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
AMDGPU_GEM_DOMAIN_CPU);
abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
abo->placements[0].lpfn = 0;
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.num_busy_placement = 1;
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
} else {
/*

[PATCH 4/4] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2024-01-04 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that VRAM, once evacuated for suspend, is never
filled up again unless the application is restarted.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index f110dfdc4feb..979cecf18f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_BUSY;
c++;
}
 
-- 
2.34.1

[PATCH 3/4] drm/ttm: improve idle/busy handling v2

2024-01-04 Thread Christian König

Previously we would never try to move a BO back into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first and
then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.

v2: fix kerneldoc warning and coding style

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c   | 131 -
 drivers/gpu/drm/ttm/ttm_resource.c |  15 ++-
 include/drm/ttm/ttm_bo.h   |   3 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 6 files changed, 65 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index aa0dd6dad068..f110dfdc4feb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
(*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
}
r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
-&(*bo_ptr)->tbo.resource, &ctx);
+&(*bo_ptr)->tbo.resource, &ctx, false);
if (r)
goto error;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9a6a00b1af40..00da9a81cf6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
placements.mem_type = TTM_PL_TT;
placements.flags = bo->resource->placement;
 
-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true);
if (unlikely(r))
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index aa12bd5cfd17..17bfc252f76d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
hop_placement.placement = hop;
 
/* find space in the bounce domain */
-   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true);
if (ret)
return ret;
/* move to the bounce domain */
@@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
return ttm_bo_pipeline_gutting(bo);
}
 
-   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true);
if (ret) {
if (ret != -ERESTARTSYS) {
pr_err("Failed to find memory space for buffer 0x%p 
eviction\n",
@@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
  * ttm_bo_mem_space
  *
@@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
  * @placement: Proposed new placement for the buffer object.
  * @mem: A struct ttm_resource.
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
  *
  * Allocate memory space for the b

[PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space

2024-01-04 Thread Christian König

Only convert it to ENOMEM in ttm_bo_validate.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..8c1eaa74fa21 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
goto error;
}
 
-   ret = -ENOMEM;
+   ret = -ENOSPC;
if (!type_found) {
pr_err(TTM_PFX "No compatible memory type found\n");
ret = -EINVAL;
@@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
return -EINVAL;
 
ret = ttm_bo_move_buffer(bo, placement, ctx);
+   /* For backward compatibility with userspace */
+   if (ret == -ENOSPC)
+   return -ENOMEM;
if (ret)
return ret;
 
-- 
2.34.1

Rework TTMs busy handling

2024-01-04 Thread Christian König

Hi guys,

I've been trying for years now to make this functionality a bit more useful,
since we have had multiple reports that the behavior of drivers can be
suboptimal when multiple placements are given.

So basically instead of hacking around the TTM behavior in the driver
once more I've gone ahead and changed the idle/busy placement list
into idle/busy placement flags. This not only saves a bunch of code,
but also allows setting some placements as fallback which are used if
allocating from the preferred ones didn't work.

Intel CI seems to be happy with those patches, so any more comments?

Regards,
Christian.
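
A minimal sketch of the list-to-flags idea, assuming a simplified model:
instead of keeping two placement arrays, each placement carries a flag
saying in which pass it may be used. The PL_FLAG_* names, pl_place and the
helpers below are hypothetical and only mirror the idea, not the real TTM
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, loosely modelled on the idle/busy idea. */
#define PL_FLAG_IDLE (1u << 0) /* only usable while nothing must be evicted */
#define PL_FLAG_BUSY (1u << 1) /* only usable as fallback */

struct pl_place {
	int mem_type;
	unsigned int flags;
};

/* A placement without flags is usable in both passes. */
bool pl_usable(const struct pl_place *place, bool busy_pass)
{
	if (busy_pass)
		return !(place->flags & PL_FLAG_IDLE);
	return !(place->flags & PL_FLAG_BUSY);
}

/* Count how many placements of one list apply to the given pass. */
size_t pl_count_usable(const struct pl_place *places, size_t num,
		       bool busy_pass)
{
	size_t i, count = 0;

	for (i = 0; i < num; i++)
		if (pl_usable(&places[i], busy_pass))
			count++;
	return count;
}
```

With a single list, a driver marks a fallback by setting one flag instead
of building and maintaining a second array.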

Re: [PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space

2023-12-13 Thread Christian König

Before anybody wonders why no additional people are on CC: I just sent 
that out to get feedback from the CI systems.


Regards,
Christian.

On 13.12.23 at 15:42, Christian König wrote:

Only convert it to ENOMEM in ttm_bo_validate.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..8c1eaa74fa21 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
goto error;
}
  
-   ret = -ENOMEM;
+   ret = -ENOSPC;
if (!type_found) {
pr_err(TTM_PFX "No compatible memory type found\n");
ret = -EINVAL;
@@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
return -EINVAL;
  
ret = ttm_bo_move_buffer(bo, placement, ctx);
+   /* For backward compatibility with userspace */
+   if (ret == -ENOSPC)
+   return -ENOMEM;
if (ret)
return ret;

[PATCH 4/4] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2023-12-13 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that when VRAM was evacuated for suspend it's never
filled up again unless the application is restarted.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index f110dfdc4feb..979cecf18f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
AMDGPU_PL_PREEMPT : TTM_PL_TT;
places[c].flags = 0;
+   /*
+* When GTT is just an alternative to VRAM make sure that we
+* only use it as fallback and still try to fill up VRAM first.
+*/
+   if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
+   places[c].flags |= TTM_PL_FLAG_BUSY;
c++;
}
 
-- 
2.34.1
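
The rule added in this patch can be sketched outside the driver as
follows. This is an illustration under simplified assumptions: the DOM_*
constants, dom_place and dom_places_from_domain() are hypothetical
stand-ins for the amdgpu domain bits and TTM placements, and only the VRAM
and GTT domains are modelled.

```c
#include <stddef.h>

/* Hypothetical domain bits and fallback flag for this sketch only. */
#define DOM_GTT       (1u << 0)
#define DOM_VRAM      (1u << 1)
#define DOM_FLAG_BUSY (1u << 0) /* use this placement only as fallback */

enum dom_mem_type { DOM_MEM_VRAM, DOM_MEM_GTT };

struct dom_place {
	enum dom_mem_type mem_type;
	unsigned int flags;
};

/* Fill @places (room for two entries) and return how many were set. */
size_t dom_places_from_domain(unsigned int domain, unsigned int preferred,
			      struct dom_place *places)
{
	size_t c = 0;

	if (domain & DOM_VRAM) {
		places[c].mem_type = DOM_MEM_VRAM;
		places[c].flags = 0;
		c++;
	}

	if (domain & DOM_GTT) {
		places[c].mem_type = DOM_MEM_GTT;
		places[c].flags = 0;
		/* GTT is only an alternative to VRAM: mark it as fallback. */
		if (domain & preferred & DOM_VRAM)
			places[c].flags |= DOM_FLAG_BUSY;
		c++;
	}

	return c;
}
```

A buffer that is allowed only in GTT keeps an unflagged GTT placement, so
the fallback marking applies only to the VRAM|GTT case.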

[PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space

2023-12-13 Thread Christian König

Only convert it to ENOMEM in ttm_bo_validate.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..8c1eaa74fa21 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
goto error;
}
 
-   ret = -ENOMEM;
+   ret = -ENOSPC;
if (!type_found) {
pr_err(TTM_PFX "No compatible memory type found\n");
ret = -EINVAL;
@@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
return -EINVAL;
 
ret = ttm_bo_move_buffer(bo, placement, ctx);
+   /* For backward compatibility with userspace */
+   if (ret == -ENOSPC)
+   return -ENOMEM;
if (ret)
return ret;
 
-- 
2.34.1

[PATCH 3/4] drm/ttm: improve idle/busy handling

2023-12-13 Thread Christian König

Previously we would never try to move a BO into the preferred placements
once it had landed in a busy placement, since those were considered
compatible.

Rework the whole handling and finally unify the idle and busy handling.
ttm_bo_validate() is now responsible for trying the idle placement first
and then using the busy placement if that didn't work.

The drawback is that we now always try the idle placement first for each
validation, which might cause some additional CPU overhead on overcommit.
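
The control flow described above can be sketched as a two-pass loop. This
is a toy model under stated assumptions, not the TTM implementation:
ev_pool, ev_alloc() and ev_validate() are hypothetical, memory is counted
in abstract units, and placement flags are left out so that only the
"first without eviction, then with eviction" ordering is shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory pool: free units plus units eviction could free. */
struct ev_pool {
	size_t free;
	size_t evictable;
};

/* Take @size units; with @force_space, evict one unit at a time first. */
int ev_alloc(struct ev_pool *pool, size_t size, bool force_space)
{
	while (size > pool->free) {
		if (!force_space || pool->evictable == 0)
			return -ENOSPC;
		pool->evictable--;
		pool->free++;
	}
	pool->free -= size;
	return 0;
}

/* Pass 0 never evicts; pass 1 retries every pool with eviction allowed. */
int ev_validate(struct ev_pool *pools, size_t num, size_t size,
		size_t *chosen)
{
	int pass;
	size_t i;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < num; i++) {
			int ret = ev_alloc(&pools[i], size, pass == 1);

			if (!ret) {
				*chosen = i;
				return 0;
			}
			if (ret != -ENOSPC)
				return ret;
		}
	}
	return -ENOSPC;
}
```

The extra CPU overhead mentioned in the commit message corresponds to
pass 0 being walked on every validation, even when it cannot succeed.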

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c   | 131 -
 drivers/gpu/drm/ttm/ttm_resource.c |  14 ++-
 include/drm/ttm/ttm_bo.h   |   3 +-
 include/drm/ttm/ttm_resource.h |   3 +-
 6 files changed, 64 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index aa0dd6dad068..f110dfdc4feb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev,
(*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT;
}
r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement,
-&(*bo_ptr)->tbo.resource, &ctx);
+&(*bo_ptr)->tbo.resource, &ctx, false);
if (r)
goto error;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9a6a00b1af40..00da9a81cf6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
placements.mem_type = TTM_PL_TT;
placements.flags = bo->resource->placement;
 
-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true);
if (unlikely(r))
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index aa12bd5cfd17..17bfc252f76d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct 
ttm_buffer_object *bo,
hop_placement.placement = hop;
 
/* find space in the bounce domain */
-   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true);
if (ret)
return ret;
/* move to the bounce domain */
@@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
return ttm_bo_pipeline_gutting(bo);
}
 
-   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
+   ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true);
if (ret) {
if (ret != -ERESTARTSYS) {
pr_err("Failed to find memory space for buffer 0x%p 
eviction\n",
@@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object 
*bo,
return ret;
 }
 
-/*
- * Repeatedly evict memory from the LRU for @mem_type until we create enough
- * space, or we've evicted everything and there isn't enough space.
- */
-static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
- const struct ttm_place *place,
- struct ttm_resource **mem,
- struct ttm_operation_ctx *ctx)
-{
-   struct ttm_device *bdev = bo->bdev;
-   struct ttm_resource_manager *man;
-   struct ww_acquire_ctx *ticket;
-   int ret;
-
-   man = ttm_manager_type(bdev, place->mem_type);
-   ticket = dma_resv_locking_ctx(bo->base.resv);
-   do {
-   ret = ttm_resource_alloc(bo, place, mem);
-   if (likely(!ret))
-   break;
-   if (unlikely(ret != -ENOSPC))
-   return ret;
-   ret = ttm_mem_evict_first(bdev, man, place, ctx,
- ticket);
-   if (unlikely(ret != 0))
-   return ret;
-   } while (1);
-
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
-}
-
 /**
  * ttm_bo_mem_space
  *
@@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
  * @placement: Proposed new placement for the buffer object.
  * @mem: A struct ttm_resource.
  * @ctx: if and how to sleep, lock buffers and alloc memory
+ * @force_space: If we should evict buffers to force space
  *
  * Allocate memory space for the buffer object pointed to by @bo, using
  * th

[PATCH 2/4] drm/ttm: replace busy placement with flags v3

2023-12-13 Thread Christian König

From: Somalapuram Amaranath 

Instead of a separate list of busy placements, add flags which indicate
that a placement should only be used when there is room or if we need to
evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--
 drivers/gpu/drm/drm_gem_vram_helper.c  |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 
 drivers/gpu/drm/loongson/lsdc_ttm.c|  2 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   | 59 +
 drivers/gpu/drm/nouveau/nouveau_bo.h   |  1 -
 drivers/gpu/drm/qxl/qxl_object.c   |  2 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  2 -
 drivers/gpu/drm/radeon/radeon_object.c |  2 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  8 +-
 drivers/gpu/drm/radeon/radeon_uvd.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo.c   | 21 +++--
 drivers/gpu/drm/ttm/ttm_resource.c | 73 
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  3 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +-
 include/drm/ttm/ttm_placement.h| 10 ++-
 include/drm/ttm/ttm_resource.h |  8 +-
 18 files changed, 160 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..aa0dd6dad068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
 
placement->num_placement = c;
placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
 }
 
 /**
@@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
AMDGPU_GEM_DOMAIN_GTT);
 
/* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
 
r = ttm_bo_validate(bo, &abo->placement, &ctx);
if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..9a6a00b1af40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
/* Don't handle scatter gather BOs */
if (bo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
/* Object isn't an AMDGPU object so ignore */
if (!amdgpu_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
-   placement->busy_placement = &placements;
placement->num_placement = 1;
-   placement->num_busy_placement = 1;
return;
}
 
abo = ttm_to_amdgpu_bo(bo);
if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
case AMDGPU_PL_OA:
case AMDGPU_PL_DOORBELL:
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
 
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
} else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
   !(abo->flags & 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
   amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
AMDGPU_GEM_DOMAIN_CPU);
abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
abo->placements[0].lpfn = 0;
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.num_busy_placement = 1;
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
} else {
/* Move to GTT memory */
a

Re: [RFC] drm/i915: Allow dmabuf mmap forwarding

2023-12-13 Thread Christian König


On 13.12.23 at 12:46, Tvrtko Ursulin wrote:


Hi,

On 12/12/2023 14:10, Christian König wrote:

Hi Tvrtko,

Thanks for pointing this mail out once more, I've totally missed it.


That's okay, if it was really urgent I would have re-raised the thread 
earlier. :) As it stands so far it is only about acceptance test 
suites failing and no known real use cases affected.



On 12.12.23 at 11:37, Tvrtko Ursulin wrote:


On 25/09/2023 14:16, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Allow mmap forwarding for imported buffers in order to allow minigbm mmap
to work on aperture-less platforms such as Meteorlake.

So far i915 did not allow mmap on imported buffers but from minigbm
perspective that worked because of the DRM_IOCTL_I915_GEM_MMAP_GTT
fallback would then be attempted, and would be successful.

This stops working on Meteorlake since there is no aperture.

Allow i915 to mmap imported buffers using forwarding via dma_buf_mmap(),
which allows the primary minigbm path of DRM_IOCTL_I915_GEM_MMAP_OFFSET /
I915_MMAP_OFFSET_WB to work.

Signed-off-by: Tvrtko Ursulin 
Cc: Daniel Vetter 
Cc: Christian König 
Cc: Matthew Auld 
Cc: Nirmoy Das 
---
1)
It is unclear to me if any real userspace depends on this, but there are
certainly compliance suites which fail.


Well that is actually intentional, but see below.



2)
It is also a bit unclear to me if dma_buf_mmap() is exactly intended for
this kind of use. It seems that it is, but I also found some old mailing
list discussions suggesting there might be some unresolved questions
around VMA revocation.


I actually solved those a few years back by introducing the 
vma_set_file() function which standardized the dance necessary for 
the dma_buf_mmap() function.




1 + 2 = RFC for now.

Daniel and Christian were involved in 2) in the past so comments would
be appreciated.


Any comments on this one? I don't have all the historical knowledge 
of when this was maybe attempted before and what problems were hit, 
or something. So would there be downsides or it is fine to forward it.


It works technically inside the kernel and Thomas Zimmermann suggested 
a patch set which made it possible to use for all DRM drivers.


But IIRC this patch set was rejected with the rationale that while 
doing an mmap() on an imported DMA-buf works, when userspace actually 
does this then there is a bug in userspace. The UMD doesn't seem to 
be aware of the fact that the buffer is imported and so, for example, 
needs to call dma_buf_begin_cpu_access() and dma_buf_end_cpu_access().


UMDs can trivially work around this by doing the mmap() on the 
DMA-buf file descriptor instead (potentially after re-exporting it), 
but the kernel really shouldn't help hide userspace bugs.


Hm right, however why does drm_gem_shmem_mmap:

if (obj->import_attach) {
    ret = dma_buf_mmap(obj->dma_buf, vma, 0);


Honestly I have absolutely no idea.

Isn't that allowing drivers which use the helper to forward to 
dma_buf_mmap?


Yes, Daniel mentioned that some drivers did this before we found that 
it's actually not a good idea. It could be that this code piece was 
meant with that and we only allow it to avoid breaking UAPI.


Nevertheless I think we should add documentation for this.

Maybe I am getting lost in the forest of callbacks in this area.. 
Because it is supposed to be about shmem objects, but drivers which 
use the helper and rely on common prime import look and also use 
drm_gem_shmem_prime_import_sg_table can get there.


I don't fully understand it either offhand.

Regards,
Christian.



Regards,

Tvrtko



Test-with: 20230925131539.32743-1-tvrtko.ursu...@linux.intel.com

---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  | 78 
+++

  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
  2 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index aa4d842d4c5a..78c84c0a8b08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -5,6 +5,7 @@
   */
    #include 
+#include 
  #include 
  #include 
  #include 
@@ -664,6 +665,7 @@ insert_mmo(struct drm_i915_gem_object *obj, 
struct i915_mmap_offset *mmo)

  static struct i915_mmap_offset *
  mmap_offset_attach(struct drm_i915_gem_object *obj,
 enum i915_mmap_type mmap_type,
+   bool forward_mmap,
 struct drm_file *file)
  {
  struct drm_i915_private *i915 = to_i915(obj->base.dev);
@@ -682,6 +684,7 @@ mmap_offset_attach(struct drm_i915_gem_object 
*obj,

    mmo->obj = obj;
  mmo->mmap_type = mmap_type;
+    mmo->forward_mmap = forward_mmap;
  drm_vma_node_reset(&mmo->vma_node);
    err = drm_vma_offset_add(obj->base.dev->vma_offset_manager,
@@ -714,12 +717,25 @@ mmap_offset_attach(struct drm_i915_gem_object 
*obj,

  return ERR_PTR(err);

Re: [RFC] drm/i915: Allow dmabuf mmap forwarding

2023-12-12 Thread Christian König


Hi Tvrtko,

Thanks for pointing this mail out once more, I've totally missed it.

On 12.12.23 at 11:37, Tvrtko Ursulin wrote:


On 25/09/2023 14:16, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Allow mmap forwarding for imported buffers in order to allow minigbm mmap
to work on aperture-less platforms such as Meteorlake.

So far i915 did not allow mmap on imported buffers but from minigbm
perspective that worked because of the DRM_IOCTL_I915_GEM_MMAP_GTT
fallback would then be attempted, and would be successful.

This stops working on Meteorlake since there is no aperture.

Allow i915 to mmap imported buffers using forwarding via dma_buf_mmap(),
which allows the primary minigbm path of DRM_IOCTL_I915_GEM_MMAP_OFFSET /
I915_MMAP_OFFSET_WB to work.

Signed-off-by: Tvrtko Ursulin 
Cc: Daniel Vetter 
Cc: Christian König 
Cc: Matthew Auld 
Cc: Nirmoy Das 
---
1)
It is unclear to me if any real userspace depends on this, but there are
certainly compliance suites which fail.


Well that is actually intentional, but see below.



2)
It is also a bit unclear to me if dma_buf_mmap() is exactly intended for
this kind of use. It seems that it is, but I also found some old mailing
list discussions suggesting there might be some unresolved questions
around VMA revocation.


I actually solved those a few years back by introducing the 
vma_set_file() function which standardized the dance necessary for the 
dma_buf_mmap() function.




1 + 2 = RFC for now.

Daniel and Christian were involved in 2) in the past so comments would
be appreciated.


Any comments on this one? I don't have all the historical knowledge of 
when this was maybe attempted before and what problems were hit, or 
something. So would there be downsides or it is fine to forward it.


It works technically inside the kernel and Thomas Zimmermann suggested a 
patch set which made it possible to use for all DRM drivers.


But IIRC this patch set was rejected with the rationale that while doing 
an mmap() on an imported DMA-buf works, when userspace actually does this 
then there is a bug in userspace. The UMD doesn't seem to be aware of 
the fact that the buffer is imported and so, for example, needs to call 
dma_buf_begin_cpu_access() and dma_buf_end_cpu_access().


UMDs can trivially work around this by doing the mmap() on the DMA-buf 
file descriptor instead (potentially after re-exporting it), but the 
kernel really shouldn't help hide userspace bugs.
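
For illustration, a hedged sketch of what that workaround could look like
in a UMD: map the DMA-buf file descriptor itself and bracket the CPU
access with DMA_BUF_IOCTL_SYNC from <linux/dma-buf.h>, which is the
userspace entry point to the begin/end CPU access handling. The function
name dmabuf_cpu_fill() and its parameters are hypothetical, and the caller
is assumed to already own a valid DMA-buf fd of at least @size bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

/* Fill a DMA-buf through a CPU mapping of its own file descriptor. */
int dmabuf_cpu_fill(int dmabuf_fd, size_t size, unsigned char value)
{
	struct dma_buf_sync sync = { 0 };
	void *map;
	int ret = 0;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   dmabuf_fd, 0);
	if (map == MAP_FAILED)
		return -errno;

	/* Tell the exporter that CPU writes are about to start. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) {
		ret = -errno;
		goto out_unmap;
	}

	memset(map, value, size);

	/* Tell the exporter that the CPU access is finished. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		ret = -errno;

out_unmap:
	munmap(map, size);
	return ret;
}
```

This sketch does not retry the ioctl on EINTR or EAGAIN; production code
would probably want to.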


Regards,
Christian.



Regards,

Tvrtko



Test-with: 20230925131539.32743-1-tvrtko.ursu...@linux.intel.com

---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  | 78 +++
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
  2 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index aa4d842d4c5a..78c84c0a8b08 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -5,6 +5,7 @@
   */
    #include 
+#include 
  #include 
  #include 
  #include 
@@ -664,6 +665,7 @@ insert_mmo(struct drm_i915_gem_object *obj, 
struct i915_mmap_offset *mmo)

  static struct i915_mmap_offset *
  mmap_offset_attach(struct drm_i915_gem_object *obj,
 enum i915_mmap_type mmap_type,
+   bool forward_mmap,
 struct drm_file *file)
  {
  struct drm_i915_private *i915 = to_i915(obj->base.dev);
@@ -682,6 +684,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
    mmo->obj = obj;
  mmo->mmap_type = mmap_type;
+    mmo->forward_mmap = forward_mmap;
  drm_vma_node_reset(&mmo->vma_node);
    err = drm_vma_offset_add(obj->base.dev->vma_offset_manager,
@@ -714,12 +717,25 @@ mmap_offset_attach(struct drm_i915_gem_object 
*obj,

  return ERR_PTR(err);
  }
  +static bool
+should_forward_mmap(struct drm_i915_gem_object *obj,
+    enum i915_mmap_type mmap_type)
+{
+    if (!obj->base.import_attach)
+    return false;
+
+    return mmap_type == I915_MMAP_TYPE_WB ||
+   mmap_type == I915_MMAP_TYPE_WC ||
+   mmap_type == I915_MMAP_TYPE_UC;
+}
+
  static int
  __assign_mmap_offset(struct drm_i915_gem_object *obj,
   enum i915_mmap_type mmap_type,
   u64 *offset, struct drm_file *file)
  {
  struct i915_mmap_offset *mmo;
+    bool should_forward;
    if (i915_gem_object_never_mmap(obj))
  return -ENODEV;
@@ -735,12 +751,15 @@ __assign_mmap_offset(struct drm_i915_gem_object 
*obj,

  if (mmap_type == I915_MMAP_TYPE_FIXED)
  return -ENODEV;
  +    should_forward = should_forward_mmap(obj, mmap_type);
+
  if (mmap_type != I915_MMAP_TYPE_GTT &&
  !i915_gem_object_has_struct_page(obj) &&
-    !i915_gem_object_has_iomem(obj))
+    !i915_gem_object_has_iomem(obj) &&
+    !should_forward)
  return -ENODEV;
  -    mmo = mmap_offset_attach(obj, mmap_t

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-12-04 Thread Christian König


On 04.12.23 at 00:32, Alistair Popple wrote:

Christian König  writes:


On 01.12.23 at 06:48, Zeng, Oak wrote:

[SNIP]
Besides memory eviction/oversubscription, there are a few other pain points 
when I use hmm:

1) hmm doesn't support file-backed memory, so it is hard to share
memory b/t processes in a gpu environment. You mentioned you have a
plan... How hard is it to support file-backed memory in your approach?

As hard as it is to support it through HMM. That's what I meant when I
said that this approach doesn't integrate well: as far as I know the
problem isn't inside HMM or any other solution but rather in the file
system layer.

In what way does HMM not support file-backed memory? I was under the
impression that at least hmm_range_fault() does.


Oh, well file-backed memory is indeed supported by HMM. IIRC KFD 
actually allows this for the SVM implementation.


It's just that the way the file system layer (for example) does 
writeback absolutely doesn't fit well with how GPUs and other 
acceleration devices work.


The general assumption in the kernel seems to be that page faults and 
preemption are extremely cheap. So things like copy-on-write are used 
quite extensively.


For a CPU this basically means you just need to context switch into the 
kernel once to get the new address of a page into your PTEs on write, 
while for acceleration devices this always requires a complete CPU round 
trip for each initial write access to a 4k page. The performance impact 
is just horrible.
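
To put a rough number on that, here is a small arithmetic sketch. It
assumes, purely for illustration, that every page of a buffer takes
exactly one first-touch write fault; the helper name and the 1 GiB buffer
size in the note below are made up.

```c
#include <stdint.h>

/* Number of first-touch write faults needed to cover @bytes. */
uint64_t cow_fault_count(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

Under that assumption a 1 GiB buffer needs 262144 round trips with 4 KiB
pages, but only 512 with 2 MiB pages.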


Regards,
Christian.








  - Alistair


Regards,
Christian.


2) virtual address range based memory attribute/hint: with hmadvise,
where do you save the memory attribute of a virtual address range? Do
you need to extend vm_area_struct to save it? With hmm, we have to
maintain such information in the driver. This ends up with pretty
complicated logic to split/merge those address ranges. I know core mm
has similar logic to split/merge vma...

Oak



-Weixi

-----Original Message-----
From: Christian König
Sent: Thursday, November 30, 2023 4:28 PM
To: Zeng, Oak; Christian König
; zhuweixi; linux-
m...@kvack.org;linux-ker...@vger.kernel.org;a...@linux-foundation.org;
Danilo Krummrich; Dave Airlie; Daniel
Vetter
Cc:intel-gvt-...@lists.freedesktop.org;rcampb...@nvidia.com;
mhairgr...@nvidia.com;j...@nvidia.com;weixi@openeuler.sh;
jhubb...@nvidia.com;intel-gfx@lists.freedesktop.org;apop...@nvidia.com;
xinhui@amd.com;amd-...@lists.freedesktop.org;
tvrtko.ursu...@linux.intel.com;ogab...@kernel.org;jgli...@redhat.com; dri-
de...@lists.freedesktop.org;z...@nvidia.com; Vivi, Rodrigo
;alexander.deuc...@amd.com;leo...@nvidia.com;
felix.kuehl...@amd.com; Wang, Zhi A;
mgor...@suse.de
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory
management) for external memory devices

Hi Oak,

yeah, #4 is indeed a really good point and I think Felix will agree with
that as well.

HMM is basically still missing a way to advise device attributes for the CPU
address space. Both migration strategy as well as device specific information
(like cache preferences) fall into this category.

Since there is a device specific component in those attributes as well I think
device specific IOCTLs still make sense to update them, but HMM should offer
the functionality to manage and store that information.

Split and merge of VMAs only becomes a problem if you attach that information
to VMAs; if you keep it completely separate then that doesn't become an
issue either. The downside of this approach is that you don't get automatically
extending attribute ranges for growing VMAs, for example.
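
A minimal sketch of what "completely separate" could mean, assuming a
sorted table of non-overlapping address ranges that is looked up
independently of any VMA. The attr_range struct and attr_lookup() are
hypothetical and not part of HMM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute record, kept outside of any VMA. */
struct attr_range {
	uint64_t start;    /* inclusive */
	uint64_t end;      /* exclusive */
	unsigned int attr; /* e.g. a cache or migration hint */
};

/* Binary search in a table sorted by @start with no overlapping ranges. */
const struct attr_range *attr_lookup(const struct attr_range *tbl,
				     size_t num, uint64_t addr)
{
	size_t lo = 0, hi = num;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tbl[mid].start)
			hi = mid;
		else if (addr >= tbl[mid].end)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL; /* no attribute recorded for this address */
}
```

An address just past the end of a recorded range returns NULL here, which
is the downside mentioned above: a growing VMA does not extend the
attribute range on its own.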

Regards,
Christian.

On 29.11.23 at 23:23, Zeng, Oak wrote:

Hi Weixi,

Even though Christian has listed reasons rejecting this proposal (yes they are
very reasonable to me), I would open my mind and further explore the possibility
here. Since the current GPU driver uses a hmm based implementation (AMD and
NV has done this; At Intel we are catching up), I want to explore how much we
can benefit from the proposed approach and how your approach can solve some
pain points of our development. So basically what I am questioning here is: what
is the advantage of your approach against hmm.

To implement a UVM (unified virtual address space b/t cpu and gpu device),
with hmm, driver essentially need to implement below functions:

1. device page table update. Your approach requires the same because
this is device specific codes

2. Some migration functions to migrate memory b/t system memory and GPU
local memory. My understanding is, even though you generalized this a bit, such
as modified cpu page fault path, provided "general" gm_dev_fault handler... but
device driver still need to provide migration functions because migration
functions have to be device specific (i.e., using device dma/copy engine for
performance purpose). Right?

3. GPU physical memory management, this part is now in drm/buddy, shared

by all drivers. I thin

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-12-01 Thread Christian König


On 01.12.23 at 06:48, Zeng, Oak wrote:

[SNIP]

   3. MMU notifiers register hooks at certain core MM events, while GMEM
declares basic functions and internally invokes them. GMEM requires less from
the driver side -- no need to understand what core MM behaves at certain MMU
events. GMEM also expects fewer bugs than MMU notifiers: implementing basic
operations with standard declarations vs. implementing whatever random device
MM logic in MMU notifiers.

This seems true to me. I feel the mmu notifier thing, especially the 
synchronization/lock design (those sequence numbers, interacting with driver 
lock, and the mmap lock) are very complicated. I indeed spent time to 
understand the specification documented in hmm.rst...

Your approach seems better.


I have to agree on that as well. HMM/MMU notifiers are developed with 
exposing MM functionality in mind instead of trying to fulfill driver 
requirements.


But this originated not in HMM/MMU notifiers, rather it was a 
requirement to not change the CPU side of the MM code too much.


So when you can get the acknowledgement to make changes to the CPU side 
of the MM code to better handle device driver requirements then I'm 
totally in favor of this.


It's just that I don't think the approach of starting with a new 
framework/idea will help with that. Instead rather try to improve the 
existing functionality.



   5. GMEM has been demonstrated to allow device memory oversubscription (a
GMEM-based 32GB NPU card can run a GPT model oversubscribing 500GB host
DDR), while drivers using HMM/MMU notifier must implement this logic one by
one. I will submit this part in a future RFC patch.

When device memory is oversubscribed, do you call a driver callback function to 
evict device memory to system memory? Or just cpu copy? Copy with device's fast 
copy engine is faster.

I can see even though with both approaches we need to implement a driver copy 
function, with your approach, the driver logic can be simplified. With today's 
drm/ttm, I do see the logic in the memory eviction area is very complicated. 
Those eviction fence (some call it suspend fence), dma-fence enable 
signalling... very complicated to me.

Essentially evict device memory to system memory is nothing different from 
evict system memory to disk... so if your approach can leverage some linux core 
mm eviction logic, I do see it can simplify things here...


We actually already do this in TTM as well through the MM shrinkers.

It's just that it's an intentional design decision to make the whole 
thing asynchronous using dma_fence etc... That's why you have this 
complexity in there.



I want to reiterate that GMEM's shared address space support is a bonus result,
not a main contribution... It was done because it was not difficult to implement
internal CPU-device coordination mechanism when core MM is extended by
GMEM to support devices.

Besides memory eviction/oversubscription, there are a few other pain points 
when I use hmm:

1) hmm doesn't support file-backed memory, so it is hard to share memory b/t 
processes in a gpu environment. You mentioned you have a plan... How hard is it 
to support file-backed memory in your approach?


As hard as it is to support it through HMM. That's what I meant when I 
said that this approach doesn't integrate well: as far as I know the 
problem isn't inside HMM or any other solution but rather in the file 
system layer.


Regards,
Christian.


2) virtual address range based memory attribute/hint: with hmadvise, where do 
you save the memory attribute of a virtual address range? Do you need to extend 
vm_area_struct to save it? With hmm, we have to maintain such information in 
the driver. This ends up with pretty complicated logic to split/merge those 
address ranges. I know core mm has similar logic to split/merge vma...

Oak



-Weixi

-----Original Message-----
From: Christian König
Sent: Thursday, November 30, 2023 4:28 PM
To: Zeng, Oak; Christian König
; zhuweixi; linux-
m...@kvack.org;linux-ker...@vger.kernel.org;a...@linux-foundation.org;
Danilo Krummrich; Dave Airlie; Daniel
Vetter
Cc:intel-gvt-...@lists.freedesktop.org;rcampb...@nvidia.com;
mhairgr...@nvidia.com;j...@nvidia.com;weixi@openeuler.sh;
jhubb...@nvidia.com;intel-gfx@lists.freedesktop.org;apop...@nvidia.com;
xinhui@amd.com;amd-...@lists.freedesktop.org;
tvrtko.ursu...@linux.intel.com;ogab...@kernel.org;jgli...@redhat.com; dri-
de...@lists.freedesktop.org;z...@nvidia.com; Vivi, Rodrigo
;alexander.deuc...@amd.com;leo...@nvidia.com;
felix.kuehl...@amd.com; Wang, Zhi A;
mgor...@suse.de
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory
management) for external memory devices

Hi Oak,

yeah, #4 is indeed a really good point and I think Felix will agree to that as 
well.

HMM is basically still missing a way to advise device attributes for the CPU
address space. Both migration strategy as well as device specific information 
(like
cac

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-30 Thread Christian König

in Linux..."
To be honest, not using a logical page table for anonymous memory is why Linux THP 
fails compared with FreeBSD's superpage, but I am not going to elaborate on it here. But 
yes, I am looking at merging struct vm_object->logical_page_table with struct 
address_space->i_pages. This will provide natural support for devices 
oversubscribing both host DRAM and disks. As explained in my cover letter, struct 
vm_object borrows FreeBSD's VM design -- it provides a unified abstraction layer for 
anonymous memory, file-backed memory, etc.


I'm not that deep into this stuff, so leaving this to the experts on 
FreeBSD.



3. "Requirements to CPU address space management and device address space management 
are just massively different. For example huge and giant pages are a must have for modern 
devices..."
I think you are asking two questions. First, is VA space a problem?


No, this is about something completely different.


GMEM assumes that device VA space should be covered by CPU VA space (sorry 
i386), ...

[SNIP]

I'm removing this because you were talking about something different 
than what I meant.


I will try to explain the background with an example outside of machine 
learning and compute, since this framework should be applicable to every 
use case and not be limited to those. Otherwise Linux would sooner or 
later just be applicable to only those use cases.


So let's take a look at how modern games use a GPU for example. On 
startup a rather large part of the GPU address space is allocated, for 
example 64GiB. Then the necessary resources (images, textures, vertices, 
shaders etc.) are loaded into separate buffer objects.


Those resources are then mapped into the allocated address space on a page by 
page basis. So you basically don't have large VMAs which cover one 
resource, but rather the page tables are used as a remapping table 
into the available resources. This increases the number of virtual 
mappings drastically; it's kind of comparable to how an anon_vma works 
inside a VMA on Linux.
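
As a rough illustration of the remapping-table idea above (a minimal sketch, 
not how any real driver stores its page tables; struct remap_table, 
remap_bind(), remap_count_mappings() and the 16-page range are made up for 
this example), each virtual page carries its own binding to a page of some 
buffer object, so one resource can end up as many small mappings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VA_PAGES 16          /* tiny illustrative address range */
#define UNMAPPED UINT32_MAX  /* marker for "no resource bound" */

/* One entry per virtual page: which resource backs it and which page of it. */
struct remap_entry {
	uint32_t resource;   /* hypothetical buffer object id */
	uint32_t page;       /* page index inside that resource */
};

struct remap_table {
	struct remap_entry entries[VA_PAGES];
};

static void remap_init(struct remap_table *t)
{
	for (size_t i = 0; i < VA_PAGES; i++) {
		t->entries[i].resource = UNMAPPED;
		t->entries[i].page = 0;
	}
}

/* Bind a single virtual page to one page of a resource. */
static int remap_bind(struct remap_table *t, size_t va_page,
		      uint32_t resource, uint32_t page)
{
	assert(t != NULL);
	if (va_page >= VA_PAGES)
		return -1;
	t->entries[va_page].resource = resource;
	t->entries[va_page].page = page;
	return 0;
}

/* Count how many distinct contiguous mappings the table contains. */
static size_t remap_count_mappings(const struct remap_table *t)
{
	size_t count = 0;

	for (size_t i = 0; i < VA_PAGES; i++) {
		const struct remap_entry *e = &t->entries[i];

		if (e->resource == UNMAPPED)
			continue;
		/* A new mapping starts unless this page continues the previous one. */
		if (i == 0 ||
		    t->entries[i - 1].resource != e->resource ||
		    t->entries[i - 1].page + 1 != e->page)
			count++;
	}
	return count;
}
```

Interleaving pages of two resources in neighbouring virtual pages already 
yields one mapping per page in this model, which is the effect described 
above.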


Those mappings also are not set up at start and then used throughout the 
whole lifetime of the process, but rather done very dynamically, 
sometimes resulting in thousands of mapping operations per second.


In addition to that, devices have page table features which CPUs don't 
have. These range from support for partially resident textures to flags 
controlling how caching and dynamic color space compression are done.


So the mappings contain tons of device specific information and it's 
most likely not even possible to handle all of this with a device 
independent mmap() call.



4. "The argument that a shared memory management leads to less bugs has also 
absolutely not been proven true. Instead we literally spent months if not years hunting down 
bugs which resulted from interaction between CPU and devices."
This is another case supporting GMEM. Don't developers want to let GMEM handle 
the CPU-device interaction so that they can waive months of debugging cost?


No, we already have HMM for that.

Regards,
Christian.



PS, hmadvise() is based on the idea of Nvidia's cudaMemAdvise() which provides 
abundant and useful memory policies. HMM extended mbind() instead.

-Weixi

-Original Message-
From: Christian König 
Sent: Wednesday, November 29, 2023 11:22 PM
To: zhuweixi ; Dave Airlie 
Cc: linux...@kvack.org; linux-ker...@vger.kernel.org; 
a...@linux-foundation.org; weixi@openeuler.sh; mgor...@suse.de; 
jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; 
apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; 
alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; 
felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; 
j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; 
zhi.a.w...@intel.com; intel-gvt-...@lists.freedesktop.org; 
intel-gfx@lists.freedesktop.org; jani.nik...@linux.intel.com; 
joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; 
tvrtko.ursu...@linux.intel.com
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

Am 29.11.23 um 09:27 schrieb zhuweixi:

Glad to hear that more sharable code is desirable.
IMHO, for a common MM subsystem, it is more beneficial for GMEM to
extend core MM instead of building a separate one.

As stated in the beginning of my RFC letter, MM systems are large and
similar. Even a sophisticated one like Linux MM that has evolved over
decades still suffers from an increasing number of bugs[1]. So,
directly extending core MM to support devices not only avoids opening
a new box of bugs, but also allows the community to concentrate on
maintaining one single MM system. On the other side, GMEM does no harm
to core MM if a CPU process is not attached to device contexts.

@Christian, could you provide more information on what AMD proposed
with KFD and why it was r

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-30 Thread Christian König


Hi Oak,

yeah, #4 is indeed a really good point and I think Felix will agree to 
that as well.


HMM is basically still missing a way to advise device attributes for the 
CPU address space. Both migration strategy as well as device specific 
information (like cache preferences) fall into this category.


Since there is a device specific component in those attributes as well I 
think device specific IOCTLs still make sense to update them, but HMM 
should offer the functionality to manage and store that information.


Split and merge of VMAs only become a problem if you attach that 
information to VMAs; if you keep it completely separate then that 
doesn't become an issue either. The downside of this approach is that 
you don't get automatically extending attribute ranges for growing VMAs, 
for example.
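
To make the "keep them completely separate" variant concrete, here is a 
minimal sketch of an attribute store that lives beside the VMAs rather than 
inside them. The names (struct attr_store, attr_set(), attr_get()) and the 
fixed-size array are assumptions for illustration only; this is not an 
existing HMM interface. Setting an attribute splits the store's own ranges, 
but no VMA is touched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 32

struct attr_range {
	uint64_t start;  /* inclusive */
	uint64_t end;    /* exclusive */
	int attr;        /* opaque, device specific value */
};

/* Sorted, non-overlapping ranges kept independently of any VMA. */
struct attr_store {
	struct attr_range r[MAX_RANGES];
	size_t n;
};

/* Set attr on [start, end), trimming or splitting whatever it overlaps. */
static int attr_set(struct attr_store *s, uint64_t start, uint64_t end, int attr)
{
	struct attr_range out[MAX_RANGES];
	const struct attr_range new_range = { start, end, attr };
	size_t n = 0;
	int placed = 0;

	assert(s != NULL);
	/* Worst case: one old range is split in two around the new one. */
	if (start >= end || s->n + 2 > MAX_RANGES)
		return -1;

	for (size_t i = 0; i < s->n; i++) {
		const struct attr_range cur = s->r[i];

		if (cur.end <= start) {           /* entirely before */
			out[n++] = cur;
			continue;
		}
		if (cur.start < start)            /* keep the left remainder */
			out[n++] = (struct attr_range){ cur.start, start, cur.attr };
		if (!placed) {
			out[n++] = new_range;
			placed = 1;
		}
		if (cur.start >= end)             /* entirely after */
			out[n++] = cur;
		else if (cur.end > end)           /* keep the right remainder */
			out[n++] = (struct attr_range){ end, cur.end, cur.attr };
	}
	if (!placed)
		out[n++] = new_range;

	for (size_t i = 0; i < n; i++)
		s->r[i] = out[i];
	s->n = n;
	return 0;
}

/* Look up the attribute for one address, or fallback if none is stored. */
static int attr_get(const struct attr_store *s, uint64_t addr, int fallback)
{
	for (size_t i = 0; i < s->n; i++)
		if (addr >= s->r[i].start && addr < s->r[i].end)
			return s->r[i].attr;
	return fallback;
}
```

Because the store is independent of the VMAs, a VMA that grows is not covered 
automatically; the caller would have to call attr_set() again for the new 
part, which is the downside mentioned above.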


Regards,
Christian.

Am 29.11.23 um 23:23 schrieb Zeng, Oak:

Hi Weixi,

Even though Christian has listed reasons for rejecting this proposal (yes, they are 
very reasonable to me), I would like to keep an open mind and further explore the 
possibility here. Since the current GPU drivers use an hmm based implementation 
(AMD and NV have done this; at Intel we are catching up), I want to explore how 
much we can benefit from the proposed approach and how your approach can solve 
some pain points of our development. So basically what I am questioning here 
is: what is the advantage of your approach over hmm?

To implement a UVM (unified virtual address space between CPU and GPU device) with 
hmm, a driver essentially needs to implement the functions below:

1. Device page table update. Your approach requires the same because this is 
device specific code.

2. Some migration functions to migrate memory between system memory and GPU local memory. My 
understanding is that even though you generalized this a bit (a modified CPU page fault 
path, a "general" gm_dev_fault handler...), the device driver still needs 
to provide migration functions because migration functions have to be device specific 
(i.e., using the device dma/copy engine for performance purposes). Right?

3. GPU physical memory management. This part is now in drm/buddy, shared by all 
drivers. I think with your approach, the driver still needs to provide callback 
functions to allocate/free physical pages. Right? Or do you let linux core mm 
buddy manage device memory directly?

4. madvise/hints/virtual address range management. This has been a pain point for 
us. Right now the device driver has to maintain a certain virtual address range data 
structure to maintain hints and other virtual address range based memory 
attributes. The driver needs to sync with the linux vma. The driver needs to explicitly deal 
with range split/merging... HMM doesn't provide support in this area. Your 
approach seems cleaner/simpler to me...


So in the above, I have examined some key factors of a gpu UVM memory manager. 
I think for #1 and #2, hmm has provided pretty good abstractions/tools for 
address space mirroring and migration helpers. For #3, since we have a common 
drm/buddy layer, I don't think it is a big problem for driver writers now.

I do see #4 as something you solved more beautifully, though it requires a new 
system call.

Oak



-Original Message-
From: dri-devel  On Behalf Of
Christian König
Sent: Tuesday, November 28, 2023 8:09 AM
To: Weixi Zhu ; linux...@kvack.org; linux-
ker...@vger.kernel.org; a...@linux-foundation.org; Danilo Krummrich
; Dave Airlie ; Daniel Vetter

Cc: dri-de...@lists.freedesktop.org; leo...@nvidia.com; apop...@nvidia.com;
amd-...@lists.freedesktop.org; mgor...@suse.de; z...@nvidia.com; Wang, Zhi
A ; rcampb...@nvidia.com; j...@nvidia.com;
weixi@openeuler.sh; jhubb...@nvidia.com; intel-gfx@lists.freedesktop.org;
mhairgr...@nvidia.com; jgli...@redhat.com; Vivi, Rodrigo
; intel-gvt-...@lists.freedesktop.org;
tvrtko.ursu...@linux.intel.com; felix.kuehl...@amd.com; xinhui@amd.com;
alexander.deuc...@amd.com; ogab...@kernel.org
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory
management) for external memory devices

Adding a few missing important people to the explicit to list.

Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems
case by case, because Linux core MM only considers host memory resources.
These reinvented MM subsystems have similar orders of magnitude of LoC as
Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
30K. Meanwhile, more and more vendors are implementing their own
accelerators, e.g. Microsoft's Maia 100. At the same time,
application-level developers suffer from poor programmability -- they must
consider parallel address spaces and be careful about the limited device
DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
be shared by the accelerator, or the abundant host DRAM can further
transparently backup the device local memory.

These external MM systems share similar

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-29 Thread Christian König

tly 
to the CPU address space.


Regards,
Christian.



[1] Huang, Jian, Moinuddin K. Qureshi, and Karsten Schwan. "An evolutionary study of 
linux memory management for fun and profit." 2016 USENIX Annual Technical Conference 
(USENIX ATC 16). 2016.

Thanks,
Weixi

-Original Message-
From: Dave Airlie 
Sent: Wednesday, November 29, 2023 1:15 PM
To: Christian König 
Cc: zhuweixi ; linux...@kvack.org; 
linux-ker...@vger.kernel.org; a...@linux-foundation.org; weixi@openeuler.sh; 
mgor...@suse.de; jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; 
apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; 
alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; 
felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; 
j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; zhi.a.w...@intel.com; 
intel-gvt-...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; 
jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; 
tvrtko.ursu...@linux.intel.com
Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) 
for external memory devices

On Tue, 28 Nov 2023 at 23:07, Christian König  wrote:

Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems
case by case, because Linux core MM only considers host memory resources.
These reinvented MM subsystems have similar orders of magnitude of LoC as
Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
30K. Meanwhile, more and more vendors are implementing their own
accelerators, e.g. Microsoft's Maia 100. At the same time,
application-level developers suffer from poor programmability -- they must
consider parallel address spaces and be careful about the limited device
DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
be shared by the accelerator, or the abundant host DRAM can further
transparently backup the device local memory.

These external MM systems share similar mechanisms except for the
hardware-dependent part, so reinventing them is effectively introducing
redundant code (14K~70K for each case). Such developing/maintaining is not
cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
notifiers/HMM. This raises the bar for driver development, since developers
must understand how Linux MM works. Further, it creates code maintenance
problems -- any changes to Linux MM potentially require coordinated changes
to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these
external MM subsystems disappear. For example, a throughput-oriented
accelerator will not tolerate executing heavy memory access workload with
a host MMU/IOMMU via a remote bus. Therefore, devices will still have
their own MMU and pick a simpler page table format for lower address
translation overhead, requiring external MM subsystems.



What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a
high-level interface is provided for device drivers. This prevents
accelerator drivers from reinventing the wheel, but relies on drivers to
implement their hardware-dependent functions declared by GMEM. GMEM's key
interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
gm_dev_register_physmem(). Here is a brief description of how a device driver
utilizes them:
1. At boot time, call gm_dev_create() and register the implementation of
 hardware-dependent functions as declared in struct gm_mmu.
   - If the device has local DRAM, call gm_dev_register_physmem() to
 register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if
 the current CPU process has been attached to a gmem address space
 (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as
 to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before
 device computation happens.

GMEM has changed the following assumptions in Linux MM:
1. An mm_struct not only handles a single CPU context, but may also handle
   external memory contexts encapsulated as gm_context listed in
   mm->gm_as. An external memory context can include a few or all of the
   following parts: an external MMU (that requires TLB invalidation), an
   external page table (that requires PTE manipulation) and external DRAM
   (that requires physical memory management).

Well that is pretty much exactly what AMD has already proposed with KFD
and was rejected for rather good reasons.

MMU functions
The MMU functions peer_map() and peer_unmap() overlap other functions,
leav

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-28 Thread Christian König


Adding a few missing important people to the explicit to list.

Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems
case by case, because Linux core MM only considers host memory resources.
These reinvented MM subsystems have similar orders of magnitude of LoC as
Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
30K. Meanwhile, more and more vendors are implementing their own
accelerators, e.g. Microsoft's Maia 100. At the same time,
application-level developers suffer from poor programmability -- they must
consider parallel address spaces and be careful about the limited device
DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
be shared by the accelerator, or the abundant host DRAM can further
transparently backup the device local memory.

These external MM systems share similar mechanisms except for the
hardware-dependent part, so reinventing them is effectively introducing
redundant code (14K~70K for each case). Such developing/maintaining is not
cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
notifiers/HMM. This raises the bar for driver development, since developers
must understand how Linux MM works. Further, it creates code maintenance
problems -- any changes to Linux MM potentially require coordinated changes
to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these
external MM subsystems disappear. For example, a throughput-oriented
accelerator will not tolerate executing heavy memory access workload with
a host MMU/IOMMU via a remote bus. Therefore, devices will still have
their own MMU and pick a simpler page table format for lower address
translation overhead, requiring external MM subsystems.



What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a
high-level interface is provided for device drivers. This prevents
accelerator drivers from reinventing the wheel, but relies on drivers to
implement their hardware-dependent functions declared by GMEM. GMEM's key
interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
gm_dev_register_physmem(). Here is a brief description of how a device driver
utilizes them:
1. At boot time, call gm_dev_create() and register the implementation of
hardware-dependent functions as declared in struct gm_mmu.
  - If the device has local DRAM, call gm_dev_register_physmem() to
register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if
the current CPU process has been attached to a gmem address space
(struct gm_as). If not, call gm_as_create() and point current->mm->gm_as
to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before
device computation happens.
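
The four steps above could look roughly like the following from the driver 
side. This is only a mock: every type and signature here is invented (hence 
the mock_ prefix) and is not taken from the RFC patches; only the order of 
the calls follows the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_DEVS 4

/* Hardware-dependent callbacks supplied by the driver (invented layout). */
struct mock_gm_mmu {
	int (*peer_map)(uint64_t va, uint64_t pa);
};

struct mock_gm_dev {
	const struct mock_gm_mmu *mmu;
	uint64_t phys_base;
	uint64_t phys_size;
};

struct mock_gm_as {
	struct mock_gm_dev *attached[MOCK_MAX_DEVS];
	size_t nr_attached;
};

/* Stand-in for the per-process mm_struct with its gm_as pointer. */
struct mock_mm {
	struct mock_gm_as *gm_as;
};

/* Step 1: create the device and register its MMU callbacks. */
static void mock_gm_dev_create(struct mock_gm_dev *dev,
			       const struct mock_gm_mmu *mmu)
{
	dev->mmu = mmu;
	dev->phys_base = 0;
	dev->phys_size = 0;
}

/* Step 1 (optional): register device-local DRAM. */
static void mock_gm_dev_register_physmem(struct mock_gm_dev *dev,
					 uint64_t base, uint64_t size)
{
	dev->phys_base = base;
	dev->phys_size = size;
}

static void mock_gm_as_create(struct mock_gm_as *as)
{
	as->nr_attached = 0;
}

/* Step 3: attach a device context to the address space. */
static int mock_gm_as_attach(struct mock_gm_as *as, struct mock_gm_dev *dev)
{
	if (as->nr_attached >= MOCK_MAX_DEVS)
		return -1;
	as->attached[as->nr_attached++] = dev;
	return 0;
}

/* Steps 2 and 3 as a driver would run them when a context is initialized. */
static int driver_context_init(struct mock_mm *mm, struct mock_gm_as *storage,
			       struct mock_gm_dev *dev)
{
	assert(mm != NULL && storage != NULL && dev != NULL);
	if (!mm->gm_as) {
		mock_gm_as_create(storage);
		mm->gm_as = storage;
	}
	return mock_gm_as_attach(mm->gm_as, dev);
}

/* Step 4: resolve a fault by asking the device MMU to map the page. */
static int mock_gm_dev_fault(struct mock_mm *mm, struct mock_gm_dev *dev,
			     uint64_t va)
{
	if (!mm->gm_as)
		return -1;
	for (size_t i = 0; i < mm->gm_as->nr_attached; i++)
		if (mm->gm_as->attached[i] == dev)
			return dev->mmu->peer_map(va, dev->phys_base);
	return -1;
}

/* Trivial callback that only counts how often it was invoked. */
static int demo_peer_map_calls;

static int demo_peer_map(uint64_t va, uint64_t pa)
{
	(void)va;
	(void)pa;
	demo_peer_map_calls++;
	return 0;
}
```

In this mock a fault is refused until the device has been attached to the 
process's address space, which mirrors the ordering of steps 2 to 4.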

GMEM has changed the following assumptions in Linux MM:
   1. An mm_struct not only handles a single CPU context, but may also handle
  external memory contexts encapsulated as gm_context listed in
  mm->gm_as. An external memory context can include a few or all of the
  following parts: an external MMU (that requires TLB invalidation), an
  external page table (that requires PTE manipulation) and external DRAM
  (that requires physical memory management).
   2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily
  mean that a zero-filled physical page should be mapped. The virtual
  page may have been mapped to an external memory device.
   3. Unmapping a page may include sending device TLB invalidation (even if
  its MMU shares CPU page table) and manipulating device PTEs.



Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
 Allocate virtual address that is shared between the CPU and all
 attached devices. Data is guaranteed to be coherent whenever the
 address is accessed by either CPU or any attached device. If the device
 does not support page fault, then device driver is responsible for
 faulting memory before data gets accessed. By default, the CPU DRAM
 can be used as a swap backup for the device local memory.
2. hmadvise(NUMA_id, va_start, size, memory_hint)
 Issuing memory hint for a given VMA. This extends traditional madvise()
 syscall with an extra argument so that programmers have better control
 with heterogeneous devices registered as NUMA nodes. One useful memory
 hint could be MADV_PREFETCH, which guarantees that the physical data of
 the given VMA [VA, VA+size) is migrated to NUMA node #id. Another
 useful memory hint is MADV_DONTNEED. This is helpful to increase device
 memory utilization. It

Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-28 Thread Christian König


Am 28.11.23 um 13:50 schrieb Weixi Zhu:

The problem:

Accelerator driver developers are forced to reinvent external MM subsystems
case by case, because Linux core MM only considers host memory resources.
These reinvented MM subsystems have similar orders of magnitude of LoC as
Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
30K. Meanwhile, more and more vendors are implementing their own
accelerators, e.g. Microsoft's Maia 100. At the same time,
application-level developers suffer from poor programmability -- they must
consider parallel address spaces and be careful about the limited device
DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
be shared by the accelerator, or the abundant host DRAM can further
transparently backup the device local memory.

These external MM systems share similar mechanisms except for the
hardware-dependent part, so reinventing them is effectively introducing
redundant code (14K~70K for each case). Such developing/maintaining is not
cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
notifiers/HMM. This raises the bar for driver development, since developers
must understand how Linux MM works. Further, it creates code maintenance
problems -- any changes to Linux MM potentially require coordinated changes
to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these
external MM subsystems disappear. For example, a throughput-oriented
accelerator will not tolerate executing heavy memory access workload with
a host MMU/IOMMU via a remote bus. Therefore, devices will still have
their own MMU and pick a simpler page table format for lower address
translation overhead, requiring external MM subsystems.



What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a
high-level interface is provided for device drivers. This prevents
accelerator drivers from reinventing the wheel, but relies on drivers to
implement their hardware-dependent functions declared by GMEM. GMEM's key
interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
gm_dev_register_physmem(). Here is a brief description of how a device driver
utilizes them:
1. At boot time, call gm_dev_create() and register the implementation of
hardware-dependent functions as declared in struct gm_mmu.
  - If the device has local DRAM, call gm_dev_register_physmem() to
register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if
the current CPU process has been attached to a gmem address space
(struct gm_as). If not, call gm_as_create() and point current->mm->gm_as
to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before
device computation happens.

GMEM has changed the following assumptions in Linux MM:
   1. An mm_struct not only handles a single CPU context, but may also handle
  external memory contexts encapsulated as gm_context listed in
  mm->gm_as. An external memory context can include a few or all of the
  following parts: an external MMU (that requires TLB invalidation), an
  external page table (that requires PTE manipulation) and external DRAM
  (that requires physical memory management).


Well that is pretty much exactly what AMD has already proposed with KFD 
and was rejected for rather good reasons.



   2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily
  mean that a zero-filled physical page should be mapped. The virtual
  page may have been mapped to an external memory device.
   3. Unmapping a page may include sending device TLB invalidation (even if
  its MMU shares CPU page table) and manipulating device PTEs.



Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
 Allocate virtual address that is shared between the CPU and all
 attached devices. Data is guaranteed to be coherent whenever the
 address is accessed by either CPU or any attached device. If the device
 does not support page fault, then device driver is responsible for
 faulting memory before data gets accessed. By default, the CPU DRAM
 can be used as a swap backup for the device local memory.
2. hmadvise(NUMA_id, va_start, size, memory_hint)
 Issuing memory hint for a given VMA. This extends traditional madvise()
 syscall with an extra argument so that programmers have better control
 with heterogeneous devices registered as NUMA nodes. One useful memory
 hint could be MADV_PREFETCH, which guarantees that the physical data of
 the given VMA [VA, VA+size) is migrated to NUMA node #id. Another
 useful memory hint is MADV_DONTNEED. This is

Re: [Intel-gfx] [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2023-11-27 Thread Christian König


Am 27.11.23 um 17:47 schrieb Bhardwaj, Rajneesh:


-Original Message-
From: amd-gfx  On Behalf Of Hamza Mahfooz
Sent: Monday, November 27, 2023 10:53 AM
To: Christian König ; jani.nik...@linux.intel.com; 
kher...@redhat.com; d...@redhat.com; za...@vmware.com; Olsak, Marek 
; linux-graphics-maintai...@vmware.com; 
amd-...@lists.freedesktop.org; nouv...@lists.freedesktop.org; 
intel-gfx@lists.freedesktop.org; virtualizat...@lists.linux.dev; 
spice-de...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT

On 11/27/23 09:54, Christian König wrote:

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that when VRAM was evacuated for suspend it's
never filled up again unless the application is restarted.

I found the subject description a bit misleading. Maybe use a Fixes tag 
describing that it is a fix for a suspend/resume regression. Other than that, it looks 
good to me.


Well, exactly that's the problem: this isn't really a fix and we also 
don't want to backport it.


Basically the previous behavior was working as designed, it's just that it 
was never intended to be used like this.




Acked-by: Rajneesh Bhardwaj 

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2893


Thanks,
Christian.




Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
   1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index aa0dd6dad068..ddc8fb4db678 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 			abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
 			AMDGPU_PL_PREEMPT : TTM_PL_TT;
 		places[c].flags = 0;
+		/*
+		 * When GTT is just an alternative to VRAM make sure that we
+		 * only use it as fallback and still try to fill up VRAM first.
+		 */
+		if (domain & AMDGPU_GEM_DOMAIN_VRAM)
+			places[c].flags |= TTM_PL_FLAG_BUSY;
 		c++;
 	}


--
Hamza

[Intel-gfx] [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT

2023-11-27 Thread Christian König

Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that when VRAM was evacuated for suspend it's never
filled up again unless the application is restarted.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index aa0dd6dad068..ddc8fb4db678 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 			abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ?
 			AMDGPU_PL_PREEMPT : TTM_PL_TT;
 		places[c].flags = 0;
+		/*
+		 * When GTT is just an alternative to VRAM make sure that we
+		 * only use it as fallback and still try to fill up VRAM first.
+		 */
+		if (domain & AMDGPU_GEM_DOMAIN_VRAM)
+			places[c].flags |= TTM_PL_FLAG_BUSY;
 		c++;
 	}
 
-- 
2.34.1

[Intel-gfx] [PATCH 1/2] drm/ttm: replace busy placement with flags v3

2023-11-27 Thread Christian König

From: Somalapuram Amaranath 

Instead of a list of separate busy placement add flags which indicate
that a placement should only be used when there is room or if we need to
evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip

Signed-off-by: Christian König 
Signed-off-by: Somalapuram Amaranath 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 11 +--
 drivers/gpu/drm/drm_gem_vram_helper.c      |  2 -
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c    | 37 
 drivers/gpu/drm/loongson/lsdc_ttm.c        |  2 -
 drivers/gpu/drm/nouveau/nouveau_bo.c       | 59 +
 drivers/gpu/drm/nouveau/nouveau_bo.h       |  1 -
 drivers/gpu/drm/qxl/qxl_object.c           |  2 -
 drivers/gpu/drm/qxl/qxl_ttm.c              |  2 -
 drivers/gpu/drm/radeon/radeon_object.c     |  2 -
 drivers/gpu/drm/radeon/radeon_ttm.c        |  8 +-
 drivers/gpu/drm/radeon/radeon_uvd.c        |  1 -
 drivers/gpu/drm/ttm/ttm_bo.c               | 21 +++--
 drivers/gpu/drm/ttm/ttm_resource.c         | 73 
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c         |  3 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +-
 include/drm/ttm/ttm_placement.h            | 10 ++-
 include/drm/ttm/ttm_resource.h             |  8 +-
 18 files changed, 160 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..aa0dd6dad068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, 
u32 domain)
 
placement->num_placement = c;
placement->placement = places;
-
-   placement->num_busy_placement = c;
-   placement->busy_placement = places;
 }
 
 /**
@@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
AMDGPU_GEM_DOMAIN_GTT);
 
/* Avoid costly evictions; only set GTT as a busy placement */
-   abo->placement.num_busy_placement = 1;
-   abo->placement.busy_placement = &abo->placements[1];
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
 
r = ttm_bo_validate(bo, &abo->placement, &ctx);
if (unlikely(r == -EBUSY || r == -ERESTARTSYS))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..9a6a00b1af40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
/* Don't handle scatter gather BOs */
if (bo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
/* Object isn't an AMDGPU object so ignore */
if (!amdgpu_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
-   placement->busy_placement = &placements;
placement->num_placement = 1;
-   placement->num_busy_placement = 1;
return;
}
 
abo = ttm_to_amdgpu_bo(bo);
if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
 
@@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
case AMDGPU_PL_OA:
case AMDGPU_PL_DOORBELL:
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
 
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
amdgpu_bo_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+
} else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
   !(abo->flags & 
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) &&
   amdgpu_bo_in_cpu_visible_vram(abo)) {
@@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
AMDGPU_GEM_DOMAIN_CPU);
abo->placements[0].fpfn = adev->gmc.visible_vram_size 
>> PAGE_SHIFT;
abo->placements[0].lpfn = 0;
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.num_busy_placement = 1;
+   abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
} else {
/* Move to GTT memory */
a

[Intel-gfx] TTM improvement and amdgpu fix

2023-11-27 Thread Christian König

Hi guys,

TTM has a feature which allows specifying placements for normal operation as 
well as for when all domains are "busy" and don't have free space.

It is not very widely used since it was a bit inflexible and required making multiple 
placement lists. Replace the multiple lists with flags and start to use this in 
amdgpu as well.

As a future improvement we should probably re-work what "busy" means for a domain 
as well.

Please comment and/or test.

Thanks,
Christian.
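
The idea of replacing the second list with per-entry flags can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct demo_place, struct demo_placement, DEMO_PL_FLAG_IDLE and demo_pick_placement() are hypothetical names and not the TTM API, and the flag meaning (skip the entry once eviction would be required) is inferred from the amdgpu hunks of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: entry is only tried while no eviction is needed. */
#define DEMO_PL_FLAG_IDLE (1u << 0)

struct demo_place {
	unsigned int mem_type;	/* e.g. 0 = system, 1 = GTT, 2 = VRAM */
	unsigned int flags;
};

struct demo_placement {
	size_t num_placement;
	const struct demo_place *placement;
};

/*
 * Return the first entry usable in the current pass, or NULL.
 * has_space[] is indexed by mem_type. In the busy pass, entries flagged
 * DEMO_PL_FLAG_IDLE are skipped, which is what a separate busy_placement
 * list used to express.
 */
static const struct demo_place *
demo_pick_placement(const struct demo_placement *pl,
		    const bool *has_space, bool busy_pass)
{
	assert(pl != NULL && has_space != NULL);

	for (size_t i = 0; i < pl->num_placement; i++) {
		const struct demo_place *p = &pl->placement[i];

		if (busy_pass) {
			/* Eviction would be needed, so honor the flag. */
			if (p->flags & DEMO_PL_FLAG_IDLE)
				continue;
			return p;
		}
		if (has_space[p->mem_type])
			return p;
	}
	return NULL;
}
```

With a list of { VRAM | IDLE, GTT }, the first pass prefers VRAM when it has room, while the busy pass falls through to GTT without a second array.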

Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by

2023-10-05 Thread Christian König


On 02.10.23 at 20:22, Kees Cook wrote:

On Mon, Oct 02, 2023 at 08:11:41PM +0200, Christian König wrote:

On 02.10.23 at 20:08, Kees Cook wrote:

On Mon, Oct 02, 2023 at 08:01:57PM +0200, Christian König wrote:

On 02.10.23 at 18:53, Kees Cook wrote:

On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote:

On Mon, Oct 2, 2023 at 5:20 AM Christian König
 wrote:

On 29.09.23 at 21:33, Kees Cook wrote:

On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:

This is a batch of patches touching drm for preparing for the coming
implementation by GCC and Clang of the __counted_by attribute. Flexible
array members annotated with __counted_by can have their accesses
bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array
indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.

[...]

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with 
__counted_by
  https://git.kernel.org/kees/c/a6046ac659d6

STOP! In a follow up discussion Alex and I figured out that this won't work.

I'm so confused; from the discussion I saw that Alex said both instances
were false positives?


The value in the structure is byte swapped based on some firmware
endianness which does not necessarily match the CPU endianness.

SMU10 is APU only so the endianness of the SMU firmware and the CPU
will always match.

Which I think is what is being said here?


Please revert that one from going upstream if it's already on its way.

And because of those reasons I strongly think that patches like this
should go through the DRM tree :)

Sure, that's fine -- please let me know. It was others Acked/etc. Who
should carry these patches?

Probably best if the relevant maintainers pick them up individually.

Some of those structures are filled in by firmware/hardware and only the
maintainers can judge if that value actually matches what the compiler
needs.

We have cases where individual bits are used as flags or when the size is
byte swapped etc...

Even Alex and I couldn't immediately say how and where that field is actually
used and had to dig that up. That's where the confusion came from.

Okay, I've dropped them all from my tree. Several had Acks/Reviews, so
hopefully those can get picked up for the DRM tree?

I will pick those up to go through drm-misc-next.

Going to ping maintainers once more when I'm not sure if stuff is correct or
not.

Sounds great; thanks!


I wasn't 100% sure for the VC4 patch, but pushed the whole set to 
drm-misc-next anyway.


This also means that the patches are now auto merged into the drm-tip 
integration branch and should any build or unit test go boom we should 
notice immediately and can revert it pretty easily.


Thanks,
Christian.



-Kees

Re: [Intel-gfx] [PATCH] dma-buf: Deny copy-on-writes mmaps

2023-10-04 Thread Christian König


On 04.10.23 at 01:03, Andi Shyti wrote:

From: Chris Wilson 

Enforce that an mmap of a dmabuf always uses MAP_SHARED so that all
accesses (both reads and writes) use the device memory and not a local
copy-on-write page in system memory.


As much as I would like to do this I fear that this won't work.

First of all, this is an interesting approach to do this in .get_unmapped_area. The 
standard handling is to have a check like "if 
(is_cow_mapping(vma->vm_flags)) return -EINVAL;", see TTM for example.


Then IIRC we already tried this and had to revert it because it breaks 
the UAPI. Some broken applications actually use shared mappings (but not 
really cow) and we would like to keep them working.


Regards,
Christian.
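
The difference between the two checks can be sketched in plain userspace C. The flag values below are illustrative stand-ins for the kernel's VM_SHARED and VM_MAYWRITE bits, and demo_is_cow_mapping() and demo_mmap_check() are hypothetical names; the predicate mirrors what is_cow_mapping() tests (private and potentially writable), whereas the proposed hook rejects every MAP_PRIVATE request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's vm_flags bits. */
#define DEMO_VM_SHARED   0x08ul
#define DEMO_VM_MAYWRITE 0x20ul

/* Copy-on-write means: not shared, but may become writable. */
static bool demo_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (DEMO_VM_SHARED | DEMO_VM_MAYWRITE)) ==
	       DEMO_VM_MAYWRITE;
}

/* Hypothetical mmap handler check: reject only real COW mappings. */
static int demo_mmap_check(unsigned long vm_flags)
{
	if (demo_is_cow_mapping(vm_flags))
		return -EINVAL;
	return 0;
}

/* Usage examples for the three interesting flag combinations. */
static void demo_examples(void)
{
	/* Private mapping that can never become writable: allowed. */
	assert(demo_mmap_check(0) == 0);
	/* Private mapping that may be written: COW, rejected. */
	assert(demo_mmap_check(DEMO_VM_MAYWRITE) == -EINVAL);
	/* Shared writable mapping: allowed. */
	assert(demo_mmap_check(DEMO_VM_SHARED | DEMO_VM_MAYWRITE) == 0);
}
```

The first case in demo_examples() is the one a blanket MAP_PRIVATE rejection would additionally break.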



Signed-off-by: Chris Wilson 
Signed-off-by: Andi Shyti 
---
  drivers/dma-buf/dma-buf.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 21916bba77d5..1ec297241842 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -128,6 +129,19 @@ static struct file_system_type dma_buf_fs_type = {

.kill_sb = kill_anon_super,
  };
  
+static unsigned long

+dma_buf_get_unmapped_area(struct file *file,
+ unsigned long addr,
+ unsigned long len,
+ unsigned long pgoff,
+ unsigned long flags)
+{
+   if ((flags & MAP_TYPE) == MAP_PRIVATE)
+   return -EINVAL;
+
+   return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
  static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct 
*vma)
  {
struct dma_buf *dmabuf;
@@ -508,6 +522,7 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct 
file *file)
  
  static const struct file_operations dma_buf_fops = {

.release= dma_buf_file_release,
+   .get_unmapped_area = dma_buf_get_unmapped_area,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,

Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by

2023-10-02 Thread Christian König


On 02.10.23 at 20:08, Kees Cook wrote:

On Mon, Oct 02, 2023 at 08:01:57PM +0200, Christian König wrote:

On 02.10.23 at 18:53, Kees Cook wrote:

On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote:

On Mon, Oct 2, 2023 at 5:20 AM Christian König
 wrote:

On 29.09.23 at 21:33, Kees Cook wrote:

On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:

This is a batch of patches touching drm for preparing for the coming
implementation by GCC and Clang of the __counted_by attribute. Flexible
array members annotated with __counted_by can have their accesses
bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array
indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.

[...]

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with 
__counted_by
 https://git.kernel.org/kees/c/a6046ac659d6

STOP! In a follow up discussion Alex and I figured out that this won't work.

I'm so confused; from the discussion I saw that Alex said both instances
were false positives?


The value in the structure is byte swapped based on some firmware
endianness which does not necessarily match the CPU endianness.

SMU10 is APU only so the endianness of the SMU firmware and the CPU
will always match.

Which I think is what is being said here?


Please revert that one from going upstream if it's already on its way.

And because of those reasons I strongly think that patches like this
should go through the DRM tree :)

Sure, that's fine -- please let me know. It was others Acked/etc. Who
should carry these patches?

Probably best if the relevant maintainers pick them up individually.

Some of those structures are filled in by firmware/hardware and only the
maintainers can judge if that value actually matches what the compiler
needs.

We have cases where individual bits are used as flags or when the size is
byte swapped etc...

Even Alex and I couldn't immediately say how and where that field is actually
used and had to dig that up. That's where the confusion came from.

Okay, I've dropped them all from my tree. Several had Acks/Reviews, so
hopefully those can get picked up for the DRM tree?


I will pick those up to go through drm-misc-next.

Going to ping maintainers once more when I'm not sure if stuff is 
correct or not.


Christian.



Thanks!

-Kees

Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by

2023-10-02 Thread Christian König


On 02.10.23 at 18:53, Kees Cook wrote:

On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote:

On Mon, Oct 2, 2023 at 5:20 AM Christian König
 wrote:

On 29.09.23 at 21:33, Kees Cook wrote:

On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:

This is a batch of patches touching drm for preparing for the coming
implementation by GCC and Clang of the __counted_by attribute. Flexible
array members annotated with __counted_by can have their accesses
bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array
indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.

[...]

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with 
__counted_by
https://git.kernel.org/kees/c/a6046ac659d6

STOP! In a follow up discussion Alex and I figured out that this won't work.

I'm so confused; from the discussion I saw that Alex said both instances
were false positives?


The value in the structure is byte swapped based on some firmware
endianness which does not necessarily match the CPU endianness.

SMU10 is APU only so the endianness of the SMU firmware and the CPU
will always match.

Which I think is what is being said here?


Please revert that one from going upstream if it's already on its way.

And because of those reasons I strongly think that patches like this
should go through the DRM tree :)

Sure, that's fine -- please let me know. It was others Acked/etc. Who
should carry these patches?


Probably best if the relevant maintainers pick them up individually.

Some of those structures are filled in by firmware/hardware and only the 
maintainers can judge if that value actually matches what the compiler 
needs.


We have cases where individual bits are used as flags or when the size 
is byte swapped etc...


Even Alex and I couldn't immediately say how and where that field is 
actually used and had to dig that up. That's where the confusion came from.


Regards,
Christian.



Thanks!

-Kees



Regards,
Christian.


[2/9] drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by
https://git.kernel.org/kees/c/4df33089b46f
[3/9] drm/i915/selftests: Annotate struct perf_series with __counted_by
https://git.kernel.org/kees/c/ffd3f823bdf6
[4/9] drm/msm/dpu: Annotate struct dpu_hw_intr with __counted_by
https://git.kernel.org/kees/c/2de35a989b76
[5/9] drm/nouveau/pm: Annotate struct nvkm_perfdom with __counted_by
https://git.kernel.org/kees/c/188aeb08bfaa
[6/9] drm/vc4: Annotate struct vc4_perfmon with __counted_by
https://git.kernel.org/kees/c/59a54dc896c3
[7/9] drm/virtio: Annotate struct virtio_gpu_object_array with __counted_by
https://git.kernel.org/kees/c/5cd476de33af
[8/9] drm/vmwgfx: Annotate struct vmw_surface_dirty with __counted_by
https://git.kernel.org/kees/c/b426f2e5356a
[9/9] drm/v3d: Annotate struct v3d_perfmon with __counted_by
https://git.kernel.org/kees/c/dc662fa1b0e4

Take care,

Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by

2023-10-02 Thread Christian König


On 29.09.23 at 21:33, Kees Cook wrote:

On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:

This is a batch of patches touching drm for preparing for the coming
implementation by GCC and Clang of the __counted_by attribute. Flexible
array members annotated with __counted_by can have their accesses
bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array
indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.

[...]

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with 
__counted_by
   https://git.kernel.org/kees/c/a6046ac659d6


STOP! In a follow up discussion Alex and I figured out that this won't work.

The value in the structure is byte swapped based on some firmware 
endianness which does not necessarily match the CPU endianness.


Please revert that one from going upstream if it's already on its way.

And because of those reasons I strongly think that patches like this 
should go through the DRM tree :)


Regards,
Christian.


[2/9] drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by
   https://git.kernel.org/kees/c/4df33089b46f
[3/9] drm/i915/selftests: Annotate struct perf_series with __counted_by
   https://git.kernel.org/kees/c/ffd3f823bdf6
[4/9] drm/msm/dpu: Annotate struct dpu_hw_intr with __counted_by
   https://git.kernel.org/kees/c/2de35a989b76
[5/9] drm/nouveau/pm: Annotate struct nvkm_perfdom with __counted_by
   https://git.kernel.org/kees/c/188aeb08bfaa
[6/9] drm/vc4: Annotate struct vc4_perfmon with __counted_by
   https://git.kernel.org/kees/c/59a54dc896c3
[7/9] drm/virtio: Annotate struct virtio_gpu_object_array with __counted_by
   https://git.kernel.org/kees/c/5cd476de33af
[8/9] drm/vmwgfx: Annotate struct vmw_surface_dirty with __counted_by
   https://git.kernel.org/kees/c/b426f2e5356a
[9/9] drm/v3d: Annotate struct v3d_perfmon with __counted_by
   https://git.kernel.org/kees/c/dc662fa1b0e4

Take care,

Re: [Intel-gfx] [PATCH 1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by

2023-09-24 Thread Christian König


On 22.09.23 at 19:41, Alex Deucher wrote:

On Fri, Sep 22, 2023 at 1:32 PM Kees Cook  wrote:

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct 
smu10_voltage_dependency_table.

[1] 
https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Evan Quan 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: "Pan, Xinhui" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Xiaojian Du 
Cc: Huang Rui 
Cc: Kevin Wang 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Kees Cook 

Acked-by: Alex Deucher 


Mhm, I'm not sure if this is a good idea. That is a structure filled in 
by the firmware, isn't it?


That would imply that we might need to byte swap count before it is 
checkable.


Regards,
Christian.




---
  drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
index 808e0ecbe1f0..42adc2a3dcbc 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
@@ -192,7 +192,7 @@ struct smu10_clock_voltage_dependency_record {

  struct smu10_voltage_dependency_table {
 uint32_t count;
-   struct smu10_clock_voltage_dependency_record entries[];
+   struct smu10_clock_voltage_dependency_record entries[] 
__counted_by(count);
  };

  struct smu10_clock_voltage_information {
--
2.34.1
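
The concern raised in the reply above can be illustrated with a small userspace C sketch. It assumes a hypothetical firmware-filled table, struct demo_fw_table, whose count field is stored in firmware byte order; demo_swab32() and demo_fw_table_len() are illustrative helpers, not kernel functions. Whether a swap is ever needed for SMU10 is a separate question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-swap a 32-bit value, as needed when the two endiannesses differ. */
static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Hypothetical firmware-filled table; count is in firmware byte order. */
struct demo_fw_table {
	uint32_t count;
	uint32_t entries[];
};

/*
 * Logical number of entries. A bounds checker that reads the raw field,
 * which is what a __counted_by(count) annotation would rely on, only
 * sees this value when no swap is required.
 */
static uint32_t demo_fw_table_len(const struct demo_fw_table *t,
				  int fw_is_swapped)
{
	assert(t != NULL);
	return fw_is_swapped ? demo_swab32(t->count) : t->count;
}
```

If the firmware stored a count of 3 in the opposite byte order, the raw field reads as 0x03000000 on the CPU, so a check against the raw value would accept far more indices than the table really holds.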

Re: [Intel-gfx] [PATCH v2] drm: Update file owner during use

2023-09-20 Thread Christian König


On 20.09.23 at 15:21, Tvrtko Ursulin wrote:


On 28/08/2023 20:58, Rob Clark wrote:

On Wed, Jun 21, 2023 at 2:48 AM Tvrtko Ursulin
 wrote:


From: Tvrtko Ursulin 

With the typical model where the display server opens the file descriptor
and then hands it over to the client(*), we were showing stale data in
debugfs.

Fix it by updating the drm_file->pid on ioctl access from a different
process.

The field is also made RCU protected to allow for lockless readers. Update
side is protected with dev->filelist_mutex.

Before:

$ cat /sys/kernel/debug/dri/0/clients
             command   pid dev master a   uid      magic
                Xorg  2344   0   y    y     0          0
                Xorg  2344   0   n    y     0          2
                Xorg  2344   0   n    y     0          3
                Xorg  2344   0   n    y     0          4

After:

$ cat /sys/kernel/debug/dri/0/clients
             command  tgid dev master a   uid      magic
                Xorg   830   0   y    y     0          0
       xfce4-session   880   0   n    y     0          1
               xfwm4   943   0   n    y     0          2
           neverball  1095   0   n    y     0          3

*)
More detailed and historically accurate description of various handover
implementation kindly provided by Emil Velikov:

"""
The traditional model, the server was the orchestrator managing the
primary device node. From the fd, to the master status and
authentication. But looking at the fd alone, this has varied across
the years.

IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
fd(s) and reuse those whenever needed, DRI2 the client was responsible
for open() themselves and with DRI3 the fd was passed to the client.

Around the inception of DRI3 and systemd-logind, the latter became
another possible orchestrator. Whereby Xorg and Wayland compositors
could ask it for the fd. For various reasons (hysterical and genuine
ones) Xorg has a fallback path going the open(), whereas Wayland
compositors are moving to solely relying on logind... some never had
fallback even.

Over the past few years, more projects have emerged which provide
functionality similar (be that on API level, Dbus, or otherwise) to
systemd-logind.
"""

v2:
  * Fixed typo in commit text and added a fine historical explanation
    from Emil.

Signed-off-by: Tvrtko Ursulin 
Cc: "Christian König" 
Cc: Daniel Vetter 
Acked-by: Christian König 
Reviewed-by: Emil Velikov 


Reviewed-by: Rob Clark 
Tested-by: Rob Clark 


Thanks. If everyone else is happy with this approach I don't have the 
commit rights for drm-misc.


Going to take care of pushing this.

Regards,
Christian.
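
The "update the owner on access from a different process" idea can be sketched in userspace C. This is a simplified stand-in, not the patch's mechanism: it uses a C11 atomic integer instead of an RCU-protected struct pid pointer plus dev->filelist_mutex, it does not model reference counting of the old owner, and struct demo_file, demo_file_owner() and demo_file_update_owner() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_file; only the owner id is modeled. */
struct demo_file {
	_Atomic int owner_tgid;
};

/* Lockless reader, e.g. a debugfs-style dump of clients. */
static int demo_file_owner(struct demo_file *f)
{
	assert(f != NULL);
	return atomic_load_explicit(&f->owner_tgid, memory_order_acquire);
}

/*
 * Called on every ioctl-like entry point with the caller's id.
 * Returns true when the recorded owner was stale and got replaced.
 */
static bool demo_file_update_owner(struct demo_file *f, int caller_tgid)
{
	assert(f != NULL);

	if (demo_file_owner(f) == caller_tgid)
		return false;	/* common case: nothing to do */

	atomic_store_explicit(&f->owner_tgid, caller_tgid,
			      memory_order_release);
	return true;
}
```

The cheap comparison on the fast path matters because the check runs on every call, while an actual owner change only happens once after the file descriptor has been handed over.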



Regards,

Tvrtko




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  6 ++--
  drivers/gpu/drm/drm_auth.c              |  3 +-
  drivers/gpu/drm/drm_debugfs.c           | 10 ---
  drivers/gpu/drm/drm_file.c              | 40 +++--
  drivers/gpu/drm/drm_ioctl.c             |  3 ++
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  5 +++-
  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c     |  6 ++--
  include/drm/drm_file.h                  | 13 ++--
  8 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 74055cba3dc9..849097dff02b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -963,6 +963,7 @@ static int amdgpu_debugfs_gem_info_show(struct 
seq_file *m, void *unused)

 list_for_each_entry(file, &dev->filelist, lhead) {
 struct task_struct *task;
 struct drm_gem_object *gobj;
+   struct pid *pid;
 int id;

 /*
@@ -972,8 +973,9 @@ static int amdgpu_debugfs_gem_info_show(struct 
seq_file *m, void *unused)
  * Therefore, we need to protect this ->comm access 
using RCU.

  */
 rcu_read_lock();
-   task = pid_task(file->pid, PIDTYPE_TGID);
-   seq_printf(m, "pid %8d command %s:\n", 
pid_nr(file->pid),

+   pid = rcu_dereference(file->pid);
+   task = pid_task(pid, PIDTYPE_TGID);
+   seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
    task ? task->comm : "");
 rcu_read_unlock();

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index cf92a9ae8034..2ed2585ded37 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -235,7 +235,8 @@ static int drm_new_set_master(struct drm_device 
*dev, struct drm_file *fpriv)

  static int
  drm_master_check_perm(struct drm_device *dev, struct drm_file 
*file_priv)

  {
-   if (file_priv->pid == task_pid(current) && 
file_priv->was_master)

+   if (file_priv->was_master &

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-07 Thread Christian König


On 07.09.23 at 18:33, suijingfeng wrote:

Hi,


On 2023/9/7 17:08, Christian König wrote:


I strongly suggest that you just completely drop this here 



Dropping this is OK, no problem. Then I will go and develop something else.
This version was not originally intended to be merged, as it's an RFC.
Also, the core mechanism is already finished; it is the first patch in 
this series.
What is left is just policy (how to specify a device and parse the kernel 
command line), and nothing interesting is left.
It is actually there to fulfill my promise from v3, which was to give some 
examples as use cases.



and go into the AST driver and try to fix it. 


Well, someone told me yesterday that this is well-defined behavior,
which implies that it is not a bug. I'm not going to fix a non-bug.


Sorry about that, I didn't realize what you were actually trying to do.


But if Thomas asks me to fix it, then I will probably have to try.
But I suggest that if things are not broken, we don't fix them. Otherwise this may
cause bigger trouble. For the server's single-display use case, it is
good enough.


Yeah, exactly that's the reason why you shouldn't mess with this.

In theory you could try to re-program the necessary north bridge blocks 
to make integrated graphics work even if you installed a dedicated VGA 
adapter, but you will most likely be missing something.


The only real fix is to tell the BIOS that you want to use the 
integrated VGA device even if a dedicated one is detected.


If you want to learn more about the background AMD has a bunch of 
documentation around this on their website: 
https://www.amd.com/en/search/documentation/hub.html


The most interesting document for you is probably the BIOS programming 
manual, but don't ask me what exactly the title of that one is. @Alex do 
you remember what that was called?


IIRC Intel had similar documentation public, but I don't know where to 
find it offhand.


Regards,
Christian.




Thanks.

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-07 Thread Christian König


On 07.09.23 at 17:26, suijingfeng wrote:

[SNIP]



Then I'll give you another example; see below for a detailed description.
I have an AMD BC160 GPU, see [1] for what it looks like.

The GPU does not export a display connector interface.
It can actually be seen as a render-only GPU or a compute-class GPU for 
bitcoin.

But its firmware still claims that this GPU is VGA compatible.
When this GPU is mounted on the motherboard, the system always selects this 
GPU as primary.

But this GPU cannot be connected to a monitor.

In such a situation, modprobe.blacklist=amdgpu doesn't work either,
because vgaarb always selects this GPU as primary; this is a 
device-level decision.


It's not VGAARB which makes this selection, it's the BIOS. VGAARB just 
detects what the BIOS has decided.




$ dmesg | grep vgaarb:

[    3.541405] pci :0c:00.0: vgaarb: BAR 0: [mem 
0xa000-0xafff 64bit pref] contains firmware FB 
[0xa000-0xa02f]

[    3.901448] pci :05:00.0: vgaarb: setting as boot VGA device
[    3.905375] pci :05:00.0: vgaarb: VGA device added: 
decodes=io+mem,owns=none,locks=none
[    3.905382] pci :0c:00.0: vgaarb: setting as boot VGA device 
(overriding previous)
[    3.909375] pci :0c:00.0: vgaarb: VGA device added: 
decodes=io+mem,owns=io+mem,locks=none
[    3.913375] pci :0d:00.0: vgaarb: VGA device added: 
decodes=io+mem,owns=none,locks=none

[    3.913377] vgaarb: loaded
[   13.513760] amdgpu :0c:00.0: vgaarb: deactivate vga console
[   19.020992] amdgpu :0c:00.0: vgaarb: changed VGA decodes: 
olddecodes=io+mem,decodes=none:owns=io+mem


I'm using an Ubuntu 22.04 system; with ast.modeset=10 passed on the kernel command 
line,
I am still able to enter the graphical environment and treat this GPU as a 
render-only GPU.

I will probably continue to examine what's wrong. Apart from this, drm/amdgpu reports
" *ERROR* IB test failed on sdma0 (-110)" to me.

Does this count as a problem?


No, again that is perfectly expected behavior.

Some BIOSes (or maybe most by modern standards) allow overriding this, 
but if you later override this from the OS you run the hardware outside 
what's validated.


When you put a VGA device into a board with an integrated VGA device the 
integrated one gets disabled. This is even part of some PCIe 
specification IIRC.


So the problems you run into here are perfectly expected.

Regards,
Christian.



Until I can find a solution, I have to keep this de facto render-only GPU 
mounted,
because I need to recompile the kernel module, install it and 
test it.


All I need is a 2D video card to display something; ast drm is OK, 
despite being simple.

It suits my daily usage with Vim, and that's enough for me.

Now, the real questions that I want to ask are:

1)

When the kernel driver module is blocked (by 
modprobe.blacklist=amdgpu),
vgaarb still selects the device as primary, which leaves the X server 
crashing (because no kernel-space driver is loaded).

Does this count as a problem?


2)

Does my approach of mounting another GPU as the primary display 
adapter,
while its real purpose is bug fixing and development for another 
GPU,

count as a use case?


$ cat demsg.txt | grep drm

[   10.099888] ACPI: bus type drm_connector registered
[   11.083920] etnaviv :0d:00.0: [drm] bind etnaviv-display, 
master name: :0d:00.0
[   11.084106] [drm] Initialized etnaviv 1.3.0 20151214 for 
:0d:00.0 on minor 0

[   13.301702] [drm] amdgpu kernel modesetting enabled.
[   13.359820] [drm] initializing kernel modesetting (NAVI12 
0x1002:0x7360 0x1002:0x0A34 0xC7).

[   13.368246] [drm] register mmio base: 0xEB10
[   13.372861] [drm] register mmio size: 524288
[   13.380788] [drm] add ip block number 0 
[   13.385661] [drm] add ip block number 1 
[   13.390531] [drm] add ip block number 2 
[   13.395405] [drm] add ip block number 3 
[   13.399760] [drm] add ip block number 4 
[   13.404111] [drm] add ip block number 5 
[   13.408378] [drm] add ip block number 6 
[   13.413249] [drm] add ip block number 7 
[   13.433546] [drm] add ip block number 8 
[   13.433547] [drm] add ip block number 9 
[   13.497757] [drm] VCN decode is enabled in VM mode
[   13.502540] [drm] VCN encode is enabled in VM mode
[   13.508785] [drm] JPEG decode is enabled in VM mode
[   13.529596] [drm] vm size is 262144 GB, 4 levels, block size is 
9-bit, fragment size is 9-bit

[   13.564762] [drm] Detected VRAM RAM=8176M, BAR=256M
[   13.569628] [drm] RAM width 2048bits HBM
[   13.574167] [drm] amdgpu: 8176M of VRAM memory ready
[   13.579125] [drm] amdgpu: 15998M of GTT memory ready.
[   13.584184] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   13.590505] [drm] PCIE GART of 512M enabled (table at 
0x00800030).
[   13.598749] [drm] Found VCN firmware Version ENC: 1.16 DEC: 5 VEP: 
0 Revision: 4

[   13.671786] [drm] reserve 0xe0 from 0x81fd00 for PSP TMR
[   13.801235] [drm] Display Core v3.2.247 initialized on DCN 2.0
[   13.807061] [drm] DP-HDM

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-07 Thread Christian König


On 07.09.23 at 14:32, suijingfeng wrote:

Hi,


On 2023/9/7 17:08, Christian König wrote:
Well, I have over 25 years of experience with display hardware and 
what you describe here was never an issue. 


I want to give you an example to explain more.

I have an ASRock AD2550B-ITX board [1].
When another discrete video card is mounted into its mini PCIe slot or 
PCI slot,
the IGD cannot be the primary display adapter anymore. The display is 
totally black.

I have tried to draft a few trivial patches to help fix this [2].

And I want to use the IGD as primary; does this count as an issue?


No, this is completely expected behavior and a limitation of the 
hardware design.


As far as I know both AMD and Intel GPUs work the same here.

Regards,
Christian.



[1] https://www.asrock.com/mb/Intel/AD2550-ITX/
[2] https://patchwork.freedesktop.org/series/123073/

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-07 Thread Christian König


On 07.09.23 at 04:30, Sui Jingfeng wrote:

Hi,


On 2023/9/6 17:40, Christian König wrote:

On 06.09.23 at 11:08, suijingfeng wrote:

Well, you are welcome to correct me if I'm wrong.


You seem to have some very basic misunderstandings here.

The term framebuffer describes some VRAM memory used for scanout.

This framebuffer is exposed to userspace through some framebuffer 
driver, on UEFI platforms that is usually efifb but can be quite a 
bunch of different drivers.


When the DRM drivers load they remove the previous drivers using 
drm_aperture_remove_conflicting_pci_framebuffers() (or a similar 
function), but this does not mean that the framebuffer or scanout 
parameters are modified in any way. It just means that the 
framebuffer is no longer exposed through this driver.


Takeover is precisely the right description here because that's 
exactly what's happening. The framebuffer configuration including the 
VRAM memory as well as the parameters for scanout are exposed by the 
newly loaded DRM driver.


In other words userspace can query through the DRM interfaces which 
monitors are already driven by the hardware and so in your terminology 
figure out which is the primary one.



I'm not quite convinced by this idea, though you might be correct.


Well I can point you to the code if you don't believe me.


But there are cases where there are multiple monitors and each video card
is connected to one.


Yeah, but this is irrelevant. The key point is the configuration is 
taken over when the driver loads.


So whatever setup was there before (one monitor showing console, three 
monitors mirrored, whatever) should be there after loading the driver as 
well. This configuration is just immediately overwritten because nobody 
cares about it.




It is also quite common that no monitor is connected: let the machine boot
first, then find a monitor and connect it to a random display output to see
which one will display. I don't expect the primary to change with that.
The primary one has to be determined as early as possible, because
the VGA console and the framebuffer console may output directly to the 
primary.


Well that is simply not correct. There is no concept of a "primary" 
display, it can just be that a monitor was brought up by the BIOS or 
bootloader and we take over this configuration.



Getting DDC and/or HPD involved may unnecessarily complicate the problem.

There are ASpeed BMCs that add a virtual connector in order to be able 
to display remotely.

There are also commands to force a connector into the connected status.


It's just that, as Thomas explained as well, this is completely 
irrelevant to any modern desktop. X and Wayland both iterate the 
available devices and start rendering to them; which one was used 
during boot doesn't really matter to them.



You may be correct, but I'm still not sure.
I probably need more time to investigate.
My colleagues and I are mainly using the X server,
with versions ranging from 1.20.4 to 1.21.1.4.
Even if this is true, the problems still exist for non-modern desktops.


Well, I have over 25 years of experience with display hardware and what 
you describe here was never an issue.


What you have is simply a broken display driver which for some reason 
can't handle your use case.


I strongly suggest that you just completely drop this here and go into 
the AST driver and try to fix it.


Regards,
Christian.




Apart from that, ranting like this and trying to explain stuff to 
people who obviously have a much better background in the topic is not 
going to help you get your patches upstream.




Thanks for sharing so much knowledge with me,
I realize where the problems are now.
I will try to resolve the concerns in the next version.



Regards,
Christian.

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-06 Thread Christian König


On 06.09.23 at 12:31, Sui Jingfeng wrote:

Hi,

On 2023/9/6 14:45, Christian König wrote:
The firmware framebuffer device already gets killed by the 
drm_aperture_remove_conflicting_pci_framebuffers()
function (or its siblings). So this series is definitely not meant to 
interact with the firmware framebuffer
(or more intelligent framebuffer drivers). It is for user space 
programs, such as the X server and Wayland
compositors. It's for Linux users or DRM driver testers, and allows 
them to direct the graphics display server
to use the hardware of interest as the primary video card.

Also, I believe that the X server and Wayland compositors are the best 
test examples.

If a specific DRM driver can't work with the X server as a primary,
then there is probably something wrong.



But what's the use case for overriding this setting?



On a specific machine with multiple GPUs mounted,
only the primary graphics device gets POST-ed (initialized) by the
firmware. Therefore, the DRM drivers for the rest of the video cards
have to work without the prerequisite setup done by the firmware. This 
setup is called POST.


Well, you don't seem to understand the background here. This is 
perfectly normal behavior.


Secondary cards are posted after loading the appropriate DRM driver. 
At least for amdgpu this is done by calling the appropriate functions 
in the BIOS. 



Well, thanks for telling me this. You know more than me and 
definitely have a better understanding.


Are you telling me that the POST function for AMDGPU resides in the BIOS?
Does the kernel call into the BIOS?


Yes, exactly that.

Does the BIOS here refer to the UEFI runtime or ATOM BIOS or something 
else?


On dGPUs it's the VBIOS on a flashrom on the board, for iGPUs (APUs as 
AMD calls them) it's part of the system BIOS.


UEFI is actually just a small subsystem in the system BIOS which 
replaced the old interface used between system BIOS, video BIOS and 
operating system.




But the POST function for the drm ast driver resides in kernel space 
(in other words, in ast.ko).

Is this statement correct?


I don't know the ast driver well enough to answer that, but I assume 
they just read the BIOS and execute the appropriate functions.




I mean that for the ASpeed BMC chip, if the firmware does not POST the 
display controller,
then we have to POST it in kernel space before doing the various 
modeset operations.

We can only POST this chip by directly operating the various registers.
Am I correct in my judgement about the ast drm driver?


Well POST just means Power On Self Test, but what you mean is 
initializing the hardware.


Some drivers can of course initialize the hardware without the help of 
the BIOS, but I don't think AST can do that. As far as I know it's a 
relatively simple driver.


BTW firmware is not the same as the BIOS (which runs the POST), firmware 
usually refers to something run on microcontrollers inside the ASIC 
while the (system or video) BIOS runs on the host CPU.


Regards,
Christian.



Thanks for your reviews.

Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time

2023-09-06 Thread Christian König


On 06.09.23 at 11:08, suijingfeng wrote:

Well, you are welcome to correct me if I'm wrong.


You seem to have some very basic misunderstandings here.

The term framebuffer describes some VRAM memory used for scanout.

This framebuffer is exposed to userspace through some framebuffer 
driver, on UEFI platforms that is usually efifb but can be quite a bunch 
of different drivers.


When the DRM drivers load they remove the previous drivers using 
drm_aperture_remove_conflicting_pci_framebuffers() (or a similar 
function), but this does not mean that the framebuffer or scanout 
parameters are modified in any way. It just means that the framebuffer 
is no longer exposed through this driver.


"Take over" is the perfectly right description here because that's exactly 
what's happening. The framebuffer configuration, including the VRAM 
memory as well as the parameters for scanout, is exposed by the newly 
loaded DRM driver.


In other words, userspace can query through the DRM interfaces which 
monitors are already driven by the hardware and so, in your terminology, 
figure out which is the primary one.
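
As a rough illustration of such a query (an editorial sketch, not code 
from this thread): besides the DRM ioctls, the kernel also lists each 
connector under /sys/class/drm with a "status" attribute (connected, 
disconnected or unknown) and an "enabled" attribute (whether an encoder 
is currently attached). The helper name list_connectors and its 
drm_root parameter are made up for this example.

```python
from pathlib import Path


def list_connectors(drm_root="/sys/class/drm"):
    """Return (name, status, enabled) for every DRM connector under drm_root.

    list_connectors and drm_root are hypothetical names for this sketch;
    only the sysfs layout (cardN-<connector>/status and .../enabled) comes
    from the kernel.
    """
    results = []
    # Connector directories look like "card0-HDMI-A-1"; plain "card0"
    # entries are the devices themselves and are skipped by the pattern.
    for entry in sorted(Path(drm_root).glob("card*-*")):
        status_file = entry / "status"
        enabled_file = entry / "enabled"
        if not status_file.is_file():
            continue
        status = status_file.read_text().strip()
        # Fall back to "unknown" if the attribute is missing.
        if enabled_file.is_file():
            enabled = enabled_file.read_text().strip()
        else:
            enabled = "unknown"
        results.append((entry.name, status, enabled))
    return results


if __name__ == "__main__":
    for name, status, enabled in list_connectors():
        print(f"{name}: {status}, {enabled}")
```

A connector reported as both connected and enabled right after the DRM 
driver has loaded is, under these assumptions, one that was already 
being driven when the configuration was taken over.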


It's just that, as Thomas explained as well, this is completely 
irrelevant to any modern desktop. X and Wayland both iterate the 
available devices and start rendering to them; which one was used during 
boot doesn't really matter to them.


Apart from that, ranting like this and trying to explain stuff to people 
who obviously have a much better background in the topic is not going to 
help you get your patches upstream.


Regards,
Christian.
