pool: Introduce a way to skip clear on free

Nirmoy Das Thu, 20 Jun 2024 08:11:19 -0700

Hi Christian,

On 6/20/2024 4:45 PM, Christian König wrote:

Hi Nirmoy,
Am 20.06.24 um 16:37 schrieb Nirmoy Das:
Hi Christian,

On 6/20/2024 4:08 PM, Christian König wrote:
Am 20.06.24 um 15:46 schrieb Nirmoy Das:
Clearing pages can be very slow when using the CPU, but GPUs can perform this task much faster. With this new pool API, a driver can decide whether it wants to clear pages using the GPU. This provides the flexibility to choose the desired clear policy, either during allocation or upon freeing, as per the driver's preference.
We already have TTM_TT_FLAG_ZERO_ALLOC to indicate whether pages need to be cleared on alloc from the OS.
I'm not sure if we really need the option to not clear them in the pool as well, but if we really need this, I suggest switching from clear on free back to clear on alloc and just honoring the flag.
Perf reported higher latency because of clearing pages before giving them back to the pool. I think it would be nice if a DRM driver could avoid that.
I can modify this to move page clearing into ttm_pool_type_take() to honor the TTM_TT_FLAG_ZERO_ALLOC flag.
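A minimal userspace sketch of the clear-on-alloc variant discussed above, assuming a simple LIFO pool: pages go back to the pool untouched on free, and a page is cleared on take only when the caller asks for zeroed memory. `struct model_page`, `struct model_pool`, `model_pool_give()` and `model_pool_take()` are hypothetical stand-ins, not TTM's `ttm_pool_type_give()`/`ttm_pool_type_take()`, and `memset()` stands in for `clear_page()`/`clear_highpage()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model: MODEL_PAGE_SIZE, struct model_page and
 * struct model_pool are illustrative stand-ins, not TTM definitions. */
#define MODEL_PAGE_SIZE 4096
#define MODEL_POOL_CAP  16

struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

struct model_pool {
	struct model_page *pages[MODEL_POOL_CAP];
	size_t count;
};

/* Free path: return a page to the pool as-is, with no clearing cost. */
static bool model_pool_give(struct model_pool *pool, struct model_page *p)
{
	if (pool->count == MODEL_POOL_CAP)
		return false;	/* pool full: the caller frees the page instead */
	pool->pages[pool->count++] = p;
	return true;
}

/* Alloc path: pop a page and clear it only when zeroed memory was
 * requested, in the spirit of honoring TTM_TT_FLAG_ZERO_ALLOC at take time. */
static struct model_page *model_pool_take(struct model_pool *pool,
					  bool zero_alloc)
{
	struct model_page *p;

	if (pool->count == 0)
		return NULL;	/* empty: the caller falls back to the page allocator */
	p = pool->pages[--pool->count];
	if (zero_alloc)
		memset(p->data, 0, sizeof(p->data));
	return p;
}
```

The trade-off raised in the reply shows up directly: the clearing cost moves from the free path into `model_pool_take()`, so zeroed allocations pay it at allocation time.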
Both approaches have some pros and cons. IIRC we intentionally moved the clearing to the free function to avoid latency on allocation.

Ah, I see. I will send a patch to honor the TTM_TT_FLAG_ZERO_ALLOC flag, and if we need clear-on-free, we can add another flag for that.

Alternatively, you could also split the pools into cleared and uncleared pages.
Could you expand on this, please?
Just create separate pools for cleared and uncleared pages (or separate lists inside the pools).
Then when you see the TTM_TT_FLAG_ZERO_ALLOC flag, you try to grab things from the uncleared pool, and if you don't see it, try to grab things from the cleared pool.
Same for release of pages, just the other way around.
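One way the split could be modeled, as a hedged sketch: each pool keeps a cleared and an uncleared free list, a take tries a preferred list and falls back to the other, and a release files the page according to its state. All names below are hypothetical; pages are opaque pointers, and which list a given TTM_TT_FLAG_ZERO_ALLOC setting should prefer is left to the caller through `prefer_cleared`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: two free lists per pool, one for pages known to be
 * cleared and one for pages that may still hold old data. */
#define SPLIT_POOL_CAP 8

struct split_list {
	void *pages[SPLIT_POOL_CAP];
	size_t count;
};

struct split_pool {
	struct split_list cleared;
	struct split_list uncleared;
};

static void *split_list_pop(struct split_list *l)
{
	return l->count ? l->pages[--l->count] : NULL;
}

/* Try the preferred list first, then fall back to the other one.
 * On success, *got_cleared reports which list the page came from so the
 * caller knows whether it still has to clear it; on NULL it is untouched. */
static void *split_pool_take(struct split_pool *pool, bool prefer_cleared,
			     bool *got_cleared)
{
	struct split_list *first = prefer_cleared ? &pool->cleared : &pool->uncleared;
	struct split_list *second = prefer_cleared ? &pool->uncleared : &pool->cleared;
	void *p = split_list_pop(first);

	if (p) {
		*got_cleared = (first == &pool->cleared);
		return p;
	}
	p = split_list_pop(second);
	if (p)
		*got_cleared = (second == &pool->cleared);
	return p;
}

/* Release: file the page under the list that matches its state. */
static bool split_pool_give(struct split_pool *pool, void *page, bool is_cleared)
{
	struct split_list *l = is_cleared ? &pool->cleared : &pool->uncleared;

	if (l->count == SPLIT_POOL_CAP)
		return false;	/* list full: the caller frees the page instead */
	l->pages[l->count++] = page;
	return true;
}
```

Reporting `got_cleared` lets the caller decide whether a page taken from the uncleared list still needs clearing, on the CPU or, as proposed in this thread, by the GPU.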

I get it now. My main goal is to avoid CPU clearing, so this will work well too with the above change.

I have another question. Our userspace team has found that there is higher latency for ttm_cached buffers as well, and using GPU clear doesn't help much
because the kernel will clear pages anyway if init_on_alloc/CONFIG_INIT_ON_ALLOC_DEFAULT_ON is active. The only way I see to mitigate this is to use a pool for
ttm_cached buffers. I was thinking of using a pool flag to also allow a DRM driver to create a pool for ttm_cached. What do you think about it, and
is there any better solution?
Well I would clearly have to NAK a hack like this.


I thought so :)

We only have a pool for uncached and WC pages because of the lack of support for that in the general memory and DMA management.
The TTM_TT_FLAG_ZERO_ALLOC flag should control whether GFP_ZERO is set or not. If the core MM decides to ignore that and clear pages anyway, then you need to talk to the core MM people if you want to avoid that.
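A small sketch of the flag mapping described above, under stated assumptions: the constants are hypothetical stand-ins for `TTM_TT_FLAG_ZERO_ALLOC` and the kernel's GFP bits (the kernel's zeroing bit is `__GFP_ZERO`), with made-up values. It only shows that the zero request is forwarded to the allocator when the object asks for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real TTM_TT_FLAG_ZERO_ALLOC, GFP_USER and
 * __GFP_ZERO live in kernel headers and do NOT have these values. */
#define MODEL_TT_FLAG_ZERO_ALLOC (1u << 0)
#define MODEL_GFP_BASE           (1u << 4)
#define MODEL_GFP_ZERO           (1u << 8)

/* Derive allocator flags from the tt's page flags: ask the core allocator
 * for zeroed pages only when the object requested them. */
static uint32_t model_gfp_for_tt(uint32_t page_flags)
{
	uint32_t gfp = MODEL_GFP_BASE;

	if (page_flags & MODEL_TT_FLAG_ZERO_ALLOC)
		gfp |= MODEL_GFP_ZERO;
	return gfp;
}
```

This mapping cannot stop zeroing inside the core MM: with init_on_alloc enabled, the page allocator clears pages even when the zero bit is absent, which is the case raised earlier in the thread.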

Now I know why we don't have a pool for WB. It also seems the pools are x86-exclusive. I just found that there was a proposal in core MM [1] to avoid init_on_alloc, but I haven't seen any follow-up with a quick search.

[1]https://patchwork.kernel.org/project/linux-mm/patch/20230831105252.1385911-1-zhaoyang.hu...@unisoc.com/#25494667



Thanks a lot,

Nirmoy


Regards,
Christian.



Thanks,

Nirmoy


Regards,
Christian.


Cc: Christian Koenig <christian.koe...@amd.com>
Cc: "Thomas Hellström" <thomas.hellst...@linux.intel.com>
Cc: Matthew Auld <matthew.a...@intel.com>
Signed-off-by: Nirmoy Das <nirmoy....@intel.com>
---
  drivers/gpu/drm/ttm/ttm_device.c | 42 +++++++++++++++++++++++----
  drivers/gpu/drm/ttm/ttm_pool.c   | 49 +++++++++++++++++++++++++-------

  include/drm/ttm/ttm_device.h     |  8 ++++++
  include/drm/ttm/ttm_pool.h       | 11 +++++++
  4 files changed, 94 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 434cf0258000..54a3ea825c2e 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -191,15 +191,19 @@ EXPORT_SYMBOL(ttm_device_swapout);
   * @vma_manager: A pointer to a vma manager.
   * @use_dma_alloc: If coherent DMA allocation API should be used.
   * @use_dma32: If we should use GFP_DMA32 for device memory allocations.
+ * @pool_flags: Desired pool flags
   *
   * Initializes a struct ttm_device:
   * Returns:
   * !0: Failure.
   */
-int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
-            struct device *dev, struct address_space *mapping,
-            struct drm_vma_offset_manager *vma_manager,
-            bool use_dma_alloc, bool use_dma32)
+int ttm_device_init_with_pool_flags(struct ttm_device *bdev,
+                    const struct ttm_device_funcs *funcs,
+                    struct device *dev,
+                    struct address_space *mapping,
+                    struct drm_vma_offset_manager *vma_manager,
+                    bool use_dma_alloc, bool use_dma32,
+                    unsigned int pool_flags)
  {
      struct ttm_global *glob = &ttm_glob;
      int ret, nid;
@@ -227,7 +231,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
      else
          nid = NUMA_NO_NODE;

-    ttm_pool_init(&bdev->pool, dev, nid, use_dma_alloc, use_dma32);
+    ttm_pool_init_with_flags(&bdev->pool, dev, nid, use_dma_alloc,
+                 use_dma32, pool_flags);

      bdev->vma_manager = vma_manager;
      spin_lock_init(&bdev->lru_lock);
@@ -239,6 +244,33 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func

      return 0;
  }
+EXPORT_SYMBOL(ttm_device_init_with_pool_flags);
+
+
+/**
+ * ttm_device_init
+ *
+ * @bdev: A pointer to a struct ttm_device to initialize.
+ * @funcs: Function table for the device.
+ * @dev: The core kernel device pointer for DMA mappings and allocations.
+ * @mapping: The address space to use for this bo.
+ * @vma_manager: A pointer to a vma manager.
+ * @use_dma_alloc: If coherent DMA allocation API should be used.
+ * @use_dma32: If we should use GFP_DMA32 for device memory allocations.
+ *
+ * Initializes a struct ttm_device:
+ * Returns:
+ * !0: Failure.
+ */
+int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
+            struct device *dev, struct address_space *mapping,
+            struct drm_vma_offset_manager *vma_manager,
+            bool use_dma_alloc, bool use_dma32)
+{
+    return ttm_device_init_with_pool_flags(bdev, funcs, dev, mapping,
+                           vma_manager, use_dma_alloc,
+                           use_dma32, 0);
+}
  EXPORT_SYMBOL(ttm_device_init);

  void ttm_device_fini(struct ttm_device *bdev)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 6e1fd6985ffc..6f33c3e7cdf2 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -222,15 +222,17 @@ static void ttm_pool_unmap(struct ttm_pool *pool, dma_addr_t dma_addr,
  }

  /* Give pages into a specific pool_type */
-static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p)
+static void ttm_pool_type_give(struct ttm_pool_type *pt, struct page *p, bool skip_clear)
  {
      unsigned int i, num_pages = 1 << pt->order;

-    for (i = 0; i < num_pages; ++i) {
-        if (PageHighMem(p))
-            clear_highpage(p + i);
-        else
-            clear_page(page_address(p + i));
+    if (!skip_clear) {
+        for (i = 0; i < num_pages; ++i) {
+            if (PageHighMem(p))
+                clear_highpage(p + i);
+            else
+                clear_page(page_address(p + i));
+        }
      }

      spin_lock(&pt->lock);
@@ -396,7 +398,10 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
      struct page **pages = &tt->pages[start_page];
      unsigned int order;
      pgoff_t i, nr;
+    bool skip_clear = false;

+    if (pool->flags & TTM_POOL_FLAG_SKIP_CLEAR_ON_FREE)
+        skip_clear = true;
      for (i = start_page; i < end_page; i += nr, pages += nr) {
          struct ttm_pool_type *pt = NULL;
@@ -407,7 +412,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,

          pt = ttm_pool_select_type(pool, caching, order);
          if (pt)
-            ttm_pool_type_give(pt, *pages);
+            ttm_pool_type_give(pt, *pages, skip_clear);
          else
              ttm_pool_free_page(pool, caching, order, *pages);
      }
@@ -550,18 +555,21 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
  EXPORT_SYMBOL(ttm_pool_free);

  /**
- * ttm_pool_init - Initialize a pool
+ * ttm_pool_init_with_flags - Initialize a pool with flags
   *
   * @pool: the pool to initialize
   * @dev: device for DMA allocations and mappings
   * @nid: NUMA node to use for allocations
   * @use_dma_alloc: true if coherent DMA alloc should be used
   * @use_dma32: true if GFP_DMA32 should be used
+ * @flags: control flags for the pool
+ *
+ * Initialize the pool and its pool types with flags to modify defaults
   *
- * Initialize the pool and its pool types.
   */
-void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
-           int nid, bool use_dma_alloc, bool use_dma32)
+void ttm_pool_init_with_flags(struct ttm_pool *pool, struct device *dev,
+           int nid, bool use_dma_alloc, bool use_dma32,
+           unsigned int flags)
  {
      unsigned int i, j;
@@ -571,6 +579,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
      pool->nid = nid;
      pool->use_dma_alloc = use_dma_alloc;
      pool->use_dma32 = use_dma32;
+    pool->flags = flags;

      for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
          for (j = 0; j < NR_PAGE_ORDERS; ++j) {
@@ -585,6 +594,24 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
          }
      }
  }
+EXPORT_SYMBOL(ttm_pool_init_with_flags);
+
+/**
+ * ttm_pool_init - Initialize a pool
+ *
+ * @pool: the pool to initialize
+ * @dev: device for DMA allocations and mappings
+ * @nid: NUMA node to use for allocations
+ * @use_dma_alloc: true if coherent DMA alloc should be used
+ * @use_dma32: true if GFP_DMA32 should be used
+ *
+ * Initialize the pool and its pool types.
+ */
+void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
+           int nid, bool use_dma_alloc, bool use_dma32)
+{
+    ttm_pool_init_with_flags(pool, dev, nid, use_dma_alloc, use_dma32, 0);
+}
  EXPORT_SYMBOL(ttm_pool_init);

  /**
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index c22f30535c84..1b20c5798e97 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -291,6 +291,14 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
              struct device *dev, struct address_space *mapping,
              struct drm_vma_offset_manager *vma_manager,
              bool use_dma_alloc, bool use_dma32);
+int ttm_device_init_with_pool_flags(struct ttm_device *bdev,
+                    const struct ttm_device_funcs *funcs,
+                    struct device *dev,
+                    struct address_space *mapping,
+                    struct drm_vma_offset_manager *vma_manager,
+                    bool use_dma_alloc, bool use_dma32,
+                    unsigned int pool_flags);
+
  void ttm_device_fini(struct ttm_device *bdev);
  void ttm_device_clear_dma_mappings(struct ttm_device *bdev);

diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 160d954a261e..9822996309e5 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -66,10 +66,17 @@ struct ttm_pool_type {
   * @use_dma_alloc: if coherent DMA allocations should be used
   * @use_dma32: if GFP_DMA32 should be used
   * @caching: pools for each caching/order
+ * @flags: flags to control certain pool behaviour
+ *
+ * The @flags can be:
+ *  - %TTM_POOL_FLAG_SKIP_CLEAR_ON_FREE - This flag can be used to
+ *    skip clear on free when the driver decides to do that on its own.
   */
  struct ttm_pool {
      struct device *dev;
      int nid;
+#define TTM_POOL_FLAG_SKIP_CLEAR_ON_FREE    (1 << 0)
+    unsigned int flags;
        bool use_dma_alloc;
      bool use_dma32;
@@ -85,6 +92,10 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt);

  void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
             int nid, bool use_dma_alloc, bool use_dma32);
+void ttm_pool_init_with_flags(struct ttm_pool *pool, struct device *dev,
+                  int nid, bool use_dma_alloc, bool use_dma32,
+                  unsigned int flags);
+
  void ttm_pool_fini(struct ttm_pool *pool);

  int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m);
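A userspace model of the RFC's control flow, offered as a sketch rather than the patch itself: the flags are stored at init time, the legacy init entry point forwards with flags set to 0, and the free path derives `skip_clear` once from the pool flags before handing each page to the give function. All `model_`/`MODEL_` names are hypothetical; `MODEL_POOL_FLAG_SKIP_CLEAR_ON_FREE` plays the role of `TTM_POOL_FLAG_SKIP_CLEAR_ON_FREE`, with the macro body parenthesized as is conventional for flag macros. A page that does not fit in the pool is simply not pooled here, whereas TTM would release it with `ttm_pool_free_page()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of the RFC; not TTM code. */
#define MODEL_POOL_FLAG_SKIP_CLEAR_ON_FREE (1u << 0)
#define MODEL_PAGE_BYTES 64
#define MODEL_MAX_PAGES  8

struct model_pool {
	unsigned int flags;
	unsigned char *free_pages[MODEL_MAX_PAGES];
	size_t nr_free;
};

/* Like ttm_pool_init_with_flags(): remember the flags with the pool. */
static void model_pool_init_with_flags(struct model_pool *pool, unsigned int flags)
{
	memset(pool, 0, sizeof(*pool));
	pool->flags = flags;
}

/* Like ttm_pool_init(): the old entry point forwards with flags == 0,
 * so existing callers keep clear-on-free behavior. */
static void model_pool_init(struct model_pool *pool)
{
	model_pool_init_with_flags(pool, 0);
}

/* Like ttm_pool_type_give(): clear the page unless told to skip. */
static bool model_pool_give(struct model_pool *pool, unsigned char *page,
			    bool skip_clear)
{
	if (pool->nr_free == MODEL_MAX_PAGES)
		return false;	/* no room: page is not pooled */
	if (!skip_clear)
		memset(page, 0, MODEL_PAGE_BYTES);
	pool->free_pages[pool->nr_free++] = page;
	return true;
}

/* Like ttm_pool_free_range(): compute skip_clear once from the pool
 * flags, then give every page back; returns how many were pooled. */
static size_t model_pool_free_range(struct model_pool *pool,
				    unsigned char **pages, size_t n)
{
	bool skip_clear = pool->flags & MODEL_POOL_FLAG_SKIP_CLEAR_ON_FREE;
	size_t given = 0;

	for (size_t i = 0; i < n; ++i)
		if (model_pool_give(pool, pages[i], skip_clear))
			++given;
	return given;
}
```

Keeping `ttm_pool_init()` and `ttm_device_init()` as thin wrappers that pass 0 means existing drivers keep clear-on-free unchanged, and only drivers that opt in through the `_with_flags` variants skip the CPU clear.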

Re: [RFC PATCH 1/2] drm/ttm/pool: Introduce a way to skip clear on free
