RE: [PATCH] Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"

2023-05-28 Thread Xiao, Shane
[AMD Official Use Only - General]

> -Original Message-
> From: Paneer Selvam, Arunpravin 
> Sent: Saturday, May 27, 2023 2:12 AM
> To: Xiao, Shane ; Kuehling, Felix
> ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> 
> Subject: Re: [PATCH] Revert "drm/amdgpu: remove TOPDOWN flags when
> allocating VRAM in large bar system"
>
>
>
> On 5/22/2023 8:52 AM, Xiao, Shane wrote:
> > [AMD Official Use Only - General]
> >
> >> -Original Message-
> >> From: Kuehling, Felix 
> >> Sent: Sunday, May 21, 2023 2:39 AM
> >> To: Paneer Selvam, Arunpravin ;
> >> amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander ; Koenig, Christian
> >> ; Xiao, Shane 
> >> Subject: Re: [PATCH] Revert "drm/amdgpu: remove TOPDOWN flags when
> >> allocating VRAM in large bar system"
> >>
> >> Am 2023-05-20 um 05:25 schrieb Arunpravin Paneer Selvam:
> >>> This reverts commit c105518679b6e87232874ffc989ec403bee59664.
> >>>
> >>> This patch disables the TOPDOWN flag for APUs and a few dGPU cards
> >>> which have the VRAM size equal to the BAR size.
> >> With resizable BARs it's not that rare.
> >>
> >>
> >>> When we enable the TOPDOWN flag, we get the free blocks from the
> >>> highest available memory region and we don't split the lower-order
> >>> blocks.
> >>> This change is required to avoid the fragmentation-related issues,
> >>> particularly on ASICs which have VRAM space <= 500MiB.
> > As far as I can understand, both ways may cause fragmentation issues.
> >
> > As I understand it, removing the TOPDOWN flag has two advantages:
> > 1) It can reduce the search time to O(1);
> > 2) It reduces the risk of splitting higher-order blocks into lower orders
> > when allocating.
> >
> > But as for the second point, removing the TOPDOWN flag still carries the
> > risk of causing fragmentation issues in some scenarios.
> > For example, an app may need to allocate (1M + 4K) of VRAM, which is not
> > a power of two.
> >
> > Case 1: Without the TOPDOWN flag, the 1M and 4K pieces are allocated from
> > different free-list orders.
> > --If that 4K buddy and 1M buddy are later freed, they may prevent the
> > system from merging 4K (or 1M) blocks back into 8K (or 2M) blocks.
> > --If the app makes many such allocations, this can cause fragmentation
> > under memory stress.
> >
> > Case 2: With the TOPDOWN flag, we find the highest-order block that can
> > satisfy both the 4K and the 1M needs.
> > --Assuming the highest order is 8M, that 8M block is split into lower
> > orders to satisfy the 1M and 4K sizes, respectively.
> > --In this case the 8M block ends up split into several lower orders.
> > --If any other thread or process then allocates 4K or 1M of VRAM, there
> > is again a chance that the system is prevented from merging 4K (or 1M)
> > blocks back into 8K (or 2M) blocks.
> >
> > If apps have many such non-power-of-two allocation needs, one option is
> > to use a suballocator in the UMD, as ROCr does.
> > That is, the suballocator allocates power-of-two chunks (for example 2M)
> > from the driver, and the app then allocates memory from the suballocator.
> > That may reduce the risk of causing fragmentation issues, but I am not
> > sure such an option is feasible.
> >
> > If case 2 occurs less frequently than case 1, then we should use the
> > TOPDOWN flag whether the system is large-BAR or not.
> Yes, it is better to set the TOPDOWN flag, mainly because it avoids
> splitting the lower-order blocks, which resolves the fragmentation issues,
> and it does not try to allocate from visible memory (provided the top-down
> region has enough space for the request), which improves graphics
> performance as well.

Hi Arun,
Thinking about it more, gfx needs contiguous VRAM for display.
In that case, the TOPDOWN flag can meet this need better than not using it,
because a drm buddy TOPDOWN allocation is more likely to satisfy the check in
amdgpu_is_vram_mgr_blocks_contiguous, which means the returned blocks are
contiguous.

As you mentioned in issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2270 , one of the reasons may
be that the TOPDOWN flag in drm buddy makes it easier to get contiguous VRAM
for sizes that are not a power of two.
That means we do not need to allocate contiguous VRAM again and then copy the
data for display.
This way we get good performance and reduce the risk of allocation failures.
If so, it is necessary to use the TOPDOWN flag for gfx.
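
To make the contiguity property concrete, here is a standalone userspace
sketch of the kind of check amdgpu_is_vram_mgr_blocks_contiguous performs.
The struct and helper below are simplified assumptions for illustration,
not the in-kernel implementation:

/*
 * Sketch: a list of allocated blocks, each described by (start, size), is
 * "contiguous" if every block begins exactly where the previous one ends.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct block {
	uint64_t start;
	uint64_t size;
};

static bool blocks_contiguous(const struct block *blocks, size_t count)
{
	for (size_t i = 1; i < count; i++) {
		/* The next block must begin where the previous one ends. */
		if (blocks[i - 1].start + blocks[i - 1].size != blocks[i].start)
			return false;
	}
	return count > 0;
}

int main(void)
{
	/* A 1M block followed immediately by a 4K block: contiguous. */
	struct block ok[]  = { { 0x700000, 0x100000 }, { 0x800000, 0x1000 } };
	/* The same sizes with a hole in between: not contiguous. */
	struct block gap[] = { { 0x700000, 0x100000 }, { 0x900000, 0x1000 } };

	printf("ok:  %d\n", blocks_contiguous(ok, 2));
	printf("gap: %d\n", blocks_contiguous(gap, 2));
	return 0;
}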


Hi Felix & Christian,
However, for compute, ROCr has a suballocator that allocates power-of-two
sizes from drm buddy.
If we use the TOPDOWN flag, there is a potential problem of splitting higher
orders into lower orders, as sketched below.
If we split too many higher-order blocks into lower orders, it may increase
the risk of failing to allocate higher-order blocks under memory pressure.
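
As a rough standalone illustration of that splitting (a userspace sketch;
the 8M starting block and the 1M + 4K request are assumptions, not actual
driver behavior), carving a non-power-of-two request out of one large block
leaves a whole ladder of smaller free buddies behind, which is the
fragmentation risk described in case 2 above:

#include <stdint.h>
#include <stdio.h>

#define KB(x) ((uint64_t)(x) << 10)
#define MB(x) ((uint64_t)(x) << 20)

/*
 * Each split halves the block: one half is kept for further splitting (or
 * for the allocation itself), while its buddy of equal size stays free.
 */
static void carve(uint64_t block, uint64_t want)
{
	while (block > want) {
		block >>= 1;
		printf("  freed buddy: %4llu KiB\n",
		       (unsigned long long)(block >> 10));
	}
	printf("  allocated:   %4llu KiB\n",
	       (unsigned long long)(block >> 10));
}

int main(void)
{
	printf("carving 1M out of a free 8M block:\n");
	carve(MB(8), MB(1));	/* frees buddies of 4M, 2M and 1M */

	printf("carving 4K out of the leftover 1M buddy:\n");
	carve(MB(1), KB(4));	/* frees buddies of 512K, 256K, ..., 8K, 4K */
	return 0;
}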

If so, my question is: do we need to consider using different allocation
strategies for gfx and compute on large-BAR systems?

Best Regards,
Shane

>
> Regards,
> Arun.
> >
> > Best 

RE: [PATCH] drm/amdgpu: change reserved vram info print

2023-05-28 Thread Chai, Thomas
[AMD Official Use Only - General]

Ping 


-
Best Regards,
Thomas

-Original Message-
From: Chai, Thomas 
Sent: Thursday, May 25, 2023 4:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Zhou1, Tao ; Li, Candice ; Yang, Stanley 
; Chai, Thomas 
Subject: [PATCH] drm/amdgpu: change reserved vram info print

The objects on the mgr->reserved_pages list are linked through the blocks
member of struct amdgpu_vram_reservation, not through the link member of
struct drm_buddy_block.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 89d35d194f2c..c7085a747b03 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -839,7 +839,7 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
 {
        struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
        struct drm_buddy *mm = &mgr->mm;
-       struct drm_buddy_block *block;
+       struct amdgpu_vram_reservation *rsv;
 
        drm_printf(printer, "  vis usage:%llu\n",
                   amdgpu_vram_mgr_vis_usage(mgr));
@@ -851,8 +851,9 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man,
 
        drm_buddy_print(mm, printer);
 
        drm_printf(printer, "reserved:\n");
-       list_for_each_entry(block, &mgr->reserved_pages, link)
-               drm_buddy_block_print(mm, block, printer);
+       list_for_each_entry(rsv, &mgr->reserved_pages, blocks)
+               drm_printf(printer, "%#018llx-%#018llx: %llu\n",
+                          rsv->start, rsv->start + rsv->size, rsv->size);
        mutex_unlock(&mgr->lock);
 }

--
2.34.1
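
As background for the fix above, here is a standalone userspace sketch
(simplified list and struct definitions, not the kernel implementation) of
why the member argument to list_for_each_entry() matters: each entry is
recovered with container_of() relative to that member, so walking
mgr->reserved_pages with the wrong type/member pair yields bogus pointers.

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vram_reservation {
	unsigned long long start;
	unsigned long long size;
	struct list_head blocks;	/* this member is what sits on the list */
};

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

int main(void)
{
	struct list_head reserved = { &reserved, &reserved };
	struct vram_reservation rsv = { 0x800000, 0x100000, { NULL, NULL } };

	list_add(&rsv.blocks, &reserved);

	/* container_of() must name 'blocks', the member that is on the list. */
	for (struct list_head *pos = reserved.next; pos != &reserved; pos = pos->next) {
		struct vram_reservation *r =
			container_of(pos, struct vram_reservation, blocks);
		printf("%#llx-%#llx: %llu\n", r->start, r->start + r->size, r->size);
	}
	return 0;
}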



[PATCH][next] drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members

2023-05-28 Thread Gustavo A. R. Silva
Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.

Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length
arrays in a union into flexible-array members. And replace a one-element
array with a C99 flexible-array member.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1009:89: warning: array subscript kk is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1007:94: warning: array subscript kk is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1310:94: warning: array subscript k is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1309:57: warning: array subscript k is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=]

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/300
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/include/discovery.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/discovery.h b/drivers/gpu/drm/amd/include/discovery.h
index 9181e57887db..f43e29722ef7 100644
--- a/drivers/gpu/drm/amd/include/discovery.h
+++ b/drivers/gpu/drm/amd/include/discovery.h
@@ -122,7 +122,7 @@ typedef struct ip_v3
        uint8_t sub_revision : 4;   /* HCID Sub-Revision */
        uint8_t variant : 4;        /* HW variant */
 #endif
-       uint32_t base_address[1];   /* Base Address list. Corresponds to the num_base_address field*/
+       uint32_t base_address[];    /* Base Address list. Corresponds to the num_base_address field*/
 } ip_v3;
 
 typedef struct ip_v4 {
@@ -140,8 +140,8 @@ typedef struct ip_v4 {
        uint8_t sub_revision : 4;   /* HCID Sub-Revision */
 #endif
        union {
-               uint32_t base_address[0];       /* 32-bit Base Address list. Corresponds to the num_base_address field*/
-               uint64_t base_address_64[0];    /* 64-bit Base Address list. Corresponds to the num_base_address field*/
+               DECLARE_FLEX_ARRAY(uint32_t, base_address);     /* 32-bit Base Address list. Corresponds to the num_base_address field*/
+               DECLARE_FLEX_ARRAY(uint64_t, base_address_64);  /* 64-bit Base Address list. Corresponds to the num_base_address field*/
        } __packed;
 } ip_v4;
 
-- 
2.34.1
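
As background, here is a standalone sketch of the idiom behind
DECLARE_FLEX_ARRAY(). Standard C does not allow a flexible array member to
be the sole member of a struct or to sit directly in a union, so the kernel
helper wraps it in an anonymous struct together with an empty placeholder
member. The expansion and the struct below are illustrative approximations
(empty structs are a GNU extension), not the real kernel header:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct ip_like {
	uint8_t num_base_address;
	union {
		/* Roughly what DECLARE_FLEX_ARRAY(uint32_t, base_address) expands to: */
		struct {
			struct { } __empty_base_address;
			uint32_t base_address[];
		};
		struct {
			struct { } __empty_base_address_64;
			uint64_t base_address_64[];
		};
	};
};

int main(void)
{
	/* Trailing storage is sized explicitly, as with any flexible array. */
	struct ip_like *ip = malloc(sizeof(*ip) + 2 * sizeof(uint64_t));
	if (!ip)
		return 1;

	ip->num_base_address = 2;
	ip->base_address_64[0] = 0x1000;
	ip->base_address_64[1] = 0x2000;

	for (int i = 0; i < ip->num_base_address; i++)
		printf("base[%d] = 0x%llx\n", i,
		       (unsigned long long)ip->base_address_64[i]);

	free(ip);
	return 0;
}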