date:20221125

[PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread Luben Tuikov

Fix minmax compilation error by using the correct constant and correct integer
suffix.

Cc: James Zhu 
Cc: Felix Kuehling 
Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system 
memory")
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 8a2e5716d8dba2..65715cb395d838 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -51,7 +51,7 @@
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_hmm.h"
 
-#define MAX_WALK_BYTE  (64ULL<<30)
+#define MAX_WALK_BYTE  (2UL << 30)
 
 /**
  * amdgpu_hmm_invalidate_gfx - callback to notify about mm change
@@ -197,8 +197,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
hmm_range->start, hmm_range->end);
 
/* Assuming 512MB takes maxmium 1 second to fault page address 
*/
-   timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL) *
-   HMM_RANGE_DEFAULT_TIMEOUT;
+   timeout = max((hmm_range->end - hmm_range->start) >> 29, 1UL);
+   timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
timeout = jiffies + msecs_to_jiffies(timeout);
 
 retry:

base-commit: 9e95ce4c60631c1339204f8723008a715391f410
-- 
2.39.0.rc0

Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread Luben Tuikov

On 2022-11-25 16:03, James Zhu wrote:
> 
> On 2022-11-25 14:42, Luben Tuikov wrote:
>> On 2022-11-25 04:57, Christian König wrote:
>>>
>>> Am 25.11.22 um 09:33 schrieb Luben Tuikov:
 On 2022-11-25 02:59, Christian König wrote:
> Am 25.11.22 um 08:56 schrieb Luben Tuikov:
>> On 2022-11-25 02:45, Christian König wrote:
>>> Am 24.11.22 um 22:19 schrieb Luben Tuikov:
 Fix minmax compilation error by using min_t()/max_t(), of the 
 assignment type.

 Cc: James Zhu 
 Cc: Felix Kuehling 
 Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate 
 large system memory")
 Signed-off-by: Luben Tuikov 
 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 index 8a2e5716d8dba2..d22d14b0ef0c84 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct 
 mmu_interval_notifier *notifier,
hmm_range->dev_private_owner = owner;
  
do {
 -  hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, 
 end);
 +  hmm_range->end = min_t(typeof(hmm_range->end),
 + hmm_range->start + MAX_WALK_BYTE,
 + end);
>>> Since end is a local variable I would strongly prefer to just have it
>>> use the correct type for it.
>>>
>>> Otherwise we might end up using something which doesn't work on all
>>> architectures.
>> They all appear to be "unsigned long". I thought, since we assign to
>> hmm_range->end, we use that type.
> Mhm, then why does the compiler complain here?
 Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 
 :-) ),
 and this is why the minmax check complains.

 So, since the left-hand expression is unsigned long,
 i.e.,
hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
 is,
unsigned long = min(unsigned long long, unsigned long);
 The compiler complains.

 I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX,
>>> That's not only a preference, but a must have. Otherwise the code maybe
>>> won't work as expected on 32bit architectures.
>> Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion
>> below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE
>> even matters--it wouldn't matter so long as the type in min_t() is u64.
>> It's a macro at the moment.
>>
>> However, the LHS--struct hmm_range's members are all
>> unsigned long and then we're essentially doing (unsigned long) = (u64),
>> which on 32-bit is (u32) = (u64).
> [JZ]MAX_WALK_BYTE can be reduce to #define MAX_WALK_BYTE (2UL<<30)

Hi James--okay, I'll prep up a patch.

Regards,
Luben

Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread James Zhu




On 2022-11-25 14:42, Luben Tuikov wrote:

On 2022-11-25 04:57, Christian König wrote:


Am 25.11.22 um 09:33 schrieb Luben Tuikov:

On 2022-11-25 02:59, Christian König wrote:

Am 25.11.22 um 08:56 schrieb Luben Tuikov:

On 2022-11-25 02:45, Christian König wrote:

Am 24.11.22 um 22:19 schrieb Luben Tuikov:

Fix minmax compilation error by using min_t()/max_t(), of the assignment type.

Cc: James Zhu 
Cc: Felix Kuehling 
Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system 
memory")
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 8a2e5716d8dba2..d22d14b0ef0c84 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,
hmm_range->dev_private_owner = owner;
 
 	do {

-   hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
+   hmm_range->end = min_t(typeof(hmm_range->end),
+  hmm_range->start + MAX_WALK_BYTE,
+  end);

Since end is a local variable I would strongly prefer to just have it
use the correct type for it.

Otherwise we might end up using something which doesn't work on all
architectures.

They all appear to be "unsigned long". I thought, since we assign to
hmm_range->end, we use that type.

Mhm, then why does the compiler complain here?

Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ),
and this is why the minmax check complains.

So, since the left-hand expression is unsigned long,
i.e.,
hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
is,
unsigned long = min(unsigned long long, unsigned long);
The compiler complains.

I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX,

That's not only a preference, but a must have. Otherwise the code maybe
won't work as expected on 32bit architectures.

Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion
below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE
even matters--it wouldn't matter so long as the type in min_t() is u64.
It's a macro at the moment.

However, the LHS--struct hmm_range's members are all
unsigned long and then we're essentially doing (unsigned long) = (u64),
which on 32-bit is (u32) = (u64).

[JZ]MAX_WALK_BYTE can be reduce to #define MAX_WALK_BYTE (2UL<<30)


Regards,
Luben


and be defined as UL. I mean, why is everything in struct hmm_range
"unsigned long", but we set a high limit of 10__h for an end, and
compare it to "end" to find the smaller? If our "end" could potentially
be 10__h then shouldn't the members in struct hmm_range be
unsigned long long as well?

No, that the hmm range depends on the address space bits of the CPU is
perfectly correct. Essentially this is just an userspace address range.

Our problem here is that this code needs to work on both 32bit and 64bit
systems. And on a 32bit system limiting the types wouldn't work
correctly as far as I can see.

So the compiler is complaining for rather good reasons and by using
"min_t(UL" we just hide that instead of fixing the problem.

I suggest to use "min_t(u64" instead. An intelligent compiler should
even be capable of optimizing this away by looking at the input types on
32bit archs.


And for the timeout, we have the (now) obvious,

timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL);

and I don't know why we necessarily need a "1ULL", when 1UL would do just fine,
and then compilation passes for that statement. I can set this to 1UL, instead
of using max_t().

I think just changing this to 1UL should be sufficient.

Regards,
Christian.


Regards,
Luben



As far as I can see "unsigned long" is correct here, but if we somehow
have a typecast then something is not working as expected.

Is MAX_WALK_BYTE maybe of signed type?


Would you prefer at the top of the function to define "timeout" and "end" as,
typeof(hmm_range->end) end, timeout;

Well for end that might make sense, but timeout is independent of the
hmm range.

Regards,
Christian.


Regards,
Luben

[PATCH] drm/amd/display: fix PSR-SU/DSC interoperability support

2022-11-25 Thread Hamza Mahfooz

Currently, there are issues with enabling PSR-SU + DSC. This stems from
the fact that DSC imposes a slice height on transmitted video data and
we are not conforming to that slice height in PSR-SU regions. So, pass
slice_height into su_y_granularity to feed the DSC slice height into
PSR-SU code.

Signed-off-by: Hamza Mahfooz 
---
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
index 26291db0a3cf..55acadf0b63f 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
@@ -53,6 +53,41 @@ static bool link_supports_psrsu(struct dc_link *link)
return true;
 }
 
+static bool psr_su_set_y_granularity(struct dc *dc, struct dc_link *link,
+struct dc_stream_state *stream,
+struct psr_config *config)
+{
+   u16 pic_height;
+   u8 slice_height;
+
+   if (!dc->caps.edp_dsc_support ||
+   link->panel_config.dsc.disable_dsc_edp ||
+   
!link->dpcd_caps.dsc_caps.dsc_basic_caps.fields.dsc_support.DSC_SUPPORT ||
+   !stream->timing.dsc_cfg.num_slices_v)
+   return true;
+
+   pic_height = stream->timing.v_addressable +
+   stream->timing.v_border_top + stream->timing.v_border_bottom;
+   slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v;
+
+   if (slice_height) {
+   if (config->su_y_granularity &&
+   (slice_height % config->su_y_granularity)) {
+   WARN(1,
+"%s: dsc: %d, slice_height: %d, num_slices_v: 
%d\n",
+__func__,
+
stream->sink->dsc_caps.dsc_dec_caps.is_dsc_supported,
+slice_height,
+stream->timing.dsc_cfg.num_slices_v);
+   return false;
+   }
+
+   config->su_y_granularity = slice_height;
+   }
+
+   return true;
+}
+
 /*
  * amdgpu_dm_set_psr_caps() - set link psr capabilities
  * @link: link
@@ -122,6 +157,9 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state 
*stream)
psr_config.allow_multi_disp_optimizations =
(amdgpu_dc_feature_mask & DC_PSR_ALLOW_MULTI_DISP_OPT);
 
+   if (!psr_su_set_y_granularity(dc, link, stream, &psr_config))
+   return false;
+
ret = dc_link_setup_psr(link, stream, &psr_config, 
&psr_context);
 
}
-- 
2.38.1

Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread Luben Tuikov

On 2022-11-25 04:57, Christian König wrote:
> 
> 
> Am 25.11.22 um 09:33 schrieb Luben Tuikov:
>> On 2022-11-25 02:59, Christian König wrote:
>>> Am 25.11.22 um 08:56 schrieb Luben Tuikov:
 On 2022-11-25 02:45, Christian König wrote:
> Am 24.11.22 um 22:19 schrieb Luben Tuikov:
>> Fix minmax compilation error by using min_t()/max_t(), of the assignment 
>> type.
>>
>> Cc: James Zhu 
>> Cc: Felix Kuehling 
>> Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large 
>> system memory")
>> Signed-off-by: Luben Tuikov 
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>> index 8a2e5716d8dba2..d22d14b0ef0c84 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>> @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct 
>> mmu_interval_notifier *notifier,
>>  hmm_range->dev_private_owner = owner;
>> 
>>  do {
>> -hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, 
>> end);
>> +hmm_range->end = min_t(typeof(hmm_range->end),
>> +   hmm_range->start + MAX_WALK_BYTE,
>> +   end);
> Since end is a local variable I would strongly prefer to just have it
> use the correct type for it.
>
> Otherwise we might end up using something which doesn't work on all
> architectures.
 They all appear to be "unsigned long". I thought, since we assign to
 hmm_range->end, we use that type.
>>> Mhm, then why does the compiler complain here?
>> Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) 
>> ),
>> and this is why the minmax check complains.
>>
>> So, since the left-hand expression is unsigned long,
>> i.e.,
>>  hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
>> is,
>>  unsigned long = min(unsigned long long, unsigned long);
>> The compiler complains.
>>
>> I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX,
> 
> That's not only a preference, but a must have. Otherwise the code maybe 
> won't work as expected on 32bit architectures.

Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion
below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE
even matters--it wouldn't matter so long as the type in min_t() is u64.
It's a macro at the moment.

However, the LHS--struct hmm_range's members are all
unsigned long and then we're essentially doing (unsigned long) = (u64),
which on 32-bit is (u32) = (u64).

Regards,
Luben

> 
>> and be defined as UL. I mean, why is everything in struct hmm_range
>> "unsigned long", but we set a high limit of 10__h for an end, and
>> compare it to "end" to find the smaller? If our "end" could potentially
>> be 10__h then shouldn't the members in struct hmm_range be
>> unsigned long long as well?
> 
> No, that the hmm range depends on the address space bits of the CPU is 
> perfectly correct. Essentially this is just an userspace address range.
> 
> Our problem here is that this code needs to work on both 32bit and 64bit 
> systems. And on a 32bit system limiting the types wouldn't work 
> correctly as far as I can see.
> 
> So the compiler is complaining for rather good reasons and by using 
> "min_t(UL" we just hide that instead of fixing the problem.
> 
> I suggest to use "min_t(u64" instead. An intelligent compiler should 
> even be capable of optimizing this away by looking at the input types on 
> 32bit archs.
> 
>>
>> And for the timeout, we have the (now) obvious,
>>
>>  timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL);
>>
>> and I don't know why we necessarily need a "1ULL", when 1UL would do just 
>> fine,
>> and then compilation passes for that statement. I can set this to 1UL, 
>> instead
>> of using max_t().
> 
> I think just changing this to 1UL should be sufficient.
> 
> Regards,
> Christian.
> 
>>
>> Regards,
>> Luben
>>
>>
>>> As far as I can see "unsigned long" is correct here, but if we somehow
>>> have a typecast then something is not working as expected.
>>>
>>> Is MAX_WALK_BYTE maybe of signed type?
>>>
 Would you prefer at the top of the function to define "timeout" and "end" 
 as,
typeof(hmm_range->end) end, timeout;
>>> Well for end that might make sense, but timeout is independent of the
>>> hmm range.
>>>
>>> Regards,
>>> Christian.
>>>
 Regards,
 Luben

>

Re: [PATCH 2/2] drm/amdgpu: Fix PCI device refcount leak in amdgpu_atrm_get_bios()

2022-11-25 Thread Alex Deucher

Applied the series.  Thanks!

Alex

On Tue, Nov 22, 2022 at 6:13 AM Xiongfeng Wang
 wrote:
>
> As comment of pci_get_class() says, it returns a pci_device with its
> refcount increased and decreased the refcount for the input parameter
> @from if it is not NULL.
>
> If we break the loop in amdgpu_atrm_get_bios() with 'pdev' not NULL, we
> need to call pci_dev_put() to decrease the refcount. Add the missing
> pci_dev_put() to avoid refcount leak.
>
> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Xiongfeng Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> index e363f56c72af..30c28a69e847 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> @@ -317,6 +317,7 @@ static bool amdgpu_atrm_get_bios(struct amdgpu_device 
> *adev)
>
> if (!found)
> return false;
> +   pci_dev_put(pdev);
>
> adev->bios = kmalloc(size, GFP_KERNEL);
> if (!adev->bios) {
> --
> 2.20.1
>

Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation

2022-11-25 Thread Alex Deucher

On Fri, Nov 25, 2022 at 5:21 AM Christian König
 wrote:
>
> We already fallback to a dummy BO with no backing store when we
> allocate GDS,GWS and OA resources and to GTT when we allocate VRAM.
>
> Drop all those workarounds and generalize this for GTT as well. This
> fixes ENOMEM issues with runaway applications which try to allocate/free
> GTT in a loop and are otherwise only limited by the CPU speed.
>
> The CS will wait for the cleanup of freed up BOs to satisfy the
> various domain specific limits and so effectively throttle those
> buggy applications down to a sane allocation behavior again.
>
> Signed-off-by: Christian König 

This looks like a good bug fix and unrelated to the rest of this series.
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
>  2 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index a0780a4e3e61..62e98f1ad770 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
> unsigned long size,
> bp.resv = resv;
> bp.preferred_domain = initial_domain;
> bp.flags = flags;
> -   bp.domain = initial_domain;
> +   bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
> bp.bo_ptr_size = sizeof(struct amdgpu_bo);
>
> r = amdgpu_bo_create_user(adev, &bp, &ubo);
> @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, 
> void *data,
> }
>
> initial_domain = (u32)(0x & args->in.domains);
> -retry:
> r = amdgpu_gem_object_create(adev, size, args->in.alignment,
> -initial_domain,
> -flags, ttm_bo_type_device, resv, &gobj);
> +initial_domain, flags, 
> ttm_bo_type_device,
> +resv, &gobj);
> if (r && r != -ERESTARTSYS) {
> -   if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> -   flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> -   goto retry;
> -   }
> -
> -   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
> -   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
> -   goto retry;
> -   }
> DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, 
> %d)\n",
> size, initial_domain, args->in.alignment, r);
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 974e85d8b6cc..919bbea2e3ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>
> bo->tbo.bdev = &adev->mman.bdev;
> -   if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
> - AMDGPU_GEM_DOMAIN_GDS))
> -   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
> -   else
> -   amdgpu_bo_placement_from_domain(bo, bp->domain);
> +   amdgpu_bo_placement_from_domain(bo, bp->domain);
> if (bp->type == ttm_bo_type_kernel)
> bo->tbo.priority = 1;
>
> --
> 2.34.1
>

[pull] amdgpu, amdkfd drm-next-6.2

2022-11-25 Thread Alex Deucher

Hi Dave, Daniel,

More stuff for 6.2.  Mostly bug fixes at this point.

The following changes since commit 3d335a523b938a445a674be24d1dd5c7a4c86fb6:

  Merge tag 'drm-intel-next-2022-11-18' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-next (2022-11-23 09:15:44 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-6.2-2022-11-25

for you to fetch changes up to 10d2d1fc05f03ee1626b60761a3425622767513e:

  drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly 
when the edid is read" (2022-11-23 10:31:31 -0500)


amd-drm-next-6.2-2022-11-25:

amdgpu:
- Old GCC fix
- GFX11 fixes
- PSP suspend/resume fix
- PCI ref count fix
- DC KASAN fix
- DCN 3.2.x fixes
- Dell platform suspend/resume fixes
- DCN 3.1.4 fixes
- RAS fixes
- SMU 13.x fixes
- Flex array changes
- VCN 4.0 RAS updates
- Add missing licsense to some files
- Documentation updates
- SR-IOV fixes
- DP MST DSC fix

amdkfd:
- Fix topology locking in error case


Alex Deucher (4):
  drm/amdgpu/psp: don't free PSP buffers on suspend
  Revert "drm/amd/display: fix dpms_off issue when disabling bios mode"
  drm/amdgpu: add missing license to some files
  drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info 
correctly when the edid is read"

Alvin Lee (6):
  drm/amd/display: Add margin on DRR vblank start for subvp
  drm/amd/display: Limit HW cursor size of >= 4k
  drm/amd/display: Check if PSR enabled when entering MALL
  drm/amd/display: Add debug options for increasing phantom lines
  drm/amd/display: Retain phantom plane/stream if validation fails
  drm/amd/display: Revert check for phantom BPP

Aric Cyr (1):
  drm/amd/display: 3.2.213

Aurabindo Pillai (1):
  drm/amd/display: trigger timing sync only if TG is running

Bob zhou (1):
  drm/amd/display: fix compilation issue with legacy gcc

Camille Cho (1):
  drm/amd/display: new ABM config 2

Candice Li (1):
  drm/amd/pm: Enable bad memory page/channel recording support for smu 
v13_0_0

David Galiffi (1):
  drm/amd/display: Fix rotated cursor offset calculation

Dillon Varone (5):
  drm/amd/display: Update soc bounding box for dcn32/dcn321
  drm/amd/display: Use dummy pstate latency for subvp when needed on dcn32
  drm/amd/display: Add check for DET fetch latency hiding for dcn32
  drm/amd/display: Use viewport height for subvp mall allocation size
  drm/amd/display: Use new num clk levels struct for max mclk index

Felix Kuehling (1):
  drm/amdkfd: Release the topology_lock in error case

Ilya Bakoulin (1):
  drm/amd/display: Fix display corruption w/ VSR enable

Jack Xiao (1):
  drm/amd/amdgpu: reserve vm invalidation engine for firmware

Jane Jian (1):
  drm/amdgpu/vcn: re-use original vcn0 doorbell value

Luben Tuikov (1):
  drm/amdgpu: Fix minmax warning

Lyude Paul (2):
  drm/amdgpu/dm/mst: Fix uninitialized var in 
pre_compute_mst_dsc_configs_for_state()
  drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN

Mustapha Ghaddar (1):
  drm/amd/display: Phase 1 Add Bw Allocation source and header files

Nicholas Kazlauskas (2):
  drm/amd/display: Update Z8 watermarks for DCN314
  drm/amd/display: Add Z8 allow states to z-state support list

Paulo Miguel Almeida (1):
  drm/amdgpu: Replace remaining 1-element array with flex-array

Ren Zhijie (1):
  drm/amdgpu: fix unused-function error

Rodrigo Siqueira (1):
  drm/amd/display: Add YCBCR2020 coefficients to CSC matrix

Shikang Fan (1):
  drm/amdgpu: fix for suspend/resume kiq fence fallback under sriov

Stanley.Yang (1):
  drm/amdgpu: fix use-after-free during gpu recovery

Taimur Hassan (1):
  drm/amd/display: Avoid setting pixel rate divider to N/A

Tao Zhou (2):
  drm/amdgpu: add register definition for VCN RAS initialization
  drm/amdgpu: enable RAS poison for VCN 2.6

Tsung-hua Lin (1):
  drm/amd/display: No display after resume from WB/CB

Yang Yingliang (1):
  drm/amdgpu: fix pci device refcount leak

ZhenGuo Yin (1):
  drm/amdgpu: update documentation of parameter amdgpu_gtt_size

lyndonli (2):
  drm/amd/pm: update driver if header for smu_13_0_7
  drm/amdgpu: add the fan abnormal detection feature

 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  27 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c  |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|   6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  16 +--
 drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h|   1 -
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c

[PATCH v3 2/2] drm/amdgpu: Add work function for GPU reset event

2022-11-25 Thread André Almeida

Add a work function to send a GPU reset uevent and scheduled it during
a GPU reset.

Co-developed-by: Shashank Sharma 
Signed-off-by: Shashank Sharma 
Signed-off-by: André Almeida 
---
V3:
   - Merge two last commits

V2: Addressed review comments from Christian
   - Changed the name of the work to gpu_reset_event_work
   - Added a structure to accommodate some additional information
 (like a PID and some flags)
   - Do not add new structure in amdgpu.h
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 ++
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6b74df446694..88cb5b739c5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -60,6 +60,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include "dm_pp_interface.h"
@@ -1003,6 +1005,7 @@ struct amdgpu_device {
 
int asic_reset_res;
struct work_struct  xgmi_reset_work;
+   struct work_struct  gpu_reset_event_work;
struct list_headreset_list;
 
longgfx_timeout;
@@ -1036,6 +1039,7 @@ struct amdgpu_device {
pci_channel_state_t pci_channel_state;
 
struct amdgpu_reset_control *reset_cntl;
+   struct drm_reset_event_info reset_event_info;
uint32_t
ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
 
boolram_is_direct_mapped;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b2b1c66bfe39..d04541fdb606 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -77,6 +77,7 @@
 #include 
 
 #include 
+#include 
 
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
@@ -3365,6 +3366,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device 
*adev)
return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
+static void amdgpu_device_reset_event_func(struct work_struct *__work)
+{
+   struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
+  gpu_reset_event_work);
+   /*
+* A GPU reset has happened, inform the userspace and pass the reset
+* related information
+*/
+   drm_sysfs_reset_event(&adev->ddev, &adev->reset_event_info);
+
+   put_pid(adev->reset_event_info.pid);
+}
+
 static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
 {
struct amdgpu_device *adev =
@@ -3616,6 +3630,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  amdgpu_device_delay_enable_gfx_off);
 
INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+   INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
 
adev->gfx.gfx_off_req_count = 1;
adev->gfx.gfx_off_residency = 0;
@@ -4920,6 +4935,21 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
goto out;
 
vram_lost = 
amdgpu_device_check_vram_lost(tmp_adev);
+
+   if (reset_context->job && 
reset_context->job->vm) {
+   tmp_adev->reset_event_info.pid =
+   
find_get_pid(reset_context->job->vm->task_info.pid);
+   } else {
+   tmp_adev->reset_event_info.pid = NULL;
+   }
+
+   if (vram_lost)
+   tmp_adev->reset_event_info.flags |=
+   DRM_RESET_EVENT_VRAM_LOST;
+
+   /* Send GPU reset event */
+   schedule_work(&tmp_adev->gpu_reset_event_work);
+
 #ifdef CONFIG_DEV_COREDUMP
tmp_adev->reset_vram_lost = vram_lost;
memset(&tmp_adev->reset_task_info, 0,
-- 
2.38.1

[PATCH v3 1/2] drm: Add GPU reset sysfs event

2022-11-25 Thread André Almeida

From: Shashank Sharma 

Add a sysfs event to notify userspace about GPU resets providing:
- PID that triggered the GPU reset, if any. Resets can happen from
  kernel threads as well, in that case no PID is provided
- Information about the reset (e.g. was VRAM lost?)

Co-developed-by: André Almeida 
Signed-off-by: André Almeida 
Signed-off-by: Shashank Sharma 
---

V3:
   - Reduce information to just PID and flags
   - Use pid pointer instead of just pid number
   - BUG() if no reset info is provided

V2:
   - Addressed review comments from Christian and Amar
   - move the reset information structure to DRM layer
   - drop _ctx from struct name
   - make pid 32 bit(than 64)
   - set flag when VRAM invalid (than valid)
   - add process name as well (Amar)
---
 drivers/gpu/drm/drm_sysfs.c | 26 ++
 include/drm/drm_sysfs.h | 13 +
 2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index 430e00b16eec..85777abf4194 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -409,6 +409,32 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_sysfs_hotplug_event);
 
+/**
+ * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
+ * @dev: DRM device
+ * @reset_info: The contextual information about the reset (like PID, flags)
+ *
+ * Send a uevent for the DRM device specified by @dev. This informs
+ * user that a GPU reset has occurred, so that an interested client
+ * can take any recovery or profiling measure.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info 
*reset_info)
+{
+   unsigned char pid_str[13];
+   unsigned char flags_str[18];
+   unsigned char reset_str[] = "RESET=1";
+   char *envp[] = { reset_str, pid_str, flags_str, NULL };
+
+   DRM_DEBUG("generating reset event\n");
+
+   BUG_ON(!reset_info);
+
+   snprintf(pid_str, sizeof(pid_str), "PID=%u", pid_vnr(reset_info->pid));
+   snprintf(flags_str, sizeof(flags_str), "FLAGS=0x%llx", 
reset_info->flags);
+   kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_sysfs_reset_event);
+
 /**
  * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
  * change
diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
index 6273cac44e47..dbb0ac6230b8 100644
--- a/include/drm/drm_sysfs.h
+++ b/include/drm/drm_sysfs.h
@@ -2,15 +2,28 @@
 #ifndef _DRM_SYSFS_H_
 #define _DRM_SYSFS_H_
 
+#define DRM_RESET_EVENT_VRAM_LOST (1 << 0)
+
 struct drm_device;
 struct device;
 struct drm_connector;
 struct drm_property;
 
+/**
+ * struct drm_reset_event_info - Information about a GPU reset event
+ * @pid: Process that triggered the reset, if any
+ * @flags: Extra information around the reset event (e.g. is VRAM lost?)
+ */
+struct drm_reset_event_info {
+   struct pid *pid;
+   uint64_t flags;
+};
+
 int drm_class_device_register(struct device *dev);
 void drm_class_device_unregister(struct device *dev);
 
 void drm_sysfs_hotplug_event(struct drm_device *dev);
+void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info 
*reset_info);
 void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
 void drm_sysfs_connector_status_event(struct drm_connector *connector,
  struct drm_property *property);
-- 
2.38.1

[PATCH v3 0/2] drm: Add GPU reset sysfs

2022-11-25 Thread André Almeida

This patchset adds a udev event for DRM device's resets.

Userspace apps can trigger GPU resets by misuse of graphical APIs or driver
bugs. Either way, the GPU reset might lead the system to a broken state[1], that
might be recovered if user has access to a tty or a remote shell. Arguably, this
recovery could happen automatically by the system itself, thus this is the goal
of this patchset.

For debugging and report purposes, device coredump support was already added
for amdgpu[2], but it's not suitable for programmatic usage like this one given
the uAPI not being stable and the need for parsing.

GL/VK is out of scope for this use, giving that we are dealing with device
resets regardless of API.

A basic userspace daemon is provided at [3] showing how the interface is used
to recovery from resets.

[1] A search for "reset" in DRM/AMD issue tracker shows reports of resets
making the system unusable:
https://gitlab.freedesktop.org/drm/amd/-/issues/?search=reset

[2] 
https://lore.kernel.org/amd-gfx/20220602081538.1652842-2-amaranath.somalapu...@amd.com/

[3] https://gitlab.freedesktop.org/andrealmeid/gpu-resetd

v2: 
https://lore.kernel.org/dri-devel/20220308180403.75566-1-contactshashanksha...@gmail.com/

André Almeida (1):
  drm/amdgpu: Add work function for GPU reset event

Shashank Sharma (1):
  drm: Add GPU reset sysfs event

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 ++
 drivers/gpu/drm/drm_sysfs.c| 26 +++
 include/drm/drm_sysfs.h| 13 ++
 4 files changed, 73 insertions(+)

-- 
2.38.1

RE: [PATCH 00/12] DC Patches November 28 2022

2022-11-25 Thread Wheeler, Daniel

[AMD Official Use Only - General]

Hi all,
 
This week this patchset was tested on the following systems:
 
HP Envy 360, with Ryzen 5 4500U
Lenovo Thinkpad T14s Gen2, with AMD Ryzen 5 5650U 
Lenovo Thinkpad T13s Gen4 with AMD Ryzen 5 6600U
Sapphire Pulse RX5700XT 
Reference AMD RX6800
 
These systems were tested on the following display types: 
eDP, (1080p 60hz [4500U, 5650U]) (1920x1200 [6600U])
VGA and DVI (1680x1050 60HZ [DP to VGA/DVI, USB-C to DVI/VGA])
DP/HDMI/USB-C (1440p 170hz, 4k 60hz, 4k 144hz [Includes USB-C to DP/HDMI 
adapters])
 
MST tested with Startech MST14DP123DP and 2x 4k 60Hz displays
DSC tested with Cable Matters 101075 (DP to 3x DP), and 201375 (USB-C to 3x DP) 
with 3x 4k60 displays
HP Hook G2 with 1 and 2 4k60 Displays
 
The testing is a mix of automated and manual tests. Manual testing includes 
(but is not limited to):
Changing display configurations and settings
Benchmark testing
Feature testing (Freesync, etc.)
 
Automated testing includes (but is not limited to):
Script testing (scripts to automate some of the manual checks)
IGT testing
 
The patchset consists of the amd-staging-drm-next branch (Head commit - 
1ee359db35f2c487a063412168724d3a55b72123 -> drm/amdgpu: enable RAS poison for 
VCN 2.6) with new patches added on top of it. This branch is used for both 
Ubuntu and Chrome OS testing (ChromeOS on a bi-weekly basis).

 
Tested on Ubuntu 22.04 and Chrome OS
 
Tested-by: Daniel Wheeler 
 
 
Thank you,
 
Dan Wheeler
Sr. Technologist | AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6
amd.com

Thank you,

Dan Wheeler
Sr. Technologist  |  AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6
Facebook |  Twitter |  amd.com  


-Original Message-
From: Dhillon, Jasdeep  
Sent: November 24, 2022 4:14 PM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Lakha, Bhawanpreet ; Siqueira, 
Rodrigo ; Pillai, Aurabindo 
; Zhuo, Qingqing (Lillian) ; 
Li, Roman ; Lin, Wayne ; Wang, Chao-kai 
(Stylon) ; Chiu, Solomon ; Kotarac, 
Pavle ; Gutierrez, Agustin ; 
Dhillon, Jasdeep ; Wheeler, Daniel 

Subject: [PATCH 00/12] DC Patches November 28 2022

This DC patchset brings improvements in multiple areas. In summary, we have:

* Program output transfer function when required
* Fix arthmetic errror in MALL size caluclations for subvp
* DCC Meta pitch used for MALL allocation
* Debugfs entry to tell if connector is DPIA link
* Use largest vready_offset in pipe group
* Fixes race condition in DPIA Aux transfer


Cc: Daniel Wheeler 


Alvin Lee (3):
  drm/amd/display: Retain phantom pipes when min transition into subvp
  drm/amd/display: Don't overwrite subvp pipe info in fast updates
  drm/amd/display: Fix DTBCLK disable requests and SRC_SEL programming

Aric Cyr (1):
  drm/amd/display: 3.2.214

Dillon Varone (4):
  drm/amd/display: MALL SS calculations should iterate over all pipes
for cursor
  drm/amd/display: Use DCC meta pitch for MALL allocation requirements
  drm/amd/display: Fix arithmetic error in MALL size calculations for
subvp
  drm/amd/display: program output tf when required

Dmytro Laktyushkin (1):
  drm/amd/display: set per pipe dppclk to 0 when dpp is off

Stylon Wang (2):
  drm/amd/display: Fix race condition in DPIA AUX transfer
  drm/amd/display: Create debugfs to tell if connector is DPIA link

Wesley Chalmers (1):
  drm/amd/display: Use the largest vready_offset in pipe group

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 151 +-  
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  17 +-  
.../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |  23 ++-  
.../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  10 +-
 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  |   2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  23 ++-
 drivers/gpu/drm/amd/display/dc/dc.h   |   2 +-
 drivers/gpu/drm/amd/display/dc/dc_stream.h|  11 ++
 .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  30 +++-
 .../drm/amd/display/dc/dcn20/dcn20_hwseq.c|  34 +++-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_dccg.c |   6 +-
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c|   8 +-
 .../drm/amd/display/dc/dcn32/dcn32_resource.c |  66 +---  
.../drm/amd/display/dc/dcn32/dcn32_resource.h |  13 +-  
.../display/dc/dcn32/dcn32_resource_helpers.c |  15 +-
 .../amd/display/dc/dcn321/dcn321_resource.c   |   2 +
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  |   3 +
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |   9 +-
 .../gpu/drm/amd/display/dc/inc/core_types.h   |   4 +-
 19 files changed, 277 insertions(+), 152 deletions(-)

--
2.34.1
<>

Re: [PATCH] drm/amdgpu: use dev_dbg to print messages in runtime cycle

2022-11-25 Thread Alex Deucher

On Thu, Nov 24, 2022 at 12:29 AM Guchun Chen  wrote:
>
> Runtime PM can happen pretty frequently, as these printings
> may be annoyed, switch to dev_dbg.
>
> Suggested-by: Lijo Lazar 
> Signed-off-by: Guchun Chen 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 8b1f6c032a2e..447e27b2e16b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2568,7 +2568,7 @@ static int amdgpu_pmops_runtime_suspend(struct device 
> *dev)
> amdgpu_device_baco_enter(drm_dev);
> }
>
> -   dev_info(&pdev->dev, "asic/device is runtime suspended\n");
> +   dev_dbg(&pdev->dev, "asic/device is runtime suspended\n");
>
> return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 7bb2de1d11ff..4a18d1944e4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -861,7 +861,7 @@ static int psp_tmr_unload(struct psp_context *psp)
> struct psp_gfx_cmd_resp *cmd = acquire_psp_cmd_buf(psp);
>
> psp_prep_tmr_unload_cmd_buf(psp, cmd);
> -   dev_info(psp->adev->dev, "free PSP TMR buffer\n");
> +   dev_dbg(psp->adev->dev, "free PSP TMR buffer\n");
>
> ret = psp_cmd_submit_buf(psp, NULL, cmd,
>  psp->fence_buf_mc_addr);
> --
> 2.25.1
>

Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA

2022-11-25 Thread Felix Kuehling




On 2022-11-24 15:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]


-Original Message-
From: Kuehling, Felix 
Sent: November 24, 2022 11:24 AM
To: Kim, Jonathan ; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to
TMA


Am 2022-11-24 um 09:51 schrieb Kim, Jonathan:

[Public]


-Original Message-
From: Kuehling, Felix 
Sent: November 22, 2022 7:45 PM
To: Kim, Jonathan ; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to
TMA


On 2022-10-31 12:23, Jonathan Kim wrote:

From: Jay Cornwall 

Trap handler behavior will differ when a debugger is attached.

Make the debug trap flag available in the trap handler TMA.
Update it when the debug trap ioctl is invoked.

v3: Rebase for upstream

v2:
Add missing debug flag setup on APUs

Signed-off-by: Jay Cornwall 
Reviewed-by: Felix Kuehling 
Signed-off-by: Jonathan Kim 
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c   |  4 
drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16



3 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c

b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c

index ae6e701a2656..d4f87f2adada 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -193,6 +193,8 @@ void kfd_dbg_trap_deactivate(struct

kfd_process

*target, bool unwind, int unwind

  if (unwind && count == unwind_count)
  break;

+   kfd_process_set_trap_debug_flag(&pdd->qpd, false);
+
  /* GFX off is already disabled by debug activate if not RLC

restore supported. */

  if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
  amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
@@ -278,6 +280,8 @@ int kfd_dbg_trap_activate(struct kfd_process

*target)

  if (kfd_dbg_is_rlc_restore_supported(pdd->dev))
  amdgpu_gfx_off_ctrl(pdd->dev->adev, true);

+   kfd_process_set_trap_debug_flag(&pdd->qpd, true);
+
  r = debug_refresh_runlist(pdd->dev->dqm);
  if (r) {
  target->runtime_info.runtime_state =
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 9690a2adb9ed..82b28588ab72 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1101,6 +1101,8 @@ int kfd_init_apertures(struct kfd_process

*process);

void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
uint64_t tba_addr,
uint64_t tma_addr);
+void kfd_process_set_trap_debug_flag(struct qcm_process_device

*qpd,

+bool enabled);

/* CWSR initialization */
int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file

*filep);

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c

b/drivers/gpu/drm/amd/amdkfd/kfd_process.c

index 59c4c38833b6..d62e0c62df76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1252,6 +1252,8 @@ int kfd_process_init_cwsr_apu(struct

kfd_process *p, struct file *filep)

  memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev-
cwsr_isa_size);

+   kfd_process_set_trap_debug_flag(qpd, p-
debug_trap_enabled);
+
  qpd->tma_addr = qpd->tba_addr +

KFD_CWSR_TMA_OFFSET;

  pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for

pqm.\n",

  qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr);
@@ -1288,6 +1290,9 @@ static int

kfd_process_device_init_cwsr_dgpu(struct kfd_process_device *pdd)

  memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);

+   kfd_process_set_trap_debug_flag(&pdd->qpd,
+   pdd->process-
debug_trap_enabled);
+
  qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET;
  pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n",
   qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr);
@@ -1374,6 +1379,17 @@ bool kfd_process_xnack_mode(struct

kfd_process *p, bool supported)

  return true;
}

+void kfd_process_set_trap_debug_flag(struct qcm_process_device

*qpd,

+bool enabled)
+{
+   /* If TMA doesn't exist then flag will be set during allocation. */

I would expect a change to the TMA allocation function, but that isn't
in this patch?

The TMA is allocated under kfd_process_init_cwsr_* and CWSR enabled is

a pre-condition for the 1st level trap handler loading.

The lack of context in the patch for those functions may be hiding that fact.
Is the placement of this comment misleading?  Maybe it should go in

kfd_dbg_trap_activate when kfd_process_set_trap_debug_flag is called?

Or should it just be removed since the combined calls within initialization of

CWS

[linux-next:master] BUILD REGRESSION 9e46a79967326efb03c481ddfd58902475bd920d

2022-11-25 Thread kernel test robot

tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 9e46a79967326efb03c481ddfd58902475bd920d  Add linux-next specific 
files for 20221125

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202211041320.coq8eelj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242021.fdzrfna8-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: 
No such file or directory
arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't 
find starting instruction
arch/powerpc/kernel/optprobes_head.o: warning: objtool: 
optprobe_template_end(): can't find starting instruction
drivers/clk/clk.c:1022:5: error: redefinition of 'clk_prepare'
drivers/clk/clk.c:1268:6: error: redefinition of 'clk_is_enabled_when_prepared'
drivers/clk/clk.c:941:6: error: redefinition of 'clk_unprepare'
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4968: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:5075:24: warning: 
implicit conversion from 'enum ' to 'enum dc_status' 
[-Wenum-conversion]
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: 
warning: no previous prototype for 'to_dal_irq_source_dcn201' 
[-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous 
prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous 
prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous 
prototype for 'tu102_gr_load' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype 
for 'wpr_generic_header_dump' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' 
set but not used [-Wunused-but-set-variable]
net/netfilter/nf_conntrack_netlink.c:2674:6: warning: unused variable 'mark' 
[-Wunused-variable]
vmlinux.o: warning: objtool: __btrfs_map_block+0x1d77: unreachable instruction

Unverified Error/Warning (likely false positive, please contact us if 
interested):

ERROR: modpost: "input_ff_create_memless" [drivers/hid/hid-axff.ko] undefined!
ERROR: modpost: "input_ff_create_memless" [drivers/hid/hid-sjoy.ko] undefined!
ERROR: modpost: "input_ff_create_memless" 
[drivers/input/misc/arizona-haptics.ko] undefined!
ERROR: modpost: "input_ff_create_memless" [drivers/input/misc/drv2667.ko] 
undefined!
drivers/dma/at_hdmac.c:1371 atc_prep_slave_sg() warn: possible memory leak of 
'desc'

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc.c:warning:This-comment-starts-with-but-isn-t-a-kernel-doc-comment.-Refer-Documentation-doc-guide-kernel-doc.rst
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- alpha-randconfig-r003-20221124
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc.c:warning:This-comment-starts-with-but-isn-t-a-kernel-doc-comment.-Refer-Documentation-doc-guide-kernel-doc.rst
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|-- alpha-randconfig-r016-20221124
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-

Re: [PATCH] drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()

2022-11-25 Thread Felix Kuehling


On 2022-11-25 02:39, Dan Carpenter wrote:

We re-arranged this code recently so "ret" is always zero at this point.

Signed-off-by: Dan Carpenter 


Reviewed-by: Felix Kuehling 

I'm applying your patch to amd-staging-drm-next. Thank you!

Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 6f01ebc8557b..bceb1a5b2518 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2012,10 +2012,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
  
  	kfd_debug_print_topology();
  
-	if (!res)

-   kfd_notify_gpu_change(gpu_id, 1);
+   kfd_notify_gpu_change(gpu_id, 1);
  
-	return res;

+   return 0;
  }
  
  /**

Re: [PATCH 2/3] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame

2022-11-25 Thread Arnd Bergmann

On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote:
> calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
> architectures built with Clang (all released versions), whereby the stack
> frame gets blown up to well over 5k.  This would cause an immediate kernel
> panic on most architectures.  We'll revert this when the following bug report
> has been resolved: https://github.com/llvm/llvm-project/issues/41896.
>
> Suggested-by: Arnd Bergmann 
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/Kconfig | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 34f5a092c99e7..1fa7b9760adb8 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -265,6 +265,7 @@ source "drivers/gpu/drm/radeon/Kconfig"
> 
>  config DRM_AMDGPU
>   tristate "AMD GPU"
> + depends on BROKEN || !CC_IS_CLANG || !(X86_64 || SPARC64 || ARM64)

The logic looks fine, this has been broken for so long without anyone
paying attention that limiting the broken code to the working architectures
is probably the best way to avoid trouble.

However, as far as I can tell, the problem doesn't affect the
entire driver, only the "new" display engine, so I would probably
try to limit this to turning off CONFIG_DRM_AMD_DC for architectures
that don't have a fixed clang.

 Arnd

Re: [PATCH 1/3] drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of code

2022-11-25 Thread Arnd Bergmann

On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote:
> bw_calcs() presently blows the stack-frame limit by calling functions
> inside a argument list which return quite a bit of data to be passed
> onto sub-functions.  Simply breaking out this hunk reduces the
> stack-frame use by 500 Bytes, preventing the following compiler
> warning:
>
> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dce_calcs.c:3285:6:
>   warning: stack frame size (1384) exceeds limit (1024)
> in 'bw_calcs' [-Wframe-larger-than]
> bool bw_calcs(struct dc_context *ctx,
>  ^
> 1 warning generated.
>
> This resolves the issue and takes us one step closer towards a
> successful allmodconfig WERROR build.
>
> Signed-off-by: Lee Jones 

Is this still needed with the patch to turn off the display engine
on most architectures? On which architecture and with which compiler
do you still observe the problem?

Note that this probably doesn't actually solve the potential stack
overflow by itself, since the function that is now split out
is still called with the parent stack active. Splitting out multiple
smaller bits however would solve it since then the stack frames
could overlap.

Arnd

Re: [PATCH 3/3] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled

2022-11-25 Thread Arnd Bergmann

On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote:
> When enabled, KASAN enlarges function's stack-frames.  Pushing quite a
> few over the current threshold.  This can mainly be seen on 32-bit
> architectures where the present limit (when !GCC) is a lowly
> 1024-Bytes.
>
> Signed-off-by: Lee Jones 

Acked-by: Arnd Bergmann 

If this affects only clang but not gcc, I wonder if we could
limit the scope and keep the 1024 byte limit on gcc builds.

> ---
>  lib/Kconfig.debug | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c3c0b077ade33..82d475168db95 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -399,6 +399,7 @@ config FRAME_WARN
>   default 2048 if GCC_PLUGIN_LATENT_ENTROPY

This is actually a related bug that we should fix: allmodconfig
with gcc turns on GCC_PLUGIN_LATENT_ENTROPY, so the limit
ends up being way too high. I think we need to either ensure
that allmodconfig turns off the latent entropy plugin,
or that the limit gets lowered again to something that is
not any higher than the KASAN limit.

 Arnd

Re: [PATCH v2 1/2] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame

2022-11-25 Thread Arnd Bergmann

On Fri, Nov 25, 2022, at 13:07, Lee Jones wrote:
> calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
> architectures built with Clang (all released versions), whereby the stack
> frame gets blown up to well over 5k.  This would cause an immediate kernel
> panic on most architectures.  We'll revert this when the following bug report
> has been resolved: https://github.com/llvm/llvm-project/issues/41896.
>
> Suggested-by: Arnd Bergmann 
> Signed-off-by: Lee Jones 

Acked-by: Arnd Bergmann

Re: [PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled

2022-11-25 Thread Arnd Bergmann

On Fri, Nov 25, 2022, at 14:40, Lee Jones wrote:
> On Fri, 25 Nov 2022, Lee Jones wrote:
>
>> When enabled, KASAN enlarges function's stack-frames.  Pushing quite a
>> few over the current threshold.  This can mainly be seen on 32-bit
>> architectures where the present limit (when !GCC) is a lowly
>> 1024-Bytes.
>> 
>> Signed-off-by: Lee Jones 
>> ---
>>  lib/Kconfig.debug | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index c3c0b077ade33..82d475168db95 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -399,6 +399,7 @@ config FRAME_WARN
>>  default 2048 if GCC_PLUGIN_LATENT_ENTROPY
>>  default 2048 if PARISC
>>  default 1536 if (!64BIT && XTENSA)
>> +default 1280 if KASAN && !64BIT
>>  default 1024 if !64BIT
>>  default 2048 if 64BIT
>>  help
>
> Note this also fixes 61 warnings when
>
>   (GCC && !GCC_PLUGIN_LATENT_ENTROPY)
>
> ... which as Arnd says should not be enabled by default.  We'll
> address that issue once this set has been applied.

Thanks a lot for checking this!

Reviewed-by: Arnd Bergmann

Re: [PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled

2022-11-25 Thread Lee Jones

On Fri, 25 Nov 2022, Lee Jones wrote:

> When enabled, KASAN enlarges function's stack-frames.  Pushing quite a
> few over the current threshold.  This can mainly be seen on 32-bit
> architectures where the present limit (when !GCC) is a lowly
> 1024-Bytes.
> 
> Signed-off-by: Lee Jones 
> ---
>  lib/Kconfig.debug | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c3c0b077ade33..82d475168db95 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -399,6 +399,7 @@ config FRAME_WARN
>   default 2048 if GCC_PLUGIN_LATENT_ENTROPY
>   default 2048 if PARISC
>   default 1536 if (!64BIT && XTENSA)
> + default 1280 if KASAN && !64BIT
>   default 1024 if !64BIT
>   default 2048 if 64BIT
>   help

Note this also fixes 61 warnings when

  (GCC && !GCC_PLUGIN_LATENT_ENTROPY)

... which as Arnd says should not be enabled by default.  We'll
address that issue once this set has been applied.

-- 
Lee Jones [李琼斯]

Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait

2022-11-25 Thread Christian König


Am 25.11.22 um 12:14 schrieb Tvrtko Ursulin:


+ Matt

On 25/11/2022 10:21, Christian König wrote:

TTM is just wrapping core DMA functionality here, remove the mid-layer.
No functional change.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index 5247d88b3c13..d409a77449a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct 
drm_i915_gem_object *obj,

  static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
  {
  struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-    int err;
+    long err;
    WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
  -    err = ttm_bo_wait(bo, true, false);
-    if (err)
+    err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+    true, 15 * HZ);


This 15 second stuck out a bit for me and then on a slightly deeper 
look it seems this timeout will "leak" into a few of i915 code paths. 
If we look at the difference between the legacy shmem and ttm backend 
I am not sure if the legacy one is blocking or not - but if it can 
block I don't think it would have an arbitrary timeout like this. Matt 
your thoughts?


That's exactly the reason why I try to remove the ttm_bo_wait() as mid 
layer here. It hides the fact that we don't wait forever for BOs to 
become idle.


This is functional identical to the old code. If you want some other 
behavior feel free to note what's desired and I will implement it.


Regards,
Christian.



Regards,

Tvrtko


+    if (err < 0)
  return err;
+    if (err == 0)
+    return -EBUSY;
    err = i915_ttm_move_notify(bo);
  if (err)

[PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled

2022-11-25 Thread Lee Jones

When enabled, KASAN enlarges function's stack-frames.  Pushing quite a
few over the current threshold.  This can mainly be seen on 32-bit
architectures where the present limit (when !GCC) is a lowly
1024-Bytes.

Signed-off-by: Lee Jones 
---
 lib/Kconfig.debug | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c3c0b077ade33..82d475168db95 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -399,6 +399,7 @@ config FRAME_WARN
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 2048 if PARISC
default 1536 if (!64BIT && XTENSA)
+   default 1280 if KASAN && !64BIT
default 1024 if !64BIT
default 2048 if 64BIT
help
-- 
2.38.1.584.g0f3c55d4c2-goog

[PATCH v2 1/2] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame

2022-11-25 Thread Lee Jones

calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
architectures built with Clang (all released versions), whereby the stack
frame gets blown up to well over 5k.  This would cause an immediate kernel
panic on most architectures.  We'll revert this when the following bug report
has been resolved: https://github.com/llvm/llvm-project/issues/41896.

Suggested-by: Arnd Bergmann 
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/display/Kconfig | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/Kconfig 
b/drivers/gpu/drm/amd/display/Kconfig
index 6925e0280dbe6..f4f3d2665a6b2 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -5,6 +5,7 @@ menu "Display Engine Configuration"
 config DRM_AMD_DC
bool "AMD DC - Enable new display engine"
default y
+   depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
select SND_HDA_COMPONENT if SND_HDA_CORE
select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128)
help
@@ -12,6 +13,12 @@ config DRM_AMD_DC
  support for AMDGPU. This adds required support for Vega and
  Raven ASICs.
 
+ calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 
|| ARM64)
+ architectures built with Clang (all released versions), whereby the 
stack
+ frame gets blown up to well over 5k.  This would cause an immediate 
kernel
+ panic on most architectures.  We'll revert this when the following 
bug report
+ has been resolved: https://github.com/llvm/llvm-project/issues/41896.
+
 config DRM_AMD_DC_DCN
def_bool n
help
-- 
2.38.1.584.g0f3c55d4c2-goog

[PATCH v2 0/2] Fix a bunch of allmodconfig errors

2022-11-25 Thread Lee Jones

Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") WERROR 
now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig
builds.  This leads to some interesting failures, each resolved in this set.   
   
With this set applied, I am able to obtain a successful allmodconfig Arm build.

v1 => v2:
 - Remove superfluous change (these two override it)
 - Mark only DRM_AMD_DC ("the new display engine) as Broken
 - Change logic to only *include* working arches, not *preclude* them

Lee Jones (2):
  drm/amdgpu: Temporarily disable broken Clang builds due to blown
stack-frame
  Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is
enabled

 drivers/gpu/drm/amd/display/Kconfig | 7 +++
 lib/Kconfig.debug   | 1 +
 2 files changed, 8 insertions(+)

-- 
2.38.1.584.g0f3c55d4c2-goog

Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait

2022-11-25 Thread Tvrtko Ursulin




+ Matt

On 25/11/2022 10:21, Christian König wrote:

TTM is just wrapping core DMA functionality here, remove the mid-layer.
No functional change.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5247d88b3c13..d409a77449a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
  static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
  {
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-   int err;
+   long err;
  
  	WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
  
-	err = ttm_bo_wait(bo, true, false);

-   if (err)
+   err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+   true, 15 * HZ);


This 15 second stuck out a bit for me and then on a slightly deeper look 
it seems this timeout will "leak" into a few of i915 code paths. If we 
look at the difference between the legacy shmem and ttm backend I am not 
sure if the legacy one is blocking or not - but if it can block I don't 
think it would have an arbitrary timeout like this. Matt your thoughts?


Regards,

Tvrtko


+   if (err < 0)
return err;
+   if (err == 0)
+   return -EBUSY;
  
  	err = i915_ttm_move_notify(bo);

if (err)

[PATCH 6/9] drm/qxl: stop using ttm_bo_wait

2022-11-25 Thread Christian König

TTM is just wrapping core DMA functionality here, remove the mid-layer.
No functional change.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/qxl/qxl_cmd.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 63aa96a69752..281edab518cd 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -579,7 +579,7 @@ void qxl_surface_evict(struct qxl_device *qdev, struct 
qxl_bo *surf, bool do_upd
 
 static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool 
stall)
 {
-   int ret;
+   long ret;
 
ret = qxl_bo_reserve(surf);
if (ret)
@@ -588,7 +588,19 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct 
qxl_bo *surf, bool stal
if (stall)
mutex_unlock(&qdev->surf_evict_mutex);
 
-   ret = ttm_bo_wait(&surf->tbo, true, !stall);
+   if (stall) {
+   ret = dma_resv_wait_timeout(surf->tbo.base.resv,
+   DMA_RESV_USAGE_BOOKKEEP, true,
+   15 * HZ);
+   if (ret > 0)
+   ret = 0;
+   else if (ret == 0)
+   ret = -EBUSY;
+   } else {
+   ret = dma_resv_test_signaled(surf->tbo.base.resv,
+DMA_RESV_USAGE_BOOKKEEP);
+   ret = ret ? -EBUSY : 0;
+   }
 
if (stall)
mutex_lock(&qdev->surf_evict_mutex);
-- 
2.34.1

[PATCH 8/9] drm/ttm: use ttm_bo_wait_ctx instead of ttm_bo_wait

2022-11-25 Thread Christian König

Make sure that we use the correct settings from the context.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f9d9fd2d865d..cd266a067773 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -439,7 +439,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
bdev->funcs->evict_flags(bo, &placement);
 
if (!placement.num_placement && !placement.num_busy_placement) {
-   ret = ttm_bo_wait(bo, true, false);
+   ret = ttm_bo_wait_ctx(bo, ctx);
if (ret)
return ret;
 
@@ -1190,7 +1190,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
/*
 * Make sure BO is idle.
 */
-   ret = ttm_bo_wait(bo, false, false);
+   ret = ttm_bo_wait_ctx(bo, ctx);
if (unlikely(ret != 0))
goto out;
 
-- 
2.34.1

[PATCH 7/9] drm/i915: stop using ttm_bo_wait

2022-11-25 Thread Christian König

TTM is just wrapping core DMA functionality here, remove the mid-layer.
No functional change.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5247d88b3c13..d409a77449a3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
 {
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-   int err;
+   long err;
 
WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
 
-   err = ttm_bo_wait(bo, true, false);
-   if (err)
+   err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+   true, 15 * HZ);
+   if (err < 0)
return err;
+   if (err == 0)
+   return -EBUSY;
 
err = i915_ttm_move_notify(bo);
if (err)
-- 
2.34.1

[PATCH 5/9] drm/nouveau: stop using ttm_bo_wait

2022-11-25 Thread Christian König

TTM is just wrapping core DMA functionality here, remove the mid-layer.
No functional change.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |  6 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c | 11 ---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 335fa91ca4ad..288eebc70a67 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -922,6 +922,7 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object 
*bo,
struct nouveau_mem *mem = new_reg ? nouveau_mem(new_reg) : NULL;
struct nouveau_bo *nvbo = nouveau_bo(bo);
struct nouveau_vma *vma;
+   long ret;
 
/* ttm can now (stupidly) pass the driver bos it didn't create... */
if (bo->destroy != nouveau_bo_del_ttm)
@@ -936,7 +937,10 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object 
*bo,
}
} else {
list_for_each_entry(vma, &nvbo->vma_list, head) {
-   WARN_ON(ttm_bo_wait(bo, false, false));
+   ret = dma_resv_wait_timeout(bo->base.resv,
+   DMA_RESV_USAGE_BOOKKEEP,
+   false, 15 * HZ);
+   WARN_ON(ret <= 0);
nouveau_vma_unmap(vma);
}
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index ac5793c96957..f77e44958037 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -645,7 +645,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
struct drm_nouveau_gem_pushbuf_reloc *reloc,
struct drm_nouveau_gem_pushbuf_bo *bo)
 {
-   int ret = 0;
+   long ret = 0;
unsigned i;
 
for (i = 0; i < req->nr_relocs; i++) {
@@ -703,9 +703,14 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
data |= r->vor;
}
 
-   ret = ttm_bo_wait(&nvbo->bo, false, false);
+   ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
+   DMA_RESV_USAGE_BOOKKEEP,
+   false, 15 * HZ);
+   if (ret == 0)
+   ret = -EBUSY;
if (ret) {
-   NV_PRINTK(err, cli, "reloc wait_idle failed: %d\n", 
ret);
+   NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
+ ret);
break;
}
 
-- 
2.34.1

[PATCH 9/9] drm/ttm: move ttm_bo_wait into VMWGFX

2022-11-25 Thread Christian König

Not used anymore by other drivers or TTM itself.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 44 +++--
 drivers/gpu/drm/ttm/ttm_bo_util.c   | 19 -
 drivers/gpu/drm/vmwgfx/ttm_object.h | 11 
 include/drm/ttm/ttm_bo.h|  1 -
 4 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index cd266a067773..326a3d13a829 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1087,47 +1087,35 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_unmap_virtual);
 
 /**
- * ttm_bo_wait - wait for buffer idle.
+ * ttm_bo_wait_ctx - wait for buffer idle.
  *
  * @bo:  The buffer object.
- * @interruptible:  Use interruptible wait.
- * @no_wait:  Return immediately if buffer is busy.
+ * @ctx: defines how to wait
  *
- * This function must be called with the bo::mutex held, and makes
- * sure any previous rendering to the buffer is completed.
- * Note: It might be necessary to block validations before the
- * wait by reserving the buffer.
- * Returns -EBUSY if no_wait is true and the buffer is busy.
- * Returns -ERESTARTSYS if interrupted by a signal.
+ * Waits for the buffer to be idle. Used timeout depends on the context.
+ * Returns -EBUSY if wait timed outt, -ERESTARTSYS if interrupted by a signal 
or
+ * zero on success.
  */
-int ttm_bo_wait(struct ttm_buffer_object *bo,
-   bool interruptible, bool no_wait)
+int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx 
*ctx)
 {
-   long timeout = 15 * HZ;
+   long ret;
 
-   if (no_wait) {
-   if (dma_resv_test_signaled(bo->base.resv, 
DMA_RESV_USAGE_BOOKKEEP))
+   if (ctx->no_wait_gpu) {
+   if (dma_resv_test_signaled(bo->base.resv,
+  DMA_RESV_USAGE_BOOKKEEP))
return 0;
else
return -EBUSY;
}
 
-   timeout = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
-   interruptible, timeout);
-   if (timeout < 0)
-   return timeout;
-
-   if (timeout == 0)
+   ret = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+   ctx->interruptible, 15 * HZ);
+   if (unlikely(ret < 0))
+   return ret;
+   if (unlikely(ret == 0))
return -EBUSY;
-
return 0;
 }
-EXPORT_SYMBOL(ttm_bo_wait);
-
-int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx 
*ctx)
-{
-   return ttm_bo_wait(bo, ctx->interruptible, ctx->no_wait_gpu);
-}
 EXPORT_SYMBOL(ttm_bo_wait_ctx);
 
 int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
@@ -1135,7 +1123,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
 {
struct ttm_place place;
bool locked;
-   int ret;
+   long ret;
 
/*
 * While the bo may already reside in SYSTEM placement, set
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index fee7c20775c0..ed2b28734541 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -547,9 +547,13 @@ EXPORT_SYMBOL(ttm_bo_vunmap);
 static int ttm_bo_wait_free_node(struct ttm_buffer_object *bo,
 bool dst_use_tt)
 {
-   int ret;
-   ret = ttm_bo_wait(bo, false, false);
-   if (ret)
+   long ret;
+
+   ret = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+   false, 15 * HZ);
+   if (ret == 0)
+   return -EBUSY;
+   if (ret < 0)
return ret;
 
if (!dst_use_tt)
@@ -710,8 +714,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
return ret;
 
/* If already idle, no need for ghost object dance. */
-   ret = ttm_bo_wait(bo, false, true);
-   if (ret != -EBUSY) {
+   if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP)) {
if (!bo->ttm) {
/* See comment below about clearing. */
ret = ttm_tt_create(bo, true);
@@ -748,8 +751,10 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
 
ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv);
/* Last resort, wait for the BO to be idle when we are OOM */
-   if (ret)
-   ttm_bo_wait(bo, false, false);
+   if (ret) {
+   dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+ false, MAX_SCHEDULE_TIMEOUT);
+   }
 
dma_resv_unlock(&ghost->base._resv);
ttm_bo_put(ghost);
diff --git a/drivers/gpu/drm/vmwgfx/ttm_object.h 
b/drivers/gpu/drm/vmwgfx/ttm_object.h
index f0ebbe340ad6..95a96

[PATCH 4/9] drm/ttm: merge ttm_bo_api.h and ttm_bo_driver.h

2022-11-25 Thread Christian König

Merge and cleanup the two headers into a single description of the
object API. Also move all the documentation to the implementation and
drop unecessary includes from the header.

No functional change.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |   1 +
 drivers/gpu/drm/drm_gem_ttm_helper.c  |   2 +
 drivers/gpu/drm/drm_gem_vram_helper.c |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |   2 +-
 drivers/gpu/drm/i915/i915_deps.c  |   2 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   2 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   |   1 -
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
 drivers/gpu/drm/nouveau/nouveau_bo.h  |   3 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h |   3 +-
 drivers/gpu/drm/nouveau/nouveau_mem.c |   3 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h |   2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c   |   1 +
 drivers/gpu/drm/nouveau/nouveau_sgdma.c   |   1 +
 drivers/gpu/drm/qxl/qxl_drv.h |   3 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |   4 +-
 drivers/gpu/drm/radeon/radeon.h   |   3 +-
 drivers/gpu/drm/radeon/radeon_prime.c |   2 +
 drivers/gpu/drm/radeon/radeon_ttm.c   |   4 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  81 +++-
 drivers/gpu/drm/ttm/ttm_bo_util.c | 110 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  19 +-
 drivers/gpu/drm/ttm/ttm_device.c  |   2 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|   6 +-
 drivers/gpu/drm/ttm/ttm_pool.c|   3 +-
 drivers/gpu/drm/ttm/ttm_range_manager.c   |   2 +-
 drivers/gpu/drm/ttm/ttm_resource.c|   3 +-
 drivers/gpu/drm/ttm/ttm_tt.c  |   3 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |   1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gmr.c   |   1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c |   1 -
 .../gpu/drm/vmwgfx/vmwgfx_system_manager.c|   1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c|   1 -
 include/drm/drm_gem_ttm_helper.h  |   3 +-
 include/drm/drm_gem_vram_helper.h |   4 +-
 include/drm/ttm/{ttm_bo_api.h => ttm_bo.h}| 345 +-
 include/drm/ttm/ttm_bo_driver.h   | 303 ---
 include/drm/ttm/ttm_execbuf_util.h|   4 +-
 55 files changed, 410 insertions(+), 557 deletions(-)
 rename include/drm/ttm/{ttm_bo_api.h => ttm_bo.h} (67%)
 delete mode 100644 include/drm/ttm/ttm_bo_driver.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6b74df446694..2644cd991210 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -52,8 +52,7 @@
 #include 
 #include 
 
-#include 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3a763916a5a1..ab450f12c445 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "amdgpu_object.h"
 #include "amdgpu_gem.h"
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index e4d78491bcc7..ededdc01ca28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -28,6 +28,8 @@
 
 struct hmm_range;
 
+struct drm_file;
+
 struct amdgpu_device;
 struct amdgpu_bo;
 struct amdgpu_bo_va;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8516c814bc9b..8b7a09b392ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -32,6 +32,8 @@
 
 #include 
 #include 
+#include 
+
 #include "amdgpu_cs.h"
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_c

[PATCH 2/9] drm/ttm: remove ttm_bo_(un)lock_delayed_workqueue

2022-11-25 Thread Christian König

Those functions never worked correctly since it is still perfectly
possible that a buffer object is released and the background worker
restarted even after calling them.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +---
 drivers/gpu/drm/radeon/radeon_device.c  |  5 -
 drivers/gpu/drm/radeon/radeon_pm.c  |  4 +---
 drivers/gpu/drm/ttm/ttm_bo.c| 14 --
 include/drm/ttm/ttm_bo_api.h| 16 
 6 files changed, 3 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 0f16d3c09309..f60753f97ac5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1717,7 +1717,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
amdgpu_ring *ring)
 
 static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 {
-   int r, resched, length;
+   int r, length;
struct amdgpu_ring *ring;
struct dma_fence **fences = NULL;
struct amdgpu_device *adev = (struct amdgpu_device *)data;
@@ -1747,8 +1747,6 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
/* stop the scheduler */
kthread_park(ring->sched.thread);
 
-   resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
-
/* preempt the IB */
r = amdgpu_ring_preempt_ib(ring);
if (r) {
@@ -1785,8 +1783,6 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
up_read(&adev->reset_domain->sem);
 
-   ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
-
 pro_end:
kfree(fences);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b2b1c66bfe39..2b1db37e25c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3983,10 +3983,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
}
amdgpu_fence_driver_hw_fini(adev);
 
-   if (adev->mman.initialized) {
+   if (adev->mman.initialized)
flush_delayed_work(&adev->mman.bdev.wq);
-   ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
-   }
 
if (adev->pm_sysfs_en)
amdgpu_pm_sysfs_fini(adev);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 6344454a7721..9a556f505685 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1772,7 +1772,6 @@ int radeon_gpu_reset(struct radeon_device *rdev)
bool saved = false;
 
int i, r;
-   int resched;
 
down_write(&rdev->exclusive_lock);
 
@@ -1784,8 +1783,6 @@ int radeon_gpu_reset(struct radeon_device *rdev)
atomic_inc(&rdev->gpu_reset_counter);
 
radeon_save_bios_scratch_regs(rdev);
-   /* block TTM */
-   resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
radeon_suspend(rdev);
radeon_hpd_fini(rdev);
 
@@ -1844,8 +1841,6 @@ int radeon_gpu_reset(struct radeon_device *rdev)
/* reset hpd state */
radeon_hpd_init(rdev);
 
-   ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
-
rdev->in_reset = true;
rdev->needs_reset = false;
 
diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
b/drivers/gpu/drm/radeon/radeon_pm.c
index 04c693ca419a..cbc554928bcc 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -1853,11 +1853,10 @@ static bool radeon_pm_debug_check_in_vbl(struct 
radeon_device *rdev, bool finish
 static void radeon_dynpm_idle_work_handler(struct work_struct *work)
 {
struct radeon_device *rdev;
-   int resched;
+
rdev = container_of(work, struct radeon_device,
pm.dynpm_idle_work.work);
 
-   resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
mutex_lock(&rdev->pm.mutex);
if (rdev->pm.dynpm_state == DYNPM_STATE_ACTIVE) {
int not_processed = 0;
@@ -1908,7 +1907,6 @@ static void radeon_dynpm_idle_work_handler(struct 
work_struct *work)
  msecs_to_jiffies(RADEON_IDLE_LOOP_MS));
}
mutex_unlock(&rdev->pm.mutex);
-   ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
 }
 
 /*
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c3f4b33136e5..b77262a623e0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -418,20 +418,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
 }
 EXPORT_SYMBOL(ttm_bo_put);
 
-int ttm_bo_lock_delayed_workqueue(struct ttm_device *bdev)
-{
-   return cancel_delayed_work_sync(&bdev->wq);
-}
-EXPORT_SYMBOL(ttm_bo_lock_delayed_workqueue);
-
-void ttm_bo_unlock_delayed_workqueue(struct ttm_device *bdev, int resc

[PATCH 3/9] drm/ttm: use per BO cleanup workers

2022-11-25 Thread Christian König

Instead of a single worker going over the list of delete BOs in regular
intervals use a per BO worker which blocks for the resv object and
locking of the BO.

This not only simplifies the handling massively, but also results in
much better response time when cleaning up buffers.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
 drivers/gpu/drm/i915/i915_gem.c|   2 +-
 drivers/gpu/drm/i915/intel_region_ttm.c|   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c   | 112 -
 drivers/gpu/drm/ttm/ttm_bo_util.c  |   1 -
 drivers/gpu/drm/ttm/ttm_device.c   |  24 ++---
 include/drm/ttm/ttm_bo_api.h   |  18 +---
 include/drm/ttm/ttm_device.h   |   7 +-
 8 files changed, 57 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b1db37e25c1..74ccbd566777 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3984,7 +3984,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_fence_driver_hw_fini(adev);
 
if (adev->mman.initialized)
-   flush_delayed_work(&adev->mman.bdev.wq);
+   drain_workqueue(adev->mman.bdev.wq);
 
if (adev->pm_sysfs_en)
amdgpu_pm_sysfs_fini(adev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8468ca9885fd..c38306f156d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1099,7 +1099,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private 
*i915)
 {
while (atomic_read(&i915->mm.free_count)) {
flush_work(&i915->mm.free_work);
-   flush_delayed_work(&i915->bdev.wq);
+   drain_workqueue(i915->bdev.wq);
rcu_barrier();
}
 }
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
b/drivers/gpu/drm/i915/intel_region_ttm.c
index cf89d0c2a2d9..657bbc16a48a 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -132,7 +132,7 @@ int intel_region_ttm_fini(struct intel_memory_region *mem)
break;
 
msleep(20);
-   flush_delayed_work(&mem->i915->bdev.wq);
+   drain_workqueue(mem->i915->bdev.wq);
}
 
/* If we leaked objects, Don't free the region causing use after free */
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index b77262a623e0..4749b65bedc4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -280,14 +280,13 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
*bo,
ret = 0;
}
 
-   if (ret || unlikely(list_empty(&bo->ddestroy))) {
+   if (ret) {
if (unlock_resv)
dma_resv_unlock(bo->base.resv);
spin_unlock(&bo->bdev->lru_lock);
return ret;
}
 
-   list_del_init(&bo->ddestroy);
spin_unlock(&bo->bdev->lru_lock);
ttm_bo_cleanup_memtype_use(bo);
 
@@ -300,47 +299,21 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
*bo,
 }
 
 /*
- * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
- * encountered buffers.
+ * Block for the dma_resv object to become idle, lock the buffer and clean up
+ * the resource and tt object.
  */
-bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
+static void ttm_bo_delayed_delete(struct work_struct *work)
 {
-   struct list_head removed;
-   bool empty;
-
-   INIT_LIST_HEAD(&removed);
-
-   spin_lock(&bdev->lru_lock);
-   while (!list_empty(&bdev->ddestroy)) {
-   struct ttm_buffer_object *bo;
-
-   bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
- ddestroy);
-   list_move_tail(&bo->ddestroy, &removed);
-   if (!ttm_bo_get_unless_zero(bo))
-   continue;
-
-   if (remove_all || bo->base.resv != &bo->base._resv) {
-   spin_unlock(&bdev->lru_lock);
-   dma_resv_lock(bo->base.resv, NULL);
-
-   spin_lock(&bdev->lru_lock);
-   ttm_bo_cleanup_refs(bo, false, !remove_all, true);
-
-   } else if (dma_resv_trylock(bo->base.resv)) {
-   ttm_bo_cleanup_refs(bo, false, !remove_all, true);
-   } else {
-   spin_unlock(&bdev->lru_lock);
-   }
+   struct ttm_buffer_object *bo;
 
-   ttm_bo_put(bo);
-   spin_lock(&bdev->lru_lock);
-   }
-   list_splice_tail(&removed, &bdev->ddestroy);
-   empty = list_empty(&bdev->ddestroy);
-   spin_unlock(&bdev->lru_lock);
+   bo = container_of(work, typeof(*bo), delayed_delete);
 
-

[PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation

2022-11-25 Thread Christian König

We already fallback to a dummy BO with no backing store when we
allocate GDS,GWS and OA resources and to GTT when we allocate VRAM.

Drop all those workarounds and generalize this for GTT as well. This
fixes ENOMEM issues with runaway applications which try to allocate/free
GTT in a loop and are otherwise only limited by the CPU speed.

The CS will wait for the cleanup of freed up BOs to satisfy the
various domain specific limits and so effectively throttle those
buggy applications down to a sane allocation behavior again.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a0780a4e3e61..62e98f1ad770 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
bp.resv = resv;
bp.preferred_domain = initial_domain;
bp.flags = flags;
-   bp.domain = initial_domain;
+   bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
bp.bo_ptr_size = sizeof(struct amdgpu_bo);
 
r = amdgpu_bo_create_user(adev, &bp, &ubo);
@@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
}
 
initial_domain = (u32)(0x & args->in.domains);
-retry:
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
-initial_domain,
-flags, ttm_bo_type_device, resv, &gobj);
+initial_domain, flags, ttm_bo_type_device,
+resv, &gobj);
if (r && r != -ERESTARTSYS) {
-   if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
-   flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
-   goto retry;
-   }
-
-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, 
%d)\n",
size, initial_domain, args->in.alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 974e85d8b6cc..919bbea2e3ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 
bo->tbo.bdev = &adev->mman.bdev;
-   if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
- AMDGPU_GEM_DOMAIN_GDS))
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
-   else
-   amdgpu_bo_placement_from_domain(bo, bp->domain);
+   amdgpu_bo_placement_from_domain(bo, bp->domain);
if (bp->type == ttm_bo_type_kernel)
bo->tbo.priority = 1;
 
-- 
2.34.1

Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread Christian König





Am 25.11.22 um 09:33 schrieb Luben Tuikov:

On 2022-11-25 02:59, Christian König wrote:

Am 25.11.22 um 08:56 schrieb Luben Tuikov:

On 2022-11-25 02:45, Christian König wrote:

Am 24.11.22 um 22:19 schrieb Luben Tuikov:

Fix minmax compilation error by using min_t()/max_t(), of the assignment type.

Cc: James Zhu 
Cc: Felix Kuehling 
Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system 
memory")
Signed-off-by: Luben Tuikov 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 8a2e5716d8dba2..d22d14b0ef0c84 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,
hmm_range->dev_private_owner = owner;

	do {

-   hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
+   hmm_range->end = min_t(typeof(hmm_range->end),
+  hmm_range->start + MAX_WALK_BYTE,
+  end);

Since end is a local variable I would strongly prefer to just have it
use the correct type for it.

Otherwise we might end up using something which doesn't work on all
architectures.

They all appear to be "unsigned long". I thought, since we assign to
hmm_range->end, we use that type.

Mhm, then why does the compiler complain here?

Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ),
and this is why the minmax check complains.

So, since the left-hand expression is unsigned long,
i.e.,
hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
is,
unsigned long = min(unsigned long long, unsigned long);
The compiler complains.

I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX,


That's not only a preference, but a must have. Otherwise the code maybe 
won't work as expected on 32bit architectures.



and be defined as UL. I mean, why is everything in struct hmm_range
"unsigned long", but we set a high limit of 10__h for an end, and
compare it to "end" to find the smaller? If our "end" could potentially
be 10__h then shouldn't the members in struct hmm_range be
unsigned long long as well?


No, that the hmm range depends on the address space bits of the CPU is 
perfectly correct. Essentially this is just an userspace address range.


Our problem here is that this code needs to work on both 32bit and 64bit 
systems. And on a 32bit system limiting the types wouldn't work 
correctly as far as I can see.


So the compiler is complaining for rather good reasons and by using 
"min_t(UL" we just hide that instead of fixing the problem.


I suggest to use "min_t(u64" instead. An intelligent compiler should 
even be capable of optimizing this away by looking at the input types on 
32bit archs.




And for the timeout, we have the (now) obvious,

timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL);

and I don't know why we necessarily need a "1ULL", when 1UL would do just fine,
and then compilation passes for that statement. I can set this to 1UL, instead
of using max_t().


I think just changing this to 1UL should be sufficient.

Regards,
Christian.



Regards,
Luben



As far as I can see "unsigned long" is correct here, but if we somehow
have a typecast then something is not working as expected.

Is MAX_WALK_BYTE maybe of signed type?


Would you prefer at the top of the function to define "timeout" and "end" as,
typeof(hmm_range->end) end, timeout;

Well for end that might make sense, but timeout is independent of the
hmm range.

Regards,
Christian.


Regards,
Luben

[PATCH 3/3] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled

2022-11-25 Thread Lee Jones

When enabled, KASAN enlarges function's stack-frames.  Pushing quite a
few over the current threshold.  This can mainly be seen on 32-bit
architectures where the present limit (when !GCC) is a lowly
1024-Bytes.

Signed-off-by: Lee Jones 
---
 lib/Kconfig.debug | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c3c0b077ade33..82d475168db95 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -399,6 +399,7 @@ config FRAME_WARN
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 2048 if PARISC
default 1536 if (!64BIT && XTENSA)
+   default 1280 if KASAN && !64BIT
default 1024 if !64BIT
default 2048 if 64BIT
help
-- 
2.38.1.584.g0f3c55d4c2-goog

[PATCH 2/3] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame

2022-11-25 Thread Lee Jones

calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
architectures built with Clang (all released versions), whereby the stack
frame gets blown up to well over 5k.  This would cause an immediate kernel
panic on most architectures.  We'll revert this when the following bug report
has been resolved: https://github.com/llvm/llvm-project/issues/41896.

Suggested-by: Arnd Bergmann 
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/Kconfig | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 34f5a092c99e7..1fa7b9760adb8 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -265,6 +265,7 @@ source "drivers/gpu/drm/radeon/Kconfig"
 
 config DRM_AMDGPU
tristate "AMD GPU"
+   depends on BROKEN || !CC_IS_CLANG || !(X86_64 || SPARC64 || ARM64)
depends on DRM && PCI && MMU
select FW_LOADER
select DRM_DISPLAY_DP_HELPER
@@ -289,6 +290,12 @@ config DRM_AMDGPU
help
  Choose this option if you have a recent AMD Radeon graphics card.
 
+ calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 
|| ARM64)
+ architectures built with Clang (all released versions), whereby the 
stack
+ frame gets blown up to well over 5k.  This would cause an immediate 
kernel
+ panic on most architectures.  We'll revert this when the following 
bug report
+ has been resolved: https://github.com/llvm/llvm-project/issues/41896.
+
  If M is selected, the module will be called amdgpu.
 
 source "drivers/gpu/drm/amd/amdgpu/Kconfig"
-- 
2.38.1.584.g0f3c55d4c2-goog

[PATCH 1/3] drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of code

2022-11-25 Thread Lee Jones

bw_calcs() presently blows the stack-frame limit by calling functions
inside a argument list which return quite a bit of data to be passed
onto sub-functions.  Simply breaking out this hunk reduces the
stack-frame use by 500 Bytes, preventing the following compiler
warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dce_calcs.c:3285:6:
  warning: stack frame size (1384) exceeds limit (1024)
in 'bw_calcs' [-Wframe-larger-than]
bool bw_calcs(struct dc_context *ctx,
 ^
1 warning generated.

This resolves the issue and takes us one step closer towards a
successful allmodconfig WERROR build.

Signed-off-by: Lee Jones 
---
 .../drm/amd/display/dc/dml/calcs/dce_calcs.c  | 483 +-
 1 file changed, 245 insertions(+), 238 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c 
b/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c
index 0100a6053ab6b..ce5918830c030 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c
@@ -3034,6 +3034,248 @@ static bool all_displays_in_sync(const struct pipe_ctx 
pipe[],
return true;
 }
 
+void bw_calcs_mid_phase(struct dc_context *ctx, const struct bw_calcs_dceip 
*dceip,
+   const struct bw_calcs_vbios *vbios, struct 
dce_bw_output *calcs_output,
+   struct bw_fixed low_sclk, struct bw_fixed mid1_sclk,
+   struct bw_fixed mid2_sclk, struct bw_fixed mid3_sclk,
+   struct bw_fixed mid_yclk, struct bw_calcs_data *data)
+{
+   ((struct bw_calcs_vbios *)vbios)->low_sclk = mid3_sclk;
+   ((struct bw_calcs_vbios *)vbios)->mid1_sclk = mid3_sclk;
+   ((struct bw_calcs_vbios *)vbios)->mid2_sclk = mid3_sclk;
+   calculate_bandwidth(dceip, vbios, data);
+
+   calcs_output->nbp_state_change_wm_ns[0].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  
nbp_state_change_watermark[4],bw_int_to_fixed(1000)));
+   calcs_output->nbp_state_change_wm_ns[1].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[5], 
bw_int_to_fixed(1000)));
+   calcs_output->nbp_state_change_wm_ns[2].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[6], 
bw_int_to_fixed(1000)));
+
+   if (ctx->dc->caps.max_slave_planes) {
+   calcs_output->nbp_state_change_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[0], 
bw_int_to_fixed(1000)));
+   calcs_output->nbp_state_change_wm_ns[4].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[1], 
bw_int_to_fixed(1000)));
+   } else {
+   calcs_output->nbp_state_change_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[7], 
bw_int_to_fixed(1000)));
+   calcs_output->nbp_state_change_wm_ns[4].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[8], 
bw_int_to_fixed(1000)));
+   }
+   calcs_output->nbp_state_change_wm_ns[5].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  nbp_state_change_watermark[9], 
bw_int_to_fixed(1000)));
+
+   calcs_output->stutter_exit_wm_ns[0].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[4], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_exit_wm_ns[1].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[5], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_exit_wm_ns[2].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[6], 
bw_int_to_fixed(1000)));
+   if (ctx->dc->caps.max_slave_planes) {
+   calcs_output->stutter_exit_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[0], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_exit_wm_ns[4].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[1], 
bw_int_to_fixed(1000)));
+   } else {
+   calcs_output->stutter_exit_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+  stutter_exit_watermark[7], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_exit_wm_ns[4].b_mark =
+   bw_fixed_to_int(b

[PATCH 0/3] Fix a bunch of allmodconfig errors

2022-11-25 Thread Lee Jones

Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") WERROR
now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig
builds.  This leads to some interesting failures, each resolved in this set.

With this set applied, I am able to obtain a successful allmodconfig Arm build.

Lee Jones (3):
  drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of
code
  drm/amdgpu: Temporarily disable broken Clang builds due to blown
stack-frame
  Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is
enabled

 drivers/gpu/drm/Kconfig   |   7 +
 .../drm/amd/display/dc/dml/calcs/dce_calcs.c  | 483 +-
 lib/Kconfig.debug |   1 +
 3 files changed, 253 insertions(+), 238 deletions(-)


-- 
2.38.1.584.g0f3c55d4c2-goog

[PATCH] drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()

2022-11-25 Thread Dan Carpenter

We re-arranged this code recently so "ret" is always zero at this point.

Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 6f01ebc8557b..bceb1a5b2518 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2012,10 +2012,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
kfd_debug_print_topology();
 
-   if (!res)
-   kfd_notify_gpu_change(gpu_id, 1);
+   kfd_notify_gpu_change(gpu_id, 1);
 
-   return res;
+   return 0;
 }
 
 /**
-- 
2.35.1

Re: [PATCH] drm/amdgpu: Fix minmax error

2022-11-25 Thread Luben Tuikov

On 2022-11-25 02:59, Christian König wrote:
> Am 25.11.22 um 08:56 schrieb Luben Tuikov:
>> On 2022-11-25 02:45, Christian König wrote:
>>>
>>> Am 24.11.22 um 22:19 schrieb Luben Tuikov:
 Fix minmax compilation error by using min_t()/max_t(), of the assignment 
 type.

 Cc: James Zhu 
 Cc: Felix Kuehling 
 Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large 
 system memory")
 Signed-off-by: Luben Tuikov 
 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 index 8a2e5716d8dba2..d22d14b0ef0c84 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
 @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct 
 mmu_interval_notifier *notifier,
hmm_range->dev_private_owner = owner;

do {
 -  hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
 +  hmm_range->end = min_t(typeof(hmm_range->end),
 + hmm_range->start + MAX_WALK_BYTE,
 + end);
>>> Since end is a local variable I would strongly prefer to just have it
>>> use the correct type for it.
>>>
>>> Otherwise we might end up using something which doesn't work on all
>>> architectures.
>> They all appear to be "unsigned long". I thought, since we assign to
>> hmm_range->end, we use that type.
> 
> Mhm, then why does the compiler complain here?

Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ),
and this is why the minmax check complains.

So, since the left-hand expression is unsigned long,
i.e.,
hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
is,
unsigned long = min(unsigned long long, unsigned long);
The compiler complains.

I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX,
and be defined as UL. I mean, why is everything in struct hmm_range
"unsigned long", but we set a high limit of 10__h for an end, and
compare it to "end" to find the smaller? If our "end" could potentially
be 10__h then shouldn't the members in struct hmm_range be
unsigned long long as well?

And for the timeout, we have the (now) obvious,

timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL);

and I don't know why we necessarily need a "1ULL", when 1UL would do just fine,
and then compilation passes for that statement. I can set this to 1UL, instead
of using max_t().

Regards,
Luben

> 
> As far as I can see "unsigned long" is correct here, but if we somehow 
> have a typecast then something is not working as expected.
> 
> Is MAX_WALK_BYTE maybe of signed type?
> 
>>
>> Would you prefer at the top of the function to define "timeout" and "end" as,
>>  typeof(hmm_range->end) end, timeout;
> 
> Well for end that might make sense, but timeout is independent of the 
> hmm range.
> 
> Regards,
> Christian.
> 
>>
>> Regards,
>> Luben
>>
>

43 matches

Mail list logo