[PATCH] drm/amdgpu: Fix minmax error
Fix minmax compilation error by using the correct constant and correct integer suffix. Cc: James Zhu Cc: Felix Kuehling Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 8a2e5716d8dba2..65715cb395d838 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -51,7 +51,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_hmm.h" -#define MAX_WALK_BYTE (64ULL<<30) +#define MAX_WALK_BYTE (2UL << 30) /** * amdgpu_hmm_invalidate_gfx - callback to notify about mm change @@ -197,8 +197,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->start, hmm_range->end); /* Assuming 512MB takes maxmium 1 second to fault page address */ - timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL) * - HMM_RANGE_DEFAULT_TIMEOUT; + timeout = max((hmm_range->end - hmm_range->start) >> 29, 1UL); + timeout *= HMM_RANGE_DEFAULT_TIMEOUT; timeout = jiffies + msecs_to_jiffies(timeout); retry: base-commit: 9e95ce4c60631c1339204f8723008a715391f410 -- 2.39.0.rc0
Re: [PATCH] drm/amdgpu: Fix minmax error
On 2022-11-25 16:03, James Zhu wrote: > > On 2022-11-25 14:42, Luben Tuikov wrote: >> On 2022-11-25 04:57, Christian König wrote: >>> >>> Am 25.11.22 um 09:33 schrieb Luben Tuikov: On 2022-11-25 02:59, Christian König wrote: > Am 25.11.22 um 08:56 schrieb Luben Tuikov: >> On 2022-11-25 02:45, Christian König wrote: >>> Am 24.11.22 um 22:19 schrieb Luben Tuikov: Fix minmax compilation error by using min_t()/max_t(), of the assignment type. Cc: James Zhu Cc: Felix Kuehling Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 8a2e5716d8dba2..d22d14b0ef0c84 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->dev_private_owner = owner; do { - hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); + hmm_range->end = min_t(typeof(hmm_range->end), + hmm_range->start + MAX_WALK_BYTE, + end); >>> Since end is a local variable I would strongly prefer to just have it >>> use the correct type for it. >>> >>> Otherwise we might end up using something which doesn't work on all >>> architectures. >> They all appear to be "unsigned long". I thought, since we assign to >> hmm_range->end, we use that type. > Mhm, then why does the compiler complain here? Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ), and this is why the minmax check complains. So, since the left-hand expression is unsigned long, i.e., hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); is, unsigned long = min(unsigned long long, unsigned long); The compiler complains. I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX, >>> That's not only a preference, but a must have. Otherwise the code maybe >>> won't work as expected on 32bit architectures. >> Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion >> below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE >> even matters--it wouldn't matter so long as the type in min_t() is u64. >> It's a macro at the moment. >> >> However, the LHS--struct hmm_range's members are all >> unsigned long and then we're essentially doing (unsigned long) = (u64), >> which on 32-bit is (u32) = (u64). > [JZ]MAX_WALK_BYTE can be reduce to #define MAX_WALK_BYTE (2UL<<30) Hi James--okay, I'll prep up a patch. Regards, Luben
Re: [PATCH] drm/amdgpu: Fix minmax error
On 2022-11-25 14:42, Luben Tuikov wrote: On 2022-11-25 04:57, Christian König wrote: Am 25.11.22 um 09:33 schrieb Luben Tuikov: On 2022-11-25 02:59, Christian König wrote: Am 25.11.22 um 08:56 schrieb Luben Tuikov: On 2022-11-25 02:45, Christian König wrote: Am 24.11.22 um 22:19 schrieb Luben Tuikov: Fix minmax compilation error by using min_t()/max_t(), of the assignment type. Cc: James Zhu Cc: Felix Kuehling Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 8a2e5716d8dba2..d22d14b0ef0c84 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->dev_private_owner = owner; do { - hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); + hmm_range->end = min_t(typeof(hmm_range->end), + hmm_range->start + MAX_WALK_BYTE, + end); Since end is a local variable I would strongly prefer to just have it use the correct type for it. Otherwise we might end up using something which doesn't work on all architectures. They all appear to be "unsigned long". I thought, since we assign to hmm_range->end, we use that type. Mhm, then why does the compiler complain here? Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ), and this is why the minmax check complains. So, since the left-hand expression is unsigned long, i.e., hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); is, unsigned long = min(unsigned long long, unsigned long); The compiler complains. I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX, That's not only a preference, but a must have. Otherwise the code maybe won't work as expected on 32bit architectures. Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE even matters--it wouldn't matter so long as the type in min_t() is u64. It's a macro at the moment. However, the LHS--struct hmm_range's members are all unsigned long and then we're essentially doing (unsigned long) = (u64), which on 32-bit is (u32) = (u64). [JZ]MAX_WALK_BYTE can be reduce to #define MAX_WALK_BYTE (2UL<<30) Regards, Luben and be defined as UL. I mean, why is everything in struct hmm_range "unsigned long", but we set a high limit of 10__h for an end, and compare it to "end" to find the smaller? If our "end" could potentially be 10__h then shouldn't the members in struct hmm_range be unsigned long long as well? No, that the hmm range depends on the address space bits of the CPU is perfectly correct. Essentially this is just an userspace address range. Our problem here is that this code needs to work on both 32bit and 64bit systems. And on a 32bit system limiting the types wouldn't work correctly as far as I can see. So the compiler is complaining for rather good reasons and by using "min_t(UL" we just hide that instead of fixing the problem. I suggest to use "min_t(u64" instead. An intelligent compiler should even be capable of optimizing this away by looking at the input types on 32bit archs. And for the timeout, we have the (now) obvious, timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL); and I don't know why we necessarily need a "1ULL", when 1UL would do just fine, and then compilation passes for that statement. I can set this to 1UL, instead of using max_t(). I think just changing this to 1UL should be sufficient. Regards, Christian. Regards, Luben As far as I can see "unsigned long" is correct here, but if we somehow have a typecast then something is not working as expected. Is MAX_WALK_BYTE maybe of signed type? Would you prefer at the top of the function to define "timeout" and "end" as, typeof(hmm_range->end) end, timeout; Well for end that might make sense, but timeout is independent of the hmm range. Regards, Christian. Regards, Luben
[PATCH] drm/amd/display: fix PSR-SU/DSC interoperability support
Currently, there are issues with enabling PSR-SU + DSC. This stems from the fact that DSC imposes a slice height on transmitted video data and we are not conforming to that slice height in PSR-SU regions. So, pass slice_height into su_y_granularity to feed the DSC slice height into PSR-SU code. Signed-off-by: Hamza Mahfooz --- .../drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 38 +++ 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c index 26291db0a3cf..55acadf0b63f 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c @@ -53,6 +53,41 @@ static bool link_supports_psrsu(struct dc_link *link) return true; } +static bool psr_su_set_y_granularity(struct dc *dc, struct dc_link *link, +struct dc_stream_state *stream, +struct psr_config *config) +{ + u16 pic_height; + u8 slice_height; + + if (!dc->caps.edp_dsc_support || + link->panel_config.dsc.disable_dsc_edp || + !link->dpcd_caps.dsc_caps.dsc_basic_caps.fields.dsc_support.DSC_SUPPORT || + !stream->timing.dsc_cfg.num_slices_v) + return true; + + pic_height = stream->timing.v_addressable + + stream->timing.v_border_top + stream->timing.v_border_bottom; + slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v; + + if (slice_height) { + if (config->su_y_granularity && + (slice_height % config->su_y_granularity)) { + WARN(1, +"%s: dsc: %d, slice_height: %d, num_slices_v: %d\n", +__func__, + stream->sink->dsc_caps.dsc_dec_caps.is_dsc_supported, +slice_height, +stream->timing.dsc_cfg.num_slices_v); + return false; + } + + config->su_y_granularity = slice_height; + } + + return true; +} + /* * amdgpu_dm_set_psr_caps() - set link psr capabilities * @link: link @@ -122,6 +157,9 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state *stream) psr_config.allow_multi_disp_optimizations = (amdgpu_dc_feature_mask & DC_PSR_ALLOW_MULTI_DISP_OPT); + if (!psr_su_set_y_granularity(dc, link, stream, &psr_config)) + return false; + ret = dc_link_setup_psr(link, stream, &psr_config, &psr_context); } -- 2.38.1
Re: [PATCH] drm/amdgpu: Fix minmax error
On 2022-11-25 04:57, Christian König wrote: > > > Am 25.11.22 um 09:33 schrieb Luben Tuikov: >> On 2022-11-25 02:59, Christian König wrote: >>> Am 25.11.22 um 08:56 schrieb Luben Tuikov: On 2022-11-25 02:45, Christian König wrote: > Am 24.11.22 um 22:19 schrieb Luben Tuikov: >> Fix minmax compilation error by using min_t()/max_t(), of the assignment >> type. >> >> Cc: James Zhu >> Cc: Felix Kuehling >> Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large >> system memory") >> Signed-off-by: Luben Tuikov >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c >> index 8a2e5716d8dba2..d22d14b0ef0c84 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c >> @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct >> mmu_interval_notifier *notifier, >> hmm_range->dev_private_owner = owner; >> >> do { >> -hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, >> end); >> +hmm_range->end = min_t(typeof(hmm_range->end), >> + hmm_range->start + MAX_WALK_BYTE, >> + end); > Since end is a local variable I would strongly prefer to just have it > use the correct type for it. > > Otherwise we might end up using something which doesn't work on all > architectures. They all appear to be "unsigned long". I thought, since we assign to hmm_range->end, we use that type. >>> Mhm, then why does the compiler complain here? >> Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) >> ), >> and this is why the minmax check complains. >> >> So, since the left-hand expression is unsigned long, >> i.e., >> hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); >> is, >> unsigned long = min(unsigned long long, unsigned long); >> The compiler complains. >> >> I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX, > > That's not only a preference, but a must have. Otherwise the code maybe > won't work as expected on 32bit architectures. Well, I don't know what to change MAX_WALK_BYTE to, and given the suggestion below to change to min_t(u64, ...), I wonder if messing with MAX_WALK_BYTE even matters--it wouldn't matter so long as the type in min_t() is u64. It's a macro at the moment. However, the LHS--struct hmm_range's members are all unsigned long and then we're essentially doing (unsigned long) = (u64), which on 32-bit is (u32) = (u64). Regards, Luben > >> and be defined as UL. I mean, why is everything in struct hmm_range >> "unsigned long", but we set a high limit of 10__h for an end, and >> compare it to "end" to find the smaller? If our "end" could potentially >> be 10__h then shouldn't the members in struct hmm_range be >> unsigned long long as well? > > No, that the hmm range depends on the address space bits of the CPU is > perfectly correct. Essentially this is just an userspace address range. > > Our problem here is that this code needs to work on both 32bit and 64bit > systems. And on a 32bit system limiting the types wouldn't work > correctly as far as I can see. > > So the compiler is complaining for rather good reasons and by using > "min_t(UL" we just hide that instead of fixing the problem. > > I suggest to use "min_t(u64" instead. An intelligent compiler should > even be capable of optimizing this away by looking at the input types on > 32bit archs. > >> >> And for the timeout, we have the (now) obvious, >> >> timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL); >> >> and I don't know why we necessarily need a "1ULL", when 1UL would do just >> fine, >> and then compilation passes for that statement. I can set this to 1UL, >> instead >> of using max_t(). > > I think just changing this to 1UL should be sufficient. > > Regards, > Christian. > >> >> Regards, >> Luben >> >> >>> As far as I can see "unsigned long" is correct here, but if we somehow >>> have a typecast then something is not working as expected. >>> >>> Is MAX_WALK_BYTE maybe of signed type? >>> Would you prefer at the top of the function to define "timeout" and "end" as, typeof(hmm_range->end) end, timeout; >>> Well for end that might make sense, but timeout is independent of the >>> hmm range. >>> >>> Regards, >>> Christian. >>> Regards, Luben >
Re: [PATCH 2/2] drm/amdgpu: Fix PCI device refcount leak in amdgpu_atrm_get_bios()
Applied the series. Thanks! Alex On Tue, Nov 22, 2022 at 6:13 AM Xiongfeng Wang wrote: > > As comment of pci_get_class() says, it returns a pci_device with its > refcount increased and decreased the refcount for the input parameter > @from if it is not NULL. > > If we break the loop in amdgpu_atrm_get_bios() with 'pdev' not NULL, we > need to call pci_dev_put() to decrease the refcount. Add the missing > pci_dev_put() to avoid refcount leak. > > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") > Signed-off-by: Xiongfeng Wang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c > index e363f56c72af..30c28a69e847 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c > @@ -317,6 +317,7 @@ static bool amdgpu_atrm_get_bios(struct amdgpu_device > *adev) > > if (!found) > return false; > + pci_dev_put(pdev); > > adev->bios = kmalloc(size, GFP_KERNEL); > if (!adev->bios) { > -- > 2.20.1 >
Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation
On Fri, Nov 25, 2022 at 5:21 AM Christian König wrote: > > We already fallback to a dummy BO with no backing store when we > allocate GDS,GWS and OA resources and to GTT when we allocate VRAM. > > Drop all those workarounds and generalize this for GTT as well. This > fixes ENOMEM issues with runaway applications which try to allocate/free > GTT in a loop and are otherwise only limited by the CPU speed. > > The CS will wait for the cleanup of freed up BOs to satisfy the > various domain specific limits and so effectively throttle those > buggy applications down to a sane allocation behavior again. > > Signed-off-by: Christian König This looks like a good bug fix and unrelated to the rest of this series. Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- > 2 files changed, 4 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > index a0780a4e3e61..62e98f1ad770 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, > unsigned long size, > bp.resv = resv; > bp.preferred_domain = initial_domain; > bp.flags = flags; > - bp.domain = initial_domain; > + bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU; > bp.bo_ptr_size = sizeof(struct amdgpu_bo); > > r = amdgpu_bo_create_user(adev, &bp, &ubo); > @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, > void *data, > } > > initial_domain = (u32)(0x & args->in.domains); > -retry: > r = amdgpu_gem_object_create(adev, size, args->in.alignment, > -initial_domain, > -flags, ttm_bo_type_device, resv, &gobj); > +initial_domain, flags, > ttm_bo_type_device, > +resv, &gobj); > if (r && r != -ERESTARTSYS) { > - if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) { > - flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > - goto retry; > - } > - > - if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) { > - initial_domain |= AMDGPU_GEM_DOMAIN_GTT; > - goto retry; > - } > DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, > %d)\n", > size, initial_domain, args->in.alignment, r); > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 974e85d8b6cc..919bbea2e3ac 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev, > bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; > > bo->tbo.bdev = &adev->mman.bdev; > - if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA | > - AMDGPU_GEM_DOMAIN_GDS)) > - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); > - else > - amdgpu_bo_placement_from_domain(bo, bp->domain); > + amdgpu_bo_placement_from_domain(bo, bp->domain); > if (bp->type == ttm_bo_type_kernel) > bo->tbo.priority = 1; > > -- > 2.34.1 >
[pull] amdgpu, amdkfd drm-next-6.2
Hi Dave, Daniel, More stuff for 6.2. Mostly bug fixes at this point. The following changes since commit 3d335a523b938a445a674be24d1dd5c7a4c86fb6: Merge tag 'drm-intel-next-2022-11-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2022-11-23 09:15:44 +1000) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-6.2-2022-11-25 for you to fetch changes up to 10d2d1fc05f03ee1626b60761a3425622767513e: drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read" (2022-11-23 10:31:31 -0500) amd-drm-next-6.2-2022-11-25: amdgpu: - Old GCC fix - GFX11 fixes - PSP suspend/resume fix - PCI ref count fix - DC KASAN fix - DCN 3.2.x fixes - Dell platform suspend/resume fixes - DCN 3.1.4 fixes - RAS fixes - SMU 13.x fixes - Flex array changes - VCN 4.0 RAS updates - Add missing licsense to some files - Documentation updates - SR-IOV fixes - DP MST DSC fix amdkfd: - Fix topology locking in error case Alex Deucher (4): drm/amdgpu/psp: don't free PSP buffers on suspend Revert "drm/amd/display: fix dpms_off issue when disabling bios mode" drm/amdgpu: add missing license to some files drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read" Alvin Lee (6): drm/amd/display: Add margin on DRR vblank start for subvp drm/amd/display: Limit HW cursor size of >= 4k drm/amd/display: Check if PSR enabled when entering MALL drm/amd/display: Add debug options for increasing phantom lines drm/amd/display: Retain phantom plane/stream if validation fails drm/amd/display: Revert check for phantom BPP Aric Cyr (1): drm/amd/display: 3.2.213 Aurabindo Pillai (1): drm/amd/display: trigger timing sync only if TG is running Bob zhou (1): drm/amd/display: fix compilation issue with legacy gcc Camille Cho (1): drm/amd/display: new ABM config 2 Candice Li (1): drm/amd/pm: Enable bad memory page/channel recording support for smu v13_0_0 David Galiffi (1): drm/amd/display: Fix rotated cursor offset calculation Dillon Varone (5): drm/amd/display: Update soc bounding box for dcn32/dcn321 drm/amd/display: Use dummy pstate latency for subvp when needed on dcn32 drm/amd/display: Add check for DET fetch latency hiding for dcn32 drm/amd/display: Use viewport height for subvp mall allocation size drm/amd/display: Use new num clk levels struct for max mclk index Felix Kuehling (1): drm/amdkfd: Release the topology_lock in error case Ilya Bakoulin (1): drm/amd/display: Fix display corruption w/ VSR enable Jack Xiao (1): drm/amd/amdgpu: reserve vm invalidation engine for firmware Jane Jian (1): drm/amdgpu/vcn: re-use original vcn0 doorbell value Luben Tuikov (1): drm/amdgpu: Fix minmax warning Lyude Paul (2): drm/amdgpu/dm/mst: Fix uninitialized var in pre_compute_mst_dsc_configs_for_state() drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN Mustapha Ghaddar (1): drm/amd/display: Phase 1 Add Bw Allocation source and header files Nicholas Kazlauskas (2): drm/amd/display: Update Z8 watermarks for DCN314 drm/amd/display: Add Z8 allow states to z-state support list Paulo Miguel Almeida (1): drm/amdgpu: Replace remaining 1-element array with flex-array Ren Zhijie (1): drm/amdgpu: fix unused-function error Rodrigo Siqueira (1): drm/amd/display: Add YCBCR2020 coefficients to CSC matrix Shikang Fan (1): drm/amdgpu: fix for suspend/resume kiq fence fallback under sriov Stanley.Yang (1): drm/amdgpu: fix use-after-free during gpu recovery Taimur Hassan (1): drm/amd/display: Avoid setting pixel rate divider to N/A Tao Zhou (2): drm/amdgpu: add register definition for VCN RAS initialization drm/amdgpu: enable RAS poison for VCN 2.6 Tsung-hua Lin (1): drm/amd/display: No display after resume from WB/CB Yang Yingliang (1): drm/amdgpu: fix pci device refcount leak ZhenGuo Yin (1): drm/amdgpu: update documentation of parameter amdgpu_gtt_size lyndonli (2): drm/amd/pm: update driver if header for smu_13_0_7 drm/amdgpu: add the fan abnormal detection feature drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 16 +-- drivers/gpu/drm/amd/amdgpu/mmsch_v4_0.h| 1 - drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c
[PATCH v3 2/2] drm/amdgpu: Add work function for GPU reset event
Add a work function to send a GPU reset uevent and scheduled it during a GPU reset. Co-developed-by: Shashank Sharma Signed-off-by: Shashank Sharma Signed-off-by: André Almeida --- V3: - Merge two last commits V2: Addressed review comments from Christian - Changed the name of the work to gpu_reset_event_work - Added a structure to accommodate some additional information (like a PID and some flags) - Do not add new structure in amdgpu.h --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 ++ 2 files changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6b74df446694..88cb5b739c5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -60,6 +60,8 @@ #include #include #include +#include +#include #include #include "dm_pp_interface.h" @@ -1003,6 +1005,7 @@ struct amdgpu_device { int asic_reset_res; struct work_struct xgmi_reset_work; + struct work_struct gpu_reset_event_work; struct list_headreset_list; longgfx_timeout; @@ -1036,6 +1039,7 @@ struct amdgpu_device { pci_channel_state_t pci_channel_state; struct amdgpu_reset_control *reset_cntl; + struct drm_reset_event_info reset_event_info; uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE]; boolram_is_direct_mapped; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b2b1c66bfe39..d04541fdb606 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -77,6 +77,7 @@ #include #include +#include MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); @@ -3365,6 +3366,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev) return amdgpu_device_asic_has_dc_support(adev->asic_type); } +static void amdgpu_device_reset_event_func(struct work_struct *__work) +{ + struct amdgpu_device *adev = container_of(__work, struct amdgpu_device, + gpu_reset_event_work); + /* +* A GPU reset has happened, inform the userspace and pass the reset +* related information +*/ + drm_sysfs_reset_event(&adev->ddev, &adev->reset_event_info); + + put_pid(adev->reset_event_info.pid); +} + static void amdgpu_device_xgmi_reset_func(struct work_struct *__work) { struct amdgpu_device *adev = @@ -3616,6 +3630,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, amdgpu_device_delay_enable_gfx_off); INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func); + INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func); adev->gfx.gfx_off_req_count = 1; adev->gfx.gfx_off_residency = 0; @@ -4920,6 +4935,21 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, goto out; vram_lost = amdgpu_device_check_vram_lost(tmp_adev); + + if (reset_context->job && reset_context->job->vm) { + tmp_adev->reset_event_info.pid = + find_get_pid(reset_context->job->vm->task_info.pid); + } else { + tmp_adev->reset_event_info.pid = NULL; + } + + if (vram_lost) + tmp_adev->reset_event_info.flags |= + DRM_RESET_EVENT_VRAM_LOST; + + /* Send GPU reset event */ + schedule_work(&tmp_adev->gpu_reset_event_work); + #ifdef CONFIG_DEV_COREDUMP tmp_adev->reset_vram_lost = vram_lost; memset(&tmp_adev->reset_task_info, 0, -- 2.38.1
[PATCH v3 1/2] drm: Add GPU reset sysfs event
From: Shashank Sharma Add a sysfs event to notify userspace about GPU resets providing: - PID that triggered the GPU reset, if any. Resets can happen from kernel threads as well, in that case no PID is provided - Information about the reset (e.g. was VRAM lost?) Co-developed-by: André Almeida Signed-off-by: André Almeida Signed-off-by: Shashank Sharma --- V3: - Reduce information to just PID and flags - Use pid pointer instead of just pid number - BUG() if no reset info is provided V2: - Addressed review comments from Christian and Amar - move the reset information structure to DRM layer - drop _ctx from struct name - make pid 32 bit(than 64) - set flag when VRAM invalid (than valid) - add process name as well (Amar) --- drivers/gpu/drm/drm_sysfs.c | 26 ++ include/drm/drm_sysfs.h | 13 + 2 files changed, 39 insertions(+) diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c index 430e00b16eec..85777abf4194 100644 --- a/drivers/gpu/drm/drm_sysfs.c +++ b/drivers/gpu/drm/drm_sysfs.c @@ -409,6 +409,32 @@ void drm_sysfs_hotplug_event(struct drm_device *dev) } EXPORT_SYMBOL(drm_sysfs_hotplug_event); +/** + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset + * @dev: DRM device + * @reset_info: The contextual information about the reset (like PID, flags) + * + * Send a uevent for the DRM device specified by @dev. This informs + * user that a GPU reset has occurred, so that an interested client + * can take any recovery or profiling measure. + */ +void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info *reset_info) +{ + unsigned char pid_str[13]; + unsigned char flags_str[18]; + unsigned char reset_str[] = "RESET=1"; + char *envp[] = { reset_str, pid_str, flags_str, NULL }; + + DRM_DEBUG("generating reset event\n"); + + BUG_ON(!reset_info); + + snprintf(pid_str, sizeof(pid_str), "PID=%u", pid_vnr(reset_info->pid)); + snprintf(flags_str, sizeof(flags_str), "FLAGS=0x%llx", reset_info->flags); + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp); +} +EXPORT_SYMBOL(drm_sysfs_reset_event); + /** * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector * change diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h index 6273cac44e47..dbb0ac6230b8 100644 --- a/include/drm/drm_sysfs.h +++ b/include/drm/drm_sysfs.h @@ -2,15 +2,28 @@ #ifndef _DRM_SYSFS_H_ #define _DRM_SYSFS_H_ +#define DRM_RESET_EVENT_VRAM_LOST (1 << 0) + struct drm_device; struct device; struct drm_connector; struct drm_property; +/** + * struct drm_reset_event_info - Information about a GPU reset event + * @pid: Process that triggered the reset, if any + * @flags: Extra information around the reset event (e.g. is VRAM lost?) + */ +struct drm_reset_event_info { + struct pid *pid; + uint64_t flags; +}; + int drm_class_device_register(struct device *dev); void drm_class_device_unregister(struct device *dev); void drm_sysfs_hotplug_event(struct drm_device *dev); +void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event_info *reset_info); void drm_sysfs_connector_hotplug_event(struct drm_connector *connector); void drm_sysfs_connector_status_event(struct drm_connector *connector, struct drm_property *property); -- 2.38.1
[PATCH v3 0/2] drm: Add GPU reset sysfs
This patchset adds a udev event for DRM device's resets. Userspace apps can trigger GPU resets by misuse of graphical APIs or driver bugs. Either way, the GPU reset might lead the system to a broken state[1], that might be recovered if user has access to a tty or a remote shell. Arguably, this recovery could happen automatically by the system itself, thus this is the goal of this patchset. For debugging and report purposes, device coredump support was already added for amdgpu[2], but it's not suitable for programmatic usage like this one given the uAPI not being stable and the need for parsing. GL/VK is out of scope for this use, giving that we are dealing with device resets regardless of API. A basic userspace daemon is provided at [3] showing how the interface is used to recovery from resets. [1] A search for "reset" in DRM/AMD issue tracker shows reports of resets making the system unusable: https://gitlab.freedesktop.org/drm/amd/-/issues/?search=reset [2] https://lore.kernel.org/amd-gfx/20220602081538.1652842-2-amaranath.somalapu...@amd.com/ [3] https://gitlab.freedesktop.org/andrealmeid/gpu-resetd v2: https://lore.kernel.org/dri-devel/20220308180403.75566-1-contactshashanksha...@gmail.com/ André Almeida (1): drm/amdgpu: Add work function for GPU reset event Shashank Sharma (1): drm: Add GPU reset sysfs event drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 ++ drivers/gpu/drm/drm_sysfs.c| 26 +++ include/drm/drm_sysfs.h| 13 ++ 4 files changed, 73 insertions(+) -- 2.38.1
RE: [PATCH 00/12] DC Patches November 28 2022
[AMD Official Use Only - General] Hi all, This week this patchset was tested on the following systems: HP Envy 360, with Ryzen 5 4500U Lenovo Thinkpad T14s Gen2, with AMD Ryzen 5 5650U Lenovo Thinkpad T13s Gen4 with AMD Ryzen 5 6600U Sapphire Pulse RX5700XT Reference AMD RX6800 These systems were tested on the following display types: eDP, (1080p 60hz [4500U, 5650U]) (1920x1200 [6600U]) VGA and DVI (1680x1050 60HZ [DP to VGA/DVI, USB-C to DVI/VGA]) DP/HDMI/USB-C (1440p 170hz, 4k 60hz, 4k 144hz [Includes USB-C to DP/HDMI adapters]) MST tested with Startech MST14DP123DP and 2x 4k 60Hz displays DSC tested with Cable Matters 101075 (DP to 3x DP), and 201375 (USB-C to 3x DP) with 3x 4k60 displays HP Hook G2 with 1 and 2 4k60 Displays The testing is a mix of automated and manual tests. Manual testing includes (but is not limited to): Changing display configurations and settings Benchmark testing Feature testing (Freesync, etc.) Automated testing includes (but is not limited to): Script testing (scripts to automate some of the manual checks) IGT testing The patchset consists of the amd-staging-drm-next branch (Head commit - 1ee359db35f2c487a063412168724d3a55b72123 -> drm/amdgpu: enable RAS poison for VCN 2.6) with new patches added on top of it. This branch is used for both Ubuntu and Chrome OS testing (ChromeOS on a bi-weekly basis). Tested on Ubuntu 22.04 and Chrome OS Tested-by: Daniel Wheeler Thank you, Dan Wheeler Sr. Technologist | AMD SW Display -- 1 Commerce Valley Dr E, Thornhill, ON L3T 7X6 amd.com Thank you, Dan Wheeler Sr. Technologist | AMD SW Display -- 1 Commerce Valley Dr E, Thornhill, ON L3T 7X6 Facebook | Twitter | amd.com -Original Message- From: Dhillon, Jasdeep Sent: November 24, 2022 4:14 PM To: amd-gfx@lists.freedesktop.org Cc: Wentland, Harry ; Li, Sun peng (Leo) ; Lakha, Bhawanpreet ; Siqueira, Rodrigo ; Pillai, Aurabindo ; Zhuo, Qingqing (Lillian) ; Li, Roman ; Lin, Wayne ; Wang, Chao-kai (Stylon) ; Chiu, Solomon ; Kotarac, Pavle ; Gutierrez, Agustin ; Dhillon, Jasdeep ; Wheeler, Daniel Subject: [PATCH 00/12] DC Patches November 28 2022 This DC patchset brings improvements in multiple areas. In summary, we have: * Program output transfer function when required * Fix arthmetic errror in MALL size caluclations for subvp * DCC Meta pitch used for MALL allocation * Debugfs entry to tell if connector is DPIA link * Use largest vready_offset in pipe group * Fixes race condition in DPIA Aux transfer Cc: Daniel Wheeler Alvin Lee (3): drm/amd/display: Retain phantom pipes when min transition into subvp drm/amd/display: Don't overwrite subvp pipe info in fast updates drm/amd/display: Fix DTBCLK disable requests and SRC_SEL programming Aric Cyr (1): drm/amd/display: 3.2.214 Dillon Varone (4): drm/amd/display: MALL SS calculations should iterate over all pipes for cursor drm/amd/display: Use DCC meta pitch for MALL allocation requirements drm/amd/display: Fix arithmetic error in MALL size calculations for subvp drm/amd/display: program output tf when required Dmytro Laktyushkin (1): drm/amd/display: set per pipe dppclk to 0 when dpp is off Stylon Wang (2): drm/amd/display: Fix race condition in DPIA AUX transfer drm/amd/display: Create debugfs to tell if connector is DPIA link Wesley Chalmers (1): drm/amd/display: Use the largest vready_offset in pipe group .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 151 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 17 +- .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 23 ++- .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 10 +- .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c | 2 +- drivers/gpu/drm/amd/display/dc/core/dc.c | 23 ++- drivers/gpu/drm/amd/display/dc/dc.h | 2 +- drivers/gpu/drm/amd/display/dc/dc_stream.h| 11 ++ .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 30 +++- .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 34 +++- .../gpu/drm/amd/display/dc/dcn32/dcn32_dccg.c | 6 +- .../drm/amd/display/dc/dcn32/dcn32_hwseq.c| 8 +- .../drm/amd/display/dc/dcn32/dcn32_resource.c | 66 +--- .../drm/amd/display/dc/dcn32/dcn32_resource.h | 13 +- .../display/dc/dcn32/dcn32_resource_helpers.c | 15 +- .../amd/display/dc/dcn321/dcn321_resource.c | 2 + .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c | 3 + .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 9 +- .../gpu/drm/amd/display/dc/inc/core_types.h | 4 +- 19 files changed, 277 insertions(+), 152 deletions(-) -- 2.34.1 <>
Re: [PATCH] drm/amdgpu: use dev_dbg to print messages in runtime cycle
On Thu, Nov 24, 2022 at 12:29 AM Guchun Chen wrote: > > Runtime PM can happen pretty frequently, as these printings > may be annoyed, switch to dev_dbg. > > Suggested-by: Lijo Lazar > Signed-off-by: Guchun Chen Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 8b1f6c032a2e..447e27b2e16b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2568,7 +2568,7 @@ static int amdgpu_pmops_runtime_suspend(struct device > *dev) > amdgpu_device_baco_enter(drm_dev); > } > > - dev_info(&pdev->dev, "asic/device is runtime suspended\n"); > + dev_dbg(&pdev->dev, "asic/device is runtime suspended\n"); > > return 0; > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > index 7bb2de1d11ff..4a18d1944e4f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > @@ -861,7 +861,7 @@ static int psp_tmr_unload(struct psp_context *psp) > struct psp_gfx_cmd_resp *cmd = acquire_psp_cmd_buf(psp); > > psp_prep_tmr_unload_cmd_buf(psp, cmd); > - dev_info(psp->adev->dev, "free PSP TMR buffer\n"); > + dev_dbg(psp->adev->dev, "free PSP TMR buffer\n"); > > ret = psp_cmd_submit_buf(psp, NULL, cmd, > psp->fence_buf_mc_addr); > -- > 2.25.1 >
Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA
On 2022-11-24 15:27, Kim, Jonathan wrote: [AMD Official Use Only - General] -Original Message- From: Kuehling, Felix Sent: November 24, 2022 11:24 AM To: Kim, Jonathan ; amd- g...@lists.freedesktop.org Subject: Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA Am 2022-11-24 um 09:51 schrieb Kim, Jonathan: [Public] -Original Message- From: Kuehling, Felix Sent: November 22, 2022 7:45 PM To: Kim, Jonathan ; amd- g...@lists.freedesktop.org Subject: Re: [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA On 2022-10-31 12:23, Jonathan Kim wrote: From: Jay Cornwall Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. v3: Rebase for upstream v2: Add missing debug flag setup on APUs Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 3 files changed, 22 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index ae6e701a2656..d4f87f2adada 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -193,6 +193,8 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int unwind if (unwind && count == unwind_count) break; + kfd_process_set_trap_debug_flag(&pdd->qpd, false); + /* GFX off is already disabled by debug activate if not RLC restore supported. */ if (kfd_dbg_is_rlc_restore_supported(pdd->dev)) amdgpu_gfx_off_ctrl(pdd->dev->adev, false); @@ -278,6 +280,8 @@ int kfd_dbg_trap_activate(struct kfd_process *target) if (kfd_dbg_is_rlc_restore_supported(pdd->dev)) amdgpu_gfx_off_ctrl(pdd->dev->adev, true); + kfd_process_set_trap_debug_flag(&pdd->qpd, true); + r = debug_refresh_runlist(pdd->dev->dqm); if (r) { target->runtime_info.runtime_state = diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 9690a2adb9ed..82b28588ab72 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1101,6 +1101,8 @@ int kfd_init_apertures(struct kfd_process *process); void kfd_process_set_trap_handler(struct qcm_process_device *qpd, uint64_t tba_addr, uint64_t tma_addr); +void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, +bool enabled); /* CWSR initialization */ int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 59c4c38833b6..d62e0c62df76 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1252,6 +1252,8 @@ int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep) memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev- cwsr_isa_size); + kfd_process_set_trap_debug_flag(qpd, p- debug_trap_enabled); + qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET; pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n", qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr); @@ -1288,6 +1290,9 @@ static int kfd_process_device_init_cwsr_dgpu(struct kfd_process_device *pdd) memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size); + kfd_process_set_trap_debug_flag(&pdd->qpd, + pdd->process- debug_trap_enabled); + qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET; pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n", qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr); @@ -1374,6 +1379,17 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported) return true; } +void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, +bool enabled) +{ + /* If TMA doesn't exist then flag will be set during allocation. */ I would expect a change to the TMA allocation function, but that isn't in this patch? The TMA is allocated under kfd_process_init_cwsr_* and CWSR enabled is a pre-condition for the 1st level trap handler loading. The lack of context in the patch for those functions may be hiding that fact. Is the placement of this comment misleading? Maybe it should go in kfd_dbg_trap_activate when kfd_process_set_trap_debug_flag is called? Or should it just be removed since the combined calls within initialization of CWS
[linux-next:master] BUILD REGRESSION 9e46a79967326efb03c481ddfd58902475bd920d
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 9e46a79967326efb03c481ddfd58902475bd920d Add linux-next specific files for 20221125 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202211041320.coq8eelj-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211242021.fdzrfna8-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com Error/Warning: (recently discovered and may have been fixed) arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: No such file or directory arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction drivers/clk/clk.c:1022:5: error: redefinition of 'clk_prepare' drivers/clk/clk.c:1268:6: error: redefinition of 'clk_is_enabled_when_prepared' drivers/clk/clk.c:941:6: error: redefinition of 'clk_unprepare' drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4968: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:5075:24: warning: implicit conversion from 'enum ' to 'enum dc_status' [-Wenum-conversion] drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: warning: no previous prototype for 'to_dal_irq_source_dcn201' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous prototype for 'tu102_gr_load' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype for 'wpr_generic_header_dump' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' set but not used [-Wunused-but-set-variable] net/netfilter/nf_conntrack_netlink.c:2674:6: warning: unused variable 'mark' [-Wunused-variable] vmlinux.o: warning: objtool: __btrfs_map_block+0x1d77: unreachable instruction Unverified Error/Warning (likely false positive, please contact us if interested): ERROR: modpost: "input_ff_create_memless" [drivers/hid/hid-axff.ko] undefined! ERROR: modpost: "input_ff_create_memless" [drivers/hid/hid-sjoy.ko] undefined! ERROR: modpost: "input_ff_create_memless" [drivers/input/misc/arizona-haptics.ko] undefined! ERROR: modpost: "input_ff_create_memless" [drivers/input/misc/drv2667.ko] undefined! drivers/dma/at_hdmac.c:1371 atc_prep_slave_sg() warn: possible memory leak of 'desc' Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc.c:warning:This-comment-starts-with-but-isn-t-a-kernel-doc-comment.-Refer-Documentation-doc-guide-kernel-doc.rst | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load | |-- drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump | `-- drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used |-- alpha-randconfig-r003-20221124 | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc.c:warning:This-comment-starts-with-but-isn-t-a-kernel-doc-comment.-Refer-Documentation-doc-guide-kernel-doc.rst | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 |-- alpha-randconfig-r016-20221124 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-
Re: [PATCH] drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()
On 2022-11-25 02:39, Dan Carpenter wrote: We re-arranged this code recently so "ret" is always zero at this point. Signed-off-by: Dan Carpenter Reviewed-by: Felix Kuehling I'm applying your patch to amd-staging-drm-next. Thank you! Felix --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 6f01ebc8557b..bceb1a5b2518 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -2012,10 +2012,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu) kfd_debug_print_topology(); - if (!res) - kfd_notify_gpu_change(gpu_id, 1); + kfd_notify_gpu_change(gpu_id, 1); - return res; + return 0; } /**
Re: [PATCH 2/3] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame
On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote: > calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) > architectures built with Clang (all released versions), whereby the stack > frame gets blown up to well over 5k. This would cause an immediate kernel > panic on most architectures. We'll revert this when the following bug report > has been resolved: https://github.com/llvm/llvm-project/issues/41896. > > Suggested-by: Arnd Bergmann > Signed-off-by: Lee Jones > --- > drivers/gpu/drm/Kconfig | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > index 34f5a092c99e7..1fa7b9760adb8 100644 > --- a/drivers/gpu/drm/Kconfig > +++ b/drivers/gpu/drm/Kconfig > @@ -265,6 +265,7 @@ source "drivers/gpu/drm/radeon/Kconfig" > > config DRM_AMDGPU > tristate "AMD GPU" > + depends on BROKEN || !CC_IS_CLANG || !(X86_64 || SPARC64 || ARM64) The logic looks fine, this has been broken for so long without anyone paying attention that limiting the broken code to the working architectures is probably the best way to avoid trouble. However, as far as I can tell, the problem doesn't affect the entire driver, only the "new" display engine, so I would probably try to limit this to turning off CONFIG_DRM_AMD_DC for architectures that don't have a fixed clang. Arnd
Re: [PATCH 1/3] drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of code
On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote: > bw_calcs() presently blows the stack-frame limit by calling functions > inside a argument list which return quite a bit of data to be passed > onto sub-functions. Simply breaking out this hunk reduces the > stack-frame use by 500 Bytes, preventing the following compiler > warning: > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dce_calcs.c:3285:6: > warning: stack frame size (1384) exceeds limit (1024) > in 'bw_calcs' [-Wframe-larger-than] > bool bw_calcs(struct dc_context *ctx, > ^ > 1 warning generated. > > This resolves the issue and takes us one step closer towards a > successful allmodconfig WERROR build. > > Signed-off-by: Lee Jones Is this still needed with the patch to turn off the display engine on most architectures? On which architecture and with which compiler do you still observe the problem? Note that this probably doesn't actually solve the potential stack overflow by itself, since the function that is now split out is still called with the parent stack active. Splitting out multiple smaller bits however would solve it since then the stack frames could overlap. Arnd
Re: [PATCH 3/3] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled
On Fri, Nov 25, 2022, at 10:25, Lee Jones wrote: > When enabled, KASAN enlarges function's stack-frames. Pushing quite a > few over the current threshold. This can mainly be seen on 32-bit > architectures where the present limit (when !GCC) is a lowly > 1024-Bytes. > > Signed-off-by: Lee Jones Acked-by: Arnd Bergmann If this affects only clang but not gcc, I wonder if we could limit the scope and keep the 1024 byte limit on gcc builds. > --- > lib/Kconfig.debug | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index c3c0b077ade33..82d475168db95 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -399,6 +399,7 @@ config FRAME_WARN > default 2048 if GCC_PLUGIN_LATENT_ENTROPY This is actually a related bug that we should fix: allmodconfig with gcc turns on GCC_PLUGIN_LATENT_ENTROPY, so the limit ends up being way too high. I think we need to either ensure that allmodconfig turns off the latent entropy plugin, or that the limit gets lowered again to something that is not any higher than the KASAN limit. Arnd
Re: [PATCH v2 1/2] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame
On Fri, Nov 25, 2022, at 13:07, Lee Jones wrote: > calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) > architectures built with Clang (all released versions), whereby the stack > frame gets blown up to well over 5k. This would cause an immediate kernel > panic on most architectures. We'll revert this when the following bug report > has been resolved: https://github.com/llvm/llvm-project/issues/41896. > > Suggested-by: Arnd Bergmann > Signed-off-by: Lee Jones Acked-by: Arnd Bergmann
Re: [PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled
On Fri, Nov 25, 2022, at 14:40, Lee Jones wrote: > On Fri, 25 Nov 2022, Lee Jones wrote: > >> When enabled, KASAN enlarges function's stack-frames. Pushing quite a >> few over the current threshold. This can mainly be seen on 32-bit >> architectures where the present limit (when !GCC) is a lowly >> 1024-Bytes. >> >> Signed-off-by: Lee Jones >> --- >> lib/Kconfig.debug | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug >> index c3c0b077ade33..82d475168db95 100644 >> --- a/lib/Kconfig.debug >> +++ b/lib/Kconfig.debug >> @@ -399,6 +399,7 @@ config FRAME_WARN >> default 2048 if GCC_PLUGIN_LATENT_ENTROPY >> default 2048 if PARISC >> default 1536 if (!64BIT && XTENSA) >> +default 1280 if KASAN && !64BIT >> default 1024 if !64BIT >> default 2048 if 64BIT >> help > > Note this also fixes 61 warnings when > > (GCC && !GCC_PLUGIN_LATENT_ENTROPY) > > ... which as Arnd says should not be enabled by default. We'll > address that issue once this set has been applied. Thanks a lot for checking this! Reviewed-by: Arnd Bergmann
Re: [PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled
On Fri, 25 Nov 2022, Lee Jones wrote: > When enabled, KASAN enlarges function's stack-frames. Pushing quite a > few over the current threshold. This can mainly be seen on 32-bit > architectures where the present limit (when !GCC) is a lowly > 1024-Bytes. > > Signed-off-by: Lee Jones > --- > lib/Kconfig.debug | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index c3c0b077ade33..82d475168db95 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -399,6 +399,7 @@ config FRAME_WARN > default 2048 if GCC_PLUGIN_LATENT_ENTROPY > default 2048 if PARISC > default 1536 if (!64BIT && XTENSA) > + default 1280 if KASAN && !64BIT > default 1024 if !64BIT > default 2048 if 64BIT > help Note this also fixes 61 warnings when (GCC && !GCC_PLUGIN_LATENT_ENTROPY) ... which as Arnd says should not be enabled by default. We'll address that issue once this set has been applied. -- Lee Jones [李琼斯]
Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait
Am 25.11.22 um 12:14 schrieb Tvrtko Ursulin: + Matt On 25/11/2022 10:21, Christian König wrote: TTM is just wrapping core DMA functionality here, remove the mid-layer. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 5247d88b3c13..d409a77449a3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj, static int i915_ttm_truncate(struct drm_i915_gem_object *obj) { struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); - int err; + long err; WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED); - err = ttm_bo_wait(bo, true, false); - if (err) + err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + true, 15 * HZ); This 15 second stuck out a bit for me and then on a slightly deeper look it seems this timeout will "leak" into a few of i915 code paths. If we look at the difference between the legacy shmem and ttm backend I am not sure if the legacy one is blocking or not - but if it can block I don't think it would have an arbitrary timeout like this. Matt your thoughts? That's exactly the reason why I try to remove the ttm_bo_wait() as mid layer here. It hides the fact that we don't wait forever for BOs to become idle. This is functional identical to the old code. If you want some other behavior feel free to note what's desired and I will implement it. Regards, Christian. Regards, Tvrtko + if (err < 0) return err; + if (err == 0) + return -EBUSY; err = i915_ttm_move_notify(bo); if (err)
[PATCH v2 2/2] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled
When enabled, KASAN enlarges function's stack-frames. Pushing quite a few over the current threshold. This can mainly be seen on 32-bit architectures where the present limit (when !GCC) is a lowly 1024-Bytes. Signed-off-by: Lee Jones --- lib/Kconfig.debug | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c3c0b077ade33..82d475168db95 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -399,6 +399,7 @@ config FRAME_WARN default 2048 if GCC_PLUGIN_LATENT_ENTROPY default 2048 if PARISC default 1536 if (!64BIT && XTENSA) + default 1280 if KASAN && !64BIT default 1024 if !64BIT default 2048 if 64BIT help -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 1/2] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame
calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) architectures built with Clang (all released versions), whereby the stack frame gets blown up to well over 5k. This would cause an immediate kernel panic on most architectures. We'll revert this when the following bug report has been resolved: https://github.com/llvm/llvm-project/issues/41896. Suggested-by: Arnd Bergmann Signed-off-by: Lee Jones --- drivers/gpu/drm/amd/display/Kconfig | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 6925e0280dbe6..f4f3d2665a6b2 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -5,6 +5,7 @@ menu "Display Engine Configuration" config DRM_AMD_DC bool "AMD DC - Enable new display engine" default y + depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64 select SND_HDA_COMPONENT if SND_HDA_CORE select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128) help @@ -12,6 +13,12 @@ config DRM_AMD_DC support for AMDGPU. This adds required support for Vega and Raven ASICs. + calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) + architectures built with Clang (all released versions), whereby the stack + frame gets blown up to well over 5k. This would cause an immediate kernel + panic on most architectures. We'll revert this when the following bug report + has been resolved: https://github.com/llvm/llvm-project/issues/41896. + config DRM_AMD_DC_DCN def_bool n help -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH v2 0/2] Fix a bunch of allmodconfig errors
Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") WERROR now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig builds. This leads to some interesting failures, each resolved in this set. With this set applied, I am able to obtain a successful allmodconfig Arm build. v1 => v2: - Remove superfluous change (these two override it) - Mark only DRM_AMD_DC ("the new display engine) as Broken - Change logic to only *include* working arches, not *preclude* them Lee Jones (2): drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled drivers/gpu/drm/amd/display/Kconfig | 7 +++ lib/Kconfig.debug | 1 + 2 files changed, 8 insertions(+) -- 2.38.1.584.g0f3c55d4c2-goog
Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait
+ Matt On 25/11/2022 10:21, Christian König wrote: TTM is just wrapping core DMA functionality here, remove the mid-layer. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 5247d88b3c13..d409a77449a3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj, static int i915_ttm_truncate(struct drm_i915_gem_object *obj) { struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); - int err; + long err; WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED); - err = ttm_bo_wait(bo, true, false); - if (err) + err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + true, 15 * HZ); This 15 second stuck out a bit for me and then on a slightly deeper look it seems this timeout will "leak" into a few of i915 code paths. If we look at the difference between the legacy shmem and ttm backend I am not sure if the legacy one is blocking or not - but if it can block I don't think it would have an arbitrary timeout like this. Matt your thoughts? Regards, Tvrtko + if (err < 0) return err; + if (err == 0) + return -EBUSY; err = i915_ttm_move_notify(bo); if (err)
[PATCH 6/9] drm/qxl: stop using ttm_bo_wait
TTM is just wrapping core DMA functionality here, remove the mid-layer. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/qxl/qxl_cmd.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c index 63aa96a69752..281edab518cd 100644 --- a/drivers/gpu/drm/qxl/qxl_cmd.c +++ b/drivers/gpu/drm/qxl/qxl_cmd.c @@ -579,7 +579,7 @@ void qxl_surface_evict(struct qxl_device *qdev, struct qxl_bo *surf, bool do_upd static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stall) { - int ret; + long ret; ret = qxl_bo_reserve(surf); if (ret) @@ -588,7 +588,19 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stal if (stall) mutex_unlock(&qdev->surf_evict_mutex); - ret = ttm_bo_wait(&surf->tbo, true, !stall); + if (stall) { + ret = dma_resv_wait_timeout(surf->tbo.base.resv, + DMA_RESV_USAGE_BOOKKEEP, true, + 15 * HZ); + if (ret > 0) + ret = 0; + else if (ret == 0) + ret = -EBUSY; + } else { + ret = dma_resv_test_signaled(surf->tbo.base.resv, +DMA_RESV_USAGE_BOOKKEEP); + ret = ret ? -EBUSY : 0; + } if (stall) mutex_lock(&qdev->surf_evict_mutex); -- 2.34.1
[PATCH 8/9] drm/ttm: use ttm_bo_wait_ctx instead of ttm_bo_wait
Make sure that we use the correct settings from the context. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index f9d9fd2d865d..cd266a067773 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -439,7 +439,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, bdev->funcs->evict_flags(bo, &placement); if (!placement.num_placement && !placement.num_busy_placement) { - ret = ttm_bo_wait(bo, true, false); + ret = ttm_bo_wait_ctx(bo, ctx); if (ret) return ret; @@ -1190,7 +1190,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, /* * Make sure BO is idle. */ - ret = ttm_bo_wait(bo, false, false); + ret = ttm_bo_wait_ctx(bo, ctx); if (unlikely(ret != 0)) goto out; -- 2.34.1
[PATCH 7/9] drm/i915: stop using ttm_bo_wait
TTM is just wrapping core DMA functionality here, remove the mid-layer. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 5247d88b3c13..d409a77449a3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj, static int i915_ttm_truncate(struct drm_i915_gem_object *obj) { struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); - int err; + long err; WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED); - err = ttm_bo_wait(bo, true, false); - if (err) + err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + true, 15 * HZ); + if (err < 0) return err; + if (err == 0) + return -EBUSY; err = i915_ttm_move_notify(bo); if (err) -- 2.34.1
[PATCH 5/9] drm/nouveau: stop using ttm_bo_wait
TTM is just wrapping core DMA functionality here, remove the mid-layer. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/nouveau/nouveau_bo.c | 6 +- drivers/gpu/drm/nouveau/nouveau_gem.c | 11 --- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 335fa91ca4ad..288eebc70a67 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -922,6 +922,7 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, struct nouveau_mem *mem = new_reg ? nouveau_mem(new_reg) : NULL; struct nouveau_bo *nvbo = nouveau_bo(bo); struct nouveau_vma *vma; + long ret; /* ttm can now (stupidly) pass the driver bos it didn't create... */ if (bo->destroy != nouveau_bo_del_ttm) @@ -936,7 +937,10 @@ static void nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, } } else { list_for_each_entry(vma, &nvbo->vma_list, head) { - WARN_ON(ttm_bo_wait(bo, false, false)); + ret = dma_resv_wait_timeout(bo->base.resv, + DMA_RESV_USAGE_BOOKKEEP, + false, 15 * HZ); + WARN_ON(ret <= 0); nouveau_vma_unmap(vma); } } diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index ac5793c96957..f77e44958037 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -645,7 +645,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct drm_nouveau_gem_pushbuf_reloc *reloc, struct drm_nouveau_gem_pushbuf_bo *bo) { - int ret = 0; + long ret = 0; unsigned i; for (i = 0; i < req->nr_relocs; i++) { @@ -703,9 +703,14 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, data |= r->vor; } - ret = ttm_bo_wait(&nvbo->bo, false, false); + ret = dma_resv_wait_timeout(nvbo->bo.base.resv, + DMA_RESV_USAGE_BOOKKEEP, + false, 15 * HZ); + if (ret == 0) + ret = -EBUSY; if (ret) { - NV_PRINTK(err, cli, "reloc wait_idle failed: %d\n", ret); + NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", + ret); break; } -- 2.34.1
[PATCH 9/9] drm/ttm: move ttm_bo_wait into VMWGFX
Not used anymore by other drivers or TTM itself. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c| 44 +++-- drivers/gpu/drm/ttm/ttm_bo_util.c | 19 - drivers/gpu/drm/vmwgfx/ttm_object.h | 11 include/drm/ttm/ttm_bo.h| 1 - 4 files changed, 39 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index cd266a067773..326a3d13a829 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1087,47 +1087,35 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo) EXPORT_SYMBOL(ttm_bo_unmap_virtual); /** - * ttm_bo_wait - wait for buffer idle. + * ttm_bo_wait_ctx - wait for buffer idle. * * @bo: The buffer object. - * @interruptible: Use interruptible wait. - * @no_wait: Return immediately if buffer is busy. + * @ctx: defines how to wait * - * This function must be called with the bo::mutex held, and makes - * sure any previous rendering to the buffer is completed. - * Note: It might be necessary to block validations before the - * wait by reserving the buffer. - * Returns -EBUSY if no_wait is true and the buffer is busy. - * Returns -ERESTARTSYS if interrupted by a signal. + * Waits for the buffer to be idle. Used timeout depends on the context. + * Returns -EBUSY if wait timed outt, -ERESTARTSYS if interrupted by a signal or + * zero on success. */ -int ttm_bo_wait(struct ttm_buffer_object *bo, - bool interruptible, bool no_wait) +int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx) { - long timeout = 15 * HZ; + long ret; - if (no_wait) { - if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP)) + if (ctx->no_wait_gpu) { + if (dma_resv_test_signaled(bo->base.resv, + DMA_RESV_USAGE_BOOKKEEP)) return 0; else return -EBUSY; } - timeout = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, - interruptible, timeout); - if (timeout < 0) - return timeout; - - if (timeout == 0) + ret = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + ctx->interruptible, 15 * HZ); + if (unlikely(ret < 0)) + return ret; + if (unlikely(ret == 0)) return -EBUSY; - return 0; } -EXPORT_SYMBOL(ttm_bo_wait); - -int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx) -{ - return ttm_bo_wait(bo, ctx->interruptible, ctx->no_wait_gpu); -} EXPORT_SYMBOL(ttm_bo_wait_ctx); int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, @@ -1135,7 +1123,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, { struct ttm_place place; bool locked; - int ret; + long ret; /* * While the bo may already reside in SYSTEM placement, set diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fee7c20775c0..ed2b28734541 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -547,9 +547,13 @@ EXPORT_SYMBOL(ttm_bo_vunmap); static int ttm_bo_wait_free_node(struct ttm_buffer_object *bo, bool dst_use_tt) { - int ret; - ret = ttm_bo_wait(bo, false, false); - if (ret) + long ret; + + ret = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + false, 15 * HZ); + if (ret == 0) + return -EBUSY; + if (ret < 0) return ret; if (!dst_use_tt) @@ -710,8 +714,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) return ret; /* If already idle, no need for ghost object dance. */ - ret = ttm_bo_wait(bo, false, true); - if (ret != -EBUSY) { + if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP)) { if (!bo->ttm) { /* See comment below about clearing. */ ret = ttm_tt_create(bo, true); @@ -748,8 +751,10 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo) ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv); /* Last resort, wait for the BO to be idle when we are OOM */ - if (ret) - ttm_bo_wait(bo, false, false); + if (ret) { + dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, + false, MAX_SCHEDULE_TIMEOUT); + } dma_resv_unlock(&ghost->base._resv); ttm_bo_put(ghost); diff --git a/drivers/gpu/drm/vmwgfx/ttm_object.h b/drivers/gpu/drm/vmwgfx/ttm_object.h index f0ebbe340ad6..95a96
[PATCH 4/9] drm/ttm: merge ttm_bo_api.h and ttm_bo_driver.h
Merge and cleanup the two headers into a single description of the object API. Also move all the documentation to the implementation and drop unecessary includes from the header. No functional change. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h| 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 + drivers/gpu/drm/drm_gem_ttm_helper.c | 2 + drivers/gpu/drm/drm_gem_vram_helper.c | 1 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 +- drivers/gpu/drm/i915/i915_deps.c | 2 +- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 2 +- drivers/gpu/drm/i915/intel_region_ttm.c | 1 - drivers/gpu/drm/nouveau/nouveau_bo.c | 1 + drivers/gpu/drm/nouveau/nouveau_bo.h | 3 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 3 +- drivers/gpu/drm/nouveau/nouveau_mem.c | 3 +- drivers/gpu/drm/nouveau/nouveau_mem.h | 2 +- drivers/gpu/drm/nouveau/nouveau_prime.c | 1 + drivers/gpu/drm/nouveau/nouveau_sgdma.c | 1 + drivers/gpu/drm/qxl/qxl_drv.h | 3 +- drivers/gpu/drm/qxl/qxl_ttm.c | 4 +- drivers/gpu/drm/radeon/radeon.h | 3 +- drivers/gpu/drm/radeon/radeon_prime.c | 2 + drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/gpu/drm/ttm/ttm_bo.c | 81 +++- drivers/gpu/drm/ttm/ttm_bo_util.c | 110 +- drivers/gpu/drm/ttm/ttm_bo_vm.c | 19 +- drivers/gpu/drm/ttm/ttm_device.c | 2 +- drivers/gpu/drm/ttm/ttm_execbuf_util.c| 6 +- drivers/gpu/drm/ttm/ttm_pool.c| 3 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 2 +- drivers/gpu/drm/ttm/ttm_resource.c| 3 +- drivers/gpu/drm/ttm/ttm_tt.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c| 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 1 - drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_gmr.c | 1 - drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 1 - .../gpu/drm/vmwgfx/vmwgfx_system_manager.c| 1 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c| 1 - include/drm/drm_gem_ttm_helper.h | 3 +- include/drm/drm_gem_vram_helper.h | 4 +- include/drm/ttm/{ttm_bo_api.h => ttm_bo.h}| 345 +- include/drm/ttm/ttm_bo_driver.h | 303 --- include/drm/ttm/ttm_execbuf_util.h| 4 +- 55 files changed, 410 insertions(+), 557 deletions(-) rename include/drm/ttm/{ttm_bo_api.h => ttm_bo.h} (67%) delete mode 100644 include/drm/ttm/ttm_bo_driver.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6b74df446694..2644cd991210 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -52,8 +52,7 @@ #include #include -#include -#include +#include #include #include diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 3a763916a5a1..ab450f12c445 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "amdgpu_object.h" #include "amdgpu_gem.h" diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h index e4d78491bcc7..ededdc01ca28 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h @@ -28,6 +28,8 @@ struct hmm_range; +struct drm_file; + struct amdgpu_device; struct amdgpu_bo; struct amdgpu_bo_va; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 8516c814bc9b..8b7a09b392ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -32,6 +32,8 @@ #include #include +#include + #include "amdgpu_cs.h" #include "amdgpu.h" #include "amdgpu_trace.h" diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_c
[PATCH 2/9] drm/ttm: remove ttm_bo_(un)lock_delayed_workqueue
Those functions never worked correctly since it is still perfectly possible that a buffer object is released and the background worker restarted even after calling them. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +--- drivers/gpu/drm/radeon/radeon_device.c | 5 - drivers/gpu/drm/radeon/radeon_pm.c | 4 +--- drivers/gpu/drm/ttm/ttm_bo.c| 14 -- include/drm/ttm/ttm_bo_api.h| 16 6 files changed, 3 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index 0f16d3c09309..f60753f97ac5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c @@ -1717,7 +1717,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct amdgpu_ring *ring) static int amdgpu_debugfs_ib_preempt(void *data, u64 val) { - int r, resched, length; + int r, length; struct amdgpu_ring *ring; struct dma_fence **fences = NULL; struct amdgpu_device *adev = (struct amdgpu_device *)data; @@ -1747,8 +1747,6 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val) /* stop the scheduler */ kthread_park(ring->sched.thread); - resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev); - /* preempt the IB */ r = amdgpu_ring_preempt_ib(ring); if (r) { @@ -1785,8 +1783,6 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val) up_read(&adev->reset_domain->sem); - ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched); - pro_end: kfree(fences); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b2b1c66bfe39..2b1db37e25c1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3983,10 +3983,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) } amdgpu_fence_driver_hw_fini(adev); - if (adev->mman.initialized) { + if (adev->mman.initialized) flush_delayed_work(&adev->mman.bdev.wq); - ttm_bo_lock_delayed_workqueue(&adev->mman.bdev); - } if (adev->pm_sysfs_en) amdgpu_pm_sysfs_fini(adev); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 6344454a7721..9a556f505685 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1772,7 +1772,6 @@ int radeon_gpu_reset(struct radeon_device *rdev) bool saved = false; int i, r; - int resched; down_write(&rdev->exclusive_lock); @@ -1784,8 +1783,6 @@ int radeon_gpu_reset(struct radeon_device *rdev) atomic_inc(&rdev->gpu_reset_counter); radeon_save_bios_scratch_regs(rdev); - /* block TTM */ - resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); radeon_suspend(rdev); radeon_hpd_fini(rdev); @@ -1844,8 +1841,6 @@ int radeon_gpu_reset(struct radeon_device *rdev) /* reset hpd state */ radeon_hpd_init(rdev); - ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); - rdev->in_reset = true; rdev->needs_reset = false; diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 04c693ca419a..cbc554928bcc 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -1853,11 +1853,10 @@ static bool radeon_pm_debug_check_in_vbl(struct radeon_device *rdev, bool finish static void radeon_dynpm_idle_work_handler(struct work_struct *work) { struct radeon_device *rdev; - int resched; + rdev = container_of(work, struct radeon_device, pm.dynpm_idle_work.work); - resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); mutex_lock(&rdev->pm.mutex); if (rdev->pm.dynpm_state == DYNPM_STATE_ACTIVE) { int not_processed = 0; @@ -1908,7 +1907,6 @@ static void radeon_dynpm_idle_work_handler(struct work_struct *work) msecs_to_jiffies(RADEON_IDLE_LOOP_MS)); } mutex_unlock(&rdev->pm.mutex); - ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); } /* diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index c3f4b33136e5..b77262a623e0 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -418,20 +418,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo) } EXPORT_SYMBOL(ttm_bo_put); -int ttm_bo_lock_delayed_workqueue(struct ttm_device *bdev) -{ - return cancel_delayed_work_sync(&bdev->wq); -} -EXPORT_SYMBOL(ttm_bo_lock_delayed_workqueue); - -void ttm_bo_unlock_delayed_workqueue(struct ttm_device *bdev, int resc
[PATCH 3/9] drm/ttm: use per BO cleanup workers
Instead of a single worker going over the list of delete BOs in regular intervals use a per BO worker which blocks for the resv object and locking of the BO. This not only simplifies the handling massively, but also results in much better response time when cleaning up buffers. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/i915/i915_gem.c| 2 +- drivers/gpu/drm/i915/intel_region_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 112 - drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_device.c | 24 ++--- include/drm/ttm/ttm_bo_api.h | 18 +--- include/drm/ttm/ttm_device.h | 7 +- 8 files changed, 57 insertions(+), 111 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2b1db37e25c1..74ccbd566777 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3984,7 +3984,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_fence_driver_hw_fini(adev); if (adev->mman.initialized) - flush_delayed_work(&adev->mman.bdev.wq); + drain_workqueue(adev->mman.bdev.wq); if (adev->pm_sysfs_en) amdgpu_pm_sysfs_fini(adev); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8468ca9885fd..c38306f156d6 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1099,7 +1099,7 @@ void i915_gem_drain_freed_objects(struct drm_i915_private *i915) { while (atomic_read(&i915->mm.free_count)) { flush_work(&i915->mm.free_work); - flush_delayed_work(&i915->bdev.wq); + drain_workqueue(i915->bdev.wq); rcu_barrier(); } } diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c index cf89d0c2a2d9..657bbc16a48a 100644 --- a/drivers/gpu/drm/i915/intel_region_ttm.c +++ b/drivers/gpu/drm/i915/intel_region_ttm.c @@ -132,7 +132,7 @@ int intel_region_ttm_fini(struct intel_memory_region *mem) break; msleep(20); - flush_delayed_work(&mem->i915->bdev.wq); + drain_workqueue(mem->i915->bdev.wq); } /* If we leaked objects, Don't free the region causing use after free */ diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index b77262a623e0..4749b65bedc4 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -280,14 +280,13 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, ret = 0; } - if (ret || unlikely(list_empty(&bo->ddestroy))) { + if (ret) { if (unlock_resv) dma_resv_unlock(bo->base.resv); spin_unlock(&bo->bdev->lru_lock); return ret; } - list_del_init(&bo->ddestroy); spin_unlock(&bo->bdev->lru_lock); ttm_bo_cleanup_memtype_use(bo); @@ -300,47 +299,21 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, } /* - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all - * encountered buffers. + * Block for the dma_resv object to become idle, lock the buffer and clean up + * the resource and tt object. */ -bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all) +static void ttm_bo_delayed_delete(struct work_struct *work) { - struct list_head removed; - bool empty; - - INIT_LIST_HEAD(&removed); - - spin_lock(&bdev->lru_lock); - while (!list_empty(&bdev->ddestroy)) { - struct ttm_buffer_object *bo; - - bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object, - ddestroy); - list_move_tail(&bo->ddestroy, &removed); - if (!ttm_bo_get_unless_zero(bo)) - continue; - - if (remove_all || bo->base.resv != &bo->base._resv) { - spin_unlock(&bdev->lru_lock); - dma_resv_lock(bo->base.resv, NULL); - - spin_lock(&bdev->lru_lock); - ttm_bo_cleanup_refs(bo, false, !remove_all, true); - - } else if (dma_resv_trylock(bo->base.resv)) { - ttm_bo_cleanup_refs(bo, false, !remove_all, true); - } else { - spin_unlock(&bdev->lru_lock); - } + struct ttm_buffer_object *bo; - ttm_bo_put(bo); - spin_lock(&bdev->lru_lock); - } - list_splice_tail(&removed, &bdev->ddestroy); - empty = list_empty(&bdev->ddestroy); - spin_unlock(&bdev->lru_lock); + bo = container_of(work, typeof(*bo), delayed_delete); -
[PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation
We already fallback to a dummy BO with no backing store when we allocate GDS,GWS and OA resources and to GTT when we allocate VRAM. Drop all those workarounds and generalize this for GTT as well. This fixes ENOMEM issues with runaway applications which try to allocate/free GTT in a loop and are otherwise only limited by the CPU speed. The CS will wait for the cleanup of freed up BOs to satisfy the various domain specific limits and so effectively throttle those buggy applications down to a sane allocation behavior again. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- 2 files changed, 4 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index a0780a4e3e61..62e98f1ad770 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size, bp.resv = resv; bp.preferred_domain = initial_domain; bp.flags = flags; - bp.domain = initial_domain; + bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU; bp.bo_ptr_size = sizeof(struct amdgpu_bo); r = amdgpu_bo_create_user(adev, &bp, &ubo); @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, } initial_domain = (u32)(0x & args->in.domains); -retry: r = amdgpu_gem_object_create(adev, size, args->in.alignment, -initial_domain, -flags, ttm_bo_type_device, resv, &gobj); +initial_domain, flags, ttm_bo_type_device, +resv, &gobj); if (r && r != -ERESTARTSYS) { - if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) { - flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; - goto retry; - } - - if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) { - initial_domain |= AMDGPU_GEM_DOMAIN_GTT; - goto retry; - } DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n", size, initial_domain, args->in.alignment, r); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 974e85d8b6cc..919bbea2e3ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev, bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE; bo->tbo.bdev = &adev->mman.bdev; - if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA | - AMDGPU_GEM_DOMAIN_GDS)) - amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); - else - amdgpu_bo_placement_from_domain(bo, bp->domain); + amdgpu_bo_placement_from_domain(bo, bp->domain); if (bp->type == ttm_bo_type_kernel) bo->tbo.priority = 1; -- 2.34.1
Re: [PATCH] drm/amdgpu: Fix minmax error
Am 25.11.22 um 09:33 schrieb Luben Tuikov: On 2022-11-25 02:59, Christian König wrote: Am 25.11.22 um 08:56 schrieb Luben Tuikov: On 2022-11-25 02:45, Christian König wrote: Am 24.11.22 um 22:19 schrieb Luben Tuikov: Fix minmax compilation error by using min_t()/max_t(), of the assignment type. Cc: James Zhu Cc: Felix Kuehling Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 8a2e5716d8dba2..d22d14b0ef0c84 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->dev_private_owner = owner; do { - hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); + hmm_range->end = min_t(typeof(hmm_range->end), + hmm_range->start + MAX_WALK_BYTE, + end); Since end is a local variable I would strongly prefer to just have it use the correct type for it. Otherwise we might end up using something which doesn't work on all architectures. They all appear to be "unsigned long". I thought, since we assign to hmm_range->end, we use that type. Mhm, then why does the compiler complain here? Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ), and this is why the minmax check complains. So, since the left-hand expression is unsigned long, i.e., hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); is, unsigned long = min(unsigned long long, unsigned long); The compiler complains. I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX, That's not only a preference, but a must have. Otherwise the code maybe won't work as expected on 32bit architectures. and be defined as UL. I mean, why is everything in struct hmm_range "unsigned long", but we set a high limit of 10__h for an end, and compare it to "end" to find the smaller? If our "end" could potentially be 10__h then shouldn't the members in struct hmm_range be unsigned long long as well? No, that the hmm range depends on the address space bits of the CPU is perfectly correct. Essentially this is just an userspace address range. Our problem here is that this code needs to work on both 32bit and 64bit systems. And on a 32bit system limiting the types wouldn't work correctly as far as I can see. So the compiler is complaining for rather good reasons and by using "min_t(UL" we just hide that instead of fixing the problem. I suggest to use "min_t(u64" instead. An intelligent compiler should even be capable of optimizing this away by looking at the input types on 32bit archs. And for the timeout, we have the (now) obvious, timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL); and I don't know why we necessarily need a "1ULL", when 1UL would do just fine, and then compilation passes for that statement. I can set this to 1UL, instead of using max_t(). I think just changing this to 1UL should be sufficient. Regards, Christian. Regards, Luben As far as I can see "unsigned long" is correct here, but if we somehow have a typecast then something is not working as expected. Is MAX_WALK_BYTE maybe of signed type? Would you prefer at the top of the function to define "timeout" and "end" as, typeof(hmm_range->end) end, timeout; Well for end that might make sense, but timeout is independent of the hmm range. Regards, Christian. Regards, Luben
[PATCH 3/3] Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled
When enabled, KASAN enlarges function's stack-frames. Pushing quite a few over the current threshold. This can mainly be seen on 32-bit architectures where the present limit (when !GCC) is a lowly 1024-Bytes. Signed-off-by: Lee Jones --- lib/Kconfig.debug | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c3c0b077ade33..82d475168db95 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -399,6 +399,7 @@ config FRAME_WARN default 2048 if GCC_PLUGIN_LATENT_ENTROPY default 2048 if PARISC default 1536 if (!64BIT && XTENSA) + default 1280 if KASAN && !64BIT default 1024 if !64BIT default 2048 if 64BIT help -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH 2/3] drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame
calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) architectures built with Clang (all released versions), whereby the stack frame gets blown up to well over 5k. This would cause an immediate kernel panic on most architectures. We'll revert this when the following bug report has been resolved: https://github.com/llvm/llvm-project/issues/41896. Suggested-by: Arnd Bergmann Signed-off-by: Lee Jones --- drivers/gpu/drm/Kconfig | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 34f5a092c99e7..1fa7b9760adb8 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -265,6 +265,7 @@ source "drivers/gpu/drm/radeon/Kconfig" config DRM_AMDGPU tristate "AMD GPU" + depends on BROKEN || !CC_IS_CLANG || !(X86_64 || SPARC64 || ARM64) depends on DRM && PCI && MMU select FW_LOADER select DRM_DISPLAY_DP_HELPER @@ -289,6 +290,12 @@ config DRM_AMDGPU help Choose this option if you have a recent AMD Radeon graphics card. + calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64) + architectures built with Clang (all released versions), whereby the stack + frame gets blown up to well over 5k. This would cause an immediate kernel + panic on most architectures. We'll revert this when the following bug report + has been resolved: https://github.com/llvm/llvm-project/issues/41896. + If M is selected, the module will be called amdgpu. source "drivers/gpu/drm/amd/amdgpu/Kconfig" -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH 1/3] drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of code
bw_calcs() presently blows the stack-frame limit by calling functions inside a argument list which return quite a bit of data to be passed onto sub-functions. Simply breaking out this hunk reduces the stack-frame use by 500 Bytes, preventing the following compiler warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/calcs/dce_calcs.c:3285:6: warning: stack frame size (1384) exceeds limit (1024) in 'bw_calcs' [-Wframe-larger-than] bool bw_calcs(struct dc_context *ctx, ^ 1 warning generated. This resolves the issue and takes us one step closer towards a successful allmodconfig WERROR build. Signed-off-by: Lee Jones --- .../drm/amd/display/dc/dml/calcs/dce_calcs.c | 483 +- 1 file changed, 245 insertions(+), 238 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c index 0100a6053ab6b..ce5918830c030 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c @@ -3034,6 +3034,248 @@ static bool all_displays_in_sync(const struct pipe_ctx pipe[], return true; } +void bw_calcs_mid_phase(struct dc_context *ctx, const struct bw_calcs_dceip *dceip, + const struct bw_calcs_vbios *vbios, struct dce_bw_output *calcs_output, + struct bw_fixed low_sclk, struct bw_fixed mid1_sclk, + struct bw_fixed mid2_sclk, struct bw_fixed mid3_sclk, + struct bw_fixed mid_yclk, struct bw_calcs_data *data) +{ + ((struct bw_calcs_vbios *)vbios)->low_sclk = mid3_sclk; + ((struct bw_calcs_vbios *)vbios)->mid1_sclk = mid3_sclk; + ((struct bw_calcs_vbios *)vbios)->mid2_sclk = mid3_sclk; + calculate_bandwidth(dceip, vbios, data); + + calcs_output->nbp_state_change_wm_ns[0].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[4],bw_int_to_fixed(1000))); + calcs_output->nbp_state_change_wm_ns[1].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[5], bw_int_to_fixed(1000))); + calcs_output->nbp_state_change_wm_ns[2].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[6], bw_int_to_fixed(1000))); + + if (ctx->dc->caps.max_slave_planes) { + calcs_output->nbp_state_change_wm_ns[3].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[0], bw_int_to_fixed(1000))); + calcs_output->nbp_state_change_wm_ns[4].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[1], bw_int_to_fixed(1000))); + } else { + calcs_output->nbp_state_change_wm_ns[3].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[7], bw_int_to_fixed(1000))); + calcs_output->nbp_state_change_wm_ns[4].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[8], bw_int_to_fixed(1000))); + } + calcs_output->nbp_state_change_wm_ns[5].b_mark = + bw_fixed_to_int(bw_mul(data-> + nbp_state_change_watermark[9], bw_int_to_fixed(1000))); + + calcs_output->stutter_exit_wm_ns[0].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[4], bw_int_to_fixed(1000))); + calcs_output->stutter_exit_wm_ns[1].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[5], bw_int_to_fixed(1000))); + calcs_output->stutter_exit_wm_ns[2].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[6], bw_int_to_fixed(1000))); + if (ctx->dc->caps.max_slave_planes) { + calcs_output->stutter_exit_wm_ns[3].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[0], bw_int_to_fixed(1000))); + calcs_output->stutter_exit_wm_ns[4].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[1], bw_int_to_fixed(1000))); + } else { + calcs_output->stutter_exit_wm_ns[3].b_mark = + bw_fixed_to_int(bw_mul(data-> + stutter_exit_watermark[7], bw_int_to_fixed(1000))); + calcs_output->stutter_exit_wm_ns[4].b_mark = + bw_fixed_to_int(b
[PATCH 0/3] Fix a bunch of allmodconfig errors
Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") WERROR now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig builds. This leads to some interesting failures, each resolved in this set. With this set applied, I am able to obtain a successful allmodconfig Arm build. Lee Jones (3): drm/amd/display/dc/calcs/dce_calcs: Break-out a stack-heavy chunk of code drm/amdgpu: Temporarily disable broken Clang builds due to blown stack-frame Kconfig.debug: Provide a little extra FRAME_WARN leeway when KASAN is enabled drivers/gpu/drm/Kconfig | 7 + .../drm/amd/display/dc/dml/calcs/dce_calcs.c | 483 +- lib/Kconfig.debug | 1 + 3 files changed, 253 insertions(+), 238 deletions(-) -- 2.38.1.584.g0f3c55d4c2-goog
[PATCH] drm/amdkfd: Remove unnecessary condition in kfd_topology_add_device()
We re-arranged this code recently so "ret" is always zero at this point. Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 6f01ebc8557b..bceb1a5b2518 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -2012,10 +2012,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu) kfd_debug_print_topology(); - if (!res) - kfd_notify_gpu_change(gpu_id, 1); + kfd_notify_gpu_change(gpu_id, 1); - return res; + return 0; } /** -- 2.35.1
Re: [PATCH] drm/amdgpu: Fix minmax error
On 2022-11-25 02:59, Christian König wrote: > Am 25.11.22 um 08:56 schrieb Luben Tuikov: >> On 2022-11-25 02:45, Christian König wrote: >>> >>> Am 24.11.22 um 22:19 schrieb Luben Tuikov: Fix minmax compilation error by using min_t()/max_t(), of the assignment type. Cc: James Zhu Cc: Felix Kuehling Fixes: 58170a7a002ad6 ("drm/amdgpu: fix stall on CPU when allocate large system memory") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 8a2e5716d8dba2..d22d14b0ef0c84 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -191,14 +191,18 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->dev_private_owner = owner; do { - hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); + hmm_range->end = min_t(typeof(hmm_range->end), + hmm_range->start + MAX_WALK_BYTE, + end); >>> Since end is a local variable I would strongly prefer to just have it >>> use the correct type for it. >>> >>> Otherwise we might end up using something which doesn't work on all >>> architectures. >> They all appear to be "unsigned long". I thought, since we assign to >> hmm_range->end, we use that type. > > Mhm, then why does the compiler complain here? Right... so MAX_WALK_BYTE is 2^36 ULL (diabolically defined as 64ULL<<30 :-) ), and this is why the minmax check complains. So, since the left-hand expression is unsigned long, i.e., hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end); is, unsigned long = min(unsigned long long, unsigned long); The compiler complains. I'd really prefer MAX_WALK_BYTE be less than or equal to ULONG_MAX, and be defined as UL. I mean, why is everything in struct hmm_range "unsigned long", but we set a high limit of 10__h for an end, and compare it to "end" to find the smaller? If our "end" could potentially be 10__h then shouldn't the members in struct hmm_range be unsigned long long as well? And for the timeout, we have the (now) obvious, timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL); and I don't know why we necessarily need a "1ULL", when 1UL would do just fine, and then compilation passes for that statement. I can set this to 1UL, instead of using max_t(). Regards, Luben > > As far as I can see "unsigned long" is correct here, but if we somehow > have a typecast then something is not working as expected. > > Is MAX_WALK_BYTE maybe of signed type? > >> >> Would you prefer at the top of the function to define "timeout" and "end" as, >> typeof(hmm_range->end) end, timeout; > > Well for end that might make sense, but timeout is independent of the > hmm range. > > Regards, > Christian. > >> >> Regards, >> Luben >> >