Quoting Matt Roper (2025-11-03 21:15:37-03:00)
>On Mon, Nov 03, 2025 at 02:18:12PM -0300, Gustavo Sousa wrote:
>> From: Vinod Govindapillai <[email protected]>
>> 
>> Configure one of the FBC instances to use system caching. FBC
>> read/write requests are tagged as cacheable till a programmed
>> limit is reached by the hw.
>
>What exactly is "system caching?"  We have lots of different caches in
>current platforms, and it's not really obvious to me from the
>description here (or the bspec page) exactly which cache(s) are involved
>here.

Perhaps Vinod can provide more authoritative answers to this and the
other questions, but I'll also try to reply based on my research on this
topic.

Although the Bspec does not make it clear, digging a bit deeper into
other documentation suggests that "system cache" is the SoC-level cache.

>
>Is turning this on always a win or is it situational?  I.e., is there
>any potential for display memory traffic to fill a cache with FBC data
>by evicting data that was part of the CPU or GT's working set?  If so,
>that seems like it could potentially harm the performance of other
>workloads running on the platform.
>
>Or is this whole thing about a completely new cache (unrelated to
>and unusable by anything else) which is devoted solely to FBC?

From what I understood in the docs, the value
FBC_SYS_CACHE_TAG_USE_RES_SPACE (i.e. 0b11) for field "Cache Tags"
indicates that the caching will happen in a reserved space dedicated to
the display engine.  So I believe we wouldn't be interfering with other
agents.

>
>> 
>> Bspec: 74722
>
>You might want to add 68881 here since it has a bit more information
>about how we're actually supposed to set the fields documented on 74722.

Agreed.

>
>> Signed-off-by: Vinod Govindapillai <[email protected]>
>> Signed-off-by: Gustavo Sousa <[email protected]>
>> ---
>>  drivers/gpu/drm/i915/display/intel_fbc.c      | 47 +++++++++++++++++++++++++++
>>  drivers/gpu/drm/i915/display/intel_fbc_regs.h |  9 +++++
>>  2 files changed, 56 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
>> index 24b72951ea3c..e2e55c58ddbc 100644
>> --- a/drivers/gpu/drm/i915/display/intel_fbc.c
>> +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
>> @@ -127,6 +127,9 @@ struct intel_fbc {
>>           */
>>          struct intel_fbc_state state;
>>          const char *no_fbc_reason;
>> +
>> +        /* Only one of FBC instances can use the system cache */
>> +        bool own_sys_cache;

If we go ahead with using this member, I would prefer that we use
"owns_sys_cache" (just like we use "has_something" instead of
"have_something").

>>  };
>>  
>>  static char fbc_name(enum intel_fbc_id fbc_id)
>> @@ -569,12 +572,51 @@ static bool ilk_fbc_is_compressing(struct intel_fbc *fbc)
>>          return intel_de_read(fbc->display, ILK_DPFC_STATUS(fbc->id)) & DPFC_COMP_SEG_MASK;
>>  }
>>  
>> +static void nvl_fbc_program_system_cache(struct intel_fbc *fbc, bool enable)
>> +{
>> +        struct intel_display *display = fbc->display;
>> +        u32 cfb_offset, usage;
>> +
>> +        lockdep_assert_held(&fbc->lock);
>> +
>> +        usage = intel_de_read(display, NVL_FBC_SYS_CACHE_USAGE_CFG);
>> +
>> +        /* System cache already being used by another pipe */
>> +        if (enable && (usage & FBC_SYS_CACHE_TAG_USE_RES_SPACE))
>> +                return;
>
>Rather than relying on the current register contents, should we be
>sanitizing this on driver probe (in case the pre-OS firmware already set
>this up) and then making our own decisions (as part of an atomic
>transaction) about which pipe to prioritize after that?

I agree.
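
To sketch the idea (the function name and where to call it are just
placeholders; assumes intel_de_rmw() and the defines from this patch):

static void nvl_fbc_sanitize_system_cache(struct intel_display *display)
{
        if (DISPLAY_VER(display) < 35)
                return;

        /*
         * Drop whatever tag the pre-OS firmware may have left behind so
         * that no FBC instance starts out owning the system cache.
         */
        intel_de_rmw(display, NVL_FBC_SYS_CACHE_USAGE_CFG,
                     FBC_SYS_CACHE_TAG_MASK, FBC_SYS_CACHE_TAG_DONT_CACHE);
}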

>
>> +
>> +        /* Only the fbc instance which owns system cache can disable it */
>> +        if (!enable && !fbc->own_sys_cache)
>> +                return;
>> +
>> +        /*
>> +         * Not programming the cache limit and cache reading enable bits explicitly
>> +         * here. The default values should take care of those and that could leave
>> +         * adjustments of those bits to the system hw policy
>> +         *
>> +         * TODO: check if we need to explicitly program these?
>
>There's no hardware default documented for the range field, so unless
>the pre-OS firmware sets it up (which we probably shouldn't rely on),
>I'd expect this to be 0; I don't think that's what we want.

Agreed.

The Bspec clearly states that we should set "Cacheable Range" to 32,
the equivalent of 2MB (i.e. 32 chunks of 64KB).  So yes, we shouldn't
rely on any existing value and should always use 32.
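
In the enable path that would amount to something like this (a sketch,
using the define added by this patch):

        /* Cacheable Range is in 64K chunks; the Bspec mandates 2MB, i.e. 32 */
        usage |= FBC_SYS_CACHEABLE_RANGE(32);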

>
>> +         */
>> +        cfb_offset = enable ? i915_gem_stolen_node_offset(fbc->compressed_fb) : 0;
>> +        usage |= FBC_SYS_CACHE_START_BASE(cfb_offset);
>
>And if something *did* set this up already, then OR'ing a new address
>over the old one isn't going to work.  We'd need "(old & ~mask) | new"
>to ensure we don't have leftover bits still set by accident.  But it

Yeah.  The current code is not accounting for any pre-existing value
here and is subject to corruption by simply OR'ing.  This needs to be
fixed.

Another thing to fix here is that the "Cache Start Base" field needs to
be in "4k byte chunks", but we are currently using cfb_offset directly
instead of applying the necessary shift.

>would probably be better to just avoid RMW-style handling in general and
>build a complete register value with exactly what we want rather than
>trying to modify the pre-existing value.

The Bspec says that we should keep the other fields at their default
values.  So I believe we do need RMW logic here.
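
So maybe something along these lines, which would also fold in the
4k-chunk fix mentioned above (a sketch only, assuming intel_de_rmw()
fits here):

        /*
         * Update only the fields we own, keeping the remaining fields
         * at their current/default values as the Bspec asks.  Cache
         * Start Base is in 4k byte chunks, hence the >> 12.
         */
        intel_de_rmw(display, NVL_FBC_SYS_CACHE_USAGE_CFG,
                     FBC_SYS_CACHE_START_BASE_MASK |
                     FBC_SYS_CACHEABLE_RANGE_MASK |
                     FBC_SYS_CACHE_TAG_MASK,
                     FBC_SYS_CACHE_START_BASE(cfb_offset >> 12) |
                     FBC_SYS_CACHEABLE_RANGE(32) |
                     (enable ? FBC_SYS_CACHE_TAG_USE_RES_SPACE :
                               FBC_SYS_CACHE_TAG_DONT_CACHE));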

>
>> +        usage |= enable ? FBC_SYS_CACHE_TAG_USE_RES_SPACE : FBC_SYS_CACHE_TAG_DONT_CACHE;
>> +
>> +        intel_de_write(display, NVL_FBC_SYS_CACHE_USAGE_CFG, usage);
>> +
>> +        fbc->own_sys_cache = enable;
>
>It feels like instead of having this as a boolean flag in fbc, this
>should be a pointer/ID tracked at the intel_display level.  E.g.,
>
>        display->sys_cache_fbc = fbc;

Yeah.  A single member instead of one for each FBC seems to be enough.
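
Something like this, perhaps (keeping your naming; the member placement
is just a guess):

        /* in struct intel_display: the FBC instance owning the system cache */
        struct intel_fbc *sys_cache_fbc;

and then in nvl_fbc_program_system_cache():

        if (enable)
                display->sys_cache_fbc = fbc;
        else if (display->sys_cache_fbc == fbc)
                display->sys_cache_fbc = NULL;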

>
>or possibly converted over to something tracked with atomic state so
>that we can make better high-level decisions about which FBC we want to
>enable this on as various displays get enabled/disabled.

That would be nice.  I see it as something that could be done as a
follow-up.

>
>
>Matt
>
>> +
>> +        drm_dbg_kms(display->drm, "System caching for FBC[%d] %s\n",
>> +                    fbc->id, enable ? "configured" : "cleared");
>> +}
>> +
>>  static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
>>  {
>>          struct intel_display *display = fbc->display;
>>  
>>          intel_de_write(display, ILK_DPFC_CB_BASE(fbc->id),
>>                         i915_gem_stolen_node_offset(fbc->compressed_fb));
>> +
>> +        if (DISPLAY_VER(display) >= 35)
>> +                nvl_fbc_program_system_cache(fbc, true);

One thing that concerns me here is that we are programming
SYS_CACHE_USAGE multiple times and the Bspec seems to indicate that we
should do it only once:

    "Configure SYS_CACHE_USAGE to setup the caching before enabling
    first FBC and leave it alone after that."

I believe we should get clarification from the HW team on whether what
we are doing here is legal.  By doing it multiple times, we could be
interfering with other agents (e.g. PCode) that might be doing dynamic
adjustments.
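
If the answer turns out to be that a single write is indeed required, I
would expect a one-time guard along these lines (sketch; the
sys_cache_configured flag is hypothetical):

        /* Program SYS_CACHE_USAGE once, before enabling the first FBC */
        if (display->sys_cache_configured)
                return;

        intel_de_rmw(display, NVL_FBC_SYS_CACHE_USAGE_CFG, ...);

        display->sys_cache_configured = true;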

--
Gustavo Sousa

>>  }
>>  
>>  static const struct intel_fbc_funcs ilk_fbc_funcs = {
>> @@ -952,6 +994,8 @@ static void intel_fbc_program_workarounds(struct intel_fbc *fbc)
>>  
>>  static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
>>  {
>> +        struct intel_display *display = fbc->display;
>> +
>>          if (WARN_ON(intel_fbc_hw_is_active(fbc)))
>>                  return;
>>  
>> @@ -959,6 +1003,9 @@ static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
>>                  i915_gem_stolen_remove_node(fbc->compressed_llb);
>>          if (i915_gem_stolen_node_allocated(fbc->compressed_fb))
>>                  i915_gem_stolen_remove_node(fbc->compressed_fb);
>> +
>> +        if (DISPLAY_VER(display) >= 35)
>> +                nvl_fbc_program_system_cache(fbc, false);
>>  }
>>  
>>  void intel_fbc_cleanup(struct intel_display *display)
>> diff --git a/drivers/gpu/drm/i915/display/intel_fbc_regs.h b/drivers/gpu/drm/i915/display/intel_fbc_regs.h
>> index 77d8321c4fb3..592cd2384255 100644
>> --- a/drivers/gpu/drm/i915/display/intel_fbc_regs.h
>> +++ b/drivers/gpu/drm/i915/display/intel_fbc_regs.h
>> @@ -128,4 +128,13 @@
>>  #define   FBC_REND_NUKE                        REG_BIT(2)
>>  #define   FBC_REND_CACHE_CLEAN                REG_BIT(1)
>>  
>> +#define NVL_FBC_SYS_CACHE_USAGE_CFG             _MMIO(0x1344E0)
>> +#define   FBC_SYS_CACHE_START_BASE_MASK         REG_GENMASK(31, 16)
>> +#define   FBC_SYS_CACHE_START_BASE(base)        REG_FIELD_PREP(FBC_SYS_CACHE_START_BASE_MASK, (base))
>> +#define   FBC_SYS_CACHEABLE_RANGE_MASK          REG_GENMASK(15, 4)
>> +#define   FBC_SYS_CACHEABLE_RANGE(range)        REG_FIELD_PREP(FBC_SYS_CACHEABLE_RANGE_MASK, (range))
>> +#define   FBC_SYS_CACHE_TAG_MASK                REG_GENMASK(3, 2)
>> +#define   FBC_SYS_CACHE_TAG_DONT_CACHE          REG_FIELD_PREP(FBC_SYS_CACHE_TAG_MASK, 0)
>> +#define   FBC_SYS_CACHE_TAG_USE_RES_SPACE       REG_FIELD_PREP(FBC_SYS_CACHE_TAG_MASK, 3)
>> +
>>  #endif /* __INTEL_FBC_REGS__ */
>> 
>> -- 
>> 2.51.0
>> 
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation
