On Tue, 2025-11-04 at 13:28 -0300, Gustavo Sousa wrote:
> Quoting Gustavo Sousa (2025-11-04 13:16:01-03:00)
> > Quoting Matt Roper (2025-11-03 21:15:37-03:00)
> > > On Mon, Nov 03, 2025 at 02:18:12PM -0300, Gustavo Sousa wrote:
> > > > From: Vinod Govindapillai <[email protected]>
> > > >
> > > > Configure one of the FBC instances to use system caching. FBC
> > > > read/write requests are tagged as cacheable till a programmed
> > > > limit is reached by the hw.
> > >
> > > What exactly is "system caching?" We have lots of different
> > > caches in
> > > current platforms, and it's not really obvious to me from the
> > > description here (or the bspec page) exactly which cache(s) are
> > > involved
> > > here.
> >
> > Perhaps Vinod would provide more assertive answers for this and
> > other
> > questions, but I'll also try to reply based on my research on this
> > topic.
> >
> > Although the Bspec does not make it clear, by digging a bit deeper
> > into
> > other documentation, "system cache" appears to be the SoC-level
> > cache.
> >
> > >
> > > Is turning this on always a win or is it situational? I.e., is
> > > there
> > > any potential for display memory traffic to fill a cache with FBC
> > > data
> > > by evicting data that was part of the CPU or GT's working set?
> > > If so,
> > > that seems like it could potentially harm the performance of
> > > other
> > > workloads running on the platform.
> > >
> > > Or is this whole thing about a completely new cache (unrelated to
> > > and unusable by anything else) which is devoted solely to FBC?
> >
> > From what I understood in the docs, the value
> > FBC_SYS_CACHE_TAG_USE_RES_SPACE (i.e. 0b11) for field "Cache Tags"
> > indicates that the caching will happen in a reserved space
> > dedicated to
> > the display engine. So I believe we wouldn't be interfering with
> > other
> > agents.
> >
> > >
> > > >
> > > > Bspec: 74722
> > >
> > > You might want to add 68881 here since it has a bit more
> > > information
> > > about how we're actually supposed to set the fields documented on
> > > 74722.
> >
> > Agreed.
> >
> > >
> > > > Signed-off-by: Vinod Govindapillai
> > > > <[email protected]>
> > > > Signed-off-by: Gustavo Sousa <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/i915/display/intel_fbc.c | 47
> > > > +++++++++++++++++++++++++++
> > > > drivers/gpu/drm/i915/display/intel_fbc_regs.h | 9 +++++
> > > > 2 files changed, 56 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c
> > > > b/drivers/gpu/drm/i915/display/intel_fbc.c
> > > > index 24b72951ea3c..e2e55c58ddbc 100644
> > > > --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> > > > +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> > > > @@ -127,6 +127,9 @@ struct intel_fbc {
> > > > */
> > > > struct intel_fbc_state state;
> > > > const char *no_fbc_reason;
> > > > +
> > > > + /* Only one of FBC instances can use the system cache
> > > > */
> > > > + bool own_sys_cache;
> >
> > If we go ahead with using this member, I would prefer that we used
> > "owns_sys_cache" (just like we use "has_something" instead of
> > "have_something").
> >
> > > > };
> > > >
> > > > static char fbc_name(enum intel_fbc_id fbc_id)
> > > > @@ -569,12 +572,51 @@ static bool ilk_fbc_is_compressing(struct
> > > > intel_fbc *fbc)
> > > > return intel_de_read(fbc->display,
> > > > ILK_DPFC_STATUS(fbc->id)) & DPFC_COMP_SEG_MASK;
> > > > }
> > > >
> > > > +static void nvl_fbc_program_system_cache(struct intel_fbc
> > > > *fbc, bool enable)
> > > > +{
> > > > + struct intel_display *display = fbc->display;
> > > > + u32 cfb_offset, usage;
> > > > +
> > > > + lockdep_assert_held(&fbc->lock);
> > > > +
> > > > + usage = intel_de_read(display,
> > > > NVL_FBC_SYS_CACHE_USAGE_CFG);
> > > > +
> > > > + /* System cache already being used by another pipe */
> > > > + if (enable && (usage &
> > > > FBC_SYS_CACHE_TAG_USE_RES_SPACE))
> > > > + return;
> > >
> > > Rather than relying on the current register contents, should we
> > > be
> > > sanitizing this on driver probe (in case the pre-OS firmware
> > > already set
> > > this up) and then making our own decisions (as part of an atomic
> > > transaction) about which pipe to prioritize after that?
> >
> > I agree.
> >
> > >
> > > > +
> > > > + /* Only the fbc instance which owns system cache can
> > > > disable it */
> > > > + if (!enable && !fbc->own_sys_cache)
> > > > + return;
> > > > +
> > > > + /*
> > > > + * Not programming the cache limit and cache reading
> > > > enable bits explicitly
> > > > + * here. The default values should take care of those
> > > > and that could leave
> > > > + * adjustments of those bits to the system hw policy
> > > > + *
> > > > + * TODO: check if we need to explicitly program these?
> > >
> > > There's no hardware default documented for the range field, so
> > > unless
> > > the pre-OS firmware sets it up (which we probably shouldn't rely
> > > on),
> > > I'd expect this to be 0; I don't think that's what we want.
> >
> > Agreed.
> >
> > The Bspec clearly states that we should set "Cacheable Range" to
> > 32, the
> > equivalent of 2MB (i.e. 32 chunks of 64KB). So yes, we shouldn't
> > rely
> > on any existing value and always use 32.
> >
> > >
> > > > + */
> > > > + cfb_offset = enable ? i915_gem_stolen_node_offset(fbc-
> > > > >compressed_fb) : 0;
> > > > + usage |= FBC_SYS_CACHE_START_BASE(cfb_offset);
> > >
> > > And if something *did* set this up already, then OR'ing a new
> > > address
> > > over the old one isn't going to work. We'd need "(old & ~mask) |
> > > new"
> > > to ensure we don't have leftover bits still set by accident. But
> > > it
> >
> > Yeah. The current code is not accouting for any pre-existing value
> > here
> > and is subject to corruption by simply OR'ing. This needs to be
> > fixed.
> >
> > Another thing to fix here is that the field "Cache Start Base"
> > needs to
> > be in "4k byte chunks" and we are currently using cfb_offset
> > directly
> > instead of applying the necessary shift.
> >
> > > would probably be better to just avoid RMW-style handling in
> > > general and
> > > build a complete register value with exactly what we want rather
> > > than
> > > trying to modify the pre-existing value.
> >
> > The Bspec says that we should keep other fields with their default
> > values. So, I believe we do need to have RMW logic here.
> >
> > >
> > > > + usage |= enable ? FBC_SYS_CACHE_TAG_USE_RES_SPACE :
> > > > FBC_SYS_CACHE_TAG_DONT_CACHE;
> > > > +
> > > > + intel_de_write(display, NVL_FBC_SYS_CACHE_USAGE_CFG,
> > > > usage);
> > > > +
> > > > + fbc->own_sys_cache = enable;
> > >
> > > It feels like instead of having this as a boolean flag in fbc,
> > > this
> > > should be a pointer/ID tracked at the intel_display level. E.g.,
> > >
> > > display->sys_cache_fbc = fbc;
> >
> > Yeah. A single member instead of one for each FBC seems to be
> > enough.
> >
> > >
> > > or possibly converted over to something tracked with atomic state
> > > so
> > > that we can make better high-level decisions about which FBC we
> > > want to
> > > enable this on as various displays get enabled/disabled.
> >
> > That would be nice. I see it as something that could be done as a
> > follow-up.
> >
> > >
> > >
> > > Matt
> > >
> > > > +
> > > > + drm_dbg_kms(display->drm, "System caching for FBC[%d]
> > > > %s\n",
> > > > + fbc->id, enable ? "configured" :
> > > > "cleared");
> > > > +}
> > > > +
> > > > static void ilk_fbc_program_cfb(struct intel_fbc *fbc)
> > > > {
> > > > struct intel_display *display = fbc->display;
> > > >
> > > > intel_de_write(display, ILK_DPFC_CB_BASE(fbc->id),
> > > > i915_gem_stolen_node_offset(fbc-
> > > > >compressed_fb));
> > > > +
> > > > + if (DISPLAY_VER(display) >= 35)
>
> I forgot to mention this on my previous email: I think Bspec is
> missing
> this info, but this feature is applicable only to integrated display,
> so
> we probably want to limit the condition above to reflect that.
>
> Perhaps it would be good to have a macro like
>
> #define HAS_FBC_SYS_CACHE(__display) (DISPLAY_VER(__display) >= 35
> && !(__display)->platform.dgfx)
Ack.
BR
Vinod
>
> --
> Gustavo Sousa
>
> > > > + nvl_fbc_program_system_cache(fbc, true);
> >
> > One thing that concerns me here is that we are programming
> > SYS_CACHE_USAGE multiple times and the Bspec seems to indicate that
> > we
> > should do it only once:
> >
> > "Configure SYS_CACHE_USAGE to setup the caching before enabling
> > first FBC and leave it alone after that."
> >
> > I believe we should get some clarification with the HW team to
> > verify if
> > what we are doing here is legal. By doing it multiple times, we
> > could
> > be interfering with other agents (e.g. PCode) that could be doing
> > some
> > dynamic adjustments.
> >
> > --
> > Gustavo Sousa
> >
> > > > }
> > > >
> > > > static const struct intel_fbc_funcs ilk_fbc_funcs = {
> > > > @@ -952,6 +994,8 @@ static void
> > > > intel_fbc_program_workarounds(struct intel_fbc *fbc)
> > > >
> > > > static void __intel_fbc_cleanup_cfb(struct intel_fbc *fbc)
> > > > {
> > > > + struct intel_display *display = fbc->display;
> > > > +
> > > > if (WARN_ON(intel_fbc_hw_is_active(fbc)))
> > > > return;
> > > >
> > > > @@ -959,6 +1003,9 @@ static void __intel_fbc_cleanup_cfb(struct
> > > > intel_fbc *fbc)
> > > > i915_gem_stolen_remove_node(fbc-
> > > > >compressed_llb);
> > > > if (i915_gem_stolen_node_allocated(fbc-
> > > > >compressed_fb))
> > > > i915_gem_stolen_remove_node(fbc-
> > > > >compressed_fb);
> > > > +
> > > > + if (DISPLAY_VER(display) >= 35)
> > > > + nvl_fbc_program_system_cache(fbc, false);
> > > > }
> > > >
> > > > void intel_fbc_cleanup(struct intel_display *display)
> > > > diff --git a/drivers/gpu/drm/i915/display/intel_fbc_regs.h
> > > > b/drivers/gpu/drm/i915/display/intel_fbc_regs.h
> > > > index 77d8321c4fb3..592cd2384255 100644
> > > > --- a/drivers/gpu/drm/i915/display/intel_fbc_regs.h
> > > > +++ b/drivers/gpu/drm/i915/display/intel_fbc_regs.h
> > > > @@ -128,4 +128,13 @@
> > > > #define FBC_REND_NUKE REG_BIT(2)
> > > > #define FBC_REND_CACHE_CLEAN REG_BIT(1)
> > > >
> > > > +#define NVL_FBC_SYS_CACHE_USAGE_CFG
> > > > _MMIO(0x1344E0)
> > > > +#define FBC_SYS_CACHE_START_BASE_MASK
> > > > REG_GENMASK(31, 16)
> > > > +#define FBC_SYS_CACHE_START_BASE(base)
> > > > REG_FIELD_PREP(FBC_SYS_CACHE_START_BASE_MASK, (base))
> > > > +#define FBC_SYS_CACHEABLE_RANGE_MASK
> > > > REG_GENMASK(15, 4)
> > > > +#define FBC_SYS_CACHEABLE_RANGE(range)
> > > > REG_FIELD_PREP(FBC_SYS_CACHEABLE_RANGE_MASK, (range))
> > > > +#define FBC_SYS_CACHE_TAG_MASK REG_GENMASK(3,
> > > > 2)
> > > > +#define FBC_SYS_CACHE_TAG_DONT_CACHE
> > > > REG_FIELD_PREP(FBC_SYS_CACHE_TAG_MASK, 0)
> > > > +#define FBC_SYS_CACHE_TAG_USE_RES_SPACE
> > > > REG_FIELD_PREP(FBC_SYS_CACHE_TAG_MASK, 3)
> > > > +
> > > > #endif /* __INTEL_FBC_REGS__ */
> > > >
> > > > --
> > > > 2.51.0
> > > >
> > >
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation