On Tue, 2025-09-09 at 09:47 +0800, Xi Ruoyao wrote:
> On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote:
> > 
> > 
> > On 8/25/25 02:52, Xi Ruoyao wrote:
> > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually
> > > inlined by the compiler) populate_dml21_surface_config_from_plane_state
> > > and populate_dml21_plane_config_from_plane_state which may use FPU.  In
> > > a x86-64 build:
> > > 
> > >      $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \
> > >      > 
> > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o |
> > >      > grep %xmm -c
> > >      63
> > > 
> > > Thus it needs to be guarded with DC_FP_START.  But we must note that the
> > > current code quality of the in-kernel FPU use in AMD dml2 is very much
> > > problematic: we are actually calling DC_FP_START in dml21_wrapper.c
> > > here, and this translation unit is built with CC_FLAGS_FPU.  Strictly
> > > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is
> > > allowed to generate FPU uses anywhere in the translated code, perhaps
> > > out of the DC_FP_START guard.  This problematic pattern also occurs in
> > > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c.  Thus we really
> > 
> > Let me share Austin's comments below:
> > 
> > "
> > Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage.
> > 
> > CC_FLAGS_FPU allows the compiler to generate FPU code whereas 
> > DC_FP_START ensures that the FPU registers don't get tainted during runtime.
> 
> But the correct pattern would be: isolating the code using FPU into its
> own translation unit, say xxx_fp.c, then call DC_FP_START in another
> translation unit (for which CC_FLAGS_FPU is NOT used) before calling the
> routines in xxx_fp.c.
> 
> It's because there's generally no way to guarantee the compiler only
> generate FP code in the range of DC_FP_START ... DC_FP_END if
> CC_FLAGS_FPU is used.  The compiler can generate FPU code in situations
> we don't intend to use (and don't anticipate) FPU, for example if a
> "counting number of ones in a word" pattern is detected when the code is
> built for LoongArch, the compiler can invoke FPU to use a popcount
> instruction with CC_FLAGS_FPU.  (That doesn't really happen now but it
> may happen in the future.)
> 
> And the correct pattern is already used by, for example
> dcn10_resource.c:
> 
>     DC_FP_START();
>     voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode);
>     DC_FP_END();
> 
> Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in-
> effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU.

And the correct pattern is already documented and explained in
dcn20_fpu.c:

 * 4. Developers **must not** use DC_FP_START/END in this file, but they need
 *    to ensure that the caller invokes it before access any function available
 *    in this file. For this reason, public functions in this file must invoke
 *    dc_assert_fp_enabled();
 *
 * Let's expand a little bit more the idea in the code pattern. To fully
 * isolate FPU operations in a single place, we must avoid situations where
 * compilers spill FP values to registers due to FP enable in a specific C
 * file. Note that even if we isolate all FPU functions in a single file and
 * call its interface from other files, the compiler might enable the use of
 * FPU before we call DC_FP_START. Nevertheless, it is the programmer's
 * responsibility to invoke DC_FP_START/END in the correct place. To highlight
 * situations where developers forgot to use the FP protection before calling
 * the DC FPU interface functions, we introduce a helper that checks if the
 * function is invoked under FP protection. If not, it will trigger a kernel
 * warning.

-- 
Xi Ruoyao <[email protected]>

Reply via email to