On Tue, 2025-09-09 at 09:47 +0800, Xi Ruoyao wrote: > On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote: > > > > > > On 8/25/25 02:52, Xi Ruoyao wrote: > > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > > > inlined by the compiler) populate_dml21_surface_config_from_plane_state > > > and populate_dml21_plane_config_from_plane_state which may use FPU. In > > > a x86-64 build: > > > > > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > > > > > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > > > grep %xmm -c > > > 63 > > > > > > Thus it needs to be guarded with DC_FP_START. But we must note that the > > > current code quality of the in-kernel FPU use in AMD dml2 is very much > > > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > > > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > > > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > > > allowed to generate FPU uses anywhere in the translated code, perhaps > > > out of the DC_FP_START guard. This problematic pattern also occurs in > > > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > > > > Let me share Austin's comments below: > > > > " > > Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage. > > > > CC_FLAGS_FPU allows the compiler to generate FPU code whereas > > DC_FP_START ensures that the FPU registers don't get tainted during runtime. > > But the correct pattern would be: isolating the code using FPU into its > own translation unit, say xxx_fp.c, then call DC_FP_START in another > translation unit (for which CC_FLAGS_FPU is NOT used) before calling the > routines in xxx_fp.c. > > It's because there's generally no way to guarantee the compiler only > generate FP code in the range of DC_FP_START ... DC_FP_END if > CC_FLAGS_FPU is used. The compiler can generate FPU code in situations > we don't intend to use (and don't anticipate) FPU, for example if a > "counting number of ones in a word" pattern is detected when the code is > built for LoongArch, the compiler can invoke FPU to use a popcount > instruction with CC_FLAGS_FPU. (That doesn't really happen now but it > may happen in the future.) > > And the correct pattern is already used by, for example > dcn10_resource.c: > > DC_FP_START(); > voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode); > DC_FP_END(); > > Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in- > effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU.
And the correct pattern is already documented and explained in dcn20_fpu.c: * 4. Developers **must not** use DC_FP_START/END in this file, but they need * to ensure that the caller invokes it before access any function available * in this file. For this reason, public functions in this file must invoke * dc_assert_fp_enabled(); * * Let's expand a little bit more the idea in the code pattern. To fully * isolate FPU operations in a single place, we must avoid situations where * compilers spill FP values to registers due to FP enable in a specific C * file. Note that even if we isolate all FPU functions in a single file and * call its interface from other files, the compiler might enable the use of * FPU before we call DC_FP_START. Nevertheless, it is the programmer's * responsibility to invoke DC_FP_START/END in the correct place. To highlight * situations where developers forgot to use the FP protection before calling * the DC FPU interface functions, we introduce a helper that checks if the * function is invoked under FP protection. If not, it will trigger a kernel * warning. -- Xi Ruoyao <[email protected]>
