xiezhanpeng2025 opened a new pull request, #17448: URL: https://github.com/apache/nuttx/pull/17448
## Summary

A Cortex-R52 implementation can be configured with a single-precision-only (SP-only) FPU and no Neon unit. On such a core, executing a double-precision instruction (e.g., `vadd.f64`) triggers an Undefined Instruction exception.

> Support for two FPU options: either a single precision-only with 32x 32-bit single precision registers, or double precision in Advanced SIMD implementations with 32x 64-bit / 16x 128-bit double precision registers. The FPU performance is optimized for both single and double precision calculations. Operations include add, subtract, divide, multiply, multiply and accumulate, square root, conversions between fixed and floating-point, and floating-point constant instructions. (https://developer.arm.com/Processors/Cortex-R52)

The standard `-mfpu=fp-armv8` implicitly enables double precision, which is unsafe for this hardware. `-mfpu=fpv5-sp-d16` is selected as the closest architectural match:

- It enforces single-precision code generation (preventing crashes).
- It enables VFPv4/FPv5 features like FMA (Fused Multiply-Add) supported by the CR52 FPU.
- It restricts the register set to d0-d15, matching the hardware constraints.

This ensures the compiler uses hardware FPU and FMA acceleration without emitting illegal double-precision instructions.

## Impact

Adds compiler option support for Cortex-R52 cores with a single-precision-only FPU.

## Testing

The change was tested on an E3650 board with Cortex-R52+ cores (https://www.semidrive.com/en/product/E3650).

I tested this change by reviewing the code generated with the FPU option `-mfpu=fp-armv8` and with `-mfpu=fpv5-sp-d16`, using the following code snippet from /ostest/fpu.c:

```
  /* Do some trivial floating point operations that should cause some
   * changes to floating point registers. First, some single precision
   * nonsense.
   */

  sp4 = (float)3.14159 * sp1; /* Multiple by Pi */
  sp3 = sp4 + (float)1.61803; /* Add the golden ratio */
  sp2 = sp3 / (float)2.71828; /* Divide by Euler's constant */
  sp1 = sp2 + (float)1.0;     /* Plus one */

  fpu->sp1 = sp1; /* Make the compiler believe that somebody cares about the result */
  fpu->sp2 = sp2;
  fpu->sp3 = sp3;
  fpu->sp4 = sp4;

  /* Again using double precision */

  dp4 = (double)3.14159 * dp1; /* Multiple by Pi */
  dp3 = dp4 + (double)1.61803; /* Add the golden ratio */
  dp2 = dp3 / (double)2.71828; /* Divide by Euler's constant */
  dp1 = dp2 + (double)1.0;     /* Plus one */

  fpu->dp1 = dp1; /* Make the compiler believe that somebody cares about the result */
  fpu->dp2 = dp2;
  fpu->dp3 = dp3;
  fpu->dp4 = dp4;
```

When CONFIG_ARCH_DPFPU is enabled and `-mfpu=fp-armv8` is used, the snippet compiles to:

```
807f26e: eddf 7a47 vldr s15, [pc, #284] ; 807f38c <fpu_task+0x18c>
807f272: ee69 1b0a vmul.f64 d17, d9, d10
807f276: ee68 7a27 vmul.f32 s15, s16, s15
807f27a: ed9f 7a45 vldr s14, [pc, #276] ; 807f390 <fpu_task+0x190>
807f27e: eddf 0b3c vldr d16, [pc, #240] ; 807f370 <fpu_task+0x170>
807f282: ee37 7a87 vadd.f32 s14, s15, s14
807f286: ee71 0ba0 vadd.f64 d16, d17, d16
807f28a: ed9f 6a42 vldr s12, [pc, #264] ; 807f394 <fpu_task+0x194>
807f28e: eddf 3b3a vldr d19, [pc, #232] ; 807f378 <fpu_task+0x178>
807f292: eec7 6a06 vdiv.f32 s13, s14, s12
807f296: eec0 2ba3 vdiv.f64 d18, d16, d19
807f29a: eeb7 8a00 vmov.f32 s16, #112 ; 0x3f800000 1.0
807f29e: eeb7 9b00 vmov.f64 d9, #112 ; 0x3f800000 1.0
807f2a2: ee36 8a88 vadd.f32 s16, s13, s16
807f2a6: ee32 9b89 vadd.f64 d9, d18, d9
807f2aa: ed85 8aa4 vstr s16, [r5, #656] ; 0x290
807f2ae: edc5 6aa5 vstr s13, [r5, #660] ; 0x294
807f2b2: ed85 7aa6 vstr s14, [r5, #664] ; 0x298
807f2b6: edc5 7aa7 vstr s15, [r5, #668] ; 0x29c
807f2ba: eef7 7bc9 vcvt.f32.f64 s15, d9
807f2be: edc5 7aa8 vstr s15, [r5, #672] ; 0x2a0
807f2c2: eef7 7be2 vcvt.f32.f64 s15, d18
807f2c6: edc5 7aa9 vstr s15, [r5, #676] ; 0x2a4
807f2ca: eef7 7be0 vcvt.f32.f64 s15, d16
807f2ce: edc5 7aaa vstr s15, [r5, #680] ; 0x2a8
807f2d2: eef7 7be1 vcvt.f32.f64 s15, d17
807f2d6: 4628 mov r0, r5
807f2d8: edc5 7aab vstr s15, [r5, #684] ; 0x2ac
```

Here the double-precision operations are computed with double-precision instructions such as `vmul.f64`, `vadd.f64`, and `vdiv.f64`.

When CONFIG_ARCH_DPFPU is disabled and `-mfpu=fpv5-sp-d16` is used, the snippet compiles to:

```
807fedc: a34c add r3, pc, #304 ; (adr r3, 8080010 <fpu_task+0x1b0>)
807fede: e9d3 2300 ldrd r2, r3, [r3]
807fee2: eddf 7a54 vldr s15, [pc, #336] ; 8080034 <fpu_task+0x1d4>
807fee6: ed9f 7a54 vldr s14, [pc, #336] ; 8080038 <fpu_task+0x1d8>
807feea: ee68 7a27 vmul.f32 s15, s16, s15
807feee: ed9f 6a53 vldr s12, [pc, #332] ; 808003c <fpu_task+0x1dc>
807fef2: ee37 7a87 vadd.f32 s14, s15, s14
807fef6: eeb7 8a00 vmov.f32 s16, #112 ; 0x3f800000 1.0
807fefa: eec7 6a06 vdiv.f32 s13, s14, s12
807fefe: ee36 8a88 vadd.f32 s16, s13, s16
807ff02: 4630 mov r0, r6
807ff04: ed85 8aa4 vstr s16, [r5, #656] ; 0x290
807ff08: 4639 mov r1, r7
807ff0a: edc5 6aa5 vstr s13, [r5, #660] ; 0x294
807ff0e: ed85 7aa6 vstr s14, [r5, #664] ; 0x298
807ff12: edc5 7aa7 vstr s15, [r5, #668] ; 0x29c
807ff16: f7df f93d bl 805f194 <__aeabi_dmul>
807ff1a: 4602 mov r2, r0
807ff1c: 460b mov r3, r1
807ff1e: e9cd 2300 strd r2, r3, [sp]
807ff22: a33d add r3, pc, #244 ; (adr r3, 8080018 <fpu_task+0x1b8>)
807ff24: e9d3 2300 ldrd r2, r3, [r3]
807ff28: f7de ff7e bl 805ee28 <__adddf3>
807ff2c: 4602 mov r2, r0
807ff2e: 460b mov r3, r1
807ff30: e9cd 2302 strd r2, r3, [sp, #8]
807ff34: a33a add r3, pc, #232 ; (adr r3, 8080020 <fpu_task+0x1c0>)
807ff36: e9d3 2300 ldrd r2, r3, [r3]
807ff3a: f7df fa55 bl 805f3e8 <__aeabi_ddiv>
807ff3e: 2200 movs r2, #0
807ff40: 4b3f ldr r3, [pc, #252] ; (8080040 <fpu_task+0x1e0>)
807ff42: 4680 mov r8, r0
807ff44: 4689 mov r9, r1
807ff46: f7de ff6f bl 805ee28 <__adddf3>
807ff4a: 460f mov r7, r1
807ff4c: 4606 mov r6, r0
807ff4e: f003 fd2b bl 80839a8 <__aeabi_d2f>
807ff52: 4649 mov r1, r9
807ff54: f8c5 02a0 str.w r0, [r5, #672] ; 0x2a0
807ff58: 4640 mov r0, r8
807ff5a: f003 fd25 bl 80839a8 <__aeabi_d2f>
807ff5e: f8c5 02a4 str.w r0, [r5, #676] ; 0x2a4
807ff62: e9dd 0102 ldrd r0, r1, [sp, #8]
807ff66: f003 fd1f bl 80839a8 <__aeabi_d2f>
807ff6a: f8c5 02a8 str.w r0, [r5, #680] ; 0x2a8
807ff6e: e9dd 0100 ldrd r0, r1, [sp]
807ff72: f003 fd19 bl 80839a8 <__aeabi_d2f>
807ff76: f8c5 02ac str.w r0, [r5, #684] ; 0x2ac
807ff7a: 4628 mov r0, r5
```

Here the double-precision operations are handled by calls to compiler runtime library functions (`__aeabi_dmul`, `__adddf3`, `__aeabi_ddiv`, `__aeabi_d2f`) instead of double-precision FPU instructions, while the single-precision operations still use hardware instructions such as `vmul.f32` and `vdiv.f32`.
