xiezhanpeng2025 opened a new pull request, #17448:
URL: https://github.com/apache/nuttx/pull/17448

   ## Summary
   
   A Cortex-R52 implementation can be configured with a Single-Precision-only 
FPU (SP-only) and no Neon unit. On such a core, executing a double-precision 
instruction (e.g., `vadd.f64`) triggers an Undefined Instruction exception.
   
   > Support for two FPU options: either a single precision-only with 32x 
32-bit single precision registers, or double precision in Advanced SIMD 
implementations with 32x 64-bit / 16x 128-bit double precision registers. The 
FPU performance is optimized for both single and double precision calculations. 
Operations include add, subtract, divide, multiply, multiply and accumulate, 
square root, conversions between fixed and floating-point, and floating-point 
constant instructions. (https://developer.arm.com/Processors/Cortex-R52)
   
   The standard `-mfpu=fp-armv8` implicitly enables double-precision, which is 
unsafe for this hardware.
   
   `-mfpu=fpv5-sp-d16` is selected as the closest architectural match.
     - It enforces Single Precision code generation (preventing crashes).
     - It enables VFPv4/FPv5 features like FMA (Fused Multiply-Add) supported 
by the CR52 FPU.
     - It restricts the register set to d0-d15, matching the hardware 
constraints.
   
   This ensures the compiler utilizes hardware FPU and FMA acceleration without 
emitting illegal double-precision instructions.
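   
   As a rough illustration of the flag choice above, the ACLE predefined macro 
`__ARM_FP` can be inspected to see which precisions a build targets in 
hardware. The sketch below is not part of this PR: the helper names and 
`BUILD_ARM_FP` are hypothetical, and the value the toolchain actually defines 
for each `-mfpu` setting should be confirmed on the real compiler.
   
   ```c
   #include <stdbool.h>

   /* ACLE: __ARM_FP is a bitmask of FP formats supported in hardware.
    * Bit 1 (0x02) = half, bit 2 (0x04) = single, bit 3 (0x08) = double.
    * It is left undefined when no hardware floating point is targeted.
    */

   #ifdef __ARM_FP
   #  define BUILD_ARM_FP __ARM_FP
   #else
   #  define BUILD_ARM_FP 0           /* Non-Arm host or soft-float build */
   #endif

   #define FP_HW_SINGLE 0x04u
   #define FP_HW_DOUBLE 0x08u

   /* Hypothetical helpers, not NuttX APIs */

   static inline bool fp_hw_has_single(unsigned int arm_fp)
   {
     return (arm_fp & FP_HW_SINGLE) != 0;
   }

   static inline bool fp_hw_has_double(unsigned int arm_fp)
   {
     return (arm_fp & FP_HW_DOUBLE) != 0;
   }

   /* True when this build may emit instructions such as vadd.f64 */

   static inline bool build_emits_hw_double(void)
   {
     return fp_hw_has_double(BUILD_ARM_FP);
   }
   ```
   
   On an SP-only configuration, `build_emits_hw_double()` would be expected to 
return false, which could back a compile-time or boot-time sanity check.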
   
   ## Impact
   
   Adds compiler option support for Cortex-R52 cores with a 
Single-Precision-only FPU.
   
   ## Testing
   
   The change was tested on an E3650 board with Cortex-R52+ cores 
(https://www.semidrive.com/en/product/E3650).
   I tested it by comparing the code generated with the FPU options 
'-mfpu=fp-armv8' and '-mfpu=fpv5-sp-d16'.
   
   The following code snippet is from /ostest/fpu.c:
   ```
         /* Do some trivial floating point operations that should cause some
          * changes to floating point registers.  First, some single precision
          * nonsense.
          */
   
         sp4 = (float)3.14159 * sp1;    /* Multiply by Pi */
         sp3 = sp4 + (float)1.61803;    /* Add the golden ratio */
         sp2 = sp3 / (float)2.71828;    /* Divide by Euler's constant */
         sp1 = sp2 + (float)1.0;        /* Plus one */
   
         fpu->sp1 = sp1;                /* Make the compiler believe that somebody cares about the result */
         fpu->sp2 = sp2;
         fpu->sp3 = sp3;
         fpu->sp4 = sp4;
   
         /* Again using double precision */
   
         dp4 = (double)3.14159 * dp1;   /* Multiply by Pi */
         dp3 = dp4 + (double)1.61803;   /* Add the golden ratio */
         dp2 = dp3 / (double)2.71828;   /* Divide by Euler's constant */
         dp1 = dp2 + (double)1.0;       /* Plus one */
   
         fpu->dp1 = dp1;                /* Make the compiler believe that somebody cares about the result */
         fpu->dp2 = dp2;
         fpu->dp3 = dp3;
         fpu->dp4 = dp4;
   
   ```
   
   When CONFIG_ARCH_DPFPU is enabled and '-mfpu=fp-armv8' is used, the 
snippet compiles to:
   ```
    807f26e:    eddf 7a47       vldr    s15, [pc, #284] ; 807f38c 
<fpu_task+0x18c>
    807f272:    ee69 1b0a       vmul.f64        d17, d9, d10
    807f276:    ee68 7a27       vmul.f32        s15, s16, s15
    807f27a:    ed9f 7a45       vldr    s14, [pc, #276] ; 807f390 
<fpu_task+0x190>
    807f27e:    eddf 0b3c       vldr    d16, [pc, #240] ; 807f370 
<fpu_task+0x170>
    807f282:    ee37 7a87       vadd.f32        s14, s15, s14
    807f286:    ee71 0ba0       vadd.f64        d16, d17, d16
    807f28a:    ed9f 6a42       vldr    s12, [pc, #264] ; 807f394 
<fpu_task+0x194>
    807f28e:    eddf 3b3a       vldr    d19, [pc, #232] ; 807f378 
<fpu_task+0x178>
    807f292:    eec7 6a06       vdiv.f32        s13, s14, s12
    807f296:    eec0 2ba3       vdiv.f64        d18, d16, d19
    807f29a:    eeb7 8a00       vmov.f32        s16, #112       ; 0x3f800000  
1.0
    807f29e:    eeb7 9b00       vmov.f64        d9, #112        ; 0x3f800000  
1.0
    807f2a2:    ee36 8a88       vadd.f32        s16, s13, s16
    807f2a6:    ee32 9b89       vadd.f64        d9, d18, d9
    807f2aa:    ed85 8aa4       vstr    s16, [r5, #656] ; 0x290
    807f2ae:    edc5 6aa5       vstr    s13, [r5, #660] ; 0x294
    807f2b2:    ed85 7aa6       vstr    s14, [r5, #664] ; 0x298
    807f2b6:    edc5 7aa7       vstr    s15, [r5, #668] ; 0x29c
    807f2ba:    eef7 7bc9       vcvt.f32.f64    s15, d9
    807f2be:    edc5 7aa8       vstr    s15, [r5, #672] ; 0x2a0
    807f2c2:    eef7 7be2       vcvt.f32.f64    s15, d18
    807f2c6:    edc5 7aa9       vstr    s15, [r5, #676] ; 0x2a4
    807f2ca:    eef7 7be0       vcvt.f32.f64    s15, d16
    807f2ce:    edc5 7aaa       vstr    s15, [r5, #680] ; 0x2a8
    807f2d2:    eef7 7be1       vcvt.f32.f64    s15, d17
    807f2d6:    4628            mov     r0, r5
    807f2d8:    edc5 7aab       vstr    s15, [r5, #684] ; 0x2ac
   ```
   Here the double-precision operations are carried out by hardware 
double-precision instructions such as vadd.f64 and vdiv.f64.
   
   When CONFIG_ARCH_DPFPU is disabled and '-mfpu=fpv5-sp-d16' is used, the 
snippet compiles to:
   ```
    807fedc:    a34c            add     r3, pc, #304    ; (adr r3, 8080010 
<fpu_task+0x1b0>)
    807fede:    e9d3 2300       ldrd    r2, r3, [r3]
    807fee2:    eddf 7a54       vldr    s15, [pc, #336] ; 8080034 
<fpu_task+0x1d4>
    807fee6:    ed9f 7a54       vldr    s14, [pc, #336] ; 8080038 
<fpu_task+0x1d8>
    807feea:    ee68 7a27       vmul.f32        s15, s16, s15
    807feee:    ed9f 6a53       vldr    s12, [pc, #332] ; 808003c 
<fpu_task+0x1dc>
    807fef2:    ee37 7a87       vadd.f32        s14, s15, s14
    807fef6:    eeb7 8a00       vmov.f32        s16, #112       ; 0x3f800000  
1.0
    807fefa:    eec7 6a06       vdiv.f32        s13, s14, s12
    807fefe:    ee36 8a88       vadd.f32        s16, s13, s16
    807ff02:    4630            mov     r0, r6
    807ff04:    ed85 8aa4       vstr    s16, [r5, #656] ; 0x290
    807ff08:    4639            mov     r1, r7
    807ff0a:    edc5 6aa5       vstr    s13, [r5, #660] ; 0x294
    807ff0e:    ed85 7aa6       vstr    s14, [r5, #664] ; 0x298
    807ff12:    edc5 7aa7       vstr    s15, [r5, #668] ; 0x29c
    807ff16:    f7df f93d       bl      805f194 <__aeabi_dmul>
    807ff1a:    4602            mov     r2, r0
    807ff1c:    460b            mov     r3, r1
    807ff1e:    e9cd 2300       strd    r2, r3, [sp]
    807ff22:    a33d            add     r3, pc, #244    ; (adr r3, 8080018 
<fpu_task+0x1b8>)
    807ff24:    e9d3 2300       ldrd    r2, r3, [r3]
    807ff28:    f7de ff7e       bl      805ee28 <__adddf3>
    807ff2c:    4602            mov     r2, r0
    807ff2e:    460b            mov     r3, r1
    807ff30:    e9cd 2302       strd    r2, r3, [sp, #8]
    807ff34:    a33a            add     r3, pc, #232    ; (adr r3, 8080020 
<fpu_task+0x1c0>)
    807ff36:    e9d3 2300       ldrd    r2, r3, [r3]
    807ff3a:    f7df fa55       bl      805f3e8 <__aeabi_ddiv>
    807ff3e:    2200            movs    r2, #0
    807ff40:    4b3f            ldr     r3, [pc, #252]  ; (8080040 
<fpu_task+0x1e0>)
    807ff42:    4680            mov     r8, r0
    807ff44:    4689            mov     r9, r1
    807ff46:    f7de ff6f       bl      805ee28 <__adddf3>
    807ff4a:    460f            mov     r7, r1
    807ff4c:    4606            mov     r6, r0
    807ff4e:    f003 fd2b       bl      80839a8 <__aeabi_d2f>
    807ff52:    4649            mov     r1, r9
    807ff54:    f8c5 02a0       str.w   r0, [r5, #672]  ; 0x2a0
    807ff58:    4640            mov     r0, r8
    807ff5a:    f003 fd25       bl      80839a8 <__aeabi_d2f>
    807ff5e:    f8c5 02a4       str.w   r0, [r5, #676]  ; 0x2a4
    807ff62:    e9dd 0102       ldrd    r0, r1, [sp, #8]
    807ff66:    f003 fd1f       bl      80839a8 <__aeabi_d2f>
    807ff6a:    f8c5 02a8       str.w   r0, [r5, #680]  ; 0x2a8
    807ff6e:    e9dd 0100       ldrd    r0, r1, [sp]
    807ff72:    f003 fd19       bl      80839a8 <__aeabi_d2f>
    807ff76:    f8c5 02ac       str.w   r0, [r5, #684]  ; 0x2ac
    807ff7a:    4628            mov     r0, r5
   ```
   Here the double-precision operations are handled by compiler runtime 
library functions (__aeabi_dmul, __adddf3, __aeabi_ddiv, __aeabi_d2f) instead 
of FPU instructions.
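   
   A practical consequence for application code: on an SP-only build, any 
expression that is implicitly promoted to double goes through these library 
calls. The sketch below (hypothetical helper names, not taken from ostest) 
contrasts a double literal with a float literal. The exact code generated 
depends on the compiler and optimization level.
   
   ```c
   /* Hypothetical helpers, for illustration only */

   /* 3.14159 is a double literal: x is promoted to double, the multiply is
    * done in double precision, and the result is converted back to float.
    * Without double-precision hardware this typically becomes library calls.
    */

   static inline float scale_promoted(float x)
   {
     return x * 3.14159;
   }

   /* 3.14159f is a float literal: the whole expression stays in single
    * precision and can use the hardware FPU (e.g. vmul.f32).
    */

   static inline float scale_single(float x)
   {
     return x * 3.14159f;
   }
   ```
   
   GCC's `-Wdouble-promotion` warning can help locate expressions like the 
first one.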
   
   

