Re: [RFC] Adding RISC-V Vector support to RTEMS

Sebastian Huber Fri, 15 Mar 2024 07:30:00 -0700

Hello Ken,

On 14.03.24 18:33, [email protected] wrote:

Hello RTEMS experts,
We’re in the process of implementing support for RTEMS on a new RISC-V platform. Among other things, our processor core supports the RISC-V Vector ISA (RVV), with its 32 vector registers which in our case are 512 bits (VLEN) deep. RVV is used by applications to accelerate a variety of code/algorithms utilizing either the auto-vectorizer in GCC/Clang, or through C intrinsics or direct assembly coding.
My query here is regarding context saving, and options for optimization. Before I get to that, here’s a few points of background.
#1. Use of RVV by the compiler (GCC/Clang) is dictated by the ISA string provided, e.g. -march=rv64imadcv, where v is for vector. The compiler preprocessor symbol "__riscv_vector" can then be used for conditionally compiled code, similar to what is done for floating point.
#2. Enablement of RVV can be controlled via a machine status CSR. This allows one to disable RVV entirely (triggering an exception if RVV instructions are executed), but also allows one to track the dirty/clean state of the vector register file (https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-context-status-in-mstatus). So, one can conditionally save/restore the registers, although the 2KB of stack space (32 x (512b / 8)) would need to be allocated.
#3. The C ABI specifies that vector registers are Caller saved. So, any system call (or any function call for that matter) does not need to preserve these registers.
A relatively straightforward approach is to say that RTEMS + Application is built with vectors enabled (i.e. -march=rv64imadcv). One does not need to save/restore the vector registers within Context_Control because of #3, however,

Yes, in this case you don't have to save/restore the vector registers in _CPU_Context_switch(). This is similar to the SPARC architecture. For SPARC, we implemented a lazy floating point switch (cpukit/score/cpu/sparc/include/rtems/score/cpu.h):

/*
 * The SPARC ABI is a bit special with respect to the floating point context.
 * The complete floating point context is volatile.  Thus, from an ABI point
 * of view nothing needs to be saved and restored during a context switch.
 * Instead the floating point context must be saved and restored during
 * interrupt processing.  Historically, the deferred floating point switch was
 * used for SPARC and the complete floating point context is saved and
 * restored during a context switch to the new floating point unit owner.
 * This is a bit dangerous since post-switch actions (e.g. signal handlers)
 * and context switch extensions may silently corrupt the floating point
 * context.
 *
 * The floating point unit is disabled for interrupt handlers.  Thus, in case
 * an interrupt handler uses the floating point unit then this will result in a
 * trap (INTERNAL_ERROR_ILLEGAL_USE_OF_FLOATING_POINT_UNIT).
 *
 * In uniprocessor configurations, a lazy floating point context switch is
 * used.  In case an active floating point thread is interrupted (PSR[EF] == 1)
 * and a thread dispatch is carried out, then this thread is registered as the
 * floating point owner.  When a floating point owner is present during a
 * context switch, the floating point unit is disabled for the heir thread
 * (PSR[EF] == 0).  The floating point disabled trap checks that the use of the
 * floating point unit is allowed and saves/restores the floating point context
 * on demand.
 *
 * In SMP configurations, the deferred floating point switch is not supported
 * in principle.  So, use here a synchronous floating point switching.
 * Synchronous means that the volatile floating point context is saved and
 * restored around a thread dispatch issued during interrupt processing.  Thus
 * post-switch actions and context switch extensions may safely use the
 * floating point unit.
 */
#if SPARC_HAS_FPU == 1
  #if defined(RTEMS_SMP)
    #define SPARC_USE_SYNCHRONOUS_FP_SWITCH
  #else
    #define SPARC_USE_LAZY_FP_SWITCH
  #endif
#endif

For simplicity, we disabled this optimization in SMP configurations. It could probably also be used in SMP configurations; however, this is a bit more complicated if threads move to other cores.
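
The lazy switch described in the comment above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the SPARC implementation: the types Unit and Thread and the functions lazy_context_switch() and lazy_unit_disabled_trap() are hypothetical names, the register file is modelled as a plain byte array, and the check that the unit is not being used from an interrupt handler is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size: 32 registers x 64 bytes each (VLEN = 512 bits). */
#define UNIT_CONTEXT_BYTES 2048

/* Hypothetical per-thread save area for the unit's registers. */
typedef struct {
  unsigned char saved[UNIT_CONTEXT_BYTES];
} Thread;

/* Hypothetical model of the FP/vector unit. */
typedef struct {
  unsigned char regs[UNIT_CONTEXT_BYTES]; /* stands in for the register file */
  bool enabled;   /* stands in for PSR[EF] == 1 (or mstatus.VS != Off) */
  Thread *owner;  /* thread whose state currently lives in regs */
  unsigned saves; /* counts how often a full save was actually paid for */
} Unit;

/*
 * Context switch: nothing is copied.  The unit stays enabled only if the
 * heir already owns the register contents; otherwise its first use traps.
 */
void lazy_context_switch(Unit *u, Thread *heir)
{
  assert(u != NULL && heir != NULL);
  u->enabled = (u->owner == heir);
}

/*
 * "Unit disabled" trap: the executing thread touched the unit while it was
 * disabled.  Save the previous owner's registers (if any), load the
 * executing thread's registers, and hand over ownership.
 */
void lazy_unit_disabled_trap(Unit *u, Thread *executing)
{
  assert(u != NULL && executing != NULL);

  if (u->owner != executing) {
    if (u->owner != NULL) {
      memcpy(u->owner->saved, u->regs, UNIT_CONTEXT_BYTES);
      u->saves++;
    }
    memcpy(u->regs, executing->saved, UNIT_CONTEXT_BYTES);
    u->owner = executing;
  }

  u->enabled = true;
}
```

In this model a thread that never touches the unit never causes a save or restore, which is the point of the optimization. The SMP complication mentioned above shows up as the single owner pointer: once threads can migrate, the owner's registers may live on a different core.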

because we are now building RTEMS with V enabled, we need to add this save/restore to the CPU_Interrupt_frame and incur that cost on all interrupts and stack space for all tasks. (A small secondary effect is that CPU_INTERRUPT_FRAME_SIZE is now larger than the immediate addressing range).
Is there an alternative to consider? Might one build RTEMS itself without V, leaving that only for Applications/Tasks, perhaps adding a Task attribute, and thereby taking the vector save/restore penalty only when switching into or out of a vector enabled task? (Perhaps similar to HW floating point from the past). One would then use an RTEMS config flag to enable vector support, although qualified in runtime by validating the existence of the vector ISA (misa CSR). Or maybe that direction would require too many changes? e.g. I see that “is_fp” is used more specifically, rather than the more general rtems_attribute in Thread_Control and in the arguments to _Context_Initialize(). Anyways, I’m happy to hear any thoughts on this subject. Note that I’m new to RTEMS, so may not have some past context.

The potential optimizations also depend on how aggressively the vector unit is used by the compiler. On powerpc, for example, the AltiVec unit is also used to clear or copy non-vector data. There is also a vrsave register, which helps to optimize the AltiVec context save/restore. If you disable the vector unit for certain jobs (like interrupt handlers, non-vector tasks), then you have to ensure that the compiler doesn't generate vector unit instructions. In theory, you could compile your code with different options. However, during linking you probably have to select the non-vector standard libraries. Specialized numeric libraries may have to be restricted to vector tasks.
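
On RISC-V, the mstatus.VS field from point #2 of the quoted message offers a hint for skipping unnecessary saves, loosely comparable in purpose to vrsave. The following is a minimal sketch of decoding that field and of the 2KB size arithmetic, assuming the field position and encoding from the RVV 1.0 specification (bits [10:9]; Off, Initial, Clean, Dirty). The helper names are hypothetical and the functions operate on a plain integer rather than reading the CSR, so they can be tried on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* mstatus.VS occupies bits [10:9] according to the RVV 1.0 specification. */
#define MSTATUS_VS_SHIFT 9
#define MSTATUS_VS_MASK  (UINT64_C(3) << MSTATUS_VS_SHIFT)

/* Encoding of the VS field. */
typedef enum {
  VS_OFF = 0,     /* vector instructions raise an illegal-instruction trap */
  VS_INITIAL = 1,
  VS_CLEAN = 2,   /* register file unchanged since the last save/restore */
  VS_DIRTY = 3    /* register file may have been modified */
} VsState;

/* Hypothetical helper: extract the VS field from an mstatus value. */
VsState vs_state(uint64_t mstatus)
{
  return (VsState) ((mstatus & MSTATUS_VS_MASK) >> MSTATUS_VS_SHIFT);
}

/* Hypothetical helper: a save is only needed if the state is Dirty. */
bool vector_save_needed(uint64_t mstatus)
{
  return vs_state(mstatus) == VS_DIRTY;
}

/* Hypothetical helper: mstatus value to write back after a save. */
uint64_t vs_mark_clean(uint64_t mstatus)
{
  return (mstatus & ~MSTATUS_VS_MASK)
    | ((uint64_t) VS_CLEAN << MSTATUS_VS_SHIFT);
}

/*
 * Bytes needed for the 32 vector registers alone.  The vector CSRs
 * (vstart, vl, vtype, vcsr) would need additional space.
 */
size_t vector_regfile_bytes(unsigned vlen_bits)
{
  assert(vlen_bits % 8 == 0);
  return 32u * (size_t) (vlen_bits / 8u);
}
```

For VLEN = 512, vector_regfile_bytes(512) gives the 2048 bytes quoted above. Whether the Dirty check pays off depends on how often the compiler touches the vector unit in otherwise non-vector code.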

From an RTEMS API point of view, we could simply use the RTEMS_FLOATING_POINT attribute. This is a bit of an issue since POSIX threads use this by default. Maybe it makes sense to introduce a non-standard attribute for POSIX threads to control this.


--
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: [email protected]
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/
