Hi all,

In Cortex-M MCUs, the PendSV interrupt is used to implement the actual context 
switch.  The way that it is currently used with RIOT causes instability with 
nRF52 SoftDevice.  Fixing this seems to be quite easy (see below), but might 
cause instability elsewhere.  

My apology for this unfortunately quite long mail.  However, I think we need to 
understand the background and discuss the details.

Rationale and background
------------------------

The general background for PendSV is explained rather well starting from page 
194 in Yiu's "The Definitive Guide to the ARM Cortex-M0" [1].  There, and also 
in e.g. at the ARM community forums [2], there is a clear recommendation of 
having PendSV at the lowest priority.  However, RIOT currently places PendSV at 
the *same* priority with other interrupts [3].

I am proposing that RIOT also adopts the recommended practice and -- perhaps 
after some further testing -- places, by default, PendSV at the lowest priority 
level.

As a second change, I propose that the order of lines 61 and 62 in `thread.c` 
are changed [3].  (See also my previous email on this list [4]).

There are two reasons for these changes:

1. Having PendSV at the same priority with the others, as today, causes 
instability with nRF52 SoftDevice

Exactly why PendPV at the same priority with other interrupts causes 
instability, I don't know.  It may be related to the fact that the Nordic 
SoftDevice uses an interrupt trampoline.  That is, the MCU interrupts don't 
land directly at RIOT, but the physical interrupt vectors point to a trampoline 
function inside the SoftDevice, which then calls the RIOT handler.  It may be 
that the SoftDevice uses PendSV also for some of its own functionality, too, 
perhaps not always calling RIOT, or first doing something and then calling RIOT.

In practical terms, I get HardFaults at irq_restore() or thread_yield_higher() 
around lines 61 or 62 in `mutex.c`, depending on their order.  Most of the time 
they stem from xtimer_usleep().  With the current order, with irq_restore() 
first and thread_yield_higher() second, the instability is much worse, i.e. 
much more crashes.  Independent of the order, the crashes are indeterministic, 
but relatively easy to trigger under a number of conditions that I don't fully 
understand yet.

2. Having PendSV at the lowest priority increases performance

The PendSV interrupt is designed to be used to perform the actual context 
switch just before returning from an interrupt to a normal thread context.  The 
basic idea is to switch the thread stack and context only when there are no 
more interrupts pending.  If there are still pending interrupts, running them 
may cause a higher priority thread to become ready to run.  If so, with the 
current RIOT practice, PendSV may first change the thread, then, without this 
thread getting any running time, the next pending interrupt may make another 
thread runnable and trigger another PendSV, causing another thread change.  The 
first thread change is wasted.

Kaspar and Oleg, you seem to have changed this in #3511 [5].  Looking at #3450 
[6], I think the main problem was that PendSV calls sched_run().  But, looking 
at isr_pendsv(), I don't see it calling sched_run().  Only isr_svc() does.

So, maybe it is time to revert that change?

Proposed changes
----------------

1. Make it possible to have separate priority to PENDSV:

```
   diff --git a/cpu/cortexm_common/cortexm_init.c 
b/cpu/cortexm_common/cortexm_init.c
   index 1015d4e87..9350b32cb 100644
   --- a/cpu/cortexm_common/cortexm_init.c
   +++ b/cpu/cortexm_common/cortexm_init.c
   @@ -30,6 +30,22 @@
    #define DONT_OVERRIDE_NVIC 1
    #endif
    
   +/*
   + * According to the ARM Cortex-M documentation, the recommended best
   + * practice is to place the PendSV at the lowest interrupt priority.
   + * However, RIOT has traditionally placed it in the same priority with
   + * the other interrupts.
   + *
   + * For now, by default we preserve the traditional RIOT behaviour,
   + * but allow specific CPUs, boards, or apps to change this.
   + *
   + * It is anticipated that this may be changed to the lowest
   + * priority at some point, after sufficient testing.
   + */
   +#if !defined(CPU_PENDSV_IRQ_PRIO)
   +#define CPU_PENDSV_IRQ_PRIO CPU_DEFAULT_IRQ_PRIO
   +#endif
   +
    #include "cpu.h"
    
    /**
   @@ -46,8 +62,8 @@ extern const void *_isr_vectors;
    CORTEXM_STATIC_INLINE void cortexm_init_isr_priorities(void)
    {
        /* initialize the interrupt priorities */
   -    /* set pendSV interrupt to same priority as the rest */
   -    NVIC_SetPriority(PendSV_IRQn, CPU_DEFAULT_IRQ_PRIO);
   +    /* set pendSV interrupt to its own priority */
   +    NVIC_SetPriority(PendSV_IRQn, CPU_PENDSV_IRQ_PRIO);
        /* set SVC interrupt to same priority as the rest */
        NVIC_SetPriority(SVCall_IRQn, CPU_DEFAULT_IRQ_PRIO);
        /* initialize all vendor specific interrupts with the same value */
```

2. Fix the nRF52 SD instability

```
   diff --git a/cpu/nrf52/include/cpu_conf.h b/cpu/nrf52/include/cpu_conf.h
   index f8b4183b2..efb850ae8 100644
   --- a/cpu/nrf52/include/cpu_conf.h
   +++ b/cpu/nrf52/include/cpu_conf.h
   @@ -44,6 +44,7 @@ extern "C" {
     */
    #ifdef SOFTDEVICE_PRESENT
    #define CPU_DEFAULT_IRQ_PRIO            (6U)
   +#define CPU_PENDSV_IRQ_PRIO             (7U)
    #else
    #define CPU_DEFAULT_IRQ_PRIO            (2U)
    #endif
```

3. Change the order of when PendSV is triggered and the interrupts are enabled 
in `mutex.c`

```
   diff --git a/core/mutex.c b/core/mutex.c
   index e75c5882f..f7bde2364 100644
   --- a/core/mutex.c
   +++ b/core/mutex.c
   @@ -58,8 +58,8 @@ int _mutex_lock(mutex_t *mutex, int blocking)
            else {
                thread_add_to_list(&mutex->queue, me);
            }
   -        irq_restore(irqstate);
            thread_yield_higher();
   +        irq_restore(irqstate);
            /* We were woken up by scheduler. Waker removed us from queue.
             * We have the mutex now. */
            return 1; 
```

Do these make sense?  The first two are essential to fix the nRF52 SD 
instability, but of course the comment doesn't make sense just for that.

The third one may be more controversial, perhaps even for Cortex-M.  
Furthermore, I don't understand what it would mean for other MCUs but nRF52, 
since I don't understand the details of their architectures.

--Pekka

[1] 
https://books.google.fi/books?id=5OZblBzjsJ0C&lpg=PA174&ots=m3bMkoQjNt&dq=cortex-m%20pendsv%20priority&hl=fi&pg=PA194#v=onepage&q&f=false
[2] 
https://community.arm.com/processors/f/discussions/9485/cortex-m-rtos-related-exceptions-and-concepts
[3] 
https://github.com/RIOT-OS/RIOT/blob/master/cpu/cortexm_common/cortexm_init.c#L38
[4] https://lists.riot-os.org/pipermail/devel/2018-September/005923.html
[5] https://github.com/RIOT-OS/RIOT/pull/3511
[6] https://github.com/RIOT-OS/RIOT/pull/3450

_______________________________________________
devel mailing list
[email protected]
https://lists.riot-os.org/mailman/listinfo/devel

Reply via email to