Re: [riot-devel] sched_active_thread is a volatile pointer, but volatile is ignored

Juan Ignacio Carrano Wed, 27 Feb 2019 05:49:58 -0800


On 8/1/19 21:47, Kees Bakker wrote:

E.g.:

void msg_recv(...)
{
    if (!sched_active_thread->waiters) {
     // platform specific macro to suspend thread
    }

    thread_t *waiter = sched_active_thread->waiters;

    // ...
}

(or similar)

My understanding of volatile back then was that the compiler could,
without volatile, assume that sched_active_thread->waiters equals NULL.

Well, in the example code, the compiler would just readsched_active_thread->waiters

once and keep it in a register. If either sched_active_thread or waiters
can change in between the uses in that function you must declare some
things volatile.



Say you do something like this:

extern void f(void);

int x;

int test(void)
{
    int y = x;
    f();
    return y+x;
}

Then the compiler, knowing nothing about "f()", _has_ to re-read x inthe return statement. This is because f() could possibly modify x. Thisis a behavior in which we can rely, because it is needed even in singlethreaded programs. This is what happens when the above snippet iscompiled with anything above -O0:


int test(void)
{
   0:   b538            push    {r3, r4, r5, lr}
    int y = x;
   2:   4c03            ldr     r4, [pc, #12]   ; (10 <test+0x10>)
   4:   6825            ldr     r5, [r4, #0]
    f();
   6:   f7ff fffe       bl      0 <f>
    return y+x;
   a:   6820            ldr     r0, [r4, #0]
}
   c:   4428            add     r0, r5
   e:   bd38            pop     {r3, r4, r5, pc}
  10:   00000000        .word   0x00000000

See the two lines with "ldr        rN, [r4, #0]".

In the provided example of "msg_recv", if the "platform specific macroto suspend thread" includes a call to a function not defined in thecurrent translation unit, then the compiler cannot assume that thisfunction does not access or change globals. The only problem would be ifthe compiler thinks he knows what the function does (via inline or LTO).Looking at the code for _msg_receive() it does not seem to be the case,to I'm puzzled as to why it would break. I tried removing it, and theredoes not seem to be a problem. I'll PR it can be tried in the CI.


Marian,

> I think getting rid of `volatile` and adding "proper" memory barriersto places> where context switches are supposed to happen will make the code morerobust and

> might allow the compiler to optimize the code better

Yes, that should work in those cases where the compiler may think he issure a function call cannot modify certain variables. For example, the code:


int x;

int test2(void)
{
    int y = x;
    int k = z(x);
    return y+x+k;
}

Gets turned into:

int z(int x)
{
    return x*x;
}
  14:   fb00 f000       mul.w   r0, r0, r0
  18:   4770            bx      lr

0000001a <test2>:

int test2(void)
{
  1a:   b510            push    {r4, lr}
    int y = x;
  1c:   4b03            ldr     r3, [pc, #12]   ; (2c <test2+0x12>)
  1e:   681c            ldr     r4, [r3, #0]
    int k = z(x);
  20:   4620            mov     r0, r4
  22:   f7ff fffe       bl      14 <z>
    return y+x+k;
}
  26:   eb00 0044       add.w   r0, r0, r4, lsl #1
  2a:   bd10            pop     {r4, pc}
  2c:   00000000        .word   0x00000000

I compile with -Og to prevent inlining. Observer that the global "x" isonly loaded once.

> `void __sync_synchronize(void)` placed just before and just after acontext...

Yes, but most likely overkill. __sync_synchronize() is meant to issue ahardware barrier (that is, force the CPU to comlete all pending memoryoperations.) This is needed in multi-core, and maybe some single-corescenarios, but it is not what we are looking for. We need a _compilerbarrier_, like linux's barrier macro(). See this code:


int x;

int test3(void)
{
    int y = x;
    __sync_synchronize();
    //__asm__ __volatile__("": : :"memory");  // linux's barrier()
    return y + x;
}

With no barriers, the ASM shows that x is read once.__sync_synchronize() inserts a "dmb" (data memory barrier) instructionin addition to forcing an additional read:


int test3(void)
{
    int y = x;
  30:   4b03            ldr     r3, [pc, #12]   ; (40 <test3+0x10>)
  32:   6818            ldr     r0, [r3, #0]
    __sync_synchronize();
  34:   f3bf 8f5b       dmb     ish
    return y + x;
  38:   681b            ldr     r3, [r3, #0]
}
  3a:   4418            add     r0, r3
  3c:   4770            bx      lr

The other barrier causes a re-read, but without the "dmb", which isprobably what we want in most cases. Linus has a series on emails[1] inwhich he explains why barrier() is good and essentially free.


Regards,

Juan.

[1] https://yarchive.net/comp/linux/compiler_barriers.html
_______________________________________________
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel

Re: [riot-devel] sched_active_thread is a volatile pointer, but volatile is ignored

Reply via email to