On 8/1/19 21:47, Kees Bakker wrote:

E.g.:

void msg_recv(...)
{
    if (!sched_active_thread->waiters) {
     // platform specific macro to suspend thread
    }

    thread_t *waiter = sched_active_thread->waiters;

    // ...
}

(or similar)

My understanding of volatile back then was that the compiler could,
without volatile, assume that sched_active_thread->waiters equals NULL.
Well, in the example code, the compiler would just read sched_active_thread->waiters
once and keep it in a register. If either sched_active_thread or waiters
can change in between the uses in that function you must declare some
things volatile.


Say you do something like this:

extern void f(void);

int x;

int test(void)
{
    int y = x;
    f();
    return y+x;
}

Then the compiler, knowing nothing about "f()", _has_ to re-read x in the return statement. This is because f() could possibly modify x. This is a behavior in which we can rely, because it is needed even in single threaded programs. This is what happens when the above snippet is compiled with anything above -O0:

int test(void)
{
   0:   b538            push    {r3, r4, r5, lr}
    int y = x;
   2:   4c03            ldr     r4, [pc, #12]   ; (10 <test+0x10>)
   4:   6825            ldr     r5, [r4, #0]
    f();
   6:   f7ff fffe       bl      0 <f>
    return y+x;
   a:   6820            ldr     r0, [r4, #0]
}
   c:   4428            add     r0, r5
   e:   bd38            pop     {r3, r4, r5, pc}
  10:   00000000        .word   0x00000000

See the two lines with "ldr        rN, [r4, #0]".

In the provided example of "msg_recv", if the "platform specific macro to suspend thread" includes a call to a function not defined in the current translation unit, then the compiler cannot assume that this function does not access or change globals. The only problem would be if the compiler thinks he knows what the function does (via inline or LTO). Looking at the code for _msg_receive() it does not seem to be the case, to I'm puzzled as to why it would break. I tried removing it, and there does not seem to be a problem. I'll PR it can be tried in the CI.

Marian,
> I think getting rid of `volatile` and adding "proper" memory barriers to places > where context switches are supposed to happen will make the code more robust and
> might allow the compiler to optimize the code better

Yes, that should work in those cases where the compiler may think he is sure a function call cannot modify certain variables. For example, the code:

int x;

int test2(void)
{
    int y = x;
    int k = z(x);
    return y+x+k;
}

Gets turned into:

int z(int x)
{
    return x*x;
}
  14:   fb00 f000       mul.w   r0, r0, r0
  18:   4770            bx      lr

0000001a <test2>:

int test2(void)
{
  1a:   b510            push    {r4, lr}
    int y = x;
  1c:   4b03            ldr     r3, [pc, #12]   ; (2c <test2+0x12>)
  1e:   681c            ldr     r4, [r3, #0]
    int k = z(x);
  20:   4620            mov     r0, r4
  22:   f7ff fffe       bl      14 <z>
    return y+x+k;
}
  26:   eb00 0044       add.w   r0, r0, r4, lsl #1
  2a:   bd10            pop     {r4, pc}
  2c:   00000000        .word   0x00000000


I compile with -Og to prevent inlining. Observer that the global "x" is only loaded once.


> `void __sync_synchronize(void)` placed just before and just after a context...

Yes, but most likely overkill. __sync_synchronize() is meant to issue a hardware barrier (that is, force the CPU to comlete all pending memory operations.) This is needed in multi-core, and maybe some single-core scenarios, but it is not what we are looking for. We need a _compiler barrier_, like linux's barrier macro(). See this code:

int x;

int test3(void)
{
    int y = x;
    __sync_synchronize();
    //__asm__ __volatile__("": : :"memory");  // linux's barrier()
    return y + x;
}


With no barriers, the ASM shows that x is read once. __sync_synchronize() inserts a "dmb" (data memory barrier) instruction in addition to forcing an additional read:

int test3(void)
{
    int y = x;
  30:   4b03            ldr     r3, [pc, #12]   ; (40 <test3+0x10>)
  32:   6818            ldr     r0, [r3, #0]
    __sync_synchronize();
  34:   f3bf 8f5b       dmb     ish
    return y + x;
  38:   681b            ldr     r3, [r3, #0]
}
  3a:   4418            add     r0, r3
  3c:   4770            bx      lr

The other barrier causes a re-read, but without the "dmb", which is probably what we want in most cases. Linus has a series on emails[1] in which he explains why barrier() is good and essentially free.

Regards,

Juan.

[1] https://yarchive.net/comp/linux/compiler_barriers.html
_______________________________________________
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel

Reply via email to