* Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:

> + (*) The compiler is within its rights to reorder memory accesses unless
> +     you tell it not to.  For example, consider the following interaction
> +     between process-level code and an interrupt handler:
> +
> +     void process_level(void)
> +     {
> +             msg = get_message();
> +             flag = true;
> +     }
> +
> +     void interrupt_handler(void)
> +     {
> +             if (flag)
> +                     process_message(msg);
> +     }
> +
> +     There is nothing to prevent the compiler from transforming
> +     process_level() to the following.  In fact, this might well be a
> +     win for single-threaded code:
> +
> +     void process_level(void)
> +     {
> +             flag = true;
> +             msg = get_message();
> +     }
> +
> +     If the interrupt occurs between these two statements, then
> +     interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
> +     to prevent this as follows:
> +
> +     void process_level(void)
> +     {
> +             ACCESS_ONCE(msg) = get_message();
> +             ACCESS_ONCE(flag) = true;
> +     }
> +
> +     void interrupt_handler(void)
> +     {
> +             if (ACCESS_ONCE(flag))
> +                     process_message(ACCESS_ONCE(msg));
> +     }

Technically, if the interrupt handler is the innermost context, the 
ACCESS_ONCE() is not needed in the interrupt_handler() code.

Since IRQ handlers are the most atomic context for the vast majority
of Linux code (very few drivers deal with NMIs), I suspect we should
either remove the ACCESS_ONCE() calls from interrupt_handler() in the
example or add a comment explaining that in many cases they are
superfluous?
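
For illustration, the "add a comment" option might look roughly like the
sketch below. This is a user-space approximation, not proposed patch text.
The ACCESS_ONCE() definition follows the form used in
include/linux/compiler.h (it relies on the GCC/Clang typeof extension);
the int message type, the bodies of get_message() and process_message(),
the last_processed variable and demo() are hypothetical stand-ins. It
assumes that nothing can interrupt interrupt_handler() and touch flag or
msg.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the kernel macro; requires GCC/Clang __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int msg;
static bool flag;
static int last_processed = -1;	/* hypothetical, for observation only */

/* Hypothetical stand-ins for the helpers named in the example. */
static int get_message(void)
{
	return 42;
}

static void process_message(int m)
{
	last_processed = m;
}

void process_level(void)
{
	/* The compiler may not reorder these two volatile accesses. */
	ACCESS_ONCE(msg) = get_message();
	ACCESS_ONCE(flag) = true;
}

void interrupt_handler(void)
{
	/*
	 * Assumption: this is the innermost context (no NMI or nested
	 * IRQ touches flag or msg), so nothing can change them while
	 * we run and plain accesses are sufficient here.
	 */
	if (flag)
		process_message(msg);
}

/* Usage sketch: the handler only consumes msg once flag is set. */
void demo(void)
{
	interrupt_handler();
	assert(last_processed == -1);
	process_level();
	interrupt_handler();
	assert(last_processed == 42);
}
```

If an NMI handler or a nested interrupt could also write flag or msg, the
assumption in the comment no longer holds and the ACCESS_ONCE() calls in
the handler would be needed again.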

> + (*) For aligned memory locations whose size allows them to be accessed
> +     with a single memory-reference instruction, ACCESS_ONCE() prevents
> +     "load tearing" and "store tearing," in which a single large access
> +     is replaced by multiple smaller accesses.  For example, given an
> +     architecture having 16-bit store instructions with 7-bit immediate
> +     fields, the compiler might be tempted to use two 16-bit
> +     store-immediate instructions to implement the following 32-bit store:
> +
> +     p = 0x00010002;
> +
> +     Please note that GCC really does use this sort of optimization,
> +     which is not surprising given that it would likely take more
> +     than two instructions to build the constant and then store it.
> +     This optimization can therefore be a win in single-threaded code.
> +     In fact, a recent bug (since fixed) caused GCC to incorrectly use
> +     this optimization in a volatile store.  In the absence of such bugs,
> +     use of ACCESS_ONCE() prevents store tearing:
> +
> +     ACCESS_ONCE(p) = 0x00010002;

I suspect the last sentence should read:

> +                                             In the absence of such bugs,
> +     use of ACCESS_ONCE() prevents store tearing in this example:
> +
> +     ACCESS_ONCE(p) = 0x00010002;

Otherwise it could be read as a more generic statement (leaving out 
'load tearing')?
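
To make the store-tearing case concrete, here is a rough user-space model
of what an interrupting reader could observe if the 32-bit store were
split into two 16-bit stores. It only models the effect and does not
reproduce any real compiler output. The struct split32 type, its helpers,
the initial value 0xdeadbeef and the order in which the halves are
written are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit variable as two 16-bit halves. */
struct split32 {
	uint16_t hi;
	uint16_t lo;
};

/* What a reader sees when it loads the whole 32-bit value. */
static uint32_t split32_read(const struct split32 *s)
{
	return ((uint32_t)s->hi << 16) | s->lo;
}

/* First 16-bit store of the torn store (upper half, by assumption). */
static void torn_store_hi(struct split32 *s, uint32_t val)
{
	s->hi = (uint16_t)(val >> 16);
}

/* Second 16-bit store of the torn store (lower half). */
static void torn_store_lo(struct split32 *s, uint32_t val)
{
	s->lo = (uint16_t)(val & 0xffff);
}

/* Returns the value visible between the two halves of the torn store. */
uint32_t torn_store_demo(void)
{
	struct split32 p = { .hi = 0xdead, .lo = 0xbeef };
	uint32_t seen;

	torn_store_hi(&p, 0x00010002);
	seen = split32_read(&p);	/* an interrupt here sees a mix */
	torn_store_lo(&p, 0x00010002);

	/* Once both halves are written, the full value is in place. */
	assert(split32_read(&p) == 0x00010002);
	return seen;
}
```

In this model the reader in the middle sees 0x0001beef, which is neither
the old nor the new value. Load tearing is the mirror image: the reader's
32-bit load is split in two and a complete store lands between the halves.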

Thanks,

        Ingo