On Tue, Nov 19, 2013 at 03:29:12PM +0000, Mathieu Desnoyers wrote:
> Hi,

Hi Mathieu,

> I got a bug report on ARM which appears to be caused by an aggressive gcc 
> optimisation starting from gcc 4.8.x due to lack of constraints on the 
> current_thread_info() inline assembly. The only logical explanation for his 
> issue I see so far is that read of the preempt_count within might_sleep() is 
> reordered with preempt_enable() or preempt_disable(). AFAIU, this kind of 
> issue also applies to other architectures.
> 
> First thing, preempt enable/disable only contains barrier() between the 
> inc/dec and the inside of the critical section, not the outside. Therefore, 
> we are relying on program order to ensure that the preempt_count() read in 
> might_sleep() is not reordered with the preempt count inc/dec.

This sounds almost identical to an issue I experienced with our optimised
per-cpu code (more below).

> However, looking at ARM arch/arm/include/asm/thread_info.h:
> 
> static inline struct thread_info *current_thread_info(void)
> {
>         register unsigned long sp asm ("sp");
>         return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
> }
> 
> The inline assembly has no clobber and is not volatile. (this is also true 
> for all other architectures I've looked at so far, which includes x86 and 
> powerpc)
> 
> As long as someone does:
> 
> struct thread_info *myinfo = current_thread_info();
> 
> load from myinfo->preempt_count;
> store to myinfo->preempt_count;
> 
> The program order should be preserved, because the read and the write are 
> done wrt same base. However, what we have here in the case of might_sleep() 
> followed by preempt_enable() is:
> 
> load from current_thread_info()->preempt_count;
> store to current_thread_info()->preempt_count;
> 
> Since each current_thread_info() is a different asm ("sp") without clobber 
> nor volatile, AFAIU, the compiler is within its right to reorder them.
> 
> One possible solution to this might be to add "memory" clobber and volatile 
> to this inline asm, but I fear it would put way too much constraints on the 
> compiler optimizations (too heavyweight).

Yup, that sucks, because you end up unable to cache the value when you know
it hasn't changed.

> Have any of you experienced this issue ? Any thoughts on the matter ?

The way I got around this for the per-cpu code is to include a dummy memory
constraint for the stack. This has a couple of advantages:

        (1) It hazards against a "memory" clobber, so doesn't require use of
            `volatile'

        (2) It doesn't require GCC to emit any address generation code,
            since dereferencing the sp is valid in ARM assembly

so adding something like:

        asm("" :: "Q" (*sp));

immediately after the declaration of sp in current_therad_info might do the
trick. Do you have a way to test that?

Cheers,

Will

_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

Reply via email to