Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II

2008-01-07 Thread Andi Kleen
Andi Kleen <[EMAIL PROTECTED]> writes:

> On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote:
>> 
>> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>> 
>> > nsec_barrier() is a new barrier primitive that stops RDTSC speculation 
>> > to avoid races with timer interrupts on other CPUs.
>> > 
>> > Add it to all architectures. Except for x86 it is a nop right now. I 
>> > only tested x86, but it's a very simple change.
>> > 
>> > On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD 
>> > CPUs) which stops RDTSC on all currently known microarchitectures that 
>> > implement SSE. On CPUs without SSE there is generally no RDTSC 
>> > speculation.
>> 
>> i've picked up your rdtsc patches into x86.git but have simplified it: 
>> there's no nsec_barrier() anymore - rdtsc() is always synchronous. 
>> MFENCE/LFENCE is fast enough. Open-coding such barriers almost always 
>> leads to needless trouble. Please check the next x86.git tree.
>
> That's most likely wrong unless you added two barriers -- the barriers
> are strictly need to be before and after RDTSC.

I checked your patch now -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88
and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in gitx86 and you did it
indeed wrong.

The problem is that you inserted the barrier only after the RDTSC, but
the instruction can be speculated both forward and backward and both
can cause inconsistencies if it leaves the critical section.

The minimal fix would be to change native_read_tsc to

rdtsc_barrier();
asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
rdtsc_barrier();

(e.g. add the missing barrier) 

Better would be still to keep the explicit nsec barriers as in
the original patch for several reasons:
- There are situations where RDTSC without barrier is ok and it might
be useful there.
- It is better to give the CPU some room to run uops in parallel;
especially in very performance critical like gtod(). A wider
barriered area is faster.
- I actually expect that other architectures with aggressive OOO 
implementations (like POWER4/5) should make use of 
nsec_barrier() too. I don't think it's really an x86 specific concept.
- Explicit barriers can be useful when measuring performance, although
admittedly there a x86 specific barrier is usually ok. However forcing
the barrier inside RDTSC is not.

If you insist to keep the incorrect patch please drop my name from it.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II

2008-01-07 Thread Andi Kleen
Andi Kleen [EMAIL PROTECTED] writes:

 On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote:
 
 * Andi Kleen [EMAIL PROTECTED] wrote:
 
  nsec_barrier() is a new barrier primitive that stops RDTSC speculation 
  to avoid races with timer interrupts on other CPUs.
  
  Add it to all architectures. Except for x86 it is a nop right now. I 
  only tested x86, but it's a very simple change.
  
  On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD 
  CPUs) which stops RDTSC on all currently known microarchitectures that 
  implement SSE. On CPUs without SSE there is generally no RDTSC 
  speculation.
 
 i've picked up your rdtsc patches into x86.git but have simplified it: 
 there's no nsec_barrier() anymore - rdtsc() is always synchronous. 
 MFENCE/LFENCE is fast enough. Open-coding such barriers almost always 
 leads to needless trouble. Please check the next x86.git tree.

 That's most likely wrong unless you added two barriers -- the barriers
 are strictly need to be before and after RDTSC.

I checked your patch now -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88
and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in gitx86 and you did it
indeed wrong.

The problem is that you inserted the barrier only after the RDTSC, but
the instruction can be speculated both forward and backward and both
can cause inconsistencies if it leaves the critical section.

The minimal fix would be to change native_read_tsc to

rdtsc_barrier();
asm volatile(rdtsc : EAX_EDX_RET(val, low, high));
rdtsc_barrier();

(e.g. add the missing barrier) 

Better would be still to keep the explicit nsec barriers as in
the original patch for several reasons:
- There are situations where RDTSC without barrier is ok and it might
be useful there.
- It is better to give the CPU some room to run uops in parallel;
especially in very performance critical like gtod(). A wider
barriered area is faster.
- I actually expect that other architectures with aggressive OOO 
implementations (like POWER4/5) should make use of 
nsec_barrier() too. I don't think it's really an x86 specific concept.
- Explicit barriers can be useful when measuring performance, although
admittedly there a x86 specific barrier is usually ok. However forcing
the barrier inside RDTSC is not.

If you insist to keep the incorrect patch please drop my name from it.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/