Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II
Andi Kleen <[EMAIL PROTECTED]> writes: > On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote: >> >> * Andi Kleen <[EMAIL PROTECTED]> wrote: >> >> > nsec_barrier() is a new barrier primitive that stops RDTSC speculation >> > to avoid races with timer interrupts on other CPUs. >> > >> > Add it to all architectures. Except for x86 it is a nop right now. I >> > only tested x86, but it's a very simple change. >> > >> > On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD >> > CPUs) which stops RDTSC on all currently known microarchitectures that >> > implement SSE. On CPUs without SSE there is generally no RDTSC >> > speculation. >> >> i've picked up your rdtsc patches into x86.git but have simplified it: >> there's no nsec_barrier() anymore - rdtsc() is always synchronous. >> MFENCE/LFENCE is fast enough. Open-coding such barriers almost always >> leads to needless trouble. Please check the next x86.git tree. > > That's most likely wrong unless you added two barriers -- the barriers > are strictly need to be before and after RDTSC. I checked your patch now -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88 and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in gitx86 and you did it indeed wrong. The problem is that you inserted the barrier only after the RDTSC, but the instruction can be speculated both forward and backward and both can cause inconsistencies if it leaves the critical section. The minimal fix would be to change native_read_tsc to rdtsc_barrier(); asm volatile("rdtsc" : EAX_EDX_RET(val, low, high)); rdtsc_barrier(); (e.g. add the missing barrier) Better would be still to keep the explicit nsec barriers as in the original patch for several reasons: - There are situations where RDTSC without barrier is ok and it might be useful there. - It is better to give the CPU some room to run uops in parallel; especially in very performance critical like gtod(). A wider barriered area is faster. - I actually expect that other architectures with aggressive OOO implementations (like POWER4/5) should make use of nsec_barrier() too. I don't think it's really an x86 specific concept. - Explicit barriers can be useful when measuring performance, although admittedly there a x86 specific barrier is usually ok. However forcing the barrier inside RDTSC is not. If you insist to keep the incorrect patch please drop my name from it. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II
Andi Kleen [EMAIL PROTECTED] writes: On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote: * Andi Kleen [EMAIL PROTECTED] wrote: nsec_barrier() is a new barrier primitive that stops RDTSC speculation to avoid races with timer interrupts on other CPUs. Add it to all architectures. Except for x86 it is a nop right now. I only tested x86, but it's a very simple change. On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD CPUs) which stops RDTSC on all currently known microarchitectures that implement SSE. On CPUs without SSE there is generally no RDTSC speculation. i've picked up your rdtsc patches into x86.git but have simplified it: there's no nsec_barrier() anymore - rdtsc() is always synchronous. MFENCE/LFENCE is fast enough. Open-coding such barriers almost always leads to needless trouble. Please check the next x86.git tree. That's most likely wrong unless you added two barriers -- the barriers are strictly need to be before and after RDTSC. I checked your patch now -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88 and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in gitx86 and you did it indeed wrong. The problem is that you inserted the barrier only after the RDTSC, but the instruction can be speculated both forward and backward and both can cause inconsistencies if it leaves the critical section. The minimal fix would be to change native_read_tsc to rdtsc_barrier(); asm volatile(rdtsc : EAX_EDX_RET(val, low, high)); rdtsc_barrier(); (e.g. add the missing barrier) Better would be still to keep the explicit nsec barriers as in the original patch for several reasons: - There are situations where RDTSC without barrier is ok and it might be useful there. - It is better to give the CPU some room to run uops in parallel; especially in very performance critical like gtod(). A wider barriered area is faster. - I actually expect that other architectures with aggressive OOO implementations (like POWER4/5) should make use of nsec_barrier() too. I don't think it's really an x86 specific concept. - Explicit barriers can be useful when measuring performance, although admittedly there a x86 specific barrier is usually ok. However forcing the barrier inside RDTSC is not. If you insist to keep the incorrect patch please drop my name from it. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/