Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
"volatile" has nothing to do with reordering. atomic_dec() writes to memory, so it _does_ have "volatile semantics", implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. Stores can be reordered. Only x86 has (mostly) implicit write ordering. So no atomic_dec has no volatile semantics Read again: I said the C "volatile" construct has nothing to do with CPU memory access reordering. and may be reordered on a variety of processors. Writes to memory may not follow code order on several processors. The _compiler_ isn't allowed to reorder things here. Yes, of course you do need stronger barriers for many purposes, volatile isn't all that useful you know. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 02:36:26PM -0700, Christoph Lameter wrote:
> On Mon, 10 Sep 2007, Paul E. McKenney wrote:
>
> > The one exception to this being the case where process-level code is
> > communicating to an interrupt handler running on that same CPU -- on
> > all CPUs that I am aware of, a given CPU always sees its own writes
> > in order.
>
> Yes but that is due to the code path effectively continuing in the
> interrupt handler. The cpu makes sure that op codes being executed always
> see memory in a consistent way. The basic ordering problem with out of
> order writes is therefore coming from other processors concurrently
> executing code and holding variables in registers that are modified
> elsewhere. The only solution that I know of are one or the other form of
> barrier.

So we are agreed then -- volatile accesses may be of some assistance when
interacting with interrupt handlers running on the same CPU (presumably
when using per-CPU variables), but are generally useless when sharing
variables among CPUs. Correct?

Thanx, Paul
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Paul E. McKenney wrote:

> The one exception to this being the case where process-level code is
> communicating to an interrupt handler running on that same CPU -- on
> all CPUs that I am aware of, a given CPU always sees its own writes
> in order.

Yes but that is due to the code path effectively continuing in the
interrupt handler. The cpu makes sure that op codes being executed always
see memory in a consistent way. The basic ordering problem with out of
order writes is therefore coming from other processors concurrently
executing code and holding variables in registers that are modified
elsewhere. The only solution that I know of are one or the other form of
barrier.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 11:59:29AM -0700, Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Segher Boessenkool wrote:
>
> > "volatile" has nothing to do with reordering. atomic_dec() writes
> > to memory, so it _does_ have "volatile semantics", implicitly, as
> > long as the compiler cannot optimise the atomic variable away
> > completely -- any store counts as a side effect.
>
> Stores can be reordered. Only x86 has (mostly) implicit write ordering.
> So no, atomic_dec has no volatile semantics and may be reordered on a
> variety of processors. Writes to memory may not follow code order on
> several processors.

The one exception to this being the case where process-level code is
communicating to an interrupt handler running on that same CPU -- on
all CPUs that I am aware of, a given CPU always sees its own writes
in order.

Thanx, Paul
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sep 10, 2007, at 12:46:33, Denys Vlasenko wrote:
> My point is that people are confused as to what atomic_read() exactly
> means, and this is bad. Same for cpu_relax(). First one says "read",
> and second one doesn't say "barrier".

Q&A:

Q: When is it OK to use atomic_read()?
A: You are asking the question, so never.

Q: But I need to check the value of the atomic at this point in time...
A: Your code is buggy if it needs to do that on an atomic_t for anything
other than debugging or optimization. Use either atomic_*_return() or a
lock and some normal integers.

Q: "So why can't the atomic_read DTRT magically?"
A: Because "the right thing" depends on the situation and is usually best
done with something other than atomic_t.

If somebody can post some non-buggy code which is correctly using
atomic_read() *and* depends on the compiler generating extra nonsensical
loads due to "volatile" then the issue *might* be reconsidered. This also
includes samples of code which uses atomic_read() and needs memory
barriers (so that we can fix the buggy code, not so we can change
atomic_read()).

So far the only code samples anybody has posted are buggy regardless of
whether or not the value and/or accessors are flagged "volatile" or not.
And hey, maybe the volatile ops *should* be implemented in inline ASM for
future-proof-ness, but that's a separate issue.

Cheers,
Kyle Moffett
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Linus Torvalds wrote:

> The fact is, "volatile" *only* makes things worse. It generates worse
> code, and never fixes any real bugs. This is a *fact*.

Yes, let's just drop the volatiles now! We need a patch that gets rid of
them. Volunteers?
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Segher Boessenkool wrote:

> "volatile" has nothing to do with reordering. atomic_dec() writes
> to memory, so it _does_ have "volatile semantics", implicitly, as
> long as the compiler cannot optimise the atomic variable away
> completely -- any store counts as a side effect.

Stores can be reordered. Only x86 has (mostly) implicit write ordering.
So no, atomic_dec has no volatile semantics and may be reordered on a
variety of processors. Writes to memory may not follow code order on
several processors.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 16:09, Linus Torvalds wrote:
> On Mon, 10 Sep 2007, Denys Vlasenko wrote:
> > static inline int
> > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> > {
> > 	int	 return_status = QLA_SUCCESS;
> > 	unsigned long loop_timeout ;
> > 	scsi_qla_host_t *pha = to_qla_parent(ha);
> >
> > 	/* wait for 5 min at the max for loop to be ready */
> > 	loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
> >
> > 	while ((!atomic_read(&pha->loop_down_timer) &&
> > 	    atomic_read(&pha->loop_state) == LOOP_DOWN) ||
> > 	    atomic_read(&pha->loop_state) != LOOP_READY) {
> > 		if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
> ...
> > Is above correct or buggy? Correct, because msleep is a barrier.
> > Is it obvious? No.
>
> It's *buggy*. But it has nothing to do with any msleep() in the loop, or
> anything else.
>
> And more importantly, it would be equally buggy even *with* a "volatile"
> atomic_read().

I am not saying that this code is okay, this isn't the point.
(The code is in fact awful for several more reasons.)

My point is that people are confused as to what atomic_read() exactly
means, and this is bad. Same for cpu_relax(). First one says "read",
and second one doesn't say "barrier".

This is real code from the current kernel which demonstrates this:

"I don't know that cpu_relax() is a barrier already":

drivers/kvm/kvm_main.c
	while (atomic_read(&completed) != needed) {
		cpu_relax();
		barrier();
	}

"I think that atomic_read() is a read from memory and therefore
I don't need a barrier":

arch/x86_64/kernel/crash.c
	msecs = 1000; /* Wait at most a second for the other cpus to stop */
	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
		mdelay(1);
		msecs--;
	}

Since neither camp seems to give up, I am proposing renaming them to
something less confusing, and make everybody happy.

cpu_relax_barrier()
atomic_value(&x)
atomic_fetch(&x)

I'm not a native English speaker, do these sound better?
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007 15:38:23 +0100
Denys Vlasenko <[EMAIL PROTECTED]> wrote:

> On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> > On Mon, 10 Sep 2007 11:56:29 +0100
> > Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> > >
> > > Well, if you insist on having it again:
> > >
> > > Waiting for atomic value to be zero:
> > >
> > > 	while (atomic_read(&x))
> > > 		continue;
> >
> > and this I would say is buggy code all the way.
> >
> > Not from a pure C level semantics, but from a "busy waiting is
> > buggy" semantics level and a "I'm inventing my own locking"
> > semantics level.
>
> After inspecting arch/*, I cannot agree with you.

The arch/ people obviously are allowed to do their own locking stuff...
BECAUSE THEY HAVE TO IMPLEMENT THAT!

The arch maintainers know EXACTLY how their hw behaves (well, we hope),
so they tend to be the exception to many rules in the kernel.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007, Denys Vlasenko wrote:
>
> static inline int
> qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> {
> 	int	 return_status = QLA_SUCCESS;
> 	unsigned long loop_timeout ;
> 	scsi_qla_host_t *pha = to_qla_parent(ha);
>
> 	/* wait for 5 min at the max for loop to be ready */
> 	loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
>
> 	while ((!atomic_read(&pha->loop_down_timer) &&
> 	    atomic_read(&pha->loop_state) == LOOP_DOWN) ||
> 	    atomic_read(&pha->loop_state) != LOOP_READY) {
> 		if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
...
> Is above correct or buggy? Correct, because msleep is a barrier.
> Is it obvious? No.

It's *buggy*. But it has nothing to do with any msleep() in the loop, or
anything else.

And more importantly, it would be equally buggy even *with* a "volatile"
atomic_read().

Why is this so hard for people to understand? You're all acting like
morons.

The reason it is buggy has absolutely nothing to do with whether the read
is done or not, it has to do with the fact that the CPU may re-order the
reads *regardless* of whether the read is done in some specific order by
the compiler or not! In effect, there is zero ordering between all those
three reads, and if you don't have memory barriers (or a lock or other
serialization), that code is buggy.

So stop this idiotic discussion thread already. The above kind of code
needs memory barriers to be non-buggy. The whole "volatile or not"
discussion is totally idiotic, and pointless, and anybody who doesn't
understand that by now needs to just shut up and think about it more,
rather than make this discussion drag out even further.

The fact is, "volatile" *only* makes things worse. It generates worse
code, and never fixes any real bugs. This is a *fact*.

Linus
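(For illustration only -- the following sketch is not from the thread and
not the actual qla2xxx fix; the names "state", "READY" and "DEAD" are
placeholders. It merely shows the kind of explicit ordering being asked
for above: a CPU-level read barrier in the polling loop, so each
iteration's reads are ordered against the other CPU's stores.)

	while (atomic_read(&state) != READY) {
		if (atomic_read(&state) == DEAD)
			break;
		msleep(1000);	/* a real function call, so also a compiler barrier */
		smp_rmb();	/* order this CPU's reads against other CPUs' stores */
	}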
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> On Mon, 10 Sep 2007 11:56:29 +0100
> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> >
> > Well, if you insist on having it again:
> >
> > Waiting for atomic value to be zero:
> >
> > 	while (atomic_read(&x))
> > 		continue;
>
> and this I would say is buggy code all the way.
>
> Not from a pure C level semantics, but from a "busy waiting is buggy"
> semantics level and a "I'm inventing my own locking" semantics level.

After inspecting arch/*, I cannot agree with you. Otherwise almost all
major architectures use "conceptually buggy busy-waiting":

arch/alpha
arch/i386
arch/ia64
arch/m32r
arch/mips
arch/parisc
arch/powerpc
arch/sh
arch/sparc64
arch/um
arch/x86_64

All of the above contain busy-waiting on atomic_read.

Including these loops without barriers:

arch/mips/kernel/smtc.c
	while (atomic_read(&idle_hook_initialized) < 1000)
		;
arch/mips/sgi-ip27/ip27-nmi.c
	while (atomic_read(&nmied_cpus) != num_online_cpus());
[Well maybe num_online_cpus() is a barrier, I didn't check]
arch/sh/kernel/smp.c
	if (wait)
		while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1));

Bugs?
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 14:38, Denys Vlasenko wrote:
> You are basically trying to educate me how to use atomic properly.
> You don't need to do it, as I am (currently) not a driver author.
>
> I am saying that people who are already using atomic_read()
> (and who unfortunately did not read your explanation above)
> will still sometimes use atomic_read() as a way to read atomic value
> *from memory*, and will create nasty heisenbugs for you to debug.

static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
	int	 return_status = QLA_SUCCESS;
	unsigned long loop_timeout ;
	scsi_qla_host_t *pha = to_qla_parent(ha);

	/* wait for 5 min at the max for loop to be ready */
	loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);

	while ((!atomic_read(&pha->loop_down_timer) &&
	    atomic_read(&pha->loop_state) == LOOP_DOWN) ||
	    atomic_read(&pha->loop_state) != LOOP_READY) {
		if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
			return_status = QLA_FUNCTION_FAILED;
			break;
		}
		msleep(1000);
		if (time_after_eq(jiffies, loop_timeout)) {
			return_status = QLA_FUNCTION_FAILED;
			break;
		}
	}
	return (return_status);
}

Is above correct or buggy? Correct, because msleep is a barrier.
Is it obvious? No.

static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
	if (ha->flags.online && !ha->flags.reset_active &&
	    !atomic_read(&ha->loop_down_timer) &&
	    !(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) {
		do {
			clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags);

			/*
			 * Issue marker command only when we are going to start
			 * the I/O.
			 */
			ha->marker_needed = 1;
		} while (!atomic_read(&ha->loop_down_timer) &&
		    (test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags)));
	}
}

Is above correct? I honestly don't know. Correct, because set_bit is a
barrier on _all_ _memory_? Will it break if set_bit will be changed to be
a barrier only on its operand? Probably yes.

drivers/kvm/kvm_main.c
	while (atomic_read(&completed) != needed) {
		cpu_relax();
		barrier();
	}

Obviously the author did not know that cpu_relax is already a barrier.
See why I think driver authors will be confused?

arch/x86_64/kernel/crash.c
static void nmi_shootdown_cpus(void)
{
	...
	msecs = 1000; /* Wait at most a second for the other cpus to stop */
	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
		mdelay(1);
		msecs--;
	}
	...
}

Is mdelay(1) a barrier? Yes, because it is a function on x86_64.
Absolutely the same code will be buggy on an arch where
mdelay(1) == udelay(1000), and udelay is implemented as an inline
busy-wait.

arch/sparc64/kernel/smp.c
	/* Wait for response */
	while (atomic_read(&data.finished) != cpus)
		cpu_relax();
...later in the same file...
	while (atomic_read(&smp_capture_registry) != ncpus)
		rmb();

I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply
rmb()? (No, it doesn't.) Which of those two while loops needs correcting?
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 10 Sep 2007 11:56:29 +0100
Denys Vlasenko <[EMAIL PROTECTED]> wrote:

> Well, if you insist on having it again:
>
> Waiting for atomic value to be zero:
>
> 	while (atomic_read(&x))
> 		continue;

and this I would say is buggy code all the way.

Not from a pure C level semantics, but from a "busy waiting is buggy"
semantics level and a "I'm inventing my own locking" semantics level.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 13:22, Kyle Moffett wrote:
> On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
> > On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> >> On Sun, 9 Sep 2007 19:02:54 +0100
> >> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> >>
> >>> Why is all this fixation on "volatile"? I don't think people want
> >>> "volatile" keyword per se, they want atomic_read(&x) to _always_
> >>> compile into a memory-accessing instruction, not register access.
> >>
> >> and ... why is that? is there any valid, non-buggy code sequence
> >> that makes that a reasonable requirement?
> >
> > Well, if you insist on having it again:
> >
> > Waiting for atomic value to be zero:
> >
> > 	while (atomic_read(&x))
> > 		continue;
> >
> > gcc may happily convert it into:
> >
> > 	reg = atomic_read(&x);
> > 	while (reg)
> > 		continue;
>
> Bzzt. Even if you fixed gcc to actually convert it to a busy loop on
> a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that
> does the conversion, it may be that the CPU does the caching of the
> memory value. GCC has no mechanism to do cache-flushes or memory-
> barriers except through our custom inline assembly.

The CPU can cache the value all right, but it cannot use that cached
value *forever*; it has to react to invalidate cycles on the shared bus
and re-fetch new data.

IOW: atomic_read(&x) which compiles down to a memory accessor will work
properly.

> Thirdly, on a large system it may take some arbitrarily
> large amount of time for cache-propagation to update the value of the
> variable in your local CPU cache.

Yes, but "arbitrarily large amount of time" is actually measured in
nanoseconds here. Let's say 1000ns max for hundreds of CPUs?

> Also, you
> probably want a cpu_relax() in there somewhere to avoid overheating
> the CPU.

Yes, but:
1. The CPU shouldn't overheat (in a sense that it gets damaged); it will
   only use more power than needed.
2. cpu_relax() just throttles down my CPU, so it's a performance
   optimization only. Wait, it isn't, it's a barrier too. Wow,
   "cpu_relax" is a barrier? How am I supposed to know that without
   reading lkml flamewars and/or header files?

Let's try reading headers. asm-x86_64/processor.h:

#define cpu_relax()	rep_nop()

So, is it a barrier? No clue yet.

/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

The comment explicitly says that it is "a good thing" (it doesn't say
that it is mandatory) and says NOTHING about barriers! Barrier-ness is
not mentioned and is hidden in the "memory" clobber. Do you think it's
obvious enough for the average driver writer? I think not, especially
since it's unlikely for him to even start suspecting that it is a memory
barrier based on the "cpu_relax" name.

> You simply CANNOT use an atomic_t as your sole synchronizing
> primitive, it doesn't work! You virtually ALWAYS want to use an
> atomic_t in the following types of situations:
>
> (A) As an object refcount. The value is never read except as part of
> an atomic_dec_return(). Why aren't you using "struct kref"?
>
> (B) As an atomic value counter (number of processes, for example).
> Just "reading" the value is racy anyways, if you want to enforce a
> limit or something then use atomic_inc_return(), check the result,
> and use atomic_dec() if it's too big. If you just want to return the
> statistics then you are going to be instantaneous-point-in-time
> anyways.
>
> (C) As an optimization value (statistics-like, but exact accuracy
> isn't important).
>
> Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like
> completions, mutexes, semaphores, spinlocks, krefs, etc. It's not
> useful for synchronization, only for keeping track of simple integer
> RMW values. Note that atomic_read() and atomic_set() aren't very
> useful RMW primitives (read-nomodify-nowrite and read-set-zero-
> write). Code which assumes anything else is probably buggy in other
> ways too.

You are basically trying to educate me how to use atomic properly.
You don't need to do it, as I am (currently) not a driver author.

I am saying that people who are already using atomic_read()
(and who unfortunately did not read your explanation above)
will still sometimes use atomic_read() as a way to read an atomic value
*from memory*, and will create nasty heisenbugs for you to debug.
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
> On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
>> On Sun, 9 Sep 2007 19:02:54 +0100
>> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
>>> Why is all this fixation on "volatile"? I don't think people want
>>> "volatile" keyword per se, they want atomic_read(&x) to _always_
>>> compile into a memory-accessing instruction, not register access.
>>
>> and ... why is that? is there any valid, non-buggy code sequence
>> that makes that a reasonable requirement?
>
> Well, if you insist on having it again:
>
> Waiting for atomic value to be zero:
>
> 	while (atomic_read(&x))
> 		continue;
>
> gcc may happily convert it into:
>
> 	reg = atomic_read(&x);
> 	while (reg)
> 		continue;

Bzzt. Even if you fixed gcc to actually convert it to a busy loop on a
memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that does
the conversion, it may be that the CPU does the caching of the memory
value. GCC has no mechanism to do cache-flushes or memory-barriers
except through our custom inline assembly. Also, you probably want a
cpu_relax() in there somewhere to avoid overheating the CPU. Thirdly,
on a large system it may take some arbitrarily large amount of time for
cache-propagation to update the value of the variable in your local CPU
cache. Finally, if atomics are based on spinlock+interrupt-disable then
you will sit in a tight busy-loop of
spin_lock_irqsave()->spin_unlock_irqrestore(). Depending on your
system's internal model this may practically lock up your core because
the spin_lock() will take the cacheline for exclusive access and doing
that in a loop can prevent any other CPU from doing any operation on
it! Since your IRQs are disabled you even have a very small window that
an IRQ will come along and free it up long enough for the update to
take place.

The earlier code segment of:

	while(atomic_read(&x) > 0)
		atomic_dec(&x);

is *completely* buggy because you could very easily have 4 CPUs doing
this on an atomic variable with a value of 1 and end up with it at
negative 3 by the time you are done. Moreover all the alternatives are
also buggy, with the sole exception of this rather obvious-seeming one:

	atomic_set(&x, 0);

You simply CANNOT use an atomic_t as your sole synchronizing primitive,
it doesn't work! You virtually ALWAYS want to use an atomic_t in the
following types of situations:

(A) As an object refcount. The value is never read except as part of an
atomic_dec_return(). Why aren't you using "struct kref"?

(B) As an atomic value counter (number of processes, for example). Just
"reading" the value is racy anyways, if you want to enforce a limit or
something then use atomic_inc_return(), check the result, and use
atomic_dec() if it's too big. If you just want to return the statistics
then you are going to be instantaneous-point-in-time anyways.

(C) As an optimization value (statistics-like, but exact accuracy isn't
important).

Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like
completions, mutexes, semaphores, spinlocks, krefs, etc. It's not useful
for synchronization, only for keeping track of simple integer RMW
values. Note that atomic_read() and atomic_set() aren't very useful RMW
primitives (read-nomodify-nowrite and read-set-zero-write). Code which
assumes anything else is probably buggy in other ways too.

So while I see no real reason for the "volatile" on the atomics, I also
see no real reason why it's terribly harmful. Regardless of the
"volatile" on the operation the CPU is perfectly happy to cache it
anyways so it doesn't buy you any actual "always-access-memory"
guarantees. If you are just interested in it as an optimization you
could probably just read the properly-aligned integer counter directly,
an atomic read on most CPUs. If you really need it to hit main memory
*every* *single* *time* (Why? Are you using it instead of the proper
kernel subsystem?) then you probably need a custom inline assembly
helper anyways.

Cheers,
Kyle Moffett
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Sep 10, 2007 at 11:56:29AM +0100, Denys Vlasenko wrote:
>
> Expecting every driver writer to remember that atomic_read is not in fact
> a "read from memory" is naive. That won't happen. Face it, majority of
> driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;)
> The name of the macro is saying that it's a read.
> We are confusing users here.

For driver authors who're too busy to learn the intricacies of atomic
operations, we have the plain old spin lock which then lets you use
normal data structures such as u32 safely.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
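(A minimal sketch of that alternative; foo_lock, foo_count, foo_inc() and
foo_read() are made-up names, shown only to illustrate the pattern of a
spinlock protecting a plain integer.)

	static DEFINE_SPINLOCK(foo_lock);
	static u32 foo_count;			/* protected by foo_lock */

	void foo_inc(void)
	{
		spin_lock(&foo_lock);
		foo_count++;
		spin_unlock(&foo_lock);
	}

	u32 foo_read(void)
	{
		u32 val;

		spin_lock(&foo_lock);		/* the lock, not "volatile", provides ordering */
		val = foo_count;
		spin_unlock(&foo_lock);
		return val;
	}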
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> On Sun, 9 Sep 2007 19:02:54 +0100
> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
>
> > Why is all this fixation on "volatile"? I don't think
> > people want "volatile" keyword per se, they want atomic_read(&x) to
> > _always_ compile into a memory-accessing instruction, not register
> > access.
>
> and ... why is that?
> is there any valid, non-buggy code sequence that makes that a
> reasonable requirement?

Well, if you insist on having it again:

Waiting for atomic value to be zero:

	while (atomic_read(&x))
		continue;

gcc may happily convert it into:

	reg = atomic_read(&x);
	while (reg)
		continue;

Expecting every driver writer to remember that atomic_read is not in fact
a "read from memory" is naive. That won't happen. Face it, the majority
of driver authors are a bit less talented than Ingo Molnar or Arjan van
de Ven ;) The name of the macro is saying that it's a read. We are
confusing users here.

It's doubly confusing that cpu_relax(), which says _nothing_ about
barriers in its name, is actually a barrier you need to insert here.
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sun, 9 Sep 2007 19:02:54 +0100
Denys Vlasenko <[EMAIL PROTECTED]> wrote:

> Why is all this fixation on "volatile"? I don't think
> people want "volatile" keyword per se, they want atomic_read(&x) to
> _always_ compile into a memory-accessing instruction, not register
> access.

and ... why is that?
is there any valid, non-buggy code sequence that makes that a
reasonable requirement?
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Friday 17 August 2007 17:48, Linus Torvalds wrote:
>
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> >
> > That's not obviously just taste to me. Not when the primitive has many
> > (perhaps, the majority) of uses that do not require said barriers. And
> > this is not solely about the code generation (which, as Paul says, is
> > relatively minor even on x86). I prefer people to think explicitly
> > about barriers in their lockless code.
>
> Indeed.
>
> I think the important issues are:
>
>  - "volatile" itself is simply a badly/weakly defined issue. The semantics
>    of it as far as the compiler is concerned are really not very good, and
>    in practice tends to boil down to "I will generate so bad code that
>    nobody can accuse me of optimizing anything away".
>
>  - "volatile" - regardless of how well or badly defined it is - is purely
>    a compiler thing. It has absolutely no meaning for the CPU itself, so
>    it at no point implies any CPU barriers. As a result, even if the
>    compiler generates crap code and doesn't re-order anything, there's
>    nothing that says what the CPU will do.
>
>  - in other words, the *only* possible meaning for "volatile" is a purely
>    single-CPU meaning. And if you only have a single CPU involved in the
>    process, the "volatile" is by definition pointless (because even
>    without a volatile, the compiler is required to make the C code appear
>    consistent as far as a single CPU is concerned).
>
> So, let's take the example *buggy* code where we use "volatile" to wait
> for other CPU's:
>
> 	atomic_set(&var, 0);
> 	while (!atomic_read(&var))
> 		/* nothing */;
>
> which generates an endless loop if we don't have atomic_read() imply
> volatile.
>
> The point here is that it's buggy whether the volatile is there or not!
> Exactly because the user expects multi-processing behaviour, but
> "volatile" doesn't actually give any real guarantees about it. Another CPU
> may have done:
>
> 	external_ptr = kmalloc(..);
> 	/* Setup is now complete, inform the waiter */
> 	atomic_inc(&var);
>
> but the fact is, since the other CPU isn't serialized in any way, the
> "while-loop" (even in the presence of "volatile") doesn't actually work
> right! Whatever the "atomic_read()" was waiting for may not have
> completed, because we have no barriers!

Why is all this fixation on "volatile"? I don't think people want
"volatile" keyword per se, they want atomic_read(&x) to _always_ compile
into a memory-accessing instruction, not register access.
--
vda
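(To make the quoted point concrete, here is a sketch -- not from the
thread -- of the same hand-off with the missing barriers added; setup()
and use() are placeholders and error handling is omitted.)

	/* Producer: publish the data, then raise the flag. */
	external_ptr = kmalloc(size, GFP_KERNEL);
	setup(external_ptr);
	smp_wmb();			/* order the setup stores before the flag store */
	atomic_inc(&var);

	/* Consumer: wait for the flag, then read the data. */
	while (!atomic_read(&var))
		cpu_relax();
	smp_rmb();			/* order the flag read before the data reads */
	use(external_ptr);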
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Friday 24 August 2007 18:15, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> > On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > > Satyam Sharma writes:
> > > In the kernel we use atomic variables in precisely those situations
> > > where a variable is potentially accessed concurrently by multiple
> > > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > > timely fashion. That is what they are for. Therefore the compiler
> > > must not cache values of atomic variables in registers; each
> > > atomic_read must result in a load and each atomic_set must result in a
> > > store. Anything else will just lead to subtle bugs.
> >
> > Amen.
>
> A "timely" fashion? One cannot rely on something like that when coding.
> The visibility of updates is insured by barriers and not by some fuzzy
> notion of "timeliness".

But here you do have some notion of time:

	while (atomic_read(&x))
		continue;

"continue when other CPU(s) decrement it down to zero".

If "read" includes an insn which accesses RAM, you will see the "new"
value sometime after the other CPU decrements it. "Sometime after" is on
the order of nanoseconds here. It is a valid concept of time, right?

The whole confusion is about whether atomic_read implies "read from RAM"
or not. I am in a camp which thinks it does. You are in the opposite one.

We just need a less ambiguous name.

What about this:

/**
 * atomic_read - read atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 * No compiler barrier implied.
 */
#define atomic_read(v)		((v)->counter)

+/**
+ * atomic_read_uncached - read atomic variable from memory
+ * @v: pointer of type atomic_t
+ *
+ * Atomically reads the value of @v. This is guaranteed to emit an insn
+ * which accesses memory, atomically. No ordering guarantees!
+ */
+#define atomic_read_uncached(v)	asm_or_volatile_ptr_magic(v)

I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking"
the atomic a-la get_cpu()...
--
vda
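(One plausible way to spell the asm_or_volatile_ptr_magic() placeholder
above is a volatile-qualified pointer cast -- essentially the pattern the
kernel later adopted as ACCESS_ONCE(). A sketch, not the proposed patch:)

	/*
	 * Force a real load from memory by going through a volatile-qualified
	 * lvalue.  No memory barrier and no ordering guarantees -- it only
	 * defeats register caching by the compiler.
	 */
	#define atomic_read_uncached(v)	(*(volatile int *)&(v)->counter)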
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 24 Aug 2007, Denys Vlasenko wrote:
>
> > No, you don't use "x.counter++". But you *do* use
> >
> > 	if (atomic_read(&x) <= 1)
> >
> > and loading into a register is stupid and pointless, when you could just
> > do it as a regular memory-operand to the cmp instruction.
>
> It doesn't mean that (volatile int*) cast is bad, it means that current gcc
> is bad (or "not good enough"). IOW: instead of avoiding volatile cast,
> it's better to fix the compiler.

I would agree that fixing the compiler in this case would be a good
thing, even quite regardless of any "atomic_read()" discussion.

I just have a strong suspicion that "volatile" performance is so low down
the list of any C compiler person's interests that it's never going to
happen. And quite frankly, I cannot blame the gcc guys for it.

That's especially as "volatile" really isn't a very good feature of the C
language, and is likely to get *less* interesting rather than more (as
user space starts to be more and more threaded, "volatile" gets less and
less useful).

[ Ie, currently, I think you can validly use "volatile" in a
  "sig_atomic_t" kind of way, where there is a single thread, but with
  asynchronous events. In that kind of situation, I think it's probably
  useful. But once you get multiple threads, it gets pointless.

  Sure: you could use "volatile" together with something like Dekker's or
  Peterson's algorithm that doesn't depend on cache coherency (that's
  basically what the C "volatile" keyword approximates: not atomic
  accesses, but *uncached* accesses!). But let's face it, that's way past
  insane. ]

So I wouldn't expect "volatile" to ever really generate better code. It
might happen as a side effect of other improvements (eg, I might hope
that the SSA work would eventually lead to gcc having a much better
defined model of valid optimizations, and maybe better code generation
for volatile accesses would fall cleanly out of that), but in the end,
it's such an ugly special case in C, and so seldom used, that I wouldn't
depend on it.

> Linus, in all honesty gcc has many more cases of suboptimal code,
> case of "volatile" is just one of many.

Well, the thing is, quite often, many of those "suboptimal code"
generations fall into two distinct classes:

 - complex C code. I can't really blame the compiler too much for this.
   Some things are *hard* to optimize, and for various scalability
   reasons, you often end up having limits in the compiler where it
   doesn't even _try_ doing certain optimizations if you have excessive
   complexity.

 - bad register allocation. Register allocation really is hard, and
   sometimes gcc just does the "obviously wrong" thing, and you end up
   having totally unnecessary spills.

> Off the top of my head:

Yes, "unsigned long long" with x86 has always generated atrocious code.
In fact, I would say that historically it was really *really* bad. These
days, gcc actually does a pretty good job, but I'm not surprised that
it's still quite possible to find cases where it did some optimization
(in this case, apparently noticing that "shift by >= 32 bits" causes the
low register to be pointless) and then missed *another* optimization
(better register use) because that optimization had been done *before*
the first optimization was done. That's a *classic* example of compiler
code generation issues, and quite frankly, I think that's very different
from the issue of "volatile".

Quite frankly, I'd like there to be more competition in the open source
compiler game, and that might cause some upheavals, but on the whole, gcc
actually does a pretty damn good job.

Linus
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > Satyam Sharma writes:
> >
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion. That is what they are for. Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store. Anything else will just lead to subtle bugs.
>
> Amen.

A "timely" fashion? One cannot rely on something like that when coding.
The visibility of updates is insured by barriers and not by some fuzzy
notion of "timeliness".
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> Satyam Sharma writes:
>
> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion. That is what they are for. Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store. Anything else will just lead to subtle bugs.

Amen.
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Saturday 18 August 2007 05:13, Linus Torvalds wrote:
> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > No code does (or would do, or should do):
> >
> > 	x.counter++;
> >
> > on an "atomic_t x;" anyway.
>
> That's just an example of a general problem.
>
> No, you don't use "x.counter++". But you *do* use
>
> 	if (atomic_read(&x) <= 1)
>
> and loading into a register is stupid and pointless, when you could just
> do it as a regular memory-operand to the cmp instruction.

It doesn't mean that the (volatile int*) cast is bad, it means that
current gcc is bad (or "not good enough"). IOW: instead of avoiding the
volatile cast, it's better to fix the compiler.

> And as far as the compiler is concerned, the problem is the 100% same:
> combining operations with the volatile memop.
>
> The fact is, a compiler that thinks that
>
> 	movl mem,reg
> 	cmpl $val,reg
>
> is any better than
>
> 	cmpl $val,mem
>
> is just not a very good compiler.

Linus, in all honesty gcc has many more cases of suboptimal code; the
case of "volatile" is just one of many.

Off the top of my head:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

unsigned v;
void f(unsigned A) { v = ((unsigned long long)A) * 365384439 >> (27+32); }

gcc-4.1.1 -S -Os -fomit-frame-pointer t.c

f:
	movl	$365384439, %eax
	mull	4(%esp)
	movl	%edx, %eax	<= ?
	shrl	$27, %eax
	movl	%eax, v
	ret

Why is it moving %edx to %eax?

gcc-4.2.1 -S -Os -fomit-frame-pointer t.c

f:
	movl	$365384439, %eax
	mull	4(%esp)
	movl	%edx, %eax	<= ?
	xorl	%edx, %edx	<= ??!
	shrl	$27, %eax
	movl	%eax, v
	ret

Progress... Now we also zero out %edx afterwards for no apparent reason.
--
vda
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 06:51:16PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:
>
> > I agree that instant gratification is hard to come by when synching
> > up compiler and kernel versions. Nonetheless, it should be possible
> > to create APIs that are conditioned on the compiler version.
>
> We've tried that, sort of. See the mess surrounding the whole
> extern/static/inline/__whatever boondoggle, which seems to have
> changed semantics in every single gcc release since 2.95 or so.
>...

There is exactly one semantics change in gcc in this area, and that is
the change of the "extern inline" semantics in gcc 4.3 to the C99
semantics.

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the
darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 06:51:16PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:
>
> > I agree that instant gratification is hard to come by when synching
> > up compiler and kernel versions. Nonetheless, it should be possible
> > to create APIs that are conditioned on the compiler version.
>
> We've tried that, sort of. See the mess surrounding the whole
> extern/static/inline/__whatever boondoggle, which seems to have
> changed semantics in every single gcc release since 2.95 or so.
>
> And recently mention was made that gcc4.4 will have *new* semantics
> in this area. Yee. Hah.

;-)

Thanx, Paul
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:

> I agree that instant gratification is hard to come by when synching
> up compiler and kernel versions. Nonetheless, it should be possible
> to create APIs that are conditioned on the compiler version.

We've tried that, sort of. See the mess surrounding the whole
extern/static/inline/__whatever boondoggle, which seems to have
changed semantics in every single gcc release since 2.95 or so.

And recently mention was made that gcc4.4 will have *new* semantics
in this area. Yee. Hah.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, 21 Aug 2007, Chris Snook wrote:
>
> Moore's law is definitely working against us here. Register counts,
> pipeline depths, core counts, and clock multipliers are all increasing
> in the long run. At some point in the future, barrier() will be
> universally regarded as a hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier. It has zero impact on
the CPU pipeline itself, and also has zero impact on anything that gcc
knows isn't visible in memory (ie local variables that don't have their
address taken), so barrier() really is pretty cheap.

Now, it's possible that gcc messes up in some circumstances, and that the
memory clobber will cause gcc to also do things like flush local
registers unnecessarily to their stack slots, but quite frankly, if that
happens, it's a gcc problem, and I also have to say that I've not seen
that myself.

So in a very real sense, "barrier()" will just make sure that there is a
stronger sequence point for the compiler where things are stable. In most
cases it has absolutely zero performance impact - apart from the
-intended- impact of making sure that the compiler doesn't re-order or
cache stuff around it.

And sure, we could make it more finegrained, and also introduce a
per-variable barrier, but the fact is, people _already_ have problems
with thinking about these kinds of things, and adding new abstraction
issues with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big
difference on powerpc, but it makes a real difference on x86 - and more
importantly, it removes something that people clearly don't know how it
works, and incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers
  to the CPU - that really can be quite expensive. The good news is that
  the expense is going down rather than up: both Intel and AMD are not
  only removing the need for some of them (ie "smp_rmb()" will become a
  compiler-only barrier), but we're _also_ seeing the whole "pipeline
  flush" approach go away, and be replaced by the CPU itself actually
  being better - so even the actual CPU pipeline barriers are getting
  cheaper, not more expensive. ]

For example, did anybody even _test_ how expensive "barrier()" is? Just
as a lark, I did

	#undef barrier
	#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's
more than most files), and there were _zero_ code generation downsides.
One instruction was moved (and a few line numbers changed), so it wasn't
like the assembly language was identical, but the point is, barrier()
simply doesn't have the same kinds of downsides that "volatile" has.

(That may not be true on other architectures or in other source files,
of course. This *does* depend on code generation details. But anybody who
thinks that "barrier()" is fundamentally expensive is simply incorrect.
It is *fundamentally* a no-op.)

Linus
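(For reference, the primitive under discussion is, in the kernels of that
era, just an empty asm with a "memory" clobber -- paraphrased from
include/linux/compiler-gcc.h. It emits no CPU instruction at all, which
is why the experiment above shows essentially no difference.)

	#define barrier() __asm__ __volatile__("" : : : "memory")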
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, 21 Aug 2007, Chris Snook wrote:

> David Miller wrote:
> > From: Linus Torvalds <[EMAIL PROTECTED]>
> > Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> >
> > > Ie a "barrier()" is likely _cheaper_ than the code generation downside
> > > from using "volatile".
> >
> > Assuming GCC were ever better about the code generation badness
> > with volatile that has been discussed here, I much prefer
> > we tell GCC "this memory piece changed" rather than "every
> > piece of memory has changed" which is what the barrier() does.
> >
> > I happened to have been scanning a lot of assembler lately to
> > track down a gcc-4.2 miscompilation on sparc64, and the barriers
> > do hurt quite a bit in some places. Instead of keeping unrelated
> > variables around cached in local registers, it reloads everything.
>
> Moore's law is definitely working against us here. Register counts,
> pipeline depths, core counts, and clock multipliers are all increasing
> in the long run. At some point in the future, barrier() will be
> universally regarded as a hammer too big for most purposes.

I do agree, and the important point to note is that the benefits of a
/lighter/ compiler barrier, such as what David referred to above, _can_
be had without having to do anything with the "volatile" keyword at all.
And such a primitive has already been mentioned/proposed on this thread.

But this is all tangential to the core question at hand -- whether to
have implicit (compiler, possibly "light-weight" of the kind referred
above) barrier semantics in atomic ops that do not have them, or not.

I was lately looking in the kernel for _actual_ code that uses atomic_t
and benefits from the lack of any implicit barrier, with the compiler
being free to cache the atomic_t in a register. Now that often does _not_
happen, because all other ops (implemented in asm with LOCK prefix on
x86) _must_ therefore constrain the atomic_t to memory anyway. So
typically all atomic ops code sequences end up operating on memory.

Then I did locate sched.c:select_nohz_load_balancer() -- it repeatedly
references the same atomic_t object, and the code that I saw generated
(with CC_OPTIMIZE_FOR_SIZE=y) did cache it in a register for a sequence
of instructions. It uses atomic_cmpxchg, thereby not requiring explicit
memory barriers anywhere in the code, and is an example of an atomic_t
user that is safe, and yet benefits from its memory loads/stores being
elided/coalesced by the compiler.

	# at this point, %eax holds num_online_cpus() and
	# %ebx holds cpus_weight(nohz.cpu_mask)
	# the variable "cpu" is in %esi

	0xc1018e1d:	cmp    %eax,%ebx		# if No.A.
	0xc1018e1f:	mov    0xc134d900,%eax		# first atomic_read()
	0xc1018e24:	jne    0xc1018e36
	0xc1018e26:	cmp    %esi,%eax		# if No.B.
	0xc1018e28:	jne    0xc1018e80		# returns with 0
	0xc1018e2a:	movl   $0xffffffff,0xc134d900	# atomic_set(-1), and ...
	0xc1018e34:	jmp    0xc1018e80		# ... returns with 0
	0xc1018e36:	cmp    $0xffffffff,%eax		# if No.C. (NOTE!)
	0xc1018e39:	jne    0xc1018e46
	0xc1018e3b:	lock cmpxchg %esi,0xc134d900	# atomic_cmpxchg()
	0xc1018e43:	inc    %eax
	0xc1018e44:	jmp    0xc1018e48
	0xc1018e46:	cmp    %esi,%eax		# if No.D. (NOTE!)
	0xc1018e48:	jne    0xc1018e80		# if !=, default return 0 (if No.E.)
	0xc1018e4a:	jmp    0xc1018e84		# otherwise (==) returns with 1

The above is:

	if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {		/* if No.A. */
		if (atomic_read(&nohz.load_balancer) == cpu)		/* if No.B. */
			atomic_set(&nohz.load_balancer, -1);		/* XXX */
		return 0;
	}

	if (atomic_read(&nohz.load_balancer) == -1) {			/* if No.C. */
		/* make me the ilb owner */
		if (atomic_cmpxchg(&nohz.load_balancer, -1, cpu) == -1)	/* if No.E. */
			return 1;
	} else if (atomic_read(&nohz.load_balancer) == cpu)		/* if No.D. */
		return 1;
	...
	...
	return 0;	/* default return from function */

As you can see, the atomic_read()'s of "if"s Nos. B, C, and D were _all_
coalesced into a single memory reference "mov 0xc134d900,%eax" at the top
of the function, and then "if"s Nos. C and D simply used the value from
%eax itself. But that's perfectly safe, such is the logic of this
function. It uses cmpxchg _whenever_ updating the value in the memory
atomic_t and then returns appropriately. The _only_ point that a casual
reader may find racy is that marked /* XXX */ above -- atomic_read()
followed by atomic_set() with no barrier in between. But even that is ok,
because if one thread ever finds that condition to succeed, it is 100%
guaranteed no other thread on any other CPU will find _any_ condition to
be true, thereby avoiding any race in the modification of that value.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 04:48:51PM +0200, Segher Boessenkool wrote:
> > > Let me say it more clearly: On ARM, it is impossible to perform
> > > atomic operations on MMIO space.
> >
> > Actually, no one is suggesting that we try to do that at all.
> >
> > The discussion about RMW ops on MMIO space started with a comment
> > attributed to the gcc developers that one reason why gcc on x86
> > doesn't use instructions that do RMW ops on volatile variables is that
> > volatile is used to mark MMIO addresses, and there was some
> > uncertainty about whether (non-atomic) RMW ops on x86 could be used on
> > MMIO. This is in regard to the question about why gcc on x86 always
> > moves a volatile variable into a register before doing anything to it.
>
> This question is GCC PR33102, which was incorrectly closed as a
> duplicate of PR3506 -- and *that* PR was closed because its reporter
> seemed to claim the GCC generated code for an increment on a volatile
> (namely, three machine instructions: load, modify, store) was
> incorrect, and it has to be one machine instruction.
>
> > So the whole discussion is irrelevant to ARM, PowerPC and any other
> > architecture except x86[-64].
>
> And even there, it's not something the kernel can take advantage of
> before GCC 4.4 is in widespread use, if then. Let's move on.

I agree that instant gratification is hard to come by when synching
up compiler and kernel versions. Nonetheless, it should be possible
to create APIs that are conditioned on the compiler version.

Thanx, Paul
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> At some point in the future, barrier() will be universally regarded as
> a hammer too big for most purposes. Whether or not removing it now

You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained "compiler barrier",
I presume?

> constitutes premature optimization is arguable, but I think we should
> allow such optimization to happen (or not happen) in
> architecture-dependent code, and provide a consistent API that doesn't
> require the use of such things in arch-independent code where it might
> turn into a totally superfluous performance killer depending on what
> hardware it gets compiled for.

Explicit barrier()s won't be too hard to replace -- but what to do about
the implicit barrier()s in rmb() etc. etc. -- *those* will be hard to get
rid of, if only because it is hard enough to teach driver authors about
how to use those primitives *already*.

It is far from clear what a good interface like that would look like,
anyway. Probably we should first start experimenting with a
forget()-style micro-barrier (but please, find a better name), and see if
a nice usage pattern shows up that can be turned into an API.

Segher
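(One possible shape for such a forget()-style micro-barrier, along the
lines of the per-variable primitive mentioned elsewhere in the thread;
the name and exact constraints here are illustrative, not a settled API.)

	/*
	 * Tell gcc that "x" may have changed and must not stay cached in a
	 * register, without clobbering all of memory the way barrier() does.
	 * Purely a compiler barrier; it emits no CPU instruction.
	 */
	#define forget(x)	__asm__ __volatile__("" : "=m" (x) : "m" (x))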
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Let me say it more clearly: On ARM, it is impossible to perform atomic operations on MMIO space. Actually, no one is suggesting that we try to do that at all. The discussion about RMW ops on MMIO space started with a comment attributed to the gcc developers that one reason why gcc on x86 doesn't use instructions that do RMW ops on volatile variables is that volatile is used to mark MMIO addresses, and there was some uncertainty about whether (non-atomic) RMW ops on x86 could be used on MMIO. This is in regard to the question about why gcc on x86 always moves a volatile variable into a register before doing anything to it. This question is GCC PR33102, which was incorrectly closed as a duplicate of PR3506 -- and *that* PR was closed because its reporter seemed to claim the GCC generated code for an increment on a volatile (namely, three machine instructions: load, modify, store) was incorrect, and it has to be one machine instruction. So the whole discussion is irrelevant to ARM, PowerPC and any other architecture except x86[-64]. And even there, it's not something the kernel can take advantage of before GCC 4.4 is in widespread use, if then. Let's move on. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
And no, RMW on MMIO isn't "problematic" at all, either. An RMW op is a read op, a modify op, and a write op, all rolled into one opcode. But three actual operations. Maybe for some CPUs, but not all. ARM for instance can't use the load exclusive and store exclusive instructions to MMIO space. Sure, your CPU doesn't have RMW instructions -- how to emulate those if you don't have them is a totally different thing. Let me say it more clearly: On ARM, it is impossible to perform atomic operations on MMIO space. It's all completely beside the point, see the other subthread, but... Yeah, you can't do LL/SC to MMIO space; ARM isn't alone in that. You could still implement atomic operations on MMIO space by taking a lock elsewhere, in normal cacheable memory space. Why you would do this is a separate question, you probably don't want it :-) Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
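A sketch of the lock-based emulation mentioned at the end (assuming a driver-private spinlock living in ordinary cacheable memory that serialises a read-modify-write of an MMIO register; whether you actually want this is, as said, another question):

	#include <linux/spinlock.h>
	#include <linux/io.h>

	static DEFINE_SPINLOCK(mmio_rmw_lock);	/* normal memory, not MMIO */

	/* "Atomic" bit-set on an MMIO register: the hardware access is still
	 * a plain read followed by a plain write, but all writers are
	 * serialised through the lock. */
	static void mmio_set_bits(void __iomem *reg, u32 bits)
	{
		unsigned long flags;

		spin_lock_irqsave(&mmio_rmw_lock, flags);
		writel(readl(reg) | bits, reg);
		spin_unlock_irqrestore(&mmio_rmw_lock, flags);
	}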
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
David Miller wrote: From: Linus Torvalds <[EMAIL PROTECTED]> Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT) Ie a "barrier()" is likely _cheaper_ than the code generation downside from using "volatile". Assuming GCC were ever better about the code generation badness with volatile that has been discussed here, I much prefer we tell GCC "this memory piece changed" rather than "every piece of memory has changed" which is what the barrier() does. I happened to have been scanning a lot of assembler lately to track down a gcc-4.2 miscompilation on sparc64, and the barriers do hurt quite a bit in some places. Instead of keeping unrelated variables around cached in local registers, it reloads everything. Moore's law is definitely working against us here. Register counts, pipeline depths, core counts, and clock multipliers are all increasing in the long run. At some point in the future, barrier() will be universally regarded as a hammer too big for most purposes. Whether or not removing it now constitutes premature optimization is arguable, but I think we should allow such optimization to happen (or not happen) in architecture-dependent code, and provide a consistent API that doesn't require the use of such things in arch-independent code where it might turn into a totally superfluous performance killer depending on what hardware it gets compiled for. -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 07:33:49PM +1000, Paul Mackerras wrote: > So the whole discussion is irrelevant to ARM, PowerPC and any other > architecture except x86[-64]. It's even irrelevant on x86 because all modifying operations on atomic_t are coded in inline assembler and will always be RMW no matter if atomic_t is volatile or not. [ignoring atomic_set(x, atomic_read(x) + 1) which nobody should do] The only issue is if atomic_t should have an implicit barrier or not. My personal opinion is yes -- better safe than sorry. And any code impact it may have is typically dwarfed by the next cache miss anyways, so it doesn't matter much. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
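For concreteness, the inline-assembler RMW being referred to looks roughly like the x86 implementation of the day (simplified; the real code also drops the lock prefix on UP builds):

	static inline void atomic_inc(atomic_t *v)
	{
		/* One locked read-modify-write instruction; whether the
		 * counter is declared volatile makes no difference to the
		 * code generated, because the asm dictates it. */
		asm volatile("lock; incl %0" : "+m" (v->counter));
	}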
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Russell King writes: > Let me say it more clearly: On ARM, it is impossible to perform atomic > operations on MMIO space. Actually, no one is suggesting that we try to do that at all. The discussion about RMW ops on MMIO space started with a comment attributed to the gcc developers that one reason why gcc on x86 doesn't use instructions that do RMW ops on volatile variables is that volatile is used to mark MMIO addresses, and there was some uncertainty about whether (non-atomic) RMW ops on x86 could be used on MMIO. This is in regard to the question about why gcc on x86 always moves a volatile variable into a register before doing anything to it. So the whole discussion is irrelevant to ARM, PowerPC and any other architecture except x86[-64]. Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Aug 20, 2007 at 05:05:18PM -0700, Paul E. McKenney wrote: > On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote: > > >>And no, RMW on MMIO isn't "problematic" at all, either. > > >> > > >>An RMW op is a read op, a modify op, and a write op, all rolled > > >>into one opcode. But three actual operations. > > > > > >Maybe for some CPUs, but not all. ARM for instance can't use the > > >load exclusive and store exclusive instructions to MMIO space. > > > > Sure, your CPU doesn't have RMW instructions -- how to emulate > > those if you don't have them is a totally different thing. > > I thought that ARM's load exclusive and store exclusive instructions > were its equivalent of LL and SC, which RISC machines typically use to > build atomic sequences of instructions -- and which normally cannot be > applied to MMIO space. Absolutely correct. -- Russell King, maintainer of the 2.6 ARM Linux kernel - http://www.arm.linux.org.uk/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote: > >>And no, RMW on MMIO isn't "problematic" at all, either. > >> > >>An RMW op is a read op, a modify op, and a write op, all rolled > >>into one opcode. But three actual operations. > > > >Maybe for some CPUs, but not all. ARM for instance can't use the > >load exclusive and store exclusive instructions to MMIO space. > > Sure, your CPU doesn't have RMW instructions -- how to emulate > those if you don't have them is a totally different thing. Let me say it more clearly: On ARM, it is impossible to perform atomic operations on MMIO space. -- Russell King, maintainer of the 2.6 ARM Linux kernel - http://www.arm.linux.org.uk/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Linus Torvalds <[EMAIL PROTECTED]> Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT) > Ie a "barrier()" is likely _cheaper_ than the code generation downside > from using "volatile". Assuming GCC were ever better about the code generation badness with volatile that has been discussed here, I much prefer we tell GCC "this memory piece changed" rather than "every piece of memory has changed" which is what the barrier() does. I happened to have been scanning a lot of assembler lately to track down a gcc-4.2 miscompilation on sparc64, and the barriers do hurt quite a bit in some places. Instead of keeping unrelated variables around cached in local registers, it reloads everything. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, 20 Aug 2007, Chris Snook wrote: > > What about barrier removal? With consistent semantics we could optimize a > fair amount of code. Whether or not that constitutes "premature" optimization > is open to debate, but there's no question we could reduce our register wiping > in some places. Why do people think that barriers are expensive? They really aren't. Especially the regular compiler barrier is basically zero cost. Any reasonable compiler will just flush the stuff it holds in registers that isn't already automatic local variables, and for regular kernel code, that tends to basically be nothing at all. Ie a "barrier()" is likely _cheaper_ than the code generation downside from using "volatile". Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
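For reference, the compiler barrier being discussed is essentially just an empty asm with a memory clobber (as in the kernel's compiler headers); it emits no instruction at all, it only tells GCC that memory may have changed:

	#define barrier()	__asm__ __volatile__("" : : : "memory")

The cost is therefore purely the reload of whatever non-local values the compiler was keeping in registers across that point -- which is also exactly the "register wiping" David Miller observed in the sparc64 assembly.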
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote: > >>And no, RMW on MMIO isn't "problematic" at all, either. > >> > >>An RMW op is a read op, a modify op, and a write op, all rolled > >>into one opcode. But three actual operations. > > > >Maybe for some CPUs, but not all. ARM for instance can't use the > >load exclusive and store exclusive instructions to MMIO space. > > Sure, your CPU doesn't have RMW instructions -- how to emulate > those if you don't have them is a totally different thing. I thought that ARM's load exclusive and store exclusive instructions were its equivalent of LL and SC, which RISC machines typically use to build atomic sequences of instructions -- and which normally cannot be applied to MMIO space. Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
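For readers unfamiliar with LL/SC, this is roughly how an ARMv6 atomic_add() is built from load-exclusive/store-exclusive (close to the kernel's ARM implementation of the time; the exclusive-monitor machinery backing ldrex/strex is exactly what MMIO space does not provide):

	static inline void atomic_add(int i, atomic_t *v)
	{
		unsigned long tmp;
		int result;

		__asm__ __volatile__("@ atomic_add\n"
	"1:	ldrex	%0, [%2]\n"		/* load-exclusive the old value   */
	"	add	%0, %0, %3\n"		/* modify it in a register        */
	"	strex	%1, %0, [%2]\n"		/* store-exclusive; %1 == 0 if ok */
	"	teq	%1, #0\n"
	"	bne	1b"			/* reservation lost -- retry      */
		: "=&r" (result), "=&r" (tmp)
		: "r" (&v->counter), "Ir" (i)
		: "cc");
	}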
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
And no, RMW on MMIO isn't "problematic" at all, either. An RMW op is a read op, a modify op, and a write op, all rolled into one opcode. But three actual operations. Maybe for some CPUs, but not all. ARM for instance can't use the load exclusive and store exclusive instructions to MMIO space. Sure, your CPU doesn't have RMW instructions -- how to emulate those if you don't have them is a totally different thing. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, Aug 21, 2007 at 12:04:17AM +0200, Segher Boessenkool wrote: > And no, RMW on MMIO isn't "problematic" at all, either. > > An RMW op is a read op, a modify op, and a write op, all rolled > into one opcode. But three actual operations. Maybe for some CPUs, but not all. ARM for instance can't use the load exclusive and store exclusive instructions to MMIO space. This means placing atomic_t or bitops into MMIO space is a definite no-go on ARM. It breaks. -- Russell King, maintainer of the 2.6 ARM Linux kernel - http://www.arm.linux.org.uk/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Such code generally doesn't care precisely when it gets the update, just that the update is atomic, and it doesn't loop forever. Yes, it _does_ care that it gets the update _at all_, and preferably as early as possible. Regardless, I'm convinced we just need to do it all in assembly. So do you want "volatile asm" or "plain asm", for atomic_read()? The asm version has two ways to go about it too... Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
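The two asm flavours differ mainly in how much freedom GCC keeps; an x86 sketch of both (illustrative only):

	/* Plain asm: treated like a pure function of its inputs, so two
	 * back-to-back reads with no intervening barrier may be merged
	 * into a single load, or hoisted out of a loop. */
	static inline int atomic_read_plain(const atomic_t *v)
	{
		int ret;
		asm("movl %1, %0" : "=r" (ret) : "m" (v->counter));
		return ret;
	}

	/* asm volatile: never deleted or merged, and keeps its order
	 * relative to other volatile asms -- each call emits its own load. */
	static inline int atomic_read_volatile(const atomic_t *v)
	{
		int ret;
		asm volatile("movl %1, %0" : "=r" (ret) : "m" (v->counter));
		return ret;
	}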
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Right. ROTFL... volatile actually breaks atomic_t instead of making it safe. x++ becomes a register load, increment and a register store. Without volatile we can increment the memory directly. It seems that volatile requires that the variable is loaded into a register first and then operated upon. Understandable when you think about volatile being used to access memory mapped I/O registers where a RMW operation could be problematic. So, if we want consistent behavior, we're pretty much screwed unless we use inline assembler everywhere? Nah, this whole argument is flawed -- "without volatile" we still *cannot* "increment the memory directly". On x86, you need a lock prefix; on other archs, some other mechanism to make the memory increment an *atomic* memory increment. And no, RMW on MMIO isn't "problematic" at all, either. An RMW op is a read op, a modify op, and a write op, all rolled into one opcode. But three actual operations. The advantages of asm code for atomic_{read,set} are: 1) all the other atomic ops are implemented that way already; 2) you have full control over the asm insns selected, in particular, you can guarantee you *do* get an atomic op; 3) you don't need to use "volatile <type>" which generates not-all-that-good code on archs like x86, and we want to get rid of it anyway since it is problematic in many ways; 4) you don't need to use *(volatile <type> *)&<atomic var>, which a) doesn't exist in C; b) isn't documented or supported in GCC; c) has a recent history of bugginess; d) _still uses volatile objects_; e) _still_ is problematic in almost all those same ways as in 3); 5) you can mix atomic and non-atomic accesses to the atomic_t, which you cannot with the other alternatives. The only disadvantage I know of is potentially slightly worse instruction scheduling. This is a generic asm() problem: GCC cannot see what actual insns are inside the asm() block. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu wrote: On Mon, Aug 20, 2007 at 09:15:11AM -0400, Chris Snook wrote: Linus Torvalds wrote: So the only reason to add back "volatile" to the atomic_read() sequence is not to fix bugs, but to _hide_ the bugs better. They're still there, they are just a lot harder to trigger, and tend to be a lot subtler. What about barrier removal? With consistent semantics we could optimize a fair amount of code. Whether or not that constitutes "premature" optimization is open to debate, but there's no question we could reduce our register wiping in some places. If you've been reading all of Linus's emails you should be thinking about adding memory barriers, and not removing compiler barriers. He's just told you that code of the kind while (!atomic_read(cond)) ; do_something() probably needs a memory barrier (not just compiler) so that do_something() doesn't see stale cache content that occured before cond flipped. Such code generally doesn't care precisely when it gets the update, just that the update is atomic, and it doesn't loop forever. Regardless, I'm convinced we just need to do it all in assembly. -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Christoph Lameter wrote: On Fri, 17 Aug 2007, Paul E. McKenney wrote: On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) I had totally forgotten that I'd already filed that bug more than six years ago until they just closed yours as a duplicate of mine :) Good luck in getting it fixed! Well, just got done re-opening it for the third time. And a local gcc community member advised me not to give up too easily. But I must admit that I am impressed with the speed that it was identified as duplicate. Should be entertaining! ;-) Right. ROTFL... volatile actually breaks atomic_t instead of making it safe. x++ becomes a register load, increment and a register store. Without volatile we can increment the memory directly. It seems that volatile requires that the variable is loaded into a register first and then operated upon. Understandable when you think about volatile being used to access memory mapped I/O registers where a RMW operation could be problematic. So, if we want consistent behavior, we're pretty much screwed unless we use inline assembler everywhere? -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Mon, Aug 20, 2007 at 09:15:11AM -0400, Chris Snook wrote: > Linus Torvalds wrote: > >So the only reason to add back "volatile" to the atomic_read() sequence is > >not to fix bugs, but to _hide_ the bugs better. They're still there, they > >are just a lot harder to trigger, and tend to be a lot subtler. > > What about barrier removal? With consistent semantics we could optimize a > fair amount of code. Whether or not that constitutes "premature" > optimization is open to debate, but there's no question we could reduce our > register wiping in some places. If you've been reading all of Linus's emails you should be thinking about adding memory barriers, and not removing compiler barriers. He's just told you that code of the kind while (!atomic_read(cond)) ; do_something() probably needs a memory barrier (not just compiler) so that do_something() doesn't see stale cache content that occurred before cond flipped. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
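In concrete terms, the fix for that pattern is a reader-side barrier pairing with a writer-side one (a sketch, assuming "cond" is an atomic_t and setup_data()/do_something() stand for the work on either side):

	/* writer */
	setup_data();		/* fill in whatever "cond" guards            */
	smp_wmb();		/* order the data stores before the flag     */
	atomic_set(&cond, 1);

	/* reader */
	while (!atomic_read(&cond))
		cpu_relax();	/* also a compiler barrier in the kernel     */
	smp_rmb();		/* order the flag read before the data reads */
	do_something();		/* now sees everything setup_data() wrote    */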
Re: LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures)
Stefan Richter wrote: Nick Piggin wrote: Stefan Richter wrote: Nick Piggin wrote: I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. Which documentation is there? Documentation/atomic_ops.txt For driver authors, there is LDD3. It doesn't specifically cover effects of optimization on accesses to atomic_t. For architecture port authors, there is Documentation/atomic_ops.txt. Driver authors also can learn something from that document, as it indirectly documents the atomic_t and bitops APIs. "Semantics and Behavior of Atomic and Bitmask Operations" is pretty direct :) Sure, it says that it's for arch maintainers, but there is no reason why users can't make use of it. Note, LDD3 page 238 says: "It is worth noting that most of the other kernel primitives dealing with synchronization, such as spinlock and atomic_t operations, also function as memory barriers." I don't know about Linux 2.6.10 against which LDD3 was written, but currently only _some_ atomic_t operations function as memory barriers. Besides, judging from some posts in this thread, saying that atomic_t operations dealt with synchronization may not be entirely precise. atomic_t is often used as the basis for implementing more sophisticated synchronization mechanisms, such as rwlocks. Whether or not they are designed for that purpose, the atomic_* operations are de facto synchronization primitives. -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
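As one illustration of atomic_t being used as a building block, a toy test-and-set lock can be written on top of atomic_cmpxchg() (purely a sketch; real kernel locks do far more):

	typedef struct { atomic_t taken; } toy_lock_t;
	#define TOY_LOCK_UNLOCKED	{ ATOMIC_INIT(0) }

	static inline void toy_lock(toy_lock_t *l)
	{
		/* atomic_cmpxchg() returns the old value and, per
		 * Documentation/atomic_ops.txt, implies memory barriers
		 * around the operation, so it doubles as the acquire fence. */
		while (atomic_cmpxchg(&l->taken, 0, 1) != 0)
			cpu_relax();
	}

	static inline void toy_unlock(toy_lock_t *l)
	{
		smp_mb();			/* atomic_set() implies no barrier ... */
		atomic_set(&l->taken, 0);	/* ... so order the release by hand    */
	}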
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Linus Torvalds wrote: So the only reason to add back "volatile" to the atomic_read() sequence is not to fix bugs, but to _hide_ the bugs better. They're still there, they are just a lot harder to trigger, and tend to be a lot subtler. What about barrier removal? With consistent semantics we could optimize a fair amount of code. Whether or not that constitutes "premature" optimization is open to debate, but there's no question we could reduce our register wiping in some places. -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, Aug 18, 2007 at 03:41:13PM -0700, Linus Torvalds wrote: > > > On Sat, 18 Aug 2007, Paul E. McKenney wrote: > > > > One of the gcc guys claimed that he thought that the two-instruction > > sequence would be faster on some x86 machines. I pointed out that > > there might be a concern about code size. I chose not to point out > > that people might also care about the other x86 machines. ;-) > > Some (very few) x86 uarchs do tend to prefer "load-store" like code > generation, and doing a "mov [mem],reg + op reg" instead of "op [mem]" can > actually be faster on some of them. Not any that are relevant today, > though. ;-) > Also, that has nothing to do with volatile, and should be controlled by > optimization flags (like -mtune). In fact, I thought there was a separate > flag to do that (ie something like "-mload-store"), but I can't find it, > so maybe that's just my fevered brain.. Good point, will suggest this if the need arises. Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Paul E. McKenney wrote: > > One of the gcc guys claimed that he thought that the two-instruction > sequence would be faster on some x86 machines. I pointed out that > there might be a concern about code size. I chose not to point out > that people might also care about the other x86 machines. ;-) Some (very few) x86 uarchs do tend to prefer "load-store" like code generation, and doing a "mov [mem],reg + op reg" instead of "op [mem]" can actually be faster on some of them. Not any that are relevant today, though. Also, that has nothing to do with volatile, and should be controlled by optimization flags (like -mtune). In fact, I thought there was a separate flag to do that (ie something like "-mload-store"), but I can't find it, so maybe that's just my fevered brain.. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 06:24:15PM -0700, Christoph Lameter wrote: > On Fri, 17 Aug 2007, Paul E. McKenney wrote: > > > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: > > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: > > > > > > > > gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) > > > > > > I had totally forgotten that I'd already filed that bug more > > > than six years ago until they just closed yours as a duplicate > > > of mine :) > > > > > > Good luck in getting it fixed! > > > > Well, just got done re-opening it for the third time. And a local > > gcc community member advised me not to give up too easily. But I > > must admit that I am impressed with the speed that it was identified > > as duplicate. > > > > Should be entertaining! ;-) > > Right. ROTFL... volatile actually breaks atomic_t instead of making it > safe. x++ becomes a register load, increment and a register store. Without > volatile we can increment the memory directly. It seems that volatile > requires that the variable is loaded into a register first and then > operated upon. Understandable when you think about volatile being used to > access memory mapped I/O registers where a RMW operation could be > problematic. > > See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3506 Yep. The initial reaction was in fact to close my bug as a duplicate of 3506. But I was not asking for atomicity, but rather for smaller code to be generated, so I reopened it. Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 09:13:35PM -0700, Linus Torvalds wrote: > > > On Sat, 18 Aug 2007, Satyam Sharma wrote: > > > > No code does (or would do, or should do): > > > > x.counter++; > > > > on an "atomic_t x;" anyway. > > That's just an example of a general problem. > > No, you don't use "x.counter++". But you *do* use > > if (atomic_read(&x) <= 1) > > and loading into a register is stupid and pointless, when you could just > do it as a regular memory-operand to the cmp instruction. > > And as far as the compiler is concerned, the problem is the 100% same: > combining operations with the volatile memop. > > The fact is, a compiler that thinks that > > movl mem,reg > cmpl $val,reg > > is any better than > > cmpl $val,mem > > is just not a very good compiler. But when talking about "volatile", > that's exactly what ytou always get (and always have gotten - this is > not a regression, and I doubt gcc is alone in this). One of the gcc guys claimed that he thought that the two-instruction sequence would be faster on some x86 machines. I pointed out that there might be a concern about code size. I chose not to point out that people might also care about the other x86 machines. ;-) Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures)
Nick Piggin wrote: > Stefan Richter wrote: >> Nick Piggin wrote: >> >>> I don't know why people would assume volatile of atomics. AFAIK, most >>> of the documentation is pretty clear that all the atomic stuff can be >>> reordered etc. except for those that modify and return a value. >> >> >> Which documentation is there? > > Documentation/atomic_ops.txt > > >> For driver authors, there is LDD3. It doesn't specifically cover >> effects of optimization on accesses to atomic_t. >> >> For architecture port authors, there is Documentation/atomic_ops.txt. >> Driver authors also can learn something from that document, as it >> indirectly documents the atomic_t and bitops APIs. >> > > "Semantics and Behavior of Atomic and Bitmask Operations" is > pretty direct :) > > Sure, it says that it's for arch maintainers, but there is no > reason why users can't make use of it. Note, LDD3 page 238 says: "It is worth noting that most of the other kernel primitives dealing with synchronization, such as spinlock and atomic_t operations, also function as memory barriers." I don't know about Linux 2.6.10 against which LDD3 was written, but currently only _some_ atomic_t operations function as memory barriers. Besides, judging from some posts in this thread, saying that atomic_t operations dealt with synchronization may not be entirely precise. -- Stefan Richter -=-=-=== =--- =--=- http://arcgraph.de/sr/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > > GCC manual, section 6.1, "When > > > is a Volatile Object Accessed?" doesn't say anything of the > > > kind. > > True, "implementation-defined" as per the C standard _is_ supposed to mean > > "unspecified behaviour where each implementation documents how the choice > > is made". So ok, probably GCC isn't "documenting" this > > implementation-defined behaviour which it is supposed to, but can't really > > fault them much for this, probably. > > GCC _is_ documenting this, namely in this section 6.1. (Again totally petty, but) Yes, but ... > It doesn't > mention volatile-casted stuff. Draw your own conclusions. ... exactly. So that's why I said "GCC isn't documenting _this_". Man, try _reading_ mails before replying to them ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Linus Torvalds wrote: > On Sat, 18 Aug 2007, Satyam Sharma wrote: > > > > No code does (or would do, or should do): > > > > x.counter++; > > > > on an "atomic_t x;" anyway. > > That's just an example of a general problem. > > No, you don't use "x.counter++". But you *do* use > > if (atomic_read(&x) <= 1) > > and loading into a register is stupid and pointless, when you could just > do it as a regular memory-operand to the cmp instruction. True, but that makes this a bad/poor code generation issue with the compiler, not something that affects the _correctness_ of atomic ops if "volatile" is used for that counter object (as was suggested), because we'd always use the atomic_inc() etc primitives to do increments, which are always (should be!) implemented to be atomic. > And as far as the compiler is concerned, the problem is the 100% same: > combining operations with the volatile memop. > > The fact is, a compiler that thinks that > > movl mem,reg > cmpl $val,reg > > is any better than > > cmpl $val,mem > > is just not a very good compiler. Absolutely, this is definitely a bug report worth opening with gcc. And what you've said to explain this previously sounds definitely correct -- seeing "volatile" for any access does appear to just scare the hell out of gcc and makes it generate such (poor) code. Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
[ LOL, you _are_ shockingly petty! ] On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > > The documentation simply doesn't say "+m" is allowed. The code to > > > allow it was added for the benefit of people who do not read the > > > documentation. Documentation for "+m" might get added later if it > > > is decided this [the code, not the documentation] is a sane thing > > > to have (which isn't directly obvious). > > > > Huh? > > > > "If the (current) documentation doesn't match up with the (current) > > code, then _at least one_ of them has to be (as of current) wrong." > > > > I wonder how could you even try to disagree with that. > > Easy. > > The GCC documentation you're referring to is the user's manual. > See the blurb on the first page: > > "This manual documents how to use the GNU compilers, as well as their > features and incompatibilities, and how to report bugs. It corresponds > to GCC version 4.3.0. The internals of the GNU compilers, including > how to port them to new targets and some information about how to write > front ends for new languages, are documented in a separate manual." > > _How to use_. This documentation doesn't describe in minute detail > everything the compiler does (see the source code for that -- no, it > isn't described in the internals manual either). Wow, now that's a nice "disclaimer". By your (poor) standards of writing documentation, one can as well write any factually incorrect stuff that one wants in a document once you've got such a blurb in place :-) > If it doesn't tell you how to use "+m", and even tells you _not_ to > use it, maybe that is what it means to say? It doesn't mean "+m" > doesn't actually do something. It also doesn't mean it does what > you think it should do. It might do just that of course. But treating > writing C code as an empirical science isn't such a smart idea. Oh, really? Considering how much is (left out of being) documented, often one would reasonably have to experimentally see (with testcases) how the compiler behaves for some given code. Well, at least _I_ do it often (several others on this list do as well), and I think there's everything smart about it rather than having to read gcc sources -- I'd be surprised (unless you have infinite free time on your hands, which does look like teh case actually) if someone actually prefers reading gcc sources first to know what/how gcc does something for some given code, rather than simply write it out, compile and look the generated code (saves time for those who don't have an infinite amount of it). > > And I didn't go whining about this ... you asked me. (I think I'd said > > something to the effect of GCC docs are often wrong, > > No need to guess at what you said, even if you managed to delete > your own mail already, there are plenty of free web-based archives > around. You said: > > > See, "volatile" C keyword, for all it's ill-definition and dodgy > > semantics, is still at least given somewhat of a treatment in the C > > standard (whose quality is ... ummm, sadly not always good and clear, > > but unsurprisingly, still about 5,482 orders-of-magnitude times > > better than GCC docs). Try _reading_ what I said there, for a change, dude. I'd originally only said "unless GCC's docs is yet again wrong" ... then _you_ asked me what, after which this discussion began and I wrote the above [which I fully agree with -- so what if I used hyperbole in my sentence (yup, that was intended, and obviously, exaggeration), am I not even allowed to do that? 
Man, you're a Nazi or what ...] I didn't go whining about on my own as you'd had earlier suggested, until _you_ asked me. [ Ick, I somehow managed to reply this ... this is such a ... *disgustingly* petty argument you made here. ] > > which is true, > > Yes, documentation of that size often has shortcomings. No surprise > there. However, great effort is made to make it better documentation, > and especially to keep it up to date; if you find any errors or > omissions, please report them. There are many ways how to do that, > see the GCC homepage. ^^ Looks like you even get paid :-) > > but probably you feel saying that is "not allowed" on non-gcc lists?) > > [amazingly pointless stuff snipped] > > > As for the "PR" > > you're requesting me to file with GCC for this, that > > gcc-patches@ thread did precisely that > > [more amazingly pointless stuff snipped] > > > and more (submitted a patch to > > said documentation -- and no, saying "documentation might get added > > later" is totally bogus and nonsensical -- documentation exists to > > document current behaviour, not past). > > When code like you want to write becomes a supported feature, that > will be reflected in the user manual. It is completely nonsensical > to expect everything that is *not* a supported feature to be mentioned > there. What crap. It is _perfectly reasonable_ to expect (current) docume
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
The documentation simply doesn't say "+m" is allowed. The code to allow it was added for the benefit of people who do not read the documentation. Documentation for "+m" might get added later if it is decided this [the code, not the documentation] is a sane thing to have (which isn't directly obvious). Huh? "If the (current) documentation doesn't match up with the (current) code, then _at least one_ of them has to be (as of current) wrong." I wonder how could you even try to disagree with that. Easy. The GCC documentation you're referring to is the user's manual. See the blurb on the first page: "This manual documents how to use the GNU compilers, as well as their features and incompatibilities, and how to report bugs. It corresponds to GCC version 4.3.0. The internals of the GNU compilers, including how to port them to new targets and some information about how to write front ends for new languages, are documented in a separate manual." _How to use_. This documentation doesn't describe in minute detail everything the compiler does (see the source code for that -- no, it isn't described in the internals manual either). If it doesn't tell you how to use "+m", and even tells you _not_ to use it, maybe that is what it means to say? It doesn't mean "+m" doesn't actually do something. It also doesn't mean it does what you think it should do. It might do just that of course. But treating writing C code as an empirical science isn't such a smart idea. And I didn't go whining about this ... you asked me. (I think I'd said something to the effect of GCC docs are often wrong, No need to guess at what you said, even if you managed to delete your own mail already, there are plenty of free web-based archives around. You said: See, "volatile" C keyword, for all it's ill-definition and dodgy semantics, is still at least given somewhat of a treatment in the C standard (whose quality is ... ummm, sadly not always good and clear, but unsurprisingly, still about 5,482 orders-of-magnitude times better than GCC docs). and that to me reads as complaining that the ISO C standard "isn't very good" and that the GCC documentation is 10**5482 times worse even. Which of course is hyperbole and cannot be true. It also isn't helpful in any way or form for anyone on this list. I call that whining. which is true, Yes, documentation of that size often has shortcomings. No surprise there. However, great effort is made to make it better documentation, and especially to keep it up to date; if you find any errors or omissions, please report them. There are many ways how to do that, see the GCC homepage. but probably you feel saying that is "not allowed" on non-gcc lists?) You're allowed to say whatever you want. Let's have a quote again shall we? I said: If you find any problems/shortcomings in the GCC documentation, please file a PR, don't go whine on some unrelated mailing lists. Thank you. I read that as a friendly request, not a prohibition. Well maybe not actually friendly, more a bit angry. A request, either way. As for the "PR" "Problem report", a bugzilla ticket. Sorry for using terminology unknown to you. you're requesting me to file with GCC for this, that gcc-patches@ thread did precisely that Actually not -- PRs make sure issues aren't forgotten (although they might gather dust, sure). But yes, submitting patches is a Great Thing(tm). 
and more (submitted a patch to said documentation -- and no, saying "documentation might get added later" is totally bogus and nonsensical -- documentation exists to document current behaviour, not past). When code like you want to write becomes a supported feature, that will be reflected in the user manual. It is completely nonsensical to expect everything that is *not* a supported feature to be mentioned there. I wouldn't have replied, really, if you weren't so provoking. Hey, maybe that character trait is good for something, then. Now to build a business plan around it... Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Satyam Sharma wrote: > > No code does (or would do, or should do): > > x.counter++; > > on an "atomic_t x;" anyway. That's just an example of a general problem. No, you don't use "x.counter++". But you *do* use if (atomic_read(&x) <= 1) and loading into a register is stupid and pointless, when you could just do it as a regular memory-operand to the cmp instruction. And as far as the compiler is concerned, the problem is the 100% same: combining operations with the volatile memop. The fact is, a compiler that thinks that movl mem,reg cmpl $val,reg is any better than cmpl $val,mem is just not a very good compiler. But when talking about "volatile", that's exactly what you always get (and always have gotten - this is not a regression, and I doubt gcc is alone in this). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > > > > The "asm volatile" implementation does have exactly the same > > > > > reordering guarantees as the "volatile cast" thing, > > > > > > > > I don't think so. > > > > > > "asm volatile" creates a side effect. > > > > Yeah. > > > > > Side effects aren't > > > allowed to be reordered wrt sequence points. > > > > Yeah. > > > > > This is exactly > > > the same reason as why "volatile accesses" cannot be reordered. > > > > No, the code in that sub-thread I earlier pointed you at *WAS* written > > such that there was a sequence point after all the uses of that volatile > > access cast macro, and _therefore_ we were safe from re-ordering > > (behaviour guaranteed by the C standard). > > And exactly the same is true for the "asm" version. > > > Now, one cannot fantasize that "volatile asms" are also sequence points. > > Sure you can do that. I don't though. > > > In fact such an argument would be sadly mistaken, because "sequence > > points" are defined by the C standard and it'd be horribly wrong to > > even _try_ claiming that the C standard knows about "volatile asms". > > That's nonsense. GCC can extend the C standard any way they > bloody well please -- witness the fact that they added an > extra class of side effects... > > > > > Read the relevant GCC documentation. > > > > > > I did, yes. > > > > No, you didn't read: > > > > http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html > > > > Read the bit about the need for artificial dependencies, and the example > > given there: > > > > asm volatile("mtfsf 255,%0" : : "f" (fpenv)); > > sum = x + y; > > > > The docs explicitly say the addition can be moved before the "volatile > > asm". Hopefully, as you know, (x + y) is an C "expression" and hence > > a "sequence point" as defined by the standard. > > The _end of a full expression_ is a sequence point, not every > expression. And that is irrelevant here anyway. > > It is perfectly fine to compute x+y any time before the > assignment; the C compiler is allowed to compute it _after_ > the assignment even, if it could figure out how ;-) > > x+y does not contain a side effect, you know. > > > I know there is also stuff written about "side-effects" there which > > _could_ give the same semantic w.r.t. sequence points as the volatile > > access casts, > > s/could/does/ > > > but hey, it's GCC's own documentation, you obviously can't > > find fault with _me_ if there's wrong stuff written in there. Say that > > to GCC ... > > There's nothing wrong there. > > > See, "volatile" C keyword, for all it's ill-definition and dodgy > > semantics, is still at least given somewhat of a treatment in the C > > standard (whose quality is ... ummm, sadly not always good and clear, > > but unsurprisingly, still about 5,482 orders-of-magnitude times > > better than GCC docs). > > If you find any problems/shortcomings in the GCC documentation, > please file a PR, don't go whine on some unrelated mailing lists. > Thank you. > > > Semantics of "volatile" as applies to inline > > asm, OTOH? You're completely relying on the compiler for that ... > > Yes, and? GCC promises the behaviour it has documented. LOTS there, which obviously isn't correct, but which I'll reply to later, easier stuff first. (call this "handwaving" if you want, but don't worry, I /will/ bother myself to reply) > > > > [ of course, if the (latest) GCC documentation is *yet again* > > > > wrong, then alright, not much I can do about it, is there. 
] > > > > > > There was (and is) nothing wrong about the "+m" documentation, if > > > that is what you are talking about. It could be extended now, to > > > allow "+m" -- but that takes more than just "fixing" the documentation. > > > > No, there was (and is) _everything_ wrong about the "+" documentation as > > applies to memory-constrained operands. I don't give a whit if it's > > some workaround in their gimplifier, or the other, that makes it possible > > to use "+m" (like the current kernel code does). The docs suggest > > otherwise, so there's obviously a clear disconnect between the docs and > > actual GCC behaviour. > > The documentation simply doesn't say "+m" is allowed. The code to > allow it was added for the benefit of people who do not read the > documentation. Documentation for "+m" might get added later if it > is decided this [the code, not the documentation] is a sane thing > to have (which isn't directly obvious). Huh? "If the (current) documentation doesn't match up with the (current) code, then _at least one_ of them has to be (as of current) wrong." I wonder how could you even try to disagree with that. And I didn't go whining about this ... you asked me. (I think I'd said something to the effect of GCC docs are often wrong, which is true, but probably you feel saying that is "not allowed" on non-gcc lists?) As for the "PR" you're requesting me to file with GCC for this, that gcc-patches@ threa
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
The "asm volatile" implementation does have exactly the same reordering guarantees as the "volatile cast" thing, I don't think so. "asm volatile" creates a side effect. Yeah. Side effects aren't allowed to be reordered wrt sequence points. Yeah. This is exactly the same reason as why "volatile accesses" cannot be reordered. No, the code in that sub-thread I earlier pointed you at *WAS* written such that there was a sequence point after all the uses of that volatile access cast macro, and _therefore_ we were safe from re-ordering (behaviour guaranteed by the C standard). And exactly the same is true for the "asm" version. Now, one cannot fantasize that "volatile asms" are also sequence points. Sure you can do that. I don't though. In fact such an argument would be sadly mistaken, because "sequence points" are defined by the C standard and it'd be horribly wrong to even _try_ claiming that the C standard knows about "volatile asms". That's nonsense. GCC can extend the C standard any way they bloody well please -- witness the fact that they added an extra class of side effects... Read the relevant GCC documentation. I did, yes. No, you didn't read: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html Read the bit about the need for artificial dependencies, and the example given there: asm volatile("mtfsf 255,%0" : : "f" (fpenv)); sum = x + y; The docs explicitly say the addition can be moved before the "volatile asm". Hopefully, as you know, (x + y) is an C "expression" and hence a "sequence point" as defined by the standard. The _end of a full expression_ is a sequence point, not every expression. And that is irrelevant here anyway. It is perfectly fine to compute x+y any time before the assignment; the C compiler is allowed to compute it _after_ the assignment even, if it could figure out how ;-) x+y does not contain a side effect, you know. I know there is also stuff written about "side-effects" there which _could_ give the same semantic w.r.t. sequence points as the volatile access casts, s/could/does/ but hey, it's GCC's own documentation, you obviously can't find fault with _me_ if there's wrong stuff written in there. Say that to GCC ... There's nothing wrong there. See, "volatile" C keyword, for all it's ill-definition and dodgy semantics, is still at least given somewhat of a treatment in the C standard (whose quality is ... ummm, sadly not always good and clear, but unsurprisingly, still about 5,482 orders-of-magnitude times better than GCC docs). If you find any problems/shortcomings in the GCC documentation, please file a PR, don't go whine on some unrelated mailing lists. Thank you. Semantics of "volatile" as applies to inline asm, OTOH? You're completely relying on the compiler for that ... Yes, and? GCC promises the behaviour it has documented. [ of course, if the (latest) GCC documentation is *yet again* wrong, then alright, not much I can do about it, is there. ] There was (and is) nothing wrong about the "+m" documentation, if that is what you are talking about. It could be extended now, to allow "+m" -- but that takes more than just "fixing" the documentation. No, there was (and is) _everything_ wrong about the "+" documentation as applies to memory-constrained operands. I don't give a whit if it's some workaround in their gimplifier, or the other, that makes it possible to use "+m" (like the current kernel code does). The docs suggest otherwise, so there's obviously a clear disconnect between the docs and actual GCC behaviour. 
The documentation simply doesn't say "+m" is allowed. The code to allow it was added for the benefit of people who do not read the documentation. Documentation for "+m" might get added later if it is decided this [the code, not the documentation] is a sane thing to have (which isn't directly obvious). [ You seem to often take issue with _amazingly_ petty and pedantic things, by the way :-) ] If you're talking details, you better get them right. Handwaving is fine with me as long as you're not purporting you're not. And I simply cannot stand false assertions. You can always ignore me if _you_ take issue with _that_ :-) Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
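For completeness, the workaround the GCC manual suggests for that mtfsf example is an artificial dependency, which ties the later computation to the asm so it cannot be scheduled ahead of it (quoted here from the Extended-Asm documentation, modulo formatting):

	asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
	sum = x + y;	/* now cannot be moved before the mtfsf */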
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Nick Piggin wrote: > Satyam Sharma wrote: > > > I didn't quite understand what you said here, so I'll tell what I think: > > > > * foo() is a compiler barrier if the definition of foo() is invisible to > > the compiler at a callsite. > > > > * foo() is also a compiler barrier if the definition of foo() includes > > a barrier, and it is inlined at the callsite. > > > > If the above is wrong, or if there's something else at play as well, > > do let me know. > > [...] > If a function is not completely visible to the compiler (so it can't > determine whether a barrier could be in it or not), then it must always > assume it will contain a barrier so it always does the right thing. Yup, that's what I'd said just a few sentences above, as you can see. I was actually asking for "elaboration" on "how a compiler determines that function foo() (say foo == schedule), even when it cannot see that it has a barrier(), as you'd mentioned, is a 'sleeping' function" actually, and whether compilers have a "notion of sleep to automatically assume a compiler barrier whenever such a sleeping function foo() is called". But I think you've already qualified the discussion to this kernel, so okay, I shouldn't nit-pick anymore. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
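A small stand-alone illustration of the visibility point (hypothetical functions, nothing kernel-specific):

	int shared;				/* global, may be touched elsewhere */

	extern void black_box(void);		/* body in another translation unit */
	static inline void transparent(void) { }	/* body fully visible       */

	int demo(void)
	{
		int a = shared;
		black_box();		/* could contain a barrier or modify
					   "shared" -- GCC must reload it below */
		int b = shared;

		int c = shared;
		transparent();		/* visibly touches nothing -- GCC may
					   reuse the value it cached for c      */
		int d = shared;

		return a + b + c + d;
	}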
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > > > > atomic_dec() writes > > > > > to memory, so it _does_ have "volatile semantics", implicitly, as > > > > > long as the compiler cannot optimise the atomic variable away > > > > > completely -- any store counts as a side effect. > > > > > > > > I don't think an atomic_dec() implemented as an inline "asm volatile" > > > > or one that uses a "forget" macro would have the same re-ordering > > > > guarantees as an atomic_dec() that uses a volatile access cast. > > > > > > The "asm volatile" implementation does have exactly the same > > > reordering guarantees as the "volatile cast" thing, > > > > I don't think so. > > "asm volatile" creates a side effect. Yeah. > Side effects aren't > allowed to be reordered wrt sequence points. Yeah. > This is exactly > the same reason as why "volatile accesses" cannot be reordered. No, the code in that sub-thread I earlier pointed you at *WAS* written such that there was a sequence point after all the uses of that volatile access cast macro, and _therefore_ we were safe from re-ordering (behaviour guaranteed by the C standard). But you seem to be missing the simple and basic fact that: (something_that_has_side_effects || statement) != something_that_is_a_sequence_point Now, one cannot fantasize that "volatile asms" are also sequence points. In fact such an argument would be sadly mistaken, because "sequence points" are defined by the C standard and it'd be horribly wrong to even _try_ claiming that the C standard knows about "volatile asms". > > > if that is > > > implemented by GCC in the "obvious" way. Even a "plain" asm() > > > will do the same. > > > > Read the relevant GCC documentation. > > I did, yes. No, you didn't read: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html Read the bit about the need for artificial dependencies, and the example given there: asm volatile("mtfsf 255,%0" : : "f" (fpenv)); sum = x + y; The docs explicitly say the addition can be moved before the "volatile asm". Hopefully, as you know, (x + y) is an C "expression" and hence a "sequence point" as defined by the standard. So the "volatile asm" should've happened before it, right? Wrong. I know there is also stuff written about "side-effects" there which _could_ give the same semantic w.r.t. sequence points as the volatile access casts, but hey, it's GCC's own documentation, you obviously can't find fault with _me_ if there's wrong stuff written in there. Say that to GCC ... See, "volatile" C keyword, for all it's ill-definition and dodgy semantics, is still at least given somewhat of a treatment in the C standard (whose quality is ... ummm, sadly not always good and clear, but unsurprisingly, still about 5,482 orders-of-magnitude times better than GCC docs). Semantics of "volatile" as applies to inline asm, OTOH? You're completely relying on the compiler for that ... > > [ of course, if the (latest) GCC documentation is *yet again* > > wrong, then alright, not much I can do about it, is there. ] > > There was (and is) nothing wrong about the "+m" documentation, if > that is what you are talking about. It could be extended now, to > allow "+m" -- but that takes more than just "fixing" the documentation. No, there was (and is) _everything_ wrong about the "+" documentation as applies to memory-constrained operands. I don't give a whit if it's some workaround in their gimplifier, or the other, that makes it possible to use "+m" (like the current kernel code does). 
The docs suggest otherwise, so there's obviously a clear disconnect between the docs and actual GCC behaviour. [ You seem to often take issue with _amazingly_ petty and pedantic things, by the way :-) ] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
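[ For concreteness, here is the documentation example referred to above, together with the artificial-dependency workaround that the same "Extended Asm" docs suggest. PowerPC-only and purely illustrative; the names fpenv/sum/x/y come from the docs, the function names are made up. ]

double fpenv, sum, x, y;

void set_rounding_broken(void)
{
        /* As cited above: nothing ties the asm to the addition, so the
           compiler may schedule "sum = x + y" before the volatile asm. */
        asm volatile ("mtfsf 255,%0" : : "f" (fpenv));
        sum = x + y;
}

void set_rounding_fixed(void)
{
        /* The documented workaround: give the asm an artificial output
           operand that the later code writes, so it cannot be moved
           past that code. */
        asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
        sum = x + y;
}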
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Christoph Lameter wrote: > On Fri, 17 Aug 2007, Paul E. McKenney wrote: > > > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: > > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: > > > > > > > > gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) > > > > > > I had totally forgotten that I'd already filed that bug more > > > than six years ago until they just closed yours as a duplicate > > > of mine :) > > > > > > Good luck in getting it fixed! > > > > Well, just got done re-opening it for the third time. And a local > > gcc community member advised me not to give up too easily. But I > > must admit that I am impressed with the speed that it was identified > > as duplicate. > > > > Should be entertaining! ;-) > > Right. ROTFL... volatile actually breaks atomic_t instead of making it > safe. x++ becomes a register load, increment and a register store. Without > volatile we can increment the memory directly. No code does (or would do, or should do): x.counter++; on an "atomic_t x;" anyway. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Paul E. McKenney wrote: > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: > > > > > > gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) > > > > I had totally forgotten that I'd already filed that bug more > > than six years ago until they just closed yours as a duplicate > > of mine :) > > > > Good luck in getting it fixed! > > Well, just got done re-opening it for the third time. And a local > gcc community member advised me not to give up too easily. But I > must admit that I am impressed with the speed that it was identified > as duplicate. > > Should be entertaining! ;-) Right. ROTFL... volatile actually breaks atomic_t instead of making it safe. x++ becomes a register load, increment and a register store. Without volatile we can increment the memory directly. It seems that volatile requires that the variable is loaded into a register first and then operated upon. Understandable when you think about volatile being used to access memory mapped I/O registers where a RMW operation could be problematic. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3506 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
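[ A minimal illustration of the point above; "vcount" and "count" are made-up names, and the exact code generated of course varies by compiler version and target. ]

volatile int vcount;
int count;

void tick(void)
{
        vcount++;       /* gcc of that era: separate load, add, store      */
        count++;        /* may be emitted as a single read-modify-write
                           instruction, e.g. "incl count(%rip)" on x86-64  */
}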
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: > > > > gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) > > I had totally forgotten that I'd already filed that bug more > than six years ago until they just closed yours as a duplicate > of mine :) > > Good luck in getting it fixed! Well, just got done re-opening it for the third time. And a local gcc community member advised me not to give up too easily. But I must admit that I am impressed with the speed that it was identified as duplicate. Should be entertaining! ;-) Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. "volatile" has nothing to do with reordering. If you're talking of "volatile" the type-qualifier keyword, then http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows otherwise. I'm not sure what in that mail you mean, but anyway... Yes, of course, the fact that "volatile" creates a side effect prevents certain things from being reordered wrt the atomic_dec(); but the atomic_dec() has a side effect *already* so the volatile doesn't change anything. atomic_dec() writes to memory, so it _does_ have "volatile semantics", implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. I don't think an atomic_dec() implemented as an inline "asm volatile" or one that uses a "forget" macro would have the same re-ordering guarantees as an atomic_dec() that uses a volatile access cast. The "asm volatile" implementation does have exactly the same reordering guarantees as the "volatile cast" thing, if that is implemented by GCC in the "obvious" way. Even a "plain" asm() will do the same. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
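[ For concreteness, the two shapes being compared in this sub-thread, written as a sketch: the function names are made up, neither is any architecture's real atomic_dec(), and the asm version uses the i386-style "=m"/"m" constraints with the LOCK prefix left out. ]

/* (a) the "volatile access cast" flavour */
static inline void dec_via_volatile_cast(int *p)
{
        (*(volatile int *)p)--;
}

/* (b) the "asm volatile" flavour (x86 mnemonic, illustration only) */
static inline void dec_via_asm_volatile(int *p)
{
        __asm__ __volatile__("decl %0" : "=m" (*p) : "m" (*p));
}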
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: > > gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) I had totally forgotten that I'd already filed that bug more than six years ago until they just closed yours as a duplicate of mine :) Good luck in getting it fixed! Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
atomic_dec() writes to memory, so it _does_ have "volatile semantics", implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. I don't think an atomic_dec() implemented as an inline "asm volatile" or one that uses a "forget" macro would have the same re-ordering guarantees as an atomic_dec() that uses a volatile access cast. The "asm volatile" implementation does have exactly the same reordering guarantees as the "volatile cast" thing, I don't think so. "asm volatile" creates a side effect. Side effects aren't allowed to be reordered wrt sequence points. This is exactly the same reason as why "volatile accesses" cannot be reordered. if that is implemented by GCC in the "obvious" way. Even a "plain" asm() will do the same. Read the relevant GCC documentation. I did, yes. [ of course, if the (latest) GCC documentation is *yet again* wrong, then alright, not much I can do about it, is there. ] There was (and is) nothing wrong about the "+m" documentation, if that is what you are talking about. It could be extended now, to allow "+m" -- but that takes more than just "fixing" the documentation. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 08:50:30PM -0700, Linus Torvalds wrote: > Just try it yourself: > > volatile int i; > int j; > > int testme(void) > { > return i <= 1; > } > > int testme2(void) > { > return j <= 1; > } > > and compile with all the optimizations you can. > > I get: > > testme: > movl i(%rip), %eax > subl $1, %eax > setle %al > movzbl %al, %eax > ret > > vs > > testme2: > xorl %eax, %eax > cmpl $1, j(%rip) > setle %al > ret > > (now, whether that "xorl + setle" is better than "setle + movzbl", I don't > really know - maybe it is. But that's not the point. The point is the > difference between > > movl i(%rip), %eax > subl $1, %eax > > and > > cmpl $1, j(%rip) gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) [ This is exactly equivalent to using "+m" in the constraints, as recently explained on a GCC list somewhere, in response to the patch in my bitops series a few weeks back where I thought "+m" was bogus. ] [It wasn't explained on a GCC list in response to your patch, as far as I can see -- if I missed it, please point me to an archived version of it]. http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html Ah yes, that old thread, thank you. That's when _I_ came to know how GCC interprets "+m", but probably this has been explained on those lists multiple times. Who cares, anyway? I just couldn't find the thread you meant, I thought I might have missed it, that's all :-) Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > > > No it does not have any volatile semantics. atomic_dec() can be > > > > reordered > > > > at will by the compiler within the current basic unit if you do not add > > > > a > > > > barrier. > > > > > > "volatile" has nothing to do with reordering. > > > > If you're talking of "volatile" the type-qualifier keyword, then > > http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows > > otherwise. > > I'm not sure what in that mail you mean, but anyway... > > Yes, of course, the fact that "volatile" creates a side effect > prevents certain things from being reordered wrt the atomic_dec(); > but the atomic_dec() has a side effect *already* so the volatile > doesn't change anything. That's precisely what that sub-thread (read down to the last mail there, and not the first mail only) shows. So yes, "volatile" does have something to do with re-ordering (as guaranteed by the C standard). > > > atomic_dec() writes > > > to memory, so it _does_ have "volatile semantics", implicitly, as > > > long as the compiler cannot optimise the atomic variable away > > > completely -- any store counts as a side effect. > > > > I don't think an atomic_dec() implemented as an inline "asm volatile" > > or one that uses a "forget" macro would have the same re-ordering > > guarantees as an atomic_dec() that uses a volatile access cast. > > The "asm volatile" implementation does have exactly the same > reordering guarantees as the "volatile cast" thing, I don't think so. > if that is > implemented by GCC in the "obvious" way. Even a "plain" asm() > will do the same. Read the relevant GCC documentation. [ of course, if the (latest) GCC documentation is *yet again* wrong, then alright, not much I can do about it, is there. ] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, 18 Aug 2007, Segher Boessenkool wrote: > > #define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) > > > > [ This is exactly equivalent to using "+m" in the constraints, as recently > > explained on a GCC list somewhere, in response to the patch in my bitops > > series a few weeks back where I thought "+m" was bogus. ] > > [It wasn't explained on a GCC list in response to your patch, as > far as I can see -- if I missed it, please point me to an archived > version of it]. http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html is a follow-up in the thread on the [EMAIL PROTECTED] mailing list, which began with: http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01677.html that was posted by Jan Hubicka, as he quotes in that initial posting, after I had submitted: http://lkml.org/lkml/2007/7/23/252 which was a (wrong) patch to "rectify" what I thought was the "bogus" "+m" constraint, as per the quoted extract from gcc docs (that was given in that (wrong) patch's changelog). That's when _I_ came to know how GCC interprets "+m", but probably this has been explained on those lists multiple times. Who cares, anyway? > One last time: it isn't equivalent on older (but still supported > by Linux) versions of GCC. Current versions of GCC allow it, but > it has no documented behaviour at all, so use it at your own risk. True. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) [ This is exactly equivalent to using "+m" in the constraints, as recently explained on a GCC list somewhere, in response to the patch in my bitops series a few weeks back where I thought "+m" was bogus. ] [It wasn't explained on a GCC list in response to your patch, as far as I can see -- if I missed it, please point me to an archived version of it]. One last time: it isn't equivalent on older (but still supported by Linux) versions of GCC. Current versions of GCC allow it, but it has no documented behaviour at all, so use it at your own risk. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
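[ Side by side, the macro quoted above and the "+m" spelling it is being compared with; "forget_rw" is a made-up name for the latter, and as noted above the "+m" form is the one that older GCC versions do not reliably support. ]

#define forget(a)       __asm__ __volatile__ ("" : "=m" (a) : "m" (a))
#define forget_rw(a)    __asm__ __volatile__ ("" : "+m" (a))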
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Here, I should obviously admit that the semantics of *(volatile int *)& aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) "precisely what constitutes an access to an object of volatile-qualified type is implementation-defined", but GCC does help us out here by doing the right thing. Where do you get that idea? Try a testcase (experimentally verify). That doesn't prove anything. Experiments can only disprove things. GCC manual, section 6.1, "When is a Volatile Object Accessed?" doesn't say anything of the kind. True, "implementation-defined" as per the C standard _is_ supposed to mean "unspecified behaviour where each implementation documents how the choice is made". So ok, probably GCC isn't "documenting" this implementation-defined behaviour which it is supposed to, but can't really fault them much for this, probably. GCC _is_ documenting this, namely in this section 6.1. It doesn't mention volatile-casted stuff. Draw your own conclusions. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
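[ The construct being argued about, written out as a sketch; "read_once" is a made-up name (the thread elsewhere calls similar macros ACCESS_ONCE or WHATEVER), and whether the cast counts as a "volatile access" is exactly the implementation-defined question discussed above. ]

#define read_once(x)    (*(volatile typeof(x) *)&(x))

int shared;

int snapshot(void)
{
        return read_once(shared);       /* intended to force one real load */
}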
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Now the second wording *IS* technically correct, but come on, it's 24 words long whereas the original one was 3 -- and hopefully anybody reading the shorter phrase *would* have known anyway what was meant, without having to be pedantic about it :-) Well you were talking pretty formal (and detailed) stuff, so IMHO it's good to have that exactly correct. Sure it's nicer to use small words most of the time :-) Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
In a reasonable world, gcc should just make that be (on x86) addl $1,i(%rip) on x86-64, which is indeed what it does without the volatile. But with the volatile, the compiler gets really nervous, and doesn't dare do it in one instruction, and thus generates crap like movl i(%rip), %eax addl $1, %eax movl %eax, i(%rip) instead. For no good reason, except that "volatile" just doesn't have any good/clear semantics for the compiler, so most compilers will just make it be "I will not touch this access in any way, shape, or form". Including even trivially correct instruction optimization/combination. It's just a (target-specific, perhaps) missed-optimisation kind of bug in GCC. Care to file a bug report? but is (again) something that gcc doesn't dare do, since "i" is volatile. Just nobody taught it it can do this; perhaps no one wanted to add optimisations like that, maybe with a reasoning like "people who hit the go-slow-in-unspecified-ways button should get what they deserve" ;-) Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
(and yes, it is perfectly legitimate to want a non-volatile read for a data type that you also want to do atomic RMW operations on) ...which is undefined behaviour in C (and GCC) when that data is declared volatile, which is a good argument against implementing atomics that way in itself. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
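[ A minimal example of the undefined behaviour being referred to; names are made up. ]

volatile int v;         /* object defined with a volatile-qualified type */

int plain_read(void)
{
        /* Referring to an object defined volatile through a non-volatile
           lvalue is undefined behaviour (C99 6.7.3) -- the point above. */
        return *(int *)&v;
}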
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Of course, since *normal* accesses aren't necessarily limited wrt re-ordering, the question then becomes one of "with regard to *what* does it limit re-ordering?". A C compiler that re-orders two different volatile accesses that have a sequence point in between them is pretty clearly a buggy compiler. So at a minimum, it limits re-ordering wrt other volatiles (assuming sequence points exists). It also means that the compiler cannot move it speculatively across conditionals, but other than that it's starting to get fuzzy. This is actually really well-defined in C, not fuzzy at all. "Volatile accesses" are a side effect, and no side effects can be reordered with respect to sequence points. The side effects that matter in the kernel environment are: 1) accessing a volatile object; 2) modifying an object; 3) volatile asm(); 4) calling a function that does any of these. We certainly should avoid volatile whenever possible, but "because it's fuzzy wrt reordering" is not a reason -- all alternatives have exactly the same issues. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
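[ A small sketch of the rule stated above: a conforming compiler must keep the two volatile stores in program order, while the two plain stores may be reordered or combined. Names are made up. ]

volatile int a, b;
int x, y;

void f(void)
{
        a = 1;          /* volatile access: a side effect                */
        b = 2;          /* must not be emitted before the store to a     */
        x = 3;          /* plain stores: the compiler may reorder or     */
        y = 4;          /*   merge these as it sees fit                  */
}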
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 12:49:00PM -0700, Arjan van de Ven wrote: > > On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote: > > > > What about reading values modified in interrupt handlers, as in your > > > > "random" case? Or is this a bug where the user of atomic_read() is > > > > invalidly expecting a read each time it is called? > > > > > > the interrupt handler case is an SMP case since you do not know > > > beforehand what cpu your interrupt handler will run on. > > > > With the exception of per-CPU variables, yes. > > if you're spinning waiting for a per-CPU variable to get changed by an > interrupt handler... you have bigger problems than "volatile" ;-) That would be true, if you were doing that. But you might instead be simply making sure that the mainline actions were seen in order by the interrupt handler. My current example is the NMI-safe rcu_read_lock() implementation for realtime. Not the common case, I will admit, but still real. ;-) Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote: > > > What about reading values modified in interrupt handlers, as in your > > > "random" case? Or is this a bug where the user of atomic_read() is > > > invalidly expecting a read each time it is called? > > > > the interrupt handler case is an SMP case since you do not know > > beforehand what cpu your interrupt handler will run on. > > With the exception of per-CPU variables, yes. if you're spinning waiting for a per-CPU variable to get changed by an interrupt handler... you have bigger problems than "volatile" ;-) -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 11:54:33AM -0700, Arjan van de Ven wrote: > > On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote: > > Linus Torvalds wrote: > > > > > - in other words, the *only* possible meaning for "volatile" is a purely > > >single-CPU meaning. And if you only have a single CPU involved in the > > >process, the "volatile" is by definition pointless (because even > > >without a volatile, the compiler is required to make the C code appear > > >consistent as far as a single CPU is concerned). > > > > I assume you mean "except for IO-related code and 'random' values like > > jiffies" as you mention later on? I assume other values set in > > interrupt handlers would count as "random" from a volatility perspective? > > > > > So anybody who argues for "volatile" fixing bugs is fundamentally > > > incorrect. It does NO SUCH THING. By arguing that, such people only show > > > that you have no idea what they are talking about. > > > > What about reading values modified in interrupt handlers, as in your > > "random" case? Or is this a bug where the user of atomic_read() is > > invalidly expecting a read each time it is called? > > the interrupt handler case is an SMP case since you do not know > beforehand what cpu your interrupt handler will run on. With the exception of per-CPU variables, yes. Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Chris Friesen wrote: > > I assume you mean "except for IO-related code and 'random' values like > jiffies" as you mention later on? Yes. There *are* valid uses for "volatile", but they have remained the same for the last few years: - "jiffies" - internal per-architecture IO implementations that can do them as normal stores. > I assume other values set in interrupt handlers would count as "random" > from a volatility perspective? I don't really see any valid case. I can imagine that you have your own "jiffy" counter in a driver, but what's the point, really? I'd suggest not using volatile, and using barriers instead. > > > So anybody who argues for "volatile" fixing bugs is fundamentally > > incorrect. It does NO SUCH THING. By arguing that, such people only > > show that you have no idea what they are talking about. > What about reading values modified in interrupt handlers, as in your > "random" case? Or is this a bug where the user of atomic_read() is > invalidly expecting a read each time it is called? Quite frankly, the biggest reason for using "volatile" on jiffies was really historic. So even the "random" case is not really a very strong one. You'll notice that anybody who is actually careful will be using sequence locks for the jiffy accesses, if only because the *full* jiffy count is actually a 64-bit value, and so you cannot get it atomically on a 32-bit architecture even on a single CPU (ie a timer interrupt might happen in between reading the low and the high word, so "volatile" is only used for the low 32 bits). So even for jiffies, we actually have: extern u64 __jiffy_data jiffies_64; extern unsigned long volatile __jiffy_data jiffies; where the *real* jiffies is not volatile: the volatile one is using linker tricks to alias the low 32 bits: - arch/i386/kernel/vmlinux.lds.S: ... jiffies = jiffies_64; ... and the only reason we do all these games is (a) it works and (b) it's legacy. Note how I do *not* say "(c) it's a good idea". Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
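[ For reference, roughly how the full 64-bit jiffies value was read on 32-bit machines in kernels of that era -- a from-memory sketch of get_jiffies_64(), not a verbatim copy; it assumes the xtime_lock seqlock and the jiffies_64 declaration quoted above. ]

u64 get_jiffies_64(void)
{
        unsigned long seq;
        u64 ret;

        do {
                seq = read_seqbegin(&xtime_lock);
                ret = jiffies_64;
        } while (read_seqretry(&xtime_lock, seq));
        return ret;
}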
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote: > Linus Torvalds wrote: > > > - in other words, the *only* possible meaning for "volatile" is a purely > >single-CPU meaning. And if you only have a single CPU involved in the > >process, the "volatile" is by definition pointless (because even > >without a volatile, the compiler is required to make the C code appear > >consistent as far as a single CPU is concerned). > > I assume you mean "except for IO-related code and 'random' values like > jiffies" as you mention later on? I assume other values set in > interrupt handlers would count as "random" from a volatility perspective? > > > So anybody who argues for "volatile" fixing bugs is fundamentally > > incorrect. It does NO SUCH THING. By arguing that, such people only show > > that you have no idea what they are talking about. > > What about reading values modified in interrupt handlers, as in your > "random" case? Or is this a bug where the user of atomic_read() is > invalidly expecting a read each time it is called? the interrupt handler case is an SMP case since you do not know beforehand what cpu your interrupt handler will run on. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sat, Aug 18, 2007 at 12:01:38AM +0530, Satyam Sharma wrote: > > > On Fri, 17 Aug 2007, Paul E. McKenney wrote: > > > On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: > > > > > > On Thu, 16 Aug 2007, Paul E. McKenney wrote: > > > > > > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: > > > > > > > > > > First of all, I think this illustrates that what you want > > > > > here has nothing to do with atomic ops. The ORDERED_WRT_IRQ > > > > > macro occurs a lot more times in your patch than atomic > > > > > reads/sets. So *assuming* that it was necessary at all, > > > > > then having an ordered variant of the atomic_read/atomic_set > > > > > ops could do just as well. > > > > > > > > Indeed. If I could trust atomic_read()/atomic_set() to cause the > > > > compiler > > > > to maintain ordering, then I could just use them instead of having to > > > > create an ORDERED_WRT_IRQ(). (Or ACCESS_ONCE(), as it is called in a > > > > different patch.) > > > > > > +#define WHATEVER(x) (*(volatile typeof(x) *)&(x)) > > > [...] > > > Also, this gives *zero* "re-ordering" guarantees that your code wants > > > as you've explained it below) -- neither w.r.t. CPU re-ordering (which > > > probably you don't care about) *nor* w.r.t. compiler re-ordering > > > (which you definitely _do_ care about). > > > > You are correct about CPU re-ordering (and about the fact that this > > example doesn't care about it), but not about compiler re-ordering. > > > > The compiler is prohibited from moving a volatile access across a sequence > > point. One example of a sequence point is a statement boundary. Because > > all of the volatile accesses in this code are separated by statement > > boundaries, a conforming compiler is prohibited from reordering them. > > Yes, you're right, and I believe precisely this was discussed elsewhere > as well today. > > But I'd call attention to what Herbert mentioned there. You're using > ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all: > > * Member "completed" of struct rcu_ctrlblk is a long. > * Per-cpu variable rcu_flipctr is an array of ints. > * Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of > struct task_struct are ints. > > So are you saying you're "having to use" this volatile-access macro > because you *couldn't* declare all the above as atomic_t and thus just > expect the right thing to happen by using the atomic ops API by default, > because it lacks volatile access semantics (on x86)? > > If so, then I wonder if using the volatile access cast is really the > best way to achieve (at least in terms of code clarity) the kind of > re-ordering guarantees it wants there. (there could be alternative > solutions, such as using barrier(), or that at bottom of this mail) > > What I mean is this: If you switch to atomic_t, and x86 switched to > make atomic_t have "volatile" semantics by default, the statements > would be simply a string of: atomic_inc(), atomic_add(), atomic_set(), > and atomic_read() statements, and nothing in there that clearly makes > it *explicit* that the code is correct (and not buggy) simply because > of the re-ordering guarantees that the C "volatile" type-qualifier > keyword gives us as per the standard. But now we're firmly in > "subjective" territory, so you or anybody could legitimately disagree. In any case, given Linus's note, it appears that atomic_read() and atomic_set() won't consistently have volatile semantics, at least not while the compiler generates such ugly code for volatile accesses. 
So I will continue with my current approach. In any case, I will not be using atomic_inc() or atomic_add() in this code, as doing so would more than double the overhead, even on machines that are the most efficient at implementing atomic operations. > > > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with > > > > atomic_read() and atomic_set(). Starting with __rcu_read_lock(): > > > > > > > > o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++" > > > > was ordered by the compiler after > > > > "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then > > > > suppose an NMI/SMI happened after the rcu_read_lock_nesting but > > > > before the rcu_flipctr. > > > > > > > > Then if there was an rcu_read_lock() in the SMI/NMI > > > > handler (which is perfectly legal), the nested rcu_read_lock() > > > > would believe that it could take the then-clause of the > > > > enclosing "if" statement. But because the rcu_flipctr per-CPU > > > > variable had not yet been incremented, an RCU updater would > > > > be within its rights to assume that there were no RCU reads > > > > in progress, thus possibly yanking a data structure out from > > > > under the reader in the SMI/NMI function. > > > > > > > > Fatal outcome. Note that only one CPU is involved here > > > >
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Linus Torvalds wrote: - in other words, the *only* possible meaning for "volatile" is a purely single-CPU meaning. And if you only have a single CPU involved in the process, the "volatile" is by definition pointless (because even without a volatile, the compiler is required to make the C code appear consistent as far as a single CPU is concerned). I assume you mean "except for IO-related code and 'random' values like jiffies" as you mention later on? I assume other values set in interrupt handlers would count as "random" from a volatility perspective? So anybody who argues for "volatile" fixing bugs is fundamentally incorrect. It does NO SUCH THING. By arguing that, such people only show that you have no idea what they are talking about. What about reading values modified in interrupt handlers, as in your "random" case? Or is this a bug where the user of atomic_read() is invalidly expecting a read each time it is called? Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Segher Boessenkool wrote: > > > atomic_dec() already has volatile behavior everywhere, so this is > > > semantically > > > okay, but this code (and any like it) should be calling cpu_relax() each > > > iteration through the loop, unless there's a compelling reason not to. > > > I'll > > > allow that for some hardware drivers (possibly this one) such a compelling > > > reason may exist, but hardware-independent core subsystems probably have > > > no > > > excuse. > > > > No it does not have any volatile semantics. atomic_dec() can be reordered > > at will by the compiler within the current basic unit if you do not add a > > barrier. > > "volatile" has nothing to do with reordering. If you're talking of "volatile" the type-qualifier keyword, then http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows otherwise. > atomic_dec() writes > to memory, so it _does_ have "volatile semantics", implicitly, as > long as the compiler cannot optimise the atomic variable away > completely -- any store counts as a side effect. I don't think an atomic_dec() implemented as an inline "asm volatile" or one that uses a "forget" macro would have the same re-ordering guarantees as an atomic_dec() that uses a volatile access cast. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Paul E. McKenney wrote: > On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: > > > > On Thu, 16 Aug 2007, Paul E. McKenney wrote: > > > > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: > > > > > > > > First of all, I think this illustrates that what you want > > > > here has nothing to do with atomic ops. The ORDERED_WRT_IRQ > > > > macro occurs a lot more times in your patch than atomic > > > > reads/sets. So *assuming* that it was necessary at all, > > > > then having an ordered variant of the atomic_read/atomic_set > > > > ops could do just as well. > > > > > > Indeed. If I could trust atomic_read()/atomic_set() to cause the compiler > > > to maintain ordering, then I could just use them instead of having to > > > create an ORDERED_WRT_IRQ(). (Or ACCESS_ONCE(), as it is called in a > > > different patch.) > > > > +#define WHATEVER(x)(*(volatile typeof(x) *)&(x)) > > [...] > > Also, this gives *zero* "re-ordering" guarantees that your code wants > > as you've explained it below) -- neither w.r.t. CPU re-ordering (which > > probably you don't care about) *nor* w.r.t. compiler re-ordering > > (which you definitely _do_ care about). > > You are correct about CPU re-ordering (and about the fact that this > example doesn't care about it), but not about compiler re-ordering. > > The compiler is prohibited from moving a volatile access across a sequence > point. One example of a sequence point is a statement boundary. Because > all of the volatile accesses in this code are separated by statement > boundaries, a conforming compiler is prohibited from reordering them. Yes, you're right, and I believe precisely this was discussed elsewhere as well today. But I'd call attention to what Herbert mentioned there. You're using ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all: * Member "completed" of struct rcu_ctrlblk is a long. * Per-cpu variable rcu_flipctr is an array of ints. * Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of struct task_struct are ints. So are you saying you're "having to use" this volatile-access macro because you *couldn't* declare all the above as atomic_t and thus just expect the right thing to happen by using the atomic ops API by default, because it lacks volatile access semantics (on x86)? If so, then I wonder if using the volatile access cast is really the best way to achieve (at least in terms of code clarity) the kind of re-ordering guarantees it wants there. (there could be alternative solutions, such as using barrier(), or that at bottom of this mail) What I mean is this: If you switch to atomic_t, and x86 switched to make atomic_t have "volatile" semantics by default, the statements would be simply a string of: atomic_inc(), atomic_add(), atomic_set(), and atomic_read() statements, and nothing in there that clearly makes it *explicit* that the code is correct (and not buggy) simply because of the re-ordering guarantees that the C "volatile" type-qualifier keyword gives us as per the standard. But now we're firmly in "subjective" territory, so you or anybody could legitimately disagree. > > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with > > > atomic_read() and atomic_set(). 
Starting with __rcu_read_lock(): > > > > > > o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++" > > > was ordered by the compiler after > > > "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then > > > suppose an NMI/SMI happened after the rcu_read_lock_nesting but > > > before the rcu_flipctr. > > > > > > Then if there was an rcu_read_lock() in the SMI/NMI > > > handler (which is perfectly legal), the nested rcu_read_lock() > > > would believe that it could take the then-clause of the > > > enclosing "if" statement. But because the rcu_flipctr per-CPU > > > variable had not yet been incremented, an RCU updater would > > > be within its rights to assume that there were no RCU reads > > > in progress, thus possibly yanking a data structure out from > > > under the reader in the SMI/NMI function. > > > > > > Fatal outcome. Note that only one CPU is involved here > > > because these are all either per-CPU or per-task variables. > > > > Ok, so you don't care about CPU re-ordering. Still, I should let you know > > that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you > > want is a full compiler optimization barrier(). > > No. See above. True, *(volatile foo *)& _will_ work for this case. But multiple calls to barrier() (granted, would invalidate all other optimizations also) would work as well, would it not? [ Interestingly, if you declared all those objects mentioned earlier as atomic_t, and x86(-64) switched to an __asm__ __volatile__ based variant for atomic_{read,set}_volatile(), the bugs you want to avoid would still be there. "volatile" the C language type-qualifier does have compiler re-ordering semanti
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. "volatile" has nothing to do with reordering. atomic_dec() writes to memory, so it _does_ have "volatile semantics", implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler optimisations an acceptable price to pay for some peace of mind. So why is this a good tradeoff? It certainly is better than disabling all compiler optimisations! It's easy to be better than something really stupid :) Sure, it wasn't me who made the comparison though. So i386 and x86-64 don't have volatiles there, and it saves them a few K of kernel text. Which has to be investigated. A few kB is a lot more than expected. What you need to justify is why it is a good tradeoff to make them volatile (which btw, is much harder to go the other way after we let people make those assumptions). My point is that people *already* made those assumptions. There are two ways to clean up this mess: 1) Have the "volatile" semantics by default, change the users that don't need it; 2) Have "non-volatile" semantics by default, change the users that do need it. Option 2) randomly breaks stuff all over the place, option 1) doesn't. Yeah 1) could cause some extremely minor speed or code size regression, but only temporarily until everything has been audited. I also think that just adding things to APIs in the hope it might fix up some bugs isn't really a good road to go down. Where do you stop? I look at it the other way: keeping the "volatile" semantics in atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs; Yeah, but we could add lots of things to help prevent bugs and would never be included. I would also contend that it helps _hide_ bugs and encourages people to be lazy when thinking about these things. Sure. We aren't _adding_ anything here though, not on the platforms where it is most likely to show up, anyway. Also, you dismiss the fact that we'd actually be *adding* volatile semantics back to the 2 most widely tested architectures (in terms of test time, number of testers, variety of configurations, and coverage of driver code). I'm not dismissing that. x86 however is one of the few architectures where mistakenly leaving out a "volatile" will not easily show up on user testing, since the compiler will very often produce a memory reference anyway because it has no registers to play with. This is a very important different from just keeping volatile semantics because it is basically a one-way API change. That's a good point. Maybe we should create _two_ new APIs, one explicitly going each way. certainly most people expect that behaviour, and also that behaviour is *needed* in some places and no other interface provides that functionality. I don't know that most people would expect that behaviour. I didn't conduct any formal poll either :-) Is there any documentation anywhere that would suggest this? Not really I think, no. But not the other way around, either. Most uses of it seem to expect it though. [some confusion about barriers wrt atomics snipped] What were you confused about? Me? Not much. Segher - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Nick Piggin wrote: > > That's not obviously just taste to me. Not when the primitive has many > (perhaps, the majority) of uses that do not require said barriers. And > this is not solely about the code generation (which, as Paul says, is > relatively minor even on x86). I prefer people to think explicitly > about barriers in their lockless code. Indeed. I think the important issues are: - "volatile" itself is simply a badly/weakly defined issue. The semantics of it as far as the compiler is concerned are really not very good, and in practice tends to boil down to "I will generate so bad code that nobody can accuse me of optimizing anything away". - "volatile" - regardless of how well or badly defined it is - is purely a compiler thing. It has absolutely no meaning for the CPU itself, so it at no point implies any CPU barriers. As a result, even if the compiler generates crap code and doesn't re-order anything, there's nothing that says what the CPU will do. - in other words, the *only* possible meaning for "volatile" is a purely single-CPU meaning. And if you only have a single CPU involved in the process, the "volatile" is by definition pointless (because even without a volatile, the compiler is required to make the C code appear consistent as far as a single CPU is concerned). So, let's take the example *buggy* code where we use "volatile" to wait for other CPU's: atomic_set(&var, 0); while (!atomic_read(&var)) /* nothing */; which generates an endless loop if we don't have atomic_read() imply volatile. The point here is that it's buggy whether the volatile is there or not! Exactly because the user expects multi-processing behaviour, but "volatile" doesn't actually give any real guarantees about it. Another CPU may have done: external_ptr = kmalloc(..); /* Setup is now complete, inform the waiter */ atomic_inc(&var); but the fact is, since the other CPU isn't serialized in any way, the "while-loop" (even in the presense of "volatile") doesn't actually work right! Whatever the "atomic_read()" was waiting for may not have completed, because we have no barriers! So if "volatile" makes a difference, it is invariably a sign of a bug in serialization (the one exception is for IO - we use "volatile" to avoid having to use inline asm for IO on x86) - and for "random values" like jiffies). So the question should *not* be whether "volatile" actually fixes bugs. It *never* fixes a bug. But what it can do is to hide the obvious ones. In other words, adding a volaile in the above kind of situation of "atomic_read()" will certainly turn an obvious bug into something that works "practically all of the time). So anybody who argues for "volatile" fixing bugs is fundamentally incorrect. It does NO SUCH THING. By arguing that, such people only show that you have no idea what they are talking about. So the only reason to add back "volatile" to the atomic_read() sequence is not to fix bugs, but to _hide_ the bugs better. They're still there, they are just a lot harder to trigger, and tend to be a lot subtler. And hey, sometimes "hiding bugs well enough" is ok. In this case, I'd argue that we've successfully *not* had the volatile there for eight months on x86-64, and that should tell people something. (Does _removing_ the volatile fix bugs? No - callers still need to think about barriers etc, and lots of people don't. 
So I'm not claiming that removing volatile fixes any bugs either, but I *am* claiming that: - removing volatile makes some bugs easier to see (which is mostly a good thing: they were there before, anyway). - removing volatile generates better code (which is a good thing, even if it's just 0.1%) - removing volatile removes a huge mental *bug* that lots of people seem to have, as shown by this whole thread. Anybody who thinks that "volatile" actually fixes anything has a gaping hole in their head, and we should remove volatile just to make sure that nobody thinks that it means something that it doesn't mean! In other words, this whole discussion has just convinced me that we should *not* add back "volatile" to "atomic_read()" - I was willing to do it for practical and "hide the bugs" reasons, but having seen people argue for it, thinking that it actually fixes something, I'm now convinced that the *last* thing we should do is to encourage that kind of superstitious thinking. "volatile" is like a black cat crossing the road. Sure, it affects *something* (at a minimum: before, the black cat was on one side of the road, afterwards it is on the other side of the road), but it has no bigger and longer-lasting direct effects. People who think "volatile" really matters are just fooling themselves. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
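[ A sketch of how the hand-off in the buggy example above can be made correct without leaning on "volatile": pair the flag with explicit barriers and keep the compiler honest with cpu_relax() in the loop. In real code a completion, spinlock or wait queue would usually be preferable; "struct foo", writer()/reader() and the initialisation step are made up, while "var"/"external_ptr" follow the example's naming. ]

struct foo { int data; };               /* stand-in for the real payload */

static atomic_t var = ATOMIC_INIT(0);
static struct foo *external_ptr;

void writer(void)
{
        external_ptr = kmalloc(sizeof(*external_ptr), GFP_KERNEL);
        /* ... initialise *external_ptr ... */
        smp_wmb();                      /* publish the setup before the flag */
        atomic_inc(&var);
}

void reader(void)
{
        while (!atomic_read(&var))
                cpu_relax();            /* CPU hint; also a compiler barrier */
        smp_rmb();                      /* if we saw the flag, see the setup */
        /* ... use *external_ptr ... */
}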
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: > > > On Thu, 16 Aug 2007, Paul E. McKenney wrote: > > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: > > > On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote: > > > > > > > > The compiler can also reorder non-volatile accesses. For an example > > > > patch that cares about this, please see: > > > > > > > > http://lkml.org/lkml/2007/8/7/280 > > > > > > > > This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and > > > > rcu_read_unlock() to ensure that accesses aren't reordered with respect > > > > to interrupt handlers and NMIs/SMIs running on that same CPU. > > > > > > Good, finally we have some code to discuss (even though it's > > > not actually in the kernel yet). > > > > There was some earlier in this thread as well. > > Hmm, I never quite got what all this interrupt/NMI/SMI handling and > RCU business you mentioned earlier was all about, but now that you've > pointed to the actual code and issues with it ... Glad to help... > > > First of all, I think this illustrates that what you want > > > here has nothing to do with atomic ops. The ORDERED_WRT_IRQ > > > macro occurs a lot more times in your patch than atomic > > > reads/sets. So *assuming* that it was necessary at all, > > > then having an ordered variant of the atomic_read/atomic_set > > > ops could do just as well. > > > > Indeed. If I could trust atomic_read()/atomic_set() to cause the compiler > > to maintain ordering, then I could just use them instead of having to > > create an ORDERED_WRT_IRQ(). (Or ACCESS_ONCE(), as it is called in a > > different patch.) > > +#define WHATEVER(x) (*(volatile typeof(x) *)&(x)) > > I suppose one could want volatile access semantics for stuff that's > a bit-field too, no? One could, but this is not supported in general. So if you want that, you need to use the usual bit-mask tricks and (for setting) atomic operations. > Also, this gives *zero* "re-ordering" guarantees that your code wants > as you've explained it below) -- neither w.r.t. CPU re-ordering (which > probably you don't care about) *nor* w.r.t. compiler re-ordering > (which you definitely _do_ care about). You are correct about CPU re-ordering (and about the fact that this example doesn't care about it), but not about compiler re-ordering. The compiler is prohibited from moving a volatile access across a sequence point. One example of a sequence point is a statement boundary. Because all of the volatile accesses in this code are separated by statement boundaries, a conforming compiler is prohibited from reordering them. > > > However, I still don't know which atomic_read/atomic_set in > > > your patch would be broken if there were no volatile. Could > > > you please point them out? > > > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with > > atomic_read() and atomic_set(). Starting with __rcu_read_lock(): > > > > o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++" > > was ordered by the compiler after > > "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then > > suppose an NMI/SMI happened after the rcu_read_lock_nesting but > > before the rcu_flipctr. > > > > Then if there was an rcu_read_lock() in the SMI/NMI > > handler (which is perfectly legal), the nested rcu_read_lock() > > would believe that it could take the then-clause of the > > enclosing "if" statement. 
But because the rcu_flipctr per-CPU > > variable had not yet been incremented, an RCU updater would > > be within its rights to assume that there were no RCU reads > > in progress, thus possibly yanking a data structure out from > > under the reader in the SMI/NMI function. > > > > Fatal outcome. Note that only one CPU is involved here > > because these are all either per-CPU or per-task variables. > > Ok, so you don't care about CPU re-ordering. Still, I should let you know > that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you > want is a full compiler optimization barrier(). No. See above. > [ Your code probably works now, and emits correct code, but that's > just because of gcc did what it did. Nothing in any standard, > or in any documented behaviour of gcc, or anything about the real > (or expected) semantics of "volatile" is protecting the code here. ] Really? Why doesn't the prohibition against moving volatile accesses across sequence points take care of this? > > o If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1" > > was ordered by the compiler to follow the > > "ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI > > happened between the two, then an __rcu_read_lock() in the NMI/SMI > > would incorrectly take the "else" clause of the enclosing "if" > > statement. If some other CPU flipped the rcu_ctrlblk.completed > > in the meantime, then the __rcu_read_lock() would (correctl
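[ A reconstructed sketch of the __rcu_read_lock() being discussed, pieced together from the ordering constraints described above -- not a verbatim copy of the linked patch. It assumes the kernel environment named in the thread (the per-CPU rcu_flipctr array, rcu_ctrlblk, and the task_struct fields). Each marked access must stay in program order with respect to an NMI/SMI arriving on this same CPU, which is what ORDERED_WRT_IRQ() -- a.k.a. ACCESS_ONCE() -- is there to guarantee. ]

void __rcu_read_lock(void)
{
        int idx;
        struct task_struct *me = current;
        int nesting;

        nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
        if (nesting != 0) {
                /* An enclosing rcu_read_lock() already covers us. */
                ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1;
        } else {
                unsigned long flags;

                local_irq_save(flags);
                idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
                ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++;
                ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1;
                ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx;
                local_irq_restore(flags);
        }
}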
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
>
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> >
> > > Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than "special" "volatile" accesses.
> >
> > This is obviously just a taste thing. Whether to have that forget(x) barrier as something the author should explicitly sprinkle in appropriate places in the code himself, or use a primitive that includes it itself.
>
> That's not obviously just taste to me. Not when the primitive has many (perhaps, the majority) of uses that do not require said barriers. And this is not solely about the code generation (which, as Paul says, is relatively minor even on x86).

See, you do *require* people to have to repeat the same things to you!

As has been written about enough times already, and if you followed the discussion on this thread, I am *not* proposing that atomic_read()'s semantics be changed to have any extra barriers. What is proposed is a different atomic_read_xxx() variant thereof, that those who do want that can use.

Now whether to have a kind of barrier ("volatile", whatever) in the atomic_read_xxx() itself, or whether to make the code writer explicitly write the order(x) in appropriate places in the code, _is_ a matter of taste.

> > That's definitely the point, why not. This is why "barrier()", being heavy-handed, is not the best option.
>
> That is _not_ the point [...]

Again, you're requiring me to repeat things that were already made evident on this thread (if you follow it).

This _is_ the point, because a lot of loops out there (too many of them, I WILL NOT bother citing file_name:line_number) end up having to use a barrier just because they're using a loop-exit condition that depends on a value returned by atomic_read(). It would be good for them if they used an atomic_read_xxx() primitive that gave these "volatility" semantics without junking compiler optimizations for other memory references.

> because there has already been an alternative posted

Whether that alternative (explicitly using forget(x), or wrappers thereof, such as the "order_atomic" you proposed) is better than other alternatives (such as atomic_read_xxx(), which includes the volatility behaviour in itself) is still open, and precisely what we started discussing just one mail back.

(The above was also mostly stuff I had to repeat for you, sadly.)

> that better conforms with Linux barrier API and is much more widely useful and more usable.

I don't think so. (Now *this* _is_ the "taste-dependent matter" that I mentioned earlier.)

> If you are so worried about barrier() being too heavyweight, then you're off to a poor start by wanting to add a few K of kernel text by making atomic_read volatile.

Repeating myself, for the N'th time, NO, I DON'T want to make atomic_read have "volatile" semantics.

> > > because as I also mentioned, the logical extension to Linux's barrier API to handle this is the order(x) macro. Again, not special volatile accessors.
> >
> > Sure, that forget(x) macro _is_ proposed to be made part of the generic API. Doesn't explain why not to define/use primitives that have volatility semantics in themselves, though (taste matters apart).
>
> If you follow the discussion... You were thinking of a reason why the semantics *should* be changed or added, and I was rebutting your argument that it must be used when a full barrier() is too heavy (ie. by pointing out that order() has superior semantics anyway).

Amazing. Either you have reading comprehension problems, or else, please try reading this thread (or at least this sub-thread) again. I don't want _you_ blaming _me_ for having to repeat things to you all over again.
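For concreteness, the style being argued for here might look like the sketch below. forget() and order_atomic() are the names used in the thread, but these particular definitions, and cpu_relax_no_barrier(), are assumptions for illustration only.

	typedef struct { int counter; } atomic_t;	/* as in the kernel */
	#define atomic_read(v)		((v)->counter)
	#define forget(x)		__asm__ __volatile__("" : "+m" (x))
	#define order_atomic(v)		forget((v)->counter)

	/* hypothetical: cpu_relax() minus its "memory" clobber (x86 flavour) */
	#define cpu_relax_no_barrier()	__asm__ __volatile__("rep; nop")

	static void wait_for_flag(atomic_t *v)
	{
		while (!atomic_read(v)) {
			cpu_relax_no_barrier();
			order_atomic(v);	/* only v->counter is reloaded; other
						 * cached values stay in registers */
		}
	}

This is the left-hand column of the loop comparison that comes up again further down the thread.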
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Satyam Sharma wrote:
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > Satyam Sharma wrote:
> > > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > > > Satyam Sharma wrote:
> > > >
> > > > It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier.
> > >
> > > Probably you didn't mean that, but no, schedule() is not a barrier because it sleeps. It's a barrier because it's invisible.
> >
> > Where did I say it is a barrier because it sleeps?
>
> Just below. What you wrote:
>
> > It is always a barrier because, at the lowest level, schedule() (and thus anything that sleeps) is defined to always be a barrier.
>
> "It is always a barrier because, at the lowest level, anything that sleeps is defined to always be a barrier".

... because it must call schedule and schedule is a barrier.

> > Regardless of whatever obscure means the compiler might need to infer the barrier.
> >
> > In other words, you can ignore those obscure details because schedule() is always going to have an explicit barrier in it.
>
> I didn't quite understand what you said here, so I'll tell you what I think:
>
> * foo() is a compiler barrier if the definition of foo() is invisible to the compiler at a callsite.
>
> * foo() is also a compiler barrier if the definition of foo() includes a barrier, and it is inlined at the callsite.
>
> If the above is wrong, or if there's something else at play as well, do let me know.

Right.

> > > > The "unobvious" thing is that you wanted to know how the compiler knows a function is a barrier -- answer is that if it does not *know* it is not a barrier, it must assume it is a barrier.
> > >
> > > True, that's clearly what happens here. But you're definitely joking that this is "obvious" in terms of code-clarity, right?
> >
> > No. If you accept that barrier() is implemented correctly, and you know that sleeping is defined to be a barrier,
>
> Curiously, that's the second time you've said "sleeping is defined to be a (compiler) barrier".

_In Linux,_ sleeping is defined to be a compiler barrier.

> How does the compiler even know if foo() is a function that "sleeps"? Do compilers have some notion of "sleeping" to ensure they automatically assume a compiler barrier whenever such a function is called? Or are you saying that the compiler can see the barrier() inside said function ... nopes, you're saying quite the opposite below.

You're getting too worried about the compiler implementation. Start by assuming that it does work ;)

> > then it's perfectly clear. You don't have to know how the compiler "knows" that some function contains a barrier.
>
> I think I do, why not? Would appreciate if you could elaborate on this.

If a function is not completely visible to the compiler (so it can't determine whether a barrier could be in it or not), then it must always assume it will contain a barrier so it always does the right thing.
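The last point -- a call whose body the compiler cannot see must be treated as containing a barrier -- can be seen in a standalone sketch (the names below are made up for illustration):

	/* body not visible in this translation unit */
	extern void do_something_opaque(void);

	int flag;		/* plain int, not volatile */

	int example(void)
	{
		flag = 1;
		/*
		 * gcc cannot prove that do_something_opaque() does not read or
		 * write 'flag', so it must store 1 before the call and reload
		 * 'flag' after it: the call acts as a compiler barrier here.
		 */
		do_something_opaque();
		return flag;
	}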
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> [...]
> > You think both these are equivalent in terms of "looks":
> >
> > 	while (!atomic_read(&v)) {		|	while (!atomic_read_xxx(&v)) {
> > 		...				|		...
> > 		cpu_relax_no_barrier();		|		cpu_relax_no_barrier();
> > 		order_atomic(&v);		|	}
> > 	}					|
> >
> > (where order_atomic() is an atomic_t specific wrapper as you mentioned below)
> >
> > ?
>
> I think the LHS is better if your atomic_read_xxx primitive is using the
> crazy one-sided barrier,
  ^^^^^
I'd say it's purposefully one-sided.

> because with the LHS code you immediately know what barriers are happening, and with the RHS you have to look at the atomic_read_xxx definition.

No. As I said, the _xxx (whatever the heck you want to name it as) should give the same heads-up that your "order_atomic" thing is supposed to give.

> If your atomic_read_xxx implementation was more intuitive, then both are pretty well equal. More lines != ugly code.
>
> > > [...]
> >
> > What bugs?
>
> You can't think for yourself? Your atomic_read_volatile contains a compiler barrier to the atomic variable before the load. 2 such reads from different locations look like this:
>
> 	asm volatile("" : "+m" (v1));
> 	atomic_read(&v1);
> 	asm volatile("" : "+m" (v2));
> 	atomic_read(&v2);
>
> Which implies that the load of v1 can be reordered to occur after the load of v2.

And how would that be a bug? (sorry, I really can't think for myself)

> > > Secondly, what sort of code would do such a thing?
> >
> > See the nodemgr_host_thread() that does something similar, though not exactly the same.
>
> I'm sorry, all this waffling about made up code which might do this and that is just a waste of time.

First, you could try looking at the code. And by the way, as I've already said (why do you *require* people to have to repeat things to you?) this isn't even about only existing code.
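For reference, the kind of one-sided implementation being picked apart here would look roughly like the following; this is a guess at the earlier proposal, not a quote of it:

	typedef struct { int counter; } atomic_t;

	#define forget(x)	__asm__ __volatile__("" : "+m" (x))

	static inline int atomic_read_volatile(atomic_t *v)
	{
		forget(v->counter);	/* clobber only *before* the load ...     */
		return v->counter;	/* ... nothing orders the load afterwards */
	}

Nothing after the plain load pins it in place, which is where the v1/v2 reordering quoted above comes from.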
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > > Satyam Sharma wrote:
> > >
> > > It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier.
> >
> > Probably you didn't mean that, but no, schedule() is not a barrier because it sleeps. It's a barrier because it's invisible.
>
> Where did I say it is a barrier because it sleeps?

Just below. What you wrote:

> It is always a barrier because, at the lowest level, schedule() (and thus anything that sleeps) is defined to always be a barrier.

"It is always a barrier because, at the lowest level, anything that sleeps is defined to always be a barrier".

> Regardless of whatever obscure means the compiler might need to infer the barrier.
>
> In other words, you can ignore those obscure details because schedule() is always going to have an explicit barrier in it.

I didn't quite understand what you said here, so I'll tell you what I think:

* foo() is a compiler barrier if the definition of foo() is invisible to the compiler at a callsite.

* foo() is also a compiler barrier if the definition of foo() includes a barrier, and it is inlined at the callsite.

If the above is wrong, or if there's something else at play as well, do let me know.

> > > The "unobvious" thing is that you wanted to know how the compiler knows a function is a barrier -- answer is that if it does not *know* it is not a barrier, it must assume it is a barrier.
> >
> > True, that's clearly what happens here. But you're definitely joking that this is "obvious" in terms of code-clarity, right?
>
> No. If you accept that barrier() is implemented correctly, and you know that sleeping is defined to be a barrier,

Curiously, that's the second time you've said "sleeping is defined to be a (compiler) barrier". How does the compiler even know if foo() is a function that "sleeps"? Do compilers have some notion of "sleeping" to ensure they automatically assume a compiler barrier whenever such a function is called? Or are you saying that the compiler can see the barrier() inside said function ... nopes, you're saying quite the opposite below.

> then it's perfectly clear. You don't have to know how the compiler "knows" that some function contains a barrier.

I think I do, why not? Would appreciate if you could elaborate on this.

Satyam
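The second bullet in the list above (an inlined function whose body contains a barrier), as a standalone sketch; barrier() is written out with its usual definition, the other names are made up:

	#define barrier()	__asm__ __volatile__("" : : : "memory")

	/*
	 * Visible *and* inlined at the callsite, yet still a compiler barrier,
	 * because the inlined body carries the "memory" clobber with it.
	 */
	static inline void relax_and_sync(void)
	{
		barrier();
	}

	int flag;

	void spin_until_set(void)
	{
		while (!flag)
			relax_and_sync();	/* flag reloaded every iteration */
	}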
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Satyam Sharma wrote:
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than "special" "volatile" accesses.
>
> This is obviously just a taste thing. Whether to have that forget(x) barrier as something the author should explicitly sprinkle in appropriate places in the code himself, or use a primitive that includes it itself.

That's not obviously just taste to me. Not when the primitive has many (perhaps, the majority) of uses that do not require said barriers. And this is not solely about the code generation (which, as Paul says, is relatively minor even on x86). I prefer people to think explicitly about barriers in their lockless code.

> I'm not saying "taste matters aren't important" (they are), but I'm really skeptical if most folks would find the former tasteful.

So I /do/ have better taste than most folks? Thanks! :-)

> And by the way, the point is *also* about the fact that cpu_relax(), as of today, implies a full memory clobber, which is not what a lot of such loops want. (due to stuff mentioned elsewhere, summarized in that summary)
>
> > That's not the point,
>
> That's definitely the point, why not. This is why "barrier()", being heavy-handed, is not the best option.

That is _not_ the point (of why a volatile atomic_read is good) because there has already been an alternative posted that better conforms with Linux barrier API and is much more widely useful and more usable. If you are so worried about barrier() being too heavyweight, then you're off to a poor start by wanting to add a few K of kernel text by making atomic_read volatile.

> > because as I also mentioned, the logical extension to Linux's barrier API to handle this is the order(x) macro. Again, not special volatile accessors.
>
> Sure, that forget(x) macro _is_ proposed to be made part of the generic API. Doesn't explain why not to define/use primitives that have volatility semantics in themselves, though (taste matters apart).

If you follow the discussion... You were thinking of a reason why the semantics *should* be changed or added, and I was rebutting your argument that it must be used when a full barrier() is too heavy (ie. by pointing out that order() has superior semantics anyway).

Why do I keep repeating the same things? I'll not continue bloating this thread until a new valid point comes up...
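On the cpu_relax() remark above: the x86 definition of that era amounts to a "rep; nop" with a "memory" clobber (reproduced from memory below, so treat it as a paraphrase rather than a quote of the tree), which is why every pass through such a loop already acts as a full compiler barrier.

	/* x86, circa 2.6.2x -- paraphrased */
	static inline void rep_nop(void)
	{
		__asm__ __volatile__("rep; nop" : : : "memory");
	}
	#define cpu_relax()	rep_nop()

	int flag;

	void wait_loop(void)
	{
		while (!flag)
			cpu_relax();	/* the "memory" clobber forces 'flag' (and
					 * everything else) to be reloaded each time */
	}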
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Satyam Sharma wrote:
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > I think they would both be equally ugly,
>
> You think both these are equivalent in terms of "looks":
>
> 	while (!atomic_read(&v)) {		|	while (!atomic_read_xxx(&v)) {
> 		...				|		...
> 		cpu_relax_no_barrier();		|		cpu_relax_no_barrier();
> 		order_atomic(&v);		|	}
> 	}					|
>
> (where order_atomic() is an atomic_t specific wrapper as you mentioned below)
>
> ?

I think the LHS is better if your atomic_read_xxx primitive is using the crazy one-sided barrier, because with the LHS code you immediately know what barriers are happening, and with the RHS you have to look at the atomic_read_xxx definition.

If your atomic_read_xxx implementation was more intuitive, then both are pretty well equal. More lines != ugly code.

> > but the atomic_read_volatile variant would be more prone to subtle bugs because of the weird implementation.
>
> What bugs?

You can't think for yourself? Your atomic_read_volatile contains a compiler barrier to the atomic variable before the load. 2 such reads from different locations look like this:

	asm volatile("" : "+m" (v1));
	atomic_read(&v1);
	asm volatile("" : "+m" (v2));
	atomic_read(&v2);

Which implies that the load of v1 can be reordered to occur after the load of v2. Bet you didn't expect that?

> > Secondly, what sort of code would do such a thing?
>
> See the nodemgr_host_thread() that does something similar, though not exactly the same.

I'm sorry, all this waffling about made up code which might do this and that is just a waste of time. Seriously, the thread is bloated enough and never going to get anywhere with all this handwaving. If someone is saving up all the really real and actually good arguments for why we must have a volatile here, now is the time to use them.

> > and have barriers both before and after the memory operation,
>
> How could that lead to bugs? (if you can point to existing code, but just some testcase / sample code would be fine as well).

See above.

> As I said, barrier() is too heavy-handed.

Typo. I meant: defined for a single memory location (ie. order(x)).
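The closing distinction -- a clobber scoped to one location versus barrier()'s blanket "memory" clobber -- in sketch form; order() is the name used in the thread, but the definition below is an assumption:

	/* full compiler barrier: gcc must assume *all* memory may have changed */
	#define barrier()	__asm__ __volatile__("" : : : "memory")

	/* single-location barrier: only x is considered modified */
	#define order(x)	__asm__ __volatile__("" : "+m" (x))

	int done;

	void wait_for_done(void)
	{
		while (!done)
			order(done);	/* 'done' is reloaded each iteration; other
					 * values the caller has cached in registers
					 * are left alone, unlike with barrier() */
	}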
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Satyam Sharma wrote:
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > Satyam Sharma wrote:
> >
> > It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier.
>
> Probably you didn't mean that, but no, schedule() is not a barrier because it sleeps. It's a barrier because it's invisible.

Where did I say it is a barrier because it sleeps?

It is always a barrier because, at the lowest level, schedule() (and thus anything that sleeps) is defined to always be a barrier. Regardless of whatever obscure means the compiler might need to infer the barrier.

In other words, you can ignore those obscure details because schedule() is always going to have an explicit barrier in it.

> > The "unobvious" thing is that you wanted to know how the compiler knows a function is a barrier -- answer is that if it does not *know* it is not a barrier, it must assume it is a barrier.
>
> True, that's clearly what happens here. But you're definitely joking that this is "obvious" in terms of code-clarity, right?

No. If you accept that barrier() is implemented correctly, and you know that sleeping is defined to be a barrier, then it's perfectly clear. You don't have to know how the compiler "knows" that some function contains a barrier.

> Just 5 minutes back you mentioned elsewhere you like seeing lots of explicit calls to barrier() (with comments, no less, hmm? :-)

Sure, but there are well known primitives which contain barriers, and trivial recognisable code sequences for which you don't need comments. Waiting-loops using sleeps or cpu_relax() are prime examples.
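For contrast, the degenerate case everyone in this thread agrees is broken -- a wait loop containing no barrier of any kind -- may legitimately be compiled into an infinite loop on a value cached in a register (standalone sketch, names made up):

	int flag;		/* set by an interrupt handler or another thread */

	void buggy_wait(void)
	{
		/*
		 * No call, no asm, no volatile in the loop body: gcc may hoist
		 * the load and compile this as "if (!flag) for (;;);".
		 */
		while (!flag)
			;
	}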