On Thu, 15 Jan 2015, Jakub Jelinek wrote:

> On Thu, Jan 15, 2015 at 07:46:18AM +0100, Richard Biener wrote:
> > >> That last line means the compiler is free to delete a non-volatile
> > >> asm with a memory clobber if that asm is not needed for dataflow.  Or
> > >> that is how I read it; it is trying to indicate that if you want to
> > >> prevent the memory clobber from being deleted (together with the rest
> > >> of the asm), you need to make the asm volatile.
> > >>
> > >> So as far as I can see the compiler can CSE two identical
> > >non-volatile
> > >> asms with memory clobber just fine.  Older GCC (I tried 4.7.2) does
> > >do
> > >> this; current mainline doesn't.  I think it should.
> > >No, it should not CSE those two cases.  That's simply wrong and if an 
> > >older version did that optimization, that's a bug.
> > 
> > I think segher has a point here.  If the asm with memory clobber would 
> > store to random memory and the point would be to preserve that then the 
> > whole distinction with volatile doesn't make much sense (after all without 
> > volatile we happily DCE such asm if the regular outputs are not needed).
> > 
> > This doesn't mean 'memory' is a well-designed thing, of course. Just its
> > effects are effectively limited to reads without volatile(?)
> 
> Segher's mails talk about "memory" being a write but not read.
> If we even can't agree on what non-volatile "memory" means, I think
> we should treat it more conservatively, because every user (and there are
> lots of people using non-volatile asm with "memory" in the wild) treats it
> differently.  Just trying to grep for a few:
> glibc:
> ./sysdeps/alpha/bits/atomic.h:# define atomic_full_barrier()  __asm ("mb" : : 
> : "memory");
> ./sysdeps/alpha/bits/atomic.h:# define atomic_read_barrier()  __asm ("mb" : : 
> : "memory");
> ./sysdeps/alpha/bits/atomic.h:# define atomic_write_barrier() __asm ("wmb" : 
> : : "memory");
> ./sysdeps/sparc/sparc32/bits/atomic.h:# define atomic_full_barrier() __asm 
> ("" ::: "memory")
> ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier()      
> __asm ("lwsync" ::: "memory")
> ./sysdeps/powerpc/powerpc32/bits/atomic.h:# define atomic_read_barrier()      
> __asm ("sync" ::: "memory")
> ./sysdeps/powerpc/powerpc64/bits/atomic.h:#define atomic_read_barrier()       
> __asm ("lwsync" ::: "memory")
> ./sysdeps/powerpc/bits/atomic.h:#define atomic_full_barrier() __asm ("sync" 
> ::: "memory")
> ./sysdeps/powerpc/bits/atomic.h:#define atomic_write_barrier()        __asm 
> ("eieio" ::: "memory")
> ./sysdeps/generic/malloc-machine.h:# define atomic_full_barrier() __asm ("" 
> ::: "memory")
> ./elf/tst-tlsmod3.c:  asm ("" ::: "memory");
> ./elf/tst-tlsmod4.c:  asm ("" ::: "memory");
> ./elf/tst-tlsmod1.c:  asm ("" ::: "memory");
> ./elf/tst-tlsmod2.c:  asm ("" ::: "memory");
> ./include/atomic.h:# define atomic_full_barrier() __asm ("" ::: "memory")
> linux kernel:
> ./arch/arm/mach-omap2/pm24xx.c:       asm("mcr p15, 0, %0, c7, c0, 4" : : "r" 
> (zero) : "memory", "cc");
> ./arch/arm/include/asm/irqflags.h:#define local_fiq_enable()  __asm__("cpsie 
> f        @ __stf" : : : "memory", "cc")
> ./arch/arm/include/asm/irqflags.h:#define local_fiq_disable() __asm__("cpsid 
> f        @ __clf" : : : "memory", "cc")
> ./arch/x86/include/asm/uaccess_64.h:          asm("":::"memory");
> ./arch/x86/include/asm/uaccess_64.h:          asm("":::"memory");
> ./arch/x86/include/asm/segment.h:     asm("mov %%" #seg ",%0":"=r" (value) : 
> : "memory")
> ./arch/x86/include/asm/stackprotector.h:      asm("mov %0, %%gs" : : "r" 
> (__KERNEL_STACK_CANARY) : "memory");
> ./arch/arm64/include/asm/irqflags.h:#define local_fiq_enable()        
> asm("msr        daifclr, #1" : : : "memory")
> ./arch/arm64/include/asm/irqflags.h:#define local_fiq_disable()       
> asm("msr        daifset, #1" : : : "memory")
> ./arch/arm64/include/asm/irqflags.h:#define local_async_enable()      
> asm("msr        daifclr, #4" : : : "memory")
> ./arch/arm64/include/asm/irqflags.h:#define local_async_disable()     
> asm("msr        daifset, #4" : : : "memory")
> ./arch/tile/lib/memcpy_64.c:                          __asm__ ("" : : : 
> "memory");
> ./arch/tile/lib/memcpy_64.c:                          __asm__ ("" : : : 
> "memory");
> ./arch/tile/include/hv/netio_intf.h:  __asm__("" : : : "memory");
> ./drivers/net/ethernet/sgi/ioc3-eth.c:        __asm__("sync" ::: "memory")
> ./lib/sha1.c:  #define setW(x, val) do { W(x) = (val); 
> __asm__("":::"memory"); } while (0)
> 
> The glibc barriers are supposedly something that can be CSEd (one barrier 
> instead of
> two consecutive barriers is enough), but certainly not moved across any 
> loads/stores
> in between.  In the kernel case, the enable/disable probably wouldn't allow 
> even CSE.
> 
> So I'm with Jeff that we should treat "memory" at least as unspecified read 
> and write,
> and whether we can CSE them if there are no memory loads/stores in between 
> them can
> be discussed (most likely the kernel would be safe even in that case, because 
> those
> usually don't nest and appear in pairs, or act as barriers (like the glibc 
> case),
> or read from segment registers (guess again ok to be CSEd with no intervening 
> loads/stores).
> 
> In 4.9 backport I'd prefer not to CSE them at all though, stay conservative.

Sure - I also requested "memory" to be not CSEd just to be conservative,
not because I fully understood its meaning.

Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

Reply via email to