Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Torvald Riegel
On Thu, 2014-02-06 at 20:06 -0800, Paul E. McKenney wrote:
> On Thu, Feb 06, 2014 at 11:58:22PM +0100, Torvald Riegel wrote:
> > On Thu, 2014-02-06 at 13:55 -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 06, 2014 at 10:09:25PM +0100, Torvald Riegel wrote:
> > > > On Thu, 2014-02-06 at 18:59 +, Will Deacon wrote:
> > > > > To answer that question, you need to go and look at the definitions of
> > > > > synchronises-with, happens-before, dependency_ordered_before and a 
> > > > > whole
> > > > > pile of vaguely written waffle to realise that you don't know.
> > > > 
> > > > Are you familiar with the formalization of the C11/C++11 model by Batty
> > > > et al.?
> > > > http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf
> > > > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
> > > > 
> > > > They also have a nice tool that can run condensed examples and show you
> > > > all allowed (and forbidden) executions (it runs in the browser, so is
> > > > slow for larger examples), including nice annotated graphs for those:
> > > > http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/
> > > > 
> > > > It requires somewhat special syntax, but the following, which should be
> > > > equivalent to your example above, runs just fine:
> > > > 
> > > > int main() {
> > > >   atomic_int foo = 0; 
> > > >   atomic_int bar = 0; 
> > > >   atomic_int baz = 0; 
> > > >   {{{ {
> > > > foo.store(42, memory_order_relaxed);
> > > > bar.store(1, memory_order_seq_cst);
> > > > baz.store(42, memory_order_relaxed);
> > > >   }
> > > >   ||| {
> > > > r1=baz.load(memory_order_seq_cst).readsvalue(42);
> > > > r2=foo.load(memory_order_seq_cst).readsvalue(0);
> > > >   }
> > > >   }}};
> > > >   return 0; }
> > > > 
> > > > That yields 3 consistent executions for me, and likewise if the last
> > > > readsvalue() is using 42 as argument.
> > > > 
> > > > If you add a "fence(memory_order_seq_cst);" after the store to foo, the
> > > > program can't observe != 42 for foo anymore, because the seq-cst fence
> > > > is adding a synchronizes-with edge via the baz reads-from.
> > > > 
> > > > I think this is a really neat tool, and very helpful to answer such
> > > > questions as in your example.
> > > 
> > > Hmmm...  The tool doesn't seem to like fetch_add().  But let's assume that
> > > your substitution of store() for fetch_add() is correct.  Then this shows
> > > that we cannot substitute fetch_add() for atomic_add_return().
> > 
> > It should be in this example, I believe.
> 
> You lost me on this one.

I mean that in this example, substituting fetch_add() with store()
should not change meaning, given that what the fetch_add reads-from
seems irrelevant.
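
Spelled out in plain C11 rather than the cppmem syntax, and with the
seq-cst fence I mentioned, the two threads would look roughly like this
(a sketch; I'm keeping plain stores in place of the fetch_add, as
discussed):

  #include <stdatomic.h>

  atomic_int foo, bar, baz;   /* all zero-initialized */

  void thread1(void)
  {
    atomic_store_explicit(&foo, 42, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* the fence discussed above */
    atomic_store_explicit(&bar, 1, memory_order_seq_cst);
    atomic_store_explicit(&baz, 42, memory_order_relaxed);
  }

  void thread2(void)
  {
    int r1 = atomic_load_explicit(&baz, memory_order_seq_cst);
    int r2 = atomic_load_explicit(&foo, memory_order_seq_cst);
    /* with the fence: r1 == 42 implies r2 == 42 */
    (void)r1; (void)r2;
  }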



Simplify using context in loop-iv

2014-02-07 Thread Paulo Matos
Hello,

This is a followup of 
http://gcc.gnu.org/ml/gcc/2014-01/msg00120.html
and
http://gcc.gnu.org/ml/gcc/2014-01/msg00055.html

This is a slightly long patch that attempts to simplify the niter and infinite 
expressions by recursively looking through the dominating basic blocks for 
register definitions, replacing them in the original expressions and 
simplifying the result with simplify-rtx.

It allows GCC to generate many more zero-overhead loops, by finding much 
tighter loop bounds or by discovering that loops assumed to be infinite are 
in fact finite.
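
As an illustration (a toy example of mine, not taken from the patch), this is 
the kind of case it helps with: the loop bound reaches the loop through a 
definition in a dominating block, and substituting that definition into the 
niter expression and simplifying it lets loop-iv prove the loop finite with a 
bound it can express, which is what enables a zero-overhead (hardware) loop:

  int sum_quads (const int *p, int len)
  {
    int n = len / 4;               /* definition in a dominating block */
    int s = 0;
    for (int i = 0; i < n; i++)    /* niter expression refers to n */
      s += p[4 * i];
    return s;
  }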

I would like to have some comments on the patch and its possible integration 
into upstream (I guess it can only go upstream once 4.9 is released, right?).

I bootstrapped and tested x86_64-unknown-linux-gnu with no regressions.

Thanks,

Paulo Matos




simplify-using-context.patch
Description: simplify-using-context.patch


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Will Deacon
Hello Torvald,

It looks like Paul clarified most of the points I was trying to make
(thanks Paul!), so I won't go back over them here.

On Thu, Feb 06, 2014 at 09:09:25PM +, Torvald Riegel wrote:
> Are you familiar with the formalization of the C11/C++11 model by Batty
> et al.?
> http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf
> http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
> 
> They also have a nice tool that can run condensed examples and show you
> all allowed (and forbidden) executions (it runs in the browser, so is
> slow for larger examples), including nice annotated graphs for those:
> http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/

Thanks for the link, that's incredibly helpful. I've used ppcmem and armmem
in the past, but I didn't realise they have a version for C++11 too.
Actually, the armmem backend doesn't implement our atomic instructions or
the acquire/release accessors, so it's not been as useful as it could be.
I should probably try to learn OCaml...

> IMHO, one thing worth considering is that for C/C++, the C11/C++11 is
> the only memory model that has widespread support.  So, even though it's
> a fairly weak memory model (unless you go for the "only seq-cst"
> beginners advice) and thus comes with a higher complexity, this model is
> what likely most people will be familiar with over time.  Deviating from
> the "standard" model can have valid reasons, but it also has a cost in
> that new contributors are more likely to be familiar with the "standard"
> model.

Indeed, I wasn't trying to write-off the C11 memory model as something we
can never use in the kernel. I just don't think the current situation is
anywhere close to usable for a project such as Linux. If a greater
understanding of the memory model does eventually manifest amongst C/C++
developers (by which I mean, the beginners advice is really treated as
such and there is a widespread intuition about ordering guarantees, as
opposed to the need to use formal tools), then surely the tools and libraries
will stabilise and provide uniform semantics across the 25+ architectures
that Linux currently supports. If *that* happens, this discussion is certainly
worth having again.

Will


[MIPS] Avoiding FP operations/register usage

2014-02-07 Thread Matthew Fortune
Hi Richard,

I've been trying to determine for some time whether the MIPS backend has 
successfully guaranteed that even when compiling with hard-float enabled there 
is no floating point code emitted unless you use floating point types.

My most recent reason for looking at this is because I am starting to 
understand/look at mips ld.so from glibc and it appears to make such an 
assumption. I.e. I cannot see it using any specific options to prevent the use 
of floating point but the path into the dynamic linker for resolving symbols 
only preserves integer argument registers and ignores floating point. I have to 
therefore assume that the MIPS backend manages to avoid what I thought was a 
common problem of using floating point registers as integer scratch in extreme 
circumstances.

Another example of where this issue is relevant is the MIPS Linux kernel, 
which is explicitly compiled for soft-float; whether this is out of caution or 
necessity I do not know, but I'm interested in figuring it out.

Any insight into this would be welcome. If there is no such guarantee (which is 
what I have assumed thus far) then I will go ahead and fix anything that relies 
on floating point code being avoided.

Regards,
Matthew 



Re: -O3 and -ftree-vectorize

2014-02-07 Thread Jakub Jelinek
On Thu, Feb 06, 2014 at 05:21:00PM -0500, Tim Prince wrote:
> I'm seeing vectorization  but no output from
> -ftree-vectorizer-verbose, and no dot product vectorization inside
> omp parallel regions, with gcc g++ or gfortran 4.9.  Primary targets
> are cygwin64 and linux x86_64.
> I've been unable to use -O3 vectorization with gcc, although it
> works with gfortran and g++, so use gcc -O2 -ftree-vectorize
> together with additional optimization flags which don't break.

Can you file a GCC bugzilla PR with minimal testcases for this (or point us
at already filed bugreports)?

> I've made source code changes to take advantage of the new
> vectorization with merge() and ? operators; while it's useful for
> -march=core-avx2, it's sometimes a loss for -msse4.1.
> gcc vectorization with #pragma omp parallel for simd is reasonably
> effective in my tests only on 12 or more cores.

Likewise.

> #pragma omp simd reduction(max: ) is giving correct results but poor
> performance in my tests.

Likewise.

Jakub


Re: -O3 and -ftree-vectorize

2014-02-07 Thread Tim Prince


On 02/07/2014 10:22 AM, Jakub Jelinek wrote:
> On Thu, Feb 06, 2014 at 05:21:00PM -0500, Tim Prince wrote:
>> I'm seeing vectorization but no output from
>> -ftree-vectorizer-verbose, and no dot product vectorization inside
>> omp parallel regions, with gcc g++ or gfortran 4.9.  Primary targets
>> are cygwin64 and linux x86_64.
>> I've been unable to use -O3 vectorization with gcc, although it
>> works with gfortran and g++, so use gcc -O2 -ftree-vectorize
>> together with additional optimization flags which don't break.
>
> Can you file a GCC bugzilla PR with minimal testcases for this (or point us
> at already filed bugreports)?

The problems with gcc -O3 (called from gfortran) have so far eluded my 
attempts at finding a minimal test case.  When I run under a debugger, it 
appears that somewhere prior to the crash some gfortran code is 
overwritten with data by the gcc code, which is beyond my debugging 
skill.  I can get full performance with -O2 plus a bunch of intermediate 
flags.
As to the non-vectorization of the dot product in an omp parallel region, 
-fopt-info (which I didn't know about) is reporting vectorization, but 
there are no parallel simd instructions in the generated code for the 
omp_fn.  I'll file a PR on that if it still reproduces in a minimal case.

>> I've made source code changes to take advantage of the new
>> vectorization with merge() and ? operators; while it's useful for
>> -march=core-avx2, it's sometimes a loss for -msse4.1.
>> gcc vectorization with #pragma omp parallel for simd is reasonably
>> effective in my tests only on 12 or more cores.
>
> Likewise.

Those are cases of two-level loop nests from the netlib "vector" benchmark 
where only one level is vectorizable and parallelizable.  By putting the 
vectorizable loop on the outside, the parallelization scales to a large 
number of cores.  I don't expect it to outperform single-thread optimized 
AVX vectorization until 8 or more cores are in use, but it needs more 
threads than expected even relative to SSE vectorization.

>> #pragma omp simd reduction(max: ) is giving correct results but poor
>> performance in my tests.
>
> Likewise.

I'll file a PR on this; I didn't know whether there would be interest.  I 
have an Intel compiler issue marked "closed, will not be fixed", so the 
simd reduction(max: ) isn't viable for icc in the near term.
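
For reference, the shape of reduction loop I'm testing is roughly the 
following (a simplified sketch, not the actual benchmark source; needs 
-fopenmp or -fopenmp-simd):

  float maxval (const float *a, int n)
  {
    float m = a[0];
  #pragma omp simd reduction(max:m)
    for (int i = 1; i < n; i++)
      m = a[i] > m ? a[i] : m;
    return m;
  }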

Thanks,



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Paul E. McKenney
On Fri, Feb 07, 2014 at 10:13:40AM +0100, Torvald Riegel wrote:
> On Thu, 2014-02-06 at 20:06 -0800, Paul E. McKenney wrote:
> > On Thu, Feb 06, 2014 at 11:58:22PM +0100, Torvald Riegel wrote:
> > > On Thu, 2014-02-06 at 13:55 -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 06, 2014 at 10:09:25PM +0100, Torvald Riegel wrote:
> > > > > On Thu, 2014-02-06 at 18:59 +, Will Deacon wrote:
> > > > > > To answer that question, you need to go and look at the definitions 
> > > > > > of
> > > > > > synchronises-with, happens-before, dependency_ordered_before and a 
> > > > > > whole
> > > > > > pile of vaguely written waffle to realise that you don't know.
> > > > > 
> > > > > Are you familiar with the formalization of the C11/C++11 model by 
> > > > > Batty
> > > > > et al.?
> > > > > http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf
> > > > > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
> > > > > 
> > > > > They also have a nice tool that can run condensed examples and show 
> > > > > you
> > > > > all allowed (and forbidden) executions (it runs in the browser, so is
> > > > > slow for larger examples), including nice annotated graphs for those:
> > > > > http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/
> > > > > 
> > > > > It requires somewhat special syntax, but the following, which should 
> > > > > be
> > > > > equivalent to your example above, runs just fine:
> > > > > 
> > > > > int main() {
> > > > >   atomic_int foo = 0; 
> > > > >   atomic_int bar = 0; 
> > > > >   atomic_int baz = 0; 
> > > > >   {{{ {
> > > > > foo.store(42, memory_order_relaxed);
> > > > > bar.store(1, memory_order_seq_cst);
> > > > > baz.store(42, memory_order_relaxed);
> > > > >   }
> > > > >   ||| {
> > > > > r1=baz.load(memory_order_seq_cst).readsvalue(42);
> > > > > r2=foo.load(memory_order_seq_cst).readsvalue(0);
> > > > >   }
> > > > >   }}};
> > > > >   return 0; }
> > > > > 
> > > > > That yields 3 consistent executions for me, and likewise if the last
> > > > > readsvalue() is using 42 as argument.
> > > > > 
> > > > > If you add a "fence(memory_order_seq_cst);" after the store to foo, 
> > > > > the
> > > > > program can't observe != 42 for foo anymore, because the seq-cst fence
> > > > > is adding a synchronizes-with edge via the baz reads-from.
> > > > > 
> > > > > I think this is a really neat tool, and very helpful to answer such
> > > > > questions as in your example.
> > > > 
> > > > Hmmm...  The tool doesn't seem to like fetch_add().  But let's assume 
> > > > that
> > > > your substitution of store() for fetch_add() is correct.  Then this 
> > > > shows
> > > > that we cannot substitute fetch_add() for atomic_add_return().
> > > 
> > > It should be in this example, I believe.
> > 
> > You lost me on this one.
> 
> I mean that in this example, substituting fetch_add() with store()
> should not change meaning, given that what the fetch_add reads-from
> seems irrelevant.

Got it.  Agreed, though your other suggestion of substituting CAS is
more convincing.  ;-)

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Paul E. McKenney
On Fri, Feb 07, 2014 at 12:01:25PM +, Will Deacon wrote:
> Hello Torvald,
> 
> It looks like Paul clarified most of the points I was trying to make
> (thanks Paul!), so I won't go back over them here.
> 
> On Thu, Feb 06, 2014 at 09:09:25PM +, Torvald Riegel wrote:
> > Are you familiar with the formalization of the C11/C++11 model by Batty
> > et al.?
> > http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf
> > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
> > 
> > They also have a nice tool that can run condensed examples and show you
> > all allowed (and forbidden) executions (it runs in the browser, so is
> > slow for larger examples), including nice annotated graphs for those:
> > http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/
> 
> Thanks for the link, that's incredibly helpful. I've used ppcmem and armmem
> in the past, but I didn't realise they have a version for C++11 too.
> Actually, the armmem backend doesn't implement our atomic instructions or
> the acquire/release accessors, so it's not been as useful as it could be.
> I should probably try to learn OCaml...

That would be very cool!

Another option would be to recruit a grad student to take on that project
for Peter Sewell.  He might already have one, for all I know.

> > IMHO, one thing worth considering is that for C/C++, the C11/C++11 is
> > the only memory model that has widespread support.  So, even though it's
> > a fairly weak memory model (unless you go for the "only seq-cst"
> > beginners advice) and thus comes with a higher complexity, this model is
> > what likely most people will be familiar with over time.  Deviating from
> > the "standard" model can have valid reasons, but it also has a cost in
> > that new contributors are more likely to be familiar with the "standard"
> > model.
> 
> Indeed, I wasn't trying to write-off the C11 memory model as something we
> can never use in the kernel. I just don't think the current situation is
> anywhere close to usable for a project such as Linux. If a greater
> understanding of the memory model does eventually manifest amongst C/C++
> developers (by which I mean, the beginners advice is really treated as
> such and there is a widespread intuition about ordering guarantees, as
> opposed to the need to use formal tools), then surely the tools and libraries
> will stabilise and provide uniform semantics across the 25+ architectures
> that Linux currently supports. If *that* happens, this discussion is certainly
> worth having again.

And it is likely to be worthwhile even before then on a piecemeal
basis, where architecture maintainers pick and choose which primitives
stay in inline assembly and which the compiler can handle properly.
For example, I bet that atomic_inc() can be implemented just fine by C11
in the very near future.  However, atomic_add_return() is another story.
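
For example (an illustrative sketch only, not a proposed kernel patch, and
glossing over the kernel's exact barrier requirements):

  #include <stdatomic.h>

  /* atomic_inc() has no ordering requirement, so a relaxed C11 RMW is a
     plausible mapping: */
  static inline void atomic_inc (atomic_int *v)
  {
    atomic_fetch_add_explicit (v, 1, memory_order_relaxed);
  }

  /* atomic_add_return() must behave as if there were a full memory barrier
     on each side of the operation; whether a seq_cst fetch_add gives exactly
     the barriers the kernel expects on all architectures is the "another
     story" part. */
  static inline int atomic_add_return (int i, atomic_int *v)
  {
    return atomic_fetch_add_explicit (v, i, memory_order_seq_cst) + i;
  }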

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Paul E. McKenney
On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > Hopefully some discussion of out-of-thin-air values as well.
> 
> Yes, absolutely shoot store speculation in the head already. Then drive
> a wooden stake through its hart.
> 
> C11/C++11 should not be allowed to claim itself a memory model until that
> is sorted.

There actually is a proposal being put forward, but it might not make ARM
and Power people happy because it involves adding a compare, a branch,
and an ISB/isync after every relaxed load...  Me, I agree with you,
much preferring the no-store-speculation approach.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Will Deacon
On Fri, Feb 07, 2014 at 05:06:54PM +, Peter Zijlstra wrote:
> On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> > Hi Paul,
> > 
> > On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > > Hopefully some discussion of out-of-thin-air values as well.
> > > > 
> > > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > > a wooden stake through its hart.
> > > > 
> > > > C11/C++11 should not be allowed to claim itself a memory model until 
> > > > that
> > > > is sorted.
> > > 
> > > There actually is a proposal being put forward, but it might not make ARM
> > > and Power people happy because it involves adding a compare, a branch,
> > > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > > much preferring the no-store-speculation approach.
> > 
> > Can you elaborate a bit on this please? We don't permit speculative stores
> > in the ARM architecture, so it seems counter-intuitive that GCC needs to
> > emit any additional instructions to prevent that from happening.
> > 
> > Stores can, of course, be observed out-of-order but that's a lot more
> > reasonable :)
> 
> This is more about the compiler speculating on stores; imagine:
> 
>   if (x)
>   y = 1;
>   else
>   y = 2;
> 
> The compiler is allowed to change that into:
> 
>   y = 2;
>   if (x)
>   y = 1;
> 
> Which is of course a big problem when you want to rely on the ordering.

Understood, but that doesn't explain why Paul wants to add ISB/isync
instructions which affect the *CPU* rather than the compiler!

Will


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 05:13:36PM +, Will Deacon wrote:
> Understood, but that doesn't explain why Paul wants to add ISB/isync
> instructions which affect the *CPU* rather than the compiler!

I doubt Paul wants it, but yeah, I'm curious about that proposal as
well; it sounds like someone took a big toke from the bong again, which
seems a favourite pastime amongst committees.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> Hi Paul,
> 
> On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > > 
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its hart.
> > > 
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> > 
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > much preferring the no-store-speculation approach.
> 
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.
> 
> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)

This is more about the compiler speculating on stores; imagine:

  if (x)
y = 1;
  else
y = 2;

The compiler is allowed to change that into:

  y = 2;
  if (x)
y = 1;

Which is of course a big problem when you want to rely on the ordering.

There are further problems where things like memset() can write outside
the specified address range. Examples are memset() implementations using
single instructions to wipe entire cachelines and then 'restoring' the
tail bits.

While valid for single-threaded code, it's a complete disaster for
concurrent code.

There's more, but it all boils down to doing stores you don't expect in
a 'sane' concurrent environment and/or stores that don't respect the
control flow.




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Paul E. McKenney
On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> Hi Paul,
> 
> On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > > 
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its hart.
> > > 
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> > 
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > much preferring the no-store-speculation approach.
> 
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.

Requiring a compare/branch/ISB after each relaxed load enables a simple(r)
proof that out-of-thin-air values cannot be observed in the face of any
compiler optimization that refrains from reordering a prior relaxed load
with a subsequent relaxed store.

> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)

So let me try an example.  I am sure that Torvald Riegel will jump in
with any needed corrections or amplifications:

Initial state: x == y == 0

T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, r1, memory_order_relaxed);

T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

One would intuitively expect r1 == r2 == 0 as the only possible outcome.
But suppose that the compiler used specialization optimizations, as it
would if there were a function that has a very lightweight implementation
for some values and a very heavyweight one for others.  In particular,
suppose that the lightweight implementation was for the value 42.
Then the compiler might do something like the following:

Initial state: x == y == 0

T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (r1 == 42)
        atomic_store_explicit(&y, 42, memory_order_relaxed);
    else
        atomic_store_explicit(&y, r1, memory_order_relaxed);

T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

Suddenly we have an explicit constant 42 showing up.  Of course, if
the compiler carefully avoided speculative stores (as both Peter and
I believe it should if its code generation is to be regarded as
anything other than an act of vandalism, the words in the standard
notwithstanding), there would be no problem.  But currently, a number
of compiler writers see absolutely nothing wrong with transforming
the optimized-for-42 version above into something like this:

Initial state: x == y == 0

T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, 42, memory_order_relaxed);
    if (r1 != 42)
        atomic_store_explicit(&y, r1, memory_order_relaxed);

T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

And then it is a short and uncontroversial step to the following:

Initial state: x == y == 0

T1: atomic_store_explicit(&y, 42, memory_order_relaxed);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (r1 != 42)
        atomic_store_explicit(&y, r1, memory_order_relaxed);

T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);

This can of course result in r1 == r2 == 42, even though the constant
42 never appeared in the original code.  This is one way to generate
an out-of-thin-air value.
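
For concreteness, the original example written out as a complete program
(my transcription, assuming a C11 <threads.h> implementation); the question
is whether the standard permits it to print anything other than 0, 0:

  #include <stdatomic.h>
  #include <stdio.h>
  #include <threads.h>

  static atomic_int x, y;          /* both start at 0 */
  static int r1, r2;

  static int t1 (void *arg)
  {
    (void) arg;
    r1 = atomic_load_explicit (&x, memory_order_relaxed);
    atomic_store_explicit (&y, r1, memory_order_relaxed);
    return 0;
  }

  static int t2 (void *arg)
  {
    (void) arg;
    r2 = atomic_load_explicit (&y, memory_order_relaxed);
    atomic_store_explicit (&x, r2, memory_order_relaxed);
    return 0;
  }

  int main (void)
  {
    thrd_t a, b;
    thrd_create (&a, t1, NULL);
    thrd_create (&b, t2, NULL);
    thrd_join (a, NULL);
    thrd_join (b, NULL);
    printf ("r1 = %d, r2 = %d\n", r1, r2);   /* intuitively always 0, 0 */
    return 0;
  }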

As near as I can tell, compiler writers hate the idea of prohibiting
speculative-store optimizations because it requires them to introduce
both control and data dependency tracking into their compilers.  Many of
them seem to hate dependency tracking with a purple passion.  At least,
such a hatred would go a long way towards explaining the incomplete
and high-overhead implementations of memory_order_consume, the long
and successful use of idioms based on the memory_order_consume pattern
notwithstanding [*].  ;-)

That said, the Java guys are talking about introducing something
vaguely resembling memory_order_consume (and thus resembling the
rcu_assign_pointer() and rcu_dereference() portions of RCU) to solve Java
out-of-thin-air issues involving initialization, so perhaps there is hope.

Thanx, Paul

[*] http://queue.acm.org/detail.cfm?id=2488549
http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproo

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Paul E. McKenney
On Fri, Feb 07, 2014 at 05:13:36PM +, Will Deacon wrote:
> On Fri, Feb 07, 2014 at 05:06:54PM +, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> > > Hi Paul,
> > > 
> > > On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > > > Hopefully some discussion of out-of-thin-air values as well.
> > > > > 
> > > > > Yes, absolutely shoot store speculation in the head already. Then 
> > > > > drive
> > > > > a wooden stake through its hart.
> > > > > 
> > > > > C11/C++11 should not be allowed to claim itself a memory model until 
> > > > > that
> > > > > is sorted.
> > > > 
> > > > There actually is a proposal being put forward, but it might not make 
> > > > ARM
> > > > and Power people happy because it involves adding a compare, a branch,
> > > > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > > > much preferring the no-store-speculation approach.
> > > 
> > > Can you elaborate a bit on this please? We don't permit speculative stores
> > > in the ARM architecture, so it seems counter-intuitive that GCC needs to
> > > emit any additional instructions to prevent that from happening.
> > > 
> > > Stores can, of course, be observed out-of-order but that's a lot more
> > > reasonable :)
> > 
> > This is more about the compiler speculating on stores; imagine:
> > 
> >   if (x)
> > y = 1;
> >   else
> > y = 2;
> > 
> > The compiler is allowed to change that into:
> > 
> >   y = 2;
> >   if (x)
> > y = 1;
> > 
> > Which is of course a big problem when you want to rely on the ordering.
> 
> Understood, but that doesn't explain why Paul wants to add ISB/isync
> instructions which affect the *CPU* rather than the compiler!

Hey!!!  -I- don't want to add those instructions!  Others do.
Unfortunately, lots of others.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Will Deacon
Hi Paul,

On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > Hopefully some discussion of out-of-thin-air values as well.
> > 
> > Yes, absolutely shoot store speculation in the head already. Then drive
> > a wooden stake through its hart.
> > 
> > C11/C++11 should not be allowed to claim itself a memory model until that
> > is sorted.
> 
> There actually is a proposal being put forward, but it might not make ARM
> and Power people happy because it involves adding a compare, a branch,
> and an ISB/isync after every relaxed load...  Me, I agree with you,
> much preferring the no-store-speculation approach.

Can you elaborate a bit on this please? We don't permit speculative stores
in the ARM architecture, so it seems counter-intuitive that GCC needs to
emit any additional instructions to prevent that from happening.

Stores can, of course, be observed out-of-order but that's a lot more
reasonable :)

Will


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Torvald Riegel
On Fri, 2014-02-07 at 18:06 +0100, Peter Zijlstra wrote:
> On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> > Hi Paul,
> > 
> > On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > > Hopefully some discussion of out-of-thin-air values as well.
> > > > 
> > > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > > a wooden stake through its hart.
> > > > 
> > > > C11/C++11 should not be allowed to claim itself a memory model until 
> > > > that
> > > > is sorted.
> > > 
> > > There actually is a proposal being put forward, but it might not make ARM
> > > and Power people happy because it involves adding a compare, a branch,
> > > and an ISB/isync after every relaxed load...  Me, I agree with you,
> > > much preferring the no-store-speculation approach.
> > 
> > Can you elaborate a bit on this please? We don't permit speculative stores
> > in the ARM architecture, so it seems counter-intuitive that GCC needs to
> > emit any additional instructions to prevent that from happening.
> > 
> > Stores can, of course, be observed out-of-order but that's a lot more
> > reasonable :)
> 
> This is more about the compiler speculating on stores; imagine:
> 
>   if (x)
>   y = 1;
>   else
>   y = 2;
> 
> The compiler is allowed to change that into:
> 
>   y = 2;
>   if (x)
>   y = 1;

If you write the example like that, this is indeed allowed because it's
all sequential code (and there's no volatiles in there, at least you
didn't show them :).  A store to y would happen in either case.  You
cannot observe the difference between both examples in a data-race-free
program.

Are there supposed to be atomic/non-sequential accesses in there?  If
so, please update the example.
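
For example, if y were an atomic that another thread reads, the speculative
store would not be allowed, because it introduces a value that no execution
of the abstract machine can produce.  A quick sketch (mine, not Peter's
code):

  #include <stdatomic.h>

  atomic_int y;

  void writer (int x)
  {
    if (x)
      atomic_store_explicit (&y, 1, memory_order_relaxed);
    else
      atomic_store_explicit (&y, 2, memory_order_relaxed);
    /* Hoisting an unconditional store of 2 above the branch would let a
       concurrent relaxed load of y observe 2 even when x is true. */
  }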

> Which is of course a big problem when you want to rely on the ordering.
> 
> There's further problems where things like memset() can write outside
> the specified address range. Examples are memset() using single
> instructions to wipe entire cachelines and then 'restoring' the tail
> bit.

As Joseph said, this would be a bug IMO.

> While valid for single threaded, its a complete disaster for concurrent
> code.
> 
> There's more, but it all boils down to doing stores you don't expect in
> a 'sane' concurrent environment and/or don't respect the control flow.

A few of those got fixed already, because they violated the memory
model's requirements.  If you have further examples that are valid code
in the C11/C++11 model, please report them.




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Torvald Riegel
On Fri, 2014-02-07 at 08:50 -0800, Paul E. McKenney wrote:
> On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > Hopefully some discussion of out-of-thin-air values as well.
> > 
> > Yes, absolutely shoot store speculation in the head already. Then drive
> > a wooden stake through its hart.
> > 
> > C11/C++11 should not be allowed to claim itself a memory model until that
> > is sorted.
> 
> There actually is a proposal being put forward, but it might not make ARM
> and Power people happy because it involves adding a compare, a branch,
> and an ISB/isync after every relaxed load...  Me, I agree with you,
> much preferring the no-store-speculation approach.

My vague recollection is that everyone agrees that out-of-thin-air
values shouldn't be allowed, but that it's surprisingly complex to
actually specify this properly.

However, the example that Peter posted further down in the thread seems
to be unrelated to out-of-thin-air.




Re: [MIPS] Avoiding FP operations/register usage

2014-02-07 Thread Joseph S. Myers
On Fri, 7 Feb 2014, Matthew Fortune wrote:

> My most recent reason for looking at this is because I am starting to 
> understand/look at mips ld.so from glibc and it appears to make such an 
> assumption. I.e. I cannot see it using any specific options to prevent 
> the use of floating point but the path into the dynamic linker for 
> resolving symbols only preserves integer argument registers and ignores 
> floating point. I have to therefore assume that the MIPS backend manages 
> to avoid what I thought was a common problem of using floating point 
> registers as integer scratch in extreme circumstances.

Even if you avoid use of floating point (via -ffixed-* options - check 
carefully that those are actually effective, as for some targets there are 
or have been initialization order issues for registers that are only 
conditionally available, that may make such options ineffective - not 
-msoft-float, as that would mark the objects ABI-incompatible), you'd 
still need to save and restore call-clobbered registers used for argument 
passing, because IFUNC resolvers, audit modules and user implementations 
of malloc might clobber them.  Thus, I think ld.so needs to save and 
restore those registers (and so there isn't much point making it avoid 
floating point).  See 
.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Joseph S. Myers
On Fri, 7 Feb 2014, Peter Zijlstra wrote:

> There's further problems where things like memset() can write outside
> the specified address range. Examples are memset() using single
> instructions to wipe entire cachelines and then 'restoring' the tail
> bit.

If memset (or any C library function) modifies bytes it's not permitted to 
modify in the abstract machine, that's a simple bug and should be reported 
as usual.  We've made GCC follow that part of the memory model by default 
(so a store to a non-bit-field structure field doesn't do a 
read-modify-write to a word containing another field, for example) and I 
think it's pretty obvious that glibc should do so as well.
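
As a concrete illustration (my example):

  struct S { char a; char b; };
  struct S s;

  /* Under the memory model, this store may not be implemented as a wider
     read-modify-write that also rewrites s.b ... */
  void set_a (void) { s.a = 1; }

  /* ... because another thread may legitimately be doing this at the same
     time, with no data race. */
  void set_b (void) { s.b = 1; }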

(Of course, memset is not an atomic operation, and you need to allow for 
that if you use it on an _Atomic object - which is I think valid, unless 
the object is also volatile, but perhaps ill-advised.)

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [MIPS] Avoiding FP operations/register usage

2014-02-07 Thread Matthew Fortune
> > My most recent reason for looking at this is because I am starting to
> > understand/look at mips ld.so from glibc and it appears to make such
> > an assumption. I.e. I cannot see it using any specific options to
> > prevent the use of floating point but the path into the dynamic linker
> > for resolving symbols only preserves integer argument registers and
> > ignores floating point. I have to therefore assume that the MIPS
> > backend manages to avoid what I thought was a common problem of using
> > floating point registers as integer scratch in extreme circumstances.
> 
> Even if you avoid use of floating point (via -ffixed-* options - check 
> carefully
> that those are actually effective, as for some targets there are or have been
> initialization order issues for registers that are only conditionally 
> available,
> that may make such options ineffective - not -msoft-float, as that would
> mark the objects ABI-incompatible), you'd still need to save and restore call-
> clobbered registers used for argument passing, because IFUNC resolvers,
> audit modules and user implementations of malloc might clobber them.

This is where I was going next with this but I didn't know if it was 
appropriate to go into such things on the GCC list.

> Thus, I think ld.so needs to save and restore those registers (and so there
> isn't much point making it avoid floating point).  See
> .

Thanks for this and I agree. I've read some of the threads on this topic but 
not these. I have also realised I've stumbled my way into something that will 
also affect/be affected by how we define the ABI extension for MSA. If we 
define an ABI extension that uses MSA registers for arguments then these would 
also need saving around dynamic loader entry points.

I'm still interested in how successfully the MIPS backend is managing to avoid 
floating point but I am also convinced there are bugs in ld.so entry points for 
MIPS.

Matthew



Fwd: LLVM collaboration?

2014-02-07 Thread Renato Golin
Folks,

I'm about to do something I've been advised against, but since I
normally don't have good judgement, I'll risk it, because I think it's
worth it. I know some people here share my views and this is the
reason I'm writing this.


The problem

For a long time already I've been hearing on the LLVM list people
saying: "oh, ld should not accept this deprecated instruction, but we
can't change that", "that would be a good idea, but we need to talk to
the GCC guys first", and to be honest, nobody ever does.

Worse still, with Clang and LLVM getting more traction recently, and
with a lot of very interesting academic work being done, a lot of new
things are getting into LLVM first (like the sanitizers, or some
specialized pragmas), and we're dangerously close to starting to have
clang-extensions, which in my humble opinion would be a nightmare.

We, on the other side of the fence, know very well how hard it is to
keep up with legacy undocumented gcc-extensions, and the ARM side is
particularly filled with magical things, so I know very well how you
guys would feel if you, one day, had to start implementing clang stuff
without even having participated in the original design, just because
someone relies on it.

So, as far as I can see (please, correct me if I'm wrong), there are
two critical problems that we're facing right now:

1. There IS an unnecessary fence between GCC and LLVM.

License arguments are one reason why we can't share code as easily as
we would like, but there is no argument against sharing ideas,
cross-reporting bugs, helping each other implement a better
compiler/linker/assembler/libraries just because of an artificial
wall. We need to break this wall.

I rarely see GCC folks reporting bugs on our side, or people saying
"we should check with the GCC folks" actually doing it. We're not
contagious folks, you know. Talking to GCC engineers won't make me a
lesser LLVM engineer, and vice-versa.

I happen to have a very deep respect for GCC *and* for my preferred
personal license (GPLv3), but I also happen to work with LLVM, and I
like it a lot. There is no contradiction on those statements, and I
wish more people could share my opinion.

2. There are decisions that NEED to be shared.

In the past, GCC implemented a lot of extensions because the standards
weren't good enough. This has changed, but the fact that there will
always be things that don't belong on any other standard, and are very
specific to the toolchain inner workings, hasn't.

It would be beneficial to both toolchains to have a shared forum where
we could not only discuss how to solve problems better, but also keep
track of the results, so we can use it as guidelines when implementing
those features.

Further still, other compilers would certainly benefit from such
guidelines, if they want to interact with our toolchains. So, this
wouldn't be just for our sake, but also for future technologies. We
had a hard time figuring out why GCC would do this or that, and in the
end, there was always a reason (mostly good, sometimes, not so much),
but we wasted a lot of time following problems lost in translation.


The Open Source Compiler Initiative

My view is that we're unnecessarily duplicating a lot of the work to
create a powerful toolchain. The license problems won't go away, so I
don't think LLVM will ever disappear. But we're engineers, not
lawyers, so we should solve the bigger technical problem in a way that
we know how: by making things work.

For the last year or two, Clang and GCC are approaching an asymptote
as to what people believe a toolchain should be, but we won't converge
to the same solution unless we talk. If we keep our ideas enclosed
inside our own communities (who has the time to follow both gcc and
llvm lists?), we'll forever fly around the expected target and never
reach it.

To solve the technical problem of duplicated work we just need to
start talking to each other. This mailing list (or LLVM's) is not a
good place, since the traffic is huge and not every one is interested,
so I think we should have something else (another list? a web page? a
bugzilla?) where we'd record all common problems and proposals for new
features (not present in any standards), so that at least we know what
the problems are.

Getting a problem fixed or a proposal accepted would go a long way
towards having it treated as kosher by both compilers, and that could be
considered the standard compiler behaviour, so other compilers, even
closed source ones, could follow suit.

I'll be at the GNU Cauldron this year, feel free to come and discuss
this and other ideas. I hope to participate more in the GCC side of
things, and I wish some of you guys would do the same on our side. And
hopefully, in a few years, we'll all be on the same side.

I'll stop here, TL;DR; wise. Please, reply copying me, as I'm not
(yet) subscribing to this list.

Best Regards,
--renato


Re: LLVM collaboration?

2014-02-07 Thread Diego Novillo
On Fri, Feb 7, 2014 at 4:33 PM, Renato Golin  wrote:

> I'll be at the GNU Cauldron this year, feel free to come and discuss
> this and other ideas. I hope to participate more in the GCC side of
> things, and I wish some of you guys would do the same on our side. And
> hopefully, in a few years, we'll all be on the same side.

I think this would be worth a BoF, at the very least. Would you be
willing to propose one? I just need an abstract to get it in the
system. We still have some room left for presentations.

I think the friendly competition we have going between the two
compilers has done nothing but improve both toolchains. I agree that
we should keep it at this level. Any kind of abrasive interaction
between the two communities is a waste of everyone's time.

Both compilers have a lot to learn from each other.


Diego.


Re: LLVM collaboration?

2014-02-07 Thread Renato Golin
On 7 February 2014 21:53, Diego Novillo  wrote:
> I think this would be worth a BoF, at the very least. Would you be
> willing to propose one? I just need an abstract to get it in the
> system. We still have some room left for presentations.

Hi Diego,

Thanks, that'd be great!

A BoF would give us more time to discuss the issue, even though I'd
like to start the conversation a lot earlier. Plus, I have a lot more
to learn than to talk about. ;)

Something along the lines of...

* GCC and LLVM collaboration / The Open Source Compiler Initiative

With LLVM mature enough to feature as the default toolchain in some
Unix distributions, and with the inherent (and profitable) share of
solutions, ideas and code between the two, we need to start talking at
a more profound level. There will always be problems that can't be
included in any standard (language, extension, or machine-specific)
and are intrinsic to the compilation infrastructure. For those, and
other common problems, we need common solutions to at least both LLVM
and GCC, but ideally any open source (and even closed source)
toolchain. In this BoF session, we shall discuss to what extent this
collaboration can take us, how we should start and what are the next
steps to make this happen.

cheers,
--renato


Re: LLVM collaboration?

2014-02-07 Thread Andrew Pinski
On Fri, Feb 7, 2014 at 1:33 PM, Renato Golin  wrote:
> Folks,
>
> I'm about to do something I've been advised against, but since I
> normally don't have good judgement, I'll risk it, because I think it's
> worth it. I know some people here share my views and this is the
> reason I'm writing this.
>
>
> The problem
>
> For a long time already I've been hearing on the LLVM list people
> saying: "oh, ld should not accept this deprecated instruction, but we
> can't change that", "that would be a good idea, but we need to talk to
> the GCC guys first", and to be honest, nobody ever does.

Well, you should also be talking to the binutils folks for ld issues.
GCC is less of an issue here.


>
> Worst still, with Clang and LLVM getting more traction recently, and
> with a lot of very interesting academic work being done, a lot of new
> things are getting into LLVM first (like the sanitizers, or some
> specialized pragmas) and we're dangerously close to start having
> clang-extensions, which in my humble opinion, would be a nightmare.
>
> We, on the other side of the fence, know very well how hard it is to
> keep with legacy undocumented gcc-extensions, and the ARM side is
> particularly filled with magical things, so I know very well how you
> guys would feel if you, one day, had to start implementing clang stuff
> without even participating in the original design just because someone
> relies on it.

Can you give an example?  We have been cleaning up these undocumented
extensions.  I think this is not a big issue, but without examples it
is harder to figure out what needs to be done.

>
> So, as far as I can see (please, correct me if I'm wrong), there are
> two critical problems that we're facing right now:
>
> 1. There IS an unnecessary fence between GCC and LLVM.

I don't see that.  What I see is that GCC folks work on GCC and the
rest of the GNU tools (including glibc), while the LLVM folks work only
on LLVM, and when they find an issue they don't bring it up on the GCC
list.

>
> License arguments are one reason why we can't share code as easily as
> we would like, but there is no argument against sharing ideas,
> cross-reporting bugs, helping each other implement a better
> compiler/linker/assembler/libraries just because of an artificial
> wall. We need to break this wall.
>
> I rarely see GCC folks reporting bugs on our side, or people saying
> "we should check with the GCC folks" actually doing it. We're not
> contagious folks, you know. Talking to GCC engineers won't make me a
> lesser LLVM engineer, and vice-versa.

That is because most of us don't need to figure out what LLVM is doing
most of the time.  It should be up to the people who say "check with
the GCC folks first" to actually ask the GCC folks, rather than the GCC
folks having to figure out what LLVM needs.

>
> I happen to have a very deep respect for GCC *and* for my preferred
> personal license (GPLv3), but I also happen to work with LLVM, and I
> like it a lot. There is no contradiction on those statements, and I
> wish more people could share my opinion.
>
> 2. There are decisions that NEED to be shared.
>
> In the past, GCC implemented a lot of extensions because the standards
> weren't good enough. This has changed, but the fact that there will
> always be things that don't belong on any other standard, and are very
> specific to the toolchain inner workings, hasn't.

It depends on the extensions.  Most people working on GCC are
working on GCC because that is the only compiler they need to work on;
LLVM is not even in most of their minds when they bring up an
extension.

A good example is asm goto: it was brought up by the Linux kernel
folks to the GCC folks.  We as a project should not have to say "please
also bring it up to the clang folks for discussion"; GCC should
implement a good extension which is useful for the kernel folks.  The
same is true of the ifunc extension, which was brought up by the glibc
folks to GCC.
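
For reference, a minimal sketch of what asm goto looks like (illustrative
only; the kernel's real static-key uses are more involved):

  static inline int fast_path (void)
  {
    __asm__ goto ("nop"      /* the kernel patches this to a jump at runtime */
                  :          /* asm goto takes no outputs */
                  :          /* no inputs */
                  :          /* no clobbers */
                  : slow);   /* C labels the asm may branch to */
    return 1;
  slow:
    return 0;
  }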

>
> It would be beneficial to both toolchains to have a shared forum where
> we could not only discuss how to solve problems better, but also keep
> track of the results, so we can use it as guidelines when implementing
> those features.

Again it depends on the issue.  Some issues don't need an extra
discussion list, which would in fact just get in the way of
implementing features.

>
> Further still, other compilers would certainly benefit from such
> guidelines, if they want to interact with our toolchains. So, this
> wouldn't be just for our sake, but also for future technologies. We
> had a hard time figuring out why GCC would do this or that, and in the
> end, there was always a reason (mostly good, sometimes, not so much),
> but we wasted a lot of time following problems lost in translation.
>
>
> The Open Source Compiler Initiative
>
> My view is that we're unnecessarily duplicating a lot of the work to
> create a powerful toolchain. The license problems won't go away, so I
> don't think LLVM will ever disappear. But we're engineers, not
> 

Re: LLVM collaboration?

2014-02-07 Thread Andrew Pinski
On Fri, Feb 7, 2014 at 1:53 PM, Diego Novillo  wrote:
> On Fri, Feb 7, 2014 at 4:33 PM, Renato Golin  wrote:
>
>> I'll be at the GNU Cauldron this year, feel free to come and discuss
>> this and other ideas. I hope to participate more in the GCC side of
>> things, and I wish some of you guys would do the same on our side. And
>> hopefully, in a few years, we'll all be on the same side.
>
> I think this would be worth a BoF, at the very least. Would you be
> willing to propose one? I just need an abstract to get it in the
> system. We still have some room left for presentations.

I still don't see any need for this extra BoF, really.  Extensions
should be handled at their source rather than at their destination.
In fact I see this as weakening the competition between the two
compilers.  Things like the new attributes being added for the kernel
to use (the fact that they are already implemented in sparse is worth
mentioning here) should have been discussed at the source.  HPA filed
the bugs against GCC as he is a user of GCC but not a user of LLVM; if
someone in the kernel community wanted LLVM support they would have
filed the bugs there.

And then again, the original message here says that GCC does not
control binutils (ld) and that "ld should not accept this deprecated
instruction, but we can't change that"; but that should have been taken
to the binutils community rather than the GCC one, since they are two
separate projects (though most folks work on both).

Thanks,
Andrew Pinski

>
> I think the friendly competition we have going between the two
> compilers has done nothing but improve both toolchains. I agree that
> we should keep it at this level. Any kind of abrasive interaction
> between the two communities is a waste of everyone's time.
>
> Both compilers have a lot to learn from each other.
>
>
> Diego.


Re: LLVM collaboration?

2014-02-07 Thread Andrew Pinski
On Fri, Feb 7, 2014 at 2:07 PM, Renato Golin  wrote:
> On 7 February 2014 21:53, Diego Novillo  wrote:
>> I think this would be worth a BoF, at the very least. Would you be
>> willing to propose one? I just need an abstract to get it in the
>> system. We still have some room left for presentations.
>
> Hi Diego,
>
> Thanks, that'd be great!
>
> A BoF would give us more time to discuss the issue, even though I'd
> like to start the conversation a lot earlier. Plus, I have a lot more
> to learn than to talk about. ;)
>
> Something along the lines of...
>
> * GCC and LLVM collaboration / The Open Source Compiler Initiative

If it is going to be called anything, it should be GNU and LLVM
collaboration, since GCC does not include binutils/gdb while LLVM
includes the assembler, etc.
Again, I think the problem is not knowing whom to talk to, rather than
the lack of another list or a BoF.  Many of the GCC developers don't
talk with LLVM developers and don't feel the need to.  The LLVM folks,
on the other hand, say "report it to the GCC folks" (though it should
really be the GNU folks) but never follow through; so this is an LLVM
issue rather than a GCC issue.  GCC developers depend on the GNU
binutils, and when there is a bug in the linker/assembler we report it;
from the sound of it, the LLVM developers don't.

Thanks,
Andrew

>
> With LLVM mature enough to feature as the default toolchain in some
> Unix distributions, and with the inherent (and profitable) share of
> solutions, ideas and code between the two, we need to start talking at
> a more profound level. There will always be problems that can't be
> included in any standard (language, extension, or machine-specific)
> and are intrinsic to the compilation infrastructure. For those, and
> other common problems, we need common solutions to at least both LLVM
> and GCC, but ideally any open source (and even closed source)
> toolchain. In this BoF session, we shall discuss to what extent this
> collaboration can take us, how we should start and what are the next
> steps to make this happen.
>
> cheers,
> --renato


Re: LLVM collaboration?

2014-02-07 Thread Renato Golin
On 7 February 2014 22:33, Andrew Pinski  wrote:
> If it is going to be called anything, it should be GNU and LLVM
> collaboration, since GCC does not include binutils/gdb while LLVM
> includes the assembler, etc.

Good point. I do mean the whole toolchain.

cheers,
--renato


Re: LLVM collaboration?

2014-02-07 Thread Jonathan Wakely
On 7 February 2014 21:33, Renato Golin wrote:
>
> Worst still, with Clang and LLVM getting more traction recently, and
> with a lot of very interesting academic work being done, a lot of new
> things are getting into LLVM first (like the sanitizers, or some
> specialized pragmas) and we're dangerously close to start having
> clang-extensions, which in my humble opinion, would be a nightmare.

The sanitizers are IMHO an impressive example of collaboration. The
process may not be perfect, but the fact is that those powerful tools
are available in both compilers - I think that's amazing!

> We, on the other side of the fence, know very well how hard it is to
> keep with legacy undocumented gcc-extensions, and the ARM side is
> particularly filled with magical things, so I know very well how you
> guys would feel if you, one day, had to start implementing clang stuff
> without even participating in the original design just because someone
> relies on it.

Like the Blocks extension? :-)


> So, as far as I can see (please, correct me if I'm wrong), there are
> two critical problems that we're facing right now:
>
> 1. There IS an unnecessary fence between GCC and LLVM.
>
> License arguments are one reason why we can't share code as easily as
> we would like, but there is no argument against sharing ideas,
> cross-reporting bugs, helping each other implement a better
> compiler/linker/assembler/libraries just because of an artificial
> wall. We need to break this wall.

If there's a wall I agree we should break it (I don't see one in the
areas I work on, which I think is great).


> I rarely see GCC folks reporting bugs on our side,

For my part, I report them when I find them, but I just don't use
Clang or LLVM that much, so I don't find many (also the few things I
test often work correctly anyway!)

I expect that many GCC devs aren't reporting bugs because they're just
not using LLVM.  I don't report OpenBSD bugs either, not because I
dislike OpenBSD, I just don't use it.


> I happen to have a very deep respect for GCC *and* for my preferred
> personal license (GPLv3), but I also happen to work with LLVM, and I
> like it a lot. There is no contradiction on those statements, and I
> wish more people could share my opinion.

I'm sure many of us do.

IMHO more collaboration should help both projects, but I think there
is already more collaboration than some people realise. Development of
OpenMP, DWARF, the C++ Itanium ABI, the psABI etc. happens in the open
and is not limited to GNU devs.

For things that don't belong in any standard, such as warning options,
that's an area where the compilers may be in competition to provide a
better user-experience, so it's unsurprising that options get added to
one compiler first without discussing it with the other project. What
tends to happen with warnings is someone says "hey, clang has this
warning, we should add it too" e.g. -Wdelete-non-virtual-dtor or
-Winclude-guard, so we may end up agreeing eventually anyway.


Re: LLVM collaboration?

2014-02-07 Thread Renato Golin
On 7 February 2014 22:42, Jonathan Wakely  wrote:
> The sanitizers are IMHO an impressive example of collaboration. The
> process may not be perfect, but the fact is that those powerful tools
> are available in both compilers - I think that's amazing!

I agree.


> Like the Blocks extension? :-)

So, as an example, I started a discussion about our internal
vectorizer and how we could control it from pragmas to test and
report errors. It turned out a lot bigger than I imagined, with
people advocating inclusion in OpenMP pragmas, or implementing them
as C++11 annotations, and even talking about back-porting the
annotations to C89 code as an extension. Seriously, that gave me the
chills.
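
To give a concrete flavour of the kind of per-loop control being
discussed (a sketch only; the exact spellings are illustrative, and
some of them postdate parts of that thread), each project ended up
with its own spelling, which is precisely the problem:

  /* Illustrative only, not from that discussion.  "#pragma omp simd"
   * is the OpenMP 4.0 spelling (needs -fopenmp); GCC also has its own
   * "#pragma GCC ivdep", and Clang ended up with
   * "#pragma clang loop vectorize(enable)". */
  void scale(float *restrict out, const float *restrict in, float k, int n)
  {
  #pragma omp simd
      for (int i = 0; i < n; i++)
          out[i] = in[i] * k;
  }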

Working on the ARM debugger, the compiler and now on LLVM, I had to
work around and understand GNU-isms and the contract that the kernel
has with the toolchain, which I don't think is entirely healthy. We
should, yes, have a close relationship with them, but some proposals
are easier to implement in one compiler than in another, and others
are just implemented because it was the quickest implementation, or
generated the smallest code, or whatever. These are things I expected
to see in closed-source toolchains (as I have), but not in an open
source one.

At the very least, some discussion could point to defects in one or
the other toolchain, as well as in the kernel. We've seen a fair
amount of bad code that GCC accepts in the kernel just because it can
(VLAIS, nested functions), not because it's sensible, and that's
actually making the kernel code worse with respect to the standard.
Opinions will vary, and I don't expect everyone to agree with me that
those are nasty (nor do I want a flame war over it, please), but some
consensus would be good.
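
For readers who haven't run into them, a minimal sketch of the two
extensions mentioned above; both are GNU C extensions that GCC
accepts but that fall outside ISO C and are not accepted by Clang,
and neither snippet is taken from actual kernel code:

  /* 1. VLAIS: a variable-length array used as a struct member. */
  void hash_buffer(int digest_size, const unsigned char *data)
  {
      struct {
          int len;
          unsigned char digest[digest_size];  /* VLA inside a struct */
      } desc;

      desc.len = digest_size;
      (void) data;
  }

  /* 2. Nested function: a function defined inside another function,
   * with access to the enclosing function's locals. */
  int sum(int n, const int *a)
  {
      int total = 0;
      int i;

      void add(int x) { total += x; }  /* GNU C nested function */

      for (i = 0; i < n; i++)
          add(a[i]);
      return total;
  }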


> I expect that many GCC devs aren't reporting bugs because they're just
> not using LLVM.  I don't report OpenBSD bugs either, not because I
> dislike OpenBSD, I just don't use it.

I understand that, and I take your point. I wasn't asking everyone to
use it, but to enquire about new extensions when they come your way,
as we should do when they come our way. I'm guilty of this myself,
this being my first email to the gcc list (and I have been publicly
bashed at FOSDEM because of it, which I appreciate).


> For things that don't belong in any standard, such as warning options,
> that's an area where the compilers may be in competition to provide a
> better user-experience, so it's unsurprising that options get added to
> one compiler first without discussing it with the other project. What
> tends to happen with warnings is someone says "hey, clang has this
> warning, we should add it too" e.g. -Wdelete-non-virtual-dtor or
> -Winclude-guard, so we may end up agreeing eventually anyway.

I think you have touched on a very good point: "competition to
provide the best user experience". Do we really need that?

Front-end warnings are quite easy to replicate, but some other flags
may have slightly different semantics in each compiler, and expecting
the user to tell the difference is cruel. Inline assembly magic and
new ASM directives are another issue that populates the kernel (and
we've been implementing all of them in our assembler to compile the
kernel). That simply won't ever go away.
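
As an illustration of the kind of construct involved (a sketch, not
actual kernel code): GNU extended inline asm ties together
constraints, operands and clobbers, and the subtle corners of that
contract are exactly what another toolchain has to reproduce:

  /* Illustrative x86-64 extended asm in the GNU style. */
  static inline unsigned long xchg_ulong(unsigned long *ptr, unsigned long val)
  {
      asm volatile("xchg %0, %1"
                   : "+r" (val), "+m" (*ptr)  /* read-write register/memory */
                   :                          /* no additional inputs */
                   : "memory");               /* full compiler barrier */
      return val;
  }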

I question some of those decisions, as I have questioned some of
ARM's decisions on its ABIs: things that once had a purpose, but
whose core reason is gone, so we can move along. Some consensus would
probably have helped to design a better, longer-lasting solution; a
lot more consensus would have halted any progress, so we have to be
careful. But I specifically don't think that extensions required by
third parties (like the kernel) should be discussed directly with any
single compiler community, as that will perpetuate this problem.

Some kernel developers, including Linus, are very receptive to
compiling the kernel with Clang, so new extensions will probably be
discussed with both. Now, if we require them to discuss with each
community separately, I'm sure the user experience will be terrible
when trying to consolidate it.

I don't want us to compete, I want us to collaborate. I don't believe
LLVM is ever going to steal GCC's shine, but both will coexist, and
having a friendly coexistence would be a lot better for everyone.
Aren't we doing open source for a better world? I can't see where
competition fits into this.

As I said before, one of my main goals in working with LLVM is to
make *GCC* better. I believe that having two toolchains is better than one
for that very reason, but maintaining two completely separate and
competing toolchains is not sustainable, even for open source
projects.

cheers,
--renato


Re: Fwd: LLVM collaboration?

2014-02-07 Thread Joseph S. Myers
On Fri, 7 Feb 2014, Renato Golin wrote:

> For a long time already I've been hearing on the LLVM list people
> saying: "oh, ld should not accept this deprecated instruction, but we
> can't change that", "that would be a good idea, but we need to talk to
> the GCC guys first", and to be honest, nobody ever does.

I think there are other closely related issues, as GCC people try to work 
around issues with glibc, or vice versa, rather than coordinating what 
might be the best solution involving changes to both components, as people 
in the glibc context complain about some Linux kernel decision but have 
difficulty getting any conclusion in conjunction with Linux kernel people 
about the right way forward (or, historically, have difficulty getting 
agreement there is a problem at all - the Linux kernel community has 
tended to have less interest in supporting the details of standards than 
the people now trying to do things in GCC and glibc), as Linux kernel 
people complain about any compiler that optimizes C as a high-level 
language in ways conflicting with its use as something more like a 
portable assembler for kernel code, and as people from the various 
communities complain about issues with underlying standards such as ISO C 
and POSIX but rather less reliably engage with the standards process to 
solve those issues.

Maybe the compiler context is sufficiently separate from the others 
mentioned that there should be multiple collaboration routes for 
(compilers), (libraries / kernel), ... - but people need to be aware that 
just because something is being discussed in a compiler context doesn't 
mean that a C language extension is the right solution; it's possible 
something involving both language and library elements is right, it's 
possible collaboration with the standards process is right at an early 
stage.

(The libraries / kernel collaboration venue exists - the linux-api list, 
which was meant to be for anything about the kernel/userspace interface.  
Unfortunately, it's rarely used - most relevant kernel discussion doesn't 
go there - and I don't have time to follow linux-kernel.  We have recently 
seen several feature requests from the Linux kernel side reported to GCC 
Bugzilla, which is good - at least if there are people on the GCC side 
working on getting such things of use to the kernel implemented in a 
suitably clean way that works for what the kernel wants.)

> 2. There are decisions that NEED to be shared.
> 
> In the past, GCC implemented a lot of extensions because the standards
> weren't good enough. This has changed, but the fact that there will
> always be things that don't belong on any other standard, and are very
> specific to the toolchain inner workings, hasn't.

There are also lots of things where either (a) it would make sense to get 
something in a standard - it can be defined sensibly at the level ISO C / 
C++ deals with, or (b) the standard exists, but what gets implemented 
ignores the standard.  Some of this may be because economic incentives 
seem to get things done one way rather than another way that would 
ultimately be better for users of the languages.

To expand on (a): for a recent GCC patch there was a use for having 
popcount on the host, and I noted 
 how that's one 
of many integer manipulation operations lacking any form of standard C 
bindings.  Sometimes for these things we do at least have 
target-independent GCC extensions - but sometimes just target-specific 
ones, with multiple targets having built-in functions for similar things, 
nominally mapping to particular instructions, when it would be better to 
have a standard C binding for a given operation.
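
To make the popcount example concrete: today the options are a
portable loop or a compiler-specific built-in, neither of which is a
standard C binding (sketch only):

  /* Portable fallback: clear the lowest set bit until none remain. */
  static inline unsigned popcount_portable(unsigned x)
  {
      unsigned n = 0;
      while (x) {
          x &= x - 1;
          n++;
      }
      return n;
  }

  /* Target-independent GCC/Clang extension; typically lowered to a
   * single instruction (e.g. POPCNT on x86) when the target has one. */
  static inline unsigned popcount_builtin(unsigned x)
  {
      return (unsigned) __builtin_popcount(x);
  }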

To expand on (b): consider the recent AVX512 GCC patches.  As is typical 
for patches enabling support for new instruction set features, they added 
a large pile of intrinsics (intrinsic headers, mapping to built-in 
functions).  The intrinsics implement a standard of sorts - shared with 
Intel's compiler, at least.  But are they really the right approach in all 
cases?  The features of AVX512 include, in particular, floating-point 
operations with rounding modes embedded in the instruction (in support of 
IEEE 754-2008 saying language standards should support a way to apply 
rounding modes to particular blocks, not just dynamic rounding modes).
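
For a concrete flavour of the intrinsics approach being questioned
here (names follow Intel's documented AVX512F intrinsics; treat the
snippet as a sketch, compiled with -mavx512f):

  #include <immintrin.h>

  /* Per-operation ("static") rounding mode embedded in the instruction,
   * exposed only through a target-specific intrinsic -- nothing an
   * architecture-independent C program can currently express. */
  __m512 add_round_up(__m512 a, __m512 b)
  {
      return _mm512_add_round_ps(a, b,
                                 _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
  }

The draft TS discussed in the next paragraph aims instead at a
portable, block-scoped spelling of the same idea (a constant rounding
mode applied to a region of code).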

There's a proposed specification for C bindings to such a feature - draft 
TS 18661-1 (WG14 N1778; JTC1 ballot ends 2014-03-05, so may well be 
published later this year).  There was some discussion of this starting 
with  (discussion 
continued into Jan 2013), which I presume was motivated by the AVX512 
feature, but in the end the traditional intrinsics were the approach taken 
for supporting this feature, not anything that would allow 
architecture-independent source code to be written.  (The AVX512 feature 
combines constant rounding modes with di

Re: Fwd: LLVM collaboration?

2014-02-07 Thread Renato Golin
On 7 February 2014 23:30, Joseph S. Myers  wrote:
> I think there are other closely related issues, as GCC people try to work
> around issues with glibc, or vice versa, rather than coordinating what
> might be the best solution involving changes to both components,

Hi Joseph,

Thanks for the huge email; all of it (IMHO) was spot on. I agree with
your arguments, and one of the reasons why I finally sent my email is
that I'm starting to see all of this in LLVM, too.

Because of licenses, we have to replicate libgcc, libstdc++, the
linker, etc. And in many ways, features get added in random places
because it's the easiest route, or because it's the right place to
be, even though there isn't anything controlling or monitoring the
feature in the grand scheme of things. This will, in the end,
invariably take us down the route that GNU went a few years back,
when people had to wear radioactive suits to work on some parts of
GCC.

So, I guess my email was more a cry for help than a request to play
nice (as some would infer). I don't think we should repeat the same
mistakes you guys made, but I also think that we have a lot to offer,
as you mention, in looking at extensions and proposing them to the
standards bodies, keeping kernel requests sane, presenting a unified
argument on specific changes, and so on.

The perfect world would be one in which any compiler could use any
assembler, linker and libraries interchangeably. While that may never
happen, as a long-term goal it would at least give us a nice
asymptote to follow. Like everyone here and there, I don't have
enough time to work through every detail and follow all the lists,
but if we encourage crossover, or even cross-posting between the two
lists, we might solve common problems without wasting additional
time.

--renato