Re: asking your advice about bug
On 02/17/2014 06:50 PM, Roman Gareev wrote: Hi Tobias, thanks for the answer! Hi Roman, sorry for missing this mail. I think that the segfault is being caused by NULL arguments being passed to compute_deps by loop_level_carries_dependences. This causes an assignment of NULL values to the following parameters of compute_deps: must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. They are then passed to subtract_commutative_associative_deps and dereferenced in the following statements: *must_raw_no_source = isl_union_map_subtract (*must_raw_no_source, x_must_raw_no_source); *may_raw_no_source = isl_union_map_subtract (*may_raw_no_source, x_may_raw_no_source); *must_war_no_source = isl_union_map_subtract (*must_war_no_source, x_must_war_no_source); *may_war_no_source = isl_union_map_subtract (*may_war_no_source, x_may_war_no_source); *must_waw_no_source = isl_union_map_subtract (*must_waw_no_source, x_must_waw_no_source); *may_waw_no_source = isl_union_map_subtract (*may_waw_no_source, x_may_waw_no_source); This is the reason for the segfault. (All functions mentioned above are located in gcc/graphite-dependences.c) Interesting analysis. I think that this can be solved by adding NULL checks in subtract_commutative_associative_deps for the following variables: must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. I've implemented this in the patch, which can be found below. Yes, this would be a 'solution'. However, I am in fact surprised that those variables are NULL at all. Do you have an idea why this is the case? Understanding this would help to determine whether the patch you propose is actually the right solution or whether it is just hiding an earlier bug. Cheers, Tobias
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: And with "conservative" I mean everything is a source of a dependency, and hence can't be removed, reordered or otherwise fiddled with, and that includes code sequences where no atomic objects are anywhere in sight [1]. In the light of that, the only realistic way (meaning not having to disable optimization everywhere) to implement consume as currently specified is to map it to acquire. At which point it becomes pointless. No, only memory_order_consume loads and [[carries_dependency]] function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, I should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the carries-a-dependency relation is not the sources, but rather where it stops. It's transitively closed over "the value of evaluation A is used as an operand of evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load, then that evaluation must be compiled in a way that does not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument cannot possibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain-breaking optimizations (of which there are many). [1] Simple example of the type of transformation that would be disallowed: int getzero (int i) { return i - i; } This needs to be as follows: [[carries_dependency]] int getzero (int i [[carries_dependency]]) { return i - i; } Otherwise dependencies won't get carried through it. 
So, with the above, do you agree that in the absence of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into return 0; because of the C11 rules for carries-a-dependency? If so, do you then also agree that the specification of carries-a-dependency is somewhat, err, shall we say, overbroad? depchains don't matter, could _then_ optimize it to zero. But that's insane, especially considering that it's hard to detect whether a given context doesn't care for depchains; after all, the depchain relation is constructed exactly so that it bleeds into nearly everything. So we would most of the time have to assume that the ultimate context will be depchain-aware and therefore disable many transformations. Any function that does not contain a memory_order_consume load and that doesn't have any arguments marked [[carries_dependency]] can be optimized just as before. And as such a marker doesn't exist, we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. Then inlining getzero would merely add another # j.dep = i.dep relation, so depchains are still there, but the value optimization can happen before inlining. Having to do something like that I'd find disgusting, and would rather rewrite consume into acquire :) Or make the depchain relation somehow realistically implementable. I was actually OK with arithmetic cancellation breaking the dependency chains. Others on the committee felt otherwise, and I figured that (1) I wouldn't be writing that kind of function anyway and (2) they knew more about writing compilers than I. I would still be OK saying that things like i-i, i*0, i%1, i&0, i|~0 and so on just break the dependency chain. Exactly. I can see the problem that people had with that, though. There are very many ways to write concealed zeros (or, more generally, neutral elements of the function in question). My getzero() function is one (it could e.g. 
be an assembler implementation). The allowance to break dependency chains would have to apply to such cancellations as well, so one can't simply itemize all cases in which cancellation is allowed. Rather, it would have had to argue about something like value dependency, a la: evaluation B depends on A if there exist at least two different values A1 and A2 (results from A) for which evaluation B (with otherwise identical operands) yields different values B1 and B2. Alas, it doesn't, except if you want to understand the term "the value of A is used as an operand of B" in that way. Even then you'd still have the second case of the depchain definition, via intermediate (not even atomic) memory stores and loads, making two evaluations ordered per carries-a-dependency. And even that understanding of "is used" wouldn't be enough, because there are cases where the cancellation happens in steps, and where it interacts with the third clause (transitivity): Assume this: a = something() // evaluation
Non-temporal move
I can see a storent pattern in the x86 machine descriptions (in sse.md), but the internals documentation doesn't mention it. Should we add a description of it to the internals documentation? Regards Ganesh
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: All, Imagination Technologies would like to introduce the design of an O32 ABI extension for MIPS to allow it to be used in conjunction with MIPS FPUs having 64-bit floating-point registers. This is a wide-reaching design that involves changes to all components of the MIPS toolchain; it is being posted to GCC first and will progress on to other tools. This ABI extension is compatible with the existing O32 ABI definition and will not require the introduction of new build variants (multilibs). The design document is relatively large and has been placed on the MIPS Compiler Team wiki to facilitate review: http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking Looks good to me. It'll be interesting to see whether making the odd-numbered call-saved-in-fr0 registers available for frx pays off or whether it ends up being better to avoid them. Indeed, I suspect they should be avoided except for leaf functions. You would have to be pretty desperate for a register to use the caller-and-callee save registers! I understand the need to deprecate the current -mgp32 -mfp64 behaviour. I don't think we should deprecate -mfp64 itself though. Instead, why not keep -mfp32 as meaning FR0, -mfp64 as meaning FR1, and add -mfpxx for modeless? So rather than deprecating the -mgp32 -mfp64 combination and adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form, in which the odd-numbered registers are call-clobbered, rather than the old form, in which they were call-saved. Extreme caution is the only reason why the design avoided changing fp64 behaviour (basically in case anyone had serious objections). If you would be happy with a change of behaviour for -mgp32 -mfp64 then that is a great start. AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. 
Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing fp64 as currently defined, but having found this new solution, such fixes are no longer important. The lack of support for gp32 fp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same builtin defines for FP64 if changing its behaviour (i.e. __mips_fpr=64)? The corresponding asm would then be .set fp=xx. Either way, a new .set option would be better than a specific .fr directive because it gives you access to the option stack (.set push/.set pop). I'm not sure about: "If an assembly directive is seen prior to the start of the text section then this modifies the default mode for the module." This isn't how any of the existing options work and I think the inconsistency would be confusing. It also means that if the first function in a file happens to force a local mode (e.g. because it's an ifunc implementation) then you'd have to remember to write: .fr x .fr 1 so that the first sets the mode for the module and the second sets it for the first function. The different treatment of the two lines wouldn't be obvious at first glance. How about instead having a separate directive that explicitly sets the global value of an option? I.e. something like .module, taking the same options as .set. Better names welcome. :-) Use of a different directive to actually affect the overall mode of a module sounds like a good plan, and it avoids the weird behaviour. The only thing specifically needed is that the assembly file records the mode it was written for. Getting the wrong command-line option would otherwise lead to unusual runtime failures. We have been/are still discussing this point, so it's no surprise you have commented on it too. I'll wait for any further comments on this area and update accordingly. 
The scheme allows an ifunc to request a mode and effectively gives the choice to the firstcomer. Every other ifunc then has to live with the choice. I don't think that's a good idea, since the order that ifuncs are called isn't well-defined or under direct user control. Since ifuncs would have to live with a choice made by other ifuncs, in practice they must all be prepared to deal with FR=1 if linked into a fully-modeless or FR1 program, and to deal with FR=0 if linked into a fully-modeless or FR0 program. So IMO the dynamic linker should simply set FR=1 for modeless programs if the hardware supports it and set it to 0 for modeless programs otherwise, like you say in the first paragraph of 9.4. The ifunc interaction should possibly be moved to a different proposal. We could reduce this down to a simple statement that interaction with ifunc needs to be
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: Let me think about it some more, but my gut feel is that just tweaking the definition of what "ordered" means is sufficient. So to go back to the suggested ordering rules (ignoring the "restrict" part, which is just to clarify that ordering through other means to get to the object doesn't matter), I suggested: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to" and I think the solution is to just say that this ordering acts as a fence. It doesn't say exactly *where* the fence is, but it says that there is *some* fence between the load of the pointer and any/all accesses to the object through that pointer. I'm wrong. That doesn't work. At all. There is no ordering except through the pointer chain. So I think saying just that, and nothing else (no magic fences, no nothing) is the right thing: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to, directly or indirectly through a chain of pointers" The thing is, anything but a chain of pointers (and maybe relaxing it to indexes in tables in addition to pointers) doesn't really work. The current standard tries to break it at obvious points that can lose the data dependency (either by turning it into a control dependency, or by just dropping the value, like the left-hand side of a comma expression), but the fact is, it's broken. It's broken not just because the value can be lost in other ways (i.e. the "p-p" example), it's broken because the value can be turned into a control dependency in so many other ways too. Compilers regularly turn arithmetic ops with logical comparisons into branches. 
So an expression like a = !!ptr carries a dependency in the current C standard, but it's entirely possible that a compiler ends up turning it into a compare-and-branch rather than a compare-and-set-conditional, depending on just exactly how a ends up being used. That's true even on an architecture like ARM that has a lot of conditional instructions (there are way less if you compile for Thumb, for example, but compilers also do things like if there are more than N predicated instructions I'll just turn it into a branch-over instead). So I think the C standard needs to just explicitly say that you can walk a chain of pointers (with that possible indexes in arrays extension), and nothing more. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 4:57 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: Let me think about it some more, but my gut feel is that just tweaking the definition of what "ordered" means is sufficient. So to go back to the suggested ordering rules (ignoring the "restrict" part, which is just to clarify that ordering through other means to get to the object doesn't matter), I suggested: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to" and I think the solution is to just say that this ordering acts as a fence. It doesn't say exactly *where* the fence is, but it says that there is *some* fence between the load of the pointer and any/all accesses to the object through that pointer. I'm wrong. That doesn't work. At all. There is no ordering except through the pointer chain. So I think saying just that, and nothing else (no magic fences, no nothing) is the right thing: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to, directly or indirectly through a chain of pointers" To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? Eh ... Just jumping in to throw in my weird 2 cents. Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:27 AM, Richard Biener richard.guent...@gmail.com wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So in your example, ptr = atomic_read(&p, CONSUME); would indeed order against the subsequent access of the chain through *that* pointer (the whole "restrict" thing that I left out as a separate thing, which was probably a mistake), but certainly not against any integer pointer, and certainly not against any aliasing pointer chains. So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') _and_ '**ptr' (getting 'i'), but nothing else - not even the aliasing access of dereferencing 'i' directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:37 AM, Linus Torvalds torva...@linux-foundation.org wrote: So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') _and_ '**ptr' (getting 'i'), but nothing else - not even the aliasing access of dereferencing 'i' directly. Btw, what CPU architects and memory-ordering guys tend to do in documentation is give a number of litmus-test pseudo-code sequences to show the effects and intent of the language. I think giving those kinds of litmus tests for both the "this is ordered" and the "this is not ordered" cases like the above would be a great clarification. Partly because the language is going to be somewhat legalistic and thus hard to wrap your mind around, and partly to really hit home the *intent* of the language, which I think is actually fairly clear to both compiler writers and to programmers. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Ciao, Michael.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Good point, I left that out of my list. Adding it: 13. By default, pointer chains do not propagate into or out of functions. 
In implementations having attributes, a [[carries_dependency]] may be used to mark a function argument or return as passing a pointer chain into or out of that function. If a function does not contain memory_order_consume loads and also does not contain [[carries_dependency]] attributes, then that function may be compiled using any desired dependency-breaking optimizations. The ordering effects are implementation defined when a given pointer chain passes into or out of a function through a parameter or return not marked with a [[carries_dependency]] attribute. Note that this last paragraph differs from the current standard, which would require ordering regardless. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz m...@suse.de wrote: So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). It's supposed to be ordered wrt the first load (the consuming one), yes. So, what happens if the pointer deref chain is partly hidden in some functions: No problem. The thing is, the ordering is actually handled by the CPU in all relevant cases. So the compiler doesn't actually need to *do* anything. All this legalistic stuff is just to describe the semantics and the guarantees. The problem is two cases: (a) alpha (which doesn't really order any accesses at all, not even dependent loads), but for a compiler alpha is actually trivial: just add an rmb instruction after the load, and you can't really do anything else (there are a few optimizations you can do wrt the rmb, but they are very specific and simple). So (a) is a problem, but the solution is actually really simple, and gives very *strong* guarantees: on alpha, a consume ends up being basically the same as a read barrier after the load, with only very minimal room for optimization. (b) ARM and powerpc and similar architectures, which guarantee the data dependency as long as it is an *actual* data dependency and never becomes a control dependency. On ARM and powerpc, control dependencies do *not* order accesses (the reasons boil down to essentially: branch prediction breaks the dependency, and instructions that come after the branch can be happily executed before the branch). But it's almost impossible to describe that in the standard, since compilers can (and very much do) turn a control dependency into a data dependency and vice versa. 
So the current standard tries to describe that control vs data dependency, and tries to limit it to a data dependency. It fails. It fails for multiple reasons - it doesn't allow for trivial optimizations that just remove the data dependency, and it also doesn't allow for various trivial cases where the compiler *does* turn the data dependency into a control dependency. So I really really think that the current C standard language is broken. Unfixably broken. I'm trying to change the syntactic data dependency that the current standard uses into something that is clearer and correct. The chain of pointers thing is still obviously a data dependency, but by limiting it to pointers, it simplifies the language, clarifies the meaning, avoids all syntactic tricks (ie p-p is clearly a syntactic dependency on p, but does *not* involve in any way following the pointer) and makes it basically impossible for the compiler to break the dependency without doing value prediction, and since value prediction has to be disallowed anyway, that's a feature, not a bug. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 02:55:07PM +0100, Michael Matz wrote: Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: And with "conservative" I mean everything is a source of a dependency, and hence can't be removed, reordered or otherwise fiddled with, and that includes code sequences where no atomic objects are anywhere in sight [1]. In the light of that, the only realistic way (meaning not having to disable optimization everywhere) to implement consume as currently specified is to map it to acquire. At which point it becomes pointless. No, only memory_order_consume loads and [[carries_dependency]] function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, I should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the carries-a-dependency relation is not the sources, but rather where it stops. It's transitively closed over "the value of evaluation A is used as an operand of evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load, then that evaluation must be compiled in a way that does not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument cannot possibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain-breaking optimizations (of which there are many). [1] Simple example of the type of transformation that would be disallowed: int getzero (int i) { return i - i; } This needs to be as follows: [[carries_dependency]] int getzero (int i [[carries_dependency]]) { return i - i; } Otherwise dependencies won't get carried through it. 
So, with the above, do you agree that in the absence of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into return 0; because of the C11 rules for carries-a-dependency? If so, do you then also agree that the specification of carries-a-dependency is somewhat, err, shall we say, overbroad? From what I can see, overbroad. The problem is that the C++11 standard defines how carries-dependency interacts with function calls and returns in 7.6.4, which describes the [[carries_dependency]] attribute. For example, 7.6.4p6 says: "Function g's second parameter has a carries_dependency attribute, but its first parameter does not. Therefore, function h's first call to g carries a dependency into g, but its second call does not. The implementation might need to insert a fence prior to the second call to g." When C11 declined to take attributes, they also left out the part saying how carries-dependency interacts with functions. :-/ Might be fixed by now; checking up on it. One could argue that the bit about emitting fence instructions at function calls and returns is implied by the as-if rule even without this wording, but... depchains don't matter, could _then_ optimize it to zero. But that's insane, especially considering that it's hard to detect whether a given context doesn't care for depchains; after all, the depchain relation is constructed exactly so that it bleeds into nearly everything. So we would most of the time have to assume that the ultimate context will be depchain-aware and therefore disable many transformations. Any function that does not contain a memory_order_consume load and that doesn't have any arguments marked [[carries_dependency]] can be optimized just as before. And as such a marker doesn't exist, we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. 
Or that you have to emit a fence instruction when a dependency chain enters or leaves a function in cases where not all callers/callees are visible to the compiler. My preference is that the ordering properties of a carries-dependency chain are implementation defined at the point where it enters or leaves a function without the marker, but others strongly disagreed. ;-) Then inlining getzero would merely add another # j.dep = i.dep relation, so depchains are still there, but the value optimization can happen before inlining. Having to do something like that I'd find disgusting, and would rather rewrite consume into acquire :) Or make the depchain relation somehow realistically implementable. I was actually OK with arithmetic cancellation breaking the dependency chains. Others on the committee felt otherwise, and I figured that (1) I wouldn't be writing that kind of function anyway and (2) they knew
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:28:56AM -0800, Paul E. McKenney wrote: On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Good point, I left that out of my list. Adding it: 13. By default, pointer chains do not propagate into or out of functions. 
In implementations having attributes, a [[carries_dependency]] attribute may be used to mark a function argument or return value as passing a pointer chain into or out of that function. If a function does not contain memory_order_consume loads and also does not contain [[carries_dependency]] attributes, then that function may be compiled using any desired dependency-breaking optimizations. The ordering effects are implementation defined when a given pointer chain passes into or out of a function through a parameter or return value not marked with a [[carries_dependency]] attribute. Note that this last paragraph differs from the current standard, which would require ordering regardless. And there is also kill_dependency(), which needs to be added to the list in #8 of operators that take a chained pointer and return something that is not a chained pointer. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:38:46AM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz m...@suse.de wrote: So, let me try to poke holes into your definition or increase my understanding :) . You said chain of pointers(dereferences I assume), e.g. if p is result of consume load, then access to p-here-there-next-prev-stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). It's supposed to be ordered wrt the first load (the consuming one), yes. So, what happens if the pointer deref chain is partly hidden in some functions: No problem. The thing is, the ordering is actually handled by the CPU in all relevant cases. So the compiler doesn't actually need to *do* anything. All this legalistic stuff is just to describe the semantics and the guarantees. The problem is two cases: (a) alpha (which doesn't really order any accesses at all, not even dependent loads), but for a compiler alpha is actually trivial: just add a rmb instruction after the load, and you can't really do anything else (there's a few optimizations you can do wrt the rmb, but they are very specific and simple). So (a) is a problem, but the solution is actually really simple, and gives very *strong* guarantees: on alpha, a consume ends up being basically the same as a read barrier after the load, with only very minimal room for optimization. (b) ARM and powerpc and similar architectures, that guarantee the data dependency as long as it is an *actual* data dependency, and never becomes a control dependency. On ARM and powerpc, control dependencies do *not* order accesses (the reasons boil down to essentially: branch prediction breaks the dependency, and instructions that come after the branch can be happily executed before the branch). But it's almost impossible to describe that in the standard, since compilers can (and very much do) turn a control dependency into a data dependency and vice versa. 
So the current standard tries to describe that control vs data dependency, and tries to limit it to a data dependency. It fails. It fails for multiple reasons - it doesn't allow for trivial optimizations that just remove the data dependency, and it also doesn't allow for various trivial cases where the compiler *does* turn the data dependency into a control dependency. So I really really think that the current C standard language is broken. Unfixably broken. I'm trying to change the syntactic data dependency that the current standard uses into something that is clearer and correct. The chain of pointers thing is still obviously a data dependency, but by limiting it to pointers, it simplifies the language, clarifies the meaning, avoids all syntactic tricks (ie p-p is clearly a syntactic dependency on p, but does *not* involve in any way following the pointer) and makes it basically impossible for the compiler to break the dependency without doing value prediction, and since value prediction has to be disallowed anyway, that's a feature, not a bug. OK, good point, please ignore my added thirteenth item in the list. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer results in another chained pointer in that same pointer chain. No. You cannot define it this way. Taking the value of a pointer and doing a bitwise operation that throws away all the bits (or even *most* of the bits) results in the compiler easily being able to turn the chain into a non-chain. The obvious example being "val & 0", but things like "val & 1" are in practice also something that compilers easily turn into control dependencies instead of data dependencies. So you can talk about things like "aligning the pointer value to object boundaries" etc, but it really cannot and *must* not be about the syntactic operations. The same goes for adding and subtracting an integer. The *syntax* doesn't matter. It's about remaining information. Doing "p-(int)p" or "p+(-(int)p)" doesn't leave any information despite being "subtracting and adding an integer" at a syntactic level. Syntax is meaningless. Really. 8. Applying any of the following operators to a chained pointer results in something that is not a chained pointer: (), sizeof, !, *, /, %, <<, >>, <, >, <=, >=, ==, !=, &&, and ||. Parenthesis? I'm assuming that you mean calling through the chained pointer. Also, I think all of /, * and % are perfectly fine, and might be used for that "aligning the pointer" operation that is fine. Linus
Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Matthew Fortune matthew.fort...@imgtec.com writes: Richard Sandiford rdsandif...@googlemail.com writes: I understand the need to deprecate the current -mgp32 -mfp64 behaviour. I don't think we should deprecate -mfp64 itself though. Instead, why not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for modeless? So rather than deprecating the -mgp32 -mfp64 combination and adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in which the odd-numbered registers are call-clobbered rather than the old form in which they were call-saved. Extreme caution is the only reason why the design avoided changing -mfp64 behaviour (basically in case anyone had a serious objection). If you would be happy with a change of behaviour for -mgp32 -mfp64 then that is a great start. Yeah, my first impression is that keeping the current interface would be much better than adding a new set of options. AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing -mfp64 as currently defined, but having found this new solution such fixes are no longer important. The lack of support for -mgp32 -mfp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same built-in defines for -mfp64 if changing its behaviour (i.e. __mips_fpr=64)? I think that should be OK. I suppose a natural follow-on question is what __mips_fpr should be for -mfpxx. Maybe just 0? If we want to be extra cautious we could define a second set of macros alongside the old ones. The scheme allows an ifunc to request a mode and effectively gives the choice to the first-comer. Every other ifunc then has to live with the choice.
I don't think that's a good idea, since the order that ifuncs are called isn't well-defined or under direct user control. Since ifuncs would have to live with a choice made by other ifuncs, in practice they must all be prepared to deal with FR=1 if linked into a fully-modeless or FR1 program, and to deal with FR=0 if linked into a fully-modeless or FR0 program. So IMO the dynamic linker should simply set FR=1 for modeless programs if the hardware supports it and set it to 0 for modeless programs otherwise, like you say in the first paragraph of 9.4. The ifunc interaction should possibly be moved to a different proposal. We could reduce this down to a simple statement that interaction with ifunc needs to be considered when finalising MIPS ifunc support in general. Sounds good. You allow the mode to be changed mid-execution if a new FR0 or FR1 object is loaded. Is it really worth supporting that though? It has the same problem as the ifuncs: once you've dlopen()ed an object, you fix the mode for the whole program, even after the dlclose(). Unless we know of specific cases where this is needed, maybe it would be safer to fix the mode before execution based on DT_NEEDED libraries and allow the mode of modeless programs to be overridden by an environment variable. Scanning the entire set of DT_NEEDED libraries would achieve most of what full dynamic mode switching gives us; it is essentially the first stage of the dynamic mode switching described in the proposal anyway. However, I am concerned about excluding dlopen()ed objects from mode selection (not so worried about excluding ifunc, which could just fix the mode before resolving the first one). One specific concern is for Android, where I believe we have the situation that native applications are loaded as (a form of) shared library. This means a mode requirement can be introduced late on.
In an Android environment it is unlikely to be acceptable to have to do something special to load an application that happens to have a specific mode requirement so dynamic selection is useful. This is more of a transitional problem than anything but making it a smooth process is quite important. I'm also not sure that there is much more effort required for a dynamic linker to take account of dlopen()ed objects in addition to DT_NEEDED, changes are needed in this code regardless. As far as GNU/Linux goes, if we do end up with a function in something like a modeless libm that is implemented as an FR-aware ifunc, that would force the choice to be made early anyway. So we have this very specific case where everything in the initial process is modeless, no ifuncs take advantage of the FR setting, and a dlopen()ed object was compiled as fr0 rather than modeless. I agree it's possible but it seems unlikely. I know nothing about the way Android loading works though. :-) Could you describe it in more detail? Is it significantly different from glibc's
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:14:01AM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer results in another chained pointer in that same pointer chain. No. You cannot define it this way. Taking the value of a pointer and doing a bitwise operation that throws away all the bits (or even *most* of the bits) results in the compiler easily being able to turn the chain into a non-chain. The obvious example being "val & 0", but things like "val & 1" are in practice also something that compilers easily turn into control dependencies instead of data dependencies. Indeed, most of the bits need to remain for this to work. So you can talk about things like "aligning the pointer value to object boundaries" etc, but it really cannot and *must* not be about the syntactic operations. The same goes for adding and subtracting an integer. The *syntax* doesn't matter. It's about remaining information. Doing "p-(int)p" or "p+(-(int)p)" doesn't leave any information despite being "subtracting and adding an integer" at a syntactic level. Syntax is meaningless. Really. Good points. How about the following replacements? 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. The results of addition and subtraction operations that cancel the chained pointer's value (for example, "p-(long)p" where p is a pointer to char) are implementation defined. 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer for the purposes of alignment and pointer translation results in another chained pointer in that same pointer chain. Other uses of bitwise operators on chained pointers (for example, "p|~0") are implementation defined. 8.
Applying any of the following operators to a chained pointer results in something that is not a chained pointer: (), sizeof, !, *, /, %, <<, >>, <, >, <=, >=, ==, !=, &&, and ||. Parenthesis? I'm assuming that you mean calling through the chained pointer. Yes, good point. Of course, parentheses for grouping just pass the value through without affecting the chained-ness. Also, I think all of /, * and % are perfectly fine, and might be used for that "aligning the pointer" operation that is fine. Something like this? char *p; p = p - (unsigned long)p % 8; I was thinking of this as subtraction -- the p gets moduloed by 8, which loses the chained-pointer designation. But that is OK because that designation gets folded back in by the subtraction. Am I missing a use case? That leaves things like this one: p = (p / 8) * 8; I cannot think of any other legitimate use for / and *. Here is an updated #8 and a new 8a: 8. Applying any of the following operators to a chained pointer results in something that is not a chained pointer: function call (), sizeof, !, %, <<, >>, <, >, <=, >=, ==, !=, &&, ||, and kill_dependency(). 8a. Dividing a chained pointer by an integer and multiplying it by that same integer (for example, to align that pointer) results in a chained pointer in that same pointer chain. The ordering effects of other uses of infix * and / on chained pointers are implementation defined. Does that capture it? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: Good points. How about the following replacements? 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. The results of addition and subtraction operations that cancel the chained pointer's value (for example, "p-(long)p" where p is a pointer to char) are implementation defined. 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer for the purposes of alignment and pointer translation results in another chained pointer in that same pointer chain. Other uses of bitwise operators on chained pointers (for example, "p|~0") are implementation defined. Quite frankly, I think all of this language that is about the actual operations is irrelevant and wrong. It's not going to help compiler writers, and it sure isn't going to help users that read this. Why not just talk about value chains, and say that any operations that restrict the value range severely end up breaking the chain. There is no point in listing the operations individually, because every single operation *can* restrict things. Listing individual operations and dependencies is just fundamentally wrong. For example, let's look at this obvious case: int q, *p = atomic_read(pp, consume); .. nothing modifies 'p' .. q = *p; and there are literally *zero* operations that modify the value chain, so obviously the two operations are ordered, right? Wrong. What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever So my earlier suggestion tried to avoid this by having the "restrict" thing, so the above wouldn't work.
But your (and the current C standard's) attempt to define this with some kind of syntactic dependency-carrying chain will _inevitably_ get this wrong, and/or be too horribly complex to actually be useful. Seriously, don't do it. I claim that all your attempts to do this crazy syntactic "these operations maintain the chained pointers" thing are broken. The fact that you have changed "carries a dependency" to "chained pointer" changes NOTHING. So just give it up. It's a fundamentally broken model. It's *wrong*, but even more importantly, it's not even *useful*, since it ends up being too complicated for a compiler writer or a programmer to understand. I really really really think you need to do this at a higher conceptual level, and get away from all this idiotic "these operations maintain the chain" crap. Because there *is* no such list. Quite frankly, any standards text that has that [[carries_dependency]] or [[kill_dependency]] or whatever attribute is broken. It's broken because the whole concept is TOTALLY ALIEN to the compiler writer or the programmer. It makes no sense. It's purely legalistic language that has zero reason to exist. It's non-intuitive for everybody. And *any* language that talks about the individual operations only encourages people to play legalistic games that actually defeat the whole purpose (namely that all relevant CPUs are going to implement that consume ordering guarantee natively, with no extra code generation rules AT ALL). So any time you talk about some random detail of some operation, somebody is going to come up with a trick that defeats things. So don't do it. There is absolutely ZERO difference between any of the arithmetic operations, be they bitwise, additive, multiplicative, shifts, whatever. The *only* thing that matters for all of them is whether they are value-preserving, or whether they drop so much information that the compiler might decide to use a control dependency instead. That's true for every single one of them.
Similarly, actual true control dependencies that limit the problem space sufficiently that the actual pointer value no longer has significant information in it (see the above example) are also things that remove information to the point that only a control dependency remains. Even when the value itself is not modified in any way at all. Linus
[GSoC 2014] OpenMP runtime improvements
Hi, I am interested in contributing to the OpenMP project as part of GSoC 2014. I am mailing to discuss ideas, to see if someone is willing to mentor me, and, if possible, to get a test patch in before the end of the application process so as to verify my qualification! Looking forward to your feedback! Thanks, Pranith
build breakage in libsanitizer
Hey all, I've just tried building Trunk from branches/google/main/ and got the following failure in libsanitizer: In file included from /mnt/test/rvbd-root-gcc49/usr/include/features.h:356:0, from /mnt/test/rvbd-root-gcc49/usr/include/arpa/inet.h:22, from ../../../../gcc49-google-main/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:20: /mnt/test/rvbd-root-gcc49/usr/include/sys/timex.h:145:31: error: expected initializer before 'throw' __asm__ ("ntp_gettimex") __THROW; ^ This is an intel-to-intel cross-compiler that is being built against linux-2.6.32.27 headers and glibc-2.12 (with a few redhat patches). The system header in question contains the following: #if defined __GNUC__ && __GNUC__ >= 2 extern int ntp_gettime (struct ntptimeval *__ntv) __asm__ ("ntp_gettimex") __THROW; #else extern int ntp_gettimex (struct ntptimeval *__ntv) __THROW; # define ntp_gettime ntp_gettimex #endif This same header works against gcc 4.8 (or is not used). Could someone clarify what libsanitizer needs here and what gcc dislikes, please? Which should I patch? Thanks in advance! Oleg.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever Yes, the compiler could do that. But it would still be required to carry a dependency from the memory_order_consume read to the *p, But that's *BS*. You didn't actually listen to the main issue. Paul, why do you insist on this carries-a-dependency crap? It's broken. If you don't believe me, then believe the compiler person who already piped up and told you so. The "carries a dependency" model is broken. Get over it. No sane compiler will ever distinguish two different registers that have the same value from each other. No sane compiler will ever say "ok, register r1 has the exact same value as register r2, but r2 carries the dependency, so I need to make sure to pass r2 to that function or use it as a base pointer". And nobody sane should *expect* a compiler to distinguish two registers with the same value that way. So the whole model is broken. I gave an alternate model (the "restrict"), and you didn't seem to understand the really fundamental difference. It's not a language difference, it's a conceptual difference. In the broken "carries a dependency" model, you have to fight all those aliases that can have the same value, and it is not a fight you can win. We've had the "p-p" examples, we've had the "p&0" examples, but the fact is, that "p==myvariable" example IS EXACTLY THE SAME THING. All three of those things: p-p, p&0, and p==myvariable mean that any compiler worth its salt now knows that p carries no information, and will optimize it away. So please stop arguing against that. Whenever you argue against that simple fact, you are arguing against sane compilers.
So *accept* the fact that some operations (and I guarantee that there are more of those than you can think of, and you can create them with various tricks using pretty much *any* feature in the C language) essentially take the data information away. And just accept the fact that then the ordering goes away too. So give up on "carries a dependency". Because there will be cases where that dependency *isn't* carried. The language of the standard needs to get *away* from the broken model, because otherwise the standard is broken. I suggest we instead talk about litmus tests and why certain code sequences are ordered, and others are not. So the code sequence I already mentioned is *not* ordered: Litmus test 1: p = atomic_read(pp, consume); if (p == variable) return p->val; is *NOT* ordered, because the compiler can trivially turn this into "return variable.val", and break the data dependency. This is true *regardless* of any "carries a dependency" language, because that language is insane, and doesn't work when the different pieces here may be in different compilation units. BUT: Litmus test 2: p = atomic_read(pp, consume); if (p != variable) return p->val; *IS* ordered, because while it looks syntactically a lot like Litmus test 1, there is no sane way a compiler can use the knowledge that p is not a pointer to a particular location to break the data dependency. There is no way in hell that any sane "carries a dependency" model can get the simple tests above right. So give up on it already. "Carries a dependency" cannot work. It's a bad model. You're trying to describe the problem using the wrong tools. Note that my "restrict+pointer to object" language actually got this *right*. The "restrict" part made Litmus test 1 not ordered, because the "p == variable" success case means that the pointer wasn't restricted, so the pre-requisite for ordering didn't exist. See? The "carries a dependency" is a broken model for this, but there are _other_ models that can work.
You tried to rewrite my model into carries a dependency. That *CANNOT* work. It's like trying to rewrite quantum physics into the Greek model of the four elements. They are not compatible models, and one of them can be shown to not work. Linus
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford rdsandif...@googlemail.com writes: AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing -mfp64 as currently defined, but having found this new solution such fixes are no longer important. The lack of support for -mgp32 -mfp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same built-in defines for -mfp64 if changing its behaviour (i.e. __mips_fpr=64)? I think that should be OK. I suppose a natural follow-on question is what __mips_fpr should be for -mfpxx. Maybe just 0? I'm doing just that in my experimental implementation of all this. If we want to be extra cautious we could define a second set of macros alongside the old ones. You allow the mode to be changed mid-execution if a new FR0 or FR1 object is loaded. Is it really worth supporting that though? It has the same problem as the ifuncs: once you've dlopen()ed an object, you fix the mode for the whole program, even after the dlclose(). Unless we know of specific cases where this is needed, maybe it would be safer to fix the mode before execution based on DT_NEEDED libraries and allow the mode of modeless programs to be overridden by an environment variable. Scanning the entire set of DT_NEEDED libraries would achieve most of what full dynamic mode switching gives us; it is essentially the first stage of the dynamic mode switching described in the proposal anyway. However, I am concerned about excluding dlopen()ed objects from mode selection (not so worried about excluding ifunc, which could just fix the mode before resolving the first one).
One specific concern is for Android where I believe we have the situation where native applications are loaded as (a form of) shared library. This means a mode requirement can be introduced late on. In an Android environment it is unlikely to be acceptable to have to do something special to load an application that happens to have a specific mode requirement so dynamic selection is useful. This is more of a transitional problem than anything but making it a smooth process is quite important. I'm also not sure that there is much more effort required for a dynamic linker to take account of dlopen()ed objects in addition to DT_NEEDED, changes are needed in this code regardless. As far as GNU/Linux goes, if we do end up with a function in something like a modeless libm that is implemented as an FR-aware ifunc, that would force the choice to be made early anyway. So we have this very specific case where everything in the initial process is modeless, no ifuncs take advantage of the FR setting, and a dlopen()ed object was compiled as fr0 rather than modeless. I agree it's possible but it seems unlikely. A reasonable point. I know nothing about the way Android loading works though. :-) Could you describe it in more detail? Is it significantly different from glibc's dynamic loader running a PIE? I am working from fragments of information on this aspect still so I need to get more clarification from Android developers. My current understanding is that native parts of applications are actually shared libraries and form part of, but not necessarily the entry to, an application. Since such a shared library can't be 'required' by anything it must be loaded explicitly. I'll get clarification but the potential need for dynamic mode switching in Android need not affect the decision that GNU/Linux takes. If we do end up using ELF flags then maybe adding two new EF_MIPS_ABI enums would be better. 
It's more likely to be trapped by old loaders and avoids eating up those precious remaining bits. Sounds reasonable, but I'm still trying to determine how this information can be propagated from loader to dynamic loader. The dynamic loader has access to the ELF headers so I didn't think it would need any help. As I understand it the dynamic loader only has specific access to the program headers of the executable, not the ELF headers. There is no question that the dynamic loader has access to DSO ELF headers but we need the start point too. You didn't say specifically how a static program's crt code would know whether it was linked as modeless or in a specific FR mode. Maybe the linker could define a special hidden symbol? Why do you say crt rather than dlopen? The mode requirement should only matter if you want to change it, and dlopen should be able to access information in the same way that a dynamic linker would. It may seem redundant but perhaps we end up having to mark an executable with
Re: [LM-32] Code generation for address loading
FX MOREL fxmorel@gmail.com writes: Hi everyone, I am developing on a custom design using the LatticeMico32 architecture and I use gcc 4.5.1 to compile C code for this arch. In this architecture, loading an address 0x always takes two assembly instructions to fetch the address, because immediates are 16 bits wide: mvhi r1, 0x ori r1, r1, 0x ... lw r2, r1 In my situation, nearly all the symbols are located in the same 64kB region and their addresses share the same hi-part, so I am trying to minimize the overhead of always using two instructions when only one is needed. [additional details deleted] Because the symbol mapping phase is done during linking, I have little chance of knowing the future symbol address at code generation, but is there some way I could make this work? If I assign the symbol to a dedicated section (with the __attribute__ ((section())) directive), is there a way to know its section during code generation? I understand that I am asking for very 'dangerous' advice, but again, this will only be a custom optimization for a custom design. Have you considered doing this through custom GNU linker relaxation work? I would try this before hacking away at the compiler. AG Thank you. F-X Morel
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 03:35:04PM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever Yes, the compiler could do that. But it would still be required to carry a dependency from the memory_order_consume read to the *p, But that's *BS*. You didn't actually listen to the main issue. Paul, why do you insist on this carries-a-dependency crap? Sigh. Read on... It's broken. If you don't believe me, then believe the compiler person who already piped up and told you so. The "carries a dependency" model is broken. Get over it. No sane compiler will ever distinguish two different registers that have the same value from each other. No sane compiler will ever say "ok, register r1 has the exact same value as register r2, but r2 carries the dependency, so I need to make sure to pass r2 to that function or use it as a base pointer". And nobody sane should *expect* a compiler to distinguish two registers with the same value that way. So the whole model is broken. I gave an alternate model (the "restrict"), and you didn't seem to understand the really fundamental difference. It's not a language difference, it's a conceptual difference. In the broken "carries a dependency" model, you have to fight all those aliases that can have the same value, and it is not a fight you can win. We've had the "p-p" examples, we've had the "p&0" examples, but the fact is, that "p==myvariable" example IS EXACTLY THE SAME THING. All three of those things: p-p, p&0, and p==myvariable mean that any compiler worth its salt now knows that p carries no information, and will optimize it away. So please stop arguing against that. Whenever you argue against that simple fact, you are arguing against sane compilers. So let me see if I understand your reasoning.
My best guess is that it goes something like this: 1. The Linux kernel contains code that passes pointers from rcu_dereference() through external functions. 2. Code in the Linux kernel expects the normal RCU ordering guarantees to be in effect even when external functions are involved. 3. When compiling one of these external functions, the C compiler has no way of knowing about these RCU ordering guarantees. 4. The C compiler might therefore apply any and all optimizations to these external functions. 5. This in turn implies that the only way to prohibit any given optimization from being applied to the results obtained from rcu_dereference() is to prohibit that optimization globally. 6. We have to be very careful what optimizations are globally prohibited, because a poor choice could result in unacceptable performance degradation. 7. Therefore, the only operations that can be counted on to maintain the needed RCU orderings are those where the compiler really doesn't have any choice, in other words, where any reasonable way of computing the result will necessarily maintain the needed ordering. Did I get this right, or am I confused? So *accept* the fact that some operations (and I guarantee that there are more of those than you can think of, and you can create them with various tricks using pretty much *any* feature in the C language) essentially take the data information away. And just accept the fact that then the ordering goes away too. Actually, the fact that there are more potential optimizations than I can think of is a big reason for my insistence on the carries-a-dependency crap. My lack of optimization omniscience makes me very nervous about relying on there never ever being a reasonable way of computing a given result without preserving the ordering. So give up on "carries a dependency". Because there will be cases where that dependency *isn't* carried.
The language of the standard needs to get *away* from the broken model, because otherwise the standard is broken. I suggest we instead talk about litmus tests and why certain code sequences are ordered, and others are not. OK... So the code sequence I already mentioned is *not* ordered:

Litmus test 1:

    p = atomic_read(pp, consume);
    if (p == &variable)
        return p->val;

is *NOT* ordered, because the compiler can trivially turn this into "return variable.val", and break the data dependency. Right, given your model, the compiler is free to produce code that doesn't order the load from pp against the load from p->val. This is true *regardless* of any "carries a dependency" language, because that language is insane, and doesn't work when the different pieces here may be in different compilation units. Indeed, it
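Litmus test 1 can be spelled out as a minimal C11 sketch. The names obj, pp and variable are invented for this illustration, and atomic_read(pp, consume) is written out as an explicit consume load; the second function shows the value-substitution the compiler is allowed to perform after the equality test:

```c
#include <stdatomic.h>

struct obj { int val; };
struct obj variable;        /* the global the litmus test compares against */
_Atomic(struct obj *) pp;   /* published pointer; names are illustrative   */

/* Litmus test 1 as written: the final load goes through p. */
int reader_as_written(void)
{
    struct obj *p = atomic_load_explicit(&pp, memory_order_consume);
    if (p == &variable)
        return p->val;      /* the dependent load the test relies on */
    return -1;
}

/* What the compiler may legitimately make of it: after the comparison
   it knows p == &variable, so the load need not use p at all, and the
   data dependency on the consume load is gone. */
int reader_after_optimization(void)
{
    struct obj *p = atomic_load_explicit(&pp, memory_order_consume);
    if (p == &variable)
        return variable.val;   /* no longer depends on p */
    return -1;
}
```

Single-threaded the two functions are indistinguishable, which is exactly why the compiler is free to rewrite one into the other; only the cross-CPU ordering differs.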
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 3:35 PM, Linus Torvalds torva...@linux-foundation.org wrote: Litmus test 1: p = atomic_read(pp, consume); if (p == &variable) return p->val; is *NOT* ordered Btw, don't get me wrong. I don't _like_ it not being ordered, and I actually did spend some time thinking about my earlier proposal on strengthening the 'consume' ordering. I have for the last several years been 100% convinced that the Intel memory ordering is the right thing, and that people who like weak memory ordering are wrong and should try to avoid reproducing if at all possible. But given that we have memory orderings like power and ARM, I don't actually see a sane way to get a good strong ordering. You can teach compilers about cases like the above when they actually see all the code and they could poison the value chain etc. But it would be fairly painful, and once you cross object files (or even just functions in the same compilation unit, for that matter), it goes from painful to just ridiculously not worth it. So I think the C semantics should mirror what the hardware gives us - and do so even in the face of reasonable optimizations - not try to do something else that requires compilers to treat consume very differently. If people made me king of the world, I'd outlaw weak memory ordering. You can re-order as much as you want in hardware with speculation etc, but you should always *check* your speculation and make it *look* like you did everything in order. Which is pretty much the Intel memory ordering (ignoring the write buffering). Linus
[Bug ipa/60327] [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60327 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto CC||hubicka at gcc dot gnu.org Target Milestone|--- |4.9.0
[Bug ipa/60327] New: [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60327 Bug ID: 60327 Summary: [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Both xalanbmk and dealII ICE when built with -Ofast -funroll-loops -fpeel-loops -march=native -flto -fwhole-program -flto-partition=none (AMD Fam10) like lto1: internal compiler error: Segmentation fault 0x7ea3bf crash_signal /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/toplev.c:337 0x68d839 inline_update_overall_summary(cgraph_node*) /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa-inline-analysis.c:3555 0x6a7843 walk_polymorphic_call_targets /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa.c:229 0x6a7843 symtab_remove_unreachable_nodes(bool, _IO_FILE*) /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa.c:400 0x744edb execute_todo /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/passes.c:1896 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: /gcc/spec/sb-barbella.arch.suse.de-ai-64/x86_64/install-hack/bin/g++ returned 1 exit status /usr/bin/ld: fatal error: lto-wrapper failed collect2: error: ld returned 1 exit status specmake: *** [dealII] Error 1
[Bug ipa/60325] [4.9 Regression] ICE in ipa_modify_formal_parameters, at ipa-prop.c compiling g++.dg/cilk-plus/CK/lambda_spawns.cc with LTO-profiledbootstrap build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60325 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto Target Milestone|--- |4.9.0 Summary|ICE in |[4.9 Regression] ICE in |ipa_modify_formal_parameter |ipa_modify_formal_parameter |s, at ipa-prop.c compiling |s, at ipa-prop.c compiling |g++.dg/cilk-plus/CK/lambda_ |g++.dg/cilk-plus/CK/lambda_ |spawns.cc with |spawns.cc with |LTO-profiledbootstrap build |LTO-profiledbootstrap build
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto Known to fail||4.5.4, 4.6.4, 4.7.3 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #2 from Richard Biener rguenth at gcc dot gnu.org --- 4.7 and lower is expected to show this behavior due to the bug that 'c++' (the increment of a char) is not properly implemented as c = (char)((int)c + 1) and thus we think that overflow is undefined. 4.8 and above has that fixed and thus shows different, working behavior.
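The lowering described above can be sketched in a few lines of C (increment_char is an illustrative name, not from the PR): the increment of a char happens after promotion to int, so the arithmetic itself can never overflow, and treating it as undefined signed overflow is wrong.

```c
#include <limits.h>

/* 'c++' on a (signed) char is effectively c = (char)((int)c + 1):
   c is promoted to int, incremented there, and converted back.  The
   int addition cannot overflow for any char value, so only the
   conversion back to char remains, and that is implementation-defined
   (modulo wrap on the usual targets), not undefined behavior. */
signed char increment_char(signed char c)
{
    int promoted = (int)c + 1;     /* always representable in int */
    return (signed char)promoted;  /* implementation-defined narrowing */
}
```

On mainstream two's-complement targets increment_char(SCHAR_MAX) wraps to SCHAR_MIN without any undefined behavior being involved.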
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #3 from Zhendong Su su at cs dot ucdavis.edu --- (In reply to Richard Biener from comment #1) Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7. Richard, it still fails for me. Did you use LTO?

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 4.9.0 20140223 (experimental) [trunk revision 208062] (GCC)
$
$ gcc-trunk -O0 -c foo.c
$ gcc-trunk -O0 -c main.c
$ gcc-trunk -Os foo.o main.o
$ a.out
$
$ gcc-trunk -flto -O0 -c foo.c
$ gcc-trunk -flto -O0 -c main.c
$ gcc-trunk -flto -Os foo.o main.o
$ a.out
^C
$
$ gcc-4.8 -flto -O0 -c foo.c
$ gcc-4.8 -flto -O0 -c main.c
$ gcc-4.8 -flto -Os foo.o main.o
$ a.out
^C
$
[Bug fortran/59198] [4.7/4.8/4.9 Regression] ICE on cyclically dependent polymorphic types
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59198 --- Comment #10 from paul.richard.thomas at gmail dot com --- A further small remark: when the explicit interface for obs1_int is turned into a subroutine, everything works perfectly. I am homing in on this as being the source of the trouble; I suspect that the function pointer is not receiving the DEC_SIZE information. I will look tonight. On 23 February 2014 21:49, pault at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59198 --- Comment #9 from Paul Thomas pault at gcc dot gnu.org --- Hi Tobias, I need to walk away from this one for 24 hours. I have established this chain:

(i) We start building decay_t;
(ii) During which we have to build decay_gen_t (from trans-types.c:2456);
(iii) Followed by decay_term_t;
(iv) Which has a decay_t as its only component;
(v) Since this is in the process of being built, what is returned is the backend_decl without any of the fields. Thus the size cannot be determined;
(vi) For reasons that I cannot see, since this component is a pointer, this indeterminate size propagates back to the size of the decay_gen_t component in decay_t; and
(vii) This, I suppose but have not confirmed, clobbers the initialisation of the vtable.

This latter is surmise, on the basis that changing the 'term' field to a pointer still causes the size problem but the ICE goes away. The programme even executes! I cannot see why there is a problem in estimating the size, since the relevant components are either allocatable or pointers - thus the size can be determined. Cheers Paul -- You are receiving this mail because: You are on the CC list for the bug.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Zhendong Su from comment #3) (In reply to Richard Biener from comment #1) Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7. Richard, it still fails for me. Did you use LTO? Yes, I did. For me -O[s23] -flto -v -save-temps -fdump-tree-all results in a ccXYZ.ltrans0.o.169t.optimized file like

;; Function main (main, funcdef_no=2, decl_uid=2401, symbol_order=0) (executed once)

main ()
{
  <bb 2>:
  return 0;

}

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 4.9.0 20140223 (experimental) [trunk revision 208062] (GCC)
$
$ gcc-trunk -O0 -c foo.c
$ gcc-trunk -O0 -c main.c
$ gcc-trunk -Os foo.o main.o
$ a.out
$
$ gcc-trunk -flto -O0 -c foo.c
$ gcc-trunk -flto -O0 -c main.c
$ gcc-trunk -flto -Os foo.o main.o
$ a.out
^C
$
$ gcc-4.8 -flto -O0 -c foo.c
$ gcc-4.8 -flto -O0 -c main.c
$ gcc-4.8 -flto -Os foo.o main.o
$ a.out
^C
$
[Bug fortran/60128] [4.8/4.9 Regression] Wrong ouput using en edit descriptor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60128 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4 Target Milestone|--- |4.8.3
[Bug tree-optimization/45791] Missed devirtualization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45791 Matthijs Kooijman matthijs at stdin dot nl changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #15 from Matthijs Kooijman matthijs at stdin dot nl --- I ran into another variant of this problem, which I reduced to the following testcase. I found the problem on 4.8.2, but it is already fixed in trunk / gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase here for reference.

class Base { };
class Sub : public Base {
public:
  virtual void bar();
};

Sub foo;
Sub * const pointer = &foo;
Sub* function() { return &foo; }

int main() {
  // Gcc 4.8.2 devirtualizes this:
  pointer->bar();
  // but not this:
  function()->bar();
}
[Bug libstdc++/60326] Incorrect type from std::make_unsigned<wchar_t>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60326 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-24 Ever confirmed|0 |1 --- Comment #2 from Jonathan Wakely redi at gcc dot gnu.org --- I think we're just missing wchar_t specializations, and also for char16_t and char32_t

#include <type_traits>
using namespace std;
using wchar_signed = make_signed<wchar_t>::type;
using wchar_unsigned = make_unsigned<wchar_t>::type;
static_assert( !is_same<wchar_signed, wchar_unsigned>::value, "" );
static_assert( !is_same<char16_t, make_signed<char16_t>::type>::value, "" );
static_assert( !is_same<char32_t, make_signed<char32_t>::type>::value, "" );
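As a hedged sketch of what such a specialization has to deliver (uint_of_size and make_unsigned_wchar are invented names for this illustration, not the actual libstdc++ machinery): map wchar_t to an unsigned integer type of the same width.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Pick an unsigned integer type by byte width; wchar_t is 2 bytes on
// Windows and 4 bytes on the usual Unix targets.
template <std::size_t N> struct uint_of_size;
template <> struct uint_of_size<2> { using type = std::uint16_t; };
template <> struct uint_of_size<4> { using type = std::uint32_t; };

// What make_unsigned<wchar_t> must produce: same width, unsigned.
struct make_unsigned_wchar {
    using type = uint_of_size<sizeof(wchar_t)>::type;
};

static_assert(sizeof(make_unsigned_wchar::type) == sizeof(wchar_t),
              "same width as wchar_t");
static_assert(std::is_unsigned<make_unsigned_wchar::type>::value,
              "result must be unsigned");
```

The real library additionally has to handle char16_t and char32_t, and the mirror-image make_signed direction, in the same way.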
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #5 from Dominique d'Humieres dominiq at lps dot ens.fr --- The PR is fixed by the patch in comment 2 without regression, see http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01688.html. Thanks for the quick fix.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Blocks||60243 --- Comment #2 from Richard Biener rguenth at gcc dot gnu.org --- We have only 1 billion (!) calls to estimate_calls_size_and_time (and 2 billion recursions in that function - I suppose callgrind/kcachegrind overflowed the entry counter even ...). Also 3 billion calls to evaluate_predicate. Sth is getting seriously out-of-hands here ;) That is, it's ultimately called from do_estimate_edge_size (but only 36 calls here - still a _lot_ for this testcase). Looks related to what I found in PR60243.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #6 from Andrey Belevantsev abel at gcc dot gnu.org --- (In reply to Dominique d'Humieres from comment #5) The PR is fixed by the patch in comment 2 without regression, see http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01688.html. Thanks for the quick fix. Thank you, I will commit the patch then once the ia64 testing will finish.
[Bug rtl-optimization/60317] [4.9 Regression] find_hard_regno_for compile time hog in libvpx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60317 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process)
[Bug rtl-optimization/60317] [4.9 Regression] find_hard_regno_for compile time hog in libvpx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60317 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Richard Biener from comment #3) This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process) Err - wrong bug
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process)
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 --- Comment #4 from Martin Jambor jamborm at gcc dot gnu.org --- Author: jamborm Date: Mon Feb 24 12:39:52 2014 New Revision: 208067 URL: http://gcc.gnu.org/viewcvs?rev=208067root=gccview=rev Log: 2014-02-24 Martin Jambor mjam...@suse.cz PR ipa/60266 * ipa-cp.c (propagate_constants_accross_call): Bail out early if there are no parameter descriptors. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-cp.c
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 Martin Jambor jamborm at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Martin Jambor jamborm at gcc dot gnu.org --- Fixed.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- When calling do_estimate_edge_size to compute the effect on caller size when inlining an edge we call estimate_node_size_and_time which eventually recurses down to estimate_calls_size_and_time (why!? call edges in the callee are irrelevant when inlining the call into the caller!). Doesn't this just want to add(?) e->call_stmt_size/time? At the moment estimate_calls_size_and_time recurses to estimate_edge_size_and_time ... and I don't see _any_ prevention of running in cgraph cycles here. (and the cache isn't populated before computing an edge's size/time is). In fact,

Index: gcc/ipa-inline-analysis.c
===================================================================
--- gcc/ipa-inline-analysis.c	(revision 207960)
+++ gcc/ipa-inline-analysis.c	(working copy)
@@ -3011,21 +3011,11 @@ estimate_calls_size_and_time (struct cgr
       struct inline_edge_summary *es = inline_edge_summary (e);
       if (!es->predicate
 	  || evaluate_predicate (es->predicate, possible_truths))
-	{
-	  if (e->inline_failed)
-	    {
-	      /* Predicates of calls shall not use NOT_CHANGED codes,
-		 so we do not need to compute probabilities.  */
-	      estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
-					   known_vals, known_binfos,
-					   known_aggs, hints);
-	    }
-	  else
-	    estimate_calls_size_and_time (e->callee, size, time, hints,
-					  possible_truths,
-					  known_vals, known_binfos,
-					  known_aggs);
-	}
+	/* Predicates of calls shall not use NOT_CHANGED codes,
+	   so we do not need to compute probabilities.  */
+	estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
+				     known_vals, known_binfos,
+				     known_aggs, hints);
     }
   for (e = node->indirect_calls; e; e = e->next_callee)
     {

fixes this and I cannot make sense of calling estimate_calls_size_and_time for the callee of an edge that we are not going to inline (or that is already inlined? I still find those if (e->inline_failed) checks odd).
If it's supposed to account for inline bodies in the caller then we should have updated the inline_summary () of the caller, not have to recurse here - and we _do_ seem to (inline_merge_summary).
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #5 from Richard Biener rguenth at gcc dot gnu.org --- Hmm, ok - it is supposed to only account for the extra call edges in the inlined bodies. The actual issue seems to be

Deciding on inlining of small functions. Starting with size 114.
Enqueueing calls in Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14.
Enqueueing calls in Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13.
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
...
Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 with 27 size to be inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 in t.C:21
Estimated growth after inlined into all is +27 insns.
Estimated badness is -1073741827, frequency 0.16.
Badness calculation for Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10
size growth -3, time 0 inline hints: in_scc declared_inline
-1073741827: Growth -3 <= 0
Processing frequency Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]
Called by Test<scale>::Test(Scale) [with Scale scale = (Scale)3u] that is normal or hot
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 Known to be false: not inlined, op1 != 0, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/3, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/6, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)3u] which now has time 13 and size 24, net change of -3.
New minimal size reached: 111
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 with 27 size to be inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 in t.C:20
Estimated growth after inlined into all is +27 insns.
Estimated badness is -1073741827, frequency 0.12.
Badness calculation for Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/6
size growth -3, time 0 inline hints: declared_inline
-1073741827: Growth -3 <= 0
...

so we are inlining all over the place but don't really arrive at a point where no further useful inlining is left and inlining never has a growth effect in our theory and we continue to think that we shrink the overall unit by further inlining. Meanwhile the cgraph is full of millions of calls (but estimated to be never reached?): Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 with 24 size to be inlined into Test<scale>::Test(Scale)
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- Btw, the smaller testcase (E4 case commented) shows exactly the same behavior, we just seem to be exponential so only adding E4 makes it really bad.
[Bug libstdc++/59894] Force use of the default new/delete
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59894 --- Comment #4 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Marc Glisse from comment #0) PR 59893 considers a different path using LTO to inline at link time the definition from libsupc++. Note that doing both at the same time: 1) provide an inline version of new 2) LTO-link with libsupc++ might interact badly (or not, I haven't checked). We should test it and if needed warn about it in the documentation or preferably find a workaround.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #7 from Richard Biener rguenth at gcc dot gnu.org --- Note that we seem to fail to update BB predicates for switch stmts.

size:0.00, time:0.00, predicate:(true)
size:3.00, time:2.00, predicate:(not inlined)
size:2.00, time:2.00, predicate:(op1 changed)
size:8.00, time:3.20, predicate:(op1 changed) && (op1 != 2)

I cannot interpret the size 2 case (what is "op1 changed"?), but in that case we actually shrink compared to not inlining. As "op1 != 2" makes it more restrictive it's odd that that increases the metrics. As inliner I would inline all the (op1 changed) cases, thus in the above case op1 == 2. We seem to inline fully until hitting the case where only recursive edges are left. Even for only two switch cases we inline 5(!) calls into Test<E1>. Ah, the switch isn't handled by the predicates because

BB 3 predicate:(op1 != 1)
s.1_4 = (int) s_2(D);  freq:0.80 size: 0 time: 0
switch (s.1_4) <default: <L8>, case 0: <L1>, case 1: <L2>>

it is not unmodified_parm_or_parm_agg_item (). The parameter is unsigned int so the cast is not value-preserving. Of course in general we can't rely on proper predicates so we need to avoid exploding reliably.
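The shape under discussion can be sketched as follows (the names are invented for illustration, not the PR's testcase); the point is that switching on (int)s rather than s itself is what defeats the parameter tracking:

```cpp
// The switch operates on (int)s.  Since s is unsigned, the cast is not
// value-preserving (values above INT_MAX change), so the inline
// predicate machinery cannot treat the switch operand as the
// unmodified parameter op1 and gives up on the BB predicates.
inline int dispatch(unsigned s)
{
    switch (static_cast<int>(s)) {
    case 0:  return 10;
    case 1:  return 20;
    default: return -1;
    }
}
```

Switching on s directly (or taking an int parameter) would keep the operand recognizable as the unmodified parameter.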
[Bug preprocessor/58580] [4.8 Regression] preprocessor goes OOM with warning for zero literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58580 Dodji Seketeli dodji at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #12 from Dodji Seketeli dodji at gcc dot gnu.org --- This fall-out seems fixed now on trunk by commit r207046. Sorry for the inconvenience.
[Bug preprocessor/58580] [4.8 Regression] preprocessor goes OOM with warning for zero literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58580 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #13 from Richard Biener rguenth at gcc dot gnu.org --- Still broken on the branch as far as I can see.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 Marc Glisse glisse at gcc dot gnu.org changed: What|Removed |Added CC||paolo.carlini at oracle dot com --- Comment #14 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Marc Glisse from comment #7) The current patch breaks g++.dg/ext/vla13.C (PR 54583), but nothing else covered by the testsuite, so it is tempting to see if there are other ways to fix PR 54583. Paolo, do you have an opinion on this PR?
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #5 from Zhendong Su su at cs dot ucdavis.edu --- Did you separately compile the two files at -O0 and link at -Os, like below? $ gcc-trunk -flto -O0 -c foo.c $ gcc-trunk -flto -O0 -c main.c $ gcc-trunk -flto -Os foo.o main.o
[Bug rtl-optimization/50677] volatile forces load into register
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 --- Comment #4 from H.J. Lu hjl.tools at gmail dot com --- Combine generates

Trying 6, 7 -> 8:
Failed to match this instruction:
(set (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])
    (plus:SI (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])
        (const_int 1 [0x1])))

from

(insn 6 3 7 2 (set (reg:SI 83 [ D.1752 ])
        (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])) x.i:1 90 {*movsi_internal}
     (nil))
(insn 7 6 8 2 (parallel [
            (set (reg:SI 84 [ D.1752 ])
                (plus:SI (reg:SI 83 [ D.1752 ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) x.i:1 266 {*addsi_1}
     (expr_list:REG_DEAD (reg:SI 83 [ D.1752 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

Why doesn't combine include (clobber (reg:CC 17 flags))?
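For reference, the kind of source that produces an RTL dump of this shape is a volatile read-modify-write (this is a guess at the reduced testcase, not taken from the PR):

```c
/* A volatile increment: the load (insn 6), the add (insn 7) and the
   store (insn 8) are emitted as separate insns, and combine refuses to
   fuse them into a single memory-destination add because volatile
   memory is excluded from recognition during combine. */
void bump(volatile int *i)
{
    (*i)++;
}
```

Without the volatile qualifier the same source would normally combine into a single addl-to-memory instruction on x86.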
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #15 from Paolo Carlini paolo.carlini at oracle dot com --- I don't think you simply want a better fix for 54583, because for the testcase in #Comment 13 the new conditional setting TREE_NO_WARNING isn't used. Otherwise, I think it would be easy to tighten it via array_of_runtime_bound_p.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #16 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Paolo Carlini from comment #15) I don't think you simply want a better fix for 54583, because for the testcase in #Comment 13 the new conditional setting TREE_NO_WARNING isn't used. Otherwise, I think it would be easy to tighten it via array_of_runtime_bound_p. The issue isn't with setting the bit but reading it. If you look at the patch, I remove a test for TREE_NO_WARNING (expr). This breaks 54583 because the TREE_NO_WARNING bit is then ignored.
[Bug rtl-optimization/50677] volatile forces load into register
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 --- Comment #5 from Andrew Pinski pinskia at gcc dot gnu.org --- (In reply to H.J. Lu from comment #4) Why doesn't combine include (clobber (reg:CC 17 flags))? It has nothing to do with the clobber. Inside combine_instructions there is a call to init_recog_no_volatile which forces volatile memory not be recognized. The main reason is because combine does not check for volatile memory issues before doing the combine so it was easier to just disable recognizability of volatile memory.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #17 from Paolo Carlini paolo.carlini at oracle dot com --- Yes, I know that. What I'm saying is that other code may want to see that TREE_NO_WARNING honored, the issue doesn't have much to do with 54583 per se. In my personal opinion removing a TREE_NO_WARNING check is in general a pretty risky thing to do, because unfortunately we have only that generic bit and we use it in many different circumstances.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-24 Ever confirmed|0 |1 --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Zhendong Su from comment #5) Did you separately compile the two files at -O0 and link at -Os, like below? $ gcc-trunk -flto -O0 -c foo.c $ gcc-trunk -flto -O0 -c main.c $ gcc-trunk -flto -Os foo.o main.o Ah, no. The issue here is that the fix for that bug I mention triggers on TYPE_OVERFLOW_UNDEFINED, but with -O[01] we have -fno-strict-overflow enabled and thus we lower it in a bogus way while with -O[s23] we have -fstrict-overflow. This is a IL semantic change that is not actually contained in the IL ... Anyway, confirmed.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #7 from Richard Biener rguenth at gcc dot gnu.org --- Note that such bugs may occur generally when mixing -O[01] with -O[s23] ... for a similar case, -f[no-]strict-aliasing we get away with streaming get_alias_set () == 0. Thus I think we have to conservatively merge -fstrict-overflow, similar to how we treat -ffp-contract. I have a patch.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #1 from Jason Merrill jason at gcc dot gnu.org --- Fixed.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #2 from Jason Merrill jason at gcc dot gnu.org --- Author: jason Date: Mon Feb 24 18:47:20 2014 New Revision: 208092 URL: http://gcc.gnu.org/viewcvs?rev=208092root=gccview=rev Log: PR c++/60312 * parser.c (cp_parser_template_type_arg): Check for invalid 'auto'. Added: trunk/gcc/testsuite/g++.dg/cpp1y/auto-neg1.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org
[Bug c++/60328] New: [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 Bug ID: 60328 Summary: [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reagentoo at gmail dot com The following valid code snippet (compiled with -std=c++11) is rejected by GCC 4.9.0 (20130909) and triggers an ICE in GCC 4.8.2:

template <class _T, class... _Rest>
struct Foo
{
    template <class _TT, class... _RR>
    using Bar = Foo<_TT, _RR...>;

    using Normal = Foo<_Rest...>;
    using Fail = Bar<_Rest...>;
};

GCC 4.8.2 output:

internal compiler error: Segmentation fault
     using Fail = Bar<_Rest...>;
                               ^
Please submit a full bug report, with preprocessed source if appropriate. See https://bugs.gentoo.org/ for instructions.

GCC 4.9.0 output:

8: error: pack expansion argument for non-pack parameter ‘_TT’ of alias template ‘template<class _TT, class ... _RR> using Bar = Foo<_TT, _RR ...>’
     using Fail = Bar<_Rest...>;
                               ^
[Bug other/60329] New: Fix Typo
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60329 Bug ID: 60329 Summary: Fix Typo Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: alangiderick at gmail dot com Created attachment 32206 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32206&action=edit Fix Typo Hello, I fixed a typo in a comment in the call.h header file.
[Bug other/60330] New: Licensed an unlicensed file
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60330 Bug ID: 60330 Summary: Licensed an unlicensed file Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: alangiderick at gmail dot com Created attachment 32207 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32207&action=edit Licensed a file Hi, I added the license to a file that had it missing at the beginning of the code. Thanks
[Bug c++/60331] New: ICE with OpenMP #pragma omp declare reduction in template class
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60331 Bug ID: 60331 Summary: ICE with OpenMP #pragma omp declare reduction in template class Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code, openmp Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reichelt at gcc dot gnu.org The following valid(?) code snippet (compiled with -std=c++11 -fopenmp) triggers an ICE on trunk:

===
template<typename> struct A
{
  #pragma omp declare reduction (x : int : omp_out += omp_in) initializer (omp_priv = omp_priv)
};
===

bug.cc: In static member function 'static void A<template-parameter-1-1>::omp declare reduction x~i(int)': bug.cc:3:87: sorry, unimplemented: unexpected AST of kind decl_expr #pragma omp declare reduction (x : int : omp_out += omp_in) initializer (omp_priv = omp_priv) ^ bug.cc:3:87: internal compiler error: in potential_constant_expression_1, at cp/semantics.c:10553 0x746309 potential_constant_expression_1 ../../gcc/gcc/cp/semantics.c:10553 0x6105a8 fold_non_dependent_expr_sfinae(tree_node*, int) ../../gcc/gcc/cp/pt.c:5111 0x610619 build_non_dependent_expr(tree_node*) ../../gcc/gcc/cp/pt.c:21306 0x72b0e0 finish_expr_stmt(tree_node*) ../../gcc/gcc/cp/semantics.c:688 0x6d2629 cp_parser_omp_declare_reduction_exprs ../../gcc/gcc/cp/parser.c:30638 0x6d404a cp_parser_late_parsing_for_member ../../gcc/gcc/cp/parser.c:23503 0x6ae0bc cp_parser_class_specifier_1 ../../gcc/gcc/cp/parser.c:19451 0x6ae0bc cp_parser_class_specifier ../../gcc/gcc/cp/parser.c:19482 0x6ae0bc cp_parser_type_specifier ../../gcc/gcc/cp/parser.c:14305 0x6c6ec0 cp_parser_decl_specifier_seq ../../gcc/gcc/cp/parser.c:11547 0x6ccb63 cp_parser_single_declaration ../../gcc/gcc/cp/parser.c:23082 0x6cd044 cp_parser_template_declaration_after_export ../../gcc/gcc/cp/parser.c:22958 0x6d83e9 cp_parser_declaration ../../gcc/gcc/cp/parser.c:10947 0x6d6ed8 cp_parser_declaration_seq_opt ../../gcc/gcc/cp/parser.c:10869 0x6d877a cp_parser_translation_unit
../../gcc/gcc/cp/parser.c:4014 0x6d877a c_parse_file() ../../gcc/gcc/cp/parser.c:31582 0x7f79f3 c_common_parse_file() ../../gcc/gcc/c-family/c-opts.c:1060 Please submit a full bug report, [etc.] Without -std=c++11 or without the template, the code compiles fine.
[Bug c++/60332] New: [c++1y] ICE with auto in function-pointer cast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60332 Bug ID: 60332 Summary: [c++1y] ICE with auto in function-pointer cast Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code, lto Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reichelt at gcc dot gnu.org The following valid(?) code snippet (compiled with -std=c++1y -flto) triggers an ICE on trunk: = void foo(); auto f = (auto(*)())(foo); = bug.cc:3:27: internal compiler error: tree code 'template_type_parm' is not supported in LTO streams auto f = (auto(*)())(foo); ^ 0xaba08d DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1300 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xabb727 lto_output_tree(output_block*, tree_node*, bool, bool) ../../gcc/gcc/lto-streamer-out.c:1390 0xab5aef write_global_stream ../../gcc/gcc/lto-streamer-out.c:2100 0xabd99e lto_output_decl_state_streams ../../gcc/gcc/lto-streamer-out.c:2144 0xabd99e produce_asm_for_decls() ../../gcc/gcc/lto-streamer-out.c:2429 0xaffe4f write_lto ../../gcc/gcc/passes.c:2297 0xb02ec0 ipa_write_summaries_1 ../../gcc/gcc/passes.c:2356 0xb02ec0 ipa_write_summaries() ../../gcc/gcc/passes.c:2413 0x891cf7 ipa_passes ../../gcc/gcc/cgraphunit.c:2078 0x891cf7 compile() ../../gcc/gcc/cgraphunit.c:2174 0x892224 finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2329 0x68deee cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4449 Please submit a full bug report, [etc.]
[Bug c++/37140] type inherited from base class not recognized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37140 --- Comment #15 from fabien at gcc dot gnu.org --- Author: fabien Date: Mon Feb 24 20:27:34 2014 New Revision: 208093 URL: http://gcc.gnu.org/viewcvs?rev=208093&root=gcc&view=rev Log: 2014-02-24 Fabien Chene fab...@gcc.gnu.org PR c++/37140 * parser.c (cp_parser_nonclass_name): Call strip_using_decl and move the code handling dependent USING_DECLs... * name-lookup.c (strip_using_decl): ...Here. 2014-02-24 Fabien Chene fab...@gcc.gnu.org PR c++/37140 * g++.dg/template/using27.C: New. * g++.dg/template/using28.C: New. * g++.dg/template/using29.C: New. Added: branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using27.C branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using28.C branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using29.C Modified: branches/gcc-4_8-branch/gcc/cp/ChangeLog branches/gcc-4_8-branch/gcc/cp/name-lookup.c branches/gcc-4_8-branch/gcc/cp/parser.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 Adam Butcher abutcher at gcc dot gnu.org changed: What|Removed |Added CC||abutcher at gcc dot gnu.org --- Comment #3 from Adam Butcher abutcher at gcc dot gnu.org --- I think this might have fixed PR c++/60311 too.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #4 from Volker Reichelt reichelt at gcc dot gnu.org --- I think this might have fixed PR c++/60311 too. Alas not, that one still crashes for me.
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #3 from H.J. Lu hjl.tools at gmail dot com --- lto_wpa_write_files has

  for (i = 0; i < n_sets; i++)
    {
      ...
      stream_out (temp_filename, part->encoder, i == n_sets - 1);
      ...
    }

n_sets is 32 when bootstrapping GCC. With a parallel build, we may build cc1, cc1plus, f951, cc1obj at the same time. If the machine is under heavy load, we may start 4*32 == 128 lto1-wpa-stream processes at the same time with -flto=jobserver. This patch:

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index c676d79..4023036 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3219,9 +3219,7 @@ do_whole_program_analysis (void)
   lto_parallelism = 1;

   /* TODO: jobserver communicatoin is not supported, yet.  */
-  if (!strcmp (flag_wpa, "jobserver"))
-    lto_parallelism = -1;
-  else
+  if (strcmp (flag_wpa, "jobserver"))
     {
       lto_parallelism = atoi (flag_wpa);
       if (lto_parallelism <= 0)

limits the lto1-wpa-stream process count to 1 if -flto=jobserver is used.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #5 from Adam Butcher abutcher at gcc dot gnu.org --- Actually strike that, my [local] changes relating to PR c++/60065 (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01437.html) seem to have changed the behavior.
[Bug libstdc++/60333] New: type_traits make_signed, make_unsigned missing support for long long enumerations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60333 Bug ID: 60333 Summary: type_traits make_signed, make_unsigned missing support for long long enumerations Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: harald at gigawatt dot nl

#include <type_traits>

enum E { e = 0x1 };
static_assert(sizeof(std::make_signed<E>::type) == sizeof(E), "");
static_assert(sizeof(std::make_unsigned<E>::type) == sizeof(E), "");

This fails on x86, because make_signed and make_unsigned never return a larger type than (un)signed long.
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #4 from Jan Hubicka hubicka at ucw dot cz --- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #3 from H.J. Lu hjl.tools at gmail dot com --- lto_wpa_write_files has

  for (i = 0; i < n_sets; i++)
    {
      ...
      stream_out (temp_filename, part->encoder, i == n_sets - 1);
      ...
    }

n_sets is 32 when bootstrapping GCC. With a parallel build, we may build cc1, cc1plus, f951, cc1obj at the same time. If the machine is under heavy load, we may start 4*32 == 128 lto1-wpa-stream processes at the same time with -flto=jobserver. This patch:

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index c676d79..4023036 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3219,9 +3219,7 @@ do_whole_program_analysis (void)
   lto_parallelism = 1;

   /* TODO: jobserver communicatoin is not supported, yet.  */
-  if (!strcmp (flag_wpa, "jobserver"))
-    lto_parallelism = -1;
-  else
+  if (strcmp (flag_wpa, "jobserver"))
     {
       lto_parallelism = atoi (flag_wpa);
       if (lto_parallelism <= 0)

limits the lto1-wpa-stream process count to 1 if -flto=jobserver is used.

I think this is a better variant:

$ svn diff ~/trunk/gcc/lto/lto.c
Index: /aux/hubicka/trunk/gcc/lto/lto.c
===================================================================
--- /aux/hubicka/trunk/gcc/lto/lto.c    (revision 207702)
+++ /aux/hubicka/trunk/gcc/lto/lto.c    (working copy)
@@ -2491,7 +2491,7 @@ stream_out (char *temp_filename, lto_sym
 #ifdef HAVE_WORKING_FORK
   static int nruns;

-  if (!lto_parallelism || lto_parallelism == 1)
+  if (lto_parallelism <= 1)
     {
       do_stream_out (temp_filename, encoder);
       return;

I basically wanted to have a simple jobserver client with lto_parallelism=-1, but I did not have time to implement it. (After glancing over GNU Make's implementation it seems actually a bit non-trivial.) I am adding Paul D. Smith into CC, since he wrote an article on the implementation. Paul, we would like GCC to actually use the jobserver to limit the number of processes it forks internally for streaming. Is there an elegant and simple solution to get this implemented?
I was thinking that if connecting to the jobserver is hard, we can just produce a makefile with rules waiting for the read to finish and use it to get tokens into the GCC streamer, but that seems somewhat kludgy. Jan
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 --- Comment #5 from Jason Merrill jason at gcc dot gnu.org --- Author: jason Date: Mon Feb 24 22:17:43 2014 New Revision: 208094 URL: http://gcc.gnu.org/viewcvs?rev=208094&root=gcc&view=rev Log: PR c++/60146 * pt.c (tsubst_omp_for_iterator): Don't let substitution of the DECL_EXPR initialize a non-class iterator. Added: trunk/gcc/testsuite/g++.dg/gomp/for-20.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/pt.c
[Bug c++/60146] [4.8 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Summary|[4.8/4.9 Regression] ICE|[4.8 Regression] ICE when |when compiling this code|compiling this code with |with -fopenmp |-fopenmp --- Comment #6 from Jason Merrill jason at gcc dot gnu.org --- Fixed in 4.9 so far.
[Bug c++/60328] [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jason at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #1 from Jason Merrill jason at gcc dot gnu.org --- GCC 4.9 implements the tentative resolution of DR 1430. http://open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1430 *** This bug has been marked as a duplicate of bug 51239 ***
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #5 from H.J. Lu hjl.tools at gmail dot com --- (In reply to Jan Hubicka from comment #4) I think this is a better variant:

$ svn diff ~/trunk/gcc/lto/lto.c
Index: /aux/hubicka/trunk/gcc/lto/lto.c
===================================================================
--- /aux/hubicka/trunk/gcc/lto/lto.c    (revision 207702)
+++ /aux/hubicka/trunk/gcc/lto/lto.c    (working copy)
@@ -2491,7 +2491,7 @@ stream_out (char *temp_filename, lto_sym
 #ifdef HAVE_WORKING_FORK
   static int nruns;

-  if (!lto_parallelism || lto_parallelism == 1)
+  if (lto_parallelism <= 1)
     {
       do_stream_out (temp_filename, encoder);
       return;

I basically wanted to have a simple jobserver client with lto_parallelism=-1, but I did not have time to implement it. (After glancing over GNU Make's implementation it seems actually a bit non-trivial.) This works for me. Can you check it in?
[Bug c++/51239] [DR 1430] ICE with variadic template alias
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51239 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added CC||reagentoo at gmail dot com --- Comment #7 from Jason Merrill jason at gcc dot gnu.org --- *** Bug 60328 has been marked as a duplicate of this bug. ***
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #6 from Jan Hubicka hubicka at gcc dot gnu.org --- Author: hubicka Date: Mon Feb 24 22:58:44 2014 New Revision: 208097 URL: http://gcc.gnu.org/viewcvs?rev=208097&root=gcc&view=rev Log: PR lto/60295 * lto.c (stream_out): Avoid parallel streaming with -flto=jobserver until we are able to throttle it down reasonably. Modified: trunk/gcc/lto/ChangeLog trunk/gcc/lto/lto.c
[Bug fortran/60334] New: Segmentation fault on character pointer assignments
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60334 Bug ID: 60334 Summary: Segmentation fault on character pointer assignments Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: antony at cosmologist dot info This compiles OK:

program tester
  implicit none
  character(LEN=:), pointer :: Y
  character(LEN=0), target :: Empty_String = ''

  Y => test()
  print *, Y

contains

  function test() result(P)
    character(LEN=:), pointer :: P

    P => Empty_String
  end function

end program

but compiled with gfortran -Og, when run it gives

Program received signal 11 (SIGSEGV): Segmentation fault.

Backtrace for this error:
+ [0xb776c400]
+ /lib/i386-linux-gnu/libc.so.6(+0x13afcb) [0xb7529fcb]
+ in the main program from file TestClass.f90
+ /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb74084d3]
*** glibc detected *** ./a.out: free(): invalid pointer: 0x09be0898 ***

Other compiler options don't so reliably crash, but still probably invalid code (I think). It's not specific to the string having zero length.
[Bug c++/60335] New: confused by earlier errors, bailing out
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60335 Bug ID: 60335 Summary: confused by earlier errors, bailing out Product: gcc Version: 4.7.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vanyacpp at gmail dot com struct baz0 { int baz1(void bar0, struct bar0 {} bar3); };
[Bug c++/60065] [c++1y] ICE with auto parameter pack
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60065 --- Comment #3 from Adam Butcher abutcher at gcc dot gnu.org --- Author: abutcher Date: Tue Feb 25 03:47:24 2014 New Revision: 208106 URL: http://gcc.gnu.org/viewcvs?rev=208106&root=gcc&view=rev Log: Fix PR c++/60065. PR c++/60065 * parser.c (cp_parser_direct_declarator): Don't save and restore num_template_parameter_lists around call to cp_parser_parameter_declaration_list. (function_being_declared_is_template_p): New predicate. (cp_parser_parameter_declaration_list): Use function_being_declared_is_template_p as predicate for inspecting current function template parameter list length rather than num_template_parameter_lists. PR c++/60065 * g++.dg/cpp1y/pr60065.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/cpp1y/pr60065.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60328] [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 --- Comment #2 from reagentoo at gmail dot com --- (In reply to Jason Merrill from comment #1) GCC 4.9 implements the tentative resolution of DR 1430. http://open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1430 *** This bug has been marked as a duplicate of bug 51239 *** But this test-case compiles normally with Clang.
[Bug rtl-optimization/49847] [4.7/4.8/4.9 Regression] NULL deref in fold_rtx (prev_insn_cc0 == NULL)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #34 from Jeffrey A. Law law at redhat dot com --- OK. Then I suggest two immediate things to do. 1. Fix the documentation for cc0 targets to indicate that the setter/user no longer have to be consecutive, particularly in the presence of flag_trapping_math. 2. Fault in fixes. While a review of every bit of HAVE_cc0 code is warranted, I'm not terribly inclined as HAVE_cc0 targets simply aren't that important anymore.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #7 from Andrey Belevantsev abel at gcc dot gnu.org --- Author: abel Date: Tue Feb 25 06:35:09 2014 New Revision: 208109 URL: http://gcc.gnu.org/viewcvs?rev=208109&root=gcc&view=rev Log: PR rtl-optimization/60292 * sel-sched.c (fill_vec_av_set): Do not reset target availability bit for the fence instruction. Modified: trunk/gcc/ChangeLog trunk/gcc/sel-sched.c
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155 Jeffrey A. Law law at redhat dot com changed: What|Removed |Added CC||law at redhat dot com --- Comment #6 from Jeffrey A. Law law at redhat dot com --- Well, given that gcse merely moves evaluations to other blocks where evaluation of the expression is always anticipated, there's no inherent reason why we can't gcse something that might trap. It really feels like this is papering over the real problem, namely that get_pressure_class_and_nregs simply doesn't handle things that are passed through hash_scan_set, in particular cases where there's a single set inside a PARALLEL.
[Bug c++/60311] [c++1y] ICE with pointer-to-function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60311 --- Comment #1 from Adam Butcher abutcher at gcc dot gnu.org --- Author: abutcher Date: Tue Feb 25 06:44:53 2014 New Revision: 208111 URL: http://gcc.gnu.org/viewcvs?rev=208111&root=gcc&view=rev Log: Fix PR c++/60311. PR c++/60311 * parser.c (function_being_declared_is_template_p): Return false when processing a template parameter list. (cp_parser_parameter_declaration_clause): Don't set auto_is_implicit_function_template_parm_p when processing a template parameter list. PR c++/60311 * g++.dg/cpp1y/pr60311.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/cpp1y/pr60311.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60305] ICE constexpr array of functions in template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60305 Daniel Krügler daniel.kruegler at googlemail dot com changed: What|Removed |Added CC||daniel.kruegler@googlemail. ||com --- Comment #1 from Daniel Krügler daniel.kruegler at googlemail dot com --- Seems to be fixed in 4.9.0 head. Note that your func function template is broken, because it is declared as returning void, but obviously returns a non-void result.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 Andrey Belevantsev abel at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Andrey Belevantsev abel at gcc dot gnu.org --- Fixed on trunk. No need to backport as the initial assert patch was committed to trunk only.
Re: Fix caller-save.c:add_used_regs_1 handling of pseudos
Steven Bosscher stevenb@gmail.com writes: On Sun, Feb 23, 2014 at 10:14 PM, Richard Sandiford wrote: I noticed in passing that this 4.7 cleanup: http://article.gmane.org/gmane.comp.gcc.patches/224853 ... so that nothing happens for pseudos. I've no idea whether this makes a difference in practice or not but it seems safer to restore the old behaviour. Tested on mipsisa64-sde-elf rather than x86_64-linux-gnu since it only affects reload targets. OK to install? If it's worked since GCC 4.7, why restore that code? OK, fair enough. I'll withdraw the patch. Thanks, Richard
New Serbian PO file for 'cpplib' (version 4.9-b20140202)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Serbian team of translators. The file is available at: http://translationproject.org/latest/cpplib/sr.po (This file, 'cpplib-4.9-b20140202.sr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Contents of PO file 'cpplib-4.9-b20140202.sr.po'
cpplib-4.9-b20140202.sr.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] Fix a typo in sparseset_pop
On Mon, Feb 24, 2014 at 6:52 AM, Carrot Wei car...@google.com wrote: Hi The following patch fixes an obvious wrong index used to access the dense array. The patch has passed the bootstrap and regression tests on x86-64. OK for trunk? Ok. Thanks, Richard. thanks Carrot 2014-02-23 Guozhi Wei car...@google.com * sparseset.h (sparseset_pop): Fix the wrong index.

Index: sparseset.h
===================================================================
--- sparseset.h (revision 208039)
+++ sparseset.h (working copy)
@@ -177,7 +177,7 @@
   gcc_checking_assert (mem != 0);

   s->members = mem - 1;
-  return s->dense[mem];
+  return s->dense[s->members];
 }

 static inline void
Re: builtin fe[gs]etround
On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, a natural first step to optimize changes of rounding modes seems to be making these 2 functions builtins. I don't know exactly how far optimizations will be able to go (the fact that fesetround can fail complicates things a lot). What is included here: 1) fegetround is pure. 2) Neither function aliases (use or clobber) any memory. I expect this is likely not true on all platforms, some probably store the rounding mode in a global variable that is accessible through other means (though mixing direct accesses with calls to fe*etround seems a questionable style). Any opinion or advice here? Regtested on x86_64-linux-gnu, certainly not for 4.9. Hohumm ... before making any of these functions less of a barrier than they are (at least for loads and stores), shouldn't we think of, and fix, the lack of any dependences between FP status word changes and actual arithmetic instructions? In fact, using 'pure' or 'not use/clobber memory' here is exactly walking on shaking grounds. Simply because we lack of a way to say that this stmt uses/clobbers the FP state (fegetround would be 'const' when following your logic in 2)). Otherwise, what is it worth optimizing^breaking things even more than we do now? [not that I have an answer for the FP state dependency that I like] Thanks, Richard. 2014-02-23 Marc Glisse marc.gli...@inria.fr gcc/ * builtins.def (BUILT_IN_FEGETROUND, BUILT_IN_FESETROUND): Add. * tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Handle them. gcc/testsuite/ * gcc.dg/tree-ssa/fegsetround.c: New file. 
-- Marc Glisse Index: gcc/builtins.def === --- gcc/builtins.def(revision 208045) +++ gcc/builtins.def(working copy) @@ -276,20 +276,22 @@ DEF_C99_BUILTIN(BUILT_IN_EXPM1F, DEF_C99_BUILTIN(BUILT_IN_EXPM1L, expm1l, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_LIB_BUILTIN(BUILT_IN_FABS, fabs, BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSF, fabsf, BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSL, fabsl, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD32, fabsd32, BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD64, fabsd64, BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD128, fabsd128, BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FDIM, fdim, BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIMF, fdimf, BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIML, fdiml, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) +DEF_C99_BUILTIN(BUILT_IN_FEGETROUND, fegetround, BT_FN_INT, ATTR_PURE_NOTHROW_LEAF_LIST) +DEF_C99_BUILTIN(BUILT_IN_FESETROUND, fesetround, BT_FN_INT_INT, ATTR_NOTHROW_LEAF_LIST) DEF_LIB_BUILTIN(BUILT_IN_FLOOR, floor, BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORF, floorf, BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORL, floorl, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMA, fma, BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAF, fmaf, BT_FN_FLOAT_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAL, fmal, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAX, fmax, BT_FN_DOUBLE_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMAXF, fmaxf, BT_FN_FLOAT_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMAXL, fmaxl, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMIN, fmin, BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)

Index: gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O -fdump-tree-optimized" } */
+
+#include <fenv.h>
+
+int a;
+int f ()
+{
+  a = 42;
+  // don't read a
+  int x = fegetround ();
+  fesetround (x + 1);
+  a = 0;
+  return a;
+}
+int g ()
+{
+  a = 0;
+  // don't write a
+  int x = fegetround ();
+  fesetround (x + 1);
+  return a;
+}
+int h ()
+{
+  // pure
+  return fegetround () -
Re: [v3] complex functions with expression template reals
Hi, On 02/23/2014 04:11 PM, Marc Glisse wrote: On Sun, 23 Feb 2014, Paolo Carlini wrote: On 02/23/2014 11:32 AM, Marc Glisse wrote: Hello, looking at this question: http://stackoverflow.com/q/21737186/1918193 I was surprised to see that libstdc++'s std::complex basically just works with user-defined types, even weird expression template ones, although that's not a supported use afaik. The only functions that fail seem to be exp and pow, both because they call polar with two arguments that have different (expression) types. I am not proposing to make this a supported use, but the cost of this small patch seems very low, and if it makes a couple users happy... Regtested with no problem on x86_64-linux-gnu, ok for stage 1? I would even be in favor of applying it now. Can we figure out simple (ie, not relying on boost...) testcases too? I didn't try std::complex<std::valarray<X>>, maybe... Otherwise, you need a type T with all the (real) math functions defined, and where every operation returns a different type (implicitly convertible to T). And then you want to call all the complex functions. That seems doable, but way bigger than I'm willing to go for this feature. If you want to take over, be my guest ;-) Another option would be just using boost/multiprecision/mpfr.hpp when available. In general, I think it makes sense to have a minimum of infrastructure enabling tests checking interoperability with boost. If only we had a check_v3_target_header { args } it would be most of it, but it doesn't seem we do?!? Anyway I guess we can take care of that post 4.9.0 and commit the straightforward code tweak now. Jon? Paolo. PS: Resending message, yesterday had issues with html
[PATCH ARM]: Fix more -mapcs-frame failures
This patch improves the one sent previously (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01159.html), to fix a few more failures in the testsuite that could arise with shrink-wrap and -fexceptions. To recall, the problem that it fixes is that with -mapcs-frame:

- the epilogue pops as

	sub sp, fp, #12		@ does not set FRAME_RELATED_P
	ldmia sp, {fp, sp, lr}	@ XXX assert def_cfa->reg is FP instead of SP

- with vrp this is worse, we have

	fldmfdd ip!, {d8}	@ FRAME_RELATED_P
	sub sp, fp, #20
	...
	ldmfd sp, {r3, r4, fp, sp, pc}	@ XXX assert def_cfa->reg is IP instead of SP

Fixed by inserting a REG_CFA_DEF_CFA note, fixing the arm_unwind_emit machinery and setting FRAME_RELATED_P. The comment says:

/* The INSN is generated in epilogue. It is set as RTX_FRAME_RELATED_P to get correct dwarf information for shrink-wrap. We should not emit unwind information for it because these are used either for pretend arguments or notes to adjust sp and restore registers from stack. */

The testsuite score improves without regression (improvements from -g and -fexceptions tests):

=== gcc Summary for arm-sim//-mapcs-frame ===
# of expected passes		77545
# of unexpected failures	31
# of unexpected successes	2
# of expected failures		172
# of unsupported tests		1336

=== g++ Summary for arm-sim//-mapcs-frame ===
# of expected passes		50116
# of unexpected failures	9
# of unexpected successes	3
# of expected failures		280
# of unsupported tests		1229

instead of

=== gcc Summary for arm-sim//-mapcs-frame ===
# of expected passes		77106
# of unexpected failures	500
# of unexpected successes	2
# of expected failures		172
# of unresolved testcases	111
# of unsupported tests		1336

=== g++ Summary for arm-sim//-mapcs-frame ===
# of expected passes		50021
# of unexpected failures	136
# of unexpected successes	3
# of expected failures		280
# of unsupported tests		1229

Comments? OK for trunk?
Many thanks,

2014-02-18  Christian Bruel  christian.br...@st.com

	PR target/60264
	* config/arm/arm.c (arm_emit_vfp_multi_reg_pop): Emit a
	REG_CFA_DEF_CFA note.
	(arm_expand_epilogue_apcs_frame): Call arm_add_cfa_adjust_cfa_note.
	(arm_unwind_emit): Allow REG_CFA_DEF_CFA.

2014-02-18  Christian Bruel  christian.br...@st.com

	PR target/60264
	* gcc.target/arm/pr60264.c
	* gcc.target/arm/pr60264-2.c

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 207942)
+++ gcc/config/arm/arm.c	(working copy)
@@ -19909,8 +19909,15 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num
   par = emit_insn (par);
   REG_NOTES (par) = dwarf;
 
-  arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
-			       base_reg, base_reg);
+  /* Make sure cfa doesn't leave with IP_REGNUM to allow unwinding from FP.  */
+  if (TARGET_VFP && REGNO (base_reg) == IP_REGNUM)
+    {
+      RTX_FRAME_RELATED_P (par) = 1;
+      add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+    }
+  else
+    arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
+				 base_reg, base_reg);
 }
 
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
@@ -27098,15 +27105,19 @@ arm_expand_epilogue_apcs_frame (bool really_return
   if (TARGET_HARD_FLOAT && TARGET_VFP)
     {
       int start_reg;
+      rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
 
       /* The offset is from IP_REGNUM.  */
       int saved_size = arm_get_vfp_saved_size ();
       if (saved_size > 0)
	{
+	  rtx insn;
	  floats_from_frame += saved_size;
-	  emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
-				 hard_frame_pointer_rtx,
-				 GEN_INT (-floats_from_frame)));
+	  insn = emit_insn (gen_addsi3 (ip_rtx,
+					hard_frame_pointer_rtx,
+					GEN_INT (-floats_from_frame)));
+	  arm_add_cfa_adjust_cfa_note (insn, -floats_from_frame,
+				       ip_rtx, hard_frame_pointer_rtx);
	}
 
       /* Generate VFP register multi-pop.
  */
@@ -27179,11 +27190,15 @@ arm_expand_epilogue_apcs_frame (bool really_return
       num_regs = bit_count (saved_regs_mask);
       if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
	{
+	  rtx insn;
	  emit_insn (gen_blockage ());
	  /* Unwind the stack to just below the saved registers.  */
-	  emit_insn (gen_addsi3 (stack_pointer_rtx,
-				 hard_frame_pointer_rtx,
-				 GEN_INT (- 4 * num_regs)));
+	  insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+					hard_frame_pointer_rtx,
+					GEN_INT (- 4 * num_regs)));
+
+	  arm_add_cfa_adjust_cfa_note (insn, - 4 * num_regs,
+				       stack_pointer_rtx, hard_frame_pointer_rtx);
	}
Re: builtin fe[gs]etround
On Mon, Feb 24, 2014 at 10:02 AM, Richard Biener richard.guent...@gmail.com wrote:
On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

a natural first step to optimize changes of rounding modes seems to be making these two functions builtins. I don't know exactly how far optimizations will be able to go (the fact that fesetround can fail complicates things a lot). What is included here:

1) fegetround is pure.

2) Neither function aliases (uses or clobbers) any memory. I expect this is likely not true on all platforms; some probably store the rounding mode in a global variable that is accessible through other means (though mixing direct accesses with calls to fe*etround seems a questionable style).

Any opinion or advice here? Regtested on x86_64-linux-gnu, certainly not for 4.9.

Hohumm ... before making any of these functions less of a barrier than they are (at least for loads and stores), shouldn't we think of, and fix, the lack of any dependences between FP status word changes and actual arithmetic instructions? In fact, using 'pure' or 'not use/clobber memory' here is exactly walking on shaky ground, simply because we lack a way to say that this stmt uses/clobbers the FP state (fegetround would be 'const' when following your logic in 2)). Otherwise, what is it worth optimizing^Wbreaking things even more than we do now? [not that I have an answer for the FP state dependency that I like]

Just to elaborate on the two obvious options:

1) represent all arithmetic with builtins, using an extra explicit FP state argument, and set / query that state with the FP manipulation / query functions (also with every call)

2) use sth similar to virtual operands - conveniently the vuse/vdef members are present even for unary, binary and ternary assigns (you'd only use the vuse field here).
Issues arise with calls (which might consume/clobber FP state) - there the vop fields are already used, so you'd need to add an extra use (easy) and a def (ugh). Eventually people wanted to get multiple defs for the simple stmts (assigns and calls) back, for stuff like modeling CPU flags explicitly (the overflow flag, for example). And FP ISAs now have support for per-stmt rounding mode flags (and element masks for vector instructions). Thus eventually this may be a good reason to support extra (but less efficient to get at / modify?) SSA(!) uses and defs on these stmt kinds. But it needs to be well designed, so as to not throw away the speedups and simplicity we gained when removing general support for multiple defs. (It should be obvious that I lean towards 2) but am not very happy with the consequences for the gimple data structures.)

Richard.

Thanks,
Richard.
Re: [ARM] [Trivial] Fix shortening of field name extend.
*ping*, CCing Jakub.

Thanks,
James

On Wed, Feb 12, 2014 at 12:43:10PM +, Ramana Radhakrishnan wrote:
On 02/12/14 12:19, James Greenhalgh wrote:

Hi,

In aarch-common-protos.h we define a field in alu_cost_table: extnd

On its own this is an upsetting optimization of the English language, but this trouble is compounded by the comment attached to this field throughout the cost tables themselves:

/* Extend.  */

This patch fixes the spelling of extend to match that in the comments. I've checked that AArch64 and AArch32 build with this patch applied. OK for trunk/stage-1 (I don't mind which)?

I am happy for this to go in now - Jakub?

regards
Ramana

Thanks,
James

---
2014-02-12  James Greenhalgh  james.greenha...@arm.com

	* config/arm/aarch-common-protos.h (alu_cost_table): Fix spelling of extend.
	* config/arm/arm.c (arm_new_rtx_costs): Fix spelling of extend.

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.