Re: Wrong code for i686 target with -O3 -flto

2013-08-05 Thread Jan Hubicka

Quoting Uros Bizjak ubiz...@gmail.com:


On Sun, Aug 4, 2013 at 2:34 AM, NightStrike nightstr...@gmail.com wrote:

On Mon, Jul 22, 2013 at 5:22 AM, Igor Zamyatin izamya...@gmail.com wrote:

Hi All!

Unfortunately, the compiler now generates wrong code for the i686 target
when the options -O3 and -flto are used. This started more than a month ago
and is recorded in PR57602.

This combination of options is quite important, at least from a
performance point of view.

Since there has been almost no reaction to this PR, I'd like to ask that
someone either look at it in the foreseeable future or revert the commit
responsible for the issue.



What's the bad commit?


As mentioned in the PR, it is r199422 [1,2].

[1] http://gcc.gnu.org/r199422
[2] http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01644.html


Sorry for the delay - I was travelling for almost 4 weeks and then had
plenty of things to do last week.  I am looking into it now and will fix
it today.

Honza


Uros.






[RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-05 Thread Tejas Belagod


Hi,

I'm looking for some help understanding how BIT_FIELD_REFs work with big-endian.

Vector subscripts in this example:

#define vector __attribute__((vector_size(sizeof(int)*4) ))

typedef int vec vector;

int foo(vec a)
{
  return a[0];
}

gets lowered into array accesses by c-typeck.c

;; Function foo (null)
{
  return *(int *) a;
}

and gets gimplified into BIT_FIELD_REFs a bit later.

foo (vec a)
{
  int _2;

  <bb 2>:
  _2 = BIT_FIELD_REF <a_3(D), 32, 0>;
  return _2;

}

What's interesting to me here is the bitpos - does this not need a
BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what happens
with reduction operations in the autovectorizer, where the scalar result in the
reduction epilogue gets extracted with a BIT_FIELD_REF but the bitpos there is
corrected for BYTES_BIG_ENDIAN.


... from tree-vect-loop.c:vect_create_epilog_for_reduction ()

  /* 2.4  Extract the final scalar result.  Create:
          s_out3 = extract_field <v_out2, bitpos>  */

  if (extract_scalar_result)
    {
      tree rhs;

      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "extract scalar result");

      if (BYTES_BIG_ENDIAN)
        bitpos = size_binop (MULT_EXPR,
                             bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
                             TYPE_SIZE (scalar_type));
      else
        bitpos = bitsize_zero_node;


For example:

int foo(int * a)
{
  int i, sum = 0;

  for (i=0;i<16;i++)
   sum += a[i];

  return sum;
}

gets autovectorized into:

...
  vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
  stmp_sum_9.16_75 = BIT_FIELD_REF <vect_sum_9.17_74, 32, 96>;
  sum_76 = stmp_sum_9.16_75 + sum_47;

the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN
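
To make the two conventions concrete, here is a small sketch of the bit-position arithmetic from the vectorizer excerpt above. The name reduction_bitpos and its parameters are hypothetical, not GCC identifiers; the sketch only mirrors the (nunits - 1) * element-size rule and says nothing about what the expander does afterwards.

```c
#include <assert.h>

/* Hypothetical helper (not a GCC function): bit offset of the element that
   the reduction epilogue extracts, following the rule quoted above.  */
unsigned
reduction_bitpos (unsigned nunits, unsigned elem_bits, int bytes_big_endian)
{
  assert (nunits > 0);
  /* Big-endian: last element, (nunits - 1) * elem_bits.  Little-endian: 0.  */
  return bytes_big_endian ? (nunits - 1) * elem_bits : 0;
}
```

For a vector of four 32-bit ints this yields 96 when bytes_big_endian is set and 0 otherwise. The 96 agrees with the "32, 96" operands in the reduction dump, whereas the gimplified a[0] in the first example carries bit position 0.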

If vec_extract is defined in the back-end, how does one figure out if the 
BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the 
vectorizer's bit-field extraction and apply the appropriate correction in 
vec_extract's expansion? Or am I missing something that corrects BIT_FIELD_REFs 
between the gimplifier and the RTL expander?


Thanks,
Tejas.



Help with C++11 memory model on zSeries

2013-08-05 Thread Richard Sandiford
Sorry for the long mail and for what's probably an FAQ.  I did try to find
an answer without bothering the list... (and showing my ignorance so much :-))

At the moment, the s390 backend treats all atomic loads as simple loads
and only uses serialisation instructions for atomic stores.  I just wanted
to check whether this was really the right behaviour.

The architecture has strong memory-ordering semantics in which a CPU is
only allowed to move a store after a later load; the other three combinations
cannot happen.  The current implementation seems fine from that point of view,
because it means that a serialising instruction after a store is enough
to prevent any reordering.  However, page 5-126 of the architecture
manual[*] says:

  Following is an example showing the effects of serialization. Location
  A initially contains FF hex.

  CPU 1            CPU 2
  MVI A,X'00'   G  CLI A,X'00'
  BCR 15,0         BNE G

  The BCR 15,0 instruction executed by CPU 1 is a serializing
  instruction that ensures that the store by CPU 1 at location A is
  completed. However, CPU 2 may loop indefinitely, or until the next
  interruption on CPU 2, because CPU 2 may already have fetched from
  location A for every execution of the CLI instruction. A serializing
  instruction must be in the CPU-2 loop to ensure that CPU 2 will again
  fetch from location A.

Does the new C/C++ memory model allow that kind of infinite loop 
even for sequentially-consistent atomic loads?  The draft text was:

[29.3.3]
  There shall be a single total order S on all memory_order_seq_cst
  operations, consistent with the “happens before” order and modification
  orders for all affected locations, such that each memory_order_seq_cst
  operation that loads a value observes either the last preceding
  modification according to this order S, or the result of an operation
  that is not memory_order_seq_cst.

but when I asked around, no one could see anything in the standard that
prevents the total order from having an infinite sequence of loads
between two stores.  That feels like a cheat though. :-)

Even if it isn't allowed, every CPU is going to get interrupted eventually,
and I'm told that in practice all current implementations would see the
store at some point.  In that case it might come down to a quality of
implementation question.  Is it OK to leave out the serialisation anyway
with a slightly vague guarantee like that?
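
For readers who want the shape of the loop in source form, here is a rough C11 rendering of the two-CPU example. The names location_a, cpu1_store and cpu2_poll are made up for illustration, and the poll is bounded so the sketch always terminates; it shows the pattern being asked about and does not answer whether the load side needs a serialising instruction on s390.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Location A from the manual's example; initially FF hex.  */
static atomic_uchar location_a = 0xFF;

/* CPU 1: sequentially consistent store of 0 (the MVI plus BCR 15,0 side).  */
void
cpu1_store (void)
{
  atomic_store_explicit (&location_a, 0x00, memory_order_seq_cst);
}

/* CPU 2: poll A with sequentially consistent loads (the CLI/BNE loop).
   Returns true once the store has been observed, false after max_spins.  */
bool
cpu2_poll (unsigned long max_spins)
{
  for (unsigned long i = 0; i < max_spins; i++)
    if (atomic_load_explicit (&location_a, memory_order_seq_cst) == 0x00)
      return true;
  return false;
}
```

The question above is whether an implementation may compile the load in cpu2_poll to a plain load that, on this architecture, is allowed to keep seeing FF hex indefinitely.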

Thanks,
Richard

[*] Available here FWIW: 
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a



all_ones_mask_p clarification

2013-08-05 Thread Mike Stump
The intent is for all_ones_mask_p to return true for 64 bits of ones in an
unsigned type of width 64 when size is 64, right?  Currently the code uses a
signed type for tmask, which sets the upper bits to 1 when the value has
the sign bit set, and the equality check compares all 128 bits of the value.
As a result, the current code returns false in this case.
The change below is the behavior change I'm talking about.
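
A toy model of the mismatch, assuming a 64-bit value held in a 128-bit container that is compared in full. The struct wide type and the helper names are hypothetical stand-ins, not GCC's representation or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit constant: two 64-bit halves.  */
struct wide { uint64_t low; uint64_t high; };

/* Widen a signed 64-bit value: the sign bit is copied into the high half.  */
static struct wide
from_signed64 (int64_t v)
{
  return (struct wide) { .low = (uint64_t) v, .high = v < 0 ? UINT64_MAX : 0 };
}

/* Widen an unsigned 64-bit value: the high half is zero.  */
static struct wide
from_unsigned64 (uint64_t v)
{
  return (struct wide) { .low = v, .high = 0 };
}

/* Equality over all 128 bits, as described above.  */
static bool
wide_equal (struct wide a, struct wide b)
{
  return a.low == b.low && a.high == b.high;
}

/* Accept either widening of "64 ones", which is the idea of the change.  */
bool
all_ones_either (struct wide mask)
{
  return wide_equal (mask, from_signed64 (-1))
         || wide_equal (mask, from_unsigned64 (UINT64_MAX));
}
```

In this model the signed and unsigned widenings of 64 ones differ only in the high half, so a comparison against the signed form alone rejects the unsigned mask.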

We're fixing this in the wide-int branch, and just wanted to see if someone 
wanted to argue this isn't a bug.

If you want to see a small test case:

typedef enum
{
  DK_ERROR,
  DK_SORRY,
  DK_LAST_DIAGNOSTIC_KIND
} diagnostic_t;

struct diagnostic_context
{
  int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
  diagnostic_t *classify_diagnostic;
};

extern diagnostic_context *global_dc;

bool
seen_error (void)
{
  return (global_dc)->diagnostic_count[(int) (DK_ERROR)] ||
         (global_dc)->diagnostic_count[(int) (DK_SORRY)];
}



diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 6506ae7..9b17d1d 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3702,12 +3702,23 @@ all_ones_mask_p (const_tree mask, int size)
 
   tmask = build_int_cst_type (signed_type_for (type), -1);
 
-  return
-    tree_int_cst_equal (mask,
-                        const_binop (RSHIFT_EXPR,
-                                     const_binop (LSHIFT_EXPR, tmask,
-                                                  size_int (precision - size)),
-                                     size_int (precision - size)));
+  if (tree_int_cst_equal (mask,
+                          const_binop (RSHIFT_EXPR,
+                                       const_binop (LSHIFT_EXPR, tmask,
+                                                    size_int (precision - size)),
+                                       size_int (precision - size))))
+    return true;
+
+  tmask = build_int_cst_type (unsigned_type_for (type), -1);
+
+  if (tree_int_cst_equal (mask,
+                          const_binop (RSHIFT_EXPR,
+                                       const_binop (LSHIFT_EXPR, tmask,
+                                                    size_int (precision - size)),
+                                       size_int (precision - size))))
+    return true;
+
+  return false;
 }
 
 /* Subroutine for fold: determine if VAL is the INTEGER_CONST that

[RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
[ sent to both Linux kernel mailing list and to gcc list ]

I was looking at some of the old code I still have marked in my TODO
list, that I never pushed to get mainlined. One of them is to move trace
point logic out of the fast path to get rid of the stress that it
imposes on the icache.

Almost a full year ago, Mathieu suggested something like:

if (unlikely(x)) __attribute__((section(".unlikely"))) {
...
} else __attribute__((section(".likely"))) {
...
}

https://lkml.org/lkml/2012/8/9/658

Which got me thinking. How hard would it be to set a block in its own
section. Like what Mathieu suggested, but it doesn't have to be
.unlikely.

if (x) __attribute__((section(".foo"))) {
/* do something */
}

Then have in the assembly, simply:

test x
beq 2f
1:
/* continue */
ret

2:
jmp foo1
3:
jmp 1b


Then in section .foo:

foo1:
/* do something */
jmp 3b

Perhaps we can't use the section attribute. We could create a new
attribute. Perhaps a __jmp_section__ or whatever (I'm horrible with
names).

Is this a possibility?

If this is possible, we can get a lot of code out of the fast path.
Things like stats and tracing, which is mostly default off. I would
imagine that we would get better performance by doing this. Especially
as tracepoints are being added all over the place.

Thanks,

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 09:55 AM, Steven Rostedt wrote:
 
 Almost a full year ago, Mathieu suggested something like:
 
 if (unlikely(x)) __attribute__((section(".unlikely"))) {
 ...
 } else __attribute__((section(".likely"))) {
 ...
 }
 
 https://lkml.org/lkml/2012/8/9/658
 
 Which got me thinking. How hard would it be to set a block in its own
 section. Like what Mathieu suggested, but it doesn't have to be
 .unlikely.
 
 if (x) __attribute__((section(".foo"))) {
   /* do something */
 }
 

One concern I have is how this kind of code would work when embedded
inside a function which already has a section attribute.  This could
easily cause really weird bugs when someone optimizes an inline or
macro and breaks a single call site...

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt rost...@goodmis.org wrote:

 Almost a full year ago, Mathieu suggested something like:

 if (unlikely(x)) __attribute__((section(".unlikely"))) {
 ...
 } else __attribute__((section(".likely"))) {
 ...
 }

It's almost certainly a horrible idea.

First off, we have very few things that are *so* unlikely that they
never get executed. Putting things in a separate section would
actually be really bad.

Secondly, you don't want a separate section anyway for any normal
kernel code, since you want short jumps if possible (pretty much every
single architecture out there has a concept of shorter jumps that are
noticeably cheaper than long ones). You want the unlikely code to be
out-of-line, but still *close*. Which is largely what gcc already does
(except if you use -Os, which disables all the basic block movement
and thus makes likely/unlikely pointless to begin with).
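
As a reminder of what the existing mechanism looks like, here is a minimal sketch using __builtin_expect, which is what the kernel's likely()/unlikely() macros expand to. The function process and the counter error_count are invented for the example; where the compiler actually places the rare block depends on the optimization level.

```c
/* The kernel defines likely()/unlikely() along these lines.  */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int error_count;   /* hypothetical counter touched only on the rare path */

int
process (int value)
{
  if (unlikely (value < 0))
    {
      /* Rare path: the compiler may move this block out of line,
         typically to the end of the function.  */
      error_count++;
      return -1;
    }
  /* Hot path: intended to be the fall-through.  */
  return value * 2;
}
```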

There are some situations where you'd want extremely unlikely code to
really be elsewhere, but they are rare as hell, and mostly in user
code where you might try to avoid demand-loading such code entirely.

So give up on sections. They are a bad idea for anything except the
things we already use them for. Sure, you can try to fix the problems
with sections with link-time optimization work and a *lot* of small
individual sections (the way per-function sections work already), but
that's basically just undoing the stupidity of using sections to begin
with.

Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 10:12 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Secondly, you don't want a separate section anyway for any normal
 kernel code, since you want short jumps if possible

Just to clarify: the short jump is important regardless of how
unlikely the code you're jumping to is, since even if you'd be jumping to
very unlikely (never executed) code, the branch to that code is
itself in the hot path.

And the difference between a two-byte short jump to the end of a short
function, and a five-byte long jump (to pick the x86 case) is quite
noticeable.
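
The two sizes come from the two encodings of the x86 unconditional jump: EB with an 8-bit displacement (2 bytes) and E9 with a 32-bit displacement (5 bytes) in 32-bit and 64-bit code. A minimal sketch of the choice, with a hypothetical helper name, taking the displacement as measured from the end of the jump instruction:

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of an x86 unconditional jmp for a
   given displacement, measured from the end of the instruction.  */
int
x86_jmp_size (int64_t rel)
{
  /* EB rel8 is 2 bytes and reaches -128..127; otherwise E9 rel32, 5 bytes.  */
  return (rel >= INT8_MIN && rel <= INT8_MAX) ? 2 : 5;
}
```

So a target at the end of a short function usually qualifies for the 2-byte form, while a target in another section generally does not.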

Other cases do long jumps by jumping to a thunk, and so the hot case
is unaffected, but at least one common architecture very much sees the
difference in the likely code.

  Linus


Re: all_ones_mask_p clarification

2013-08-05 Thread Gabriel Dos Reis
On Mon, Aug 5, 2013 at 11:44 AM, Mike Stump mikest...@comcast.net wrote:
 The intent is for all_ones_mask_p to return true for 64 bits of ones in
 an unsigned type of width 64 when size is 64, right?  Currently the code uses
 a signed type for tmask, which sets the upper bits to 1 when the value
 has the sign bit set, and the equality check compares all 128 bits of
 the value.  As a result, the current code returns false
 in this case.  The change below is the behavior change I'm talking about.

 We're fixing this in the wide-int branch, and just wanted to see if someone 
 wanted to argue this isn't a bug.

 If you want to see a small test case:

 typedef enum
 {
   DK_ERROR,
   DK_SORRY,
   DK_LAST_DIAGNOSTIC_KIND
 } diagnostic_t;

 struct diagnostic_context
 {
   int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
   diagnostic_t *classify_diagnostic;
 };

 extern diagnostic_context *global_dc;

 bool
 seen_error (void)
 {
   return (global_dc)->diagnostic_count[(int) (DK_ERROR)] ||
          (global_dc)->diagnostic_count[(int) (DK_SORRY)];
 }

These casts to int are relics of the old days when we used K&R C and
the cast was recommended.  I think that was a mistake, and we should remove
the cast and use the enumerations directly as index.
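
A sketch of the cast-free form, written as plain C with the context passed as a parameter so it stands alone. The parameter dc replaces the global_dc pointer of the test case; that change is only for the illustration.

```c
#include <stdbool.h>

typedef enum
{
  DK_ERROR,
  DK_SORRY,
  DK_LAST_DIAGNOSTIC_KIND
} diagnostic_t;

struct diagnostic_context
{
  int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
};

/* Enumerators index the array directly; no (int) cast is needed.  */
bool
seen_error (const struct diagnostic_context *dc)
{
  return dc->diagnostic_count[DK_ERROR] || dc->diagnostic_count[DK_SORRY];
}
```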

Now, your issue may still stand with other test cases.

-- Gaby


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:

  if (x) __attribute__((section(".foo"))) {
  /* do something */
  }
  
 
 One concern I have is how this kind of code would work when embedded
 inside a function which already has a section attribute.  This could
 easily cause really weird bugs when someone optimizes an inline or
 macro and breaks a single call site...

I would say that it overrides the section it is embedded in. Basically
like a .pushsection and .popsection would work.
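
For comparison, what GCC offers today works at function granularity: the section attribute places a whole function in a named section. A minimal sketch, assuming an ELF target and a GCC-compatible compiler; the section name ".text.foo" and the function names are arbitrary examples.

```c
/* The whole function goes into the named section; noinline keeps the
   body from being merged back into its caller.  */
__attribute__ ((section (".text.foo"), noinline))
int
do_something (int x)
{
  return x + 1;
}

int
caller (int x)
{
  if (x)
    return do_something (x);   /* a call/ret, not a jmp/jmp */
  return 0;
}
```

The proposal above differs in that the block would stay part of the enclosing function, so it could use the function's locals and jump back without a call.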

What bugs do you think would happen? Sure, using this in an .init section
would leave this code sitting around after boot up. I'm sure modules could
handle this properly. What other uses of the section attribute are there for
code? I'm aware of locks and sched using it, but that's more for
debugging purposes, and even there the worst thing I see is that a debug
report won't say that the code is in the section.

We do a lot of tricks with sections in the Linux kernel, so I too share
your concern. But even with that, if we audit all use cases, we may
still be able to safely do this. This is why I'm asking for comments :-)

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt rost...@goodmis.org wrote:

 First off, we have very few things that are *so* unlikely that they
 never get executed. Putting things in a separate section would
 actually be really bad.

My main concern is with tracepoints. Which on 90% (or more) of systems
running Linux, is completely off, and basically just dead code, until
someone wants to see what's happening and enables them.

 
 Secondly, you don't want a separate section anyway for any normal
 kernel code, since you want short jumps if possible (pretty much every
 single architecture out there has a concept of shorter jumps that are
 noticeably cheaper than long ones). You want the unlikely code to be
 out-of-line, but still *close*. Which is largely what gcc already does
 (except if you use -Os, which disables all the basic block movement
 and thus makes likely/unlikely pointless to begin with).
 
 There are some situations where you'd want extremely unlikely code to
 really be elsewhere, but they are rare as hell, and mostly in user
 code where you might try to avoid demand-loading such code entirely.

Well, as tracepoints are being added quite a bit in Linux, my concern is
with the inlined functions that they bring. With jump labels they are
disabled in a very unlikely way (the static_key_false() is a nop to skip
the code, and is dynamically enabled to a jump).

I did a make kernel/sched/core.i to get what we have in the current
sched_switch code:

static inline __attribute__((no_instrument_function)) void
trace_sched_switch (struct task_struct *prev, struct task_struct *next) {
    if (static_key_false(& __tracepoint_sched_switch .key)) do {
        struct tracepoint_func *it_func_ptr;
        void *it_func;
        void *__data;
        rcu_read_lock_sched_notrace();
        it_func_ptr = ({
            typeof(*((&__tracepoint_sched_switch)->funcs)) *_p1 =
                (typeof(*((&__tracepoint_sched_switch)->funcs))* )
                (*(volatile typeof(((&__tracepoint_sched_switch)->funcs)) *)
                &(((&__tracepoint_sched_switch)->funcs)));
            do {
                static bool __attribute__ ((__section__(".data.unlikely"))) __warned;
                if (debug_lockdep_rcu_enabled() && !__warned &&
                    !(rcu_read_lock_sched_held() || (0))) {
                    __warned = true;
                    lockdep_rcu_suspicious( , 153 ,
                        "suspicious rcu_dereference_check()" " usage");
                }
            } while (0);
            ((typeof(*((&__tracepoint_sched_switch)->funcs)) *)(_p1));
        });
        if (it_func_ptr) {
            do {
                it_func = (it_func_ptr)->func;
                __data = (it_func_ptr)->data;
                ((void(*)(void *__data, struct task_struct *prev,
                          struct task_struct *next))(it_func))(__data, prev, next);
            } while ((++it_func_ptr)->func);
        }
        rcu_read_unlock_sched_notrace();
    } while (0);
}

I massaged it to look more readable. This is inlined right at the
beginning of the prepare_task_switch(). Now, most of this code should be
moved to the end of the function by gcc (well, as you stated -Os may not
play nice here). And perhaps it's not that bad of an issue. That is, how
much of the icache does this actually take up? Maybe we are lucky and it
sits outside the icache of the hot path.

I still need to start running a bunch of benchmarks to see how much
overhead these tracepoints cause. Herbert Xu brought up the concern
about various latencies in the kernel, including tracing, in his ATTEND
request on the kernel-discuss mailing list.



 
 So give up on sections. They are a bad idea for anything except the
 things we already use them for. Sure, you can try to fix the problems
 with sections with link-time optimization work and a *lot* of small
 individual sections (the way per-function sections work already), but
 that's basically just undoing the stupidity of using sections to begin
 with.

OK, this was just a suggestion. Perhaps my original patch that just
moves this code into a real function where the trace_sched_switch() only
contains the jump_label and a call to another function that does all the
work when enabled, is still a better idea. That is, if benchmarks prove
that it's worth it.

Instead of the above, my patches would make the code into:

static inline __attribute__((no_instrument_function)) void
trace_sched_switch (struct task_struct *prev, struct task_struct *next)
{
    if (static_key_false(& __tracepoint_sched_switch .key))
        __trace_sched_switch(prev, next);
}

That is, when this 

Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
  The difference between this and the
 section hack I suggested, is that this would use a call/ret when
 enabled instead of a jmp/jmp.

I wonder if this is what Kris Kross meant in their song?

/me goes back to work...

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 10:55 AM, Steven Rostedt wrote:
 
 Well, as tracepoints are being added quite a bit in Linux, my concern is
 with the inlined functions that they bring. With jump labels they are
 disabled in a very unlikely way (the static_key_false() is a nop to skip
 the code, and is dynamically enabled to a jump).
 

Have you considered using traps for tracepoints?  A trapping instruction
can be as small as a single byte.  The downside, of course, is that it
is extremely suppressed -- the trap is always expensive -- and you then
have to do a lookup to find the target based on the originating IP.

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 10:55 AM, Steven Rostedt rost...@goodmis.org wrote:

 My main concern is with tracepoints. Which on 90% (or more) of systems
 running Linux, is completely off, and basically just dead code, until
 someone wants to see what's happening and enables them.

The static_key_false() approach with minimal inlining sounds like a
much better approach overall. Sure, it might add a call/ret, but it
adds it to just the unlikely tracepoint taken path.
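
A user-space sketch of that shape, assuming a plain flag in place of the static key (a real static key patches a nop into a jump and has no load or compare). All names here are invented for the illustration; cold and noinline are existing GCC function attributes.

```c
#include <stdbool.h>

static bool tracing_enabled;   /* stand-in for the static key */
static int  events_recorded;   /* lets a caller observe that the slow path ran */

/* Out-of-line slow path: argument setup and the real work live here.  */
__attribute__ ((noinline, cold))
static void
slow_trace (int prev, int next)
{
  (void) prev;
  (void) next;
  events_recorded++;
}

/* Inlined fast path: only the test and a call remain at each call site.  */
static inline void
trace_switch (int prev, int next)
{
  if (__builtin_expect (tracing_enabled, 0))
    slow_trace (prev, next);
}

int
context_switch (int prev, int next)
{
  trace_switch (prev, next);
  return next;
}
```

The call/ret cost is paid only when tracing is on; the open point in the thread is how much of the argument setup still ends up in the hot function.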

Of course, it would be good to optimize static_key_false() itself -
right now those static key jumps are always five bytes, and while they
get nopped out, it would still be nice if there was some way to have
just a two-byte nop (turning into a short branch) *if* we can reach
another jump that way.. For small functions that would be lovely. Oh
well.

Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
 On 08/05/2013 10:55 AM, Steven Rostedt wrote:
  
  Well, as tracepoints are being added quite a bit in Linux, my concern is
  with the inlined functions that they bring. With jump labels they are
  disabled in a very unlikely way (the static_key_false() is a nop to skip
  the code, and is dynamically enabled to a jump).
  
 
 Have you considered using traps for tracepoints?  A trapping instruction
 can be as small as a single byte.  The downside, of course, is that it
 is extremely suppressed -- the trap is always expensive -- and you then
 have to do a lookup to find the target based on the originating IP.

No, never considered it, nor would I. Those that use tracepoints, do use
them extensively, and adding traps like this would probably cause
heisenbugs and make tracepoints useless.

Not to mention, how would we add a tracepoint to a trap handler?

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 11:20 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 The static_key_false() approach with minimal inlining sounds like a
 much better approach overall.

Sorry, I misunderstood your thing. That's actually what you want that
section thing for, because right now you cannot generate the argument
expansion otherwise.

Ugh. I can see the attraction of your section thing for that case, I
just get the feeling that we should be able to do better somehow.

  Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 11:23 AM, Steven Rostedt wrote:
 On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
 On 08/05/2013 10:55 AM, Steven Rostedt wrote:

 Well, as tracepoints are being added quite a bit in Linux, my concern is
 with the inlined functions that they bring. With jump labels they are
 disabled in a very unlikely way (the static_key_false() is a nop to skip
 the code, and is dynamically enabled to a jump).


 Have you considered using traps for tracepoints?  A trapping instruction
 can be as small as a single byte.  The downside, of course, is that it
 is extremely suppressed -- the trap is always expensive -- and you then
 have to do a lookup to find the target based on the originating IP.
 
 No, never considered it, nor would I. Those that use tracepoints, do use
 them extensively, and adding traps like this would probably cause
 heisenbugs and make tracepoints useless.
 
 Not to mention, how would we add a tracepoint to a trap handler?
 

Traps nest, that's why there is a stack.  (OK, so you don't want to take
the same trap inside the trap handler, but that code should be very
limited.)  The trap instruction just becomes very short, but rather
slow, call-return.

However, when you consider the cost you have to consider that the
tracepoint is doing other work, so it may very well amortize out.

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 11:20 AM, Linus Torvalds wrote:
 
 Of course, it would be good to optimize static_key_false() itself -
 right now those static key jumps are always five bytes, and while they
 get nopped out, it would still be nice if there was some way to have
 just a two-byte nop (turning into a short branch) *if* we can reach
 another jump that way.. For small functions that would be lovely. Oh
 well.
 

That would definitely require gcc support.  It would be useful, but
probably requires a lot of machinery.

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Ugh. I can see the attraction of your section thing for that case, I
 just get the feeling that we should be able to do better somehow.

Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associated with a section. Which sounds like
it might be a cleaner syntax than making it about the basic block
anyway.

 Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 11:34 AM, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
 torva...@linux-foundation.org wrote:

 Ugh. I can see the attraction of your section thing for that case, I
 just get the feeling that we should be able to do better somehow.
 
 Hmm.. Quite frankly, Steven, for your use case I think you actually
 want the C goto *labels* associated with a section. Which sounds like
 it might be a cleaner syntax than making it about the basic block
 anyway.
 

A label wouldn't have an endpoint, though...

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:

 Of course, it would be good to optimize static_key_false() itself -
 right now those static key jumps are always five bytes, and while they
 get nopped out, it would still be nice if there was some way to have
 just a two-byte nop (turning into a short branch) *if* we can reach
 another jump that way.. For small functions that would be lovely. Oh
 well.

I had patches that did exactly this:

 https://lkml.org/lkml/2012/3/8/461

But it got dropped for some reason. I don't remember why. Maybe because
of the complexity?

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:

 Traps nest, that's why there is a stack.  (OK, so you don't want to take
 the same trap inside the trap handler, but that code should be very
 limited.)  The trap instruction just becomes very short, but rather
 slow, call-return.
 
 However, when you consider the cost you have to consider that the
 tracepoint is doing other work, so it may very well amortize out.

Also, how would you pass the parameters? Every tracepoint has its own
parameters to pass to it. How would a trap know where to get prev
and next?

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt rost...@goodmis.org wrote:

 I had patches that did exactly this:

  https://lkml.org/lkml/2012/3/8/461

 But it got dropped for some reason. I don't remember why. Maybe because
 of the complexity?

Ugh. Why the crazy update_jump_label script stuff? I'd go "Eww" at
that too, it looks crazy. The assembler already knows to make short
2-byte jmp instructions for near jumps, and you can just look at the
opcode itself to determine size, why is all that other stuff required?

IOW, 5/7 looks sane, but 4/7 makes me go "there's something wrong with
that series".
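
The remark about looking at the opcode can be sketched as follows, assuming 32-bit or 64-bit x86 code and only the two unconditional jump encodings. The helper name is hypothetical and this is not taken from the patch series under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of an x86 jmp, decided from its first
   opcode byte.  Returns 0 for anything this sketch does not recognise.  */
size_t
x86_jmp_length (const uint8_t *insn)
{
  switch (insn[0])
    {
    case 0xEB: return 2;   /* jmp rel8  */
    case 0xE9: return 5;   /* jmp rel32 */
    default:   return 0;
    }
}
```

A rewriter that knows the length this way could pick a nop of matching size when disabling the site, without a separate build-time table of jump sizes.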

 Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 11:49 AM, Steven Rostedt wrote:
 On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
 
 Traps nest, that's why there is a stack.  (OK, so you don't want to take
 the same trap inside the trap handler, but that code should be very
 limited.)  The trap instruction just becomes very short, but rather
 slow, call-return.

 However, when you consider the cost you have to consider that the
 tracepoint is doing other work, so it may very well amortize out.
 
 Also, how would you pass the parameters? Every tracepoint has its own
 parameters to pass to it. How would a trap know where to get prev
 and next?
 

How do you do that now?

You have to do an IP lookup to find out what you are doing.

(Note: I wonder how much the parameter generation costs the tracepoints.)

-hpa




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 11:51 AM, H. Peter Anvin h...@linux.intel.com wrote:

 Also, how would you pass the parameters? Every tracepoint has its own
 parameters to pass to it. How would a trap know where to get prev
 and next?

 How do you do that now?

 You have to do an IP lookup to find out what you are doing.

No, he just generates the code for the call and then uses a static_key
to jump to it. So normally it's all out-of-line, and the only thing in
the hot-path is that 5-byte nop (which gets turned into a 5-byte jump
when the tracing key is enabled)

Works fine, but the normally unused stubs end up mixing in the normal
code segment. Which I actually think is fine, but right now we don't
get the short-jump advantage from it (and there is likely some I$
disadvantage from just fragmentation of the code).

With two-byte jumps, you'd still get the I$ fragmentation (the
argument generation and the call and the branch back would all be in
the same code segment as the hot code), but that would be offset by
the fact that at least the hot code itself could use a short jump when
possible (ie a 2-byte nop rather than a 5-byte one).

Don't know which way it would go performance-wise. But it shouldn't
need gcc changes, it just needs the static key branch/nop rewriting to
be able to handle both sizes. I couldn't tell why Steven's series to
do that was so complex, though - I only glanced through the patches.

Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  Ugh. I can see the attraction of your section thing for that case, I
  just get the feeling that we should be able to do better somehow.
 
 Hmm.. Quite frankly, Steven, for your use case I think you actually
 want the C goto *labels* associated with a section. Which sounds like
 it might be a cleaner syntax than making it about the basic block
 anyway.

I would love to. But IIRC, the asm_goto() has some strict constraints.
We may be able to jump to a different section, but we have no way of
coming back. Not to mention, you must tell the asm goto() what label you
may be jumping to.

I don't know how safe something like this may be:


static inline trace_sched_switch(prev, next)
{
    asm goto("jmp foo1\n" : : foo2);
 foo1:
    return;

    asm goto(".pushsection\n"
             "section \".foo\"\n");
 foo2:
    __trace_sched_switch(prev, next);
    asm goto("jmp foo1"
             ".popsection\n" : : foo1);
}


The above looks too fragile for my taste. I'm afraid gcc will move stuff
out of those asm goto locations, and make things just fail. I can
play with this, but I don't like it.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Andi Kleen
Steven Rostedt rost...@goodmis.org writes:

Can't you just use -freorder-blocks-and-partition?

This should already partition unlikely blocks into a
different section. Just a single one of course.

FWIW the disadvantage is that multiple code sections tends
to break various older dwarf unwinders, as it needs
dwarf3 latest'n'greatest.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote:
 On 08/05/2013 11:49 AM, Steven Rostedt wrote:
  On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
  
  Traps nest, that's why there is a stack.  (OK, so you don't want to take
  the same trap inside the trap handler, but that code should be very
  limited.)  The trap instruction just becomes very short, but rather
  slow, call-return.
 
  However, when you consider the cost you have to consider that the
  tracepoint is doing other work, so it may very well amortize out.
  
  Also, how would you pass the parameters? Every tracepoint has its own
  parameters to pass to it. How would a trap know where to get prev
  and next?
  
 
 How do you do that now?
 
 You have to do an IP lookup to find out what you are doing.

??

You mean to do the enabling? Sure, but not after the code is enabled.
There's no lookup. It just calls functions directly.

 
 (Note: I wonder how much the parameter generation costs the tracepoints.)

The same as doing a function call.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
 Steven Rostedt rost...@goodmis.org writes:
 
 Can't you just use -freorder-blocks-and-partition?

Yeah, I'm familiar with this option.

 
 This should already partition unlikely blocks into a
 different section. Just a single one of course.
 
 FWIW the disadvantage is that multiple code sections tends
 to break various older dwarf unwinders, as it needs
 dwarf3 latest'n'greatest.

If the option was so good, I would expect everyone would be using it ;-)


I'm mainly only concerned with the tracepoints. I'm asking to be able to
do this with just the tracepoint code, and affect nobody else.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 12:04 PM, Andi Kleen a...@firstfloor.org wrote:
 Steven Rostedt rost...@goodmis.org writes:

 Can't you just use -freorder-blocks-and-partition?

 This should already partition unlikely blocks into a
 different section. Just a single one of course.

That's horrible. Not because of dwarf problems, but exactly because
unlikely code isn't necessarily *that* unlikely, and normal unlikely
code is reached with a small branch. Making it a whole different
section breaks both of those.

Maybe some really_unlikely() would make it ok.

  Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Xinliang David Li
On Mon, Aug 5, 2013 at 12:16 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
 Steven Rostedt rost...@goodmis.org writes:

 Can't you just use -freorder-blocks-and-partition?

 Yeah, I'm familiar with this option.


This option works best with FDO.   FDOed linux kernel rocks :)


 This should already partition unlikely blocks into a
 different section. Just a single one of course.

 FWIW the disadvantage is that multiple code sections tends
 to break various older dwarf unwinders, as it needs
 dwarf3 latest'n'greatest.

 If the option was so good, I would expect everyone would be using it ;-)


There were lots of problems with this option -- recently cleaned
up/fixed by Teresa in GCC trunk.

thanks,

David


 I'm mainly only concerned with the tracepoints. I'm asking to be able to
 do this with just the tracepoint code, and affect nobody else.

 -- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt rost...@goodmis.org wrote:
 
  I had patches that did exactly this:
 
   https://lkml.org/lkml/2012/3/8/461
 
  But it got dropped for some reason. I don't remember why. Maybe because
  of the complexity?
 
 Ugh. Why the crazy update_jump_label script stuff? I'd go Eww at
 that too, it looks crazy. The assembler already knows to make short
 2-byte jmp instructions for near jumps, and you can just look at the
 opcode itself to determine size, why is all that other stuff required?

Hmm, I probably added that optimization in there because I was doing a
bunch of jump label work and just included it in. It's been over a year
since I've worked on this so I don't remember all the details. That
update_jump_label program may have just been to do the conversion of
nops at compile time and not during boot. It may not be needed. Also, it
was based on the record-mcount code that the function tracer uses, which
is also done at compile time, to get all the mcount locations.

 
 IOW, 5/7 looks sane, but 4/7 makes me go there's something wrong with
 that series.

I just quickly looked at the changes again. I think I can redo them and
send them again for 3.12. What do you think about keeping all but patch
4?

1 - Use a default nop at boot. I had help from hpa on this. Currently,
    jump labels use a jmp instead of a nop on boot.

2 - On boot, the jump label code looks at the best run time nop and
    converts the nops (jumps before patch 1) to it. Since it is likely
    that the current nop is already ideal, skip the conversion. Again,
    this is just a boot up optimization.

3 - Add a test to see what we are converting. Adds safety checks like
    there are in the function tracer, where if it updates a location and
    does not find what it expects to find, it outputs a nasty bug.

    (will skip patch 4)

5 - Does what you want, with the 2 and 5 byte nops.

6 - When/if a failure does trigger, print out information about what
    went wrong. Helps debugging splats caused by patch 3.

7 - Needs to go before patch 3, as patch 3 can trigger if the default
    nop is not the ideal nop for the box that is running. Reported by
    Ingo.


If I take out patch 4, would that solution look fine for you? I can get
this ready for 3.12.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Marek Polacek
On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  Ugh. I can see the attraction of your section thing for that case, I
  just get the feeling that we should be able to do better somehow.
 
 Hmm.. Quite frankly, Steven, for your use case I think you actually
 want the C goto *labels* associated with a section. Which sounds like
 it might be a cleaner syntax than making it about the basic block
 anyway.

FWIW, we also support hot/cold attributes for labels, thus e.g.

  if (bar ()) 
goto A;
  /* ... */
A: __attribute__((cold))
  /* ... */

I don't know whether that might be useful for what you want or not though...

Marek
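
For reference, a self-contained sketch of the label attribute mentioned above. The function checked_div and the COLD_LABEL macro are made up for illustration; the macro compiles the attribute away on compilers that are not GCC. Whether the cold block actually ends up out of line depends on the optimization level and flags in use, so treat this as a syntax example, not a layout guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Label attributes are a GNU extension; compile them away elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define COLD_LABEL __attribute__((cold))
#else
#define COLD_LABEL
#endif

/*
 * Hypothetical example: divide num by den, storing the result in *out.
 * Returns 0 on the common path and -1 on the rarely taken error path.
 */
static int checked_div(int num, int den, int *out)
{
	assert(out != NULL);

	if (den == 0)
		goto fail;

	*out = num / den;	/* hot path */
	return 0;

fail: COLD_LABEL;		/* the semicolon terminates the label attribute */
	*out = 0;
	return -1;
}
```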


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Mathieu Desnoyers
* Linus Torvalds (torva...@linux-foundation.org) wrote:
[...]
 With two-byte jumps, you'd still get the I$ fragmentation (the
 argument generation and the call and the branch back would all be in
 the same code segment as the hot code), but that would be offset by
 the fact that at least the hot code itself could use a short jump when
 possible (ie a 2-byte nop rather than a 5-byte one).

I remember that choosing between 2 and 5 bytes nop in the asm goto was
tricky: it had something to do with the fact that gcc doesn't know the
exact size of each instruction until further down within compilation
phases on architectures with variable instruction size like x86. If we
have guarantees that the guessed size of each instruction is an upper
bound on the instruction size, this could probably work though.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 12:40 PM, Marek Polacek pola...@redhat.com wrote:

 FWIW, we also support hot/cold attributes for labels, thus e.g.

   if (bar ())
 goto A;
   /* ... */
 A: __attribute__((cold))
   /* ... */

 I don't know whether that might be useful for what you want or not though...

Steve? That does sound like it might at least re-order the basic
blocks better for your cases. Worth checking out, no?

That said, I don't know what gcc actually does for that case. It may
be that it just ends up trying to transfer that cold information to
the conditional itself, which wouldn't work for our asm goto use. I
hope/assume it doesn't do that, though, since the cold attribute
would presumably also be useful for things like computed gotos etc -
so it really isn't about the _source_ of the branch, but about that
specific target, and the basic block re-ordering.

Anyway, the exact implementation details may make it more or less
useful for our special static key things. But it does sound like the
right thing to do for static keys.

Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Jason Baron

On 08/05/2013 03:40 PM, Marek Polacek wrote:

On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:

On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

Ugh. I can see the attraction of your section thing for that case, I
just get the feeling that we should be able to do better somehow.

Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associated with a section. Which sounds like
it might be a cleaner syntax than making it about the basic block
anyway.

FWIW, we also support hot/cold attributes for labels, thus e.g.

   if (bar ())
 goto A;
   /* ... */
A: __attribute__((cold))
   /* ... */

I don't know whether that might be useful for what you want or not though...

Marek



It certainly would be.

That was how I wanted the 'static_key' stuff to work, but 
unfortunately the last time I tried it, it didn't move the text 
out-of-line any further than it was already doing. Would that be 
expected? The change for us, if it worked would be quite simple. 
Something like:


--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -21,7 +21,7 @@ static __always_inline bool arch_static_branch(struct static_key *key)
 		".popsection \n\t"
 		: :  "i" (key) : : l_yes);
 	return false;
-l_yes:
+l_yes: __attribute__((cold))
 	return true;
 }






Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Linus Torvalds
On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
mathieu.desnoy...@efficios.com wrote:

 I remember that choosing between 2 and 5 bytes nop in the asm goto was
 tricky: it had something to do with the fact that gcc doesn't know the
 exact size of each instruction until further down within compilation

Oh, you can't do it in the compiler, no. But you don't need to. The
assembler will pick the right version if you just do jmp target.

 Linus


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
 mathieu.desnoy...@efficios.com wrote:
 
  I remember that choosing between 2 and 5 bytes nop in the asm goto was
  tricky: it had something to do with the fact that gcc doesn't know the
  exact size of each instruction until further down within compilation
 
 Oh, you can't do it in the compiler, no. But you don't need to. The
 assembler will pick the right version if you just do jmp target.

Right, and that's exactly what my patches did.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Jason Baron

On 08/05/2013 02:39 PM, Steven Rostedt wrote:

On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:


Of course, it would be good to optimize static_key_false() itself -
right now those static key jumps are always five bytes, and while they
get nopped out, it would still be nice if there was some way to have
just a two-byte nop (turning into a short branch) *if* we can reach
another jump that way..For small functions that would be lovely. Oh
well.

I had patches that did exactly this:

  https://lkml.org/lkml/2012/3/8/461

But it got dropped for some reason. I don't remember why. Maybe because
of the complexity?

-- Steve


Hi Steve,

I recall testing your patches and the text size increased unexpectedly. 
I believe I correctly accounted for changes to the text size *outside* 
of branch points. If you do re-visit the series that is one thing I'd 
like to double check/understand.


Thanks,

-Jason


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Richard Henderson
On 08/05/2013 09:57 AM, Jason Baron wrote:
 On 08/05/2013 03:40 PM, Marek Polacek wrote:
 On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
 On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 Ugh. I can see the attraction of your section thing for that case, I
 just get the feeling that we should be able to do better somehow.
 Hmm.. Quite frankly, Steven, for your use case I think you actually
 want the C goto *labels* associated with a section. Which sounds like
 it might be a cleaner syntax than making it about the basic block
 anyway.
 FWIW, we also support hot/cold attributes for labels, thus e.g.

if (bar ())
  goto A;
/* ... */
 A: __attribute__((cold))
/* ... */

 I don't know whether that might be useful for what you want or not though...

 Marek

 
 It certainly would be.
 
 That was how I wanted the 'static_key' stuff to work, but unfortunately the
 last time I tried it, it didn't move the text out-of-line any further than it
 was already doing. Would that be expected? The change for us, if it worked
 would be quite simple. Something like:

It is expected.  One must use -freorder-blocks-and-partition, and use real
profile feedback to get blocks moved completely out-of-line.

Whether that's a sensible default or not is debatable.


r~


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Mathieu Desnoyers
* Linus Torvalds (torva...@linux-foundation.org) wrote:
 On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
 mathieu.desnoy...@efficios.com wrote:
 
  I remember that choosing between 2 and 5 bytes nop in the asm goto was
  tricky: it had something to do with the fact that gcc doesn't know the
  exact size of each instruction until further down within compilation
 
 Oh, you can't do it in the compiler, no. But you don't need to. The
 assembler will pick the right version if you just do jmp target.

Yep.

Another thing that bothers me with Steven's approach is that decoding
jumps generated by the compiler seems fragile IMHO.

x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 :

+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is 2 byte or 5. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section (bad op %x)\n", *op);
+		__builtin_unreachable();
+	}

My thought is that the code above does not cover all jump encodings that
can be generated by past, current and future x86 assemblers.

Another way around this issue might be to keep the instruction size
within a non-allocated section:

static __always_inline bool arch_static_branch(struct static_key *key)
{
	asm goto("1:"
		"jmp %l[l_yes]\n\t"
		"2:"

		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
		".popsection \n\t"

		".pushsection __jump_table_ilen \n\t"
		_ASM_PTR "1b \n\t"	/* Address of the jmp */
		".byte 2b - 1b \n\t"	/* Size of the jmp instruction */
		".popsection \n\t"

		: :  "i" (key) : : l_yes);
	return false;
l_yes:
	return true;
}

And use (2b - 1b) to know what size of no-op should be used rather than
to rely on instruction decoding.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
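
The consumer side of such a length table could look roughly like the sketch below. It is a userspace model under stated assumptions: struct jump_ilen and nop_out_jumps are hypothetical names, the entry layout only mirrors the (address, size) pair emitted by the asm above, and the NOP bytes are the standard x86 2-byte and 5-byte encodings rather than whatever the kernel would select at boot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of one __jump_table_ilen entry: where the jmp
 * lives and how many bytes the assembler emitted for it (2b - 1b).
 */
struct jump_ilen {
	unsigned char *addr;
	unsigned char len;
};

static const unsigned char nop2[2] = { 0x66, 0x90 };
static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Overwrite every recorded jmp with a NOP of the recorded size.
 * Returns the number of entries patched; entries with an unexpected
 * length are skipped rather than guessed at.
 */
static size_t nop_out_jumps(const struct jump_ilen *tab, size_t n)
{
	size_t i, patched = 0;

	assert(tab != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (tab[i].len) {
		case 2:
			memcpy(tab[i].addr, nop2, sizeof(nop2));
			patched++;
			break;
		case 5:
			memcpy(tab[i].addr, nop5, sizeof(nop5));
			patched++;
			break;
		default:
			break;
		}
	}
	return patched;
}
```

Here the size comes from the table and the instruction bytes are never inspected, which is the trade-off being proposed: no decoding, at the cost of extra data per jump site.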


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
 * Linus Torvalds (torva...@linux-foundation.org) wrote:
 On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
 mathieu.desnoy...@efficios.com wrote:

 I remember that choosing between 2 and 5 bytes nop in the asm goto was
 tricky: it had something to do with the fact that gcc doesn't know the
 exact size of each instruction until further down within compilation

 Oh, you can't do it in the compiler, no. But you don't need to. The
 assembler will pick the right version if you just do jmp target.
 
 Yep.
 
 Another thing that bothers me with Steven's approach is that decoding
 jumps generated by the compiler seems fragile IMHO.
 
 x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 :
 
 +static int make_nop_x86(void *map, size_t const offset)
 +{
 + unsigned char *op;
 + unsigned char *nop;
 + int size;
 +
 + /* Determine which type of jmp this is 2 byte or 5. */
 + op = map + offset;
 + switch (*op) {
 + case 0xeb: /* 2 byte */
 + size = 2;
 + nop = ideal_nop2_x86;
 + break;
 + case 0xe9: /* 5 byte */
 + size = 5;
 + nop = ideal_nop;
 + break;
 + default:
 + die(NULL, "Bad jump label section (bad op %x)\n", *op);
 + __builtin_unreachable();
 + }
 
 My thought is that the code above does not cover all jump encodings that
 can be generated by past, current and future x86 assemblers.
 

For unconditional jmp that should be pretty safe barring any fundamental
changes to the instruction set, in which case we can enable it as
needed, but for extra robustness it probably should skip prefix bytes.

-hpa



Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:

 Another thing that bothers me with Steven's approach is that decoding
 jumps generated by the compiler seems fragile IMHO.

The encodings won't change. If they do, then old kernels will not run on
new hardware.

Now if it adds a third option to jmp, then we hit the die path and
know right away that it won't work anymore. Then we fix it properly.

 
 x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 :
 
 +static int make_nop_x86(void *map, size_t const offset)
 +{
 + unsigned char *op;
 + unsigned char *nop;
 + int size;
 +
 + /* Determine which type of jmp this is 2 byte or 5. */
 + op = map + offset;
 + switch (*op) {
 + case 0xeb: /* 2 byte */
 + size = 2;
 + nop = ideal_nop2_x86;
 + break;
 + case 0xe9: /* 5 byte */
 + size = 5;
 + nop = ideal_nop;
 + break;
 + default:
 + die(NULL, "Bad jump label section (bad op %x)\n", *op);
 + __builtin_unreachable();
 + }
 
 My thought is that the code above does not cover all jump encodings that
 can be generated by past, current and future x86 assemblers.
 
 Another way around this issue might be to keep the instruction size
 within a non-allocated section:
 
 static __always_inline bool arch_static_branch(struct static_key *key)
 {
 	asm goto("1:"
 		"jmp %l[l_yes]\n\t"
 		"2:"
 
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
 		".popsection \n\t"
 
 		".pushsection __jump_table_ilen \n\t"
 		_ASM_PTR "1b \n\t"	/* Address of the jmp */
 		".byte 2b - 1b \n\t"	/* Size of the jmp instruction */
 		".popsection \n\t"
 
 		: :  "i" (key) : : l_yes);
 	return false;
 l_yes:
 	return true;
 }
 
 And use (2b - 1b) to know what size of no-op should be used rather than
 to rely on instruction decoding.
 
 Thoughts ?
 

Then we need to add yet another table of information to the kernel that
needs to hang around. This goes with another kernel-discuss request
talking about kernel data bloat.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Mathieu Desnoyers
* Steven Rostedt (rost...@goodmis.org) wrote:
 On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
 
[...]
  My thought is that the code above does not cover all jump encodings that
  can be generated by past, current and future x86 assemblers.
  
  Another way around this issue might be to keep the instruction size
  within a non-allocated section:
  
  static __always_inline bool arch_static_branch(struct static_key *key)
  {
  	asm goto("1:"
  		"jmp %l[l_yes]\n\t"
  		"2:"
  
  		".pushsection __jump_table,  \"aw\" \n\t"
  		_ASM_ALIGN "\n\t"
  		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
  		".popsection \n\t"
  
  		".pushsection __jump_table_ilen \n\t"
  		_ASM_PTR "1b \n\t"	/* Address of the jmp */
  		".byte 2b - 1b \n\t"	/* Size of the jmp instruction */
  		".popsection \n\t"
  
  		: :  "i" (key) : : l_yes);
  	return false;
  l_yes:
  	return true;
  }
  
  And use (2b - 1b) to know what size of no-op should be used rather than
  to rely on instruction decoding.
  
  Thoughts ?
  
 
 Then we need to add yet another table of information to the kernel that
 needs to hang around. This goes with another kernel-discuss request
 talking about kernel data bloat.

Perhaps this section could be simply removed by the post-link stage ?

Thanks,

Mathieu

 
 -- Steve
 
 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Jason Baron

On 08/05/2013 04:35 PM, Richard Henderson wrote:

On 08/05/2013 09:57 AM, Jason Baron wrote:

On 08/05/2013 03:40 PM, Marek Polacek wrote:

On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:

On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

Ugh. I can see the attraction of your section thing for that case, I
just get the feeling that we should be able to do better somehow.

Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associated with a section. Which sounds like
it might be a cleaner syntax than making it about the basic block
anyway.

FWIW, we also support hot/cold attributes for labels, thus e.g.

if (bar ())
  goto A;
/* ... */
A: __attribute__((cold))
/* ... */

I don't know whether that might be useful for what you want or not though...

 Marek


It certainly would be.

That was how I wanted the 'static_key' stuff to work, but unfortunately the
last time I tried it, it didn't move the text out-of-line any further than it
was already doing. Would that be expected? The change for us, if it worked
would be quite simple. Something like:

It is expected.  One must use -freorder-blocks-and-partition, and use real
profile feedback to get blocks moved completely out-of-line.

Whether that's a sensible default or not is debatable.



Hi Steve,

I think if the 'cold' attribute on the default disabled static_key 
branch moved the text completely out-of-line, it would satisfy your 
requirement here?


If you like this approach, perhaps we can make something like this work 
within gcc, as it's already supported but doesn't quite go far enough 
for our purposes.


Also, if we go down this path, it means the 2-byte jump sequence is 
probably not going to be too useful.


Thanks,

-Jason






Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:

 I think if the 'cold' attribute on the default disabled static_key 
 branch moved the text completely out-of-line, it would satisfy your 
 requirement here?
 
 If you like this approach, perhaps we can make something like this work 
 within gcc, as it's already supported but doesn't quite go far enough 
 for our purposes.

It may not be too bad to use.

 
 Also, if we go down this path, it means the 2-byte jump sequence is 
 probably not going to be too useful.

Don't count us out yet :-)


static inline bool arch_static_branch(struct static_key *key)
{
	asm goto("1:"
		[...]
		: : "i" (key) : : l_yes);
	return false;
l_yes:
	goto __l_yes;
__l_yes: __attribute__((cold));
	return true;
}

Or put that logic in the caller of arch_static_branch(). Basically, we
may be able to do a short jump to the place that will do a long jump to
the real work.

I'll have to play with this and see what gcc does with the output.

-- Steve




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Mathieu Desnoyers
* H. Peter Anvin (h...@linux.intel.com) wrote:
 On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
  * Linus Torvalds (torva...@linux-foundation.org) wrote:
  On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
  mathieu.desnoy...@efficios.com wrote:
 
  I remember that choosing between 2 and 5 bytes nop in the asm goto was
  tricky: it had something to do with the fact that gcc doesn't know the
  exact size of each instruction until further down within compilation
 
  Oh, you can't do it in the compiler, no. But you don't need to. The
  assembler will pick the right version if you just do jmp target.
  
  Yep.
  
  Another thing that bothers me with Steven's approach is that decoding
  jumps generated by the compiler seems fragile IMHO.
  
  x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 :
  
  +static int make_nop_x86(void *map, size_t const offset)
  +{
  +   unsigned char *op;
  +   unsigned char *nop;
  +   int size;
  +
  +   /* Determine which type of jmp this is 2 byte or 5. */
  +   op = map + offset;
  +   switch (*op) {
  +   case 0xeb: /* 2 byte */
  +   size = 2;
  +   nop = ideal_nop2_x86;
  +   break;
  +   case 0xe9: /* 5 byte */
  +   size = 5;
  +   nop = ideal_nop;
  +   break;
  +   default:
  +   die(NULL, "Bad jump label section (bad op %x)\n", *op);
  +   __builtin_unreachable();
  +   }
  
  My thought is that the code above does not cover all jump encodings that
  can be generated by past, current and future x86 assemblers.
  
 
 For unconditional jmp that should be pretty safe barring any fundamental
 changes to the instruction set, in which case we can enable it as
 needed, but for extra robustness it probably should skip prefix bytes.

On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix
is used for:

E9 cw   jmp rel16   relative jump, only in 32-bit

Other prefixes can probably be safely skipped.

Another question is whether anything prevents the assembler from
generating a jump near (absolute indirect), or far jump. The code above
seems to assume that we have either a short or near relative jump.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
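
A rough sketch of what a prefix-aware length decoder for these jumps might look like, assuming 32-bit mode as in the rel16 case above. The function name rel_jmp_len32 is hypothetical, and the set of skipped prefixes (0x2e and 0x3e as hint bytes, 0xf2 as used by MPX) is a simplifying assumption, not a complete x86 prefix list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the total length in bytes of the relative jmp at insn, decoded
 * as in 32-bit mode, or 0 if the bytes are not a recognized relative jmp
 * or do not fit in the avail bytes provided.
 * Simplified sketch: only the prefixes listed here are understood.
 */
static size_t rel_jmp_len32(const unsigned char *insn, size_t avail)
{
	size_t i = 0, len;
	int opsize16 = 0;

	assert(insn != NULL);

	/* Consume prefix bytes, remembering the operand-size override. */
	while (i < avail) {
		unsigned char b = insn[i];

		if (b == 0x66)
			opsize16 = 1;
		else if (b != 0x2e && b != 0x3e && b != 0xf2)
			break;
		i++;
	}
	if (i == avail)
		return 0;

	switch (insn[i]) {
	case 0xeb:	/* jmp rel8 */
		len = i + 2;
		break;
	case 0xe9:	/* jmp rel16 with 0x66 prefix, otherwise jmp rel32 */
		len = i + 1 + (opsize16 ? 2 : 4);
		break;
	default:	/* indirect, far, or anything else: not handled */
		return 0;
	}
	return len <= avail ? len : 0;
}
```

Anything that is not a short or near relative jmp is rejected, which matches the "die on an unexpected opcode" behaviour of the patch quoted earlier.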


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 09:14 PM, Mathieu Desnoyers wrote:

 For unconditional jmp that should be pretty safe barring any fundamental
 changes to the instruction set, in which case we can enable it as
 needed, but for extra robustness it probably should skip prefix bytes.
 
 On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix
 is used for:
 
 E9 cw   jmp rel16   relative jump, only in 32-bit
 
 Other prefixes can probably be safely skipped.
 

Yes.  Some of them are used as hints or for MPX.

 Another question is whether anything prevents the assembler from
 generating a jump near (absolute indirect), or far jump. The code above
 seems to assume that we have either a short or near relative jump.

Absolutely something prevents!  It would be a very serious error for the
assembler to generate such instructions.

-hpa






c++/linker problems maybe?

2013-08-05 Thread George R Goffe
Hi,

I'm having trouble building or linking C++ code. Could one of you brains take a 
peek at the enclosed and let me know where I'm goofing please?

Regards, and thanks,

George...

gcc --version
gcc (GCC) 4.9.0 20130805 (experimental)



[ 88%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bthemedlabel.cpp.o
[ 88%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3blsofwrapper.cpp.o
[ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3blsofwrapperdialog.cpp.o
[ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3baction.cpp.o
[ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bdevicemenu.cpp.o
[ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bviewcolumnadjuster.cpp.o
[ 90%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bmodelutils.cpp.o
Linking CXX executable k3b
CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::~MainWindow()':
/tools/k3b/k3b/src/k3b.cpp:272: undefined reference to `KXmlGuiWindow::~KXmlGuiWindow(void const**)'
CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::MainWindow()':
/tools/k3b/k3b/src/k3b.cpp:227: undefined reference to `KXmlGuiWindow::KXmlGuiWindow(void const**, QWidget*, QFlags<Qt::WindowType>)'
CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::MainWindow()':
/tools/k3b/k3b/src/k3b.cpp:227: undefined reference to `KXmlGuiWindow::KXmlGuiWindow(void const**, QWidget*, QFlags<Qt::WindowType>)'
CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::~MainWindow()':
/tools/k3b/k3b/src/k3b.cpp:272: undefined reference to `KXmlGuiWindow::~KXmlGuiWindow(void const**)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/k3b] Error 1
make[1]: *** [src/CMakeFiles/k3b_bin.dir/all] Error 2
make: *** [all] Error 2


Re: c++/linker problems maybe?

2013-08-05 Thread Marek Polacek
On Mon, Aug 05, 2013 at 10:05:22PM -0700, George R Goffe wrote:
 Hi,
 
 I'm having trouble building or linking C++ code. Could one of you brains take 
 a peek at the enclosed and let me know where I'm goofing please?

This question is not appropriate for the mailing list gcc@gcc.gnu.org,
which is for the development of GCC.  It would be appropriate for
gcc-h...@gcc.gnu.org.  Please take any followups to gcc-help.  Thanks.

 [ 88%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bthemedlabel.cpp.o
 [ 88%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3blsofwrapper.cpp.o
 [ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3blsofwrapperdialog.cpp.o
 [ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3baction.cpp.o
 [ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bdevicemenu.cpp.o
 [ 89%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bviewcolumnadjuster.cpp.o
 [ 90%] Building CXX object src/CMakeFiles/k3b_bin.dir/k3bmodelutils.cpp.o
 Linking CXX executable k3b
 CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::~MainWindow()':
 /tools/k3b/k3b/src/k3b.cpp:272: undefined reference to `KXmlGuiWindow::~KXmlGuiWindow(void const**)'
 CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::MainWindow()':
 /tools/k3b/k3b/src/k3b.cpp:227: undefined reference to `KXmlGuiWindow::KXmlGuiWindow(void const**, QWidget*, QFlags<Qt::WindowType>)'
 CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::MainWindow()':
 /tools/k3b/k3b/src/k3b.cpp:227: undefined reference to `KXmlGuiWindow::KXmlGuiWindow(void const**, QWidget*, QFlags<Qt::WindowType>)'
 CMakeFiles/k3b_bin.dir/k3b.cpp.o: In function `K3b::MainWindow::~MainWindow()':
 /tools/k3b/k3b/src/k3b.cpp:272: undefined reference to `KXmlGuiWindow::~KXmlGuiWindow(void const**)'
 collect2: error: ld returned 1 exit status
 make[2]: *** [src/k3b] Error 1
 make[1]: *** [src/CMakeFiles/k3b_bin.dir/all] Error 2
 make: *** [all] Error 2

It just seems the library containing the definition of 
KXmlGuiWindow::KXmlGuiWindow isn't properly linked in.

Marek


[Bug rtl-optimization/58079] internal compiler error: in do_SUBST, at combine.c:711

2013-08-05 Thread mikpe at it dot uu.se
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58079

Mikael Pettersson mikpe at it dot uu.se changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikpe at it dot uu.se

--- Comment #2 from Mikael Pettersson mikpe at it dot uu.se ---
I can reproduce the ICE with 4.9 and 4.8 crosses to mips64-linux, but not with
4.7 or 4.6.


[Bug c++/58083] [4.8/4.9 Regression] ICE with lambda as default parameter of a template function

2013-08-05 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58083

Paolo Carlini paolo.carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-08-05
            Summary|ICE with lambda as default  |[4.8/4.9 Regression] ICE
                   |parameter of a template     |with lambda as default
                   |function                    |parameter of a template
                   |                            |function
     Ever confirmed|0                           |1

--- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com ---
Without the *it; bit, the testcase seems valid to me, and it compiled fine
with 4.7.x.


[Bug fortran/49213] [OOP] gfortran rejects structure constructor expression

2013-08-05 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49213

--- Comment #12 from janus at gcc dot gnu.org ---
Related test case (using unlimited polymorphism) from
http://gcc.gnu.org/ml/fortran/2013-08/msg00011.html:

type t
  class(*), pointer :: x
end type t
type(t), target :: y
integer,target :: z
type(t) :: x = t(y)
type(t) :: x = t(z)
class(*), pointer :: a => y
end


Unpatched gfortran trunk yields:

tobias2.f90:7.17:

type(t) :: x = t(y)
 1
Error: Can't convert TYPE(t) to CLASS(*) at (1)
tobias2.f90:8.17:

type(t) :: x = t(z)
 1
Error: Can't convert INTEGER(4) to CLASS(*) at (1)


[Bug rtl-optimization/58068] ICE in execute_strength_reduction at -O3 (both 32-bit and 64-bit modes)

2013-08-05 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58068

Marek Polacek mpolacek at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |mpolacek at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Fixed by r201466.


[Bug fortran/49213] [OOP] gfortran rejects structure constructor expression

2013-08-05 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49213

--- Comment #13 from janus at gcc dot gnu.org ---
(In reply to janus from comment #12)
 
> type(t) :: x = t(y)
>  1
> Error: Can't convert TYPE(t) to CLASS(*) at (1)

The patch in comment 8 turns this error into:

type(t) :: x = t(y)
 1
Error: Parameter 'y' at (1) has not been declared or is a variable, which does
not reduce to a constant expression


[Bug lto/57602] [4.9 Regression] Runfails for several C/C++ benchmarks from spec2000 for i686 with -flto after r199422

2013-08-05 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602

--- Comment #11 from Jan Hubicka hubicka at gcc dot gnu.org ---
Created attachment 30616
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30616&action=edit
Proposed fix

Patch I am testing. The problem was that ltrans passes got overzealous in
clearing local flags.  I think this bug has been there for a while; I wonder
why it did not hit us before.

The patch fixes the testcase seen in one of the dups of this PR. Does it fix
all of SPEC?


[Bug rtl-optimization/57708] [4.8 regression] function clobbers callee saved register on ARM

2013-08-05 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57708

--- Comment #4 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
Proposed patch posted here:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00194.html


[Bug c++/34938] ICE with function pointers and attribute noreturn

2013-08-05 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34938

Paolo Carlini paolo.carlini at oracle dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|gcc-bugs at gcc dot gnu.org |

--- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com ---
Lately it doesn't ICE anymore; it's rejected.


[Bug other/56780] --disable-install-libiberty still installs libiberty.a

2013-08-05 Thread 2013.bugzilla.gcc.gnu.org at ingomueller dot net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56780

Ingo Müller 2013.bugzilla.gcc.gnu.org at ingomueller dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |2013.bugzilla.gcc.gnu.org@i
                   |                            |ngomueller.net

--- Comment #7 from Ingo Müller 2013.bugzilla.gcc.gnu.org at ingomueller dot 
net ---
libiberty.a is still installed to /lib/libiberty.a in GCC 4.8.1, even with
--disable-install-libiberty set.

[Bug regression/58084] New: FAIL: gcc.dg/torture/pr8081.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

2013-08-05 Thread ktkachov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58084

Bug ID: 58084
   Summary: FAIL: gcc.dg/torture/pr8081.c  -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal
compiler error)
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org

New ICE in the testsuite when targeting arm-none-eabi:

FAIL: gcc.dg/torture/pr8081.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (internal compiler error)
FAIL: gcc.dg/torture/pr8081.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr8081.c  -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.dg/torture/pr8081.c  -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (test for excess errors)
WARNING: gcc.dg/torture/pr8081.c  -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  compilation failed to produce executable

example output:

Executing on host: $ROOT/build/obj/gcc2/gcc/xgcc -B$ROOT/build/obj/gcc2/gcc/
$ROOT/gcc/gcc/testsuite/gcc.dg/torture/pr8081.c gcc_tg.o 
-fno-diagnostics-show-caret -fdiagnostics-color=never   -O2 -flto
-fno-use-linker-plugin -flto-partition=none   -specs=rdimon.specs -Wa,-m
no-warn-deprecated  -Wl,-wrap,exit -Wl,-wrap,_exit -Wl,-wrap,main
-Wl,-wrap,abort -lm   -o ./pr8081.exe(timeout = 300)
$ROOT/gcc/gcc/testsuite/gcc.dg/torture/pr8081.c: In function 'retframe_block':
$ROOT/gcc/gcc/testsuite/gcc.dg/torture/pr8081.c:15:3: error: invalid conversion
in return statement
struct block

struct block

# VUSE .MEM_6
return retval;
$ROOT/gcc/gcc/testsuite/gcc.dg/torture/pr8081.c:15:3: internal compiler error:
verify_gimple failed
0x89d91d verify_gimple_in_cfg(function*)
$ROOT/gcc/gcc/tree-cfg.c:4807
0x7c1bf2 execute_function_todo
$ROOT/gcc/gcc/passes.c:1627
0x7c4d6d execute_todo
$ROOT/gcc/gcc/passes.c:1660
0x7c6e89 execute_one_ipa_transform_pass
$ROOT/gcc/gcc/passes.c:1843
0x7c6e89 execute_all_ipa_transforms()
$ROOT/gcc/gcc/passes.c:1873
0x574348 expand_function
$ROOT/gcc/gcc/cgraphunit.c:1601
0x575150 expand_all_functions
$ROOT/gcc/gcc/cgraphunit.c:1712
0x575150 compile()
$ROOT/gcc/gcc/cgraphunit.c:2049
0x4f9126 lto_main()
$ROOT/gcc/gcc/lto/lto.c:3872
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: $ROOT/build/obj/gcc2/gcc/xgcc returned 1 exit status
collect2: error: lto-wrapper returned 1 exit status
compiler exited with status 1


Bisection shows it started with r201468:

2013-08-02  Jan Hubicka  j...@suse.cz

* lto-cgraph.c (compute_ltrans_boundary): Add abstract origins into
boundaries.
* lto-streamer-out.c (tree_is_indexable): Results decls and
parm decls are not indexable.
(DFS_write_tree_body): Do not follow args and results.
(hash_tree): Likewise.
(output_functions): Rearrange so struct function is needed
only when real body is output; be able to also ouptut abstract
functions; output DECL_ARGUMENTS and DECL_RESULT.
(lto_output): When not in WPA, ale store abstract functions.
(write_symbol): Do not care about RESULT_DECL.
(output_symbol_p): Handle correctly sbtract decls.
* lto-streamer-in.c (input_function): Rearrange so struct
function can be NULL at entry; allow streaming of
functions w/o body; store DECL_ARGUMENTS and DECL_RESULT.
* ipa.c (symtab_remove_unreachable_nodes): Silence confused
sanity check during LTO.
* tree-streamer-out.c (write_ts_decl_non_common_tree_pointers): Skip
RESULT_DECl and DECL_ARGUMENTS.
* tree-streamer-in.c (lto_input_ts_decl_non_common_tree_pointers):
Likewise.


[Bug regression/58084] FAIL: gcc.dg/torture/pr8081.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

2013-08-05 Thread ktkachov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58084

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |lto
                 CC|                            |jh at suse dot cz
   Target Milestone|---                         |4.9.0
      Known to fail|                            |4.9.0


[Bug lto/57776] [4.9 Regression] FAIL: gcc.dg/lto/pr56297 c_lto_pr56297_0.o-c_lto_pr56297_1.o link, -flto -fno-common (internal compiler error)

2013-08-05 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57776

--- Comment #2 from Uroš Bizjak ubizjak at gmail dot com ---
It is r200151 [1].

[1] http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00848.html

[Bug regression/58084] FAIL: gcc.dg/torture/pr8081.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

2013-08-05 Thread ktkachov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58084

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm-none-eabi

--- Comment #1 from ktkachov at gcc dot gnu.org ---
In case it's needed, compiler configured for arm-none-eabi:

--with-fpu=neon-vfpv4 --with-float=hard --with-arch=armv7-a


[Bug fortran/45290] [F08] pointer initialization

2013-08-05 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45290

--- Comment #15 from janus at gcc dot gnu.org ---
(In reply to janus from comment #13)
> Just two minor leftovers:
>
> (1) Making global variables in a program SAVE_IMPLICIT. (Does it even make a
> difference?)

cf. PR 55207 (and apparently, yes, it does make a difference in some cases)


[Bug c++/46206] using typedef-name error with typedef name hiding struct name

2013-08-05 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46206

--- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com ---
The difference is that in the first case the TYPE_DECL Bar is regenerated and
the DECL_IMPLICIT_TYPEDEF_P bit is lost: the true value set earlier by
create_implicit_typedef does not survive.


[Bug middle-end/55595] [google] r172952 (LIPO) broke profiledbootstrap on google/main, and later in google/gcc-4_7

2013-08-05 Thread dtemirbulatov at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55595

Dinar Temirbulatov dtemirbulatov at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dtemirbulatov at gmail dot com

--- Comment #4 from Dinar Temirbulatov dtemirbulatov at gmail dot com ---
For gcc-4_7 this bug was fixed by this commit:

r194713 | dehao | 2012-12-24 16:49:06 -0800 (Mon, 24 Dec 2012) | 5 lines

and for gcc-4_8 it was resolved here:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00063.html


[Bug fortran/49213] [OOP] gfortran rejects structure constructor expression

2013-08-05 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49213

--- Comment #14 from janus at gcc dot gnu.org ---
(In reply to janus from comment #13)
> type(t) :: x = t(y)
>  1
> Error: Parameter 'y' at (1) has not been declared or is a variable, which
> does not reduce to a constant expression

This error also occurs for the following non-polymorphic version ...

type t
  integer, pointer :: j
end type t
integer, target :: i = 0
type(t) :: x = t(i)
end

... which should be valid at least in F08.


[Bug fortran/58085] New: Wrong indexing of an array in ASSOCIATE

2013-08-05 Thread vladimir.fuka at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58085

Bug ID: 58085
   Summary: Wrong indexing of an array in ASSOCIATE
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vladimir.fuka at gmail dot com

Program:

 real c(3,3)
 associate (x=>shape(c))
   print *,lbound(x)
   print *,ubound(x)
   print *,x(1),x(2)
 end associate
end

Expected result:
1
2
3 3

Actual result:
 gfortran-4.7 indresult.f90 
  ./a.out 
   1
   2
   3   990059265
 gfortran-4.8 indresult.f90 
  ./a.out 
   1
   2
   3   0

,but:

 print *,x(0),x(1) ! bound checks off

--

 3 3


[Bug other/58086] New: Installer installs files outside --prefix

2013-08-05 Thread 2013.bugzilla.gcc.gnu.org at ingomueller dot net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58086

Bug ID: 58086
   Summary: Installer installs files outside --prefix
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 2013.bugzilla.gcc.gnu.org at ingomueller dot net

The installer of gcc-4.8.1 installs the following files outside the path
passed as --prefix to the configure script:

/lib/libiberty.a
/lib32/libquadmath.so.0.0.0
/lib32/libgomp.la
/lib32/libgij.la
/lib32/libmudflapth.la
/lib32/libmudflapth.a
/lib32/libssp_nonshared.a
/lib32/libgfortran.so.3.0.0
/lib32/libquadmath.la
/lib32/libgomp.spec
/lib32/libgomp.so.1.0.0
/lib32/libitm.a
/lib32/libssp_nonshared.la
/lib32/libstdc++.a
/lib32/libssp.la
/lib32/libobjc.la
/lib32/libgcj.la
/lib32/libgcj_bc.so.1.0.0
/lib32/libstdc++.la
/lib32/libobjc.so.4.0.0
/lib32/libsupc++.la
/lib32/libitm.la
/lib32/libitm.so.1.0.0
/lib32/libgfortran.a
/lib32/logging.properties
/lib32/libmudflap.la
/lib32/libmudflap.so.0.0.0
/lib32/libmudflap.a
/lib32/libitm.spec
/lib32/libgcj-tools.la
/lib32/security/classpath.security
/lib32/libmudflapth.so.0.0.0
/lib32/libssp.a
/lib32/libgcj_bc.so
/lib32/libgfortran.la
/lib32/libobjc.a
/lib32/libssp.so.0.0.0
/lib32/libgcc_s.so.1
/lib32/libgfortran.spec
/lib32/libsupc++.a
/lib32/libquadmath.a
/lib32/libgomp.a
/lib64/libquadmath.so.0.0.0
/lib64/libgomp.la
/lib64/libgij.la
/lib64/libmudflapth.la
/lib64/libmudflapth.a
/lib64/libssp_nonshared.a
/lib64/libgfortran.so.3.0.0
/lib64/libquadmath.la
/lib64/libgomp.spec
/lib64/libgomp.so.1.0.0
/lib64/libitm.a
/lib64/libssp_nonshared.la
/lib64/libstdc++.a
/lib64/libssp.la
/lib64/libobjc.la
/lib64/libgcj.la
/lib64/libgcj_bc.so.1.0.0
/lib64/libstdc++.la
/lib64/libobjc.so.4.0.0
/lib64/libsupc++.la
/lib64/libitm.la
/lib64/libitm.so.1.0.0
/lib64/libgfortran.a
/lib64/logging.properties
/lib64/libmudflap.la
/lib64/libmudflap.so.0.0.0
/lib64/libmudflap.a
/lib64/libitm.spec
/lib64/libgcj-tools.la
/lib64/security/classpath.security
/lib64/libmudflapth.so.0.0.0
/lib64/libssp.a
/lib64/libgcj_bc.so
/lib64/libgfortran.la
/lib64/libobjc.a
/lib64/libssp.so.0.0.0
/lib64/libgcc_s.so.1
/lib64/libgfortran.spec
/lib64/libsupc++.a
/lib64/libquadmath.a
/lib64/libgomp.a
/lib32/libgcc_s.so
/lib32/libgcj_bc.so.1
/lib64/libgcc_s.so
/lib64/libgcj_bc.so.1

I produced this list by executing the following commands:

 wget http://gcc.cybermirror.org/releases/gcc-4.8.1/gcc-4.8.1.tar.gz
 tar -xf gcc-4.8.1.tar.gz 
 cd gcc-4.8.1/
 ./configure --prefix=/opt/gcc-4.8 --program-suffix=-4.8
 make
 sudo checkinstall #answer some questions
 sudo dpkg --force-overwrite -i gcc-4.8_4.8.1-1_amd64.deb

dpkg then warns about the above list of files being overwritten.

I suppose that everything should be installed under PREFIX instead.


[Bug rtl-optimization/58079] internal compiler error: in do_SUBST, at combine.c:711

2013-08-05 Thread mikpe at it dot uu.se
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58079

Mikael Pettersson mikpe at it dot uu.se changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdsandiford at googlemail dot com

--- Comment #3 from Mikael Pettersson mikpe at it dot uu.se ---
Started with Uros' PR54457 patch in r191928.  I'm not sure if that patch was
wrong or if it exposed a problem in the mips backend.  Cc:ing a MIPS maintainer
(Richard S.)


[Bug translation/58087] New: Huge memory consumption with #pragma GCC optimize, __attribute__ and long output paths

2013-08-05 Thread manisandro at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58087

Bug ID: 58087
   Summary: Huge memory consumption with #pragma GCC optimize,
__attribute__ and long output paths
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: translation
  Assignee: unassigned at gcc dot gnu.org
  Reporter: manisandro at gmail dot com

Using gcc version 4.8.1 20130717 (Red Hat 4.8.1-5) (GCC)

Description:
Any code containing #pragma GCC optimize and lots of __attribute__(())
declarations results in cc1 taking up huge amounts of memory, _depending_ on
the length of the output path.

Test case:
$ for i in $(seq 1 1); do echo -e "void __attribute__(()) fz$i();" >> test.h; done

$ cat > test.c <<EOF
#pragma GCC optimize ("-O1")
#include "test.h"
int main(){ return 0; }
EOF

$ wget https://gist.github.com/netj/526585/raw/9044a9972fd71d215ba034a38174960c1c9079ad/memusg
$ chmod +x memusg

$ mkdir a
$ ./memusg gcc -c -o a/test.o test.c
memusg: peak=485796

$ mkdir aaa
$ ./memusg gcc -c -o aaa/test.o test.c
memusg: peak=4827100

Observations:
- The -O1 is not important, passing any other string to #pragma GCC optimize
(regardless of whether it is valid or invalid) will also trigger the issue
- As noted, the memory consumption depends on the length of the output path


[Bug lto/57602] [4.9 Regression] Runfails for several C/C++ benchmarks from spec2000 for i686 with -flto after r199422

2013-08-05 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602

--- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Jan,

I tried to test your fix and got the following error message while
building trunk compiler (with your fix):

../../../../../trunk/libstdc++-v3/src/c++11/fstream-inst.cc:48:1:
error: node is alias but not definition
 } // namespace
 ^
_ZNSt9basic_iosIwSt11char_traitsIwEED1Ev/764 (std::basic_ios<_CharT,
_Traits>::~basic_ios() [with _CharT = wchar_t; _Traits =
std::char_traits<wchar_t>]) @0x7f1375b1be40
  Type: function alias cpp_implicit_alias
  Visibility: external public visibility_specified
  Address is taken.
  References:
  Referring:
  Availability: not_available
  Function flags:
  Called by:
  Calls:
../../../../../trunk/libstdc++-v3/src/c++11/fstream-inst.cc:48:1:
internal compiler error: verify_cgraph_node failed
0x7dc6b1 verify_cgraph_node(cgraph_node*)
../../trunk/gcc/cgraph.c:2621
0x7d6567 verify_symtab_node(symtab_node_def*)
../../trunk/gcc/symtab.c:763
0x7d65a7 verify_symtab()
../../trunk/gcc/symtab.c:780
0x98118b symtab_remove_unreachable_nodes(bool, _IO_FILE*)
../../trunk/gcc/ipa.c:477
0xf33f20 ipa_inline
../../trunk/gcc/ipa-inline.c:1800
Please submit a full bug report,

Please, let me know if more info is needed.



[Bug lto/57602] [4.9 Regression] Runfails for several C/C++ benchmarks from spec2000 for i686 with -flto after r199422

2013-08-05 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602

--- Comment #13 from Jan Hubicka hubicka at ucw dot cz ---
> Please, let me know if more info is needed.
Actually I got the same ICE in the meantime.  Here is an improved patch (it is
still in testing on my side):

Index: cgraph.c
===================================================================
*** cgraph.c    (revision 201483)
--- cgraph.c    (working copy)
*************** verify_cgraph_node (struct cgraph_node *
*** 2363,2369 ****
            error ("inline clone in same comdat group list");
            error_found = true;
          }
!       if (!node->symbol.definition && node->local.local)
          {
            error ("local symbols must be defined");
            error_found = true;
--- 2363,2369 ----
            error ("inline clone in same comdat group list");
            error_found = true;
          }
!       if (!node->symbol.definition && !node->symbol.in_other_partition && node->local.local)
          {
            error ("local symbols must be defined");
            error_found = true;
Index: ipa.c
===================================================================
*** ipa.c       (revision 201483)
--- ipa.c       (working copy)
*************** symtab_remove_unreachable_nodes (bool be
*** 376,382 ****
              {
                if (file)
                  fprintf (file, " %s", cgraph_node_name (node));
!               cgraph_reset_node (node);
                changed = true;
              }
          }
--- 376,390 ----
              {
                if (file)
                  fprintf (file, " %s", cgraph_node_name (node));
!               node->symbol.analyzed = false;
!               node->symbol.definition = false;
!               node->symbol.cpp_implicit_alias = false;
!               node->symbol.alias = false;
!               node->symbol.weakref = false;
!               if (!node->symbol.in_other_partition)
!                 node->local.local = false;
!               cgraph_node_remove_callees (node);
!               ipa_remove_all_references (&node->symbol.ref_list);
                changed = true;
              }
          }
*************** function_and_variable_visibility (bool w
*** 888,894 ****
      }
    FOR_EACH_DEFINED_FUNCTION (node)
      {
!       node->local.local = cgraph_local_node_p (node);

        /* If we know that function can not be overwritten by a different semantics
           and moreover its section can not be discarded, replace all direct calls
--- 896,902 ----
      }
    FOR_EACH_DEFINED_FUNCTION (node)
      {
!       node->local.local |= cgraph_local_node_p (node);

        /* If we know that function can not be overwritten by a different semantics
           and moreover its section can not be discarded, replace all direct calls
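
The last hunk replaces a plain assignment to the local flag with an
OR-assignment, so a flag that is already set is no longer cleared when the
predicate is recomputed. A toy sketch of that difference follows; Node,
looks_local and both helpers are hypothetical names for illustration, not GCC
code, and the real cgraph structures carry far more state.

```cpp
// Toy model only: Node and the helpers below are hypothetical, not GCC code.
struct Node {
    bool local;   // flag that an earlier pass may already have set
};

// Stand-in for a predicate that recomputes locality from scratch.
bool looks_local(bool has_body_here) { return has_body_here; }

// Plain assignment: an earlier 'true' is lost when the predicate says false.
void recompute_overwrite(Node &n, bool has_body_here) {
    n.local = looks_local(has_body_here);
}

// OR-assignment: the flag can be newly set here but is never cleared.
void recompute_preserve(Node &n, bool has_body_here) {
    n.local |= looks_local(has_body_here);
}
```

With a node whose flag is already true and a predicate that now returns false,
recompute_overwrite clears the flag while recompute_preserve keeps it.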


[Bug tree-optimization/58088] New: ICE in gcc.c

2013-08-05 Thread ishiura-compiler at ml dot kwansei.ac.jp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58088

Bug ID: 58088
   Summary: ICE in gcc.c
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ishiura-compiler at ml dot kwansei.ac.jp

GCC 4.9.0 ICEs on the following code. (i686 and x86_64)

  $ cat error.c

  int main (void)
  {
    int x = 0;
    int y = 127 | ( 128 & ( 2 * x ));

    return 0;
  }

  $ i686-pc-linux-gnu-gcc-4.9.0 error.c

  i686-pc-linux-gnu-gcc-4.9.0: internal compiler error: Segmentation
fault (program cc1)
  0x8053b4e execute
../../../../../gcc/gcc/gcc.c:2824
  0x8053e1a do_spec_1
../../../../../gcc/gcc/gcc.c:4616
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  0x8054a2a do_spec_1
../../../../../gcc/gcc/gcc.c:5270
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  0x8054a2a do_spec_1
../../../../../gcc/gcc/gcc.c:5270
  0x805414e do_spec_1
../../../../../gcc/gcc/gcc.c:5375
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  0x8054a2a do_spec_1
../../../../../gcc/gcc/gcc.c:5270
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  0x8054a2a do_spec_1
../../../../../gcc/gcc/gcc.c:5270
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  0x8054a2a do_spec_1
../../../../../gcc/gcc/gcc.c:5270
  0x80565bd process_brace_body
../../../../../gcc/gcc/gcc.c:5873
  0x80565bd handle_braces
../../../../../gcc/gcc/gcc.c:5787
  Please submit a full bug report,
  with preprocessed source if appropriate.
  Please include the complete backtrace with any bug report.
  See http://gcc.gnu.org/bugs.html for instructions.

  $ i686-pc-linux-gnu-gcc-4.9.0 -v
  Using built-in specs.
  COLLECT_GCC=i686-pc-linux-gnu-gcc-4.9.0
 
COLLECT_LTO_WRAPPER=/usr/local/i686-tools/gcc-4.9.0/libexec/gcc/i686-pc-linux-gnu/4.9.0/lto-wrapper
  Target: i686-pc-linux-gnu
  Configured with: ../../../../gcc/configure
--prefix=/usr/local/i686-tools/gcc-4.9.0/
--with-gmp=/usr/local/gmp-5.1.1/ --with-mpfr=/usr/local/mpfr-3.1.2/
--with-mpc=/usr/local/mpc-1.0.1/ --disable-multilib --disable-nls
--enable-languages=c
  Thread model: posix
  gcc version 4.9.0 20130805 (experimental) (GCC)


[Bug tree-optimization/58088] [4.8/4.9 Regression] ICE in gcc.c

2013-08-05 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58088

Marek Polacek mpolacek at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-08-05
                 CC|                            |mpolacek at gcc dot gnu.org
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.8.2
            Summary|ICE in gcc.c                |[4.8/4.9 Regression] ICE in
                   |                            |gcc.c
     Ever confirmed|0                           |1
      Known to fail|                            |4.8.1, 4.9.0

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Ugh.  Confirmed.


[Bug c++/46206] using typedef-name error with typedef name hiding struct name

2013-08-05 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46206

--- Comment #3 from Paolo Carlini paolo.carlini at oracle dot com ---
More correctly: it seems that when we parse typedef struct Bar { } Bar; we
create two TYPE_DECL: first, one marked as DECL_IMPLICIT_TYPEDEF_P in pushtag_1
(via create_implicit_typedef); then a second, real, one in grokdeclarator, via
build_lang_decl (TYPE_DECL... ). When we do lookup for struct Bar bar, it can
happen, depending on layout details, that the *second* one is found, thus the
check in check_elaborated_type_specifier triggers.
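
For readers who want the construct in front of them, here is a minimal sketch
of the two ways the type gets named. It is illustrative only and is not the
testcase attached to the PR; the member and the two helper functions are
made-up names, and at namespace scope this is expected to compile without
hitting the problem described above.

```cpp
// Illustrative only; not the PR's testcase. Helper names are hypothetical.
typedef struct Bar { int value; } Bar;   // struct name and typedef-name coincide

int use_elaborated() {
    struct Bar bar = { 42 };   // elaborated-type-specifier naming the struct
    return bar.value;
}

int use_typedef_name() {
    Bar bar = { 7 };           // same type, reached through the typedef-name
    return bar.value;
}
```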


[Bug tree-optimization/58088] [4.8/4.9 Regression] ICE in gcc.c

2013-08-05 Thread ktkachov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58088

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i686-pc-linux-gnu,
                   |                            |arm-none-eabi
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
FWIW, also segfaults on arm-none-eabi.

gdb says:

fold_binary_loc (loc=787, code=BIT_AND_EXPR, type=0x76eba5e8,
op0=0x77052488, op1=0x76de6280)


[Bug tree-optimization/58088] [4.8/4.9 Regression] ICE in gcc.c

2013-08-05 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58088

--- Comment #3 from Marek Polacek mpolacek at gcc dot gnu.org ---
Started with r187280.


[Bug c++/58089] New: expanding empty parameter pack as constructor arguments requires accessible copy constructor

2013-08-05 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58089

Bug ID: 58089
   Summary: expanding empty parameter pack as constructor
arguments requires accessible copy constructor
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org

struct X
{
    X() { }
private:
    X(const X& r);
};


template<typename... Args>
void f(Args... args)
{
    X t(args...);
}


int main()
{
    f();
}


$ g++ -std=gnu++11 t.cc
t.cc: In instantiation of 'void f(Args ...) [with Args = {}]':
t.cc:18:7:   required from here
t.cc:5:5: error: 'X::X(const X&)' is private
     X(const X& r);
     ^
t.cc:12:16: error: within this context
     X t(args...);
                ^


[Bug c++/58089] expanding empty parameter pack as constructor arguments requires accessible copy constructor

2013-08-05 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58089

--- Comment #1 from Jonathan Wakely redi at gcc dot gnu.org ---
Using list-initialization works fine:

  X t{args...};
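
A self-contained sketch of that workaround is below. The counter
default_ctor_calls is added purely for illustration and is not part of the
report; the rest mirrors the testcase with braces in place of parentheses.

```cpp
// Sketch of the list-initialization workaround; the counter is an addition.
static int default_ctor_calls = 0;

struct X
{
    X() { ++default_ctor_calls; }
private:
    X(const X& r);   // private and never defined, as in the report
};

template<typename... Args>
void f(Args... args)
{
    X t{args...};    // with an empty pack this is X t{}; no copy is needed
    (void)t;
}
```

Calling f() with no arguments should run the default constructor exactly once
and never touch the private copy constructor.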


[Bug lto/57602] [4.9 Regression] Runfails for several C/C++ benchmarks from spec2000 for i686 with -flto after r199422

2013-08-05 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602

--- Comment #14 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Hi Jan,

I checked that all benchmarks from spec2000 run successfully with
-flto, and the eembc_2_0 suite also ran successfully with LTO
(in 32-bit mode).

So go ahead and commit your fix.

Best regards.
Yuri.



[Bug middle-end/45631] devirtualization with profile feedback does not work for function pointers

2013-08-05 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45631

--- Comment #4 from Jan Hubicka hubicka at gcc dot gnu.org ---
I don't have many ideas beyond implementing a smarter (= more expensive)
common value histogram collection.  I wonder how often such patterns hit us in
practice.  The problem here is that the functions interleave in a regular
pattern, so the counters never saturate...
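
To make the interleaving problem concrete, here is a simplified model of a
single-candidate value counter. It assumes the profiler keeps one candidate
value and bumps a confidence count up on a match and down on a mismatch; the
struct and function names are hypothetical, and the actual libgcov code may
differ in detail.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of a single-value profiling counter (hypothetical names).
struct OneValueCounter {
    std::uintptr_t value = 0;  // current candidate for the most common target
    long count = 0;            // confidence in that candidate
    long total = 0;            // number of observations
};

// Match: raise confidence. Mismatch: lower it, or adopt the new value
// once confidence has dropped to zero.
void observe(OneValueCounter &c, std::uintptr_t v) {
    if (v == c.value)
        ++c.count;
    else if (c.count == 0) { c.value = v; c.count = 1; }
    else
        --c.count;
    ++c.total;
}

// Feed a whole sequence of observed call targets through one counter.
OneValueCounter profile(const std::vector<std::uintptr_t> &targets) {
    OneValueCounter c;
    for (std::uintptr_t v : targets) observe(c, v);
    return c;
}
```

With two targets alternating A, B, A, B, every mismatch cancels the preceding
match, so count stays at 0 or 1 however long the run is. A run dominated by
one target lets count grow, which is the case this scheme is good at.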


[Bug middle-end/58041] Unaligned access to arrays in packed structure

2013-08-05 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58041

--- Comment #28 from Martin Jambor jamborm at gcc dot gnu.org ---
Thanks for testing. I have submitted the patch for review:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00224.html


[Bug c++/46206] using typedef-name error with typedef name hiding struct name

2013-08-05 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46206

Paolo Carlini paolo.carlini at oracle dot com changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |paolo.carlini at oracle dot com
   Target Milestone|--- |4.9.0

--- Comment #4 from Paolo Carlini paolo.carlini at oracle dot com ---
I have a patchlet which works for the testcase and passes testing. Let's fix
this insanity, one way or another.


[Bug target/56110] Sub-optimal code: unnecessary CMP after AND

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56110

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-05
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1


[Bug target/56102] Wrong rtx cost calculated for Thumb1

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56102

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-05
 CC||ramana at gcc dot gnu.org
   Target Milestone|--- |4.9.0
 Ever confirmed|0   |1

--- Comment #3 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
Is this now fixed by the following?

http://gcc.gnu.org/ml/gcc-cvs/2013-03/msg00784.html


[Bug bootstrap/58090] New: bootstrap fails comparison with --enable-gather-detailed-mem-stats

2013-08-05 Thread andi-gcc at firstfloor dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58090

Bug ID: 58090
   Summary: bootstrap fails comparison with
--enable-gather-detailed-mem-stats
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

On x86_64-linux

Works without --enable-gather-detailed-mem-stats

make[2]: *** [compare] Error 1
make[1]: *** [stage3-bubble] Error 2
make: *** [all] Error 2


[Bug tree-optimization/56369] Missed opportunity to combine comparisons with zero

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56369

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-05
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1


[Bug middle-end/57540] stack pointer related loop invariants after reload for ARM mode

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57540

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-05
 CC||ramana at gcc dot gnu.org
  Component|target  |middle-end
 Ever confirmed|0   |1


[Bug target/54473] Compiling advancemame on raspberry pi yields unrecognizable insn

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54473

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||ramana at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #5 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
Duplicate of 50099.

*** This bug has been marked as a duplicate of bug 50099 ***


[Bug target/50099] ICE: internal compiler error: in extract_insn, at recog.c:2113 while building lttng-ust

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50099

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 CC||patenaude at gmail dot com

--- Comment #12 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
*** Bug 54473 has been marked as a duplicate of this bug. ***


[Bug rtl-optimization/57708] [4.8 regression] function clobbers callee saved register on ARM

2013-08-05 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57708

Richard Earnshaw rearnsha at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.8.2

--- Comment #5 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
Fixed with:

PR rtl-optimization/57708
* recog.c (peep2_find_free_register): Validate all regs in a
multi-reg mode.

Trunk revision: r201501.
gcc-4.8 revision: r201510.


[Bug target/55634] ARM: gcc vector extensions: storing vector to unaligned memory location does not use VST1.8 NEON instruction

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-05
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
On AArch64 with no strict alignment, trunk ends up generating the following in
the .002t.original dump:


{
  T tmp = *a + *b;
  extern void * memcpy (void *, const void *, long unsigned int);

T tmp = *a + *b;
  MEM[(char * {ref-all})result] = MEM[(char * {ref-all})tmp];, result;
}



On A32, or indeed AArch64 with -mstrict-align, we end up generating:

{
  T tmp = *a + *b;
  extern void * memcpy (void *, const void *, long unsigned int);

T tmp = *a + *b;
  memcpy (result, (const void *) tmp, 16);
}


Where do you expect the memcpy to be made redundant, or replaced by a use of
the appropriate movmisalign insn, richi?

Ramana


[Bug target/58065] ARM MALLOC_ABI_ALIGNMENT is wrong

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58065

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ramana at gcc dot gnu.org

--- Comment #6 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #4)
> Created attachment 30601 [details]
> Proposed patch

If you want to propose a patch please post to the mailing list. 

Ramana


[Bug target/40523] GCC generates invalid instructions when building for Thumb-2 on armel

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40523

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||ramana at gcc dot gnu.org
  Known to work||
 Resolution|--- |WONTFIX
   Target Milestone|--- |4.4.0

--- Comment #4 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
4.3 is no longer interesting and this is fixed on 4.4.0 onwards.


[Bug c/55349] arm-linux-androideabi-gcc-4.6: Internal compiler error compiling libpng in debug mode

2013-08-05 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55349

Richard Earnshaw rearnsha at gcc dot gnu.org changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
No testcase provided


[Bug target/48250] ICE in reload_cse_simplify_operands, at postreload.c:403

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48250

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.7.0

--- Comment #7 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
Fixed on 4.7.0 - won't fix on 4.6.x


[Bug target/43590] ICE in spill_failure, at reload1.c:2158

2013-08-05 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43590

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||4.7.0
 Resolution|--- |FIXED
   Assignee|ramana at gcc dot gnu.org  |unassigned at gcc dot gnu.org
   Target Milestone|--- |4.7.0
  Known to fail||4.6.4

--- Comment #7 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
Fixed 4.7.0 onwards.


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2013-08-05 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #8 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
(In reply to Daniel Santos from comment #7)
> First off, I apologize for my late response here.
>
> (In reply to comment #5)
> I'm going to respond a little backwards..
>
> > In fact, on ARM there is no branch instruction that can be used for > 0 as a
> > side effect of a subtract.  To get the desired effect the code would have to be
> > completely re-arranged to factor out the < 0 (bmi) and then == 0 (beq)
> > cases first.
>
> I'm not an ARM programmer, but I'm looking at my reference book and it would
> appear that BGT would perform a branch of greater than for signed comparison
> and BHI for unsigned comparison.  Again, convert the subtraction into a
> comparison (subtract, but discard the result) and branch based upon the
> flags (for signed numbers):
>
>     cmp r0, r1
>     bgt .L1
>     bne .L2
>     ;handle equality here
>

Unfortunately, computers don't do infinite precision arithmetic by default.
That would perform a different comparison, in that it checks that r0 > r1, not
whether r0 - r1 > 0.  The difference, for signed comparisons, is when overflow
occurs.

Consider the case where (in your original code) a has the value INT_MIN (i.e.
-2147483648) and b has the value 1.

Now clearly a < b, and by the normal rules of arithmetic (infinite precision) we
would expect a - b to be less than zero.

However, INT_MIN - 1 cannot be represented in a 32-bit long value and becomes
INT_MAX due to overflow; the result is that for these values a - b > 0!
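
The INT_MIN case above can be checked with a small C sketch. The helper names
are hypothetical and chosen for this note. Because signed overflow is undefined
behaviour in C, the wrapped 32-bit difference is computed in unsigned
arithmetic and only its top bit is inspected; this models what a 32-bit
subtract leaves in a register, under the usual two's complement assumption.

```c
#include <stdint.h>
#include <stdbool.h>

/* The comparison the source code actually asks about: a < b.  */
static bool
less_than (int32_t a, int32_t b)
{
  return a < b;
}

/* Sign of the 32-bit wrapped difference a - b.  The subtraction is done
   on uint32_t so that wrap-around is well defined; bit 31 of the result
   is the sign bit a machine register would hold.  */
static bool
wrapped_diff_negative (int32_t a, int32_t b)
{
  uint32_t diff = (uint32_t) a - (uint32_t) b;
  return (diff >> 31) != 0;
}
```

For a = INT32_MIN and b = 1, less_than reports true while
wrapped_diff_negative reports false, which is exactly the disagreement
described above. For operands that do not overflow, such as -5 and 3, the two
tests agree.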

On ARM and x86, the flag setting that results from a subtract operation is, in
effect, a comparison of the original operands rather than a comparison of the
result; that is, on ARM

   subs rd, rn, rm

is equivalent to 

   cmp rn, rm

except that the register rd is not written by the comparison.
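
As a rough model of the point above, the following C sketch computes the four
ARM condition flags for a flag-setting subtract and evaluates a few branch
conditions from them. The type and function names are hypothetical; the flag
and condition definitions follow the architecture as I understand it (N is the
result's sign bit, Z is set on a zero result, C means no borrow, V means signed
overflow; GT is Z clear and N equal to V, LT is N not equal to V, MI is N set).
Treat it as an illustration, not a reference.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the ARM condition flags (names are illustrative).  */
struct nzcv
{
  bool n, z, c, v;
};

/* Flags produced by "subs rd, rn, rm" or, equivalently, "cmp rn, rm".  */
static struct nzcv
subs_flags (uint32_t rn, uint32_t rm)
{
  uint32_t res = rn - rm;  /* wraps modulo 2^32 */
  struct nzcv f;
  f.n = (res >> 31) != 0;                       /* sign bit of the result */
  f.z = res == 0;                               /* result is zero */
  f.c = rn >= rm;                               /* no unsigned borrow */
  f.v = (((rn ^ rm) & (rn ^ res)) >> 31) != 0;  /* signed overflow */
  return f;
}

/* Branch conditions evaluated from the flags.  */
static bool cond_gt (struct nzcv f) { return !f.z && f.n == f.v; }
static bool cond_lt (struct nzcv f) { return f.n != f.v; }
static bool cond_mi (struct nzcv f) { return f.n; }
```

With rn = 0x80000000 (INT_MIN) and rm = 1, cond_lt holds because V corrects
for the overflow, so the flags describe the operands. cond_mi does not hold
because it looks only at the sign of the wrapped result.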

PowerPC is different: its subtract and compare instruction really does use
the result of the subtraction to form the comparison.

