date:20091119

Re: [Dwarf-Discuss] Does gcc optimization impacts retrieving Dwarf information?

2009-11-19 Thread Mark Wielaard

On Wed, 2009-11-18 at 18:19 +0530, M. Mohan Kumar wrote:
 Are VTA patches part of mainline gcc now? If not, where could we get the 
 VTA patches?

The VTA implementation is in mainline gcc now. There are also some
backports to gcc 4.4, like the gcc that Fedora 12 ships with.

Cheers,

Mark

Re: [Dwarf-Discuss] Does gcc optimization impacts retrieving Dwarf information?

2009-11-19 Thread M. Mohan Kumar


On 11/19/2009 04:30 PM, Mark Wielaard wrote:

On Wed, 2009-11-18 at 18:19 +0530, M. Mohan Kumar wrote:

Are VTA patches part of mainline gcc now? If not, where could we get the
VTA patches?


The VTA implementation is in mainline gcc now. There are also some
backports to gcc 4.4, like the gcc that Fedora 12 ships with.


Hi Mark,

Thank you very much for the info. Is there any option that needs to be passed
to gcc to enable this VTA feature?

Re: [Dwarf-Discuss] Does gcc optimization impacts retrieving Dwarf information?

2009-11-19 Thread Mark Wielaard

On Thu, 2009-11-19 at 19:15 +0530, M. Mohan Kumar wrote:
 On 11/19/2009 04:30 PM, Mark Wielaard wrote:
  On Wed, 2009-11-18 at 18:19 +0530, M. Mohan Kumar wrote:
  Are VTA patches part of mainline gcc now? If not, where could we get the
  VTA patches?
 
  The VTA implementation is in mainline gcc now. There are also some
  backports to gcc 4.4, like the gcc that Fedora 12 ships with.

 Thank you very much for the info. Is there any option that needs to be passed
 to gcc to enable this VTA feature?

See the following options from:
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options

-fvar-tracking
Run variable tracking pass. It computes where variables are
stored at each position in code. Better debugging information is
then generated (if the debugging information format supports
this information). 

It is enabled by default when compiling with optimization (-Os,
-O, -O2, ...), debugging information (-g) and the debug info
format supports it. 


-fvar-tracking-assignments
Annotate assignments to user variables early in the compilation
and attempt to carry the annotations over throughout the
compilation all the way to the end, in an attempt to improve
debug information while optimizing. Use of -gdwarf-4 is
recommended along with it. 

It can be enabled even if var-tracking is disabled, in which
case annotations will be created and maintained, but discarded
at the end.
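
As a rough illustration of what these options affect, here is a minimal sketch; the function and variable names are hypothetical and do not come from the thread. When optimizing, the compiler is free to keep `scaled` and `sum` only in registers or to fold them away, and variable tracking is what allows the debug information to still describe them. One plausible way to try it, assuming a gcc that includes VTA, is `gcc -O2 -g -fvar-tracking-assignments -c example.c` followed by inspecting the locals in a debugger.

```c
/* Hypothetical example: `scaled` and `sum` are user variables that an
   optimizing compiler may never store to the stack. */
int weighted_sum(const int *values, int count, int weight)
{
    int sum = 0;

    for (int i = 0; i < count; i++) {
        int scaled = values[i] * weight; /* may live only in a register */
        sum += scaled;
    }
    return sum;
}
```

The generated code is meant to be the same with or without the tracking options; only the location descriptions in the debug information should differ.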

Re: [Dwarf-Discuss] Does gcc optimization impacts retrieving Dwarf information?

2009-11-19 Thread Richard Guenther

On Thu, Nov 19, 2009 at 2:55 PM, Mark Wielaard m...@redhat.com wrote:
 On Thu, 2009-11-19 at 19:15 +0530, M. Mohan Kumar wrote:
 On 11/19/2009 04:30 PM, Mark Wielaard wrote:
  On Wed, 2009-11-18 at 18:19 +0530, M. Mohan Kumar wrote:
  Are VTA patches part of mainline gcc now? If not, where could we get the
  VTA patches?
 
  The VTA implementation is in mainline gcc now. There are also some
  backports to gcc 4.4, like the gcc that Fedora 12 ships with.

 Thank you very much for the info. Is there any option that needs to be passed
 to gcc to enable this VTA feature?

It is enabled by default when -g is specified.

Richard.

 See the following options from:
 http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options

 -fvar-tracking
        Run variable tracking pass. It computes where variables are
        stored at each position in code. Better debugging information is
        then generated (if the debugging information format supports
        this information).

        It is enabled by default when compiling with optimization (-Os,
        -O, -O2, ...), debugging information (-g) and the debug info
        format supports it.


 -fvar-tracking-assignments
        Annotate assignments to user variables early in the compilation
        and attempt to carry the annotations over throughout the
        compilation all the way to the end, in an attempt to improve
        debug information while optimizing. Use of -gdwarf-4 is
        recommended along with it.

        It can be enabled even if var-tracking is disabled, in which
        case annotations will be created and maintained, but discarded
        at the end.

Re: i370 port - constructing compile script

2009-11-19 Thread Ulrich Weigand

Paul Edwards wrote:

 gcov-iov creates a gcov-iov.h which has a version number
 which changes when I change MVS versions.  So I am
 thinking of updating gcov-iov.c so that when the target is
 MVS, it generates a more fixed format.

I don't see how the generated number depends on the MVS
version ...  It is supposed to depend solely on the *GCC*
version string of the compiler currently being built.

 gengtype-yacc.c  .h gets created with my new version of bison.
 I just want to use the one that came with 3.4.6 instead of
 having it regenerated.  Do I need to hide my bison to stop
 that from happening?

Well, it's just a make step -- the files will get rebuilt if
and only if the gengtype-yacc.y file is more recent than the
gengtype-yacc.c and .h files.  In the default 3.4.6 tarball
this is not the case.  Did you somehow modify file timestamps
while unpacking / copying the files?

 gencheck.h is being generated as an empty file, which doesn't
 work well on some environments.  I want it to at least have a
 comment saying /* empty file */.  I can put that in as part of
 the build script too.

Well, adding a comment should be trivial at the place in the
Makefile.in where gencheck.h is generated (s-gencheck).

In any case, more recent GCC versions no longer refer to this
file at all.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Thomas Gleixner wrote:

Can the GCC folks please shed some light on this:

standard function start:

 push   %ebp
 mov    %esp, %ebp
 
 call   mcount

modified function start on a handful of functions only seen with gcc
4.4.x on x86 32 bit:

push   %edi
lea    0x8(%esp),%edi
and    $0xfffffff0,%esp
pushl  -0x4(%edi)
push   %ebp
mov    %esp,%ebp
...
call   mcount

This modification leads to a hard to solve problem in the kernel
function graph tracer which assumes that the stack looks like:

   return address
   saved  ebp

With the modified function start sequence this is no longer true and
the manipulation of the return address on the stack fails silently.

Neither gcc 4.3 nor gcc 3.4 generates such function frames, so it
looks like a gcc 4.4.x feature.

There is no real obvious reason why the edi magic needs to be done
_before_ 

push   %ebp
mov    %esp,%ebp

Thanks,

tglx

RE: Understanding IRA

2009-11-19 Thread Ian Bolton

Jeff Law wrote: 
 On 11/16/09 10:33, Ian Bolton wrote:
  The question is: how to fix this?  I have initially put the
 REG_ALLOC_ORDER
  back to how it was and changed the operand constraints in our MD
 file,
  so each of the apathetic instructions, will either take a 't'
 (TOP_CREG)
  or '?b' (BOTTOM_REG).  The '?' shows that this alternative is
 slightly more
  costly than using 't'.  On the benchmark that benefitted the most
 from
  the new REG_ALLOC_ORDER, these constraints are almost achieving the
 same
  thing.  It is only almost there because I am struggling with how to
 show
  two alternatives for loads and stores, which currently have an 'm'
  constraint.
 
 I'm not aware of any way to describe this to IRA.  In theory I guess
 IRA
 could be twiddled to use  TARGET_ADDRESS_COST to derive some kind of
 cost difference based on the registers used in the MEM, but it seems
 rather hackish.

I found somewhere in record_address_reg to achieve what I needed:

  for (k = 0; k < cost_classes_num; k++)
    {
      i = cost_classes[k];
      pp->cost[k]
        += (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;

      /* Slightly nudge memory addresses away from using BOTTOM_REGS and
         C_REGS, so they take TOP_CREGS instead - should this pseudo later
         need BOTTOM_REGS, there will be a higher cost to use TOP_CREGS
         and it will still get BOTTOM_REGS. This is equivalent to adding a
         ?b on each instruction that currently has a 'm' constraint.

         Writing this generically might look something like:

         pp->cost[k] += TARGET_ADDRESS_EXTRA_COST_P(cost_classes[k])
                        ? (scale/2) : 0;
      */
      if (cost_classes[k] == BOTTOM_REGS || cost_classes[k] == C_REGS)
        pp->cost[k] += (scale) / 2;
    }

I was then able to alter all our register-agnostic instructions in our
.md file to take either a 't' for TOP_CREGS or a '?b' for BOTTOM_REGS.
Initial results showed that IRA was moving input arguments out of their
BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
thought TOP_CREGS were less costly to use, despite the cost of the move
instruction to get the input argument into a TOP_CREG.

I wondered if this was because we add 2 to the alt_cost for each '?' and
our REG_MOVE_COST is also 2, but this becomes irrelevant anyway if you
do a lot of work with the input argument, since each potential use in a
BOTTOM_REG incurs some kind of penalty (one per '?' seen) which will
eventually persuade IRA that leaving the input argument in the
BOTTOM_REG it arrived in is more costly than moving it to a TOP_CREG.

I addressed this problem by splitting my register bank a little
differently: instead of making a distinction between BOTTOM_REGS and
TOP_CREGS, I made it so there was only a penalty if you used one of the
non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG).  This meant
that IRA was happy to leave input arguments in their BOTTOM_REGS but
erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had run
out.  This was an improvement, but there was still a case where these
'?' penalties were not aligned with reality:

T1 = A + B; // can use any register, TOP_CREGS appears cheaper
T2 = A - C; // can use any register, TOP_CREGS appears cheaper
T3 = A & D; // must use BOTTOM_REGS

The constraints for the first two instructions show that TOP_CREGS is
cheaper, but then you have to plant a move to get A into a BOTTOM_REG
to do the AND.  In reality, we know it is cheaper to have A in a BOTTOM_REG
all along, but the '?' constraint suggests there is a cost in doing this
for the ADD and SUB, so IRA will put A in a TOP_CREG at first and
incur the cost of the move, because that is still cheaper than the costs I
have defined with my constraints.  I don't believe there is a way to
communicate a conditional cost, so I'm thinking that constraints are not
the solution for me at this time.  What are your thoughts?
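
The arithmetic behind this can be sketched with a deliberately simplified model. It uses only the two numbers mentioned above (2 added to alt_cost per '?', and a REG_MOVE_COST of 2); the function names are hypothetical and this is not IRA's actual cost computation.

```c
/* Toy model of the '?' penalty versus a single register move. */
enum {
    QUESTION_MARK_PENALTY = 2, /* added to alt_cost for each '?' */
    MODEL_REG_MOVE_COST = 2    /* REG_MOVE_COST quoted for this port */
};

/* Accumulated penalty for keeping a value in a BOTTOM_REG across `uses`
   instructions whose BOTTOM_REGS alternative is written as '?b'. */
int cost_of_staying(int uses)
{
    return uses * QUESTION_MARK_PENALTY;
}

/* One move to or from a TOP_CREG; the 't' alternative itself is free. */
int cost_of_moving(void)
{
    return MODEL_REG_MOVE_COST;
}

/* Nonzero when the model considers the move strictly cheaper. */
int move_looks_cheaper(int uses)
{
    return cost_of_moving() < cost_of_staying(uses);
}
```

Under these assumptions the penalty for the ADD and SUB alone (4) already exceeds the cost of one move (2), which matches the behaviour described for T1 to T3: the allocator prefers TOP_CREGS plus a move even though leaving A in a BOTTOM_REG would need no move at all.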

 You might try something like this:
 
1. Crank up the callee-saved register cost adjustment in
 assign_hard_reg so that it's scaled based on REG_FREQ.  That will
 probably lead to some regressions based on my experiments.
 
2. Then add a check that completely avoids the cost adjustment in
 cases where we pushed a MAY_SPILL_P allocno.  This was on my todo list,
 but I haven't got to it yet.
 
 If you wanted to get fancy, you could track the maximum number of
 neighbors in each class as allocnos are pushed and use that to adjust
 how many registers are cost adjusted in assign_hard_reg.  The idea
 being
 the more neighbors the allocno has, the more callee-saved registers
 we're likely to need.
 
 You could also try to account for the fact that once allocated, the
 callee saved register is available for free to non-conflicting
 allocnos.   So you could iterate over those and decrease the penalty
 for
 using a callee saved register on the current allocno.  Given the
 interfaces provided by IRA, this could be compile-time expensive.
 

Your small improvement to

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Haley

Thomas Gleixner wrote:
 On Thu, 19 Nov 2009, Thomas Gleixner wrote:
 
 Can the GCC folks please shed some light on this:
 
 standard function start:
 
push   %ebp
mov    %esp, %ebp

call   mcount
 
 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount
 
 This modification leads to a hard to solve problem in the kernel
 function graph tracer which assumes that the stack looks like:
 
return address
saved  ebp
 
 With the modified function start sequence this is no longer true and
 the manipulation of the return address on the stack fails silently.
 
 Neither gcc 4.3 nor gcc 3.4 generates such function frames, so it
 looks like a gcc 4.4.x feature.
 
 There is no real obvious reason why the edi magic needs to be done
 _before_ 
 
   push   %ebp
   mov    %esp,%ebp

Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.

We're aligning the stack properly, as per the ABI requirements.  Can't
you just fix the tracer?
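
The effect of the ordering can be sketched by replaying the stack-pointer arithmetic of the two prologues. This is a toy model under stated assumptions: the function names are hypothetical, and the `and` is taken to clear the low four bits of %esp, as the 16-byte alignment discussed here implies.

```c
#include <stdint.h>

/* Value of %ebp after the standard prologue:
   push %ebp; mov %esp,%ebp */
uint32_t standard_frame_base(uint32_t esp_at_entry)
{
    return esp_at_entry - 4;     /* push %ebp */
}

/* Value of %ebp after the realigning prologue quoted above. */
uint32_t realigned_frame_base(uint32_t esp_at_entry)
{
    uint32_t esp = esp_at_entry;

    esp -= 4;                    /* push %edi */
    /* lea 0x8(%esp),%edi only sets %edi; %esp is unchanged */
    esp &= 0xfffffff0u;          /* and: round %esp down to 16 bytes */
    esp -= 4;                    /* pushl -0x4(%edi): copy of return address */
    esp -= 4;                    /* push %ebp */
    return esp;                  /* mov %esp,%ebp */
}
```

In this model the realigned frame base always lands at the same offset from a 16-byte boundary, whatever the incoming %esp was, so %ebp-relative locals can be given a known alignment. With the standard prologue the offset depends on the caller.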

Andrew.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
 
 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount
 

The real question is why we're aligning the stack in the kernel.  It is
probably not what we want -- we don't use SSE for anything but a handful
of special cases in the kernel, and we don't want the overhead.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Richard Guenther

On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin h...@zytor.com wrote:
 On 11/19/2009 07:37 AM, Thomas Gleixner wrote:

 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:

       push   %edi
       lea    0x8(%esp),%edi
       and    $0xfffffff0,%esp
       pushl  -0x4(%edi)
       push   %ebp
       mov    %esp,%ebp
       ...
       call   mcount


 The real question is why we're aligning the stack in the kernel.  It is
 probably not what we want -- we don't use SSE for anything but a handful
 of special cases in the kernel, and we don't want the overhead.

It's likely because you have long long vars on the stack, which are
faster when they are aligned.  -mno-stackrealign may do what you
want (or may not, I have not checked).  I assume you already
use -mpreferred-stack-boundary=2.

Richard.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Richard Guenther

On Thu, Nov 19, 2009 at 4:49 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin h...@zytor.com wrote:
 On 11/19/2009 07:37 AM, Thomas Gleixner wrote:

 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:

       push   %edi
       lea    0x8(%esp),%edi
       and    $0xfffffff0,%esp
       pushl  -0x4(%edi)
       push   %ebp
       mov    %esp,%ebp
       ...
       call   mcount


 The real question is why we're aligning the stack in the kernel.  It is
 probably not what we want -- we don't use SSE for anything but a handful
 of special cases in the kernel, and we don't want the overhead.

 It's likely because you have long long vars on the stack, which are
 faster when they are aligned.  -mno-stackrealign may do what you
 want (or may not, I have not checked).  I assume you already
 use -mpreferred-stack-boundary=2.

Just checking: it seems you must be using -mincoming-stack-boundary=2
instead, while keeping the preferred stack boundary at 4.
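
For readers unfamiliar with these options: the numeric argument is an exponent, not a byte count. A minimal sketch of the conversion, with a hypothetical helper name:

```c
/* Alignment in bytes requested by -mpreferred-stack-boundary=N or
   -mincoming-stack-boundary=N: the boundary is 2 raised to the power N. */
unsigned stack_boundary_bytes(unsigned n)
{
    return 1u << n;
}
```

So a boundary of 2 means 4-byte alignment, and a boundary of 4 means 16-byte alignment.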

Richard.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 07:44 AM, Andrew Haley wrote:
 
 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?
 

Per the ABI requirements?  We're talking 32 bits, here.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Richard Guenther

On Thu, Nov 19, 2009 at 4:54 PM, H. Peter Anvin h...@zytor.com wrote:
 On 11/19/2009 07:44 AM, Andrew Haley wrote:

 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?


 Per the ABI requirements?  We're talking 32 bits, here.

Hm, even with

void bar (int *);
void foo (void)
{
  int x;
  bar (&x);
}

gcc -S -O2 -m32 -mincoming-stack-boundary=2 t.c

we re-align the stack.  That looks indeed bogus.

HJ, you invented all this code, what's the reason for the above?

Richard.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
 Thomas Gleixner wrote:

 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?

And how do we do that? The hooks that are in place have no idea of what
happened before they were called?

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Andrew Haley wrote:
 Thomas Gleixner wrote:
  There is no real obvious reason why the edi magic needs to be done
  _before_ 
  
  push   %ebp
  mov    %esp,%ebp
 
 Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.

And why is this not done in 99% of the functions in the kernel, just
in this one and some random others ?
 
 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?

Where is that ABI requirement that

push   %ebp

needs to happen on an aligned stack ? 

And why is this something GCC did not care about until GCC4.4 ?

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 08:02 AM, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
 Thomas Gleixner wrote:
 
 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?
 
 And how do we do that? The hooks that are in place have no idea of what
 happened before they were called?
 

Furthermore, it is nonsense -- ABI stack alignment on *32 bits* is 4
bytes, not 16.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:

 We're aligning the stack properly, as per the ABI requirements.  Can't
 you just fix the tracer?

Unfortunately, this is the only fix we have:

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER
         bool "Kernel Function Graph Tracer"
         depends on HAVE_FUNCTION_GRAPH_TRACER
         depends on FUNCTION_TRACER
-        depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
         default y
         help
           Enable the kernel to trace a function at both its return
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 45e6c01..50c2251 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -1070,6 +1070,11 @@ static __init int init_graph_trace(void)
 {
         max_bytes_for_cpu = snprintf(NULL, 0, "%d", nr_cpu_ids - 1);
 
+#if defined(CONFIG_X86_32) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 4
+        pr_info("WARNING: GCC 4.4.X breaks the function graph tracer on i686.\n"
+                "The function graph tracer will be disabled.\n");
+        return -1;
+#endif
         return register_tracer(&graph_trace);
 }
 
-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Haley

Thomas Gleixner wrote:
 On Thu, 19 Nov 2009, Andrew Haley wrote:
 Thomas Gleixner wrote:
 There is no real obvious reason why the edi magic needs to be done
 _before_ 

 push   %ebp
 mov    %esp,%ebp
 Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
 
 And why is this not done in 99% of the functions in the kernel, just
 in this one and some random others ?

If I could see the function I might be able to tell you.  It's either a
performance enhancement, something to do with SSE, or it's a bug.

Andrew.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Frederic Weisbecker

On Thu, Nov 19, 2009 at 11:02:32AM -0500, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
  Thomas Gleixner wrote:
 
  We're aligning the stack properly, as per the ABI requirements.  Can't
  you just fix the tracer?
 
 And how do we do that? The hooks that are in place have no idea of what
 happened before they were called?
 
 -- Steve


Yep, this is really something we can't fix from the tracer

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Andrew Haley wrote:

 Thomas Gleixner wrote:
  On Thu, 19 Nov 2009, Andrew Haley wrote:
  Thomas Gleixner wrote:
  There is no real obvious reason why the edi magic needs to be done
  _before_ 
 
push   %ebp
mov    %esp,%ebp
  Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
  
  And why is this not done in 99% of the functions in the kernel, just
  in this one and some random others ?
 
 If I could see the function I might be able to tell you.  It's either a
 performance enhancement, something to do with SSE, or it's a bug.

kernel/time/timer_stats.c timer_stats_update_stats()

Here is the disassembly:

8107ad50 <timer_stats_update_stats>:
8107ad50:   57                      push   %edi
8107ad51:   8d 7c 24 08             lea    0x8(%esp),%edi
8107ad55:   83 e4 f0                and    $0xfffffff0,%esp
8107ad58:   ff 77 fc                pushl  -0x4(%edi)
8107ad5b:   55                      push   %ebp
8107ad5c:   89 e5                   mov    %esp,%ebp
8107ad5e:   57                      push   %edi
8107ad5f:   56                      push   %esi
8107ad60:   53                      push   %ebx
8107ad61:   83 ec 6c                sub    $0x6c,%esp
8107ad64:   e8 47 92 f8 ff          call   81003fb0 <mcount>
8107ad69:   8b 77 04                mov    0x4(%edi),%esi
8107ad6c:   89 75 a4                mov    %esi,-0x5c(%ebp)
8107ad6f:   65 8b 35 14 00 00 00    mov    %gs:0x14,%esi
8107ad76:   89 75 e4                mov    %esi,-0x1c(%ebp)
8107ad79:   31 f6                   xor    %esi,%esi
8107ad7b:   8b 35 60 5a cd 81       mov    0x81cd5a60,%esi
8107ad81:   8b 1f                   mov    (%edi),%ebx
8107ad83:   85 f6                   test   %esi,%esi
8107ad85:   8b 7f 08                mov    0x8(%edi),%edi
8107ad88:   75 18                   jne    8107ada2 <timer_stats_update_stats+0x52>
8107ad8a:   8b 45 e4                mov    -0x1c(%ebp),%eax
8107ad8d:   65 33 05 14 00 00 00    xor    %gs:0x14,%eax
8107ad94:   75 53                   jne    8107ade9 <timer_stats_update_stats+0x99>
8107ad96:   83 c4 6c                add    $0x6c,%esp
8107ad99:   5b                      pop    %ebx
8107ad9a:   5e                      pop    %esi
8107ad9b:   5f                      pop    %edi
8107ad9c:   5d                      pop    %ebp
8107ad9d:   8d 67 f8                lea    -0x8(%edi),%esp
8107ada0:   5f                      pop    %edi
8107ada1:   c3                      ret
8107ada2:   be 00 7a d6 81          mov    $0x81d67a00,%esi
8107ada7:   89 45 ac                mov    %eax,-0x54(%ebp)
8107adaa:   89 75 a0                mov    %esi,-0x60(%ebp)
8107adad:   89 5d b4                mov    %ebx,-0x4c(%ebp)
8107adb0:   64 8b 35 78 6a d6 81    mov    %fs:0x81d66a78,%esi
8107adb7:   8b 34 b5 20 50 cd 81    mov    -0x7e32afe0(,%esi,4),%esi
8107adbe:   89 4d b0                mov    %ecx,-0x50(%ebp)
8107adc1:   01 75 a0                add    %esi,-0x60(%ebp)
8107adc4:   89 55 b8                mov    %edx,-0x48(%ebp)
8107adc7:   8b 45 a0                mov    -0x60(%ebp),%eax
8107adca:   89 7d c0                mov    %edi,-0x40(%ebp)
8107adcd:   e8 de f7 76 00          call   817ea5b0 <_spin_lock_irqsave>
8107add2:   83 3d 60 5a cd 81 00    cmpl   $0x0,0x81cd5a60
8107add9:   89 c3                   mov    %eax,%ebx
8107addb:   75 11                   jne    8107adee <timer_stats_update_stats+0x9e>
8107addd:   89 da                   mov    %ebx,%edx
8107addf:   8b 45 a0                mov    -0x60(%ebp),%eax
8107ade2:   e8 79 fc 76 00          call   817eaa60 <_spin_unlock_irqrestore>
8107ade7:   eb a1                   jmp    8107ad8a <timer_stats_update_stats+0x3a>
8107ade9:   e8 52 e4 fc ff          call   81049240 <__stack_chk_fail>
8107adee:   8d 45 a8                lea    -0x58(%ebp),%eax
8107adf1:   8b 55 a4                mov    -0x5c(%ebp),%edx
8107adf4:   e8 f7 fd ff ff          call   8107abf0 <tstat_lookup>
8107adf9:   85 c0                   test   %eax,%eax
8107adfb:   74 05                   je     8107ae02 <timer_stats_update_stats+0xb2>
8107adfd:   ff 40 14                incl   0x14(%eax)
8107ae00:   eb db                   jmp    8107addd <timer_stats_update_stats+0x8d>
8107ae02:   f0 ff 05 00 67 fd 81    lock incl 0x81fd6700
8107ae09:   eb d2                   jmp    8107addd <timer_stats_update_stats+0x8d>
8107ae0b:   90                      nop
8107ae0c:   90                      nop
8107ae0d:   90                      nop
8107ae0e:   90                      nop
8107ae0f:   90                      nop


There is a dozen more of those.

Thanks,

tglx

with rev. 154329 ppl doesn't build anymore

2009-11-19 Thread Rainer Emrich

This is with gcc-trunk rev. 154329
build=x86_64-w64-mingw32
ppl-0.10.2

Used to work until yesterday, now:

/bin/sh ../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.
-I../../ppl-0.10.2/src -I..  -I.. -I../../ppl-0.10.2/src
-I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include  -g -O2
-frounding-math  -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c -o Box.lo
../../ppl-0.10.2/src/Box.cc
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../ppl-0.10.2/src -I.. -I..
-I../../ppl-0.10.2/src
-I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include -g -O2
-frounding-math -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c
../../ppl-0.10.2/src/Box.cc -o Box.o
In file included from ../../ppl-0.10.2/src/Row.defs.hh:504:0,
 from ../../ppl-0.10.2/src/Linear_Row.defs.hh:28,
 from ../../ppl-0.10.2/src/Constraint.defs.hh:28,
 from ../../ppl-0.10.2/src/Box.defs.hh:33,
 from ../../ppl-0.10.2/src/Box.cc:24:
../../ppl-0.10.2/src/Row.inlines.hh: In member function 'void
Parma_Polyhedra_Library::Row::allocate(Parma_Polyhedra_Library::dimension_type,
Parma_Polyhedra_Library::Row::Flags)':
../../ppl-0.10.2/src/Row.inlines.hh:92:1: error: non-placement deallocation
function 'static void Parma_Polyhedra_Library::Row_Impl_Handler::Impl::operator
delete(void*, Parma_Polyhedra_Library::dimension_type)'
../../ppl-0.10.2/src/Row.inlines.hh:224:31: error: selected for placement delete

some issue with placement, non-placement deallocation.

Cheers,
Rainer

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andi Kleen

Richard Guenther richard.guent...@gmail.com writes:

 It's likely because you have long long vars on the stack which is
 faster when they are aligned.

It's not faster for 32bit.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Thomas Gleixner wrote:
 
 standard function start:
 
push   %ebp
mov    %esp, %ebp

call   mcount
 
 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount

That's some crazy sh*t anyway, since we don't _want_ the stack to be 
16-byte aligned in the kernel. We do

KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)

why is that not working?

So this looks like a gcc bug, plain and simple.

 This modification leads to a hard to solve problem in the kernel
 function graph tracer which assumes that the stack looks like:
 
return address
saved  ebp

Umm. But it still does, doesn't it? That

pushl  -0x4(%edi)
push   %ebp

should do it - the -0x4(%edi) thing seems to be trying to reload the 
return address. No? 

Maybe I misread the code - but regardless, it does look like a gcc code 
generation bug if only because we really don't want a 16-byte aligned 
stack anyway, and have asked for it to not be done.

So I agree that gcc shouldn't do that crazy prologue (and certainly _not_ 
before calling mcount anyway), but I'm not sure I agree with that detail 
of your analysis or explanation.

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Linus Torvalds wrote:
 Umm. But it still does, doesn't it? That
 
   pushl  -0x4(%edi)
   push   %ebp
 
 should do it - the -0x4(%edi) thing seems to be trying to reload the 
 return address. No? 
 
 Maybe I misread the code - but regardless, it does look like a gcc code 
 generation bug if only because we really don't want a 16-byte aligned 
 stack anyway, and have asked for it to not be done.
 
 So I agree that gcc shouldn't do that crazy prologue (and certainly _not_ 
 before calling mcount anyway), but I'm not sure I agree with that detail 
 of your analysis or explanation.

Yes, it does store the return address before the pushed ebp, but this
is a copy of the real stack entry which is before the pushed edi.

The function graph tracer needs to redirect the return into the tracer
and it therefore saves the real return address and modifies the stack
so the return ends up in the tracer code which then goes back to the
real return address.

But in this prologue/alignment case we modify the copy and not the real
return address on the stack, so we return without calling into the
tracer which is causing the headache because the state of the tracer
becomes confused.
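
The failure mode can be sketched with a toy model of the stack. Everything here is hypothetical: the slot names, the address constants and the function are made up for illustration, the alignment gap is left out, and this is not the tracer's real code.

```c
/* Word-sized stack slots in this toy model, listed from higher to lower
   addresses. */
enum { REAL_RET, SAVED_EDI, COPY_RET, SAVED_EBP, SLOT_COUNT };

#define CALLER_ADDR 0x1000u /* where the traced function should return */
#define TRACER_ADDR 0x2000u /* return hook installed by the tracer */

/* Returns the address the traced function ends up returning to. */
unsigned address_returned_to(int realigned_prologue)
{
    unsigned stack[SLOT_COUNT] = { 0 };
    int slot_above_saved_ebp;

    stack[REAL_RET] = CALLER_ADDR;          /* pushed by the call */

    if (realigned_prologue) {
        stack[COPY_RET] = stack[REAL_RET];  /* pushl -0x4(%edi) */
        slot_above_saved_ebp = COPY_RET;
    } else {
        slot_above_saved_ebp = REAL_RET;    /* push %ebp follows directly */
    }

    /* The tracer overwrites the word it finds next to the saved %ebp. */
    stack[slot_above_saved_ebp] = TRACER_ADDR;

    /* The epilogue restores %esp first, so ret pops the real slot. */
    return stack[REAL_RET];
}
```

With the standard prologue the function returns into the tracer hook; with the realigning prologue only the copy is patched and the function returns straight to its caller, which is the silent failure described above.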

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:

  This modification leads to a hard to solve problem in the kernel
  function graph tracer which assumes that the stack looks like:
  
 return address
 saved  ebp
 
 Umm. But it still does, doesn't it? That
 
   pushl  -0x4(%edi)
   push   %ebp
 
 should do it - the -0x4(%edi) thing seems to be trying to reload the 
 return address. No? 

Yes that is what it is doing. The problem we have is that it is putting
into the frame pointer a copy of the return address, and not the
actual pointer. Which is fine for the function tracer, but breaks the
function graph tracer (which is a much more powerful tracer).

Technically, this is all that mcount must have. And yes, we are making
an assumption that the return address in the frame pointer is the one
that will be used to leave the function. But the reason for making this
copy just seems to be all messed up.

I don't know if the ABI says anything about the return address in the
frame pointer must be the actual return address. But it would be nice if
the gcc folks would let us guarantee that it is.

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Richard Guenther

On Thu, Nov 19, 2009 at 6:59 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:

  This modification leads to a hard to solve problem in the kernel
  function graph tracer which assumes that the stack looks like:
 
         return address
         saved  ebp

 Umm. But it still does, doesn't it? That

       pushl  -0x4(%edi)
       push   %ebp

 should do it - the -0x4(%edi) thing seems to be trying to reload the
 return address. No?

 Yes that is what it is doing. The problem we have is that it is putting
 into the frame pointer a copy of the return address, and not the
 actual pointer. Which is fine for the function tracer, but breaks the
 function graph tracer (which is a much more powerful tracer).

 Technically, this is all that mcount must have. And yes, we are making
 an assumption that the return address in the frame pointer is the one
 that will be used to leave the function. But the reason for making this
 copy just seems to be all messed up.

 I don't know if the ABI says anything about the return address in the
 frame pointer must be the actual return address. But it would be nice if
 the gcc folks would let us guarantee that it is.

Note that I can only reproduce the issue with
-mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
And you didn't provide us with a testcase either ... so please open
a bugzilla and attach preprocessed source of a file that
shows the problem, note the function it happens in and provide
the command-line options you used for building.

Otherwise it's going to be all speculation on our side.

Thanks,
Richard.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Haley

Thomas Gleixner wrote:
 On Thu, 19 Nov 2009, Thomas Gleixner wrote:
 
 Can the GCC folks please shed some light on this:
 
 standard function start:
 
push   %ebp
mov    %esp, %ebp

call   mcount
 
 modified function start on a handful of functions only seen with gcc
 4.4.x on x86 32 bit:
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount
 
 This modification leads to a hard to solve problem in the kernel
 function graph tracer which assumes that the stack looks like:
 
return address
saved  ebp
 
 With the modified function start sequence this is no longer true and
 the manipulation of the return address on the stack fails silently.
 
 Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it
 looks like a gcc 4.4.x feature.
 
 There is no real obvious reason why the edi magic needs to be done
 _before_ 
 
   push   %ebp
   mov    %esp,%ebp

OK, I found it.  There is a struct defined as

struct entry {
 ...
} __attribute__((__aligned__((1 << (4)))));

and then in timer_stats_update_stats you have a local variable of type
struct entry:

void timer_stats_update_stats()
{
 spinlock_t *lock;
 struct entry *entry, input;

So, gcc has to 16-align the stack pointer to satisfy the alignment
for struct entry.

Andrew.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Haley

Richard Guenther wrote:

 And
 you didn't provide us with a testcase either ... so please open
 a bugzilla and attach preprocessed source of a file that
 shows the problem, note the function it happens in and provide
 the command-line options you used for building.

I've got all that off-list.  I found the cause, and replied in another
email.  It's not a bug.

Andrew.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:

 OK, I found it.  There is a struct defined as
 
 struct entry {
  ...
 } __attribute__((__aligned__((1 << (4)))));
 
 and then in timer_stats_update_stats you have a local variable of type
 struct entry:
 
 void timer_stats_update_stats()
 {
  spinlock_t *lock;
  struct entry *entry, input;
 
 So, gcc has to 16-align the stack pointer to satisfy the alignment
 for struct entry.

It has to align the entire stack? Why not just the variable within the
stack?

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Richard Guenther wrote:
 Note that I only can reproduce the issue with
 -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
 And
 you didn't provide us with a testcase either ... so please open
 a bugzilla and attach preprocessed source of a file that
 shows the problem, note the function it happens in and provide
 the command-line options you used for building.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Pinski

On Thu, Nov 19, 2009 at 10:33 AM, Steven Rostedt rost...@goodmis.org wrote:
 It has to align the entire stack? Why not just the variable within the
 stack?

I had proposed a patch which just aligns the variable but that patch
was never really commented on and HJL's patches to realign the whole
stack went in afterwards.

Thanks,
Andrew Pinski

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Andrew Haley

Steven Rostedt wrote:
 On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:
 
 OK, I found it.  There is a struct defined as

 struct entry {
  ...
 } __attribute__((__aligned__((1 << (4)))));

 and then in timer_stats_update_stats you have a local variable of type
 struct entry:

 void timer_stats_update_stats()
 {
  spinlock_t *lock;
  struct entry *entry, input;

 So, gcc has to 16-align the stack pointer to satisfy the alignment
 for struct entry.
 
 It has to align the entire stack? Why not just the variable within the
 stack?

How? gcc has to know, at compile time, the offset from sp of each variable.
So, it of course makes sure that offset is 16-aligned, but it also has to
16-align the stack pointer.

Andrew.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 10:33 AM, Steven Rostedt wrote:
 
 It has to align the entire stack? Why not just the variable within the
 stack?
 

Because if the stack pointer isn't aligned, it won't know where it can
stuff the variable.  It has to pad *somewhere*, and since you may have
more than one such variable, the most efficient way -- and by far least
complex -- is for the compiler to align the stack when it sets up the
stack frame.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Andrew Haley wrote:
 OK, I found it.  There is a struct defined as
 
 struct entry {
  ...
 } __attribute__((__aligned__((1 << (4)))));
 
 and then in timer_stats_update_stats you have a local variable of type
 struct entry:
 
 void timer_stats_update_stats()
 {
  spinlock_t *lock;
  struct entry *entry, input;
 
 So, gcc has to 16-align the stack pointer to satisfy the alignment
 for struct entry.

This does not explain why GCC < 4.4.x actually puts

 push %ebp
 mov  %esp, %ebp

first and why GCC 4.4.x decides to create an extra copy of the return
address instead of just keeping the mcount stack magic right at the
function entry.

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Richard Guenther wrote:
 
 Note that I only can reproduce the issue with
 -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.

Since you can reproduce it with -mincoming-stack-boundary=2, I would 
suggest just fixing mcount handling that way regardless of anything else. 
The current code generated by gcc is just insane - even for the case where 
you _want_ 16-byte stack alignment.

Instead of crazy code like

   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount

the sane thing to do would be to just do it as

   push   %ebp
   mov    %esp,%ebp
   call   mcount
   and    $0xfffffff0,%esp

since

 - no sane 'mcount' implementation can ever care about 16-byte stack 
   alignment anyway, so aligning the stack before mcount is crazy.

 - mcount is special anyway, and the only thing that cares about that 
   whole ebp/return address thing is mcount, and _all_ your games with 
   %edi are about that mcount thing.

IOW, once you as a compiler person understand that the 'mcount' call is 
special, you should have realized that all the work you did for it was 
totally pointless and stupid to begin with. 

You must already have that special mcount logic (the whole code to save a 
register early and push the fake mcount stack frame), so instead of _that_ 
special logic, change it to a different mcount special logic that 
associates the 'mcount' call with the frame pointer pushing. 

That will not only make the Linux kernel tracer happy, it will make all 
your _other_ users happier too, since you can generate smaller and more 
efficient code.

Admittedly, anybody who compiles with -pg probably doesn't care deeply 
about smaller and more efficient code, since the mcount call overhead 
tends to make the thing moot anyway, but it really looks like a win-win 
situation to just fix the mcount call sequence regardless.

 And you didn't provide us with a testcase either ... so please open a 
 bugzilla and attach preprocessed source of a file that shows the 
 problem, note the function it happens in and provide the command-line 
 options you used for building.
 
 Otherwise it's going to be all speculation on our side.

See above - all you need to do is to just fix mcount calling.

Now, there is a separate bug that shows that you seem to over-align the 
stack when not asked for, and yes, since we noticed that I hope that 
Thomas and friends will fix that, but I think your mcount logic could (and 
should) be fixed as an independent silliness.

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Andrew Haley wrote:
 
 I've got all that off-list.  I found the cause, and replied in another
 email.  It's not a bug.

Oh Gods, are we back to gcc people saying "sure, we do stupid things, but 
it's allowed, so we don't consider it a bug because it doesn't matter that 
real people care about real life, we only care about some paper, and real 
life doesn't matter, if it's 'undefined' we can make our idiotic choices 
regardless of what people need, and regardless of whether it actually 
generates better code or not"?

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Linus Torvalds wrote:
 
 Oh Gods, are we back to gcc people saying sure, we do stupid things, but 
 it's allowed, so we don't consider it a bug because it doesn't matter that 
 real people care about real life, we only care about some paper, and real 
 life doesn't matter, if it's 'undefined' we can make our idiotic choices 
 regardless of what people need, and regardless of whether it actually 
 generates better code or not.

Put another way: the stack alignment itself may not be a bug, but gcc 
generating God-awful code for the mcount handling that results in problems 
in real life sure as hell is *stupid* enough to be called a bug.

I bet other people than just the kernel use the mcount hook for subtler 
things than just doing profiles. And even if they don't, the quoted code 
generation is just crazy _crap_.

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Ingo Molnar


* Linus Torvalds torva...@linux-foundation.org wrote:

 Admittedly, anybody who compiles with -pg probably doesn't care deeply 
 about smaller and more efficient code, since the mcount call overhead 
 tends to make the thing moot anyway, but it really looks like a 
 win-win situation to just fix the mcount call sequence regardless.

Just a sidenote: due to dyn-ftrace, which patches out all mcounts during 
bootup to be NOPs (and opt-in patches them in again if someone runs the 
function tracer), the cost is not as large as one would have it with say 
-pg based user-space profiling.

It's not completely zero-cost as the pure NOPs balloon the i$ footprint 
a bit and GCC generates different code too in some cases. But it's 
certainly good enough that it's generally pretty hard to prove, via 
micro or macro benchmarks, that the patched-out mcount call sites 
are there.

Ingo

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Linus Torvalds wrote:
 
 I bet other people than just the kernel use the mcount hook for subtler 
 things than just doing profiles. And even if they don't, the quoted code 
 generation is just crazy _crap_.

For the kernel, if the only case is that timer_stat.c thing that Thomas 
pointed at, I guess we can at least work around it with something like the 
appended. The kernel code is certainly ugly too, no question about that. 

It's just that we'd like to be able to depend on mcount code generation 
not being insane even in the presence of ugly code..

The alternative would be to have some warning when this happens, so that 
we can at least see it. mcount won't work reliably

Linus

---
 kernel/time/timer_stats.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
index ee5681f..488c7b8 100644
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -76,7 +76,7 @@ struct entry {
 	 */
 	char			comm[TASK_COMM_LEN + 1];
 
-} ____cacheline_aligned_in_smp;
+};
 
 /*
  * Spinlock protecting the tables - not taken during lookup:
@@ -114,7 +114,7 @@ static ktime_t time_start, time_stop;
 #define MAX_ENTRIES		(1UL << MAX_ENTRIES_BITS)
 
 static unsigned long nr_entries;
-static struct entry entries[MAX_ENTRIES];
+static struct entry entries[MAX_ENTRIES] ____cacheline_aligned_in_smp;
 
 static atomic_t overflow_count;
 
 static atomic_t overflow_count;

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Linus Torvalds wrote:
  I bet other people than just the kernel use the mcount hook for subtler 
  things than just doing profiles. And even if they don't, the quoted code 
  generation is just crazy _crap_.
 
 For the kernel, if the only case is that timer_stat.c thing that Thomas 
 pointed at, I guess we can at least work around it with something like the 
 appended. The kernel code is certainly ugly too, no question about that. 
 
 It's just that we'd like to be able to depend on mcount code generation 
 not being insane even in the presence of ugly code..
 
 The alternative would be to have some warning when this happens, so that 
 we can at least see it. mcount won't work reliably

There are at least 20 other random functions which have the same
problem. Have not looked at the details yet.

Just compiled with -mincoming-stack-boundary=4 and the problem goes
away as gcc now thinks that the incoming stack is already 16 byte
aligned. But that might break code which actually uses SSE.

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
 * Linus Torvalds torva...@linux-foundation.org wrote:
 
  Admittedly, anybody who compiles with -pg probably doesn't care deeply 
  about smaller and more efficient code, since the mcount call overhead 
  tends to make the thing moot anyway, but it really looks like a 
  win-win situation to just fix the mcount call sequence regardless.
 
 Just a sidenote: due to dyn-ftrace, which patches out all mcounts during 
 bootup to be NOPs (and opt-in patches them in again if someone runs the 
 function tracer), the cost is not as large as one would have it with say 
 -pg based user-space profiling.
 
 It's not completely zero-cost as the pure NOPs balloon the i$ footprint 
 a bit and GCC generates different code too in some cases. But it's 
 certainly good enough that it's generally pretty hard to prove overhead 
 via micro or macro benchmarks that the patched out mcounts call sites 
 are there.

And frame pointers do add a little overhead as well. Too bad the mcount
ABI wasn't something like this:


function:
        call    mcount
        [...]

This way, the function address for mcount would have been (%esp) and the
parent address would be 4(%esp). Mcount would work without frame
pointers and this whole mess would also become moot.

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread David Daney


Linus Torvalds wrote:


On Thu, 19 Nov 2009, Linus Torvalds wrote:
I bet other people than just the kernel use the mcount hook for subtler 
things than just doing profiles. And even if they don't, the quoted code 
generation is just crazy _crap_.


For the kernel, if the only case is that timer_stat.c thing that Thomas 
pointed at, I guess we can at least work around it with something like the 
appended. The kernel code is certainly ugly too, no question about that. 

It's just that we'd like to be able to depend on mcount code generation 
not being insane even in the presence of ugly code..


The alternative would be to have some warning when this happens, so that 
we can at least see it. mcount won't work reliably




For the MIPS port of GCC and Linux I recently added the 
-mmcount-ra-address switch.  It causes the location of the return 
address (on the stack) to be passed to mcount in a scratch register.


Perhaps something similar could be done for x86.  It would make this 
patching of the return location more reliable at the expense of more 
code at the mcount invocation site.


For the MIPS case the code size doesn't increase, as it is done in the 
delay slot of the call instruction, which would otherwise be a nop.


David Daney

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Frederic Weisbecker

On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
  Linus Torvalds wrote:
 
  For the MIPS port of GCC and Linux I recently added the 
  -mmcount-ra-address switch.  It causes the location of the return 
  address (on the stack) to be passed to mcount in a scratch register.
 
 Hehe, scratch register on i686 ;-)
 
 i686 has no extra regs. It just has:
 
 %eax, %ebx, %ecx, %edx - as the general purpose regs
 %esp - stack
 %ebp - frame pointer
 %edi, %esi - counter regs
 
 That's just 8 regs, and half of those are special.
 
  
  Perhaps something similar could be done for x86.  It would make this 
  patching of the return location more reliable at the expense of more 
  code at the mcount invocation site.
 
 I rather not put any more code in the call site.
 
  
  For the MIPS case the code size doesn't increase, as it is done in the 
  delay slot of the call instruction, which would otherwise be a nop.
 
 I showed in a previous post what the best would be for x86. That is just
 calling mcount at the very beginning of the function. The return address
 is automatically pushed onto the stack.
 Perhaps we could create another profiler? Instead of calling mcount,
 call a new function: __fentry__ or something. Have it activated with
 another switch. This could make the performance of the function tracer
 even better without all these exceptions.
 
   function:
   call __fentry__
   [...]
 
   
 -- Steve


I would really like this, so that we can forget about further possible
surprises from sophisticated function prologues being placed before
the mcount call.

And I guess that would fix it on every arch.

That said, Linus had a good point: there might be other uses of mcount
outside the kernel, even trickier than what the function graph tracer
does, and those may depend on the strict ABI assumption that 4(ebp) is
always the _real_ return address, all the way up the call stack. This
concern even extends beyond the mcount case alone.

So I wonder whether the real problem is actually the lack of something
that could provide this guarantee. We may need a -real-ra-before-fp
(yeah, I suck at naming).

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Ingo Molnar


* Steven Rostedt rost...@goodmis.org wrote:

 On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
  * Linus Torvalds torva...@linux-foundation.org wrote:
  
   Admittedly, anybody who compiles with -pg probably doesn't care deeply 
   about smaller and more efficient code, since the mcount call overhead 
   tends to make the thing moot anyway, but it really looks like a 
   win-win situation to just fix the mcount call sequence regardless.
  
  Just a sidenote: due to dyn-ftrace, which patches out all mcounts during 
  bootup to be NOPs (and opt-in patches them in again if someone runs the 
  function tracer), the cost is not as large as one would have it with say 
  -pg based user-space profiling.
  
  It's not completely zero-cost as the pure NOPs balloon the i$ footprint 
  a bit and GCC generates different code too in some cases. But it's 
  certainly good enough that it's generally pretty hard to prove overhead 
  via micro or macro benchmarks that the patched out mcounts call sites 
  are there.
 
 And frame pointers do add a little overhead as well. Too bad the mcount
 ABI wasn't something like this:
 
 
   function:
   call    mcount
   [...]
 
 This way, the function address for mcount would have been (%esp) and 
 the parent address would be 4(%esp). Mcount would work without frame 
 pointers and this whole mess would also become moot.

In that case we could also fix up static callsites to this address as 
well (to jump +5 bytes into the function) and avoid the NOP as well in 
most cases. (That would in essence merge any slow-path function epilogue 
with the mcount call instruction in terms of I$ footprint - i.e. it would 
be an even lower overhead feature.)

If only the kernel had its own compiler.

Ingo

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 11:28 AM, Steven Rostedt wrote:
 
 Hehe, scratch register on i686 ;-)
 
 i686 has no extra regs. It just has:
 
 %eax, %ebx, %ecx, %edx - as the general purpose regs
 %esp - stack
 %ebp - frame pointer
 %edi, %esi - counter regs
 
 That's just 8 regs, and half of those are special.
 

For a modern ABI it is better described as:

%eax, %edx, %ecx    - argument/return/scratch registers
%ebx, %esi, %edi    - saved registers
%esp                - stack pointer
%ebp                - frame pointer (saved)

 Perhaps we could create another profiler? Instead of calling mcount,
 call a new function: __fentry__ or something. Have it activated with
 another switch. This could make the performance of the function tracer
 even better without all these exceptions.
 
   function:
   call __fentry__
   [...]
 

Calling the profiler immediately at the entry point is clearly the more
sane option.  It means the ABI is well-defined, stable, and independent
of what the actual function contents are.  It means that ABI isn't the
normal C ABI (the __fentry__ function would have to preserve all
registers), but that's fine...

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Kai Tietz

2009/11/19 Frederic Weisbecker fweis...@gmail.com:
 I would really like this. So that we can forget about other possible
 further surprises due to sophisticated function prologues being before
 the mcount call.

 And I guess that would fix it in every archs.

My 5 cent for this, too.

 That said, Linus had a good point about the fact there might other uses
 of mcount even more tricky than what does the function graph tracer,
 outside the kernel, and those may depend on the strict ABI assumption
 that 4(ebp) is always the _real_ return address, and that through all
 the previous stack call. This is even a concern that extrapolates the
 single mcount case.

 So I wonder that actually the real problem is the lack of something that
 could provide this guarantee. We may need a -real-ra-before-fp (yeah
 I suck in naming).

There are, especially in the Windows world. We noticed, for example,
that Sun's JDK (which is compiled with VC) can be used from gcc-compiled
code only with -fno-omit-frame-pointer; otherwise it fails badly
because of wrong ebp accesses.

Kai

-- 
|  (\_/) This is Bunny. Copy and paste
| (='.'=) Bunny into your signature to help
| ()_() him gain world domination

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Frederic Weisbecker

On Thu, Nov 19, 2009 at 08:54:56PM +0100, Kai Tietz wrote:
 2009/11/19 Frederic Weisbecker fweis...@gmail.com:
  I would really like this. So that we can forget about other possible
  further surprises due to sophisticated function prologues being before
  the mcount call.
 
  And I guess that would fix it in every archs.
 
 My 5 cent for this, too.
 
  That said, Linus had a good point about the fact there might other uses
  of mcount even more tricky than what does the function graph tracer,
  outside the kernel, and those may depend on the strict ABI assumption
  that 4(ebp) is always the _real_ return address, and that through all
  the previous stack call. This is even a concern that extrapolates the
  single mcount case.
 
  So I wonder that actually the real problem is the lack of something that
  could provide this guarantee. We may need a -real-ra-before-fp (yeah
  I suck in naming).
 
 There are, especially in windows world. We noticed that for example
 the Sun's JDK (which is compiled by VC) can be used in gcc compiled
 code only by -fno-omit-frame-pointer, as otherwise it fails badly
 reasoned by wrong ebp accesses.


Yeah, but what we need is not only to ensure that ebp is used as the frame
pointer, but also that ebp + 4 really holds the address that will be used
to return to the caller, and not a copy of the return address.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
 On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:

  function:
  call __fentry__
  [...]
  
  
  -- Steve
 
 
 I would really like this. So that we can forget about other possible
 further surprises due to sophisticated function prologues being before
 the mcount call.
 
 And I guess that would fix it in every archs.

Well, other archs use a register to store the return address. But it
would also be easy to do (pseudo arch assembly):

function:
        mov lr, (%sp)
        add 8, %sp
        blr __fentry__
        sub 8, %sp
        mov (%sp), lr


That way the lr would have the current function, and the parent would
still be at 8(%sp)


 
 That said, Linus had a good point about the fact there might other uses
 of mcount even more tricky than what does the function graph tracer,
 outside the kernel, and those may depend on the strict ABI assumption
 that 4(ebp) is always the _real_ return address, and that through all
 the previous stack call. This is even a concern that extrapolates the
 single mcount case.

I am proposing a new call, which means that mcount stays as is for
legacy reasons. Yes, I know -finstrument-functions exists, but
that adds way too much bloat to the code. One single call to the
profiler is all I want.


 
 So I wonder that actually the real problem is the lack of something that
 could provide this guarantee. We may need a -real-ra-before-fp (yeah
 I suck in naming).

Don't worry, so do the C compiler folks, I mean, come on mcount?

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, H. Peter Anvin wrote:
 
 Calling the profiler immediately at the entry point is clearly the more
 sane option.  It means the ABI is well-defined, stable, and independent
 of what the actual function contents are.  It means that ABI isn't the
 normal C ABI (the __fentry__ function would have to preserve all
 registers), but that's fine...

As far as I know, that's true of _mcount already: it's not a normal ABI 
and is rather a highly architecture-specific special case to begin with. 
At least ARM has some (several?) special mcount calling conventions, 
afaik.

(And then ARM people use __attribute__((naked)) and insert the code by 
inline asm, or something. That seems to be standard in the embedded 
world, where they often do even stranger things than we do in the 
kernel. At least our low-level system call and interrupt handlers are 
written as assembly language - the embedded world seems to commonly 
write them as C functions with magic attributes and inline asm).

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 11:50 -0800, H. Peter Anvin wrote:

  Perhaps we could create another profiler? Instead of calling mcount,
  call a new function: __fentry__ or something. Have it activated with
  another switch. This could make the performance of the function tracer
  even better without all these exceptions.
  
  function:
  call __fentry__
  [...]
  
 
 Calling the profiler immediately at the entry point is clearly the more
 sane option.  It means the ABI is well-defined, stable, and independent
 of what the actual function contents are.  It means that ABI isn't the
 normal C ABI (the __fentry__ function would have to preserve all
 registers), but that's fine...

mcount already has that requirement (saving all/most regs). Anyway, you
are right, we don't care. The tracer should bear the brunt of the load,
not the individual callers.

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:

 Well, other archs use a register to store the return address. But it
 would also be easy to do (pseudo arch assembly):
 
   function:
   mov lr, (%sp)
   add 8, %sp
   blr __fentry__

Should be bl __fentry__ for branch and link.

   sub 8, %sp
   mov (%sp), lr
 
 
 That way the lr would have the current function, and the parent would
 still be at 8(%sp)

Actually, if we add a new profiler and can make our own specification, I
would say that the add and sub lines be the responsibility of
__fentry__. Then we would have:

function:
        mov lr, (%sp)
        bl __fentry__
        mov (%sp), lr

If sp points to the current content, then replace (%sp) above with 
-8(%sp).  Then the implementation of a nop __fentry__ would simply be:

__fentry__:
        blr

For anything more elaborate, __fentry__ would be responsible for all
adjustments.

-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Frederic Weisbecker

On Thu, Nov 19, 2009 at 03:05:41PM -0500, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
  On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
 
 function:
 call __fentry__
 [...]
   
 
   -- Steve
  
  
  I would really like this. So that we can forget about other possible
  further surprises due to sophisticated function prologues being before
  the mcount call.
  
  And I guess that would fix it in every archs.
 
 Well, other archs use a register to store the return address. But it
 would also be easy to do (pseudo arch assembly):
 
   function:
   mov lr, (%sp)
   add 8, %sp
   blr __fentry__
   sub 8, %sp
   mov (%sp), lr
 
 
 That way the lr would have the current function, and the parent would
 still be at 8(%sp)
 


Yeah right, we need at least such a very tiny prologue for
archs that store return addresses in a reg.


  
  That said, Linus had a good point about the fact there might other uses
  of mcount even more tricky than what does the function graph tracer,
  outside the kernel, and those may depend on the strict ABI assumption
  that 4(ebp) is always the _real_ return address, and that through all
  the previous stack call. This is even a concern that extrapolates the
  single mcount case.
 
 As I am proposing a new call. This means that mcount stay as is for
 legacy reasons. Yes I know there exists the -finstrument-functions but
 that adds way too much bloat to the code. One single call to the
 profiler is all I want.


Sure, the purpose is not to change the existing -mcount thing.
What I meant is that we could have -mcount and -real-ra-before-fp
at the same time to guarantee that fp + 4 is really what we want while
using -mcount.

The __fentry__ idea is neater, but the guarantee of a real pointer
to the return address is still lacking.


  
  So I wonder that actually the real problem is the lack of something that
  could provide this guarantee. We may need a -real-ra-before-fp (yeah
  I suck in naming).
 
 Don't worry, so do the C compiler folks, I mean, come on mcount?


I guess it was first created for the single purpose of counting
specific functions, but then it got used for wider, unforeseen purposes :)

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Frederic Weisbecker

On Thu, Nov 19, 2009 at 03:17:16PM -0500, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
 
  Well, other archs use a register to store the return address. But it
  would also be easy to do (pseudo arch assembly):
  
  function:
  mov lr, (%sp)
  add 8, %sp
  blr __fentry__
 
 Should be bl __fentry__ for branch and link.
 
  sub 8, %sp
  mov (%sp), lr
  
  
  That way the lr would have the current function, and the parent would
  still be at 8(%sp)
 
 Actually, if we add a new profiler and can make our own specification, I
 would say that the add and sub lines be the responsibility of
 __fentry__. Then we would have:
 
   function:
   mov lr, (%sp)
   bl __fentry__
   mov (%sp), lr
 
 If sp points to the current content, then replace (%sp) above with 
 -8(%sp).  Then the implementation of a nop __fentry__ would simply be:
 
   __fentry__:
   blr


Good point!

 
 For anything more elaborate, __fentry__ would be responsible for all
 adjustments.


Yep. The more we control it from __fentry__, the fewer
unexpected surprises we run into.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Linus Torvalds wrote:
 On Thu, 19 Nov 2009, Richard Guenther wrote:
  
  Note that I only can reproduce the issue with
  -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
 
 Since you can reproduce it with -mincoming-stack-boundary=2, I would
 suggest just fixing mcount handling that way regardless of anything else. 
 The current code generated by gcc is just insane - even for the case where 
 you _want_ 16-byte stack alignment.
 
 Instead of crazy code like
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xfffffff0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   ...
   call   mcount
 
 the sane thing to do would be to just do it as
 
   push   %ebp
   mov    %esp,%ebp
   call   mcount
   and    $0xfffffff0,%esp

which is what the 64bit compile does except that the mcount call
happens a bit later which is fine.

8107cd34 <timer_stats_update_stats>:
8107cd34:   55                      push   %rbp
8107cd35:   48 89 e5                mov    %rsp,%rbp
8107cd38:   48 83 e4 c0             and    $0xffffffffffffffc0,%rsp
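
The `and` in these prologues rounds the stack pointer down to a power-of-two boundary by clearing its low bits. A minimal sketch of that arithmetic follows; the helper names are assumptions made for illustration, not anything from the kernel or GCC.

```c
#include <stdint.h>

/* Round an address down to a power-of-two boundary, which is what
   "and $mask,%esp" does to the stack pointer. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* The 32-bit immediate a disassembler would print for a given alignment:
   16 -> 0xfffffff0, 32 -> 0xffffffe0, 64 -> 0xffffffc0. */
static uint32_t align_mask32(uint32_t align)
{
    return ~(align - 1);
}
```

For example, `align_down(0x1007, 16)` yields `0x1000`. The single encoded byte `f0` or `c0` in the listings is sign-extended to these full-width masks.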

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Thu, 19 Nov 2009, Frederic Weisbecker wrote:

  That way the lr would have the current function, and the parent would
  still be at 8(%sp)
 
 Yeah right, we need at least such very tiny prologue for
 archs that store return addresses in a reg.

Well, it will be architecture-dependent.

For example, alpha can store the return value in _any_ register if I 
recall correctly, so you can do the call to __fentry__ by just picking 
another register than the default one as the return address.

And powerpc has two special registers: link and ctr, but iirc you can only 
load 'link' with a branch instruction. Which means that you could do 
something like 

mflr 0
bl __fentry__

in the caller (I forget if R0 is actually a call-trashed register or not), 
and then __fentry__ could do something like

mflr 12         # save _new_ link
mtlr 0          # restore original link
mtctr 12        # move __fentry__ link to ctr
.. do whatever ..
bctr            # return to __fentry__ caller

to return with 'link' restored (but ctr and r0/r12 trashed - I don't 
recall the ppc calling conventions any more, but I think that is ok).

Saving to stack seems unnecessary and pointless.

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 12:36 -0800, Linus Torvalds wrote:
 
 On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
 
   That way the lr would have the current function, and the parent would
   still be at 8(%sp)
  
  Yeah right, we need at least such very tiny prologue for
  archs that store return addresses in a reg.
 
 Well, it will be architecture-dependent.

Totally agree, as mcount today is architecture dependent.

 
 For example, alpha can store the return value in _any_ register if I 
 recall correctly, so you can do the call to __fentry__ by just picking 
 another register than the default one as the return address.
 
 And powerpc has two special registers: link and ctr, but iirc you can only 
 load 'link' with a branch instruction. Which means that you could do 
 something like 
 
   mflr 0
   bl __fentry__
 
 in the caller (I forget if R0 is actually a call-trashed register or not), 
 and then __fentry__ could do something like
 
   mflr 12         # save _new_ link
   mtlr 0          # restore original link
   mtctr 12        # move __fentry__ link to ctr
   .. do whatever ..
   bctr            # return to __fentry__ caller
 
 to return with 'link' restored (but ctr and r0/r12 trashed - I don't 
 recall the ppc calling conventions any more, but I think that is ok).
 
 Saving to stack seems unnecessary and pointless.

I was just using an example. But as you pointed out, each arch can find
its best way to handle it. Having the profiler called at the beginning
of the function is what I feel is the best.

We also get access to the function's parameters :-)


-- Steve

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On i386, if we call __fentry__ immediately on entry the return address will be 
in 4(%esp), so I fail to see how you could not reliably have the return 
address.  Other arches would have different constraints, of course.
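
From C, GCC exposes the same piece of information through the documented builtin `__builtin_return_address(0)`. The sketch below only illustrates what a profiling hook wants to capture; the function name is hypothetical, and it does not reproduce the proposed `__fentry__` calling convention.

```c
#include <stddef.h>

/* Return the address this function will return to. noinline forces a
   real call, so there is a real return address to read. */
__attribute__((noinline))
static void *my_return_address(void)
{
    return __builtin_return_address(0);
}
```

Level 0 is the only argument that is reliable everywhere; higher levels depend on the frame layout, which is exactly what this thread is about.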

Frederic Weisbecker fweis...@gmail.com wrote:

On Thu, Nov 19, 2009 at 03:05:41PM -0500, Steven Rostedt wrote:
 On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
  On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
 
function:
call __fentry__
[...]
   

   -- Steve
  
  
  I would really like this, so that we can forget about possible
  further surprises due to sophisticated function prologues coming before
  the mcount call.
  
  And I guess that would fix it on every arch.
 
 Well, other archs use a register to store the return address. But it
 would also be easy to do (pseudo arch assembly):
 
  function:
  mov lr, (%sp)
  add 8, %sp
  blr __fentry__
  sub 8, %sp
  mov (%sp), lr
 
 
 That way the lr would have the current function, and the parent would
 still be at 8(%sp)
 


Yeah right, we need at least such very tiny prologue for
archs that store return addresses in a reg.

   
  
  That said, Linus had a good point: there might be other uses of
  mcount outside the kernel, even trickier than what the function graph
  tracer does, and those may depend on the strict ABI assumption
  that 4(ebp) is always the _real_ return address, all the way up
  the call stack. This concern even extends beyond the
  single mcount case.
 
 I am proposing a new call, which means that mcount stays as is for
 legacy reasons. Yes, I know -finstrument-functions exists, but
 that adds way too much bloat to the code. One single call to the
 profiler is all I want.


Sure, the purpose is not to change the existing -mcount thing.
What I meant is that we could have -mcount and -real-ra-before-fp
at the same time to guarantee fp + 4 is really what we want while
using -mcount.

The __fentry__ idea is neater, but a guaranteed real pointer
to the return address is still something we lack.


  
  So I wonder whether the real problem is actually the lack of something
  that could provide this guarantee. We may need a -real-ra-before-fp (yeah,
  I suck at naming).
 
 Don't worry, so do the C compiler folks, I mean, come on mcount?


I guess it was first created for the single purpose of counting
specific functions, but then it got used for wider, unforeseen purposes :)


--
Sent from my mobile phone. Please excuse any lack of formatting.

Re: with rev. 154329 ppl doesn't build anymore

2009-11-19 Thread Rainer Emrich


Rainer Emrich wrote:
 This is with gcc-trunk rev. 154329
 build=x86_64-w64-mingw32
 ppl-0.10.2
 
 Used to work until yesterday, now:
 
 /bin/sh ../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.
 -I../../ppl-0.10.2/src -I..  -I.. -I../../ppl-0.10.2/src
 -I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include  -g -O2
 -frounding-math  -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c -o Box.lo
 ../../ppl-0.10.2/src/Box.cc
 libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../ppl-0.10.2/src -I.. -I..
 -I../../ppl-0.10.2/src
 -I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include -g -O2
 -frounding-math -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c
 ../../ppl-0.10.2/src/Box.cc -o Box.o
 In file included from ../../ppl-0.10.2/src/Row.defs.hh:504:0,
  from ../../ppl-0.10.2/src/Linear_Row.defs.hh:28,
  from ../../ppl-0.10.2/src/Constraint.defs.hh:28,
  from ../../ppl-0.10.2/src/Box.defs.hh:33,
  from ../../ppl-0.10.2/src/Box.cc:24:
 ../../ppl-0.10.2/src/Row.inlines.hh: In member function 'void
 Parma_Polyhedra_Library::Row::allocate(Parma_Polyhedra_Library::dimension_type,
 Parma_Polyhedra_Library::Row::Flags)':
 ../../ppl-0.10.2/src/Row.inlines.hh:92:1: error: non-placement deallocation
 function 'static void 
 Parma_Polyhedra_Library::Row_Impl_Handler::Impl::operator
 delete(void*, Parma_Polyhedra_Library::dimension_type)'
 ../../ppl-0.10.2/src/Row.inlines.hh:224:31: error: selected for placement 
 delete
 
 some issue with placement, non-placement deallocation.
 
 Cheers,
 Rainer

It's a setup glitch on my side, sorry for the noise.

Cheers,
Rainer

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Jeff Law


On 11/19/09 12:50, H. Peter Anvin wrote:


Calling the profiler immediately at the entry point is clearly the more
sane option.  It means the ABI is well-defined, stable, and independent
of what the actual function contents are.  It means that ABI isn't the
normal C ABI (the __fentry__ function would have to preserve all
registers), but that's fine...
   
Note there are targets (even some old x86 variants) that required the 
profiling calls to occur after the prologue.  Unfortunately, nobody 
documented *why* that  was the case.   Sigh.


Jeff

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Jeff Law


On 11/19/09 13:06, Linus Torvalds wrote:


On Thu, 19 Nov 2009, H. Peter Anvin wrote:
   

Calling the profiler immediately at the entry point is clearly the more
sane option.  It means the ABI is well-defined, stable, and independent
of what the actual function contents are.  It means that ABI isn't the
normal C ABI (the __fentry__ function would have to preserve all
registers), but that's fine...
 

As far as I know, that's true of _mcount already: it's not a normal ABI
and is rather a highly architecture-specific special case to begin with.
At least ARM has some (several?) special mcount calling conventions,
afaik.
   
Correct.  _mcount's ABI typically has been defined by the implementation 
of the vendor's C library mcount.


GCC has options to emit the profiling code prior to or after the 
prologue, controllable through the usual variety of target macros and 
hooks.  I can't imagine anyone would object to a clean, tested patch to 
change how x86-linux's profiling code was implemented.


jeff

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

Hence a new unconstrained option...

Jeff Law l...@redhat.com wrote:

On 11/19/09 12:50, H. Peter Anvin wrote:

 Calling the profiler immediately at the entry point is clearly the more
 sane option.  It means the ABI is well-defined, stable, and independent
 of what the actual function contents are.  It means that ABI isn't the
 normal C ABI (the __fentry__ function would have to preserve all
 registers), but that's fine...

Note there are targets (even some old x86 variants) that required the 
profiling calls to occur after the prologue.  Unfortunately, nobody 
documented *why* that  was the case.   Sigh.

Jeff

--
Sent from my mobile phone. Please excuse any lack of formatting.

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Jeff Law


On 11/19/09 14:14, H. Peter Anvin wrote:

Hence a new unconstrained option...
   
Not arguing against it, just noting there are targets where an
after-prologue mcount is mandated.  There are certainly hooks in GCC to
do it both ways, and if there's no clear need to use after-prologue on
x86-linux, then before-prologue seems reasonable to me.


It's also the case that aligning stacks on the x86 and the poor code 
generated when used with profiling is an interaction I doubt anyone has 
looked at until now.  The result is definitely ugly and inefficient -- 
and there's something to be said for cleaning that up and at least 
marginally reducing the overhead of profiling.


Having said all that, I don't expect to personally be looking at the 
problem, given the list of other codegen issues that need to be looked 
at (reload in particular), profiling/stack interactions would be around 
87 millionth on my list.


jeff

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Steven Rostedt

On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:

 Having said all that, I don't expect to personally be looking at the 
 problem, given the list of other codegen issues that need to be looked 
 at (reload in particular), profiling/stack interactions would be around 
 87 millionth on my list.

Is there someone else that can look at it?

Or at the very least, could you point us to where that code is, and one
of us tracing folks could take a crack at switching hats to be a
compiler writer (with the obvious prerequisite of drinking a lot of beer
first, or is there a better drug to cope with the pain of writing gcc?).

-- Steve

Re: git mirror repacked, new branches

2009-11-19 Thread Jason Merrill


The git mirror seems to have stopped updating today.

Jason

Re: C++ comp_cdtor FUNCTION_DECL tree nodes with DECL_LANG_SPECIFIC but no DECL_CONTEXT: valid or not?

2009-11-19 Thread Jason Merrill


On 11/18/2009 07:59 AM, Dave Korn wrote:

   Is it valid for the context to be NULL here?


It doesn't make sense to have a [cd]tor with null DECL_CONTEXT, but 
dump_function_decl should probably be more resilient, for use during 
debugging when things may be in an intermediate state.


Jason

gcc-4.5-20091119 is now available

2009-11-19 Thread gccadmin

Snapshot gcc-4.5-20091119 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20091119/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 154346

You'll find:

gcc-4.5-20091119.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.5-20091119.tar.bz2 C front end and core compiler

gcc-ada-4.5-20091119.tar.bz2  Ada front end and runtime

gcc-fortran-4.5-20091119.tar.bz2  Fortran front end and runtime

gcc-g++-4.5-20091119.tar.bz2  C++ front end and runtime

gcc-java-4.5-20091119.tar.bz2 Java front end and runtime

gcc-objc-4.5-20091119.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.5-20091119.tar.bz2The GCC testsuite

Diffs from 4.5-20091112 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

missed IPA/whopr optimization?

2009-11-19 Thread Matt



Hello all,

In the work I'm doing on my new book, I'm trying to show how modern 
compiler optimizations can eliminate a good deal of the overhead 
introduced by a modular/unit-testable design. In verifying some of my 
text, I found that GCC 4.4 and 4.5 (20091018, Ubuntu 9.10 package) aren't 
doing an optimization that I expected them to do:


#include <stdio.h>

class Calculable
{
public:
    virtual unsigned char calculate() = 0;
};

class X : public Calculable
{
public:
    unsigned char calculate() { return 1; }
};

class Y : public Calculable
{
public:
    unsigned char calculate() { return 2; }
};

// Takes the abstract base by reference, so each call is a virtual dispatch.
static void print(Calculable& c)
{
    printf("%d\n", c.calculate());
    printf("+1: %d\n", c.calculate() + 1);
}

int main()
{
    X x;
    Y y;

    print(x);
    print(y);

    return 0;
}

GCC 4.5 (and 4.4.1) generates this approximate code:

~/src $ /usr/lib/gcc-snapshot/bin/g++ -O3 -ftree-loop-ivcanon -fivopts 
-ftree-loop-im -fwhole-program -fipa-struct-reorg -fipa-matrix-reorg 
-fgcse-sm -fgcse-las -fgcse-after-reload --param max-gcse-memory=1 
--param max-pending-list-length=10   folding-test-interface.cpp -o 
folding-test-interface_gcc450_20091018-O3-kitchen-sink


~/src$ objdump -Mintel -S 
folding-test-interface_gcc450_20091018-O3-kitchen-sink | less -p \main


00400310 <main>:
  400310:   53                      push   rbx
  400311:   48 83 ec 20             sub    rsp,0x20
  400315:   48 8d 5c 24 10          lea    rbx,[rsp+0x10]
  40031a:   48 c7 44 24 10 c0 04    mov    QWORD PTR [rsp+0x10],0x4004c0
  400321:   40 00
  400323:   48 c7 04 24 00 05 40    mov    QWORD PTR [rsp],0x400500
  40032a:   00
  40032b:   48 89 df                mov    rdi,rbx
  40032e:   ff 15 8c 01 00 00       call   QWORD PTR [rip+0x18c]        # 4004c0 <_ZTV1X+0x10>
  400334:   bf ac 04 40 00          mov    edi,0x4004ac
  400339:   0f b6 f0                movzx  esi,al
  40033c:   31 c0                   xor    eax,eax
  40033e:   e8 a5 03 00 00          call   4006e8 <printf@plt>
  400343:   48 8b 44 24 10          mov    rax,QWORD PTR [rsp+0x10]
  400348:   48 89 df                mov    rdi,rbx
  40034b:   ff 10                   call   QWORD PTR [rax]
  40034d:   0f b6 f0                movzx  esi,al
  400350:   bf a4 04 40 00          mov    edi,0x4004a4
  400355:   31 c0                   xor    eax,eax
  400357:   83 c6 01                add    esi,0x1
  40035a:   e8 89 03 00 00          call   4006e8 <printf@plt>
[...]

As seen here, GCC isn't folding/inlining the constants returned across the 
virtual function boundary, even though they are visible in the compilation 
unit and -O3 -fwhole-program is being used. (Note that I started with just 
that commandline, and added things in an attempt to induce the 
optimization I was hoping for.)


I was able to induce the optimization by removing a level of indirection 
in two ways: 1) by having two print() functions, one overloaded to accept 
X and a second overloaded to accept Y; and 2) by replacing the classes 
with single-level-indirection function pointers:

--
#include <stdio.h>

typedef unsigned char (*Calculable)(void);

unsigned char one() { return 1; }
unsigned char two() { return 2; }

static void print(Calculable calculate)
{
    printf("%d\n", calculate());
    printf("+1: %d\n", calculate() + 1);
}

int main()
{
    print(one);
    print(two);

    return 0;
}
--
For completeness, here is the code generated from the function-pointer 
example, which is optimized in the way I expect:

00400390 <main>:
  400390:   48 83 ec 08             sub    rsp,0x8
  400394:   ba 01 00 00 00          mov    edx,0x1
  400399:   be e4 04 40 00          mov    esi,0x4004e4
  40039e:   bf 01 00 00 00          mov    edi,0x1
  4003a3:   31 c0                   xor    eax,eax
  4003a5:   e8 c6 02 00 00          call   400670 <__printf_chk@plt>
  4003aa:   ba 02 00 00 00          mov    edx,0x2
  4003af:   be dc 04 40 00          mov    esi,0x4004dc
  4003b4:   bf 01 00 00 00          mov    edi,0x1
  4003b9:   31 c0                   xor    eax,eax
  4003bb:   e8 b0 02 00 00          call   400670 <__printf_chk@plt>



Modifying this last example to include two function pointer indirections 
once again causes the optimization to be missed.
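
The modified test is not shown in the message. The following is one plausible reading of "two function pointer indirections", written as a sketch in C: a hand-rolled one-entry table that mimics what a C++ vtable does. All names here are assumptions, and the sketch makes no claim about what any particular GCC version does with it.

```c
typedef unsigned char (*calc_fn)(void);

/* A hand-rolled "vtable": the caller must first load the table pointer
   from the object, then load the function pointer from the table. */
struct calculable {
    const calc_fn *vtable;
};

static unsigned char one(void) { return 1; }
static unsigned char two(void) { return 2; }

static const calc_fn one_vtable[] = { one };
static const calc_fn two_vtable[] = { two };

/* Reaches the target through both levels of indirection and adds one,
   like the second printf() argument in the examples above. */
static int calculate_plus_one(const struct calculable *c)
{
    return c->vtable[0]() + 1;
}
```

To fold the result to a constant, the compiler has to propagate a known object through the table load and then through the call, which is one more step than the single function-pointer case.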


So, my questions are:
0) Am I missing some existing commandline parameter that would induce the 
optimization? (e.g. a bad connection between my chair and keyboard)

1) Is this a missed optimization bug, or is this a missing feature?
2) Either way, what are the steps to correct the issue?

Thanks in advance for insights and/or help!



PS: I would test with a newer 4.5.0 build, but I'm having trouble 
bootstrapping. Any help is appreciated on that email (sent yesterday), as 
well.


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [variadic templates]feature request: n-th element of expansion

2009-11-19 Thread Jason Merrill


On 11/17/2009 09:36 AM, Larry Evans wrote:

Could g++ provide this feature? How hard would it be to implement?


It probably wouldn't be difficult to implement, but I'd want someone to 
champion the extension with the C++ committee as well.  Have you asked 
Doug Gregor what he thinks?  I assume that omitting this functionality 
was deliberate.


Jason

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Jeff Law


On 11/19/09 15:43, Steven Rostedt wrote:

On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:

   

Having said all that, I don't expect to personally be looking at the
problem, given the list of other codegen issues that need to be looked
at (reload in particular), profiling/stack interactions would be around
87 millionth on my list.
 

Is there someone else that can look at it?

   
Unsure at the moment...  Like everyone else, GCC developers are busy and 
this probably isn't going to be a high priority item for anyone.




Or at the very least, could you point us to where that code is, and one
of us tracing folks could take a crack at switching hats to be a
compiler writer (with the obvious prerequisite of drinking a lot of beer
first, or is there a better drug to cope with the pain of writing gcc?).
   

It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
gcc-<someversion>/gcc/config/i386/linux.h and rebuilding GCC.

Based on comments elsewhere, the sun386i support may have used 
PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not 
need further adjustment.  That is obviously the ideal case.
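
As a rough sketch of what that change might look like, assuming the macro only needs to be defined in the target header and that no other part of the x86 backend needs adjusting (neither has been verified here):

```c
/* Hypothetical fragment for gcc/config/i386/linux.h (untested sketch).
   PROFILE_BEFORE_PROLOGUE is the target macro named above; defining it
   asks GCC to emit the profiling call ahead of the function prologue. */
#undef  PROFILE_BEFORE_PROLOGUE
#define PROFILE_BEFORE_PROLOGUE 1
```

The `#undef` guards against an earlier definition in a header included before this one.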


If that appears to work for your needs, I'll volunteer to test it more 
thoroughly and assuming those tests look good shepherd it into the 
source tree.


Jeff

Re: [variadic templates]feature request: n-th element of expansion

2009-11-19 Thread Larry Evans


On 11/19/09 17:23, Jason Merrill wrote:

On 11/17/2009 09:36 AM, Larry Evans wrote:

Could g++ provide this feature? How hard would it be to implement?


It probably wouldn't be difficult to implement, but I'd want someone to 
champion the extension with the C++ committee as well.  Have you asked 
Doug Gregor what he thinks?  


Yes:


Hi Doug,

Your post:


http://groups.google.com/group/comp.std.c++/msg/40705c1e2a6f78f8


contains:


 3) A guaranteed non-recursive way to access elements of parameter
 packs
  template<int N, class ... V> struct get_type
  {
     typedef v...@n type;  // or implementation_defined<N,V...>::type - guaranteed linear
  };
  template<int N, class ... V> get_type<N,V...>::type get(V...v) {
     return ::implementation_defined<N>(v...);
  }

This is probably the most-requested feature for variadic templates,
and it never it made it because we never found a good, unambiguous
syntax.


Could you elaborate on why it's hard to find some unambiguous syntax?
For example, what would be wrong with the syntax:


 get-nth-expansion-element:
expansion-pattern '...[' constant-expression ']' 

shown in the following post to gmane.comp.gcc.devel:


http://thread.gmane.org/gmane.comp.gcc.devel/110252


TIA.

-regards,
Larry






I assume that omitting this functionality 
was deliberate.


As noted in my quote of Doug's post to comp.std.c++, there
*might* be something ambiguous about:

   expansion-pattern '...[' constant-expression ']'

I'm awaiting Doug's reply.



Jason

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Jeff Law wrote:
 On 11/19/09 15:43, Steven Rostedt wrote:
  On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
  
 
   Having said all that, I don't expect to personally be looking at the
   problem, given the list of other codegen issues that need to be looked
   at (reload in particular), profiling/stack interactions would be around
   87 millionth on my list.

  Is there someone else that can look at it?
  
 
 Unsure at the moment...  Like everyone else, GCC developers are busy and this
 probably isn't going to be a high priority item for anyone.
 
 
  Or at the very least, could you point us to where that code is, and one
  of us tracing folks could take a crack at switching hats to be a
  compiler writer (with the obvious prerequisite of drinking a lot of beer
  first, or is there a better drug to cope with the pain of writing gcc?).
 
 It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
 gcc-<someversion>/gcc/config/i386/linux.h and rebuilding GCC.
 
 Based on comments elsewhere, the sun386i support may have used
 PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not need
 further adjustment.  That is obviously the ideal case.
 
 If that appears to work for your needs, I'll volunteer to test it more
 thoroughly and assuming those tests look good shepherd it into the source
 tree.

We definitely want to see that ASAP.

While testing various kernel configs we found out that the problem
comes and goes. Finally I started to compare the gcc command line
options and after some fiddling it turned out that the following
minimal deltas change the code generator behaviour:

Bad:  -march=pentium-mmx                -Wa,-mtune=generic32
Good: -march=i686        -mtune=generic -Wa,-mtune=generic32
Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32

I'm not supposed to understand the logic behind that, right ?

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Linus Torvalds



On Fri, 20 Nov 2009, Thomas Gleixner wrote:
 
 While testing various kernel configs we found out that the problem
 comes and goes. Finally I started to compare the gcc command line
 options and after some fiddling it turned out that the following
 minimal deltas change the code generator behaviour:
 
 Bad:  -march=pentium-mmx                -Wa,-mtune=generic32
 Good: -march=i686        -mtune=generic -Wa,-mtune=generic32
 Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
 
 I'm not supposed to understand the logic behind that, right ?

Are you sure it's just the compiler flags?

There's another configuration portion: the size of the alignment itself. 
That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel 
config CONFIG_X86_L1_CACHE_SHIFT.

Maybe that value matters too - for example maybe gcc will not try to align 
the stack if it's big?

[ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT 
  totally unrelated numbers? Very confusing. ]

The compiler flags we use are tied to some of the same choices that choose 
the cache shift, so the correlation you found while debugging this would 
still hold.
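
To illustrate why the alignment size can matter, here is a sketch of a local variable that asks for cache-line alignment, which may be more than the stack is guaranteed to provide on function entry. The shift value and all names are assumptions chosen for illustration; `aligned` is the documented GCC variable attribute.

```c
#include <stdint.h>

#define CACHE_SHIFT 6                    /* hypothetical value */
#define CACHE_BYTES (1 << CACHE_SHIFT)   /* 64 bytes */

/* Returns the offset of an over-aligned local within its cache line.
   0 means the compiler honored the requested alignment, which it can
   only guarantee by realigning the stack pointer in the prologue when
   the incoming stack alignment is smaller. */
__attribute__((noinline))
static unsigned stack_local_misalignment(void)
{
    volatile char buf[CACHE_BYTES] __attribute__((aligned(CACHE_BYTES)));

    buf[0] = 0;
    return (unsigned)((uintptr_t)buf & (CACHE_BYTES - 1));
}
```

Changing `CACHE_SHIFT` changes the mask in the prologue's `and` instruction, and a value no larger than the incoming stack alignment would make the realignment unnecessary.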

Linus

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread Thomas Gleixner

On Thu, 19 Nov 2009, Linus Torvalds wrote:
 On Fri, 20 Nov 2009, Thomas Gleixner wrote:
  
  While testing various kernel configs we found out that the problem
  comes and goes. Finally I started to compare the gcc command line
  options and after some fiddling it turned out that the following
  minimal deltas change the code generator behaviour:
  
  Bad:  -march=pentium-mmx                -Wa,-mtune=generic32
  Good: -march=i686        -mtune=generic -Wa,-mtune=generic32
  Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
  
  I'm not supposed to understand the logic behind that, right ?
 
 Are you sure it's just the compiler flags?

I first captured the command line with V=1 and created a script of
it. Then I changed the -march -mtune options in that script and
compiled just that single file manually w/o changing .config or
invoking the kernel make magic.

The good ones produce:

650:   55                      push   %ebp
651:   89 e5                   mov    %esp,%ebp
653:   83 e4 f0                and    $0xfffffff0,%esp

The bad one:

05f0 <timer_stats_update_stats>:
 5f0:   57                      push   %edi
 5f1:   8d 7c 24 08             lea    0x8(%esp),%edi
 5f5:   83 e4 f0                and    $0xfffffff0,%esp
 5f8:   ff 77 fc                pushl  -0x4(%edi)
 5fb:   55                      push   %ebp
 5fc:   89 e5                   mov    %esp,%ebp
 5fc:   89 e5   mov%esp,%ebp
 
 There's another configuration portion: the size of the alignment itself. 
 That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel 
 config CONFIG_X86_L1_CACHE_SHIFT.
 
 Maybe that value matters too - for example maybe gcc will not try to align 
 the stack if it's big?

That does not change any of the compiler options, but yes it could
have some effect via the various include magics, but all I have seen
so far is linkage.h which should not affect the compiler. And the
manual compile did not change any of this.
 
 [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT 
   totally unrelated numbers? Very confusing. ]

Agreed.

 The compiler flags we use are tied to some of the same choices that choose 
 the cache shift, so the correlation you found while debugging this would 
 still hold.

Digging further tomorrow when my brain is more awake.

Thanks,

tglx

Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-19 Thread H. Peter Anvin

On 11/19/2009 04:59 PM, Linus Torvalds wrote:
 
 [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT 
   totally unrelated numbers? Very confusing. ]
 

Yes, there is another thread to clean up that particular mess; it is
already in -tip:

http://git.kernel.org/tip/350f8f5631922c7848ec4b530c111cb8c2ff7caa

-hpa

Re: with rev. 154329 ppl doesn't build anymore

2009-11-19 Thread Rainer Emrich


Rainer Emrich wrote:
 Rainer Emrich wrote:
 This is with gcc-trunk rev. 154329
 build=x86_64-w64-mingw32
 ppl-0.10.2
 
 Used to work until yesterday, now:
 
 /bin/sh ../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.
 -I../../ppl-0.10.2/src -I..  -I.. -I../../ppl-0.10.2/src
 -I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include  -g -O2
 -frounding-math  -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c -o Box.lo
 ../../ppl-0.10.2/src/Box.cc
 libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../ppl-0.10.2/src -I.. -I..
 -I../../ppl-0.10.2/src
 -I/mingw/x86_64-w64/x86_64-w64/x86_64-w64/gcc-4.5.0/mingw/include -g -O2
 -frounding-math -W -Wall -MT Box.lo -MD -MP -MF .deps/Box.Tpo -c
 ../../ppl-0.10.2/src/Box.cc -o Box.o
 In file included from ../../ppl-0.10.2/src/Row.defs.hh:504:0,
  from ../../ppl-0.10.2/src/Linear_Row.defs.hh:28,
  from ../../ppl-0.10.2/src/Constraint.defs.hh:28,
  from ../../ppl-0.10.2/src/Box.defs.hh:33,
  from ../../ppl-0.10.2/src/Box.cc:24:
 ../../ppl-0.10.2/src/Row.inlines.hh: In member function 'void
 Parma_Polyhedra_Library::Row::allocate(Parma_Polyhedra_Library::dimension_type,
 Parma_Polyhedra_Library::Row::Flags)':
 ../../ppl-0.10.2/src/Row.inlines.hh:92:1: error: non-placement deallocation
 function 'static void 
 Parma_Polyhedra_Library::Row_Impl_Handler::Impl::operator
 delete(void*, Parma_Polyhedra_Library::dimension_type)'
 ../../ppl-0.10.2/src/Row.inlines.hh:224:31: error: selected for placement 
 delete
 
 some issue with placement, non-placement deallocation.
 
 Cheers,
 Rainer
 
 It's a setup glitch on my side, sorry for the noise.
 
 Cheers,
 Rainer

It is a bug after all, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42115

Cheers,
Rainer

[PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue

2009-11-19 Thread Steven Rostedt


Ingo,

Not sure if this is too much for this late in the -rc game, but it finds
the gcc bug at build time, and we don't need to disable function graph
tracer for all i386 builds.

This is built on my last urgent repo pull request.

Please pull the latest tip/tracing/urgent-2 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/urgent-2


Steven Rostedt (1):
  tracing/x86: Add check to detect GCC messing with mcount prologue


 kernel/trace/Kconfig    |    1 -
 scripts/Makefile.build  |   25 +++-
 scripts/recordmcount.pl |   74 +--
 3 files changed, 95 insertions(+), 5 deletions(-)
---
commit c7715fb611c69ac4b7f722a891de08b206fb7686
Author: Steven Rostedt srost...@redhat.com
Date:   Thu Nov 19 23:41:02 2009 -0500

tracing/x86: Add check to detect GCC messing with mcount prologue

Latest versions of GCC create a funny prologue for some functions.
Instead of the typical:

push   %ebp
mov    %esp,%ebp
and    $0xffe0,%esp
[...]
call   mcount

GCC may try to align the stack before setting up the frame pointer
register:

push   %edi
lea    0x8(%esp),%edi
and    $0xffe0,%esp
pushl  -0x4(%edi)
push   %ebp
mov    %esp,%ebp
[...]
call   mcount

This crazy code places a copy of the return address into the
frame pointer. The function graph tracer uses this pointer to
save and replace the return address of the calling function to jump
to the function graph tracer's return handler, which will put back
the return address. But instead of the typical return:

mov    %ebp,%esp
pop    %ebp
ret

The return of the function performs:

lea    -0x8(%edi),%esp
pop    %edi
ret

The function graph tracer return handler will not be called at the exit
of the function, but the parent function will call it. Because we missed
the return of the child function, the handler will replace the parent's
return address with that of the child. Obviously this will cause a crash
(Note, there is code to detect this case and safely panic the kernel).
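
To make the failure mode above concrete, here is a minimal user-space sketch. It is not the kernel's code: ReturnStack and its on_entry/on_return methods are hypothetical names used only for illustration. It assumes the tracer keeps a stack of saved return addresses and that the child's hooked return is never taken.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the tracer's per-task stack of saved
// return addresses (illustration only, not the kernel data structure).
struct ReturnStack {
    std::vector<unsigned long> saved;

    // Called at function entry: remember the real return address.
    void on_entry(unsigned long real_return) { saved.push_back(real_return); }

    // Called by the return handler: hand back the most recent address.
    unsigned long on_return() {
        unsigned long addr = saved.back();
        saved.pop_back();
        return addr;
    }
};

// The parent calls the child. The child's hooked return is skipped because
// the tracer patched only a copy of the return address, so the first time
// the handler runs is at the parent's return.
unsigned long parent_return_after_missed_child(unsigned long parent_ret,
                                               unsigned long child_ret) {
    ReturnStack rs;
    rs.on_entry(parent_ret);  // parent entered
    rs.on_entry(child_ret);   // child entered
    // Child returns through the untouched original address: no on_return().
    return rs.on_return();    // parent's return pops the child's entry
}
```

With parent_ret = 0x1000 and child_ret = 0x2000 the function yields 0x2000, which is the mismatch described in the paragraph above: the parent is sent to the child's return address.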

The kicker is that this happens to just a handful of functions.
And only with certain gcc options.

Compiling with: -march=pentium-mmx
will cause the problem to appear. But if you were to change
pentium-mmx to i686 or add -mtune=generic, then the problem goes away.

I first saw this problem when compiling with optimize for size.
But it seems that various other options may cause this issue to arise.

Instead of completely disabling the function graph tracer for i386 builds
this patch adds a check to recordmcount.pl to make sure that all
functions that contain a call to mcount start with push %ebp.
If not, it will fail the compile and print out the nasty warning:

  CC  kernel/time/timer_stats.o


  Your version of GCC breaks the function graph tracer
  Please disable CONFIG_FUNCTION_GRAPH_TRACER
  Failed function was timer_stats_update_stats


The script recordmcount.pl is given a new parameter do_check. If
this is negative, the script will only perform this check without
creating the mcount caller section. This will be executed for x86_32
when CONFIG_FUNCTION_GRAPH_TRACER is enabled and CONFIG_DYNAMIC_FTRACE
is not.

If the arch is x86_32 and $do_check is greater than 1, it will perform
the check while processing the mcount callers. If $do_check is 0, then
no check will be performed. This is for non x86_32 archs and when
compiling without CONFIG_FUNCTION_GRAPH_TRACER enabled, even on x86_32.
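
The real check lives in the Perl script recordmcount.pl. Purely as an illustration of the idea, the following C++ sketch scans a simplified, assumed disassembly listing (function labels of the form "<name>:" followed by instruction lines) and reports every function that references mcount but does not start with push %ebp. The input format and the names is_push_ebp and find_bad_prologues are assumptions, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if the instruction text is "push" followed by "%ebp".
// Deliberately strict: other spellings such as "pushl" are not matched.
bool is_push_ebp(const std::string& insn) {
    std::size_t p = insn.find("push");
    if (p == std::string::npos) return false;
    std::size_t arg = insn.find_first_not_of(" \t", p + 4);
    return arg != std::string::npos && insn.compare(arg, 4, "%ebp") == 0;
}

// Returns the names of functions that reference mcount but whose first
// instruction is not "push %ebp". Assumed input: one line per label or
// instruction, labels ending in "<name>:".
std::vector<std::string> find_bad_prologues(const std::vector<std::string>& lines) {
    std::vector<std::string> bad;
    std::string current;            // function being scanned, empty if none
    bool first_insn_seen = false;
    bool starts_with_push_ebp = false;
    bool reported = false;

    for (const std::string& line : lines) {
        std::size_t lt = line.find('<');
        std::size_t gt = line.rfind(">:");
        if (lt != std::string::npos && gt != std::string::npos &&
            gt > lt && gt + 2 == line.size()) {
            // New function label: reset the per-function state.
            current = line.substr(lt + 1, gt - lt - 1);
            first_insn_seen = false;
            starts_with_push_ebp = false;
            reported = false;
            continue;
        }
        if (current.empty() || line.empty()) continue;
        if (!first_insn_seen) {
            first_insn_seen = true;
            starts_with_push_ebp = is_push_ebp(line);
        }
        if (!reported && !starts_with_push_ebp &&
            line.find("mcount") != std::string::npos) {
            bad.push_back(current);
            reported = true;
        }
    }
    return bad;
}
```

A build step could fail whenever the returned list is non-empty and print the offending function name, much like the warning shown earlier in this message.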

Reported-by: Thomas Gleixner t...@linutronix.de
LKML-Reference: alpine.lfd.2.00.0911191423190.24...@localhost.localdomain
Signed-off-by: Steven Rostedt rost...@goodmis.org

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER
bool "Kernel Function Graph Tracer"
depends on HAVE_FUNCTION_GRAPH_TRACER
depends on FUNCTION_TRACER
-   depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
default y
help
  Enable the kernel to trace a function at both its return
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 341b589..3b897f2 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -206,10

Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue

2009-11-19 Thread Steven Rostedt

This touches the Makefile scripts. I forgot to CC kbuild and Sam.

-- Steve

On Fri, 2009-11-20 at 00:23 -0500, Steven Rostedt wrote:
 Ingo,
 
 Not sure if this is too much for this late in the -rc game, but it finds
 the gcc bug at build time, and we don't need to disable function graph
 tracer for all i386 builds.
 
 This is built on my last urgent repo pull request.
 
 Please pull the latest tip/tracing/urgent-2 tree, which can be found at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
 tip/tracing/urgent-2
 
 
 Steven Rostedt (1):
   tracing/x86: Add check to detect GCC messing with mcount prologue
 
 
  kernel/trace/Kconfig    |    1 -
  scripts/Makefile.build  |   25 +++-
  scripts/recordmcount.pl |   74 +--
  3 files changed, 95 insertions(+), 5 deletions(-)
 ---
 commit c7715fb611c69ac4b7f722a891de08b206fb7686
 Author: Steven Rostedt srost...@redhat.com
 Date:   Thu Nov 19 23:41:02 2009 -0500
 
 tracing/x86: Add check to detect GCC messing with mcount prologue
 
 Latest versions of GCC create a funny prologue for some functions.
 Instead of the typical:
 
   push   %ebp
   mov    %esp,%ebp
   and    $0xffe0,%esp
   [...]
   call   mcount
 
 GCC may try to align the stack before setting up the frame pointer
 register:
 
   push   %edi
   lea    0x8(%esp),%edi
   and    $0xffe0,%esp
   pushl  -0x4(%edi)
   push   %ebp
   mov    %esp,%ebp
   [...]
   call   mcount
 
 This crazy code places a copy of the return address into the
 frame pointer. The function graph tracer uses this pointer to
 save and replace the return address of the calling function to jump
 to the function graph tracer's return handler, which will put back
 the return address. But instead of the typical return:
 
   mov    %ebp,%esp
   pop    %ebp
   ret
 
 The return of the function performs:
 
   lea    -0x8(%edi),%esp
   pop    %edi
   ret
 
 The function graph tracer return handler will not be called at the exit
 of the function, but the parent function will call it. Because we missed
 the return of the child function, the handler will replace the parent's
 return address with that of the child. Obviously this will cause a crash
 (Note, there is code to detect this case and safely panic the kernel).
 
 The kicker is that this happens to just a handful of functions.
 And only with certain gcc options.
 
 Compiling with:   -march=pentium-mmx
 will cause the problem to appear. But if you were to change
 pentium-mmx to i686 or add -mtune=generic, then the problem goes away.
 
 I first saw this problem when compiling with optimize for size.
 But it seems that various other options may cause this issue to arise.
 
 Instead of completely disabling the function graph tracer for i386 builds
 this patch adds a check to recordmcount.pl to make sure that all
 functions that contain a call to mcount start with push %ebp.
 If not, it will fail the compile and print out the nasty warning:
 
   CC  kernel/time/timer_stats.o
 
 
   Your version of GCC breaks the function graph tracer
   Please disable CONFIG_FUNCTION_GRAPH_TRACER
   Failed function was timer_stats_update_stats
 
 
 The script recordmcount.pl is given a new parameter do_check. If
 this is negative, the script will only perform this check without
 creating the mcount caller section. This will be executed for x86_32
 when CONFIG_FUNCTION_GRAPH_TRACER is enabled and CONFIG_DYNAMIC_FTRACE
 is not.
 
 If the arch is x86_32 and $do_check is greater than 1, it will perform
 the check while processing the mcount callers. If $do_check is 0, then
 no check will be performed. This is for non x86_32 archs and when
 compiling without CONFIG_FUNCTION_GRAPH_TRACER enabled, even on x86_32.
 
 Reported-by: Thomas Gleixner t...@linutronix.de
 LKML-Reference: alpine.lfd.2.00.0911191423190.24...@localhost.localdomain
 Signed-off-by: Steven Rostedt rost...@goodmis.org
 
 diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
 index b416512..cd39064 100644
 --- a/kernel/trace/Kconfig
 +++ b/kernel/trace/Kconfig
 @@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER
   bool "Kernel Function Graph Tracer"
   depends on HAVE_FUNCTION_GRAPH_TRACER
   depends on FUNCTION_TRACER
 - depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
   default y
   help
 Enable the

How to support 40bit GP register - Take two

2009-11-19 Thread Mohamed Shafi

Hello all,

I am porting GCC 4.4.0 to a 32-bit target. The target has 40-bit data
registers and 32-bit address registers. Both can be used as general
purpose registers. All load and store operations are 32-bit. If a 40-bit
data register is involved in a load/store, the register gets sign
extended. Whenever there is a move from an address register to a data
register, sign extension is performed automatically. Currently GCC
generates code as for a 32-bit register target. Since the data registers
are 40-bit, sign/zero extension has to be performed before/after some
operations for the result to be correct. So at present the results for
the port are not correct. I need a solution to fix this.

I mailed about this previously. You can read about it here:
http://www.mail-archive.com/gcc@gcc.gnu.org/msg47224.html

I tried implementing the suggestion given by Richard, but ran into
issues. The GCC framework is written assuming that there are no modes
with HOST_BITS_PER_WIDE_INT < GET_MODE_BITSIZE (mode) < 2 *
HOST_BITS_PER_WIDE_INT. Moreover, I am getting ICEs whenever there is an
optimization/operation related to subreg (GCC tries to split RImode
values). RImode is 5 bytes and uses SImode load/store instructions, so
GCC generates offsets/addresses that are not 32-bit aligned. Currently
I am hacking the compiler all the way to get an executable (though I
have not tested the output of the resulting executables). Even if I
somehow manage to get correct output, there is the issue of using 32-bit
registers in RImode instructions. RImode values are meant for the 40-bit
registers, i.e. the data registers. That means I will not be able to use
address registers (32-bit registers) in RImode patterns even though the
instructions accept them. This will definitely hamper efficiency.

So I was wondering if anybody has an alternative solution that I can
try. All I can think of is to flag an insn as an unsigned operation so
that I can insert the sign/zero extension during, say, the reorg pass.
Can this be implemented? How feasible is this?
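
For readers following along, the problem can be modelled on a host machine. The sketch below is only an illustration under assumptions: the 40-bit register is modelled in the low 40 bits of a 64-bit integer, and the function names are made up here, not taken from the port.

```cpp
#include <cassert>
#include <cstdint>

// Width of the modelled data register and the matching bit mask.
const int kRegBits = 40;
const std::uint64_t kRegMask = (std::uint64_t(1) << kRegBits) - 1;

// Models a 32-bit load into a 40-bit data register with automatic sign
// extension: bit 31 is copied into bits 32..39.
std::uint64_t load32_sign_extended(std::uint32_t mem) {
    std::int64_t wide = static_cast<std::int32_t>(mem);  // sign-extend to 64 bits
    return static_cast<std::uint64_t>(wide) & kRegMask;  // keep the low 40 bits
}

// What an unsigned operation needs instead: bits 32..39 cleared.
std::uint64_t load32_zero_extended(std::uint32_t mem) {
    return mem;
}

// Logical right shift carried out on the full 40-bit register, with the
// low 32 bits stored back afterwards.
std::uint32_t shift_right_and_store(std::uint64_t reg, int count) {
    return static_cast<std::uint32_t>((reg & kRegMask) >> count);
}
```

Taking the unsigned value 0x80000000 as input, the sign-extended register holds 0xFF80000000. A shift right by one then stores 0xC0000000, whereas the zero-extended register gives the expected 0x40000000. This is the kind of operation that needs an explicit zero extension first.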

Regards,
Shafi

[Bug c++/42105] New: class with operator& failed to be stored in the STL map

2009-11-19 Thread evgeny at mainsoft dot com

OS version: Red Hat Enterprise Linux AS release 4 (Nahant Update 5)
gcc version: g++ (GCC) 4.1.2
gcc command: gcc -c xx.cpp

Source file:

--
#include <map>
#include <set>

class test
{
public:

    test() { }
    int** operator&() { return (int**)0; }
    operator int*() const { return (int*)0; }
};

void foo()
{
    test s;
    std::map< unsigned int, std::set<test> > m;
    m[0].insert( s );
}

--

Compiler output:

/usr/local/gnu/gcc/4.1.2/bin/g++ -v -save-temps -c xx.cpp
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.1.2/configure --prefix=/usr/local/gnu/gcc/4.1.2
--enable-languages=c,c++
Thread model: posix
gcc version 4.1.2
 /usr/local/gnu/gcc/4.1.2/libexec/gcc/i686-pc-linux-gnu/4.1.2/cc1plus -E -quiet
-v -D_GNU_SOURCE xx.cpp -mtune=pentiumpro -fpch-preprocess -o xx.ii
ignoring nonexistent directory
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../i686-pc-linux-gnu/include
#include "..." search starts here:
#include <...> search starts here:

/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2

/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/i686-pc-linux-gnu

/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/backward
 /usr/local/include
 /usr/local/gnu/gcc/4.1.2/include
 /usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/include
 /usr/include
End of search list.
 /usr/local/gnu/gcc/4.1.2/libexec/gcc/i686-pc-linux-gnu/4.1.2/cc1plus
-fpreprocessed xx.ii -quiet -dumpbase xx.cpp -mtune=pentiumpro -auxbase xx
-version -o xx.s
GNU C++ version 4.1.2 (i686-pc-linux-gnu)
compiled by GNU C version 4.1.2.
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129323
Compiler executable checksum: edf0d2731cb64fda3dd46c0e3997ac20
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:
In member function 'void std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::destroy_node(std::_Rb_tree_node<_Val>*) [with _Key = test, _Val =
test, _KeyOfValue = std::_Identity<test>, _Compare = std::less<test>, _Alloc =
std::allocator<test>]':
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:1266:
  instantiated from 'void std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::_M_erase(std::_Rb_tree_node<_Val>*) [with _Key = test, _Val = test,
_KeyOfValue = std::_Identity<test>, _Compare = std::less<test>, _Alloc =
std::allocator<test>]'
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:578:
  instantiated from 'std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::~_Rb_tree() [with _Key = test, _Val = test, _KeyOfValue =
std::_Identity<test>, _Compare = std::less<test>, _Alloc =
std::allocator<test>]'
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_set.h:108:
  instantiated from '_Tp& std::map<_Key, _Tp, _Compare,
_Alloc>::operator[](const _Key&) [with _Key = unsigned int, _Tp =
std::set<test, std::less<test>, std::allocator<test> >, _Compare =
std::less<unsigned int>, _Alloc = std::allocator<std::pair<const unsigned int,
std::set<test, std::less<test>, std::allocator<test> > > >]'
xx.cpp:17:   instantiated from here
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:391:
error: no matching function for call to 'std::allocator<test>::destroy(int**)'
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:107:
note: candidates are: void __gnu_cxx::new_allocator<_Tp>::destroy(_Tp*) [with
_Tp = test]
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:
In member function 'std::_Rb_tree_node<_Val>* std::_Rb_tree<_Key, _Val,
_KeyOfValue, _Compare, _Alloc>::_M_create_node(const _Val&) [with _Key = test,
_Val = test, _KeyOfValue = std::_Identity<test>, _Compare = std::less<test>,
_Alloc = std::allocator<test>]':
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:819:
  instantiated from 'typename std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare,
_Alloc>::_M_insert(std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, const
_Val&) [with _Key = test, _Val = test, _KeyOfValue = std::_Identity<test>,
_Compare = std::less<test>, _Alloc = std::allocator<test>]'
/usr/local/gnu/gcc/4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:927:
  instantiated from 'std::pair<typename std::_Rb_tree<_Key, _Val, _KeyOfValue,
_Compare, _Alloc>::iterator, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue,
_Compare, _Alloc>::insert_unique(const _Val&) [with _Key = test, _Val = test,
_KeyOfValue = std::_Identity<test>, _Compare = std::less<test>, _Alloc =


[Bug c++/42105] class with operator& failed to be stored in the STL map

2009-11-19 Thread evgeny at mainsoft dot com



--- Comment #1 from evgeny at mainsoft dot com  2009-11-19 08:10 ---
Created an attachment (id=19044)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19044&action=view)
preprocessed source file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42105

[Bug c++/42105] class with operator& failed to be stored in the STL map

2009-11-19 Thread evgeny at mainsoft dot com



--- Comment #2 from evgeny at mainsoft dot com  2009-11-19 08:12 ---
The source code in the bug description compiles successfully with gcc
(g++) 3.4.3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42105

[Bug c++/42105] class with operator& failed to be stored in the STL map

2009-11-19 Thread pinskia at gcc dot gnu dot org



--- Comment #3 from pinskia at gcc dot gnu dot org  2009-11-19 08:30 ---
This is by design for C++03 but for C++0x (really C++1x) it is not.

*** This bug has been marked as a duplicate of 41792 ***
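
As background for the duplicate: a container that takes an element's address with the built-in-looking `&` ends up calling the user's overload. The sketch below is an assumption-laden illustration, not libstdc++ code. It uses std::addressof from C++11 <memory> to show how the real address can still be obtained; the type Test and the function address_of_is_hijacked are made up for this example.

```cpp
#include <cassert>
#include <memory>

// A type that overloads unary &, similar in spirit to the reported class.
struct Test {
    int** operator&() { return nullptr; }
};

// Returns true when the overloaded & and std::addressof disagree:
// the overload yields a null int**, std::addressof yields the real Test*.
bool address_of_is_hijacked() {
    Test t;
    int** via_operator = &t;          // calls Test::operator&
    Test* real = std::addressof(t);   // bypasses the overload
    return via_operator == nullptr && real != nullptr;
}
```

Library code written against C++03 that uses plain `&` on elements would get the `int**` here, which matches the `destroy(int**)` error in the original report.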


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42105

[Bug libstdc++/41792] [C++0x] overloading the address operator confuses the standard containers

2009-11-19 Thread pinskia at gcc dot gnu dot org



--- Comment #4 from pinskia at gcc dot gnu dot org  2009-11-19 08:30 ---
*** Bug 42105 has been marked as a duplicate of this bug. ***


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||evgeny at mainsoft dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41792

[Bug c++/42101] Linking failure with static constants and ternary inline function

2009-11-19 Thread paolo dot carlini at oracle dot com



--- Comment #8 from paolo dot carlini at oracle dot com  2009-11-19 09:38 ---
It is basic, yes, a point worth making with people insisting that we do have a
serious bug, thus reopening the PR at will, without trusting the competence of
the maintainers and wasting some of our time.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42101

[Bug target/36047] -pg does not work on large binaries and m68k

2009-11-19 Thread mkuvyrkov at gcc dot gnu dot org



--- Comment #5 from mkuvyrkov at gcc dot gnu dot org  2009-11-19 10:09 ---
g...@breakpoint.cc,

Would you please submit your patch to gcc-patc...@gcc.gnu.org.  Only the
linux.h version of FUNCTION_PROFILER causes problems, you can leave the m68k.h
version as is.

Thanks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36047

[Bug fortran/42104] [F03] runtime segfault with procedure pointer component

2009-11-19 Thread janus at gcc dot gnu dot org



--- Comment #3 from janus at gcc dot gnu dot org  2009-11-19 10:45 ---
Confirmed.


-- 

janus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |wrong-code
   Last reconfirmed|-00-00 00:00:00             |2009-11-19 10:45:21
               date|                            |
            Summary|Segmentation fault with     |[F03] runtime segfault with
                   |procedure pointer component |procedure pointer component


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42104

[Bug target/41473] [4.5 Regression] dsymutil Assertion failed ...

2009-11-19 Thread dominiq at lps dot ens dot fr



--- Comment #23 from dominiq at lps dot ens dot fr  2009-11-19 10:56 ---
Further reduced test case:

void
check_add_float (void)
{
  _Complex float a1, a2, b2, c2; 
  a1 = 0.0f;
  a2 = a1;
  b2 = a1; 
  c2 = a2 + b2;
}

int
main (void)
{
  check_add_float ();
}

[ibook-dhum] f90/bug% gcc45 complex-sign-add_red_2.c -O1 -g
-fno-guess-branch-probability
[ibook-dhum] f90/bug% gcc45 complex-sign-add_red_2.c -O1 -g
Assertion failed: (orig_str), function FixReferences, file
/SourceCache/dwarf_utilities/dwarf_utilities-70/source/DWARFdSYM.cpp, line
3641.
...


-- 

dominiq at lps dot ens dot fr changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu dot
   ||org, jakub at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41473

[Bug target/41473] [4.5 Regression] dsymutil Assertion failed ...

2009-11-19 Thread dominiq at lps dot ens dot fr



--- Comment #24 from dominiq at lps dot ens dot fr  2009-11-19 10:58 ---
Created an attachment (id=19045)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19045&action=view)
assembly generated with gcc45 complex-sign-add_red_2.c -O1 -g
-fno-guess-branch-probability -S

[ibook-dhum] f90/bug% rm -rf a.out*
[ibook-dhum] f90/bug% as complex-sign-add_red_2_nop.s -o
complex-sign-add_red_2_nop.o
[ibook-dhum] f90/bug% gcc complex-sign-add_red_2_nop.o
[ibook-dhum] f90/bug% dsymutil a.out
[ibook-dhum] f90/bug%


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41473

[Bug target/41473] [4.5 Regression] dsymutil Assertion failed ...

2009-11-19 Thread dominiq at lps dot ens dot fr



--- Comment #25 from dominiq at lps dot ens dot fr  2009-11-19 10:59 ---
Created an attachment (id=19046)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19046&action=view)
assembly generated by gcc45 complex-sign-add_red_2.c -O1 -g -S

[ibook-dhum] f90/bug% as complex-sign-add_red_2_yes.s -o
complex-sign-add_red_2_yes.o
[ibook-dhum] f90/bug% gcc complex-sign-add_red_2_yes.o
[ibook-dhum] f90/bug% dsymutil a.out
Assertion failed: (orig_str), function FixReferences, file
/SourceCache/dwarf_utilities/dwarf_utilities-70/source/DWARFdSYM.cpp, line
3641.
Abort


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41473

[Bug target/41473] [4.5 Regression] dsymutil Assertion failed ...

2009-11-19 Thread dominiq at lps dot ens dot fr



--- Comment #26 from dominiq at lps dot ens dot fr  2009-11-19 11:00 ---
Created an attachment (id=19047)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19047&action=view)
diff between complex-sign-add_red_2_nop.s and complex-sign-add_red_2_yes.s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41473

[Bug target/41810] Cannot build gcc: gthr-default.h:466: error: '__mutex' was not declared in this scope

2009-11-19 Thread ro at CeBiTec dot Uni-Bielefeld dot DE



--- Comment #8 from ro at CeBiTec dot Uni-Bielefeld dot DE  2009-11-19 11:25 ---
Subject: Re:  Cannot build gcc: gthr-default.h:466: error: '__mutex' was not
declared in this scope

 --- Comment #7 from alanpae at ilkda dot com  2009-11-18 19:39 ---
 changing to --disable-threads also works.

True, but why not omit any --{enable, disable}-threads option and use
the default?

Rainer


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41810

[Bug target/41810] Cannot build gcc: gthr-default.h:466: error: '__mutex' was not declared in this scope

2009-11-19 Thread jakub at gcc dot gnu dot org



--- Comment #9 from jakub at gcc dot gnu dot org  2009-11-19 11:54 ---
The #c4 patch looks wrong, instead of that you should IMHO just not use UNUSED
macro on __gthread_mutex_destroy argument.  It is perfectly fine on
__gthread_key_delete.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41810

[Bug fortran/42104] [F03] runtime segfault with procedure pointer component

2009-11-19 Thread janus at gcc dot gnu dot org



--- Comment #4 from janus at gcc dot gnu dot org  2009-11-19 11:57 ---
Let's have a look at the dump for the test case in comment #2.

The call to 'func' is translated to:

  real(kind=4) D.1568;
  struct array1_real(kind=4) parm.7;
  static real(kind=4) A.6[2] = {1.0001490116119384765625e-1,
1.0001490116119384765625e-1};
  static integer(kind=4) C.1562 = 3;

  parm.7.dtype = 281;
  parm.7.dim[0].lbound = 1;
  parm.7.dim[0].ubound = 2;
  parm.7.dim[0].stride = 1;
  parm.7.data = (void *) A.6[0];
  parm.7.offset = 0;
  D.1568 = func (C.1562, parm.7);


The PPC call to 'funcp%p' looks similar, except for:

  D.1576 = _gfortran_internal_pack (parm.10);
  D.1578 = funcp.p (C.1570, D.1576);

(i.e. it has an additional _gfortran_internal_pack).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42104

[Bug c/42079] missing unitialized warning on simple testcase

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #2 from manu at gcc dot gnu dot org  2009-11-19 12:00 ---
Taking the address of a variable causes a missing "may be uninitialized" warning.

*** This bug has been marked as a duplicate of 19430 ***
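
A minimal sketch of the pattern in question, written for illustration and not taken from either bug report (the names maybe_init and read_after_maybe_init are hypothetical): once the variable's address is passed to another function, the compiler may no longer be able to tell whether it was written, and depending on the GCC version and optimization level no warning is issued.

```cpp
#include <cassert>

// Hypothetical helper: writes *out only when do_init is true.
void maybe_init(int* out, bool do_init) {
    if (do_init) *out = 42;
}

// 'value' is deliberately left uninitialized. Its address escapes into
// maybe_init, so the read below is only defined when do_init is true.
int read_after_maybe_init(bool do_init) {
    int value;
    maybe_init(&value, do_init);
    return value;
}
```

Only the call with do_init set to true has defined behaviour. The call with false reads an indeterminate value, which is the case the warning is meant to catch.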


-- 

manu at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||manu at gcc dot gnu dot org
 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42079

[Bug middle-end/19430] V_MAY_DEF (taking address of var) causes missing uninitialized warning

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #17 from manu at gcc dot gnu dot org  2009-11-19 12:00 ---
*** Bug 42079 has been marked as a duplicate of this bug. ***


-- 

manu at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||m dot b dot lankhorst at
   ||gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19430

[Bug c/41441] failure to warn about uninitialized induction var

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #2 from manu at gcc dot gnu dot org  2009-11-19 12:13 ---
If the loop does nothing, the whole loop is removed before warning about
anything. If you find a testcase where the loop does something useful, and
there is still no warning, please open a new bug report. Thanks.


-- 

manu at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||manu at gcc dot gnu dot org
 Status|NEW |RESOLVED
 Resolution||INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41441

[Bug middle-end/41817] bogus may be uninitialized (huge testcase, inlining?)

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #7 from manu at gcc dot gnu dot org  2009-11-19 12:19 ---
This is still unconfirmed until someone looks at the dumps and checks that the
variables are indeed initialized in all paths that can be sensibly detected by
GCC.

BTW, when you release code, your compiler flags should not contain -Werror. If
some package does, you should really report it upstream: given the number of
new warnings and of fixes to existing warnings that appear between consecutive
GCC releases, -Werror is madness for any user compiling your code.


-- 

manu at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |manu at gcc dot gnu dot org
OtherBugsDependingO|                            |24639
              nThis|                            |
            Summary|elfutils fails with may be  |bogus may be uninitialized
                   |uninitialized with -O3 -    |(huge testcase, inlining?)
                   |mtune=k8                    |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41817

[Bug middle-end/39936] -Wuninitialized false positive with unhelpful diagnostic

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #3 from manu at gcc dot gnu dot org  2009-11-19 12:22 ---
The best we can do is to add this testcase to GCC 4.5 and close this as FIXED
in mainline. These kinds of fixes are typically not easy to backport.


-- 

manu at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |manu at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2009-11-19 12:22:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39936

[Bug bootstrap/42068] [4.5 regression] ICE in function_and_variable_visibility breaks Tru64 UNIX Ada bootstrap

2009-11-19 Thread ebotcazou at gcc dot gnu dot org



--- Comment #1 from ebotcazou at gcc dot gnu dot org  2009-11-19 12:26 ---
Created an attachment (id=19048)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19048&action=view)
Reduced testcase

To be gnatchop-ed and cross-compiled.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42068

[Bug c++/42000] missing -Wuninitialized warning on a user-defined class ctor

2009-11-19 Thread manu at gcc dot gnu dot org



--- Comment #2 from manu at gcc dot gnu dot org  2009-11-19 12:28 ---
I think this is a duplicate of either bug 2972 or bug 19808 or one of the SRA
testcases.


-- 

manu at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||manu at gcc dot gnu dot org
OtherBugsDependingO||24639
  nThis||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42000
