Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread John Marino
On 4/16/2014 03:22, Ian Lance Taylor wrote:
> On Tue, Apr 15, 2014 at 4:45 AM, Douglas B Rupp  wrote:
>> On 04/14/2014 02:01 PM, Ian Lance Taylor wrote:
>>
>> No I considered that but I think that number will be very small. Will you
>> concede, in hindsight, that it would be better had the name been chosen to
>> be more unique?
> 
> No, I won't concede that.  The unwind.h file provides the interface
> for the C++ exception handling interface
> (http://mentorembedded.github.io/cxx-abi/abi-eh.html).  That interface
> is implemented by several different compilers, not just GCC.

The header can provide the exact same interface with a different, better
file name.

He's basically asking, "If you had it to do all over again, would you
still call it unwind.h or would you call it something different?"

It's just an academic discussion because answering yes or no changes
nothing, but I think the majority of the people would give it a
different file name if they could do it all over again.  It's not a big
concession.

John


Fragile test case nsdmi-union5.C

2014-04-16 Thread Joey Ye
Ran into a fragile test case:
FAIL: g++.dg/cpp0x/nsdmi-union5.C  -std=c++11  scan-assembler 7

$ cat nsdmi-union5.C
// PR c++/58701
// { dg-require-effective-target c++11 }
// { dg-final { scan-assembler "7" } }

static union
{
  union
  {
int i = 7;
  };
};

Two issues make it very fragile. It only seems to pass with -O0, as the code
is optimized away at any level other than -O0. That is somewhat acceptable,
but the following is annoying: it scans for the digit 7 in the resulting asm,
which will pass whenever
* any CPU name, file name, svn/git revision number or ARM eabi_attribute
contains the digit 7,
* the GCC version is x.7, or
* GCC was built in July, on the 7th/17th/27th of the month, or in 2017.
I actually ran into this issue when comparing my test results with a
reference run. Unfortunately one of the two has the digit 7 in its revision
number and the other does not.

I tend to just not scan anything and make it a dg-do compile case that only
guards against an ICE. But I'm not sure whether there was a reason to scan
for the digit. Please comment.

Thanks,
Joey




Re: Fragile test case nsdmi-union5.C

2014-04-16 Thread pinskia


> On Apr 16, 2014, at 12:42 AM, "Joey Ye"  wrote:
> 
> Ran into a fragile test case:
> FAIL: g++.dg/cpp0x/nsdmi-union5.C  -std=c++11  scan-assembler 7
> 
> $ cat nsdmi-union5.C
> // PR c++/58701
> // { dg-require-effective-target c++11 }
> // { dg-final { scan-assembler "7" } }
> 
> static union
> {
>  union
>  {
>int i = 7;
>  };
> };
> 
> Two issues make it very fragile. It only seems to pass with -O0, as the code
> is optimized away at any level other than -O0. That is somewhat acceptable,
> but the following is annoying: it scans for the digit 7 in the resulting asm,
> which will pass whenever
> * any CPU name, file name, svn/git revision number or ARM eabi_attribute
> contains the digit 7,
> * the GCC version is x.7, or
> * GCC was built in July, on the 7th/17th/27th of the month, or in 2017.
> I actually ran into this issue when comparing my test results with a
> reference run. Unfortunately one of the two has the digit 7 in its revision
> number and the other does not.
>
> I tend to just not scan anything and make it a dg-do compile case that only
> guards against an ICE. But I'm not sure whether there was a reason to scan
> for the digit. Please comment.


What about adding an explicit -O0 and doing a scan on the gimple dump?

Thanks,
Andrew

> 
> Thanks,
> Joey
> 
> 


Re: Fragile test case nsdmi-union5.C

2014-04-16 Thread Richard Biener
On Wed, Apr 16, 2014 at 10:24 AM,   wrote:
>
>
>> On Apr 16, 2014, at 12:42 AM, "Joey Ye"  wrote:
>>
>> Ran into a fragile test case:
>> FAIL: g++.dg/cpp0x/nsdmi-union5.C  -std=c++11  scan-assembler 7
>>
>> $ cat nsdmi-union5.C
>> // PR c++/58701
>> // { dg-require-effective-target c++11 }
>> // { dg-final { scan-assembler "7" } }
>>
>> static union
>> {
>>  union
>>  {
>>int i = 7;
>>  };
>> };
>>
>> Two issues make it very fragile. It only seems to pass with -O0, as the code
>> is optimized away at any level other than -O0. That is somewhat acceptable,
>> but the following is annoying: it scans for the digit 7 in the resulting asm,
>> which will pass whenever
>> * any CPU name, file name, svn/git revision number or ARM eabi_attribute
>> contains the digit 7,
>> * the GCC version is x.7, or
>> * GCC was built in July, on the 7th/17th/27th of the month, or in 2017.
>> I actually ran into this issue when comparing my test results with a
>> reference run. Unfortunately one of the two has the digit 7 in its revision
>> number and the other does not.
>>
>> I tend to just not scan anything and make it a dg-do compile case that only
>> guards against an ICE. But I'm not sure whether there was a reason to scan
>> for the digit. Please comment.
>
>
> What about adding an explicit -O0 and doing a scan on the gimple dump?

Or make it a runtime testcase?

Richard.

> Thanks,
> Andrew
>
>>
>> Thanks,
>> Joey
>>
>>
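
For illustration, a runtime variant along those lines might look like the
following (a sketch only; it assumes the member i of the static anonymous
union is directly accessible from the enclosing scope, as in the original PR):

// PR c++/58701
// { dg-do run }
// { dg-require-effective-target c++11 }

static union
{
  union
  {
    int i = 7;
  };
};

int main ()
{
  if (i != 7)
    __builtin_abort ();
  return 0;
}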


Fwd: Using GCC to convert markup to C++

2014-04-16 Thread Akos Vandra
Forwarding this to the gcc list, since it seems to be more relevant
to the topic of this list. Sorry for the confusion.

Regards,
  Akos Vandra


-- Forwarded message --
From: Akos Vandra 
Date: 16 April 2014 12:48
Subject: Using GCC to convert markup to C++
To: gcc-h...@gcc.gnu.org


Hi!

We are developing a C++ Web Framework, and we use ERB for the markup
of the views.
I would like to use the GCC tool collection to convert that ERB markup
to C++ code that would actually do the rendering, so the markup
interpretation would happen at compile time.

How is that possible? Can you give me some pointers on what part of
the documentation I should read up on?

Is a frontend capable of generating C++ code? Is that the way this
should be solved, or should I take another route?
The reason for using GCC would be its (presumed) capability of parsing
a language grammar and extracting the tokens, which could then easily be
translated into code afterwards.

Thanks,
  Akos Vandra


P.S. Something like:

"
<%#include  >
Hello ERB World!
You are the <%= count %>. visitor!
The current time is <%= strftime(time()) %>.
"

into

"
  #include "abstractview.h"
  #include "context.h"

  class MyView : public AbstractView {
virtual void render(Context ctx);
  }
"

"
#include "my_view.h"
#include 

void MyView::render(Context ctx) {
  std::string s;
  s.reserve(1);
  s.append("Hello ERB World!\n You are the ");
  s.append(count);
  s.append(". visitor!\n The current time is: ");
  s.append(strftime(time());
  s.append(".");

  return s;
}
"


Performance gain through dereferencing?

2014-04-16 Thread Peter Schneider
I have made a curious performance observation with gcc under 64 bit 
cygwin on a corei7. I'm genuinely puzzled and couldn't find any 
information about it. Perhaps this is only indirectly a gcc question 
though, bear with me.


I have two trivial programs which assign a loop variable to a local 
variable 10^8 times. One does it the obvious way, the other one accesses 
the variable through a pointer, which means it must dereference the 
pointer first. This is reflected nicely in the disassembly snippets of 
the respective loop bodies below. Funny enough, the loop with the extra 
dereferencing runs considerably faster than the loop with the direct 
assignment (>10%). While the issue (indeed the whole program ;-) ) goes 
away with optimization, in less trivial scenarios that may not be so.


My first question is: What makes the smaller code slower?
The gcc question is: Should assignment always be performed through a 
pointer if it is faster? (Probably not, but why not?) A session 
transcript including the compilable source is below.


Here are the disassembled loop bodies:

Direct access
=
localInt = i;
   1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
   1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)


Pointer access
=
*localP = i;
   1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
   1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
   1004010f5:   89 10                   mov    %edx,(%rax)

Note the first instruction, which moves the address into %rax. The other
two are similar to the direct assignment above.


Here is a session transcript:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: 
/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure 
--srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2 
--prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin 
--libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var 
--sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share 
--docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C 
--build=x86_64-pc-cygwin --host=x86_64-pc-cygwin 
--target=x86_64-pc-cygwin --without-libiconv-prefix 
--without-libintl-prefix --enable-shared --enable-shared-libgcc 
--enable-static --enable-version-specific-runtime-libs 
--enable-bootstrap --disable-__cxa_atexit --with-dwarf2 
--with-tune=generic 
--enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite 
--enable-threads=posix --enable-libatomic --enable-libgomp 
--disable-libitm --enable-libquadmath --enable-libquadmath-support 
--enable-libssp --enable-libada --enable-libgcj-sublibs 
--disable-java-awt --disable-symvers 
--with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as 
--with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix 
--without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib

Thread model: posix
gcc version 4.8.2 (GCC)

peter@peter-lap ~/src/test/obj_vs_ptr
$ cat ./t
#!/bin/bash

cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1


peter@peter-lap ~/src/test/obj_vs_ptr
$ ./t obj
int main()
{
int localInt;
for (int i = 0; i < 100000000; ++i)
localInt = i;
return 0;
}
real    0m0.248s
user    0m0.234s
sys     0m0.015s

peter@peter-lap ~/src/test/obj_vs_ptr
$ ./t ptr
int main()
{
int localInt;
int *localP = &localInt;
for (int i = 0; i < 100000000; ++i)
*localP = i;
return 0;
}

real    0m0.215s
user    0m0.203s
sys     0m0.000s



Re: Performance gain through dereferencing?

2014-04-16 Thread David Brown

Hi,

You cannot learn useful timing information from a single run of a short
test like this - there are far too many other factors that come into play.

You cannot learn useful timing information from unoptimised code.

There is too much luck involved in a test like this to be useful.  You
need optimised code (at least -O1), longer times, more tests, varied
code, etc., before being able to conclude anything.  Otherwise the
result could be nothing more than a quirk of the way caching worked out.

mvh.,

David
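
To illustrate that advice, a version of the loop that still does measurable
work at -O2 might look like this (a sketch; the 10^8 iteration count is the
one described in the original post, and the volatile sink is only there so
the stores are not optimized away):

/* obj2.c -- build and time e.g. with:  gcc -std=c99 -O2 -o obj2 obj2.c && time ./obj2  */
volatile int sink;   /* volatile so the loop survives optimization */

int main(void)
{
  for (int i = 0; i < 100000000; ++i)
    sink = i;
  return 0;
}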


On 16/04/14 16:26, Peter Schneider wrote:
> I have made a curious performance observation with gcc under 64 bit
> cygwin on a corei7. I'm genuinely puzzled and couldn't find any
> information about it. Perhaps this is only indirectly a gcc question
> though, bear with me.
> 
> I have two trivial programs which assign a loop variable to a local
> variable 10^8 times. One does it the obvious way, the other one accesses
> the variable through a pointer, which means it must dereference the
> pointer first. This is reflected nicely in the disassembly snippets of
> the respective loop bodies below. Funny enough, the loop with the extra
> dereferencing runs considerably faster than the loop with the direct
> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
> away with optimization, in less trivial scenarios that may not be so.
> 
> My first question is: What makes the smaller code slower?
> The gcc question is: Should assignment always be performed through a
> pointer if it is faster? (Probably not, but why not?) A session
> transcript including the compilable source is below.
> 
> Here are the disassembled loop bodies:
> 
> Direct access
> =
> localInt = i;
>1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
> 
> 
> Pointer access
> =
> *localP = i;
>1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>1004010f5:   89 10                   mov    %edx,(%rax)
> 
> Note the first instruction which moves the address into %rax. The other
> two are similar to the direct assignment above.--
> 
> Here is a session transcript:
> 
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
> Target: x86_64-pc-cygwin
> Configured with:
> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
> --target=x86_64-pc-cygwin --without-libiconv-prefix
> --without-libintl-prefix --enable-shared --enable-shared-libgcc
> --enable-static --enable-version-specific-runtime-libs
> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
> --with-tune=generic
> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
> --enable-threads=posix --enable-libatomic --enable-libgomp
> --disable-libitm --enable-libquadmath --enable-libquadmath-support
> --enable-libssp --enable-libada --enable-libgcj-sublibs
> --disable-java-awt --disable-symvers
> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
> Thread model: posix
> gcc version 4.8.2 (GCC)
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ cat ./t
> #!/bin/bash
> 
> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
> 
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t obj
> int main()
> {
> int localInt;
> for (int i = 0; i < 100000000; ++i)
> localInt = i;
> return 0;
> }
> real0m0.248s
> user0m0.234s
> sys 0m0.015s
> 
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t ptr
> int main()
> {
> int localInt;
> int *localP = &localInt;
> for (int i = 0; i < 100000000; ++i)
> *localP = i;
> return 0;
> }
> 
> real0m0.215s
> user0m0.203s
> sys 0m0.000s
> 
> 



Re: Performance gain through dereferencing?

2014-04-16 Thread David Guillen
Hello,

I completely agree with David.
Note that your results will greatly vary depending on the machine you
run the tests on. Performance on such tests is very
machine-dependent, so the conclusions cannot be generalized.

David

2014-04-16 16:49 GMT+02:00 David Brown :
>
> Hi,
>
> You cannot learn useful timing information from a single run of a short
> test like this - there are far too many other factors that come into play.
>
> You cannot learn useful timing information from unoptimised code.
>
> There is too much luck involved in a test like this to be useful.  You
> need optimised code (at least -O1), longer times, more tests, varied
> code, etc., before being able to conclude anything.  Otherwise the
> result could be nothing more than a quirk of the way caching worked out.
>
> mvh.,
>
> David
>
>
> On 16/04/14 16:26, Peter Schneider wrote:
>> I have made a curious performance observation with gcc under 64 bit
>> cygwin on a corei7. I'm genuinely puzzled and couldn't find any
>> information about it. Perhaps this is only indirectly a gcc question
>> though, bear with me.
>>
>> I have two trivial programs which assign a loop variable to a local
>> variable 10^8 times. One does it the obvious way, the other one accesses
>> the variable through a pointer, which means it must dereference the
>> pointer first. This is reflected nicely in the disassembly snippets of
>> the respective loop bodies below. Funny enough, the loop with the extra
>> dereferencing runs considerably faster than the loop with the direct
>> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
>> away with optimization, in less trivial scenarios that may not be so.
>>
>> My first question is: What makes the smaller code slower?
>> The gcc question is: Should assignment always be performed through a
>> pointer if it is faster? (Probably not, but why not?) A session
>> transcript including the compilable source is below.
>>
>> Here are the disassembled loop bodies:
>>
>> Direct access
>> =
>> localInt = i;
>>1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>>1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
>>
>>
>> Pointer access
>> =
>> *localP = i;
>>1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>>1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>>1004010f5:   89 10                   mov    %edx,(%rax)
>>
>> Note the first instruction which moves the address into %rax. The other
>> two are similar to the direct assignment above.--
>>
>> Here is a session transcript:
>>
>> $ gcc -v
>> Using built-in specs.
>> COLLECT_GCC=gcc
>> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
>> Target: x86_64-pc-cygwin
>> Configured with:
>> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
>> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
>> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
>> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
>> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
>> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
>> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
>> --target=x86_64-pc-cygwin --without-libiconv-prefix
>> --without-libintl-prefix --enable-shared --enable-shared-libgcc
>> --enable-static --enable-version-specific-runtime-libs
>> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
>> --with-tune=generic
>> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
>> --enable-threads=posix --enable-libatomic --enable-libgomp
>> --disable-libitm --enable-libquadmath --enable-libquadmath-support
>> --enable-libssp --enable-libada --enable-libgcj-sublibs
>> --disable-java-awt --disable-symvers
>> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
>> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
>> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
>> Thread model: posix
>> gcc version 4.8.2 (GCC)
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ cat ./t
>> #!/bin/bash
>>
>> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>>
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t obj
>> int main()
>> {
>> int localInt;
>> for (int i = 0; i < 100000000; ++i)
>> localInt = i;
>> return 0;
>> }
>> real0m0.248s
>> user0m0.234s
>> sys 0m0.015s
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t ptr
>> int main()
>> {
>> int localInt;
>> int *localP = &localInt;
>> for (int i = 0; i < 100000000; ++i)
>> *localP = i;
>> return 0;
>> }
>>
>> real0m0.215s
>> user0m0.203s
>> sys 0m0.000s
>>
>>
>


Re: Performance gain through dereferencing?

2014-04-16 Thread Peter Schneider

Hi David,

Sorry, I had included more information in an earlier draft which I 
edited out for brevity.


> You cannot learn useful timing information from a single run of a short
> test like this - there are far too many other factors that come into play.

I didn't mention that I have run it dozens of times. I know that blunt 
runtime measurements on a non-realtime system tend to be 
non-reproducible, and that they are inadequate for exact measurements. 
But the difference here is so large that the result is highly 
significant, in spite of the "amateurish" setup. The run I am showing 
here is typical. One of my four cores is surely idle at any given 
moment, and there is no I/O, so the variations are small.



> You cannot learn useful timing information from unoptimised code.


I beg to disagree. While in this case the problem (and indeed eventually
the whole program ;-) ) goes away with optimization, that may not be the
case in less trivial scenarios. And optimization or not -- I would
always contend that *p = n is **not slower** than i = n. But it is.
Something is wrong ;-).


So I'd like to direct our attention to the generated code and its 
performance (because such code conceivably could appear as the result of 
an optimized compiler run as well, in less trivial scenarios). What 
puzzles me is: How can it be that two instructions are slower than a 
very similar pair of instructions plus another one? (And that question 
is totally unrelated to optimization.)



> Otherwise the result could be nothing more than a quirk of the way caching
> worked out.


Could you explain how caching could play a role here if all variables
and addresses are on the stack and are likely to be in the same memory
page? (I'm not being sarcastic -- I may be missing something obvious.)


I can imagine that somehow the processor architecture is better utilized
by the faster version (e.g. because short inner loops pipeline worse or
whatever). For what it's worth, the programs were running on an i7-3632QM.


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Richard Henderson
On 04/16/2014 12:01 AM, John Marino wrote:
> On 4/16/2014 03:22, Ian Lance Taylor wrote:
>> On Tue, Apr 15, 2014 at 4:45 AM, Douglas B Rupp  wrote:
>>> On 04/14/2014 02:01 PM, Ian Lance Taylor wrote:
>>>
>>> No I considered that but I think that number will be very small. Will you
>>> concede, in hindsight, that it would be better had the name been chosen to
>>> be more unique?
>>
>> No, I won't concede that.  The unwind.h file provides the interface
>> for the C++ exception handling interface
>> (http://mentorembedded.github.io/cxx-abi/abi-eh.html).  That interface
>> is implemented by several different compilers, not just GCC.
> 
> The header can provide the exact same interface with a different, better
> file name.
> 
> He's basically asking, "If you had it to do all over again, would you
> still call it unwind.h or would you call it something different?"
> 
> It's just an academic discussion because answering yes or no changes
> nothing, but I think the majority of the people would give it a
> different file name if they could do it all over again.  It's not a big
> concession.

No, I don't think the majority would.

Because GCC would then be already incompatible with the Intel compiler from
which this interface was drawn, way back when the ia64 support was added to GCC
and we redesigned GCC's exception handling.


r~


Re: Performance gain through dereferencing?

2014-04-16 Thread Peter Schneider
In order to see what difference a different processor makes I also tried 
the same code on a fairly old 32 bit "AMD Athlon(tm) XP 3000+" with the 
current stable gcc (4.7.2). The difference is even more striking 
(dereferencing is much faster). I see that the size of the code inside 
the loop for the faster pointer access is exactly 8. No idea whether 
that has any significance.


Here as well I performed several runs with similar results. Statistical 
significance was established around n=2 ;-).


gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' 
--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs 
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.7 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--enable-gnu-unique-object --enable-plugin --enable-objc-gc 
--enable-targets=all --with-arch-32=i586 --with-tune=generic 
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu 
--target=i486-linux-gnu

Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5)

ppeterr@www:~/src/test/obj-vs-ptr$  cat t
#!/bin/bash
cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1

ppeterr@www:~/src/test/obj-vs-ptr$ ./t obj
int main()
{
int localInt;
for (int i = 0; i < 100000000; ++i)
localInt = i;
return 0;
}

real    0m0.418s
user    0m0.416s
sys     0m0.004s
ppeterr@www:~/src/test/obj-vs-ptr$ ./t ptr
int main()
{
int localInt;
int *localP = &localInt;
for (int i = 0; i < 100000000; ++i)
*localP = i;
return 0;
}

real    0m0.243s
user    0m0.240s
sys     0m0.000s

===

The disassembly is for the direct access (slower):

localInt = i;
 80483eb:   8b 45 fc                mov    -0x4(%ebp),%eax
 80483ee:   89 45 f8                mov    %eax,-0x8(%ebp)

And for the pointer access (faster):

*localP = i;
 80483f1:   8b 45 f8                mov    -0x8(%ebp),%eax
 80483f4:   8b 55 fc                mov    -0x4(%ebp),%edx
 80483f7:   89 10                   mov    %edx,(%eax)



Re: Performance gain through dereferencing?

2014-04-16 Thread Richard Biener
On April 16, 2014 7:45:55 PM CEST, Peter Schneider  wrote:
>In order to see what difference a different processor makes I also
>tried 
>the same code on a fairly old 32 bit "AMD Athlon(tm) XP 3000+" with the
>
>current stable gcc (4.7.2). The difference is even more striking 
>(dereferencing is much faster). I see that the size of the code inside 
>the loop for the faster pointer access is exactly 8. No idea whether 
>that has any significance.

Alignment of jump targets is important. I don't think we do anything special
there at -O0, so the result will be pure luck.

Richard.
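
For reference, at -O2 and above GCC emits explicit alignment for such targets
(controlled by -falign-loops/-falign-jumps); on x86 the loop head then
typically looks something like the following in the generated assembly
(label name and padding values here are only illustrative):

        .p2align 4,,15          # pad so the loop label starts on a 16-byte boundary
.L2:
        # ...loop body...
        jne     .L2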

>Here as well I performed several runs with similar results. Statistical
>
>significance was established around n=2 ;-).
>
>gcc -v
>Using built-in specs.
>COLLECT_GCC=gcc
>COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
>Target: i486-linux-gnu
>Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5'
>
>--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs 
>--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr 
>--program-suffix=-4.7 --enable-shared --enable-linker-build-id 
>--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
>--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 
>--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
>--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
>--enable-gnu-unique-object --enable-plugin --enable-objc-gc 
>--enable-targets=all --with-arch-32=i586 --with-tune=generic 
>--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu 
>--target=i486-linux-gnu
>Thread model: posix
>gcc version 4.7.2 (Debian 4.7.2-5)
>
>ppeterr@www:~/src/test/obj-vs-ptr$  cat t
>#!/bin/bash
>cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t obj
>int main()
>{
> int localInt;
> for (int i = 0; i < 100000000; ++i)
> localInt = i;
> return 0;
>}
>
>real0m0.418s
>user0m0.416s
>sys 0m0.004s
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t ptr
>int main()
>{
> int localInt;
> int *localP = &localInt;
> for (int i = 0; i < 100000000; ++i)
> *localP = i;
> return 0;
>}
>
>real0m0.243s
>user0m0.240s
>sys 0m0.000s
>
>===
>
>The disassembly is for the direct access (slower):
>
> localInt = i;
>  80483eb:   8b 45 fc                mov    -0x4(%ebp),%eax
>  80483ee:   89 45 f8                mov    %eax,-0x8(%ebp)
>
>And for the pointer access (faster):
>
> *localP = i;
>  80483f1:   8b 45 f8                mov    -0x8(%ebp),%eax
>  80483f4:   8b 55 fc                mov    -0x4(%ebp),%edx
>  80483f7:   89 10                   mov    %edx,(%eax)




Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Eric Botcazou
> Because GCC would then be already incompatible with the Intel compiler from
> which this interface was drawn, way back when the ia64 support was added to
> GCC and we redesigned GCC's exception handling.

The irony being that WindRiver is now owned by Intel...

Doug, what does this unwind.h from VxWorks 7 contain exactly?  Is it something 
that is derived from the original ICC implementation?

-- 
Eric Botcazou


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Douglas B Rupp

On 04/16/2014 12:38 PM, Eric Botcazou wrote:

Because GCC would then be already incompatible with the Intel compiler from
which this interface was drawn, way back when the ia64 support was added to
GCC and we redesigned GCC's exception handling.


The irony being that WindRiver is now owned by Intel...

Doug, what does this unwind.h from VxWorks 7 contain exactly?  Is it something
that is derived from the original ICC implementation?



There's no reference in it to a derivative work. It's
Copyright 2010 Wind River
and the description says it's the C++ ABI interface to the unwinder.




Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Ian Lance Taylor
On Wed, Apr 16, 2014 at 12:01 AM, John Marino  wrote:
> On 4/16/2014 03:22, Ian Lance Taylor wrote:
>> On Tue, Apr 15, 2014 at 4:45 AM, Douglas B Rupp  wrote:
>>> On 04/14/2014 02:01 PM, Ian Lance Taylor wrote:
>>>
>>> No I considered that but I think that number will be very small. Will you
>>> concede, in hindsight, that it would be better had the name been chosen to
>>> be more unique?
>>
>> No, I won't concede that.  The unwind.h file provides the interface
>> for the C++ exception handling interface
>> (http://mentorembedded.github.io/cxx-abi/abi-eh.html).  That interface
>> is implemented by several different compilers, not just GCC.
>
> The header can provide the exact same interface with a different, better
> file name.
>
> He's basically asking, "If you had it to do all over again, would you
> still call it unwind.h or would you call it something different?"
>
> It's just an academic discussion because answering yes or no changes
> nothing, but I think the majority of the people would give it a
> different file name if they could do it all over again.  It's not a big
> concession.

I agree that it doesn't matter at this date, but I would still vote to
call it unwind.h.  It's a good descriptive name for the interface
described by the file.  I certainly wouldn't call it unwind-gcc.h;
it's intentionally not GCC-specific.

Ian
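
For reference, the interface in question is the language-neutral unwinder API
from the Itanium C++ ABI. A rough sketch of the core declarations it covers
(types simplified here; this is not a verbatim copy of GCC's unwind.h):

struct _Unwind_Exception;   /* exception header shared by all languages */
struct _Unwind_Context;     /* opaque handle on the frame being unwound */
typedef enum { _URC_NO_REASON = 0, /* ... */ _URC_CONTINUE_UNWIND = 8 } _Unwind_Reason_Code;

_Unwind_Reason_Code _Unwind_RaiseException (struct _Unwind_Exception *);
void _Unwind_Resume (struct _Unwind_Exception *);
void _Unwind_DeleteException (struct _Unwind_Exception *);
unsigned long _Unwind_GetGR (struct _Unwind_Context *, int);
void _Unwind_SetGR (struct _Unwind_Context *, int, unsigned long);
unsigned long _Unwind_GetIP (struct _Unwind_Context *);
void _Unwind_SetIP (struct _Unwind_Context *, unsigned long);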


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Richard Henderson
On 04/16/2014 12:48 PM, Douglas B Rupp wrote:
> On 04/16/2014 12:38 PM, Eric Botcazou wrote:
>>> Because GCC would then be already incompatible with the Intel compiler from
>>> which this interface was drawn, way back when the ia64 support was added to
>>> GCC and we redesigned GCC's exception handling.
>>
>> The irony being that WindRiver is now owned by Intel...
>>
>> Doug, what does this unwind.h from VxWorks 7 contain exactly?  Is it 
>> something
>> that is derived from the original ICC implementation?
>>
> 
> There's no reference in it to a derivative work. It's
> Copyright 2010 Wind River
> and the description says it's the C++ ABI interface to the unwinder.

Is it a (reasonably) close match, interface-wise?  Ought we be using
--with-system-libunwind for VxWorks7, like we do for hpux?


r~


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Douglas B Rupp

On 04/16/2014 12:55 PM, Richard Henderson wrote:


Is it a (reasonably) close match, interface-wise?  Ought we be using
--with-system-libunwind for VxWorks7, like we do for hpux?


It looks reasonable at first glance, but there's a disturbing comment in
the code, something to the effect of:


until we have a GCC compatible unwinding library, hide the API



Re: Redundant / wasted stack space and instructions

2014-04-16 Thread Jeff Law

On 04/16/14 00:30, pshor...@dataworx.com.au wrote:


> I had left the movsi patterns unimplemented because I was told that if I
> did this then gcc would create expands/splits to use 16 bit moves. So, I
> removed my movsi patterns and all seemed well.

Correct.  GCC can synthesize movsi from movhi.  However...

> In comparing the output of the expand pass from the msp430 port and mine
> I could see the movsi instructions for the msp430 and the movhi subreg
> instructions in mine.
>
> I wondered if using the subreg:HI to split the SI moves into HI moves
> was hiding the real nature of the moves for reload and future optimizations,
> so I reverted to my movsi patterns and ... the spills now resolve back
> to the incoming stack parameter slots as desired!

Seems plausible.  In general the compiler is not as good at optimizing
code with SUBREG expressions.

There's a natural tension between making the MD file closely match the
hardware capabilities and presenting a somewhat less accurate, but
easier-to-optimize, description.  Determining what's "best" is difficult,
even for those of us with many years of experience with GCC.

In the end, it always comes down to looking at how GCC compiles code of
interest and determining how to improve things.

> Oh, and I tried using the LRA on this test case and Jeff, you're correct:
> the generated code is far, far better.

Good to know.  LRA is definitely the future, but it's not a "turn on the
bit and everything gets magically better" switch, at least not most of the time.




jeff


Re: LRA Stuck in a loop until aborting

2014-04-16 Thread Paul Shortis

Solved... kind of.

*ldsi is one of the patterns movsi is expanded to and, as the name
suggests, it only handles register loads. I know that at some
stages memory references will pass the register_operand predicate,
so I changed the predicate for operand 0 and added an alternative
to *ldsi that can store to memory, and the problem went away.


What is interesting is that when LRA is disabled this case was 
handled fine by the old reload pass, suggesting that the new LRA 
pass handles this situation differently - or not at all ?


BTW. Can someone tell me whether I should be top or bottom posting ?

Cheers, Paul.

On 16/04/14 16:38, pshor...@dataworx.com.au wrote:


I've got a small test case there the ira pass produces this ...

(insn 35 38 36 5 (set (reg/v:SI 29 [orig:17 _b ] [17])
(reg/v:SI 17 [ _b ])) 48 {*ldsi}
 (expr_list:REG_DEAD (reg/v:SI 17 [ _b ])
(nil)))

and the LRA processes it as follows ...

   Spilling non-eliminable hard regs: 6
0 Non input pseudo reload: reject++
  alt=0,overall=607,losers=1,rld_nregs=2
0 Non input pseudo reload: reject++
1 Spill pseudo into memory: reject+=3
  alt=1,overall=616,losers=2,rld_nregs=2
0 Non input pseudo reload: reject++
alt=2: Bad operand -- refuse
 Choosing alt 0 in insn 35:  (0) =r  (1) r {*ldsi}
  Creating newreg=34 from oldreg=29, assigning class 
GENERAL_REGS to r34

   35: r34:SI=r17:SI
  REG_DEAD r17:SI
Inserting insn reload after:
   45: r29:SI=r34:SI

0 Non input pseudo reload: reject++
1 Non pseudo reload: reject++
  alt=0,overall=608,losers=1,rld_nregs=2
0 Non input pseudo reload: reject++
  alt=1,overall=613,losers=2,rld_nregs=2
0 Non input pseudo reload: reject++
alt=2: Bad operand -- refuse
 Choosing alt 0 in insn 45:  (0) =r  (1) r {*ldsi}
  Creating newreg=35 from oldreg=29, assigning class 
GENERAL_REGS to r35

   45: r35:SI=r34:SI
Inserting insn reload after:
   46: r29:SI=r35:SI

so, it is stuck in a loop (continues on for 90 attempts then 
aborts) but I can't see what is causing it. The pattern (below) 
shouldn't require a reload so I can't see why it would be doing 
this


(define_insn "*ldsi"
  [(set (match_operand:SI 0 "register_operand" "=r,r,r")
(match_operand:SI 1 "general_operand"   "r,m,i"))
  ]
  ""

Can anyone shed any light on this behaviour ?

Thanks, Paul.






Re: LRA Stuck in a loop until aborting

2014-04-16 Thread Richard Henderson
On 04/16/2014 03:05 PM, Paul Shortis wrote:
> Solved... kind of.
> 
> *ldsi is one of the patterns movsi is expanded to and as the name suggests it
> only handles register loads. I know that at some stages memory references will
> pass the register_operand predicate so I changed the predicate for operand 0
> and added an alternative to *ldsi that could store to memory and the problem
> went away.
> 
> What is interesting is that when LRA is disabled this case was handled fine by
> the old reload pass, suggesting that the new LRA pass handles this situation
> differently - or not at all ?

No, old reload didn't handle this either.  That you got away with it is bizarre.

It's always the case that you should combine patterns such that a given
operation can be performed with as many different kinds of inputs and outputs
as possible.  This allows the register allocator freedom to move data around as
it sees fit.

But the move patterns are especially special, in that a *single* pattern must
describe *all* possible ways that a given data type (mode) can be moved around.

The register allocators only select an alternative for a move.  They do not
choose between N different patterns, separately describing loads, stores, and
register-to-register movement.

I'm fairly sure the documentation is quite clear on this, and GCC had required
this since the beginning of time.


r~
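
To make that concrete, a combined SImode move for a hypothetical port might
look roughly like the sketch below (mnemonics, predicates and constraints are
placeholders, not taken from any real backend):

(define_expand "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "")
        (match_operand:SI 1 "general_operand" ""))]
  ""
{
  /* Typical hook: if both operands are memory, force the source into a
     register so the resulting insn matches one of the alternatives.  */
  if (MEM_P (operands[0]) && MEM_P (operands[1]))
    operands[1] = force_reg (SImode, operands[1]);
})

;; One pattern covering reg<-reg, reg<-mem, reg<-imm and mem<-reg, so the
;; register allocator can freely choose among the alternatives.
(define_insn "*movsi_internal"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m")
        (match_operand:SI 1 "general_operand"       "r,m,i,r"))]
  "register_operand (operands[0], SImode)
   || register_operand (operands[1], SImode)"
  "@
   mov\t%0,%1
   ld\t%0,%1
   ldi\t%0,%1
   st\t%1,%0")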


gcc-4.9-20140416 is now available

2014-04-16 Thread gccadmin
Snapshot gcc-4.9-20140416 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140416/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 209450

You'll find:

 gcc-4.9-20140416.tar.bz2 Complete GCC

  MD5=dda6cefa1ed78845e1e4862dc7f7522d
  SHA1=41bf62ed3008fa6fa092731e4a8bf5702521eae4

Diffs from 4.9-20140406 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Eric Botcazou
> It looks reasonable at first glance, but there's a disturbing comment in
> the code something to the effect:
> 
> until we have a GCC compatible unwinding library, hide the API

Indeed, it looks like it was written from scratch for the Diab compiler and 
is not fully compatible, but it is meant to implement the common C++ ABI.

-- 
Eric Botcazou


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Ian Lance Taylor
On Wed, Apr 16, 2014 at 3:53 PM, Eric Botcazou  wrote:
>> It looks reasonable at first glance, but there's a disturbing comment in
>> the code something to the effect:
>>
>> until we have a GCC compatible unwinding library, hide the API
>
> Indeed, it looks like it was written from scratch for the Diab compiler and
> is not fully compatible, but it is meant to implement the common C++ ABI.

I'm still not clear on what the real problem is.

It seems to me that when using GCC, if you #include <unwind.h>, you
will get the GCC version, since it will come from the installed
GCC include directory, which is searched ahead of /usr/include.  That
seems OK.

The original e-mail suggested that there was a problem building
libgcc, but I don't see why that would be.  Is that the real problem?
If so we need more details.

Ian


Re: Rename unwind.h to unwind-gcc.h

2014-04-16 Thread Douglas B Rupp

On 04/16/2014 05:56 PM, Ian Lance Taylor wrote:


I'm still not clear on what the real problem is.

It seems to me that when using GCC, if you #include <unwind.h>, you
will get the GCC version, since it will come from the installed
GCC include directory, which is searched ahead of /usr/include.  That
seems OK.

The original e-mail suggested that there was a problem building
libgcc, but I don't see why that would be.  Is that the real problem?
If so we need more details.


The root of the problem is a hack in libgcc/config/t-vxworks put in to 
resolve a name clash for "regs.h", but this clash was in the other 
direction, i.e. the vxworks version is needed in preference to the gcc 
version.


Now with vxworks7, we have a name clash in the other direction and a 
catch-22 trying to fix it.


I'm working now to try to remove this first hack, then the unwind.h 
problem should resolve itself.






Re: LRA Stuck in a loop until aborting

2014-04-16 Thread Jeff Law

On 04/16/14 16:19, Richard Henderson wrote:


> The register allocators only select an alternative for a move.  They do not
> choose between N different patterns, separately describing loads, stores, and
> register-to-register movement.
>
> I'm fairly sure the documentation is quite clear on this, and GCC had required
> this since the beginning of time.

Correct on all counts; many an hour was spent reimplementing the PA
movXX patterns to satisfy that requirement.


jeff


stack-protection vs alloca vs dwarf2

2014-04-16 Thread DJ Delorie

While debugging some gdb-related FAILs, I discovered that gcc's
-fstack-check option effectively calls alloca() to adjust the stack
pointer.

However, it doesn't mark the stack adjustment as FRAME_RELATED even
when it's setting up the local variables for the function.

In the case of rx-elf, for this testcase, the CFA for the function is
defined in terms of the stack pointer - and thus is incorrect after
the alloca call.

My question is: whose fault is this?  Should alloca() tell the debug
stuff that the stack pointer has changed?  Should it tell it to not
use $sp at all?  Should the debug stuff "just know" that $sp isn't a
valid choice for the CFA?

The testcase from gdb is pretty simple:

  void medium_frame ()
  {
char S [16384];
small_frame ();
  }
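
For context, when a backend emits such a stack adjustment itself, the usual
idiom is to mark the insn as frame-related so dwarf2cfi keeps the CFA
definition in sync. A rough sketch in GCC-internals style (the helper name
and the use of gen_add2_insn are illustrative, not rx code):

/* Adjust the stack pointer by SIZE bytes and record the adjustment in the
   unwind/CFA information.  */
static void
emit_frame_related_sp_adjust (HOST_WIDE_INT size)
{
  rtx insn = emit_insn (gen_add2_insn (stack_pointer_rtx, GEN_INT (-size)));
  RTX_FRAME_RELATED_P (insn) = 1;
}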


Re: LRA Stuck in a loop until aborting

2014-04-16 Thread pshortis

On 17.04.2014 13:00, Jeff Law wrote:

> On 04/16/14 16:19, Richard Henderson wrote:
>
>> The register allocators only select an alternative for a move.  They do not
>> choose between N different patterns, separately describing loads, stores, and
>> register-to-register movement.
>>
>> I'm fairly sure the documentation is quite clear on this, and GCC had required
>> this since the beginning of time.
>
> Correct on all counts; many an hour was spent reimplementing the PA
> movXX patterns to satisfy that requirement.
>
> jeff

I'm convinced :-) but...

gcc internals info about movm is fairly comprehensive and I had taken 
care to ensure that I satisfied ...


"The constraints on a ‘movm’ must permit moving any hard register to 
any other hard register provided..."


by providing a define_expand that assigns from a general_operand to a 
nonimmediate_operand and ...


*ldsi instruction that can load from a general_operand to a 
nonimmediate_operand

and a
*storesi instruction that can store a register_operand to a 
memory_operand


In any case, out of curiosity and to convince myself I hadn't imagined
the old reload pass handling this, I reverted my recent fixes so that
ldsi and storesi were once again as described above, then repeated the
exercise with full rtl dumping on and compared the rtl generated both
with and without LRA enabled.


In both cases the *.ira dump produced the triggering ...

(insn 57 61 58 5 (set (reg/v:SI 46 [orig:31 s ] [31])
(reg/v:SI 31 [ s ])) 48 {*ldsi}
 (expr_list:REG_DEAD (reg/v:SI 31 [ s ])
(nil)))

The non-LRA reload rtl produced ..

(insn 57 61 67 3 (set (reg:SI 1 r1)
(mem/c:SI (plus:HI (reg/f:HI 3 r3)
(const_int 4 [0x4])) [4 %sfp+4 S4 A16])) 48 {*ldsi}
 (nil))
(insn 67 57 58 3 (set (mem/c:SI (plus:HI (reg/f:HI 3 r3)
(const_int 4 [0x4])) [4 %sfp+4 S4 A16])
(reg:SI 1 r1)) 47 {*storesi}
 (nil))

While the LRA just got stuck in a loop unable to perform the reload of 
insn 57 that the old reload pass handled (or more correctly didn't choke 
over - it seems to be a redundant load/store).


I'm really just highlighting this because I know the LRA is quite young
and this might be a hint towards deeper or other issues.


Paul.