[RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
Hi,

among the support facilities required by C++0x there is:

//

20.9.9.1 addressof [specialized.addressof]

template <class T> T* addressof(T& r);

1 Returns: the actual address of the object or function referenced by r,
even in the presence of an
overloaded operator&.

Throws: nothing.

//

the only implementation I briefly skimmed, part of Boost, involves an
ugly series of reinterpret_casts which, by the way, I'm not even sure is
completely safe vs aliasing in all possible circumstances. Thus,
I'm wondering whether we want a "builtin" facility in the C++ front end for
this too, in addition to those strictly required for implementing
<type_traits>, or maybe we can put existing gcc extensions to use...

Again, directions / help welcome (at least this time the ABI is safe ;)

Thanks in advance,
Paolo.


gcc 3.2 compile issue when initialize value

2010-05-20 Thread Yixuan Huang
Hello,

I wrote following code:
#include <sys/types.h>
#include <dirent.h>
#include <string>
int main()
{
struct dirent **namelist;
int numberOfProcDirs;
numberOfProcDirs=scandir("/proc", &namelist, 0, alphasort);
//std::string temp(std::string(namelist[0]->d_name)+std::string("fdsfds"));
//std::string temp(std::string(namelist[0]->d_name)+std::string("fdsf"));
// The error occurred
std::string temp(std::string(namelist[0]->d_name)+std::string("cfdada"));
//std::string temp;
//temp = std::string(namelist[0]->d_name)+std::string("cfdada");
return 0;
}

When compiled with g++ 3.2, it reports a compile error:

test.cpp: In function `int main()':
test.cpp:12: syntax error before `->' token

But the code compiles under gcc 4.

Is this a limitation of gcc 3.2 when I use "std::string
temp(std::string(namelist[0]->d_name)+std::string("cfdada"));" to
initialize the value?


Thanks,
yixuan


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
... for reference, it would be something like this (in my recollection,
it was even uglier ;)

template<typename _Tp>
  _Tp*
  addressof(_Tp& __v)
  {
return reinterpret_cast<_Tp*>
  (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
  }

I'm not sure...

Paolo.


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
On 05/20/2010 01:10 PM, Paolo Carlini wrote:
> ... for reference, it would be something like this (in my recollections,
> it was even uglier ;)
>
> template<typename _Tp>
>   _Tp*
>   addressof(_Tp& __v)
>   {
> return reinterpret_cast<_Tp*>
>   (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
>   }
>   
By the way, Peter (I think you are the author of the current boost
implementation, which I looked at yesterday), in case we end up having
something like the above, temporarily at least, what kind of
acknowledgment would you be OK with? Is your name in the
ChangeLog enough?

Thanks,
Paolo.


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

On 05/20/2010 01:10 PM, Paolo Carlini wrote:

... for reference, it would be something like this (in my recollections,
it was even uglier ;)

template<typename _Tp>
  _Tp*
  addressof(_Tp& __v)
  {
return reinterpret_cast<_Tp*>
  (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
  }


It's uglier because the code above doesn't work for functions, and because 
of compiler bugs.



By the way, Peter (I think you are the author of the current boost
implementation, which I looked at yesterday), in case we end up having
something like the above, temporarily at least, what kind of
acknowledgment would you be OK with? Is your name in the
ChangeLog enough?


Any kind of acknowledgment is fine with me, including none at all. Whichever 
you prefer. :-)





Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
Hi,
>>> ... for reference, it would be something like this (in my
>>> recollections,
>>> it was even uglier ;)
>>>
>>> template<typename _Tp>
>>>   _Tp*
>>>   addressof(_Tp& __v)
>>>   {
>>> return reinterpret_cast<_Tp*>
>>>   (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
>>>   }
> It's uglier because the code above doesn't work for functions,
Ah, ok, it will be a little bigger then, I missed testing functions,
thanks. I will post the complete patch, in case.
> and because of compiler bugs.
Luckily we don't need that.
> Any kind of acknowledgment is fine with me, including none at all.
> Whichever you prefer. :-)
You are very kind, thanks. After all, we should still be below, say, 20
lines of code, thus, if you are ok with that, we are not going to need a
Copyright assignment, etc.

Paolo.


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
On 05/20/2010 01:55 PM, Paolo Carlini wrote:
> It's uglier because the code above doesn't work for functions,
>   
By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...

Paolo.


Re: Performance optimizations for Intel Core 2 and Core i7 processors

2010-05-20 Thread Steven Bosscher
On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov  wrote:
> CodeSourcery is working on improving performance for Intel's Core 2 and Core
> i7 families of processors.
>
> CodeSourcery plans to add support for unaligned vector instructions, to
> provide fine-tuned scheduling support and to update instruction selection
> and instruction cost models for Core i7 and Core 2 families of processors.
>
> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
> target is the end of GCC 4.6 Stage1.
>
> If your favorite benchmark significantly under-performs on Core 2 or Core i7
> CPUs, don't hesitate to ask us to take a look at it.

I'd like to ask you to look at ffmpeg (missed core2 vectorization
opportunities), polyhedron (PR34501, like, duh! :-), and Apache
benchmark (-mtune=core2 results in lower scores).

You could check overall effects on an openly available benchmark suite
such as http://www.phoronix-test-suite.com/

Good luck with this project, it'll be great when -mtune=core2 actually
improves performance rather than degrading it!

Ciao!
Steven


Re: Performance optimizations for Intel Core 2 and Core i7 processors

2010-05-20 Thread Maxim Kuvyrkov

On 5/20/10 4:04 PM, Steven Bosscher wrote:

On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov  wrote:

CodeSourcery is working on improving performance for Intel's Core 2 and Core
i7 families of processors.

CodeSourcery plans to add support for unaligned vector instructions, to
provide fine-tuned scheduling support and to update instruction selection
and instruction cost models for Core i7 and Core 2 families of processors.

As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
target is the end of GCC 4.6 Stage1.

If your favorite benchmark significantly under-performs on Core 2 or Core i7
CPUs, don't hesitate to ask us to take a look at it.


I'd like to ask you to look at ffmpeg (missed core2 vectorization
opportunities), polyhedron (PR34501, like, duh! :-), and Apache
benchmark (-mtune=core2 results in lower scores).

You could check overall effects on an openly available benchmark suite
such as http://www.phoronix-test-suite.com/


Thank you for the pointers!

--
Maxim Kuvyrkov
CodeSourcery
ma...@codesourcery.com
(650) 331-3385 x724


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be cast to char&, and I believe the standard
doesn't permit that, either.



Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
On 05/20/2010 02:18 PM, Peter Dimov wrote:
>> On 05/20/2010 01:55 PM, Paolo Carlini wrote:
>>> It's uglier because the code above doesn't work for functions,
>>>
>> By the way, do you have a specific testcase in mind?
>>
>> Because addressof_fn_test.cpp, part of Boost, passes...
>
> This is probably a g++/gcc extension... some compilers do not allow
> references to functions to be casted to char&, and I believe the
> standard doesn't permit that, either.
I see. I'm a bit reluctant to add complexity to the code, given that
current Comeau and Intel, at least, in strict-mode, also like it...

Paolo.


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

Paolo Carlini wrote:

On 05/20/2010 02:18 PM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be casted to char&, and I believe the
standard doesn't permit that, either.

I see. I'm a bit reluctant to add complexity to the code, given that
current Comeau and Intel, at least, in strict-mode, also like it...


If it works, there's certainly no need to add complexity.

Here's the ticket that prompted the boost::addressof changes:

https://svn.boost.org/trac/boost/ticket/1846

but it doesn't say which compiler didn't like it at the time. MSVC 8.0 also 
does. 



Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Andrew Pinski



Sent from my iPhone

On May 20, 2010, at 5:43 AM, "Peter Dimov"  wrote:


Paolo Carlini wrote:

On 05/20/2010 02:18 PM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be casted to char&, and I believe the
standard doesn't permit that, either.

I see. I'm a bit reluctant to add complexity to the code, given that
current Comeau and Intel, at least, in strict-mode, also like it...


If it works, there's certainly no need to add complexity.

Here's the ticket that prompted the boost::addressof changes:

https://svn.boost.org/trac/boost/ticket/1846

but it doesn't say which compiler didn't like it at the time. MSVC  
8.0 also does.




I do know at one point gcc changed to reject it by default and then  
that was reverted as it broke building libjava; not to mention most  
uses of dlsym.


Thanks,
Andrew Pinski




Workaround for error: ??? cannot appear in a constant-expression

2010-05-20 Thread Jan Tusch
I observed the following issue with g++-4.3.4:

I can work around the restriction that a static const integral
class member must be initialized with a constant expression.
This violates the standard and should be rejected by "-pedantic".
Correct me if I'm wrong.

I cut down my code to the following (admittedly a bit contrived) demo
scenario:

--- Example Code ---
template <typename NT, int x>
struct X {
  static const NT VALUE;
};
template <typename NT, int x>
const NT X<NT, x>::VALUE = static_cast<NT>(x);

struct FailingTruth {
  static const bool VALUE = X<double, 1>::VALUE < X<double, 2>::VALUE;
};

template <int a, int b>
struct Test {
  static const bool VALUE = X<double, a>::VALUE < X<double, b>::VALUE;
};

struct WorkingTruth {
  static const bool VALUE = Test<1, 2>::VALUE;
};

template <bool B>
struct UseAsTemplateParamter {
  enum {
    VALUE = B
  };
};

UseAsTemplateParamter<WorkingTruth::VALUE> object;

--- End Example Code ---

The compiler correctly complains with
error: ‘X::VALUE’ cannot appear in a constant-expression
in "FailingTruth".

However, when I encapsulate the very same condition using the Test template
class, the "WorkingTruth" code is accepted by g++. I could even use
WorkingTruth::VALUE as a non-type template parameter (see last line).


Am I missing something here, or does that really violate the standard?

Regards Jan.

PS: Don't get me wrong, I like this "feature"; it is really useful when doing
some meta-programming involving floating-point constants :-)


Where does the time go?

2010-05-20 Thread Steven Bosscher
Hello,

For some time now, I've wanted to see where compile time goes in a
typical GCC build, because nobody really seems to know what the
compiler spends its time on. The impressions that get published about
gcc usually indicate that there is at least a feeling that GCC is not
getting faster, and that parts of the compiler are unreasonably slow.
I was hoping to maybe shed some light on what parts that could be.

What I've done is this:
* Build GCC 4.6.0 (trunk r159624) with --enable-checking=release and
with -O2 and install it
* Build GCC 4.6.0 (trunk r159624) again, with the installed compiler
and with "-O2 -g3 -ftime-report". The time reports (along with
everything else on stderr) are piped to an output file
* Extract, sum, and sort time consumed per timevar

Host was cfarm gcc14 (8 x 3GHz Xeon). Target was
x86_64-unknown-linux-gnu. "Build" means non-bootstrap.

Results at the bottom of this mail.

Conclusions:

* There are quite a few timevars for parts of the compiler that have
been removed: TV_SEQABSTR, TV_GLOBAL_ALLOC, TV_LOCAL_ALLOC are the
ones I've spotted so far.  I will go through the whole list, remove
all timevars that are unused, and submit a patch.

* The "slow" parts of the compiler are not exactly news: tree-PRE,
scheduling, register allocation

* Variable tracking costs ~7.8% of compile time. This is more than the
cost of register allocation (IRA+reload)

* The C front end (preprocessing+lexing+parsing) costs ~17%. For an
optimizing compiler with so many passes, this is quite a lot.

* The GIMPLE optimizers (done with egrep
"tree|dominator_opt|alias_stmt_walking|alias_analysis|inline_heuristics|PHI_merge")
together cost ~16%.

* Adding and subtracting the above numbers, the rest of the compiler,
which is mostly the RTL parts, still account for 100-17-16-8=59% of
the total compile time. This was the most surprising result for me.

Ciao!
Steven


auto_inc_dec                    0.000%
callgraph_verifier              0.000%
cfg_construction                0.000%
CFG_verifier                    0.000%
delay_branch_sched              0.000%
df_live_byte_regs               0.000%
df_scan_insns                   0.000%
df_uninitialized_regs_2         0.000%
dump_files                      0.000%
global_alloc                    0.000%
Graphite_code_generation        0.000%
Graphite_data_dep_analysis      0.000%
Graphite_loop_transforms        0.000%
ipa_free_lang_data              0.000%
ipa_lto_cgraph_IO               0.000%
ipa_lto_cgraph_merge            0.000%
ipa_lto_decl_init_IO            0.000%
ipa_lto_decl_IO                 0.000%
ipa_lto_decl_merge              0.000%
ipa_lto_gimple_IO               0.000%
ipa_points_to                   0.000%
ipa_profile                     0.000%
ipa_type_escape                 0.000%
life_analysis                   0.000%
life_info_update                0.000%
load_CSE_after_reload           0.000%
local_alloc                     0.000%
loop_doloop                     0.000%
loop_unrolling                  0.000%
loop_unswitching                0.000%
LSM                             0.000%
lto                             0.000%
name_lookup                     0.000%
overload_resolution             0.000%
PCH_main_state_restore          0.000%
PCH_main_state_save             0.000%
PCH_pointer_reallocation        0.000%
PCH_pointer_sort                0.000%
PCH_preprocessor_state_restore  0.000%
PCH_preprocessor_state_save     0.000%
plugin_execution                0.000%
plugin_initialization           0.000%
predictive_commoning            0.000%
reg_stack                       0.000%
rename_registers                0.000%
rest_of_compilation             0.000%
sequence_abstraction            0.000%
shorten_branches                0.000%
sms_modulo_scheduling           0.000%
template_instantiation          0.000%
total_time                      0.000%
tracer                          0.000%
tree_check_data_dependences   

Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread Mark Mitchell
Robert Grimm wrote:

> Actually, I saw some old posts that talked about a -fno-implicit-fp option 

[I know this is a very old post, but I noticed it in old email and felt
it might still be useful to reply.]

CodeSourcery has a -fno-implicit-fp option that does exactly what you
have requested in some of our compilers.  We use this particularly in
the context of VxWorks; on VxWorks, the kernel will only save/restore
floating-point registers across task switches for tasks that are
designated as floating-point tasks.  Therefore, using an FPR in a
non-floating-point task is a bug.

Because there is so much VxWorks code out there that does not use
-msoft-float, it's not practical to require that programmers use
-msoft-float for all files, and then explicitly turn on hard-float for
floating-point tasks.  We offered to contribute this code, but the FSF
GCC maintainers decided not to accept it.

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713


Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread Joel Sherrill



On 05/20/2010 11:21 AM, Mark Mitchell wrote:

Robert Grimm wrote:

   

Actually, I saw some old posts that talked about a -fno-implicit-fp option
 

[I know this is a very old post, but I noticed it in old email and felt
it might still be useful to reply.]

CodeSourcery has a -fno-implicit-fp option that does exactly what you
have requested in some of our compilers.  We use this particularly in
the context of VxWorks; on VxWorks, the kernel will only save/restore
floating-point registers across task switches for tasks that are
designated as floating-point tasks.  Therefore, using an FPR in a
non-floating-point task is a bug.

Because there is so much VxWorks code out there that does not use
-msoft-float, it's not practical to require that programmers use
-msoft-float for all files, and then explicitly turn on hard-float for
floating-point tasks.  We offered to contribute this code, but the FSF
GCC maintainers decided not to accept it.

   


I know RTEMS users have asked about this option before.
We have the same problem, since we also support floating
point as an optional task attribute.  On some targets we implicitly
make the port force all tasks to have FP contexts.

Why was this rejected and not merged?

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985




Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread Mark Mitchell
Joel Sherrill wrote:

> I know RTEMS users have asked about this option before.
> We have the same problem since we also support floating
> point as an optional task attribute.  On some tasks we implicitly
> make the port force all tasks to have FP contexts.
> 
> Why was this rejected and not merged?

You'd have to check the archives.  I think the basic thrust of the
rejection was that the concept was ugly, not that the patch was
technically unsound.  In other words, the objection was to the feature,
not the implementation of that feature.  It is of course a feature much
less valuable on a workstation/server class operating system than on the
VxWorks/RTEMS class of RTOS systems.

Perhaps the idea was that you should just compile files that don't need
floating-point with -msoft-float (or the equivalent for other
architectures).  Of course, that runs into the legacy source code
problem, and also the problem that you can't mix code within a file,
which does sometimes have performance advantages.  (For example, because
calls to static functions need not have the standard ABIs.)

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713


Re: LTO and libelf (and FreeBSD)

2010-05-20 Thread Kai Wang
On Sun, May 02, 2010 at 11:38:39PM +0200, Gerald Pfeifer wrote:
> As http://gcc.gnu.org/ml/gcc-testresults/2010-05/msg00120.html shows,
> *-unknown-freebsd* exhibits tons of failures around LTO.
> 
> I dug a bit deeper, and even the most trivial test program
>   int main() { }
> fails with
>   lto1: internal compiler error: compressed stream: data error
>   Please submit a full bug report,
>   with preprocessed source if appropriate.
>   See <http://gcc.gnu.org/bugs.html> for instructions.
>   lto-wrapper: /files/pfeifer/gcc/bin/gcc returned 1 exit status
>   collect2: lto-wrapper returned 1 exit status
> when compiled with gcc -flto x.c.

Hi Gerald,

First, sorry about the late reply.

The problem is indeed caused by a small incompatibility of FreeBSD
libelf with other libelf implementations.

The elf_getbase() API in FreeBSD libelf can only be called on an
archive-member ELF descriptor. It returns -1 (indicating an error)
when called with a "regular" ELF object.

The lto_obj_build_section_table() function in lto-elf.c calls
elf_getbase() to get the offset base for a "regular" ELF object. On
FreeBSD it gets -1.  (Side note: lto-elf.c should really check the return
values of libelf APIs.) It then uses -1 as base_offset for all the
ELF sections, which results in an off-by-one error when passing the
section data to zlib to inflate.

Could you please try applying the attached patch to the libelf in
FreeBSD 7.3 and see if it fixes gcc4.6 lto?

Thanks,
Kai
diff -urN libelf.orig/elf_getbase.c libelf/elf_getbase.c
--- libelf.orig/elf_getbase.c	2010-05-20 18:49:43.0 +0200
+++ libelf/elf_getbase.c	2010-05-20 18:52:49.0 +0200
@@ -34,12 +34,14 @@
 off_t
 elf_getbase(Elf *e)
 {
-	if (e == NULL ||
-	e->e_parent == NULL) {
+	if (e == NULL) {
 		LIBELF_SET_ERROR(ARGUMENT, 0);
 		return (off_t) -1;
 	}
 
-	return ((off_t) ((uintptr_t) e->e_rawfile -
-	(uintptr_t) e->e_parent->e_rawfile));
+	if (e->e_parent)
+		return ((off_t) ((uintptr_t) e->e_rawfile -
+		(uintptr_t) e->e_parent->e_rawfile));
+	else
+		return (off_t) 0;
 }


Re: gcc 3.2 compile issue when initialize value

2010-05-20 Thread Ian Lance Taylor
Yixuan Huang  writes:

> I wrote following code:
> #include <sys/types.h>
> #include <dirent.h>
> #include <string>
> int main()
> {
> struct dirent **namelist;
> int numberOfProcDirs;
> numberOfProcDirs=scandir("/proc", &namelist, 0, alphasort);
> //std::string temp(std::string(namelist[0]->d_name)+std::string("fdsfds"));
> //std::string temp(std::string(namelist[0]->d_name)+std::string("fdsf"));
> // The error occurred
> std::string temp(std::string(namelist[0]->d_name)+std::string("cfdada"));
> //std::string temp;
> //temp = std::string(namelist[0]->d_name)+std::string("cfdada");
> return 0;
> }
>
> When compiled under g++ 3.2, it would report compile error.
>
> test.cpp: In function `int main()':
> test.cpp:12: syntax error before `->' token
>
> But code can compile under gcc 4.
>
> Is this a limitation for gcc 3.2 when I used "std::string
> temp(std::string(namelist[0]->d_name)+std::string("cfdada"));" to
> initialize value.

This question is not appropriate for the mailing list
g...@gcc.gnu.org.  It would be appropriate for gcc-h...@gcc.gnu.org.
Please take any followups to gcc-help.  Thanks.

gcc 3.2 is quite old. The C++ parser was completely rewritten in gcc
3.4 to improve correctness.  It is quite likely that this is simply a
bug in gcc 3.2.  You can probably avoid the bug by using temporary
variables.

Ian


Re: Workaround for error: ??? cannot appear in a constant-expression

2010-05-20 Thread Ian Lance Taylor
Jan Tusch  writes:

> I observed the following issue with g++-4.3.4:
>
> I can work around the restriction of initializing a static const integral 
> class member with a non-constant expression.
> This violates the standard and should be rejected by "-pedantic".
> Correct me, if I'm wrong.
>
> I cut down my code to the following (concededly a bit contrived) demo 
> scenario:
>
> --- Example Code ---
> template <typename NT, int x>
> struct X {
>   static const NT VALUE;
> };
> template <typename NT, int x>
> const NT X<NT, x>::VALUE = static_cast<NT>(x);
>
> struct FailingTruth {
>   static const bool VALUE = X<double, 1>::VALUE < X<double, 2>::VALUE;
> };
>
> template <int a, int b>
> struct Test {
>   static const bool VALUE = X<double, a>::VALUE < X<double, b>::VALUE;
> };
>
> struct WorkingTruth {
>   static const bool VALUE = Test<1, 2>::VALUE;
> };
>
> template <bool B>
> struct UseAsTemplateParamter {
>   enum {
>     VALUE = B
>   };
> };
>
> UseAsTemplateParamter<WorkingTruth::VALUE> object;
>
> --- End Example Code ---
>
> The compiler correctly complains with
> error: ‘X::VALUE’ cannot appear in a constant-expression
> in "FailingTruth".
>
> However, when I encasulate the  very same condition using the Test template 
> class, the "WorkingTruth" code is accepted by g++. I could even use 
> WorkingTruth::VALUE as a non-type template parameter (see last line).



This question is not appropriate for the mailing list g...@gcc.gnu.org.
It would be appropriate for gcc-h...@gcc.gnu.org.  Please take any
followups to gcc-help.  Thanks.

I agree that this appears to be a bug.  Please consider reporting it
per the instructions at http://gcc.gnu.org/bugs/ .  Thanks.

Ian


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Jason Merrill

On 05/20/2010 08:18 AM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be casted to char&, and I believe the
standard doesn't permit that, either.


The standard permits a compiler to accept or reject such a cast.

5.2.10/8: Converting a pointer to a function into a pointer to an object 
type or vice versa is conditionally-supported.  The meaning of such a 
conversion is implementation-defined, except that if an implementation 
supports conversions in both directions, converting a prvalue of one 
type to the other type and back, possibly with different 
cv-qualification, shall yield the original pointer value.


Jason


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Paolo Carlini
On 05/20/2010 08:13 PM, Jason Merrill wrote:
> The standard permits a compiler to accept or reject such a cast.
>
> 5.2.10/8: Converting a pointer to a function into a pointer to an
> object type or vice versa is conditionally-supported.  The meaning of
> such a conversion is implementation-defined, except that if an
> implementation supports conversions in both directions, converting a
> prvalue of one type to the other type and back, possibly with
> different cv-qualification, shall yield the original pointer value.
Thanks for the clarification.

The library patch which I'll commit later today includes a very simple
implementation of std::addressof - essentially what I posted already at
the beginning of this thread - and a specific testcase making sure that
we can deal correctly with functions: if, for some reason, the compiler's
behavior on this issue changes in the future, we will notice immediately
and can adjust std::addressof.

Paolo.


Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

Jason Merrill wrote:

On 05/20/2010 08:18 AM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be casted to char&, and I believe the
standard doesn't permit that, either.


The standard permits a compiler to accept or reject such a cast.

5.2.10/8: Converting a pointer to a function into a pointer to an object 
type or vice versa is conditionally-supported.


Thanks; that is, then, why the latest Comeau accepts it. It didn't occur to 
me to try the earlier versions on http://www.comeaucomputing.com/tryitout/ - 
they reject the code. This paragraph is a new addition, not present in 
C++03; "conditionally supported" is a C++0x-ism. :-) 



Proposal: remove the Obj-C++ front end

2010-05-20 Thread Steven Bosscher
Hello,

The Obj-C++ front end is effectively unmaintained, and has virtually
no serious users. I propose to remove it from GCC.

Perhaps the only user of some significance is GNUstep, but they are
already in an awkward position because they wish to use ObjC 2.0, and
they are looking at clang for that.

The burden on GCC on the other hand, from a maintainer's point of
view, is very significant. Obj-C++ was allowed in for GCC 4.1 under
the condition that Apple would maintain the front end. Clearly this
condition is currently not satisfied. The Obj-C++ front end ties
together the c, objc, and g++ front ends in very uncomfortable ways.
The front end also from time to time breaks because someone changes
g++ and breaks one of the many non-obvious dependencies between g++
and objc++.

There are also technical reasons for removing this front end. For one,
objc exceptions in objc++ don't work for the GNU runtime. The
implementation of the language also appears to be out of sync with
Apple (apart from ObjC 2.0).

Removing *anything* from GCC always leads to discussions, and I've
been in enough of them not to propose this removal lightly. I think
everyone understands that the GCC community does not exist only to
benefit itself.  But this trade-off should be considered seriously:
the benefit of an unmaintained and essentially broken objc++ for a
very limited number of users (if any), against a very high
maintenance cost for the GCC community.  In my view: if there is no
active maintainer for objc++ then it should go.

I'll get in my flame resistant suit now...

Ciao!
Steven


Re: Where does the time go?

2010-05-20 Thread Vladimir Makarov

Steven Bosscher wrote:

Hello,

For some time now, I've wanted to see where compile time goes in a
typical GCC build, because nobody really seems to know what the
compiler spends its time on. The impressions that get published about
gcc usually indicate that there is at least a feeling that GCC is not
getting faster, and that parts of the compiler are unreasonably slow.
  
It is just a feeling.  In fact, starting with 4.2, gcc has been getting
faster (if we ignore LTO).  My feeling is that LLVM is getting slower.  The
gap in compilation speed between GCC 4.5 and LLVM 2.7 is less than 10% on
x86_64 SPECInt2000 for -O2.


The feeling that GCC is getting slower probably comes from people who
recently switched from 3.x versions: compared with those versions, gcc
did become slower, reaching a maximum slowdown of 25% and 40% at gcc 4.2
on SPECInt2000 and SPECFP2000 respectively, on x86_64 for -O2.


All these GCC version comparisons can be found at
http://vmakarov.fedorapeople.org/spec/.


As I wrote, GCC is not such a slow compiler.  People sometimes exaggerate
this problem.  For example, the x86_64 Sun compiler, which generates about
the same quality code as GCC, is 2 times slower than GCC.


We should work on speeding GCC up but the first priority (at least for 
me) should be improvement of generated code.

Host was cfarm gcc14 (8 x 3GHz Xeon). Target was
x86_64-unknown-linux-gnu. "Build" means non-bootstrap.

Results at the bottom of this mail.

Conclusions:


* The "slow" parts of the compiler are not exactly news: tree-PRE,
scheduling, register allocation

  
RA and scheduling are usually the slowest parts of an optimizing compiler
because they solve NP-hard problems, and the widely used algorithms (list
scheduling and graph coloring) have worst-case quadratic complexity.  For
example, here is a comparison of how much time LLVM-2.7 passes and
analogous GCC passes (although it is sometimes hard to find a full
correspondence) spend:


                                                LLVM-2.7    GCC4.6
RA (linear scan + simple register coalescing
    vs IRA)                                     7.2%        9%
Instruction selection (vs combiner+reload)      10.7%       9%


The data are from compiling all_cp2k_gfortran.f90 (a 420K-line Fortran
file with hundreds of functions) on x86_64 in -O3 mode.  By the way, on
Core2 GCC 4.6 spent 235 user seconds compiling this file vs 265 seconds for
LLVM.


Also, linear scan is one of the fastest RA algorithms, but it generates
considerably worse code (at least 2-3% on SPEC2000) than graph coloring.


I wanted to look at the same time distribution for OPEN64 because it has a
sophisticated graph-coloring RA algorithm.  Maybe I'll do it when I
know OPEN64 better.

* Adding and subtracting the above numbers, the rest of the compiler,
which is mostly the RTL parts, still account for 100-17-16-8=59% of
the total compile time. This was the most surprising result for me.

  
I don't know whether spending that much time in the RTL parts is a lot or
not.  But I think this RTL time could be decreased if RTL (magically :)
had a smaller footprint and contained fewer details.

Ciao!
Steven

  
Thanks for the data.  It was interesting to look at this one more time
(among many other times).




Re: Where does the time go?

2010-05-20 Thread Joseph S. Myers
On Thu, 20 May 2010, Steven Bosscher wrote:

> * The C front end (preprocessing+lexing+parsing) costs ~17%. For an
> optimizing compiler with so many passes, this is quite a lot.

Unlike C++ where a lot of speedups were done around 2004-5, I don't think 
much has been done about speeding up the C front end outside the 
preprocessor, so there are likely to be lots of opportunities available 
(there are probably still a fair number for C++ as well).

Preprocessor (including preprocessed output) speed is of relevance for use 
with tools such as distcc, and whole-compilation speed is of obvious 
relevance for general use of the compiler.  Front-end-only speed 
(-fsyntax-only, approximately) is less important on its own as that's not 
a normal compiler use (I think) - it's only relevant as a contribution to 
the speed of a whole compilation.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal: remove the Obj-C++ front end

2010-05-20 Thread IainS

No Asbestos required - but .. I do have some observations..

 I write pretty much all my serious (day-job) code in ObjC and..
...  I have stated that it's an intention to make *that*, at least  
work at V2 on FSF.


Having said that:

a)  I have not anything like as much attachment to ObjC++ ...
b) We are all limited in the amount of time we can allocate to FSF ..

Progress can be limited almost as much by the review lag as it is by  
technical difficulty in fixing the problems.


On 20 May 2010, at 20:02, Steven Bosscher wrote:

The Obj-C++ front end is effectively unmaintained, and has virtually
no serious users. I propose to remove it from GCC.


a) I'm curious about how we know how many actual users we have.

b) The largest group of potential users is on OSX - and, of course,  
without NeXT working on 64bit that's a serious limitation.


yay verily the chicken and the egg.

Also, I think it fair to say that "ordinary" users - like me - have been  
reluctant to devote significant time to ObjC/C++

... when the likelihood of getting code accepted was c.f sqrt(0.0).

I would suggest that, if we achieve V2 NeXT support and detachment  
from corporate domination of maintainers ... we might fare better.

(but perhaps it's just not a popular language full stop).


Perhaps the only user of some significance is GNUstep, but they are
already in an awkward position because they wish to use ObjC 2.0, and
they are looking at clang for that.


I am under the impression from the 4.2 code-base that V2.0 ObjC++ will  
come pretty much for free with V2 ObjC.



The burden on GCC on the other hand, from a maintainer's point of
view, is very significant.


high maintenance  != unmaintained ;-)


The Obj-C++ front end ties
together the c, objc, and g++ front ends in very uncomfortable ways.


doesn't it just  :-( ... it's a nightmare to learn (1st hand  
experience).


It would be worth pausing to see if there's a way to disentangle it.
Which would, perhaps, go some way towards ameliorating the irritation.


There are also technical reasons for removing this front end. For one,
objc exceptions in objc++ don't work for the GNU runtime.


Well .. I have an opinion that don't != can't  ...  if you know this  
to be untrue I'd be interested.


... ObjC exceptions need sorting out..
I would hope that ObjC++ exceptions would 99% fall out of that.


The implementation of the language also appears to be out of sync with
Apple (apart from ObjC 2.0).


I'm not aware of any excess lack of sync c.f. ObjC

V1 requires FE __attribute__ and @property support (which I've got 95%  
working on a local tree)


V2 - I guess the most significant impact is gc.


I'll get in my flame resistant suit now...


So what about writing an Occam FE ;-) ?

Iain



Re: Where does the time go?

2010-05-20 Thread Toon Moene

On 05/20/2010 09:17 PM, Vladimir Makarov wrote:


Steven Bosscher wrote:



For some time now, I've wanted to see where compile time goes in a
typical GCC build, because nobody really seems to know what the
compiler spends its time on. The impressions that get published about
gcc usually indicate that there is at least a feeling that GCC is not
getting faster, and that parts of the compiler are unreasonably slow.



It is just a feeling. In fact, starting with 4.2, gcc has been getting faster
(if we ignore LTO). My feeling is that LLVM is getting slower. The gap in
compilation speed between GCC 4.5 and LLVM 2.7 is less than 10% on x86_64
SPECInt2000 for -O2.

The feeling that GCC is getting slower probably occurs for people who switched
recently from the 3.x versions, because in comparison with those versions gcc
did become slower, reaching maximum slowdowns of 25% and 40% at gcc 4.2 on
SPECInt2000 and SPECFP2000 respectively, on x86_64 for -O2.


I never understood the problem of "gcc getting slower", but that might 
just be my "the economic life-time of computers is 3 years" replacement 
policy.


It's very interesting, Vladimir, to couple this to switching from gcc 3 
to gcc 4.


Because I was mainly interested in what gcc 4 (+ gfortran) could *do* 
(in terms of the code it could compile), I never really bothered about 
the compiler's speed of doing so.


What I *did* notice, though, is that as of gcc 4.4, I can recompile our 
complete weather forecasting code (~ 1 million lines of Fortran and ~ 
30,000 lines of C) with -O3 -ffast-math (therefore, with vectorization) 
within 5 minutes [quad core Core 2].


This means that recompiling "everything" before every run (4 times a 
day) is now a no-brainer (currently, I do it only once a day, max).


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


Re: Proposal: remove the Obj-C++ front end

2010-05-20 Thread Basile Starynkevitch
On Thu, 2010-05-20 at 21:02 +0200, Steven Bosscher wrote:
> Hello,
> 
> The Obj-C++ front end is effectively unmaintained, and has virtually
> no serious users. I propose to remove it from GCC.

Maybe we could consider, for the putative people wanting to maintain
that frontend, making it a plugin (or perhaps an external program
producing Gimple syntax).

That brings a hopefully simpler, more general, and probably more
interesting question: do we have the necessary hooks to enable such a
front-end plugin? Or is the future gimple-syntax front-end becoming our
recommended way to add extra front-ends to GCC?

This is not at all (hopefully) a flame, but a sincere question (by a
"believer" in plugins).

And I have no good perception of how difficult it is to "pluginify" or
"externalize" -thru a gimple-syntax producer program- a front-end.
 
If there is some way to add extra front ends in the future 4.6 through
plugins (or through external programs producing a Gimple-syntax
S-expression file), we would then have an argument for any industrial
user wanting Obj-C++ in GCC.

And good machinery for external front ends would surely help GCC, because
it would ease the work of those working on non-mainstream front ends
(like for the D language, or a hypothetical Oz, etc...).

Cheers.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***




Re: Proposal: remove the Obj-C++ front end

2010-05-20 Thread Steven Bosscher
On Thu, May 20, 2010 at 9:54 PM, IainS  wrote:
> No Asbestos required - but .. I do have some observations..
>
>  I write pretty much all my serious (day-job) code in ObjC and..
> ...  I have stated that it's an intention to make *that*, at least work at
> V2 on FSF.
>
> Having said that:
>
> a)  I have not anything like as much attachment to ObjC++ ...
> b) We are all limited in the amount of time we can allocate to FSF ..

Right. I am not suggesting ObjC should go away at all. It clearly has
users and it's pretty much isolated. It mixes with the C front end, of
course, but that is not a problem. With ObjC++, you have a front end
trying to unify the C, ObjC, and C++ front ends with itself. That's
not even comparable.

Besides, the ObjC language makes sense, whereas Obj-C++ combines the
worst of ObjC with the worst of C++ (or so I'm told by people who
should know what they're talking about ;-)


> a) I'm curious about how we know how many actual users we have.
>
> b) The largest group of potential users is on OSX - and, of course, without
> NeXT working on 64bit that's a serious limitation.

Well, without the NeXT runtime, I think we can safely state that there
are *no* users of GNU Obj-C++ on OSX.  I have no idea how we can
establish how many users there are, but given the state the front end
is in, it can't be many.


> Also, I think it fair to say that "ordinary" users - like me have been
> reluctant to devote significant time to ObjC/C++
> ... when the likelihood of getting code accepted was c.f sqrt(0.0).

I do not think this is true for ObjC.  I've suggested before that you
should just proclaim yourself ObjC front end maintainer with support
of the existing active ObjC maintainers, you would be welcomed with
open arms (don't expect cheer leaders, though ;-)


>> The burden on GCC on the other hand, from a maintainer's point of
>> view, is very significant.
>
> high maintenance  != unmaintained ;-)

I don't mean the number of patches per unit of time. It is the way
that everything is strung together, and how easily it all breaks.
And, of course, how it hinders progress on other fronts that are
perhaps more important. As you've said: We are all limited in the
amount of time. It would be helpful if the GCC community could spend its
limited time on something other than this front end. If there is no-one
willing and able to step up and put in the TLC that this front end
needs, then the front end should go.

Personally, I think it would make matters only better for those who
are interested in adding ObjC 2.0.  It is easier to focus on one
language front end than four at once...


>> The Obj-C++ front end ties
>> together the c, objc, and g++ front ends in very uncomfortable ways.
>
> doesn't it just  :-( ... it's a nightmare to learn (1st hand experience).
>
> It would be worth pausing to see if there's a way to disentangle it.
> Which would, perhaps, go some way towards ameliorating the irritation.

From what I've seen so far, it's virtually impossible without actually
unifying objc++ and c++.  Perhaps that would have been the best thing
to do from the start anyway (not sure if that was tried and rejected,
or if this separate-and-tie approach is by design...).


>> There are also technical reasons for removing this front end. For one,
>> objc exceptions in objc++ don't work for the GNU runtime.
>
> Well .. I have an opinion that don't != can't  ...  if you know this to be
> untrue I'd be interested.

It's, again, a matter of resources.


> ... ObjC exceptions need sorting out..
> I would hope that ObjC++ exceptions would 99% fall out of that.
>
>> The implementation of the language also appears to be out of sync with
>> Apple (apart from ObjC 2.0).
>
> I'm not aware of any excess lack of sync c.f. ObjC

I made a diff of the Obj-C++ front ends in Apple GCC 4.2 and FSF GCC
4.2 at one point and IIRC there was already a surprising amount of
diverging going on.  But it's a while ago and I may be wrong on this
one.

Ciao!
Steven


Re: Proposal: remove the Obj-C++ front end

2010-05-20 Thread Steven Bosscher
On Thu, May 20, 2010 at 10:20 PM, Basile Starynkevitch
 wrote:
> On Thu, 2010-05-20 at 21:02 +0200, Steven Bosscher wrote:
>> Hello,
>>
>> The Obj-C++ front end is effectively unmaintained, and has virtually
>> no serious users. I propose to remove it from GCC.
>
> Maybe we could consider, for the putative people wanting to maintain
> that frontend, to make it a plugin (or perhaps an external program
> producing Gimple-syntax).

I think this is an interesting idea in principle. But IMHO removing
ObjC++ is orthogonal to front-ends-as-plugins.

FWIW the infrastructure for front ends as plugins is not ready and
probably will not be ready for some time to come (got time? :-). We
will first need to separate the front ends better from the rest of the
compiler.

For example, front ends should not have to call
cgraph_finalize_compilation_unit. It should be the middle-end calling
the front end to just produce GIMPLE and a cgraph, and then return to
let the middle-end finish the translation units. I'm using this as an
example because one of the big blockers for this change are (yes,
you've guessed it!) the ObjC/ObjC++ front ends. :-)

Ciao!
Steven


Re: Where does the time go?

2010-05-20 Thread Eric Botcazou
> * Adding and subtracting the above numbers, the rest of the compiler,
> which is mostly the RTL parts, still account for 100-17-16-8=59% of
> the total compile time. This was the most surprising result for me.

That figure is a little skewed though, the rest is not entirely RTL.


Front-end (3):
lexical_analysis                     6.65
preprocessing                       27.59
parser                              31.53
                                    65.77  17.16%

IPA (9):
ipa_reference                        0.06
ipa_cp                               0.34
varpool_construction                 0.39
ipa_pure_const                       0.71
callgraph_construction               0.73
ipa_SRA                              0.91
inline_heuristics                    1.42
integration                          2.73
callgraph_optimization               3.22
                                    10.51   2.74%

dominance_computation                5.44   1.42%

Tree optimizers (47):
tree_NRV_optimization                0.01
tree_loop_fini                       0.03
tree_switch_initialization_convers   0.03
tree_buildin_call_DCE                0.05
tree_canonical_iv                    0.06
tree_if_combine                      0.06
PHI_merge                            0.07
tree_phiprop                         0.07
uninit_var_anaysis                   0.07
control_dependences                  0.08
tree_PHI_const_copy_prop             0.16
tree_eh                              0.19
tree_split_crit_edges                0.19
scev_constant_prop                   0.2
tree_PHI_insertion                   0.2
tree_copy_headers                    0.23
tree_loop_bounds                     0.24
tree_loop_invariant_motion           0.27
tree_SSA_uncprop                     0.28
tree_SRA                             0.3
tree_linearize_phis                  0.34
tree_DSE                             0.39
tree_find_ref._vars                  0.39
tree_rename_SSA_copies               0.47
tree_SSA_other                       0.53
tree_loop_init                       0.57
tree_CFG_construction                0.59
tree_code_sinking                    0.6
tree_reassociation                   0.72
tree_forward_propagate               0.77
tree_conservative_DCE                0.96
tree_iv_optimization                 1.31
tree_operand_scan                    1.32
tree_SSA_rewrite                     1.8
alias_stmt_walking                   2.15
tree_copy_propagation                2.19
tree_aggressive_DCE                  2.29
tree_CCP                             2.89
tree_gimplify                        3.03
alias_analysis                       3.41
dominator_optimization               3.49
tree_SSA_incremental                 3.5
tree_FRE                             3.9
tree_PTA                             4.8
tree_CFG_cleanup                     5.69
tree_VRP                             8.36
tree_PRE                            11.42
                                    70.67  18.44%

expand                              24.18   6.31%

RTL optimizers (43):
mode_switching                       0.01
lower_subreg                         0.04
code_hoisting                        0.06
combine_stack_adjustments            0.28
loop_analysis                        0.28
complete_unrolling                   0.5
zee                                  0.62
dominance_frontiers                  0.65
loop_invariant_motion                0.66
register_scan                        0.67
if_conversion_2                      0.74
Peephole_2                           0.95
regmove                              1.02
thread_pro_and_epilogue              1.28
rebuild_jump_labels                  1.33
jump                                 1.34
branch_prediction                    1.35
machine_dep_reorg                    1.36
df_multiple_defs                     1.5
dead_code_elimination                1.74
df_use_def_def_use_chains            1.86
trivially_dead_code                  2
reorder_blocks                       2.07
hard_reg_cprop                       2.1
register_information                 2.27
dead_store_elim1                     2.38
dead_store_elim2                     2.4
if_conversion                        2.8
forward_prop                         3.25
df_reaching_defs                     3.44
CSE_2                                4.71
CPROP                                4.98
reload_CSE_regs                      5.23
df_reg_dead_unused_notes             5.62
cfg_cleanup                          6.28
PRE

Re: Where does the time go?

2010-05-20 Thread Steven Bosscher
Hi Vlad,

On Thu, May 20, 2010 at 9:17 PM, Vladimir Makarov wrote:
>> For some time now, I've wanted to see where compile time goes in a
>> typical GCC build, because nobody really seems to know what the
>> compiler spends its time on. The impressions that get published about
>> gcc usually indicate that there is at least a feeling that GCC is not
>> getting faster, and that parts of the compiler are unreasonably slow.
>>
>
> It is just a feeling.  In fact, starting with 4.2, gcc has been getting
> faster (if we ignore LTO).  My feeling is that LLVM is getting slower.  The
> gap in compilation speed between GCC 4.5 and LLVM 2.7 is less than 10% on
> x86_64 SPECInt2000 for -O2.
>
> The feeling that GCC is getting slower probably occurs for people who
> switched recently from the 3.x versions, because in comparison with those
> versions gcc did become slower, reaching maximum slowdowns of 25% and 40%
> at gcc 4.2 on SPECInt2000 and SPECFP2000 respectively, on x86_64 for -O2.

Yes, I agree with you on this point. But I think that doesn't mean the
GCC community can ignore the feeling.  It would be great if, for once,
the release notes could say "this release is 15% faster than the
previous release" or something like that. For PR if nothing else ;-)


> RA and scheduling are usually the slowest parts of an optimizing compiler
> because they solve NP-hard problems, and the widely used algorithms (list
> scheduling and graph coloring) have worst-case quadratic complexity.

Yes.
Actually I'm more worried about things like the DF LR/LIVE problems,
var-tracking, and expand.

DF used to be better than this, so I think there is a regression
somewhere.  I guess I'll have to look at the file-by-file data to see
if there is one file that triggers bad performance.

For var-tracking, I'll confess I'm biased, but I think it's one of the
worst things that happened to GCC in a long time: too invasive, and
perhaps I'm not looking in the right places, but I don't see the
benefit or the crowds cheering about better debug info, and on top of
all that it's slow. But that's one battle lost...  I do feel that this
slowdown (see also bugzilla) was not properly addressed before the GCC
4.5 release.

And finally: expand. This should be just a change of IR format, from
GIMPLE to RTL. I have no idea why this pass always shows up in the top
10 of slowest parts of GCC.  Lowering passes on e.g. WHIRL, or GENERIC
lowering to GIMPLE, never show up in the compile time overviews.


>  For example, here is comparison
> of how many time LLVM-2.7 passes and analogous GCC passes (although sometime
> it is hard to find full correspondence) spent on
>
>                                                   LLVM-2.7          GCC 4.6
> RA (linear scan RA + simple register coalescing)    7.2%    IRA              9%
> Instruction Selection                              10.7%    combiner+reload  9%
>
> The data are from compilation all_cp2k_gfortran.f90 (420Kline fortran with
> hundreds functions) on x86_64 in -O3 mode.  By the way on Core2 GCC4.6 spent
> 235 user sec on compilation of this file vs 265 sec by LLVM.

Interesting data, thanks!


> I don't know whether that is a lot of time to spend in the RTL parts.  But I
> think that this RTL part could be decreased if RTL (magically :) had a
> smaller footprint and contained fewer details.


Bah, no wand... :-)

Ciao!
Steven


Re: Where does the time go?

2010-05-20 Thread Eric Botcazou
> That figure is a little skewed though, the rest is not entirely RTL.

Now without some annoying typo in a formula...


Front-end (3):
lexical_analysis                     6.65
preprocessing                       27.59
parser                              31.53
                                    65.77  17.16%

IPA (9):
ipa_reference                        0.06
ipa_cp                               0.34
varpool_construction                 0.39
ipa_pure_const                       0.71
callgraph_construction               0.73
ipa_SRA                              0.91
inline_heuristics                    1.42
integration                          2.73
callgraph_optimization               3.22
                                    10.51   2.74%

dominance_computation                5.44   1.42%

Tree optimizers (47):
tree_NRV_optimization                0.01
tree_loop_fini                       0.03
tree_switch_initialization_convers   0.03
tree_buildin_call_DCE                0.05
tree_canonical_iv                    0.06
tree_if_combine                      0.06
PHI_merge                            0.07
tree_phiprop                         0.07
uninit_var_anaysis                   0.07
control_dependences                  0.08
tree_PHI_const_copy_prop             0.16
tree_eh                              0.19
tree_split_crit_edges                0.19
scev_constant_prop                   0.2
tree_PHI_insertion                   0.2
tree_copy_headers                    0.23
tree_loop_bounds                     0.24
tree_loop_invariant_motion           0.27
tree_SSA_uncprop                     0.28
tree_SRA                             0.3
tree_linearize_phis                  0.34
tree_DSE                             0.39
tree_find_ref._vars                  0.39
tree_rename_SSA_copies               0.47
tree_SSA_other                       0.53
tree_loop_init                       0.57
tree_CFG_construction                0.59
tree_code_sinking                    0.6
tree_reassociation                   0.72
tree_forward_propagate               0.77
tree_conservative_DCE                0.96
tree_iv_optimization                 1.31
tree_operand_scan                    1.32
tree_SSA_rewrite                     1.8
alias_stmt_walking                   2.15
tree_copy_propagation                2.19
tree_aggressive_DCE                  2.29
tree_CCP                             2.89
tree_gimplify                        3.03
alias_analysis                       3.41
dominator_optimization               3.49
tree_SSA_incremental                 3.5
tree_FRE                             3.9
tree_PTA                             4.8
tree_CFG_cleanup                     5.69
tree_VRP                             8.36
tree_PRE                            11.42
                                    70.67  18.44%

expand                              24.18   6.31%

RTL optimizers (43):
mode_switching                       0.01
lower_subreg                         0.04
code_hoisting                        0.06
combine_stack_adjustments            0.28
loop_analysis                        0.28
complete_unrolling                   0.5
zee                                  0.62
dominance_frontiers                  0.65
loop_invariant_motion                0.66
register_scan                        0.67
if_conversion_2                      0.74
Peephole_2                           0.95
regmove                              1.02
thread_pro_and_epilogue              1.28
rebuild_jump_labels                  1.33
jump                                 1.34
branch_prediction                    1.35
machine_dep_reorg                    1.36
df_multiple_defs                     1.5
dead_code_elimination                1.74
df_use_def_def_use_chains            1.86
trivially_dead_code                  2
reorder_blocks                       2.07
hard_reg_cprop                       2.1
register_information                 2.27
dead_store_elim1                     2.38
dead_store_elim2                     2.4
if_conversion                        2.8
forward_prop                         3.25
df_reaching_defs                     3.44
CSE_2                                4.71
CPROP                                4.98
reload_CSE_regs                      5.23
df_reg_dead_unused_notes             5.62
cfg_cleanup                          6.28
PRE                                  6.64
CSE                                  8.16
combiner                            10.17
scheduling_2                        11.44
reload

Re: Where does the time go?

2010-05-20 Thread Duncan Sands

Hi,


I don't know is it big or not to have such time spend in RTL parts.  But I
think that this RTL part could be decreased if RTL (magically :) would have
smaller footprint and contain less details.



Bah, no wand... :-)


I noticed while working on the dragonegg plugin that replacing gimple -> RTL
with gimple -> LLVM IR significantly reduced the amount of memory used by the
compiler at -O0.  I didn't investigate where the memory was going, but it seems
likely that RTL either contains a whole lot more information than the LLVM IR,
or doesn't represent it in a very memory efficient way.

Ciao,

Duncan.


Re: Where does the time go?

2010-05-20 Thread Ian Lance Taylor
Steven Bosscher  writes:

> And finally: expand. This should be just a change of IR format, from
> GIMPLE to RTL. I have no idea why this pass always shows up in the top
> 10 of slowest parts of GCC.  Lowering passes on e.g. WHIRL, or GENERIC
> lowering to GIMPLE, never show up in the compile time overviews.

expand unfortunately does more than just convert IR.  It also does
things like changing division by a constant into multiplication by a
constant, and changing multiplication by a constant into a set of
shifts and adds.  It does various optimizations involving store flags
(a = b == c;).  When expanding function calls, it generates
instructions to put arguments in the right place in registers and on
the stack.  Etc.

Ian


Re: Where does the time go?

2010-05-20 Thread Xinliang David Li
On Thu, May 20, 2010 at 2:09 PM, Ian Lance Taylor  wrote:
> Steven Bosscher  writes:
>
>> And finally: expand. This should be just a change of IR format, from
>> GIMPLE to RTL. I have no idea why this pass always shows up in the top
>> 10 of slowest parts of GCC.  Lowering passes on e.g. WHIRL, or GENERIC
>> lowering to GIMPLE, never show up in the compile time overviews.
>
> expand unfortunately does more than just convert IR.  It also does
> things like changing division by a constant into multiplication by a
> constant, and changing multiplication by a constant into a set of
> shifts and adds.  It does various optimizations involving store flags
> (a = b == c;).  When expanding function calls, it generates
> instructions to put arguments in the right place in registers and on
> the stack.  Etc.


stack variable overlay and stack slot assignments is here too.

David


>
> Ian
>


Re: Where does the time go?

2010-05-20 Thread Steven Bosscher
On Thu, May 20, 2010 at 10:54 PM, Duncan Sands  wrote:
> I noticed while working on the dragonegg plugin that replacing gimple -> RTL
> with gimple -> LLVM IR significantly reduced the amount of memory used by
> the compiler at -O0.  I didn't investigate where the memory was going, but
> it seems likely that RTL either contains a whole lot more information than
> the LLVM IR, or doesn't represent it in a very memory efficient way.

The latter. LLVM IR contains a bit more information (or at least,
contains it in a more natural way) but the problem with RTL is, I
think, the tree-like representation. If you have an instruction like
(set (a) (b+c)) you could have, at the simplest, three integers (insn
uid, basic block, instruction code) and three pointers for operands.
In total, on a 64 bits host: 3*4+3*8 = 36 bytes.

An RTL instruction of that form, assuming all operands are registers,
is 6*sizeof(struct rtx_def) = 6*48 = 288 bytes, give or take a few.
Those 6 rtx'en are for:

1. insn
2. set
3. set_dest operand
4. set_source: a plus
5. source operand 1
6. source operand 2

All in all, perhaps not the most efficient representation for memory
footprint, and the pointer chasing probably doesn't help (cache!).
But changing it is a lot more difficult than the GIMPLE tuples
project. I don't think it can be done.

Ciao!
Steven


Re: Where does the time go?

2010-05-20 Thread Steven Bosscher
On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li  wrote:
> stack variable overlay and stack slot assignments is here too.

Yes, and for these I would like to add a separate timevar. Agree?

Ciao!
Steven


Re: Where does the time go?

2010-05-20 Thread Xinliang David Li
On Thu, May 20, 2010 at 2:18 PM, Steven Bosscher  wrote:
> On Thu, May 20, 2010 at 11:14 PM, Xinliang David Li  
> wrote:
>> stack variable overlay and stack slot assignments is here too.
>
> Yes, and for these I would like to add a separate timevar. Agree?

Yes.  (By the way, we are rewriting this pass to eliminate the code
motion/aliasing problem -- but that is a different topic).

David

>
> Ciao!
> Steven
>


Re: Where does the time go?

2010-05-20 Thread Bradley Lucier
On my codes, pre-RA instruction scheduling on X86-64 (a) improves run
times by roughly 10%, and (b) costs a lot of compile time.

The -fschedule-insns option didn't seem to be on in your time tests (I think
it's not on by default on that architecture at -O2).

Brad



Code compilation with GCC and GFORTRAN

2010-05-20 Thread muhammad.k.akbar
Hello,
I have a FORTRAN code that uses some c routines. I compile it with gcc
and gfortran in RedHat Linux without any problem. Recently I bought a
laptop with Ubuntu. I have gcc and gfortran version 4.4.3 in it. When
I compile the code, I see the following warning:

warning: ignoring return value of 'fread', declared with attribute
warn_unused_result

When I try to run the executable, it does not run as expected. Is
there any library files that I am missing? Please help.

I am using the following flags:
gcc -c -O3 -DLINUX
gfortran -c -O3 -DLINUX -fdefault-real-8 -ffree-line-length-132

By the way, I have installed Intel compilers, and the code runs fine
with icc and ifort. I am puzzled!
Thanks,
Muhammad Akbar


Re: Code compilation with GCC and GFORTRAN

2010-05-20 Thread Muhammad Akbar
Hello,
I have a FORTRAN code that uses some c routines. I compile it with gcc
and gfortran in RedHat Linux without any problem. Recently I bought a
laptop with Ubuntu. I have gcc and gfortran version 4.4.3 in it. When
I compile the code, I see the following warning:

warning: ignoring return value of 'fread', declared with attribute
warn_unused_result

When I try to run the executable, it does not run as expected. Is
there any library
files that I am missing? Please help.
I am using the following flags:
gcc -c -O3 -DLINUX
gfortran -c -O3 -DLINUX -fdefault-real-8 -ffree-line-length-132

By the way, I have installed Intel compilers, and the code runs fine
with icc and ifort. I am puzzled!
Thanks,
Muhammad Akbar


gcc-4.5-20100520 is now available

2010-05-20 Thread gccadmin
Snapshot gcc-4.5-20100520 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100520/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 159642

You'll find:

gcc-4.5-20100520.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.5-20100520.tar.bz2 C front end and core compiler

gcc-ada-4.5-20100520.tar.bz2  Ada front end and runtime

gcc-fortran-4.5-20100520.tar.bz2  Fortran front end and runtime

gcc-g++-4.5-20100520.tar.bz2  C++ front end and runtime

gcc-java-4.5-20100520.tar.bz2 Java front end and runtime

gcc-objc-4.5-20100520.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.5-20100520.tar.bz2The GCC testsuite

Diffs from 4.5-20100513 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Code compilation with GCC and GFORTRAN

2010-05-20 Thread Ian Lance Taylor
Muhammad Akbar  writes:

> I have a FORTRAN code that uses some c routines. I compile it with gcc
> and gfortran in RedHat Linux without any problem. Recently I bought a
> laptop with Ubuntu. I have gcc and gfortran version 4.4.3 in it. When
> I compile the code, I see the following warning:
>
> warning: ignoring return value of 'fread', declared with attribute
> warn_unused_result
>
> When I try to run the executable, it does not run as expected. Is
> there any library
> files that I am missing? Please help.
> I am using the following flags:
> gcc -c -O3 -DLINUX
> gfortran -c -O3 -DLINUX -fdefault-real-8 -ffree-line-length-132
>
> By the way, I have installed Intel compilers, and the code runs fine
> with icc and ifort. I am puzzled!


Please never send messages to both gcc@gcc.gnu.org and
gcc-h...@gcc.gnu.org.  This message should only have gone to
gcc-h...@gcc.gnu.org.  Please take any followups to gcc-help.  Thanks.

It is impossible for us to answer your question without more
information, because you didn't tell us what you mean by "does not run
as expected."  Also, if it runs on RedHat Linux but fails on Ubuntu,
then it is probably not an issue with gcc.

Ian


Re: Performance optimizations for Intel Core 2 and Core i7 processors

2010-05-20 Thread Alexander Strange

On May 20, 2010, at 8:04 AM, Steven Bosscher wrote:

> On Mon, May 17, 2010 at 8:44 AM, Maxim Kuvyrkov  
> wrote:
>> CodeSourcery is working on improving performance for Intel's Core 2 and Core
>> i7 families of processors.
>> 
>> CodeSourcery plans to add support for unaligned vector instructions, to
>> provide fine-tuned scheduling support and to update instruction selection
>> and instruction cost models for Core i7 and Core 2 families of processors.
>> 
>> As usual, CodeSourcery will be contributing its work to GCC.  Currently, our
>> target is the end of GCC 4.6 Stage1.
>> 
>> If your favorite benchmark significantly under-performs on Core 2 or Core i7
>> CPUs, don't hesitate asking us to take a look at it.
> 
> I'd like to ask you to look at ffmpeg (missed core2 vectorization
> opportunities), polyhedron (PR34501, like, duh! :-), and Apache
> benchmark (-mtune=core2 results in lower scores).
> 
> You could check overall effects on an openly available benchmark suite
> such as http://www.phoronix-test-suite.com/
> 
> Good luck with this project, it'll be great when -mtune=core2 actually
> improves performance rather than degrading it!
> 
> Ciao!
> Steven

ffmpeg builds with -fno-tree-vectorize - there was some miscompilation with it 
on PPC and the maintainer is too shy to file compiler bugs about it - and that 
probably won't change. But it's still worth looking at, since it might improve 
other programs.

Some numbers decoding H264 on Core i5 x86-64:
asm on: 8.78s
asm off (./configure --disable-asm): 15.61s
asm off + -ftree-vectorize -ftree-slp-vectorize -fstrict-aliasing: 14.84s

So there's a lot of room there.

I haven't investigated, but I guess some useful missing features are 
small-vector vectorization using MMX (ffmpeg uses it everywhere) and scalar 
write-combining (http://x264dev.multimedia.cx/?p=32). And better 
scheduling/shorter code in general.


Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread Alan Modra
On Thu, May 20, 2010 at 09:40:47AM -0700, Mark Mitchell wrote:
> It is of course a feature much
> less valuable on a workstation/server class operating system than on the
> VxWorks/RTEMS class of RTOS systems.

Even on servers this option may be quite valuable.  I recall seeing
figures that showed using fp regs for something like structure copies
could cost thousands of cpu cycles.

Why?  With lazy fpu save and restore, the first use of the fpu in a
given time slice takes an interrupt.  So if your task only uses the
fpu occasionally, it is a severe misoptimization to choose fp regs
rather than gp regs.

-- 
Alan Modra
Australia Development Lab, IBM


Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread David Edelsohn
On Thu, May 20, 2010 at 9:35 PM, Alan Modra  wrote:
> On Thu, May 20, 2010 at 09:40:47AM -0700, Mark Mitchell wrote:
>> It is of course a feature much
>> less valuable on a workstation/server class operating system than on the
>> VxWorks/RTEMS class of RTOS systems.
>
> Even on servers this option may be quite valuable.  I recall seeing
> figures that showed using fp regs for something like structure copies
> could cost thousands of cpu cycles.
>
> Why?  With lazy fpu save and restore, the first use of the fpu in a
> given time slice takes an interrupt.  So if your task is only using
> the fpu occasionally it is a severe misoptimization to choose to use
> fp regs rather than gp regs.

If the patch is the one I remember, I believe the consensus was that
the patch was not safe -- it substituted RTL patterns in ways that
could violate GCC's internal semantics.  I would gratefully accept a
safe and reliable patch, such as one that scans the function early and
disables patterns with FP instructions if the function does not
contain explicit references to FP operations.

No one disagrees with the potential benefit of the feature.

David


Re: powerpc-eabi-gcc no implicit FPU usage

2010-05-20 Thread Mark Mitchell
David Edelsohn wrote:

>>> It is of course a feature much
>>> less valuable on a workstation/server class operating system than on the
>>> VxWorks/RTEMS class of RTOS systems.

> No one disagrees with the potential benefit of the feature.

OK; I must have misremembered.

I believe our current implementation keeps track of FP usage through the
front-end, and then disables the floating-point registers by futzing
with fixed_regs and such when compiling each function.  There appear to
be no back-end-specific patches at all.  If that sounds like a
reasonable approach, we might be able to get it into 4.6.

Thanks,

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713