Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Dave Korn
On 24/11/2010 21:31, Joern Rennecke wrote:
> Quoting Paul Koning :
> 
>> If BITS_PER_UNIT is all that's left, could you use some genxxx.c to 
>> extract that from tm.h and drop it into a tm-bits.h in the build 
>> directory?  Then you could include that one instead of tm.h.
> 
> Yes, that's what I said.  Only there is little point in writing
> the generator program right now if all it ever does is spit out
> #define BITS_PER_UNIT 8
> 
> We can add the generator program when we (re-) add a word addressed
> target, or add a bit addressed one.

  I do think that this goal is not so far off that we should actually
encourage new code to break it.  I recently built gcc 4.5.0 with a 24-bit word
size, just far enough to get the driver to work and the actual compiler itself
to initialise and compile an empty file without crashing, and that proved
entirely practical, so we may not be as far off as one might assume.  That
shows that the core is already substantially independent of the target, I
think, and that we could take that independence further.

cheers,
  DaveK


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Dave Korn
On 24/11/2010 14:17, Richard Guenther wrote:

> I don't see why RTL optimizers should be different from tree optimizers.

  I thought half the point of tree-ssa in the first place was to separate
optimisation out from target-specific stuff and do it on an independent level?

On 24/11/2010 15:32, Richard Guenther wrote:
> As we are moving towards doing more target dependent optimizations
> on the tree level this doesn't sound like a sustainable opinion. 

  Wait, we're doing that?  Isn't that the same mistake we made earlier?

On 24/11/2010 14:17, Richard Guenther wrote:
> And we don't want to pay the overhead of hookizing every target
> dependent constant just for the odd guys who want multi-target
> compilers that have those constants differing.

  Why not?  Precisely how big is this cost?  Back in the old days we all used
to want to avoid virtual functions, because of the cost of a
function-call-through-pointer, but that certainly isn't justified any more and
may not even have been then.

> a multi-target compiler where the hooks are in shared loadable
> modules

  It's not just Diego who envisions that; I think it would be an excellent
long-term goal too.  And I thought that was what motivated all the work to
hookize macros in the first place.

cheers,
  DaveK



Re: Method to test all sse2 calls?

2010-11-24 Thread Ian Lance Taylor
"David Mathog"  writes:

> Ian Lance Taylor  wrote:
>
>> Your changes are relying on a gcc extension which was only recently
>> added, more recently than those tests were added to the testsuite.  Only
>> recently did gcc acquire the ability to use [] to access elements in a
>> vector. 
>
> That isn't what my changes did. The array accesses are to the arrays in
> the union - nothing cutting edge there.  The data is accessed through
> the array specified by .d (or .s, etc.), not through name.x[index].

Oh, sorry, completely misunderstood.  In that case, it seems to me that
your changes are causing the tests to no longer test what they should:
the code generation resulting from the specific gcc builtins, now
available as a gcc extension.


> Would changing the use of inlined functions to defines let the compiler
> digest it better?  For instance:
>
> static __inline __m128i __attribute__((__always_inline__))
> _mm_andnot_si128 (__m128i __A, __m128i __B)
> {
>   return (~__A) & __B;
> }
>
> becomes
>
> #define _mm_andnot_si128(A,B)  (~A & B)
>
> That approach will get really messy for the more complicated _mm*.

I can't think of any reason why that would help.


> In general terms, can somebody give me a hint as to the sorts of things
> that if found in inlined functions might cause the compiler to optimize
> to invalid code?

The usual issue is invalid aliasing; see the docs for the
-fstrict-aliasing option.  I don't know if that is the problem here.
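
For illustration, here is a minimal example of the classic violation (a
hypothetical snippet I am making up, not something taken from your code):

/* Storage declared as float is read through an unsigned int lvalue.
   With -fstrict-aliasing the optimizer may assume the two accesses do
   not alias, so the load need not see the store.  */
unsigned int
float_bits (float f)
{
  float x = f;
  unsigned int *p = (unsigned int *) &x;  /* type-punned pointer */
  return *p;                              /* undefined behaviour */
}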

Ian


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Richard Guenther :


What's the benefit of not including tm.h in the tree optimizers and frontend
files to our users?


We should see less instability of frontends and tree optimizers for less-often
tested targets.  This can prevent release cycles from getting longer, and/or
allow more work to be accomplished in a release cycle.

These files should compile the same for different target configurations
(assuming we don't have a BITS_PER_UNIT discrepancy).
With tm.h included, you can never quite tell what's going on.
(unless you want to analyze every source file for every target
 configuration - that can be done in finite time, but not necessarily
 before the code is obsolete.)

So you should be able to build the frontend once and use it in
multiple compilers, e.g. a native one and a cross-compiler to a netbook
which uses a different processor.
More importantly, CPU-GPU programming is certainly coming, and a multi-target
compiler should eventually provide a tool to use such a heterogeneous system
without having to do all the partitioning and interworking by hand.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 10:04 PM, Joern Rennecke  wrote:
> Quoting Richard Guenther :
>
>> So, Joern, maybe you can clarify what the benefit is in hookizing
>> BITS_PER_UNIT?
>
> The point is that I want to eliminate all tm.h macro uses from the
> tree optimizer and frontend files, so that they can stop including
> tm.h .  When I first tried putting all patches to eliminate tm.h includes
> from target.h, function.h and gimple.h together, a missing definition of
> BITS_PER_UNIT was the first problem that popped up.  Also, even if the
> count of files where BITS_PER_UNIT is the only tm.h macro is low right now,
> if we don't have a strategy how to deal with it, it'll remain the last
> macro standing and block all efforts to get rid of the tm.h includes.
> With our current supported target set, we can actually
> define BITS_PER_UNIT as constant 8 in system.h - that'd get it out
> of the way.
> If we actually get different BITS_PER_UNIT values again,
> we can generate a header file to define the appropriate value, but
> with our current target set, there would be little point and little test
> coverage for doing that.

What's the benefit of not including tm.h in the tree optimizers and frontend
files to our users?

Richard.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Paul Koning :

If BITS_PER_UNIT is all that's left, could you use some genxxx.c to   
extract that from tm.h and drop it into a tm-bits.h in the build   
directory?  Then you could include that one instead of tm.h.


Yes, that's what I said.  Only there is little point in writing
the generator program right now if all it ever does is spit out
#define BITS_PER_UNIT 8

We can add the generator program when we (re-) add a word addressed
target, or add a bit addressed one.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Paul Koning

On Nov 24, 2010, at 4:04 PM, Joern Rennecke wrote:

> Quoting Richard Guenther :
> 
>> So, Joern, maybe you can clarify what the benefit is in hookizing
>> BITS_PER_UNIT?
> 
> The point is that I want to eliminate all tm.h macro uses from the
> tree optimizer and frontend files, so that they can stop including
> tm.h .  When I first tried putting all patches to eliminate tm.h includes
> from target.h, function.h and gimple.h together, a missing definition of
> BITS_PER_UNIT was the first problem that popped up.  Also, even if the
> count of files where BITS_PER_UNIT is the only tm.h macro is low right now,
> if we don't have a strategy how to deal with it, it'll remain the last
> macro standing and block all efforts to get rid of the tm.h includes.
> With our current supported target set, we can actually
> define BITS_PER_UNIT as constant 8 in system.h - that'd get it out
> of the way.
> If we actually get different BITS_PER_UNIT values again,
> we can generate a header file to define the appropriate value, but
> with our current target set, there would be little point and little test
> coverage for doing that.

If BITS_PER_UNIT is all that's left, could you use some genxxx.c to extract 
that from tm.h and drop it into a tm-bits.h in the build directory?  Then you 
could include that one instead of tm.h.

paul



Re: Method to test all sse2 calls?

2010-11-24 Thread David Mathog
Ian Lance Taylor  wrote:

> Your changes are relying on a gcc extension which was only recently
> added, more recently than those tests were added to the testsuite.  Only
> recently did gcc acquire the ability to use [] to access elements in a
> vector. 

That isn't what my changes did. The array accesses are to the arrays in
the union - nothing cutting edge there.  The data is accessed through
the array specified by .d (or .s, etc.), not through name.x[index].


> So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di
> builtin function.  That function treats the vector as containing two
> 8-byte integers, and pulls out one or the other depending on the second
> argument.  Your dumps of res[0] and res[1] suggest that you are treating
> the vector as four 4-byte integers and pulling out specific ones.

Yup, my bad - I put in .d where it should have been .ll.  Also fixed the
problem I induced in sse2-check.h, where too large a chunk was commented
out, which was causing the gcc -Wall -msse2 problem.  The changed part in
the original source was

  if ((edx & bit_SSE2) && sse_os_support ())

and is now:

#if !defined(SOFT_SSE2)
  if ((edx & bit_SSE2) && sse_os_support ())
#else
  if (sse_os_support ())
#endif /*SOFT_SSE2*/

My software SSE2 passes all 165 of the sse2 tests that are complete
programs.

However, there is a problem in the real world.  While the sse2 programs
in the testsuite do exercise the _mm* functions, they do so one at a
time.  I have found that in real code, which makes multiple _mm* calls,
if -O0 is not used, the wrong results (may) come out.  

% gcc -std=gnu99 -g -pg -pthread -O0 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g
-pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel
-I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest
./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall
-Wl,--end-group -leasel -lm
% ./msvfilter_utest
(no output, it ran correctly)

% gcc -std=gnu99 -g -pg -pthread -O1 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g
-pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel
-I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest
./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall
-Wl,--end-group -leasel -lm
% ./msvfilter_utest
msv filter unit test failed: scores differ (-50.37, -10.86)

Going to higher optimization levels, there are even bigger issues, like not
compiling at all (even with gcc 4.4.1):

% gcc -std=gnu99 -g -pg -pthread -O2 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g
-pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel
-I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest
./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall
-Wl,--end-group -leasel -lm
../../easel/emmintrin.h:2178: warning: dereferencing pointer
'({anonymous})' does break strict-aliasing rules
../../easel/emmintrin.h:2178: note: initialized from here
.
.  (same sort of message many many times)
.
./msvfilter.c:208: error: unable to find a register to spill in class
'GENERAL_REGS'
./msvfilter.c:208: error: this is the insn:
(insn 1944 1943 1945 46 ../../easel/emmintrin.h:2348 (set
(strict_low_part (subreg:HI (reg:TI 1239) 0))
(mem:HI (reg/f:SI 96 [ pretmp.1031 ]) [13 S2 A16])) 47
{*movstricthi_1} (nil))
./msvfilter.c:208: confused by earlier errors, bailing out

Would changing the use of inlined functions to defines let the compiler
digest it better?  For instance:

static __inline __m128i __attribute__((__always_inline__))
_mm_andnot_si128 (__m128i __A, __m128i __B)
{
  return (~__A) & __B;
}

becomes

#define _mm_andnot_si128(A,B)  (~A & B)

That approach will get really messy for the more complicated _mm*.

In general terms, can somebody give me a hint as to the sorts of things
that if found in inlined functions might cause the compiler to optimize
to invalid code?


Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Richard Guenther :


So, Joern, maybe you can clarify what the benefit is in hookizing
BITS_PER_UNIT?


The point is that I want to eliminate all tm.h macro uses from the
tree optimizer and frontend files, so that they can stop including
tm.h .  When I first tried putting all patches to eliminate tm.h includes
from target.h, function.h and gimple.h together, a missing definition of
BITS_PER_UNIT was the first problem that popped up.  Also, even if the
count of files where BITS_PER_UNIT is the only tm.h macro is low right now,
if we don't have a strategy how to deal with it, it'll remain the last
macro standing and block all efforts to get rid of the tm.h includes.
With our current supported target set, we can actually
define BITS_PER_UNIT as constant 8 in system.h - that'd get it out
of the way.
If we actually get different BITS_PER_UNIT values again,
we can generate a header file to define the appropriate value, but
with our current target set, there would be little point and little test
coverage for doing that.
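
A minimal sketch of what that stop-gap could look like (illustrative only,
not a patch I am proposing in this message):

/* In system.h, once no in-tree target defines a different value; a
   configure-time check (or a generator program, if a non-8-bit target
   returns) would keep this honest.  */
#ifndef BITS_PER_UNIT
#define BITS_PER_UNIT 8
#endif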


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 4:32 PM, Richard Guenther
 wrote:
> On Wed, Nov 24, 2010 at 4:22 PM, Joern Rennecke  wrote:
>> Quoting Richard Guenther :
>>
>>> On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke 
>>> wrote:

 I'm fine with the RTL optimizers using target macros, but I'd like the
 frontends and tree optimizers to cease to use tm.h.  That means
 all macro uses there have to be converted.  That does not necessarily
 involve target port code - a wrapper hook could be provided in
 targhooks.c
 that uses the target macro.
>>>
>>> I don't see why RTL optimizers should be different from tree optimizers.
>>
>> RTL optimizers tend to have a lot of target dependencies; hookizing them
>> all is likely impractical, and also to have a performance impact.
>>
>> Also, by making the tree optimizers target independent, you can make
>> optimizations that consider more than one target.
>>
>> Because RTL optimizers work on highly target-dependent program
>> representations, the decision on what target's code to work on has already
>> been fixed by the time the RTL optimizers run.
>
> As we are moving towards doing more target dependent optimizations
> on the tree level this doesn't sound like a sustainable opinion.  GIMPLE
> is just a representation - whether it is target dependent or not isn't
> related to that it is GIMPLE or RTL.
>
>>> And we don't want to pay the overhead of hookizing every target
>>> dependent constant just for the odd guys who want multi-target
>>> compilers that have those constants differing.
>>
>> As compared to... having a multi-year unfinished hookization process that
> hasn't provided any new functionality yet.
>
> And hookizing BITS_PER_UNIT brings you closer exactly how much?
>
> Tackle the hard ones.  Because if you can't solve those you won't
> succeed ever and there's no reason to pay the price for BITS_PER_UNIT
> then.

Btw, I don't remember what your reason was for hookization, Joern.  But
I can't see why things like BITS_PER_UNIT cannot be part of the
ABI/API between the middle-end and a possible target shared object.

If the goal is to emit code for different targets from a single compilation
(thus basically make the IL re-targetable) then hookization of BITS_PER_UNIT
brings you exactly nothing as values derived from it are stored all
over in the IL, so you'd need to fixup all types and decls and possibly
re-layout things at the time you switch to a different target.

So, Joern, maybe you can clarify what the benefit is in hookizing
BITS_PER_UNIT?

Thanks,
Richard.


Re: Method to test all sse2 calls?

2010-11-24 Thread Ian Lance Taylor
"David Mathog"  writes:

> Ian Lance Taylor , wrote:
>
>> Tests that directly invoke __builtin functions are not appropriate for
>> your replacement for emmintrin.h.
>
> Clearly.  However, I do not see why these are in the test routines in
> the first place.  They seem not to be needed.  I made the changes below
> my signature, eliminating all of the vector builtins, and the programs
> still worked with both -msse2 and -mno-sse2 plus my software SSE2.  If
> anything the test programs are much easier to understand without the
> builtins.

Your changes are relying on a gcc extension which was only recently
added, more recently than those tests were added to the testsuite.  Only
recently did gcc acquire the ability to use [] to access elements in a
vector.  I agree that your changes look good, although we rarely change
existing tests unless there is a very good reason.  Avoiding __builtin
functions in the gcc testsuite is not in itself a good reason.  These
tests were written for gcc; they were not written as general purpose SSE
tests.


> There is also a (big) problem with sse2-vec-2.c (and -2a, which is empty
> other than an #include of sse2-vec-2.c).  There are no explicit sse2
> operations within this test program.  Moreover, the code within the
> tests does not work.  Finally, if one puts a print statement anywhere in
> the existing test and compiles it with:
>
>  gcc -msse -msse2
>
> there will be no warnings, and the run will appear to show a valid test,
> but in actuality the test will never execute! This shows part of the
> problem:
>
> gcc -Wall -msse -msse2 -o foo sse2-vec-2.c
> sse-os-support.h:27: warning: 'sse_os_support' defined but not used
> sse2-check.h:10: warning: 'do_test' defined but not used
>
> (The same happens for -m64.)  There must be some sort of main in there,
> but no test; it does nothing and returns a valid exit status.

The main function is in sse2-check.h.  As you can see in that file, the
test is only run if the CPU includes SSE2 support.  That is fine for
gcc's purposes, but I can see that it is problematic for yours.
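
The general shape of that gating is roughly as follows (a sketch of the
pattern only, not the actual contents of sse2-check.h):

/* Run the real test only when the hardware advertises SSE2; otherwise
   exit successfully without running anything.  */
#include <cpuid.h>

extern void do_test (void);

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx) && (edx & bit_SSE2))
    do_test ();

  return 0;
}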


> When stuffed with debug statements:
>
>   for (i = 0; i < 2; i++)
> masks[i] = i;
>
> printf("DEBUG res[0] %llX\n",res[0]);
> printf("DEBUG res[1] %llX\n",res[1]);
> printf("DEBUG val1.ll[0] %llX\n",val1.ll[0]);
> printf("DEBUG val1.ll[1] %llX\n",val1.ll[1]);
>   for (i = 0; i < 2; i++)
> if (res[i] != val1.ll [masks[i]]){
> printf("DEBUG i %d\n",i);
> printf("DEBUG masks[i] %d\n",masks[i]);
> printf("DEBUG val1.ll [masks[i]] %llX\n", val1.ll [masks[i]]);
>   abort ();
> }
>
> and compiled with my software SSE2 
>
> gcc -Wall -msse -mno-sse2 -I. -O0 -m32 -lm -DSOFT_SSE2 -DEMMSOFTDBG -o
> foo   sse2-vec-2.c
>
> It emits:
>
> DEBUG res[0] 3020100
> DEBUG res[1] 7060504
> DEBUG val1.ll[0] 706050403020100
> DEBUG val1.ll[1] F0E0D0C0B0A0908
> DEBUG i 0
> DEBUG masks[i] 0
> DEBUG val1.ll [masks[i]] 706050403020100
> Aborted
>
> True enough 3020100 != 706050403020100, but what kind of test
> is that???

When I run the unmodified test on my system, which has SSE2 support in
hardware, I see that

res[0] == 0x706050403020100
res[1] == 0xf0e0d0c0b0a0908

So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di
builtin function.  That function treats the vector as containing two
8-byte integers, and pulls out one or the other depending on the second
argument.  Your dumps of res[0] and res[1] suggest that you are treating
the vector as four 4-byte integers and pulling out specific ones.
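
To illustrate the difference, here is a small standalone example I would
expect to behave as described (assuming GCC with -msse2; it is not part
of the testsuite):

/* __builtin_ia32_vec_ext_v2di extracts 64-bit lanes, so lane 0 matches
   the union's ll[0], not its i[0].  */
#include <emmintrin.h>
#include <stdio.h>

int
main (void)
{
  union { __m128i x; unsigned long long ll[2]; unsigned int i[4]; } val;
  unsigned long long lane0;

  val.i[0] = 0x03020100;
  val.i[1] = 0x07060504;
  val.i[2] = 0x0b0a0908;
  val.i[3] = 0x0f0e0d0c;

  lane0 = __builtin_ia32_vec_ext_v2di ((__v2di) val.x, 0);

  printf ("builtin lane 0: %llx\n", lane0);      /* 706050403020100 */
  printf ("ll[0]:          %llx\n", val.ll[0]);  /* the same value */
  printf ("i[0]:           %x\n", val.i[0]);     /* only 3020100 */
  return 0;
}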

Ian


Re: Method to test all sse2 calls?

2010-11-24 Thread David Mathog
Ian Lance Taylor , wrote:

> Tests that directly invoke __builtin functions are not appropriate for
> your replacement for emmintrin.h.

Clearly.  However, I do not see why these are in the test routines in
the first place.  They seem not to be needed.  I made the changes below
my signature, eliminating all of the vector builtins, and the programs
still worked with both -msse2 and -mno-sse2 plus my software SSE2.  If
anything the test programs are much easier to understand without the
builtins.

There is also a (big) problem with sse2-vec-2.c (and -2a, which is empty
other than an #include of sse2-vec-2.c).  There are no explicit sse2
operations within this test program.  Moreover, the code within the
tests does not work.  Finally, if one puts a print statement anywhere in
the existing test and compiles it with:

 gcc -msse -msse2

there will be no warnings, and the run will appear to show a valid test,
but in actuality the test will never execute! This shows part of the
problem:

gcc -Wall -msse -msse2 -o foo sse2-vec-2.c
sse-os-support.h:27: warning: 'sse_os_support' defined but not used
sse2-check.h:10: warning: 'do_test' defined but not used

(The same happens for -m64.)  There must be some sort of main in there,
but no test; it does nothing and returns a valid exit status.

When stuffed with debug statements:

  for (i = 0; i < 2; i++)
masks[i] = i;

printf("DEBUG res[0] %llX\n",res[0]);
printf("DEBUG res[1] %llX\n",res[1]);
printf("DEBUG val1.ll[0] %llX\n",val1.ll[0]);
printf("DEBUG val1.ll[1] %llX\n",val1.ll[1]);
  for (i = 0; i < 2; i++)
if (res[i] != val1.ll [masks[i]]){
printf("DEBUG i %d\n",i);
printf("DEBUG masks[i] %d\n",masks[i]);
printf("DEBUG val1.ll [masks[i]] %llX\n", val1.ll [masks[i]]);
  abort ();
}

and compiled with my software SSE2 

gcc -Wall -msse -mno-sse2 -I. -O0 -m32 -lm -DSOFT_SSE2 -DEMMSOFTDBG -o
foo   sse2-vec-2.c

It emits:

DEBUG res[0] 3020100
DEBUG res[1] 7060504
DEBUG val1.ll[0] 706050403020100
DEBUG val1.ll[1] F0E0D0C0B0A0908
DEBUG i 0
DEBUG masks[i] 0
DEBUG val1.ll [masks[i]] 706050403020100
Aborted

True enough 3020100 != 706050403020100, but what kind of test
is that???

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

changes to sse2-vec-*.c routines to eliminate all of the __builtin
calls:

ls -1 sse2-vec*dist | grep -v vec-2 | extract -cols 'diff --context=0
[1,-6] [1,]' | execinput
*** sse2-vec-1.c    2010-11-24 09:06:46.0 -0800
--- sse2-vec-1.c.dist   2010-11-24 09:06:39.0 -0800
***
*** 27,28 
!   res[0] = val1.d[msk0];
!   res[1] = val1.d[msk1];
--- 27,28 
!   res[0] = __builtin_ia32_vec_ext_v2df ((__v2df)val1.x, msk0);
!   res[1] = __builtin_ia32_vec_ext_v2df ((__v2df)val1.x, msk1);
*** sse2-vec-3.c    2010-11-24 09:09:13.0 -0800
--- sse2-vec-3.c.dist   2010-11-24 09:07:48.0 -0800
***
*** 27,30 
!   res[0] = val1.i[0];
!   res[1] = val1.i[1];
!   res[2] = val1.i[2];
!   res[3] = val1.i[3];
--- 27,30 
!   res[0] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 0);
!   res[1] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 1);
!   res[2] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 2);
!   res[3] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 3);
*** sse2-vec-4.c    2010-11-24 09:10:00.0 -0800
--- sse2-vec-4.c.dist   2010-11-24 09:07:48.0 -0800
***
*** 27,34 
!   res[0] = val1.s[0];
!   res[1] = val1.s[1];
!   res[2] = val1.s[2];
!   res[3] = val1.s[3];
!   res[4] = val1.s[4];
!   res[5] = val1.s[5];
!   res[6] = val1.s[6];
!   res[7] = val1.s[7];
--- 27,34 
!   res[0] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 0);
!   res[1] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 1);
!   res[2] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 2);
!   res[3] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 3);
!   res[4] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 4);
!   res[5] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 5);
!   res[6] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 6);
!   res[7] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 7);
*** sse2-vec-5.c    2010-11-24 09:11:09.0 -0800
--- sse2-vec-5.c.dist   2010-11-24 09:07:48.0 -0800
***
*** 27,42 
!   res[0] = val1.c[0];
!   res[1] = val1.c[1];
!   res[2] = val1.c[2];
!   res[3] = val1.c[3];
!   res[4] = val1.c[4];
!   res[5] = val1.c[5];
!   res[6] = val1.c[6];
!   res[7] = val1.c[7];
!   res[8] = val1.c[8];
!   res[9] = val1.c[9];
!   res[10] = val1.c[10];
!   res[11] = val1.c[11];
!   res[12] = val1.c[12];
!   res[13] = val1.c[13];
!   res[14] = val1.c[14];
!   res[15] = val1.c[15];
--- 27,42 
!   res[0] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 0);
!   res[1] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 1);
!   res[2] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 2);
!   res[3] = __builtin_ia32_vec_ext_v16qi 

Re: Bootstrap failures on Solaris at gcc/toplev.c stage2 compilation

2010-11-24 Thread Joseph S. Myers
On Wed, 24 Nov 2010, Art Haas wrote:

> This morning's build attempts on both i386-pc-solaris2.10 and
> sparc-sun-solaris2.10 failed with the following error:
> 
> /export/home/arth/gnu/gcc-1124/./prev-gcc/xgcc 
> -B/export/home/arth/gnu/gcc-1124/./prev-gcc/ 
> -B/export/home/arth/local/i386-pc-solaris2.10/bin/ 
> -B/export/home/arth/local/i386-pc-solaris2.10/bin/ 
> -B/export/home/arth/local/i386-pc-solaris2.10/lib/ -isystem 
> /export/home/arth/local/i386-pc-solaris2.10/include -isystem 
> /export/home/arth/local/i386-pc-solaris2.10/sys-include-c   -g -O2 
> -gtoggle -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes 
> -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long 
> -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition 
> -Wc++-compat   -DHAVE_CONFIG_H -I. -I. -I/home/ahaas/gnu/gcc.git/gcc 
> -I/home/ahaas/gnu/gcc.git/gcc/. -I/home/ahaas/gnu/gcc.git/gcc/../include 
> -I/home/ahaas/gnu/gcc.git/gcc/../libcpp/include 
> -I/export/home/arth/local/include -I/export/home/arth/local/include  
> -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber 
> -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber/dpd -I../libdecnumber/!
>  home/ahaas/gnu/gcc.git/gcc/tree-call-cdce.c -o tree-call-cdce.o
> /home/ahaas/gnu/gcc.git/gcc/toplev.c: In function 'crash_signal':
> /home/ahaas/gnu/gcc.git/gcc/toplev.c:445:3: error: implicit declaration of 
> function 'signal' [-Werror=implicit-function-declaration]
> cc1: all warnings being treated as errors
> 
> The likely cause is this patch applied yesterday:
> 
> 2010-11-23  Joseph Myers  
> { ...snip ... }
> * toplev.c: Don't include  or .
> (setup_core_dumping, strip_off_ending, decode_d_option): Move to
> opts.c.

I've committed this patch as obvious to fix this.  (With glibc, 
 includes , as POSIX permits but does not require, 
which explains why I didn't see this in my testing.)

I do wonder if it really makes sense for  includes to go in 
individual source files or whether it would be better to put more headers 
in system.h.  There may be cases where including a system header means you 
need to link in extra libraries - in all programs, not just the compilers 
proper - if it has inline functions (gmp.h and mpfr.h might be like that).  
But otherwise I think more host-side code should avoid including more 
system headers itself.  Particular headers in point:  
  .  There are also several cases 
of host-side code including headers already included in system.h.

Index: toplev.c
===
--- toplev.c(revision 167122)
+++ toplev.c(working copy)
@@ -28,6 +28,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include <signal.h>
 
 #ifdef HAVE_SYS_TIMES_H
 # include <sys/times.h>
Index: ChangeLog
===
--- ChangeLog   (revision 167122)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2010-11-24  Joseph Myers  
+
+   * toplev.c: Include <signal.h>.
+
 2010-11-24  Richard Guenther  
 
PR lto/43218

-- 
Joseph S. Myers
jos...@codesourcery.com


Bootstrap failures on Solaris at gcc/toplev.c stage2 compilation

2010-11-24 Thread Art Haas

Hi.

This morning's build attempts on both i386-pc-solaris2.10 and
sparc-sun-solaris2.10 failed with the following error:

/export/home/arth/gnu/gcc-1124/./prev-gcc/xgcc 
-B/export/home/arth/gnu/gcc-1124/./prev-gcc/ 
-B/export/home/arth/local/i386-pc-solaris2.10/bin/ 
-B/export/home/arth/local/i386-pc-solaris2.10/bin/ 
-B/export/home/arth/local/i386-pc-solaris2.10/lib/ -isystem 
/export/home/arth/local/i386-pc-solaris2.10/include -isystem 
/export/home/arth/local/i386-pc-solaris2.10/sys-include-c   -g -O2 -gtoggle 
-DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes 
-Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition 
-Wc++-compat   -DHAVE_CONFIG_H -I. -I. -I/home/ahaas/gnu/gcc.git/gcc 
-I/home/ahaas/gnu/gcc.git/gcc/. -I/home/ahaas/gnu/gcc.git/gcc/../include 
-I/home/ahaas/gnu/gcc.git/gcc/../libcpp/include 
-I/export/home/arth/local/include -I/export/home/arth/local/include  
-I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber 
-I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber/dpd -I../libdecnumber/!
 home/ahaas/gnu/gcc.git/gcc/tree-call-cdce.c -o tree-call-cdce.o
/home/ahaas/gnu/gcc.git/gcc/toplev.c: In function 'crash_signal':
/home/ahaas/gnu/gcc.git/gcc/toplev.c:445:3: error: implicit declaration of 
function 'signal' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors

The likely cause is this patch applied yesterday:

2010-11-23  Joseph Myers  
{ ...snip ... }
* toplev.c: Don't include  or .
(setup_core_dumping, strip_off_ending, decode_d_option): Move to
opts.c.

With the issues involving libquadmath I'm not expect the i386 build to
succeed even once this snag is resolved, but my sparc builds have been
working, and I had a successful build yesterday morning:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/export/home/arth/local/libexec/gcc/sparc-sun-solaris2.10/4.6.0/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: /export/home/arth/src/gcc.git/configure 
--prefix=/export/home/arth/local --enable-languages=c,c++,objc --disable-nls 
--with-gmp=/export/home/arth/local --with-mpfr=/export/home/arth/local 
--with-mpc=/export/home/arth/local --enable-checking=release --enable-threads 
--with-gnu-as --with-as=/export/home/arth/local/bin/as --with-gnu-ld 
--with-ld=/export/home/arth/local/bin/ld --enable-libstdcxx-pch=no 
--with-cpu=ultrasparc3 --with-tune=ultrasparc3
Thread model: posix
gcc version 4.6.0 20101123 (experimental) [master revision 
66b86a7:d759d44:e71ec76db59f8a20d013503e7192680f92872796] (GCC)

Art Haas


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 4:22 PM, Joern Rennecke  wrote:
> Quoting Richard Guenther :
>
>> On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke 
>> wrote:
>>>
>>> I'm fine with the RTL optimizers using target macros, but I'd like the
>>> frontends and tree optimizers to cease to use tm.h.  That means
>>> all macro uses there have to be converted.  That does not necessarily
>>> involve target port code - a wrapper hook could be provided in
>>> targhooks.c
>>> that uses the target macro.
>>
>> I don't see why RTL optimizers should be different from tree optimizers.
>
> RTL optimizers tend to have a lot of target dependencies; hookizing them
> all is likely impractical, and also to have a performance impact.
>
> Also, by making the tree optimizers target independent, you can make
> optimizations that consider more than one target.
>
> Because RTL optimizers work on highly target-dependent program
> representations, the decision on what target's code to work on has already
> been fixed by the time the RTL optimizers run.

As we are moving towards doing more target dependent optimizations
on the tree level this doesn't sound like a sustainable opinion.  GIMPLE
is just a representation - whether it is target dependent or not isn't
related to that it is GIMPLE or RTL.

>> And we don't want to pay the overhead of hookizing every target
>> dependent constant just for the odd guys who want multi-target
>> compilers that have those constants differing.
>
> As compared to... having a multi-year unfinished hookization process that
> hasn't provided any new functionality yet.

And hookizing BITS_PER_UNIT brings you closer exactly how much?

Tackle the hard ones.  Because if you can't solve those you won't
succeed ever and there's no reason to pay the price for BITS_PER_UNIT
then.

> I don't think hookizing the frontends and tree optimizers will have a
> noticeable performance impact.
> And if you must have the absolute fastest compiler, LTO should eventually be
> able to inline the hooks if they are really only returning a constant.

Not for a multi-target compiler where the hooks are in shared loadable
modules like Diego envisions.  Maybe we should at least have a way
to specify indirect function calls are 'const' or 'pure', I don't know if
that works right now, but I doubt it (decl vs. type attributes, etc.).

> With regards to BITS_PER_UNIT, the issue is not so much that I really need
> it hookized for a multi-target compiler - ultimately there have to be
> consistent structure layout rules for an input program.
>
> The issue is that BITS_PER_UNIT is defined in tm.h, and if every
> file that wants to know BITS_PER_UNIT includes tm.h for that purpose,
> we'll continue to have hard-to-predict interactions between target,
> middle-end and front-end headers in the frontends and tree optimizers,
> and other macros can creep in unnoticed which work on one target, but
> not for some other target.
> Hookizing and poisoning individual macros is only patchwork, and it
> can actually give higher performance penalties when you hookize
> the macro even in files that are tightly coupled with the target definitions
> -
> as many RTL optimizers are.
>
> The only watertight way to make sure frontends do not use macros from tm.h
> is for them not to include that header, and neither should any of the
> headers they need include that header; make this a written policy, and
> poison TM_H for IN_GCC_FRONTEND .

Well, it was already said that maybe the FEs should use type-precision
of char-type-node.  I don't know if splitting tm.h into good-for-tree and
not-good-for-tree is a way to go, but it's certainly a possibility if
your short-term goal is to avoid accidental use of target information.

Richard.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Richard Guenther :


On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke  wrote:

I'm fine with the RTL optimizers using target macros, but I'd like the
frontends and tree optimizers to cease to use tm.h.  That means
all macro uses there have to be converted.  That does not necessarily
involve target port code - a wrapper hook could be provided in targhooks.c
that uses the target macro.


I don't see why RTL optimizers should be different from tree optimizers.


RTL optimizers tend to have a lot of target dependencies; hookizing them
all is likely impractical, and also to have a performance impact.

Also, by making the tree optimizers target independent, you can make
optimizations that consider more than one target.

Because RTL optimizers work on highly target-dependent program
representations, the decision on what target's code to work on has already
been fixed by the time the RTL optimizers run.


And we don't want to pay the overhead of hookizing every target
dependent constant just for the odd guys who want multi-target
compilers that have those constants differing.


As compared to... having a multi-year unfinished hookization process that
hasn't provided any new functionality yet.

I don't think hookizing the frontends and tree optimizers will have a
noticeable performance impact.
And if you must have the absolute fastest compiler, LTO should eventually be
able to inline the hooks if they are really only returning a constant.

With regards to BITS_PER_UNIT, the issue is not so much that I really need
it hookized for a multi-target compiler - ultimately there have to be
consistent structure layout rules for an input program.

The issue is that BITS_PER_UNIT is defined in tm.h, and if every
file that wants to know BITS_PER_UNIT includes tm.h for that purpose,
we'll continue to have hard-to-predict interactions between target,
middle-end and front-end headers in the frontends and tree optimizers,
and other macros can creep in unnoticed which work on one target, but
not for some other target.
Hookizing and poisoning individual macros is only patchwork, and it
can actually give higher performance penalties when you hookize
the macro even in files that are tightly coupled with the target definitions -
as many RTL optimizers are.

The only watertight way to make sure frontends do not use macros from tm.h
is for them not to include that header, and neither should any of the
headers they need include that header; make this a written policy, and
poison TM_H for IN_GCC_FRONTEND .


Well.  Long term.  Hookizing constants is easy - before proceeding
with those (seemingly expensive) ones I'd like to see all the _hard_
target macros converted into hooks.  If there are only things like
BITS_PER_UNIT left we can talk again.


The hard parts certainly include target.h and function.h.
But these are necessary to get a proper overview of the actual
problem, and to stop us from sliding back.
When I fix these, a number of files suddenly become exposed as using
tm.h without including it themselves.

Should I change all these files to explicitly include "tm.h" then,
even if it's only for BITS_PER_UNIT?


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Nathan Froyd
On Wed, Nov 24, 2010 at 02:48:01PM +, Pedro Alves wrote:
> On Wednesday 24 November 2010 13:45:40, Joern Rennecke wrote:
> > Quoting Pedro Alves :
> > Also, these separate hooks for common operations can make the code more
> > readable, particularly in the bits_in_units_ceil case.
> > I.e.
> >  foo_var = ((bitsize + targetm.bits_per_unit () - 1)
> > / targetm.bits_per_unit ());
> > vs.
> >  foo_var = targetm.bits_in_units_ceil (bitsize);
> > 
> 
> bits_in_units_ceil could well be a macro or helper function
> implemented on top of targetm.bits_per_unit (which itself could
> be a data field instead of a function call), that only accessed
> bits_per_unit once.  It could even be implemented as a helper
> macro / function today, on top of BITS_PER_UNIT.

I think adding the functions as inline functions somewhere and using
them in the appropriate places would be a reasonable standalone
cleanup.  It'd be easy to move towards something more general later.
Writing:

  int bits = ...;
  ... (X + bits - 1)/ bits;

  also generates ever-so-slightly smaller code than:

  ... (X + BITS_PER_UNIT - 1) / BITS_PER_UNIT;

on targets where BITS_PER_UNIT is not constant.
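
A minimal sketch of what such helpers could look like (hypothetical names
and placement, using the X_to_Y style mentioned below, and assuming the
usual hwint.h / tm.h definitions are visible):

/* Read BITS_PER_UNIT once so a non-constant definition is only loaded
   a single time per call.  */
static inline unsigned HOST_WIDE_INT
bits_to_units_ceil (unsigned HOST_WIDE_INT bits)
{
  unsigned HOST_WIDE_INT unit = BITS_PER_UNIT;
  return (bits + unit - 1) / unit;
}

static inline unsigned HOST_WIDE_INT
units_to_bits (unsigned HOST_WIDE_INT units)
{
  return units * BITS_PER_UNIT;
}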

I personally am not a fan of the X_in_Y naming, though; I think X_to_Y
is a little clearer.

-Nathan


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joseph S. Myers
On Wed, 24 Nov 2010, Richard Guenther wrote:

> Well.  Long term.  Hookizing constants is easy - before proceeding
> with those (seemingly expensive) ones I'd like to see all the _hard_
> target macros converted into hooks.  If there are only things like
> BITS_PER_UNIT left we can talk again.

I think doing easy ones first is natural - the hard ones are those 
affecting enum values, #if conditionals etc. (which includes a lot of 
constants), and if you convert the easy ones you can then see what's left.  
I think good priorities for moving away from target macros include:

* Anything in code built for the target (use predefined macros, built-in 
functions or if appropriate macros defined in headers under 
libgcc/config/).

* Anything that may expand to a function call for some targets and so 
requires tm_p.h to be included.

* Anything clearly not performance-critical - for example, things used 
only at startup.

But Joern has a different set of priorities:

* Anything used in front ends.

* Anything used in tree optimizers.

And that's also fine.  What's important is:

* Do the conversion rather than spending ages talking about it.

* *Think* about the appropriate conversion for a macro or set of macros 
rather than blindly mirroring the macro semantics in a hook.  See for 
example my recent elimination of HANDLE_SYSV_PRAGMA and 
HANDLE_PRAGMA_PACK_PUSH_POP by enabling features unconditionally - not 
everything that is presently configurable by a target necessarily has a 
good reason for being configurable by a target, and sometimes the existing 
set of macros may not be a good way of describing what actually does need 
to be configured in a particular area.  Or, in the BITS_PER_UNIT case, 
making sure to use TYPE_PRECISION (char_type_node) where that seems more 
appropriate.

-- 
Joseph S. Myers
jos...@codesourcery.com


Possible GCC bug.

2010-11-24 Thread Simon Hill
I think I may have hit a bug where an implicitly-defined copy-assignment
operator can't assign an array of a subtype that has a user-defined
copy-assignment operator.
I can't see any hits searching for "invalid array assignment" on the
bug repository.
I messed up submitting my last bug so I thought I'd ask here first for
confirmation.


§12.8.28 states:
"A copy/move assignment operator that is defaulted and not defined as
deleted is implicitly defined when [...] or when it is explicitly
defaulted after its first declaration."

§12.8.30 (implicitly-defined copy assignment) states:
"The implicitly-defined copy assignment operator for a non-union class
X performs memberwise copy assignment of its subobjects [...]
Each subobject is assigned in the manner appropriate to its type: [...]
-- if the subobject is an array, each element is assigned, in the
manner appropriate to the element type;"
I'm assuming that "the manner appropriate to the element type" means
use copy-assignment. At least, that's what seems to happen if the
main object's copy-assignment operator is implicitly defined.

Yet code relying on the above doesn't seem to compile if:
- The main object contains an array of the subobject.
- The main object's copy-assignment operator IS explicitly defaulted (§12.8.28).
- The subobject's copy-assignment operator is user-defined rather than
  implicitly defined or defaulted.


TEST SOURCE (Attached):
1) I created the most trivial type (named SFoo) that contains a
non-default copy-assignment operator.
2) I created the most trivial type (named SBar) that contains:
  - an array of SFoo.
  - an explicitly defaulted copy-assignment operator.
3) I created a function that:
  - creates two copies of SBar.
  - assigns one copy to the other.

TEST:
I compiled using the -std=c++0x option.
GCC refuses to compile (11:8: error: invalid array assignment).
- If I remove the explicit defaulting of SBar's copy-assignment, it works.
- If I default SFoo's copy-assignment, it works.

SPECS:
GCC: 4.6.0 20101106 (experimental) (GCC)
  - Using Pedro Lamarão's delegating constructors patch:
  - http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00620.html
  - (I can't see this having any effect here).
TARGET: x86_64-unknown-linux-gnu
SYSTEM: Core2Duo(64), Ubuntu(64) 10.4.



TL/DR: (§12.8.28) & (§12.8.30) seem to say attached code should
compile. It doesn't.
struct SFoo
  {
SFoo& operator = (SFoo const&)
  { return *this; } // <--(1) FAILS.
  // =default; // <--(2) WORKS.

//void operator = (SFoo const&) {} // <--(3) ALSO FAILS.
  };


struct SBar
  {
SBar& operator = (SBar const&) =default; // <--(4): WORKS if removed.
  
SFoo M_data[1];
  };


int main()
  {
SBar x;
SBar y;
y = x;
  }

Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Pedro Alves
On Wednesday 24 November 2010 13:45:40, Joern Rennecke wrote:
> Quoting Pedro Alves :
> 
> > On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:
> >> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
> >> would not only cost a data load from the target vector, but would also
> >> inhibit optimizations that replace division / modulo / multiply with shift
> >> or mask operations.
> >
> > Have you done any sort of measurement, to see if what is lost
> > is actually noticeable in practice?
> 
> No, I haven't.
> On an i686 it's probably not measurable.  On a host with a slow software
> divide it might be, if the code paths that require these operations are
> exercised a lot - that would also depend on the source code being compiled.

And I imagine that it should be possible to factor many
of the slow divides out of hot loops, if the compiler doesn't
manage to do that already.

> Also, these separate hooks for common operations can make the code more
> readable, particularly in the bits_in_units_ceil case.
> I.e.
>  foo_var = ((bitsize + targetm.bits_per_unit () - 1)
> / targetm.bits_per_unit ());
> vs.
>  foo_var = targetm.bits_in_units_ceil (bitsize);
> 

bits_in_units_ceil could well be a macro or helper function
implemented on top of targetm.bits_per_unit (which itself could
be a data field instead of a function call), that only accessed
bits_per_unit once.  It could even be implemented as a helper
macro / function today, on top of BITS_PER_UNIT.

Making design decisions like this based on supposedly
missed optimizations _alone_, without knowing how much
overhead we're talking about is really the wrong way to
do things.

-- 
Pedro Alves


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Diego Novillo
On Wed, Nov 24, 2010 at 09:37, Richard Guenther
 wrote:

> Well.  Long term.  Hookizing constants is easy - before proceeding
> with those (seemingly expensive) ones I'd like to see all the _hard_
> target macros converted into hooks.  If there are only things like
> BITS_PER_UNIT left we can talk again.

Sure.  I mostly wanted to check whether my long term view was
compatible with yours.


Diego.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 3:33 PM, Diego Novillo  wrote:
> On Wed, Nov 24, 2010 at 09:17, Richard Guenther
>  wrote:
>
>> And we don't want to pay the overhead of hookizing every target
>> dependent constant just for the odd guys who want multi-target
>> compilers that have those constants differing.
>
> I would like to know how much this overhead really amounts to.  Long
> term, I would like to see back ends become shared objects that can be
> selected with a -fbackend=... flag or some such.  Removing
> configure/compile-time macros and other hardwired data is instrumental
> to that.

Well.  Long term.  Hookizing constants is easy - before proceeding
with those (seemingly expensive) ones I'd like to see all the _hard_
target macros converted into hooks.  If there are only things like
BITS_PER_UNIT left we can talk again.

Richard.

>
> Diego.
>


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Diego Novillo
On Wed, Nov 24, 2010 at 09:17, Richard Guenther
 wrote:

> And we don't want to pay the overhead of hookizing every target
> dependent constant just for the odd guys who want multi-target
> compilers that have those constants differing.

I would like to know how much this overhead really amounts to.  Long
term, I would like to see back ends become shared objects that can be
selected with a -fbackend=... flag or some such.  Removing
configure/compile-time macros and other hardwired data is instrumental
to that.


Diego.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke  wrote:
> Quoting Richard Guenther :
>
>> On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke 
>> wrote:
>>>
>>> So what are we going to do about all the tree optimizers and frontends
>>> that
>>> use BITS_PER_UNIT?
>>
>> Tree optimizers are fine to use target macros/hooks, and I expect
>> use will grow, not shrink.
>
> Hooks are fine, as long as we can make the target vector type target
> independent (see PR46500).  However, macro use means the tree
> optimizer / frontend is compiled for a particular target.  That prevents
> both multi-target compilers and target-independent frontend plugins
> from working properly.
>>
>>> Should they all include tm.h, with the hazard that more specific
>>> macros creep in?
>>> Or do we want to put this in a separate header file?
>>
>> I don't have a very clear picture of where we want to go with all the
>> hookization.  And I've decided to postpone any investigation until
>> more macros are converted (where it makes sense to).
>
> I'm fine with the RTL optimizers using target macros, but I'd like the
> frontends and tree optimizers to cease to use tm.h.  That means
> all macro uses there have to be converted.  That does not necessarily
> involve target port code - a wrapper hook could be provided in targhooks.c
> that uses the target macro.

I don't see why RTL optimizers should be different from tree optimizers.

And we don't want to pay the overhead of hookizing every target
dependent constant just for the odd guys who want multi-target
compilers that have those constants differing.

Richard.

> target libraries should also not use tm.h, but predefined macros or built-in
> functions.  I'm not currently working on that, though.
>


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Richard Guenther :


On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke  wrote:

So what are we going to do about all the tree optimizers and frontends that
use BITS_PER_UNIT?


Tree optimizers are fine to use target macros/hooks, and I expect
use will grow, not shrink.


Hooks are fine, as long as we can make the target vector type target
independent (see PR46500).  However, macro use means the tree
optimizer / frontend is compiled for a particular target.  That prevents
both multi-target compilers and target-independent frontend plugins
from working properly.



Should they all include tm.h, with the hazard that more specific
macros creep in?
Or do we want to put this in a separate header file?


I don't have a very clear picture of where we want to go with all the
hookization.  And I've decided to postpone any investigation until
more macros are converted (where it makes sense to).


I'm fine with the RTL optimizers using target macros, but I'd like the
frontends and tree optimizers to cease to use tm.h.  That means
all macro uses there have to be converted.  That does not necessarily
involve target port code - a wrapper hook could be provided in targhooks.c
that uses the target macro.
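
A rough sketch of what such a wrapper could look like, assuming a hook
named bits_per_unit were added to the target vector (the names below are
illustrative, not an existing interface):

/* targhooks.c: default implementation that just forwards the existing
   target macro, so individual ports need not be touched.  */
unsigned int
default_bits_per_unit (void)
{
  return BITS_PER_UNIT;
}

/* target.def would then grow something like:
     DEFHOOK
     (bits_per_unit,
      "Number of bits in an addressable storage unit.",
      unsigned int, (void),
      default_bits_per_unit)
   and tree-level code could call targetm.bits_per_unit () without
   including tm.h.  */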

target libraries should also not use tm.h, but predefined macros or  
built-in functions.  I'm not currently working on that, though.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke  wrote:
> Quoting Richard Guenther :
>
>> Well.  Some things really ought to stay as macros.  You can always
>> error out if a multi-target compiler would have conflicts there at
>> configure time.
>
> So what are we going to do about all the tree optimizers and frontends that
> use BITS_PER_UNIT?

Tree optimizers are fine to use target macros/hooks, and I expect
use will grow, not shrink.

> Should they all include tm.h, with the hazard that more specific
> macros creep in?
> Or do we want to put this in a separate header file?

I don't have a very clear picture of where we want to go with all the
hookization.  And I've decided to postpone any investigation until
more macros are converted (where it makes sense to).

Richard.


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Pedro Alves :


On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:

If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
would not only cost a data load from the target vector, but would also
inhibit optimizations that replace division / modulo / multiply with shift
or mask operations.


Have you done any sort of measurement, to see if what is lost
is actually noticeable in practice?


No, I haven't.
On an i686 it's probably not measurable.  On a host with a slow software
divide it might be, if the code paths that require these operations are
exercised a lot - that would also depend on the source code being compiled.

Also, these separate hooks for common operations can make the code more
readable, particularly in the bits_in_units_ceil case.
I.e.
foo_var = ((bitsize + targetm.bits_per_unit () - 1)
   / targetm.bits_per_unit ());
vs.
foo_var = targetm.bits_in_units_ceil (bitsize);


Re: Help with reloading FP + offset addressing mode

2010-11-24 Thread Mohamed Shafi
On 30 October 2010 05:45, Joern Rennecke  wrote:
> Quoting Mohamed Shafi :
>
>> On 29 October 2010 00:06, Joern Rennecke 
>> wrote:
>>>
>>> Quoting Mohamed Shafi :
>>>
 Hi,

 I am doing a port in GCC 4.5.1. For the port

 1. there is (reg + offset) addressing mode only when reg is SP.
 Other base registers are not allowed
 2. FP cannot be used as a base register. (FP based addressing is done
 by copying it into a base register)
 In order to take advantage of FP elimination (this will create SP +
 offset addressing), I did the following:

 1. Created a new register class (address registers + FP) and used this
 new class as the BASE_REG_CLASS
>>>
>>> Stop right there.  You need to distinguish between FRAME_POINTER_REGNUM
>>> and HARD_FRAME_POINTER_REGNUM.
>>>
>>
>> From the description given in the internals, I am not able to
>> understand why you suggested this. Could you please explain this?
>
> In order to trigger reloading of the address, you have to have a register
> elimination, even if the stack pointer is not a suitable destination
> for the elimination.  Also, if you want to reload do the work for you,
> you must not lie to it about the addressing capabilities of an actual hard
> register.  Hence, you need separate hard and soft frame pointers.
>
> If you have them, but conflate them when you describe what you are doing
> in your port, you are not only likely to confuse the listener/reader,
> but also your documentation, your code, and ultimately yourself.
>

Having both a FRAME_POINTER_REGNUM and a HARD_FRAME_POINTER_REGNUM will
trigger reloading of the address. But for the following pattern

(insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 35 SFP)
 (const_int 1 [0x1])) [0 c+0 S1 A32])
(reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil))

where SFP is FRAME_POINTER_REGNUM, an elimination will result in

(insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 27 as15)
 (const_int 1 [0x1])) [0 c+0 S1 A32])
(reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil))

where as15 is the HARD_FRAME_POINTER_REGNUM. But remember this new
address is not valid (as only SP is allowed in this addressing mode).
When the above pattern is reloaded i get:

(insn 28 27 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg:QI 28 a0)
 (const_int 1 [0x1])) [0 c+0 S1 A32])
  (reg:QI 3 g3)) -1 (nil))

I get an unrecognizable insn ICE, because this addressing mode is not
valid. I believe this happens because when the reload pass gets an
address of the form (reg + off), it assumes that the address is
invalid due to one of the following:

1. 'reg' is not a suitable base register
2. the offset is out of range
3. the address has an eliminatable register as a base register.

Is there any way to overcome this one?

Any help is appreciated.

Shafi


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Pedro Alves
On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:
> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
> would not only cost a data load from the target vector, but would also
> inhibit optimizations that replace division / modulo / multiply with shift
> or mask operations.

Have you done any sort of measurement, to see if what is lost
is actually noticeable in practice?

> So maybe we should look into having a few functional hooks that do  
> common operations, i.e.
> bits_in_units        x / BITS_PER_UNIT
> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
> bit_unit_remainder   x % BITS_PER_UNIT
> units_in_bits        x * BITS_PER_UNIT

-- 
Pedro Alves


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Joern Rennecke

Quoting Richard Guenther :


Well.  Some things really ought to stay as macros.  You can always
error out if a multi-target compiler would have conflicts there at
configure time.


So what are we going to do about all the tree optimizers and frontends that
use BITS_PER_UNIT?
Should they all include tm.h, with the hazard that more specific
macros creep in?
Or do we want to put this in a separate header file?
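
For today's targets such a separate header would contain almost nothing --
something like the following, where the file name and exact contents are only
an assumption:

    /* tm-units.h (hypothetical) -- the one target constant that the tree
       optimizers and frontends would still need.  */
    #define BITS_PER_UNIT 8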


Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Paul Koning

On Nov 24, 2010, at 6:45 AM, Richard Guenther wrote:

> On Tue, Nov 23, 2010 at 9:09 PM, Joern Rennecke  wrote:
>> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
>> would not only cost a data load from the target vector, but would also
>> inhibit optimizations that replace division / modulo / multiply with shift
>> or mask operations.
>> So maybe we should look into having a few functional hooks that do common
>> operations, i.e.
>> bits_in_units        x / BITS_PER_UNIT
>> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
>> bit_unit_remainder   x % BITS_PER_UNIT
>> units_in_bits        x * BITS_PER_UNIT
>> 
>> Although we currently have some HOST_WIDE_INT uses, I hope using
>> unsigned HOST_WIDE_INT as the argument / return type will generally work.
>> 
>> tree.h also defines BITS_PER_UNIT_LOG, which (or its hook equivalent)
>> should probably be used in all the places that use
>> exact_log2 (BITS_PER_UNIT), and, if it could be relied upon to exist, we
>> could also use it as a substitute for the above hooks.  However, this seems
>> a bit iffy - we'd permanently forgo the possibility to have 6 / 7 / 36
>> bit etc. units.
>> 
>> Similar arrangements could be made for BITS_PER_WORD and UNITS_PER_WORD,
>> although these macros seem not quite so prevalent in the tree optimizers.
> 
> Well.  Some things really ought to stay as macros.  You can always
> error out if a multi-target compiler would have conflicts there at
> configure time.

That seems reasonable, especially since BITS_PER_UNIT is likely to be consistent
(and 8) in any multi-target compiler.
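
The configure-time "error out" mentioned above could then be as cheap as a
generated translation unit compiled once per sub-target -- a sketch only, not
existing GCC machinery:

    /* Reject a sub-target of a multi-target build that disagrees on
       BITS_PER_UNIT (here pinned to 8) before anything else is built.  */
    #include "tm.h"

    #if !defined (BITS_PER_UNIT) || BITS_PER_UNIT != 8
    # error "sub-targets of a multi-target compiler must agree on BITS_PER_UNIT"
    #endif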

paul



Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends

2010-11-24 Thread Richard Guenther
On Tue, Nov 23, 2010 at 9:09 PM, Joern Rennecke  wrote:
> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
> would not only cost a data load from the target vector, but would also
> inhibit optimizations that replace division / modulo / multiply with shift
> or mask operations.
> So maybe we should look into having a few functional hooks that do common
> operations, i.e.
> bits_in_units        x / BITS_PER_UNIT
> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
> bit_unit_remainder   x % BITS_PER_UNIT
> units_in_bits        x * BITS_PER_UNIT
>
> Although we currently have some HOST_WIDE_INT uses, I hope using
> unsigned HOST_WIDE_INT as the argument / return type will generally work.
>
> tree.h also defines BITS_PER_UNIT_LOG, which (or its hook equivalent)
> should probably be used in all the places that use
> exact_log2 (BITS_PER_UNIT), and, if it could be relied upon to exist, we
> could also use it as a substitute for the above hooks.  However, this seems
> a bit iffy - we'd permanently forgo the possibility to have 6 / 7 / 36
> bit etc. units.
>
> Similar arrangements could be made for BITS_PER_WORD and UNITS_PER_WORD,
> although these macros seem not quite so prevalent in the tree optimizers.

Well.  Some things really ought to stay as macros.  You can always
error out if a multi-target compiler would have conflicts there at
configure time.

Richard.


Re: Adding Leon processor to the SPARC list of processors

2010-11-24 Thread Konrad Eisele

> Is the list above an indication that you are already finished with
> the modifications? :-)
> Can you give me a note?  Otherwise I'll create a new patch that implements
> the scheme you suggested.
> 

Sorry, I didn't notice the attachment, which is already your implementation.
I thought it was the old diff.
So: I'm OK with all of it.  Thanks for the effort.

-- Greetings Konrad


Re: Adding Leon processor to the SPARC list of processors

2010-11-24 Thread Konrad Eisele
Eric Botcazou wrote:
>> Following the recent comments by Eric, the patch now sketches the
>> following setup:
>>
>> If multi-lib is wanted:
>>  configure --with-cpu=leon ... : creates multilib-dir soft|v8
>> combinations using [-msoft-float|-mcpu=sparcleonv8] (MULTILIB_OPTIONS =
>> msoft-float mcpu=sparcleonv8)
>>
>> If Single-lib is wanted:
>>  configure --with-cpu=sparcleonv7 --with-float=soft --disable-multilib ... : (v7 | soft | no-multilib)
>>  configure --with-cpu=sparcleonv8 --with-float=soft --disable-multilib ... : (v8 | soft | no-multilib)
>>  configure --with-cpu=sparcleonv7 --with-float=hard --disable-multilib ... : (v7 | hard | no-multilib)
>>  configure --with-cpu=sparcleonv8 --with-float=hard --disable-multilib ... : (v8 | hard | no-multilib)
>>
>> Using --with-cpu=leon|sparcleonv7|sparcleonv8, the sparc_cpu is switched
>> to PROCESSOR_LEON.
> 
> I'm mostly OK, but I don't think we need sparcleonv7 or sparcleonv8.  
> Attached 

You are right.

> is another proposal, which:
> 
>  1. Adds -mtune/--with-tune=leon for all SPARC targets.  In particular, this
> means that if you configure --target=sparc-{elf,rtems} --with-tune=leon, you
> get a multilib-ed compiler defaulting to V7/FPU and -mtune=leon, with V8 and
> NO-FPU libraries.

Ok, this scheme seems best.

> 
>  2. Adds new targets sparc-leon-{elf,linux}: multilib-ed compiler defaulting
> to V8/FPU and -mtune=leon, with V7 and NO-FPU libraries.

Ok.

> 
>  3. Adds new targets sparc-leon3-{elf,linux}: multilib-ed compiler defaulting 
> to V8/FPU and -mtune=leon, with NO-FPU libraries.
> 
> Singlelib-ed compilers are available through --disable-multilib and
>   --with-cpu={v7,v8} --with-float={soft,hard} --with-tune=leon
> for sparc-{elf,rtems} or just
>   --with-cpu={v7,v8} --with-float={soft,hard}
> for sparc-leon*-*.
> 
> The rationale is that --with-cpu shouldn't change the set of multilibs, it is 
> only the configure-time equivalent of -mcpu.  This set of multilibs should 
> only depend on the target and the presence of --disable-multilib.
> 

Ok, understood.

> 
>   * config.gcc (sparc-*-elf*): Deal with sparc-leon specifically.
>   (sparc-*-linux*): Likewise.
>   (sparc*-*-*): Remove obsolete sparc86x setting.
>   (sparc-leon*): Default to --with-cpu=v8 and --with-tune=leon.
>   * doc/invoke.texi (SPARC Options): Document -mcpu/-mtune=leon.
>   * config/sparc/sparc.h (TARGET_CPU_leon): Define.
>   (TARGET_CPU_sparc86x): Delete.
>   (TARGET_CPU_cypress): Define as alias to TARGET_CPU_v7.
>   (TARGET_CPU_f930): Define as alias to TARGET_CPU_sparclite.
>   (TARGET_CPU_f934): Likewise.
>   (TARGET_CPU_tsc701): Define as alias to TARGET_CPU_sparclet.
>   (CPP_CPU_SPEC): Add entry for -mcpu=leon.
>   (enum processor_type): Add PROCESSOR_LEON.
>   * config/sparc/sparc.c (leon_costs): New cost array.
>   (sparc_option_override): Add entry for TARGET_CPU_leon and -mcpu=leon.
>   Initialize cost array to leon_costs if -mtune=leon.
>   * config/sparc/sparc.md (cpu attribute): Add leon.
>   Include leon.md scheduling description.
>   * config/sparc/leon.md: New file.
>   * config/sparc/t-elf: Do not assemble Solaris startup files.
>   * config/sparc/t-leon: New file.
>   * config/sparc/t-leon3: Likewise.
> 
> 

Is the list above an indication that you are already finished with
the modifications? :-)
Can you give me a note?  Otherwise I'll create a new patch that implements
the scheme you suggested.

-- Greetings Konrad