Re: libgcc: strange optimization

2011-08-04 Thread Andreas Schwab
Hans-Peter Nilsson  writes:

> To make sure, it'd be nice if someone could perhaps grep an
> entire GNU/Linux-or-other distribution including the kernel for
> uses of asm-declared *local* registers that don't directly feed
> into asms and not being the stack-pointer?

One frequent candidate is the global pointer.

Andreas.

-- 
Andreas Schwab, sch...@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84  5EC7 45C6 250E 6F00 984E
"And now for something completely different."


Re: Question about SMS scheduling windows

2011-08-04 Thread Richard Sandiford
Hi Ayal,

Thanks to you and Revital for the replies.  The reason I asked is that
I wanted to rewrite gen_sched_window so that it has only one loop over
the PSPs and one loop over the PSSs.  I have a follow-up patch to use
iv analysis to reduce the number of memory dependencies (or at least
increase the distance between them), and it was easier to do that
after this change.

Ayal Zaks  writes:
>>> I ask because in the final range:
>>>
>>>   start = early_start;
>>>   end = MIN (end, early_start + ii);
>>>   /* Schedule the node close to it's predecessors.  */
>>>   step = 1;
>>>
>>> END is an exclusive bound.  It seems like we might be double-counting
> here,
>>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
>>
>>
>>Yes, I think it indeed should be fixed. Thanks for reporting on this.
>>
>>Revital
>
> Agreed;
>
>   if (e->data_type == MEM_DEP)
> end = MIN (end, SCHED_TIME (v_node) + ii - 1);
>
> should be replaced with
>
>   if (e->data_type == MEM_DEP)
> end = MIN (end, p_st + ii);
>
> also for the (3rd) case when there are both previously-scheduled
> predessors and previously-scheduled successors. The range is inclusive
> of start and exclusive of end: for (c = start; c != end; c += step)...

OK.  For the follow-on iv patch, it seemed easier to keep both bounds
inclusive for the loop, then make the "end" exclusive when setting the
out parameters.  (Note that there shouldn't be any problem with overflow
when making the bound exclusive, because the size of the range has been
restricted to "ii" by that stage.  The follow-on patch does use
saturating maths for all ops though.)

>>This doesn't seem to be in the paper, and the comment suggests
>>"count_succs > count_preds" rather than "count_succs >= count_preds".
>>Is the ">=" vs ">" important?
>
> I think not: in both cases you'll be trying to minimize the same
> number of live ranges. But indeed it's good to be consistent with the
> comment.

OK.  For no particular reason other than cowardice, I ended up keeping
this as:

!   if (count_succs && count_succs >= count_preds)

The reason for asking was that:

!   if (count_succs > count_preds)

seemed more natural, and would match the existing comment.  I'm happy
to test that instead if you prefer.

I don't have powerpc hardware that I can do meaningful performance
testing on, but I did run it through a Popular* Embedded Benchmark
on an ARM Cortex-A8 board with -O3 -fmodulo-sched
-fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
a benchmark that does benefit from modulo scheduling, in some cases
by a significant amount.)

Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
additional patch:

Index: gcc/opts.c
===
--- gcc/opts.c  2011-08-03 10:56:50.0 +0100
+++ gcc/opts.c  2011-08-03 10:56:50.0 +0100
@@ -449,6 +449,8 @@ static const struct default_options defa
 { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
 
 /* -O2 optimizations.  */
 { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },

applied for both the "before" and "after" runs.  OK to install?

Thanks,
Richard


gcc/
* modulo-sched.c (get_sched_window): Use just one loop for predecessors
and one loop for successors.  Fix upper bound of memory range.

Index: gcc/modulo-sched.c
===
*** gcc/modulo-sched.c  2011-08-04 09:09:29.0 +0100
--- gcc/modulo-sched.c  2011-08-04 09:49:16.0 +0100
*** #define MAX_SPLIT_NUM 10
*** 1630,1638 
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! sbitmap sched_nodes, int ii, int *start_p, int *step_p, int 
*end_p)
  {
int start, step, end;
ddg_edge_ptr e;
sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
--- 1630,1640 
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! sbitmap sched_nodes, int ii, int *start_p, int *step_p,
! int *end_p)
  {
int start, step, end;
+   int early_start, late_start;
ddg_edge_ptr e;
sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
*** get_sched_window (partial_schedule_ptr p
*** 1640,1645 
--- 1642,1649 
sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
int psp_not_empty;
int pss_not_empty;
+   int count_preds;
+   int count_succs;
  
/* 1. compute sched window for u (start, end, step).  */
sbi

Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread Jonathan Wakely
On 4 August 2011 03:30, LIM Fung-Chai  wrote:
> Hi,
>
> "g++ -Wall -Wextra ..." should flag a warning on the following code
> but does not.
>
> std::pair
> get_XYZ_data()
> {
>    XYZ result;
>    return std::pair(1, result);
> }
>
> This is a violation of Scott Meyer's "Effective C++" Item 21 "Don't
> try to return a reference when you must return an object."  GCC
> version 4.5.2 on Kubuntu 11.04 does not issue a warning.
>
> I apologize for not subscribing to the mailing list or submitting via
> GCC Buzilla.

Thanks for the apology, but it should still be reported to bugzilla
not to this list.

Your example can be reduced to

struct XYZ { };

XYZ& f(XYZ& r) { return r; }

XYZ&
get_XYZ_data()
{
   XYZ result;
   return f(result);
}


Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread Miles Bader
Jonathan Wakely  writes:
>> "g++ -Wall -Wextra ..." should flag a warning on the following code
>> but does not.
>
> Thanks for the apology, but it should still be reported to bugzilla
> not to this list.

BTW, it should only warn if given -Weffc++, right?

-Miles

-- 
People who are more than casually interested in computers should have at
least some idea of what the underlying hardware is like.  Otherwise the
programs they write will be pretty weird.  -- Donald Knuth


[named address] rejects-valid: error: initializer element is not computable at load time

2011-08-04 Thread Georg-Johann Lay
For the following 1-liner I get an error with current trunk r177267:

const __pgm char * pallo = "pallo";

__pgm as a named address space qualifier.


>$TV/xgcc -B$TV pgm.c -c -save-temps -dp -mmcu=atmega8
 
unit size 
align 8 symtab 0 alias set -1 canonical type 0xb74c7f00 precision 8 
min  max 
pointer_to_this >
unsigned PHI
size 
unit size 
align 8 symtab 0 alias set -1 canonical type 0xb74c7f60>
readonly constant
arg 0 
unsigned HI size  unit size 
align 8 symtab 0 alias set -1 canonical type 0xb7473f00>
readonly constant
arg 0 
readonly constant
arg 0 
readonly constant static "pallo\000">
pgm.c:1:28>
pgm.c:1:28>
pgm.c:1:28>
pgm.c:1:1: error: initializer element is not computable at load time

Is that a flaw in c-typeck.c?

I know the line worked some time ago; from time to time I merge trunk into
the named address space support for AVR which is not in SVN.

If you need the patch to reproduce the bug I will upload the patch, of course.

No BE hook is called for the tree.

Moreover, if a user writes a line like
   const __pgm char * pallo = "pallo";
he wants the string literal to be placed in the non-default address
space.  There is no other way to express that except something like
   const __pgm char * pallo = (const __pgm char*) "pallo";
which doesn't work either and triggers

pgm.c:1:1: error: initializer element is not constant

with a tree dump similar to above.


Johann


Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley
On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

> To make sure, it'd be nice if someone could perhaps grep an
> entire GNU/Linux-or-other distribution including the kernel for
> uses of asm-declared *local* registers that don't directly feed
> into asms and not being the stack-pointer?  Or can we get away
> with just saying that local asm registers haven't had any other
> documented meaning for the last seven years?

It's the sort of thing that gets done in threaded interpreters,
where you really need to keep a few pointers in registers and
the interpreter itself is a very long function.  gcc has always
done a dreadful job of register allocation in such cases.

Andrew.


Re: libgcc: strange optimization

2011-08-04 Thread Richard Guenther
On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley  wrote:
> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>
>> To make sure, it'd be nice if someone could perhaps grep an
>> entire GNU/Linux-or-other distribution including the kernel for
>> uses of asm-declared *local* registers that don't directly feed
>> into asms and not being the stack-pointer?  Or can we get away
>> with just saying that local asm registers haven't had any other
>> documented meaning for the last seven years?
>
> It's the sort of thing that gets done in threaded interpreters,
> where you really need to keep a few pointers in registers and
> the interpreter itself is a very long function.  gcc has always
> done a dreadful job of register allocation in such cases.

Sure, but what I have seen people use global register variables
for this (which means they get taken away from the register allocator).

Richard.

> Andrew.
>


Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread Jonathan Wakely
On 4 August 2011 10:29, Miles Bader wrote:
> Jonathan Wakely  writes:
>>> "g++ -Wall -Wextra ..." should flag a warning on the following code
>>> but does not.
>>
>> Thanks for the apology, but it should still be reported to bugzilla
>> not to this list.
>
> BTW, it should only warn if given -Weffc++, right?

No, returning a reference to a local variable is always wrong, not
only because Meyers says so.

We warn for this unconditionally, without any -W option:

struct XYZ { };

XYZ& f() { XYZ x; return x; }

We just don't do it when the reference doesn't bind directly to the
local, as in the earlier examples, because it's harder.


Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread LIM Fung-Chai
Hi,

On Thu, Aug 4, 2011 at 5:29 PM, Miles Bader  wrote:
> Jonathan Wakely  writes:
>>> "g++ -Wall -Wextra ..." should flag a warning on the following code
>>> but does not.
>>
>> Thanks for the apology, but it should still be reported to bugzilla
>> not to this list.

I was hoping someone could submit a bugzilla report.  I don't want to
create an account which I may never use again.

>
> BTW, it should only warn if given -Weffc++, right?

-Weffc++ does not catch the dangerous error in my code.  -Wall caught
another Item21 error somewhere else in my code:

std::vector&
get_XYYs()
{
std::vector result;
return result;
}

I notice -Weffc++ produces more warnings on my code, but -Wall knows
about Item21.  Both -Weffc++ and -Wall only see get_XYZs() as an
Item21 violation but not get_XYZ_data() which returns the local
variable as a reference in an std::pair.

>
> -Miles
>
> --
> People who are more than casually interested in computers should have at
> least some idea of what the underlying hardware is like.  Otherwise the
> programs they write will be pretty weird.  -- Donald Knuth
>

Thanks.

Regards,
Fung Chai.

-- 
FWIW: $\lnot \exists x \, {\rm Right} (x) \leftarrow \forall x \, {\rm
Wrong} (x)$ \hfill -- Stephen Stills

Freedom's just another word for nothin' left to lose -- Kris Kristofferson


Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread Miles Bader
Jonathan Wakely  writes:
> No, returning a reference to a local variable is always wrong, not
> only because Meyers says so.

True ... :}

-miles

-- 
Apologize, v. To lay the foundation for a future offense.


Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley
On 08/04/2011 10:52 AM, Richard Guenther wrote:
> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley  wrote:
>> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>>
>>> To make sure, it'd be nice if someone could perhaps grep an
>>> entire GNU/Linux-or-other distribution including the kernel for
>>> uses of asm-declared *local* registers that don't directly feed
>>> into asms and not being the stack-pointer?  Or can we get away
>>> with just saying that local asm registers haven't had any other
>>> documented meaning for the last seven years?
>>
>> It's the sort of thing that gets done in threaded interpreters,
>> where you really need to keep a few pointers in registers and
>> the interpreter itself is a very long function.  gcc has always
>> done a dreadful job of register allocation in such cases.
> 
> Sure, but what I have seen people use global register variables
> for this (which means they get taken away from the register allocator).

Not always though, and the x86 has so few registers that using a
global register variable is very problematic.  I suppose you could
compile the threaded interpreter in a file of its own, but I'm not
sure that has quite the same semantics as local register variables.

The problem is that people who care about this stuff very much don't
always read gcc@gcc.gnu.org so won't be heard.  But in their own world
(LISP, Forth) nice features like register variables and labels as
values have led to gcc being the preferred compiler for this kind of
work.

Andrew.


Re: libgcc: strange optimization

2011-08-04 Thread Richard Guenther
On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley  wrote:
> On 08/04/2011 10:52 AM, Richard Guenther wrote:
>> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley  wrote:
>>> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>>>
 To make sure, it'd be nice if someone could perhaps grep an
 entire GNU/Linux-or-other distribution including the kernel for
 uses of asm-declared *local* registers that don't directly feed
 into asms and not being the stack-pointer?  Or can we get away
 with just saying that local asm registers haven't had any other
 documented meaning for the last seven years?
>>>
>>> It's the sort of thing that gets done in threaded interpreters,
>>> where you really need to keep a few pointers in registers and
>>> the interpreter itself is a very long function.  gcc has always
>>> done a dreadful job of register allocation in such cases.
>>
>> Sure, but what I have seen people use global register variables
>> for this (which means they get taken away from the register allocator).
>
> Not always though, and the x86 has so few registers that using a
> global register variable is very problematic.  I suppose you could
> compile the threaded interpreter in a file of its own, but I'm not
> sure that has quite the same semantics as local register variables.
>
> The problem is that people who care about this stuff very much don't
> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
> (LISP, Forth) nice features like register variables and labels as
> values have led to gcc being the preferred compiler for this kind of
> work.

Well, the uses won't break with the idea - they would simply work
like if they were not using local register variables.

Richard.

> Andrew.
>


Re: g++ 2.5.2 does not catch reference to local variable error.

2011-08-04 Thread Jonathan Wakely
On 4 August 2011 11:31, LIM Fung-Chai wrote:
> Hi,
>
> On Thu, Aug 4, 2011 at 5:29 PM, Miles Bader  wrote:
>> Jonathan Wakely  writes:
 "g++ -Wall -Wextra ..." should flag a warning on the following code
 but does not.
>>>
>>> Thanks for the apology, but it should still be reported to bugzilla
>>> not to this list.
>
> I was hoping someone could submit a bugzilla report.  I don't want to
> create an account which I may never use again.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49974

>>
>> BTW, it should only warn if given -Weffc++, right?
>
> -Weffc++ does not catch the dangerous error in my code.  -Wall caught
> another Item21 error somewhere else in my code:
>
> std::vector&
> get_XYYs()
> {
>    std::vector result;
>    return result;
> }
>
> I notice -Weffc++ produces more warnings on my code, but -Wall knows
> about Item21.  Both -Weffc++ and -Wall only see get_XYZs() as an
> Item21 violation but not get_XYZ_data() which returns the local
> variable as a reference in an std::pair.

Because -Weffc++ issues warnings about style guidelines (specifically,
Items 11, 12, 14, 15, 23 from Effective C++ and Items 6 and 67 from
More Effective C++) that not everyone agrees with in all cases.

Item 21 is about correctness, not style, returning a dangling
reference always warrants a warning.


Re: libgcc: strange optimization

2011-08-04 Thread Hans-Peter Nilsson
On Thu, 4 Aug 2011, Andreas Schwab wrote:
> Hans-Peter Nilsson  writes:
>
> > To make sure, it'd be nice if someone could perhaps grep an
> > entire GNU/Linux-or-other distribution including the kernel for
> > uses of asm-declared *local* registers that don't directly feed
> > into asms and not being the stack-pointer?
>
> One frequent candidate is the global pointer.

Yes, that too, but it's usually fixed isn't it?  What I really
meant was "not being a fixed register" but I don't think many
willing to grep a whole distro can tell which registers in which
gcc port are fixed and remember to look for uses of
"-ffixed-reg-".

brgds, H-P


Re: [RFC] Remove -freorder-blocks-and-partition

2011-08-04 Thread Jan Hubicka
> On Wed, Aug 3, 2011 at 2:06 PM, Jan Hubicka  wrote:
> >> In xalancbmk, with the partition option, most of object files have
> >> nonzero size cold sections generated. The text size of the binary is
> >> increased to 3572728 bytes from 3466790 bytes.  Profiling the program
> >> using the training input shows the following differences. With
> >> partitioning, number of executed branch instructions slightly
> >> increases, but itlb misses and icache load misses are significantly
> >> lower compared with the binary without partitioning.
> >>
> >>
> >> David
> >>
> >> With partition:
> >> -
> >>    53654937239  branches
> >>       306751458  L1-icache-load-misses
> >>         8146112  iTLB-load-misses
> >
> > Note that I was also planning for some time to introduce notion of provably 
> > cold
> > stuff into our branch prediction heurstics. I.e. code leading to aborts, eh 
> > etc
> 
> no-return attribute is looked at by static profile estimation pass. Is
> the attribute (definitely not returning) properly propagated to the
> callers (wrappers of exit, etc)?

It is, at local pure const and IPA pure const. Catch with IPA pure const is
that profile is computed at tha time and it is not updated afterwards, so when
discovered late it does not affect static profile estimates (yet), only gets
cfg/codegen better.

Honza
> 
> David
> 
> > that can be then offlined even w/o profile feedback and could perhaps help
> > to large apps.
> > (also the whole pass should be more effective with larger testcases, 
> > SPEC2k6 is slowly
> > becoming a small one)
> >
> > Honza
> >


Re: [RFC] Remove -freorder-blocks-and-partition

2011-08-04 Thread Jan Hubicka
Also on the oriignal topic, Iknow that Mozlla folks experimented with this
switch (and I do expect it should make noticeable reducion in the hot section
footprint that is important for them). They are not using it at the moment
because of problems with their bug reporting tool not being able to do unwind
info for split functions.

Taras, do you have any data if the partitioning helps?
Honza


[named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-04 Thread Georg-Johann Lay
Trying to make named address space support work for target AVR,
I am facing the following problem:

For generic AS, there are three valid base pointer registers
X , Y and Z.

For the new __pgm AS, only Z is available without offset.

The problem is now that addresses.h:base_reg_class() does not
pass down AS information, i.e. addr_space_t down to target hook
MODE_CODE_BASE_REG_CLASS.

Likewise for REGNO_MODE_CODE_OK_FOR_BASE_P.

The MODE argument that both of these get describe the mode of
the MEM but not the mode associated with the AS.

Thus, it is not possible to provide proper implementations of
these hooks.  IRA/reload choses the wrong hard register so
that in postreload no constraints match, i.e.
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P returns wrong for that
hard register (which is right) so that postreload busts.

I'd just like to reassure me that I didn't do something fundamentally
wrong and named address space support if not yet smoothly supported
in GCC.

FYI, the C source is

int read_from_pgm_2 (int * const __pgm * const addr)
{
return **addr;
}

= Prior to reload: PHI is attached to __pgm, i.e. AS1, looks fine:

(insn 2 4 3 2 (set (reg/v/f:PHI 45 [ addr ])
(reg:PHI 24 r24 [ addr ])) pgm.c:42 8 {*movphi}
 (nil))

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(insn 6 3 7 2 (set (reg/f:HI 47 [ *addr_1(D) ])
(mem/f:HI (reg/v/f:PHI 45 [ addr ]) [2 *addr_1(D)+0 S2 A8 AS1])) 
pgm.c:43 7 {*movhi}
 (expr_list:REG_DEAD (reg/v/f:PHI 45 [ addr ])
(nil)))

(note 7 6 12 2 NOTE_INSN_DELETED)

(insn 12 7 15 2 (set (reg/i:HI 24 r24)
(mem:HI (reg/f:HI 47 [ *addr_1(D) ]) [3 *D.1949_2+0 S2 A8])) pgm.c:44 7 
{*movhi}
 (expr_list:REG_DEAD (reg/f:HI 47 [ *addr_1(D) ])
(nil)))

(insn 15 12 0 2 (use (reg/i:HI 24 r24)) pgm.c:44 -1
 (nil))


= After reload

(insn 20 3 6 2 (set (reg:PHI 26 r26)
(reg/v/f:PHI 24 r24 [orig:45 addr ] [45])) pgm.c:43 8 {*movphi}
 (nil))

(insn 6 20 7 2 (set (reg/f:HI 30 r30 [orig:47 *addr_1(D) ] [47])
(mem/f:HI (reg:PHI 26 r26) [2 *addr_1(D)+0 S2 A8 AS1])) pgm.c:43 7 
{*movhi}
 (nil))

(note 7 6 12 2 NOTE_INSN_DELETED)

(insn 12 7 15 2 (set (reg/i:HI 24 r24)
(mem:HI (reg/f:HI 30 r30 [orig:47 *addr_1(D) ] [47]) [3 *D.1949_2+0 S2 
A8])) pgm.c:44 7 {*movhi}
 (nil))

(insn 15 12 18 2 (use (reg/i:HI 24 r24)) pgm.c:44 -1
 (nil))

The register names are (X=r26, invalid to load from AS1 in insn 6).

Johann


FDO and LTO on ARM

2011-08-04 Thread Mike Hommey
Hi,

We (Mozilla) are trying to get the best of the ARM toolchain for our
Android build. I recently built an Android Native-code Development Kit
with GCC 4.6.1 and binutils 2.21.53, instead of GCC 4.4.3 and binutils
2.19 that come with the default NDK.

LTO doesn't work at all, I'm getting an ICE that looks like the one from
bug 41159.

FDO however, works, but sadly, the resulting build is not only quite
bigger, it's also slower on some tests (the Sunspider javascript
benchmark). While we have seen improvements on other tests (most
notably, the V8 benchmark is much faster) by switching to GCC 4.6 (that
is, without FDO), FDO doesn't seem to bring anything on the table. It
even seems to bring performance regression.

Note that we do our normal builds with -Os and use -O3 for FDO. As for
architecture specific flags, we use -marmv7-a -mthumb -mfloat-abi=softfp
-mfpu=vfp. I've attempted a -O2 build in the past with GCC 4.4 but it
was both bigger and slower than the -Os builds.

So, it pretty much looks like current aggressive optimizations hit
current hardware limitations and are slower than builds optimized for
size.

Has there been significant changes to the ARM backend that would justify
that I try some more with current GCC HEAD? Should I maybe try some more
with the linaro GCC branch? Are there things we can do to help getting
better ARM performance?

Cheers,

Mike


Re: libgcc: strange optimization

2011-08-04 Thread Andrew Haley
On 08/04/2011 12:19 PM, Richard Guenther wrote:
> On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley  wrote:
>> On 08/04/2011 10:52 AM, Richard Guenther wrote:
>>> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley  wrote:
 On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

> To make sure, it'd be nice if someone could perhaps grep an
> entire GNU/Linux-or-other distribution including the kernel for
> uses of asm-declared *local* registers that don't directly feed
> into asms and not being the stack-pointer?  Or can we get away
> with just saying that local asm registers haven't had any other
> documented meaning for the last seven years?

 It's the sort of thing that gets done in threaded interpreters,
 where you really need to keep a few pointers in registers and
 the interpreter itself is a very long function.  gcc has always
 done a dreadful job of register allocation in such cases.
>>>
>>> Sure, but what I have seen people use global register variables
>>> for this (which means they get taken away from the register allocator).
>>
>> Not always though, and the x86 has so few registers that using a
>> global register variable is very problematic.  I suppose you could
>> compile the threaded interpreter in a file of its own, but I'm not
>> sure that has quite the same semantics as local register variables.
>>
>> The problem is that people who care about this stuff very much don't
>> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
>> (LISP, Forth) nice features like register variables and labels as
>> values have led to gcc being the preferred compiler for this kind of
>> work.
> 
> Well, the uses won't break with the idea - they would simply work
> like if they were not using local register variables.

I don't understand this remark.  Surely if they work like they were
not using local register variables, you'll get dreadful register
allocation.  But this is a big reason to use gcc.  Efficient code
really does matter to people writing this kind of thing.

Andrew.



Re: FDO and LTO on ARM

2011-08-04 Thread Richard Guenther
On Thu, Aug 4, 2011 at 4:05 PM, Mike Hommey  wrote:
> Hi,
>
> We (Mozilla) are trying to get the best of the ARM toolchain for our
> Android build. I recently built an Android Native-code Development Kit
> with GCC 4.6.1 and binutils 2.21.53, instead of GCC 4.4.3 and binutils
> 2.19 that come with the default NDK.
>
> LTO doesn't work at all, I'm getting an ICE that looks like the one from
> bug 41159.
>
> FDO however, works, but sadly, the resulting build is not only quite
> bigger, it's also slower on some tests (the Sunspider javascript
> benchmark). While we have seen improvements on other tests (most
> notably, the V8 benchmark is much faster) by switching to GCC 4.6 (that
> is, without FDO), FDO doesn't seem to bring anything on the table. It
> even seems to bring performance regression.
>
> Note that we do our normal builds with -Os and use -O3 for FDO. As for
> architecture specific flags, we use -marmv7-a -mthumb -mfloat-abi=softfp
> -mfpu=vfp. I've attempted a -O2 build in the past with GCC 4.4 but it
> was both bigger and slower than the -Os builds.
>
> So, it pretty much looks like current aggressive optimizations hit
> current hardware limitations and are slower than builds optimized for
> size.
>
> Has there been significant changes to the ARM backend that would justify
> that I try some more with current GCC HEAD? Should I maybe try some more
> with the linaro GCC branch? Are there things we can do to help getting
> better ARM performance?

-fprofile-use enables quite some optimizations that are even off for -O3
which are -funroll-loops and -fpeel-loops, -ftracer and -funswitch-loops.
Those will all be increasing code-size (hopefully only for hot code pieces
though).

Did you try using FDO with -Os?  FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite, cold portions optimized
for size while the rest is optimized for speed.

Richard.

> Cheers,
>
> Mike
>


Re: [RFC] Remove -freorder-blocks-and-partition

2011-08-04 Thread Taras Glek

On 08/04/2011 06:39 AM, Jan Hubicka wrote:

Also on the oriignal topic, Iknow that Mozlla folks experimented with this
switch (and I do expect it should make noticeable reducion in the hot section
footprint that is important for them). They are not using it at the moment
because of problems with their bug reporting tool not being able to do unwind
info for split functions.

Taras, do you have any data if the partitioning helps?
Honza
Currently it does not help. I believe it has potential since we have a 
lot of large methods where only a small part of them gets run.


Taras


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-04 Thread Ulrich Weigand
Georg-Johann Lay wrote:

> For the following 1-liner I get an error with current trunk r177267:
> 
> const __pgm char * pallo = "pallo";
> 
> __pgm as a named address space qualifier.
[snip]
> Moreover, if a user writes a line like
>const __pgm char * pallo = "pallo";
> he wants the string literal to be placed in the non-default address
> space.  There is no other way to express that except something like
>const __pgm char * pallo = (const __pgm char*) "pallo";
> which doesn't work either and triggers
> 
> pgm.c:1:1: error: initializer element is not constant
> 
> with a tree dump similar to above.

This is pretty much working as expected.  "pallo" is a string literal
which (always) resides in the default address space.  According to the
named address space specification (TR 18037) there are no string literals
in non-default address spaces ...

The assignment above would therefore need to convert a pointer to the
string literal in the default space to a pointer to the __pgm address
space.  This might be impossible (depending on whether __pgm encloses
the generic space), and even if it is possible, it is not guaranteed
that the conversion can be done as a constant expression at compile time.

What I'd do to get a string placed into the __pgm space is to explicitly
initialize an *array* in __pgm space, e.g. like so:

const __pgm char pallo[] = "pallo";

This is defined to work according to TR 18037, and it does actually
work for me on spu-elf.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-04 Thread DJ Delorie

> The only target supporting named address spaces today is spu-elf,

m32c-elf does too.


Re: Question about SMS scheduling windows

2011-08-04 Thread Ayal Zaks
Hi Richard,

2011/8/4 Richard Sandiford 
>
> Hi Ayal,
>
> Thanks to you and Revital for the replies.  The reason I asked is that
> I wanted to rewrite gen_sched_window so that it has only one loop over
> the PSPs and one loop over the PSSs.


This rewrite makes perfect sense regardless of any follow-up patches,
as it clarifies and reuses code.

>
> I have a follow-up patch to use
> iv analysis to reduce the number of memory dependencies (or at least
> increase the distance between them), and it was easier to do that
> after this change.


Reducing memory dependencies is definitely important.

>
> Ayal Zaks  writes:
> >>> I ask because in the final range:
> >>>
> >>>       start = early_start;
> >>>       end = MIN (end, early_start + ii);
> >>>       /* Schedule the node close to it's predecessors.  */
> >>>       step = 1;
> >>>
> >>> END is an exclusive bound.  It seems like we might be double-counting
> > here,
> >>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
> >>
> >>
> >>Yes, I think it indeed should be fixed. Thanks for reporting on this.
> >>
> >>Revital
> >
> > Agreed;
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, SCHED_TIME (v_node) + ii - 
> > 1);
> >
> > should be replaced with
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, p_st + ii);
> >
> > also for the (3rd) case when there are both previously-scheduled
> > predessors and previously-scheduled successors. The range is inclusive
> > of start and exclusive of end: for (c = start; c != end; c += step)...
>
> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
> inclusive for the loop, then make the "end" exclusive when setting the
> out parameters.  (Note that there shouldn't be any problem with overflow
> when making the bound exclusive, because the size of the range has been
> restricted to "ii" by that stage.  The follow-on patch does use
> saturating maths for all ops though.)


Sure, no problem having "end" inclusive, as it simplifies things. We
can even keep it inclusive all the way through, iterating over: for (c
= start; c != end+step; c += step)...
>
> >>This doesn't seem to be in the paper, and the comment suggests
> >>"count_succs > count_preds" rather than "count_succs >= count_preds".
> >>Is the ">=" vs ">" important?
> >
> > I think not: in both cases you'll be trying to minimize the same
> > number of live ranges. But indeed it's good to be consistent with the
> > comment.
>
> OK.  For no particular reason other than cowardice, I ended up keeping
> this as:
>
> !   if (count_succs && count_succs >= count_preds)
>
> The reason for asking was that:
>
> !   if (count_succs > count_preds)
>
> seemed more natural, and would match the existing comment.  I'm happy
> to test that instead if you prefer.
>
I wouldn't worry about this tie breaker, unless there's a reason (in
which case the reason should hopefully provide a secondary criteria).

>
> I don't have powerpc hardware that I can do meaningful performance
> testing on, but I did run it through a Popular* Embedded Benchmark
> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
> a benchmark that does benefit from modulo scheduling, in some cases
> by a significant amount.)
>
Ok, the patch should be neutral performance-wise. One could check btw
if SMS produces exactly the same output in both cases, even w/o a
powerpc (or any other) HW.

>
> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
> additional patch:
>
> Index: gcc/opts.c
> ===
> --- gcc/opts.c  2011-08-03 10:56:50.0 +0100
> +++ gcc/opts.c  2011-08-03 10:56:50.0 +0100
> @@ -449,6 +449,8 @@ static const struct default_options defa
>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
>
>     /* -O2 optimizations.  */
>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>
> applied for both the "before" and "after" runs.  OK to install?


Yes, with just a couple of minor non-mandatory comments:
1. Setting "count_preds = psp_not_empty;" and "count_succs =
pss_not_empty;" suggests that you want to decrement non-interesting
neighbors instead of incrementing interesting ones. Otherwise can
initialize to zero (doesn't really matter, just a bit confusing).
2. I find it easier to read "late_start = MIN (late_start, early_start
+ (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
late_start);".
3. Suggest to set step=1 at the beginning, explaining from the start
that we first compute a forward range (start
> Thanks,
> Richar

Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-04 Thread Ulrich Weigand
DJ Delorie wrote:
> > The only target supporting named address spaces today is spu-elf,
> 
> m32c-elf does too.

Huh, I totally missed that, sorry ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: FDO and LTO on ARM

2011-08-04 Thread Mike Hommey
On Thu, Aug 04, 2011 at 05:16:25PM +0200, Richard Guenther wrote:
> -fprofile-use enables quite some optimizations that are even off for -O3
> which are -funroll-loops and -fpeel-loops, -ftracer and -funswitch-loops.
> Those will all be increasing code-size (hopefully only for hot code pieces
> though).
> 
> Did you try using FDO with -Os?  FDO should make hot code parts
> optimized similar to -O3 but leave other pieces optimized for size.
> Using FDO with -O3 gives you the opposite, cold portions optimized
> for size while the rest is optimized for speed.

Yes I initially did, and the results were quite similar, in fact (that
is, regression on Sunspider ; I hadn't checked V8 by then)

I however don't see the difference between the two cases you describe,
except if hot and cold parts combined is not the whole. Experimentation
shows that FDO -Os and FDO -O3 are different, though.

Jan Hubicka also commented on my blog where I mentioned these FDO
issues. Since I think it makes more sense to have the discussion
happening in one place only (and this list is more appropriate), I'll
put answers to his questions below:

> I never actually looked on PDO on ARM, but the slowdowns should not
> really happen. Would be possible to analyse this bit further (i.e.
> figure out what slows down) so we could fix it for 4.7?

The android packages are linked from my blog, and they can be unzipped.
For convenience to the list readers, here is a link to the files:
http://people.mozilla.org/~mhommey/pgo/
The .nightly.apk file is a build from our build bots (standard NDK, GCC
4.4). The .gcc4.6.apk file is a GCC 4.6 -Os build. The pgo.apk file is a
GCC 4.6 -O3 FDO build.
The corresponding source code is:
http://hg.mozilla.org/mozilla-central/file/1dddaeb1366b
Please tell me if you need more data, like object files, unstripped
libraries, etc.

> Also do you get any warnings on profile mismatches? Perhaps something
> is wrong to the degree that the relevant part of profile gets
> misapplied.

I don't get any warning on profile mismatches. I only get a "few"
missing gcda files warning, but that's expected.

Cheers,

Mike


Re: FDO and LTO on ARM

2011-08-04 Thread Xinliang David Li
+Mark who has done size optimization tuning with FDO.

On Thu, Aug 4, 2011 at 7:05 AM, Mike Hommey  wrote:
> Hi,
>
> We (Mozilla) are trying to get the best of the ARM toolchain for our
> Android build. I recently built an Android Native-code Development Kit
> with GCC 4.6.1 and binutils 2.21.53, instead of GCC 4.4.3 and binutils
> 2.19 that come with the default NDK.
>
> LTO doesn't work at all, I'm getting an ICE that looks like the one from
> bug 41159.
>
> FDO however, works, but sadly, the resulting build is not only quite
> bigger,

Is this true for both 4.6 and 4.4 gcc? There is a bug in 4.6 that
prevents cold functions from be optimized for size with FDO. The bug
was fixed in trunk recently.

> it's also slower on some tests (the Sunspider javascript
> benchmark). While we have seen improvements on other tests (most
> notably, the V8 benchmark is much faster) by switching to GCC 4.6 (that
> is, without FDO), FDO doesn't seem to bring anything on the table. It
> even seems to bring performance regression.

ARM specific performance tuning (with FDO) seems needed.  More
parameters (e.g, in inliner related) may need to be made target
dependent.
>
> Note that we do our normal builds with -Os and use -O3 for FDO. As for
> architecture specific flags, we use -marmv7-a -mthumb -mfloat-abi=softfp
> -mfpu=vfp. I've attempted a -O2 build in the past with GCC 4.4 but it
> was both bigger and slower than the -Os builds.
>
> So, it pretty much looks like current aggressive optimizations hit
> current hardware limitations and are slower than builds optimized for
> size.

Yes, this is very likely. Hardware profiling will be very useful to
help identify the root cause.

>
> Has there been significant changes to the ARM backend that would justify
> that I try some more with current GCC HEAD? Should I maybe try some more
> with the linaro GCC branch? Are there things we can do to help getting
> better ARM performance?

It does not hurt to try it :)

Thanks,

David

>
> Cheers,
>
> Mike
>


Re: [named address] rejects-valid: error: initializer element is not computable at load time

2011-08-04 Thread Georg-Johann Lay
Ulrich Weigand wrote:
> Georg-Johann Lay wrote:
> 
>> For the following 1-liner I get an error with current trunk r177267:
>>
>> const __pgm char * pallo = "pallo";
>>
>> __pgm as a named address space qualifier.
> [snip]
>> Moreover, if a user writes a line like
>>const __pgm char * pallo = "pallo";
>> he wants the string literal to be placed in the non-default address
>> space.  There is no other way to express that except something like
>>const __pgm char * pallo = (const __pgm char*) "pallo";
>> which doesn't work either and triggers
>>
>> pgm.c:1:1: error: initializer element is not constant
>>
>> with a tree dump similar to above.
> 
> This is pretty much working as expected.  "pallo" is a string literal
> which (always) resides in the default address space.  According to the
> named address space specification (TR 18037) there are no string literals
> in non-default address spaces ...

The intension of TR 18037 to supply "Extension to Support Embedded Systems"
and these are systems where every byte counts -- and it counts *where* the
byte will be placed.

Basically named AS provide something like target specific qualifiers, and
if GCC, maybe under the umbrella of GNU-C, would actually implement a feature
like target specific qualifiers, that would be a great gain and much more
appreciated than -- users will perceive it that way -- being more catholic
than the pope ;-)

For example, you can have any combination of qualifiers like const, restrict
or volatile, but it is not possible for named AS.  That's clear as long as
named AS is as strict as TR 18037.  However, users want features to write
down their code an a comfortable, type-safe way and not as it is at the moment,
i.e. by means of dreaded inline assembler macros (for my specific case).

For example, the backend could decide what qualifier combinations are valid
and what machine mode is associated to them or emit a diagnostic.

It's target specific, the backend knows it!

> The assignment above would therefore need to convert a pointer to the
> string literal in the default space to a pointer to the __pgm address
> space.  This might be impossible (depending on whether __pgm encloses
> the generic space), and even if it is possible, it is not guaranteed
> that the conversion can be done as a constant expression at compile time.

The backend can tell. It likes to implement features to help users.
It knows about the semantic and if it's legal or not.

And even if it's all strict under TR 18037, the resulting error messages
are *really* confusing to users because to them, a string literal's address
is known.

> What I'd do to get a string placed into the __pgm space is to explicitly
> initialize an *array* in __pgm space, e.g. like so:
> 
> const __pgm char pallo[] = "pallo";
> 
> This is defined to work according to TR 18037, and it does actually
> work for me on spu-elf.

Ya, but it different to the line above.  Was just starting with the work and
it worked some time ago, so I wondered.  And I must admit I am not familiar
will all the dreaded restriction TR 18037 imposes to render it less functional 
:-(
> 
> Bye,
> Ulrich
> 

Anyway, thanks for your explanations.

Do you think a feature like "target specific qualifier" would be reasonable?
IMO it would be greatly appreciated by users.
Should not be too hard atop the work already being done for named addresses.

Johann


Re: FDO and LTO on ARM

2011-08-04 Thread Denis Chertykov
2011/8/4 Xinliang David Li :
> +Mark who has done size optimization tuning with FDO.
>
> On Thu, Aug 4, 2011 at 7:05 AM, Mike Hommey  wrote:
>> Hi,
>>
>> We (Mozilla) are trying to get the best of the ARM toolchain for our
>> Android build. I recently built an Android Native-code Development Kit
>> with GCC 4.6.1 and binutils 2.21.53, instead of GCC 4.4.3 and binutils
>> 2.19 that come with the default NDK.
>>
>> LTO doesn't work at all, I'm getting an ICE that looks like the one from
>> bug 41159.
>>
>> FDO however, works, but sadly, the resulting build is not only quite
>> bigger,
>
> Is this true for both 4.6 and 4.4 gcc? There is a bug in 4.6 that
> prevents cold functions from be optimized for size with FDO. The bug
> was fixed in trunk recently.
>
>> it's also slower on some tests (the Sunspider javascript
>> benchmark). While we have seen improvements on other tests (most
>> notably, the V8 benchmark is much faster) by switching to GCC 4.6 (that
>> is, without FDO), FDO doesn't seem to bring anything on the table. It
>> even seems to bring performance regression.
>
> ARM specific performance tuning (with FDO) seems needed.  More
> parameters (e.g, in inliner related) may need to be made target
> dependent.
>>
>> Note that we do our normal builds with -Os and use -O3 for FDO. As for
>> architecture specific flags, we use -marmv7-a -mthumb -mfloat-abi=softfp
>> -mfpu=vfp. I've attempted a -O2 build in the past with GCC 4.4 but it
>> was both bigger and slower than the -Os builds.
>>
>> So, it pretty much looks like current aggressive optimizations hit
>> current hardware limitations and are slower than builds optimized for
>> size.
>
> Yes, this is very likely. Hardware profiling will be very useful to
> help identify the root cause.
>
>>
>> Has there been significant changes to the ARM backend that would justify
>> that I try some more with current GCC HEAD? Should I maybe try some more
>> with the linaro GCC branch? Are there things we can do to help getting
>> better ARM performance?
>
> It does not hurt to try it :)
>

Linaro android toolchain benchmarks:
https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-07

Denis.


Re: [named address] ice-on-valid: in postreload.c:reload_cse_simplify_operands

2011-08-04 Thread Georg-Johann Lay
Ulrich Weigand wrote:
> Georg-Johann Lay wrote:
> 
>> Trying to make named address space support work for target AVR,
>> I am facing the following problem:
>>
>> For generic AS, there are three valid base pointer registers
>> X , Y and Z.
>>
>> For the new __pgm AS, only Z is available without offset.
>>
>> The problem is now that addresses.h:base_reg_class() does not
>> pass down AS information, i.e. addr_space_t down to target hook
>> MODE_CODE_BASE_REG_CLASS.
>>
>> Likewise for REGNO_MODE_CODE_OK_FOR_BASE_P.
>>
>> The MODE argument that both of these get describe the mode of
>> the MEM but not the mode associated with the AS.
>>
>> Thus, it is not possible to provide proper implementations of
>> these hooks.  IRA/reload choses the wrong hard register so
>> that in postreload no constraints match, i.e.
>> TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P returns wrong for that
>> hard register (which is right) so that postreload busts.
>>
>> I'd just like to reassure me that I didn't do something fundamentally
>> wrong and named address space support if not yet smoothly supported
>> in GCC.
> 
> No, you're correct.  The only target supporting named address spaces
> today is spu-elf, and here we're eliminating any reference to the
> non-default space during expand, so that reload never sees any.

So a solution for me would be to expand as inline asm instead of as
proper insns.  Yes, it's hack.  But before AS support I tried it via
builtin that expanded as unspec and IRA created horribly bloated code.

The annoyance is that only POST_INC addresses for one single hard reg
is supported by the device.  Greetings from spill fails...


> This means that problems like the one you're seeing have been hidden
> so far.  I've started looking into fixing this, but since I don't
> have a target where this is needed, this effort never really went
> anywhere.  I'll append below a patch I did a year or so ago; just
> as something to look at, it probably will not even apply to the
> current sources any more.

Sounds great that you tried to fix it!  Much work below... wow.

> I'd be happy to bring this up to date if you're willing to work with
> me to get this tested on a target that needs this support ...

Just attached a patch to bugzilla because an AVR user wanted to play
with the AS support and asked me to supply my changes. It's still in
a mess but you could get a more reasonable base than on a target where
all named addresses vanish at expand.

The patch is fresh and attached to the enhancement PR49868, just BE stuff.
There is also some sample code.

http://gcc.gnu.org/PR49868

Johann


> Bye,
> Ulrich
> 
> 
> 
> ChangeLog:
> 
>   * doc/tm.texi (MODE_CODE_BASE_REG_CLASS): Add address space
>   argument.
>   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
> 
>   * config/cris/cris.h (MODE_CODE_BASE_REG_CLASS): Add address
>   space argument.
>   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
>   * config/bfin/bfin.h (MODE_CODE_BASE_REG_CLASS): Likewise.
>   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
> 
>   * addresses.h (base_reg_class): Add address space argument.
>   Pass to MODE_CODE_BASE_REG_CLASS.
>   (ok_for_base_p_1): Add address space argument.  Pass to
>   REGNO_MODE_CODE_OK_FOR_BASE_P.
>   (regno_ok_for_base_p): Add address space argument.  Pass to
>   ok_for_base_p_1.
> 
>   * regrename.c (scan_rtx_address): Add address space argument.
>   Pass address space to regno_ok_for_base_p and base_reg_class.
>   Update recursive calls.
>   (scan_rtx): Pass address space to scan_rtx_address.
>   (build_def_use): Likewise.
>   * regcprop.c (replace_oldest_value_addr): Add address space
>   argument.  Pass to regno_ok_for_base_p and base_reg_class.
>   Update recursive calls.
>   (replace_oldest_value_mem): Pass address space to
>   replace_oldest_value_addr.
>   (copyprop_hardreg_forward_1): Likewise.
> 
>   * reload.c (find_reloads_address_1): Add address space argument.
>   Pass address space to base_reg_class and regno_ok_for_base_p.
>   Update recursive calls.
>   (find_reloads_address): Pass address space to base_reg_class,
>   regno_ok_for_base_p, and find_reloads_address_1.
>   (find_reloads): Pass address space to base_reg_class.
>   (find_reloads_subreg_address): Likewise.
> 
>   * ira-costs.c (record_reg_classes): Update calls to base_reg_class.
>   (ok_for_base_p_nonstrict): Add address space argument.  Pass to
>   ok_for_base_p_1.
>   (record_address_regs): Add address space argument.  Pass to
>   base_reg_class and ok_for_base_p_nonstrict.  Update recursive calls.
>   (record_operand_costs): Pass address space to record_address_regs.
>   (scan_one_insn): Likewise.
> 
>   * caller-save.c (init_caller_save): Update call to base_reg_class.
>   * ira-conflicts.c (ira_build_conflicts): Likewise.
>   * reload1.c (maybe_fix_stack_asms): Likewise.


Re: RFC: PATCH: Require and use int64 for x86 options

2011-08-04 Thread H.J. Lu
On Wed, Jul 27, 2011 at 2:23 PM, Joseph S. Myers
 wrote:
> On Wed, 27 Jul 2011, H.J. Lu wrote:
>
>> ; Maximum number of mask bits in a variable.
>> MaxMaskBits
>> ix86_isa_flags = 64
>>
>> It mark ix86_isa_flags as 64bit.  Any comments?
>
> The patch won't work as is.  set_option, for example, casts a pointer to
> (int *), and stores a mask that came from option->var_value, which is an
> int, so this won't work with option fields not of type int or values that
> don't fit in int; you'd need to check all uses of CLVC_BIT_CLEAR and
> CLVC_BIT_SET in the source tree to adapt things for the possibility of
> wider mask fields, and track the type of each such field.
>

Here is the updated patch.  Tested on Linux/ia32 and Linux/x86-64. I
used (1LL < X) for HOST_WIDE_INT instead of ((HOST_WIDE_INT) 1 << x)
since we have

#define OPTION_MASK_ISA_64BIT (1LL << 2)
...
#define TARGET_64BIT_DEFAULT OPTION_MASK_ISA_64BIT
...

#if TARGET_64BIT_DEFAULT
#define OPT_ARCH64 "!m32"
#define OPT_ARCH32 "m32"
#else
#define OPT_ARCH64 "m64|mx32"
#define OPT_ARCH32 "m64|mx32:;"
#endif

and C preprocessor doesn't support ((HOST_WIDE_INT) 1 << x).

OK for trunk?

Thanks.

-- 
H.J.
---
2011-08-04  H.J. Lu  
Igor Zamyatin 

* opt-functions/awk (switch_bit_fields): Initialization
of the host_wide_int field.
(host_wide_int_var_name): New.
var_type_struct): Return HOST_WIDE_INT on 64bit integer.

* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

* opth-gen.awk: Use 1LL for 64bit integer.  Check max_mask_bits
instead of 31 for target masks.

* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
to HOST_WIDE_INT.

* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
with HOST_WIDE_INT for isa_flag.
(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
isa variables.

* config/i386/i386.c (ix86_target_string): Replace int with
HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
(pta_flags): Removed.
(PTA_XXX): Redefined as (1LL << X).
(pta): Use HOST_WIDE_INT on flags.
(builtin_isa): Use HOST_WIDE_INT on isa.
(ix86_add_new_builtins): Likewise.
(def_builtin): Use HOST_WIDE_INT on mask.
(def_builtin_const): Likewise.
(builtin_description): Likewise.

* config/i386/i386.opt (ix86_isa_flags): Replace int with
HOST_WIDE_INT.
(ix86_isa_flags_explicit): Likewise.
(x_ix86_isa_flags_explicit): Likewise.
2011-08-04  H.J. Lu  
	Igor Zamyatin 

	* opt-functions/awk (switch_bit_fields): Initialization
	of the host_wide_int field.
	(host_wide_int_var_name): New.
	var_type_struct): Return HOST_WIDE_INT on 64bit integer.

	* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

	* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

	* opth-gen.awk: Use 1LL for 64bit integer.  Check max_mask_bits
	instead of 31 for target masks.

	* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

	* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
	to HOST_WIDE_INT.

	* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
	with HOST_WIDE_INT for isa_flag.
	(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
	isa variables.

	* config/i386/i386.c (ix86_target_string): Replace int with
	HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
	(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
	(pta_flags): Removed.
	(PTA_XXX): Redefined as (1LL << X).
	(pta): Use HOST_WIDE_INT on flags.
	(builtin_isa): Use HOST_WIDE_INT on isa.
	(ix86_add_new_builtins): Likewise.
	(def_builtin): Use HOST_WIDE_INT on mask.
	(def_builtin_const): Likewise.
	(builtin_description): Likewise.

	* config/i386/i386.opt (ix86_isa_flags): Replace int with
	HOST_WIDE_INT.
	(ix86_isa_flags_explicit): Likewise.
	(x_ix86_isa_flags_explicit): Likewise.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 1fc333c..c5a770f 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -34,14 +34,14 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (int, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
void (*def_or_undef) (cpp_reader *, const char *));
 
 
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
-ix86_target_macros_internal (int isa_flag,
+ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 			 enum processor_type arch,
 			 enum processor_type tune,
 			 enum fpmath_u

Re: FDO and LTO on ARM

2011-08-04 Thread Jan Hubicka

Did you try using FDO with -Os?  FDO should make hot code parts
optimized similar to -O3 but leave other pieces optimized for size.
Using FDO with -O3 gives you the opposite, cold portions optimized
for size while the rest is optimized for speed.


FDO with -Os still optimize for size, even in hot parts.  So to get resonale
speedups you need -O3+FDO.  -O3+FDO effectively defaults to -Os in  
cold portions of program.


Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown  
is bug.  It is not very thoroughly since it is not really used in  
practice.



Also do you get any warnings on profile mismatches? Perhaps something
is wrong to the degree that the relevant part of profile gets
misapplied.


I don't get any warning on profile mismatches. I only get a "few"
missing gcda files warning, but that's expected.


Perhaps you could compile one of less trivial files you are sure that  
are covered by train run and send me -fdump-tree-all-blocks  
-fdump-ipa-all dumps of the compilation so I can double check the  
profile seems sane. This could be good start to rule out something  
stupid.


Honza


Cheers,

Mike






Re: FDO and LTO on ARM

2011-08-04 Thread Jan Hubicka
> +Mark who has done size optimization tuning with FDO.
> 
> On Thu, Aug 4, 2011 at 7:05 AM, Mike Hommey  wrote:
> > Hi,
> >
> > We (Mozilla) are trying to get the best of the ARM toolchain for our
> > Android build. I recently built an Android Native-code Development Kit
> > with GCC 4.6.1 and binutils 2.21.53, instead of GCC 4.4.3 and binutils
> > 2.19 that come with the default NDK.
> >
> > LTO doesn't work at all, I'm getting an ICE that looks like the one from
> > bug 41159.
> >
> > FDO however, works, but sadly, the resulting build is not only quite
> > bigger,
> 
> Is this true for both 4.6 and 4.4 gcc? There is a bug in 4.6 that
> prevents cold functions from be optimized for size with FDO. The bug
> was fixed in trunk recently.

You can also backport the patch to 4.6 tree. If the bug exists there, consider
the patch preaproved.

With FDO, -O2 and -O3 is not really that significandly different (i.e. -O2
gets all the extra inlining, but it does not get vectorization that is probably
not big deal for you). -Os is however different storry.
> >
> > Has there been significant changes to the ARM backend that would justify
> > that I try some more with current GCC HEAD? Should I maybe try some more
> > with the linaro GCC branch? Are there things we can do to help getting
> > better ARM performance?
> 
> It does not hurt to try it :)

One thing that is really changed is inliner heuristics.  If would be very happy
to have some feedback on this early, since we plan to do re-tunning of it
(LTO changes many things and there are also fortran benchmarks that shows a lot
of problems. Mozilla may chime in and make my life even harder with hopefully
some positive results on it).

As discused earlier, I think it would be very good idea to start trakcing
perfomrance of Mozilla built with mainline GCC like we track other benchmarks.
We don't really monitor anything of this size and thus we are quite likely
to find new interesting issues by doing so. 

Honza
> 
> Thanks,
> 
> David
> 
> >
> > Cheers,
> >
> > Mike
> >


Re: RFC: PATCH: Require and use int64 for x86 options

2011-08-04 Thread H.J. Lu
On Thu, Aug 4, 2011 at 11:08 AM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 2:23 PM, Joseph S. Myers
>  wrote:
>> On Wed, 27 Jul 2011, H.J. Lu wrote:
>>
>>> ; Maximum number of mask bits in a variable.
>>> MaxMaskBits
>>> ix86_isa_flags = 64
>>>
>>> It mark ix86_isa_flags as 64bit.  Any comments?
>>
>> The patch won't work as is.  set_option, for example, casts a pointer to
>> (int *), and stores a mask that came from option->var_value, which is an
>> int, so this won't work with option fields not of type int or values that
>> don't fit in int; you'd need to check all uses of CLVC_BIT_CLEAR and
>> CLVC_BIT_SET in the source tree to adapt things for the possibility of
>> wider mask fields, and track the type of each such field.
>>
>
> Here is the updated patch.  Tested on Linux/ia32 and Linux/x86-64. I
> used (1LL < X) for HOST_WIDE_INT instead of ((HOST_WIDE_INT) 1 << x)
> since we have
>
> #define OPTION_MASK_ISA_64BIT (1LL << 2)
> ...
> #define TARGET_64BIT_DEFAULT OPTION_MASK_ISA_64BIT
> ...
>
> #if TARGET_64BIT_DEFAULT
> #define OPT_ARCH64 "!m32"
> #define OPT_ARCH32 "m32"
> #else
> #define OPT_ARCH64 "m64|mx32"
> #define OPT_ARCH32 "m64|mx32:;"
> #endif
>
> and C preprocessor doesn't support ((HOST_WIDE_INT) 1 << x).
>
> OK for trunk?
>

Here is the updated patch to get proper HOST_WIDE_INT bits and 1
through a new file, opt-gen.c.  OK for trunk?

Thanks.


-- 
H.J.

2011-08-04  H.J. Lu  
Igor Zamyatin 

* Makefile.in (options.c): Depend on and use opt-gen$(exeext).
(options-save.c): Likewise.
(options.h): Likewise.
(opt-gen$(exeext)): New target.

* opt-functions.awk (switch_bit_fields): Initialize the
host_wide_int field.
(host_wide_int_var_name): New.
(var_type_struct): Check and return HOST_WIDE_INT.

* opt-gen.c: New.

* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Check
max_mask_bits instead of 31 for target masks.

* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
to HOST_WIDE_INT.

* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
with HOST_WIDE_INT for isa_flag.
(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
isa variables.

* config/i386/i386.c (ix86_target_string): Replace int with
HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
(pta_flags): Removed.
(PTA_XXX): Redefined as (1LL << X).
(pta): Use HOST_WIDE_INT on flags.
(builtin_isa): Use HOST_WIDE_INT on isa.
(ix86_add_new_builtins): Likewise.
(def_builtin): Use HOST_WIDE_INT on mask.
(def_builtin_const): Likewise.
(builtin_description): Likewise.

* config/i386/i386.opt (ix86_isa_flags): Replace int with
HOST_WIDE_INT.
(ix86_isa_flags_explicit): Likewise.
(x_ix86_isa_flags_explicit): Likewise.
2011-08-04  H.J. Lu  
	Igor Zamyatin 

	* Makefile.in (options.c): Depend on and use opt-gen$(exeext).
	(options-save.c): Likewise.
	(options.h): Likewise.
	(opt-gen$(exeext)): New target.

	* opt-functions.awk (switch_bit_fields): Initialize the
	host_wide_int field.
	(host_wide_int_var_name): New.
	(var_type_struct): Check and return HOST_WIDE_INT.

	* opt-gen.c: New.

	* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

	* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

	* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Check
	max_mask_bits instead of 31 for target masks.

	* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

	* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
	to HOST_WIDE_INT.

	* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
	with HOST_WIDE_INT for isa_flag.
	(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
	isa variables.

	* config/i386/i386.c (ix86_target_string): Replace int with
	HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
	(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
	(pta_flags): Removed.
	(PTA_XXX): Redefined as (1LL << X).
	(pta): Use HOST_WIDE_INT on flags.
	(builtin_isa): Use HOST_WIDE_INT on isa.
	(ix86_add_new_builtins): Likewise.
	(def_builtin): Use HOST_WIDE_INT on mask.
	(def_builtin_const): Likewise.
	(builtin_description): Likewise.

	* config/i386/i386.opt (ix86_isa_flags): Replace int with
	HOST_WIDE_INT.
	(ix86_isa_flags_explicit): Likewise.
	(x_ix86_isa_flags_explicit): Likewise.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0204f93..fd14c35 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2245,22 +2245,22 @@ s-options: $(ALL_OPT_FILES) Makefile $(srcdir)/opt-gather.awk
 	$(STAMP) s-options

Re: RFC: PATCH: Require and use int64 for x86 options

2011-08-04 Thread Joseph S. Myers
On Thu, 4 Aug 2011, H.J. Lu wrote:

> Here is the updated patch to get proper HOST_WIDE_INT bits and 1
> through a new file, opt-gen.c.  OK for trunk?

Using another generator program like this can't be the best approach 
(apart from anything else, when built for the build system hwint.h should 
reflect the build system types not the host system types; cf. 
 where I suspected that 
sort of host/build confusion of causing a reported build failure).

You want opth-gen.awk to know the number of bits to give errors.  Note 
that the errors are given by generating #error into the output file.  It's 
easy enough to generate #if conditions into the file that compare with 
HOST_BITS_PER_WIDE_INT.

You want opth-gen.awk to know whether to use 1LL as the shifted constant.  
You can easily enough make hwint.h contain a HOST_WIDE_INT_1 macro, 
defined to 1L or 1LL as appropriate.

-- 
Joseph S. Myers
jos...@codesourcery.com


gcc-4.5-20110804 is now available

2011-08-04 Thread gccadmin
Snapshot gcc-4.5-20110804 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110804/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 177417

You'll find:

 gcc-4.5-20110804.tar.bz2 Complete GCC

  MD5=b6f3554d43026a656eb361e6a79b8f26
  SHA1=fe76bf10bac70ef88e5cd627967a63e613ddbd6b

Diffs from 4.5-20110728 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: RFC: PATCH: Require and use int64 for x86 options

2011-08-04 Thread H.J. Lu
On Thu, Aug 4, 2011 at 3:46 PM, Joseph S. Myers  wrote:
> On Thu, 4 Aug 2011, H.J. Lu wrote:
>
>> Here is the updated patch to get proper HOST_WIDE_INT bits and 1
>> through a new file, opt-gen.c.  OK for trunk?
>
> Using another generator program like this can't be the best approach
> (apart from anything else, when built for the build system hwint.h should
> reflect the build system types not the host system types; cf.
>  where I suspected that
> sort of host/build confusion of causing a reported build failure).
>
> You want opth-gen.awk to know the number of bits to give errors.  Note
> that the errors are given by generating #error into the output file.  It's
> easy enough to generate #if conditions into the file that compare with
> HOST_BITS_PER_WIDE_INT.
>
> You want opth-gen.awk to know whether to use 1LL as the shifted constant.
> You can easily enough make hwint.h contain a HOST_WIDE_INT_1 macro,
> defined to 1L or 1LL as appropriate.
>


Here is the updated patch.  OK for trunk?

Thanks.


-- 
H.J.

2011-08-04  H.J. Lu  
Igor Zamyatin 

* hwint.h (HOST_WIDE_INT_1): New.

* opt-functions.awk (switch_bit_fields): Initialize the
host_wide_int field.
(host_wide_int_var_name): New.
(var_type_struct): Check and return HOST_WIDE_INT.

* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Properly
check masks for HOST_WIDE_INT.

* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
to HOST_WIDE_INT.

* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
with HOST_WIDE_INT for isa_flag.
(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
isa variables.

* config/i386/i386.c (ix86_target_string): Replace int with
HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
(pta_flags): Removed.
(PTA_XXX): Redefined as (1LL << X).
(pta): Use HOST_WIDE_INT on flags.
(builtin_isa): Use HOST_WIDE_INT on isa.
(ix86_add_new_builtins): Likewise.
(def_builtin): Use HOST_WIDE_INT on mask.
(def_builtin_const): Likewise.
(builtin_description): Likewise.

* config/i386/i386.opt (ix86_isa_flags): Replace int with
HOST_WIDE_INT.
(ix86_isa_flags_explicit): Likewise.
(x_ix86_isa_flags_explicit): Likewise.
2011-08-04  H.J. Lu  
	Igor Zamyatin 

	* hwint.h (HOST_WIDE_INT_1): New.

	* opt-functions.awk (switch_bit_fields): Initialize the
	host_wide_int field.
	(host_wide_int_var_name): New.
	(var_type_struct): Check and return HOST_WIDE_INT.

	* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

	* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

	* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Properly
	check masks for HOST_WIDE_INT.

	* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

	* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
	to HOST_WIDE_INT.

	* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
	with HOST_WIDE_INT for isa_flag.
	(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
	isa variables.

	* config/i386/i386.c (ix86_target_string): Replace int with
	HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
	(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
	(pta_flags): Removed.
	(PTA_XXX): Redefined as (1LL << X).
	(pta): Use HOST_WIDE_INT on flags.
	(builtin_isa): Use HOST_WIDE_INT on isa.
	(ix86_add_new_builtins): Likewise.
	(def_builtin): Use HOST_WIDE_INT on mask.
	(def_builtin_const): Likewise.
	(builtin_description): Likewise.

	* config/i386/i386.opt (ix86_isa_flags): Replace int with
	HOST_WIDE_INT.
	(ix86_isa_flags_explicit): Likewise.
	(x_ix86_isa_flags_explicit): Likewise.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 1fc333c..c5a770f 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -34,14 +34,14 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (int, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
void (*def_or_undef) (cpp_reader *, const char *));
 
 
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
-ix86_target_macros_internal (int isa_flag,
+ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 			 enum processor_type arch,
 			 enum processor_type tune,
 			 enum fpmath_unit fpmath,
@@ -301,9 +301,9 @@ ix86_pragma_targ

Re: RFC: PATCH: Require and use int64 for x86 options

2011-08-04 Thread H.J. Lu
On Thu, Aug 4, 2011 at 4:44 PM, H.J. Lu  wrote:
> On Thu, Aug 4, 2011 at 3:46 PM, Joseph S. Myers  
> wrote:
>> On Thu, 4 Aug 2011, H.J. Lu wrote:
>>
>>> Here is the updated patch to get proper HOST_WIDE_INT bits and 1
>>> through a new file, opt-gen.c.  OK for trunk?
>>
>> Using another generator program like this can't be the best approach
>> (apart from anything else, when built for the build system hwint.h should
>> reflect the build system types not the host system types; cf.
>>  where I suspected that
>> sort of host/build confusion of causing a reported build failure).
>>
>> You want opth-gen.awk to know the number of bits to give errors.  Note
>> that the errors are given by generating #error into the output file.  It's
>> easy enough to generate #if conditions into the file that compare with
>> HOST_BITS_PER_WIDE_INT.
>>
>> You want opth-gen.awk to know whether to use 1LL as the shifted constant.
>> You can easily enough make hwint.h contain a HOST_WIDE_INT_1 macro,
>> defined to 1L or 1LL as appropriate.
>>
>
>
> Here is the updated patch.  OK for trunk?
>

Small update.  Replace 1LL with HOST_WIDE_INT_1 in  PTA_XXX.
OK for trunk?

Thanks.

-- 
H.J.
---
2011-08-04  H.J. Lu  
Igor Zamyatin 

* hwint.h (HOST_WIDE_INT_1): New.

* opt-functions.awk (switch_bit_fields): Initialize the
host_wide_int field.
(host_wide_int_var_name): New.
(var_type_struct): Check and return HOST_WIDE_INT.

* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Properly
check masks for HOST_WIDE_INT.

* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
to HOST_WIDE_INT.

* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
with HOST_WIDE_INT for isa_flag.
(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
isa variables.

* config/i386/i386.c (ix86_target_string): Replace int with
HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
(pta_flags): Removed.
(PTA_XXX): Redefined as (HOST_WIDE_INT_1 << X).
(pta): Use HOST_WIDE_INT on flags.
(builtin_isa): Use HOST_WIDE_INT on isa.
(ix86_add_new_builtins): Likewise.
(def_builtin): Use HOST_WIDE_INT on mask.
(def_builtin_const): Likewise.
(builtin_description): Likewise.

* config/i386/i386.opt (ix86_isa_flags): Replace int with
HOST_WIDE_INT.
(ix86_isa_flags_explicit): Likewise.
(x_ix86_isa_flags_explicit): Likewise.
2011-08-04  H.J. Lu  
	Igor Zamyatin 

	* hwint.h (HOST_WIDE_INT_1): New.

	* opt-functions.awk (switch_bit_fields): Initialize the
	host_wide_int field.
	(host_wide_int_var_name): New.
	(var_type_struct): Check and return HOST_WIDE_INT.

	* opt-read.awk: Handle HOST_WIDE_INT for "Variable".

	* optc-save-gen.awk: Support HOST_WIDE_INT on var_target_other.

	* opth-gen.awk: Use HOST_WIDE_INT_1 on HOST_WIDE_INT.  Properly
	check masks for HOST_WIDE_INT.

	* opts-common.c (set_option): Support HOST_WIDE_INT Flag_var.

	* opts.h (cl_option): Add cl_host_wide_int.  Change var_value
	to HOST_WIDE_INT.

	* config/i386/i386-c.c (ix86_target_macros_internal): Replace int
	with HOST_WIDE_INT for isa_flag.
	(ix86_pragma_target_parse): Replace int with HOST_WIDE_INT for
	isa variables.

	* config/i386/i386.c (ix86_target_string): Replace int with
	HOST_WIDE_INT for isa.  Use HOST_WIDE_INT_PRINT to print isa.
	(ix86_target_opts): Replace int with HOST_WIDE_INT on mask.
	(pta_flags): Removed.
	(PTA_XXX): Redefined as (HOST_WIDE_INT_1 << X).
	(pta): Use HOST_WIDE_INT on flags.
	(builtin_isa): Use HOST_WIDE_INT on isa.
	(ix86_add_new_builtins): Likewise.
	(def_builtin): Use HOST_WIDE_INT on mask.
	(def_builtin_const): Likewise.
	(builtin_description): Likewise.

	* config/i386/i386.opt (ix86_isa_flags): Replace int with
	HOST_WIDE_INT.
	(ix86_isa_flags_explicit): Likewise.
	(x_ix86_isa_flags_explicit): Likewise.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 1fc333c..c5a770f 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -34,14 +34,14 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (int, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
void (*def_or_undef) (cpp_reader *, const char *));
 
 
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
-ix86_target_macros_internal (int isa_flag,
+ix86_target_macr