Ping: [PATCH] Enable bbro for -Os

2012-08-21 Thread Zhenqiang Chen
Ping.

Thanks!
-Zhenqiang

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen
> Sent: Tuesday, August 14, 2012 2:50 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Enable bbro for -Os
> 
> Hi,
> 
> Basic block reordering is disabled for -Os from gcc 4.7 since the pass
will lead
> to big code size regression. But benchmarks logs also show there are lots
of
> regression due to poor code layout compared with 4.6.
> 
> The patch is to enable bbro for -Os. When optimizing for size, it
> * avoid duplicating block.
> * keep its original order if there is no chance to fall through.
> * ignore edge frequency and probability.
> * handle predecessor first if its index is smaller to break long trace.
> * only connect Trace n with Trace n + 1 to reduce long jump.
> 
> Here are the CSiBE code size benchmark results:
> * For ARM, code size reduces 0.21%.
> * For MIPS, code size reduces 0.25%.
> * For PPC, code size reduces 0.33%.
> * For X86, code size reduces 0.22%.
> 
> The patch does not impact bbro when optimizing for speed. To verify it, I
> "objdump -d" all obj files from CSiBE (compiled with -O2) for
> ARM/MIPS/PPC/X86. The assembler with the patch is the same as it without
> the patch.
> 
> No make check regression on ARM.
> 
> Is it OK for trunk?
> 
> Thanks!
> -Zhenqiang
> 
> ChangeLog
> 2012-08-14  Zhenqiang Chen 
> 
>   * bb-reorder.c (connect_better_edge_p): New added.
>   (find_traces_1_round): When optimizing for size, ignore edge
> frequency
>   and probability, and handle all in one round.
>   (bb_to_key): Use bb->index as key for size.
>   (better_edge_p): The smaller bb index is better for size.
>   (connect_traces): Connect block n with block n + 1;
>   connect trace m with trace m + 1 if falling through.
>   (copy_bb_p): Avoid duplicating blocks.
>   (gate_handle_reorder_blocks): Enable bbro when optimizing for -Os.





Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
> On Tue, Aug 21, 2012 at 6:56 PM, Jan Hubicka  wrote:
> >> > I can go ahead with the histogram approach. There is some roundoff
> >> > error from the working set scaling approach that can affect different
> >> > merging orders as you note, although I think this only really affects the
> >> > small counter values. The other place where saving/merging the histogram
> >
> > Do you have any intuition on why simple maximalization merging (that is 
> > safe wrt
> > ordering) would be bad idea?
> 
> When you say "maximalization merging" are you talking about the
> histogram merging approach I mentioned a few emails back (my response
> on Aug 19) where we assume the same relative order of hotness in the
> counters between runs, and accumulate the counter values in the
> histogram in that order?

I was speaking of BB (counter) counts only here and considering the simplest
strategy: maximalize the BB counts in corresponding buckets and sum the 
cummlative
values in the corresponding buckets. 

The strategy of scaling is more sensible, but it will also be sensitive to order
of train runs, i.e. it will give unstable results with parallel make.
> > OK, so I guess we went through
> >  1) two pass updating with race in between pases.
> >  2) two pass updating with first pass updating countes and second having 
> > race only for summary update.
> > (i.e. no races for counters)
> >  3) two pass updating with flocking (and some way to handle detected 
> > deadlocks)
> >  4) one pass updating with histogram merging + maximalization of working 
> > set.
> > (we do not realy need to scale the buckets, we can simply merge the 
> > histograms
> >  and then mutiply them by nruns before comparing to actual counters.
> 
> By merging the histograms (and accumulating the counter values stored
> there as we merge), I don't think we need to multiply the counter
> values by nruns, do we?

With the simple approach we need, but...
> 
> >  This assumes that working sets of all runs are about the same, but 
> > should work
> >  resonably in practice I think.
> >
> > I guess 3/4 are acceptable WRT bootstrap reproducibility.
> >
> > I have no experience with flocking large number of files and portability of
> > this solution i.e.  to Windows.  If you think that 2) would be too 
> > inaccurate
> > in practice and 3) has chance to be portable, we could go for this.  It will
> > solve the precision problems and will also work for LIPO summaries.
> > I would be curious about effect on profiledbootstrap time of this if you 
> > implement
> > it.
> 
> I'm hoping that 2) will be accurate enough in practice, but it will
> need some investigation.

Perhaps weighting all pros/cons you are right here.  I think getting results
consistent across different order of runs is more important than the
possibility of race on the last update and 4) is probably getting quite a lot
of GIGO issues.

We can implement more consistent locking mechanizm incrementally if this turns
out to be issue.

It is posible to actually detect the race: at first round we are updating the
nruns. In the second round we can see if nruns has changed.  If so, we have
problem since someone has already updated the file and will update the
histograms.

I would suggest skiping update when nruns has changes, so the summary at least
match the counters in current file accurately (even though it will be off for
the whole program since whoever updated the file may have not seen all our
updates, just part of them).  Just to reduce changes that the trashed run is
the most important one. Theoretically we could also make libgcov to re-read all
counters in this case and try another update, but I would not worry about that
for the moment.

We will still have problems with bootstrap: the summary of libbackend.a will
depend on what of compiler binaries has executed last, since cc1 will have
different summary from cc1plus becuase frontend is different.  I would give
extra points to voluteer who changes libgcov to do multiple summaries for
different programs as intended originally (if you search archives in 2001-2003,
somewhere are patches that changes libgcov from not merging at all to merging
always. I guess changing it to merge just when program checksum match is not
that hard).

Thanks,
Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> David
> >>
> >>
> >>
> >> >
> >> > Thanks,
> >> > Teresa
> >> >
> >> > >
> >> >> > >>
> >> >> > >>
> >> >> > >> >  2) Do we plan to add some features in near future that will
> >> >> anyway require global locking?
> >> >> > >> > I guess LIPO itself does not count since it streams its data
> >> >> into independent file as you
> >> >> > >> > mentioned earlier and locking LIPO file is not that hard.
> >> >> > >> > Does LIPO stream everything into that common file, or does it
> >> >> use combination of gcda files
> >> >> > >> > and common summary?
> >> >> > >>
> >> >> > >> Actually, LIPO module grouping information are stored in gcda

[Patch, Fortran, committed] Free loop and gfc_ss data

2012-08-21 Thread Tobias Burnus

Committed as Rev. 190586 after successful regtesting.

That's the version I also had attached to 
http://gcc.gnu.org/ml/fortran/2012-08/msg00118.html; as written there:


"The patch is incomplete, e.g. "argss" of gfc_conv_procedure_call is not 
(or not always) freed. Ditto for rss of gfc_trans_assignment_1; ditto 
for lss and rss of gfc_trans_pointer_assignment."


Tobias
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(Revision 190585)
+++ gcc/fortran/trans-expr.c	(Arbeitskopie)
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree
   loop.to[0] = nelems;
   gfc_trans_scalarizing_loops (&loop, &loopbody);
   gfc_add_block_to_block (&body, &loop.pre);
+  gfc_cleanup_loop (&loop);
   tmp = gfc_finish_block (&body);
 }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_
   if (!expr2->value.function.isym)
 	{
 	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
 	  ss->is_alloc_lhs = 1;
 	}
   else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_
 
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
 
   return gfc_finish_block (&se.pre);
 }
Index: gcc/fortran/trans-intrinsic.c
===
--- gcc/fortran/trans-intrinsic.c	(Revision 190585)
+++ gcc/fortran/trans-intrinsic.c	(Arbeitskopie)
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *exp
   argse.descriptor_only = 1;
 
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);
 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190585)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2012-08-22  Tobias Burnus  
+
+	* trans-expr.c (gfc_copy_class_to_class,
+	gfc_trans_arrayfunc_assign): Free loop and ss data.
+	* trans-intrinsic.c (gfc_trans_arrayfunc_assign): Free ss data.
+
 2012-08-21  Tobias Burnus  
 
 	* parse.c (parse_contained): Include EXEC_END_PROCEDURE


Ping^4: Properly handle arg_pointer and frame_pointer in DWARF output

2012-08-21 Thread Richard Sandiford
(^4 because I unwittingly submitted the same patch a while back.)

Ping for H.J.'s patch to avoid using dwarf extension codes for
simple CFA addresses based on arg_pointer_rtx and frame_pointer_rtx
in cases where Pmode is wider than the DWARF address size:

http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01815.html

Using extension codes for an internal rtx like frame_pointer_rtx
(as opposed to hard_frame_pointer_rtx) can't work in general,
because there's no associated DWARF register number.  Even on
targets that provide arg_pointer_rtx, I expect debuggers would
expect the normal CFA-like addresses for arg_pointer_rtx-based
expressions.

I've had to use this locally for a few months now.  Without it
mipsisa64-elf doesn't build.

Richard


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Teresa Johnson
On Tue, Aug 21, 2012 at 6:56 PM, Jan Hubicka  wrote:
>> > I can go ahead with the histogram approach. There is some roundoff
>> > error from the working set scaling approach that can affect different
>> > merging orders as you note, although I think this only really affects the
>> > small counter values. The other place where saving/merging the histogram
>
> Do you have any intuition on why simple maximalization merging (that is safe 
> wrt
> ordering) would be bad idea?

When you say "maximalization merging" are you talking about the
histogram merging approach I mentioned a few emails back (my response
on Aug 19) where we assume the same relative order of hotness in the
counters between runs, and accumulate the counter values in the
histogram in that order?

This would be inaccurate if different runs exercised different areas
of the code, and thus the counters would be ordered in the histogram
differently.

>
> We care only about working set size around top of the histogram and I would 
> say

For optimizations that care about the boundary between hot and cold
such as code layout I think we will also care about the smaller values
in the histogram (to have a good idea of what constitutes a cold block
counter value).

> that we should sort of optimize for the largest (in the number of blocks in 
> hot
> area) of the train runs.  One way where things will get messed up is when the
> working set is about the same but the runs are of different size so all the
> blocks gets accounted into two different buckets.

I'm not sure I understand the last sentence - is this something that
would not get handled by merging the histogram entries as I described
earlier? Or it sounds like you might have a different merging approach
in mind?

>
> But in general I do not think there is resonably accurate way to merge the
> profiles without actually streaming out all counter IDs in every bucket, so 
> perhaps
> this will work well enough.  If not, we can in future introduce per-program 
> global
> summary file that will contain the counters to be merged acurately.

Sounds good.

>
>> > would help is when the distribution of counter values in the histogram
>> > varies between runs, say for example, the hottest loop is much hotter in a
>> > subsequent run, but the rest of the counter values stay largely consistent.
>> > Note, however, that if the hotspots are different in different runs, then
>> > merging either the histogram or the working set will have issues. The only
>> > way to detect this is to recompute the histogram/working set from scratch
>> > from the merged counters.
>> >
>> > I wonder in practice, even when there are a lot of simultaneous runs going
>> > like in a gcc bootstrap, if we could get reasonably accurate summary
>> > recomputation without global locking. The reason is that as long as the
>> > actual counter updates are safe as they are now due to the individual file
>> > locking, the inaccuracy in the recomputed summary information will not grow
>> > over time, and there is a reasonable chance that the last run to complete
>> > and merge will recompute the summary from the final merged counter values
>> > and get it right (or only be off by a little bit if there are a couple of
>> > runs completing simultaneously at the end). But this can be investigated as
>> > a follow on to this patch.
>> >
>>
>>
>> The main concern is probably the build reproducibility in gcc bootstrap
>> with FDO.
>
> Hmm, you mean in the first pass update every file with new counters and in 
> the second
> pass just update the summaries?

Right, that's what I had in mind (what you have described in #2 below).

> OK, so I guess we went through
>  1) two pass updating with race in between pases.
>  2) two pass updating with first pass updating countes and second having race 
> only for summary update.
> (i.e. no races for counters)
>  3) two pass updating with flocking (and some way to handle detected 
> deadlocks)
>  4) one pass updating with histogram merging + maximalization of working set.
> (we do not realy need to scale the buckets, we can simply merge the 
> histograms
>  and then mutiply them by nruns before comparing to actual counters.

By merging the histograms (and accumulating the counter values stored
there as we merge), I don't think we need to multiply the counter
values by nruns, do we?

>  This assumes that working sets of all runs are about the same, but 
> should work
>  resonably in practice I think.
>
> I guess 3/4 are acceptable WRT bootstrap reproducibility.
>
> I have no experience with flocking large number of files and portability of
> this solution i.e.  to Windows.  If you think that 2) would be too inaccurate
> in practice and 3) has chance to be portable, we could go for this.  It will
> solve the precision problems and will also work for LIPO summaries.
> I would be curious about effect on profiledbootstrap time of this if you 
> implement
> it.

I'm hoping that 2) will be accurate eno

Re: [PATCH 4/4] Reduce the size of optabs representation

2012-08-21 Thread Mike Stump
On Jul 19, 2012, at 11:24 AM, Richard Henderson wrote:
> +# genopinit produces two files.
> +insn-opinit.c insn-opinit.h: s-opinit ; @true
> +s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
> + $(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
> +   insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
> + $(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
> + $(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c
> + $(STAMP) s-opinit

Breaks my port without the attached patch...

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 67f1d66..bd31c9b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3484,7 +3484,7 @@ s-attrtab : $(MD_DEPS) build/genattrtab$(build_exeext) \
 # genopinit produces two files.
 insn-opinit.c insn-opinit.h: s-opinit ; @true
 s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
-   $(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
+   $(RUN_GEN) build/genopinit$(build_exeext) $(MD_INCS) $(md_file) \
  insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
$(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
$(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
> > I can go ahead with the histogram approach. There is some roundoff
> > error from the working set scaling approach that can affect different
> > merging orders as you note, although I think this only really affects the
> > small counter values. The other place where saving/merging the histogram

Do you have any intuition on why simple maximalization merging (that is safe wrt
ordering) would be bad idea?

We care only about working set size around top of the histogram and I would say
that we should sort of optimize for the largest (in the number of blocks in hot
area) of the train runs.  One way where things will get messed up is when the
working set is about the same but the runs are of different size so all the
blocks gets accounted into two different buckets.

But in general I do not think there is resonably accurate way to merge the
profiles without actually streaming out all counter IDs in every bucket, so 
perhaps
this will work well enough.  If not, we can in future introduce per-program 
global
summary file that will contain the counters to be merged acurately.

> > would help is when the distribution of counter values in the histogram
> > varies between runs, say for example, the hottest loop is much hotter in a
> > subsequent run, but the rest of the counter values stay largely consistent.
> > Note, however, that if the hotspots are different in different runs, then
> > merging either the histogram or the working set will have issues. The only
> > way to detect this is to recompute the histogram/working set from scratch
> > from the merged counters.
> >
> > I wonder in practice, even when there are a lot of simultaneous runs going
> > like in a gcc bootstrap, if we could get reasonably accurate summary
> > recomputation without global locking. The reason is that as long as the
> > actual counter updates are safe as they are now due to the individual file
> > locking, the inaccuracy in the recomputed summary information will not grow
> > over time, and there is a reasonable chance that the last run to complete
> > and merge will recompute the summary from the final merged counter values
> > and get it right (or only be off by a little bit if there are a couple of
> > runs completing simultaneously at the end). But this can be investigated as
> > a follow on to this patch.
> >
> 
> 
> The main concern is probably the build reproducibility in gcc bootstrap
> with FDO.

Hmm, you mean in the first pass update every file with new counters and in the 
second
pass just update the summaries?
OK, so I guess we went through
 1) two pass updating with race in between pases.
 2) two pass updating with first pass updating countes and second having race 
only for summary update.
(i.e. no races for counters)
 3) two pass updating with flocking (and some way to handle detected deadlocks)
 4) one pass updating with histogram merging + maximalization of working set.
(we do not realy need to scale the buckets, we can simply merge the 
histograms
 and then mutiply them by nruns before comparing to actual counters.
 This assumes that working sets of all runs are about the same, but should 
work
 resonably in practice I think.

I guess 3/4 are acceptable WRT bootstrap reproducibility. 

I have no experience with flocking large number of files and portability of
this solution i.e.  to Windows.  If you think that 2) would be too inaccurate
in practice and 3) has chance to be portable, we could go for this.  It will
solve the precision problems and will also work for LIPO summaries.
I would be curious about effect on profiledbootstrap time of this if you 
implement
it.

Honza
> 
> David
> 
> 
> 
> >
> > Thanks,
> > Teresa
> >
> > >
> >> > >>
> >> > >>
> >> > >> >  2) Do we plan to add some features in near future that will
> >> anyway require global locking?
> >> > >> > I guess LIPO itself does not count since it streams its data
> >> into independent file as you
> >> > >> > mentioned earlier and locking LIPO file is not that hard.
> >> > >> > Does LIPO stream everything into that common file, or does it
> >> use combination of gcda files
> >> > >> > and common summary?
> >> > >>
> >> > >> Actually, LIPO module grouping information are stored in gcda files.
> >> > >> It is also stored in a separate .imports file (one per object) ---
> >> > >> this is primarily used by our build system for dependence
> >> information.
> >> > >
> >> > > I see, getting LIPO safe WRT parallel updates will be fun. How does
> >> LIPO behave
> >> > > on GCC bootstrap?
> >> >
> >> > We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
> >> > main problem for application build -- the link time (for debug build)
> >> > is.
> >>
> >> I was primarily curious how the LIPOs runtime analysis fare in the
> >> situation where
> >> you do very many small train runs on rather large app (sure GCC is small
> >> to google's
> >> use case ;).
> >> >
> >> > > (i.e. it does a lot more work in the libgcov mo

[Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode

2012-08-21 Thread Terry Guo
Hi,

Due to the impact of ARM UAL, the Thumb1 and Thumb2 mode use LSRS
instruction while the ARM mode uses MOVS instruction. So the following case
is updated accordingly. Is it OK to trunk?

BR,
Terry

2012-08-21  Terry Guo  

* gcc.target/arm/combine-movs.c: Check movs for ARM mode 
and lsrs for other mode.

diff --git a/gcc/testsuite/gcc.target/arm/combine-movs.c
b/gcc/testsuite/gcc.target/arm/combine-movs.c
index 4209a33..fbef9df 100644
--- a/gcc/testsuite/gcc.target/arm/combine-movs.c
+++ b/gcc/testsuite/gcc.target/arm/combine-movs.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "" { arm_thumb1 } } */
 /* { dg-options "-O" }  */

 void foo (unsigned long r[], unsigned int d)
@@ -9,4 +8,5 @@ void foo (unsigned long r[], unsigned int d)
 r[i] = 0;
 }

-/* { dg-final { scan-assembler "movs\tr\[0-9\]" } } */
+/* { dg-final { scan-assembler "movs\tr\[0-9\]" { target arm_nothumb } } }
*/
+/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target { ! arm_nothumb }
} } } */





Re: Build static libgcc with hidden visibility even with --disable-shared

2012-08-21 Thread Ian Lance Taylor
On Tue, Aug 21, 2012 at 5:33 PM, Joseph S. Myers
 wrote:
>
> 2012-08-21  Joseph Myers  
>
> * Makefile.in (vis_hide, gen-hide-list): Do not make definitions
> depend on --enable-shared.
> ($(lib1asmfuncs-o)): Use %.vis files independent of
> --enable-shared.
> * static-object.mk ($(base)$(objext), $(base).vis)
> ($(base)_s$(objext)): Use same rules for visibility handling as in
> shared-object.mk.

This is OK.

Thanks.

Ian


Build static libgcc with hidden visibility even with --disable-shared

2012-08-21 Thread Joseph S. Myers
As discussed in
, it is
desirable for the libgcc build with inhibit_libc defined and
--disable-shared to be similar enough to that build without
inhibit_libc and --enable-shared to be usable to build glibc,
producing the same results as if glibc were built with a toolchain
that already included a shared libgcc and was built against previously
built glibc.  One source of differences noted there was functions in
libgcc.a being hidden only if shared libgcc is also being built.

This patch changes the logic so that the way libgcc.a is built in the
static-only case is more similar to how it is built when shared libgcc
is being built as well; in particular, libgcc symbols are generally
given hidden visibility (if supported) in the static libgcc.

Tested with cross to arm-none-linux-gnueabi that it fixes the
previously observed differences; rebuilding glibc with the second GCC
now produces identical stripped binaries to the results of building
with the first (static-only) GCC, except for the cases of nscd and
static libraries which differ between multiple glibc builds even with
identical compilers (in both cases because of embedded timestamps).
Also bootstrapped with no regressions on x86_64-unknown-linux-gnu as a
sanity check.  OK to commit?

2012-08-21  Joseph Myers  

* Makefile.in (vis_hide, gen-hide-list): Do not make definitions
depend on --enable-shared.
($(lib1asmfuncs-o)): Use %.vis files independent of
--enable-shared.
* static-object.mk ($(base)$(objext), $(base).vis)
($(base)_s$(objext)): Use same rules for visibility handling as in
shared-object.mk.

Index: libgcc/Makefile.in
===
--- libgcc/Makefile.in  (revision 190577)
+++ libgcc/Makefile.in  (working copy)
@@ -363,6 +363,7 @@
   ifneq ($(LIBUNWIND),)
 install-libunwind = install-libunwind
   endif
+endif
 
 # For -fvisibility=hidden.  We need both a -fvisibility=hidden on
 # the command line, and a #define to prevent libgcc2.h etc from
@@ -386,11 +387,8 @@
 gen-hide-list = echo > $@
 endif
 
-else
-# Not enable_shared.
+ifneq ($(enable_shared),yes)
 iterator = $(srcdir)/empty.mk $(patsubst 
%,$(srcdir)/static-object.mk,$(iter-items))
-vis_hide =
-gen-hide-list = echo > \$@
 endif
 
 LIB2ADD += enable-execute-stack.c
@@ -439,7 +437,6 @@
   $(LIB2_DIVMOD_FUNCS))
 
 # Build "libgcc1" (assembly) components.
-ifeq ($(enable_shared),yes)
 
 lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC) %.vis
@@ -451,15 +448,10 @@
 lib1asmfuncs-s-o = $(patsubst %,%_s$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-s-o): %_s$(objext): $(srcdir)/config/$(LIB1ASMSRC)
$(gcc_s_compile) -DL$* -xassembler-with-cpp -c $<
+ifeq ($(enable_shared),yes)
+
 libgcc-s-objects += $(lib1asmfuncs-s-o)
 
-else
-
-lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
-$(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC)
-   $(gcc_compile) -DL$* -xassembler-with-cpp -c $<
-libgcc-objects += $(lib1asmfuncs-o)
-
 endif
 
 # Build lib2funcs.  For the static library also include LIB2FUNCS_ST.
Index: libgcc/static-object.mk
===
--- libgcc/static-object.mk (revision 190577)
+++ libgcc/static-object.mk (working copy)
@@ -24,7 +24,13 @@
 endif
 endif
 
-$(base)$(objext): $o
-   $(gcc_compile) -c -xassembler-with-cpp $<
+$(base)$(objext): $o $(base).vis
+   $(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
 
+$(base).vis: $(base)_s$(objext)
+   $(gen-hide-list)
+
+$(base)_s$(objext): $o
+   $(gcc_s_compile) -c -xassembler-with-cpp $<
+
 endif

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Combine location with block using block_locations

2012-08-21 Thread Dehao Chen
On Tue, Aug 21, 2012 at 6:25 AM, Richard Guenther
 wrote:
> On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen  wrote:
>> ping
>
> Conceptually I like the change.  Can a libcpp maintainer please have a 2nd
> look?
>
> Dehao, did you do any compile-time and memory-usage benchmarks?

I don't have a memory benchmarks at hand. But I've tested it through
some huge apps, each of which takes more than 1 hour to build on a
modern machine. None of them had observed noticeable memory footprint
and compile time increase.

Thanks,
Dehao

>
> Thanks,
> Richard.
>
>> Thanks,
>> Dehao
>>
>> On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen  wrote:
>>> Hi, Dodji,
>>>
>>> Thanks for the review. I've fixed all the addressed issues. I'm
>>> attaching the related changes:
>>>
>>> Thanks,
>>> Dehao
>>>
>>> libcpp/ChangeLog:
>>> 2012-08-01  Dehao Chen  
>>>
>>> * include/line-map.h (MAX_SOURCE_LOCATION): New value.
>>> (location_adhoc_data_init): New.
>>> (location_adhoc_data_fini): New.
>>> (get_combined_adhoc_loc): New.
>>> (get_data_from_adhoc_loc): New.
>>> (get_location_from_adhoc_loc): New.
>>> (COMBINE_LOCATION_DATA): New.
>>> (IS_ADHOC_LOC): New.
>>> (expanded_location): New field.
>>> * line-map.c (location_adhoc_data): New.
>>> (location_adhoc_data_htab): New.
>>> (curr_adhoc_loc): New.
>>> (location_adhoc_data): New.
>>> (allocated_location_adhoc_data): New.
>>> (location_adhoc_data_hash): New.
>>> (location_adhoc_data_eq): New.
>>> (location_adhoc_data_update): New.
>>> (get_combined_adhoc_loc): New.
>>> (get_data_from_adhoc_loc): New.
>>> (get_location_from_adhoc_loc): New.
>>> (location_adhoc_data_init): New.
>>> (location_adhoc_data_fini): New.
>>> (linemap_lookup): Change to use new location.
>>> (linemap_ordinary_map_lookup): Likewise.
>>> (linemap_macro_map_lookup): Likewise.
>>> (linemap_macro_map_loc_to_def_point): Likewise.
>>> (linemap_macro_map_loc_unwind_toward_spel): Likewise.
>>> (linemap_get_expansion_line): Likewise.
>>> (linemap_get_expansion_filename): Likewise.
>>> (linemap_location_in_system_header_p): Likewise.
>>> (linemap_location_from_macro_expansion_p): Likewise.
>>> (linemap_macro_loc_to_spelling_point): Likewise.
>>> (linemap_macro_loc_to_def_point): Likewise.
>>> (linemap_macro_loc_to_exp_point): Likewise.
>>> (linemap_resolve_location): Likewise.
>>> (linemap_unwind_toward_expansion): Likewise.
>>> (linemap_unwind_to_first_non_reserved_loc): Likewise.
>>> (linemap_expand_location): Likewise.
>>> (linemap_dump_location): Likewise.
>>>
>>> Index: libcpp/line-map.c
>>> ===
>>> --- libcpp/line-map.c   (revision 190209)
>>> +++ libcpp/line-map.c   (working copy)
>>> @@ -25,6 +25,7 @@
>>>  #include "line-map.h"
>>>  #include "cpplib.h"
>>>  #include "internal.h"
>>> +#include "hashtab.h"
>>>
>>>  static void trace_include (const struct line_maps *, const struct line_map 
>>> *);
>>>  static const struct line_map * linemap_ordinary_map_lookup (struct 
>>> line_maps *,
>>> @@ -50,6 +51,135 @@
>>>  extern unsigned num_expanded_macros_counter;
>>>  extern unsigned num_macro_tokens_counter;
>>>
>>> +/* Data structure to associate an arbitrary data to a source location.  */
>>> +struct location_adhoc_data {
>>> +  source_location locus;
>>> +  void *data;
>>> +};
>>> +
>>> +/* The following data structure encodes a location with some adhoc data
>>> +   and maps it to a new unsigned integer (called an adhoc location)
>>> +   that replaces the original location to represent the mapping.
>>> +
>>> +   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
>>> +   highest bit is 1, then the number is adhoc_loc. Otherwise, it serves as
>>> +   the original location. Once identified as the adhoc_loc, the lower 31
>>> +   bits of the integer is used to index the location_adhoc_data array,
>>> +   in which the locus and associated data is stored.  */
>>> +
>>> +static htab_t location_adhoc_data_htab;
>>> +static source_location curr_adhoc_loc;
>>> +static struct location_adhoc_data *location_adhoc_data;
>>> +static unsigned int allocated_location_adhoc_data;
>>> +
>>> +/* Hash function for location_adhoc_data hashtable.  */
>>> +
>>> +static hashval_t
>>> +location_adhoc_data_hash (const void *l)
>>> +{
>>> +  const struct location_adhoc_data *lb =
>>> +  (const struct location_adhoc_data *) l;
>>> +  return (hashval_t) lb->locus + (size_t) &lb->data;
>>> +}
>>> +
>>> +/* Compare function for location_adhoc_data hashtable.  */
>>> +
>>> +static int
>>> +location_adhoc_data_eq (const void *l1, const void *l2)
>>> +{
>>> +  const struct location_adhoc_data *lb1 =
>>> +  (const struct location_adhoc_data *) l1;
>>> +  const 

Re: [SH] PR 39423 - Add support for SH2A movu.w insn

2012-08-21 Thread Kaz Kojima
Oleg Endo  wrote:
> This adds support for SH2A's movu.w insn for memory addressing cases as
> described in the PR.
> Tested on rev 190546 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz


Re: [SH] Use more multi-line asm outputs

2012-08-21 Thread Kaz Kojima
Oleg Endo  wrote:
> This mainly converts the asm outputs to multi-line strings and uses tab
> chars instead of '\\t' in the asm strings, in the hope to make stuff
> easier to read and a bit more consistent.
> Tested on rev 190546 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz


Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines

2012-08-21 Thread Michael Hope
On 17 August 2012 07:29, Julian Brown  wrote:
> On Thu, 16 Aug 2012 19:56:52 +0100
> Ramana Radhakrishnan  wrote:
>
>> On 07/24/12 13:27, Julian Brown wrote:
>> > On Fri, 20 Jul 2012 11:15:27 +0100
>> > Julian Brown  wrote:
>> >
>> >> Anyway: this revised version of the patch removes the strange
>> >> libgcc Makefile-fragment changes, the equivalent of which have
>> >> since been incorporated into mainline GCC now anyway, so the patch
>> >> is somewhat more straightforward than it was previously.
>> >
>> > Joey Ye contacted me offlist and suggested that the t-divmod-ef
>> > fragment might be better integrated into t-bpabi instead. Doing that
>> > makes the patch somewhat smaller/cleaner.
>> >
>> > Minimally re-tested, looks OK.
>>
>> The original submission makes no mention of testing ? The ARM
>> specific portions look OK to me modulo no regressions.
>
> Thanks -- I'm sure I did test the patch, but just omitted to mention
> that fact in the mail :-O. We've also been carrying a version of this
> patch in our local source base for many years now.

Hi Julian.  The test case fails on arm-linux-gnueabi:
 http://gcc.gnu.org/ml/gcc-testresults/2012-08/msg02100.html

FAIL: gcc.target/arm/div64-unwinding.c execution test

The test aborts as &_Unwind_RaiseException is not null.  _divdi3.o
itself looks fine and no longer pulls in the unwinder so I assume
something else in the environment is.  I've put the binaries up at
http://people.linaro.org/~michaelh/incoming/div64-unwinding.exe and
http://people.linaro.org/~michaelh/incoming/_divdi3.o if that helps.

-- Michael


[Patch, Fortran, committed] free gfc_code of EXEC_END_PROCEDURE

2012-08-21 Thread Tobias Burnus
Background: There is currently a memory leakage cleanup in the middle 
end – and fixing PR 54332 would probably have been also easier without 
FE leaks.


I think we should join in an try to remove some leakage - and try to not 
introduce new ones.


* * *

Committed: For EXPR_END_PROCEDURE, I have committed one fix as obvious 
(-> parse.c). However, I have a test case where parse_contained still 
leaks memory, possibly another, similar patch is needed in addition.


* * *

There are also plenty of leaks related to the freeing of gfc_ss. I 
attached a draft patch (trans-expr.c, trans-intrinsics.c), which is 
probably okay, but not yet regtested.


OK with a changelog (and if it regtested)?

Note: The patch is incomplete, e.g. "argss" of gfc_conv_procedure_call 
is not (or not always) freed. Ditto for rss of gfc_trans_assignment_1; 
ditto for lss and rss of gfc_trans_pointer_assignment



* * * * * * * * * * * *

Additionally, there is a memory leak when generating more than one 
procedure per TU: The memory which is allocated but not freed 
gfc_generate_function_code -> (generate_coarray_init or 
trans_function_start) -> init_function_start -> prepare_function_start 
-> init_emit


The memory should be feed via (backend_init_target or 
lang_dependent_init_target) -> expand_dummy_function_end -> 
free_after_compilation


The latter seems to operate on "cfun" – hence, it only frees the last 
"cfun" and not all.


However, despite some longer debugging (e.g. using a main program, which 
calls create_main_function), I failed to find the problem.


* * *

And module.c can also leak plenty of memory ...

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190574)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-08-21  Tobias Burnus  
+
+	* parse.c (parse_contained): Include EXEC_END_PROCEDURE
+	in ns->code to make sure the gfc_code is freed.
+
 2012-08-20  Tobias Burnus  
 
 	PR fortran/54301
Index: gcc/fortran/parse.c
===
--- gcc/fortran/parse.c	(Revision 190574)
+++ gcc/fortran/parse.c	(Arbeitskopie)
@@ -4075,6 +4075,7 @@ parse_contained (int module)
 	case ST_END_PROGRAM:
 	case ST_END_SUBROUTINE:
 	  accept_statement (st);
+	  gfc_current_ns->code = s1.head;
 	  break;
 
 	default:
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 4f7d026..cfb0862 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems)
   loop.to[0] = nelems;
   gfc_trans_scalarizing_loops (&loop, &loopbody);
   gfc_add_block_to_block (&body, &loop.pre);
+  gfc_cleanup_loop (&loop);
   tmp = gfc_finish_block (&body);
 }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
   if (!expr2->value.function.isym)
 	{
 	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
 	  ss->is_alloc_lhs = 1;
 	}
   else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
 
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
 
   return gfc_finish_block (&se.pre);
 }
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index fac29c7..d0aebe9 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *expr)
   argse.descriptor_only = 1;
 
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);
 


Re: [PATCH] Allow dg-skip-if to use compiler flags specified through set_board_info cflags

2012-08-21 Thread Mike Stump
On Aug 11, 2012, at 10:39 AM, Senthil Kumar Selvaraj wrote:
> This patch allows cflags set in board config files using 
> "set_board_info cflags" to be used in the selectors of
> dg-skip-if and other dejagnu commands that use the check-flags
> proc.

Ok.


Another merge from gcc 4.7 branch to gccgo branch

2012-08-21 Thread Ian Lance Taylor
I merged revision 190574 from the gcc 4.7 branch to the gccgo branch.

Ian


PATCH: PR middle-end/54332: [4.8 Regression] 481.wrf in SPEC CPU 2006 takes > 10GB memory to compile

2012-08-21 Thread H.J. Lu
Hi,

This patch restores df_free_collection_rec call inside the insn traversal
loop and removes the stack allocation check in vec_reserve.  It has
been approved in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332#c25

It has been tested on Linux/x86-64 and checked in.

Thanks.


H.J.
---
2012-08-21  H.J. Lu  

PR middle-end/54332
* df-scan.c (df_bb_verify): Restore df_free_collection_rec call
inside the insn traversal loop.

* vec.h (vec_reserve): Remove the stack allocation check.

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 55492fa..df90365 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -4448,6 +4448,7 @@ df_bb_verify (basic_block bb)
   if (!INSN_P (insn))
 continue;
   df_insn_refs_verify (&collection_rec, bb, insn, true);
+  df_free_collection_rec (&collection_rec);
 }
 
   /* Do the artificial defs and uses.  */
diff --git a/gcc/vec.h b/gcc/vec.h
index 5fdb859..1922616 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1099,21 +1099,9 @@ vec_reserve (vec_t *vec_, int reserve MEM_STAT_DECL)
  sizeof (T), false
  PASS_MEM_STAT);
   else
-{
-  /* Only allow stack vectors when re-growing them.  The initial
-allocation of stack vectors must be done with the
-VEC_stack_alloc macro, because it uses alloca() for the
-allocation.  */
-  if (vec_ == NULL)
-   {
- fprintf (stderr, "Stack vectors must be initially allocated "
-  "with VEC_stack_alloc.\n");
- gcc_unreachable ();
-   }
-  return (vec_t *) vec_stack_o_reserve (vec_, reserve,
-  offsetof (vec_t, vec),
-  sizeof (T) PASS_MEM_STAT);
-}
+return (vec_t *) vec_stack_o_reserve (vec_, reserve,
+offsetof (vec_t, vec),
+sizeof (T) PASS_MEM_STAT);
 }
 
 


libgcc patch committed: Increase non-split stack space

2012-08-21 Thread Ian Lance Taylor
When a -fsplit-stack function calls a non-split-stack function, the gold
linker automatically redirects the call to __morestack to call
__morestack_non_split instead.  I wrote __morestack_non_split to always
allocate at least 0x4000 bytes.  However, that was unclear thinking;
0x4000 bytes is sufficient for calling into libc, but it is not
sufficient for calling a general function.  This value leads to stack
overruns in ordinary code.  The default thread stack size on x86 and
x86_64 is 0x80 bytes.  This patch significantly increases the stack
size allocated for non-split code, to less than the default but still
larger, 0x10 bytes.

Probably the program should have a way to control this, but I'm not yet
sure what the right API would be for that.  In any case the default
should be larger.

Bootstrapped and ran Go testsuite and split-stack tests on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian


2012-08-21  Ian Lance Taylor  

* config/i386/morestack.S (__morestack_non_split): Increase amount
of space allocated for non-split code stack.


Index: config/i386/morestack.S
===
--- config/i386/morestack.S	(revision 190572)
+++ config/i386/morestack.S	(working copy)
@@ -83,6 +83,9 @@
 #endif
 
 
+# The amount of space we ask for when calling non-split-stack code.
+#define NON_SPLIT_STACK 0x10
+
 # This entry point is for split-stack code which calls non-split-stack
 # code.  When the linker sees this case, it converts the call to
 # __morestack to call __morestack_non_split instead.  We just bump the
@@ -109,7 +112,7 @@ __morestack_non_split:
 
 	movl	%esp,%eax		# Current stack,
 	subl	8(%esp),%eax		# less required stack frame size,
-	subl	$0x4000,%eax		# less space for non-split code.
+	subl	$NON_SPLIT_STACK,%eax	# less space for non-split code.
 	cmpl	%gs:0x30,%eax		# See if we have enough space.
 	jb	2f			# Get more space if we need it.
 
@@ -171,7 +174,8 @@ __morestack_non_split:
 
 	.cfi_adjust_cfa_offset -4	# Account for popped register.
 
-	addl	$0x5000+BACKOFF,4(%esp)	# Increment space we request.
+	# Increment space we request.
+	addl	$NON_SPLIT_STACK+0x1000+BACKOFF,4(%esp)
 
 	# Fall through into morestack.
 
@@ -186,7 +190,7 @@ __morestack_non_split:
 
 	movq	%rsp,%rax		# Current stack,
 	subq	%r10,%rax		# less required stack frame size,
-	subq	$0x4000,%rax		# less space for non-split code.
+	subq	$NON_SPLIT_STACK,%rax	# less space for non-split code.
 
 #ifdef __LP64__
 	cmpq	%fs:0x70,%rax		# See if we have enough space.
@@ -219,7 +223,8 @@ __morestack_non_split:
 
 	.cfi_adjust_cfa_offset -8	# Adjust for popped register.
 
-	addq	$0x5000+BACKOFF,%r10	# Increment space we request.
+	# Increment space we request.
+	addq	$NON_SPLIT_STACK+0x1000+BACKOFF,%r10
 
 	# Fall through into morestack.
 


[PATCH] Fix some leaks and one uninitialized var read

2012-08-21 Thread Jakub Jelinek
Hi!

The recent change in find_assert_locations from XCNEWVEC to XNEWVEC
caused a valgrind warning, because bb_rpo[ENTRY_BLOCK] used to
be accessed, but was never initialized.

Fixed by ignoring edges from ENTRY_BLOCK altogether.

The rest are a couple of memory leak fixes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-08-21  Jakub Jelinek  

* tree-vrp.c (find_assert_locations): Skip also edges
from the entry block.

* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Call
free_stmt_vec_info on orig_cond after gsi_removing it.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Always
free body_cost_vec vector.
(vect_analyze_data_refs): If gather is unsuccessful,
free_data_ref (dr).
* tree-inline.c (tree_function_versioning): Free
old_transforms_to_apply vector.

--- gcc/tree-vrp.c.jj   2012-08-20 20:56:01.0 +0200
+++ gcc/tree-vrp.c  2012-08-21 12:15:32.501753048 +0200
@@ -5596,7 +5596,7 @@ find_assert_locations (void)
  FOR_EACH_EDGE (e, ei, bb->preds)
{
  int pred = e->src->index;
- if (e->flags & EDGE_DFS_BACK)
+ if ((e->flags & EDGE_DFS_BACK) || pred == ENTRY_BLOCK)
continue;
 
  if (!live[pred])
--- gcc/tree-vect-loop-manip.c.jj   2012-08-15 10:55:24.0 +0200
+++ gcc/tree-vect-loop-manip.c  2012-08-21 15:01:02.600750196 +0200
@@ -788,6 +788,7 @@ slpeel_make_loop_iterate_ntimes (struct
 
   /* Remove old loop exit test:  */
   gsi_remove (&loop_cond_gsi, true);
+  free_stmt_vec_info (orig_cond);
 
   loop_loc = find_loop_location (loop);
   if (dump_file && (dump_flags & TDF_DETAILS))
--- gcc/tree-vect-data-refs.c.jj2012-08-20 11:09:45.0 +0200
+++ gcc/tree-vect-data-refs.c   2012-08-21 16:32:13.631428796 +0200
@@ -1934,10 +1934,9 @@ vect_enhance_data_refs_alignment (loop_v
  gcc_assert (stat);
   return stat;
 }
-  else
-   VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 }
 
+  VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 
   /* (2) Versioning to force alignment.  */
 
@@ -3313,6 +3312,8 @@ vect_analyze_data_refs (loop_vec_info lo
gather = false;
  if (!gather)
{
+ STMT_VINFO_DATA_REF (stmt_info) = NULL;
+ free_data_ref (dr);
  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
{
  fprintf (vect_dump,
--- gcc/tree-inline.c.jj2012-08-15 10:55:33.0 +0200
+++ gcc/tree-inline.c   2012-08-21 17:28:24.181069515 +0200
@@ -5089,6 +5089,7 @@ tree_function_versioning (tree old_decl,
   VEC_index (ipa_opt_pass,
  old_transforms_to_apply,
  i));
+  VEC_free (ipa_opt_pass, heap, old_transforms_to_apply);
 }
 
   id.copy_decl = copy_decl_no_change;

Jakub


Re: [PATCH, MIPS] fix MIPS16 jump table overflow

2012-08-21 Thread Richard Sandiford
Sandra Loosemore  writes:
> In config/mips/mips.h, there is presently this comment:
>
> /* ??? 16-bit offsets can overflow in large functions.  */
> #define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS
>
> A while ago we had a bug report where a big switch statement did, in 
> fact, overflow the range of 16-bit offsets, causing a runtime error.
>
> GCC already has generic machinery to shorten offset tables for switch 
> statements that does the necessary range checking, but it only works 
> with "casesi", not the lower-level "tablejump" expansion.  So, this 
> patch provides a "casesi" expander to handle this situation.

Nice.

> This patch has been in use on our local 4.5 and 4.6 branches for about a 
> year now.  When testing it on mainline, I found it tripped over the 
> recent change to add MIPS16 branch overflow checking in other 
> situations, causing it to get into an infinite loop.  I think telling it 
> to ignore these new jump insns it doesn't know how to process is the 
> right thing to do, but I'm not sure if there's a better way to restrict 
> the condition or make mips16_split_long_branches more robust.  Richard,
> since that's your code I assume you'll suggest an alternative if this 
> doesn't meet with your approval.

Changing it to:

if (JUMP_P (insn)
&& USEFUL_INSN_P (insn)
&& get_attr_length (insn) > 8
&& (any_condjump_p (insn) || any_uncond_jump_p (insn)))

should be OK.

> @@ -5937,6 +5933,91 @@
>[(set_attr "type" "jump")
> (set_attr "mode" "none")])
>  
> +;; For MIPS16, we don't know whether a given jump table will use short or
> +;; word-sized offsets until late in compilation, when we are able to 
> determine
> +;; the sizes of the insns which comprise the containing function.  This
> +;; necessitates the use of the casesi rather than the tablejump pattern, 
> since
> +;; the latter tries to calculate the index of the offset to jump through 
> early
> +;; in compilation, i.e. at expand time, when nothing is known about the
> +;; eventual function layout.
> +
> +(define_expand "casesi"
> +  [(match_operand:SI 0 "register_operand" ""); index to jump on
> +   (match_operand:SI 1 "const_int_operand" "")   ; lower bound
> +   (match_operand:SI 2 "const_int_operand" "")   ; total range
> +   (match_operand:SI 3 "" ""); table label
> +   (match_operand:SI 4 "" "")]   ; out of range label

The last two are Pmode rather than SImode.  Since there aren't different
case* patterns for different Pmodes, we can't use :P instead, so let's just
drop the modes on 4 and 5.

Would be nice to add a compile test for -mabi=64 just to make sure
that Pmode == DImode works.  A copy of an existing test like
code-readable-1.c would be fine.

> +(define_insn "casesi_internal_mips16"
> +  [(set (pc)
> + (if_then_else
> +   (leu (match_operand:SI 0 "register_operand" "d")
> + (match_operand:SI 1 "arith_operand" "dI"))
> +   (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4))
> + (label_ref (match_operand 2 "" ""
> +   (label_ref (match_operand 3 "" ""
> +   (clobber (match_scratch:SI 4 "=&d"))
> +   (clobber (match_scratch:SI 5 "=d"))
> +   (clobber (reg:SI MIPS16_T_REGNUM))
> +   (use (label_ref (match_dup 2)))]

Although this is descriptive, the MEM is probably more trouble
than it's worth.  A hard-coded MEM like this will alias a lot
of things, whereas we're only reading from the function itself.
I think an unspec would be better.

This pattern should have :P for operands 4 and 5, with the pattern
name becoming:

"casesi_internal_mips16_"

PMODE_INSN should make it easy to wrap up the difference.

There shouldn't be any need for the final USE.  Let me know
if you found otherwise, because that sounds like a bug.

> +  "TARGET_MIPS16_SHORT_JUMP_TABLES"
> +{
> +  rtx diff_vec = PATTERN (next_real_insn (operands[2]));
> +
> +  gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
> +  
> +  output_asm_insn ("sltu\t%0, %1", operands);
> +  output_asm_insn ("bteqz\t%3", operands);
> +  output_asm_insn ("la\t%4, %2", operands);
> +  
> +  switch (GET_MODE (diff_vec))
> +{
> +case HImode:
> +  output_asm_insn ("sll\t%5, %0, 1", operands);
> +  output_asm_insn ("addu\t%5, %4, %5", operands);
> +  output_asm_insn ("lh\t%5, 0(%5)", operands);
> +  break;
> +
> +case SImode:
> +  output_asm_insn ("sll\t%5, %0, 2", operands);
> +  output_asm_insn ("addu\t%5, %4, %5", operands);
> +  output_asm_insn ("lw\t%5, 0(%5)", operands);
> +  break;
> +
> +default:
> +  gcc_unreachable ();
> +}
> +  
> +  output_asm_insn ("addu\t%4, %4, %5", operands);
> +  
> +  return "j\t%4";
> +}
> +  [(set_attr "length" "32")])

The "addu"s here ought to be "addu"s after the :P change.

I think we can avoid the earlyclobber on operand 4 by moving the LA
after the SLL.

> +#define CASE_VECTOR_MODE ptr_mode
> +
> +

[PATCH] fix wrong-code bug for -fstrict-volatile-bitfields

2012-08-21 Thread Sandra Loosemore
This patch is a followup to the addition of support for 
-fstrict-volatile-bitfields (required by the ARM EABI); see this thread


http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01889.html

for discussion of the original patch.

That patch only addressed the behavior when extracting the value of a 
volatile bit field, but the same problems affect storing values into a 
volatile bit field (or a field of a packed structure, which is 
effectively implemented as a bit field).  This patch makes the code for 
bitfield stores mirror that for bitfield loads.


Although the fix is in target-independent code, it's really for ARM; 
hence the test case, which (without this patch) generates wrong code. 
Code to determine the access width was correctly preserving the 
user-specified width, but was incorrectly falling through to code that 
assumes word mode.


As well as regression testing on arm-none-eabi, I've bootstrapped and 
regression-tested this patch on x86_64 Linux.  Earlier versions of this 
patch have also been present in our local 4.5 and 4.6 GCC trees for some 
time, so it's been well-tested on a variety of other platforms.  OK to 
check in on mainline?


-Sandra


2012-08-21  Paul Brook  
Joseph Myers 
Sandra Loosemore  

gcc/
* expr.h (store_bit_field): Add packedp parameter to prototype.
* expmed.c (store_bit_field, store_bit_field_1): Add packedp
parameter.  Adjust all callers.
(warn_misaligned_bitfield): New function, split from
extract_fixed_bit_field.
(store_fixed_bit_field): Add packedp parameter.  Fix wrong-code
behavior for the combination of misaligned bitfield and
-fstrict-volatile-bitfields.  Use warn_misaligned_bitfield.
(extract_fixed_bit_field): Use warn_misaligned_bitfield.
* expr.c: Adjust calls to store_bit_field.
(expand_assignment): Identify accesses to packed structures.
(store_field): Add packedp parameter.  Adjust callers.
* calls.c: Adjust calls to store_bit_field.
* ifcvt.c: Likewise.
* config/s390/s390.c: Likewise.

gcc/testsuite/
* gcc.target/arm/volatile-bitfields-5.c: New test case.

Index: gcc/expr.h
===
--- gcc/expr.h	(revision 190541)
+++ gcc/expr.h	(working copy)
@@ -693,7 +693,7 @@ extern void store_bit_field (rtx, unsign
 			 unsigned HOST_WIDE_INT,
 			 unsigned HOST_WIDE_INT,
 			 unsigned HOST_WIDE_INT,
-			 enum machine_mode, rtx);
+			 bool, enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			  unsigned HOST_WIDE_INT, int, bool, rtx,
 			  enum machine_mode, enum machine_mode);
Index: gcc/expmed.c
===
--- gcc/expmed.c	(revision 190541)
+++ gcc/expmed.c	(working copy)
@@ -50,7 +50,7 @@ static void store_fixed_bit_field (rtx, 
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
-   rtx);
+   rtx, bool);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
@@ -406,7 +406,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 		   unsigned HOST_WIDE_INT bitnum,
 		   unsigned HOST_WIDE_INT bitregion_start,
 		   unsigned HOST_WIDE_INT bitregion_end,
-		   enum machine_mode fieldmode,
+		   bool packedp, enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -638,7 +638,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, new_bitsize,
   bitnum + bit_offset,
   bitregion_start, bitregion_end,
-  word_mode,
+  false, word_mode,
   value_word, fallback_p))
 	{
 	  delete_insns_since (last);
@@ -859,7 +859,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
  bitregion_start, bitregion_end,
- fieldmode, orig_value, false))
+ false, fieldmode, orig_value, false))
 	{
 	  emit_move_insn (xop0, tempreg);
 	  return true;
@@ -872,7 +872,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 return false;
 
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_start, bitregion_end, value, packedp);
   return true;
 }
 
@@ -885,6 +885,8 @@ store_bit_field_1 (rtx str_rtx, unsigned
These two fields are 0, if the C++ memory model does not apply,
or we are not interested in keeping track of bitfield regions.
 
+   PACKEDP is true for fields with the packed attribute.
+
FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
@@ -892,6 +894,7 @@ store_bit_field (rtx str_rtx, unsigned H
 		 unsigned HOST_WIDE_INT bitnum,
 		 unsigned HOST_WIDE_INT bitregion_start,
 		 unsigned HOST_WIDE_INT bitregion_end,
+		 bool packedp,
 		 enum ma

Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Kenneth Zadeck  writes:
> I named it this way CASE_CONST_SCALAR_INTEGER because i need to 
> introduce in the next patch a predicate that looks like
>
> /* Predicate yielding true iff X is an rtx for a integer const.  */
> #if TARGET_SUPPORTS_WIDE_INT == 1
> #define CONST_INTEGER_P(X) \
>(CONST_INT_P (X) || CONST_WIDE_INT_P (X))
> #else
> #define CONST_INTEGER_P(X) \
>(CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X))
> #endif
>
> for all of the rtxs that represent an integer.  And this name was 
> consistent with that.   It may be that you have a suggestion for the 
> name of predicate as well

Good guess.

> but it seemed to make more sense to have the 
> rtxs be consistent rather than rtx/mode consistent.

Yeah, I think CONST_SCALAR_INT_P would be better here too.  "INTEGER"
just isn't distinct enough from "INT" for the difference to be obvious.
It also doesn't indicate that complex integers and vector integers
are excluded.  SCALAR_INT seems a bit more precise, as well as
having precedent.

BTW, the "== 1" above looks redundant.

Richard


[patch] two more bitmap obstacks

2012-08-21 Thread Steven Bosscher
Hello,

Two more bitmap obstacks, this time in tree-ssa-coalesce.c.

The advantage isn't so much in having the bitmaps on the non-default
obstack, but more in that the bitmaps can be free'ed all at once by
simply releasing the obstack.

Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven


more_obs.diff
Description: Binary data


[lra] patch to remove -flra option

2012-08-21 Thread Vladimir Makarov
The following patch mostly removes -flra option by defining a 
machine-dependent hook lra_p.  If the hook returns true, LRA is used.  
Otherwise, reload pass is used.  By default the hook returns false.  It 
returns true for 8 targets, lra was ported (i386, rs6000, arm, s390, 
ia64, sparc, mips, pa).


The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 190564.

2012-08-21  Vladimir Makarov  

* targhooks.h (default_lra_p): Declare.
* targhooks.c (default_lra_p): New function.
* target.def (lra_p): New hook.
* ira.h (ira_use_lra_p): New external.
* ira.c (ira_init_once, ira_init, ira_finish_once): Call
lra_start_once, lra_init, lra_finish_once in anyway.
(ira_setup_eliminable_regset, setup_reg_renumber): Use 
ira_use_lra_p instead of

flra_lra.
(ira_use_lra_p): Define.
(ira): Set up ira_use_lra_p.  Use ira_use_lra_p instead of
flra_lra.
* dwarf2out.c: Add ira.h.
(based_loc_descr, compute_frame_pointer_to_fb_displacement): Use
ira_use_lra_p instead of ira_use_lra_p.
* rtlanal.c (simplify_subreg_regno): Add comments.
* Makefile.in (dwarf2out.c): Add dependence ira.h.
* doc/passes.texi: Change LRA pass description.
* doc/tm.texi.in: Add TARGET_LRA_P.
* doc/tm.texi: Update.
* doc/invoke.texi: Remove -flra option.
* common.opt: Remove flra option.  Add description for
flra-reg-spill.
* reginfo.c (allocate_reg_info): Fix a comment typo.
* config/arm/arm.c (TARGET_LRA_P): Define.
(arm_lra_p): New function.
* config/sparc/sparc.c (TARGET_LRA_P): Define.
(sparc_lra_p): New function.
* config/s390/s390.c (TARGET_LRA_P): Define.
(s390_lra_p): New function.
* config/i386/i386.c (TARGET_LRA_P): Define.
(ix86_lra_p): New function.
* config/rs6000/rs6000.c (TARGET_LRA_P): Define.
(rs6000_lra_p): New function.
* config/mips/mips.c (TARGET_LRA_P): Define.
(mips_lra_p): New function.
* config/pa/pa.c (TARGET_LRA_P): Define.
(pa_lra_p): New function.
* config/ia64/ia64.c (TARGET_LRA_P): Define.
(ia64_lra_p): New function.

Index: targhooks.c
===
--- targhooks.c	(revision 190448)
+++ targhooks.c	(working copy)
@@ -840,6 +840,12 @@ default_branch_target_register_class (vo
   return NO_REGS;
 }
 
+extern bool
+default_lra_p (void)
+{
+  return false;
+}
+
 int
 default_register_bank (int hard_regno ATTRIBUTE_UNUSED)
 {
Index: target.def
===
--- target.def	(revision 190448)
+++ target.def	(working copy)
@@ -2332,6 +2332,16 @@ DEFHOOK
  tree, (tree type, tree expr),
  hook_tree_tree_tree_null)
 
+/* Return true if we use LRA instead of reload.  */
+DEFHOOK
+(lra_p,
+ "A target hook which returns true if we use LRA instead of reload pass.\
+  It means that LRA was ported to the target.\
+  \
+  The default version of this target hook returns always false.",
+ bool, (void),
+ default_lra_p)
+
 /* Return register bank of given hard regno for the current target.  */
 DEFHOOK
 (register_bank,
Index: targhooks.h
===
--- targhooks.h	(revision 190448)
+++ targhooks.h	(working copy)
@@ -132,6 +132,7 @@ extern rtx default_static_chain (const_t
 extern void default_trampoline_init (rtx, tree, rtx);
 extern int default_return_pops_args (tree, tree, int);
 extern reg_class_t default_branch_target_register_class (void);
+extern bool default_lra_p (void);
 extern int default_register_bank (int);
 extern bool default_different_addr_displacement_p (void);
 extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
Index: rtlanal.c
===
--- rtlanal.c	(revision 190448)
+++ rtlanal.c	(working copy)
@@ -3501,6 +3501,7 @@ simplify_subreg_regno (unsigned int xreg
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
   && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
   && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+  /* We can use mode change in LRA for some transformations.  */
   && ! lra_in_progress)
 return -1;
 #endif
@@ -3511,6 +3512,8 @@ simplify_subreg_regno (unsigned int xreg
 return -1;
 
   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
+  /* We should convert arg register in LRA after the elimination
+	 if it is possible.  */
   && xregno == ARG_POINTER_REGNUM
   && ! lra_in_progress)
 return -1;
Index: ira.c
===
--- ira.c	(revision 190448)
+++ ira.c	(working copy)
@@ -1634,8 +1634,7 @@ void
 ira_init_once (void)
 {
   ira_init_costs_once ();
-  if (flag_lra)
-lra_init_once ();
+  lra_init_once ();
 }
 
 /* Free ira_max_register_move_cost, ira_ma

Re: C++ PATCH for c++/51675 (more constexpr unions)

2012-08-21 Thread H.J. Lu
On Wed, Feb 8, 2012 at 1:23 AM, Jason Merrill  wrote:
> More traffic on PR 51675 demonstrates that my earlier patch didn't fix the
> whole problem.  This patch improves handling of user-defined constructors.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54341

-- 
H.J.


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-21 Thread Sriraman Tallam
Committed after making the changes.

One small problem, I am not sure how to fix this:

The hyper link I referenced is :
http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html#X86-Built_002din-Functions

whereas the committed changes.html is pointing to:
http://gcc.gnu.org/onlinedocs/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions

Please note that the "_002din" is missing. This makes the link broken,
did I miss anything? I verified that I submitted the right link.

Thanks,
-Sri.

On Tue, Aug 21, 2012 at 5:41 AM, Diego Novillo  wrote:
> On 2012-08-20 22:41 , Sriraman Tallam wrote:
>>
>> Hi Gerald / Diego,
>>
>>  I have made all the mentioned changes.  I also shortened the
>> description like Diego mentioned by removing all the strings but kept
>> the caveats. I have not added a reference to the documentation because
>> i do not know what link to reference. The builtins are completely
>> documented in extend.texi.
>
>
> Referring to the user's manual is OK, I think.
>
>> +Caveat: If these built-in functions are called before any static
>> +constructors are invoked, like during IFUNC initialization, then the
>> CPU
>> +detection initialization must be explicity run using this newly
>> provided
>
>
> s/explicity/explicitly/
>
> Other than that, it looks fine to me.
>
>
> Diego.


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Xinliang David Li
On Tue, Aug 21, 2012 at 12:34 AM, Jan Hubicka  wrote:
>> Teresa has done some tunings for the unroller so far. The inliner
>> tuning is the next step.
>>
>> >
>> > What concerns me that it is greatly inaccurate - you have no idea how many
>> > instructions given counter is guarding and it can differ quite a lot. Also
>> > inlining/optimization makes working sets significantly different (by 
>> > factor of
>> > 100 for tramp3d).
>>
>> The pre ipa-inline working set is the one that is needed for ipa
>> inliner tuning. For post-ipa inline code increase transformations,
>> some update is probably needed.
>>
>> >But on the ohter hand any solution at this level will be
>> > greatly inaccurate. So I am curious how reliable data you can get from 
>> > this?
>> > How you take this into account for the heuristics?
>>
>> This effort is just the first step to allow good heuristics to develop.
>>
>> >
>> > It seems to me that for this use perhaps the simple logic in histogram 
>> > merging
>> > maximizing the number of BBs for given bucket will work well?  It is
>> > inaccurate, but we are working with greatly inaccurate data anyway.
>> > Except for degenerated cases, the small and unimportant runs will have 
>> > small BB
>> > counts, while large runs will have larger counts and those are ones we 
>> > optimize
>> > for anyway.
>>
>> The working set curve for each type of applications contains lots of
>> information that can be mined. The inaccuracy can also be mitigated by
>> more data 'calibration'.
>
> Sure, I think I am leaning towards trying the solution 2) with maximizing
> counter count merging (probably it would make sense to rename it from BB count
> since it is not really BB count and thus it is misleading) and we will see how
> well it works in practice.
>
> We have benefits of much fewer issues with profile locking/unlocking and we
> lose bit of precision on BB counts. I tend to believe that the error will not
> be that important in practice. Another loss is more histogram streaming into
> each gcda file, but with skiping zero entries it should not be major overhead
> problem I hope.
>
> What do you think?
>>
>> >>
>> >>
>> >> >  2) Do we plan to add some features in near future that will anyway 
>> >> > require global locking?
>> >> > I guess LIPO itself does not count since it streams its data into 
>> >> > independent file as you
>> >> > mentioned earlier and locking LIPO file is not that hard.
>> >> > Does LIPO stream everything into that common file, or does it use 
>> >> > combination of gcda files
>> >> > and common summary?
>> >>
>> >> Actually, LIPO module grouping information are stored in gcda files.
>> >> It is also stored in a separate .imports file (one per object) ---
>> >> this is primarily used by our build system for dependence information.
>> >
>> > I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
>> > behave
>> > on GCC bootstrap?
>>
>> We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
>> main problem for application build -- the link time (for debug build)
>> is.
>
> I was primarily curious how the LIPOs runtime analysis fare in the situation 
> where
> you do very many small train runs on rather large app (sure GCC is small to 
> google's
> use case ;).


There will be race, but as Teresa mentioned, there is a big chance
that the process which finishes the merge the last is also t the final
overrider of the LIPO summary data.


>>
>> > (i.e. it does a lot more work in the libgcov module per each
>> > invocation, so I am curious if it is practically useful at all).
>> >
>> > With LTO based solution a lot can be probably pushed at link time? Before
>> > actual GCC starts from the linker plugin, LIPO module can read gcov CFGs 
>> > from
>> > gcda files and do all the merging/updating/CFG constructions that is 
>> > currently
>> > performed at runtime, right?
>>
>> The dynamic cgraph build and analysis is still done at runtime.
>> However, with the new implementation, FE is no longer involved. Gcc
>> driver is modified to understand module grouping, and lto is used to
>> merge the streamed output from aux modules.
>
> I see. Are there any fundamental reasons why it can not be done at link-time
> when all gcda files are available?

For build parallelism, the decision should be made as early as
possible -- that is what makes LIPO 'light'.

> Why the grouping is not done inside linker
> plugin?

It is not delayed into link time. In fact linker plugin is not even involved.

David


>
> Honza
>>
>>
>> David


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck
I am certainly not going to check it in if there are any issues with the 
patch.   However, this was basically a trivial lexicographical cleanup, 
and if no one has any comments on it after a reasonable amount of time, 
then i do feel this is ok.Obviously if anyone has any comments. that 
is completely a different issue.


I named it this way CASE_CONST_SCALAR_INTEGER because i need to 
introduce in the next patch a predicate that looks like


/* Predicate yielding true iff X is an rtx for a integer const.  */
#if TARGET_SUPPORTS_WIDE_INT == 1
#define CONST_INTEGER_P(X) \
  (CONST_INT_P (X) || CONST_WIDE_INT_P (X))
#else
#define CONST_INTEGER_P(X) \
  (CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X))
#endif

for all of the rtxs that represent an integer.  And this name was 
consistent with that.   It may be that you have a suggestion for the 
name of predicate as well but it seemed to make more sense to have the 
rtxs be consistent rather than rtx/mode consistent.


kenny

On 08/21/2012 12:56 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

I plan to commit this in a few days unless someone has some comments.
This is a mostly trivial patch and the changes from that are Richard
Sandiford's and he is an rtl maintainer.

Please don't do this.  Patches need to be sent for review in their
final form.  Obviously, having got this far with the patch, you're free
to beat me up if I don't review it. :-)

Anyway, please do call it CASE_CONST_SCALAR_INT rather than
CASE_CONST_SCALAR_INTEGER.  Like I said in my original mail,
CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc.,
and (as I didn't say) it'd be better not to have two spellings
of the same thing.


diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c   2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c  2012-08-20 15:43:34.659362244 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)

switch (code)
  {
-case CONST_INT:
  case CONST:
  case LABEL_REF:
  case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
  case CLOBBER:
return 0;

@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
  {
  case LABEL_REF:
  case SYMBOL_REF:
-case CONST_INT:
  case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
  case PC:
  case ADDR_VEC:
  case ADDR_DIFF_VEC:

These were supposed to be CASE_CONST_ANY.  The omission of CONST_FIXED
looks like an oversight.


switch (code)
  {
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

This was suppsoed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.


+/* Match CONST_*s for which pointer equality corresponds to value
+equality.  */

Should be:

/* Match CONST_*s for which pointer equality corresponds to value equality.  */

(probably an artefact of my work mailer, sorry)


+
+
+

Rather a lot of whitespace there.  One line seems enough, since we're
just before the definition of CONST_INT_P.

OK with those changes, thanks.

Richard




Re: [Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts

2012-08-21 Thread Mike Stump
On Aug 21, 2012, at 4:32 AM, Georg-Johann Lay wrote:
> The patch breaks up convert.c in parts so that an AVR ATmega103 device
> with 128KiB for executable code (.text + .data + .rodata) can run them.
> 
> Ok for trunk?

Ok, but watch out for any comments from the fixed-point or the C front-end 
folks.


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck

it would have been tough without the second snippit
On 08/21/2012 01:02 PM, Richard Sandiford wrote:

Richard Sandiford  writes:

switch (code)
  {
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

This was suppsoed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.

Sorry, snipped the all-important:


--- gccBaseline/gcc/loop-invariant.c2012-07-22 16:55:01.239982968 -0400
+++ gccWCase/gcc/loop-invariant.c   2012-08-20 16:02:30.013430970 -0400
@@ -203,9 +203,7 @@ check_maybe_invariant (rtx x)

Richard




Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Richard Sandiford  writes:
>>switch (code)
>>  {
>> -case CONST_INT:
>> -case CONST_DOUBLE:
>> -case CONST_FIXED:
>> +CASE_CONST_UNIQUE:
>>  case SYMBOL_REF:
>>  case CONST:
>>  case LABEL_REF:
>
> This was suppsoed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
> looks like an oversight.

Sorry, snipped the all-important:

> --- gccBaseline/gcc/loop-invariant.c  2012-07-22 16:55:01.239982968 -0400
> +++ gccWCase/gcc/loop-invariant.c 2012-08-20 16:02:30.013430970 -0400
> @@ -203,9 +203,7 @@ check_maybe_invariant (rtx x)

Richard


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Kenneth Zadeck  writes:
> I plan to commit this in a few days unless someone has some comments.   
> This is a mostly trivial patch and the changes from that are Richard 
> Sandiford's and he is an rtl maintainer.

Please don't do this.  Patches need to be sent for review in their
final form.  Obviously, having got this far with the patch, you're free
to beat me up if I don't review it. :-)

Anyway, please do call it CASE_CONST_SCALAR_INT rather than
CASE_CONST_SCALAR_INTEGER.  Like I said in my original mail,
CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc.,
and (as I didn't say) it'd be better not to have two spellings
of the same thing.

> diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
> --- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400
> +++ gccWCase/gcc/combine.c2012-08-20 15:43:34.659362244 -0400
> @@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
> 
>switch (code)
>  {
> -case CONST_INT:
>  case CONST:
>  case LABEL_REF:
>  case SYMBOL_REF:
> -case CONST_DOUBLE:
> -case CONST_VECTOR:
> +CASE_CONST_UNIQUE:
>  case CLOBBER:
>return 0;
> 
> @@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
>  {
>  case LABEL_REF:
>  case SYMBOL_REF:
> -case CONST_INT:
>  case CONST:
> -case CONST_DOUBLE:
> -case CONST_VECTOR:
> +CASE_CONST_UNIQUE:
>  case PC:
>  case ADDR_VEC:
>  case ADDR_DIFF_VEC:

These were supposed to be CASE_CONST_ANY.  The omission of CONST_FIXED
looks like an oversight.

>switch (code)
>  {
> -case CONST_INT:
> -case CONST_DOUBLE:
> -case CONST_FIXED:
> +CASE_CONST_UNIQUE:
>  case SYMBOL_REF:
>  case CONST:
>  case LABEL_REF:

This was suppsoed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.

> +/* Match CONST_*s for which pointer equality corresponds to value 
> +equality.  */

Should be:

/* Match CONST_*s for which pointer equality corresponds to value equality.  */

(probably an artefact of my work mailer, sorry)

> +
> +
> +

Rather a lot of whitespace there.  One line seems enough, since we're
just before the definition of CONST_INT_P.

OK with those changes, thanks.

Richard


Re: [Patch,AVR] PR54222: Add fixed point support

2012-08-21 Thread Denis Chertykov
2012/8/13 Georg-Johann Lay :
> Denis Chertykov wrote:
>> 2012/8/11 Georg-Johann Lay :
>>> Weddington, Eric schrieb:
> From: Georg-Johann Lay
>
>
> The first step would be to bisect and find the patch that lead to
> PR53923.  It was not a change in the avr BE, so the question goes
> to the authors of the respective patch.
>
> Up to now I didn't even try to bisect; that would take years on the
> host that I have available...
>
>> My only real concern is that this is a major feature addition and
>> the AVR port is currently broken.
>
> I don't know if it's the avr port or some parts of the middle end that
> don't cooperate with avr.

 I would really, really love to see fixed point support added in,
 especially since I know that Sean has worked on it for quite a while,
 and you've also done a lot of work in getting the patches in shape to
 get them committed.

 But, if the AVR port is currently broken (by whomever, and whatever
 patch) and a major feature like this can't be tested to make sure it
 doesn't break anything else in the AVR backend, then I'm hesitant to
 approve (even though I really want to approve).
>>>
>>> I don't understand enough of DF to fix PR53923.  The insn that leads
>>> to the ICE is (in df-problems.c:dead_debug_insert_temp):
>>>
>>
>> Today I have updated GCC svn tree and successfully compiled avr-gcc.
>> The libgcc2-mulsc3.c from   also compiled without bugs.
>>
>> Denis.
>>
>> PS: May be I'm doing something wrong ? (I had too long vacations)
>
> I am configuring with --target=avr --disable-nls --with-dwarf2
> --enable-languages=c,c++ --enable-target-optspace=yes 
> --enable-checking=yes,rtl
>
> Build GCC is "gcc version 4.3.2".
> Build and host are i686-pc-linux-gnu.
>
> Maybe it's different on a 64-bit computer, but I only have 32-bit host.
>

I have debugging PR53923 and on my opinion it's not an AVR port bug.
Please commit fixed point support.

Denis.
PS: sorry for delay


Merge from gcc 4.7 branch to gccgo branch

2012-08-21 Thread Ian Lance Taylor
I've merged gcc 4.7 branch revision 190560 to the gccgo branch.

Ian


Re: [google/gcc-4_7] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS

2012-08-21 Thread 沈涵
Hi Jing, the crosstool test passed. You can start the review, thanks! -Han

On Wed, Aug 15, 2012 at 3:11 PM, Han Shen(沈涵)  wrote:
> Hi Jing, ping?
>
> On Mon, Aug 13, 2012 at 10:58 AM, Han Shen(沈涵)  wrote:
>> Hi, the google/gcc-4_7 fails to linking anything (on x86-generic), by
>> looking into specs file, it seems that 'link_emulation' section is
>> missing in specs.
>>
>> The problem is in config/i386/linux.h, SUBTARGET_EXTRA_SPECS (which is
>> not empty for chrome x86-generic) is overridden by
>> "LINUX_GRTE_EXTRA_SPECS".
>>
>> My fix is to prepend LINUX_GRTE_EXTRA_SPECS to SUBTARGET_EXTRA_SPECS in 
>> linux.h
>>
>> Jing, could you take a look at this?
>>
>> --
>> Han Shen
>>
>> 2012-08-13 Han Shen  
>> * gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS): Compute
>> new value of LINUX_GRTE_EXTRA_SPECS by pre-pending LINUX_GRTE_EXTRA_SPECS
>> to its origin value.
>> * gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS_STR): Add
>> new MACRO to hold value of SUBTARET_EXTRA_SPECS so that
>> SUBTARET_EXTRA_SPECS could be replaced later in gnu-user.h
>>
>> --- a/gcc/config/i386/gnu-user.h
>> +++ b/gcc/config/i386/gnu-user.h
>> @@ -92,11 +92,14 @@ along with GCC; see the file COPYING3.  If not see
>>  #define ASM_SPEC \
>>"--32 %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"
>>
>> -#undef  SUBTARGET_EXTRA_SPECS
>> -#define SUBTARGET_EXTRA_SPECS \
>> +#undef  SUBTARGET_EXTRA_SPECS_STR
>> +#define SUBTARGET_EXTRA_SPECS_STR \
>>{ "link_emulation", GNU_USER_LINK_EMULATION },\
>>{ "dynamic_linker", GNU_USER_DYNAMIC_LINKER }
>>
>> +#undef  SUBTARGET_EXTRA_SPECS
>> +#define SUBTARGET_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR
>> +
>>  #undef LINK_SPEC
>>  #define LINK_SPEC "-m %(link_emulation) %{shared:-shared} \
>>%{!shared: \
>> --- a/gcc/config/i386/linux.h
>> +++ b/gcc/config/i386/linux.h
>> @@ -32,5 +32,11 @@ along with GCC; see the file COPYING3.  If not see
>>  #endif
>>
>>  #undef  SUBTARGET_EXTRA_SPECS
>> +#ifndef SUBTARGET_EXTRA_SPECS_STR
>>  #define SUBTARGET_EXTRA_SPECS \
>>LINUX_GRTE_EXTRA_SPECS
>> +#else
>> +#define SUBTARGET_EXTRA_SPECS \
>> +  LINUX_GRTE_EXTRA_SPECS \
>> +  SUBTARGET_EXTRA_SPECS_STR
>> +#endif
>
>
>
> --
> Han Shen |  Software Engineer |  shen...@google.com |  +1-650-440-3330



-- 
Han Shen |  Software Engineer |  shen...@google.com |  +1-650-440-3330


Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-21 Thread Joseph S. Myers
On Tue, 21 Aug 2012, Simon Baldwin wrote:

> Index: gcc/doc/options.texi
> ===
> --- gcc/doc/options.texi  (revision 190535)
> +++ gcc/doc/options.texi  (working copy)
> @@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl
>  specify several different languages.  Each @var{language} must have
>  been declared by an earlier @code{Language} record.  @xref{Option file
>  format}.
> +
> +@item NoDWARFRecord
> +The option is added to the list of those omitted from the producer string
> +written by @option{-grecord-gcc-switches}.

Remove "added to the list of those" (which seems unnecessarily verbose).

> +@item @samp{nodwarfrecord}
> +Display only those options that are marked for addition to the list of
> +options omitted from @option{-grecord-gcc-switches}.

I don't think there's any need for special --help support for options with 
this flag; this flag is really an implementation detail.  (Thus, I think 
all the opts.c changes are unnecessary.)

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [AARCH64] [PATCH 2/3] AArch64 Port

2012-08-21 Thread Sofiane Naci
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Joseph S. Myers
> Sent: 25 May 2012 15:24
> To: Marcus Shawcroft
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [AARCH64] [PATCH 2/3] AArch64 Port
> 
> On Fri, 25 May 2012, Marcus Shawcroft wrote:
> 
> > Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x
> > ===
> > --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x   (revision
> 0)
> > +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x   (revision
> 0)
> > @@ -0,0 +1,5 @@
> > +if { [istarget aarch64_be-*-*] } then {
> > +   return 1
> > +}
> > +
> > +return 0
> 
> This isn't a suitable way of enabling a test only for one endianness,
> since a test may be run with -mbig-endian or -mlittle-endian with a
> compiler defaulting to the other endianness.  You need to test an
> effective-target keyword instead.
> 
> > Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x
> > ===
> > --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x   (revision
> 0)
> > +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x   (revision
> 0)
> > @@ -0,0 +1,5 @@
> > +if { [istarget aarch64_be-*-*] } then {
> > +   return 1
> > +}
> > +
> > +return 0
> 
> Likewise.

Thanks. This is now fixed in:

r190482 | sofiane | 2012-08-17 16:02:20 +0100 (Fri, 17 Aug 2012) | 9 lines
[AArch64] Use effective-target to check for big endian

Sofiane





PATCH: PR target/54347: REAL_VALUE_TO_TARGET_LONG_DOUBLE shouldn't be used in i386

2012-08-21 Thread H.J. Lu
Hi,

long double may not be 80-bit on i386.  We can't use
REAL_VALUE_TO_TARGET_LONG_DOUBLE for XFmode.  This patch replaces
REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target.  OK to install?

Thanks.

H.J.
---
2012-08-21  H.J. Lu  

PR target/54347
* config/i386/i386.c (ix86_split_to_parts): Replace
REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5da4da2..a6fc45b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20743,7 +20743,9 @@ ix86_split_to_parts (rtx operand, rtx *parts, enum 
machine_mode mode)
  parts[2] = gen_int_mode (l[2], SImode);
  break;
case XFmode:
- REAL_VALUE_TO_TARGET_LONG_DOUBLE (r, l);
+ /* We can't use REAL_VALUE_TO_TARGET_LONG_DOUBLE since
+long double may not be 80-bit.  */
+ real_to_target (l, &r, mode);
  parts[2] = gen_int_mode (l[2], SImode);
  break;
case DFmode:


RE: [AARCH64] [PATCH 1/3] AArch64 Port

2012-08-21 Thread Sofiane Naci
Hi,

Thanks for the feedback. I respond here to the remaining issues:

> > Index: gcc/doc/extend.texi
> > ===
> > --- gcc/doc/extend.texi (revision 187870)
> > +++ gcc/doc/extend.texi (working copy)
> > @@ -935,7 +935,8 @@
> >
> >  Not all targets support additional floating point types.
> @code{__float80}
> >  and @code{__float128} types are supported on i386, x86_64 and ia64
> targets.
> > -The @code{__float128} type is supported on hppa HP-UX targets.
> > +The @code{__float128} type is supported on hppa HP-UX targets and
> ARM AArch64
> > +targets.
> 
> I don't see any good reason to support it on AArch64, since it's the
> same as "long double" there.  (It's on PA HP-UX as a workaround for
> libquadmath requiring the type rather than being able to with with a
> type called either "long double" or "__float128" - libquadmath being
> used on PA HP-UX as a workaround for the system libm lacking much long
> double support.  But that shouldn't be an issue for new targets such
> as AArch64 GNU/Linux.  And my understanding from N1582 is that the C
> bindings for IEEE 754-2008, being worked on for a five-part ISO/IEC
> TS, are expected to use names such as _Float128, not __float128, as
> standard names for supported IEEE floating-point types.)

Support for __float128 has been removed.

Fixed in:
r189655 | sofiane | 2012-07-19 13:24:57 +0100 (Thu, 19 Jul 2012) | 19 lines
[AArch64] Remove __float128 support.

 
> > +@opindex mbig-endian
> > +Generate big-endian code. This is the default when GCC is configured
> for an
> > +@samp{aarch64*be-*-*} target.
> 
> In general, throughout Texinfo changes, two spaces after "." at the
> end of a sentence.
> 
> > +@item -march=@var{name}
> > +@opindex march
> > +Specify the name of the target architecture, optionally suffixed by
> one or
> > +more feature modifiers. This option has the form
> > +@samp{-march=[+[no]]}, where the only value for
> @samp{}
> > +is @samp{armv8}, and the possible values for @samp{} are
> > +@samp{crypto}, @samp{fp}, @samp{simd}.
> 
> It's unfortunate that you've chosen this complicated syntax that means
> the generic support for enumerated option arguments cannot be used
> (and so --help information cannot list supported CPUs and features).
> A simpler syntax where -march takes just an architecture name and
> features have separate options would seem better, and more in line
> with most other architectures supported by GCC.
> 
> There are several Texinfo problems above.  Instead of  you
> should use @var{feature}, and since the '[' and ']' are not literal
> text they should be inside @r{} - the proper way of writing
> @samp{-march=[+[no]]} would be
> @option{-march=@var{arch}@r{[}+@r{[}no@r{]}@var{feature}@r{]}}.
> 
> Also, could you document what the feature names mean?

Documentation formatting has been fixed to conform to the required styling.
Also the documentation has been updated to clarify ambiguous parts or add
missing ones.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
> > +@item -mcpu=@var{name}
> > +@opindex mcpu
> > +Specify the name of the target processor, optionally suffixed by one
> or more
> > +feature modifiers. This option has the form @samp{-
> cpu=[+[no]]},
> > +where the possible values for @samp{} are @samp{generic},
> @samp{large},
> > +and the possible values for @samp{} are @samp{crypto},
> @samp{fp},
> > +@samp{simd}.
> 
> Same comments apply.

Same as above.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
> > +This option is very similar to the -mcpu= option, except that
> instead of
> 
> @option{-mcpu=}.  And does -mtune= take feature names or just plain CPU
> names?

Same as above.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
> > +   if (mvn == 0)
> > + {
> > +   if (widthc != 'd')
> > + sprintf (templ,"movi\t%%0.%d%c, %%1, lsl %d ",(64/width),
> > +   widthc, shift);
> > +   else
> > + sprintf (templ,"movi\t%%d0, %%1");
> > + }
> > +   else
> > + sprintf (templ,"mvni\t%%0.%d%c, %%1, lsl %d",(64/width),
> > +   widthc, shift);
> 
> Presumably you have some logic for why the 40-byte buffer size is
> enough, but could you use snprintf with sizeof (templ) specified in
> the call to protect against any mistakes in that logic?  Also, spaces
> after commas and around the "/" in the division, and the second line
> in the function call should be lined up immediately after the opening
> '(', not further right.  (Check for and fix all these issues elsewhere
> in the port as well; I've just pointed out a representative instance
> of them.)

sprinsf has been replaced with snprintf and sizeof (tem

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Andi Kleen
> The issue here is holding lock for all the files (that can be many) versus
> number of locks limits & possibilities for deadlocking (mind that updating
> may happen in different orders on the same files for different programs built
> from same objects)

lockf typically has a deadlock detector, and will error out.

-Andi


[PATCH][4.7] Backport recent heap leak fixes

2012-08-21 Thread Richard Guenther

This backports the obvious heap leak fixes that have accumulated sofar.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-21  Richard Guenther  

Backport from mainline
2012-08-16  Richard Guenther  

PR middle-end/54146
* tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
exit vector.
* ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
* cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
* tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
* tree-ssa-loop-im.c (analyze_memory_references): Adjust.
(tree_ssa_lim_finalize): Free all mem_refs.
* tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
scc when bailing out.
* modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
* ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
* graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
FOR_EACH_LOOP_BREAK.

2012-08-17  Richard Guenther  

* tree-sra.c (modify_function): Free redirect_callers vector.
* ipa-split.c (split_function): Free args_to_pass vector.
* tree-vect-stmts.c (vectorizable_operation): Do not pre-allocate
vec_oprnds.
(new_stmt_vec_info): Do not pre-allocate STMT_VINFO_SAME_ALIGN_REFS.
* tree-vect-slp.c (vect_free_slp_instance): Free the instance.
(vect_analyze_slp_instance): Free everything.
(destroy_bb_vec_info): Free the SLP instances.

2012-08-17  Richard Guenther  
 
* params.def (integer-share-limit): Decrease from 256 to 251,
add rationale.

2012-08-21  Richard Guenther  
 
* tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
the affine expansion cache.

Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 190560)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -2290,7 +2290,10 @@ find_loop_niter_by_eval (struct loop *lo
   /* Loops with multiple exits are expensive to handle and less important.  */
   if (!flag_expensive_optimizations
   && VEC_length (edge, exits) > 1)
-return chrec_dont_know;
+{
+  VEC_free (edge, heap, exits);
+  return chrec_dont_know;
+}
 
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
 {
Index: gcc/ipa-pure-const.c
===
--- gcc/ipa-pure-const.c(revision 190560)
+++ gcc/ipa-pure-const.c(working copy)
@@ -803,7 +803,7 @@ end:
if (dump_file)
  fprintf (dump_file, "can not prove finiteness of loop 
%i\n", loop->num);
l->looping =true;
-   break;
+   FOR_EACH_LOOP_BREAK (li);
  }
  scev_finalize ();
}
Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 190560)
+++ gcc/ipa-split.c (working copy)
@@ -1239,6 +1239,7 @@ split_function (struct split_point *spli
   }
   call = gimple_build_call_vec (node->decl, args_to_pass);
   gimple_set_block (call, DECL_INITIAL (current_function_decl));
+  VEC_free (tree, heap, args_to_pass);
 
   /* We avoid address being taken on any variable used by split part,
  so return slot optimization is always possible.  Moreover this is
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 190560)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -3229,6 +3229,7 @@ scop_ivs_can_be_represented (scop_p scop
   loop_iterator li;
   loop_p loop;
   gimple_stmt_iterator psi;
+  bool result = true;
 
   FOR_EACH_LOOP (li, loop, 0)
 {
@@ -3244,11 +3245,16 @@ scop_ivs_can_be_represented (scop_p scop
 
  if (TYPE_UNSIGNED (type)
  && TYPE_PRECISION (type) >= TYPE_PRECISION 
(long_long_integer_type_node))
-   return false;
+   {
+ result = false;
+ break;
+   }
}
+  if (!result)
+   FOR_EACH_LOOP_BREAK (li);
 }
 
-  return true;
+  return result;
 }
 
 /* Builds the polyhedral representation for a SESE region.  */
Index: gcc/cfgloop.h
===
--- gcc/cfgloop.h   (revision 190560)
+++ gcc/cfgloop.h   (working copy)
@@ -629,7 +629,7 @@ fel_init (loop_iterator *li, loop_p *loo
 
 #define FOR_EACH_LOOP_BREAK(LI) \
   { \
-VEC_free (int, heap, (LI)->to_visit); \
+VEC_free (int, heap, (LI).to_visit); \
 break; \
   }
 
Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 190560)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -3859,9 +3859,11 @@ handle_lhs_call (gimple stmt, tree lhs,
 

Re: [PATCH] Combine location with block using block_locations

2012-08-21 Thread Richard Guenther
On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen  wrote:
> ping

Conceptually I like the change.  Can a libcpp maintainer please have a 2nd
look?

Dehao, did you do any compile-time and memory-usage benchmarks?

Thanks,
Richard.

> Thanks,
> Dehao
>
> On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen  wrote:
>> Hi, Dodji,
>>
>> Thanks for the review. I've fixed all the addressed issues. I'm
>> attaching the related changes:
>>
>> Thanks,
>> Dehao
>>
>> libcpp/ChangeLog:
>> 2012-08-01  Dehao Chen  
>>
>> * include/line-map.h (MAX_SOURCE_LOCATION): New value.
>> (location_adhoc_data_init): New.
>> (location_adhoc_data_fini): New.
>> (get_combined_adhoc_loc): New.
>> (get_data_from_adhoc_loc): New.
>> (get_location_from_adhoc_loc): New.
>> (COMBINE_LOCATION_DATA): New.
>> (IS_ADHOC_LOC): New.
>> (expanded_location): New field.
>> * line-map.c (location_adhoc_data): New.
>> (location_adhoc_data_htab): New.
>> (curr_adhoc_loc): New.
>> (location_adhoc_data): New.
>> (allocated_location_adhoc_data): New.
>> (location_adhoc_data_hash): New.
>> (location_adhoc_data_eq): New.
>> (location_adhoc_data_update): New.
>> (get_combined_adhoc_loc): New.
>> (get_data_from_adhoc_loc): New.
>> (get_location_from_adhoc_loc): New.
>> (location_adhoc_data_init): New.
>> (location_adhoc_data_fini): New.
>> (linemap_lookup): Change to use new location.
>> (linemap_ordinary_map_lookup): Likewise.
>> (linemap_macro_map_lookup): Likewise.
>> (linemap_macro_map_loc_to_def_point): Likewise.
>> (linemap_macro_map_loc_unwind_toward_spel): Likewise.
>> (linemap_get_expansion_line): Likewise.
>> (linemap_get_expansion_filename): Likewise.
>> (linemap_location_in_system_header_p): Likewise.
>> (linemap_location_from_macro_expansion_p): Likewise.
>> (linemap_macro_loc_to_spelling_point): Likewise.
>> (linemap_macro_loc_to_def_point): Likewise.
>> (linemap_macro_loc_to_exp_point): Likewise.
>> (linemap_resolve_location): Likewise.
>> (linemap_unwind_toward_expansion): Likewise.
>> (linemap_unwind_to_first_non_reserved_loc): Likewise.
>> (linemap_expand_location): Likewise.
>> (linemap_dump_location): Likewise.
>>
>> Index: libcpp/line-map.c
>> ===
>> --- libcpp/line-map.c   (revision 190209)
>> +++ libcpp/line-map.c   (working copy)
>> @@ -25,6 +25,7 @@
>>  #include "line-map.h"
>>  #include "cpplib.h"
>>  #include "internal.h"
>> +#include "hashtab.h"
>>
>>  static void trace_include (const struct line_maps *, const struct line_map 
>> *);
>>  static const struct line_map * linemap_ordinary_map_lookup (struct 
>> line_maps *,
>> @@ -50,6 +51,135 @@
>>  extern unsigned num_expanded_macros_counter;
>>  extern unsigned num_macro_tokens_counter;
>>
>> +/* Data structure to associate an arbitrary data to a source location.  */
>> +struct location_adhoc_data {
>> +  source_location locus;
>> +  void *data;
>> +};
>> +
>> +/* The following data structure encodes a location with some adhoc data
>> +   and maps it to a new unsigned integer (called an adhoc location)
>> +   that replaces the original location to represent the mapping.
>> +
>> +   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
>> +   highest bit is 1, then the number is adhoc_loc. Otherwise, it serves as
>> +   the original location. Once identified as the adhoc_loc, the lower 31
>> +   bits of the integer is used to index the location_adhoc_data array,
>> +   in which the locus and associated data is stored.  */
>> +
>> +static htab_t location_adhoc_data_htab;
>> +static source_location curr_adhoc_loc;
>> +static struct location_adhoc_data *location_adhoc_data;
>> +static unsigned int allocated_location_adhoc_data;
>> +
>> +/* Hash function for location_adhoc_data hashtable.  */
>> +
>> +static hashval_t
>> +location_adhoc_data_hash (const void *l)
>> +{
>> +  const struct location_adhoc_data *lb =
>> +  (const struct location_adhoc_data *) l;
>> +  return (hashval_t) lb->locus + (size_t) &lb->data;
>> +}
>> +
>> +/* Compare function for location_adhoc_data hashtable.  */
>> +
>> +static int
>> +location_adhoc_data_eq (const void *l1, const void *l2)
>> +{
>> +  const struct location_adhoc_data *lb1 =
>> +  (const struct location_adhoc_data *) l1;
>> +  const struct location_adhoc_data *lb2 =
>> +  (const struct location_adhoc_data *) l2;
>> +  return lb1->locus == lb2->locus && lb1->data == lb2->data;
>> +}
>> +
>> +/* Update the hashtable when location_adhoc_data is reallocated.  */
>> +
>> +static int
>> +location_adhoc_data_update (void **slot, void *data)
>> +{
>> +  *((char **) slot) += ((char *) location_adhoc_data - (char *) data);
>> +  return 1;
>> +}
>> +
>> +/* Combine LOC

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-21 Thread Diego Novillo

On 2012-08-20 22:41 , Sriraman Tallam wrote:

Hi Gerald / Diego,

 I have made all the mentioned changes.  I also shortened the
description like Diego mentioned by removing all the strings but kept
the caveats. I have not added a reference to the documentation because
i do not know what link to reference. The builtins are completely
documented in extend.texi.


Referring to the user's manual is OK, I think.


+Caveat: If these built-in functions are called before any static
+constructors are invoked, like during IFUNC initialization, then the CPU
+detection initialization must be explicity run using this newly provided


s/explicity/explicitly/

Other than that, it looks fine to me.


Diego.


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck
Now that I have had a chance to talk to Richard, I have now done 
everything that he requested in his email.


Here is the new patch and changelog.   Everything was tested on x86-64.

2012-08-21  Kenneth Zadeck 

* alias.c (rtx_equal_for_memref_p): Convert constant cases.
* combine.c (find_single_use_1, mark_used_regs_combine): Ditto.
 * cse.c (exp_equiv_p, canon_reg, fold_rtx, cse_process_notes_1,
count_reg_usage): Ditto.
* cselib.c (cselib_expand_value_rtx_1): Convert to
CASE_CONST_ANY.
(cselib_subst_to_values): Convert constant cases.
* df-scan.c (df_uses_record): Ditto.
* dse.c (const_or_frame_p): Convert case statements to explicit
if-then-else using mode classes.
* emit-rtl.c (verify_rtx_sharing, copy_insn_1): Convert constant cases.
* explow.c (convert_memory_address_addr_space): Ditto.
* gcse.c (want_to_gcse_p, oprs_unchanged_p, compute_transp): Ditto.
* genattrtab.c (attr_copy_rtx, clear_struct_flag): Ditto.
* ira.c (equiv_init_varies_p, contains_replace_regs,
memref_referenced_p, rtx_moveable_p): Ditto.
* jump.c (mark_jump_label_1): Remove constant cases.
(rtx_renumbered_equal_p): Convert to CASE_CONST_UNIQUE.
* loop-invariant.c (check_maybe_invariant): Convert constant cases.
(hash_invariant_expr_1,invariant_expr_equal_p): Convert to
CASE_CONST_ALL.
* postreload-gcse.c (oprs_unchanged_p): Convert constant cases.
* reginfo.c (reg_scan_mark_refs): Ditto.
* regrename.c (scan_rtx): Ditto.
* reload1.c (eliminate_regs_1, elimination_effects,
scan_paradoxical_subregs): Ditto.
* reload.c (operands_match_p, subst_reg_equivs):  Ditto.
* resource.c (mark_referenced_resources, mark_set_resources): Ditto.
* rtlanal.c (rtx_unstable_p, rtx_varies_p, count_occurrences)
(reg_mentioned_p, modified_between_p, modified_in_p)
(volatile_insn_p, volatile_refs_p, side_effects_p, may_trap_p_1,
inequality_comparisons_p, computed_jump_p_1): Ditto.
* rtl.c (copy_rtx, rtx_equal_p_cb, rtx_equal_p): Ditto.
* sched-deps.c (sched_analyze_2): Ditto.
* valtrack.c (cleanup_auto_inc_dec): Ditto.
* rtl.h: (CASE_CONST_SCALAR_INTEGER, CASE_CONST_UNIQUE,
CASE_CONST_ANY): New macros.


I plan to commit this in a few days unless someone has some comments.   
This is a mostly trivial patch and the changes from that are Richard 
Sandiford's and he is an rtl maintainer.


kenny

On 08/20/2012 09:58 AM, Kenneth Zadeck wrote:

I of course meant the machine "independent" not "dependent"
On 08/20/2012 09:50 AM, Kenneth Zadeck wrote:
This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can 
be logically grouped. This is important for the next patch that I 
will submit this week that frees the rtl level from only being able 
to represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny




diff -upNr '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c
--- gccBaseline/gcc/alias.c	2012-08-17 09:35:24.794195890 -0400
+++ gccWCase/gcc/alias.c	2012-08-19 09:48:33.666509880 -0400
@@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con
   return XSTR (x, 0) == XSTR (y, 0);
 
 case VALUE:
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
   /* There's no need to compare the contents of CONST_DOUBLEs or
 	 CONST_INTs because pointer equality is a good enough
 	 comparison for these nodes.  */
diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c	2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c	2012-08-20 15:43:34.659362244 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
 
   switch (code)
 {
-case CONST_INT:
 case CONST:
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
 case CLOBBER:
   return 0;
 
@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
 {
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_INT:
 case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
 case PC:
 case ADDR_VEC:
 case ADDR_DIFF_VEC:
diff -upNr '--exclude=.svn' gccBaseline/gcc/cse.c gccWCase/gcc/cse.c
--- gccBaseline/gcc/cse.c	2012-07-27 16:58:24.829691705 -0400
+++ gccWCase/gcc/cse.c	2012-08-20 15:47:26.924501205 -0400
@@ -2623,9 +2623,7 @@ exp_equiv_p (const_rtx x, 

Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-21 Thread Simon Baldwin
On 20 August 2012 16:45, Joseph S. Myers  wrote:
>
> On Mon, 20 Aug 2012, Simon Baldwin wrote:
>
> > > OPT_* for Fortran options only exist when the Fortran front-end is in the
> > > source tree (whether or not enabled).  I think we try to avoid knowingly
> > > breaking use cases where people remove some front ends from the source
> > > tree, although we don't actively test them and no longer provide split-up
> > > source tarballs.
> >
> > Thanks for the update.  Which fix should move forwards?
>
> I think the approach using a new option flag is the way to go, though the
> patch needs (at least) documentation for the new flag in options.texi.
>

Updated version appended below.  Okay for 4.8 trunk?

--

Omit OPT_cpp_ from the DWARF producer string in gfortran.

Gfortran uses -cpp= internally, and with -grecord_gcc_switches
this command line switch is stored by default in object files.  This causes
problems with build and packaging systems that care about gcc binary
reproducibility and file checksums; the temporary file is different on each
compiler invocation.

Fixed by adding a new opt marker NoDWARFRecord and associated flag, filtering
options for this this setting when writing the producer string, and setting
this flag for fortran -cpp=

Tested for fortran (suppresses -cpp=...) and c (no effect).

gcc/ChangeLog
2012-08-21  Simon Baldwin  

* dwarf2out.c (gen_producer_string): Omit command line switch if
CL_NO_DWARF_RECORD flag set.
* opts.c (print_specific_help): Add CL_NO_DWARF_RECORD handling.
* opts.h (CL_NO_DWARF_RECORD): New.
* opt-functions.awk (switch_flags): Add NoDWARFRecord.
* doc/options.texi: Document NoDWARFRecord option flag.
* doc/invoke.texi: Document --help=nodwarfrecord.

gcc/fortran/ChangeLog
2012-08-21  Simon Baldwin  

* lang.opt (-cpp=): Mark flag NoDWARFRecord.


Index: gcc/doc/options.texi
===
--- gcc/doc/options.texi(revision 190535)
+++ gcc/doc/options.texi(working copy)
@@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl
 specify several different languages.  Each @var{language} must have
 been declared by an earlier @code{Language} record.  @xref{Option file
 format}.
+
+@item NoDWARFRecord
+The option is added to the list of those omitted from the producer string
+written by @option{-grecord-gcc-switches}.
 @end table
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 190535)
+++ gcc/doc/invoke.texi (working copy)
@@ -1330,6 +1330,10 @@ sign in the same continuous piece of tex
 @item @samp{separate}
 Display options taking an argument that appears as a separate word
 following the original option, such as: @samp{-o output-file}.
+
+@item @samp{nodwarfrecord}
+Display only those options that are marked for addition to the list of
+options omitted from @option{-grecord-gcc-switches}.
 @end table

 Thus for example to display all the undocumented target-specific
Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 190535)
+++ gcc/dwarf2out.c (working copy)
@@ -18101,6 +18101,9 @@ gen_producer_string (void)
/* Ignore these.  */
continue;
   default:
+if (cl_options[save_decoded_options[j].opt_index].flags
+   & CL_NO_DWARF_RECORD)
+ continue;
 gcc_checking_assert (save_decoded_options[j].canonical_option[0][0]
 == '-');
 switch (save_decoded_options[j].canonical_option[0][1])
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 190535)
+++ gcc/opts.c  (working copy)
@@ -1186,7 +1186,9 @@ print_specific_help (unsigned int includ
 {
   if (any_flags == 0)
{
- if (include_flags & CL_UNDOCUMENTED)
+ if (include_flags & CL_NO_DWARF_RECORD)
+   description = _("The following options are not recorded by DWARF");
+  else if (include_flags & CL_UNDOCUMENTED)
description = _("The following options are not documented");
  else if (include_flags & CL_SEPARATE)
description = _("The following options take separate arguments");
@@ -1292,7 +1294,7 @@ common_handle_option (struct gcc_options
/* Walk along the argument string, parsing each word in turn.
   The format is:
   arg = [^]{word}[,{arg}]
-  word = {optimizers|target|warnings|undocumented|
+  word = {optimizers|target|warnings|undocumented|nodwarfrecord|
   params|common|}  */
while (* a != 0)
  {
@@ -1307,6 +1309,7 @@ common_handle_option (struct gcc_options
  { "target", CL_TARGET },
  { "warnings", CL_WARNING },
  { "undocumented", CL_UNDOCUMENTED },
+ { "nodwarfrecord", CL_NO_DWARF_REC

[Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts

2012-08-21 Thread Georg-Johann Lay
Just as the title says: gcc.dg/fixed-point/convert.c is much too big to run on
embedded targets like AVR.

Note that embedded systems are a main audience of ISO/IEC TR 18037,
and that these systems might have limited resources.

The original convert.c inflates to thousands of functions and set -O0.
Some targets need to emulate *everything*, even integer multiplication,
and the executable is much too fat.

The patch breaks up convert.c in parts so that an AVR ATmega103 device
with 128KiB for executable code (.text + .data + .rodata) can run them.

Ok for trunk?

Johann

* gcc.dg/fixed-point/convert.c: Split into more manageable parts:
* gcc.dg/fixed-point/convert-1.c: New.
* gcc.dg/fixed-point/convert-2.c: New.
* gcc.dg/fixed-point/convert-3.c: New.
* gcc.dg/fixed-point/convert-4.c: New.
* gcc.dg/fixed-point/convert-float-1.c: New.
* gcc.dg/fixed-point/convert-float-2.c: New.
* gcc.dg/fixed-point/convert-float-3.c: New.
* gcc.dg/fixed-point/convert-float-4.c: New.
* gcc.dg/fixed-point/convert-accum-neg.c: New.
* gcc.dg/fixed-point/convert-sat.c: New.
* gcc.dg/fixed-point/convert.h: New.


Index: gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-sat.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-sat.c	(revision 0)
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  SAT_CONV1 (short _Accum, hk);
+  SAT_CONV1 (_Accum, k);
+  SAT_CONV1 (long _Accum, lk);
+  SAT_CONV1 (long long _Accum, llk);
+
+  SAT_CONV2 (unsigned short _Accum, uhk);
+  SAT_CONV2 (unsigned _Accum, uk);
+  SAT_CONV2 (unsigned long _Accum, ulk);
+  SAT_CONV2 (unsigned long long _Accum, ullk);
+
+  SAT_CONV3 (short _Fract, hr);
+  SAT_CONV3 (_Fract, r);
+  SAT_CONV3 (long _Fract, lr);
+  SAT_CONV3 (long long _Fract, llr);
+
+  SAT_CONV4 (signed char);
+  SAT_CONV4 (short);
+  SAT_CONV4 (int);
+  SAT_CONV4 (long);
+  SAT_CONV4 (long long);
+
+  SAT_CONV5 (unsigned char);
+  SAT_CONV5 (unsigned short);
+  SAT_CONV5 (unsigned int);
+  SAT_CONV5 (unsigned long);
+  SAT_CONV5 (unsigned long long);
+
+  SAT_CONV6 (float);
+  SAT_CONV6 (double);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_ACCUM_CONV (short _Accum, hk);
+  ALL_ACCUM_CONV (_Accum, k);
+  ALL_ACCUM_CONV (long _Accum, lk);
+  ALL_ACCUM_CONV (long long _Accum, llk);
+  ALL_ACCUM_CONV (unsigned short _Accum, uhk);
+  ALL_ACCUM_CONV (unsigned _Accum, uk);
+  ALL_ACCUM_CONV (unsigned long _Accum, ulk);
+  ALL_ACCUM_CONV (unsigned long long _Accum, ullk);
+
+  NEG_CONV (short _Fract, hr);
+  NEG_CONV (_Fract, r);
+  NEG_CONV (long _Fract, lr);
+  NEG_CONV (long long _Fract, llr);
+  NEG_CONV (short _Accum, hk);
+  NEG_CONV (_Accum, k);
+  NEG_CONV (long _Accum, lk);
+  NEG_CONV (long long _Accum, llk);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-1.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_CONV (short _Fract, hr);
+  ALL_CONV (_Fract, r);
+  ALL_CONV (long _Fract, lr);
+  ALL_CONV (long long _Fract, llr);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-2.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-2.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_CONV (unsigned short _Fract, uhr);
+  ALL_CONV (unsigned _Fract, ur);
+  ALL_CONV (unsigned long _Fract, ulr);
+  ALL_CONV (unsigned long long _Fract, ullr);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert.c	(revision 190558)

Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function

2012-08-21 Thread Richard Guenther
On Tue, Aug 21, 2012 at 1:27 PM, Martin Jambor  wrote:
> On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote:
>> Hi,
>>
>> On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote:
>> > > - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls
>> > >   dump_function which in turns calls dump_function_to_file which calls
>> > >   push_cfun.  But Ada front end has its idea of the
>> > >   current_function_decl and there is no cfun which is an inconsistency
>> > >   which makes push_cfun assert fail.  I "solved" it by temporarily
>> > >   setting current_function_decl to NULL_TREE.  It's just dumping and I
>> > >   thought that dump_function should be considered middle-end and thus
>> > >   middle-end invariants should apply.
>> >
>> > If you think that calling dump_function from 
>> > rest_of_subprog_body_compilation
>> > is a layering violation, I don't have a problem with replacing it with a 
>> > more
>> > "manual" scheme like the one in c-family/c-gimplify.c:c_genericize, 
>> > provided
>> > that this yields roughly the same output.
>>
>> Richi suggested on IRC that I remove the push/pop_cfun calls from
>> dump_function_to_file.  The only problem seems to be
>> dump_histograms_for_stmt
>
> Yesterday I actually tried and it is not the only problem.  Another
> one is dump_function_to_file->dump_bb->maybe_hot_bb_p which uses cfun
> to read profile_status.  There may be others, this one just blew up
> first when I set cfun to NULL.  And in future someone is quite likely
> to need cfun to dump something new too.
>
> At the same time, re-implementing dumping
> c-family/c-gimplify.c:c_genericize when dump_function suffices seems
> ugly to me.
>
> So I am going to declare dump_function a front-end interface and use
> set_cfun in my original patch in dump_function_to_file like we do in
> other such functions.
>
> I hope that will be OK.  Thanks,

Setting cfun has side-effects of switching target stuff which might have
code-generation side-effects because of implementation issues we have
with target/optimize attributes.  So I don't think cfun should be changed
just for dumping.

Can you instead just set current_function_decl and access
struct function via DECL_STRUCT_FUNCTION in the dumpers then?
After all, it it is a front-end interface, the frontend way of saying
"this is the current function" is to set current_function_decl, not the
middle-end cfun.

Richard.

> Martin
>
> PS: Each of various alternatives proposed in this thread had someone
> who opposed it.  If there is a consensus that some of them should be
> implemented anyway (like global value profiling hash), I am willing to
> do that, I just do not want to end up bickering about the result.


Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function

2012-08-21 Thread Martin Jambor
On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote:
> Hi,
> 
> On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote:
> > > - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls
> > >   dump_function which in turns calls dump_function_to_file which calls
> > >   push_cfun.  But Ada front end has its idea of the
> > >   current_function_decl and there is no cfun which is an inconsistency
> > >   which makes push_cfun assert fail.  I "solved" it by temporarily
> > >   setting current_function_decl to NULL_TREE.  It's just dumping and I
> > >   thought that dump_function should be considered middle-end and thus
> > >   middle-end invariants should apply.
> > 
> > If you think that calling dump_function from 
> > rest_of_subprog_body_compilation 
> > is a layering violation, I don't have a problem with replacing it with a 
> > more 
> > "manual" scheme like the one in c-family/c-gimplify.c:c_genericize, 
> > provided 
> > that this yields roughly the same output.
> 
> Richi suggested on IRC that I remove the push/pop_cfun calls from
> dump_function_to_file.  The only problem seems to be
> dump_histograms_for_stmt

Yesterday I actually tried and it is not the only problem.  Another
one is dump_function_to_file->dump_bb->maybe_hot_bb_p which uses cfun
to read profile_status.  There may be others, this one just blew up
first when I set cfun to NULL.  And in future someone is quite likely
to need cfun to dump something new too.

At the same time, re-implementing dumping
c-family/c-gimplify.c:c_genericize when dump_function suffices seems
ugly to me.

So I am going to declare dump_function a front-end interface and use
set_cfun in my original patch in dump_function_to_file like we do in
other such functions.

I hope that will be OK.  Thanks,

Martin

PS: Each of various alternatives proposed in this thread had someone
who opposed it.  If there is a consensus that some of them should be
implemented anyway (like global value profiling hash), I am willing to
do that, I just do not want to end up bickering about the result.


Re: [PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c

2012-08-21 Thread Richard Guenther
On Tue, 21 Aug 2012, Richard Guenther wrote:

> 
> Testing in progress.
> 
> Richard.
> 
> 2012-08-21  Richard Guenther  
> 
>   * alloc-pool.c (pool_alloc): Fix valgrind annotation.
>   * tree.h: Complete flags documentation.
>   (CLEANUP_EH_ONLY): Check documented allowed tree codes.

I have instead applied the following - the C++ frontend uses
CLEANUP_EH_ONLY on C++ specific trees.

Bootstrapped on x86_64-unknown-linux-gnu.

Richard.

2012-08-21  Richard Guenther  

* alloc-pool.c (pool_alloc): Fix valgrind annotation.
* tree.h: Fix typo and complete flags documentation.

Index: gcc/alloc-pool.c
===
--- gcc/alloc-pool.c(revision 190558)
+++ gcc/alloc-pool.c(working copy)
@@ -247,7 +247,9 @@ void *
 pool_alloc (alloc_pool pool)
 {
   alloc_pool_list header;
-  VALGRIND_DISCARD (int size);
+#ifdef ENABLE_VALGRIND_CHECKING
+  int size;
+#endif
 
   if (GATHER_STATISTICS)
 {
@@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool)
 }
 
   gcc_checking_assert (pool);
-  VALGRIND_DISCARD (size = pool->elt_size - offsetof (allocation_object, 
u.data));
+#ifdef ENABLE_VALGRIND_CHECKING
+  size = pool->elt_size - offsetof (allocation_object, u.data);
+#endif
 
   /* If there are no more free elements, make some more!.  */
   if (!pool->returned_free_list)
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 190558)
+++ gcc/tree.h  (working copy)
@@ -417,7 +417,7 @@ enum omp_clause_code
so all nodes have these fields.
 
See the accessor macros, defined below, for documentation of the
-   fields, and the table below which connects the fileds and the
+   fields, and the table below which connects the fields and the
accessor macros.  */
 
 struct GTY(()) tree_base {
@@ -494,6 +494,9 @@ struct GTY(()) tree_base {
CASE_LOW_SEEN in
CASE_LABEL_EXPR
 
+   PREDICT_EXPR_OUTCOME in
+  PREDICT_EXPR
+
static_flag:
 
TREE_STATIC in
@@ -576,12 +579,16 @@ struct GTY(()) tree_base {
 
OMP_PARALLEL_COMBINED in
OMP_PARALLEL
+
OMP_CLAUSE_PRIVATE_OUTER_REF in
   OMP_CLAUSE_PRIVATE
 
TYPE_REF_IS_RVALUE in
   REFERENCE_TYPE
 
+   ENUM_IS_OPAQUE in
+  ENUMERAL_TYPE
+
protected_flag:
 
TREE_PROTECTED in



[PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c

2012-08-21 Thread Richard Guenther

Testing in progress.

Richard.

2012-08-21  Richard Guenther  

* alloc-pool.c (pool_alloc): Fix valgrind annotation.
* tree.h: Complete flags documentation.
(CLEANUP_EH_ONLY): Check documented allowed tree codes.

Index: gcc/alloc-pool.c
===
--- gcc/alloc-pool.c(revision 190558)
+++ gcc/alloc-pool.c(working copy)
@@ -247,7 +247,9 @@ void *
 pool_alloc (alloc_pool pool)
 {
   alloc_pool_list header;
-  VALGRIND_DISCARD (int size);
+#ifdef ENABLE_VALGRIND_CHECKING
+  int size;
+#endif
 
   if (GATHER_STATISTICS)
 {
@@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool)
 }
 
   gcc_checking_assert (pool);
-  VALGRIND_DISCARD (size = pool->elt_size - offsetof (allocation_object, 
u.data));
+#ifdef ENABLE_VALGRIND_CHECKING
+  size = pool->elt_size - offsetof (allocation_object, u.data);
+#endif
 
   /* If there are no more free elements, make some more!.  */
   if (!pool->returned_free_list)
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 190558)
+++ gcc/tree.h  (working copy)
@@ -417,7 +417,7 @@ enum omp_clause_code
so all nodes have these fields.
 
See the accessor macros, defined below, for documentation of the
-   fields, and the table below which connects the fileds and the
+   fields, and the table below which connects the fields and the
accessor macros.  */
 
 struct GTY(()) tree_base {
@@ -494,6 +494,9 @@ struct GTY(()) tree_base {
CASE_LOW_SEEN in
CASE_LABEL_EXPR
 
+   PREDICT_EXPR_OUTCOME in
+  PREDICT_EXPR
+
static_flag:
 
TREE_STATIC in
@@ -576,12 +579,16 @@ struct GTY(()) tree_base {
 
OMP_PARALLEL_COMBINED in
OMP_PARALLEL
+
OMP_CLAUSE_PRIVATE_OUTER_REF in
   OMP_CLAUSE_PRIVATE
 
TYPE_REF_IS_RVALUE in
   REFERENCE_TYPE
 
+   ENUM_IS_OPAQUE in
+  ENUMERAL_TYPE
+
protected_flag:
 
TREE_PROTECTED in
@@ -1117,7 +1124,8 @@ extern void omp_clause_range_check_faile
 /* In a TARGET_EXPR or WITH_CLEANUP_EXPR, means that the pertinent cleanup
should only be executed if an exception is thrown, not on normal exit
of its scope.  */
-#define CLEANUP_EH_ONLY(NODE) ((NODE)->base.static_flag)
+#define CLEANUP_EH_ONLY(NODE) \
+  (TREE_CHECK2 (NODE, TARGET_EXPR, WITH_CLEANUP_EXPR)->base.static_flag)
 
 /* In a TRY_CATCH_EXPR, means that the handler should be considered a
separate cleanup in honor_protect_cleanup_actions.  */


[PATCH] Fix more leaks

2012-08-21 Thread Richard Guenther

This fixes a few more heap leaks.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-08-21  Richard Guenther  

* tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
the affine expansion cache.
* tree-ssa-dom.c (free_expr_hash_elt_contents): New function,
split out from ...
(free_expr_hash_elt): ... this one.
(record_cond): Properly free a not needed hashtable element.
(lookup_avail_expr): Likewise.
* tree-into-ssa.c (init_ssa_renamer): Specify a free function
for the var_infos hashtable.
(update_ssa): Likewise.

Index: gcc/tree-ssa-loop-im.c
===
*** gcc/tree-ssa-loop-im.c  (revision 190533)
--- gcc/tree-ssa-loop-im.c  (working copy)
*** tree_ssa_lim_finalize (void)
*** 2634,2640 
VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop);
  
if (memory_accesses.ttae_cache)
! pointer_map_destroy (memory_accesses.ttae_cache);
  }
  
  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
--- 2634,2640 
VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop);
  
if (memory_accesses.ttae_cache)
! free_affine_expand_cache (&memory_accesses.ttae_cache);
  }
  
  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c  (revision 190533)
--- gcc/tree-ssa-dom.c  (working copy)
*** print_expr_hash_elt (FILE * stream, cons
*** 649,667 
  }
  }
  
! /* Delete an expr_hash_elt and reclaim its storage.  */
  
  static void
! free_expr_hash_elt (void *elt)
  {
-   struct expr_hash_elt *element = ((struct expr_hash_elt *)elt);
- 
if (element->expr.kind == EXPR_CALL)
  free (element->expr.ops.call.args);
! 
!   if (element->expr.kind == EXPR_PHI)
  free (element->expr.ops.phi.args);
  
free (element);
  }
  
--- 649,672 
  }
  }
  
! /* Delete variable sized pieces of the expr_hash_elt ELEMENT.  */
  
  static void
! free_expr_hash_elt_contents (struct expr_hash_elt *element)
  {
if (element->expr.kind == EXPR_CALL)
  free (element->expr.ops.call.args);
!   else if (element->expr.kind == EXPR_PHI)
  free (element->expr.ops.phi.args);
+ }
+ 
+ /* Delete an expr_hash_elt and reclaim its storage.  */
  
+ static void
+ free_expr_hash_elt (void *elt)
+ {
+   struct expr_hash_elt *element = ((struct expr_hash_elt *)elt);
+   free_expr_hash_elt_contents (element);
free (element);
  }
  
*** lookup_avail_expr (gimple stmt, bool ins
*** 2404,2412 
slot = htab_find_slot_with_hash (avail_exprs, &element, element.hash,
   (insert ? INSERT : NO_INSERT));
if (slot == NULL)
! return NULL_TREE;
! 
!   if (*slot == NULL)
  {
struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt);
*element2 = element;
--- 2409,2419 
slot = htab_find_slot_with_hash (avail_exprs, &element, element.hash,
   (insert ? INSERT : NO_INSERT));
if (slot == NULL)
! {
!   free_expr_hash_elt_contents (&element);
!   return NULL_TREE;
! }
!   else if (*slot == NULL)
  {
struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt);
*element2 = element;
*** lookup_avail_expr (gimple stmt, bool ins
*** 2422,2427 
--- 2429,2436 
VEC_safe_push (expr_hash_elt_t, heap, avail_exprs_stack, element2);
return NULL_TREE;
  }
+   else
+ free_expr_hash_elt_contents (&element);
  
/* Extract the LHS of the assignment so that it can be used as the current
   definition of another variable.  */
Index: gcc/tree-into-ssa.c
===
*** gcc/tree-into-ssa.c (revision 190533)
--- gcc/tree-into-ssa.c (working copy)
*** init_ssa_renamer (void)
*** 2291,2297 
/* Allocate memory for the DEF_BLOCKS hash table.  */
gcc_assert (var_infos == NULL);
var_infos = htab_create (VEC_length (tree, cfun->local_decls),
!  var_info_hash, var_info_eq, NULL);
  
bitmap_obstack_initialize (&update_ssa_obstack);
  }
--- 2291,2297 
/* Allocate memory for the DEF_BLOCKS hash table.  */
gcc_assert (var_infos == NULL);
var_infos = htab_create (VEC_length (tree, cfun->local_decls),
!  var_info_hash, var_info_eq, free);
  
bitmap_obstack_initialize (&update_ssa_obstack);
  }
*** update_ssa (unsigned update_flags)
*** 3170,3176 
  {
/* If we rename bare symbols initialize the mapping to
   auxiliar info we need to keep track of.  */
!   var_infos = htab_create (47, var_info_hash, var_info_eq, NULL);
  
/* If we have to rename some symbols f

[SH] Use more multi-line asm outputs

2012-08-21 Thread Oleg Endo
Hello,

This mainly converts the asm outputs to multi-line strings and uses tab
chars instead of '\\t' in the asm strings, in the hope to make stuff
easier to read and a bit more consistent.
Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

* config/sh/sh.md (cmpeqdi_t, cmpgtdi_t, cmpgedi_t, cmpgeudi_t,
cmpgtudi_t, *movsicc_t_false, *movsicc_t_true, divsi_inv20, 
negsi_cond, truncdihi2, ic_invalidate_line_i, 
ic_invalidate_line_sh4a, ic_invalidate_line_media, movdf_i4,
calli_pcrel, call_valuei, call_valuei_pcrel, sibcalli_pcrel,
sibcall_compact, sibcall_valuei_pcrel, sibcall_value_compact,
casesi_worker_1, casesi_worker_2, bandreg_m2a, borreg_m2a, 
bxorreg_m2a, sp_switch_1, sp_switch_2, stack_protect_set_si,
stack_protect_set_si_media, stack_protect_set_di_media,
stack_protect_test_si, stack_protect_test_si_media,
stack_protect_test_di_media): Convert to multi-line asm output
strings.
(divsi_inv_qitable, divsi_inv_hitable): Use single-alternative 
asm output.
(*andsi3_bclr, rotldi3_mextr, rotrdi3_mextr, calli, 
call_valuei_tbr_rel, movml_push_banked, movml_pop_banked, 
bclr_m2a, bclrmem_m2a, bset_m2a, bsetmem_m2a, bst_m2a, bld_m2a, 
bldsign_m2a, bld_reg, *bld_regqi, band_m2a, bor_m2a, bxor_m2a, 
mextr_rl, *mextr_lr, ): Use tab char instead of '\\t'.
(iordi3): Use braced string.
(*movsi_pop): Use tab chars instead of spaces.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 190546)
+++ gcc/config/sh/sh.md	(working copy)
@@ -541,12 +541,10 @@
 
 ;; On the SH and SH2, the rte instruction reads the return pc from the stack,
 ;; and thus we can't put a pop instruction in its delay slot.
-;; ??? On the SH3, the rte instruction does not use the stack, so a pop
+;; On the SH3 and SH4, the rte instruction does not use the stack, so a pop
 ;; instruction can go in the delay slot.
-
 ;; Since a normal return (rts) implicitly uses the PR register,
 ;; we can't allow PR register loads in an rts delay slot.
-
 (define_delay
   (eq_attr "type" "return")
   [(and (eq_attr "in_delay_slot" "yes")
@@ -1154,9 +1152,21 @@
 	(eq:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "N,r")))]
   "TARGET_SH1"
-  "@
-	tst	%S0,%S0\;bf	%,Ldi%=\;tst	%R0,%R0\\n%,Ldi%=:
-	cmp/eq	%S1,%S0\;bf	%,Ldi%=\;cmp/eq	%R1,%R0\\n%,Ldi%=:"
+{
+  static const char* alt[] =
+  {
+   "tst	%S0,%S0"	"\n"
+"	bf	0f"		"\n"
+"	tst	%R0,%R0"	"\n"
+"0:",
+
+   "cmp/eq	%S1,%S0"	"\n"
+"	bf	0f"		"\n"
+"	cmp/eq	%R1,%R0"	"\n"
+"0:"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "6")
(set_attr "type" "arith3b")])
 
@@ -1189,9 +1199,23 @@
 	(gt:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
   "TARGET_SH2"
-  "@
-	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/gt\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=:
-	tst\\t%S0,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/pl\\t%S0\;cmp/hi\\t%S0,%R0\\n%,Ldi%=:"
+{
+  static const char* alt[] =
+  {
+   "cmp/eq	%S1,%S0"	"\n"
+"	bf{.|/}s	0f"	"\n"
+"	cmp/gt	%S1,%S0"	"\n"
+"	cmp/hi	%R1,%R0"	"\n"
+"0:",
+
+"tst	%S0,%S0"	"\n"
+"	bf{.|/}s	0f"	"\n"
+"	cmp/pl	%S0"		"\n"
+"	cmp/hi	%S0,%R0"	"\n"
+"0:"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "8")
(set_attr "type" "arith3")])
 
@@ -1200,9 +1224,19 @@
 	(ge:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
   "TARGET_SH2"
-  "@
-	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/ge\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=:
-	cmp/pz\\t%S0"
+{
+  static const char* alt[] =
+  {
+   "cmp/eq	%S1,%S0"	"\n"
+"	bf{.|/}s	0f"	"\n"
+"	cmp/ge	%S1,%S0"	"\n"
+"	cmp/hs	%R1,%R0"	"\n"
+"0:",
+
+   "cmp/pz	%S0"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "8,2")
(set_attr "type" "arith3,mt_group")])
 
@@ -1215,7 +1249,13 @@
 	(geu:SI (match_operand:DI 0 "arith_reg_operand" "r")
 		(match_operand:DI 1 "arith_reg_operand" "r")))]
   "TARGET_SH2"
-  "cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hs\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=:"
+{
+  return   "cmp/eq	%S1,%S0"	"\n"
+	 "	bf{.|/}s	0f"	"\n"
+	 "	cmp/hs	%S1,%S0"	"\n"
+	 "	cmp/hs	%R1,%R0"	"\n"
+	 "0:";
+}
   [(set_attr "length" "8")
(set_attr "type" "arith3")])
 
@@ -1224,7 +1264,13 @@
 	(gtu:SI (match_operand:DI 0 "arith_reg_operand" "r")
 		(match_operand:DI 1 "arith_reg_operand" "r")))]
   "TARGET_SH2"
-  "cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hi\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=:"
+{
+  return   "cmp/eq	%S1,%S0"	"\n"
+	 "	bf{.|/}s	0f"	"\n"

[C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-08-21 Thread Florian Weimer
I don't think there are any callers out there, but let's fix this for 
completeness.


A compiler emitting code to call this function would still have to 
perform overflow checks for the new T[n][m] case, so this interface is 
not as helpful as it looks at first glance.


Tested on x86_64-redhat-linux-gnu.

--
Florian Weimer / Red Hat Product Security Team
2012-08-21  Florian Weimer  

	* libsupc++/vec.cc (compute_size): New function.
	(__cxa_vec_new2, __cxa_vec_new3): Use it.

2012-08-21  Florian Weimer  

	* g++.old-deja/g++.abi/cxa_vec.C (test5, test6): New.

diff --git a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
index f3d602f..e2b82e7 100644
--- a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
+++ b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
@@ -8,7 +8,7 @@
 // Avoid use of none-overridable new/delete operators in shared
 // { dg-options "-static" { target *-*-mingw* } }
 // Test __cxa_vec routines
-// Copyright (C) 2000, 2005 Free Software Foundation, Inc.
+// Copyright (C) 2000-2012 Free Software Foundation, Inc.
 // Contributed by Nathan Sidwell 7 Apr 2000 
 
 #if defined (__GXX_ABI_VERSION) && __GXX_ABI_VERSION >= 100
@@ -255,6 +255,80 @@ void test4 ()
   return;
 }
 
+static const std::size_t large_size = std::size_t(1) << (sizeof(std::size_t) * 8 - 2);
+
+// allocate an array whose size causes an overflow during multiplication
+void test5 ()
+{
+  static bool started = false;
+
+  if (!started)
+{
+  started = true;
+  std::set_terminate (test0);
+
+  ctor_count = dtor_count = 1;
+  dtor_repeat = false;
+  blocks = 0;
+
+  try
+{
+  void *ary = abi::__cxa_vec_new (large_size, 4, padding, ctor, dtor);
+	  longjmp (jump, 1);
+}
+  catch (std::bad_alloc)
+	{
+	  if (ctor_count != 1)
+	longjmp (jump, 4);
+	}
+  catch (...)
+{
+  longjmp (jump, 2);
+}
+}
+  else
+{
+  longjmp (jump, 3);
+}
+  return;
+}
+
+// allocate an array whose size causes an overflow during addition
+void test6 ()
+{
+  static bool started = false;
+
+  if (!started)
+{
+  started = true;
+  std::set_terminate (test0);
+
+  ctor_count = dtor_count = 1;
+  dtor_repeat = false;
+  blocks = 0;
+
+  try
+{
+  void *ary = abi::__cxa_vec_new (std::size_t(-1) / 4, 4, padding, ctor, dtor);
+	  longjmp (jump, 1);
+}
+  catch (std::bad_alloc)
+	{
+	  if (ctor_count != 1)
+	longjmp (jump, 4);
+	}
+  catch (...)
+{
+  longjmp (jump, 2);
+}
+}
+  else
+{
+  longjmp (jump, 3);
+}
+  return;
+}
+
 static void (*tests[])() =
 {
   test0,
@@ -262,6 +336,8 @@ static void (*tests[])() =
   test2,
   test3,
   test4,
+  test5,
+  test6,
   NULL
 };
 
diff --git a/libstdc++-v3/libsupc++/vec.cc b/libstdc++-v3/libsupc++/vec.cc
index 700c5ef..bfce117 100644
--- a/libstdc++-v3/libsupc++/vec.cc
+++ b/libstdc++-v3/libsupc++/vec.cc
@@ -1,6 +1,6 @@
 // New abi Support -*- C++ -*-
 
-// Copyright (C) 2000, 2001, 2003, 2004, 2009, 2011
+// Copyright (C) 2000-2012
 // Free Software Foundation, Inc.
 //  
 // This file is part of GCC.
@@ -59,6 +59,19 @@ namespace __cxxabiv1
   globals->caughtExceptions = p->nextException;
   globals->uncaughtExceptions += 1;
 }
+
+// Compute the total size with overflow checking.
+std::size_t compute_size(std::size_t element_count,
+			 std::size_t element_size,
+			 std::size_t padding_size)
+{
+  if (element_size && element_count > std::size_t(-1) / element_size)
+	throw std::bad_alloc();
+  std::size_t size = element_count * element_size;
+  if (size + padding_size < size)
+	throw std::bad_alloc();
+  return size + padding_size;
+}
   }
 
   // Allocate and construct array.
@@ -83,7 +96,8 @@ namespace __cxxabiv1
 		 void *(*alloc) (std::size_t),
 		 void (*dealloc) (void *))
   {
-std::size_t size = element_count * element_size + padding_size;
+std::size_t size
+  = compute_size(element_count, element_size, padding_size);
 char *base = static_cast  (alloc (size));
 if (!base)
   return base;
@@ -124,7 +138,8 @@ namespace __cxxabiv1
 		 void *(*alloc) (std::size_t),
 		 void (*dealloc) (void *, std::size_t))
   {
-std::size_t size = element_count * element_size + padding_size;
+std::size_t size
+  = compute_size(element_count, element_size, padding_size);
 char *base = static_cast(alloc (size));
 if (!base)
   return base;


Re: [PATCH] Add valgrind support to alloc-pool.c

2012-08-21 Thread Richard Guenther
On Sat, Aug 18, 2012 at 9:56 AM, Richard Guenther
 wrote:
> On Sat, Aug 18, 2012 at 6:17 AM, Andrew Pinski  wrote:
>> Hi,
>>   I implemented this patch almost 6 years ago when the df branch was
>> being worked on.  It adds valgrind support to  alloc-pool.c to catch
>> cases of using memory after free the memory.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Ok.

It doesn't work.  Did you check with valgrind checking?

/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c: In function 'void*
pool_alloc(alloc_pool)':
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
primary-expression before 'int'
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
')' before 'int'
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
')' before ';' token
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:263:3: error: 'size'
was not declared in this scope
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:303:7: error: 'size'
was not declared in this scope

that's because VALGRIND_DISCARD is not what you think it is.

Testing a fix ...

Richard.

> Thanks,
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * alloc-pool.c (pool_alloc): Add valgrind markers.
>> (pool_free): Likewise.


Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-21 Thread Jay Foad
On 21 August 2012 10:58, Richard Guenther  wrote:
> Index: trunk/gcc/tree.h
> ===
> *** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
> --- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200
> *** enum omp_clause_code
> *** 417,423 
>  so all nodes have these fields.
>
>  See the accessor macros, defined below, for documentation of the
> !fields.  */
>
>   struct GTY(()) tree_base {
> ENUM_BITFIELD(tree_code) code : 16;
> --- 417,424 
>  so all nodes have these fields.
>
>  See the accessor macros, defined below, for documentation of the
> !fields, and the table below which connects the fileds and the
> !accessor macros.  */

Typo "fileds".

Jay.


Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-21 Thread Richard Guenther
On Mon, 20 Aug 2012, Richard Guenther wrote:

> 
> This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from
> 80 bytes to 72 bytes on a 64bit host.  Both structures suffer
> from the fact they need storage for an integer (length and version)
> which leaves unused padding.  Both data structures do not require
> as many flag bits as we keep in tree_base though, so they can
> conveniently use the upper 4-bytes of the 8-bytes tree_base to
> store length / version.
> 
> I added a union to tree_base to divide up the space between flags
> (possibly) used for all tree kinds and flags that are not used
> for those who chose to re-use the upper 4-bytes of tree_base for
> something else.
> 
> This superseeds the patch that moved the C++ specific usage of
> TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC
> isn't any closer to be based on tree_base only).
> 
> Due to re-use of flags from frontends definitive checking for
> flag accesses is not always possible (TREE_NOTRHOW for example).
> Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead,
> to catch mis-uses of the C++ frontend.  Changed ARGUMENT_PACK_INCOMPLETE_P
> to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 then which
> it used on TREE_VECs.
> 
> We are very lazy adjusting flag usage documentation :/
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

After discussion on IRC I added !SSA_NAME checking to the lang flag
accessors.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-21  Richard Guenther  

cp/
* cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor.
(ATTR_IS_DEPENDENT): Likewise.
(ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of
TREE_LANG_FLAG_0 on TREE_VECs.

* tree.h (struct tree_base): Add union to make it possible to
re-use the upper 4 bytes for tree codes that do not need as
many flags as others.  Move visited and default_def_flag to
common bits section in exchange for saturating_flag and
unsigned_flag.  Add SSA name version and tree vec length
fields here.
(struct tree_vec): Remove length field here.
(struct tree_ssa_name): Remove version field here.

Index: trunk/gcc/cp/cp-tree.h
===
*** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200
--- trunk/gcc/cp/cp-tree.h  2012-08-20 13:53:05.212969994 +0200
*** struct GTY((variable_size)) lang_decl {
*** 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
--- 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
*** extern void decl_shadowed_for_var_insert
*** 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
--- 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
Index: trunk/gcc/tree.h
===
*** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
--- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200
*** enum omp_clause_code
*** 417,423 
 so all nodes have these fields.
  
 See the accessor macros, defined below, for documentation of the
!fields.  */
  
  struct GTY(()) tree_base {
ENUM_BITFIELD(tree_code) code : 16;
--- 417,424 
 so all nodes have these fields.
  
 See the accessor macros, defined below, for documentation of the
!fields, and the table below which connects the fileds and t

[SH] PR 39423 - Add support for SH2A movu.w insn

2012-08-21 Thread Oleg Endo
Hello,

This adds support for SH2A's movu.w insn for memory addressing cases as
described in the PR.
Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

PR target/39423
* config/sh/sh.md (*movhi_index_disp): Add support for SH2A 
movu.w insn.

testsuite/ChangeLog:

PR target/39423
* gcc.target/sh/pr39423-2.c: New.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 190459)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5667,12 +5667,35 @@
(clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "&& 1"
-  [(parallel [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
-	  (clobber (reg:SI T_REG))])
-   (set (match_dup 0) (zero_extend:SI (match_dup 2)))]
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
-  operands[2] = gen_lowpart (HImode, operands[0]);
+  rtx mem = operands[1];
+  rtx plus0_rtx = XEXP (mem, 0);
+  rtx plus1_rtx = XEXP (plus0_rtx, 0);
+  rtx mult_rtx = XEXP (plus1_rtx, 0);
+
+  rtx op_1 = XEXP (mult_rtx, 0);
+  rtx op_2 = GEN_INT (exact_log2 (INTVAL (XEXP (mult_rtx, 1;
+  rtx op_3 = XEXP (plus1_rtx, 1);
+  rtx op_4 = XEXP (plus0_rtx, 1);
+  rtx op_5 = gen_reg_rtx (SImode);
+  rtx op_6 = gen_reg_rtx (SImode);
+  rtx op_7 = replace_equiv_address (mem, gen_rtx_PLUS (SImode, op_6, op_4));
+
+  emit_insn (gen_ashlsi3 (op_5, op_1, op_2));
+  emit_insn (gen_addsi3 (op_6, op_5, op_3));
+
+  /* On SH2A the movu.w insn can be used for zero extending loads.  */
+  if (TARGET_SH2A)
+emit_insn (gen_zero_extendhisi2 (operands[0], op_7));
+  else
+{
+  emit_insn (gen_extendhisi2 (operands[0], op_7));
+  emit_insn (gen_zero_extendhisi2 (operands[0],
+   gen_lowpart (HImode, operands[0])));
+}
+  DONE;
 })
 
 (define_insn_and_split "*movsi_index_disp"
Index: gcc/testsuite/gcc.target/sh/pr39423-2.c
===
--- gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* Check that displacement addressing is used for indexed addresses with a
+   small offset, instead of re-calculating the index and that the movu.w
+   instruction is used on SH2A.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "*" } { "-m2a*" } } */
+/* { dg-final { scan-assembler-not "add\t#1" } } */
+/* { dg-final { scan-assembler "movu.w" } } */
+
+int
+test_00 (unsigned short tab[], int index)
+{
+  return tab[index + 1];
+}


Re: Fix Solaris 9/x86 bootstrap

2012-08-21 Thread Richard Guenther
On Tue, Aug 21, 2012 at 10:53 AM, Rainer Orth
 wrote:
> Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:
>
> In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
> /vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier 
> before n
> umeric constant
> /vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before 
> numeric
> constant
>
> This happens since g++, unlike gcc, defines __EXTENSIONS__, which
> exposes the equivalent of
>
> #define PC 14
>
> Initially I tried to avoid this by having gengtype.c include rtl.h,
> which already has the #undef, but this produced so much fallout that I
> decided it's better to just replicate it here.
>
> The patch allowed an i386-pc-solaris2.9 bootstrap to finish.  I think
> this counts as obvious unless someone prefers the rtl.h route
> nonetheless.
>
> Ok for mainline?

Doesn't that belong in system.h instead?  And removed from rtl.h?

Thanks,
Richard.

> Rainer
>
>
> 2012-08-20  Rainer Orth  
>
> * gengtype.c (PC): Undef.
>
>
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>


Fix Solaris 9/x86 bootstrap

2012-08-21 Thread Rainer Orth
Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:

In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier before n
umeric constant
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before numeric
constant

This happens since g++, unlike gcc, defines __EXTENSIONS__, which
exposes the equivalent of

#define PC 14

Initially I tried to avoid this by having gengtype.c include rtl.h,
which already has the #undef, but this produced so much fallout that I
decided it's better to just replicate it here.

The patch allowed an i386-pc-solaris2.9 bootstrap to finish.  I think
this counts as obvious unless someone prefers the rtl.h route
nonetheless.

Ok for mainline?

Rainer


2012-08-20  Rainer Orth  

* gengtype.c (PC): Undef.

# HG changeset patch
# Parent cf74f0e72cab4965ba20bf236eac2fac2b87064e
Fix Solaris 9 bootstrap

diff --git a/gcc/gengtype.c b/gcc/gengtype.c
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -35,6 +35,8 @@
 #include "gengtype.h"
 #include "filenames.h"
 
+#undef PC /* Some systems predefine this symbol; don't let it interfere.  */
+
 /* Data types, macros, etc. used only in this file.  */
 
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines

2012-08-21 Thread Ye Joey
On Fri, Aug 17, 2012 at 9:13 AM, Ian Lance Taylor  wrote:
>
> Looks fine to me.
>
> Ian
Will backport to arm/embedded-4_7-branch. No sure if appropriate for
4.7 branch since it is not a stability problem.

- Joey


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
> Teresa has done some tunings for the unroller so far. The inliner
> tuning is the next step.
> 
> >
> > What concerns me that it is greatly inaccurate - you have no idea how many
> > instructions given counter is guarding and it can differ quite a lot. Also
> > inlining/optimization makes working sets significantly different (by factor 
> > of
> > 100 for tramp3d).
> 
> The pre ipa-inline working set is the one that is needed for ipa
> inliner tuning. For post-ipa inline code increase transformations,
> some update is probably needed.
> 
> >But on the ohter hand any solution at this level will be
> > greatly inaccurate. So I am curious how reliable data you can get from this?
> > How you take this into account for the heuristics?
> 
> This effort is just the first step to allow good heuristics to develop.
> 
> >
> > It seems to me that for this use perhaps the simple logic in histogram 
> > merging
> > maximizing the number of BBs for given bucket will work well?  It is
> > inaccurate, but we are working with greatly inaccurate data anyway.
> > Except for degenerated cases, the small and unimportant runs will have 
> > small BB
> > counts, while large runs will have larger counts and those are ones we 
> > optimize
> > for anyway.
> 
> The working set curve for each type of applications contains lots of
> information that can be mined. The inaccuracy can also be mitigated by
> more data 'calibration'.

Sure, I think I am leaning towards trying the solution 2) with maximizing
counter count merging (probably it would make sense to rename it from BB count
since it is not really BB count and thus it is misleading) and we will see how
well it works in practice.

We have benefits of much fewer issues with profile locking/unlocking and we
lose bit of precision on BB counts. I tend to believe that the error will not
be that important in practice. Another loss is more histogram streaming into
each gcda file, but with skiping zero entries it should not be major overhead
problem I hope.

What do you think?
> 
> >>
> >>
> >> >  2) Do we plan to add some features in near future that will anyway 
> >> > require global locking?
> >> > I guess LIPO itself does not count since it streams its data into 
> >> > independent file as you
> >> > mentioned earlier and locking LIPO file is not that hard.
> >> > Does LIPO stream everything into that common file, or does it use 
> >> > combination of gcda files
> >> > and common summary?
> >>
> >> Actually, LIPO module grouping information are stored in gcda files.
> >> It is also stored in a separate .imports file (one per object) ---
> >> this is primarily used by our build system for dependence information.
> >
> > I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
> > behave
> > on GCC bootstrap?
> 
> We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
> main problem for application build -- the link time (for debug build)
> is.

I was primarily curious how the LIPOs runtime analysis fare in the situation 
where
you do very many small train runs on rather large app (sure GCC is small to 
google's
use case ;).
> 
> > (i.e. it does a lot more work in the libgcov module per each
> > invocation, so I am curious if it is practically useful at all).
> >
> > With LTO based solution a lot can be probably pushed at link time? Before
> > actual GCC starts from the linker plugin, LIPO module can read gcov CFGs 
> > from
> > gcda files and do all the merging/updating/CFG constructions that is 
> > currently
> > performed at runtime, right?
> 
> The dynamic cgraph build and analysis is still done at runtime.
> However, with the new implementation, FE is no longer involved. Gcc
> driver is modified to understand module grouping, and lto is used to
> merge the streamed output from aux modules.

I see. Are there any fundamental reasons why it can not be done at link-time
when all gcda files are available? Why the grouping is not done inside linker
plugin?

Honza
> 
> 
> David


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Xinliang David Li
On Mon, Aug 20, 2012 at 11:33 PM, Jan Hubicka  wrote:
>>
>> This is useful for large applications with a long tail. The
>> instruction working set for those applications are very large, and
>> inliner and unroller need to be aware of that and good heuristics can
>> be developed to throttle aggressive code bloat transformations. For
>> inliner, it is kind of the like the global budget but more application
>> dependent. In the long run, we will collect more advanced fdo summary
>> regarding working set -- it will be working set size for each code
>> region (locality region).
>
> I see, so you use it to estimate size of the working set and effect of 
> bloating
> optimizations on cache size. This sounds interesting. What are you experiences
> with this?

Teresa has done some tunings for the unroller so far. The inliner
tuning is the next step.

>
> What concerns me that it is greatly inaccurate - you have no idea how many
> instructions given counter is guarding and it can differ quite a lot. Also
> inlining/optimization makes working sets significantly different (by factor of
> 100 for tramp3d).

The pre ipa-inline working set is the one that is needed for ipa
inliner tuning. For post-ipa inline code increase transformations,
some update is probably needed.

>But on the ohter hand any solution at this level will be
> greatly inaccurate. So I am curious how reliable data you can get from this?
> How you take this into account for the heuristics?

This effort is just the first step to allow good heuristics to develop.

>
> It seems to me that for this use perhaps the simple logic in histogram merging
> maximizing the number of BBs for given bucket will work well?  It is
> inaccurate, but we are working with greatly inaccurate data anyway.
> Except for degenerated cases, the small and unimportant runs will have small 
> BB
> counts, while large runs will have larger counts and those are ones we 
> optimize
> for anyway.

The working set curve for each type of applications contains lots of
information that can be mined. The inaccuracy can also be mitigated by
more data 'calibration'.

>>
>>
>> >  2) Do we plan to add some features in near future that will anyway 
>> > require global locking?
>> > I guess LIPO itself does not count since it streams its data into 
>> > independent file as you
>> > mentioned earlier and locking LIPO file is not that hard.
>> > Does LIPO stream everything into that common file, or does it use 
>> > combination of gcda files
>> > and common summary?
>>
>> Actually, LIPO module grouping information are stored in gcda files.
>> It is also stored in a separate .imports file (one per object) ---
>> this is primarily used by our build system for dependence information.
>
> I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
> behave
> on GCC bootstrap?

We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
main problem for application build -- the link time (for debug build)
is.

> (i.e. it does a lot more work in the libgcov module per each
> invocation, so I am curious if it is practically useful at all).
>
> With LTO based solution a lot can be probably pushed at link time? Before
> actual GCC starts from the linker plugin, LIPO module can read gcov CFGs from
> gcda files and do all the merging/updating/CFG constructions that is currently
> performed at runtime, right?

The dynamic cgraph build and analysis is still done at runtime.
However, with the new implementation, FE is no longer involved. Gcc
driver is modified to understand module grouping, and lto is used to
merge the streamed output from aux modules.


David

>>
>>
>> >
>> > What other stuff Google plans to merge?
>> > (In general I would be curious about merging plans WRT profile stuff, 
>> > so we get more
>> > synchronized and effective on getting patches in. We have about two 
>> > months to get it done
>> > in stage1 and it would be nice to get as much as possible. Obviously 
>> > some of the patches will
>> > need bit fo dicsussion like this one. Hope you do not find it 
>> > frustrating, I actually think
>> > this is an important feature).
>>
>> We plan to merge in the new LIPO implementation based on LTO
>> streaming. Rong Xu finished this in 4.6 based compiler, and he needs
>> to port it to 4.8.
>
> Good.  Looks like a lot of work ahead. It would be nice if we can perhaps 
> start
> by merging the libgcov infrastructure updates prior the LIPO changes.  From
> what I saw at LIPO branch some time ago it has a lot of stuff that is not
> exactly LIPO specific.
>
> Honza
>>
>>
>> thanks,
>>
>> David
>>
>> >
>> > I also realized today that the common value counters (used by switch, 
>> > indirect
>> > call and div/mod value profiling) are non-stanble WRT different merging 
>> > orders
>> > (i.e.  parallel make in train run).  I do not think there is actual 
>> > solution to
>> > that except for not merging the counter section of this type in lib