Re: symbolic names for processor IDs

2012-09-08 Thread Ulrich Drepper
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak  wrote:
> There are some other cpuid vendor signatures than AMD and Intel,

How about a patch with this complete list?


d-gcc-cpuid-signature
Description: Binary data


Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-08 Thread Georg-Johann Lay

Hans-Peter Nilsson schrieb:

Oleg Endo wrote:

Georg-Johann Lay wrote:

The change is definitely in the right direction, but I wonder
how it helps to fix code bloats of 300%-400% as in PR52543?

I'm not familiar with the AVR parts.
BTW, There was a small change in lower-subreg which required some
adaptations in targets:
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00425.html

See also gcc/config/sh/sh.c (sh_rtx_costs): case SET: ...
Not sure whether it helps in your case.


Those kinds of changes are not required anymore (for most
targets), the fallback rtx_cost was amended to more sanely take
the mode-size into account, and to handle SET.  Whether that is
related to PR52543, I don't know.


Hi,

I found an older compiler version (4.7.1 prerelease) without the
"expand MEM as UNSPEC" hack, and compiled one test case from
PR52543:

long read1 (const __flash1 long *p)
{
return *p;
}

with the following options

$ avr-gcc -mmcu=atmega128 -c foo.c -fdump-rtl-subreg1-details 
-fdump-rtl-jump-details -mlog=rtx_costs,address_cost -Os  -std=gnu99 -dp 
-save-temps


The -mlog= option logs all results from the mentioned target hooks.

What I see is this:

- targetm.address_cost isn't even called once

- In .157r.jump the insn is fine, it reads SI from AS2:

(insn 6 3 7 2 (set (reg:SI 46)
(mem:SI (reg/v/f:HI 44 [ p ]) [2 *p_1(D)+0 S4 A8 AS2])) 
foo.c:10 26 {*movsi}

 (nil))

- In pass .158r.subreg1 this insn is split into 4 movqi.
  This costs 16 instructions instead of 6 without a split.
  Besides that the register pressure goes up.

- Costs for mem:QI/AS2 are called *after* subreg1 split the insn
  forst time from cse1.

- There is not a single call that tries to get the costs for
  mem:SI/AS2 moves

- The only calls that get costs for SI moves look like

avr_rtx_costs[:pass=?]=true (size) total=16, outer=set:
(mem:SI (reg:HI 38 virtual-stack-dynamic) [0 S4 A8])

This cost is for generic address space, not for AS2.
Besides that it calls the hook with the wrong optimization
(speed instead of size as requested by -Os)

Johann




[committed] PA cost update

2012-09-08 Thread John David Anglin
The attached patch improves the cost estimate for multiplication, etc,
on parisc.  It was noted debugging various PRs that the cost for multiplication
of long longs was seriously underestimated.

Tested on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu.
Committed to trunk.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

2012-09-08  John David Anglin  

* config/pa/pa.c (hppa_rtx_costs): Update costs for large integer modes.

Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 191079)
+++ config/pa/pa.c  (working copy)
@@ -1422,6 +1422,8 @@
 hppa_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
int *total, bool speed ATTRIBUTE_UNUSED)
 {
+  int factor;
+
   switch (code)
 {
 case CONST_INT:
@@ -1453,11 +1455,20 @@
 
 case MULT:
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
-*total = COSTS_N_INSNS (3);
-  else if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
-   *total = COSTS_N_INSNS (8);
+   {
+ *total = COSTS_N_INSNS (3);
+ return true;
+   }
+
+  /* A mode size N times larger than SImode needs O(N*N) more insns.  */
+  factor = GET_MODE_SIZE (GET_MODE (x)) / 4;
+  if (factor == 0)
+   factor = 1;
+
+  if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
+   *total = factor * factor * COSTS_N_INSNS (8);
   else
-   *total = COSTS_N_INSNS (20);
+   *total = factor * factor * COSTS_N_INSNS (20);
   return true;
 
 case DIV:
@@ -1471,15 +1482,28 @@
 case UDIV:
 case MOD:
 case UMOD:
-  *total = COSTS_N_INSNS (60);
+  /* A mode size N times larger than SImode needs O(N*N) more insns.  */
+  factor = GET_MODE_SIZE (GET_MODE (x)) / 4;
+  if (factor == 0)
+   factor = 1;
+
+  *total = factor * factor * COSTS_N_INSNS (60);
   return true;
 
 case PLUS: /* this includes shNadd insns */
 case MINUS:
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
-   *total = COSTS_N_INSNS (3);
-  else
-*total = COSTS_N_INSNS (1);
+   {
+ *total = COSTS_N_INSNS (3);
+ return true;
+   }
+
+  /* A size N times larger than UNITS_PER_WORD needs N times as
+many insns, taking N times as long.  */
+  factor = GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD;
+  if (factor == 0)
+   factor = 1;
+  *total = factor * COSTS_N_INSNS (1);
   return true;
 
 case ASHIFT:


Re: [PATCH] Set correct source location for deallocator calls

2012-09-08 Thread Dehao Chen
Hi,

I've added a libjava unittest which verifies that this patch will not
break Java debug info. I've also incorporated Richard's review in the
previous mail. Attached is the new patch, which passed bootstrap and
all gcc/libjava testsuites on x86.

Is it ok for trunk?

Thanks,
Dehao

gcc/ChangeLog:
2012-09-08  Dehao Chen  

 * tree-eh.c (goto_queue_node): New field.
(record_in_goto_queue): New parameter.
(record_in_goto_queue_label): New parameter.
(lower_try_finally_dup_block): New parameter.
(maybe_record_in_goto_queue): Update source location.
(lower_try_finally_copy): Likewise.
(honor_protect_cleanup_actions): Likewise.
* gimplify.c (gimplify_expr): Reset the location to unknown.

gcc/testsuite/ChangeLog:
2012-09-08  Dehao Chen  

* g++.dg/debug/dwarf2/deallocator.C: New test.

libjava/ChangeLog:
2012-09-08  Dehao Chen  

* testsuite/libjava.lang/sourcelocation.java: New cases.
* testsuite/libjava.lang/sourcelocation.out: New cases.
Index: libjava/testsuite/libjava.lang/sourcelocation.java
===
--- libjava/testsuite/libjava.lang/sourcelocation.java  (revision 0)
+++ libjava/testsuite/libjava.lang/sourcelocation.java  (revision 0)
@@ -0,0 +1,18 @@
+/* This test should test the source location attribution.
+   We print the line number of different parts of the program to make sure
+   that the source code attribution is correct.
+   To make this test pass, one need to have up-to-date addr2line installed
+   to parse the dwarf4 data format.
+*/
+public class sourcelocation {
+  public static void main(String args[]) {
+try {
+  System.out.println(new Exception().getStackTrace()[0].getLineNumber());
+  throw new Exception();
+} catch (Exception e) {
+  System.out.println(new Exception().getStackTrace()[0].getLineNumber());
+} finally {
+  System.out.println(new Exception().getStackTrace()[0].getLineNumber());
+}
+  }
+}
Index: libjava/testsuite/libjava.lang/sourcelocation.out
===
--- libjava/testsuite/libjava.lang/sourcelocation.out   (revision 0)
+++ libjava/testsuite/libjava.lang/sourcelocation.out   (revision 0)
@@ -0,0 +1,3 @@
+10
+13
+15
Index: libjava/testsuite/libjava.lang/sourcelocation.jar
===
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream

Property changes on: libjava/testsuite/libjava.lang/sourcelocation.jar
___
Added: svn:mime-type
   + application/octet-stream

Index: gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C
===
--- gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (revision 0)
+++ gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (revision 0)
@@ -0,0 +1,33 @@
+// Test that debug info generated for auto-inserted deallocator is
+// correctly attributed.
+// This patch scans for the lineno directly from assembly, which may
+// differ between different architectures. Because it mainly tests
+// FE generated debug info, without losing generality, only x86
+// assembly is scanned in this test.
+// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
+// { dg-options "-O2 -fno-exceptions -g -dA" }
+
+struct t {
+  t ();
+  ~t ();
+  void foo();
+  void bar();
+};
+
+int bar();
+
+void foo(int i)
+{
+  for (int j = 0; j < 10; j++)
+{
+  t test;
+  test.foo();
+  if (i + j)
+   {
+ test.bar();
+ return;
+   }
+}
+  return;
+}
+// { dg-final { scan-assembler "deallocator.C:28" } }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 191083)
+++ gcc/tree-eh.c   (working copy)
@@ -321,6 +321,7 @@ static bitmap eh_region_may_contain_throw_map;
 struct goto_queue_node
 {
   treemple stmt;
+  location_t location;
   gimple_seq repl_stmt;
   gimple cont_stmt;
   int index;
@@ -560,7 +561,8 @@ static void
 record_in_goto_queue (struct leh_tf_state *tf,
   treemple new_stmt,
   int index,
-  bool is_label)
+  bool is_label,
+ location_t location)
 {
   size_t active, size;
   struct goto_queue_node *q;
@@ -583,6 +585,7 @@ record_in_goto_queue (struct leh_tf_state *tf,
   memset (q, 0, sizeof (*q));
   q->stmt = new_stmt;
   q->index = index;
+  q->location = location;
   q->is_label = is_label;
 }
 
@@ -590,7 +593,8 @@ record_in_goto_queue (struct leh_tf_state *tf,
TF is not null.  */
 
 static void
-record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label)
+record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label,
+   location_t location)
 {
 

Ping: [PATCH] top-level libtool / configure.ac PIC support for AIX

2012-09-08 Thread David Edelsohn
As mentioned in

http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01364.html

GCC now is performing optimizations in binds_local_p() that
necessitates adding -fPIC when compiling objects for shared libraries
on AIX.  The libtool patch has been accepted upstream.

-fPIC creates unique names for some weak symbols, causing spurious
miscompares addressed by the patch to top-level configure.ac.  The
libquadmath patch is another patch waiting for approval.

How do the requirements to regenerate configure interact with the
shared, top-level directories?  The only strange difference I saw in
GCC was the libatomic subdirectory, which had differences for AMTAR
although the same version of autoconf.

Bootstrapped and regression tested on powerpc-ibm-aix5.3.0.0 and
powerpc-ibm-aix7.1.0.0.

Thanks, David

Merge upstream change.
* libtool.m4 (_LT_COMPILER_PIC): Add -fPIC to GCC and GXX for AIX.

* configure.ac: Add target-libquadmath to noconfigdirs for AIX.
Add libgomp*.o to compare_exclusions for AIX.

* configure: Regenerate
* */configure: Regenerate

Index: libtool.m4
===
--- libtool.m4  (revision 191073)
+++ libtool.m4  (working copy)
@@ -3580,6 +3580,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)
@@ -3891,6 +3892,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)
Index: configure.ac
===
--- configure.ac(revision 191073)
+++ configure.ac(working copy)
@@ -527,6 +527,15 @@
 fi
 fi

+# Disable libquadmath for some systems.
+case "${target}" in
+  # libquadmath is unused on AIX and libquadmath build process use of
+  # LD_LIBRARY_PATH can break AIX bootstrap.
+  powerpc-*-aix* | rs6000-*-aix*)
+noconfigdirs="$noconfigdirs target-libquadmath"
+;;
+esac
+
 # Disable libssp for some systems.
 case "${target}" in
   avr-*-*)
@@ -3187,6 +3196,7 @@
 case "$target" in
   hppa*64*-*-hpux*) ;;
   hppa*-*-hpux*) compare_exclusions="gcc/cc*-checksum\$(objext) | */libgcc/lib\
2funcs* | gcc/ada/*tools/*" ;;
+  powerpc*-ibm-aix*) compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/\
*tools/* | *libgomp*\$(objext)" ;;
 esac
 AC_SUBST(compare_exclusions)


Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-08 Thread Hans-Peter Nilsson
Oleg Endo wrote:
> On Thu, 2012-09-06 at 14:41 +0200, Georg-Johann Lay wrote:
> > The change is definitely in the right direction, but I wonder
> > how it helps to fix code bloats of 300%-400% as in PR52543?
>
> I'm not familiar with the AVR parts.
> BTW, There was a small change in lower-subreg which required some
> adaptations in targets:
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00425.html
>
> See also gcc/config/sh/sh.c (sh_rtx_costs): case SET: ...
> Not sure whether it helps in your case.

Those kinds of changes are not required anymore (for most
targets), the fallback rtx_cost was amended to more sanely take
the mode-size into account, and to handle SET.  Whether that is
related to PR52543, I don't know.

brgds, H-P


Re: [PATCH] Fix C++ bootstrap ICE

2012-09-08 Thread Gerald Pfeifer
On Sat, 8 Sep 2012, Mark Kettenis wrote:
> I don't have commit access (or at least I'm not on the write after
> approval list).  Would you (or somebody else) be so kind to commit
> this fix for me?

I'll take care of it.

Gerald


Re: [PATCH] Add -fmem-report-wpa

2012-09-08 Thread Andi Kleen
On Sat, Sep 08, 2012 at 02:37:18PM -0400, Hans-Peter Nilsson wrote:
> On Thu, 6 Sep 2012, Andi Kleen wrote:
> 
> > From: Andi Kleen 
> 
> > Passed bootstrap and testsuite on x86_64.
> 
> > +++ b/gcc/lto/lto.c
> > @@ -2016,6 +2016,8 @@ do_whole_program_analysis (void)
> >/* Show the LTO report before launching LTRANS.  */
> >if (flag_lto_report)
> >  print_lto_report ();
> > +  if (mem_report_wpa)
> > +dump_mem_report ();
> >  }
> 
> Broke build for cris-elf:

Fixed now. Sorry for the complications.
-Andi


Re: [PATCH] Add -fmem-report-wpa

2012-09-08 Thread Hans-Peter Nilsson
On Thu, 6 Sep 2012, Andi Kleen wrote:

> From: Andi Kleen 

> Passed bootstrap and testsuite on x86_64.

> +++ b/gcc/lto/lto.c
> @@ -2016,6 +2016,8 @@ do_whole_program_analysis (void)
>/* Show the LTO report before launching LTRANS.  */
>if (flag_lto_report)
>  print_lto_report ();
> +  if (mem_report_wpa)
> +dump_mem_report ();
>  }

Broke build for cris-elf:
g++ -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE
-fno-exceptions -fno-rtti -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -fno-common
-DHAVE_CONFIG_H -I. -Ilto -I/tmp/hpautotest-gcc0/gcc/gcc
-I/tmp/hpautotest-gcc0/gcc/gcc/lto
-I/tmp/hpautotest-gcc0/gcc/gcc/../include 
-I/tmp/hpautotest-gcc0/gcc/gcc/../libcpp/include
-I/tmp/hpautotest-gcc0/cris-elf/gccobj/./gmp
-I/tmp/hpautotest-gcc0/gcc/gmp -I/tmp/hpautotest-gcc0/cris-elf/gccobj/./mpfr
-I/tmp/hpautotest-gcc0/gcc/mpfr -I/tmp/hpautotest-gcc0/gcc/mpc/src  
-I/tmp/hpautotest-gcc0/gcc/gcc/../libdecnumber
-I/tmp/hpautotest-gcc0/gcc/gcc/../libdecnumber/dpd
-I../libdecnumber/tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c -o lto/lto.o
/tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c: In function 'void 
do_whole_program_analysis()':
/tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c:2020: error: 'dump_mem_report' was not 
declared in this scope
make[2]: *** [lto/lto.o] Error 1

brgds, H-P


Re: [PATCH] Fix C++ bootstrap ICE

2012-09-08 Thread Mark Kettenis
> Date: Fri, 07 Sep 2012 14:12:29 -0400
> From: Jason Merrill 
> 
> OK, thanks.

I don't have commit access (or at least I'm not on the write after
approval list).  Would you (or somebody else) be so kind to commit
this fix for me?

Thanks,

Mark


Re: [Patch, PR 54128] ira.c change to fix mips bootstrap

2012-09-08 Thread Richard Sandiford
Don't have time to look at this in detail, but FWIW:

Steve Ellcey  writes:
> On Wed, 2012-09-05 at 08:15 +0200, Jakub Jelinek wrote:
>> On Fri, Aug 31, 2012 at 10:58:51AM -0700, Steve Ellcey  wrote:
>> > Here is my patch to fix the bootstrap comparision failure (PR 54128) on
>> > MIPS.  The reason for the comparision failure was a difference in
>> > register usage and I tracked it down to build_insn_chain which checked
>> > all instructions for register usage in order to set the dead_or_set
>> > and live_relevant_regs bitmaps instead of checking only non-debug
>> > instructions.  Changing INSN_P to NONDEBUG_INSN_P in build_insn_chain
>> > allowed me to bootstrap and caused no regressions.
>> 
>> The debug insns generally shouldn't extend the lifetime of pseudos (see the
>> valtrack.c stuff), so if you hit this, there is probably some earlier bug
>> that didn't reset/adjust the debug insns in question.
>> I'm not saying the ira.c patch is absolutely a bad idea, but it would be
>> good if you could investigate where those debug insns started extending
>> lifetime of pseudos.
>> 
>> > 2012-08-31  Steve Ellcey  
>> > 
>> >PR bootstrap/54128
>> >* ira.c (build_insn_chain): Check only NONDEBUG instructions for
>> >register usage.
>> 
>>  Jakub
>
> I think I know where this may be going wrong, though I am having trouble
> actually creating a patch.  I think MIPS should define
> TARGET_DELAY_VARTRACK and call variable_tracking_main from mips_reorg.
>
> The systems that define TARGET_DELAY_VARTRACK all have this comment:
>
> /* Variable tracking should be run after all optimizations which
>change order of insns.  It also needs a valid CFG.  */

That isn't possible on delayed-branch targets.  dbr_schedule changes
the order of insns and (necessarily, until someone rewrites it)
runs with the CFG pulled down.

> And I think mips_reorg could change the order of insns.  I have tried
> putting a call to variable_tracking_main in mips_df_reorg (and changed
> mips_cfg_in_reorg to return true if flag_var_tracking is true but that
> didn't fix the problem.  I thought this might be because some of the
> mips_reorg code that comes after mips_df_reorg is still changing
> insn ordering.

Right.  mips_reorg calls dbr_schedule as a subroutine after mips_df_reorg.
dbr_schedule ought to be the last point at which we change the order of the
instructions though (rather than splitting them, inserting hazard nops, etc.).
So if the comment above really is what vartrack expects, it sounds like
this is still a generic dbr_schedule vs. vartrack thing rather than
a mips_reorg vs. vartrack thing.

Richard


Re: symbolic names for processor IDs

2012-09-08 Thread Ulrich Drepper
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak  wrote:
> There are some other cpuid vendor signatures than AMD and Intel, and
> probably there will be some more.

Sure, if there are still people using those they can easily add those.


> IMO, anybody using __get_cpuid_max
> call should define accepted signatures by itself.

According to that argument there should also be no bit_* definitions.
Of course everyone can define them.  It's easy enough but it's also
easy to make mistakes and to not document them.


Re: [patch, mips] New mips triplet for multilib linux builds

2012-09-08 Thread Richard Sandiford
Steve Ellcey  writes:
> On Fri, 2012-09-07 at 07:52 +0100, Richard Sandiford wrote:
>
>> MIPS_ABI_DEFAULT and MIPS_ISA_DEFAULT are better set in config.gcc.
>> That also allows you to handle (say) mipsisa32-mti-linux-gnu vs.
>> mipsisa64-mti-linux-gnu.
>> 
>> I think in general we want more specific header files to come after
>> less specific ones, so that the more specific ones can override
>> whatever they like.  E.g. the order for the generic config/ *.hs
>> is "elfos.h gnu-user.h linux.h" and the order for the MIPS ones
>> is "mips.h gnu-user.h gnu-user64.h linux64.h".  linux-common.h
>> coming after linux64.h is an odd-one-out really.
>> 
>> Richard
>
> OK, here is my latest patch.  The only thing I am not sure about is that
> you asked me to try building with '--with-arch=mips64 --with-float=hard'
> to see if it did something sensible.  I am not sure what sensible is.  I
> am building a cross compiler and when I built with these options the GCC
> would generate mips64 code by default.  But then the build would fail
> because that setup did not match the sysroot that I had built with
> mips32r2 as the default set of libraries.  I don't know if that is
> considered sensible or not.

Well, it sounds a bit odd.  Specs are applied in the order:

  OPTION_DEFAULT_SPEC
  DRIVER_SELF_SPECS
  SYSROOT_SUFFIX_SPEC

so I would have hoped even OPTION_DEFAULT_SPEC -mips64s would cause the
right sysroot suffix to be chosen.

> +#undef DRIVER_SELF_SPECS
> +#define DRIVER_SELF_SPECS\
> +  /* Make sure a -mips option is present.  This helps us to pick \
> + the right multilib, and also makes the later specs easier   
> \
> + to write.  */   \
> +  MIPS_ISA_LEVEL_SPEC,   
> \
> + \
> +  /* Infer the default float setting from -march.  */
> \
> +  MIPS_ARCH_FLOAT_SPEC,  
> \
> + \
> +  /* Infer the -msynci setting from -march if not explicitly set.  */
> \
> +  MIPS_ISA_SYNCI_SPEC,   
> \
> + \
> +  /* Use the standard linux specs for everything else.  */   \
> +  LINUX64_DRIVER_SELF_SPECS

Should add BASE_DRIVER_SELF_SPECS too.  OK with that change, thanks.
And thanks for your patience.

Richard


Re: [ping][PATCH] Power: Reorder a sign-extend RTL pattern for readability

2012-09-08 Thread David Edelsohn
 While examining the Power MD file seeking the explanation for a problem I
saw I have noticed a change in the past separated one of the instruction
splitters from its corresponding instruction pattern.  Several unrelated
patterns were inserted between the two, presumably by accident where the
`patch' tool applied a diff made from an older or modified tree in the
wrong place.

 I find this arrangement confusing, so I propose to move the splitter
back, next to the other pattern.  Here's the intended update.  No
functional change.  OK to apply?

2012-08-10  Maciej W. Rozycki  

gcc/
* config/rs6000/rs6000.md: Move a splitter next to its insn.

This patch is okay.  Yes, the splitter should not have been separated
from the basic pattern. Thanks for helping to clean up the port.

Thanks, David


Re: symbolic names for processor IDs

2012-09-08 Thread Uros Bizjak
Hello!

> The x86 cpuid instruction returns a processor ID and the
> __get_cpuid_max function even explicitly makes the %ebx value directly
> available.  But users of that function have to use a cryptic constant.
>  How about adding a few macros to make this more transparent?

There are some other cpuid vendor signatures than AMD and Intel, and
probably there will be some more. IMO, anybody using __get_cpuid_max
call should define accepted signatures by itself.

Uros.


Re: combine BIT_FIELD_REF and VEC_PERM_EXPR

2012-09-08 Thread Marc Glisse

On Mon, 3 Sep 2012, Richard Guenther wrote:


Please do the early outs where you compute the arguments.  Thus, right
after getting op0 in this case or right after computing n for the n != 1 check.

I think you need to verify that the type of 'op' is actually the element type
of op0.  The BIT_FIELD_REF can happily access elements two and three
of { 1, 2, 3, 4 } as a long for example.  See the BIT_FIELD_REF foldings
in fold-const.c.


On Mon, 3 Sep 2012, Richard Guenther wrote:


If you use fold_build3 you need to check that the result is in expected form
(a is_gimple_invariant or an SSA_NAME).


I first tried this:

+  if (TREE_CODE (tem) != SSA_NAME
+ && TREE_CODE (tem) != BIT_FIELD_REF
+ && !is_gimple_min_invariant (tem))
+   return false;

but then I thought that fold_stmt probably does the right thing (?) so I
switched to it.


Now that I look at this line, I wonder if I am missing some unshare_expr for
p and/or op1.


If either is a CONSTRUCTOR and its def stmt is not removed and it survives
into tem then yes ...


I added an unconditional unshare_expr. I guess it would be possible to 
look at if the permutation is only used once and in that case maybe not 
call unshare_expr (?) and call remove_prop_source_from_use at the end, but 
it gets a bit complicated and I don't think it helps compared to waiting 
for the next DCE pass.



Please also add handling of code == CONSTRUCTOR.


The cases I tried were already handled by fre1. I can add code for
constructor, but I'll need to look for a testcase first. Can that go to a
different patch?


Yes.


This is currently handled by FRE. The only testcase I have that reaches 
forwprop is bit_field_ref (vec_perm_expr (constructor)) and will require 
me to disable this patch so I can test that one...



New version of the patch (I really think I already regtested it, but I'll 
do it again to be sure):


2012-09-08  Marc Glisse  

gcc/
* tree-ssa-forwprop.c (simplify_bitfield): New function.
(ssa_forward_propagate_and_combine): Call it.

gcc/testsuite/
* gcc.dg/tree-ssa/forwprop-21.c: New testcase.

--
Marc GlisseIndex: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 191089)
+++ tree-ssa-forwprop.c (working copy)
@@ -2567,20 +2567,92 @@ combine_conversions (gimple_stmt_iterato
  gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
  update_stmt (stmt);
  return remove_prop_source_from_use (op0) ? 2 : 1;
}
}
 }
 
   return 0;
 }
 
+/* Combine an element access with a shuffle.  Returns true if there were
+   any changes made, else it returns false.  */
+ 
+static bool
+simplify_bitfield (gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  gimple def_stmt;
+  tree op, op0, op1, op2;
+  tree elem_type;
+  unsigned idx, n, size;
+  enum tree_code code;
+
+  op = gimple_assign_rhs1 (stmt);
+  gcc_checking_assert (TREE_CODE (op) == BIT_FIELD_REF);
+
+  op0 = TREE_OPERAND (op, 0);
+  if (TREE_CODE (op0) != SSA_NAME
+  || TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE)
+return false;
+
+  elem_type = TREE_TYPE (TREE_TYPE (op0));
+  if (TREE_TYPE (op) != elem_type)
+return false;
+
+  size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+  op1 = TREE_OPERAND (op, 1);
+  n = TREE_INT_CST_LOW (op1) / size;
+  if (n != 1)
+return false;
+
+  def_stmt = SSA_NAME_DEF_STMT (op0);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+  || !can_propagate_from (def_stmt))
+return false;
+
+  op2 = TREE_OPERAND (op, 2);
+  idx = TREE_INT_CST_LOW (op2) / size;
+
+  code = gimple_assign_rhs_code (def_stmt);
+
+  if (code == VEC_PERM_EXPR)
+{
+  tree p, m, index, tem;
+  unsigned nelts;
+  m = gimple_assign_rhs3 (def_stmt);
+  if (TREE_CODE (m) != VECTOR_CST)
+   return false;
+  nelts = VECTOR_CST_NELTS (m);
+  idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx));
+  idx %= 2 * nelts;
+  if (idx < nelts)
+   {
+ p = gimple_assign_rhs1 (def_stmt);
+   }
+  else
+   {
+ p = gimple_assign_rhs2 (def_stmt);
+ idx -= nelts;
+   }
+  index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
+  tem = build3 (BIT_FIELD_REF, TREE_TYPE (op),
+unshare_expr (p), op1, index);
+  gimple_assign_set_rhs1 (stmt, tem);
+  fold_stmt (gsi);
+  update_stmt (gsi_stmt (*gsi));
+  return true;
+}
+
+  return false;
+}
+
 /* Determine whether applying the 2 permutations (mask1 then mask2)
gives back one of the input.  */
 
 static int
 is_combined_permutation_identity (tree mask1, tree mask2)
 {
   tree mask;
   unsigned int nelts, i, j;
   bool maybe_identity1 = true;
   bool maybe_identity2 = true;
@@ -2825,20 +2897,22 @@ ssa_forward_propagate_and_combine (void)
  cfg_changed = true;
changed = did_something != 0;
  

Re: combine vec_perm_expr with constructor

2012-09-08 Thread Marc Glisse

On Mon, 3 Sep 2012, Richard Guenther wrote:


You do work above and then bail late here.  Always do early exists early
to reduce useless compile-time.

[...]

You need to verify that fold_ternary returns something that is valid GIMPLE.
fold () in general happily returns trees that are in the need of
re-gimplification.
You expect a CONSTRUCTOR or VECTOR_CST here, so you should check
for that.


Hello,

here is a new version of the patch, again tested on x86_64-linux-gnu.

2012-09-08  Marc Glisse  

gcc/
* tree-ssa-forwprop.c (simplify_permutation): Handle CONSTRUCTOR.

gcc/testsuite/
* gcc.dg/tree-ssa/forwprop-20.c: New testcase.


--
Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0)
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target double64 } */
+/* { dg-options "-O2 -fdump-tree-optimized" }  */
+
+#include 
+
+/* All of these optimizations happen for unsupported vector modes as a
+   consequence of the lowering pass. We need to test with a vector mode
+   that is supported by default on at least some architectures, or make
+   the test target specific so we can pass a flag like -mavx.  */
+
+typedef double vecf __attribute__ ((vector_size (2 * sizeof (double;
+typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t;
+
+void f (double d, vecf* r)
+{
+  vecf x = { -d, 5 };
+  vecf y = {  1, 4 };
+  veci m = {  2, 0 };
+  *r = __builtin_shuffle (x, y, m); // { 1, -d }
+}
+
+void g (float d, vecf* r)
+{
+  vecf x = { d, 5 };
+  vecf y = { 1, 4 };
+  veci m = { 2, 1 };
+  *r = __builtin_shuffle (x, y, m); // { 1, 5 }
+}
+
+void h (double d, vecf* r)
+{
+  vecf x = { d + 1, 5 };
+  vecf y = {   1  , 4 };
+  veci m = {   2  , 0 };
+  *r = __builtin_shuffle (y, x, m); // { d + 1, 1 }
+}
+
+void i (float d, vecf* r)
+{
+  vecf x = { d, 5 };
+  veci m = { 1, 0 };
+  *r = __builtin_shuffle (x, m); // { 5, d }
+}
+
+void j (vecf* r)
+{
+  vecf y = {  1, 2 };
+  veci m = {  0, 0 };
+  *r = __builtin_shuffle (y, m); // { 1, 1 }
+}
+
+void k (vecf* r)
+{
+  vecf x = {  3, 4 };
+  vecf y = {  1, 2 };
+  veci m = {  3, 0 };
+  *r = __builtin_shuffle (x, y, m); // { 2, 3 }
+}
+
+void l (double d, vecf* r)
+{
+  vecf x = { -d, 5 };
+  vecf y = {  d, 4 };
+  veci m = {  2, 0 };
+  *r = __builtin_shuffle (x, y, m); // { d, -d }
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c
___
Added: svn:eol-style
   + native
Added: svn:keywords
   + Author Date Id Revision URL

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 191082)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -2599,75 +2599,134 @@ is_combined_permutation_identity (tree m
   if (j == i)
maybe_identity2 = false;
   else if (j == i + nelts)
maybe_identity1 = false;
   else
return 0;
 }
   return maybe_identity1 ? 1 : maybe_identity2 ? 2 : 0;
 }
 
-/* Combine two shuffles in a row.  Returns 1 if there were any changes
-   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
+/* Combine a shuffle with its arguments.  Returns 1 if there were any
+   changes made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
  
 static int
 simplify_permutation (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
   gimple def_stmt;
-  tree op0, op1, op2, op3;
-  enum tree_code code = gimple_assign_rhs_code (stmt);
-  enum tree_code code2;
+  tree op0, op1, op2, op3, arg0, arg1;
+  enum tree_code code;
 
-  gcc_checking_assert (code == VEC_PERM_EXPR);
+  gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR);
 
   op0 = gimple_assign_rhs1 (stmt);
   op1 = gimple_assign_rhs2 (stmt);
   op2 = gimple_assign_rhs3 (stmt);
 
-  if (TREE_CODE (op0) != SSA_NAME)
-return 0;
-
   if (TREE_CODE (op2) != VECTOR_CST)
 return 0;
 
-  if (op0 != op1)
-return 0;
+  if (TREE_CODE (op0) == VECTOR_CST)
+{
+  code = VECTOR_CST;
+  arg0 = op0;
+}
+  else if (TREE_CODE (op0) == SSA_NAME)
+{
+  def_stmt = SSA_NAME_DEF_STMT (op0);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+ || !can_propagate_from (def_stmt))
+   return 0;
 
-  def_stmt = SSA_NAME_DEF_STMT (op0);
-  if (!def_stmt || !is_gimple_assign (def_stmt)
-  || !can_propagate_from (def_stmt))
+  code = gimple_assign_rhs_code (def_stmt);
+  arg0 = gimple_assign_rhs1 (def_stmt);
+}
+  else
 return 0;
 
-  code2 = gimple_assign_rhs_code (def_stmt);
-
   /* Two consecutive shuffles.  */
-  if (code2 == VEC_PERM_EXPR)
+  if (code == VEC_PERM_E