Re: symbolic names for processor IDs
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak wrote: > There are some other cpuid vendor signatures than AMD and Intel, How about a patch with this complete list? d-gcc-cpuid-signature Description: Binary data
Re: [middle-end] Add machine_mode to address_cost target hook
Hans-Peter Nilsson schrieb: Oleg Endo wrote: Georg-Johann Lay wrote: The change is definitely in the right direction, but I wonder how it helps to fix code bloats of 300%-400% as in PR52543? I'm not familiar with the AVR parts. BTW, There was a small change in lower-subreg which required some adaptations in targets: http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00425.html See also gcc/config/sh/sh.c (sh_rtx_costs): case SET: ... Not sure whether it helps in your case. Those kinds of changes are not required anymore (for most targets), the fallback rtx_cost was amended to more sanely take the mode-size into account, and to handle SET. Whether that is related to PR52543, I don't know. Hi, I found an older compiler version (4.7.1 prerelease) without the "expand MEM as UNSPEC" hack, and compiled one test case from PR52543: long read1 (const __flash1 long *p) { return *p; } with the following options $ avr-gcc -mmcu=atmega128 -c foo.c -fdump-rtl-subreg1-details -fdump-rtl-jump-details -mlog=rtx_costs,address_cost -Os -std=gnu99 -dp -save-temps The -mlog= option logs all results from the mentioned target hooks. What I see is this: - targetm.address_cost isn't even called once - In .157r.jump the insn is fine, it reads SI from AS2: (insn 6 3 7 2 (set (reg:SI 46) (mem:SI (reg/v/f:HI 44 [ p ]) [2 *p_1(D)+0 S4 A8 AS2])) foo.c:10 26 {*movsi} (nil)) - In pass .158r.subreg1 this insn is split into 4 movqi. This costs 16 instructions instead of 6 without a split. Besides that the register pressure goes up. - Costs for mem:QI/AS2 are called *after* subreg1 split the insn forst time from cse1. - There is not a single call that tries to get the costs for mem:SI/AS2 moves - The only calls that get costs for SI moves look like avr_rtx_costs[:pass=?]=true (size) total=16, outer=set: (mem:SI (reg:HI 38 virtual-stack-dynamic) [0 S4 A8]) This cost is for generic address space, not for AS2. Besides that it calls the hook with the wrong optimization (speed instead of size as requested by -Os) Johann
[committed] PA cost update
The attached patch improves the cost estimate for multiplication, etc, on parisc. It was noted debugging various PRs that the cost for multiplication of long longs was seriously underestimated. Tested on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu. Committed to trunk. Dave -- J. David Anglin dave.ang...@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) 2012-09-08 John David Anglin * config/pa/pa.c (hppa_rtx_costs): Update costs for large integer modes. Index: config/pa/pa.c === --- config/pa/pa.c (revision 191079) +++ config/pa/pa.c (working copy) @@ -1422,6 +1422,8 @@ hppa_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED, int *total, bool speed ATTRIBUTE_UNUSED) { + int factor; + switch (code) { case CONST_INT: @@ -1453,11 +1455,20 @@ case MULT: if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT) -*total = COSTS_N_INSNS (3); - else if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT) - *total = COSTS_N_INSNS (8); + { + *total = COSTS_N_INSNS (3); + return true; + } + + /* A mode size N times larger than SImode needs O(N*N) more insns. */ + factor = GET_MODE_SIZE (GET_MODE (x)) / 4; + if (factor == 0) + factor = 1; + + if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT) + *total = factor * factor * COSTS_N_INSNS (8); else - *total = COSTS_N_INSNS (20); + *total = factor * factor * COSTS_N_INSNS (20); return true; case DIV: @@ -1471,15 +1482,28 @@ case UDIV: case MOD: case UMOD: - *total = COSTS_N_INSNS (60); + /* A mode size N times larger than SImode needs O(N*N) more insns. */ + factor = GET_MODE_SIZE (GET_MODE (x)) / 4; + if (factor == 0) + factor = 1; + + *total = factor * factor * COSTS_N_INSNS (60); return true; case PLUS: /* this includes shNadd insns */ case MINUS: if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT) - *total = COSTS_N_INSNS (3); - else -*total = COSTS_N_INSNS (1); + { + *total = COSTS_N_INSNS (3); + return true; + } + + /* A size N times larger than UNITS_PER_WORD needs N times as +many insns, taking N times as long. */ + factor = GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD; + if (factor == 0) + factor = 1; + *total = factor * COSTS_N_INSNS (1); return true; case ASHIFT:
Re: [PATCH] Set correct source location for deallocator calls
Hi, I've added a libjava unittest which verifies that this patch will not break Java debug info. I've also incorporated Richard's review in the previous mail. Attached is the new patch, which passed bootstrap and all gcc/libjava testsuites on x86. Is it ok for trunk? Thanks, Dehao gcc/ChangeLog: 2012-09-08 Dehao Chen * tree-eh.c (goto_queue_node): New field. (record_in_goto_queue): New parameter. (record_in_goto_queue_label): New parameter. (lower_try_finally_dup_block): New parameter. (maybe_record_in_goto_queue): Update source location. (lower_try_finally_copy): Likewise. (honor_protect_cleanup_actions): Likewise. * gimplify.c (gimplify_expr): Reset the location to unknown. gcc/testsuite/ChangeLog: 2012-09-08 Dehao Chen * g++.dg/debug/dwarf2/deallocator.C: New test. libjava/ChangeLog: 2012-09-08 Dehao Chen * testsuite/libjava.lang/sourcelocation.java: New cases. * testsuite/libjava.lang/sourcelocation.out: New cases. Index: libjava/testsuite/libjava.lang/sourcelocation.java === --- libjava/testsuite/libjava.lang/sourcelocation.java (revision 0) +++ libjava/testsuite/libjava.lang/sourcelocation.java (revision 0) @@ -0,0 +1,18 @@ +/* This test should test the source location attribution. + We print the line number of different parts of the program to make sure + that the source code attribution is correct. + To make this test pass, one need to have up-to-date addr2line installed + to parse the dwarf4 data format. +*/ +public class sourcelocation { + public static void main(String args[]) { +try { + System.out.println(new Exception().getStackTrace()[0].getLineNumber()); + throw new Exception(); +} catch (Exception e) { + System.out.println(new Exception().getStackTrace()[0].getLineNumber()); +} finally { + System.out.println(new Exception().getStackTrace()[0].getLineNumber()); +} + } +} Index: libjava/testsuite/libjava.lang/sourcelocation.out === --- libjava/testsuite/libjava.lang/sourcelocation.out (revision 0) +++ libjava/testsuite/libjava.lang/sourcelocation.out (revision 0) @@ -0,0 +1,3 @@ +10 +13 +15 Index: libjava/testsuite/libjava.lang/sourcelocation.jar === Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Property changes on: libjava/testsuite/libjava.lang/sourcelocation.jar ___ Added: svn:mime-type + application/octet-stream Index: gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C === --- gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (revision 0) +++ gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (revision 0) @@ -0,0 +1,33 @@ +// Test that debug info generated for auto-inserted deallocator is +// correctly attributed. +// This patch scans for the lineno directly from assembly, which may +// differ between different architectures. Because it mainly tests +// FE generated debug info, without losing generality, only x86 +// assembly is scanned in this test. +// { dg-do compile { target { i?86-*-* x86_64-*-* } } } +// { dg-options "-O2 -fno-exceptions -g -dA" } + +struct t { + t (); + ~t (); + void foo(); + void bar(); +}; + +int bar(); + +void foo(int i) +{ + for (int j = 0; j < 10; j++) +{ + t test; + test.foo(); + if (i + j) + { + test.bar(); + return; + } +} + return; +} +// { dg-final { scan-assembler "deallocator.C:28" } } Index: gcc/tree-eh.c === --- gcc/tree-eh.c (revision 191083) +++ gcc/tree-eh.c (working copy) @@ -321,6 +321,7 @@ static bitmap eh_region_may_contain_throw_map; struct goto_queue_node { treemple stmt; + location_t location; gimple_seq repl_stmt; gimple cont_stmt; int index; @@ -560,7 +561,8 @@ static void record_in_goto_queue (struct leh_tf_state *tf, treemple new_stmt, int index, - bool is_label) + bool is_label, + location_t location) { size_t active, size; struct goto_queue_node *q; @@ -583,6 +585,7 @@ record_in_goto_queue (struct leh_tf_state *tf, memset (q, 0, sizeof (*q)); q->stmt = new_stmt; q->index = index; + q->location = location; q->is_label = is_label; } @@ -590,7 +593,8 @@ record_in_goto_queue (struct leh_tf_state *tf, TF is not null. */ static void -record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label) +record_in_goto_queue_label (struct leh_tf_state *tf, treemple stmt, tree label, + location_t location) {
Ping: [PATCH] top-level libtool / configure.ac PIC support for AIX
As mentioned in http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01364.html GCC now is performing optimizations in binds_local_p() that necessitates adding -fPIC when compiling objects for shared libraries on AIX. The libtool patch has been accepted upstream. -fPIC creates unique names for some weak symbols, causing spurious miscompares addressed by the patch to top-level configure.ac. The libquadmath patch is another patch waiting for approval. How do the requirements to regenerate configure interact with the shared, top-level directories? The only strange difference I saw in GCC was the libatomic subdirectory, which had differences for AMTAR although the same version of autoconf. Bootstrapped and regression tested on powerpc-ibm-aix5.3.0.0 and powerpc-ibm-aix7.1.0.0. Thanks, David Merge upstream change. * libtool.m4 (_LT_COMPILER_PIC): Add -fPIC to GCC and GXX for AIX. * configure.ac: Add target-libquadmath to noconfigdirs for AIX. Add libgomp*.o to compare_exclusions for AIX. * configure: Regenerate * */configure: Regenerate Index: libtool.m4 === --- libtool.m4 (revision 191073) +++ libtool.m4 (working copy) @@ -3580,6 +3580,7 @@ # AIX 5 now supports IA64 processor _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' fi + _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' ;; amigaos*) @@ -3891,6 +3892,7 @@ # AIX 5 now supports IA64 processor _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' fi + _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' ;; amigaos*) Index: configure.ac === --- configure.ac(revision 191073) +++ configure.ac(working copy) @@ -527,6 +527,15 @@ fi fi +# Disable libquadmath for some systems. +case "${target}" in + # libquadmath is unused on AIX and libquadmath build process use of + # LD_LIBRARY_PATH can break AIX bootstrap. + powerpc-*-aix* | rs6000-*-aix*) +noconfigdirs="$noconfigdirs target-libquadmath" +;; +esac + # Disable libssp for some systems. case "${target}" in avr-*-*) @@ -3187,6 +3196,7 @@ case "$target" in hppa*64*-*-hpux*) ;; hppa*-*-hpux*) compare_exclusions="gcc/cc*-checksum\$(objext) | */libgcc/lib\ 2funcs* | gcc/ada/*tools/*" ;; + powerpc*-ibm-aix*) compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/\ *tools/* | *libgomp*\$(objext)" ;; esac AC_SUBST(compare_exclusions)
Re: [middle-end] Add machine_mode to address_cost target hook
Oleg Endo wrote: > On Thu, 2012-09-06 at 14:41 +0200, Georg-Johann Lay wrote: > > The change is definitely in the right direction, but I wonder > > how it helps to fix code bloats of 300%-400% as in PR52543? > > I'm not familiar with the AVR parts. > BTW, There was a small change in lower-subreg which required some > adaptations in targets: > http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00425.html > > See also gcc/config/sh/sh.c (sh_rtx_costs): case SET: ... > Not sure whether it helps in your case. Those kinds of changes are not required anymore (for most targets), the fallback rtx_cost was amended to more sanely take the mode-size into account, and to handle SET. Whether that is related to PR52543, I don't know. brgds, H-P
Re: [PATCH] Fix C++ bootstrap ICE
On Sat, 8 Sep 2012, Mark Kettenis wrote: > I don't have commit access (or at least I'm not on the write after > approval list). Would you (or somebody else) be so kind to commit > this fix for me? I'll take care of it. Gerald
Re: [PATCH] Add -fmem-report-wpa
On Sat, Sep 08, 2012 at 02:37:18PM -0400, Hans-Peter Nilsson wrote: > On Thu, 6 Sep 2012, Andi Kleen wrote: > > > From: Andi Kleen > > > Passed bootstrap and testsuite on x86_64. > > > +++ b/gcc/lto/lto.c > > @@ -2016,6 +2016,8 @@ do_whole_program_analysis (void) > >/* Show the LTO report before launching LTRANS. */ > >if (flag_lto_report) > > print_lto_report (); > > + if (mem_report_wpa) > > +dump_mem_report (); > > } > > Broke build for cris-elf: Fixed now. Sorry for the complications. -Andi
Re: [PATCH] Add -fmem-report-wpa
On Thu, 6 Sep 2012, Andi Kleen wrote: > From: Andi Kleen > Passed bootstrap and testsuite on x86_64. > +++ b/gcc/lto/lto.c > @@ -2016,6 +2016,8 @@ do_whole_program_analysis (void) >/* Show the LTO report before launching LTRANS. */ >if (flag_lto_report) > print_lto_report (); > + if (mem_report_wpa) > +dump_mem_report (); > } Broke build for cris-elf: g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -Ilto -I/tmp/hpautotest-gcc0/gcc/gcc -I/tmp/hpautotest-gcc0/gcc/gcc/lto -I/tmp/hpautotest-gcc0/gcc/gcc/../include -I/tmp/hpautotest-gcc0/gcc/gcc/../libcpp/include -I/tmp/hpautotest-gcc0/cris-elf/gccobj/./gmp -I/tmp/hpautotest-gcc0/gcc/gmp -I/tmp/hpautotest-gcc0/cris-elf/gccobj/./mpfr -I/tmp/hpautotest-gcc0/gcc/mpfr -I/tmp/hpautotest-gcc0/gcc/mpc/src -I/tmp/hpautotest-gcc0/gcc/gcc/../libdecnumber -I/tmp/hpautotest-gcc0/gcc/gcc/../libdecnumber/dpd -I../libdecnumber/tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c -o lto/lto.o /tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c: In function 'void do_whole_program_analysis()': /tmp/hpautotest-gcc0/gcc/gcc/lto/lto.c:2020: error: 'dump_mem_report' was not declared in this scope make[2]: *** [lto/lto.o] Error 1 brgds, H-P
Re: [PATCH] Fix C++ bootstrap ICE
> Date: Fri, 07 Sep 2012 14:12:29 -0400 > From: Jason Merrill > > OK, thanks. I don't have commit access (or at least I'm not on the write after approval list). Would you (or somebody else) be so kind to commit this fix for me? Thanks, Mark
Re: [Patch, PR 54128] ira.c change to fix mips bootstrap
Don't have time to look at this in detail, but FWIW: Steve Ellcey writes: > On Wed, 2012-09-05 at 08:15 +0200, Jakub Jelinek wrote: >> On Fri, Aug 31, 2012 at 10:58:51AM -0700, Steve Ellcey wrote: >> > Here is my patch to fix the bootstrap comparision failure (PR 54128) on >> > MIPS. The reason for the comparision failure was a difference in >> > register usage and I tracked it down to build_insn_chain which checked >> > all instructions for register usage in order to set the dead_or_set >> > and live_relevant_regs bitmaps instead of checking only non-debug >> > instructions. Changing INSN_P to NONDEBUG_INSN_P in build_insn_chain >> > allowed me to bootstrap and caused no regressions. >> >> The debug insns generally shouldn't extend the lifetime of pseudos (see the >> valtrack.c stuff), so if you hit this, there is probably some earlier bug >> that didn't reset/adjust the debug insns in question. >> I'm not saying the ira.c patch is absolutely a bad idea, but it would be >> good if you could investigate where those debug insns started extending >> lifetime of pseudos. >> >> > 2012-08-31 Steve Ellcey >> > >> >PR bootstrap/54128 >> >* ira.c (build_insn_chain): Check only NONDEBUG instructions for >> >register usage. >> >> Jakub > > I think I know where this may be going wrong, though I am having trouble > actually creating a patch. I think MIPS should define > TARGET_DELAY_VARTRACK and call variable_tracking_main from mips_reorg. > > The systems that define TARGET_DELAY_VARTRACK all have this comment: > > /* Variable tracking should be run after all optimizations which >change order of insns. It also needs a valid CFG. */ That isn't possible on delayed-branch targets. dbr_schedule changes the order of insns and (necessarily, until someone rewrites it) runs with the CFG pulled down. > And I think mips_reorg could change the order of insns. I have tried > putting a call to variable_tracking_main in mips_df_reorg (and changed > mips_cfg_in_reorg to return true if flag_var_tracking is true but that > didn't fix the problem. I thought this might be because some of the > mips_reorg code that comes after mips_df_reorg is still changing > insn ordering. Right. mips_reorg calls dbr_schedule as a subroutine after mips_df_reorg. dbr_schedule ought to be the last point at which we change the order of the instructions though (rather than splitting them, inserting hazard nops, etc.). So if the comment above really is what vartrack expects, it sounds like this is still a generic dbr_schedule vs. vartrack thing rather than a mips_reorg vs. vartrack thing. Richard
Re: symbolic names for processor IDs
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak wrote: > There are some other cpuid vendor signatures than AMD and Intel, and > probably there will be some more. Sure, if there are still people using those they can easily add those. > IMO, anybody using __get_cpuid_max > call should define accepted signatures by itself. According to that argument there should also be no bit_* definitions. Of course everyone can define them. It's easy enough but it's also easy to make mistakes and to not document them.
Re: [patch, mips] New mips triplet for multilib linux builds
Steve Ellcey writes: > On Fri, 2012-09-07 at 07:52 +0100, Richard Sandiford wrote: > >> MIPS_ABI_DEFAULT and MIPS_ISA_DEFAULT are better set in config.gcc. >> That also allows you to handle (say) mipsisa32-mti-linux-gnu vs. >> mipsisa64-mti-linux-gnu. >> >> I think in general we want more specific header files to come after >> less specific ones, so that the more specific ones can override >> whatever they like. E.g. the order for the generic config/ *.hs >> is "elfos.h gnu-user.h linux.h" and the order for the MIPS ones >> is "mips.h gnu-user.h gnu-user64.h linux64.h". linux-common.h >> coming after linux64.h is an odd-one-out really. >> >> Richard > > OK, here is my latest patch. The only thing I am not sure about is that > you asked me to try building with '--with-arch=mips64 --with-float=hard' > to see if it did something sensible. I am not sure what sensible is. I > am building a cross compiler and when I built with these options the GCC > would generate mips64 code by default. But then the build would fail > because that setup did not match the sysroot that I had built with > mips32r2 as the default set of libraries. I don't know if that is > considered sensible or not. Well, it sounds a bit odd. Specs are applied in the order: OPTION_DEFAULT_SPEC DRIVER_SELF_SPECS SYSROOT_SUFFIX_SPEC so I would have hoped even OPTION_DEFAULT_SPEC -mips64s would cause the right sysroot suffix to be chosen. > +#undef DRIVER_SELF_SPECS > +#define DRIVER_SELF_SPECS\ > + /* Make sure a -mips option is present. This helps us to pick \ > + the right multilib, and also makes the later specs easier > \ > + to write. */ \ > + MIPS_ISA_LEVEL_SPEC, > \ > + \ > + /* Infer the default float setting from -march. */ > \ > + MIPS_ARCH_FLOAT_SPEC, > \ > + \ > + /* Infer the -msynci setting from -march if not explicitly set. */ > \ > + MIPS_ISA_SYNCI_SPEC, > \ > + \ > + /* Use the standard linux specs for everything else. */ \ > + LINUX64_DRIVER_SELF_SPECS Should add BASE_DRIVER_SELF_SPECS too. OK with that change, thanks. And thanks for your patience. Richard
Re: [ping][PATCH] Power: Reorder a sign-extend RTL pattern for readability
While examining the Power MD file seeking the explanation for a problem I saw I have noticed a change in the past separated one of the instruction splitters from its corresponding instruction pattern. Several unrelated patterns were inserted between the two, presumably by accident where the `patch' tool applied a diff made from an older or modified tree in the wrong place. I find this arrangement confusing, so I propose to move the splitter back, next to the other pattern. Here's the intended update. No functional change. OK to apply? 2012-08-10 Maciej W. Rozycki gcc/ * config/rs6000/rs6000.md: Move a splitter next to its insn. This patch is okay. Yes, the splitter should not have been separated from the basic pattern. Thanks for helping to clean up the port. Thanks, David
Re: symbolic names for processor IDs
Hello! > The x86 cpuid instruction returns a processor ID and the > __get_cpuid_max function even explicitly makes the %ebx value directly > available. But users of that function have to use a cryptic constant. > How about adding a few macros to make this more transparent? There are some other cpuid vendor signatures than AMD and Intel, and probably there will be some more. IMO, anybody using __get_cpuid_max call should define accepted signatures by itself. Uros.
Re: combine BIT_FIELD_REF and VEC_PERM_EXPR
On Mon, 3 Sep 2012, Richard Guenther wrote: Please do the early outs where you compute the arguments. Thus, right after getting op0 in this case or right after computing n for the n != 1 check. I think you need to verify that the type of 'op' is actually the element type of op0. The BIT_FIELD_REF can happily access elements two and three of { 1, 2, 3, 4 } as a long for example. See the BIT_FIELD_REF foldings in fold-const.c. On Mon, 3 Sep 2012, Richard Guenther wrote: If you use fold_build3 you need to check that the result is in expected form (a is_gimple_invariant or an SSA_NAME). I first tried this: + if (TREE_CODE (tem) != SSA_NAME + && TREE_CODE (tem) != BIT_FIELD_REF + && !is_gimple_min_invariant (tem)) + return false; but then I thought that fold_stmt probably does the right thing (?) so I switched to it. Now that I look at this line, I wonder if I am missing some unshare_expr for p and/or op1. If either is a CONSTRUCTOR and its def stmt is not removed and it survives into tem then yes ... I added an unconditional unshare_expr. I guess it would be possible to look at if the permutation is only used once and in that case maybe not call unshare_expr (?) and call remove_prop_source_from_use at the end, but it gets a bit complicated and I don't think it helps compared to waiting for the next DCE pass. Please also add handling of code == CONSTRUCTOR. The cases I tried were already handled by fre1. I can add code for constructor, but I'll need to look for a testcase first. Can that go to a different patch? Yes. This is currently handled by FRE. The only testcase I have that reaches forwprop is bit_field_ref (vec_perm_expr (constructor)) and will require me to disable this patch so I can test that one... New version of the patch (I really think I already regtested it, but I'll do it again to be sure): 2012-09-08 Marc Glisse gcc/ * tree-ssa-forwprop.c (simplify_bitfield): New function. (ssa_forward_propagate_and_combine): Call it. gcc/testsuite/ * gcc.dg/tree-ssa/forwprop-21.c: New testcase. -- Marc GlisseIndex: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 191089) +++ tree-ssa-forwprop.c (working copy) @@ -2567,20 +2567,92 @@ combine_conversions (gimple_stmt_iterato gimple_assign_set_rhs_code (stmt, CONVERT_EXPR); update_stmt (stmt); return remove_prop_source_from_use (op0) ? 2 : 1; } } } return 0; } +/* Combine an element access with a shuffle. Returns true if there were + any changes made, else it returns false. */ + +static bool +simplify_bitfield (gimple_stmt_iterator *gsi) +{ + gimple stmt = gsi_stmt (*gsi); + gimple def_stmt; + tree op, op0, op1, op2; + tree elem_type; + unsigned idx, n, size; + enum tree_code code; + + op = gimple_assign_rhs1 (stmt); + gcc_checking_assert (TREE_CODE (op) == BIT_FIELD_REF); + + op0 = TREE_OPERAND (op, 0); + if (TREE_CODE (op0) != SSA_NAME + || TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE) +return false; + + elem_type = TREE_TYPE (TREE_TYPE (op0)); + if (TREE_TYPE (op) != elem_type) +return false; + + size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); + op1 = TREE_OPERAND (op, 1); + n = TREE_INT_CST_LOW (op1) / size; + if (n != 1) +return false; + + def_stmt = SSA_NAME_DEF_STMT (op0); + if (!def_stmt || !is_gimple_assign (def_stmt) + || !can_propagate_from (def_stmt)) +return false; + + op2 = TREE_OPERAND (op, 2); + idx = TREE_INT_CST_LOW (op2) / size; + + code = gimple_assign_rhs_code (def_stmt); + + if (code == VEC_PERM_EXPR) +{ + tree p, m, index, tem; + unsigned nelts; + m = gimple_assign_rhs3 (def_stmt); + if (TREE_CODE (m) != VECTOR_CST) + return false; + nelts = VECTOR_CST_NELTS (m); + idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx)); + idx %= 2 * nelts; + if (idx < nelts) + { + p = gimple_assign_rhs1 (def_stmt); + } + else + { + p = gimple_assign_rhs2 (def_stmt); + idx -= nelts; + } + index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size); + tem = build3 (BIT_FIELD_REF, TREE_TYPE (op), +unshare_expr (p), op1, index); + gimple_assign_set_rhs1 (stmt, tem); + fold_stmt (gsi); + update_stmt (gsi_stmt (*gsi)); + return true; +} + + return false; +} + /* Determine whether applying the 2 permutations (mask1 then mask2) gives back one of the input. */ static int is_combined_permutation_identity (tree mask1, tree mask2) { tree mask; unsigned int nelts, i, j; bool maybe_identity1 = true; bool maybe_identity2 = true; @@ -2825,20 +2897,22 @@ ssa_forward_propagate_and_combine (void) cfg_changed = true; changed = did_something != 0;
Re: combine vec_perm_expr with constructor
On Mon, 3 Sep 2012, Richard Guenther wrote: You do work above and then bail late here. Always do early exists early to reduce useless compile-time. [...] You need to verify that fold_ternary returns something that is valid GIMPLE. fold () in general happily returns trees that are in the need of re-gimplification. You expect a CONSTRUCTOR or VECTOR_CST here, so you should check for that. Hello, here is a new version of the patch, again tested on x86_64-linux-gnu. 2012-09-08 Marc Glisse gcc/ * tree-ssa-forwprop.c (simplify_permutation): Handle CONSTRUCTOR. gcc/testsuite/ * gcc.dg/tree-ssa/forwprop-20.c: New testcase. -- Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c === --- gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 0) @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target double64 } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +/* All of these optimizations happen for unsupported vector modes as a + consequence of the lowering pass. We need to test with a vector mode + that is supported by default on at least some architectures, or make + the test target specific so we can pass a flag like -mavx. */ + +typedef double vecf __attribute__ ((vector_size (2 * sizeof (double; +typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t; + +void f (double d, vecf* r) +{ + vecf x = { -d, 5 }; + vecf y = { 1, 4 }; + veci m = { 2, 0 }; + *r = __builtin_shuffle (x, y, m); // { 1, -d } +} + +void g (float d, vecf* r) +{ + vecf x = { d, 5 }; + vecf y = { 1, 4 }; + veci m = { 2, 1 }; + *r = __builtin_shuffle (x, y, m); // { 1, 5 } +} + +void h (double d, vecf* r) +{ + vecf x = { d + 1, 5 }; + vecf y = { 1 , 4 }; + veci m = { 2 , 0 }; + *r = __builtin_shuffle (y, x, m); // { d + 1, 1 } +} + +void i (float d, vecf* r) +{ + vecf x = { d, 5 }; + veci m = { 1, 0 }; + *r = __builtin_shuffle (x, m); // { 5, d } +} + +void j (vecf* r) +{ + vecf y = { 1, 2 }; + veci m = { 0, 0 }; + *r = __builtin_shuffle (y, m); // { 1, 1 } +} + +void k (vecf* r) +{ + vecf x = { 3, 4 }; + vecf y = { 1, 2 }; + veci m = { 3, 0 }; + *r = __builtin_shuffle (x, y, m); // { 2, 3 } +} + +void l (double d, vecf* r) +{ + vecf x = { -d, 5 }; + vecf y = { d, 4 }; + veci m = { 2, 0 }; + *r = __builtin_shuffle (x, y, m); // { d, -d } +} + +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/forwprop-20.c ___ Added: svn:eol-style + native Added: svn:keywords + Author Date Id Revision URL Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 191082) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -2599,75 +2599,134 @@ is_combined_permutation_identity (tree m if (j == i) maybe_identity2 = false; else if (j == i + nelts) maybe_identity1 = false; else return 0; } return maybe_identity1 ? 1 : maybe_identity2 ? 2 : 0; } -/* Combine two shuffles in a row. Returns 1 if there were any changes - made, 2 if cfg-cleanup needs to run. Else it returns 0. */ +/* Combine a shuffle with its arguments. Returns 1 if there were any + changes made, 2 if cfg-cleanup needs to run. Else it returns 0. */ static int simplify_permutation (gimple_stmt_iterator *gsi) { gimple stmt = gsi_stmt (*gsi); gimple def_stmt; - tree op0, op1, op2, op3; - enum tree_code code = gimple_assign_rhs_code (stmt); - enum tree_code code2; + tree op0, op1, op2, op3, arg0, arg1; + enum tree_code code; - gcc_checking_assert (code == VEC_PERM_EXPR); + gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR); op0 = gimple_assign_rhs1 (stmt); op1 = gimple_assign_rhs2 (stmt); op2 = gimple_assign_rhs3 (stmt); - if (TREE_CODE (op0) != SSA_NAME) -return 0; - if (TREE_CODE (op2) != VECTOR_CST) return 0; - if (op0 != op1) -return 0; + if (TREE_CODE (op0) == VECTOR_CST) +{ + code = VECTOR_CST; + arg0 = op0; +} + else if (TREE_CODE (op0) == SSA_NAME) +{ + def_stmt = SSA_NAME_DEF_STMT (op0); + if (!def_stmt || !is_gimple_assign (def_stmt) + || !can_propagate_from (def_stmt)) + return 0; - def_stmt = SSA_NAME_DEF_STMT (op0); - if (!def_stmt || !is_gimple_assign (def_stmt) - || !can_propagate_from (def_stmt)) + code = gimple_assign_rhs_code (def_stmt); + arg0 = gimple_assign_rhs1 (def_stmt); +} + else return 0; - code2 = gimple_assign_rhs_code (def_stmt); - /* Two consecutive shuffles. */ - if (code2 == VEC_PERM_EXPR) + if (code == VEC_PERM_E