Re: [Patch, fortran] PRs 57893 and 58858
On 28.10.2013 23:26, Paul Richard Thomas wrote: This patch addresses issues arising from PR57893. It is entirely obvious. Bootstraps and regtests on FC17/x86_64 - OK for trunk?

Thanks for the patch. It's OK except for the following change:

*** gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (revision 204135)
--- gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (working copy)
...
*** 36,43 ****
        if (storage_size(o) /= sz) call abort()
      type is (complex(c2))
        if (storage_size(o) /= sz) call abort()
-   end select
-   select type (o)
      type is (complex(c3))
        if (storage_size(o) /= sz) call abort()
      type is (complex(c4))
--- 42,47 ----

Please don't merge the two SELECT TYPE blocks. If a system has only two or three complex kinds, that will lead to an error message like: Error: CASE label at (1) overlaps with CASE label at (2)

Tobias

PS: The following patch is still pending review: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01957.html

2013-10-29 Paul Thomas pa...@gcc.gnu.org

	PR fortran/57893
	* trans-types.c (gfc_typenode_for_spec): Add typenode for BT_HOLLERITH. Note that the length is incorrect but unusable.

	PR fortran/58858
	* target-memory.c (gfc_element_size): Add element sizes for BT_VOID and BT_ASSUMED, using gfc_typenode_for_spec.

2013-10-29 Paul Thomas pa...@gcc.gnu.org

	PR fortran/57893
	* gfortran.dg/unlimited_polymorphic_13.f90: Use real variables to determine sizes of real kinds.

	PR fortran/58858
	* gfortran.dg/unlimited_polymorphic_14.f90: New test.
Re: free is a killer
On Mon, 28 Oct 2013, Jeff Law wrote:

On 10/28/13 16:05, Marc Glisse wrote: I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one). I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy).

Actually, it's fairly easy to see with memset. Something like this:

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-optimized" } */
void f (long *p)
{
  *p = 42;
  p[4] = 42;
  __builtin_memset (p, 0, 100);
}
/* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Indeed. Do you want to commit it xfailed or put it in bugzilla so we don't lose it? (It becomes harder if you replace p with p-1 in the memset arguments.)

While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with:

struct A { struct B b; int i; };

we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p->i and q->b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well...

I've never looked at the alias oracle stuff, but ISTM that offsets from the base ought to be enough to disambiguate?

Yes, the issue here is making them notice that they can be seen as having bases of the same type (and thus that it is meaningful to compare offsets), since there are multiple possible base choices when a structure starts with a structure which starts with...
In the category "ugly hack that I won't submit", let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires).

Well, one thing that can be done here (I think it comes from Steensgaard, and I'm pretty sure it's discussed in Morgan's text) is you assign a storage class and propagate that. Once you do so, you can eliminate certain things. You use a VRP-like engine with a fairly simple meet operator to handle propagation of the storage class. Once you have storage classes for all the pointers, you can make certain deductions about aliasing relationships, including what things in the malloc storage class can and cannot alias.

Sounds good, but I am not really up to rewriting the points-to analysis at the moment...

Thanks, -- Marc Glisse
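The storage-class propagation Jeff describes can be sketched with a flat lattice and a simple meet operator. This is an illustrative sketch only; the enum and function names here are invented, not GCC's:

```c
/* Flat lattice of pointer storage classes: SC_TOP (no information yet)
   meets anything to that thing; two different known classes collapse
   to SC_BOTTOM (may point anywhere).  A pointer that stays in SC_MALLOC
   after propagation cannot alias stack or global storage.  */
enum sclass { SC_TOP, SC_MALLOC, SC_STACK, SC_GLOBAL, SC_BOTTOM };

enum sclass
sclass_meet (enum sclass a, enum sclass b)
{
  if (a == SC_TOP)
    return b;
  if (b == SC_TOP)
    return a;
  return a == b ? a : SC_BOTTOM;
}
```

A VRP-like worklist engine would apply this meet at PHI nodes and pointer copies until a fixed point is reached, as Jeff suggests.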
[PATCH] Testcase for nonzero_bits __builtin_unreachable
On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote: And here is a patch that allows vectorization without peeling for alignment and a scalar loop for the bound even for fn2, fn3 and fn4 in the following testcase, though as with the range __builtin_unreachable () notes, it is quite fragile, because it only works if there are no immediate uses of the tested SSA_NAME before the assertion. Perhaps a more reliable way would be to convert those assertions into __builtin_assume_aligned, but that has the disadvantage that its first argument is a pointer and it returns a pointer, so we'd need to cast integers to pointers and back, or add an ASSUME_ALIGNED internal function. Bootstrapped/regtested on x86_64-linux and i686-linux.

And here, as a follow-up to the whole patchset of 4 patches, is a testcase:

2013-10-29 Jakub Jelinek ja...@redhat.com

	* gcc.dg/vect/vect-align-3.c: New test.

--- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj	2013-10-29 08:18:08.636348586 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-align-3.c	2013-10-29 08:21:14.931384138 +0100
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[2048];
+
+void
+f1 (int x, int y)
+{
+  int i;
+  x &= -256;
+  y &= -256;
+  for (i = x + 256; i < y; i++)
+    a[i]++;
+}
+
+void
+f2 (int x, int y)
+{
+  int i;
+  if (x & 31)
+    __builtin_unreachable ();
+  if (y & 31)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f3 (int x, int y)
+{
+  int i;
+  if (x % 256)
+    __builtin_unreachable ();
+  if (y % 256)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f4 (int x, int y)
+{
+  int i;
+  if ((x % 256) != 0)
+    __builtin_unreachable ();
+  if ((y % 256) != 0)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } } */
+/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

Jakub
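The __builtin_assume_aligned alternative Jakub mentions looks like this in ordinary C. A hedged sketch: the buffer and function names are my invention, and the integer-to-pointer round trip the mail mentions is exactly what makes this awkward for the integer-typed testcase above:

```c
/* C11 _Alignas makes the promise below actually true; asserting a
   stronger alignment than the object has would be undefined behavior. */
static _Alignas (32) double buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

double
sum_aligned (void)
{
  /* Tells the vectorizer the pointer is 32-byte aligned, so no
     peeling or versioning for alignment is needed for this loop.  */
  double *ap = __builtin_assume_aligned (buf, 32);
  double s = 0;
  for (int i = 0; i < 8; i++)
    s += ap[i];
  return s;
}
```

Unlike the __builtin_unreachable () form, the alignment fact here is attached directly to the pointer value, so it survives regardless of where the pointer's other uses sit.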
Re: [PATCH, ARM] Fix line number data for PIC register setup code
Updated patch, re-bootstrapped on x86_64 and committed to trunk. Also applied to the 4.7 and 4.8 branches; the same problem is present there.

You only asked for approval on trunk though, and I'm not sure we really care about the results of the GDB testsuite on the 4.7 branch at this point, so let's exclude the 4.7 branch here.

-- Eric Botcazou
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
Hello!

For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3.

No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below.

(define_expand "abs<mode>2"
  [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
	(abs:VI124_AVX2_48_AVX512F
	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
  "TARGET_SSE2"
{
  if (!TARGET_SSSE3)
    ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));

Do you really need force_reg here? You are using generic expanders in ix86_expand_sse2_abs that can handle non-register operands just as well.

  else
    emit_insn (gen_rtx_SET (VOIDmode, operands[0],
			    gen_rtx_ABS (<MODE>mode, operands[1])));
  DONE;
})

Please note that your mailer mangles indents. Please indent your code correctly.

Uros.
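For readers following the thread, the SSE2 fallback that ix86_expand_sse2_abs provides can be understood lane-wise through the classic branchless identity below. This is a scalar sketch with my own naming; one common vector lowering applies the same shift/xor/sub sequence to each element:

```c
#include <stdint.h>

/* abs(x) without a branch: s is 0 for non-negative x and -1 (all ones)
   for negative x, so (x ^ s) - s conditionally negates x.  Relies on
   arithmetic right shift of negative values, as GCC defines it.  */
static int32_t
abs_xor_sub (int32_t x)
{
  int32_t s = x >> 31;    /* sign mask: 0 or -1 */
  return (x ^ s) - s;     /* negates x exactly when s == -1 */
}
```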
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
Hi,

On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:

On 10/28/2013 03:20 AM, Bernd Edlinger wrote: On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote: I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing. [snip] I have attached an update to your patch, that should a) fix the recursion problem, and b) restrict -fstrict-volatile-bitfields so as not to violate the C++ memory model.

Thanks for picking up the ball on this -- I've been busy with other tasks for several days. I again tried backporting the patch series along with your fix to GCC 4.8 and tested on arm-none-eabi. I found that it was still getting stuck in infinite recursion unless the test from this patch hunk

Hmmm. Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi. I have seen the recursion on extract_bit_field, but not on store_bit_field so far; maybe you could give me a hint which test case exposes the other flavour of this recursion problem.

@@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned
 {
   enum machine_mode mode1;

+  /* Handle extraction of unaligned fields;
+     this can happen with -fstrict-volatile-bitfields.  */
+  if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
+      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
+      && bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize
+	 > GET_MODE_BITSIZE (GET_MODE (str_rtx)))
+    str_rtx = adjust_address (str_rtx, word_mode, 0);
+
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
     mode1 = GET_MODE (str_rtx);

was also inserted into store_bit_field.
I also think that, minimally, this needs to be rewritten as something like

  if (MEM_P (str_rtx) && STRICT_ALIGNMENT
      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
      && (bitnum % MEM_ALIGN (str_rtx) + bitsize
	  > GET_MODE_BITSIZE (GET_MODE (str_rtx))))
    str_rtx = adjust_address (str_rtx, word_mode, 0);

Yes, that looks fine.

Otherwise, we'll end up using word_mode instead of the field mode on targets that support unaligned accesses. I tested that (again, 4.8 and arm-none-eabi) and results looked good, but of course ARM isn't one of the targets that supports unaligned accesses. :-S

I'm still not convinced this is the right fix, though. It seems to me that callers of store_bit_field and extract_bit_field in expr.c ought not to be messing with the mode of the rtx when flag_strict_volatile_bitfields > 0, because that is losing information about the original mode that we are trying to patch up here by forcing it to word_mode. Instead, I think the callers ought to be passing the declared field mode into store_bit_field and extract_bit_field, and those functions ought to change the mode of the incoming rtx to match the field mode only if strict_volatile_bitfield_p assures us that the insertion/extraction can be done in that mode.

The problem likely starts in expr.c when this is done:

  if (volatilep && flag_strict_volatile_bitfields > 0)
    to_rtx = adjust_address (to_rtx, mode1, 0);

This restricts the possible access mode not only for bit-fields but for all volatile members. But -fstrict-volatile-bitfields is supposed to affect bit-fields only.
Alternatively, if strict_volatile_bitfield_p returns false but flag_strict_volatile_bitfields > 0, then always force to word_mode and change the -fstrict-volatile-bitfields documentation to indicate that's the fallback if the insertion/extraction cannot be done in the declared mode, rather than claiming that it tries to do the same thing as if -fstrict-volatile-bitfields were not enabled at all.

Either way, this still needs more work, and more thorough testing (more targets, and obviously trunk as well as the 4.8 backport) before I'd consider it ready to commit. I might or might not be able to find some more time to work on this in the next week.

Yes. And it would be good to reach Richard's Nov-21 deadline for GCC 4.9.

Bernd.

-Sandra
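For readers joining the thread, here is a minimal illustration of what -fstrict-volatile-bitfields governs (the struct is my example, not from the patch). With the flag, reading a member must use a single access of the declared container type (one 32-bit load here), which matters when the struct overlays a memory-mapped device register; the discussion above is about what GCC should fall back to when such an access is impossible, e.g. an unaligned container on a STRICT_ALIGNMENT target like ARM:

```c
struct hw_status
{
  volatile unsigned int ready : 1;
  volatile unsigned int error : 1;
  volatile unsigned int count : 30;
};

unsigned int
read_count (struct hw_status *s)
{
  /* With -fstrict-volatile-bitfields this is one 32-bit volatile load
     plus shifting/masking, never a narrower byte or halfword access.  */
  return s->count;
}
```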
[PATCH] Adjust testsuite with respect to -fvect-cost-model changes
-fvect-cost-model changed its behavior dependent on the optimization level. The former behavior is restored by using -fvect-cost-model=dynamic for -fvect-cost-model and -fvect-cost-model=unlimited for -fno-vect-cost-model. The following patch makes this change throughout the testsuite - which also highlights the fact that we miss testcases for the cheap cost model. Previously selected testcases that regressed with the above change were adjusted, but clearly all cost-model testcases were developed with the dynamic model in mind. Committed. Richard. 2013-10-29 Richard Biener rguent...@suse.de * g++.dg/vect/slp-pr56812.cc: Adjust with respect to -fvect-cost-model changes. * gcc.dg/vect/bb-slp-32.c: Likewise. * gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp: Likewise. * gcc.target/powerpc/crypto-builtin-1.c: Likewise. * gcc.target/powerpc/p8vector-builtin-1.c: Likewise. * gcc.target/powerpc/p8vector-builtin-2.c: Likewise. * gcc.target/powerpc/p8vector-builtin-3.c: Likewise. * gcc.target/powerpc/p8vector-builtin-4.c: Likewise. * gcc.target/powerpc/p8vector-builtin-5.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-1.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-2.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-3.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-4.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-5.c: Likewise. * gfortran.dg/vect/vect.exp: Likewise. Index: trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc === *** trunk.orig/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 2013-08-30 09:55:27.0 +0200 --- trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 2013-10-29 10:06:33.146955101 +0100 *** *** 1,7 /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ /* { dg-require-effective-target vect_hw_misalign } */ ! 
/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model" } */ class mydata { public: --- 1,7 /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ /* { dg-require-effective-target vect_hw_misalign } */ ! /* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } */ class mydata { public: Index: trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c === *** trunk.orig/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 2013-08-27 11:13:22.0 +0200 --- trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 2013-10-29 10:06:38.660018142 +0100 *** *** 1,6 /* { dg-do compile } */ /* { dg-require-effective-target vect_int } */ ! /* { dg-additional-options "-fvect-cost-model" } */ void bar (int *); int foo (int *p) --- 1,6 /* { dg-do compile } */ /* { dg-require-effective-target vect_int } */ ! /* { dg-additional-options "-fvect-cost-model=dynamic" } */ void bar (int *); int foo (int *p) Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp === *** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp 2013-01-11 10:54:37.0 +0100 --- trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp 2013-10-29 10:07:01.423277877 +0100 *** if { ![istarget i?86*-*-*] && ![istarget *** 28,34 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model" # If the target system supports vector instructions, the default action # for a test is 'run', otherwise it's 'compile'. Save current default. --- 28,34 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model=dynamic" # If the target system supports vector instructions, the default action # for a test is 'run', otherwise it's 'compile'. Save current default.
Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp === *** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp 2013-01-11 10:54:37.0 +0100 --- trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp 2013-10-29 10:06:48.601131525 +0100 *** if ![is-effective-target powerpc_altivec *** 33,39 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model" "-fno-common" # If the target system supports vector instructions, the
[PATCH GCC]Simplify address expression in IVOPT
Hi,

I noticed that IVOPT generates complex address expressions like below for iv bases:

  &arr_base[0].y
  &arr[0]
  &MEM[p+o]

It's even worse for targets supporting auto-increment addressing modes, because IVOPT adjusts such base expressions with +/- step, then creates:

  &arr_base[0].y +/- step
  &arr[0] +/- step
  &MEM[p+o] +/- step

This has two disadvantages: 1) Cost computation in IVOPT can't handle complex address expressions and generally returns spill_cost for them, which is bad since address ivs are important to IVOPT. 2) IVOPT creates duplicate candidates for IVs which have the same value in different forms; for example, two candidates are generated, one each for &a[0] and &a. Again, it's even worse for auto-increment addressing modes.

This patch fixes the issue by simplifying the address expression at the entry of allocating the IV struct. Maybe the simplification could be put in various fold* functions, but I think it might be better this way, because: 1) fold* functions are used from the front end through various tree optimizations; the simplified address expressions may not be what each optimizer wants. Think about parallelism-related passes: they might want the array index information kept for further analysis. 2) In some ways the simplification conflicts with the current implementation of fold* functions. Take fold_binary_loc as an example: it tries to simplify "&a[i1] +p c * i2" into "&a[i1 + i2]". Of course we could simplify in this way for IVOPT too, but that would cause new problems like: a) we would have to add code in IVOPT to cope with complex ARRAY_REFs, which is exactly the thing we want to avoid; b) the simplification can't always be done because of the signed/unsigned offset problem (especially for auto-increment addressing modes). 3) There are many entry points for fold* functions; the change would be non-trivial. 4) The simplification is only done in alloc_iv for true (not duplicate) iv structs, so the number of such ivs should be moderate.
With these points, I think it might be a win to do the simplification in IVOPT and create a kind of sandbox for IVOPT to play in. Any suggestions?

Bootstrapped and tested on x86/x86_64/arm. The patch causes three cases to fail on some targets, but all of them are false alarms, which can be resolved by refining the test cases to check more accurate information. Is it OK?

Thanks.
bin

gcc/testsuite/ChangeLog
2013-10-29 Bin Cheng bin.ch...@arm.com

	* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
	* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

2013-10-29 Bin Cheng bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
	(get_shiftadd_cost): Check equality using operand_equal_p.
	(force_expr_to_var_cost): Refactor the code.  Handle type conversion.
	(split_address_cost): Call force_expr_to_var_cost.

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(working copy)
@@ -27,7 +27,7 @@ void xxx(void)
 /* { dg-final { scan-tree-dump-times "\\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\*" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */

 /* 17 * iter should be strength reduced.  */

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@

 extern char a[];

-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can not infer loop iteration from array -- exit test can not be
+   replaced by the array address.
 */
 void foo (unsigned int i_width, TYPE dst)
 {
   unsigned long long i = 0;
@@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst)
   }
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts" } } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(working copy)
@@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2)
   return s;
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts" } } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

Index: gcc/tree-ssa-loop-ivopts.c
[PING] [AArch64] Peepholes to generate ldp and stp instructions
Hi,

Please consider this as a reminder to review the ldp and stp peephole implementation for the AArch64 target. The patch was originally posted at: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01051.html

Please review the same and let me know if it's okay. Built and tested on aarch64-thunder-elf (using Cavium's internal simulator). No new regressions.

2013-10-29 Naveen H.S naveen.hurugalaw...@caviumnetworks.com

gcc/
	* config/aarch64/aarch64.md (peephole2s to generate ldp instruction for 2 consecutive loads from memory): New.
	(peephole2s to generate stp instruction for 2 consecutive stores to memory in integer mode): New.
	(peephole2s to generate ldp instruction for 2 consecutive loads from memory in floating point mode): New.
	(peephole2s to generate stp instruction for 2 consecutive stores to memory in floating point mode): New.

gcc/testsuite/
	* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks,
Naveen

diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md
--- mainline-orig/gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc-4.8.0/gcc/config/aarch64/aarch64.md	2013-10-29 10:11:09.656082201 +0530
@@ -1068,6 +1068,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1102,6 +1144,28 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) >= 32 && REGNO (operands[2]) >= 32
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1119,6 +1183,28 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
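A minimal C function with the shape these peepholes look for: same base register, offsets differing by the mode size, distinct destination registers of the same class. This is my illustrative example, not part of the patch:

```c
/* On AArch64, the two adjacent loads are expected to pair into one ldp
   and the two adjacent stores into one stp when the peepholes fire
   (per the conditions above, when optimizing for size).  */
void
copy_pair (long *dst, const long *src)
{
  long a = src[0];   /* load  from base + 0 */
  long b = src[1];   /* load  from base + sizeof (long) */
  dst[0] = a;        /* store to   base + 0 */
  dst[1] = b;        /* store to   base + sizeof (long) */
}
```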
Re: Aliasing: look through pointer's def stmt
On Sat, Oct 26, 2013 at 7:07 PM, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 25 Oct 2013, Marc Glisse wrote: On Fri, 25 Oct 2013, Richard Biener wrote: you can follow up with handling POINTER_PLUS_EXPR if you like.

Like this? (bootstrap+testsuite on x86_64-unknown-linux-gnu)

2013-10-26 Marc Glisse marc.gli...@inria.fr

gcc/
	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement.

This has some issues with the type of the offsets. First, they might overflow in some extreme cases (that could probably already happen without the patch). But mostly, negative offsets are not supported. There is a comment to that effect before ao_ref_init_from_ptr_and_size, and ranges_overlap_p takes the offsets as unsigned HOST_WIDE_INT. Currently it does indeed seem hard to produce a negative offset there, but handling POINTER_PLUS_EXPR (definitely a good thing) would obviously change that.

For the POINTER_PLUS_EXPR offset argument you should use int_cst_value () to access it (it will unconditionally sign-extend) and use host_integerp (..., 0). That leaves the overflow possibility in place (and you should multiply by BITS_PER_UNIT), which we ignore in enough other places similar to this to ignore it here too ... (side note: for quite some time it has been on my TODO list to make the gimple alias machinery use byte offsets instead of bit offsets, which wouldn't cause regressions if we finally lowered bitfield accesses ...). A way to not ignore it is to do

  off = double_int_from_tree (ptr-plus-offset);
  off = double_int_sext (off, TYPE_PRECISION (...));
  off = double_int_mul (off, BITS_PER_UNIT);
  if (off.fits_shwi ())
    ...

Richard.

-- Marc Glisse
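The sign-extension step Richard sketches can be modeled in plain C as follows. The helper is mine, mirroring what double_int_sext does for an offset of a given precision; multiplying by BITS_PER_UNIT afterwards converts the byte offset to the bit offsets the alias machinery uses:

```c
#include <stdint.h>

/* Interpret the low PREC bits of VAL as a signed value: mask down to
   PREC bits, then flip and subtract the sign bit so it propagates into
   the high bits (the standard sign-extension trick).  */
static int64_t
sext_prec (uint64_t val, unsigned prec)
{
  uint64_t sign = 1ULL << (prec - 1);
  if (prec < 64)
    val &= (sign << 1) - 1;
  return (int64_t) ((val ^ sign) - sign);
}
```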
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 10/24/13 00:04, Kugan wrote: Hi, the ARM testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the Linaro 4.8 branch. It is not reproducible with trunk, but it can happen. Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl instruction, among other vector instructions. However, as per the comment for the neon_vbsl<mode>_internal md pattern defined in neon.md, gcc can generate vbsl or vbit or vbif depending on the register allocation. Therefore, these testcases should scan for one of these three instructions instead of just vbsl. I have updated the testcases to scan for vbsl or vbit or vbif now. Is this OK?

This is OK. Thanks for the analysis on the testcase.

regards
Ramana

Thanks,
Kugan

2013-10-23 Kugan Vivekanandarajah kug...@linaro.org

	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 10/24/13 00:04, Kugan wrote: Hi, the ARM testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the Linaro 4.8 branch. It is not reproducible with trunk, but it can happen. Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl instruction, among other vector instructions. However, as per the comment for the neon_vbsl<mode>_internal md pattern defined in neon.md, gcc can generate vbsl or vbit or vbif depending on the register allocation. Therefore, these testcases should scan for one of these three instructions instead of just vbsl. I have updated the testcases to scan for vbsl or vbit or vbif now. Is this OK? Thanks, Kugan

Hit send too soon on my earlier email. Minor nit in the ChangeLog, as below: need a newline between the date line and the file names.

2013-10-23 Kugan Vivekanandarajah kug...@linaro.org

	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.

Ramana
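All three instructions the testcases now accept (vbsl, vbit, vbif) compute the same bitwise select and differ only in which operand is overwritten, which is why register allocation picks among them. A scalar model of the operation (my sketch, one 32-bit lane):

```c
#include <stdint.h>

/* The result takes bits from A where MASK is 1 and from B where MASK
   is 0.  NEON's vbsl/vbit/vbif are this operation with the mask, the
   first input, or the second input as the destructed destination.  */
static uint32_t
bitwise_select (uint32_t mask, uint32_t a, uint32_t b)
{
  return (a & mask) | (b & ~mask);
}
```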
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote: I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit?

Can you please be more verbose?

Do you think the proposed patch is ok or should I provide some additional info? Jakub, Dodji, any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms.

Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if, for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)), it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increases register pressure. So here is a (completely untested) alternate fix:

2013-10-29 Jakub Jelinek ja...@redhat.com
	    Yury Gribov y.gri...@samsung.com

	PR sanitizer/58543
	* asan.c (asan_clear_shadow): Return bool whether the emitted loop (if any) updated shadow_mem's base.
	(asan_emit_stack_protection): If asan_clear_shadow returned true, adjust shadow_mem and prev_offset.

--- gcc/asan.c.jj	2013-10-23 14:43:15.0 +0200
+++ gcc/asan.c	2013-10-29 10:26:56.085406934 +0100
@@ -878,10 +878,11 @@ asan_shadow_cst (unsigned char shadow_by

 /* Clear shadow memory at SHADOW_MEM, LEN bytes. Can't call a library call here though.
    */
-static void
+static bool
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
+  bool ret;

   start_sequence ();
   clear_storage (shadow_mem, GEN_INT (len), BLOCK_OP_NORMAL);
@@ -893,12 +894,13 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   if (insn == NULL_RTX)
     {
       emit_insn (insns);
-      return;
+      return false;
     }

   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
   addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  ret = addr == XEXP (shadow_mem, 0);
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +914,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return ret;
 }

 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1051,14 @@ asan_emit_stack_protection (rtx base, HO
 			 (last_offset - prev_offset) >> ASAN_SHADOW_SHIFT);
 	  prev_offset = last_offset;
-	  asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+	  if (asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT))
+	    {
+	      shadow_mem
+		= adjust_automodify_address (shadow_mem, VOIDmode,
+					     XEXP (shadow_mem, 0),
+					     last_size >> ASAN_SHADOW_SHIFT);
+	      prev_offset += last_size;
+	    }
 	  last_offset = offset;
 	  last_size = 0;
 	}

Jakub
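Context for the shift in the patch: AddressSanitizer maps every 8 bytes of application memory to one shadow byte, so a redzone of LEN bytes needs LEN >> ASAN_SHADOW_SHIFT shadow bytes cleared. A sketch of the mapping with the typical x86_64 parameters (the shadow offset is per-target; 0x7fff8000 is used here only for illustration, and ARM uses a different value):

```c
#include <stdint.h>

#define ASAN_SHADOW_SHIFT 3   /* 8 application bytes per shadow byte */

/* Typical shadow address computation: scale the application address
   down by the shift, then relocate into the shadow region.  */
static uintptr_t
shadow_addr (uintptr_t app_addr, uintptr_t shadow_offset)
{
  return (app_addr >> ASAN_SHADOW_SHIFT) + shadow_offset;
}
```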
[PATCH] Time profiler - phase 1
Hello, I've been cooperating with Jan on new profile-based function reordering work. This first patch introduces a new GCOV counter that instruments each function call and stores the time of the first run of a function. Bootstrapped/regtested on x86_64-linux and i686-linux. Thanks, Martin

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fca665b..3b62bcc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2013-10-29  Martin Liska  marxin.li...@gmail.com
+	    Jan Hubicka  j...@suse.cz
+
+	* cgraph.c (dump_cgraph_node): Profile dump added.
+	* cgraph.h (struct cgraph_node): New time profile variable added.
+	* cgraphclones.c (cgraph_clone_node): Time profile is cloned.
+	* gcov-io.h (gcov_type): New profiler type introduced.
+	* ipa-profile.c (lto_output_node): Streaming for time profile added.
+	(input_node): Time profiler is read from LTO stream.
+	* predict.c (maybe_hot_count_p): Hot prediction changed.
+	* profile.c (instrument_values): New case for time profiler added.
+	(compute_value_histograms): Read of time profile.
+	* tree-pretty-print.c (dump_function_header): Time profiler is dumped.
+	* tree-profile.c (init_ic_make_global_vars): Time profiler function added.
+	(gimple_init_edge_profiler): TP function instrumentation.
+	(gimple_gen_time_profiler): New.
+	* value-prof.c (gimple_add_histogram_value): Support for time profiler
+	added.
+	(dump_histogram_value): TP type added to dumps.
+	(visit_hist): More sensitive check that takes TP into account.
+	(gimple_find_values_to_profile): TP instrumentation.
+	* value-prof.h (hist_type): New histogram type added.
+	(struct histogram_value_t): Pointer to struct function added.
+	* libgcc/Makefile.in: New GCOV merge function for TP added.
+	* libgcov.c: function_counter variable introduced.
+	(_gcov_merge_time_profile): New.
+	(_gcov_time_profiler): New.
+ 2013-10-29 David Malcolm dmalc...@redhat.com * doc/gty.texi (Inheritance and GTY): Make it clear that diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 52d9ab0..c95a54e 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node) if (node-profile_id) fprintf (f, Profile id: %i\n, node-profile_id); + fprintf (f, First run: %i\n, node-tp_first_run); fprintf (f, Function flags:); if (node-count) fprintf (f, executed HOST_WIDEST_INT_PRINT_DECx, diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 7706419..479d49f 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; - /* The cgraph data structure. Each function decl has assigned cgraph_node listing callees and callers. */ @@ -324,6 +323,8 @@ struct GTY(()) cgraph_node { unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* Time profiler: first run of function. */ + int tp_first_run; }; diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 800dd2c..a05fd77 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -207,6 +207,7 @@ cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq, new_node-frequency = n-frequency; new_node-clone = n-clone; new_node-clone.tree_map = NULL; + new_node-tp_first_run = n-tp_first_run; if (n-count) { if (new_node-count n-count) diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c index 5a21c1f..aea272b 100644 --- a/gcc/gcov-io.c +++ b/gcc/gcov-io.c @@ -68,7 +68,7 @@ gcov_open (const char *name, int mode) #if IN_LIBGCOV const int mode = 0; #endif -#if GCOV_LOCKED +#if GCOV_LOCKED struct flock s_flock; int fd; diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h index ea8d9a7..4650836 100644 --- a/gcc/gcov-io.h +++ b/gcc/gcov-io.h @@ -342,9 +342,10 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; counter. */ #define GCOV_COUNTER_IOR 7 /* IOR of the all values passed to counter. 
*/ -#define GCOV_LAST_VALUE_COUNTER 7 /* The last of counters used for value +#define GCOV_TIME_PROFILER 8 /* Time profile collecting first run of a function */ +#define GCOV_LAST_VALUE_COUNTER 8 /* The last of counters used for value profiling. */ -#define GCOV_COUNTERS 8 +#define GCOV_COUNTERS 9 /* Number of counters used for value profiling. */ #define GCOV_N_VALUE_COUNTERS \ @@ -352,7 +353,7 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; /* A list of human readable names of the counters */ #define GCOV_COUNTER_NAMES {arcs, interval, pow2, single, \ - delta, indirect_call, average, ior} + delta, indirect_call, average, ior, time_profiler} /* Names of merge functions for counters. */ #define GCOV_MERGE_FUNCTIONS {__gcov_merge_add, \ @@ -362,7 +363,8 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; __gcov_merge_delta, \ __gcov_merge_single, \ __gcov_merge_add, \ - __gcov_merge_ior} +
[PATCH] tree-ssa documentation fix
Hello, I've noticed that some part of documentation is obsolete and this patch adds documentation for new functions that replace old macros. Thanks, Martin diff --git a/ChangeLog b/ChangeLog index 1c50b9b..d64fbcf 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2013-10-29 Martin Liska marxin.li...@gmail.com + + * doc/tree-ssa.texi: Newly added functions that replace old macros + are documented. + 2013-10-21 Cong Hou co...@google.com * MAINTAINERS (Write After Approval): Add myself. diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi index aaf741b..93f596d 100644 --- a/gcc/doc/tree-ssa.texi +++ b/gcc/doc/tree-ssa.texi @@ -526,31 +526,29 @@ the result of ``merging'' @code{a_1}, @code{a_2} and @code{a_3}. Hence, PHI nodes mean ``one of these operands. I don't know which''. -The following macros can be used to examine PHI nodes +The following functions can be used to examine PHI nodes -@defmac PHI_RESULT (@var{phi}) +@defun gimple_phi_result (@var{phi}) Returns the @code{SSA_NAME} created by PHI node @var{phi} (i.e., @var{phi}'s LHS)@. -@end defmac +@end defun -@defmac PHI_NUM_ARGS (@var{phi}) +@defun gimple_phi_num_args (@var{phi}) Returns the number of arguments in @var{phi}. This number is exactly the number of incoming edges to the basic block holding @var{phi}@. -@end defmac +@end defun -@defmac PHI_ARG_ELT (@var{phi}, @var{i}) -Returns a tuple representing the @var{i}th argument of @var{phi}@. -Each element of this tuple contains an @code{SSA_NAME} @var{var} and -the incoming edge through which @var{var} flows. -@end defmac +@defun gimple_phi_arg (@var{phi}, @var{i}) +Returns @var{i}th argument of @var{phi}@. +@end defun -@defmac PHI_ARG_EDGE (@var{phi}, @var{i}) +@defun gimple_phi_arg_edge (@var{phi}, @var{i}) Returns the incoming edge for the @var{i}th argument of @var{phi}. 
-@end defmac +@end defun -@defmac PHI_ARG_DEF (@var{phi}, @var{i}) +@defun gimple_phi_arg_def (@var{phi}, @var{i}) Returns the @code{SSA_NAME} for the @var{i}th argument of @var{phi}. -@end defmac +@end defun @subsection Preserving the SSA form
Re: [PING] [AArch64] Peepholes to generate ldp and stp instructions
You are better off CCing the maintainers for such reviews. Let me do that for you. I cannot approve or reject this patch but I have a few comments as below. On 10/29/13 09:22, Hurugalawadi, Naveen wrote: diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md --- mainline-orig/gcc/config/aarch64/aarch64.md 2013-10-28 17:15:52.363975264 +0530 +++ gcc-4.8.0/gcc/config/aarch64/aarch64.md 2013-10-29 10:11:09.656082201 +0530 @@ -1068,6 +1068,27 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPI 0 register_operand) + (match_operand:GPI 1 aarch64_mem_pair_operand)) + (set (match_operand:GPI 2 register_operand) + (match_operand:GPI 3 memory_operand))] + GET_CODE (operands[1]) == MEM +GET_CODE (XEXP (operands[1], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG Use the predicates REG_P , CONST_INT_P and so on. Don't do this in this form. +GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT +GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0)) +REGNO (operands[0]) != REGNO (operands[2]) +REGNO_REG_CLASS (REGNO (operands[0])) + == REGNO_REG_CLASS (REGNO (operands[2])) +rtx_equal_p (XEXP (operands[3], 0), + plus_constant (Pmode, XEXP (operands[1], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size Why use optimize_size here ? I would have thought it was always good to generate ldp and stp instructions. If you want to make it a core specific decision make it a tuning decision but don't disable just based on size. Additionally if this check is common between all the patterns it would be better to factor this out into a predicate or a function in its own right. + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 0 and 2 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. 
(define_insn store_pairmode @@ -1085,6 +1106,27 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPI 0 aarch64_mem_pair_operand) + (match_operand:GPI 1 register_operand)) + (set (match_operand:GPI 2 memory_operand) + (match_operand:GPI 3 register_operand))] + GET_CODE (operands[0]) == MEM +GET_CODE (XEXP (operands[0], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG +GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT +GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0)) +REGNO (operands[1]) != REGNO (operands[3]) +REGNO_REG_CLASS (REGNO (operands[1])) + == REGNO_REG_CLASS (REGNO (operands[3])) +rtx_equal_p (XEXP (operands[2], 0), + plus_constant (Pmode, XEXP (operands[0], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 1 and 3 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn load_pairmode @@ -1102,6 +1144,28 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPF 0 register_operand) + (match_operand:GPF 1 aarch64_mem_pair_operand)) + (set (match_operand:GPF 2 register_operand) + (match_operand:GPF 3 memory_operand))] + GET_CODE (operands[1]) == MEM +GET_CODE (XEXP (operands[1], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG +GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT +GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0)) +REGNO (operands[0]) != REGNO (operands[2]) +REGNO (operands[0]) = 32 REGNO (operands[2]) = 32 I'm not sure what this means here but this check doesn't look right - 32 is V0_REGNUM - why are you checking that here ? If you really need to do that check then use V0_REGNUM or it may be better to see if this really is a SIMD regnum ? Can you not use something like FP_REGNUM_P here instead ? 
+REGNO_REG_CLASS (REGNO (operands[0])) + == REGNO_REG_CLASS (REGNO (operands[2])) +rtx_equal_p (XEXP (operands[3], 0), + plus_constant (Pmode, XEXP (operands[1], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 0 and 2 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn store_pairmode @@ -1119,6 +1183,28 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPF 0 aarch64_mem_pair_operand) + (match_operand:GPF 1 register_operand)) + (set (match_operand:GPF 2 memory_operand) + (match_operand:GPF 3 register_operand))] +
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 28 Oct 14:45, Richard Henderson wrote: On 10/28/2013 01:58 PM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault-correct predicated scalar algorithms. Using VEC_MERGE isn't the proper representation for that. If that's your real goal, then COND_EXEC is the only way to let rtl know that faults are suppressed in the false condition.

I believe the cond_exec approach is supposed to look like this:

(define_subst "mask_scalar"
  [(set (match_operand:SUBST_V 0)
        (vec_merge:SUBST_V (match_operand:SUBST_V 1)
                           (match_operand:SUBST_V 2)
                           (const_int 1)))]
  "TARGET_AVX512F"
  [(cond_exec (eq:CC (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
                     (const_int 1))
     (set (match_dup 0)
          (vec_merge:SUBST_V (match_dup 1) (match_dup 2) (const_int 1))))])

But this describes merge masking incorrectly: we would need to add a clobber to signal that we zero the upper part of the register even when the condition is false. Worse, zero masking would be indistinguishable from merge masking, and we would need a way to choose which mask mode to enable. A bad turn.

IMHO, we have three options for implementing masked scalar insns:

1. `vec_merge' over vec_merge (the current approach).
   Pros:
   1. Precise semantic description.
   2. Unified approach with the vector patterns.
   3. Freedom for the simplifier to reduce EVEX to VEX for certain constant masks.
   Cons:
   1. The overly precise semantic description leads to complicated code in the md file.

2. The `cond_exec' approach.
   Pros:
   1. Looks useful when the compiler tries to generate predicated code.
   Cons:
   1. Not precise: extra clobbers (?) are needed to signal that the register changes even when the cond_exec condition is false.
   2. Unable to describe zero masking nicely.
   3. The code is still as complicated as for option 1.
   4. The simplifier won't work (a clobber is always a clobber).

3. Make all masked scalar insns unspecs.
   Pros:
   1. Straightforward, not overweight. Enough for the intrinsics to work.
   Cons:
   1. Since every unspec needs a code, substs won't be applied directly: a huge volume of similar code.
   2. The simplifier won't work.
   3. Generation of predicated code becomes hard.

Am I missing some options, or is that all we have? If so, which option would you prefer?

Thanks, K
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is an ICE in the LTO streamer when it reads instrumented code. The second one is an uninitialized flag_check_pointers when code is compiled by lto1 (do you know why that may happen, BTW?). It also causes an ICE because instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when the checker testsuite has a 100% pass rate with LTO. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kind of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. Richard. Ilya
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
On Mon, Oct 28, 2013 at 9:16 PM, Mike Stump mikest...@comcast.net wrote: Ping? Can you please get rid of PARTIAL_INT_MODE_NAME by making the name a required argument to PARTIAL_INT_MODE? Ok with that change (given target maintainers are ok with this). Thanks, Richard. On Oct 18, 2013, at 10:03 AM, Mike Stump mikest...@comcast.net wrote: On Oct 18, 2013, at 4:24 AM, Richard Biener richard.guent...@gmail.com wrote: I agree. Btw, the implementation defined precision of PDI on SH-5 definitely looks interesting, but I wonder how you can perform implementation defined arithmetic on that PDI mode then ;) Easy, perform it in the maximal size supported… What else blocks this patch? Nothing. We have heard from everyone, and all the issues found by testing were on msp, and those were exclusively due to port issues that I believe now have been resolved. The below patch has the change to 22 bits for PSI on SH. Ok?
Re: [PATCH, ARM] Fix line number data for PIC register setup code
On 29/10/13 09:12, Eric Botcazou wrote: Updated patch, re-bootstrapped on x86_64 and committed to trunk. Also applied to 4.7 and 4.8 branches, the same problem is present there. You only asked for approval on trunk though, Eric, Sorry about that. and I'm not sure we really care about the results of the GDB testsuite on the 4.7 branch at this point, FWIW, I originally encountered the problem on 4.7. so let's exclude the 4.7 branch here. Reverted on the 4.7 branch. Thanks, - Tom
Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests
On 10/09/13 23:16, Christophe Lyon wrote: Hi, This patch is a first small sample of dejagnu-ization of my ARM Neon intrinsics tests. Thanks for attempting this and apologies for the slow response - I've been busy with a few other things internally. It's derived from my previous work at http://gitorious.org/arm-neon-tests/arm-neon-tests which supports all the ARM intrinsics, with executable tests. As I have to manually transform each test (to include the expected data, and a few other modifications), it's quite a bit tedious. I'll take your word that this is tedious :) I can see how you get the reference input in from the original text file into headers and having to marshall things there. Irrespective of our earlier conversations on this now I'm actually wondering if instead of doing this and integrating this in the GCC source base it maybe easier to write a harness to test this cross on qemu or natively. Additionally setting up an auto-tester to do this might be a more productive use of time rather than manually dejagnuizing this which appears to be a tedious and slow process. I'd like your feedback before continuing, as there are a lot more files to come. I have made some cleanup to help review, but the two .h files will need to grow as more intrinsics will be added (see the original ones). Which one should I compare this with in terms of the original file ? I'd like to keep the modifications at a minimal level, to save my time when adapting each test (there are currently 145 test files, so 143 left:-). On to the patch itself. The prefix TEST_ seems a bit misleading in that it suggests this is testing something when in reality this is initializing stuff. Thanks, Christophe. This patch only introduces new files. 2013-10-03 Christophe Lyonchristophe.l...@linaro.org testsuite/gcc.target/arm/neon-intrinsics/ * neon-intrinsics.exp: New driver file. * arm-neon-ref.h: New file, with common vector construction helpers. 
* compute_ref_data.h: New file, with helpers for input data initialization. * ref_vaba.c: New test file for the vaba family of intrinsics. * ref_vld1.c: New test file for vld1. neontests.patch.txt diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp 2013-05-08 23:08:46.271786347 +0200 @@ -0,0 +1,35 @@ +# Copyright (C) 1997-2013 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +#http://www.gnu.org/licenses/. + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an ARM target. +if ![istarget arm*-*-*] then { + return +} Also for aarch64*-*-* as all these intrinsics are compatible with the aarch64 port. I would also prefer that this be tortured over multiple optimization levels as many times we find issues with different optimization levels. More later I need to get back to something else and I need to play more with your original testsuite - but I'd like some discussion around some of these points anyway. Ramana + +# Load support procs. +load_lib gcc-dg.exp + +# Initialize `dg'. +dg-init + +# Main loop. 
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + + +# All done. +dg-finish diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 2013-05-09 00:48:59.395628726 +0200 @@ -0,0 +1,349 @@ +#ifndef_ARM_NEON_REF_H_ +#define_ARM_NEON_REF_H_ + +#include stdio.h +#include inttypes.h +#include string.h +#include stdlib.h + +#define xSTR(X) #X +#define STR(X) xSTR(X) + +#define xNAME1(V,T) V ## _ ## T +#define xNAME(V,T) xNAME1(V,T) + +#define VAR(V,T,W) xNAME(V,T##W) +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W) + +#define VECT_NAME(T, W, N) T##W##x##N +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L +#define VECT_TYPE(T, W, N)
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski pins...@gmail.com wrote: On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: On 18 October 2013 00:58, Jeff Law l...@redhat.com wrote: On 10/17/13 05:03, Richard Biener wrote: Is it OK for trunk? I had a much simpler change which did basically the same from 4.7 (I can update it if people think this is a better approach). I like that more (note you can now use is_gimple_condexpr as predicate for force_gimple_operand). The obvious question is whether or not Andrew's simpler change picks up as many transformations as Zhenqiang's change. If not are the things missed important. Zhenqiang, can you do some testing of your change vs Andrew P.'s change? Here is a rough compare: 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included in my patch). Root cause is that it does not skip LABEL. The guard to do this opt should be the same the bb_has_overhead_p in my patch. This should be an easy change, I am working on this right now. 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not efficient for ||. e.g. For ssa-ifcombine-ccmp-6.c, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) = 0; _10 = ~_6; _11 = _9 _10; if (_11 == 0) With my patch, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) 0; _10 = _6 | _9; if (_10 != 0) As mentioned otherwise, this seems like a missed optimization inside forwprop. When I originally wrote this code there used to be two cases one for and one for |, but this was removed sometime and I just made the code evolve with that. Actually I can make a small patch (3 lines) to my current patch which causes tree-ssa-ifcombine.c to produce the behavior of your patch. 
if (result_inv) { t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); result_inv = false; } I will submit a new patch after some testing of my current patch which fixes items 1 and 2. Here is my latest patch which adds the testcases from Zhenqiang's patch and fixes item 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. return i; } +/* Return a new iterator pointing to the first non-debug non-label statement in + basic block BB. */ + +static inline gimple_stmt_iterator +gsi_start_nondebug_after_labels_bb (basic_block bb) vertical space before the comment missing. @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void) bbs = single_pred_before_succ_order (); calculate_dominance_info (CDI_DOMINATORS); - for (i = 0; i n_basic_blocks - NUM_FIXED_BLOCKS; ++i) + for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i = 0; i--) { basic_block bb = bbs[i]; gimple stmt = last_stmt (bb); The original code matches what phiopt does which even has a comment: /* Search every basic block for COND_EXPR we may be able to optimize. We walk the blocks in order that guarantees that a block with a single predecessor is processed before the predecessor. This ensures that we collapse inner ifs before visiting the outer ones, and also that we do not try to visit a removed block. */ bb_order = single_pred_before_succ_order (); n = n_basic_blocks - NUM_FIXED_BLOCKS; for (i = 0; i n; i++) so if the reverse order is better here (and in phiopt?) then it at least needs a comment explaining why. Otherwise the patch looks good, but please iterate with Jeff over the dom testcase issue. Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyways. (tree_ssa_ifcombine): Inverse the order of the basic block walk, increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. 
testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent conditional move to be used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. Thanks, Andrew Pinski 3) The good thing of Andrew P.'s change is that Inverse the order of the basic block walk so it can do combine recursively. But I think we need some heuristic to control the number of ifs. Move too much compares from the inner_bb to outer_bb is not good. I think this depends on the target. For MIPS we don't want an upper bound
[PATCH v4, nds32] Andes nds32 port, libgcc part
Hi, all, This is v4 patch for Andes nds32 port on libgcc part. Thanks Joseph Myers's comments. Now we remove the .file directive from crtzero.S. The rationale of this can be referred to: http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html The diff is as follows and the patch file is attached. diff --git a/libgcc/config/nds32/crtzero.S b/libgcc/config/nds32/crtzero.S index 5ef0361..6509869 100644 --- a/libgcc/config/nds32/crtzero.S +++ b/libgcc/config/nds32/crtzero.S @@ -33,8 +33,6 @@ !! !!== - .file crtzero.S - !!-- !! Jump to start up code !!-- libgcc/ 2013-10-29 Chung-Ju Wu jasonw...@gmail.com Shiva Chen shiva0...@gmail.com * config/nds32 : New directory and files. Best regards, jasonwucj 3-nds32-libgcc.v4.patch.tar.gz Description: GNU Zip compressed data
Re: free is a killer
On Mon, Oct 28, 2013 at 11:05 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 28 Oct 2013, Jeff Law wrote: On 10/26/13 01:15, Marc Glisse wrote: Hello, this patch teaches gcc that free kills the memory its argument points to. The equality test is probably too strict, I guess we can loosen it later (unless you have suggestions?). Note that the corresponding code for BUILT_IN_MEMCPY and others seems suspicious to me, it looks like it is testing for equality between a pointer and a mem_ref, which is unlikely to happen. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2013-10-27 Marc Glisse marc.gli...@inria.fr PR tree-optimization/19831 gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE. gcc/testsuite/ * gcc.dg/tree-ssa/alias-25.c: New file. OK for the trunk. Thanks. I agree the MEM_REF* and VA_END cases look strange and I think they're wrong as well. Did you happen to try fixing them and running the testsuite to see if anything fails? ISTM your testcase could be tweaked to test conclusively if they're doing the right thing -- and regardless of what's found that test can/should make its way into the testsuite. The VA_END case looks ok (though == equality testing is a bit too conservative, the SSA_NAME case of the builtins handling looks wrong indeed. I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy). While I am posting non-submitted patches, does the following vaguely make sense? 
I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p-i and q-b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... --- gcc/tree-ssa-alias.c(revision 204095) +++ gcc/tree-ssa-alias.c(working copy) @@ -1099,22 +1099,24 @@ indirect_refs_may_alias_p (tree ref1 ATT /* If both references are through the same type, they do not alias if the accesses do not overlap. This does extra disambiguation for mixed/pointer accesses but requires strict aliasing. */ if ((TREE_CODE (base1) != TARGET_MEM_REF || (!TMR_INDEX (base1) !TMR_INDEX2 (base1))) (TREE_CODE (base2) != TARGET_MEM_REF || (!TMR_INDEX (base2) !TMR_INDEX2 (base2))) same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (ptrtype1)) == 1 same_type_for_tbaa (TREE_TYPE (base2), TREE_TYPE (ptrtype2)) == 1 - same_type_for_tbaa (TREE_TYPE (ptrtype1), -TREE_TYPE (ptrtype2)) == 1) + (same_type_for_tbaa (TREE_TYPE (ptrtype1), + TREE_TYPE (ptrtype2)) == 1 + || same_type_for_tbaa (TREE_TYPE (TREE_TYPE (ptr1)), +TREE_TYPE (TREE_TYPE (ptr2))) == 1)) No, the type of 'ptr' are not in any way relevant. Richard. return ranges_overlap_p (offset1, max_size1, offset2, max_size2); /* Do type-based disambiguation. */ if (base1_alias_set != base2_alias_set !alias_sets_conflict_p (base1_alias_set, base2_alias_set)) return false; /* Do access-path based disambiguation. */ if (ref1 ref2 (handled_component_p (ref1) || handled_component_p (ref2)) In the category ugly hack that I won't submit, let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). -- Marc Glisse
[PATCH v4, nds32] Andes nds32 port, documentation part
This is v4 patch for Andes nds32 port on documentation part. Thanks Joseph Myers's comments. Now we use @deftypefn and @samp to refine our documentation content. The diff is as follows and the patch file is attached. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index f0002a9..5f91813 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12432,33 +12432,38 @@ once the handler returns. These built-in functions are available for the NDS32 target: -@table @code -@item void __builtin_nds32_isync (int *@var{addr}) +@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr}) Insert an ISYNC instruction into the instruction stream where @var{addr} is an instruction address for serialization. +@end deftypefn -@item void __builtin_nds32_isb (void) +@deftypefn {Built-in Function} void __builtin_nds32_isb (void) Insert an ISB instruction into the instruction stream. +@end deftypefn -@item int __builtin_nds32_mfsr (int @var{sr}) +@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr}) Return the content of a system register which is mapped by @var{sr}. +@end deftypefn -@item int __builtin_nds32_mfusr (int @var{usr}) +@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr}) Return the content of a user space register which is mapped by @var{usr}. +@end deftypefn -@item void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) +@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) Move the @var{value} to a system register which is mapped by @var{sr}. +@end deftypefn -@item void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) +@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) Move the @var{value} to a user space register which is mapped by @var{usr}. +@end deftypefn -@item void __builtin_nds32_setgie_en (void) +@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void) Enable global interrupt. 
+@end deftypefn -@item void __builtin_nds32_setgie_dis (void) +@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void) Disable global interrupt. - -@end table +@end deftypefn @node picoChip Built-in Functions @subsection picoChip Built-in Functions diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 68af98e..cbc0644 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1864,7 +1864,7 @@ supported since version 4.7.2 and is the default in 4.8.0 and newer. @item --with-nds32-lib=@var{library} Specifies that @var{library} setting is used for building @file{libgcc.a}. -Currently, the valid @var{library} is 'newlib' or 'mculib'. +Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}. This option is only supported for the NDS32 target. @item --with-build-time-tools=@var{dir} gcc/ 2013-10-29 Chung-Ju Wu jasonw...@gmail.com Shiva Chen shiva0...@gmail.com * doc/invoke.texi (NDS32 options): Document nds32 specific options. * doc/md.texi (NDS32 family): Document nds32 specific constraints. * doc/install.texi (Cross-Compiler-Specific Options): Document --with-nds32-lib for nds32 target. * doc/extend.texi (Function Attributes): Document nds32 specific attributes. Best regards, jasonwucj 5-nds32-documentation.v4.patch Description: Binary data
Re: free is a killer
On Tue, Oct 29, 2013 at 6:35 AM, Jeff Law l...@redhat.com wrote: On 10/28/13 16:05, Marc Glisse wrote: I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy). Actually, it's fairly easy to see with memset. Something like this: /* { dg-do compile } */ /* { dg-options "-O1 -fdump-tree-optimized" } */ void f (long *p) { *p = 42; p[4] = 42; __builtin_memset (p, 0, 100); } /* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ I'm sure it can be made to work with memcpy as well, it's just harder because you have to ensure the source/dest args don't alias. While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p->i and q->b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... I've never looked at the alias oracle stuff, but ISTM that offsets from the base ought to be enough to disambiguate? In the category "ugly hack that I won't submit", let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). 
Well, one thing that can be done here (I think it comes from Steensgaard and I'm pretty sure it's discussed in Morgan's text) is you assign a storage class and propagate that. Once you do so, you can eliminate certain things. You use a vrp-like engine with a fairly simple meet operator to handle propagation of the storage class. Once you have storage classes for all the pointers, you can make certain deductions about aliasing relationships. Including what things in the malloc storage class can and can not alias. Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). Richard. Jeff
Re: RFA: Andes nds32 port v4 patch
2013/10/29 Joseph S. Myers jos...@codesourcery.com: Since Richard looked at the machine description part of the port, I looked at the other pieces. I'm happy with those pieces with the following minor changes: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00520.html (libgcc): remove the .file directive from crtzero.S. See http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html for why such directives are best avoided. Thank you for the link. Now I realize the rationale of why it is better to remove the .file directive. :) Modify it accordingly: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00522.html (documentation): use @deftypefn rather than @table @code when documenting built-in functions. In install.texi, say @samp{newlib} and @samp{mculib} instead of using '' quotes. Thanks for the comment. Now I modify it as you suggested: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html -- Joseph S. Myers jos...@codesourcery.com Thank you very much for the comments on libgcc and documentation parts. Here I re-list the v4 patch of nds32 port: Machine Description -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html Libgcc and Documentation -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html Are those nds32 port patches OK to be committed? Best regards, jasonwucj
Re: [C++ Patch] PR 58888
Hi, On 10/28/2013 07:05 PM, Jason Merrill wrote: My question is, why do we need that whole block for massaging VAR_DECL initializers? That all ought to be handled properly by cp_finish_decl. Does removing everything after else if (!VAR_P (value)) gcc_unreachable (); work? In fact, it works great, excellent. I only have to tweak a bit g++.dg/other/warning1.C, which, for some reason, used dg-warnings, whereas in fact we have been matching errors (we are removing the "field initializer is not constant" error and replacing it with "'(1.0e+0f / 0.0f)' is not a constant expression" etc.) Tested x86_64-linux. Thanks, Paolo. / Index: cp/decl2.c === --- cp/decl2.c (revision 204152) +++ cp/decl2.c (working copy) @@ -955,26 +955,6 @@ grokfield (const cp_declarator *declarator, /* C++11 NSDMI, keep going. */; else if (!VAR_P (value)) gcc_unreachable (); - else if (!processing_template_decl) - { - if (TREE_CODE (init) == CONSTRUCTOR) - init = digest_init (TREE_TYPE (value), init, tf_warning_or_error); - init = maybe_constant_init (init); - - if (init != error_mark_node && !TREE_CONSTANT (init)) - { - /* We can allow references to things that are effectively -static, since references are initialized with the -address. 
*/ - if (TREE_CODE (TREE_TYPE (value)) != REFERENCE_TYPE - || (TREE_STATIC (init) == 0 - && (!DECL_P (init) || DECL_EXTERNAL (init) == 0))) - { - error ("field initializer is not constant"); - init = error_mark_node; - } - } - } } if (processing_template_decl && VAR_OR_FUNCTION_DECL_P (value)) Index: testsuite/g++.dg/cpp0x/auto40.C === --- testsuite/g++.dg/cpp0x/auto40.C (revision 0) +++ testsuite/g++.dg/cpp0x/auto40.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/5 +// { dg-do compile { target c++11 } } + +#include <initializer_list> + +struct A +{ + static constexpr auto b{1.0}; +}; + +constexpr decltype(A::b) A::b; Index: testsuite/g++.dg/other/warning1.C === --- testsuite/g++.dg/other/warning1.C (revision 204152) +++ testsuite/g++.dg/other/warning1.C (working copy) @@ -7,8 +7,8 @@ extern "C" int printf(const char *, ...); struct S { - static const float inf = 1.0f / 0.0f; // { dg-warning "1.0|float|initializ" } - static const float nan = 0.0f / 0.0f; // { dg-warning "0.0|float|initializ" } + static const float inf = 1.0f / 0.0f; // { dg-error "1.0|float|initializ" } + static const float nan = 0.0f / 0.0f; // { dg-error "0.0|float|initializ" } }; int main()
Re: [PATCH, ARM] Fix line number data for PIC register setup code
Reverted on the 4.7 branch. Thanks. -- Eric Botcazou
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! As discussed on IRC, this patch attempts to preserve VRP computed range info for some simple __builtin_unreachable () using assertions. If there are no immediate uses of some SSA_NAME except for those in a condition guarding __builtin_unreachable () and in ASSERT_EXPR in the following basic block, we can copy the range info from the assert lhs (which is lost during remove_range_assertions) and thus preserve it. Eventually we might then remove the __builtin_unreachable () in that case and still benefit from user annotating the source. In the testcase below in bar we have even without this patch the expected range info on the SSA_NAME set to x + 1, but in foo there was no range info preserved. As we have just one user of get_range_info right now, not creating a testcase. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? int foo (int x) { if (x < 26 || x > 37) __builtin_unreachable (); return x; } int bar (int x) { if (x < 26 || x > 37) __builtin_unreachable (); return x + 1; } 2013-10-25 Jakub Jelinek ja...@redhat.com * tree-vrp.c (remove_range_assertions): If ASSERT_EXPR_VAR has no other immediate uses but in the condition and ASSERT_EXPR and the other successor of the predecessor bb is __builtin_unreachable (), set_range_info of the ASSERT_EXPR_VAR to the range info of the ASSERT_EXPR's lhs. --- gcc/tree-vrp.c.jj 2013-10-24 10:19:21.0 +0200 +++ gcc/tree-vrp.c2013-10-24 14:32:29.065878208 +0200 @@ -6488,12 +6488,16 @@ remove_range_assertions (void) { basic_block bb; gimple_stmt_iterator si; + /* 1 if looking at ASSERT_EXPRs immediately at the beginning of + a basic block preceeded by GIMPLE_COND branching to it and + __builtin_trap, -1 if not yet checked, 0 otherwise. */ + int is_unreachable; /* Note that the BSI iterator bump happens at the bottom of the loop and no bump is necessary if we're removing the statement referenced by the current BSI. 
*/ FOR_EACH_BB (bb) -for (si = gsi_start_bb (bb); !gsi_end_p (si);) +for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);) { gimple stmt = gsi_stmt (si); gimple use_stmt; @@ -6501,30 +6505,96 @@ remove_range_assertions (void) if (is_gimple_assign (stmt) gimple_assign_rhs_code (stmt) == ASSERT_EXPR) { + tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); tree var; tree cond = fold (ASSERT_EXPR_COND (rhs)); - use_operand_p use_p; + use_operand_p use_p, use2_p; imm_use_iterator iter; gcc_assert (cond != boolean_false_node); - /* Propagate the RHS into every use of the LHS. */ var = ASSERT_EXPR_VAR (rhs); - FOR_EACH_IMM_USE_STMT (use_stmt, iter, -gimple_assign_lhs (stmt)) + gcc_assert (TREE_CODE (var) == SSA_NAME); + + if (!POINTER_TYPE_P (TREE_TYPE (lhs)) + SSA_NAME_RANGE_INFO (lhs)) + { + if (is_unreachable == -1) + { + is_unreachable = 0; + if (single_pred_p (bb)) + { + basic_block pred_bb = single_pred (bb); + gimple last = last_stmt (pred_bb); + if (last gimple_code (last) == GIMPLE_COND) + { + basic_block other_bb + = EDGE_SUCC (pred_bb, 0)-dest; + if (other_bb == bb) + other_bb = EDGE_SUCC (pred_bb, 1)-dest; + if (EDGE_COUNT (other_bb-succs) == 0) + { + gimple_stmt_iterator gsi + = gsi_after_labels (other_bb); + if (!gsi_end_p (gsi) + gimple_call_builtin_p + (gsi_stmt (gsi), + BUILT_IN_UNREACHABLE)) + is_unreachable = 1; + } + } Please factor this out into a separate function. Like obfuscated_fallthru_edge_p () (better name?). Best somewhere in tree-cfg.c. Also you want to skip debug stmts ... (not sure if we reliably enough remove them, but I can envision a compare-debug fail with some DCE disabled at least). + } + } + /* Handle +if (x_7 = 10 x_7 20) + __builtin_unreachable (); +x_8 = ASSERT_EXPR x_7, ...; +if the only uses of x_7 are in the ASSERT_EXPR and +in the condition. In that
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! tree-ssa-ccp.c already computes which bits are known to be zero, but we preserve that info only for pointers and not for integers. This patch changes SSA_NAME_RANGE_INFO, so we preserve that info even for integers. The bitmask is also computed from range info. There are no users of this besides ccp itself right now, but I'm working on using that info e.g. in the vectorizer. I had to tweak one testcase, because it relied on one of the two warnings being emitted during VRP1 and one during VRP2, but because of the range info derived nonzero_bits we now manage to optimize away the second test during CCP3, which doesn't have -Wtype-limits warnings. If it would be too important to keep warning even there, I guess it could check get_range_info when folding a condition to 0/1 and warn if the rhs isn't in the lhs' range. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? During the development of the patch I had there code to not only use get_range_info if otherwise returning VARYING, but also if it was already constant, but that failed bootstrap miserably: It was something along the lines of: ++/* If get_nonzero_bits tells us some bits ++ have to be zero and they aren't, something ++ went wrong. Only bits in value where val.mask ++ has zeroes (i.e. valid bit from CCP), mask has ones ++ (don't care about bits outside of precision) and ++ has zeroes in nonzero_bits (guaranteed to be zero) ++ should be verified to be zero. 
*/ ++double_int valv = tree_to_double_int (val.value); ++gcc_assert ((valv ~val.mask ++ ~nonzero_bits mask).is_zero ()); ++if (!(valv ~nonzero_bits mask).is_zero ()) ++ val.value = double_int_to_tree (TREE_TYPE (lhs), ++ valv nonzero_bits); ++if (nonzero_bits.is_zero ()) ++ val.mask = double_int_zero; ++else ++ val.mask = val.mask (nonzero_bits | ~mask); The problem was that during the iteration when only a subset of the edges were executable it sometimes failed the assertion, dunno if that is the result of the nonzero bits info in this case coming from completely different algorithm (from VRP ranges) and that simply until the CCP iterations settle it can be wrong, so just to be safe in that case I'm adding the info only in ccp_finalize (and the new nonzero_bits mask with the old one). From what I could see, the range info on that testcase looked just fine, the nonzero_bits derived from it too, if desirable, I can try to cook up a small testcase showing that (reduce from libgcov.c that exhibited that). Surely you can't rely on CCP and VRP compute exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a not set bit in nonzer_bits is don't know or if it is zero. And you cannot do an assert during iteration (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). More comments inline 2013-10-24 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant. (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. 
(evaluate_stmt): If we'd return otherwise VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range, if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during VRP1 pass. --- gcc/gimple-pretty-print.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, , ); pp_double_int (buffer, max,
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Sorry for the delay, I finally found time to look at it. Np, thanks for helping! While your patch fixes the issue, I wonder ... potentially increase register pressure. Makes sense. I didn't take care of this because I believed that we can freely allocate vregs and rely on register allocator do it's work. So here is (completely untested) alternate fix: Nice, this seems to fix my repro cases. I'll only be able to test my larger ARM app in 1-2 days. We could wait until then or commit now. -Y
[AArch64] Fix size of memory store for the vstn_lane intrinsics
Hi, The vstn_lane_lane_type intrinsics should write (sizeof (lane_type) * n) bytes to memory. In their current form, their asm constraints suggest a write size of (sizeof (vector_type) * n). This is anywhere from 1 to 16 times too much data, can cause huge headaches with dead store elimination. This patch better models how much data we will be writing, which in turn lets us eliminate the memory clobber. Together, we avoid the problems with dead store elimination. Tested with aarch64.exp and checked the C++ neon mangling test which often breaks when you do these ugly casts. OK? Thanks, James --- gcc/ 2013-10-29 James Greenhalgh james.greenha...@arm.com * config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Better model data size. (__ST3_LANE_FUNC): Likewise. (__ST4_LANE_FUNC): Likewise. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 787ff15..7a63ea1 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -14704,16 +14704,19 @@ __LD4_LANE_FUNC (uint64x2x4_t, uint64_t, 2d, d, u64, q) #define __ST2_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[2]; } __ST2_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst2 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST2_LANE_STRUCTURE_##intype *__p =\ +(__ST2_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix , v17. #regsuffix }, %1\n\t \ st2 {v16. #lnsuffix , v17. 
#lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17); \ + : v16, v17); \ } __ST2_LANE_FUNC (int8x8x2_t, int8_t, 8b, b, s8,) @@ -14743,16 +14746,19 @@ __ST2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q) #define __ST3_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[3]; } __ST3_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst3 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst3 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST3_LANE_STRUCTURE_##intype *__p =\ +(__ST3_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix - v18. #regsuffix }, %1\n\t \ st3 {v16. #lnsuffix - v18. #lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17, v18);\ + : v16, v17, v18); \ } __ST3_LANE_FUNC (int8x8x3_t, int8_t, 8b, b, s8,) @@ -14782,16 +14788,19 @@ __ST3_LANE_FUNC (uint64x2x3_t, uint64_t, 2d, d, u64, q) #define __ST4_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[4]; } __ST4_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst4 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst4 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST4_LANE_STRUCTURE_##intype *__p =\ +(__ST4_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix - v19. #regsuffix }, %1\n\t \ st4 {v16. #lnsuffix - v19. #lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17, v18, v19); \ + : v16, v17, v18, v19);\ } __ST4_LANE_FUNC (int8x8x4_t, int8_t, 8b, b, s8,)
Re: [PATCH] Use get_nonzero_bits to improve vectorization
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! The following patch makes use of the computed nonzero_bits preserved in the SSA_NAME_RANGE_INFO. I chose to write a new routine instead of improving current highest_pow2_factor, because that routine didn't care about overflows etc. and by working on ctz numbers instead of powers of two in UHWI we can handle even larger constants etc. highest_pow2_factor could very well overflow to zero etc. So, the patch introduces a new tree_ctz routine and reimplements highest_pow2_factor on top of that, plus uses tree_ctz also in get_object_alignment_2 and in the vectorizer to determine if it can avoid scalar loop for bound (and indirectly also in the analysis whether peeling for alignment is needed). With this patch, e.g. int a[1024]; void foo (int x, int y) { int i; x &= -32; y &= -32; for (i = x + 32; i < y; i++) a[i]++; } can be vectorized without any peeling for alignment or scalar loop afterwards. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-10-25 Jakub Jelinek ja...@redhat.com * tree.c (tree_ctz): New function. * tree.h (tree_ctz): New prototype. * tree-ssanames.h (get_range_info, get_nonzero_bits): Change first argument from tree to const_tree. * tree-ssanames.c (get_range_info, get_nonzero_bits): Likewise. * tree-vectorizer.h (vect_generate_tmps_on_preheader): New prototype. * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): No longer static. * expr.c (highest_pow2_factor): Reimplemented using tree_ctz. * tree-vect-loop.c (vect_analyze_loop_operations, vect_transform_loop): Don't force scalar loop for bound just because number of iterations is unknown, only do it if it is not known to be a multiple of vectorization_factor. * builtins.c (get_object_alignment_2): Use tree_ctz on offset. 
--- gcc/tree.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/tree.c2013-10-25 15:00:55.296178794 +0200 @@ -2213,6 +2213,110 @@ tree_floor_log2 (const_tree expr) : floor_log2 (low)); } +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of it's type. */ + +int unsigned int? +tree_ctz (const_tree expr) +{ + if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)) + !POINTER_TYPE_P (TREE_TYPE (expr))) +return 0; + + int ret1, ret2, prec = TYPE_PRECISION (TREE_TYPE (expr)); + switch (TREE_CODE (expr)) +{ +case INTEGER_CST: + ret1 = tree_to_double_int (expr).trailing_zeros (); + return MIN (ret1, prec); +case SSA_NAME: + ret1 = get_nonzero_bits (expr).trailing_zeros (); + return MIN (ret1, prec); +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); This first recurses but if either one returns 0 you don't have to (that cuts down the recursion from exponential to linear in the common case?). Thus, early out on ret == 0? + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); Why do you need that here but not elsewhere when processing binary ops? + return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) check that first before recursing for op0 - if op1 is negative you simply return ret1 which looks wrong, too. 
+((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) +(unsigned HOST_WIDE_INT) prec)) This check is to avoid overflowing ret1 + ret2? If so, why not add + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); ret2 = MIN (ret2, prec); instead? + return MIN (ret1 + ret2, prec); + } + return ret1; +case RSHIFT_EXPR: + if (host_integerp (TREE_OPERAND (expr, 1), 1) +((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) +(unsigned HOST_WIDE_INT) prec)) + { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); + if (ret1 ret2) + return ret1 - ret2; + } + return 0; Seems to be slightly better structured. Looks like you assume only positive shift amounts exist in the LSHIFT_EXPR case, I'm not sure that's a valid
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
My reading of memory_address_addr_space is that MODE is the mode of the memory reference, not the mode of the address. I fail to see how passing in the mode of the address in the first call can be correct. What am I missing? Nothing, it's me confusing rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t); with rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t); for a long time apparently... Sorry about that. In the meantime, I installed the obvious patch to eliminate the small redundancy. Tested on x86_64/Linux. 2013-10-29 Eric Botcazou ebotca...@adacore.com * expr.c (expand_expr_real_1) MEM_EXPR: Eliminate small redundancy. -- Eric BotcazouIndex: expr.c === --- expr.c (revision 204106) +++ expr.c (working copy) @@ -9648,8 +9648,8 @@ expand_expr_real_1 (tree exp, rtx target rtx off = immed_double_int_const (mem_ref_offset (exp), address_mode); op0 = simplify_gen_binary (PLUS, address_mode, op0, off); + op0 = memory_address_addr_space (mode, op0, as); } - op0 = memory_address_addr_space (mode, op0, as); temp = gen_rtx_MEM (mode, op0); set_mem_attributes (temp, exp, 0); set_mem_addr_space (temp, as);
Re: libsanitizer merge from upstream r191666
On Wed, Oct 2, 2013 at 12:51 PM, Konstantin Serebryany 2013-10-XX Kostya Serebryany k...@google.com * g++.dg/asan/asan_test.cc: Update the test to match the fresh asan run-time. * c-c++-common/asan/stack-overflow-1.c: Ditto. === gcc/ChangeLog 2013-10-XX Kostya Serebryany k...@google.com * asan.c: Update to match the changed asan API. (asan_emit_stack_protection): update the string stored in the stack red zone to match new API. Store the PC of the current function in the red zone. (asan_global_struct): update the __asan_global definition to match the new API. (asan_add_global): Ditto. * sanitizer.def: rename __asan_init_v1 to __asan_init_v3 The Update to match the changed asan API. should either be dropped, or come on a line before the * asan.c (asan_emit_stack_protection): line. All descriptions should start with capital letters, end with ., two spaces after . if followed by another sentence. Besides that, here is (completely untested) attempt to give you the pc of the first instruction of the function and two minor changes (pp_string (something, ) is useless and in two spots I've noticed you didn't add space before ( in function call). Finally, if the new libasan is ABI incompatible with the old one, which seems it is, then libsanitizer/asan/libtool-version (and perhaps also libsanitizer/tsan/libtool-version, haven't looked if that one is ABI compatible or not) needs to be bumped (to 1:0:0 ?). --- gcc/asan.c.jj 2013-10-29 11:58:30.0 +0100 +++ gcc/asan.c 2013-10-29 13:04:07.709667677 +0100 @@ -921,6 +921,15 @@ asan_clear_shadow (rtx shadow_mem, HOST_ add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100); } +void +asan_function_start (void) +{ + section *fnsec = function_section (current_function_decl); + switch_to_section (fnsec); + ASM_OUTPUT_DEBUG_LABEL (asm_out_file, LASANPC, + current_function_funcdef_no); +} + /* Insert code to protect stack vars. The prologue sequence should be emitted directly, epilogue sequence returned. 
BASE is the register holding the stack base, against which OFFSETS array offsets are relative to, OFFSETS @@ -936,12 +945,13 @@ asan_emit_stack_protection (rtx base, HO int length) { rtx shadow_base, shadow_mem, ret, mem; + char buf[30]; unsigned char shadow_bytes[4]; HOST_WIDE_INT base_offset = offsets[length - 1], offset, prev_offset; HOST_WIDE_INT last_offset, last_size; int l; unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT; - tree str_cst; + tree str_cst, decl; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); @@ -949,7 +959,6 @@ asan_emit_stack_protection (rtx base, HO /* First of all, prepare the description string. */ pretty_printer asan_pp; - pp_string (asan_pp, ); pp_decimal_int (asan_pp, length / 2 - 1); pp_space (asan_pp); for (l = length - 2; l; l -= 2) @@ -980,7 +989,17 @@ asan_emit_stack_protection (rtx base, HO mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode)); emit_move_insn (mem, expand_normal (str_cst)); mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode)); - emit_move_insn (mem, expand_normal (str_cst)); // FIXME: should be cur_pc. 
+ ASM_GENERATE_INTERNAL_LABEL (buf, LASANPC, current_function_funcdef_no); + decl = build_decl (DECL_SOURCE_LOCATION (current_function_decl), +VAR_DECL, get_identifier (buf), char_type_node); + TREE_ADDRESSABLE (decl) = 1; + TREE_READONLY (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 1; + TREE_STATIC (decl) = 1; + TREE_PUBLIC (decl) = 0; + TREE_USED (decl) = 1; + emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl))); shadow_base = expand_binop (Pmode, lshr_optab, base, GEN_INT (ASAN_SHADOW_SHIFT), NULL_RTX, 1, OPTAB_DIRECT); @@ -1979,8 +1998,8 @@ asan_add_global (tree decl, tree type, v pp_string (asan_pp, unknown); str_cst = asan_pp_string (asan_pp); - pp_string(module_name_pp, main_input_filename); - module_name_cst = asan_pp_string(module_name_pp); + pp_string (module_name_pp, main_input_filename); + module_name_cst = asan_pp_string (module_name_pp); if (asan_needs_local_alias (decl)) { --- gcc/asan.h.jj 2013-01-11 09:02:50.0 +0100 +++ gcc/asan.h 2013-10-29 12:37:54.190798947 +0100 @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. #ifndef TREE_ASAN #define TREE_ASAN +extern void asan_function_start (void); extern void asan_finish_file (void); extern rtx asan_emit_stack_protection (rtx, HOST_WIDE_INT *, tree *, int); extern bool asan_protect_global (tree); --- gcc/final.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/final.c 2013-10-29 12:49:33.609176613 +0100 @@ -78,6 +78,7 @@ along with GCC; see the
[PATCH][ARM] New rtx cost table for Cortex-A7
Hi all, This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning structure to contain it. Tested arm-none-eabi on qemu and no benchmark regressions. Ok for trunk? Thanks, Kyrill [gcc/] 2013-10-29 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (cortexa7_extra_costs): New table. (arm_cortex_a7_tune): New. * config/arm/arm-cores.def: Use cortex_a7 tuning for cortex-a7.diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 17c9bf3..79e2e87 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -125,7 +125,7 @@ ARM_CORE(arm1156t2-s, arm1156t2s, 6T2, FL_LDSCHED, v6t2) ARM_CORE(arm1156t2f-s, arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, v6t2) ARM_CORE(generic-armv7-a, genericv7a, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a5, cortexa5, 7A, FL_LDSCHED, cortex_a5) -ARM_CORE(cortex-a7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex) +ARM_CORE(cortex-a7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a7) ARM_CORE(cortex-a8, cortexa8, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a9, cortexa9, 7A, FL_LDSCHED, cortex_a9) ARM_CORE(cortex-a15, cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4cdac60..768bc21 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1050,6 +1050,107 @@ const struct cpu_cost_table generic_extra_costs = } }; +const struct cpu_cost_table cortexa7_extra_costs = +{ + /* ALU */ + { +0, /* Arith. */ +0, /* Logical. */ +COSTS_N_INSNS (1), /* Shift. */ +COSTS_N_INSNS (1), /* Shift_reg. */ +COSTS_N_INSNS (1), /* Arith_shift. */ +COSTS_N_INSNS (1), /* Arith_shift_reg. */ +COSTS_N_INSNS (1), /* Log_shift. */ +COSTS_N_INSNS (1), /* Log_shift_reg. */ +COSTS_N_INSNS (1), /* Extend. */ +COSTS_N_INSNS (1), /* Extend_arith. */ +COSTS_N_INSNS (1), /* Bfi. */ +COSTS_N_INSNS (1), /* Bfx. */ +COSTS_N_INSNS (1), /* Clz. */ +0, /* non_exec. */ +true /* non_exec_costs_exec. 
*/ + }, + + { +/* MULT SImode */ +{ + 0, /* Simple. */ + COSTS_N_INSNS (1), /* Flag_setting. */ + COSTS_N_INSNS (1), /* Extend. */ + COSTS_N_INSNS (1), /* Add. */ + COSTS_N_INSNS (1), /* Extend_add. */ + COSTS_N_INSNS (7) /* Idiv. */ +}, +/* MULT DImode */ +{ + 0, /* Simple (N/A). */ + 0, /* Flag_setting (N/A). */ + COSTS_N_INSNS (1), /* Extend. */ + 0, /* Add. */ + COSTS_N_INSNS (2), /* Extend_add. */ + 0/* Idiv (N/A). */ +} + }, + /* LD/ST */ + { +COSTS_N_INSNS (1), /* Load. */ +COSTS_N_INSNS (1), /* Load_sign_extend. */ +COSTS_N_INSNS (3), /* Ldrd. */ +COSTS_N_INSNS (1), /* Ldm_1st. */ +1, /* Ldm_regs_per_insn_1st. */ +2, /* Ldm_regs_per_insn_subsequent. */ +COSTS_N_INSNS (2), /* Loadf. */ +COSTS_N_INSNS (2), /* Loadd. */ +COSTS_N_INSNS (1), /* Load_unaligned. */ +COSTS_N_INSNS (1), /* Store. */ +COSTS_N_INSNS (3), /* Strd. */ +COSTS_N_INSNS (1), /* Stm_1st. */ +1, /* Stm_regs_per_insn_1st. */ +2, /* Stm_regs_per_insn_subsequent. */ +COSTS_N_INSNS (2), /* Storef. */ +COSTS_N_INSNS (2), /* Stored. */ +COSTS_N_INSNS (1) /* Store_unaligned. */ + }, + { +/* FP SFmode */ +{ + COSTS_N_INSNS (15), /* Div. */ + COSTS_N_INSNS (3), /* Mult. */ + COSTS_N_INSNS (7), /* Mult_addsub. */ + COSTS_N_INSNS (7), /* Fma. */ + COSTS_N_INSNS (3), /* Addsub. */ + COSTS_N_INSNS (3), /* Fpconst. */ + COSTS_N_INSNS (3), /* Neg. */ + COSTS_N_INSNS (3), /* Compare. */ + COSTS_N_INSNS (3), /* Widen. */ + COSTS_N_INSNS (3), /* Narrow. */ + COSTS_N_INSNS (3), /* Toint. */ + COSTS_N_INSNS (3), /* Fromint. */ + COSTS_N_INSNS (3) /* Roundint. */ +}, +/* FP DFmode */ +{ + COSTS_N_INSNS (30), /* Div. */ + COSTS_N_INSNS (6), /* Mult. */ + COSTS_N_INSNS (10), /* Mult_addsub. */ + COSTS_N_INSNS (7), /* Fma. */ + COSTS_N_INSNS (3), /* Addsub. */ + COSTS_N_INSNS (3), /* Fpconst. */ + COSTS_N_INSNS (3), /* Neg. */ + COSTS_N_INSNS (3), /* Compare. */ + COSTS_N_INSNS (3), /* Widen. */ + COSTS_N_INSNS (3), /* Narrow. */ + COSTS_N_INSNS (3), /* Toint. */ + COSTS_N_INSNS (3), /* Fromint. 
*/ + COSTS_N_INSNS (3) /* Roundint. */ +} + }, + /* Vector */ + { +COSTS_N_INSNS (1) /* Alu. */ + } +}; + const struct cpu_cost_table cortexa15_extra_costs = { /* ALU */ @@ -1266,6 +1367,22 @@ const struct tune_params arm_cortex_tune = false /* Prefer Neon for 64-bits bitops. */ }; +const struct tune_params arm_cortex_a7_tune = +{ + arm_9e_rtx_costs, + cortexa7_extra_costs, + NULL, + 1, /* Constant
Re: [PATCH][ARM] New rtx cost table for Cortex-A7
On 10/29/13 12:15, Kyrill Tkachov wrote: Hi all, This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning structure to contain it. Tested arm-none-eabi on qemu and no benchmark regressions. Ok for trunk? Ok. Ramana
Re: [wide-int] Treat order comparisons like other binary ops
On Sun, 27 Oct 2013, Richard Sandiford wrote: Until now, eq_p and ne_p have enforced the same argument rules as things like addition, while order comparisons like lts_p have treated the two arguments as independent and signed. Richard, I think you said on IRC that you thought lts_p should behave like the others. E.g. lts_p on two trees or two rtxes should make sure that the precisions are the same. Yes. This patch does that. I think all uses of INT_CST_LT and INT_CST_LT_UNSIGNED are really comparing to infinite precision, and the UNSIGNED distinction is only there because double_int isn't wide enough to be effectively infinite. Is that right? Yes, it's only because of the case of a set MSB on the double-int. Since that isn't a problem with widest_int, the patch gets rid of INT_CST_LT_UNSIGNED and only uses INT_CST_LT. The c-lex.c change includes a generic change that I just posted for trunk. I rejigged the order in tree.h slightly so that it matches trunk. Tested on powerpc64-linux-gnu and x86_64-linux-gnu. OK for wide-int? Ok. Thanks, Richard. Thanks, Richard

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	2013-10-27 14:11:53.006510519 +
+++ gcc/c-family/c-common.c	2013-10-27 14:19:27.667578745 +
@@ -4101,20 +4101,10 @@ shorten_compare (tree *op0_ptr, tree *op
       maxval = convert (*restype_ptr, maxval);
     }

-  if (unsignedp && unsignedp0)
-    {
-      min_gt = INT_CST_LT_UNSIGNED (primop1, minval);
-      max_gt = INT_CST_LT_UNSIGNED (primop1, maxval);
-      min_lt = INT_CST_LT_UNSIGNED (minval, primop1);
-      max_lt = INT_CST_LT_UNSIGNED (maxval, primop1);
-    }
-  else
-    {
-      min_gt = INT_CST_LT (primop1, minval);
-      max_gt = INT_CST_LT (primop1, maxval);
-      min_lt = INT_CST_LT (minval, primop1);
-      max_lt = INT_CST_LT (maxval, primop1);
-    }
+  min_gt = INT_CST_LT (primop1, minval);
+  max_gt = INT_CST_LT (primop1, maxval);
+  min_lt = INT_CST_LT (minval, primop1);
+  max_lt = INT_CST_LT (maxval, primop1);

   val = 0;
   /* This used to be a switch, but Genix compiler can't handle that.  */
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	2013-10-27 14:11:53.006510519 +
+++ gcc/c-family/c-lex.c	2013-10-27 14:19:27.664578718 +
@@ -48,9 +48,9 @@ static tree interpret_float (const cpp_t
 					enum overflow_type *);
 static tree interpret_fixed (const cpp_token *, unsigned int);
 static enum integer_type_kind narrowest_unsigned_type
-	(const wide_int &, unsigned int);
+	(const widest_int &, unsigned int);
 static enum integer_type_kind narrowest_signed_type
-	(const wide_int &, unsigned int);
+	(const widest_int &, unsigned int);
 static enum cpp_ttype lex_string (const cpp_token *, tree *, bool, bool);
 static tree lex_charconst (const cpp_token *);
 static void update_header_times (const char *);
@@ -526,7 +526,7 @@ c_lex_with_flags (tree *value, location_
    there isn't one.  */

 static enum integer_type_kind
-narrowest_unsigned_type (const wide_int &val, unsigned int flags)
+narrowest_unsigned_type (const widest_int &val, unsigned int flags)
 {
   int itk;
@@ -545,7 +545,7 @@ narrowest_unsigned_type (const wide_int
 	continue;
       upper = TYPE_MAX_VALUE (integer_types[itk]);
-      if (wi::geu_p (upper, val))
+      if (wi::geu_p (wi::to_widest (upper), val))
 	return (enum integer_type_kind) itk;
     }
@@ -554,7 +554,7 @@ narrowest_unsigned_type (const wide_int

 /* Ditto, but narrowest signed type.  */

 static enum integer_type_kind
-narrowest_signed_type (const wide_int &val, unsigned int flags)
+narrowest_signed_type (const widest_int &val, unsigned int flags)
 {
   int itk;
@@ -573,7 +573,7 @@ narrowest_signed_type (const wide_int &v
 	continue;
       upper = TYPE_MAX_VALUE (integer_types[itk]);
-      if (wi::geu_p (upper, val))
+      if (wi::geu_p (wi::to_widest (upper), val))
 	return (enum integer_type_kind) itk;
     }
@@ -588,20 +588,18 @@ interpret_integer (const cpp_token *toke
   tree value, type;
   enum integer_type_kind itk;
   cpp_num integer;
-  cpp_options *options = cpp_get_options (parse_in);
-  HOST_WIDE_INT ival[2];
-  wide_int wval;
+  HOST_WIDE_INT ival[3];

   *overflow = OT_NONE;

   integer = cpp_interpret_integer (parse_in, token, flags);
-  integer = cpp_num_sign_extend (integer, options->precision);
   if (integer.overflow)
     *overflow = OT_OVERFLOW;

   ival[0] = integer.low;
   ival[1] = integer.high;
-  wval = wide_int::from_array (ival, 2, HOST_BITS_PER_WIDE_INT * 2);
+  ival[2] = 0;
+  widest_int wval = widest_int::from_array (ival, 3);

   /* The type of a constant with a U suffix is
Re: [wide-int] More optimisations
On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the:

#define CONSTANT(X) (__builtin_constant_p (X) && (X))

idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Ok with that change. Thanks, Richard. - Add fast paths to ltu_p for constants. E.g.:

bool f1 (const_tree x) { return wi::eq_p (x, 0); }

now gives:

	xorl	%eax, %eax
	cmpw	$1, 4(%rdi)
	je	.L5
	rep ret
	.p2align 4,,10
	.p2align 3
.L5:
	cmpq	$0, 16(%rdi)
	sete	%al
	ret

bool f2 (const_tree x, HOST_WIDE_INT y) { return wi::eq_p (x, y); }

gives:

	movq	8(%rdi), %rax
	movzwl	52(%rax), %edx
	xorl	%eax, %eax
	andw	$1023, %dx
	cmpw	$1, 4(%rdi)
	je	.L10
	rep ret
	.p2align 4,,10
	.p2align 3
.L10:
	xorq	16(%rdi), %rsi
	movzwl	%dx, %edx
	movl	$64, %ecx
	subl	%edx, %ecx
	movq	%rsi, %rax
	salq	%cl, %rax
	testl	%ecx, %ecx
	cmovg	%rax, %rsi
	testq	%rsi, %rsi
	sete	%al
	ret

bool f3 (HOST_WIDE_INT x, const_tree y) { return wi::lts_p (x, y); }

is similarly ugly because of the way that it ignores TYPE_SIGN and so has to explicitly sign-extend small-prec cases:

	movq	8(%rsi), %rax
	movzwl	4(%rsi), %ecx
	movzwl	52(%rax), %edx
	andl	$1023, %edx
	cmpl	$1, %ecx
	je	.L16
	leal	-1(%rcx), %eax
	sall	$6, %ecx
	subl	%edx, %ecx
	movq	16(%rsi,%rax,8), %rax
	movq	%rax, %rdx
	salq	%cl, %rdx
	testl	%ecx, %ecx
	cmovg	%rdx, %rax
	sarq	$63, %rax
	addl	$1, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L16:
	cmpl	$63, %edx
	movq	16(%rsi), %rax
	ja	.L13
	movb	$64, %cl
	subl	%edx, %ecx
	salq	%cl, %rax
	sarq	%cl, %rax
.L13:
	cmpq	%rdi, %rax
	setg	%al
	ret

but: bool f4 (HOST_WIDE_INT x, const_tree y) { return wi::lts_p (x, wi::to_widest (y)); }

is a bit more respectable:

	movzwl	6(%rsi), %eax
	cmpl	$1, %eax
	je	.L20
	subl	$1, %eax
	movq	16(%rsi,%rax,8), %rax
	sarq	$63, %rax
	addl	$1, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L20:
	cmpq	%rdi, 16(%rsi)
	setg	%al
	ret

For similar reasons: bool f5 (const_tree x) { return wi::ltu_p (x, 100); }

gives:

	movq	8(%rdi), %rax
	movzwl	52(%rax), %ecx
	xorl	%eax, %eax
	andw	$1023, %cx
	cmpw	$1, 4(%rdi)
	je	.L26
	rep ret
	.p2align 4,,10
	.p2align 3
.L26:
	cmpw	$63, %cx
	ja	.L23
	movl	$1, %eax
	salq	%cl, %rax
	subq	$1, %rax
	andq	16(%rdi), %rax
.L24:
	cmpq	$99, %rax
	setbe	%al
	ret
	.p2align 4,,10
	.p2align 3
.L23:
	movq	16(%rdi), %rax
	jmp	.L24

but: bool f6 (const_tree x) { return wi::ltu_p (wi::to_widest (x), 100); }

gives:

	xorl	%eax, %eax
	cmpw	$1, 6(%rdi)
	je	.L30
	rep ret
	.p2align 4,,10
	.p2align 3
.L30:
	cmpq	$99, 16(%rdi)
	setbe	%al
	ret

Tested on powerpc64-linux-gnu and x86_64-linux-gnu. OK for wide-int? Thanks, Richard

Index: gcc/system.h
===
--- gcc/system.h	2013-10-27 14:25:19.144723977 +
+++ gcc/system.h	2013-10-27 14:25:20.716738045 +
@@ -711,6 +711,12 @@ #define gcc_unreachable() __builtin_unre
 #define gcc_unreachable() (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
 #endif

+#if GCC_VERSION >= 3001
+#define CAN_TELL(X) (__builtin_constant_p (X) && (X))
+#else
+#define CAN_TELL(X) (false && (X))
+#endif
+
 /* Until we can use STATIC_ASSERT.  */
 #define STATIC_ASSERT(X) \
   typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
Index: gcc/wide-int.h
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Jakub Jelinek ja...@redhat.com writes: On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote: I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit? Can you please be more verbose? Do you think the proposed patch is ok or should I provide some additional info? Jakub, Dodji, Any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms. Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)) it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increase register pressure. So here is (completely untested) alternate fix: Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Thanks, Richard

Index: gcc/asan.c
===
--- gcc/asan.c	(revision 204124)
+++ gcc/asan.c	(working copy)
@@ -876,9 +876,10 @@
 }

 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
-   though.  */
+   though.  If the address of the next byte is in a known register, return
+   that register, otherwise return null.  */

-static void
+static rtx
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
@@ -893,12 +894,12 @@
   if (insn == NULL_RTX)
     {
       emit_insn (insns);
-      return;
+      return 0;
     }

   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +913,7 @@
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return addr;
 }

 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1050,15 @@
 		     (last_offset - prev_offset) >> ASAN_SHADOW_SHIFT);
 	  prev_offset = last_offset;
-	  asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+	  rtx end_addr = asan_clear_shadow (shadow_mem,
+					    last_size >> ASAN_SHADOW_SHIFT);
+	  if (end_addr)
+	    {
+	      shadow_mem
+		= adjust_automodify_address (shadow_mem, VOIDmode, end_addr,
+					     last_size >> ASAN_SHADOW_SHIFT);
+	      prev_offset += last_size;
+	    }
 	  last_offset = offset;
 	  last_size = 0;
 	}
RE: [PING] [AArch64] Peepholes to generate ldp and stp instructions
Hi, You are better off CCing the maintainers for such reviews. Let me do that for you. I cannot approve or reject this patch but I have a few comments as below. Thanks for the quick review and comments. Please find attached the modified patch as per the review comments. Please review the same and let me know if it's okay. Built and tested on aarch64-thunder-elf (using Cavium's internal simulator). No new regressions.

2013-10-29 Naveen H.S naveen.hurugalaw...@caviumnetworks.com

gcc/
* config/aarch64/aarch64.md (peephole2 to generate ldp instruction for 2 consecutive loads from memory): New. (peephole2 to generate stp instruction for 2 consecutive stores to memory in integer mode): New. (peephole2 to generate ldp instruction for 2 consecutive loads from memory in floating point mode): New. (peephole2 to generate stp instruction for 2 consecutive stores to memory in floating point mode): New.

gcc/testsuite
* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks, Naveen

--- gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc/config/aarch64/aarch64.md	2013-10-29 17:40:48.516129561 +0530
@@ -1068,6 +1068,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && GP_REGNUM_P (REGNO (operands[0])) && GP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0], 0), 1))
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && GP_REGNUM_P (REGNO (operands[1])) && GP_REGNUM_P (REGNO (operands[3]))
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1102,6 +1144,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && FP_REGNUM_P (REGNO (operands[0])) && FP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1119,6 +1182,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0], 0), 1))
+   && GET_MODE (operands[1]) ==
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote: Any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms. Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)) it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increase register pressure. So here is (completely untested) alternate fix: Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Why? If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. Jakub
Re: [wide-int] More optimisations
Richard Biener rguent...@suse.de writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as len == 1, which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. Thanks, Richard
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 10/29/13 04:17, Richard Biener wrote: On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is an ICE in the LTO streamer when it reads instrumented code. The second one is uninitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes an ICE because instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when the checker testsuite has a 100% pass rate with LTO. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kind of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. In that case, Ilya, you've got to resolve the LTO issues before this can move forward. jeff
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Jakub Jelinek ja...@redhat.com writes: On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote: If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Why? Well, that was the traditional rule: /* The caller must not alter the value in the register we return, since we mark it as a constant register.  */ rtx force_reg (enum machine_mode mode, rtx x) { but maybe it doesn't matter now, since the constant register flag has gone and since we seem to use REG_EQUAL rather than REG_EQUIV... But I thought the register pressure thing was to avoid having the original base live across the loop. Doesn't that apply for reg + offset as well as plain reg?
If we have to force some_reg + 32 into a temporary and then loop on it, using the new base should avoid keeping both it and some_reg live at the same time. Plus smaller offsets are often better on some targets. Thanks, Richard
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Tue, Oct 29, 2013 at 01:06:21PM +, Richard Sandiford wrote: If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. But I thought the register pressure thing was to avoid having the original base live across the loop. Doesn't that apply for reg + offset as well as plain reg? If we have to force some_reg + 32 into a temporary and then loop on it, using the new base should avoid keeping both it and some_reg live at the same time. Plus smaller offsets are often better on some targets. On the other side, if we force the address to be just register based when it previously wasn't, it makes life harder for alias analysis etc.; at least in the past (say on ia64) I remember that the fact that it didn't allow anything but registers as valid addresses often resulted in tons of missed optimizations, so in the end I guess I have nothing against the original patch (with whatever form of the copy-to-reg call is desirable). Jakub
Re: libsanitizer merge from upstream r191666
On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote: Thanks! (At this time I will be slow with responses due to travel.) BTW, I don't have a compiled clang/llvm pre-3.4 around to look at: for the use-after-return, do you always emit the __asan_*malloc/free calls for the stack vars, or only conditionally based on some compiler flag (and in the latter case, is that flag on by default or not)? Jakub
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 29 Oct 06:59, Jeff Law wrote: On 10/29/13 04:17, Richard Biener wrote: On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is ICE in LTO streamer when it reads instrumented code. The second one is unitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes ICE beacause instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when checker testsuite has 100% pass rate with lto. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kidn of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. In that case, Ilya, you've got to resolve the LTO issues before this can move forward. jeff Yeah. I'm working on it right now. I've fixed known issues and now I'm looking for others. Meanwhile here is a new patch version with required renames and without LTO restriction. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. 
* builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointer-bounds): New. * toplev.c (process_options): Check Pointer Bounds Checker is supported. * doc/extend.texi: Document Pointer Bounds Checker built-in functions. diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 3deedba..1f9ae4e 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -226,6 +226,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT) DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG) DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG) @@ -233,6 +234,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16) DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32) DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64) DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT) +DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR) @@ -346,6 
+349,10 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_SIZE_CONST_VPTR, BT_BOOL, BT_SIZE, BT_CONST_VOLATILE_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL) DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_UINT, BT_VOID, BT_UINT, BT_UINT) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_SIZE) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRPTR_CONST_PTR, BT_VOID, BT_PTR_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) @@ -428,6 +435,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID,
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
> Otherwise ok.

So like this?

2013-10-29  Jakub Jelinek  <ja...@redhat.com>

	* tree-cfg.c (assert_unreachable_fallthru_edge_p): New function.
	* tree-cfg.h (assert_unreachable_fallthru_edge_p): New prototype.
	* tree-vrp.c (all_imm_uses_in_stmt_or_feed_cond): New function.
	(remove_range_assertions): If ASSERT_EXPR_VAR has no other
	immediate uses but in the condition and ASSERT_EXPR and the other
	successor of the predecessor bb is __builtin_unreachable (),
	set_range_info of the ASSERT_EXPR_VAR to the range info of the
	ASSERT_EXPR's lhs.

--- gcc/tree-cfg.c.jj	2013-10-29 09:25:55.000000000 +0100
+++ gcc/tree-cfg.c	2013-10-29 14:32:53.235150267 +0100
@@ -380,6 +380,50 @@ computed_goto_p (gimple t)
 	  && TREE_CODE (gimple_goto_dest (t)) != LABEL_DECL);
 }
 
+/* Returns true for edge E where e->src ends with a GIMPLE_COND and
+   the other edge points to a bb with just __builtin_unreachable ().
+   I.e. return true for C->M edge in:
+   <bb C>:
+   ...
+   if (something)
+     goto <bb N>;
+   else
+     goto <bb M>;
+   <bb N>:
+   __builtin_unreachable ();
+   <bb M>:  */
+
+bool
+assert_unreachable_fallthru_edge_p (edge e)
+{
+  basic_block pred_bb = e->src;
+  gimple last = last_stmt (pred_bb);
+  if (last && gimple_code (last) == GIMPLE_COND)
+    {
+      basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
+      if (other_bb == e->dest)
+	other_bb = EDGE_SUCC (pred_bb, 1)->dest;
+      if (EDGE_COUNT (other_bb->succs) == 0)
+	{
+	  gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
+	  gimple stmt;
+
+	  if (gsi_end_p (gsi))
+	    return false;
+	  stmt = gsi_stmt (gsi);
+	  if (is_gimple_debug (stmt))
+	    {
+	      gsi_next_nondebug (&gsi);
+	      if (gsi_end_p (gsi))
+		return false;
+	      stmt = gsi_stmt (gsi);
+	    }
+	  return gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE);
+	}
+    }
+  return false;
+}
+
 /* Search the CFG for any computed gotos.  If found, factor them to a
    common computed goto site.  Also record the location of that site so
--- gcc/tree-vrp.c.jj	2013-10-29 09:25:38.382477963 +0100
+++ gcc/tree-vrp.c	2013-10-29 14:36:23.695060600 +0100
@@ -6459,6 +6459,33 @@ check_all_array_refs (void)
     }
 }
 
+/* Return true if all imm uses of VAR are either in STMT, or
+   feed (optionally through a chain of single imm uses) GIMPLE_COND
+   in basic block COND_BB.  */
+
+static bool
+all_imm_uses_in_stmt_or_feed_cond (tree var, gimple stmt, basic_block cond_bb)
+{
+  use_operand_p use_p, use2_p;
+  imm_use_iterator iter;
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
+    if (USE_STMT (use_p) != stmt)
+      {
+	gimple use_stmt = USE_STMT (use_p);
+	if (is_gimple_debug (use_stmt))
+	  continue;
+	while (is_gimple_assign (use_stmt)
+	       && single_imm_use (gimple_assign_lhs (use_stmt),
+				  &use2_p, &use_stmt))
+	  ;
+	if (gimple_code (use_stmt) != GIMPLE_COND
+	    || gimple_bb (use_stmt) != cond_bb)
+	  return false;
+      }
+  return true;
+}
+
 /* Convert range assertion expressions into the implied copies and
    copy propagate away the copies.  Doing the trivial copy propagation
    here avoids the need to run the full copy propagation pass after
@@ -6488,12 +6515,16 @@ remove_range_assertions (void)
 {
   basic_block bb;
   gimple_stmt_iterator si;
+  /* 1 if looking at ASSERT_EXPRs immediately at the beginning of
+     a basic block preceeded by GIMPLE_COND branching to it and
+     __builtin_trap, -1 if not yet checked, 0 otherwise.  */
+  int is_unreachable;
 
   /* Note that the BSI iterator bump happens at the bottom of the
      loop and no bump is necessary if we're removing the statement
      referenced by the current BSI.  */
   FOR_EACH_BB (bb)
-    for (si = gsi_start_bb (bb); !gsi_end_p (si);)
+    for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
       {
 	gimple stmt = gsi_stmt (si);
 	gimple use_stmt;
@@ -6501,6 +6532,7 @@ remove_range_assertions (void)
 	if (is_gimple_assign (stmt)
 	    && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
 	  {
+	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
 	    tree var;
 	    tree cond = fold (ASSERT_EXPR_COND (rhs));
@@ -6509,22 +6541,49 @@ remove_range_assertions (void)
 	    gcc_assert (cond != boolean_false_node);
 
-	    /* Propagate the RHS into every use of the LHS.  */
 	    var = ASSERT_EXPR_VAR (rhs);
-	    FOR_EACH_IMM_USE_STMT (use_stmt, iter,
-				   gimple_assign_lhs (stmt))
+	    gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+	    if (!POINTER_TYPE_P (TREE_TYPE (lhs))
+		&& SSA_NAME_RANGE_INFO (lhs))
+	      {
Re: libsanitizer merge from upstream r191666
On Tue, Oct 29, 2013 at 6:52 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote:
>> Thanks!  (At this time I will be slow with response due to travel)
>
> BTW, I don't have compiled clang/llvm pre-3.4 around to look at; for the
> use after return, do you always emit the __asan_*malloc/free calls for
> the stack vars, or only conditionally based on some compiler flag (and
> in the latter case, is that flag on by default or not)?

The calls are emitted by default, but the __asan_stack_malloc call is done
under a run-time flag __asan_option_detect_stack_use_after_return.
So, to use the stack-use-after-return feature you should simply compile
with -fsanitize=address and then at run-time pass
ASAN_OPTIONS=detect_stack_use_after_return=1.
For small stack frame sizes the call to __asan_stack_free is inlined
(as a performance optimization, not mandatory).
Here is what the IR looks like:

void bar(void*);
void foo() {
  char x[1];
  bar(x);
}

define void @foo() #0 {
entry:
  %MyAlloca = alloca [10080 x i8], align 32
  %0 = ptrtoint [10080 x i8]* %MyAlloca to i64
  %1 = load i32* @__asan_option_detect_stack_use_after_return
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %3, label %5

; <label>:3                                       ; preds = %entry
  %4 = call i64 @__asan_stack_malloc_8(i64 10080, i64 %0)
  br label %5

; <label>:5                                       ; preds = %entry, %3
  %6 = phi i64 [ %0, %entry ], [ %4, %3 ]
  ...
  call void @__asan_stack_free_8(i64 %6, i64 10080, i64 %0)
  ret void
}

So, when the UAR is disabled at run-time we pay a very small additional
price for this code (less than 1% on avg on SPEC).
When it is enabled, we pay more:
https://code.google.com/p/address-sanitizer/wiki/UseAfterReturn#Performance

--kcc

> Jakub
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Tue, 29 Oct 2013, Jakub Jelinek wrote:
> On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
>> Otherwise ok.
>
> So like this?

Yes, thanks.

Richard.
[PATCH] More restrict testcases
Just figured while optimizing Micha's ADD_RESTRICT patch ... (well,
make it work).

Tested on x86_64-unknown-linux-gnu, committed.

Richard.

2013-10-29  Richard Biener  <rguent...@suse.de>

	* gcc.dg/torture/restrict-2.c: New testcase.
	* gcc.dg/torture/restrict-3.c: Likewise.
	* gcc.dg/torture/restrict-4.c: Likewise.
	* gcc.dg/torture/restrict-5.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/restrict-2.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-2.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-3.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-3.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-4.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-4.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-5.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-5.c	(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
[PATCH, 4.8, PR 58789] Backport cgraph_get_create_real_symbol_node and PR 57084 fix
Hi, PR 58789 has been fixed on trunk by revision 198743 which needs cgraph_get_create_real_symbol_node introduced in revision 196750. This patch backports both. It has been pre-approved by Honza in person and passed bootstrap and testing on the branch. I am about to commit it momentarily. Thanks, Martin 2013-10-25 Martin Jambor mjam...@suse.cz PR middle-end/58789 Backport from mainline 2013-05-09 Martin Jambor mjam...@suse.cz PR lto/57084 * gimple-fold.c (canonicalize_constructor_val): Call cgraph_get_create_real_symbol_node instead of cgraph_get_create_node. Backport from mainline 2013-03-16 Jan Hubicka j...@suse.cz * cgraph.h (cgraph_get_create_real_symbol_node): Declare. * cgraph.c (cgraph_get_create_real_symbol_node): New function. * cgrpahbuild.c: Use cgraph_get_create_real_symbol_node instead of cgraph_get_create_node. * ipa-prop.c (ipa_make_edge_direct_to_target): Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 8c1efb4..fd3aade 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2596,4 +2596,47 @@ verify_cgraph (void) FOR_EACH_FUNCTION (node) verify_cgraph_node (node); } + +/* Create external decl node for DECL. + The difference i nbetween cgraph_get_create_node and + cgraph_get_create_real_symbol_node is that cgraph_get_create_node + may return inline clone, while cgraph_get_create_real_symbol_node + will create a new node in this case. + FIXME: This function should be removed once clones are put out of decl + hash. */ + +struct cgraph_node * +cgraph_get_create_real_symbol_node (tree decl) +{ + struct cgraph_node *first_clone = cgraph_get_node (decl); + struct cgraph_node *node; + /* create symbol table node. even if inline clone exists, we can not take + it as a target of non-inlined call. */ + node = cgraph_get_node (decl); + if (node !node-global.inlined_to) +return node; + + node = cgraph_create_node (decl); + + /* ok, we previously inlined the function, then removed the offline copy and + now we want it back for external call. 
this can happen when devirtualizing + while inlining function called once that happens after extern inlined and + virtuals are already removed. in this case introduce the external node + and make it available for call. */ + if (first_clone) +{ + first_clone-clone_of = node; + node-clones = first_clone; + symtab_prevail_in_asm_name_hash ((symtab_node) node); + symtab_insert_node_to_hashtable ((symtab_node) node); + if (dump_file) + fprintf (dump_file, Introduced new external node +(%s/%i) and turned into root of the clone tree.\n, +xstrdup (cgraph_node_name (node)), node-uid); +} + else if (dump_file) +fprintf (dump_file, Introduced new external node +(%s/%i).\n, xstrdup (cgraph_node_name (node)), node-uid); + return node; +} #include gt-cgraph.h diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 5df7fb4..8ab7ae1 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -575,6 +575,7 @@ struct cgraph_indirect_call_info *cgraph_allocate_init_indirect_info (void); struct cgraph_node * cgraph_create_node (tree); struct cgraph_node * cgraph_create_empty_node (void); struct cgraph_node * cgraph_get_create_node (tree); +struct cgraph_node * cgraph_get_create_real_symbol_node (tree); struct cgraph_node * cgraph_same_body_alias (struct cgraph_node *, tree, tree); struct cgraph_node * cgraph_add_thunk (struct cgraph_node *, tree, tree, bool, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c index fb01f24..a74a4c0 100644 --- a/gcc/cgraphbuild.c +++ b/gcc/cgraphbuild.c @@ -73,7 +73,7 @@ record_reference (tree *tp, int *walk_subtrees, void *data) decl = get_base_var (*tp); if (TREE_CODE (decl) == FUNCTION_DECL) { - struct cgraph_node *node = cgraph_get_create_node (decl); + struct cgraph_node *node = cgraph_get_create_real_symbol_node (decl); if (!ctx-only_vars) cgraph_mark_address_taken_node (node); ipa_record_reference ((symtab_node)ctx-varpool_node, @@ -143,7 +143,7 @@ record_eh_tables (struct cgraph_node *node, struct function *fun) { struct 
cgraph_node *per_node; - per_node = cgraph_get_create_node (DECL_FUNCTION_PERSONALITY (node-symbol.decl)); + per_node = cgraph_get_create_real_symbol_node (DECL_FUNCTION_PERSONALITY (node-symbol.decl)); ipa_record_reference ((symtab_node)node, (symtab_node)per_node, IPA_REF_ADDR, NULL); cgraph_mark_address_taken_node (per_node); } @@ -223,7 +223,7 @@ mark_address (gimple stmt, tree addr, void *data) addr = get_base_address (addr); if (TREE_CODE (addr) == FUNCTION_DECL) { - struct cgraph_node *node = cgraph_get_create_node (addr); + struct cgraph_node
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
On 10/29/2013 02:51 AM, Bernd Edlinger wrote:
> On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>> I again tried backporting the patch series along with your fix to GCC
>> 4.8 and tested on arm-none-eabi.  I found that it was still getting
>> stuck in infinite recursion unless the test from this patch hunk
>
> hmmm, actually I used your patch on a clean 4.8.2 and built for
> --target=arm-eabi.  I have seen the recursion on extract_bit_field,
> but not on store_bit_field so far; maybe you could give me a hint
> which test case exposes the other flavour of this recursion problem.

Sorry, I was going to describe that in more detail but I forgot.  It was
compiling pr56997-1.c with -mthumb, like:

arm-none-eabi-gcc /path/to/gcc/testsuite/gcc.dg/pr56997-1.c -c -fno-diagnostics-show-caret -fstrict-volatile-bitfields -DSTACK_SIZE=16384 -mthumb

Other testing with -mthumb looked fine.

>> Either way, still needs more work, and more thorough testing (more
>> targets, and obviously trunk as well as the 4.8 backport) before I'd
>> consider this ready to commit.  I might or might not be able to find
>> some more time to work on this in the next week
>
> Yes.  And it would be good to reach Richard's Nov-21 deadline for
> GCC 4.9.

Yeah, I know the clock is ticking.  :-(

-Sandra
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
On 10/28/2013 05:48 PM, Tobias Burnus wrote:
> I am not completely sure whether you had the following in mind, but
> that's what I have now implemented:
>
> DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4)
>
> now has a 5th operand (RANGE_FOR_IVDEP), which has the value
> boolean_true_node or NULL_TREE (via begin_range_for_stmt).

You don't need to add an operand; you can use one of the TREE_LANG_FLAGs
for a boolean flag.

Jason
Re: [C++ Patch] PR 58888
OK. Jason
Re: [PATCH] Compute nonzero_bits from __builtin_unreachable assertions
On Sat, 26 Oct 2013, Jakub Jelinek wrote:
> Hi!
>
> And here is a patch that allows vectorization without peeling for
> alignment and scalar loop for bound even for fn2, fn3 and fn4 in the
> following testcase, though as with the range __builtin_unreachable ()
> notes, it is quite fragile, because it only works if there are no
> immediate uses of the tested SSA_NAME before the assertion.
> Perhaps a more reliable way would be to convert those assertions into
> __builtin_assume_aligned, but that has the disadvantage that its first
> argument is a pointer and it returns a pointer, so we'd need to cast
> integers to pointers and back, or add an ASSUME_ALIGNED internal
> function.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.

Ok.

Thanks,
Richard.

> int a[1024];
>
> void
> fn1 (int x, int y)
> {
>   int i;
>   x &= -32;
>   y &= -32;
>   for (i = x + 32; i < y; i++)
>     a[i]++;
> }
>
> void
> fn2 (int x, int y)
> {
>   int i;
>   if (x & 31)
>     __builtin_unreachable ();
>   if (y & 31)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> void
> fn3 (int x, int y)
> {
>   int i;
>   if (x % 32)
>     __builtin_unreachable ();
>   if (y % 32)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> void
> fn4 (int x, int y)
> {
>   int i;
>   if ((x % 32) != 0)
>     __builtin_unreachable ();
>   if ((y % 32) != 0)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> 2013-10-25  Jakub Jelinek  <ja...@redhat.com>
>
> 	* tree-vrp.c (maybe_set_nonzero_bits): New function.
> 	(remove_range_assertions): Call it.
>
> --- gcc/tree-vrp.c.jj	2013-10-24 14:32:29.000000000 +0200
> +++ gcc/tree-vrp.c	2013-10-25 21:21:35.183092937 +0200
> @@ -6459,6 +6459,60 @@ check_all_array_refs (void)
>      }
>  }
>  
> +/* Handle
> +   _4 = x_3 & 31;
> +   if (_4 != 0)
> +     goto <bb 6>;
> +   else
> +     goto <bb 7>;
> +   <bb 6>:
> +   __builtin_unreachable ();
> +   <bb 7>:
> +   x_5 = ASSERT_EXPR <x_3, ...>;
> +   If x_3 has no other immediate uses (checked by caller),
> +   var is the x_3 var from ASSERT_EXPR, we can clear low 5 bits
> +   from the non-zero bitmask.  */
> +
> +static void
> +maybe_set_nonzero_bits (basic_block bb, tree var)
> +{
> +  edge e = single_pred_edge (bb);
> +  basic_block cond_bb = e->src;
> +  gimple stmt = last_stmt (cond_bb);
> +  tree cst;
> +
> +  if (stmt == NULL
> +      || gimple_code (stmt) != GIMPLE_COND
> +      || gimple_cond_code (stmt) != ((e->flags & EDGE_TRUE_VALUE)
> +				     ? EQ_EXPR : NE_EXPR)
> +      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME
> +      || !integer_zerop (gimple_cond_rhs (stmt)))
> +    return;
> +
> +  stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
> +  if (!is_gimple_assign (stmt)
> +      || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
> +      || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
> +    return;
> +  if (gimple_assign_rhs1 (stmt) != var)
> +    {
> +      gimple stmt2;
> +
> +      if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME)
> +	return;
> +      stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
> +      if (!gimple_assign_cast_p (stmt2)
> +	  || gimple_assign_rhs1 (stmt2) != var
> +	  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> +	  || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +	      != TYPE_PRECISION (TREE_TYPE (var))))
> +	return;
> +    }
> +  cst = gimple_assign_rhs2 (stmt);
> +  set_nonzero_bits (var, get_nonzero_bits (var)
> +			 & ~tree_to_double_int (cst));
> +}
> +
>  /* Convert range assertion expressions into the implied copies and
>     copy propagate away the copies.  Doing the trivial copy propagation
>     here avoids the need to run the full copy propagation pass after
> @@ -6576,8 +6630,11 @@ remove_range_assertions (void)
> 	      }
> 	  }
> 	if (ok)
> -	  set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> -			  SSA_NAME_RANGE_INFO (lhs)->max);
> +	  {
> +	    set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> +			    SSA_NAME_RANGE_INFO (lhs)->max);
> +	    maybe_set_nonzero_bits (bb, var);
> +	  }
>       }
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH] Testcase for nonzero_bits __builtin_unreachable
On Tue, 29 Oct 2013, Jakub Jelinek wrote:
> On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote:
>> And here is a patch that allows vectorization without peeling for
>> alignment and scalar loop for bound even for fn2, fn3 and fn4 in the
>> following testcase, though as with the range __builtin_unreachable ()
>> notes, it is quite fragile, because it only works if there are no
>> immediate uses of the tested SSA_NAME before the assertion.
>> Perhaps a more reliable way would be to convert those assertions into
>> __builtin_assume_aligned, but that has the disadvantage that its
>> first argument is a pointer and it returns a pointer, so we'd need to
>> cast integers to pointers and back, or add an ASSUME_ALIGNED internal
>> function.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux.
>
> And here is, as a follow-up to the whole patchset of 4 patches, a
> testcase:

Ok.

Thanks,
Richard.

> 2013-10-29  Jakub Jelinek  <ja...@redhat.com>
>
> 	* gcc.dg/vect/vect-align-3.c: New test.
>
> --- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj	2013-10-29 08:18:08.636348586 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-align-3.c	2013-10-29 08:21:14.931384138 +0100
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int a[2048];
> +
> +void
> +f1 (int x, int y)
> +{
> +  int i;
> +  x &= -256;
> +  y &= -256;
> +  for (i = x + 256; i < y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f2 (int x, int y)
> +{
> +  int i;
> +  if (x & 31)
> +    __builtin_unreachable ();
> +  if (y & 31)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f3 (int x, int y)
> +{
> +  int i;
> +  if (x % 256)
> +    __builtin_unreachable ();
> +  if (y % 256)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f4 (int x, int y)
> +{
> +  int i;
> +  if ((x % 256) != 0)
> +    __builtin_unreachable ();
> +  if ((y % 256) != 0)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
[PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing
Hi, The problem here is that both asan and ubsan testsuite test if we have set a library path before running the testsuite. This is incorrect when running the already installed testing as there is no path to set. This patch changes it so it tests if the executable is able to be built/linked before running the tests. This also fixes a warning/error in restoring the CXXFLAGS which shows up in the already installed testing case. OK? Bootstrapped and tested on x86_64-linux-gnu. Also manually looked into the log file to make sure both asan and ubsan were still being tested. Thanks, Andrew Pinski ChangeLog: * lib/ubsan-dg.exp (check_effective_target_fundefined_sanitizer): New function. (ubsan_init): Save off ALWAYS_CXXFLAGS. (ubsan_finish): Restore ALWAYS_CXXFLAGS correctly. * lib/asan-dg.exp (check_effective_target_faddress_sanitizer): Change to creating an executable. (asan_init): Save off ALWAYS_CXXFLAGS. (asan_finish): Restore ALWAYS_CXXFLAGS correctly. * gcc.dg/ubsan/ubsan.exp: Don't check the return value of ubsan_init. Check check_effective_target_fundefined_sanitizer before running the tests. * g++.dg/ubsan/ubsan.exp: Likewise. * testsuite/gcc.dg/asan/asan.exp: Don't check check_effective_target_faddress_sanitizer too early. Don't check the return value of asan_init. * g++.dg/asan/asan.exp: Likewise. Index: testsuite/lib/ubsan-dg.exp === --- testsuite/lib/ubsan-dg.exp (revision 204138) +++ testsuite/lib/ubsan-dg.exp (working copy) @@ -14,6 +14,15 @@ # along with GCC; see the file COPYING3. If not see # http://www.gnu.org/licenses/. +# Return 1 if compilation with -fsanitize=undefined is error-free for trivial +# code, 0 otherwise. + +proc check_effective_target_fundefined_sanitizer {} { +return [check_no_compiler_messages fundefined_sanitizer executable { + int main (void) { return 0; } +} -fsanitize=undefined] +} + # # ubsan_link_flags -- compute library path and flags to find libubsan. 
# (originally from g++.exp) @@ -60,6 +69,7 @@ proc ubsan_init { args } { global ALWAYS_CXXFLAGS global TOOL_OPTIONS global ubsan_saved_TEST_ALWAYS_FLAGS +global usan_saved_ALWAYS_CXXFLAGS set link_flags if ![is_remote host] { @@ -74,6 +84,7 @@ proc ubsan_init { args } { set ubsan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS } if [info exists ALWAYS_CXXFLAGS] { + set usan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS set ALWAYS_CXXFLAGS [concat {ldflags=$link_flags} $ALWAYS_CXXFLAGS] } else { if [info exists TEST_ALWAYS_FLAGS] { @@ -95,10 +106,15 @@ proc ubsan_init { args } { proc ubsan_finish { args } { global TEST_ALWAYS_FLAGS global ubsan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS -if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] { - set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS +if [info exists asan_saved_ALWAYS_CXXFLAGS ] { + set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS } else { - unset TEST_ALWAYS_FLAGS + if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] { + set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS + } else { + unset TEST_ALWAYS_FLAGS + } } } Index: testsuite/lib/asan-dg.exp === --- testsuite/lib/asan-dg.exp (revision 204138) +++ testsuite/lib/asan-dg.exp (working copy) @@ -18,8 +18,8 @@ # code, 0 otherwise. 
proc check_effective_target_faddress_sanitizer {} { -return [check_no_compiler_messages faddress_sanitizer object { - void foo (void) { } +return [check_no_compiler_messages faddress_sanitizer executable { + int main (void) { return 0; } } -fsanitize=address] } @@ -69,6 +69,7 @@ proc asan_init { args } { global ALWAYS_CXXFLAGS global TOOL_OPTIONS global asan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS set link_flags if ![is_remote host] { @@ -83,6 +84,7 @@ proc asan_init { args } { set asan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS } if [info exists ALWAYS_CXXFLAGS] { + set asan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS set ALWAYS_CXXFLAGS [concat {ldflags=$link_flags} $ALWAYS_CXXFLAGS] set ALWAYS_CXXFLAGS [concat {additional_flags=-fsanitize=address -g} $ALWAYS_CXXFLAGS] } else { @@ -105,11 +107,16 @@ proc asan_init { args } { proc asan_finish { args } { global TEST_ALWAYS_FLAGS global asan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS -if [info exists asan_saved_TEST_ALWAYS_FLAGS] { - set TEST_ALWAYS_FLAGS $asan_saved_TEST_ALWAYS_FLAGS +if [info exists asan_saved_ALWAYS_CXXFLAGS ] { + set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS } else { - unset TEST_ALWAYS_FLAGS + if [info exists asan_saved_TEST_ALWAYS_FLAGS] { + set TEST_ALWAYS_FLAGS
Re: [PATCH] Do not append *INTERNAL* to the decl name
On 10/28/2013 06:12 PM, Dehao Chen wrote: ping... Sorry for the slow response. If we're actually emitting the name now, we need to give it a name different from the complete constructor. I suppose it makes sense to go with C4/D4 as in the decloning patch, http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt (which I've been meaning to revisit before the end of stage 1). Jason
Re: free is a killer
On 10/29/13 04:40, Richard Biener wrote:
> Of course in the example we have the global memory storage class
> (incoming function argument) and malloc memory which is really the
> same storage class.  It only becomes a different storage class if you
> factor in flow analysis (for which the current PTA code is weak once
> you leave the SSA web).

The global memory storage class is really better thought of as an
"I haven't a clue" class.  You want a different class for things passed
in, as you know they can't conflict with anything in the current
function's stack or anything malloc'd by the current function.

Jeff
Re: [PATCH 1/2] fix PR49847 ICE-on-HAVE_cc0 regression from PR50780 changes
On Wed, Aug 28, 2013 at 10:06 PM, Jeff Law l...@redhat.com wrote: On 08/28/2013 03:50 AM, Mikael Pettersson wrote: This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code. The ICE breaks both Java and Ada bootstrap on m68k-linux. There is also a tiny C++ test case in the BZ entry. The ICE is triggered by the PR middle-end/50780 changes in r180192. As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847#c16, the effect of those changes is that an expression in EH context is broken up so that the cc0 setter and user are put in separate BBs, which normally isn't allowed. As fold_rtx sees the cc0 user, it calls equiv_constant on prev_insn_cc0, but that is NULL due to the BB boundary, resulting in the ICE. This patch checks if prev_insn_cc0 is NULL, and if so doesn't call equiv_constant but sets const_arg to zero, which avoids the ICE and makes fold_rtx leave the insn unchanged. Bootstrapped and regtested on m68k-linux, no regressions. This patch has been in 4.6-based production compilers on m68k-linux since early 2012, and in a 4.7-based compiler since early 2013. Ok for trunk and 4.8? [If approved I'll need help to commit it as I don't have commit rights.] gcc/ 2013-08-28 Mikael Pettersson mi...@it.uu.se PR rtl-optimization/49847 * cse.c (fold_rtx) second case CC0: If prev_insn_cc0 is NULL don't call equiv_constant on it. I can't help but feel something different needs to be done here. It's always been the case that those two insns are expected to be contiguous and there's going to be code all over the place that makes that assumption and that model is what every GCC developer has expected for the last 20+ years. What precisely about Richi's patch causes the setter and user to end up in different blocks? I'm guessing we now consider the comparison itself as potentially trapping and thus we end the block immediately? The consumer then appears in the next block? Btw, the handling for COND_EXPRs was simply adjusted to be the same as for if () conditions. 
From looking at the testcase,

int f (float g)
{
  try
    {
      return g >= 0;
    }
  catch (...) {}
}

can't you simply avoid expanding throwing stuff to a
set-reg-based-on-compare instruction sequence?  Or why does the issue
not trigger on

  try { if (g >= 0) return 1; else return 0; } catch (...) {}

?

Richard.

I wonder if the comparison could be duplicated and marked as
non-throwing at RTL generation time.

Alternately (and probably better) would be to convert the remaining
stragglers and drop the old cc0 support entirely.

jeff
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law <l...@redhat.com> wrote:
> On 10/27/13 12:55, Andrew Pinski wrote:
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> 	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> 	(ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons
>> 	fails, combining the branches anyways.
>> 	(tree_ssa_ifcombine): Inverse the order of the basic block walk,
>> 	increases the number of combinings.
>> 	* gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>> testsuite/ChangeLog:
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>> 	* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> 	conditional move to be used.
>> 	* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more
>> 	intermediate.
>
> I won't be able to look at this in any depth tonight.  However, I would
> ask that you pass along the dump file for ssa-dom-thread-3.c so that I
> can evaluate the correctness of your change better.

We no longer jump thread one case as we have:

  <bb 2>:
  var_3 = var_1(D)->ssa_name.var;
  _4 = var_1(D)->base.code;
  if (_4 == 1)
    goto <bb 3>;
  else
    goto <bb 5>;
  ...
  <bb 5>:
  _6 = var_3->base.code;
  _13 = _4 != 1;
  _14 = _6 != 0;
  _12 = _13 & _14;
  if (_12 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

Attached is the full dump too.

Thanks,
Andrew Pinski

> Thanks,
> jeff

[Attachment: ssa-dom-thread-3.c.077t.dom1]
Re: free is a killer
On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law l...@redhat.com wrote: On 10/29/13 04:40, Richard Biener wrote: Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). The global memory storage class is really better thought of as i haven't a clue class. No, that's the anything class in the current PTA. You want a different class for things passed in as you know they can't conflict with anything in the current function's stack or anything malloc'd by the current function. That's the NONLOCAL class in the current PTA. Which also means that simple cases should work. It's only when the malloc result escapes somewhere that it gets reachable by NONLOCAL because of the limited flow-sensitiveness of PTA (and its lack of context sensitivity). Richard. Jeff
Re: [wide-int] More optimisations
On 10/29/2013 08:43 AM, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as len == 1, which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. This has always been my way of looking at it. kenny Thanks, Richard
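The idiom under discussion is worth spelling out: it is true only when the compiler can prove the argument is a nonzero compile-time constant, and conservatively false otherwise. A small self-contained sketch (using the STATIC_CONSTANT_P name suggested in the thread; behavior on plain variables is conservative at any optimization level):

```c
#include <assert.h>

/* True iff X is provably a nonzero constant at compile time.
   The && (X) part makes a *known-zero* constant report false too,
   so the guarded fast path is simply skipped in that case.  */
#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

static int demo (void)
{
  volatile int v = 3;                  /* volatile: never provably constant */
  int a = STATIC_CONSTANT_P (5) ? 1 : 0;  /* literal, nonzero -> 1 */
  int b = STATIC_CONSTANT_P (0) ? 1 : 0;  /* constant but zero -> 0 */
  int c = STATIC_CONSTANT_P (v) ? 1 : 0;  /* unknown -> 0 */
  return (a << 2) | (b << 1) | c;
}
```

`__builtin_constant_p` is a GCC extension (also supported by Clang); on other compilers the macro would need a fallback that always yields 0.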
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Tue, Oct 29, 2013 at 12:55:04PM +0100, Richard Biener wrote: Surely you can't rely on CCP and VRP computing exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a bit not set in nonzero_bits is don't-know or if it is zero. And you cannot do an assert during iteration (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Factor this out to commonize with code in evaluate_stmt Ok. + tem = val->mask.low; + align = (tem & -tem); + if (align > 1) + set_ptr_info_alignment (get_ptr_info (name), align, + (TREE_INT_CST_LOW (val->value) & (align - 1))); + } + else + { + double_int nonzero_bits = val->mask; + nonzero_bits = nonzero_bits | tree_to_double_int (val->value); + nonzero_bits &= get_nonzero_bits (name); This looks odd to me - shouldn't it be simply nonzero_bits = ~val->mask & tree_to_double_int (val->value); Bits set in the mask are for the bits where the value is unknown, so potentially nonzero bits. Where bits are clear in the mask, but set in the value, those are surely nonzero bits. The only known zero bits are where both mask and value are zero, and as nonzero_bits is defined to be 1 where the bit may be non-zero, 0 where it must be zero (like it is for nonzero_bits* in RTL), I think val->mask | tree_to_double_int (val->value); is exactly what we want. ? The &= of get_nonzero_bits either is not necessary or shows the lattice is unnecessarily conservative because you apply it too early during propagation? + set_nonzero_bits (name, nonzero_bits); + } } /* Perform substitutions based on the known constant values. 
*/ @@ -1671,6 +1700,25 @@ evaluate_stmt (gimple stmt) is_constant = (val.lattice_val == CONSTANT); } + if (flag_tree_bit_ccp + && !is_constant ^^^ ah, so if the bit-CCP part figured out a set of known bits then you don't do anything here while in fact you can still remove bits from mask with get_nonzero_bits info. Yeah, that was not to lose info for the case where the lattice is already CONSTANT. Originally, I had code like: --- tree-ssa-ccp.c.xx 2013-10-29 15:31:03.0 +0100 +++ tree-ssa-ccp.c 2013-10-29 15:34:44.257977817 +0100 @@ -1701,8 +1701,7 @@ evaluate_stmt (gimple stmt) } if (flag_tree_bit_ccp - && !is_constant - && likelyvalue != UNDEFINED + && (is_constant || likelyvalue != UNDEFINED) && gimple_get_lhs (stmt) && TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME) { @@ -1711,11 +1710,27 @@ evaluate_stmt (gimple stmt) double_int mask = double_int::mask (TYPE_PRECISION (TREE_TYPE (lhs))); if (nonzero_bits != double_int_minus_one && nonzero_bits != mask) { - val.lattice_val = CONSTANT; - val.value = build_zero_cst (TREE_TYPE (lhs)); - /* CCP wants the bits above precision set. */ - val.mask = nonzero_bits | ~mask; - is_constant = true; + if (!is_constant) + { + val.lattice_val = CONSTANT; + val.value = build_zero_cst (TREE_TYPE (lhs)); + /* CCP wants the bits above precision set. */ + val.mask = nonzero_bits | ~mask; + is_constant = true; + } + else + { + double_int valv = tree_to_double_int (val.value); + gcc_assert ((valv & ~val.mask & ~nonzero_bits & mask).is_zero ()); + if (!(valv & ~nonzero_bits & mask).is_zero ()) + val.value = double_int_to_tree (TREE_TYPE (lhs), + valv & nonzero_bits); + if (nonzero_bits.is_zero ()) + val.mask = double_int_zero; + else + val.mask = val.mask & (nonzero_bits | ~mask); + } } } in the patch, but that breaks e.g. on void foo (unsigned int a, unsigned b, unsigned **c, void *d[8], int e) { int i; for (i = 0; i != e; i++); if (a) { for (i = 0;;) { i--; if (bar ()) goto lab2; } lab1:; __builtin_printf ("%d\n", i < 0 ?
-1 - i : i); return; } lab2:; if (!d[b] && *c) goto lab1; } at -O2 - VRP1 figures out: if (i_3 < 0) goto <bb 10>; else goto <bb 11>; <bb 10>: # RANGE [0, 2147483647] NONZERO 0x07fff iftmp.0_23 = ~i_3; <bb 11>: # RANGE [0, 2147483647] NONZERO 0x07fff # iftmp.0_4 = PHI <iftmp.0_23(10), i_3(9)> through the edge assertions, but CCP of course doesn't have edge assertions and during iteration just at some point assumes i_3 can be zero and processes bb 10 with that, which leads to value -1 that triggers the above assert. So, I still
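The `tem & -tem` expression in the alignment code above is the classic lowest-set-bit trick: for the CCP mask it yields the alignment implied by the unknown low bits. A minimal sketch:

```c
#include <assert.h>

/* Isolate the lowest set bit of TEM.  On unsigned types -tem is the
   two's complement modulo 2^32, so tem & -tem keeps exactly the least
   significant 1 bit (and yields 0 for tem == 0).  In the CCP code above
   this turns the low part of the don't-know mask into a power-of-two
   pointer alignment.  */
static unsigned lowest_set_bit (unsigned tem)
{
  return tem & -tem;
}
```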
Re: free is a killer
On 10/29/13 09:15, Richard Biener wrote: On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law l...@redhat.com wrote: On 10/29/13 04:40, Richard Biener wrote: Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). The global memory storage class is really better thought of as i haven't a clue class. No, that's the anything class in the current PTA. You want a different class for things passed in as you know they can't conflict with anything in the current function's stack or anything malloc'd by the current function. That's the NONLOCAL class in the current PTA. Which also means that simple cases should work. It's only when the malloc result escapes somewhere that it gets reachable by NONLOCAL because of the limited flow-sensitiveness of PTA (and its lack of context sensitivity). OK. I think we've got all the same concepts and may just be arguing about terminology :-) jeff
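The NONLOCAL-vs-malloc distinction is easy to see in a small example (a hedged sketch of the disambiguation being discussed, not a GCC testcase): memory freshly allocated inside the function cannot alias an incoming pointer, so a store through the fresh pointer cannot clobber a store through the argument.

```c
#include <assert.h>
#include <stdlib.h>

/* p is NONLOCAL (incoming); q points to fresh heap memory allocated in
   this function, so the two cannot alias and the compiler may fold the
   load of *p after the store through q to the constant 1.  */
static int f (int *p)
{
  int *q = malloc (sizeof *q);
  if (!q)
    return -1;
  *p = 1;
  *q = 2;       /* cannot clobber *p */
  int r = *p;   /* provably 1 in the simple, non-escaping case */
  free (q);
  return r;
}

static int demo (void)
{
  int x = 0;
  return f (&x);
}
```

Once the malloc result escapes (is stored somewhere reachable from NONLOCAL), the flow-insensitive points-to solution loses this distinction, which is the limitation described above.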
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
On 10/29/13 06:10, Eric Botcazou wrote: My reading of memory_address_addr_space is that MODE is the mode of the memory reference, not the mode of the address. I fail to see how passing in the mode of the address in the first call can be correct. What am I missing? Nothing, it's me confusing rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t); with rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t); for a long time apparently... Sorry about that. In the meantime, I installed the obvious patch to eliminate the small redundancy. Tested on x86_64/Linux. No worries. At one time I kept reading memory_address_addr_space_p and couldn't reconcile how the return value was used. Eventually I realized I needed to be reading memory_address_space_addr_space instead. Ugh. Jeff
[patch] fix fallout from lto-streamer.h not including gimple.h
Two target files also included lto-streamer.h but did not include gimple.h. Fixed thusly. Applied as revision 204166 Andrew * config/darwin.c: Include gimple.h. * config/i386/winnt.c: Likewise. Index: config/darwin.c === *** config/darwin.c (revision 204164) --- config/darwin.c (working copy) *** along with GCC; see the file COPYING3. *** 45,50 --- 45,51 #include df.h #include debug.h #include obstack.h + #include gimple.h #include lto-streamer.h /* Darwin supports a feature called fix-and-continue, which is used Index: config/i386/winnt.c === *** config/i386/winnt.c (revision 204164) --- config/i386/winnt.c (working copy) *** along with GCC; see the file COPYING3. *** 35,40 --- 35,41 #include ggc.h #include target.h #include except.h + #include gimple.h #include lto-streamer.h /* i386/PE specific attribute support.
Re: RFC/A: PR 58079: CONST_INT mode assert in combine.c:do_SUBST
On Mon, Oct 28, 2013 at 5:28 PM, Uros Bizjak ubiz...@gmail.com wrote: PR 58079 is about the do_SUBST assert: /* Sanity check that we're replacing oldval with a CONST_INT that is a valid sign-extension for the original mode. */ gcc_assert (INTVAL (newval) == trunc_int_for_mode (INTVAL (newval), GET_MODE (oldval))); triggering while trying to optimise the temporary result: (eq (const_int -73 [0xffb7]) (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ]) combine_simplify_rtx calls simplify_comparison, which first canonicalises the order so that the constant is second and then promotes the comparison to SImode (the first supported comparison mode on MIPS). So we end up with: (eq (zero_extend:SI (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ])) (const_int 183 [0xb7])) So far so good. But combine_simplify_rtx then tries to install the new operands in-place, with the second part of: /* If the code changed, return a whole new comparison. */ if (new_code != code) return gen_rtx_fmt_ee (new_code, mode, op0, op1); /* Otherwise, keep this operation, but maybe change its operands. This also converts (ne (compare FOO BAR) 0) to (ne FOO BAR). */ SUBST (XEXP (x, 0), op0); SUBST (XEXP (x, 1), op1); And this triggers the assert because we're replacing a QImode register with the non-QImode constant 183. One fix would be to extend the new_code != code condition, as below. This should avoid the problem cases without generating too much garbage rtl. But if the condition's getting this complicated, perhaps it'd be better just to avoid SUBST here? I.e. if (new_code != code || op0 != XEXP (x, 0) || op1 != XEXP (x, 1)) return gen_rtx_fmt_ee (new_code, mode, op0, op1); Just asking though. :-) Tested on mipsisa64-elf. OK to install? Thanks, Richard gcc/ PR rtl-optimization/58079 * combine.c (combine_simplify_rtx): Avoid using SUBST if simplify_comparison has widened a comparison with an integer. gcc/testsuite/ * gcc.dg/torture/pr58079.c: New test. I would like to backport this patch to 4.8 branch. 
The patch fixes the failure on alpha-linux-gnu and also (reportedly) fixes the same failure on mips and pa targets. I have bootstrapped and regression tested the patch on x86_64-pc-linux-gnu {,-m32} and started bootstrap and regtests on alphaev68-linux-gnu on the 4.8 branch for all default languages + obj-c++ and go. The latter will take some ~10 hours. OK to backport the patch to the 4.8 branch if both tests pass? The regression test was OK in both cases, so I went ahead and installed the patch. The fix is kind of obvious anyway. Uros.
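The do_SUBST invariant at the heart of this PR — a CONST_INT must be a valid sign-extension for the mode it replaces — can be modeled in a few lines of C (a sketch of what trunc_int_for_mode does, not the GCC implementation). It shows why 183 is fine as an SImode constant but invalid in QImode, where its canonical form is -73:

```c
#include <assert.h>
#include <stdint.h>

/* Truncate VAL to the low BITS bits and sign-extend back, the canonical
   form a CONST_INT must have for a BITS-wide mode.  BITS in [1, 63].  */
static int64_t trunc_for_bits (int64_t val, int bits)
{
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t t = (uint64_t) val & mask;
  return (int64_t) ((t ^ sign) - sign);   /* sign-extend bit BITS-1 up */
}
```

So `trunc_for_bits (183, 8)` is -73: replacing a QImode register with the constant 183 violates the invariant, while after widening to SImode the constant 183 is already canonical.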
Re: free is a killer
On 10/29/13 00:38, Marc Glisse wrote: Indeed. Do you want to commit it xfailed or put it in bugzilla so we don't lose it? (it becomes harder if you replace p with p-1 in the memset arguments). I'll either add it as xfailed, or I'll add it as-is if I fix the alias code. It'd primarily be to address the current incorrect state of the code as it stands today rather than to extensively feature test this code. Sounds good, but I am not really up to rewriting the points-to analysis at the moment... Understandable. Never hurts to put a bug in someone's ear, you never know when they (or someone else reading) will run with it. jeff
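For reference, a runnable variant of the memset testcase (the dg- directives dropped and the size tied to the array so it can execute): the two stores are fully covered by the later memset, so dead store elimination may delete them without changing observable behavior.

```c
#include <assert.h>
#include <string.h>

static long g (void)
{
  long buf[13];
  long *p = buf;
  *p = 42;        /* dead: fully overwritten by the memset below */
  p[4] = 42;      /* dead for the same reason */
  memset (p, 0, 13 * sizeof (long));
  return p[0] + p[4];
}
```

With `p - 1` as the memset destination the covered byte range no longer starts at the stores' base, which is the harder case mentioned above.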
Re: [PATCH] Use get_nonzero_bits to improve vectorization
On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote: +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of its type. */ + +int unsigned int? Ok. +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); This first recurses but if either one returns 0 you don't have to (that cuts down the recursion from exponential to linear in the common case?). Thus, early out on ret == 0? Yeah, that is reasonable. Usually we use this during expansion etc., trees won't be arbitrarily large and for SSA_NAMEs we don't recurse on definitions, so I think we are unlikely to ever see very large expressions there though. I've added a similar early out to the COND_EXPR case. + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); Why do you need that here but not elsewhere when processing binary ops? Because POINTER_PLUS_EXPR (well, also shifts, but for those we don't call tree_ctz on the second argument) is the only binop we handle that uses different types for the operands, for the first argument we know it has the same precision as the result, but what if sizetype has bigger precision than TYPE_PRECISION of pointers? I know it typically doesn't, just wanted to make sure we never return an out of range return value, tree_ctz result should be in [0, prec] always. 
+ return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) check that first before recursing for op0 - if op1 is negative you simply return ret1 which looks wrong, too. If op1 is negative, then it is undefined behavior, so assuming in that case the same thing as when op1 is not constant (i.e. that we worst case shift left by 0 and thus not increase number of trailing zeros, or shift left by 0 and increase it) is IMHO correct. I don't think we treat left shift by negative value as right shift anywhere, do we? + && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) < (unsigned HOST_WIDE_INT) prec)) This check is to avoid overflowing ret1 + ret2? If so, why not add + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); ret2 = MIN (ret2, prec); instead? Because ret{1,2} are int (well, with the change you've suggested unsigned int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1 (yeah, surely undefined behavior), I just didn't want to lose any of the upper bits. Sure, alternatively I could have an UHWI temporary variable and assign tree_low_cst to it and do the MIN on that temporary and prec, but still I think it is better to handle out of range constants as variable shift count rather than say shift count up by prec - 1. +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case EXACT_DIV_EXPR: + if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST) + { + ret2 = tree_log2 (TREE_OPERAND (expr, 1)); + if (ret2 >= 0 && tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1) cheaper to test the sign first. Ok. 
+ { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 > ret2) + return ret1 - ret2; + } + } + return 0; +CASE_CONVERT: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 && ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0)))) + ret1 = prec; + return MIN (ret1, prec); +case SAVE_EXPR: + return tree_ctz (TREE_OPERAND (expr, 0)); +case COND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = tree_ctz (TREE_OPERAND (expr, 2)); + return MIN (ret1, ret2); +case COMPOUND_EXPR: + return tree_ctz (TREE_OPERAND (expr, 1)); +case ADDR_EXPR: + ret1 = get_object_alignment (TREE_OPERAND (expr, 0)); Use get_pointer_alignment, this isn't a memory reference so type alignment rules don't apply. Ok. Jakub
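The per-operator rules in the patch are conservative lower bounds on the trailing-zero count. A self-contained 32-bit sketch of a few of them (not the GCC code, which works on trees), checked against the exact count:

```c
#include <assert.h>

/* Exact trailing-zero count; by the patch's convention a zero value has
   ctz equal to the precision.  */
static int ctz32 (unsigned x) { return x ? __builtin_ctz (x) : 32; }

/* PLUS/MINUS/IOR/XOR/MIN/MAX: at least min(ctz a, ctz b) zeros survive.  */
static int ctz_plus_bound (unsigned a, unsigned b)
{ int r1 = ctz32 (a), r2 = ctz32 (b); return r1 < r2 ? r1 : r2; }

/* MULT: trailing zeros add, capped at the precision.  */
static int ctz_mult_bound (unsigned a, unsigned b)
{ int s = ctz32 (a) + ctz32 (b); return s < 32 ? s : 32; }

/* BIT_AND: either operand's zeros survive, so take the max.  */
static int ctz_and_bound (unsigned a, unsigned b)
{ int r1 = ctz32 (a), r2 = ctz32 (b); return r1 > r2 ? r1 : r2; }
```

Each bound is sound (never exceeds the exact count) but can be loose, e.g. 8 + 8 has four trailing zeros while the PLUS bound only promises three.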
Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
On 10/25/2013 11:19 AM, Martin Jambor wrote: Hi, On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote: On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote: /* Perform the second half of the transformation started in @@ -4522,7 +4704,15 @@ ira (FILE *f) allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) -find_moveable_pseudos (); +{ + df_analyze (); + calculate_dominance_info (CDI_DOMINATORS); + + find_moveable_pseudos (); + split_live_ranges_for_shrink_wrap (); + + free_dominance_info (CDI_DOMINATORS); +} You probably want to add another df_analyze if split_live_ranges_for_shrink_wrap makes code transformations. AFAIU find_moveable_pseudos doesn't change global liveness but your transformation might. IRA/LRA need up-to-date DF_LR results to compute allocno live ranges. OK, I have changed the patch to do that (it is below, still bootstraps and passes tests on x86_64 fine). However, I have noticed that the corresponding part in function ira now looks like: /* ... */ if (delete_trivially_dead_insns (get_insns (), max_reg_num ())) df_analyze (); /* It is not worth to do such improvement when we use a simple allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) { df_analyze (); calculate_dominance_info (CDI_DOMINATORS); find_moveable_pseudos (); if (split_live_ranges_for_shrink_wrap ()) df_analyze (); free_dominance_info (CDI_DOMINATORS); } /* ... */ So, that left me wondering whether the first call to df_analyze is actually necessary, or whether perhaps the data are actually already up to date. What do you think? I guess it needs some investigation. delete_trivially_dead_insns code was taken from the old RA. First of all, I don't know how many insns are really trivially dead before RA in optimization and non-optimization mode. Maybe the code can be removed altogether. I'll put it on my todo list. The patch is ok to commit. Thanks for working on this, Martin. 
2013-10-23 Martin Jambor mjam...@suse.cz PR rtl-optimization/10474 * ira.c (find_moveable_pseudos): Do not calculate dominance info nor df analysis. (interesting_dest_for_shprep): New function. (split_live_ranges_for_shrink_wrap): Likewise. (ira): Calculate dominance info and df analysis. Call split_live_ranges_for_shrink_wrap. testsuite/ * gcc.dg/pr10474.c: New testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise. * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
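The shape of code this transformation targets can be sketched in a few lines (modeled loosely on the gcc.dg/pr10474.c style of testcase; the `work` helper is hypothetical, standing in for the expensive path that needs callee-saved registers):

```c
#include <assert.h>

/* Hypothetical expensive path requiring callee-saved registers.  */
static int work (int v) { return v + 1; }

/* On the early-return path the incoming argument is only tested, so the
   copy of it into a callee-saved register can be sunk past the test by
   split_live_ranges_for_shrink_wrap, letting the fast path run with no
   prologue at all.  */
static int f (int *p)
{
  if (!p)
    return -1;        /* fast path: no frame setup needed */
  return work (*p);   /* slow path: live range of p starts here */
}

static int demo (void)
{
  int x = 41;
  return f (&x);
}
```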
Re: [Fwd: Re: [PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy]
On 10/29/13 10:03, David Malcolm wrote: commit 739276306f1d8f5d0b1202da21cc29b5fcac102e Author: David Malcolmdmalc...@redhat.com Date: Mon Oct 28 22:25:38 2013 -0400 Add chain_next and chain_prev options back to symtab_node_base gcc/ * cgraph.h (symtab_node_base): Add missing chain_next and chain_prev GTY options. This is fine. Sorry I didn't review it at the same time as the other patch. Jeff
Re: RE : Problem with _Hashtable_ebo_helper
This patch replaces the problematic use of _Hashtable_ebo_helper with a new type. 2013-10-29 Jonathan Wakely jwakely@gmail.com * include/bits/hashtable.h (__access_protected_ctor): Define and use new type instead of _Hashtable_ebo_helper. * testsuite/23_containers/unordered_set/ not_default_constructible_hash_neg.cc: Adjust line number. Tested x86_64-linux, I'll commit it to trunk today. commit 50786d1785911ef2d335fdcd162e71e14da4 Author: Jonathan Wakely jwakely@gmail.com Date: Tue Oct 29 15:50:00 2013 + * include/bits/hashtable.h (__access_protected_ctor): Define and use new type instead of _Hashtable_ebo_helper. * testsuite/23_containers/unordered_set/ not_default_constructible_hash_neg.cc: Adjust line number. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index c639c55..aae146b 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -277,14 +277,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION Functor used to map hash code to bucket index must be default constructible); + // _Hash_code_base has a protected default constructor, so use this + // derived type to tell if it's usable. + struct __access_protected_ctor : __hash_code_base { }; + // When hash codes are not cached local iterator inherits from // __hash_code_base above to compute node bucket index so it has to be // default constructible. static_assert(__if_hash_not_cached< - is_default_constructible< - // We use _Hashtable_ebo_helper to access the protected - // default constructor. 
- __detail::_Hashtable_ebo_helper<0, __hash_code_base, true>>::value, + is_default_constructible<__access_protected_ctor>>::value, Cache the hash code or make functors involved in hash code and bucket index computation default constructible); diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc index 3332cc5..4216b91 100644 --- a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc +++ b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc @@ -19,7 +19,7 @@ // with this library; see the file COPYING3. If not see // http://www.gnu.org/licenses/. -// { dg-error "default constructible" { target *-*-* } 283 } +// { dg-error "default constructible" { target *-*-* } 287 } #include unordered_set
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/13 09:12, Vladimir Makarov wrote: Tomorrow I'd like to commit the following patch. The patch removes the regmove pass. I'm buying next time we get together :-) You have my eternal gratitude. I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... src dies => dst = src (src dies) ... (src replaced by dst) ... It is some kind of value-numbering technique decreasing register pressure by removing one live range. It is not a frequently triggered transformation (about 30 in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure that it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. Still, to achieve the same code performance without the regmove pass, I needed to improve code in IRA which implicitly replaces the removed regmove transformations: o improving hard reg preferences. As you know, RTL code can explicitly contain hard regs. Currently, a hard register in RTL code affects costs of hard regs only for the pseudo involved in the hard register occurrence. E.g. [ ... ] This sounds good and generally useful. o improving preference propagation of hard registers occurring in RTL and assigned to connected pseudos. Let us look at the situation: p1 -> p2 -> p3, where '->' means a copy and we are assigning p1, p2, p3 in the same order. When we've just assigned hr1 to p1, we propagate the hr1 preference to p2 and p3. When we assign to p2, we cannot assign hr1 for some reason and have to assign hr2. P3 in the current preference implementation still has the hr1 preference, which is wrong. I implemented undoing preference propagation for such situations. Ouch. Again, sounds good to get this fixed. 
o Currently IRA generates copies too aggressively for operands that might be matched, so I rewrote this code to generate copies more accurately. Also good :-) The changes in testsuites are necessary as IRA/LRA now generate different code (more accurately, better code, by removing register shuffle moves for each case). Excellent. So this patch removes a lot of code, decreases compilation time (e.g. valgrind lackey reports about 0.4% fewer executed insns on compiling GCC combine.i with -O2), generates about the same performance code (the best improvement I saw is 0.5% SPEC2000 improvement on x86_64 in -O3 mode on a Haswell processor) and about the same average code size for SPEC2000 (the differences are in the hundredth-of-a-percent range). No concerns here. It is a big change and I hope there are no serious objections to this. If somebody has them, please express them or inform me. Thanks, Vlad. 2013-10-28 Vladimir Makarov vmaka...@redhat.com * regmove.c: Remove. * tree-pass.h (make_pass_regmove): Remove. * timevar.def (TV_REGMOVE): Remove. * passes.def (pass_regmove): Remove. * opts.c (default_options_table): Remove entry for regmove. * doc/passes.texi: Remove regmove pass description. * doc/invoke.texi (-foptimize-register-move, -fregmove): Remove options. (-fdump-rtl-regmove): Ditto. * common.opt (foptimize-register-move, fregmove): Remove. * Makefile.in (OBJS): Remove regmove.o. * regmove.c: Remove. * ira-int.h (struct ira_allocno_pref, ira_pref_t): New structure and type. (struct ira_allocno) New member allocno_prefs. (ALLOCNO_PREFS): New macro. (ira_prefs, ira_prefs_num): New external vars. (ira_setup_alts, ira_get_dup_out_num, ira_debug_pref): New prototypes. (ira_debug_prefs, ira_debug_allocno_prefs, ira_create_pref): Ditto. (ira_add_allocno_pref, ira_remove_pref, ira_remove_allocno_prefs): Ditto. (ira_add_allocno_copy_to_list): Remove prototype. (ira_swap_allocno_copy_ends_if_necessary): Ditto. (ira_pref_iterator): New type. 
(ira_pref_iter_init, ira_pref_iter_cond): New functions. (FOR_EACH_PREF): New macro. * ira.c (commutative_constraint_p): Move from ira-conflicts.c. (ira_get_dup_out_num): Ditto. Rename from get_dup_num. Modify the code. (ira_setup_alts): New function. (decrease_live_ranges_number): New function. (ira): Call the above function. * ira-build.c (ira_prefs, ira_prefs_num): New global vars. (ira_create_allocno): Initialize allocno prefs. (pref_pool,
Re: RFA: Andes nds32 port v4 patch
On Tue, 29 Oct 2013, Chung-Ju Wu wrote: Thank you very much for the comments on libgcc and documentation parts. Here I re-list the v4 patch of nds32 port: Machine Description -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html Libgcc and Documentation -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html Are those nds32 port patches OK to be committed? This is OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [patch] Mostly remove tree-core.h from other .h files.
On 10/28/2013 03:38 PM, Andrew MacLeod wrote: I noticed there were a few .h files which were including tree-core.h themselves. expr.h was one of them, and many of the rtl files used that to get at the basic tree structure for various bits. I moved the include of tree-core.h to rtl.h instead. That way all the rtl files get it, and when we flatten the rtl.h includes into the c files, it'll be reduced to just the ones that need it. Eventually, those things that need it from rtl.h won't even need tree-core.h, they'll get converted to the new gimple interface instead. bah, looks like I broke some cross compiling. Apparently some gen* routines include rtl.h before all-tree.def exists, and this causes breakage. Doesn't happen during stage 1 of a cross, but later. sigh. I'll put tree-core.h back into expr.h, remove it from rtl.h and look to flatten it properly shortly... bootstrapped on x86_64-unknown-linux-gnu and applied as revision 204167. Andrew * expr.h: Revert change and include tree-core.h. * rtl.h: Revert change and don't include tree-core.h. Index: expr.h === *** expr.h (revision 204163) --- expr.h (working copy) *** along with GCC; see the file COPYING3. *** 26,31 --- 26,34 #include rtl.h /* For optimize_size */ #include flags.h + /* For host_integerp, tree_low_cst, fold_convert, size_binop, ssize_int, +TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ + #include tree-core.h /* For GET_MODE_BITSIZE, word_mode */ #include machmode.h Index: rtl.h === *** rtl.h (revision 204163) --- rtl.h (working copy) *** along with GCC; see the file COPYING3. *** 29,35 #include alias.h #include hashtab.h #include flags.h - #include tree-core.h /* Value used by some passes to recognize noop moves as valid instructions. */ --- 29,34
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
On 10/29/2013 03:02 AM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 14:45, Richard Henderson wrote: On 10/28/2013 01:58 PM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault correct predicated scalar algorithms. Using VEC_MERGE isn't the proper representation for that. If that's your real goal, then COND_EXEC is the only way to let rtl know that faults are suppressed in the false condition. I believe cond_exec approach supposed to look like this: (define_subst mask_scalar [(set (match_operand:SUBST_V 0) (vec_merge:SUBST_V (match_operand:SUBST_V 1) (match_operand:SUBST_V 2) (const_int 1)))] TARGET_AVX512F [(cond_exec (eq:CC (match_operand:avx512fmaskmode 3 register_operand k) (const_int 1)) (set (match_dup 0) (vec_merge:SUBST_V (match_dup 1) (match_dup 2) (const_int 1]) But this only will describe merge-masking in incorrect way. We will need to add a clobber to signal that even for false condition we will zero higher part of register. Preferable zerro-masking will be indistinguishable from merge- masking and will need to choose which mask mode to enable. Bad turn. No, a cond_exec approach to scalars would use scalar modes not vector modes with vec_merge. In that case the higher part of the register is ignored and undefined, and the fact that zeroing happens is irrelevant. IMHO, we have 3 options to implement scalar masked insns: 1. `vec_merge' over vec_merge (current approach). Pro. 1. Precise semantic description False, as I've described above. The compiler will believe that all exceptions, especially memory exceptions, will be taken. 2. Unified approach with vector patterns 3. Freedom for simplifier to reduce EVEX to VEX for certain const masks Cons. 1. Too precise semantic description and as a consequence complicated code in md-file 2. `cond_exec' approach Pro. 1. 
Looks useful for the compiler when trying to generate predicated code Cons. 1. Not precise. Extra clobbers (?) needed: to signal that we're changing the register even for a false condition in cond_exec 2. Unable to describe zero masking nicely 3. Code still as complicated as for option #1 4. Simplifier won't work (clobber is always clobber) 3. Make all masked scalar insns unspecs Pro. 1. Straight-forward, not overweighted. Enough for intrinsics to work Cons. 1. Since every unspec needs a code: substs won't be applied directly: huge volume of similar code 2. Simplifier won't work 3. Generation of predicated code becomes hard Am I missing some options, or is that all we have? If so, what option would you prefer? As far as builtins are concerned, all three approaches are functional. But in order for the compiler to be able to automatically create conditional code from normal scalar code we'll have to use cond_exec. Indeed, if we've done our job right, then the user-facing inlines that we expose could be written inline double add_mask(double r, double x, double y, int m) { if (m & 1) r = x + y; return r; } although honestly such inlines would be silly.
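To make the two masking flavors in the discussion concrete, here is a plain-C sketch of the scalar semantics (a model of the AVX-512 behavior, not the intrinsics themselves; bit 0 of the mask predicates the operation):

```c
#include <assert.h>

/* Merge-masking: when the mask bit is clear, the destination keeps its
   old value r.  */
static double add_mask_merge (double r, double x, double y, int m)
{
  return (m & 1) ? x + y : r;
}

/* Zero-masking: when the mask bit is clear, the result is forced to 0,
   independent of any previous destination value.  */
static double add_mask_zero (double x, double y, int m)
{
  return (m & 1) ? x + y : 0.0;
}
```

The representational difficulty above is exactly that merge-masking needs the old destination as an extra input while zero-masking does not, which a bare cond_exec cannot express.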
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
Hello, A very minor nit: in common.opt, entries for the options should be changed to fregmove Common Ignore Does nothing. Preserved for backward compatibility. instead of removing them altogether, so the compiler does not start rejecting build commands with such options. There are now a few such entries there. Alexander
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
Please change the .opt entries to foptimize-register-move Common Ignore Does nothing. Preserved for backward compatibility. and fregmove Common Ignore Does nothing. Preserved for backward compatibility. rather than removing them completely. In general, when removing an option that's purely about optimizations rather than affecting the semantics of the user's source code, it's best to leave it behind as an ignored option like that. -- Joseph S. Myers jos...@codesourcery.com
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:26 PM, Jeff Law wrote: On 10/29/13 09:12, Vladimir Makarov wrote: I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... use of src (src dies) => dst = src (src dies) ... use of dst ... It is a kind of value-numbering technique that decreases register pressure by removing one live range. It is not a frequently triggered transformation (about 30 times in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. I completely agree with you. I don't expect any important outcome from this work. In fact, I did some experiments with GVN about 10 years ago (how time flies) for the old RA and did not see any visible effect on performance, but generated code always changes and maybe there are more opportunities for this now (although more probably fewer). In any case, it is the lowest-priority project for me and most probably I will never start it.
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:43 PM, Alexander Monakov wrote: Hello, A very minor nit: in common.opt, entries for the options should be changed to fregmove Common Ignore Does nothing. Preserved for backward compatibility. instead of removing them altogether, so the compiler does not start rejecting build commands with such options. There are now a few such entries there. Thanks, Alex. A good point. I'll change it.
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:44 PM, Joseph S. Myers wrote: Please change the .opt entries to foptimize-register-move Common Ignore Does nothing. Preserved for backward compatibility. and fregmove Common Ignore Does nothing. Preserved for backward compatibility. rather than removing them completely. In general, when removing an option that's purely about optimizations rather than affecting the semantics of the user's source code, it's best to leave it behind as an ignored option like that. Thanks, Joseph. I modified common.opt accordingly.
Re: [GOOGLE] Don't update the callee count if caller is not resolved node
On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li davi...@google.com wrote: Is it sufficient to check if the final caller is defined in the primary module? This might not be sufficient because the final caller may come from a comdat of aux-modules (not defined in the primary module). Note that in some cases, doing the profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in the profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. This is actually not a speculative profile update, but a double update. E.g. comdat_foo()->bar() with count 100, and comdat_foo() has been defined in 2 aux-modules. Then there will be two edges from comdat_foo() to bar(), each of which has the count 100. But bar()'s entry count is only 100 (assume comdat_foo is the only caller). So if we update bar() twice when inlining these two edges, the second update will be wrong. Dehao David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen de...@google.com wrote: This patch changes clone_inlined_nodes to not update the callee count if the caller is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes, so the original update method leads to multiple updates of the callee. Bootstrapped; testing ongoing. OK for google-4_8 if testing is OK?
Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile && L_IPO_COMP_MODE + && cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e->caller; + if (caller->global.inlined_to) + caller = caller->global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e->callee, e->callee->symbol.decl, e->count, e->frequency, update_original, vNULL, true);
Re: [PATCH] Introduce [sg]et_nonzero_bits
Hi! On Tue, Oct 29, 2013 at 04:29:56PM +0100, Jakub Jelinek wrote: Surely you can't rely on CCP and VRP computing exactly the same nonzero_bits. As you don't record/compute zero_bits, you can't tell whether a bit that is not set in nonzero_bits is don't-know or known to be zero. And you cannot do an assert during iteration. Unset bits in get_nonzero_bits are always known to be zero; set bits are don't-know or known to be nonzero. (During iteration, optimistically 'wrong' values can disappear.) During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Here is an untested, safer variant of the original patch that attempts to honor the constraints you've mentioned by looking at get_value and either making nonzero_bits more conservative or not applying it at all if the above rules would be violated. Still, it does nothing for the !is_constant case, because I don't know what would be the right thing. Say, if for iteration we assume some constant value that has some bits unset in mask (known) and set in value (known non-zero), yet they are clear in get_nonzero_bits, shall we punt, ignore what the iteration would try to do and adjust from get_nonzero_bits instead, or something else? Factor this out to commonize with code in evaluate_stmt. Ok, but in this more conservative patch I don't see enough code to commonize any more. The only change in the patch below since the previous patch is in the evaluate_stmt function: 2013-10-29 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant.
(ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. (evaluate_stmt): If we'd otherwise return VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range; if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during the VRP1 pass. --- gcc/gimple-pretty-print.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) && SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, ", "); pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node))); pp_printf (buffer, "]"); - newline_and_indent (buffer, spc); } + nonzero_bits = get_nonzero_bits (node); + if (nonzero_bits != double_int_minus_one + && (nonzero_bits + != double_int::mask (TYPE_PRECISION (TREE_TYPE (node))))) + { + pp_string (buffer, " NONZERO "); + sprintf (pp_buffer (buffer)->digit_buffer, + HOST_WIDE_INT_PRINT_DOUBLE_HEX, + (unsigned HOST_WIDE_INT) nonzero_bits.high, + nonzero_bits.low); + pp_string (buffer, pp_buffer (buffer)->digit_buffer); + } + newline_and_indent (buffer, spc); } } --- gcc/tree-ssanames.h.jj 2013-09-27 15:42:44.0 +0200 +++ gcc/tree-ssanames.h 2013-10-24 15:52:53.765955605 +0200 @@ -52,6 +52,8 @@ struct GTY (()) range_info_def { double_int min; /* Maximum for value range. */ double_int max; + /* Non-zero bits - bits not set are guaranteed to be always zero.
*/ + double_int nonzero_bits; }; @@ -68,10 +70,11 @@ struct GTY (()) range_info_def { enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING }; /* Sets the value range to SSA. */ -extern void set_range_info (tree ssa, double_int min, double_int max); +extern void set_range_info (tree, double_int, double_int); /* Gets the value range from SSA. */ -extern enum value_range_type get_range_info (tree name, double_int *min, - double_int *max); +extern enum value_range_type get_range_info (tree, double_int *, double_int *); +extern void set_nonzero_bits (tree, double_int); +extern double_int get_nonzero_bits (tree); extern void init_ssanames (struct function *, int); extern void fini_ssanames
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 1:38 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. (define_expand "abs<mode>2" [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand") (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))] "TARGET_SSE2" { if (!TARGET_SSSE3) ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1])); Do you really need force_reg here? You are using generic expanders in ix86_expand_sse2_abs that can handle non-register operands just as well. You are right. I have removed force_reg. else emit_insn (gen_rtx_SET (VOIDmode, operands[0], gen_rtx_ABS (<MODE>mode, operands[1]))); DONE; }) Please note that your mailer mangles indents. Please indent your code correctly. Right. I also attached a text file in which all the tabs are there. The updated patch is pasted below (and also in the attached file). Thank you very much for your comment! Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+ * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 && tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn.
*/ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..0d9cefe 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn "abs<mode>2" +(define_insn "*abs<mode>2" [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v") (abs:VI124_AVX2_48_AVX512F
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/13 10:46, Vladimir Makarov wrote: On 10/29/2013 12:26 PM, Jeff Law wrote: On 10/29/13 09:12, Vladimir Makarov wrote: I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... use of src (src dies) => dst = src (src dies) ... use of dst ... It is a kind of value-numbering technique that decreases register pressure by removing one live range. It is not a frequently triggered transformation (about 30 times in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. I completely agree with you. I don't expect any important outcome from this work. In fact, I did some experiments with GVN about 10 years ago (how time flies) for the old RA and did not see any visible effect on performance, but generated code always changes and maybe there are more opportunities for this now (although more probably fewer). In any case, it is the lowest-priority project for me and most probably I will never start it. I vaguely recall you talking about GVN in the allocator once -- perhaps at one of the summits. And yes, I recall you saying it wasn't particularly helpful. Jeff
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote: For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. Huh, strange. Then you can rename the previous pattern to abs<mode>2_1 and call it from the new expander instead of expanding it manually. Please also add a small comment describing the situation, to prevent future optimizations in this place. Thanks, Uros.
Re: [PATCH] Do not append *INTERNAL* to the decl name
On Tue, Oct 29, 2013 at 7:58 AM, Jason Merrill ja...@redhat.com wrote: On 10/28/2013 06:12 PM, Dehao Chen wrote: ping... Sorry for the slow response. If we're actually emitting the name now, we need to give it a name different from the complete constructor. I suppose it makes sense to go with C4/D4 as in the decloning patch. Shall we do it in a separate patch? And I suppose binutils also needs to be updated for C4/D4? Thanks, Dehao http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt (which I've been meaning to revisit before the end of stage 1). Jason
Re: [GOOGLE] Don't update the callee count if caller is not resolved node
The situation you described is worse -- hopefully it will be addressed in the next version of LIPO. The change is OK. David On Tue, Oct 29, 2013 at 10:08 AM, Dehao Chen de...@google.com wrote: On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li davi...@google.com wrote: Is it sufficient to check if the final caller is defined in the primary module? This might not be sufficient because the final caller may come from a comdat of aux-modules (not defined in the primary module). Note that in some cases, doing the profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in the profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. This is actually not a speculative profile update, but a double update. E.g. comdat_foo()->bar() with count 100, and comdat_foo() has been defined in 2 aux-modules. Then there will be two edges from comdat_foo() to bar(), each of which has the count 100. But bar()'s entry count is only 100 (assume comdat_foo is the only caller). So if we update bar() twice when inlining these two edges, the second update will be wrong. Dehao David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen de...@google.com wrote: This patch changes clone_inlined_nodes to not update the callee count if the caller is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes, so the original update method leads to multiple updates of the callee. Bootstrapped; testing ongoing. OK for google-4_8 if testing is OK?
Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile && L_IPO_COMP_MODE + && cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e->caller; + if (caller->global.inlined_to) + caller = caller->global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e->callee, e->callee->symbol.decl, e->count, e->frequency, update_original, vNULL, true);
Re: [PATCH] tree-ssa documetation fix
On 10/29/13 03:53, Martin Liška wrote: Hello, I've noticed that some part of documentation is obsolete and this patch adds documentation for new functions that replace old macros. [ ... ] Thanks. Installed on the trunk. jeff
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 10/29/13 09:14, Andrew Pinski wrote: On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law l...@redhat.com wrote: On 10/27/13 12:55, Andrew Pinski wrote: Here is my latest patch, which adds the testcases from Zhenqiang's patch and fixes items 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyway. (tree_ssa_ifcombine): Invert the order of the basic block walk; this increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent a conditional move from being used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. I won't be able to look at this in any depth tonight. However, I would ask that you pass along the dump file for ssa-dom-thread-3.c so that I can evaluate the correctness of your change better. We no longer jump thread one case, as we have: bb 2: var_3 = var_1(D)->ssa_name.var; _4 = var_1(D)->base.code; if (_4 == 1) goto <bb 3>; else goto <bb 5>; ... bb 5: _6 = var_3->base.code; _13 = _4 != 1; _14 = _6 != 0; _12 = _13 & _14; if (_12 != 0) goto <bb 8>; else goto <bb 6>; Attached is the full dump too. I mostly wanted to verify whether or not the test was worth running. It isn't after this change. It no longer tests for the threader's ability to clone an intermediate block to isolate a path to a successor of the intermediate that is threadable. Please just remove ssa-dom-thread-3.c as part of this checkin.
Thanks, Jeff
Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests
On 29 October 2013 03:24, Ramana Radhakrishnan ramra...@arm.com wrote: On 10/09/13 23:16, Christophe Lyon wrote: Irrespective of our earlier conversations on this, I'm now actually wondering if, instead of doing this and integrating it in the GCC source base, it may be easier to write a harness to test this cross on qemu or natively. Additionally, setting up an auto-tester to do this might be a more productive use of time than manually dejagnu-izing this, which appears to be a tedious and slow process. This would be easy to set up, since the Makefile on gitorious is already targeting qemu. I used it occasionally on boards with minimal changes. This just means we'd have to agree on how to set up such an auto-tester, where we send the results, etc... I'd like your feedback before continuing, as there are a lot more files to come. I have done some cleanup to help review, but the two .h files will need to grow as more intrinsics are added (see the original ones). Which one should I compare this with in terms of the original file? I have kept the same file names. I'd like to keep the modifications at a minimal level, to save my time when adapting each test (there are currently 145 test files, so 143 left :-). On to the patch itself. The prefix TEST_ seems a bit misleading in that it suggests this is testing something, when in reality it is initializing stuff. In fact, TEST_ executes the intrinsics and copies the results to memory when relevant. But I can easily change TEST_ to something else. So in the sample I posted: TEST_VABA: VAR=vaba(); vst1(BUFFER,VAR) TEST_VLD1: VAR=vld1(); vst1(BUFFER, VAR) VDUP is special in that it is a helper for other tests: TEST_VDUP: VAR1=vdup(VAR2,) and similarly for TEST_VLOAD and TEST_VSETLANE +# Exit immediately if this isn't an ARM target. +if ![istarget arm*-*-*] then { + return +} Also for aarch64*-*-*, as all these intrinsics are compatible with the aarch64 port.
I would also prefer that this be tortured over multiple optimization levels, as many times we find issues with different optimization levels. OK, this sounds easy to do, and I agree. I preferred to post a simple version first. And given that you told me about your plans to factorize the arm and aarch64 tests, I thought it was better to start with a simple version I knew was working. More later; I need to get back to something else and I need to play more with your original testsuite - but I'd like some discussion around some of these points anyway. Ramana OK, thanks for the feedback. If we decide to go with auto-testers instead, the discussion will probably be shorter. Christophe + +# Load support procs. +load_lib gcc-dg.exp + +# Initialize `dg'. +dg-init + +# Main loop. +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + + +# All done. +dg-finish diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 2013-05-09 00:48:59.395628726 +0200 @@ -0,0 +1,349 @@ +#ifndef _ARM_NEON_REF_H_ +#define _ARM_NEON_REF_H_ + +#include <stdio.h> +#include <inttypes.h> +#include <string.h> +#include <stdlib.h> + +#define xSTR(X) #X +#define STR(X) xSTR(X) + +#define xNAME1(V,T) V ## _ ## T +#define xNAME(V,T) xNAME1(V,T) + +#define VAR(V,T,W) xNAME(V,T##W) +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W) + +#define VECT_NAME(T, W, N) T##W##x##N +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L +#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t) +#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t) + +#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N)) +#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N) + +/* Array declarations.
*/ +#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N] + +/* Check results vs expected values. */ +#define CHECK(MSG,T,W,N,FMT) \ + for(i=0; i<N; i++) \ +{ \ + if (VECT_VAR(result, T, W, N)[i] != VECT_VAR(expected, T, W, N)[i]) { \ + fprintf(stderr, \ + "ERROR in %s at type %s index %d: 0x%" FMT " != 0x%" \ + FMT " (expected)\n", \ + MSG, STR(VECT_NAME(T, W, N)), i, \ + VECT_VAR(result, T, W, N)[i], \ + VECT_VAR(expected, T, W, N)[i]); \ + abort(); \ + } \ +
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 10:34 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote: For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. Huh, strange. Then you can rename the previous pattern to abs<mode>2_1 and call it from the new expander instead of expanding it manually. Please also add a small comment describing the situation, to prevent future optimizations in this place. Could you tell me how to do that? Is the renamed pattern abs<mode>2_1 also a define_expand? How do I call this expander? Thank you! Cong Thanks, Uros.
[PATCH] Fix PR ipa/58862 (profiled bootstrap failure)
This patch fixes a profiledbootstrap failure that occurred after I added some profile fixup. The initial profile insanity occurred upstream of my change, but my change caused the insanity to spread to the edge probability, resulting in a verify failure. The patch below ensures this doesn't occur. Bootstrapped and tested on x86-64-unknown-linux-gnu. A profiledbootstrap build also now succeeds. Ok for trunk? Thanks, Teresa 2013-10-29 Teresa Johnson tejohn...@google.com PR ipa/58862 * tree-ssa-tail-merge.c (replace_block_by): Tolerate profile insanities when updating probabilities. Index: tree-ssa-tail-merge.c === --- tree-ssa-tail-merge.c (revision 204166) +++ tree-ssa-tail-merge.c (working copy) @@ -1467,7 +1467,7 @@ static void replace_block_by (basic_block bb1, basic_block bb2) { edge pred_edge; - edge e1; + edge e1, e2; edge_iterator ei; unsigned int i; gimple bb2_phi; @@ -1502,16 +1502,22 @@ replace_block_by (basic_block bb1, basic_block bb2 bb2->count += bb1->count; /* Merge the outgoing edge counts from bb1 onto bb2. */ + gcov_type out_sum = 0; FOR_EACH_EDGE (e1, ei, bb1->succs) { - edge e2; e2 = find_edge (bb2, e1->dest); gcc_assert (e2); e2->count += e1->count; - /* Recompute the probability from the new merged edge count (bb2->count - was updated above). */ - e2->probability = GCOV_COMPUTE_SCALE (e2->count, bb2->count); + out_sum += e2->count; } + /* Recompute the edge probabilities from the new merged edge count. + Use the sum of the new merged edge counts computed above instead + of bb2's merged count, in case there are profile count insanities + making the bb count inconsistent with the edge weights. */ + FOR_EACH_EDGE (e2, ei, bb2->succs) +{ + e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum); +} /* Do updates that use bb1, before deleting bb1. */ release_last_vdef (bb1); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Keep REG_INC note in subreg2 pass
On 10/24/13 02:20, Zhenqiang Chen wrote: Hi, the REG_INC note is lost in the subreg2 pass in resolve_simple_move, which might lead to wrong dependence info for IRA. E.g. in the function validate_equiv_mem of ira.c, it checks the REG_INC note: for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) if ((REG_NOTE_KIND (note) == REG_INC || REG_NOTE_KIND (note) == REG_DEAD) && REG_P (XEXP (note, 0)) && reg_overlap_mentioned_p (XEXP (note, 0), memref)) return 0; Without the REG_INC note, validate_equiv_mem will return a wrong result. Refer to https://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more detail about a real case in the kernel. Bootstrapped with no make check regressions on x86-64 and ARM. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-10-24 Zhenqiang Chen zhenqiang.c...@linaro.org * lower-subreg.c (resolve_simple_move): Copy REG_INC note. testsuite/ChangeLog: 2013-10-24 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/arm/lp1243022.c: New test. This clearly handles adding a note when the destination is a MEM with a side effect. What about cases where the side effect is associated with a load from memory rather than a store to memory? lp1243022.patch diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 57b4b3c..e710fa5 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + /* Copy the REG_INC notes. */ + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + { + if (!REG_NOTES (minsn)) + REG_NOTES (minsn) = note; + else + add_reg_note (minsn, REG_INC, note); + } + } +#endif If MINSN does not have any notes, then this results in MINSN and INSN sharing the note. Note carefully that notes are chained (see the implementation of add_reg_note).
Thus the sharing would result in MINSN and INSN actually sharing a chain of notes. I'm pretty sure that's not what you intended. I think you need to always use add_reg_note. Jeff
Symtab, cgraph and varpool nodes are now a real class hierarchy
On Mon, 2013-10-28 at 23:37 -0600, Jeff Law wrote: On 10/28/13 20:57, David Malcolm wrote:

	* cgraph.h (symtab_node_base): Convert to a class; add
	GTY ((desc ("%h.type"), tag ("SYMTAB_SYMBOL"))).
	(cgraph_node): Inherit from symtab_node; add GTY option
	tag ("SYMTAB_FUNCTION").
	(varpool_node): Inherit from symtab_node; add GTY option
	tag ("SYMTAB_VARIABLE").
	(symtab_node_def): Remove.
	(is_a_helper <cgraph_node>::test (symtab_node_def *)): Convert to...
	(is_a_helper <cgraph_node>::test (symtab_node_base *)): ...this.
	(is_a_helper <varpool_node>::test (symtab_node_def *)): Convert to...
	(is_a_helper <varpool_node>::test (symtab_node_base *)): ...this.
	* ipa-ref.h (symtab_node_def): Drop.
	(symtab_node): Change underlying type from symtab_node_def to
	symtab_node_base.
	(const_symtab_node): Likewise.
	* is-a.h: Update examples in comment.
	* symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base.
	(assembler_name_hash): Likewise.

This patch is OK. Thanks for working on this!

These symtab changes were dependent on having gengtype support for inheritance, which is now in trunk, so I'm now revisiting these patches. The above patch hasn't bitrotted, though the autogenerated one that goes with it needed regenerating. A new version of the autogenerated patch can be seen at: http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch

Is that new patch OK? (It's 450KB, so one's eyes tend to glaze over after a while, but FWIW you approved an earlier version of it in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html and the test suite for the script that generated the patch can be seen at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py )

Given it's already approved and hasn't changed in any substantial way, it's OK for the trunk once you get through the bootstrap and regression testing. Thanks.
The conversion of the symtab types from inheritance-in-C to a C++ class hierarchy is now in trunk: I committed the manual parts of the conversion as r204170, and the automated part as r204171. I then noticed that this broke the gdb debugging hooks for printing a (cgraph_node *). I tested and committed the attached fix for them as r204174. As a followup, I'm next looking at the renaming of the symtab types previously discussed with Honza (on 2013-09-10). Dave

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 204173)
+++ gcc/ChangeLog (revision 204174)
@@ -1,3 +1,10 @@
+2013-10-29  David Malcolm  <dmalc...@redhat.com>
+
+	* gdbhooks.py (CGraphNodePrinter.to_string): Update gdb
+	prettyprinter for cgraph_node to reflect the conversion of the
+	symtable types to a C++ class hierarchy: it now *is* a
+	symtab_node_base, rather than having one (named "symbol").
+
 2013-10-29  Balaji V. Iyer  <balaji.v.i...@intel.com>
 
 	* builtins.c (is_builtin_name): Added a check for __cilkrts_detach and

Index: gcc/gdbhooks.py
===================================================================
--- gcc/gdbhooks.py (revision 204173)
+++ gcc/gdbhooks.py (revision 204174)
@@ -226,8 +226,7 @@
         # symtab_node_name calls lang_hooks.decl_printable_name
         # default implementation (lhd_decl_printable_name) is:
         #    return IDENTIFIER_POINTER (DECL_NAME (decl));
-        symbol = self.gdbval['symbol']
-        tree_decl = Tree(symbol['decl'])
+        tree_decl = Tree(self.gdbval['decl'])
         result += ' %s' % tree_decl.DECL_NAME().IDENTIFIER_POINTER()
         result += '>'
         return result
Re: free is a killer
On 10/29/13 00:38, Marc Glisse wrote: On Mon, 28 Oct 2013, Jeff Law wrote: On 10/28/13 16:05, Marc Glisse wrote:

I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one). I even wrote a patch (attached), but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy

Actually, it's fairly easy to see with memset. Something like this:

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-optimized" } */
void f (long *p)
{
  *p = 42;
  p[4] = 42;
  __builtin_memset (p, 0, 100);
}
/* { dg-final { scan-tree-dump-not " = 42" "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Indeed. Do you want to commit it xfailed, or put it in bugzilla so we don't lose it? (It becomes harder if you replace p with p-1 in the memset arguments.)

BTW, you can get a VAR_DECL (and presumably a PARM_DECL) from ao_ref_base. Some of the ssa-dse tests show this. Jeff