Re: C++14 digit separators..
On 10/28/2013 03:51 AM, Ed Smith-Rowland wrote: Here is an implementation for C++14 digit separators (single quote). It's still testing on x86_64-linux but I wanted to give folks a chance to check it out. Ed Index: gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C === --- gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C(revision 0) +++ gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C(working copy) @@ -0,0 +1,17 @@ +// { dg-options -std=c++1y } + +#define assert(E) if(!(E))__builtin_abort(); + +int +main() +{ + (1048576 == 1048''576); + (1048576 == 0X'10); + (1048576 == 0x'10); + (1048576 == 0'004'000'000); + (1048576 == 0B1); + (1048576 == 0b0001'''''); + + (1.'602'176'565e-19 == 1.602176565e-19); + (1.602'176'565e'-19 == 1.602176565e-19); +} These tests seem to be missing the 'assert' there. -- VZ signature.asc Description: OpenPGP digital signature
Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion
Hello Joshua, Joshua J Cogliati jrinc...@yahoo.com writes: I am not certain that c.opt was modified correctly. I don't see any problem with the c.opt part. So unless another maintainer says otherwise, I'd say this is OK. 1. warn_float_patch_and_new_testcase.diff This adds a new testcase and checks for float-conversion in the warning. This will add somewhat more time for running the testcases compared to version 1 while still testing more or less the same code paths. This does however check that the warning occurs when -Wconversion is not used. As said earlier, I'd really prefer this approach because it leaves the existing tests alone and just adds new one. I guess we cannot do much at this point for the speed concern (okay, we could try and make the test harness run more tests in parallel but that is a separate discussion) and I think Joseph agrees too. So please let's stick to this approach. Changelog for warn_float_patch_and_new_testcase.diff and warn_float_patch_and_new_testcase2.diff: Splitting out a -Wfloat-conversion from -Wconversion for conversions that lower floating point number precision or conversion from floating point numbers to integers * c-family/c-common.c Switching unsafe_conversion_p to return an enumeration with more detail, and conversion_warning to use this information. Please, strictly follow the same format as the others entries in the ChangeLog file. That is, explicitly write the names of the functions you changed, in parenthesis, followed by a colon. That would make the entry look like: * c-family/c-common.c (unsafe_conversion_p): Make this function return an enumeration with more detail. (conversion_warning): Use the new return type of unsafe_conversion_p to separately warn either about conversions that lower floating point number precision or about the other kinds of conversions. * c-family/c-common.h Adding conversion_safety enumeration and switching return type of unsafe_conversion_p Likewise. * c-family/c.opt Adding new warning float-conversion and enabling it -Wconversion * doc/invoke.texi Adding documentation about -Wfloat-conversion Likewise (colon missing at the end of the file name). * testsuite/c-c++-common/Wfloat-conversion.c Copies relevant tests from c-c++-common/Wconversion-real.c, gcc.dg/Wconversion-real-integer.c and gcc.dg/pr35635.c into new testcase for ones that are warned about by -Wfloat-conversion You need to add the above ChangeLog entry to the ChangeLog file in gcc/testsuite/ChangeLog. Thus, the entry would look like (note how the prefix testsuite/ is removed from the path name): * c-c++-common/Wfloat-conversion.c: New test file. Its content started as a copy of c-c++-common/Wconversion-real.c, gcc.dg/Wconversion-real-integer.c and gcc.dg/pr35635.c. It has been augmented by new tests to exercise the -Wfloat-conversion option. Index: gcc/testsuite/c-c++-common/Wfloat-conversion.c === --- gcc/testsuite/c-c++-common/Wfloat-conversion.c(working copy) +++ gcc/testsuite/c-c++-common/Wfloat-conversion.c(working copy) @@ -1,85 +1,58 @@ /* Test for diagnostics for Wconversion for floating-point. */ Hmmh. I think this diff hunk should say that this is a new file. Here what it says is that it's a modification of an existing file. Thank you for your time on this. -- Dodji
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
Hi Sandra, On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote: On 10/18/2013 10:38 AM, Richard Biener wrote: Sandra Loosemore san...@codesourcery.com wrote: On 10/18/2013 04:50 AM, Richard Biener wrote: On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore san...@codesourcery.com wrote: This patch fixes various -fstrict-volatile-bitfields wrong-code bugs, including instances where bitfield values were being quietly truncated as well as bugs relating to using the wrong width. The code changes are identical to those in the previous version of this patch series (originally posted in June and re-pinged many times since then), but for this version I have cleaned up the test cases to remove dependencies on header files, as Bernd requested. Ok. Just to clarify, is this approval unconditional, independent of part 2 and other patches or changes still under active discussion? Yes. Hr. After some further testing, I'm afraid this patch is still buggy. :-( I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing. The difference between 4.8 and mainline head is the alignment of the incoming str_rtx passed to store_bit_field/extract_bit_field, due to the changes in r199898. The alignment is 8 bits on mainline, and 32 on 4.8. It seems to me that the bitfield code ought to handle store/extract from a more-aligned object and it's probably possible to construct an example that fails in this way on mainline as well. It looks like there are conflicting assumptions between get_best_mode, the places that call it in store_fixed_bit_field and extract_fixed_bit_field, and the code that actually does the splitting -- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather than its mode. In the case where it's failing, get_best_mode is deciding it can't do a HImode access without splitting, but then the split code is assuming SImode units because of the 32-bit alignment, but not actually changing the mode of the rtx to match that. On top of that, this is one of the cases that strict_volatile_bitfield_p checks for and returns false, but the callers of store_bit_field and extract_bit_field in expr.c have already fiddled with the mode of the incoming rtx assuming that -fstrict-volatile-bitfields does apply. It doesn't get into that infinite recursion if it's compiled with -fno-strict-volatile-bitfields instead; in that case the incoming rtx has BLKmode, get_best_mode chooses SImode, and it's able to do the access without splitting at all. Anyway I tried a couple different bandaids that solved the infinite recursion problem but caused regressions elsewhere, and now I'm not sure of the right place to fix this. Given that there is also still ongoing discussion about making this a 3-way switch (etc) I am going to hold off on committing this patch for now. -Sandra I have attached an update to your patch, that should a) fix the recursion problem. b) restrict the -fstrict-volatile-bitfields to not violate the C++ memory model. Bernd. patch-bitfields-update.diff Description: Binary data
Re: [PATCH, ARM] Fix line number data for PIC register setup code
On 27/10/13 17:04, Eric Botcazou wrote: The PIC register setup code is emitted after NOTE_INSNS_FUNCTION_BEG, because it uses the insert_insn_on_edge mechanism, and the corresponding insertion in cfgexpand.c:gimple_expand_cfg takes care to insert the code after the parm_birth_insn: ... /* Avoid putting insns before parm_birth_insn. */ if (e-src == ENTRY_BLOCK_PTR single_succ_p (ENTRY_BLOCK_PTR) parm_birth_insn) { rtx insns = e-insns.r; e-insns.r = NULL_RTX; emit_insn_after_noloc (insns, parm_birth_insn, e-dest); } ... And in the case for this test-case, parm_birth_insn is the NOTE_INSNS_FUNCTION_BEG. So this means that parm_birth_insn can never be null, right? Yes, AFAICT parm_birth_insn is never NULL at this point. 2013-10-13 Tom de Vries t...@codesourcery.com * cfgexpand.c (gimple_expand_cfg): Don't commit insertions after NOTE_INSN_FUNCTION_BEG. * gcc.target/arm/require-pic-register-loc.c: New test. OK if you also remove the test on parm_birth_insn. Updated patch, re-bootstrapped on x86_64 and committed to trunk. Also applied to 4.7 and 4.8 branches, the same problem is present there. Thanks, - Tom 2013-10-28 Tom de Vries t...@codesourcery.com * cfgexpand.c (gimple_expand_cfg): Remove test for parm_birth_insn. Don't commit insertions after NOTE_INSN_FUNCTION_BEG. * gcc.target/arm/require-pic-register-loc.c: New test. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index ba4c0e6..9705036 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -4789,14 +4789,18 @@ gimple_expand_cfg (void) if (e-insns.r) { rebuild_jump_labels_chain (e-insns.r); - /* Avoid putting insns before parm_birth_insn. */ + /* Put insns after parm birth, but before + NOTE_INSNS_FUNCTION_BEG. */ if (e-src == ENTRY_BLOCK_PTR - single_succ_p (ENTRY_BLOCK_PTR) - parm_birth_insn) + single_succ_p (ENTRY_BLOCK_PTR)) { rtx insns = e-insns.r; e-insns.r = NULL_RTX; - emit_insn_after_noloc (insns, parm_birth_insn, e-dest); + if (NOTE_P (parm_birth_insn) + NOTE_KIND (parm_birth_insn) == NOTE_INSN_FUNCTION_BEG) + emit_insn_before_noloc (insns, parm_birth_insn, e-dest); + else + emit_insn_after_noloc (insns, parm_birth_insn, e-dest); } else commit_one_edge_insertion (e); 2013-10-27 Tobias Burnus bur...@net-b.de PR other/33426 diff --git a/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c new file mode 100644 index 000..bd85e86 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/require-pic-register-loc.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options -g -fPIC } */ + +void *v; +void a (void *x) { } +void b (void) { } + /* line 7. */ +int/* line 8. */ +main (int argc)/* line 9. */ +{ /* line 10. */ + if (argc == 12345) /* line 11. */ +{ + a (v); + return 1; +} + b (); + + return 0; +} + +/* { dg-final { scan-assembler-not \.loc 1 7 0 } } */ +/* { dg-final { scan-assembler-not \.loc 1 8 0 } } */ +/* { dg-final { scan-assembler-not \.loc 1 9 0 } } */ + +/* The loc at the start of the prologue. */ +/* { dg-final { scan-assembler-times \.loc 1 10 0 1 } } */ + +/* The loc at the end of the prologue, with the first user line. */ +/* { dg-final { scan-assembler-times \.loc 1 11 0 1 } } */
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 22 Oct 08:16, Richard Henderson wrote: On 10/22/2013 07:42 AM, Kirill Yukhin wrote: Hello Richard, Thanks for remarks, they all seems reasonable. One question On 21 Oct 16:01, Richard Henderson wrote: +(define_insn avx512f_movesmode_mask + [(set (match_operand:VF_128 0 register_operand =v) + (vec_merge:VF_128 + (vec_merge:VF_128 + (match_operand:VF_128 2 register_operand v) + (match_operand:VF_128 3 vector_move_operand 0C) + (match_operand:avx512fmaskmode 4 register_operand k)) + (match_operand:VF_128 1 register_operand v) + (const_int 1)))] + TARGET_AVX512F + vmovssescalarmodesuffix\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type ssemov) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) Nested vec_merge? That seems... odd to say the least. How in the world does this get matched? This is generic approach for all scalar `masked' instructions. Reason is that we must save higher bits of vector (outer vec_merge) and apply single-bit mask (inner vec_merge). We may do it with unspecs though... But is it really better? What do you think? What I think is that while it's an instruction that exists in the ISA, does that mean we must model it in the compiler? How would this pattern be used? When we have all-1 mask then simplifier may reduce such pattern to simpler form with single vec_merge. This will be impossible if we put unspec there. So, for example for thise code: __m128d foo (__m128d x, __m128d y) { return _mm_maskz_add_sd (-1, x, y); } With unspec we will have: foo: .LFB2328: movl$-1, %eax # 10*movqi_internal/2 [length = 5] kmovw %eax, %k1 # 24*movqi_internal/8 [length = 4] vaddsd %xmm1, %xmm0, %xmm0{%k1}{z} # 11sse2_vmaddv2df3_mask/2 [length = 6] ret # 27simple_return_internal [length = 1] While for `semantic' version it will be simplified to: foo: .LFB2329: vaddsd %xmm1, %xmm0, %xmm0 # 11sse2_vmaddv2df3/2 [length = 4] ret # 26simple_return_internal [length = 1] So, we have short VEX insn vs. long EVEX one + mask creation insns. That is why we want to expose semantics of such operations. Thanks, K
Committed: Add testcase / extra comment to recent arc_ccfsm_post_advance patch
gcc/testsuite: 2013-10-28 Claudiu Zissulescu claz...@synopsys.com Joern Rennecke joern.renne...@embecosm.com * gcc.target/arc/jump-around-jump.c: New test. gcc: 2013-10-28 Joern Rennecke joern.renne...@embecosm.com * config/arc/arc.c (arc_ccfsm_post_advance): Add comment about TYPE_RETURN. tmp Description: Binary data
Re: Rework c99status.html
On Mon, 28 Oct 2013, Gerald Pfeifer wrote: I think it would be appropriate to redirect all the c99status.html files for particular GCC versions to this file, now it covers all versions. To make sure I understand: you are proposing we remove all individual c99status.html pages and have the web server redirect to the one you just updated for all of those? Yes. (I wonder if we should actually rename it to a more generic name, such as cstdstatus.html, with a view to adding a section on C11 support, but I'm less sure of that. If I complete the atomics support during stage 1, and also get _Thread_local done, then the C11 support can be described as substantially complete in 4.9 with the same caveats as for C99.) -- Joseph S. Myers jos...@codesourcery.com
Re: C++14 digit separators..
On Sun, 27 Oct 2013, Ed Smith-Rowland wrote: Here is an implementation for C++14 digit separators (single quote). I tend to think that such features should come with a test that the feature is not enabled for language / standard versions for which it shouldn't be. That is, something like #define m(x) 0 int i = m(1'2)+(3'4); that is valid in both cases, but whose semantics differ depending on whether the feature is enabled (with an appropriate check on the value of i, and the test being run both in cases where it should be enabled and cases where it shouldn't). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion
On Mon, 28 Oct 2013, Dodji Seketeli wrote: * c-family/c-common.c (unsafe_conversion_p): Make this function As previously noted, c-family/ has its own ChangeLog, so the c-family/ part does not go in the ChangeLog entry. -- Joseph S. Myers jos...@codesourcery.com
Re: C++14 digit separators..
On 10/28/2013 09:10 AM, Joseph S. Myers wrote: On Sun, 27 Oct 2013, Ed Smith-Rowland wrote: Here is an implementation for C++14 digit separators (single quote). I tend to think that such features should come with a test that the feature is not enabled for language / standard versions for which it shouldn't be. That is, something like #define m(x) 0 int i = m(1'2)+(3'4); Good idea. Other than that, the patch looks good to me. Jason
Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion
Joseph S. Myers jos...@codesourcery.com writes: On Mon, 28 Oct 2013, Dodji Seketeli wrote: * c-family/c-common.c (unsafe_conversion_p): Make this function As previously noted, c-family/ has its own ChangeLog, so the c-family/ part does not go in the ChangeLog entry. Agreed. I missed this. -- Dodji
Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
Hello, This patch introduces mask scalar subst. Is it ok to commit to main trunk? Testing pass. -- Thanks, K --- gcc/config/i386/sse.md | 104 --- gcc/config/i386/subst.md | 23 +++ 2 files changed, 95 insertions(+), 32 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index bf0e1ed..1f0d6fa 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1307,7 +1307,7 @@ (set_attr prefix mask_prefix3) (set_attr mode MODE)]) -(define_insn sse_vmplusminus_insnmode3 +(define_insn sse_vmplusminus_insnmode3mask_scalar_name [(set (match_operand:VF_128 0 register_operand =x,v) (vec_merge:VF_128 (plusminus:VF_128 @@ -1318,10 +1318,10 @@ TARGET_SSE @ plusminus_mnemonicssescalarmodesuffix\t{%2, %0|%0, %iptr2} - vplusminus_mnemonicssescalarmodesuffix\t{%2, %1, %0|%0, %1, %iptr2} + vplusminus_mnemonicssescalarmodesuffix\t{%2, %1, %0mask_scalar_operand3|%0mask_scalar_operand3, %1, %iptr2} [(set_attr isa noavx,avx) (set_attr type sseadd) - (set_attr prefix orig,vex) + (set_attr prefix mask_scalar_prefix) (set_attr mode ssescalarmode)]) (define_expand mulmode3mask_name @@ -1347,7 +1347,7 @@ (set_attr btver2_decode direct,double) (set_attr mode MODE)]) -(define_insn sse_vmmultdiv_mnemonicmode3 +(define_insn sse_vmmultdiv_mnemonicmode3mask_scalar_name [(set (match_operand:VF_128 0 register_operand =x,v) (vec_merge:VF_128 (multdiv:VF_128 @@ -1358,10 +1358,10 @@ TARGET_SSE @ multdiv_mnemonicssescalarmodesuffix\t{%2, %0|%0, %iptr2} - vmultdiv_mnemonicssescalarmodesuffix\t{%2, %1, %0|%0, %1, %iptr2} + vmultdiv_mnemonicssescalarmodesuffix\t{%2, %1, %0mask_scalar_operand3|%0mask_scalar_operand3, %1, %iptr2} [(set_attr isa noavx,avx) (set_attr type ssemultdiv_mnemonic) - (set_attr prefix orig,vex) + (set_attr prefix mask_scalar_prefix) (set_attr btver2_decode direct,double) (set_attr mode ssescalarmode)]) @@ -1446,7 +1446,7 @@ (set_attr prefix evex) (set_attr mode MODE)]) -(define_insn *srcp14mode +(define_insn mask_scalar_codeforsrcp14modemask_scalar_name [(set (match_operand:VF_128 0 register_operand =v) (vec_merge:VF_128 (unspec:VF_128 @@ -1456,7 +1456,7 @@ (match_dup 1) (const_int 1)))] TARGET_AVX512F - vrcp14ssescalarmodesuffix\t{%2, %1, %0|%0, %1, %2} + vrcp14ssescalarmodesuffix\t{%2, %1, %0mask_scalar_operand3|%0mask_scalar_operand3, %1, %2} [(set_attr type sse) (set_attr prefix evex) (set_attr mode MODE)]) @@ -1493,7 +1493,7 @@ (set_attr prefix maybe_vex) (set_attr mode MODE)]) -(define_insn sse_vmsqrtmode2 +(define_insn sse_vmsqrtmode2mask_scalar_name [(set (match_operand:VF_128 0 register_operand =x,v) (vec_merge:VF_128 (sqrt:VF_128 @@ -1503,11 +1503,11 @@ TARGET_SSE @ sqrtssescalarmodesuffix\t{%1, %0|%0, %iptr1} - vsqrtssescalarmodesuffix\t{%1, %2, %0|%0, %2, %iptr1} + vsqrtssescalarmodesuffix\t{%1, %2, %0mask_scalar_operand3|%0mask_scalar_operand3, %2, %iptr1} [(set_attr isa noavx,avx) (set_attr type sse) (set_attr atom_sse_attr sqrt) - (set_attr prefix orig,vex) + (set_attr prefix mask_scalar_prefix) (set_attr btver2_sse_attr sqrt) (set_attr mode ssescalarmode)]) @@ -1542,7 +1542,7 @@ (set_attr prefix evex) (set_attr mode MODE)]) -(define_insn *rsqrt14mode +(define_insn mask_scalar_codeforrsqrt14modemask_scalar_name [(set (match_operand:VF_128 0 register_operand =v) (vec_merge:VF_128 (unspec:VF_128 @@ -1552,7 +1552,7 @@ (match_dup 1) (const_int 1)))] TARGET_AVX512F - vrsqrt14ssescalarmodesuffix\t{%2, %1, %0|%0, %1, %2} + vrsqrt14ssescalarmodesuffix\t{%2, %1, %0mask_scalar_operand3|%0mask_scalar_operand3, %1, %2} [(set_attr type sse) (set_attr prefix evex) (set_attr mode MODE)]) @@ -1622,7 +1622,7 @@ (set_attr prefix mask_prefix3) (set_attr mode MODE)]) -(define_insn sse_vmcodemode3 +(define_insn sse_vmcodemode3mask_scalar_name [(set (match_operand:VF_128 0 register_operand =x,v) (vec_merge:VF_128 (smaxmin:VF_128 @@ -1633,11 +1633,11 @@ TARGET_SSE @ maxmin_floatssescalarmodesuffix\t{%2, %0|%0, %iptr2} - vmaxmin_floatssescalarmodesuffix\t{%2, %1, %0|%0, %1, %iptr2} + vmaxmin_floatssescalarmodesuffix\t{%2, %1, %0mask_scalar_operand3|%0mask_scalar_operand3, %1, %iptr2} [(set_attr isa noavx,avx) (set_attr type sse) (set_attr btver2_sse_attr maxmin) - (set_attr prefix orig,vex) + (set_attr prefix mask_scalar_prefix) (set_attr mode ssescalarmode)]) ;; These versions of the min/max patterns implement exactly the operations @@ -2748,7 +2748,7 @@ (match_operand:FMAMODE 3 nonimmediate_operand)))] ) -(define_insn *fma_fmadd_mode +(define_insn fma_fmadd_mode [(set (match_operand:FMAMODE 0 register_operand =v,v,v,x,x)
Re: [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst.
Hello, This patch introduces mask_scalar_merge subst. Is it ok to commit to main trunk? Testing pass. -- Thanks, K --- gcc/config/i386/sse.md | 26 +- gcc/config/i386/subst.md | 16 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 1f0d6fa..f3cca59 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2153,7 +2153,7 @@ [(V16SF const_0_to_31_operand) (V8DF const_0_to_31_operand) (V16SI const_0_to_7_operand) (V8DI const_0_to_7_operand)]) -(define_insn avx512f_cmpmode3 +(define_insn avx512f_cmpmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48F_512 1 register_operand v) @@ -2161,13 +2161,13 @@ (match_operand:SI 3 cmp_imm_predicate n)] UNSPEC_PCMP))] TARGET_AVX512F - vsseintprefixcmpssemodesuffix\t{%3, %2, %1, %0|%0, %1, %2, %3} + vsseintprefixcmpssemodesuffix\t{%3, %2, %1, %0mask_scalar_merge_operand4|%0mask_scalar_merge_operand4, %1, %2, %3} [(set_attr type ssecmp) (set_attr length_immediate 1) (set_attr prefix evex) (set_attr mode sseinsnmode)]) -(define_insn avx512f_ucmpmode3 +(define_insn avx512f_ucmpmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand v) @@ -2175,7 +2175,7 @@ (match_operand:SI 3 const_0_to_7_operand n)] UNSPEC_UNSIGNED_PCMP))] TARGET_AVX512F - vpcmpussemodesuffix\t{%3, %2, %1, %0|%0, %1, %2, %3} + vpcmpussemodesuffix\t{%3, %2, %1, %0mask_scalar_merge_operand4|%0mask_scalar_merge_operand4, %1, %2, %3} [(set_attr type ssecmp) (set_attr length_immediate 1) (set_attr prefix evex) @@ -8712,7 +8712,7 @@ (set_attr prefix vex) (set_attr mode OI)]) -(define_expand avx512f_eqmode3 +(define_expand avx512f_eqmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand) @@ -8721,14 +8721,14 @@ TARGET_AVX512F ix86_fixup_binary_operands_no_copy (EQ, MODEmode, operands);) -(define_insn avx512f_eqmode3_1 +(define_insn avx512f_eqmode3mask_scalar_merge_name_1 [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand %v) (match_operand:VI48_512 2 nonimmediate_operand vm)] UNSPEC_MASKED_EQ))] TARGET_AVX512F ix86_binary_operator_ok (EQ, MODEmode, operands) - vpcmpeqssemodesuffix\t{%2, %1, %0|%0, %1, %2} + vpcmpeqssemodesuffix\t{%2, %1, %0mask_scalar_merge_operand3|%0mask_scalar_merge_operand3, %1, %2} [(set_attr type ssecmp) (set_attr prefix_extra 1) (set_attr prefix evex) @@ -8808,13 +8808,13 @@ (set_attr prefix vex) (set_attr mode OI)]) -(define_insn avx512f_gtmode3 +(define_insn avx512f_gtmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand v) (match_operand:VI48_512 2 nonimmediate_operand vm)] UNSPEC_MASKED_GT))] TARGET_AVX512F - vpcmpgtssemodesuffix\t{%2, %1, %0|%0, %1, %2} + vpcmpgtssemodesuffix\t{%2, %1, %0mask_scalar_merge_operand3|%0mask_scalar_merge_operand3, %1, %2} [(set_attr type ssecmp) (set_attr prefix_extra 1) (set_attr prefix evex) @@ -9208,25 +9208,25 @@ ] (const_string sseinsnmode)))]) -(define_insn avx512f_testmmode3 +(define_insn avx512f_testmmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand v) (match_operand:VI48_512 2 nonimmediate_operand vm)] UNSPEC_TESTM))] TARGET_AVX512F - vptestmssemodesuffix\t{%2, %1, %0|%0, %1, %2} + vptestmssemodesuffix\t{%2, %1, %0mask_scalar_merge_operand3|%0mask_scalar_merge_operand3, %1, %2} [(set_attr prefix evex) (set_attr mode sseinsnmode)]) -(define_insn avx512f_testnmmode3 +(define_insn avx512f_testnmmode3mask_scalar_merge_name [(set (match_operand:avx512fmaskmode 0 register_operand =k) (unspec:avx512fmaskmode [(match_operand:VI48_512 1 register_operand v) (match_operand:VI48_512 2 nonimmediate_operand vm)] UNSPEC_TESTNM))] TARGET_AVX512CD - %vptestnmssemodesuffix\t{%2, %1, %0|%0, %1, %2} + %vptestnmssemodesuffix\t{%2, %1, %0mask_scalar_merge_operand3|%0mask_scalar_merge_operand3, %1, %2} [(set_attr prefix evex) (set_attr mode sseinsnmode)]) diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 532a3a1..b537c5e 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -27,6 +27,9 @@ V16SF V8SF V4SF V8DF V4DF
Re: [PATCH 1/n] Add conditional compare support
On 10/28/2013 01:32 AM, Zhenqiang Chen wrote: Patch is updated according to your comments. Main changes are: * Add two hooks: legitimize_cmp_combination and legitimize_ccmp_combination * Improve document. No, these are not the hooks I proposed. You should *not* have a ccmp_optab, because the middle-end has absolutely no idea what mode TARGET should be in. The hook needs to return an rtx expression appropriate for feeding to cbranch et al. E.g. for arm, (ne (reg:CC_Z CC_REGNUM) (const_int 0)) after having emitted the instruction that sets CC_REGNUM. We need to push this to the backend because for ia64 the expression needs to be of te form (ne (reg:BI new_pseudo) (const_int 0)) r~
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
On 10/28/2013 03:24 AM, Kirill Yukhin wrote: Hello Richard, On 22 Oct 08:16, Richard Henderson wrote: On 10/22/2013 07:42 AM, Kirill Yukhin wrote: Hello Richard, Thanks for remarks, they all seems reasonable. One question On 21 Oct 16:01, Richard Henderson wrote: +(define_insn avx512f_movesmode_mask + [(set (match_operand:VF_128 0 register_operand =v) + (vec_merge:VF_128 + (vec_merge:VF_128 + (match_operand:VF_128 2 register_operand v) + (match_operand:VF_128 3 vector_move_operand 0C) + (match_operand:avx512fmaskmode 4 register_operand k)) + (match_operand:VF_128 1 register_operand v) + (const_int 1)))] + TARGET_AVX512F + vmovssescalarmodesuffix\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type ssemov) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) Nested vec_merge? That seems... odd to say the least. How in the world does this get matched? This is generic approach for all scalar `masked' instructions. Reason is that we must save higher bits of vector (outer vec_merge) and apply single-bit mask (inner vec_merge). We may do it with unspecs though... But is it really better? What do you think? What I think is that while it's an instruction that exists in the ISA, does that mean we must model it in the compiler? How would this pattern be used? When we have all-1 mask then simplifier may reduce such pattern to simpler form with single vec_merge. This will be impossible if we put unspec there. So, for example for thise code: __m128d foo (__m128d x, __m128d y) { return _mm_maskz_add_sd (-1, x, y); } With unspec we will have: foo: .LFB2328: movl$-1, %eax # 10*movqi_internal/2 [length = 5] kmovw %eax, %k1 # 24*movqi_internal/8 [length = 4] vaddsd %xmm1, %xmm0, %xmm0{%k1}{z} # 11 sse2_vmaddv2df3_mask/2 [length = 6] ret # 27simple_return_internal [length = 1] While for `semantic' version it will be simplified to: foo: .LFB2329: vaddsd %xmm1, %xmm0, %xmm0 # 11sse2_vmaddv2df3/2 [length = 4] ret # 26simple_return_internal [length = 1] So, we have short VEX insn vs. long EVEX one + mask creation insns. That is why we want to expose semantics of such operations. This is not the question that I asked. Why is a masked *scalar* operation useful? The only way I can see that one would get created is that builtin, which begs the question of why the builtin exists. r~
Re: [PATCH, MPX, 2/X] Pointers Checker [1/25] Hooks
On 24 Oct 23:21, Jeff Law wrote: On 10/24/13 02:24, Ilya Enkovich wrote: These two hooks are used by expand pass to pass/receive bounds for args. When bounds are passed in a register, expand does not need this hook and uses regular move insn. If we are out of bound register or platform does not have them at all, this hook is called. If bounded arg is passed in memory, regular way to store associated bounds is supposed to be used in hook. That is why no slot_no value is required, only arg and it's place (slot) are used. If bounded arg is passed in register (e.g. it happens on i386 with MPX when more that 4 pointers are passed in registers), then some special slot has to be used for bounds. slot_no here holds identifier of this special slot. E.g. if we call function with 6 pointer on i386, we call this hook passing R8 and R9 as slot with const1_rtx and const2_rtx as slot_no. So can we find a concise way to describe this and include that in the docs for the hooks. Otherwise I can't see how a developer is going to know how to use this stuff. So how am I (as a GCC developer) suppsoed to know what BNDMK, BNDLDX, BNDCU, etc mean? The names aren't particularly descriptive. Are these documented elsewhere in a follow-up patch? If not, it seems to me we need to document them here. Actually the next patch introduces them and is a good place for documentation. But currently this patch has documentation for user visible built-ins only. For now built-ins used for instrumentation are described on Wiki only (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler#Builtins_used_for_instrumentation). Where should I move it? Does it also go to extend.texi? If they're strictly for developers, then there's not a real good place for them. They're not really extensions we would expect the user to use, so extend.texi seems inappropriate. Perhaps a section in tm.texi? Jeff I fixed documentation for bounds load/store hooks. Also added documentation for list of builtins returned by builtin_chkp_function. Thanks, Ilya -- gcc/ 2013-10-28 Ilya Enkovich ilya.enkov...@intel.com * target.def (builtin_chkp_function): New. (chkp_bound_type): New. (chkp_bound_mode): New. (fn_abi_va_list_bounds_size): New. (load_bounds_for_arg): New. (store_bounds_for_arg): New. * targhooks.h (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode): New. (default_builtin_chkp_function): New. * targhooks.c (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode); New. (default_builtin_chkp_function): New. * doc/tm.texi.in (TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE): New. (TARGET_LOAD_BOUNDS_FOR_ARG): New. (TARGET_STORE_BOUNDS_FOR_ARG): New. (TARGET_BUILTIN_CHKP_FUNCTION): New. (TARGET_CHKP_BOUND_TYPE): New. (TARGET_CHKP_BOUND_MODE): New. * doc/tm.texi: Regenerated. * langhooks.h (lang_hooks): Add chkp_supported field. * langhooks-def.h (LANG_HOOKS_CHKP_SUPPORTED): New. (LANG_HOOKS_INITIALIZER); Add LANG_HOOKS_CHKP_SUPPORTED. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8d220f3..c60ebef 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4334,6 +4334,13 @@ This hook returns the va_list type of the calling convention specified by The default version of this hook returns @code{va_list_type_node}. @end deftypefn +@deftypefn {Target Hook} tree TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE (tree @var{fndecl}) +This hook returns size for @code{va_list} object in function specified +by @var{fndecl}. This hook is used by Pointer Bounds Checker to build bounds +for @code{va_list} object. Return @code{integer_zero_node} if no bounds +should be used (e.g. @code{va_list} is a scalar pointer to the stack). +@end deftypefn + @deftypefn {Target Hook} tree TARGET_CANONICAL_VA_LIST_TYPE (tree @var{type}) This hook returns the va_list type of the calling convention specified by the type of @var{type}. If @var{type} is not a valid va_list type, it returns @@ -5151,6 +5158,26 @@ defined, then define this hook to return @code{true} if Otherwise, you should not define this hook. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_LOAD_BOUNDS_FOR_ARG (rtx @var{slot}, rtx @var{arg}, rtx @var{slot_no}) +This hook is used by expand pass to emit insn to load bounds of +@var{arg} passed in @var{slot}. Expand pass uses this hook in case +bounds of @var{arg} are not passed in register. If @var{slot} is a +memory, then bounds are loaded as for regular pointer loaded from +memory. If @var{slot} is not a memory then @var{slot_no} is an integer +constant holding
[C++ Patch] PR 58888
Hi, here we reject the declaration of b with the invalid use of ‘auto’ error. The reason is that, at the beginning of grokfield, init != NULL_TREE, processing_template_decl is false (in fact, a templatized version of the testcase is accepted) and we get to: if (TREE_CODE (init) == CONSTRUCTOR) init = digest_init (TREE_TYPE (value), init, tf_warning_or_error); then, at the beginning of digest_init_r, complete_type_or_maybe_complain finds the auto type incomplete. It seems to me that in grokfield, in the non-template case too, we should just process the declaration later, in the final switch, thus forward to finish_static_data_member_decl (then the only difference with the templatized case that we already handle correctly, is that we don't call push_template_decl in the middle) Tested x86_64-linux. Thanks, Paolo. /cp 2013-10-28 Paolo Carlini paolo.carl...@oracle.com PR c++/5 * decl2.c (grokfield): Handle auto like NSDMI. /testsuite 2013-10-28 Paolo Carlini paolo.carl...@oracle.com PR c++/5 * g++.dg/cpp0x/auto40.C: New. Index: cp/decl2.c === --- cp/decl2.c (revision 204133) +++ cp/decl2.c (working copy) @@ -953,6 +953,8 @@ grokfield (const cp_declarator *declarator, } else if (TREE_CODE (value) == FIELD_DECL) /* C++11 NSDMI, keep going. */; + else if (is_auto (TREE_TYPE (value))) + /* C++11 auto, keep going. */; else if (!VAR_P (value)) gcc_unreachable (); else if (!processing_template_decl) Index: testsuite/g++.dg/cpp0x/auto40.C === --- testsuite/g++.dg/cpp0x/auto40.C (revision 0) +++ testsuite/g++.dg/cpp0x/auto40.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/5 +// { dg-do compile { target c++11 } } + +#include initializer_list + +struct A +{ + static constexpr auto b{1.0}; +}; + +constexpr decltype(A::b) A::b;
Re: RFC/A: PR 58079: CONST_INT mode assert in combine.c:do_SUBST
Hello! PR 58079 is about the do_SUBST assert: /* Sanity check that we're replacing oldval with a CONST_INT that is a valid sign-extension for the original mode. */ gcc_assert (INTVAL (newval) == trunc_int_for_mode (INTVAL (newval), GET_MODE (oldval))); triggering while trying to optimise the temporary result: (eq (const_int -73 [0xffb7]) (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ]) combine_simplify_rtx calls simplify_comparison, which first canonicalises the order so that the constant is second and then promotes the comparison to SImode (the first supported comparison mode on MIPS). So we end up with: (eq (zero_extend:SI (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ])) (const_int 183 [0xb7])) So far so good. But combine_simplify_rtx then tries to install the new operands in-place, with the second part of: /* If the code changed, return a whole new comparison. */ if (new_code != code) return gen_rtx_fmt_ee (new_code, mode, op0, op1); /* Otherwise, keep this operation, but maybe change its operands. This also converts (ne (compare FOO BAR) 0) to (ne FOO BAR). */ SUBST (XEXP (x, 0), op0); SUBST (XEXP (x, 1), op1); And this triggers the assert because we're replacing a QImode register with the non-QImode constant 183. One fix would be to extend the new_code != code condition, as below. This should avoid the problem cases without generating too much garbage rtl. But if the condition's getting this complicated, perhaps it'd be better just to avoid SUBST here? I.e. if (new_code != code || op0 != XEXP (x, 0) || op1 != XEXP (x, 1)) return gen_rtx_fmt_ee (new_code, mode, op0, op1); Just asking though. :-) Tested on mipsisa64-elf. OK to install? Thanks, Richard gcc/ PR rtl-optimization/58079 * combine.c (combine_simplify_rtx): Avoid using SUBST if simplify_comparison has widened a comparison with an integer. gcc/testsuite/ * gcc.dg/torture/pr58079.c: New test. I would like to backport this patch to 4.8 branch. The patch fixes the failure on alpha-linux-gnu and also (reportedly) fixes the same failure on mips and pa targets. I have bootstrapped and regression test the patch on x86_64-pc-linux-gnu {,-m32} and started bootstrap and regtests on alphaev68-linux-gnu on 4.8 branch for all default languages + obj-c++ and go. The latter will take some ~10 hours. OK to backort the patch to the 4.8 branch if both tests pass? Thanks, Uros.
Re: RFA: Andes nds32 port v4 patch
Since Richard looked at the machine description part of the port, I looked at the other pieces. I'm happy with those pieces with the following minor changes: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00520.html (libgcc): remove the .file directive from crtzero.S. See http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html for why such directives are best avoided. http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00522.html (documentation): use @deftypefn rather than @table @code when documenting built-in functions. In install.texi, say @samp{newlib} and @samp{mculib} instead of using '' quotes. -- Joseph S. Myers jos...@codesourcery.com
[patch] Remove gimple.h from header files.
There are currently only 2 .h files which include gimple.h, so removing this indirection is not too difficult. Similar to the tree.h change Diego made, only .c files should include gimple.h. Bootstraps on x86_64-unknown-linux-gnu, regression tests are running. Assuming no issues, OK? Andrew * ipa-prop.h: Remove gimple.h from include list. * lto-streamer.h: Likewise. * cgraphbuild.c: Add gimple.h to include list. * data-streamer-in.c: Likewise. * ipa-cp.c: Likewise. * tree-streamer.c: Likewise. * lto-compress.c: Likewise. * ipa-reference.c: Likewise. * data-streamer-out.c: Likewise. * lto-cgraph.c: Likewise. * cgraphclones.c: Likewise. * ipa-utils.c: Likewise. * data-streamer.c: Likewise. * ipa-split.c: Likewise. * lto-section-in.c: Likewise. * lto/lto-object.c: Likewise. * lto/lto-partition.c: Likewise. * tree-streamer-out.c: Likewise. * ipa-prop.c: Likewise. * tree-streamer-in.c: Likewise. * symtab.c: Likewise. * opts-global.c: Likewise. * lto-opts.c: Likewise. * lto-section-out.c: Likewise. * lto-streamer.c: Likewise. Index: ipa-prop.h === *** ipa-prop.h (revision 204031) --- ipa-prop.h (working copy) *** along with GCC; see the file COPYING3. *** 23,29 #include tree-core.h #include vec.h #include cgraph.h - #include gimple.h #include alloc-pool.h /* The following definitions and interfaces are used by --- 23,28 Index: lto-streamer.h === *** lto-streamer.h (revision 204031) --- lto-streamer.h (working copy) *** along with GCC; see the file COPYING3. *** 26,32 #include plugin-api.h #include hash-table.h #include tree-core.h - #include gimple.h #include target.h #include cgraph.h #include vec.h --- 26,31 Index: cgraphbuild.c === *** cgraphbuild.c (revision 204031) --- cgraphbuild.c (working copy) *** along with GCC; see the file COPYING3. *** 23,28 --- 23,29 #include coretypes.h #include tm.h #include tree.h + #include gimple.h #include langhooks.h #include pointer-set.h #include intl.h Index: data-streamer-in.c === *** data-streamer-in.c (revision 204031) --- data-streamer-in.c (working copy) *** along with GCC; see the file COPYING3. *** 25,30 --- 25,31 #include coretypes.h #include diagnostic.h #include tree.h + #include gimple.h #include data-streamer.h /* Read a string from the string table in DATA_IN using input block Index: ipa-cp.c === *** ipa-cp.c (revision 204031) --- ipa-cp.c (working copy) *** along with GCC; see the file COPYING3. *** 104,109 --- 104,110 #include system.h #include coretypes.h #include tree.h + #include gimple.h #include target.h #include ipa-prop.h #include bitmap.h Index: tree-streamer.c === *** tree-streamer.c (revision 204031) --- tree-streamer.c (working copy) *** along with GCC; see the file COPYING3. *** 24,29 --- 24,30 #include system.h #include coretypes.h #include tree.h + #include gimple.h #include streamer-hooks.h #include tree-streamer.h Index: lto-compress.c === *** lto-compress.c (revision 204031) --- lto-compress.c (working copy) *** along with GCC; see the file COPYING3. *** 28,33 --- 28,34 #include zlib.h #include coretypes.h #include tree.h + #include gimple.h #include diagnostic-core.h #include langhooks.h #include lto-streamer.h Index: ipa-reference.c === *** ipa-reference.c (revision 204031) --- ipa-reference.c (working copy) *** along with GCC; see the file COPYING3. *** 41,46 --- 41,47 #include coretypes.h #include tm.h #include tree.h + #include gimple.h #include tree-inline.h #include tree-pass.h #include pointer-set.h Index: data-streamer-out.c === *** data-streamer-out.c (revision 204031) --- data-streamer-out.c (working copy) *** along with GCC; see the file COPYING3. *** 24,29 --- 24,30 #include system.h #include coretypes.h #include tree.h + #include gimple.h #include data-streamer.h /* Return index used to reference STRING of LEN characters in the string table Index: lto-cgraph.c === *** lto-cgraph.c (revision 204031) --- lto-cgraph.c (working copy) *** along with GCC; see the file COPYING3. *** 25,30 --- 25,31 #include coretypes.h
Re: [C++ Patch] PR 58888
My question is, why do we need that whole block for massaging VAR_DECL initializers? That all ought to be handled properly by cp_finish_decl. Does removing everything after else if (!VAR_P (value)) gcc_unreachable (); work? Jason
Re: [patch] Remove gimple.h from header files.
On 10/28/13 11:39, Andrew MacLeod wrote: There are currently only 2 .h files which include gimple.h, so removing this indirection is not too difficult. Similar to the tree.h change Diego made, only .c files should include gimple.h. Bootstraps on x86_64-unknown-linux-gnu, regression tests are running. Assuming no issues, OK? OK. Thanks, Jeff
Re: Fix for cris-elf breakage from mudflap removal, take 2
On 10/27/13 08:43, Hans-Peter Nilsson wrote: PRED_NORETURN seems a better match; not that I see anything in the current source that actually treats them differently other than generating them, but my grep-fu may be weak and certainly my crystall-ball-fu is. After testing (no regressions compared to r204080 before the breakage), committed. * config/cris/cris.c (cris_emit_trap_for_misalignment): Replace the removed PRED_MUDFLAP with PRED_NORETURN. Correct file-path in comment. Thanks. SOrry for mucking up cris. I know I'd scanned that code at one time and somehow missed fixing the PRED_MUDFLAP reference. jeff
Re: [PATCH] Final removal of mudflap
On 10/26/13 15:08, Joseph S. Myers wrote: As far as I can see, the commit left empty libmudflap directories around rather than removing them (SVN, unlike git, allows empty directories to be represented in the repository). I'll check out an SVN tree and remove the empty directory. jeff
[patch] Flatten tree-outof-ssa.h
This is even more trivial... Remove the 3 includes from tree-outof-ssa.h and pushed the required ones down to the .c files that include tree-outof-ssa.h. Bootstraps on x86_64-unknown-linux-gnu, regressions are running. Assuming no issues, OK? Andrew * tree-outof-ssa.h: Remove include files. * tree-outof-ssa.c: Add required include files from tree-outof-ssa.h. * expr.c: Likewise. * tree-ssa-coalesce.c: Likewise. * cfgexpand.c: Likewise. * tree-ssa-ter.c: Likewise. Index: tree-outof-ssa.h === *** tree-outof-ssa.h (revision 204135) --- tree-outof-ssa.h (working copy) *** along with GCC; see the file COPYING3. *** 21,29 #ifndef GCC_TREE_OUTOF_SSA_H #define GCC_TREE_OUTOF_SSA_H - #include tree-ssa-live.h - #include tree-ssa-ter.h - #include tree-ssa-coalesce.h /* This structure (of which only a singleton SA exists) is used to pass around information between the outof-SSA functions, cfgexpand --- 21,26 Index: tree-outof-ssa.c === *** tree-outof-ssa.c (revision 204135) --- tree-outof-ssa.c (working copy) *** along with GCC; see the file COPYING3. *** 36,41 --- 36,44 #include tree-ssanames.h #include dumpfile.h #include diagnostic-core.h + #include tree-ssa-live.h + #include tree-ssa-ter.h + #include tree-ssa-coalesce.h #include tree-outof-ssa.h /* FIXME: A lot of code here deals with expanding to RTL. All that code Index: expr.c === *** expr.c (revision 204135) --- expr.c (working copy) *** along with GCC; see the file COPYING3. *** 52,57 --- 52,58 #include timevar.h #include df.h #include diagnostic.h + #include tree-ssa-live.h #include tree-outof-ssa.h #include target-globals.h #include params.h Index: tree-ssa-coalesce.c === *** tree-ssa-coalesce.c (revision 204135) --- tree-ssa-coalesce.c (working copy) *** along with GCC; see the file COPYING3. *** 33,39 #include ssa-iterators.h #include tree-ssanames.h #include hash-table.h ! #include tree-outof-ssa.h #include diagnostic-core.h --- 33,40 #include ssa-iterators.h #include tree-ssanames.h #include hash-table.h ! #include tree-ssa-live.h ! #include tree-ssa-coalesce.h #include diagnostic-core.h Index: cfgexpand.c === *** cfgexpand.c (revision 204135) --- cfgexpand.c (working copy) *** along with GCC; see the file COPYING3. *** 49,54 --- 49,55 #include tree-inline.h #include value-prof.h #include target.h + #include tree-ssa-live.h #include tree-outof-ssa.h #include sbitmap.h #include cfgloop.h Index: tree-ssa-ter.c === *** tree-ssa-ter.c (revision 204135) --- tree-ssa-ter.c (working copy) *** along with GCC; see the file COPYING3. *** 32,37 --- 32,39 #include ssa-iterators.h #include tree-ssanames.h #include dumpfile.h + #include tree-ssa-live.h + #include tree-ssa-ter.h #include tree-outof-ssa.h #include flags.h
Re: [patch] Flatten tree-outof-ssa.h
On 10/28/13 12:24, Andrew MacLeod wrote: This is even more trivial... Remove the 3 includes from tree-outof-ssa.h and pushed the required ones down to the .c files that include tree-outof-ssa.h. Bootstraps on x86_64-unknown-linux-gnu, regressions are running. Assuming no issues, OK? OK. jeff
Re: [PATCH] rewrite stack vectors
On 10/25/13 13:56, Trevor Saunders wrote: On Fri, Oct 25, 2013 at 03:30:47PM -0400, Diego Novillo wrote: On 2013-10-10 14:07 , tsaund...@mozilla.com wrote: This makes the implementation of stack vectors simpler and easier to use. This works by making the size of the on stack storage a template argument, so the size is embedded in the type. This allows you to implicitly convert a stack_vecT, N to a vecT, va_heap *, and it will just work. Because there's no need to support stack vectors in unions we can make them be a more normal c++ class with a constructor and destructor that are nontrivial. Thanks. This looks much simpler, indeed. The patch is fine to commit. Just a couple of observations/questions: I don't have commit access, so can someone check it in for me? I bootstrapped and got no changes in regression tests two weeks ago, but haven't checked it since if that helps. Just doing a quick sanity build given things change quite a bit in a couple weeks.Once the sanity build completes, I'll check it in. Thanks, Jeff
[PATCH, testsuite/rs6000] Adjust two VMX tests for little endian
Hi, These two test cases require source changes when compiled on a little endian target. Verified on powerpc64{,le}-unknown-linux-gnu. Ok to commit? Thanks, Bill 2013-10-28 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant. * gcc.dg/vmx/eg-5.c: Likewise. Index: gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c === --- gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c(revision 203979) +++ gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c(working copy) @@ -13,12 +13,27 @@ #define DO_INLINE __attribute__ ((always_inline)) #define DONT_INLINE __attribute__ ((noinline)) +#ifdef __LITTLE_ENDIAN__ +static inline DO_INLINE int inline_me(vector signed short data) +{ + union {vector signed short v; signed short s[8];} u; + signed short x; + unsigned char x1, x2; + + u.v = data; + x = u.s[7]; + x1 = (x 8) 0xff; + x2 = x 0xff; + return ((x2 8) | x1); +} +#else static inline DO_INLINE int inline_me(vector signed short data) { union {vector signed short v; signed short s[8];} u; u.v = data; return u.s[7]; } +#endif static DONT_INLINE int foo(vector signed short data) { Index: gcc/testsuite/gcc.dg/vmx/eg-5.c === --- gcc/testsuite/gcc.dg/vmx/eg-5.c (revision 203979) +++ gcc/testsuite/gcc.dg/vmx/eg-5.c (working copy) @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector float c1, vect /* Set result to a vector of f32 0's */ vector float result = ((vector float){0.,0.,0.,0.}); +#ifdef __LITTLE_ENDIAN__ + result = vec_madd (c0, vec_splat (v, 3), result); + result = vec_madd (c1, vec_splat (v, 2), result); + result = vec_madd (c2, vec_splat (v, 1), result); + result = vec_madd (c3, vec_splat (v, 0), result); +#else result = vec_madd (c0, vec_splat (v, 0), result); result = vec_madd (c1, vec_splat (v, 1), result); result = vec_madd (c2, vec_splat (v, 2), result); result = vec_madd (c3, vec_splat (v, 3), result); +#endif return result; }
Re: [PATCH] Final removal of mudflap
On Oct 28, 2013, at 11:24 AM, Jeff Law l...@redhat.com wrote: On 10/26/13 15:08, Joseph S. Myers wrote: As far as I can see, the commit left empty libmudflap directories around rather than removing them (SVN, unlike git, allows empty directories to be represented in the repository). I'll check out an SVN tree and remove the empty directory. Looks like mike stump has already did this. Thanks, Andrew jeff
[patch] Mostly remove tree-core.h from other .h files.
I noticed there were a few .h files which we including tree-core.h themselves. expr.h was one of them, and many of the rtl files used that to get at the basic tree structure for various bits. I moved the include of tree-core.h to rtl.h instead. That way all the rtl files get it, and when we flatten the rtl.h includes into the c files, it'll be reduced to just the ones that need it. Eventually, those things that need it from rtl.h won't even need tree-code.h, they'll get converted to the new gimple interface instead. streamer-hooks.h was including tree-core.h as well, but really only streamer-hooks.c needed location_t from input.h, so I included that instead from there. Now only tree.h and rtl.h are including tree-core.h, which is more appropriate. bootstrapping and regressions running on x86_64-unknown-linux-gnu.. assuming no issues, OK? Andrew * rtl.h: Add tree-core.h to include list. * expr.h: Remove tree-core.h from include list. * gimple.h: Likewise. * ipa-prop.h: Likewise. * ipa-utils.h: Likewise. * lto-streamer.h: Likewise. * streamer-hooks.h: Likewise. * streamer-hooks.c: Include input.h. *** HEADER/rtl.h 2013-10-28 11:55:32.021437788 -0400 --- rtl.h 2013-10-28 14:49:39.749635187 -0400 *** along with GCC; see the file COPYING3. *** 29,34 --- 29,35 #include alias.h #include hashtab.h #include flags.h + #include tree-core.h /* Value used by some passes to recognize noop moves as valid instructions. */ *** HEADER/expr.h 2013-10-28 11:55:32.005437796 -0400 --- expr.h 2013-10-28 14:46:25.069932215 -0400 *** along with GCC; see the file COPYING3. *** 26,34 #include rtl.h /* For optimize_size */ #include flags.h - /* For host_integerp, tree_low_cst, fold_convert, size_binop, ssize_int, -TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ - #include tree-core.h /* For GET_MODE_BITSIZE, word_mode */ #include machmode.h --- 26,31 *** HEADER/gimple.h 2013-10-28 11:55:32.009437794 -0400 --- gimple.h 2013-10-28 15:01:27.285050961 -0400 *** along with GCC; see the file COPYING3. *** 27,33 #include vec.h #include ggc.h #include basic-block.h - #include tree-core.h #include tree-ssa-alias.h #include internal-fn.h #include gimple-fold.h --- 27,32 *** HEADER/ipa-prop.h 2013-10-28 11:55:32.013437792 -0400 --- ipa-prop.h 2013-10-28 15:03:36.512726491 -0400 *** along with GCC; see the file COPYING3. *** 20,26 #ifndef IPA_PROP_H #define IPA_PROP_H - #include tree-core.h #include vec.h #include cgraph.h #include alloc-pool.h --- 20,25 *** HEADER/ipa-utils.h 2013-10-28 11:55:32.014437792 -0400 --- ipa-utils.h 2013-10-28 15:04:02.304664416 -0400 *** along with GCC; see the file COPYING3. *** 20,26 #ifndef GCC_IPA_UTILS_H #define GCC_IPA_UTILS_H - #include tree-core.h #include cgraph.h struct ipa_dfs_info { --- 20,25 *** HEADER/lto-streamer.h 2013-10-28 11:55:32.017437790 -0400 --- lto-streamer.h 2013-10-28 15:04:30.626597222 -0400 *** along with GCC; see the file COPYING3. *** 25,31 #include plugin-api.h #include hash-table.h - #include tree-core.h #include target.h #include cgraph.h #include vec.h --- 25,30 *** HEADER/streamer-hooks.h 2013-10-28 11:55:32.023437787 -0400 --- streamer-hooks.h 2013-10-28 15:30:31.876866273 -0400 *** along with GCC; see the file COPYING3. *** 23,30 #ifndef GCC_STREAMER_HOOKS_H #define GCC_STREAMER_HOOKS_H - #include tree-core.h - /* Forward declarations to avoid including unnecessary headers. */ struct output_block; struct lto_input_block; --- 23,28 *** HEADER/streamer-hooks.c 2013-10-28 11:55:32.023437787 -0400 --- streamer-hooks.c 2013-10-28 15:30:42.437856247 -0400 *** along with GCC; see the file COPYING3. *** 23,28 --- 23,29 #include config.h #include system.h #include coretypes.h + #include input.h #include streamer-hooks.h /* Streamer hooks. */
Re: [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst.
On 10/28/2013 07:53 AM, Kirill Yukhin wrote: This patch introduces mask_scalar_merge subst. Is it ok to commit to main trunk? Ok. r~
Re: [PATCH] rewrite stack vectors
On 10/25/13 13:56, Trevor Saunders wrote: On Fri, Oct 25, 2013 at 03:30:47PM -0400, Diego Novillo wrote: On 2013-10-10 14:07 , tsaund...@mozilla.com wrote: This makes the implementation of stack vectors simpler and easier to use. This works by making the size of the on stack storage a template argument, so the size is embedded in the type. This allows you to implicitly convert a stack_vecT, N to a vecT, va_heap *, and it will just work. Because there's no need to support stack vectors in unions we can make them be a more normal c++ class with a constructor and destructor that are nontrivial. Thanks. This looks much simpler, indeed. The patch is fine to commit. Just a couple of observations/questions: I don't have commit access, so can someone check it in for me? I bootstrapped and got no changes in regression tests two weeks ago, but haven't checked it since if that helps. Bootstrap went fine (as expected). Installed on your behalf. Thanks! jeff
Re: [patch] Mostly remove tree-core.h from other .h files.
On 10/28/13 13:38, Andrew MacLeod wrote: I noticed there were a few .h files which we including tree-core.h themselves. expr.h was one of them, and many of the rtl files used that to get at the basic tree structure for various bits. I moved the include of tree-core.h to rtl.h instead. That way all the rtl files get it, and when we flatten the rtl.h includes into the c files, it'll be reduced to just the ones that need it. Eventually, those things that need it from rtl.h won't even need tree-code.h, they'll get converted to the new gimple interface instead. streamer-hooks.h was including tree-core.h as well, but really only streamer-hooks.c needed location_t from input.h, so I included that instead from there. Now only tree.h and rtl.h are including tree-core.h, which is more appropriate. bootstrapping and regressions running on x86_64-unknown-linux-gnu.. assuming no issues, OK? OK. jeff
Re: [PATCH] Final removal of mudflap
On Oct 28, 2013, at 11:24 AM, Jeff Law l...@redhat.com wrote: On 10/26/13 15:08, Joseph S. Myers wrote: As far as I can see, the commit left empty libmudflap directories around rather than removing them (SVN, unlike git, allows empty directories to be represented in the repository). I'll check out an SVN tree and remove the empty directory. No time like the present, I removed it. I did a bootstrap on linux to try and ensure no fallout.
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
Ping? On Oct 18, 2013, at 10:03 AM, Mike Stump mikest...@comcast.net wrote: On Oct 18, 2013, at 4:24 AM, Richard Biener richard.guent...@gmail.com wrote: I agree. Btw, the implementation defined precision of PDI on SH-5 definitely looks interesting, but I wonder how you can perform implementation defined arithmetic on that PDI mode then ;) Easy, perform it in the maximal size supported… What else blocks this patch? Nothing. We have heard from everyone, and all the issues found by testing were on msp, and those were exclusively due to port issues that I believe now have been resolved. The below patch has the change to 22 bits for PSI on SH. Ok? Index: gcc/config/msp430/msp430-modes.def === --- gcc/config/msp430/msp430-modes.def (revision 202634) +++ gcc/config/msp430/msp430-modes.def (working copy) @@ -1,3 +1,3 @@ /* 20-bit address */ -PARTIAL_INT_MODE (SI); +PARTIAL_INT_MODE_NAME (SI, 20, PSI); Index: gcc/config/bfin/bfin-modes.def === --- gcc/config/bfin/bfin-modes.def (revision 202634) +++ gcc/config/bfin/bfin-modes.def (working copy) @@ -19,7 +19,7 @@ http://www.gnu.org/licenses/. */ /* PDImode for the 40-bit accumulators. */ -PARTIAL_INT_MODE (DI); +PARTIAL_INT_MODE_NAME (DI, 40, PDI); /* Two of those - covering both accumulators for vector multiplications. */ VECTOR_MODE (INT, PDI, 2); Index: gcc/config/m32c/m32c-modes.def === --- gcc/config/m32c/m32c-modes.def (revision 202634) +++ gcc/config/m32c/m32c-modes.def (working copy) @@ -22,7 +22,7 @@ /*INT_MODE (PI, 3);*/ /* 24-bit pointers, in 32-bit units */ -PARTIAL_INT_MODE (SI); +PARTIAL_INT_MODE_NAME (SI, 24, PSI); /* 48-bit MULEX result */ /* INT_MODE (MI, 6); */ Index: gcc/config/rs6000/rs6000-modes.def === --- gcc/config/rs6000/rs6000-modes.def (revision 202634) +++ gcc/config/rs6000/rs6000-modes.def (working copy) @@ -45,4 +45,4 @@ VECTOR_MODES (FLOAT, 32); /* V /* Replacement for TImode that only is allowed in GPRs. We also use PTImode for quad memory atomic operations to force getting an even/odd register combination. */ -PARTIAL_INT_MODE (TI); +PARTIAL_INT_MODE_NAME (TI, 128, PTI); Index: gcc/config/sh/sh-modes.def === --- gcc/config/sh/sh-modes.def (revision 202634) +++ gcc/config/sh/sh-modes.def (working copy) @@ -18,9 +18,9 @@ along with GCC; see the file COPYING3. http://www.gnu.org/licenses/. */ /* The SH uses a partial integer mode to represent the FPSCR register. */ -PARTIAL_INT_MODE (SI); +PARTIAL_INT_MODE_NAME (SI, 22, PSI); /* PDI mode is used to represent a function address in a target register. */ -PARTIAL_INT_MODE (DI); +PARTIAL_INT_MODE_NAME (DI, 64, PDI); /* Vector modes. */ VECTOR_MODE (INT, QI, 2);/* V2QI */ Index: gcc/genmodes.c === --- gcc/genmodes.c (revision 202634) +++ gcc/genmodes.c (working copy) @@ -629,10 +629,14 @@ reset_float_format (const char *name, co m-format = format; } -/* Partial integer modes are specified by relation to a full integer mode. - For now, we do not attempt to narrow down their bit sizes. */ -#define PARTIAL_INT_MODE(M) \ - make_partial_integer_mode (#M, P #M, -1U, __FILE__, __LINE__) +/* Partial integer modes are specified by relation to a full integer + mode. */ +#define PARTIAL_INT_MODE(M,PREC) \ + make_partial_integer_mode (#M, P #PREC #M, PREC, __FILE__, __LINE__) +/* Partial integer modes are specified by relation to a full integer + mode. */ +#define PARTIAL_INT_MODE_NAME(M,PREC,NAME) \ + make_partial_integer_mode (#M, #NAME, PREC, __FILE__, __LINE__) static void ATTRIBUTE_UNUSED make_partial_integer_mode (const char *base, const char *name, unsigned int precision, @@ -669,7 +673,7 @@ make_vector_mode (enum mode_class bclass struct mode_data *v; enum mode_class vclass = vector_class (bclass); struct mode_data *component = find_mode (base); - char namebuf[8]; + char namebuf[16]; if (vclass == MODE_RANDOM) return; @@ -917,7 +921,7 @@ enum machine_mode\n{); end will try to use it for bitfields in structures and the like, which we do not want. Only the target md file should generate BImode widgets. */ - if (first first-precision == 1) + if (first first-precision == 1 c == MODE_INT) first = first-next; if (first last) @@ -1187,7 +1191,7 @@ emit_class_narrowest_mode (void) /* Bleah, all this to get the comment right for MIN_MODE_INT. */ tagged_printf (MIN_%s,
Re: Commit: MSP430: Pass -md on to assembler
Ping? On Sep 27, 2013, at 11:21 AM, Mike Stump mikest...@comcast.net wrote: On Sep 27, 2013, at 1:48 AM, nick clifton ni...@redhat.com wrote: OK by me, although I cannot approve that particular patch. I know, the intent is for someone that can, to approve it. But I ran into a very strange problem. With your PARTIAL_INT_MODE_NAME patch applied GCC started erroneously eliminating NULL function pointer checks! This was particularly noticeable in libgcc/crtstuff.c where for example: Index: stor-layout.c === --- stor-layout.c (revision 202634) +++ stor-layout.c (working copy) @@ -2821,7 +2821,7 @@ get_mode_bounds (enum machine_mode mode, enum machine_mode target_mode, rtx *mmin, rtx *mmax) { - unsigned size = GET_MODE_BITSIZE (mode); + unsigned size = GET_MODE_PRECISION (mode); unsigned HOST_WIDE_INT min_val, max_val; gcc_assert (size = HOST_BITS_PER_WIDE_INT); fixes this problem. The problem is that we treat the maximum value of PSImode as -1, and then later we do: case EQ: /* x == y is always false for y out of range. */ =if (val mmin || val mmax) B return const0_rtx; break; and the answer to the question is 0 -1, is no, so the entire test is eliminated as never able to be true. After the fix, in your case, we get: (gdb) p mmin $72 = -524288 (gdb) p mmax $73 = 524287 and the test becomes if (0 -524288 || 0 524287), which is not true, then the test isn't eliminated as never true. Here, we see the test that protects this code uses GET_MODE_PRECISION: (gdb) macro expand HWI_COMPUTABLE_MODE_P (PSImode) expands to: enum mode_class) mode_class[PSImode]) == MODE_INT || ((enum mode_class) mode_class[PSIm\ ode]) == MODE_PARTIAL_INT) mode_precision[PSImode] = (8 * 8)) so, clearly, someone was thinking about GET_MODE_PRECISION being in range, not GET_MODE_BITSIZE. Ok? [ for the rtl people ]
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
Who needs to approve this? THe target maintainers, or some global maintainer?
Re: Apply attribute returns_nonnull in libiberty
On Fri, 11 Oct 2013, Marc Glisse wrote: Hello, here are some uses for returns_nonnull in libiberty. Bootstrap+testsuite (default languages) on x86_64-unknown-linux-gnu. 2013-10-11 Marc Glisse marc.gli...@inria.fr PR tree-optimization/58689 include/ * ansidecl.h (ATTRIBUTE_RETURNS_NONNULL): New macro. * libiberty.h (basename, lbasename, dos_lbasename, unix_lbasename, concat_copy): Mark with attributes nonnull(1) and returns_nonnull. (concat, reconcat, concat_copy2, choose_temp_base, xstrerror, xmalloc, xrealloc, xcalloc, xstrdup, xstrndup, xmemdup, pex_init): Mark with attribute returns_nonnull. libiberty/ * concat.c: Remove note about xmalloc. Hello, I cross-posted this message: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00986.html to the various libiberty users, and the only replies I got were: ok for gcc gdb doesn't mind (Tom Tromey) makes sense (Mike Frysinger) is the original patch thus ok? http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00818.html -- Marc Glisse
Re: Apply attribute returns_nonnull in libiberty
Sure, go for it.
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On 10/22/13 10:22, Iyer, Balaji V wrote: Hi Jeff, I have attached 2 patches - 1 for C and 1 for C++ - along with the changelogs (ChangeLog.cilkplus for C and common changes, cp-ChangeLog.cilkplus for C++ specific files) with the changes you have requested. Answers to your questions are given below also. It passes all its tests and doesn't affect any other existing tests (i.e. by affect I mean fail a passing test or pass a failing test). A note. cilk-common seems to be a mix of tree and rtl bits. Those may need to be broken into separate files and/or moved around as part of the reorganization work that's occurring in GCC right now. So if Andrew or someone else pings you, please respond appropriately. The C (and shared) parts of this patch are OK. Jason needs to review the C++ front-end changes. If possible, I recommend installing the C (and shared) bits as well as the runtime component if they can build work w/o the C++ front-end changes, then track the C++ front-end changes separately. Jeff
Re: [PATCH, testsuite/rs6000] Adjust two VMX tests for little endian
On Mon, Oct 28, 2013 at 3:17 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, These two test cases require source changes when compiled on a little endian target. Verified on powerpc64{,le}-unknown-linux-gnu. Ok to commit? Thanks, Bill 2013-10-28 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant. * gcc.dg/vmx/eg-5.c: Likewise. Okay. Thanks, David
Re: [PATCH, MPX, 2/X] Pointers Checker [1/25] Hooks
On 10/28/13 09:12, Ilya Enkovich wrote: I fixed documentation for bounds load/store hooks. Also added documentation for list of builtins returned by builtin_chkp_function. Thanks, Ilya -- gcc/ 2013-10-28 Ilya Enkovich ilya.enkov...@intel.com * target.def (builtin_chkp_function): New. (chkp_bound_type): New. (chkp_bound_mode): New. (fn_abi_va_list_bounds_size): New. (load_bounds_for_arg): New. (store_bounds_for_arg): New. * targhooks.h (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode): New. (default_builtin_chkp_function): New. * targhooks.c (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode); New. (default_builtin_chkp_function): New. * doc/tm.texi.in (TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE): New. (TARGET_LOAD_BOUNDS_FOR_ARG): New. (TARGET_STORE_BOUNDS_FOR_ARG): New. (TARGET_BUILTIN_CHKP_FUNCTION): New. (TARGET_CHKP_BOUND_TYPE): New. (TARGET_CHKP_BOUND_MODE): New. * doc/tm.texi: Regenerated. * langhooks.h (lang_hooks): Add chkp_supported field. * langhooks-def.h (LANG_HOOKS_CHKP_SUPPORTED): New. (LANG_HOOKS_INITIALIZER); Add LANG_HOOKS_CHKP_SUPPORTED. This is fine. Please install after a bootstrap test to ensure everything builds correctly. Thanks, Jeff
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault correct predicated scalar algorithms. I example. In fact, with some hacky tricks, you can fully predicate normal C code in the SIMD registers. One might want to have such region as fast as possible while staying scalar: (all vars are integers): if ( a[i] ) b[i] += c[i]; Definetely to have max performace we want to have the region fully predicated. This code cannot be predicated correctly in IA pre-AVX-512: vmovd a[i], %xmm0 vptestm %zmm0, %zmm0, %k1 // hack because we didn’t have masking for VPMOVD/Q vmovss b[i], %xmm0 {%k1}{z} // no scalar int add, hack, 128-bit works fine // because mask is sawed off in the right places vpaddd c[i], %zmm0, %zmm0 {%k1}{z} // vmpmovd/w hack again vmovss %xmm0, b[i] {%k1} So, having such masked scalar insns allows us to have non-branching scalar code. II Example. Perhaps one interesting case of scalar and mask, though not to do with predication (and really narrow), is as an idiom to generate a write mask value of 0x1: vcmpss k1, xmm0, xmm0, {sae}, 0xf Currently we do: mov 1, %eax kmovw %eax, %k1 But we sometimes have to spill a GPR in order to introduce this sequence. In that case the vcmpss idiom seems like a better choice (though you might worry about the additional false dependency on xmm0). And finally. It’s kind of strange not to have complete ISA support. What if someone would want to have equivalent code for the vector version and the scalar remainder version? -- Thanks, K
RE: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jeff Law Sent: Monday, October 28, 2013 4:24 PM To: Iyer, Balaji V; r...@redhat.com; Jason Merrill (ja...@redhat.com); Aldy Hernandez (al...@redhat.com) Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++) On 10/22/13 10:22, Iyer, Balaji V wrote: Hi Jeff, I have attached 2 patches - 1 for C and 1 for C++ - along with the changelogs (ChangeLog.cilkplus for C and common changes, cp-ChangeLog.cilkplus for C++ specific files) with the changes you have requested. Answers to your questions are given below also. It passes all its tests and doesn't affect any other existing tests (i.e. by affect I mean fail a passing test or pass a failing test). A note. cilk-common seems to be a mix of tree and rtl bits. Those may need to be broken into separate files and/or moved around as part of the reorganization work that's occurring in GCC right now. So if Andrew or someone else pings you, please respond appropriately. The C (and shared) parts of this patch are OK. Jason needs to review the C++ front-end changes. If possible, I recommend installing the C (and shared) bits as well as the runtime component if they can build work w/o the C++ front-end changes, then track the C++ front-end changes separately. Thanks! I will extract and check in the Cilk_spawn and _Cilk_sync for C work. -Balaji V. Iyer. Jeff
Re: free is a killer
On 10/26/13 01:15, Marc Glisse wrote: Hello, this patch teaches gcc that free kills the memory its argument points to. The equality test is probably too strict, I guess we can loosen it later (unless you have suggestions?). Note that the corresponding code for BUILT_IN_MEMCPY and others seems suspicious to me, it looks like it is testing for equality between a pointer and a mem_ref, which is unlikely to happen. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2013-10-27 Marc Glisse marc.gli...@inria.fr PR tree-optimization/19831 gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE. gcc/testsuite/ * gcc.dg/tree-ssa/alias-25.c: New file. OK for the trunk. I agree the MEM_REF* and VA_END cases look strange and I think they're wrong as well. Did you happen to try fixing them and running the testsuite to see if anything fails? ISTM your testcase could be tweaked to test conclusively if they're doing the right thing -- and regardless of what's found that test can/should make its way into the testsuite. Jeff
Re: [PATCH] make gengtype more robust against user error
On 10/25/13 15:37, David Malcolm wrote: This patch addresses various forms of failure described in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01974.html It adds a default: gcc_unreachable(); to the autogenerated switch() statement in the routines for a base class, converting various kinds of tag errors from leading to silent lack-of-field traversal into giving run-time assertion failures (addressing (F) and (G)) It also issues an error within gengtype itself for a missing desc (failure B), turning this into an error message from gengtype. I found another potential failure mode: (H) If you had: class GTY((desc(%0.kind), tag(0))) foo { public: int kind; tree p; }; class GTY((tag(1))) bar : public foo { public: tree q; }; static GTY(()) foo *dummy_foo; and there are no explicit pointers to bar in the code, gengtype treated it as unused, and silently omitted the case for it from foo's marking routine. I've updated set_gc_type_used so that it propagates usage down into subclasses (which recurses), fixing this issue. To do this efficiently we need to track subclasses in within gengtype, so the patch also adds that, and uses it to eliminate an O(N^2). Note that for error (G), if a class within the hierarchy omits a GTY marker, gengtype doesn't parse it at all, and so doesn't parse the inheritance information - so we can't issue a warning about this. However, the lack of a tag will trigger a run-time assertion failure due to hitting the default: gcc_unreachable(); in the switch. The patch also adds a paragraph to the docs, spelling out the need for evary class in such a hierarchy to have a GTY marker. I believe this addresses all of the silent-lack-of-field-traversal issues, converting them to gengtype errors or runtime assertions. It also adds a handler for (E), turning this from a failure to compile bogus C to a specific error in gengtype. I'm bootstrapping/regtesting now. OK for trunk if that passes? gcc/ * doc/gty.texi (Inheritance and GTY): Make it clear that to use autogenerated markers for a class-hierarchy, every class must have a GTY marker. * gengtype.h (struct type): Add linked list of subclasses to the s member of the union. (add_subclass): New decl. * gengtype-state.c (read_state_struct_type): Set up subclass linked list. * gengtype.c (get_ultimate_base_class): New. (add_subclass): New. (new_structure): Set up subclass linked list. (set_gc_used_type): Propagate usage information to subclasses. (output_mangled_typename): Use get_ultimate_base_class. (walk_subclasses): Use the subclass linked list, avoiding an O(N^2) when writing out all types. (walk_type): Issue an error if the base class is missing a tag, rather than generating bogus C code. Add a gcc_unreachable default case, in case people omit tags from concrete subclasses, or get the values wrong. (write_func_for_structure): Issue an error for subclasses for which the base doesn't have a desc, since otherwise the autogenerated routines for the base would silently fail to visit any subclass fields. (write_root): Use get_ultimate_base_class, tweaking constness of tp to match that function's signature. Thanks for diving into this stuff and getting it fixed. OK for the trunk, Jeff
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is ICE in LTO streamer when it reads instrumented code. The second one is unitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes ICE beacause instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when checker testsuite has 100% pass rate with lto. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kidn of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. Ilya
[SH] PR 54236 - add some more addc patterns
Hello, This adds some more patterns to utilize the SH addc instruction. Tested on rev 204111 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2a/-mb,-m2a-single/-mb,-m4/-ml,-m4-single/-ml,-m4/-mb,-m4-single/-mb} and no new failures. OK for trunk? Cheers, Oleg gcc/ChangeLog: PR target/54236 * config/sh/sh.md (*addc): Add more addc related insn and splits. * config/sh/sh.c (addsubcosts): Handle some addc special cases. testsuite/ChangeLog: PR target/54236 * gcc.target/sh/pr54236-2: New. * gcc.target/sh/pr54089-6: Add another rotl special case. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 204098) +++ gcc/config/sh/sh.md (working copy) @@ -1910,6 +1910,126 @@ (match_dup 1))) (clobber (reg:SI T_REG))])]) +;; Use shlr-addc to do 'reg + (reg 1)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (and:SI (match_operand:SI 1 arith_reg_operand) + (const_int 1)) + (match_operand:SI 2 arith_reg_operand))) + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) (plus:SI (reg:SI T_REG) (match_dup 2))) + (clobber (reg:SI T_REG))])] +{ + emit_insn (gen_shlr (gen_reg_rtx (SImode), operands[1])); +}) + +;; Use shlr-addc to do 'reg + reg + (reg 1)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (plus:SI (and:SI (match_operand:SI 1 arith_reg_operand) + (const_int 1)) + (match_operand:SI 2 arith_reg_operand)) + (match_operand:SI 3 arith_reg_operand))) + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) (plus:SI (plus:SI (match_dup 2) (match_dup 3)) + (reg:SI T_REG))) + (clobber (reg:SI T_REG))])] +{ + emit_insn (gen_shlr (gen_reg_rtx (SImode), operands[1])); +}) + +;; Canonicalize 'reg + (reg 1) + reg' into 'reg + reg + (reg 1)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (and:SI (match_operand:SI 1 arith_reg_operand) + (const_int 1)) + (plus:SI (match_operand:SI 2 arith_reg_operand) + (match_operand:SI 3 arith_reg_operand + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) + (plus:SI (plus:SI (and:SI (match_dup 1) (const_int 1)) + (match_dup 2)) + (match_dup 3))) + (clobber (reg:SI T_REG))])]) + +;; Canonicalize '2 * reg + (reg 1)' into 'reg + reg + (reg 1)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (and:SI (match_operand:SI 1 arith_reg_operand) + (const_int 1)) + (mult:SI (match_operand:SI 2 arith_reg_operand) + (const_int 2 + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) + (plus:SI (plus:SI (and:SI (match_dup 1) (const_int 1)) + (match_dup 2)) + (match_dup 2))) + (clobber (reg:SI T_REG))])]) + +;; Use shll-addc to do 'reg + ((unsigned int)reg 31)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (lshiftrt:SI (match_operand:SI 1 arith_reg_operand) + (const_int 31)) + (match_operand:SI 2 arith_reg_operand))) + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) (plus:SI (reg:SI T_REG) (match_dup 2))) + (clobber (reg:SI T_REG))])] +{ + emit_insn (gen_shll (gen_reg_rtx (SImode), operands[1])); +}) + +;; Use shll-addc to do 'reg + reg + ((unsigned int)reg 31)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (plus:SI (lshiftrt:SI (match_operand:SI 1 arith_reg_operand) + (const_int 31)) + (match_operand:SI 2 arith_reg_operand)) + (match_operand:SI 3 arith_reg_operand))) + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) (plus:SI (plus:SI (match_dup 2) (match_dup 3)) + (reg:SI T_REG))) + (clobber (reg:SI T_REG))])] +{ + emit_insn (gen_shll (gen_reg_rtx (SImode), operands[1])); +}) + +;; Canonicalize '2 * reg + ((unsigned int)reg 31)' +;; into 'reg + reg + (reg 1)'. +(define_insn_and_split *addc + [(set (match_operand:SI 0 arith_reg_dest) + (plus:SI (mult:SI (match_operand:SI 1 arith_reg_operand) + (const_int 2)) + (lshiftrt:SI (match_operand:SI 2 arith_reg_operand) + (const_int 31 + (clobber (reg:SI T_REG))] + TARGET_SH1 + # + can_create_pseudo_p () + [(parallel [(set (match_dup 0) + (plus:SI (plus:SI (lshiftrt:SI (match_dup 2) (const_int 31)) + (match_dup 1)) + (match_dup 1))) + (clobber (reg:SI T_REG))])]) + (define_expand addsi3 [(set (match_operand:SI 0 arith_reg_operand ) (plus:SI (match_operand:SI 1 arith_operand ) Index: gcc/config/sh/sh.c
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
On 10/25/13 02:39, Eric Botcazou wrote: Thanks. Installed on the trunk. Well, no, that will be problematic for some architectures. The history of this piece of code is complicated and it's admittedly lacking a comment, but the purpose of the block is clear enough: op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_SUM); op0 = memory_address_addr_space (address_mode, op0, as); if (!integer_zerop (TREE_OPERAND (exp, 1))) { rtx off = immed_double_int_const (mem_ref_offset (exp), address_mode); op0 = simplify_gen_binary (PLUS, address_mode, op0, off); } op0 = memory_address_addr_space (mode, op0, as); The offset computation is done in address_mode and then, only at the end, converted to mode. My reading of memory_address_addr_space is that MODE is the mode of the memory reference, not the mode of the address. I fail to see how passing in the mode of the address in the first call can be correct. What am I missing? jeff
[PATCH] Fix comment typo
Just something I noticed while reviewing a patch. Committed. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e525748..013fe73 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2013-10-28 Jeff Law l...@redhat.com + + * lower-subreg.c (resolve_simple_move): Fix comment typo. + 2013-10-28 Trevor Saunders tsaund...@mozilla.com * df-scan.c (df_collection_rec): Adjust. diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 8ff5fc3..ebf364f 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1070,7 +1070,7 @@ resolve_simple_move (rtx set, rtx insn) emit_insn_before (insns, insn); - /* If we get here via self-recutsion, then INSN is not yet in the insns + /* If we get here via self-recursion, then INSN is not yet in the insns chain and delete_insn will fail. We only want to remove INSN from the current sequence. See PR56738. */ if (in_sequence_p ())
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
On 10/28/2013 01:58 PM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault correct predicated scalar algorithms. Using VEC_MERGE isn't the proper representation for that. If that's your real goal, then COND_EXEC is the only way to let rtl know that faults are suppressed in the false condition. r~
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
Jason Merrill wrote: On 10/25/2013 04:46 PM, Tobias Burnus wrote: +do_range_for_auto_deduction (range_decl, range_expr); // FIXME:IVDEP I think in this situation let's set a flag on the RANGE_FOR_STMT so we can do ivdep handling at instantiation time. I am not completely sure whether you had the following in mind, but that's what I have now implemented: DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4) has now a 5th operator (RANGE_FOR_IVDEP), which has the value boolean_true_node - or NULL_TREE (via begin_range_for_stmt). Additionally, I moved the ivdep annotation to semantics.c as one otherwise might get a cleanup_point_expr around the annotate expression, which the current algorithm in tree-cfg.c's replace_loop_annotate doesn't handle. Unfortunately, the GCC ivdep annotation doesn't help with pr33426-ivdep-4.cc: It still versions the loop with an alias check. But that seems to be a middle-end problem as loop-safelen gets set. I did an all-language bootstrap + regtesting on x86-64-gnu-linux. Is the patch now OK? Tobias 2013-10-29 Tobias Burnus bur...@net-b.de gcc/cp/ PR other/33426 * cp-tree.def (RANGE_FOR_STMT): Add RANGE_FOR_IVDEP flag. * cp-tree.h (RANGE_FOR_IVDEP): Define. (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Take 'bool ivdep' parameter. * cp-array-notation.c (create_an_loop): Update call. * init.c (build_vec_init): Ditto. * pt.c (tsubst_expr): Ditto. * parser.c (cp_parser_iteration_statement, cp_parser_for, cp_parser_range_for, cp_convert_range_for): Update calls. (cp_parser_pragma): Accept GCC ivdep for 'while' and 'do'. * semantics.c (finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Optionally build ivdep annotation. (begin_range_for_stmt): Adapt for RANGE_FOR_STMT changes. gcc/testsuite/ PR other/33426 * g++.dg/vect/pr33426-ivdep-2.cc: New. * g++.dg/vect/pr33426-ivdep-3.cc: New. * g++.dg/vect/pr33426-ivdep-4.cc: New. gcc/ PR other/33426 * gcc/tree-cfg.c (replace_loop_annotate): Replace warning by warning_at. diff --git a/gcc/cp/cp-array-notation.c b/gcc/cp/cp-array-notation.c index c700f58..e1fb0ee 100644 --- a/gcc/cp/cp-array-notation.c +++ b/gcc/cp/cp-array-notation.c @@ -71,7 +71,7 @@ create_an_loop (tree init, tree cond, tree incr, tree body) finish_expr_stmt (init); for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE); finish_for_init_stmt (for_stmt); - finish_for_cond (cond, for_stmt); + finish_for_cond (cond, for_stmt, false); finish_for_expr (incr, for_stmt); finish_expr_stmt (body); finish_for_stmt (for_stmt); diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def index bd9bfa8..93355b5 100644 --- a/gcc/cp/cp-tree.def +++ b/gcc/cp/cp-tree.def @@ -299,9 +299,9 @@ DEFTREECODE (IF_STMT, if_stmt, tcc_statement, 4) DEFTREECODE (FOR_STMT, for_stmt, tcc_statement, 5) /* Used to represent a range-based `for' statement. The operands are - RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, and RANGE_FOR_SCOPE, - respectively. Only used in templates. */ -DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4) + RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE, + and RANGE_FOR_IVDEP, respectively. Only used in templates. */ +DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 5) /* Used to represent a 'while' statement. The operands are WHILE_COND and WHILE_BODY, respectively. */ diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 507b389..0d3bc70 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -4088,6 +4088,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter) #define RANGE_FOR_EXPR(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 1) #define RANGE_FOR_BODY(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 2) #define RANGE_FOR_SCOPE(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 3) +#define RANGE_FOR_IVDEP(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4) #define SWITCH_STMT_COND(NODE) TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 0) #define SWITCH_STMT_BODY(NODE) TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 1) @@ -4321,7 +4322,7 @@ extern int comparing_specializations; sizeof can be nested. */ extern int cp_unevaluated_operand; -extern tree cp_convert_range_for (tree, tree, tree); +extern tree cp_convert_range_for (tree, tree, tree, bool); extern bool parsing_nsdmi (void); /* in pt.c */ @@ -5671,16 +5672,16 @@ extern void begin_else_clause (tree); extern void finish_else_clause (tree); extern void finish_if_stmt (tree); extern tree begin_while_stmt (void); -extern void finish_while_stmt_cond (tree, tree); +extern void finish_while_stmt_cond (tree, tree, bool); extern void finish_while_stmt (tree); extern tree begin_do_stmt (void); extern void finish_do_body (tree); -extern void finish_do_stmt (tree, tree); +extern void finish_do_stmt (tree, tree, bool); extern tree finish_return_stmt (tree); extern tree begin_for_scope (tree *); extern
Re: free is a killer
On Mon, 28 Oct 2013, Jeff Law wrote: On 10/26/13 01:15, Marc Glisse wrote: Hello, this patch teaches gcc that free kills the memory its argument points to. The equality test is probably too strict, I guess we can loosen it later (unless you have suggestions?). Note that the corresponding code for BUILT_IN_MEMCPY and others seems suspicious to me, it looks like it is testing for equality between a pointer and a mem_ref, which is unlikely to happen. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2013-10-27 Marc Glisse marc.gli...@inria.fr PR tree-optimization/19831 gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE. gcc/testsuite/ * gcc.dg/tree-ssa/alias-25.c: New file. OK for the trunk. Thanks. I agree the MEM_REF* and VA_END cases look strange and I think they're wrong as well. Did you happen to try fixing them and running the testsuite to see if anything fails? ISTM your testcase could be tweaked to test conclusively if they're doing the right thing -- and regardless of what's found that test can/should make its way into the testsuite. I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy). While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p-i and q-b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... --- gcc/tree-ssa-alias.c(revision 204095) +++ gcc/tree-ssa-alias.c(working copy) @@ -1099,22 +1099,24 @@ indirect_refs_may_alias_p (tree ref1 ATT /* If both references are through the same type, they do not alias if the accesses do not overlap. This does extra disambiguation for mixed/pointer accesses but requires strict aliasing. */ if ((TREE_CODE (base1) != TARGET_MEM_REF || (!TMR_INDEX (base1) !TMR_INDEX2 (base1))) (TREE_CODE (base2) != TARGET_MEM_REF || (!TMR_INDEX (base2) !TMR_INDEX2 (base2))) same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (ptrtype1)) == 1 same_type_for_tbaa (TREE_TYPE (base2), TREE_TYPE (ptrtype2)) == 1 - same_type_for_tbaa (TREE_TYPE (ptrtype1), -TREE_TYPE (ptrtype2)) == 1) + (same_type_for_tbaa (TREE_TYPE (ptrtype1), + TREE_TYPE (ptrtype2)) == 1 + || same_type_for_tbaa (TREE_TYPE (TREE_TYPE (ptr1)), +TREE_TYPE (TREE_TYPE (ptr2))) == 1)) return ranges_overlap_p (offset1, max_size1, offset2, max_size2); /* Do type-based disambiguation. */ if (base1_alias_set != base2_alias_set !alias_sets_conflict_p (base1_alias_set, base2_alias_set)) return false; /* Do access-path based disambiguation. */ if (ref1 ref2 (handled_component_p (ref1) || handled_component_p (ref2)) In the category ugly hack that I won't submit, let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). -- Marc GlisseIndex: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204089) +++ gcc/tree-ssa-alias.c(working copy) @@ -1985,23 +1985,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2064,37 +2065,45 @@
Re: [PATCH] Do not append *INTERNAL* to the decl name
ping... Thanks, Dehao On Fri, Oct 18, 2013 at 11:06 AM, Dehao Chen de...@google.com wrote: On Fri, Oct 18, 2013 at 10:39 AM, Jason Merrill ja...@redhat.com wrote: On 10/11/2013 01:59 PM, Dehao Chen wrote: It's hard to get a testcase without http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=201856 because none of these *INTERNAL* symbols will be emitted in debug info. Why does that change cause one of these symbols to be emitted? As Cary says, that was done as an assertion. If you're invalidating the assertion, I need more explanation as to why. Sorry, pointed to the incorrect patch, it should be http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=201858 These symbols will not be emitted in the symbol table in assembly, but they will be emitted in the debug_info after http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=201858, which emits function's linkage name in debug_info even when it's abstract. The *INTERNAL* assertion I removed is an over-assertion that not only assert for symbol table, but also for debug info. Thanks, Dehao Jason
[Patch, fortran] PRs 57893 and 58858
Dear All, This patch addresses issues arising from PR57893. It is entirely obvious. Bootstraps and regtests on FC17/x86_64 - OK for trunk? Cheers Paul PS I wash my hands of all attempts to use BT_HOLLERITH types! 2013-10-29 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * trans-types.c (gfc_typenode_for_spec): Add typenode for BT_HOLLERITH. Note that the length is incorrect but unusable. PR fortran 58858 * target-memory.c (gfc_element_size): Add element sizes for BT_VOID and BT_ASSUMED, using gfc_typenode_for_spec. 2013-10-29 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * gfortran.dg/unlimited_polymorphic_13.f90 : Use real variables to determine sizes of real kinds. PR fortran 58858 * gfortran.dg/unlimited_polymorphic_14.f90 : New test. Index: gcc/fortran/trans-types.c === *** gcc/fortran/trans-types.c (revision 204135) --- gcc/fortran/trans-types.c (working copy) *** gfc_typenode_for_spec (gfc_typespec * sp *** 1099,1104 --- 1099,1110 basetype = gfc_get_character_type (spec-kind, spec-u.cl); break; + case BT_HOLLERITH: + /* Since this cannot be used, return a length one character. */ + basetype = gfc_get_character_type_len (gfc_default_character_kind, + gfc_index_one_node); + break; + case BT_DERIVED: case BT_CLASS: basetype = gfc_get_derived_type (spec-u.derived); Index: gcc/fortran/target-memory.c === *** gcc/fortran/target-memory.c (revision 204135) --- gcc/fortran/target-memory.c (working copy) *** gfc_element_size (gfc_expr *e) *** 109,114 --- 109,116 return e-representation.length; case BT_DERIVED: case BT_CLASS: + case BT_VOID: + case BT_ASSUMED: { /* Determine type size without clobbering the typespec for ISO C binding types. */ Index: gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 === *** gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (revision 204135) --- gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (working copy) *** module m *** 13,18 --- 13,22 integer, parameter :: c2 = real_kinds(2) integer, parameter :: c3 = real_kinds(size(real_kinds)-1) integer, parameter :: c4 = real_kinds(size(real_kinds)) + real(c1) :: r1 + real(c2) :: r2 + real(c3) :: r3 + real(c4) :: r4 contains subroutine s(o, k) class(*) :: o *** contains *** 21,31 select case (k) case (4) ! sz = 32*2 case (8) ! sz = 64*2 ! case (10,16) ! sz = 128*2 case default call abort() end select --- 25,37 select case (k) case (4) ! sz = storage_size(r1)*2 case (8) ! sz = storage_size(r2)*2 ! case (10) ! sz = storage_size(r3)*2 ! case (16) ! sz = storage_size(r4)*2 case default call abort() end select *** contains *** 36,43 if (storage_size(o) /= sz) call abort() type is (complex(c2)) if (storage_size(o) /= sz) call abort() - end select - select type (o) type is (complex(c3)) if (storage_size(o) /= sz) call abort() type is (complex(c4)) --- 42,47 Index: gcc/testsuite/gfortran.dg/unlimited_polymorphic_14.f90 === *** gcc/testsuite/gfortran.dg/unlimited_polymorphic_14.f90 (revision 0) --- gcc/testsuite/gfortran.dg/unlimited_polymorphic_14.f90 (working copy) *** *** 0 --- 1,26 + ! { dg-do run } + ! + ! Uncovered in fixing PR fortran/58793 + ! + ! Contributed by Tobias Burnus bur...@gcc.gnu.org + ! + ! Barfed on the hollerith argument + ! + program test + logical l + call up(abc, l) + if (l) call abort + call up(3habc, l) ! { dg-warning Legacy Extension } + if (.not. l) call abort + contains + subroutine up(x, l) + class(*) :: x + logical l + select type(x) + type is (character(*)) + l = .false. + class default + l = .true. + end select + end subroutine + end program test
[PATCH] libstdc++ testsuite cxxflags
This patch addresses two issues with the libstdc++ testsuite: * duplicate -g -O2 CXXFLAGS * missing -g -O2 for remote targets The duplicate -g -O2 flags is a result of testsuite_flags.in using build-time CXXFLAGS and proc libstdc++_init using the environmental CXXFLAGS, which defaults to its build-time value. This patch prevents testsuite_flags.in from using build-time CXXFLAGS. Certain remote targets require a minimum optimization level -O1 in order to pass several atomics built-in function tests. This patch ensures cxxflags contains -g -O2 at minimum when no other optimization flags are specified. The testsuite used to set those flags prior to Benjamin's patch to remove duplicate cxxflags here http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01572.html. Is this OK for trunk? If so, please apply. Thanks, Cesar 2013-10-28 Cesar Philippidis ce...@codesourcery.com libstdc++-v3/ * scripts/testsuite_flags.in (cxxflags): Don't use build-time CXXFLAGS and EXTRA_CXX_FLAGS. * testsuite/lib/libstdc++.exp (libstdc++_init): Ensure, at minimum, cxxflags contains -g -O2. diff --git a/libstdc++-v3/scripts/testsuite_flags.in b/libstdc++-v3/scripts/testsuite_flags.in index d7710ca..35b36e7 100755 --- a/libstdc++-v3/scripts/testsuite_flags.in +++ b/libstdc++-v3/scripts/testsuite_flags.in @@ -55,7 +55,7 @@ case ${query} in ;; --cxxflags) CXXFLAGS_default=-D_GLIBCXX_ASSERT -fmessage-length=0 - CXXFLAGS_config=@SECTION_FLAGS@ @CXXFLAGS@ @EXTRA_CXX_FLAGS@ + CXXFLAGS_config=@SECTION_FLAGS@ echo ${CXXFLAGS_default} ${CXXFLAGS_config} ;; --cxxparallelflags) diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp index 51ff6dd..68dcb15 100644 --- a/libstdc++-v3/testsuite/lib/libstdc++.exp +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp @@ -265,6 +265,15 @@ proc libstdc++_init { testfile } { } append cxxflags append cxxflags [getenv CXXFLAGS] + +if {$cxxflags == -D_GLIBCXX_ASSERT -fmessage-length=0 } { + append cxxflags -g +} + +if ![regexp \-O $cxxflags] { + append cxxflags -O2 +} + v3track cxxflags 2 # Always use MO files built by this test harness.
[GOOGLE] Don't update the callee count if caller is not resolved node
This patch changes to no update callee count if caller count is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes. Original update method will lead to multi-update of the callee. Bootstrapped and testing on going. OK for google-4_8 if test is OK? Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile L_IPO_COMP_MODE + cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e-caller; + if (caller-global.inlined_to) + caller = caller-global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller-symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e-callee, e-callee-symbol.decl, e-count, e-frequency, update_original, vNULL, true);
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
On Oct 28, 2013, at 1:30 PM, DJ Delorie d...@redhat.com wrote: Who needs to approve this? THe target maintainers, or some global maintainer? So Richard said he'd review/approve it… He's been busy clearing out a backlog or email/patches and reviewing wide-int stuff… when next he comes up for air, I suspect he'll approve it. I've got all the review for the target bits, mainly verifying partial sizes, and trying out builds.
Re: [PATCH] libstdc++ testsuite cxxflags
On Oct 28, 2013, at 3:29 PM, Cesar Philippidis ce...@codesourcery.com wrote: This patch addresses two issues with the libstdc++ testsuite: * duplicate -g -O2 CXXFLAGS * missing -g -O2 for remote targets I need to defer this to the libstdc++ people… there is enough of them, and they do such a nice job…
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 25/10/13 19:04, Kyrill Tkachov wrote: On 24/10/13 20:03, Kugan wrote: Hi Kyrill, It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm --with-arch=armv5te --with-float=soft Ah ok, I can reproduce it now. So, while I agree that we add a scan for vbit and vbif to these testcases, there seems to be something dodgy going on with the register allocation. With -march=armv5te I'm getting the following snippet of code in the ltgt case: .L12: ldr r4, [ip] ldr r5, [ip, #4] ldr r6, [ip, #8] ldr r7, [ip, #12] vmovd20, r4, r5 @ v4sf vmovd21, r6, r7 vcgt.f32q8, q10, q9 vcgt.f32q10, q9, q10 vorrq8, q8, q10 vmovd22, r4, r5 @ v4sf vmovd23, r6, r7 vbitq11, q9, q8 vmovr4, r5, d22 @ v4sf vmovr6, r7, d23 The second vcgt.f32 trashes q10, then recreates it in q11 with: vmovd22, r4, r5 @ v4sf vmovd23, r6, r7 so it can do the vbit. Surely there's something better that can be done? In contrast, with -march=armv7-a we get: .L12: vld1.32 {q9}, [r4]! vcgt.f32q8, q9, q10 vcgt.f32q11, q10, q9 vorrq8, q8, q11 vbslq8, q10, q9 vst1.32 {q8}, [lr]! This is because of the unaligned access done for armv7-a. arm.c has the following comment: /* Enable -munaligned-access by default for - all ARMv6 architecture-based processors - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors. - ARMv8 architecture-base processors. Disable -munaligned-access by default for - all pre-ARMv6 architecture-based processors - ARMv6-M architecture-based processors. */ Please look at the rtl difference. - is armv7-a + is armv5te ;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59]; -(insn 71 70 72 (set (reg:V4SF 192) -(unspec:V4SF [ -(mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S16 A32]) -] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1 +(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0) +(mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4) +(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) +(const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4 S4 A32])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8) +(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) +(const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8 S4 A32])) neon-vcond-ltgt.c:12 -1 (nil)) -(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ]) -(reg:V4SF 192)) neon-vcond-ltgt.c:12 -1 +(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12) +(mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) +(const_int 12 [0xc])) [0 MEM[(float *)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1 (nil))
patch to fix a LRA crash on ppc
The following patch is a new version of a fix for LRA crash on 32-bit ppc SPEC2006 gamess. The previous version broke mode-switching optimization on x86/x86-64. The patch was successfully bootstrapped and tested on x86/x86-64. Committed as rev. 204140. 2013-10-28 Vladimir Makarov vmaka...@redhat.com * lra-spills.c (lra_final_code_change): Remove useless move insns originated from moves of pseudos. Index: lra-spills.c === --- lra-spills.c(revision 204131) +++ lra-spills.c(working copy) @@ -625,7 +625,7 @@ lra_final_code_change (void) { int i, hard_regno; basic_block bb; - rtx insn, curr; + rtx insn, curr, set; int max_regno = max_reg_num (); for (i = FIRST_PSEUDO_REGISTER; i max_regno; i++) @@ -636,6 +636,7 @@ lra_final_code_change (void) FOR_BB_INSNS_SAFE (bb, insn, curr) if (INSN_P (insn)) { + bool change_p; rtx pat = PATTERN (insn); if (GET_CODE (pat) == CLOBBER LRA_TEMP_CLOBBER_P (pat)) @@ -648,6 +649,12 @@ lra_final_code_change (void) continue; } + set = single_set (insn); + change_p = (set != NULL + REG_P (SET_SRC (set)) REG_P (SET_DEST (set)) + REGNO (SET_SRC (set)) = FIRST_PSEUDO_REGISTER + REGNO (SET_DEST (set)) = FIRST_PSEUDO_REGISTER); + lra_insn_recog_data_t id = lra_get_insn_recog_data (insn); struct lra_static_insn_data *static_id = id-insn_static_data; bool insn_change_p = false; @@ -661,5 +668,20 @@ lra_final_code_change (void) } if (insn_change_p) lra_update_operator_dups (id); + + if (change_p REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))) + { + /* Remove an useless move insn but only involving +pseudos as some subsequent optimizations are based on +that move insns involving originally hard registers +are preserved. IRA can generate move insns involving +pseudos. It is better remove them earlier to speed +up compiler a bit. It is also better to do it here +as they might not pass final RTL check in LRA, +(e.g. insn moving a control register into +itself). */ + lra_invalidate_insn_data (insn); + delete_insn (insn); + } } }
Re: [PATCH v2 3/4] Handle simple inheritance in gengtype.
On Tue, 2013-10-15 at 12:58 -0600, Jeff Law wrote: [...] This is OK. Thanks; committed to trunk as r204146. (I was holding off until approval of the robustification followup, which you've now approved, as [1]). Dave [1] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02330.html
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 28 October 2013 02:55, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski pins...@gmail.com wrote: On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: On 18 October 2013 00:58, Jeff Law l...@redhat.com wrote: On 10/17/13 05:03, Richard Biener wrote: Is it OK for trunk? I had a much simpler change which did basically the same from 4.7 (I can update it if people think this is a better approach). I like that more (note you can now use is_gimple_condexpr as predicate for force_gimple_operand). The obvious question is whether or not Andrew's simpler change picks up as many transformations as Zhenqiang's change. If not are the things missed important. Zhenqiang, can you do some testing of your change vs Andrew P.'s change? Here is a rough compare: 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included in my patch). Root cause is that it does not skip LABEL. The guard to do this opt should be the same the bb_has_overhead_p in my patch. This should be an easy change, I am working on this right now. 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not efficient for ||. e.g. For ssa-ifcombine-ccmp-6.c, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) = 0; _10 = ~_6; _11 = _9 _10; if (_11 == 0) With my patch, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) 0; _10 = _6 | _9; if (_10 != 0) As mentioned otherwise, this seems like a missed optimization inside forwprop. When I originally wrote this code there used to be two cases one for and one for |, but this was removed sometime and I just made the code evolve with that. Actually I can make a small patch (3 lines) to my current patch which causes tree-ssa-ifcombine.c to produce the behavior of your patch. if (result_inv) { t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); result_inv = false; } I will submit a new patch after some testing of my current patch which fixes items 1 and 2. Here is my latest patch which adds the testcases from Zhenqiang's patch and fixes item 1 and 2. The patch works fine for me. Bootstrap and no make check regression on ARM Chromebook. Thanks! -Zhenqiang OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyways. (tree_ssa_ifcombine): Inverse the order of the basic block walk, increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent conditional move to be used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. Thanks, Andrew Pinski 3) The good thing of Andrew P.'s change is that Inverse the order of the basic block walk so it can do combine recursively. But I think we need some heuristic to control the number of ifs. Move too much compares from the inner_bb to outer_bb is not good. I think this depends on the target. For MIPS we don't want an upper bound as integer comparisons go directly to GPRs. I wrote this patch with MIPS in mind as that was the target I was working on. 4) Another good thing of Andrew P.'s change is that it reuses some existing functions. So it looks much simple. Thanks, Andrew Pinski With that we should be able to kill the fold-const.c transform? That would certainly be nice and an excellent follow-up for Zhenqiang. That's my final goal to kill the fold-const.c transform. I think we may combine the two changes to make a simple and good patch. Thanks! -Zhenqiang
Limited support for inheritance in gengtype now in trunk (was Re: [PATCH] make gengtype more robust against user error)
On Mon, 2013-10-28 at 15:19 -0600, Jeff Law wrote: On 10/25/13 15:37, David Malcolm wrote: This patch addresses various forms of failure described in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01974.html [...] Thanks for diving into this stuff and getting it fixed. OK for the trunk, Thanks; it passed bootstrapregtesting on x86_64-unknown-linux, so I've committed it to trunk as r204148, completing (I hope) the addition of support in gengtype for some forms of simple class hierarchies. I believe this may also complete the prerequisites for my conversion of the symtable types into a C++ class hierarchy (in that IIRC all of those patches were approved, but were pending this gengtype feature)... but I'll have to see to what extent they've bitrotted (and recheck the bootstrap/regtest, naturally). Dave
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
As there are some issues with abs() type conversions, I removed the related content from the patch but only kept the SSE2 support for abs(int). For the define_expand I added as below, the else body is there to avoid fall-through transformations to ABS operation in optabs.c. Otherwise ABS will be converted to other operations even that we have corresponding instructions from SSSE3. (define_expand absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))] TARGET_SSE2 { if (!TARGET_SSSE3) ix86_expand_sse2_abs (operands[0], force_reg (MODEmode, operands[1])); else emit_insn (gen_rtx_SET (VOIDmode, operands[0], gen_rtx_ABS (MODEmode, operands[1]))); DONE; }) The patch is attached here. Please give me your comments. thanks, Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function. + * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X (W-1)) ^ X) - ((signed) X (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn. */ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..b85ded4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn absmode2 +(define_insn *absmode2 [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v) (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))] @@ -8733,6 +8733,20 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) +(define_expand absmode2 + [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand) + (abs:VI124_AVX2_48_AVX512F + (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))] + TARGET_SSE2 +{ + if (!TARGET_SSSE3) +ix86_expand_sse2_abs (operands[0], force_reg (MODEmode, operands[1])); + else +emit_insn (gen_rtx_SET (VOIDmode, operands[0], +gen_rtx_ABS (MODEmode, operands[1]))); + DONE; +}) + (define_insn absmode2 [(set (match_operand:MMXMODEI 0 register_operand =y) (abs:MMXMODEI diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..cf5b942 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2013-10-22 Cong Hou co...@google.com + + PR
RE: [PATCH 1/n] Add conditional compare support
On Tue, 22 Oct 2013, Zhenqiang Chen wrote: ChangeLog: 2013-10-22 Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/arm.c (arm_fixed_condition_code_regs, arm_ccmode_to_code, arm_select_dominance_ccmp_mode): New functions. (arm_select_dominance_cc_mode_1): New function extracted from arm_select_dominance_cc_mode. (arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1. * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior, ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc, ccmp_and_scc_scc_cmp): New. * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): New. * expr.c (ccmp_candidate_p, used_in_cond_stmt_p, expand_ccmp_expr_2, expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): New. (expand_expr_real_1): Handle ccmp. * optabs.c: Include gimple.h. (expand_ccmp_op): New. (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.def (ccmp): New. * optabs.h (expand_ccmp_op): New. * doc/md.texi (ccmp): New index. One thing I don't see other people mentioning, is that this patch has just too much code inside #ifdef HAVE_ccmp ... #endif. I couldn't actually find the part that *requires* that, i.e. code that does something like gen_ccmp (...) but maybe it's there. Where needed and where the conditioned code is more than a few lines, such code is preferably transformed into if (0):d code using constructs like that at builtins.c:5354: #ifndef HAVE_atomic_clear # define HAVE_atomic_clear 0 # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif Right, this causes dead code, but for maintenance it's *much* better than when only a fraction of the code being compiled for other targets. (Also, the dead code may be eliminated by gcc.) Unfortunately the number of examples (as above) are few compared to the pages of #if HAVE_thisorthat'd code. :( (And IMHO that whole construct should be the default implementation and shouldn't have to be written manually in the first place. But that's material for an invasive patch.) brgds, H-P
Re: [SH] PR 54236 - add some more addc patterns
Oleg Endo oleg.e...@t-online.de wrote: This adds some more patterns to utilize the SH addc instruction. Tested on rev 204111 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2a/-mb,-m2a-single/-mb,-m4/-ml,-m4-single/-ml,-m4/-mb,-m4-single/-mb} and no new failures. OK for trunk? Using same *addc for 7 different insn_and_split looks a bit confusing. It might be better to add some sufficies like _r_1, _r_r_1, _r_1_r, _2r_1, _r_31, _r_r_31, _2r_31. Other than that, the patch is OK. Regards, kaz
[PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On Fri, 2013-09-20 at 17:33 +0200, Jan Hubicka wrote: This patch is the handwritten part of the conversion of these types to C++; it requires the followup patch, which is autogenerated. It converts: struct symtab_node_base to: class symtab_node_base and converts: struct cgraph_node to: struct cgraph_node : public symtab_node_base and: struct varpool_node to: class varpool_node : public symtab_node_base dropping the symtab_node_def union. Yep, incrementally we should continue with the grand renaming retiring symtab_node_base and getting things symmetric. * cgraph.h (symtab_node_base): Convert to a class; add GTY((desc (%h.type), tag (SYMTAB_SYMBOL))). (cgraph_node): Inherit from symtab_node; add GTY option tag (SYMTAB_FUNCTION). (varpool_node): Inherit from symtab_node; add GTY option tag (SYMTAB_VARIABLE). (symtab_node_def): Remove. (is_a_helper cgraph_node::test (symtab_node_def *)): Convert to... (is_a_helper cgraph_node::test (symtab_node_base *)): ...this. (is_a_helper varpool_node::test (symtab_node_def *)): Convert to... (is_a_helper varpool_node::test (symtab_node_base *)): ...this. * ipa-ref.h (symtab_node_def): Drop. (symtab_node): Change underlying type from symtab_node_def to symtab_node_base. (const_symtab_node): Likwise. * is-a.h: Update examples in comment. * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base. (assembler_name_hash): Likewise. This patch is OK. Thanks for working on this! These symtab changes were dependent on having gengtype support for inheritance, which is now in trunk, so I'm now revisiting these patches. The above patch hasn't bitrotted, though the autogenerated one that goes with it needed regenerating. A new version of the autogenerated patch can be seen at: http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch Is that new patch OK? (it's 450KB so one's eyes tend to glaze over after a while, but FWIW you approved an earlier version of that in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html and the test suite for the script that generated the patch can be seen at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py ) Also, I noticed when reviewing the autogenerated marking routines in gtype-desc.c that the symtab_node_def union had chain_next/chain_prev markings, and that my patch above effectively dropped them. The attached patch reinstates them, albeit to the symtab_node_base base class, and inspection of the generated gtype-desc.c shows that the generated code does the right thing. Is this change also OK for trunk? (I'm still bootstrapping/regtesting the patch series now, so the above is all conditional on that, so my apologies if this turns out to be premature). Dave From f070f1ecc2baa536d121f2942bd8af750dbc2f02 Mon Sep 17 00:00:00 2001 From: David Malcolm dmalc...@redhat.com Date: Mon, 28 Oct 2013 22:25:38 -0400 Subject: [PATCH 2/3] Add chain_next and chain_prev options back to symtab_node_base gcc/ * cgraph.h (symtab_node_base): Add missing chain_next and chain_prev GTY options. --- gcc/cgraph.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 5a582d8..35f79ec 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -39,7 +39,7 @@ enum symtab_type /* Base of all entries in the symbol table. The symtab_node is inherited by cgraph and varpol nodes. */ -class GTY((desc (%h.type), tag (SYMTAB_SYMBOL))) symtab_node_base +class GTY((desc (%h.type), tag (SYMTAB_SYMBOL), chain_next (%h.next), chain_prev (%h.previous))) symtab_node_base { public: /* Type of the symbol. */ -- 1.7.11.7
Re: [PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On Mon, 2013-10-28 at 22:57 -0400, David Malcolm wrote: On Fri, 2013-09-20 at 17:33 +0200, Jan Hubicka wrote: This patch is the handwritten part of the conversion of these types to C++; it requires the followup patch, which is autogenerated. It converts: struct symtab_node_base to: class symtab_node_base and converts: struct cgraph_node to: struct cgraph_node : public symtab_node_base and: struct varpool_node to: class varpool_node : public symtab_node_base dropping the symtab_node_def union. Yep, incrementally we should continue with the grand renaming retiring symtab_node_base and getting things symmetric. * cgraph.h (symtab_node_base): Convert to a class; add GTY((desc (%h.type), tag (SYMTAB_SYMBOL))). (cgraph_node): Inherit from symtab_node; add GTY option tag (SYMTAB_FUNCTION). (varpool_node): Inherit from symtab_node; add GTY option tag (SYMTAB_VARIABLE). (symtab_node_def): Remove. (is_a_helper cgraph_node::test (symtab_node_def *)): Convert to... (is_a_helper cgraph_node::test (symtab_node_base *)): ...this. (is_a_helper varpool_node::test (symtab_node_def *)): Convert to... (is_a_helper varpool_node::test (symtab_node_base *)): ...this. * ipa-ref.h (symtab_node_def): Drop. (symtab_node): Change underlying type from symtab_node_def to symtab_node_base. (const_symtab_node): Likwise. * is-a.h: Update examples in comment. * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base. (assembler_name_hash): Likewise. This patch is OK. Thanks for working on this! These symtab changes were dependent on having gengtype support for inheritance, which is now in trunk, so I'm now revisiting these patches. The above patch hasn't bitrotted, though the autogenerated one that goes with it needed regenerating. A new version of the autogenerated patch can be seen at: http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch Is that new patch OK? (it's 450KB so one's eyes tend to glaze over after a while, but FWIW you approved an earlier version of that in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html and the test suite for the script that generated the patch can be seen at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py ) Also, I noticed when reviewing the autogenerated marking routines in gtype-desc.c that the symtab_node_def union had chain_next/chain_prev markings, and that my patch above effectively dropped them. The attached patch reinstates them, albeit to the symtab_node_base base class, and inspection of the generated gtype-desc.c shows that the generated code does the right thing. ...and here's a revised version one that avoids overlong lines. commit 739276306f1d8f5d0b1202da21cc29b5fcac102e Author: David Malcolm dmalc...@redhat.com Date: Mon Oct 28 22:25:38 2013 -0400 Add chain_next and chain_prev options back to symtab_node_base gcc/ * cgraph.h (symtab_node_base): Add missing chain_next and chain_prev GTY options. diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 5a582d8..4cc2049 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -39,7 +39,9 @@ enum symtab_type /* Base of all entries in the symbol table. The symtab_node is inherited by cgraph and varpol nodes. */ -class GTY((desc (%h.type), tag (SYMTAB_SYMBOL))) symtab_node_base +class GTY((desc (%h.type), tag (SYMTAB_SYMBOL), + chain_next (%h.next), chain_prev (%h.previous))) + symtab_node_base { public: /* Type of the symbol. */
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
On 10/28/2013 03:20 AM, Bernd Edlinger wrote: On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote: I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing. [snip] I have attached an update to your patch, that should a) fix the recursion problem. b) restrict the -fstrict-volatile-bitfields to not violate the C++ memory model. Thanks for picking up the ball on this -- I've been busy with other tasks for several days. I again tried backporting the patch series along with your fix to GCC 4.8 and tested on arm-none-eabi. I found that it was still getting stuck in infinite recursion unless the test from this patch hunk @@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned { enum machine_mode mode1; + /* Handle extraction of unaligned fields, + this can happen in -fstrict-volatile-bitfields. */ + if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0 + GET_MODE_BITSIZE (GET_MODE (str_rtx)) GET_MODE_BITSIZE (word_mode) + bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize + GET_MODE_BITSIZE (GET_MODE (str_rtx)) ) +str_rtx = adjust_address (str_rtx, word_mode, 0); + /* Handle -fstrict-volatile-bitfields in the cases where it applies. */ if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) 0) mode1 = GET_MODE (str_rtx); was also inserted into store_bit_field. I also think that, minimally, this needs to be rewritten as something like if (MEM_P (str_rtx) STRICT_ALIGNMENT GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0 GET_MODE_BITSIZE (GET_MODE (str_rtx)) GET_MODE_BITSIZE (word_mode) (bitnum % MEM_ALIGN (str_rtx) + bitsize GET_MODE_BITSIZE (GET_MODE (str_rtx str_rtx = adjust_address (str_rtx, word_mode, 0); Otherwise, we'll end up using word_mode instead of the field mode on targets that support unaligned accesses. I tested that (again, 4.8 and arm-none-eabi) and results looked good, but of course ARM isn't one of the targets that supports unaligned accesses. :-S I'm still not convinced this is the right fix, though. It seems to me that callers of store_bit_field and extract_bit_field in expr.c ought not to be messing with the mode of the rtx when flag_strict_volatile_bitfields 0, because that is losing information about the original mode that we are trying to patch up here by forcing it to word_mode. Instead, I think the callers ought to be passing the declared field mode into store_bit_field and extract_bit_field, and those functions ought to change the mode of the incoming rtx to match the field mode only if strict_volatile_bitfield_p assures us that the insertion/extraction can be done in that mode. Alternatively, if strict_volatile_bitfield_p returns false but flag_strict_volatile_bitfields 0, then always force to word_mode and change the -fstrict-volatile-bitfields documentation to indicate that's the fallback if the insertion/extraction cannot be done in the declared mode, rather than claiming that it tries to do the same thing as if -fstrict-volatile-bitfields were not enabled at all. Either way, still needs more work, and more thorough testing (more targets, and obviously trunk as well as the 4.8 backport) before I'd consider this ready to commit. I might or might not be able to find some more time to work on this in the next week -Sandra
Re: [GOOGLE] Don't update the callee count if caller is not resolved node
Is it sufficient to check if the final caller is defined in primary module? Note that in some cases, doing profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen de...@google.com wrote: This patch changes to no update callee count if caller count is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes. Original update method will lead to multi-update of the callee. Bootstrapped and testing on going. OK for google-4_8 if test is OK? Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile L_IPO_COMP_MODE + cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e-caller; + if (caller-global.inlined_to) + caller = caller-global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller-symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e-callee, e-callee-symbol.decl, e-count, e-frequency, update_original, vNULL, true);
[PATCH][RFA] Improvements to infer_nonnull_range
Based on a suggestion from Marc, I want to use infer_nonnull_range in the erroneous path isolation optimization. In the process of doing that, I found a few deficiencies in infer_nonnull_range that need to be addressed. First, infer_nonnull_range_p doesn't infer ranges from a GIMPLE_RETURN if the current function is marked as returns_nonnull. That's fixed in the relatively obvious way. Second, infer_nonnull_range_p, when presented with an arglist where the non-null attribute applies to all pointer arguments, it won't bother to determine if OP is actually one of the arguments :( It just assumes that OP is in the argument list. Opps. Third, I want to be able to call infer_nonnull_range with OP being 0B. That lets me use infer_nonnull_range to look for explicit null pointer dereferences, explicit uses of null in a return statement in functions that can't return non-null and explicit uses of null arguments when those arguments can't be null. Sadly, to make that work we need to use operand_equal_p rather than simple pointer comparisons to see if OP shows up in STMT. Finally, for detecting explicit null pointers, infer_nonnull_range calls count_ptr_derefs. count_ptr_derefs counts things in two ways. One with a FOR_EACH_SSA_TREE_OPERAND, then again with simple walks of the tree structures. Not surprisingly if we're looking for an explicit 0B, then the loop over the operands finds nothing, but walking the tree structures does. And the checking assert triggers. This change removes the assert and instead sets *num_uses_p to a sane value. I don't have testcases for this stuff that are independent of the erroneous path isolation optimization. However, each is triggered by tests I'll include in the the erroneous path isolation patch. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? jeff * tree-ssa.c (count_ptr_derefs): Use operand_equal_p rather than testing pointer equality. Remove invalid assert. Clamp minimum value for *num_uses_p at *num_loads_p + *num_stores_p. * tree-vrp.c (infer_nonnull_range): Verify OP is in the argument list and the argument corresponding to OP is a pointer type. Use operand_equal_p rather than pointer equality when testing if OP is on the nonnull list. Handle GIMPLE_RETURN statements too. diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c index 0b743d1..d29665b 100644 --- a/gcc/tree-ssa.c +++ b/gcc/tree-ssa.c @@ -265,7 +265,8 @@ count_ptr_derefs (tree *tp, int *walk_subtrees, void *data) return NULL_TREE; } - if (TREE_CODE (*tp) == MEM_REF TREE_OPERAND (*tp, 0) == count_p-ptr) + if (TREE_CODE (*tp) == MEM_REF + operand_equal_p (TREE_OPERAND (*tp, 0), count_p-ptr, 0)) { if (wi_p-is_lhs) count_p-num_stores++; @@ -326,7 +327,11 @@ count_uses_and_derefs (tree ptr, gimple stmt, unsigned *num_uses_p, *num_loads_p = count.num_loads; } - gcc_assert (*num_uses_p = *num_loads_p + *num_stores_p); + /* This can happen if PTR refers to something other than an SSA_NAME. + In that case we will find no uses as the FOR_EACH_SSA_TREE_OPERAND + only iterates over the SSA_NAMEs in a statement. */ + if (*num_uses_p *num_loads_p + *num_stores_p) +*num_uses_p = *num_loads_p + *num_stores_p; } diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index d3a07f3..bbcb34c 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4509,21 +4509,38 @@ infer_nonnull_range (gimple stmt, tree op) return false; /* If nonnull applies to all the arguments, then ARG -is non-null. */ +is non-null if it's in the argument list. */ if (TREE_VALUE (attrs) == NULL_TREE) - return true; + { + for (unsigned int i = 0; i gimple_call_num_args (stmt); i++) + { + if (operand_equal_p (op, gimple_call_arg (stmt, i), 0) + POINTER_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i + return true; + } + return false; + } /* Now see if op appears in the nonnull list. */ for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t)) { int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1; tree arg = gimple_call_arg (stmt, idx); - if (op == arg) + if (operand_equal_p (op, arg, 0)) return true; } } } + /* If this function is marked as returning non-null, then we can + infer OP is non-null if it is used in the return statement. */ + if (gimple_code (stmt) == GIMPLE_RETURN + gimple_return_retval (stmt) + operand_equal_p (gimple_return_retval (stmt), op, 0) + lookup_attribute (returns_nonnull, + TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl +return true; + return
Re: free is a killer
On 10/28/13 16:05, Marc Glisse wrote: I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy Actually, it's fairly easy to see with memset. Something like this: /* { dg-do compile } */ /* { dg-options -O1 -fdump-tree-optimized } */ void f (long *p) { *p = 42; p[4] = 42; __builtin_memset (p, 0, 100); } /* { dg-final { scan-tree-dump-not = 42 optimized } } */ /* { dg-final { cleanup-tree-dump optimized } } */ I'm sure it can be made to work with memcpy as well, it's just harder because you have to ensure the source/dest args don't alias. While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p-i and q-b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... I've never looked at the alias oracle stuff, but ISTM that offsets from the base ought to be enough to disambiguate? In the category ugly hack that I won't submit, let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). Well, one thing that can be done here (I think it comes from Steensgaard and I'm pretty sure it's discussed in Morgan's text) is you assign a storage class and propagate that. Once you do so, you can eliminate certain things. You use a vrp-like engine with a fairly simple meet operator to handle propagation of the storage class. Once you have storage classes for all the pointers, you can make certain deductions about aliasing relationships. Including what things in the malloc storage class can and can not alias. Jeff
Re: [PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On 10/28/13 20:57, David Malcolm wrote: * cgraph.h (symtab_node_base): Convert to a class; add GTY((desc (%h.type), tag (SYMTAB_SYMBOL))). (cgraph_node): Inherit from symtab_node; add GTY option tag (SYMTAB_FUNCTION). (varpool_node): Inherit from symtab_node; add GTY option tag (SYMTAB_VARIABLE). (symtab_node_def): Remove. (is_a_helper cgraph_node::test (symtab_node_def *)): Convert to... (is_a_helper cgraph_node::test (symtab_node_base *)): ...this. (is_a_helper varpool_node::test (symtab_node_def *)): Convert to... (is_a_helper varpool_node::test (symtab_node_base *)): ...this. * ipa-ref.h (symtab_node_def): Drop. (symtab_node): Change underlying type from symtab_node_def to symtab_node_base. (const_symtab_node): Likwise. * is-a.h: Update examples in comment. * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base. (assembler_name_hash): Likewise. This patch is OK. Thanks for working on this! These symtab changes were dependent on having gengtype support for inheritance, which is now in trunk, so I'm now revisiting these patches. The above patch hasn't bitrotted, though the autogenerated one that goes with it needed regenerating. A new version of the autogenerated patch can be seen at: http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch Is that new patch OK? (it's 450KB so one's eyes tend to glaze over after a while, but FWIW you approved an earlier version of that in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html and the test suite for the script that generated the patch can be seen at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py ) Given it's already approved and hasn't changed in any substantial way, it's OK for the trunk once you get through the bootstrap and regression testing. jeff
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 10/27/13 12:55, Andrew Pinski wrote: Here is my latest patch which adds the testcases from Zhenqiang's patch and fixes item 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyways. (tree_ssa_ifcombine): Inverse the order of the basic block walk, increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent conditional move to be used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. I don't be able to look at this in any depth tonight. However, I would ask that you pass along the dump file for ssa-dom-thread-3.c so that I can evaluate the correctness of your change better. Thanks, jeff