[PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store.
Hello,

This patch introduces missing AVX-512PF intrinsics and tests. It also renames the store/load intrinsics according to the EAS.

gcc/
	* config/i386/avx512fintrin.h (_mm512_loadu_si512): Rename.
	(_mm512_storeu_si512): Ditto.
	* config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i32gather_pd):
	New.
	(_mm512_mask_prefetch_i64gather_pd): Ditto.
	(_mm512_prefetch_i32scatter_pd): Ditto.
	(_mm512_mask_prefetch_i32scatter_pd): Ditto.
	(_mm512_prefetch_i64scatter_pd): Ditto.
	(_mm512_mask_prefetch_i64scatter_pd): Ditto.
	(_mm512_mask_prefetch_i32gather_ps): Fix operand type.
	(_mm512_mask_prefetch_i64gather_ps): Ditto.
	(_mm512_prefetch_i32scatter_ps): Ditto.
	(_mm512_mask_prefetch_i32scatter_ps): Ditto.
	(_mm512_prefetch_i64scatter_ps): Ditto.
	(_mm512_mask_prefetch_i64scatter_ps): Ditto.
	* config/i386/i386-builtin-types.def: Define
	VOID_FTYPE_QI_V8SI_PCINT64_INT_INT and
	VOID_FTYPE_QI_V8DI_PCINT64_INT_INT.
	* config/i386/i386.c (ix86_builtins): Define
	IX86_BUILTIN_GATHERPFQPD, IX86_BUILTIN_GATHERPFDPD,
	IX86_BUILTIN_SCATTERPFDPD, IX86_BUILTIN_SCATTERPFQPD.
	(ix86_init_mmx_sse_builtins): Define __builtin_ia32_gatherpfdpd,
	__builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqpd,
	__builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdpd,
	__builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqpd,
	__builtin_ia32_scatterpfqps.
	(ix86_expand_builtin): Expand new built-ins.
	* config/i386/sse.md (avx512pf_gatherpf): Add SF suffix,
	fix memory access data type.
	(*avx512pf_gatherpf_mask): Ditto.
	(*avx512pf_gatherpf): Ditto.
	(avx512pf_scatterpf): Ditto.
	(*avx512pf_scatterpf_mask): Ditto.
	(*avx512pf_scatterpf): Ditto.
	(avx512pf_gatherpfdf): New.
	(*avx512pf_gatherpfdf_mask): Ditto.
	(*avx512pf_gatherpfdf): Ditto.
	(avx512pf_scatterpfdf): Ditto.
	(*avx512pf_scatterpfdf_mask): Ditto.
	(*avx512pf_scatterpfdf): Ditto.

testsuite/
	* gcc.target/i386/avx512f-vmovdqu32-1.c: Fix intrinsic name.
	* gcc.target/i386/avx512f-vmovdqu32-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpud-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpuq-2.c: Ditto.
	* gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
	* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
	* gcc.target/i386/sse-14.c: Add new built-ins, fix AVX-512ER
	built-ins rounding immediate.
	* gcc.target/i386/sse-22.c: Add new built-ins.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/avx-1.c: Ditto.

I have doubts about the changes to sse.md. I've split the existing (SF-only) patterns in two: DF and SF. Since the insn operands and the final instruction make no such data type distinction, I attached the data type to the (mem:..) part. So we have this (for SF):

  (define_expand "avx512pf_scatterpfsf"
    [(unspec
       [(match_operand: 0 "register_or_constm1_operand")
        (mem:SF ...

instead of this:

  (define_expand "avx512pf_scatterpf"
    [(unspec
       [(match_operand: 0 "register_or_constm1_operand")
        (mem: ...

I'm not sure whether this (DI/SI) mode on the mem is needed; moreover, I'm not sure what that data type is supposed to represent.

Patch at the bottom. AVX* and SSE* tests pass.

Comments, or is it OK for trunk?
--
Thanks, K

---
 gcc/config/i386/avx512fintrin.h                    |   4 +-
 gcc/config/i386/avx512pfintrin.h                   | 113 --
 gcc/config/i386/i386-builtin-types.def             |   2 +
 gcc/config/i386/i386.c                             |  37 -
 gcc/config/i386/sse.md                             | 171 +++--
 gcc/testsuite/gcc.target/i386/avx-1.c              |   4 +
 .../gcc.target/i386/avx512f-vmovdqu32-1.c          |   4 +-
 .../gcc.target/i386/avx512f-vmovdqu32-2.c          |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c  |   4 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c     |  15 ++
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |  17 ++
 .../gcc.tar
Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.
I noticed that LIM could not hoist vector invariants, which is why my first implementation tries to hoist them all. In addition, the "hoist invariant load + LIM" method has two disadvantages:

First, for some instructions the scalar version is faster than the vector version, in which case hoisting the scalar instructions before vectorization is better. Those instructions include data packing/unpacking, integer multiplication with SSE2, etc.

Second, it may use more SIMD registers. The following code shows a simple example:

  char *a, *b, *c;
  for (int i = 0; i < N; ++i)
    a[i] = b[0] * c[0] + a[i];

Vectorizing b[0]*c[0] is worse than loading the result of b[0]*c[0] into a vector.

thanks,
Cong


On Mon, Jan 13, 2014 at 5:37 AM, Richard Biener wrote:
> On Wed, 27 Nov 2013, Jakub Jelinek wrote:
>
>> On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote:
>> > Hmm.  I'm still thinking that we should handle this during the regular
>> > transform step.
>>
>> I wonder if it can't be done instead just in vectorizable_load,
>> if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is
>> invariant, just emit the (broadcasted) load not inside of the loop, but on
>> the loop preheader edge.
>
> So this implements this suggestion, XFAILing the no longer handled cases.
> For example we get
>
>   _94 = *b_8(D);
>   vect_cst_.18_95 = {_94, _94, _94, _94};
>   _99 = prolog_loop_adjusted_niters.9_132 * 4;
>   vectp_a.22_98 = a_6(D) + _99;
>   ivtmp.43_77 = (unsigned long) vectp_a.22_98;
>
>   :
>   # ivtmp.41_67 = PHI
>   # ivtmp.43_71 = PHI
>   vect__10.19_97 = vect_cst_.18_95 + { 1, 1, 1, 1 };
>   _76 = (void *) ivtmp.43_71;
>   MEM[base: _76, offset: 0B] = vect__10.19_97;
>
> ...
>
> instead of having hoisted *b_8 + 1 as scalar computation.  Not sure
> why LIM doesn't hoist the vector variant later.
>
>   vect__10.19_97 = vect_cst_.18_95 + vect_cst_.20_96;
>     invariant up to level 1, cost 1.
>
> ah, the cost thing.
> Should be "improved" to see that hoisting
> reduces the number of live SSA names in the loop.
>
> Eventually lower_vector_ssa could optimize vector to scalar
> code again ... (ick).
>
> Bootstrap / regtest running on x86_64.
>
> Comments?
>
> Thanks,
> Richard.
>
> 2014-01-13  Richard Biener
>
> 	PR tree-optimization/58921
> 	PR tree-optimization/59006
> 	* tree-vect-loop-manip.c (vect_loop_versioning): Remove code
> 	hoisting invariant stmts.
> 	* tree-vect-stmts.c (vectorizable_load): Insert the splat of
> 	invariant loads on the preheader edge if possible.
>
> 	* gcc.dg/torture/pr58921.c: New testcase.
> 	* gcc.dg/torture/pr59006.c: Likewise.
> 	* gcc.dg/vect/pr58508.c: XFAIL no longer handled cases.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> *** gcc/tree-vect-loop-manip.c	(revision 206576)
> --- gcc/tree-vect-loop-manip.c	(working copy)
> *** vect_loop_versioning (loop_vec_info loop
> *** 2435,2507
>   }
>   }
>
> -
> - /* Extract load statements on memrefs with zero-stride accesses.  */
> -
> - if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> -   {
> -     /* In the loop body, we iterate each statement to check if it is a load.
> -	 Then we check the DR_STEP of the data reference.  If DR_STEP is zero,
> -	 then we will hoist the load statement to the loop preheader.  */
> -
> -     basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> -     int nbbs = loop->num_nodes;
> -
> -     for (int i = 0; i < nbbs; ++i)
> -       {
> -	  for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]);
> -	       !gsi_end_p (si);)
> -	    {
> -	      gimple stmt = gsi_stmt (si);
> -	      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -	      struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> -
> -	      if (is_gimple_assign (stmt)
> -		  && (!dr
> -		      || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr)
> -		{
> -		  bool hoist = true;
> -		  ssa_op_iter iter;
> -		  tree var;
> -
> -		  /* We hoist a statement if all SSA uses in it are defined
> -		     outside of the loop.  */
> -		  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
> -		    {
> -		      gimple def = SSA_NAME_DEF_STMT (var);
> -		      if (!gimple_nop_p (def)
> -			  && flow_bb_inside_loop_p (loop, gimple_bb (def)))
> -			{
> -			  hoist = false;
> -			  break;
> -			}
> -		    }
> -
> -		  if (hoist)
> -		    {
> -		      if (dr)
> -			gimple_set_vuse (stmt, NULL);
> -
> -		      gsi_remove (&si, false);
> -		      gsi_i
[PATCH/AARCH64] Add issue_rate tuning field
Hi,

While writing a scheduler for Cavium's aarch64 processor (Thunder), I found there was currently no way to change the issue rate in the back-end. This patch adds a field (issue_rate) to tune_params and creates a new function that the middle-end calls. I updated the current two tuning variables (generic_tunings and cortexa53_tunings) to be 1, which was the default before.

OK? Built and tested for aarch64-elf with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
	* config/aarch64/aarch64-protos.h (tune_params): Add issue_rate.
	* config/aarch64/aarch64.c (generic_tunings): Add issue rate of 1.
	(cortexa53_tunings): Likewise.
	(aarch64_sched_issue_rate): New function.
	(TARGET_SCHED_ISSUE_RATE): Define.

Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h	(revision 206594)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -156,6 +156,7 @@ struct tune_params
   const struct cpu_regmove_cost *const regmove_cost;
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
+  const int issue_rate;
 };

 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);

Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 206594)
+++ config/aarch64/aarch64.c	(working copy)
@@ -221,7 +221,8 @@ static const struct tune_params generic_
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
-  NAMED_PARAM (memmov_cost, 4)
+  NAMED_PARAM (memmov_cost, 4),
+  NAMED_PARAM (issue_rate, 1)
 };

 static const struct tune_params cortexa53_tunings =
@@ -230,7 +231,8 @@ static const struct tune_params cortexa5
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
-  NAMED_PARAM (memmov_cost, 4)
+  NAMED_PARAM (memmov_cost, 4),
+  NAMED_PARAM (issue_rate, 1)
 };

 /* A processor implementing AArch64.  */
@@ -4895,6 +4897,13 @@ aarch64_memory_move_cost (enum machine_m
   return aarch64_tune_params->memmov_cost;
 }

+/* Return the number of instructions that can be issued per cycle.  */
+static int
+aarch64_sched_issue_rate (void)
+{
+  return aarch64_tune_params->issue_rate;
+}
+
 /* Vectorizer cost model target hooks.  */

 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
@@ -8411,6 +8420,9 @@ aarch64_vectorize_vec_perm_const_ok (enu
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS aarch64_rtx_costs

+#undef TARGET_SCHED_ISSUE_RATE
+#define TARGET_SCHED_ISSUE_RATE aarch64_sched_issue_rate
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init
[PATCH,rs6000] Implement -maltivec=be for vec_mule and vec_mulo Altivec intrinsics
This patch provides for interpreting the parity of element numbers for the Altivec vec_mule and vec_mulo intrinsics as big-endian (left to right in a vector register) when targeting a little-endian machine and specifying -maltivec=be. New test cases are added to test this functionality on all supported vector types.

The main change is in the altivec.md define_insns for vec_widen_{su}mult_{even,odd}_{v8hi,v16qi}, where we now test for VECTOR_ELT_ORDER_BIG rather than BYTES_BIG_ENDIAN in order to treat the element order as big-endian. However, this necessitates changes to other places in altivec.md where we previously called gen_vec_widen_{su}mult_*. The semantics of these internal uses are not affected by -maltivec=be, so these are now replaced with direct generation of the underlying instructions that were previously generated.

Bootstrapped and tested with no new regressions on powerpc64{,le}-unknown-linux-gnu. Ok for trunk?

Thanks,
Bill

gcc:

2014-01-13  Bill Schmidt

	* config/rs6000/altivec.md (mulv8hi3): Explicitly generate vmulesh
	and vmulosh rather than call gen_vec_widen_smult_*.
	(vec_widen_umult_even_v16qi): Test VECTOR_ELT_ORDER_BIG rather than
	BYTES_BIG_ENDIAN to determine use of even or odd instruction.
	(vec_widen_smult_even_v16qi): Likewise.
	(vec_widen_umult_even_v8hi): Likewise.
	(vec_widen_smult_even_v8hi): Likewise.
	(vec_widen_umult_odd_v16qi): Likewise.
	(vec_widen_smult_odd_v16qi): Likewise.
	(vec_widen_umult_odd_v8hi): Likewise.
	(vec_widen_smult_odd_v8hi): Likewise.
	(vec_widen_umult_hi_v16qi): Explicitly generate vmuleub and vmuloub
	rather than call gen_vec_widen_umult_*.
	(vec_widen_umult_lo_v16qi): Likewise.
	(vec_widen_smult_hi_v16qi): Explicitly generate vmulesb and vmulosb
	rather than call gen_vec_widen_smult_*.
	(vec_widen_smult_lo_v16qi): Likewise.
	(vec_widen_umult_hi_v8hi): Explicitly generate vmuleuh and vmulouh
	rather than call gen_vec_widen_umult_*.
	(vec_widen_umult_lo_v8hi): Likewise.
	(vec_widen_smult_hi_v8hi): Explicitly generate vmulesh and vmulosh
	rather than call gen_vec_widen_smult_*.
	(vec_widen_smult_lo_v8hi): Likewise.

gcc/testsuite:

2014-01-13  Bill Schmidt

	* gcc.dg/vmx/mult-even-odd.c: New.
	* gcc.dg/vmx/mult-even-odd-be-order.c: New.

Index: gcc/testsuite/gcc.dg/vmx/mult-even-odd.c
===
--- gcc/testsuite/gcc.dg/vmx/mult-even-odd.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/mult-even-odd.c	(revision 0)
@@ -0,0 +1,43 @@
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char vuca = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector unsigned char vucb = {2,3,2,3,2,3,2,3,2,3,2,3,2,3,2,3};
+  vector signed char vsca = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector signed char vscb = {2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3};
+  vector unsigned short vusa = {0,1,2,3,4,5,6,7};
+  vector unsigned short vusb = {2,3,2,3,2,3,2,3};
+  vector signed short vssa = {-4,-3,-2,-1,0,1,2,3};
+  vector signed short vssb = {2,-3,2,-3,2,-3,2,-3};
+  vector unsigned short vuse, vuso;
+  vector signed short vsse, vsso;
+  vector unsigned int vuie, vuio;
+  vector signed int vsie, vsio;
+
+  vuse = vec_mule (vuca, vucb);
+  vuso = vec_mulo (vuca, vucb);
+  vsse = vec_mule (vsca, vscb);
+  vsso = vec_mulo (vsca, vscb);
+  vuie = vec_mule (vusa, vusb);
+  vuio = vec_mulo (vusa, vusb);
+  vsie = vec_mule (vssa, vssb);
+  vsio = vec_mulo (vssa, vssb);
+
+  check (vec_all_eq (vuse,
+		     ((vector unsigned short){0,4,8,12,16,20,24,28})),
+	 "vuse");
+  check (vec_all_eq (vuso,
+		     ((vector unsigned short){3,9,15,21,27,33,39,45})),
+	 "vuso");
+  check (vec_all_eq (vsse,
+		     ((vector signed short){-16,-12,-8,-4,0,4,8,12})),
+	 "vsse");
+  check (vec_all_eq (vsso,
+		     ((vector signed short){21,15,9,3,-3,-9,-15,-21})),
+	 "vsso");
+  check (vec_all_eq (vuie, ((vector unsigned int){0,4,8,12})), "vuie");
+  check (vec_all_eq (vuio, ((vector unsigned int){3,9,15,21})), "vuio");
+  check (vec_all_eq (vsie, ((vector signed int){-8,-4,0,4})), "vsie");
+  check (vec_all_eq (vsio, ((vector signed int){9,3,-3,-9})), "vsio");
+}

Index: gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c	(revision 0)
@@ -0,0 +1,64 @@
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char vuca = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector unsigned char vucb = {2,3,2,3,2,3,2,3,2,3,2,3,2,3,2,3};
+  vector signed char vsca = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector signed char vscb = {2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3};
+  vect
Re: [PATCH] i?86 unaligned/aligned load improvement for AVX512F
On Mon, Jan 13, 2014 at 07:35:41PM +0100, Uros Bizjak wrote:
> Jakub, do you plan to submit this patch?

That would be the following patch then, tested on x86_64-linux. Unfortunately, it doesn't help for the avx512f-vmovdqu32-1.c testcase; the thing is that the __m512i type is V8DImode, and while the emitted (unaligned) load is V16SImode, as it is then cast to V8DImode, the combiner combines it into a V8DImode load and thus it is vmovdqu64 anyway. So not sure if this is worth it, your call...

But, while at it, is there any reason why we treat V64QImode and V32HImode so badly? As vec_initv64qi and vec_initv32hi aren't defined, e.g. for foo_1 in avx512f-vec-init.c we generate ~180 instructions when I'd say

	vmovd		%edi, %xmm0
	vpbroadcastb	%xmm0, %xmm0
	vpbroadcastq	%xmm0, %zmm0
	ret

would do the trick just fine.

2014-01-13  Jakub Jelinek

	* config/i386/sse.md (*mov_internal): Only use vmovdqa64 or
	vmovdqu64 instructions for V?DImode, for other MODE_VECT_INT
	modes use vmovdqa32 or vmovdqu32.

	* gcc.target/i386/avx512f-vec-init.c: Expect vmovdqa32 instead
	of vmovdqa64.
--- gcc/config/i386/sse.md.jj	2014-01-04 10:56:54.795976470 +0100
+++ gcc/config/i386/sse.md	2014-01-13 20:30:04.052499798 +0100
@@ -705,7 +705,14 @@ (define_insn "*mov_internal"
 	  return "vmovapd\t{%g1, %g0|%g0, %g1}";
 	case MODE_OI:
 	case MODE_TI:
-	  return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+	  switch (mode)
+	    {
+	    case V4DImode:
+	    case V2DImode:
+	      return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+	    default:
+	      return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
+	    }
 	default:
 	  gcc_unreachable ();
 	}
@@ -743,9 +750,16 @@ (define_insn "*mov_internal"
 	case MODE_XI:
 	  if (misaligned_operand (operands[0], mode)
 	      || misaligned_operand (operands[1], mode))
-	    return "vmovdqu64\t{%1, %0|%0, %1}";
-	  else
+	    {
+	      if (mode == V8DImode)
+		return "vmovdqu64\t{%1, %0|%0, %1}";
+	      else
+		return "vmovdqu32\t{%1, %0|%0, %1}";
+	    }
+	  else if (mode == V8DImode)
 	    return "vmovdqa64\t{%1, %0|%0, %1}";
+	  else
+	    return "vmovdqa32\t{%1, %0|%0, %1}";
 	default:
 	  gcc_unreachable ();

--- gcc/testsuite/gcc.target/i386/avx512f-vec-init.c.jj	2013-12-31 12:51:09.0 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-vec-init.c	2014-01-13 21:42:48.410415601 +0100
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -mavx512f" } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+%zmm" 2 } } */
+/* { dg-final { scan-assembler-times "vmovdqa32\[ \\t\]+%zmm" 2 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastd" 1 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastq" 1 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastb" 2 } } */

Jakub
Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)
On Mon, Jan 13, 2014 at 05:48:59PM +0100, Marek Polacek wrote:
> The patch will need some tweaking, I realized that e.g. for
> struct S { union {}; }; it doesn't do the right thing...

Done in the patch below. CCing Jason for the C++ part. Does this look sane now? Regtested/bootstrapped on x86_64.

2014-01-13  Marek Polacek

	PR c/58346
c-family/
	* c-common.c (pointer_to_zero_sized_aggr_p): New function.
	* c-common.h: Declare it.
cp/
	* typeck.c (pointer_diff): Give an error on arithmetic on
	pointer to an empty aggregate.
c/
	* c-typeck.c (pointer_diff): Give an error on arithmetic on
	pointer to an empty aggregate.
testsuite/
	* c-c++-common/pr58346.c: New test.

--- gcc/c-family/c-common.h.mp	2014-01-13 19:02:22.249870601 +0100
+++ gcc/c-family/c-common.h	2014-01-13 19:04:15.068294390 +0100
@@ -789,6 +789,7 @@ extern bool keyword_is_storage_class_spe
 extern bool keyword_is_type_qualifier (enum rid);
 extern bool keyword_is_decl_specifier (enum rid);
 extern bool cxx_fundamental_alignment_p (unsigned);
+extern bool pointer_to_zero_sized_aggr_p (tree);

 #define c_sizeof(LOC, T) c_sizeof_or_alignof_type (LOC, T, true, false, 1)
 #define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, false, 1)

--- gcc/c-family/c-common.c.mp	2014-01-13 19:01:20.503637616 +0100
+++ gcc/c-family/c-common.c	2014-01-13 19:42:32.805135382 +0100
@@ -11829,4 +11829,17 @@ cxx_fundamental_alignment_p (unsigned a
 			 TYPE_ALIGN (long_double_type_node)));
 }

+/* Return true if T is a pointer to a zero-sized struct/union.  */
+
+bool
+pointer_to_zero_sized_aggr_p (tree t)
+{
+  t = strip_pointer_operator (t);
+  if (RECORD_OR_UNION_TYPE_P (t)
+      && TYPE_SIZE (t)
+      && integer_zerop (TYPE_SIZE (t)))
+    return true;
+  return false;
+}
+
 #include "gt-c-family-c-common.h"

--- gcc/cp/typeck.c.mp	2014-01-13 19:08:12.237244663 +0100
+++ gcc/cp/typeck.c	2014-01-13 19:10:23.350742070 +0100
@@ -5043,6 +5043,14 @@ pointer_diff (tree op0, tree op1, tree p
       return error_mark_node;
     }

+  if (pointer_to_zero_sized_aggr_p (TREE_TYPE (op1)))
+    {
+      if (complain & tf_error)
+	error ("arithmetic on pointer to an empty aggregate");
+      else
+	return error_mark_node;
+    }
+
   op1 = (TYPE_PTROB_P (ptrtype)
	 ? size_in_bytes (target_type)
	 : integer_one_node);

--- gcc/c/c-typeck.c.mp	2014-01-13 15:47:01.316105676 +0100
+++ gcc/c/c-typeck.c	2014-01-13 19:58:19.237271626 +0100
@@ -3536,6 +3536,9 @@ pointer_diff (location_t loc, tree op0,
   /* This generates an error if op0 is pointer to incomplete type.  */
   op1 = c_size_in_bytes (target_type);

+  if (pointer_to_zero_sized_aggr_p (TREE_TYPE (orig_op1)))
+    error_at (loc, "arithmetic on pointer to an empty aggregate");
+
   /* Divide by the size, in easiest possible way.  */
   result = fold_build2_loc (loc, EXACT_DIV_EXPR, inttype, op0,
			    convert (inttype, op1));

--- gcc/testsuite/c-c++-common/pr58346.c.mp	2014-01-13 15:48:20.011420141 +0100
+++ gcc/testsuite/c-c++-common/pr58346.c	2014-01-13 20:25:17.544582444 +0100
@@ -0,0 +1,24 @@
+/* PR c/58346 */
+/* { dg-do compile } */
+
+struct U {
+#ifdef __cplusplus
+  char a[0];
+#endif
+};
+static struct U b[6];
+static struct U **u1, **u2;
+
+int
+foo (struct U *p, struct U *q)
+{
+  return q - p; /* { dg-error "arithmetic on pointer to an empty aggregate" } */
+}
+
+void
+bar (void)
+{
+  __PTRDIFF_TYPE__ d = u1 - u2; /* { dg-error "arithmetic on pointer to an empty aggregate" } */
+  __asm volatile ("" : "+g" (d));
+  foo (&b[0], &b[4]);
+}

Marek
Re: [PATCH,rs6000] Implement -maltivec=be for vec_insert and vec_extract Altivec intrinsics
On Sun, Jan 12, 2014 at 7:53 PM, Bill Schmidt wrote:
> This patch provides for interpreting element numbers for the Altivec
> vec_insert and vec_extract intrinsics as big-endian (left to right in a
> vector register) when targeting a little endian machine and specifying
> -maltivec=be.  New test cases are added to test this functionality on
> all supported vector types.
>
> Bootstrapped and tested with no new regressions on
> powerpc64{,le}-unknown-linux-gnu.  Ok for trunk?
>
> Thanks,
> Bill
>
> gcc:
>
> 2014-01-12  Bill Schmidt
>
> 	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> 	Implement -maltivec=be for vec_insert and vec_extract.
>
> gcc/testsuite:
>
> 2014-01-12  Bill Schmidt
>
> 	* gcc.dg/vmx/insert.c: New.
> 	* gcc.dg/vmx/insert-be-order.c: New.
> 	* gcc.dg/vmx/extract.c: New.
> 	* gcc.dg/vmx/extract-be-order.c: New.

> +  if (!BYTES_BIG_ENDIAN && rs6000_altivec_element_order == 2)
> +    {
> +      int last_elem = TYPE_VECTOR_SUBPARTS (arg1_type) - 1;
> +      double_int di_last_elem = double_int::from_uhwi (last_elem);
> +      arg2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (arg2),
> +			      double_int_to_tree (TREE_TYPE (arg2),
> +						  di_last_elem),
> +			      arg2);
> +    }

Please change last_elem to unsigned int in both blocks of code. And I believe GCC provides a more direct API to create a tree from last_elem than the double_int::from_uhwi() / double_int_to_tree() dance, since the value is constant for each instance. build_int_cstu()?

Okay with those changes.

Thanks,
David
Re: Fix tree containers debug mode C++11 allocator awareness
On 12/22/2013 09:55 PM, François Dumont wrote:
> On 12/22/2013 12:51 PM, Jonathan Wakely wrote:
>> On 21 December 2013 08:51, François Dumont wrote:
>>> Any feedback for this proposal ?
>>
>> It looks good but I don't have time to review it fully yet, please be
>> patient.  I'm more concerned about your comment about the non-debug mode
>> implementation being incorrect, could you provide more details?
>
> That's not a big issue. The constructor taking an rvalue reference and an
> allocator doesn't take care of safe iterators. They should be swapped, as
> in the move constructor, when the allocator is equivalent, and invalidated
> if we have not been able to move the memory. I plan to submit a patch to
> fix all implementations the same way at once, but I can include it in this
> patch if you prefer.

Following the agreement given here:
http://gcc.gnu.org/ml/libstdc++/2014-01/msg00066.html
attached patch applied. Profile mode will need the same kind of patch too.

2014-01-13  François Dumont

	* include/debug/set.h (set): Implement C++11 allocator-aware
	container requirements.
	* include/debug/map.h (map): Likewise.
	* include/debug/multiset.h (multiset): Likewise.
	* include/debug/multimap.h (multimap): Likewise.
	* include/debug/set.h (set::operator=(set&&)): Add noexcept and
	fix implementation regarding management of safe iterators.
	* include/debug/map.h (map::operator=(map&&)): Likewise.
	* include/debug/multiset.h (multiset::operator=(multiset&&)):
	Likewise.
	* include/debug/multimap.h (multimap::operator=(multimap&&)):
	Likewise.
	* include/debug/set.h (set::operator=(std::initializer_list<>)):
	Rely on the same operator from normal mode.
	* include/debug/map.h (map::operator=(std::initializer_list<>)):
	Likewise.
	* include/debug/multiset.h
	(multiset::operator=(std::initializer_list<>)): Likewise.
	* include/debug/multimap.h
	(multimap::operator=(std::initializer_list<>)): Likewise.
	* include/debug/set.h (set::swap(set&)): Add noexcept
	specification, add allocator equality check.
	* include/debug/map.h (map::swap(map&)): Likewise.
	* include/debug/multiset.h (multiset::swap(multiset&)): Likewise.
	* include/debug/multimap.h (multimap::swap(multimap&)): Likewise.

François

Index: include/debug/set.h
===
--- include/debug/set.h	(revision 206587)
+++ include/debug/set.h	(working copy)
@@ -49,6 +49,10 @@
       typedef typename _Base::const_iterator _Base_const_iterator;
       typedef typename _Base::iterator _Base_iterator;
       typedef __gnu_debug::_Equal_to<_Base_const_iterator> _Equal;
+#if __cplusplus >= 201103L
+      typedef __gnu_cxx::__alloc_traits _Alloc_traits;
+#endif

     public:
       // types:
       typedef _Key	key_type;
@@ -101,6 +105,28 @@
 	  const _Compare& __comp = _Compare(),
 	  const allocator_type& __a = allocator_type())
       : _Base(__l, __comp, __a) { }
+
+      explicit
+      set(const allocator_type& __a)
+      : _Base(__a) { }
+
+      set(const set& __x, const allocator_type& __a)
+      : _Base(__x, __a) { }
+
+      set(set&& __x, const allocator_type& __a)
+      : _Base(std::move(__x._M_base()), __a) { }
+
+      set(initializer_list __l, const allocator_type& __a)
+      : _Base(__l, __a)
+      { }
+
+      template
+	set(_InputIterator __first, _InputIterator __last,
+	    const allocator_type& __a)
+	: _Base(__gnu_debug::__base(__gnu_debug::__check_valid_range(__first,
+								     __last)),
+		__gnu_debug::__base(__last), __a)
+	{ }
 #endif

       ~set() _GLIBCXX_NOEXCEPT { }

@@ -108,7 +134,7 @@
       set&
       operator=(const set& __x)
       {
-	*static_cast<_Base*>(this) = __x;
+	_M_base() = __x;
 	this->_M_invalidate_all();
 	return *this;
       }

@@ -116,20 +142,25 @@
 #if __cplusplus >= 201103L
       set&
       operator=(set&& __x)
+      noexcept(_Alloc_traits::_S_nothrow_move())
       {
-	// NB: DR 1204.
-	// NB: DR 675.
 	__glibcxx_check_self_move_assign(__x);
-	clear();
-	swap(__x);
+	bool xfer_memory = _Alloc_traits::_S_propagate_on_move_assign()
+	  || __x.get_allocator() == this->get_allocator();
+	_M_base() = std::move(__x._M_base());
+	if (xfer_memory)
+	  this->_M_swap(__x);
+	else
+	  this->_M_invalidate_all();
+	__x._M_invalidate_all();
 	return *this;
       }

       set&
       operator=(initializer_list __l)
       {
-	this->clear();
-	this->insert(__l);
+	_M_base() = __l;
+	this->_M_invalidate_all();
 	return *this;
       }
 #endif

@@ -337,7 +368,14 @@
       void
       swap(set& __x)
+#if __cplusplus >= 201103L
+      noexcept(_Alloc_traits::_S_nothrow_swap())
+#endif
       {
+#if __cplusplus >= 201103L
+	if (!_Alloc_traits::_S_propagate_on_swap())
+	  __glibcxx_check_equal_allocs(__x);
+#endif
 	_Base::swap(__x);
 	this->_M_swap(__x);
       }

Index: include/debug/map.h
===
--- include/debug/map.h	(revision 206587)
++
PATCH: PR middle-end/59789: [4.9 Regression] ICE in in convert_move, at expr.c:333
Hi,

We should report some early inlining errors. This patch is based on
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57698#c7
It adds report_early_inliner_always_inline_failure and uses it in expand_call_inline. Tested on Linux/x86-64. OK to install?

Thanks.

H.J.

commit 7b18b53d308b2c25bef5664be3e6544249d86bdc
Author: H.J. Lu
Date:   Mon Jan 13 11:54:36 2014 -0800

    Update error handling during early_inlining

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5c674bc..284bc66 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2014-01-13  Sriraman Tallam
+	    H.J. Lu
+
+	PR middle-end/59789
+	* tree-inline.c (report_early_inliner_always_inline_failure): New
+	function.
+	(expand_call_inline): Emit errors during early_inlining if
+	report_early_inliner_always_inline_failure returns true.
+
 2014-01-10  DJ Delorie

 	* config/msp430/msp430.md (call_internal): Don't allow memory

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 459e365..2a7b3ca 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-01-13  H.J. Lu
+
+	PR middle-end/59789
+	* gcc.target/i386/pr59789.c: New testcase.
+
 2014-01-13  Jakub Jelinek

 	PR tree-optimization/59387

diff --git a/gcc/testsuite/gcc.target/i386/pr59789.c b/gcc/testsuite/gcc.target/i386/pr59789.c
new file mode 100644
index 000..b476d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr59789.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O -march=i686" } */
+
+#pragma GCC push_options
+#pragma GCC target("sse2")
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0) /* { dg-error "target specific option mismatch" } */
+{
+  return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 };
+}
+#pragma GCC pop_options
+
+
+__m128i
+f1(void) /* { dg-message "warning: SSE vector return without SSE enabled changes the ABI" } */
+{
+  return _mm_set_epi32 (0, 0, 0, 0); /* { dg-error "called from here" } */
+}

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 22521b1..ce1e3af 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4046,6 +4046,32 @@ add_local_variables (struct function *callee, struct function *caller,
     }
 }

+/* Should an error be reported when early inliner fails to inline an
+   always_inline function?  That depends on the REASON.  */
+
+static inline bool
+report_early_inliner_always_inline_failure (cgraph_inline_failed_t reason)
+{
+  /* Only the following reasons need to be reported when the early inliner
+     fails to inline an always_inline function.  Called from
+     expand_call_inline.  */
+  switch (reason)
+    {
+    case CIF_BODY_NOT_AVAILABLE:
+    case CIF_FUNCTION_NOT_INLINABLE:
+    case CIF_OVERWRITABLE:
+    case CIF_MISMATCHED_ARGUMENTS:
+    case CIF_EH_PERSONALITY:
+    case CIF_UNSPECIFIED:
+    case CIF_NON_CALL_EXCEPTIONS:
+    case CIF_TARGET_OPTION_MISMATCH:
+    case CIF_OPTIMIZATION_MISMATCH:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* If STMT is a GIMPLE_CALL, replace it with its inline expansion.  */

 static bool
@@ -4116,7 +4142,8 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id)
 	  /* During early inline pass, report only when optimization is
 	     not turned on.  */
 	  && (cgraph_global_info_ready
-	      || !optimize)
+	      || !optimize
+	      || report_early_inliner_always_inline_failure (reason))
 	  /* PR 20090218-1_0.c.  Body can be provided by another module.  */
 	  && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto))
 	{
[Patch,AArch64] Support SISD variants of SCVTF,UCVTF
Hello,

This patch adds support for the SISD variants of the SCVTF/UCVTF instructions. It also refactors the existing support for the floating-point variants of SCVTF/UCVTF in order to direct instruction selection based on the constraints. Given that the floating-point variants support unequal-width conversions (SI to DF and DI to SF), new mode attributes w1 and w2 have been introduced, and fcvt_target/FCVT_TARGET have been extended to support non-vector types. Since this patch changes the existing patterns, the testcase includes tests for both the SISD and floating-point variants of the instructions.

Tested for aarch64-none-elf. OK for trunk?

Cheers
VP.

gcc/ChangeLog:

2013-01-13  Vidya Praveen

	* aarch64.md (float2): Remove.
	(floatuns2): Remove.
	(2): New pattern for equal width float and floatuns
	conversions.
	(2): New pattern for unequal width float and floatuns
	conversions.
	* iterators.md (fcvt_target, FCVT_TARGET): Support SF and DF modes.
	(w1, w2): New mode attributes for unequal width conversions.
gcc/testsuite/ChangeLog: 2013-01-13 Vidya Praveen * gcc.target/aarch64/cvtf_1.c: New.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c83622d..1775849 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3295,20 +3295,24 @@ [(set_attr "type" "f_cvtf2i")] ) -(define_insn "float2" - [(set (match_operand:GPF 0 "register_operand" "=w") -(float:GPF (match_operand:GPI 1 "register_operand" "r")))] - "TARGET_FLOAT" - "scvtf\\t%0, %1" - [(set_attr "type" "f_cvti2f")] +(define_insn "2" + [(set (match_operand:GPF 0 "register_operand" "=w,w") +(FLOATUORS:GPF (match_operand: 1 "register_operand" "w,r")))] + "" + "@ + cvtf\t%0, %1 + cvtf\t%0, %1" + [(set_attr "simd" "yes,no") + (set_attr "fp" "no,yes") + (set_attr "type" "neon_int_to_fp_,f_cvti2f")] ) -(define_insn "floatuns2" +(define_insn "2" [(set (match_operand:GPF 0 "register_operand" "=w") -(unsigned_float:GPF (match_operand:GPI 1 "register_operand" "r")))] +(FLOATUORS:GPF (match_operand: 1 "register_operand" "r")))] "TARGET_FLOAT" - "ucvtf\\t%0, %1" - [(set_attr "type" "f_cvt")] + "cvtf\t%0, %1" + [(set_attr "type" "f_cvti2f")] ) ;; --- diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index c4f95dc..11bdc35 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -293,6 +293,10 @@ ;; 32-bit version and "%x0" in the 64-bit version. 
(define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")]) +;; For inequal width int to float conversion +(define_mode_attr w1 [(SF "w") (DF "x")]) +(define_mode_attr w2 [(SF "x") (DF "w")]) + ;; For constraints used in scalar immediate vector moves (define_mode_attr hq [(HI "h") (QI "q")]) @@ -558,8 +562,12 @@ (define_mode_attr atomic_sfx [(QI "b") (HI "h") (SI "") (DI "")]) -(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")]) -(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")]) +(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si") (SF "si") (DF "di")]) +(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI") (SF "SI") (DF "DI")]) + +;; for the inequal width integer to fp conversions +(define_mode_attr fcvt_iesize [(SF "di") (DF "si")]) +(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")]) (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI") (V4HI "V8HI") (V8HI "V4HI") diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c new file mode 100644 index 000..80ab9a5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c @@ -0,0 +1,95 @@ +/* { dg-do run } */ +/* { dg-options "-save-temps -fno-inline -O1" } */ + +#define FCVTDEF(ftype,itype) \ +void \ +cvt_##itype##_to_##ftype (itype a, ftype b)\ +{\ + ftype c;\ + c = (ftype) a;\ + if ( (c - b) > 0.1) abort();\ +} + +#define force_simd_for_float(v) asm volatile ("mov %s0, %1.s[0]" :"=w" (v) :"w" (v) :) +#define force_simd_for_double(v) asm volatile ("mov %d0, %1.d[0]" :"=w" (v) :"w" (v) :) + +#define FCVTDEF_SISD(ftype,itype) \ +void \ +cvt_##itype##_to_##ftype##_sisd (itype a, ftype b)\ +{\ + ftype c;\ + force_simd_for_##ftype(a);\ + c = (ftype) a;\ + if ( (c - b) > 0.1) abort();\ +} + +#define FCVT(ftype,itype,ival,fval) cvt_##itype##_to_##ftype (ival, fval); +#define FCVT_SISD(ftype,itype,ival,fval) cvt_##itype##_to_##ftype##_sisd (ival, fval); + +typedef int 
int32_t; +typedef unsigned int uint32_t; +typedef long long int int64_t; +typedef unsigned long long int uint64_t; + +extern void abort(); + +FCVTDEF (float, int32_t) +/* { dg-final { scan-assembler "scvtf\ts\[0-9\]+,\ w\[0-9\]+" } } */ +FCVTDEF (float, uint32_t) +/* { dg-final { scan-assembler "uc
[PATCH] Fix up vect/fast-math-mgrid-resid.f testcase (PR testsuite/59494)
Hi! As discussed in the PR and on IRC, this testcase is very fragile: counting additions with a vect_-named SSA_NAME on the lhs works only for some tunings; for other tunings, reassoc width etc. affect it and we can e.g. have anonymous SSA_NAMEs on the lhs in the optimized dump instead. These alternate regexps seem to match regardless of the tunings (at least the ones I've tried), starting with the corresponding fix onwards, and FAIL before the fix. Regtested on x86_64-linux and i686-linux, ok for trunk? 2014-01-13 Jakub Jelinek PR testsuite/59494 * gfortran.dg/vect/fast-math-mgrid-resid.f: Change -fdump-tree-optimized to -fdump-tree-pcom-details in dg-options and cleanup-tree-dump from optimized to pcom. Remove scan-tree-dump-times for vect_\[^\\n\]*\\+, add scan-tree-dump-times for no suitable chains and Executing predictive commoning without unrolling. --- gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f.jj 2013-04-08 15:38:21.0 +0200 +++ gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 2014-01-13 13:18:39.904315828 +0100 @@ -1,7 +1,7 @@ ! { dg-do compile { target i?86-*-* x86_64-*-* } } ! { dg-require-effective-target vect_double } ! { dg-require-effective-target sse2 } -! { dg-options "-O3 -ffast-math -msse2 -fpredictive-commoning -ftree-vectorize -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -msse2 -fpredictive-commoning -ftree-vectorize -fdump-tree-pcom-details" } *** RESID COMPUTES THE RESIDUAL: R = V - AU @@ -39,8 +39,9 @@ C RETURN END ! we want to check that predictive commoning did something on the -! vectorized loop, which means we have to have exactly 13 vector -! additions. -! { dg-final { scan-tree-dump-times "vect_\[^\\n\]*\\+ " 13 "optimized" } } +! vectorized loop. +! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } } +! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } } +! 
{ dg-final { scan-tree-dump-times "Predictive commoning failed: no suitable chains" 0 "pcom" } } ! { dg-final { cleanup-tree-dump "vect" } } -! { dg-final { cleanup-tree-dump "optimized" } } +! { dg-final { cleanup-tree-dump "pcom" } } Jakub
Re: PING: PATCH: PR libitm/53113: Build fails in x86_avx.cc if AVX disabled by -mno-avx
On 01/11/2014 08:28 AM, H.J. Lu wrote: > +2013-12-25 H.J. Lu >> + >> + PR libitm/53113 >> + * Makefile.am (x86_sse.lo): Append -msse to CXXFLAGS. >> + (x86_avx.lo): Append -mavx to CXXFLAGS. >> + * Makefile.in: Regenerate. >> + Ok. r~
[msp430] fix call-via-sp and epilogue helper patterns
The call change avoids a problem on hardware where indirect calls that use SP as a base register don't seem to do what you expect. The 'J' one fixes a link-time error wrt epilogue helper functions. Committed. * config/msp430/msp430.md (call_internal): Don't allow memory references with SP as the base register. (call_value_internal): Likewise. * config/msp430/constraints.md (Yc): New. For memory references that don't use SP as a base register. * config/msp430/msp430.c (msp430_print_operand): Add 'J' to mean "an integer without a # prefix" * config/msp430/msp430.md (epilogue_helper): Use it. Index: config/msp430/msp430.md === --- config/msp430/msp430.md (revision 206582) +++ config/msp430/msp430.md (working copy) @@ -917,13 +917,13 @@ ) (define_insn "epilogue_helper" [(unspec_volatile [(match_operand 0 "immediate_operand" "i")] UNS_EPILOGUE_HELPER)] "" - "BR%Q0\t#__mspabi_func_epilog_%0" + "BR%Q0\t#__mspabi_func_epilog_%J0" ) (define_insn "prologue_start_marker" [(unspec_volatile [(const_int 0)] UNS_PROLOGUE_START_MARKER)] "" @@ -950,13 +950,13 @@ (match_operand 1 ""))] "" "" ) (define_insn "call_internal" - [(call (mem:HI (match_operand 0 "general_operand" "rmi")) + [(call (mem:HI (match_operand 0 "general_operand" "rYci")) (match_operand 1 ""))] "" "CALL%Q0\t%0" ) (define_expand "call_value" @@ -966,13 +966,13 @@ "" "" ) (define_insn "call_value_internal" [(set (match_operand 0 "register_operand" "=r") - (call (mem:HI (match_operand 1 "general_operand" "rmi")) + (call (mem:HI (match_operand 1 "general_operand" "rYci")) (match_operand 2 "")))] "" "CALL%Q0\t%1" ) (define_insn "msp_return" Index: config/msp430/constraints.md === --- config/msp430/constraints.md(revision 206582) +++ config/msp430/constraints.md(working copy) @@ -67,6 +67,19 @@ (and (match_code "plus" "0") (and (match_code "reg" "00") (match_test ("CONST_INT_P (XEXP (XEXP (op, 0), 1))")) (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), -1 << 15, (1 << 15)-1)" (match_code "reg" "0") ))) + 
+(define_constraint "Yc" + "Memory reference, for CALL - we can't use SP" + (and (match_code "mem") + (match_code "mem" "0") + (not (ior +(and (match_code "plus" "00") + (and (match_code "reg" "000") + (match_test ("REGNO (XEXP (XEXP (op, 0), 0)) != SP_REGNO" +(and (match_code "reg" "0") + (match_test ("REGNO (XEXP (XEXP (op, 0), 0)) != SP_REGNO"))) + + Index: config/msp430/msp430.c === --- config/msp430/msp430.c (revision 206582) +++ config/msp430/msp430.c (working copy) @@ -1917,12 +1917,13 @@ msp430_print_operand_addr (FILE * file, /* A low 16-bits of int/lower of register pair B high 16-bits of int/higher of register pair C bits 32-47 of a 64-bit value/reg 3 of a DImode value D bits 48-63 of a 64-bit value/reg 4 of a DImode value H like %B (for backwards compatibility) I inverse of value + J an integer without a # prefix L like %A (for backwards compatibility) O offset of the top of the stack Q like X but generates an A postfix R inverse of condition code, unsigned. X X instruction postfix in large mode Y value - 4 @@ -1947,13 +1948,12 @@ msp430_print_operand (FILE * file, rtx o return; case 'Y': gcc_assert (CONST_INT_P (op)); /* Print the constant value, less four. */ fprintf (file, "#%ld", INTVAL (op) - 4); return; - /* case 'D': used for "decimal without '#'" */ case 'I': if (GET_CODE (op) == CONST_INT) { /* Inverse of constants */ int i = INTVAL (op); fprintf (file, "%d", ~i); @@ -2107,12 +2107,14 @@ msp430_print_operand (FILE * file, rtx o because builtins are expanded before the frame layout is determined. */ fprintf (file, "%d", msp430_initial_elimination_offset (ARG_POINTER_REGNUM, STACK_POINTER_REGNUM) - 2); return; +case 'J': + gcc_assert (GET_CODE (op) == CONST_INT); case 0: break; default: output_operand_lossage ("invalid operand prefix"); return; }
Re: [PATCH] Avoid introducing undefined behavior in sccp (PR tree-optimization/59387)
On Mon, Jan 13, 2014 at 11:42:11AM +0100, Richard Biener wrote: > > + if (TREE_CODE (def) == INTEGER_CST && TREE_OVERFLOW (def)) > > TREE_OVERFLOW_P (), but it seems to me that the SCEV machinery > should do this at a good place (like where it finally records > the result into its cache before returning it, at set_and_end: > of analyze_scalar_evolution_1). > > > + def = drop_tree_overflow (def); As discussed on IRC, dropped this part of the change altogether (for now). > Hmm, stmt is still in the 'stmts' sequence here, I think you should > gsi_remove it before inserting it elsewhere. Fixed, bootstrapped/regtested on x86_64-linux and i686-linux, here is what I've committed in the end: 2014-01-13 Jakub Jelinek PR tree-optimization/59387 * tree-scalar-evolution.c: Include gimple-fold.h and gimplify-me.h. (scev_const_prop): If folded_casts and type has undefined overflow, use force_gimple_operand instead of force_gimple_operand_gsi and for each added stmt if it is assign with arith_code_with_undefined_signed_overflow, call rewrite_to_defined_overflow. * tree-ssa-loop-im.c: Don't include gimplify-me.h, include gimple-fold.h instead. (arith_code_with_undefined_signed_overflow, rewrite_to_defined_overflow): Moved to ... * gimple-fold.c (arith_code_with_undefined_signed_overflow, rewrite_to_defined_overflow): ... here. No longer static. Include gimplify-me.h. * gimple-fold.h (arith_code_with_undefined_signed_overflow, rewrite_to_defined_overflow): New prototypes. * gcc.c-torture/execute/pr59387.c: New test. --- gcc/tree-scalar-evolution.c.jj 2014-01-08 17:44:57.596582925 +0100 +++ gcc/tree-scalar-evolution.c 2014-01-10 15:46:55.355915072 +0100 @@ -286,6 +286,8 @@ along with GCC; see the file COPYING3. 
#include "dumpfile.h" #include "params.h" #include "tree-ssa-propagate.h" +#include "gimple-fold.h" +#include "gimplify-me.h" static tree analyze_scalar_evolution_1 (struct loop *, tree, tree); static tree analyze_scalar_evolution_for_address_of (struct loop *loop, @@ -3409,7 +3411,7 @@ scev_const_prop (void) { edge exit; tree def, rslt, niter; - gimple_stmt_iterator bsi; + gimple_stmt_iterator gsi; /* If we do not know exact number of iterations of the loop, we cannot replace the final value. */ @@ -3424,7 +3426,7 @@ scev_const_prop (void) /* Ensure that it is possible to insert new statements somewhere. */ if (!single_pred_p (exit->dest)) split_loop_exit_edge (exit); - bsi = gsi_after_labels (exit->dest); + gsi = gsi_after_labels (exit->dest); ex_loop = superloop_at_depth (loop, loop_depth (exit->dest->loop_father) + 1); @@ -3447,7 +3449,9 @@ scev_const_prop (void) continue; } - def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); + bool folded_casts; + def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, + &folded_casts); def = compute_overall_effect_of_inner_loop (ex_loop, def); if (!tree_does_not_contain_chrecs (def) || chrec_contains_symbols_defined_in_loop (def, ex_loop->num) @@ -3485,10 +3489,37 @@ scev_const_prop (void) def = unshare_expr (def); remove_phi_node (&psi, false); - def = force_gimple_operand_gsi (&bsi, def, false, NULL_TREE, - true, GSI_SAME_STMT); + /* If def's type has undefined overflow and there were folded +casts, rewrite all stmts added for def into arithmetics +with defined overflow behavior. 
*/ + if (folded_casts && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def))) + { + gimple_seq stmts; + gimple_stmt_iterator gsi2; + def = force_gimple_operand (def, &stmts, true, NULL_TREE); + gsi2 = gsi_start (stmts); + while (!gsi_end_p (gsi2)) + { + gimple stmt = gsi_stmt (gsi2); + gimple_stmt_iterator gsi3 = gsi2; + gsi_next (&gsi2); + gsi_remove (&gsi3, false); + if (is_gimple_assign (stmt) + && arith_code_with_undefined_signed_overflow + (gimple_assign_rhs_code (stmt))) + gsi_insert_seq_before (&gsi, + rewrite_to_defined_overflow (stmt), + GSI_SAME_STMT); + else + gsi_insert_before (&gsi, stmt, GSI_SAME_STMT); + } + } + else + def = force_gimple_operand_gsi (&gsi, def, false, NULL_TREE, + true, GSI_SAME_STMT); +
[C PATCH] Preevaluate rhs for lhs op= rhs in C (PR c/58943)
Hi! This patch fixes the following testcase by preevaluating rhs if it has (can have) side-effects in lhs op= rhs expressions. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? C++ already does a similar thing (though in that case with TARGET_EXPRs). Note1: had to tweak ssa-fre-33.c testcase a little bit (but it still fails without the fix which went together with it and succeeds with the fix and from that point onwards), because before fre1 there isn't enough forward propagation that would make it constant (the addition result becomes constant during fre1). Note2: c-c++-common/cilk-plus/AN/rank_mismatch2.c ICEs now, supposedly array notation handling doesn't handle SAVE_EXPRs properly, Balaji, do you think you can debug it and fix up afterwards? 2014-01-13 Jakub Jelinek PR c/58943 * c-typeck.c (build_modify_expr): For lhs op= rhs, if rhs has side effects, preevaluate rhs using SAVE_EXPR first. * c-omp.c (c_finish_omp_atomic): Set in_late_binary_op around build_modify_expr with non-NOP_EXPR opcode. Handle return from it being COMPOUND_EXPR. (c_finish_omp_for): Handle incr being COMPOUND_EXPR with first operand a SAVE_EXPR and second MODIFY_EXPR. * gcc.c-torture/execute/pr58943.c: New test. * gcc.dg/tree-ssa/ssa-fre-33.c (main): Avoid using += in the test. --- gcc/c/c-typeck.c.jj 2014-01-04 09:48:20.845147744 +0100 +++ gcc/c/c-typeck.c2014-01-13 14:57:27.133743740 +0100 @@ -5193,6 +5193,7 @@ build_modify_expr (location_t location, { tree result; tree newrhs; + tree rhseval = NULL_TREE; tree rhs_semantic_type = NULL_TREE; tree lhstype = TREE_TYPE (lhs); tree olhstype = lhstype; @@ -5254,8 +5255,17 @@ build_modify_expr (location_t location, /* Construct the RHS for any non-atomic compound assignemnt. */ if (!is_atomic_op) { + /* If in LHS op= RHS the RHS has side-effects, ensure they +are preevaluated before the rest of the assignment expression's +side-effects, because RHS could contain e.g. function calls +that modify LHS. 
*/ + if (TREE_SIDE_EFFECTS (rhs)) + { + newrhs = in_late_binary_op ? save_expr (rhs) : c_save_expr (rhs); + rhseval = newrhs; + } newrhs = build_binary_op (location, - modifycode, lhs, rhs, 1); + modifycode, lhs, newrhs, 1); /* The original type of the right hand side is no longer meaningful. */ @@ -5269,7 +5279,7 @@ build_modify_expr (location_t location, if so, we need to generate setter calls. */ result = objc_maybe_build_modify_expr (lhs, newrhs); if (result) - return result; + goto return_result; /* Else, do the check that we postponed for Objective-C. */ if (!lvalue_or_else (location, lhs, lv_assign)) @@ -5363,7 +5373,7 @@ build_modify_expr (location_t location, if (result) { protected_set_expr_location (result, location); - return result; + goto return_result; } } @@ -5384,11 +5394,15 @@ build_modify_expr (location_t location, as the LHS argument. */ if (olhstype == TREE_TYPE (result)) -return result; +goto return_result; result = convert_for_assignment (location, olhstype, result, rhs_origtype, ic_assign, false, NULL_TREE, NULL_TREE, 0); protected_set_expr_location (result, location); + +return_result: + if (rhseval) +result = build2 (COMPOUND_EXPR, TREE_TYPE (result), rhseval, result); return result; } --- gcc/c-family/c-omp.c.jj 2014-01-04 09:48:20.0 +0100 +++ gcc/c-family/c-omp.c2014-01-13 15:23:51.653591098 +0100 @@ -136,7 +136,7 @@ c_finish_omp_atomic (location_t loc, enu enum tree_code opcode, tree lhs, tree rhs, tree v, tree lhs1, tree rhs1, bool swapped, bool seq_cst) { - tree x, type, addr; + tree x, type, addr, pre = NULL_TREE; if (lhs == error_mark_node || rhs == error_mark_node || v == error_mark_node || lhs1 == error_mark_node @@ -194,9 +194,18 @@ c_finish_omp_atomic (location_t loc, enu rhs = build2_loc (loc, opcode, TREE_TYPE (lhs), rhs, lhs); opcode = NOP_EXPR; } + bool save = in_late_binary_op; + in_late_binary_op = true; x = build_modify_expr (loc, lhs, NULL_TREE, opcode, loc, rhs, NULL_TREE); + in_late_binary_op = save; if (x == 
error_mark_node) return error_mark_node; + if (TREE_CODE (x) == COMPOUND_EXPR) +{ + pre = TREE_OPERAND (x, 0); + gcc_assert (TREE_CODE (pre) == SAVE_EXPR); + x = TREE_OPERAND (x, 1); +} gcc_assert (TREE_CODE (x) == MODIFY_EXPR); rhs = TREE_OPERAND (x, 1); @@ -264,6 +273,8 @@ c_finish_omp_atomic (location_t loc, enu x = omit_one_operand_loc (loc, type, x, rhs1ad
Re: Patch ping
On Mon, Jan 13, 2014 at 07:40:16PM +0100, Uros Bizjak wrote: > An unrelated observation: gcc should figure out that %k1 mask register > can be used in all gather insns and avoid unnecessary copies at the > beginning of the loop. I thought about that too, even started modifying sse.md, but then I read the spec and the AVX512F gather insns overwrite the mask register (like it does for the vector mask register in AVX2 case). Jakub
Re: Patch ping
On Mon, Jan 13, 2014 at 7:26 PM, Kirill Yukhin wrote: >> > Kirill, is it possible for you to test the patch in the simulator? Do >> > we have a testcase in gcc's testsuite that can be used to check this >> > patch? >> >> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*. > This tests are for built-in generation. The issue is connected to > auto code gen. > > It seems to be working, we have for hss2a.fppized.f: > .L402: > vmovdqu64 (%rdi,%rax), %zmm1 > kmovw %k1, %k3 > kmovw %k1, %k2 > kmovw %k1, %k4 > kmovw %k1, %k5 > addl$1, %esi > vpgatherdd npwrx.4971-4(,%zmm1,4), %zmm0{%k3} > vpgatherdd (%r10,%zmm1,4), %zmm2{%k2} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm7, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r11,%rax) > vpgatherdd npwry.4973-4(,%zmm1,4), %zmm0{%k4} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm6, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r9,%rax) > vpgatherdd npwrz.4975-4(,%zmm1,4), %zmm0{%k5} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm5, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r14,%rax) > vpaddd %zmm2, %zmm4, %zmm0 > vmovdqa64 %zmm0, (%r15,%rax) > addq$64, %rax > cmpl%esi, %edx > ja .L402 An unrelated observation: gcc should figure out that %k1 mask register can be used in all gather insns and avoid unnecessary copies at the beginning of the loop. Uros.
Re: Patch ping
On Mon, Jan 13, 2014 at 7:26 PM, Kirill Yukhin wrote: >> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote: >> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek wrote: >> > Kirill, is it possible for you to test the patch in the simulator? Do >> > we have a testcase in gcc's testsuite that can be used to check this >> > patch? >> >> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*. > This tests are for built-in generation. The issue is connected to > auto code gen. > > It seems to be working, we have for hss2a.fppized.f: > .L402: > vmovdqu64 (%rdi,%rax), %zmm1 > kmovw %k1, %k3 > kmovw %k1, %k2 > kmovw %k1, %k4 > kmovw %k1, %k5 > addl$1, %esi > vpgatherdd npwrx.4971-4(,%zmm1,4), %zmm0{%k3} > vpgatherdd (%r10,%zmm1,4), %zmm2{%k2} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm7, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r11,%rax) > vpgatherdd npwry.4973-4(,%zmm1,4), %zmm0{%k4} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm6, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r9,%rax) > vpgatherdd npwrz.4975-4(,%zmm1,4), %zmm0{%k5} > vpmulld %zmm3, %zmm0, %zmm0 > vpaddd %zmm5, %zmm0, %zmm0 > vmovdqu32 %zmm0, (%r14,%rax) > vpaddd %zmm2, %zmm4, %zmm0 > vmovdqa64 %zmm0, (%r15,%rax) > addq$64, %rax > cmpl%esi, %edx > ja .L402 > > So, I vote that patch is working. Well, OK for mainline, then. Thanks, Uros.
Re: Patch ping
Hello,

On 13 Jan 09:35, Jakub Jelinek wrote:
> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek wrote:
> > Kirill, is it possible for you to test the patch in the simulator? Do
> > we have a testcase in gcc's testsuite that can be used to check this
> > patch?
>
> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.

These tests are for built-in generation. The issue is connected to the auto code generation.

It seems to be working, we have for hss2a.fppized.f:

.L402:
        vmovdqu64   (%rdi,%rax), %zmm1
        kmovw       %k1, %k3
        kmovw       %k1, %k2
        kmovw       %k1, %k4
        kmovw       %k1, %k5
        addl        $1, %esi
        vpgatherdd  npwrx.4971-4(,%zmm1,4), %zmm0{%k3}
        vpgatherdd  (%r10,%zmm1,4), %zmm2{%k2}
        vpmulld     %zmm3, %zmm0, %zmm0
        vpaddd      %zmm7, %zmm0, %zmm0
        vmovdqu32   %zmm0, (%r11,%rax)
        vpgatherdd  npwry.4973-4(,%zmm1,4), %zmm0{%k4}
        vpmulld     %zmm3, %zmm0, %zmm0
        vpaddd      %zmm6, %zmm0, %zmm0
        vmovdqu32   %zmm0, (%r9,%rax)
        vpgatherdd  npwrz.4975-4(,%zmm1,4), %zmm0{%k5}
        vpmulld     %zmm3, %zmm0, %zmm0
        vpaddd      %zmm5, %zmm0, %zmm0
        vmovdqu32   %zmm0, (%r14,%rax)
        vpaddd      %zmm2, %zmm4, %zmm0
        vmovdqa64   %zmm0, (%r15,%rax)
        addq        $64, %rax
        cmpl        %esi, %edx
        ja          .L402

So, I vote that patch is working.

--
Thanks, K
Re: [Patch] Remove references to non-existent tree-flow.h file
On 01/09/14 10:45, Steve Ellcey wrote: While looking at PR 59335 (plugin doesn't build) I saw the comments about tree-flow.h and tree-flow-inline.h not existing anymore. While these files have been removed there are still some references to them in Makefile.in, doc/tree-ssa.texi, and a couple of source files. This patch removes the references to these now-nonexistent files. OK to checkin? Steve Ellcey sell...@mips.com 2014-01-09 Steve Ellcey * Makefile.in (TREE_FLOW_H): Remove. (TREE_SSA_H): Add files names from tree-flow.h. * doc/tree-ssa.texi (Annotations): Remove reference to tree-flow.h * tree.h: Remove tree-flow.h reference. * hash-table.h: Remove tree-flow.h reference. * tree-ssa-loop-niter.c (dump_affine_iv): Replace tree-flow.h reference with tree-ssa-loop.h. Yes, this is fine. jeff
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Thu, Jan 9, 2014 at 7:48 PM, Yangfei (Felix) wrote: > And here is the xtensa configuration tested (include/xtensa-config.h): > > #define XCHAL_HAVE_BE 0 > #define XCHAL_HAVE_LOOPS 1 Hi Felix, I like this patch, and expect I will approve it. However, I would like you to do two more things before I do: 1. Ensure it doesn't generate zcl's when: #define XCHAL_HAVE_LOOPS 0 2. Ensure it doesn't produce loop bodies that contain ret, retw, ret.n or retw.n as the last instruction. It might be easier to just disallow them in loop bodies entirely though. Thanks!
Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)
On Mon, Jan 13, 2014 at 05:32:26PM +0100, Marek Polacek wrote: > This doesn't really fix the PR, but solves a related issue, where we > have e.g. > struct U {}; > static struct U b[6]; > > int foo (struct U *p, struct U *q) > { > return q - p; > } > int main() > { > return foo (&b[0], &b[4]); > } > Such a program SIGFPEs at runtime. But subtraction of pointers to empty > structures/unions doesn't really make sense and this patch forbids that. > Note that GCC permits a structure/union to have no members, but it's only > an extension, in C11 it's undefined behavior. > > Regtested/bootstrapped on x86_64, ok for trunk? The patch will need some tweaking, I realized that e.g. for struct S { union {}; }; it doesn't do the right thing... Marek
Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)
On 01/13/2014 05:32 PM, Marek Polacek wrote: This doesn't really fix the PR, but solves a related issue, where we have e.g. struct U {}; static struct U b[6]; int foo (struct U *p, struct U *q) { return q - p; } int main() { return foo (&b[0], &b[4]); } Such a program SIGFPEs at runtime. But subtraction of pointers to empty structures/unions doesn't really make sense and this patch forbids that. Note that GCC permits a structure/union to have no members, but it's only + if (pointer_to_empty_aggr_p (TREE_TYPE (orig_op1))) +error_at (loc, "arithmetic on pointer to an empty aggregate"); You need to check the size of the aggregate, not if it has no members. With your patch applied, if the struct definition in your test case is changed to this: struct U { char empty[0]; }; it still compiles and fails at run time. Empty structs have size 1 in C++, but structs with a zero-length array have size 0, so the C++ compiler should be changed as well. -- Florian Weimer / Red Hat Product Security Team
[C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)
This doesn't really fix the PR, but solves a related issue, where we have e.g. struct U {}; static struct U b[6]; int foo (struct U *p, struct U *q) { return q - p; } int main() { return foo (&b[0], &b[4]); } Such a program SIGFPEs at runtime. But subtraction of pointers to empty structures/unions doesn't really make sense and this patch forbids that. Note that GCC permits a structure/union to have no members, but it's only an extension, in C11 it's undefined behavior. Regtested/bootstrapped on x86_64, ok for trunk? 2014-01-13 Marek Polacek PR c/58346 c/ * c-typeck.c (pointer_to_empty_aggr_p): New function. (pointer_diff): Give an error on arithmetic on pointer to an empty aggregate. testsuite/ * gcc.dg/pr58346.c: New test. --- gcc/c/c-typeck.c.mp 2014-01-13 15:47:01.316105676 +0100 +++ gcc/c/c-typeck.c2014-01-13 16:03:35.513081392 +0100 @@ -3427,6 +3427,18 @@ parser_build_binary_op (location_t locat return result; } + +/* Return true if T is a pointer to an empty struct/union. */ + +static bool +pointer_to_empty_aggr_p (tree t) +{ + t = strip_pointer_operator (t); + if (!RECORD_OR_UNION_TYPE_P (t)) +return false; + return TYPE_FIELDS (t) == NULL_TREE; +} + /* Return a tree for the difference of pointers OP0 and OP1. The resulting tree has type int. */ @@ -3536,6 +3548,9 @@ pointer_diff (location_t loc, tree op0, /* This generates an error if op0 is pointer to incomplete type. */ op1 = c_size_in_bytes (target_type); + if (pointer_to_empty_aggr_p (TREE_TYPE (orig_op1))) +error_at (loc, "arithmetic on pointer to an empty aggregate"); + /* Divide by the size, in easiest possible way. 
*/ result = fold_build2_loc (loc, EXACT_DIV_EXPR, inttype, op0, convert (inttype, op1)); --- gcc/testsuite/gcc.dg/pr58346.c.mp 2014-01-13 15:48:20.011420141 +0100 +++ gcc/testsuite/gcc.dg/pr58346.c 2014-01-13 16:01:41.741713601 +0100 @@ -0,0 +1,21 @@ +/* PR c/58346 */ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" } */ + +struct U {}; +static struct U b[6]; +static struct U **s1, **s2; + +int +foo (struct U *p, struct U *q) +{ + return q - p; /* { dg-error "arithmetic on pointer to an empty aggregate" } */ +} + +void +bar (void) +{ + __PTRDIFF_TYPE__ d = s1 - s2; /* { dg-error "arithmetic on pointer to an empty aggregate" } */ + __asm volatile ("" : "+g" (d)); + foo (&b[0], &b[4]); +} Marek
Re: [PATCH] Fix for PR57698
On Fri, Jul 12, 2013 at 3:16 PM, Sriraman Tallam wrote: > Patch attached to fix this: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57698 > > Here is what is going on. In rev. 200179, this change to tree-inline.c > > Index: tree-inline.c > === > --- tree-inline.c (revision 200178) > +++ tree-inline.c (revision 200179) > @@ -3905,8 +3905,6 @@ > for inlining, but we can't do that because frontends overwrite > the body. */ > && !cg_edge->callee->local.redefined_extern_inline > - /* Avoid warnings during early inline pass. */ > - && cgraph_global_info_ready > /* PR 20090218-1_0.c. Body can be provided by another module. */ > && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto)) > { > > made inline failure errors during early inlining reportable. Now, > this function is called when the early_inliner calls > optimize_inline_calls. The reason for the failure, > CIF_INDIRECT_UNKNOWN_CALL, should not be reported because it is not a > valid reason,(see can_inline_edge_p in ipa-inline.c for the list of > reasons we intend to report) but it gets reported because of the above > change. > > > The reported bug happens only when optimization is turned on as the > early inliner pass invokes incremental inlining which calls > optimize_inline_calls and triggers the above failure. > > So, the fix is then as simple as: > > Index: tree-inline.c > === > --- tree-inline.c (revision 200912) > +++ tree-inline.c (working copy) > @@ -3905,6 +3905,10 @@ expand_call_inline (basic_block bb, gimple stmt, c > for inlining, but we can't do that because frontends overwrite > the body. */ > && !cg_edge->callee->local.redefined_extern_inline > + /* During early inline pass, report only when optimization is > +not turned on. */ > + && (cgraph_global_info_ready > + || !optimize) > /* PR 20090218-1_0.c. Body can be provided by another module. */ > && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto)) > { > > Seems like the right fix to me. Ok? 
The whole patch with test case > included is attached. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59789 -- H.J.
Re: Patch ping
On Mon, Jan 13, 2014 at 08:15:11AM -0700, Jeff Law wrote: > On 01/13/14 01:07, Jakub Jelinek wrote: > >I'd like to ping 2 patches: > > > >http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html > >- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than > > memory load after optimization (I'd like to keep the current > > patch for the reasons mentioned there, but also add this patch) > I'd tend to think this is 4.10/5.0 material. Unless (for example), > you've got a PR where this makes a significant difference in compile > time. Ok, will defer it then. Jakub
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
On 10-01-14 12:39, Richard Earnshaw wrote: >>Consequently, you'll need to add a patch for AArch64 which has two >>registers clobbered by PLT-based calls. >> > >Thanks for pointing that out. That's r16 and r17, right? I can propose the hook >for AArch64, once we all agree on how the hook should look. > Yes; and thanks! Hi Richard, I'm posting this patch that implements the TARGET_FN_OTHER_HARD_REG_USAGE hook for aarch64. It uses the conservative hook format for now. I've built gcc and cc1 with the patch, and observed the impact on this code snippet: ... static int bar (int x) { return x + 3; } int foo (int y) { return y + bar (y); } ... AFAICT, that looks as expected: ... $ gcc fuse-caller-save.c -mno-lra -fno-use-caller-save -O2 -S -o- > 1 $ gcc fuse-caller-save.c -mno-lra -fuse-caller-save -O2 -S -o- > 2 $ diff -u 1 2 --- 1 2014-01-13 16:51:24.0 +0100 +++ 2 2014-01-13 16:51:19.0 +0100 @@ -11,14 +11,12 @@ .global foo .type foo, %function foo: - stp x29, x30, [sp, -32]! + stp x29, x30, [sp, -16]! + mov w1, w0 add x29, sp, 0 - str x19, [sp,16] - mov w19, w0 bl bar - add w0, w0, w19 - ldr x19, [sp,16] - ldp x29, x30, [sp], 32 + ldp x29, x30, [sp], 16 + add w0, w0, w1 ret .size foo, .-foo .section.text.startup,"ax",%progbits ... Btw, the results are the same for -mno-lra and -mlra. I'm just using the -mno-lra version here because the -mlra version of -fuse-caller-save is still in review ( http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00586.html ). Thanks, - Tom 2014-01-11 Tom de Vries * config/aarch64/aarch64.c (TARGET_FN_OTHER_HARD_REG_USAGE): Redefine as aarch64_fn_other_hard_reg_usage. (aarch64_fn_other_hard_reg_usage): New function. 
--- gcc/config/aarch64/aarch64.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3b1f6b5..295fd5d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3287,6 +3287,16 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2) return true; } +/* Implement TARGET_FN_OTHER_HARD_REG_USAGE. */ + +static bool +aarch64_fn_other_hard_reg_usage (struct hard_reg_set_container *regs) +{ + SET_HARD_REG_BIT (regs->set, R16_REGNUM); + SET_HARD_REG_BIT (regs->set, R17_REGNUM); + return true; +} + enum machine_mode aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) { @@ -8472,6 +8482,11 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode, #undef TARGET_FIXED_CONDITION_CODE_REGS #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs +#undef TARGET_FN_OTHER_HARD_REG_USAGE +#define TARGET_FN_OTHER_HARD_REG_USAGE \ + aarch64_fn_other_hard_reg_usage + + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-aarch64.h" -- 1.8.3.2
[PATCH][ARM][committed] Fix typo in arm.h
Hi all, I've committed this obvious typo fix to trunk as r206580. Kyrill 2014-01-13 Kyrylo Tkachov * config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): Fix typo in description.diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 409589d..b815488 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -189,7 +189,7 @@ extern arm_cc arm_current_cc; #define ARM_INVERSE_CONDITION_CODE(X) ((arm_cc) (((int)X) ^ 1)) -/* The maximaum number of instructions that is beneficial to +/* The maximum number of instructions that is beneficial to conditionally execute. */ #undef MAX_CONDITIONAL_EXECUTE #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
Re: Patch ping
On 01/13/14 08:20, Jakub Jelinek wrote: On Mon, Jan 13, 2014 at 08:15:11AM -0700, Jeff Law wrote: On 01/13/14 01:07, Jakub Jelinek wrote: I'd like to ping 2 patches: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html - Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than memory load after optimization (I'd like to keep the current patch for the reasons mentioned there, but also add this patch) I'd tend to think this is 4.10/5.0 material. Unless (for example), you've got a PR where this makes a significant difference in compile time. Ok, will defer it then. Thanks. I've put in my queued folder as well ;-) jeff
Re: Patch ping
On 01/13/14 01:07, Jakub Jelinek wrote: Hi! I'd like to ping 2 patches: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html - Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than memory load after optimization (I'd like to keep the current patch for the reasons mentioned there, but also add this patch) I'd tend to think this is 4.10/5.0 material. Unless (for example), you've got a PR where this makes a significant difference in compile time. jeff
Re: [PING^2][PATCH] -fuse-caller-save - Implement TARGET_FN_OTHER_HARD_REG_USAGE hook for MIPS
On 10-01-14 09:47, Richard Sandiford wrote: Tom de Vries writes: Why not just collect the usage information at the end of final rather than at the beginning, so that all splits during final have been done? If we have a call to a leaf function, the final rtl representation does not contain calls. The problem does not lie in the final pass where the callee is analyzed, but in the caller, where information is used, and where the unsplit call is missing the clobber of r6. Ah, so when you're using this hook in final, you're actually adding in the set of registers that will be clobbered by a future caller's CALL_INSN, as well as the registers that are clobbered by the callee itself? Right. The first part is not the intended usage of the hook, but it was the simplest fix. That seems a bit error-prone, since we don't know at this stage what the future caller will look like. (Things like the target attribute make this harder to predict.) I think it would be cleaner to just calculate the callee-clobbered registers during final and leave the caller to say what it clobbers. Agree. I've rewritten the patch as such. FWIW, I still think it'd be better to collect the set at the end of final (after any final splits) rather than at the beginning. Hmm. I was not aware that splits can happen during final. I'll try to update that patch as well. For other cases (where the usage isn't explicit at the rtl level), why not record the usage in CALL_INSN_FUNCTION_USAGE instead? Right, we could add the r6 clobber that way. But to keep things simple, I've used the hook instead. Why's it simpler though? That's the kind of thing CALL_INSN_FUNCTION_USAGE is there for. It was simpler to implement. But you're right, using CALL_INSN_FUNCTION_USAGE was simple as well. Built and reg-tested on MIPS. OK for stage1? (You've already OK-ed the test-case part). Thanks, - Tom Thanks, Richard 2014-01-12 Radovan Obradovic Tom de Vries * config/mips/mips.c (POST_CALL_TMP_REG): Define. 
(mips_split_call): Use POST_CALL_TMP_REG. (mips_fn_other_hard_reg_usage): New function. (TARGET_FN_OTHER_HARD_REG_USAGE): Define targhook using new function. (mips_expand_call): Add POST_CALL_TMP_REG clobber. * gcc.target/mips/mips.exp: Add use-caller-save to -ffoo/-fno-foo options. * gcc.target/mips/fuse-caller-save.c: New test. --- gcc/config/mips/mips.c | 41 +--- gcc/testsuite/gcc.target/mips/fuse-caller-save.c | 30 + gcc/testsuite/gcc.target/mips/mips.exp | 1 + 3 files changed, 67 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/mips/fuse-caller-save.c diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 617391c..ef7a3f9 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -175,6 +175,11 @@ along with GCC; see the file COPYING3. If not see /* Return the usual opcode for a nop. */ #define MIPS_NOP 0 +/* Temporary register that is used after a call, and suitable for both + MIPS16 and non-MIPS16 code. $4 and $5 are used for returning complex double + values in soft-float code, so $6 is the first suitable candidate. */ +#define POST_CALL_TMP_REG (GP_ARG_FIRST + 2) + /* Classifies an address. ADDRESS_REG @@ -6906,11 +6911,19 @@ mips_expand_call (enum mips_call_type type, rtx result, rtx addr, { rtx orig_addr, pattern, insn; int fp_code; + rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG); fp_code = aux == 0 ? 
0 : (int) GET_MODE (aux); insn = mips16_build_call_stub (result, &addr, args_size, fp_code); if (insn) { + if (TARGET_EXPLICIT_RELOCS + && TARGET_CALL_CLOBBERED_GP + && !find_reg_note (insn, REG_NORETURN, 0)) + CALL_INSN_FUNCTION_USAGE (insn) + = gen_rtx_EXPR_LIST (VOIDmode, + gen_rtx_CLOBBER (VOIDmode, post_call_tmp_reg), + CALL_INSN_FUNCTION_USAGE (insn)); gcc_assert (!lazy_p && type == MIPS_CALL_NORMAL); return insn; } @@ -6966,7 +6979,16 @@ mips_expand_call (enum mips_call_type type, rtx result, rtx addr, pattern = fn (result, addr, args_size); } - return mips_emit_call_insn (pattern, orig_addr, addr, lazy_p); + insn = mips_emit_call_insn (pattern, orig_addr, addr, lazy_p); + if (TARGET_EXPLICIT_RELOCS + && TARGET_CALL_CLOBBERED_GP + && !find_reg_note (insn, REG_NORETURN, 0)) +CALL_INSN_FUNCTION_USAGE (insn) + = gen_rtx_EXPR_LIST (VOIDmode, + gen_rtx_CLOBBER (VOIDmode, post_call_tmp_reg), + CALL_INSN_FUNCTION_USAGE (insn)); + + return insn; } /* Split call instruction INSN into a $gp-clobbering call and @@ -6978,10 +7000,8 @@ mips_split_call (rtx insn, rtx call_pattern) { emit_call_insn (call_pattern); if (!find_reg_note (insn, REG_NORETURN, 0)) -/* Pick a temporary register that is suitable for both MIPS16 and - non-MIPS16 code. $4 and $5 are used for returning complex double - v
Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests
On 13 January 2014 15:51, Kyrill Tkachov wrote: > On 13/01/14 13:57, Christophe Lyon wrote: >> >> Hi Kyrill, >> >> Your patch fixes most of the problems I noticed, however, it makes the >> compiler crash on vld1Q_dupp64 when the target is big-endian: >> --with-target= armeb-none-linux-gnueabihf >> --with-cpu=cortex-a9 >> --with-fpu=neon-fp16 >> >> >> >> /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c: >> In function 'test_vld1Q_dupp64': >> >> /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1: >> error: unrecognizable insn: >> (insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) >> 0) >> (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8)) >> >> /aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624 >> -1 >> (nil)) > > > Hmmm... This seems to be a failure in the vld1Q_dupu64 and vld1Q_dups64 > intrinsics as well that were not part of my crypto patches and were likely > ICEing before that in big-endian. The problem seems to be that we end up splitting > into subregs after register allocation, which causes the ICE. The culprit is > the neon_vld1_dupv2di. I think it can be modified to directly use the hard > registers after reload instead of generating their low and high parts. > You are probably right; before your patch it failed in my configuration because it was trying to #include gnu/stubs-soft.h in the hf configuration. Since you fixed that, the other problem appeared. > I'll test a patch... > Thanks
Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests
On 13/01/14 13:57, Christophe Lyon wrote: Hi Kyrill, Your patch fixes most of the problems I noticed, however, it makes the compiler crash on vld1Q_dupp64 when the target is big-endian: --with-target= armeb-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=neon-fp16 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c: In function 'test_vld1Q_dupp64': /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1: error: unrecognizable insn: (insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 0) (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8)) /aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624 -1 (nil)) Hmmm... This seems to be a failure in the vld1Q_dupu64 and vld1Q_dups64 intrinsics as well that were not part of my crypto patches and were likely ICEing before that in big-endian. The problem seems to be that we end up splitting into subregs after register allocation, which causes the ICE. The culprit is the neon_vld1_dupv2di. I think it can be modified to directly use the hard registers after reload instead of generating their low and high parts. I'll test a patch... 
Thanks, Kyrill /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1: internal compiler error: in extract_insn, at recog.c:2168 0xa9e560 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:109 0xa9e59f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:117 0xa58fef extract_insn(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2168 0xa592ec extract_insn_cached(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2071 0x7e5309 cleanup_subreg_operands(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/final.c:3074 0xa5845f split_insn /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2886 0xa585b7 split_all_insns_noflow() /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2991 0xe31941 arm_reorg /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/config/arm/arm.c:16962 0xa9e240 rest_of_handle_machine_reorg /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3933 0xa9e26e execute /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3963 Christophe. On 10 January 2014 12:31, Richard Earnshaw wrote: On 09/01/14 17:02, Kyrill Tkachov wrote: Hi all, When adding the testsuite options for the crypto tests we need to make sure that don't end up adding -mfloat-abi=softfp to a hard-float target like arm-none-linux-gnueabihf. This patch adds that code to figure out which -mfpu/-mfloat-abi combination to use in a similar approach to the NEON tests. This patch addresses the same failures that Christophe mentioned in http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00375.html but with this patch we can get those tests to PASS on arm-none-linux-gnueabihf instead of being just UNSUPPORTED. Tested arm-none-linux-gnueabihf and arm-none-eabi. Ok for trunk? Thanks, Kyrill 2014-01-09 Kyrylo Tkachov * lib/target-supports.exp (check_effective_target_arm_crypto_ok_nocache): New. (check_effective_target_arm_crypto_ok): Use above procedure. 
(add_options_for_arm_crypto): Use et_arm_crypto_flags. OK. R.
[PATCH] remove some old code from ansidecl.h
ansidecl.h still defines a number of macros which I think are now obsolete. I recently removed all uses of these macros from binutils-gdb.git; and there are no more uses in gcc. So, I'd like to propose removing the old macros entirely. This patch removes the last uses of PARAMS from include, and the last uses of the obsolete VA_* wrapper macros from libiberty. Then, it removes many obsolete macro definitions from ansidecl.h. I tested this by rebuilding gcc and binutils-gdb with the patch. Note that even if I missed a use of one of the macros, the consequences are small, as the fix is always trivial. 2014-01-13 Tom Tromey * ansidecl.h (ANSI_PROTOTYPES, PTRCONST, LONG_DOUBLE, PARAMS) (VPARAMS, VA_START, VA_OPEN, VA_CLOSE, VA_FIXEDARG, CONST) (VOLATILE, SIGNED, PROTO, EXFUN, DEFUN, DEFUN_VOID, AND, DOTS) (NOARGS): Don't define. * libiberty.h (expandargv, writeargv): Don't use PARAMS. 2014-01-13 Tom Tromey * _doprint.c (checkit): Use stdarg, not VA_* macros. * asprintf.c (asprintf): Use stdarg, not VA_* macros. * concat.c (concat_length, concat_copy, concat_copy2, concat) (reconcat): Use stdarg, not VA_* macros. * snprintf.c (snprintf): Use stdarg, not VA_* macros. * vasprintf.c (checkit): Use stdarg, not VA_* macros. * vsnprintf.c (checkit): Use stdarg, not VA_* macros. 
--- include/ChangeLog | 8 +++ include/ansidecl.h| 141 +- include/libiberty.h | 6 +-- libiberty/ChangeLog | 10 libiberty/_doprnt.c | 6 +-- libiberty/asprintf.c | 9 ++-- libiberty/concat.c| 45 +++- libiberty/snprintf.c | 10 ++-- libiberty/vasprintf.c | 8 +-- libiberty/vsnprintf.c | 10 ++-- 10 files changed, 62 insertions(+), 191 deletions(-) diff --git a/include/ansidecl.h b/include/ansidecl.h index 5cd03a7..0fb23bb 100644 --- a/include/ansidecl.h +++ b/include/ansidecl.h @@ -1,6 +1,6 @@ /* ANSI and traditional C compatability macros Copyright 1991, 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001, - 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010 + 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2013 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -24,93 +24,16 @@ Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. Macro ANSI C definition Traditional C definition - - -- --- - -- - ANSI_PROTOTYPES 1 not defined PTR `void *'`char *' - PTRCONST`void *const' `char *' - LONG_DOUBLE `long double' `double' const not defined `' volatilenot defined `' signed not defined `' - VA_START(ap, var) va_start(ap, var) va_start(ap) - - Note that it is safe to write "void foo();" indicating a function - with no return value, in all K+R compilers we have been able to test. - - For declaring functions with prototypes, we also provide these: - - PARAMS ((prototype)) - -- for functions which take a fixed number of arguments. Use this - when declaring the function. When defining the function, write a - K+R style argument list. For example: - - char *strcpy PARAMS ((char *dest, char *source)); - ... - char * - strcpy (dest, source) -char *dest; -char *source; - { ... } - - - VPARAMS ((prototype, ...)) - -- for functions which take a variable number of arguments. Use - PARAMS to declare the function, VPARAMS to define it. For example: - - int printf PARAMS ((const char *format, ...)); - ... 
- int - printf VPARAMS ((const char *format, ...)) - { - ... - } - - For writing functions which take variable numbers of arguments, we - also provide the VA_OPEN, VA_CLOSE, and VA_FIXEDARG macros. These - hide the differences between K+R and C89 more - thoroughly than the simple VA_START() macro mentioned above. - - VA_OPEN and VA_CLOSE are used *instead of* va_start and va_end. - Immediately after VA_OPEN, put a sequence of VA_FIXEDARG calls - corresponding to the list of fixed arguments. Then use va_arg - normally to get the variable arguments, or pass your va_list object - around. You do not declare the va_list yourself; VA_OPEN does it - for you. - - Here is a complete example: - - int - printf VPARAMS ((const char *format, ...)) - { - int result; - - VA_OPEN (ap, format); - VA_FIXEDARG (ap, const char *, format); - - result = vfprintf (stdout, format, ap); - VA_CLOSE (ap); - - return result; - } - - - You can declare variables either before or after the VA_OPEN, - VA_FIXED
Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests
Hi Kyrill, Your patch fixes most of the problems I noticed, however, it makes the compiler crash on vld1Q_dupp64 when the target is big-endian: --with-target= armeb-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=neon-fp16 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c: In function 'test_vld1Q_dupp64': /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1: error: unrecognizable insn: (insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 0) (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8)) /aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624 -1 (nil)) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1: internal compiler error: in extract_insn, at recog.c:2168 0xa9e560 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:109 0xa9e59f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:117 0xa58fef extract_insn(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2168 0xa592ec extract_insn_cached(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2071 0x7e5309 cleanup_subreg_operands(rtx_def*) /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/final.c:3074 0xa5845f split_insn /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2886 0xa585b7 split_all_insns_noflow() /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2991 0xe31941 arm_reorg /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/config/arm/arm.c:16962 0xa9e240 rest_of_handle_machine_reorg /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3933 0xa9e26e execute /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3963 Christophe. 
On 10 January 2014 12:31, Richard Earnshaw wrote: > On 09/01/14 17:02, Kyrill Tkachov wrote: >> Hi all, >> >> When adding the testsuite options for the crypto tests we need to make sure >> that >> don't end up adding -mfloat-abi=softfp to a hard-float target like >> arm-none-linux-gnueabihf. This patch adds that code to figure out which >> -mfpu/-mfloat-abi combination to use in a similar approach to the NEON tests. >> >> This patch addresses the same failures that Christophe mentioned in >> http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00375.html >> but with this patch we can get those tests to PASS on >> arm-none-linux-gnueabihf >> instead of being just UNSUPPORTED. >> >> Tested arm-none-linux-gnueabihf and arm-none-eabi. >> >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> >> 2014-01-09 Kyrylo Tkachov >> >> * lib/target-supports.exp >> (check_effective_target_arm_crypto_ok_nocache): New. >> (check_effective_target_arm_crypto_ok): Use above procedure. >> (add_options_for_arm_crypto): Use et_arm_crypto_flags. >> >> > > OK. > > R. > >
Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.
On Mon, Jan 13, 2014 at 02:37:38PM +0100, Richard Biener wrote: > 2014-01-13 Richard Biener > > PR tree-optimization/58921 > PR tree-optimization/59006 > * tree-vect-loop-manip.c (vect_loop_versioning): Remove code > hoisting invariant stmts. > * tree-vect-stmts.c (vectorizable_load): Insert the splat of > invariant loads on the preheader edge if possible. > > * gcc.dg/torture/pr58921.c: New testcase. > * gcc.dg/torture/pr59006.c: Likewise. > * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases. Looks good to me. If you want, I can add another bool to loop_vinfo, which would say if in the vectorized loop could be aliasing preventing the hoisting (i.e. set to false always, unless the loop->simdlen > 0, when it would be set if we would without loop->simdlen > 0 use versioning for alias or punting, but loop->simdlen > 0 resulted in vectorization of the loop anyway). Then, as a follow-up we could use that predicate instead of LOOP_REQUIRES_VERSIONING_FOR_ALIAS in vectorizable_load. Jakub
Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.
On Wed, 27 Nov 2013, Jakub Jelinek wrote: > On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote: > > Hmm. I'm still thinking that we should handle this during the regular > > transform step. > > I wonder if it can't be done instead just in vectorizable_load, > if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is > invariant, just emit the (broadcasted) load not inside of the loop, but on > the loop preheader edge. So this implements this suggestion, XFAILing the no longer handled cases. For example we get _94 = *b_8(D); vect_cst_.18_95 = {_94, _94, _94, _94}; _99 = prolog_loop_adjusted_niters.9_132 * 4; vectp_a.22_98 = a_6(D) + _99; ivtmp.43_77 = (unsigned long) vectp_a.22_98; : # ivtmp.41_67 = PHI # ivtmp.43_71 = PHI vect__10.19_97 = vect_cst_.18_95 + { 1, 1, 1, 1 }; _76 = (void *) ivtmp.43_71; MEM[base: _76, offset: 0B] = vect__10.19_97; ... instead of having hoisted *b_8 + 1 as scalar computation. Not sure why LIM doesn't hoist the vector variant later. vect__10.19_97 = vect_cst_.18_95 + vect_cst_.20_96; invariant up to level 1, cost 1. ah, the cost thing. Should be "improved" to see that hoisting reduces the number of live SSA names in the loop. Eventually lower_vector_ssa could optimize vector to scalar code again ... (ick). Bootstrap / regtest running on x86_64. Comments? Thanks, Richard. 2014-01-13 Richard Biener PR tree-optimization/58921 PR tree-optimization/59006 * tree-vect-loop-manip.c (vect_loop_versioning): Remove code hoisting invariant stmts. * tree-vect-stmts.c (vectorizable_load): Insert the splat of invariant loads on the preheader edge if possible. * gcc.dg/torture/pr58921.c: New testcase. * gcc.dg/torture/pr59006.c: Likewise. * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases. 
Index: gcc/tree-vect-loop-manip.c === *** gcc/tree-vect-loop-manip.c (revision 206576) --- gcc/tree-vect-loop-manip.c (working copy) *** vect_loop_versioning (loop_vec_info loop *** 2435,2507 } } - - /* Extract load statements on memrefs with zero-stride accesses. */ - - if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) - { - /* In the loop body, we iterate each statement to check if it is a load. -Then we check the DR_STEP of the data reference. If DR_STEP is zero, -then we will hoist the load statement to the loop preheader. */ - - basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); - int nbbs = loop->num_nodes; - - for (int i = 0; i < nbbs; ++i) - { - for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]); - !gsi_end_p (si);) - { - gimple stmt = gsi_stmt (si); - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); - - if (is_gimple_assign (stmt) - && (!dr - || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr) - { - bool hoist = true; - ssa_op_iter iter; - tree var; - - /* We hoist a statement if all SSA uses in it are defined -outside of the loop. */ - FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE) - { - gimple def = SSA_NAME_DEF_STMT (var); - if (!gimple_nop_p (def) - && flow_bb_inside_loop_p (loop, gimple_bb (def))) - { - hoist = false; - break; - } - } - - if (hoist) - { - if (dr) - gimple_set_vuse (stmt, NULL); - - gsi_remove (&si, false); - gsi_insert_on_edge_immediate (loop_preheader_edge (loop), - stmt); - - if (dump_enabled_p ()) - { - dump_printf_loc - (MSG_NOTE, vect_location, - "hoisting out of the vectorized loop: "); - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); - dump_printf (MSG_NOTE, "\n"); - } - continue; - } - } - gsi_next (&si); - } - } - } - /* End loop-exit-fixes after versioning. */ if (cond_expr_stmt_list) --- 2435,2440 Index: gcc/tree-vect-stmts.c === *** gcc/tree-vect-stmts.c (revision 206576) --- gcc/tree-vect-stmts.c (working copy) *** vectorizable_loa
Re: [PATCH] Fix ifcvt (PR rtl-optimization/58668)
Hi Jakub, I can confirm it's OK now. Thanks, Christophe. On 10 January 2014 17:56, Christophe Lyon wrote: > On 10 January 2014 17:45, Jakub Jelinek wrote: >> On Fri, Jan 10, 2014 at 05:44:22PM +0100, Christophe Lyon wrote: >>> It seems this patch causes several regressions in gfortran on ARM too: >>> gfortran.dg/default_format_1.f90 >>> gfortran.dg/default_format_denormal_1.f90 >>> gfortran.dg/fmt_bz_bn.f >>> gfortran.dg/fmt_read_bz_bn.f90 >>> gfortran.dg/g77/f77-edit-t-in.f >>> gfortran.dg/list_read_4.f90 >>> gfortran.dg/namelist_11.f >>> gfortran.dg/past_eor.f90 >>> gfortran.dg/read_2.f90 >>> gfortran.dg/read_float_2.f03 >>> gfortran.dg/read_float_3.f90 >>> gfortran.dg/read_float_4.f90 >>> now fail after this patch. >> >> Even after the http://gcc.gnu.org/r206456 fix? >> > I don't know yet. My validations are still catching up with the backlog. > I'll tell you shortly.
Re: wide-int, wide
On Sat, Nov 23, 2013 at 8:23 PM, Mike Stump wrote: > Richi has asked that we break the wide-int patch so that the individual port > and front end maintainers can review their parts without having to go through > the entire patch. This patch covers the new wide-int code. > > Ok? I know the patch is not up-to-date. I've looked at the wide-int.h pieces on the branch repeatedly - more eyes on .cc bits appreciated. Ok for stage1. Thanks, Richard.
Re: wide-int, fold
On Sat, Nov 23, 2013 at 8:21 PM, Mike Stump wrote: > Richi has asked that we break the wide-int patch so that the individual port > and front end maintainers can review their parts without having to go through > the entire patch. This patch covers the constant folding code. > > Ok? Ok for stage1. Thanks, Richard.
[PATCH] Fix test case vect-nop-move.c
Hello, there is another test case that is missing the necessary check_vect() runtime check. Tested on i686-pc-linux-gnu. OK for trunk? Regards Bernd. patch-vect-nop-move.diff Description: Binary data
Re: [PATCH] Fix unaligned access generated by IVOPTS
On Mon, Jan 13, 2014 at 11:37 AM, Eric Botcazou wrote: >> Note that this now lets unaligned vector moves slip through as >> their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this >> fact, so is anything which dereferences a type with an aligned >> attribute lowering its alignment. >> >> Which of course raises the question what the function is >> supposed to verify alignment against - given that it is only >> queried for STRICT_ALIGNMENT targets I would guess >> it wants to verify against mode alignment (historically >> at least ...). Not sure how this observation relates to the >> bug you want to fix though. > > Yes, it was the mode, but on STRICT_ALIGNMENT targets types must be as aligned > as their mode (unless you previously under-aligned the type and knew what you > were doing when you did it...). Yeah, the vectorizer first querying target capabilities and then under-aligning the vector type probably qualifies here. > The bug is that, for BLKmode, you really need > to look at the type to have the alignment. Of course. >> Still the patch is an improvement and thus ok. > > Thanks. > > -- > Eric Botcazou
Re: [PATCH] Avoid introducing undefined behavior in sccp (PR tree-optimization/59387)
On Fri, 10 Jan 2014, Jakub Jelinek wrote: > Hi! > > If folded_casts is true, sccp can introduce undefined behavior even when > there was none in the original loop, e.g. all actual additions performed in > unsigned type and then cast back to signed. > > The following patch fixes that by turning the arithmetic stmts added by sccp > use unsigned operations if folded_casts and def's type has undefined > overflow behavior. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2014-01-10 Jakub Jelinek > > PR tree-optimization/59387 > * tree-scalar-evolution.c: Include gimple-fold.h and gimplify-me.h. > (scev_const_prop): If folded_casts and type has undefined overflow, > use force_gimple_operand instead of force_gimple_operand_gsi and > for each added stmt if it is assign with > arith_code_with_undefined_signed_overflow, call > rewrite_to_defined_overflow. > * tree-ssa-loop-im.c: Don't include gimplify-me.h, include > gimple-fold.h instead. > (arith_code_with_undefined_signed_overflow, > rewrite_to_defined_overflow): Moved to ... > * gimple-fold.c (arith_code_with_undefined_signed_overflow, > rewrite_to_defined_overflow): ... here. No longer static. > Include gimplify-me.h. > * gimple-fold.h (arith_code_with_undefined_signed_overflow, > rewrite_to_defined_overflow): New prototypes. > > * gcc.c-torture/execute/pr59387.c: New test. > > --- gcc/tree-scalar-evolution.c.jj2014-01-08 17:44:57.596582925 +0100 > +++ gcc/tree-scalar-evolution.c 2014-01-10 15:46:55.355915072 +0100 > @@ -286,6 +286,8 @@ along with GCC; see the file COPYING3. 
> #include "dumpfile.h" > #include "params.h" > #include "tree-ssa-propagate.h" > +#include "gimple-fold.h" > +#include "gimplify-me.h" > > static tree analyze_scalar_evolution_1 (struct loop *, tree, tree); > static tree analyze_scalar_evolution_for_address_of (struct loop *loop, > @@ -3409,7 +3411,7 @@ scev_const_prop (void) > { >edge exit; >tree def, rslt, niter; > - gimple_stmt_iterator bsi; > + gimple_stmt_iterator gsi; > >/* If we do not know exact number of iterations of the loop, we cannot >replace the final value. */ > @@ -3424,7 +3426,7 @@ scev_const_prop (void) >/* Ensure that it is possible to insert new statements somewhere. */ >if (!single_pred_p (exit->dest)) > split_loop_exit_edge (exit); > - bsi = gsi_after_labels (exit->dest); > + gsi = gsi_after_labels (exit->dest); > >ex_loop = superloop_at_depth (loop, > loop_depth (exit->dest->loop_father) + 1); > @@ -3447,7 +3449,9 @@ scev_const_prop (void) > continue; > } > > - def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); > + bool folded_casts; > + def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, > + &folded_casts); > def = compute_overall_effect_of_inner_loop (ex_loop, def); > if (!tree_does_not_contain_chrecs (def) > || chrec_contains_symbols_defined_in_loop (def, ex_loop->num) > @@ -3485,10 +3489,38 @@ scev_const_prop (void) > def = unshare_expr (def); > remove_phi_node (&psi, false); > > - def = force_gimple_operand_gsi (&bsi, def, false, NULL_TREE, > - true, GSI_SAME_STMT); > + if (TREE_CODE (def) == INTEGER_CST && TREE_OVERFLOW (def)) TREE_OVERFLOW_P (), but it seems to me that the SCEV machinery should do this at a good place (like where it finally records the result into its cache before returning it, at set_and_end: of analyze_scalar_evolution_1). > + def = drop_tree_overflow (def); > + > + /* If def's type has undefined overflow and there were folded > + casts, rewrite all stmts added for def into arithmetics > + with defined overflow behavior. 
*/ > + if (folded_casts && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def))) > + { > + gimple_seq stmts; > + gimple_stmt_iterator gsi2; > + def = force_gimple_operand (def, &stmts, true, NULL_TREE); > + gsi2 = gsi_start (stmts); > + while (!gsi_end_p (gsi2)) > + { > + gimple stmt = gsi_stmt (gsi2); > + gsi_next (&gsi2); > + if (is_gimple_assign (stmt) > + && arith_code_with_undefined_signed_overflow > + (gimple_assign_rhs_code (stmt))) > + gsi_insert_seq_before (&gsi, > +rewrite_to_defined_overflow (stmt), > +GSI_SAME_STMT); Hmm, stmt is still in the 'stmts' sequence here, I think you should gsi_remove it before inserting it elsewhere. > + else > + gsi_insert_b
Re: [PATCH] Fix unaligned access generated by IVOPTS
> Note that this now lets unaligned vector moves slip through as
> their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this
> fact, so is anything which dereferences a type with an aligned
> attribute lowering its alignment.
>
> Which of course raises the question what the function is
> supposed to verify alignment against - given that it is only
> queried for STRICT_ALIGNMENT targets I would guess
> it wants to verify against mode alignment (historically
> at least ...).  Not sure how this observation relates to the
> bug you want to fix though.

Yes, it was the mode, but on STRICT_ALIGNMENT targets types must be as
aligned as their mode (unless you previously under-aligned the type and
knew what you were doing when you did it...).  The bug is that, for
BLKmode, you really need to look at the type to have the alignment.

> Still the patch is an improvement and thus ok.

Thanks.

--
Eric Botcazou
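The "type with an aligned attribute lowering its alignment" case discussed above can be illustrated at the source level with a small C example — the struct here is hypothetical, not from the patch. A packed struct lowers the alignment of its int member below the natural alignment of the int's mode, which is exactly the kind of access may_be_unaligned_p must flag on STRICT_ALIGNMENT targets:

```c
#include <assert.h>
#include <stddef.h>

/* A packed struct lowers the alignment of its int field to 1, so
   reading it is a potentially misaligned access; on strict-alignment
   targets the compiler must emit byte loads or a movmisalign-style
   sequence instead of a plain word load.  */
struct __attribute__ ((packed)) rec
{
  char tag;
  int value;                  /* offset 1: misaligned for a 4-byte int */
};

int
read_value (const struct rec *r)
{
  return r->value;            /* compiler-mediated misaligned read */
}
```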
Re: Patch ping
On Mon, 13 Jan 2014, Jakub Jelinek wrote:
> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek wrote:
> > > http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> > > - PR target/59617
> > >   handle gather loads for AVX512 (at least non-masked ones, masked ones
> > >   will need to wait for 5.0 and we need to find how to represent it in
> > >   GIMPLE)
> >
> > This one needs tree-optimization approval first.
>
> Sure, that is why Richard was on To line too ;)

The vectorizer parts are ok.

Richard.
Re: [PATCH] Fix unaligned access generated by IVOPTS
On Sat, Jan 11, 2014 at 12:42 AM, Eric Botcazou wrote:
> [Sorry for dropping the ball here]
>
>> I think that may_be_unaligned_p is just seriously out-dated ... shouldn't it
>> be sth like
>>
>>   get_object_alignment_1 (ref, &align, &bitpos);
>>   if step * BITS_PER_UNIT + bitpos is misaligned
>>   ...
>>
>> or rather all this may_be_unaligned_p stuff should be dropped and IVOPTs
>> should finally generate proper [TARGET_]MEM_REFs instead?  That is,
>> we already handle aliasing fine:
>>
>>   ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff,
>>                         reference_alias_ptr_type (*use->op_p),
>>                         iv, base_hint, data->speed);
>>
>> so just also handle alignment properly by passing down
>> get_object_alignment (*use->op_p) and in create_mem_ref_raw
>> do at the end do the
>>
>>   if (TYPE_MODE (type) != BLKmode
>>       && GET_MODE_ALIGNMENT (TYPE_MODE (type)) > align)
>>     type = build_aligned_type (type, align);
>>
>> for BLKmode we already look at TYPE_ALIGN and as we do not change
>> the access type(?) either the previous code was already wrong or it was
>> fine, so there is nothing to do.
>>
>> So - if you want to give it a try...?
>
> After a bit of pondering, I'm not really thrilled, as this would mean changing
> TARGET_MEM_REF to accept invalid (unaligned) memory references for the target.

AFAIK the expander already handles this if the target can expand it via
movmisalign at least.  One issue with vectorization is that possibly
unaligned vector accesses are not handled/optimized by IVOPTs which is
bad.  Something to re-visit for 4.10.

> But I agree that may_be_unaligned_p is seriously outdated, so the attached
> patch entirely rewrites it, fixing the bug in the process.
>
> Tested on SPARC, SPARC64, IA-64 and ARM, OK for the mainline?

OK.  Note that this now lets unaligned vector moves slip through as
their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this
fact, so is anything which dereferences a type with an aligned
attribute lowering its alignment.

Which of course raises the question what the function is
supposed to verify alignment against - given that it is only
queried for STRICT_ALIGNMENT targets I would guess
it wants to verify against mode alignment (historically
at least ...).  Not sure how this observation relates to the
bug you want to fix though.

Still the patch is an improvement and thus ok.

Thanks,
Richard.

> 2014-01-10  Eric Botcazou
>
>         * builtins.c (get_object_alignment_2): Minor tweak.
>         * tree-ssa-loop-ivopts.c (may_be_unaligned_p): Rewrite.
>
> --
> Eric Botcazou
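At the source level, the "unaligned vector moves" mentioned above can come from code like the following sketch (the function name is made up; this uses GCC's generic vector extension). On targets that provide movmisalign, the memcpy idiom expands to a single unaligned vector load:

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Load four ints from a pointer with no 16-byte alignment guarantee.
   memcpy is the portable idiom; GCC turns it into an unaligned vector
   move where the target supports one (movmisalign), and a scalar or
   byte-wise sequence otherwise.  */
v4si
load_v4si_unaligned (const int *p)
{
  v4si v;
  memcpy (&v, p, sizeof v);
  return v;
}
```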
Re: [AARCH64][PATCH] PR59695
On 11/01/14 23:42, Kugan wrote:
> Hi,
>
> aarch64_build_constant incorrectly truncates the immediate when
> constants are generated with MOVN. This causes coinor-osi tests to fail
> (tracked also in https://bugs.launchpad.net/gcc-linaro/+bug/1263576)
>
> Attached patch fixes this. Also attaching a reduced testcase that
> reproduces this. Tested on aarch64-none-linux-gnu with no new
> regressions. Is this OK for trunk?
>
> Thanks,
> Kugan
>
> gcc/
> +2013-10-15  Matthew Gretton-Dann
> +            Kugan Vivekanandarajah
> +
> +	PR target/59588
> +	* config/aarch64/aarch64.c (aarch64_build_constant): Fix incorrect
> +	truncation.
> +
>
> gcc/testsuite/
> +2014-01-11  Matthew Gretton-Dann
> +            Kugan Vivekanandarajah
> +
> +	PR target/59695
> +	* g++.dg/pr59695.C: New file.
> +
>
> p.txt
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3d32ea5..854666f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2486,7 +2486,7 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
>    if (ncount < zcount)
>      {
>        emit_move_insn (gen_rtx_REG (Pmode, regnum),
> -                      GEN_INT ((~val) & 0x));
> +                      GEN_INT (~((~val) & 0x)));

I think that would be better written as

    GEN_INT (val | ~(HOST_WIDE_INT) 0x);

Note the cast after the ~ to ensure we invert the right number of bits.

Otherwise OK.

R.

>        tval = 0x;
>      }
>    else
> diff --git a/gcc/testsuite/g++.dg/pr59695.C b/gcc/testsuite/g++.dg/pr59695.C
> index e69de29..0da06cb 100644
> --- a/gcc/testsuite/g++.dg/pr59695.C
> +++ b/gcc/testsuite/g++.dg/pr59695.C
> @@ -0,0 +1,125 @@
> +
> +/* PR target/53055 */
> +/* { dg-do run { target aarch64*-*-* } } */
> +/* { dg-options "-O0" } */
> +
> +#define DEFINE_VIRTUALS_FNS(i) virtual void xxx##i () {} \
> +  virtual void foo1_##i () {}\
> +  virtual void foo2_##i () {}\
> +  virtual void foo3_##i () {}\
> +  virtual void foo4_##i () {}\
> +  virtual void foo5_##i () {}\
> +  virtual void foo6_##i () {}\
> +  virtual void foo7_##i () {}\
> +  virtual void foo8_##i () {}\
> +  virtual void foo9_##i () {}\
> +  virtual void foo10_##i () {}\
> +  virtual void foo11_##i () {}\
> +  virtual void foo12_##i () {}\
> +  virtual void foo13_##i () {}\
> +  virtual void foo14_##i () {}\
> +  virtual void foo15_##i () {}\
> +  virtual void foo16_##i () {}\
> +  virtual void foo17_##i () {}\
> +  virtual void foo18_##i () {}\
> +  virtual void foo19_##i () {}\
> +  virtual void foo20_##i () {}\
> +  virtual void foo21_##i () {}\
> +  virtual void foo22_##i () {}\
> +
> +class base_class_2
> +{
> +
> +public:
> +  /* Define lots of virtual functions */
> +  DEFINE_VIRTUALS_FNS (1)
> +  DEFINE_VIRTUALS_FNS (2)
> +  DEFINE_VIRTUALS_FNS (3)
> +  DEFINE_VIRTUALS_FNS (4)
> +  DEFINE_VIRTUALS_FNS (5)
> +  DEFINE_VIRTUALS_FNS (6)
> +  DEFINE_VIRTUALS_FNS (7)
> +  DEFINE_VIRTUALS_FNS (8)
> +  DEFINE_VIRTUALS_FNS (9)
> +  DEFINE_VIRTUALS_FNS (10)
> +  DEFINE_VIRTUALS_FNS (11)
> +  DEFINE_VIRTUALS_FNS (12)
> +  DEFINE_VIRTUALS_FNS (13)
> +  DEFINE_VIRTUALS_FNS (14)
> +  DEFINE_VIRTUALS_FNS (15)
> +  DEFINE_VIRTUALS_FNS (16)
> +  DEFINE_VIRTUALS_FNS (17)
> +  DEFINE_VIRTUALS_FNS (18)
> +  DEFINE_VIRTUALS_FNS (19)
> +  DEFINE_VIRTUALS_FNS (20)
> +
> +  base_class_2();
> +  virtual ~base_class_2 ();
> +};
> +
> +base_class_2::base_class_2()
> +{
> +}
> +
> +base_class_2::~base_class_2 ()
> +{
> +}
> +
> +class base_class_1
> +{
> +public:
> +  virtual ~base_class_1();
> +  base_class_1();
> +};
> +
> +base_class_1::base_class_1()
> +{
> +}
> +
> +base_class_1::~base_class_1()
> +{
> +}
> +
> +class base_Impl_class :
> +  virtual public base_class_2, public base_class_1
> +{
> +public:
> +  base_Impl_class ();
> +  virtual ~base_Impl_class ();
> +};
> +
> +base_Impl_class::base_Impl_class ()
> +{
> +}
> +
> +base_Impl_class::~base_Impl_class ()
> +{
> +}
> +
> +
> +class test_cls : public base_Impl_class
> +{
> +public:
> +  test_cls();
> +  virtual ~test_cls();
> +};
> +
> +test_cls::test_cls()
> +{
> +}
> +
> +test_cls::~test_cls()
> +{
> +}
> +
> +int main()
> +{
> +  test_cls *test = new test_cls;
> +  base_class_2 *p1 = test;
> +
> +  /* PR 53055 destructor thunk offsets are not setup
> +     correctly resulting in crash.  */
> +  delete p1;
> +  return 0;
> +}
> +
>
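The arithmetic behind the MOVN fix above can be checked in plain C. MOVN materializes the bitwise NOT of a 16-bit immediate, so the constant that actually lands in the register is ~((~val) & mask), not (~val) & mask as the buggy code claimed. The mask is elided in the quoted hunk; it is assumed to be 0xffff here, since AArch64 MOVN/MOVK work in 16-bit chunks, and int64_t stands in for HOST_WIDE_INT — both are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* What MOVN Xd, #imm16 produces when imm16 is the low 16 bits of ~val:
   the register holds the bitwise NOT of the immediate.  */
int64_t
movn_result (int64_t val)
{
  int64_t imm16 = (~val) & 0xffff;   /* immediate handed to MOVN */
  return ~imm16;                     /* register contents after MOVN */
}

/* Richard's equivalent, simpler form: val with everything above the
   low 16-bit chunk forced to one.  */
int64_t
movn_result_simplified (int64_t val)
{
  return val | ~(int64_t) 0xffff;
}
```

For a constant whose upper bits are all ones, MOVN alone reproduces the value exactly, which is why the mostly-ones path of aarch64_build_constant starts from it.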
Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs
On Sun, Jan 12, 2014 at 10:51 PM, Trevor Saunders wrote:
> On Sun, Jan 12, 2014 at 02:23:21PM +0100, Richard Biener wrote:
>> On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson wrote:
>> > On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
>> >> 2014-01-09  Jakub Jelinek
>> >>
>> >>     * target-globals.c (save_target_globals): Allocate < 4KB structs using
>> >>     GC in payload of target_globals struct instead of allocating them on
>> >>     the heap and the larger structs separately using GC.
>> >>     * target-globals.h (struct target_globals): Make regs, hard_regs,
>> >>     reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
>> >>     of GTY((skip)) and change type to void *.
>> >>     (reset_target_globals): Cast loads from those fields to corresponding
>> >>     types.
>> >>
>> >> --- gcc/target-globals.h.jj	2014-01-09 19:24:20.0 +0100
>> >> +++ gcc/target-globals.h	2014-01-09 19:39:43.879348712 +0100
>> >> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
>> >>
>> >>  struct GTY(()) target_globals {
>> >>    struct target_flag_state *GTY((skip)) flag_state;
>> >> -  struct target_regs *GTY((skip)) regs;
>> >> +  void *GTY((atomic)) regs;
>> >
>> > I'm not entirely fond of this either, for the obvious reason.  Clearly a
>> > deficiency in gengtype, but after 2 hours of poking around I can see that
>> > it isn't a quick fix.
>> >
>> > I guess I'm ok with the patch, since the use of the target_globals structure
>> > is so restricted.
>>
>> Yeah.  At some time we need a way to specify a finalization hook called
>> if an object is collected and eventually a hook that walks extra roots
>> indirectly reachable via an object (so you can have GC -> heap -> GC
>> memory layouts more easily).
>
> I actually tried to add finalizers a couple weeks ago, but it seems
> pretty non trivial.  ggc seems to basically just allocate by searching
> for the first unmarked block.  It doesn't even sweep unmarked stuff, it
> just marks and then waits for the space to be allocated over.  I believe
> it deals with size by using different pages for each size class?  So even
> if it did sweep it would be somewhat tricky to know what finalizer to
> call.  Perhaps a solution is to have separate pages for each type that
> needs a finalizer, and be able to mark things as being in one of three
> states (in use, needs finalization but not in use, finalized and not in
> use).  That might hurt memory consumption in the short term, but I think
> finalizers will be really useful in getting stuff out of gc memory so
> that's probably not too bad.

I think you would need to have a list of object/finalizer per GC page
and do finalization at sweep_pages () time.  Yes, per-type pools would
also work (for types with finalizers).  Or rework how the GC works -
surely advanced techs like incremental or copying collection might
benefit GCC.

Richard.

> Trev
>
>>
>> Richard.
>>
>> >
>> > r~
>> >
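The "list of object/finalizer per GC page, run at sweep time" idea suggested above can be sketched in a few lines of C. This is a toy model with invented names, nothing like ggc's real data structures (the mark bitmap is reduced to a per-entry flag):

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*finalizer_fn) (void *);

/* Each page keeps a list of (object, finalizer) pairs.  */
struct fin_entry
{
  void *obj;
  finalizer_fn fin;
  int marked;                 /* stand-in for the page's mark bitmap */
  struct fin_entry *next;
};

struct gc_page
{
  struct fin_entry *finalizers;
};

/* Run finalizers of unmarked (dead) objects and drop their entries;
   this would be called once per page from the sweep loop.  */
static void
sweep_page (struct gc_page *pg)
{
  struct fin_entry **pp = &pg->finalizers;
  while (*pp)
    {
      struct fin_entry *e = *pp;
      if (!e->marked)
        {
          e->fin (e->obj);    /* object is dead: finalize it */
          *pp = e->next;
          free (e);
        }
      else
        pp = &e->next;
    }
}

static int finalized_count;

static void
count_finalizer (void *obj)
{
  (void) obj;
  finalized_count++;
}

/* Register two objects, mark only one, sweep: exactly one finalizer
   must run, and the marked object's entry must survive.  */
int
demo_sweep (void)
{
  static int obj_a, obj_b;
  struct gc_page pg = { 0 };
  struct fin_entry *a = malloc (sizeof *a);
  struct fin_entry *b = malloc (sizeof *b);
  a->obj = &obj_a; a->fin = count_finalizer; a->marked = 1; a->next = NULL;
  b->obj = &obj_b; b->fin = count_finalizer; b->marked = 0; b->next = a;
  pg.finalizers = b;
  finalized_count = 0;
  sweep_page (&pg);
  free (pg.finalizers);       /* surviving entry, freed for the demo */
  return finalized_count;
}
```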
Re: Test cases vect-simd-clone-10/12.c keep failing
On Sun, Jan 12, 2014 at 10:53:12PM +0100, Bernd Edlinger wrote:
> Yes, explicit /* { dg-do run } */ works.

Ok, I've committed

2014-01-13  Jakub Jelinek

	* gcc.dg/vect/vect-simd-clone-10.c: Add dg-do run.
	* gcc.dg/vect/vect-simd-clone-12.c: Likewise.

--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(revision 206573)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-12.c	(revision 206573)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-12.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */

then.

	Jakub
Re: Patch ping
On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek wrote:
> > http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> > - PR target/59617
> >   handle gather loads for AVX512 (at least non-masked ones, masked ones
> >   will need to wait for 5.0 and we need to find how to represent it in
> >   GIMPLE)
>
> This one needs tree-optimization approval first.

Sure, that is why Richard was on To line too ;)

> Kirill, is it possible for you to test the patch in the simulator?  Do
> we have a testcase in gcc's testsuite that can be used to check this
> patch?

E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.

	Jakub
Re: Patch ping
On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek wrote:
> http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> - PR target/59617
>   handle gather loads for AVX512 (at least non-masked ones, masked ones
>   will need to wait for 5.0 and we need to find how to represent it in
>   GIMPLE)

This one needs tree-optimization approval first.

Kirill, is it possible for you to test the patch in the simulator?  Do
we have a testcase in gcc's testsuite that can be used to check this
patch?

Uros.
Patch ping
Hi!

I'd like to ping 2 patches:

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
  memory load after optimization (I'd like to keep the current patch
  for the reasons mentioned there, but also add this patch)

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
- PR target/59617
  handle gather loads for AVX512 (at least non-masked ones, masked ones
  will need to wait for 5.0 and we need to find how to represent it in
  GIMPLE)

	Jakub
[committed] Fix #pragma omp atomic/atomic reductions (PR libgomp/59194)
Hi!

When expanding #pragma omp atomic or reduction merging using the
expand_omp_atomic_pipeline loop, we start by fetching the initial value
using a normal memory read, and only in the second and following
iterations use the one from the atomic compare and exchange.  The
initial value is just an optimization: it is better if it is what we'll
want to use, but if it is something different, except perhaps for
floating point exceptions it shouldn't really matter what exact value
we load.  This patch uses __atomic_load_N with MEMMODEL_RELAXED instead
of a normal load.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2014-01-13  Jakub Jelinek

	PR libgomp/59194
	* omp-low.c (expand_omp_atomic_pipeline): Expand the initial load as
	__atomic_load_N if possible.

--- gcc/omp-low.c.jj	2014-01-08 17:45:05.0 +0100
+++ gcc/omp-low.c	2014-01-10 21:12:22.498276852 +0100
@@ -7536,12 +7536,21 @@ expand_omp_atomic_pipeline (basic_block
       loadedi = loaded_val;
     }

+  fncode = (enum built_in_function) (BUILT_IN_ATOMIC_LOAD_N + index + 1);
+  tree loaddecl = builtin_decl_explicit (fncode);
+  if (loaddecl)
+    initial
+      = fold_convert (TREE_TYPE (TREE_TYPE (iaddr)),
+		      build_call_expr (loaddecl, 2, iaddr,
+				       build_int_cst (NULL_TREE,
+						      MEMMODEL_RELAXED)));
+  else
+    initial = build2 (MEM_REF, TREE_TYPE (TREE_TYPE (iaddr)), iaddr,
+		      build_int_cst (TREE_TYPE (iaddr), 0));
   initial
-    = force_gimple_operand_gsi (&si,
-				build2 (MEM_REF, TREE_TYPE (TREE_TYPE (iaddr)),
-					iaddr,
-					build_int_cst (TREE_TYPE (iaddr), 0)),
-				true, NULL_TREE, true, GSI_SAME_STMT);
+    = force_gimple_operand_gsi (&si, initial, true, NULL_TREE, true,
+				GSI_SAME_STMT);

   /* Move the value to the LOADEDI temporary.  */
   if (gimple_in_ssa_p (cfun))

	Jakub
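The shape of the code expand_omp_atomic_pipeline generates corresponds roughly to this hand-written C11 sketch (not compiler output; the function name is invented): the initial value comes from a relaxed atomic load, and if that value happens to be stale, the compare-and-exchange loop simply retries with the value the failed CAS handed back.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomic *loc += val via a CAS loop, with the initial fetch done as
   a relaxed atomic load -- the equivalent of the patch's
   __atomic_load_N (..., MEMMODEL_RELAXED) initial read.  */
long
atomic_add_fetch_sketch (_Atomic long *loc, long val)
{
  long old = atomic_load_explicit (loc, memory_order_relaxed);
  /* On failure the CAS stores the current contents into 'old', so a
     stale (or racing) initial value only costs extra iterations.  */
  while (!atomic_compare_exchange_weak (loc, &old, old + val))
    ;
  return old + val;
}
```

This is why the exact value of the initial load doesn't matter for correctness: a wrong guess just fails the first CAS, exactly as the patch description says.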