Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb
On Sun, 2 Jul 2017, Tom de Vries wrote: > Hi, > > consider this test-case: > ... > unsigned a; > int b, c; > > static int > fn1 (int p1, int p2) > { > return p1 > 2147483647 - p2 ? p1 : p1 + p2; > } > > void > fn2 (void) > { > int j; > a = 30; > for (; a;) > for (; c; b = fn1 (j, 1)) > ; > } > ... > > When compiling the test-case with -Os, just before tail-merge it looks as in > before.pdf. > > During tail-merge, it runs into a sigsegv. > > What happens is the following: > - tail-merge decides to merge blocks 4 and 6, and removes block 6. > - bb8, a predecessor of block 6, is marked as member of > deleted_bb_preds. > - during update_worklist, same_succ_flush_bb is called for bb8 > - same_succ_flush_bb runs into a sigsegv because > BB_SAME_SUCC (bb8) == NULL > - the reason that BB_SAME_SUCC (bb8) == NULL, is because it hit the > bb->loop_father->latch == bb clause in find_same_succ_bb at the start > of the tail-merge pass. > > This patch fixes the sigsegv by doing an early-out in same_succ_flush_bb if > BB_SAME_SUCC () == NULL. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk and gcc-[567]-branch? Ok for trunk and branches. Mind the gcc-6 branch is frozen right now. Thanks, Richard. > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Add DR_BASE_ALIGNMENT
Richard Biener writes: > On Wed, Jun 28, 2017 at 3:40 PM, Richard Sandiford > wrote: >> This patch records the base alignment in data_reference, to avoid the >> second-guessing that was previously done in vect_compute_data_ref_alignment. >> It also makes vect_analyze_data_refs use dr_analyze_innermost, instead >> of having an almost-copy of the same code. >> >> I'd originally tried to do the second part as a standalone patch, >> but on its own it caused us to miscompute the base alignment (due to >> the second-guessing not quite working). This was previously latent >> because the old code set STMT_VINFO_DR_ALIGNED_TO to a byte value, >> whereas it should have been bits. >> >> After the previous patch, the only thing that dr_analyze_innermost >> read from the dr was the DR_REF. I thought it would be better to pass >> that in and make data_reference write-only. This means that callers >> like vect_analyze_data_refs don't have to set any fields first >> (and thus not worry about *which* fields they should set). >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> >> >> 2017-06-28 Richard Sandiford >> >> gcc/ >> * tree-data-ref.h (data_reference): Add a base_alignment field. >> (DR_BASE_ALIGNMENT): New macro. >> (dr_analyze_innermost): Add a tree argument. >> * tree-data-ref.c: Include builtins.h. >> (dr_analyze_innermost): Take the tree reference as argument. >> Set up DR_BASE_ALIGNMENT. >> (create_data_ref): Update call accordingly. >> * tree-predcom.c (find_looparound_phi): Likewise. >> * tree-vectorizer.h (_stmt_vec_info): Add dr_base_alignment. >> (STMT_VINFO_DR_BASE_ALIGNMENT): New macro. >> * tree-vect-data-refs.c: Include tree-cfg.h. >> (vect_compute_data_ref_alignment): Use DR_BASE_ALIGNMENT instead >> of calculating an alignment here. >> (vect_analyze_data_refs): Use dr_analyze_innermost. Record the >> base alignment in STMT_VINFO_DR_BASE_ALIGNMENT. >> >> Index: gcc/tree-data-ref.h >> === >> --- gcc/tree-data-ref.h 2017-06-26 19:41:19.549571836 +0100 >> +++ gcc/tree-data-ref.h 2017-06-28 14:26:19.651051322 +0100 >> @@ -119,6 +119,10 @@ struct data_reference >>/* True when the data reference is in RHS of a stmt. */ >>bool is_read; >> >> + /* The alignment of INNERMOST.base_address, in bits. This is logically >> + part of INNERMOST, but is kept here to avoid unnecessary padding. */ >> + unsigned int base_alignment; >> + > > But then it would be nice to have dr_analyze_innermost take a struct > innermost_loop_behavior * only. That way the vectorizer copy for the > outer loop behavior can just be a innermost_loop_behavior sub-struct > as well and predcom wouldn't need to invent an interesting DR_STMT > either. Yeah, had wondered that too. Comments below led to a new series in which the information ends up being nicely padded, so there was no excuse not to switch to innermost_loop_behavior. > The DR_ accessors wouldn't work on that but I guess they > could be made work with > > inline tree dr_base_address (struct data_reference *dr) { return > dr->innermost.base_address; } > inline tree dr_base_address (struct innermost_loop_behavior *p) { > return p->base_address; } > #define DR_BASE_ADDRESS(DR) (dr_base_address (dr)) > > anyway, that's implementation detail ;) I just went for direct field accesses, hope that's OK. >>/* Behavior of the memory reference in the innermost loop. */ >>struct innermost_loop_behavior innermost; >> >> @@ -139,6 +143,7 @@ #define DR_NUM_DIMENSIONS(DR) DR_AC >> #define DR_IS_READ(DR) (DR)->is_read >> #define DR_IS_WRITE(DR)(!DR_IS_READ (DR)) >> #define DR_BASE_ADDRESS(DR)(DR)->innermost.base_address >> +#define DR_BASE_ALIGNMENT(DR) (DR)->base_alignment >> #define DR_OFFSET(DR) (DR)->innermost.offset >> #define DR_INIT(DR)(DR)->innermost.init >> #define DR_STEP(DR)(DR)->innermost.step >> @@ -322,7 +327,7 @@ #define DDR_DIST_VECT(DDR, I) \ >> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >> >> >> -bool dr_analyze_innermost (struct data_reference *, struct loop *); >> +bool dr_analyze_innermost (struct data_reference *, tree, struct loop *); >> extern bool compute_data_dependences_for_loop (struct loop *, bool, >>vec *, >>vec *, >> Index: gcc/tree-data-ref.c >> === >> --- gcc/tree-data-ref.c 2017-06-28 14:26:12.946306736 +0100 >> +++ gcc/tree-data-ref.c 2017-06-28 14:26:19.651051322 +0100 >> @@ -94,6 +94,7 @@ Software Foundation; either version 3, o >> #include "dumpfile.h" >> #include "tree-affine.h" >> #include "params.h" >> +#include "builtins.h" >> >> static struct dat
Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access
Hi All, Sorry I just realized I never actually sent the updated patch... So here it is :) Regards, Tamar From: gcc-patches-ow...@gcc.gnu.org on behalf of Tamar Christina Sent: Friday, June 9, 2017 4:51:52 PM To: Richard Sandiford Cc: GCC Patches; nd; James Greenhalgh; Richard Earnshaw; Marcus Shawcroft Subject: RE: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access > > + /* Optimize routines for MEM to REG copies. */ if (n < 8 && > > + !REG_P (XEXP (operands[0], 0))) > > This seems to be checking that the address of the original destination > memory isn't a plain base register. Why's it important to reject that case > but > allow e.g. base+offset? this function is (as far as I could tell) only being called with two types of destinations: a location on the stack or a plain register. When the destination is a register such as with void Fun3(struct struct3 foo3) { L3 = foo3; } You run into the issue you had pointed to below where we might write too much. Ideally the constraint I would like is to check if the destination is either a new register (no data) or that the structure was padded. I couldn't figure out how to do this and the gains over the existing code for this case was quite small. So I just disallowed it leaving it to the existing code, which isn't so bad, only 1 extra instruction. > > > + { > > + unsigned int max_align = UINTVAL (operands[2]); > > + max_align = n < max_align ? max_align : n; > > Might be misunderstanding, but isn't max_align always equal to n here, since > n was set by: Correct, previously this patch was made to allow n < 16. These were left over from the cleanup. I'll correct. > > > + result = gen_rtx_IOR (dest_mode, reg, result); > > + } > > + > > +dst = adjust_address (dst, dest_mode, 0); > > +emit_insn (gen_move_insn (dst, result)); > > dest_mode was chosen by smallest_mode_for_size, so can be bigger than n. > Doesn't that mean that we'll write beyond the end of the copy region when > n is an awkward number? Yes, see my answer above. For the other case, when we write onto a location on the stack, this is fine due to the alignment. > > > diff --git a/gcc/expr.c b/gcc/expr.c > > index > > > 91d7ea217229fac62380b5d4b646961bf7c836c1..b1df4651e7942346007cda1cce > 8e > > e5a19297ab16 100644 > > --- a/gcc/expr.c > > +++ b/gcc/expr.c > > @@ -2743,7 +2743,9 @@ copy_blkmode_to_reg (machine_mode mode, > tree > > src) > > > >n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD; > >dst_words = XALLOCAVEC (rtx, n_regs); > > - bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD); > > + bitsize = BITS_PER_WORD; > > + if (SLOW_UNALIGNED_ACCESS (BLKmode, TYPE_ALIGN (TREE_TYPE > (src > > +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD); > > I think this ought to be testing word_mode instead of BLKmode. > (Testing BLKmode doesn't really make sense in general, because the mode > doesn't have a meaningful alignment.) Ah, yes that makes sense. I'll update the patch. New patch is validating and will submit it soon. > > Thanks, > Richard diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd03754bbba9264a6 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13466,7 +13466,6 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, /* Expand movmem, as if from a __builtin_memcpy. Return true if we succeed, otherwise return false. */ - bool aarch64_expand_movmem (rtx *operands) { @@ -13498,16 +13497,55 @@ aarch64_expand_movmem (rtx *operands) base = copy_to_mode_reg (Pmode, XEXP (src, 0)); src = adjust_automodify_address (src, VOIDmode, base, 0); + /* Optimize routines for MEM to REG copies. + This particular function only handles copying to two + destination types: 1) a regular register and 2) the stack. + When writing to a regular register we may end up writting too much in cases + where the register already contains a live value or when no data padding is + happening so we disallow it. */ + if (n < 8 && !REG_P (XEXP (operands[0], 0))) + { + machine_mode mov_mode, dest_mode + = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT); + rtx result = gen_reg_rtx (dest_mode); + emit_insn (gen_move_insn (result, GEN_INT (0))); + + unsigned int shift_cnt = 0; + for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode)) + { + int nearest = 0; + /* Find the mode to use. */ + for (unsigned max = 1; max <= (n - shift_cnt); max *= 2) + nearest = max; + + mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, MODE_INT); + rtx reg = gen_reg_rtx (mov_mode); + + src = adjust_address (src, mov_mode, 0); + emit_insn (gen_move_insn (reg, src)); + src = aarch64_progress_pointer (src); + + reg = gen_rtx_ZERO_EXTEND (dest_mo
Re: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.
Ping From: gcc-patches-ow...@gcc.gnu.org on behalf of Tamar Christina Sent: Monday, June 26, 2017 11:50:51 AM To: James Greenhalgh Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw Subject: Re: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode. Hi all, Here's the re-spun patch. Aside from the grouping of the split patterns it now also uses h register for the fmov for HF when available, otherwise it forces a literal load. Regression tested on aarch64-none-linux-gnu and no regressions. OK for trunk? Thanks, Tamar gcc/ 2017-06-26 Tamar Christina Richard Sandiford * config/aarch64/aarch64.md (mov): Generalize. (*movhf_aarch64, *movsf_aarch64, *movdf_aarch64): Add integer and movi cases. (movi-split-hf-df-sf split, fp16): New. (enabled): Added TARGET_FP_F16INST. * config/aarch64/iterators.md (GPF_HF): New. From: Tamar Christina Sent: Wednesday, June 21, 2017 11:48:33 AM To: James Greenhalgh Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode. > > movi\\t%0.4h, #0 > > - mov\\t%0.h[0], %w1 > > + fmov\\t%s0, %w1 > > Should this not be %h0? The problem is that H registers are only available in ARMv8.2+, I'm not sure what to do about ARMv8.1 given your other feedback Pointing out that the bit patterns between how it's stored in s vs h registers differ. > > > umov\\t%w0, %1.h[0] > > mov\\t%0.h[0], %1.h[0] > > + fmov\\t%s0, %1 > > Likewise, and much more important for correctness as it changes the way the > bit pattern ends up in the register (see table C2-1 in release B.a of the ARM > Architecture Reference Manual for ARMv8-A), here. > > > + * return aarch64_output_scalar_simd_mov_immediate (operands[1], > > + SImode); > > ldr\\t%h0, %1 > > str\\t%h1, %0 > > ldrh\\t%w0, %1 > > strh\\t%w1, %0 > > mov\\t%w0, %w1" > > - [(set_attr "type" > "neon_move,neon_from_gp,neon_to_gp,neon_move,\ > > - f_loads,f_stores,load1,store1,mov_reg") > > - (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")] > > + "&& can_create_pseudo_p () > > + && !aarch64_can_const_movi_rtx_p (operands[1], HFmode) > > + && !aarch64_float_const_representable_p (operands[1]) > > + && aarch64_float_const_rtx_p (operands[1])" > > + [(const_int 0)] > > + "{ > > +unsigned HOST_WIDE_INT ival; > > +if (!aarch64_reinterpret_float_as_int (operands[1], &ival)) > > + FAIL; > > + > > +rtx tmp = gen_reg_rtx (SImode); > > +aarch64_expand_mov_immediate (tmp, GEN_INT (ival)); > > +tmp = simplify_gen_subreg (HImode, tmp, SImode, 0); > > +emit_move_insn (operands[0], gen_lowpart (HFmode, tmp)); > > +DONE; > > + }" > > + [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts, > \ > > +neon_move,f_loads,f_stores,load1,store1,mov_reg") > > + (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")] > > ) > > Thanks, > James
Re: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure.
Ping From: gcc-patches-ow...@gcc.gnu.org on behalf of Tamar Christina Sent: Monday, June 26, 2017 11:49:42 AM To: James Greenhalgh Cc: Richard Sandiford; GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw Subject: Re: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure. Hi All, I've updated patch accordingly. This mostly involves removing the loop to create the ival and removing the *2 code and instead defaulting to 64bit and switching to 128 when needed. Regression tested on aarch64-none-linux-gnu and no regressions. OK for trunk? Thanks, Tamar gcc/ 2017-06-26 Tamar Christina * config/aarch64/aarch64.c (aarch64_simd_container_mode): Add prototype. (aarch64_expand_mov_immediate): Add HI support. (aarch64_reinterpret_float_as_int, aarch64_float_const_rtx_p: New. (aarch64_can_const_movi_rtx_p): New. (aarch64_preferred_reload_class): Remove restrictions of using FP registers for certain SIMD operations. (aarch64_rtx_costs): Added new cost for CONST_DOUBLE moves. (aarch64_valid_floating_const): Add integer move validation. (aarch64_simd_imm_scalar_p): Remove. (aarch64_output_scalar_simd_mov_immediate): Generalize function. (aarch64_legitimate_constant_p): Expand list of supported cases. * config/aarch64/aarch64-protos.h (aarch64_float_const_rtx_p, aarch64_can_const_movi_rtx_p): New. (aarch64_reinterpret_float_as_int): New. (aarch64_simd_imm_scalar_p): Remove. * config/aarch64/predicates.md (aarch64_reg_or_fp_float): New. * config/aarch64/constraints.md (Uvi): New. (Dd): Split into Ds and new Dd. * config/aarch64/aarch64.md (*movsi_aarch64): Add SIMD mov case. (*movdi_aarch64): Add SIMD mov case. From: Tamar Christina Sent: Thursday, June 15, 2017 1:50:19 PM To: James Greenhalgh Cc: Richard Sandiford; GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure. > > This patch is pretty huge, are there any opportunities to further split it to > aid > review? Unfortunately because I'm also changing some constraints it introduced a bit of a dependency cycle. If I were to break it up more, the individual patches won't work on their own anymore. If this is acceptable I can break it up more. > > + ival = zext_hwi (res[needed - 1], 32); for (int i = needed - 2; i > > + >= 0; i--) > > +{ > > + ival <<= 32; > > + ival |= zext_hwi (res[i], 32); > > +} > > + > > + *intval = ival; > > ??? > > Two cases here, needed is either 2 if GET_MODE_BITSIZE (mode) == 64, or it > is 1 otherwise. So i starts at either -1 or 0. So this for loop either runs > 0 or 1 times. What am I missing? I'm sure this is all an indirect way of > writing: > Yes, the code was set up to be easily extended to support 128 floats as well, Which was deprioritized. I'll just remove the loop. > > + > > + /* Determine whether it's cheaper to write float constants as > > + mov/movk pairs over ldr/adrp pairs. */ unsigned HOST_WIDE_INT > > + ival; > > + > > + if (GET_CODE (x) == CONST_DOUBLE > > + && SCALAR_FLOAT_MODE_P (mode) > > + && aarch64_reinterpret_float_as_int (x, &ival)) > > +{ > > + machine_mode imode = mode == HFmode ? SImode : > int_mode_for_mode (mode); > > + int num_instr = aarch64_internal_mov_immediate > > + (NULL_RTX, gen_int_mode (ival, imode), false, > imode); > > + return num_instr < 3; > > Should this cost model be static on a magin number? Is it not the case that > the decision should be based on the relative speeds of a memory access > compared with mov/movk/fmov ? > As far as I'm aware, the cost model is too simplistic to be able to express the Actual costs of mov/movk and movk/movk pairs. E.g it doesn't take into account The latency and throughput difference when the instructions occur in sequence/pairs. This leads to it allowing a smaller subset through here then what would be beneficial. > > +/* Return TRUE if rtx X is immediate constant that fits in a single > > + MOVI immediate operation. */ > > +bool > > +aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode) { > > + if (!TARGET_SIMD) > > + return false; > > + > > + machine_mode vmode, imode; > > + unsigned HOST_WIDE_INT ival; > > + > > + /* Don't write float constants out to memory. */ > > + if (GET_CODE (x) == CONST_DOUBLE > > + && SCALAR_FLOAT_MODE_P (mode)) > > +{ > > + if (!aarch64_reinterpret_float_as_int (x, &ival)) > > + return false; > > + > > + imode = int_mode_for_mode (mode); > > +} > > + else if (GET_CODE (x) == CONST_INT > > + && SCALAR_INT_MODE_P (mode)) > > +{ > > + imode = mode; > > + ival = INTVAL (x); > > +} > > + else > > +return fa
Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons
On Sun, Jul 2, 2017 at 6:03 PM, Yuri Gribov wrote: > Hi all, > > This is initial patch for > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's > suggestion it optimizes > (float)lhs CMP rhs > (double)lhs CMP rhs > to > lhs CMP (typeof(x))rhs > whenever typeof(x) can be precisely represented by floating-point type > (e.g. short by float or int by double) and rhs can be precisely > represented by typeof(x). > > Bootstrapped/regtested on x64. Ok for trunk? Seems my understanding of real_to_integer was too naive. I'll fix the patch and send updated variant. -Y
[PATCH] Add dotfn
Hi, this patch adds a debug function dotfn and a convenience macro DOTFN similar to dot-fn in gdbhooks.py. It can be used to have the compiler: - dump a control flow graph, or - pop up a control flow graph window at specific moments in the compilation flow, for debugging purposes. Bootstrapped and reg-tested on x86_64. Used for debugging PR81192. OK for trunk? Thanks, - Tom Add dotfn 2017-06-30 Tom de Vries * graph.c (get_graph_file_name): New function, factored out of ... (open_graph_file): ... here. (dotfn): New function. * graph.h (dotfn): Declare. (DOTFN): New macro. --- gcc/graph.c | 68 +++-- gcc/graph.h | 7 +++ 2 files changed, 69 insertions(+), 6 deletions(-) diff --git a/gcc/graph.c b/gcc/graph.c index 9261732..e6d3f2e 100644 --- a/gcc/graph.c +++ b/gcc/graph.c @@ -37,23 +37,45 @@ along with GCC; see the file COPYING3. If not see ignore that recommendation... */ static const char *const graph_ext = ".dot"; +/* Return filename for dumping our graph to. If NUM, use *NUM in the file name + and increase *NUM. */ + +static char * +get_graph_file_name (const char *base, unsigned int *num = NULL) +{ + size_t namelen = strlen (base); + size_t extlen = strlen (graph_ext); + size_t numlen = num ? 11 : 0; + size_t len = namelen + extlen + numlen + 1; + char *buf = XNEWVEC (char, len); + + char *pos = buf; + memcpy (pos, base, namelen); + pos += namelen; + if (num) +{ + pos += snprintf (pos, numlen, ".%u", *num); + ++*num; +} + memcpy (pos, graph_ext, extlen + 1); + + return buf; +} + /* Open a file with MODE for dumping our graph to. Return the file pointer. */ static FILE * open_graph_file (const char *base, const char *mode) { - size_t namelen = strlen (base); - size_t extlen = strlen (graph_ext) + 1; - char *buf = XALLOCAVEC (char, namelen + extlen); + char *buf = get_graph_file_name (base); FILE *fp; - memcpy (buf, base, namelen); - memcpy (buf + namelen, graph_ext, extlen); - fp = fopen (buf, mode); if (fp == NULL) fatal_error (input_location, "can%'t open %s: %m", buf); + XDELETEVEC (buf); + return fp; } @@ -354,3 +376,37 @@ finish_graph_dump_file (const char *base) end_graph_dump (fp); fclose (fp); } + +/* Dump graph into file '.dot', or if COUNTER is defined + '.<*COUNTER>.dot'. If POPUP, show the graph in a popup window. */ + +void DEBUG_FUNCTION +dotfn (const char *base, unsigned int *counter, bool popup) +{ + char *name = get_graph_file_name (base, counter); + + FILE *fp = fopen (name, "w"); + if (fp != NULL) +{ + start_graph_dump (fp, ""); + print_graph_cfg (fp, cfun, dump_flags); + end_graph_dump (fp); + fclose (fp); + + if (popup) + { + char prog[] = "dot"; + char arg1[] = "-Tx11"; + char *const cmd[] = { prog, arg1, name , NULL }; + struct pex_obj *pex = pex_init (0, prog, NULL); + int err; + pex_run (pex, PEX_LAST | PEX_SEARCH, prog, cmd, NULL, NULL, &err); + + int status; + pex_get_status (pex, 1, &status); + pex_free (pex); + } +} + + XDELETEVEC (name); +} diff --git a/gcc/graph.h b/gcc/graph.h index 4db253f..168b60c 100644 --- a/gcc/graph.h +++ b/gcc/graph.h @@ -24,4 +24,11 @@ extern void print_graph_cfg (const char *, struct function *); extern void clean_graph_dump_file (const char *); extern void finish_graph_dump_file (const char *); +extern void dotfn (const char *base, unsigned int *counter, bool popup); +#define DOTFN(base, popup) \ + { \ +static unsigned int counter = 0; \ +dotfn (base, &counter, popup); \ + } + #endif /* ! GCC_GRAPH_H */
[PATCH, PR69468] Ignore EDGE_{DFS_BACK,EXECUTABLE} in tail-merge
[ was: Re: [PATCH, PR81192] Don't tail-merge blocks from different loops ] On 07/03/2017 12:49 AM, Tom de Vries wrote: [ was: Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb ] On 07/03/2017 12:26 AM, Tom de Vries wrote: [ Trying again with before.svg instead of before.pdf ] Hi, consider this test-case: ... unsigned a; int b, c; static int fn1 (int p1, int p2) { return p1 > 2147483647 - p2 ? p1 : p1 + p2; } void fn2 (void) { int j; a = 30; for (; a;) for (; c; b = fn1 (j, 1)) ; } ... When compiling the test-case with -Os, just before tail-merge it looks as in before.svg. During tail-merge, it runs into a sigsegv. What happens is the following: - tail-merge decides to merge blocks 4 and 6, and removes block 6. As pointed out in the PR, blocks 4 and 6 belong to different loops, and shouldn't be merged. While working on this PR, I used print_graph_cfg in tail-merge, which I noticed changes the choice in merging blocks. I tracked this down to: - mark_dfs_back_edges changing the EDGE_DFS_BACK flag on some edges, and - tail-merge requiring identical edge flags (apart from the true/false flags, which are dealt with explicitly). This is similar to PR69468 which notices the same about EDGE_EXECUTABLE. This patch allows more tail-merging by ignoring both EDGE_DFS_BACK and EDGE_EXECUTABLE. Consequently, in this example the two latch bbs (of the same loop) bb4 and bb7 are merged. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Ignore EDGE_{DFS_BACK,EXECUTABLE} in tail-merge 2017-06-30 Tom de Vries PR tree-optimization/69468 * tree-ssa-tail-merge.c (ignore_edge_flags): New constant. (find_same_succ_bb): Handle ignore_edge_flags. * gcc.dg/pr81192.c: Update. --- gcc/testsuite/gcc.dg/pr81192.c | 2 +- gcc/tree-ssa-tail-merge.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c index 8b3a77a..57eb478 100644 --- a/gcc/testsuite/gcc.dg/pr81192.c +++ b/gcc/testsuite/gcc.dg/pr81192.c @@ -19,4 +19,4 @@ fn2 (void) ; } -/* { dg-final { scan-tree-dump-not "(?n)find_duplicates: duplicate of " "pre" } } */ +/* { dg-final { scan-tree-dump-times "(?n)find_duplicates: duplicate of " 1 "pre" } } */ diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index 0865e86..6f199de 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -207,6 +207,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "tree-cfgcleanup.h" +const int ignore_edge_flags = EDGE_DFS_BACK | EDGE_EXECUTABLE; + /* Describes a group of bbs with the same successors. The successor bbs are cached in succs, and the successor edge flags are cached in succ_flags. If a bb has the EDGE_TRUE/FALSE_VALUE flags swapped compared to succ_flags, @@ -707,7 +709,7 @@ find_same_succ_bb (basic_block bb, same_succ **same_p) { int index = e->dest->index; bitmap_set_bit (same->succs, index); - same_succ_edge_flags[index] = e->flags; + same_succ_edge_flags[index] = (e->flags & ~ignore_edge_flags); } EXECUTE_IF_SET_IN_BITMAP (same->succs, 0, j, bj) same->succ_flags.safe_push (same_succ_edge_flags[j]);
[PATCH, PR81192] Don't tail-merge blocks from different loops
[ was: Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb ] On 07/03/2017 12:26 AM, Tom de Vries wrote: [ Trying again with before.svg instead of before.pdf ] Hi, consider this test-case: ... unsigned a; int b, c; static int fn1 (int p1, int p2) { return p1 > 2147483647 - p2 ? p1 : p1 + p2; } void fn2 (void) { int j; a = 30; for (; a;) for (; c; b = fn1 (j, 1)) ; } ... When compiling the test-case with -Os, just before tail-merge it looks as in before.svg. During tail-merge, it runs into a sigsegv. What happens is the following: - tail-merge decides to merge blocks 4 and 6, and removes block 6. As pointed out in the PR, blocks 4 and 6 belong to different loops, and shouldn't be merged. There is a test 'bb->loop_father->latch == bb' in find_same_succ_bb that is supposed to keep the two blocks from merging, but the test is not working for this example because the latch field is not defined (because the loop has in fact two latches). This patch prevents blocks from different loops to be merged by testing for bb->loop_father->num equivalence in tail-merge. It also removes the unreliable test for 'bb->loop_father->latch == bb' in find_same_succ_bb. Bootstrapped and reg-tested on x86_64. OK for trunk and gcc-[567]-branch? Thanks, - Tom Don't tail-merge blocks from different loops 2017-06-30 Tom de Vries PR tree-optimization/81192 * tree-ssa-tail-merge.c (same_succ_hash): Use bb->loop_father->num in hash. (same_succ::equal): Don't find bbs to be equal if bb->loop_father->num differs. (find_same_succ_bb): Remove obsolete test on bb->loop_father->latch. * gcc.dg/pr81192.c: Update. --- gcc/testsuite/gcc.dg/pr81192.c | 2 +- gcc/tree-ssa-tail-merge.c | 15 ++- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c index 57eb478..8b3a77a 100644 --- a/gcc/testsuite/gcc.dg/pr81192.c +++ b/gcc/testsuite/gcc.dg/pr81192.c @@ -19,4 +19,4 @@ fn2 (void) ; } -/* { dg-final { scan-tree-dump-times "(?n)find_duplicates: duplicate of " 1 "pre" } } */ +/* { dg-final { scan-tree-dump-not "(?n)find_duplicates: duplicate of " "pre" } } */ diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index bb8a308..0865e86 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -479,6 +479,8 @@ same_succ_hash (const same_succ *e) hstate.add_int (size); BB_SIZE (bb) = size; + hstate.add_int (bb->loop_father->num); + for (i = 0; i < e->succ_flags.length (); ++i) { flags = e->succ_flags[i]; @@ -568,6 +570,9 @@ same_succ::equal (const same_succ *e1, const same_succ *e2) if (BB_SIZE (bb1) != BB_SIZE (bb2)) return 0; + if (bb1->loop_father->num != bb2->loop_father->num) +return 0; + gsi1 = gsi_start_nondebug_bb (bb1); gsi2 = gsi_start_nondebug_bb (bb2); gsi_advance_fw_nondebug_nonlocal (&gsi1); @@ -695,15 +700,7 @@ find_same_succ_bb (basic_block bb, same_succ **same_p) edge_iterator ei; edge e; - if (bb == NULL - /* Be conservative with loop structure. It's not evident that this test - is sufficient. Before tail-merge, we've just called - loop_optimizer_finalize, and LOOPS_MAY_HAVE_MULTIPLE_LATCHES is now - set, so there's no guarantee that the loop->latch value is still valid. - But we assume that, since we've forced LOOPS_HAVE_SIMPLE_LATCHES at the - start of pre, we've kept that property intact throughout pre, and are - keeping it throughout tail-merge using this test. */ - || bb->loop_father->latch == bb) + if (bb == NULL) return; bitmap_set_bit (same->bbs, bb->index); FOR_EACH_EDGE (e, ei, bb->succs)
[PATCH, PR81192] Fix sigsegv in find_same_succ_bb
[ Trying again with before.svg instead of before.pdf ] Hi, consider this test-case: ... unsigned a; int b, c; static int fn1 (int p1, int p2) { return p1 > 2147483647 - p2 ? p1 : p1 + p2; } void fn2 (void) { int j; a = 30; for (; a;) for (; c; b = fn1 (j, 1)) ; } ... When compiling the test-case with -Os, just before tail-merge it looks as in before.svg. During tail-merge, it runs into a sigsegv. What happens is the following: - tail-merge decides to merge blocks 4 and 6, and removes block 6. - bb8, a predecessor of block 6, is marked as member of deleted_bb_preds. - during update_worklist, same_succ_flush_bb is called for bb8 - same_succ_flush_bb runs into a sigsegv because BB_SAME_SUCC (bb8) == NULL - the reason that BB_SAME_SUCC (bb8) == NULL, is because it hit the bb->loop_father->latch == bb clause in find_same_succ_bb at the start of the tail-merge pass. This patch fixes the sigsegv by doing an early-out in same_succ_flush_bb if BB_SAME_SUCC () == NULL. Bootstrapped and reg-tested on x86_64. OK for trunk and gcc-[567]-branch? Thanks, - Tom Fix sigsegv in find_same_succ_bb 2017-06-30 Tom de Vries PR tree-optimization/81192 * tree-ssa-tail-merge.c (same_succ_flush_bb): Handle BB_SAME_SUCC (bb) == NULL. * gcc.dg/pr81192.c: New test. --- gcc/testsuite/gcc.dg/pr81192.c | 22 ++ gcc/tree-ssa-tail-merge.c | 3 +++ 2 files changed, 25 insertions(+) diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c new file mode 100644 index 000..57eb478 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr81192.c @@ -0,0 +1,22 @@ +/* { dg-options "-Os -fdump-tree-pre-details" } */ + +unsigned a; +int b, c; + +static int +fn1 (int p1, int p2) +{ + return p1 > 2147483647 - p2 ? p1 : p1 + p2; +} + +void +fn2 (void) +{ + int j; + a = 30; + for (; a;) +for (; c; b = fn1 (j, 1)) + ; +} + +/* { dg-final { scan-tree-dump-times "(?n)find_duplicates: duplicate of " 1 "pre" } } */ diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index f6c9878..bb8a308 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -809,6 +809,9 @@ static void same_succ_flush_bb (basic_block bb) { same_succ *same = BB_SAME_SUCC (bb); + if (! same) +return; + BB_SAME_SUCC (bb) = NULL; if (bitmap_single_bit_set_p (same->bbs)) same_succ_htab->remove_elt_with_hash (same, same->hashval);
Update profile in tree-complex.c
bootrapped/regtested x86_64-linux, comitted. Honza * tree-complex.c (expand_complex_div_wide): update profile. Index: tree-complex.c === --- tree-complex.c (revision 249881) +++ tree-complex.c (working copy) @@ -1186,13 +1186,22 @@ expand_complex_div_wide (gimple_stmt_ite bb_join = e->dest; bb_true = create_empty_bb (bb_cond); bb_false = create_empty_bb (bb_true); + bb_true->frequency = bb_false->frequency = bb_cond->frequency / 2; + bb_true->count = bb_false->count += bb_cond->count.apply_probability (profile_probability::even ()); /* Wire the blocks together. */ e->flags = EDGE_TRUE_VALUE; + e->count = bb_true->count; + /* TODO: With value profile we could add an historgram to determine real +branch outcome. */ + e->probability = profile_probability::even (); redirect_edge_succ (e, bb_true); - make_edge (bb_cond, bb_false, EDGE_FALSE_VALUE); - make_edge (bb_true, bb_join, EDGE_FALLTHRU); - make_edge (bb_false, bb_join, EDGE_FALLTHRU); + edge e2 = make_edge (bb_cond, bb_false, EDGE_FALSE_VALUE); + e2->count = bb_false->count; + e2->probability = profile_probability::even (); + make_single_succ_edge (bb_true, bb_join, EDGE_FALLTHRU); + make_single_succ_edge (bb_false, bb_join, EDGE_FALLTHRU); add_bb_to_loop (bb_true, bb_cond->loop_father); add_bb_to_loop (bb_false, bb_cond->loop_father);
Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons
On Sun, 2 Jul 2017, Yuri Gribov wrote: On Sun, Jul 2, 2017 at 6:32 PM, Marc Glisse wrote: On Sun, 2 Jul 2017, Yuri Gribov wrote: This is initial patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Thanks. I have had this unfinished, probably wrong patch on my hard drive for 4 years, I doubt there is much you can extract from it, but just in case... Thanks, this will indeed help! I only don't like !uns here: + bool uns = TYPE_UNSIGNED (itype); + bool fits = mantissa >= prec - !uns; as I believe for INT?_MIN we need 'prec' number of bits. INT_MIN is (minus) a power of 2, so 1 bit of mantissa is sufficient to represent it exactly (as long as the exponent also fits). At least for an IEEE754 representation using base 2, but I expect non-IEEE representations of floats are similar enough. -- Marc Glisse
Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons
On Sun, Jul 2, 2017 at 6:32 PM, Marc Glisse wrote: > On Sun, 2 Jul 2017, Yuri Gribov wrote: > >> This is initial patch for >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . > > Thanks. I have had this unfinished, probably wrong patch on my hard drive > for 4 years, I doubt there is much you can extract from it, but just in > case... Thanks, this will indeed help! I only don't like !uns here: + bool uns = TYPE_UNSIGNED (itype); + bool fits = mantissa >= prec - !uns; as I believe for INT?_MIN we need 'prec' number of bits. -Y
[gcc commit] [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external
On Sun, 02 Jul 2017 18:22:45 +0200, Jason Merrill wrote: > I'd suggest "internal" rather than "static". Otherwise the patch looks good. Checked in as r249883 with: DW_IDX_GNU_internal, DW_IDX_GNU_external Jan
[PATCH] enhance -Wrestrict for sprintf %s arguments
The attached patch enhances the -Wrestrict warning to detect more than just trivial instances of overlapping copying by sprintf and related functions. The meat of the patch is relatively simple but because it introduces dependencies between existing classes in the sprintf pass I had to move the class definitions around. That makes the changes look more extensive than they really are. The enhancement works by first determining the base object (or pointer) from the destination of the sprintf call, the constant offset into the object, and its size. For each %s argument, it then computes same information. If it determines that overlap between the two is possible it stores the information for the directive argument (including the size of the argument) for later processing. After the whole call/format string has been processed, the patch then iterates over the stored directives and their arguments and compares the size/length of the argument against the function's overall output. If they overlap it issues a warning. Tested on x86_64-linux. -Wrestrict is not currently included in either -Wextra or -Wall and this patch doesn't change it even though there have been requests to add it to one of these two options. I'd like to do that as a separate step. As the next step I'd also like to extend a limited form of the -Wrestrict enhancement to other restrict-qualified built-ins (like strcpy), and ultimately also to user-defined functions that make use of restrict. I think this might perhaps best be done in a separate pass where the computed pointer information can be cached to avoid recomputing it for each call, but I don't expect to be able to have the new pass (or whatever form the enhancement might end up taking) to also handle sprintf (at least not with the same accuracy it does now) because the sprintf data for each format directive are not available outside the sprintf pass. Martin PR tree-optimization/35503 - Warning about restricted pointers? gcc/c-family/ChangeLog: PR tree-optimization/35503 * gcc/c-family/c-common.c (check_function_restrict): Avoid diagnosing sprintf et al. unless both -Wformat-overflow and -Wformat-truncation are disabled. gcc/ChangeLog: PR tree-optimization/35503 * gimple-ssa-sprintf.c (format_result::alias_info): New struct. (directive::argno): New member. (format_result::aliases, format_result::alias_count): New data members. (format_result::append_alias): New member function. (fmtresult::dst_offset): New data member. (pass_sprintf_length::call_info::dst_origin): New data member. (pass_sprintf_length::call_info::dst_field, dst_offset): Same. (char_type_p, array_elt_at_offset, field_at_offset): New functions. (get_origin_and_offset): Same. (format_string): Call it. (format_directive): Call append_alias and set directive argument number. (pass_sprintf_length::compute_format_length): Diagnose arguments that overlap the destination buffer. (pass_sprintf_length::handle_gimple_call): Initialize new members. gcc/testsuite/ChangeLog: PR tree-optimization/35503 * gcc.dg/tree-ssa/builtin-sprintf-warn-19.c: New test. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 4395e51..406ebc4 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -5266,6 +5266,24 @@ check_function_restrict (const_tree fndecl, const_tree fntype, else parms = TYPE_ARG_TYPES (fntype); + if (DECL_BUILT_IN (fndecl) + && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) +{ + switch (DECL_FUNCTION_CODE (fndecl)) + { + case BUILT_IN_SPRINTF: + case BUILT_IN_SPRINTF_CHK: + case BUILT_IN_SNPRINTF: + case BUILT_IN_SNPRINTF_CHK: + /* These are handled in gimple-ssa-sprintf.c. */ + if (warn_format_overflow || warn_format_trunc) + return; + + default: + break; + } +} + for (i = 0; i < nargs; i++) TREE_VISITED (argarray[i]) = 0; diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index f43778b..8113353 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "intl.h" #include "langhooks.h" +#include "tree-ssa-alias.h" #include "builtins.h" #include "stor-layout.h" @@ -78,6 +79,7 @@ along with GCC; see the file COPYING3. If not see #include "input.h" #include "toplev.h" #include "substring-locations.h" +#include "gcc-rich-location.h" #include "diagnostic.h" /* The likely worst case value of MB_LEN_MAX for the target, large enough @@ -93,6 +95,30 @@ along with GCC; see the file COPYING3. If not see namespace { +/* Return the value of INT_MIN for the target. */ + +static inline HOST_WIDE_INT +target_int_min () +{ + return tree_to_shwi (TYPE_MIN_VALUE (integer_type_node)); +} + +/* Return the value of INT_MAX for the target. */ + +static inline unsigned HOST_WIDE_INT +target_int_max () +{ + return tree_to_uhwi (TYPE_MAX_VALUE (integer_type_node)); +} + +/* Return the value of
Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons
On Sun, 2 Jul 2017, Yuri Gribov wrote: This is initial patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Thanks. I have had this unfinished, probably wrong patch on my hard drive for 4 years, I doubt there is much you can extract from it, but just in case... -- Marc GlisseIndex: gcc/fold-const.c === --- gcc/fold-const.c(revision 209514) +++ gcc/fold-const.c(working copy) @@ -6389,21 +6389,21 @@ fold_inf_compare (location_t loc, enum t /* x > +Inf is always false, if with ignore sNANs. */ if (HONOR_SNANS (mode)) return NULL_TREE; return omit_one_operand_loc (loc, type, integer_zero_node, arg0); case LE_EXPR: /* x <= +Inf is always true, if we don't case about NaNs. */ if (! HONOR_NANS (mode)) return omit_one_operand_loc (loc, type, integer_one_node, arg0); - /* x <= +Inf is the same as x == x, i.e. isfinite(x). */ + /* x <= +Inf is the same as x == x, i.e. !isnan(x). */ arg0 = save_expr (arg0); return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg0); case EQ_EXPR: case GE_EXPR: /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX. */ real_maxval (&max, neg, mode); return fold_build2_loc (loc, neg ? LT_EXPR : GT_EXPR, type, arg0, build_real (TREE_TYPE (arg0), max)); @@ -9389,70 +9389,121 @@ fold_comparison (location_t loc, enum tr tem = maybe_canonicalize_comparison (loc, code, type, arg0, arg1); if (tem) return tem; if (FLOAT_TYPE_P (TREE_TYPE (arg0))) { tree targ0 = strip_float_extensions (arg0); tree targ1 = strip_float_extensions (arg1); tree newtype = TREE_TYPE (targ0); + tree ftype = TREE_TYPE (arg0); if (TYPE_PRECISION (TREE_TYPE (targ1)) > TYPE_PRECISION (newtype)) newtype = TREE_TYPE (targ1); /* Fold (double)float1 CMP (double)float2 into float1 CMP float2. */ - if (TYPE_PRECISION (newtype) < TYPE_PRECISION (TREE_TYPE (arg0))) + if (TYPE_PRECISION (newtype) < TYPE_PRECISION (ftype)) return fold_build2_loc (loc, code, type, fold_convert_loc (loc, newtype, targ0), fold_convert_loc (loc, newtype, targ1)); /* (-a) CMP (-b) -> b CMP a */ if (TREE_CODE (arg0) == NEGATE_EXPR && TREE_CODE (arg1) == NEGATE_EXPR) return fold_build2_loc (loc, code, type, TREE_OPERAND (arg1, 0), TREE_OPERAND (arg0, 0)); if (TREE_CODE (arg1) == REAL_CST) { REAL_VALUE_TYPE cst; cst = TREE_REAL_CST (arg1); /* (-a) CMP CST -> a swap(CMP) (-CST) */ if (TREE_CODE (arg0) == NEGATE_EXPR) return fold_build2_loc (loc, swap_tree_comparison (code), type, TREE_OPERAND (arg0, 0), - build_real (TREE_TYPE (arg1), + build_real (ftype, real_value_negate (&cst))); /* IEEE doesn't distinguish +0 and -0 in comparisons. */ /* a CMP (-0) -> a CMP 0 */ if (REAL_VALUE_MINUS_ZERO (cst)) return fold_build2_loc (loc, code, type, arg0, - build_real (TREE_TYPE (arg1), dconst0)); + build_real (ftype, dconst0)); /* x != NaN is always true, other ops are always false. */ if (REAL_VALUE_ISNAN (cst) - && ! HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg1 + && ! HONOR_SNANS (TYPE_MODE (ftype))) { tem = (code == NE_EXPR) ? integer_one_node : integer_zero_node; return omit_one_operand_loc (loc, type, tem, arg0); } /* Fold comparisons against infinity. */ if (REAL_VALUE_ISINF (cst) - && MODE_HAS_INFINITIES (TYPE_MODE (TREE_TYPE (arg1 + && MODE_HAS_INFINITIES (TYPE_MODE (ftype))) { tem = fold_inf_compare (loc, code, type, arg0, arg1); if (tem != NULL_TREE) return tem; } + + /* (double)i CMP cst is often just i CMP cst'. +See PR 57371 for a detailed analysis and other ideas. */ + if (TREE_CODE (arg0) == FLOAT_EXPR) + { + tree inner = TREE_OPERAND (arg0, 0); + tree itype = TREE_TYPE (inner); + unsigned mantissa = significand_size (TYPE_MODE (ftype)); + unsigned prec = element_precision (itype); + bool uns = TYPE_UNSIGNED (itype); + bool fits = mantissa >= prec - !uns; + /* If ftype cannot represent exactly all values of itype, +we may have an inexact exception. If the conversion from +itype to ftype may overflow (unsigned __int128 to float), +
[PATCH][PR 57371] Remove useless floating point casts in comparisons
Hi all, This is initial patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's suggestion it optimizes (float)lhs CMP rhs (double)lhs CMP rhs to lhs CMP (typeof(x))rhs whenever typeof(x) can be precisely represented by floating-point type (e.g. short by float or int by double) and rhs can be precisely represented by typeof(x). Bootstrapped/regtested on x64. Ok for trunk? I'd like to extend this further in follow-up patches: 1) fold always-false/always-true comparisons e.g. short x; (float)x > INT16_MAX; // Always false 2) get rid of cast in comparisons with zero regardless of typeof(lhs) when -fno-trapping-math: (float_or_double)lhs CMP 0 -Y pr57371-1.patch Description: Binary data
Re: ping^2: [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external
On Sun, Jul 2, 2017 at 3:25 AM, Jan Kratochvil wrote: > http://dwarfstd.org/ShowIssue.php?issue=170527.1 > > 170527.1 Jan Kratochvil DW_IDX_* for static/extern symbols Enhancement Open > > Section 6.1.1.4.7, pg 147 > When a debugger wants to print 'somename' it logically tries to find first > 'somename' as an > external symbol in all available libraries. Only if none such external > symbol is found the > debugger starts searching for a static 'somename' symbol in those libraries. > > This requires to know whether a symbol in .debug_names index has > DW_AT_external or not. > Otherwise a lot of needless CU expansions happen. This extension improves > performance > gain of the .debug_names index. > > (Discovered in an original fix by Doug Evans - GDB Bug 14125.) > > Proposing and asking for pre-allocation: > DW_IDX_static = 6 = DW_FORM_flag_present = DIE's DW_AT_external is not > present > DW_IDX_external = 7 = DW_FORM_flag_present = DIE's DW_AT_external is present I'd suggest "internal" rather than "static". Otherwise the patch looks good. Jason
Re: [patch, fortran] Implement blocked eoshift for eoshift0
Hi Thomas, The timings are impressive! OK for trunk. Thanks Paul On 1 July 2017 at 14:48, Thomas Koenig wrote: > Hello world, > > the attached patch implements the blocked algorithm for > constant shift for dim > 1 for eoshift0 (which handles > the case of constant shift and constant fill value). > > Speedup, as for cshift, is large. Moving a 500*500*500 > array by -3 with eo_bench.f90 (also attached): > > $ gfortran -O3 eo_bench.f90 && ./a.out > dim =1 t = 0.451796889 > dim =2 t = 0.183514118 > dim =3 t = 0.184015989 > $ gfortran-7 -static-libgfortran -O3 eo_bench.f90 && ./a.out > dim =1 t = 0.955736041 > dim =2 t =1.42228103 > dim =3 t =3.00043702 > > Regression-tested. OK for trunk? > > Regards > > Thomas > > 2017-07-01 Thomas Koenig > > * intrinsics/eoshift0.c: For contiguous arrays, use > block algorithm. Use memcpy where possible. > > 2017-07-01 Thomas Koenig > > * gfortran/eoshift_3.f90: New test. -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [2/2] PR 80769: Incorrect strlen optimisation
Jakub Jelinek writes: > On Tue, May 16, 2017 at 09:02:08AM +0100, Richard Sandiford wrote: >> 2017-05-16 Richard Sandiford >> >> gcc/ >> PR tree-optimization/80769 >> * tree-ssa-strlen.c (strinfo): Document that "stmt" is also used >> for malloc and calloc. Document the new invariant that all related >> strinfos have delayed lengths or none do. >> (verify_related_strinfos): Move earlier in file. >> (set_endptr_and_length): New function, split out from... >> (get_string_length): ...here. Also set the lengths of related >> strinfos. >> (zero_length_string): Assert that chainsi has known (rather than >> delayed) lengths. >> (adjust_related_strinfos): Likewise. >> >> gcc/testsuite/ >> PR tree-optimization/80769 >> * gcc.dg/strlenopt-31.c: New test. >> * gcc.dg/strlenopt-31g.c: Likewise. > > Ok for trunk, sorry for the delay. I assume 7.x is not affected, right? Thanks, applied. 6.x and 7.x have it too, so I'll try to backport a minimal fix if there's no fallout on trunk. Richard
[PATCHv2][PR 56727] Bypass PLT for recursive calls
Hi all, This is a new version of previous patch (https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00020.html), fixed after Rainer's remarks. -Y pr56727-2.patch Description: Binary data
Re: [PATCH][PR 56727] Bypass PLT for recursive calls
On Sat, Jul 1, 2017 at 9:56 PM, Rainer Orth wrote: > Hi Yuri, > >> diff -rupN gcc/gcc/testsuite/gcc.dg/pr56727-1.c >> gcc-56727/gcc/testsuite/gcc.dg/pr56727-1.c >> --- gcc/gcc/testsuite/gcc.dg/pr56727-1.c 1970-01-01 01:00:00.0 >> +0100 >> +++ gcc-56727/gcc/testsuite/gcc.dg/pr56727-1.c2017-07-01 >> 21:36:36.0 +0200 >> @@ -0,0 +1,23 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fPIC" } */ > > both tests need to be restricted to target fpic... > >> +/* { dg-final { scan-assembler-not "@PLT" } } */ > > ... and @PLT won't work everywhere, either. Judging from gcc/config, it > will be i386/x86_64 (except darwin), microblaze, mn10300, s390, sh, and > xtensa at most. Thanks Rainer, I'll update the patch. A pity that this can not be tested in cross-platform way. -Y
ping^2: [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external
http://dwarfstd.org/ShowIssue.php?issue=170527.1 170527.1 Jan Kratochvil DW_IDX_* for static/extern symbols Enhancement Open Section 6.1.1.4.7, pg 147 When a debugger wants to print 'somename' it logically tries to find first 'somename' as an external symbol in all available libraries. Only if none such external symbol is found the debugger starts searching for a static 'somename' symbol in those libraries. This requires to know whether a symbol in .debug_names index has DW_AT_external or not. Otherwise a lot of needless CU expansions happen. This extension improves performance gain of the .debug_names index. (Discovered in an original fix by Doug Evans - GDB Bug 14125.) Proposing and asking for pre-allocation: DW_IDX_static = 6 = DW_FORM_flag_present = DIE's DW_AT_external is not present DW_IDX_external = 7 = DW_FORM_flag_present = DIE's DW_AT_external is present