Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb

2017-07-02 Thread Richard Biener
On Sun, 2 Jul 2017, Tom de Vries wrote:

> Hi,
> 
> consider this test-case:
> ...
> unsigned a;
> int b, c;
> 
> static int
> fn1 (int p1, int p2)
> {
>   return p1 > 2147483647 - p2 ? p1 : p1 + p2;
> }
> 
> void
> fn2 (void)
> {
>   int j;
>   a = 30;
>   for (; a;)
> for (; c; b = fn1 (j, 1))
>   ;
> }
> ...
> 
> When compiling the test-case with -Os, just before tail-merge it looks as in
> before.pdf.
> 
> During tail-merge, it runs into a sigsegv.
> 
> What happens is the following:
> - tail-merge decides to merge blocks 4 and 6, and removes block 6.
> - bb8, a predecessor of block 6, is marked as member of
>   deleted_bb_preds.
> - during update_worklist, same_succ_flush_bb is called for bb8
> - same_succ_flush_bb runs into a sigsegv because
>   BB_SAME_SUCC (bb8) == NULL
> - the reason that BB_SAME_SUCC (bb8) == NULL, is because it hit the
>   bb->loop_father->latch == bb clause in find_same_succ_bb at the start
>   of the tail-merge pass.
> 
> This patch fixes the sigsegv by doing an early-out in same_succ_flush_bb if
> BB_SAME_SUCC () == NULL.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk and gcc-[567]-branch?

Ok for trunk and branches.  Mind the gcc-6 branch is frozen right now.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Add DR_BASE_ALIGNMENT

2017-07-02 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jun 28, 2017 at 3:40 PM, Richard Sandiford
>  wrote:
>> This patch records the base alignment in data_reference, to avoid the
>> second-guessing that was previously done in vect_compute_data_ref_alignment.
>> It also makes vect_analyze_data_refs use dr_analyze_innermost, instead
>> of having an almost-copy of the same code.
>>
>> I'd originally tried to do the second part as a standalone patch,
>> but on its own it caused us to miscompute the base alignment (due to
>> the second-guessing not quite working).  This was previously latent
>> because the old code set STMT_VINFO_DR_ALIGNED_TO to a byte value,
>> whereas it should have been bits.
>>
>> After the previous patch, the only thing that dr_analyze_innermost
>> read from the dr was the DR_REF.  I thought it would be better to pass
>> that in and make data_reference write-only.  This means that callers
>> like vect_analyze_data_refs don't have to set any fields first
>> (and thus not worry about *which* fields they should set).
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2017-06-28  Richard Sandiford  
>>
>> gcc/
>> * tree-data-ref.h (data_reference): Add a base_alignment field.
>> (DR_BASE_ALIGNMENT): New macro.
>> (dr_analyze_innermost): Add a tree argument.
>> * tree-data-ref.c: Include builtins.h.
>> (dr_analyze_innermost): Take the tree reference as argument.
>> Set up DR_BASE_ALIGNMENT.
>> (create_data_ref): Update call accordingly.
>> * tree-predcom.c (find_looparound_phi): Likewise.
>> * tree-vectorizer.h (_stmt_vec_info): Add dr_base_alignment.
>> (STMT_VINFO_DR_BASE_ALIGNMENT): New macro.
>> * tree-vect-data-refs.c: Include tree-cfg.h.
>> (vect_compute_data_ref_alignment): Use DR_BASE_ALIGNMENT instead
>> of calculating an alignment here.
>> (vect_analyze_data_refs): Use dr_analyze_innermost.  Record the
>> base alignment in STMT_VINFO_DR_BASE_ALIGNMENT.
>>
>> Index: gcc/tree-data-ref.h
>> ===
>> --- gcc/tree-data-ref.h 2017-06-26 19:41:19.549571836 +0100
>> +++ gcc/tree-data-ref.h 2017-06-28 14:26:19.651051322 +0100
>> @@ -119,6 +119,10 @@ struct data_reference
>>/* True when the data reference is in RHS of a stmt.  */
>>bool is_read;
>>
>> +  /* The alignment of INNERMOST.base_address, in bits.  This is logically
>> + part of INNERMOST, but is kept here to avoid unnecessary padding.  */
>> +  unsigned int base_alignment;
>> +
>
> But then it would be nice to have dr_analyze_innermost take a struct
> innermost_loop_behavior * only.  That way the vectorizer copy for the
> outer loop behavior can just be a innermost_loop_behavior sub-struct
> as well and predcom wouldn't need to invent an interesting DR_STMT
> either.

Yeah, had wondered that too.  Comments below led to a new series
in which the information ends up being nicely padded, so there was
no excuse not to switch to innermost_loop_behavior.

> The DR_ accessors wouldn't work on that but I guess they
> could be made work with
>
> inline tree dr_base_address (struct data_reference *dr) { return
> dr->innermost.base_address; }
> inline tree dr_base_address (struct innermost_loop_behavior *p) {
> return p->base_address; }
> #define DR_BASE_ADDRESS(DR) (dr_base_address (dr))
>
> anyway, that's implementation detail ;)

I just went for direct field accesses, hope that's OK.

>>/* Behavior of the memory reference in the innermost loop.  */
>>struct innermost_loop_behavior innermost;
>>
>> @@ -139,6 +143,7 @@ #define DR_NUM_DIMENSIONS(DR)  DR_AC
>>  #define DR_IS_READ(DR) (DR)->is_read
>>  #define DR_IS_WRITE(DR)(!DR_IS_READ (DR))
>>  #define DR_BASE_ADDRESS(DR)(DR)->innermost.base_address
>> +#define DR_BASE_ALIGNMENT(DR)  (DR)->base_alignment
>>  #define DR_OFFSET(DR)  (DR)->innermost.offset
>>  #define DR_INIT(DR)(DR)->innermost.init
>>  #define DR_STEP(DR)(DR)->innermost.step
>> @@ -322,7 +327,7 @@ #define DDR_DIST_VECT(DDR, I) \
>>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>>
>>
>> -bool dr_analyze_innermost (struct data_reference *, struct loop *);
>> +bool dr_analyze_innermost (struct data_reference *, tree, struct loop *);
>>  extern bool compute_data_dependences_for_loop (struct loop *, bool,
>>vec *,
>>vec *,
>> Index: gcc/tree-data-ref.c
>> ===
>> --- gcc/tree-data-ref.c 2017-06-28 14:26:12.946306736 +0100
>> +++ gcc/tree-data-ref.c 2017-06-28 14:26:19.651051322 +0100
>> @@ -94,6 +94,7 @@ Software Foundation; either version 3, o
>>  #include "dumpfile.h"
>>  #include "tree-affine.h"
>>  #include "params.h"
>> +#include "builtins.h"
>>
>>  static struct dat

Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-07-02 Thread Tamar Christina
Hi All,

Sorry I just realized I never actually sent the updated patch...

So here it is :)

Regards,
Tamar

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Friday, June 9, 2017 4:51:52 PM
To: Richard Sandiford
Cc: GCC Patches; nd; James Greenhalgh; Richard Earnshaw; Marcus Shawcroft
Subject: RE: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned 
memory access

> > +  /* Optimize routines for MEM to REG copies.  */  if (n < 8 &&
> > + !REG_P (XEXP (operands[0], 0)))
>
> This seems to be checking that the address of the original destination
> memory isn't a plain base register.  Why's it important to reject that case 
> but
> allow e.g. base+offset?

this function is (as far as I could tell) only being called with two types of 
destinations:
a location on the stack or a plain register.

When the destination is a register such as with

void Fun3(struct struct3 foo3)
{
  L3 = foo3;
}

You run into the issue you had pointed to below where we might write too much. 
Ideally the constraint
I would like is to check if the destination is either a new register (no data) 
or that the structure was padded.

I couldn't figure out how to do this and the gains over the existing code for 
this case was quite small.
So I just disallowed it leaving it to the existing code, which isn't so bad, 
only 1 extra instruction.

>
> > +   {
> > + unsigned int max_align = UINTVAL (operands[2]);
> > + max_align = n < max_align ? max_align : n;
>
> Might be misunderstanding, but isn't max_align always equal to n here, since
> n was set by:

Correct, previously this patch was made to allow n < 16. These were left over 
from the cleanup.
I'll correct.

>
> > + result = gen_rtx_IOR (dest_mode, reg, result);
> > +  }
> > +
> > +dst = adjust_address (dst, dest_mode, 0);
> > +emit_insn (gen_move_insn (dst, result));
>
> dest_mode was chosen by smallest_mode_for_size, so can be bigger than n.
> Doesn't that mean that we'll write beyond the end of the copy region when
> n is an awkward number?

Yes, see my answer above. For the other case, when we write onto a location on 
the stack, this is fine
due to the alignment.

>
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index
> >
> 91d7ea217229fac62380b5d4b646961bf7c836c1..b1df4651e7942346007cda1cce
> 8e
> > e5a19297ab16 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -2743,7 +2743,9 @@ copy_blkmode_to_reg (machine_mode mode,
> tree
> > src)
> >
> >n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> >dst_words = XALLOCAVEC (rtx, n_regs);
> > -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > +  bitsize = BITS_PER_WORD;
> > +  if (SLOW_UNALIGNED_ACCESS (BLKmode, TYPE_ALIGN (TREE_TYPE
> (src
> > +bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>
> I think this ought to be testing word_mode instead of BLKmode.
> (Testing BLKmode doesn't really make sense in general, because the mode
> doesn't have a meaningful alignment.)

Ah, yes that makes sense. I'll update the patch.

New patch is validating and will submit it soon.

>
> Thanks,
> Richard
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd03754bbba9264a6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13466,7 +13466,6 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
 
 /* Expand movmem, as if from a __builtin_memcpy.  Return true if
we succeed, otherwise return false.  */
-
 bool
 aarch64_expand_movmem (rtx *operands)
 {
@@ -13498,16 +13497,55 @@ aarch64_expand_movmem (rtx *operands)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
+  /* Optimize routines for MEM to REG copies.
+ This particular function only handles copying to two
+ destination types: 1) a regular register and 2) the stack.
+ When writing to a regular register we may end up writting too much in cases
+ where the register already contains a live value or when no data padding is
+ happening so we disallow it.  */
+  if (n < 8 && !REG_P (XEXP (operands[0], 0)))
+   {
+ machine_mode mov_mode, dest_mode
+   = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
+ rtx result = gen_reg_rtx (dest_mode);
+ emit_insn (gen_move_insn (result, GEN_INT (0)));
+
+ unsigned int shift_cnt = 0;
+ for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))
+   {
+	 int nearest = 0;
+	 /* Find the mode to use.  */
+	 for (unsigned max = 1; max <= (n - shift_cnt); max *= 2)
+	  nearest = max;
+
+	  mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, MODE_INT);
+	  rtx reg = gen_reg_rtx (mov_mode);
+
+	  src = adjust_address (src, mov_mode, 0);
+	  emit_insn (gen_move_insn (reg, src));
+	  src = aarch64_progress_pointer (src);
+
+	  reg = gen_rtx_ZERO_EXTEND (dest_mo

Re: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.

2017-07-02 Thread Tamar Christina
Ping

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Monday, June 26, 2017 11:50:51 AM
To: James Greenhalgh
Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: Re: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - 
HF/DF/SF mode.

Hi all,

Here's the re-spun patch.
Aside from the grouping of the split patterns it now also uses h register for 
the fmov for HF when available,
otherwise it forces a literal load.

Regression tested on aarch64-none-linux-gnu and no regressions.

OK for trunk?

Thanks,
Tamar


gcc/
2017-06-26  Tamar Christina  
Richard Sandiford 

* config/aarch64/aarch64.md (mov): Generalize.
(*movhf_aarch64, *movsf_aarch64, *movdf_aarch64):
Add integer and movi cases.
(movi-split-hf-df-sf split, fp16): New.
(enabled): Added TARGET_FP_F16INST.
* config/aarch64/iterators.md (GPF_HF): New.

From: Tamar Christina
Sent: Wednesday, June 21, 2017 11:48:33 AM
To: James Greenhalgh
Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - 
HF/DF/SF mode.

> > movi\\t%0.4h, #0
> > -   mov\\t%0.h[0], %w1
> > +   fmov\\t%s0, %w1
>
> Should this not be %h0?

The problem is that H registers are only available in ARMv8.2+,
I'm not sure what to do about ARMv8.1 given your other feedback
Pointing out that the bit patterns between how it's stored in s vs h registers
differ.

>
> > umov\\t%w0, %1.h[0]
> > mov\\t%0.h[0], %1.h[0]
> > +   fmov\\t%s0, %1
>
> Likewise, and much more important for correctness as it changes the way the
> bit pattern ends up in the register (see table C2-1 in release B.a of the ARM
> Architecture Reference Manual for ARMv8-A), here.
>
> > +   * return aarch64_output_scalar_simd_mov_immediate (operands[1],
> > + SImode);
> > ldr\\t%h0, %1
> > str\\t%h1, %0
> > ldrh\\t%w0, %1
> > strh\\t%w1, %0
> > mov\\t%w0, %w1"
> > -  [(set_attr "type"
> "neon_move,neon_from_gp,neon_to_gp,neon_move,\
> > - f_loads,f_stores,load1,store1,mov_reg")
> > -   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")]
> > +  "&& can_create_pseudo_p ()
> > +   && !aarch64_can_const_movi_rtx_p (operands[1], HFmode)
> > +   && !aarch64_float_const_representable_p (operands[1])
> > +   &&  aarch64_float_const_rtx_p (operands[1])"
> > +  [(const_int 0)]
> > +  "{
> > +unsigned HOST_WIDE_INT ival;
> > +if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
> > +  FAIL;
> > +
> > +rtx tmp = gen_reg_rtx (SImode);
> > +aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
> > +tmp = simplify_gen_subreg (HImode, tmp, SImode, 0);
> > +emit_move_insn (operands[0], gen_lowpart (HFmode, tmp));
> > +DONE;
> > +  }"
> > +  [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts,
> \
> > +neon_move,f_loads,f_stores,load1,store1,mov_reg")
> > +   (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")]
> >  )
>
> Thanks,
> James



Re: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure.

2017-07-02 Thread Tamar Christina
Ping

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Monday, June 26, 2017 11:49:42 AM
To: James Greenhalgh
Cc: Richard Sandiford; GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: Re: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - 
infrastructure.

Hi All,

I've updated patch accordingly.

This mostly involves removing the loop to create the ival
and removing the *2 code and instead defaulting to 64bit
and switching to 128 when needed.

Regression tested on aarch64-none-linux-gnu and no regressions.

OK for trunk?

Thanks,
Tamar


gcc/
2017-06-26  Tamar Christina  

* config/aarch64/aarch64.c
(aarch64_simd_container_mode): Add prototype.
(aarch64_expand_mov_immediate): Add HI support.
(aarch64_reinterpret_float_as_int, aarch64_float_const_rtx_p: New.
(aarch64_can_const_movi_rtx_p): New.
(aarch64_preferred_reload_class):
Remove restrictions of using FP registers for certain SIMD operations.
(aarch64_rtx_costs): Added new cost for CONST_DOUBLE moves.
(aarch64_valid_floating_const): Add integer move validation.
(aarch64_simd_imm_scalar_p): Remove.
(aarch64_output_scalar_simd_mov_immediate): Generalize function.
(aarch64_legitimate_constant_p): Expand list of supported cases.
* config/aarch64/aarch64-protos.h
(aarch64_float_const_rtx_p, aarch64_can_const_movi_rtx_p): New.
(aarch64_reinterpret_float_as_int): New.
(aarch64_simd_imm_scalar_p): Remove.
* config/aarch64/predicates.md (aarch64_reg_or_fp_float): New.
* config/aarch64/constraints.md (Uvi): New.
(Dd): Split into Ds and new Dd.
* config/aarch64/aarch64.md (*movsi_aarch64):
Add SIMD mov case.
(*movdi_aarch64): Add SIMD mov case.

From: Tamar Christina
Sent: Thursday, June 15, 2017 1:50:19 PM
To: James Greenhalgh
Cc: Richard Sandiford; GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - 
infrastructure.

>
> This patch is pretty huge, are there any opportunities to further split it to 
> aid
> review?

Unfortunately because I'm also changing some constraints it introduced a bit of 
a dependency cycle.
If I were to break it up more, the individual patches won't work on their own 
anymore. If this is acceptable
I can break it up more.

> > +  ival = zext_hwi (res[needed - 1], 32);  for (int i = needed - 2; i
> > + >= 0; i--)
> > +{
> > +  ival <<= 32;
> > +  ival |= zext_hwi (res[i], 32);
> > +}
> > +
> > +  *intval = ival;
>
> ???
>
> Two cases here, needed is either 2 if GET_MODE_BITSIZE (mode) == 64, or it
> is 1 otherwise. So i starts at either -1 or 0. So this for loop either runs
> 0 or 1 times. What am I missing? I'm sure this is all an indirect way of
> writing:
>

Yes, the code was set up to be easily extended to support 128 floats as well,
Which was deprioritized. I'll just remove the loop.

> > +
> > +  /* Determine whether it's cheaper to write float constants as
> > + mov/movk pairs over ldr/adrp pairs.  */  unsigned HOST_WIDE_INT
> > + ival;
> > +
> > +  if (GET_CODE (x) == CONST_DOUBLE
> > +  && SCALAR_FLOAT_MODE_P (mode)
> > +  && aarch64_reinterpret_float_as_int (x, &ival))
> > +{
> > +  machine_mode imode = mode == HFmode ? SImode :
> int_mode_for_mode (mode);
> > +  int num_instr = aarch64_internal_mov_immediate
> > +   (NULL_RTX, gen_int_mode (ival, imode), false,
> imode);
> > +  return num_instr < 3;
>
> Should this cost model be static on a magin number? Is it not the case that
> the decision should be based on the relative speeds of a memory access
> compared with mov/movk/fmov ?
>

As far as I'm aware, the cost model is too simplistic to be able to express the
Actual costs of mov/movk and movk/movk pairs. E.g it doesn't take into account
The latency and throughput difference when the instructions occur in 
sequence/pairs.

This leads to it allowing a smaller subset through here then what would be 
beneficial.

> > +/* Return TRUE if rtx X is immediate constant that fits in a single
> > +   MOVI immediate operation.  */
> > +bool
> > +aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode) {
> > +  if (!TARGET_SIMD)
> > + return false;
> > +
> > +  machine_mode vmode, imode;
> > +  unsigned HOST_WIDE_INT ival;
> > +
> > +  /* Don't write float constants out to memory.  */
> > +  if (GET_CODE (x) == CONST_DOUBLE
> > +  && SCALAR_FLOAT_MODE_P (mode))
> > +{
> > +  if (!aarch64_reinterpret_float_as_int (x, &ival))
> > +   return false;
> > +
> > +  imode = int_mode_for_mode (mode);
> > +}
> > +  else if (GET_CODE (x) == CONST_INT
> > +  && SCALAR_INT_MODE_P (mode))
> > +{
> > +   imode = mode;
> > +   ival = INTVAL (x);
> > +}
> > +  else
> > +return fa

Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-02 Thread Yuri Gribov
On Sun, Jul 2, 2017 at 6:03 PM, Yuri Gribov  wrote:
> Hi all,
>
> This is initial patch for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's
> suggestion it optimizes
>   (float)lhs CMP rhs
>   (double)lhs CMP rhs
> to
>   lhs CMP (typeof(x))rhs
> whenever typeof(x) can be precisely represented by floating-point type
> (e.g. short by float or int by double) and rhs can be precisely
> represented by typeof(x).
>
> Bootstrapped/regtested on x64. Ok for trunk?

Seems my understanding of real_to_integer was too naive. I'll fix the
patch and send updated variant.

-Y


[PATCH] Add dotfn

2017-07-02 Thread Tom de Vries

Hi,

this patch adds a debug function dotfn and a convenience macro DOTFN 
similar to dot-fn in gdbhooks.py.


It can be used to have the compiler:
- dump a control flow graph, or
- pop up a control flow graph window
at specific moments in the compilation flow, for debugging purposes.

Bootstrapped and reg-tested on x86_64.

Used for debugging PR81192.

OK for trunk?

Thanks,
- Tom
Add dotfn

2017-06-30  Tom de Vries  

	* graph.c (get_graph_file_name): New function, factored out of ...
	(open_graph_file): ... here.
	(dotfn): New function.
	* graph.h (dotfn): Declare.
	(DOTFN): New macro.

---
 gcc/graph.c | 68 +++--
 gcc/graph.h |  7 +++
 2 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index 9261732..e6d3f2e 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -37,23 +37,45 @@ along with GCC; see the file COPYING3.  If not see
ignore that recommendation...  */
 static const char *const graph_ext = ".dot";
 
+/* Return filename for dumping our graph to.  If NUM, use *NUM in the file name
+   and increase *NUM.  */
+
+static char *
+get_graph_file_name (const char *base, unsigned int *num = NULL)
+{
+  size_t namelen = strlen (base);
+  size_t extlen = strlen (graph_ext);
+  size_t numlen = num ? 11 : 0;
+  size_t len = namelen + extlen + numlen + 1;
+  char *buf = XNEWVEC (char, len);
+
+  char *pos = buf;
+  memcpy (pos, base, namelen);
+  pos += namelen;
+  if (num)
+{
+  pos += snprintf (pos, numlen, ".%u", *num);
+  ++*num;
+}
+  memcpy (pos, graph_ext, extlen + 1);
+
+  return buf;
+}
+
 /* Open a file with MODE for dumping our graph to.
Return the file pointer.  */
 static FILE *
 open_graph_file (const char *base, const char *mode)
 {
-  size_t namelen = strlen (base);
-  size_t extlen = strlen (graph_ext) + 1;
-  char *buf = XALLOCAVEC (char, namelen + extlen);
+  char *buf = get_graph_file_name (base);
   FILE *fp;
 
-  memcpy (buf, base, namelen);
-  memcpy (buf + namelen, graph_ext, extlen);
-
   fp = fopen (buf, mode);
   if (fp == NULL)
 fatal_error (input_location, "can%'t open %s: %m", buf);
 
+  XDELETEVEC (buf);
+
   return fp;
 }
 
@@ -354,3 +376,37 @@ finish_graph_dump_file (const char *base)
   end_graph_dump (fp);
   fclose (fp);
 }
+
+/* Dump graph into file '.dot', or if COUNTER is defined
+   '.<*COUNTER>.dot'.  If POPUP, show the graph in a popup window.  */
+
+void DEBUG_FUNCTION
+dotfn (const char *base, unsigned int *counter, bool popup)
+{
+  char *name = get_graph_file_name (base, counter);
+
+  FILE *fp = fopen (name, "w");
+  if (fp != NULL)
+{
+  start_graph_dump (fp, "");
+  print_graph_cfg (fp, cfun, dump_flags);
+  end_graph_dump (fp);
+  fclose (fp);
+
+  if (popup)
+	{
+	  char prog[] = "dot";
+	  char arg1[] = "-Tx11";
+	  char *const cmd[] = { prog, arg1, name , NULL };
+	  struct pex_obj *pex = pex_init (0, prog, NULL);
+	  int err;
+	  pex_run (pex, PEX_LAST | PEX_SEARCH, prog, cmd, NULL, NULL, &err);
+
+	  int status;
+	  pex_get_status (pex, 1, &status);
+	  pex_free (pex);
+	}
+}
+
+  XDELETEVEC (name);
+}
diff --git a/gcc/graph.h b/gcc/graph.h
index 4db253f..168b60c 100644
--- a/gcc/graph.h
+++ b/gcc/graph.h
@@ -24,4 +24,11 @@ extern void print_graph_cfg (const char *, struct function *);
 extern void clean_graph_dump_file (const char *);
 extern void finish_graph_dump_file (const char *);
 
+extern void dotfn (const char *base, unsigned int *counter, bool popup);
+#define DOTFN(base, popup)			\
+  {		\
+static unsigned int counter = 0;		\
+dotfn (base, &counter, popup);		\
+  }
+
 #endif /* ! GCC_GRAPH_H */


[PATCH, PR69468] Ignore EDGE_{DFS_BACK,EXECUTABLE} in tail-merge

2017-07-02 Thread Tom de Vries

[ was: Re: [PATCH, PR81192] Don't tail-merge blocks from different loops ]

On 07/03/2017 12:49 AM, Tom de Vries wrote:

[ was: Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb ]

On 07/03/2017 12:26 AM, Tom de Vries wrote:

[ Trying again with before.svg instead of before.pdf ]

Hi,

consider this test-case:
...
unsigned a;
int b, c;

static int
fn1 (int p1, int p2)
{
   return p1 > 2147483647 - p2 ? p1 : p1 + p2;
}

void
fn2 (void)
{
   int j;
   a = 30;
   for (; a;)
 for (; c; b = fn1 (j, 1))
   ;
}
...

When compiling the test-case with -Os, just before tail-merge it looks 
as in before.svg.


During tail-merge, it runs into a sigsegv.

What happens is the following:
- tail-merge decides to merge blocks 4 and 6, and removes block 6.


As pointed out in the PR, blocks 4 and 6 belong to different loops, and 
shouldn't be merged.




While working on this PR, I used print_graph_cfg in tail-merge, which I 
noticed changes the choice in merging blocks. I tracked this down to:

- mark_dfs_back_edges changing the EDGE_DFS_BACK flag on some edges, and
- tail-merge requiring identical edge flags (apart from the true/false
  flags, which are dealt with explicitly).

This is similar to PR69468 which notices the same about EDGE_EXECUTABLE.

This patch allows more tail-merging by ignoring both EDGE_DFS_BACK and 
EDGE_EXECUTABLE. Consequently, in this example the two latch bbs (of the 
same loop) bb4 and bb7 are merged.


Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
Ignore EDGE_{DFS_BACK,EXECUTABLE} in tail-merge

2017-06-30  Tom de Vries  

	PR tree-optimization/69468
	* tree-ssa-tail-merge.c (ignore_edge_flags): New constant.
	(find_same_succ_bb): Handle ignore_edge_flags.

	* gcc.dg/pr81192.c: Update.

---
 gcc/testsuite/gcc.dg/pr81192.c | 2 +-
 gcc/tree-ssa-tail-merge.c  | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
index 8b3a77a..57eb478 100644
--- a/gcc/testsuite/gcc.dg/pr81192.c
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -19,4 +19,4 @@ fn2 (void)
   ;
 }
 
-/* { dg-final { scan-tree-dump-not "(?n)find_duplicates:  duplicate of " "pre" } } */
+/* { dg-final { scan-tree-dump-times "(?n)find_duplicates:  duplicate of " 1 "pre" } } */
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 0865e86..6f199de 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -207,6 +207,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "tree-cfgcleanup.h"
 
+const int ignore_edge_flags = EDGE_DFS_BACK | EDGE_EXECUTABLE;
+
 /* Describes a group of bbs with the same successors.  The successor bbs are
cached in succs, and the successor edge flags are cached in succ_flags.
If a bb has the EDGE_TRUE/FALSE_VALUE flags swapped compared to succ_flags,
@@ -707,7 +709,7 @@ find_same_succ_bb (basic_block bb, same_succ **same_p)
 {
   int index = e->dest->index;
   bitmap_set_bit (same->succs, index);
-  same_succ_edge_flags[index] = e->flags;
+  same_succ_edge_flags[index] = (e->flags & ~ignore_edge_flags);
 }
   EXECUTE_IF_SET_IN_BITMAP (same->succs, 0, j, bj)
 same->succ_flags.safe_push (same_succ_edge_flags[j]);


[PATCH, PR81192] Don't tail-merge blocks from different loops

2017-07-02 Thread Tom de Vries

[ was: Re: [PATCH, PR81192] Fix sigsegv in find_same_succ_bb ]

On 07/03/2017 12:26 AM, Tom de Vries wrote:

[ Trying again with before.svg instead of before.pdf ]

Hi,

consider this test-case:
...
unsigned a;
int b, c;

static int
fn1 (int p1, int p2)
{
   return p1 > 2147483647 - p2 ? p1 : p1 + p2;
}

void
fn2 (void)
{
   int j;
   a = 30;
   for (; a;)
 for (; c; b = fn1 (j, 1))
   ;
}
...

When compiling the test-case with -Os, just before tail-merge it looks 
as in before.svg.


During tail-merge, it runs into a sigsegv.

What happens is the following:
- tail-merge decides to merge blocks 4 and 6, and removes block 6.


As pointed out in the PR, blocks 4 and 6 belong to different loops, and 
shouldn't be merged.


There is a test 'bb->loop_father->latch == bb' in find_same_succ_bb that 
is supposed to keep the two blocks from merging, but the test is not 
working for this example because the latch field is not defined (because 
the loop has in fact two latches).


This patch prevents blocks from different loops to be merged by testing 
for bb->loop_father->num equivalence in tail-merge.


It also removes the unreliable test for 'bb->loop_father->latch == bb' 
in find_same_succ_bb.


Bootstrapped and reg-tested on x86_64.

OK for trunk and gcc-[567]-branch?

Thanks,
- Tom

Don't tail-merge blocks from different loops

2017-06-30  Tom de Vries  

	PR tree-optimization/81192
	* tree-ssa-tail-merge.c (same_succ_hash): Use bb->loop_father->num in
	hash.
	(same_succ::equal): Don't find bbs to be equal if bb->loop_father->num
	differs.
	(find_same_succ_bb): Remove obsolete test on bb->loop_father->latch.

	* gcc.dg/pr81192.c: Update.

---
 gcc/testsuite/gcc.dg/pr81192.c |  2 +-
 gcc/tree-ssa-tail-merge.c  | 15 ++-
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
index 57eb478..8b3a77a 100644
--- a/gcc/testsuite/gcc.dg/pr81192.c
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -19,4 +19,4 @@ fn2 (void)
   ;
 }
 
-/* { dg-final { scan-tree-dump-times "(?n)find_duplicates:  duplicate of " 1 "pre" } } */
+/* { dg-final { scan-tree-dump-not "(?n)find_duplicates:  duplicate of " "pre" } } */
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index bb8a308..0865e86 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -479,6 +479,8 @@ same_succ_hash (const same_succ *e)
   hstate.add_int (size);
   BB_SIZE (bb) = size;
 
+  hstate.add_int (bb->loop_father->num);
+
   for (i = 0; i < e->succ_flags.length (); ++i)
 {
   flags = e->succ_flags[i];
@@ -568,6 +570,9 @@ same_succ::equal (const same_succ *e1, const same_succ *e2)
   if (BB_SIZE (bb1) != BB_SIZE (bb2))
 return 0;
 
+  if (bb1->loop_father->num != bb2->loop_father->num)
+return 0;
+
   gsi1 = gsi_start_nondebug_bb (bb1);
   gsi2 = gsi_start_nondebug_bb (bb2);
   gsi_advance_fw_nondebug_nonlocal (&gsi1);
@@ -695,15 +700,7 @@ find_same_succ_bb (basic_block bb, same_succ **same_p)
   edge_iterator ei;
   edge e;
 
-  if (bb == NULL
-  /* Be conservative with loop structure.  It's not evident that this test
-	 is sufficient.  Before tail-merge, we've just called
-	 loop_optimizer_finalize, and LOOPS_MAY_HAVE_MULTIPLE_LATCHES is now
-	 set, so there's no guarantee that the loop->latch value is still valid.
-	 But we assume that, since we've forced LOOPS_HAVE_SIMPLE_LATCHES at the
-	 start of pre, we've kept that property intact throughout pre, and are
-	 keeping it throughout tail-merge using this test.  */
-  || bb->loop_father->latch == bb)
+  if (bb == NULL)
 return;
   bitmap_set_bit (same->bbs, bb->index);
   FOR_EACH_EDGE (e, ei, bb->succs)


[PATCH, PR81192] Fix sigsegv in find_same_succ_bb

2017-07-02 Thread Tom de Vries

[ Trying again with before.svg instead of before.pdf ]

Hi,

consider this test-case:
...
unsigned a;
int b, c;

static int
fn1 (int p1, int p2)
{
  return p1 > 2147483647 - p2 ? p1 : p1 + p2;
}

void
fn2 (void)
{
  int j;
  a = 30;
  for (; a;)
for (; c; b = fn1 (j, 1))
  ;
}
...

When compiling the test-case with -Os, just before tail-merge it looks 
as in before.svg.


During tail-merge, it runs into a sigsegv.

What happens is the following:
- tail-merge decides to merge blocks 4 and 6, and removes block 6.
- bb8, a predecessor of block 6, is marked as member of
  deleted_bb_preds.
- during update_worklist, same_succ_flush_bb is called for bb8
- same_succ_flush_bb runs into a sigsegv because
  BB_SAME_SUCC (bb8) == NULL
- the reason that BB_SAME_SUCC (bb8) == NULL, is because it hit the
  bb->loop_father->latch == bb clause in find_same_succ_bb at the start
  of the tail-merge pass.

This patch fixes the sigsegv by doing an early-out in same_succ_flush_bb 
if BB_SAME_SUCC () == NULL.


Bootstrapped and reg-tested on x86_64.

OK for trunk and gcc-[567]-branch?

Thanks,
- Tom
Fix sigsegv in find_same_succ_bb

2017-06-30  Tom de Vries  

	PR tree-optimization/81192
	* tree-ssa-tail-merge.c (same_succ_flush_bb): Handle
	BB_SAME_SUCC (bb) == NULL.

	* gcc.dg/pr81192.c: New test.

---
 gcc/testsuite/gcc.dg/pr81192.c | 22 ++
 gcc/tree-ssa-tail-merge.c  |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
new file mode 100644
index 000..57eb478
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -0,0 +1,22 @@
+/* { dg-options "-Os -fdump-tree-pre-details" } */
+
+unsigned a;
+int b, c;
+
+static int
+fn1 (int p1, int p2)
+{
+  return p1 > 2147483647 - p2 ? p1 : p1 + p2;
+}
+
+void
+fn2 (void)
+{
+  int j;
+  a = 30;
+  for (; a;)
+for (; c; b = fn1 (j, 1))
+  ;
+}
+
+/* { dg-final { scan-tree-dump-times "(?n)find_duplicates:  duplicate of " 1 "pre" } } */
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index f6c9878..bb8a308 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -809,6 +809,9 @@ static void
 same_succ_flush_bb (basic_block bb)
 {
   same_succ *same = BB_SAME_SUCC (bb);
+  if (! same)
+return;
+
   BB_SAME_SUCC (bb) = NULL;
   if (bitmap_single_bit_set_p (same->bbs))
 same_succ_htab->remove_elt_with_hash (same, same->hashval);


Update profile in tree-complex.c

2017-07-02 Thread Jan Hubicka
bootrapped/regtested x86_64-linux, comitted.

Honza

* tree-complex.c (expand_complex_div_wide): update profile.
Index: tree-complex.c
===
--- tree-complex.c  (revision 249881)
+++ tree-complex.c  (working copy)
@@ -1186,13 +1186,22 @@ expand_complex_div_wide (gimple_stmt_ite
   bb_join = e->dest;
   bb_true = create_empty_bb (bb_cond);
   bb_false = create_empty_bb (bb_true);
+  bb_true->frequency = bb_false->frequency = bb_cond->frequency / 2;
+  bb_true->count = bb_false->count
+= bb_cond->count.apply_probability (profile_probability::even ());
 
   /* Wire the blocks together.  */
   e->flags = EDGE_TRUE_VALUE;
+  e->count = bb_true->count;
+  /* TODO: With value profile we could add an historgram to determine real
+branch outcome.  */
+  e->probability = profile_probability::even ();
   redirect_edge_succ (e, bb_true);
-  make_edge (bb_cond, bb_false, EDGE_FALSE_VALUE);
-  make_edge (bb_true, bb_join, EDGE_FALLTHRU);
-  make_edge (bb_false, bb_join, EDGE_FALLTHRU);
+  edge e2 = make_edge (bb_cond, bb_false, EDGE_FALSE_VALUE);
+  e2->count = bb_false->count;
+  e2->probability = profile_probability::even ();
+  make_single_succ_edge (bb_true, bb_join, EDGE_FALLTHRU);
+  make_single_succ_edge (bb_false, bb_join, EDGE_FALLTHRU);
   add_bb_to_loop (bb_true, bb_cond->loop_father);
   add_bb_to_loop (bb_false, bb_cond->loop_father);
 


Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-02 Thread Marc Glisse

On Sun, 2 Jul 2017, Yuri Gribov wrote:


On Sun, Jul 2, 2017 at 6:32 PM, Marc Glisse  wrote:

On Sun, 2 Jul 2017, Yuri Gribov wrote:


This is initial patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 .


Thanks. I have had this unfinished, probably wrong patch on my hard drive
for 4 years, I doubt there is much you can extract from it, but just in
case...


Thanks, this will indeed help! I only don't like !uns here:

+  bool uns = TYPE_UNSIGNED (itype);
+  bool fits = mantissa >= prec - !uns;

as I believe for INT?_MIN we need 'prec' number of bits.


INT_MIN is (minus) a power of 2, so 1 bit of mantissa is sufficient to 
represent it exactly (as long as the exponent also fits). At least for an 
IEEE754 representation using base 2, but I expect non-IEEE representations 
of floats are similar enough.


--
Marc Glisse


Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-02 Thread Yuri Gribov
On Sun, Jul 2, 2017 at 6:32 PM, Marc Glisse  wrote:
> On Sun, 2 Jul 2017, Yuri Gribov wrote:
>
>> This is initial patch for
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 .
>
> Thanks. I have had this unfinished, probably wrong patch on my hard drive
> for 4 years, I doubt there is much you can extract from it, but just in
> case...

Thanks, this will indeed help! I only don't like !uns here:

+  bool uns = TYPE_UNSIGNED (itype);
+  bool fits = mantissa >= prec - !uns;

as I believe for INT?_MIN we need 'prec' number of bits.

-Y


[gcc commit] [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external

2017-07-02 Thread Jan Kratochvil
On Sun, 02 Jul 2017 18:22:45 +0200, Jason Merrill wrote:
> I'd suggest "internal" rather than "static".  Otherwise the patch looks good.

Checked in as r249883 with: DW_IDX_GNU_internal, DW_IDX_GNU_external


Jan


[PATCH] enhance -Wrestrict for sprintf %s arguments

2017-07-02 Thread Martin Sebor

The attached patch enhances the -Wrestrict warning to detect more
than just trivial instances of overlapping copying by sprintf and
related functions.

The meat of the patch is relatively simple but because it introduces
dependencies between existing classes in the sprintf pass I had to
move the class definitions around.  That makes the changes look more
extensive than they really are.

The enhancement works by first determining the base object (or
pointer) from the destination of the sprintf call, the constant
offset into the object, and its size.  For each %s argument, it
then computes same information.  If it determines that overlap
between the two is possible it stores the information for the
directive argument (including the size of the argument) for later
processing.  After the whole call/format string has been processed,
the patch then iterates over the stored directives and their
arguments and compares the size/length of the argument against
the function's overall output.  If they overlap it issues
a warning.

Tested on x86_64-linux.

-Wrestrict is not currently included in either -Wextra or -Wall
and this patch doesn't change it even though there have been
requests to add it to one of these two options.  I'd like to do
that as a separate step.

As the next step I'd also like to extend a limited form of the
-Wrestrict enhancement to other restrict-qualified built-ins (like
strcpy), and ultimately also to user-defined functions that make
use of restrict.  I think this might perhaps best be done in
a separate pass where the computed pointer information can be
cached to avoid recomputing it for each call, but I don't expect
to be able to have the new pass (or whatever form the enhancement
might end up taking) to also handle sprintf (at least not with
the same accuracy it does now) because the sprintf data for each
format directive are not available outside the sprintf pass.

Martin
PR tree-optimization/35503 - Warning about restricted pointers?

gcc/c-family/ChangeLog:

	PR tree-optimization/35503
	* gcc/c-family/c-common.c (check_function_restrict): Avoid diagnosing
	sprintf et al. unless both -Wformat-overflow and -Wformat-truncation
	are disabled.

gcc/ChangeLog:

	PR tree-optimization/35503
	* gimple-ssa-sprintf.c (format_result::alias_info): New struct.
	(directive::argno): New member.
	(format_result::aliases, format_result::alias_count): New data members.
	(format_result::append_alias): New member function.
	(fmtresult::dst_offset): New data member.
	(pass_sprintf_length::call_info::dst_origin): New data member.
	(pass_sprintf_length::call_info::dst_field, dst_offset): Same.
	(char_type_p, array_elt_at_offset, field_at_offset): New functions.
	(get_origin_and_offset): Same.
	(format_string): Call it.
	(format_directive): Call append_alias and set directive argument
	number.
	(pass_sprintf_length::compute_format_length): Diagnose arguments
	that overlap the destination buffer.
	(pass_sprintf_length::handle_gimple_call): Initialize new members.
gcc/testsuite/ChangeLog:

	PR tree-optimization/35503
	* gcc.dg/tree-ssa/builtin-sprintf-warn-19.c: New test.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4395e51..406ebc4 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5266,6 +5266,24 @@ check_function_restrict (const_tree fndecl, const_tree fntype,
   else
 parms = TYPE_ARG_TYPES (fntype);
 
+  if (DECL_BUILT_IN (fndecl)
+  && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+{
+  switch (DECL_FUNCTION_CODE (fndecl))
+	{
+	case BUILT_IN_SPRINTF:
+	case BUILT_IN_SPRINTF_CHK:
+	case BUILT_IN_SNPRINTF:
+	case BUILT_IN_SNPRINTF_CHK:
+	  /* These are handled in gimple-ssa-sprintf.c.  */
+	  if (warn_format_overflow || warn_format_trunc)
+	return;
+
+	default:
+	  break;
+	}
+}
+
   for (i = 0; i < nargs; i++)
 TREE_VISITED (argarray[i]) = 0;
 
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index f43778b..8113353 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "intl.h"
 #include "langhooks.h"
+#include "tree-ssa-alias.h"
 
 #include "builtins.h"
 #include "stor-layout.h"
@@ -78,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "input.h"
 #include "toplev.h"
 #include "substring-locations.h"
+#include "gcc-rich-location.h"
 #include "diagnostic.h"
 
 /* The likely worst case value of MB_LEN_MAX for the target, large enough
@@ -93,6 +95,30 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace {
 
+/* Return the value of INT_MIN for the target.  */
+
+static inline HOST_WIDE_INT
+target_int_min ()
+{
+  return tree_to_shwi (TYPE_MIN_VALUE (integer_type_node));
+}
+
+/* Return the value of INT_MAX for the target.  */
+
+static inline unsigned HOST_WIDE_INT
+target_int_max ()
+{
+  return tree_to_uhwi (TYPE_MAX_VALUE (integer_type_node));
+}
+
+/* Return the value of 

Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-02 Thread Marc Glisse

On Sun, 2 Jul 2017, Yuri Gribov wrote:


This is initial patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 .


Thanks. I have had this unfinished, probably wrong patch on my hard drive 
for 4 years, I doubt there is much you can extract from it, but just in 
case...


--
Marc GlisseIndex: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 209514)
+++ gcc/fold-const.c(working copy)
@@ -6389,21 +6389,21 @@ fold_inf_compare (location_t loc, enum t
   /* x > +Inf is always false, if with ignore sNANs.  */
   if (HONOR_SNANS (mode))
 return NULL_TREE;
   return omit_one_operand_loc (loc, type, integer_zero_node, arg0);
 
 case LE_EXPR:
   /* x <= +Inf is always true, if we don't case about NaNs.  */
   if (! HONOR_NANS (mode))
return omit_one_operand_loc (loc, type, integer_one_node, arg0);
 
-  /* x <= +Inf is the same as x == x, i.e. isfinite(x).  */
+  /* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
   arg0 = save_expr (arg0);
   return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg0);
 
 case EQ_EXPR:
 case GE_EXPR:
   /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX.  */
   real_maxval (&max, neg, mode);
   return fold_build2_loc (loc, neg ? LT_EXPR : GT_EXPR, type,
  arg0, build_real (TREE_TYPE (arg0), max));
 
@@ -9389,70 +9389,121 @@ fold_comparison (location_t loc, enum tr
 
   tem = maybe_canonicalize_comparison (loc, code, type, arg0, arg1);
   if (tem)
 return tem;
 
   if (FLOAT_TYPE_P (TREE_TYPE (arg0)))
 {
   tree targ0 = strip_float_extensions (arg0);
   tree targ1 = strip_float_extensions (arg1);
   tree newtype = TREE_TYPE (targ0);
+  tree ftype = TREE_TYPE (arg0);
 
   if (TYPE_PRECISION (TREE_TYPE (targ1)) > TYPE_PRECISION (newtype))
newtype = TREE_TYPE (targ1);
 
   /* Fold (double)float1 CMP (double)float2 into float1 CMP float2.  */
-  if (TYPE_PRECISION (newtype) < TYPE_PRECISION (TREE_TYPE (arg0)))
+  if (TYPE_PRECISION (newtype) < TYPE_PRECISION (ftype))
return fold_build2_loc (loc, code, type,
fold_convert_loc (loc, newtype, targ0),
fold_convert_loc (loc, newtype, targ1));
 
   /* (-a) CMP (-b) -> b CMP a  */
   if (TREE_CODE (arg0) == NEGATE_EXPR
  && TREE_CODE (arg1) == NEGATE_EXPR)
return fold_build2_loc (loc, code, type, TREE_OPERAND (arg1, 0),
TREE_OPERAND (arg0, 0));
 
   if (TREE_CODE (arg1) == REAL_CST)
{
  REAL_VALUE_TYPE cst;
  cst = TREE_REAL_CST (arg1);
 
  /* (-a) CMP CST -> a swap(CMP) (-CST)  */
  if (TREE_CODE (arg0) == NEGATE_EXPR)
return fold_build2_loc (loc, swap_tree_comparison (code), type,
TREE_OPERAND (arg0, 0),
-   build_real (TREE_TYPE (arg1),
+   build_real (ftype,
real_value_negate (&cst)));
 
  /* IEEE doesn't distinguish +0 and -0 in comparisons.  */
  /* a CMP (-0) -> a CMP 0  */
  if (REAL_VALUE_MINUS_ZERO (cst))
return fold_build2_loc (loc, code, type, arg0,
-   build_real (TREE_TYPE (arg1), dconst0));
+   build_real (ftype, dconst0));
 
  /* x != NaN is always true, other ops are always false.  */
  if (REAL_VALUE_ISNAN (cst)
- && ! HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg1
+ && ! HONOR_SNANS (TYPE_MODE (ftype)))
{
  tem = (code == NE_EXPR) ? integer_one_node : integer_zero_node;
  return omit_one_operand_loc (loc, type, tem, arg0);
}
 
  /* Fold comparisons against infinity.  */
  if (REAL_VALUE_ISINF (cst)
- && MODE_HAS_INFINITIES (TYPE_MODE (TREE_TYPE (arg1
+ && MODE_HAS_INFINITIES (TYPE_MODE (ftype)))
{
  tem = fold_inf_compare (loc, code, type, arg0, arg1);
  if (tem != NULL_TREE)
return tem;
}
+
+ /* (double)i CMP cst is often just i CMP cst'.
+See PR 57371 for a detailed analysis and other ideas.  */
+ if (TREE_CODE (arg0) == FLOAT_EXPR)
+   {
+ tree inner = TREE_OPERAND (arg0, 0);
+ tree itype = TREE_TYPE (inner);
+ unsigned mantissa = significand_size (TYPE_MODE (ftype));
+ unsigned prec = element_precision (itype);
+ bool uns = TYPE_UNSIGNED (itype);
+ bool fits = mantissa >= prec - !uns;
+ /* If ftype cannot represent exactly all values of itype,
+we may have an inexact exception.  If the conversion from
+itype to ftype may overflow (unsigned __int128 to float),
+ 

[PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-02 Thread Yuri Gribov
Hi all,

This is initial patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's
suggestion it optimizes
  (float)lhs CMP rhs
  (double)lhs CMP rhs
to
  lhs CMP (typeof(x))rhs
whenever typeof(x) can be precisely represented by floating-point type
(e.g. short by float or int by double) and rhs can be precisely
represented by typeof(x).

Bootstrapped/regtested on x64. Ok for trunk?

I'd like to extend this further in follow-up patches:
1) fold always-false/always-true comparisons e.g.
  short x;
  (float)x > INT16_MAX;  // Always false
2) get rid of cast in comparisons with zero regardless of typeof(lhs)
when -fno-trapping-math:
  (float_or_double)lhs CMP 0

-Y


pr57371-1.patch
Description: Binary data


Re: ping^2: [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external

2017-07-02 Thread Jason Merrill
On Sun, Jul 2, 2017 at 3:25 AM, Jan Kratochvil
 wrote:
> http://dwarfstd.org/ShowIssue.php?issue=170527.1
>
> 170527.1 Jan Kratochvil DW_IDX_* for static/extern symbols Enhancement Open
>
> Section 6.1.1.4.7, pg 147
> When a debugger wants to print 'somename' it logically tries to find first 
> 'somename' as an
> external symbol in all available libraries.  Only if none such external 
> symbol is found the
> debugger starts searching for a static 'somename' symbol in those libraries.
>
> This requires to know whether a symbol in .debug_names index has 
> DW_AT_external or not.
> Otherwise a lot of needless CU expansions happen.  This extension improves 
> performance
> gain of the .debug_names index.
>
> (Discovered in an original fix by Doug Evans - GDB Bug 14125.)
>
> Proposing and asking for pre-allocation:
>   DW_IDX_static   = 6 = DW_FORM_flag_present = DIE's DW_AT_external is not 
> present
>   DW_IDX_external = 7 = DW_FORM_flag_present = DIE's DW_AT_external is present

I'd suggest "internal" rather than "static".  Otherwise the patch looks good.

Jason


Re: [patch, fortran] Implement blocked eoshift for eoshift0

2017-07-02 Thread Paul Richard Thomas
Hi Thomas,

The timings are impressive! OK for trunk.

Thanks

Paul

On 1 July 2017 at 14:48, Thomas Koenig  wrote:
> Hello world,
>
> the attached patch implements the blocked algorithm for
> constant shift for dim > 1 for eoshift0 (which handles
> the case of constant shift and constant fill value).
>
> Speedup, as for cshift, is large.  Moving a 500*500*500
> array by -3 with eo_bench.f90 (also attached):
>
> $ gfortran -O3 eo_bench.f90 && ./a.out
>  dim =1  t =   0.451796889
>  dim =2  t =   0.183514118
>  dim =3  t =   0.184015989
> $ gfortran-7 -static-libgfortran -O3 eo_bench.f90 && ./a.out
>  dim =1  t =   0.955736041
>  dim =2  t =1.42228103
>  dim =3  t =3.00043702
>
> Regression-tested.  OK for trunk?
>
> Regards
>
> Thomas
>
> 2017-07-01  Thomas Koenig  
>
> * intrinsics/eoshift0.c:  For contiguous arrays, use
> block algorithm.  Use memcpy where possible.
>
> 2017-07-01  Thomas Koenig  
>
> * gfortran/eoshift_3.f90:  New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [2/2] PR 80769: Incorrect strlen optimisation

2017-07-02 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Tue, May 16, 2017 at 09:02:08AM +0100, Richard Sandiford wrote:
>> 2017-05-16  Richard Sandiford  
>> 
>> gcc/
>>  PR tree-optimization/80769
>>  * tree-ssa-strlen.c (strinfo): Document that "stmt" is also used
>>  for malloc and calloc.  Document the new invariant that all related
>>  strinfos have delayed lengths or none do.
>>  (verify_related_strinfos): Move earlier in file.
>>  (set_endptr_and_length): New function, split out from...
>>  (get_string_length): ...here.  Also set the lengths of related
>>  strinfos.
>>  (zero_length_string): Assert that chainsi has known (rather than
>>  delayed) lengths.
>>  (adjust_related_strinfos): Likewise.
>> 
>> gcc/testsuite/
>>  PR tree-optimization/80769
>>  * gcc.dg/strlenopt-31.c: New test.
>>  * gcc.dg/strlenopt-31g.c: Likewise.
>
> Ok for trunk, sorry for the delay.  I assume 7.x is not affected, right?

Thanks, applied.  6.x and 7.x have it too, so I'll try to backport
a minimal fix if there's no fallout on trunk.

Richard


[PATCHv2][PR 56727] Bypass PLT for recursive calls

2017-07-02 Thread Yuri Gribov
Hi all,

This is a new version of previous patch
(https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00020.html), fixed
after Rainer's remarks.

-Y


pr56727-2.patch
Description: Binary data


Re: [PATCH][PR 56727] Bypass PLT for recursive calls

2017-07-02 Thread Yuri Gribov
On Sat, Jul 1, 2017 at 9:56 PM, Rainer Orth  
wrote:
> Hi Yuri,
>
>> diff -rupN gcc/gcc/testsuite/gcc.dg/pr56727-1.c 
>> gcc-56727/gcc/testsuite/gcc.dg/pr56727-1.c
>> --- gcc/gcc/testsuite/gcc.dg/pr56727-1.c  1970-01-01 01:00:00.0 
>> +0100
>> +++ gcc-56727/gcc/testsuite/gcc.dg/pr56727-1.c2017-07-01 
>> 21:36:36.0 +0200
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fPIC" } */
>
> both tests need to be restricted to target fpic...
>
>> +/* { dg-final { scan-assembler-not "@PLT" } } */
>
> ... and @PLT won't work everywhere, either.  Judging from gcc/config, it
> will be i386/x86_64 (except darwin), microblaze, mn10300, s390, sh, and
> xtensa at most.

Thanks Rainer, I'll update the patch. A pity that this can not be
tested in cross-platform way.

-Y


ping^2: [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external

2017-07-02 Thread Jan Kratochvil
http://dwarfstd.org/ShowIssue.php?issue=170527.1

170527.1 Jan Kratochvil DW_IDX_* for static/extern symbols Enhancement Open 

Section 6.1.1.4.7, pg 147
When a debugger wants to print 'somename' it logically tries to find first 
'somename' as an 
external symbol in all available libraries.  Only if none such external symbol 
is found the 
debugger starts searching for a static 'somename' symbol in those libraries.

This requires to know whether a symbol in .debug_names index has DW_AT_external 
or not.  
Otherwise a lot of needless CU expansions happen.  This extension improves 
performance 
gain of the .debug_names index.

(Discovered in an original fix by Doug Evans - GDB Bug 14125.)

Proposing and asking for pre-allocation:
  DW_IDX_static   = 6 = DW_FORM_flag_present = DIE's DW_AT_external is not 
present
  DW_IDX_external = 7 = DW_FORM_flag_present = DIE's DW_AT_external is present