[Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-06 Thread Hongtao Liu
Hi Uros and GCC:
  This patch is to fix ix86_expand_sse_comi_round whose implementation
was not correct.
  New implentation aligns with _mm_cmp_round_s[sd]_mask.

Bootstrap and regression tests for x86 is fine.
Ok for trunk?


ChangeLog:
gcc/
   * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
   Modified, original implementation isn't correct.

gcc/testsuite
   * gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
   * gcc.target/i386/avx512f-vcomisd-2.c: Likewise.

-- 
BR,
Hongtao
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 270933)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2019-05-06  H.J. Lu  
+	Hongtao Liu  
+
+	PR Target/89750
+	PR Target/86444
+	* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
+	Modified, original implementation isn't correct.
+
 2019-05-06  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
Index: gcc/config/i386/i386-expand.c
===
--- gcc/config/i386/i386-expand.c	(revision 270933)
+++ gcc/config/i386/i386-expand.c	(working copy)
@@ -9853,18 +9853,24 @@
   const struct insn_data_d *insn_p = _data[icode];
   machine_mode mode0 = insn_p->operand[0].mode;
   machine_mode mode1 = insn_p->operand[1].mode;
-  enum rtx_code comparison = UNEQ;
-  bool need_ucomi = false;
 
   /* See avxintrin.h for values.  */
-  enum rtx_code comi_comparisons[32] =
+  static const enum rtx_code comparisons[32] =
 {
-  UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
-  UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
-  UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
+  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
+  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
+  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
+  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
 };
-  bool need_ucomi_values[32] =
+  static const bool ordereds[32] =
 {
+  true,  true,  true,  false, false, false, false, true,
+  false, false, false, true,  true,  true,  true,  false,
+  true,  true,  true,  false, false, false, false, true,
+  false, false, false, true,  true,  true,  true,  false
+};
+  static const bool non_signalings[32] =
+{
   true,  false, false, true,  true,  false, false, true,
   true,  false, false, true,  true,  false, false, true,
   false, true,  true,  false, false, true,  true,  false,
@@ -9888,16 +9894,95 @@
   return const0_rtx;
 }
 
-  comparison = comi_comparisons[INTVAL (op2)];
-  need_ucomi = need_ucomi_values[INTVAL (op2)];
-
   if (VECTOR_MODE_P (mode0))
 op0 = safe_vector_operand (op0, mode0);
   if (VECTOR_MODE_P (mode1))
 op1 = safe_vector_operand (op1, mode1);
 
+  enum rtx_code comparison = comparisons[INTVAL (op2)];
+  bool ordered = ordereds[INTVAL (op2)];
+  bool non_signaling = non_signalings[INTVAL (op2)];
+  rtx const_val = const0_rtx;
+  
+  bool check_unordered = false;
+  machine_mode mode = CCFPmode;
+  switch (comparison)
+{
+case ORDERED:
+  if (!ordered)
+	{
+	  /* NB: Use CCSmode/NE for _CMP_TRUE_UQ/_CMP_TRUE_US.  */
+	  if (!non_signaling)
+	ordered = true;
+	  mode = CCSmode;
+	}
+  else
+	{
+	  /* NB: Use CCPmode/NE for _CMP_ORD_Q/_CMP_ORD_S.  */
+	  if (non_signaling)
+	ordered = false;
+	  mode = CCPmode;
+	}
+  comparison = NE;
+  break;
+case UNORDERED:
+  if (ordered)
+	{
+	  /* NB: Use CCSmode/EQ for _CMP_FALSE_OQ/_CMP_FALSE_OS.  */
+	  if (non_signaling)
+	ordered = false;
+	  mode = CCSmode;
+	}
+  else
+	{
+	  /* NB: Use CCPmode/NE for _CMP_UNORD_Q/_CMP_UNORD_S.  */
+	  if (!non_signaling)
+	ordered = true;
+	  mode = CCPmode;
+	}
+  comparison = EQ;
+  break;
+
+case LE:	/* -> GE  */
+case LT:	/* -> GT  */
+case UNGE:	/* -> UNLE  */
+case UNGT:	/* -> UNLT  */
+  std::swap (op0, op1);
+  comparison = swap_condition (comparison);
+  /* FALLTHRU */
+case GT:
+case GE:
+case UNEQ:
+case UNLT:
+case UNLE:
+case LTGT:
+  /* These are supported by CCFPmode.  NB: Use ordered/signaling
+	 COMI or unordered/non-signaling UCOMI.  Both set ZF, PF, CF
+	 with NAN operands.  */
+  if (ordered == non_signaling)
+	ordered = !ordered;
+  break;
+case EQ:
+  if (ordered)
+	check_unordered = true;
+  /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCmode for
+	 _CMP_EQ_OQ/_CMP_EQ_OS/_CMP_EQ_UQ/_CMP_EQ_US.  */
+  mode = CCmode;
+  break;
+case NE:
+  gcc_assert (!ordered);
+  check_unordered = true;
+  /* NB: COMI/UCOMI will set ZF with NAN operands.  Use CCmode for
+	 _CMP_NEQ_UQ/_CMP_NEQ_US.  */
+  mode = CCmode;
+  const_val = const1_rtx;
+  break;
+default:
+  gcc_unreachable ();
+}
+
   target = gen_reg_rtx (SImode);
-  emit_move_insn (target, 

Re: [Bug libstdc++/90277] Debug Mode test failures

2019-05-06 Thread François Dumont

Hi

    I just prefer to make the tests implementation-agnostic using reserve.

    I check that without the patch to initiate the hashtable with 11 
buckets I reproduce the failures and that with this patch it is fine.


    PR libstdc++/90277
    * testsuite/23_containers/unordered_multiset/insert/24061-multiset.cc
    (test01): Reserve for number of insertions to avoid rehash during test.
    * testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc
    (test01): Likewise.
    * testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc
    (test01): Likewise.
    (test02): Likewise.
    (test03): Likewise.

    I plan to commit it this evening if not told otherwise.

François


On 5/6/19 3:04 PM, redi at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90277

--- Comment #4 from Jonathan Wakely  ---
(In reply to François Dumont from comment #3)

As stated in my message this patch can be consider as a fix for this PR as
it reserve buckets for 11 buckets so iterators are not invalidated during
execution of the tests. You shouldn't have anymore failures, do you ?

But maybe you would prefer to make those tests independent of this
implementation detail. I'll do that too.

I think it's OK to rely on the detail as long as the tests make that explicit.
A comment saying that they assume there's no rehashing would be OK, even better
would be to use VERIFY to assert that the count doesn't change. If we make the
tests actually verify there's no rehashing, then it's OK to assume no hashing
:-)

Currently it's just a silent assumption, which isn't even true any more.



diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc
index 717e9208e4a..1f7a590267e 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/24061-multimap.cc
@@ -32,7 +32,8 @@ void test01()
   typedef Mmap::value_type value_type;
 
   Mmap mm1;
-  
+  mm1.reserve(3);
+
   iterator it1 = mm1.insert(mm1.begin(),
 			value_type("all the love in the world", 1));
   VERIFY( mm1.size() == 1 );
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
index a71a6a9539a..c38afc57438 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
@@ -29,6 +29,7 @@ void test01()
   typedef typename Map::value_type Pair;
 
   Map m;
+  m.reserve(3);
 
   auto it1 = m.insert(Pair(0, 0));
   VERIFY( it1 != m.end() );
@@ -58,6 +59,7 @@ void test02()
   typedef typename Map::value_type Pair;
 
   Map m;
+  m.reserve(5);
 
   auto it1 = m.insert(Pair(0, 0));
   auto it2 = m.insert(it1, Pair(1, 0));
@@ -89,6 +91,7 @@ void test03()
   typedef typename Map::value_type Pair;
 
   Map m;
+  m.reserve(3);
 
   auto it1 = m.insert(Pair(0, 0));
   VERIFY( it1 != m.end() );
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/24061-multiset.cc b/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/24061-multiset.cc
index 04833a67188..0c1029f27ff 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/24061-multiset.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/24061-multiset.cc
@@ -31,6 +31,7 @@ void test01()
   typedef Mset::const_iterator const_iterator;
 
   Mset ms1;
+  ms1.reserve(3);
   
   iterator it1 = ms1.insert(ms1.begin(), "all the love in the world");
   VERIFY( ms1.size() == 1 );


Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-06 Thread JunMa

在 2019/5/7 上午2:05, Martin Sebor 写道:

On 5/6/19 5:58 AM, JunMa wrote:

在 2019/5/6 下午6:02, Richard Biener 写道:

On Thu, Mar 21, 2019 at 5:57 AM JunMa  wrote:

Hi
For now, gcc can not fold code like:

const char a[5] = "123"
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding out of string length although length of a 
is 5.

This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
builtins when constant string stores in array with some trailing nuls.

This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?
We can use string_constant in gimple_fold_builtin_memchr, however it 
is a

bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think return the total length is safe in other place as well.
I used the argument in patch since I do not want any impact on
other part at all.


Using c_getstr is simpler than string_constant so it seems fine
to me but I agree that returning a size of the array rather than
the number of trailing nuls would make the API more intuitive.
I would also suggest to use a name for the variable/parameter
that makes that clear.  E.g., string_size or array_size.
(Since trailing nuls don't contribute to the length of a string
referring to length in the name is misleading.)



I found that the return length of c_getstr() is used in function
inline_expand_builtin_string_cmp(). It may emit redundant rtls for
trailing null chars when total length is returned. So Although it
is safe to return total length, It seems that using string_constant
is better.

Thanks
JunMa



Martin


Thanks
JunMa

Richard.


Regards
JunMa


gcc/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * fold-const.c (c_getstr): Add new parameter to get length of
additional
  trailing nuls after constant string.
  * gimple-fold.c (gimple_fold_builtin_memchr): consider 
trailing nuls in

  out-of-bound accesses checking.
  * fold-const-call.c (fold_const_call): Likewise.


gcc/testsuite/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * gcc.dg/builtin-memchr-4.c: New test.







Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-06 Thread JunMa

在 2019/5/6 下午7:58, JunMa 写道:

在 2019/5/6 下午6:02, Richard Biener 写道:

On Thu, Mar 21, 2019 at 5:57 AM JunMa  wrote:

Hi
For now, gcc can not fold code like:

const char a[5] = "123"
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding out of string length although length of a 
is 5.

This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
builtins when constant string stores in array with some trailing nuls.

This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?

We can use string_constant in gimple_fold_builtin_memchr, however it is a
bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think return the total length is safe in other place as well.
I used the argument in patch since I do not want any impact on
other part at all.



Although it is safe to use total length, I found that function
inline_expand_builtin_string_cmp() which used c_getstr() may emit
redundant rtls for trailing null chars when total length is returned.

Also it is trivial to handle constant string with trailing null chars.

So this updated patch follow richard's suggestion: using string_constant
directly.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Regards
JunMa

gcc/ChangeLog

2019-05-07  Jun Ma 

    PR Tree-optimization/89772
    * gimple-fold.c (gimple_fold_builtin_memchr): consider trailing nuls in
    out-of-bound accesses checking.

gcc/testsuite/ChangeLog

2019-05-07  Jun Ma 

    PR Tree-optimization/89772
    * gcc.dg/builtin-memchr-4.c: New test.

Thanks
JunMa

Richard.


Regards
JunMa


gcc/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * fold-const.c (c_getstr): Add new parameter to get length of
additional
  trailing nuls after constant string.
  * gimple-fold.c (gimple_fold_builtin_memchr): consider 
trailing nuls in

  out-of-bound accesses checking.
  * fold-const-call.c (fold_const_call): Likewise.


gcc/testsuite/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * gcc.dg/builtin-memchr-4.c: New test.




---
 gcc/gimple-fold.c   |  9 -
 gcc/testsuite/gcc.dg/builtin-memchr-4.c | 30 ++
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-memchr-4.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1b10bae..7ee5aea 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2557,7 +2557,14 @@ gimple_fold_builtin_memchr (gimple_stmt_iterator *gsi)
   const char *r = (const char *)memchr (p1, c, MIN (length, 
string_length));
   if (r == NULL)
{
- if (length <= string_length)
+ tree mem_size, offset_node;
+ string_constant (arg1, _node, _size, NULL);
+ unsigned HOST_WIDE_INT offset = (offset_node == NULL_TREE)
+ ? 0 : tree_to_uhwi (offset_node);
+ /* MEM_SIZE is the size of the array the string literal
+is stored in.  */
+ unsigned HOST_WIDE_INT string_size = tree_to_uhwi (mem_size) - offset;
+ if (length <= string_size)
{
  replace_call_with_value (gsi, build_int_cst (ptr_type_node, 0));
  return true;
diff --git a/gcc/testsuite/gcc.dg/builtin-memchr-4.c 
b/gcc/testsuite/gcc.dg/builtin-memchr-4.c
new file mode 100644
index 000..3ef424c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-memchr-4.c
@@ -0,0 +1,30 @@
+/* PR tree-optimization/89772
+   Verify that memchr calls with a pointer to a constant character
+   are folded as expected.
+   { dg-do compile }
+   { dg-options "-O1 -Wall -fdump-tree-lower" } */
+
+typedef __SIZE_TYPE__  size_t;
+typedef __WCHAR_TYPE__ wchar_t;
+
+extern void* memchr (const void*, int, size_t);
+extern int printf (const char*, ...);
+extern void abort (void);
+
+#define A(expr)\
+  ((expr)  \
+   ? (void)0   \
+   : (printf ("assertion failed on line %i: %s\n", \
+ __LINE__, #expr), \
+  abort ()))
+
+const char a[8] = {'a',0,'b'};
+
+void test_memchr_cst_char (void)
+{
+  A (!memchr (a, 'c', 2));
+  A (!memchr (a, 'c', 5));
+  A (!memchr (a, 'c', sizeof a));
+}
+
+/* { dg-final { scan-tree-dump-not "abort" "gimple" } } */
-- 
1.8.3.1



Re: Enable BF16 support (Please ignore my former email)

2019-05-06 Thread Hongtao Liu
Since   GCC 9.1 released [2019-05-03].

I'll merge this to trunk?

On Wed, Apr 17, 2019 at 7:14 PM Uros Bizjak  wrote:
>
> On Wed, Apr 17, 2019 at 1:03 PM Uros Bizjak  wrote:
> >
> > On Wed, Apr 17, 2019 at 12:29 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Apr 12, 2019 at 11:18 PM H.J. Lu  wrote:
> > > >
> > > > On Fri, Apr 12, 2019 at 3:19 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Fri, Apr 12, 2019 at 11:03 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Apr 12, 2019 at 3:30 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Apr 12, 2019 at 9:09 AM Liu, Hongtao 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi :
> > > > > > > > This patch is about to enable support for bfloat16 which 
> > > > > > > > will be in Future Cooper Lake, Please refer to 
> > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
> > > > > > > > for more details about BF16.
> > > > > > > >
> > > > > > > > There are 3 instructions for AVX512BF16: VCVTNE2PS2BF16, 
> > > > > > > > VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector 
> > > > > > > > Neural Network Instructions supporting:
> > > > > > > >
> > > > > > > > -   VCVTNE2PS2BF16: Convert Two Packed Single Data to One 
> > > > > > > > Packed BF16 Data.
> > > > > > > > -   VCVTNEPS2BF16: Convert Packed Single Data to Packed 
> > > > > > > > BF16 Data.
> > > > > > > > -   VDPBF16PS: Dot Product of BF16 Pairs Accumulated into 
> > > > > > > > Packed Single Precision.
> > > > > > > >
> > > > > > > > Since only BF16 intrinsics are supported, we treat it as HI for 
> > > > > > > > simplicity.
> > > > > > >
> > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps using 
> > > > > > > HImode
> > > > > > > instead of HFmode. Is there a compelling reason not to introduce
> > > > > > > corresponding bf16_format supporting infrastructure and declare 
> > > > > > > these
> > > > > > > intrinsics using half-binary (HBmode ?) mode instead?
> > > > > > >
> > > > > > > Uros.
> > > > > >
> > > > > > Bfloat16 isn't IEEE standard which we want to reserve HFmode for.
> > > > >
> > > > > True.
> > > > >
> > > > > > The IEEE 754 standard specifies a binary16 as having the following 
> > > > > > format:
> > > > > > Sign bit: 1 bit
> > > > > > Exponent width: 5 bits
> > > > > > Significand precision: 11 bits (10 explicitly stored)
> > > > > >
> > > > > > Bfloat16 has the following format:
> > > > > > Sign bit: 1 bit
> > > > > > Exponent width: 8 bits
> > > > > > Significand precision: 8 bits (7 explicitly stored), as opposed to 
> > > > > > 24
> > > > > > bits in a classical single-precision floating-point format
> > > > >
> > > > > This is why I proposed to introduce HBmode (and corresponding
> > > > > bfloat16_format) to distingush between ieee HFmode and BFmode.
> > > > >
> > > >
> > > > Unless there is BF16 language level support,  HBmode has no advantage
> > > > over HImode.   We can add HBmode when we gain BF16 language support.
> > > >
> > > > --
> > > > H.J.
> > >
> > > Any other comments, I'll merge this to trunk?
> >
> > It is not a regression, so please no.
>
> Ehm, "regression fix" ...
>
> Uros.



-- 
BR,
Hongtao


[PATCH] rs6000: Renumber the registers

2019-05-06 Thread Segher Boessenkool
This renumbers the registers.

It moves the VRs to 64..95, right after the GPRs and the FPRS.  This
means that the VSRs (which are aliases to the FPRs and the VRs, in
that order) are consecutive now.

It removes MQ, which has been just a stub for ages (it is a leftover
from RIOS, old POWER).

It moves the CR fields to 100..107, which is a bit easier to read
than the 68..75 is was before.

The rest fills the holes.  It should be easy to move anything else
after this series, so the exact order isn't very important anymore,
we aren't stuck with it if we dislike it.

Many things still want the GPRs to be at 0..31, and some things want
the FPRs at 32..63.  I don't think we'll ever want to change that,
so I left it be.

Small things...  It removes DWARF_FRAME_REGISTERS, it used to save
1000 or so words of memory, but it has been just a handful for a
while, and now it is one.  Some whitespace fixes.  Testing showed one
or two places where register allocation was different (not worse, not
better, just different).

Tested on BE and LE Linux (p7 and p9); committing to trunk.


Segher


2019-05-06  Segher Boessenkool  

* config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
(LR_REGNO, CTR_REGNO, CA_REGNO, ARG_POINTER_REGNUM, CR0_REGNO)
(CR1_REGNO, CR2_REGNO, CR3_REGNO, CR4_REGNO, CR5_REGNO, CR6_REGNO)
(CR7_REGNO, MAX_CR_REGNO, VRSAVE_REGNO, VSCR_REGNO)
(FRAME_POINTER_REGNUM): Change numbering.
* config/rs6000/rs6000.c (rs6000_reg_names): Adjust.
(alt_reg_names): Adjust.
(rs6000_conditional_register_usage): Don't mark hard register 64 as
fixed.
* config/rs6000/rs6000.h (FIRST_PSEUDO_REGISTER): Adjust.
(DWARF_FRAME_REGISTERS): Delete.
(DWARF2_FRAME_REG_OUT): Fix whitespace.
(FIXED_REGISTERS, CALL_USED_REGISTERS, CALL_REALLY_USED_REGISTERS):
Adjust.
(REG_ALLOC_ORDER): Adjust.
(FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM): Adjust.
(REG_CLASS_CONTENTS): Adjust.
(RETURN_ADDR_RTX): Change comment.
(REGNO_OK_FOR_INDEX_P, REGNO_OK_FOR_BASE_P): Use ARG_POINTER_REGNUM
instead of 67.
(REGISTER_NAMES): Adjust.
(ADDITIONAL_REGISTER_NAMES): Adjust.
* config/rs6000/darwin.h (REGISTER_NAMES): Adjust.

---
 gcc/config/rs6000/darwin.h  |  20 ++--
 gcc/config/rs6000/rs6000.c  |  64 +--
 gcc/config/rs6000/rs6000.h  | 272 +++-
 gcc/config/rs6000/rs6000.md |  36 +++---
 4 files changed, 203 insertions(+), 189 deletions(-)

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index d86333d..4774d1d 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -215,23 +215,27 @@ extern int darwin_emit_branch_islands;
 #undef REGISTER_NAMES
 #define REGISTER_NAMES \
 {  \
+  /* GPRs */   \
  "r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",\
  "r8",  "r9", "r10", "r11", "r12", "r13", "r14", "r15",\
 "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",\
 "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31",\
+  /* FPRs */   \
  "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",\
  "f8",  "f9", "f10", "f11", "f12", "f13", "f14", "f15",\
 "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",\
 "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",\
- "mq",  "lr", "ctr",  "ap",
\
+  /* VRs */\
+ "v0",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7",\
+ "v8",  "v9", "v10", "v11", "v12", "v13", "v14", "v15",\
+"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",\
+"v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",\
+  /* lr ctr ca ap */   \
+ "lr", "ctr", "xer",  "ap",
\
+  /* cr0..cr7 */   \
 "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7",\
-"xer", \
- "v0",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7", \
- "v8",  "v9", "v10", "v11", "v12", "v13", "v14", "v15", \
-"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \
-"v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", \
-"vrsave", "vscr",  \
-"sfp"  \
+  /* 

Re: [PATCH] PR fortran/90290 -- Check F2008 STOP code

2019-05-06 Thread Steve Kargl
On Thu, May 02, 2019 at 09:19:10AM +0300, Janne Blomqvist wrote:
> On Thu, May 2, 2019 at 9:02 AM Steve Kargl
>  wrote:
> >
> > The attach patch adds an additional check for the
> > STOP code when -std=f2008 is used.  The patch has
> > been bootstrapped and regression tested on
> > x86_64-*-freebsd for trunk.  OK to commit?
> 
> Ok.
> 

Committed revision 270928.

I have not decided if I will backport this
to other branches, yet.

-- 
Steve


Re: [PATCH] PR fortran/90166 -- Add check for module prefix

2019-05-06 Thread Steve Kargl
Ping.

On Wed, May 01, 2019 at 11:06:08PM -0700, Steve Kargl wrote:
> The attach patch adds a check that a module prefix 
> occurs only in a module, submodule, or interface.
> 
> C1547 (R1526) MODULE shall appear only in the function-stmt or
>subroutine-stmt of a module subprogram or of a nonabstract
>interface body that is declared in the scoping unit of a
>module or submodule.
> 
> The patch has been bootstrapped and regression
> tested on x86_64-*-freebsd.  OK to commit?
> 
> 2019-04-19  Steven G. Kargl  
> 
>   PR fortran/90166
>   * decl.c (in_module_or_interface): New function to check that the
>   current state is in a module, submodule, or interface.
>   (gfc_match_prefix): Use it.
> 
>   PR fortran/90166
>   * gfortran.dg/submodule_22.f08: Add additional dg-error comments.
> 
> -- 
> Steve

> Index: gcc/fortran/decl.c
> ===
> --- gcc/fortran/decl.c(revision 270181)
> +++ gcc/fortran/decl.c(working copy)
> @@ -6070,7 +6070,29 @@ cleanup:
>return m;
>  }
>  
> +static bool
> +in_module_or_interface(void)
> +{
> +  if (gfc_current_state () == COMP_MODULE
> +  || gfc_current_state () == COMP_SUBMODULE 
> +  || gfc_current_state () == COMP_INTERFACE)
> +return true;
>  
> +  if (gfc_state_stack->state == COMP_CONTAINS
> +  || gfc_state_stack->state == COMP_FUNCTION
> +  || gfc_state_stack->state == COMP_SUBROUTINE)
> +{
> +  gfc_state_data *p;
> +  for (p = gfc_state_stack->previous; p ; p = p->previous)
> + {
> +   if (p->state == COMP_MODULE || p->state == COMP_SUBMODULE 
> +   || p->state == COMP_INTERFACE)
> + return true;
> + }
> +}
> +return false;
> +}
> +
>  /* Match a prefix associated with a function or subroutine
> declaration.  If the typespec pointer is nonnull, then a typespec
> can be matched.  Note that if nothing matches, MATCH_YES is
> @@ -6102,6 +6124,13 @@ gfc_match_prefix (gfc_typespec *ts)
>   {
> if (!gfc_notify_std (GFC_STD_F2008, "MODULE prefix at %C"))
>   goto error;
> +
> +   if (!in_module_or_interface ())
> + {
> +   gfc_error ("MODULE prefix at %C found outside of a module, "
> +  "submodule, or INTERFACE");
> +   goto error;
> + }
>  
> current_attr.module_procedure = 1;
> found_prefix = true;
> Index: gcc/testsuite/gfortran.dg/submodule_22.f08
> ===
> --- gcc/testsuite/gfortran.dg/submodule_22.f08(revision 270181)
> +++ gcc/testsuite/gfortran.dg/submodule_22.f08(working copy)
> @@ -40,8 +40,10 @@ end
>  
>  submodule (mtop:submod:subsubmod) subsubsubmod ! { dg-error "Syntax error in 
> SUBMODULE statement" }
>  contains
> -  module subroutine sub3
> -r = 2.0
> -s = 2.0
> -  end subroutine sub3
> +  module subroutine sub3  ! { dg-error "found outside of a module" }
> +r = 2.0   ! { dg-error "Unexpected assignment" }
> +s = 2.0   ! { dg-error "Unexpected assignment" }
> +  end subroutine sub3 ! { dg-error "Expecting END PROGRAM statement" }
>  end
> +
> +found outside of a module


-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


[PATCH 3/3] rs6000: Remove TM regs

2019-05-06 Thread Segher Boessenkool
We do not need to expose the TM registers in debug info.  It isn't
actually useful there, because none of the theing that can modify
these register (other than explicit moves) are marked.

We also do not need the registers for GCC itself internally.  This
patch deletes them.

Tested on powerpc64-linux {-m32,-m64} and on powerpc64le-linux;
installing on trunk.


Segher


2019-05-06  Segher Boessenkool  

* config/rs6000/rs6000.md (TFHAR_REGNO, TFIAR_REGNO, TEXASR_REGNO):
Delete.
* config/rs6000/rs6000.h (FIRST_PSEUDO_REGISTER): Adjust.
(DWARF_FRAME_REGISTERS): Adjust.
(FIXED_REGISTERS, CALL_USED_REGISTERS, CALL_REALLY_USED_REGISTERS):
Adjust.
(REG_ALLOC_ORDER): Adjust.
(enum reg_class): Delete SPR_REGS.
(REG_CLASS_NAMES): Delete SPR_REGS.
(REG_CLASS_CONTENTS): Delete SPR_REGS.  Adjust for deleted TM regs.
(REGISTER_NAMES): Adjust.
(ADDITIONAL_REGISTER_NAMES): Adjust.
* config/rs6000/darwin.h (REGISTER_NAMES): Adjust.
* config/rs6000/htm.md (htm_mfspr_, htm_mtspr_): Adjust.
* config/rs6000/predicates.md (htm_spr_reg_operand): Delete.
* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Adjust.
(htm_spr_regno): Delete.
(htm_expand_builtin): Adjust: the HTM builtins now have one fewer
argument.
(rs6000_dbx_register_number): Adjust.

---
 gcc/config/rs6000/darwin.h  |  3 +--
 gcc/config/rs6000/htm.md| 10 --
 gcc/config/rs6000/predicates.md | 27 ---
 gcc/config/rs6000/rs6000.c  | 35 +--
 gcc/config/rs6000/rs6000.h  | 31 +--
 gcc/config/rs6000/rs6000.md |  3 ---
 6 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 9fb36e4..d86333d 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -231,8 +231,7 @@ extern int darwin_emit_branch_islands;
 "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \
 "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", \
 "vrsave", "vscr",  \
-"sfp", \
-"tfhar", "tfiar", "texasr" \
+"sfp"  \
 }
 
 /* This outputs NAME to FILE.  */
diff --git a/gcc/config/rs6000/htm.md b/gcc/config/rs6000/htm.md
index 66b583d..9e99afa 100644
--- a/gcc/config/rs6000/htm.md
+++ b/gcc/config/rs6000/htm.md
@@ -267,18 +267,16 @@ (define_insn "*ttest"
 
 (define_insn "htm_mfspr_"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
-(unspec_volatile:GPR [(match_operand 1 "u10bit_cint_operand" "n")
- (match_operand:GPR 2 "htm_spr_reg_operand" "")]
+(unspec_volatile:GPR [(match_operand 1 "u10bit_cint_operand" "n")]
 UNSPECV_HTM_MFSPR))]
   "TARGET_HTM"
   "mfspr %0,%1";
   [(set_attr "type" "htm")])
 
 (define_insn "htm_mtspr_"
-  [(set (match_operand:GPR 2 "htm_spr_reg_operand" "")
-(unspec_volatile:GPR [(match_operand:GPR 0 "gpc_reg_operand" "r")
- (match_operand 1 "u10bit_cint_operand" "n")]
-UNSPECV_HTM_MTSPR))]
+  [(unspec_volatile [(match_operand:GPR 0 "gpc_reg_operand" "r")
+(match_operand 1 "u10bit_cint_operand" "n")]
+   UNSPECV_HTM_MTSPR)]
   "TARGET_HTM"
   "mtspr %1,%0";
   [(set_attr "type" "htm")])
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5cc80de..2643f1a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -406,33 +406,6 @@ (define_predicate "fpr_reg_operand"
   return FP_REGNO_P (r);
 })
 
-;; Return 1 if op is a HTM specific SPR register.
-(define_predicate "htm_spr_reg_operand"
-  (match_operand 0 "register_operand")
-{
-  if (!TARGET_HTM)
-return 0;
-
-  if (SUBREG_P (op))
-op = SUBREG_REG (op);
-
-  if (!REG_P (op))
-return 0;
-
-  switch (REGNO (op))
-{
-  case TFHAR_REGNO:
-  case TFIAR_REGNO:
-  case TEXASR_REGNO:
-   return 1;
-  default:
-   break;
-}
-  
-  /* Unknown SPR.  */
-  return 0;
-})
-
 ;; Return 1 if op is a general purpose register that is an even register
 ;; which suitable for a load/store quad operation
 ;; Subregs are not allowed here because when they are combine can
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f46fdd..af0ce16 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3011,9 +3011,6 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
   rs6000_regno_regclass[CA_REGNO] = NO_REGS;
   rs6000_regno_regclass[VRSAVE_REGNO] = VRSAVE_REGS;
   rs6000_regno_regclass[VSCR_REGNO] = VRSAVE_REGS;

[PATCH 1/3] rs6000: rs6000_dbx_register_number for fp/ap/mq

2019-05-06 Thread Segher Boessenkool
The frame pointer and the argument pointer aren't real registers.  MQ
was a register on old POWER.  All three are still used as arguments to
rs6000_dbx_register_number during initialisation.  If we handle them
explicitly we can do a gcc_unreachable to catch other unexpected
registers.

Tested on powerpc64-linux {-m32,-m64} and on powerpc64le-linux;
installing on trunk.


Segher


2019-05-06  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_dbx_register_number): Handle
FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, and 64 (which was MQ).

--
 gcc/config/rs6000/rs6000.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c75fd86..6f46fdd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -36305,7 +36305,15 @@ rs6000_dbx_register_number (unsigned int regno, 
unsigned int format)
   if (regno == TEXASR_REGNO)
return 230;
 
-  return regno;
+  /* These do not make much sense.  */
+  if (regno == FRAME_POINTER_REGNUM)
+   return 111;
+  if (regno == ARG_POINTER_REGNUM)
+   return 67;
+  if (regno == 64)
+   return 100;
+
+  gcc_unreachable ();
 #endif
 }
 
@@ -36337,7 +36345,14 @@ rs6000_dbx_register_number (unsigned int regno, 
unsigned int format)
   if (regno == TEXASR_REGNO)
 return 116;
 
-  return regno;
+  if (regno == FRAME_POINTER_REGNUM)
+return 111;
+  if (regno == ARG_POINTER_REGNUM)
+return 67;
+  if (regno == 64)
+return 64;
+
+  gcc_unreachable ();
 }
 
 /* target hook eh_return_filter_mode */
-- 
1.8.3.1



[PATCH 2/3] rs6000: Delete PRE_GCC3_DWARF_FRAME_REGISTERS

2019-05-06 Thread Segher Boessenkool
We don't need this.


Segher


2019-05-06  Segher Boessenkool  

* config/rs6000/rs6000.h (PRE_GCC3_DWARF_FRAME_REGISTERS): Delete.

---
 gcc/config/rs6000/rs6000.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index ff9449c..3829e8f 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -817,9 +817,6 @@ enum data_align { align_abi, align_opt, align_both };
 
 #define FIRST_PSEUDO_REGISTER 115
 
-/* This must be included for pre gcc 3.0 glibc compatibility.  */
-#define PRE_GCC3_DWARF_FRAME_REGISTERS 77
-
 /* The sfp register and 3 HTM registers
aren't included in DWARF_FRAME_REGISTERS.  */
 #define DWARF_FRAME_REGISTERS (FIRST_PSEUDO_REGISTER - 4)
-- 
1.8.3.1



C++ PATCH for c++/78010 - bogus -Wsuggest-override warning on final function

2019-05-06 Thread Marek Polacek
-Wsuggest-override warns for virtual functions that could use the override
keyword.  But as everyone in this PR agrees, the warning shouldn't suggest
adding "override" for "final" functions.  This trivial patch implements that.

Bootstrapped/regtested on x86_64-linux, ok for trunk/9?

2019-05-06  Marek Polacek  

PR c++/78010 - bogus -Wsuggest-override warning on final function.
* class.c (check_for_override): Don't warn for final functions.

* g++.dg/warn/Wsuggest-override-2.C: New test.

diff --git gcc/cp/class.c gcc/cp/class.c
index 712169ce7e7..a4cdd9e 100644
--- gcc/cp/class.c
+++ gcc/cp/class.c
@@ -2780,7 +2780,9 @@ check_for_override (tree decl, tree ctype)
 {
   DECL_VINDEX (decl) = decl;
   overrides_found = true;
-  if (warn_override && !DECL_OVERRIDE_P (decl)
+  if (warn_override
+ && !DECL_OVERRIDE_P (decl)
+ && !DECL_FINAL_P (decl)
  && !DECL_DESTRUCTOR_P (decl))
warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wsuggest_override,
"%qD can be marked override", decl);
diff --git gcc/testsuite/g++.dg/warn/Wsuggest-override-2.C 
gcc/testsuite/g++.dg/warn/Wsuggest-override-2.C
new file mode 100644
index 000..4948902e930
--- /dev/null
+++ gcc/testsuite/g++.dg/warn/Wsuggest-override-2.C
@@ -0,0 +1,9 @@
+// PR c++/78010
+// { dg-options "-std=c++11 -Wsuggest-override" }
+
+struct A {
+  virtual void f();
+};
+struct B : A {
+  void f() final; // { dg-bogus "can be marked override" }
+};


Re: Hide move_iterator ill-form operators

2019-05-06 Thread Jonathan Wakely

On 06/05/19 19:36 +0200, François Dumont wrote:

Hi

    This is another attempt to make adapter iterator types operators 
undefined when underlying iterator type doesn't support it. For the 
move_iterator it is rather easy and even already done for the 
operator- so I just generalize it to comparison operators. It doesn't 
cover all operators of course but it is still better than current 
situation.


    * include/bits/stl_iterator.h (move_iterator<>::operator++(int)):
    Simplify implementation using underlying iterator type same
    post-increment operator.
    (move_iterator<>::operator--(int)):
    Simplify implementation using underlying iterator type same
    post-decrement operator.
    (move_iterator<>::operator<(const move_iterator<>&,
    const move_iterator<>&): Define return type as return type of the same
    expression on underlying iterator type.
    (move_iterator<>::operator<=(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    (move_iterator<>::operator>(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    (move_iterator<>::operator>=(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    * testsuite/24_iterators/move_iterator/operator_neg.cc: New.

    Ok to commit or should the Standard be amended first ?


Not OK.

The C++2a draft already solves the same problem, but differently.
Please follow the draft standard, instead of inventing something
different.


François



diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 47be1a9dbcd..c1bbc75ca43 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1121,11 +1121,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  _GLIBCXX17_CONSTEXPR move_iterator
  operator++(int)
-  {
-   move_iterator __tmp = *this;
-   ++_M_current;
-   return __tmp;
-  }
+  { return move_iterator(_M_current++); }


This is not what C++2a says.



  _GLIBCXX17_CONSTEXPR move_iterator&
  operator--()
@@ -1136,11 +1132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  _GLIBCXX17_CONSTEXPR move_iterator
  operator--(int)
-  {
-   move_iterator __tmp = *this;
-   --_M_current;
-   return __tmp;
-  }
+  { return move_iterator(_M_current--); }


This is not what C++2a says.


  _GLIBCXX17_CONSTEXPR move_iterator
  operator+(difference_type __n) const
@@ -1197,51 +1189,59 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ return !(__x == __y); }

  template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
operator<(const move_iterator<_IteratorL>& __x,
  const move_iterator<_IteratorR>& __y)
+-> decltype( __x.base() < __y.base() )


This is wrong, it needs to return bool, e.g.

   -> decltype(bool(__x.base() < __y.base()))



Re: Fix Solaris bootstrap: lto-common.c, lto-dump.c format mismatches

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 08:46:05PM +0200, Richard Biener wrote:
> >Fixed as follows.  i386-pc-solaris2.11 bootstrap has completed with
> >this
> >patch, sparc-sun-solaris2.11 is running the testsuite and
> >x86_64-pc-linux-gnu is building the runtime libs.
> >
> >Ok for mainline?
> 
> Can you use the PRI* format macros to match the types instead?

Is that sufficiently portable though?
I mean, for PRI[diouxX]64 we redefine those macros in hwint.h if needed.
But we don't have anything similar for PRI[diouxX]PTR if inttypes.h
is not available, and for size_t there isn't even any PRI* macro at all.

Jakub


Re: [PATCH] Prep for PR88828 fix

2019-05-06 Thread Bernhard Reutner-Fischer
On 3 May 2019 18:48:09 CEST, "H.J. Lu"  wrote:
>On Fri, May 3, 2019 at 6:02 AM Richard Biener 
>wrote:
>>
>>
>> The following refactors simplify_vector_constructor and adds
>> handling of constants to it in a straight-forward way.
>>
>> A followup will handle the testcases posted in HJs patch.
>>
>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>
>> Richard.
>>
>> 2019-05-03  Richard Biener  
>>
>> PR tree-optimization/88828
>> * tree-ssa-forwprop.c (get_bit_field_ref_def): Split out
>from...
>> (simplify_vector_constructor): ...here.  Handle constants in
>> the constructor.
>>
>> * gcc.target/i386/pr88828-0.c: New testcase.
>>
>> Index: gcc/tree-ssa-forwprop.c
>> ===
>> --- gcc/tree-ssa-forwprop.c (revision 270847)
>> +++ gcc/tree-ssa-forwprop.c (working copy)
>> @@ -1997,6 +1997,44 @@ simplify_permutation (gimple_stmt_iterat
>>return 0;
>>  }
>>
>> +/* Get the BIT_FIELD_REF definition of VAL, if any, looking through
>> +   conversions with code CONV_CODE or update it if still ERROR_MARK.
>> +   Return NULL_TREE if no such matching def was found.  */
>> +
>> +static tree
>> +get_bit_field_ref_def (tree val, enum tree_code _code)

Also:
/enum tree_code/s/enum //g

i think.

thanks,

>> +{
>> +  if (TREE_CODE (val) != SSA_NAME)
>> +return NULL_TREE ;
>> +  gimple *def_stmt = get_prop_source_stmt (val, false, NULL);
>> +  if (!def_stmt)
>> +return NULL_TREE;
>> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
>> +  if (code == FLOAT_EXPR
>> +  || code == FIX_TRUNC_EXPR)
>> +{
>> +  tree op1 = gimple_assign_rhs1 (def_stmt);
>> +  if (conv_code == ERROR_MARK)
>> +   {
>> + if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (val))),
>> +   GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1)
>> +   return NULL_TREE;
>> + conv_code = code;
>> +   }
>> +  else if (conv_code != code)
>> +   return NULL_TREE;
>> +  if (TREE_CODE (op1) != SSA_NAME)
>> +   return NULL_TREE;
>> +  def_stmt = SSA_NAME_DEF_STMT (op1);
>> +  if (! is_gimple_assign (def_stmt))
>> +   return NULL_TREE;
>> +  code = gimple_assign_rhs_code (def_stmt);
>> +}
>> +  if (code != BIT_FIELD_REF)
>> +return NULL_TREE;
>> +  return gimple_assign_rhs1 (def_stmt);
>> +}
>> +
>>  /* Recognize a VEC_PERM_EXPR.  Returns true if there were any
>changes.  */
>>
>>  static bool
>> @@ -2027,6 +2065,9 @@ simplify_vector_constructor (gimple_stmt
>>orig[1] = NULL;
>>conv_code = ERROR_MARK;
>>maybe_ident = true;
>> +  tree one_constant = NULL_TREE;
>> +  auto_vec constants;
>> +  constants.safe_grow_cleared (nelts);
>>FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
>>  {
>>tree ref, op1;
>> @@ -2034,68 +2075,55 @@ simplify_vector_constructor (gimple_stmt
>>if (i >= nelts)
>> return false;
>>
>> -  if (TREE_CODE (elt->value) != SSA_NAME)
>> -   return false;
>> -  def_stmt = get_prop_source_stmt (elt->value, false, NULL);
>> -  if (!def_stmt)
>> -   return false;
>> -  code = gimple_assign_rhs_code (def_stmt);
>> -  if (code == FLOAT_EXPR
>> - || code == FIX_TRUNC_EXPR)
>> +  op1 = get_bit_field_ref_def (elt->value, conv_code);
>> +  if (op1)
>> {
>> - op1 = gimple_assign_rhs1 (def_stmt);
>> - if (conv_code == ERROR_MARK)
>> + ref = TREE_OPERAND (op1, 0);
>> + unsigned int j;
>> + for (j = 0; j < 2; ++j)
>> {
>> - if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE
>(elt->value))),
>> -   GET_MODE_SIZE (TYPE_MODE (TREE_TYPE
>(op1)
>> -   return false;
>> - conv_code = code;
>> + if (!orig[j])
>> +   {
>> + if (TREE_CODE (ref) != SSA_NAME)
>> +   return false;
>> + if (! VECTOR_TYPE_P (TREE_TYPE (ref))
>> + || ! useless_type_conversion_p (TREE_TYPE
>(op1),
>> + TREE_TYPE
>(TREE_TYPE (ref
>> +   return false;
>> + if (j && !useless_type_conversion_p (TREE_TYPE
>(orig[0]),
>> +  TREE_TYPE
>(ref)))
>> +   return false;
>> + orig[j] = ref;
>> + break;
>> +   }
>> + else if (ref == orig[j])
>> +   break;
>> }
>> - else if (conv_code != code)
>> + if (j == 2)
>> return false;
>> - if (TREE_CODE (op1) != SSA_NAME)
>> -   return false;
>> - def_stmt = SSA_NAME_DEF_STMT (op1);
>> - if (! is_gimple_assign (def_stmt))
>> +
>> + unsigned int elt;
>> + if (maybe_ne (bit_field_size (op1), elem_size)
>> + || 

Re: [PATCH] libphobos: RISC-V: Fix soft-float build errors with IEEE exception flags

2019-05-06 Thread Jim Wilson
On Fri, May 3, 2019 at 3:11 PM Maciej W. Rozycki  wrote:
>  Hmm, I've been thinking a little bit about this hybrid mode and I have
> one question: how do we pass the IEEE rounding mode setting between `fcsr'
> and softfp where we have `-march=rv32imafc -mabi=ilp32' and
> `-march=rv32imac -mabi=ilp32' object modules interlinked?

If you look at libgcc/config/riscv/sfp-machine.h you will see
definitions of FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS for reading
rounding mode from the fcsr before soft-float FP operations, and
storing the exception flags back into fcsr after soft-float FP
operations, if FP regs exist..  Whether this actually works, I don't
know, I haven't tested it.  I think in practice people using
soft-float libraries are probably not relying on rounding modes and
exception flags much if at all, so this is unlikely to be a problem.
If someone does find a problem, I will worry about how to fix it then.

Jim


Re: [Patch, fortran] ISO_Fortran_binding PRs 90093, 90352 & 90355

2019-05-06 Thread Paul Richard Thomas
It helps to attach the patch!

On Mon, 6 May 2019 at 19:57, Paul Richard Thomas
 wrote:
>
> Unfortunately, this patch was still in the making at the release of
> 9.1. It is more or less self explanatory with the ChangeLogs.
>
> It should be noted that gfc_conv_expr_present could not be used in the
> fix for PR90093 because the passed descriptor is a CFI type. Instead,
> the test is for a null pointer passed.
>
> The changes to trans-array.c(gfc_trans_create_temp_array) have an eye
> on the future, as well as PR90355. I am progressing towards the point
> where all descriptors have 'span' set correctly so that
> trans.c(get_array_span) can be eliminated and much of the code in the
> callers can be simplified.
>
> Bootstrapped and regtested on FC29/x86_64 - OK for trunk and 9-branch?
>
> Paul
>
> 2019-05-06  Paul Thomas  
>
> PR fortran/90093
> * trans-decl.c (convert_CFI_desc): Test that the dummy is
> present before doing any of the conversions.
>
> PR fortran/90352
> * decl.c (gfc_verify_c_interop_param): Restore the error for
> charlen > 1 actual arguments passed to bind(C) procs.
> Clean up trailing white space.
>
> PR fortran/90355
> * trans-array.c (gfc_trans_create_temp_array): Set the 'span'
> field to the element length for all types.
> (gfc_conv_expr_descriptor): The force_no_tmp flag is used to
> prevent temporary creation, especially for substrings.
> * trans-decl.c (gfc_trans_deferred_vars): Rather than assert
> that the backend decl for the string length is non-null, use it
> as a condition before calling gfc_trans_vla_type_sizes.
> * trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): 'force_no_tmp'
> is set before calling gfc_conv_expr_descriptor.
> * trans.c (get_array_span): Move the code for extracting 'span'
> from gfc_build_array_ref to this function. This is specific to
> descriptors that are component and indirect references.
> * trans.h : Add the force_no_tmp flag bitfield to gfc_se.
>
> 2019-05-06  Paul Thomas  
>
> PR fortran/90093
> * gfortran.dg/ISO_Fortran_binding_12.f90: New test.
> * gfortran.dg/ISO_Fortran_binding_12.c: Supplementary code.
>
> PR fortran/90352
> * gfortran.dg/iso_c_binding_char_1.f90: New test.
>
> PR fortran/90355
> * gfortran.dg/ISO_Fortran_binding_4.f90: Add 'substr' to test
> the direct passing of substrings as descriptors to bind(C).
> * gfortran.dg/assign_10.f90: Increase the tree_dump count of
> 'atmp' to account for the setting of the 'span' field.
> * gfortran.dg/transpose_optimization_2.f90: Ditto.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c	(revision 270622)
--- gcc/fortran/decl.c	(working copy)
*** match_data_constant (gfc_expr **result)
*** 406,412 
  	 contains the right constant expression.  Check here.  */
if ((*result)->symtree == NULL
  	  && (*result)->expr_type == EXPR_CONSTANT
! 	  && ((*result)->ts.type == BT_INTEGER 
  	  || (*result)->ts.type == BT_REAL))
  	return m;
  
--- 406,412 
  	 contains the right constant expression.  Check here.  */
if ((*result)->symtree == NULL
  	  && (*result)->expr_type == EXPR_CONSTANT
! 	  && ((*result)->ts.type == BT_INTEGER
  	  || (*result)->ts.type == BT_REAL))
  	return m;
  
*** gfc_verify_c_interop_param (gfc_symbol *
*** 1493,1511 
  
/* Character strings are only C interoperable if they have a
   length of 1.  */
!   if (sym->ts.type == BT_CHARACTER)
  	{
  	  gfc_charlen *cl = sym->ts.u.cl;
  	  if (!cl || !cl->length || cl->length->expr_type != EXPR_CONSTANT
|| mpz_cmp_si (cl->length->value.integer, 1) != 0)
  		{
! 		  if (!gfc_notify_std (GFC_STD_F2018,
!    "Character argument %qs at %L "
!    "must be length 1 because "
!    "procedure %qs is BIND(C)",
!    sym->name, >declared_at,
!    sym->ns->proc_name->name))
! 		retval = false;
  		}
  	}
  
--- 1493,1510 
  
/* Character strings are only C interoperable if they have a
   length of 1.  */
!   if (sym->ts.type == BT_CHARACTER && !sym->attr.dimension)
  	{
  	  gfc_charlen *cl = sym->ts.u.cl;
  	  if (!cl || !cl->length || cl->length->expr_type != EXPR_CONSTANT
|| mpz_cmp_si (cl->length->value.integer, 1) != 0)
  		{
! 		  gfc_error ("Character argument %qs at %L "
! 			 "must be length 1 because "
! 			 "procedure %qs is BIND(C)",
! 			 sym->name, >declared_at,
! 			 sym->ns->proc_name->name);
! 		  retval = false;
  		}
  	}
  
*** static bool
*** 6074,6080 
  in_module_or_interface(void)
  {
if (gfc_current_state () == COMP_MODULE
!   || gfc_current_state () == COMP_SUBMODULE 
|| gfc_current_state () == COMP_INTERFACE)
  return true;
  
--- 

[Patch, fortran] ISO_Fortran_binding PRs 90093, 90352 & 90355

2019-05-06 Thread Paul Richard Thomas
Unfortunately, this patch was still in the making at the release of
9.1. It is more or less self explanatory with the ChangeLogs.

It should be noted that gfc_conv_expr_present could not be used in the
fix for PR90093 because the passed descriptor is a CFI type. Instead,
the test is for a null pointer passed.

The changes to trans-array.c(gfc_trans_create_temp_array) have an eye
on the future, as well as PR90355. I am progressing towards the point
where all descriptors have 'span' set correctly so that
trans.c(get_array_span) can be eliminated and much of the code in the
callers can be simplified.

Bootstrapped and regtested on FC29/x86_64 - OK for trunk and 9-branch?

Paul

2019-05-06  Paul Thomas  

PR fortran/90093
* trans-decl.c (convert_CFI_desc): Test that the dummy is
present before doing any of the conversions.

PR fortran/90352
* decl.c (gfc_verify_c_interop_param): Restore the error for
charlen > 1 actual arguments passed to bind(C) procs.
Clean up trailing white space.

PR fortran/90355
* trans-array.c (gfc_trans_create_temp_array): Set the 'span'
field to the element length for all types.
(gfc_conv_expr_descriptor): The force_no_tmp flag is used to
prevent temporary creation, especially for substrings.
* trans-decl.c (gfc_trans_deferred_vars): Rather than assert
that the backend decl for the string length is non-null, use it
as a condition before calling gfc_trans_vla_type_sizes.
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): 'force_no_tmp'
is set before calling gfc_conv_expr_descriptor.
* trans.c (get_array_span): Move the code for extracting 'span'
from gfc_build_array_ref to this function. This is specific to
descriptors that are component and indirect references.
* trans.h : Add the force_no_tmp flag bitfield to gfc_se.

2019-05-06  Paul Thomas  

PR fortran/90093
* gfortran.dg/ISO_Fortran_binding_12.f90: New test.
* gfortran.dg/ISO_Fortran_binding_12.c: Supplementary code.

PR fortran/90352
* gfortran.dg/iso_c_binding_char_1.f90: New test.

PR fortran/90355
* gfortran.dg/ISO_Fortran_binding_4.f90: Add 'substr' to test
the direct passing of substrings as descriptors to bind(C).
* gfortran.dg/assign_10.f90: Increase the tree_dump count of
'atmp' to account for the setting of the 'span' field.
* gfortran.dg/transpose_optimization_2.f90: Ditto.


Re: [PATCH] Various store-merging improvements (PR tree-optimization/88709, PR tree-optimization/90271)

2019-05-06 Thread Richard Biener
On May 6, 2019 3:22:42 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>The following patch adds some store-merging improvements.
>Previously, only stores where the lhs was
>ARRAY_REF, ARRAY_RANGE_REF, MEM_REF, COMPONENT_REF or BIT_FIELD_REF
>were considered, as the testcases show it is useful to consider
>stores into whole decl as well.
>Another thing is that we were considering stores where the rhs was some
>constant with fixed size mode and one that native_encode_expr can
>handle.
>This patch extends it to handle = {} stores (but not clobbers).  While
>native_encode_expr doesn't handle those, they can be handled very
>easily
>(it is a memory clearing) and we don't need have the machine mode
>restrictions for that case either.
>With those two changes, we can handle stuff like:
>  var = {};
>  var.b = 1;
>  var.d = 2;
>and turn that into
>  MEM [(struct S *)] = some_constant;
>We do have a param how many stores we consider together and when all
>the
>stores were fixed size, that naturally implied a reasonably small size
>of
>the whole store group area which we e.g. precompute the bytes for. 
>With
>= {} having almost arbitrary sizes, I feared we could easily trigger
>out of
>compile time memory errors by just having var = {} for say 2GB variable
>followed by var.a = 1; allocating 4GB of RAM (or even larger cases for
>64-bit
>hosts+targets).  So, the patch adds another parameter and its default
>to
>limit the sizes of the groups, above that boundary we just don't merge
>multiple stores into a group.
>Another thing is, if we have a large store first, typically = {} of the
>whole object, followed by some stores after it, either the initial
>clearing
>is small and we manage to turn the stores into even smaller ones that
>cover
>the complete object, but if the = {} is large, we'd often just punt
>because
>it would create more stores than originally (we really count the = {}
>as one
>store and although sometimes it could be expanded by pieces, in other
>times
>it can be far more efficient to clear everything and overwrite a few
>things).  This means, in various cases the patch could even regress on
>the
>store merging done, because previously we wouldn't even consider the =
>{},
>while now we've placed it into the same store group, so before we would
>have
>that = {} separate and then merge the rest, while now we just punt.
>This patch solves it by special casing the case where there is a clear
>of
>the whole area, followed by some stores into that area.  We check what
>is in
>that case smaller, whether to handle the = {} together with the rest,
>or
>whether to treat it as = {} separate from the rest; in the latter case
>we
>can do whatever we used to do before, but actually can do even better,
>because we know the memory area is already pre-cleared, so for the
>purposes
>of computation how many, how large and how much aligned stores we will
>do
>we can actually treat all the zero bytes in the combined value to be
>stored
>as the padding bytes (gaps); we have already stored zeros there, so we
>don't
>need to overwrite it, but if it is beneficial, we actually can
>overwrite it
>with further zeros.  When later on actually adding the individual
>bytes, we
>don't need to consider those bytes as padding though, so can avoid
>doing a
>RMW cycle, we know those bytes are zero.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2019-05-06  Jakub Jelinek  
>
>   PR tree-optimization/88709
>   PR tree-optimization/90271
>   * params.def (PARAM_STORE_MERGING_MAX_SIZE): New parameter.
>   * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Handle
>   non-clobber CONSTRUCTORs with no elts.  Remove useless tmp_int
>   variable.
>   (imm_store_chain_info::coalesce_immediate_stores): Punt if the size
>   of the store merging group is larger than
>   PARAM_STORE_MERGING_MAX_SIZE parameter.
>   (split_group): Add bzero_first argument.  If set, always emit first
>   the first store which must be = {} of the whole area and then for the
>   rest of the stores consider all zero bytes as paddings.
>   (imm_store_chain_info::output_merged_store): Check if first store
>   is = {} of the whole area and if yes, determine which setting of
>   bzero_first for split_group gives smaller number of stores.  Adjust
>   split_group callers.
>   (lhs_valid_for_store_merging_p): Allow decls.
>   (rhs_valid_for_store_merging_p): Allow non-clobber CONTRUCTORs with
>   no elts.
>   (pass_store_merging::process_store): Likewise.
>
>   * gcc.dg/store_merging_26.c: New test.
>   * gcc.dg/store_merging_27.c: New test.
>   * gcc.dg/store_merging_28.c: New test.
>   * gcc.dg/store_merging_29.c: New test.
>
>--- gcc/params.def.jj  2019-05-06 09:45:57.014019177 +0200
>+++ gcc/params.def 2019-05-06 10:06:33.036183432 +0200
>@@ -1205,6 +1205,11 @@ DEFPARAM (PARAM_MAX_STORES_TO_MERGE,
> "store merging pass.",
> 

Re: Fix Solaris bootstrap: lto-common.c, lto-dump.c format mismatches

2019-05-06 Thread Richard Biener
On May 6, 2019 8:35:15 PM GMT+02:00, Rainer Orth 
 wrote:
>The recent LTO patches broke Solaris bootstrap (both sparc and x86):
>
>/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c: In function
>'lto_file_decl_data* lto_file_read(lto_file*, std::FILE*, int*)':
>/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c:2114:32: error: format
>'%ld' expects argument of type 'long int', but argument 4 has type
>'intptr_t' {aka 'int'} [-Werror=format=]
> 2114 |   fprintf (stdout, "%2d %8ld %8ld   %s\n",
>  | ~~~^
>  ||
>  |long int
>  | %8d
> 2115 | ++i, section->start, section->len, section->name);
>  |  ~~
>  |   |
>  |   intptr_t {aka int}
>
>/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c:2114:37: error: format
>'%ld' expects argument of type 'long int', but argument 5 has type
>'std::size_t' {aka 'unsigned int'} [-Werror=format=]
> 2114 |   fprintf (stdout, "%2d %8ld %8ld   %s\n",
>  |  ~~~^
>  | |
>  | long int
>  |  %8d
> 2115 | ++i, section->start, section->len, section->name);
>  |  
>  |   |
> |   std::size_t {aka unsigned int}
>
>/vol/gcc/src/hg/trunk/local/gcc/lto/lto-dump.c: In member function
>'virtual void symbol_entry::dump()':
>/vol/gcc/src/hg/trunk/local/gcc/lto/lto-dump.c:63:25: error: format
>'%lu' expects argument of type 'long unsigned int', but argument 4 has
>type 'std::size_t' {aka 'unsigned int'} [-Werror=format=]
>63 | printf ("%s  %s  %4lu  %s  ", type_name, visibility, sz,
>name);
>  |  ~~~^~~
>  | ||
>| long unsigned intstd::size_t
>{aka unsigned int}
>  |  %4u
>
>Fixed as follows.  i386-pc-solaris2.11 bootstrap has completed with
>this
>patch, sparc-sun-solaris2.11 is running the testsuite and
>x86_64-pc-linux-gnu is building the runtime libs.
>
>Ok for mainline?

Can you use the PRI* format macros to match the types instead?

Ok with that change. 

Richard. 

>   Rainer



Fix Solaris bootstrap: lto-common.c, lto-dump.c format mismatches

2019-05-06 Thread Rainer Orth
The recent LTO patches broke Solaris bootstrap (both sparc and x86):

/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c: In function 
'lto_file_decl_data* lto_file_read(lto_file*, std::FILE*, int*)':
/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c:2114:32: error: format '%ld' 
expects argument of type 'long int', but argument 4 has type 'intptr_t' {aka 
'int'} [-Werror=format=]
 2114 |   fprintf (stdout, "%2d %8ld %8ld   %s\n",
  | ~~~^
  ||
  |long int
  | %8d
 2115 | ++i, section->start, section->len, section->name);
  |  ~~
  |   |
  |   intptr_t {aka int}

/vol/gcc/src/hg/trunk/local/gcc/lto/lto-common.c:2114:37: error: format '%ld' 
expects argument of type 'long int', but argument 5 has type 'std::size_t' {aka 
'unsigned int'} [-Werror=format=]
 2114 |   fprintf (stdout, "%2d %8ld %8ld   %s\n",
  |  ~~~^
  | |
  | long int
  |  %8d
 2115 | ++i, section->start, section->len, section->name);
  |  
  |   |
  |   std::size_t {aka unsigned int}

/vol/gcc/src/hg/trunk/local/gcc/lto/lto-dump.c: In member function 'virtual 
void symbol_entry::dump()':
/vol/gcc/src/hg/trunk/local/gcc/lto/lto-dump.c:63:25: error: format '%lu' 
expects argument of type 'long unsigned int', but argument 4 has type 
'std::size_t' {aka 'unsigned int'} [-Werror=format=]
   63 | printf ("%s  %s  %4lu  %s  ", type_name, visibility, sz, name);
  |  ~~~^~~
  | ||
  | long unsigned intstd::size_t 
{aka unsigned int}
  |  %4u

Fixed as follows.  i386-pc-solaris2.11 bootstrap has completed with this
patch, sparc-sun-solaris2.11 is running the testsuite and
x86_64-pc-linux-gnu is building the runtime libs.

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-05-06  Rainer Orth  

* lto-common.c (lto_file_read): Cast section->start, section->len
to match format.
* lto-dump.c (symbol_entry::dump): Cast sz to match format.

# HG changeset patch
# Parent  48cc59b91c89bd75e69175e99802075208e54e51
Fix Solaris bootstrap: lto-common.c, lto-dump.c format mismatches

diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
--- a/gcc/lto/lto-common.c
+++ b/gcc/lto/lto-common.c
@@ -2111,8 +2111,9 @@ lto_file_read (lto_file *file, FILE *res
 fprintf (stdout, "\nLTO Object Name: %s\n", file->filename);
 fprintf (stdout, "\nNo.OffsetSize   Section Name\n\n");
 for (section = section_list.first; section != NULL; section = section->next)
-  fprintf (stdout, "%2d %8ld %8ld   %s\n",
-	   ++i, section->start, section->len, section->name);
+  fprintf (stdout, "%2d %8ld %8lu   %s\n",
+	   ++i, (long) section->start, (unsigned long) section->len,
+	   section->name);
   }
 
   /* Find all sub modules in the object and put their sections into new hash
diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -60,7 +60,8 @@ struct symbol_entry
 const char *type_name = node->get_symtab_type_string ();
 const char *visibility = node->get_visibility_string ();
 size_t sz = get_size ();
-printf ("%s  %s  %4lu  %s  ", type_name, visibility, sz, name);
+printf ("%s  %s  %4lu  %s  ", type_name, visibility, (unsigned long) sz,
+	name);
   }
 };
 


Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-06 Thread Martin Sebor

On 5/6/19 5:58 AM, JunMa wrote:

在 2019/5/6 下午6:02, Richard Biener 写道:

On Thu, Mar 21, 2019 at 5:57 AM JunMa  wrote:

Hi
For now, gcc can not fold code like:

const char a[5] = "123"
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding out of string length although length of a 
is 5.

This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
builtins when constant string stores in array with some trailing nuls.

This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?

We can use string_constant in gimple_fold_builtin_memchr, however it is a
bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think return the total length is safe in other place as well.
I used the argument in patch since I do not want any impact on
other part at all.


Using c_getstr is simpler than string_constant so it seems fine
to me but I agree that returning a size of the array rather than
the number of trailing nuls would make the API more intuitive.
I would also suggest to use a name for the variable/parameter
that makes that clear.  E.g., string_size or array_size.
(Since trailing nuls don't contribute to the length of a string
referring to length in the name is misleading.)

Martin


Thanks
JunMa

Richard.


Regards
JunMa


gcc/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * fold-const.c (c_getstr): Add new parameter to get length of
additional
  trailing nuls after constant string.
  * gimple-fold.c (gimple_fold_builtin_memchr): consider trailing 
nuls in

  out-of-bound accesses checking.
  * fold-const-call.c (fold_const_call): Likewise.


gcc/testsuite/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * gcc.dg/builtin-memchr-4.c: New test.







Re: [wwwdocs] Document GCC 9 Solaris changes

2019-05-06 Thread Gerald Pfeifer
On Mon, 6 May 2019, Rainer Orth wrote:
> Better late than never: here's the gcc-9/changes.html update listing
> Solaris improvements.  I'm all ears for suggestions about wording or
> markup improvements.

I believe it should be "runtime library", cf. 
https://gcc.gnu.org/codingconventions.html .

And "GNU as" should be fine, instead of having to go "GNU 
as".

> Ok?

Yes (modulo the above); thank you!

Gerald


Hide move_iterator ill-form operators

2019-05-06 Thread François Dumont

Hi

    This is another attempt to make adapter iterator types operators 
undefined when underlying iterator type doesn't support it. For the 
move_iterator it is rather easy and even already done for the operator- 
so I just generalize it to comparison operators. It doesn't cover all 
operators of course but it is still better than current situation.


    * include/bits/stl_iterator.h (move_iterator<>::operator++(int)):
    Simplify implementation using underlying iterator type same
    post-increment operator.
    (move_iterator<>::operator--(int)):
    Simplify implementation using underlying iterator type same
    post-decrement operator.
    (move_iterator<>::operator<(const move_iterator<>&,
    const move_iterator<>&): Define return type as return type of the same
    expression on underlying iterator type.
    (move_iterator<>::operator<=(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    (move_iterator<>::operator>(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    (move_iterator<>::operator>=(const move_iterator<>&,
    const move_iterator<>&): Likewise.
    * testsuite/24_iterators/move_iterator/operator_neg.cc: New.

    Ok to commit or should the Standard be amended first ?

François
diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 47be1a9dbcd..c1bbc75ca43 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1121,11 +1121,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   _GLIBCXX17_CONSTEXPR move_iterator
   operator++(int)
-  {
-	move_iterator __tmp = *this;
-	++_M_current;
-	return __tmp;
-  }
+  { return move_iterator(_M_current++); }
 
   _GLIBCXX17_CONSTEXPR move_iterator&
   operator--()
@@ -1136,11 +1132,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   _GLIBCXX17_CONSTEXPR move_iterator
   operator--(int)
-  {
-	move_iterator __tmp = *this;
-	--_M_current;
-	return __tmp;
-  }
+  { return move_iterator(_M_current--); }
 
   _GLIBCXX17_CONSTEXPR move_iterator
   operator+(difference_type __n) const
@@ -1197,51 +1189,59 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !(__x == __y); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator<(const move_iterator<_IteratorL>& __x,
 	  const move_iterator<_IteratorR>& __y)
+-> decltype( __x.base() < __y.base() )
 { return __x.base() < __y.base(); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator<(const move_iterator<_Iterator>& __x,
 	  const move_iterator<_Iterator>& __y)
+-> decltype( __x.base() < __y.base() )
 { return __x.base() < __y.base(); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator<=(const move_iterator<_IteratorL>& __x,
 	   const move_iterator<_IteratorR>& __y)
+-> decltype( __x.base() <= __y.base() )
 { return !(__y < __x); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator<=(const move_iterator<_Iterator>& __x,
 	   const move_iterator<_Iterator>& __y)
+-> decltype( __x.base() <= __y.base() )
 { return !(__y < __x); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator>(const move_iterator<_IteratorL>& __x,
 	  const move_iterator<_IteratorR>& __y)
+-> decltype( __x.base() > __y.base() )
 { return __y < __x; }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator>(const move_iterator<_Iterator>& __x,
 	  const move_iterator<_Iterator>& __y)
+-> decltype( __x.base() > __y.base() )
 { return __y < __x; }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator>=(const move_iterator<_IteratorL>& __x,
 	   const move_iterator<_IteratorR>& __y)
+-> decltype( __x.base() >= __y.base() )
 { return !(__x < __y); }
 
   template
-inline _GLIBCXX17_CONSTEXPR bool
+inline _GLIBCXX17_CONSTEXPR auto
 operator>=(const move_iterator<_Iterator>& __x,
 	   const move_iterator<_Iterator>& __y)
+-> decltype( __x.base() >= __y.base() )
 { return !(__x < __y); }
 
   // DR 685.
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/operator_neg.cc b/libstdc++-v3/testsuite/24_iterators/move_iterator/operator_neg.cc
new file mode 100644
index 000..7b09425358e
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/operator_neg.cc
@@ -0,0 +1,73 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any 

Re: Patch ping (was Re: [WIP PATCH] Improve tail call analysis and inliner EH clobber through variable life analysis (PR tree-optimization/89060))

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 04:17:01PM +0200, Richard Biener wrote:
> > > +/* Data structure for compute_live_vars* functions.  */
> > >  
> > > +struct compute_live_vars_data {
> > > +  /* Vector of bitmaps for live vars at the end of basic blocks,
> > > + indexed by bb->index.  ACTIVE[ENTRY_BLOCK] must be empty bitmap,
> > > + ACTIVE[EXIT_BLOCK] is used for STOP_AFTER.  */
> > > +  vec active;
> > > +  /* Work bitmap of currently live variables.  */
> > > +  bitmap work;
> > > +  /* Bitmap of interesting variables.  Variables with uids not in this
> > > + bitmap are not tracked.  */
> > > +  bitmap vars;
> > > +};
> 
> How dense are the bitmaps?  Storage requirement will be quadratic
> so eventually using a sparseset for 'vars' and using the sparseset
> index for the bitmaps will improve things?

I don't see sparsesets ever used for DECL_UIDs, there is not even an
accessor for next_decl_uid which could be used outside of tree.c and I'm
afraid it could be huge.
The patch has a bitmap for testing if a DECL_UID is interesting to the
algorithm or not, so perhaps we could replace that bitmap with a
hash_map  that would map the interesting DECL_UIDs to
a sequential number or -1 and then perhaps use sbitmaps instead of bitmaps?

Jakub


Re: C++ PATCH for c++/90265 - ICE with generic lambda

2019-05-06 Thread Jason Merrill
On Mon, May 6, 2019 at 12:32 PM Marek Polacek  wrote:
>
> This new code
>
>   vec_safe_push (call_args, (*call_args)[nargs-1]);
>
> doesn't seem to work well because "obj" points to garbage after the 
> vec_safe_reserve
> call:
>
> template
> inline T *
> vec_safe_push (vec *, const T  CXX_MEM_STAT_INFO)
> {
>   vec_safe_reserve (v, 1, false PASS_MEM_STAT);
>   return v->quick_push (obj);
> }

Ah, sneaky pass by reference.

> But using a dedicated variable works even when vec_safe_reserve actually
> extends.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk/9?

OK, thanks.

Jason


C++ PATCH for c++/90265 - ICE with generic lambda

2019-05-06 Thread Marek Polacek
This new code

  vec_safe_push (call_args, (*call_args)[nargs-1]);

doesn't seem to work well because "obj" points to garbage after the 
vec_safe_reserve
call:

template
inline T *
vec_safe_push (vec *, const T  CXX_MEM_STAT_INFO)
{
  vec_safe_reserve (v, 1, false PASS_MEM_STAT);
  return v->quick_push (obj);
}

But using a dedicated variable works even when vec_safe_reserve actually
extends.

Bootstrapped/regtested on x86_64-linux, ok for trunk/9?

2019-05-06  Marek Polacek  

PR c++/90265 - ICE with generic lambda.
* pt.c (tsubst_copy_and_build): Use a dedicated variable for the last
element in the vector.

* g++.dg/cpp1y/lambda-generic-90265.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 3e8c70b0d15..2f2066e297c 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -18881,7 +18881,8 @@ tsubst_copy_and_build (tree t,
if (thisarg)
  {
/* Shift the other args over to make room.  */
-   vec_safe_push (call_args, (*call_args)[nargs-1]);
+   tree last = (*call_args)[nargs - 1];
+   vec_safe_push (call_args, last);
for (int i = nargs-1; i > 0; --i)
  (*call_args)[i] = (*call_args)[i-1];
(*call_args)[0] = thisarg;
diff --git gcc/testsuite/g++.dg/cpp1y/lambda-generic-90265.C 
gcc/testsuite/g++.dg/cpp1y/lambda-generic-90265.C
new file mode 100644
index 000..d9ab9b7f55f
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-90265.C
@@ -0,0 +1,4 @@
+// PR c++/90265
+// { dg-do compile { target c++14 } }
+
+void (*a)(int, int, int, void *) = [](auto, auto, auto, auto) {};


Re: Patch ping (was Re: [WIP PATCH] Improve tail call analysis and inliner EH clobber through variable life analysis (PR tree-optimization/89060))

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 04:17:01PM +0200, Richard Biener wrote:
> > > +  vec active;
> > > +  /* Work bitmap of currently live variables.  */
> > > +  bitmap work;
> > > +  /* Bitmap of interesting variables.  Variables with uids not in this
> > > + bitmap are not tracked.  */
> > > +  bitmap vars;
> > > +};
> 
> How dense are the bitmaps?  Storage requirement will be quadratic
> so eventually using a sparseset for 'vars' and using the sparseset
> index for the bitmaps will improve things?  Or re-using live

I'll gather some statistics.  I was using bitmaps because expansion uses
them for the same purpose as well.

> > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next ())
> > > +{
> > > +  gimple *stmt = gsi_stmt (gsi);
> > > +
> > > +  if (gimple_clobber_p (stmt))
> > > + {
> > > +   tree lhs = gimple_assign_lhs (stmt);
> > > +   if (VAR_P (lhs) && bitmap_bit_p (data->vars, DECL_UID (lhs)))
> > > + bitmap_clear_bit (data->work, DECL_UID (lhs));
> 
> I think this clearing causes PR90348.

But if we remove it, we totally kill all the stack slot sharing I'm afraid.
Looking at that PR now.

> I wonder if it is possible to split analysis and transform here
> some more - the above is called for all preds of EXIT, if we
> first analyze all of them and then compute live once if needed,
> pruning invalid ones and then doing transform this would at least
> make LIVE only computed once.

Yeah, that is something I've mentioned in the mail with the patch, possibly
compute on the first tail call being analyzed for all other tail calls and
then for each one just walk the single bb once again.
> 
> It should be also possible to restrict LIVE to the sub-CFG leading to
> the EXIT preds with tail-call candidates as well, no?

Not sure how.  Especially if we compute for all tail calls.

Jakub


Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Segher Boessenkool
On Mon, May 06, 2019 at 03:36:06PM +0200, Richard Biener wrote:
> On Mon, 6 May 2019, Martin Liška wrote:
> > 2) faster bootstrap on a massively parallel machine (64+ cores)
> 
> I guess for this we can also try to do LTO bootstap and
> LTO-link libbackend itself.  LTO bootstrap is only slow because
> we build everything 12 times.

Currently it is dominated by the serial things, like the handful of huge
files it compiles.  The hundreds of smaller files take only a few seconds
*combined*.

> I'm most interested in faster bootstrap on low-core machines
> (4 to 6 physical cores), since that's what I am doing most of the time.
> This is dominated by testing time, not bootstrap time.

Yeah.

On a bigger machine, regstrap is 61m for me, while regstrap with
--disable-libgomp is 41m.  I also have guality and prettyprinters
disabled, those take disproportionally much testing time as well.


Segher


Re: [PATCH v2, i386]: Fix PR89221, --enable-frame-pointer does not work as intended

2019-05-06 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 9:25 AM Rainer Orth  
wrote:
>
> Hi Uros,
>
> > On Fri, Feb 8, 2019 at 1:24 PM Uros Bizjak  wrote:
> >
> >> Attached patch fixes --enable-frame-pointer handling, and this way
> >> makes a couple of defines in config/i386/sol2.h obsolete.
> >
> > It turned out that --enable-frame-pointer does not work for multilibs
> > at all. So, instead of pretending that -m32 on x86_64 and -m64 on i686
> > works as advertised, unify 32bit and 64bit handling.
>
> this is effectively what I came up with when testing the first version
> of the patch on i386-pc-solaris2.11 and noticing the -m64 regressions.
>
> I've now re-bootstrapped this exact version without regressions, so from
> a Solaris/x86 POV this is good to go.

Now committed to mainline SVN as attached.

Uros.
Index: config/i386/i386-options.c
===
--- config/i386/i386-options.c  (revision 270904)
+++ config/i386/i386-options.c  (working copy)
@@ -2198,16 +2198,12 @@
 #define USE_IX86_FRAME_POINTER 0
 #endif
 
-#ifndef USE_X86_64_FRAME_POINTER
-#define USE_X86_64_FRAME_POINTER 0
-#endif
-
   /* Set the default values for switches whose default depends on TARGET_64BIT
  in case they weren't overwritten by command line options.  */
   if (TARGET_64BIT_P (opts->x_ix86_isa_flags))
 {
   if (opts->x_optimize >= 1 && !opts_set->x_flag_omit_frame_pointer)
-   opts->x_flag_omit_frame_pointer = !USE_X86_64_FRAME_POINTER;
+   opts->x_flag_omit_frame_pointer = !USE_IX86_FRAME_POINTER;
   if (opts->x_flag_asynchronous_unwind_tables
  && !opts_set->x_flag_unwind_tables
  && TARGET_64BIT_MS_ABI)
Index: config/i386/sol2.h
===
--- config/i386/sol2.h  (revision 270904)
+++ config/i386/sol2.h  (working copy)
@@ -248,9 +248,6 @@
 #define ASAN_REJECT_SPEC \
   DEF_ARCH64_SPEC("%e:-fsanitize=address is not supported in this 
configuration")
 
-#define USE_IX86_FRAME_POINTER 1
-#define USE_X86_64_FRAME_POINTER 1
-
 #undef NO_PROFILE_COUNTERS
 
 #undef MCOUNT_NAME
Index: config.gcc
===
--- config.gcc  (revision 270913)
+++ config.gcc  (working copy)
@@ -607,12 +607,6 @@
echo "This target does not support --with-abi."
exit 1
fi
-   if test "x$enable_cld" = xyes; then
-   tm_defines="${tm_defines} USE_IX86_CLD=1"
-   fi
-   if test "x$enable_frame_pointer" = xyes; then
-   tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
-   fi
;;
 x86_64-*-*)
case ${with_abi} in
@@ -633,12 +627,6 @@
echo "Unknown ABI used in --with-abi=$with_abi"
exit 1
esac
-   if test "x$enable_cld" = xyes; then
-   tm_defines="${tm_defines} USE_IX86_CLD=1"
-   fi
-   if test "x$enable_frame_pointer" = xyes; then
-   tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
-   fi
;;
 arm*-*-*)
tm_p_file="arm/arm-flags.h ${tm_p_file} arm/aarch-common-protos.h"
Index: configure
===
--- configure   (revision 270904)
+++ configure   (working copy)
@@ -1688,8 +1688,7 @@
   --enable-leading-mingw64-underscores
   enable leading underscores on 64 bit mingw targets
   --enable-cldenable -mcld by default for 32bit x86
-  --enable-frame-pointer  enable -fno-omit-frame-pointer by default for 32bit
-  x86
+  --enable-frame-pointer  enable -fno-omit-frame-pointer by default for x86
   --disable-win32-registry
   disable lookup of installation paths in the Registry
   on Windows hosts
@@ -12199,8 +12198,7 @@
 
 case $target_os in
 linux* | darwin[8912]*)
-  # Enable -fomit-frame-pointer by default for Linux and Darwin with
-  # DWARF2.
+  # Enable -fomit-frame-pointer by default for Linux and Darwin with DWARF2.
   enable_frame_pointer=no
   ;;
 *)
@@ -12211,6 +12209,17 @@
 fi
 
 
+case $target in
+i[34567]86-*-* | x86_64-*-*)
+   if test "x$enable_cld" = xyes; then
+   tm_defines="${tm_defines} USE_IX86_CLD=1"
+   fi
+   if test "x$enable_frame_pointer" = xyes; then
+   tm_defines="${tm_defines} USE_IX86_FRAME_POINTER=1"
+   fi
+   ;;
+esac
+
 # Windows32 Registry support for specifying GCC installation paths.
 # Check whether --enable-win32-registry was given.
 if test "${enable_win32_registry+set}" = set; then :
@@ -18646,7 +18655,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18649 "configure"
+#line 18658 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18752,7 +18761,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext 

[PATCH] Eliminates phi on branch for CMP result

2019-05-06 Thread Jiufu Guo
Hi,

This patch implements the optimization in PR77820.  The optimization
eliminates phi and phi's basic block, if the phi is used only by
condition branch, and the phi's incoming value in the result of a
CMP result.

This optimization eliminates:
1. saving CMP result: p0 = a CMP b.
2. additional CMP on branch: if (phi != 0).
3. converting CMP result if there is phi = (INT_CONV) p0 if there is.

  
  p0 = a CMP b
  goto ;

  
  p1 = c CMP d
  goto ;

  
  # phi = PHI 
  if (phi != 0) goto ; else goto ;

Transform to:

  
  p0 = a CMP b
  if (p0 != 0) goto ; else goto ;

  
  p1 = c CMP d
  if (p1 != 0) goto ; else goto ;

Bootstrapped and tested on powerpc64le with no regressions, and testcases were
saved. Is this ok for trunk?

Thanks!

[gcc]
2019-05-06  Jiufu Guo  
Lijia He  

PR tree-optimization/77820
* tree-ssa-mergephicmp.c: New file.
* Makefile.in (OBJS): Add tree-ssa-mergephicmp.o.
* common.opt (ftree-mergephicmp): New flag.
* passes.def (pass_mergephicmp): New pass.
* timevar.def (TV_TREE_MERGEPHICMP): New timevar.
* tree-pass.h: New file.

[gcc/testsuite]
2019-05-06  Jiufu Guo  
Lijia He  

PR tree-optimization/77820
* gcc.dg/tree-ssa/phi_on_compare-1.c: New testcase.
* gcc.dg/tree-ssa/phi_on_compare-2.c: New testcase.


---
 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/passes.def   |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c |  31 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c |  31 +++
 gcc/timevar.def  |   1 +
 gcc/tree-pass.h  |   1 +
 gcc/tree-ssa-mergephicmp.c   | 260 +++
 8 files changed, 330 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
 create mode 100644 gcc/tree-ssa-mergephicmp.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d186d71..9729501 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1567,6 +1567,7 @@ OBJS = \
tree-ssa-reassoc.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
+   tree-ssa-mergephicmp.o \
tree-ssa-sink.o \
tree-ssa-strlen.o \
tree-ssa-structalias.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d342c4f..5ea5ed2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2702,6 +2702,10 @@ ftree-salias
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+ftree-mergephicmp
+Common Report Var(flag_mergephicmp) Init(1) Optimization
+Enable optimization on branch phi compare on trees.
+
 ftree-sink
 Common Report Var(flag_tree_sink) Optimization
 Enable SSA code sinking on trees.
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c4..e3a3913 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -249,6 +249,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lim);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
+  NEXT_PASS (pass_mergephicmp);
   NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
new file mode 100644
index 000..2e3f4f6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  _Bool t;
+  if (x)
+{
+  t = a < b;
+}
+  else if (d == a + b)
+{
+  t = c < d;
+}
+  else
+{
+  t = a == c;
+}
+
+  if (t)
+{
+  g1 ();
+  g ();
+}
+}
+
+/* { dg-final { scan-tree-dump-not "PHI" "mergephicmp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
new file mode 100644
index 000..7c35417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-mergephicmp" } */
+
+void g (void);
+void g1 (void);
+
+void
+f (long a, long b, long c, long d, int x)
+{
+  int t;
+  if (x)
+{
+  t = a < b;
+}
+  else if (d == a + b)
+{
+  t = c < d;
+}
+  else
+{
+  t = a == c;
+}
+
+  if (t)
+{
+  g1 ();
+  g ();
+}
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 0 "mergephicmp" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 5415446..91f92d7 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES  , "tree split crit 
edges")
 DEFTIMEVAR (TV_TREE_REASSOC  , "tree reassociation")
 DEFTIMEVAR (TV_TREE_PRE , "tree PRE")
 

Re: Patch ping (was Re: [WIP PATCH] Improve tail call analysis and inliner EH clobber through variable life analysis (PR tree-optimization/89060))

2019-05-06 Thread Richard Biener
On Fri, 3 May 2019, Jakub Jelinek wrote:

> Hi!
> 
> I'd like to ping this patch for stage1.
> Bootstrapped/regtested it again last night successfully.
> 
> On Fri, Feb 08, 2019 at 08:36:33AM +0100, Jakub Jelinek wrote:
> > The following patch uses a simple data flow to find live (addressable)
> > variables at certain spots (for tail call discovery at the point of the
> > potential tail call, so that we don't reject tail calls because of variables
> > that can be known out of scope already so that people don't have to find
> > ugly workarounds if they really need tail calls, and for the recently
> > added inline EH pad clobber addition so that we don't add too many
> > variables).  Bootstrapped/regtested on x86_64-linux and i686-linux.
> > 
> > Haven't gathered statistics on how often does it trigger yet.
> > Shall I use double queue (pending/working sbitmaps) to speed it up (guess I
> > could gather statistics if that helps reasonably)?  But if so, perhaps
> > add_scope_conflicts should change too.  I haven't shared code with
> > what add_scope_conflicts does because it isn't really that big chunk of code
> > and would need many modifications to make it generic enough to handle
> > add_scope_conflicts needs, plus as I wanted to make it a helper for other
> > optimizations, I didn't want to use bb->aux etc.  For tail call, I see
> > another option to compute it unconditionally once at the start of the pass
> > for all may_be_aliased auto_var_in_fn_p vars and then just walk a single
> > bb with each (potential) tail call, instead of computing it for every
> > potential tail call again (on the other side with perhaps smaller set of
> > variables).  In that case e.g. vars == NULL argument could imply the
> > VAR_P && may_be_aliased && auto_var_in_fn_p test instead of bitmap_bit_p
> > test, we could drop the stop_after argument and instead export the _1
> > function renamed to something to walk a single bb (or wrapper to it that
> > would set up the structure) until stop_after.
> > 
> > Thoughts on this?
> > 
> > 2019-02-08  Jakub Jelinek  
> > 
> > PR tree-optimization/89060
> > * tree-ssa-live.h (compute_live_vars, destroy_live_vars): Declare.
> > * tree-ssa-live.c: Include gimple-walk.h and cfganal.h.
> > (struct compute_live_vars_data): New type.
> > (compute_live_vars_visit, compute_live_vars_1, compute_live_vars,
> > destroy_live_vars): New functions.
> > * tree-tailcall.c (find_tail_calls): Perform variable life analysis
> > before punting.
> > * tree-inline.h (struct copy_body_data): Add eh_landing_pad_dest
> > member.
> > * tree-inline.c (add_clobbers_to_eh_landing_pad): Remove BB argument.
> > Perform variable life analysis to select variables that really need
> > clobbers added.
> > (copy_edges_for_bb): Don't call add_clobbers_to_eh_landing_pad here,
> > instead set id->eh_landing_pad_dest and assert it is the same.
> > (copy_cfg_body): Call it here if id->eh_landing_pad_dest is non-NULL.
> > 
> > * gcc.dg/tree-ssa/pr89060.c: New test.
> > 
> > --- gcc/tree-ssa-live.h.jj  2019-01-01 12:37:16.967978068 +0100
> > +++ gcc/tree-ssa-live.h 2019-02-07 19:02:58.233530924 +0100
> > @@ -265,6 +265,8 @@ extern tree_live_info_p calculate_live_r
> >  extern void debug (tree_live_info_d );
> >  extern void debug (tree_live_info_d *ptr);
> >  extern void dump_live_info (FILE *, tree_live_info_p, int);
> > +extern vec compute_live_vars (struct function *, bitmap, 
> > gimple *);
> > +extern void destroy_live_vars (vec &);
> >  
> >  
> >  /*  Return TRUE if P is marked as a global in LIVE.  */
> > --- gcc/tree-ssa-live.c.jj  2019-01-01 12:37:16.055993032 +0100
> > +++ gcc/tree-ssa-live.c 2019-02-07 19:34:33.046401259 +0100
> > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.
> >  #include "stringpool.h"
> >  #include "attribs.h"
> >  #include "optinfo.h"
> > +#include "gimple-walk.h"
> > +#include "cfganal.h"
> >  
> >  static void verify_live_on_entry (tree_live_info_p);
> >  
> > @@ -1194,8 +1196,142 @@ calculate_live_ranges (var_map map, bool
> >  
> >return live;
> >  }
> > +
> > +/* Data structure for compute_live_vars* functions.  */
> >  
> > +struct compute_live_vars_data {
> > +  /* Vector of bitmaps for live vars at the end of basic blocks,
> > + indexed by bb->index.  ACTIVE[ENTRY_BLOCK] must be empty bitmap,
> > + ACTIVE[EXIT_BLOCK] is used for STOP_AFTER.  */
> > +  vec active;
> > +  /* Work bitmap of currently live variables.  */
> > +  bitmap work;
> > +  /* Bitmap of interesting variables.  Variables with uids not in this
> > + bitmap are not tracked.  */
> > +  bitmap vars;
> > +};

How dense are the bitmaps?  Storage requirement will be quadratic
so eventually using a sparseset for 'vars' and using the sparseset
index for the bitmaps will improve things?  Or re-using live
code mapping?

> > +/* Callback for walk_stmt_load_store_addr_ops.  If OP is a VAR_DECL with
> > +   uid set in DATA->vars, 

Re: [PATCH][stage1] Support profile (BB counts and edge probabilities) in GIMPLE FE.

2019-05-06 Thread Richard Biener
On Mon, May 6, 2019 at 9:59 AM Martin Liška  wrote:
>
> On 5/2/19 2:31 PM, Richard Biener wrote:
> > On Mon, Apr 29, 2019 at 2:51 PM Martin Liška  wrote:
> >>
> >> On 4/26/19 3:18 PM, Richard Biener wrote:
> >>> On Wed, Apr 10, 2019 at 10:12 AM Martin Liška  wrote:
> 
>  On 4/9/19 3:19 PM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> There's updated version that supports profile quality for both counts
> >> and probabilities. I'm wondering whether ENTRY and EXIT BBs needs to
> >> have set probability. Apparently, I haven't seen any verifier that
> >> would complain.
> >
> > Well, you do not need to define it but then several cases will
> > degenerate. In particular BB frequencies (for callgraph profile or
> > hot/cold decisions) are calculated as ratios of entry BB and given BB
> > count. If entry BB is undefined you will get those undefined and
> > heuristics will resort to conservative answers.
> >
> > I do not think we use exit block count.
> >
> > Honza
> >
> 
>  Thank you Honza for explanation. I'm sending version of the patch
>  that supports entry BB count.
> 
>  I've been testing the patch right now.
> >>>
> >>> Can you move the GIMPLE/RTL FE specific data in c_declspecs to
> >>> a substructure accessed via indirection?  I guess enlarging that
> >>> isn't really what we should do.  You'd move gimple_or_rtl_pass
> >>> there and make that pointer one to a struct aux_fe_data
> >>> (lifetime managed by the respective RTL/GIMPLE FE, thus
> >>> to be freed there)?  Joseph, do you agree or is adding more
> >>> stuff to c_declspecs OK (I would guess it could be a few more
> >>> elements in the future).
> >>
> >> Let's wait here for Joseph.
> >
> > So looks like it won't matter so let's go with the current approach
> > for the moment.
> >
> >>>
> >>> -c_parser_gimple_parse_bb_spec (tree val, int *index)
> >>> +c_parser_gimple_parse_bb_spec (tree val, gimple_parser ,
> >>> +  int *index, profile_probability 
> >>> *probablity)
> >>>  {
> >>>
> >>> so this will allow specifying probability in PHI node arguments.
> >>> I think we want to split this into the already existing part
> >>> and a c_parser_gimple_parse_bb_spec_with_edge_probability
> >>> to be used at edge sources.
> >>
> >> Yes, that's a good idea!
> >>
> >>>
> >>> +  if (cfun->curr_properties & PROP_cfg)
> >>> +{
> >>> +  update_max_bb_count ();
> >>> +  set_hot_bb_threshold (hot_bb_threshold);
> >>> +  ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = entry_bb_count;
> >>>
> >>> I guess the last one should be before update_max_bb_count ()?
> >>
> >> Same here.
> >>
> >>>
> >>> +}
> >>>
> >>> + /* Parse profile: quality(value) */
> >>>   else
> >>> {
> >>> - c_parser_error (parser, "unknown block specifier");
> >>> - return return_p;
> >>> + tree q;
> >>> + profile_quality quality;
> >>> + tree v = c_parser_peek_token (parser)->value;
> >>>
> >>> peek next token before the if/else and do
> >>>
> >>>else if (!strcmp (...))
> >>>
> >>> as in the loop_header case.  I expected we can somehow share
> >>> parsing of profile quality and BB/edge count/frequency?  How's
> >>> the expected syntax btw, comments in the code should tell us.
> >>> I'm guessing it's quality-id '(' number ')' and thus it should be
> >>> really shareable between edge and BB count and also __GIMPLE
> >>> header parsing?  So parse_profile_quality should be
> >>> parse_profile () instead, resulting in a combined value
> >>> (we do use the same for edge/bb?).
> >>
> >> It's problematic, there are different error messages for count/frequency.
> >> Moreover call to parse_profile_quality in c_parser_gimple_or_rtl_pass_list
> >> is a way how to test that next 'token' is a profile count.
> >
> > Who cares about error messages...  But sure, I'm just proposing to
> > merge testing for next token and actual parsing.
>
> After I've done removal of hot_bb_threshold parsing, there are just 2 usages
> of parse_profile_quality. I would like to leave it as it, not introducing a 
> wrappers.
>
> >
> >>>
> >>> +  else if (!strcmp (op, "hot_bb_threshold"))
> >>> +   {
> >>>
> >>> I'm not sure about this - it doesn't make sense to specify this
> >>> on a per-function base since it seems to control a global
> >>> variable (booo!)?
> >>
> >> Yep, shame on me!
> >>
> >>> Isn't this instead computed on-demand
> >>> based on profile_info->sum_max?
> >>
> >> No it's a global value shared among functions.
> >>
> >>> If not then I think
> >>> we need an alternate way of funneling in global state into
> >>> the GIMPLE FE.
> >>
> >> What about --param gimple-fe-hot-bb-threshold ?
> >
> > I thought about that, yes ...  in absence can it actually be
> > "computed"?
>
> Renamed to it.
>
> Patch can bootstrap on x86_64-linux-gnu 

GCC 8 backports

2019-05-06 Thread Martin Liška
Hi.

I'm sending following backport that I've just tested.

Martin
>From ad56099403b078303477e3f501ef09b2e2b6e450 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 6 May 2019 07:35:59 +
Subject: [PATCH] Backport r270899

gcc/ChangeLog:

2019-05-06  Martin Liska  

	PR sanitizer/90312
	* config/i386/i386-options.c (ix86_option_override_internal): Error only
	when -mabi is selected to a non-default version.

gcc/testsuite/ChangeLog:

2019-05-06  Martin Liska  

	PR sanitizer/90312
	* gcc.dg/asan/pr87930.c: Run the test only on *linux or *gnu
	systems.
	* gcc.dg/tsan/pr88017.c: Likewise.
---
 gcc/config/i386/i386.c  | 17 +++--
 gcc/testsuite/gcc.dg/asan/pr87930.c |  2 +-
 gcc/testsuite/gcc.dg/tsan/pr88017.c |  2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a1ffd3769f..5b946e1a32f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3705,12 +3705,17 @@ ix86_option_override_internal (bool main_args_p,
 error ("%<-mabi=ms%> not supported with X32 ABI");
   gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
-  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == MS_ABI)
-error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
-  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
-error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
-  if ((opts->x_flag_sanitize & SANITIZE_THREAD) && opts->x_ix86_abi == MS_ABI)
-error ("%<-mabi=ms%> not supported with %<-fsanitize=thread%>");
+  const char *abi_name = opts->x_ix86_abi == MS_ABI ? "ms" : "sysv";
+  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS)
+  && opts->x_ix86_abi != DEFAULT_ABI)
+error ("%<-mabi=%s%> not supported with %<-fsanitize=address%>", abi_name);
+  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS)
+  && opts->x_ix86_abi != DEFAULT_ABI)
+error ("%<-mabi=%s%> not supported with %<-fsanitize=kernel-address%>",
+	   abi_name);
+  if ((opts->x_flag_sanitize & SANITIZE_THREAD)
+  && opts->x_ix86_abi != DEFAULT_ABI)
+error ("%<-mabi=%s%> not supported with %<-fsanitize=thread%>", abi_name);
 
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
diff --git a/gcc/testsuite/gcc.dg/asan/pr87930.c b/gcc/testsuite/gcc.dg/asan/pr87930.c
index 4f8e6999fde..5a65d3fb030 100644
--- a/gcc/testsuite/gcc.dg/asan/pr87930.c
+++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-do compile { target { { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } && lp64 } } } */
 /* { dg-options "-fsanitize=address -mabi=ms" } */
 
 int i;
diff --git a/gcc/testsuite/gcc.dg/tsan/pr88017.c b/gcc/testsuite/gcc.dg/tsan/pr88017.c
index 82693a67e87..10df2818b0d 100644
--- a/gcc/testsuite/gcc.dg/tsan/pr88017.c
+++ b/gcc/testsuite/gcc.dg/tsan/pr88017.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-do compile { target { { i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } && lp64 } } } */
 /* { dg-options "-fsanitize=thread -mabi=ms" } */
 
 int i;
-- 
2.21.0



Re: [PATCH] Append to target_gtfiles in order to fix Darwin bootstrap.

2019-05-06 Thread Martin Liška
On 5/6/19 3:52 PM, Jakub Jelinek wrote:
> On Mon, May 06, 2019 at 03:47:53PM +0200, Martin Liška wrote:
>> The patch append to target_gtfiles at 3 places instead of overwriting
>> that.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-05-06  Martin Liska  
>>
>>  * config.gcc: Append to target_gtfiles.
>> ---
>>  gcc/config.gcc | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>>
> 
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index 5124ea00792..f119f82e475 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -383,7 +383,7 @@ i[34567]86-*-*)
>>  cxx_target_objs="i386-c.o"
>>  d_target_objs="i386-d.o"
>>  extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o 
>> x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o 
>> i386-features.o"
>> -  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c 
>> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
>> +target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c 
>> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
> 
> I think there is no need to add $target_gtfiles here, you know it is empty,
> the first spot in config.gcc that touches it is this switch based on CPU.
> Just fix up the indentation.

Ah, got it.

> 
>> @@ -416,7 +416,7 @@ x86_64-*-*)
>>  d_target_objs="i386-d.o"
>>  extra_options="${extra_options} fused-madd.opt"
>>  extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o 
>> x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o 
>> i386-features.o"
>> -  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c 
>> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
>> +target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c 
>> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
>>  extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
>> pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
>> nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
> 
> Ditto.
> 
>> @@ -693,7 +693,7 @@ case ${target} in
>>esac
>>tm_file="${tm_file} ${cpu_type}/darwin.h"
>>tm_p_file="${tm_p_file} darwin-protos.h"
>> -  target_gtfiles="\$(srcdir)/config/darwin.c"
>> +  target_gtfiles="$target_gtfiles \$(srcdir)/config/darwin.c"
>>extra_options="${extra_options} darwin.opt"
>>c_target_objs="${c_target_objs} darwin-c.o"
>>cxx_target_objs="${cxx_target_objs} darwin-c.o"
>>
> 
> This is insufficient, needs to be done also in the 3
>   target_gtfiles="\$(srcdir)/config/i386/winnt.c"
> cases.

Done that. I'm going to install the patch.

Martin

> 
> Ok with those changes.
> 
>   Jakub
> 

>From 5761aaab91e10390321efec6f7a09001eb93e94d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 6 May 2019 13:03:59 +0200
Subject: [PATCH] Append to target_gtfiles in order to fix Darwin bootstrap.

gcc/ChangeLog:

2019-05-06  Martin Liska  

	* config.gcc: Append to target_gtfiles and fix indentation.
---
 gcc/config.gcc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5124ea00792..6ac187ce0c1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -383,7 +383,7 @@ i[34567]86-*-*)
 	cxx_target_objs="i386-c.o"
 	d_target_objs="i386-d.o"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o i386-features.o"
-  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
+	target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		   pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
@@ -416,7 +416,7 @@ x86_64-*-*)
 	d_target_objs="i386-d.o"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o i386-features.o"
-  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
+	target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		   pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		   nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
@@ -693,7 +693,7 @@ case ${target} in
   esac
   tm_file="${tm_file} ${cpu_type}/darwin.h"
   tm_p_file="${tm_p_file} darwin-protos.h"
-  

Re: [PATCH] Append to target_gtfiles in order to fix Darwin bootstrap.

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 03:47:53PM +0200, Martin Liška wrote:
> The patch append to target_gtfiles at 3 places instead of overwriting
> that.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-05-06  Martin Liska  
> 
>   * config.gcc: Append to target_gtfiles.
> ---
>  gcc/config.gcc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> 

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5124ea00792..f119f82e475 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -383,7 +383,7 @@ i[34567]86-*-*)
>   cxx_target_objs="i386-c.o"
>   d_target_objs="i386-d.o"
>   extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o 
> x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o 
> i386-features.o"
> -  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c 
> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
> + target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c 
> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"

I think there is no need to add $target_gtfiles here, you know it is empty,
the first spot in config.gcc that touches it is this switch based on CPU.
Just fix up the indentation.

> @@ -416,7 +416,7 @@ x86_64-*-*)
>   d_target_objs="i386-d.o"
>   extra_options="${extra_options} fused-madd.opt"
>   extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o 
> x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o 
> i386-features.o"
> -  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c 
> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
> + target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c 
> \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
>   extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
>  pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
>  nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h

Ditto.

> @@ -693,7 +693,7 @@ case ${target} in
>esac
>tm_file="${tm_file} ${cpu_type}/darwin.h"
>tm_p_file="${tm_p_file} darwin-protos.h"
> -  target_gtfiles="\$(srcdir)/config/darwin.c"
> +  target_gtfiles="$target_gtfiles \$(srcdir)/config/darwin.c"
>extra_options="${extra_options} darwin.opt"
>c_target_objs="${c_target_objs} darwin-c.o"
>cxx_target_objs="${cxx_target_objs} darwin-c.o"
> 

This is insufficient, needs to be done also in the 3
target_gtfiles="\$(srcdir)/config/i386/winnt.c"
cases.

Ok with those changes.

Jakub


Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Martin Liška
On 5/6/19 3:36 PM, Richard Biener wrote:
> On Mon, 6 May 2019, Martin Liška wrote:
> 
>> On 5/2/19 9:04 PM, Richard Biener wrote:
>>> On May 2, 2019 8:14:46 PM GMT+02:00, Segher Boessenkool 
>>>  wrote:
 On Thu, May 02, 2019 at 07:41:18PM +0200, Richard Biener wrote:
> On May 2, 2019 7:00:16 PM GMT+02:00, Segher Boessenkool
  wrote:
>> On Thu, May 02, 2019 at 03:18:00PM +0200, Richard Biener wrote:
>>> Somewhen earlier this year I've done the experiment with using
>>> a compile with -flto -fno-fat-lto-objects and a link
>>> via -flto -r -flinker-output=rel into the object file.  This cut
>>> compile-time more than in half with less maintainance overhead.
>>>
>>> Adding other files to this handling looks trivial as well, as well
>>> as conditionalizing it (I'd probably not want this for devel
 builds).
>>
>> But we want devel builds to be a lot faster than they are now :-/
>
> My devel build is -O0 non-bootstrapped and building the files after
 dependency changes is fast enough. It's the bootstraps that matter, no?


 Yes, I bootstrap most of the time.  For development.  It catches a
 *lot*
 of problems progressive builds do not.  (Those are plenty fast already
 of
 course, -O0 or not).
>>>
>>> So we'd catch it there but disable by default for stage 1 since we probably 
>>> do not want to rely on the host compiler. 
>>>
>>> Richard. 
>>>

 Segher
>>>
>>
>> I'm interested in 2 scenarios:
>>
>> 1) fast non-bootstrap development build with -O2; typically I want to run a 
>> fraction of test-suite or
>> I want to install the compiler and run-time binaries; building the compiler 
>> with -O0 will result in terribly
>> slow build of run-time libraries
> 
> That's already overall "fast", mostly dominated by runtime library build.

Yes, mainly libsanitizer run-time libraries build is a pain.

> 
>> 2) faster bootstrap on a massively parallel machine (64+ cores)
> 
> I guess for this we can also try to do LTO bootstap and
> LTO-link libbackend itself.  LTO bootstrap is only slow because
> we build everything 12 times.
> 
> I'm most interested in faster bootstrap on low-core machines
> (4 to 6 physical cores), since that's what I am doing most of the time.
> This is dominated by testing time, not bootstrap time.

I can provide access to multiple high-core machines we have at SUSE ;)

Martin

> 
> Richard.
> 



Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Martin Liška
On 5/6/19 3:31 PM, Richard Biener wrote:
> On Mon, 6 May 2019, Martin Liška wrote:
> 
>> On 5/2/19 3:18 PM, Richard Biener wrote:
>>> On Mon, 29 Apr 2019, Martin Liška wrote:
>>>
 On 9/10/18 1:43 PM, Martin Liška wrote:
> On 09/04/2018 05:07 PM, Martin Liška wrote:
>> - in order to achieve real speed up we need to split also other 
>> generated (and also dwarf2out.c, i386.c, ..) files:
>> here I'm most concerned about insn-recog.c, which can't be split the 
>> same way without ending up with a single huge SCC component.
>
> About the insn-recog.c file: all functions are static and using SCC one 
> ends
> up with all functions in one component. In order to split the callgraph 
> one
> needs to promote some functions to be extern and then split would be 
> possible.
> In order to do that we'll probably need to teach splitter how to do 
> partitioning
> based on minimal number of edges to be removed.
>
> I need to inspire in lto_balanced_map, or is there some simple algorithm 
> I can start with?
>
> Martin
>

 I'm adding here Richard Sandiford as he wrote majority of gcc/genrecog.c 
 file.
 As mentioned, I'm seeking for a way how to split the generated file. Or how
 to learn the generator to process a reasonable splitting.
>>>
>>> Somewhen earlier this year I've done the experiment with using
>>> a compile with -flto -fno-fat-lto-objects
>>
>> -fno-fat-lto-objects is default, isn't it?
> 
> Where linker plugin support is detected, yes.
> 
>>> and a link
>>> via -flto -r -flinker-output=rel into the object file.  This cut
>>> compile-time more than in half with less maintainance overhead.

Ah, -flinker-output=nolto-rel is new in GCC 9 release. That's why I was 
confused.

>>
>> Can you please provide exact command line how to do that?
> 
> gcc t.c -o t.o -flto=8 -r -flinker-output=nolto-rel
> 
> there's an annoying warning:
> 
> cc1plus: warning: command line option ‘-flinker-output=nolto-rel’ is valid 
> for LTO but not for C++
> 
> which can be avoided by splitting the above into a compile and
> a separate LTO "link" step.  Using -Wl,-flinker- doesn't
> work unfortunately (ld doesn't understand it).
> 
> Using installed GCC 9.1 compiling trunk gimple-match.c with -O2 -g
> takes 58.7s while with the LTO trick it takes 23.3s (combined
> CPU time is up to 96s).  That was with -flto=8 on a CPU with
> 4 physical and 8 logical cores.  As it includes -g it includes
> the debug copy dance as well.

That would be usable for the bootstrap on a massively parallel machine
where combined CPU time overhead won't be issue. I'll play with that a bit.

Martin

> 
>> bloaty gimple-match.o -- gimple-match.o.nolto
>  VM SIZE FILE SIZE
>  ++ GROWING   
> ++
>   [ = ]   0 .rela.debug_info  +3.62Mi   
> +45%
>   [ = ]   0 .rela.debug_ranges +161Ki  
> +1.8%
>   [ = ]   0 .debug_str+95.8Ki   
> +19%
>   [ = ]   0 .rela.text+77.6Ki   
> +10%
>   [ = ]   0 .debug_ranges +58.9Ki  
> +1.7%
>   [ = ]   0 .symtab   +22.9Ki   
> +68%
>   [ = ]   0 .debug_abbrev +21.1Ki  
> +394%
>   [ = ]   0 .strtab   +11.4Ki  
> +9.5%
>   +8.1% +5.34Ki .eh_frame +5.34Ki  
> +8.1%
>+84% +4.09Ki .rodata.str1.8+4.09Ki   
> +84%
>   [ = ]   0 .rela.text.unlikely   +3.87Ki  
> +1.0%
>   [ = ]   0 .rela.debug_aranges   +3.68Ki  
> +872%
>   [ = ]   0 .debug_aranges+3.02Ki 
> +10e2%
>+42% +2.59Ki .rodata.str1.1+2.59Ki   
> +42%
>   +0.2% +2.41Ki [Other]   +2.45Ki  
> +0.2%
>   [ = ]   0 .rela.debug_line  +2.09Ki   
> +16%
>   [ = ]   0 .rela.eh_frame+1.17Ki  
> +4.3%
>   [NEW] +1.09Ki .rodata._Z7get_defPFP9tree_nodeS0_ES0_.str1.8 +1.09Ki  
> [NEW]
>   [ = ]   0 .shstrtab+784   
> +44%
>   [ = ]   0 [ELF Headers]+768   
> +16%
>   [ = ]   0 .comment +666 
> +37e2%
> 
>  -- SHRINKING 
> --
>   [ = ]   0 .debug_line-256Ki 
> -17.3%
>   [ = ]   0 .rela.debug_loc   -73.6Ki  
> -0.6%
>   [ = ]   0 .debug_info   -63.4Ki  
> -1.6%
>   [ = ]   0 .debug_loc  

[PATCH] Append to target_gtfiles in order to fix Darwin bootstrap.

2019-05-06 Thread Martin Liška
Hi.

The patch append to target_gtfiles at 3 places instead of overwriting
that.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-05-06  Martin Liska  

* config.gcc: Append to target_gtfiles.
---
 gcc/config.gcc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5124ea00792..f119f82e475 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -383,7 +383,7 @@ i[34567]86-*-*)
 	cxx_target_objs="i386-c.o"
 	d_target_objs="i386-d.o"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o i386-features.o"
-  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		   pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
@@ -416,7 +416,7 @@ x86_64-*-*)
 	d_target_objs="i386-d.o"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o i386-options.o i386-builtins.o i386-expand.o i386-features.o"
-  target_gtfiles="\$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/i386/i386-builtins.c \$(srcdir)/config/i386/i386-expand.c \$(srcdir)/config/i386/i386-options.c"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		   pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		   nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
@@ -693,7 +693,7 @@ case ${target} in
   esac
   tm_file="${tm_file} ${cpu_type}/darwin.h"
   tm_p_file="${tm_p_file} darwin-protos.h"
-  target_gtfiles="\$(srcdir)/config/darwin.c"
+  target_gtfiles="$target_gtfiles \$(srcdir)/config/darwin.c"
   extra_options="${extra_options} darwin.opt"
   c_target_objs="${c_target_objs} darwin-c.o"
   cxx_target_objs="${cxx_target_objs} darwin-c.o"



[PATCH] Fix PR90358

2019-05-06 Thread Richard Biener


The following fixes a miscompilation by the recent vector load
optimization.  The check matching unused upper half of a vector
was incomplete.

Bootstrap / regtest running on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-05-06  Richard Biener  

PR tree-optimization/90358
* tree-vect-stmts.c (get_group_load_store_type): Properly
detect unused upper half of load.
(vectorizable_load): Likewise.

* gcc.target/i386/pr90358.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 270902)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -2273,6 +2274,7 @@ get_group_load_store_type (stmt_vec_info
   == dr_aligned
  || alignment_support_scheme == dr_unaligned_supported)
  && known_eq (nunits, (group_size - gap) * 2)
+ && known_eq (nunits, group_size)
  && mode_for_vector (elmode, (group_size - gap)).exists ()
  && VECTOR_MODE_P (vmode)
  && targetm.vector_mode_supported_p (vmode)
@@ -8550,7 +8552,8 @@ vectorizable_load (stmt_vec_info stmt_in
&& DR_GROUP_GAP (first_stmt_info) != 0
&& known_eq (nunits,
 (group_size
- - DR_GROUP_GAP (first_stmt_info)) * 
2))
+ - DR_GROUP_GAP (first_stmt_info)) * 2)
+   && known_eq (nunits, group_size))
  ltype = build_vector_type (TREE_TYPE (vectype),
 (group_size
  - DR_GROUP_GAP
Index: gcc/testsuite/gcc.target/i386/pr90358.c
===
--- gcc/testsuite/gcc.target/i386/pr90358.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr90358.c (working copy)
@@ -0,0 +1,35 @@
+/* PR target/90358 */
+/* { dg-do run { target { sse4_runtime } } } */
+/* { dg-options "-O3 -msse4" } */
+
+struct s { unsigned int a, b, c; };
+
+void __attribute__ ((noipa))
+foo (struct s *restrict s1, struct s *restrict s2, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  s1[i].b = s2[i].b;
+  s1[i].c = s2[i].c;
+  s2[i].c = 0;
+}
+}
+
+#define N 12
+
+int
+main ()
+{
+  struct s s1[N], s2[N];
+  for (unsigned int j = 0; j < N; ++j)
+{
+  s2[j].a = j * 5;
+  s2[j].b = j * 5 + 2;
+  s2[j].c = j * 5 + 4;
+}
+  foo (s1, s2, N);
+  for (unsigned int j = 0; j < N; ++j)
+  if (s1[j].b != j * 5 + 2)
+__builtin_abort ();
+  return 0;
+}


Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Richard Biener
On Mon, 6 May 2019, Martin Liška wrote:

> On 5/2/19 9:04 PM, Richard Biener wrote:
> > On May 2, 2019 8:14:46 PM GMT+02:00, Segher Boessenkool 
> >  wrote:
> >> On Thu, May 02, 2019 at 07:41:18PM +0200, Richard Biener wrote:
> >>> On May 2, 2019 7:00:16 PM GMT+02:00, Segher Boessenkool
> >>  wrote:
>  On Thu, May 02, 2019 at 03:18:00PM +0200, Richard Biener wrote:
> > Somewhen earlier this year I've done the experiment with using
> > a compile with -flto -fno-fat-lto-objects and a link
> > via -flto -r -flinker-output=rel into the object file.  This cut
> > compile-time more than in half with less maintainance overhead.
> >
> > Adding other files to this handling looks trivial as well, as well
> > as conditionalizing it (I'd probably not want this for devel
> >> builds).
> 
>  But we want devel builds to be a lot faster than they are now :-/
> >>>
> >>> My devel build is -O0 non-bootstrapped and building the files after
> >> dependency changes is fast enough. It's the bootstraps that matter, no?
> >>
> >>
> >> Yes, I bootstrap most of the time.  For development.  It catches a
> >> *lot*
> >> of problems progressive builds do not.  (Those are plenty fast already
> >> of
> >> course, -O0 or not).
> > 
> > So we'd catch it there but disable by default for stage 1 since we probably 
> > do not want to rely on the host compiler. 
> > 
> > Richard. 
> > 
> >>
> >> Segher
> > 
> 
> I'm interested in 2 scenarios:
> 
> 1) fast non-bootstrap development build with -O2; typically I want to run a 
> fraction of test-suite or
> I want to install the compiler and run-time binaries; building the compiler 
> with -O0 will result in terribly
> slow build of run-time libraries

That's already overall "fast", mostly dominated by runtime library build.

> 2) faster bootstrap on a massively parallel machine (64+ cores)

I guess for this we can also try to do LTO bootstap and
LTO-link libbackend itself.  LTO bootstrap is only slow because
we build everything 12 times.

I'm most interested in faster bootstrap on low-core machines
(4 to 6 physical cores), since that's what I am doing most of the time.
This is dominated by testing time, not bootstrap time.

Richard.

Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Richard Biener
On Mon, 6 May 2019, Martin Liška wrote:

> On 5/2/19 3:18 PM, Richard Biener wrote:
> > On Mon, 29 Apr 2019, Martin Liška wrote:
> > 
> >> On 9/10/18 1:43 PM, Martin Liška wrote:
> >>> On 09/04/2018 05:07 PM, Martin Liška wrote:
>  - in order to achieve real speed up we need to split also other 
>  generated (and also dwarf2out.c, i386.c, ..) files:
>  here I'm most concerned about insn-recog.c, which can't be split the 
>  same way without ending up with a single huge SCC component.
> >>>
> >>> About the insn-recog.c file: all functions are static and using SCC one 
> >>> ends
> >>> up with all functions in one component. In order to split the callgraph 
> >>> one
> >>> needs to promote some functions to be extern and then split would be 
> >>> possible.
> >>> In order to do that we'll probably need to teach splitter how to do 
> >>> partitioning
> >>> based on minimal number of edges to be removed.
> >>>
> >>> I need to inspire in lto_balanced_map, or is there some simple algorithm 
> >>> I can start with?
> >>>
> >>> Martin
> >>>
> >>
> >> I'm adding here Richard Sandiford as he wrote majority of gcc/genrecog.c 
> >> file.
> >> As mentioned, I'm seeking for a way how to split the generated file. Or how
> >> to learn the generator to process a reasonable splitting.
> > 
> > Somewhen earlier this year I've done the experiment with using
> > a compile with -flto -fno-fat-lto-objects
> 
> -fno-fat-lto-objects is default, isn't it?

Where linker plugin support is detected, yes.

> > and a link
> > via -flto -r -flinker-output=rel into the object file.  This cut
> > compile-time more than in half with less maintainance overhead.
> 
> Can you please provide exact command line how to do that?

gcc t.c -o t.o -flto=8 -r -flinker-output=nolto-rel

there's an annoying warning:

cc1plus: warning: command line option ‘-flinker-output=nolto-rel’ is valid 
for LTO but not for C++

which can be avoided by splitting the above into a compile and
a separate LTO "link" step.  Using -Wl,-flinker- doesn't
work unfortunately (ld doesn't understand it).

Using installed GCC 9.1 compiling trunk gimple-match.c with -O2 -g
takes 58.7s while with the LTO trick it takes 23.3s (combined
CPU time is up to 96s).  That was with -flto=8 on a CPU with
4 physical and 8 logical cores.  As it includes -g it includes
the debug copy dance as well.

> bloaty gimple-match.o -- gimple-match.o.nolto
 VM SIZE FILE SIZE
 ++ GROWING   
++
  [ = ]   0 .rela.debug_info  +3.62Mi   
+45%
  [ = ]   0 .rela.debug_ranges +161Ki  
+1.8%
  [ = ]   0 .debug_str+95.8Ki   
+19%
  [ = ]   0 .rela.text+77.6Ki   
+10%
  [ = ]   0 .debug_ranges +58.9Ki  
+1.7%
  [ = ]   0 .symtab   +22.9Ki   
+68%
  [ = ]   0 .debug_abbrev +21.1Ki  
+394%
  [ = ]   0 .strtab   +11.4Ki  
+9.5%
  +8.1% +5.34Ki .eh_frame +5.34Ki  
+8.1%
   +84% +4.09Ki .rodata.str1.8+4.09Ki   
+84%
  [ = ]   0 .rela.text.unlikely   +3.87Ki  
+1.0%
  [ = ]   0 .rela.debug_aranges   +3.68Ki  
+872%
  [ = ]   0 .debug_aranges+3.02Ki 
+10e2%
   +42% +2.59Ki .rodata.str1.1+2.59Ki   
+42%
  +0.2% +2.41Ki [Other]   +2.45Ki  
+0.2%
  [ = ]   0 .rela.debug_line  +2.09Ki   
+16%
  [ = ]   0 .rela.eh_frame+1.17Ki  
+4.3%
  [NEW] +1.09Ki .rodata._Z7get_defPFP9tree_nodeS0_ES0_.str1.8 +1.09Ki  
[NEW]
  [ = ]   0 .shstrtab+784   
+44%
  [ = ]   0 [ELF Headers]+768   
+16%
  [ = ]   0 .comment +666 
+37e2%

 -- SHRINKING 
--
  [ = ]   0 .debug_line-256Ki 
-17.3%
  [ = ]   0 .rela.debug_loc   -73.6Ki  
-0.6%
  [ = ]   0 .debug_info   -63.4Ki  
-1.6%
  [ = ]   0 .debug_loc-39.3Ki  
-0.6%

  +1.1% +15.5Ki TOTAL +3.67Mi  
+7.8%

.debug_line probably shrinks because we drop columns with LTO.

Richard.

Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Li Jia He




On 2019/5/6 7:19 PM, Christophe Lyon wrote:

On Sun, 5 May 2019 at 08:31, Li Jia He  wrote:


Hi,

GCC revision 267634 implemented two_value_replacement function.
However, a typo occurred during the parameter check, which caused
us to miss some optimizations.

The intent of the code might be to check that the input parameters
are const int and their difference is one.  However, when I read
the code, I found that it is wrong to detect whether an input data
plus one is equal to itself.  This could be a typo.

The regression testing for the patch was done on GCC mainline on

 powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.  Is it OK for trunk and backport to gcc 9 ?

Thanks,
Lijia He

gcc/ChangeLog

2019-05-05  Li Jia He  

 * tree-ssa-phiopt.c (two_value_replacement):
 Fix a typo in parameter detection.

gcc/testsuite/ChangeLog

2019-05-05  Li Jia He  

 * gcc.dg/pr88676.c: Modify the include header file.
 * gcc.dg/tree-ssa/pr37508.c: Add the no-ssa-phiopt option to
 skip phi optimization.
 * gcc.dg/tree-ssa/pr88676.c: Rename to ...
 * gcc.dg/tree-ssa/pr88676-1.c: ... this new file.
 * gcc.dg/tree-ssa/pr88676-2.c: New testcase.


Hi,

This new testcase fails on arm and aarch64:
PASS: gcc.dg/tree-ssa/pr88676-2.c (test for excess errors)
UNRESOLVED: gcc.dg/tree-ssa/pr88676-2.c scan-tree-dump-not optimized " = PHI <"
because:
gcc.dg/tree-ssa/pr88676-2.c: dump file does not exist

Can you fix this?

Sorry for this error, I have reverted it.  Next time I will test it
well.


Thanks,

Christophe


---
  gcc/testsuite/gcc.dg/pr88676.c|  2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr37508.c   |  6 ++--
  .../tree-ssa/{pr88676.c => pr88676-1.c}   |  0
  gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c | 30 +++
  gcc/tree-ssa-phiopt.c |  2 +-
  5 files changed, 35 insertions(+), 5 deletions(-)
  rename gcc/testsuite/gcc.dg/tree-ssa/{pr88676.c => pr88676-1.c} (100%)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c

diff --git a/gcc/testsuite/gcc.dg/pr88676.c b/gcc/testsuite/gcc.dg/pr88676.c
index b5fdd9342b8..719208083ae 100644
--- a/gcc/testsuite/gcc.dg/pr88676.c
+++ b/gcc/testsuite/gcc.dg/pr88676.c
@@ -2,7 +2,7 @@
  /* { dg-do run } */
  /* { dg-options "-O2" } */

-#include "tree-ssa/pr88676.c"
+#include "tree-ssa/pr88676-1.c"

  __attribute__((noipa)) void
  bar (int x, int y, int z)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
index 2ba09afe481..a6def045de4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fno-ssa-phiopt -fno-tree-fre -fdump-tree-vrp1" } */

  struct foo1 {
int i:1;
@@ -22,7 +22,7 @@ int test2 (struct foo2 *x)
  {
if (x->i == 0)
  return 1;
-  else if (x->i == -1) /* This test is already folded to false by ccp1.  */
+  else if (x->i == -1) /* This test is already optimized by ccp1 or phiopt1.  
*/
  return 1;
return 0;
  }
@@ -31,7 +31,7 @@ int test3 (struct foo1 *x)
  {
if (x->i == 0)
  return 1;
-  else if (x->i == 1) /* This test is already folded to false by fold.  */
+  else if (x->i == 1) /* This test is already optimized by ccp1 or phiopt1.  */
  return 1;
return 0;
  }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/tree-ssa/pr88676.c
rename to gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
new file mode 100644
index 000..d377948e14d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
@@ -0,0 +1,30 @@
+/* PR tree-optimization/88676 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */
+
+struct foo1 {
+  int i:1;
+};
+struct foo2 {
+  unsigned i:1;
+};
+
+int test1 (struct foo1 *x)
+{
+  if (x->i == 0)
+return 1;
+  else if (x->i == 1)
+return 1;
+  return 0;
+}
+
+int test2 (struct foo2 *x)
+{
+  if (x->i == 0)
+return 1;
+  else if (x->i == -1)
+return 1;
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 219791ea4ba..90674a2f3c4 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -602,7 +602,7 @@ two_value_replacement (basic_block cond_bb, basic_block 
middle_bb,
|| TREE_CODE (arg1) != INTEGER_CST
|| (tree_int_cst_lt (arg0, arg1)
   ? wi::to_widest (arg0) + 1 != wi::to_widest (arg1)
- : wi::to_widest (arg1) + 1 != wi::to_widest (arg1)))
+ : wi::to_widest (arg1) + 1 != wi::to_widest (arg0)))
  return false;

if (!empty_block_p (middle_bb))
--
2.17.1







[PATCH] Various store-merging improvements (PR tree-optimization/88709, PR tree-optimization/90271)

2019-05-06 Thread Jakub Jelinek
Hi!

The following patch adds some store-merging improvements.
Previously, only stores where the lhs was
ARRAY_REF, ARRAY_RANGE_REF, MEM_REF, COMPONENT_REF or BIT_FIELD_REF
were considered, as the testcases show it is useful to consider
stores into whole decl as well.
Another thing is that we were considering stores where the rhs was some
constant with fixed size mode and one that native_encode_expr can handle.
This patch extends it to handle = {} stores (but not clobbers).  While
native_encode_expr doesn't handle those, they can be handled very easily
(it is a memory clearing) and we don't need have the machine mode
restrictions for that case either.
With those two changes, we can handle stuff like:
  var = {};
  var.b = 1;
  var.d = 2;
and turn that into
  MEM [(struct S *)] = some_constant;
We do have a param how many stores we consider together and when all the
stores were fixed size, that naturally implied a reasonably small size of
the whole store group area which we e.g. precompute the bytes for.  With
= {} having almost arbitrary sizes, I feared we could easily trigger out of
compile time memory errors by just having var = {} for say 2GB variable
followed by var.a = 1; allocating 4GB of RAM (or even larger cases for 64-bit
hosts+targets).  So, the patch adds another parameter and its default to
limit the sizes of the groups, above that boundary we just don't merge
multiple stores into a group.
Another thing is, if we have a large store first, typically = {} of the
whole object, followed by some stores after it, either the initial clearing
is small and we manage to turn the stores into even smaller ones that cover
the complete object, but if the = {} is large, we'd often just punt because
it would create more stores than originally (we really count the = {} as one
store and although sometimes it could be expanded by pieces, in other times
it can be far more efficient to clear everything and overwrite a few
things).  This means, in various cases the patch could even regress on the
store merging done, because previously we wouldn't even consider the = {},
while now we've placed it into the same store group, so before we would have
that = {} separate and then merge the rest, while now we just punt.
This patch solves it by special casing the case where there is a clear of
the whole area, followed by some stores into that area.  We check what is in
that case smaller, whether to handle the = {} together with the rest, or
whether to treat it as = {} separate from the rest; in the latter case we
can do whatever we used to do before, but actually can do even better,
because we know the memory area is already pre-cleared, so for the purposes
of computation how many, how large and how much aligned stores we will do
we can actually treat all the zero bytes in the combined value to be stored
as the padding bytes (gaps); we have already stored zeros there, so we don't
need to overwrite it, but if it is beneficial, we actually can overwrite it
with further zeros.  When later on actually adding the individual bytes, we
don't need to consider those bytes as padding though, so can avoid doing a
RMW cycle, we know those bytes are zero.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-05-06  Jakub Jelinek  

PR tree-optimization/88709
PR tree-optimization/90271
* params.def (PARAM_STORE_MERGING_MAX_SIZE): New parameter.
* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Handle
non-clobber CONSTRUCTORs with no elts.  Remove useless tmp_int
variable.
(imm_store_chain_info::coalesce_immediate_stores): Punt if the size
of the store merging group is larger than
PARAM_STORE_MERGING_MAX_SIZE parameter.
(split_group): Add bzero_first argument.  If set, always emit first
the first store which must be = {} of the whole area and then for the
rest of the stores consider all zero bytes as paddings.
(imm_store_chain_info::output_merged_store): Check if first store
is = {} of the whole area and if yes, determine which setting of
bzero_first for split_group gives smaller number of stores.  Adjust
split_group callers.
(lhs_valid_for_store_merging_p): Allow decls.
(rhs_valid_for_store_merging_p): Allow non-clobber CONTRUCTORs with
no elts.
(pass_store_merging::process_store): Likewise.

* gcc.dg/store_merging_26.c: New test.
* gcc.dg/store_merging_27.c: New test.
* gcc.dg/store_merging_28.c: New test.
* gcc.dg/store_merging_29.c: New test.

--- gcc/params.def.jj   2019-05-06 09:45:57.014019177 +0200
+++ gcc/params.def  2019-05-06 10:06:33.036183432 +0200
@@ -1205,6 +1205,11 @@ DEFPARAM (PARAM_MAX_STORES_TO_MERGE,
  "store merging pass.",
  64, 2, 0)
 
+DEFPARAM (PARAM_STORE_MERGING_MAX_SIZE,
+ "store-merging-max-size",
+ "Maximum size of a single store merging region in bytes.",
+   

[PATCH 2/2] or1k: Fix issues with msoft-div

2019-05-06 Thread Stafford Horne
As reported by Richard Selvaggi.  Also, add a basic test to verify the
soft math works when enabled.

gcc/testsuite/ChangeLog:

PR target/90362
* gcc.target/or1k/div-mul-3.c: New test.

libgcc/ChangeLog:

PR target/90362
* config/or1k/lib1funcs.S (__udivsi3): Change l.sfeqi
to l.sfeq and l.sfltsi to l.sflts equivalents as the immediate
instructions are not available on every processor.  Change a
l.bnf to l.bf to fix logic issue.
---
 gcc/testsuite/gcc.target/or1k/div-mul-3.c | 31 +++
 libgcc/config/or1k/lib1funcs.S|  6 ++---
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/or1k/div-mul-3.c

diff --git a/gcc/testsuite/gcc.target/or1k/div-mul-3.c 
b/gcc/testsuite/gcc.target/or1k/div-mul-3.c
new file mode 100644
index 000..2c4f91b7e98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/or1k/div-mul-3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msoft-div -msoft-mul" } */
+
+struct testcase {
+  int a;
+  int b;
+  int c;
+  int expected;
+};
+
+struct testcase tests[] = {
+  {2, 200, 3, 133},
+  {3, 300, 3, 300},
+  {2, 500, 3, 333},
+  {4, 250, 3, 333},
+  {0, 0, 0, 0}
+};
+
+int calc (int a, int b, int c) {
+  return a * b / c;
+}
+
+int main () {
+  int fail = 0;
+  struct testcase *tc;
+
+  for (int i = 0; (tc = [i], tc->c); i++)
+fail |= (calc (tc->a, tc->b, tc->c) != tc->expected);
+
+  return fail;
+}
diff --git a/libgcc/config/or1k/lib1funcs.S b/libgcc/config/or1k/lib1funcs.S
index d2103923486..6d058977229 100644
--- a/libgcc/config/or1k/lib1funcs.S
+++ b/libgcc/config/or1k/lib1funcs.S
@@ -68,18 +68,18 @@ __udivmodsi3_internal:
   is not clobbered by this routine, and use that as to
   save a return address without creating a stack frame.  */
 
-   l.sfeqi r4, 0   /* division by zero; return 0.  */
+   l.sfeq  r4, r0  /* division by zero; return 0.  */
l.ori   r11, r0, 0  /* initial quotient */
l.bf9f
 l.ori  r12, r3, 0  /* initial remainder */
 
/* Given X/Y, shift Y left until Y >= X.  */
l.ori   r6, r0, 1   /* mask = 1 */
-1: l.sfltsir4, 0   /* y has msb set */
+1: l.sflts r4, r0  /* y has msb set */
l.bf2f
 l.sfltur4, r12 /* y < x */
l.add   r4, r4, r4  /* y <<= 1 */
-   l.bnf   1b
+   l.bf1b
 l.add  r6, r6, r6  /* mask <<= 1 */
 
/* Shift Y back to the right again, subtracting from X.  */
-- 
2.19.1



[PATCH 0/2] OpenRISC fixes

2019-05-06 Thread Stafford Horne
I sent the fix for 90363 previously with the FPU patches.  Now, I have added
tests and another issue came up, so I am sending as a new series.  I planned to
commit the code before the 9.x release but missed that.

This is a series of fixes for the OpenRISC target found during recent testings.

PRs:

 - 90362 or1k: Soft divide does not work correctly
 - 90363 or1k: Extra mask insn after load from memory

Stafford Horne (2):
  or1k: Fix code quality for volatile memory loads
  or1k: Fix issues with msoft-div

 gcc/config/or1k/or1k.md   |  6 +-
 gcc/config/or1k/predicates.md | 18 +
 gcc/testsuite/gcc.target/or1k/div-mul-3.c | 31 
 gcc/testsuite/gcc.target/or1k/swap-1.c| 86 +++
 gcc/testsuite/gcc.target/or1k/swap-2.c| 63 +
 libgcc/config/or1k/lib1funcs.S|  6 +-
 6 files changed, 204 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/or1k/div-mul-3.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/swap-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/swap-2.c

-- 
2.19.1



[PATCH 1/2] or1k: Fix code quality for volatile memory loads

2019-05-06 Thread Stafford Horne
Volatile memory does not match the memory_operand predicate.  This
causes extra extend/mask instructions instructions when reading
from volatile memory.  On OpenRISC loading volitile memory can be
treated the same as regular memory loads which supports combined
sign/zero extends.  Fixing this eliminates the need for extra
extend/mask instructions.

This also adds a test provided by Richard Selvaggi which uncovered the
issue while we were looking into another issue.

gcc/ChangeLog:

PR target/90363
* config/or1k/or1k.md (zero_extendsi2): Update predicate.
(extendsi2): Update predicate.
* gcc/config/or1k/predicates.md (volatile_mem_operand): New.
(reg_or_mem_operand): New.

gcc/testsuite/ChangeLog:

PR target/90363
* gcc.target/or1k/swap-1.c: New test.
* gcc.target/or1k/swap-2.c: New test.
---
 gcc/config/or1k/or1k.md|  6 +-
 gcc/config/or1k/predicates.md  | 18 ++
 gcc/testsuite/gcc.target/or1k/swap-1.c | 86 ++
 gcc/testsuite/gcc.target/or1k/swap-2.c | 63 +++
 4 files changed, 170 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/or1k/swap-1.c
 create mode 100644 gcc/testsuite/gcc.target/or1k/swap-2.c

diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
index 2dad51cd46b..757d899c442 100644
--- a/gcc/config/or1k/or1k.md
+++ b/gcc/config/or1k/or1k.md
@@ -328,11 +328,11 @@
 ;; Sign Extending
 ;; -
 
-;; Zero extension can always be done with AND and an extending load.
+;; Zero extension can always be done with AND or an extending load.
 
 (define_insn "zero_extendsi2"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-   (zero_extend:SI (match_operand:I12 1 "nonimmediate_operand" "r,m")))]
+   (zero_extend:SI (match_operand:I12 1 "reg_or_mem_operand" "r,m")))]
   ""
   "@
l.andi\t%0, %1, 
@@ -344,7 +344,7 @@
 
 (define_insn "extendsi2"
   [(set (match_operand:SI 0 "register_operand"  "=r,r")
-   (sign_extend:SI (match_operand:I12 1 "nonimmediate_operand"  "r,m")))]
+   (sign_extend:SI (match_operand:I12 1 "reg_or_mem_operand"  "r,m")))]
   "TARGET_SEXT"
   "@
l.exts\t%0, %1
diff --git a/gcc/config/or1k/predicates.md b/gcc/config/or1k/predicates.md
index 879236bca49..b895f1b4228 100644
--- a/gcc/config/or1k/predicates.md
+++ b/gcc/config/or1k/predicates.md
@@ -82,3 +82,21 @@
 
 (define_predicate "equality_comparison_operator"
   (match_code "ne,eq"))
+
+;; Borrowed from rs6000
+;  Return true if the operand is in volatile memory.  Note that during the
+;; RTL generation phase, memory_operand does not return TRUE for volatile
+;; memory references.  So this function allows us to recognize volatile
+;; references where it's safe.
+(define_predicate "volatile_mem_operand"
+  (and (match_code "mem")
+   (match_test "MEM_VOLATILE_P (op)")
+   (if_then_else (match_test "reload_completed")
+(match_operand 0 "memory_operand")
+(match_test "memory_address_p (mode, XEXP (op, 0))"
+
+;; Return true if the operand is a register or memory; including volatile
+;; memory.
+(define_predicate "reg_or_mem_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+   (match_operand 0 "volatile_mem_operand")))
diff --git a/gcc/testsuite/gcc.target/or1k/swap-1.c 
b/gcc/testsuite/gcc.target/or1k/swap-1.c
new file mode 100644
index 000..233c4b71627
--- /dev/null
+++ b/gcc/testsuite/gcc.target/or1k/swap-1.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-options "-Os -mhard-mul -msoft-div -msoft-float" } */
+
+/* Copyright (C) 2018-2019 Free Software Foundation, Inc.
+   Copyright 2019 Broadcom.   Richard Selvaggi, 2019-March-27
+   The term "Broadcom" refers to Broadcom Inc. and/or its subsidiaries.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License, version 2, as
+   published by the Free Software Foundation (the "GPL").
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License version 2 (GPLv2) for more details.
+
+   You should have received a copy of the GNU General Public License
+   version 2 (GPLv2) along with this source code.  */
+
+/* Notes:
+
+   This test failed on or1k GCC 7.2.0, and passes on or1k GCC 5.3.0
+   as well as the or1k port released in GCC 9.1.
+
+   The main program is organized as a loop structure so gcc does not
+   optimize-away the calls to swap_1().  Compiling with -O2 is still smart
+   enough to optimize-away the calls, but using -Os does not.
+   The bad code is only generated when compiled with -Os.
+
+   When the bad code is generated all code is okay except for the very last
+   instruction (a 

Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Martin Liška
On 5/2/19 3:18 PM, Richard Biener wrote:
> On Mon, 29 Apr 2019, Martin Liška wrote:
> 
>> On 9/10/18 1:43 PM, Martin Liška wrote:
>>> On 09/04/2018 05:07 PM, Martin Liška wrote:
 - in order to achieve real speed up we need to split also other generated 
 (and also dwarf2out.c, i386.c, ..) files:
 here I'm most concerned about insn-recog.c, which can't be split the same 
 way without ending up with a single huge SCC component.
>>>
>>> About the insn-recog.c file: all functions are static and using SCC one ends
>>> up with all functions in one component. In order to split the callgraph one
>>> needs to promote some functions to be extern and then split would be 
>>> possible.
>>> In order to do that we'll probably need to teach splitter how to do 
>>> partitioning
>>> based on minimal number of edges to be removed.
>>>
>>> I need to inspire in lto_balanced_map, or is there some simple algorithm I 
>>> can start with?
>>>
>>> Martin
>>>
>>
>> I'm adding here Richard Sandiford as he wrote majority of gcc/genrecog.c 
>> file.
>> As mentioned, I'm seeking for a way how to split the generated file. Or how
>> to learn the generator to process a reasonable splitting.
> 
> Somewhen earlier this year I've done the experiment with using
> a compile with -flto -fno-fat-lto-objects

-fno-fat-lto-objects is default, isn't it?

> and a link
> via -flto -r -flinker-output=rel into the object file.  This cut
> compile-time more than in half with less maintainance overhead.

Can you please provide exact command line how to do that?

> 
> Adding other files to this handling looks trivial as well, as well
> as conditionalizing it (I'd probably not want this for devel builds).
> 
> It might be interesting to optimize this a bit as well by somehow
> merging the compile and WPA stages, thus special-casing single TU
> WPA input in a similar way as we (still) have -flto-partition=none.
> 
> That said, re-doing the measurement is probably interesting (as
> applying to other cases like insn-recog.c).
> 
> Richard.
> 



Re: [RFH] split {generic,gimple}-match.c files

2019-05-06 Thread Martin Liška
On 5/2/19 9:04 PM, Richard Biener wrote:
> On May 2, 2019 8:14:46 PM GMT+02:00, Segher Boessenkool 
>  wrote:
>> On Thu, May 02, 2019 at 07:41:18PM +0200, Richard Biener wrote:
>>> On May 2, 2019 7:00:16 PM GMT+02:00, Segher Boessenkool
>>  wrote:
 On Thu, May 02, 2019 at 03:18:00PM +0200, Richard Biener wrote:
> Somewhen earlier this year I've done the experiment with using
> a compile with -flto -fno-fat-lto-objects and a link
> via -flto -r -flinker-output=rel into the object file.  This cut
> compile-time more than in half with less maintainance overhead.
>
> Adding other files to this handling looks trivial as well, as well
> as conditionalizing it (I'd probably not want this for devel
>> builds).

 But we want devel builds to be a lot faster than they are now :-/
>>>
>>> My devel build is -O0 non-bootstrapped and building the files after
>> dependency changes is fast enough. It's the bootstraps that matter, no?
>>
>>
>> Yes, I bootstrap most of the time.  For development.  It catches a
>> *lot*
>> of problems progressive builds do not.  (Those are plenty fast already
>> of
>> course, -O0 or not).
> 
> So we'd catch it there but disable by default for stage 1 since we probably 
> do not want to rely on the host compiler. 
> 
> Richard. 
> 
>>
>> Segher
> 

I'm interested in 2 scenarios:

1) fast non-bootstrap development build with -O2; typically I want to run a 
fraction of test-suite or
I want to install the compiler and run-time binaries; building the compiler 
with -O0 will result in terribly
slow build of run-time libraries

2) faster bootstrap on a massively parallel machine (64+ cores)

Martin


Re: aliasing_component_refs_p tweek

2019-05-06 Thread Richard Biener
On Mon, 6 May 2019, Jan Hubicka wrote:

> Hi,
> this patch makes aliasing_component_refs_p little bit stronger, especially
> with LTO increasing number of disambiguations on tramp3d by 2%.
> 
> It took me a while to uderstand that function so a bit summary what it
> does.
> 
> Function analyzes chains of nested references like
> 
>  ref1: a.b.c.d.e
>  ref2: c.d.f
> 
> (Here I assume that type of a is a and type of b is b etc.)
> 
> On such chains it looks for outermost common type (c) and then compares
> accesss c.d.e and c.d.f for range overlap. This is safe because we know
> that both copies of type c are either overlaping or completely
> different.
> 
> Walk is done in both directions so
> 
>  ref1: c.d.f
>  ref2: a.b.c.d.e
> 
> Gives the same answer.
> 
> In the final step it is assumed that the reference chains do not have
> common type and may reference one to another only if one is continuation
> of the other (with some intermediate steps omitted).
> 
> So function is organized as follows
> 
>  1) perform walk of ref1 looking for type of base of ref2.
> If we sucessful use base+offset oracle.
>  2) perform walk of ref2 looking for type of base of ref1
> If we sucessful use base+offset oracle.
>  3) assume that chains are disjoiont and see of base type of one is subset
> of ref type of the other (i.e. one can be can be a continuation of the
> chain).
> 
> Now same_type_for_tbaa can return -1 for some odd cases which include the case
> where TYPE_CANONICAL==NULL the function gives up.  This is quite common with
> LTO because TYPE_CANONICAL is NULL for pointers, vectors and arrays and thus 
> we
> often give up after the innermost reference.
>
> If the first walk in 1) terminates with -1 then it still makes sense to do the
> walk in 2) and see if the common type is found.  If there is no common type we
> need to give up on the final test.
> 
> I wonder if as an optimization we do not want to terminate the walk when one
> type is too small to hold the second.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * tree-ssa-alias.c (aliasing_component_refs_p): Walk references
>   both directions even if first walk fails.
> Index: tree-ssa-alias.c
> ===
> --- tree-ssa-alias.c  (revision 270877)
> +++ tree-ssa-alias.c  (working copy)
> @@ -814,10 +841,7 @@ aliasing_component_refs_p (tree ref1,
>&& same_type_for_tbaa (TREE_TYPE (*refp), type1) == 0)
>  refp = _OPERAND (*refp, 0);
>same_p = same_type_for_tbaa (TREE_TYPE (*refp), type1);
> -  /* If we couldn't compare types we have to bail out.  */
> -  if (same_p == -1)
> -return true;
> -  else if (same_p == 1)
> +  if (same_p == 1)
>  {
>poly_int64 offadj, sztmp, msztmp;
>bool reverse;
> @@ -827,24 +851,31 @@ aliasing_component_refs_p (tree ref1,
>offset1 -= offadj;
>return ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2);
>  }
> -  /* If we didn't find a common base, try the other way around.  */
> -  refp = 
> -  while (handled_component_p (*refp)
> -  && same_type_for_tbaa (TREE_TYPE (*refp), type2) == 0)
> -refp = _OPERAND (*refp, 0);
> -  same_p = same_type_for_tbaa (TREE_TYPE (*refp), type2);
> -  /* If we couldn't compare types we have to bail out.  */
> -  if (same_p == -1)
> -return true;
> -  else if (same_p == 1)
> +  else 
>  {

No need for the else {} and thus indenting the rest since the if ()
arm always returns from the function.

OK with eliding this else { } wrapping.

Thanks,
Richard.

> -  poly_int64 offadj, sztmp, msztmp;
> -  bool reverse;
> -  get_ref_base_and_extent (*refp, , , , );
> -  offset1 -= offadj;
> -  get_ref_base_and_extent (base2, , , , );
> -  offset2 -= offadj;
> -  return ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2);
> +  int same_p2;
> +
> +  /* If we didn't find a common base, try the other way around.  */
> +  refp = 
> +  while (handled_component_p (*refp)
> +  && same_type_for_tbaa (TREE_TYPE (*refp), type2) == 0)
> + refp = _OPERAND (*refp, 0);
> +  same_p2 = same_type_for_tbaa (TREE_TYPE (*refp), type2);
> +  if (same_p2 == 1)
> + {
> +   poly_int64 offadj, sztmp, msztmp;
> +   bool reverse;
> +   get_ref_base_and_extent (*refp, , , , );
> +   offset1 -= offadj;
> +   get_ref_base_and_extent (base2, , , , );
> +   offset2 -= offadj;
> +   return ranges_maybe_overlap_p (offset1, max_size1,
> +  offset2, max_size2);
> + }
> +  /* In the remaining test we assume that there is no overlapping type
> +  at all.  So if we are unsure, we need to give up.  */
> +  else if (same_p == -1 || same_p2 == -1)
> + return true;
>  }
>  
>/* If we have two type access paths B1.path1 and B2.path2 they may
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, 

[wwwdocs] Document GCC 9 Solaris changes

2019-05-06 Thread Rainer Orth
Better late than never: here's the gcc-9/changes.html update listing
Solaris improvements.  I'm all ears for suggestions about wording or
markup improvements.

Ok?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-05-06  Rainer Orth  

* htdocs/gcc-9/changes.html (Operating System, Solaris): Document
improvements.

Index: gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.66
diff -u -p -r1.66 changes.html
--- gcc-9/changes.html	4 May 2019 14:26:16 -	1.66
+++ gcc-9/changes.html	6 May 2019 12:45:25 -
@@ -959,7 +959,18 @@ $ g++ typo.cc
 
 
 
-
+Solaris
+
+
+   g++ now unconditionally enables large file support when
+compiling 32-bit code.
+   Support for the AddressSanitizer and UndefinedBehaviorSanitizer has
+been merged from LLVM.  For the moment, this only works for 32-bit code
+on both SPARC and x86.
+   An initial port of the D run-time library has been completed on
+Solaris 11/x86.  It requires the use of GNU as.
+Solaris 11/SPARC support is still work-in-progress.
+
 
 
 


Re: [PATCH] Prep for PR88828 fix

2019-05-06 Thread Richard Biener
On Fri, 3 May 2019, H.J. Lu wrote:

> On Fri, May 3, 2019 at 6:02 AM Richard Biener  wrote:
> >
> >
> > The following refactors simplify_vector_constructor and adds
> > handling of constants to it in a straight-forward way.
> >
> > A followup will handle the testcases posted in HJs patch.
...
> > - orig[j] = ref;
> > - break;
> > -   }
> > - else if (ref == orig[j])
> > -   break;
> 
> Missing else
>  return false;

Oops, indeed.

The following is what I have applied after bootstrap/regtest on 
x86_64-unknown-linux-gnu.

Richard.

2019-05-06  Richard Biener  

PR tree-optimization/88828
* tree-ssa-forwprop.c (get_bit_field_ref_def): Split out from...
(simplify_vector_constructor): ...here.  Handle constants in
the constructor.

* gcc.target/i386/pr88828-0.c: New testcase.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 270902)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -1997,17 +1997,54 @@ simplify_permutation (gimple_stmt_iterat
   return 0;
 }
 
+/* Get the BIT_FIELD_REF definition of VAL, if any, looking through
+   conversions with code CONV_CODE or update it if still ERROR_MARK.
+   Return NULL_TREE if no such matching def was found.  */
+
+static tree
+get_bit_field_ref_def (tree val, enum tree_code _code)
+{
+  if (TREE_CODE (val) != SSA_NAME)
+return NULL_TREE ;
+  gimple *def_stmt = get_prop_source_stmt (val, false, NULL);
+  if (!def_stmt)
+return NULL_TREE;
+  enum tree_code code = gimple_assign_rhs_code (def_stmt);
+  if (code == FLOAT_EXPR
+  || code == FIX_TRUNC_EXPR)
+{
+  tree op1 = gimple_assign_rhs1 (def_stmt);
+  if (conv_code == ERROR_MARK)
+   {
+ if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (val))),
+   GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1)
+   return NULL_TREE;
+ conv_code = code;
+   }
+  else if (conv_code != code)
+   return NULL_TREE;
+  if (TREE_CODE (op1) != SSA_NAME)
+   return NULL_TREE;
+  def_stmt = SSA_NAME_DEF_STMT (op1);
+  if (! is_gimple_assign (def_stmt))
+   return NULL_TREE;
+  code = gimple_assign_rhs_code (def_stmt);
+}
+  if (code != BIT_FIELD_REF)
+return NULL_TREE;
+  return gimple_assign_rhs1 (def_stmt);
+}
+
 /* Recognize a VEC_PERM_EXPR.  Returns true if there were any changes.  */
 
 static bool
 simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
-  gimple *def_stmt;
   tree op, op2, orig[2], type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
-  enum tree_code code, conv_code;
+  enum tree_code conv_code;
   constructor_elt *elt;
   bool maybe_ident;
 
@@ -2027,6 +2064,9 @@ simplify_vector_constructor (gimple_stmt
   orig[1] = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
+  tree one_constant = NULL_TREE;
+  auto_vec constants;
+  constants.safe_grow_cleared (nelts);
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
 {
   tree ref, op1;
@@ -2034,68 +2074,57 @@ simplify_vector_constructor (gimple_stmt
   if (i >= nelts)
return false;
 
-  if (TREE_CODE (elt->value) != SSA_NAME)
-   return false;
-  def_stmt = get_prop_source_stmt (elt->value, false, NULL);
-  if (!def_stmt)
-   return false;
-  code = gimple_assign_rhs_code (def_stmt);
-  if (code == FLOAT_EXPR
- || code == FIX_TRUNC_EXPR)
+  op1 = get_bit_field_ref_def (elt->value, conv_code);
+  if (op1)
{
- op1 = gimple_assign_rhs1 (def_stmt);
- if (conv_code == ERROR_MARK)
+ ref = TREE_OPERAND (op1, 0);
+ unsigned int j;
+ for (j = 0; j < 2; ++j)
{
- if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (elt->value))),
-   GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1)
-   return false;
- conv_code = code;
+ if (!orig[j])
+   {
+ if (TREE_CODE (ref) != SSA_NAME)
+   return false;
+ if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+ || ! useless_type_conversion_p (TREE_TYPE (op1),
+ TREE_TYPE (TREE_TYPE 
(ref
+   return false;
+ if (j && !useless_type_conversion_p (TREE_TYPE (orig[0]),
+  TREE_TYPE (ref)))
+   return false;
+ orig[j] = ref;
+ break;
+   }
+ else if (ref == orig[j])
+   break;
}
- else if (conv_code != code)
+ if (j == 2)
return false;
- if (TREE_CODE (op1) != SSA_NAME)
-   return false;
- def_stmt = SSA_NAME_DEF_STMT (op1);
- if (! 

Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Li Jia He




On 2019/5/6 8:27 PM, Jakub Jelinek wrote:

On Mon, May 06, 2019 at 08:22:41PM +0800, Li Jia He wrote:

Dunno why this changed, if you want it in phiopt1, you need "phiopt1"
in scan-tree-dump-not as well, if you want optimized dump, you need
-fdump-tree-optimized instead.

When I test the code again for submiting the code, I found that this change
only affects the dump file generated by phiopt1,
so I rashly decided to change the test option to dump-tree-phiopt1. If there
is any code change in the future, I will send another patch. Sorry for the
error caused by this.


Even if you rush up such a change (happens sometimes), if it is testsuite
only you don't really need to bootstrap/regtest fully again, but retesting
the single testcase is needed.  So in objdir/gcc
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
tree-ssa.exp=pr88676*'
in this case if the target is multilib, or
make check-gcc RUNTESTFLAGS='tree-ssa.exp=pr88676*'
otherwise.
Thank you very much for this suggestion, I will be very careful in the 
future.


Jakub





aliasing_component_refs_p tweek

2019-05-06 Thread Jan Hubicka
Hi,
this patch makes aliasing_component_refs_p little bit stronger, especially
with LTO increasing number of disambiguations on tramp3d by 2%.

It took me a while to uderstand that function so a bit summary what it
does.

Function analyzes chains of nested references like

 ref1: a.b.c.d.e
 ref2: c.d.f

(Here I assume that type of a is a and type of b is b etc.)

On such chains it looks for outermost common type (c) and then compares
accesss c.d.e and c.d.f for range overlap. This is safe because we know
that both copies of type c are either overlaping or completely
different.

Walk is done in both directions so

 ref1: c.d.f
 ref2: a.b.c.d.e

Gives the same answer.

In the final step it is assumed that the reference chains do not have
common type and may reference one to another only if one is continuation
of the other (with some intermediate steps omitted).

So function is organized as follows

 1) perform walk of ref1 looking for type of base of ref2.
If we sucessful use base+offset oracle.
 2) perform walk of ref2 looking for type of base of ref1
If we sucessful use base+offset oracle.
 3) assume that chains are disjoiont and see of base type of one is subset
of ref type of the other (i.e. one can be can be a continuation of the
chain).

Now same_type_for_tbaa can return -1 for some odd cases which include the case
where TYPE_CANONICAL==NULL the function gives up.  This is quite common with
LTO because TYPE_CANONICAL is NULL for pointers, vectors and arrays and thus we
often give up after the innermost reference.

If the first walk in 1) terminates with -1 then it still makes sense to do the
walk in 2) and see if the common type is found.  If there is no common type we
need to give up on the final test.

I wonder if as an optimization we do not want to terminate the walk when one
type is too small to hold the second.

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-alias.c (aliasing_component_refs_p): Walk references
both directions even if first walk fails.
Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c(revision 270877)
+++ tree-ssa-alias.c(working copy)
@@ -814,10 +841,7 @@ aliasing_component_refs_p (tree ref1,
 && same_type_for_tbaa (TREE_TYPE (*refp), type1) == 0)
 refp = _OPERAND (*refp, 0);
   same_p = same_type_for_tbaa (TREE_TYPE (*refp), type1);
-  /* If we couldn't compare types we have to bail out.  */
-  if (same_p == -1)
-return true;
-  else if (same_p == 1)
+  if (same_p == 1)
 {
   poly_int64 offadj, sztmp, msztmp;
   bool reverse;
@@ -827,24 +851,31 @@ aliasing_component_refs_p (tree ref1,
   offset1 -= offadj;
   return ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2);
 }
-  /* If we didn't find a common base, try the other way around.  */
-  refp = 
-  while (handled_component_p (*refp)
-&& same_type_for_tbaa (TREE_TYPE (*refp), type2) == 0)
-refp = _OPERAND (*refp, 0);
-  same_p = same_type_for_tbaa (TREE_TYPE (*refp), type2);
-  /* If we couldn't compare types we have to bail out.  */
-  if (same_p == -1)
-return true;
-  else if (same_p == 1)
+  else 
 {
-  poly_int64 offadj, sztmp, msztmp;
-  bool reverse;
-  get_ref_base_and_extent (*refp, , , , );
-  offset1 -= offadj;
-  get_ref_base_and_extent (base2, , , , );
-  offset2 -= offadj;
-  return ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2);
+  int same_p2;
+
+  /* If we didn't find a common base, try the other way around.  */
+  refp = 
+  while (handled_component_p (*refp)
+&& same_type_for_tbaa (TREE_TYPE (*refp), type2) == 0)
+   refp = _OPERAND (*refp, 0);
+  same_p2 = same_type_for_tbaa (TREE_TYPE (*refp), type2);
+  if (same_p2 == 1)
+   {
+ poly_int64 offadj, sztmp, msztmp;
+ bool reverse;
+ get_ref_base_and_extent (*refp, , , , );
+ offset1 -= offadj;
+ get_ref_base_and_extent (base2, , , , );
+ offset2 -= offadj;
+ return ranges_maybe_overlap_p (offset1, max_size1,
+offset2, max_size2);
+   }
+  /* In the remaining test we assume that there is no overlapping type
+at all.  So if we are unsure, we need to give up.  */
+  else if (same_p == -1 || same_p2 == -1)
+   return true;
 }
 
   /* If we have two type access paths B1.path1 and B2.path2 they may


Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 08:22:41PM +0800, Li Jia He wrote:
> > Dunno why this changed, if you want it in phiopt1, you need "phiopt1"
> > in scan-tree-dump-not as well, if you want optimized dump, you need
> > -fdump-tree-optimized instead.
> When I test the code again for submiting the code, I found that this change
> only affects the dump file generated by phiopt1,
> so I rashly decided to change the test option to dump-tree-phiopt1. If there
> is any code change in the future, I will send another patch. Sorry for the
> error caused by this.

Even if you rush up such a change (happens sometimes), if it is testsuite
only you don't really need to bootstrap/regtest fully again, but retesting
the single testcase is needed.  So in objdir/gcc
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
tree-ssa.exp=pr88676*'
in this case if the target is multilib, or
make check-gcc RUNTESTFLAGS='tree-ssa.exp=pr88676*'
otherwise.

Jakub


Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Li Jia He




On 2019/5/6 7:35 PM, Jakub Jelinek wrote:

On Mon, May 06, 2019 at 01:19:15PM +0200, Christophe Lyon wrote:

The regression testing for the patch was done on GCC mainline on

 powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.  Is it OK for trunk and backport to gcc 9 ?


While the posted patch had:

+/* PR tree-optimization/88676 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */

the actually committed patch has:
/* PR tree-optimization/88676 */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-phiopt1" } */
/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */

Dunno why this changed, if you want it in phiopt1, you need "phiopt1"
in scan-tree-dump-not as well, if you want optimized dump, you need
-fdump-tree-optimized instead.
When I test the code again for submiting the code, I found that this 
change only affects the dump file generated by phiopt1,
so I rashly decided to change the test option to dump-tree-phiopt1. If 
there is any code change in the future, I will send another patch. 
Sorry for the error caused by this.


This new testcase fails on arm and aarch64:
PASS: gcc.dg/tree-ssa/pr88676-2.c (test for excess errors)
UNRESOLVED: gcc.dg/tree-ssa/pr88676-2.c scan-tree-dump-not optimized " = PHI <"
because:
gcc.dg/tree-ssa/pr88676-2.c: dump file does not exist

Can you fix this?


Jakub





Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-06 Thread JunMa

在 2019/5/6 下午6:02, Richard Biener 写道:

On Thu, Mar 21, 2019 at 5:57 AM JunMa  wrote:

Hi
For now, gcc can not fold code like:

const char a[5] = "123"
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding out of string length although length of a is 5.
This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
builtins when constant string stores in array with some trailing nuls.

This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?

We can use string_constant in gimple_fold_builtin_memchr, however it is a
bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think return the total length is safe in other place as well.
I used the argument in patch since I do not want any impact on
other part at all.

Thanks
JunMa

Richard.


Regards
JunMa


gcc/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * fold-const.c (c_getstr): Add new parameter to get length of
additional
  trailing nuls after constant string.
  * gimple-fold.c (gimple_fold_builtin_memchr): consider trailing nuls in
  out-of-bound accesses checking.
  * fold-const-call.c (fold_const_call): Likewise.


gcc/testsuite/ChangeLog

2019-03-21  Jun Ma 

  PR Tree-optimization/89772
  * gcc.dg/builtin-memchr-4.c: New test.





Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Jakub Jelinek
On Mon, May 06, 2019 at 01:19:15PM +0200, Christophe Lyon wrote:
> > The regression testing for the patch was done on GCC mainline on
> >
> > powerpc64le-unknown-linux-gnu (Power 9 LE)
> >
> > with no regressions.  Is it OK for trunk and backport to gcc 9 ?

While the posted patch had:
> > +/* PR tree-optimization/88676 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */
the actually committed patch has:
/* PR tree-optimization/88676 */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-phiopt1" } */
/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */

Dunno why this changed, if you want it in phiopt1, you need "phiopt1"
in scan-tree-dump-not as well, if you want optimized dump, you need
-fdump-tree-optimized instead.
> 
> This new testcase fails on arm and aarch64:
> PASS: gcc.dg/tree-ssa/pr88676-2.c (test for excess errors)
> UNRESOLVED: gcc.dg/tree-ssa/pr88676-2.c scan-tree-dump-not optimized " = PHI 
> <"
> because:
> gcc.dg/tree-ssa/pr88676-2.c: dump file does not exist
> 
> Can you fix this?

Jakub


Re: [PATCH 2/2] Support {MIN,MAX}_EXPR in GIMPLE FE.

2019-05-06 Thread Richard Biener
On Mon, May 6, 2019 at 10:00 AM Martin Liška  wrote:
>
> Hi.
>
> The patch is about support of a new GIMPLE expr.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Can you please avoid using/changing parser_build_binary_op?  The other
binary expression handling just does

  if (lhs.value != error_mark_node && rhs.value != error_mark_node)
ret.value = build2_loc (ret_loc, code, ret_type, lhs.value, rhs.value);

which should work equally well here.  I think for future expansion
splitting out the ( op, op ) parsing and expression building into
a function might be nicer so that c_parser_gimple_unary_expression
reads

  if (strcmp (INDENTIFIER_POINTER (id), "__MIN") == 0)
return c_parser_gimple_parentized_binary_expression (op_loc, MIN_EXPR);
  else if (...)

OK with such change/factoring.

Thanks,
Richard.

> Thanks,
> Martin


[libcpp] struct deps renaming

2019-05-06 Thread Nathan Sidwell
I have some patches to mkdeps, but first this one renames the rather 
bland 'struct deps' to 'struct mkdeps'.  (discovered when I made myself 
a local 'struct deps')


[mkdeps at least matches the file name, even though it's going to become 
non make-specific.]


nathan
--
Nathan Sidwell
2019-05-06  Nathan Sidwell  

	libcpp/
	* include/mkdeps.h: Rename struct deps to struct mkdeps.
	* mkdeps.c: Likewise.
	* include/cpplib.h (cpp_get_deps): Rename return type..
	* directives.c (cpp_get_deps): Likewise.
	* internal.h (struct cpp_reader): Rename deps field type.

	gcc/c-family/
	* c-opts.c (handle_defered_opts): Rename struct deps to struc mkdeps.

Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c	(revision 270904)
+++ gcc/c-family/c-opts.c	(working copy)
@@ -1277,18 +1277,15 @@ check_deps_environment_vars (void)
 static void
 handle_deferred_opts (void)
 {
-  size_t i;
-  struct deps *deps;
-
   /* Avoid allocating the deps buffer if we don't need it.
  (This flag may be true without there having been -MT or -MQ
  options, but we'll still need the deps buffer.)  */
   if (!deps_seen)
 return;
 
-  deps = cpp_get_deps (parse_in);
+  struct mkdeps *deps = cpp_get_deps (parse_in);
 
-  for (i = 0; i < deferred_count; i++)
+  for (size_t i = 0; i < deferred_count; i++)
 {
   struct deferred_opt *opt = _opts[i];
 
Index: libcpp/directives.c
===
--- libcpp/directives.c	(revision 270904)
+++ libcpp/directives.c	(working copy)
@@ -2539,7 +2539,7 @@ cpp_set_callbacks (cpp_reader *pfile, cp
 }
 
 /* The dependencies structure.  (Creates one if it hasn't already been.)  */
-struct deps *
+struct mkdeps *
 cpp_get_deps (cpp_reader *pfile)
 {
   if (!pfile->deps)
Index: libcpp/include/cpplib.h
===
--- libcpp/include/cpplib.h	(revision 270904)
+++ libcpp/include/cpplib.h	(working copy)
@@ -953,7 +953,7 @@ extern void cpp_set_include_chains (cpp_
 extern cpp_options *cpp_get_options (cpp_reader *);
 extern cpp_callbacks *cpp_get_callbacks (cpp_reader *);
 extern void cpp_set_callbacks (cpp_reader *, cpp_callbacks *);
-extern struct deps *cpp_get_deps (cpp_reader *);
+extern struct mkdeps *cpp_get_deps (cpp_reader *);
 
 /* This function reads the file, but does not start preprocessing.  It
returns the name of the original file; this is the same as the
Index: libcpp/include/mkdeps.h
===
--- libcpp/include/mkdeps.h	(revision 270904)
+++ libcpp/include/mkdeps.h	(working copy)
@@ -26,54 +26,54 @@ along with this program; see the file CO
 /* This is the data structure used by all the functions in mkdeps.c.
It's quite straightforward, but should be treated as opaque.  */
 
-struct deps;
+struct mkdeps;
 
 /* Create a deps buffer.  */
-extern struct deps *deps_init (void);
+extern struct mkdeps *deps_init (void);
 
 /* Destroy a deps buffer.  */
-extern void deps_free (struct deps *);
+extern void deps_free (struct mkdeps *);
 
 /* Add a set of "vpath" directories. The second argument is a colon-
separated list of pathnames, like you would set Make's VPATH
variable to.  If a dependency or target name begins with any of
these pathnames (and the next path element is not "..") that
pathname is stripped off.  */
-extern void deps_add_vpath (struct deps *, const char *);
+extern void deps_add_vpath (struct mkdeps *, const char *);
 
 /* Add a target (appears on left side of the colon) to the deps list.  Takes
a boolean indicating whether to quote the target for MAKE.  */
-extern void deps_add_target (struct deps *, const char *, int);
+extern void deps_add_target (struct mkdeps *, const char *, int);
 
 /* Sets the default target if none has been given already.  An empty
string as the default target is interpreted as stdin.  */
-extern void deps_add_default_target (struct deps *, const char *);
+extern void deps_add_default_target (struct mkdeps *, const char *);
 
 /* Add a dependency (appears on the right side of the colon) to the
deps list.  Dependencies will be printed in the order that they
were entered with this function.  By convention, the first
dependency entered should be the primary source file.  */
-extern void deps_add_dep (struct deps *, const char *);
+extern void deps_add_dep (struct mkdeps *, const char *);
 
 /* Write out a deps buffer to a specified file.  The third argument
is the number of columns to word-wrap at (0 means don't wrap).  */
-extern void deps_write (const struct deps *, FILE *, unsigned int);
+extern void deps_write (const struct mkdeps *, FILE *, unsigned int);
 
 /* Write out a deps buffer to a file, in a form that can be read back
with deps_restore.  Returns nonzero on error, in which case the
error number will be in errno.  */
-extern int deps_save (struct 

Re: A bug in vrp_meet?

2019-05-06 Thread Richard Biener
On Sun, May 5, 2019 at 11:09 PM Eric Botcazou  wrote:
>
> > I have now applied this variant.
>
> You backported it onto the 8 branch on Friday:
>
> 2019-05-03  Richard Biener  
>
> Backport from mainline
> [...]
> 2019-03-07  Richard Biener  
>
> PR tree-optimization/89595
> * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> stmt iterator as reference, take boolean output parameter to
> indicate whether the stmt was removed and thus the iterator
> already advanced.
> (dom_opt_dom_walker::before_dom_children): Re-iterate over
> stmts created by folding.
>
> and this introduced a regression for the attached Ada testcase at -O:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0102173c in set_value_range (
> vr=0x17747a0  const*)::vr_const_varying>, t=VR_RANGE, min=0x76c3df78, max= out>, equiv=0x0)
> at /home/eric/svn/gcc-8-branch/gcc/tree-vrp.c:298
> 298   vr->type = t;
>
> on x86-64 at least.  Mainline and 9 branch are not affected.

It looks like backporting r269597 might fix it though reverting that on trunk
doesn't make the testcase fail there.

Richard.

> --
> Eric Botcazou


[PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2019-05-06 Thread Antony Polukhin
This patch adds static asserts for type traits misuse with incomplete
classes and unions. This gives a nice readable error message instead
of an UB and odr-violations.

Some features of the patch:
* each type trait has it's own static_assert inside. This gives better
diagnostics than the approach with putting the assert into a helper
structure and using it in each trait.
* the result of completeness check is not memorized by the compiler.
This gives no false positive after the first failed check.
* some of the compiler builtins already implement the check. But not
all of them! So the asserts are in all the type_traits that may
benefit from the check. This also makes the behavior of libstdc++ more
consistent across different (non GCC) compilers.
* std::is_base_of does not have the assert as it works well in many
cases with incomplete types

PR libstdc++/71579
* include/std/type_traits: Add static_asserts to make sure that
type traits are not misused with an incomplete type.
* testsuite/20_util/__is_complete_or_unbounded/memoization.cc:
New test.
* testsuite/20_util/__is_complete_or_unbounded/memoization_neg.cc:
Likewise.
* testsuite/20_util/__is_complete_or_unbounded/value.cc: Likewise.
* testsuite/20_util/is_abstract/incomplete_neg.cc: Likewise.
* testsuite/20_util/is_aggregate/incomplete_neg.cc: Likewise.
* testsuite/20_util/is_class/value.cc: Likewise.
* testsuite/20_util/is_function/value.cc: Likewise.
* testsuite/20_util/is_move_constructible/incomplete_neg.cc: Likewise.
* testsuite/20_util/is_nothrow_move_assignable/incomplete_neg.cc:
Likewise.
* testsuite/20_util/is_polymorphic/incomplete_neg.cc: Likewise.
* testsuite/20_util/is_reference/value.cc: Likewise.
* testsuite/20_util/is_unbounded_array/value.cc: Likewise.
* testsuite/20_util/is_union/value.cc: Likewise.
* testsuite/20_util/is_void/value.cc: Likewise.
* testsuite/util/testsuite_tr1.h: Add incomplete union type.

-- 
Best regards,
Antony Polukhin
diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 411eff1..3a0cf44 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,27 @@
+2019-05-06  Antony Polukhin  
+
+	PR libstdc++/71579
+	* include/std/type_traits: Add static_asserts to make sure that
+	type traits are not misused with an incomplete type.
+	* testsuite/20_util/__is_complete_or_unbounded/memoization.cc:
+	New test.
+	* testsuite/20_util/__is_complete_or_unbounded/memoization_neg.cc:
+	Likewise.
+	* testsuite/20_util/__is_complete_or_unbounded/value.cc: Likewise.
+	* testsuite/20_util/is_abstract/incomplete_neg.cc: Likewise.
+	* testsuite/20_util/is_aggregate/incomplete_neg.cc: Likewise.
+	* testsuite/20_util/is_class/value.cc: Likewise.
+	* testsuite/20_util/is_function/value.cc: Likewise.
+	* testsuite/20_util/is_move_constructible/incomplete_neg.cc: Likewise.
+	* testsuite/20_util/is_nothrow_move_assignable/incomplete_neg.cc:
+	Likewise.
+	* testsuite/20_util/is_polymorphic/incomplete_neg.cc: Likewise.
+	* testsuite/20_util/is_reference/value.cc: Likewise.
+	* testsuite/20_util/is_unbounded_array/value.cc: Likewise.
+	* testsuite/20_util/is_union/value.cc: Likewise.
+	* testsuite/20_util/is_void/value.cc: Likewise.
+	* testsuite/util/testsuite_tr1.h: Add incomplete union type.
+
 2019-05-06  François Dumont  
 
 	* python/libstdcxx/v6/printers.py (add_one_template_type_printer):
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 1d14c75..6fdb534 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -91,6 +91,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct conditional;
 
+  template 
+struct __type_identity {
+  using type = _Type;
+};
+
   template
 struct __or_;
 
@@ -177,6 +182,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #endif // C++17
 
+  // Forward declarations
+  template
+struct is_reference;
+  template
+struct is_function;
+  template
+struct is_void;
+  template
+struct __is_array_unknown_bounds;
+
+  // Helper functions that return false_type for incomplete classes,
+  // incomplete unions and arrays of known bound from those.
+
+  template 
+constexpr true_type __is_complete_or_unbounded(__type_identity<_T>)
+{ return {}; }
+
+  template 
+constexpr typename __or_<
+  is_reference<_NestedType>,
+  is_function<_NestedType>,
+  is_void<_NestedType>,
+  __is_array_unknown_bounds<_NestedType>
+>::type __is_complete_or_unbounded(_TypeIdentity)
+{ return {}; }
+
   // For several sfinae-friendly trait implementations we transport both the
   // result information (as the member type) and the failure information (no
   // member type). This is very similar to std::enable_if, but we cannot use
@@ -399,9 +431,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
 
   template
-struct is_function;
-
-  template
 struct 

Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-06 Thread Christophe Lyon
On Sun, 5 May 2019 at 08:31, Li Jia He  wrote:
>
> Hi,
>
> GCC revision 267634 implemented two_value_replacement function.
> However, a typo occurred during the parameter check, which caused
> us to miss some optimizations.
>
> The intent of the code might be to check that the input parameters
> are const int and their difference is one.  However, when I read
> the code, I found that it is wrong to detect whether an input data
> plus one is equal to itself.  This could be a typo.
>
> The regression testing for the patch was done on GCC mainline on
>
> powerpc64le-unknown-linux-gnu (Power 9 LE)
>
> with no regressions.  Is it OK for trunk and backport to gcc 9 ?
>
> Thanks,
> Lijia He
>
> gcc/ChangeLog
>
> 2019-05-05  Li Jia He  
>
> * tree-ssa-phiopt.c (two_value_replacement):
> Fix a typo in parameter detection.
>
> gcc/testsuite/ChangeLog
>
> 2019-05-05  Li Jia He  
>
> * gcc.dg/pr88676.c: Modify the include header file.
> * gcc.dg/tree-ssa/pr37508.c: Add the no-ssa-phiopt option to
> skip phi optimization.
> * gcc.dg/tree-ssa/pr88676.c: Rename to ...
> * gcc.dg/tree-ssa/pr88676-1.c: ... this new file.
> * gcc.dg/tree-ssa/pr88676-2.c: New testcase.

Hi,

This new testcase fails on arm and aarch64:
PASS: gcc.dg/tree-ssa/pr88676-2.c (test for excess errors)
UNRESOLVED: gcc.dg/tree-ssa/pr88676-2.c scan-tree-dump-not optimized " = PHI <"
because:
gcc.dg/tree-ssa/pr88676-2.c: dump file does not exist

Can you fix this?

Thanks,

Christophe

> ---
>  gcc/testsuite/gcc.dg/pr88676.c|  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr37508.c   |  6 ++--
>  .../tree-ssa/{pr88676.c => pr88676-1.c}   |  0
>  gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c | 30 +++
>  gcc/tree-ssa-phiopt.c |  2 +-
>  5 files changed, 35 insertions(+), 5 deletions(-)
>  rename gcc/testsuite/gcc.dg/tree-ssa/{pr88676.c => pr88676-1.c} (100%)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr88676.c b/gcc/testsuite/gcc.dg/pr88676.c
> index b5fdd9342b8..719208083ae 100644
> --- a/gcc/testsuite/gcc.dg/pr88676.c
> +++ b/gcc/testsuite/gcc.dg/pr88676.c
> @@ -2,7 +2,7 @@
>  /* { dg-do run } */
>  /* { dg-options "-O2" } */
>
> -#include "tree-ssa/pr88676.c"
> +#include "tree-ssa/pr88676-1.c"
>
>  __attribute__((noipa)) void
>  bar (int x, int y, int z)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
> index 2ba09afe481..a6def045de4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-tree-fre -fdump-tree-vrp1" } */
> +/* { dg-options "-O2 -fno-ssa-phiopt -fno-tree-fre -fdump-tree-vrp1" } */
>
>  struct foo1 {
>int i:1;
> @@ -22,7 +22,7 @@ int test2 (struct foo2 *x)
>  {
>if (x->i == 0)
>  return 1;
> -  else if (x->i == -1) /* This test is already folded to false by ccp1.  */
> +  else if (x->i == -1) /* This test is already optimized by ccp1 or phiopt1. 
>  */
>  return 1;
>return 0;
>  }
> @@ -31,7 +31,7 @@ int test3 (struct foo1 *x)
>  {
>if (x->i == 0)
>  return 1;
> -  else if (x->i == 1) /* This test is already folded to false by fold.  */
> +  else if (x->i == 1) /* This test is already optimized by ccp1 or phiopt1.  
> */
>  return 1;
>return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/tree-ssa/pr88676.c
> rename to gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
> new file mode 100644
> index 000..d377948e14d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
> @@ -0,0 +1,30 @@
> +/* PR tree-optimization/88676 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */
> +
> +struct foo1 {
> +  int i:1;
> +};
> +struct foo2 {
> +  unsigned i:1;
> +};
> +
> +int test1 (struct foo1 *x)
> +{
> +  if (x->i == 0)
> +return 1;
> +  else if (x->i == 1)
> +return 1;
> +  return 0;
> +}
> +
> +int test2 (struct foo2 *x)
> +{
> +  if (x->i == 0)
> +return 1;
> +  else if (x->i == -1)
> +return 1;
> +  return 0;
> +}
> +
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 219791ea4ba..90674a2f3c4 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -602,7 +602,7 @@ two_value_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>|| TREE_CODE (arg1) != INTEGER_CST
>|| (tree_int_cst_lt (arg0, arg1)
>   ? wi::to_widest (arg0) + 1 != wi::to_widest (arg1)
> - : wi::to_widest (arg1) + 1 != wi::to_widest (arg1)))
> + : wi::to_widest (arg1) + 1 != 

Re: [PATCH] Avoid epilogue peeling for x264 vectorization in x264_pixel_sad_x4_8x8

2019-05-06 Thread Richard Biener
On Mon, 6 May 2019, Jakub Jelinek wrote:

> On Fri, May 03, 2019 at 12:47:39PM +0200, Richard Biener wrote:
> > On Wed, Dec 12, 2018 at 11:54 AM Richard Biener  wrote:
> > >
> > >
> > > The following improves x264 vectorization by avoiding peeling for gaps
> > > noticing that when the upper half of a vector is unused we can
> > > load the lower part only (and fill the upper half with zeros - this
> > > is what x86 does automatically, GIMPLE doesn't allow us to leave
> > > the upper half undefined as RTL would with using subregs).
> > >
> > > The implementation is a little bit awkward as for optimal GIMPLE
> > > code-generation and costing we'd like to go the strided load path
> > > instead.  That proves somewhat difficult though thus the following
> > > is easier but doesn't fill out the re-align paths nor the masked
> > > paths (at least the fully masked path would never need peeling for
> > > gaps).
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, tested with
> > > SPEC CPU 2006 and 2017 with the expected (~4%) improvement for
> > > 625.x264_s.  Didn't see any positive or negative effects elsewhere.
> > >
> > > Queued for GCC 10.
> > 
> > Applied as r270847.
> 
> This regressed
> FAIL: gcc.target/i386/avx512vl-pr87214-1.c execution test
> (AVX512VL hw or SDE is needed to reproduce).

Looking at this.  Reproducible with SSE4.2 and

struct s { unsigned int a, b, c; };

void __attribute__ ((noipa))
foo (struct s *restrict s1, struct s *restrict s2, int n)
{
  for (int i = 0; i < n; ++i)
{
  s1[i].b = s2[i].b;
  s1[i].c = s2[i].c;
  s2[i].c = 0;
}
}

#define N 12

int
main ()
{
  struct s s1[N], s2[N];
  for (unsigned int j = 0; j < N; ++j)
{
  s2[j].a = j * 5;
  s2[j].b = j * 5 + 2;
  s2[j].c = j * 5 + 4;
}
  foo (s1, s2, N);
  for (unsigned int j = 0; j < N; ++j)
  if (s1[j].b != j * 5 + 2)
__builtin_abort ();
  return 0;
}

Probably the cause of PR90358

Richard.


Re: [PATCH PR90078]Capping comp_cost computation in ivopts

2019-05-06 Thread Richard Biener
On Mon, May 6, 2019 at 12:24 PM Bin.Cheng  wrote:
>
> On Mon, May 6, 2019 at 6:11 PM Richard Biener
>  wrote:
> >
> > On Sun, May 5, 2019 at 8:03 AM bin.cheng  
> > wrote:
> > >
> > > > --
> > > > Sender:Jakub Jelinek 
> > > > Sent At:2019 Apr. 17 (Wed.) 19:27
> > > > Recipient:Bin.Cheng 
> > > > Cc:bin.cheng ; GCC Patches 
> > > > 
> > > > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> > > >
> > > >
> > > > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > > > As
> > > > > > #define INFTY 1000
> > > > > > what is the reason to keep the previous condition as well?
> > > > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > > > cost1.cost + cost2.cost >= INFTY too.
> > > > > > Unless costs can go negative.
> > > > > It's a bit complicated, but in general, costs can go negative.
> > > >
> > > > Ok, no objections from me then (but as I don't know anything about it,
> > > > not an ack either; you are ivopts maintainer, so you don't need one).
> > >
> > > Hi,
> > > The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is 
> > > now
> > > fixed by another patch.  This is the updated patch for PR90078.  It 
> > > promotes type
> > > of ivopts cost from int to int64_t, as well as change behavior of 
> > > infinite_cost overflow
> > > from saturation to assert.
> > > Please note, implicit conversions are kept in cost computation as before 
> > > without
> > > introducing any narrowing.
> > >
> > > Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?
> >
> > Do not include system headers in .c files, instead those need to be
> > (and are already)
> > included via system.h.
> >
> >  /* The infinite cost.  */
> > -#define INFTY 1000
> > +#define INFTY 10L
> >
> > do we actually need this?  What happens on a ilp32 host?  That is, I believe
> > you can drop the 'L' (it fits into an int anyways)
> Yeah, now I think if int64_t is necessary or not.  With the scaling
> bound and assertions.
> >
> > @@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
> >  return infinite_cost;
> >
> >gcc_assert (!cost2.infinite_cost_p ());
> > +  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);
> >
> >cost1.cost -= cost2.cost;
> >cost1.complexity -= cost2.complexity;
> >
> > probably a pre-existing issue, but we do not seem to handle underflow
> > here in general, nor check that underflow doesn't get us below -INFTY.
> >
> > I guess we really don't want negative costs?  That doesn't seem to be
> > documented and I was also wondering why the cost isn't unsigned...
> >
> > @@ -638,7 +646,7 @@ struct iv_ca
> >comp_cost cand_use_cost;
> >
> >/* Total cost of candidates.  */
> > -  unsigned cand_cost;
> > +  int64_t cand_cost;
> >
> >/* Number of times each invariant variable is used.  */
> >unsigned *n_inv_var_uses;
> >
> > shows this "issue".  Can't we use uint64_t throughout the patch?
> Oh, it's actually explained in previous message,
> https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00697.html
> In short, yes the cost can be negative.  The negative cost should be
> small in absolute value, and we don't check -INFTY.  With the scaling
> bound change, I don't think -INFTY is possible IIUC.

Ah, I see.  The patch is OK then with 10L in the #define
changed to just 10

Richard.

> Thanks,
> bin
> >
> > Otherwise this looks OK.
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > bin
> > > 2019-05-05  Bin Cheng  
> > >
> > > PR tree-optimization/90078
> > > * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
> > > (INFTY): Increase the value for infinite cost.
> > > (struct comp_cost): Promote type of members to int64_t.
> > > (infinite_cost): Don't set complexity in initialization.
> > > (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost 
> > > computation
> > > overflows to infinite_cost.
> > > (adjust_setup_cost): Promote type of parameter and cost 
> > > computation
> > > to int64_t.
> > > (struct ainc_cost_data, struct iv_ca): Promote type of member to
> > > int64_t.
> > > (get_scaled_computation_cost_at, determine_iv_cost): Promote type 
> > > of
> > > cost computation to int64_t.
> > > (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
> > > int64_t's format specifier in dump.
> > >
> > > 2018-05-05  Bin Cheng  
> > >
> > > PR tree-optimization/90078
> > > * g++.dg/tree-ssa/pr90078.C: New test.


Re: [PATCH] simplify_vector_constructor: Rename the second elt to elem

2019-05-06 Thread Richard Biener
On Fri, May 3, 2019 at 6:41 PM H.J. Lu  wrote:
>
> simplify_vector_constructor has
>
>   constructor_elt *elt;
> ...
>   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
> {
>   ...
>   unsigned int elt;
>   ...
> }
>
> This patch renames the second elt to elem to avoid shadowing the first
> elt.

OK.

> * tree-ssa-forwprop.c (simplify_vector_constructor): Rename the
> second elt to elem to avoid shadowing the first elt.
> ---
>  gcc/tree-ssa-forwprop.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index 7dd1e64335a..ab27a5ddbb1 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -2101,15 +2101,16 @@ simplify_vector_constructor (gimple_stmt_iterator 
> *gsi)
>   if (j == 2)
> return false;
>
> - unsigned int elt;
> + unsigned int elem;
>   if (maybe_ne (bit_field_size (op1), elem_size)
> - || !constant_multiple_p (bit_field_offset (op1), elem_size, 
> ))
> + || !constant_multiple_p (bit_field_offset (op1), elem_size,
> +  ))
> return false;
>   if (j)
> -   elt += nelts;
> - if (elt != i)
> +   elem += nelts;
> + if (elem != i)
> maybe_ident = false;
> - sel.quick_push (elt);
> + sel.quick_push (elem);
> }
>else if (CONSTANT_CLASS_P (elt->value))
> {
> --
> 2.20.1
>


Re: [PATCH PR90078]Capping comp_cost computation in ivopts

2019-05-06 Thread Bin.Cheng
On Mon, May 6, 2019 at 6:11 PM Richard Biener
 wrote:
>
> On Sun, May 5, 2019 at 8:03 AM bin.cheng  wrote:
> >
> > > --
> > > Sender:Jakub Jelinek 
> > > Sent At:2019 Apr. 17 (Wed.) 19:27
> > > Recipient:Bin.Cheng 
> > > Cc:bin.cheng ; GCC Patches 
> > > 
> > > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> > >
> > >
> > > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > > As
> > > > > #define INFTY 1000
> > > > > what is the reason to keep the previous condition as well?
> > > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > > cost1.cost + cost2.cost >= INFTY too.
> > > > > Unless costs can go negative.
> > > > It's a bit complicated, but in general, costs can go negative.
> > >
> > > Ok, no objections from me then (but as I don't know anything about it,
> > > not an ack either; you are ivopts maintainer, so you don't need one).
> >
> > Hi,
> > The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
> > fixed by another patch.  This is the updated patch for PR90078.  It 
> > promotes type
> > of ivopts cost from int to int64_t, as well as change behavior of 
> > infinite_cost overflow
> > from saturation to assert.
> > Please note, implicit conversions are kept in cost computation as before 
> > without
> > introducing any narrowing.
> >
> > Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?
>
> Do not include system headers in .c files, instead those need to be
> (and are already)
> included via system.h.
>
>  /* The infinite cost.  */
> -#define INFTY 1000
> +#define INFTY 10L
>
> do we actually need this?  What happens on a ilp32 host?  That is, I believe
> you can drop the 'L' (it fits into an int anyways)
Yeah, now I think if int64_t is necessary or not.  With the scaling
bound and assertions.
>
> @@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
>  return infinite_cost;
>
>gcc_assert (!cost2.infinite_cost_p ());
> +  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);
>
>cost1.cost -= cost2.cost;
>cost1.complexity -= cost2.complexity;
>
> probably a pre-existing issue, but we do not seem to handle underflow
> here in general, nor check that underflow doesn't get us below -INFTY.
>
> I guess we really don't want negative costs?  That doesn't seem to be
> documented and I was also wondering why the cost isn't unsigned...
>
> @@ -638,7 +646,7 @@ struct iv_ca
>comp_cost cand_use_cost;
>
>/* Total cost of candidates.  */
> -  unsigned cand_cost;
> +  int64_t cand_cost;
>
>/* Number of times each invariant variable is used.  */
>unsigned *n_inv_var_uses;
>
> shows this "issue".  Can't we use uint64_t throughout the patch?
Oh, it's actually explained in previous message,
https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00697.html
In short, yes the cost can be negative.  The negative cost should be
small in absolute value, and we don't check -INFTY.  With the scaling
bound change, I don't think -INFTY is possible IIUC.

Thanks,
bin
>
> Otherwise this looks OK.
>
> Thanks,
> Richard.
>
> > Thanks,
> > bin
> > 2019-05-05  Bin Cheng  
> >
> > PR tree-optimization/90078
> > * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
> > (INFTY): Increase the value for infinite cost.
> > (struct comp_cost): Promote type of members to int64_t.
> > (infinite_cost): Don't set complexity in initialization.
> > (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
> > overflows to infinite_cost.
> > (adjust_setup_cost): Promote type of parameter and cost computation
> > to int64_t.
> > (struct ainc_cost_data, struct iv_ca): Promote type of member to
> > int64_t.
> > (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
> > cost computation to int64_t.
> > (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
> > int64_t's format specifier in dump.
> >
> > 2018-05-05  Bin Cheng  
> >
> > PR tree-optimization/90078
> > * g++.dg/tree-ssa/pr90078.C: New test.


Re: [PATCH] Avoid epilogue peeling for x264 vectorization in x264_pixel_sad_x4_8x8

2019-05-06 Thread Jakub Jelinek
On Fri, May 03, 2019 at 12:47:39PM +0200, Richard Biener wrote:
> On Wed, Dec 12, 2018 at 11:54 AM Richard Biener  wrote:
> >
> >
> > The following improves x264 vectorization by avoiding peeling for gaps
> > noticing that when the upper half of a vector is unused we can
> > load the lower part only (and fill the upper half with zeros - this
> > is what x86 does automatically, GIMPLE doesn't allow us to leave
> > the upper half undefined as RTL would with using subregs).
> >
> > The implementation is a little bit awkward as for optimal GIMPLE
> > code-generation and costing we'd like to go the strided load path
> > instead.  That proves somewhat difficult though thus the following
> > is easier but doesn't fill out the re-align paths nor the masked
> > paths (at least the fully masked path would never need peeling for
> > gaps).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, tested with
> > SPEC CPU 2006 and 2017 with the expected (~4%) improvement for
> > 625.x264_s.  Didn't see any positive or negative effects elsewhere.
> >
> > Queued for GCC 10.
> 
> Applied as r270847.

This regressed
FAIL: gcc.target/i386/avx512vl-pr87214-1.c execution test
(AVX512VL hw or SDE is needed to reproduce).

> > 2018-12-12  Richard Biener  
> >
> > * tree-vect-stmts.c (get_group_load_store_type): Avoid
> > peeling for gaps by loading only lower halves of vectors
> > if possible.
> > (vectorizable_load): Likewise.
> >
> > * gcc.dg/vect/slp-reduc-sad-2.c: New testcase.

Jakub


Re: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts

2019-05-06 Thread Richard Biener
On Sun, 5 May 2019, Kewen.Lin wrote:

> on 2019/4/27 上午11:44, Bin.Cheng wrote:
> > GCC lacks the capability passing information to later passes.  Gimple
> > analyzer worked hard collecting various information but discards it
> > entering RTL or earlier.  Other examples are like runtime alias
> > information, non-wrapping information for specific operations, etc.
> > IMHO, this is what needs to be done.  As for this case, it could be
> > finite loop info, or non-wrapping info of the iv_var's increment
> > operation.  By passing more information, RTL passes can be simplified
> > too.
> > 
> 
> Thanks for the information! Is there any under development work for this?
> That would be fine if we can pass down those information to downstream
> passes based on upcoming feature.

GCC keeps several bits of information across passes, notably
everything stored in struct loop.

Richard.

Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-05-06 Thread Richard Biener
On Mon, May 6, 2019 at 5:04 AM Feng Xue OS  wrote:
>
> Hi Richard,
>
>
>Since gcc 9 has been released, will you get some time to take a look at 
> this patch? Thanks.

I'm working through the backlog but I also hope Micha can have a look
here since he
authored the loop splitting code.

Richard.

>
> Feng
>
> 
> From: Richard Biener 
> Sent: Tuesday, March 12, 2019 4:31:49 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR 
> tree-optimization/89134)
>
> On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS  
> wrote:
> >
> > This patch is composed to implement a loop transformation on one of its 
> > conditional statements, which we call it semi-invariant, in that its 
> > computation is impacted in only one of its branches.
> >
> > Suppose a loop as:
> >
> > void f (std::map m)
> > {
> > for (auto it = m.begin (); it != m.end (); ++it) {
> > /* if (b) is semi-invariant. */
> > if (b) {
> > b = do_something();/* Has effect on b */
> > } else {
> > /* No effect on b */
> > }
> > statements;  /* Also no effect on b */
> > }
> > }
> >
> > A transformation, kind of loop split, could be:
> >
> > void f (std::map m)
> > {
> > for (auto it = m.begin (); it != m.end (); ++it) {
> > if (b) {
> > b = do_something();
> > } else {
> > ++it;
> > statements;
> > break;
> > }
> > statements;
> > }
> >
> > for (; it != m.end (); ++it) {
> > statements;
> > }
> > }
> >
> > If "statements" contains nothing, the second loop becomes an empty one, 
> > which can be removed. (This part will be given in another patch). And if 
> > "statements" are straight line instructions, we get an opportunity to 
> > vectorize the second loop. In practice, this optimization is found to 
> > improve some real application by %7.
> >
> > Since it is just a kind of loop split, the codes are mainly placed in 
> > existing tree-ssa-loop-split module, and is controlled by -fsplit-loop, and 
> > is enabled with -O3.
>
> Note the transform itself is jump-threading with the threading
> duplicating a whole CFG cycle.
>
> I didn't look at the patch details yet since this is suitable for GCC 10 only.
>
> Thanks for implementing this.
> Richard.
>
> > Feng
> >
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 64bf6017d16..a6c2878d652 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,23 @@
> > +2019-03-12  Feng Xue 
> > +
> > +   PR tree-optimization/89134
> > +* doc/invoke.texi (max-cond-loop-split-insns): Document new 
> > --params.
> > +   (min-cond-loop-split-prob): Likewise.
> > +   * params.def: Add max-cond-loop-split-insns, 
> > min-cond-loop-split-prob.
> > +   * passes.def (pass_cond_loop_split) : New pass.
> > +   * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> > +   * tree-pass.h (make_pass_cond_loop_split): New declaration.
> > +   * tree-ssa-loop-split.c (split_info): New class.
> > +   (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> > +   (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> > +   (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> > +   (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> > +   (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> > +   (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> > +   (pass_data_cond_loop_split): New variable.
> > +   (pass_cond_loop_split): New class.
> > +   (make_pass_cond_loop_split): New function.
> > +
> >  2019-03-11  Jakub Jelinek  
> >
> > PR middle-end/89655
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index df0883f2fc9..f5e09bd71fd 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a 
> > single loop.
> >  @item lim-expensive
> >  The minimum cost of an expensive expression in the loop invariant motion.
> >
> > +@item max-cond-loop-split-insns
> > +The maximum number of insns to be increased due to loop split on
> > +semi-invariant condition statement.
> > +
> > +@item min-cond-loop-split-prob
> > +The minimum threshold for probability of semi-invaraint condition
> > +statement to trigger loop split.
> > +
> >  @item iv-consider-all-candidates-bound
> >  Bound on number of candidates for induction variables, below which
> >  all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 3f1576448be..2e067526958 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 

Re: [PATCH PR90078]Capping comp_cost computation in ivopts

2019-05-06 Thread Richard Biener
On Sun, May 5, 2019 at 8:03 AM bin.cheng  wrote:
>
> > --
> > Sender:Jakub Jelinek 
> > Sent At:2019 Apr. 17 (Wed.) 19:27
> > Recipient:Bin.Cheng 
> > Cc:bin.cheng ; GCC Patches 
> > 
> > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> >
> >
> > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > As
> > > > #define INFTY 1000
> > > > what is the reason to keep the previous condition as well?
> > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > cost1.cost + cost2.cost >= INFTY too.
> > > > Unless costs can go negative.
> > > It's a bit complicated, but in general, costs can go negative.
> >
> > Ok, no objections from me then (but as I don't know anything about it,
> > not an ack either; you are ivopts maintainer, so you don't need one).
>
> Hi,
> The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
> fixed by another patch.  This is the updated patch for PR90078.  It promotes 
> type
> of ivopts cost from int to int64_t, as well as change behavior of 
> infinite_cost overflow
> from saturation to assert.
> Please note, implicit conversions are kept in cost computation as before 
> without
> introducing any narrowing.
>
> Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?

Do not include system headers in .c files, instead those need to be
(and are already)
included via system.h.

 /* The infinite cost.  */
-#define INFTY 1000
+#define INFTY 10L

do we actually need this?  What happens on a ilp32 host?  That is, I believe
you can drop the 'L' (it fits into an int anyways)

@@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
 return infinite_cost;

   gcc_assert (!cost2.infinite_cost_p ());
+  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);

   cost1.cost -= cost2.cost;
   cost1.complexity -= cost2.complexity;

probably a pre-existing issue, but we do not seem to handle underflow
here in general, nor check that underflow doesn't get us below -INFTY.

I guess we really don't want negative costs?  That doesn't seem to be
documented and I was also wondering why the cost isn't unsigned...

@@ -638,7 +646,7 @@ struct iv_ca
   comp_cost cand_use_cost;

   /* Total cost of candidates.  */
-  unsigned cand_cost;
+  int64_t cand_cost;

   /* Number of times each invariant variable is used.  */
   unsigned *n_inv_var_uses;

shows this "issue".  Can't we use uint64_t throughout the patch?

Otherwise this looks OK.

Thanks,
Richard.

> Thanks,
> bin
> 2019-05-05  Bin Cheng  
>
> PR tree-optimization/90078
> * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
> (INFTY): Increase the value for infinite cost.
> (struct comp_cost): Promote type of members to int64_t.
> (infinite_cost): Don't set complexity in initialization.
> (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
> overflows to infinite_cost.
> (adjust_setup_cost): Promote type of parameter and cost computation
> to int64_t.
> (struct ainc_cost_data, struct iv_ca): Promote type of member to
> int64_t.
> (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
> cost computation to int64_t.
> (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
> int64_t's format specifier in dump.
>
> 2018-05-05  Bin Cheng  
>
> PR tree-optimization/90078
> * g++.dg/tree-ssa/pr90078.C: New test.


Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-06 Thread Richard Biener
On Thu, Mar 21, 2019 at 5:57 AM JunMa  wrote:
>
> Hi
> For now, gcc can not fold code like:
>
> const char a[5] = "123"
> __builtin_memchr (a, '7', sizeof a)
>
> It tries to avoid folding out of string length although length of a is 5.
> This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
> builtins when constant string stores in array with some trailing nuls.
>
> This patch folds these cases by exposing additional length of
> trailing nuls in c_getstr().
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?

You are changing memcmp constant folding but only have a testcase
for memchr.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

Richard.

> Regards
> JunMa
>
>
> gcc/ChangeLog
>
> 2019-03-21  Jun Ma 
>
>  PR Tree-optimization/89772
>  * fold-const.c (c_getstr): Add new parameter to get length of
> additional
>  trailing nuls after constant string.
>  * gimple-fold.c (gimple_fold_builtin_memchr): consider trailing nuls in
>  out-of-bound accesses checking.
>  * fold-const-call.c (fold_const_call): Likewise.
>
>
> gcc/testsuite/ChangeLog
>
> 2019-03-21  Jun Ma 
>
>  PR Tree-optimization/89772
>  * gcc.dg/builtin-memchr-4.c: New test.


Re: [PATCH] Fix PR90316

2019-05-06 Thread Richard Biener
On Sat, 4 May 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 3 May 2019, Richard Biener wrote:
> >
> >> 
> >> I am testing the following patch to remove the code determining
> >> the target virtual operand to walk to and instead compute it
> >> based on the immediate dominator which we will reach anyways
> >> (or a dominating block) during maybe_skip_until.
> >> 
> >> More simplifying might be possible but I'm trying to keep the
> >> patch small and suitable for backporting up to the GCC 8 branch
> >> where this regressed.
> >> 
> >> Note this will handle even more CFG shapes now and seems to
> >> expose some uninit warnings in dwarf2out.c (at least).
> >
> > I can't seem to find an initializer that would "trap" on use
> > so I'm going to do
> >
> > Index: gcc/dwarf2out.c
> > ===
> > --- gcc/dwarf2out.c (revision 270849)
> > +++ gcc/dwarf2out.c (working copy)
> > @@ -15461,7 +15461,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
> >if (mode != GET_MODE (rtl) && GET_MODE (rtl) != VOIDmode)
> >  return NULL;
> >  
> > -  scalar_int_mode int_mode, inner_mode, op1_mode;
> > +  scalar_int_mode int_mode = SImode, inner_mode, op1_mode;
> >switch (GET_CODE (rtl))
> >  {
> >  case POST_INC:
> >
> > unless somebody comes up with something clever over the weekend...
> 
> Nothing clever, but something rare like BImode is probably safer than
> SImode, in case doing this masks real "uninitialised" uses in future.

Ick, and I forgot to install this hunk when I committed it this morning.

Thus fixed as obvious now, with BImode.

Richard.


[PATCH] Fix PR90328

2019-05-06 Thread Richard Biener


Another restrict issue hitting both vectorization and loop distribution.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-05-06  Richard Biener  

PR tree-optimization/90328
* tree-data-ref.h (dr_may_alias_p): Pass in the actual loop nest.
* tree-data-ref.c (dr_may_alias_p): Check whether the clique
is valid in the loop nest before using it.
(initialize_data_dependence_relation): Adjust.
* graphite-scop-detection.c (build_alias_set): Pass the SCOP enclosing
loop as loop-nest to dr_may_alias_p.

* gcc.dg/torture/pr90328.c: New testcase.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h (revision 270902)
+++ gcc/tree-data-ref.h (working copy)
@@ -473,7 +473,7 @@ dr_alignment (data_reference *dr)
 }
 
 extern bool dr_may_alias_p (const struct data_reference *,
-   const struct data_reference *, bool);
+   const struct data_reference *, struct loop *);
 extern bool dr_equal_offsets_p (struct data_reference *,
 struct data_reference *);
 
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c (revision 270902)
+++ gcc/tree-data-ref.c (working copy)
@@ -2231,7 +2231,7 @@ object_address_invariant_in_loop_p (cons
 
 bool
 dr_may_alias_p (const struct data_reference *a, const struct data_reference *b,
-   bool loop_nest)
+   struct loop *loop_nest)
 {
   tree addr_a = DR_BASE_OBJECT (a);
   tree addr_b = DR_BASE_OBJECT (b);
@@ -2255,6 +2255,11 @@ dr_may_alias_p (const struct data_refere
 
   if ((TREE_CODE (addr_a) == MEM_REF || TREE_CODE (addr_a) == TARGET_MEM_REF)
   && (TREE_CODE (addr_b) == MEM_REF || TREE_CODE (addr_b) == 
TARGET_MEM_REF)
+  /* For cross-iteration dependences the cliques must be valid for the
+whole loop, not just individual iterations.  */
+  && (!loop_nest
+ || MR_DEPENDENCE_CLIQUE (addr_a) == 1
+ || MR_DEPENDENCE_CLIQUE (addr_a) == loop_nest->owned_clique)
   && MR_DEPENDENCE_CLIQUE (addr_a) == MR_DEPENDENCE_CLIQUE (addr_b)
   && MR_DEPENDENCE_BASE (addr_a) != MR_DEPENDENCE_BASE (addr_b))
 return false;
@@ -2366,7 +2371,7 @@ initialize_data_dependence_relation (str
 }
 
   /* If the data references do not alias, then they are independent.  */
-  if (!dr_may_alias_p (a, b, loop_nest.exists ()))
+  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
 {
   DDR_ARE_DEPENDENT (res) = chrec_known;
   return res;
Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 270902)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1417,9 +1417,13 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;
 
+  struct loop *nest
+= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+   scop->scop_info->region.exit->src->loop_father);
+
   FOR_EACH_VEC_ELT (scop->drs, i, dr1)
 for (j = i+1; scop->drs.iterate (j, ); j++)
-  if (dr_may_alias_p (dr1->dr, dr2->dr, true))
+  if (dr_may_alias_p (dr1->dr, dr2->dr, nest))
{
  /* Dependences in the same alias set need to be handled
 by just looking at DR_ACCESS_FNs.  */
Index: gcc/testsuite/gcc.dg/torture/pr90328.c
===
--- gcc/testsuite/gcc.dg/torture/pr90328.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr90328.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+void g(int*__restrict x, int*y)
+{
+  *x = *y;
+}
+
+void __attribute__((noipa)) f(int* a,int* b)
+{
+  for(int i=0;i<1024;++i)
+g(a+i,b+i);
+}
+
+int main()
+{
+  int x[1025];
+  for (int i = 0; i < 1025; ++i)
+x[i] = i+1;
+  f(x+1, x);
+  for (int i = 0; i < 1025; ++i)
+if (x[i] != 1)
+  __builtin_abort ();
+  return 0;
+}


[C++ Patch] Consistently use FUNC_OR_METHOD_TYPE_P

2019-05-06 Thread Paolo Carlini

Hi,

one of most straightforward changes in my pending queue. Tested 
x86_64-linux.


Thanks, Paolo.

/

2019-05-06  Paolo Carlini  

* call.c (build_call_a): Use FUNC_OR_METHOD_TYPE_P.
* cp-gimplify.c (cp_fold): Likewise.
* cp-objcp-common.c (cp_type_dwarf_attribute): Likewise.
* cp-tree.h (TYPE_OBJ_P, TYPE_PTROBV_P): Likewise.
* cvt.c (perform_qualification_conversions): Likewise.
* decl.c (grokdeclarator): Likewise.
* decl2.c (build_memfn_type): Likewise.
* mangle.c (canonicalize_for_substitution, write_type): Likewise.
* parser.c (cp_parser_omp_declare_reduction): Likewise.
* pt.c (check_explicit_specialization, uses_deducible_template_parms,
check_cv_quals_for_unify, dependent_type_p_r): Likewise.
* rtti.c (ptr_initializer): Likewise.
* semantics.c (finish_asm_stmt, finish_offsetof,
cp_check_omp_declare_reduction): Likewise.
* tree.c (cp_build_qualified_type_real,
cp_build_type_attribute_variant, cxx_type_hash_eq,
cxx_copy_lang_qualifiers, cp_free_lang_data): Likewise.
* typeck.c (structural_comptypes, convert_arguments,
cp_build_addr_expr_1, unary_complex_lvalue, cp_build_c_cast,
cp_build_modify_expr, comp_ptr_ttypes_real, type_memfn_rqual):
Likewise.
Index: call.c
===
--- call.c  (revision 266503)
+++ call.c  (working copy)
@@ -357,8 +357,7 @@ build_call_a (tree function, int n, tree *argarray
 
   gcc_assert (TYPE_PTR_P (TREE_TYPE (function)));
   fntype = TREE_TYPE (TREE_TYPE (function));
-  gcc_assert (TREE_CODE (fntype) == FUNCTION_TYPE
- || TREE_CODE (fntype) == METHOD_TYPE);
+  gcc_assert (FUNC_OR_METHOD_TYPE_P (fntype));
   result_type = TREE_TYPE (fntype);
   /* An rvalue has no cv-qualifiers.  */
   if (SCALAR_TYPE_P (result_type) || VOID_TYPE_P (result_type))
Index: cp-gimplify.c
===
--- cp-gimplify.c   (revision 266503)
+++ cp-gimplify.c   (working copy)
@@ -2306,8 +2306,7 @@ cp_fold (tree x)
 
   /* Cope with user tricks that amount to offsetof.  */
   if (op0 != error_mark_node
- && TREE_CODE (TREE_TYPE (op0)) != FUNCTION_TYPE
- && TREE_CODE (TREE_TYPE (op0)) != METHOD_TYPE)
+ && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (op0)))
{
  tree val = get_base_address (op0);
  if (val
Index: cp-objcp-common.c
===
--- cp-objcp-common.c   (revision 266503)
+++ cp-objcp-common.c   (working copy)
@@ -248,8 +248,7 @@ cp_type_dwarf_attribute (const_tree type, int attr
   switch (attr)
 {
 case DW_AT_reference:
-  if ((TREE_CODE (type) == FUNCTION_TYPE
-  || TREE_CODE (type) == METHOD_TYPE)
+  if (FUNC_OR_METHOD_TYPE_P (type)
  && FUNCTION_REF_QUALIFIED (type)
  && !FUNCTION_RVALUE_QUALIFIED (type))
return 1;
@@ -256,8 +255,7 @@ cp_type_dwarf_attribute (const_tree type, int attr
   break;
 
 case DW_AT_rvalue_reference:
-  if ((TREE_CODE (type) == FUNCTION_TYPE
-  || TREE_CODE (type) == METHOD_TYPE)
+  if (FUNC_OR_METHOD_TYPE_P (type)
  && FUNCTION_REF_QUALIFIED (type)
  && FUNCTION_RVALUE_QUALIFIED (type))
return 1;
Index: cp-tree.h
===
--- cp-tree.h   (revision 266503)
+++ cp-tree.h   (working copy)
@@ -4313,8 +4313,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
 #define TYPE_OBJ_P(NODE)   \
   (!TYPE_REF_P (NODE)  \
&& !VOID_TYPE_P (NODE)  \
-   && TREE_CODE (NODE) != FUNCTION_TYPE\
-   && TREE_CODE (NODE) != METHOD_TYPE)
+   && !FUNC_OR_METHOD_TYPE_P (NODE))
 
 /* Returns true if NODE is a pointer to an object.  Keep these checks
in ascending tree code order.  */
@@ -4330,8 +4329,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
void.  Keep these checks in ascending tree code order.  */
 #define TYPE_PTROBV_P(NODE)\
   (TYPE_PTR_P (NODE)   \
-   && !(TREE_CODE (TREE_TYPE (NODE)) == FUNCTION_TYPE  \
-   || TREE_CODE (TREE_TYPE (NODE)) == METHOD_TYPE))
+   && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (NODE)))
 
 /* Returns true if NODE is a pointer to function type.  */
 #define TYPE_PTRFN_P(NODE) \
Index: cvt.c
===
--- cvt.c   (revision 266503)
+++ cvt.c   (working copy)
@@ -1979,8 +1979,7 @@ perform_qualification_conversions (tree type, tree
 bool
 tx_safe_fn_type_p (tree t)
 {
-  if (TREE_CODE (t) != FUNCTION_TYPE
-  && TREE_CODE (t) != METHOD_TYPE)
+  if (!FUNC_OR_METHOD_TYPE_P (t))
 return false;
   

[committed] Clean up libgomp GCC 5 legacy support (was: Remove unused openacc call)

2019-05-06 Thread Thomas Schwinge
Hi!

On Wed, 20 Apr 2016 12:27:27 +0200, Jakub Jelinek  wrote:
> On Wed, Apr 20, 2016 at 10:46:38AM +0200, Thomas Schwinge wrote:
> > Clean up libgomp GCC 5 legacy support
> > 
> > libgomp/
> > * config/nvptx/oacc-parallel.c: Empty file.
|   * oacc-parallel.c: Add comments to legacy entry points (GCC 5).
> 
> Ok for trunk and 6.2, we don't need this for 6.1.

I don't know anymore why I didn't commit this back then.

Cesar has since committed to trunk in r263236 the patch that truncates
'libgomp/config/nvptx/oacc-parallel.c', and I've now committed the
documentation update to trunk in r270901, see attached.


Grüße
 Thomas


From 75a3472e37d26152afb7b08a6fda2691c4b0ee74 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Mon, 6 May 2019 08:49:55 +
Subject: [PATCH] Clean up libgomp GCC 5 legacy support

	libgomp/
	* oacc-parallel.c: Add comments to legacy entry points (GCC 5).

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@270901 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog   | 4 
 libgomp/oacc-parallel.c | 6 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 30bb2d4beba..64e0a8ad8df 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,7 @@
+2019-05-06  Thomas Schwinge  
+
+	* oacc-parallel.c: Add comments to legacy entry points (GCC 5).
+
 2019-03-27  Kevin Buettner  
 
 	* team.c (gomp_team_start): Initialize pool->threads[0].
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index b77c5e8b9c5..f5fb63c5b5a 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -275,7 +275,7 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
-/* Legacy entry point, only provide host execution.  */
+/* Legacy entry point (GCC 5).  Only provide host fallback execution.  */
 
 void
 GOACC_parallel (int flags_m, void (*fn) (void *),
@@ -649,12 +649,16 @@ GOACC_wait (int async, int num_waits, ...)
 acc_wait_all_async (async);
 }
 
+/* Legacy entry point (GCC 5).  */
+
 int
 GOACC_get_num_threads (void)
 {
   return 1;
 }
 
+/* Legacy entry point (GCC 5).  */
+
 int
 GOACC_get_thread_num (void)
 {
-- 
2.17.1



signature.asc
Description: PGP signature


[PATCH] [MIPS] Fix PR/target 90357 20080502-1.c -O0 start with r269880

2019-05-06 Thread Paul Hua
The attached patch fix pr90357, bootstraped and regressed test on
mips64el-linux-gnu target.
Ok for commit ?
From a3db8763ee8460a5f63c567d58624a985f9924ce Mon Sep 17 00:00:00 2001
From: Chenghua Xu 
Date: Mon, 6 May 2019 16:14:56 +0800
Subject: [PATCH] [PATCH,MIPS] Skip forward src into next insn when the SRC reg
 is dead.

	PR target/90357
	gcc/
	* config/mips/mips.c (mips_split_move): Skip forward SRC into
	next insn when the SRC reg is dead.
---
 gcc/config/mips/mips.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1de33b2..89fc073 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -4849,6 +4849,7 @@ mips_split_move (rtx dest, rtx src, enum mips_split_type split_type, rtx insn_)
  can forward SRC for DEST.  This is most useful if the next insn is a
  simple store.   */
   rtx_insn *insn = (rtx_insn *)insn_;
+  struct mips_address_info addr;
   if (insn)
 {
   rtx_insn *next = next_nonnote_nondebug_insn_bb (insn);
@@ -4856,7 +4857,17 @@ mips_split_move (rtx dest, rtx src, enum mips_split_type split_type, rtx insn_)
 	{
 	  rtx set = single_set (next);
 	  if (set && SET_SRC (set) == dest)
-	validate_change (next, _SRC (set), src, false);
+	{
+	  if (MEM_P (src))
+		{
+		  rtx tmp = XEXP (src, 0);
+		  mips_classify_address (, tmp, GET_MODE (tmp), true);
+		  if (REGNO (addr.reg) != REGNO (dest))
+		validate_change (next, _SRC (set), src, false);
+		}
+	  else
+		validate_change (next, _SRC (set), src, false);
+	}
 	}
 }
 }
-- 
1.8.3.1



Re: [PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2019-05-06 Thread Thomas Schwinge
Hi!

On Tue, 31 Jul 2018 08:12:51 -0700, Cesar Philippidis 
 wrote:
> This is an old patch which removes the struct map from the nvptx plugin.

(This got committed to trunk in r263212.)

> I believe at one point this was supposed to be used to manage async data
> mappings, but in practice that never worked out.

The original submission seems to be
:

| The attached patch fixes an issue in the managing of
| the page-locked buffer which holds the kernel launch
| mappings. In the process of fixing the issue, I discovered
| that 'struct map' was no longer needed, so that has
| been removed as well.

I can't tell/remember what the "issue in the managing of the page-locked
buffer which holds the kernel launch mappings" would've been.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c

I wanted to note that on all branches down to GCC 5 this test case PASSes
already without any libgomp code changes.

> @@ -0,0 +1,63 @@
> +/* { dg-do run } */
> +
> +#include 
> +#include 
> +#include 
> +
> +/* Exercise the kernel launch argument mapping.  */
> +
> +int
> +main (int argc, char **argv)
> +{
> +  int a[256], b[256], c[256], d[256], e[256], f[256];
> +  int i;
> +  int n;
> +
> +  /* 48 is the size of the mappings for the first parallel construct.  */
> +  n = sysconf (_SC_PAGESIZE) / 48 - 1;

The rationale for the number 48 doesn't seem right -- the first
'parallel' construct has four mappings, so four times eight bytes is 32
bytes on x86_64?  Ha, or maybe this applies to the 'struct map', so it's
these four plus the 'int async', 'size_t size'?

Anyway, I can't tell what this is trying to achieve?

> +
> +  i = 0;
> +
> +  for (i = 0; i < n; i++)
> +{
> +  #pragma acc parallel copy (a, b, c, d)
> + {
> +   int j;
> +
> +   for (j = 0; j < 256; j++)
> + {
> +   a[j] = j;
> +   b[j] = j;
> +   c[j] = j;
> +   d[j] = j;
> + }
> + }
> +}

Maybe filling up some improperly-managed statically-sized data structure
(said "page-locked buffer"?), which then...

> +
> +#pragma acc parallel copy (a, b, c, d, e, f)

... would overflow here?

> +  {
> +int j;
> +
> +for (j = 0; j < 256; j++)
> +  {
> + a[j] = j;
> + b[j] = j;
> + c[j] = j;
> + d[j] = j;
> + e[j] = j;
> + f[j] = j;
> +  }
> +  }
> +
> +  for (i = 0; i < 256; i++)
> +   {
> + if (a[i] != i) abort();
> + if (b[i] != i) abort();
> + if (c[i] != i) abort();
> + if (d[i] != i) abort();
> + if (e[i] != i) abort();
> + if (f[i] != i) abort();
> +   }
> +
> +  exit (0);
> +}

Anyway -- the libgomp code has been cleaned up; the test case seems like
it can be disregarded.


Grüße
 Thomas


signature.asc
Description: PGP signature


[PATCH 2/2] Support {MIN,MAX}_EXPR in GIMPLE FE.

2019-05-06 Thread Martin Liška
Hi.

The patch is about support of a new GIMPLE expr.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From a4f23fe8d876f2d8a0628feb57127e276348aab0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 May 2019 13:54:40 +0200
Subject: [PATCH 2/2] Support {MIN,MAX}_EXPR in GIMPLE FE.

gcc/ChangeLog:

2019-05-03  Martin Liska  

	* gimple-pretty-print.c (dump_binary_rhs): Dump MIN_EXPR
	and MAX_EXPR in GIMPLE FE format.

gcc/c/ChangeLog:

2019-05-03  Martin Liska  

	* c-typeck.c (build_binary_op): Handle MIN_EXPR and MAX_EXPR
	with flag_gimple.
	* gimple-parser.c (c_parser_gimple_statement): Support __MIN and
	__MAX.
	(c_parser_gimple_unary_expression): Parse also binary expression
	__MIN and __MAX.

gcc/testsuite/ChangeLog:

2019-05-03  Martin Liska  

	* gcc.dg/gimplefe-39.c: New test.
---
 gcc/c/c-typeck.c   | 10 ++
 gcc/c/gimple-parser.c  | 21 -
 gcc/gimple-pretty-print.c  | 15 ++-
 gcc/testsuite/gcc.dg/gimplefe-39.c | 21 +
 4 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-39.c

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4e443754002..d0ae9c51f6b 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12193,6 +12193,16 @@ build_binary_op (location_t location, enum tree_code code,
 	maybe_warn_bool_compare (location, code, orig_op0, orig_op1);
   break;
 
+case MIN_EXPR:
+case MAX_EXPR:
+  if (flag_gimple)
+	{
+	  result_type = c_common_type (type0, type1);
+	  break;
+	}
+  else
+	gcc_unreachable ();
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index 6558770873f..f3e0d2f94ad 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -747,7 +747,9 @@ c_parser_gimple_statement (gimple_parser , gimple_seq *seq)
   {
 	tree id = c_parser_peek_token (parser)->value;
 	if (strcmp (IDENTIFIER_POINTER (id), "__ABS") == 0
-	|| strcmp (IDENTIFIER_POINTER (id), "__ABSU") == 0)
+	|| strcmp (IDENTIFIER_POINTER (id), "__ABSU") == 0
+	|| strcmp (IDENTIFIER_POINTER (id), "__MIN") == 0
+	|| strcmp (IDENTIFIER_POINTER (id), "__MAX") == 0)
 	  goto build_unary_expr;
 	break;
   }
@@ -1062,6 +1064,7 @@ c_parser_gimple_unary_expression (gimple_parser )
 	}
 case CPP_NAME:
 	{
+	  bool is_min;
 	  tree id = c_parser_peek_token (parser)->value;
 	  if (strcmp (IDENTIFIER_POINTER (id), "__ABS") == 0)
 	{
@@ -1075,6 +1078,22 @@ c_parser_gimple_unary_expression (gimple_parser )
 	  op = c_parser_gimple_postfix_expression (parser);
 	  return parser_build_unary_op (op_loc, ABSU_EXPR, op);
 	}
+	  else if ((is_min = strcmp (IDENTIFIER_POINTER (id), "__MIN") == 0)
+		   || strcmp (IDENTIFIER_POINTER (id), "__MAX") == 0)
+	{
+	  c_parser_consume_token (parser);
+	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+		return ret;
+	  c_expr op1 = c_parser_gimple_postfix_expression (parser);
+	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+		return ret;
+	  c_expr op2 = c_parser_gimple_postfix_expression (parser);
+	  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+		return ret;
+	  return parser_build_binary_op (op_loc,
+	 is_min ? MIN_EXPR : MAX_EXPR,
+	 op1, op2);
+	}
 	  else
 	return c_parser_gimple_postfix_expression (parser);
 	}
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 7e3916bff86..58212c4dcc1 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -423,9 +423,22 @@ dump_binary_rhs (pretty_printer *buffer, gassign *gs, int spc,
   enum tree_code code = gimple_assign_rhs_code (gs);
   switch (code)
 {
-case COMPLEX_EXPR:
 case MIN_EXPR:
 case MAX_EXPR:
+  if (flags & TDF_GIMPLE)
+	{
+	  pp_string (buffer, code == MIN_EXPR ? "__MIN (" : "__MAX (");
+	  dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags,
+			 false);
+	  pp_string (buffer, ", ");
+	  dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags,
+			 false);
+	  pp_string (buffer, ")");
+	  break;
+	}
+  else
+	gcc_fallthrough ();
+case COMPLEX_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
 case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/testsuite/gcc.dg/gimplefe-39.c b/gcc/testsuite/gcc.dg/gimplefe-39.c
new file mode 100644
index 000..30677356d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-39.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgimple -fdump-tree-optimized" } */
+
+int a, b;
+
+int __GIMPLE (ssa,guessed_local(1073741824))
+main (int argc)
+{
+  int _1;
+  int _2;
+  int _4;
+
+  __BB(2,guessed_local(1073741824)):
+  _1 = a;
+  _2 = b;
+  _4 = __MAX (_1, _2);
+  return _4;
+
+}
+
+/* { dg-final { scan-tree-dump "MAX_EXPR" 

Re: [PATCH][stage1] Support profile (BB counts and edge probabilities) in GIMPLE FE.

2019-05-06 Thread Martin Liška
On 5/2/19 2:31 PM, Richard Biener wrote:
> On Mon, Apr 29, 2019 at 2:51 PM Martin Liška  wrote:
>>
>> On 4/26/19 3:18 PM, Richard Biener wrote:
>>> On Wed, Apr 10, 2019 at 10:12 AM Martin Liška  wrote:

 On 4/9/19 3:19 PM, Jan Hubicka wrote:
>> Hi.
>>
>> There's updated version that supports profile quality for both counts
>> and probabilities. I'm wondering whether ENTRY and EXIT BBs needs to
>> have set probability. Apparently, I haven't seen any verifier that
>> would complain.
>
> Well, you do not need to define it but then several cases will
> degenerate. In particular BB frequencies (for callgraph profile or
> hot/cold decisions) are calculated as ratios of entry BB and given BB
> count. If entry BB is undefined you will get those undefined and
> heuristics will resort to conservative answers.
>
> I do not think we use exit block count.
>
> Honza
>

 Thank you Honza for explanation. I'm sending version of the patch
 that supports entry BB count.

 I've been testing the patch right now.
>>>
>>> Can you move the GIMPLE/RTL FE specific data in c_declspecs to
>>> a substructure accessed via indirection?  I guess enlarging that
>>> isn't really what we should do.  You'd move gimple_or_rtl_pass
>>> there and make that pointer one to a struct aux_fe_data
>>> (lifetime managed by the respective RTL/GIMPLE FE, thus
>>> to be freed there)?  Joseph, do you agree or is adding more
>>> stuff to c_declspecs OK (I would guess it could be a few more
>>> elements in the future).
>>
>> Let's wait here for Joseph.
> 
> So looks like it won't matter so let's go with the current approach
> for the moment.
> 
>>>
>>> -c_parser_gimple_parse_bb_spec (tree val, int *index)
>>> +c_parser_gimple_parse_bb_spec (tree val, gimple_parser ,
>>> +  int *index, profile_probability *probablity)
>>>  {
>>>
>>> so this will allow specifying probability in PHI node arguments.
>>> I think we want to split this into the already existing part
>>> and a c_parser_gimple_parse_bb_spec_with_edge_probability
>>> to be used at edge sources.
>>
>> Yes, that's a good idea!
>>
>>>
>>> +  if (cfun->curr_properties & PROP_cfg)
>>> +{
>>> +  update_max_bb_count ();
>>> +  set_hot_bb_threshold (hot_bb_threshold);
>>> +  ENTRY_BLOCK_PTR_FOR_FN (cfun)->count = entry_bb_count;
>>>
>>> I guess the last one should be before update_max_bb_count ()?
>>
>> Same here.
>>
>>>
>>> +}
>>>
>>> + /* Parse profile: quality(value) */
>>>   else
>>> {
>>> - c_parser_error (parser, "unknown block specifier");
>>> - return return_p;
>>> + tree q;
>>> + profile_quality quality;
>>> + tree v = c_parser_peek_token (parser)->value;
>>>
>>> peek next token before the if/else and do
>>>
>>>else if (!strcmp (...))
>>>
>>> as in the loop_header case.  I expected we can somehow share
>>> parsing of profile quality and BB/edge count/frequency?  How's
>>> the expected syntax btw, comments in the code should tell us.
>>> I'm guessing it's quality-id '(' number ')' and thus it should be
>>> really shareable between edge and BB count and also __GIMPLE
>>> header parsing?  So parse_profile_quality should be
>>> parse_profile () instead, resulting in a combined value
>>> (we do use the same for edge/bb?).
>>
>> It's problematic, there are different error messages for count/frequency.
>> Moreover call to parse_profile_quality in c_parser_gimple_or_rtl_pass_list
>> is a way how to test that next 'token' is a profile count.
> 
> Who cares about error messages...  But sure, I'm just proposing to
> merge testing for next token and actual parsing.

After I've done removal of hot_bb_threshold parsing, there are just 2 usages
of parse_profile_quality. I would like to leave it as it, not introducing a 
wrappers.

> 
>>>
>>> +  else if (!strcmp (op, "hot_bb_threshold"))
>>> +   {
>>>
>>> I'm not sure about this - it doesn't make sense to specify this
>>> on a per-function base since it seems to control a global
>>> variable (booo!)?
>>
>> Yep, shame on me!
>>
>>> Isn't this instead computed on-demand
>>> based on profile_info->sum_max?
>>
>> No it's a global value shared among functions.
>>
>>> If not then I think
>>> we need an alternate way of funneling in global state into
>>> the GIMPLE FE.
>>
>> What about --param gimple-fe-hot-bb-threshold ?
> 
> I thought about that, yes ...  in absence can it actually be
> "computed"?

Renamed to it.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

> 
> Richard.
> 
>> Thanks,
>> Martin
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>

 Martin
>>

>From e61a8cbf3c9c8b7aef250e1a5d31cf654fad76bb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 4 Apr 2019 14:46:15 +0200
Subject: [PATCH 1/2] 

[PATCH] Fix PR87314 testcase

2019-05-06 Thread Richard Biener


Tested on x86_64-unknown-linux-gnu and ppc64le-linux.

Richard.

2019-05-06  Richard Biener  

PR testsuite/90331
* gcc.dg/pr87314-1.c: Align the substring to open up
string merging for targets aligning strings to 8 bytes.

Index: gcc/testsuite/gcc.dg/pr87314-1.c
===
--- gcc/testsuite/gcc.dg/pr87314-1.c(revision 270847)
+++ gcc/testsuite/gcc.dg/pr87314-1.c(working copy)
@@ -3,9 +3,9 @@
 
 int f(){ int a; return ==(void *)"hello"; }
 int g(){ return "bye"=="hello"; }
-int h() { return "bye"=="hellobye"+5; }
+int h() { return "bye"=="hellbye"+8; }
 
 /* { dg-final { scan-tree-dump-times "hello" 1 "original" } } */
 /* The test in h() should be retained because the result depends on
string merging.  */
-/* { dg-final { scan-assembler "hello" } } */
+/* { dg-final { scan-assembler "hellooo" } } */


Re: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts

2019-05-06 Thread Kewen.Lin
on 2019/5/6 上午9:50, Bin.Cheng wrote:
> On Sun, May 5, 2019 at 2:02 PM Kewen.Lin  wrote:
>>
>> To decrease the cost isn't workable for this case, it make the original
>> iv cand is preferred more and over the other iv cand for memory iv use,
>> then the desirable memory based iv cand won't be chosen.
>> If increase the cost, one basic iv cand is chosen for cmp use, memory
>> based iv cand is chosen for memory use, instead of original iv for both.
> Sorry for the mistake, I meant to decrease use cost of whatever "correct"
> iv_cand for cmp iv_use that could enable doloop optimization, it doesn't
> has to the original iv_cand.
> 

Thanks for your clarification. 

As mentioned, we are trying to make the predict doloop hook work as 
conservative as possible, so under the hypothesis that predicted cmp iv 
use is going to be eliminated later, I'd prefer the way to bind iv_cand
artificially.  Because whichever iv_cand is selected as "correct" one
and then increase/decrease the cost, we are considering one upcoming
eliminated cmp iv use (it doesn't require the iv cand eventually), it
will *probably* affect the optimal candidate set decision.

>> Could you help to explain the "binding" more?  Does it mean cost modeling
>> decision making can simply bypass the cmp iv_use (we have artificially
>> assigned one "correct" cand iv to it) and also doesn't count the "correct"
>> iv cand cost into total iv cost? Is my understanding correct?
> For example, if the heuristic finds out the "correct" doloop iv_cand, we can
> force choosing that iv_cand for cmp iv_use and bypass the candidate choosing
> algorithm:
> struct iv_group {
>   //...
>   struct iv_cand *bind_cand;
> };
> then set this bind_cand directly in struct iv_ca::cand_for_group.  As a 
> result,
> iv_use rewriting code takes care of everything, no special handling (such as
> preserve_ivs_for_use) is needed.
> 
> Whether letting cost model decide the "correct" iv_cand or bind it by yourself
> depends on how good your heuristic check is.  It's your call. :)
> 

Thanks a lot for your detailed explanation!

As mentioned above, I prefer to use "bind" for it.
I'll modify the patch with the "bind cand" way, initially set the relevant
iv cand originally used in that cmp iv use, then update it if it can be 
updated with IV_ELIM.

>> Yes, I agree we should be conservative.  But it's hard to determine which is
>> better in practice, even for capturing all cases, we are still trying our 
>> best
>> to be more conservative, excluding any suspicious factor which is possible to
>> make it fail in later RTL checking, one example is that the patch won't 
>> predict
>> it can be doloop once finding switch statement.  It depends on how much "bad"
>> cases we don't catch and how serious its impact is and whether easy to 
>> improve.
> Sure, I don't know ppc so have all the trust in your decision here.
> 
> Thanks for your patience.
> 

Thank you too!!!

Kewen

> Thanks,
> bin
> 



Re: [PATCH] Error only when -mabi=ms is used on a non-MS_ABI system (PR sanitizer/90312).

2019-05-06 Thread Jakub Jelinek
On Fri, May 03, 2019 at 12:51:09PM +0200, Martin Liška wrote:
> 2019-05-02  Martin Liska  
> 
>   PR sanitizer/90312
>   * config/i386/i386.c (ix86_option_override_internal): Error only
>   when -mabi is selected to a non-default version.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-05-02  Martin Liska  
> 
>   PR sanitizer/90312
>   * gcc.dg/asan/pr87930.c: Run the test only on *linux or *gnu
>   systems.
>   * gcc.dg/tsan/pr88017.c: Likewise.

Ok, thanks.

Jakub