[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #16 from Jeffrey A. Law  ---
FWIW, I'm tracking an independent SRA issue that I might send Martin's way as
well.  I'm still trying to wrap my head around it and I've got a nice small
little testcase now to help with that... But if I get stuck, I'll be looking
for Martin's help :-)

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot 
gnu.org

--- Comment #15 from Martin Jambor  ---
(In reply to Jakub Jelinek from comment #13)
> Martin, can you please take over the SRA part?

I will.

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #14 from Jakub Jelinek  ---
Note the above dumps are with -O1 -g0 -fno-asynchronous-unwind-tables
-fno-exceptions (what I've been using in the testsuite reduction).

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #13 from Jakub Jelinek  ---
So, without the tree-dfa.c change current trunk in esra transforms:
 G, J>, Sz> div(U&, double) [with int Sz = 3] (struct U & p1,
double p2)
 {
+  const double * SR.27;
+  const double * SR.26;
+  const double * SR.25;
+  const double * SR.24;
   struct ConstReference D.3593;
   struct ConstReference D.3592;
   struct ConstReference D.3172;
   const struct J D.3173;
   struct expr_type D.3174;
   const double * _11;
   long double _12;

   :
   MEM[(struct  &)&D.3173] ={v} {CLOBBER};
   _12 = (long double) p2_2(D);
   D.3173.m_data = _12;
   _11 = &MEM[(const struct U *)p1_4(D)].m_data;
-  MEM[(struct K *)&D.3593] = _11;
-  D.3592 = D.3593;
-  D.3172 = D.3592;
+  SR.27_3 = _11;
+  SR.26_5 = SR.27_3;
+  SR.24_6 = SR.26_5;
   MEM[(struct  &)&D.3174] ={v} {CLOBBER};
-  D.3174.m_lhs = D.3172;
+  SR.25_23 = SR.24_6;
   MEM[(struct J *)&D.3174 + 16B] = D.3173;
   MEM[(struct  &)&] ={v} {CLOBBER};
-  .m_expr = D.3174;
+  MEM[(struct G *)&] = SR.25_23;
   D.3174 ={v} {CLOBBER};
-  D.3172 ={v} {CLOBBER};
   D.3173 ={v} {CLOBBER};
   return ;

 }

ConstReference (aka K<3>) contains a single const double *m_data member (thus
it is likely a DImode struct), J contains a single long double m_data member,
thus it is XFmode struct,
and expr_type is I, which contains two fields, one ConstReference aka K<3>
m_lhs and another one const J m_rhs.  Scalarization of the DImode struct is of
course just fine, and after scalarization the J store into D.3174 still
remains:
MEM[(struct J *)&D.3174 + 16B] = D.3173;
but what is lost is the read of that or store of D.3173 into MEM[(struct G
*)& + 16B].
Compared to this, with patched tree-dfa.c esra changes the same function:
 G, J>, Sz> div(U&, double) [with int Sz = 3] (struct U & p1,
double p2)
 {
+  const double * SR.29;
+  const double * SR.28;
+  long double SR.27;
+  const double * SR.26;
+  long double SR.25;
+  const double * SR.24;
   struct ConstReference D.3593;
   struct ConstReference D.3592;
   struct ConstReference D.3172;
   const struct J D.3173;
   struct expr_type D.3174;
   const double * _11;
   long double _12;

   :
-  MEM[(struct  &)&D.3173] ={v} {CLOBBER};
   _12 = (long double) p2_2(D);
-  D.3173.m_data = _12;
+  SR.25_5 = _12;
   _11 = &MEM[(const struct U *)p1_4(D)].m_data;
-  MEM[(struct K *)&D.3593] = _11;
-  D.3592 = D.3593;
-  D.3172 = D.3592;
-  MEM[(struct  &)&D.3174] ={v} {CLOBBER};
-  D.3174.m_lhs = D.3172;
-  MEM[(struct J *)&D.3174 + 16B] = D.3173;
+  SR.29_6 = _11;
+  SR.28_7 = SR.29_6;
+  SR.24_23 = SR.28_7;
+  SR.26_26 = SR.24_23;
+  SR.27_27 = SR.25_5;
   MEM[(struct  &)&] ={v} {CLOBBER};
-  .m_expr = D.3174;
-  D.3174 ={v} {CLOBBER};
-  D.3172 ={v} {CLOBBER};
-  D.3173 ={v} {CLOBBER};
+  MEM[(struct G *)&] = SR.26_26;
+  MEM[(struct G *)& + 16B] = SR.27_27;
   return ;

 }

i.e. scalarizes also the long double stores and reads (thus, the proposed
tree-dfa.c change would improve it too), but the bug really is that the long
double assignment of
(80 or 96 or 128 bits from D.3174 starting from offset 16 bytes into the struct
into corresponding  field is lost in vanilla trunk.

So, I think we want to change back tree-dfa.c plus fix the SRA bug that causes
this if tree-dfa.c is not changed.
Martin, can you please take over the SRA part?

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #12 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #11)
> --- gcc/tree-dfa.c.jj 2016-01-04 14:55:50.0 +0100
> +++ gcc/tree-dfa.c2016-01-20 11:06:15.226682927 +0100
> @@ -395,7 +395,7 @@ get_ref_base_and_extent (tree exp, HOST_
>if (mode == BLKmode)
>   size_tree = TYPE_SIZE (TREE_TYPE (exp));
>else
> - bitsize = int (GET_MODE_PRECISION (mode));
> + bitsize = int (GET_MODE_BITSIZE (mode));
>  }
>if (size_tree != NULL_TREE
>&& TREE_CODE (size_tree) == INTEGER_CST)
> 
> fixes that and DJ has tested it on msp430 without regressions.
> As that is consistent with what e.g. get_inner_reference does, I'd prefer to
> apply it this way, but I must say I still don't really understand, even on
> the simplified testcase, what is going on during SRA there with the shorter
> access sizes.
> On the reduced testcase there are 7 calls of get_ref_base_and_extent where
> mode here is XFmode (one with different precision from bitsize).
>  Created a replacement for D.3172 offset: 0, size: 64: SR.24
> -Created a replacement for D.3174 offset: 0, size: 64: SR.25
> -Created a replacement for D.3592 offset: 0, size: 64: SR.26
> -Created a replacement for D.3593 offset: 0, size: 64: SR.27
> +Created a replacement for D.3173 offset: 0, size: 128: SR.25
> +Created a replacement for D.3174 offset: 0, size: 64: SR.26
> +Created a replacement for D.3174 offset: 128, size: 128: SR.27
> +Created a replacement for D.3592 offset: 0, size: 64: SR.28
> +Created a replacement for D.3593 offset: 0, size: 64: SR.29
> is unpatched to patched gcc for the div function.  Is SRA assuming the
> access sizes are power of two and misbehaving if it gets access size 80 bits?

No, it has various checks to verify % BITS_PER_UNIT is zero but no
unchecked division by it.  Even though the above suggests otherwise.

It must be a more subtle bug.

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek  ---
--- gcc/tree-dfa.c.jj   2016-01-04 14:55:50.0 +0100
+++ gcc/tree-dfa.c  2016-01-20 11:06:15.226682927 +0100
@@ -395,7 +395,7 @@ get_ref_base_and_extent (tree exp, HOST_
   if (mode == BLKmode)
size_tree = TYPE_SIZE (TREE_TYPE (exp));
   else
-   bitsize = int (GET_MODE_PRECISION (mode));
+   bitsize = int (GET_MODE_BITSIZE (mode));
 }
   if (size_tree != NULL_TREE
   && TREE_CODE (size_tree) == INTEGER_CST)

fixes that and DJ has tested it on msp430 without regressions.
As that is consistent with what e.g. get_inner_reference does, I'd prefer to
apply it this way, but I must say I still don't really understand, even on the
simplified testcase, what is going on during SRA there with the shorter access
sizes.
On the reduced testcase there are 7 calls of get_ref_base_and_extent where mode
here is XFmode (one with different precision from bitsize).
 Created a replacement for D.3172 offset: 0, size: 64: SR.24
-Created a replacement for D.3174 offset: 0, size: 64: SR.25
-Created a replacement for D.3592 offset: 0, size: 64: SR.26
-Created a replacement for D.3593 offset: 0, size: 64: SR.27
+Created a replacement for D.3173 offset: 0, size: 128: SR.25
+Created a replacement for D.3174 offset: 0, size: 64: SR.26
+Created a replacement for D.3174 offset: 128, size: 128: SR.27
+Created a replacement for D.3592 offset: 0, size: 64: SR.28
+Created a replacement for D.3593 offset: 0, size: 64: SR.29
is unpatched to patched gcc for the div function.  Is SRA assuming the access
sizes are power of two and misbehaving if it gets access size 80 bits?

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #10 from Jakub Jelinek  ---
Reduced testcase:

template  struct A;
template <> struct A<1> {};
template  struct B
{
  template  struct C
  {
typedef T *iterator;
C (iterator p1) : m_iter (p1) {}
void operator, (T p1) { *m_iter = p1; }
iterator m_iter;
  };
  typedef double *iterator;
  B (Obj &p1, double) : m_object (p1) {}
  C operator, (double);
  Obj &m_object;
};
template 
typename B::template C
B::operator, (double p1)
{
  iterator a = m_object.data (), b = a + 1;
  *a = 1;
  *b = p1;
  return C(b + 1);
}
class D {};
inline double operator+(const double &p1, D) { return p1; }
template  class U;
template  struct F
{
  enum { doIt = K < Sz - 1 ? 1 : 0 };
  template 
  static void assign (Dest &p1, Src &p2, Assign &p3)
  {
p3.apply_on (p1 (K), p2 (K));
F::assign (p1, p2, p3);
  }
  template  static double dot (Dest &p1, Src &p2)
  {
return p1 (K) * p2 (K) + F::dot (p1, p2);
  }
};
template <> struct F<0>
{
  template 
  static void assign (Dest &, Src &, Assign &) {}
  template  static D dot (Dest &, Src &) { return D ();
}
};
template  struct G
{
  enum { ops_assign, use_meta };
  G (const E &p1) : m_expr (p1) {}
  double operator()(int p1) const { return m_expr (p1); }
  template 
  static void do_assign (A<1>, Dest &p2, Src &p3, Assign &p4)
  {
F::assign (p2, p3, p4);
  }
  template 
  void assign_to (Dest &p1, const Assign &p2) const
  {
do_assign (A<1>(), p1, *this, p2);
  }
  E m_expr;
};
struct H
{
  static double apply_on (double p1, long double p2) { return p1 / p2; }
  static void apply_on (double & __restrict p1, double p2) { p1 = p2; }
};
template  struct I
{
  I (const E1 &p1, const E2 &p2) : m_lhs (p1), m_rhs (p2) {}
  double operator()(int p1) const
  {
double c = m_lhs (p1);
return H::apply_on (c, m_rhs (0));
  }
  E1 m_lhs;
  const E2 m_rhs;
};
struct J
{
  J (double p1) : m_data (p1) {}
  long double operator()(int) const { return m_data; }
  long double m_data;
};
template  struct K
{
  K (const U &p1) : m_data (p1.data ()) {}
  double operator()(int p1) const { return m_data[p1]; }
  const double *m_data;
};
template  struct U
{
  U () {}
  U (const U &p1)
  {
*this = G(p1.const_ref ());
  }
  B operator=(double) { return B(*this, 0); }
  double *data () { return m_data; }
  const double *data () const { return m_data; }
  double &operator()(int p1) { return m_data[p1]; }
  double operator()(int p1) const { return m_data[p1]; }
  typedef K ConstReference;
  ConstReference const_ref () const { return *this; }
  template  void operator=(const G &p1)
  {
p1.assign_to (*this, H ());
  }
  double m_data[Sz];
};
template 
G, J>, Sz> div (U &p1, double p2)
{
  typedef I, J> expr_type;
  return G(expr_type (p1.const_ref (), p2));
}
template  double norm2 (U &p1)
{
  return __builtin_sqrt (F::dot (p1, p1));
}
template 
G, J>, Sz> operator/(U &p1, double p2)
{
  return div (p1, p2);
}
typedef U<3> V;
V foo (V p1)
{
  double e = norm2 (p1);
  V r;
  r = p1 / e;
  return r;
}
int
main ()
{
  V f;
  f = 1, 2, 3;
  V r = foo (f);
  if (__builtin_fabs (r (0) - 0.267261) > 0.001
  || __builtin_fabs (r (1) - 0.534522) > 0.001
  || __builtin_fabs (r (2) - 0.801784) > 0.001)
__builtin_abort ();
}

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #9 from Jakub Jelinek  ---
Note get_inner_reference uses
  mode = TYPE_MODE (TREE_TYPE (exp));
  *punsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));

  if (mode == BLKmode)
size_tree = TYPE_SIZE (TREE_TYPE (exp));
  else
*pbitsize = GET_MODE_BITSIZE (mode);
and IMNSHO those two should be consistent in what they are doing.

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #8 from Jakub Jelinek  ---
Note, -fno-tree-sra fixes the issue, so I bet it is the get_ref_base_and_extent
calls in tree-sra.c that matter.

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

Jakub Jelinek  changed:

   What|Removed |Added

 CC||dj at gcc dot gnu.org,
   ||law at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
-fno-lifetime-dse -fno-devirtualize -fno-devirtualize-speculatively doesn't
help either.  The number of abstraction levels is very big, so it is hard to
spot where exactly things go wrong.
But, I've tracked the problem to the tree-dfa.c change of that commit only,
reverting:
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/tree-dfa.c?limit_changes=0&r1=214748&r2=214747&pathrev=214748
fixes it.  This makes a change on XprLiteral structure, which contains a single
long double member, therefore the structure has XFmode, which has precision of
80, but bitsize 128 (on x86_64, 96 on i?86).
Using the whole size of the access rather than only the precision looks like
the right thing to me though, I mean various targets don't lay the bits without
any gaps.
So, IMNSHO either that change should be reverted, or limited to partial int
modes?

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

--- Comment #6 from Jakub Jelinek  ---
--- pr69355.ii  2016-01-19 13:00:26.772400572 +0100
+++ pr69355-1.ii2016-01-19 15:54:26.790557524 +0100
@@ -30334,10 +30334,17 @@ namespace std __attribute__ ((__visibili
 # 4 "mytest.cpp"
 typedef tvmet::Vector Vector;

+__attribute__((noinline, noclone)) Vector
+foo (Vector v1)
+{
+  Vector r;
+  r = v1/norm2(v1);
+  return r;
+}
+
 int main(){
  Vector v1;
  v1 = 1,2,3;
- Vector r;
- r = v1/norm2(v1);
+ Vector r = foo(v1);
  std::cout << r;
 }

narrows it down a little bit further, happens also with -fno-strict-aliasing,
somewhere during early inlining the operator/ starts ignoring the passed in
long double rhs argument.  Looking further.

[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization

2016-01-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
  Known to work||4.9.3
   Target Milestone|--- |5.4
Summary|Wrong results with -O1  |[5/6 Regression] Wrong
   |optimization|results with -O1
   ||optimization
  Known to fail||5.3.0, 6.0