[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 Jeffrey A. Law changed: What|Removed |Added CC||law at redhat dot com --- Comment #16 from Jeffrey A. Law --- FWIW, I'm tracking an independent SRA issue that I might send Martin's way as well. I'm still trying to wrap my head around it and I've got a nice small little testcase now to help with that... But if I get stuck, I'll be looking for Martin's help :-)
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 Martin Jambor changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jamborm at gcc dot gnu.org --- Comment #15 from Martin Jambor --- (In reply to Jakub Jelinek from comment #13) > Martin, can you please take over the SRA part? I will.
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #14 from Jakub Jelinek --- Note the above dumps are with -O1 -g0 -fno-asynchronous-unwind-tables -fno-exceptions (what I've been using in the testsuite reduction).
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #13 from Jakub Jelinek --- So, without the tree-dfa.c change current trunk in esra transforms: G, J>, Sz> div(U&, double) [with int Sz = 3] (struct U & p1, double p2) { + const double * SR.27; + const double * SR.26; + const double * SR.25; + const double * SR.24; struct ConstReference D.3593; struct ConstReference D.3592; struct ConstReference D.3172; const struct J D.3173; struct expr_type D.3174; const double * _11; long double _12; : MEM[(struct &)&D.3173] ={v} {CLOBBER}; _12 = (long double) p2_2(D); D.3173.m_data = _12; _11 = &MEM[(const struct U *)p1_4(D)].m_data; - MEM[(struct K *)&D.3593] = _11; - D.3592 = D.3593; - D.3172 = D.3592; + SR.27_3 = _11; + SR.26_5 = SR.27_3; + SR.24_6 = SR.26_5; MEM[(struct &)&D.3174] ={v} {CLOBBER}; - D.3174.m_lhs = D.3172; + SR.25_23 = SR.24_6; MEM[(struct J *)&D.3174 + 16B] = D.3173; MEM[(struct &)&] ={v} {CLOBBER}; - .m_expr = D.3174; + MEM[(struct G *)&] = SR.25_23; D.3174 ={v} {CLOBBER}; - D.3172 ={v} {CLOBBER}; D.3173 ={v} {CLOBBER}; return ; } ConstReference (aka K<3>) contains a single const double *m_data member (thus it is likely a DImode struct), J contains a single long double m_data member, thus it is XFmode struct, and expr_type is I, which contains two fields, one ConstReference aka K<3> m_lhs and another one const J m_rhs. Scalarization of the DImode struct is of course just fine, and after scalarization the J store into D.3174 still remains: MEM[(struct J *)&D.3174 + 16B] = D.3173; but what is lost is the read of that or store of D.3173 into MEM[(struct G *)& + 16B]. Compared to this, with patched tree-dfa.c esra changes the same function: G, J>, Sz> div(U&, double) [with int Sz = 3] (struct U & p1, double p2) { + const double * SR.29; + const double * SR.28; + long double SR.27; + const double * SR.26; + long double SR.25; + const double * SR.24; struct ConstReference D.3593; struct ConstReference D.3592; struct ConstReference D.3172; const struct J D.3173; struct expr_type D.3174; const double * _11; long double _12; : - MEM[(struct &)&D.3173] ={v} {CLOBBER}; _12 = (long double) p2_2(D); - D.3173.m_data = _12; + SR.25_5 = _12; _11 = &MEM[(const struct U *)p1_4(D)].m_data; - MEM[(struct K *)&D.3593] = _11; - D.3592 = D.3593; - D.3172 = D.3592; - MEM[(struct &)&D.3174] ={v} {CLOBBER}; - D.3174.m_lhs = D.3172; - MEM[(struct J *)&D.3174 + 16B] = D.3173; + SR.29_6 = _11; + SR.28_7 = SR.29_6; + SR.24_23 = SR.28_7; + SR.26_26 = SR.24_23; + SR.27_27 = SR.25_5; MEM[(struct &)&] ={v} {CLOBBER}; - .m_expr = D.3174; - D.3174 ={v} {CLOBBER}; - D.3172 ={v} {CLOBBER}; - D.3173 ={v} {CLOBBER}; + MEM[(struct G *)&] = SR.26_26; + MEM[(struct G *)& + 16B] = SR.27_27; return ; } i.e. scalarizes also the long double stores and reads (thus, the proposed tree-dfa.c change would improve it too), but the bug really is that the long double assignment of (80 or 96 or 128 bits from D.3174 starting from offset 16 bytes into the struct into corresponding field is lost in vanilla trunk. So, I think we want to change back tree-dfa.c plus fix the SRA bug that causes this if tree-dfa.c is not changed. Martin, can you please take over the SRA part?
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #12 from Richard Biener --- (In reply to Jakub Jelinek from comment #11) > --- gcc/tree-dfa.c.jj 2016-01-04 14:55:50.0 +0100 > +++ gcc/tree-dfa.c2016-01-20 11:06:15.226682927 +0100 > @@ -395,7 +395,7 @@ get_ref_base_and_extent (tree exp, HOST_ >if (mode == BLKmode) > size_tree = TYPE_SIZE (TREE_TYPE (exp)); >else > - bitsize = int (GET_MODE_PRECISION (mode)); > + bitsize = int (GET_MODE_BITSIZE (mode)); > } >if (size_tree != NULL_TREE >&& TREE_CODE (size_tree) == INTEGER_CST) > > fixes that and DJ has tested it on msp430 without regressions. > As that is consistent with what e.g. get_inner_reference does, I'd prefer to > apply it this way, but I must say I still don't really understand, even on > the simplified testcase, what is going on during SRA there with the shorter > access sizes. > On the reduced testcase there are 7 calls of get_ref_base_and_extent where > mode here is XFmode (one with different precision from bitsize). > Created a replacement for D.3172 offset: 0, size: 64: SR.24 > -Created a replacement for D.3174 offset: 0, size: 64: SR.25 > -Created a replacement for D.3592 offset: 0, size: 64: SR.26 > -Created a replacement for D.3593 offset: 0, size: 64: SR.27 > +Created a replacement for D.3173 offset: 0, size: 128: SR.25 > +Created a replacement for D.3174 offset: 0, size: 64: SR.26 > +Created a replacement for D.3174 offset: 128, size: 128: SR.27 > +Created a replacement for D.3592 offset: 0, size: 64: SR.28 > +Created a replacement for D.3593 offset: 0, size: 64: SR.29 > is unpatched to patched gcc for the div function. Is SRA assuming the > access sizes are power of two and misbehaving if it gets access size 80 bits? No, it has various checks to verify % BITS_PER_UNIT is zero but no unchecked division by it. Even though the above suggests otherwise. It must be a more subtle bug.
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 Jakub Jelinek changed: What|Removed |Added CC||jamborm at gcc dot gnu.org --- Comment #11 from Jakub Jelinek --- --- gcc/tree-dfa.c.jj 2016-01-04 14:55:50.0 +0100 +++ gcc/tree-dfa.c 2016-01-20 11:06:15.226682927 +0100 @@ -395,7 +395,7 @@ get_ref_base_and_extent (tree exp, HOST_ if (mode == BLKmode) size_tree = TYPE_SIZE (TREE_TYPE (exp)); else - bitsize = int (GET_MODE_PRECISION (mode)); + bitsize = int (GET_MODE_BITSIZE (mode)); } if (size_tree != NULL_TREE && TREE_CODE (size_tree) == INTEGER_CST) fixes that and DJ has tested it on msp430 without regressions. As that is consistent with what e.g. get_inner_reference does, I'd prefer to apply it this way, but I must say I still don't really understand, even on the simplified testcase, what is going on during SRA there with the shorter access sizes. On the reduced testcase there are 7 calls of get_ref_base_and_extent where mode here is XFmode (one with different precision from bitsize). Created a replacement for D.3172 offset: 0, size: 64: SR.24 -Created a replacement for D.3174 offset: 0, size: 64: SR.25 -Created a replacement for D.3592 offset: 0, size: 64: SR.26 -Created a replacement for D.3593 offset: 0, size: 64: SR.27 +Created a replacement for D.3173 offset: 0, size: 128: SR.25 +Created a replacement for D.3174 offset: 0, size: 64: SR.26 +Created a replacement for D.3174 offset: 128, size: 128: SR.27 +Created a replacement for D.3592 offset: 0, size: 64: SR.28 +Created a replacement for D.3593 offset: 0, size: 64: SR.29 is unpatched to patched gcc for the div function. Is SRA assuming the access sizes are power of two and misbehaving if it gets access size 80 bits?
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #10 from Jakub Jelinek --- Reduced testcase: template struct A; template <> struct A<1> {}; template struct B { template struct C { typedef T *iterator; C (iterator p1) : m_iter (p1) {} void operator, (T p1) { *m_iter = p1; } iterator m_iter; }; typedef double *iterator; B (Obj &p1, double) : m_object (p1) {} C operator, (double); Obj &m_object; }; template typename B::template C B::operator, (double p1) { iterator a = m_object.data (), b = a + 1; *a = 1; *b = p1; return C(b + 1); } class D {}; inline double operator+(const double &p1, D) { return p1; } template class U; template struct F { enum { doIt = K < Sz - 1 ? 1 : 0 }; template static void assign (Dest &p1, Src &p2, Assign &p3) { p3.apply_on (p1 (K), p2 (K)); F::assign (p1, p2, p3); } template static double dot (Dest &p1, Src &p2) { return p1 (K) * p2 (K) + F::dot (p1, p2); } }; template <> struct F<0> { template static void assign (Dest &, Src &, Assign &) {} template static D dot (Dest &, Src &) { return D (); } }; template struct G { enum { ops_assign, use_meta }; G (const E &p1) : m_expr (p1) {} double operator()(int p1) const { return m_expr (p1); } template static void do_assign (A<1>, Dest &p2, Src &p3, Assign &p4) { F::assign (p2, p3, p4); } template void assign_to (Dest &p1, const Assign &p2) const { do_assign (A<1>(), p1, *this, p2); } E m_expr; }; struct H { static double apply_on (double p1, long double p2) { return p1 / p2; } static void apply_on (double & __restrict p1, double p2) { p1 = p2; } }; template struct I { I (const E1 &p1, const E2 &p2) : m_lhs (p1), m_rhs (p2) {} double operator()(int p1) const { double c = m_lhs (p1); return H::apply_on (c, m_rhs (0)); } E1 m_lhs; const E2 m_rhs; }; struct J { J (double p1) : m_data (p1) {} long double operator()(int) const { return m_data; } long double m_data; }; template struct K { K (const U &p1) : m_data (p1.data ()) {} double operator()(int p1) const { return m_data[p1]; } const double *m_data; }; template struct U { U () {} U (const U &p1) { *this = G(p1.const_ref ()); } B operator=(double) { return B(*this, 0); } double *data () { return m_data; } const double *data () const { return m_data; } double &operator()(int p1) { return m_data[p1]; } double operator()(int p1) const { return m_data[p1]; } typedef K ConstReference; ConstReference const_ref () const { return *this; } template void operator=(const G &p1) { p1.assign_to (*this, H ()); } double m_data[Sz]; }; template G, J>, Sz> div (U &p1, double p2) { typedef I, J> expr_type; return G(expr_type (p1.const_ref (), p2)); } template double norm2 (U &p1) { return __builtin_sqrt (F::dot (p1, p1)); } template G, J>, Sz> operator/(U &p1, double p2) { return div (p1, p2); } typedef U<3> V; V foo (V p1) { double e = norm2 (p1); V r; r = p1 / e; return r; } int main () { V f; f = 1, 2, 3; V r = foo (f); if (__builtin_fabs (r (0) - 0.267261) > 0.001 || __builtin_fabs (r (1) - 0.534522) > 0.001 || __builtin_fabs (r (2) - 0.801784) > 0.001) __builtin_abort (); }
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #9 from Jakub Jelinek --- Note get_inner_reference uses mode = TYPE_MODE (TREE_TYPE (exp)); *punsignedp = TYPE_UNSIGNED (TREE_TYPE (exp)); if (mode == BLKmode) size_tree = TYPE_SIZE (TREE_TYPE (exp)); else *pbitsize = GET_MODE_BITSIZE (mode); and IMNSHO those two should be consistent in what they are doing.
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #8 from Jakub Jelinek --- Note, -fno-tree-sra fixes the issue, so I bet it is the get_ref_base_and_extent calls in tree-sra.c that matter.
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 Jakub Jelinek changed: What|Removed |Added CC||dj at gcc dot gnu.org, ||law at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- -fno-lifetime-dse -fno-devirtualize -fno-devirtualize-speculatively doesn't help either. The number of abstraction levels is very big, so it is hard to spot where exactly things go wrong. But, I've tracked the problem to the tree-dfa.c change of that commit only, reverting: https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/tree-dfa.c?limit_changes=0&r1=214748&r2=214747&pathrev=214748 fixes it. This makes a change on XprLiteral structure, which contains a single long double member, therefore the structure has XFmode, which has precision of 80, but bitsize 128 (on x86_64, 96 on i?86). Using the whole size of the access rather than only the precision looks like the right thing to me though, I mean various targets don't lay the bits without any gaps. So, IMNSHO either that change should be reverted, or limited to partial int modes?
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 --- Comment #6 from Jakub Jelinek --- --- pr69355.ii 2016-01-19 13:00:26.772400572 +0100 +++ pr69355-1.ii2016-01-19 15:54:26.790557524 +0100 @@ -30334,10 +30334,17 @@ namespace std __attribute__ ((__visibili # 4 "mytest.cpp" typedef tvmet::Vector Vector; +__attribute__((noinline, noclone)) Vector +foo (Vector v1) +{ + Vector r; + r = v1/norm2(v1); + return r; +} + int main(){ Vector v1; v1 = 1,2,3; - Vector r; - r = v1/norm2(v1); + Vector r = foo(v1); std::cout << r; } narrows it down a little bit further, happens also with -fno-strict-aliasing, somewhere during early inlining the operator/ starts ignoring the passed in long double rhs argument. Looking further.
[Bug c++/69355] [5/6 Regression] Wrong results with -O1 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69355 Richard Biener changed: What|Removed |Added Target||x86_64-*-* Known to work||4.9.3 Target Milestone|--- |5.4 Summary|Wrong results with -O1 |[5/6 Regression] Wrong |optimization|results with -O1 ||optimization Known to fail||5.3.0, 6.0