Re: pr45605.C devirtualize call failure in ia64-hp-hpux?
On 08/07/2012 08:29 AM, Martin Jambor wrote:
> So I did the testing and unfortunately this only works for the first
> virtual function of a class, the sequence calling any other virtual
> function has one more statement which adds VMT offset to the typecasted
> pointer and after folding we end up calling stuff like MEM[f1+16B]()
> instead of f2() in g++.dg/template/ptrmem18.C.

Yes.  The ia64 vtable format has the function descriptors directly in the
table.  See TARGET_VTABLE_USES_DESCRIPTORS.  It is unique in this feature
so far (although if another new abi decides to use function descriptors at
all, I would recommend this as well).

Thus vtable[index] is the function pointer that is passed to the backend
for expansion in the call sequence.

r~
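To make the descriptor layout concrete, here is a minimal C++ sketch. It is a toy model only, not GCC or ia64 ABI code: FDesc, desc_vtable, ptr_vtable, slot_offset and the two call_* helpers are hypothetical names made up for illustration. It contrasts a table of plain function pointers, where the callee is loaded from the slot, with a table that holds the descriptors themselves, where the callee is just the address of the slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: every name here is hypothetical, none of it is GCC code.
using Fn = int (*)();

// A function descriptor: code entry point plus a global pointer value.
struct FDesc {
    Fn entry;
    std::uintptr_t gp;
};

int f1() { return 1; }
int f2() { return 2; }

// Pointer-style vtable: each slot holds the address of a function.
const Fn ptr_vtable[] = { f1, f2 };

// Descriptor-style vtable: the descriptors live directly in the table.
const FDesc desc_vtable[] = { { f1, 0 }, { f2, 0 } };

// Pointer-style dispatch: load the slot, then call the loaded value.
int call_ptr_style(const Fn *vptr, std::size_t index) {
    Fn target = vptr[index];  // the dereference a folder can look through
    return target();
}

// Descriptor-style dispatch: the "function pointer" is the address of the
// slot itself, so reaching slot N is pure pointer arithmetic on vptr.
int call_desc_style(const FDesc *vptr, std::size_t index) {
    const FDesc *callee = vptr + index;  // no load from the table here
    return callee->entry();              // stands in for the backend call sequence
}

// Byte offset of slot N in the descriptor-style table.
std::size_t slot_offset(std::size_t index) { return index * sizeof(FDesc); }
```

In this model the second slot is reached by adding slot_offset(1) to the table address, which corresponds to the extra offset statement mentioned in the quoted text. If the address of the first slot is replaced by f1 too early, that addition is applied to f1 rather than to the table.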
Re: pr45605.C devirtualize call failure in ia64-hp-hpux?
On Mon, Aug 6, 2012 at 8:21 PM, Martin Jambor mjam...@suse.cz wrote:
> Hi,
>
> I've had this flagged to look at later for quite long now...
>
> On Mon, Apr 30, 2012 at 07:34:24AM +, Mailaripillai, Kannan Jeganathan wrote:
>> Hi,
>>
>> This is related to pr45605.C test.
>>
>> Reduced testcase
>>
>> struct B {
>>   virtual void Run(){};
>> };
>> struct D : public B {
>>   virtual void Run() { };
>> };
>> int main() {
>>   D d;
>>   static_cast<B&>(d).Run();
>> }
>>
>> With x86_64 linux the call to Run through object d is devirtualized.
>> Whereas it looks like in ia64 hp-ux it is not devirtualized.
>>
>> -fdump-tree-fre1 output for both:
>>
>> x86_64 linux:
>>
>>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>>   d.D.2197._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>>   D.2248_1 = &MEM[(void *)&_ZTV1D + 16B];
>>   D.2249_2 = Run;
>>   D::Run (&d.D.2197);
>>   d ={v} {CLOBBER};
>>   return 0;
>>
>> ia64 hp-ux:
>>
>>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>>   d.D.1878._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>>   D.1929_1 = &MEM[(void *)&_ZTV1D + 16B];
>>   D.1930_2 = (int (*__vtbl_ptr_type) ()) &MEM[(void *)&_ZTV1D + 16B];
>>   OBJ_TYPE_REF(D.1930_2;&d.D.1878->0) (&d.D.1878);
>>
>> Is it a bug (unexpected with O1 compilation) that it is not optimized to
>> direct call?
>
> There are two important and related differences.  The first one is
> that virtual method tables on ia64 consist of FDESC_EXPRs rather than
> mere ADDR_EXPRs.  The second one can be seen in the dumps just before
> fre1 (i.e. esra):
>
> i686:
>
>   d.D.1854._vptr.B = &MEM[(void *)&_ZTV1D + 8B];
>   D.1961_4 = d.D.1854._vptr.B;
>   D.1962_5 = *D.1961_4;
>   OBJ_TYPE_REF(D.1962_5;&d.D.1854->0) (&d.D.1854);
>
> ia64:
>
>   d.D.1883._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>   D.1991_4 = d.D.1883._vptr.B;
>   D.1992_5 = (int (*__vtbl_ptr_type) ()) D.1991_4;
>   OBJ_TYPE_REF(D.1992_5;&d.D.1883->0) (&d.D.1883);
>
> The main difference is not the type cast in the third assignment but
> the fact that there is no dereference there, which means that gimple
> folder has to deal with it at a different place.
>
> I played with it a bit this afternoon and came up with the following
> untested patch to fix the pr45605.C testcase.  I can bootstrap and
> test it on ia64 if we do not mind this special casing of FDESC_EXPRs
> in the middle end (I hope that all platforms that use it use it in the
> same way, I only know ia64...)
>
> Thanks,
>
> Martin
>
>
> 2012-08-06  Martin Jambor  mjam...@suse.cz
>
>         * gimple-fold.c (gimple_fold_stmt_to_constant_1): Also fold
>         assignments of V_C_Es of addresses of FDESC_EXPRs.
>
> *** gcc/gimple-fold.c   Mon Aug  6 14:36:37 2012
> --- /tmp/FcpIKb_gimple-fold.c   Mon Aug  6 20:17:26 2012
> *** gimple_fold_stmt_to_constant_1 (gimple s
> *** 2542,2548
>                          == TYPE_ADDR_SPACE (TREE_TYPE (op0))
>                       && TYPE_MODE (TREE_TYPE (lhs))
>                          == TYPE_MODE (TREE_TYPE (op0)))
> !                   return op0;
>
>                   return
>                     fold_unary_ignore_overflow_loc (loc, subcode,
> --- 2542,2556
>                          == TYPE_ADDR_SPACE (TREE_TYPE (op0))
>                       && TYPE_MODE (TREE_TYPE (lhs))
>                          == TYPE_MODE (TREE_TYPE (op0)))
> !                   {
> !                     tree t;
> !                     if (TREE_CODE (op0) != ADDR_EXPR)
> !                       return op0;
> !                     t = fold_const_aggregate_ref_1 (TREE_OPERAND (op0, 0), valueize);
> !                     if (t && TREE_CODE (t) == FDESC_EXPR)
> !                       return build_fold_addr_expr_loc (loc, TREE_OPERAND (t, 0));

FDESC_EXPR has two operands ... is it really ok to ignore the 2nd?

/* Operand0 is a function constant; result is part N of a function
   descriptor of type ptr_mode.  */
DEFTREECODE (FDESC_EXPR, fdesc_expr, tcc_expression, 2)

I suppose yes, from what I see in the uses in the C++ frontend.  In fact
_all_ users of FDESC_EXPR seem to ignore the 2nd operand ...!?  (I would
have expected users in machine specific code ...)

Thus, Ok!

Thanks,
Richard.

> !                     return op0;
> !                   }
>
>                   return
>                     fold_unary_ignore_overflow_loc (loc, subcode,
Re: pr45605.C devirtualize call failure in ia64-hp-hpux?
Hi,

On Tue, Aug 07, 2012 at 03:14:21PM +0200, Richard Guenther wrote:
> On Mon, Aug 6, 2012 at 8:21 PM, Martin Jambor mjam...@suse.cz wrote:
>> I've had this flagged to look at later for quite long now...
>>
>> On Mon, Apr 30, 2012 at 07:34:24AM +, Mailaripillai, Kannan Jeganathan wrote:
>>> Hi,
>>>
>>> This is related to pr45605.C test.
>>>
>>> Reduced testcase
>>>
>>> struct B {
>>>   virtual void Run(){};
>>> };
>>> struct D : public B {
>>>   virtual void Run() { };
>>> };
>>> int main() {
>>>   D d;
>>>   static_cast<B&>(d).Run();
>>> }
>>>
>>> With x86_64 linux the call to Run through object d is devirtualized.
>>> Whereas it looks like in ia64 hp-ux it is not devirtualized.
>>>
>>> -fdump-tree-fre1 output for both:
>>>
>>> x86_64 linux:
>>>
>>>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>>>   d.D.2197._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>>>   D.2248_1 = &MEM[(void *)&_ZTV1D + 16B];
>>>   D.2249_2 = Run;
>>>   D::Run (&d.D.2197);
>>>   d ={v} {CLOBBER};
>>>   return 0;
>>>
>>> ia64 hp-ux:
>>>
>>>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>>>   d.D.1878._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>>>   D.1929_1 = &MEM[(void *)&_ZTV1D + 16B];
>>>   D.1930_2 = (int (*__vtbl_ptr_type) ()) &MEM[(void *)&_ZTV1D + 16B];
>>>   OBJ_TYPE_REF(D.1930_2;&d.D.1878->0) (&d.D.1878);
>>>
>>> Is it a bug (unexpected with O1 compilation) that it is not optimized to
>>> direct call?
>>
>> There are two important and related differences.  The first one is
>> that virtual method tables on ia64 consist of FDESC_EXPRs rather than
>> mere ADDR_EXPRs.  The second one can be seen in the dumps just before
>> fre1 (i.e. esra):
>>
>> i686:
>>
>>   d.D.1854._vptr.B = &MEM[(void *)&_ZTV1D + 8B];
>>   D.1961_4 = d.D.1854._vptr.B;
>>   D.1962_5 = *D.1961_4;
>>   OBJ_TYPE_REF(D.1962_5;&d.D.1854->0) (&d.D.1854);
>>
>> ia64:
>>
>>   d.D.1883._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>>   D.1991_4 = d.D.1883._vptr.B;
>>   D.1992_5 = (int (*__vtbl_ptr_type) ()) D.1991_4;
>>   OBJ_TYPE_REF(D.1992_5;&d.D.1883->0) (&d.D.1883);
>>
>> The main difference is not the type cast in the third assignment but
>> the fact that there is no dereference there, which means that gimple
>> folder has to deal with it at a different place.
>>
>> I played with it a bit this afternoon and came up with the following
>> untested patch to fix the pr45605.C testcase.  I can bootstrap and
>> test it on ia64 if we do not mind this special casing of FDESC_EXPRs
>> in the middle end (I hope that all platforms that use it use it in the
>> same way, I only know ia64...)
>>
>> Thanks,
>>
>> Martin
>>
>>
>> 2012-08-06  Martin Jambor  mjam...@suse.cz
>>
>>         * gimple-fold.c (gimple_fold_stmt_to_constant_1): Also fold
>>         assignments of V_C_Es of addresses of FDESC_EXPRs.
>>
>> *** gcc/gimple-fold.c   Mon Aug  6 14:36:37 2012
>> --- /tmp/FcpIKb_gimple-fold.c   Mon Aug  6 20:17:26 2012
>> *** gimple_fold_stmt_to_constant_1 (gimple s
>> *** 2542,2548
>>                          == TYPE_ADDR_SPACE (TREE_TYPE (op0))
>>                       && TYPE_MODE (TREE_TYPE (lhs))
>>                          == TYPE_MODE (TREE_TYPE (op0)))
>> !                   return op0;
>>
>>                   return
>>                     fold_unary_ignore_overflow_loc (loc, subcode,
>> --- 2542,2556
>>                          == TYPE_ADDR_SPACE (TREE_TYPE (op0))
>>                       && TYPE_MODE (TREE_TYPE (lhs))
>>                          == TYPE_MODE (TREE_TYPE (op0)))
>> !                   {
>> !                     tree t;
>> !                     if (TREE_CODE (op0) != ADDR_EXPR)
>> !                       return op0;
>> !                     t = fold_const_aggregate_ref_1 (TREE_OPERAND (op0, 0), valueize);
>> !                     if (t && TREE_CODE (t) == FDESC_EXPR)
>> !                       return build_fold_addr_expr_loc (loc, TREE_OPERAND (t, 0));
>
> FDESC_EXPR has two operands ... is it really ok to ignore the 2nd?
>
> /* Operand0 is a function constant; result is part N of a function
>    descriptor of type ptr_mode.  */
> DEFTREECODE (FDESC_EXPR, fdesc_expr, tcc_expression, 2)
>
> I suppose yes, from what I see in the uses in the C++ frontend.  In fact
> _all_ users of FDESC_EXPR seem to ignore the 2nd operand ...!?  (I would
> have expected users in machine specific code ...)
>
> Thus, Ok!

So I did the testing and unfortunately this only works for the first
virtual function of a class, the sequence calling any other virtual
function has one more statement which adds VMT offset to the typecasted
pointer and after folding we end up calling stuff like MEM[f1+16B]()
instead of f2() in g++.dg/template/ptrmem18.C.

I'll have another look at this when I have some spare time, I think we'll
need to look for these cases from the call statement upwards.

BTW, an interesting thing about this testcase is that neither on ia64
nor on i686 are the virtual calls enclosed in an OBJ_TYPE_REF...

Thanks anyway,

Martin
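For anyone who wants to look at the non-first-slot case in isolation, the following is a hypothetical two-function variant of the reduced testcase. It is only a sketch: it is not ptrmem18.C, and the names f1, f2, call_first and call_second are invented here. The return values exist only so the behaviour can be checked at run time.

```cpp
#include <cassert>

// Hypothetical variant of the reduced testcase with two virtual functions,
// so that the second call has to reach a vtable slot other than the first.
struct B {
    virtual int f1() { return 1; }
    virtual int f2() { return 2; }
};

struct D : public B {
    virtual int f1() { return 11; }
    virtual int f2() { return 12; }
};

// Calls the first virtual function through a B reference to a D object.
int call_first() {
    D d;
    return static_cast<B&>(d).f1();
}

// Calls the second virtual function the same way; this is the shape of
// call that needs the extra offset on a descriptor-based vtable.
int call_second() {
    D d;
    return static_cast<B&>(d).f2();
}
```

Whatever the optimizers do, call_first must reach D::f1 and call_second must reach D::f2. Compiling with -O1 and the -fdump-tree-fre1 option used earlier in the thread should show whether each call was turned into a direct one on a given target.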
Re: pr45605.C devirtualize call failure in ia64-hp-hpux?
Hi,

I've had this flagged to look at later for quite long now...

On Mon, Apr 30, 2012 at 07:34:24AM +, Mailaripillai, Kannan Jeganathan wrote:
> Hi,
>
> This is related to pr45605.C test.
>
> Reduced testcase
>
> struct B {
>   virtual void Run(){};
> };
> struct D : public B {
>   virtual void Run() { };
> };
> int main() {
>   D d;
>   static_cast<B&>(d).Run();
> }
>
> With x86_64 linux the call to Run through object d is devirtualized.
> Whereas it looks like in ia64 hp-ux it is not devirtualized.
>
> -fdump-tree-fre1 output for both:
>
> x86_64 linux:
>
>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>   d.D.2197._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>   D.2248_1 = &MEM[(void *)&_ZTV1D + 16B];
>   D.2249_2 = Run;
>   D::Run (&d.D.2197);
>   d ={v} {CLOBBER};
>   return 0;
>
> ia64 hp-ux:
>
>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>   d.D.1878._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>   D.1929_1 = &MEM[(void *)&_ZTV1D + 16B];
>   D.1930_2 = (int (*__vtbl_ptr_type) ()) &MEM[(void *)&_ZTV1D + 16B];
>   OBJ_TYPE_REF(D.1930_2;&d.D.1878->0) (&d.D.1878);
>
> Is it a bug (unexpected with O1 compilation) that it is not optimized to
> direct call?

There are two important and related differences.  The first one is
that virtual method tables on ia64 consist of FDESC_EXPRs rather than
mere ADDR_EXPRs.  The second one can be seen in the dumps just before
fre1 (i.e. esra):

i686:

  d.D.1854._vptr.B = &MEM[(void *)&_ZTV1D + 8B];
  D.1961_4 = d.D.1854._vptr.B;
  D.1962_5 = *D.1961_4;
  OBJ_TYPE_REF(D.1962_5;&d.D.1854->0) (&d.D.1854);

ia64:

  d.D.1883._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
  D.1991_4 = d.D.1883._vptr.B;
  D.1992_5 = (int (*__vtbl_ptr_type) ()) D.1991_4;
  OBJ_TYPE_REF(D.1992_5;&d.D.1883->0) (&d.D.1883);

The main difference is not the type cast in the third assignment but
the fact that there is no dereference there, which means that gimple
folder has to deal with it at a different place.

I played with it a bit this afternoon and came up with the following
untested patch to fix the pr45605.C testcase.  I can bootstrap and
test it on ia64 if we do not mind this special casing of FDESC_EXPRs
in the middle end (I hope that all platforms that use it use it in the
same way, I only know ia64...)

Thanks,

Martin


2012-08-06  Martin Jambor  mjam...@suse.cz

        * gimple-fold.c (gimple_fold_stmt_to_constant_1): Also fold
        assignments of V_C_Es of addresses of FDESC_EXPRs.

*** gcc/gimple-fold.c   Mon Aug  6 14:36:37 2012
--- /tmp/FcpIKb_gimple-fold.c   Mon Aug  6 20:17:26 2012
*** gimple_fold_stmt_to_constant_1 (gimple s
*** 2542,2548
                         == TYPE_ADDR_SPACE (TREE_TYPE (op0))
                      && TYPE_MODE (TREE_TYPE (lhs))
                         == TYPE_MODE (TREE_TYPE (op0)))
!                   return op0;

                  return
                    fold_unary_ignore_overflow_loc (loc, subcode,
--- 2542,2556
                         == TYPE_ADDR_SPACE (TREE_TYPE (op0))
                      && TYPE_MODE (TREE_TYPE (lhs))
                         == TYPE_MODE (TREE_TYPE (op0)))
!                   {
!                     tree t;
!                     if (TREE_CODE (op0) != ADDR_EXPR)
!                       return op0;
!                     t = fold_const_aggregate_ref_1 (TREE_OPERAND (op0, 0), valueize);
!                     if (t && TREE_CODE (t) == FDESC_EXPR)
!                       return build_fold_addr_expr_loc (loc, TREE_OPERAND (t, 0));
!                     return op0;
!                   }

                  return
                    fold_unary_ignore_overflow_loc (loc, subcode,
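As a rough illustration of what the new branch is meant to do, here is a toy C++ model. None of it is GCC code: Fdesc, VtableInit, sample_vtable and fold_slot_address are hypothetical stand-ins, and the slot offsets 16 and 32 are assumptions chosen to resemble the dumps. The idea sketched is: given the address of a slot in a constant vtable, look up the initializer stored at that slot and, if it is a descriptor, use the function it names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy stand-in for FDESC_EXPR: a function plus a "part" index.
// Like the patch, the fold below looks only at the function.
struct Fdesc {
    std::string function;
    int part;
};

// Toy stand-in for a constant vtable initializer, keyed by byte offset.
typedef std::map<std::size_t, Fdesc> VtableInit;

// Hypothetical table: the offsets are assumed, not taken from a real ABI dump.
VtableInit sample_vtable() {
    VtableInit vt;
    vt[16] = Fdesc{"D::f1", 0};
    vt[32] = Fdesc{"D::f2", 0};
    return vt;
}

// Models the patched branch: look through the address of a vtable slot to
// the initializer stored there. Returns the function name when a descriptor
// is found, or an empty string to mean "leave the operand unchanged".
std::string fold_slot_address(const VtableInit &vt, std::size_t offset) {
    VtableInit::const_iterator it = vt.find(offset);
    if (it == vt.end())
        return std::string();
    return it->second.function;
}
```

The model only folds when the offset is already part of the address being looked up. An offset that is added by a later statement, after the fold has already run, is outside what this lookup sees.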
Re: pr45605.C devirtualize call failure in ia64-hp-hpux?
> With x86_64 linux the call to Run through object d is devirtualized.
> Whereas it looks like in ia64 hp-ux it is not devirtualized.
>
> -fdump-tree-fre1 output for both:
>
> x86_64 linux:
>
>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>   d.D.2197._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>   D.2248_1 = &MEM[(void *)&_ZTV1D + 16B];
>   D.2249_2 = Run;
>   D::Run (&d.D.2197);
>   d ={v} {CLOBBER};
>   return 0;
>
> ia64 hp-ux:
>
>   MEM[(struct B *)&d]._vptr.B = &MEM[(void *)&_ZTV1B + 16B];
>   d.D.1878._vptr.B = &MEM[(void *)&_ZTV1D + 16B];
>   D.1929_1 = &MEM[(void *)&_ZTV1D + 16B];
>   D.1930_2 = (int (*__vtbl_ptr_type) ()) &MEM[(void *)&_ZTV1D + 16B];
>   OBJ_TYPE_REF(D.1930_2;&d.D.1878->0) (&d.D.1878);
>
> Is it a bug (unexpected with O1 compilation) that it is not optimized to
> direct call?

I'd think so, IA-64 uses a specific construct for its vtables because of
the ABI requirements.

--
Eric Botcazou