Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
On Wed, 17 Feb 2016, Richard Biener wrote:
> On Wed, 17 Feb 2016, Bernd Schmidt wrote:
> > On 02/17/2016 02:18 PM, Daniel Gutson wrote:
> > > On Wed, Nov 26, 2014 at 5:46 AM, Andrew Pinski wrote:
> > > > FYI. This causes gfc_add_interface_mapping in fortran/trans-expr.c to
> > > > be miscompiled for aarch64-linux-gnu. I am still debugging it and
> > > > trying to get a smaller testcase.
> > >
> > > Hello,
> > >
> > > is there any update on this?
> >
> > Is this a PR somewhere?
>
> I think there are several. But rather than a special pass I hope
> we can get to lowering all bitfield accesses somewhere and make
> our regular passes deal with the combining. I've had multiple
> approaches at this but never went through finalizing them
> (tried doing that too early all the times I guess).

Whee. 2011 - https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01233.html.

I remember updating this for DECL_BIT_FIELD_REPRESENTATIVE we have now,
simplifying this. I also remember doing the lowering (using
DECL_BIT_FIELD_REPRESENTATIVE) at gimplification time. And then bitfield
lowering was part of the original mem-ref branch (that didn't get merged).

Richard.
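To make "lowering all bitfield accesses" a little more concrete: a bit-field read or write can be written as ordinary shift-and-mask arithmetic on the containing word, at which point it is plain integer code that generic passes can work on. The following is only a minimal sketch in plain C. The names low_mask, bf_read and bf_write are hypothetical helpers, not GCC internals, and the 32-bit word is an assumption standing in for whatever the bit-field representative would be.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only.  A field is described by
   its bit offset OFF and its WIDTH inside a 32-bit containing word.
   Callers must keep WIDTH in 1..32 and OFF + WIDTH <= 32.  */

/* Return a mask with the WIDTH low bits set.  */
static uint32_t
low_mask (unsigned width)
{
  return width >= 32 ? UINT32_MAX : ((UINT32_C (1) << width) - 1);
}

/* Lowered read: take the whole word, shift the field down, mask it.  */
static uint32_t
bf_read (uint32_t word, unsigned off, unsigned width)
{
  return (word >> off) & low_mask (width);
}

/* Lowered write: clear the field's bits in WORD, then OR in VAL.
   Bits of VAL that do not fit in the field are discarded.  */
static uint32_t
bf_write (uint32_t word, unsigned off, unsigned width, uint32_t val)
{
  uint32_t mask = low_mask (width) << off;
  return (word & ~mask) | ((val << off) & mask);
}
```

Once accesses look like this, two writes to neighbouring fields of the same word are just two mask/or sequences on one value, which is the kind of thing a generic combining pass can fold.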
Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
On Wed, 17 Feb 2016, Bernd Schmidt wrote:
> On 02/17/2016 02:18 PM, Daniel Gutson wrote:
> > On Wed, Nov 26, 2014 at 5:46 AM, Andrew Pinski wrote:
> > > FYI. This causes gfc_add_interface_mapping in fortran/trans-expr.c to
> > > be miscompiled for aarch64-linux-gnu. I am still debugging it and
> > > trying to get a smaller testcase.
> >
> > Hello,
> >
> > is there any update on this?
>
> Is this a PR somewhere?

I think there are several. But rather than a special pass I hope
we can get to lowering all bitfield accesses somewhere and make
our regular passes deal with the combining. I've had multiple
approaches at this but never went through finalizing them
(tried doing that too early all the times I guess).

Richard.
Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
On Wed, Feb 17, 2016 at 02:45:16PM +0100, Bernd Schmidt wrote:
> On 02/17/2016 02:18 PM, Daniel Gutson wrote:
> >On Wed, Nov 26, 2014 at 5:46 AM, Andrew Pinski wrote:
> >>FYI. This causes gfc_add_interface_mapping in fortran/trans-expr.c to
> >>be miscompiled for aarch64-linux-gnu. I am still debugging it and
> >>trying to get a smaller testcase.
> >
> >Hello,
> >
> >is there any update on this?
>
> Is this a PR somewhere?

Perhaps related to PR22141?

Jakub
Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
On 02/17/2016 02:18 PM, Daniel Gutson wrote:
> On Wed, Nov 26, 2014 at 5:46 AM, Andrew Pinski wrote:
> > FYI. This causes gfc_add_interface_mapping in fortran/trans-expr.c to
> > be miscompiled for aarch64-linux-gnu. I am still debugging it and
> > trying to get a smaller testcase.
>
> Hello,
>
> is there any update on this?

Is this a PR somewhere?

Bernd
Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
>> +              && bf_access_candidate_p (gimple_assign_rhs1
>> +                                        (cc->head_copy->load_stmt),
>> +                                        &mrg_off))
>> +            off = mrg_off;
>> +          if (cc && cc->merged)
>> +            {
>> +              tree head_rhs = gimple_assign_rhs1 (cc->head_copy->load_stmt);
>> +              switch (TREE_CODE (head_rhs))
>> +                {
>> +                case COMPONENT_REF:
>> +                  if (bf_access_candidate_p (head_rhs, &mrg_off))
>> +                    off = mrg_off;
>> +                  break;
>> +                case BIT_FIELD_REF:
>> +                  off = tree_to_uhwi (TREE_OPERAND (head_rhs, 2));
>> +                  break;
>> +                default:
>> +                  break;
>> +                }
>> +            }
>> +
>> +          if (cc && (cc->modified))
>> +            {
>> +              tree tmp_ssa;
>> +              tree itype = make_node (INTEGER_TYPE);
>> +              TYPE_PRECISION (itype) = TREE_INT_CST_LOW (size);
>> +              fixup_unsigned_type (itype);
>> +              lower_bitfield_read (&gsi, off, size, itype);
>> +              tmp_ssa =
>> +                make_ssa_name (create_tmp_var (itype, NULL), cc->load_stmt);
>> +              gimple_assign_set_lhs (cc->load_stmt, tmp_ssa);
>> +              update_stmt (cc->load_stmt);
>> +              gimple_assign_set_rhs1 (cc->store_stmt, tmp_ssa);
>> +              update_stmt (cc->store_stmt);
>> +            }
>> +          else if (cc && cc->merged)
>> +            {
>> +              gsi_remove (&gsi, true);
>> +              deleted = true;
>> +            }
>> +        }
>> +      /* Lower a bit-field write.  */
>> +      ref = gimple_assign_lhs (stmt);
>> +      if (bf_access_candidate_p (ref, &off))
>> +        {
>> +          bfcopy *cc = NULL;
>> +          bitfield_stmt_bfcopy_pair st_cpy (stmt, NULL);
>> +          bitfield_stmt_bfcopy_pair *p_st_cpy;
>> +          unsigned HOST_WIDE_INT mrg_off;
>> +          p_st_cpy = bf_stmnt_cpy->find (&st_cpy);
>> +          if (p_st_cpy)
>> +            cc = p_st_cpy->copy;
>> +
>> +          if (cc && (cc->merged || cc->modified))
>> +            size =
>> +              build_int_cst (unsigned_type_node,
>> +                             get_merged_bit_field_size (cc->head_copy ?
>> +                                                        cc->head_copy :
>> +                                                        cc));
>> +          else
>> +            size = DECL_SIZE (TREE_OPERAND (ref, 1));
>> +          if (cc && cc->merged
>> +              &&
>> +              bf_access_candidate_p (gimple_assign_lhs
>> +                                     (cc->head_copy->store_stmt), &mrg_off))
>> +            off = mrg_off;
>> +
>> +          if (cc && (cc->modified) && !(cc && cc->merged))
>> +            lower_bitfield_write (&gsi, off, size);
>> +          else if (cc && cc->merged)
>> +            {
>> +              if (gimple_vdef (stmt))
>
Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
);
+                    }
+                }
+
+              gsi_remove (&gsi, true);
+              deleted = true;
+            }
+        }
+      if (gimple_vdef (stmt) && !deleted)
+        reaching_vuse = gimple_vdef (stmt);
+      if (!deleted)
+        gsi_next (&gsi);
+    }
+    }
+  delete bf_stmnt_cpy;
+}
+
 /* Perform early intraprocedural SRA.  */
 static unsigned int
 early_intra_sra (void)
 {
   sra_mode = SRA_MODE_EARLY_INTRA;
-  return perform_intra_sra ();
+  unsigned int res = perform_intra_sra ();
+  if (flag_tree_bitfield_merge)
+    lower_bitfields ();
+  return res;
 }

 /* Perform late intraprocedural SRA.  */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 44656ea..cdb3850 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4410,29 +4410,6 @@ get_next_value_id (void)
   return next_value_id++;
 }

-
-/* Compare two expressions E1 and E2 and return true if they are equal.  */
-
-bool
-expressions_equal_p (tree e1, tree e2)
-{
-  /* The obvious case.  */
-  if (e1 == e2)
-    return true;
-
-  /* If only one of them is null, they cannot be equal.  */
-  if (!e1 || !e2)
-    return false;
-
-  /* Now perform the actual comparison.  */
-  if (TREE_CODE (e1) == TREE_CODE (e2)
-      && operand_equal_p (e1, e2, OEP_PURE_SAME))
-    return true;
-
-  return false;
-}
-
-
 /* Return true if the nary operation NARY may trap.  This is a copy
    of stmt_could_throw_1_p adjusted to the SCCVN IL.  */

diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index ad99604..d5963b6 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -21,10 +21,6 @@
 #ifndef TREE_SSA_SCCVN_H
 #define TREE_SSA_SCCVN_H

-/* In tree-ssa-sccvn.c  */
-bool expressions_equal_p (tree, tree);
-
-
 /* TOP of the VN lattice.  */
 extern tree VN_TOP;

diff --git a/gcc/tree.c b/gcc/tree.c
index 365e89c..8df9812 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12286,4 +12286,44 @@ get_base_address (tree t)
   return t;
 }

+/* Compare two expressions E1 and E2 and return true if they are equal.  */
+
+bool
+expressions_equal_p (const_tree e1, const_tree e2)
+{
+  /* The obvious case.  */
+  if (e1 == e2)
+    return true;
+
+  /* If only one of them is null, they cannot be equal.  */
+  if (!e1 || !e2)
+    return false;
+
+  /* Now perform the actual comparison.  */
+  if (TREE_CODE (e1) == TREE_CODE (e2)
+      && operand_equal_p (e1, e2, OEP_PURE_SAME))
+    return true;
+
+  return false;
+}
+
+/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
+   node, return the size in bits for the type if it is a constant, or else
+   return the alignment for the type if the type's size is not constant, or
+   else return BITS_PER_WORD if the type actually turns out to be an
+   ERROR_MARK node.  */
+
+unsigned HOST_WIDE_INT
+simple_type_size_in_bits (const_tree type)
+{
+  if (TREE_CODE (type) == ERROR_MARK)
+    return BITS_PER_WORD;
+  else if (TYPE_SIZE (type) == NULL_TREE)
+    return 0;
+  else if (tree_fits_uhwi_p (TYPE_SIZE (type)))
+    return tree_to_uhwi (TYPE_SIZE (type));
+  else
+    return TYPE_ALIGN (type);
+}
+
 #include "gt-tree.h"
diff --git a/gcc/tree.h b/gcc/tree.h
index 45f127f..b903089 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4068,6 +4068,7 @@ extern tree substitute_placeholder_in_expr (tree, tree);
   ((EXP) == 0 || TREE_CONSTANT (EXP) ? (EXP) \
    : substitute_placeholder_in_expr (EXP, OBJ))

+extern unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree type);

 /* stabilize_reference (EXP) returns a reference equivalent to EXP
    but it can be used multiple times
@@ -4184,6 +4185,11 @@ inlined_function_outer_scope_p (const_tree block)
        (TREE = function_args_iter_cond (&(ITER))) != NULL_TREE; \
        function_args_iter_next (&(ITER)))

+
+/* In dwarf2out.c.  */
+HOST_WIDE_INT
+field_byte_offset (const_tree decl);
+
 /* In tree.c */
 extern unsigned crc32_string (unsigned, const char *);
 extern unsigned crc32_byte (unsigned, char);
@@ -4340,6 +4346,7 @@ extern tree obj_type_ref_class (tree ref);
 extern bool types_same_for_odr (const_tree type1, const_tree type2);
 extern bool contains_bitfld_component_ref_p (const_tree);
 extern bool type_in_anonymous_namespace_p (const_tree);
+extern bool expressions_equal_p (const_tree e1, const_tree e2);
 extern bool block_may_fallthru (const_tree);
 extern void using_eh_for_cleanups (void);
 extern bool using_eh_for_cleanups_p (void);

Regards,
Zoran

From: Andrew Pinski [pins...@gmail.com]
Sent: Thursday, October 16, 2014 4:09 AM
To: Zoran Jovanovic
Cc: gcc-patches@gcc.gnu.org; Richard Biener; Bernhard Reutner-Fischer
Subject: Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)

On Tue, Apr 22, 2014 at 3:28 AM, Zoran Jovanovic <zoran.jovano...@imgtec.com> wrote:
> Hello,
> Updated doc/invoke.texi by stating that new option is enabled by default at -O2 and higher
RE: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
Regards,
Zoran

From: Andrew Pinski [pins...@gmail.com]
Sent: Thursday, October 16, 2014 4:09 AM
To: Zoran Jovanovic
Cc: gcc-patches@gcc.gnu.org; Richard Biener; Bernhard Reutner-Fischer
Subject: Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)

On Tue, Apr 22, 2014 at 3:28 AM, Zoran Jovanovic <zoran.jovano...@imgtec.com> wrote:
> Hello,
> Updated doc/invoke.texi by stating that new option is enabled by default at -O2 and higher.
> Also, -fmerge-bitfields added to the list of optimization flags enabled by default at -O2 and higher.

With this patch applied gcc from SPEC 2006 ICEs on aarch64-linux-gnu.
Here is a reduced testcase:

typedef struct rtx_def *rtx;
struct rtx_def {
  unsigned int unchanging : 1;
  unsigned int volatil : 1;
  unsigned int in_struct : 1;
  unsigned integrated : 1;
  unsigned frame_related : 1;
};
ix86_set_move_mem_attrs_1 (rtx x, rtx dstref)
{
  ((x)->volatil) = ((dstref)->volatil);
  ((x)->in_struct) = ((dstref)->in_struct);
  ((x)->frame_related) = ((dstref)->frame_related);
  ((x)->unchanging) = ((dstref)->unchanging);
}
--- CUT ---

Thanks,
Andrew Pinski
Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
On Tue, Apr 22, 2014 at 3:28 AM, Zoran Jovanovic <zoran.jovano...@imgtec.com> wrote:
> Hello,
> Updated doc/invoke.texi by stating that new option is enabled by default at -O2 and higher.
> Also, -fmerge-bitfields added to the list of optimization flags enabled by default at -O2 and higher.

With this patch applied gcc from SPEC 2006 ICEs on aarch64-linux-gnu.
Here is a reduced testcase:

typedef struct rtx_def *rtx;
struct rtx_def {
  unsigned int unchanging : 1;
  unsigned int volatil : 1;
  unsigned int in_struct : 1;
  unsigned integrated : 1;
  unsigned frame_related : 1;
};
ix86_set_move_mem_attrs_1 (rtx x, rtx dstref)
{
  ((x)->volatil) = ((dstref)->volatil);
  ((x)->in_struct) = ((dstref)->in_struct);
  ((x)->frame_related) = ((dstref)->frame_related);
  ((x)->unchanging) = ((dstref)->unchanging);
}
--- CUT ---

Thanks,
Andrew Pinski

> Regards,
> Zoran Jovanovic
>
> --
>
> Lowering is applied only for bit-fields copy sequences that are merged.
> Data structure representing bit-field copy sequences is renamed and reduced in size.
> Optimization turned on by default for -O2 and higher.
> Some comments fixed.
>
> Benchmarking performed on WebKit for Android.
> Code size reduction noticed on several files, best examples are:
>
> core/rendering/style/StyleMultiColData (632->520 bytes)
> core/platform/graphics/FontDescription (1715->1475 bytes)
> core/rendering/style/FillLayer (5069->4513 bytes)
> core/rendering/style/StyleRareInheritedData (5618->5346)
> core/css/CSSSelectorList(4047->3887)
> core/platform/animation/CSSAnimationData (3844->3440 bytes)
> core/css/resolver/FontBuilder (13818->13350 bytes)
> core/platform/graphics/Font (16447->15975 bytes)
>
> Example:
>
> One of the motivating examples for this work was copy constructor of the class which contains bit-fields.
>
> C++ code:
> class A
> {
> public:
>   A(const A &x);
>   unsigned a : 1;
>   unsigned b : 2;
>   unsigned c : 4;
> };
>
> A::A(const A &x)
> {
>   a = x.a;
>   b = x.b;
>   c = x.c;
> }
>
> GIMPLE code without optimization:
>
>   <bb 2>:
>   _3 = x_2(D)->a;
>   this_4(D)->a = _3;
>   _6 = x_2(D)->b;
>   this_4(D)->b = _6;
>   _8 = x_2(D)->c;
>   this_4(D)->c = _8;
>   return;
>
> Optimized GIMPLE code:
>
>   <bb 2>:
>   _10 = x_2(D)->D.1867;
>   _11 = BIT_FIELD_REF <_10, 7, 0>;
>   _12 = this_4(D)->D.1867;
>   _13 = _12 & 128;
>   _14 = (unsigned char) _11;
>   _15 = _13 | _14;
>   this_4(D)->D.1867 = _15;
>   return;
>
> Generated MIPS32r2 assembly code without optimization:
>   lw      $3,0($5)
>   lbu     $2,0($4)
>   andi    $3,$3,0x1
>   andi    $2,$2,0xfe
>   or      $2,$2,$3
>   sb      $2,0($4)
>   lw      $3,0($5)
>   andi    $2,$2,0xf9
>   andi    $3,$3,0x6
>   or      $2,$2,$3
>   sb      $2,0($4)
>   lw      $3,0($5)
>   andi    $2,$2,0x87
>   andi    $3,$3,0x78
>   or      $2,$2,$3
>   j       $31
>   sb      $2,0($4)
>
> Optimized MIPS32r2 assembly code:
>   lw      $3,0($5)
>   lbu     $2,0($4)
>   andi    $3,$3,0x7f
>   andi    $2,$2,0x80
>   or      $2,$3,$2
>   j       $31
>   sb      $2,0($4)
>
> Algorithm works on basic block level and consists of following 3 major steps:
> 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure.
> 2. Identify records that represent adjacent bit field accesses and mark them as merged.
> 3. Lower bit-field accesses by using new field size for those that can be merged.
>
> New command line option -fmerge-bitfields is introduced.
>
> Tested - passed gcc regression tests for MIPS32r2.
>
> Changelog -
>
> gcc/ChangeLog:
> 2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com)
>   * common.opt (fmerge-bitfields): New option.
>   * doc/invoke.texi: Add reference to -fmerge-bitfields.
>   * doc/invoke.texi: Add -fmerge-bitfields to the list of optimization flags turned on at -O2.
>   * tree-sra.c (lower_bitfields): New function.
>   Entry for (-fmerge-bitfields).
>   (part_of_union_p): New function.
>   (bf_access_candidate_p): New function.
>   (lower_bitfield_read): New function.
>   (lower_bitfield_write): New function.
>   (bitfield_stmt_bfcopy_pair::hash): New function.
>   (bitfield_stmt_bfcopy_pair::equal): New function.
>   (bitfield_stmt_bfcopy_pair::remove): New function.
>   (create_and_insert_bfcopy): New function.
>   (get_bit_offset): New function.
>   (add_stmt_bfcopy_pair): New function.
>   (cmp_bfcopies): New function.
>   (get_merged_bit_field_size): New function.
>   * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
>   (field_byte_offset): Move declaration to tree.h and make it extern.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
>   *
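The effect shown in the two MIPS listings above can be checked with plain integer arithmetic. The sketch below is not code from the patch. It assumes the single-byte layout implied by the listings (a in bit 0, b in bits 1-2, c in bits 3-6, bit 7 untouched), takes the masks 0x1, 0x6, 0x78 and 0x7f from them, and uses hypothetical helper names throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Masks taken from the assembly listings; the layout is an assumption
   that holds for that target, not something C guarantees.  */
enum { MASK_A = 0x01, MASK_B = 0x06, MASK_C = 0x78, MASK_ABC = 0x7f };

/* Copy the bits selected by MASK from SRC into DST and leave the
   remaining bits of DST unchanged.  */
static uint8_t
copy_bits (uint8_t dst, uint8_t src, uint8_t mask)
{
  return (uint8_t) ((dst & ~mask) | (src & mask));
}

/* Unoptimized shape: one read-modify-write per field.  */
static uint8_t
copy_fieldwise (uint8_t dst, uint8_t src)
{
  dst = copy_bits (dst, src, MASK_A);
  dst = copy_bits (dst, src, MASK_B);
  dst = copy_bits (dst, src, MASK_C);
  return dst;
}

/* Merged shape: a single read-modify-write covering all three fields.  */
static uint8_t
copy_merged (uint8_t dst, uint8_t src)
{
  return copy_bits (dst, src, MASK_ABC);
}

/* Return 1 if both shapes give the same result for every byte pair.  */
static int
strategies_agree (void)
{
  for (unsigned d = 0; d < 256; d++)
    for (unsigned s = 0; s < 256; s++)
      if (copy_fieldwise ((uint8_t) d, (uint8_t) s)
          != copy_merged ((uint8_t) d, (uint8_t) s))
        return 0;
  return 1;
}
```

The exhaustive loop is cheap here because the container is one byte. It only demonstrates that the merged form is equivalent for adjacent fields in one container and says nothing about aliasing or volatile accesses.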
RE: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
Hello,
Updated doc/invoke.texi by stating that new option is enabled by default at -O2 and higher.
Also, -fmerge-bitfields added to the list of optimization flags enabled by default at -O2 and higher.

Regards,
Zoran Jovanovic

--

Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)

Example:

One of the motivating examples for this work was copy constructor of the class which contains bit-fields.

C++ code:
class A
{
public:
  A(const A &x);
  unsigned a : 1;
  unsigned b : 2;
  unsigned c : 4;
};

A::A(const A &x)
{
  a = x.a;
  b = x.b;
  c = x.c;
}

GIMPLE code without optimization:

  <bb 2>:
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:

  <bb 2>:
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
  lw      $3,0($5)
  lbu     $2,0($4)
  andi    $3,$3,0x1
  andi    $2,$2,0xfe
  or      $2,$2,$3
  sb      $2,0($4)
  lw      $3,0($5)
  andi    $2,$2,0xf9
  andi    $3,$3,0x6
  or      $2,$2,$3
  sb      $2,0($4)
  lw      $3,0($5)
  andi    $2,$2,0x87
  andi    $3,$3,0x78
  or      $2,$2,$3
  j       $31
  sb      $2,0($4)

Optimized MIPS32r2 assembly code:
  lw      $3,0($5)
  lbu     $2,0($4)
  andi    $3,$3,0x7f
  andi    $2,$2,0x80
  or      $2,$3,$2
  j       $31
  sb      $2,0($4)

Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as merged.
3. Lower bit-field accesses by using new field size for those that can be merged.

New command line option -fmerge-bitfields is introduced.

Tested - passed gcc regression tests for MIPS32r2.

Changelog -

gcc/ChangeLog:
2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to -fmerge-bitfields.
  * doc/invoke.texi: Add -fmerge-bitfields to the list of optimization flags turned on at -O2.
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (part_of_union_p): New function.
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
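Step 2 of the algorithm described in this message (finding records for adjacent bit-field accesses and merging them) can be pictured with a small stand-alone sketch. This is not the patch's implementation: struct bf_copy, cmp_by_src_off and merge_adjacent are hypothetical names, and the sketch assumes all records already refer to the same source and destination objects within one basic block, so only bit offsets and widths need comparing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one bit-field copy (a load/store pair).  */
struct bf_copy
{
  unsigned src_off;   /* bit offset of the field that is read */
  unsigned dst_off;   /* bit offset of the field that is written */
  unsigned width;     /* field width in bits */
};

/* qsort comparator: order records by source bit offset.  */
static int
cmp_by_src_off (const void *a, const void *b)
{
  const struct bf_copy *x = a;
  const struct bf_copy *y = b;
  return (x->src_off > y->src_off) - (x->src_off < y->src_off);
}

/* Sort COPIES by source offset, then fold every record that starts
   exactly where the current head ends (on both the source and the
   destination side) into that head.  The merged records are compacted
   to the front of COPIES; the return value is how many remain.  */
static size_t
merge_adjacent (struct bf_copy *copies, size_t n)
{
  if (n == 0)
    return 0;

  qsort (copies, n, sizeof *copies, cmp_by_src_off);

  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
      struct bf_copy *head = &copies[out];
      if (copies[i].src_off == head->src_off + head->width
          && copies[i].dst_off == head->dst_off + head->width)
        head->width += copies[i].width;   /* adjacent: widen the head */
      else
        copies[++out] = copies[i];        /* gap: start a new head */
    }
  return out + 1;
}
```

For the class A example, records for a (offset 0, width 1), b (offset 1, width 2) and c (offset 3, width 4) collapse into one record of width 7, which corresponds to the BIT_FIELD_REF <_10, 7, 0> in the optimized GIMPLE. Step 3 would then emit a single lowered access using that merged width.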
Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
On Thu, 17 Apr 2014 11:59:16 +
Zoran Jovanovic <zoran.jovano...@imgtec.com> wrote:

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7789,6 +7789,11 @@ pointer alignment information.
>  This pass only operates on local scalar variables and is enabled by default
>  at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
>
> +@item -fmerge-bitfields
> +@opindex fmerge-bitfields
> +Combines several adjacent bit-field accesses that copy values
> +from one memory location to another into one single bit-field access.
> +
>  @item -ftree-ccp
>  @opindex ftree-ccp
>  Perform sparse conditional constant propagation (CCP) on trees.  This

Can you mention that it's enabled at level -O2 here? Also don't forget to add
it to the list of flags enabled by -O2 that appears earlier (@item -O2). That
list is woefully out of date but let's not make it any worse.

--
Ryan Hill                        psn: dirtyepic_sk
   gcc-porting/toolchain/wxwidgets @ gentoo.org

47C3 6D62 4864 0E49 8E9E  7F92 ED38 BD49 957A 8463
RE: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
Hello,
My apologies for inconvenience.
Removed every appearance of -ftree-bitfield-merge from the patch and fixed an issue with unions.
The rest of the patch is the same as before.

Regards,
Zoran Jovanovic

--

Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)

Example:

One of the motivating examples for this work was copy constructor of the class which contains bit-fields.

C++ code:
class A
{
public:
  A(const A &x);
  unsigned a : 1;
  unsigned b : 2;
  unsigned c : 4;
};

A::A(const A &x)
{
  a = x.a;
  b = x.b;
  c = x.c;
}

GIMPLE code without optimization:

  <bb 2>:
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:

  <bb 2>:
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
  lw      $3,0($5)
  lbu     $2,0($4)
  andi    $3,$3,0x1
  andi    $2,$2,0xfe
  or      $2,$2,$3
  sb      $2,0($4)
  lw      $3,0($5)
  andi    $2,$2,0xf9
  andi    $3,$3,0x6
  or      $2,$2,$3
  sb      $2,0($4)
  lw      $3,0($5)
  andi    $2,$2,0x87
  andi    $3,$3,0x78
  or      $2,$2,$3
  j       $31
  sb      $2,0($4)

Optimized MIPS32r2 assembly code:
  lw      $3,0($5)
  lbu     $2,0($4)
  andi    $3,$3,0x7f
  andi    $2,$2,0x80
  or      $2,$3,$2
  j       $31
  sb      $2,0($4)

Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as merged.
3. Lower bit-field accesses by using new field size for those that can be merged.

New command line option -fmerge-bitfields is introduced.

Tested - passed gcc regression tests for MIPS32r2.

Changelog -

gcc/ChangeLog:
2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to -fmerge-bitfields.
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (part_of_union_p): New function.
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA-normal pass

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index
RE: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
Hello,
Unfortunately, optimization is limited only to bit-fields that have same bit-field representative (DECL_BIT_FIELD_REPRESENTATIVE), and fields from different classes do have different representatives.
In given example optimization would merge accesses to x and y bit-fields from Base class, but not the access to z from Der class.

Regards,
Zoran

From: Daniel Gutson [daniel.gut...@tallertechnologies.com]
Sent: Wednesday, April 16, 2014 4:16 PM
To: Zoran Jovanovic
Cc: Bernhard Reutner-Fischer; Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)

On Wed, Apr 16, 2014 at 8:38 AM, Zoran Jovanovic <zoran.jovano...@imgtec.com> wrote:
> Hello,
> This is new patch version.
> Lowering is applied only for bit-fields copy sequences that are merged.
> Data structure representing bit-field copy sequences is renamed and reduced in size.
> Optimization turned on by default for -O2 and higher.
> Some comments fixed.
>
> Benchmarking performed on WebKit for Android.
> Code size reduction noticed on several files, best examples are:
>
> core/rendering/style/StyleMultiColData (632->520 bytes)
> core/platform/graphics/FontDescription (1715->1475 bytes)
> core/rendering/style/FillLayer (5069->4513 bytes)
> core/rendering/style/StyleRareInheritedData (5618->5346)
> core/css/CSSSelectorList(4047->3887)
> core/platform/animation/CSSAnimationData (3844->3440 bytes)
> core/css/resolver/FontBuilder (13818->13350 bytes)
> core/platform/graphics/Font (16447->15975 bytes)
>
> Example:
>
> One of the motivating examples for this work was copy constructor of the class which contains bit-fields.
>
> C++ code:
> class A
> {
> public:
>   A(const A &x);
>   unsigned a : 1;
>   unsigned b : 2;
>   unsigned c : 4;
> };
>
> A::A(const A &x)
> {
>   a = x.a;
>   b = x.b;
>   c = x.c;
> }

Very interesting.

Does this work with inheritance too? E.g.

struct Base
{
    uint32_t x:1;
    uint32_t y:3;

    Base(const Base &other)
    {
        x = other.x;
        y = other.y;
    }
};

struct Der : Base
{
    Der() = default;
    Der(const Der &other) : Base(other)
    {
        z = other.z;
    }
    uint32_t z:9;
};

> GIMPLE code without optimization:
>
>   <bb 2>:
>   _3 = x_2(D)->a;
>   this_4(D)->a = _3;
>   _6 = x_2(D)->b;
>   this_4(D)->b = _6;
>   _8 = x_2(D)->c;
>   this_4(D)->c = _8;
>   return;
>
> Optimized GIMPLE code:
>
>   <bb 2>:
>   _10 = x_2(D)->D.1867;
>   _11 = BIT_FIELD_REF <_10, 7, 0>;
>   _12 = this_4(D)->D.1867;
>   _13 = _12 & 128;
>   _14 = (unsigned char) _11;
>   _15 = _13 | _14;
>   this_4(D)->D.1867 = _15;
>   return;
>
> Generated MIPS32r2 assembly code without optimization:
>   lw      $3,0($5)
>   lbu     $2,0($4)
>   andi    $3,$3,0x1
>   andi    $2,$2,0xfe
>   or      $2,$2,$3
>   sb      $2,0($4)
>   lw      $3,0($5)
>   andi    $2,$2,0xf9
>   andi    $3,$3,0x6
>   or      $2,$2,$3
>   sb      $2,0($4)
>   lw      $3,0($5)
>   andi    $2,$2,0x87
>   andi    $3,$3,0x78
>   or      $2,$2,$3
>   j       $31
>   sb      $2,0($4)
>
> Optimized MIPS32r2 assembly code:
>   lw      $3,0($5)
>   lbu     $2,0($4)
>   andi    $3,$3,0x7f
>   andi    $2,$2,0x80
>   or      $2,$3,$2
>   j       $31
>   sb      $2,0($4)
>
> Algorithm works on basic block level and consists of following 3 major steps:
> 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure.
> 2. Identify records that represent adjacent bit field accesses and mark them as merged.
> 3. Lower bit-field accesses by using new field size for those that can be merged.
>
> New command line option -fmerge-bitfields is introduced.
>
> Tested - passed gcc regression tests for MIPS32r2.
>
> Changelog -
>
> gcc/ChangeLog:
> 2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com)
>   * common.opt (fmerge-bitfields): New option.
>   * doc/invoke.texi: Add reference to -fmerge-bitfields.
>   * tree-sra.c (lower_bitfields): New function.
>   Entry for (-fmerge-bitfields).
>   (bf_access_candidate_p): New function.
>   (lower_bitfield_read): New function.
>   (lower_bitfield_write): New function.
>   (bitfield_stmt_bfcopy_pair::hash): New function.
>   (bitfield_stmt_bfcopy_pair::equal): New function.
>   (bitfield_stmt_bfcopy_pair::remove): New function.
>   (create_and_insert_bfcopy): New function.
>   (get_bit_offset): New function.
>   (add_stmt_bfcopy_pair): New function.
>   (cmp_bfcopies): New function.
>   (get_merged_bit_field_size): New function.
>   * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
>   (field_byte_offset): Move declaration to tree.h and make it extern.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
>   * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
>   * tree
RE: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
Hello,
This is a new version of the patch.
Lowering is applied only to bit-field copy sequences that are merged.
The data structure representing bit-field copy sequences is renamed and reduced in size.
The optimization is turned on by default for -O2 and higher.
Some comments are fixed.

Benchmarking was performed on WebKit for Android.
Code size reduction was noticed on several files; the best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList (4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)

Example:

One of the motivating examples for this work was the copy constructor of a class which contains bit-fields.

C++ code:

class A
{
public:
  A(const A &x);
  unsigned a : 1;
  unsigned b : 2;
  unsigned c : 4;
};

A::A(const A &x)
{
  a = x.a;
  b = x.b;
  c = x.c;
}

GIMPLE code without optimization:

  <bb 2>:
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:

  <bb 2>:
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:

  lw   $3,0($5)
  lbu  $2,0($4)
  andi $3,$3,0x1
  andi $2,$2,0xfe
  or   $2,$2,$3
  sb   $2,0($4)
  lw   $3,0($5)
  andi $2,$2,0xf9
  andi $3,$3,0x6
  or   $2,$2,$3
  sb   $2,0($4)
  lw   $3,0($5)
  andi $2,$2,0x87
  andi $3,$3,0x78
  or   $2,$2,$3
  j    $31
  sb   $2,0($4)

Optimized MIPS32r2 assembly code:

  lw   $3,0($5)
  lbu  $2,0($4)
  andi $3,$3,0x7f
  andi $2,$2,0x80
  or   $2,$3,$2
  j    $31
  sb   $2,0($4)

The algorithm works at basic-block level and consists of the following 3 major steps:
1. Go through the basic block's statement list. If there are statement pairs that implement a copy of bit-field content from one memory location to another, record the statement pointers and other necessary data in the corresponding data structure.
2. Identify records that represent adjacent bit-field accesses and mark them as merged.
3. Lower bit-field accesses, using the new field size for those that can be merged.

New command line option -fmerge-bitfields is introduced.

Tested - passed gcc regression tests for MIPS32r2.

Changelog -

gcc/ChangeLog:
2014-04-16  Zoran Jovanovic  (zoran.jovano...@imgtec.com)

	* common.opt (fmerge-bitfields): New option.
	* doc/invoke.texi: Add reference to -fmerge-bitfields.
	* tree-sra.c (lower_bitfields): New function.
	Entry for (-fmerge-bitfields).
	(bf_access_candidate_p): New function.
	(lower_bitfield_read): New function.
	(lower_bitfield_write): New function.
	(bitfield_stmt_bfcopy_pair::hash): New function.
	(bitfield_stmt_bfcopy_pair::equal): New function.
	(bitfield_stmt_bfcopy_pair::remove): New function.
	(create_and_insert_bfcopy): New function.
	(get_bit_offset): New function.
	(add_stmt_bfcopy_pair): New function.
	(cmp_bfcopies): New function.
	(get_merged_bit_field_size): New function.
	* dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
	(field_byte_offset): Move declaration to tree.h and make it extern.
	* testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
	* testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
	* tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
	* tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
	* tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
	(simple_type_size_in_bits): Move from dwarf2out.c.
	* tree.h (expressions_equal_p): Add declaration.
	(field_byte_offset): Add declaration.
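To make step 2 of the description above more concrete, here is a minimal stand-alone sketch of merging adjacent bit-field copies and applying the merged copy with a single mask. It is not the code from the patch: the names BitfieldCopy, field_mask, merge_adjacent and copy_merged are hypothetical, and the sketch assumes all fields live in one 32-bit storage unit of the same object and that nothing in between prevents merging.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for one "dst.f = src.f" bit-field copy.
// Assumption: every field sits inside the same 32-bit storage unit.
struct BitfieldCopy {
    unsigned bit_offset;  // position of the field within the unit
    unsigned bit_size;    // width of the field in bits
};

// Mask covering the bits described by one record.
std::uint32_t field_mask(const BitfieldCopy &c) {
    std::uint32_t low = c.bit_size >= 32
                            ? ~std::uint32_t{0}
                            : (std::uint32_t{1} << c.bit_size) - 1;
    return low << c.bit_offset;
}

// Sort the records by offset and fuse each record that starts exactly
// where the previous one ends (the "adjacent accesses" of step 2).
std::vector<BitfieldCopy> merge_adjacent(std::vector<BitfieldCopy> copies) {
    std::sort(copies.begin(), copies.end(),
              [](const BitfieldCopy &a, const BitfieldCopy &b) {
                  return a.bit_offset < b.bit_offset;
              });
    std::vector<BitfieldCopy> merged;
    for (const BitfieldCopy &c : copies) {
        if (!merged.empty() &&
            merged.back().bit_offset + merged.back().bit_size == c.bit_offset) {
            merged.back().bit_size += c.bit_size;  // widen the previous record
        } else {
            merged.push_back(c);
        }
    }
    return merged;
}

// One merged copy: keep the destination bits outside the mask and take
// the source bits inside it.
std::uint32_t copy_merged(std::uint32_t dst, std::uint32_t src,
                          const BitfieldCopy &c) {
    const std::uint32_t mask = field_mask(c);
    return (dst & ~mask) | (src & mask);
}
```

For class A from the example (a at bit 0, b at bit 1, c at bit 3), merge_adjacent({{0, 1}, {1, 2}, {3, 4}}) yields a single 7-bit record whose mask is 0x7f, which corresponds to the 0x7f / 0x80 masks in the optimized assembly.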
Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA-normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3fdfeb9..546638e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
On 16 April 2014 13:38, Zoran Jovanovic zoran.jovano...@imgtec.com wrote:
> Hello,
> This is new patch version.

The comment from the previous iteration still holds true:

> +@item -fbitfield-merge

you are talking about '-fmerge-bitfields' up until here. Please fix all occurrences of bitfield-merge, both in the docs as well as in the gcc.dg/tree-ssa/bitfldmrg2.c testcase -- how did that pass anyway as that option is presumably not recognized? :)

thanks,
Re: [PATCH] Add a new option -fmerge-bitfields (patch / doc inside)
On Wed, Apr 16, 2014 at 8:38 AM, Zoran Jovanovic zoran.jovano...@imgtec.com wrote:
> Example:
>
> One of the motivating examples for this work was copy constructor of the class
> which contains bit-fields.
>
> C++ code:
> class A
> {
> public:
>   A(const A &x);
>   unsigned a : 1;
>   unsigned b : 2;
>   unsigned c : 4;
> };
>
> A::A(const A &x)
> {
>   a = x.a;
>   b = x.b;
>   c = x.c;
> }

Very interesting. Does this work with inheritance too? E.g.

struct Base {
  uint32_t x:1;
  uint32_t y:3;

  Base(const Base &other) {
    x = other.x;
    y = other.y;
  }
};

struct Der : Base {
  Der() = default;
  Der(const Der &other) : Base(other) {
    z = other.z;
  }

  uint32_t z:9;
};
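For anyone who wants to try the inheritance case from this message, a self-contained variant of the test case might look like the sketch below. The default constructors and the make_der helper are additions for illustration only and are not part of the original message. The thread does not say whether the pass merges copies across Base and Der; since the algorithm is described as working at basic-block level, the two constructors' copies would presumably have to end up in one block (for example after inlining) to be candidates, but that is a guess.

```cpp
#include <cstdint>

struct Base {
    std::uint32_t x : 1;
    std::uint32_t y : 3;

    Base() : x(0), y(0) {}  // added so that Der can be default-constructed
    Base(const Base &other) {
        x = other.x;  // field-by-field copy, as in the motivating example
        y = other.y;
    }
};

struct Der : Base {
    std::uint32_t z : 9;

    Der() : z(0) {}  // replaces "Der() = default" from the message
    Der(const Der &other) : Base(other) {
        z = other.z;
    }
};

// Hypothetical helper that builds a Der with known field values.
Der make_der(unsigned x, unsigned y, unsigned z) {
    Der d;
    d.x = x;
    d.y = y;
    d.z = z;
    return d;
}
```

Compiling this with and without -fmerge-bitfields and comparing the generated code for Der's copy constructor would be one way to check the behaviour being asked about.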