Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: + ! unsignedp Don't you need to check that the conversion is actually a sign extension. Oh, you're relying on the signedness of ops-type. That should be sufficient. Exactly. +if (GET_MODE_SIZE (rmode) GET_MODE_SIZE (mode) + TREE_INT_CST_LOW (treeop1) GET_MODE_BITSIZE (word_mode) + ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) += GET_MODE_BITSIZE (word_mode))) I keep having trouble with this. I think the key to why this works is you don't actually use the source of the conversion, but instead the destination of the conversion (held in op0). Yes, my thought is given any narrower type, the op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); Will do necesary sign extension, then finally want we get from op0 is a 2 x word_mode_size register whose high part contains duplication of sign bit, and low part contains the source of the conversion if the narrower mode is word_mode_size, or contains sign extened source, but either way, the low part can be used as our new shift operand. And the high part of op0 generated from above expand_expr is actually dead which will be removed by later optimization pass. So this is OK for the trunk with the typo whitespace nits fixed. During my final testing of the patch on arm, mips32, mips64, powerpc32, powerpc64 I do found two new issues: * As you have mentioned, this patch do have endianness issue! Although the new testcases (wide-shift-128/64.c) under gcc.dg passed my manual powerpc32/powerpc64 test which check combine pass rtl dump, but the generated assembly looks weird to me, then I found simplify_gen_subreg itself is not endianness safe. We need to use subreg_highpart/lowpart_offset to get the offset. So I done minor modifications on the patch by using lowpart_reg and subreg_highpart_offset I think it's endianess safe now. And the instruction sequences generated for powerpc, for example powerpc32 looks sane to me now: long long load1 (int data) { return (long long) data 12;} load1-before-patch: srawi 9,3,31 mr 4,3 srwi 3,3,20 slwi 4,4,12 rlwimi 3,9,12,0,31-12 blr load1-after-patch: mr 4,3 srawi 3,3,20 slwi 4,4,12 blr * The other issue is still what you have noticed. For some targets, for example arm, there is backend define_expand for double word left shift, then because my patch will only do the tranformation when ! have_insn_for (ASHIFT, mode), thus this transformation will not happen on these targets. While I belive this transformation do generate better instruction sequences for some case. As backend don't know high level type conversion information, those backends expand logic like arm_emit_coreregs_64bit_shift can only assume the shift operand may come from any cases thus the expand logic will be very generic thus not optimal. We should compare the cost to decide whether to go with this transformation or use the backend's customized expand logic. But anyway, this patch will not cause regression. We need another seperate patch to implement the cost comparision then we can let those targets benefit from this transformation. Thus I removed the arm target testcase. (I recall arm testcase is in this patch because my original local patch don't contain the have_insn_for check.) Thanks very much for the review! Committed attached patch (above minor issues I treat them as obivious) as 227018. -- Regards, Jiong Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 227017) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2015-08-19 Jiong Wang jiong.w...@arm.com + + * expr.c (expand_expr_real_2): Check gimple statement during + LSHIFT_EXPR expand. + 2015-08-19 Magnus Granberg zo...@gentoo.org * common.opt (fstack-protector): Initialize to -1. Index: gcc/expr.c === --- gcc/expr.c (revision 227017) +++ gcc/expr.c (working copy) @@ -8836,24 +8836,113 @@ case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code - below because expand_shift doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; + { + /* If this is a fixed-point operation, then we cannot use the code + below because expand_shift doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; - if (! safe_from_p (subtarget, treeop1, 1)) - subtarget = 0; - if (modifier == EXPAND_STACK_PARM) - target = 0; - op0 = expand_expr (treeop0, subtarget, - VOIDmode, EXPAND_NORMAL); -
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jiong Wang writes: Jeff Law writes: On 08/14/2015 11:40 AM, Jiong Wang wrote: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. I suspect it'll also apply more often if you're looking for the nop-conversion rather than looking at range information. I keep thinking there ought to be a generalization here so that we're not so restrictive on the modes, but it's probably not worth doing. In a perfect world we'd also integrate the other strategies for double-word shifts that exist in the various ports as special cases in expansion and remove the double-word shift patterns for ports that don't actually have them in hardware. But that's probably a bit much to ask -- however, it probably couldn't hurt to see if there are any that are easily handled. Agree. Doing these in early tree/rtl expand stage is more clean safe, and expose more details to later RTL optimization passes as I think if you handle double-word by defining insn pattern, then you will try to split it in RTL split pass which happens later after some important optimizations. + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + example: + + Type A = (Type) B C + +| T | +| high | low | + + |- size -| + + By checking tree shape, if operand B is coming from signed extension, + then the left shift operand can be simplified into: + + 1. high = low (size - C); + 2. low = low C; You'll want to reorder those to match the code you generate. Doesn't this require that C be less than the bitsize of a word? Yes. Above transformation is to handle double word left shift which shift the original source across the word boundary. Part of the contents shifted into the high half of destination, and the other remain in the low half. So if C is bigger than bitsize of a word, the the whole source will be shifted into high half, thus should generate what you have listed below and that's what gcc is generating, looks like the existed double word shift logic can recognize this special cases. So, I should check the value of C,if it's bigger than bitsize of a word, then just fall through to default expand logic. If C is larger than the bitsize of a word, then you need some adjustments, something like: 1. high = low (C - size) 2. low = 0 Does this transformation require that the wider mode be exactly 2X the narrower mode? If so, that needs to be verified. I think so, the wider mode should be exactly 2X the word_mode, will add the check. + if (GET_MODE_SIZE (rmode) GET_MODE_SIZE (mode) So we're assured we have a widening conversion. +((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) + = GET_MODE_BITSIZE (word_mode))) This test seems wrong. I'm not sure what you're really trying to test here. It seems you want to look at the shift count relative to the bitsize of word_mode. If it's less, then generate the code you mentioned above in the comment. If it's more, then generate the sequence I referenced? Right? As explained above, I am trying to check whether the left shift is causing the original source across word boundary while I should make sure TREE_INT_CST_LOW (treeop1) GET_MODE_BITSIZE (word_mode) at the same time, otherwise fall through to default expand logic. I do think you need to be verifying that rmode == wordmode here. If I understand everything correctly, the final value is mode which must be 2X the size size of rmode/wordmode here, right? I think rmode don't need to equal wordmode. For example the transformation is OK for the following, where rmode = SImode, and final mode = TImode. int b; __128_int a = (__128_int) b; the expand_expr (treeop0... in the start of those code will generate necessary instruction sequences to extend whatever mode b is into TImode register (op0 in the patch); The other question is are we endianness-safe in these transformations? I think it is. As this transformation is done with register involved only. I think endianness issue will only happen if there is operation on memory object. + temp = expand_variable_shift (code, word_mode, low, treeop1, + tlow, unsignedp); Why use code here right than just using LSHIFT_EXPR? As noted earlier, Yes, better to use the constant LSHIFT_EXPR. Thanks. patch updated, please review thanks. Changes are: 1. s/shfit/shift/ 2. Re-write the comment. more explanations on the left shift across word size boundary
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
On 08/18/2015 06:54 AM, Jiong Wang wrote: Changes are: 1. s/shfit/shift/ 2. Re-write the comment. more explanations on the left shift across word size boundary scenario, explan gcc's current expand steps and what this optimization can improve. 3. Match the code to the comment. Expand high part first, then low part, so we don't need to check the pseudo register overlapping. 4. Add the check to make sure if we are shifting the whole source operand into high part (sfhit amount = word mode size) then just don't do this optimization, use the default expand logic instead which generate optimized rtx sequences already. 5. Only do this optimization when GET_MOZE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode) bootstrap OK on x86-64, no regression. bootstrap OK on aarch64, no regression. 2015-08-18 Jiong.Wangjiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Check gimple statement during LSHIFT_EXPR expand. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Likewise. * gcc.target/aarch64/ashlti3_1.c: Likewise. * gcc.target/arm/ashldisi_1.c: Likewise. -- Regards, Jiong k-new.patch diff --git a/gcc/expr.c b/gcc/expr.c index 3202f55..eca9a20 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8836,23 +8836,110 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code -below because expand_shift doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; + { + /* If this is a fixed-point operation, then we cannot use the code + below because expand_shift doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; + + if (! safe_from_p (subtarget, treeop1, 1)) + subtarget = 0; + if (modifier == EXPAND_STACK_PARM) + target = 0; + op0 = expand_expr (treeop0, subtarget, + VOIDmode, EXPAND_NORMAL); + /* Left shift optimization when shifting across word_size boundary. Please insert a newline before this comment. + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + scenario: + + Type A = (Type) B C + + |T | + | dest_high | dest_low | + +| word_size | + + If the shfit amount C caused we shift B to across the word size s/shfit/shift/ + + While, by checking gimple statements, if operand B is coming from + signed extension, then we can simplify above expand logic into: + + 1. dest_high = src_low (word_size - C). + 2. dest_low = src_low C. + + We can use one arithmetic right shift to finish all the purpose of + steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2. */ + + temp = NULL_RTX; + if (code == LSHIFT_EXPR +target +REG_P (target) +! unsignedp +mode == GET_MODE_WIDER_MODE (word_mode) +GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode) +! have_insn_for (ASHIFT, mode) +TREE_CONSTANT (treeop1) +TREE_CODE (treeop0) == SSA_NAME) + { + gimple def = SSA_NAME_DEF_STMT (treeop0); + if (is_gimple_assign (def) +gimple_assign_rhs_code (def) == NOP_EXPR) Don't you need to check that the conversion is actually a sign extension. Oh, you're relying on the signedness of ops-type. That should be sufficient. + { + machine_mode rmode = TYPE_MODE + (TREE_TYPE (gimple_assign_rhs1 (def))); - if (! safe_from_p (subtarget, treeop1, 1)) - subtarget = 0; - if (modifier == EXPAND_STACK_PARM) - target = 0; - op0 = expand_expr (treeop0, subtarget, -VOIDmode, EXPAND_NORMAL); - temp = expand_variable_shift (code, mode, op0, treeop1, target, - unsignedp); - if (code == LSHIFT_EXPR) - temp = REDUCE_BIT_FIELD (temp); - return temp; + if (GET_MODE_SIZE (rmode) GET_MODE_SIZE (mode) +TREE_INT_CST_LOW (treeop1) GET_MODE_BITSIZE (word_mode) +((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) + = GET_MODE_BITSIZE (word_mode))) I keep having trouble with this. I think the key to why this works is you don't actually use the source of the conversion, but instead the destination of the conversion (held in op0). So this is OK for the trunk with the typo whitespace nits fixed. Thanks for your
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
On 08/14/2015 11:40 AM, Jiong Wang wrote: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. I suspect it'll also apply more often if you're looking for the nop-conversion rather than looking at range information. I keep thinking there ought to be a generalization here so that we're not so restrictive on the modes, but it's probably not worth doing. In a perfect world we'd also integrate the other strategies for double-word shifts that exist in the various ports as special cases in expansion and remove the double-word shift patterns for ports that don't actually have them in hardware. But that's probably a bit much to ask -- however, it probably couldn't hurt to see if there are any that are easily handled. * For the pseudo register overlaping issue, given: RTX target = TREE source TREE amount I was trying to make sure those two SSA_NAME won't get the same pseudo register by comparing their assigned partition using var_to_partition, but I can't get the associated tree representation for target which is RTX. Then I just expand source and compare the register number with target. Right. Which is what I would have suggested. Given two pseudos, you can just compare them for equality. If they're unequal, then its safe to assume they do not overlap. But I found the simplest way maybe just reorder low_part (target) = low_part (source) amount high_part (target) = low_part (source) amount1 to: high_part (target) = low_part (source) amount1 low_part (target) = low_part (source) amount then, even target and source share the same pseudo register, we can still get what we want, as we are operating on their subreg. Yes. This would work too and avoids the need to find source's pseudo. * Added checking on the result value of expand_variable_shift, if it's not the same as target then call emit_move_insn to do that. Excellent. How is the re-worked patch looks to you? x86 bootstrap OK, regression OK. aarch64 bootstrap OK. 2015-08-14 Jiong.Wangjiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Check tree format to optimize LSHIFT_EXPR expand. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Likewise. * gcc.target/aarch64/ashlti3_1.c: Likewise. * gcc.target/arm/ashldisi_1.c: Likewise. + /* Left shfit optimization: s/shfit/shift/ + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + example: + + Type A = (Type) B C + +| T | +| high | low | + + |- size -| + + By checking tree shape, if operand B is coming from signed extension, + then the left shift operand can be simplified into: + + 1. high = low (size - C); + 2. low = low C; You'll want to reorder those to match the code you generate. Doesn't this require that C be less than the bitsize of a word? If C is larger than the bitsize of a word, then you need some adjustments, something like: 1. high = low (C - size) 2. low = 0 Does this transformation require that the wider mode be exactly 2X the narrower mode? If so, that needs to be verified. + if (GET_MODE_SIZE (rmode) GET_MODE_SIZE (mode) So we're assured we have a widening conversion. +((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) + = GET_MODE_BITSIZE (word_mode))) This test seems wrong. I'm not sure what you're really trying to test here. It seems you want to look at the shift count relative to the bitsize of word_mode. If it's less, then generate the code you mentioned above in the comment. If it's more, then generate the sequence I referenced? Right? I do think you need to be verifying that rmode == wordmode here. If I understand everything correctly, the final value is mode which must be 2X the size size of rmode/wordmode here, right? The other question is are we endianness-safe in these transformations? + { + rtx low = simplify_gen_subreg (word_mode, op0, mode, 0); + rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0); + rtx thigh = simplify_gen_subreg (word_mode, target, mode, +UNITS_PER_WORD); + HOST_WIDE_INT ramount = (BITS_PER_WORD +- TREE_INT_CST_LOW (treeop1)); + tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: On 04/29/2015 03:36 PM, Jiong Wang wrote: Jeff Law writes: On 04/27/2015 02:21 PM, Jiong Wang wrote: Jeff, Sorry, I can't understand the meaning of overlap between t_low and low, assume right in right value means the opposite of left not correct. So what you mean is t_low and low share the same pseudo regiser? My concern is sharing the same pseudo or memory location. But thinking more about it, the shifted value has to have range information, so it must have been an SSA_NAME, right? If so, then it can't overlap with the destination, so this is a non-issue. Sorry for the confusion. Thanks for the light. By looking at related code, looks like even it's SSA_NAME, it's still possible to share the same pseudo given the destination is in the same SSA map parition after ssa name coleascing? If they're the same size and have non-overlapping lifetimes, then yes, they could be the same pseudo. That ought to be easy to check. Thankfully we don't have to worry about MEMs, which is a harder check. OK. I will rework the patch, and I found there is a function named expand_doubleword_shift which looks like a more natural place to do this optimization, although it's hard to get range info there. I will do further explore on this. Sounds good. jeff Jeff, Sorry for the slow response. I found we can't integrate this transformation into expand_doubleword_shift as it's at quite deep layer in the call stack, when we reach there, we have lost some high level info. So I am keeping this transformaion still in expr.c, and attachment is the updated version of this patch. The main changes are: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. * For the pseudo register overlaping issue, given: RTX target = TREE source TREE amount I was trying to make sure those two SSA_NAME won't get the same pseudo register by comparing their assigned partition using var_to_partition, but I can't get the associated tree representation for target which is RTX. Then I just expand source and compare the register number with target. But I found the simplest way maybe just reorder low_part (target) = low_part (source) amount high_part (target) = low_part (source) amount1 to: high_part (target) = low_part (source) amount1 low_part (target) = low_part (source) amount then, even target and source share the same pseudo register, we can still get what we want, as we are operating on their subreg. * Added checking on the result value of expand_variable_shift, if it's not the same as target then call emit_move_insn to do that. How is the re-worked patch looks to you? x86 bootstrap OK, regression OK. aarch64 bootstrap OK. 2015-08-14 Jiong.Wang jiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Check tree format to optimize LSHIFT_EXPR expand. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Likewise. * gcc.target/aarch64/ashlti3_1.c: Likewise. * gcc.target/arm/ashldisi_1.c: Likewise. -- Regards, Jiong diff --git a/gcc/expr.c b/gcc/expr.c index 31b4573..8a25e98 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8836,23 +8836,90 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code - below because expand_shift doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; + { + /* If this is a fixed-point operation, then we cannot use the code + below because expand_shift doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; + + if (! safe_from_p (subtarget, treeop1, 1)) + subtarget = 0; + if (modifier == EXPAND_STACK_PARM) + target = 0; + op0 = expand_expr (treeop0, subtarget, + VOIDmode, EXPAND_NORMAL); + /* Left shfit optimization: + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + example: + + Type A = (Type) B C + +| T | +| high | low | + + |- size -| + + By checking tree shape, if operand B is coming from signed extension, + then the left shift operand can be simplified into: + + 1. high = low (size - C); + 2. low = low C; + */ + + temp = NULL_RTX; + if (code == LSHIFT_EXPR + target + REG_P (target) + ! unsignedp + mode == GET_MODE_WIDER_MODE (word_mode) + ! have_insn_for (ASHIFT, mode) + TREE_CONSTANT (treeop1) + TREE_CODE (treeop0) == SSA_NAME)
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: On 08/14/2015 11:40 AM, Jiong Wang wrote: * Figuring out whether the shift source is coming from sign extension by checking SSA_NAME_DEF_STMT instead of deducing from tree range info. I fell checking the gimple statement is more reliable and straigtforward. I suspect it'll also apply more often if you're looking for the nop-conversion rather than looking at range information. I keep thinking there ought to be a generalization here so that we're not so restrictive on the modes, but it's probably not worth doing. In a perfect world we'd also integrate the other strategies for double-word shifts that exist in the various ports as special cases in expansion and remove the double-word shift patterns for ports that don't actually have them in hardware. But that's probably a bit much to ask -- however, it probably couldn't hurt to see if there are any that are easily handled. Agree. Doing these in early tree/rtl expand stage is more clean safe, and expose more details to later RTL optimization passes as I think if you handle double-word by defining insn pattern, then you will try to split it in RTL split pass which happens later after some important optimizations. + + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't + native instruction to support this wide mode left shift. Given below + example: + +Type A = (Type) B C + +| T | +| high | low | + + |- size -| + + By checking tree shape, if operand B is coming from signed extension, + then the left shift operand can be simplified into: + + 1. high = low (size - C); + 2. low = low C; You'll want to reorder those to match the code you generate. Doesn't this require that C be less than the bitsize of a word? Yes. Above transformation is to handle double word left shift which shift the original source across the word boundary. Part of the contents shifted into the high half of destination, and the other remain in the low half. So if C is bigger than bitsize of a word, the the whole source will be shifted into high half, thus should generate what you have listed below and that's what gcc is generating, looks like the existed double word shift logic can recognize this special cases. So, I should check the value of C,if it's bigger than bitsize of a word, then just fall through to default expand logic. If C is larger than the bitsize of a word, then you need some adjustments, something like: 1. high = low (C - size) 2. low = 0 Does this transformation require that the wider mode be exactly 2X the narrower mode? If so, that needs to be verified. I think so, the wider mode should be exactly 2X the word_mode, will add the check. +if (GET_MODE_SIZE (rmode) GET_MODE_SIZE (mode) So we're assured we have a widening conversion. + ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) += GET_MODE_BITSIZE (word_mode))) This test seems wrong. I'm not sure what you're really trying to test here. It seems you want to look at the shift count relative to the bitsize of word_mode. If it's less, then generate the code you mentioned above in the comment. If it's more, then generate the sequence I referenced? Right? As explained above, I am trying to check whether the left shift is causing the original source across word boundary while I should make sure TREE_INT_CST_LOW (treeop1) GET_MODE_BITSIZE (word_mode) at the same time, otherwise fall through to default expand logic. I do think you need to be verifying that rmode == wordmode here. If I understand everything correctly, the final value is mode which must be 2X the size size of rmode/wordmode here, right? I think rmode don't need to equal wordmode. For example the transformation is OK for the following, where rmode = SImode, and final mode = TImode. int b; __128_int a = (__128_int) b; the expand_expr (treeop0... in the start of those code will generate necessary instruction sequences to extend whatever mode b is into TImode register (op0 in the patch); The other question is are we endianness-safe in these transformations? I think it is. As this transformation is done with register involved only. I think endianness issue will only happen if there is operation on memory object. +temp = expand_variable_shift (code, word_mode, low, treeop1, + tlow, unsignedp); Why use code here right than just using LSHIFT_EXPR? As noted earlier, Yes, better to use the constant LSHIFT_EXPR. Thanks. -- Regards, Jiong
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: On 04/27/2015 02:21 PM, Jiong Wang wrote: Jeff, Sorry, I can't understand the meaning of overlap between t_low and low, assume right in right value means the opposite of left not correct. So what you mean is t_low and low share the same pseudo regiser? My concern is sharing the same pseudo or memory location. But thinking more about it, the shifted value has to have range information, so it must have been an SSA_NAME, right? If so, then it can't overlap with the destination, so this is a non-issue. Sorry for the confusion. Thanks for the light. By looking at related code, looks like even it's SSA_NAME, it's still possible to share the same pseudo given the destination is in the same SSA map parition after ssa name coleascing? I've never liked the model of storing into TARGET when it's convenient. Because storing into TARGET is totally optional, it means the callers have to check if the value was stored into TARGET or not. Sadly that model has been in the expanders as long as I can remember. So I think this can go forward once we resolve the case where expand_variable_shift returns its value in something other than the passed in target. OK. I will rework the patch, and I found there is a function named expand_doubleword_shift which looks like a more natural place to do this optimization, although it's hard to get range info there. I will do further explore on this. -- Regards, Jiong
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
On 04/29/2015 03:36 PM, Jiong Wang wrote: Jeff Law writes: On 04/27/2015 02:21 PM, Jiong Wang wrote: Jeff, Sorry, I can't understand the meaning of overlap between t_low and low, assume right in right value means the opposite of left not correct. So what you mean is t_low and low share the same pseudo regiser? My concern is sharing the same pseudo or memory location. But thinking more about it, the shifted value has to have range information, so it must have been an SSA_NAME, right? If so, then it can't overlap with the destination, so this is a non-issue. Sorry for the confusion. Thanks for the light. By looking at related code, looks like even it's SSA_NAME, it's still possible to share the same pseudo given the destination is in the same SSA map parition after ssa name coleascing? If they're the same size and have non-overlapping lifetimes, then yes, they could be the same pseudo. That ought to be easy to check. Thankfully we don't have to worry about MEMs, which is a harder check. OK. I will rework the patch, and I found there is a function named expand_doubleword_shift which looks like a more natural place to do this optimization, although it's hard to get range info there. I will do further explore on this. Sounds good. jeff
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
On 04/27/2015 02:21 PM, Jiong Wang wrote: Funny, I find myself wanting this transformation in both places :-) Expansion time so that we generate efficient code from the start and combine to catch those cases which are too complex to see at expansion, but due to other optimizations become visible in the combiner. Sadly, it's been fairly common practice for targets to define double-word shift patterns which catch a variety of special cases. Ports will have to choose between using those patterns and exploiting your work. I'd be tempted to generate a double-word shift by the given constant and check its cost vs the single word shifts. What happens if there's an overlap between t_low and low? Won't the lshift clobber low and thus we get the right value for the rshift in that case? Jeff, Sorry, I can't understand the meaning of overlap between t_low and low, assume right in right value means the opposite of left not correct. So what you mean is t_low and low share the same pseudo regiser? My concern is sharing the same pseudo or memory location. But thinking more about it, the shifted value has to have range information, so it must have been an SSA_NAME, right? If so, then it can't overlap with the destination, so this is a non-issue. Sorry for the confusion. Note that expand_variable_shift may not honor your request for putting the result in the TARGET target parameter you pass in. Thanks, agree, it's better to add those extra move. I noticed the comments at the start of the function: Store the result in the rtx TARGET, if that is convenient. Although I still don't understand in which case it's inconveninent. I've never liked the model of storing into TARGET when it's convenient. Because storing into TARGET is totally optional, it means the callers have to check if the value was stored into TARGET or not. Sadly that model has been in the expanders as long as I can remember. So I think this can go forward once we resolve the case where expand_variable_shift returns its value in something other than the passed in target. Jeff
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law writes: On 04/16/2015 05:04 AM, Jiong Wang wrote: This is a rework of https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html After second thinking, I feel it's better to fix this in earlier stage during RTL expand which is more generic, and we also avoid making the already complex combine pass complexer. Currently gcc expand wide mode left shift to some generic complex instruction sequences, while if we have known the high part of wide mode all comes from sign extension, the expand logic could be simplifed. Given the following example, T A = (T) B const_imm_shift We know the high part of A are all comes from sign extension, if * T is the next wider type of word_mode. For example, for aarch64, if type T is 128int (TImode), and B is with type SImode or DImode, then tree analyzer know that the high part of TImode result all comes from sign extension, and kept them in range info. | T | | high | low | |- sizel -| For above example, we could simplify the expand logic into 1. low = low const_imm_shift; 2. high = low (sizel - const_imm_shift) */ We can utilize the arithmetic right shift to do the sign extension. Those reduntant instructions will be optimized out later. For actual .s improvement, AArch64 === __int128_t foo (int data) { return (__int128_t) data 50; } old: sxtwx2, w0 asr x1, x2, 63 lsl x0, x2, 50 lsl x1, x1, 50 orr x1, x1, x2, lsr 14 new: sxtwx1, w0 lsl x0, x1, 50 asr x1, x1, 14 ARM (.fpu softvfp) === long long shift (int data) { return (long long) data 20; } old: stmfd sp!, {r4, r5} mov r5, r0, asr #31 mov r3, r0 mov r0, r0, asl #20 mov r1, r5, asl #20 orr r1, r1, r3, lsr #12 ldmfd sp!, {r4, r5} bx lr new: mov r1, r0 mov r0, r0, asl #20 mov r1, r1, asr #12 bx lr Test x86 bootstrap OK, regression test OK. AArch64 bootstrap OK, regression test on board OK. Regards, Jiong 2015-04-116 Jiong.Wang jiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Take tree range info into account when expanding LSHIFT_EXPR. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Ditto. * gcc.target/aarch64/ashlti3_1.c: Ditto. * gcc.target/arm/ashldisi_1.c: Ditto. Funny, I find myself wanting this transformation in both places :-) Expansion time so that we generate efficient code from the start and combine to catch those cases which are too complex to see at expansion, but due to other optimizations become visible in the combiner. Sadly, it's been fairly common practice for targets to define double-word shift patterns which catch a variety of special cases. Ports will have to choose between using those patterns and exploiting your work. I'd be tempted to generate a double-word shift by the given constant and check its cost vs the single word shifts. What happens if there's an overlap between t_low and low? Won't the lshift clobber low and thus we get the right value for the rshift in that case? Jeff, Sorry, I can't understand the meaning of overlap between t_low and low, assume right in right value means the opposite of left not correct. So what you mean is t_low and low share the same pseudo regiser? or you mean if we are shifting a value across the word boundary? like the following. | double word | | t_high | t_low | |- low -| for above situation, the simplified two instruction sequence do works. t_low = low const_imm_shift ; t_high = low (sizel - const_imm_shift) I attached the expand result for a simple testcase below. I appreicate if you could comment on the RTL. Thanks. __int128_t foo (int data) { return (__int128_t) data 50; } foo.c.188t.optimized === foo (int data) { __int128 _2; __int128 _3; bb 2: _2 = (__int128) data_1(D); _3 = _2 50; return _3; } foo.c.189r.expand === (insn 2 4 3 2 (set (reg/v:SI 76 [ data ]) (reg:SI 0 x0 [ data ])) foo.c:3 -1 (nil)) (insn 6 3 7 2 (set (reg:DI 79) (sign_extend:DI (reg/v:SI 76 [ data ]))) foo.c:4 -1 (nil)) (insn 7 6 8 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 0) (reg:DI 79)) foo.c:4 -1 (nil)) (insn 8 7 9 2 (set (reg:DI 80) (ashiftrt:DI (reg:DI 79) (const_int 63 [0x3f]))) foo.c:4 -1 (nil)) (insn 9 8 10 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 8) (reg:DI 80)) foo.c:4 -1 (nil)) ^ ~ sign extend SImode data into TImode _2 (r78) (insn 10 9 11 2 (set (subreg:DI (reg:TI 77 [D.2677 ]) 0) (ashift:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0)
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
On 04/16/2015 05:04 AM, Jiong Wang wrote: This is a rework of https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html After second thinking, I feel it's better to fix this in earlier stage during RTL expand which is more generic, and we also avoid making the already complex combine pass complexer. Currently gcc expand wide mode left shift to some generic complex instruction sequences, while if we have known the high part of wide mode all comes from sign extension, the expand logic could be simplifed. Given the following example, T A = (T) B const_imm_shift We know the high part of A are all comes from sign extension, if * T is the next wider type of word_mode. For example, for aarch64, if type T is 128int (TImode), and B is with type SImode or DImode, then tree analyzer know that the high part of TImode result all comes from sign extension, and kept them in range info. | T | | high | low | |- sizel -| For above example, we could simplify the expand logic into 1. low = low const_imm_shift; 2. high = low (sizel - const_imm_shift) */ We can utilize the arithmetic right shift to do the sign extension. Those reduntant instructions will be optimized out later. For actual .s improvement, AArch64 === __int128_t foo (int data) { return (__int128_t) data 50; } old: sxtwx2, w0 asr x1, x2, 63 lsl x0, x2, 50 lsl x1, x1, 50 orr x1, x1, x2, lsr 14 new: sxtwx1, w0 lsl x0, x1, 50 asr x1, x1, 14 ARM (.fpu softvfp) === long long shift (int data) { return (long long) data 20; } old: stmfd sp!, {r4, r5} mov r5, r0, asr #31 mov r3, r0 mov r0, r0, asl #20 mov r1, r5, asl #20 orr r1, r1, r3, lsr #12 ldmfd sp!, {r4, r5} bx lr new: mov r1, r0 mov r0, r0, asl #20 mov r1, r1, asr #12 bx lr Test x86 bootstrap OK, regression test OK. AArch64 bootstrap OK, regression test on board OK. Regards, Jiong 2015-04-116 Jiong.Wang jiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Take tree range info into account when expanding LSHIFT_EXPR. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Ditto. * gcc.target/aarch64/ashlti3_1.c: Ditto. * gcc.target/arm/ashldisi_1.c: Ditto. Funny, I find myself wanting this transformation in both places :-) Expansion time so that we generate efficient code from the start and combine to catch those cases which are too complex to see at expansion, but due to other optimizations become visible in the combiner. Sadly, it's been fairly common practice for targets to define double-word shift patterns which catch a variety of special cases. Ports will have to choose between using those patterns and exploiting your work. I'd be tempted to generate a double-word shift by the given constant and check its cost vs the single word shifts. What happens if there's an overlap between t_low and low? Won't the lshift clobber low and thus we get the right value for the rshift in that case? Note that expand_variable_shift may not honor your request for putting the result in the TARGET target parameter you pass in. Thus: temp = expand_variable_shift (...) temp = expand_variable_shift (...) Can't be right.You probably need something like temp = expand_variable_shift (...) if (temp != t_low) emit_move_insn (t_low, temp); temp = expand_variable_shift (...) if (temp != t_high) emit_move_insn (t_high, temp); return target; So I generally like where you're going with this, but I have concerns about its correctness, particularly in cases where there's an overlap or when expand_variable_shift returns its value in something other than the passed in target. Jeff
[Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
This is a rework of https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html After second thinking, I feel it's better to fix this in earlier stage during RTL expand which is more generic, and we also avoid making the already complex combine pass complexer. Currently gcc expand wide mode left shift to some generic complex instruction sequences, while if we have known the high part of wide mode all comes from sign extension, the expand logic could be simplifed. Given the following example, T A = (T) B const_imm_shift We know the high part of A are all comes from sign extension, if * T is the next wider type of word_mode. For example, for aarch64, if type T is 128int (TImode), and B is with type SImode or DImode, then tree analyzer know that the high part of TImode result all comes from sign extension, and kept them in range info. | T | | high | low | |- sizel -| For above example, we could simplify the expand logic into 1. low = low const_imm_shift; 2. high = low (sizel - const_imm_shift) */ We can utilize the arithmetic right shift to do the sign extension. Those reduntant instructions will be optimized out later. For actual .s improvement, AArch64 === __int128_t foo (int data) { return (__int128_t) data 50; } old: sxtwx2, w0 asr x1, x2, 63 lsl x0, x2, 50 lsl x1, x1, 50 orr x1, x1, x2, lsr 14 new: sxtwx1, w0 lsl x0, x1, 50 asr x1, x1, 14 ARM (.fpu softvfp) === long long shift (int data) { return (long long) data 20; } old: stmfd sp!, {r4, r5} mov r5, r0, asr #31 mov r3, r0 mov r0, r0, asl #20 mov r1, r5, asl #20 orr r1, r1, r3, lsr #12 ldmfd sp!, {r4, r5} bx lr new: mov r1, r0 mov r0, r0, asl #20 mov r1, r1, asr #12 bx lr Test x86 bootstrap OK, regression test OK. AArch64 bootstrap OK, regression test on board OK. Regards, Jiong 2015-04-116 Jiong.Wang jiong.w...@arm.com gcc/ * expr.c (expand_expr_real_2): Take tree range info into account when expanding LSHIFT_EXPR. gcc/testsuite * gcc.dg/wide_shift_64_1.c: New testcase. * gcc.dg/wide_shift_128_1.c: Ditto. * gcc.target/aarch64/ashlti3_1.c: Ditto. * gcc.target/arm/ashldisi_1.c: Ditto. diff --git a/gcc/expr.c b/gcc/expr.c index 89ca129..96d64cc 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8984,23 +8984,85 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code - below because expand_shift doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; - - if (! safe_from_p (subtarget, treeop1, 1)) - subtarget = 0; - if (modifier == EXPAND_STACK_PARM) - target = 0; - op0 = expand_expr (treeop0, subtarget, - VOIDmode, EXPAND_NORMAL); - temp = expand_variable_shift (code, mode, op0, treeop1, target, -unsignedp); - if (code == LSHIFT_EXPR) - temp = REDUCE_BIT_FIELD (temp); - return temp; + { + /* If this is a fixed-point operation, then we cannot use the code + below because expand_shift doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; + + if (! safe_from_p (subtarget, treeop1, 1)) + subtarget = 0; + if (modifier == EXPAND_STACK_PARM) + target = 0; + + op0 = expand_expr (treeop0, subtarget, + VOIDmode, EXPAND_NORMAL); + + /* If mode == GET_MODE_WIDER_MODE (word_mode), + then normally, there will no native instructions to support + this wide mode left shift. + + given below example, + + T A = (T) B C + + | T | + | high | low | + + |- sizel -| + + if from range info, we could deduce that the high part are all sign + bit extension, then this left shift operation could be largely + simplified into. + + 1. low = low C; + 2. high = low (sizel - C) */ + + int o_bits = GET_MODE_SIZE (mode) * BITS_PER_UNIT; + wide_int min, max; + + if (code == LSHIFT_EXPR + !unsignedp + mode == GET_MODE_WIDER_MODE (word_mode) + !have_insn_for (LSHIFT_EXPR, mode) + TREE_CONSTANT (treeop1) + get_range_info (treeop0, min, max) == VR_RANGE + (wi::cmp (min, + wide_int::from (wi::min_value + ((unsigned) (BITS_PER_WORD), + SIGNED), o_bits, SIGNED), + SIGNED) != -1) + (wi::cmp (max, + wide_int::from (wi::max_value + ((unsigned)(BITS_PER_WORD), + SIGNED), o_bits, SIGNED), + SIGNED) != 1)) + { + rtx low = simplify_gen_subreg (word_mode, op0, mode, 0); + rtx t_low = simplify_gen_subreg (word_mode, target, mode, 0); + rtx t_high = simplify_gen_subreg (word_mode, target, mode, + UNITS_PER_WORD); + tree high_shift = +