When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.
The fundamental issue is we can't shift efficiently and there's no
trivial way to extract a single bit out of the high half word of an
SImode value.
What usually happens is we use a synthesized right shift to get the
single bit into the desired position, then a bit-and to mask off
everything we don't care about.
The shifts are expensive, even using tricks like half and quarter word
moves to implement shift-by-16 and shift-by-8. Additionally a logical
right shift must clear out the upper bits which is redundant since we're
going to mask things with &1 later.
This patch provides a consistently better sequence for such extractions.
The general form moves the high half into the low half, a bit
extraction into C, clear the destination, then move C into the
destination with a few special cases.
This also avoids all the shenanigans for H8/SX which has a much more
capable shifter. It's not single cycle, but it is reasonably efficient.
This has been regression tested on the H8 without issues. Pushing to
the trunk momentarily.
jeff
ps. Yes, supporting extraction of multi-bit fields might be improvable
as well. But I've already spent more time on this than I can reasonably
justify.
commit 57dbc02d261bb833f6ef287187eb144321dd595c
Author: Jeff Law <j...@ventanamicro.com>
Date: Thu Nov 9 17:34:01 2023 -0700
[committed] Improve single bit zero extraction on H8.
When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.
The fundamental issue is we can't shift efficiently and there's no trivial
way
to extract a single bit out of the high half word of an SImode value.
What usually happens is we use a synthesized right shift to get the single
bit
into the desired position, then a bit-and to mask off everything we don't
care
about.
The shifts are expensive, even using tricks like half and quarter word
moves to
implement shift-by-16 and shift-by-8. Additionally a logical right shift
must
clear out the upper bits which is redundant since we're going to mask things
with &1 later.
This patch provides a consistently better sequence for such extractions.
The
general form moves the high half into the low half, a bit extraction into C,
clear the destination, then move C into the destination with a few special
cases.
This also avoids all the shenanigans for H8/SX which has a much more capable
shifter. It's not single cycle, but it is reasonably efficient.
This has been regression tested on the H8 without issues. Pushing to the
trunk
momentarily.
jeff
ps. Yes, supporting zero extraction of multi-bit fields might be
improvable as
well. But I've already spent more time on this than I can reasonably
justify.
gcc/
* config/h8300/combiner.md (single bit sign_extract): Avoid recently
added patterns for H8/SX.
(single bit zero_extract): New patterns.
diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 2f7faf77c93..e1179b5fea6 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1278,7 +1278,7 @@ (define_insn_and_split ""
(sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
(const_int 1)
(match_operand 2 "immediate_operand")))]
- ""
+ "!TARGET_H8300SX"
"#"
"&& reload_completed"
[(parallel [(set (match_dup 0)
@@ -1291,7 +1291,7 @@ (define_insn ""
(const_int 1)
(match_operand 2 "immediate_operand")))
(clobber (reg:CC CC_REG))]
- ""
+ "!TARGET_H8300SX"
{
int position = INTVAL (operands[2]);
@@ -1359,3 +1359,69 @@ (define_insn ""
return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
}
[(set_attr "length" "10")])
+
+;; For shift counts >= 16 we can always do better than the
+;; generic sequences. Other patterns handle smaller counts.
+(define_insn_and_split ""
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+ (match_operand 2 "immediate_operand" "n"))
+ (const_int 1)))]
+ "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+ "#"
+ "&& reload_completed"
+ [(parallel [(set (match_dup 0) (and:SI (lshiftrt:SI (match_dup 0) (match_dup
2))
+ (const_int 1)))
+ (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+ (match_operand 2 "immediate_operand" "n"))
+ (const_int 1)))
+ (clobber (reg:CC CC_REG))]
+ "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+{
+ int position = INTVAL (operands[2]);
+
+ /* If the bit we want is the highest bit we can just rotate it into position
+ and mask off everything else. */
+ if (position == 31)
+ {
+ output_asm_insn ("rotl.l\t%0", operands);
+ return "and.l\t#1,%0";
+ }
+
+ /* Special case for H8/S. Similar to bit 31. */
+ if (position == 30 && TARGET_H8300S)
+ return "rotl.l\t#2,%0\;and.l\t#1,%0";
+
+ if (position <= 30 && position >= 17)
+ {
+ /* Shift 16 bits, without worrying about extensions. */
+ output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+ /* Get the bit we want into C. */
+ operands[2] = GEN_INT (position % 8);
+ if (position >= 24)
+ output_asm_insn ("bld\t%2,%t0", operands);
+ else
+ output_asm_insn ("bld\t%2,%s0", operands);
+
+ /* xor + rotate to clear the destination, then rotate
+ the C into position. */
+ return "xor.l\t%0,%0\;rotxl.l\t%0";
+ }
+
+ if (position == 16)
+ {
+ /* Shift 16 bits, without worrying about extensions. */
+ output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+ /* And finally, mask out everything we don't want. */
+ return "and.l\t#1,%0";
+ }
+
+ gcc_unreachable ();
+}
+ [(set_attr "length" "10")])