On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:
I did a quick bootstrap; this shows several failures like:
gcc/builtins.c:9427:1: error: unrecognizable insn:
9427 | }
| ^
(insn 212 211 213 24 (set (reg:SI 207)
(zero_extract:SI (reg:SI 206)
(const_int 26 [0x1a])
(const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
(nil))
The issue here is that 26 + 6 = 32, and that's not a valid ubfx encoding.
Currently cases like this are split into a right shift by the pattern in
aarch64.md around line 5569 (quoted further down).
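For concreteness, the insn above is extracting a 26-bit field starting at bit 6 of
a 32-bit value, i.e. bits 6..31. Because 6 + 26 == 32, the field runs up to the top
of the register, so the extract is really just a logical right shift, which is why
there is no separate ubfx encoding for it. An illustrative C equivalent (not the
actual code from builtins.c) would be something like:

/* Bits 6..31 of a 32-bit value.  The mask is redundant because
   (x >> 6) already has only 26 significant bits.  */
unsigned int
extract_bits_6_to_31 (unsigned int x)
{
  return (x >> 6) & ((1u << 26) - 1);   /* same as x >> 6 */
}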
Appreciate you taking a look and doing the validation. I've gotten access to an
aarch64 server, and the bootstrap demonstrated the issue you saw. This was
caused by my re-definition of the pattern to:
+ if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+ FAIL;
This meant that for SImode, only a sum greater than 32 bits actually triggers the
FAIL condition in the define_expand, whereas the existing define_insn rejects a sum
of 32 bits or more. I looked into the Architecture Reference Manual: the bits are
available in the ubfx/sbfx encoding for that case, and the documentation lists
[lsb, 32-lsb] as a legal pair for SImode. Checking with the GNU assembler, it does
accept a sum of 32 but transforms it into an LSR:
Assembly file:
ubfx w0, w0, 24, 8
Disassembly of section .text:
0000000000000000 <f>:
0: 53187c00 lsr w0, w0, #24
This is how LSR is implemented in AArch64, as an extract. So the
disassembler is showing the canonical representation.
However, inside the compiler we really want to represent this as a
shift. Why? Because there are cases where we can then merge a shift
into other operations when we can't merge the more general extract
operation. For example, we can merge an LSR into a subsequent 'AND'
operation, but can't merge a more general extract into an AND.
Essentially, we want the compiler to describe this in the canonical
shift form rather than the extract form.
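As a concrete (hypothetical) illustration of the merging point, take a shift whose
source form is exactly the boundary case discussed here, bits 24..31 of a 32-bit
value, followed by an AND with a variable:

unsigned int
shift_then_and (unsigned int x, unsigned int y)
{
  /* Represented as a shift, the LSR can be folded into the AND as a
     shifted-register operand, giving a single instruction roughly of
     the form "and w0, w1, w0, lsr 24".  Represented as a general
     extract it cannot be merged and needs a separate instruction.  */
  return (x >> 24) & y;
}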
Ideally this would be handled inside the mid-end expansion of an
extract, but in the absence of that I think this is best done inside the
extv expansion so that we never end up with a real extract in that case.
Similarly, with the 64-bit version it'll become a 64-bit LSR. Certainly other
assemblers could trip over this. I've attached a new patch that allows this
encoding, and bootstrap plus testing against the C/C++ testsuites looks good. I'll
defer to you on whether it's better to do the transformation explicitly in GCC.
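For what it's worth, here is a rough, untested sketch of what doing that
transformation in the extv/extzv expander could look like, assuming pos and width
hold the values of operands[3] and operands[2] as in the condition quoted earlier,
and showing only the zero-extract case (a sign extract would want an arithmetic
shift instead). This is just an illustration of the idea, not the actual patch:

  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
    FAIL;

  /* If the field runs up to the top of the register, emit the
     canonical shift form instead of an extract.  */
  if ((pos + width) == GET_MODE_BITSIZE (<MODE>mode))
    {
      emit_insn (gen_rtx_SET (operands[0],
                              gen_rtx_LSHIFTRT (<MODE>mode, operands[1],
                                                operands[3])));
      DONE;
    }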
;; When the bit position and width add up to 32 we can use a W-reg LSR
;; instruction taking advantage of the implicit zero-extension of the X-reg.
(define_split
  [(set (match_operand:DI 0 "register_operand")
        (zero_extract:DI (match_operand:DI 1 "register_operand")
                         (match_operand 2
                           "aarch64_simd_shift_imm_offset_di")
                         (match_operand 3
                           "aarch64_simd_shift_imm_di")))]
  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
             GET_MODE_BITSIZE (DImode) - 1)
   && (INTVAL (operands[2]) + INTVAL (operands[3]))
       == GET_MODE_BITSIZE (SImode)"
  [(set (match_dup 0)
        (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
  {
    operands[4] = gen_lowpart (SImode, operands[1]);
  }
)
However, that only supports DImode, not SImode, so it needs to be changed
to be more general using GPI.
Your new extv patterns should replace the magic patterns above it.
Given the earlier discovery that a sum of 32/64 triggers an LSR in the assembler,
I was curious what would happen if I removed this pattern. It turns out you end up
with a UBFX x0, x0, 24, 8 instead of an LSR w0, w0, 24 in the test case associated
with this change (gcc.target/aarch64/ubfx_lsr_1.c); that one doesn't get
transformed into an LSR by the assembler since it is the 64-bit (X-register) form.
So this pattern still has value, but I don't think it's necessary to extend it to
SImode, since that case automatically gets turned into an LSR by the assembler as I
previously mentioned.
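For clarity, the shape of code that split targets (not necessarily the exact
contents of ubfx_lsr_1.c) is a DImode extract whose field ends exactly at bit 32,
for example:

/* An 8-bit field at bit 24 of a 64-bit value: 24 + 8 == 32.  Bits
   32..63 of the result are zero either way, so "lsr w0, w0, 24"
   (which implicitly zeroes the upper half of x0) is equivalent to
   "ubfx x0, x0, 24, 8".  */
unsigned long long
extract_bits_24_to_31 (unsigned long long x)
{
  return (x >> 24) & 0xff;
}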
;; -------------------------------------------------------------------
;; Bitfields
;; -------------------------------------------------------------------
(define_expand "<optab>"
These are the current extv/extzv patterns, but without a mode. They are no
longer used when we start using the new ones.
Note you can write <optab><mode> to combine the extv and extzv patterns.
But please add a comment mentioning the pattern names so they are easy to
find!
Good call here; I made this change in the new patch. I did see the define_insn for
these below it, but somehow missed that they were expanded just above. I believe,
from my understanding of GCC, that the matching pattern below the first line is
what constrains <optab> to just extv/extzv out of the long list of codes it
covers. Still, I see that there are constrained iterators elsewhere, like:
;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")
I added a comment in this patch before the pattern. Thoughts on defining
another constrained version to make it clearer (in addition to, or in lieu of,
the comment)?
Besides a bootstrap, it is always useful to compile a large body of code with
your change (say SPEC2006/2017) and check for differences, at least in
codesize. If there is an increase in instruction count then there may be more
issues that need to be resolved.
Sounds good. I'll get those set up and running and will report back on findings.
What's the preferred way to measure codesize? I'm assuming the code pages are
aligned by default, so smaller differences would need to trip over an alignment
boundary to actually show up.
I find it easiest to develop on a many-core AArch64 server so you get much
faster builds, bootstraps and regression tests. Cross compilers are mostly
useful if you want to test big-endian or new architecture features which are
not yet supported in hardware. You don't normally need to test
Go/Fortran/Ada etc. unless your patch does something that would directly
affect them.
Good to know. It sounds like our current testing with the C/C++ testsuites is
alright for most changes.
Finally do you have a copyright assignment with the FSF?
Lawyers are working on that. Since I still have the SPEC codesize analysis to go
through, the licensing may be ready before all the analysis for the patch is done.
Modi
R.