> I did a quick bootstrap, this shows several failures like:
> 
> gcc/builtins.c:9427:1: error: unrecognizable insn:
>  9427 | }
>       | ^
> (insn 212 211 213 24 (set (reg:SI 207)
>         (zero_extract:SI (reg:SI 206)
>             (const_int 26 [0x1a])
>             (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>      (nil))
> 
> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
> Currently cases like this are split into a right shift in aarch64.md around 
> line
> 5569:

Appreciate you taking a look and the validation. I've gotten access to an 
aarch64 server and the bootstrap demonstrated the issue you saw. This was 
caused by my re-definition of the pattern to:
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;

Which meant for SImode only a sum of >32 bit actually triggers the fail 
condition for the define_expand whereas the existing define_insn fails on >=32 
bit. I looked into the architecture reference manual and the bits are available 
for ubfx/sbfx for that type of encoding and the documentation says you can use 
[lsb, 32-lsb] for SImode as a legal pair. Checking with the GNU assembler it 
does accept a sum of 32 but transforms it into a LSR:

Assembly file:
ubfx    w0, w0, 24, 8

Disassembly of section .text:

0000000000000000 <f>:
   0:   53187c00        lsr     w0, w0, #24

Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other 
assemblers could trip over, I've attached a new patch that allows this encoding 
and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's 
better to explicitly do the transformation in GCC.

> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
> instruction taking advantage of the implicit zero-extension of the X-reg.
> (define_split
>   [(set (match_operand:DI 0 "register_operand")
>         (zero_extract:DI (match_operand:DI 1 "register_operand")
>                          (match_operand 2
>                            "aarch64_simd_shift_imm_offset_di")
>                          (match_operand 3
>                            "aarch64_simd_shift_imm_di")))]
>   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>              GET_MODE_BITSIZE (DImode) - 1)
>    && (INTVAL (operands[2]) + INTVAL (operands[3]))
>        == GET_MODE_BITSIZE (SImode)"
>   [(set (match_dup 0)
>         (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>   {
>     operands[4] = gen_lowpart (SImode, operands[1]);
>   }
> 
> However that only supports DImode, not SImode, so it needs to be changed
> to be more general using GPI.
> 
> Your new extv patterns should replace the magic patterns above it:

With the previous discovery that a sum of 32/64 will trigger LSR in the 
assembler I was curious what would happen if I remove this pattern. Turns out, 
you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the 
test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which 
doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. 
So this pattern still has value but I don't think it's necessary to extend it 
to DI since that'll automatically get turned into a LSR by the assembler as I 
previously mentioned.


> ;; -------------------------------------------------------------------
> ;; Bitfields
> ;; -------------------------------------------------------------------
> 
> (define_expand "<optab>"
> 
> These are the current extv/extzv patterns, but without a mode. They are no
> longer used when we start using the new ones.
> 
> Note you can write <optab><mode> to combine the extzv and extz patterns.
> But please add a comment mentioning the pattern names so they are easy to
> find!

Good call here, made this change in the new patch. I did see the define_insn of 
these guys below it but somehow missed that they were expanded just above. I 
believe, from my understanding of GCC, that the matching pattern below the 
first line is what constrains <optab> into just extv/extsv from the long list 
of iterators it belongs to. Still, I see that there's constrained iterators 
elsewhere like: 

;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")

I added a comment in this patch before the pattern. Thoughts on defining 
another constrained version to make it clearer (in addition or in lieu of the 
comment)?

> Besides a bootstrap it is always useful to compile a large body of code with
> your change (say SPEC2006/2017) and check for differences in at least
> codesize. If there is an increase in instruction count then there may be more
> issues that need to be resolved.

Sounds good. I'll get those setup and running and will report back on findings. 
What's the preferred way to measure codesize? I'm assuming by default the code 
pages are aligned so smaller differences would need to trip over the boundary 
to actually show up. 
 
> I find it easiest to develop on a many-core AArch64 server so you get much
> faster builds, bootstraps and regression tests. Cross compilers are mostly
> useful if you want to test big-endian or new architecture features which are
> not yet supported in hardware. You don't normally need to test
> Go/Fortran/ADA etc unless your patch does something that would directly
> affect them.

Good to know. Sounds like our current testing with C/C++ testsuite is alright 
for most changes.

> Finally do you have a copyright assignment with the FSF?

Lawyers are working on that. Since I have SPEC codesize to go through the 
licensing may be ready before all analysis is done for the patch.

Modi

Attachment: pr86901.patch
Description: pr86901.patch

Reply via email to