Re: [PATCH][ARM] Add movv4hf/v8hf expanders & later insns; disable VnHF immediates.
On 18 January 2016 at 14:12, Kyrill Tkachov wrote: > Hi Alan, > > > On 18/01/16 12:14, Alan Lawrence wrote: >> >> This fixes ICEs on armeb for float16x[48]_t vectors, e.g. in >> check_effective_target_arm_neon_fp_16_ok. >> >> At present, without the expander, moving v4hf/v8hf values around is done >> via subregs. On armeb, this ICEs because REG_CANNOT_CHANGE_MODE_P. (On >> arm-*, >> moving via two subregs is less efficient than one native insn!) >> >> However, adding the expanders, reveals a latent bug in the V4HF variant of >> *neon_mov, that vector constants are not handled properly in the >> neon_valid_immediate code. Hence, for now I've used a separate expander >> that >> disallows immediates, and disabled VnHF vectors as immediates in >> neon_valid_immediate_for_move; I'll file a PR for this. >> >> Also to fix the advsimd-intrinsics/vcombine test I had to add HF vector >> modes to >> the VX iterator and V_reg attribute, for vdup_n, as loading a vector of >> identical HF elements is now done by loading the scalar + vdup rather than >> forcing the vector out to the constant pool. >> >> On armeb, one of the ICEs this fixes, is in the test program for >> check_effective_target_arm_neon_fp_16_ok. This means the >> advsimd-intrinsics >> vcvt_f16 test now runs (and passes), and also that the other tests now run >> with neon-fp16, rather than only neon as previously (on armeb). >> This reveals that the fp16 cases of vld1_lane and vset_lane are (and were) >> failing. Since those tests would previously have failed *if fp16 had been >> passed in*, I think this is still a step forward; one can still run the >> tests >> with an explicit non-fp16 multilib if the old behaviour is desired. >> >> Note the previous patch removes other uses of VQXMOV (not strictly a >> dependency, >> generating V4HF/V8HF reinterpret patterns is harmless, they just aren't >> used). >> >> Bootstrapped + check-gcc on arm-none-linux-gnueabihf; >> cross-tested armeb-none-eabi. > > > Seems that you and Christophe have some duplicated work here? > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01031.html > Indeed, that's unfortunate :-( I had filed PR 68620 to avoid duplication. Both patches look similar, fortunately. Christophe. > Kyrill > > >> gcc/ChangeLog: >> >> * config/arm/arm.c (neon_valid_immediate): Disallow vectors of >> HFmode. >> * config/arm/iterators.md (V_HF): New. >> (VQXMOV): Add V8HF. >> (VX): Add V4HF, V8HF. >> (V_reg): Add cases for V4HF, V8HF. >> * config/arm/vec-common.md (mov V_HF): New. >> --- >> gcc/config/arm/arm.c | 2 ++ >> gcc/config/arm/iterators.md | 8 ++-- >> gcc/config/arm/vec-common.md | 20 >> 3 files changed, 28 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 3276b03..4fdba38 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -12371,6 +12371,8 @@ neon_valid_immediate (rtx op, machine_mode mode, >> int inverse, >> /* Vectors of float constants. */ >> if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) >> { >> + if (GET_MODE_INNER (mode) == HFmode) >> + return -1; >> rtx el0 = CONST_VECTOR_ELT (op, 0); >> const REAL_VALUE_TYPE *r0; >> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md >> index 974cf51..c5db868 100644 >> --- a/gcc/config/arm/iterators.md >> +++ b/gcc/config/arm/iterators.md >> @@ -59,6 +59,9 @@ >> ;; Integer and float modes supported by Neon and IWMMXT. >> (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI >> V4SF]) >> +;; Vectors of half-precision floats. >> +(define_mode_iterator V_HF [V4HF V8HF]) >> + >> ;; Integer and float modes supported by Neon and IWMMXT, except V2DI. >> (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) >> @@ -99,7 +102,7 @@ >> (define_mode_iterator VQI [V16QI V8HI V4SI]) >> ;; Quad-width vector modes, with TImode added, for moves. >> -(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI]) >> +(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI]) >> ;; Opaque structure types wider than TImode. >> (define_mode_iterator VSTRUCT [EI OI CI XI]) >> @@ -160,7 +163,7 @@ >> (define_mode_iterator VMDQI [V4HI V2SI V8HI V4SI]) >> ;; Modes with 8-bit and 16-bit elements. >> -(define_mode_iterator VX [V8QI V4HI V16QI V8HI]) >> +(define_mode_iterator VX [V8QI V4HI V4HF V16QI V8HI V8HF]) >> ;; Modes with 8-bit elements. >> (define_mode_iterator VE [V8QI V16QI]) >> @@ -428,6 +431,7 @@ >> ;; Register width from element mode >> (define_mode_attr V_reg [(V8QI "P") (V16QI "q") >>(V4HI "P") (V8HI "q") >> +(V4HF "P") (V8HF "q") >>(V2SI "P") (V4SI "q") >>(V2SF "P") (V4SF "q") >>(DI "P") (V2DI "q") >> diff --git a/gcc/config/arm/v
Re: [PATCH][ARM] Add movv4hf/v8hf expanders & later insns; disable VnHF immediates.
Hi Alan, On 18/01/16 12:14, Alan Lawrence wrote: This fixes ICEs on armeb for float16x[48]_t vectors, e.g. in check_effective_target_arm_neon_fp_16_ok. At present, without the expander, moving v4hf/v8hf values around is done via subregs. On armeb, this ICEs because REG_CANNOT_CHANGE_MODE_P. (On arm-*, moving via two subregs is less efficient than one native insn!) However, adding the expanders, reveals a latent bug in the V4HF variant of *neon_mov, that vector constants are not handled properly in the neon_valid_immediate code. Hence, for now I've used a separate expander that disallows immediates, and disabled VnHF vectors as immediates in neon_valid_immediate_for_move; I'll file a PR for this. Also to fix the advsimd-intrinsics/vcombine test I had to add HF vector modes to the VX iterator and V_reg attribute, for vdup_n, as loading a vector of identical HF elements is now done by loading the scalar + vdup rather than forcing the vector out to the constant pool. On armeb, one of the ICEs this fixes, is in the test program for check_effective_target_arm_neon_fp_16_ok. This means the advsimd-intrinsics vcvt_f16 test now runs (and passes), and also that the other tests now run with neon-fp16, rather than only neon as previously (on armeb). This reveals that the fp16 cases of vld1_lane and vset_lane are (and were) failing. Since those tests would previously have failed *if fp16 had been passed in*, I think this is still a step forward; one can still run the tests with an explicit non-fp16 multilib if the old behaviour is desired. Note the previous patch removes other uses of VQXMOV (not strictly a dependency, generating V4HF/V8HF reinterpret patterns is harmless, they just aren't used). Bootstrapped + check-gcc on arm-none-linux-gnueabihf; cross-tested armeb-none-eabi. Seems that you and Christophe have some duplicated work here? https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01031.html Kyrill gcc/ChangeLog: * config/arm/arm.c (neon_valid_immediate): Disallow vectors of HFmode. * config/arm/iterators.md (V_HF): New. (VQXMOV): Add V8HF. (VX): Add V4HF, V8HF. (V_reg): Add cases for V4HF, V8HF. * config/arm/vec-common.md (mov V_HF): New. --- gcc/config/arm/arm.c | 2 ++ gcc/config/arm/iterators.md | 8 ++-- gcc/config/arm/vec-common.md | 20 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 3276b03..4fdba38 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12371,6 +12371,8 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse, /* Vectors of float constants. */ if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) { + if (GET_MODE_INNER (mode) == HFmode) + return -1; rtx el0 = CONST_VECTOR_ELT (op, 0); const REAL_VALUE_TYPE *r0; diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 974cf51..c5db868 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -59,6 +59,9 @@ ;; Integer and float modes supported by Neon and IWMMXT. (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) +;; Vectors of half-precision floats. +(define_mode_iterator V_HF [V4HF V8HF]) + ;; Integer and float modes supported by Neon and IWMMXT, except V2DI. (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) @@ -99,7 +102,7 @@ (define_mode_iterator VQI [V16QI V8HI V4SI]) ;; Quad-width vector modes, with TImode added, for moves. -(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI]) +(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI]) ;; Opaque structure types wider than TImode. (define_mode_iterator VSTRUCT [EI OI CI XI]) @@ -160,7 +163,7 @@ (define_mode_iterator VMDQI [V4HI V2SI V8HI V4SI]) ;; Modes with 8-bit and 16-bit elements. -(define_mode_iterator VX [V8QI V4HI V16QI V8HI]) +(define_mode_iterator VX [V8QI V4HI V4HF V16QI V8HI V8HF]) ;; Modes with 8-bit elements. (define_mode_iterator VE [V8QI V16QI]) @@ -428,6 +431,7 @@ ;; Register width from element mode (define_mode_attr V_reg [(V8QI "P") (V16QI "q") (V4HI "P") (V8HI "q") +(V4HF "P") (V8HF "q") (V2SI "P") (V4SI "q") (V2SF "P") (V4SF "q") (DI "P") (V2DI "q") diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md index ce98f71..c27578a 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -38,6 +38,26 @@ } }) +;; This exists separately from the above pattern to exclude an immediate RHS. + +(define_expand "mov" + [(set (match_operand:V_HF 0 "nonimmediate_operand" "") + (match_operand:V_HF 1 "nonimmediate_operand" ""))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" +{ + if
[PATCH][ARM] Add movv4hf/v8hf expanders & later insns; disable VnHF immediates.
This fixes ICEs on armeb for float16x[48]_t vectors, e.g. in check_effective_target_arm_neon_fp_16_ok. At present, without the expander, moving v4hf/v8hf values around is done via subregs. On armeb, this ICEs because REG_CANNOT_CHANGE_MODE_P. (On arm-*, moving via two subregs is less efficient than one native insn!) However, adding the expanders, reveals a latent bug in the V4HF variant of *neon_mov, that vector constants are not handled properly in the neon_valid_immediate code. Hence, for now I've used a separate expander that disallows immediates, and disabled VnHF vectors as immediates in neon_valid_immediate_for_move; I'll file a PR for this. Also to fix the advsimd-intrinsics/vcombine test I had to add HF vector modes to the VX iterator and V_reg attribute, for vdup_n, as loading a vector of identical HF elements is now done by loading the scalar + vdup rather than forcing the vector out to the constant pool. On armeb, one of the ICEs this fixes, is in the test program for check_effective_target_arm_neon_fp_16_ok. This means the advsimd-intrinsics vcvt_f16 test now runs (and passes), and also that the other tests now run with neon-fp16, rather than only neon as previously (on armeb). This reveals that the fp16 cases of vld1_lane and vset_lane are (and were) failing. Since those tests would previously have failed *if fp16 had been passed in*, I think this is still a step forward; one can still run the tests with an explicit non-fp16 multilib if the old behaviour is desired. Note the previous patch removes other uses of VQXMOV (not strictly a dependency, generating V4HF/V8HF reinterpret patterns is harmless, they just aren't used). Bootstrapped + check-gcc on arm-none-linux-gnueabihf; cross-tested armeb-none-eabi. gcc/ChangeLog: * config/arm/arm.c (neon_valid_immediate): Disallow vectors of HFmode. * config/arm/iterators.md (V_HF): New. (VQXMOV): Add V8HF. (VX): Add V4HF, V8HF. (V_reg): Add cases for V4HF, V8HF. * config/arm/vec-common.md (mov V_HF): New. --- gcc/config/arm/arm.c | 2 ++ gcc/config/arm/iterators.md | 8 ++-- gcc/config/arm/vec-common.md | 20 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 3276b03..4fdba38 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12371,6 +12371,8 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse, /* Vectors of float constants. */ if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) { + if (GET_MODE_INNER (mode) == HFmode) + return -1; rtx el0 = CONST_VECTOR_ELT (op, 0); const REAL_VALUE_TYPE *r0; diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 974cf51..c5db868 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -59,6 +59,9 @@ ;; Integer and float modes supported by Neon and IWMMXT. (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) +;; Vectors of half-precision floats. +(define_mode_iterator V_HF [V4HF V8HF]) + ;; Integer and float modes supported by Neon and IWMMXT, except V2DI. (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) @@ -99,7 +102,7 @@ (define_mode_iterator VQI [V16QI V8HI V4SI]) ;; Quad-width vector modes, with TImode added, for moves. -(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI]) +(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI]) ;; Opaque structure types wider than TImode. (define_mode_iterator VSTRUCT [EI OI CI XI]) @@ -160,7 +163,7 @@ (define_mode_iterator VMDQI [V4HI V2SI V8HI V4SI]) ;; Modes with 8-bit and 16-bit elements. -(define_mode_iterator VX [V8QI V4HI V16QI V8HI]) +(define_mode_iterator VX [V8QI V4HI V4HF V16QI V8HI V8HF]) ;; Modes with 8-bit elements. (define_mode_iterator VE [V8QI V16QI]) @@ -428,6 +431,7 @@ ;; Register width from element mode (define_mode_attr V_reg [(V8QI "P") (V16QI "q") (V4HI "P") (V8HI "q") +(V4HF "P") (V8HF "q") (V2SI "P") (V4SI "q") (V2SF "P") (V4SF "q") (DI "P") (V2DI "q") diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md index ce98f71..c27578a 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -38,6 +38,26 @@ } }) +;; This exists separately from the above pattern to exclude an immediate RHS. + +(define_expand "mov" + [(set (match_operand:V_HF 0 "nonimmediate_operand" "") + (match_operand:V_HF 1 "nonimmediate_operand" ""))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" +{ + if (can_create_pseudo_p ()) +{ + if (!REG_P (operands[0])) + operands[1] = force_reg (mode, operands[1]); + else if (TARGET_NEON && CONSTANT_P (operands[1])) + { + operands[1] = neon_make_constant