On Fri, Mar 19, 2021 at 7:29 AM Alexandre Oliva <ol...@gnu.org> wrote:
>
>
> The split in ssse3_pshufbv8qi3 forces a const vector into the constant
> pool, and loads from it. That runs after reload, so if the load
> requires any reloading, we're out of luck. Indeed, if the load
> address is not legitimate, e.g. -mcmodel=large, the insn is no longer
> recognized.
>
> This patch turns the constant into an input operand, introduces an
> expander to generate the constant unconditionally, and arranges for
> this input operand to be retained as an unused immediate in the
> alternatives that don't undergo splitting, and for it to be loaded
> into the scratch register for those that do.
>
> It is now the register allocator that arranges to load the const
> vector into a register, so it deals with whatever legitimizing steps
> needed for the target configuration.
>
> Regstrapped on x86_64-linux-gnu. Ok to install?
>
>
> for gcc/ChangeLog
>
>         * config/i386/predicates.md (register_or_const_vec_operand):
>         New.
>         * config/i386/sse.md (ssse3_pshufbv8qi3): Add an expander for
>         the now *-prefixed insn_and_split, turn the splitter const vec
>         into an input for the insn, making it an ignored immediate for
>         non-split cases, and loaded into the scratch register
>         otherwise.

Testcase?

> ---
>  gcc/config/i386/predicates.md |  6 ++++++
>  gcc/config/i386/sse.md        | 26 +++++++++++++++++++-------
>  2 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index b6dd5e9d3b243..f1da005c95cf3 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -1153,6 +1153,12 @@ (define_predicate
> "nonimmediate_or_const_vector_operand"
>    (ior (match_operand 0 "nonimmediate_operand")
>         (match_code "const_vector")))
>
> +;; Return true when OP is either register operand, or any
> +;; CONST_VECTOR.
> +(define_predicate "register_or_const_vector_operand"

please name this "reg_or_const_vector_operand"

> +  (ior (match_operand 0 "register_operand")
> +       (match_code "const_vector")))
> +
>  ;; Return true when OP is nonimmediate or standard SSE constant.
>  (define_predicate "nonimmediate_or_sse_const_operand"
>    (ior (match_operand 0 "nonimmediate_operand")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 43e4d57ec6a3d..b693864e62d1b 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17159,10 +17159,26 @@ (define_insn "<ssse3_avx2>_pshufb<mode>3<mask_name>"
>     (set_attr "btver2_decode" "vector")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_insn_and_split "ssse3_pshufbv8qi3"
> +(define_expand "ssse3_pshufbv8qi3"
> +  [(parallel
> +    [(set (match_operand:V8QI 0 "register_operand" "=")
> +         (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "")
> +                       (match_operand:V8QI 2 "register_mmxmem_operand" "")
> +                       (const_vector:V4SI [(match_dup 3) (match_dup 3)
> +                                           (match_dup 3) (match_dup 3)])]
> +                      UNSPEC_PSHUFB))
> +     (clobber (match_scratch:V4SI 4 "="))])]

All constraints should be removed from an expander.

> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
> +{
> +  operands[3] = gen_int_mode (0xf7f7f7f7, SImode);

You can use:

ix86_build_const_vector (V4SImode, true, gen_int_mode (0xf7f7f7f7, SImode));

to generate the whole const_vector.

Uros.

> +})
> +
> +(define_insn_and_split "*ssse3_pshufbv8qi3"
>    [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
>         (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
> -                     (match_operand:V8QI 2 "register_mmxmem_operand"
> "ym,x,Yv")]
> +                     (match_operand:V8QI 2 "register_mmxmem_operand"
> "ym,x,Yv")
> +                     (match_operand:V4SI 4 "register_or_const_vector_operand"
> +                      "i,3,3")]
>                      UNSPEC_PSHUFB))
>     (clobber (match_scratch:V4SI 3 "=X,&x,&Yv"))]
>    "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
> @@ -17172,8 +17188,7 @@ (define_insn_and_split "ssse3_pshufbv8qi3"
>    #"
>    "TARGET_SSSE3 && reload_completed
>     && SSE_REGNO_P (REGNO (operands[0]))"
> -  [(set (match_dup 3) (match_dup 5))
> -   (set (match_dup 3)
> +  [(set (match_dup 3)
>         (and:V4SI (match_dup 3) (match_dup 2)))
>     (set (match_dup 0)
>         (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]
> @@ -17188,9 +17203,6 @@ (define_insn_and_split "ssse3_pshufbv8qi3"
>                                  GET_MODE (operands[2]));
>    operands[4] = lowpart_subreg (V16QImode, operands[3],
>                                 GET_MODE (operands[3]));
> -  rtx vec_const = ix86_build_const_vector (V4SImode, true,
> -                                          gen_int_mode (0xf7f7f7f7, SImode));
> -  operands[5] = force_const_mem (V4SImode, vec_const);
>  }
>    [(set_attr "mmx_isa" "native,sse_noavx,avx")
>     (set_attr "prefix_extra" "1")
>
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>    Free Software Activist         GNU Toolchain Engineer
>         Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
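
For illustration only, here is a rough, untested sketch of how the expander might look with the two expander comments applied (constraints dropped, and the whole const_vector built in the preparation statement). It assumes that a single (match_dup 3) can stand in for the open-coded const_vector once operands[3] holds the full V4SI vector; the operand numbering simply follows the posted patch and may need adjusting.

```
;; Sketch only, not a tested or committed version.
(define_expand "ssse3_pshufbv8qi3"
  [(parallel
    [(set (match_operand:V8QI 0 "register_operand")
	  (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
			(match_operand:V8QI 2 "register_mmxmem_operand")
			;; Assumption: operands[3] is the complete V4SI
			;; const_vector, set up in the C block below.
			(match_dup 3)]
		       UNSPEC_PSHUFB))
     (clobber (match_scratch:V4SI 4))])]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
{
  /* Build the whole mask vector here instead of a single SImode element.  */
  operands[3] = ix86_build_const_vector (V4SImode, true,
					 gen_int_mode (0xf7f7f7f7, SImode));
})
```

The insn_and_split would then take this vector through its new input operand, using the predicate under whatever name is finally chosen.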