On Fri, Mar 19, 2021 at 7:29 AM Alexandre Oliva <ol...@gnu.org> wrote:
>
>
> The split in ssse3_pshufbv8qi3 forces a const vector into the constant
> pool, and loads from it.  That runs after reload, so if the load
> requires any reloading, we're out of luck.  Indeed, if the load
> address is not legitimate, e.g. -mcmodel=large, the insn is no longer
> recognized.
>
> This patch turns the constant into an input operand, introduces an
> expander to generate the constant unconditionally, and arranges for
> this input operand to be retained as an unused immediate in the
> alternatives that don't undergo splitting, and for it to be loaded
> into the scratch register for those that do.
>
> It is now the register allocator that arranges to load the const
> vector into a register, so it deals with whatever legitimizing steps
> are needed for the target configuration.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
>         * config/i386/predicates.md (register_or_const_vector_operand):
>         New.
>         * config/i386/sse.md (ssse3_pshufbv8qi3): Add an expander for
>         the now *-prefixed insn_and_split, turn the splitter const vec
>         into an input for the insn, making it an ignored immediate for
>         non-split cases, and loaded into the scratch register
>         otherwise.

Testcase?

> ---
>  gcc/config/i386/predicates.md |    6 ++++++
>  gcc/config/i386/sse.md        |   26 +++++++++++++++++++-------
>  2 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index b6dd5e9d3b243..f1da005c95cf3 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -1153,6 +1153,12 @@ (define_predicate "nonimmediate_or_const_vector_operand"
>    (ior (match_operand 0 "nonimmediate_operand")
>         (match_code "const_vector")))
>
> +;; Return true when OP is either register operand, or any
> +;; CONST_VECTOR.
> +(define_predicate "register_or_const_vector_operand"

please name this "reg_or_const_vector_operand"

> +  (ior (match_operand 0 "register_operand")
> +       (match_code "const_vector")))
> +
>  ;; Return true when OP is nonimmediate or standard SSE constant.
>  (define_predicate "nonimmediate_or_sse_const_operand"
>    (ior (match_operand 0 "nonimmediate_operand")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 43e4d57ec6a3d..b693864e62d1b 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17159,10 +17159,26 @@ (define_insn "<ssse3_avx2>_pshufb<mode>3<mask_name>"
>     (set_attr "btver2_decode" "vector")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_insn_and_split "ssse3_pshufbv8qi3"
> +(define_expand "ssse3_pshufbv8qi3"
> +  [(parallel
> +    [(set (match_operand:V8QI 0 "register_operand" "=")
> +         (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "")
> +                       (match_operand:V8QI 2 "register_mmxmem_operand" "")
> +                       (const_vector:V4SI [(match_dup 3) (match_dup 3)
> +                                           (match_dup 3) (match_dup 3)])]
> +                      UNSPEC_PSHUFB))
> +     (clobber (match_scratch:V4SI 4 "="))])]

All constraints should be removed from an expander.

> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
> +{
> +  operands[3] = gen_int_mode (0xf7f7f7f7, SImode);

You can use:

ix86_build_const_vector (V4SImode, true,
                                          gen_int_mode (0xf7f7f7f7, SImode));

to generate the whole const_vector.
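
With that change and the constraints dropped, the expander could look
roughly like the sketch below.  This is an untested illustration, not
part of the posted patch; replacing the explicit const_vector with a
single (match_dup 3) is an assumption about how the two suggestions
combine.

```
;; Untested sketch: no constraint strings in the expander, and operand 3
;; is assumed to carry the whole V4SI const_vector.
(define_expand "ssse3_pshufbv8qi3"
  [(parallel
    [(set (match_operand:V8QI 0 "register_operand")
          (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
                        (match_operand:V8QI 2 "register_mmxmem_operand")
                        (match_dup 3)]
                       UNSPEC_PSHUFB))
     (clobber (match_scratch:V4SI 4))])]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
{
  /* Build the full mask vector once, instead of four match_dups.  */
  operands[3] = ix86_build_const_vector (V4SImode, true,
                                         gen_int_mode (0xf7f7f7f7, SImode));
})
```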

Uros.

> +})
> +
> +(define_insn_and_split "*ssse3_pshufbv8qi3"
>    [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
>         (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
> -                     (match_operand:V8QI 2 "register_mmxmem_operand" "ym,x,Yv")]
> +                     (match_operand:V8QI 2 "register_mmxmem_operand" "ym,x,Yv")
> +                     (match_operand:V4SI 4 "register_or_const_vector_operand"
> +                                         "i,3,3")]
>                      UNSPEC_PSHUFB))
>     (clobber (match_scratch:V4SI 3 "=X,&x,&Yv"))]
>    "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
> @@ -17172,8 +17188,7 @@ (define_insn_and_split "ssse3_pshufbv8qi3"
>     #"
>    "TARGET_SSSE3 && reload_completed
>     && SSE_REGNO_P (REGNO (operands[0]))"
> -  [(set (match_dup 3) (match_dup 5))
> -   (set (match_dup 3)
> +  [(set (match_dup 3)
>         (and:V4SI (match_dup 3) (match_dup 2)))
>     (set (match_dup 0)
>         (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]
> @@ -17188,9 +17203,6 @@ (define_insn_and_split "ssse3_pshufbv8qi3"
>                                 GET_MODE (operands[2]));
>    operands[4] = lowpart_subreg (V16QImode, operands[3],
>                                 GET_MODE (operands[3]));
> -  rtx vec_const = ix86_build_const_vector (V4SImode, true,
> -                                          gen_int_mode (0xf7f7f7f7, SImode));
> -  operands[5] = force_const_mem (V4SImode, vec_const);
>  }
>    [(set_attr "mmx_isa" "native,sse_noavx,avx")
>     (set_attr "prefix_extra" "1")
>
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>    Free Software Activist         GNU Toolchain Engineer
>         Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
