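Many x86 pmovzx/pmovsx instructions with memory operands are modeled
incorrectly. For example, the pattern below describes the memory source
as a full V16QI, although the instruction reads only 8 bytes: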
(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]
should be defined for memory operands as:
(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (match_operand:V8QI 1 "memory_operand" "m,m,m")))]
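To illustrate why the V8QI memory form is the accurate one, here is a
minimal scalar sketch in C of what the byte-to-word variants compute when
the source is in memory. It is not part of the patches; the function names
model_pmovzxbw_mem and model_pmovsxbw_mem are hypothetical and exist only
for this illustration. The point is that exactly 8 source bytes are
accessed, never 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of pmovzxbw with a memory source:
   read exactly 8 bytes and zero-extend each one to 16 bits.  */
void
model_pmovzxbw_mem (uint16_t dst[8], const uint8_t src[8])
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < 8; i++)
    dst[i] = src[i];		/* unsigned widening = zero extension */
}

/* Hypothetical scalar model of pmovsxbw with a memory source:
   read exactly 8 bytes and sign-extend each one to 16 bits.  */
void
model_pmovsxbw_mem (int16_t dst[8], const int8_t src[8])
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < 8; i++)
    dst[i] = src[i];		/* signed widening = sign extension */
}
```

A V16QI memory operand would instead describe a 16-byte access, which is
wider than what the instruction performs.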
This set of patches updates them to:
(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (match_operand:V16QI 1 "nonimmediate_operand" "Yr,*x,v")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]
(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (match_operand:V8QI 1 "subreg_memory_operand" "m,m,m")))]
with a splitter:
(define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (subreg:V16QI
              (vec_concat:V2DI
                (match_operand:DI 1 "memory_operand" "m,*m,m")
                (const_int 0)) 0)
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]
  "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
  "#"
  "&& can_create_pseudo_p ()"
  [(set (match_dup 0) (match_dup 1))]
{
  operands[1] = gen_rtx_<CODE> (V8HImode,
                                gen_rtx_SUBREG (V8QImode,
                                                operands[1], 0));
})
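As a rough check on the reasoning behind the splitter, the sketch below
models both sides of the split in scalar C for the zero-extension case.
It is an illustration under assumptions, not code from the patches: the
names model_matched_form and model_split_form are hypothetical, and the
byte layout assumes x86 little-endian order, where the DI value occupies
bytes 0..7 of the V16QI view.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the matched RTL: an 8-byte load concatenated
   with zero, viewed as 16 bytes, then elements 0..7 are selected and
   zero-extended to 16 bits.  */
void
model_matched_form (uint16_t dst[8], const uint8_t *mem)
{
  uint8_t v16[16];

  assert (dst != NULL && mem != NULL);
  memcpy (v16, mem, 8);		/* low half: the DI load */
  memset (v16 + 8, 0, 8);	/* high half: (const_int 0) */
  for (int i = 0; i < 8; i++)
    dst[i] = v16[i];		/* vec_select 0..7, then extend */
}

/* Hypothetical model of the split result: extend the 8 bytes in
   memory directly, without building the 16-byte vector.  */
void
model_split_form (uint16_t dst[8], const uint8_t *mem)
{
  assert (dst != NULL && mem != NULL);
  for (int i = 0; i < 8; i++)
    dst[i] = mem[i];
}
```

Both functions yield the same result for any 8 input bytes, because the
selected elements never come from the zero-filled upper half.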
It also contains a patch to update apply_subst_iterator to handle
define_insn_and_split, since the new splitter uses the <mask_name>
subst attribute.
H.J. Lu (2):
apply_subst_iterator: Handle define_insn_and_split
x86: Add pmovzx/pmovsx patterns with memory operands
gcc/config/i386/predicates.md | 30 ++
gcc/config/i386/sse.md | 323 ++++++++++++++++++++-
gcc/read-rtl.c | 6 +-
gcc/testsuite/gcc.target/i386/pr87317-1.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-10.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-11.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-12.c | 22 ++
gcc/testsuite/gcc.target/i386/pr87317-13.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-2.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-3.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-4.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-5.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-6.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-7.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-8.c | 14 +
gcc/testsuite/gcc.target/i386/pr87317-9.c | 14 +
16 files changed, 535 insertions(+), 14 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c
--
2.17.2