On Tue, May 30, 2023 at 9:39 AM Uros Bizjak <[email protected]> wrote:
>
> On Mon, May 29, 2023 at 8:17 PM Roger Sayle <[email protected]>
> wrote:
> >
> >
> > This is my proposed minimal fix for PR target/109973 (hopefully suitable
> > for backporting) that follows Jakub Jelinek's suggestion that we introduce
> > CCZmode and CCCmode variants of ptest and vptest, so that the i386
> > backend treats [v]ptest instructions similarly to testl instructions;
> > using different CCmodes to indicate which condition flags are desired,
> > and then relying on the RTL cmpelim pass to eliminate redundant tests.
> >
> > This conveniently matches Intel's intrinsics, that provide different
> > functions for retrieving different flags, _mm_testz_si128 tests the
> > Z flag, _mm_testc_si128 tests the carry flag. Currently we use the
> > same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> > optimization is only valid when the ptest/vptest instruction is used to
> > set/test the Z flag.
> >
> > The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> > currently COMPARE-centric and not able to merge the ptests from expressions
> > such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> > known issue, PR target/80040. I've some follow-up patches to improve
> > things, but this first patch fixes the wrong-code regression, replacing
> > it with a rare missed-optimization (hopefully suitable for GCC 13).
> >
> > The only change that was unanticipated was the tweak to ix86_match_ccmode.
> > Oddly, CCZmode is allowable for CCmode, but CCCmode isn't. Given that
> > CCZmode means just the Z flag, CCCmode means just the C flag, and
> > CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> > Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> > CCmode
> > in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> > re-use ix86_match_ccmode?
>
> It is actually the other way. CCZmode should NOT be allowed for CCmode
> in ix86_match_ccmode. When CCmode is requested, we don't assume
> anything about FLAGS bits, so we expect all bits to be valid. CCZmode
> implies only Z bit, and should be compatible only with itself. So, the
> "break;" is in the wrong place, it should be before E_CCZmode.
Hm, but PTEST is the *PRODUCER* of flags, not the consumer...
So, the whole picture should be like this:
(define_insn "*cmp<mode>_ccno_1"
[(set (reg FLAGS_REG)
(compare (match_operand:SWI 0 "nonimmediate_operand" "<r>,?m<r>")
(match_operand:SWI 1 "const0_operand")))]
"ix86_match_ccmode (insn, CCNOmode)"
The above means that the compare PROVIDES all bits, but O is
guaranteed to be zero.
(define_insn "*cmp<mode>_1"
[(set (reg FLAGS_REG)
(compare (match_operand:SWI 0 "nonimmediate_operand" "<r>m,<r>")
(match_operand:SWI 1 "<general_operand>" "<r><i>,<r><m>")))]
"ix86_match_ccmode (insn, CCmode)"
The above means that compare PROVIDES all bits.
+(define_expand "<sse4_1>_ptest<mode>"
+ [(set (reg:CC FLAGS_REG)
+ (unspec:CC [(match_operand:V_AVX 0 "register_operand")
+ (match_operand:V_AVX 1 "vector_operand")]
+ UNSPEC_PTEST))]
+ "TARGET_SSE4_1")
This is not true, PTEST does not provide all FLAGS bits in a general sense.
So, I think your original patch is OK, but please introduce the
ix86_match_ptest_ccmode function instead of reusing ix86_match_ccmode.
Uros.
>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
> >
> >
> > 2023-05-29 Roger Sayle <[email protected]>
> >
> > gcc/ChangeLog
> > PR targt/109973
> > * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> > CODE_for_sse4_1_ptestzv2di.
> > (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> > (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> > (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> > * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> > when expanding UNSPEC_PTEST to compare against zero.
> > * config/i386/i386-features.cc (scalar_chain::convert_compare):
> > Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> > (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > result.
> > (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> > * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> > an UNSPEC, in addition to a COMPARE. Consider CCCmode to be a form
> > of CCmode.
> > * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> > to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> > (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
> > Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> > (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
> > (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
> > (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
> > current behavior.
> > (*ptest<mode>_and): Specify CCZ to only perform this optimization
> > when only the Z flag is required.
> >
> > gcc/testsuite/ChangeLog
> > PR targt/109973
> > * gcc.target/i386/pr109973-1.c: New test case.
> > * gcc.target/i386/pr109973-2.c: Likewise.
> >
> >
> > Thanks,
> > Roger
> > --
> >