On Tue, May 30, 2023 at 9:39 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, May 29, 2023 at 8:17 PM Roger Sayle <ro...@nextmovesoftware.com> 
> wrote:
> >
> >
> > This is my proposed minimal fix for PR target/109973 (hopefully suitable
> > for backporting) that follows Jakub Jelinek's suggestion that we introduce
> > CCZmode and CCCmode variants of ptest and vptest, so that the i386
> > backend treats [v]ptest instructions similarly to testl instructions;
> > using different CCmodes to indicate which condition flags are desired,
> > and then relying on the RTL cmpelim pass to eliminate redundant tests.
> >
> > This conveniently matches Intel's intrinsics, that provide different
> > functions for retrieving different flags, _mm_testz_si128 tests the
> > Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> > same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> > optimization is only valid when the ptest/vptest instruction is used to
> > set/test the Z flag.
> >
> > The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> > currently COMPARE-centric and not able to merge the ptests from expressions
> > such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> > known issue, PR target/80040.  I've some follow-up patches to improve
> > things, but this first patch fixes the wrong-code regression, replacing
> > it with a rare missed-optimization (hopefully suitable for GCC 13).
> >
> > The only change that was unanticipated was the tweak to ix86_match_ccmode.
> > Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> > CCZmode means just the Z flag, CCCmode means just the C flag, and
> > CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> > Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> > CCmode
> > in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> > re-use ix86_match_ccmode?
>
> It is actually the other way. CCZmode should NOT be allowed for CCmode
> in ix86_match_ccmode. When CCmode is requested, we don't assume
> anything about FLAGS bits, so we expect all bits to be valid. CCZmode
> implies only Z bit, and should be compatible only with itself. So, the
> "break;" is in the wrong place, it should be before E_CCZmode.

Hm, but PTEST is the *PRODUCER* of flags, not the consumer...

So, the whole picture should be like this:

(define_insn "*cmp<mode>_ccno_1"
  [(set (reg FLAGS_REG)
    (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>,?m<r>")
         (match_operand:SWI 1 "const0_operand")))]
  "ix86_match_ccmode (insn, CCNOmode)"

The above means that the compare PROVIDES all bits, but O is
guaranteed to be zero.

(define_insn "*cmp<mode>_1"
  [(set (reg FLAGS_REG)
    (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>m,<r>")
         (match_operand:SWI 1 "<general_operand>" "<r><i>,<r><m>")))]
  "ix86_match_ccmode (insn, CCmode)"

The above means that compare PROVIDES all bits.

+(define_expand "<sse4_1>_ptest<mode>"
+  [(set (reg:CC FLAGS_REG)
+ (unspec:CC [(match_operand:V_AVX 0 "register_operand")
+    (match_operand:V_AVX 1 "vector_operand")]
+   UNSPEC_PTEST))]
+  "TARGET_SSE4_1")

This is not true, PTEST does not provide all FLAGS bits in a general sense.

So, I think your original patch is OK, but please introduce the
ix86_match_ptest_ccmode function instead of reusing ix86_match_ccmode.

Uros.


>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-05-29  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR targt/109973
> >         * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> >         CODE_for_sse4_1_ptestzv2di.
> >         (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> >         (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> >         (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> >         * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> >         when expanding UNSPEC_PTEST to compare against zero.
> >         * config/i386/i386-features.cc (scalar_chain::convert_compare):
> >         Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> >         (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > result.
> >         (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> >         * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> >         an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
> >         of CCmode.
> >         * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> >         to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> >         (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
> >         Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> >         (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
> >         (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
> >         (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
> >         current behavior.
> >         (*ptest<mode>_and): Specify CCZ to only perform this optimization
> >         when only the Z flag is required.
> >
> > gcc/testsuite/ChangeLog
> >         PR targt/109973
> >         * gcc.target/i386/pr109973-1.c: New test case.
> >         * gcc.target/i386/pr109973-2.c: Likewise.
> >
> >
> > Thanks,
> > Roger
> > --
> >

Reply via email to