On Mon, May 29, 2023 at 8:17 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This is my proposed minimal fix for PR target/109973 (hopefully suitable
> for backporting) that follows Jakub Jelinek's suggestion that we introduce
> CCZmode and CCCmode variants of ptest and vptest, so that the i386
> backend treats [v]ptest instructions similarly to testl instructions;
> using different CCmodes to indicate which condition flags are desired,
> and then relying on the RTL cmpelim pass to eliminate redundant tests.
>
> This conveniently matches Intel's intrinsics, that provide different
> functions for retrieving different flags, _mm_testz_si128 tests the
> Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> optimization is only valid when the ptest/vptest instruction is used to
> set/test the Z flag.
>
> The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> currently COMPARE-centric and not able to merge the ptests from expressions
> such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> known issue, PR target/80040.  I've some follow-up patches to improve
> things, but this first patch fixes the wrong-code regression, replacing
> it with a rare missed-optimization (hopefully suitable for GCC 13).
>
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> CCmode
> in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?

It is actually the other way. CCZmode should NOT be allowed for CCmode
in ix86_match_ccmode. When CCmode is requested, we don't assume
anything about FLAGS bits, so we expect all bits to be valid. CCZmode
implies only Z bit, and should be compatible only with itself. So, the
"break;" is in the wrong place, it should be before E_CCZmode.

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-05-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR targt/109973
>         * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
>         CODE_for_sse4_1_ptestzv2di.
>         (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
>         (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
>         (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
>         * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
>         when expanding UNSPEC_PTEST to compare against zero.
>         * config/i386/i386-features.cc (scalar_chain::convert_compare):
>         Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
>         (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> result.
>         (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
>         * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
>         an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
>         of CCmode.
>         * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
>         to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
>         (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
>         Remove ":CC" flags specification, and use ix86_match_ccmode instead.
>         (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
>         (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
>         (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
>         current behavior.
>         (*ptest<mode>_and): Specify CCZ to only perform this optimization
>         when only the Z flag is required.
>
> gcc/testsuite/ChangeLog
>         PR targt/109973
>         * gcc.target/i386/pr109973-1.c: New test case.
>         * gcc.target/i386/pr109973-2.c: Likewise.
>
>
> Thanks,
> Roger
> --
>

Reply via email to