On Mon, May 29, 2023 at 8:17 PM Roger Sayle <[email protected]> wrote:
>
>
> This is my proposed minimal fix for PR target/109973 (hopefully suitable
> for backporting) that follows Jakub Jelinek's suggestion that we introduce
> CCZmode and CCCmode variants of ptest and vptest, so that the i386
> backend treats [v]ptest instructions similarly to testl instructions;
> using different CCmodes to indicate which condition flags are desired,
> and then relying on the RTL cmpelim pass to eliminate redundant tests.
>
> This conveniently matches Intel's intrinsics, that provide different
> functions for retrieving different flags, _mm_testz_si128 tests the
> Z flag, _mm_testc_si128 tests the carry flag. Currently we use the
> same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> optimization is only valid when the ptest/vptest instruction is used to
> set/test the Z flag.
>
> The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> currently COMPARE-centric and not able to merge the ptests from expressions
> such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> known issue, PR target/80040. I've some follow-up patches to improve
> things, but this first patch fixes the wrong-code regression, replacing
> it with a rare missed-optimization (hopefully suitable for GCC 13).
>
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't. Given that
> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> CCmode
> in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?
It is actually the other way. CCZmode should NOT be allowed for CCmode
in ix86_match_ccmode. When CCmode is requested, we don't assume
anything about FLAGS bits, so we expect all bits to be valid. CCZmode
implies only Z bit, and should be compatible only with itself. So, the
"break;" is in the wrong place, it should be before E_CCZmode.
Uros.
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-05-29 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> PR targt/109973
> * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> CODE_for_sse4_1_ptestzv2di.
> (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> when expanding UNSPEC_PTEST to compare against zero.
> * config/i386/i386-features.cc (scalar_chain::convert_compare):
> Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> result.
> (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> an UNSPEC, in addition to a COMPARE. Consider CCCmode to be a form
> of CCmode.
> * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
> Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
> (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
> (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
> current behavior.
> (*ptest<mode>_and): Specify CCZ to only perform this optimization
> when only the Z flag is required.
>
> gcc/testsuite/ChangeLog
> PR targt/109973
> * gcc.target/i386/pr109973-1.c: New test case.
> * gcc.target/i386/pr109973-2.c: Likewise.
>
>
> Thanks,
> Roger
> --
>