On Tue, May 30, 2023 at 9:39 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Mon, May 29, 2023 at 8:17 PM Roger Sayle <ro...@nextmovesoftware.com> > wrote: > > > > > > This is my proposed minimal fix for PR target/109973 (hopefully suitable > > for backporting) that follows Jakub Jelinek's suggestion that we introduce > > CCZmode and CCCmode variants of ptest and vptest, so that the i386 > > backend treats [v]ptest instructions similarly to testl instructions; > > using different CCmodes to indicate which condition flags are desired, > > and then relying on the RTL cmpelim pass to eliminate redundant tests. > > > > This conveniently matches Intel's intrinsics, that provide different > > functions for retrieving different flags, _mm_testz_si128 tests the > > Z flag, _mm_testc_si128 tests the carry flag. Currently we use the > > same instruction (pattern) for both, and unfortunately the *ptest<mode>_and > > optimization is only valid when the ptest/vptest instruction is used to > > set/test the Z flag. > > > > The downside, as predicted by Jakub, is that GCC's cmpelim pass is > > currently COMPARE-centric and not able to merge the ptests from expressions > > such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a > > known issue, PR target/80040. I've some follow-up patches to improve > > things, but this first patch fixes the wrong-code regression, replacing > > it with a rare missed-optimization (hopefully suitable for GCC 13). > > > > The only change that was unanticipated was the tweak to ix86_match_ccmode. > > Oddly, CCZmode is allowable for CCmode, but CCCmode isn't. Given that > > CCZmode means just the Z flag, CCCmode means just the C flag, and > > CCmode means all the flags, I'm guessing this asymmetry is unintentional. > > Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or > > CCmode > > in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to > > re-use ix86_match_ccmode? > > It is actually the other way. CCZmode should NOT be allowed for CCmode > in ix86_match_ccmode. When CCmode is requested, we don't assume > anything about FLAGS bits, so we expect all bits to be valid. CCZmode > implies only Z bit, and should be compatible only with itself. So, the > "break;" is in the wrong place, it should be before E_CCZmode.
Hm, but PTEST is the *PRODUCER* of flags, not the consumer... So, the whole picture should be like this: (define_insn "*cmp<mode>_ccno_1" [(set (reg FLAGS_REG) (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>,?m<r>") (match_operand:SWI 1 "const0_operand")))] "ix86_match_ccmode (insn, CCNOmode)" The above means that the compare PROVIDES all bits, but O is guaranteed to be zero. (define_insn "*cmp<mode>_1" [(set (reg FLAGS_REG) (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>m,<r>") (match_operand:SWI 1 "<general_operand>" "<r><i>,<r><m>")))] "ix86_match_ccmode (insn, CCmode)" The above means that compare PROVIDES all bits. +(define_expand "<sse4_1>_ptest<mode>" + [(set (reg:CC FLAGS_REG) + (unspec:CC [(match_operand:V_AVX 0 "register_operand") + (match_operand:V_AVX 1 "vector_operand")] + UNSPEC_PTEST))] + "TARGET_SSE4_1") This is not true, PTEST does not provide all FLAGS bits in a general sense. So, I think your original patch is OK, but please introduce the ix86_match_ptest_ccmode function instead of reusing ix86_match_ccmode. Uros. > > Uros. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > > > > > 2023-05-29 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR targt/109973 > > * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new > > CODE_for_sse4_1_ptestzv2di. > > (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di. > > (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di. > > (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di. > > * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode > > when expanding UNSPEC_PTEST to compare against zero. > > * config/i386/i386-features.cc (scalar_chain::convert_compare): > > Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons. > > (general_scalar_chain::convert_insn): Use CCZmode for COMPARE > > result. > > (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result. > > * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be > > an UNSPEC, in addition to a COMPARE. Consider CCCmode to be a form > > of CCmode. > > * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK > > to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ. > > (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn. > > Remove ":CC" flags specification, and use ix86_match_ccmode instead. > > (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ. > > (<sse4_1>_ptestc<mode>): New define_expand to specify CCC. > > (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the > > current behavior. > > (*ptest<mode>_and): Specify CCZ to only perform this optimization > > when only the Z flag is required. > > > > gcc/testsuite/ChangeLog > > PR targt/109973 > > * gcc.target/i386/pr109973-1.c: New test case. > > * gcc.target/i386/pr109973-2.c: Likewise. > > > > > > Thanks, > > Roger > > -- > >