[llvm-branch-commits] [llvm] TTI: Check legalization cost of fptosi_sat/fptoui_sat nodes (PR #100521)
RKSimon wrote: I'm not sure whether it's better to focus on removing some of the custom lowering (and improving TargetLowering::expandFP_TO_INT_SAT) or to just add better cost table support. https://github.com/llvm/llvm-project/pull/100521 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] TTI: Check legalization cost of min/max ISD nodes (PR #100514)
@@ -42,75 +42,50 @@ define i32 @umax(i32 %arg) { ; FAST-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I64 = call <2 x i64> @llvm.umax.v2i64(<2 x i64> undef, <2 x i64> undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I64 = call <4 x i64> @llvm.umax.v4i64(<4 x i64> undef, <4 x i64> undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I64 = call <8 x i64> @llvm.umax.v8i64(<8 x i64> undef, <8 x i64> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %I32 = call i32 @llvm.umax.i32(i32 undef, i32 undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = call i32 @llvm.umax.i32(i32 undef, i32 undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = call <2 x i32> @llvm.umax.v2i32(<2 x i32> undef, <2 x i32> undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I32 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> undef, <4 x i32> undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I32 = call <8 x i32> @llvm.umax.v8i32(<8 x i32> undef, <8 x i32> undef) ; FAST-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V16I32 = call <16 x i32> @llvm.umax.v16i32(<16 x i32> undef, <16 x i32> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %I16 = call i16 @llvm.umax.i16(i16 undef, i16 undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = call <2 x i16> @llvm.umax.v2i16(<2 x i16> undef, <2 x i16> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4I16 = call <4 x i16> @llvm.umax.v4i16(<4 x i16> undef, <4 x i16> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V8I16 = call <8 x i16> @llvm.umax.v8i16(<8 x i16> undef, <8 x i16> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %V16I16 = call <16 x i16> 
@llvm.umax.v16i16(<16 x i16> undef, <16 x i16> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 126 for instruction: %V32I16 = call <32 x i16> @llvm.umax.v32i16(<32 x i16> undef, <32 x i16> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %I8 = call i8 @llvm.umax.i8(i8 undef, i8 undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2I8 = call <2 x i8> @llvm.umax.v2i8(<2 x i8> undef, <2 x i8> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I8 = call <4 x i8> @llvm.umax.v4i8(<4 x i8> undef, <4 x i8> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V8I8 = call <8 x i8> @llvm.umax.v8i8(<8 x i8> undef, <8 x i8> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V16I8 = call <16 x i8> @llvm.umax.v16i8(<16 x i8> undef, <16 x i8> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %V32I8 = call <32 x i8> @llvm.umax.v32i8(<32 x i8> undef, <32 x i8> undef) -; FAST-NEXT: Cost Model: Found an estimated cost of 256 for instruction: %V64I8 = call <64 x i8> @llvm.umax.v64i8(<64 x i8> undef, <64 x i8> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = call i16 @llvm.umax.i16(i16 undef, i16 undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = call <2 x i16> @llvm.umax.v2i16(<2 x i16> undef, <2 x i16> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4I16 = call <4 x i16> @llvm.umax.v4i16(<4 x i16> undef, <4 x i16> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = call <8 x i16> @llvm.umax.v8i16(<8 x i16> undef, <8 x i16> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16I16 = call <16 x i16> @llvm.umax.v16i16(<16 x i16> undef, <16 x i16> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V32I16 
= call <32 x i16> @llvm.umax.v32i16(<32 x i16> undef, <32 x i16> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = call i8 @llvm.umax.i8(i8 undef, i8 undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I8 = call <2 x i8> @llvm.umax.v2i8(<2 x i8> undef, <2 x i8> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I8 = call <4 x i8> @llvm.umax.v4i8(<4 x i8> undef, <4 x i8> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I8 = call <8 x i8> @llvm.umax.v8i8(<8 x i8> undef, <8 x i8> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V16I8 = call <16 x i8> @llvm.umax.v16i8(<16 x i8> undef, <16 x i8> undef) +; FAST-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V32I8 = call <32 x i8> @llvm.umax.v32i8(<32 x i8> undef, <32 x i8>
[llvm-branch-commits] [CodeGen] Add dump() to MachineTraceMetrics.h (PR #97799)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/97799
[llvm-branch-commits] [llvm] DAG: Call SimplifyDemandedBits on copysign value operand (PR #97180)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/97180
[llvm-branch-commits] [llvm] release/18.x: [DAGCombiner] In mergeTruncStore, make sure we aren't storing shifted in bits. (#90939) (PR #91038)
RKSimon wrote: > > @AtariDreams I've noticed you've filed a lot of backport requests. How are > > you choosing which fixes to backport? Is there a specific use case you care > > about? > > There's a particular LLVM miscompile bug in WebKit I'm trying to figure out. > It's been there since 2019. The backports are literally just about avoiding > miscompilations. @AtariDreams Has the bug disappeared in llvm trunk, and do you think a recent commit has fixed/hidden it? Has this bug been reported to either WebKit or LLVM so that we can track it? Have you been able to confirm whether it's an LLVM bug or UB in WebKit? https://github.com/llvm/llvm-project/pull/91038
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91425)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/91425
[llvm-branch-commits] [llvm] release/18.x: [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) (PR #91161)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/91161
[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) (PR #91118)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/91118
[llvm-branch-commits] [llvm] release/18.x: [X86][EVEX512] Check hasEVEX512 for canExtendTo512DQ (#90390) (PR #90422)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/90422
[llvm-branch-commits] [clang] release/18.x [X86_64] fix SSE type error in vaarg (PR #86698)
RKSimon wrote: What are the current rules on cherry-picks for old bugs? AFAICT this patch wasn't fixing a bug introduced in the 17.x-18.x development window. https://github.com/llvm/llvm-project/pull/86698
[llvm-branch-commits] [llvm] release/18.x: [X86] Resolve FIXME: Enable PC relative calls on Windows (PR #84185)
RKSimon wrote: Now that 18.1 has been released, we shouldn't be merging anything that isn't a fix for a regression from 17.x. I've tried to find the release policy covering whether 18.2 now allows further merges, but I can't find anything. https://github.com/llvm/llvm-project/pull/84185
[llvm-branch-commits] [llvm] release/18.x: [RISCV] Add test for aliasing miscompile fixed by #83017. NFC (PR #83856)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/83856
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)
RKSimon wrote: @davemgreen Are there further patches for scalable types coming or is this just to address the ~UINT64_T(0) bugfix? https://github.com/llvm/llvm-project/pull/83848
[llvm-branch-commits] [llvm] Backport [DAGCombine] Fix multi-use miscompile in load combine (#81586) (PR #81633)
https://github.com/RKSimon approved this pull request. LGTM for backport https://github.com/llvm/llvm-project/pull/81633
[llvm-branch-commits] [llvm] [llvm-exegesis] Add additional validation counters (PR #76788)
RKSimon wrote: Thanks, no more comments from me, but an exegesis owner should review the rest. https://github.com/llvm/llvm-project/pull/76788
[llvm-branch-commits] [llvm] [llvm-exegesis] Add additional validation counters (PR #76788)
@@ -121,7 +121,12 @@ def HaswellPfmCounters : ProcPfmCounters { PfmIssueCounter<"HWPort7", "uops_executed_port:port_7"> ]; let ValidationCounters = [ -PfmValidationCounter +PfmValidationCounter, +PfmValidationCounter, +PfmValidationCounter, +PfmValidationCounter, +PfmValidationCounter, +PfmValidationCounter ]; RKSimon wrote: Could we pull this out into a default list instead of duplicating it? `let ValidationCounters = DefaultX86ValidationCounters` or something? https://github.com/llvm/llvm-project/pull/76788
[llvm-branch-commits] [llvm] abc60e9 - [X86] vec_fabs.ll - add SSE test coverage
Author: Simon Pilgrim Date: 2023-11-30T10:07:00Z New Revision: abc60e9808820c3f6614e6815909d43ed085460e URL: https://github.com/llvm/llvm-project/commit/abc60e9808820c3f6614e6815909d43ed085460e DIFF: https://github.com/llvm/llvm-project/commit/abc60e9808820c3f6614e6815909d43ed085460e.diff LOG: [X86] vec_fabs.ll - add SSE test coverage Added: Modified: llvm/test/CodeGen/X86/vec_fabs.ll Removed: diff --git a/llvm/test/CodeGen/X86/vec_fabs.ll b/llvm/test/CodeGen/X86/vec_fabs.ll index ec02dfda30c8502..c17341c2c8b077e 100644 --- a/llvm/test/CodeGen/X86/vec_fabs.ll +++ b/llvm/test/CodeGen/X86/vec_fabs.ll @@ -1,24 +1,31 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX1 -; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX2 -; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=X86,X86-AVX512,X86-AVX512VL -; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512fp16 | FileCheck %s --check-prefixes=X86,X86-AVX512,X86-AVX512FP16 -; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X86,X86-AVX512,X86-AVX512VLDQ -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX1 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX2 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=X64,X64-AVX512,X64-AVX512VL -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512fp16 | FileCheck %s --check-prefixes=X64,X64-AVX512,X64-AVX512FP16 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64,X64-AVX512,X64-AVX512VLDQ +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | 
FileCheck %s --check-prefixes=X86,X86-SSE +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX1OR2,X86-AVX1 +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX1OR2,X86-AVX2 +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX512,X86-AVX512VL +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512fp16 | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX512,X86-AVX512FP16 +; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X86,X86-AVX,X86-AVX512,X86-AVX512VLDQ +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=X64,X64-SSE +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX1OR2,X64-AVX1 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX1OR2,X64-AVX2 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vl | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX512,X64-AVX512VL +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512fp16 | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX512,X64-AVX512FP16 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefixes=X64,X64-AVX,X64-AVX512,X64-AVX512VLDQ ; ; 128-bit Vectors ; -define <2 x double> @fabs_v2f64(<2 x double> %p) { -; X86-AVX-LABEL: fabs_v2f64: -; X86-AVX: # %bb.0: -; X86-AVX-NEXT:vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0 -; X86-AVX-NEXT:retl +define <2 x double> @fabs_v2f64(<2 x double> %p) nounwind { +; X86-SSE-LABEL: fabs_v2f64: +; X86-SSE: # %bb.0: +; X86-SSE-NEXT:andps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0 +; X86-SSE-NEXT:retl +; +; X86-AVX1OR2-LABEL: fabs_v2f64: +; X86-AVX1OR2: # %bb.0: +; X86-AVX1OR2-NEXT:vandps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, 
%xmm0 +; X86-AVX1OR2-NEXT:retl ; ; X86-AVX512VL-LABEL: fabs_v2f64: ; X86-AVX512VL: # %bb.0: @@ -35,10 +42,15 @@ define <2 x double> @fabs_v2f64(<2 x double> %p) { ; X86-AVX512VLDQ-NEXT:vandpd {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0 ; X86-AVX512VLDQ-NEXT:retl ; -; X64-AVX-LABEL: fabs_v2f64: -; X64-AVX: # %bb.0: -; X64-AVX-NEXT:vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 -; X64-AVX-NEXT:retq +; X64-SSE-LABEL: fabs_v2f64: +; X64-SSE: # %bb.0: +; X64-SSE-NEXT:andps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0 +; X64-SSE-NEXT:retq +; +; X64-AVX1OR2-LABEL: fabs_v2f64: +; X64-AVX1OR2: # %bb.0: +; X64-AVX1OR2-NEXT:vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0 +; X64-AVX1OR2-NEXT:retq ; ; X64-AVX512VL-LABEL: fabs_v2f64: ; X64-AVX512VL:
[llvm-branch-commits] [llvm] 48e1434 - [X86] Move combineToExtendBoolVectorInReg before the select combines. NFC.
Author: Simon Pilgrim Date: 2022-02-11T16:51:46Z New Revision: 48e1434a0a77852f58c1617123f228f1069ba775 URL: https://github.com/llvm/llvm-project/commit/48e1434a0a77852f58c1617123f228f1069ba775 DIFF: https://github.com/llvm/llvm-project/commit/48e1434a0a77852f58c1617123f228f1069ba775.diff LOG: [X86] Move combineToExtendBoolVectorInReg before the select combines. NFC. Avoid the need for a forward declaration. Cleanup prep for Issue #53760 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 84c7ff58ae9b0..e91f68425522f 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -43123,6 +43123,104 @@ static SDValue combineExtractVectorElt(SDNode *N, SelectionDAG , return SDValue(); } +// Convert (vXiY *ext(vXi1 bitcast(iX))) to extend_in_reg(broadcast(iX)). +// This is more or less the reverse of combineBitcastvxi1. +static SDValue combineToExtendBoolVectorInReg( +unsigned Opcode, const SDLoc , EVT VT, SDValue N0, SelectionDAG , +TargetLowering::DAGCombinerInfo , const X86Subtarget ) { + if (Opcode != ISD::SIGN_EXTEND && Opcode != ISD::ZERO_EXTEND && + Opcode != ISD::ANY_EXTEND) +return SDValue(); + if (!DCI.isBeforeLegalizeOps()) +return SDValue(); + if (!Subtarget.hasSSE2() || Subtarget.hasAVX512()) +return SDValue(); + + EVT SVT = VT.getScalarType(); + EVT InSVT = N0.getValueType().getScalarType(); + unsigned EltSizeInBits = SVT.getSizeInBits(); + + // Input type must be extending a bool vector (bit-casted from a scalar + // integer) to legal integer types. 
+ if (!VT.isVector()) +return SDValue(); + if (SVT != MVT::i64 && SVT != MVT::i32 && SVT != MVT::i16 && SVT != MVT::i8) +return SDValue(); + if (InSVT != MVT::i1 || N0.getOpcode() != ISD::BITCAST) +return SDValue(); + + SDValue N00 = N0.getOperand(0); + EVT SclVT = N00.getValueType(); + if (!SclVT.isScalarInteger()) +return SDValue(); + + SDValue Vec; + SmallVector ShuffleMask; + unsigned NumElts = VT.getVectorNumElements(); + assert(NumElts == SclVT.getSizeInBits() && "Unexpected bool vector size"); + + // Broadcast the scalar integer to the vector elements. + if (NumElts > EltSizeInBits) { +// If the scalar integer is greater than the vector element size, then we +// must split it down into sub-sections for broadcasting. For example: +// i16 -> v16i8 (i16 -> v8i16 -> v16i8) with 2 sub-sections. +// i32 -> v32i8 (i32 -> v8i32 -> v32i8) with 4 sub-sections. +assert((NumElts % EltSizeInBits) == 0 && "Unexpected integer scale"); +unsigned Scale = NumElts / EltSizeInBits; +EVT BroadcastVT = EVT::getVectorVT(*DAG.getContext(), SclVT, EltSizeInBits); +Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, BroadcastVT, N00); +Vec = DAG.getBitcast(VT, Vec); + +for (unsigned i = 0; i != Scale; ++i) + ShuffleMask.append(EltSizeInBits, i); +Vec = DAG.getVectorShuffle(VT, DL, Vec, Vec, ShuffleMask); + } else if (Subtarget.hasAVX2() && NumElts < EltSizeInBits && + (SclVT == MVT::i8 || SclVT == MVT::i16 || SclVT == MVT::i32)) { +// If we have register broadcast instructions, use the scalar size as the +// element type for the shuffle. Then cast to the wider element type. The +// widened bits won't be used, and this might allow the use of a broadcast +// load. 
+assert((EltSizeInBits % NumElts) == 0 && "Unexpected integer scale"); +unsigned Scale = EltSizeInBits / NumElts; +EVT BroadcastVT = +EVT::getVectorVT(*DAG.getContext(), SclVT, NumElts * Scale); +Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, BroadcastVT, N00); +ShuffleMask.append(NumElts * Scale, 0); +Vec = DAG.getVectorShuffle(BroadcastVT, DL, Vec, Vec, ShuffleMask); +Vec = DAG.getBitcast(VT, Vec); + } else { +// For smaller scalar integers, we can simply any-extend it to the vector +// element size (we don't care about the upper bits) and broadcast it to all +// elements. +SDValue Scl = DAG.getAnyExtOrTrunc(N00, DL, SVT); +Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VT, Scl); +ShuffleMask.append(NumElts, 0); +Vec = DAG.getVectorShuffle(VT, DL, Vec, Vec, ShuffleMask); + } + + // Now, mask the relevant bit in each element. + SmallVector Bits; + for (unsigned i = 0; i != NumElts; ++i) { +int BitIdx = (i % EltSizeInBits); +APInt Bit = APInt::getBitsSet(EltSizeInBits, BitIdx, BitIdx + 1); +Bits.push_back(DAG.getConstant(Bit, DL, SVT)); + } + SDValue BitMask = DAG.getBuildVector(VT, DL, Bits); + Vec = DAG.getNode(ISD::AND, DL, VT, Vec, BitMask); + + // Compare against the bitmask and extend the result. + EVT CCVT = VT.changeVectorElementType(MVT::i1); + Vec = DAG.getSetCC(DL, CCVT, Vec, BitMask, ISD::SETEQ); + Vec =
[llvm-branch-commits] [llvm] 827d0c5 - [X86] combineToExtendBoolVectorInReg - use explicit arguments. NFC.
Author: Simon Pilgrim Date: 2022-02-11T16:40:29Z New Revision: 827d0c51be93c4b0bcbe43a6cbbcc0e65a8b9f58 URL: https://github.com/llvm/llvm-project/commit/827d0c51be93c4b0bcbe43a6cbbcc0e65a8b9f58 DIFF: https://github.com/llvm/llvm-project/commit/827d0c51be93c4b0bcbe43a6cbbcc0e65a8b9f58.diff LOG: [X86] combineToExtendBoolVectorInReg - use explicit arguments. NFC. Replace the *_EXTEND node with the raw operands, this will make it easier to use combineToExtendBoolVectorInReg for any boolvec extension combine. Cleanup prep for Issue #53760 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 53c00affd70e6..84c7ff58ae9b0 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -50422,11 +50422,9 @@ static SDValue combineToExtendCMOV(SDNode *Extend, SelectionDAG ) { // Convert (vXiY *ext(vXi1 bitcast(iX))) to extend_in_reg(broadcast(iX)). // This is more or less the reverse of combineBitcastvxi1. 
-static SDValue -combineToExtendBoolVectorInReg(SDNode *N, SelectionDAG , - TargetLowering::DAGCombinerInfo , - const X86Subtarget ) { - unsigned Opcode = N->getOpcode(); +static SDValue combineToExtendBoolVectorInReg( +unsigned Opcode, const SDLoc , EVT VT, SDValue N0, SelectionDAG , +TargetLowering::DAGCombinerInfo , const X86Subtarget ) { if (Opcode != ISD::SIGN_EXTEND && Opcode != ISD::ZERO_EXTEND && Opcode != ISD::ANY_EXTEND) return SDValue(); @@ -50435,8 +50433,6 @@ combineToExtendBoolVectorInReg(SDNode *N, SelectionDAG , if (!Subtarget.hasSSE2() || Subtarget.hasAVX512()) return SDValue(); - SDValue N0 = N->getOperand(0); - EVT VT = N->getValueType(0); EVT SVT = VT.getScalarType(); EVT InSVT = N0.getValueType().getScalarType(); unsigned EltSizeInBits = SVT.getSizeInBits(); @@ -50451,13 +50447,12 @@ combineToExtendBoolVectorInReg(SDNode *N, SelectionDAG , return SDValue(); SDValue N00 = N0.getOperand(0); - EVT SclVT = N0.getOperand(0).getValueType(); + EVT SclVT = N00.getValueType(); if (!SclVT.isScalarInteger()) return SDValue(); - SDLoc DL(N); SDValue Vec; - SmallVector ShuffleMask; + SmallVector ShuffleMask; unsigned NumElts = VT.getVectorNumElements(); assert(NumElts == SclVT.getSizeInBits() && "Unexpected bool vector size"); @@ -50603,7 +50598,8 @@ static SDValue combineSext(SDNode *N, SelectionDAG , if (SDValue V = combineExtSetcc(N, DAG, Subtarget)) return V; - if (SDValue V = combineToExtendBoolVectorInReg(N, DAG, DCI, Subtarget)) + if (SDValue V = combineToExtendBoolVectorInReg(N->getOpcode(), DL, VT, N0, + DAG, DCI, Subtarget)) return V; if (VT.isVector()) { @@ -50757,7 +50753,8 @@ static SDValue combineZext(SDNode *N, SelectionDAG , if (SDValue V = combineExtSetcc(N, DAG, Subtarget)) return V; - if (SDValue V = combineToExtendBoolVectorInReg(N, DAG, DCI, Subtarget)) + if (SDValue V = combineToExtendBoolVectorInReg(N->getOpcode(), dl, VT, N0, + DAG, DCI, Subtarget)) return V; if (VT.isVector()) ___ llvm-branch-commits mailing list 
llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 13f2aee - [X86][AVX] Generalize vperm2f128/vperm2i128 patterns to support all legal 256-bit vector types
Author: Simon Pilgrim Date: 2021-01-25T15:35:36Z New Revision: 13f2aee7831c9bec17006a6d401008df541a121d URL: https://github.com/llvm/llvm-project/commit/13f2aee7831c9bec17006a6d401008df541a121d DIFF: https://github.com/llvm/llvm-project/commit/13f2aee7831c9bec17006a6d401008df541a121d.diff LOG: [X86][AVX] Generalize vperm2f128/vperm2i128 patterns to support all legal 256-bit vector types Remove bitcasts to/from v4x64 types through vperm2f128/vperm2i128 ops to help improve shuffle combining and demanded vector elts folding. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86InstrSSE.td llvm/test/CodeGen/X86/haddsub-2.ll llvm/test/CodeGen/X86/masked_store_trunc.ll llvm/test/CodeGen/X86/var-permute-256.ll llvm/test/CodeGen/X86/vector-reduce-and-bool.ll llvm/test/CodeGen/X86/vector-reduce-or-bool.ll llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll llvm/test/CodeGen/X86/vector-trunc.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index ae73a32a5d9a..fc19800eda79 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35436,7 +35436,6 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, DL, 256); } -MVT ShuffleVT = (FloatDomain ? MVT::v4f64 : MVT::v4i64); if (Depth == 0 && Root.getOpcode() == X86ISD::VPERM2X128) return SDValue(); // Nothing to do! @@ -35449,12 +35448,9 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, unsigned PermMask = 0; PermMask |= ((BaseMask[0] < 0 ? 0x8 : (BaseMask[0] & 1)) << 0); PermMask |= ((BaseMask[1] < 0 ? 
0x8 : (BaseMask[1] & 1)) << 4); - - Res = CanonicalizeShuffleInput(ShuffleVT, V1); - Res = DAG.getNode(X86ISD::VPERM2X128, DL, ShuffleVT, Res, -DAG.getUNDEF(ShuffleVT), -DAG.getTargetConstant(PermMask, DL, MVT::i8)); - return DAG.getBitcast(RootVT, Res); + return DAG.getNode( + X86ISD::VPERM2X128, DL, RootVT, CanonicalizeShuffleInput(RootVT, V1), + DAG.getUNDEF(RootVT), DAG.getTargetConstant(PermMask, DL, MVT::i8)); } if (Depth == 0 && Root.getOpcode() == X86ISD::SHUF128) @@ -35470,14 +35466,12 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, unsigned PermMask = 0; PermMask |= ((BaseMask[0] & 3) << 0); PermMask |= ((BaseMask[1] & 3) << 4); - SDValue LHS = isInRange(BaseMask[0], 0, 2) ? V1 : V2; SDValue RHS = isInRange(BaseMask[1], 0, 2) ? V1 : V2; -Res = DAG.getNode(X86ISD::VPERM2X128, DL, ShuffleVT, - CanonicalizeShuffleInput(ShuffleVT, LHS), - CanonicalizeShuffleInput(ShuffleVT, RHS), +return DAG.getNode(X86ISD::VPERM2X128, DL, RootVT, + CanonicalizeShuffleInput(RootVT, LHS), + CanonicalizeShuffleInput(RootVT, RHS), DAG.getTargetConstant(PermMask, DL, MVT::i8)); -return DAG.getBitcast(RootVT, Res); } } } @@ -37323,11 +37317,26 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG , return SDValue(); } case X86ISD::VPERM2X128: { +// Fold vperm2x128(bitcast(x),bitcast(y),c) -> bitcast(vperm2x128(x,y,c)). +SDValue LHS = N->getOperand(0); +SDValue RHS = N->getOperand(1); +if (LHS.getOpcode() == ISD::BITCAST && +(RHS.getOpcode() == ISD::BITCAST || RHS.isUndef())) { + EVT SrcVT = LHS.getOperand(0).getValueType(); + if (RHS.isUndef() || SrcVT == RHS.getOperand(0).getValueType()) { +return DAG.getBitcast(VT, DAG.getNode(X86ISD::VPERM2X128, DL, SrcVT, + DAG.getBitcast(SrcVT, LHS), + DAG.getBitcast(SrcVT, RHS), + N->getOperand(2))); + } +} + +// Fold vperm2x128(op(),op()) -> op(vperm2x128(),vperm2x128()). 
if (SDValue Res = canonicalizeLaneShuffleWithRepeatedOps(N, DAG, DL)) -return Res; + return Res; -// Combine vperm2x128 subvector shuffle with an inner concat pattern. -// vperm2x128(concat(X,Y),concat(Z,W)) --> concat X,Y etc. +// Fold vperm2x128 subvector shuffle with an inner concat pattern. +// vperm2x128(concat(X,Y),concat(Z,W)) --> concat X,Y etc. auto FindSubVector128 = [&](unsigned Idx) { if (Idx > 3) return SDValue(); diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td index 071c638077b2..7cf555748c46 100644 --- a/llvm/lib/Target/X86/X86InstrSSE.td +++ b/llvm/lib/Target/X86/X86InstrSSE.td @@ -7287,16 +7287,12 @@
[llvm-branch-commits] [llvm] 821a51a - [X86][AVX] combineX86ShuffleChainWithExtract - widen to at least original root size. NFCI.
Author: Simon Pilgrim Date: 2021-01-25T13:45:37Z New Revision: 821a51a9cacfac7da8b34ccc0498d316471f1dbc URL: https://github.com/llvm/llvm-project/commit/821a51a9cacfac7da8b34ccc0498d316471f1dbc DIFF: https://github.com/llvm/llvm-project/commit/821a51a9cacfac7da8b34ccc0498d316471f1dbc.diff LOG: [X86][AVX] combineX86ShuffleChainWithExtract - widen to at least original root size. NFCI. We're relying on the source inputs for shuffle combining having already been widened to the root size (otherwise the offset logic falls over) - we're going to be supporting different sized shuffle inputs soon, so we need to explicitly make the minimum widened width the original root size. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index d2a07e7364dd..ae73a32a5d9a 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35997,12 +35997,16 @@ static SDValue combineX86ShuffleChainWithExtract( if (NumInputs == 0) return SDValue(); + EVT RootVT = Root.getValueType(); + unsigned RootSizeInBits = RootVT.getSizeInBits(); + assert((RootSizeInBits % NumMaskElts) == 0 && "Unexpected root shuffle mask"); + SmallVector WideInputs(Inputs.begin(), Inputs.end()); SmallVector Offsets(NumInputs, 0); // Peek through subvectors. // TODO: Support inter-mixed EXTRACT_SUBVECTORs + BITCASTs? 
- unsigned WideSizeInBits = WideInputs[0].getValueSizeInBits(); + unsigned WideSizeInBits = RootSizeInBits; for (unsigned i = 0; i != NumInputs; ++i) { SDValue = WideInputs[i]; unsigned = Offsets[i]; @@ -36025,8 +36029,6 @@ static SDValue combineX86ShuffleChainWithExtract( if (llvm::all_of(Offsets, [](unsigned Offset) { return Offset == 0; })) return SDValue(); - EVT RootVT = Root.getValueType(); - unsigned RootSizeInBits = RootVT.getSizeInBits(); unsigned Scale = WideSizeInBits / RootSizeInBits; assert((WideSizeInBits % RootSizeInBits) == 0 && "Unexpected subvector extraction");
[llvm-branch-commits] [llvm] 1b780cf - [X86][AVX] LowerTRUNCATE - avoid bitcasts around extract_subvectors.
Author: Simon Pilgrim Date: 2021-01-25T12:10:36Z New Revision: 1b780cf32e3eea193aa2255b852a7ef164ea00a5 URL: https://github.com/llvm/llvm-project/commit/1b780cf32e3eea193aa2255b852a7ef164ea00a5 DIFF: https://github.com/llvm/llvm-project/commit/1b780cf32e3eea193aa2255b852a7ef164ea00a5.diff LOG: [X86][AVX] LowerTRUNCATE - avoid bitcasts around extract_subvectors. We allow extract_subvector lowering of all legal types, so pre-bitcast the source type to try and reduce bitcast pollution. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 2a86e12dd53c..d2a07e7364dd 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -21075,30 +21075,29 @@ SDValue X86TargetLowering::LowerTRUNCATE(SDValue Op, SelectionDAG ) const { assert(VT.is128BitVector() && InVT.is256BitVector() && "Unexpected types!"); if ((VT == MVT::v4i32) && (InVT == MVT::v4i64)) { +In = DAG.getBitcast(MVT::v8i32, In); + // On AVX2, v4i64 -> v4i32 becomes VPERMD. 
if (Subtarget.hasInt256()) { static const int ShufMask[] = {0, 2, 4, 6, -1, -1, -1, -1}; - In = DAG.getBitcast(MVT::v8i32, In); In = DAG.getVectorShuffle(MVT::v8i32, DL, In, In, ShufMask); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, In, DAG.getIntPtrConstant(0, DL)); } -SDValue OpLo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i64, In, +SDValue OpLo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i32, In, DAG.getIntPtrConstant(0, DL)); -SDValue OpHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i64, In, - DAG.getIntPtrConstant(2, DL)); -OpLo = DAG.getBitcast(MVT::v4i32, OpLo); -OpHi = DAG.getBitcast(MVT::v4i32, OpHi); +SDValue OpHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i32, In, + DAG.getIntPtrConstant(4, DL)); static const int ShufMask[] = {0, 2, 4, 6}; return DAG.getVectorShuffle(VT, DL, OpLo, OpHi, ShufMask); } if ((VT == MVT::v8i16) && (InVT == MVT::v8i32)) { +In = DAG.getBitcast(MVT::v32i8, In); + // On AVX2, v8i32 -> v8i16 becomes PSHUFB. if (Subtarget.hasInt256()) { - In = DAG.getBitcast(MVT::v32i8, In); - // The PSHUFB mask: static const int ShufMask1[] = { 0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1, @@ -21107,21 +21106,17 @@ SDValue X86TargetLowering::LowerTRUNCATE(SDValue Op, SelectionDAG ) const { In = DAG.getVectorShuffle(MVT::v32i8, DL, In, In, ShufMask1); In = DAG.getBitcast(MVT::v4i64, In); - static const int ShufMask2[] = {0, 2, -1, -1}; - In = DAG.getVectorShuffle(MVT::v4i64, DL, In, In, ShufMask2); - In = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i64, In, - DAG.getIntPtrConstant(0, DL)); - return DAG.getBitcast(VT, In); + static const int ShufMask2[] = {0, 2, -1, -1}; + In = DAG.getVectorShuffle(MVT::v4i64, DL, In, In, ShufMask2); + return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v8i16, + DAG.getBitcast(MVT::v16i16, In), + DAG.getIntPtrConstant(0, DL)); } -SDValue OpLo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i32, In, +SDValue OpLo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v16i8, In, 
DAG.getIntPtrConstant(0, DL)); - -SDValue OpHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i32, In, - DAG.getIntPtrConstant(4, DL)); - -OpLo = DAG.getBitcast(MVT::v16i8, OpLo); -OpHi = DAG.getBitcast(MVT::v16i8, OpHi); +SDValue OpHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v16i8, In, + DAG.getIntPtrConstant(16, DL)); // The PSHUFB mask: static const int ShufMask1[] = {0, 1, 4, 5, 8, 9, 12, 13, ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
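As background for the v4i64 -> v4i32 path above, a small endianness check (plain Python, independent of LLVM) showing that on little-endian x86 the truncate is exactly the even-element shuffle mask {0, 2, 4, 6} applied to the v8i32 bitcast of the source:

```python
import struct

def bitcast_v4i64_to_v8i32(v):
    # Reinterpret the 256-bit value, little-endian, as 8 x i32.
    raw = b"".join(struct.pack("<q", x) for x in v)
    return list(struct.unpack("<8i", raw))

def trunc_i64_to_i32(x):
    # Keep the low 32 bits, reinterpreted as a signed i32.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

v = [1, -2, 0x100000003, -0x100000004]
as_v8i32 = bitcast_v4i64_to_v8i32(v)
# The low 32 bits of 64-bit lane i land in element 2*i, so the
# truncate is the shuffle mask {0, 2, 4, 6}.
truncated = [as_v8i32[m] for m in (0, 2, 4, 6)]
assert truncated == [trunc_i64_to_i32(x) for x in v]
```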
[llvm-branch-commits] [llvm] f461e35 - [X86][AVX] combineX86ShuffleChain - avoid bitcasts around insert_subvector() shuffle patterns.
Author: Simon Pilgrim Date: 2021-01-25T11:35:45Z New Revision: f461e35cbafed593e637305e2a76822dfb7ca6c7 URL: https://github.com/llvm/llvm-project/commit/f461e35cbafed593e637305e2a76822dfb7ca6c7 DIFF: https://github.com/llvm/llvm-project/commit/f461e35cbafed593e637305e2a76822dfb7ca6c7.diff LOG: [X86][AVX] combineX86ShuffleChain - avoid bitcasts around insert_subvector() shuffle patterns. We allow insert_subvector lowering of all legal types, so don't always cast to the vXi64/vXf64 shuffle types - this is only necessary for X86ISD::SHUF128/X86ISD::VPERM2X128 patterns later. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 0edc40683ea8..2a86e12dd53c 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35357,8 +35357,6 @@ static SDValue combineX86ShuffleChain(ArrayRef<SDValue> Inputs, SDValue Root, // Handle 128/256-bit lane shuffles of 512-bit vectors. if (RootVT.is512BitVector() && (NumBaseMaskElts == 2 || NumBaseMaskElts == 4)) { -MVT ShuffleVT = (FloatDomain ? MVT::v8f64 : MVT::v8i64); - // If the upper subvectors are zeroable, then an extract+insert is more // optimal than using X86ISD::SHUF128. The insertion is free, even if it has // to zero the upper subvectors. @@ -35367,12 +35365,11 @@ static SDValue combineX86ShuffleChain(ArrayRef<SDValue> Inputs, SDValue Root, return SDValue(); // Nothing to do!
assert(isInRange(BaseMask[0], 0, NumBaseMaskElts) && "Unexpected lane shuffle"); - Res = CanonicalizeShuffleInput(ShuffleVT, V1); - unsigned SubIdx = BaseMask[0] * (8 / NumBaseMaskElts); + Res = CanonicalizeShuffleInput(RootVT, V1); + unsigned SubIdx = BaseMask[0] * (NumRootElts / NumBaseMaskElts); bool UseZero = isAnyZero(BaseMask); Res = extractSubVector(Res, SubIdx, DAG, DL, BaseMaskEltSizeInBits); - Res = widenSubVector(Res, UseZero, Subtarget, DAG, DL, RootSizeInBits); - return DAG.getBitcast(RootVT, Res); + return widenSubVector(Res, UseZero, Subtarget, DAG, DL, RootSizeInBits); } // Narrow shuffle mask to v4x128. @@ -35423,6 +35420,7 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, if (!isAnyZero(Mask) && !PreferPERMQ) { if (Depth == 0 && Root.getOpcode() == X86ISD::SHUF128) return SDValue(); // Nothing to do! + MVT ShuffleVT = (FloatDomain ? MVT::v8f64 : MVT::v8i64); if (SDValue V = MatchSHUF128(ShuffleVT, DL, Mask, V1, V2, DAG)) return DAG.getBitcast(RootVT, V); } @@ -35430,8 +35428,6 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, // Handle 128-bit lane shuffles of 256-bit vectors. if (RootVT.is256BitVector() && NumBaseMaskElts == 2) { -MVT ShuffleVT = (FloatDomain ? MVT::v4f64 : MVT::v4i64); - // If the upper half is zeroable, then an extract+insert is more optimal // than using X86ISD::VPERM2X128. The insertion is free, even if it has to // zero the upper half. @@ -35439,13 +35435,13 @@ static SDValue combineX86ShuffleChain(ArrayRef Inputs, SDValue Root, if (Depth == 0 && Root.getOpcode() == ISD::INSERT_SUBVECTOR) return SDValue(); // Nothing to do! 
assert(isInRange(BaseMask[0], 0, 2) && "Unexpected lane shuffle"); - Res = CanonicalizeShuffleInput(ShuffleVT, V1); - Res = extract128BitVector(Res, BaseMask[0] * 2, DAG, DL); - Res = widenSubVector(Res, BaseMask[1] == SM_SentinelZero, Subtarget, DAG, - DL, 256); - return DAG.getBitcast(RootVT, Res); + Res = CanonicalizeShuffleInput(RootVT, V1); + Res = extract128BitVector(Res, BaseMask[0] * (NumRootElts / 2), DAG, DL); + return widenSubVector(Res, BaseMask[1] == SM_SentinelZero, Subtarget, DAG, +DL, 256); } +MVT ShuffleVT = (FloatDomain ? MVT::v4f64 : MVT::v4i64); if (Depth == 0 && Root.getOpcode() == X86ISD::VPERM2X128) return SDValue(); // Nothing to do! ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
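The extract+insert preference in this patch can be sketched at lane granularity (plain Python model, not LLVM APIs): when only lane BaseMask[0] is live and the remaining lanes are zeroable, the lane shuffle is just an extract of that lane widened with zeros:

```python
def lane_shuffle(v, mask, lane_elts):
    # mask entries index 128-bit lanes; -2 models SM_SentinelZero.
    out = []
    for m in mask:
        out += [0] * lane_elts if m < 0 else v[m * lane_elts:(m + 1) * lane_elts]
    return out

def extract_then_widen(v, lane, lane_elts, total):
    # Extract one lane, then widen back to the root width with zeros.
    lo = v[lane * lane_elts:(lane + 1) * lane_elts]
    return lo + [0] * (total - lane_elts)

v = list(range(1, 9))        # a v8: 4 lanes of 2 elements
mask = [3, -2, -2, -2]       # lane 3 live, upper lanes zeroable
assert lane_shuffle(v, mask, 2) == extract_then_widen(v, 3, 2, 8)
```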
[llvm-branch-commits] [llvm] 9641bd0 - [TableGen] RuleMatcher::defineComplexSubOperand avoid std::string copy. NFCI.
Author: Simon Pilgrim Date: 2021-01-25T11:35:44Z New Revision: 9641bd0f87dda34c09c606358bb0cb08a641a4f6 URL: https://github.com/llvm/llvm-project/commit/9641bd0f87dda34c09c606358bb0cb08a641a4f6 DIFF: https://github.com/llvm/llvm-project/commit/9641bd0f87dda34c09c606358bb0cb08a641a4f6.diff LOG: [TableGen] RuleMatcher::defineComplexSubOperand avoid std::string copy. NFCI. Use const reference to avoid std::string copy - according to the style guide we shouldn't be using auto anyway. Fixes MSVC analyzer warning. Added: Modified: llvm/utils/TableGen/GlobalISelEmitter.cpp Removed: diff --git a/llvm/utils/TableGen/GlobalISelEmitter.cpp b/llvm/utils/TableGen/GlobalISelEmitter.cpp index 8026a3a102be..cd97733ce984 100644 --- a/llvm/utils/TableGen/GlobalISelEmitter.cpp +++ b/llvm/utils/TableGen/GlobalISelEmitter.cpp @@ -933,7 +933,8 @@ class RuleMatcher : public Matcher { StringRef ParentSymbolicName) { std::string ParentName(ParentSymbolicName); if (ComplexSubOperands.count(SymbolicName)) { - auto RecordedParentName = ComplexSubOperandsParentName[SymbolicName]; + const std::string &RecordedParentName = + ComplexSubOperandsParentName[SymbolicName]; if (RecordedParentName != ParentName) return failedImport("Error: Complex suboperand " + SymbolicName + " referenced by different operands: " + ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 344afa8 - [Support] TrigramIndex::insert - pass std::string argument by const reference. NFCI.
Author: Simon Pilgrim Date: 2021-01-23T11:04:00Z New Revision: 344afa853fcfcc085cb5c957b4a07c7ea013bb1b URL: https://github.com/llvm/llvm-project/commit/344afa853fcfcc085cb5c957b4a07c7ea013bb1b DIFF: https://github.com/llvm/llvm-project/commit/344afa853fcfcc085cb5c957b4a07c7ea013bb1b.diff LOG: [Support] TrigramIndex::insert - pass std::string argument by const reference. NFCI. Avoid string copies and fix clang-tidy warning. Added: Modified: llvm/include/llvm/Support/TrigramIndex.h llvm/lib/Support/TrigramIndex.cpp Removed: diff --git a/llvm/include/llvm/Support/TrigramIndex.h b/llvm/include/llvm/Support/TrigramIndex.h index 360ab9459790..0be6a1012718 100644 --- a/llvm/include/llvm/Support/TrigramIndex.h +++ b/llvm/include/llvm/Support/TrigramIndex.h @@ -38,7 +38,7 @@ class StringRef; class TrigramIndex { public: /// Inserts a new Regex into the index. - void insert(std::string Regex); + void insert(const std::string &Regex); /// Returns true, if special case list definitely does not have a line /// that matches the query. Returns false, if it's not sure. diff --git a/llvm/lib/Support/TrigramIndex.cpp b/llvm/lib/Support/TrigramIndex.cpp index 1f1f3022b0b3..4370adc9c3e0 100644 --- a/llvm/lib/Support/TrigramIndex.cpp +++ b/llvm/lib/Support/TrigramIndex.cpp @@ -25,7 +25,7 @@ static bool isAdvancedMetachar(unsigned Char) { return strchr(RegexAdvancedMetachars, Char) != nullptr; } -void TrigramIndex::insert(std::string Regex) { +void TrigramIndex::insert(const std::string &Regex) { if (Defeated) return; std::set<unsigned> Was; unsigned Cnt = 0; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] bd122f6 - [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x), movddup(y)) cases
Author: Simon Pilgrim Date: 2021-01-22T16:05:19Z New Revision: bd122f6d217862b4631ac118c58f62a7dec16a02 URL: https://github.com/llvm/llvm-project/commit/bd122f6d217862b4631ac118c58f62a7dec16a02 DIFF: https://github.com/llvm/llvm-project/commit/bd122f6d217862b4631ac118c58f62a7dec16a02.diff LOG: [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y)) Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/extract-concat.ll llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 90ed8c920565..70203dacef09 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36922,6 +36922,15 @@ static SDValue canonicalizeLaneShuffleWithRepeatedOps(SDValue V, return SDValue(); switch (SrcOpc0) { + case X86ISD::MOVDDUP: { +SDValue LHS = DAG.getBitcast(VT, Src0.getOperand(0)); +SDValue RHS = +DAG.getBitcast(VT, Src1.isUndef() ? 
Src1 : Src1.getOperand(0)); +SDValue Res = +DAG.getNode(X86ISD::VPERM2X128, DL, VT, LHS, RHS, V.getOperand(2)); +Res = DAG.getNode(SrcOpc0, DL, SrcVT0, DAG.getBitcast(SrcVT0, Res)); +return DAG.getBitcast(VT, Res); + } case X86ISD::VSHLI: case X86ISD::VSRLI: case X86ISD::VSRAI: diff --git a/llvm/test/CodeGen/X86/extract-concat.ll b/llvm/test/CodeGen/X86/extract-concat.ll index 49ac851d88fc..f979f23f82f8 100644 --- a/llvm/test/CodeGen/X86/extract-concat.ll +++ b/llvm/test/CodeGen/X86/extract-concat.ll @@ -68,13 +68,12 @@ define <16 x i64> @catcat(<4 x i64> %x) { ; ; AVX1-LABEL: catcat: ; AVX1: # %bb.0: -; AVX1-NEXT:vmovddup {{.*#+}} ymm1 = ymm0[0,0,2,2] -; AVX1-NEXT:vperm2f128 {{.*#+}} ymm2 = ymm1[2,3,2,3] ; AVX1-NEXT:vpermilps {{.*#+}} xmm1 = xmm0[0,1,0,1] ; AVX1-NEXT:vinsertf128 $1, %xmm1, %ymm1, %ymm4 ; AVX1-NEXT:vpermilps {{.*#+}} xmm1 = xmm0[2,3,2,3] ; AVX1-NEXT:vinsertf128 $1, %xmm1, %ymm1, %ymm1 ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; AVX1-NEXT:vmovddup {{.*#+}} ymm2 = ymm0[0,0,2,2] ; AVX1-NEXT:vpermilpd {{.*#+}} ymm3 = ymm0[1,1,3,3] ; AVX1-NEXT:vmovaps %ymm4, %ymm0 ; AVX1-NEXT:retq diff --git a/llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll b/llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll index 38600884262c..80acaef8a0a0 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll @@ -109,8 +109,8 @@ define <4 x double> @shuffle_v4f64_1000(<4 x double> %a, <4 x double> %b) { define <4 x double> @shuffle_v4f64_2200(<4 x double> %a, <4 x double> %b) { ; AVX1-LABEL: shuffle_v4f64_2200: ; AVX1: # %bb.0: -; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] +; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:retq ; ; AVX2-LABEL: shuffle_v4f64_2200: @@ -129,8 +129,8 @@ define <4 x double> @shuffle_v4f64_2200(<4 x double> %a, <4 x double> %b) { define <4 x double> @shuffle_v4f64_(<4 x double> %a, <4 x double> %b) { ; AVX1-LABEL: 
shuffle_v4f64_: ; AVX1: # %bb.0: -; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:retq ; ; AVX2-LABEL: shuffle_v4f64_: @@ -149,8 +149,8 @@ define <4 x double> @shuffle_v4f64_(<4 x double> %a, <4 x double> %b) { define <4 x double> @shuffle_v4f64__bc(<4 x i64> %a, <4 x i64> %b) { ; AVX1-LABEL: shuffle_v4f64__bc: ; AVX1: # %bb.0: -; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:retq ; ; AVX2-LABEL: shuffle_v4f64__bc: @@ -856,8 +856,8 @@ define <4 x i64> @shuffle_v4i64_1000(<4 x i64> %a, <4 x i64> %b) { define <4 x i64> @shuffle_v4i64_2200(<4 x i64> %a, <4 x i64> %b) { ; AVX1-LABEL: shuffle_v4i64_2200: ; AVX1: # %bb.0: -; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] +; AVX1-NEXT:vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] ; AVX1-NEXT:retq ; ; AVX2-LABEL: shuffle_v4i64_2200: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
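The fold relies on vperm2x128 and movddup commuting at 128-bit lane granularity. A scalar model (plain Python; the imm8 encoding follows the instruction reference, with bit 3 of each nibble zeroing that lane) that checks the identity for every immediate:

```python
def movddup(v):
    # v models ymm as 4 doubles: duplicate the low double of each lane.
    return [v[0], v[0], v[2], v[2]]

def vperm2f128(a, b, imm):
    # Each nibble of imm selects one of four 128-bit lanes (or zero).
    halves = [a[0:2], a[2:4], b[0:2], b[2:4]]
    def sel(nib):
        return [0.0, 0.0] if nib & 8 else halves[nib & 3]
    return sel(imm & 0xF) + sel(imm >> 4)

x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
# vperm2x128(movddup(x),movddup(y)) == movddup(vperm2x128(x,y)) for all imm.
for imm in range(256):
    assert vperm2f128(movddup(x), movddup(y), imm) == \
           movddup(vperm2f128(x, y, imm))
```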
[llvm-branch-commits] [llvm] c33d36e - [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x, c), undef) cases
Author: Simon Pilgrim Date: 2021-01-22T15:47:23Z New Revision: c33d36e0667e7fff186243ac7a3a9cd63e797438 URL: https://github.com/llvm/llvm-project/commit/c33d36e0667e7fff186243ac7a3a9cd63e797438 DIFF: https://github.com/llvm/llvm-project/commit/c33d36e0667e7fff186243ac7a3a9cd63e797438.diff LOG: [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c) Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avx-splat.ll llvm/test/CodeGen/X86/extract-concat.ll llvm/test/CodeGen/X86/haddsub-4.ll llvm/test/CodeGen/X86/known-signbits-vector.ll llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll llvm/test/CodeGen/X86/vector-shuffle-combining.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 577745c42d81..90ed8c920565 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36918,19 +36918,21 @@ static SDValue canonicalizeLaneShuffleWithRepeatedOps(SDValue V, EVT SrcVT0 = Src0.getValueType(); EVT SrcVT1 = Src1.getValueType(); - // TODO: Under what circumstances should we push perm2f128 up when we have one - // active src? - if (SrcOpc0 != SrcOpc1 || SrcVT0 != SrcVT1) + if (!Src1.isUndef() && (SrcVT0 != SrcVT1 || SrcOpc0 != SrcOpc1)) return SDValue(); switch (SrcOpc0) { case X86ISD::VSHLI: case X86ISD::VSRLI: case X86ISD::VSRAI: -if (Src0.getOperand(1) == Src1.getOperand(1)) { - SDValue Res = DAG.getNode( - X86ISD::VPERM2X128, DL, VT, DAG.getBitcast(VT, Src0.getOperand(0)), - DAG.getBitcast(VT, Src1.getOperand(0)), V.getOperand(2)); + case X86ISD::PSHUFD: + case X86ISD::VPERMILPI: +if (Src1.isUndef() || Src0.getOperand(1) == Src1.getOperand(1)) { + SDValue LHS = DAG.getBitcast(VT, Src0.getOperand(0)); + SDValue RHS = + DAG.getBitcast(VT, Src1.isUndef() ? 
Src1 : Src1.getOperand(0)); + SDValue Res = + DAG.getNode(X86ISD::VPERM2X128, DL, VT, LHS, RHS, V.getOperand(2)); Res = DAG.getNode(SrcOpc0, DL, SrcVT0, DAG.getBitcast(SrcVT0, Res), Src0.getOperand(1)); return DAG.getBitcast(VT, Res); diff --git a/llvm/test/CodeGen/X86/avx-splat.ll b/llvm/test/CodeGen/X86/avx-splat.ll index 3755cf4740ab..7602975c8872 100644 --- a/llvm/test/CodeGen/X86/avx-splat.ll +++ b/llvm/test/CodeGen/X86/avx-splat.ll @@ -157,8 +157,8 @@ entry: define <8 x float> @funcH(<8 x float> %a) nounwind uwtable readnone ssp { ; CHECK-LABEL: funcH: ; CHECK: # %bb.0: # %entry -; CHECK-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[1,1,1,1,5,5,5,5] ; CHECK-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; CHECK-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[1,1,1,1,5,5,5,5] ; CHECK-NEXT:ret{{[l|q]}} entry: %shuffle = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> diff --git a/llvm/test/CodeGen/X86/extract-concat.ll b/llvm/test/CodeGen/X86/extract-concat.ll index 26e07d86bfc3..49ac851d88fc 100644 --- a/llvm/test/CodeGen/X86/extract-concat.ll +++ b/llvm/test/CodeGen/X86/extract-concat.ll @@ -70,12 +70,12 @@ define <16 x i64> @catcat(<4 x i64> %x) { ; AVX1: # %bb.0: ; AVX1-NEXT:vmovddup {{.*#+}} ymm1 = ymm0[0,0,2,2] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm2 = ymm1[2,3,2,3] -; AVX1-NEXT:vpermilpd {{.*#+}} ymm1 = ymm0[1,1,3,3] -; AVX1-NEXT:vperm2f128 {{.*#+}} ymm3 = ymm1[2,3,2,3] ; AVX1-NEXT:vpermilps {{.*#+}} xmm1 = xmm0[0,1,0,1] ; AVX1-NEXT:vinsertf128 $1, %xmm1, %ymm1, %ymm4 -; AVX1-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[2,3,2,3] -; AVX1-NEXT:vinsertf128 $1, %xmm0, %ymm0, %ymm1 +; AVX1-NEXT:vpermilps {{.*#+}} xmm1 = xmm0[2,3,2,3] +; AVX1-NEXT:vinsertf128 $1, %xmm1, %ymm1, %ymm1 +; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] +; AVX1-NEXT:vpermilpd {{.*#+}} ymm3 = ymm0[1,1,3,3] ; AVX1-NEXT:vmovaps %ymm4, %ymm0 ; AVX1-NEXT:retq ; diff --git a/llvm/test/CodeGen/X86/haddsub-4.ll b/llvm/test/CodeGen/X86/haddsub-4.ll index 6003f98b9371..2e077d6247ba 100644 --- 
a/llvm/test/CodeGen/X86/haddsub-4.ll +++ b/llvm/test/CodeGen/X86/haddsub-4.ll @@ -65,8 +65,8 @@ define <8 x float> @hadd_reverse_v8f32(<8 x float> %a0, <8 x float> %a1) { ; AVX1-LABEL: hadd_reverse_v8f32: ; AVX1: # %bb.0: ; AVX1-NEXT:vhaddps %ymm1, %ymm0, %ymm0 -; AVX1-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[1,0,3,2,5,4,7,6] ; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] +; AVX1-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[1,0,3,2,5,4,7,6] ; AVX1-NEXT:retq ; ; AVX2-LABEL: hadd_reverse_v8f32: @@ -97,10 +97,10 @@ define <8 x float> @hadd_reverse2_v8f32(<8 x float> %a0, <8 x float> %a1) { ; ; AVX1-LABEL:
[llvm-branch-commits] [llvm] 4846f6a - [X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching
Author: Simon Pilgrim Date: 2021-01-22T15:47:22Z New Revision: 4846f6ab815c34f6ffbc8d4ecde891d917bf2157 URL: https://github.com/llvm/llvm-project/commit/4846f6ab815c34f6ffbc8d4ecde891d917bf2157 DIFF: https://github.com/llvm/llvm-project/commit/4846f6ab815c34f6ffbc8d4ecde891d917bf2157.diff LOG: [X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding. Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index a293c48a824a..577745c42d81 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37324,41 +37324,33 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG &DAG, if (SDValue Res = canonicalizeLaneShuffleWithRepeatedOps(N, DAG, DL)) return Res; -// If both 128-bit values were inserted into high halves of 256-bit values, -// the shuffle can be reduced to a concatenation of subvectors: -// vperm2x128 (ins ?, X, C1), (ins ?, Y, C2), 0x31 --> concat X, Y -// Note: We are only looking for the exact high/high shuffle mask because we -// expect to fold other similar patterns before creating this opcode. -SDValue Ins0 = peekThroughBitcasts(N.getOperand(0)); -SDValue Ins1 = peekThroughBitcasts(N.getOperand(1)); +// Combine vperm2x128 subvector shuffle with an inner concat pattern. +// vperm2x128(concat(X,Y),concat(Z,W)) --> concat X,Y etc. +auto FindSubVector128 = [&](unsigned Idx) { + if (Idx > 3) +return SDValue(); + SDValue Src = peekThroughBitcasts(N.getOperand(Idx < 2 ?
0 : 1)); + SmallVector<SDValue> SubOps; + if (collectConcatOps(Src.getNode(), SubOps) && SubOps.size() == 2) +return SubOps[Idx & 1]; + unsigned NumElts = Src.getValueType().getVectorNumElements(); + if ((Idx & 1) == 1 && Src.getOpcode() == ISD::INSERT_SUBVECTOR && + Src.getOperand(1).getValueSizeInBits() == 128 && + Src.getConstantOperandAPInt(2) == (NumElts / 2)) { +return Src.getOperand(1); + } + return SDValue(); +}; unsigned Imm = N.getConstantOperandVal(2); - -// Handle subvector splat by tweaking values to match binary concat. -// vperm2x128 (ins ?, X, C1), undef, 0x11 -> -// vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31 -> concat X, X -if (Imm == 0x11 && Ins1.isUndef()) { - Imm = 0x31; - Ins1 = Ins0; +if (SDValue SubLo = FindSubVector128(Imm & 0x0F)) { + if (SDValue SubHi = FindSubVector128((Imm & 0xF0) >> 4)) { +MVT SubVT = VT.getHalfNumVectorElementsVT(); +SubLo = DAG.getBitcast(SubVT, SubLo); +SubHi = DAG.getBitcast(SubVT, SubHi); +return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, SubLo, SubHi); + } } - -if (!(Imm == 0x31 && - Ins0.getOpcode() == ISD::INSERT_SUBVECTOR && - Ins1.getOpcode() == ISD::INSERT_SUBVECTOR && - Ins0.getValueType() == Ins1.getValueType())) - return SDValue(); - -SDValue X = Ins0.getOperand(1); -SDValue Y = Ins1.getOperand(1); -unsigned C1 = Ins0.getConstantOperandVal(2); -unsigned C2 = Ins1.getConstantOperandVal(2); -MVT SrcVT = X.getSimpleValueType(); -unsigned SrcElts = SrcVT.getVectorNumElements(); -if (SrcVT != Y.getSimpleValueType() || SrcVT.getSizeInBits() != 128 || -C1 != SrcElts || C2 != SrcElts) - return SDValue(); - -return DAG.getBitcast(VT, DAG.getNode(ISD::CONCAT_VECTORS, DL, - Ins1.getValueType(), X, Y)); +return SDValue(); } case X86ISD::PSHUFD: case X86ISD::PSHUFLW: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
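A toy model (plain Python; helper names mirror the patch but are illustrative) of the nibble decoding that FindSubVector128 performs: each imm nibble in 0-3 names one 128-bit subvector, nibbles 0/1 coming from operand 0 and 2/3 from operand 1:

```python
def vperm2x128(a, b, imm):
    # a and b model concat(lo128, hi128) as 2-tuples of subvectors.
    subs = [a[0], a[1], b[0], b[1]]
    return (subs[imm & 3], subs[(imm >> 4) & 3])

def find_subvector128(op0, op1, nib):
    # Nibbles 0/1 select from operand 0, nibbles 2/3 from operand 1;
    # bit 0 picks the low/high 128-bit half.
    src = op0 if nib < 2 else op1
    return src[nib & 1]

op0, op1 = ("X", "Y"), ("Z", "W")   # concat(X,Y), concat(Z,W)
for imm in range(256):
    lo, hi = imm & 0x0F, imm >> 4
    if lo > 3 or hi > 3:
        continue                    # FindSubVector128 bails on Idx > 3
    assert vperm2x128(op0, op1, imm) == (find_subvector128(op0, op1, lo),
                                         find_subvector128(op0, op1, hi))
```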
[llvm-branch-commits] [llvm] b1166e1 - [X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs
Author: Simon Pilgrim Date: 2021-01-22T13:19:35Z New Revision: b1166e1317c54e9cfbb28b280af12313cf325a86 URL: https://github.com/llvm/llvm-project/commit/b1166e1317c54e9cfbb28b280af12313cf325a86 DIFF: https://github.com/llvm/llvm-project/commit/b1166e1317c54e9cfbb28b280af12313cf325a86.diff LOG: [X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle/earlyout shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc. The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 895a02e5c98e..a293c48a824a 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36610,6 +36610,17 @@ static SDValue combineX86ShufflesRecursively( } } + // Attempt to constant fold all of the constant source ops. + if (SDValue Cst = combineX86ShufflesConstants( + Ops, Mask, Root, HasVariableMask, DAG, Subtarget)) +return Cst; + + // Canonicalize the combined shuffle mask chain with horizontal ops. + // NOTE: This will update the Ops and Mask. + if (SDValue HOp = canonicalizeShuffleMaskWithHorizOp( + Ops, Mask, RootSizeInBits, SDLoc(Root), DAG, Subtarget)) +return DAG.getBitcast(Root.getValueType(), HOp); + // Widen any subvector shuffle inputs we've collected. if (any_of(Ops, [RootSizeInBits](SDValue Op) { return Op.getValueSizeInBits() < RootSizeInBits; @@ -36622,17 +36633,6 @@ static SDValue combineX86ShufflesRecursively( resolveTargetShuffleInputsAndMask(Ops, Mask); } - // Attempt to constant fold all of the constant source ops. 
- if (SDValue Cst = combineX86ShufflesConstants( - Ops, Mask, Root, HasVariableMask, DAG, Subtarget)) -return Cst; - - // Canonicalize the combined shuffle mask chain with horizontal ops. - // NOTE: This will update the Ops and Mask. - if (SDValue HOp = canonicalizeShuffleMaskWithHorizOp( - Ops, Mask, RootSizeInBits, SDLoc(Root), DAG, Subtarget)) -return DAG.getBitcast(Root.getValueType(), HOp); - // We can only combine unary and binary shuffle mask cases. if (Ops.size() <= 2) { // Minor canonicalization of the accumulated shuffle mask to make it easier diff --git a/llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll b/llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll index 2c53579f7627..c358250305a7 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll @@ -108,13 +108,12 @@ define void @PR46178(i16* %0) { ; X86-NEXT:vmovdqu (%eax), %ymm1 ; X86-NEXT:vpmovqw %ymm0, %xmm0 ; X86-NEXT:vpmovqw %ymm1, %xmm1 -; X86-NEXT:vinserti128 $1, %xmm1, %ymm0, %ymm0 -; X86-NEXT:vpsllw $8, %ymm0, %ymm0 -; X86-NEXT:vpsraw $8, %ymm0, %ymm0 -; X86-NEXT:vmovapd {{.*#+}} ymm1 = [0,0,2,0,4,0,4,0] -; X86-NEXT:vxorpd %xmm2, %xmm2, %xmm2 -; X86-NEXT:vpermi2pd %ymm2, %ymm0, %ymm1 -; X86-NEXT:vmovupd %ymm1, (%eax) +; X86-NEXT:vpsllw $8, %xmm1, %xmm1 +; X86-NEXT:vpsraw $8, %xmm1, %xmm1 +; X86-NEXT:vpsllw $8, %xmm0, %xmm0 +; X86-NEXT:vpsraw $8, %xmm0, %xmm0 +; X86-NEXT:vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[3] +; X86-NEXT:vmovupd %ymm0, (%eax) ; X86-NEXT:vzeroupper ; X86-NEXT:retl ; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] ffe72f9 - [X86][SSE] Don't fold shuffle(binop(), binop()) -> binop(shuffle(), shuffle()) if the shuffles are splats
Author: Simon Pilgrim Date: 2021-01-22T11:31:38Z New Revision: ffe72f987f4866c46c18174cdb750dea88bedba3 URL: https://github.com/llvm/llvm-project/commit/ffe72f987f4866c46c18174cdb750dea88bedba3 DIFF: https://github.com/llvm/llvm-project/commit/ffe72f987f4866c46c18174cdb750dea88bedba3.diff LOG: [X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffles are splats rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail. The non-fast-horiz-ops paths see some minor regressions, we might be able to improve on this after lowering to target shuffles. Fix PR48823 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/haddsub-3.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index c5cc23f6236e..895a02e5c98e 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37964,23 +37964,24 @@ static SDValue combineShuffle(SDNode *N, SelectionDAG &DAG, return HAddSub; // Merge shuffles through binops if its likely we'll be able to merge it -// with other shuffles. +// with other shuffles (as long as they aren't splats). // shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) // TODO: We might be able to move this to DAGCombiner::visitVECTOR_SHUFFLE.
if (auto *SVN = dyn_cast<ShuffleVectorSDNode>(N)) { unsigned SrcOpcode = N->getOperand(0).getOpcode(); if (SrcOpcode == N->getOperand(1).getOpcode() && TLI.isBinOp(SrcOpcode) && N->isOnlyUserOf(N->getOperand(0).getNode()) && - N->isOnlyUserOf(N->getOperand(1).getNode()) && - VT.getScalarSizeInBits() >= 32) { + N->isOnlyUserOf(N->getOperand(1).getNode())) { SDValue Op00 = N->getOperand(0).getOperand(0); SDValue Op10 = N->getOperand(1).getOperand(0); SDValue Op01 = N->getOperand(0).getOperand(1); SDValue Op11 = N->getOperand(1).getOperand(1); -if ((Op00.getOpcode() == ISD::VECTOR_SHUFFLE || - Op10.getOpcode() == ISD::VECTOR_SHUFFLE) && -(Op01.getOpcode() == ISD::VECTOR_SHUFFLE || - Op11.getOpcode() == ISD::VECTOR_SHUFFLE)) { +auto *SVN00 = dyn_cast<ShuffleVectorSDNode>(Op00); +auto *SVN10 = dyn_cast<ShuffleVectorSDNode>(Op10); +auto *SVN01 = dyn_cast<ShuffleVectorSDNode>(Op01); +auto *SVN11 = dyn_cast<ShuffleVectorSDNode>(Op11); +if (((SVN00 && !SVN00->isSplat()) || (SVN10 && !SVN10->isSplat())) && +((SVN01 && !SVN01->isSplat()) || (SVN11 && !SVN11->isSplat()))) { SDLoc DL(N); ArrayRef<int> Mask = SVN->getMask(); SDValue LHS = DAG.getVectorShuffle(VT, DL, Op00, Op10, Mask); diff --git a/llvm/test/CodeGen/X86/haddsub-3.ll b/llvm/test/CodeGen/X86/haddsub-3.ll index 651ab4ef3935..48d4fe556555 100644 --- a/llvm/test/CodeGen/X86/haddsub-3.ll +++ b/llvm/test/CodeGen/X86/haddsub-3.ll @@ -161,46 +161,49 @@ define <4 x float> @PR48823(<4 x float> %0, <4 x float> %1) { ; SSE2-LABEL: PR48823: ; SSE2: # %bb.0: ; SSE2-NEXT:movaps %xmm0, %xmm2 -; SSE2-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] -; SSE2-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; SSE2-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm0[1,1] ; SSE2-NEXT:subps %xmm2, %xmm0 +; SSE2-NEXT:movaps %xmm1, %xmm2 +; SSE2-NEXT:shufps {{.*#+}} xmm2 = xmm2[2,2],xmm1[2,2] +; SSE2-NEXT:subps %xmm1, %xmm2 +; SSE2-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3] ; SSE2-NEXT:retq ; ; SSSE3-SLOW-LABEL: PR48823: ; SSSE3-SLOW: # %bb.0: -; SSSE3-SLOW-NEXT:movaps %xmm0, %xmm2 -; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] -;
SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm2 = xmm0[1,1,3,3] ; SSSE3-SLOW-NEXT:subps %xmm2, %xmm0 +; SSSE3-SLOW-NEXT:movsldup {{.*#+}} xmm2 = xmm1[0,0,2,2] +; SSSE3-SLOW-NEXT:subps %xmm1, %xmm2 +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3] ; SSSE3-SLOW-NEXT:retq ; ; SSSE3-FAST-LABEL: PR48823: ; SSSE3-FAST: # %bb.0: -; SSSE3-FAST-NEXT:movaps %xmm0, %xmm2 -; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] -; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] -; SSSE3-FAST-NEXT:subps %xmm2, %xmm0 +; SSSE3-FAST-NEXT:hsubps %xmm1, %xmm0 ; SSSE3-FAST-NEXT:retq ; ; AVX1-SLOW-LABEL: PR48823: ; AVX1-SLOW: # %bb.0: -; AVX1-SLOW-NEXT:vshufps {{.*#+}} xmm2 = xmm0[1,1],xmm1[2,3] -; AVX1-SLOW-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; AVX1-SLOW-NEXT:vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3] ; AVX1-SLOW-NEXT:vsubps %xmm2, %xmm0, %xmm0 +; AVX1-SLOW-NEXT:vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2] +; AVX1-SLOW-NEXT:vsubps %xmm1, %xmm2, %xmm1 +;
[llvm-branch-commits] [llvm] 481659c - [X86][SSE] Add v16i8 02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu shuffle test
Author: Simon Pilgrim Date: 2021-01-22T10:05:22Z New Revision: 481659c55c4ec1e133bec82a909e9e6baee70a28 URL: https://github.com/llvm/llvm-project/commit/481659c55c4ec1e133bec82a909e9e6baee70a28 DIFF: https://github.com/llvm/llvm-project/commit/481659c55c4ec1e133bec82a909e9e6baee70a28.diff LOG: [X86][SSE] Add v16i8 02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu shuffle test Added: Modified: llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll Removed: diff --git a/llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll b/llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll index ee3cf43e8f2f7..012b9f07dc6d0 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll @@ -761,6 +761,60 @@ define <16 x i8> @shuffle_v16i8_16_17_18_19_04_05_06_07_24_25_10_11_28_13_30_15( ret <16 x i8> %shuffle } +define <16 x i8> @shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu(<16 x i8> %a, <16 x i8> %b) { +; SSE2-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; SSE2: # %bb.0: +; SSE2-NEXT:pshufd {{.*#+}} xmm1 = xmm1[1,1,1,1] +; SSE2-NEXT:punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; SSE2-NEXT:pand {{.*}}(%rip), %xmm0 +; SSE2-NEXT:psrlq $16, %xmm0 +; SSE2-NEXT:packuswb %xmm0, %xmm0 +; SSE2-NEXT:retq +; +; SSSE3-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; SSSE3: # %bb.0: +; SSSE3-NEXT:punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] +; SSSE3-NEXT:pshufb {{.*#+}} xmm0 = xmm0[4,9,u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; SSSE3-NEXT:retq +; +; SSE41-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; SSE41: # %bb.0: +; SSE41-NEXT:punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] +; SSE41-NEXT:pshufb {{.*#+}} xmm0 = xmm0[4,9,u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; 
SSE41-NEXT:retq +; +; AVX1-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; AVX1: # %bb.0: +; AVX1-NEXT:vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] +; AVX1-NEXT:vpshufb {{.*#+}} xmm0 = xmm0[4,9,u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; AVX1-NEXT:retq +; +; AVX2-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; AVX2: # %bb.0: +; AVX2-NEXT:vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] +; AVX2-NEXT:vpshufb {{.*#+}} xmm0 = xmm0[4,9,u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; AVX2-NEXT:retq +; +; AVX512VLBW-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; AVX512VLBW: # %bb.0: +; AVX512VLBW-NEXT:vpunpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] +; AVX512VLBW-NEXT:vpshufb {{.*#+}} xmm0 = xmm0[4,9,u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; AVX512VLBW-NEXT:retq +; +; AVX512VLVBMI-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; AVX512VLVBMI: # %bb.0: +; AVX512VLVBMI-NEXT:vpbroadcastw {{.*#+}} xmm2 = [5122,5122,5122,5122,5122,5122,5122,5122] +; AVX512VLVBMI-NEXT:vpermt2b %xmm1, %xmm2, %xmm0 +; AVX512VLVBMI-NEXT:retq +; +; XOP-LABEL: shuffle_v16i8_02_20_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu_uu: +; XOP: # %bb.0: +; XOP-NEXT:vpperm {{.*#+}} xmm0 = xmm0[2],xmm1[4],xmm0[u,u,u,u,u,u,u,u,u,u,u,u,u,u] +; XOP-NEXT:retq + %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> + ret <16 x i8> %shuffle +} + ; PR39387 define <16 x i8> @shuffle_v16i8_5_6_7_8_9_10_27_28_29_30_31_0_1_2_3_4(<16 x i8> %a, <16 x i8> %b) { ; SSE2-LABEL: shuffle_v16i8_5_6_7_8_9_10_27_28_29_30_31_0_1_2_3_4: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
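As an aside, the shufflevector semantics exercised by the test above can be modelled directly: mask indices below 16 select from the first operand, indices 16–31 select from the second, and `uu` lanes are undef. A minimal scalar sketch (the helper name and the zero-for-undef convention are illustrative, not LLVM code):

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Model of LLVM's shufflevector on two v16i8 operands: index i < 16 selects
// a[i], index 16 <= i < 32 selects b[i - 16], and an absent index is undef
// (modelled here as zero, though any value would be legal).
std::array<uint8_t, 16>
shuffle_v16i8(const std::array<uint8_t, 16> &a,
              const std::array<uint8_t, 16> &b,
              const std::array<std::optional<int>, 16> &mask) {
  std::array<uint8_t, 16> r{};
  for (int i = 0; i < 16; ++i) {
    if (!mask[i])
      continue; // undef lane
    int idx = *mask[i];
    r[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return r;
}
```

For the `02_20_uu_..._uu` test, only lanes 0 and 1 are defined: element 2 of `%a` and element 4 of `%b` — which is exactly what the XOP `vpperm` lowering reads off.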
[llvm-branch-commits] [llvm] 636b877 - [X86][SSE] Add PR48823 HSUB test case
Author: Simon Pilgrim Date: 2021-01-22T10:05:22Z New Revision: 636b87785c1de64134254b688d30ab1248b16ed2 URL: https://github.com/llvm/llvm-project/commit/636b87785c1de64134254b688d30ab1248b16ed2 DIFF: https://github.com/llvm/llvm-project/commit/636b87785c1de64134254b688d30ab1248b16ed2.diff LOG: [X86][SSE] Add PR48823 HSUB test case Added: Modified: llvm/test/CodeGen/X86/haddsub-3.ll Removed: diff --git a/llvm/test/CodeGen/X86/haddsub-3.ll b/llvm/test/CodeGen/X86/haddsub-3.ll index 05ab83f8604de..651ab4ef39355 100644 --- a/llvm/test/CodeGen/X86/haddsub-3.ll +++ b/llvm/test/CodeGen/X86/haddsub-3.ll @@ -156,3 +156,56 @@ define <4 x double> @PR41414(i64 %x, <4 x double> %y) { %t3 = fadd <4 x double> zeroinitializer, %t2 ret <4 x double> %t3 } + +define <4 x float> @PR48823(<4 x float> %0, <4 x float> %1) { +; SSE2-LABEL: PR48823: +; SSE2: # %bb.0: +; SSE2-NEXT:movaps %xmm0, %xmm2 +; SSE2-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] +; SSE2-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; SSE2-NEXT:subps %xmm2, %xmm0 +; SSE2-NEXT:retq +; +; SSSE3-SLOW-LABEL: PR48823: +; SSSE3-SLOW: # %bb.0: +; SSSE3-SLOW-NEXT:movaps %xmm0, %xmm2 +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; SSSE3-SLOW-NEXT:subps %xmm2, %xmm0 +; SSSE3-SLOW-NEXT:retq +; +; SSSE3-FAST-LABEL: PR48823: +; SSSE3-FAST: # %bb.0: +; SSSE3-FAST-NEXT:movaps %xmm0, %xmm2 +; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm2 = xmm2[1,1],xmm1[2,3] +; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; SSSE3-FAST-NEXT:subps %xmm2, %xmm0 +; SSSE3-FAST-NEXT:retq +; +; AVX1-SLOW-LABEL: PR48823: +; AVX1-SLOW: # %bb.0: +; AVX1-SLOW-NEXT:vshufps {{.*#+}} xmm2 = xmm0[1,1],xmm1[2,3] +; AVX1-SLOW-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; AVX1-SLOW-NEXT:vsubps %xmm2, %xmm0, %xmm0 +; AVX1-SLOW-NEXT:retq +; +; AVX1-FAST-LABEL: PR48823: +; AVX1-FAST: # %bb.0: +; AVX1-FAST-NEXT:vshufps {{.*#+}} xmm2 = xmm0[1,1],xmm1[2,3] +; 
AVX1-FAST-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; AVX1-FAST-NEXT:vsubps %xmm2, %xmm0, %xmm0 +; AVX1-FAST-NEXT:retq +; +; AVX2-LABEL: PR48823: +; AVX2: # %bb.0: +; AVX2-NEXT:vshufps {{.*#+}} xmm2 = xmm0[1,1],xmm1[2,3] +; AVX2-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,2] +; AVX2-NEXT:vsubps %xmm2, %xmm0, %xmm0 +; AVX2-NEXT:retq + %3 = shufflevector <4 x float> %0, <4 x float> poison, <4 x i32> + %4 = fsub <4 x float> %0, %3 + %5 = shufflevector <4 x float> %1, <4 x float> poison, <4 x i32> + %6 = fsub <4 x float> %5, %1 + %7 = shufflevector <4 x float> %4, <4 x float> %6, <4 x i32> + ret <4 x float> %7 +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
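For reference on what PR48823 wants the shuffle+fsub sequence to become: SSE3 HSUBPS pairs up adjacent lanes of each source and subtracts within each pair. A scalar model of that target semantics (illustrative only — the test above currently does *not* lower to hsub, which is the point of the PR):

```cpp
#include <array>

// Reference semantics of SSE3 HSUBPS on two 4 x float vectors: the result
// takes adjacent-lane differences from a, then from b.
std::array<float, 4> hsub_ps(const std::array<float, 4> &a,
                             const std::array<float, 4> &b) {
  return {a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]};
}
```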
[llvm-branch-commits] [llvm] 69bc099 - [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Author: Simon Pilgrim Date: 2021-01-21T13:01:34Z New Revision: 69bc0990a9181e6eb86228276d2f59435a7fae67 URL: https://github.com/llvm/llvm-project/commit/69bc0990a9181e6eb86228276d2f59435a7fae67 DIFF: https://github.com/llvm/llvm-project/commit/69bc0990a9181e6eb86228276d2f59435a7fae67.diff LOG: [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387 Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/test/CodeGen/AArch64/aarch64-smull.ll llvm/test/CodeGen/AArch64/lowerMUL-newload.ll llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll llvm/test/CodeGen/ARM/lowerMUL-newload.ll llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll llvm/test/CodeGen/Thumb2/mve-vmulh.ll llvm/test/CodeGen/X86/combine-sra.ll llvm/test/CodeGen/X86/known-signbits-vector.ll llvm/test/CodeGen/X86/min-legal-vector-width.ll llvm/test/CodeGen/X86/uint_to_fp-3.ll llvm/test/CodeGen/X86/vector-trunc.ll Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 067bc436acdd..32c7ac2f6cfb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -11952,8 +11952,7 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) { } // Simplify the operands using demanded-bits information. 
- if (!VT.isVector() && - SimplifyDemandedBits(SDValue(N, 0))) + if (SimplifyDemandedBits(SDValue(N, 0))) return SDValue(N, 0); // (trunc adde(X, Y, Carry)) -> (adde trunc(X), trunc(Y), Carry) diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index cac4d8fff8bb..e2f42d050740 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -1986,7 +1986,8 @@ bool TargetLowering::SimplifyDemandedBits( // zero/one bits live out. unsigned OperandBitWidth = Src.getScalarValueSizeInBits(); APInt TruncMask = DemandedBits.zext(OperandBitWidth); -if (SimplifyDemandedBits(Src, TruncMask, Known, TLO, Depth + 1)) +if (SimplifyDemandedBits(Src, TruncMask, DemandedElts, Known, TLO, + Depth + 1)) return true; Known = Known.trunc(BitWidth); @@ -2009,9 +2010,9 @@ bool TargetLowering::SimplifyDemandedBits( // undesirable. break; -SDValue ShAmt = Src.getOperand(1); -auto *ShAmtC = dyn_cast(ShAmt); -if (!ShAmtC || ShAmtC->getAPIntValue().uge(BitWidth)) +const APInt *ShAmtC = +TLO.DAG.getValidShiftAmountConstant(Src, DemandedElts); +if (!ShAmtC) break; uint64_t ShVal = ShAmtC->getZExtValue(); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index c7bcd4de046c..6dd081dc3cb7 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -3399,6 +3399,7 @@ static SDValue skipExtensionForVectorMULL(SDNode *N, SelectionDAG ) { static bool isSignExtended(SDNode *N, SelectionDAG ) { return N->getOpcode() == ISD::SIGN_EXTEND || + N->getOpcode() == ISD::ANY_EXTEND || isExtendedBUILD_VECTOR(N, DAG, true); } diff --git a/llvm/test/CodeGen/AArch64/aarch64-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-smull.ll index 0a692192ec8b..0c232a4bf5a8 100644 --- a/llvm/test/CodeGen/AArch64/aarch64-smull.ll +++ b/llvm/test/CodeGen/AArch64/aarch64-smull.ll @@ -96,7 +96,7 @@ 
define <8 x i16> @amull_v8i8_v8i16(<8 x i8>* %A, <8 x i8>* %B) nounwind { ; CHECK: // %bb.0: ; CHECK-NEXT:ldr d0, [x0] ; CHECK-NEXT:ldr d1, [x1] -; CHECK-NEXT:umull v0.8h, v0.8b, v1.8b +; CHECK-NEXT:smull v0.8h, v0.8b, v1.8b ; CHECK-NEXT:bic v0.8h, #255, lsl #8 ; CHECK-NEXT:ret %tmp1 = load <8 x i8>, <8 x i8>* %A @@ -113,7 +113,7 @@ define <4 x i32> @amull_v4i16_v4i32(<4 x i16>* %A, <4 x i16>* %B) nounwind { ; CHECK: // %bb.0: ; CHECK-NEXT:ldr d0, [x0] ; CHECK-NEXT:ldr d1, [x1] -; CHECK-NEXT:umull v0.4s, v0.4h, v1.4h +; CHECK-NEXT:smull v0.4s, v0.4h, v1.4h ; CHECK-NEXT:movi v1.2d, #0x00 ; CHECK-NEXT:and v0.16b, v0.16b, v1.16b ; CHECK-NEXT:ret @@ -131,7 +131,7 @@ define <2 x i64> @amull_v2i32_v2i64(<2 x i32>* %A, <2 x i32>* %B) nounwind { ; CHECK: // %bb.0: ; CHECK-NEXT:ldr d0, [x0] ; CHECK-NEXT:
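The demanded-bits reasoning this combine now applies to vectors: truncating an iN value to iM demands only the low M bits of the source, so anything feeding the truncate may be simplified under that mask (per element, once DemandedElts is threaded through). A scalar sketch of the idea, with hypothetical helper names:

```cpp
#include <cstdint>

// Truncating 64 -> DstBits demands only the low DstBits of the source.
uint64_t demanded_mask_for_trunc(unsigned DstBits) {
  return DstBits >= 64 ? ~0ull : ((1ull << DstBits) - 1);
}

// Example consequence: trunc(x | c) == trunc(x) whenever the constant c has
// no bits inside the demanded mask, so the OR can be simplified away.
uint32_t trunc_or(uint64_t x, uint64_t c) {
  return (uint32_t)(x | c);
}
```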
[llvm-branch-commits] [llvm] 0ca81b9 - [X86][SSE] Add uitofp(trunc(and(lshr(x, c)))) vector test
Author: Simon Pilgrim Date: 2021-01-21T12:38:36Z New Revision: 0ca81b90d19d395c4891b7507cec0f063dd26d22 URL: https://github.com/llvm/llvm-project/commit/0ca81b90d19d395c4891b7507cec0f063dd26d22 DIFF: https://github.com/llvm/llvm-project/commit/0ca81b90d19d395c4891b7507cec0f063dd26d22.diff LOG: [X86][SSE] Add uitofp(trunc(and(lshr(x,c vector test Reduced from regression reported by @hans on D56387 Added: Modified: llvm/test/CodeGen/X86/uint_to_fp-3.ll Removed: diff --git a/llvm/test/CodeGen/X86/uint_to_fp-3.ll b/llvm/test/CodeGen/X86/uint_to_fp-3.ll index ca46b48b7731..5f1c3ec69a34 100644 --- a/llvm/test/CodeGen/X86/uint_to_fp-3.ll +++ b/llvm/test/CodeGen/X86/uint_to_fp-3.ll @@ -69,3 +69,64 @@ define <4 x double> @mask_ucvt_4i32_4f64(<4 x i32> %a) { %cvt = uitofp <4 x i32> %and to <4 x double> ret <4 x double> %cvt } + +; Regression noticed in D56387 +define <4 x float> @lshr_truncate_mask_ucvt_4i64_4f32(<4 x i64> *%p0) { +; X32-SSE-LABEL: lshr_truncate_mask_ucvt_4i64_4f32: +; X32-SSE: # %bb.0: +; X32-SSE-NEXT:movl {{[0-9]+}}(%esp), %eax +; X32-SSE-NEXT:movdqu (%eax), %xmm0 +; X32-SSE-NEXT:movdqu 16(%eax), %xmm1 +; X32-SSE-NEXT:psrlq $16, %xmm1 +; X32-SSE-NEXT:psrlq $16, %xmm0 +; X32-SSE-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X32-SSE-NEXT:andps {{\.LCPI.*}}, %xmm0 +; X32-SSE-NEXT:cvtdq2ps %xmm0, %xmm0 +; X32-SSE-NEXT:mulps {{\.LCPI.*}}, %xmm0 +; X32-SSE-NEXT:retl +; +; X32-AVX-LABEL: lshr_truncate_mask_ucvt_4i64_4f32: +; X32-AVX: # %bb.0: +; X32-AVX-NEXT:movl {{[0-9]+}}(%esp), %eax +; X32-AVX-NEXT:vmovdqu (%eax), %xmm0 +; X32-AVX-NEXT:vmovdqu 16(%eax), %xmm1 +; X32-AVX-NEXT:vpsrlq $16, %xmm1, %xmm1 +; X32-AVX-NEXT:vpsrlq $16, %xmm0, %xmm0 +; X32-AVX-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X32-AVX-NEXT:vpxor %xmm1, %xmm1, %xmm1 +; X32-AVX-NEXT:vpblendw {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2],xmm1[3],xmm0[4],xmm1[5],xmm0[6],xmm1[7] +; X32-AVX-NEXT:vcvtdq2ps %xmm0, %xmm0 +; X32-AVX-NEXT:vmulps {{\.LCPI.*}}, %xmm0, %xmm0 +; X32-AVX-NEXT:retl +; 
+; X64-SSE-LABEL: lshr_truncate_mask_ucvt_4i64_4f32: +; X64-SSE: # %bb.0: +; X64-SSE-NEXT:movdqu (%rdi), %xmm0 +; X64-SSE-NEXT:movdqu 16(%rdi), %xmm1 +; X64-SSE-NEXT:psrlq $16, %xmm1 +; X64-SSE-NEXT:psrlq $16, %xmm0 +; X64-SSE-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X64-SSE-NEXT:andps {{.*}}(%rip), %xmm0 +; X64-SSE-NEXT:cvtdq2ps %xmm0, %xmm0 +; X64-SSE-NEXT:mulps {{.*}}(%rip), %xmm0 +; X64-SSE-NEXT:retq +; +; X64-AVX-LABEL: lshr_truncate_mask_ucvt_4i64_4f32: +; X64-AVX: # %bb.0: +; X64-AVX-NEXT:vmovdqu (%rdi), %xmm0 +; X64-AVX-NEXT:vmovdqu 16(%rdi), %xmm1 +; X64-AVX-NEXT:vpsrlq $16, %xmm1, %xmm1 +; X64-AVX-NEXT:vpsrlq $16, %xmm0, %xmm0 +; X64-AVX-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; X64-AVX-NEXT:vpxor %xmm1, %xmm1, %xmm1 +; X64-AVX-NEXT:vpblendw {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2],xmm1[3],xmm0[4],xmm1[5],xmm0[6],xmm1[7] +; X64-AVX-NEXT:vcvtdq2ps %xmm0, %xmm0 +; X64-AVX-NEXT:vmulps {{.*}}(%rip), %xmm0, %xmm0 +; X64-AVX-NEXT:retq + %load = load <4 x i64>, <4 x i64>* %p0, align 2 + %lshr = lshr <4 x i64> %load, + %and = and <4 x i64> %lshr, + %uitofp = uitofp <4 x i64> %and to <4 x float> + %fmul = fmul <4 x float> %uitofp, + ret <4 x float> %fmul +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
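Per lane, the test reduces to: shift a 64-bit element right by 16, mask it down (the elided `and` constant appears to be 65535, judging by the `vpblendw` against zero in the AVX output), then convert to float. Because the masked value fits well within 31 bits, the unsigned conversion is exact via the cheap signed 32-bit `cvtdq2ps` path. A scalar model under that assumed mask:

```cpp
#include <cstdint>

// One lane of lshr_truncate_mask_ucvt_4i64_4f32: lshr 16, mask to 16 bits
// (assumed from the asm), then convert. The value fits in 31 bits, so a
// signed 32-bit int-to-float convert is exact.
float lane(uint64_t x) {
  uint64_t v = (x >> 16) & 0xFFFF;
  return (float)(uint32_t)v;
}
```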
[llvm-branch-commits] [llvm] 935bacd - [DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type
Author: Simon Pilgrim Date: 2021-01-21T12:38:36Z New Revision: 935bacd3a7244f04b7f39818e3fc589529474d13 URL: https://github.com/llvm/llvm-project/commit/935bacd3a7244f04b7f39818e3fc589529474d13 DIFF: https://github.com/llvm/llvm-project/commit/935bacd3a7244f04b7f39818e3fc589529474d13.diff LOG: [DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast. Added: Modified: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index b19033e3e427..cac4d8fff8bb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -2023,12 +2023,12 @@ bool TargetLowering::SimplifyDemandedBits( if (!(HighBits & DemandedBits)) { // None of the shifted in bits are needed. Add a truncate of the // shift input, then shift it. - if (TLO.LegalTypes()) -ShAmt = TLO.DAG.getConstant(ShVal, dl, getShiftAmountTy(VT, DL)); + SDValue NewShAmt = TLO.DAG.getConstant( + ShVal, dl, getShiftAmountTy(VT, DL, TLO.LegalTypes())); SDValue NewTrunc = TLO.DAG.getNode(ISD::TRUNCATE, dl, VT, Src.getOperand(0)); return TLO.CombineTo( - Op, TLO.DAG.getNode(ISD::SRL, dl, VT, NewTrunc, ShAmt)); + Op, TLO.DAG.getNode(ISD::SRL, dl, VT, NewTrunc, NewShAmt)); } break; } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
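The code path being fixed implements the rewrite trunc(srl(x, c)) -> srl(trunc(x), c), which is sound only when none of the demanded result bits come from the discarded high half. A scalar sketch showing where the two forms agree:

```cpp
#include <cstdint>

// trunc-then-shift vs shift-then-trunc for a 64 -> 32 bit truncate: the two
// differ only in the top c bits (which receive shifted-in high-half bits in
// one form and zeros in the other), so they agree on the low 32 - c bits --
// exactly the "HighBits & DemandedBits == 0" guard in SimplifyDemandedBits.
uint32_t trunc_of_shift(uint64_t x, unsigned c) { return (uint32_t)(x >> c); }
uint32_t shift_of_trunc(uint64_t x, unsigned c) { return (uint32_t)x >> c; }
```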
[llvm-branch-commits] [llvm] bc9ab9a - [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.
Author: Simon Pilgrim Date: 2021-01-21T11:04:09Z New Revision: bc9ab9a5cd6bafc5e1293f3d5d51638f8f5cd26c URL: https://github.com/llvm/llvm-project/commit/bc9ab9a5cd6bafc5e1293f3d5d51638f8f5cd26c DIFF: https://github.com/llvm/llvm-project/commit/bc9ab9a5cd6bafc5e1293f3d5d51638f8f5cd26c.diff LOG: [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI. Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy. Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index f7c6a77b9a03..067bc436acdd 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -14940,16 +14940,13 @@ bool DAGCombiner::CombineToPreIndexedLoadStore(SDNode *N) { // Therefore, we have: // t0 = (x0 * offset0 - x1 * y0 * y1 *offset1) + (y0 * y1) * t1 -ConstantSDNode *CN = - cast(OtherUses[i]->getOperand(OffsetIdx)); -int X0, X1, Y0, Y1; +auto *CN = cast(OtherUses[i]->getOperand(OffsetIdx)); const APInt = CN->getAPIntValue(); -APInt Offset1 = cast(Offset)->getAPIntValue(); - -X0 = (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 1) ? -1 : 1; -Y0 = (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 0) ? -1 : 1; -X1 = (AM == ISD::PRE_DEC && !Swapped) ? -1 : 1; -Y1 = (AM == ISD::PRE_DEC && Swapped) ? -1 : 1; +const APInt = cast(Offset)->getAPIntValue(); +int X0 = (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 1) ? -1 : 1; +int Y0 = (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 0) ? -1 : 1; +int X1 = (AM == ISD::PRE_DEC && !Swapped) ? -1 : 1; +int Y1 = (AM == ISD::PRE_DEC && Swapped) ? -1 : 1; unsigned Opcode = (Y0 * Y1 < 0) ? ISD::SUB : ISD::ADD; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
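The X0/X1/Y0/Y1 values the cleaned-up code computes are just ±1 signs chosen from the use's opcode (SUB vs ADD) and operand position; the rewrite itself is then plain integer algebra. A hedged sketch of that identity (helper name hypothetical):

```cpp
// Sign-selection scheme from CombineToPreIndexedLoadStore: each of X0, X1,
// Y0, Y1 is +1 or -1, and the replacement offset follows
//   t0 = (x0 * Offset0 - x1 * y0 * y1 * Offset1) + (y0 * y1) * t1
long long rewriteOffset(long long Offset0, long long Offset1, long long t1,
                        int X0, int X1, int Y0, int Y1) {
  return (X0 * Offset0 - X1 * Y0 * Y1 * Offset1) + (Y0 * Y1) * t1;
}
```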
[llvm-branch-commits] [llvm] 86021d9 - [X86] Avoid a std::string copy by replacing auto with const auto&. NFC.
Author: Simon Pilgrim Date: 2021-01-21T11:04:07Z New Revision: 86021d98d3f8b27f7956cee04f11505c2e836e81 URL: https://github.com/llvm/llvm-project/commit/86021d98d3f8b27f7956cee04f11505c2e836e81 DIFF: https://github.com/llvm/llvm-project/commit/86021d98d3f8b27f7956cee04f11505c2e836e81.diff LOG: [X86] Avoid a std::string copy by replacing auto with const auto&. NFC. Fixes msvc analyzer warning. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 7cd17f109935..c5cc23f6236e 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -2516,11 +2516,11 @@ Value *X86TargetLowering::getIRStackGuard(IRBuilder<> ) const { if (Offset == (unsigned)-1) Offset = (Subtarget.is64Bit()) ? 0x28 : 0x14; - auto GuardReg = getTargetMachine().Options.StackProtectorGuardReg; -if (GuardReg == "fs") - AddressSpace = X86AS::FS; -else if (GuardReg == "gs") - AddressSpace = X86AS::GS; + const auto = getTargetMachine().Options.StackProtectorGuardReg; + if (GuardReg == "fs") +AddressSpace = X86AS::FS; + else if (GuardReg == "gs") +AddressSpace = X86AS::GS; return SegmentOffset(IRB, Offset, AddressSpace); } } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
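The distinction the patch exploits: plain `auto` deduces `std::string` and makes a copy, while `const auto &` binds to the existing object with no copy. A copy-counting type makes the difference observable (a minimal sketch, not the actual `TargetOptions` type):

```cpp
// `auto g = opts.field;` copies; `const auto &g = opts.field;` does not.
struct Tracked {
  static int copies;
  Tracked() = default;
  Tracked(const Tracked &) { ++copies; }
};
int Tracked::copies = 0;

struct Options { Tracked GuardReg; };
```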
[llvm-branch-commits] [llvm] b8b5e87 - [X86][AVX] Handle vperm2x128 shuffling of a subvector splat.
Author: Simon Pilgrim Date: 2021-01-20T18:16:33Z New Revision: b8b5e87e6b8102d77e4e6beccf4e0f0237acc897 URL: https://github.com/llvm/llvm-project/commit/b8b5e87e6b8102d77e4e6beccf4e0f0237acc897 DIFF: https://github.com/llvm/llvm-project/commit/b8b5e87e6b8102d77e4e6beccf4e0f0237acc897.diff LOG: [X86][AVX] Handle vperm2x128 shuffling of a subvector splat. We already handle "vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31" for shuffling of the upper subvectors, but we weren't dealing with the case when we were splatting the upper subvector from a single source. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avx-vperm2x128.ll llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 0b52b2021c73..852078a299b9 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37324,6 +37324,14 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG , SDValue Ins1 = peekThroughBitcasts(N.getOperand(1)); unsigned Imm = N.getConstantOperandVal(2); +// Handle subvector splat by tweaking values to match binary concat. 
+// vperm2x128 (ins ?, X, C1), undef, 0x11 -> +// vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31 -> concat X, X +if (Imm == 0x11 && Ins1.isUndef()) { + Imm = 0x31; + Ins1 = Ins0; +} + if (!(Imm == 0x31 && Ins0.getOpcode() == ISD::INSERT_SUBVECTOR && Ins1.getOpcode() == ISD::INSERT_SUBVECTOR && diff --git a/llvm/test/CodeGen/X86/avx-vperm2x128.ll b/llvm/test/CodeGen/X86/avx-vperm2x128.ll index a519f55aaafe..bfab2f186bf5 100644 --- a/llvm/test/CodeGen/X86/avx-vperm2x128.ll +++ b/llvm/test/CodeGen/X86/avx-vperm2x128.ll @@ -130,7 +130,6 @@ define <32 x i8> @shuffle_v32i8_2323_domain(<32 x i8> %a, <32 x i8> %b) nounwind ; AVX1-NEXT:vpcmpeqd %xmm1, %xmm1, %xmm1 ; AVX1-NEXT:vpsubb %xmm1, %xmm0, %xmm0 ; AVX1-NEXT:vinsertf128 $1, %xmm0, %ymm0, %ymm0 -; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] ; AVX1-NEXT:retq ; ; AVX2-LABEL: shuffle_v32i8_2323_domain: diff --git a/llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll b/llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll index f1af4faf67e2..5f2a3cd72b71 100644 --- a/llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll +++ b/llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll @@ -3098,14 +3098,13 @@ entry: define <8 x i32> @add_v8i32_02468ACE_13579BDF(<8 x i32> %a, <8 x i32> %b) { ; AVX1-LABEL: add_v8i32_02468ACE_13579BDF: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT:vphaddd %xmm1, %xmm0, %xmm2 +; AVX1-NEXT:vextractf128 $1, %ymm1, %xmm2 +; AVX1-NEXT:vextractf128 $1, %ymm0, %xmm3 +; AVX1-NEXT:vphaddd %xmm2, %xmm3, %xmm2 ; AVX1-NEXT:vinsertf128 $1, %xmm2, %ymm2, %ymm2 -; AVX1-NEXT:vextractf128 $1, %ymm1, %xmm1 -; AVX1-NEXT:vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT:vphaddd %xmm1, %xmm0, %xmm0 ; AVX1-NEXT:vinsertf128 $1, %xmm0, %ymm0, %ymm0 -; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] -; AVX1-NEXT:vshufpd {{.*#+}} ymm0 = ymm2[0],ymm0[0],ymm2[3],ymm0[3] +; AVX1-NEXT:vshufpd {{.*#+}} ymm0 = ymm0[0],ymm2[0],ymm0[3],ymm2[3] ; AVX1-NEXT:retq ; ; AVX2OR512VL-LABEL: add_v8i32_02468ACE_13579BDF: @@ -3123,14 +3122,13 @@ entry: define <8 x i32> 
@add_v8i32_8ACE0246_9BDF1357(<8 x i32> %a, <8 x i32> %b) { ; AVX1-LABEL: add_v8i32_8ACE0246_9BDF1357: ; AVX1: # %bb.0: # %entry -; AVX1-NEXT:vphaddd %xmm1, %xmm0, %xmm2 +; AVX1-NEXT:vextractf128 $1, %ymm1, %xmm2 +; AVX1-NEXT:vextractf128 $1, %ymm0, %xmm3 +; AVX1-NEXT:vphaddd %xmm2, %xmm3, %xmm2 ; AVX1-NEXT:vinsertf128 $1, %xmm2, %ymm2, %ymm2 -; AVX1-NEXT:vextractf128 $1, %ymm1, %xmm1 -; AVX1-NEXT:vextractf128 $1, %ymm0, %xmm0 ; AVX1-NEXT:vphaddd %xmm1, %xmm0, %xmm0 ; AVX1-NEXT:vinsertf128 $1, %xmm0, %ymm0, %ymm0 -; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,2,3] -; AVX1-NEXT:vshufpd {{.*#+}} ymm0 = ymm2[1],ymm0[1],ymm2[2],ymm0[2] +; AVX1-NEXT:vshufpd {{.*#+}} ymm0 = ymm0[1],ymm2[1],ymm0[2],ymm2[2] ; AVX1-NEXT:retq ; ; AVX2OR512VL-LABEL: add_v8i32_8ACE0246_9BDF1357: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
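The vperm2x128 immediate semantics behind the fold: each 128-bit lane of the result is picked by a nibble of the immediate (0–3 select a lane of {src1, src2}; bit 3 zeroes the lane). With an undef second source, imm 0x11 ("src1's upper lane, twice") is the same operation as imm 0x31 applied to (src1, src1) — the splat tweak the patch makes. A reference model:

```cpp
#include <array>
#include <cstdint>

using V256 = std::array<uint64_t, 4>; // two 128-bit lanes, 2 x u64 each

// Reference semantics of VPERM2F128/VPERM2I128: low nibble selects the
// result's low lane, high nibble the high lane; bit 3 of a nibble zeroes it.
V256 vperm2x128(const V256 &a, const V256 &b, uint8_t imm) {
  auto lane = [&](unsigned sel) -> std::array<uint64_t, 2> {
    if (sel & 8) return {0, 0};              // zeroing selector
    const V256 &src = (sel & 2) ? b : a;     // source select
    unsigned base = (sel & 1) ? 2 : 0;       // lower/upper lane select
    return {src[base], src[base + 1]};
  };
  auto lo = lane(imm & 0xF), hi = lane((imm >> 4) & 0xF);
  return {lo[0], lo[1], hi[0], hi[1]};
}
```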
[llvm-branch-commits] [llvm] cad4275 - [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Author: Simon Pilgrim Date: 2021-01-20T15:39:58Z New Revision: cad4275d697c601761e0819863f487def73c67f8 URL: https://github.com/llvm/llvm-project/commit/cad4275d697c601761e0819863f487def73c67f8 DIFF: https://github.com/llvm/llvm-project/commit/cad4275d697c601761e0819863f487def73c67f8.diff LOG: [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387 Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/test/CodeGen/AArch64/aarch64-smull.ll llvm/test/CodeGen/AArch64/lowerMUL-newload.ll llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll llvm/test/CodeGen/ARM/lowerMUL-newload.ll llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll llvm/test/CodeGen/Thumb2/mve-vmulh.ll llvm/test/CodeGen/X86/combine-sra.ll llvm/test/CodeGen/X86/known-signbits-vector.ll llvm/test/CodeGen/X86/min-legal-vector-width.ll llvm/test/CodeGen/X86/vector-trunc.ll Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index f7c6a77b9a03..680662536161 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -11952,8 +11952,7 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) { } // Simplify the operands using demanded-bits information. 
- if (!VT.isVector() && - SimplifyDemandedBits(SDValue(N, 0))) + if (SimplifyDemandedBits(SDValue(N, 0))) return SDValue(N, 0); // (trunc adde(X, Y, Carry)) -> (adde trunc(X), trunc(Y), Carry) diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index b19033e3e427..5613db8f724d 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -1986,7 +1986,8 @@ bool TargetLowering::SimplifyDemandedBits( // zero/one bits live out. unsigned OperandBitWidth = Src.getScalarValueSizeInBits(); APInt TruncMask = DemandedBits.zext(OperandBitWidth); -if (SimplifyDemandedBits(Src, TruncMask, Known, TLO, Depth + 1)) +if (SimplifyDemandedBits(Src, TruncMask, DemandedElts, Known, TLO, + Depth + 1)) return true; Known = Known.trunc(BitWidth); @@ -2009,9 +2010,9 @@ bool TargetLowering::SimplifyDemandedBits( // undesirable. break; -SDValue ShAmt = Src.getOperand(1); -auto *ShAmtC = dyn_cast(ShAmt); -if (!ShAmtC || ShAmtC->getAPIntValue().uge(BitWidth)) +const APInt *ShAmtC = +TLO.DAG.getValidShiftAmountConstant(Src, DemandedElts); +if (!ShAmtC) break; uint64_t ShVal = ShAmtC->getZExtValue(); @@ -2023,6 +2024,7 @@ bool TargetLowering::SimplifyDemandedBits( if (!(HighBits & DemandedBits)) { // None of the shifted in bits are needed. Add a truncate of the // shift input, then shift it. 
+ SDValue ShAmt = Src.getOperand(1); if (TLO.LegalTypes()) ShAmt = TLO.DAG.getConstant(ShVal, dl, getShiftAmountTy(VT, DL)); SDValue NewTrunc = diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index c7bcd4de046c..6dd081dc3cb7 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -3399,6 +3399,7 @@ static SDValue skipExtensionForVectorMULL(SDNode *N, SelectionDAG ) { static bool isSignExtended(SDNode *N, SelectionDAG ) { return N->getOpcode() == ISD::SIGN_EXTEND || + N->getOpcode() == ISD::ANY_EXTEND || isExtendedBUILD_VECTOR(N, DAG, true); } diff --git a/llvm/test/CodeGen/AArch64/aarch64-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-smull.ll index 0a692192ec8b..0c232a4bf5a8 100644 --- a/llvm/test/CodeGen/AArch64/aarch64-smull.ll +++ b/llvm/test/CodeGen/AArch64/aarch64-smull.ll @@ -96,7 +96,7 @@ define <8 x i16> @amull_v8i8_v8i16(<8 x i8>* %A, <8 x i8>* %B) nounwind { ; CHECK: // %bb.0: ; CHECK-NEXT:ldr d0, [x0] ; CHECK-NEXT:ldr d1, [x1] -; CHECK-NEXT:umull v0.8h, v0.8b, v1.8b +; CHECK-NEXT:smull v0.8h, v0.8b, v1.8b ; CHECK-NEXT:bic v0.8h, #255, lsl #8 ; CHECK-NEXT:ret %tmp1 = load <8 x i8>, <8 x i8>* %A @@ -113,7 +113,7 @@ define <4 x i32> @amull_v4i16_v4i32(<4 x i16>* %A, <4 x i16>* %B) nounwind { ; CHECK: // %bb.0: ; CHECK-NEXT:ldr d0, [x0] ; CHECK-NEXT:ldr d1, [x1] -; CHECK-NEXT:umull v0.4s, v0.4h, v1.4h +; CHECK-NEXT:smull v0.4s, v0.4h, v1.4h ; CHECK-NEXT:movi v1.2d, #0x00 ; CHECK-NEXT:and v0.16b, v0.16b, v1.16b ; CHECK-NEXT:ret @@
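The aarch64-smull.ll diffs (umull becoming smull) are benign because the amull pattern masks the widened product back down, and the low bits of a product do not depend on how the operands were extended. A scalar demonstration:

```cpp
#include <cstdint>

// Low product bits are extension-agnostic: sign- and zero-extending i8
// operands before a 16-bit multiply yields the same low 8 bits, so an
// amull (widening mul + mask) may legally use either smull or umull.
uint16_t mul_sext(int8_t a, int8_t b) {
  return (uint16_t)((int16_t)a * (int16_t)b);
}
uint16_t mul_zext(uint8_t a, uint8_t b) {
  return (uint16_t)((uint16_t)a * (uint16_t)b);
}
```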
[llvm-branch-commits] [llvm] 19d0284 - [X86][AVX] Fold extract_subvector(VSRLI/VSHLI(x, 32)) -> VSRLI/VSHLI(extract_subvector(x), 32)
Author: Simon Pilgrim Date: 2021-01-20T14:34:54Z New Revision: 19d02842ee56089b9208875ce4582e113e08fb6d URL: https://github.com/llvm/llvm-project/commit/19d02842ee56089b9208875ce4582e113e08fb6d DIFF: https://github.com/llvm/llvm-project/commit/19d02842ee56089b9208875ce4582e113e08fb6d.diff LOG: [X86][AVX] Fold extract_subvector(VSRLI/VSHLI(x,32)) -> VSRLI/VSHLI(extract_subvector(x),32) As discussed on D56387, if we're shifting to extract the upper/lower half of a vXi64 vector then we're actually better off performing this at the subvector level as its very likely to fold into something. combineConcatVectorOps can perform this in reverse if necessary. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/combine-sra.ll llvm/test/CodeGen/X86/pmul.ll llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 0ee671710219..0b52b2021c73 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -49799,8 +49799,8 @@ static SDValue combineExtractSubvector(SDNode *N, SelectionDAG , // If we're extracting the lowest subvector and we're the only user, // we may be able to perform this with a smaller vector width. + unsigned InOpcode = InVec.getOpcode(); if (IdxVal == 0 && InVec.hasOneUse()) { -unsigned InOpcode = InVec.getOpcode(); if (VT == MVT::v2f64 && InVecVT == MVT::v4f64) { // v2f64 CVTDQ2PD(v4i32). if (InOpcode == ISD::SINT_TO_FP && @@ -49853,6 +49853,17 @@ static SDValue combineExtractSubvector(SDNode *N, SelectionDAG , } } + // Always split vXi64 logical shifts where we're extracting the upper 32-bits + // as this is very likely to fold into a shuffle/truncation. 
+ if ((InOpcode == X86ISD::VSHLI || InOpcode == X86ISD::VSRLI) && + InVecVT.getScalarSizeInBits() == 64 && + InVec.getConstantOperandAPInt(1) == 32) { +SDLoc DL(N); +SDValue Ext = +extractSubVector(InVec.getOperand(0), IdxVal, DAG, DL, SizeInBits); +return DAG.getNode(InOpcode, DL, VT, Ext, InVec.getOperand(1)); + } + return SDValue(); } diff --git a/llvm/test/CodeGen/X86/combine-sra.ll b/llvm/test/CodeGen/X86/combine-sra.ll index 28a73cdb6a41..453a61b8565e 100644 --- a/llvm/test/CodeGen/X86/combine-sra.ll +++ b/llvm/test/CodeGen/X86/combine-sra.ll @@ -207,9 +207,8 @@ define <4 x i32> @combine_vec_ashr_trunc_lshr(<4 x i64> %x) { ; ; AVX2-SLOW-LABEL: combine_vec_ashr_trunc_lshr: ; AVX2-SLOW: # %bb.0: -; AVX2-SLOW-NEXT:vpsrlq $32, %ymm0, %ymm0 -; AVX2-SLOW-NEXT:vextracti128 $1, %ymm0, %xmm1 -; AVX2-SLOW-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; AVX2-SLOW-NEXT:vextractf128 $1, %ymm0, %xmm1 +; AVX2-SLOW-NEXT:vshufps {{.*#+}} xmm0 = xmm0[1,3],xmm1[1,3] ; AVX2-SLOW-NEXT:vpsravd {{.*}}(%rip), %xmm0, %xmm0 ; AVX2-SLOW-NEXT:vzeroupper ; AVX2-SLOW-NEXT:retq diff --git a/llvm/test/CodeGen/X86/pmul.ll b/llvm/test/CodeGen/X86/pmul.ll index db6009f273d2..56476eea323e 100644 --- a/llvm/test/CodeGen/X86/pmul.ll +++ b/llvm/test/CodeGen/X86/pmul.ll @@ -1150,9 +1150,8 @@ define <4 x i32> @mul_v4i64_zero_lower(<4 x i32> %val1, <4 x i64> %val2) { ; AVX-NEXT:vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero ; AVX-NEXT:vpsrlq $32, %ymm1, %ymm1 ; AVX-NEXT:vpmuludq %ymm1, %ymm0, %ymm0 -; AVX-NEXT:vpsllq $32, %ymm0, %ymm0 ; AVX-NEXT:vextracti128 $1, %ymm0, %xmm1 -; AVX-NEXT:vshufps {{.*#+}} xmm0 = xmm0[1,3],xmm1[1,3] +; AVX-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] ; AVX-NEXT:vzeroupper ; AVX-NEXT:retq entry: diff --git a/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll b/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll index a274baefc1ef..f0cb46e63d8f 100644 --- a/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll +++ 
b/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll @@ -834,19 +834,20 @@ define <4 x double> @uitofp_v4i64_v4f64(<4 x i64> %x) #0 { ; ; AVX2-64-LABEL: uitofp_v4i64_v4f64: ; AVX2-64: # %bb.0: -; AVX2-64-NEXT:vpsrlq $32, %ymm0, %ymm1 -; AVX2-64-NEXT:vextracti128 $1, %ymm1, %xmm2 +; AVX2-64-NEXT:vextracti128 $1, %ymm0, %xmm1 +; AVX2-64-NEXT:vpsrlq $32, %xmm1, %xmm1 +; AVX2-64-NEXT:vpextrq $1, %xmm1, %rax +; AVX2-64-NEXT:vcvtsi2sd %rax, %xmm2, %xmm2 +; AVX2-64-NEXT:vmovq %xmm1, %rax +; AVX2-64-NEXT:vcvtsi2sd %rax, %xmm3, %xmm1 +; AVX2-64-NEXT:vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; AVX2-64-NEXT:vpsrlq $32, %xmm0, %xmm2 ; AVX2-64-NEXT:vpextrq $1, %xmm2, %rax ; AVX2-64-NEXT:vcvtsi2sd %rax, %xmm3, %xmm3 ; AVX2-64-NEXT:vmovq %xmm2, %rax ; AVX2-64-NEXT:vcvtsi2sd %rax, %xmm4, %xmm2 ; AVX2-64-NEXT:vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0] -; AVX2-64-NEXT:vpextrq $1, %xmm1, %rax -;
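The fold rests on a simple scalar fact: a 64-bit logical right shift by 32 followed by truncation to 32 bits is just "take the upper half", which a shuffle selecting the odd 32-bit elements does directly — hence splitting the shift at the subvector level folds away, as the combine-sra.ll and pmul.ll diffs show. A sketch:

```cpp
#include <cstdint>

// (uint32_t)(x >> 32) extracts the upper 32-bit half of a 64-bit element --
// the same result as a lane shuffle picking the odd 32-bit halves.
uint32_t upper_half_via_shift(uint64_t x) { return (uint32_t)(x >> 32); }
uint32_t upper_half_via_extract(uint64_t x) {
  uint32_t halves[2] = {(uint32_t)x, (uint32_t)(x >> 32)};
  return halves[1]; // shuffle index selecting the odd half
}
```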
[llvm-branch-commits] [llvm] 2988f94 - [X86] Regenerate fmin/fmax reduction tests
Author: Simon Pilgrim Date: 2021-01-19T14:28:44Z New Revision: 2988f940d861f0fa76bc5b749772f2b9239d5a1b URL: https://github.com/llvm/llvm-project/commit/2988f940d861f0fa76bc5b749772f2b9239d5a1b DIFF: https://github.com/llvm/llvm-project/commit/2988f940d861f0fa76bc5b749772f2b9239d5a1b.diff LOG: [X86] Regenerate fmin/fmax reduction tests Add missing check-prefixes + v1f32 tests Added: Modified: llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll llvm/test/CodeGen/X86/vector-reduce-fmax.ll llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll llvm/test/CodeGen/X86/vector-reduce-fmin.ll Removed: diff --git a/llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll b/llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll index 021c48deece7..167248181ecb 100644 --- a/llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll +++ b/llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll @@ -1,15 +1,23 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=SSE,SSE2 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE,SSE41 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefix=AVX512 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512vl | FileCheck %s --check-prefix=AVX512 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=ALL,SSE,SSE2 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=ALL,SSE,SSE41 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=ALL,AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s 
--check-prefixes=ALL,AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=ALL,AVX512 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512vl | FileCheck %s --check-prefixes=ALL,AVX512 ; ; vXf32 ; +define float @test_v1f32(<1 x float> %a0) { +; ALL-LABEL: test_v1f32: +; ALL: # %bb.0: +; ALL-NEXT:retq + %1 = call nnan float @llvm.vector.reduce.fmax.v1f32(<1 x float> %a0) + ret float %1 +} + define float @test_v2f32(<2 x float> %a0) { ; SSE2-LABEL: test_v2f32: ; SSE2: # %bb.0: @@ -458,10 +466,10 @@ define half @test_v2f16(<2 x half> %a0) nounwind { ; SSE-NEXT:subq $16, %rsp ; SSE-NEXT:movl %edi, %ebx ; SSE-NEXT:movzwl %si, %edi -; SSE-NEXT:callq __gnu_h2f_ieee +; SSE-NEXT:callq __gnu_h2f_ieee@PLT ; SSE-NEXT:movaps %xmm0, (%rsp) # 16-byte Spill ; SSE-NEXT:movzwl %bx, %edi -; SSE-NEXT:callq __gnu_h2f_ieee +; SSE-NEXT:callq __gnu_h2f_ieee@PLT ; SSE-NEXT:movaps %xmm0, %xmm1 ; SSE-NEXT:cmpunordss %xmm0, %xmm1 ; SSE-NEXT:movaps %xmm1, %xmm2 @@ -471,7 +479,7 @@ define half @test_v2f16(<2 x half> %a0) nounwind { ; SSE-NEXT:andnps %xmm3, %xmm1 ; SSE-NEXT:orps %xmm2, %xmm1 ; SSE-NEXT:movaps %xmm1, %xmm0 -; SSE-NEXT:callq __gnu_f2h_ieee +; SSE-NEXT:callq __gnu_f2h_ieee@PLT ; SSE-NEXT:addq $16, %rsp ; SSE-NEXT:popq %rbx ; SSE-NEXT:retq @@ -482,16 +490,16 @@ define half @test_v2f16(<2 x half> %a0) nounwind { ; AVX-NEXT:subq $16, %rsp ; AVX-NEXT:movl %esi, %ebx ; AVX-NEXT:movzwl %di, %edi -; AVX-NEXT:callq __gnu_h2f_ieee +; AVX-NEXT:callq __gnu_h2f_ieee@PLT ; AVX-NEXT:vmovss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill ; AVX-NEXT:movzwl %bx, %edi -; AVX-NEXT:callq __gnu_h2f_ieee +; AVX-NEXT:callq __gnu_h2f_ieee@PLT ; AVX-NEXT:vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm2 # 4-byte Reload ; AVX-NEXT:# xmm2 = mem[0],zero,zero,zero ; AVX-NEXT:vmaxss %xmm2, %xmm0, %xmm1 ; AVX-NEXT:vcmpunordss %xmm2, %xmm2, %xmm2 ; AVX-NEXT:vblendvps %xmm2, %xmm0, %xmm1, %xmm0 -; AVX-NEXT:callq __gnu_f2h_ieee +; 
AVX-NEXT:callq __gnu_f2h_ieee@PLT ; AVX-NEXT:addq $16, %rsp ; AVX-NEXT:popq %rbx ; AVX-NEXT:retq @@ -514,6 +522,7 @@ define half @test_v2f16(<2 x half> %a0) nounwind { %1 = call nnan half @llvm.vector.reduce.fmax.v2f16(<2 x half> %a0) ret half %1 } +declare float @llvm.vector.reduce.fmax.v1f32(<1 x float>) declare float @llvm.vector.reduce.fmax.v2f32(<2 x float>) declare float @llvm.vector.reduce.fmax.v4f32(<4 x float>) declare float @llvm.vector.reduce.fmax.v8f32(<8 x float>) diff --git a/llvm/test/CodeGen/X86/vector-reduce-fmax.ll b/llvm/test/CodeGen/X86/vector-reduce-fmax.ll index af8141a119ab..d7d754ac5548 100644 ---
[llvm-branch-commits] [llvm] 5626adc - [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x, c)) -> packss(sra(x, c))
Author: Simon Pilgrim Date: 2021-01-19T11:04:13Z New Revision: 5626adcd6bbaadd12fe5bf15cd2d39ece2e5c406 URL: https://github.com/llvm/llvm-project/commit/5626adcd6bbaadd12fe5bf15cd2d39ece2e5c406 DIFF: https://github.com/llvm/llvm-project/commit/5626adcd6bbaadd12fe5bf15cd2d39ece2e5c406.diff LOG: [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c)) If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-trunc.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 97fcef0b92fa..0ee671710219 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -46071,9 +46071,23 @@ static SDValue combineVectorSignBitsTruncation(SDNode *N, const SDLoc , if (SVT == MVT::i32 && NumSignBits != InSVT.getSizeInBits()) return SDValue(); - if (NumSignBits > (InSVT.getSizeInBits() - NumPackedSignBits)) + unsigned MinSignBits = InSVT.getSizeInBits() - NumPackedSignBits; + if (NumSignBits > MinSignBits) return truncateVectorWithPACK(X86ISD::PACKSS, VT, In, DL, DAG, Subtarget); + // If we have a srl that only generates signbits that we will discard in + // the truncation then we can use PACKSS by converting the srl to a sra. + // SimplifyDemandedBits often relaxes sra to srl so we need to reverse it. 
+ if (In.getOpcode() == ISD::SRL && N->isOnlyUserOf(In.getNode())) +if (const APInt *ShAmt = DAG.getValidShiftAmountConstant( +In, APInt::getAllOnesValue(VT.getVectorNumElements( { + if (*ShAmt == MinSignBits) { +SDValue NewIn = DAG.getNode(ISD::SRA, DL, InVT, In->ops()); +return truncateVectorWithPACK(X86ISD::PACKSS, VT, NewIn, DL, DAG, + Subtarget); + } +} + return SDValue(); } diff --git a/llvm/test/CodeGen/X86/vector-trunc.ll b/llvm/test/CodeGen/X86/vector-trunc.ll index f35e315bbb0b..1d8d6f66521e 100644 --- a/llvm/test/CodeGen/X86/vector-trunc.ll +++ b/llvm/test/CodeGen/X86/vector-trunc.ll @@ -452,10 +452,9 @@ define <8 x i16> @trunc8i32_8i16_lshr(<8 x i32> %a) { ; ; SSSE3-LABEL: trunc8i32_8i16_lshr: ; SSSE3: # %bb.0: # %entry -; SSSE3-NEXT:movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,10,11,14,15,14,15,128,128] -; SSSE3-NEXT:pshufb %xmm2, %xmm1 -; SSSE3-NEXT:pshufb %xmm2, %xmm0 -; SSSE3-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; SSSE3-NEXT:psrad $16, %xmm1 +; SSSE3-NEXT:psrad $16, %xmm0 +; SSSE3-NEXT:packssdw %xmm1, %xmm0 ; SSSE3-NEXT:retq ; ; SSE41-LABEL: trunc8i32_8i16_lshr: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
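The observation behind this fold can be checked with plain scalar arithmetic: a srl and an sra by the same amount disagree only in the bits that the following truncation discards, so when the shift amount equals the number of discarded bits the two shifts are interchangeable. A minimal standalone sketch of the i32 -> i16 case (illustrative function names, not the LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// For trunc(i16, x >> 16) the surviving 16 bits are x[31:16] whether
// the shift was logical or arithmetic: the two shifts differ only in
// the upper 16 bits of the shifted value, which the truncation drops.
// This is the scalar form of rewriting trunc(srl(x,c)) as
// trunc(sra(x,c)) so the vector truncation can use PACKSS.
uint16_t trunc_srl16(uint32_t x) {
  return static_cast<uint16_t>(x >> 16);                        // logical shift
}

uint16_t trunc_sra16(uint32_t x) {
  return static_cast<uint16_t>(static_cast<int32_t>(x) >> 16);  // arithmetic shift
}
```

(Right-shifting a negative signed value is arithmetic on all mainstream compilers, and guaranteed arithmetic since C++20.)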
[llvm-branch-commits] [llvm] ce06475 - [X86][AVX] IsElementEquivalent - add matchShuffleWithUNPCK + VBROADCAST/VBROADCAST_LOAD handling
Author: Simon Pilgrim Date: 2021-01-18T15:55:00Z New Revision: ce06475da94f1040d17d46d471dd48478576a76f URL: https://github.com/llvm/llvm-project/commit/ce06475da94f1040d17d46d471dd48478576a76f DIFF: https://github.com/llvm/llvm-project/commit/ce06475da94f1040d17d46d471dd48478576a76f.diff LOG: [X86][AVX] IsElementEquivalent - add matchShuffleWithUNPCK + VBROADCAST/VBROADCAST_LOAD handling Specify LHS/RHS operands in matchShuffleWithUNPCK's calls to isTargetShuffleEquivalent, and handle VBROADCAST/VBROADCAST_LOAD matching in IsElementEquivalent Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/avg.ll llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 78a5d4a6dfbf8..60a2fd233d5cb 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -10960,6 +10960,11 @@ static bool IsElementEquivalent(int MaskSize, SDValue Op, SDValue ExpectedOp, MaskSize == (int)ExpectedOp.getNumOperands()) return Op.getOperand(Idx) == ExpectedOp.getOperand(ExpectedIdx); break; + case X86ISD::VBROADCAST: + case X86ISD::VBROADCAST_LOAD: +// TODO: Handle MaskSize != Op.getValueType().getVectorNumElements()? +return (Op == ExpectedOp && +Op.getValueType().getVectorNumElements() == MaskSize); case X86ISD::HADD: case X86ISD::HSUB: case X86ISD::FHADD: @@ -11321,7 +11326,8 @@ static bool matchShuffleWithUNPCK(MVT VT, SDValue , SDValue , // Attempt to match the target mask against the unpack lo/hi mask patterns. SmallVector Unpckl, Unpckh; createUnpackShuffleMask(VT, Unpckl, /* Lo = */ true, IsUnary); - if (isTargetShuffleEquivalent(VT, TargetMask, Unpckl)) { + if (isTargetShuffleEquivalent(VT, TargetMask, Unpckl, V1, +(IsUnary ? V1 : V2))) { UnpackOpcode = X86ISD::UNPCKL; V2 = (Undef2 ? DAG.getUNDEF(VT) : (IsUnary ? V1 : V2)); V1 = (Undef1 ? 
DAG.getUNDEF(VT) : V1); @@ -11329,7 +11335,8 @@ static bool matchShuffleWithUNPCK(MVT VT, SDValue , SDValue , } createUnpackShuffleMask(VT, Unpckh, /* Lo = */ false, IsUnary); - if (isTargetShuffleEquivalent(VT, TargetMask, Unpckh)) { + if (isTargetShuffleEquivalent(VT, TargetMask, Unpckh, V1, +(IsUnary ? V1 : V2))) { UnpackOpcode = X86ISD::UNPCKH; V2 = (Undef2 ? DAG.getUNDEF(VT) : (IsUnary ? V1 : V2)); V1 = (Undef1 ? DAG.getUNDEF(VT) : V1); diff --git a/llvm/test/CodeGen/X86/avg.ll b/llvm/test/CodeGen/X86/avg.ll index e2139fd20d32c..23fa7e033db9e 100644 --- a/llvm/test/CodeGen/X86/avg.ll +++ b/llvm/test/CodeGen/X86/avg.ll @@ -2245,7 +2245,7 @@ define void @not_avg_v16i8_wide_constants(<16 x i8>* %a, <16 x i8>* %b) nounwind ; AVX2-NEXT:vpunpcklbw {{.*#+}} xmm9 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] ; AVX2-NEXT:vpbroadcastw %xmm8, %xmm8 ; AVX2-NEXT:vpbroadcastw %xmm9, %xmm0 -; AVX2-NEXT:vpblendw {{.*#+}} xmm8 = xmm0[0,1,2,3,4,5,6],xmm8[7] +; AVX2-NEXT:vpunpcklwd {{.*#+}} xmm8 = xmm0[0],xmm8[0],xmm0[1],xmm8[1],xmm0[2],xmm8[2],xmm0[3],xmm8[3] ; AVX2-NEXT:vpunpcklbw {{.*#+}} xmm0 = xmm13[0],xmm12[0],xmm13[1],xmm12[1],xmm13[2],xmm12[2],xmm13[3],xmm12[3],xmm13[4],xmm12[4],xmm13[5],xmm12[5],xmm13[6],xmm12[6],xmm13[7],xmm12[7] ; AVX2-NEXT:vpunpcklbw {{.*#+}} xmm9 = xmm15[0],xmm14[0],xmm15[1],xmm14[1],xmm15[2],xmm14[2],xmm15[3],xmm14[3],xmm15[4],xmm14[4],xmm15[5],xmm14[5],xmm15[6],xmm14[6],xmm15[7],xmm14[7] ; AVX2-NEXT:vpbroadcastw %xmm0, %xmm0 diff --git a/llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll b/llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll index 29ea4d3bf55d3..4c86242a1d302 100644 --- a/llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll +++ b/llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll @@ -4230,11 +4230,10 @@ define <4 x double> @test_masked_z_8xdouble_to_4xdouble_perm_mem_mask6(<8 x doub define <4 x double> 
@test_masked_8xdouble_to_4xdouble_perm_mem_mask7(<8 x double>* %vp, <4 x double> %vec2, <4 x double> %mask) { ; CHECK-LABEL: test_masked_8xdouble_to_4xdouble_perm_mem_mask7: ; CHECK: # %bb.0: -; CHECK-NEXT:vbroadcastsd 40(%rdi), %ymm2 -; CHECK-NEXT:vblendpd $5, (%rdi), %ymm2, %ymm2 # ymm2 = mem[0],ymm2[1],mem[2],ymm2[3] +; CHECK-NEXT:vmovapd (%rdi), %ymm2 ; CHECK-NEXT:vxorpd %xmm3, %xmm3, %xmm3 ; CHECK-NEXT:vcmpeqpd %ymm3, %ymm1, %k1 -; CHECK-NEXT:vmovapd %ymm2, %ymm0 {%k1} +; CHECK-NEXT:vunpcklpd 40(%rdi){1to4}, %ymm2, %ymm0 {%k1} ; CHECK-NEXT:retq %vec = load <8 x double>, <8 x double>* %vp %shuf = shufflevector <8 x double> %vec, <8 x double> undef, <4 x i32> @@ -4246,11 +4245,10 @@ define <4 x
[llvm-branch-commits] [llvm] 207f329 - [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops
Author: Simon Pilgrim Date: 2021-01-18T10:29:23Z New Revision: 207f32948b2408bebd5a523695f6f7c08049db74 URL: https://github.com/llvm/llvm-project/commit/207f32948b2408bebd5a523695f6f7c08049db74 DIFF: https://github.com/llvm/llvm-project/commit/207f32948b2408bebd5a523695f6f7c08049db74.diff LOG: [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532 Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll llvm/test/CodeGen/X86/combine-umin.ll llvm/test/CodeGen/X86/sdiv_fix_sat.ll llvm/test/CodeGen/X86/udiv_fix_sat.ll Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index e265bcea5945..ef83df8bdd96 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -4607,6 +4607,10 @@ SDValue DAGCombiner::visitIMINMAX(SDNode *N) { return DAG.getNode(AltOpcode, SDLoc(N), VT, N0, N1); } + // Simplify the operands using demanded-bits information. + if (SimplifyDemandedBits(SDValue(N, 0))) +return SDValue(N, 0); + return SDValue(); } diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 21953373b745..b19033e3e427 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -1722,6 +1722,32 @@ bool TargetLowering::SimplifyDemandedBits( } break; } + case ISD::UMIN: { +// Check if one arg is always less than (or equal) to the other arg. 
+SDValue Op0 = Op.getOperand(0); +SDValue Op1 = Op.getOperand(1); +KnownBits Known0 = TLO.DAG.computeKnownBits(Op0, DemandedElts, Depth + 1); +KnownBits Known1 = TLO.DAG.computeKnownBits(Op1, DemandedElts, Depth + 1); +Known = KnownBits::umin(Known0, Known1); +if (Optional IsULE = KnownBits::ule(Known0, Known1)) + return TLO.CombineTo(Op, IsULE.getValue() ? Op0 : Op1); +if (Optional IsULT = KnownBits::ult(Known0, Known1)) + return TLO.CombineTo(Op, IsULT.getValue() ? Op0 : Op1); +break; + } + case ISD::UMAX: { +// Check if one arg is always greater than (or equal) to the other arg. +SDValue Op0 = Op.getOperand(0); +SDValue Op1 = Op.getOperand(1); +KnownBits Known0 = TLO.DAG.computeKnownBits(Op0, DemandedElts, Depth + 1); +KnownBits Known1 = TLO.DAG.computeKnownBits(Op1, DemandedElts, Depth + 1); +Known = KnownBits::umax(Known0, Known1); +if (Optional IsUGE = KnownBits::uge(Known0, Known1)) + return TLO.CombineTo(Op, IsUGE.getValue() ? Op0 : Op1); +if (Optional IsUGT = KnownBits::ugt(Known0, Known1)) + return TLO.CombineTo(Op, IsUGT.getValue() ? 
Op0 : Op1); +break; + } case ISD::BITREVERSE: { SDValue Src = Op.getOperand(0); APInt DemandedSrcBits = DemandedBits.reverseBits(); diff --git a/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll b/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll index b4cd36daad65..f0604c7fe782 100644 --- a/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll +++ b/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll @@ -18,7 +18,7 @@ define amdgpu_kernel void @test(i64 addrspace(1)* %out) { ; CHECK-NEXT:2(2.802597e-45), 0(0.00e+00) ; CHECK-NEXT: MOV * T0.W, KC0[2].Y, ; CHECK-NEXT:ALU clause starting at 11: -; CHECK-NEXT: MAX_UINT T0.X, T0.X, literal.x, +; CHECK-NEXT: MOV T0.X, literal.x, ; CHECK-NEXT: MOV T0.Y, 0.0, ; CHECK-NEXT: LSHR * T1.X, T0.W, literal.y, ; CHECK-NEXT:4(5.605194e-45), 2(2.802597e-45) diff --git a/llvm/test/CodeGen/X86/combine-umin.ll b/llvm/test/CodeGen/X86/combine-umin.ll index b22c45bbce45..1be72ad66799 100644 --- a/llvm/test/CodeGen/X86/combine-umin.ll +++ b/llvm/test/CodeGen/X86/combine-umin.ll @@ -10,14 +10,9 @@ define i8 @test_demandedbits_umin_ult(i8 %a0, i8 %a1) { ; CHECK-LABEL: test_demandedbits_umin_ult: ; CHECK: # %bb.0: -; CHECK-NEXT:orb $12, %dil -; CHECK-NEXT:orb $4, %sil -; CHECK-NEXT:andb $13, %dil -; CHECK-NEXT:andb $12, %sil -; CHECK-NEXT:movzbl %dil, %ecx -; CHECK-NEXT:movzbl %sil, %eax -; CHECK-NEXT:cmpb %al, %cl -; CHECK-NEXT:cmovbl %ecx, %eax +; CHECK-NEXT:movl %esi, %eax +; CHECK-NEXT:orb $4, %al +; CHECK-NEXT:andb $12, %al ; CHECK-NEXT:# kill: def $al killed $al killed $eax ; CHECK-NEXT:retq %lhs0 = and i8 %a0, 13 ; b1101 diff --git a/llvm/test/CodeGen/X86/sdiv_fix_sat.ll b/llvm/test/CodeGen/X86/sdiv_fix_sat.ll index 617d5d7876bd..9801cb4018b9 100644
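The comparison logic used above can be condensed into a small model (this is not LLVM's actual KnownBits class, just the idea): tracking which bits are known zero and known one bounds a value in [One, ~Zero], and if one operand's maximum possible value never exceeds the other's minimum, the umin/umax is decided statically and the node can be replaced by one operand.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Condensed 8-bit model of the KnownBits range reasoning.
struct Known8 {
  uint8_t Zero = 0;  // bits known to be 0
  uint8_t One = 0;   // bits known to be 1
  uint8_t minValue() const { return One; }
  uint8_t maxValue() const { return static_cast<uint8_t>(~Zero); }
};

// Returns true/false when the unsigned comparison is decided by the
// known ranges, nullopt when it cannot be decided.
std::optional<bool> knownULE(const Known8 &a, const Known8 &b) {
  if (a.maxValue() <= b.minValue()) return true;   // a <= b always
  if (a.minValue() > b.maxValue()) return false;   // a > b always
  return std::nullopt;
}
```

For the combine-umin.ll test modified above, (a1 & 12) | 4 is bounded by [4, 12] while (a0 & 13) | 12 is bounded by [12, 13], so the umin resolves to the right-hand operand.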
[llvm-branch-commits] [llvm] 770d1e0 - [X86][SSE] isHorizontalBinOp - reuse any existing horizontal ops.
Author: Simon Pilgrim Date: 2021-01-18T10:14:45Z New Revision: 770d1e0a8828010a7c95de4596e24d54ed2527c3 URL: https://github.com/llvm/llvm-project/commit/770d1e0a8828010a7c95de4596e24d54ed2527c3 DIFF: https://github.com/llvm/llvm-project/commit/770d1e0a8828010a7c95de4596e24d54ed2527c3.diff LOG: [X86][SSE] isHorizontalBinOp - reuse any existing horizontal ops. If we already have similar horizontal ops using the same args, then match that, even if we are on a target with slow horizontal ops. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/haddsub-shuf.ll llvm/test/CodeGen/X86/haddsub-undef.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 6bee21747bce..78a5d4a6dfbf 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -45628,8 +45628,9 @@ static SDValue combineVEXTRACT_STORE(SDNode *N, SelectionDAG , /// In short, LHS and RHS are inspected to see if LHS op RHS is of the form /// A horizontal-op B, for some already available A and B, and if so then LHS is /// set to A, RHS to B, and the routine returns 'true'. -static bool isHorizontalBinOp(SDValue , SDValue , SelectionDAG , - const X86Subtarget , bool IsCommutative, +static bool isHorizontalBinOp(unsigned HOpcode, SDValue , SDValue , + SelectionDAG , const X86Subtarget , + bool IsCommutative, SmallVectorImpl ) { // If either operand is undef, bail out. The binop should be simplified. if (LHS.isUndef() || RHS.isUndef()) @@ -45790,9 +45791,20 @@ static bool isHorizontalBinOp(SDValue , SDValue , SelectionDAG , isMultiLaneShuffleMask(128, VT.getScalarSizeInBits(), PostShuffleMask)) return false; + // If the source nodes are already used in HorizOps then always accept this. + // Shuffle folding should merge these back together. 
+ bool FoundHorizLHS = llvm::any_of(NewLHS->uses(), [&](SDNode *User) { +return User->getOpcode() == HOpcode && User->getValueType(0) == VT; + }); + bool FoundHorizRHS = llvm::any_of(NewRHS->uses(), [&](SDNode *User) { +return User->getOpcode() == HOpcode && User->getValueType(0) == VT; + }); + bool ForceHorizOp = FoundHorizLHS && FoundHorizRHS; + // Assume a SingleSource HOP if we only shuffle one input and don't need to // shuffle the result. - if (!shouldUseHorizontalOp(NewLHS == NewRHS && + if (!ForceHorizOp && + !shouldUseHorizontalOp(NewLHS == NewRHS && (NumShuffles < 2 || !IsIdentityPostShuffle), DAG, Subtarget)) return false; @@ -45816,7 +45828,8 @@ static SDValue combineFaddFsub(SDNode *N, SelectionDAG , SmallVector PostShuffleMask; if (((Subtarget.hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || (Subtarget.hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && - isHorizontalBinOp(LHS, RHS, DAG, Subtarget, IsFadd, PostShuffleMask)) { + isHorizontalBinOp(HorizOpcode, LHS, RHS, DAG, Subtarget, IsFadd, +PostShuffleMask)) { SDValue HorizBinOp = DAG.getNode(HorizOpcode, SDLoc(N), VT, LHS, RHS); if (!PostShuffleMask.empty()) HorizBinOp = DAG.getVectorShuffle(VT, SDLoc(HorizBinOp), HorizBinOp, @@ -48931,17 +48944,18 @@ static SDValue combineAddOrSubToHADDorHSUB(SDNode *N, SelectionDAG , SDValue Op0 = N->getOperand(0); SDValue Op1 = N->getOperand(1); bool IsAdd = N->getOpcode() == ISD::ADD; + auto HorizOpcode = IsAdd ? X86ISD::HADD : X86ISD::HSUB; assert((IsAdd || N->getOpcode() == ISD::SUB) && "Wrong opcode"); SmallVector PostShuffleMask; if ((VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v16i16 || VT == MVT::v8i32) && Subtarget.hasSSSE3() && - isHorizontalBinOp(Op0, Op1, DAG, Subtarget, IsAdd, PostShuffleMask)) { -auto HOpBuilder = [IsAdd](SelectionDAG , const SDLoc , - ArrayRef Ops) { - return DAG.getNode(IsAdd ? 
X86ISD::HADD : X86ISD::HSUB, DL, - Ops[0].getValueType(), Ops); + isHorizontalBinOp(HorizOpcode, Op0, Op1, DAG, Subtarget, IsAdd, +PostShuffleMask)) { +auto HOpBuilder = [HorizOpcode](SelectionDAG , const SDLoc , +ArrayRef Ops) { + return DAG.getNode(HorizOpcode, DL, Ops[0].getValueType(), Ops); }; SDValue HorizBinOp = SplitOpsAndApply(DAG, Subtarget, SDLoc(N), VT, {Op0, Op1}, HOpBuilder); diff --git a/llvm/test/CodeGen/X86/haddsub-shuf.ll b/llvm/test/CodeGen/X86/haddsub-shuf.ll index 37eedcd54441..282ef37f6e52 100644 --- a/llvm/test/CodeGen/X86/haddsub-shuf.ll +++ b/llvm/test/CodeGen/X86/haddsub-shuf.ll @@ -873,45 +873,15 @@ define <4 x float>
[llvm-branch-commits] [llvm] be69e66 - [X86][SSE] Attempt to fold shuffle(binop(), binop()) -> binop(shuffle(), shuffle())
Author: Simon Pilgrim Date: 2021-01-15T16:25:25Z New Revision: be69e66b1cd826f499566e1c3dadbf04e872baa0 URL: https://github.com/llvm/llvm-project/commit/be69e66b1cd826f499566e1c3dadbf04e872baa0 DIFF: https://github.com/llvm/llvm-project/commit/be69e66b1cd826f499566e1c3dadbf04e872baa0.diff LOG: [X86][SSE] Attempt to fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) If this will help us fold shuffles together, then push the shuffle through the merged binops. Ideally this would be performed in DAGCombiner::visitVECTOR_SHUFFLE but getting an efficient+legal merged shuffle can be tricky - on SSE we can be confident that for 32/64-bit elements vectors shuffles should easily fold. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/haddsub-shuf.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index a84250782c19..d2cc2395576a 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37939,6 +37939,33 @@ static SDValue combineShuffle(SDNode *N, SelectionDAG , if (SDValue HAddSub = foldShuffleOfHorizOp(N, DAG)) return HAddSub; + +// Merge shuffles through binops if its likely we'll be able to merge it +// with other shuffles. +// shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) +// TODO: We might be able to move this to DAGCombiner::visitVECTOR_SHUFFLE. 
+if (auto *SVN = dyn_cast(N)) { + unsigned SrcOpcode = N->getOperand(0).getOpcode(); + if (SrcOpcode == N->getOperand(1).getOpcode() && TLI.isBinOp(SrcOpcode) && + N->isOnlyUserOf(N->getOperand(0).getNode()) && + N->isOnlyUserOf(N->getOperand(1).getNode()) && + VT.getScalarSizeInBits() >= 32) { +SDValue Op00 = N->getOperand(0).getOperand(0); +SDValue Op10 = N->getOperand(1).getOperand(0); +SDValue Op01 = N->getOperand(0).getOperand(1); +SDValue Op11 = N->getOperand(1).getOperand(1); +if ((Op00.getOpcode() == ISD::VECTOR_SHUFFLE || + Op10.getOpcode() == ISD::VECTOR_SHUFFLE) && +(Op01.getOpcode() == ISD::VECTOR_SHUFFLE || + Op11.getOpcode() == ISD::VECTOR_SHUFFLE)) { + SDLoc DL(N); + ArrayRef Mask = SVN->getMask(); + SDValue LHS = DAG.getVectorShuffle(VT, DL, Op00, Op10, Mask); + SDValue RHS = DAG.getVectorShuffle(VT, DL, Op01, Op11, Mask); + return DAG.getNode(SrcOpcode, DL, VT, LHS, RHS); +} + } +} } // Attempt to combine into a vector load/broadcast. diff --git a/llvm/test/CodeGen/X86/haddsub-shuf.ll b/llvm/test/CodeGen/X86/haddsub-shuf.ll index 9b2dfc1ce0cb..37eedcd54441 100644 --- a/llvm/test/CodeGen/X86/haddsub-shuf.ll +++ b/llvm/test/CodeGen/X86/haddsub-shuf.ll @@ -923,45 +923,15 @@ define <4 x float> @PR34724_1(<4 x float> %a, <4 x float> %b) { } define <4 x float> @PR34724_2(<4 x float> %a, <4 x float> %b) { -; SSSE3_SLOW-LABEL: PR34724_2: -; SSSE3_SLOW: # %bb.0: -; SSSE3_SLOW-NEXT:haddps %xmm1, %xmm0 -; SSSE3_SLOW-NEXT:movsldup {{.*#+}} xmm2 = xmm1[0,0,2,2] -; SSSE3_SLOW-NEXT:addps %xmm1, %xmm2 -; SSSE3_SLOW-NEXT:shufps {{.*#+}} xmm2 = xmm2[3,0],xmm0[2,0] -; SSSE3_SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,0] -; SSSE3_SLOW-NEXT:retq -; -; SSSE3_FAST-LABEL: PR34724_2: -; SSSE3_FAST: # %bb.0: -; SSSE3_FAST-NEXT:haddps %xmm1, %xmm0 -; SSSE3_FAST-NEXT:retq -; -; AVX1_SLOW-LABEL: PR34724_2: -; AVX1_SLOW: # %bb.0: -; AVX1_SLOW-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; AVX1_SLOW-NEXT:vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2] -; AVX1_SLOW-NEXT:vaddps %xmm1, %xmm2, 
%xmm1 -; AVX1_SLOW-NEXT:vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3] -; AVX1_SLOW-NEXT:retq -; -; AVX1_FAST-LABEL: PR34724_2: -; AVX1_FAST: # %bb.0: -; AVX1_FAST-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; AVX1_FAST-NEXT:retq -; -; AVX2_SLOW-LABEL: PR34724_2: -; AVX2_SLOW: # %bb.0: -; AVX2_SLOW-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; AVX2_SLOW-NEXT:vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2] -; AVX2_SLOW-NEXT:vaddps %xmm1, %xmm2, %xmm1 -; AVX2_SLOW-NEXT:vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3] -; AVX2_SLOW-NEXT:retq +; SSSE3-LABEL: PR34724_2: +; SSSE3: # %bb.0: +; SSSE3-NEXT:haddps %xmm1, %xmm0 +; SSSE3-NEXT:retq ; -; AVX2_FAST-LABEL: PR34724_2: -; AVX2_FAST: # %bb.0: -; AVX2_FAST-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; AVX2_FAST-NEXT:retq +; AVX-LABEL: PR34724_2: +; AVX: # %bb.0: +; AVX-NEXT:vhaddps %xmm1, %xmm0, %xmm0 +; AVX-NEXT:retq %t0 = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> %t1 = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> %t2 = fadd <4 x float> %t0, %t1 ___
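The rewrite is legal because elementwise binops commute with shuffles: lane i of shuffle(bop(a,b), M) is bop(a,b)[M[i]] = bop(a[M[i]], b[M[i]]), which is exactly lane i of bop(shuffle(a,M), shuffle(b,M)). A small single-vector sketch of that identity (illustrative names, not the DAG code):

```cpp
#include <array>
#include <cassert>

// Pushing a shuffle through an elementwise add: both orders pick the
// same source lanes, so the results are identical lane-for-lane.
using V4 = std::array<int, 4>;

V4 shuffle(const V4 &v, const std::array<int, 4> &mask) {
  return {v[mask[0]], v[mask[1]], v[mask[2]], v[mask[3]]};
}

V4 add(const V4 &a, const V4 &b) {
  return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}
```

The identity always holds; the part the patch gates on (SSE, 32/64-bit elements, shuffle operands on both sides) is purely a profitability heuristic about whether the two pushed-through shuffles will fold away.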
[llvm-branch-commits] [llvm] 5183a13 - [X86] Add umin knownbits/demandedbits ult test for D94532
Author: Simon Pilgrim Date: 2021-01-15T14:42:55Z New Revision: 5183a13d37825f93d92c23c257dbb1c994098bdc URL: https://github.com/llvm/llvm-project/commit/5183a13d37825f93d92c23c257dbb1c994098bdc DIFF: https://github.com/llvm/llvm-project/commit/5183a13d37825f93d92c23c257dbb1c994098bdc.diff LOG: [X86] Add umin knownbits/demandedbits ult test for D94532 Added: Modified: llvm/test/CodeGen/X86/combine-umin.ll Removed: diff --git a/llvm/test/CodeGen/X86/combine-umin.ll b/llvm/test/CodeGen/X86/combine-umin.ll index 558d4df9adb4..b22c45bbce45 100644 --- a/llvm/test/CodeGen/X86/combine-umin.ll +++ b/llvm/test/CodeGen/X86/combine-umin.ll @@ -1,11 +1,33 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=SSE2 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE41 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=SSE42 -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=AVX -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw | FileCheck %s --check-prefix=AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE2 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE41 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE42 +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown 
-mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX + +define i8 @test_demandedbits_umin_ult(i8 %a0, i8 %a1) { +; CHECK-LABEL: test_demandedbits_umin_ult: +; CHECK: # %bb.0: +; CHECK-NEXT:orb $12, %dil +; CHECK-NEXT:orb $4, %sil +; CHECK-NEXT:andb $13, %dil +; CHECK-NEXT:andb $12, %sil +; CHECK-NEXT:movzbl %dil, %ecx +; CHECK-NEXT:movzbl %sil, %eax +; CHECK-NEXT:cmpb %al, %cl +; CHECK-NEXT:cmovbl %ecx, %eax +; CHECK-NEXT:# kill: def $al killed $al killed $eax +; CHECK-NEXT:retq + %lhs0 = and i8 %a0, 13 ; b1101 + %rhs0 = and i8 %a1, 12 ; b1100 + %lhs1 = or i8 %lhs0, 12 ; b1100 + %rhs1 = or i8 %rhs0, 4 ; b0100 + %umin = tail call i8 @llvm.umin.i8(i8 %lhs1, i8 %rhs1) + ret i8 %umin +} +declare i8 @llvm.umin.i8(i8, i8) define <8 x i16> @test_v8i16_nosignbit(<8 x i16> %a, <8 x i16> %b) { ; SSE2-LABEL: test_v8i16_nosignbit: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
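The test added here is constructed so that value tracking alone can decide the umin: (a0 & 13) | 12 always lies in [12, 13] while (a1 & 12) | 4 is always 4 or 12, so the right-hand side can never exceed the left. An exhaustive check of that claim over all i8 inputs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Verifies that umin((a0 & 13) | 12, (a1 & 12) | 4) always equals the
// second operand, for every pair of 8-bit inputs -- the property the
// D94532 fold exploits.
bool uminAlwaysRHS() {
  for (unsigned a0 = 0; a0 < 256; ++a0)
    for (unsigned a1 = 0; a1 < 256; ++a1) {
      uint8_t lhs = static_cast<uint8_t>((a0 & 13) | 12);
      uint8_t rhs = static_cast<uint8_t>((a1 & 12) | 4);
      if (std::min(lhs, rhs) != rhs)
        return false;
    }
  return true;
}
```

This is why the post-D94532 codegen shown in the previous commit computes only `orb $4` / `andb $12` on the second argument and drops the compare/cmov entirely.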
[llvm-branch-commits] [llvm] 1dfd5c9 - [X86][AVX] combineHorizOpWithShuffle - support target shuffles in HOP(SHUFFLE(X, Y), SHUFFLE(X, Y)) -> SHUFFLE(HOP(X, Y))
Author: Simon Pilgrim Date: 2021-01-15T13:55:30Z New Revision: 1dfd5c9ad8cf677fb4c9c3ccf39d7b1f20c397d3 URL: https://github.com/llvm/llvm-project/commit/1dfd5c9ad8cf677fb4c9c3ccf39d7b1f20c397d3 DIFF: https://github.com/llvm/llvm-project/commit/1dfd5c9ad8cf677fb4c9c3ccf39d7b1f20c397d3.diff LOG: [X86][AVX] combineHorizOpWithShuffle - support target shuffles in HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) Be more aggressive on (AVX2+) folds of lane shuffles of 256-bit horizontal ops by working on target/faux shuffles as well. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/haddsub-2.ll llvm/test/CodeGen/X86/haddsub-undef.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index d45eb5366bfe..a84250782c19 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -43114,30 +43114,32 @@ static SDValue combineHorizOpWithShuffle(SDNode *N, SelectionDAG , // Attempt to fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)). // TODO: Relax shuffle scaling to support sub-128-bit subvector shuffles. if (VT.is256BitVector() && Subtarget.hasInt256()) { -if (auto *SVN0 = dyn_cast(N0)) { - if (auto *SVN1 = dyn_cast(N1)) { -SmallVector ShuffleMask0, ShuffleMask1; -if (scaleShuffleElements(SVN0->getMask(), 2, ShuffleMask0) && -scaleShuffleElements(SVN1->getMask(), 2, ShuffleMask1)) { - SDValue Op00 = SVN0->getOperand(0); - SDValue Op01 = SVN0->getOperand(1); - SDValue Op10 = SVN1->getOperand(0); - SDValue Op11 = SVN1->getOperand(1); - if ((Op00 == Op11) && (Op01 == Op10)) { -std::swap(Op10, Op11); -ShuffleVectorSDNode::commuteMask(ShuffleMask1); - } - if ((Op00 == Op10) && (Op01 == Op11)) { -SmallVector ShuffleMask; -ShuffleMask.append(ShuffleMask0.begin(), ShuffleMask0.end()); -ShuffleMask.append(ShuffleMask1.begin(), ShuffleMask1.end()); -SDLoc DL(N); -MVT ShufVT = VT.isFloatingPoint() ? 
MVT::v4f64 : MVT::v4i64; -SDValue Res = DAG.getNode(Opcode, DL, VT, Op00, Op01); -Res = DAG.getBitcast(ShufVT, Res); -Res = DAG.getVectorShuffle(ShufVT, DL, Res, Res, ShuffleMask); -return DAG.getBitcast(VT, Res); - } +SmallVector Mask0, Mask1; +SmallVector Ops0, Ops1; +if (getTargetShuffleInputs(N0, Ops0, Mask0, DAG) && !isAnyZero(Mask0) && +getTargetShuffleInputs(N1, Ops1, Mask1, DAG) && !isAnyZero(Mask1) && +!Ops0.empty() && !Ops1.empty()) { + SDValue Op00 = Ops0.front(), Op01 = Ops0.back(); + SDValue Op10 = Ops1.front(), Op11 = Ops1.back(); + SmallVector ShuffleMask0, ShuffleMask1; + if (Op00.getValueType() == SrcVT && Op01.getValueType() == SrcVT && + Op11.getValueType() == SrcVT && Op11.getValueType() == SrcVT && + scaleShuffleElements(Mask0, 2, ShuffleMask0) && + scaleShuffleElements(Mask1, 2, ShuffleMask1)) { +if ((Op00 == Op11) && (Op01 == Op10)) { + std::swap(Op10, Op11); + ShuffleVectorSDNode::commuteMask(ShuffleMask1); +} +if ((Op00 == Op10) && (Op01 == Op11)) { + SmallVector ShuffleMask; + ShuffleMask.append(ShuffleMask0.begin(), ShuffleMask0.end()); + ShuffleMask.append(ShuffleMask1.begin(), ShuffleMask1.end()); + SDLoc DL(N); + MVT ShufVT = VT.isFloatingPoint() ? 
MVT::v4f64 : MVT::v4i64; + SDValue Res = DAG.getNode(Opcode, DL, VT, Op00, Op01); + Res = DAG.getBitcast(ShufVT, Res); + Res = DAG.getVectorShuffle(ShufVT, DL, Res, Res, ShuffleMask); + return DAG.getBitcast(VT, Res); } } } diff --git a/llvm/test/CodeGen/X86/haddsub-2.ll b/llvm/test/CodeGen/X86/haddsub-2.ll index a025604f44a5..82fd7a2699a5 100644 --- a/llvm/test/CodeGen/X86/haddsub-2.ll +++ b/llvm/test/CodeGen/X86/haddsub-2.ll @@ -444,12 +444,18 @@ define <4 x double> @avx_vhadd_pd_test(<4 x double> %A, <4 x double> %B) { ; SSE-NEXT:movapd %xmm2, %xmm1 ; SSE-NEXT:retq ; -; AVX-LABEL: avx_vhadd_pd_test: -; AVX: # %bb.0: -; AVX-NEXT:vperm2f128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX-NEXT:vinsertf128 $1, %xmm1, %ymm0, %ymm0 -; AVX-NEXT:vhaddpd %ymm2, %ymm0, %ymm0 -; AVX-NEXT:retq +; AVX1-LABEL: avx_vhadd_pd_test: +; AVX1: # %bb.0: +; AVX1-NEXT:vperm2f128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] +; AVX1-NEXT:vinsertf128 $1, %xmm1, %ymm0, %ymm0 +; AVX1-NEXT:vhaddpd %ymm2, %ymm0, %ymm0 +; AVX1-NEXT:retq +; +; AVX2-LABEL: avx_vhadd_pd_test: +; AVX2: # %bb.0: +; AVX2-NEXT:vhaddpd %ymm1, %ymm0, %ymm0 +; AVX2-NEXT:vpermpd {{.*#+}} ymm0 = ymm0[0,2,1,3] +; AVX2-NEXT:retq %vecext =
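The avx_vhadd_pd_test improvement above illustrates the transform: the 256-bit vhaddpd works within each 128-bit lane, producing [a0+a1, b0+b1, a2+a3, b2+b3], and a single cross-lane vpermpd with mask [0,2,1,3] reorders that into the sequential [a0+a1, a2+a3, b0+b1, b2+b3] the IR asks for. A scalar model of those two instructions (a sketch of their lane semantics, not the hardware):

```cpp
#include <array>
#include <cassert>

using V4d = std::array<double, 4>;

// 256-bit vhaddpd: horizontal adds within each 128-bit lane.
V4d vhaddpd256(const V4d &a, const V4d &b) {
  return {a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]};
}

// vpermpd: full cross-lane permute of a 4 x f64 vector (AVX2+).
V4d vpermpd(const V4d &v, const std::array<int, 4> &mask) {
  return {v[mask[0]], v[mask[1]], v[mask[2]], v[mask[3]]};
}
```

Under this model, vpermpd(vhaddpd256(a, b), [0,2,1,3]) yields the sequential pairwise sums, matching the two-instruction AVX2 sequence that replaced the vperm2f128 + vinsertf128 + vhaddpd chain.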
[llvm-branch-commits] [llvm] b99782c - [X86][AVX] Adjust unsigned saturation downconvert negative test
Author: Simon Pilgrim Date: 2021-01-14T17:51:23Z New Revision: b99782cf7850a481fa36fd95ae04923739e0da6d URL: https://github.com/llvm/llvm-project/commit/b99782cf7850a481fa36fd95ae04923739e0da6d DIFF: https://github.com/llvm/llvm-project/commit/b99782cf7850a481fa36fd95ae04923739e0da6d.diff LOG: [X86][AVX] Adjust unsigned saturation downconvert negative test D87145 was showing that this test (added in D45315) could always be constant folded (with suitable value tracking). What we actually needed was smax(smin()) negative test coverage, the invert of negative_test2_smax_usat_trunc_wb_256_mem, so I've tweaked the test to provide that instead. Added: Modified: llvm/test/CodeGen/X86/avx512-trunc.ll Removed: diff --git a/llvm/test/CodeGen/X86/avx512-trunc.ll b/llvm/test/CodeGen/X86/avx512-trunc.ll index 0b2a47c2772c..d61ada4e5d05 100644 --- a/llvm/test/CodeGen/X86/avx512-trunc.ll +++ b/llvm/test/CodeGen/X86/avx512-trunc.ll @@ -1007,10 +1007,8 @@ define <16 x i16> @smax_usat_trunc_dw_512(<16 x i32> %i) { define void @negative_test1_smax_usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* %res) { ; KNL-LABEL: negative_test1_smax_usat_trunc_wb_256_mem: ; KNL: ## %bb.0: -; KNL-NEXT:vpxor %xmm1, %xmm1, %xmm1 -; KNL-NEXT:vpmaxsw %ymm1, %ymm0, %ymm0 -; KNL-NEXT:vpcmpeqd %ymm1, %ymm1, %ymm1 -; KNL-NEXT:vpminsw %ymm1, %ymm0, %ymm0 +; KNL-NEXT:vpminsw {{.*}}(%rip), %ymm0, %ymm0 +; KNL-NEXT:vpmaxsw {{.*}}(%rip), %ymm0, %ymm0 ; KNL-NEXT:vpmovzxwd {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero ; KNL-NEXT:vpmovdb %zmm0, (%rdi) ; KNL-NEXT:vzeroupper @@ -1018,17 +1016,15 @@ define void @negative_test1_smax_usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* ; ; SKX-LABEL: negative_test1_smax_usat_trunc_wb_256_mem: ; SKX: ## %bb.0: -; SKX-NEXT:vpxor %xmm1, %xmm1, %xmm1 -; SKX-NEXT:vpmaxsw %ymm1, %ymm0, %ymm0 -; 
SKX-NEXT:vpcmpeqd %ymm1, %ymm1, %ymm1 -; SKX-NEXT:vpminsw %ymm1, %ymm0, %ymm0 +; SKX-NEXT:vpminsw {{.*}}(%rip), %ymm0, %ymm0 +; SKX-NEXT:vpmaxsw {{.*}}(%rip), %ymm0, %ymm0 ; SKX-NEXT:vpmovwb %ymm0, (%rdi) ; SKX-NEXT:vzeroupper ; SKX-NEXT:retq - %x1 = icmp sgt <16 x i16> %i, - %x2 = select <16 x i1> %x1, <16 x i16> %i, <16 x i16> - %x3 = icmp slt <16 x i16> %x2, - %x5 = select <16 x i1> %x3, <16 x i16> %x2, <16 x i16> + %x1 = icmp slt <16 x i16> %i, + %x2 = select <16 x i1> %x1, <16 x i16> %i, <16 x i16> + %x3 = icmp sgt <16 x i16> %x2, + %x5 = select <16 x i1> %x3, <16 x i16> %x2, <16 x i16> %x6 = trunc <16 x i16> %x5 to <16 x i8> store <16 x i8> %x6, <16 x i8>* %res, align 1 ret void ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
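The clamp-then-truncate idiom these tests probe can be modeled outside of LLVM. A minimal Python sketch (illustrative names, not LLVM or x86 API) of the signed clamp that the backend matches to an unsigned-saturating truncation:

```python
def usat_trunc_i16_to_u8(x: int) -> int:
    """Clamp a signed 16-bit lane into [0, 255], then truncate to 8 bits.

    smax(0, smin(255, x)) and smin(255, smax(0, x)) compute the same
    clamp; the test tweak above swaps the order so the smax(smin())
    matcher path gets negative-test coverage without changing results.
    """
    return max(0, min(255, x)) & 0xFF
```

Both orderings agree on every i16 input, which is why only the matcher path, not the computed value, is being exercised by the adjusted test.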
[llvm-branch-commits] [llvm] 0a59647 - [SystemZ] misched-cutoff tests can only be tested on non-NDEBUG (assertion) builds
Author: Simon Pilgrim
Date: 2021-01-14T15:46:27Z
New Revision: 0a59647ee407524e6468cc5be4ba288861aa700d

URL: https://github.com/llvm/llvm-project/commit/0a59647ee407524e6468cc5be4ba288861aa700d
DIFF: https://github.com/llvm/llvm-project/commit/0a59647ee407524e6468cc5be4ba288861aa700d.diff

LOG: [SystemZ] misched-cutoff tests can only be tested on non-NDEBUG (assertion) builds

Fixes clang-with-thin-lto-ubuntu buildbot after D94383/rGddd03842c347

Added: 

Modified: 
    llvm/test/CodeGen/SystemZ/misched-cutoff.ll

Removed: 


diff --git a/llvm/test/CodeGen/SystemZ/misched-cutoff.ll b/llvm/test/CodeGen/SystemZ/misched-cutoff.ll
index 0de80a22c301..859c7398f2cd 100644
--- a/llvm/test/CodeGen/SystemZ/misched-cutoff.ll
+++ b/llvm/test/CodeGen/SystemZ/misched-cutoff.ll
@@ -1,5 +1,7 @@
 ; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z13 -misched-cutoff=1 -o /dev/null < %s
-;
+; REQUIRES: asserts
+; -misched=shuffle isn't available in NDEBUG builds!
+
 ; Test that the post-ra scheduler does not crash with -misched-cutoff.

 @g_184 = external dso_local global i16, align 2
[llvm-branch-commits] [llvm] c0939fd - [Support] Simplify KnownBits::sextInReg implementation.
Author: Simon Pilgrim
Date: 2021-01-14T15:14:32Z
New Revision: c0939fddf80c16829502186e2e5b78f77696310a

URL: https://github.com/llvm/llvm-project/commit/c0939fddf80c16829502186e2e5b78f77696310a
DIFF: https://github.com/llvm/llvm-project/commit/c0939fddf80c16829502186e2e5b78f77696310a.diff

LOG: [Support] Simplify KnownBits::sextInReg implementation.

As noted by @foad in rG9cf4f493a72f all we need to do is sextInReg both KnownBits One and Zero.

Added: 

Modified: 
    llvm/lib/Support/KnownBits.cpp

Removed: 


diff --git a/llvm/lib/Support/KnownBits.cpp b/llvm/lib/Support/KnownBits.cpp
index a46a90bb97d4..3623a54ae476 100644
--- a/llvm/lib/Support/KnownBits.cpp
+++ b/llvm/lib/Support/KnownBits.cpp
@@ -91,34 +91,12 @@ KnownBits KnownBits::sextInReg(unsigned SrcBitWidth) const {
   if (SrcBitWidth == BitWidth)
     return *this;

-  // Sign extension. Compute the demanded bits in the result that are not
-  // present in the input.
-  APInt NewBits = APInt::getHighBitsSet(BitWidth, BitWidth - SrcBitWidth);
-
-  // If the sign extended bits are demanded, we know that the sign
-  // bit is demanded.
-  APInt InSignMask = APInt::getSignMask(SrcBitWidth).zext(BitWidth);
-  APInt InDemandedBits = APInt::getLowBitsSet(BitWidth, SrcBitWidth);
-  if (NewBits.getBoolValue())
-    InDemandedBits |= InSignMask;
-
+  unsigned ExtBits = BitWidth - SrcBitWidth;
   KnownBits Result;
-  Result.One = One & InDemandedBits;
-  Result.Zero = Zero & InDemandedBits;
-
-  // If the sign bit of the input is known set or clear, then we know the
-  // top bits of the result.
-  if (Result.Zero.intersects(InSignMask)) { // Input sign bit known clear
-    Result.Zero |= NewBits;
-    Result.One &= ~NewBits;
-  } else if (Result.One.intersects(InSignMask)) { // Input sign bit known set
-    Result.One |= NewBits;
-    Result.Zero &= ~NewBits;
-  } else { // Input sign bit unknown
-    Result.Zero &= ~NewBits;
-    Result.One &= ~NewBits;
-  }
-
+  Result.One = One << ExtBits;
+  Result.Zero = Zero << ExtBits;
+  Result.One.ashrInPlace(ExtBits);
+  Result.Zero.ashrInPlace(ExtBits);
   return Result;
 }
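The shift-based trick above is easy to check outside of APInt. A hedged Python model of what the simplified code computes (function and parameter names are illustrative, not LLVM's API): shifting both known-bit masks left so the source sign bit lands in the MSB, then arithmetic-shifting right, smears whatever is known about the sign bit across the extended bits.

```python
def sext_in_reg_known(one: int, zero: int, bit_width: int, src_bits: int):
    """Model KnownBits::sextInReg via (mask << ext) arithmetic->> ext."""
    ext = bit_width - src_bits
    mask = (1 << bit_width) - 1

    def ashr(v, n, w):
        # Arithmetic shift right of a w-bit value held in a Python int.
        if v & (1 << (w - 1)):
            v |= ~0 << w  # sign-extend into Python's unbounded int
        return (v >> n) & ((1 << w) - 1)

    one = ashr((one << ext) & mask, ext, bit_width)
    zero = ashr((zero << ext) & mask, ext, bit_width)
    return one, zero
```

If the source sign bit is known (in either the One or Zero mask), the arithmetic shift replicates that knowledge into all extended bits; if it is unknown, the extended bits correctly come out unknown in both masks.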
[llvm-branch-commits] [llvm] 0b46f19 - [Support] Ensure KnownBits::sextInReg can handle the src == dst sext-in-reg case.
Author: Simon Pilgrim
Date: 2021-01-14T14:50:21Z
New Revision: 0b46f19a9ecd6215cffb51d19f2403c18b0226f5

URL: https://github.com/llvm/llvm-project/commit/0b46f19a9ecd6215cffb51d19f2403c18b0226f5
DIFF: https://github.com/llvm/llvm-project/commit/0b46f19a9ecd6215cffb51d19f2403c18b0226f5.diff

LOG: [Support] Ensure KnownBits::sextInReg can handle the src == dst sext-in-reg case.

This was resulting in assertions inside APInt::zext that we were extending to the same bitwidth.

Added: 

Modified: 
    llvm/lib/Support/KnownBits.cpp
    llvm/unittests/Support/KnownBitsTest.cpp

Removed: 


diff --git a/llvm/lib/Support/KnownBits.cpp b/llvm/lib/Support/KnownBits.cpp
index 0f36c6a9ef1d..a46a90bb97d4 100644
--- a/llvm/lib/Support/KnownBits.cpp
+++ b/llvm/lib/Support/KnownBits.cpp
@@ -85,7 +85,11 @@ KnownBits KnownBits::computeForAddSub(bool Add, bool NSW,
 KnownBits KnownBits::sextInReg(unsigned SrcBitWidth) const {
   unsigned BitWidth = getBitWidth();
-  assert(BitWidth >= SrcBitWidth && "Illegal sext-in-register");
+  assert(0 < SrcBitWidth && SrcBitWidth <= BitWidth &&
+         "Illegal sext-in-register");
+
+  if (SrcBitWidth == BitWidth)
+    return *this;

   // Sign extension. Compute the demanded bits in the result that are not
   // present in the input.

diff --git a/llvm/unittests/Support/KnownBitsTest.cpp b/llvm/unittests/Support/KnownBitsTest.cpp
index 991096098b8e..4e69df49837e 100644
--- a/llvm/unittests/Support/KnownBitsTest.cpp
+++ b/llvm/unittests/Support/KnownBitsTest.cpp
@@ -427,7 +427,7 @@ TEST(KnownBitsTest, SExtOrTrunc) {
 TEST(KnownBitsTest, SExtInReg) {
   unsigned Bits = 4;
-  for (unsigned FromBits = 1; FromBits != Bits; ++FromBits) {
+  for (unsigned FromBits = 1; FromBits <= Bits; ++FromBits) {
     ForeachKnownBits(Bits, [&](const KnownBits &Known) {
       APInt CommonOne = APInt::getAllOnesValue(Bits);
       APInt CommonZero = APInt::getAllOnesValue(Bits);
[llvm-branch-commits] [llvm] e8622d2 - [Support] Add KnownBits::sextInReg exhaustive tests
Author: Simon Pilgrim
Date: 2021-01-14T14:27:45Z
New Revision: e8622d27c0e3020177ff47ad57dd1e5371feb9cf

URL: https://github.com/llvm/llvm-project/commit/e8622d27c0e3020177ff47ad57dd1e5371feb9cf
DIFF: https://github.com/llvm/llvm-project/commit/e8622d27c0e3020177ff47ad57dd1e5371feb9cf.diff

LOG: [Support] Add KnownBits::sextInReg exhaustive tests

Requested by @foad in rG9cf4f493a72f

Added: 

Modified: 
    llvm/unittests/Support/KnownBitsTest.cpp

Removed: 


diff --git a/llvm/unittests/Support/KnownBitsTest.cpp b/llvm/unittests/Support/KnownBitsTest.cpp
index ba587a1e2f65..991096098b8e 100644
--- a/llvm/unittests/Support/KnownBitsTest.cpp
+++ b/llvm/unittests/Support/KnownBitsTest.cpp
@@ -425,4 +425,24 @@ TEST(KnownBitsTest, SExtOrTrunc) {
   }
 }

+TEST(KnownBitsTest, SExtInReg) {
+  unsigned Bits = 4;
+  for (unsigned FromBits = 1; FromBits != Bits; ++FromBits) {
+    ForeachKnownBits(Bits, [&](const KnownBits &Known) {
+      APInt CommonOne = APInt::getAllOnesValue(Bits);
+      APInt CommonZero = APInt::getAllOnesValue(Bits);
+      unsigned ExtBits = Bits - FromBits;
+      ForeachNumInKnownBits(Known, [&](const APInt &N) {
+        APInt Ext = N << ExtBits;
+        Ext.ashrInPlace(ExtBits);
+        CommonOne &= Ext;
+        CommonZero &= ~Ext;
+      });
+      KnownBits KnownSExtInReg = Known.sextInReg(FromBits);
+      EXPECT_EQ(CommonOne, KnownSExtInReg.One);
+      EXPECT_EQ(CommonZero, KnownSExtInReg.Zero);
+    });
+  }
+}
+
 } // end anonymous namespace
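The exhaustive-testing idiom used in KnownBitsTest.cpp can be sketched in plain Python: enumerate every known-bits pattern, enumerate every concrete value consistent with it, and check that the transfer function never claims a bit that the concrete results don't all share. The sketch below (illustrative names, AND as a stand-in transfer function so the block stays self-contained) mirrors ForeachKnownBits/ForeachNumInKnownBits:

```python
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1

def foreach_known_bits():
    # Each bit is independently known-one ('1'), known-zero ('0'), or unknown ('?').
    for states in product("01?", repeat=BITS):
        one = sum(1 << i for i, s in enumerate(states) if s == "1")
        zero = sum(1 << i for i, s in enumerate(states) if s == "0")
        yield one, zero

def foreach_num(one, zero):
    # Every concrete value compatible with the (one, zero) masks.
    for n in range(1 << BITS):
        if (n & one) == one and (n & zero) == 0:
            yield n

def exhaustive_check(transfer, concrete):
    for one_a, zero_a in foreach_known_bits():
        for one_b, zero_b in foreach_known_bits():
            t_one, t_zero = transfer(one_a, zero_a, one_b, zero_b)
            common_one, common_zero = MASK, MASK
            for x in foreach_num(one_a, zero_a):
                for y in foreach_num(one_b, zero_b):
                    r = concrete(x, y) & MASK
                    common_one &= r
                    common_zero &= ~r
            # Soundness: the transfer may not claim more than every
            # concrete result actually shares.
            assert t_one & ~common_one == 0
            assert t_zero & ~common_zero == 0

# Known bits of AND: a bit is one iff known one in both inputs,
# and zero if known zero in either input.
exhaustive_check(
    lambda o1, z1, o2, z2: (o1 & o2, (z1 | z2) & MASK),
    lambda x, y: x & y,
)
```

The C++ test goes one step further and also checks precision (EXPECT_EQ rather than a subset check), which is appropriate when the transfer function is known to be optimal.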
[llvm-branch-commits] [llvm] 7c30c05 - [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI.
Author: Simon Pilgrim
Date: 2021-01-14T11:55:20Z
New Revision: 7c30c05ff71d062f0b8a05b7c3c12ede2c285371

URL: https://github.com/llvm/llvm-project/commit/7c30c05ff71d062f0b8a05b7c3c12ede2c285371
DIFF: https://github.com/llvm/llvm-project/commit/7c30c05ff71d062f0b8a05b7c3c12ede2c285371.diff

LOG: [DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI.

I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run. Also, move the second op matching before bailing to make it simpler to try to match other things afterward.

Added: 

Modified: 
    llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Removed: 


diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f4c9b814b806..eaf9ad9ef6e2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -20835,7 +20835,9 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) {
     if (OtherSVN->isSplat())
       return false;

+    SV0 = SV1 = SDValue();
     Mask.clear();
+
     for (unsigned i = 0; i != NumElts; ++i) {
       int Idx = SVN->getMaskElt(i);
       if (Idx < 0) {
@@ -20877,15 +20879,16 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) {
         Mask.push_back(Idx);
         continue;
       }
+      if (!SV1.getNode() || SV1 == CurrentVec) {
+        // Ok. CurrentVec is the right hand side.
+        // Update the mask accordingly.
+        SV1 = CurrentVec;
+        Mask.push_back(Idx + NumElts);
+        continue;
+      }

       // Bail out if we cannot convert the shuffle pair into a single shuffle.
-      if (SV1.getNode() && SV1 != CurrentVec)
-        return false;
-
-      // Ok. CurrentVec is the right hand side.
-      // Update the mask accordingly.
-      SV1 = CurrentVec;
-      Mask.push_back(Idx + NumElts);
+      return false;
     }
     return true;
   };
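The fold MergeInnerShuffle implements - shuffle(shuffle(A, B, M0), C, M1) collapsed into a single shuffle over at most two of {A, B, C} - can be sketched as a small Python model (operand numbering and names are illustrative, not the DAGCombiner API; -1 denotes undef):

```python
def merge_inner_shuffle(outer_mask, inner_mask, num_elts):
    """Fold shuffle(shuffle(A, B, M0), C, M1) into one shuffle.

    Operands are numbered 0 (A), 1 (B), 2 (C). Returns (sv0, sv1, mask)
    on success, or None if three distinct sources would be needed.
    """
    sv0 = sv1 = None
    mask = []
    for idx in outer_mask:
        if idx < 0:
            mask.append(-1)  # propagate undef
            continue
        if idx < num_elts:
            # Refers into the inner shuffle: look through its mask to
            # find which of A/B is actually referenced.
            idx = inner_mask[idx]
            if idx < 0:
                mask.append(-1)
                continue
            cur = 0 if idx < num_elts else 1
        else:
            cur = 2  # the outer shuffle's second operand, C
        lane = idx % num_elts
        if sv0 is None or sv0 == cur:
            sv0 = cur
            mask.append(lane)
        elif sv1 is None or sv1 == cur:
            sv1 = cur
            mask.append(lane + num_elts)
        else:
            return None  # cannot express with a two-operand shuffle
    return sv0, sv1, mask
```

The commit's reordering corresponds to attempting the sv1 match before the bail-out, and resetting sv0/sv1/mask on entry so the helper can be rerun safely.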
[llvm-branch-commits] [llvm] 8f1d7f3 - [X86] Improve sum-of-reductions v4f32 test coverage
Author: Simon Pilgrim Date: 2021-01-14T11:05:19Z New Revision: 8f1d7f3753ca132b310bbb0e62c394cfa75daee5 URL: https://github.com/llvm/llvm-project/commit/8f1d7f3753ca132b310bbb0e62c394cfa75daee5 DIFF: https://github.com/llvm/llvm-project/commit/8f1d7f3753ca132b310bbb0e62c394cfa75daee5.diff LOG: [X86] Improve sum-of-reductions v4f32 test coverage Ensure that the v4f32 reductions use a -0.0f start value and add fast-math test variant. Added: Modified: llvm/test/CodeGen/X86/horizontal-sum.ll Removed: diff --git a/llvm/test/CodeGen/X86/horizontal-sum.ll b/llvm/test/CodeGen/X86/horizontal-sum.ll index 315e795d7a37..a5b34c482474 100644 --- a/llvm/test/CodeGen/X86/horizontal-sum.ll +++ b/llvm/test/CodeGen/X86/horizontal-sum.ll @@ -1,10 +1,10 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSSE3,SSSE3-SLOW -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3,fast-hops | FileCheck %s --check-prefixes=SSSE3,SSSE3-FAST -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX-SLOW,AVX1-SLOW -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx,fast-hops | FileCheck %s --check-prefixes=AVX,AVX-FAST,AVX1-FAST -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2| FileCheck %s --check-prefixes=AVX,AVX-SLOW,AVX2-SLOW -; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2,fast-hops | FileCheck %s --check-prefixes=AVX,AVX-FAST,AVX2-FAST +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSSE3-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3,fast-hops | FileCheck %s --check-prefixes=SSSE3-FAST +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX-SLOW,AVX1-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx,fast-hops | FileCheck %s --check-prefixes=AVX-FAST,AVX1-FAST +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2| FileCheck %s 
--check-prefixes=AVX-SLOW,AVX2-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2,fast-hops | FileCheck %s --check-prefixes=AVX-FAST,AVX2-FAST ; Vectorized Pairwise Sum Reductions ; e.g. @@ -954,77 +954,137 @@ define <4 x i32> @sequential_sum_v4i32_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i3 ; } define <4 x float> @reduction_sum_v4f32_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { -; SSSE3-LABEL: reduction_sum_v4f32_v4f32: -; SSSE3: # %bb.0: -; SSSE3-NEXT:movshdup {{.*#+}} xmm5 = xmm0[1,1,3,3] -; SSSE3-NEXT:movss {{.*#+}} xmm4 = mem[0],zero,zero,zero -; SSSE3-NEXT:addss %xmm4, %xmm5 -; SSSE3-NEXT:movaps %xmm0, %xmm6 -; SSSE3-NEXT:unpckhpd {{.*#+}} xmm6 = xmm6[1],xmm0[1] -; SSSE3-NEXT:addss %xmm5, %xmm6 -; SSSE3-NEXT:shufps {{.*#+}} xmm0 = xmm0[3,3,3,3] -; SSSE3-NEXT:addss %xmm6, %xmm0 -; SSSE3-NEXT:movshdup {{.*#+}} xmm5 = xmm1[1,1,3,3] -; SSSE3-NEXT:addss %xmm4, %xmm5 -; SSSE3-NEXT:movaps %xmm1, %xmm6 -; SSSE3-NEXT:unpckhpd {{.*#+}} xmm6 = xmm6[1],xmm1[1] -; SSSE3-NEXT:addss %xmm5, %xmm6 -; SSSE3-NEXT:shufps {{.*#+}} xmm1 = xmm1[3,3,3,3] -; SSSE3-NEXT:addss %xmm6, %xmm1 -; SSSE3-NEXT:unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] -; SSSE3-NEXT:movshdup {{.*#+}} xmm1 = xmm2[1,1,3,3] -; SSSE3-NEXT:addss %xmm4, %xmm1 -; SSSE3-NEXT:movaps %xmm2, %xmm5 -; SSSE3-NEXT:unpckhpd {{.*#+}} xmm5 = xmm5[1],xmm2[1] -; SSSE3-NEXT:addss %xmm1, %xmm5 -; SSSE3-NEXT:shufps {{.*#+}} xmm2 = xmm2[3,3,3,3] -; SSSE3-NEXT:addss %xmm5, %xmm2 -; SSSE3-NEXT:movshdup {{.*#+}} xmm1 = xmm3[1,1,3,3] -; SSSE3-NEXT:addss %xmm4, %xmm1 -; SSSE3-NEXT:movaps %xmm3, %xmm4 -; SSSE3-NEXT:unpckhpd {{.*#+}} xmm4 = xmm4[1],xmm3[1] -; SSSE3-NEXT:addss %xmm1, %xmm4 -; SSSE3-NEXT:shufps {{.*#+}} xmm3 = xmm3[3,3,3,3] -; SSSE3-NEXT:addss %xmm4, %xmm3 -; SSSE3-NEXT:unpcklps {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1] -; SSSE3-NEXT:movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0] -; SSSE3-NEXT:retq +; SSSE3-SLOW-LABEL: reduction_sum_v4f32_v4f32: +; SSSE3-SLOW: # %bb.0: +; 
SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm4 = xmm0[1,1,3,3] +; SSSE3-SLOW-NEXT:addss %xmm0, %xmm4 +; SSSE3-SLOW-NEXT:movaps %xmm0, %xmm5 +; SSSE3-SLOW-NEXT:unpckhpd {{.*#+}} xmm5 = xmm5[1],xmm0[1] +; SSSE3-SLOW-NEXT:addss %xmm4, %xmm5 +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[3,3,3,3] +; SSSE3-SLOW-NEXT:addss %xmm5, %xmm0 +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm4 = xmm1[1,1,3,3] +; SSSE3-SLOW-NEXT:addss %xmm1, %xmm4 +; SSSE3-SLOW-NEXT:movaps %xmm1, %xmm5 +; SSSE3-SLOW-NEXT:unpckhpd {{.*#+}} xmm5 = xmm5[1],xmm1[1] +; SSSE3-SLOW-NEXT:addss %xmm4, %xmm5 +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm1 = xmm1[3,3,3,3]
[llvm-branch-commits] [llvm] af8d27a - [DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI.
Author: Simon Pilgrim Date: 2021-01-14T11:05:19Z New Revision: af8d27a7a8266b89916b5e4db2b2fd97eb7d84e5 URL: https://github.com/llvm/llvm-project/commit/af8d27a7a8266b89916b5e4db2b2fd97eb7d84e5 DIFF: https://github.com/llvm/llvm-project/commit/af8d27a7a8266b89916b5e4db2b2fd97eb7d84e5.diff LOG: [DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI. Make it easier to reuse in a future patch. Added: Modified: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index 24bc7fe7e0ad..f4c9b814b806 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -20823,30 +20823,19 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) { return DAG.getCommutedVectorShuffle(*SVN); } - // Try to fold according to rules: - // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(A, B, M2) - // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(A, C, M2) - // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(B, C, M2) - // Don't try to fold shuffles with illegal type. - // Only fold if this shuffle is the only user of the other shuffle. - if (N0.getOpcode() == ISD::VECTOR_SHUFFLE && N->isOnlyUserOf(N0.getNode()) && - Level < AfterLegalizeDAG && TLI.isTypeLegal(VT)) { -ShuffleVectorSDNode *OtherSV = cast(N0); - + // Compute the combined shuffle mask for a shuffle with SV0 as the first + // operand, and SV1 as the second operand. + // i.e. Merge SVN(OtherSVN, N1) -> shuffle(SV0, SV1, Mask). + auto MergeInnerShuffle = [NumElts](ShuffleVectorSDNode *SVN, + ShuffleVectorSDNode *OtherSVN, SDValue N1, + SDValue , SDValue , + SmallVectorImpl ) -> bool { // Don't try to fold splats; they're likely to simplify somehow, or they // might be free. -if (OtherSV->isSplat()) - return SDValue(); - -// The incoming shuffle must be of the same type as the result of the -// current shuffle. 
-assert(OtherSV->getOperand(0).getValueType() == VT && - "Shuffle types don't match"); +if (OtherSVN->isSplat()) + return false; -SDValue SV0, SV1; -SmallVector Mask; -// Compute the combined shuffle mask for a shuffle with SV0 as the first -// operand, and SV1 as the second operand. +Mask.clear(); for (unsigned i = 0; i != NumElts; ++i) { int Idx = SVN->getMaskElt(i); if (Idx < 0) { @@ -20859,15 +20848,14 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) { if (Idx < (int)NumElts) { // This shuffle index refers to the inner shuffle N0. Lookup the inner // shuffle mask to identify which vector is actually referenced. -Idx = OtherSV->getMaskElt(Idx); +Idx = OtherSVN->getMaskElt(Idx); if (Idx < 0) { // Propagate Undef. Mask.push_back(Idx); continue; } - -CurrentVec = (Idx < (int) NumElts) ? OtherSV->getOperand(0) - : OtherSV->getOperand(1); +CurrentVec = (Idx < (int)NumElts) ? OtherSVN->getOperand(0) + : OtherSVN->getOperand(1); } else { // This shuffle index references an element within N1. CurrentVec = N1; @@ -20892,31 +20880,52 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) { // Bail out if we cannot convert the shuffle pair into a single shuffle. if (SV1.getNode() && SV1 != CurrentVec) -return SDValue(); +return false; // Ok. CurrentVec is the right hand side. // Update the mask accordingly. SV1 = CurrentVec; Mask.push_back(Idx + NumElts); } +return true; + }; -// Check if all indices in Mask are Undef. In case, propagate Undef. -if (llvm::all_of(Mask, [](int M) { return M < 0; })) - return DAG.getUNDEF(VT); + // Try to fold according to rules: + // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(A, B, M2) + // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(A, C, M2) + // shuffle(shuffle(A, B, M0), C, M1) -> shuffle(B, C, M2) + // Don't try to fold shuffles with illegal type. + // Only fold if this shuffle is the only user of the other shuffle. 
+ if (N0.getOpcode() == ISD::VECTOR_SHUFFLE && N->isOnlyUserOf(N0.getNode()) && + Level < AfterLegalizeDAG && TLI.isTypeLegal(VT)) { +ShuffleVectorSDNode *OtherSV = cast(N0); + +// The incoming shuffle must be of the same type as the result of the +// current shuffle. +assert(OtherSV->getOperand(0).getValueType() == VT && + "Shuffle types don't match"); -if (!SV0.getNode()) - SV0 = DAG.getUNDEF(VT); -if (!SV1.getNode()) - SV1 =
[llvm-branch-commits] [llvm] 993c488 - [DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI.
Author: Simon Pilgrim
Date: 2021-01-13T17:19:41Z
New Revision: 993c488ed2b347011d9d71990af38a82aaf5bdf5

URL: https://github.com/llvm/llvm-project/commit/993c488ed2b347011d9d71990af38a82aaf5bdf5
DIFF: https://github.com/llvm/llvm-project/commit/993c488ed2b347011d9d71990af38a82aaf5bdf5.diff

LOG: [DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI.

Added: 

Modified: 
    llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Removed: 


diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 5d9bb4e4a98b..7e4ee3bd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -20901,11 +20901,7 @@ SDValue DAGCombiner::visitVECTOR_SHUFFLE(SDNode *N) {
     }

     // Check if all indices in Mask are Undef. In case, propagate Undef.
-    bool isUndefMask = true;
-    for (unsigned i = 0; i != NumElts && isUndefMask; ++i)
-      isUndefMask &= Mask[i] < 0;
-
-    if (isUndefMask)
+    if (llvm::all_of(Mask, [](int M) { return M < 0; }))
      return DAG.getUNDEF(VT);

     if (!SV0.getNode())
[llvm-branch-commits] [llvm] efb6e45 - [X86][AVX] Add test for another 'reverse HADD' pattern mentioned in PR41813
Author: Simon Pilgrim Date: 2021-01-13T17:19:41Z New Revision: efb6e45d2be8e3e0843bdc4c2766e6910083c08e URL: https://github.com/llvm/llvm-project/commit/efb6e45d2be8e3e0843bdc4c2766e6910083c08e DIFF: https://github.com/llvm/llvm-project/commit/efb6e45d2be8e3e0843bdc4c2766e6910083c08e.diff LOG: [X86][AVX] Add test for another 'reverse HADD' pattern mentioned in PR41813 Added: Modified: llvm/test/CodeGen/X86/haddsub-4.ll Removed: diff --git a/llvm/test/CodeGen/X86/haddsub-4.ll b/llvm/test/CodeGen/X86/haddsub-4.ll index d0c62753f0d2..6003f98b9371 100644 --- a/llvm/test/CodeGen/X86/haddsub-4.ll +++ b/llvm/test/CodeGen/X86/haddsub-4.ll @@ -120,6 +120,38 @@ define <8 x float> @hadd_reverse2_v8f32(<8 x float> %a0, <8 x float> %a1) { ret <8 x float> %add } +define <8 x float> @hadd_reverse3_v8f32(<8 x float> %a0, <8 x float> %a1) { +; SSE-LABEL: hadd_reverse3_v8f32: +; SSE: # %bb.0: +; SSE-NEXT:movaps %xmm0, %xmm4 +; SSE-NEXT:haddps %xmm2, %xmm4 +; SSE-NEXT:haddps %xmm3, %xmm1 +; SSE-NEXT:shufps {{.*#+}} xmm1 = xmm1[3,2,1,0] +; SSE-NEXT:shufps {{.*#+}} xmm4 = xmm4[3,2,1,0] +; SSE-NEXT:movaps %xmm1, %xmm0 +; SSE-NEXT:movaps %xmm4, %xmm1 +; SSE-NEXT:retq +; +; AVX1-LABEL: hadd_reverse3_v8f32: +; AVX1: # %bb.0: +; AVX1-NEXT:vhaddps %ymm1, %ymm0, %ymm0 +; AVX1-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] +; AVX1-NEXT:vperm2f128 {{.*#+}} ymm0 = ymm0[2,3,0,1] +; AVX1-NEXT:retq +; +; AVX2-LABEL: hadd_reverse3_v8f32: +; AVX2: # %bb.0: +; AVX2-NEXT:vhaddps %ymm1, %ymm0, %ymm0 +; AVX2-NEXT:vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] +; AVX2-NEXT:vpermpd {{.*#+}} ymm0 = ymm0[2,3,0,1] +; AVX2-NEXT:retq + %shuf0 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> + %shuf1 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> + %add = fadd <8 x float> %shuf0, %shuf1 + %shuf2 = shufflevector <8 x float> %add, <8 x float> poison, <8 x i32> + ret <8 x float> %shuf2 +} + define <16 x i16> @hadd_reverse_v16i16(<16 x i16> %a0, <16 x i16> %a1) nounwind { ; SSE-LABEL: 
hadd_reverse_v16i16: ; SSE: # %bb.0:
[llvm-branch-commits] [llvm] cbbfc82 - [X86][SSE] canonicalizeShuffleMaskWithHorizOp - simplify shuffle(HOP(HOP(X, Y), HOP(Z, W))) style chains.
Author: Simon Pilgrim Date: 2021-01-13T17:19:40Z New Revision: cbbfc8258615bc971a54c6287abe33c4215d2eac URL: https://github.com/llvm/llvm-project/commit/cbbfc8258615bc971a54c6287abe33c4215d2eac DIFF: https://github.com/llvm/llvm-project/commit/cbbfc8258615bc971a54c6287abe33c4215d2eac.diff LOG: [X86][SSE] canonicalizeShuffleMaskWithHorizOp - simplify shuffle(HOP(HOP(X,Y),HOP(Z,W))) style chains. See if we can remove the shuffle by resorting a HOP chain so that the HOP args are pre-shuffled. This initial version just handles (the most common) v4i32/v4f32 hadd/hsub reduction patterns - future work can extend this to v8i16 types plus PACK chains (2f64 HADD/HSUB should already be handled in the half-lane combine code later on). Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/horizontal-sum.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 821cfc5f0c27..d45eb5366bfe 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36115,6 +36115,38 @@ static SDValue canonicalizeShuffleMaskWithHorizOp( int NumEltsPerLane = NumElts / NumLanes; int NumHalfEltsPerLane = NumEltsPerLane / 2; + // See if we can remove the shuffle by resorting the HOP chain so that + // the HOP args are pre-shuffled. + // TODO: Generalize to any sized/depth chain. + // TODO: Add support for PACKSS/PACKUS. + if (isHoriz && NumEltsPerLane == 4 && VT0.is128BitVector() && + shouldUseHorizontalOp(Ops.size() == 1, DAG, Subtarget)) { +SmallVector ScaledMask; +if (scaleShuffleElements(Mask, 4, ScaledMask)) { + // Attempt to find a HOP(HOP(X,Y),HOP(Z,W)) source operand. 
+ auto GetHOpSrc = [&](int M) { +if (M == SM_SentinelUndef) + return DAG.getUNDEF(VT0); +if (M == SM_SentinelZero) + return getZeroVector(VT0.getSimpleVT(), Subtarget, DAG, DL); +SDValue Src0 = BC[M / NumElts]; +SDValue Src1 = Src0.getOperand((M % 4) >= 2); +if (Src1.getOpcode() == Opcode0 && Src0->isOnlyUserOf(Src1.getNode())) + return Src1.getOperand(M % 2); +return SDValue(); + }; + SDValue M0 = GetHOpSrc(ScaledMask[0]); + SDValue M1 = GetHOpSrc(ScaledMask[1]); + SDValue M2 = GetHOpSrc(ScaledMask[2]); + SDValue M3 = GetHOpSrc(ScaledMask[3]); + if (M0 && M1 && M2 && M3) { +SDValue LHS = DAG.getNode(Opcode0, DL, VT0, M0, M1); +SDValue RHS = DAG.getNode(Opcode0, DL, VT0, M2, M3); +return DAG.getNode(Opcode0, DL, VT0, LHS, RHS); + } +} + } + if (2 < Ops.size()) return SDValue(); diff --git a/llvm/test/CodeGen/X86/horizontal-sum.ll b/llvm/test/CodeGen/X86/horizontal-sum.ll index 47d44171d99a..315e795d7a37 100644 --- a/llvm/test/CodeGen/X86/horizontal-sum.ll +++ b/llvm/test/CodeGen/X86/horizontal-sum.ll @@ -38,13 +38,9 @@ define <4 x float> @pair_sum_v4f32_v4f32(<4 x float> %0, <4 x float> %1, <4 x fl ; ; SSSE3-FAST-LABEL: pair_sum_v4f32_v4f32: ; SSSE3-FAST: # %bb.0: -; SSSE3-FAST-NEXT:haddps %xmm0, %xmm0 -; SSSE3-FAST-NEXT:haddps %xmm1, %xmm1 ; SSSE3-FAST-NEXT:haddps %xmm1, %xmm0 -; SSSE3-FAST-NEXT:haddps %xmm2, %xmm2 -; SSSE3-FAST-NEXT:haddps %xmm3, %xmm3 -; SSSE3-FAST-NEXT:haddps %xmm2, %xmm3 -; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2],xmm3[2,0] +; SSSE3-FAST-NEXT:haddps %xmm3, %xmm2 +; SSSE3-FAST-NEXT:haddps %xmm2, %xmm0 ; SSSE3-FAST-NEXT:retq ; ; AVX1-SLOW-LABEL: pair_sum_v4f32_v4f32: @@ -66,18 +62,12 @@ define <4 x float> @pair_sum_v4f32_v4f32(<4 x float> %0, <4 x float> %1, <4 x fl ; AVX1-SLOW-NEXT:vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0] ; AVX1-SLOW-NEXT:retq ; -; AVX1-FAST-LABEL: pair_sum_v4f32_v4f32: -; AVX1-FAST: # %bb.0: -; AVX1-FAST-NEXT:vhaddps %xmm0, %xmm0, %xmm0 -; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm1, %xmm1 -; AVX1-FAST-NEXT:vhaddps %xmm1, 
%xmm0, %xmm0 -; AVX1-FAST-NEXT:vhaddps %xmm2, %xmm2, %xmm1 -; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm1, %xmm1 -; AVX1-FAST-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,1] -; AVX1-FAST-NEXT:vhaddps %xmm3, %xmm3, %xmm1 -; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm1, %xmm1 -; AVX1-FAST-NEXT:vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0] -; AVX1-FAST-NEXT:retq +; AVX-FAST-LABEL: pair_sum_v4f32_v4f32: +; AVX-FAST: # %bb.0: +; AVX-FAST-NEXT:vhaddps %xmm1, %xmm0, %xmm0 +; AVX-FAST-NEXT:vhaddps %xmm3, %xmm2, %xmm1 +; AVX-FAST-NEXT:vhaddps %xmm1, %xmm0, %xmm0 +; AVX-FAST-NEXT:retq ; ; AVX2-SLOW-LABEL: pair_sum_v4f32_v4f32: ; AVX2-SLOW: # %bb.0: @@ -97,19 +87,6 @@ define <4 x float> @pair_sum_v4f32_v4f32(<4 x float> %0, <4 x float> %1, <4 x fl ; AVX2-SLOW-NEXT:vaddps %xmm2, %xmm1, %xmm1 ; AVX2-SLOW-NEXT:vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0] ;
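The HOP-chain resorting in cbbfc8258615 leans on an identity of x86 horizontal adds: a two-level tree of haddps ops reduces four vectors at once, so the trailing shuffle can be folded away by pre-shuffling the HOP arguments. A minimal numeric model (plain Python lists, not LLVM code) of 4-lane haddps makes the identity concrete:

```python
def hadd(a, b):
    """4-lane horizontal add matching x86 haddps:
    result = [a0+a1, a2+a3, b0+b1, b2+b3]."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

# hadd(hadd(a, b), hadd(c, d)) yields the full reduction of each input
# vector in one lane apiece - exactly the pair_sum pattern above.
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
c, d = [9, 10, 11, 12], [13, 14, 15, 16]
assert hadd(hadd(a, b), hadd(c, d)) == [sum(a), sum(b), sum(c), sum(d)]
```

This is why the SSSE3-FAST codegen collapses to three haddps instructions with no shufps: the lane placement of the tree already matches the requested output order.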
[llvm-branch-commits] [llvm] 0a0ee7f - [X86] canonicalizeShuffleMaskWithHorizOp - minor refactor to support multiple src ops. NFCI.
Author: Simon Pilgrim Date: 2021-01-13T13:59:56Z New Revision: 0a0ee7f5a5af0f5dae65452f649ab665e787e7d6 URL: https://github.com/llvm/llvm-project/commit/0a0ee7f5a5af0f5dae65452f649ab665e787e7d6 DIFF: https://github.com/llvm/llvm-project/commit/0a0ee7f5a5af0f5dae65452f649ab665e787e7d6.diff LOG: [X86] canonicalizeShuffleMaskWithHorizOp - minor refactor to support multiple src ops. NFCI. canonicalizeShuffleMaskWithHorizOp currently only supports shuffles with 1 or 2 sources, but PR41813 will require us to support higher numbers of sources. This patch just generalizes the initial setup stages to ensure all src ops are the same type and opcode and then will continue to early out if we have more than 2 sources. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 5949782f3c0c..821cfc5f0c27 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36088,20 +36088,20 @@ static SDValue canonicalizeShuffleMaskWithHorizOp( MutableArrayRef Ops, MutableArrayRef Mask, unsigned RootSizeInBits, const SDLoc , SelectionDAG , const X86Subtarget ) { - - // Combine binary shuffle of 2 similar 'Horizontal' instructions into a - // single instruction. Attempt to match a v2X64 repeating shuffle pattern that - // represents the LHS/RHS inputs for the lower/upper halves. - if (Mask.empty() || Ops.empty() || 2 < Ops.size()) + if (Mask.empty() || Ops.empty()) return SDValue(); - SDValue BC0 = peekThroughBitcasts(Ops.front()); - SDValue BC1 = peekThroughBitcasts(Ops.back()); + SmallVector BC; + for (SDValue Op : Ops) +BC.push_back(peekThroughBitcasts(Op)); + + // All ops must be the same horizop + type. 
+ SDValue BC0 = BC[0]; EVT VT0 = BC0.getValueType(); - EVT VT1 = BC1.getValueType(); unsigned Opcode0 = BC0.getOpcode(); - unsigned Opcode1 = BC1.getOpcode(); - if (Opcode0 != Opcode1 || VT0 != VT1 || VT0.getSizeInBits() != RootSizeInBits) + if (VT0.getSizeInBits() != RootSizeInBits || llvm::any_of(BC, [&](SDValue V) { +return V.getOpcode() != Opcode0 || V.getValueType() != VT0; + })) return SDValue(); bool isHoriz = (Opcode0 == X86ISD::FHADD || Opcode0 == X86ISD::HADD || @@ -36110,12 +36110,16 @@ static SDValue canonicalizeShuffleMaskWithHorizOp( if (!isHoriz && !isPack) return SDValue(); - if (Mask.size() == VT0.getVectorNumElements()) { -int NumElts = VT0.getVectorNumElements(); -int NumLanes = VT0.getSizeInBits() / 128; -int NumEltsPerLane = NumElts / NumLanes; -int NumHalfEltsPerLane = NumEltsPerLane / 2; + int NumElts = VT0.getVectorNumElements(); + int NumLanes = VT0.getSizeInBits() / 128; + int NumEltsPerLane = NumElts / NumLanes; + int NumHalfEltsPerLane = NumEltsPerLane / 2; + + if (2 < Ops.size()) +return SDValue(); + SDValue BC1 = BC[BC.size() - 1]; + if (Mask.size() == VT0.getVectorNumElements()) { // Canonicalize binary shuffles of horizontal ops that use the // same sources to an unary shuffle. // TODO: Try to perform this fold even if the shuffle remains. @@ -36159,6 +36163,9 @@ static SDValue canonicalizeShuffleMaskWithHorizOp( } } + // Combine binary shuffle of 2 similar 'Horizontal' instructions into a + // single instruction. Attempt to match a v2X64 repeating shuffle pattern that + // represents the LHS/RHS inputs for the lower/upper halves. unsigned EltSizeInBits = RootSizeInBits / Mask.size(); SmallVector TargetMask128, WideMask128; if (isRepeatedTargetShuffleMask(128, EltSizeInBits, Mask, TargetMask128) && ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 0f59d09 - [X86][AVX] combineVectorSignBitsTruncation - limit AVX512 truncations to 128-bits (PR48727)
Author: Simon Pilgrim Date: 2021-01-13T10:38:23Z New Revision: 0f59d099571d3d803b54e2ce06aa94babb9b26db URL: https://github.com/llvm/llvm-project/commit/0f59d099571d3d803b54e2ce06aa94babb9b26db DIFF: https://github.com/llvm/llvm-project/commit/0f59d099571d3d803b54e2ce06aa94babb9b26db.diff LOG: [X86][AVX] combineVectorSignBitsTruncation - limit AVX512 truncations to 128-bits (PR48727) rG73a44f437bf1 result in 256-bit packss/packus ops with additional shuffles that shuffle combining can sometimes try to convert back into a truncation. Added: llvm/test/CodeGen/X86/pr48727.ll Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-pack-256.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 65b784f31842..5949782f3c0c 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -45957,11 +45957,11 @@ static SDValue combineVectorSignBitsTruncation(SDNode *N, const SDLoc , if (Subtarget.hasAVX512() && !(!Subtarget.useAVX512Regs() && VT.is256BitVector() && InVT.is512BitVector())) { -// PACK should still be worth it for 128/256-bit vectors if the sources were +// PACK should still be worth it for 128-bit vectors if the sources were // originally concatenated from subvectors. 
SmallVector ConcatOps; -if (VT.getSizeInBits() > 256 || !collectConcatOps(In.getNode(), ConcatOps)) - return SDValue(); +if (VT.getSizeInBits() > 128 || !collectConcatOps(In.getNode(), ConcatOps)) +return SDValue(); } unsigned NumPackedSignBits = std::min(SVT.getSizeInBits(), 16); diff --git a/llvm/test/CodeGen/X86/pr48727.ll b/llvm/test/CodeGen/X86/pr48727.ll new file mode 100644 index ..4fa16db14acc --- /dev/null +++ b/llvm/test/CodeGen/X86/pr48727.ll @@ -0,0 +1,51 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-- -mcpu=skx | FileCheck %s + +define void @PR48727() { +; CHECK-LABEL: PR48727: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:vcvttpd2dqy 0, %xmm0 +; CHECK-NEXT:vcvttpd2dqy 128, %xmm1 +; CHECK-NEXT:movq (%rax), %rax +; CHECK-NEXT:vcvttpd2dqy 160, %xmm2 +; CHECK-NEXT:vinserti128 $1, %xmm2, %ymm1, %ymm1 +; CHECK-NEXT:vcvttpd2dqy (%rax), %xmm2 +; CHECK-NEXT:vinserti128 $1, %xmm2, %ymm0, %ymm0 +; CHECK-NEXT:vinserti64x4 $1, %ymm1, %zmm0, %zmm0 +; CHECK-NEXT:vpmovdw %zmm0, %ymm0 +; CHECK-NEXT:vmovdqu %ymm0, 16(%rax) +; CHECK-NEXT:vzeroupper +; CHECK-NEXT:retq +entry: + %0 = load [100 x [100 x i16]]*, [100 x [100 x i16]]** undef, align 8 + %wide.load.2 = load <4 x double>, <4 x double>* null, align 16 + %1 = fptosi <4 x double> %wide.load.2 to <4 x i16> + %2 = getelementptr inbounds [100 x [100 x i16]], [100 x [100 x i16]]* %0, i64 0, i64 0, i64 8 + %3 = bitcast i16* %2 to <4 x i16>* + store <4 x i16> %1, <4 x i16>* %3, align 8 + %wide.load.3 = load <4 x double>, <4 x double>* undef, align 16, !invariant.load !0, !noalias !1 + %4 = fptosi <4 x double> %wide.load.3 to <4 x i16> + %5 = getelementptr inbounds [100 x [100 x i16]], [100 x [100 x i16]]* %0, i64 0, i64 0, i64 12 + %6 = bitcast i16* %5 to <4 x i16>* + store <4 x i16> %4, <4 x i16>* %6, align 8 + %7 = getelementptr inbounds [100 x [100 x double]], [100 x [100 x double]]* null, i64 0, i64 0, i64 16 + %8 = bitcast double* %7 to <4 x double>* + 
%wide.load.4 = load <4 x double>, <4 x double>* %8, align 16, !invariant.load !0, !noalias !1 + %9 = fptosi <4 x double> %wide.load.4 to <4 x i16> + %10 = getelementptr inbounds [100 x [100 x i16]], [100 x [100 x i16]]* %0, i64 0, i64 0, i64 16 + %11 = bitcast i16* %10 to <4 x i16>* + store <4 x i16> %9, <4 x i16>* %11, align 8 + %12 = getelementptr inbounds [100 x [100 x double]], [100 x [100 x double]]* null, i64 0, i64 0, i64 20 + %13 = bitcast double* %12 to <4 x double>* + %wide.load.5 = load <4 x double>, <4 x double>* %13, align 16, !invariant.load !0, !noalias !1 + %14 = fptosi <4 x double> %wide.load.5 to <4 x i16> + %15 = getelementptr inbounds [100 x [100 x i16]], [100 x [100 x i16]]* %0, i64 0, i64 0, i64 20 + %16 = bitcast i16* %15 to <4 x i16>* + store <4 x i16> %14, <4 x i16>* %16, align 8 + ret void +} + +!0 = !{} +!1 = !{!2} +!2 = !{!"buffer: {index:1, offset:0, size:2}", !3} +!3 = !{!"XLA global AA domain"} diff --git a/llvm/test/CodeGen/X86/vector-pack-256.ll b/llvm/test/CodeGen/X86/vector-pack-256.ll index af06ddbd3f3a..b789b46906cb 100644 --- a/llvm/test/CodeGen/X86/vector-pack-256.ll +++ b/llvm/test/CodeGen/X86/vector-pack-256.ll @@ -31,7 +31,10 @@ define <16 x i16> @trunc_concat_packssdw_256(<8 x i32> %a0, <8 x i32> %a1) nounw ; AVX512: # %bb.0: ; AVX512-NEXT:vpsrad $17, %ymm0, %ymm0 ; AVX512-NEXT:vpsrad $23, %ymm1, %ymm1 -;
[llvm-branch-commits] [llvm] a4931d4 - [AMDGPU] Regenerate umax crash test
Author: Simon Pilgrim
Date: 2021-01-12T18:02:15Z
New Revision: a4931d4fe38d6feef53f97f3dcc7792bfcb06c84

URL: https://github.com/llvm/llvm-project/commit/a4931d4fe38d6feef53f97f3dcc7792bfcb06c84
DIFF: https://github.com/llvm/llvm-project/commit/a4931d4fe38d6feef53f97f3dcc7792bfcb06c84.diff

LOG: [AMDGPU] Regenerate umax crash test

Added: 

Modified: 
    llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll

Removed: 

diff --git a/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll b/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll
index b7ed34bbf09b..b4cd36daad65 100644
--- a/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll
+++ b/llvm/test/CodeGen/AMDGPU/r600-legalize-umax-bug.ll
@@ -1,8 +1,27 @@
-; RUN: llc -march=r600 -mcpu=cypress -start-after safe-stack %s -o - | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -march=r600 -mcpu=cypress -start-after safe-stack | FileCheck %s
 ; Don't crash
-; CHECK: MAX_UINT
 define amdgpu_kernel void @test(i64 addrspace(1)* %out) {
+; CHECK-LABEL: test:
+; CHECK: ; %bb.0: ; %bb
+; CHECK-NEXT:    ALU 4, @6, KC0[CB0:0-32], KC1[]
+; CHECK-NEXT:    MEM_RAT_CACHELESS STORE_RAW T0.XY, T1.X, 0
+; CHECK-NEXT:    ALU 3, @11, KC0[], KC1[]
+; CHECK-NEXT:    MEM_RAT_CACHELESS STORE_RAW T0.XY, T1.X, 1
+; CHECK-NEXT:    CF_END
+; CHECK-NEXT:    PAD
+; CHECK-NEXT:    ALU clause starting at 6:
+; CHECK-NEXT:     MOV T0.X, literal.x,
+; CHECK-NEXT:     MOV T0.Y, 0.0,
+; CHECK-NEXT:     LSHR * T1.X, KC0[2].Y, literal.x,
+; CHECK-NEXT:    2(2.802597e-45), 0(0.00e+00)
+; CHECK-NEXT:     MOV * T0.W, KC0[2].Y,
+; CHECK-NEXT:    ALU clause starting at 11:
+; CHECK-NEXT:     MAX_UINT T0.X, T0.X, literal.x,
+; CHECK-NEXT:     MOV T0.Y, 0.0,
+; CHECK-NEXT:     LSHR * T1.X, T0.W, literal.y,
+; CHECK-NEXT:    4(5.605194e-45), 2(2.802597e-45)
 bb:
   store i64 2, i64 addrspace(1)* %out
   %tmp = load i64, i64 addrspace(1)* %out
[llvm-branch-commits] [llvm] 85aaa3e - [X86] Regenerate sdiv_fix_sat.ll + udiv_fix_sat.ll tests
Author: Simon Pilgrim Date: 2021-01-12T17:25:30Z New Revision: 85aaa3e310c23ec8a375b7a2e2fceee5a72441ef URL: https://github.com/llvm/llvm-project/commit/85aaa3e310c23ec8a375b7a2e2fceee5a72441ef DIFF: https://github.com/llvm/llvm-project/commit/85aaa3e310c23ec8a375b7a2e2fceee5a72441ef.diff LOG: [X86] Regenerate sdiv_fix_sat.ll + udiv_fix_sat.ll tests Adding missing libcall PLT qualifiers Added: Modified: llvm/test/CodeGen/X86/sdiv_fix_sat.ll llvm/test/CodeGen/X86/udiv_fix_sat.ll Removed: diff --git a/llvm/test/CodeGen/X86/sdiv_fix_sat.ll b/llvm/test/CodeGen/X86/sdiv_fix_sat.ll index 512488e8f872..617d5d7876bd 100644 --- a/llvm/test/CodeGen/X86/sdiv_fix_sat.ll +++ b/llvm/test/CodeGen/X86/sdiv_fix_sat.ll @@ -322,7 +322,7 @@ define i64 @func5(i64 %x, i64 %y) nounwind { ; X64-NEXT:movq %r15, %rdi ; X64-NEXT:movq %r12, %rsi ; X64-NEXT:movq %r13, %rcx -; X64-NEXT:callq __divti3 +; X64-NEXT:callq __divti3@PLT ; X64-NEXT:movq %rax, %rbx ; X64-NEXT:movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill ; X64-NEXT:movq %rdx, %rbp @@ -338,7 +338,7 @@ define i64 @func5(i64 %x, i64 %y) nounwind { ; X64-NEXT:movq %r12, %rsi ; X64-NEXT:movq (%rsp), %rdx # 8-byte Reload ; X64-NEXT:movq %r13, %rcx -; X64-NEXT:callq __modti3 +; X64-NEXT:callq __modti3@PLT ; X64-NEXT:orq %rax, %rdx ; X64-NEXT:setne %al ; X64-NEXT:testb %r14b, %al @@ -613,7 +613,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r12, %rdi ; X64-NEXT:movq %rbp, %rsi ; X64-NEXT:movq %r15, %rcx -; X64-NEXT:callq __divti3 +; X64-NEXT:callq __divti3@PLT ; X64-NEXT:movq %rax, %r13 ; X64-NEXT:movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill ; X64-NEXT:movq %rdx, %r14 @@ -626,7 +626,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %rbp, %rsi ; X64-NEXT:movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload ; X64-NEXT:movq %r15, %rcx -; X64-NEXT:callq __modti3 +; X64-NEXT:callq __modti3@PLT ; X64-NEXT:orq %rax, %rdx ; X64-NEXT:setne %al ; X64-NEXT:testb %bl, %al @@ 
-668,7 +668,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r15, %rdi ; X64-NEXT:movq %r13, %rsi ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __divti3 +; X64-NEXT:callq __divti3@PLT ; X64-NEXT:movq %rax, %r12 ; X64-NEXT:movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill ; X64-NEXT:movq %rdx, %r14 @@ -681,7 +681,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r13, %rsi ; X64-NEXT:movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __modti3 +; X64-NEXT:callq __modti3@PLT ; X64-NEXT:orq %rax, %rdx ; X64-NEXT:setne %al ; X64-NEXT:testb %bl, %al @@ -735,7 +735,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r15, %rdi ; X64-NEXT:movq %r12, %rsi ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __divti3 +; X64-NEXT:callq __divti3@PLT ; X64-NEXT:movq %rax, %r13 ; X64-NEXT:movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill ; X64-NEXT:movq %rdx, %r14 @@ -748,7 +748,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r12, %rsi ; X64-NEXT:movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __modti3 +; X64-NEXT:callq __modti3@PLT ; X64-NEXT:orq %rax, %rdx ; X64-NEXT:setne %al ; X64-NEXT:testb %bl, %al @@ -790,7 +790,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r15, %rdi ; X64-NEXT:movq %r13, %rsi ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __divti3 +; X64-NEXT:callq __divti3@PLT ; X64-NEXT:movq %rax, %r12 ; X64-NEXT:movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill ; X64-NEXT:movq %rdx, %r14 @@ -803,7 +803,7 @@ define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind { ; X64-NEXT:movq %r13, %rsi ; X64-NEXT:movq {{[-0-9]+}}(%r{{[sb]}}p), %rdx # 8-byte Reload ; X64-NEXT:movq %rbp, %rcx -; X64-NEXT:callq __modti3 +; X64-NEXT:callq __modti3@PLT ; X64-NEXT:orq %rax, %rdx ; X64-NEXT:setne %al ; X64-NEXT:testb %bl, %al diff 
--git a/llvm/test/CodeGen/X86/udiv_fix_sat.ll b/llvm/test/CodeGen/X86/udiv_fix_sat.ll index d2e3b80c2145..2be51c3ccbba 100644 --- a/llvm/test/CodeGen/X86/udiv_fix_sat.ll +++ b/llvm/test/CodeGen/X86/udiv_fix_sat.ll @@ -179,7 +179,7 @@ define i64 @func5(i64 %x, i64 %y) nounwind { ; X64-NEXT:shlq $32, %rdi ; X64-NEXT:xorl %ebx, %ebx ; X64-NEXT:xorl %ecx, %ecx -; X64-NEXT:callq __udivti3 +; X64-NEXT:callq __udivti3@PLT ; X64-NEXT:cmpq $-1, %rax ; X64-NEXT:movq $-1, %rcx ;
[llvm-branch-commits] [llvm] 2ed914c - [X86][SSE] getFauxShuffleMask - handle PACKSS(SRAI(), SRAI()) shuffle patterns.
Author: Simon Pilgrim
Date: 2021-01-12T14:07:53Z
New Revision: 2ed914cb7e9c0737bdf60a0b1fd48b6499973325

URL: https://github.com/llvm/llvm-project/commit/2ed914cb7e9c0737bdf60a0b1fd48b6499973325
DIFF: https://github.com/llvm/llvm-project/commit/2ed914cb7e9c0737bdf60a0b1fd48b6499973325.diff

LOG: [X86][SSE] getFauxShuffleMask - handle PACKSS(SRAI(),SRAI()) shuffle patterns.

We can't easily treat ASHR as a faux shuffle, but if it was just feeding a PACKSS then it was likely being used as sign-extension for a truncation, so just peek through and adjust the mask accordingly.

Added: 

Modified: 
    llvm/lib/Target/X86/X86ISelLowering.cpp
    llvm/test/CodeGen/X86/psubus.ll

Removed: 

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 750c809eafca..f28e28689806 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7685,12 +7685,26 @@ static bool getFauxShuffleMask(SDValue N, const APInt &DemandedElts,
     // If we know input saturation won't happen (or we don't care for particular
     // lanes), we can treat this as a truncation shuffle.
+    bool Offset0 = false, Offset1 = false;
     if (Opcode == X86ISD::PACKSS) {
       if ((!(N0.isUndef() || EltsLHS.isNullValue()) &&
            DAG.ComputeNumSignBits(N0, EltsLHS, Depth + 1) <= NumBitsPerElt) ||
           (!(N1.isUndef() || EltsRHS.isNullValue()) &&
            DAG.ComputeNumSignBits(N1, EltsRHS, Depth + 1) <= NumBitsPerElt))
         return false;
+      // We can't easily fold ASHR into a shuffle, but if it was feeding a
+      // PACKSS then it was likely being used for sign-extension for a
+      // truncation, so just peek through and adjust the mask accordingly.
+ if (N0.getOpcode() == X86ISD::VSRAI && N->isOnlyUserOf(N0.getNode()) && + N0.getConstantOperandAPInt(1) == NumBitsPerElt) { +Offset0 = true; +N0 = N0.getOperand(0); + } + if (N1.getOpcode() == X86ISD::VSRAI && N->isOnlyUserOf(N1.getNode()) && + N1.getConstantOperandAPInt(1) == NumBitsPerElt) { +Offset1 = true; +N1 = N1.getOperand(0); + } } else { APInt ZeroMask = APInt::getHighBitsSet(2 * NumBitsPerElt, NumBitsPerElt); if ((!(N0.isUndef() || EltsLHS.isNullValue()) && @@ -7707,6 +7721,13 @@ static bool getFauxShuffleMask(SDValue N, const APInt , Ops.push_back(N1); createPackShuffleMask(VT, Mask, IsUnary); + +if (Offset0 || Offset1) { + for (int : Mask) +if ((Offset0 && isInRange(M, 0, NumElts)) || +(Offset1 && isInRange(M, NumElts, 2 * NumElts))) + ++M; +} return true; } case X86ISD::VTRUNC: { diff --git a/llvm/test/CodeGen/X86/psubus.ll b/llvm/test/CodeGen/X86/psubus.ll index 06240cd8bad3..351629a732c1 100644 --- a/llvm/test/CodeGen/X86/psubus.ll +++ b/llvm/test/CodeGen/X86/psubus.ll @@ -1403,11 +1403,6 @@ define <8 x i16> @psubus_8i32_max(<8 x i16> %x, <8 x i32> %y) nounwind { ; SSE2-NEXT:psrad $16, %xmm5 ; SSE2-NEXT:packssdw %xmm6, %xmm5 ; SSE2-NEXT:psubusw %xmm5, %xmm0 -; SSE2-NEXT:punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] -; SSE2-NEXT:psrad $16, %xmm1 -; SSE2-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3] -; SSE2-NEXT:psrad $16, %xmm0 -; SSE2-NEXT:packssdw %xmm1, %xmm0 ; SSE2-NEXT:retq ; ; SSSE3-LABEL: psubus_8i32_max: @@ -1738,111 +1733,91 @@ define <16 x i16> @psubus_16i32_max(<16 x i16> %x, <16 x i32> %y) nounwind { ; SSE2-LABEL: psubus_16i32_max: ; SSE2: # %bb.0: # %vector.ph ; SSE2-NEXT:movdqa {{.*#+}} xmm9 = [2147483648,2147483648,2147483648,2147483648] -; SSE2-NEXT:movdqa %xmm3, %xmm8 +; SSE2-NEXT:movdqa %xmm5, %xmm8 ; SSE2-NEXT:pxor %xmm9, %xmm8 ; SSE2-NEXT:movdqa {{.*#+}} xmm7 = [2147549183,2147549183,2147549183,2147549183] ; SSE2-NEXT:movdqa %xmm7, %xmm6 ; SSE2-NEXT:pcmpgtd %xmm8, %xmm6 ; 
SSE2-NEXT:pcmpeqd %xmm8, %xmm8 -; SSE2-NEXT:pand %xmm6, %xmm3 +; SSE2-NEXT:pand %xmm6, %xmm5 ; SSE2-NEXT:pxor %xmm8, %xmm6 -; SSE2-NEXT:por %xmm3, %xmm6 +; SSE2-NEXT:por %xmm5, %xmm6 ; SSE2-NEXT:pslld $16, %xmm6 ; SSE2-NEXT:psrad $16, %xmm6 -; SSE2-NEXT:movdqa %xmm2, %xmm10 +; SSE2-NEXT:movdqa %xmm4, %xmm10 ; SSE2-NEXT:pxor %xmm9, %xmm10 -; SSE2-NEXT:movdqa %xmm7, %xmm3 -; SSE2-NEXT:pcmpgtd %xmm10, %xmm3 -; SSE2-NEXT:pand %xmm3, %xmm2 -; SSE2-NEXT:pxor %xmm8, %xmm3 -; SSE2-NEXT:por %xmm2, %xmm3 -; SSE2-NEXT:pslld $16, %xmm3 -; SSE2-NEXT:psrad $16, %xmm3 -; SSE2-NEXT:packssdw %xmm6, %xmm3 -; SSE2-NEXT:movdqa %xmm5, %xmm2 -; SSE2-NEXT:pxor %xmm9, %xmm2 +; SSE2-NEXT:movdqa %xmm7, %xmm5 +; SSE2-NEXT:pcmpgtd %xmm10, %xmm5 +; SSE2-NEXT:pand %xmm5, %xmm4 +; SSE2-NEXT:pxor %xmm8, %xmm5 +; SSE2-NEXT:por
[llvm-branch-commits] [llvm] 7e44208 - [X86][SSE] combineSubToSubus - add v16i32 handling on pre-AVX512BW targets.
Author: Simon Pilgrim
Date: 2021-01-12T13:44:11Z
New Revision: 7e44208115b35ad34cc10259e9c375abbd636ef5

URL: https://github.com/llvm/llvm-project/commit/7e44208115b35ad34cc10259e9c375abbd636ef5
DIFF: https://github.com/llvm/llvm-project/commit/7e44208115b35ad34cc10259e9c375abbd636ef5.diff

LOG: [X86][SSE] combineSubToSubus - add v16i32 handling on pre-AVX512BW targets.

v16i32 -> v16i16/v8i16 truncation is now good enough using PACKSS/PACKUS + shuffle combining that it's no longer necessary to early-out on pre-AVX512BW targets.

This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.

Added: 

Modified: 
    llvm/lib/Target/X86/X86ISelLowering.cpp
    llvm/test/CodeGen/X86/psubus.ll

Removed: 

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index e3a94f1c23ab..750c809eafca 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -48756,9 +48756,9 @@ static SDValue combineSubToSubus(SDNode *N, SelectionDAG &DAG,
   // PSUBUS is supported, starting from SSE2.
EVT EltVT = VT.getVectorElementType(); - if (!(Subtarget.hasSSE2() && (EltVT == MVT::i8 || EltVT == MVT::i16 || -VT == MVT::v8i32 || VT == MVT::v8i64)) && - !(Subtarget.useBWIRegs() && (VT == MVT::v16i32))) + if (!(Subtarget.hasSSE2() && +(EltVT == MVT::i8 || EltVT == MVT::i16 || VT == MVT::v8i32 || + VT == MVT::v8i64 || VT == MVT::v16i32))) return SDValue(); SDValue SubusLHS, SubusRHS; @@ -48795,8 +48795,8 @@ static SDValue combineSubToSubus(SDNode *N, SelectionDAG , SDValue MinRHS = Op1.getOperand(0).getOperand(1); EVT TruncVT = Op1.getOperand(0).getValueType(); if (!(Subtarget.hasSSE2() && - (TruncVT == MVT::v8i32 || TruncVT == MVT::v8i64)) && -!(Subtarget.useBWIRegs() && (TruncVT == MVT::v16i32))) + (TruncVT == MVT::v8i32 || TruncVT == MVT::v8i64 || + TruncVT == MVT::v16i32))) return SDValue(); SDValue OpToSaturate; if (MinLHS.getOpcode() == ISD::ZERO_EXTEND && diff --git a/llvm/test/CodeGen/X86/psubus.ll b/llvm/test/CodeGen/X86/psubus.ll index 906af5e17211..06240cd8bad3 100644 --- a/llvm/test/CodeGen/X86/psubus.ll +++ b/llvm/test/CodeGen/X86/psubus.ll @@ -1737,141 +1737,125 @@ vector.ph: define <16 x i16> @psubus_16i32_max(<16 x i16> %x, <16 x i32> %y) nounwind { ; SSE2-LABEL: psubus_16i32_max: ; SSE2: # %bb.0: # %vector.ph -; SSE2-NEXT:movdqa %xmm1, %xmm8 -; SSE2-NEXT:pxor %xmm7, %xmm7 -; SSE2-NEXT:punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm7[0],xmm1[1],xmm7[1],xmm1[2],xmm7[2],xmm1[3],xmm7[3] -; SSE2-NEXT:punpckhwd {{.*#+}} xmm8 = xmm8[4],xmm7[4],xmm8[5],xmm7[5],xmm8[6],xmm7[6],xmm8[7],xmm7[7] -; SSE2-NEXT:movdqa %xmm0, %xmm10 -; SSE2-NEXT:punpcklwd {{.*#+}} xmm10 = xmm10[0],xmm7[0],xmm10[1],xmm7[1],xmm10[2],xmm7[2],xmm10[3],xmm7[3] -; SSE2-NEXT:punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm7[4],xmm0[5],xmm7[5],xmm0[6],xmm7[6],xmm0[7],xmm7[7] -; SSE2-NEXT:movdqa {{.*#+}} xmm7 = [2147483648,2147483648,2147483648,2147483648] -; SSE2-NEXT:movdqa %xmm3, %xmm6 -; SSE2-NEXT:pxor %xmm7, %xmm6 -; SSE2-NEXT:movdqa %xmm0, %xmm9 -; SSE2-NEXT:por %xmm7, %xmm9 -; SSE2-NEXT:pcmpgtd 
%xmm6, %xmm9 -; SSE2-NEXT:pand %xmm9, %xmm0 -; SSE2-NEXT:pandn %xmm3, %xmm9 -; SSE2-NEXT:por %xmm0, %xmm9 -; SSE2-NEXT:movdqa %xmm2, %xmm6 -; SSE2-NEXT:pxor %xmm7, %xmm6 -; SSE2-NEXT:movdqa %xmm10, %xmm0 -; SSE2-NEXT:por %xmm7, %xmm0 -; SSE2-NEXT:pcmpgtd %xmm6, %xmm0 -; SSE2-NEXT:pand %xmm0, %xmm10 -; SSE2-NEXT:pandn %xmm2, %xmm0 -; SSE2-NEXT:por %xmm10, %xmm0 -; SSE2-NEXT:movdqa %xmm5, %xmm10 -; SSE2-NEXT:pxor %xmm7, %xmm10 -; SSE2-NEXT:movdqa %xmm8, %xmm6 -; SSE2-NEXT:por %xmm7, %xmm6 -; SSE2-NEXT:pcmpgtd %xmm10, %xmm6 -; SSE2-NEXT:pand %xmm6, %xmm8 -; SSE2-NEXT:pandn %xmm5, %xmm6 -; SSE2-NEXT:por %xmm8, %xmm6 -; SSE2-NEXT:movdqa %xmm4, %xmm8 -; SSE2-NEXT:pxor %xmm7, %xmm8 -; SSE2-NEXT:por %xmm1, %xmm7 -; SSE2-NEXT:pcmpgtd %xmm8, %xmm7 -; SSE2-NEXT:pand %xmm7, %xmm1 -; SSE2-NEXT:pandn %xmm4, %xmm7 -; SSE2-NEXT:por %xmm7, %xmm1 -; SSE2-NEXT:psubd %xmm4, %xmm1 -; SSE2-NEXT:psubd %xmm5, %xmm6 -; SSE2-NEXT:psubd %xmm2, %xmm0 -; SSE2-NEXT:psubd %xmm3, %xmm9 -; SSE2-NEXT:pslld $16, %xmm9 -; SSE2-NEXT:psrad $16, %xmm9 -; SSE2-NEXT:pslld $16, %xmm0 -; SSE2-NEXT:psrad $16, %xmm0 -; SSE2-NEXT:packssdw %xmm9, %xmm0 +; SSE2-NEXT:movdqa {{.*#+}} xmm9 = [2147483648,2147483648,2147483648,2147483648] +; SSE2-NEXT:movdqa %xmm3, %xmm8 +; SSE2-NEXT:pxor %xmm9, %xmm8 +; SSE2-NEXT:movdqa {{.*#+}} xmm7 = [2147549183,2147549183,2147549183,2147549183] +; SSE2-NEXT:movdqa %xmm7, %xmm6 +; SSE2-NEXT:pcmpgtd %xmm8, %xmm6
[llvm-branch-commits] [llvm] a5212b5 - [X86][SSE] combineSubToSubus - remove SSE2 early-out.
Author: Simon Pilgrim
Date: 2021-01-12T12:52:11Z
New Revision: a5212b5c91cc699052125b8a3428ffe0c123837d

URL: https://github.com/llvm/llvm-project/commit/a5212b5c91cc699052125b8a3428ffe0c123837d
DIFF: https://github.com/llvm/llvm-project/commit/a5212b5c91cc699052125b8a3428ffe0c123837d.diff

LOG: [X86][SSE] combineSubToSubus - remove SSE2 early-out.

SSE2 truncation codegen has improved over the past few years (mainly due to better shuffle lowering/combining and computeKnownBits) - it's no longer necessary to early-out from v8i32/v8i64 truncations.

This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.

Added: 

Modified: 
    llvm/lib/Target/X86/X86ISelLowering.cpp
    llvm/test/CodeGen/X86/psubus.ll

Removed: 

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 2f9de876a87f..e3a94f1c23ab 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -48754,11 +48754,10 @@ static SDValue combineSubToSubus(SDNode *N, SelectionDAG &DAG,
   if (!VT.isVector())
     return SDValue();

-  // PSUBUS is supported, starting from SSE2, but truncation for v8i32
-  // is only worth it with SSSE3 (PSHUFB).
+  // PSUBUS is supported, starting from SSE2.
EVT EltVT = VT.getVectorElementType(); - if (!(Subtarget.hasSSE2() && (EltVT == MVT::i8 || EltVT == MVT::i16)) && - !(Subtarget.hasSSSE3() && (VT == MVT::v8i32 || VT == MVT::v8i64)) && + if (!(Subtarget.hasSSE2() && (EltVT == MVT::i8 || EltVT == MVT::i16 || +VT == MVT::v8i32 || VT == MVT::v8i64)) && !(Subtarget.useBWIRegs() && (VT == MVT::v16i32))) return SDValue(); @@ -48795,8 +48794,8 @@ static SDValue combineSubToSubus(SDNode *N, SelectionDAG , SDValue MinLHS = Op1.getOperand(0).getOperand(0); SDValue MinRHS = Op1.getOperand(0).getOperand(1); EVT TruncVT = Op1.getOperand(0).getValueType(); -if (!(Subtarget.hasSSSE3() && (TruncVT == MVT::v8i32 || - TruncVT == MVT::v8i64)) && +if (!(Subtarget.hasSSE2() && + (TruncVT == MVT::v8i32 || TruncVT == MVT::v8i64)) && !(Subtarget.useBWIRegs() && (TruncVT == MVT::v16i32))) return SDValue(); SDValue OpToSaturate; @@ -48835,7 +48834,7 @@ static SDValue combineSubToSubus(SDNode *N, SelectionDAG , // values, or first 48 bits for 64 bit values. KnownBits Known = DAG.computeKnownBits(SubusLHS); unsigned NumZeros = Known.countMinLeadingZeros(); - if ((VT == MVT::v8i64 && NumZeros < 48) || NumZeros < 16) + if (NumZeros < (VT.getScalarSizeInBits() - 16)) return SDValue(); EVT ExtType = SubusLHS.getValueType(); diff --git a/llvm/test/CodeGen/X86/psubus.ll b/llvm/test/CodeGen/X86/psubus.ll index 92283dba25b8..906af5e17211 100644 --- a/llvm/test/CodeGen/X86/psubus.ll +++ b/llvm/test/CodeGen/X86/psubus.ll @@ -1382,33 +1382,32 @@ vector.ph: define <8 x i16> @psubus_8i32_max(<8 x i16> %x, <8 x i32> %y) nounwind { ; SSE2-LABEL: psubus_8i32_max: ; SSE2: # %bb.0: # %vector.ph -; SSE2-NEXT:movdqa %xmm0, %xmm3 -; SSE2-NEXT:pxor %xmm4, %xmm4 -; SSE2-NEXT:punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3] -; SSE2-NEXT:punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7] -; SSE2-NEXT:movdqa {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648] -; SSE2-NEXT:movdqa 
%xmm2, %xmm6 -; SSE2-NEXT:pxor %xmm5, %xmm6 -; SSE2-NEXT:movdqa %xmm3, %xmm4 -; SSE2-NEXT:por %xmm5, %xmm4 -; SSE2-NEXT:pcmpgtd %xmm6, %xmm4 -; SSE2-NEXT:pand %xmm4, %xmm3 -; SSE2-NEXT:pandn %xmm2, %xmm4 -; SSE2-NEXT:por %xmm3, %xmm4 -; SSE2-NEXT:movdqa %xmm1, %xmm3 -; SSE2-NEXT:pxor %xmm5, %xmm3 -; SSE2-NEXT:por %xmm0, %xmm5 +; SSE2-NEXT:movdqa {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648] +; SSE2-NEXT:movdqa %xmm2, %xmm4 +; SSE2-NEXT:pxor %xmm3, %xmm4 +; SSE2-NEXT:movdqa {{.*#+}} xmm5 = [2147549183,2147549183,2147549183,2147549183] +; SSE2-NEXT:movdqa %xmm5, %xmm6 +; SSE2-NEXT:pcmpgtd %xmm4, %xmm6 +; SSE2-NEXT:pcmpeqd %xmm4, %xmm4 +; SSE2-NEXT:pand %xmm6, %xmm2 +; SSE2-NEXT:pxor %xmm4, %xmm6 +; SSE2-NEXT:por %xmm2, %xmm6 +; SSE2-NEXT:pslld $16, %xmm6 +; SSE2-NEXT:psrad $16, %xmm6 +; SSE2-NEXT:pxor %xmm1, %xmm3 ; SSE2-NEXT:pcmpgtd %xmm3, %xmm5 -; SSE2-NEXT:pand %xmm5, %xmm0 -; SSE2-NEXT:pandn %xmm1, %xmm5 -; SSE2-NEXT:por %xmm5, %xmm0 -; SSE2-NEXT:psubd %xmm1, %xmm0 -; SSE2-NEXT:psubd %xmm2, %xmm4 -; SSE2-NEXT:pslld $16, %xmm4 -; SSE2-NEXT:psrad $16, %xmm4 -; SSE2-NEXT:pslld $16, %xmm0 +; SSE2-NEXT:pxor %xmm5, %xmm4 +; SSE2-NEXT:pand %xmm1, %xmm5 +; SSE2-NEXT:por %xmm4, %xmm5 +; SSE2-NEXT:pslld $16, %xmm5 +;
[llvm-branch-commits] [llvm] 4214ca9 - [X86][AVX] Attempt to fold vpermf128(op(x, i), op(y, i)) -> op(vpermf128(x, y), i)
Author: Simon Pilgrim Date: 2021-01-11T16:59:25Z New Revision: 4214ca96145c9487407925b121b85fafb1179209 URL: https://github.com/llvm/llvm-project/commit/4214ca96145c9487407925b121b85fafb1179209 DIFF: https://github.com/llvm/llvm-project/commit/4214ca96145c9487407925b121b85fafb1179209.diff LOG: [X86][AVX] Attempt to fold vpermf128(op(x,i),op(y,i)) -> op(vpermf128(x,y),i) If vpermf128/vpermi128 is acting on 2 similar 'inlane' ops, then try to perform the vpermf128 first which will allow us to merge the ops. This will help us fix one of the regressions in D56387 Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-trunc.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 7895f883863f..2f9de876a87f 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -36665,6 +36665,43 @@ static SDValue combineCommutableSHUFP(SDValue N, MVT VT, const SDLoc , return SDValue(); } +/// Attempt to fold vpermf128(op(),op()) -> op(vpermf128(),vpermf128()). +static SDValue canonicalizeLaneShuffleWithRepeatedOps(SDValue V, + SelectionDAG , + const SDLoc ) { + assert(V.getOpcode() == X86ISD::VPERM2X128 && "Unknown lane shuffle"); + + MVT VT = V.getSimpleValueType(); + SDValue Src0 = peekThroughBitcasts(V.getOperand(0)); + SDValue Src1 = peekThroughBitcasts(V.getOperand(1)); + unsigned SrcOpc0 = Src0.getOpcode(); + unsigned SrcOpc1 = Src1.getOpcode(); + EVT SrcVT0 = Src0.getValueType(); + EVT SrcVT1 = Src1.getValueType(); + + // TODO: Under what circumstances should we push perm2f128 up when we have one + // active src? 
+ if (SrcOpc0 != SrcOpc1 || SrcVT0 != SrcVT1) +return SDValue(); + + switch (SrcOpc0) { + case X86ISD::VSHLI: + case X86ISD::VSRLI: + case X86ISD::VSRAI: +if (Src0.getOperand(1) == Src1.getOperand(1)) { + SDValue Res = DAG.getNode( + X86ISD::VPERM2X128, DL, VT, DAG.getBitcast(VT, Src0.getOperand(0)), + DAG.getBitcast(VT, Src1.getOperand(0)), V.getOperand(2)); + Res = DAG.getNode(SrcOpc0, DL, SrcVT0, DAG.getBitcast(SrcVT0, Res), +Src0.getOperand(1)); + return DAG.getBitcast(VT, Res); +} +break; + } + + return SDValue(); +} + /// Try to combine x86 target specific shuffles. static SDValue combineTargetShuffle(SDValue N, SelectionDAG , TargetLowering::DAGCombinerInfo , @@ -37045,6 +37082,9 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG , return SDValue(); } case X86ISD::VPERM2X128: { +if (SDValue Res = canonicalizeLaneShuffleWithRepeatedOps(N, DAG, DL)) +return Res; + // If both 128-bit values were inserted into high halves of 256-bit values, // the shuffle can be reduced to a concatenation of subvectors: // vperm2x128 (ins ?, X, C1), (ins ?, Y, C2), 0x31 --> concat X, Y @@ -37053,6 +37093,7 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG , SDValue Ins0 = peekThroughBitcasts(N.getOperand(0)); SDValue Ins1 = peekThroughBitcasts(N.getOperand(1)); unsigned Imm = N.getConstantOperandVal(2); + if (!(Imm == 0x31 && Ins0.getOpcode() == ISD::INSERT_SUBVECTOR && Ins1.getOpcode() == ISD::INSERT_SUBVECTOR && diff --git a/llvm/test/CodeGen/X86/vector-trunc.ll b/llvm/test/CodeGen/X86/vector-trunc.ll index bd8b7dd355cc..f35e315bbb0b 100644 --- a/llvm/test/CodeGen/X86/vector-trunc.ll +++ b/llvm/test/CodeGen/X86/vector-trunc.ll @@ -107,11 +107,9 @@ define <8 x i32> @trunc8i64_8i32_lshr(<8 x i64> %a) { ; ; AVX2-SLOW-LABEL: trunc8i64_8i32_lshr: ; AVX2-SLOW: # %bb.0: # %entry -; AVX2-SLOW-NEXT:vpsrlq $32, %ymm1, %ymm1 -; AVX2-SLOW-NEXT:vpsrlq $32, %ymm0, %ymm0 -; AVX2-SLOW-NEXT:vperm2i128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] -; AVX2-SLOW-NEXT:vinserti128 
$1, %xmm1, %ymm0, %ymm0 -; AVX2-SLOW-NEXT:vshufps {{.*#+}} ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6] +; AVX2-SLOW-NEXT:vperm2f128 {{.*#+}} ymm2 = ymm0[2,3],ymm1[2,3] +; AVX2-SLOW-NEXT:vinsertf128 $1, %xmm1, %ymm0, %ymm0 +; AVX2-SLOW-NEXT:vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm2[1,3],ymm0[5,7],ymm2[5,7] ; AVX2-SLOW-NEXT:retq ; ; AVX2-FAST-LABEL: trunc8i64_8i32_lshr:
[llvm-branch-commits] [llvm] a0f8274 - [X86] Extend lzcnt-cmp tests to test on non-lzcnt targets
Author: Simon Pilgrim Date: 2021-01-11T15:27:08Z New Revision: a0f82749f4f3373ba85de40c69b866081f77abce URL: https://github.com/llvm/llvm-project/commit/a0f82749f4f3373ba85de40c69b866081f77abce DIFF: https://github.com/llvm/llvm-project/commit/a0f82749f4f3373ba85de40c69b866081f77abce.diff LOG: [X86] Extend lzcnt-cmp tests to test on non-lzcnt targets Added: Modified: llvm/test/CodeGen/X86/lzcnt-cmp.ll Removed: diff --git a/llvm/test/CodeGen/X86/lzcnt-cmp.ll b/llvm/test/CodeGen/X86/lzcnt-cmp.ll index 5bf0dbec7510..c094920d59eb 100644 --- a/llvm/test/CodeGen/X86/lzcnt-cmp.ll +++ b/llvm/test/CodeGen/X86/lzcnt-cmp.ll @@ -1,6 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=i686-- -mattr=+lzcnt | FileCheck %s --check-prefixes=X86 -; RUN: llc < %s -mtriple=x86_64-- -mattr=+lzcnt | FileCheck %s --check-prefix=X64 +; RUN: llc < %s -mtriple=i686-- | FileCheck %s --check-prefixes=X86,X86-BSR +; RUN: llc < %s -mtriple=i686-- -mattr=+lzcnt | FileCheck %s --check-prefixes=X86,X86-LZCNT +; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s --check-prefixes=X64,X64-BSR +; RUN: llc < %s -mtriple=x86_64-- -mattr=+lzcnt | FileCheck %s --check-prefixes=X64,X64-LZCNT define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) nounwind { ; X86-LABEL: lshr_ctlz_cmpeq_one_i64: @@ -10,11 +12,27 @@ define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) nounwind { ; X86-NEXT:sete %al ; X86-NEXT:retl ; -; X64-LABEL: lshr_ctlz_cmpeq_one_i64: -; X64: # %bb.0: -; X64-NEXT:testq %rdi, %rdi -; X64-NEXT:sete %al -; X64-NEXT:retq +; X64-BSR-LABEL: lshr_ctlz_cmpeq_one_i64: +; X64-BSR: # %bb.0: +; X64-BSR-NEXT:testq %rdi, %rdi +; X64-BSR-NEXT:je .LBB0_1 +; X64-BSR-NEXT: # %bb.2: # %cond.false +; X64-BSR-NEXT:bsrq %rdi, %rax +; X64-BSR-NEXT:xorq $63, %rax +; X64-BSR-NEXT:jmp .LBB0_3 +; X64-BSR-NEXT: .LBB0_1: +; X64-BSR-NEXT:movl $64, %eax +; X64-BSR-NEXT: .LBB0_3: # %cond.end +; X64-BSR-NEXT:shrq $6, %rax +; X64-BSR-NEXT:cmpq $1, %rax +; X64-BSR-NEXT:sete %al +; 
X64-BSR-NEXT:retq +; +; X64-LZCNT-LABEL: lshr_ctlz_cmpeq_one_i64: +; X64-LZCNT: # %bb.0: +; X64-LZCNT-NEXT:testq %rdi, %rdi +; X64-LZCNT-NEXT:sete %al +; X64-LZCNT-NEXT:retq %ctlz = call i64 @llvm.ctlz.i64(i64 %in, i1 0) %lshr = lshr i64 %ctlz, 6 %icmp = icmp eq i64 %lshr, 1 @@ -22,26 +40,48 @@ define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) nounwind { } define i1 @lshr_ctlz_undef_cmpeq_one_i64(i64 %in) nounwind { -; X86-LABEL: lshr_ctlz_undef_cmpeq_one_i64: -; X86: # %bb.0: -; X86-NEXT:xorl %eax, %eax -; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) -; X86-NEXT:jne .LBB1_2 -; X86-NEXT: # %bb.1: -; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %eax -; X86-NEXT:addl $32, %eax -; X86-NEXT: .LBB1_2: -; X86-NEXT:testb $64, %al -; X86-NEXT:setne %al -; X86-NEXT:retl +; X86-BSR-LABEL: lshr_ctlz_undef_cmpeq_one_i64: +; X86-BSR: # %bb.0: +; X86-BSR-NEXT:xorl %eax, %eax +; X86-BSR-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-BSR-NEXT:jne .LBB1_2 +; X86-BSR-NEXT: # %bb.1: +; X86-BSR-NEXT:bsrl {{[0-9]+}}(%esp), %eax +; X86-BSR-NEXT:xorl $31, %eax +; X86-BSR-NEXT:addl $32, %eax +; X86-BSR-NEXT: .LBB1_2: +; X86-BSR-NEXT:testl $-64, %eax +; X86-BSR-NEXT:setne %al +; X86-BSR-NEXT:retl ; -; X64-LABEL: lshr_ctlz_undef_cmpeq_one_i64: -; X64: # %bb.0: -; X64-NEXT:lzcntq %rdi, %rax -; X64-NEXT:shrq $6, %rax -; X64-NEXT:cmpl $1, %eax -; X64-NEXT:sete %al -; X64-NEXT:retq +; X86-LZCNT-LABEL: lshr_ctlz_undef_cmpeq_one_i64: +; X86-LZCNT: # %bb.0: +; X86-LZCNT-NEXT:xorl %eax, %eax +; X86-LZCNT-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-LZCNT-NEXT:jne .LBB1_2 +; X86-LZCNT-NEXT: # %bb.1: +; X86-LZCNT-NEXT:lzcntl {{[0-9]+}}(%esp), %eax +; X86-LZCNT-NEXT:addl $32, %eax +; X86-LZCNT-NEXT: .LBB1_2: +; X86-LZCNT-NEXT:testb $64, %al +; X86-LZCNT-NEXT:setne %al +; X86-LZCNT-NEXT:retl +; +; X64-BSR-LABEL: lshr_ctlz_undef_cmpeq_one_i64: +; X64-BSR: # %bb.0: +; X64-BSR-NEXT:bsrq %rdi, %rax +; X64-BSR-NEXT:shrq $6, %rax +; X64-BSR-NEXT:cmpl $1, %eax +; X64-BSR-NEXT:sete %al +; X64-BSR-NEXT:retq +; +; X64-LZCNT-LABEL: 
lshr_ctlz_undef_cmpeq_one_i64: +; X64-LZCNT: # %bb.0: +; X64-LZCNT-NEXT:lzcntq %rdi, %rax +; X64-LZCNT-NEXT:shrq $6, %rax +; X64-LZCNT-NEXT:cmpl $1, %eax +; X64-LZCNT-NEXT:sete %al +; X64-LZCNT-NEXT:retq %ctlz = call i64 @llvm.ctlz.i64(i64 %in, i1 -1) %lshr = lshr i64 %ctlz, 6 %icmp = icmp eq i64 %lshr, 1 @@ -56,11 +96,26 @@ define i1 @lshr_ctlz_cmpne_zero_i64(i64 %in) nounwind { ; X86-NEXT:sete %al ; X86-NEXT:retl ; -; X64-LABEL: lshr_ctlz_cmpne_zero_i64: -; X64: # %bb.0: -; X64-NEXT:testq %rdi, %rdi -; X64-NEXT:sete %al -; X64-NEXT:retq +;
[llvm-branch-commits] [llvm] a46982a - [X86] Add nounwind to lzcnt-cmp tests
Author: Simon Pilgrim Date: 2021-01-11T15:06:38Z New Revision: a46982a25511bd0da82f3f2637912dfd86042929 URL: https://github.com/llvm/llvm-project/commit/a46982a25511bd0da82f3f2637912dfd86042929 DIFF: https://github.com/llvm/llvm-project/commit/a46982a25511bd0da82f3f2637912dfd86042929.diff LOG: [X86] Add nounwind to lzcnt-cmp tests Remove unnecessary cfi markup Added: Modified: llvm/test/CodeGen/X86/lzcnt-cmp.ll Removed: diff --git a/llvm/test/CodeGen/X86/lzcnt-cmp.ll b/llvm/test/CodeGen/X86/lzcnt-cmp.ll index 3823524f552a..5bf0dbec7510 100644 --- a/llvm/test/CodeGen/X86/lzcnt-cmp.ll +++ b/llvm/test/CodeGen/X86/lzcnt-cmp.ll @@ -2,7 +2,7 @@ ; RUN: llc < %s -mtriple=i686-- -mattr=+lzcnt | FileCheck %s --check-prefixes=X86 ; RUN: llc < %s -mtriple=x86_64-- -mattr=+lzcnt | FileCheck %s --check-prefix=X64 -define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) { +define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) nounwind { ; X86-LABEL: lshr_ctlz_cmpeq_one_i64: ; X86: # %bb.0: ; X86-NEXT:movl {{[0-9]+}}(%esp), %eax @@ -21,7 +21,7 @@ define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) { ret i1 %icmp } -define i1 @lshr_ctlz_undef_cmpeq_one_i64(i64 %in) { +define i1 @lshr_ctlz_undef_cmpeq_one_i64(i64 %in) nounwind { ; X86-LABEL: lshr_ctlz_undef_cmpeq_one_i64: ; X86: # %bb.0: ; X86-NEXT:xorl %eax, %eax @@ -48,7 +48,7 @@ define i1 @lshr_ctlz_undef_cmpeq_one_i64(i64 %in) { ret i1 %icmp } -define i1 @lshr_ctlz_cmpne_zero_i64(i64 %in) { +define i1 @lshr_ctlz_cmpne_zero_i64(i64 %in) nounwind { ; X86-LABEL: lshr_ctlz_cmpne_zero_i64: ; X86: # %bb.0: ; X86-NEXT:movl {{[0-9]+}}(%esp), %eax @@ -67,7 +67,7 @@ define i1 @lshr_ctlz_cmpne_zero_i64(i64 %in) { ret i1 %icmp } -define i1 @lshr_ctlz_undef_cmpne_zero_i64(i64 %in) { +define i1 @lshr_ctlz_undef_cmpne_zero_i64(i64 %in) nounwind { ; X86-LABEL: lshr_ctlz_undef_cmpne_zero_i64: ; X86: # %bb.0: ; X86-NEXT:xorl %eax, %eax @@ -93,12 +93,10 @@ define i1 @lshr_ctlz_undef_cmpne_zero_i64(i64 %in) { ret i1 %icmp } -define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x 
i64> %in) { +define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x i64> %in) nounwind { ; X86-LABEL: lshr_ctlz_cmpeq_zero_v2i64: ; X86: # %bb.0: ; X86-NEXT:pushl %esi -; X86-NEXT:.cfi_def_cfa_offset 8 -; X86-NEXT:.cfi_offset %esi, -8 ; X86-NEXT:movl {{[0-9]+}}(%esp), %eax ; X86-NEXT:movl {{[0-9]+}}(%esp), %esi ; X86-NEXT:movl {{[0-9]+}}(%esp), %edx @@ -115,7 +113,6 @@ define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x i64> %in) { ; X86-NEXT:movl %ecx, 4(%eax) ; X86-NEXT:movl %ecx, (%eax) ; X86-NEXT:popl %esi -; X86-NEXT:.cfi_def_cfa_offset 4 ; X86-NEXT:retl $4 ; ; X64-LABEL: lshr_ctlz_cmpeq_zero_v2i64: @@ -134,12 +131,10 @@ define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x i64> %in) { ret <2 x i64> %sext } -define <2 x i64> @lshr_ctlz_cmpne_zero_v2i64(<2 x i64> %in) { +define <2 x i64> @lshr_ctlz_cmpne_zero_v2i64(<2 x i64> %in) nounwind { ; X86-LABEL: lshr_ctlz_cmpne_zero_v2i64: ; X86: # %bb.0: ; X86-NEXT:pushl %esi -; X86-NEXT:.cfi_def_cfa_offset 8 -; X86-NEXT:.cfi_offset %esi, -8 ; X86-NEXT:movl {{[0-9]+}}(%esp), %eax ; X86-NEXT:movl {{[0-9]+}}(%esp), %esi ; X86-NEXT:movl {{[0-9]+}}(%esp), %edx @@ -156,7 +151,6 @@ define <2 x i64> @lshr_ctlz_cmpne_zero_v2i64(<2 x i64> %in) { ; X86-NEXT:movl %ecx, 4(%eax) ; X86-NEXT:movl %ecx, (%eax) ; X86-NEXT:popl %esi -; X86-NEXT:.cfi_def_cfa_offset 4 ; X86-NEXT:retl $4 ; ; X64-LABEL: lshr_ctlz_cmpne_zero_v2i64: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 8112a25 - [X86][SSE] Add 'vectorized sum' test patterns
Author: Simon Pilgrim Date: 2021-01-11T12:51:18Z New Revision: 8112a2598ce180ab4cd106f154a71e813fc28d91 URL: https://github.com/llvm/llvm-project/commit/8112a2598ce180ab4cd106f154a71e813fc28d91 DIFF: https://github.com/llvm/llvm-project/commit/8112a2598ce180ab4cd106f154a71e813fc28d91.diff LOG: [X86][SSE] Add 'vectorized sum' test patterns These are often generated when building a vector from the reduction sums of independent vectors. I've implemented some typical patterns from various v4f32/v4i32 based off current codegen emitted from the vectorizers, although these tests are more about tweaking some hadd style backend folds to handle whatever the vectorizers/vectorcombine throws at us... Added: llvm/test/CodeGen/X86/horizontal-sum.ll Modified: Removed: diff --git a/llvm/test/CodeGen/X86/horizontal-sum.ll b/llvm/test/CodeGen/X86/horizontal-sum.ll new file mode 100644 index ..47d44171d99a --- /dev/null +++ b/llvm/test/CodeGen/X86/horizontal-sum.ll @@ -0,0 +1,1189 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSSE3,SSSE3-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3,fast-hops | FileCheck %s --check-prefixes=SSSE3,SSSE3-FAST +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX-SLOW,AVX1-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx,fast-hops | FileCheck %s --check-prefixes=AVX,AVX-FAST,AVX1-FAST +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2| FileCheck %s --check-prefixes=AVX,AVX-SLOW,AVX2-SLOW +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2,fast-hops | FileCheck %s --check-prefixes=AVX,AVX-FAST,AVX2-FAST + +; Vectorized Pairwise Sum Reductions +; e.g. 
+; inline STYPE sum(VTYPE x) { +; return (x[0] + x[1]) + (x[2] + x[3]); +; } +; +; VTYPE sum4(VTYPE A0, VTYPE A1, VTYPE A2, VTYPE A3) { +; return (VTYPE) { sum( A0 ), sum( A1 ), sum( A2 ), sum( A3 ) }; +; } + +define <4 x float> @pair_sum_v4f32_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { +; SSSE3-SLOW-LABEL: pair_sum_v4f32_v4f32: +; SSSE3-SLOW: # %bb.0: +; SSSE3-SLOW-NEXT:haddps %xmm0, %xmm0 +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm4 = xmm0[1,1,3,3] +; SSSE3-SLOW-NEXT:addps %xmm4, %xmm0 +; SSSE3-SLOW-NEXT:haddps %xmm1, %xmm1 +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm4 = xmm1[1,1,3,3] +; SSSE3-SLOW-NEXT:addps %xmm1, %xmm4 +; SSSE3-SLOW-NEXT:unpcklps {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1] +; SSSE3-SLOW-NEXT:haddps %xmm2, %xmm2 +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm1 = xmm2[1,1,3,3] +; SSSE3-SLOW-NEXT:addps %xmm2, %xmm1 +; SSSE3-SLOW-NEXT:haddps %xmm3, %xmm3 +; SSSE3-SLOW-NEXT:movshdup {{.*#+}} xmm2 = xmm3[1,1,3,3] +; SSSE3-SLOW-NEXT:addps %xmm3, %xmm2 +; SSSE3-SLOW-NEXT:movlhps {{.*#+}} xmm2 = xmm2[0],xmm1[0] +; SSSE3-SLOW-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,0] +; SSSE3-SLOW-NEXT:retq +; +; SSSE3-FAST-LABEL: pair_sum_v4f32_v4f32: +; SSSE3-FAST: # %bb.0: +; SSSE3-FAST-NEXT:haddps %xmm0, %xmm0 +; SSSE3-FAST-NEXT:haddps %xmm1, %xmm1 +; SSSE3-FAST-NEXT:haddps %xmm1, %xmm0 +; SSSE3-FAST-NEXT:haddps %xmm2, %xmm2 +; SSSE3-FAST-NEXT:haddps %xmm3, %xmm3 +; SSSE3-FAST-NEXT:haddps %xmm2, %xmm3 +; SSSE3-FAST-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2],xmm3[2,0] +; SSSE3-FAST-NEXT:retq +; +; AVX1-SLOW-LABEL: pair_sum_v4f32_v4f32: +; AVX1-SLOW: # %bb.0: +; AVX1-SLOW-NEXT:vhaddps %xmm0, %xmm0, %xmm0 +; AVX1-SLOW-NEXT:vmovshdup {{.*#+}} xmm4 = xmm0[1,1,3,3] +; AVX1-SLOW-NEXT:vaddps %xmm4, %xmm0, %xmm0 +; AVX1-SLOW-NEXT:vhaddps %xmm1, %xmm1, %xmm1 +; AVX1-SLOW-NEXT:vmovshdup {{.*#+}} xmm4 = xmm1[1,1,3,3] +; AVX1-SLOW-NEXT:vaddps %xmm4, %xmm1, %xmm1 +; AVX1-SLOW-NEXT:vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; 
AVX1-SLOW-NEXT:vhaddps %xmm2, %xmm2, %xmm1 +; AVX1-SLOW-NEXT:vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3] +; AVX1-SLOW-NEXT:vaddps %xmm2, %xmm1, %xmm1 +; AVX1-SLOW-NEXT:vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; AVX1-SLOW-NEXT:vhaddps %xmm3, %xmm3, %xmm1 +; AVX1-SLOW-NEXT:vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3] +; AVX1-SLOW-NEXT:vaddps %xmm2, %xmm1, %xmm1 +; AVX1-SLOW-NEXT:vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0] +; AVX1-SLOW-NEXT:retq +; +; AVX1-FAST-LABEL: pair_sum_v4f32_v4f32: +; AVX1-FAST: # %bb.0: +; AVX1-FAST-NEXT:vhaddps %xmm0, %xmm0, %xmm0 +; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm1, %xmm1 +; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm0, %xmm0 +; AVX1-FAST-NEXT:vhaddps %xmm2, %xmm2, %xmm1 +; AVX1-FAST-NEXT:vhaddps %xmm1, %xmm1, %xmm1 +; AVX1-FAST-NEXT:vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,1] +; AVX1-FAST-NEXT:vhaddps %xmm3, %xmm3, %xmm1 +;
[llvm-branch-commits] [llvm] 5963229 - [X86][SSE] Add missing SSE test coverage for permute(hop, hop) folds
Author: Simon Pilgrim Date: 2021-01-11T11:29:04Z New Revision: 5963229266303d83b2e9de09bce7e063276e41d0 URL: https://github.com/llvm/llvm-project/commit/5963229266303d83b2e9de09bce7e063276e41d0 DIFF: https://github.com/llvm/llvm-project/commit/5963229266303d83b2e9de09bce7e063276e41d0.diff LOG: [X86][SSE] Add missing SSE test coverage for permute(hop,hop) folds Should help avoid bugs like reported in rG80dee7965dff Added: llvm/test/CodeGen/X86/horizontal-shuffle-3.ll Modified: llvm/test/CodeGen/X86/horizontal-shuffle-2.ll Removed: diff --git a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll index 4f747db94341..78c30e431574 100644 --- a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll +++ b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll @@ -1,17 +1,21 @@ ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py -; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s -; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s - -; -; 128-bit Vectors -; +; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE +; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefix=AVX +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s --check-prefix=AVX define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { -; CHECK-LABEL: test_unpacklo_hadd_v4f32: -; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm2, %xmm0, %xmm0 -; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] -; CHECK-NEXT:ret{{[l|q]}} +; SSE-LABEL: test_unpacklo_hadd_v4f32: +; SSE: ## %bb.0: +; SSE-NEXT:haddps %xmm2, %xmm0 +; SSE-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2,1,3] +; SSE-NEXT:ret{{[l|q]}} +; +; AVX-LABEL: test_unpacklo_hadd_v4f32: +; AVX: ## %bb.0: +; AVX-NEXT:vhaddps %xmm2, %xmm0, %xmm0 +; 
AVX-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] +; AVX-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> @@ -19,11 +23,18 @@ define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 } define <4 x float> @test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { -; CHECK-LABEL: test_unpackhi_hadd_v4f32: -; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm3, %xmm1, %xmm0 -; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] -; CHECK-NEXT:ret{{[l|q]}} +; SSE-LABEL: test_unpackhi_hadd_v4f32: +; SSE: ## %bb.0: +; SSE-NEXT:movaps %xmm1, %xmm0 +; SSE-NEXT:haddps %xmm3, %xmm0 +; SSE-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2,1,3] +; SSE-NEXT:ret{{[l|q]}} +; +; AVX-LABEL: test_unpackhi_hadd_v4f32: +; AVX: ## %bb.0: +; AVX-NEXT:vhaddps %xmm3, %xmm1, %xmm0 +; AVX-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] +; AVX-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> @@ -31,11 +42,17 @@ define <4 x float> @test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 } define <4 x float> @test_unpacklo_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { -; CHECK-LABEL: test_unpacklo_hsub_v4f32: -; CHECK: ## %bb.0: -; CHECK-NEXT:vhsubps %xmm2, %xmm0, %xmm0 -; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] -; CHECK-NEXT:ret{{[l|q]}} +; SSE-LABEL: test_unpacklo_hsub_v4f32: +; SSE: ## %bb.0: +; SSE-NEXT:hsubps %xmm2, %xmm0 +; SSE-NEXT:shufps {{.*#+}} xmm0 = xmm0[0,2,1,3] +; SSE-NEXT:ret{{[l|q]}} +; +; AVX-LABEL: test_unpacklo_hsub_v4f32: +; AVX: ## %bb.0: +; AVX-NEXT:vhsubps %xmm2, %xmm0, %xmm0 +; AVX-NEXT:vpermilps 
{{.*#+}} xmm0 = xmm0[0,2,1,3] +; AVX-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %2, <4 x float> %3) #4 %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> @@ -43,11 +60,18 @@ define <4 x float> @test_unpacklo_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 } define <4 x float> @test_unpackhi_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { -; CHECK-LABEL: test_unpackhi_hsub_v4f32: -; CHECK: ## %bb.0: -; CHECK-NEXT:vhsubps %xmm3, %xmm1, %xmm0 -; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] -;
[llvm-branch-commits] [llvm] 41bf338 - Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(), hop()) -> permute(hop())""
Author: Simon Pilgrim Date: 2021-01-11T11:29:04Z New Revision: 41bf338dd1e7f07c1e89f171ff6d53578f5125be URL: https://github.com/llvm/llvm-project/commit/41bf338dd1e7f07c1e89f171ff6d53578f5125be DIFF: https://github.com/llvm/llvm-project/commit/41bf338dd1e7f07c1e89f171ff6d53578f5125be.diff LOG: Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())"" This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2. [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result. REAPPLIED with a fix for unary unpacks of HOP. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/horizontal-shuffle-2.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 16f1023ed5f8..7895f883863f 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37513,10 +37513,12 @@ static SDValue combineShuffleOfConcatUndef(SDNode *N, SelectionDAG , /// Eliminate a redundant shuffle of a horizontal math op. static SDValue foldShuffleOfHorizOp(SDNode *N, SelectionDAG ) { + // TODO: Can we use getTargetShuffleInputs instead? unsigned Opcode = N->getOpcode(); if (Opcode != X86ISD::MOVDDUP && Opcode != X86ISD::VBROADCAST) -if (Opcode != ISD::VECTOR_SHUFFLE || !N->getOperand(1).isUndef()) - return SDValue(); +if (Opcode != X86ISD::UNPCKL && Opcode != X86ISD::UNPCKH) + if (Opcode != ISD::VECTOR_SHUFFLE || !N->getOperand(1).isUndef()) +return SDValue(); // For a broadcast, peek through an extract element of index 0 to find the // horizontal op: broadcast (ext_vec_elt HOp, 0) @@ -37535,6 +37537,28 @@ static SDValue foldShuffleOfHorizOp(SDNode *N, SelectionDAG ) { HOp.getOpcode() != X86ISD::HSUB && HOp.getOpcode() != X86ISD::FHSUB) return SDValue(); + // unpcklo(hop(x,y),hop(z,w)) -> permute(hop(x,z)). + // unpckhi(hop(x,y),hop(z,w)) -> permute(hop(y,w)). 
+ // Don't fold if hop(x,y) == hop(z,w). + if (Opcode == X86ISD::UNPCKL || Opcode == X86ISD::UNPCKH) { +SDValue HOp2 = N->getOperand(1); +if (HOp.getOpcode() != HOp2.getOpcode() || VT.getScalarSizeInBits() != 32) + return SDValue(); +if (HOp == HOp2) + return SDValue(); +SDLoc DL(HOp); +unsigned LoHi = Opcode == X86ISD::UNPCKL ? 0 : 1; +SDValue Res = DAG.getNode(HOp.getOpcode(), DL, VT, HOp.getOperand(LoHi), + HOp2.getOperand(LoHi)); +// Use SHUFPS for the permute so this will work on SSE3 targets, shuffle +// combining and domain handling will simplify this later on. +EVT ShuffleVT = VT.changeVectorElementType(MVT::f32); +Res = DAG.getBitcast(ShuffleVT, Res); +Res = DAG.getNode(X86ISD::SHUFP, DL, ShuffleVT, Res, Res, + getV4X86ShuffleImm8ForMask({0, 2, 1, 3}, DL, DAG)); +return DAG.getBitcast(VT, Res); + } + // 128-bit horizontal math instructions are defined to operate on adjacent // lanes of each operand as: // v4X32: A[0] + A[1] , A[2] + A[3] , B[0] + B[1] , B[2] + B[3] diff --git a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll index c012c88c6ed2..4f747db94341 100644 --- a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll +++ b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll @@ -9,9 +9,8 @@ define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { ; CHECK-LABEL: test_unpacklo_hadd_v4f32: ; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm0, %xmm0, %xmm0 -; CHECK-NEXT:vhaddps %xmm0, %xmm2, %xmm1 -; CHECK-NEXT:vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:vhaddps %xmm2, %xmm0, %xmm0 +; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] ; CHECK-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 @@ -22,9 +21,8 @@ define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 define <4 x float> 
@test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { ; CHECK-LABEL: test_unpackhi_hadd_v4f32: ; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; CHECK-NEXT:vhaddps %xmm3, %xmm0, %xmm1 -; CHECK-NEXT:vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:vhaddps %xmm3, %xmm1, %xmm0 +; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] ; CHECK-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 @@ -35,9 +33,8
[llvm-branch-commits] [llvm] 80dee79 - [X86][SSE] Fold unpack(hop(), hop()) -> permute(hop())
Author: Simon Pilgrim Date: 2021-01-08T15:22:17Z New Revision: 80dee7965dffdfb866afa9d74f3a4a97453708b2 URL: https://github.com/llvm/llvm-project/commit/80dee7965dffdfb866afa9d74f3a4a97453708b2 DIFF: https://github.com/llvm/llvm-project/commit/80dee7965dffdfb866afa9d74f3a4a97453708b2.diff LOG: [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/horizontal-shuffle-2.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 16f1023ed5f8..7b0e927a33d2 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -37513,10 +37513,12 @@ static SDValue combineShuffleOfConcatUndef(SDNode *N, SelectionDAG , /// Eliminate a redundant shuffle of a horizontal math op. static SDValue foldShuffleOfHorizOp(SDNode *N, SelectionDAG ) { + // TODO: Can we use getTargetShuffleInputs instead? unsigned Opcode = N->getOpcode(); if (Opcode != X86ISD::MOVDDUP && Opcode != X86ISD::VBROADCAST) -if (Opcode != ISD::VECTOR_SHUFFLE || !N->getOperand(1).isUndef()) - return SDValue(); +if (Opcode != X86ISD::UNPCKL && Opcode != X86ISD::UNPCKH) + if (Opcode != ISD::VECTOR_SHUFFLE || !N->getOperand(1).isUndef()) +return SDValue(); // For a broadcast, peek through an extract element of index 0 to find the // horizontal op: broadcast (ext_vec_elt HOp, 0) @@ -37535,6 +37537,24 @@ static SDValue foldShuffleOfHorizOp(SDNode *N, SelectionDAG ) { HOp.getOpcode() != X86ISD::HSUB && HOp.getOpcode() != X86ISD::FHSUB) return SDValue(); + // unpck(hop,hop) -> permute(hop,hop). + if (Opcode == X86ISD::UNPCKL || Opcode == X86ISD::UNPCKH) { +SDValue HOp2 = N->getOperand(1); +if (HOp.getOpcode() != HOp2.getOpcode() || VT.getScalarSizeInBits() != 32) + return SDValue(); +SDLoc DL(HOp); +unsigned LoHi = Opcode == X86ISD::UNPCKL ? 
0 : 1; +SDValue Res = DAG.getNode(HOp.getOpcode(), DL, VT, HOp.getOperand(LoHi), + HOp2.getOperand(LoHi)); +// Use SHUFPS for the permute so this will work on SSE3 targets, shuffle +// combining and domain handling will simplify this later on. +EVT ShuffleVT = VT.changeVectorElementType(MVT::f32); +Res = DAG.getBitcast(ShuffleVT, Res); +Res = DAG.getNode(X86ISD::SHUFP, DL, ShuffleVT, Res, Res, + getV4X86ShuffleImm8ForMask({0, 2, 1, 3}, DL, DAG)); +return DAG.getBitcast(VT, Res); + } + // 128-bit horizontal math instructions are defined to operate on adjacent // lanes of each operand as: // v4X32: A[0] + A[1] , A[2] + A[3] , B[0] + B[1] , B[2] + B[3] diff --git a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll index c012c88c6ed2..6b4b8047d0f0 100644 --- a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll +++ b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll @@ -9,9 +9,8 @@ define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { ; CHECK-LABEL: test_unpacklo_hadd_v4f32: ; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm0, %xmm0, %xmm0 -; CHECK-NEXT:vhaddps %xmm0, %xmm2, %xmm1 -; CHECK-NEXT:vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:vhaddps %xmm2, %xmm0, %xmm0 +; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] ; CHECK-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 @@ -22,9 +21,8 @@ define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 define <4 x float> @test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { ; CHECK-LABEL: test_unpackhi_hadd_v4f32: ; CHECK: ## %bb.0: -; CHECK-NEXT:vhaddps %xmm1, %xmm0, %xmm0 -; CHECK-NEXT:vhaddps %xmm3, %xmm0, %xmm1 -; CHECK-NEXT:vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:vhaddps %xmm3, 
%xmm1, %xmm0 +; CHECK-NEXT:vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3] ; CHECK-NEXT:ret{{[l|q]}} %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 @@ -35,9 +33,8 @@ define <4 x float> @test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 define <4 x float> @test_unpacklo_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { ; CHECK-LABEL: test_unpacklo_hsub_v4f32: ; CHECK: ## %bb.0: -; CHECK-NEXT:vhsubps %xmm0, %xmm0, %xmm0 -; CHECK-NEXT:vhsubps %xmm0, %xmm2, %xmm1 -;
[llvm-branch-commits] [llvm] 4a582d7 - [X86][SSE] Add vphaddd/vphsubd unpack(hop(), hop()) tests
Author: Simon Pilgrim Date: 2021-01-08T14:39:37Z New Revision: 4a582d766ae40c8f624140c70b7122091d3a9b35 URL: https://github.com/llvm/llvm-project/commit/4a582d766ae40c8f624140c70b7122091d3a9b35 DIFF: https://github.com/llvm/llvm-project/commit/4a582d766ae40c8f624140c70b7122091d3a9b35.diff LOG: [X86][SSE] Add vphaddd/vphsubd unpack(hop(),hop()) tests Added: Modified: llvm/test/CodeGen/X86/horizontal-shuffle-2.ll Removed: diff --git a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll index 7acd85604800..c012c88c6ed2 100644 --- a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll +++ b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll @@ -58,6 +58,58 @@ define <4 x float> @test_unpackhi_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 ret <4 x float> %7 } +define <4 x i32> @test_unpacklo_hadd_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) { +; CHECK-LABEL: test_unpacklo_hadd_v4i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphaddd %xmm0, %xmm0, %xmm0 +; CHECK-NEXT:vphaddd %xmm0, %xmm2, %xmm1 +; CHECK-NEXT:vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %0, <4 x i32> %1) #5 + %6 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %2, <4 x i32> %3) #5 + %7 = shufflevector <4 x i32> %5, <4 x i32> %6, <4 x i32> + ret <4 x i32> %7 +} + +define <4 x i32> @test_unpackhi_hadd_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) { +; CHECK-LABEL: test_unpackhi_hadd_v4i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphaddd %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vphaddd %xmm3, %xmm0, %xmm1 +; CHECK-NEXT:vpunpckhdq {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %0, <4 x i32> %1) #5 + %6 = tail call <4 x i32> @llvm.x86.ssse3.phadd.d.128(<4 x i32> %2, <4 x i32> %3) #5 + %7 = shufflevector <4 x i32> %5, <4 x i32> %6, <4 x i32> + ret <4 x i32> %7 +} 
+ +define <4 x i32> @test_unpacklo_hsub_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) { +; CHECK-LABEL: test_unpacklo_hsub_v4i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphsubd %xmm0, %xmm0, %xmm0 +; CHECK-NEXT:vphsubd %xmm0, %xmm2, %xmm1 +; CHECK-NEXT:vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x i32> @llvm.x86.ssse3.phsub.d.128(<4 x i32> %0, <4 x i32> %1) #5 + %6 = tail call <4 x i32> @llvm.x86.ssse3.phsub.d.128(<4 x i32> %2, <4 x i32> %3) #5 + %7 = shufflevector <4 x i32> %5, <4 x i32> %6, <4 x i32> + ret <4 x i32> %7 +} + +define <4 x i32> @test_unpackhi_hsub_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) { +; CHECK-LABEL: test_unpackhi_hsub_v4i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphsubd %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vphsubd %xmm3, %xmm0, %xmm1 +; CHECK-NEXT:vpunpckhdq {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x i32> @llvm.x86.ssse3.phsub.d.128(<4 x i32> %0, <4 x i32> %1) #5 + %6 = tail call <4 x i32> @llvm.x86.ssse3.phsub.d.128(<4 x i32> %2, <4 x i32> %3) #5 + %7 = shufflevector <4 x i32> %5, <4 x i32> %6, <4 x i32> + ret <4 x i32> %7 +} + ; ; 256-bit Vectors ; @@ -114,6 +166,58 @@ define <8 x float> @test_unpackhi_hsub_v8f32(<8 x float> %0, <8 x float> %1, <8 ret <8 x float> %7 } +define <8 x i32> @test_unpacklo_hadd_v8i32(<8 x i32> %0, <8 x i32> %1, <8 x i32> %2, <8 x i32> %3) { +; CHECK-LABEL: test_unpacklo_hadd_v8i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphaddd %ymm0, %ymm0, %ymm0 +; CHECK-NEXT:vphaddd %ymm0, %ymm2, %ymm1 +; CHECK-NEXT:vpunpckldq {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <8 x i32> @llvm.x86.avx2.phadd.d(<8 x i32> %0, <8 x i32> %1) #5 + %6 = tail call <8 x i32> @llvm.x86.avx2.phadd.d(<8 x i32> %2, <8 x i32> %3) #5 + %7 = shufflevector <8 x i32> %5, <8 x i32> %6, <8 x i32> + ret <8 x i32> %7 +} + +define <8 x i32> 
@test_unpackhi_hadd_v8i32(<8 x i32> %0, <8 x i32> %1, <8 x i32> %2, <8 x i32> %3) { +; CHECK-LABEL: test_unpackhi_hadd_v8i32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vphaddd %ymm1, %ymm0, %ymm0 +; CHECK-NEXT:vphaddd %ymm3, %ymm0, %ymm1 +; CHECK-NEXT:vpunpckhdq {{.*#+}} ymm0 = ymm0[2],ymm1[2],ymm0[3],ymm1[3],ymm0[6],ymm1[6],ymm0[7],ymm1[7] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <8 x i32> @llvm.x86.avx2.phadd.d(<8 x i32> %0, <8 x i32> %1) #5 + %6 = tail call <8 x i32> @llvm.x86.avx2.phadd.d(<8 x i32> %2, <8 x i32> %3) #5 + %7 = shufflevector <8 x i32> %5, <8 x i32> %6, <8 x i32> + ret <8 x i32> %7 +} + +define <8 x i32> @test_unpacklo_hsub_v8i32(<8 x i32> %0, <8 x i32> %1, <8 x i32> %2,
[llvm-branch-commits] [llvm] 7b9f541 - [X86][SSE] Add tests for unpack(hop(), hop())
Author: Simon Pilgrim Date: 2021-01-08T14:11:37Z New Revision: 7b9f541c1edb24a676508906cfbcaaf228cc6a2e URL: https://github.com/llvm/llvm-project/commit/7b9f541c1edb24a676508906cfbcaaf228cc6a2e DIFF: https://github.com/llvm/llvm-project/commit/7b9f541c1edb24a676508906cfbcaaf228cc6a2e.diff LOG: [X86][SSE] Add tests for unpack(hop(),hop()) We should be able to convert these to permute(hop()) as we only ever use one of the ops from each hop. Added: llvm/test/CodeGen/X86/horizontal-shuffle-2.ll Modified: Removed: diff --git a/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll new file mode 100644 index ..7acd85604800 --- /dev/null +++ b/llvm/test/CodeGen/X86/horizontal-shuffle-2.ll @@ -0,0 +1,145 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=i686-apple-darwin -mattr=+avx2 | FileCheck %s +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx2 | FileCheck %s + +; +; 128-bit Vectors +; + +define <4 x float> @test_unpacklo_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { +; CHECK-LABEL: test_unpacklo_hadd_v4f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhaddps %xmm0, %xmm0, %xmm0 +; CHECK-NEXT:vhaddps %xmm0, %xmm2, %xmm1 +; CHECK-NEXT:vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 + %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 + %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> + ret <4 x float> %7 +} + +define <4 x float> @test_unpackhi_hadd_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { +; CHECK-LABEL: test_unpackhi_hadd_v4f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhaddps %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vhaddps %xmm3, %xmm0, %xmm1 +; CHECK-NEXT:vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 
x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %1) #4 + %6 = tail call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %2, <4 x float> %3) #4 + %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> + ret <4 x float> %7 +} + +define <4 x float> @test_unpacklo_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { +; CHECK-LABEL: test_unpacklo_hsub_v4f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhsubps %xmm0, %xmm0, %xmm0 +; CHECK-NEXT:vhsubps %xmm0, %xmm2, %xmm1 +; CHECK-NEXT:vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %0, <4 x float> %1) #4 + %6 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %2, <4 x float> %3) #4 + %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> + ret <4 x float> %7 +} + +define <4 x float> @test_unpackhi_hsub_v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2, <4 x float> %3) { +; CHECK-LABEL: test_unpackhi_hsub_v4f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhsubps %xmm1, %xmm0, %xmm0 +; CHECK-NEXT:vhsubps %xmm3, %xmm0, %xmm1 +; CHECK-NEXT:vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %0, <4 x float> %1) #4 + %6 = tail call <4 x float> @llvm.x86.sse3.hsub.ps(<4 x float> %2, <4 x float> %3) #4 + %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> + ret <4 x float> %7 +} + +; +; 256-bit Vectors +; + +define <8 x float> @test_unpacklo_hadd_v8f32(<8 x float> %0, <8 x float> %1, <8 x float> %2, <8 x float> %3) { +; CHECK-LABEL: test_unpacklo_hadd_v8f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhaddps %ymm0, %ymm0, %ymm0 +; CHECK-NEXT:vhaddps %ymm0, %ymm2, %ymm1 +; CHECK-NEXT:vunpcklps {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %0, <8 x float> %1) #4 + %6 = 
tail call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %2, <8 x float> %3) #4 + %7 = shufflevector <8 x float> %5, <8 x float> %6, <8 x i32> + ret <8 x float> %7 +} + +define <8 x float> @test_unpackhi_hadd_v8f32(<8 x float> %0, <8 x float> %1, <8 x float> %2, <8 x float> %3) { +; CHECK-LABEL: test_unpackhi_hadd_v8f32: +; CHECK: ## %bb.0: +; CHECK-NEXT:vhaddps %ymm1, %ymm0, %ymm0 +; CHECK-NEXT:vhaddps %ymm3, %ymm0, %ymm1 +; CHECK-NEXT:vunpckhps {{.*#+}} ymm0 = ymm0[2],ymm1[2],ymm0[3],ymm1[3],ymm0[6],ymm1[6],ymm0[7],ymm1[7] +; CHECK-NEXT:ret{{[l|q]}} + %5 = tail call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %0, <8 x float> %1) #4 + %6 = tail call <8 x float>
[llvm-branch-commits] [llvm] 037b058 - [AArch64] SVEIntrinsicOpts - use range loop and cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Author: Simon Pilgrim Date: 2021-01-07T14:21:55Z New Revision: 037b058e41979fa5e6ffd209033dfe72abb97b53 URL: https://github.com/llvm/llvm-project/commit/037b058e41979fa5e6ffd209033dfe72abb97b53 DIFF: https://github.com/llvm/llvm-project/commit/037b058e41979fa5e6ffd209033dfe72abb97b53.diff LOG: [AArch64] SVEIntrinsicOpts - use range loop and cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. Don't directly dereference a dyn_cast<> - use cast<> so we assert for the correct type. Also, simplify the for loop to a range loop. Fixes clang static analyzer warning. Added: Modified: llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp Removed: diff --git a/llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp b/llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp index 67fc4ee0a29d..8e8b12c07bbf 100644 --- a/llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp +++ b/llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp @@ -248,10 +248,8 @@ bool SVEIntrinsicOpts::runOnModule(Module &M) { case Intrinsic::aarch64_sve_ptest_any: case Intrinsic::aarch64_sve_ptest_first: case Intrinsic::aarch64_sve_ptest_last: - for (auto I = F.user_begin(), E = F.user_end(); I != E;) { -auto *Inst = dyn_cast<Instruction>(*I++); -Functions.insert(Inst->getFunction()); - } + for (User *U : F.users()) +Functions.insert(cast<Instruction>(U)->getFunction()); break; default: break; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] fa6d897 - [Analysis] MemoryDepChecker::couldPreventStoreLoadForward - remove dead store. NFCI.
Author: Simon Pilgrim Date: 2021-01-07T14:21:54Z New Revision: fa6d8977999096b2a3ae1357aa38ddf73abaf414 URL: https://github.com/llvm/llvm-project/commit/fa6d8977999096b2a3ae1357aa38ddf73abaf414 DIFF: https://github.com/llvm/llvm-project/commit/fa6d8977999096b2a3ae1357aa38ddf73abaf414.diff LOG: [Analysis] MemoryDepChecker::couldPreventStoreLoadForward - remove dead store. NFCI. As we're breaking from the loop when clamping MaxVF, clang static analyzer was warning that the VF iterator was being updated and never used. Added: Modified: llvm/lib/Analysis/LoopAccessAnalysis.cpp Removed: diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp index be340a3b3130..76e172534176 100644 --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp @@ -1338,7 +1338,7 @@ bool MemoryDepChecker::couldPreventStoreLoadForward(uint64_t Distance, // If the number of vector iteration between the store and the load are // small we could incur conflicts. if (Distance % VF && Distance / VF < NumItersForStoreLoadThroughMemory) { - MaxVFWithoutSLForwardIssues = (VF >>= 1); + MaxVFWithoutSLForwardIssues = (VF >> 1); break; } } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] a9a8caf - [llvm-objdump] Pass Twine by const reference instead of by value. NFCI.
Author: Simon Pilgrim Date: 2021-01-07T12:53:29Z New Revision: a9a8caf2ce2ff08a20cc145d23270e6c91709baa URL: https://github.com/llvm/llvm-project/commit/a9a8caf2ce2ff08a20cc145d23270e6c91709baa DIFF: https://github.com/llvm/llvm-project/commit/a9a8caf2ce2ff08a20cc145d23270e6c91709baa.diff LOG: [llvm-objdump] Pass Twine by const reference instead of by value. NFCI. Added: Modified: llvm/tools/llvm-objdump/llvm-objdump.cpp llvm/tools/llvm-objdump/llvm-objdump.h Removed: diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp index 5ac25d7e57be..3134f989603a 100644 --- a/llvm/tools/llvm-objdump/llvm-objdump.cpp +++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp @@ -29,6 +29,7 @@ #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringSet.h" #include "llvm/ADT/Triple.h" +#include "llvm/ADT/Twine.h" #include "llvm/CodeGen/FaultMaps.h" #include "llvm/DebugInfo/DWARF/DWARFContext.h" #include "llvm/DebugInfo/Symbolize/Symbolize.h" @@ -448,7 +449,7 @@ std::string objdump::getFileNameForError(const object::Archive::Child , return ""; } -void objdump::reportWarning(Twine Message, StringRef File) { +void objdump::reportWarning(const Twine &Message, StringRef File) { // Output order between errs() and outs() matters especially for archive // files where the output is per member object. 
outs().flush(); @@ -457,7 +458,7 @@ void objdump::reportWarning(Twine Message, StringRef File) { } LLVM_ATTRIBUTE_NORETURN void objdump::reportError(StringRef File, - Twine Message) { + const Twine &Message) { outs().flush(); WithColor::error(errs(), ToolName) << "'" << File << "': " << Message << "\n"; exit(1); @@ -480,11 +481,11 @@ LLVM_ATTRIBUTE_NORETURN void objdump::reportError(Error E, StringRef FileName, exit(1); } -static void reportCmdLineWarning(Twine Message) { +static void reportCmdLineWarning(const Twine &Message) { WithColor::warning(errs(), ToolName) << Message << "\n"; } -LLVM_ATTRIBUTE_NORETURN static void reportCmdLineError(Twine Message) { +LLVM_ATTRIBUTE_NORETURN static void reportCmdLineError(const Twine &Message) { WithColor::error(errs(), ToolName) << Message << "\n"; exit(1); } diff --git a/llvm/tools/llvm-objdump/llvm-objdump.h b/llvm/tools/llvm-objdump/llvm-objdump.h index 4cee35484105..99bf191a301e 100644 --- a/llvm/tools/llvm-objdump/llvm-objdump.h +++ b/llvm/tools/llvm-objdump/llvm-objdump.h @@ -18,6 +18,7 @@ namespace llvm { class StringRef; +class Twine; namespace object { class ELFObjectFileBase; @@ -127,11 +128,11 @@ void printSymbolTable(const object::ObjectFile *O, StringRef ArchiveName, void printSymbol(const object::ObjectFile *O, const object::SymbolRef &Symbol, StringRef FileName, StringRef ArchiveName, StringRef ArchitectureName, bool DumpDynamic); -LLVM_ATTRIBUTE_NORETURN void reportError(StringRef File, Twine Message); +LLVM_ATTRIBUTE_NORETURN void reportError(StringRef File, const Twine &Message); LLVM_ATTRIBUTE_NORETURN void reportError(Error E, StringRef FileName, StringRef ArchiveName = "", StringRef ArchitectureName = ""); -void reportWarning(Twine Message, StringRef File); +void reportWarning(const Twine &Message, StringRef File); template <typename T, typename... Ts> T unwrapOrError(Expected<T> EO, Ts &&... Args) { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 0280911 - [DWARF] DWARFDebugLoc::dumpRawEntry - remove dead stores. NFCI.
Author: Simon Pilgrim Date: 2021-01-07T12:53:28Z New Revision: 028091195d763190d9b57ae316c8601fe223c02c URL: https://github.com/llvm/llvm-project/commit/028091195d763190d9b57ae316c8601fe223c02c DIFF: https://github.com/llvm/llvm-project/commit/028091195d763190d9b57ae316c8601fe223c02c.diff LOG: [DWARF] DWARFDebugLoc::dumpRawEntry - remove dead stores. NFCI. Don't bother zeroing local (unused) variables just before returning. Fixes clang static analyzer warning. Added: Modified: llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp Removed: diff --git a/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp b/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp index 44b410778146..cdffb36741c8 100644 --- a/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp +++ b/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp @@ -260,7 +260,6 @@ void DWARFDebugLoc::dumpRawEntry(const DWARFLocationEntry &Entry, Value1 = Entry.Value1; break; case dwarf::DW_LLE_end_of_list: -Value0 = Value1 = 0; return; default: llvm_unreachable("Not possible in DWARF4!"); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] 236129f - [CompilationDatabase] Pass Twine by const reference instead of by value. NFCI.
Author: Simon Pilgrim Date: 2021-01-07T12:53:28Z New Revision: 236129fb4460a4030eee685abc2f02b32458e775 URL: https://github.com/llvm/llvm-project/commit/236129fb4460a4030eee685abc2f02b32458e775 DIFF: https://github.com/llvm/llvm-project/commit/236129fb4460a4030eee685abc2f02b32458e775.diff LOG: [CompilationDatabase] Pass Twine by const reference instead of by value. NFCI. Added: Modified: clang/include/clang/Tooling/CompilationDatabase.h clang/lib/Tooling/CompilationDatabase.cpp Removed: diff --git a/clang/include/clang/Tooling/CompilationDatabase.h b/clang/include/clang/Tooling/CompilationDatabase.h index cbd57e9609aa..44af236347b3 100644 --- a/clang/include/clang/Tooling/CompilationDatabase.h +++ b/clang/include/clang/Tooling/CompilationDatabase.h @@ -43,10 +43,10 @@ namespace tooling { /// Specifies the working directory and command of a compilation. struct CompileCommand { CompileCommand() = default; - CompileCommand(Twine Directory, Twine Filename, - std::vector<std::string> CommandLine, Twine Output) + CompileCommand(const Twine &Directory, const Twine &Filename, + std::vector<std::string> CommandLine, const Twine &Output) : Directory(Directory.str()), Filename(Filename.str()), -CommandLine(std::move(CommandLine)), Output(Output.str()){} +CommandLine(std::move(CommandLine)), Output(Output.str()) {} /// The working directory the command was executed from. std::string Directory; @@ -180,9 +180,9 @@ class FixedCompilationDatabase : public CompilationDatabase { /// \param Argv Points to the command line arguments. /// \param ErrorMsg Contains error text if the function returns null pointer. /// \param Directory The base directory used in the FixedCompilationDatabase. - static std::unique_ptr<FixedCompilationDatabase> loadFromCommandLine( - int &Argc, const char *const *Argv, std::string &ErrorMsg, - Twine Directory = "."); + static std::unique_ptr<FixedCompilationDatabase> + loadFromCommandLine(int &Argc, const char *const *Argv, std::string &ErrorMsg, + const Twine &Directory = "."); /// Reads flags from the given file, one-per-line. /// Returns nullptr and sets ErrorMessage if we can't read the file. 
@@ -196,7 +196,8 @@ class FixedCompilationDatabase : public CompilationDatabase { /// Constructs a compilation data base from a specified directory /// and command line. - FixedCompilationDatabase(Twine Directory, ArrayRef<std::string> CommandLine); + FixedCompilationDatabase(const Twine &Directory, + ArrayRef<std::string> CommandLine); /// Returns the given compile command. /// diff --git a/clang/lib/Tooling/CompilationDatabase.cpp b/clang/lib/Tooling/CompilationDatabase.cpp index d339fd044c02..1e19e68633d2 100644 --- a/clang/lib/Tooling/CompilationDatabase.cpp +++ b/clang/lib/Tooling/CompilationDatabase.cpp @@ -323,7 +323,7 @@ std::unique_ptr<FixedCompilationDatabase> FixedCompilationDatabase::loadFromCommandLine(int &Argc, const char *const *Argv, std::string &ErrorMsg, - Twine Directory) { + const Twine &Directory) { ErrorMsg.clear(); if (Argc == 0) return nullptr; @@ -368,8 +368,8 @@ FixedCompilationDatabase::loadFromBuffer(StringRef Directory, StringRef Data, return std::make_unique<FixedCompilationDatabase>(Directory, std::move(Args)); } -FixedCompilationDatabase:: -FixedCompilationDatabase(Twine Directory, ArrayRef<std::string> CommandLine) { +FixedCompilationDatabase::FixedCompilationDatabase( +const Twine &Directory, ArrayRef<std::string> CommandLine) { std::vector<std::string> ToolCommandLine(1, GetClangToolCommand()); ToolCommandLine.insert(ToolCommandLine.end(), CommandLine.begin(), CommandLine.end()); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 350ab7a - [DAG] Simplify OR(X, SHL(Y, BW/2)) eq/ne 0/-1 'all/any-of' style patterns
Author: Simon Pilgrim Date: 2021-01-07T12:03:19Z New Revision: 350ab7aa1c6735c0a136c118f7b43773fd74bf2d URL: https://github.com/llvm/llvm-project/commit/350ab7aa1c6735c0a136c118f7b43773fd74bf2d DIFF: https://github.com/llvm/llvm-project/commit/350ab7aa1c6735c0a136c118f7b43773fd74bf2d.diff LOG: [DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead. This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers. Differential Revision: https://reviews.llvm.org/D93599 Added: Modified: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/test/CodeGen/X86/avx512-mask-op.ll llvm/test/CodeGen/X86/cmp-concat.ll llvm/test/CodeGen/X86/movmsk-cmp.ll Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index f5abb2c513fb..1bf9840995b0 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -3956,6 +3956,67 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, if (SDValue CC = optimizeSetCCByHoistingAndByConstFromLogicalShift( VT, N0, N1, Cond, DCI, dl)) return CC; + + // For all/any comparisons, replace or(x,shl(y,bw/2)) with and/or(x,y). + // For example, when high 32-bits of i64 X are known clear: + // all bits clear: (X | (Y<<32)) == 0 --> (X | Y) == 0 + // all bits set: (X | (Y<<32)) == -1 --> (X & Y) == -1 + bool CmpZero = N1C->getAPIntValue().isNullValue(); + bool CmpNegOne = N1C->getAPIntValue().isAllOnesValue(); + if ((CmpZero || CmpNegOne) && N0.hasOneUse()) { +// Match or(lo,shl(hi,bw/2)) pattern. 
+auto IsConcat = [&](SDValue V, SDValue &Lo, SDValue &Hi) { + unsigned EltBits = V.getScalarValueSizeInBits(); + if (V.getOpcode() != ISD::OR || (EltBits % 2) != 0) +return false; + SDValue LHS = V.getOperand(0); + SDValue RHS = V.getOperand(1); + APInt HiBits = APInt::getHighBitsSet(EltBits, EltBits / 2); + // Unshifted element must have zero upperbits. + if (RHS.getOpcode() == ISD::SHL && + isa<ConstantSDNode>(RHS.getOperand(1)) && + RHS.getConstantOperandAPInt(1) == (EltBits / 2) && + DAG.MaskedValueIsZero(LHS, HiBits)) { +Lo = LHS; +Hi = RHS.getOperand(0); +return true; + } + if (LHS.getOpcode() == ISD::SHL && + isa<ConstantSDNode>(LHS.getOperand(1)) && + LHS.getConstantOperandAPInt(1) == (EltBits / 2) && + DAG.MaskedValueIsZero(RHS, HiBits)) { +Lo = RHS; +Hi = LHS.getOperand(0); +return true; + } + return false; +}; + +auto MergeConcat = [&](SDValue Lo, SDValue Hi) { + unsigned EltBits = N0.getScalarValueSizeInBits(); + unsigned HalfBits = EltBits / 2; + APInt HiBits = APInt::getHighBitsSet(EltBits, HalfBits); + SDValue LoBits = DAG.getConstant(~HiBits, dl, OpVT); + SDValue HiMask = DAG.getNode(ISD::AND, dl, OpVT, Hi, LoBits); + SDValue NewN0 = + DAG.getNode(CmpZero ? ISD::OR : ISD::AND, dl, OpVT, Lo, HiMask); + SDValue NewN1 = CmpZero ? 
DAG.getConstant(0, dl, OpVT) : LoBits; + return DAG.getSetCC(dl, VT, NewN0, NewN1, Cond); +}; + +SDValue Lo, Hi; +if (IsConcat(N0, Lo, Hi)) + return MergeConcat(Lo, Hi); + +if (N0.getOpcode() == ISD::AND || N0.getOpcode() == ISD::OR) { + SDValue Lo0, Lo1, Hi0, Hi1; + if (IsConcat(N0.getOperand(0), Lo0, Hi0) && + IsConcat(N0.getOperand(1), Lo1, Hi1)) { +return MergeConcat(DAG.getNode(N0.getOpcode(), dl, OpVT, Lo0, Lo1), + DAG.getNode(N0.getOpcode(), dl, OpVT, Hi0, Hi1)); + } +} + } } // If we have "setcc X, C0", check to see if we can shrink the immediate diff --git a/llvm/test/CodeGen/X86/avx512-mask-op.ll b/llvm/test/CodeGen/X86/avx512-mask-op.ll index 5df6842994f0..684bebaa85dd 100644 --- a/llvm/test/CodeGen/X86/avx512-mask-op.ll +++ b/llvm/test/CodeGen/X86/avx512-mask-op.ll @@ -2148,18 +2148,15 @@ define void @ktest_2(<32 x float> %in, float * %base) { ; ; KNL-LABEL: ktest_2: ; KNL: ## %bb.0: -; KNL-NEXT:vcmpgtps 64(%rdi), %zmm1, %k1 -; KNL-NEXT:vcmpgtps (%rdi), %zmm0, %k2 -; KNL-NEXT:vmovups 4(%rdi), %zmm2 {%k2} {z} -; KNL-NEXT:vmovups 68(%rdi), %zmm3 {%k1} {z} -; KNL-NEXT:vcmpltps %zmm3, %zmm1, %k0 -; KNL-NEXT:vcmpltps %zmm2, %zmm0, %k3 +; KNL-NEXT:vcmpgtps (%rdi), %zmm0,
[llvm-branch-commits] [llvm] 3f8c252 - [X86] Add commuted patterns test coverage for D93599
Author: Simon Pilgrim Date: 2021-01-06T18:03:20Z New Revision: 3f8c2520c0424860b4bd3ae7b20f8033ed09363a URL: https://github.com/llvm/llvm-project/commit/3f8c2520c0424860b4bd3ae7b20f8033ed09363a DIFF: https://github.com/llvm/llvm-project/commit/3f8c2520c0424860b4bd3ae7b20f8033ed09363a.diff LOG: [X86] Add commuted patterns test coverage for D93599 Suggested by @spatel Added: Modified: llvm/test/CodeGen/X86/cmp-concat.ll Removed: diff --git a/llvm/test/CodeGen/X86/cmp-concat.ll b/llvm/test/CodeGen/X86/cmp-concat.ll index a622ad7faff7..e3a69df86563 100644 --- a/llvm/test/CodeGen/X86/cmp-concat.ll +++ b/llvm/test/CodeGen/X86/cmp-concat.ll @@ -35,6 +35,46 @@ define i1 @cmp_anybits_concat_i32(i32 %x, i32 %y) { ret i1 %r } +define i1 @cmp_anybits_concat_shl_shl_i16(i16 %x, i16 %y) { +; CHECK-LABEL: cmp_anybits_concat_shl_shl_i16: +; CHECK: # %bb.0: +; CHECK-NEXT:# kill: def $esi killed $esi def $rsi +; CHECK-NEXT:movzwl %di, %eax +; CHECK-NEXT:movzwl %si, %ecx +; CHECK-NEXT:shlq $32, %rax +; CHECK-NEXT:shlq $8, %rcx +; CHECK-NEXT:orq %rax, %rcx +; CHECK-NEXT:sete %al +; CHECK-NEXT:retq + %zx = zext i16 %x to i64 + %zy = zext i16 %y to i64 + %sx = shl i64 %zx, 32 + %sy = shl i64 %zy, 8 + %or = or i64 %sx, %sy + %r = icmp eq i64 %or, 0 + ret i1 %r +} + +define i1 @cmp_anybits_concat_shl_shl_i16_commute(i16 %x, i16 %y) { +; CHECK-LABEL: cmp_anybits_concat_shl_shl_i16_commute: +; CHECK: # %bb.0: +; CHECK-NEXT:# kill: def $esi killed $esi def $rsi +; CHECK-NEXT:movzwl %di, %eax +; CHECK-NEXT:movzwl %si, %ecx +; CHECK-NEXT:shlq $32, %rax +; CHECK-NEXT:shlq $8, %rcx +; CHECK-NEXT:orq %rax, %rcx +; CHECK-NEXT:sete %al +; CHECK-NEXT:retq + %zx = zext i16 %x to i64 + %zy = zext i16 %y to i64 + %sx = shl i64 %zx, 32 + %sy = shl i64 %zy, 8 + %or = or i64 %sy, %sx + %r = icmp eq i64 %or, 0 + ret i1 %r +} + define <16 x i8> @cmp_allbits_concat_v16i8(<16 x i8> %x, <16 x i8> %y) { ; CHECK-LABEL: cmp_allbits_concat_v16i8: ; CHECK: # %bb.0: ___ llvm-branch-commits mailing list 
llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 1307e3f - [TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support.
Author: Simon Pilgrim Date: 2021-01-06T16:13:51Z New Revision: 1307e3f6c46cc3a6e6ad9cd46fc67efafcac939e URL: https://github.com/llvm/llvm-project/commit/1307e3f6c46cc3a6e6ad9cd46fc67efafcac939e DIFF: https://github.com/llvm/llvm-project/commit/1307e3f6c46cc3a6e6ad9cd46fc67efafcac939e.diff LOG: [TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support. Added: Modified: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/test/CodeGen/X86/lzcnt-cmp.ll Removed: diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index d895a53e5a83..f5abb2c513fb 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -3486,35 +3486,36 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, // Optimize some CTPOP cases. if (SDValue V = simplifySetCCWithCTPOP(*this, VT, N0, C1, Cond, dl, DAG)) return V; - } - - // FIXME: Support vectors. - if (auto *N1C = dyn_cast<ConstantSDNode>(N1.getNode())) { -const APInt &C1 = N1C->getAPIntValue(); // If the LHS is '(srl (ctlz x), 5)', the RHS is 0/1, and this is an // equality comparison, then we're just comparing whether X itself is // zero. 
if (N0.getOpcode() == ISD::SRL && (C1.isNullValue() || C1.isOneValue()) && N0.getOperand(0).getOpcode() == ISD::CTLZ && -N0.getOperand(1).getOpcode() == ISD::Constant) { - const APInt &ShAmt = N0.getConstantOperandAPInt(1); - if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && - ShAmt == Log2_32(N0.getValueSizeInBits())) { -if ((C1 == 0) == (Cond == ISD::SETEQ)) { - // (srl (ctlz x), 5) == 0 -> X != 0 - // (srl (ctlz x), 5) != 1 -> X != 0 - Cond = ISD::SETNE; -} else { - // (srl (ctlz x), 5) != 0 -> X == 0 - // (srl (ctlz x), 5) == 1 -> X == 0 - Cond = ISD::SETEQ; +isPowerOf2_32(N0.getScalarValueSizeInBits())) { + if (ConstantSDNode *ShAmt = isConstOrConstSplat(N0.getOperand(1))) { +if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && +ShAmt->getAPIntValue() == Log2_32(N0.getScalarValueSizeInBits())) { + if ((C1 == 0) == (Cond == ISD::SETEQ)) { +// (srl (ctlz x), 5) == 0 -> X != 0 +// (srl (ctlz x), 5) != 1 -> X != 0 +Cond = ISD::SETNE; + } else { +// (srl (ctlz x), 5) != 0 -> X == 0 +// (srl (ctlz x), 5) == 1 -> X == 0 +Cond = ISD::SETEQ; + } + SDValue Zero = DAG.getConstant(0, dl, N0.getValueType()); + return DAG.getSetCC(dl, VT, N0.getOperand(0).getOperand(0), Zero, + Cond); } -SDValue Zero = DAG.getConstant(0, dl, N0.getValueType()); -return DAG.getSetCC(dl, VT, N0.getOperand(0).getOperand(0), -Zero, Cond); } } + } + + // FIXME: Support vectors. 
+ if (auto *N1C = dyn_cast<ConstantSDNode>(N1.getNode())) { +const APInt &C1 = N1C->getAPIntValue(); // (zext x) == C --> x == (trunc C) // (sext x) == C --> x == (trunc C) diff --git a/llvm/test/CodeGen/X86/lzcnt-cmp.ll b/llvm/test/CodeGen/X86/lzcnt-cmp.ll index 435b09dd5d08..3823524f552a 100644 --- a/llvm/test/CodeGen/X86/lzcnt-cmp.ll +++ b/llvm/test/CodeGen/X86/lzcnt-cmp.ll @@ -96,75 +96,36 @@ define i1 @lshr_ctlz_undef_cmpne_zero_i64(i64 %in) { define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x i64> %in) { ; X86-LABEL: lshr_ctlz_cmpeq_zero_v2i64: ; X86: # %bb.0: +; X86-NEXT:pushl %esi +; X86-NEXT:.cfi_def_cfa_offset 8 +; X86-NEXT:.cfi_offset %esi, -8 ; X86-NEXT:movl {{[0-9]+}}(%esp), %eax +; X86-NEXT:movl {{[0-9]+}}(%esp), %esi +; X86-NEXT:movl {{[0-9]+}}(%esp), %edx +; X86-NEXT:xorl %ecx, %ecx +; X86-NEXT:orl {{[0-9]+}}(%esp), %edx +; X86-NEXT:setne %cl +; X86-NEXT:negl %ecx ; X86-NEXT:xorl %edx, %edx -; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) -; X86-NEXT:movl $0, %ecx -; X86-NEXT:jne .LBB4_2 -; X86-NEXT: # %bb.1: -; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %ecx -; X86-NEXT:addl $32, %ecx -; X86-NEXT: .LBB4_2: -; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) -; X86-NEXT:jne .LBB4_4 -; X86-NEXT: # %bb.3: -; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %edx -; X86-NEXT:addl $32, %edx -; X86-NEXT: .LBB4_4: -; X86-NEXT:andl $-64, %edx -; X86-NEXT:cmpl $1, %edx -; X86-NEXT:sbbl %edx, %edx -; X86-NEXT:andl $-64, %ecx -; X86-NEXT:cmpl $1, %ecx -; X86-NEXT:sbbl %ecx, %ecx -; X86-NEXT:movl %ecx, 12(%eax) -; X86-NEXT:movl %ecx, 8(%eax) -; X86-NEXT:movl %edx, 4(%eax) -; X86-NEXT:movl %edx, (%eax) +; X86-NEXT:orl {{[0-9]+}}(%esp), %esi +; X86-NEXT:setne %dl +; X86-NEXT:negl %edx +; X86-NEXT:movl %edx, 12(%eax) +; X86-NEXT:movl %edx, 8(%eax) +; X86-NEXT:
[llvm-branch-commits] [llvm] 500864f - Remove some unused includes. NFCI.
Author: Simon Pilgrim Date: 2021-01-06T15:50:29Z New Revision: 500864f928c272e8ebfd6493cb749083124bfd8b URL: https://github.com/llvm/llvm-project/commit/500864f928c272e8ebfd6493cb749083124bfd8b DIFF: https://github.com/llvm/llvm-project/commit/500864f928c272e8ebfd6493cb749083124bfd8b.diff LOG: Remove some unused includes. NFCI. <vector> (unlike many other c++ headers) is relatively clean, so if the file doesn't use std::vector then it shouldn't need the header. Added: Modified: llvm/include/llvm/Analysis/InlineAdvisor.h llvm/include/llvm/CodeGen/CodeGenPassBuilder.h llvm/include/llvm/ExecutionEngine/JITEventListener.h Removed: diff --git a/llvm/include/llvm/Analysis/InlineAdvisor.h b/llvm/include/llvm/Analysis/InlineAdvisor.h index 4dbd5786ac7d..f051706dca16 100644 --- a/llvm/include/llvm/Analysis/InlineAdvisor.h +++ b/llvm/include/llvm/Analysis/InlineAdvisor.h @@ -9,13 +9,11 @@ #ifndef LLVM_INLINEADVISOR_H_ #define LLVM_INLINEADVISOR_H_ -#include -#include -#include - #include "llvm/Analysis/InlineCost.h" #include "llvm/Config/llvm-config.h" #include "llvm/IR/PassManager.h" +#include +#include namespace llvm { class BasicBlock; diff --git a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h index b47aaa53eb89..893bc6e013f4 100644 --- a/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h +++ b/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h @@ -57,7 +57,6 @@ #include #include #include -#include <vector> namespace llvm { diff --git a/llvm/include/llvm/ExecutionEngine/JITEventListener.h b/llvm/include/llvm/ExecutionEngine/JITEventListener.h index 606b6f7cc128..4eefd993de2b 100644 --- a/llvm/include/llvm/ExecutionEngine/JITEventListener.h +++ b/llvm/include/llvm/ExecutionEngine/JITEventListener.h @@ -20,7 +20,6 @@ #include "llvm/IR/DebugLoc.h" #include "llvm/Support/CBindingWrapping.h" #include -#include <vector> namespace llvm { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] b69fe6a - [X86] Add icmp ne/eq (srl (ctlz x), log2(bw)) test coverage.
Author: Simon Pilgrim Date: 2021-01-06T15:50:29Z New Revision: b69fe6a85db43df27ebb260716d41a3e1b0d7534 URL: https://github.com/llvm/llvm-project/commit/b69fe6a85db43df27ebb260716d41a3e1b0d7534 DIFF: https://github.com/llvm/llvm-project/commit/b69fe6a85db43df27ebb260716d41a3e1b0d7534.diff LOG: [X86] Add icmp ne/eq (srl (ctlz x), log2(bw)) test coverage. Add vector coverage as well (which isn't currently supported). Added: llvm/test/CodeGen/X86/lzcnt-cmp.ll Modified: Removed: diff --git a/llvm/test/CodeGen/X86/lzcnt-cmp.ll b/llvm/test/CodeGen/X86/lzcnt-cmp.ll new file mode 100644 index ..435b09dd5d08 --- /dev/null +++ b/llvm/test/CodeGen/X86/lzcnt-cmp.ll @@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=i686-- -mattr=+lzcnt | FileCheck %s --check-prefixes=X86 +; RUN: llc < %s -mtriple=x86_64-- -mattr=+lzcnt | FileCheck %s --check-prefix=X64 + +define i1 @lshr_ctlz_cmpeq_one_i64(i64 %in) { +; X86-LABEL: lshr_ctlz_cmpeq_one_i64: +; X86: # %bb.0: +; X86-NEXT:movl {{[0-9]+}}(%esp), %eax +; X86-NEXT:orl {{[0-9]+}}(%esp), %eax +; X86-NEXT:sete %al +; X86-NEXT:retl +; +; X64-LABEL: lshr_ctlz_cmpeq_one_i64: +; X64: # %bb.0: +; X64-NEXT:testq %rdi, %rdi +; X64-NEXT:sete %al +; X64-NEXT:retq + %ctlz = call i64 @llvm.ctlz.i64(i64 %in, i1 0) + %lshr = lshr i64 %ctlz, 6 + %icmp = icmp eq i64 %lshr, 1 + ret i1 %icmp +} + +define i1 @lshr_ctlz_undef_cmpeq_one_i64(i64 %in) { +; X86-LABEL: lshr_ctlz_undef_cmpeq_one_i64: +; X86: # %bb.0: +; X86-NEXT:xorl %eax, %eax +; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-NEXT:jne .LBB1_2 +; X86-NEXT: # %bb.1: +; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %eax +; X86-NEXT:addl $32, %eax +; X86-NEXT: .LBB1_2: +; X86-NEXT:testb $64, %al +; X86-NEXT:setne %al +; X86-NEXT:retl +; +; X64-LABEL: lshr_ctlz_undef_cmpeq_one_i64: +; X64: # %bb.0: +; X64-NEXT:lzcntq %rdi, %rax +; X64-NEXT:shrq $6, %rax +; X64-NEXT:cmpl $1, %eax +; X64-NEXT:sete %al +; X64-NEXT:retq + %ctlz = call i64 
@llvm.ctlz.i64(i64 %in, i1 -1) + %lshr = lshr i64 %ctlz, 6 + %icmp = icmp eq i64 %lshr, 1 + ret i1 %icmp +} + +define i1 @lshr_ctlz_cmpne_zero_i64(i64 %in) { +; X86-LABEL: lshr_ctlz_cmpne_zero_i64: +; X86: # %bb.0: +; X86-NEXT:movl {{[0-9]+}}(%esp), %eax +; X86-NEXT:orl {{[0-9]+}}(%esp), %eax +; X86-NEXT:sete %al +; X86-NEXT:retl +; +; X64-LABEL: lshr_ctlz_cmpne_zero_i64: +; X64: # %bb.0: +; X64-NEXT:testq %rdi, %rdi +; X64-NEXT:sete %al +; X64-NEXT:retq + %ctlz = call i64 @llvm.ctlz.i64(i64 %in, i1 0) + %lshr = lshr i64 %ctlz, 6 + %icmp = icmp ne i64 %lshr, 0 + ret i1 %icmp +} + +define i1 @lshr_ctlz_undef_cmpne_zero_i64(i64 %in) { +; X86-LABEL: lshr_ctlz_undef_cmpne_zero_i64: +; X86: # %bb.0: +; X86-NEXT:xorl %eax, %eax +; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-NEXT:jne .LBB3_2 +; X86-NEXT: # %bb.1: +; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %eax +; X86-NEXT:addl $32, %eax +; X86-NEXT: .LBB3_2: +; X86-NEXT:testb $64, %al +; X86-NEXT:setne %al +; X86-NEXT:retl +; +; X64-LABEL: lshr_ctlz_undef_cmpne_zero_i64: +; X64: # %bb.0: +; X64-NEXT:lzcntq %rdi, %rax +; X64-NEXT:testb $64, %al +; X64-NEXT:setne %al +; X64-NEXT:retq + %ctlz = call i64 @llvm.ctlz.i64(i64 %in, i1 -1) + %lshr = lshr i64 %ctlz, 6 + %icmp = icmp ne i64 %lshr, 0 + ret i1 %icmp +} + +define <2 x i64> @lshr_ctlz_cmpeq_zero_v2i64(<2 x i64> %in) { +; X86-LABEL: lshr_ctlz_cmpeq_zero_v2i64: +; X86: # %bb.0: +; X86-NEXT:movl {{[0-9]+}}(%esp), %eax +; X86-NEXT:xorl %edx, %edx +; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-NEXT:movl $0, %ecx +; X86-NEXT:jne .LBB4_2 +; X86-NEXT: # %bb.1: +; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %ecx +; X86-NEXT:addl $32, %ecx +; X86-NEXT: .LBB4_2: +; X86-NEXT:cmpl $0, {{[0-9]+}}(%esp) +; X86-NEXT:jne .LBB4_4 +; X86-NEXT: # %bb.3: +; X86-NEXT:lzcntl {{[0-9]+}}(%esp), %edx +; X86-NEXT:addl $32, %edx +; X86-NEXT: .LBB4_4: +; X86-NEXT:andl $-64, %edx +; X86-NEXT:cmpl $1, %edx +; X86-NEXT:sbbl %edx, %edx +; X86-NEXT:andl $-64, %ecx +; X86-NEXT:cmpl $1, %ecx +; X86-NEXT:sbbl %ecx, %ecx +; 
X86-NEXT:movl %ecx, 12(%eax) +; X86-NEXT:movl %ecx, 8(%eax) +; X86-NEXT:movl %edx, 4(%eax) +; X86-NEXT:movl %edx, (%eax) +; X86-NEXT:retl $4 +; +; X64-LABEL: lshr_ctlz_cmpeq_zero_v2i64: +; X64: # %bb.0: +; X64-NEXT:movdqa %xmm0, %xmm1 +; X64-NEXT:psrlq $1, %xmm1 +; X64-NEXT:por %xmm0, %xmm1 +; X64-NEXT:movdqa %xmm1, %xmm0 +; X64-NEXT:psrlq $2, %xmm0 +; X64-NEXT:por %xmm1, %xmm0 +; X64-NEXT:movdqa %xmm0, %xmm1 +; X64-NEXT:psrlq $4, %xmm1 +; X64-NEXT:por %xmm0, %xmm1 +; X64-NEXT:movdqa %xmm1, %xmm0 +; X64-NEXT:
[llvm-branch-commits] [llvm] 26c486c - [TableGen] RegisterBankEmitter - Pass Twine by const reference instead of by value. NFCI.
Author: Simon Pilgrim Date: 2021-01-06T14:22:05Z New Revision: 26c486c2eb1a0f302eb60a4b959456f09adbbacb URL: https://github.com/llvm/llvm-project/commit/26c486c2eb1a0f302eb60a4b959456f09adbbacb DIFF: https://github.com/llvm/llvm-project/commit/26c486c2eb1a0f302eb60a4b959456f09adbbacb.diff LOG: [TableGen] RegisterBankEmitter - Pass Twine by const reference instead of by value. NFCI. Added: Modified: llvm/utils/TableGen/RegisterBankEmitter.cpp Removed: diff --git a/llvm/utils/TableGen/RegisterBankEmitter.cpp b/llvm/utils/TableGen/RegisterBankEmitter.cpp index 6a45213e1d66..0725657150f8 100644 --- a/llvm/utils/TableGen/RegisterBankEmitter.cpp +++ b/llvm/utils/TableGen/RegisterBankEmitter.cpp @@ -168,7 +168,7 @@ void RegisterBankEmitter::emitBaseClassDefinition( ///to the class. static void visitRegisterBankClasses( const CodeGenRegBank &RegisterClassHierarchy, -const CodeGenRegisterClass *RC, const Twine Kind, +const CodeGenRegisterClass *RC, const Twine &Kind, std::function VisitFn, SmallPtrSetImpl ) { @@ -182,7 +182,7 @@ static void visitRegisterBankClasses( for (const auto &PossibleSubclass : RegisterClassHierarchy.getRegClasses()) { std::string TmpKind = -(Twine(Kind) + " (" + PossibleSubclass.getName() + ")").str(); +(Kind + " (" + PossibleSubclass.getName() + ")").str(); // Visit each subclass of an explicitly named class. if (RC != &PossibleSubclass && RC->hasSubClass(&PossibleSubclass)) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] df5c2ca - [MIPS] MipsAsmParser - Pass Twine by const reference instead of by value. NFCI.
Author: Simon Pilgrim Date: 2021-01-06T14:22:04Z New Revision: df5c2caf0fc0d59d4d2e0ce99da4aa58f204791a URL: https://github.com/llvm/llvm-project/commit/df5c2caf0fc0d59d4d2e0ce99da4aa58f204791a DIFF: https://github.com/llvm/llvm-project/commit/df5c2caf0fc0d59d4d2e0ce99da4aa58f204791a.diff LOG: [MIPS] MipsAsmParser - Pass Twine by const reference instead of by value. NFCI. Added: Modified: llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp Removed: diff --git a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp index 9dbbdeb34dba..e4d61f8c210e 100644 --- a/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp +++ b/llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp @@ -352,8 +352,8 @@ class MipsAsmParser : public MCTargetAsmParser { bool expandSaaAddr(MCInst , SMLoc IDLoc, MCStreamer , const MCSubtargetInfo *STI); - bool reportParseError(Twine ErrorMsg); - bool reportParseError(SMLoc Loc, Twine ErrorMsg); + bool reportParseError(const Twine &ErrorMsg); + bool reportParseError(SMLoc Loc, const Twine &ErrorMsg); bool parseMemOffset(const MCExpr *, bool isParenExpr); @@ -6982,12 +6982,12 @@ bool MipsAsmParser::ParseInstruction(ParseInstructionInfo , StringRef Name, // FIXME: Given that these have the same name, these should both be // consistent on affecting the Parser. -bool MipsAsmParser::reportParseError(Twine ErrorMsg) { +bool MipsAsmParser::reportParseError(const Twine &ErrorMsg) { SMLoc Loc = getLexer().getLoc(); return Error(Loc, ErrorMsg); } -bool MipsAsmParser::reportParseError(SMLoc Loc, Twine ErrorMsg) { +bool MipsAsmParser::reportParseError(SMLoc Loc, const Twine &ErrorMsg) { return Error(Loc, ErrorMsg); } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 396dd6c - [ProfileData] Pass Twine by const reference instead of by value.
Author: Simon Pilgrim
Date: 2021-01-06T14:22:03Z
New Revision: 396dd6cd3d8bdcda9dcb606ad4c054560bf0649f
URL: https://github.com/llvm/llvm-project/commit/396dd6cd3d8bdcda9dcb606ad4c054560bf0649f
DIFF: https://github.com/llvm/llvm-project/commit/396dd6cd3d8bdcda9dcb606ad4c054560bf0649f.diff

LOG: [ProfileData] Pass Twine by const reference instead of by value.

It's only used by DiagnosticInfoSampleProfile, which takes a const reference anyhow.

Added:

Modified:
    llvm/include/llvm/ProfileData/SampleProfReader.h

Removed:

diff --git a/llvm/include/llvm/ProfileData/SampleProfReader.h b/llvm/include/llvm/ProfileData/SampleProfReader.h
index 35e71f336c27..92fe825beefc 100644
--- a/llvm/include/llvm/ProfileData/SampleProfReader.h
+++ b/llvm/include/llvm/ProfileData/SampleProfReader.h
@@ -226,7 +226,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
-#include "llvm/ADT/Twine.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/LLVMContext.h"
@@ -247,6 +246,7 @@
 namespace llvm {

 class raw_ostream;
+class Twine;

 namespace sampleprof {

@@ -408,7 +408,7 @@ class SampleProfileReader {
   StringMap () { return Profiles; }

   /// Report a parse error message.
-  void reportError(int64_t LineNumber, Twine Msg) const {
+  void reportError(int64_t LineNumber, const Twine &Msg) const {
     Ctx.diagnose(DiagnosticInfoSampleProfile(Buffer->getBufferIdentifier(),
                                              LineNumber, Msg));
   }
[llvm-branch-commits] [llvm] 37ac4f8 - [Hexagon] Regenerate zext-v4i1.ll tests
Author: Simon Pilgrim Date: 2021-01-06T12:56:06Z New Revision: 37ac4f865fba451d969bd9b4b1e28ce296e093da URL: https://github.com/llvm/llvm-project/commit/37ac4f865fba451d969bd9b4b1e28ce296e093da DIFF: https://github.com/llvm/llvm-project/commit/37ac4f865fba451d969bd9b4b1e28ce296e093da.diff LOG: [Hexagon] Regenerate zext-v4i1.ll tests This will be improved by part of the work for D86578 Added: Modified: llvm/test/CodeGen/Hexagon/vect/zext-v4i1.ll Removed: diff --git a/llvm/test/CodeGen/Hexagon/vect/zext-v4i1.ll b/llvm/test/CodeGen/Hexagon/vect/zext-v4i1.ll index e5394d929bb1..5f9a1522a2f6 100644 --- a/llvm/test/CodeGen/Hexagon/vect/zext-v4i1.ll +++ b/llvm/test/CodeGen/Hexagon/vect/zext-v4i1.ll @@ -1,12 +1,44 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc -march=hexagon -hexagon-instsimplify=0 < %s | FileCheck %s ; Check that this compiles successfully. -; CHECK: vcmph.eq target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048" target triple = "hexagon" define i32 @fred(<8 x i16>* %a0) #0 { +; CHECK-LABEL: fred: +; CHECK: // %bb.0: // %b0 +; CHECK-NEXT:{ +; CHECK-NEXT: if (p0) jump:nt .LBB0_2 +; CHECK-NEXT:} +; CHECK-NEXT: // %bb.1: // %b2 +; CHECK-NEXT:{ +; CHECK-NEXT: r3:2 = combine(#0,#0) +; CHECK-NEXT: r1:0 = memd(r0+#0) +; CHECK-NEXT:} +; CHECK-NEXT:{ +; CHECK-NEXT: p0 = vcmph.eq(r1:0,r3:2) +; CHECK-NEXT:} +; CHECK-NEXT:{ +; CHECK-NEXT: r1:0 = mask(p0) +; CHECK-NEXT:} +; CHECK-NEXT:{ +; CHECK-NEXT: r0 = and(r0,#1) +; CHECK-NEXT:} +; CHECK-NEXT:{ +; CHECK-NEXT: p0 = cmp.eq(r0,#11) +; CHECK-NEXT: r0 = #1 +; CHECK-NEXT:} +; CHECK-NEXT:{ +; CHECK-NEXT: if (p0) r0 = #0 +; CHECK-NEXT: jumpr r31 +; CHECK-NEXT:} +; CHECK-NEXT: .LBB0_2: // %b14 +; CHECK-NEXT:{ +; CHECK-NEXT: r0 = #0 +; CHECK-NEXT: jumpr r31 +; CHECK-NEXT:} b0: switch i32 undef, label %b14 [ i32 5, label %b2 ___ llvm-branch-commits mailing list 
llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] dfcb872 - [X86] Add scalar/vector test coverage for D93599
Author: Simon Pilgrim Date: 2021-01-06T11:58:27Z New Revision: dfcb872c3e82c821bb32a2dd53ab73314d38ce38 URL: https://github.com/llvm/llvm-project/commit/dfcb872c3e82c821bb32a2dd53ab73314d38ce38 DIFF: https://github.com/llvm/llvm-project/commit/dfcb872c3e82c821bb32a2dd53ab73314d38ce38.diff LOG: [X86] Add scalar/vector test coverage for D93599 This expands the test coverage beyond just the boolvector/movmsk concat pattern Added: llvm/test/CodeGen/X86/cmp-concat.ll Modified: Removed: diff --git a/llvm/test/CodeGen/X86/cmp-concat.ll b/llvm/test/CodeGen/X86/cmp-concat.ll new file mode 100644 index ..a622ad7faff7 --- /dev/null +++ b/llvm/test/CodeGen/X86/cmp-concat.ll @@ -0,0 +1,84 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s + +define i1 @cmp_allbits_concat_i8(i8 %x, i8 %y) { +; CHECK-LABEL: cmp_allbits_concat_i8: +; CHECK: # %bb.0: +; CHECK-NEXT:movzbl %sil, %eax +; CHECK-NEXT:shll $8, %edi +; CHECK-NEXT:orl %eax, %edi +; CHECK-NEXT:cmpw $-1, %di +; CHECK-NEXT:sete %al +; CHECK-NEXT:retq + %zx = zext i8 %x to i16 + %zy = zext i8 %y to i16 + %sh = shl i16 %zx, 8 + %or = or i16 %zy, %sh + %r = icmp eq i16 %or, -1 + ret i1 %r +} + +define i1 @cmp_anybits_concat_i32(i32 %x, i32 %y) { +; CHECK-LABEL: cmp_anybits_concat_i32: +; CHECK: # %bb.0: +; CHECK-NEXT:# kill: def $edi killed $edi def $rdi +; CHECK-NEXT:movl %esi, %eax +; CHECK-NEXT:shlq $32, %rdi +; CHECK-NEXT:orq %rax, %rdi +; CHECK-NEXT:setne %al +; CHECK-NEXT:retq + %zx = zext i32 %x to i64 + %zy = zext i32 %y to i64 + %sh = shl i64 %zx, 32 + %or = or i64 %zy, %sh + %r = icmp ne i64 %or, 0 + ret i1 %r +} + +define <16 x i8> @cmp_allbits_concat_v16i8(<16 x i8> %x, <16 x i8> %y) { +; CHECK-LABEL: cmp_allbits_concat_v16i8: +; CHECK: # %bb.0: +; CHECK-NEXT:movdqa %xmm1, %xmm2 +; CHECK-NEXT:punpcklbw {{.*#+}} xmm2 = 
xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7] +; CHECK-NEXT:punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15] +; CHECK-NEXT:pcmpeqd %xmm0, %xmm0 +; CHECK-NEXT:pcmpeqw %xmm0, %xmm1 +; CHECK-NEXT:pcmpeqw %xmm2, %xmm0 +; CHECK-NEXT:packsswb %xmm1, %xmm0 +; CHECK-NEXT:retq + %zx = zext <16 x i8> %x to <16 x i16> + %zy = zext <16 x i8> %y to <16 x i16> + %sh = shl <16 x i16> %zx, + %or = or <16 x i16> %zy, %sh + %r = icmp eq <16 x i16> %or, + %s = sext <16 x i1> %r to <16 x i8> + ret <16 x i8> %s +} + +define <2 x i64> @cmp_nobits_concat_v2i64(<2 x i64> %x, <2 x i64> %y) { +; CHECK-LABEL: cmp_nobits_concat_v2i64: +; CHECK: # %bb.0: +; CHECK-NEXT:movq %xmm0, %rax +; CHECK-NEXT:pextrq $1, %xmm0, %rcx +; CHECK-NEXT:movq %xmm1, %rdx +; CHECK-NEXT:pextrq $1, %xmm1, %rsi +; CHECK-NEXT:xorl %edi, %edi +; CHECK-NEXT:orq %rcx, %rsi +; CHECK-NEXT:sete %dil +; CHECK-NEXT:negq %rdi +; CHECK-NEXT:movq %rdi, %xmm1 +; CHECK-NEXT:xorl %ecx, %ecx +; CHECK-NEXT:orq %rax, %rdx +; CHECK-NEXT:sete %cl +; CHECK-NEXT:negq %rcx +; CHECK-NEXT:movq %rcx, %xmm0 +; CHECK-NEXT:punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT:retq + %zx = zext <2 x i64> %x to <2 x i128> + %zy = zext <2 x i64> %y to <2 x i128> + %sh = shl <2 x i128> %zx, + %or = or <2 x i128> %zy, %sh + %r = icmp eq <2 x i128> %or, zeroinitializer + %s = sext <2 x i1> %r to <2 x i64> + ret <2 x i64> %s +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] 55488bd - CGExpr - EmitMatrixSubscriptExpr - fix getAs<> null-dereference static analyzer warning. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T17:08:11Z
New Revision: 55488bd3cd1a468941e26ad4cf94f2bad887fc02
URL: https://github.com/llvm/llvm-project/commit/55488bd3cd1a468941e26ad4cf94f2bad887fc02
DIFF: https://github.com/llvm/llvm-project/commit/55488bd3cd1a468941e26ad4cf94f2bad887fc02.diff

LOG: CGExpr - EmitMatrixSubscriptExpr - fix getAs<> null-dereference static analyzer warning. NFCI.

getAs<> can return null if the cast is invalid, which can lead to null pointer dereferences. Use castAs<> instead, which will assert that the cast is valid.

Added:

Modified:
    clang/lib/CodeGen/CGExpr.cpp

Removed:

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 3013fffcbf6d..a3f90449bb4c 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -3858,7 +3858,7 @@ LValue CodeGenFunction::EmitMatrixSubscriptExpr(const MatrixSubscriptExpr *E) {
   llvm::Value *ColIdx = EmitScalarExpr(E->getColumnIdx());
   llvm::Value *NumRows = Builder.getIntN(
       RowIdx->getType()->getScalarSizeInBits(),
-      E->getBase()->getType()->getAs<ConstantMatrixType>()->getNumRows());
+      E->getBase()->getType()->castAs<ConstantMatrixType>()->getNumRows());
   llvm::Value *FinalIdx =
       Builder.CreateAdd(Builder.CreateMul(ColIdx, NumRows), RowIdx);
   return LValue::MakeMatrixElt(
[llvm-branch-commits] [llvm] 73a44f4 - [X86][AVX] combineVectorSignBitsTruncation - use PACKSS/PACKUS in more AVX cases
Author: Simon Pilgrim Date: 2021-01-05T15:01:45Z New Revision: 73a44f437bf19ecf2c865e6c8b9b8a2e4a811960 URL: https://github.com/llvm/llvm-project/commit/73a44f437bf19ecf2c865e6c8b9b8a2e4a811960 DIFF: https://github.com/llvm/llvm-project/commit/73a44f437bf19ecf2c865e6c8b9b8a2e4a811960.diff LOG: [X86][AVX] combineVectorSignBitsTruncation - use PACKSS/PACKUS in more AVX cases AVX512 has fast truncation ops, but if the truncation source is a concatenation of subvectors then its likely that we can use PACK more efficiently. This is only guaranteed to work for truncations to 128/256-bit vectors as the PACK works across 128-bit sub-lanes, for now I've just disabled 512-bit truncation cases but we need to get them working eventually for D61129. Added: Modified: llvm/lib/Target/X86/X86ISelLowering.cpp llvm/test/CodeGen/X86/vector-pack-128.ll llvm/test/CodeGen/X86/vector-pack-256.ll Removed: diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index 4dce5283b2ab..16f1023ed5f8 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -45706,8 +45706,13 @@ static SDValue combineVectorSignBitsTruncation(SDNode *N, const SDLoc , // there's no harm in trying pack. if (Subtarget.hasAVX512() && !(!Subtarget.useAVX512Regs() && VT.is256BitVector() && -InVT.is512BitVector())) -return SDValue(); +InVT.is512BitVector())) { +// PACK should still be worth it for 128/256-bit vectors if the sources were +// originally concatenated from subvectors. +SmallVector ConcatOps; +if (VT.getSizeInBits() > 256 || !collectConcatOps(In.getNode(), ConcatOps)) + return SDValue(); + } unsigned NumPackedSignBits = std::min(SVT.getSizeInBits(), 16); unsigned NumPackedZeroBits = Subtarget.hasSSE41() ? 
NumPackedSignBits : 8; diff --git a/llvm/test/CodeGen/X86/vector-pack-128.ll b/llvm/test/CodeGen/X86/vector-pack-128.ll index 9b0bbac0199d..a49d0f9e3605 100644 --- a/llvm/test/CodeGen/X86/vector-pack-128.ll +++ b/llvm/test/CodeGen/X86/vector-pack-128.ll @@ -35,9 +35,7 @@ define <8 x i16> @trunc_concat_packssdw_128(<4 x i32> %a0, <4 x i32> %a1) nounwi ; AVX512: # %bb.0: ; AVX512-NEXT:vpsrad $17, %xmm0, %xmm0 ; AVX512-NEXT:vpandd {{.*}}(%rip){1to4}, %xmm1, %xmm1 -; AVX512-NEXT:vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX512-NEXT:vpmovdw %ymm0, %xmm0 -; AVX512-NEXT:vzeroupper +; AVX512-NEXT:vpackssdw %xmm1, %xmm0, %xmm0 ; AVX512-NEXT:retq %1 = ashr <4 x i32> %a0, %2 = and <4 x i32> %a1, @@ -80,9 +78,7 @@ define <8 x i16> @trunc_concat_packusdw_128(<4 x i32> %a0, <4 x i32> %a1) nounwi ; AVX512: # %bb.0: ; AVX512-NEXT:vpsrld $17, %xmm0, %xmm0 ; AVX512-NEXT:vpandd {{.*}}(%rip){1to4}, %xmm1, %xmm1 -; AVX512-NEXT:vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX512-NEXT:vpmovdw %ymm0, %xmm0 -; AVX512-NEXT:vzeroupper +; AVX512-NEXT:vpackusdw %xmm1, %xmm0, %xmm0 ; AVX512-NEXT:retq %1 = lshr <4 x i32> %a0, %2 = and <4 x i32> %a1, @@ -99,38 +95,12 @@ define <16 x i8> @trunc_concat_packsswb_128(<8 x i16> %a0, <8 x i16> %a1) nounwi ; SSE-NEXT:packsswb %xmm1, %xmm0 ; SSE-NEXT:retq ; -; AVX1-LABEL: trunc_concat_packsswb_128: -; AVX1: # %bb.0: -; AVX1-NEXT:vpsraw $15, %xmm0, %xmm0 -; AVX1-NEXT:vpand {{.*}}(%rip), %xmm1, %xmm1 -; AVX1-NEXT:vpacksswb %xmm1, %xmm0, %xmm0 -; AVX1-NEXT:retq -; -; AVX2-LABEL: trunc_concat_packsswb_128: -; AVX2: # %bb.0: -; AVX2-NEXT:vpsraw $15, %xmm0, %xmm0 -; AVX2-NEXT:vpand {{.*}}(%rip), %xmm1, %xmm1 -; AVX2-NEXT:vpacksswb %xmm1, %xmm0, %xmm0 -; AVX2-NEXT:retq -; -; AVX512F-LABEL: trunc_concat_packsswb_128: -; AVX512F: # %bb.0: -; AVX512F-NEXT:vpsraw $15, %xmm0, %xmm0 -; AVX512F-NEXT:vpand {{.*}}(%rip), %xmm1, %xmm1 -; AVX512F-NEXT:vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX512F-NEXT:vpmovzxwd {{.*#+}} zmm0 = 
ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero -; AVX512F-NEXT:vpmovdb %zmm0, %xmm0 -; AVX512F-NEXT:vzeroupper -; AVX512F-NEXT:retq -; -; AVX512BW-LABEL: trunc_concat_packsswb_128: -; AVX512BW: # %bb.0: -; AVX512BW-NEXT:vpsraw $15, %xmm0, %xmm0 -; AVX512BW-NEXT:vpand {{.*}}(%rip), %xmm1, %xmm1 -; AVX512BW-NEXT:vinserti128 $1, %xmm1, %ymm0, %ymm0 -; AVX512BW-NEXT:vpmovwb %ymm0, %xmm0 -; AVX512BW-NEXT:vzeroupper -; AVX512BW-NEXT:retq +; AVX-LABEL: trunc_concat_packsswb_128: +; AVX: # %bb.0: +; AVX-NEXT:vpsraw $15, %xmm0, %xmm0 +; AVX-NEXT:vpand {{.*}}(%rip), %xmm1, %xmm1 +; AVX-NEXT:vpacksswb %xmm1, %xmm0, %xmm0 +; AVX-NEXT:retq %1 = ashr <8 x i16> %a0, %2 = and <8 x
[llvm-branch-commits] [llvm] dc74d7e - [X86] getMemoryOpCost - use dyn_cast_or_null. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T13:23:09Z
New Revision: dc74d7ed1f651aa61d15b4eaaa32200df1f38d37
URL: https://github.com/llvm/llvm-project/commit/dc74d7ed1f651aa61d15b4eaaa32200df1f38d37
DIFF: https://github.com/llvm/llvm-project/commit/dc74d7ed1f651aa61d15b4eaaa32200df1f38d37.diff

LOG: [X86] getMemoryOpCost - use dyn_cast_or_null. NFCI.

Use dyn_cast_or_null<StoreInst> instead of the isa_and_nonnull<StoreInst> check, and use the StoreInst::getPointerOperand wrapper instead of a hardcoded Instruction::getOperand. Looks cleaner and avoids a spurious clang static analyzer null dereference warning.

Added:

Modified:
    llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Removed:

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 5a342d41fb5e..71455237fb61 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -3188,11 +3188,10 @@ int X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
                                 const Instruction *I) {
   // TODO: Handle other cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput) {
-    if (isa_and_nonnull<StoreInst>(I)) {
-      Value *Ptr = I->getOperand(1);
+    if (auto *SI = dyn_cast_or_null<StoreInst>(I)) {
       // Store instruction with index and scale costs 2 Uops.
       // Check the preceding GEP to identify non-const indices.
-      if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr)) {
+      if (auto *GEP = dyn_cast<GetElementPtrInst>(SI->getPointerOperand())) {
         if (!all_of(GEP->indices(), [](Value *V) { return isa(V); }))
           return TTI::TCC_Basic * 2;
       }
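The dyn_cast_or_null<> used above is the null-tolerant member of LLVM's casting family: like dyn_cast<>, but it accepts a null input and simply propagates the null. A rough sketch of those semantics (using RTTI for brevity, which the real helpers in llvm/Support/Casting.h deliberately avoid; the names here are illustrative, not LLVM's):

```cpp
#include <cassert>

struct Inst { virtual ~Inst() = default; };
struct StoreLike : Inst { int addr = 42; };
struct LoadLike : Inst {};

// Like dyn_cast<>: the argument must be non-null; returns null only when the
// dynamic type does not match.
template <typename To> To *dynCast(Inst *v) {
  assert(v && "dynCast on a null pointer");
  return dynamic_cast<To *>(v);
}

// Like dyn_cast_or_null<>: additionally tolerates a null argument, which is
// why it fits call sites such as getMemoryOpCost where I may be nullptr.
template <typename To> To *dynCastOrNull(Inst *v) {
  return v ? dynamic_cast<To *>(v) : nullptr;
}
```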
[llvm-branch-commits] [llvm] 313d982 - [IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.
Author: Simon Pilgrim
Date: 2021-01-05T11:01:10Z
New Revision: 313d982df65a7a8f1da2da5f0e03e6b6e301ce3c
URL: https://github.com/llvm/llvm-project/commit/313d982df65a7a8f1da2da5f0e03e6b6e301ce3c
DIFF: https://github.com/llvm/llvm-project/commit/313d982df65a7a8f1da2da5f0e03e6b6e301ce3c.diff

LOG: [IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.

Added:

Modified:
    llvm/include/llvm/IR/Constants.h
    llvm/lib/IR/Constants.cpp
    llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

Removed:

diff --git a/llvm/include/llvm/IR/Constants.h b/llvm/include/llvm/IR/Constants.h
index 3fbbf53c29b4..ac802232c23d 100644
--- a/llvm/include/llvm/IR/Constants.h
+++ b/llvm/include/llvm/IR/Constants.h
@@ -88,8 +88,10 @@ class ConstantInt final : public ConstantData {

   static ConstantInt *getTrue(LLVMContext &Context);
   static ConstantInt *getFalse(LLVMContext &Context);
+  static ConstantInt *getBool(LLVMContext &Context, bool V);
   static Constant *getTrue(Type *Ty);
   static Constant *getFalse(Type *Ty);
+  static Constant *getBool(Type *Ty, bool V);

   /// If Ty is a vector type, return a Constant with a splat of the given
   /// value. Otherwise return a ConstantInt for the given value.

diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index 82a5f9db0bf7..a38302d17937 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -815,6 +815,10 @@ ConstantInt *ConstantInt::getFalse(LLVMContext &Context) {
   return pImpl->TheFalseVal;
 }

+ConstantInt *ConstantInt::getBool(LLVMContext &Context, bool V) {
+  return V ? getTrue(Context) : getFalse(Context);
+}
+
 Constant *ConstantInt::getTrue(Type *Ty) {
   assert(Ty->isIntOrIntVectorTy(1) && "Type not i1 or vector of i1.");
   ConstantInt *TrueC = ConstantInt::getTrue(Ty->getContext());
@@ -831,6 +835,10 @@ Constant *ConstantInt::getFalse(Type *Ty) {
   return FalseC;
 }

+Constant *ConstantInt::getBool(Type *Ty, bool V) {
+  return V ? getTrue(Ty) : getFalse(Ty);
+}
+
 // Get a ConstantInt from an APInt.
 ConstantInt *ConstantInt::get(LLVMContext &Context, const APInt &V) {
   // get an existing value or the insertion position

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 83b310bfcd05..87d4b40a9a64 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -5037,11 +5037,9 @@ Instruction *InstCombinerImpl::foldICmpUsingKnownBits(ICmpInst &I) {
     llvm_unreachable("Unknown icmp opcode!");
   case ICmpInst::ICMP_EQ:
   case ICmpInst::ICMP_NE: {
-    if (Op0Max.ult(Op1Min) || Op0Min.ugt(Op1Max)) {
-      return Pred == CmpInst::ICMP_EQ
-                 ? replaceInstUsesWith(I, ConstantInt::getFalse(I.getType()))
-                 : replaceInstUsesWith(I, ConstantInt::getTrue(I.getType()));
-    }
+    if (Op0Max.ult(Op1Min) || Op0Min.ugt(Op1Max))
+      return replaceInstUsesWith(
+          I, ConstantInt::getBool(I.getType(), Pred == CmpInst::ICMP_NE));

     // If all bits are known zero except for one, then we know at most one bit
     // is set. If the comparison is against zero, then this is a check to see if
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
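The getBool helper is a small factoring pattern: fold the `Cond ? getTrue(...) : getFalse(...)` ternary that every call site repeats into one named constructor, as the InstCombine hunk above demonstrates. A toy version (uniquing via process-wide statics rather than per LLVMContext, which is an assumption made here for brevity):

```cpp
#include <cassert>

// Toy singleton booleans mirroring the ConstantInt::getTrue/getFalse/getBool
// shape; the real objects are uniqued per LLVMContext, not per process.
struct BoolConst {
  bool value;
  static BoolConst *getTrue() {
    static BoolConst t{true};
    return &t;
  }
  static BoolConst *getFalse() {
    static BoolConst f{false};
    return &f;
  }
  // The new helper: one call replaces a ternary at every caller.
  static BoolConst *getBool(bool v) { return v ? getTrue() : getFalse(); }
};
```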
[llvm-branch-commits] [llvm] 7a97eeb - [Coroutines] checkAsyncFuncPointer - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T10:31:45Z
New Revision: 7a97eeb197a8023acbb800d40b3bb852fc2f5d60
URL: https://github.com/llvm/llvm-project/commit/7a97eeb197a8023acbb800d40b3bb852fc2f5d60
DIFF: https://github.com/llvm/llvm-project/commit/7a97eeb197a8023acbb800d40b3bb852fc2f5d60.diff

LOG: [Coroutines] checkAsyncFuncPointer - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.

We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.

Added:

Modified:
    llvm/lib/Transforms/Coroutines/Coroutines.cpp

Removed:

diff --git a/llvm/lib/Transforms/Coroutines/Coroutines.cpp b/llvm/lib/Transforms/Coroutines/Coroutines.cpp
index f0095a649b0c..6699a5c46313 100644
--- a/llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ b/llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -676,8 +676,8 @@ static void checkAsyncFuncPointer(const Instruction *I, Value *V) {
   if (!AsyncFuncPtrAddr)
     fail(I, "llvm.coro.id.async async function pointer not a global", V);

-  auto *StructTy = dyn_cast<StructType>(
-      AsyncFuncPtrAddr->getType()->getPointerElementType());
+  auto *StructTy =
+      cast<StructType>(AsyncFuncPtrAddr->getType()->getPointerElementType());
   if (StructTy->isOpaque() || !StructTy->isPacked() ||
       StructTy->getNumElements() != 2 ||
       !StructTy->getElementType(0)->isIntegerTy(32) ||
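The rule applied in this and the other cast<>-for-dereferenced-pointer commits: when the result is dereferenced immediately, a null return could only become a hidden crash, so prefer the form that asserts at the cast. A simplified stand-in for cast<>'s contract (RTTI-based, unlike the real llvm/Support/Casting.h; the names are illustrative):

```cpp
#include <cassert>

struct Type { virtual ~Type() = default; };
struct StructTy : Type { int numElements = 2; };

// Stand-in for llvm::cast<>: asserts the cast is valid instead of returning
// null, so an immediate member access can never be a null dereference.
template <typename To> To &checkedCast(Type &v) {
  To *p = dynamic_cast<To *>(&v);
  assert(p && "checkedCast<To>() argument of incompatible type!");
  return *p;
}
```

With this shape, `checkedCast<StructTy>(t).numElements` fails loudly at the cast on a type mismatch, which is exactly the behavior the static analyzer could not prove for the dyn_cast<> version.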
[llvm-branch-commits] [llvm] a000366 - [SimplifyIndVar] createWideIV - make WideIVInfo arg a const ref. NFCI.
Author: Simon Pilgrim Date: 2021-01-05T10:31:45Z New Revision: a000366d0502b35fc0d3b113ace7f0e3bbdc08cd URL: https://github.com/llvm/llvm-project/commit/a000366d0502b35fc0d3b113ace7f0e3bbdc08cd DIFF: https://github.com/llvm/llvm-project/commit/a000366d0502b35fc0d3b113ace7f0e3bbdc08cd.diff LOG: [SimplifyIndVar] createWideIV - make WideIVInfo arg a const ref. NFCI. The WideIVInfo arg is only ever used as a const. Fixes cppcheck warning. Added: Modified: llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h llvm/lib/Transforms/Utils/SimplifyIndVar.cpp Removed: diff --git a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h index 4599627b65f5..4ba56fb45afa 100644 --- a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h +++ b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h @@ -74,7 +74,7 @@ struct WideIVInfo { /// Widen Induction Variables - Extend the width of an IV to cover its /// widest uses. -PHINode *createWideIV(WideIVInfo , +PHINode *createWideIV(const WideIVInfo , LoopInfo *LI, ScalarEvolution *SE, SCEVExpander , DominatorTree *DT, SmallVectorImpl , unsigned , unsigned , diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp index f3b198094bd1..290c04a7ad10 100644 --- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp @@ -2076,7 +2076,7 @@ void WidenIV::calculatePostIncRanges(PHINode *OrigPhi) { } } -PHINode *llvm::createWideIV(WideIVInfo , +PHINode *llvm::createWideIV(const WideIVInfo , LoopInfo *LI, ScalarEvolution *SE, SCEVExpander , DominatorTree *DT, SmallVectorImpl , unsigned , unsigned , ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 84d5768 - MemProfiler::insertDynamicShadowAtFunctionEntry - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T09:34:01Z
New Revision: 84d5768d97635602225f5056da96b058e588b2f5
URL: https://github.com/llvm/llvm-project/commit/84d5768d97635602225f5056da96b058e588b2f5
DIFF: https://github.com/llvm/llvm-project/commit/84d5768d97635602225f5056da96b058e588b2f5.diff

LOG: MemProfiler::insertDynamicShadowAtFunctionEntry - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.

We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.

Added:

Modified:
    llvm/lib/Transforms/Instrumentation/MemProfiler.cpp

Removed:

diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index 56006bbc94c7..0e6a404a9e0b 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -577,7 +577,7 @@ bool MemProfiler::insertDynamicShadowAtFunctionEntry(Function &F) {
   Value *GlobalDynamicAddress = F.getParent()->getOrInsertGlobal(
       MemProfShadowMemoryDynamicAddress, IntptrTy);
   if (F.getParent()->getPICLevel() == PICLevel::NotPIC)
-    dyn_cast<GlobalVariable>(GlobalDynamicAddress)->setDSOLocal(true);
+    cast<GlobalVariable>(GlobalDynamicAddress)->setDSOLocal(true);
   DynamicShadowOffset = IRB.CreateLoad(IntptrTy, GlobalDynamicAddress);
   return true;
 }
[llvm-branch-commits] [llvm] 52e4489 - SystemZTargetLowering::lowerDYNAMIC_STACKALLOC - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T09:34:01Z
New Revision: 52e448974b2ec826c8af429c370c4d6e79ce5747
URL: https://github.com/llvm/llvm-project/commit/52e448974b2ec826c8af429c370c4d6e79ce5747
DIFF: https://github.com/llvm/llvm-project/commit/52e448974b2ec826c8af429c370c4d6e79ce5747.diff

LOG: SystemZTargetLowering::lowerDYNAMIC_STACKALLOC - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.

We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.

Added:

Modified:
    llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 663af1d64943..603446755aaf 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -3419,8 +3419,8 @@ lowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const {

   // If user has set the no alignment function attribute, ignore
   // alloca alignments.
-  uint64_t AlignVal = (RealignOpt ?
-                       dyn_cast<ConstantSDNode>(Align)->getZExtValue() : 0);
+  uint64_t AlignVal =
+      (RealignOpt ? cast<ConstantSDNode>(Align)->getZExtValue() : 0);

   uint64_t StackAlign = TFI->getStackAlignment();
   uint64_t RequiredAlign = std::max(AlignVal, StackAlign);
[llvm-branch-commits] [clang] 6725860 - Sema::BuildCallExpr - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.
Author: Simon Pilgrim
Date: 2021-01-05T09:34:00Z
New Revision: 6725860d21a03741d6c3331ab0560416bb19e068
URL: https://github.com/llvm/llvm-project/commit/6725860d21a03741d6c3331ab0560416bb19e068
DIFF: https://github.com/llvm/llvm-project/commit/6725860d21a03741d6c3331ab0560416bb19e068.diff

LOG: Sema::BuildCallExpr - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI.

We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.

Added:

Modified:
    clang/lib/Sema/SemaExpr.cpp

Removed:

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 3992a373f721..28f4c5bbf19b 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6484,7 +6484,7 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
          "should only occur in error-recovery path.");
   QualType ReturnType =
       llvm::isa_and_nonnull<FunctionDecl>(NDecl)
-          ? dyn_cast<FunctionDecl>(NDecl)->getCallResultType()
+          ? cast<FunctionDecl>(NDecl)->getCallResultType()
           : Context.DependentTy;
   return CallExpr::Create(Context, Fn, ArgExprs, ReturnType,
                           Expr::getValueKindForType(ReturnType), RParenLoc,
[llvm-branch-commits] [llvm] f7463ca - [ProfileData] GCOVFile::readGCNO - silence undefined pointer warning. NFCI.
Author: Simon Pilgrim
Date: 2021-01-04T16:50:05Z
New Revision: f7463ca3cc5ba8455c4611c5afa79c48d8a79326
URL: https://github.com/llvm/llvm-project/commit/f7463ca3cc5ba8455c4611c5afa79c48d8a79326
DIFF: https://github.com/llvm/llvm-project/commit/f7463ca3cc5ba8455c4611c5afa79c48d8a79326.diff

LOG: [ProfileData] GCOVFile::readGCNO - silence undefined pointer warning. NFCI.

Silence clang static analyzer warning that 'fn' could still be in an undefined state - this shouldn't happen depending on the likely tag order, but the analyzer can't know that.

Added:

Modified:
    llvm/lib/ProfileData/GCOV.cpp

Removed:

diff --git a/llvm/lib/ProfileData/GCOV.cpp b/llvm/lib/ProfileData/GCOV.cpp
index 2e1ba3338394..3332a898603b 100644
--- a/llvm/lib/ProfileData/GCOV.cpp
+++ b/llvm/lib/ProfileData/GCOV.cpp
@@ -111,7 +111,7 @@ bool GCOVFile::readGCNO(GCOVBuffer &buf) {
     buf.getWord(); // hasUnexecutedBlocks

   uint32_t tag, length;
-  GCOVFunction *fn;
+  GCOVFunction *fn = nullptr;
   while ((tag = buf.getWord())) {
     if (!buf.readInt(length))
       return false;
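The fix pattern here — definitely initializing a pointer that is only conditionally assigned inside a loop — is worth a minimal illustration. The helper below is hypothetical, not from GCOV.cpp; it just reproduces the shape the analyzer complains about and the one-token fix:

```cpp
#include <cassert>
#include <vector>

// Mirrors the GCOV fix: a pointer that is only assigned on some iterations
// must start as nullptr, otherwise reading it after a loop that matched
// nothing is undefined behavior (and a maybe-uninitialized warning).
int *findFirstEven(std::vector<int> &v) {
  int *found = nullptr; // the fix: definite initialization
  for (int &x : v) {
    if (x % 2 == 0) {
      found = &x;
      break;
    }
  }
  return found; // well-defined even when nothing matched
}
```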