[llvm-branch-commits] [llvm] [AMDGPU] Skip non-first termintors when forcing emit zero flag (PR #112116)
https://github.com/shiltian created
https://github.com/llvm/llvm-project/pull/112116
>From 59007297616ea2e11a06401b79833953318d42e0 Mon Sep 17 00:00:00 2001
From: Shilei Tian
Date: Sat, 12 Oct 2024 23:58:25 -0400
Subject: [PATCH] [AMDGPU] Skip non-first termintors when forcing emit zero
flag
---
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 15 -
.../waitcnt-debug-non-first-terminators.mir | 22 +++
2 files changed, 36 insertions(+), 1 deletion(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 9866ecbdddb608..28e26dc47b0ab4 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1600,6 +1600,17 @@ static bool callWaitsOnFunctionReturn(const MachineInstr &MI) {
return true;
}
+/// \returns true if \p MI is not the first terminator of its associated MBB.
+static bool checkIfMBBNonFirstTerminator(const MachineInstr &MI) {
+  const auto &MBB = MI.getParent();
+  if (MBB->getFirstTerminator() == MI)
+    return false;
+  for (const auto &I : MBB->terminators())
+    if (&I == &MI)
+      return true;
+  return false;
+}
+
/// Generate s_waitcnt instruction to be placed before cur_Inst.
/// Instructions of a given type are returned in order,
/// but instructions of different types can complete out of order.
@@ -1825,7 +1836,9 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
// Verify that the wait is actually needed.
ScoreBrackets.simplifyWaitcnt(Wait);
- if (ForceEmitZeroFlag)
+ // When forcing emit, we need to skip non-first terminators of a MBB because
+ // that would break the terminators of the MBB.
+ if (ForceEmitZeroFlag && !checkIfMBBNonFirstTerminator(MI))
Wait = WCG->getAllZeroWaitcnt(/*IncludeVSCnt=*/false);
if (ForceEmitWaitcnt[LOAD_CNT])
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
new file mode 100644
index 00..530d1981f053e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir
@@ -0,0 +1,22 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass si-insert-waitcnts -amdgpu-waitcnt-forcezero=1 %s -o - | FileCheck %s
+
+...
+
+# CHECK-LABEL: waitcnt-debug-non-first-terminators
+# CHECK: S_WAITCNT 0
+# CHECK-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+# CHECK-NEXT: S_BRANCH %bb.2, implicit $scc
+
+name: waitcnt-debug-non-first-terminators
+liveins:
+machineFunctionInfo:
+ isEntryFunction: true
+body: |
+  bb.0:
+    S_CBRANCH_SCC1 %bb.1, implicit $scc
+    S_BRANCH %bb.2, implicit $scc
+  bb.1:
+    S_NOP 0
+  bb.2:
+    S_NOP 0
+...
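
For readers without an LLVM checkout, the effect of the new check can be sketched in plain C++. This is only a toy model under stated assumptions: `Instr`, `Block`, `firstTerminator`, `isNonFirstTerminator`, and `forceZeroWaits` are hypothetical stand-ins for illustration, not LLVM APIs, and the real pass does considerably more than prepend a wait.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineInstr: a name plus a terminator flag.
struct Instr {
  std::string Name;
  bool IsTerminator;
};

// Hypothetical stand-in for MachineBasicBlock. As in MIR, all terminators
// are assumed to form one contiguous run at the end of the block.
using Block = std::vector<Instr>;

// Index of the first terminator, or B.size() if the block has none.
std::size_t firstTerminator(const Block &B) {
  std::size_t I = B.size();
  while (I > 0 && B[I - 1].IsTerminator)
    --I;
  return I;
}

// True if B[Idx] is a terminator other than the first one in the block.
bool isNonFirstTerminator(const Block &B, std::size_t Idx) {
  return B[Idx].IsTerminator && Idx > firstTerminator(B);
}

// Toy version of the force-zero mode: put a wait before every instruction,
// except before non-first terminators, so the terminator run stays intact.
Block forceZeroWaits(const Block &B) {
  Block Out;
  for (std::size_t I = 0; I < B.size(); ++I) {
    if (!isNonFirstTerminator(B, I))
      Out.push_back({"S_WAITCNT 0", false});
    Out.push_back(B[I]);
  }
  return Out;
}
```

Feeding this model the two terminators from the test above (`S_CBRANCH_SCC1` followed by `S_BRANCH`) yields a single wait ahead of the conditional branch and none between the two branches, which is the shape the CHECK lines expect.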
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Skip non-first termintors when forcing emit zero flag (PR #112116)
shiltian wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#112116** 👈
* **#112114**
* `main`

This stack of pull requests is managed by Graphite.
https://github.com/llvm/llvm-project/pull/112116
[llvm-branch-commits] [llvm] [AMDGPU] Skip non-first termintors when forcing emit zero flag (PR #112116)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Shilei Tian (shiltian)
Changes
---
Full diff: https://github.com/llvm/llvm-project/pull/112116.diff
2 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+14-1)
- (added) llvm/test/CodeGen/AMDGPU/waitcnt-debug-non-first-terminators.mir (+22)
https://github.com/llvm/llvm-project/pull/112116
[llvm-branch-commits] [llvm] [AMDGPU] Skip non-first termintors when forcing emit zero flag (PR #112116)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/112116
[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++] Avoid re-exporting a few specific symbols from libc++abi (#109054) (PR #110677)
var-const wrote: @tru Thank you so much for accepting this! It looks like the CI is clean now. https://github.com/llvm/llvm-project/pull/110677
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
https://github.com/wangpc-pp updated
https://github.com/llvm/llvm-project/pull/107548
>From f21cfcfc90330ee3856746b6315a81a00313b0e0 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng
Date: Fri, 6 Sep 2024 17:20:51 +0800
Subject: [PATCH 1/5] [𝘀𝗽𝗿] initial version
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Created using spr 1.3.6-beta.1
---
.../Target/RISCV/RISCVTargetTransformInfo.cpp | 15 +
.../Target/RISCV/RISCVTargetTransformInfo.h | 3 +
llvm/test/CodeGen/RISCV/memcmp.ll | 932 ++
3 files changed, 950 insertions(+)
create mode 100644 llvm/test/CodeGen/RISCV/memcmp.ll
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index e809e15eacf696..ad532aadc83266 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2113,3 +2113,18 @@ bool RISCVTTIImpl::shouldConsiderAddressTypePromotion(
}
return Considerable;
}
+
+RISCVTTIImpl::TTI::MemCmpExpansionOptions
+RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
+  TTI::MemCmpExpansionOptions Options;
+  // FIXME: Vector haven't been tested.
+  Options.AllowOverlappingLoads =
+      (ST->enableUnalignedScalarMem() || ST->enableUnalignedScalarMem());
+  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+  Options.NumLoadsPerBlock = Options.MaxNumLoads;
+  if (ST->is64Bit())
+    Options.LoadSizes.push_back(8);
+  llvm::append_range(Options.LoadSizes, ArrayRef({4, 2, 1}));
+  Options.AllowedTailExpansions = {3, 5, 6};
+  return Options;
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 763b89bfec0a66..ee9bed09df97f3 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -404,6 +404,9 @@ class RISCVTTIImpl : public BasicTTIImplBase {
shouldConsiderAddressTypePromotion(const Instruction &I,
bool &AllowPromotionWithoutCommonHeader);
std::optional getMinPageSize() const { return 4096; }
+
+  TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,
+                                                    bool IsZeroCmp) const;
};
} // end namespace llvm
diff --git a/llvm/test/CodeGen/RISCV/memcmp.ll b/llvm/test/CodeGen/RISCV/memcmp.ll
new file mode 100644
index 00..652cd02e2c750a
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/memcmp.ll
@@ -0,0 +1,932 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: sed 's/iXLen/i32/g' %s | llc -mtriple=riscv32 -O2 | FileCheck %s --check-prefix=CHECK-ALIGNED-RV32
+; RUN: sed 's/iXLen/i64/g' %s | llc -mtriple=riscv64 -O2 | FileCheck %s --check-prefix=CHECK-ALIGNED-RV64
+; RUN: sed 's/iXLen/i32/g' %s | llc -mtriple=riscv32 -mattr=+unaligned-scalar-mem -O2 \
+; RUN:   | FileCheck %s --check-prefix=CHECK-UNALIGNED-RV32
+; RUN: sed 's/iXLen/i64/g' %s | llc -mtriple=riscv64 -mattr=+unaligned-scalar-mem -O2 \
+; RUN:   | FileCheck %s --check-prefix=CHECK-UNALIGNED-RV64
+
+declare i32 @bcmp(i8*, i8*, iXLen) nounwind readonly
+declare i32 @memcmp(i8*, i8*, iXLen) nounwind readonly
+
+define i1 @bcmp_size_15(i8* %s1, i8* %s2) {
+; CHECK-ALIGNED-RV32-LABEL: bcmp_size_15:
+; CHECK-ALIGNED-RV32: # %bb.0: # %entry
+; CHECK-ALIGNED-RV32-NEXT:    lbu a2, 1(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a3, 0(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a4, 2(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a5, 3(a0)
+; CHECK-ALIGNED-RV32-NEXT:    slli a2, a2, 8
+; CHECK-ALIGNED-RV32-NEXT:    or a2, a2, a3
+; CHECK-ALIGNED-RV32-NEXT:    slli a4, a4, 16
+; CHECK-ALIGNED-RV32-NEXT:    slli a5, a5, 24
+; CHECK-ALIGNED-RV32-NEXT:    or a4, a5, a4
+; CHECK-ALIGNED-RV32-NEXT:    or a2, a4, a2
+; CHECK-ALIGNED-RV32-NEXT:    lbu a3, 1(a1)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a4, 0(a1)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a5, 2(a1)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a6, 3(a1)
+; CHECK-ALIGNED-RV32-NEXT:    slli a3, a3, 8
+; CHECK-ALIGNED-RV32-NEXT:    or a3, a3, a4
+; CHECK-ALIGNED-RV32-NEXT:    slli a5, a5, 16
+; CHECK-ALIGNED-RV32-NEXT:    slli a6, a6, 24
+; CHECK-ALIGNED-RV32-NEXT:    or a4, a6, a5
+; CHECK-ALIGNED-RV32-NEXT:    or a3, a4, a3
+; CHECK-ALIGNED-RV32-NEXT:    xor a2, a2, a3
+; CHECK-ALIGNED-RV32-NEXT:    lbu a3, 5(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a4, 4(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a5, 6(a0)
+; CHECK-ALIGNED-RV32-NEXT:    lbu a6, 7(a0)
+; CHECK-ALIGNED-RV32-NEXT:    slli a3, a3, 8
+; CHECK-ALIGNED-RV32-NEXT:    or a3, a3, a4
+; CHECK-ALIGNED-RV32-NEXT:    slli a5, a5, 16
+; CHECK-ALIGNED-RV32-NEXT:    slli a6, a6, 24
+; CHECK-ALIGNED-RV32-NEXT:    or a4, a6, a5
+; CHECK-ALIGNED-RV32-NEXT:    or a3, a4, a3
+; CHECK-ALIGNED-RV32-NEXT:    lbu a4, 5(a1)
+; CHECK-ALIGNED-RV32-NEXT
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
@@ -112,42 +104,46 @@ entry:
define i32 @bcmp_size_2(ptr %s1, ptr %s2) nounwind optsize {
; CHECK-ALIGNED-RV32-LABEL: bcmp_size_2:
; CHECK-ALIGNED-RV32: # %bb.0: # %entry
-; CHECK-ALIGNED-RV32-NEXT:    addi sp, sp, -16
-; CHECK-ALIGNED-RV32-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK-ALIGNED-RV32-NEXT:    li a2, 2
-; CHECK-ALIGNED-RV32-NEXT:    call bcmp
-; CHECK-ALIGNED-RV32-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; CHECK-ALIGNED-RV32-NEXT:    addi sp, sp, 16
+; CHECK-ALIGNED-RV32-NEXT:    lbu a2, 1(a0)
wangpc-pp wrote:
This turns out to be more complicated than I expected. In this case we
generate i16 loads and compares, which are then expanded again because
unaligned loads are illegal. When unaligned access is not available, we could
instead expand memcmp directly into a byte loop.
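
A rough C++ model of the two shapes discussed above, for a 2-byte bcmp. This is a sketch only: `loadU16Bytewise`, `bcmp2ViaI16`, and `bcmp2Bytewise` are hypothetical names made up for illustration, and the code mirrors the idea rather than the exact instructions llc emits.

```cpp
#include <cassert>
#include <cstdint>

// An i16 load legalized into byte loads: two lbu-style reads combined with
// a shift and an or (little-endian), as happens when unaligned loads are
// illegal on the target.
inline std::uint16_t loadU16Bytewise(const unsigned char *P) {
  return static_cast<std::uint16_t>(P[0] | (P[1] << 8));
}

// Shape 1: expand bcmp(A, B, 2) as a single i16 compare, where each i16
// value is itself rebuilt from byte loads. Returns 1 if the buffers differ.
inline int bcmp2ViaI16(const unsigned char *A, const unsigned char *B) {
  return loadU16Bytewise(A) != loadU16Bytewise(B);
}

// Shape 2: compare byte by byte directly and skip reassembling wider
// values. Returns 1 if the buffers differ.
inline int bcmp2Bytewise(const unsigned char *A, const unsigned char *B) {
  return ((A[0] ^ B[0]) | (A[1] ^ B[1])) != 0;
}
```

Both functions return the same result for any pair of 2-byte buffers; the second form avoids the shift/or sequence needed to rebuild the i16 values.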
https://github.com/llvm/llvm-project/pull/107548
