Matthew Poremba has submitted this change. (
https://gem5-review.googlesource.com/c/public/gem5/+/55643 )
Change subject: arch-vega: Fix global 64-bit calcAddr with SGPR base
......................................................................
arch-vega: Fix global 64-bit calcAddr with SGPR base
Global instruction address calculation when using an SGPR or SGPR pair
as a base address was being calculated incorrectly when 64-bit addresses
were to be generated.
From the ISA documentation, the SGPR should be read as 32-bit or 64-bit
depending on "ADDRESS_MODE." The VGPR-offset (computed from the lower
32-bits of vaddr) should always be 32-bits and the offset is 12 bits
from the instruction. This means the 32-bit mask should only be applied
to vaddr to get the VGPU-offset rather than the final sum.
The SGPR base format is being seen in more recent clang/ROCm versions to
avoid unnecessary copies of SGPRs into VGPRs to use VGPRs as the base
address.
Change-Id: I48910611fcfac5b62bc63496bbaabd6f6e53fe0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/55643
Reviewed-by: Matt Sinclair <mattdsincl...@gmail.com>
Maintainer: Matt Sinclair <mattdsincl...@gmail.com>
Tested-by: kokoro <noreply+kok...@google.com>
---
M src/arch/amdgpu/vega/insts/op_encodings.hh
1 file changed, 40 insertions(+), 8 deletions(-)
Approvals:
Matt Sinclair: Looks good to me, approved; Looks good to me, approved
kokoro: Regressions pass
diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh
b/src/arch/amdgpu/vega/insts/op_encodings.hh
index a6dc58c..2642cd7 100644
--- a/src/arch/amdgpu/vega/insts/op_encodings.hh
+++ b/src/arch/amdgpu/vega/insts/op_encodings.hh
@@ -911,12 +911,14 @@
// be a 64-bit address. Otherwise, saddr is the reg index for a
// scalar reg used as the base address for a 32-bit address.
if ((saddr == 0x7f && isFlatGlobal()) || isFlat()) {
- calcAddr64(gpuDynInst, vaddr, offset);
+ calcAddrVgpr(gpuDynInst, vaddr, offset);
} else {
- ConstScalarOperandU32 sbase(gpuDynInst, saddr);
+ // Assume we are operating in 64-bit mode and read a pair
of
+ // SGPRs for the address base.
+ ConstScalarOperandU64 sbase(gpuDynInst, saddr);
sbase.read();
- calcAddr32(gpuDynInst, vaddr, sbase, offset);
+ calcAddrSgpr(gpuDynInst, vaddr, sbase, offset);
}
if (isFlat()) {
@@ -953,20 +955,23 @@
void generateGlobalDisassembly();
void
- calcAddr32(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &vaddr,
- ConstScalarOperandU32 &saddr, ScalarRegU32 offset)
+ calcAddrSgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &vaddr,
+ ConstScalarOperandU64 &saddr, ScalarRegU32 offset)
{
+ // Use SGPR pair as a base address and add VGPR-offset and
+ // instruction offset. The VGPR-offset is always 32-bits so we
+ // mask any upper bits from the vaddr.
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
gpuDynInst->addr.at(lane) =
- (vaddr[lane] + saddr.rawData() + offset) &
0xffffffff;
+ saddr.rawData() + (vaddr[lane] & 0xffffffff) +
offset;
}
}
}
void
- calcAddr64(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &addr,
- ScalarRegU32 offset)
+ calcAddrVgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &addr,
+ ScalarRegU32 offset)
{
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
if (gpuDynInst->exec_mask[lane]) {
--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/55643
To unsubscribe, or for help writing mail filters, visit
https://gem5-review.googlesource.com/settings
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I48910611fcfac5b62bc63496bbaabd6f6e53fe0d
Gerrit-Change-Number: 55643
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba <matthew.pore...@amd.com>
Gerrit-Reviewer: Kyle Roarty <kyleroarty1...@gmail.com>
Gerrit-Reviewer: Matt Sinclair <mattdsincl...@gmail.com>
Gerrit-Reviewer: Matthew Poremba <matthew.pore...@amd.com>
Gerrit-Reviewer: kokoro <noreply+kok...@google.com>
Gerrit-MessageType: merged
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s