[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute,configs: Make sim exits conditional
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email )

Change subject: gpu-compute,configs: Make sim exits conditional
..

gpu-compute,configs: Make sim exits conditional

The unconditional exit event when a kernel completes that was added in
c644eae2ddd34cf449a9c4476730bd29703c4dd7 is causing scripts that do not
ignore unknown exit events to end simulation prematurely. One such
script is the apu_se.py script used in SE mode GPU simulation. Make this
exit conditional on the parameter being set to a valid value to avoid
this problem.

Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72098
Reviewed-by: Jason Lowe-Power
Maintainer: Matt Sinclair
Maintainer: Jason Lowe-Power
Reviewed-by: Matt Sinclair
Tested-by: kokoro
---
M configs/example/gpufs/system/system.py
M src/gpu-compute/GPU.py
M src/gpu-compute/dispatcher.cc
M src/gpu-compute/dispatcher.hh
4 files changed, 13 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass

diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 40e0016..19df310 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -115,7 +115,8 @@
         numHWQueues=args.num_hw_queues,
         walker=hsapp_pt_walker,
     )
-    dispatcher = GPUDispatcher()
+    dispatcher_exit_events = True if args.exit_at_gpu_kernel > -1 else False
+    dispatcher = GPUDispatcher(kernel_exit_events=dispatcher_exit_events)
     cp_pt_walker = VegaPagetableWalker()
     gpu_cmd_proc = GPUCommandProcessor(
         hsapp=gpu_hsapp, dispatcher=dispatcher, walker=cp_pt_walker
diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index c5449cc..c64a6b7 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -328,6 +328,10 @@
     cxx_class = "gem5::GPUDispatcher"
     cxx_header = "gpu-compute/dispatcher.hh"

+    kernel_exit_events = Param.Bool(
+        False, "Enable exiting sim loop after a kernel"
+    )
+

 class GPUCommandProcessor(DmaVirtDevice):
     type = "GPUCommandProcessor"
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index b19bccc..7b36bce 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -51,7 +51,8 @@
     : SimObject(p), shader(nullptr), gpuCmdProc(nullptr),
       tickEvent([this]{ exec(); },
           "GPU Dispatcher tick", false, Event::CPU_Tick_Pri),
-      dispatchActive(false), stats(this)
+      dispatchActive(false), kernelExitEvents(p.kernel_exit_events),
+      stats(this)
 {
     schedule(&tickEvent, 0);
 }
@@ -332,7 +333,9 @@
                 curTick(), kern_id);

             DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);
-            exitSimLoop("GPU Kernel Completed");
+            if (kernelExitEvents) {
+                exitSimLoop("GPU Kernel Completed");
+            }
         }

         if (!tickEvent.scheduled()) {
diff --git a/src/gpu-compute/dispatcher.hh b/src/gpu-compute/dispatcher.hh
index 7699cef..eafa080 100644
--- a/src/gpu-compute/dispatcher.hh
+++ b/src/gpu-compute/dispatcher.hh
@@ -92,6 +92,8 @@
     std::queue<int> doneIds;
     // is there a kernel in execution?
     bool dispatchActive;
+    // Enable exiting sim loop after each kernel completion
+    bool kernelExitEvents;

   protected:
     struct GPUDispatcherStats : public statistics::Group

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Gerrit-Change-Number: 72098
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: Bobby Bruce
Gerrit-CC: kokoro
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
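The effect of change 72098 on a script that stops at the first exit event can be sketched in plain Python. This is only a toy model, not gem5 code: ToyDispatcher, exit_events_enabled, first_exit_cause and the "workload finished" cause string are hypothetical names invented for this illustration. Only the "GPU Kernel Completed" string and the "> -1" check come from the diff.

```python
from dataclasses import dataclass
from typing import Iterator

KERNEL_DONE = "GPU Kernel Completed"  # exit cause string taken from the diff
WORKLOAD_DONE = "workload finished"   # placeholder cause, assumed for this sketch


def exit_events_enabled(exit_at_gpu_kernel: int) -> bool:
    """Mirror of the config-side check: any value above -1 enables exits."""
    return exit_at_gpu_kernel > -1


@dataclass
class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher; not the gem5 SimObject."""

    kernel_exit_events: bool = False

    def exit_events(self, num_kernels: int) -> Iterator[str]:
        """Yield the exit causes a run with num_kernels kernels would raise."""
        for _ in range(num_kernels):
            # After the change, a per-kernel exit is raised only when enabled.
            if self.kernel_exit_events:
                yield KERNEL_DONE
        yield WORKLOAD_DONE


def first_exit_cause(dispatcher: ToyDispatcher, num_kernels: int) -> str:
    """Model a script that stops at the first exit event it sees."""
    return next(dispatcher.exit_events(num_kernels))
```

With the flag left at its default of False, such a script only sees the final exit cause. With the flag set, it stops after the first kernel, which is the behavior the commit message describes as premature for scripts that do not expect it.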
[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute,configs: Make sim exits conditional
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email )

Change subject: gpu-compute,configs: Make sim exits conditional
..

gpu-compute,configs: Make sim exits conditional

The unconditional exit event when a kernel completes that was added in
c644eae2ddd34cf449a9c4476730bd29703c4dd7 is causing scripts that do not
ignore unknown exit events to end simulation prematurely. One such
script is the apu_se.py script used in SE mode GPU simulation. Make this
exit conditional on the parameter being set to a valid value to avoid
this problem.

Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
---
M configs/example/gpufs/system/system.py
M src/gpu-compute/GPU.py
M src/gpu-compute/dispatcher.cc
M src/gpu-compute/dispatcher.hh
4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 40e0016..19df310 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -115,7 +115,8 @@
         numHWQueues=args.num_hw_queues,
         walker=hsapp_pt_walker,
     )
-    dispatcher = GPUDispatcher()
+    dispatcher_exit_events = True if args.exit_at_gpu_kernel > -1 else False
+    dispatcher = GPUDispatcher(kernel_exit_events=dispatcher_exit_events)
     cp_pt_walker = VegaPagetableWalker()
     gpu_cmd_proc = GPUCommandProcessor(
         hsapp=gpu_hsapp, dispatcher=dispatcher, walker=cp_pt_walker
diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index c5449cc..c64a6b7 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -328,6 +328,10 @@
     cxx_class = "gem5::GPUDispatcher"
     cxx_header = "gpu-compute/dispatcher.hh"

+    kernel_exit_events = Param.Bool(
+        False, "Enable exiting sim loop after a kernel"
+    )
+

 class GPUCommandProcessor(DmaVirtDevice):
     type = "GPUCommandProcessor"
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index b19bccc..7b36bce 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -51,7 +51,8 @@
     : SimObject(p), shader(nullptr), gpuCmdProc(nullptr),
       tickEvent([this]{ exec(); },
           "GPU Dispatcher tick", false, Event::CPU_Tick_Pri),
-      dispatchActive(false), stats(this)
+      dispatchActive(false), kernelExitEvents(p.kernel_exit_events),
+      stats(this)
 {
     schedule(&tickEvent, 0);
 }
@@ -332,7 +333,9 @@
                 curTick(), kern_id);

             DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);
-            exitSimLoop("GPU Kernel Completed");
+            if (kernelExitEvents) {
+                exitSimLoop("GPU Kernel Completed");
+            }
         }

         if (!tickEvent.scheduled()) {
diff --git a/src/gpu-compute/dispatcher.hh b/src/gpu-compute/dispatcher.hh
index 7699cef..eafa080 100644
--- a/src/gpu-compute/dispatcher.hh
+++ b/src/gpu-compute/dispatcher.hh
@@ -92,6 +92,8 @@
     std::queue<int> doneIds;
     // is there a kernel in execution?
     bool dispatchActive;
+    // Enable exiting sim loop after each kernel completion
+    bool kernelExitEvents;

   protected:
     struct GPUDispatcherStats : public statistics::Group

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Gerrit-Change-Number: 72098
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [L] Change in gem5/gem5[develop]: configs: Create base GPUFS vega config and atomic config
{} {}"
-echo "{}" | base64 -d > myapp
-chmod +x myapp
-./myapp {}
-/sbin/m5 exit
-"""
-
-demo_runscript_with_checkpoint = """\
-export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
-export HSA_ENABLE_INTERRUPT=0
-dmesg -n8
-dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
-if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
-    echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
-    /sbin/m5 exit
-fi
-modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
-echo "Running {} {}"
-echo "{}" | base64 -d > myapp
-chmod +x myapp
-/sbin/m5 checkpoint
-./myapp {}
-/sbin/m5 exit
-"""
-
-
-def addDemoOptions(parser):
-    parser.add_argument(
-        "-a", "--app", default=None, help="GPU application to run"
-    )
-    parser.add_argument(
-        "-o", "--opts", default="", help="GPU application arguments"
-    )
-
-
-if __name__ == "__m5_main__":
-    parser = argparse.ArgumentParser()
-    runfs.addRunFSOptions(parser)
-    Options.addCommonOptions(parser)
-    AmdGPUOptions.addAmdGPUOptions(parser)
-    Ruby.define_options(parser)
-    GPUTLBOptions.tlb_options(parser)
-    addDemoOptions(parser)
-
-    # Parse now so we can override options
-    args = parser.parse_args()
-    demo_runscript = ""
-
-    # Create temp script to run application
-    if args.app is None:
-        print(f"No application given. Use {sys.argv[0]} -a ")
-        sys.exit(1)
-    elif args.kernel is None:
-        print(f"No kernel path given. Use {sys.argv[0]} --kernel ")
-        sys.exit(1)
-    elif args.disk_image is None:
-        print(f"No disk path given. Use {sys.argv[0]} --disk-image ")
-        sys.exit(1)
-    elif args.gpu_mmio_trace is None:
-        print(f"No MMIO trace path. Use {sys.argv[0]} --gpu-mmio-trace ")
-        sys.exit(1)
-    elif not os.path.isfile(args.app):
-        print("Could not find applcation", args.app)
-        sys.exit(1)
-
-    # Choose runscript Based on whether any checkpointing args are set
-    if args.checkpoint_dir is not None:
-        demo_runscript = demo_runscript_with_checkpoint
-    else:
-        demo_runscript = demo_runscript_without_checkpoint
-
-    with open(os.path.abspath(args.app), "rb") as binfile:
-        encodedBin = base64.b64encode(binfile.read()).decode()
-
-    _, tempRunscript = tempfile.mkstemp()
-    with open(tempRunscript, "w") as b64file:
-        runscriptStr = demo_runscript.format(
-            args.app, args.opts, encodedBin, args.opts
-        )
-        b64file.write(runscriptStr)
-
-    if args.second_disk == None:
-        args.second_disk = args.disk_image
-
-    # Defaults for Vega10
-    args.ruby = True
-    args.cpu_type = "X86KvmCPU"
-    args.num_cpus = 1
-    args.mem_size = "3GB"
-    args.dgpu = True
-    args.dgpu_mem_size = "16GB"
-    args.dgpu_start = "0GB"
-    args.checkpoint_restore = 0
-    args.disjoint = True
-    args.timing_gpu = True
-    args.script = tempRunscript
-    args.dgpu_xor_low_bit = 0
-
-    # Run gem5
-    runfs.runGpuFSSystem(args)
+vega10.runVegaGPUFS("X86KvmCPU")

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71939?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I094bc4d4df856563535c28c1f6d6cc045d6734cd
Gerrit-Change-Number: 71939
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Bobby Bruce
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: kokoro
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
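The runscript templating in the config from change 71939 (base64-encode the application, then fill four positional slots) can be tried outside gem5 with the standard library alone. This is a minimal sketch: build_runscript and the shortened RUNSCRIPT_TEMPLATE are names made up for this illustration, and the template keeps only the lines that contain placeholders plus the final m5 exit.

```python
import base64

# Shortened template for illustration; the four "{}" slots are filled
# positionally with the app name, options, encoded binary, and options.
RUNSCRIPT_TEMPLATE = """\
echo "Running {} {}"
echo "{}" | base64 -d > myapp
chmod +x myapp
./myapp {}
/sbin/m5 exit
"""


def build_runscript(app_name: str, app_bytes: bytes, opts: str) -> str:
    """Embed the application binary as base64 text inside the guest script."""
    encoded = base64.b64encode(app_bytes).decode()
    return RUNSCRIPT_TEMPLATE.format(app_name, opts, encoded, opts)


if __name__ == "__main__":
    print(build_runscript("square", b"\x7fELF", "-n 4"))
```

Embedding the binary as text lets the guest recreate it with base64 -d, so the application does not have to be present on the disk image beforehand.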
[gem5-dev] [L] Change in gem5/gem5[develop]: configs: Create base GPUFS vega config and atomic config
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71939?usp=email )

Change subject: configs: Create base GPUFS vega config and atomic config
..

configs: Create base GPUFS vega config and atomic config

Move the Vega KVM script code to a common base file and add scripts for
KVM and atomic. Since atomic is now possible in GPUFS this gives a way
to run it without editing the current scripts.

Change-Id: I094bc4d4df856563535c28c1f6d6cc045d6734cd
---
A configs/example/gpufs/vega10.py
A configs/example/gpufs/vega10_atomic.py
M configs/example/gpufs/vega10_kvm.py
3 files changed, 188 insertions(+), 124 deletions(-)

diff --git a/configs/example/gpufs/vega10.py b/configs/example/gpufs/vega10.py
new file mode 100644
index 000..9eff5a2
--- /dev/null
+++ b/configs/example/gpufs/vega10.py
@@ -0,0 +1,153 @@
+# Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from this
+# software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+import m5
+import runfs
+import base64
+import tempfile
+import argparse
+import sys
+import os
+
+from amd import AmdGPUOptions
+from common import Options
+from common import GPUTLBOptions
+from ruby import Ruby
+
+
+demo_runscript_without_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+    echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+    /sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+./myapp {}
+/sbin/m5 exit
+"""
+
+demo_runscript_with_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+    echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+    /sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+/sbin/m5 checkpoint
+./myapp {}
+/sbin/m5 exit
+"""
+
+
+def addDemoOptions(parser):
+    parser.add_argument(
+        "-a", "--app", default=None, help="GPU application to run"
+    )
+    parser.add_argument(
+        "-o", "--opts", default="", help="GPU application arguments"
+    )
+
+
+def runVegaGPUFS(cpu_type):
+    parser = argparse.ArgumentParser()
+    runfs.addRunFSOptions(parser)
+    Options.addCommonOptions(parser)
+    AmdGPUOptions.addAmdGPUOptions(parser)
+    Ruby.define_options(parser)
+    GPUTLBOptions.tlb_options(parser)
+    addDemoOptions(parser)
+
+    # Parse now so we can override options
+    args = parser.parse_args()
+    demo_runscript = ""
+
+    # Create temp script to run application
+    if args.app is None:
+        print(f"No application given. Use {sys.argv[0]} -a ")
+        sys.exit(1)
+    elif args.kernel is None:
+        print(f"No kernel path given. Use {sys.argv[0]} --kernel ")
+        sys.exit(1)
+    elif args.disk_image is None:
+        print(f"No disk path given. Use {sys.argv[0]} --disk-image ")
+        sys.exit(1)
+    elif args.gpu_mmio_trace is None:
+        print(f"No MMIO trace path. Use {sys.argv[0
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add GPUFS --root-partition option
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email )

Change subject: configs: Add GPUFS --root-partition option
..

configs: Add GPUFS --root-partition option

Different GPUFS disk images have different root partitions that Linux
needs to boot from. In particular, Ubuntu's new installer has a GRUB
partition that cannot seem to be removed. Adding this as an option
prevents needing to edit a config script to change one character each
time a different disk image is used.

Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71918
Tested-by: kokoro
Reviewed-by: Matt Sinclair
Maintainer: Matt Sinclair
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/system.py
2 files changed, 8 insertions(+), 1 deletion(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved

diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index b045b80..5346622 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -151,6 +151,13 @@
         help="Exit simulation after running this many kernels",
     )

+    parser.add_argument(
+        "--root-partition",
+        type=str,
+        default="/dev/sda1",
+        help="Root partition of disk image",
+    )
+

 def runGpuFSSystem(args):
     """
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 263ffc0..40e0016 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -50,7 +50,7 @@
         "earlyprintk=ttyS0",
         "console=ttyS0,9600",
         "lpj=723",
-        "root=/dev/sda1",
+        f"root={args.root_partition}",
         "drm_kms_helper.fbdev_emulation=0",
         "modprobe.blacklist=amdgpu",
         "modprobe.blacklist=psmouse",

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Gerrit-Change-Number: 71918
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: kokoro
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
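How the --root-partition option from change 71918 flows into the kernel command line can be sketched with argparse alone. The helper name build_kernel_cmdline is an assumption made for this illustration, and the boot option list is cut down to the entries visible in the diff context.

```python
import argparse
from typing import List, Optional


def build_kernel_cmdline(argv: Optional[List[str]] = None) -> str:
    """Parse --root-partition and splice it into the boot options."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--root-partition",
        type=str,
        default="/dev/sda1",
        help="Root partition of disk image",
    )
    args = parser.parse_args(argv)

    # Subset of the boot options shown in the diff context above.
    boot_options = [
        "earlyprintk=ttyS0",
        "console=ttyS0,9600",
        "lpj=723",
        f"root={args.root_partition}",
        "drm_kms_helper.fbdev_emulation=0",
    ]
    return " ".join(boot_options)


if __name__ == "__main__":
    print(build_kernel_cmdline(["--root-partition", "/dev/sda2"]))
```

Omitting the flag keeps the previous behavior (root=/dev/sda1), so existing command lines should be unaffected.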
[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email )

Change subject: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
..

arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions, each of which
represents 32 opcodes. The subDecode table for opcodes 32-63 is missing,
so it is added here.

The opcode for V_SWAP_B32 is also off by one: in the ISA manual this
instruction is opcode 81, the instruction before it is 79, and there is
no opcode 80, so the decoder entry is swapped with the invalid decoding
below it.

Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair
Maintainer: Matt Sinclair
Tested-by: kokoro
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass

diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index fd3a803..a86dd66 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -495,7 +495,7 @@
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,
         &Decoder::subDecode_OP_FLAT,
-        &Decoder::decode_invalid,
+        &Decoder::subDecode_OP_FLAT,
         &Decoder::subDecode_OP_FLAT,
         &Decoder::subDecode_OP_FLAT,
         &Decoder::decode_invalid,
@@ -3140,8 +3140,8 @@
         &Decoder::decode_OP_VOP1__V_CVT_NORM_I16_F16,
         &Decoder::decode_OP_VOP1__V_CVT_NORM_U16_F16,
         &Decoder::decode_OP_VOP1__V_SAT_PK_U8_I16,
-        &Decoder::decode_OP_VOP1__V_SWAP_B32,
         &Decoder::decode_invalid,
+        &Decoder::decode_OP_VOP1__V_SWAP_B32,
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Gerrit-Change-Number: 71899
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: kokoro
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
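The V_SWAP_B32 off-by-one from change 71899 is easy to see in a toy opcode-indexed table. The sketch below is illustrative only: the table size of 128 and the names make_vop1_table and decode are assumptions, and handlers are represented by plain strings rather than member-function pointers. The opcode numbers 79, 80 and 81 come from the commit message.

```python
from typing import List

INVALID = "decode_invalid"
TABLE_SIZE = 128  # assumed size, chosen only to make the sketch self-contained


def make_vop1_table(swap_index: int) -> List[str]:
    """Build a table indexed by opcode, with V_SWAP_B32 at swap_index."""
    table = [INVALID] * TABLE_SIZE
    table[79] = "V_SAT_PK_U8_I16"  # the entry just before V_SWAP_B32
    table[swap_index] = "V_SWAP_B32"
    return table


def decode(table: List[str], opcode: int) -> str:
    """Look up the handler name for an opcode."""
    return table[opcode]


buggy_table = make_vop1_table(80)  # entry directly after opcode 79
fixed_table = make_vop1_table(81)  # opcode 80 left invalid, as in the ISA manual

if __name__ == "__main__":
    print(decode(buggy_table, 81), decode(fixed_table, 81))
```

Because the table is indexed positionally, placing the entry directly after opcode 79 puts it at index 80, so an instruction encoded with opcode 81 falls through to the invalid handler until the two entries are swapped.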
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Perform frame writes atomically
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email )

Change subject: dev-amdgpu: Perform frame writes atomically
..

dev-amdgpu: Perform frame writes atomically

The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.

The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.

This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.

Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair
Tested-by: kokoro
Reviewed-by: Matt Sinclair
Maintainer: Matthew Poremba
Reviewed-by: Matthew Poremba
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 16 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass
  Matthew Poremba: Looks good to me, approved; Looks good to me, approved

diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3260d05..d1058f1 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -349,6 +349,22 @@
     }

     nbio.writeFrame(pkt, offset);
+
+    /*
+     * Write the value to device memory. This must be done functionally
+     * because this method is called by the PCIDevice::write method which
+     * is a non-timing write.
+     */
+    RequestPtr req = std::make_shared<Request>(offset, pkt->getSize(), 0,
+                                               vramRequestorId());
+    PacketPtr writePkt = Packet::createWrite(req);
+    uint8_t *dataPtr = new uint8_t[pkt->getSize()];
+    std::memcpy(dataPtr, pkt->getPtr<uint8_t>(),
+                pkt->getSize() * sizeof(uint8_t));
+    writePkt->dataDynamic(dataPtr);
+
+    auto system = cp->shader()->gpuCmdProc.system();
+    system->getDeviceMemory(writePkt)->access(writePkt);
 }

 void
@@ -489,8 +505,6 @@

     switch (barnum) {
       case FRAMEBUFFER_BAR:
-        gpuMemMgr->writeRequest(offset, pkt->getPtr<uint8_t>(),
-                                pkt->getSize(), 0, nullptr);
         writeFrame(pkt, offset);
         break;
       case DOORBELL_BAR:

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Gerrit-Change-Number: 71898
Gerrit-PatchSet: 4
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: kokoro
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
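The hazard described in change 71898 (a read arriving while the matching write is still queued) can be reproduced with a toy memory model. Everything in this sketch is hypothetical: ToyFrameBuffer and its methods are invented names that only imitate the ordering problem and do not correspond to gem5 classes or to the DMA device's real behavior.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ToyFrameBuffer:
    """Toy model of frame memory with an optional queued write path."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.pending: Deque[Tuple[int, int]] = deque()

    def write_queued(self, addr: int, value: int) -> None:
        """Old behavior: the write waits in a queue until drain() runs."""
        self.pending.append((addr, value))

    def write_atomic(self, addr: int, value: int) -> None:
        """New behavior: the write is visible as soon as the call returns."""
        self.memory[addr] = value

    def drain(self) -> None:
        """Apply all queued writes in arrival order."""
        while self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value

    def read(self, addr: int) -> int:
        """Atomic read: returns whatever is in memory right now."""
        return self.memory.get(addr, 0)


if __name__ == "__main__":
    fb = ToyFrameBuffer()
    fb.write_queued(0x10, 7)
    print(fb.read(0x10))  # stale value: the queued write has not landed
    fb.write_atomic(0x10, 7)
    print(fb.read(0x10))  # the atomic write is visible immediately
```

In this model an atomic read that follows a queued write returns the old contents, which corresponds to the stale-read situation the commit message attributes to the AtomicSimpleCPU.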
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add GPUFS --root-partition option
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email )

Change subject: configs: Add GPUFS --root-partition option
..

configs: Add GPUFS --root-partition option

Different GPUFS disk images have different root partitions that Linux
needs to boot from. In particular, Ubuntu's new installer has a GRUB
partition that cannot seem to be removed. Adding this as an option
prevents needing to edit a config script to change one character each
time a different disk image is used.

Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/system.py
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index b045b80..5346622 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -151,6 +151,13 @@
         help="Exit simulation after running this many kernels",
     )

+    parser.add_argument(
+        "--root-partition",
+        type=str,
+        default="/dev/sda1",
+        help="Root partition of disk image",
+    )
+

 def runGpuFSSystem(args):
     """
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 263ffc0..40e0016 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -50,7 +50,7 @@
         "earlyprintk=ttyS0",
         "console=ttyS0,9600",
         "lpj=723",
-        "root=/dev/sda1",
+        f"root={args.root_partition}",
         "drm_kms_helper.fbdev_emulation=0",
         "modprobe.blacklist=amdgpu",
         "modprobe.blacklist=psmouse",

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Gerrit-Change-Number: 71918
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email )

Change subject: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
..

arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions, each of which
represents 32 opcodes. The subDecode table for opcodes 32-63 is missing,
so it is added here.

The opcode for V_SWAP_B32 is also off by one: in the ISA manual this
instruction is opcode 81, the instruction before it is 79, and there is
no opcode 80, so the decoder entry is swapped with the invalid decoding
below it.

Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index fd3a803..a86dd66 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -495,7 +495,7 @@
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,
         &Decoder::subDecode_OP_FLAT,
-        &Decoder::decode_invalid,
+        &Decoder::subDecode_OP_FLAT,
         &Decoder::subDecode_OP_FLAT,
         &Decoder::subDecode_OP_FLAT,
         &Decoder::decode_invalid,
@@ -3140,8 +3140,8 @@
         &Decoder::decode_OP_VOP1__V_CVT_NORM_I16_F16,
         &Decoder::decode_OP_VOP1__V_CVT_NORM_U16_F16,
         &Decoder::decode_OP_VOP1__V_SAT_PK_U8_I16,
-        &Decoder::decode_OP_VOP1__V_SWAP_B32,
         &Decoder::decode_invalid,
+        &Decoder::decode_OP_VOP1__V_SWAP_B32,
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,
         &Decoder::decode_invalid,

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Gerrit-Change-Number: 71899
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Perform frame writes atomically
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email )

Change subject: dev-amdgpu: Perform frame writes atomically
..

dev-amdgpu: Perform frame writes atomically

The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.

The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.

This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.

Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3260d05..226fc99 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -349,6 +349,22 @@
     }

     nbio.writeFrame(pkt, offset);
+
+    /*
+     * Read the value from device memory. This must be done functionally
+     * because this method is called by the PCIDevice::read method which
+     * is a non-timing read.
+     */
+    RequestPtr req = std::make_shared<Request>(offset, pkt->getSize(), 0,
+                                               vramRequestorId());
+    PacketPtr writePkt = Packet::createWrite(req);
+    uint8_t *dataPtr = new uint8_t[pkt->getSize()];
+    std::memcpy(dataPtr, pkt->getPtr<uint8_t>(),
+                pkt->getSize() * sizeof(uint8_t));
+    writePkt->dataDynamic(dataPtr);
+
+    auto system = cp->shader()->gpuCmdProc.system();
+    system->getDeviceMemory(writePkt)->access(writePkt);
 }

 void
@@ -489,8 +505,6 @@

     switch (barnum) {
       case FRAMEBUFFER_BAR:
-        gpuMemMgr->writeRequest(offset, pkt->getPtr<uint8_t>(),
-                                pkt->getSize(), 0, nullptr);
         writeFrame(pkt, offset);
         break;
       case DOORBELL_BAR:

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Gerrit-Change-Number: 71898
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Helper methods for SDWA/DPP for VOP2
// use copies of original src0, src1, and dest during selecting +T origSrc0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0); +T origSrc1(gpuDynInst, instData.VSRC1); + +src0_sdwa.read(); +origSrc0_sdwa.read(); +origSrc1.read(); + +DPRINTF(VEGA, "Handling %s SRC SDWA. SRC0: register v[%d], " +"DST_SEL: %d, DST_U: %d, CLMP: %d, SRC0_SEL: %d, SRC0_SEXT: " +"%d, SRC0_NEG: %d, SRC0_ABS: %d, SRC1_SEL: %d, SRC1_SEXT: %d, " +"SRC1_NEG: %d, SRC1_ABS: %d\n", +opcode().c_str(), extData.iFmt_VOP_SDWA.SRC0, +extData.iFmt_VOP_SDWA.DST_SEL, extData.iFmt_VOP_SDWA.DST_U, +extData.iFmt_VOP_SDWA.CLMP, extData.iFmt_VOP_SDWA.SRC0_SEL, +extData.iFmt_VOP_SDWA.SRC0_SEXT, +extData.iFmt_VOP_SDWA.SRC0_NEG, extData.iFmt_VOP_SDWA.SRC0_ABS, +extData.iFmt_VOP_SDWA.SRC1_SEL, +extData.iFmt_VOP_SDWA.SRC1_SEXT, +extData.iFmt_VOP_SDWA.SRC1_NEG, +extData.iFmt_VOP_SDWA.SRC1_ABS); + +processSDWA_src(extData.iFmt_VOP_SDWA, src0_sdwa, origSrc0_sdwa, +src1, origSrc1); + +return src0_sdwa; +} + +template<typename T> +void sdwaDstHelper(GPUDynInstPtr gpuDynInst, T & vdst) +{ +T origVdst(gpuDynInst, instData.VDST); + +Wavefront *wf = gpuDynInst->wavefront(); +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (wf->execMask(lane)) { +origVdst[lane] = vdst[lane]; // keep copy consistent +} +} + +processSDWA_dst(extData.iFmt_VOP_SDWA, vdst, origVdst); +} + +template<typename T> +T dppHelper(GPUDynInstPtr gpuDynInst, T & src1) +{ +T src0_dpp(gpuDynInst, extData.iFmt_VOP_DPP.SRC0); +src0_dpp.read(); + +DPRINTF(VEGA, "Handling %s SRC DPP. 
SRC0: register v[%d], " +"DPP_CTRL: 0x%#x, SRC0_ABS: %d, SRC0_NEG: %d, SRC1_ABS: %d, " +"SRC1_NEG: %d, BC: %d, BANK_MASK: %d, ROW_MASK: %d\n", +opcode().c_str(), extData.iFmt_VOP_DPP.SRC0, +extData.iFmt_VOP_DPP.DPP_CTRL, extData.iFmt_VOP_DPP.SRC0_ABS, +extData.iFmt_VOP_DPP.SRC0_NEG, extData.iFmt_VOP_DPP.SRC1_ABS, +extData.iFmt_VOP_DPP.SRC1_NEG, extData.iFmt_VOP_DPP.BC, +extData.iFmt_VOP_DPP.BANK_MASK, extData.iFmt_VOP_DPP.ROW_MASK); + +processDPP(gpuDynInst, extData.iFmt_VOP_DPP, src0_dpp, src1); + +return src0_dpp; +} + +template<typename T> +void vop2Helper(GPUDynInstPtr gpuDynInst, +void (*fOpImpl)(T&, T&, T&, Wavefront*)) +{ +Wavefront *wf = gpuDynInst->wavefront(); +T src0(gpuDynInst, instData.SRC0); +T src1(gpuDynInst, instData.VSRC1); +T vdst(gpuDynInst, instData.VDST); + +src0.readSrc(); +src1.read(); + +if (isSDWAInst()) { +T src0_sdwa = sdwaSrcHelper(gpuDynInst, src1); +fOpImpl(src0_sdwa, src1, vdst, wf); +sdwaDstHelper(gpuDynInst, vdst); +} else if (isDPPInst()) { +T src0_dpp = dppHelper(gpuDynInst, src1); +fOpImpl(src0_dpp, src1, vdst, wf); +} else { +fOpImpl(src0, src1, vdst, wf); +} + +vdst.write(); +} + private: bool hasSecondDword(InFmt_VOP2 *); }; // Inst_VOP2 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70738?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406 Gerrit-Change-Number: 70738 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Jason Lowe-Power Gerrit-CC: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: GPUFS: Only use parallel eventqs for KVM
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email ) Change subject: configs: GPUFS: Only use parallel eventqs for KVM .. configs: GPUFS: Only use parallel eventqs for KVM This is turned on by default with multiple CPUs in the GPUFS configs, which causes other CPU types (e.g., AtomicSimpleCPU) to assert. Only enable parallel event queues for KVM CPUs to avoid this issue. Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71419 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M configs/example/gpufs/runfs.py 1 file changed, 2 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 01203bb..b045b80 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -162,7 +162,8 @@ # GPUFS is primarily designed to use the X86 KVM CPU. This model needs to # use multiple event queues when more than one CPU is simulated. Force it # on if that is the case. -args.host_parallel = True if args.num_cpus > 1 else False +if ObjectList.is_kvm_cpu(ObjectList.cpu_list.get(args.cpu_type)): +args.host_parallel = True if args.num_cpus > 1 else False # These are used by the protocols. They should not be set by the user. 
n_cu = args.num_compute_units -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36 Gerrit-Change-Number: 71419 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: kokoro
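The decision made by change 71419 can be sketched as a small standalone function. This is an illustration only: choose_host_parallel is a hypothetical helper, and the substring test on the CPU type name is an assumption standing in for the ObjectList.is_kvm_cpu() call used in the real script.

```python
def choose_host_parallel(cpu_type, num_cpus, current=False):
    """Return the value host_parallel should take for this configuration."""
    # Assumption: treat a CPU type as KVM if its name contains "Kvm".
    # The real runfs.py asks ObjectList.is_kvm_cpu() instead.
    is_kvm = "Kvm" in cpu_type
    if is_kvm:
        # Parallel event queues are only forced on for multi-CPU KVM runs.
        return num_cpus > 1
    # Non-KVM CPU models keep whatever value was already set.
    return current
```

For example, choose_host_parallel("X86KvmCPU", 4) yields True, while choose_host_parallel("AtomicSimpleCPU", 4) leaves the setting at its previous value, which is the behaviour the commit message asks for.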
[gem5-dev] [S] Change in gem5/gem5[develop]: configs,gpu-compute: Kernel dispatch-based exit events
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email ) Change subject: configs,gpu-compute: Kernel dispatch-based exit events .. configs,gpu-compute: Kernel dispatch-based exit events Add two kernel dispatch-based exit events that are useful for limiting the simulation and enabling debug flags at specific GPU kernels. Since the KVM CPU typically used with GPUFS is not deterministic, this helps with enabling debug flags when the Tick number may vary. The exit at GPU kernel option can also limit simulation by only simulating a few hundred kernels, for example, and exit at a determined point. Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71418 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M configs/example/gpufs/runfs.py M src/gpu-compute/dispatcher.cc 2 files changed, 30 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index f8ef70d..01203bb 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -137,6 +137,20 @@ "MI200 (gfx90a)", ) +parser.add_argument( +"--debug-at-gpu-kernel", +type=int, +default=-1, +help="Turn on debug flags starting with this kernel", +) + +parser.add_argument( +"--exit-at-gpu-kernel", +type=int, +default=-1, +help="Exit simulation after running this many kernels", +) + def runGpuFSSystem(args): """ @@ -184,6 +198,9 @@ print("Running the simulation") sim_ticks = args.abs_max_tick +kernels_launched = 0 +if args.debug_at_gpu_kernel != -1: +m5.trace.disable() exit_event = m5.simulate(sim_ticks) @@ -199,11 +216,21 @@ assert args.checkpoint_dir is not None m5.checkpoint(args.checkpoint_dir) break +elif "GPU Kernel Completed" in exit_event.getCause(): +kernels_launched += 1 else: print( f"Unknown exit 
event: {exit_event.getCause()}. Continuing..." ) +if kernels_launched == args.debug_at_gpu_kernel: +m5.trace.enable() +if kernels_launched == args.exit_at_gpu_kernel: +print(f"Exiting @ GPU kernel {kernels_launched}") +break + +exit_event = m5.simulate(sim_ticks - m5.curTick()) + print( "Exiting @ tick %i because %s" % (m5.curTick(), exit_event.getCause()) ) diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc index a76ba7c..b19bccc 100644 --- a/src/gpu-compute/dispatcher.cc +++ b/src/gpu-compute/dispatcher.cc @@ -40,6 +40,7 @@ #include "gpu-compute/hsa_queue_entry.hh" #include "gpu-compute/shader.hh" #include "gpu-compute/wavefront.hh" +#include "sim/sim_exit.hh" #include "sim/syscall_emul_buf.hh" #include "sim/system.hh" @@ -330,6 +331,8 @@ DPRINTF(GPUWgLatency, "Kernel Complete ticks:%d kernel:%d\n", curTick(), kern_id); DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id); + +exitSimLoop("GPU Kernel Completed"); } if (!tickEvent.scheduled()) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d Gerrit-Change-Number: 71418 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: VISHNU RAMADAS Gerrit-CC: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
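The counting logic added to runfs.py in change 71418 can be replayed without a simulator. The following is a sketch under stated assumptions: run_until_kernel is a hypothetical function, and the list of cause strings stands in for the successive results of m5.simulate(); only the "GPU Kernel Completed" cause string and the -1 defaults come from the patch itself.

```python
def run_until_kernel(exit_causes, debug_at=-1, exit_at=-1):
    """Replay exit causes the way the patched loop counts kernels.

    Returns (kernels_completed, kernel_at_which_tracing_started, stopped_early).
    """
    kernels = 0
    trace_enabled_at = None
    for cause in exit_causes:
        if "GPU Kernel Completed" in cause:
            kernels += 1
        # Stand-in for m5.trace.enable(): record when tracing would turn on.
        if kernels == debug_at and trace_enabled_at is None:
            trace_enabled_at = kernels
        # Stand-in for the "Exiting @ GPU kernel" break.
        if kernels == exit_at:
            return kernels, trace_enabled_at, True
    return kernels, trace_enabled_at, False
```

With the default of -1 for both options the counter never matches, so neither tracing nor the early exit triggers, which mirrors how the two new command-line options behave when left unset.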
[gem5-dev] [S] Change in gem5/gem5[develop]: configs,gpu-compute: Kernel dispatch-based exit events
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email ) Change subject: configs,gpu-compute: Kernel dispatch-based exit events .. configs,gpu-compute: Kernel dispatch-based exit events Add two kernel dispatch-based exit events that are useful for limiting the simulation and enabling debug flags at specific GPU kernels. Since the KVM CPU typically used with GPUFS is not deterministic, this helps with enabling debug flags when the Tick number may vary. The exit at GPU kernel option can also limit simulation by only simulating a few hundred kernels, for example, and exit at a determined point. Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d --- M configs/example/gpufs/runfs.py M src/gpu-compute/dispatcher.cc 2 files changed, 30 insertions(+), 0 deletions(-) diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index f8ef70d..01203bb 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -137,6 +137,20 @@ "MI200 (gfx90a)", ) +parser.add_argument( +"--debug-at-gpu-kernel", +type=int, +default=-1, +help="Turn on debug flags starting with this kernel", +) + +parser.add_argument( +"--exit-at-gpu-kernel", +type=int, +default=-1, +help="Exit simulation after running this many kernels", +) + def runGpuFSSystem(args): """ @@ -184,6 +198,9 @@ print("Running the simulation") sim_ticks = args.abs_max_tick +kernels_launched = 0 +if args.debug_at_gpu_kernel != -1: +m5.trace.disable() exit_event = m5.simulate(sim_ticks) @@ -199,11 +216,21 @@ assert args.checkpoint_dir is not None m5.checkpoint(args.checkpoint_dir) break +elif "GPU Kernel Completed" in exit_event.getCause(): +kernels_launched += 1 else: print( f"Unknown exit event: {exit_event.getCause()}. Continuing..." 
) +if kernels_launched == args.debug_at_gpu_kernel: +m5.trace.enable() +if kernels_launched == args.exit_at_gpu_kernel: +print(f"Exiting @ GPU kernel {kernels_launched}") +break + +exit_event = m5.simulate(sim_ticks - m5.curTick()) + print( "Exiting @ tick %i because %s" % (m5.curTick(), exit_event.getCause()) ) diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc index a76ba7c..b19bccc 100644 --- a/src/gpu-compute/dispatcher.cc +++ b/src/gpu-compute/dispatcher.cc @@ -40,6 +40,7 @@ #include "gpu-compute/hsa_queue_entry.hh" #include "gpu-compute/shader.hh" #include "gpu-compute/wavefront.hh" +#include "sim/sim_exit.hh" #include "sim/syscall_emul_buf.hh" #include "sim/system.hh" @@ -330,6 +331,8 @@ DPRINTF(GPUWgLatency, "Kernel Complete ticks:%d kernel:%d\n", curTick(), kern_id); DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id); + +exitSimLoop("GPU Kernel Completed"); } if (!tickEvent.scheduled()) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d Gerrit-Change-Number: 71418 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: GPUFS: Only use parallel eventqs for KVM
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email ) Change subject: configs: GPUFS: Only use parallel eventqs for KVM .. configs: GPUFS: Only use parallel eventqs for KVM This is turned on by default with multiple CPUs in the GPUFS configs, which causes other CPU types (e.g., AtomicSimpleCPU) to assert. Only enable parallel event queues for KVM CPUs to avoid this issue. Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36 --- M configs/example/gpufs/runfs.py 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 01203bb..b045b80 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -162,7 +162,8 @@ # GPUFS is primarily designed to use the X86 KVM CPU. This model needs to # use multiple event queues when more than one CPU is simulated. Force it # on if that is the case. -args.host_parallel = True if args.num_cpus > 1 else False +if ObjectList.is_kvm_cpu(ObjectList.cpu_list.get(args.cpu_type)): +args.host_parallel = True if args.num_cpus > 1 else False # These are used by the protocols. They should not be set by the user. n_cu = args.num_compute_units -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36 Gerrit-Change-Number: 71419 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [XS] Change in gem5/gem5[develop]: gpu-compute: Gfx version check for FS and SE mode
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email ) Change subject: gpu-compute: Gfx version check for FS and SE mode .. gpu-compute: Gfx version check for FS and SE mode There is no GPU device in SE mode to get version from and no GPU driver in FS mode to get version from, so a conditional needs to be added depending on the mode to get the gfx version. Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71078 Tested-by: kokoro Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair --- M src/gpu-compute/gpu_command_processor.cc M src/gpu-compute/gpu_compute_driver.hh 2 files changed, 5 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc index 9755180..8f748bd 100644 --- a/src/gpu-compute/gpu_command_processor.cc +++ b/src/gpu-compute/gpu_command_processor.cc @@ -227,9 +227,11 @@ DPRINTF(GPUKernelInfo, "Kernel name: %s\n", kernel_name.c_str()); +GfxVersion gfxVersion = FullSystem ? gpuDevice->getGfxVersion() + : driver()->getGfxVersion(); HSAQueueEntry *task = new HSAQueueEntry(kernel_name, queue_id, dynamic_task_id, raw_pkt, &akc, host_pkt_addr, machine_code_addr, -gpuDevice->getGfxVersion()); +gfxVersion); DPRINTF(GPUCommandProc, "Task ID: %i Got AQL: wg size (%dx%dx%d), " "grid size (%dx%dx%d) kernarg addr: %#x, completion " diff --git a/src/gpu-compute/gpu_compute_driver.hh b/src/gpu-compute/gpu_compute_driver.hh index def40f4..9a3c647 100644 --- a/src/gpu-compute/gpu_compute_driver.hh +++ b/src/gpu-compute/gpu_compute_driver.hh @@ -142,6 +142,8 @@ }; typedef class EventTableEntry ETEntry; +GfxVersion getGfxVersion() const { return gfxVersion; } + private: /** * GPU that is controlled by this driver. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Gerrit-Change-Number: 71078 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
[gem5-dev] [XS] Change in gem5/gem5[develop]: gpu-compute: Gfx version check for FS and SE mode
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email ) Change subject: gpu-compute: Gfx version check for FS and SE mode .. gpu-compute: Gfx version check for FS and SE mode There is no GPU device in SE mode to get version from and no GPU driver in FS mode to get version from, so a conditional needs to be added depending on the mode to get the gfx version. Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 --- M src/gpu-compute/gpu_command_processor.cc M src/gpu-compute/gpu_compute_driver.hh 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc index 9755180..8f748bd 100644 --- a/src/gpu-compute/gpu_command_processor.cc +++ b/src/gpu-compute/gpu_command_processor.cc @@ -227,9 +227,11 @@ DPRINTF(GPUKernelInfo, "Kernel name: %s\n", kernel_name.c_str()); +GfxVersion gfxVersion = FullSystem ? gpuDevice->getGfxVersion() + : driver()->getGfxVersion(); HSAQueueEntry *task = new HSAQueueEntry(kernel_name, queue_id, dynamic_task_id, raw_pkt, &akc, host_pkt_addr, machine_code_addr, -gpuDevice->getGfxVersion()); +gfxVersion); DPRINTF(GPUCommandProc, "Task ID: %i Got AQL: wg size (%dx%dx%d), " "grid size (%dx%dx%d) kernarg addr: %#x, completion " diff --git a/src/gpu-compute/gpu_compute_driver.hh b/src/gpu-compute/gpu_compute_driver.hh index def40f4..9a3c647 100644 --- a/src/gpu-compute/gpu_compute_driver.hh +++ b/src/gpu-compute/gpu_compute_driver.hh @@ -142,6 +142,8 @@ }; typedef class EventTableEntry ETEntry; +GfxVersion getGfxVersion() const { return gfxVersion; } + private: /** * GPU that is controlled by this driver. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31 Gerrit-Change-Number: 71078 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba
[gem5-dev] [S] Change in gem5/gem5[develop]: mem: Handle DRAM write queue drain and disabled power down
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email ) Change subject: mem: Handle DRAM write queue drain and disabled power down .. mem: Handle DRAM write queue drain and disabled power down Write queue drain logic seems off currently. An event is scheduled if the write queue is empty instead of non-empty. There is no check to see if draining is complete when bus is in write mode. Finally the power down check on drain always fails if DRAM powerdown is disabled. This changeset reverses the drain conditional for the write queue to schedule an event if the write queue is *not* empty and checks in the event processing method that the queues are all empty so that signalDrainDone can be called. Lastly the powerdown state is ignored if DRAM powerdown is disabled. Powerdown is disabled in the GPU_VIPER protocol by default. This changeset successfully drains and checkpoints a GPUFS simulation using GPU_VIPER protocol. Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69917 Tested-by: kokoro Maintainer: Jason Lowe-Power Reviewed-by: Jason Lowe-Power --- M src/mem/dram_interface.hh M src/mem/mem_ctrl.cc 2 files changed, 23 insertions(+), 4 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/dram_interface.hh b/src/mem/dram_interface.hh index fa9d319..e20e33f 100644 --- a/src/mem/dram_interface.hh +++ b/src/mem/dram_interface.hh @@ -380,7 +380,18 @@ * @param Return true if the rank is idle from a bank *and power point of view */ -bool inPwrIdleState() const { return pwrState == PWR_IDLE; } +bool +inPwrIdleState() const +{ +// If powerdown is not enabled, then the ranks never go to idle +// states. In that case return true here to prevent checkpointing +// from getting stuck waiting for DRAM to be idle. 
+if (!dram.enableDRAMPowerdown) { +return true; +} + +return pwrState == PWR_IDLE; +} /** * Trigger a self-refresh exit if there are entries enqueued diff --git a/src/mem/mem_ctrl.cc b/src/mem/mem_ctrl.cc index 543d637..290db3e 100644 --- a/src/mem/mem_ctrl.cc +++ b/src/mem/mem_ctrl.cc @@ -908,6 +908,13 @@ } } +if (drainState() == DrainState::Draining && !totalWriteQueueSize && +!totalReadQueueSize && respQEmpty() && allIntfDrained()) { + +DPRINTF(Drain, "MemCtrl controller done draining\n"); +signalDrainDone(); +} + // updates current state busState = busStateNext; @@ -1411,8 +1418,8 @@ { // if there is anything in any of our internal queues, keep track // of that as well -if (!(!totalWriteQueueSize && !totalReadQueueSize && respQueue.empty() && - allIntfDrained())) { +if (totalWriteQueueSize || totalReadQueueSize || !respQueue.empty() || + !allIntfDrained()) { DPRINTF(Drain, "Memory controller not drained, write: %d, read: %d," " resp: %d\n", totalWriteQueueSize, totalReadQueueSize, @@ -1420,7 +1427,8 @@ // the only queue that is not drained automatically over time // is the write queue, thus kick things into action if needed -if (!totalWriteQueueSize && !nextReqEvent.scheduled()) { +if (totalWriteQueueSize && !nextReqEvent.scheduled()) { +DPRINTF(Drain,"Scheduling nextReqEvent from drain\n"); schedule(nextReqEvent, curTick()); } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb Gerrit-Change-Number: 69917 Gerrit-PatchSet: 4 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: Nikos Nikoleris Gerrit-Reviewer: kokoro Gerrit-CC: Matt Sinclair Gerrit-CC: Matt Sinclair Gerrit-CC: Melissa Jost 
Gerrit-CC: VISHNU RAMADAS
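The corrected drain conditions from change 69917 can be summarised in a toy model. This is a sketch only: ToyMemCtrl and in_pwr_idle_state are hypothetical Python stand-ins, the string states are invented for illustration, and the real controller has more state than the four fields modelled here.

```python
from dataclasses import dataclass


@dataclass
class ToyMemCtrl:
    """Hypothetical stand-in for the queues the memory controller checks."""
    write_q: int = 0
    read_q: int = 0
    resp_q: int = 0
    intf_drained: bool = True
    next_req_scheduled: bool = False

    def is_drained(self):
        # Same condition the patch checks before calling signalDrainDone().
        return (self.write_q == 0 and self.read_q == 0
                and self.resp_q == 0 and self.intf_drained)

    def drain(self):
        if self.is_drained():
            return "Drained"
        # Fixed conditional: kick the request event only if writes remain.
        if self.write_q and not self.next_req_scheduled:
            self.next_req_scheduled = True
        return "Draining"


def in_pwr_idle_state(powerdown_enabled, pwr_state):
    # With powerdown disabled the rank never reaches idle, so report idle
    # to avoid blocking a checkpoint forever.
    if not powerdown_enabled:
        return True
    return pwr_state == "PWR_IDLE"
```

The point of the sketch is the direction of the write queue test: before the fix the event was scheduled when the queue was empty, so a non-empty write queue was never pushed toward completion during a drain.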
[gem5-dev] [M] Change in gem5/gem5[develop]: configs,dev-amdgpu: GPUFS MI200/gfx90a support
class PoolManager(SimObject): diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc index af59b78..9755180 100644 --- a/src/gpu-compute/gpu_command_processor.cc +++ b/src/gpu-compute/gpu_command_processor.cc @@ -228,7 +228,8 @@ DPRINTF(GPUKernelInfo, "Kernel name: %s\n", kernel_name.c_str()); HSAQueueEntry *task = new HSAQueueEntry(kernel_name, queue_id, -dynamic_task_id, raw_pkt, &akc, host_pkt_addr, machine_code_addr); +dynamic_task_id, raw_pkt, &akc, host_pkt_addr, machine_code_addr, +gpuDevice->getGfxVersion()); DPRINTF(GPUCommandProc, "Task ID: %i Got AQL: wg size (%dx%dx%d), " "grid size (%dx%dx%d) kernarg addr: %#x, completion " diff --git a/src/gpu-compute/hsa_queue_entry.hh b/src/gpu-compute/hsa_queue_entry.hh index fbe0efe..4083c1c 100644 --- a/src/gpu-compute/hsa_queue_entry.hh +++ b/src/gpu-compute/hsa_queue_entry.hh @@ -51,6 +51,7 @@ #include "base/types.hh" #include "dev/hsa/hsa_packet.hh" #include "dev/hsa/hsa_queue.hh" +#include "enums/GfxVersion.hh" #include "gpu-compute/kernel_code.hh" namespace gem5 @@ -61,7 +62,7 @@ public: HSAQueueEntry(std::string kernel_name, uint32_t queue_id, int dispatch_id, void *disp_pkt, AMDKernelCode *akc, - Addr host_pkt_addr, Addr code_addr) + Addr host_pkt_addr, Addr code_addr, GfxVersion gfx_version) : kernName(kernel_name), _wgSize{{(int)((_hsa_dispatch_packet_t*)disp_pkt)->workgroup_size_x, (int)((_hsa_dispatch_packet_t*)disp_pkt)->workgroup_size_y, @@ -92,9 +93,19 @@ // we need to rip register usage from the resource registers. // // We can't get an exact number of registers from the resource -// registers because they round, but we can get an upper bound on it -if (!numVgprs) -numVgprs = (akc->granulated_workitem_vgpr_count + 1) * 4; +// registers because they round, but we can get an upper bound on it. 
+// We determine the number of registers by solving for "vgprs_used" +// in the LLVM docs: https://www.llvm.org/docs/AMDGPUUsage.html +// #code-object-v3-kernel-descriptor +// Currently, the only supported gfx version in gem5 that computes +// this differently is gfx90a. + if (!numVgprs) { +if (gfx_version == GfxVersion::gfx90a) { +numVgprs = (akc->granulated_workitem_vgpr_count + 1) * 8; +} else { +numVgprs = (akc->granulated_workitem_vgpr_count + 1) * 4; +} +} if (!numSgprs || numSgprs == std::numeric_limits<decltype(akc->wavefront_sgpr_count)>::max()) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70317?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d Gerrit-Change-Number: 70317 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
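The VGPR upper bound computed in the hsa_queue_entry.hh hunk above is simple arithmetic, shown here as a Python sketch. The function name vgprs_upper_bound and the use of strings for the gfx version are assumptions for illustration; the granule sizes of 8 and 4 are taken directly from the diff.

```python
def vgprs_upper_bound(granulated_workitem_vgpr_count, gfx_version):
    """Upper bound on VGPRs recovered from the kernel descriptor field."""
    # Per the diff, gfx90a allocates VGPRs in granules of 8; other
    # supported versions use granules of 4.
    granule = 8 if gfx_version == "gfx90a" else 4
    return (granulated_workitem_vgpr_count + 1) * granule
```

For a granulated count of 3 this gives 32 registers on gfx90a and 16 elsewhere. In the patch this fallback is only applied when the register count is not already known (numVgprs is zero).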
[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-x86: Fix CPUID function 0
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email ) Change subject: arch-x86: Fix CPUID function 0 .. arch-x86: Fix CPUID function 0 This should return the number of standard features, not the number of extended features. Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70778 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/x86/cpuid.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/x86/cpuid.cc b/src/arch/x86/cpuid.cc index 4ce66df..ac4709c 100644 --- a/src/arch/x86/cpuid.cc +++ b/src/arch/x86/cpuid.cc @@ -162,7 +162,7 @@ ISA *isa = dynamic_cast<ISA *>(tc->getIsaPtr()); auto vendor_string = isa->getVendorString(); result = CpuidResult( - NumExtendedCpuidFuncs - 1, + NumStandardCpuidFuncs - 1, stringToRegister(vendor_string.c_str()), stringToRegister(vendor_string.c_str() + 4), stringToRegister(vendor_string.c_str() + 8)); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561 Gerrit-Change-Number: 70778 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Gabe Black Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
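What CPUID function 0 returns after change 70778 can be sketched in Python. The helper names string_to_register and cpuid_function_0 are hypothetical, and the little-endian packing of four characters per register is an assumption about what stringToRegister does, chosen because it matches the architectural CPUID vendor-string layout.

```python
def string_to_register(text):
    """Pack the first four characters of text into a 32-bit little-endian value."""
    return int.from_bytes(text[:4].encode("ascii"), "little")


def cpuid_function_0(vendor_string, num_standard_funcs):
    """Return the four values passed to CpuidResult in the diff, in order."""
    assert len(vendor_string) == 12, "CPUID vendor strings are 12 characters"
    return (
        num_standard_funcs - 1,                 # highest standard function
        string_to_register(vendor_string),      # characters 0..3
        string_to_register(vendor_string[4:]),  # characters 4..7
        string_to_register(vendor_string[8:]),  # characters 8..11
    )
```

The first element is the value the patch corrects: it reports the highest standard function number, so a guest probing standard leaves no longer sees a limit derived from the extended range.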
[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Update SDMA checkpointing
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email ) Change subject: dev-amdgpu: Update SDMA checkpointing .. dev-amdgpu: Update SDMA checkpointing Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added support for a variable number of SDMA engines to support newer GPU models. As part of this an SDMA IDs map was added to map from SDMA ID number to the SDMA SimObject pointer. In order to get the correct pointer in unserialize now, we need to store the ID in the checkpoint and use that to index the new map. We can't simply assign using the loop variable as the SDMAs might not be in order in the checkpoint and additionally the checkpoint contains both the gfx and page offset for the SDMA engines, so each SDMA is inserted into the SDMA offset map (sdmaEngs) twice. Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/sdma_engine.hh 2 files changed, 5 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index f58d1f7..7037e6f 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -604,7 +604,7 @@ idx = 0; for (auto & it : sdmaEngs) { sdma_engs_offset[idx] = it.first; -sdma_engs[idx] = idx; +sdma_engs[idx] = it.second->getId(); ++idx; } @@ -675,8 +675,9 @@ UNSERIALIZE_ARRAY(sdma_engs, sizeof(sdma_engs)/sizeof(sdma_engs[0])); for (int idx = 0; idx < sdma_engs_size; ++idx) { -assert(sdmaIds.count(idx)); -SDMAEngine *sdma = sdmaIds[idx]; +int sdma_id = sdma_engs[idx]; +assert(sdmaIds.count(sdma_id)); +SDMAEngine *sdma = sdmaIds[sdma_id]; sdmaEngs.insert(std::make_pair(sdma_engs_offset[idx], sdma)); } } diff --git 
a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 1e4f965..bcbd497 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -165,6 +165,7 @@ void setGPUDevice(AMDGPUDevice *gpu_device); void setId(int _id) { id = _id; } +int getId() const { return id; } /** * Returns the client id for the Interrupt Handler. */ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d Gerrit-Change-Number: 70878 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: VISHNU RAMADAS Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
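The reason change 70878 stores the engine ID rather than the loop index can be seen in a small round-trip sketch. The functions serialize and unserialize below are hypothetical Python stand-ins for the checkpoint code, and the offsets and engine names are made up; only the idea of two parallel arrays (offsets and IDs) and an ID-to-engine map comes from the patch.

```python
def serialize(sdma_engs):
    """sdma_engs maps offset -> engine ID; return parallel (offsets, ids) lists."""
    # Sorted only to make this sketch deterministic.
    offsets = sorted(sdma_engs)
    ids = [sdma_engs[off] for off in offsets]
    return offsets, ids


def unserialize(offsets, ids, sdma_ids):
    """Rebuild offset -> engine using the stored ID, not the loop index."""
    rebuilt = {}
    for off, eng_id in zip(offsets, ids):
        assert eng_id in sdma_ids
        rebuilt[off] = sdma_ids[eng_id]
    return rebuilt


def demo():
    # Two engines, each reachable through a gfx offset and a page offset,
    # so the offset map has four entries but only IDs 0 and 1 exist.
    engines = {0: "sdma0", 1: "sdma1"}
    offset_map = {0x1000: 1, 0x2000: 0, 0x3000: 1, 0x4000: 0}
    offsets, ids = serialize(offset_map)
    return unserialize(offsets, ids, engines)
```

Indexing the ID map with the loop variable would fail here on the third entry, since index 2 is not a valid engine ID, and would also pair the first offset with the wrong engine.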
[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Update SDMA checkpointing
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email ) Change subject: dev-amdgpu: Update SDMA checkpointing .. dev-amdgpu: Update SDMA checkpointing Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added support for a variable number of SDMA engines to support newer GPU models. As part of this an SDMA IDs map was added to map from SDMA ID number to the SDMA SimObject pointer. In order to get the correct pointer in unserialize now, we need to store the ID in the checkpoint and use that to index the new map. We can't simply assign using the loop variable as the SDMAs might not be in order in the checkpoint and additionally the checkpoint contains both the gfx and page offset for the SDMA engines, so each SDMA is inserted into the SDMA offset map (sdmaEngs) twice. Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d --- M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/sdma_engine.hh 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index f58d1f7..7037e6f 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -604,7 +604,7 @@ idx = 0; for (auto & it : sdmaEngs) { sdma_engs_offset[idx] = it.first; -sdma_engs[idx] = idx; +sdma_engs[idx] = it.second->getId(); ++idx; } @@ -675,8 +675,9 @@ UNSERIALIZE_ARRAY(sdma_engs, sizeof(sdma_engs)/sizeof(sdma_engs[0])); for (int idx = 0; idx < sdma_engs_size; ++idx) { -assert(sdmaIds.count(idx)); -SDMAEngine *sdma = sdmaIds[idx]; +int sdma_id = sdma_engs[idx]; +assert(sdmaIds.count(sdma_id)); +SDMAEngine *sdma = sdmaIds[sdma_id]; sdmaEngs.insert(std::make_pair(sdma_engs_offset[idx], sdma)); } } diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 1e4f965..bcbd497 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -165,6 +165,7 @@ void setGPUDevice(AMDGPUDevice *gpu_device); void 
setId(int _id) { id = _id; } +int getId() const { return id; } /** * Returns the client id for the Interrupt Handler. */ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d Gerrit-Change-Number: 70878 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
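For readers following the SDMA checkpointing change above, the idea of storing the engine ID next to each offset and using it to index the ID map on restore can be sketched outside of gem5. This is only an illustration under stated assumptions: `Engine`, `saveIds`, and `restoreByStoredId` are hypothetical names, and standard containers stand in for the checkpoint arrays and SimObject pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for an SDMA engine object (not the gem5 class).
struct Engine
{
    int id;
    int getId() const { return id; }
};

// "Serialize": for every entry of the offset map, record the offset and the
// ID of the engine it points to, instead of the loop index.
std::vector<int>
saveIds(const std::map<uint64_t, Engine *> &offsetMap,
        std::vector<uint64_t> &offsetsOut)
{
    std::vector<int> ids;
    for (const auto &entry : offsetMap) {
        offsetsOut.push_back(entry.first);
        ids.push_back(entry.second->getId());
    }
    return ids;
}

// "Unserialize": use the stored ID to look up the engine pointer, so the
// order of entries in the checkpoint does not matter and one engine may
// appear under several offsets.
std::map<uint64_t, Engine *>
restoreByStoredId(const std::vector<uint64_t> &offsets,
                  const std::vector<int> &ids,
                  const std::map<int, Engine *> &idMap)
{
    std::map<uint64_t, Engine *> offsetMap;
    for (std::size_t idx = 0; idx < offsets.size(); ++idx) {
        auto it = idMap.find(ids[idx]);
        assert(it != idMap.end());
        offsetMap.insert({offsets[idx], it->second});
    }
    return offsetMap;
}
```

In this sketch each engine is registered under two offsets, which is why indexing the ID map with the loop variable would not recover the right pointer.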
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix nbio psp ring assert
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email ) Change subject: dev-amdgpu: Fix nbio psp ring assert .. dev-amdgpu: Fix nbio psp ring assert The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how the address is set based on the incoming packet size so that both versions continue to work for now. Change-Id: I91694e4760198fd9129e60140df4e863666be2e2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70677 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/dev/amdgpu/amdgpu_nbio.cc 1 file changed, 17 insertions(+), 3 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/dev/amdgpu/amdgpu_nbio.cc b/src/dev/amdgpu/amdgpu_nbio.cc index 8064fd2..69e4373 100644 --- a/src/dev/amdgpu/amdgpu_nbio.cc +++ b/src/dev/amdgpu/amdgpu_nbio.cc @@ -162,9 +162,23 @@ AMDGPUNbio::writeFrame(PacketPtr pkt, Addr offset) { if (offset == psp_ring_listen_addr) { -assert(pkt->getSize() == 8); -psp_ring_dev_addr = pkt->getLE() - - gpuDevice->getVM().getSysAddrRangeLow(); +DPRINTF(AMDGPUDevice, "Saw psp_ring_listen_addr with size %ld value " +"%ld\n", pkt->getSize(), pkt->getUintX(ByteOrder::little)); + +/* + * In ROCm versions 4.x this packet is a 4 byte value. In ROCm 5.x + * the packet is 8 bytes and mapped as a system address which needs + * to be subtracted out to get the framebuffer address. 
+ */ +if (pkt->getSize() == 4) { +psp_ring_dev_addr = pkt->getLE(); +} else if (pkt->getSize() == 8) { +psp_ring_dev_addr = pkt->getUintX(ByteOrder::little) + - gpuDevice->getVM().getSysAddrRangeLow(); +} else { +panic("Invalid write size to psp_ring_listen_addr\n"); +} + DPRINTF(AMDGPUDevice, "Setting PSP ring device address to %#lx\n", psp_ring_dev_addr); } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I91694e4760198fd9129e60140df4e863666be2e2 Gerrit-Change-Number: 70677 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: VISHNU RAMADAS Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
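The size-based handling described in the PSP ring change above can be summarized in a small standalone function. This is a sketch under assumptions, not the gem5 code: `pspRingDevAddr` and its parameters are hypothetical, and an exception stands in for gem5's `panic`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: compute the PSP ring device address from a write of
// `size` bytes carrying `value`. `sysAddrRangeLow` plays the role of the
// low end of the system address range in the message above.
uint64_t
pspRingDevAddr(unsigned size, uint64_t value, uint64_t sysAddrRangeLow)
{
    if (size == 4) {
        // 4-byte packet (ROCm 4.x per the comment in the diff): the value
        // is used directly.
        return static_cast<uint32_t>(value);
    } else if (size == 8) {
        // 8-byte packet (ROCm 5.x): the value is a system address, so the
        // base is subtracted to get the framebuffer address.
        return value - sysAddrRangeLow;
    }
    // Any other size is treated as an error.
    throw std::invalid_argument("Invalid write size to psp_ring_listen_addr");
}
```

For example, with a base of 0x8000000000, an 8-byte write of 0x8000001000 and a 4-byte write of 0x1000 both map to the same device address in this model.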
[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-x86: Fix CPUID function 0
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email ) Change subject: arch-x86: Fix CPUID function 0 .. arch-x86: Fix CPUID function 0 CPUID function 0 should return the highest standard function number (NumStandardCpuidFuncs - 1), not the highest extended function number. Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561 --- M src/arch/x86/cpuid.cc 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/arch/x86/cpuid.cc b/src/arch/x86/cpuid.cc index 4ce66df..ac4709c 100644 --- a/src/arch/x86/cpuid.cc +++ b/src/arch/x86/cpuid.cc @@ -162,7 +162,7 @@ ISA *isa = dynamic_cast<ISA *>(tc->getIsaPtr()); auto vendor_string = isa->getVendorString(); result = CpuidResult( - NumExtendedCpuidFuncs - 1, + NumStandardCpuidFuncs - 1, stringToRegister(vendor_string.c_str()), stringToRegister(vendor_string.c_str() + 4), stringToRegister(vendor_string.c_str() + 8)); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561 Gerrit-Change-Number: 70778 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
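To make the CPUID function 0 fix above concrete, here is a minimal sketch of how such a result could be assembled. The enum, `packFourChars`, `Leaf0`, and `cpuidLeaf0` are all hypothetical and only mimic the pattern in the diff (a count enumerator minus one, plus the vendor string packed four characters at a time); they are not the gem5 definitions.

```cpp
#include <cstdint>

// Hypothetical list of standard CPUID functions; the last enumerator counts
// them, so the highest valid function number is NumStandardFuncs - 1.
enum StandardFunc : uint32_t
{
    VendorAndLargestStdFunc,  // function 0
    FamilyModelStepping,      // function 1
    CacheAndTlb,              // function 2
    NumStandardFuncs
};

// Pack four characters into a 32-bit value, first character in the low byte.
uint32_t
packFourChars(const char *str)
{
    uint32_t reg = 0;
    for (int i = 3; i >= 0; --i) {
        reg <<= 8;
        reg |= static_cast<uint8_t>(str[i]);
    }
    return reg;
}

// Hypothetical result type: highest standard function plus three words of
// the 12-character vendor string.
struct Leaf0
{
    uint32_t maxStdFunc;
    uint32_t vendor0;
    uint32_t vendor1;
    uint32_t vendor2;
};

// `vendor` is assumed to point at a string of at least 12 characters.
Leaf0
cpuidLeaf0(const char *vendor)
{
    return Leaf0{NumStandardFuncs - 1, packFourChars(vendor),
                 packFourChars(vendor + 4), packFourChars(vendor + 8)};
}
```

Software typically reads the first value to learn which standard functions it may query, which is why reporting the extended count there would be misleading.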
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Helper methods for SDWA/DPP for VOP2
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70738?usp=email ) Change subject: arch-vega: Helper methods for SDWA/DPP for VOP2 .. arch-vega: Helper methods for SDWA/DPP for VOP2 Many of the outstanding issues with the GPU model are related to instructions that lack SDWA/DPP implementations and execute while ignoring the special registers, leading to incorrect execution. Adding SDWA/DPP is currently very cumbersome as there is a lot of boilerplate code. This changeset adds helper methods for VOP2 with one instruction changed as an example. This review is intended to get feedback before applying this change to all VOP2 instructions that support SDWA/DPP. Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406 --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/op_encodings.hh 2 files changed, 97 insertions(+), 52 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 6c014bc..0d3f2dc 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -6384,65 +6384,17 @@ void Inst_VOP2__V_MUL_U32_U24::execute(GPUDynInstPtr gpuDynInst) { -Wavefront *wf = gpuDynInst->wavefront(); -ConstVecOperandU32 src0(gpuDynInst, instData.SRC0); -VecOperandU32 src1(gpuDynInst, instData.VSRC1); -VecOperandU32 vdst(gpuDynInst, instData.VDST); - -src0.readSrc(); -src1.read(); - -if (isSDWAInst()) { -VecOperandU32 src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0); -// use copies of original src0, src1, and dest during selecting -VecOperandU32 origSrc0_sdwa(gpuDynInst, -extData.iFmt_VOP_SDWA.SRC0); -VecOperandU32 origSrc1(gpuDynInst, instData.VSRC1); -VecOperandU32 origVdst(gpuDynInst, instData.VDST); - -src0_sdwa.read(); -origSrc0_sdwa.read(); -origSrc1.read(); - -DPRINTF(VEGA, "Handling V_MUL_U32_U24 SRC SDWA.
SRC0: register " -"v[%d], DST_SEL: %d, DST_U: %d, CLMP: %d, SRC0_SEL: " -"%d, SRC0_SEXT: %d, SRC0_NEG: %d, SRC0_ABS: %d, SRC1_SEL: " -"%d, SRC1_SEXT: %d, SRC1_NEG: %d, SRC1_ABS: %d\n", -extData.iFmt_VOP_SDWA.SRC0, extData.iFmt_VOP_SDWA.DST_SEL, -extData.iFmt_VOP_SDWA.DST_U, -extData.iFmt_VOP_SDWA.CLMP, -extData.iFmt_VOP_SDWA.SRC0_SEL, -extData.iFmt_VOP_SDWA.SRC0_SEXT, -extData.iFmt_VOP_SDWA.SRC0_NEG, -extData.iFmt_VOP_SDWA.SRC0_ABS, -extData.iFmt_VOP_SDWA.SRC1_SEL, -extData.iFmt_VOP_SDWA.SRC1_SEXT, -extData.iFmt_VOP_SDWA.SRC1_NEG, -extData.iFmt_VOP_SDWA.SRC1_ABS); - -processSDWA_src(extData.iFmt_VOP_SDWA, src0_sdwa, origSrc0_sdwa, -src1, origSrc1); - -for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { -if (wf->execMask(lane)) { -vdst[lane] = bits(src0_sdwa[lane], 23, 0) * - bits(src1[lane], 23, 0); -origVdst[lane] = vdst[lane]; // keep copy consistent -} -} - -processSDWA_dst(extData.iFmt_VOP_SDWA, vdst, origVdst); -} else { +auto opImpl = [](VecOperandU32& src0, VecOperandU32& src1, + VecOperandU32& vdst, Wavefront* wf) { for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (wf->execMask(lane)) { vdst[lane] = bits(src0[lane], 23, 0) * bits(src1[lane], 23, 0); } } -} +}; - -vdst.write(); +vop2Helper(gpuDynInst, opImpl); } // execute // --- Inst_VOP2__V_MUL_HI_U32_U24 class methods --- diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh index 1071ead..f195472 100644 --- a/src/arch/amdgpu/vega/insts/op_encodings.hh +++ b/src/arch/amdgpu/vega/insts/op_encodings.hh @@ -272,6 +272,99 @@ InstFormat extData; uint32_t varSize; +template +T sdwaSrcHelper(GPUDynInstPtr gpuDynInst, T & src1) +{ +T src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0); +// use copies of original src0, src1, and dest during selecting +T origSrc0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0); +T origSrc1(gpuDynInst, instData.VSRC1); + +src0_sdwa.read(); +origSrc0_sdwa.read(); +origSrc1.read(); + +DPRINTF(VEGA, "Ha
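The helper pattern proposed in the VOP2 change above, where each instruction supplies only its per-lane operation as a lambda and a shared helper owns the surrounding steps, can be sketched in isolation. Everything here is a simplified assumption: `vop2Sketch`, `mulU24`, `VecReg`, and the four-lane width are hypothetical, and the SDWA/DPP handling the real helper is meant to centralize is left out.

```cpp
#include <array>
#include <cstdint>

constexpr int NumLanes = 4;  // hypothetical small lane count for the sketch
using VecReg = std::array<uint32_t, NumLanes>;
using ExecMask = std::array<bool, NumLanes>;

// Hypothetical shared helper: creates the destination, calls the
// instruction-specific lambda, and returns the result. In a fuller version
// the operand selection steps would also live here, once, for all
// instructions.
template <typename OpFn>
VecReg
vop2Sketch(const VecReg &src0, const VecReg &src1, const ExecMask &mask,
           OpFn opImpl)
{
    VecReg vdst{};
    opImpl(src0, src1, vdst, mask);
    return vdst;
}

// One instruction expressed through the helper: multiply the low 24 bits
// of each source for every active lane.
VecReg
mulU24(const VecReg &a, const VecReg &b, const ExecMask &mask)
{
    auto opImpl = [](const VecReg &s0, const VecReg &s1, VecReg &dst,
                     const ExecMask &m) {
        for (int lane = 0; lane < NumLanes; ++lane) {
            if (m[lane]) {
                dst[lane] = (s0[lane] & 0xFFFFFF) * (s1[lane] & 0xFFFFFF);
            }
        }
    };
    return vop2Sketch(a, b, mask, opImpl);
}
```

The point of the design is that adding another instruction only requires writing a new lambda, while the boilerplate stays in one place.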
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix nbio psp ring assert
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email ) Change subject: dev-amdgpu: Fix nbio psp ring assert .. dev-amdgpu: Fix nbio psp ring assert The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how the address is set based on the incoming packet size so that both versions continue to work for now. Change-Id: I91694e4760198fd9129e60140df4e863666be2e2 --- M src/dev/amdgpu/amdgpu_nbio.cc 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/dev/amdgpu/amdgpu_nbio.cc b/src/dev/amdgpu/amdgpu_nbio.cc index 8064fd2..8722c50 100644 --- a/src/dev/amdgpu/amdgpu_nbio.cc +++ b/src/dev/amdgpu/amdgpu_nbio.cc @@ -162,9 +162,18 @@ AMDGPUNbio::writeFrame(PacketPtr pkt, Addr offset) { if (offset == psp_ring_listen_addr) { -assert(pkt->getSize() == 8); -psp_ring_dev_addr = pkt->getLE() - - gpuDevice->getVM().getSysAddrRangeLow(); +DPRINTF(AMDGPUDevice, "Saw psp_ring_listen_addr with size %ld value " +"%ld\n", pkt->getSize(), pkt->getUintX(ByteOrder::little)); + +if (pkt->getSize() == 4) { +psp_ring_dev_addr = pkt->getLE(); +} else if (pkt->getSize() == 8) { +psp_ring_dev_addr = pkt->getUintX(ByteOrder::little) + - gpuDevice->getVM().getSysAddrRangeLow(); +} else { +panic("Invalid write size to psp_ring_listen_addr\n"); +} + DPRINTF(AMDGPUDevice, "Setting PSP ring device address to %#lx\n", psp_ring_dev_addr); } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I91694e4760198fd9129e60140df4e863666be2e2 Gerrit-Change-Number: 70677 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-gcn3,arch-vega: Fix ds_read2st64_b32
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email ) Change subject: arch-gcn3,arch-vega: Fix ds_read2st64_b32 .. arch-gcn3,arch-vega: Fix ds_read2st64_b32 This instruction has two issues. The first is that it should write two consecutive registers, starting with vdst because it is writing two dwords. The second is that the data assignment to the lanes from the dynamic instruction should cast to a U32 type otherwise the array index goes out of bounds and returns the wrong data. The first issue was fixed in GCN3 a few years ago in this review: https://gem5-review.googlesource.com/c/public/gem5/+/32236. This changeset makes the same change for Vega and applies the U32 cast in both ISAs. Tested with rocPRIM unit test. The test was failing before this changeset and now passes. Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577 Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.cc 2 files changed, 5 insertions(+), 5 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index 8c51af5..478b1d3 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -32123,9 +32123,9 @@ for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { -vdst0[lane] = (reinterpret_cast( +vdst0[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2]; -vdst1[lane] = (reinterpret_cast( +vdst1[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2 + 1]; } } diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 45c8491..6c014bc 100644 --- 
a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -35665,13 +35665,13 @@ Inst_DS__DS_READ2ST64_B32::completeAcc(GPUDynInstPtr gpuDynInst) { VecOperandU32 vdst0(gpuDynInst, extData.VDST); -VecOperandU32 vdst1(gpuDynInst, extData.VDST + 2); +VecOperandU32 vdst1(gpuDynInst, extData.VDST + 1); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { -vdst0[lane] = (reinterpret_cast( +vdst0[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2]; -vdst1[lane] = (reinterpret_cast( +vdst1[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2 + 1]; } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 Gerrit-Change-Number: 70577 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
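The element-width issue described in the ds_read2st64_b32 message above can be reproduced with a plain byte buffer. The sketch below is an assumption-laden illustration, not the gem5 code: `readElem` and `unpackTwoDwords` are hypothetical, and `std::memcpy` is used instead of a pointer cast so the example stays well defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read element number `index` of type T from a raw byte buffer. The byte
// offset is index * sizeof(T), so the element type decides where we land.
template <typename T>
T
readElem(const std::vector<uint8_t> &buf, std::size_t index)
{
    T val;
    std::memcpy(&val, buf.data() + index * sizeof(T), sizeof(T));
    return val;
}

// The buffer holds two dwords per lane. Reading with a 32-bit element type
// puts dword 2*lane into dst0 and dword 2*lane+1 into dst1, i.e. two
// consecutive destination registers' worth of data. With a wider element
// type the same indices would address bytes past the end of the buffer for
// the upper lanes.
void
unpackTwoDwords(const std::vector<uint8_t> &buf, int lanes,
                std::vector<uint32_t> &dst0, std::vector<uint32_t> &dst1)
{
    for (int lane = 0; lane < lanes; ++lane) {
        dst0[lane] = readElem<uint32_t>(buf, lane * 2);
        dst1[lane] = readElem<uint32_t>(buf, lane * 2 + 1);
    }
}
```

With four lanes the buffer is 32 bytes long, and the highest index used is 7, which stays in bounds only when each element is four bytes wide.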
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-gcn3,arch-vega: Fix ds_read2st64_b32
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email ) Change subject: arch-gcn3,arch-vega: Fix ds_read2st64_b32 .. arch-gcn3,arch-vega: Fix ds_read2st64_b32 This instruction has two issues. The first is that it should write two consecutive registers, starting with vdst because it is writing two dwords. The second is that the data assignment to the lanes from the dynamic instruction should cast to a U32 type otherwise the array index goes out of bounds and returns the wrong data. The first issue was fixed in GCN3 a few years ago in this review: https://gem5-review.googlesource.com/c/public/gem5/+/32236. This changeset makes the same change for Vega and applies the U32 cast in both ISAs. Tested with rocPRIM unit test. The test was failing before this changeset and now passes. Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.cc 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index 8c51af5..478b1d3 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -32123,9 +32123,9 @@ for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { -vdst0[lane] = (reinterpret_cast( +vdst0[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2]; -vdst1[lane] = (reinterpret_cast( +vdst1[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2 + 1]; } } diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 45c8491..6c014bc 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -35665,13 +35665,13 @@ Inst_DS__DS_READ2ST64_B32::completeAcc(GPUDynInstPtr gpuDynInst) { VecOperandU32 vdst0(gpuDynInst, extData.VDST); -VecOperandU32 
vdst1(gpuDynInst, extData.VDST + 2); +VecOperandU32 vdst1(gpuDynInst, extData.VDST + 1); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { -vdst0[lane] = (reinterpret_cast( +vdst0[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2]; -vdst1[lane] = (reinterpret_cast( +vdst1[lane] = (reinterpret_cast( gpuDynInst->d_data))[lane * 2 + 1]; } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10 Gerrit-Change-Number: 70577 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: configs,dev-amdgpu: GPUFS MI200/gfx90a support
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70317?usp=email ) Change subject: configs,dev-amdgpu: GPUFS MI200/gfx90a support .. configs,dev-amdgpu: GPUFS MI200/gfx90a support Add support for MI200-like device. This includes adding PCI IDs and new MMIOs for the device, a different MAP_PROCESS packet, and a different calculation for the number of VGPRs. Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d --- M configs/example/gpufs/runfs.py M configs/example/gpufs/system/amdgpu.py M configs/example/gpufs/system/system.py M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/amdgpu_device.hh M src/dev/amdgpu/amdgpu_nbio.cc M src/dev/amdgpu/amdgpu_nbio.hh M src/dev/amdgpu/amdgpu_vm.hh M src/dev/amdgpu/pm4_defines.hh M src/dev/amdgpu/pm4_packet_processor.cc M src/dev/amdgpu/pm4_packet_processor.hh M src/gpu-compute/GPU.py M src/gpu-compute/gpu_command_processor.cc M src/gpu-compute/hsa_queue_entry.hh 14 files changed, 173 insertions(+), 27 deletions(-) diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 4c90601..f8ef70d 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -132,8 +132,9 @@ parser.add_argument( "--gpu-device", default="Vega10", -choices=["Vega10", "MI100"], -help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)", +choices=["Vega10", "MI100", "MI200"], +help="GPU model to run: Vega10 (gfx900), MI100 (gfx908), or " +"MI200 (gfx90a)", ) diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py index 5f98b55..9697e50 100644 --- a/configs/example/gpufs/system/amdgpu.py +++ b/configs/example/gpufs/system/amdgpu.py @@ -177,6 +177,10 @@ system.pc.south_bridge.gpu.DeviceID = 0x738C system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002 system.pc.south_bridge.gpu.SubsystemID = 0x0C34 +elif args.gpu_device == "MI200": +system.pc.south_bridge.gpu.DeviceID = 0x740F 
+system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002 +system.pc.south_bridge.gpu.SubsystemID = 0x0C34 elif args.gpu_device == "Vega10": system.pc.south_bridge.gpu.DeviceID = 0x6863 else: diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py index 90c5c01..263ffc0 100644 --- a/configs/example/gpufs/system/system.py +++ b/configs/example/gpufs/system/system.py @@ -152,6 +152,16 @@ 0x7D000, ] sdma_sizes = [0x1000] * 8 +elif args.gpu_device == "MI200": +num_sdmas = 5 +sdma_bases = [ +0x4980, +0x6180, +0x78000, +0x79000, +0x7A000, +] +sdma_sizes = [0x1000] * 5 else: m5.util.panic(f"Unknown GPU device {args.gpu_device}") diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index f58d1f7..734f0d7 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -115,7 +115,7 @@ sdmaFunc.insert({0x10b, &SDMAEngine::setPageDoorbellOffsetLo}); sdmaFunc.insert({0xe0, &SDMAEngine::setPageSize}); sdmaFunc.insert({0x113, &SDMAEngine::setPageWptrLo}); -} else if (p.device_name == "MI100") { +} else if (p.device_name == "MI100" || p.device_name == "MI200") { sdmaFunc.insert({0xd9, &SDMAEngine::setPageBaseLo}); sdmaFunc.insert({0xe1, &SDMAEngine::setPageRptrLo}); sdmaFunc.insert({0xe0, &SDMAEngine::setPageRptrHi}); @@ -144,10 +144,19 @@ if (p.device_name == "Vega10") { setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24); setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24); +gfx_version = GfxVersion::gfx900; } else if (p.device_name == "MI100") { setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24); setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24); setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory +gfx_version = GfxVersion::gfx908; +} else if (p.device_name == "MI200") { +// This device can have either 64GB or 128GB of device memory. +// This limits to 16GB for simulation. 
+setRegVal(MI200_FB_LOCATION_BASE, mmhubBase >> 24); +setRegVal(MI200_FB_LOCATION_TOP, mmhubTop >> 24); +setRegVal(MI200_MEM_SIZE_REG, 0x3ff0); +gfx_version = GfxVersion::gfx90a; } else { panic("Unknown GPU device %s\n", p.device_name); } diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh index cab7991..56ed2f4 100644 --- a/src/dev/amdgpu/amdgpu_device.hh +++ b/src/dev/amdgpu/amdgpu_device.hh @@ -42,6 +42,7 @@ #include "dev/amdgpu/mmio_reader.hh" #include "dev/io_device.hh" #include "dev/pci/device.hh" +#include "enums/GfxVersion.hh" #include "param
[gem5-dev] [L] Change in gem5/gem5[develop]: dev-amdgpu: Enable more GPUs with device specific registers
0x69b88 + +#define MI100_INV_ENG17_ACK1 0x0a318 +#define MI100_INV_ENG17_ACK2 0x6a918 +#define MI100_INV_ENG17_ACK3 0x76918 +#define MI100_INV_ENG17_SEM1 0x0a288 +#define MI100_INV_ENG17_SEM2 0x6a888 +#define MI100_INV_ENG17_SEM3 0x76888 + +class AMDGPUNbio +{ + public: +AMDGPUNbio(); + +void setGPUDevice(AMDGPUDevice *gpu_device); + +void readMMIO(PacketPtr pkt, Addr offset); +void writeMMIO(PacketPtr pkt, Addr offset); + +bool readFrame(PacketPtr pkt, Addr offset); +void writeFrame(PacketPtr pkt, Addr offset); + + private: +AMDGPUDevice *gpuDevice; + +/* + * Driver initialization sequence helper variables. + */ +uint64_t mm_index_reg = 0; +std::unordered_map triggered_reads; + +/* + * PSP variables used in initialization. + */ +Addr psp_ring = 0; +Addr psp_ring_dev_addr = 0; +Addr psp_ring_listen_addr = 0; +int psp_ring_size = 0; +int psp_ring_retval = 0; +int psp_ring_value = 0; +}; + +} // namespace gem5 + +#endif // __DEV_AMDGPU_AMDGPU_NBIO__ diff --git a/src/dev/amdgpu/amdgpu_vm.hh b/src/dev/amdgpu/amdgpu_vm.hh index 212a688..ac35a11 100644 --- a/src/dev/amdgpu/amdgpu_vm.hh +++ b/src/dev/amdgpu/amdgpu_vm.hh @@ -74,6 +74,13 @@ #define mmMMHUB_VM_FB_LOCATION_BASE 0x082c #define mmMMHUB_VM_FB_LOCATION_TOP 0x082d +#define VEGA10_FB_LOCATION_BASE 0x6a0b0 +#define VEGA10_FB_LOCATION_TOP 0x6a0b4 + +#define MI100_MEM_SIZE_REG 0x0378c +#define MI100_FB_LOCATION_BASE 0x6ac00 +#define MI100_FB_LOCATION_TOP 0x6ac04 + // AMD GPUs support 16 different virtual address spaces static constexpr int AMDGPU_VM_COUNT = 16; @@ -192,6 +199,9 @@ Addr getMMHUBBase() { return mmhubBase; } Addr getMMHUBTop() { return mmhubTop; } +void setMMHUBBase(Addr base) { mmhubBase = base; } +void setMMHUBTop(Addr top) { mmhubTop = top; } + bool inFB(Addr vaddr) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70041?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: 
public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I14b364374e086e185978334425a4e265cf2760d0 Gerrit-Change-Number: 70041 Gerrit-PatchSet: 4 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Refactor MMIO interface for SDMA engines
}; diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc index 736df45..e99d694 100644 --- a/src/dev/amdgpu/sdma_engine.cc +++ b/src/dev/amdgpu/sdma_engine.cc @@ -49,7 +49,8 @@ : DmaVirtDevice(p), id(0), gfxBase(0), gfxRptr(0), gfxDoorbell(0), gfxDoorbellOffset(0), gfxWptr(0), pageBase(0), pageRptr(0), pageDoorbell(0), pageDoorbellOffset(0), - pageWptr(0), gpuDevice(nullptr), walker(p.walker) + pageWptr(0), gpuDevice(nullptr), walker(p.walker), + mmioBase(p.mmio_base), mmioSize(p.mmio_size) { gfx.ib(&gfxIb); gfxIb.parent(&gfx); @@ -87,6 +88,18 @@ return SOC15_IH_CLIENTID_SDMA0; case 1: return SOC15_IH_CLIENTID_SDMA1; + case 2: +return SOC15_IH_CLIENTID_SDMA2; + case 3: +return SOC15_IH_CLIENTID_SDMA3; + case 4: +return SOC15_IH_CLIENTID_SDMA4; + case 5: +return SOC15_IH_CLIENTID_SDMA5; + case 6: +return SOC15_IH_CLIENTID_SDMA6; + case 7: +return SOC15_IH_CLIENTID_SDMA7; default: panic("Unknown SDMA id"); } @@ -1240,6 +1253,10 @@ { gfxDoorbellOffset = insertBits(gfxDoorbellOffset, 31, 0, 0); gfxDoorbellOffset |= data; +if (bits(gfxDoorbell, 28, 28)) { +gpuDevice->setDoorbellType(gfxDoorbellOffset, QueueType::SDMAGfx); +gpuDevice->setSDMAEngine(gfxDoorbellOffset, this); +} } void @@ -1250,9 +1267,11 @@ } void -SDMAEngine::setGfxSize(uint64_t data) +SDMAEngine::setGfxSize(uint32_t data) { -gfx.size(data); +uint32_t rb_size = bits(data, 6, 1); +assert(rb_size >= 6 && rb_size <= 62); +gfx.size(1 << (rb_size + 2)); } void @@ -1320,6 +1339,10 @@ { pageDoorbellOffset = insertBits(pageDoorbellOffset, 31, 0, 0); pageDoorbellOffset |= data; +if (bits(pageDoorbell, 28, 28)) { +gpuDevice->setDoorbellType(pageDoorbellOffset, QueueType::SDMAPage); +gpuDevice->setSDMAEngine(pageDoorbellOffset, this); +} } void @@ -1330,9 +1353,11 @@ } void -SDMAEngine::setPageSize(uint64_t data) +SDMAEngine::setPageSize(uint32_t data) { -page.size(data); +uint32_t rb_size = bits(data, 6, 1); +assert(rb_size >= 6 && rb_size <= 62); +page.size(1 << (rb_size + 2)); } void diff 
--git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 27c1691..1e4f965 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -156,6 +156,9 @@ void processRLC0(Addr wptrOffset); void processRLC1(Addr wptrOffset); +Addr mmioBase = 0; +Addr mmioSize = 0; + public: SDMAEngine(const SDMAEngineParams &p); @@ -243,6 +246,14 @@ uint64_t *dmaBuffer); /** + * Methods for getting SDMA MMIO base address and size. These are set by + * the python configuration depending on device to allow for flexible base + * addresses depending on what GPU is being simulated. + */ +Addr getMmioBase() { return mmioBase; } +Addr getMmioSize() { return mmioSize; } + +/** * Methods for getting the values of SDMA MMIO registers. */ uint64_t getGfxBase() { return gfxBase; } @@ -269,7 +280,7 @@ void setGfxDoorbellHi(uint32_t data); void setGfxDoorbellOffsetLo(uint32_t data); void setGfxDoorbellOffsetHi(uint32_t data); -void setGfxSize(uint64_t data); +void setGfxSize(uint32_t data); void setGfxWptrLo(uint32_t data); void setGfxWptrHi(uint32_t data); void setPageBaseLo(uint32_t data); @@ -280,7 +291,7 @@ void setPageDoorbellHi(uint32_t data); void setPageDoorbellOffsetLo(uint32_t data); void setPageDoorbellOffsetHi(uint32_t data); -void setPageSize(uint64_t data); +void setPageSize(uint32_t data); void setPageWptrLo(uint32_t data); void setPageWptrHi(uint32_t data); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70040?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc Gerrit-Change-Number: 70040 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Default MMIO reads when previously written
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email ) ( 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. ) Change subject: dev-amdgpu: Default MMIO reads when previously written .. dev-amdgpu: Default MMIO reads when previously written If an MMIO was previously written and the driver reads it, we should return the value that was previously written. This overrides the MMIO trace value, which is the last-resort fallback for finding an MMIO value. This is needed to initialize newer GPU devices in gem5. Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039 Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair Tested-by: kokoro --- M src/dev/amdgpu/amdgpu_device.cc 1 file changed, 7 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index 3605882..7e6304a 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -248,6 +248,13 @@ DPRINTF(AMDGPUDevice, "Read MMIO %#lx\n", offset); mmioReader.readFromTrace(pkt, MMIO_BAR, offset); +if (regs.find(pkt->getAddr()) != regs.end()) { +uint64_t value = regs[pkt->getAddr()]; +DPRINTF(AMDGPUDevice, "Reading what kernel wrote before: %#x\n", +value); +pkt->setUintX(value, ByteOrder::little); +} + switch (aperture) { case NBIO_BASE: switch (aperture_offset) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09 Gerrit-Change-Number: 70039 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew
Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
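The read priority described in the MMIO change above (a recorded trace value as the fallback, a value written earlier taking precedence) can be modeled in a few lines. This is a minimal sketch with hypothetical names (`MmioModel`, `setTraceValue`); the real device works on packets and apertures that are not modeled here.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical model of the read path: the trace supplies a default value,
// and anything written earlier to the same address overrides it.
class MmioModel
{
  public:
    void write(uint64_t addr, uint64_t value) { regs[addr] = value; }

    void setTraceValue(uint64_t addr, uint64_t value) { trace[addr] = value; }

    uint64_t
    read(uint64_t addr) const
    {
        uint64_t value = 0;

        // Fallback: value taken from the recorded trace, if any.
        auto traced = trace.find(addr);
        if (traced != trace.end()) {
            value = traced->second;
        }

        // Override: value written earlier to this address, if any.
        auto written = regs.find(addr);
        if (written != regs.end()) {
            value = written->second;
        }

        return value;
    }

  private:
    std::unordered_map<uint64_t, uint64_t> regs;
    std::unordered_map<uint64_t, uint64_t> trace;
};
```

In this model an address that was never written still returns its trace value, so existing behavior is unchanged for registers the driver only reads.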
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu,configs: Add human readable names for different GPUs
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email ) Change subject: dev-amdgpu,configs: Add human readable names for different GPUs .. dev-amdgpu,configs: Add human readable names for different GPUs Add a human readable string for GPU device names rather than using the device ID in the code. This is intended to make code more readable. Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70038 Reviewed-by: Matt Sinclair Tested-by: kokoro Reviewed-by: Jason Lowe-Power Maintainer: Matt Sinclair --- M configs/example/gpufs/runfs.py M configs/example/gpufs/system/amdgpu.py M src/dev/amdgpu/AMDGPU.py 3 files changed, 24 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass Jason Lowe-Power: Looks good to me, approved diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 52b79ab..4c90601 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -126,6 +126,16 @@ help="type of memory to use", ) +# These are the models that are both supported in gem5 and supported +# by the versions of ROCm supported by gem5 in full system mode. For +# other gfx versions there is some support in syscall emulation mode. 
+parser.add_argument( +"--gpu-device", +default="Vega10", +choices=["Vega10", "MI100"], +help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)", +) + def runGpuFSSystem(args): """ diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py index 1fd3e2f..5f98b55 100644 --- a/configs/example/gpufs/system/amdgpu.py +++ b/configs/example/gpufs/system/amdgpu.py @@ -170,3 +170,14 @@ system.pc.south_bridge.gpu.checkpoint_before_mmios = ( args.checkpoint_before_mmios ) + +system.pc.south_bridge.gpu.device_name = args.gpu_device + +if args.gpu_device == "MI100": +system.pc.south_bridge.gpu.DeviceID = 0x738C +system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002 +system.pc.south_bridge.gpu.SubsystemID = 0x0C34 +elif args.gpu_device == "Vega10": +system.pc.south_bridge.gpu.DeviceID = 0x6863 +else: +panic("Unknown GPU device: {}".format(args.gpu_device)) diff --git a/src/dev/amdgpu/AMDGPU.py b/src/dev/amdgpu/AMDGPU.py index f9d953f..1e78672 100644 --- a/src/dev/amdgpu/AMDGPU.py +++ b/src/dev/amdgpu/AMDGPU.py @@ -46,6 +46,9 @@ cxx_header = "dev/amdgpu/amdgpu_device.hh" cxx_class = "gem5::AMDGPUDevice" +# Human readable name for device ID +device_name = Param.String("Vega10", "Codename for device") + # IDs for AMD Vega 10 VendorID = 0x1002 DeviceID = 0x6863 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a Gerrit-Change-Number: 70038 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add decodings for new MI100 VOP2 insts
ying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_FMAC_F32(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_PK_FMAC_F16(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_XNOR_B32(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* Decoder::decode_OP_SOP2__S_ADD_U32(MachInst iFmt) { return new Inst_SOP2__S_ADD_U32(&iFmt->iFmt_SOP2); diff --git a/src/arch/amdgpu/vega/gpu_decoder.hh b/src/arch/amdgpu/vega/gpu_decoder.hh index 1be4386..af989e0 100644 --- a/src/arch/amdgpu/vega/gpu_decoder.hh +++ b/src/arch/amdgpu/vega/gpu_decoder.hh @@ -1358,6 +1358,13 @@ GPUStaticInst* decode_OP_VOP2__V_ADD_U32(MachInst); GPUStaticInst* decode_OP_VOP2__V_SUB_U32(MachInst); GPUStaticInst* decode_OP_VOP2__V_SUBREV_U32(MachInst); +GPUStaticInst* decode_OP_VOP2__V_DOT2C_F32_F16(MachInst); +GPUStaticInst* decode_OP_VOP2__V_DOT2C_I32_I16(MachInst); +GPUStaticInst* decode_OP_VOP2__V_DOT4C_I32_I8(MachInst); +GPUStaticInst* decode_OP_VOP2__V_DOT8C_I32_I4(MachInst); +GPUStaticInst* decode_OP_VOP2__V_FMAC_F32(MachInst); +GPUStaticInst* decode_OP_VOP2__V_PK_FMAC_F16(MachInst); +GPUStaticInst* decode_OP_VOP2__V_XNOR_B32(MachInst); GPUStaticInst* decode_OP_VOPC__V_CMP_CLASS_F32(MachInst); GPUStaticInst* decode_OP_VOPC__V_CMPX_CLASS_F32(MachInst); GPUStaticInst* decode_OP_VOPC__V_CMP_CLASS_F64(MachInst); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70042?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893 Gerrit-Change-Number: 70042 Gerrit-PatchSet: 4 Gerrit-Owner: Matthew 
Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Add writeROM method
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email ) Change subject: dev-amdgpu: Add writeROM method .. dev-amdgpu: Add writeROM method For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore gets routed to the PIO bus in gem5. This gets routed to the GPU in the case of a ROM write. We write to the ROM as a way to "load" the VBIOS without creating holes in the KVM VM. This write method allows the same scripts as KVM to be used by writing to the ROM area and overwriting what might already be there from the --gpu-rom option. Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037 Reviewed-by: Matt Sinclair Tested-by: kokoro Maintainer: Matt Sinclair --- M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/amdgpu_device.hh 2 files changed, 23 insertions(+), 0 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index cb180b6..3605882 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -107,6 +107,20 @@ pkt->getAddr(), rom_offset, rom_data); } +void +AMDGPUDevice::writeROM(PacketPtr pkt) +{ +assert(isROM(pkt->getAddr())); + +Addr rom_offset = pkt->getAddr() - romRange.start(); +uint64_t rom_data = pkt->getUintX(ByteOrder::little); + +memcpy(rom.data() + rom_offset, &rom_data, pkt->getSize()); + +DPRINTF(AMDGPUDevice, "Write to addr %#x on ROM offset %#x data: %#x\n", +pkt->getAddr(), rom_offset, rom_data); +} + AddrRangeList AMDGPUDevice::getAddrRanges() const { @@ -386,6 +400,14 @@ Tick AMDGPUDevice::write(PacketPtr pkt) { +if (isROM(pkt->getAddr())) { +writeROM(pkt); + +dispatchAccess(pkt, false); + +return pioDelay; +} + int barnum = -1; Addr offset = 0; getBAR(pkt->getAddr(), barnum, offset); diff --git a/src/dev/amdgpu/amdgpu_device.hh 
b/src/dev/amdgpu/amdgpu_device.hh index ac31b95..b64067a 100644 --- a/src/dev/amdgpu/amdgpu_device.hh +++ b/src/dev/amdgpu/amdgpu_device.hh @@ -94,6 +94,7 @@ AddrRange romRange; bool isROM(Addr addr) const { return romRange.contains(addr); } void readROM(PacketPtr pkt); +void writeROM(PacketPtr pkt); std::array rom; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683 Gerrit-Change-Number: 70037 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
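The writeROM logic in this change (compute an offset into the ROM, then copy the packet's bytes there) can be modeled in a few lines of Python. This is a rough sketch and not gem5 code: `write_rom` and its parameters are hypothetical, and it assumes the packet data is interpreted little-endian, as the `ByteOrder::little` argument in the diff suggests. Unlike the C++ code, which only asserts on the start address, the sketch checks the whole write range.

```python
def write_rom(rom, rom_start, addr, value, size):
    """Copy the low `size` bytes of `value` into `rom`, little-endian.

    `rom` is a bytearray standing in for the device ROM, `rom_start` is the
    first address of the ROM range, and `addr` is the address being written.
    Returns the ROM offset that was written.
    """
    if not 1 <= size <= 8:
        raise ValueError("size must be between 1 and 8 bytes")
    offset = addr - rom_start
    if offset < 0 or offset + size > len(rom):
        raise ValueError("write falls outside the ROM range")
    # Equivalent of memcpy(rom.data() + rom_offset, &rom_data, size)
    # when rom_data holds the value in little-endian byte order.
    data = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")[:size]
    rom[offset:offset + size] = data
    return offset
```

For example, with a ROM range starting at 0xC0000 (the address implied by `seek=768` with `bs=1k` in the run scripts), `write_rom(rom, 0xC0000, 0xC0004, 0xAA55, 2)` would store the bytes 0x55, 0xAA at ROM offset 4.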
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Use higher dmesg level for GPUFS
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email ) Change subject: configs: Use higher dmesg level for GPUFS .. configs: Use higher dmesg level for GPUFS The dmesg level is currently set to 3 which will not display errors if the amdgpu driver fails to load. Changing to level 8 will show errors in the gem5 terminal and is not too spammy. This will help GPUFS developers with bug reports since we would actually be able to observe an error. Currently if the driver fails to load, there is no way to detect it and applications will attempt to run, usually failing on getting device properties. Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69977 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M configs/example/gpufs/hip_cookbook.py M configs/example/gpufs/hip_rodinia.py M configs/example/gpufs/hip_samples.py M configs/example/gpufs/vega10_kvm.py 4 files changed, 4 insertions(+), 4 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/gpufs/hip_cookbook.py b/configs/example/gpufs/hip_cookbook.py index 87c7547..6a7bb42 100644 --- a/configs/example/gpufs/hip_cookbook.py +++ b/configs/example/gpufs/hip_cookbook.py @@ -42,7 +42,7 @@ cookbook_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." 
diff --git a/configs/example/gpufs/hip_rodinia.py b/configs/example/gpufs/hip_rodinia.py index 8ed951b..b8a7858 100644 --- a/configs/example/gpufs/hip_rodinia.py +++ b/configs/example/gpufs/hip_rodinia.py @@ -43,7 +43,7 @@ rodinia_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." diff --git a/configs/example/gpufs/hip_samples.py b/configs/example/gpufs/hip_samples.py index ccc1719..9f83c25 100644 --- a/configs/example/gpufs/hip_samples.py +++ b/configs/example/gpufs/hip_samples.py @@ -42,7 +42,7 @@ samples_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." diff --git a/configs/example/gpufs/vega10_kvm.py b/configs/example/gpufs/vega10_kvm.py index 54253be..9c7e457 100644 --- a/configs/example/gpufs/vega10_kvm.py +++ b/configs/example/gpufs/vega10_kvm.py @@ -44,7 +44,7 @@ demo_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d Gerrit-Change-Number: 69977 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add simple check for valid GPU MMIO trace
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email ) Change subject: configs: Add simple check for valid GPU MMIO trace .. configs: Add simple check for valid GPU MMIO trace This file is a required input to the simulator for GPUFS. There seems to be confusion from several users who are not providing this input. This usually results in the amdgpu driver failing to load, leading to the application under test exiting along with it. This changeset adds a simple md5 hashsum check to compare against the known good MMIO trace located in the gem5-resources repository. Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69978 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M configs/example/gpufs/runfs.py 1 file changed, 6 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 4a28068a..52b79ab 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -30,6 +30,7 @@ # System includes import argparse import math +import hashlib # gem5 related import m5 @@ -145,6 +146,11 @@ math.ceil(float(n_cu) / args.cu_per_scalar_cache) ) +# Verify MMIO trace is valid +mmio_md5 = hashlib.md5(open(args.gpu_mmio_trace, "rb").read()).hexdigest() +if mmio_md5 != "c4ff3326ae8a036e329b8b595c83bd6d": +m5.util.panic("MMIO file does not match gem5 resources") + system = makeGpuFSSystem(args) root = Root( -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987 Gerrit-Change-Number: 69978 
Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro
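The md5 check added to runfs.py reads the whole trace into memory and leaves the file handle to the garbage collector. A possible variant, shown here only as a sketch, hashes the file in chunks and closes it explicitly. `file_md5` and `check_mmio_trace` are hypothetical helper names, and the sketch raises a plain Python exception in place of the `m5.util.panic` call so that it runs outside gem5.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as trace:
        # iter() with a sentinel stops when read() returns an empty bytes.
        for chunk in iter(lambda: trace.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_mmio_trace(path, expected_md5):
    """Raise ValueError if the file at `path` does not hash to expected_md5."""
    actual = file_md5(path)
    if actual != expected_md5:
        raise ValueError(
            f"MMIO file does not match gem5 resources "
            f"(got {actual}, expected {expected_md5})"
        )
```

In runfs.py the expected value would be the hash from the diff, "c4ff3326ae8a036e329b8b595c83bd6d", and the path would be `args.gpu_mmio_trace`.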
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Allow other CPU types in GPUFS
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email ) Change subject: configs: Allow other CPU types in GPUFS .. configs: Allow other CPU types in GPUFS Previously the CPU type and memory modes were hardcoded for KVM, because there was a deadlock bug. After some recent testing, this deadlock bug no longer exists with the simple CPU models. Thus, changing the configs to allow for other CPU models as a first step toward lifting the KVM requirement from GPUFS. Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69979 Tested-by: kokoro Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair --- M configs/example/gpufs/system/system.py 1 file changed, 6 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py index a1b59ef..93f0194 100644 --- a/configs/example/gpufs/system/system.py +++ b/configs/example/gpufs/system/system.py @@ -61,7 +61,9 @@ panic("Need at least 2GB of system memory to load amdgpu module") # Use the common FSConfig to setup a Linux X86 System -(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args) +(TestCPUClass, test_mem_mode) = Simulation.getCPUClass(args.cpu_type) +if test_mem_mode == "atomic": +test_mem_mode = "atomic_noncaching" disks = [args.disk_image] if args.second_disk is not None: disks.extend([args.second_disk]) @@ -91,10 +93,11 @@ # Create specified number of CPUs. GPUFS really only needs one. 
system.cpu = [ -X86KvmCPU(clk_domain=system.cpu_clk_domain, cpu_id=i) +TestCPUClass(clk_domain=system.cpu_clk_domain, cpu_id=i) for i in range(args.num_cpus) ] -system.kvm_vm = KvmVM() +if ObjectList.is_kvm_cpu(TestCPUClass): +system.kvm_vm = KvmVM() # Create AMDGPU and attach to southbridge shader = createGPU(system, args) -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: merged Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5 Gerrit-Change-Number: 69979 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add decodings for new MI100 VOP2 insts
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70042?usp=email ) Change subject: arch-vega: Add decodings for new MI100 VOP2 insts .. arch-vega: Add decodings for new MI100 VOP2 insts VOP2 instructions with opcodes 55-61 were added in MI100 and are not in Vega10. This changeset adds the decodings for these instructions. The changeset does not implement the instructions; however, the fatal message is much more helpful for debugging compared to a generic decode_invalid handler. Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893 --- M src/arch/amdgpu/vega/decoder.cc M src/arch/amdgpu/vega/gpu_decoder.hh 2 files changed, 84 insertions(+), 28 deletions(-) diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc index 291dd69..fd3a803 100644 --- a/src/arch/amdgpu/vega/decoder.cc +++ b/src/arch/amdgpu/vega/decoder.cc @@ -274,34 +274,34 @@ &Decoder::decode_OP_VOP2__V_SUBREV_U32, &Decoder::decode_OP_VOP2__V_SUBREV_U32, &Decoder::decode_OP_VOP2__V_SUBREV_U32, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, +&Decoder::decode_OP_VOP2__V_DOT2C_F32_F16, +&Decoder::decode_OP_VOP2__V_DOT2C_F32_F16, +&Decoder::decode_OP_VOP2__V_DOT2C_F32_F16, +&Decoder::decode_OP_VOP2__V_DOT2C_F32_F16, 
+&Decoder::decode_OP_VOP2__V_DOT2C_I32_I16, +&Decoder::decode_OP_VOP2__V_DOT2C_I32_I16, +&Decoder::decode_OP_VOP2__V_DOT2C_I32_I16, +&Decoder::decode_OP_VOP2__V_DOT2C_I32_I16, +&Decoder::decode_OP_VOP2__V_DOT4C_I32_I8, +&Decoder::decode_OP_VOP2__V_DOT4C_I32_I8, +&Decoder::decode_OP_VOP2__V_DOT4C_I32_I8, +&Decoder::decode_OP_VOP2__V_DOT4C_I32_I8, +&Decoder::decode_OP_VOP2__V_DOT8C_I32_I4, +&Decoder::decode_OP_VOP2__V_DOT8C_I32_I4, +&Decoder::decode_OP_VOP2__V_DOT8C_I32_I4, +&Decoder::decode_OP_VOP2__V_DOT8C_I32_I4, +&Decoder::decode_OP_VOP2__V_FMAC_F32, +&Decoder::decode_OP_VOP2__V_FMAC_F32, +&Decoder::decode_OP_VOP2__V_FMAC_F32, +&Decoder::decode_OP_VOP2__V_FMAC_F32, +&Decoder::decode_OP_VOP2__V_PK_FMAC_F16, +&Decoder::decode_OP_VOP2__V_PK_FMAC_F16, +&Decoder::decode_OP_VOP2__V_PK_FMAC_F16, +&Decoder::decode_OP_VOP2__V_PK_FMAC_F16, +&Decoder::decode_OP_VOP2__V_XNOR_B32, +&Decoder::decode_OP_VOP2__V_XNOR_B32, +&Decoder::decode_OP_VOP2__V_XNOR_B32, +&Decoder::decode_OP_VOP2__V_XNOR_B32, &Decoder::subDecode_OP_VOPC, &Decoder::subDecode_OP_VOPC, &Decoder::subDecode_OP_VOPC, @@ -4172,6 +4172,55 @@ } GPUStaticInst* +Decoder::decode_OP_VOP2__V_DOT2C_F32_F16(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_DOT2C_I32_I16(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_DOT4C_I32_I8(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_DOT8C_I32_I4(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_FMAC_F32(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_PK_FMAC_F16(MachInst iFmt) +{ +fatal("Trying to decode instruction 
without a class\n"); +return nullptr; +} + +GPUStaticInst* +Decoder::decode_OP_VOP2__V_XNOR_B32(MachInst iFmt) +{ +fatal("Trying to decode instruction without a class\n"); +return nullptr; +} + +GPUStaticInst* De
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Add writeROM method
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email ) Change subject: dev-amdgpu: Add writeROM method .. dev-amdgpu: Add writeROM method For non-KVM CPUs the VBIOS memory falls into an I/O hole and therefore gets routed to the PIO bus in gem5. This gets routed to the GPU in the case of a ROM write. We write to the ROM as a way to "load" the VBIOS without creating holes in the KVM VM. This write method allows the same scripts as KVM to be used by writing to the ROM area and overwriting what might already be there from the --gpu-rom option. Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683 --- M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/amdgpu_device.hh 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index cb180b6..3605882 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -107,6 +107,20 @@ pkt->getAddr(), rom_offset, rom_data); } +void +AMDGPUDevice::writeROM(PacketPtr pkt) +{ +assert(isROM(pkt->getAddr())); + +Addr rom_offset = pkt->getAddr() - romRange.start(); +uint64_t rom_data = pkt->getUintX(ByteOrder::little); + +memcpy(rom.data() + rom_offset, &rom_data, pkt->getSize()); + +DPRINTF(AMDGPUDevice, "Write to addr %#x on ROM offset %#x data: %#x\n", +pkt->getAddr(), rom_offset, rom_data); +} + AddrRangeList AMDGPUDevice::getAddrRanges() const { @@ -386,6 +400,14 @@ Tick AMDGPUDevice::write(PacketPtr pkt) { +if (isROM(pkt->getAddr())) { +writeROM(pkt); + +dispatchAccess(pkt, false); + +return pioDelay; +} + int barnum = -1; Addr offset = 0; getBAR(pkt->getAddr(), barnum, offset); diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh index ac31b95..b64067a 100644 --- a/src/dev/amdgpu/amdgpu_device.hh +++ b/src/dev/amdgpu/amdgpu_device.hh @@ -94,6 +94,7 @@ AddrRange romRange; bool isROM(Addr addr) const { return romRange.contains(addr); } void 
readROM(PacketPtr pkt); +void writeROM(PacketPtr pkt); std::array rom; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683 Gerrit-Change-Number: 70037 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba
[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Default MMIO reads when previously written
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email ) Change subject: dev-amdgpu: Default MMIO reads when previously written .. dev-amdgpu: Default MMIO reads when previously written If an MMIO was previously written and the driver reads it, we should return the value that was previously written. This overrides the MMIO trace value, which is the last-resort fallback for finding an MMIO value. This is needed to initialize newer GPU devices in gem5. Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09 --- M src/dev/amdgpu/amdgpu_device.cc 1 file changed, 7 insertions(+), 0 deletions(-) diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index 3605882..7e6304a 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -248,6 +248,13 @@ DPRINTF(AMDGPUDevice, "Read MMIO %#lx\n", offset); mmioReader.readFromTrace(pkt, MMIO_BAR, offset); +if (regs.find(pkt->getAddr()) != regs.end()) { +uint64_t value = regs[pkt->getAddr()]; +DPRINTF(AMDGPUDevice, "Reading what kernel wrote before: %#x\n", +value); +pkt->setUintX(value, ByteOrder::little); +} + switch (aperture) { case NBIO_BASE: switch (aperture_offset) { -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09 Gerrit-Change-Number: 70039 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba
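The read priority this change introduces (the trace value first, then a value the driver wrote earlier if one exists) can be illustrated with two dictionaries. This is a simplified model, not gem5 code: `read_mmio`, `trace_values` and `written_regs` are hypothetical names, and the model ignores the aperture-specific handling that follows in the real readMMIO method.

```python
def read_mmio(addr, trace_values, written_regs, default=0):
    """Return the value an MMIO read of `addr` would see in this model.

    `trace_values` stands in for the MMIO trace and `written_regs` for the
    map of registers the driver has already written.
    """
    # Start with the trace value, the fallback of last resort.
    value = trace_values.get(addr, default)
    # A value written earlier by the driver takes priority over the trace.
    if addr in written_regs:
        value = written_regs[addr]
    return value
```

A register that appears in both maps reads back the written value, while a register that was never written still reads back the trace value.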
[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Refactor MMIO interface for SDMA engines
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70040?usp=email ) Change subject: dev-amdgpu: Refactor MMIO interface for SDMA engines .. dev-amdgpu: Refactor MMIO interface for SDMA engines Currently the amdgpu simulated device is assumed to be a Vega10. As a result there are a few things that are hardcoded. One of those is the number of SDMAs. In order to add a newer device, such as MI100+, we need to enable a flexible number of SDMAs. In order to support a variable number of SDMAs, and because the MMIO offsets of each device are potentially different, the MMIO interface for SDMAs is changed to use an SDMA class method dispatch table which forwards a 32-bit value from the MMIO packet to the MMIO functions in SDMA of the format `void method(uint32_t)`. Several changes are made to enable this: - Allow the SDMA to have a variable MMIO base and size. These are configured in python. - An SDMA class method dispatch table which contains the MMIO offset relative to the SDMA's MMIO base address. - An updated writeMMIO method to iterate over the SDMA MMIO address ranges and call the appropriate SDMA MMIO method which matches the MMIO offset. - Moved all SDMA related MMIO data bit twiddling, masking, etc. into the MMIO methods themselves instead of in the writeMMIO method in SDMAEngine. 
Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc --- M configs/example/gpufs/system/system.py M src/dev/amdgpu/AMDGPU.py M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/amdgpu_device.hh M src/dev/amdgpu/interrupt_handler.cc M src/dev/amdgpu/interrupt_handler.hh M src/dev/amdgpu/sdma_engine.cc M src/dev/amdgpu/sdma_engine.hh 8 files changed, 182 insertions(+), 57 deletions(-) diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py index 93f0194..90c5c01 100644 --- a/configs/example/gpufs/system/system.py +++ b/configs/example/gpufs/system/system.py @@ -129,15 +129,45 @@ device_ih = AMDGPUInterruptHandler() system.pc.south_bridge.gpu.device_ih = device_ih -# Setup the SDMA engines -sdma0_pt_walker = VegaPagetableWalker() -sdma1_pt_walker = VegaPagetableWalker() +# Setup the SDMA engines depending on device. The MMIO base addresses +# can be found in the driver code under: +# include/asic_reg/sdmaX/sdmaX_Y_Z_offset.h +num_sdmas = 2 +sdma_bases = [] +sdma_sizes = [] +if args.gpu_device == "Vega10": +num_sdmas = 2 +sdma_bases = [0x4980, 0x5180] +sdma_sizes = [0x800] * 2 +elif args.gpu_device == "MI100": +num_sdmas = 8 +sdma_bases = [ +0x4980, +0x6180, +0x78000, +0x79000, +0x7A000, +0x7B000, +0x7C000, +0x7D000, +] +sdma_sizes = [0x1000] * 8 +else: +m5.util.panic(f"Unknown GPU device {args.gpu_device}") -sdma0 = SDMAEngine(walker=sdma0_pt_walker) -sdma1 = SDMAEngine(walker=sdma1_pt_walker) +sdma_pt_walkers = [] +sdma_engines = [] +for sdma_idx in range(num_sdmas): +sdma_pt_walker = VegaPagetableWalker() +sdma_engine = SDMAEngine( +walker=sdma_pt_walker, +mmio_base=sdma_bases[sdma_idx], +mmio_size=sdma_sizes[sdma_idx], +) +sdma_pt_walkers.append(sdma_pt_walker) +sdma_engines.append(sdma_engine) -system.pc.south_bridge.gpu.sdma0 = sdma0 -system.pc.south_bridge.gpu.sdma1 = sdma1 +system.pc.south_bridge.gpu.sdmas = sdma_engines # Setup PM4 packet processor pm4_pkt_proc = PM4PacketProcessor() @@ -155,22 +185,22 @@ 
system._dma_ports.append(gpu_hsapp) system._dma_ports.append(gpu_cmd_proc) system._dma_ports.append(system.pc.south_bridge.gpu) -system._dma_ports.append(sdma0) -system._dma_ports.append(sdma1) +for sdma in sdma_engines: +system._dma_ports.append(sdma) system._dma_ports.append(device_ih) system._dma_ports.append(pm4_pkt_proc) system._dma_ports.append(system_hub) system._dma_ports.append(gpu_mem_mgr) system._dma_ports.append(hsapp_pt_walker) system._dma_ports.append(cp_pt_walker) -system._dma_ports.append(sdma0_pt_walker) -system._dma_ports.append(sdma1_pt_walker) +for sdma_pt_walker in sdma_pt_walkers: +system._dma_ports.append(sdma_pt_walker) gpu_hsapp.pio = system.iobus.mem_side_ports gpu_cmd_proc.pio = system.iobus.mem_side_ports system.pc.south_bridge.gpu.pio = system.iobus.mem_side_ports -sdma0.pio = system.iobus.mem_side_ports -sdma1.pio = system.iobus.mem_side_ports +for sdma in sdma_engines: +sdma.pio = system.iobus.mem_side_ports device_ih.pio = system.iobus.mem_side_ports pm4_pkt_proc.pio = system.iobus.mem_side_ports system_hub.pio = system.iobus.mem_sid
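The dispatch scheme described in the commit message above can be sketched in Python. This is a rough model under stated assumptions, not the C++ implementation: `SdmaEngineSketch`, `write_mmio`, the register offsets 0x0 and 0x4, and the handler names are all hypothetical. Only the Vega10 base addresses and sizes in the usage note come from the diff.

```python
class SdmaEngineSketch:
    """Toy SDMA engine with a configurable MMIO base and size."""

    def __init__(self, mmio_base, mmio_size):
        self.mmio_base = mmio_base
        self.mmio_size = mmio_size
        self.regs = {}
        # Dispatch table: offset relative to mmio_base -> handler taking
        # one 32-bit value. The offsets here are made up for illustration.
        self.dispatch = {
            0x0: self.set_rb_base_lo,
            0x4: self.set_rb_base_hi,
        }

    def set_rb_base_lo(self, value):
        self.regs["rb_base_lo"] = value

    def set_rb_base_hi(self, value):
        self.regs["rb_base_hi"] = value


def write_mmio(engines, offset, value):
    """Forward a write to the engine whose MMIO range contains `offset`.

    Returns True if a handler was called, False otherwise.
    """
    for engine in engines:
        if engine.mmio_base <= offset < engine.mmio_base + engine.mmio_size:
            handler = engine.dispatch.get(offset - engine.mmio_base)
            if handler is None:
                return False
            # Handlers receive only the low 32 bits of the packet data.
            handler(value & 0xFFFFFFFF)
            return True
    return False
```

With the Vega10 values from the diff, `engines = [SdmaEngineSketch(0x4980, 0x800), SdmaEngineSketch(0x5180, 0x800)]` models the two engines, and a write to `0x5180 + 0x4` reaches only the second one.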
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu,configs: Add human readable names for different GPUs
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email ) Change subject: dev-amdgpu,configs: Add human readable names for different GPUs .. dev-amdgpu,configs: Add human readable names for different GPUs Add a human readable string for GPU device names rather than using the device ID in the code. This is intended to make code more readable. Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a --- M configs/example/gpufs/runfs.py M configs/example/gpufs/system/amdgpu.py M src/dev/amdgpu/AMDGPU.py 3 files changed, 21 insertions(+), 0 deletions(-) diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py index 52b79ab..efea26b 100644 --- a/configs/example/gpufs/runfs.py +++ b/configs/example/gpufs/runfs.py @@ -126,6 +126,13 @@ help="type of memory to use", ) +parser.add_argument( +"--gpu-device", +default="Vega10", +choices=["Vega10", "MI100"], +help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)", +) + def runGpuFSSystem(args): """ diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py index 1fd3e2f..5f98b55 100644 --- a/configs/example/gpufs/system/amdgpu.py +++ b/configs/example/gpufs/system/amdgpu.py @@ -170,3 +170,14 @@ system.pc.south_bridge.gpu.checkpoint_before_mmios = ( args.checkpoint_before_mmios ) + +system.pc.south_bridge.gpu.device_name = args.gpu_device + +if args.gpu_device == "MI100": +system.pc.south_bridge.gpu.DeviceID = 0x738C +system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002 +system.pc.south_bridge.gpu.SubsystemID = 0x0C34 +elif args.gpu_device == "Vega10": +system.pc.south_bridge.gpu.DeviceID = 0x6863 +else: +panic("Unknown GPU device: {}".format(args.gpu_device)) diff --git a/src/dev/amdgpu/AMDGPU.py b/src/dev/amdgpu/AMDGPU.py index f9d953f..1e78672 100644 --- a/src/dev/amdgpu/AMDGPU.py +++ b/src/dev/amdgpu/AMDGPU.py @@ -46,6 +46,9 @@ cxx_header = "dev/amdgpu/amdgpu_device.hh" cxx_class = "gem5::AMDGPUDevice" 
+# Human readable name for device ID +device_name = Param.String("Vega10", "Codename for device") + # IDs for AMD Vega 10 VendorID = 0x1002 DeviceID = 0x6863 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a Gerrit-Change-Number: 70038 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba
[gem5-dev] [L] Change in gem5/gem5[develop]: dev-amdgpu: Enable more GPUs with device specific registers
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/70041?usp=email ) Change subject: dev-amdgpu: Enable more GPUs with device specific registers .. dev-amdgpu: Enable more GPUs with device specific registers Currently gem5 assumes the amdgpu device to be Vega10. In order to support more devices we need to handle situations where different registers and addresses have the same functionality but different offsets on different devices. This changeset adds an NBIO class to handle device discovery and driver initialization related tasks, pulling them out of the AMDGPUDevice class. The offsets used for MMIOs are reworked slightly to use offsets rather than absolute addresses. This is because we cannot determine the absolute address in the constructor since the BAR has not been assigned by the OS yet. Change-Id: I14b364374e086e185978334425a4e265cf2760d0 --- M src/dev/amdgpu/SConscript M src/dev/amdgpu/amdgpu_device.cc M src/dev/amdgpu/amdgpu_device.hh A src/dev/amdgpu/amdgpu_nbio.cc A src/dev/amdgpu/amdgpu_nbio.hh M src/dev/amdgpu/amdgpu_vm.hh 6 files changed, 371 insertions(+), 46 deletions(-) diff --git a/src/dev/amdgpu/SConscript b/src/dev/amdgpu/SConscript index 713f0a6..9f8eeac 100644 --- a/src/dev/amdgpu/SConscript +++ b/src/dev/amdgpu/SConscript @@ -39,6 +39,7 @@ tags='x86 isa') Source('amdgpu_device.cc', tags='x86 isa') +Source('amdgpu_nbio.cc', tags='x86 isa') Source('amdgpu_vm.cc', tags='x86 isa') Source('interrupt_handler.cc', tags='x86 isa') Source('memory_manager.cc', tags='x86 isa') diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc index 2acf1f4..519ea7a 100644 --- a/src/dev/amdgpu/amdgpu_device.cc +++ b/src/dev/amdgpu/amdgpu_device.cc @@ -36,6 +36,7 @@ #include "debug/AMDGPUDevice.hh" #include "dev/amdgpu/amdgpu_vm.hh" #include "dev/amdgpu/interrupt_handler.hh" +#include "dev/amdgpu/nbio_mmio.hh" #include "dev/amdgpu/pm4_packet_processor.hh" #include 
"dev/amdgpu/sdma_engine.hh" #include "dev/hsa/hw_scheduler.hh" @@ -129,6 +130,34 @@ pm4PktProc->setGPUDevice(this); cp->hsaPacketProc().setGPUDevice(this); cp->setGPUDevice(this); + +// Address aperture for device memory. We tell this to the driver and +// could possibly be anything, but these are the values used by hardware. +uint64_t mmhubBase = 0x8000ULL << 24; +uint64_t mmhubTop = 0x83ffULL << 24; + +// All read-before-write MMIOs go here +//triggered_reads[AMDGPU_MP0_SMN_C2PMSG_64] = 0x8000; + +// These are hardcoded register values to return what the driver expects +setRegVal(AMDGPU_MP0_SMN_C2PMSG_33, 0x8000); + +// Different registers for MI200, MI100, and Vega10 +if (p.device_name == "Vega10") { +setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24); +setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24); +} else if (p.device_name == "MI100") { +setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24); +setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24); +setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory +} else { +panic("Unknown GPU device %s\n", p.device_name); +} + +gpuvm.setMMHUBBase(mmhubBase); +gpuvm.setMMHUBTop(mmhubTop); + +nbio.setGPUDevice(this); } void @@ -236,35 +265,25 @@ * first, ignoring any writes from driver. (2) Any other address from * device backing store / abstract memory class functionally. */ -if (offset == 0xa28000) { -/* - * Handle special counter addresses in framebuffer. These counter - * addresses expect the read to return previous value + 1. - */ -if (regs.find(pkt->getAddr()) == regs.end()) { -regs[pkt->getAddr()] = 1; -} else { -regs[pkt->getAddr()]++; -} - -pkt->setUintX(regs[pkt->getAddr()], ByteOrder::little); -} else { -/* - * Read the value from device memory. This must be done functionally - * because this method is called by the PCIDevice::read method which - * is a non-timing read. 
- */ -RequestPtr req = std::make_shared(offset, pkt->getSize(), 0, - vramRequestorId()); -PacketPtr readPkt = Packet::createRead(req); -uint8_t *dataPtr = new uint8_t[pkt->getSize()]; -readPkt->dataDynamic(dataPtr); - -auto system = cp->shader()->gpuCmdProc.system(); -system->getDeviceMemory(readPkt)->access(readPkt); - -pkt->setUintX(readPkt->getUintX(ByteOrder::little), ByteOrder::little); +if (nbio.readFrame(pkt, offset)) { +return; } + +/* + * Read the value from device memory. This must be done functionally + * because this method is called by the PCIDevice::read method which +
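The commit message above argues for storing MMIO registers by offset, because the absolute address is unknown until the OS assigns the BAR. A minimal sketch of that idea follows. It is an illustration only, not gem5 code: `RegisterBlock`, `set_reg`, `assign_bar`, the offset 0x1000 and the BAR address are hypothetical. Only the two aperture values are taken from the diff.

```python
class RegisterBlock:
    """Toy MMIO register file keyed by BAR-relative offset (hypothetical)."""

    def __init__(self):
        self.bar_base = None  # unknown until the OS programs the BAR
        self.regs = {}        # offset -> value

    def set_reg(self, offset, value):
        # Safe to call from a constructor: no absolute address is needed.
        self.regs[offset] = value

    def assign_bar(self, base):
        self.bar_base = base

    def read(self, addr):
        if self.bar_base is None:
            raise RuntimeError("BAR not assigned yet")
        # Translate the absolute address back to an offset at access time.
        return self.regs.get(addr - self.bar_base, 0)


# Aperture values quoted in the diff above.
mmhub_base = 0x8000 << 24
mmhub_top = 0x83FF << 24

nbio = RegisterBlock()
nbio.set_reg(0x1000, mmhub_base >> 24)  # 0x1000 is a made-up offset
nbio.assign_bar(0xFE000000)             # made-up BAR address
value = nbio.read(0xFE001000)
```

Keying the table by offset means the same table works wherever the BAR ends up, which is the property the commit message relies on.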
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add simple check for valid GPU MMIO trace
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email )

Change subject: configs: Add simple check for valid GPU MMIO trace
..

configs: Add simple check for valid GPU MMIO trace

This file is a required input to the simulator for GPUFS. There seems to
be confusion from several users who are not providing this input. This
usually results in the amdgpu driver failing to load, leading to the
application under test exiting along with it.

This changeset adds a simple md5 hashsum check to compare against the
known good MMIO trace located in the gem5-resources repository.

Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
---
M configs/example/gpufs/runfs.py
1 file changed, 6 insertions(+), 0 deletions(-)

diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 4a28068a..52b79ab 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -30,6 +30,7 @@
 # System includes
 import argparse
 import math
+import hashlib

 # gem5 related
 import m5
@@ -145,6 +146,11 @@
         math.ceil(float(n_cu) / args.cu_per_scalar_cache)
     )

+    # Verify MMIO trace is valid
+    mmio_md5 = hashlib.md5(open(args.gpu_mmio_trace, "rb").read()).hexdigest()
+    if mmio_md5 != "c4ff3326ae8a036e329b8b595c83bd6d":
+        m5.util.panic("MMIO file does not match gem5 resources")
+
     system = makeGpuFSSystem(args)
     root = Root(

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
Gerrit-Change-Number: 69978
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
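The check in this change hashes the whole trace in one `read()` call and leaves the file handle to be closed by the garbage collector. A possible variant, sketched here under the assumption that the trace may be large, reads in chunks inside a `with` block. `file_md5` and `check_trace` are hypothetical helper names, not part of runfs.py.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as trace:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: trace.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_trace(path, expected_md5):
    """Return True when the file at `path` hashes to `expected_md5`."""
    return file_md5(path) == expected_md5.lower()
```

In a config script this could be called as `check_trace(args.gpu_mmio_trace, "c4ff3326ae8a036e329b8b595c83bd6d")`, using the hash quoted in the diff, with the panic issued when it returns False.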
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Allow other CPU types in GPUFS
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email )

Change subject: configs: Allow other CPU types in GPUFS
..

configs: Allow other CPU types in GPUFS

Previously the CPU type and memory modes were hardcoded for KVM, because
there was a deadlock bug. After some recent testing, this deadlock bug no
longer exists with the simple CPU models. Thus, changing the configs to
allow for other CPU models as a first step toward lifting the KVM
requirement from GPUFS.

Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
---
M configs/example/gpufs/system/system.py
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index a1b59ef..93f0194 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -61,7 +61,9 @@
         panic("Need at least 2GB of system memory to load amdgpu module")

     # Use the common FSConfig to setup a Linux X86 System
-    (TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
+    (TestCPUClass, test_mem_mode) = Simulation.getCPUClass(args.cpu_type)
+    if test_mem_mode == "atomic":
+        test_mem_mode = "atomic_noncaching"
     disks = [args.disk_image]
     if args.second_disk is not None:
         disks.extend([args.second_disk])
@@ -91,10 +93,11 @@

     # Create specified number of CPUs. GPUFS really only needs one.
     system.cpu = [
-        X86KvmCPU(clk_domain=system.cpu_clk_domain, cpu_id=i)
+        TestCPUClass(clk_domain=system.cpu_clk_domain, cpu_id=i)
         for i in range(args.num_cpus)
     ]
-    system.kvm_vm = KvmVM()
+    if ObjectList.is_kvm_cpu(TestCPUClass):
+        system.kvm_vm = KvmVM()

     # Create AMDGPU and attach to southbridge
     shader = createGPU(system, args)

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
Gerrit-Change-Number: 69979
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
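The two decisions this diff adds (remap the memory mode, create the KVM VM only for KVM CPUs) can be sketched without gem5 installed. The helpers below are hypothetical stand-ins. In particular, `needs_kvm_vm` guesses from the class name purely for illustration and is not how `ObjectList.is_kvm_cpu` is implemented.

```python
def gpufs_mem_mode(cpu_mem_mode):
    """Mirror the diff: GPUFS swaps "atomic" for "atomic_noncaching"."""
    if cpu_mem_mode == "atomic":
        return "atomic_noncaching"
    return cpu_mem_mode


def needs_kvm_vm(cpu_class_name):
    """Hypothetical stand-in for ObjectList.is_kvm_cpu (name-based guess)."""
    return "Kvm" in cpu_class_name


def plan_system(cpu_class_name, cpu_mem_mode, num_cpus):
    """Return a plain dict describing what the config would instantiate."""
    return {
        "cpus": [(cpu_class_name, cpu_id) for cpu_id in range(num_cpus)],
        "mem_mode": gpufs_mem_mode(cpu_mem_mode),
        "kvm_vm": needs_kvm_vm(cpu_class_name),
    }


plan = plan_system("AtomicSimpleCPU", "atomic", num_cpus=1)
```

Other memory modes pass through unchanged, and a non-KVM CPU class results in no KVM VM being requested.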
[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Use higher dmesg level for GPUFS
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email ) Change subject: configs: Use higher dmesg level for GPUFS .. configs: Use higher dmesg level for GPUFS The dmesg level is currently set to 3 which will not display errors if the amdgpu driver fails to load. Changing to level 8 will show errors in the gem5 terminal and is not too spammy. This will help GPUFS developers with bug reports since we would actually be able to observe an error. Currently if the driver fails to load, there is no way to detect it and applications will attempt to run, usually failing on getting device properties. Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d --- M configs/example/gpufs/hip_cookbook.py M configs/example/gpufs/hip_rodinia.py M configs/example/gpufs/hip_samples.py M configs/example/gpufs/vega10_kvm.py 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/configs/example/gpufs/hip_cookbook.py b/configs/example/gpufs/hip_cookbook.py index 87c7547..6a7bb42 100644 --- a/configs/example/gpufs/hip_cookbook.py +++ b/configs/example/gpufs/hip_cookbook.py @@ -42,7 +42,7 @@ cookbook_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." diff --git a/configs/example/gpufs/hip_rodinia.py b/configs/example/gpufs/hip_rodinia.py index 8ed951b..b8a7858 100644 --- a/configs/example/gpufs/hip_rodinia.py +++ b/configs/example/gpufs/hip_rodinia.py @@ -43,7 +43,7 @@ rodinia_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! 
-f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." diff --git a/configs/example/gpufs/hip_samples.py b/configs/example/gpufs/hip_samples.py index ccc1719..9f83c25 100644 --- a/configs/example/gpufs/hip_samples.py +++ b/configs/example/gpufs/hip_samples.py @@ -42,7 +42,7 @@ samples_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." diff --git a/configs/example/gpufs/vega10_kvm.py b/configs/example/gpufs/vega10_kvm.py index 54253be..9c7e457 100644 --- a/configs/example/gpufs/vega10_kvm.py +++ b/configs/example/gpufs/vega10_kvm.py @@ -44,7 +44,7 @@ demo_runscript = """\ export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH export HSA_ENABLE_INTERRUPT=0 -dmesg -n3 +dmesg -n8 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5." -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-MessageType: newchange Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d Gerrit-Change-Number: 69977 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
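Why level 3 hides driver errors: the kernel prints a message to the console only when its priority number is lower than the console log level set by `dmesg -n`. Errors have priority 3, so they are filtered out at level 3 and shown at level 8. The sketch below models only that comparison. `KERN_LEVELS` and `shown_on_console` are illustrative names.

```python
# Kernel log priorities, index == numeric priority (0 is most severe).
KERN_LEVELS = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]


def shown_on_console(console_loglevel):
    """Names of the priorities that reach the console at this level."""
    return [
        name
        for priority, name in enumerate(KERN_LEVELS)
        if priority < console_loglevel
    ]


old_setting = shown_on_console(3)  # dmesg -n3
new_setting = shown_on_console(8)  # dmesg -n8
```

With the old setting only `emerg`, `alert` and `crit` appear. With the new one every priority appears, including `err`.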
[gem5-dev] [S] Change in gem5/gem5[develop]: mem: Handle DRAM write queue drain and disabled power down
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email )

Change subject: mem: Handle DRAM write queue drain and disabled power down
..

mem: Handle DRAM write queue drain and disabled power down

Write queue drain logic seems off currently. An event is scheduled if
the write queue is empty instead of non-empty. There is no check to see
if draining is complete when bus is in write mode. Finally the power
down check on drain always fails if DRAM powerdown is disabled.

This changeset reverses the drain conditional for the write queue to
schedule an event if the write queue is *not* empty and checks in the
event processing method that the queues are all empty so that
signalDrainDone can be called. Lastly the powerdown state is ignored if
DRAM powerdown is disabled. Powerdown is disabled in the GPU_VIPER
protocol by default.

This changeset successfully drains and checkpoints a GPUFS simulation
using GPU_VIPER protocol.

Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
---
M src/mem/dram_interface.hh
M src/mem/mem_ctrl.cc
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mem/dram_interface.hh b/src/mem/dram_interface.hh
index fa9d319..206f8e8 100644
--- a/src/mem/dram_interface.hh
+++ b/src/mem/dram_interface.hh
@@ -380,7 +380,11 @@
          * @param Return true if the rank is idle from a bank
          *and power point of view
          */
-        bool inPwrIdleState() const { return pwrState == PWR_IDLE; }
+        bool
+        inPwrIdleState() const
+        {
+            return !dram.enableDRAMPowerdown || pwrState == PWR_IDLE;
+        }

         /**
          * Trigger a self-refresh exit if there are entries enqueued
diff --git a/src/mem/mem_ctrl.cc b/src/mem/mem_ctrl.cc
index 543d637..074a31f 100644
--- a/src/mem/mem_ctrl.cc
+++ b/src/mem/mem_ctrl.cc
@@ -908,6 +908,13 @@
         }
     }

+    if (drainState() == DrainState::Draining && !totalWriteQueueSize &&
+        !totalReadQueueSize && respQEmpty()) {
+
+        DPRINTF(Drain, "MemCtrl controller done draining\n");
+        signalDrainDone();
+    }
+
     // updates current state
     busState = busStateNext;

@@ -1420,7 +1427,8 @@
     // the only queue that is not drained automatically over time
     // is the write queue, thus kick things into action if needed
-    if (!totalWriteQueueSize && !nextReqEvent.scheduled()) {
+    if (totalWriteQueueSize && !nextReqEvent.scheduled()) {
+        DPRINTF(Drain,"Scheduling nextReqEvent from drain\n");
         schedule(nextReqEvent, curTick());
     }

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
Gerrit-Change-Number: 69917
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
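The three fixes described in this commit message can be followed in a toy model. This is a simplified sketch, not the MemCtrl implementation: `ToyMemCtrl` and its fields are hypothetical, only the write queue is modelled, and each event retires exactly one write.

```python
class ToyMemCtrl:
    """Simplified stand-in for a memory controller's drain handshake."""

    def __init__(self, pending_writes, powerdown_enabled, pwr_idle):
        self.write_queue = pending_writes
        self.powerdown_enabled = powerdown_enabled
        self.pwr_idle = pwr_idle
        self.event_scheduled = False
        self.draining = False
        self.drained = False

    def in_pwr_idle_state(self):
        # Fix 3: with powerdown disabled, the power state must not
        # block draining.
        return (not self.powerdown_enabled) or self.pwr_idle

    def drain(self):
        if self.write_queue == 0 and self.in_pwr_idle_state():
            self.drained = True
            return "drained"
        self.draining = True
        # Fix 1: schedule work when writes ARE pending, not when empty.
        if self.write_queue and not self.event_scheduled:
            self.event_scheduled = True
        return "draining"

    def process_next_req_event(self):
        self.event_scheduled = False
        if self.write_queue:
            self.write_queue -= 1  # retire one write
        # Fix 2: check for completion inside the event handler.
        if self.draining and self.write_queue == 0:
            self.draining = False
            self.drained = True  # stands in for signalDrainDone()
        elif self.write_queue:
            self.event_scheduled = True


ctrl = ToyMemCtrl(pending_writes=3, powerdown_enabled=False, pwr_idle=False)
state = ctrl.drain()
while ctrl.event_scheduled:
    ctrl.process_next_req_event()
```

With the original inverted condition, no event would be scheduled for a non-empty write queue, so the loop would never run and `drained` would stay False.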
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Update API for some flat atomics
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email ) Change subject: arch-vega: Update API for some flat atomics .. arch-vega: Update API for some flat atomics Some recently submitted atomic instructions were using two older APIs. Update these to use the newer APIs to support all apertures and avoid compilation issue. Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67977 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 23 insertions(+), 15 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index b6a78b2..45c8491 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -44984,13 +44984,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -44999,8 +44997,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. 
-issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void @@ -45091,13 +45088,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -45106,8 +45101,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. -issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void @@ -45226,13 +45220,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -45241,8 +45233,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. 
-issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 Gerrit-Change-Number: 67977 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Jason Lowe-Power Gerrit-MessageType: merged
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Update API for some flat atomics
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email ) Change subject: arch-vega: Update API for some flat atomics .. arch-vega: Update API for some flat atomics Some recently submitted atomic instructions were using two older APIs. Update these to use the newer APIs to support all apertures and avoid compilation issue. Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index b6a78b2..45c8491 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -44984,13 +44984,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -44999,8 +44997,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. -issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void @@ -45091,13 +45088,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -45106,8 +45101,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. 
-issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void @@ -45226,13 +45220,11 @@ gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); -ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); ConstVecOperandU32 data(gpuDynInst, extData.DATA); -addr.read(); data.read(); -calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); +calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -45241,8 +45233,7 @@ } } -gpuDynInst->computeUnit()->globalMemoryPipe. -issueRequest(gpuDynInst); +issueRequestHelper(gpuDynInst); } // execute void -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852 Gerrit-Change-Number: 67977 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Update deprecated ports
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email ) Change subject: dev-amdgpu: Update deprecated ports .. dev-amdgpu: Update deprecated ports Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67837 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/common/tlb_coalescer.hh M src/dev/amdgpu/memory_manager.hh 2 files changed, 16 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/common/tlb_coalescer.hh b/src/arch/amdgpu/common/tlb_coalescer.hh index 59d8ebe..56d72d7 100644 --- a/src/arch/amdgpu/common/tlb_coalescer.hh +++ b/src/arch/amdgpu/common/tlb_coalescer.hh @@ -152,7 +152,7 @@ public: MemSidePort(const std::string &_name, TLBCoalescer *tlb_coalescer, PortID _index) -: RequestPort(_name, tlb_coalescer), coalescer(tlb_coalescer), +: RequestPort(_name), coalescer(tlb_coalescer), index(_index) { } std::deque retries; diff --git a/src/dev/amdgpu/memory_manager.hh b/src/dev/amdgpu/memory_manager.hh index e18ec64..0bd08d6 100644 --- a/src/dev/amdgpu/memory_manager.hh +++ b/src/dev/amdgpu/memory_manager.hh @@ -45,11 +45,11 @@ class AMDGPUMemoryManager : public ClockedObject { -class GPUMemPort : public MasterPort +class GPUMemPort : public RequestPort { public: GPUMemPort(const std::string &_name, AMDGPUMemoryManager &_gpuMemMgr) -: MasterPort(_name, &_gpuMemMgr), gpu_mem(_gpuMemMgr) +: RequestPort(_name), gpu_mem(_gpuMemMgr) { } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882 Gerrit-Change-Number: 67837 Gerrit-PatchSet: 2 Gerrit-Owner: 
Matthew Poremba Gerrit-Reviewer: Gabriel B. Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_smax
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/64513?usp=email ) Change subject: arch-vega: Implementing global_atomic_smax .. arch-vega: Implementing global_atomic_smax Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64513 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 67 insertions(+), 1 deletion(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index e3639a5..b6a78b2 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -45079,8 +45079,59 @@ void Inst_FLAT__FLAT_ATOMIC_SMAX::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decVMemInstsIssued(); +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); + +ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data(gpuDynInst, extData.DATA); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + +gpuDynInst->computeUnit()->globalMemoryPipe. 
+issueRequest(gpuDynInst); } // execute + +void +Inst_FLAT__FLAT_ATOMIC_SMAX::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +initAtomicAccess(gpuDynInst); +} // initiateAcc + +void +Inst_FLAT__FLAT_ATOMIC_SMAX::completeAcc(GPUDynInstPtr gpuDynInst) +{ +if (isAtomicRet()) { +VecOperandU32 vdst(gpuDynInst, extData.VDST); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +vdst[lane] = (reinterpret_cast( +gpuDynInst->d_data))[lane]; +} +} + +vdst.write(); +} +} // completeAcc // --- Inst_FLAT__FLAT_ATOMIC_UMAX class methods --- Inst_FLAT__FLAT_ATOMIC_UMAX::Inst_FLAT__FLAT_ATOMIC_UMAX(InFmt_FLAT *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 8b0c8c4..d45a84c 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -42691,6 +42691,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_FLAT__FLAT_ATOMIC_SMAX class Inst_FLAT__FLAT_ATOMIC_UMAX : public Inst_FLAT -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/64513?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b Gerrit-Change-Number: 64513 Gerrit-PatchSet: 2 Gerrit-Owner: Alexandru Duțu (Alex) Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_smin
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/64512?usp=email ) Change subject: arch-vega: Implementing global_atomic_smin .. arch-vega: Implementing global_atomic_smin Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64512 Reviewed-by: Matt Sinclair Tested-by: kokoro Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 67 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 987474f..e3639a5 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -44972,8 +44972,59 @@ void Inst_FLAT__FLAT_ATOMIC_SMIN::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decVMemInstsIssued(); +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); + +ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data(gpuDynInst, extData.DATA); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + +gpuDynInst->computeUnit()->globalMemoryPipe. 
+issueRequest(gpuDynInst); } // execute + +void +Inst_FLAT__FLAT_ATOMIC_SMIN::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +initAtomicAccess(gpuDynInst); +} // initiateAcc + +void +Inst_FLAT__FLAT_ATOMIC_SMIN::completeAcc(GPUDynInstPtr gpuDynInst) +{ +if (isAtomicRet()) { +VecOperandU32 vdst(gpuDynInst, extData.VDST); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +vdst[lane] = (reinterpret_cast( +gpuDynInst->d_data))[lane]; +} +} + +vdst.write(); +} +} // completeAcc // --- Inst_FLAT__FLAT_ATOMIC_UMIN class methods --- Inst_FLAT__FLAT_ATOMIC_UMIN::Inst_FLAT__FLAT_ATOMIC_UMIN(InFmt_FLAT *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index ddf228a..8b0c8c4 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -42615,6 +42615,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_FLAT__FLAT_ATOMIC_SMIN class Inst_FLAT__FLAT_ATOMIC_UMIN : public Inst_FLAT -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/64512?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4 Gerrit-Change-Number: 64512 Gerrit-PatchSet: 2 Gerrit-Owner: Alexandru Duțu (Alex) Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_or
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/64511?usp=email ) Change subject: arch-vega: Implementing global_atomic_or .. arch-vega: Implementing global_atomic_or Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64511 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 68 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index f019dfd..987474f 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -45112,8 +45112,60 @@ void Inst_FLAT__FLAT_ATOMIC_OR::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decVMemInstsIssued(); +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); + +ConstVecOperandU64 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data(gpuDynInst, extData.DATA); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + +gpuDynInst->computeUnit()->globalMemoryPipe. 
+issueRequest(gpuDynInst); } // execute + +void +Inst_FLAT__FLAT_ATOMIC_OR::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +initAtomicAccess(gpuDynInst); +} // initiateAcc + +void +Inst_FLAT__FLAT_ATOMIC_OR::completeAcc(GPUDynInstPtr gpuDynInst) +{ +if (isAtomicRet()) { +VecOperandU32 vdst(gpuDynInst, extData.VDST); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +vdst[lane] = (reinterpret_cast( +gpuDynInst->d_data))[lane]; +} +} + +vdst.write(); +} +} // completeAcc + // --- Inst_FLAT__FLAT_ATOMIC_XOR class methods --- Inst_FLAT__FLAT_ATOMIC_XOR::Inst_FLAT__FLAT_ATOMIC_XOR(InFmt_FLAT *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index dc2ee08..ddf228a 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -42800,6 +42800,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_FLAT__FLAT_ATOMIC_OR class Inst_FLAT__FLAT_ATOMIC_XOR : public Inst_FLAT -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/64511?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19 Gerrit-Change-Number: 64511 Gerrit-PatchSet: 2 Gerrit-Owner: Alexandru Duțu (Alex) Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
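The smax, smin and or changes above share one shape: `execute` stages per-lane operands for the lanes enabled in `exec_mask`, and `completeAcc` writes the returned data to VDST when the instruction returns a value. The sketch below models the per-lane semantics that the instruction names suggest. It assumes 32-bit operands and treats the return value as the pre-operation memory value. It is not the gem5 memory pipeline, and `flat_atomic`, `ATOMIC_OPS` and `to_signed32` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value


ATOMIC_OPS = {
    "smax": lambda old, new: max(to_signed32(old), to_signed32(new)) & MASK32,
    "smin": lambda old, new: min(to_signed32(old), to_signed32(new)) & MASK32,
    "or": lambda old, new: (old | new) & MASK32,
}


def flat_atomic(memory, addrs, data, exec_mask, op, returns_old=True):
    """Apply op per active lane; return old values (None if inactive)."""
    func = ATOMIC_OPS[op]
    vdst = [None] * len(addrs)
    for lane, active in enumerate(exec_mask):
        if not active:
            continue  # masked-off lanes touch neither memory nor vdst
        old = memory.get(addrs[lane], 0)
        memory[addrs[lane]] = func(old, data[lane])
        if returns_old:
            vdst[lane] = old
    return vdst


memory = {0x100: 0xFFFFFFFF, 0x104: 5}  # 0xFFFFFFFF is -1 when signed
result = flat_atomic(
    memory, [0x100, 0x104, 0x108], [3, 2, 9], [True, True, False], "smax"
)
```

The signed interpretation is what separates smax from umax here: as an unsigned value 0xFFFFFFFF would win the comparison against 3, whereas as a signed value it is -1 and loses.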
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email ) Change subject: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet .. dev-amdgpu: Fix address in POLL_REGMEM SDMA packet The address for the POLL_REGMEM packet should not be shifted when the mode is 1 (memory). The relevant driver code linked below does not shift the address. The shift is causing a page fault due to the incorrect address. This changeset removes the shift so the correct address is translated. https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L903 Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67877 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/dev/amdgpu/sdma_engine.cc 1 file changed, 23 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc index 4c03bf5..736df45 100644 --- a/src/dev/amdgpu/sdma_engine.cc +++ b/src/dev/amdgpu/sdma_engine.cc @@ -832,7 +832,7 @@ auto cb = new DmaVirtCallback( [ = ] (const uint32_t &dma_buffer) { pollRegMemRead(q, header, pkt, dma_buffer, 0); }); -dmaReadVirt(pkt->address >> 3, sizeof(uint32_t), cb, +dmaReadVirt(pkt->address, sizeof(uint32_t), cb, (void *)&cb->dmaBuffer); } else { panic("SDMA poll mem operation not implemented."); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751 Gerrit-Change-Number: 67877 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
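To make the effect of the stray shift concrete, here is a small sketch comparing the address read before and after the fix. The 4 KiB page size, the helper names, and the sample address are assumptions for illustration only; they are not taken from the driver or from gem5.

```cpp
#include <cstdint>

// Assumed 4 KiB pages, purely for illustration.
constexpr std::uint64_t kPageShift = 12;

// Virtual page number that a translation of `virtualAddress` would look up.
constexpr std::uint64_t pageNumber(std::uint64_t virtualAddress)
{
    return virtualAddress >> kPageShift;
}

// Address used for a memory-mode poll before the fix (shifted right by 3).
constexpr std::uint64_t pollAddressBeforeFix(std::uint64_t packetAddress)
{
    return packetAddress >> 3;
}

// Address used after the fix (the packet field is used as-is).
constexpr std::uint64_t pollAddressAfterFix(std::uint64_t packetAddress)
{
    return packetAddress;
}
```

With a packet address of `0x7f0000001000`, the shifted variant lands on a completely different page, which is consistent with the page fault described in the commit message.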
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email ) Change subject: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet .. dev-amdgpu: Fix address in POLL_REGMEM SDMA packet The address for the POLL_REGMEM packet should not be shifted when the mode is 1 (memory). The relevant driver code linked below does not shift the address. The shift is causing a page fault due to the incorrect address. This changeset removes the shift so the correct address is translated. https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/roc-4.3.x/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L903 Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751 --- M src/dev/amdgpu/sdma_engine.cc 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc index 4c03bf5..736df45 100644 --- a/src/dev/amdgpu/sdma_engine.cc +++ b/src/dev/amdgpu/sdma_engine.cc @@ -832,7 +832,7 @@ auto cb = new DmaVirtCallback( [ = ] (const uint32_t &dma_buffer) { pollRegMemRead(q, header, pkt, dma_buffer, 0); }); -dmaReadVirt(pkt->address >> 3, sizeof(uint32_t), cb, +dmaReadVirt(pkt->address, sizeof(uint32_t), cb, (void *)&cb->dmaBuffer); } else { panic("SDMA poll mem operation not implemented."); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751 Gerrit-Change-Number: 67877 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Update deprecated ports
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email ) Change subject: dev-amdgpu: Update deprecated ports .. dev-amdgpu: Update deprecated ports Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882 --- M src/arch/amdgpu/common/tlb_coalescer.hh M src/dev/amdgpu/memory_manager.hh 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/src/arch/amdgpu/common/tlb_coalescer.hh b/src/arch/amdgpu/common/tlb_coalescer.hh index 59d8ebe..56d72d7 100644 --- a/src/arch/amdgpu/common/tlb_coalescer.hh +++ b/src/arch/amdgpu/common/tlb_coalescer.hh @@ -152,7 +152,7 @@ public: MemSidePort(const std::string &_name, TLBCoalescer *tlb_coalescer, PortID _index) -: RequestPort(_name, tlb_coalescer), coalescer(tlb_coalescer), +: RequestPort(_name), coalescer(tlb_coalescer), index(_index) { } std::deque retries; diff --git a/src/dev/amdgpu/memory_manager.hh b/src/dev/amdgpu/memory_manager.hh index e18ec64..0bd08d6 100644 --- a/src/dev/amdgpu/memory_manager.hh +++ b/src/dev/amdgpu/memory_manager.hh @@ -45,11 +45,11 @@ class AMDGPUMemoryManager : public ClockedObject { -class GPUMemPort : public MasterPort +class GPUMemPort : public RequestPort { public: GPUMemPort(const std::string &_name, AMDGPUMemoryManager &_gpuMemMgr) -: MasterPort(_name, &_gpuMemMgr), gpu_mem(_gpuMemMgr) +: RequestPort(_name), gpu_mem(_gpuMemMgr) { } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882 Gerrit-Change-Number: 67837 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Make VGPR-offset for global SGPR-base signed
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email ) Change subject: arch-vega: Make VGPR-offset for global SGPR-base signed .. arch-vega: Make VGPR-offset for global SGPR-base signed The VGPR-offset used with SGPR-base addressing can be signed in Vega. These are global instructions of the format: `global_load_dword v0, v1, s[0:1]`. This is not explicitly stated in the ISA manual; however, based on compiler output, the offset can be negative. This changeset assigns the offset to a signed 32-bit integer, and the compiler takes care of the signedness in the expression which calculates the final address. This fixes a bad address calculation in a rocPRIM unit test. Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67412 Reviewed-by: Bobby Bruce Maintainer: Bobby Bruce Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/op_encodings.hh 1 file changed, 25 insertions(+), 1 deletion(-) Approvals: Bobby Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh index 34f6040..1071ead 100644 --- a/src/arch/amdgpu/vega/insts/op_encodings.hh +++ b/src/arch/amdgpu/vega/insts/op_encodings.hh @@ -1007,8 +1007,9 @@ // mask any upper bits from the vaddr. 
for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { +ScalarRegI32 voffset = vaddr[lane]; gpuDynInst->addr.at(lane) = -saddr.rawData() + (vaddr[lane] & 0x) + offset; +saddr.rawData() + voffset + offset; } } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599 Gerrit-Change-Number: 67412 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
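The difference between treating the VGPR offset as unsigned and as signed can be sketched as follows. This is an illustration rather than the gem5 code: the function names are hypothetical, and the `0xffffffff` mask in the "before" variant is an assumption, since the mask constant is truncated in the diff as archived.

```cpp
#include <cstdint>

// "Before" behaviour under the assumption above: the 32-bit VGPR value is
// zero-extended, so a negative offset becomes a large positive one.
constexpr std::uint64_t
addrZeroExtended(std::uint64_t base, std::uint32_t vgprBits,
                 std::int32_t instOffset)
{
    return base + (vgprBits & 0xffffffffu) + instOffset;
}

// "After" behaviour: the same 32 bits are reinterpreted as a signed value,
// so the conversion to 64 bits sign-extends and a negative offset subtracts.
constexpr std::uint64_t
addrSigned(std::uint64_t base, std::uint32_t vgprBits,
           std::int32_t instOffset)
{
    const auto voffset = static_cast<std::int32_t>(vgprBits);
    return base + static_cast<std::int64_t>(voffset) + instOffset;
}
```

For a base of `0x1000` and VGPR bits `0xfffffffc` (that is, -4), the signed variant yields `0xffc` while the zero-extended variant yields `0x100000ffc`. For non-negative offsets the two agree.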
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write_b8_d16_hi
void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; +}; // Inst_DS__DS_WRITE_B8_D16_HI + class Inst_DS__DS_WRITE_B16 : public Inst_DS { public: -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67411?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd Gerrit-Change-Number: 67411 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Make VGPR-offset for global SGPR-base signed
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email ) Change subject: arch-vega: Make VGPR-offset for global SGPR-base signed .. arch-vega: Make VGPR-offset for global SGPR-base signed The VGPR-offset used with SGPR-base addressing can be signed in Vega. These are global instructions of the format: `global_load_dword v0, v1, s[0:1]`. This is not explicitly stated in the ISA manual; however, based on compiler output, the offset can be negative. This changeset assigns the offset to a signed 32-bit integer, and the compiler takes care of the signedness in the expression which calculates the final address. This fixes a bad address calculation in a rocPRIM unit test. Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599 --- M src/arch/amdgpu/vega/insts/op_encodings.hh 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh index 34f6040..1f52c75 100644 --- a/src/arch/amdgpu/vega/insts/op_encodings.hh +++ b/src/arch/amdgpu/vega/insts/op_encodings.hh @@ -1007,8 +1007,9 @@ // mask any upper bits from the vaddr. for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { +ScalarRegI32 voffset = vaddr[lane] & 0x; gpuDynInst->addr.at(lane) = -saddr.rawData() + (vaddr[lane] & 0x) + offset; +saddr.rawData() + voffset + offset; } } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599 Gerrit-Change-Number: 67412 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write_b8_d16_hi
lic: -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67411?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd Gerrit-Change-Number: 67411 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [S] Change in gem5/gem5[develop]: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email ) Change subject: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot .. dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot As of Linux 5.11, the MC146818 code was changed to avoid reading garbage data that may occur if there is a read while the registers are being updated: github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8 Previously toggling this bit was fine as Linux would check twice. It now checks before and after reading time information, causing it to retry infinitely until eventually Linux bootup fails due to watchdog timeout. This changeset always sets update in progress to false. Since this is a simulation, the updates probably will not be occurring at the same time a read is occurring. Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66731 Reviewed-by: Matt Sinclair Tested-by: kokoro Maintainer: Bobby Bruce Reviewed-by: Jason Lowe-Power --- M src/dev/mc146818.cc 1 file changed, 31 insertions(+), 2 deletions(-) Approvals: Matt Sinclair: Looks good to me, but someone else must approve Bobby Bruce: Looks good to me, approved Jason Lowe-Power: Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/mc146818.cc b/src/dev/mc146818.cc index 919efb0..2bfe877 100644 --- a/src/dev/mc146818.cc +++ b/src/dev/mc146818.cc @@ -233,8 +233,9 @@ else { switch (addr) { case RTC_STAT_REGA: -// toggle UIP bit for linux -stat_regA.uip = !stat_regA.uip; +// Linux after v5.10 checks this multiple times so toggling +// leads to a deadlock on bootup. 
+stat_regA.uip = 0; return stat_regA; break; case RTC_STAT_REGB: -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b Gerrit-Change-Number: 66731 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Andreas Sandberg Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Gabe Black Gerrit-Reviewer: Gabe Black Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
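The retry behaviour described in the commit message can be modelled with a very small sketch. This is a simplified stand-in for the guest's "check UIP, read the time, check UIP again" loop, not the Linux code itself; `attemptsUntilStableRead`, `TogglingRtc`, and `QuietRtc` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>

// Simplified guest-side loop: a read only counts if the update-in-progress
// (UIP) bit is clear both before and after the time registers are read.
// Returns the number of attempts needed, or -1 if maxAttempts is exhausted.
int attemptsUntilStableRead(const std::function<bool()> &readUip,
                            int maxAttempts)
{
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (readUip())
            continue;  // update in progress before the read, try again
        // ... the time registers would be read here ...
        if (readUip())
            continue;  // update started during the read, try again
        return attempt;
    }
    return -1;
}

// Device model that flips UIP on every status read (the old behaviour).
struct TogglingRtc
{
    bool uip = false;
    bool readUip() { uip = !uip; return uip; }
};

// Device model that always reports UIP as clear (the new behaviour).
struct QuietRtc
{
    bool readUip() { return false; }
};
```

With the toggling model, two consecutive status reads never both return a clear bit, so the loop above never succeeds; with the quiet model it succeeds on the first attempt.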
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Read one dword for SGPR base global insts
} if (isFlat()) { @@ -974,6 +980,12 @@ } } +bool +vgprIsOffset() +{ +return (extData.SADDR != 0x7f); +} + // first instruction DWORD InFmt_FLAT instData; // second instruction DWORD @@ -987,7 +999,7 @@ void generateGlobalDisassembly(); void -calcAddrSgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &vaddr, +calcAddrSgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU32 &vaddr, ConstScalarOperandU64 &saddr, ScalarRegI32 offset) { // Use SGPR pair as a base address and add VGPR-offset and -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67077?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0 Gerrit-Change-Number: 67077 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write2st64_b64
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email ) Change subject: arch-vega: Implement ds_write2st64_b64 .. arch-vega: Implement ds_write2st64_b64 Write two qwords at offsets multiplied by 8 * 64 bytes. Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67078 Reviewed-by: Matt Sinclair Tested-by: kokoro Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 62 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 4b27afa..6cf01fb 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -36595,8 +36595,52 @@ void Inst_DS__DS_WRITE2ST64_B64::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU64 data0(gpuDynInst, extData.DATA0); +ConstVecOperandU64 data1(gpuDynInst, extData.DATA1); + +addr.read(); +data0.read(); +data1.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast( +gpuDynInst->d_data))[lane * 2] = data0[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 2 + 1] = data1[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_WRITE2ST64_B64::initiateAcc(GPUDynInstPtr gpuDynInst) 
+{ +Addr offset0 = instData.OFFSET0 * 8 * 64; +Addr offset1 = instData.OFFSET1 * 8 * 64; + +initDualMemWrite(gpuDynInst, offset0, offset1); +} + +void +Inst_DS__DS_WRITE2ST64_B64::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // --- Inst_DS__DS_CMPST_B64 class methods --- Inst_DS__DS_CMPST_B64::Inst_DS__DS_CMPST_B64(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 9f017f9..2896732 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -33572,6 +33572,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_DS__DS_WRITE2ST64_B64 class Inst_DS__DS_CMPST_B64 : public Inst_DS -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b Gerrit-Change-Number: 67078 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
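As a worked illustration of the "offsets multiplied by 8 * 64 bytes" rule, the sketch below scales two encoded offsets and writes two 64-bit values into a byte buffer standing in for LDS. The helper names and the flat `std::vector` buffer are assumptions for the example, not gem5 interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Scale an encoded 8-bit offset by the qword size (8 bytes) and the
// stride of 64 elements, as the commit message describes.
constexpr std::uint32_t st64ByteOffset(std::uint8_t encodedOffset)
{
    return static_cast<std::uint32_t>(encodedOffset) * 8u * 64u;
}

// Hypothetical model of one lane: store two qwords at base + scaled offsets.
void write2St64B64(std::vector<std::uint8_t> &lds, std::uint32_t base,
                   std::uint8_t offset0, std::uint8_t offset1,
                   std::uint64_t data0, std::uint64_t data1)
{
    const std::uint32_t addr0 = base + st64ByteOffset(offset0);
    const std::uint32_t addr1 = base + st64ByteOffset(offset1);
    assert(addr0 + sizeof(data0) <= lds.size());
    assert(addr1 + sizeof(data1) <= lds.size());
    std::memcpy(lds.data() + addr0, &data0, sizeof(data0));
    std::memcpy(lds.data() + addr1, &data1, sizeof(data1));
}
```

An encoded offset of 1 therefore corresponds to 512 bytes, and an encoded offset of 2 to 1024 bytes.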
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_read_i8
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email ) Change subject: arch-vega: Implement ds_read_i8 .. arch-vega: Implement ds_read_i8 Read one byte with sign extended from LDS. Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67076 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 60 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index a54f426..c803656 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -35636,8 +35636,50 @@ void Inst_DS__DS_READ_I8::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); + +addr.read(); + +calcAddr(gpuDynInst, addr); + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_READ_I8::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initMemRead(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_READ_I8::completeAcc(GPUDynInstPtr gpuDynInst) +{ +VecOperandU32 vdst(gpuDynInst, extData.VDST); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +vdst[lane] = 
(VecElemU32)sext<8>((reinterpret_cast( +gpuDynInst->d_data))[lane]); +} +} + +vdst.write(); +} // completeAcc // --- Inst_DS__DS_READ_U8 class methods --- Inst_DS__DS_READ_U8::Inst_DS__DS_READ_U8(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index f8fc98b..b2cf2b9 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -32848,6 +32848,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_DS__DS_READ_I8 class Inst_DS__DS_READ_U8 : public Inst_DS -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be Gerrit-Change-Number: 67076 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
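The sign extension applied to the loaded byte can be sketched with a small standalone helper. `signExtendTo32` is a hypothetical name written for this example (it assumes C++14 or later) and only mirrors the idea of sign-extending the low N bits; it is not gem5's `sext`.

```cpp
#include <cstdint>

// Sign-extend the low N bits of `value` to 32 bits. Flipping the sign bit
// and then subtracting it maps values with the sign bit set to their
// two's-complement negative counterparts and leaves other values unchanged.
template <unsigned N>
constexpr std::uint32_t signExtendTo32(std::uint32_t value)
{
    static_assert(N >= 1 && N < 32, "width must be between 1 and 31 bits");
    constexpr std::uint32_t signBit = 1u << (N - 1);
    constexpr std::uint32_t mask = (1u << N) - 1u;
    const std::uint32_t low = value & mask;
    return (low ^ signBit) - signBit;
}
```

For example, the byte `0x80` becomes `0xffffff80` in the destination register, while `0x7f` stays `0x7f`.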
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u64
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email ) Change subject: arch-vega: Implement ds_add_u64 .. arch-vega: Implement ds_add_u64 This instruction atomically adds an unsigned 64-bit value from a VGPR to a value in LDS, without returning the previous value. Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67075 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 64 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 3d9808a..a54f426 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -36088,6 +36088,10 @@ Inst_DS__DS_ADD_U64::Inst_DS__DS_ADD_U64(InFmt_DS *iFmt) : Inst_DS(iFmt, "ds_add_u64") { +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_U64 Inst_DS__DS_ADD_U64::~Inst_DS__DS_ADD_U64() @@ -36096,14 +36100,53 @@ // --- description from .arch file --- // 64b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA[0:1]; -// RETURN_DATA[0:1] = tmp. 
void Inst_DS__DS_ADD_U64::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU64 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_U64::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_U64::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_SUB_U64 class methods --- Inst_DS__DS_SUB_U64::Inst_DS__DS_SUB_U64(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 05a0002..f8fc98b 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -33079,6 +33079,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_U64 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 Gerrit-Change-Number: 
67075 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
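Two details of the ds_add_u64 change lend themselves to a short sketch: the two 8-bit offset fields are combined into one 16-bit byte offset, and the add updates memory without producing a return value. The helper names and the byte-vector stand-in for LDS below are assumptions for illustration, not gem5 code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Combine two 8-bit instruction fields into one 16-bit byte offset,
// mirroring the (offset1 << 8) | offset0 expression in the patch.
constexpr std::uint16_t combineOffsets(std::uint8_t offset0,
                                       std::uint8_t offset1)
{
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(offset1) << 8) | offset0);
}

// Hypothetical no-return 64-bit add on a byte-addressed scratch buffer:
// memory is updated in place and nothing is handed back to the caller.
void addU64NoReturn(std::vector<std::uint8_t> &lds, std::uint32_t addr,
                    std::uint64_t data)
{
    assert(addr + sizeof(std::uint64_t) <= lds.size());
    std::uint64_t value = 0;
    std::memcpy(&value, lds.data() + addr, sizeof(value));
    value += data;  // unsigned arithmetic wraps modulo 2^64
    std::memcpy(lds.data() + addr, &value, sizeof(value));
}
```

Because the operand is unsigned, adding 2 to a word holding the maximum 64-bit value wraps around to 1.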
[gem5-dev] [M] Change in gem5/gem5[develop]: base: Specialize bitwise atomics so FP types can be used
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email ) Change subject: base: Specialize bitwise atomics so FP types can be used .. base: Specialize bitwise atomics so FP types can be used The current atomic memory operations are templated so any type can be used. However floating point types can not perform bitwise operations. The GPU model contains some instructions which do atomics on floating point types, so they need to be supported. To allow this, template specialization is added to atomic AND, OR, and XOR which does nothing if the type is floating point and operates as normal for integral types. Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67073 Reviewed-by: Matt Sinclair Reviewed-by: Daniel Carvalho Tested-by: kokoro Maintainer: Matt Sinclair --- M src/base/amo.hh 1 file changed, 52 insertions(+), 3 deletions(-) Approvals: kokoro: Regressions pass Daniel Carvalho: Looks good to me, approved Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/base/amo.hh b/src/base/amo.hh index 81bf069..c990d15 100644 --- a/src/base/amo.hh +++ b/src/base/amo.hh @@ -129,30 +129,57 @@ template class AtomicOpAnd : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { *b &= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } + public: T a; AtomicOpAnd(T _a) : a(_a) { } -void execute(T *b) { *b &= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpAnd(a); } }; template class AtomicOpOr : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { *b |= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } 
+ public: T a; AtomicOpOr(T _a) : a(_a) { } -void execute(T *b) { *b |= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpOr(a); } }; template class AtomicOpXor : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { *b ^= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } + public: T a; AtomicOpXor(T _a) : a(_a) {} -void execute(T *b) { *b ^= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpXor(a); } }; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c Gerrit-Change-Number: 67073 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Daniel Carvalho Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
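The technique the commit message describes, selecting between a real bitwise implementation and a no-op based on whether the type is integral, can be sketched in a self-contained form as follows. `BitwiseOrOp` is a hypothetical class written for this example; it shows the `std::enable_if` pattern with the template arguments spelled out and is not a copy of `src/base/amo.hh`.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
class BitwiseOrOp
{
    // Chosen when B is an integral type: perform the bitwise OR.
    template <typename B>
    typename std::enable_if<std::is_integral<B>::value, void>::type
    executeImpl(B *b) const { *b |= a; }

    // Chosen for every other type (e.g. float): do nothing, because
    // bitwise operators are not defined for floating point operands.
    template <typename B>
    typename std::enable_if<!std::is_integral<B>::value, void>::type
    executeImpl(B *) const { }

  public:
    T a;

    explicit BitwiseOrOp(T _a) : a(_a) { }

    void execute(T *b) const
    {
        assert(b != nullptr);
        executeImpl<T>(b);
    }
};
```

Because the body of the integral overload depends on `B`, it is only compiled when that overload is actually selected, so `BitwiseOrOp<float>` compiles even though `float |= float` would not.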
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_f32 atomic
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email ) Change subject: arch-vega: Implement ds_add_f32 atomic .. arch-vega: Implement ds_add_f32 atomic This instruction atomically adds a 32-bit float from a VGPR to a value in LDS, without returning the previous value. Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67074 Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 64 insertions(+), 3 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index afdfde3..3d9808a 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -34755,6 +34755,10 @@ : Inst_DS(iFmt, "ds_add_f32") { setFlag(F32); +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_F32 Inst_DS__DS_ADD_F32::~Inst_DS__DS_ADD_F32() @@ -34763,15 +34767,54 @@ // --- description from .arch file --- // 32b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA; -// RETURN_DATA = tmp. // Floating point add that handles NaN/INF/denormal values. 
void Inst_DS__DS_ADD_F32::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandF32 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_F32::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_F32::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_WRITE_B8 class methods --- Inst_DS__DS_WRITE_B8::Inst_DS__DS_WRITE_B8(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 33be33e..05a0002 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -31895,6 +31895,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_F32 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb 
Gerrit-Change-Number: 67074 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix signed BFE instructions
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email ) Change subject: arch-vega: Fix signed BFE instructions .. arch-vega: Fix signed BFE instructions The bitfield extract instructions come in unsigned and signed variants. The documentation on this is not correct, however the GCN3 documentation gives some clues. The instruction should extract an N-bit integer where N is defined in a source operand starting at some bit also defined by a source operand. For signed variants of this instruction, the N-bit integer should be sign extended but is currently not. This changeset does sign extension using the runtime value of N by ORing the upper bits with ones if the most significant bit is one. This was verified by writing these instructions in assembly and running on a real GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64. Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 55 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index f5b08b7..c9e57bc 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -1302,6 +1302,21 @@ sdst = (src0.rawData() >> bits(src1.rawData(), 4, 0)) & ((1 << bits(src1.rawData(), 22, 16)) - 1); + +// Above extracted a signed int of size src1[22:16] bits which needs +// to be signed-extended. Check if the MSB of our src1[22:16]-bit +// integer is 1, and sign extend it is. +// +// Note: The description in the Vega ISA manual does not mention to +// sign-extend the result. 
An update description can be found in the +// more recent RDNA3 manual here: +// https://developer.amd.com/wp-content/resources/ +// RDNA3_Shader_ISA_December2022.pdf +if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) { +sdst = sdst.rawData() + | (0x << bits(src1.rawData(), 22, 16)); +} + scc = sdst.rawData() ? 1 : 0; sdst.write(); @@ -1373,6 +1388,14 @@ sdst = (src0.rawData() >> bits(src1.rawData(), 5, 0)) & ((1 << bits(src1.rawData(), 22, 16)) - 1); + +// Above extracted a signed int of size src1[22:16] bits which needs +// to be signed-extended. Check if the MSB of our src1[22:16]-bit +// integer is 1, and sign extend it is. +if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) { +sdst = sdst.rawData() + | 0x << bits(src1.rawData(), 22, 16); +} scc = sdst.rawData() ? 1 : 0; sdst.write(); @@ -30544,6 +30567,13 @@ if (wf->execMask(lane)) { vdst[lane] = (src0[lane] >> bits(src1[lane], 4, 0)) & ((1 << bits(src2[lane], 4, 0)) - 1); + +// Above extracted a signed int of size src2 bits which needs +// to be signed-extended. Check if the MSB of our src2-bit +// integer is 1, and sign extend it is. +if (vdst[lane] >> (bits(src2[lane], 4, 0) - 1)) { +vdst[lane] |= 0x << bits(src2[lane], 4, 0); +} } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 Gerrit-Change-Number: 66751 Gerrit-PatchSet: 4 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
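The sign extension described in the change above can be illustrated with a small standalone sketch. This is not gem5 code: `extractSignedField` is a hypothetical helper, it returns the raw 32-bit pattern, and it assumes a field width of 1 to 31 bits (the patch itself reads the width from a source operand at run time).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a gem5 function): extract a `width`-bit field
// from `src` starting at bit `offset` and sign-extend it to 32 bits.
// The result is returned as raw bits so no signed conversion is involved.
inline uint32_t
extractSignedField(uint32_t src, unsigned offset, unsigned width)
{
    // Assumption for this sketch: the field is 1..31 bits wide.
    assert(offset < 32 && width >= 1 && width < 32);

    // Unsigned extraction: shift the field down and mask off `width` bits.
    uint32_t field = (src >> offset) & ((1u << width) - 1u);

    // If the most significant bit of the field is set, OR ones into
    // every bit above the field.
    if ((field >> (width - 1)) & 1u) {
        field |= 0xffffffffu << width;
    }
    return field;
}
```

For example, `extractSignedField(0xB0, 4, 4)` extracts the 4-bit value 0xB, whose top bit is set, so the result is 0xfffffffb (-5 in two's complement), while `extractSignedField(0x50, 4, 4)` stays 0x5.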
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix several issues with DPP
&& (currVal < NumVecElemPerVecReg)) { -newLane += count; +newLane -= 1; } else { outOfBounds = true; } } else if (dppCtrl == SQ_DPP_WF_RR1) { // DPP_WF_RR1 -count = -1; -newLane = (currLane + count + NumVecElemPerVecReg) % +newLane = (currLane - 1 + NumVecElemPerVecReg) % NumVecElemPerVecReg; } else if (dppCtrl == SQ_DPP_ROW_MIRROR) { // DPP_ROW_MIRROR localRowOffset = (15 - localRowOffset); @@ -392,12 +389,22 @@ } else if (dppCtrl == SQ_DPP_ROW_BCAST15) { // DPP_ROW_BCAST15 count = 15; if (currLane > count) { -newLane = (currLane & ~count) - 1; +// 0x30 selects which set of 16 lanes to use. We broadcast the +// last lane of one set to all lanes of the next set (e.g., +// lane 15 is written to 16-31, 31 to 32-47, 47 to 48-63). +newLane = (currLane & 0x30) - 1; +} else { +outOfBounds = true; } } else if (dppCtrl == SQ_DPP_ROW_BCAST31) { // DPP_ROW_BCAST31 count = 31; if (currLane > count) { -newLane = (currLane & ~count) - 1; +// 0x20 selects either the upper 32 or lower 32 lanes and +// broadcasts the last lane of one set to all lanes of the +// next set (e.g., lane 31 is written to 32-63). 
+newLane = (currLane & 0x20) - 1; +} else { +outOfBounds = true; } } else { panic("Unimplemented DPP control operation: %d\n", dppCtrl); @@ -443,6 +450,9 @@ src0.absModifier(); } +// Need a copy of the original data since we update one lane at a time +T src0_copy = src0; + // iterate over all register lanes, performing steps 2-4 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { threadValid = (0x1LL << lane); @@ -458,7 +468,6 @@ if (((rowMask & (0x1 << rowNum)) == 0) /* row mask */ || ((bankMask & (0x1 << bankNum)) == 0) /* bank mask */) { laneDisabled = true; -continue; } /** @@ -495,7 +504,7 @@ } else { threadValid = 0; } -} else if (!gpuDynInst->exec_mask[lane]) { +} else if (!gpuDynInst->wavefront()->execMask(lane)) { if (boundCtrl == 1) { zeroSrc = true; } else { @@ -505,13 +514,15 @@ if (threadValid != 0 && !outOfBounds && !zeroSrc) { assert(!laneDisabled); -src0[outLane] = src0[lane]; +src0[lane] = src0_copy[outLane]; } else if (zeroSrc) { src0[lane] = 0; } // reset for next iteration laneDisabled = false; +outOfBounds = false; +zeroSrc = false; } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66752?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168 Gerrit-Change-Number: 66752 Gerrit-PatchSet: 5 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
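The row-broadcast lane selection from the DPP change above can be checked in isolation. The following is a minimal sketch, assuming a 64-lane wavefront; the function names and the use of -1 to signal the out-of-bounds case are illustrative and do not come from gem5.

```cpp
#include <cassert>

// Assumption for this sketch: 64 lanes per wavefront.
constexpr int kNumLanes = 64;

// Hypothetical helper: source lane for DPP_ROW_BCAST15, or -1 when the
// lane has no source (the patch sets outOfBounds in that case).
inline int
rowBcast15SourceLane(int currLane)
{
    assert(currLane >= 0 && currLane < kNumLanes);
    if (currLane > 15) {
        // 0x30 keeps the bits that select the group of 16 lanes; the
        // source is the last lane of the previous group.
        return (currLane & 0x30) - 1;
    }
    return -1;
}

// Hypothetical helper: source lane for DPP_ROW_BCAST31, or -1 when the
// lane has no source.
inline int
rowBcast31SourceLane(int currLane)
{
    assert(currLane >= 0 && currLane < kNumLanes);
    if (currLane > 31) {
        // 0x20 selects the upper half; the source is lane 31.
        return (currLane & 0x20) - 1;
    }
    return -1;
}
```

With these helpers, lanes 16-31 read lane 15, lanes 32-47 read lane 31, and lanes 48-63 read lane 47 for BCAST15, matching the comment in the diff; for BCAST31, lanes 32-63 all read lane 31.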
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add missing operand size for ds_write2st64_b64
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email )

Change subject: arch-vega: Add missing operand size for ds_write2st64_b64
..

arch-vega: Add missing operand size for ds_write2st64_b64

This instruction takes three operands (address, and two datas) but there
were only operand sizes for two operands tripping assert in default
case.

Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67071
Maintainer: Matt Sinclair
Tested-by: kokoro
Reviewed-by: Matt Sinclair
---
M src/arch/amdgpu/vega/insts/instructions.hh
1 file changed, 20 insertions(+), 1 deletion(-)

Approvals:
Matt Sinclair: Looks good to me, approved; Looks good to me, approved
kokoro: Regressions pass

diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 0671df8..1c42248 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33553,7 +33553,9 @@
 switch (opIdx) {
 case 0: //vgpr_a
 return 4;
- case 1: //vgpr_d1
+ case 1: //vgpr_d0
+return 8;
+ case 2: //vgpr_d1
 return 8;
 default:
 fatal("op idx %i out of bounds\n", opIdx);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Gerrit-Change-Number: 67071
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-MessageType: merged
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u32 atomic
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email ) Change subject: arch-vega: Implement ds_add_u32 atomic .. arch-vega: Implement ds_add_u32 atomic This instruction does an atomic add of unsigned 32-bit data with a VGPR and value in LDS atomically, without return. Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67072 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 64 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 1f37ff1..afdfde3 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -34071,6 +34071,10 @@ Inst_DS__DS_ADD_U32::Inst_DS__DS_ADD_U32(InFmt_DS *iFmt) : Inst_DS(iFmt, "ds_add_u32") { +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_U32 Inst_DS__DS_ADD_U32::~Inst_DS__DS_ADD_U32() @@ -34079,14 +34083,53 @@ // --- description from .arch file --- // 32b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA; -// RETURN_DATA = tmp. 
void Inst_DS__DS_ADD_U32::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_U32::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_U32::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_SUB_U32 class methods --- Inst_DS__DS_SUB_U32::Inst_DS__DS_SUB_U32(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 1c42248..33be33e 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -31211,6 +31211,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_U32 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e Gerrit-Change-Number: 
67072 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
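The `initiateAcc` methods in the ds_add changes build a single offset from the two instruction offset fields with `(offset1 << 8) | offset0`. A minimal sketch of that arithmetic follows; `combineDsOffset` is a hypothetical name, and treating OFFSET0 and OFFSET1 as 8-bit fields is an assumption of this sketch rather than something stated in the change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a gem5 function): combine two offset fields
// into one offset, with offset1 forming the upper byte.
// Assumption: both fields are 8 bits wide.
inline uint32_t
combineDsOffset(unsigned offset0, unsigned offset1)
{
    assert(offset0 < 256 && offset1 < 256);
    return (static_cast<uint32_t>(offset1) << 8) | offset0;
}
```

Under that assumption, `combineDsOffset(0x34, 0x12)` gives 0x1234, so the two fields together behave as one 16-bit offset.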
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add DPP support for V_AND_B32
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email ) ( 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. )Change subject: arch-vega: Add DPP support for V_AND_B32 .. arch-vega: Add DPP support for V_AND_B32 A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the unit tests for rocPRIM scan_inclusive are passing. Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66753 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 46 insertions(+), 4 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index c9e57bc..1f37ff1 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -6844,15 +6844,41 @@ { Wavefront *wf = gpuDynInst->wavefront(); ConstVecOperandU32 src0(gpuDynInst, instData.SRC0); -ConstVecOperandU32 src1(gpuDynInst, instData.VSRC1); +VecOperandU32 src1(gpuDynInst, instData.VSRC1); VecOperandU32 vdst(gpuDynInst, instData.VDST); src0.readSrc(); src1.read(); -for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { -if (wf->execMask(lane)) { -vdst[lane] = src0[lane] & src1[lane]; +if (isDPPInst()) { +VecOperandU32 src0_dpp(gpuDynInst, extData.iFmt_VOP_DPP.SRC0); +src0_dpp.read(); + +DPRINTF(VEGA, "Handling V_AND_B32 SRC DPP. 
SRC0: register v[%d], " +"DPP_CTRL: 0x%#x, SRC0_ABS: %d, SRC0_NEG: %d, " +"SRC1_ABS: %d, SRC1_NEG: %d, BC: %d, " +"BANK_MASK: %d, ROW_MASK: %d\n", extData.iFmt_VOP_DPP.SRC0, +extData.iFmt_VOP_DPP.DPP_CTRL, +extData.iFmt_VOP_DPP.SRC0_ABS, +extData.iFmt_VOP_DPP.SRC0_NEG, +extData.iFmt_VOP_DPP.SRC1_ABS, +extData.iFmt_VOP_DPP.SRC1_NEG, +extData.iFmt_VOP_DPP.BC, +extData.iFmt_VOP_DPP.BANK_MASK, +extData.iFmt_VOP_DPP.ROW_MASK); + +processDPP(gpuDynInst, extData.iFmt_VOP_DPP, src0_dpp, src1); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (wf->execMask(lane)) { +vdst[lane] = src0_dpp[lane] & src1[lane]; +} +} +} else { +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (wf->execMask(lane)) { +vdst[lane] = src0[lane] & src1[lane]; +} } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 Gerrit-Change-Number: 66753 Gerrit-PatchSet: 5 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write2st64_b64
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email ) Change subject: arch-vega: Implement ds_write2st64_b64 .. arch-vega: Implement ds_write2st64_b64 Write two qwords at offsets multiplied by 8 * 64 bytes. Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 7594f9c..3ef11c4 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -36589,8 +36589,52 @@ void Inst_DS__DS_WRITE2ST64_B64::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU64 data0(gpuDynInst, extData.DATA0); +ConstVecOperandU64 data1(gpuDynInst, extData.DATA1); + +addr.read(); +data0.read(); +data1.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast( +gpuDynInst->d_data))[lane * 2] = data0[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 2 + 1] = data1[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_WRITE2ST64_B64::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0 * 8 * 64; +Addr offset1 = instData.OFFSET1 * 8 * 64; + +initDualMemWrite(gpuDynInst, offset0, offset1); +} + +void +Inst_DS__DS_WRITE2ST64_B64::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // --- 
Inst_DS__DS_CMPST_B64 class methods --- Inst_DS__DS_CMPST_B64::Inst_DS__DS_CMPST_B64(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 9f017f9..2896732 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -33572,6 +33572,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_DS__DS_WRITE2ST64_B64 class Inst_DS__DS_CMPST_B64 : public Inst_DS -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b Gerrit-Change-Number: 67078 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
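The offset scaling in ds_write2st64_b64 ("offsets multiplied by 8 * 64 bytes") is plain arithmetic and can be sketched on its own. The helper and constant names below are illustrative only, and the 8-bit range of the offset field is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kQwordBytes = 8;   // one qword is 8 bytes
constexpr uint64_t kSt64Stride = 64;  // stride factor of the st64 variant

// Hypothetical helper (not a gem5 function): byte offset for one of the
// two writes, given the raw instruction offset field.
// Assumption: the field is 8 bits wide.
inline uint64_t
st64ByteOffset(unsigned offsetField)
{
    assert(offsetField < 256);
    return offsetField * kQwordBytes * kSt64Stride;
}
```

An offset field of 1 therefore addresses byte offset 512 and a field of 2 addresses 1024; the two qwords are written at the lane address plus the scaled OFFSET0 and OFFSET1 values respectively.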
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Read one dword for SGPR base global insts
mt_FLAT instData; // second instruction DWORD @@ -987,7 +999,7 @@ void generateGlobalDisassembly(); void -calcAddrSgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU64 &vaddr, +calcAddrSgpr(GPUDynInstPtr gpuDynInst, ConstVecOperandU32 &vaddr, ConstScalarOperandU64 &saddr, ScalarRegI32 offset) { // Use SGPR pair as a base address and add VGPR-offset and -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67077?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0 Gerrit-Change-Number: 67077 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add missing operand size for ds_write2st64_b64
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email )

Change subject: arch-vega: Add missing operand size for ds_write2st64_b64
..

arch-vega: Add missing operand size for ds_write2st64_b64

This instruction takes three operands (address, and two datas) but there
were only operand sizes for two operands tripping assert in default
case.

Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
---
M src/arch/amdgpu/vega/insts/instructions.hh
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 0671df8..1c42248 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33553,7 +33553,9 @@
 switch (opIdx) {
 case 0: //vgpr_a
 return 4;
- case 1: //vgpr_d1
+ case 1: //vgpr_d0
+return 8;
+ case 2: //vgpr_d1
 return 8;
 default:
 fatal("op idx %i out of bounds\n", opIdx);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Gerrit-Change-Number: 67071
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba
Gerrit-MessageType: newchange
[gem5-dev] [M] Change in gem5/gem5[develop]: base: Specialize bitwise atomics so FP types can be used
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email ) Change subject: base: Specialize bitwise atomics so FP types can be used .. base: Specialize bitwise atomics so FP types can be used The current atomic memory operations are templated so any type can be used. However floating point types can not perform bitwise operations. The GPU model contains some instructions which do atomics on floating point types, so they need to be supported. To allow this, template specialization is added to atomic AND, OR, and XOR which does nothing if the type is floating point and operates as normal for integral types. Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c --- M src/base/amo.hh 1 file changed, 47 insertions(+), 3 deletions(-) diff --git a/src/base/amo.hh b/src/base/amo.hh index 81bf069..c990d15 100644 --- a/src/base/amo.hh +++ b/src/base/amo.hh @@ -129,30 +129,57 @@ template class AtomicOpAnd : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { *b &= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } + public: T a; AtomicOpAnd(T _a) : a(_a) { } -void execute(T *b) { *b &= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpAnd(a); } }; template class AtomicOpOr : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { *b |= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } + public: T a; AtomicOpOr(T _a) : a(_a) { } -void execute(T *b) { *b |= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpOr(a); } }; template class AtomicOpXor : public TypedAtomicOpFunctor { +// Bitwise operations are only legal on integral types +template +typename 
std::enable_if::value, void>::type +executeImpl(B *b) { *b ^= a; } + +template +typename std::enable_if::value, void>::type +executeImpl(B *b) { } + public: T a; AtomicOpXor(T _a) : a(_a) {} -void execute(T *b) { *b ^= a; } +void execute(T *b) { executeImpl(b); } AtomicOpFunctor* clone () { return new AtomicOpXor(a); } }; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c Gerrit-Change-Number: 67073 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
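The technique described in the change above, where a bitwise atomic does nothing for floating-point types and operates normally for integral types, can be sketched with `std::enable_if` as follows. This is a standalone illustration, not the code in src/base/amo.hh: the class name `BitwiseAndSketch` and the `demoBitwiseAnd` function are hypothetical, and the functor base class is omitted.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an atomic AND functor. Only the overload whose
// enable_if condition holds for T takes part in overload resolution.
template <typename T>
class BitwiseAndSketch
{
    // Integral types: perform the bitwise AND.
    template <typename B = T>
    typename std::enable_if<std::is_integral<B>::value, void>::type
    executeImpl(B *b) { *b &= a; }

    // Non-integral types (e.g. float): do nothing.
    template <typename B = T>
    typename std::enable_if<!std::is_integral<B>::value, void>::type
    executeImpl(B *) { }

  public:
    T a;
    explicit BitwiseAndSketch(T _a) : a(_a) { }
    void execute(T *b) { executeImpl<T>(b); }
};

// Usage example: the integer operand is masked, the float is untouched.
inline void
demoBitwiseAnd()
{
    unsigned u = 0xffu;
    BitwiseAndSketch<unsigned>(0x0fu).execute(&u);
    assert(u == 0x0fu);

    float f = 1.5f;
    BitwiseAndSketch<float>(2.0f).execute(&f);
    assert(f == 1.5f);
}
```

Because the body of each `executeImpl` depends on the template parameter `B`, the expression `*b &= a` is only checked when that overload is actually instantiated, which is why `BitwiseAndSketch<float>` compiles even though `float &= float` is ill-formed.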
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u64
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email ) Change subject: arch-vega: Implement ds_add_u64 .. arch-vega: Implement ds_add_u64 This instruction does an atomic add of an unsigned 64-bit data with a VGPR and value in LDS atomically without return. Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index a0308c8..511a767 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -36082,6 +36082,10 @@ Inst_DS__DS_ADD_U64::Inst_DS__DS_ADD_U64(InFmt_DS *iFmt) : Inst_DS(iFmt, "ds_add_u64") { +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_U64 Inst_DS__DS_ADD_U64::~Inst_DS__DS_ADD_U64() @@ -36090,14 +36094,53 @@ // --- description from .arch file --- // 64b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA[0:1]; -// RETURN_DATA[0:1] = tmp. 
void Inst_DS__DS_ADD_U64::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU64 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_U64::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_U64::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_SUB_U64 class methods --- Inst_DS__DS_SUB_U64::Inst_DS__DS_SUB_U64(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 05a0002..f8fc98b 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -33079,6 +33079,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_U64 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077 Gerrit-Change-Number: 
67075 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_f32 atomic
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email ) Change subject: arch-vega: Implement ds_add_f32 atomic .. arch-vega: Implement ds_add_f32 atomic This instruction does an atomic add of a 32-bit float with a VGPR and value in LDS atomically without return. Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 5332687..a0308c8 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -34749,6 +34749,10 @@ : Inst_DS(iFmt, "ds_add_f32") { setFlag(F32); +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_F32 Inst_DS__DS_ADD_F32::~Inst_DS__DS_ADD_F32() @@ -34757,15 +34761,54 @@ // --- description from .arch file --- // 32b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA; -// RETURN_DATA = tmp. // Floating point add that handles NaN/INF/denormal values. 
void Inst_DS__DS_ADD_F32::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandF32 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_F32::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_F32::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_WRITE_B8 class methods --- Inst_DS__DS_WRITE_B8::Inst_DS__DS_WRITE_B8(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 33be33e..05a0002 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -31895,6 +31895,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_F32 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb 
Gerrit-Change-Number: 67074 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u32 atomic
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email ) Change subject: arch-vega: Implement ds_add_u32 atomic .. arch-vega: Implement ds_add_u32 atomic This instruction does an atomic add of unsigned 32-bit data with a VGPR and value in LDS atomically, without return. Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 3570e32..5332687 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -34065,6 +34065,10 @@ Inst_DS__DS_ADD_U32::Inst_DS__DS_ADD_U32(InFmt_DS *iFmt) : Inst_DS(iFmt, "ds_add_u32") { +setFlag(MemoryRef); +setFlag(GroupSegment); +setFlag(AtomicAdd); +setFlag(AtomicNoReturn); } // Inst_DS__DS_ADD_U32 Inst_DS__DS_ADD_U32::~Inst_DS__DS_ADD_U32() @@ -34073,14 +34077,53 @@ // --- description from .arch file --- // 32b: -// tmp = MEM[ADDR]; // MEM[ADDR] += DATA; -// RETURN_DATA = tmp. 
void Inst_DS__DS_ADD_U32::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data(gpuDynInst, extData.DATA0); + +addr.read(); +data.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->a_data))[lane] += data[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_ADD_U32::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initAtomicAccess(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_ADD_U32::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc // --- Inst_DS__DS_SUB_U32 class methods --- Inst_DS__DS_SUB_U32::Inst_DS__DS_SUB_U32(InFmt_DS *iFmt) diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index 1c42248..33be33e 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -31211,6 +31211,8 @@ } } // getOperandSize +void initiateAcc(GPUDynInstPtr gpuDynInst) override; +void completeAcc(GPUDynInstPtr gpuDynInst) override; void execute(GPUDynInstPtr) override; }; // Inst_DS__DS_ADD_U32 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e Gerrit-Change-Number: 
67072 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange
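As a rough illustration of the two pieces of arithmetic in the ds_add_u32 patch above (the combined DS offset and the per-lane masked add), here is a minimal standalone sketch. It is not gem5 code: `kNumLanes`, `combineDsOffset`, and `maskedAdd` are hypothetical names, the 8-bit width of each offset field is an assumption, and the memory is modeled as a plain array rather than LDS.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the number of lanes in a vector register.
constexpr int kNumLanes = 64;

// Combine two offset fields into one offset, mirroring the expression
// `(offset1 << 8) | offset0` from the patch. Treating each field as
// 8 bits wide is an assumption of this sketch.
inline uint32_t combineDsOffset(uint8_t offset0, uint8_t offset1)
{
    return (static_cast<uint32_t>(offset1) << 8) | offset0;
}

// Model of "MEM[ADDR] += DATA" applied per lane: only lanes whose bit is
// set in the execution mask are updated, and nothing is returned.
inline void maskedAdd(std::array<uint32_t, kNumLanes> &mem,
                      const std::array<uint32_t, kNumLanes> &data,
                      const std::bitset<kNumLanes> &execMask)
{
    for (int lane = 0; lane < kNumLanes; ++lane) {
        if (execMask[lane]) {
            mem[lane] += data[lane];
        }
    }
}
```

Inactive lanes are left untouched, which is also why the patch returns early when the whole execution mask is empty.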
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_read_i8
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email ) Change subject: arch-vega: Implement ds_read_i8 .. arch-vega: Implement ds_read_i8 Read one byte with sign extended from LDS. Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh 2 files changed, 56 insertions(+), 1 deletion(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 511a767..f0fb1aa 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -35630,8 +35630,50 @@ void Inst_DS__DS_READ_I8::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); + +if (gpuDynInst->exec_mask.none()) { +wf->decLGKMInstsIssued(); +return; +} + +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); + +addr.read(); + +calcAddr(gpuDynInst, addr); + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } // execute + +void +Inst_DS__DS_READ_I8::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initMemRead(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_READ_I8::completeAcc(GPUDynInstPtr gpuDynInst) +{ +VecOperandU32 vdst(gpuDynInst, extData.VDST); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +vdst[lane] = (VecElemU32)sext<8>((reinterpret_cast( +gpuDynInst->d_data))[lane]); +} +} + +vdst.write(); +} // completeAcc // --- Inst_DS__DS_READ_U8 class methods --- Inst_DS__DS_READ_U8::Inst_DS__DS_READ_U8(InFmt_DS *iFmt) diff --git 
a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh index f8fc98b..b2cf2b9 100644 --- a/src/arch/amdgpu/vega/insts/instructions.hh +++ b/src/arch/amdgpu/vega/insts/instructions.hh @@ -32848,6 +32848,8 @@ } // getOperandSize void execute(GPUDynInstPtr) override; +void initiateAcc(GPUDynInstPtr) override; +void completeAcc(GPUDynInstPtr) override; }; // Inst_DS__DS_READ_I8 class Inst_DS__DS_READ_U8 : public Inst_DS -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be Gerrit-Change-Number: 67076 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
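The sign extension performed in completeAcc for ds_read_i8 can be sketched on its own as follows. `signExtendByte` is a hypothetical helper written for this note and is not gem5's `sext<8>`; it only shows what happens to the upper 24 bits of the destination lane.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 8 bits of `raw` to 32 bits and return the result as
// an unsigned register image (hypothetical helper for illustration only).
inline uint32_t signExtendByte(uint32_t raw)
{
    const uint32_t byte = raw & 0xffu;
    // If bit 7 is set the byte is negative, so fill bits 31:8 with ones.
    return (byte & 0x80u) ? (byte | 0xffffff00u) : byte;
}
```

For example, a loaded byte of 0x80 becomes 0xffffff80 in the destination lane, while 0x7f stays 0x0000007f.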
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add DPP support for V_AND_B32
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email ) Change subject: arch-vega: Add DPP support for V_AND_B32 .. arch-vega: Add DPP support for V_AND_B32 A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the unit tests for rocPRIM scan_inclusive are passing. Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 5612f29..3570e32 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -6838,15 +6838,41 @@ { Wavefront *wf = gpuDynInst->wavefront(); ConstVecOperandU32 src0(gpuDynInst, instData.SRC0); -ConstVecOperandU32 src1(gpuDynInst, instData.VSRC1); +VecOperandU32 src1(gpuDynInst, instData.VSRC1); VecOperandU32 vdst(gpuDynInst, instData.VDST); src0.readSrc(); src1.read(); -for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { -if (wf->execMask(lane)) { -vdst[lane] = src0[lane] & src1[lane]; +if (isDPPInst()) { +VecOperandU32 src0_dpp(gpuDynInst, extData.iFmt_VOP_DPP.SRC0); +src0_dpp.read(); + +DPRINTF(VEGA, "Handling V_AND_B32 SRC DPP. 
SRC0: register v[%d], " +"DPP_CTRL: 0x%#x, SRC0_ABS: %d, SRC0_NEG: %d, " +"SRC1_ABS: %d, SRC1_NEG: %d, BC: %d, " +"BANK_MASK: %d, ROW_MASK: %d\n", extData.iFmt_VOP_DPP.SRC0, +extData.iFmt_VOP_DPP.DPP_CTRL, +extData.iFmt_VOP_DPP.SRC0_ABS, +extData.iFmt_VOP_DPP.SRC0_NEG, +extData.iFmt_VOP_DPP.SRC1_ABS, +extData.iFmt_VOP_DPP.SRC1_NEG, +extData.iFmt_VOP_DPP.BC, +extData.iFmt_VOP_DPP.BANK_MASK, +extData.iFmt_VOP_DPP.ROW_MASK); + +processDPP(gpuDynInst, extData.iFmt_VOP_DPP, src0_dpp, src1); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (wf->execMask(lane)) { +vdst[lane] = src0_dpp[lane] & src1[lane]; +} +} +} else { +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (wf->execMask(lane)) { +vdst[lane] = src0[lane] & src1[lane]; +} } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942 Gerrit-Change-Number: 66753 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix several issues with DPP
umVecElemPerVecReg) % NumVecElemPerVecReg; } else if (dppCtrl == SQ_DPP_ROW_MIRROR) { // DPP_ROW_MIRROR localRowOffset = (15 - localRowOffset); @@ -392,12 +389,16 @@ } else if (dppCtrl == SQ_DPP_ROW_BCAST15) { // DPP_ROW_BCAST15 count = 15; if (currLane > count) { -newLane = (currLane & ~count) - 1; +newLane = (currLane & 0x30) - 1; +} else { +outOfBounds = true; } } else if (dppCtrl == SQ_DPP_ROW_BCAST31) { // DPP_ROW_BCAST31 count = 31; if (currLane > count) { -newLane = (currLane & ~count) - 1; +newLane = (currLane & 0x20) - 1; +} else { +outOfBounds = true; } } else { panic("Unimplemented DPP control operation: %d\n", dppCtrl); @@ -443,6 +444,9 @@ src0.absModifier(); } +// Need a copy of the original data since we update one lane at a time +T src0_copy = src0; + // iterate over all register lanes, performing steps 2-4 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { threadValid = (0x1LL << lane); @@ -458,7 +462,6 @@ if (((rowMask & (0x1 << rowNum)) == 0) /* row mask */ || ((bankMask & (0x1 << bankNum)) == 0) /* bank mask */) { laneDisabled = true; -continue; } /** @@ -495,7 +498,7 @@ } else { threadValid = 0; } -} else if (!gpuDynInst->exec_mask[lane]) { +} else if (!gpuDynInst->wavefront()->execMask(lane)) { if (boundCtrl == 1) { zeroSrc = true; } else { @@ -505,13 +508,15 @@ if (threadValid != 0 && !outOfBounds && !zeroSrc) { assert(!laneDisabled); -src0[outLane] = src0[lane]; +src0[lane] = src0_copy[outLane]; } else if (zeroSrc) { src0[lane] = 0; } // reset for next iteration laneDisabled = false; +outOfBounds = false; +zeroSrc = false; } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66752?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168 Gerrit-Change-Number: 66752 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ 
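One of the fixes in the DPP patch above is reading from a copy of the source register (`src0_copy`) because lanes are updated one at a time. The following sketch shows why that matters using a simple one-lane shift; it is not the gem5 `processDPP` routine, and `kLanes`, `Reg`, `shiftInPlace`, and `shiftWithCopy` are hypothetical names with an 8-lane register chosen only to keep the example small.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;  // illustrative; not the real wavefront width
using Reg = std::array<uint32_t, kLanes>;

// Each lane should receive the ORIGINAL value of the lane to its left.
// Updating in place reads lanes that were already overwritten, so the
// value of lane 0 is smeared across the whole register.
inline Reg shiftInPlace(Reg reg)
{
    for (int lane = 1; lane < kLanes; ++lane) {
        reg[lane] = reg[lane - 1];
    }
    return reg;
}

// Reading from an untouched copy gives every lane the original value of
// its source lane, which is the intent behind `src0_copy` in the patch.
inline Reg shiftWithCopy(Reg reg)
{
    const Reg copy = reg;
    for (int lane = 1; lane < kLanes; ++lane) {
        reg[lane] = copy[lane - 1];
    }
    return reg;
}
```

With an input of 0..7, the in-place version produces all zeros, while the copy-based version produces 0, 0, 1, 2, 3, 4, 5, 6.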
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix signed BFE instructions
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email ) Change subject: arch-vega: Fix signed BFE instructions .. arch-vega: Fix signed BFE instructions The bitfield extract instructions come in unsigned and signed variants. The documentation on this is not correct, however the GCN3 documentation gives some clues. The instruction should extract an N-bit integer where N is defined in a source operand starting at some bit also defined by a source operand. For signed variants of this instruction, the N-bit integer should be sign extended but is currently not. This changeset does sign extension using the runtime value of N by ORing the upper bits with ones if the most significant bit is one. This was verified by writing these instructions in assembly and running on a real GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64. Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 45 insertions(+), 0 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index f5b08b7..5612f29 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -1302,6 +1302,15 @@ sdst = (src0.rawData() >> bits(src1.rawData(), 4, 0)) & ((1 << bits(src1.rawData(), 22, 16)) - 1); + +// Above extracted a signed int of size src1[22:16] bits which needs +// to be signed-extended. Check if the MSB of our src1[22:16]-bit +// integer is 1, and sign extend it is. +if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) { +sdst = sdst.rawData() + | (0x << bits(src1.rawData(), 22, 16)); +} + scc = sdst.rawData() ? 1 : 0; sdst.write(); @@ -1373,6 +1382,14 @@ sdst = (src0.rawData() >> bits(src1.rawData(), 5, 0)) & ((1 << bits(src1.rawData(), 22, 16)) - 1); + +// Above extracted a signed int of size src1[22:16] bits which needs +// to be signed-extended. 
Check if the MSB of our src1[22:16]-bit +// integer is 1, and sign extend it is. +if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) { +sdst = sdst.rawData() + | 0x << bits(src1.rawData(), 22, 16); +} scc = sdst.rawData() ? 1 : 0; sdst.write(); @@ -30544,6 +30561,13 @@ if (wf->execMask(lane)) { vdst[lane] = (src0[lane] >> bits(src1[lane], 4, 0)) & ((1 << bits(src2[lane], 4, 0)) - 1); + +// Above extracted a signed int of size src2 bits which needs +// to be signed-extended. Check if the MSB of our src2-bit +// integer is 1, and sign extend it is. +if (vdst[lane] >> (bits(src2[lane], 4, 0) - 1)) { +vdst[lane] |= 0x << bits(src2[lane], 4, 0); +} } } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974 Gerrit-Change-Number: 66751 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
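The sign extension with a runtime width described in the BFE commit message can be sketched independently of gem5 as below. `bfeSigned32` is a hypothetical function, and its handling of a zero width or a width of 32 or more is an assumption of this sketch rather than documented hardware behavior.

```cpp
#include <cassert>
#include <cstdint>

// Extract `width` bits of `src` starting at bit `offset` and sign-extend
// the field to 32 bits. The result is returned as an unsigned register
// image. Edge cases (width == 0, width >= 32) are assumptions.
inline uint32_t bfeSigned32(uint32_t src, unsigned offset, unsigned width)
{
    offset &= 31u;                       // keep the shift amount in range
    uint32_t field = src >> offset;
    if (width == 0) {
        return 0;
    }
    if (width >= 32) {
        return field;                    // nothing left to mask or extend
    }
    const uint32_t mask = (1u << width) - 1u;
    field &= mask;
    // If the most significant bit of the field is one, OR ones into all
    // bits above the field.
    if (field & (1u << (width - 1))) {
        field |= ~mask;
    }
    return field;
}
```

For instance, extracting 4 bits at offset 4 from 0xF0 yields the field 0xF, whose top bit is set, so the result is 0xffffffff (that is, -1).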
[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Fix ABI init for DispatchId
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email ) Change subject: gpu-compute: Fix ABI init for DispatchId .. gpu-compute: Fix ABI init for DispatchId DispatchId should allocate two SGPRs instead of one. Allocating one was causing all subsequent SGPR index values to be off by one, leading to bad addresses for things like flat scratch and private segment. This field is not used very often so it was not impacting most applications. Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66711 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/wavefront.cc 1 file changed, 28 insertions(+), 2 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/wavefront.cc b/src/gpu-compute/wavefront.cc index 7e4b36f..8a1adfe 100644 --- a/src/gpu-compute/wavefront.cc +++ b/src/gpu-compute/wavefront.cc @@ -118,8 +118,10 @@ { int regInitIdx = 0; -// iterate over all the init fields and check which -// bits are enabled +// Iterate over all the init fields and check which +// bits are enabled. Useful information can be found here: +// https://github.com/ROCm-Developer-Tools/ROCm-ComputeABI-Doc/ +//blob/master/AMDGPU-ABI.md for (int en_bit = 0; en_bit < NumScalarInitFields; ++en_bit) { if (task->sgprBitEnabled(en_bit)) { @@ -263,6 +265,12 @@ computeUnit->cu_id, simdId, wfSlotId, wfDynId, physSgprIdx, task->dispatchId()); + +// Dispatch ID in gem5 is an int. Set upper 32-bits to zero. 
+physSgprIdx += computeUnit->registerManager->mapSgpr(this, regInitIdx); +computeUnit->srf[simdId]->write(physSgprIdx, 0); +++regInitIdx; break; case FlatScratchInit: physSgprIdx -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 Gerrit-Change-Number: 66711 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
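To see why giving DispatchId one SGPR instead of two shifts every later field, consider this small allocation sketch. It is not the gem5 wavefront code: `InitField` and `assignSgprs` are hypothetical, and the field names and sizes in the usage note are illustrative values rather than a full copy of the ABI table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one scalar init field.
struct InitField
{
    const char *name;
    int numSgprs;   // how many consecutive SGPRs the field occupies
    bool enabled;
};

// Assign consecutive SGPR indices to the enabled fields, in order.
// Returns the first index of each field, or -1 for disabled fields.
inline std::vector<int> assignSgprs(const std::vector<InitField> &fields)
{
    std::vector<int> first;
    int next = 0;
    for (const InitField &f : fields) {
        if (!f.enabled) {
            first.push_back(-1);
            continue;
        }
        first.push_back(next);
        next += f.numSgprs;
    }
    return first;
}
```

With three enabled two-register fields, the starting indices are 0, 2, 4. If the middle field is treated as one register, the third field starts at 3 instead of 4, which is the off-by-one described in the commit message.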
[gem5-dev] [S] Change in gem5/gem5[develop]: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email ) Change subject: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot .. dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot As of Linux 5.11, the MC146818 code was changed to avoid reading garbage data that may occur if there is a read while the registers are being updated: github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8 Previously toggling this bit was fine as Linux would check twice. It now checks before and after reading time information, causing it to retry infinitely until eventually Linux bootup fails due to watchdog timeout. This changeset always sets update in progress to false. Since this is a simulation, the updates probably will not be occurring at the same time a read is occurring. Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b --- M src/dev/mc146818.cc 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/src/dev/mc146818.cc b/src/dev/mc146818.cc index 919efb0..2bfe877 100644 --- a/src/dev/mc146818.cc +++ b/src/dev/mc146818.cc @@ -233,8 +233,9 @@ else { switch (addr) { case RTC_STAT_REGA: -// toggle UIP bit for linux -stat_regA.uip = !stat_regA.uip; +// Linux after v5.10 checks this multiple times so toggling +// leads to a deadlock on bootup. +stat_regA.uip = 0; return stat_regA; break; case RTC_STAT_REGB: -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b Gerrit-Change-Number: 66731 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
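The interaction between a toggling UIP bit and a guest that checks the bit before and after reading the time can be modeled with a few lines. This is a simplified model written for this note, not the Linux driver or the gem5 device: `Rtc`, `readUip`, and `readTimeSucceeds` are hypothetical, and the retry loop only approximates the behavior described in the commit message.

```cpp
#include <cassert>

// Toy RTC model: each read of status register A either toggles the
// update-in-progress (UIP) bit first (old behavior) or always reports
// it as clear (behavior after the patch).
struct Rtc
{
    bool toggle;
    bool uip;

    explicit Rtc(bool toggleOnRead) : toggle(toggleOnRead), uip(false) {}

    bool readUip()
    {
        uip = toggle ? !uip : false;
        return uip;
    }
};

// Simplified guest loop: retry whenever UIP is set either before or
// after the time registers are read. Gives up after maxAttempts.
inline bool readTimeSucceeds(Rtc &rtc, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (rtc.readUip()) {
            continue;   // update in progress before the read: retry
        }
        // ... the time registers would be read here ...
        if (rtc.readUip()) {
            continue;   // update started during the read: retry
        }
        return true;
    }
    return false;
}
```

In this model the toggling device alternates between set and clear on every read, so the two checks in one attempt can never both see a clear bit and the loop never succeeds, whereas the always-clear device succeeds on the first attempt.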
[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Fix ABI init for DispatchId
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email ) Change subject: gpu-compute: Fix ABI init for DispatchId .. gpu-compute: Fix ABI init for DispatchId DispatchId should allocate two SGPRs instead of one. Allocating one was causing all subsequent SGPR index values to be off by one, leading to bad addresses for things like flat scratch and private segment. This field is not used very often so it was not impacting most applications. Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 --- M src/gpu-compute/wavefront.cc 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/src/gpu-compute/wavefront.cc b/src/gpu-compute/wavefront.cc index 7e4b36f..8a1adfe 100644 --- a/src/gpu-compute/wavefront.cc +++ b/src/gpu-compute/wavefront.cc @@ -118,8 +118,10 @@ { int regInitIdx = 0; -// iterate over all the init fields and check which -// bits are enabled +// Iterate over all the init fields and check which +// bits are enabled. Useful information can be found here: +// https://github.com/ROCm-Developer-Tools/ROCm-ComputeABI-Doc/ +//blob/master/AMDGPU-ABI.md for (int en_bit = 0; en_bit < NumScalarInitFields; ++en_bit) { if (task->sgprBitEnabled(en_bit)) { @@ -263,6 +265,12 @@ computeUnit->cu_id, simdId, wfSlotId, wfDynId, physSgprIdx, task->dispatchId()); + +// Dispatch ID in gem5 is an int. Set upper 32-bits to zero. 
+physSgprIdx += computeUnit->registerManager->mapSgpr(this, regInitIdx); +computeUnit->srf[simdId]->write(physSgprIdx, 0); +++regInitIdx; break; case FlatScratchInit: physSgprIdx -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06 Gerrit-Change-Number: 66711 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Writeback RLC queue MQD when unmapped
rlc1.base(rb_base); +rlc1.base(mqd->rb_base << 8); +rlc1.size(rlc_size); rlc1.rptr(0); -rlc1.wptr(0); +rlc1.incRptr(mqd->rptr); +rlc1.setWptr(mqd->wptr); rlc1.rptrWbAddr(rptr_wb_addr); rlc1.processing(false); -rlc1.size(size); +rlc1.setMQD(mqd); +rlc1.setMQDAddr(mqdAddr); } else { panic("No free RLCs. Check they are properly unmapped."); } @@ -199,9 +209,37 @@ { DPRINTF(SDMAEngine, "Unregistering RLC queue at %#lx\n", doorbell); if (rlcInfo[0] == doorbell) { +SDMAQueueDesc *mqd = rlc0.getMQD(); +if (mqd) { +DPRINTF(SDMAEngine, "Writing RLC0 SDMAMQD back to %#lx\n", +rlc0.getMQDAddr()); + +mqd->rptr = rlc0.globalRptr(); +mqd->wptr = rlc0.getWptr(); + +auto cb = new DmaVirtCallback( +[ = ] (const uint32_t &) { }); +dmaWriteVirt(rlc0.getMQDAddr(), sizeof(SDMAQueueDesc), cb, mqd); +} else { +warn("RLC0 SDMAMQD address invalid\n"); +} rlc0.valid(false); rlcInfo[0] = 0; } else if (rlcInfo[1] == doorbell) { +SDMAQueueDesc *mqd = rlc1.getMQD(); +if (mqd) { +DPRINTF(SDMAEngine, "Writing RLC1 SDMAMQD back to %#lx\n", +rlc1.getMQDAddr()); + +mqd->rptr = rlc1.globalRptr(); +mqd->wptr = rlc1.getWptr(); + +auto cb = new DmaVirtCallback( +[ = ] (const uint32_t &) { }); +dmaWriteVirt(rlc1.getMQDAddr(), sizeof(SDMAQueueDesc), cb, mqd); +} else { +warn("RLC1 SDMAMQD address invalid\n"); +} rlc1.valid(false); rlcInfo[1] = 0; } else { @@ -213,7 +251,9 @@ SDMAEngine::deallocateRLCQueues() { for (auto doorbell: rlcInfo) { -unregisterRLCQueue(doorbell); +if (doorbell) { +unregisterRLCQueue(doorbell); +} } } diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 0bfee12..27c1691 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -34,6 +34,7 @@ #include "base/bitunion.hh" #include "dev/amdgpu/amdgpu_device.hh" +#include "dev/amdgpu/pm4_queues.hh" #include "dev/amdgpu/sdma_packets.hh" #include "dev/dma_virt_device.hh" #include "params/SDMAEngine.hh" @@ -65,9 +66,11 @@ SDMAQueue *_parent; SDMAQueue *_ib; SDMAType _type; +SDMAQueueDesc 
*_mqd; +Addr _mqd_addr = 0; public: SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false), -_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {} +_parent(nullptr), _ib(nullptr), _type(SDMAGfx), _mqd(nullptr) {} Addr base() { return _base; } Addr rptr() { return _base + _rptr; } @@ -82,6 +85,8 @@ SDMAQueue* parent() { return _parent; } SDMAQueue* ib() { return _ib; } SDMAType queueType() { return _type; } +SDMAQueueDesc* getMQD() { return _mqd; } +Addr getMQDAddr() { return _mqd_addr; } void base(Addr value) { _base = value; } @@ -114,6 +119,8 @@ void parent(SDMAQueue* q) { _parent = q; } void ib(SDMAQueue* ib) { _ib = ib; } void queueType(SDMAType type) { _type = type; } +void setMQD(SDMAQueueDesc *mqd) { _mqd = mqd; } +void setMQDAddr(Addr mqdAddr) { _mqd_addr = mqdAddr; } }; /* SDMA Engine ID */ @@ -280,8 +287,7 @@ /** * Methods for RLC queues */ -void registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size, - Addr rptr_wb_addr); +void registerRLCQueue(Addr doorbell, Addr mqdAddr, SDMAQueueDesc *mqd); void unregisterRLCQueue(Addr doorbell); void deallocateRLCQueues(); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65791?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f Gerrit-Change-Number: 65791 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
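The idea behind writing the MQD back on unmap is that the queue's read and write pointers survive an unmap and remap cycle. A minimal sketch of that save and restore pattern follows; `QueueDesc`, `Queue`, `mapQueue`, and `unmapQueue` are hypothetical types and functions, and the DMA write used by the patch is replaced here by a direct store into the descriptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily reduced queue descriptor (the real MQD has many
// more fields).
struct QueueDesc
{
    uint64_t rptr = 0;
    uint64_t wptr = 0;
};

// Hypothetical queue state held by the engine while the queue is mapped.
struct Queue
{
    uint64_t rptr = 0;
    uint64_t wptr = 0;
    QueueDesc *mqd = nullptr;
    bool valid = false;
};

// Map a queue: remember its descriptor and resume from the saved pointers.
inline void mapQueue(Queue &q, QueueDesc *mqd)
{
    q.mqd = mqd;
    q.rptr = mqd->rptr;
    q.wptr = mqd->wptr;
    q.valid = true;
}

// Unmap a queue: write the current pointers back if a descriptor is
// known. Returns false when there was nothing to write back.
inline bool unmapQueue(Queue &q)
{
    bool wroteBack = false;
    if (q.mqd) {
        q.mqd->rptr = q.rptr;
        q.mqd->wptr = q.wptr;
        wroteBack = true;
    }
    q.valid = false;
    return wroteBack;
}
```

A queue that advanced its read pointer before being unmapped resumes from that pointer when it is mapped again, instead of restarting from zero.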
[gem5-dev] [S] Change in gem5/gem5[develop]: configs: Set CPU vendor to M5 Simulator in apu_se.py
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email ) Change subject: configs: Set CPU vendor to M5 Simulator in apu_se.py .. configs: Set CPU vendor to M5 Simulator in apu_se.py Other vendor strings cause, for some reason, bad addresses to be computed when running the GPU model. This change reverts back to M5 Simulator only for apu_se.py. Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65991 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Reviewed-by: Jason Lowe-Power --- M configs/example/apu_se.py 1 file changed, 23 insertions(+), 0 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved Jason Lowe-Power: Looks good to me, approved diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 39def02..8e8bc60 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -757,6 +757,11 @@ (cpu_list[i], future_cpu_list[i]) for i in range(args.num_cpus) ] +# Other CPU strings cause bad addresses in ROCm. Revert back to M5 Simulator. +for (i, cpu) in enumerate(cpu_list): +for j in range(len(cpu)): +cpu.isa[j].vendor_string = "M5 Simulator" + # Full list of processing cores in the system. 
cpu_list = cpu_list + [shader] + cp_list -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845 Gerrit-Change-Number: 65991 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: configs: Set CPU vendor to M5 Simulator in apu_se.py
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email ) Change subject: configs: Set CPU vendor to M5 Simulator in apu_se.py .. configs: Set CPU vendor to M5 Simulator in apu_se.py Other vendor strings cause, for some reason, bad addresses to be computed when running the GPU model. This change reverts back to M5 Simulator only for apu_se.py. Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845 --- M configs/example/apu_se.py 1 file changed, 18 insertions(+), 0 deletions(-) diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 39def02..8e8bc60 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -757,6 +757,11 @@ (cpu_list[i], future_cpu_list[i]) for i in range(args.num_cpus) ] +# Other CPU strings cause bad addresses in ROCm. Revert back to M5 Simulator. +for (i, cpu) in enumerate(cpu_list): +for j in range(len(cpu)): +cpu.isa[j].vendor_string = "M5 Simulator" + # Full list of processing cores in the system. cpu_list = cpu_list + [shader] + cp_list -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845 Gerrit-Change-Number: 65991 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Writeback RLC queue MQD when unmapped
ptr_wb_addr); rlc1.processing(false); -rlc1.size(size); +rlc1.setMQD(mqd); +rlc1.setMQDAddr(mqdAddr); } else { panic("No free RLCs. Check they are properly unmapped."); } @@ -199,9 +209,37 @@ { DPRINTF(SDMAEngine, "Unregistering RLC queue at %#lx\n", doorbell); if (rlcInfo[0] == doorbell) { +SDMAQueueDesc *mqd = rlc0.getMQD(); +if (mqd) { +DPRINTF(SDMAEngine, "Writing RLC0 SDMAMQD back to %#lx\n", +rlc0.getMQDAddr()); + +mqd->rptr = rlc0.globalRptr(); +mqd->wptr = rlc0.getWptr(); + +auto cb = new DmaVirtCallback( +[ = ] (const uint32_t &) { }); +dmaWriteVirt(rlc0.getMQDAddr(), sizeof(SDMAQueueDesc), cb, mqd); +} else { +warn("RLC0 SDMAMQD address invalid\n"); +} rlc0.valid(false); rlcInfo[0] = 0; } else if (rlcInfo[1] == doorbell) { +SDMAQueueDesc *mqd = rlc1.getMQD(); +if (mqd) { +DPRINTF(SDMAEngine, "Writing RLC1 SDMAMQD back to %#lx\n", +rlc1.getMQDAddr()); + +mqd->rptr = rlc1.globalRptr(); +mqd->wptr = rlc1.getWptr(); + +auto cb = new DmaVirtCallback( +[ = ] (const uint32_t &) { }); +dmaWriteVirt(rlc1.getMQDAddr(), sizeof(SDMAQueueDesc), cb, mqd); +} else { +warn("RLC1 SDMAMQD address invalid\n"); +} rlc1.valid(false); rlcInfo[1] = 0; } else { @@ -213,7 +251,9 @@ SDMAEngine::deallocateRLCQueues() { for (auto doorbell: rlcInfo) { -unregisterRLCQueue(doorbell); +if (doorbell) { +unregisterRLCQueue(doorbell); +} } } diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 0bfee12..27c1691 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -34,6 +34,7 @@ #include "base/bitunion.hh" #include "dev/amdgpu/amdgpu_device.hh" +#include "dev/amdgpu/pm4_queues.hh" #include "dev/amdgpu/sdma_packets.hh" #include "dev/dma_virt_device.hh" #include "params/SDMAEngine.hh" @@ -65,9 +66,11 @@ SDMAQueue *_parent; SDMAQueue *_ib; SDMAType _type; +SDMAQueueDesc *_mqd; +Addr _mqd_addr = 0; public: SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false), -_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {} 
+_parent(nullptr), _ib(nullptr), _type(SDMAGfx), _mqd(nullptr) {} Addr base() { return _base; } Addr rptr() { return _base + _rptr; } @@ -82,6 +85,8 @@ SDMAQueue* parent() { return _parent; } SDMAQueue* ib() { return _ib; } SDMAType queueType() { return _type; } +SDMAQueueDesc* getMQD() { return _mqd; } +Addr getMQDAddr() { return _mqd_addr; } void base(Addr value) { _base = value; } @@ -114,6 +119,8 @@ void parent(SDMAQueue* q) { _parent = q; } void ib(SDMAQueue* ib) { _ib = ib; } void queueType(SDMAType type) { _type = type; } +void setMQD(SDMAQueueDesc *mqd) { _mqd = mqd; } +void setMQDAddr(Addr mqdAddr) { _mqd_addr = mqdAddr; } }; /* SDMA Engine ID */ @@ -280,8 +287,7 @@ /** * Methods for RLC queues */ -void registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size, - Addr rptr_wb_addr); +void registerRLCQueue(Addr doorbell, Addr mqdAddr, SDMAQueueDesc *mqd); void unregisterRLCQueue(Addr doorbell); void deallocateRLCQueues(); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65791?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f Gerrit-Change-Number: 65791 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Store SDMA queue type, use for ring ID
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email ) Change subject: dev-amdgpu: Store SDMA queue type, use for ring ID .. dev-amdgpu: Store SDMA queue type, use for ring ID Currently the SDMA queue type is guessed in the trap method by looking at which queue in the engine is processing packets. It is possible for both queues to be processing (e.g., one queue sent a DMA and is waiting then switch to another queue), triggering an assert. Instead store the queue type in the queue itself and use that type in trap to determine which ring ID to use for the interrupt packet. Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65691 Maintainer: Jason Lowe-Power Reviewed-by: Jason Lowe-Power Tested-by: kokoro --- M src/dev/amdgpu/sdma_engine.cc M src/dev/amdgpu/sdma_engine.hh 2 files changed, 30 insertions(+), 6 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc index 59c5027..02203c8 100644 --- a/src/dev/amdgpu/sdma_engine.cc +++ b/src/dev/amdgpu/sdma_engine.cc @@ -55,11 +55,15 @@ gfxIb.parent(&gfx); gfx.valid(true); gfxIb.valid(true); +gfx.queueType(SDMAGfx); +gfxIb.queueType(SDMAGfx); page.ib(&pageIb); pageIb.parent(&page); page.valid(true); pageIb.valid(true); +page.queueType(SDMAPage); +pageIb.queueType(SDMAPage); rlc0.ib(&rlc0Ib); rlc0Ib.parent(&rlc0); @@ -727,11 +731,7 @@ DPRINTF(SDMAEngine, "Trap contextId: %p\n", pkt->intrContext); -uint32_t ring_id = 0; -assert(page.processing() ^ gfx.processing()); -if (page.processing()) { -ring_id = 3; -} +uint32_t ring_id = (q->queueType() == SDMAPage) ? 
3 : 0; gpuDevice->getIH()->prepareInterruptCookie(pkt->intrContext, ring_id, getIHClientId(), TRAP_ID); diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index d0afaf7..0bfee12 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -64,9 +64,10 @@ bool _processing; SDMAQueue *_parent; SDMAQueue *_ib; +SDMAType _type; public: SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false), -_parent(nullptr), _ib(nullptr) {} +_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {} Addr base() { return _base; } Addr rptr() { return _base + _rptr; } @@ -80,6 +81,7 @@ bool processing() { return _processing; } SDMAQueue* parent() { return _parent; } SDMAQueue* ib() { return _ib; } +SDMAType queueType() { return _type; } void base(Addr value) { _base = value; } @@ -111,6 +113,7 @@ void processing(bool value) { _processing = value; } void parent(SDMAQueue* q) { _parent = q; } void ib(SDMAQueue* ib) { _ib = ib; } +void queueType(SDMAType type) { _type = type; } }; /* SDMA Engine ID */ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84 Gerrit-Change-Number: 65691 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
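For readers following the change above, here is a minimal standalone sketch of the idea, not the gem5 code itself. The `Queue` class and the helper names `ringIdFor` and `oldPreconditionHolds` are hypothetical stand-ins; only the enum values `SDMAGfx` and `SDMAPage`, the ring IDs 0 and 3, and the removed XOR assert come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Enum names taken from the patch; everything else here is illustrative.
enum SDMAType { SDMAGfx, SDMAPage };

// Hypothetical, cut-down stand-in for SDMAQueue.
class Queue
{
    bool _processing = false;
    SDMAType _type = SDMAGfx;   // same default as the patched constructor

  public:
    bool processing() const { return _processing; }
    void processing(bool value) { _processing = value; }
    SDMAType queueType() const { return _type; }
    void queueType(SDMAType type) { _type = type; }
};

// Patched behaviour: the ring ID depends only on the type stored in the
// queue that raised the trap.
inline uint32_t
ringIdFor(const Queue &q)
{
    return (q.queueType() == SDMAPage) ? 3 : 0;
}

// Precondition of the removed code: exactly one of the two queues is
// processing. It fails when both are processing at once.
inline bool
oldPreconditionHolds(const Queue &gfx, const Queue &page)
{
    return page.processing() != gfx.processing();
}
```

With both queues marked as processing, `oldPreconditionHolds` returns false (the case that tripped the assert), while `ringIdFor` still gives 0 for the gfx queue and 3 for the page queue.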
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Store SDMA queue type, use for ring ID
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email ) Change subject: dev-amdgpu: Store SDMA queue type, use for ring ID .. dev-amdgpu: Store SDMA queue type, use for ring ID Currently the SDMA queue type is guessed in the trap method by looking at which queue in the engine is processing packets. It is possible for both queues to be processing (e.g., one queue sent a DMA and is waiting then switch to another queue), triggering an assert. Instead store the queue type in the queue itself and use that type in trap to determine which ring ID to use for the interrupt packet. Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84 --- M src/dev/amdgpu/sdma_engine.cc M src/dev/amdgpu/sdma_engine.hh 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc index 59c5027..02203c8 100644 --- a/src/dev/amdgpu/sdma_engine.cc +++ b/src/dev/amdgpu/sdma_engine.cc @@ -55,11 +55,15 @@ gfxIb.parent(&gfx); gfx.valid(true); gfxIb.valid(true); +gfx.queueType(SDMAGfx); +gfxIb.queueType(SDMAGfx); page.ib(&pageIb); pageIb.parent(&page); page.valid(true); pageIb.valid(true); +page.queueType(SDMAPage); +pageIb.queueType(SDMAPage); rlc0.ib(&rlc0Ib); rlc0Ib.parent(&rlc0); @@ -727,11 +731,7 @@ DPRINTF(SDMAEngine, "Trap contextId: %p\n", pkt->intrContext); -uint32_t ring_id = 0; -assert(page.processing() ^ gfx.processing()); -if (page.processing()) { -ring_id = 3; -} +uint32_t ring_id = (q->queueType() == SDMAPage) ? 
3 : 0; gpuDevice->getIH()->prepareInterruptCookie(pkt->intrContext, ring_id, getIHClientId(), TRAP_ID); diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index d0afaf7..0bfee12 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -64,9 +64,10 @@ bool _processing; SDMAQueue *_parent; SDMAQueue *_ib; +SDMAType _type; public: SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false), -_parent(nullptr), _ib(nullptr) {} +_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {} Addr base() { return _base; } Addr rptr() { return _base + _rptr; } @@ -80,6 +81,7 @@ bool processing() { return _processing; } SDMAQueue* parent() { return _parent; } SDMAQueue* ib() { return _ib; } +SDMAType queueType() { return _type; } void base(Addr value) { _base = value; } @@ -111,6 +113,7 @@ void processing(bool value) { _processing = value; } void parent(SDMAQueue* q) { _parent = q; } void ib(SDMAQueue* ib) { _ib = ib; } +void queueType(SDMAType type) { _type = type; } }; /* SDMA Engine ID */ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84 Gerrit-Change-Number: 65691 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix SOPK instruction sign extends
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email ) Change subject: arch-vega: Fix SOPK instruction sign extends .. arch-vega: Fix SOPK instruction sign extends See: https://gem5-review.googlesource.com/c/public/gem5/+/37495 Same patch but for vega. This fixes issues with lulesh and probably rodinia - heartwall as well in fullsystem. Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65432 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 28 insertions(+), 10 deletions(-) Approvals: kokoro: Regressions pass Matt Sinclair: Looks good to me, approved; Looks good to me, approved diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 76bb8aa..f5b08b7 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -1553,7 +1553,7 @@ void Inst_SOPK__S_MOVK_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ScalarOperandI32 sdst(gpuDynInst, instData.SDST); sdst = simm16; @@ -1579,7 +1579,7 @@ void Inst_SOPK__S_CMOVK_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ScalarOperandI32 sdst(gpuDynInst, instData.SDST); ConstScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1607,7 +1607,7 @@ void Inst_SOPK__S_CMPK_EQ_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1634,7 +1634,7 @@ void Inst_SOPK__S_CMPK_LG_I32::execute(GPUDynInstPtr gpuDynInst) { 
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1661,7 +1661,7 @@ void Inst_SOPK__S_CMPK_GT_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1688,7 +1688,7 @@ void Inst_SOPK__S_CMPK_GE_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1715,7 +1715,7 @@ void Inst_SOPK__S_CMPK_LT_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1742,7 +1742,7 @@ void Inst_SOPK__S_CMPK_LE_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1939,7 +1939,7 @@ src.read(); -sdst = src.rawData() + (ScalarRegI32)simm16; +sdst = src.rawData() + (ScalarRegI32)sext<16>(simm16); scc = (bits(src.rawData(), 31) == bits(simm16, 15) && bits(src.rawData(), 31) != bits(sdst.rawData(), 31)) ? 
1 : 0; @@ -1969,7 +1969,7 @@ src.read(); -sdst = src.rawData() * (ScalarRegI32)simm16; +sdst = src.rawData() * (ScalarRegI32)sext<16>(simm16); sdst.write(); } // execute -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d Gerrit-Change-Number: 65432 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Revie
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Handle ring buffer wrap for PM4 queue
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email ) Change subject: dev-amdgpu: Handle ring buffer wrap for PM4 queue .. dev-amdgpu: Handle ring buffer wrap for PM4 queue Change-Id: I27bc274327838add709423b072d437c4e727a714 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65431 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matt Sinclair --- M src/dev/amdgpu/pm4_mmio.hh M src/dev/amdgpu/pm4_packet_processor.cc M src/dev/amdgpu/pm4_packet_processor.hh M src/dev/amdgpu/pm4_queues.hh 4 files changed, 31 insertions(+), 4 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/amdgpu/pm4_mmio.hh b/src/dev/amdgpu/pm4_mmio.hh index a3ce5f1..3801223 100644 --- a/src/dev/amdgpu/pm4_mmio.hh +++ b/src/dev/amdgpu/pm4_mmio.hh @@ -60,6 +60,7 @@ #define mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI 0x1251 #define mmCP_HQD_PQ_WPTR_POLL_ADDR 0x1252 #define mmCP_HQD_PQ_WPTR_POLL_ADDR_HI 0x1253 +#define mmCP_HQD_PQ_CONTROL 0x1256 #define mmCP_HQD_IB_CONTROL 0x125a #define mmCP_HQD_PQ_WPTR_LO 0x127b #define mmCP_HQD_PQ_WPTR_HI 0x127c diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc index 4f98f18..f78f833 100644 --- a/src/dev/amdgpu/pm4_packet_processor.cc +++ b/src/dev/amdgpu/pm4_packet_processor.cc @@ -147,8 +147,8 @@ gpuDevice->setDoorbellType(offset, qt); DPRINTF(PM4PacketProcessor, "New PM4 queue %d, base: %p offset: %p, me: " -"%d, pipe %d queue: %d\n", id, q->base(), q->offset(), q->me(), -q->pipe(), q->queue()); +"%d, pipe %d queue: %d size: %d\n", id, q->base(), q->offset(), +q->me(), q->pipe(), q->queue(), q->size()); } void @@ -790,6 +790,9 @@ case mmCP_HQD_PQ_WPTR_POLL_ADDR_HI: setHqdPqWptrPollAddrHi(pkt->getLE<uint32_t>()); break; + case mmCP_HQD_PQ_CONTROL: +setHqdPqControl(pkt->getLE<uint32_t>()); +break; case mmCP_HQD_IB_CONTROL: setHqdIbCtrl(pkt->getLE<uint32_t>()); break; @@ -912,6 +915,12 @@ }
void +PM4PacketProcessor::setHqdPqControl(uint32_t data) +{ +kiq.hqd_pq_control = data; +} + +void PM4PacketProcessor::setHqdIbCtrl(uint32_t data) { kiq.hqd_ib_control = data; diff --git a/src/dev/amdgpu/pm4_packet_processor.hh b/src/dev/amdgpu/pm4_packet_processor.hh index 4806671..4617a21 100644 --- a/src/dev/amdgpu/pm4_packet_processor.hh +++ b/src/dev/amdgpu/pm4_packet_processor.hh @@ -171,6 +171,7 @@ void setHqdPqRptrReportAddrHi(uint32_t data); void setHqdPqWptrPollAddr(uint32_t data); void setHqdPqWptrPollAddrHi(uint32_t data); +void setHqdPqControl(uint32_t data); void setHqdIbCtrl(uint32_t data); void setRbVmid(uint32_t data); void setRbCntl(uint32_t data); diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh index 19973b1..8b6626d 100644 --- a/src/dev/amdgpu/pm4_queues.hh +++ b/src/dev/amdgpu/pm4_queues.hh @@ -396,14 +396,14 @@ rptr() { if (ib()) return q->ibBase + q->ibRptr; -else return q->base + q->rptr; +else return q->base + (q->rptr % size()); } Addr wptr() { if (ib()) return q->ibBase + _ibWptr; -else return q->base + _wptr; +else return q->base + (_wptr % size()); } Addr @@ -470,6 +470,9 @@ uint32_t pipe() { return _pkt.pipe; } uint32_t queue() { return _pkt.queueSlot; } bool privileged() { return _pkt.queueSel == 0 ? 1 : 0; } + +// Same computation as processMQD. See comment there for details. 
+uint64_t size() { return 4UL << ((q->hqd_pq_control & 0x3f) + 1); } }; } // namespace gem5 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I27bc274327838add709423b072d437c4e727a714 Gerrit-Change-Number: 65431 Gerrit-PatchSet: 2 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
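The wrap handling in this change can be illustrated with a small standalone sketch. The function names `pqSizeFromControl` and `wrappedAddr` are hypothetical; the size expression mirrors the one added to pm4_queues.hh, except that `4ULL` is used instead of `4UL` so the result is 64 bits wide on every platform. The units of the pointers are assumed to match those of the computed size, as in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Queue size derived from the low 6 bits of hqd_pq_control, following the
// expression in the patch. This sketch assumes the encoded field is small
// enough that the shift amount stays below 64.
inline uint64_t
pqSizeFromControl(uint32_t hqd_pq_control)
{
    return 4ULL << ((hqd_pq_control & 0x3f) + 1);
}

// Address of a read or write pointer after wrapping it into the ring.
// Before the patch the pointer was added to the base without the modulo.
inline uint64_t
wrappedAddr(uint64_t base, uint64_t ptr, uint32_t hqd_pq_control)
{
    return base + (ptr % pqSizeFromControl(hqd_pq_control));
}
```

For example, a control field of 9 gives a size of 4096, so a pointer value of 4096 + 16 on a ring based at 0x1000 resolves to 0x1010 rather than running past the end of the buffer.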
[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Fix SDMA ring buffer wrap around
operly unmapped."); } @@ -291,6 +292,17 @@ { decodeHeader(q, header); }); dmaReadVirt(q->rptr(), sizeof(uint32_t), cb, &cb->dmaBuffer); } else { +// The driver expects the rptr to be written back to host memory +// periodically. In simulation, we writeback rptr after each burst of +// packets from a doorbell, rather than using the cycle count which +// is not accurate in all simulation settings (e.g., KVM). +DPRINTF(SDMAEngine, "Writing rptr %#lx back to host addr %#lx\n", +q->globalRptr(), q->rptrWbAddr()); +if (q->rptrWbAddr()) { +auto cb = new DmaVirtCallback<uint64_t>( +[ = ](const uint64_t &) { }, q->globalRptr()); +dmaWriteVirt(q->rptrWbAddr(), sizeof(Addr), cb, &cb->dmaBuffer); +} q->processing(false); if (q->parent()) { DPRINTF(SDMAEngine, "SDMA switching queues\n"); @@ -1158,6 +1170,7 @@ { gfxRptr = insertBits(gfxRptr, 31, 0, 0); gfxRptr |= data; +gfx.rptrWbAddr(getGARTAddr(gfxRptr)); } void @@ -1165,6 +1178,7 @@ { gfxRptr = insertBits(gfxRptr, 63, 32, 0); gfxRptr |= ((uint64_t)data) << 32; +gfx.rptrWbAddr(getGARTAddr(gfxRptr)); } void @@ -1236,6 +1250,7 @@ { pageRptr = insertBits(pageRptr, 31, 0, 0); pageRptr |= data; +page.rptrWbAddr(getGARTAddr(pageRptr)); } void @@ -1243,6 +1258,7 @@ { pageRptr = insertBits(pageRptr, 63, 32, 0); pageRptr |= ((uint64_t)data) << 32; +page.rptrWbAddr(getGARTAddr(pageRptr)); } void diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh index 6fe7a8e..d0afaf7 100644 --- a/src/dev/amdgpu/sdma_engine.hh +++ b/src/dev/amdgpu/sdma_engine.hh @@ -58,6 +58,8 @@ Addr _rptr; Addr _wptr; Addr _size; +Addr _rptr_wb_addr = 0; +Addr _global_rptr = 0; bool _valid; bool _processing; SDMAQueue *_parent; @@ -72,6 +74,8 @@ Addr wptr() { return _base + _wptr; } Addr getWptr() { return _wptr; } Addr size() { return _size; } +Addr rptrWbAddr() { return _rptr_wb_addr; } +Addr globalRptr() { return _global_rptr; } bool valid() { return _valid; } bool processing() { return _processing; } SDMAQueue* parent() { return _parent; } @@ -82,22
+86,27 @@ void incRptr(uint32_t value) { -//assert((_rptr + value) <= (_size << 1)); _rptr = (_rptr + value) % _size; +_global_rptr += value; } -void rptr(Addr value) { _rptr = value; } +void +rptr(Addr value) +{ +_rptr = value; +_global_rptr = value; +} void setWptr(Addr value) { -//assert(value <= (_size << 1)); _wptr = value % _size; } void wptr(Addr value) { _wptr = value; } void size(Addr value) { _size = value; } +void rptrWbAddr(Addr value) { _rptr_wb_addr = value; } void valid(bool v) { _valid = v; } void processing(bool value) { _processing = value; } void parent(SDMAQueue* q) { _parent = q; } @@ -268,7 +277,8 @@ /** * Methods for RLC queues */ -void registerRLCQueue(Addr doorbell, Addr rb_base); +void registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size, + Addr rptr_wb_addr); void unregisterRLCQueue(Addr doorbell); void deallocateRLCQueues(); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65351?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069 Gerrit-Change-Number: 65351 Gerrit-PatchSet: 4 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Alexandru Duțu (Alex) Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
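The two read pointers kept by the queue in this diff can be sketched in isolation as follows. `RingQueue` is a hypothetical, cut-down stand-in for `SDMAQueue`; the member names and the `incRptr` arithmetic follow the patch, while the constructor is an assumption added so the example is self-contained.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;

// Hypothetical reduced queue: one read pointer wraps inside the ring, the
// other keeps counting and is the value reported back to the host.
class RingQueue
{
    Addr _base;
    Addr _size;
    Addr _rptr = 0;          // offset into the ring, always < _size
    Addr _global_rptr = 0;   // running total, never wrapped

  public:
    RingQueue(Addr base, Addr size) : _base(base), _size(size) {}

    Addr rptr() const { return _base + _rptr; }
    Addr globalRptr() const { return _global_rptr; }

    void
    incRptr(uint32_t value)
    {
        _rptr = (_rptr + value) % _size;
        _global_rptr += value;
    }
};
```

With a 256-byte ring at 0x1000, advancing by 200 and then by 100 leaves the wrapped pointer at 0x1000 + 44 while the global pointer reads 300.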
[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Handle ring buffer wrap for PM4 queue
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email ) Change subject: dev-amdgpu: Handle ring buffer wrap for PM4 queue .. dev-amdgpu: Handle ring buffer wrap for PM4 queue Change-Id: I27bc274327838add709423b072d437c4e727a714 --- M src/dev/amdgpu/pm4_mmio.hh M src/dev/amdgpu/pm4_packet_processor.cc M src/dev/amdgpu/pm4_packet_processor.hh M src/dev/amdgpu/pm4_queues.hh 4 files changed, 27 insertions(+), 4 deletions(-) diff --git a/src/dev/amdgpu/pm4_mmio.hh b/src/dev/amdgpu/pm4_mmio.hh index a3ce5f1..3801223 100644 --- a/src/dev/amdgpu/pm4_mmio.hh +++ b/src/dev/amdgpu/pm4_mmio.hh @@ -60,6 +60,7 @@ #define mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI 0x1251 #define mmCP_HQD_PQ_WPTR_POLL_ADDR 0x1252 #define mmCP_HQD_PQ_WPTR_POLL_ADDR_HI 0x1253 +#define mmCP_HQD_PQ_CONTROL 0x1256 #define mmCP_HQD_IB_CONTROL 0x125a #define mmCP_HQD_PQ_WPTR_LO 0x127b #define mmCP_HQD_PQ_WPTR_HI 0x127c diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc index 4f98f18..f78f833 100644 --- a/src/dev/amdgpu/pm4_packet_processor.cc +++ b/src/dev/amdgpu/pm4_packet_processor.cc @@ -147,8 +147,8 @@ gpuDevice->setDoorbellType(offset, qt); DPRINTF(PM4PacketProcessor, "New PM4 queue %d, base: %p offset: %p, me: " -"%d, pipe %d queue: %d\n", id, q->base(), q->offset(), q->me(), -q->pipe(), q->queue()); +"%d, pipe %d queue: %d size: %d\n", id, q->base(), q->offset(), +q->me(), q->pipe(), q->queue(), q->size()); } void @@ -790,6 +790,9 @@ case mmCP_HQD_PQ_WPTR_POLL_ADDR_HI: setHqdPqWptrPollAddrHi(pkt->getLE<uint32_t>()); break; + case mmCP_HQD_PQ_CONTROL: +setHqdPqControl(pkt->getLE<uint32_t>()); +break; case mmCP_HQD_IB_CONTROL: setHqdIbCtrl(pkt->getLE<uint32_t>()); break; @@ -912,6 +915,12 @@ } void +PM4PacketProcessor::setHqdPqControl(uint32_t data) +{ +kiq.hqd_pq_control = data; +} + +void PM4PacketProcessor::setHqdIbCtrl(uint32_t data) { kiq.hqd_ib_control = data; diff --git a/src/dev/amdgpu/pm4_packet_processor.hh
b/src/dev/amdgpu/pm4_packet_processor.hh index 4806671..4617a21 100644 --- a/src/dev/amdgpu/pm4_packet_processor.hh +++ b/src/dev/amdgpu/pm4_packet_processor.hh @@ -171,6 +171,7 @@ void setHqdPqRptrReportAddrHi(uint32_t data); void setHqdPqWptrPollAddr(uint32_t data); void setHqdPqWptrPollAddrHi(uint32_t data); +void setHqdPqControl(uint32_t data); void setHqdIbCtrl(uint32_t data); void setRbVmid(uint32_t data); void setRbCntl(uint32_t data); diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh index 19973b1..8b6626d 100644 --- a/src/dev/amdgpu/pm4_queues.hh +++ b/src/dev/amdgpu/pm4_queues.hh @@ -396,14 +396,14 @@ rptr() { if (ib()) return q->ibBase + q->ibRptr; -else return q->base + q->rptr; +else return q->base + (q->rptr % size()); } Addr wptr() { if (ib()) return q->ibBase + _ibWptr; -else return q->base + _wptr; +else return q->base + (_wptr % size()); } Addr @@ -470,6 +470,9 @@ uint32_t pipe() { return _pkt.pipe; } uint32_t queue() { return _pkt.queueSlot; } bool privileged() { return _pkt.queueSel == 0 ? 1 : 0; } + +// Same computation as processMQD. See comment there for details. +uint64_t size() { return 4UL << ((q->hqd_pq_control & 0x3f) + 1); } }; } // namespace gem5 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I27bc274327838add709423b072d437c4e727a714 Gerrit-Change-Number: 65431 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix SOPK instruction sign extends
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email ) Change subject: arch-vega: Fix SOPK instruction sign extends .. arch-vega: Fix SOPK instruction sign extends See: https://gem5-review.googlesource.com/c/public/gem5/+/37495 Same patch but for vega. This fixes issues with lulesh and probably rodinia - heartwall as well in fullsystem. Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 76bb8aa..f5b08b7 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -1553,7 +1553,7 @@ void Inst_SOPK__S_MOVK_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ScalarOperandI32 sdst(gpuDynInst, instData.SDST); sdst = simm16; @@ -1579,7 +1579,7 @@ void Inst_SOPK__S_CMOVK_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ScalarOperandI32 sdst(gpuDynInst, instData.SDST); ConstScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1607,7 +1607,7 @@ void Inst_SOPK__S_CMPK_EQ_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1634,7 +1634,7 @@ void Inst_SOPK__S_CMPK_LG_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1661,7 +1661,7 @@ void 
Inst_SOPK__S_CMPK_GT_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1688,7 +1688,7 @@ void Inst_SOPK__S_CMPK_GE_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1715,7 +1715,7 @@ void Inst_SOPK__S_CMPK_LT_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1742,7 +1742,7 @@ void Inst_SOPK__S_CMPK_LE_I32::execute(GPUDynInstPtr gpuDynInst) { -ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16; +ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16); ConstScalarOperandI32 src(gpuDynInst, instData.SDST); ScalarOperandU32 scc(gpuDynInst, REG_SCC); @@ -1939,7 +1939,7 @@ src.read(); -sdst = src.rawData() + (ScalarRegI32)simm16; +sdst = src.rawData() + (ScalarRegI32)sext<16>(simm16); scc = (bits(src.rawData(), 31) == bits(simm16, 15) && bits(src.rawData(), 31) != bits(sdst.rawData(), 31)) ? 
1 : 0; @@ -1969,7 +1969,7 @@ src.read(); -sdst = src.rawData() * (ScalarRegI32)simm16; +sdst = src.rawData() * (ScalarRegI32)sext<16>(simm16); sdst.write(); } // execute -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d Gerrit-Change-Number: 65432 Gerrit-PatchSet: 1 Gerrit-Owner: Matthew Poremba Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
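To show why the added sign extension matters, here is a standalone sketch. `signExtend` is a hypothetical helper written for this example and is only assumed to behave like the `sext<16>` call used in the patch; `movkWithoutSext` and `movkWithSext` are likewise illustrative names modelling the old and new conversions of a 16-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low N bits of val to a 64-bit signed value.
// Illustrative helper; gem5 provides its own sext<N>.
template <int N>
inline int64_t
signExtend(uint64_t val)
{
    static_assert(N > 0 && N < 64, "N must be in 1..63");
    const uint64_t sign = 1ULL << (N - 1);
    const uint64_t mask = (1ULL << N) - 1;
    val &= mask;
    // Flip the sign bit, then subtract it: values with the sign bit set
    // end up negative, all others are unchanged.
    return static_cast<int64_t>((val ^ sign) - sign);
}

// Old conversion: an unsigned 16-bit field cast straight to 32 bits is
// zero-extended, so 0xFFFF becomes 65535.
inline int32_t
movkWithoutSext(uint16_t simm16)
{
    return static_cast<int32_t>(simm16);
}

// New conversion: sign-extend from bit 15 first, so 0xFFFF becomes -1.
inline int32_t
movkWithSext(uint16_t simm16)
{
    return static_cast<int32_t>(signExtend<16>(simm16));
}
```

An immediate of 0xFFFF therefore moves 65535 under the old cast and -1 with sign extension, which changes the outcome of the signed compare, add, and multiply instructions touched by this patch.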
[gem5-dev] [M] Change in gem5/gem5[develop]: gpu-compute: Chunkify AMDKernelCode read from device
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65251?usp=email ) Change subject: gpu-compute: Chunkify AMDKernelCode read from device .. gpu-compute: Chunkify AMDKernelCode read from device The AMDKernelCode object can potentially span two pages. Currently the copy loop from device memory only translates once at the base address. This changeset translates one cache line at a time before copying and has the ancillary benefit of cleaning up this code a bit. Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65251 Reviewed-by: Matt Sinclair Tested-by: kokoro Reviewed-by: Jason Lowe-Power Maintainer: Jason Lowe-Power --- M src/gpu-compute/gpu_command_processor.cc 1 file changed, 42 insertions(+), 9 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass Matt Sinclair: Looks good to me, approved diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc index d46ace6..af59b78 100644 --- a/src/gpu-compute/gpu_command_processor.cc +++ b/src/gpu-compute/gpu_command_processor.cc @@ -118,6 +118,7 @@ { static int dynamic_task_id = 0; _hsa_dispatch_packet_t *disp_pkt = (_hsa_dispatch_packet_t*)raw_pkt; +assert(!(disp_pkt->kernel_object & (system()->cacheLineSize() - 1))); /** * we need to read a pointer in the application's address @@ -150,6 +151,10 @@ is_system_page); } +DPRINTF(GPUCommandProc, "kernobj vaddr %#lx paddr %#lx size %d s:%d\n", +disp_pkt->kernel_object, phys_addr, sizeof(AMDKernelCode), +is_system_page); + /** * The kernel_object is a pointer to the machine code, whose entry * point is an 'amd_kernel_code_t' type, which is included in the @@ -167,20 +172,27 @@ } else { assert(FullSystem); DPRINTF(GPUCommandProc, "kernel_object in device, using device mem\n"); -// Read from GPU memory manager -uint8_t
raw_akc[sizeof(AMDKernelCode)]; -for (int i = 0; i < sizeof(AMDKernelCode) / sizeof(uint8_t); ++i) { -Addr mmhubAddr = phys_addr + i*sizeof(uint8_t); + +// Read from GPU memory manager one cache line at a time to prevent +// rare cases where the AKC spans two memory pages. +ChunkGenerator gen(disp_pkt->kernel_object, sizeof(AMDKernelCode), + system()->cacheLineSize()); +for (; !gen.done(); gen.next()) { +Addr chunk_addr = gen.addr(); +int vmid = 1; +unsigned dummy; + walker->startFunctional(gpuDevice->getVM().getPageTableBase(vmid), +chunk_addr, dummy, BaseMMU::Mode::Read, +is_system_page); + Request::Flags flags = Request::PHYSICAL; -RequestPtr request = std::make_shared<Request>( -mmhubAddr, sizeof(uint8_t), flags, walker->getDevRequestor()); +RequestPtr request = std::make_shared<Request>(chunk_addr, +system()->cacheLineSize(), flags, walker->getDevRequestor()); Packet *readPkt = new Packet(request, MemCmd::ReadReq); -readPkt->allocate(); +readPkt->dataStatic((uint8_t *)&akc + gen.complete()); system()->getDeviceMemory(readPkt)->access(readPkt); -raw_akc[i] = readPkt->getLE<uint8_t>(); delete readPkt; } -memcpy(&akc, &raw_akc, sizeof(AMDKernelCode)); } DPRINTF(GPUCommandProc, "GPU machine code is %lli bytes from start of the " -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65251?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144 Gerrit-Change-Number: 65251 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Alexandru Duțu (Alex) Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
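The chunked read in this patch relies on splitting an address range so that no piece crosses a cache-line boundary, which means each piece can be translated on its own. The sketch below shows that splitting in isolation. `Chunk` and `splitIntoChunks` are hypothetical names and do not reproduce gem5's ChunkGenerator interface; the 256-byte object size and 64-byte line size in the usage note are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of the range: where it starts, how long it is, and how many
// bytes of the whole range precede it (the offset into the destination).
struct Chunk
{
    uint64_t addr;
    uint64_t size;
    uint64_t complete;
};

// Split [start, start + total) into pieces that never cross a
// chunk_size-aligned boundary. chunk_size must be a power of two.
inline std::vector<Chunk>
splitIntoChunks(uint64_t start, uint64_t total, uint64_t chunk_size)
{
    assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
    std::vector<Chunk> chunks;
    uint64_t done = 0;
    while (done < total) {
        const uint64_t addr = start + done;
        // Bytes left until the next aligned boundary after addr.
        const uint64_t to_boundary = chunk_size - (addr & (chunk_size - 1));
        const uint64_t size = std::min(to_boundary, total - done);
        chunks.push_back({addr, size, done});
        done += size;
    }
    return chunks;
}
```

Splitting a 256-byte object that starts at 0xFC0 with 64-byte lines yields four chunks at 0xFC0, 0x1000, 0x1040 and 0x1080. The second chunk begins exactly on the 4 KiB page boundary, so translating per chunk picks up the second page, whereas a single translation at the base address would not.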
[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Add granulated SGPR computation for gfx9
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/65252?usp=email ) ( 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. ) Change subject: gpu-compute: Add granulated SGPR computation for gfx9 .. gpu-compute: Add granulated SGPR computation for gfx9 The granulated SGPR size is used when the number of SGPRs is unknown. The computation for this has changed since gfx8 and is noted as a TODO in a code comment. This changeset implements the change and also checks for an invalid SGPR count. According to LLVM code this could happen "due to a compiler bug or when using inline asm.": https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L723 Change-Id: Ie487a53940b323a0002341075e0f81af4147a7d8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65252 Maintainer: Matt Sinclair Reviewed-by: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/hsa_queue_entry.hh 1 file changed, 39 insertions(+), 3 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/hsa_queue_entry.hh b/src/gpu-compute/hsa_queue_entry.hh index 4261f2c..fbe0efe 100644 --- a/src/gpu-compute/hsa_queue_entry.hh +++ b/src/gpu-compute/hsa_queue_entry.hh @@ -96,9 +96,22 @@ if (!numVgprs) numVgprs = (akc->granulated_workitem_vgpr_count + 1) * 4; -// TODO: Granularity changes for GFX9!
-if (!numSgprs) -numSgprs = (akc->granulated_wavefront_sgpr_count + 1) * 8; +if (!numSgprs || numSgprs == + std::numeric_limits<decltype(akc->wavefront_sgpr_count)>::max()) { +// Supported major generation numbers: 0 (BLIT kernels), 8, and 9 +uint16_t version = akc->amd_machine_version_major; +assert((version == 0) || (version == 8) || (version == 9)); +// SGPR allocation granularities: +// - GFX8: 8 +// - GFX9: 16 +// Source: https://llvm.org/docs/AMDGPUUsage.html +if ((version == 0) || (version == 8)) { +// We assume that BLIT kernels use the same granularity as GFX8 +numSgprs = (akc->granulated_wavefront_sgpr_count + 1) * 8; +} else if (version == 9) { +numSgprs = ((akc->granulated_wavefront_sgpr_count + 1) * 16)/2; +} +} initialVgprState.reset(); initialSgprState.reset(); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/65252?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ie487a53940b323a0002341075e0f81af4147a7d8 Gerrit-Change-Number: 65252 Gerrit-PatchSet: 3 Gerrit-Owner: Matthew Poremba Gerrit-Reviewer: Alexandru Duțu (Alex) Gerrit-Reviewer: Kyle Roarty Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
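The per-generation computation in this diff can be pulled out into a small free function for experimentation. `sgprsFromGranulated` is a hypothetical name, and returning 0 for an unsupported generation is an assumption made for this sketch; the patch asserts instead. The two arithmetic expressions are copied from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Number of SGPRs derived from the granulated count, per major generation.
// Generation 0 (BLIT kernels) is treated like GFX8, as in the patch.
inline uint32_t
sgprsFromGranulated(uint16_t version_major, uint32_t granulated_count)
{
    if (version_major == 0 || version_major == 8) {
        // GFX8 allocation granularity: 8
        return (granulated_count + 1) * 8;
    }
    if (version_major == 9) {
        // GFX9 allocation granularity: 16, halved as in the patch
        return ((granulated_count + 1) * 16) / 2;
    }
    // Unsupported generation: sketch-only fallback (the patch asserts).
    return 0;
}
```

As written, the GFX9 expression `((n + 1) * 16) / 2` evaluates to the same number as the GFX8 expression `(n + 1) * 8` for any count that does not overflow, so a granulated count of 5 gives 48 in both branches.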