[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: fix whitespacing errors in RubySystem
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69397?usp=email )

Change subject: mem-ruby: fix whitespacing errors in RubySystem
......................................................................

mem-ruby: fix whitespacing errors in RubySystem

These errors cause other commits to fail pre-commit.

Change-Id: I379d2d7c73f88d0bb35de5aaa7d8cb70a83ee1dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69397
Tested-by: kokoro
Maintainer: Jason Lowe-Power
Reviewed-by: Jason Lowe-Power
---
M src/mem/ruby/system/RubySystem.cc
1 file changed, 10 insertions(+), 9 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  kokoro: Regressions pass

diff --git a/src/mem/ruby/system/RubySystem.cc b/src/mem/ruby/system/RubySystem.cc
index 91c4bc3..5a81513 100644
--- a/src/mem/ruby/system/RubySystem.cc
+++ b/src/mem/ruby/system/RubySystem.cc
@@ -310,23 +310,24 @@
 void
 RubySystem::serialize(CheckpointOut &cp) const
 {
-    // Store the cache-block size, so we are able to restore on systems with a
-    // different cache-block size. CacheRecorder depends on the correct
-    // cache-block size upon unserializing.
+    // Store the cache-block size, so we are able to restore on systems
+    // with a different cache-block size. CacheRecorder depends on the
+    // correct cache-block size upon unserializing.
     uint64_t block_size_bytes = getBlockSizeBytes();
     SERIALIZE_SCALAR(block_size_bytes);

-    // Check that there's a valid trace to use. If not, then memory won't be
-    // up-to-date and the simulation will probably fail when restoring from the
-    // checkpoint.
+    // Check that there's a valid trace to use. If not, then memory won't
+    // be up-to-date and the simulation will probably fail when restoring
+    // from the checkpoint.
     if (m_cache_recorder == NULL) {
-        fatal("Call memWriteback() before serialize() to create ruby trace");
+        fatal("Call memWriteback() before serialize() to create "
+              "ruby trace");
     }

     // Aggregate the trace entries together into a single array
     uint8_t *raw_data = new uint8_t[4096];
-    uint64_t cache_trace_size = m_cache_recorder->aggregateRecords(&raw_data,
-                                                                   4096);
+    uint64_t cache_trace_size = m_cache_recorder->aggregateRecords(
+        &raw_data, 4096);
     std::string cache_trace_file = name() + ".cache.gz";
     writeCompressedTrace(raw_data, cache_trace_file, cache_trace_size);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/69397?usp=email
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I379d2d7c73f88d0bb35de5aaa7d8cb70a83ee1dd
Gerrit-Change-Number: 69397
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: VISHNU RAMADAS
Gerrit-Reviewer: kokoro
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
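The reason the reformatted comment gives for serializing the block size — a checkpoint restored on a system configured with a different cache-block size would corrupt the recorded cache trace — can be sketched outside of gem5's actual `SERIALIZE_SCALAR` machinery. All names below are illustrative stand-ins, not gem5's real API:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Toy model of the checkpoint guard: serialize() records the cache-block
// size, and unserialize() refuses to restore if the current system was
// configured with a different block size, failing loudly instead of
// silently mis-reading the cache trace.
struct ToyRubySystem
{
    uint64_t blockSizeBytes;

    void
    serialize(std::ostream &cp) const
    {
        cp << "block_size_bytes=" << blockSizeBytes << "\n";
    }

    void
    unserialize(std::istream &cp)
    {
        std::string key;
        std::getline(cp, key, '=');   // consume the "block_size_bytes" key
        uint64_t stored = 0;
        cp >> stored;
        if (stored != blockSizeBytes)
            throw std::runtime_error("checkpoint was taken with a different "
                                     "cache-block size");
    }
};
```

The real code additionally requires `memWriteback()` to have populated the cache recorder before `serialize()` runs, which is what the `fatal()` call in the diff enforces.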
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: fix whitespacing errors in RubySystem
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69397?usp=email )

Change subject: mem-ruby: fix whitespacing errors in RubySystem
......................................................................

mem-ruby: fix whitespacing errors in RubySystem

These errors cause other commits to fail pre-commit.

Change-Id: I379d2d7c73f88d0bb35de5aaa7d8cb70a83ee1dd
---
M src/mem/ruby/system/RubySystem.cc
1 file changed, 10 insertions(+), 9 deletions(-)

(The diff is identical to the one in the submitted change above.)

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I379d2d7c73f88d0bb35de5aaa7d8cb70a83ee1dd
Gerrit-Change-Number: 69397
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair
Gerrit-MessageType: newchange
[gem5-dev] [M] Change in gem5/gem5[develop]: tests: add GPU Ruby Random tester with WB L2 caches
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/69258?usp=email )

( 4 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. )

Change subject: tests: add GPU Ruby Random tester with WB L2 caches
......................................................................

tests: add GPU Ruby Random tester with WB L2 caches

The current GPU Ruby Random tester tests only test for WT L2 caches,
meaning that some transitions (specific to WB caches) are never tested.
To help ensure better coverage, this commit adds a separate test that
tests WB GPU L2 caches to the per-checkin and nightly regressions.

Change-Id: I539ece3b825b9a38630027d947dc11ebef588752
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69258
Tested-by: kokoro
Maintainer: Bobby Bruce
Reviewed-by: Bobby Bruce
---
A tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
1 file changed, 84 insertions(+), 0 deletions(-)

Approvals:
  Bobby Bruce: Looks good to me, approved
  kokoro: Regressions pass

diff --git a/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py b/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
new file mode 100644
index 000..9af4e65
--- /dev/null
+++ b/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2023 The Board of Regents of the University of Wisconsin
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from testlib import *
+
+"""
+This file contains random tests for the Ruby GPU protocols with a WB L2 cache.
+"""
+
+# This test will first run the GPU protocol random tester -- it should take
+# about 30 seconds to run and provides good coverage for the coherence
+# protocol.
+#
+# Input choices (some are default and thus implicit):
+# - use small cache size to encourage races
+# - use small system size to encourage races since more requests per CU (and
+#   faster sim)
+# - use small address range to encourage more races
+# - use small episode length to encourage more races
+# - 50K tests runs in ~30 seconds with reasonably good coverage
+# - num-dmas = 0 because VIPER doesn't support partial cache line writes, which
+#   DMAs need
+gem5_verify_config(
+    name="ruby-gpu-random-test-wbL2-perCheckin",
+    fixtures=(),
+    verifiers=(),
+    config=joinpath(
+        config.base_dir, "configs", "example", "ruby_gpu_random_test.py"
+    ),
+    config_args=["--WB_L2", "--test-length", "5", "--num-dmas", "0"],
+    valid_isas=(constants.vega_x86_tag,),
+    valid_hosts=constants.supported_hosts,
+    length=constants.long_tag,
+)
+
+
+# This test will run the GPU protocol random tester in nightly -- it should
+# take about 30 minutes to run and provides good coverage for the coherence
+# protocol.
+#
+# Input choices (some are default and thus implicit):
+# - use small cache size to encourage races
+# - use small system size to encourage races since more requests per CU (and
+#   faster sim)
+# - use small address range to encourage more races
+# - use small episode length to encourage more races
+# - 5M tests runs in ~30 minutes with reasonably good coverage
+# - num-dmas = 0 because VIPER doesn't support partial cache line writes,
+#   which DMAs need
+gem5_verify_config(
+    name="ruby-gpu-random-test-wbL2-nightly",
+    fixtures=(),
+    verifiers=(),
+    config=joinpath(
+        config.base_dir, "configs", "example", "ruby_gpu_random_test.py"
+    ),
+    config_args=["--WB_L2", "--test-le
[gem5-dev] [M] Change in gem5/gem5[develop]: tests: add GPU Ruby Random tester with WB L2 caches
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69258?usp=email )

Change subject: tests: add GPU Ruby Random tester with WB L2 caches
......................................................................

tests: add GPU Ruby Random tester with WB L2 caches

The current GPU Ruby Random tester tests only test for WT L2 caches,
meaning that some transitions (specific to WB caches) are never tested.
To help ensure better coverage, this commit adds a separate test that
tests WB GPU L2 caches to the per-checkin and nightly regressions.

Change-Id: I539ece3b825b9a38630027d947dc11ebef588752
---
A tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
1 file changed, 84 insertions(+), 0 deletions(-)

diff --git a/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py b/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
new file mode 100644
index 000..808acb1
--- /dev/null
+++ b/tests/gem5/gpu/test_gpu_ruby_random_wbL2.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2022 The Regents of the University of California
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from testlib import *
+
+"""
+This file contains random tests for the Ruby GPU protocols.
+"""
+
+# This test will first run the GPU protocol random tester -- it should take
+# about 30 seconds to run and provides good coverage for the coherence
+# protocol.
+#
+# Input choices (some are default and thus implicit):
+# - use small cache size to encourage races
+# - use small system size to encourage races since more requests per CU (and
+#   faster sim)
+# - use small address range to encourage more races
+# - use small episode length to encourage more races
+# - 50K tests runs in ~30 seconds with reasonably good coverage
+# - num-dmas = 0 because VIPER doesn't support partial cache line writes, which
+#   DMAs need
+gem5_verify_config(
+    name="ruby-gpu-random-test-perCheckin",
+    fixtures=(),
+    verifiers=(),
+    config=joinpath(
+        config.base_dir, "configs", "example", "ruby_gpu_random_test.py"
+    ),
+    config_args=["--WB_L2", "--test-length", "5", "--num-dmas", "0"],
+    valid_isas=(constants.vega_x86_tag,),
+    valid_hosts=constants.supported_hosts,
+    length=constants.long_tag,
+)
+
+
+# This test will run the GPU protocol random tester in nightly -- it should
+# take about 30 minutes to run and provides good coverage for the coherence
+# protocol.
+#
+# Input choices (some are default and thus implicit):
+# - use small cache size to encourage races
+# - use small system size to encourage races since more requests per CU (and
+#   faster sim)
+# - use small address range to encourage more races
+# - use small episode length to encourage more races
+# - 5M tests runs in ~30 minutes with reasonably good coverage
+# - num-dmas = 0 because VIPER doesn't support partial cache line writes,
+#   which DMAs need
+gem5_verify_config(
+    name="ruby-gpu-random-test-nightly",
+    fixtures=(),
+    verifiers=(),
+    config=joinpath(
+        config.base_dir, "configs", "example", "ruby_gpu_random_test.py"
+    ),
+    config_args=["--WB_L2", "--test-length", "500", "--num-dmas", "0"],
+    valid_isas=(constants.vega_x86_tag,),
+    valid_hosts=constants.supported_hosts,
+    length=constants.long_tag,
+)

--
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Ge
[gem5-dev] [S] Change in gem5/gem5[develop]: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 i...
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/69257?usp=email )

Change subject: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop
......................................................................

Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop

Change-Id: I2c40feffc8601c2df5d78182ba3cceb2de009ae1
---
1 file changed, 16 insertions(+), 0 deletions(-)

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I2c40feffc8601c2df5d78182ba3cceb2de009ae1
Gerrit-Change-Number: 69257
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair
Gerrit-MessageType: newchange
[gem5-dev] [XS] Change in gem5/gem5[develop]: mem-ruby: fix atomic deadlock with WB GPU L2 caches
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/68978?usp=email )

Change subject: mem-ruby: fix atomic deadlock with WB GPU L2 caches
......................................................................

mem-ruby: fix atomic deadlock with WB GPU L2 caches

By default the GPU VIPER coherence protocol uses a WT L2 cache.
However, it has support for using WB caches (although this is not
tested currently). When using a WB L2 cache for the GPU, this results
in deadlocks with atomics.

Specifically, when an atomic reaches the L2 and the line is currently
in M or W, the line must be written back before the atomic can be
performed. However, the current support has two issues:

a) it never performs the atomic operation -- while VIPER currently
assumes all atomics are system scope atomics and thus cannot be
performed at the L2, and this transition requires the dirty line be
written back before performing the atomic, the transition never
performs the atomic nor does the response path handle it.

b) putting the atomic action right after the write back is not safe
because we need to ensure the requests are ordered when they reach
memory -- thus we have to wait until the write back is acknowledged
before it's safe to send/perform the atomic.

To fix this, this change modifies the transition in question to put the
atomic on the stalled requests buffer, which the WBAck will check when
it returns to the L2 (and thus perform the atomic, which will result in
the atomic being sent on to the directory).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68978
Maintainer: Bobby Bruce
Reviewed-by: Matthew Poremba
Tested-by: kokoro
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matthew Poremba: Looks good to me, approved
  Bobby Bruce: Looks good to me, approved

diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 0b7f5ed..a595898 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -816,10 +816,14 @@
 }

 transition({M, W}, Atomic, WI) {TagArrayRead} {
-    p_profileHit;
     t_allocateTBE;
     wb_writeBack;
-    p_popRequestQueue;
+    // after writing back the current line, we need to wait for it to be done
+    // before we try to perform the atomic
+    // by putting the stalled requests in a buffer, we reduce resource contention
+    // since they won't try again every cycle and will instead only try again once
+    // woken up
+    st_stallAndWaitRequest;
 }

 transition(I, WrVicBlk) {TagArrayRead} {

--
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7
Gerrit-Change-Number: 68978
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair
Gerrit-Reviewer: Bobby Bruce
Gerrit-Reviewer: Bradford Beckmann
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: VISHNU RAMADAS
Gerrit-MessageType: merged
[gem5-dev] [XS] Change in gem5/gem5[develop]: mem-ruby: fix load deadlock with WB GPU L2 caches
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/68977?usp=email )

Change subject: mem-ruby: fix load deadlock with WB GPU L2 caches
......................................................................

mem-ruby: fix load deadlock with WB GPU L2 caches

By default the GPU VIPER coherence protocol uses a WT L2 cache.
However, it has support for using WB caches (although this is not
tested currently). When using a WB L2 cache for the GPU, this results
in deadlocks with loads.

Specifically, when a load reaches the L2 and the line is currently in
the W state, that line must be written back before the load can be
performed. However, the current transition for this in the L2 did not
attempt to retry the load when the WB completes, resulting in a
deadlock. This deadlock can be replicated by running the GPU Ruby
random tester as is with a WB L2 cache instead of a WT L2 cache.

To fix this, this change modifies the transition in question to put the
load on the stalled requests buffer, which the WBAck will check when it
returns to the L2 (and thus perform the load).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/68977
Reviewed-by: Matthew Poremba
Maintainer: Bobby Bruce
Tested-by: kokoro
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 5 insertions(+), 2 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Bobby Bruce: Looks good to me, approved
  kokoro: Regressions pass

diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 0f93339..0b7f5ed 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -718,10 +718,13 @@
     p_popRequestQueue;
 }

 transition(W, RdBlk, WI) {TagArrayRead, DataArrayRead} {
-    p_profileHit;
     t_allocateTBE;
     wb_writeBack;
-    p_popRequestQueue;
+    // need to try this request again after writing back the current entry -- to
+    // do so, put it with other stalled requests in a buffer to reduce resource
+    // contention since they won't try again every cycle and will instead only
+    // try again once woken up
+    st_stallAndWaitRequest;
 }

 transition(I, RdBlk, IV) {TagArrayRead} {

--
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Gerrit-Change-Number: 68977
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair
Gerrit-Reviewer: Bobby Bruce
Gerrit-Reviewer: Bradford Beckmann
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: Matthew Poremba
Gerrit-Reviewer: kokoro
Gerrit-CC: VISHNU RAMADAS
Gerrit-MessageType: merged
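Both WB deadlock fixes above replace an immediate `p_popRequestQueue` with `st_stallAndWaitRequest`, so the blocked request is parked until the writeback acknowledgement wakes it. A minimal sketch of that stall-and-wait pattern (illustrative names only, not gem5's actual SLICC-generated API):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Toy model of the stall-and-wait buffer: instead of retrying a blocked
// request every cycle while the line's writeback is in flight, the
// controller parks it keyed by line address; the WBAck then wakes only the
// requests waiting on that address, preserving their arrival order.
class StallAndWaitBuffer
{
  public:
    // Analogous to st_stallAndWaitRequest: park a request that found its
    // line in a transient writeback state (W/WI).
    void
    stallAndWait(uint64_t lineAddr, const std::string &request)
    {
        stalled_[lineAddr].push_back(request);
    }

    // Called when the WBAck for lineAddr returns: hand the parked requests
    // back so the controller can replay them against the now-clean state.
    std::deque<std::string>
    wakeUpDependents(uint64_t lineAddr)
    {
        std::deque<std::string> woken;
        auto it = stalled_.find(lineAddr);
        if (it != stalled_.end()) {
            woken.swap(it->second);
            stalled_.erase(it);
        }
        return woken;
    }

    bool
    hasStalled(uint64_t lineAddr) const
    {
        return stalled_.count(lineAddr) != 0;
    }

  private:
    std::map<uint64_t, std::deque<std::string>> stalled_;
};
```

As the commit messages note, parking requests rather than spinning also reduces resource contention, since a stalled request consumes no controller bandwidth until it is explicitly woken.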
[gem5-dev] [XS] Change in gem5/gem5[develop]: mem-ruby: Add RdBypassEvict to stalled GPU L2 requests
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/68998?usp=email )

Change subject: mem-ruby: Add RdBypassEvict to stalled GPU L2 requests
......................................................................

mem-ruby: Add RdBypassEvict to stalled GPU L2 requests

66d4a158 added support for AMD's GPU cache modifiers (GLC and SLC).
However, it did not consider a corner case with a WB GPU L2 cache where
the line is currently in WI and a SLC load arrives at the L2. In this
case, we need to stall the load until the write back completes and the
line transitions to I. This patch adds that support.

Change-Id: I839638c37fdd0f7d25b48a63bca44a3c4d69dbdf
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index a595898..8b70431 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -693,7 +693,7 @@
   // Stalling transitions do NOT check the tag array...and if they do,
   // they can cause a resource stall deadlock!
-  transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
+  transition(WI, {RdBlk, RdBypassEvict, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
     // by putting the stalled requests in a buffer, we reduce resource contention
     // since they won't try again every cycle and will instead only try again once
     // woken up

--
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I839638c37fdd0f7d25b48a63bca44a3c4d69dbdf
Gerrit-Change-Number: 68998
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair
Gerrit-MessageType: newchange
[gem5-dev] [XS] Change in gem5/gem5[develop]: mem-ruby: fix load deadlock with WB GPU L2 caches
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/68977?usp=email )

Change subject: mem-ruby: fix load deadlock with WB GPU L2 caches
......................................................................

mem-ruby: fix load deadlock with WB GPU L2 caches

By default the GPU VIPER coherence protocol uses a WT L2 cache.
However, it has support for using WB caches (although this is not
tested currently). When using a WB L2 cache for the GPU, this results
in deadlocks with loads. Specifically, when a load reaches the L2 and
the line is currently in the W state, that line must be written back
before the load can be performed. However, the current transition for
this in the L2 did not attempt to retry the load when the WB completes,
resulting in a deadlock. This deadlock can be replicated by running the
GPU Ruby random tester as is with a WB L2 cache instead of a WT L2
cache.

To fix this, this change modifies the transition in question to put the
load on the stalled requests buffer, which the WBAck will check when it
returns to the L2 (and thus perform the load).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 5 insertions(+), 2 deletions(-)

(The diff is identical to the one in the submitted change above.)

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ieec4f61a3070cf9976b8c3ef0cdbd0cc5a1443c6
Gerrit-Change-Number: 68977
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair
Gerrit-MessageType: newchange
[gem5-dev] [XS] Change in gem5/gem5[develop]: mem-ruby: fix atomic deadlock with WB GPU L2 caches
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/68978?usp=email )

Change subject: mem-ruby: fix atomic deadlock with WB GPU L2 caches
......................................................................

mem-ruby: fix atomic deadlock with WB GPU L2 caches

By default the GPU VIPER coherence protocol uses a WT L2 cache.
However, it has support for using WB caches (although this is not
tested currently). When using a WB L2 cache for the GPU, this results
in deadlocks with atomics. Specifically, when an atomic reaches the L2
and the line is currently in M or W, the line must be written back
before the atomic can be performed. However, the current support has
two issues:

a) it never performs the atomic operation -- while VIPER currently
assumes all atomics are system scope atomics and thus cannot be
performed at the L2, and this transition requires the dirty line be
written back before performing the atomic, the transition never
performs the atomic nor does the response path handle it.

b) putting the atomic action right after the write back is not safe
because we need to ensure the requests are ordered when they reach
memory -- thus we have to wait until the write back is acknowledged
before it's safe to send/perform the atomic.

To fix this, this change modifies the transition in question to put the
atomic on the stalled requests buffer, which the WBAck will check when
it returns to the L2 (and thus perform the atomic, which will result in
the atomic being sent on to the directory).

This fix has been tested and verified with both the per-checkin and
nightly GPU Ruby Random tester tests (with a WB L2 cache).

Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 6 insertions(+), 2 deletions(-)

(The diff is identical to the one in the submitted change above.)

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a43fd985dc71297521f4b05c47288d92c314ac7
Gerrit-Change-Number: 68978
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair
Gerrit-MessageType: newchange
[gem5-dev] [S] Change in gem5/gem5[develop]: tests: cleanup m5out directly in weekly
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67198?usp=email )

Change subject: tests: cleanup m5out directly in weekly
......................................................................

tests: cleanup m5out directly in weekly

The weekly test script was implicitly assuming that no m5out directory
existed in the folder where the script was run. However, if a prior
test ran and failed, it would not clean up its m5out directory, causing
the weekly tests to fail. This commit resolves this by removing the
m5out directory before trying to run any tests in the weekly script.
Moreover, we also update the weekly script to explicitly remove this
m5out directory at the end of the script.

Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67198
Reviewed-by: Bobby Bruce
Maintainer: Bobby Bruce
Tested-by: kokoro
---
M tests/weekly.sh
1 file changed, 8 insertions(+), 4 deletions(-)

Approvals:
  Bobby Bruce: Looks good to me, approved
  kokoro: Regressions pass

diff --git a/tests/weekly.sh b/tests/weekly.sh
index c7f834b..f9d3e4b 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -70,13 +70,14 @@
 # GPU weekly tests start here

 # before pulling gem5 resources, make sure it doesn't exist already
-docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+docker run -u $UID:$GID --rm --volume "${gem5_root}":"${gem5_root}" -w \
     "${gem5_root}" --memory="${docker_mem_limit}" \
     gcr.io/gem5-test/gcn-gpu:${tag} bash -c \
     "rm -rf ${gem5_root}/gem5-resources"
-# delete Pannotia datasets and output files in case a failed regression run left
-# them around
-rm -f coAuthorsDBLP.graph 1k_128k.gr result.out
+
+# delete m5out, Pannotia datasets, and output files in case a failed regression
+# run left them around
+rm -rf ${gem5_root}/m5out coAuthorsDBLP.graph 1k_128k.gr result.out

 # Pull gem5 resources to the root of the gem5 directory -- currently the
 # pre-built binares for LULESH are out-of-date and won't run correctly with
@@ -383,5 +384,8 @@
     "${gem5_root}" --memory="${docker_mem_limit}" hacc-test-weekly bash -c \
     "rm -rf ${gem5_root}/gem5-resources"

+# Delete the gem5 m5out folder we created
+rm -rf ${gem5_root}/m5out
+
 # delete Pannotia datasets we downloaded and output files it created
 rm -f coAuthorsDBLP.graph 1k_128k.gr result.out

--
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50
Gerrit-Change-Number: 67198
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair
Gerrit-Reviewer: Bobby Bruce
Gerrit-Reviewer: Jason Lowe-Power
Gerrit-Reviewer: Matt Sinclair
Gerrit-Reviewer: kokoro
Gerrit-MessageType: merged
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: fix TCP spacing/spelling
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67200?usp=email ) Change subject: mem-ruby: fix TCP spacing/spelling .. mem-ruby: fix TCP spacing/spelling Change-Id: I3fd9009592c8716a3da19dcdccf68f16af6522ef Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67200 Reviewed-by: Jason Lowe-Power Maintainer: Jason Lowe-Power Tested-by: kokoro --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 19 insertions(+), 6 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 14bdcec..6a977c4 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -261,7 +261,7 @@ // If L1 is disabled or requests have GLC or SLC flag set, // then, the requests should not cache in the L1. The response // from L2/global memory should bypass the cache - trigger(Event:Bypass, in_msg.addr, cache_entry, tbe); + trigger(Event:Bypass, in_msg.addr, cache_entry, tbe); } else { if (is_valid(cache_entry) || L1cache.cacheAvail(in_msg.addr)) { trigger(Event:TCC_Ack, in_msg.addr, cache_entry, tbe); @@ -288,7 +288,7 @@ DPRINTF(RubySlicc, "%s\n", in_msg); if (in_msg.Type == RubyRequestType:LD) { if ((in_msg.isGLCSet || in_msg.isSLCSet) && is_valid(cache_entry)) { -// Read rquests with GLC or SLC bit set should not cache in the L1. +// Read requests with GLC or SLC bit set should not cache in the L1. // They need to bypass the L1 and go to the L2. If an entry exists // in the L1, it needs to be evicted trigger(Event:LoadBypassEvict, in_msg.LineAddress, cache_entry, tbe); @@ -609,15 +609,15 @@ p_popMandatoryQueue; } -// Transition to be called when a load request with GLC or SLC flag set arrives -// at L1. This transition invalidates any existing entry and forwards the -// request to L2. 
+ // Transition to be called when a load request with GLC or SLC flag set arrives + // at L1. This transition invalidates any existing entry and forwards the + // request to L2. transition(V, LoadBypassEvict, I) {TagArrayRead, TagArrayWrite} { uu_profileDataMiss; ic_invCache; n_issueRdBlk; p_popMandatoryQueue; -} + } transition({V, I}, Atomic, A) {TagArrayRead, TagArrayWrite} { t_allocateTBE; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67200?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I3fd9009592c8716a3da19dcdccf68f16af6522ef Gerrit-Change-Number: 67200 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: VISHNU RAMADAS Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
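The GLC/SLC handling these TCP hunks describe can be summarized as a small decision function. This is an illustrative Python sketch, not SLICC or gem5 code: `LoadBypassEvict` is the event name from the protocol diff, while the function name and the `NormalLoad` placeholder are assumptions for the sake of the example.

```python
def tcp_load_event(is_glc_set: bool, is_slc_set: bool) -> str:
    """Pick the Ruby event for an incoming LD request at the TCP (L1)."""
    if is_glc_set or is_slc_set:
        # Loads with the GLC or SLC bit set must not cache in the L1:
        # bypass it and go to the L2, evicting any existing entry.
        return "LoadBypassEvict"
    # Normal cacheable load path (placeholder name, not the real event).
    return "NormalLoad"
```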
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby, gpu-compute: fix TCP GLC cache bypassing
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67199?usp=email ) Change subject: mem-ruby, gpu-compute: fix TCP GLC cache bypassing .. mem-ruby, gpu-compute: fix TCP GLC cache bypassing 66d4a158 added support for AMD's GPU cache bypassing flags (GLC for bypassing L1 caches, SLC for bypassing all caches). However, for applications that use the GLC flag but intermix GLC- and non-GLC accesses to the same address, this previous commit has a bug. This bug manifests when the address is currently valid in the L1 (TCP). In this case, the previous commit chose to evict the line before letting the bypassing access to proceed. However, to do this the previous commit was using the inv_invDone action as part of the process of evicting it. This action is only intended to be called when load acquires are being performed (i.e., when the entire L1 cache is being flash invalidated). Thus, calling inv_invDone for a GLC (or SLC) bypassing request caused an assert failure since the bypassing request was not performing a load acquire. This commit resolves this by changing the support in this case to simply invalidate the entry in the cache. Change-Id: Ibaa4976f8714ac93650020af1c0ce2b6732c95a2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67199 Reviewed-by: Jason Lowe-Power Tested-by: kokoro Maintainer: Jason Lowe-Power --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 31 insertions(+), 1 deletion(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 3be1397..14bdcec 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -614,7 +614,6 @@ // request to L2. 
transition(V, LoadBypassEvict, I) {TagArrayRead, TagArrayWrite} { uu_profileDataMiss; -inv_invDone; ic_invCache; n_issueRdBlk; p_popMandatoryQueue; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67199?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibaa4976f8714ac93650020af1c0ce2b6732c95a2 Gerrit-Change-Number: 67199 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: VISHNU RAMADAS Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
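A minimal model of why the old action sequence asserted: `inv_invDone` is only meant to run while the whole L1 is being flash-invalidated for a load acquire, so calling it for a single GLC/SLC bypassing load trips its assertion, and the fixed transition relies on `ic_invCache` alone. The class and method bodies below are a hypothetical sketch of that behavior, not gem5's implementation.

```python
class TcpModel:
    """Toy model of the TCP actions involved in the LoadBypassEvict fix."""

    def __init__(self):
        self.flash_invalidate_in_flight = False
        self.lines = {0x40: "V"}  # one valid line, for illustration

    def inv_invDone(self):
        # Legal only during a cache-wide flash invalidate (load acquire).
        assert self.flash_invalidate_in_flight, \
            "inv_invDone called outside a load-acquire flash invalidate"

    def ic_invCache(self, addr):
        # Plain per-line invalidate: safe at any time.
        self.lines[addr] = "I"

    def load_bypass_evict(self, addr):
        # Fixed V -> I sequence from the diff: no inv_invDone call.
        self.ic_invCache(addr)
        return "n_issueRdBlk"  # then fetch the line from the L2
```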
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: add GPU cache bypass I->I transition
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/67201?usp=email ) Change subject: mem-ruby: add GPU cache bypass I->I transition .. mem-ruby: add GPU cache bypass I->I transition 66d4a158 added support for AMD's GPU cache bypassing flags (GLC for bypassing L1 caches, SLC for bypassing all caches). However, it did not add a transition for the situation where the cache line is currently I (Invalid). This commit adds this support, which resolves an assert failure in Pannotia workloads when this situation arises. Change-Id: I59a62ce70c01dd8b73aacb733fb3d1d0dab2624b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67201 Reviewed-by: Jason Lowe-Power Tested-by: kokoro Maintainer: Jason Lowe-Power --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 29 insertions(+), 0 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 6a977c4..7e0ad4e 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -619,6 +619,15 @@ p_popMandatoryQueue; } + // Transition to be called when a load request with GLC or SLC flag set arrives + // at L1. Since the entry is invalid, there isn't anything to forward to L2, + // so just issue read. 
+ transition(I, LoadBypassEvict) {TagArrayRead, TagArrayWrite} { +uu_profileDataMiss; +n_issueRdBlk; +p_popMandatoryQueue; + } + transition({V, I}, Atomic, A) {TagArrayRead, TagArrayWrite} { t_allocateTBE; mru_updateMRU; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67201?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I59a62ce70c01dd8b73aacb733fb3d1d0dab2624b Gerrit-Change-Number: 67201 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: VISHNU RAMADAS Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
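With this change applied, both the V and I states handle the bypass event. The two transitions can be sketched as a tiny table-driven model (plain Python, not SLICC; state, event, and action names are taken from the diffs, the lookup function is hypothetical). A missing table entry plays the role of the protocol assert the commit fixes: SLICC aborts when an event arrives in a state with no matching transition.

```python
# (current state, event) -> (next state, action sequence)
TCP_BYPASS_TRANSITIONS = {
    ("V", "LoadBypassEvict"): ("I", ["uu_profileDataMiss", "ic_invCache",
                                     "n_issueRdBlk", "p_popMandatoryQueue"]),
    ("I", "LoadBypassEvict"): ("I", ["uu_profileDataMiss",
                                     "n_issueRdBlk", "p_popMandatoryQueue"]),
}

def next_state(state: str, event: str) -> str:
    if (state, event) not in TCP_BYPASS_TRANSITIONS:
        # Analogue of the SLICC "invalid transition" failure.
        raise RuntimeError(f"no transition for {event} in state {state}")
    return TCP_BYPASS_TRANSITIONS[(state, event)][0]
```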
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby, gpu-compute: fix TCP GLC cache bypassing
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67199?usp=email ) Change subject: mem-ruby, gpu-compute: fix TCP GLC cache bypassing .. mem-ruby, gpu-compute: fix TCP GLC cache bypassing 66d4a158 added support for AMD's GPU cache bypassing flags (GLC for bypassing L1 caches, SLC for bypassing all caches). However, for applications that use the GLC flag but intermix GLC- and non-GLC accesses to the same address, this previous commit has a bug. This bug manifests when the address is currently valid in the L1 (TCP). In this case, the previous commit chose to evict the line before letting the bypassing access to proceed. However, to do this the previous commit was using the inv_invDone action as part of the process of evicting it. This action is only intended to be called when load acquires are being performed (i.e., when the entire L1 cache is being flash invalidated). Thus, calling inv_invDone for a GLC (or SLC) bypassing request caused an assert failure since the bypassing request was not performing a load acquire. This commit resolves this by changing the support in this case to simply invalidate the entry in the cache. Change-Id: Ibaa4976f8714ac93650020af1c0ce2b6732c95a2 --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 3be1397..14bdcec 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -614,7 +614,6 @@ // request to L2. 
transition(V, LoadBypassEvict, I) {TagArrayRead, TagArrayWrite} { uu_profileDataMiss; -inv_invDone; ic_invCache; n_issueRdBlk; p_popMandatoryQueue; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67199?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibaa4976f8714ac93650020af1c0ce2b6732c95a2 Gerrit-Change-Number: 67199 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: fix TCP spacing/spelling
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67200?usp=email ) Change subject: mem-ruby: fix TCP spacing/spelling .. mem-ruby: fix TCP spacing/spelling Change-Id: I3fd9009592c8716a3da19dcdccf68f16af6522ef --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 14bdcec..6a977c4 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -261,7 +261,7 @@ // If L1 is disabled or requests have GLC or SLC flag set, // then, the requests should not cache in the L1. The response // from L2/global memory should bypass the cache - trigger(Event:Bypass, in_msg.addr, cache_entry, tbe); + trigger(Event:Bypass, in_msg.addr, cache_entry, tbe); } else { if (is_valid(cache_entry) || L1cache.cacheAvail(in_msg.addr)) { trigger(Event:TCC_Ack, in_msg.addr, cache_entry, tbe); @@ -288,7 +288,7 @@ DPRINTF(RubySlicc, "%s\n", in_msg); if (in_msg.Type == RubyRequestType:LD) { if ((in_msg.isGLCSet || in_msg.isSLCSet) && is_valid(cache_entry)) { -// Read rquests with GLC or SLC bit set should not cache in the L1. +// Read requests with GLC or SLC bit set should not cache in the L1. // They need to bypass the L1 and go to the L2. If an entry exists // in the L1, it needs to be evicted trigger(Event:LoadBypassEvict, in_msg.LineAddress, cache_entry, tbe); @@ -609,15 +609,15 @@ p_popMandatoryQueue; } -// Transition to be called when a load request with GLC or SLC flag set arrives -// at L1. This transition invalidates any existing entry and forwards the -// request to L2. + // Transition to be called when a load request with GLC or SLC flag set arrives + // at L1. This transition invalidates any existing entry and forwards the + // request to L2. 
transition(V, LoadBypassEvict, I) {TagArrayRead, TagArrayWrite} { uu_profileDataMiss; ic_invCache; n_issueRdBlk; p_popMandatoryQueue; -} + } transition({V, I}, Atomic, A) {TagArrayRead, TagArrayWrite} { t_allocateTBE; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67200?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I3fd9009592c8716a3da19dcdccf68f16af6522ef Gerrit-Change-Number: 67200 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: mem-ruby: add GPU cache bypass I->I transition
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67201?usp=email ) Change subject: mem-ruby: add GPU cache bypass I->I transition .. mem-ruby: add GPU cache bypass I->I transition 66d4a158 added support for AMD's GPU cache bypassing flags (GLC for bypassing L1 caches, SLC for bypassing all caches). However, it did not add a transition for the situation where the cache line is currently I (Invalid). This commit adds this support, which resolves an assert failure in Pannotia workloads when this situation arises. Change-Id: I59a62ce70c01dd8b73aacb733fb3d1d0dab2624b --- M src/mem/ruby/protocol/GPU_VIPER-TCP.sm 1 file changed, 25 insertions(+), 0 deletions(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm index 6a977c4..7e0ad4e 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCP.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCP.sm @@ -619,6 +619,15 @@ p_popMandatoryQueue; } + // Transition to be called when a load request with GLC or SLC flag set arrives + // at L1. Since the entry is invalid, there isn't anything to forward to L2, + // so just issue read. + transition(I, LoadBypassEvict) {TagArrayRead, TagArrayWrite} { +uu_profileDataMiss; +n_issueRdBlk; +p_popMandatoryQueue; + } + transition({V, I}, Atomic, A) {TagArrayRead, TagArrayWrite} { t_allocateTBE; mru_updateMRU; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67201?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I59a62ce70c01dd8b73aacb733fb3d1d0dab2624b Gerrit-Change-Number: 67201 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] [S] Change in gem5/gem5[develop]: tests: cleanup m5out directly in weekly
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/67198?usp=email ) Change subject: tests: cleanup m5out directly in weekly .. tests: cleanup m5out directly in weekly The weekly test script was implicitly assuming that no m5out directory existed in the folder where the script was run. However, if a prior test ran and failed, it would not clean up its m5out directory, causing the weekly tests to fail. This commit resolves this by removing the m5out directory before trying to run any tests in the weekly script. Moreover, we also update the weekly script to explicitly remove this m5out directory at the end of the script. Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50 --- M tests/weekly.sh 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index c7f834b..f218729 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -70,10 +70,11 @@ # GPU weekly tests start here # before pulling gem5 resources, make sure it doesn't exist already +# likewise, remove any lingering m5out folder docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" --memory="${docker_mem_limit}" \ gcr.io/gem5-test/gcn-gpu:${tag} bash -c \ - "rm -rf ${gem5_root}/gem5-resources" + "rm -rf ${gem5_root}/gem5-resources ${gem5_root}/m5out" # delete Pannotia datasets and output files in case a failed regression run left # them around rm -f coAuthorsDBLP.graph 1k_128k.gr result.out @@ -383,5 +384,11 @@ "${gem5_root}" --memory="${docker_mem_limit}" hacc-test-weekly bash -c \ "rm -rf ${gem5_root}/gem5-resources" +# Delete the gem5 m5out folder we created -- need to do in docker because it +# creates +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" --memory="${docker_mem_limit}" hacc-test-weekly bash -c \ + "rm -rf ${gem5_root}/m5out" + # delete Pannotia datasets we downloaded and output files it created rm -f coAuthorsDBLP.graph 1k_128k.gr result.out -- To view, 
visit https://gem5-review.googlesource.com/c/public/gem5/+/67198?usp=email To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50 Gerrit-Change-Number: 67198 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/57535 ) Change subject: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester .. tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester Currently the GPU Ruby tester does not support requests returned as aliased. To get around this, the GPU Ruby tester needs numDMAs to be 0. To enable this, change the default value to allow us to identify when a user wants more DMAs. Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57535 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matthew Poremba --- M configs/example/ruby_gpu_random_test.py 1 file changed, 30 insertions(+), 7 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/ruby_gpu_random_test.py b/configs/example/ruby_gpu_random_test.py index 0763454..029a97d 100644 --- a/configs/example/ruby_gpu_random_test.py +++ b/configs/example/ruby_gpu_random_test.py @@ -79,7 +79,7 @@ help="Random seed number. 
Default value (i.e., 0) means \ using runtime-specific value") parser.add_argument("--log-file", type=str, default="gpu-ruby-test.log") -parser.add_argument("--num-dmas", type=int, default=0, +parser.add_argument("--num-dmas", type=int, default=None, help="The number of DMA engines to use in tester config.") args = parser.parse_args() @@ -108,7 +108,7 @@ args.wf_size = 1 args.wavefronts_per_cu = 1 args.num_cpus = 1 -args.num_dmas = 1 +n_DMAs = 1 args.cu_per_sqc = 1 args.cu_per_scalar_cache = 1 args.num_compute_units = 1 @@ -117,7 +117,7 @@ args.wf_size = 16 args.wavefronts_per_cu = 4 args.num_cpus = 4 -args.num_dmas = 2 +n_DMAs = 2 args.cu_per_sqc = 4 args.cu_per_scalar_cache = 4 args.num_compute_units = 4 @@ -126,11 +126,19 @@ args.wf_size = 32 args.wavefronts_per_cu = 4 args.num_cpus = 4 -args.num_dmas = 4 +n_DMAs = 4 args.cu_per_sqc = 4 args.cu_per_scalar_cache = 4 args.num_compute_units = 8 +# Number of DMA engines +if not(args.num_dmas is None): +n_DMAs = args.num_dmas +# currently the tester does not support requests returned as +# aliased, thus we need num_dmas to be 0 for it +if not(args.num_dmas == 0): +print("WARNING: num_dmas != 0 not supported with VIPER") + # # Set address range - 2 options # level 0: small @@ -173,9 +181,6 @@ # For now we're testing only GPU protocol, so we force num_cpus to be 0 args.num_cpus = 0 -# Number of DMA engines -n_DMAs = args.num_dmas - # Number of CUs n_CUs = args.num_compute_units -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57535 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2 Gerrit-Change-Number: 57535 Gerrit-PatchSet: 5 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro 
Gerrit-CC: Alexandru Duțu Gerrit-CC: Bradford Beckmann Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
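The `default=None` idiom this change adopts for `--num-dmas` is worth spelling out: with `default=0` the script cannot tell "the user never passed the flag" apart from an explicit `--num-dmas 0`. A standalone demo of the pattern (hypothetical helper, not the gem5 config script itself):

```python
import argparse

def resolve_num_dmas(argv, test_size_default):
    """Return the DMA count, falling back only when the flag is absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-dmas", type=int, default=None)
    args = parser.parse_args(argv)
    if args.num_dmas is None:
        return test_size_default  # user said nothing: use the size preset
    return args.num_dmas          # honor an explicit value, including 0
```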
[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: change default GPU reg allocator to dynamic
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/57537 ) ( 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. )Change subject: configs, gpu-compute: change default GPU reg allocator to dynamic .. configs, gpu-compute: change default GPU reg allocator to dynamic The current default GPU register allocator is the "simple" policy, which only allows 1 wavefront to run at a time on each CU. This is not very realistic and also means the tester (when not specifically choosing the dynamic policy) is less rigorous in terms of validating correctness. To resolve this, this commit changes the default to the "dynamic" register allocator, which runs as many waves per CU as there are space in terms of registers and other resources -- thus it is more realistic and does a better job of ensuring test coverage. Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57537 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Jason Lowe-Power --- M configs/example/apu_se.py 1 file changed, 25 insertions(+), 1 deletion(-) Approvals: Jason Lowe-Power: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 532fb98..b5fb9ff 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -161,7 +161,7 @@ ' m5_switchcpu pseudo-ops will toggle back and forth') parser.add_argument("--num-hw-queues", type=int, default=10, help="number of hw queues in packet processor") -parser.add_argument("--reg-alloc-policy", type=str, default="simple", +parser.add_argument("--reg-alloc-policy", type=str, default="dynamic", help="register allocation policy (simple/dynamic)") parser.add_argument("--dgpu", action="store_true", default=False, -- To view, visit 
https://gem5-review.googlesource.com/c/public/gem5/+/57537 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e Gerrit-Change-Number: 57537 Gerrit-PatchSet: 3 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Alexandru Duțu Gerrit-Reviewer: Bradford Beckmann Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Bobby Bruce Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
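The difference between the two policies described above can be modeled in a few lines. This is an illustrative sketch with assumed parameter names and numbers, not gem5's register allocator: "simple" admits one wavefront per CU, while "dynamic" admits as many as register capacity allows.

```python
def schedulable_waves(policy: str, vgprs_per_wave: int,
                      cu_vgprs: int, max_waves_per_cu: int) -> int:
    """How many wavefronts a CU can run concurrently under each policy."""
    if policy == "simple":
        return 1  # one wave at a time, regardless of free registers
    if policy == "dynamic":
        # As many waves as fit in the register file, capped by the
        # hardware wave slots per CU.
        return min(max_waves_per_cu, cu_vgprs // vgprs_per_wave)
    raise ValueError(f"unknown policy {policy!r}")
```

More concurrent waves means more interleavings through the memory system, which is why the dynamic default also improves coherence-protocol test coverage.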
[gem5-dev] Change in gem5/gem5[develop]: tests, mem-ruby: add GPU Ruby random tester to nightly tests
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/57536 ) Change subject: tests, mem-ruby: add GPU Ruby random tester to nightly tests .. tests, mem-ruby: add GPU Ruby random tester to nightly tests This commit adds the GPU protocol random tester to the nightly tests. The input has been sized to take around 30 seconds and provide good coverage for the coherence protocol. Change-Id: If789d9d15a16fbd95fd7b115ffbf10e45bbb45c4 --- M tests/nightly.sh 1 file changed, 26 insertions(+), 0 deletions(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index e421d97..978e463 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -109,6 +109,19 @@ "scons build/${gpu_isa}/gem5.opt -j${compile_threads} \ || (rm -rf build && scons build/${gpu_isa}/gem5.opt -j${compile_threads})" +# first run the GPU protocol random tester -- it should take about 30 seconds +# to run and provides good coverage for the coherence protocol +# Input choices (some are default and thus implicit): +# - use small cache size to encourage races +# - use small system size to encourage races since more requests per CU (and faster sim) +# - use small address range to encourage more races +# - use small episode length to encourage more races +# - 50K tests runs in ~30 seconds with reasonably good coverage +# - num-dmas = 0 because VIPER doesn't support partial cache line writes, which DMAs need +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/${gpu_isa}/gem5.opt \ +configs/example/ruby_gpu_random_test.py --test-length=5 --num-dmas=0 + # get square wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57536 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: 
If789d9d15a16fbd95fd7b115ffbf10e45bbb45c4 Gerrit-Change-Number: 57536 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/57535 ) Change subject: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester .. tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester Currently VIPER does not support partial cache line writes, so we need numDMAs to be 0 for it to work with the GPU Ruby tester. To enable this, change the default value to -1 to allow us to identify when a user wants more DMAs. Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2 --- M configs/example/ruby_gpu_random_test.py 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/configs/example/ruby_gpu_random_test.py b/configs/example/ruby_gpu_random_test.py index 0763454..7fcaeeb 100644 --- a/configs/example/ruby_gpu_random_test.py +++ b/configs/example/ruby_gpu_random_test.py @@ -79,7 +79,7 @@ help="Random seed number. Default value (i.e., 0) means \ using runtime-specific value") parser.add_argument("--log-file", type=str, default="gpu-ruby-test.log") -parser.add_argument("--num-dmas", type=int, default=0, +parser.add_argument("--num-dmas", type=int, default=-1, help="The number of DMA engines to use in tester config.") args = parser.parse_args() @@ -108,7 +108,13 @@ args.wf_size = 1 args.wavefronts_per_cu = 1 args.num_cpus = 1 -args.num_dmas = 1 +# if user didn't specify number of DMAs, then assume 0 +if args.num_dmas < 1: + # currently VIPER does not support partial cache line writes, + # so we need numDMAs to be 0 for it + args.num_dmas = 0 +else: + args.num_dmas = 1 args.cu_per_sqc = 1 args.cu_per_scalar_cache = 1 args.num_compute_units = 1 @@ -117,7 +123,13 @@ args.wf_size = 16 args.wavefronts_per_cu = 4 args.num_cpus = 4 -args.num_dmas = 2 +# if user didn't specify number of DMAs, then assume 0 +if args.num_dmas < 1: + # currently VIPER does not support partial cache line writes, + # so we need numDMAs to be 0 for it + args.num_dmas = 0 +else: + args.num_dmas = 2 args.cu_per_sqc = 4 
args.cu_per_scalar_cache = 4 args.num_compute_units = 4 @@ -126,7 +138,13 @@ args.wf_size = 32 args.wavefronts_per_cu = 4 args.num_cpus = 4 -args.num_dmas = 4 +# if user didn't specify number of DMAs, then assume 0 +if args.num_dmas < 1: + # currently VIPER does not support partial cache line writes, + # so we need numDMAs to be 0 for it + args.num_dmas = 0 +else: + args.num_dmas = 4 args.cu_per_sqc = 4 args.cu_per_scalar_cache = 4 args.num_compute_units = 8 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57535 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2 Gerrit-Change-Number: 57535 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: change default GPU reg allocator to dynamic
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/57537 ) Change subject: configs, gpu-compute: change default GPU reg allocator to dynamic .. configs, gpu-compute: change default GPU reg allocator to dynamic The current default GPU register allocator is the "simple" policy, which only allows 1 wavefront to run at a time on each CU. This is not very realistic and also means the tester (when not specifically choosing the dynamic policy) is less rigorous in terms of validating correctness. To resolve this, this commit changes the default to the "dynamic" register allocator, which runs as many waves per CU as there are space in terms of registers and other resources -- thus it is more realistic and does a better job of ensuring test coverage. Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e --- M configs/example/apu_se.py 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 532fb98..b5fb9ff 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -161,7 +161,7 @@ ' m5_switchcpu pseudo-ops will toggle back and forth') parser.add_argument("--num-hw-queues", type=int, default=10, help="number of hw queues in packet processor") -parser.add_argument("--reg-alloc-policy", type=str, default="simple", +parser.add_argument("--reg-alloc-policy", type=str, default="dynamic", help="register allocation policy (simple/dynamic)") parser.add_argument("--dgpu", action="store_true", default=False, -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57537 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e Gerrit-Change-Number: 57537 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To 
unsubscribe send an email to gem5-dev-le...@gem5.org
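The difference between the two policies in the change above can be illustrated with a small sketch. This is not gem5 code, and the resource numbers (VGPRs per wave, VGPRs per CU, wavefront-slot cap) are hypothetical; the real CU model also tracks SGPRs, LDS, and barrier slots:

```python
def max_waves_per_cu(policy, vgprs_per_wave, cu_vgprs, max_wave_slots=40):
    """Illustrative occupancy under the two register-allocation policies.

    "simple" admits one wavefront per CU at a time; "dynamic" admits as
    many wavefronts as fit in the register file, capped by the hardware
    limit on wavefront slots. All parameter values here are hypothetical.
    """
    if policy == "simple":
        return 1
    elif policy == "dynamic":
        return min(cu_vgprs // vgprs_per_wave, max_wave_slots)
    raise ValueError(f"unknown policy: {policy}")

if __name__ == "__main__":
    print(max_waves_per_cu("simple", 64, 2048))   # 1
    print(max_waves_per_cu("dynamic", 64, 2048))  # 32
```

Running many wavefronts concurrently is what makes the dynamic policy exercise contention paths (register fragmentation, barriers, atomics) that the simple policy never reaches, which is the test-coverage argument the commit message makes.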
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Fix register checking and allocation in dyn manager
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/56909 ) Change subject: gpu-compute: Fix register checking and allocation in dyn manager .. gpu-compute: Fix register checking and allocation in dyn manager This patch updates the canAllocate function to account both for the number of regions of registers that need to be allocated, and for the fact that the registers aren't one continuous chunk. The patch also consolidates the registers as much as possible when a register chunk is freed. This prevents fragmentation from making it impossible to allocate enough registers Change-Id: Ic95cfe614d247add475f7139d3703991042f8149 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56909 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matthew Poremba --- M src/gpu-compute/dyn_pool_manager.cc 1 file changed, 69 insertions(+), 6 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/dyn_pool_manager.cc b/src/gpu-compute/dyn_pool_manager.cc index 62a39a9..3db5e7f 100644 --- a/src/gpu-compute/dyn_pool_manager.cc +++ b/src/gpu-compute/dyn_pool_manager.cc @@ -93,8 +93,24 @@ DynPoolManager::canAllocate(uint32_t numRegions, uint32_t size) { uint32_t actualSize = minAllocatedElements(size); -DPRINTF(GPUVRF,"Can Allocate %d\n",actualSize); -return (_totRegSpaceAvailable >= actualSize); +uint32_t numAvailChunks = 0; +DPRINTF(GPUVRF, "Checking if we can allocate %d regions of size %d " +"registers\n", numRegions, actualSize); +for (auto it : freeSpaceRecord) { +numAvailChunks += (it.second - it.first)/actualSize; +} + +if (numAvailChunks >= numRegions) { +DPRINTF(GPUVRF, "Able to allocate %d regions of size %d; " +"number of available regions: %d\n", +numRegions, actualSize, numAvailChunks); +return true; +} else { +DPRINTF(GPUVRF, 
"Unable to allocate %d regions of size %d; " +"number of available regions: %d\n", +numRegions, actualSize, numAvailChunks); +return false; +} } uint32_t @@ -105,7 +121,8 @@ uint32_t actualSize = minAllocatedElements(size); auto it = freeSpaceRecord.begin(); while (it != freeSpaceRecord.end()) { -if (it->second >= actualSize) { +uint32_t curChunkSize = it->second - it->first; +if (curChunkSize >= actualSize) { // assign the next block starting from here startIdx = it->first; _regionSize = actualSize; @@ -115,14 +132,13 @@ // This case sees if this chunk size is exactly equal to // the size of the requested chunk. If yes, then this can't // contribute to future requests and hence, should be removed -if (it->second == actualSize) { +if (curChunkSize == actualSize) { it = freeSpaceRecord.erase(it); // once entire freeSpaceRecord allocated, increment // reservedSpaceRecord count ++reservedSpaceRecord; } else { it->first += actualSize; -it->second -= actualSize; } break; } @@ -144,7 +160,32 @@ // Current dynamic register allocation does not handle wraparound assert(firstIdx < lastIdx); _totRegSpaceAvailable += lastIdx-firstIdx; -freeSpaceRecord.push_back(std::make_pair(firstIdx,lastIdx-firstIdx)); + +// Consolidate with other regions. 
Need to check if firstIdx or lastIdx +// already exist +auto firstIt = std::find_if( +freeSpaceRecord.begin(), +freeSpaceRecord.end(), +[&](const std::pair& element){ +return element.second == firstIdx;} ); + +auto lastIt = std::find_if( +freeSpaceRecord.begin(), +freeSpaceRecord.end(), +[&](const std::pair& element){ +return element.first == lastIdx;} ); + +if (firstIt != freeSpaceRecord.end() && lastIt != freeSpaceRecord.end()) { +firstIt->second = lastIt->second; +freeSpaceRecord.erase(lastIt); +} else if (firstIt != freeSpaceRecord.end()) { +firstIt->second = lastIdx; +} else if (lastIt != freeSpaceRecord.end()) { +lastIt->first = firstIdx; +} else { +freeSpaceRecord.push_back(std::make_pair(firstIdx, lastIdx)); +} + // remove corresponding entry from reservedSpaceRecord too --reservedSpaceRecord; } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/56909 To unsubscribe, or for help writing mail filters, visit
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Set scratch_base, lds_base for gfx902
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/54663 ) Change subject: gpu-compute: Set scratch_base, lds_base for gfx902 .. gpu-compute: Set scratch_base, lds_base for gfx902 When updating how scratch_base and lds_base were set, gfx902 was left out. This adds in gfx902 to the case statement, allowing the apertures to be set and for simulations using gfx902 to not error out Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54663 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matthew Poremba Tested-by: kokoro --- M src/gpu-compute/gpu_compute_driver.cc 1 file changed, 21 insertions(+), 0 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved; Looks good to me, approved Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc index e908f4e..d98f4c6 100644 --- a/src/gpu-compute/gpu_compute_driver.cc +++ b/src/gpu-compute/gpu_compute_driver.cc @@ -331,6 +331,7 @@ ldsApeBase(i + 1); break; case GfxVersion::gfx900: + case GfxVersion::gfx902: args->process_apertures[i].scratch_base = scratchApeBaseV9(); args->process_apertures[i].lds_base = @@ -631,6 +632,7 @@ ape_args->lds_base = ldsApeBase(i + 1); break; case GfxVersion::gfx900: + case GfxVersion::gfx902: ape_args->scratch_base = scratchApeBaseV9(); ape_args->lds_base = ldsApeBaseV9(); break; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/54663 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0 Gerrit-Change-Number: 54663 Gerrit-PatchSet: 2 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro Gerrit-CC: Bobby Bruce Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests, gpu-compute: test dynamic register policy in regressions
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/52163 ) Change subject: tests, gpu-compute: test dynamic register policy in regressions .. tests, gpu-compute: test dynamic register policy in regressions The GPU models support a simple register allocation policy (1 WF/CU at a time) and a dynamic register allocation policy (up to max WF/CU at a time). By default, the simple policy is used. However, the dynamic policy is much more realistic relative to real hardware and thus much more important to ensure it works in the regressions. This commit updates the nightly and weekly regressions accordingly to run the dynamic register allocation policy. Change-Id: Id263d3d5e19e4ff47f0eb6d9b08cbafdf2177fb9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52163 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Bobby R. Bruce --- M tests/weekly.sh M tests/nightly.sh 2 files changed, 40 insertions(+), 8 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved Bobby R. Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/nightly.sh b/tests/nightly.sh index b3708fd..41db369 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -83,7 +83,6 @@ ./main.py run --length long -j${threads} -t${threads} # Run the GPU tests. - # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. docker pull gcr.io/gem5-test/gcn-gpu:latest docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ @@ -101,7 +100,7 @@ # basic GPU functionality is working. docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c square +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c square # get HeteroSync wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel @@ -112,8 +111,8 @@ # atomics are tested. 
docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ ---options="sleepMutex 10 16 4" +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \ +allSyncPrims-1kernel --options="sleepMutex 10 16 4" # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs # accessing unique data and then joining a lock-free barrier, 10 Ld/St per @@ -122,5 +121,5 @@ # atomics are tested. docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ ---options="lfTreeBarrUniq 10 16 4" +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \ +allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4" diff --git a/tests/weekly.sh b/tests/weekly.sh index 51376bd..172d955 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -95,7 +95,7 @@ # stressing several GPU compute and memory components docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 --mem-size=8GB \ +configs/example/apu_se.py -n3 --mem-size=8GB --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh # test DNNMark @@ -137,6 +137,7 @@ "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ -c dnnmark_test_fwd_softmax \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ @@ -146,6 +147,7 @@ 
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool" \ -c dnnmark_test_fwd_pool \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark \ @@ -155,6 +157,7 @@ "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-w
[gem5-dev] Change in gem5/gem5[develop]: tests: add Pannotia to weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51968 ) Change subject: tests: add Pannotia to weekly regression .. tests: add Pannotia to weekly regression Add the Pannotia benchmarks to the weekly regression suite. These applications do a good job of testing the GPU support for irregular access patterns of various kinds. All inputs have been sized to use relatively small graphs to avoid increasing runtime too much. However, even with small input sizes Pannotia does run for a while. Note that the Pannotia benchmarks also use m5ops in them. Thus, this commit also adds support into the weekly regression for compiling the m5ops (for x86, since that is what the GPU model assumes for the CPU). Change-Id: I1f68b02b38ff24505a2894694b7544977024f8fa Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51968 Tested-by: kokoro Maintainer: Matt Sinclair Reviewed-by: Jason Lowe-Power --- M tests/weekly.sh 1 file changed, 165 insertions(+), 2 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index c7ba7e6..51376bd 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -44,10 +44,16 @@ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} +mkdir -p tests/testing-results + +# GPU weekly tests start here # before pulling gem5 resources, make sure it doesn't exist already docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ "rm -rf ${gem5_root}/gem5-resources" +# delete Pannotia datasets and output files in case a failed regression run left +# them around +rm -f coAuthorsDBLP.graph 1k_128k.gr result.out # Pull gem5 resources to the root of the gem5 directory -- currently the # pre-built binares for LULESH are out-of-date and won't run correctly with 
@@ -71,9 +77,14 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" -# test LULESH -mkdir -p tests/testing-results +# Some of the apps we test use m5ops (and x86), so compile them for x86 +# Note: setting TERM in the environment is necessary as scons fails for m5ops if +# it is not set. +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/util/m5" hacc-test-weekly bash -c \ +"export TERM=xterm-256color ; scons build/x86/out/m5" +# test LULESH # build LULESH docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/lulesh" \ @@ -163,8 +174,137 @@ --benchmark-root=${gem5_root}/gem5-resources/src/gpu/halo-finder/src/hip \ -c ForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb" +# test Pannotia +# Pannotia has 6 different benchmarks (BC, Color, FW, MIS, PageRank, SSSP), of +# which 3 (Color, PageRank, SSSP) have 2 different variants. Since they are +# useful for testing irregular GPU application behavior, we test each. 
+ +# build BC +docker run --rm -v ${PWD}:${PWD} \ + -w ${gem5_root}/gem5-resources/src/gpu/pannotia/bc -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; make gem5-fusion" + +# # get input dataset for BC test +wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr +# run BC +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ + hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \ + ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \ + --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 \ + --options="1k_128k.gr" + +# build Color Max +docker run --rm -v ${gem5_root}:${gem5_root} -w \ + ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; make gem5-fusion" + +# run Color (Max) (use same input dataset as BC for faster testing) +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ + hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \ + ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \ + --benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \ + -c color_max.gem5 --options="1k_128k.gr 0" + +# build Color (MaxMin) +docker run --rm -v ${gem5_root}:${gem5_root} -w \ + ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; export VARIANT=MAXMIN ; make gem5-fusion" + +# run Color (MaxMin) (use same input dataset as BC for faster testing) +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ +
[gem5-dev] Change in gem5/gem5[develop]: tests, gpu-compute: test dynamic register policy in weekly
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/52163 ) Change subject: tests, gpu-compute: test dynamic register policy in weekly .. tests, gpu-compute: test dynamic register policy in weekly The GPU models support a simple register allocation policy (1 WF/CU at a time) and a dynamic register allocation policy (up to max WF/CU at a time). By default, the simple policy is used. However, the dynamic policy is much more realistic relative to real hardware and thus much more important to ensure it works in the regressions. This commit updates the nightly and weekly regressions accordingly to run the dynamic register allocation policy. Change-Id: Id263d3d5e19e4ff47f0eb6d9b08cbafdf2177fb9 --- M tests/weekly.sh M tests/nightly.sh 2 files changed, 36 insertions(+), 8 deletions(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index b3708fd..41db369 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -83,7 +83,6 @@ ./main.py run --length long -j${threads} -t${threads} # Run the GPU tests. - # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. docker pull gcr.io/gem5-test/gcn-gpu:latest docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ @@ -101,7 +100,7 @@ # basic GPU functionality is working. docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c square +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c square # get HeteroSync wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel @@ -112,8 +111,8 @@ # atomics are tested. 
docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ ---options="sleepMutex 10 16 4" +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \ +allSyncPrims-1kernel --options="sleepMutex 10 16 4" # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs # accessing unique data and then joining a lock-free barrier, 10 Ld/St per @@ -122,5 +121,5 @@ # atomics are tested. docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ ---options="lfTreeBarrUniq 10 16 4" +configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \ +allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4" diff --git a/tests/weekly.sh b/tests/weekly.sh index 51376bd..172d955 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -95,7 +95,7 @@ # stressing several GPU compute and memory components docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 --mem-size=8GB \ +configs/example/apu_se.py -n3 --mem-size=8GB --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh # test DNNMark @@ -137,6 +137,7 @@ "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ -c dnnmark_test_fwd_softmax \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ @@ -146,6 +147,7 @@ 
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool" \ -c dnnmark_test_fwd_pool \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark \ @@ -155,6 +157,7 @@ "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --reg-alloc-policy=dynamic \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn" \ -c dnnmark_test_bwd
[gem5-dev] Change in gem5/gem5[develop]: tests: add Pannotia to weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51968 ) Change subject: tests: add Pannotia to weekly regression .. tests: add Pannotia to weekly regression Add the Pannotia benchmarks to the weekly regression suite. These applications do a good job of testing the GPU support for irregular access patterns of various kinds. All inputs have been sized to use relatively small graphs to avoid increasing runtime too much. However, even with small input sizes Pannotia does run for a while. Note that the Pannotia benchmarks also use m5ops in them. Thus, this commit also adds support into the weekly regression for compiling the m5ops (for x86, since that is what the GPU model assumes for the CPU). Change-Id: I1f68b02b38ff24505a2894694b7544977024f8fa --- M tests/weekly.sh 1 file changed, 160 insertions(+), 2 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index c7ba7e6..a9f7531 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -44,10 +44,15 @@ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} +mkdir -p tests/testing-results + +# GPU weekly tests start here # before pulling gem5 resources, make sure it doesn't exist already docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ "rm -rf ${gem5_root}/gem5-resources" +# delete Pannotia datasets in case a failed regression run left them around +rm -f coAuthorsDBLP.graph 1k_128k.gr # Pull gem5 resources to the root of the gem5 directory -- currently the # pre-built binares for LULESH are out-of-date and won't run correctly with @@ -71,9 +76,14 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" -# test LULESH -mkdir -p tests/testing-results +# Some of the apps we test use m5ops (and x86), so compile them for x86 +# Note: setting TERM in the environment is 
necessary as scons fails for m5ops if +# it is not set. +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/util/m5" hacc-test-weekly bash -c \ +"export TERM=xterm-256color ; scons build/x86/out/m5" +# test LULESH # build LULESH docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/lulesh" \ @@ -163,8 +173,137 @@ --benchmark-root=${gem5_root}/gem5-resources/src/gpu/halo-finder/src/hip \ -c ForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb" +# test Pannotia +# Pannotia has 6 different benchmarks (BC, Color, FW, MIS, PageRank, SSSP), of +# which 3 (Color, PageRank, SSSP) have 2 different variants. Since they are +# useful for testing irregular GPU application behavior, we test each. + +# build BC +docker run --rm -v ${PWD}:${PWD} \ + -w ${gem5_root}/gem5-resources/src/gpu/pannotia/bc -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; make gem5-fusion" + +# # get input dataset for BC test +wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr +# run BC +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ + hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \ + ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \ + --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 \ + --options="1k_128k.gr" + +# build Color Max +docker run --rm -v ${gem5_root}:${gem5_root} -w \ + ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; make gem5-fusion" + +# run Color (Max) (use same input dataset as BC for faster testing) +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ + hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \ + ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \ + --benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \ + -c color_max.gem5 --options="1k_128k.gr 0" + +# 
build Color (MaxMin) +docker run --rm -v ${gem5_root}:${gem5_root} -w \ + ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \ + hacc-test-weekly bash -c \ + "export GEM5_PATH=${gem5_root} ; export VARIANT=MAXMIN ; make gem5-fusion" + +# run Color (MaxMin) (use same input dataset as BC for faster testing) +docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \ + hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \ + ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \ + --benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \ + -c color_maxmin.gem5 --options="1k_128k.gr 0" + +# build FW +docker run --rm -v ${ge
[gem5-dev] Change in gem5/gem5[develop]: tests: fix bug in weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51907 ) Change subject: tests: fix bug in weekly regression .. tests: fix bug in weekly regression 66a056b8 changed the weekly regression to use a single docker for all GPU tests, to reduce how many times gem5 needed to be compiled. However, in my local testing of that patch, gem5-resources was not deleted until after the docker was created -- which causes a problem when gem5-resources does not exist already from a prior run, since the creation of the dockerfile requires it for HACC. This commit fixes this problem by moving the pull of gem5-resources to be before anything else related to the GPU happens. Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51907 Maintainer: Matt Sinclair Maintainer: Bobby R. Bruce Reviewed-by: Bobby R. Bruce Tested-by: kokoro --- M tests/weekly.sh 1 file changed, 37 insertions(+), 13 deletions(-) Approvals: Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index 12793da..c7ba7e6 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -44,6 +44,20 @@ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} +# before pulling gem5 resources, make sure it doesn't exist already +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "rm -rf ${gem5_root}/gem5-resources" + +# Pull gem5 resources to the root of the gem5 directory -- currently the +# pre-built binares for LULESH are out-of-date and won't run correctly with +# ROCm 4.0. In the meantime, we can build the binary as part of this script. +# Moreover, DNNMark builds a library and thus doesn't have a binary, so we +# need to build it before we run it. 
+ +# Need to pull this first because HACC's docker requires this path to exist +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" + # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. # HACC requires setting numerous environment variables to run correctly. To # avoid needing to set all of these, we instead build a docker for it, which @@ -57,20 +71,7 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" -# before pulling gem5 resources, make sure it doesn't exist already -docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" hacc-test-weekly bash -c \ - "rm -rf ${gem5_root}/gem5-resources" - # test LULESH -# Pull gem5 resources to the root of the gem5 directory -- currently the -# pre-built binares for LULESH are out-of-date and won't run correctly with -# ROCm 4.0. In the meantime, we can build the binary as part of this script. -# Moreover, DNNMark builds a library and thus doesn't have a binary, so we -# need to build it before we run it. -git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ -"${gem5_root}/gem5-resources" - mkdir -p tests/testing-results # build LULESH -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51907 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c Gerrit-Change-Number: 51907 Gerrit-PatchSet: 4 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Matthew Poremba Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: fix bug in weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51907 ) Change subject: tests: fix bug in weekly regression .. tests: fix bug in weekly regression 66a056b8 changed the weekly regression to use a single docker for all GPU tests, to reduce how many times gem5 needed to be compiled. However, in my local testing of that patch, gem5-resources was not deleted until after the docker was created -- which causes a problem when gem5-resources does not exist already from a prior run, since the creation of the dockerfile requires it for HACC. This commit fixes this problem by moving the pull of gem5-resources to be before anything else related to the GPU happens. Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c --- M tests/weekly.sh 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index 12793da..b91dbbc 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -44,6 +44,20 @@ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} +# before pulling gem5 resources, make sure it doesn't exist already +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" hacc-test-weekly bash -c \ + "rm -rf ${gem5_root}/gem5-resources" + +# Pull gem5 resources to the root of the gem5 directory -- currently the +# pre-built binares for LULESH are out-of-date and won't run correctly with +# ROCm 4.0. In the meantime, we can build the binary as part of this script. +# Moreover, DNNMark builds a library and thus doesn't have a binary, so we +# need to build it before we run it. +# Need to pull this first because HACC's docker requires this path to exist +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" + # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. 
# HACC requires setting numerous environment variables to run correctly. To # avoid needing to set all of these, we instead build a docker for it, which @@ -57,20 +71,7 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" -# before pulling gem5 resources, make sure it doesn't exist already -docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" hacc-test-weekly bash -c \ - "rm -rf ${gem5_root}/gem5-resources" - # test LULESH -# Pull gem5 resources to the root of the gem5 directory -- currently the -# pre-built binares for LULESH are out-of-date and won't run correctly with -# ROCm 4.0. In the meantime, we can build the binary as part of this script. -# Moreover, DNNMark builds a library and thus doesn't have a binary, so we -# need to build it before we run it. -git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ -"${gem5_root}/gem5-resources" - mkdir -p tests/testing-results # build LULESH -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51907 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c Gerrit-Change-Number: 51907 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: add HACC to weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51708 ) Change subject: tests: add HACC to weekly regression .. tests: add HACC to weekly regression This commit adds HACC (a GPU HPC workload) to the weekly regression tests. HACC requires a number of environment variables to be set, so to avoid setting all of them manually, we use a specific Dockerfile for it. To avoid compiling gem5 once for this docker and once for the other GPU tests in the weekly regression, this commit also updates the weekly regression such that all GPU weekly regression tests use HACC's docker for their tests. Change-Id: I9adabbca01537f031cbc491ddf1d3e7dd155f3f2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51708 Tested-by: kokoro Reviewed-by: Jason Lowe-Power Reviewed-by: Matt Sinclair Maintainer: Bobby R. Bruce --- M tests/weekly.sh 1 file changed, 56 insertions(+), 14 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve Matt Sinclair: Looks good to me, approved Bobby R. Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index 3f6a93c..12793da 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -45,15 +45,21 @@ ./main.py run --length very-long -j${threads} -t${threads} # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. +# HACC requires setting numerous environment variables to run correctly. 
To +# avoid needing to set all of these, we instead build a docker for it, which +# has all these variables pre-set in its Dockerfile +# To avoid compiling gem5 multiple times, all GPU benchmarks will use this docker pull gcr.io/gem5-test/gcn-gpu:latest +docker build -t hacc-test-weekly ${gem5_root}/gem5-resources/src/gpu/halo-finder + docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"${gem5_root}" hacc-test-weekly bash -c \ "scons build/GCN3_X86/gem5.opt -j${threads} \ -|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" +|| rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" # before pulling gem5 resources, make sure it doesn't exist already docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "${gem5_root}" hacc-test-weekly bash -c \ "rm -rf ${gem5_root}/gem5-resources" # test LULESH @@ -70,13 +76,13 @@ # build LULESH docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/lulesh" \ - -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \ + -u $UID:$GID hacc-test-weekly bash -c \ "make" # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +"${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 --mem-size=8GB \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh @@ -84,29 +90,29 @@ # setup cmake for DNNMark docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ - gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP" + hacc-test-weekly bash -c "./setup.sh HIP" # make the DNNMark library docker run --rm -u 
$UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \ -gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}" +hacc-test-weekly bash -c "make -j${threads}" # generate cachefiles -- since we are testing gfx801 and 4 CUs (default config) # in tester, we want cachefiles for this setup docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ "-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -gcr.io/gem5-test/gcn-gpu:latest bash -c \ +hacc-test-weekly bash -c \ "python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \ --num-cus=4" # generate mmap data for DNNMark (makes simulation much faster) docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \ "g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data" docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:l
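The scons fallback quoted in the diff above also lost its parentheses: `|| (rm -rf build && scons ...)` became `|| rm -rf build && scons ...`. In POSIX shells `&&` and `||` have equal precedence and associate left, so the two forms are not equivalent after a *successful* first build (for scons the extra re-run is typically a cheap up-to-date check, but it does execute). A small demonstration with `true` standing in for a successful scons invocation:

```shell
#!/bin/sh
# "a || b && c" groups as "(a || b) && c": when a succeeds, b is skipped
# but c still runs. "a || (b && c)" skips both on success.
grouped()   { true || (echo cleanup && echo rebuild); }
ungrouped() { true || echo cleanup && echo rebuild; }

echo "grouped:[$(grouped)]"       # first command succeeded, nothing else ran
echo "ungrouped:[$(ungrouped)]"   # "rebuild" still runs after success
```

Running this prints `grouped:[]` and `ungrouped:[rebuild]`, which is why the parenthesized form is the one the nightly script keeps.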
[gem5-dev] Change in gem5/gem5[develop]: tests: simplify weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51707 ) Change subject: tests: simplify weekly regression .. tests: simplify weekly regression DNNMark and LULESH were both cloning and removing gem5-resources as part of their tests, since they were committed separately/in parallel. Clean this up so we only remove and pull gem5-resources once now in the weekly regression script. Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51707 Reviewed-by: Bobby R. Bruce Maintainer: Bobby R. Bruce Tested-by: kokoro --- M tests/weekly.sh 1 file changed, 26 insertions(+), 16 deletions(-) Approvals: Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index 1d14f4f..3f6a93c 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -51,13 +51,17 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" -# test LULESH # before pulling gem5 resources, make sure it doesn't exist already -rm -rf ${gem5_root}/gem5-resources +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "rm -rf ${gem5_root}/gem5-resources" +# test LULESH # Pull gem5 resources to the root of the gem5 directory -- currently the # pre-built binares for LULESH are out-of-date and won't run correctly with -# ROCm 4.0. In the meantime, we can build the binary as part of this script +# ROCm 4.0. In the meantime, we can build the binary as part of this script. +# Moreover, DNNMark builds a library and thus doesn't have a binary, so we +# need to build it before we run it. 
git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ "${gem5_root}/gem5-resources" @@ -76,19 +80,7 @@ configs/example/apu_se.py -n3 --mem-size=8GB \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh -# get DNNMark -# Delete gem5 resources repo if it already exists -- need to do in docker -# because of cachefiles DNNMark creates -docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ - "rm -rf ${gem5_root}/gem5-resources" - -# Pull the gem5 resources to the root of the gem5 directory -- DNNMark -# builds a library and thus doesn't have a binary, so we need to build -# it before we run it -git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ -"${gem5_root}/gem5-resources" - +# test DNNMark # setup cmake for DNNMark docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51707 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44 Gerrit-Change-Number: 51707 Gerrit-PatchSet: 3 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Kyle Roarty Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Matthew Poremba Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: add HACC to weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51708 ) Change subject: tests: add HACC to weekly regression .. tests: add HACC to weekly regression This commit adds HACC (a GPU HPC workload) to the weekly regression tests. HACC requires a number of environment variables to be set, so to avoid setting all of them manually, we use a specific Dockerfile for it. To avoid compiling gem5 once for this docker and once for the other GPU tests in the weekly regression, this commit also updates the weekly regression such that all GPU weekly regression tests use HACC's docker for their tests. Change-Id: I9adabbca01537f031cbc491ddf1d3e7dd155f3f2 --- M tests/weekly.sh 1 file changed, 53 insertions(+), 15 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index 3f6a93c..f5308b5 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -45,15 +45,21 @@ ./main.py run --length very-long -j${threads} -t${threads} # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. +# HACC requires setting numerous environment variables to run correctly. 
To +# avoid needing to set all of these, we instead build a docker for it, which +# has all these variables pre-set in its Dockerfile +# To avoid compiling gem5 multiple times, all GPU benchmarks will use this docker pull gcr.io/gem5-test/gcn-gpu:latest +docker build -t hacc-test-weekly ${gem5_root}/gem5-resources/src/gpu/halo-finder + docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"${gem5_root}" hacc-test-weekly bash -c \ "scons build/GCN3_X86/gem5.opt -j${threads} \ -|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" +|| rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}" # before pulling gem5 resources, make sure it doesn't exist already docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "${gem5_root}" hacc-test-weekly bash -c \ "rm -rf ${gem5_root}/gem5-resources" # test LULESH @@ -70,43 +76,44 @@ # build LULESH docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/lulesh" \ - -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \ + -u $UID:$GID hacc-test-weekly bash -c \ "make" # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +"${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 --mem-size=8GB \ ---benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh +--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh \ +--options="1.0e-2 1" # test DNNMark # setup cmake for DNNMark docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ - gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh 
HIP" + hacc-test-weekly bash -c "./setup.sh HIP" # make the DNNMark library docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \ -gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}" +hacc-test-weekly bash -c "make -j${threads}" # generate cachefiles -- since we are testing gfx801 and 4 CUs (default config) # in tester, we want cachefiles for this setup docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ "-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ -gcr.io/gem5-test/gcn-gpu:latest bash -c \ +hacc-test-weekly bash -c \ "python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \ --num-cus=4" # generate mmap data for DNNMark (makes simulation much faster) docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \ "g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data" docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ -"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \ "./generate_rand_data" # now we can run DNNMark! @@ -117,7 +124,7 @@ # including both inference and training docker run --rm --volume "${gem5_root}":"$
[gem5-dev] Change in gem5/gem5[develop]: tests: simplify weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51707 ) Change subject: tests: simplify weekly regression .. tests: simplify weekly regression DNNMark and LULESH were both cloning and removing gem5-resources as part of their tests, since they were committed separately/in parallel. Clean this up so we only remove and pull gem5-resources once now in the weekly regression script. Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44 --- M tests/weekly.sh 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index 1d14f4f..c5811a4 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -51,13 +51,17 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" -# test LULESH # before pulling gem5 resources, make sure it doesn't exist already -rm -rf ${gem5_root}/gem5-resources +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "rm -rf ${gem5_root}/gem5-resources" +# test LULESH # Pull gem5 resources to the root of the gem5 directory -- currently the # pre-built binares for LULESH are out-of-date and won't run correctly with -# ROCm 4.0. In the meantime, we can build the binary as part of this script +# ROCm 4.0. In the meantime, we can build the binary as part of this script. +# Moreover, DNNMark builds a library and thus doesn't have a binary, so we +# need to build it before we run it. 
git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ "${gem5_root}/gem5-resources" @@ -76,19 +80,12 @@ configs/example/apu_se.py -n3 --mem-size=8GB \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh -# get DNNMark -# Delete gem5 resources repo if it already exists -- need to do in docker -# because of cachefiles DNNMark creates -docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ - "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ - "rm -rf ${gem5_root}/gem5-resources" - -# Pull the gem5 resources to the root of the gem5 directory -- DNNMark -# builds a library and thus doesn't have a binary, so we need to build -# it before we run it +# get DNNMark; it builds a library and thus doesn't have a binary, so we +# need to build it before we run it git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ "${gem5_root}/gem5-resources" +# test DNNMark # setup cmake for DNNMark docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51707 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44 Gerrit-Change-Number: 51707 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: fix typo in GPU VIPER TCC comment
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51687 ) Change subject: mem-ruby: fix typo in GPU VIPER TCC comment .. mem-ruby: fix typo in GPU VIPER TCC comment 72ee6d1a fixed a deadlock in the GPU VIPER TCC. However, it inadvertently added a typo to the comments explaining the change. This commit fixes that. Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51687 Maintainer: Matt Sinclair Reviewed-by: Jason Lowe-Power Tested-by: kokoro --- M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 1 file changed, 21 insertions(+), 4 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 571587f..dc6cf03 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -618,19 +618,19 @@ // they can cause a resource stall deadlock! 
transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; } transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; } transition(IV, {WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; @@ -681,7 +681,7 @@ transition(A, Atomic) { p_profileMiss; -// put putting the stalled requests in a buffer, we reduce resource contention +// by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51687 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c Gerrit-Change-Number: 51687 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To 
unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: fix typo in GPU VIPER TCC comment
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51687 ) Change subject: mem-ruby: fix typo in GPU VIPER TCC comment .. mem-ruby: fix typo in GPU VIPER TCC comment 72ee6d1a fixed a deadlock in the GPU VIPER TCC. However, it inadvertently added a typo to the comments explaining the change. This commit fixes that. Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c --- M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 571587f..dc6cf03 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -618,19 +618,19 @@ // they can cause a resource stall deadlock! transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; } transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; } transition(IV, {WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - // put putting the stalled requests in a buffer, we reduce resource contention + // by putting the stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; @@ -681,7 +681,7 @@ transition(A, Atomic) { p_profileMiss; -// put putting the stalled requests in a buffer, we reduce resource contention +// by putting the 
stalled requests in a buffer, we reduce resource contention // since they won't try again every cycle and will instead only try again once // woken up st_stallAndWaitRequest; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51687 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c Gerrit-Change-Number: 51687 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: convert all nightly GPU tests from GUID to GID
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51567 ) Change subject: tests: convert all nightly GPU tests from GUID to GID .. tests: convert all nightly GPU tests from GUID to GID As part of the docker commands for the nightly GPU regression tests, earlier commits inadvertently used GUID instead of GID, where GUID does not exist. This causes some failures when run in Jenkins. This patch fixes this issue. Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51567 Reviewed-by: Jason Lowe-Power Reviewed-by: Bobby R. Bruce Maintainer: Jason Lowe-Power Maintainer: Bobby R. Bruce Tested-by: kokoro --- M tests/nightly.sh 1 file changed, 24 insertions(+), 4 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/tests/nightly.sh b/tests/nightly.sh index 30e2c58..b3708fd 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -86,7 +86,7 @@ # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. docker pull gcr.io/gem5-test/gcn-gpu:latest -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" @@ -99,7 +99,7 @@ # Square is the simplest, fastest, more heavily tested GPU application # Thus, we always want to run this in the nightly regressions to make sure # basic GPU functionality is working. 
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c square @@ -110,7 +110,7 @@ # 10 Ld/St per thread and 4 iterations of the critical section is a reasonable # moderate contention case for the default 4 CU GPU config and help ensure GPU # atomics are tested. -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ --options="sleepMutex 10 16 4" @@ -120,7 +120,7 @@ # thread, 4 iterations of critical section. Again this is representative of a # moderate contention case for the default 4 CU GPU config and help ensure GPU # atomics are tested. -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c allSyncPrims-1kernel \ --options="lfTreeBarrUniq 10 16 4" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51567 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377 Gerrit-Change-Number: 51567 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. 
Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
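The GUID-to-GID fix above is easy to miss in the diff: `GUID` is not a variable the shell defines, so it expands to an empty string and the user spec passed to `docker run -u` degenerates to `uid:` with no group. A sketch of the expansion (docker itself is not invoked here; `id` supplies the values):

```shell
#!/bin/sh
# Why "-u $UID:$GUID" broke: GUID is unset, so the user spec loses its
# group half and ends in a bare colon, which docker rejects or misreads.
unset GUID                        # make the pre-fix condition explicit
uid=$(id -u)
gid=$(id -g)

userspec_bad="${uid}:${GUID}"     # GUID expands to nothing -> "uid:"
userspec_good="${uid}:${gid}"     # what the GID fix produces

echo "bad=${userspec_bad} good=${userspec_good}"
case "$userspec_bad" in
    *:) echo "malformed user spec" ;;
esac
```

Note that bash *does* predefine `UID` (read-only), which is exactly why the typo survived casual review: half of the expression worked.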
[gem5-dev] Change in gem5/gem5[develop]: tests: add additional space in weekly DNNMark tests
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51453 ) Change subject: tests: add additional space in weekly DNNMark tests .. tests: add additional space in weekly DNNMark tests Add space between -c and binary name for all DNNMark tests to conform to the other tests style and reduce confusion. Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51453 Reviewed-by: Jason Lowe-Power Reviewed-by: Bobby R. Bruce Maintainer: Jason Lowe-Power Maintainer: Bobby R. Bruce Tested-by: kokoro --- M tests/weekly.sh 1 file changed, 21 insertions(+), 3 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index 33dd70b..1d14f4f 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -128,7 +128,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ - -cdnnmark_test_fwd_softmax \ + -c dnnmark_test_fwd_softmax \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" @@ -137,7 +137,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool" \ - -cdnnmark_test_fwd_pool \ + -c dnnmark_test_fwd_pool \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" @@ 
-146,7 +146,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn" \ - -cdnnmark_test_bwd_bn \ + -c dnnmark_test_bwd_bn \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51453 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624 Gerrit-Change-Number: 51453 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
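As the commit message says, the `-cdnnmark_...` to `-c dnnmark_...` change is about style and confusion, not behavior: short options are conventionally accepted in both fused and spaced form. A sketch using POSIX `getopts` as a stand-in (treating gem5's own option parser as behaving the same way is an assumption here, based on the commit calling this a style fix):

```shell
#!/bin/sh
# getopts accepts "-cVALUE" and "-c VALUE" identically, so adding the
# space changes readability only. getopts stands in for gem5's parser.
parse() {
    OPTIND=1
    while getopts "c:" opt; do
        [ "$opt" = "c" ] && printf '%s\n' "$OPTARG"
    done
}

fused=$(parse -cdnnmark_test_fwd_softmax)
spaced=$(parse -c dnnmark_test_fwd_softmax)
[ "$fused" = "$spaced" ] && echo "equivalent: $fused"
```

Both invocations yield the same `OPTARG`, so only the spaced, easier-to-scan form survives in weekly.sh.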
[gem5-dev] Change in gem5/gem5[develop]: tests: fix LULESH weekly regression command
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51207 ) Change subject: tests: fix LULESH weekly regression command .. tests: fix LULESH weekly regression command 7756c5e added LULESH to the weekly regression script. However, it assumed a local installation of gem5-resources which it should not have. This commit fixes that so the weekly regression builds the LULESH binary and then runs it instead. Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51207 Maintainer: Matt Sinclair Maintainer: Bobby R. Bruce Reviewed-by: Jason Lowe-Power Reviewed-by: Bobby R. Bruce Tested-by: kokoro --- M tests/weekly.sh 1 file changed, 38 insertions(+), 4 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index c699f65..33dd70b 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -46,21 +46,35 @@ # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. docker pull gcr.io/gem5-test/gcn-gpu:latest -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" -# get LULESH -wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh +# test LULESH +# before pulling gem5 resources, make sure it doesn't exist already +rm -rf ${gem5_root}/gem5-resources + +# Pull gem5 resources to the root of the gem5 directory -- currently the +# pre-built binares for LULESH are out-of-date and won't run correctly with +# ROCm 4.0. 
In the meantime, we can build the binary as part of this script +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" mkdir -p tests/testing-results +# build LULESH +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}/gem5-resources/src/gpu/lulesh" \ + -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "make" + # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 --mem-size=8GB -clulesh +configs/example/apu_se.py -n3 --mem-size=8GB \ +--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh # get DNNMark # Delete gem5 resources repo if it already exists -- need to do in docker -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51207 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5 Gerrit-Change-Number: 51207 Gerrit-PatchSet: 7 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Alex Dutu Gerrit-CC: Kyle Roarty Gerrit-CC: Matthew Poremba Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
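The corrected command above pairs `--benchmark-root` with `-c lulesh` so the freshly built binary is found instead of a stale pre-built one. A sketch of that lookup, under stated assumptions: the temp paths and the empty `lulesh` file are stand-ins for the real build output, and the root-plus-name resolution rule is inferred from the fix's intent rather than read out of apu_se.py:

```shell
#!/bin/sh
# The runner is assumed to resolve the "-c" name relative to
# --benchmark-root; an empty file stands in for the built binary.
benchmark_root=$(mktemp -d)/gem5-resources/src/gpu/lulesh/bin
mkdir -p "$benchmark_root"
: > "$benchmark_root/lulesh"      # pretend "make" just produced this

cmd=lulesh
found=no
[ -e "$benchmark_root/$cmd" ] && found=yes
echo "resolved $benchmark_root/$cmd ($found)"

rm -rf "${benchmark_root%/gem5-resources*}"
```

Building inside the container and then pointing the run at `.../lulesh/bin` is what lets the weekly test track ROCm updates instead of depending on pre-built binaries.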
[gem5-dev] Change in gem5/gem5[develop]: tests: convert all nightly GPU tests from GUID to GID
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51567 ) Change subject: tests: convert all nightly GPU tests from GUID to GID .. tests: convert all nightly GPU tests from GUID to GID As part of the docker commands for the nightly GPU regression tests, earlier commits inadvertently used GUID instead of GID, where GUID does not exist. This causes some failures when run in Jenkins. This patch fixes this issue. Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377 --- M tests/nightly.sh 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index 89c7005..4e7420d 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -86,7 +86,7 @@ # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. docker pull gcr.io/gem5-test/gcn-gpu:latest -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" @@ -99,7 +99,7 @@ # Square is the simplest, fastest, more heavily tested GPU application # Thus, we always want to run this in the nightly regressions to make sure # basic GPU functionality is working. -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -c square @@ -110,7 +110,7 @@ # 10 Ld/St per thread and 4 iterations of the critical section is a reasonable # moderate contention case for the default 4 CU GPU config and help ensure GPU # atomics are tested. 
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \ -c allSyncPrims-1kernel --options="sleepMutex 10 16 4" @@ -120,7 +120,7 @@ # thread, 4 iterations of critical section. Again this is representative of a # moderate contention case for the default 4 CU GPU config and help ensure GPU # atomics are tested. -docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \ -c allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51567 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377 Gerrit-Change-Number: 51567 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
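For context on why `$GUID` failed silently: bash predefines `$UID` but not `$GID`, and no common shell defines `$GUID`, so `-u $UID:$GUID` expanded to an id with an empty group. A hedged sketch of deriving both ids explicitly (variable names here are illustrative, not from nightly.sh):

```shell
# bash sets $UID but not $GID; $GUID is set by no common shell, so
# "-u $UID:$GUID" expanded to something like "-u 1000:".
# Deriving both ids explicitly avoids shell-specific variables.
uid="${UID:-$(id -u)}"
gid="${GID:-$(id -g)}"
docker_user_arg="-u ${uid}:${gid}"
```

With this, the docker invocation no longer depends on which variables the invoking shell happens to export.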
[gem5-dev] Change in gem5/gem5[develop]: tests: add additional space in weekly DNNMark tests
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51453 ) Change subject: tests: add additional space in weekly DNNMark tests .. tests: add additional space in weekly DNNMark tests Add space between -c and binary name for all DNNMark tests to conform to the other tests style and reduce confusion. Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624 --- M tests/weekly.sh 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index 5a8accc..5997ae9 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -128,7 +128,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ - -cdnnmark_test_fwd_softmax \ + -c dnnmark_test_fwd_softmax \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" @@ -137,7 +137,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool" \ - -cdnnmark_test_fwd_pool \ + -c dnnmark_test_fwd_pool \ --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" @@ -146,7 +146,7 @@ -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn" \ - -cdnnmark_test_bwd_bn \ + -c dnnmark_test_bwd_bn \ --options="-config 
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark \ -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51453 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624 Gerrit-Change-Number: 51453 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: fix square and HeteroSync nightly regression command
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51247 ) Change subject: tests: fix square and HeteroSync nightly regression command .. tests: fix square and HeteroSync nightly regression command Square and HeteroSync's pre-built binaries were downloaded into the tests folder in the nightly regression script, but the docker command running them assumed we were in GEM5_ROOT. This commit fixes this problem by specifying the benchmark root for the applications. Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51247 Tested-by: kokoro Reviewed-by: Jason Lowe-Power Maintainer: Bobby R. Bruce --- M tests/nightly.sh 1 file changed, 24 insertions(+), 5 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved Bobby R. Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/nightly.sh b/tests/nightly.sh index 6631bb0..89c7005 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -101,7 +101,7 @@ # basic GPU functionality is working. docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c square +configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -c square # get HeteroSync wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel @@ -112,8 +112,8 @@ # atomics are tested.
docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ ---options="sleepMutex 10 16 4" +configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \ +-c allSyncPrims-1kernel --options="sleepMutex 10 16 4" # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs # accessing unique data and then joining a lock-free barrier, 10 Ld/St per # thread, 4 iterations of critical section. Again this is representative of a # moderate contention case for the default 4 CU GPU config and help ensure GPU # atomics are tested. docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ ---options="lfTreeBarrUniq 10 16 4" +configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \ +-c allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51247 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac Gerrit-Change-Number: 51247 Gerrit-PatchSet: 5 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Alex Dutu Gerrit-CC: Kyle Roarty Gerrit-CC: Matthew Poremba Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
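The bug fixed above is a path-resolution one: the `-c` binary name is looked up relative to a benchmark root (or, before the fix, the current directory). A small sketch of that lookup, using a hypothetical `resolve_binary` helper rather than gem5's actual apu_se.py logic:

```shell
# Hypothetical helper mimicking how a -c name is resolved against
# --benchmark-root; a sketch, not gem5's actual lookup code.
resolve_binary() {
    root="$1"; name="$2"
    [ -x "${root}/${name}" ] && echo "${root}/${name}"
}

bench_root=$(mktemp -d)          # stands in for ${gem5_root}/tests
touch "${bench_root}/square"
chmod +x "${bench_root}/square"

bin_path=$(resolve_binary "$bench_root" square)
```

Without the explicit root, the same lookup against the working directory finds nothing, which is the failure the commit describes.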
[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51371 ) Change subject: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues .. dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues GFX7 (not supported in gem5) and GFX8 have a bug with how virtual addresses are calculated for their HSA queues. The ROCr component of ROCm solves this problem by doubling the HSA queue size that is requested, then mapping all virtual addresses in the second half of the queue to the same virtual addresses as the first half of the queue. This commit fixes gem5's support to mimic this behavior. Note that this change does not affect Vega's HSA queue support, because according to the ROCm documentation, Vega does not have the same problem as GCN3. Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51371 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Matthew Poremba --- M src/dev/hsa/hsa_packet_processor.cc M src/dev/hsa/hsa_packet_processor.hh M src/gpu-compute/gpu_compute_driver.cc M src/dev/hsa/hw_scheduler.cc M src/dev/hsa/hw_scheduler.hh 5 files changed, 75 insertions(+), 17 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc index 0427def..22124b1 100644 --- a/src/dev/hsa/hsa_packet_processor.cc +++ b/src/dev/hsa/hsa_packet_processor.cc @@ -44,6 +44,7 @@ #include "dev/dma_device.hh" #include "dev/hsa/hsa_packet.hh" #include "dev/hsa/hw_scheduler.hh" +#include "enums/GfxVersion.hh" #include "gpu-compute/gpu_command_processor.hh" #include "mem/packet_access.hh" #include "mem/page_table.hh" @@ -100,13 +101,15 @@ HSAPacketProcessor::setDeviceQueueDesc(uint64_t hostReadIndexPointer, uint64_t basePointer, uint64_t queue_id, - uint32_t size, int doorbellSize) + uint32_t size, int doorbellSize, 
+ GfxVersion gfxVersion) { DPRINTF(HSAPacketProcessor, "%s:base = %p, qID = %d, ze = %d\n", __FUNCTION__, (void *)basePointer, queue_id, size); hwSchdlr->registerNewQueue(hostReadIndexPointer, - basePointer, queue_id, size, doorbellSize); + basePointer, queue_id, size, doorbellSize, + gfxVersion); } AddrRangeList diff --git a/src/dev/hsa/hsa_packet_processor.hh b/src/dev/hsa/hsa_packet_processor.hh index 9545006..aabe24e 100644 --- a/src/dev/hsa/hsa_packet_processor.hh +++ b/src/dev/hsa/hsa_packet_processor.hh @@ -39,9 +39,11 @@ #include #include "base/types.hh" +#include "debug/HSAPacketProcessor.hh" #include "dev/dma_virt_device.hh" #include "dev/hsa/hsa.h" #include "dev/hsa/hsa_queue.hh" +#include "enums/GfxVersion.hh" #include "params/HSAPacketProcessor.hh" #include "sim/eventq.hh" @@ -84,14 +86,16 @@ uint64_t hostReadIndexPtr; bool stalledOnDmaBufAvailability; bool dmaInProgress; +GfxVersion gfxVersion; HSAQueueDescriptor(uint64_t base_ptr, uint64_t db_ptr, - uint64_t hri_ptr, uint32_t size) + uint64_t hri_ptr, uint32_t size, + GfxVersion gfxVersion) : basePointer(base_ptr), doorbellPointer(db_ptr), writeIndex(0), readIndex(0), numElts(size / AQL_PACKET_SIZE), hostReadIndexPtr(hri_ptr), stalledOnDmaBufAvailability(false), -dmaInProgress(false) +dmaInProgress(false), gfxVersion(gfxVersion) { } uint64_t spaceRemaining() { return numElts - (writeIndex - readIndex); } uint64_t spaceUsed() { return writeIndex - readIndex; } @@ -102,15 +106,38 @@ uint64_t ptr(uint64_t ix) { -/** - * Sometimes queues report that their size is 512k, which would - * indicate numElts of 0x2000. However, they only have 256k - * mapped which means any index over 0x1000 will fail an - * address translation. 
+/* + * Based on ROCm Documentation: + * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/ + 10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/ + rocr/src/core/runtime/amd_aql_queue.cpp#L99 + * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/ + 10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/ + rocr/src/core/runtime/amd_aql_queue.cpp#L
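The doubled-queue workaround described in this change comes down to modular arithmetic: ROCr allocates twice the requested GFX8 queue size, and an index into the upper (unmapped) half wraps onto the same slot in the lower half. A sketch with illustrative values (not gem5's actual constants):

```shell
# Illustrative sketch of the GFX8 wrap: upper-half queue indices map
# (mod numElts) onto the lower half. numElts is an example value.
numElts=4096                          # slots actually backed by mappings
wrap_idx() { echo $(( $1 % numElts )); }

low=$(wrap_idx 17)                    # lower half: index unchanged
high=$(wrap_idx $(( numElts + 17 )))  # upper half: same slot as 'low'
```

Both indices land on slot 17, which is exactly the aliasing the commit mimics so upper-half accesses do not fail address translation.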
[gem5-dev] Change in gem5/gem5[develop]: tests: add DNNMark to weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51187 ) Change subject: tests: add DNNMark to weekly regression .. tests: add DNNMark to weekly regression DNNMark is representative of several simple (fast) layers within ML applications, which are heavily used in modern GPU applications. Thus, we want to make sure support for these applications is tested. This commit updates the weekly regression to run three variants: fwd_softmax, bwd_bn, and fwd_pool -- ensuring we test both inference and training as well as a variety of ML layers. Change-Id: I38bfa9bd3a2817099ece46afc2d6132ce346e21a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51187 Reviewed-by: Bobby R. Bruce Maintainer: Bobby R. Bruce Tested-by: kokoro --- M tests/weekly.sh 1 file changed, 103 insertions(+), 1 deletion(-) Approvals: Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index b697c29..c699f65 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -58,4 +58,86 @@ # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 --mem-size=8GB -clulesh + +# get DNNMark +# Delete gem5 resources repo if it already exists -- need to do in docker +# because of cachefiles DNNMark creates +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "rm -rf ${gem5_root}/gem5-resources" + +# Pull the gem5 resources to the root of the
gem5 directory -- DNNMark +# builds a library and thus doesn't have a binary, so we need to build +# it before we run it +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" + +# setup cmake for DNNMark +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ + gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP" + +# make the DNNMark library +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \ +gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}" + +# generate cachefiles -- since we are testing gfx801 and 4 CUs (default config) +# in tester, we want cachefiles for this setup +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" \ +"-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ +gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \ +--num-cus=4" + +# generate mmap data for DNNMark (makes simulation much faster) +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data" + +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"./generate_rand_data" + +# now we can run DNNMark! +# DNNMark is representative of several simple (fast) layers within ML +# applications, which are heavily used in modern GPU applications. So, we want +# to make sure support for these applications are tested. 
Run three variants: +# fwd_softmax, bwd_bn, fwd_pool; these tests ensure we run a variety of ML kernels, +# including both inference and training +docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ + -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ + "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ + -cdnnmark_test_fwd_softmax \ + --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ + -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" + +docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/
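The ordering of the docker steps in this change matters because DNNMark builds a library rather than shipping a binary: cmake setup, the library build, cachefile generation, and mmap-data generation must all finish before gem5 can run a benchmark. A stub sketch of that ordering, where each function is a stand-in for one of the docker invocations quoted above (none of these function names exist in the real script):

```shell
# Stub sketch of the ordering tests/weekly.sh enforces for DNNMark.
steps=""
setup_cmake()    { steps="${steps}setup "; }   # ./setup.sh HIP
build_library()  { steps="${steps}build "; }   # make -j${threads}
gen_cachefiles() { steps="${steps}cache "; }   # generate_cachefiles.py
gen_mmap_data()  { steps="${steps}mmap ";  }   # generate_rand_data
run_gem5()       { steps="${steps}run";    }   # apu_se.py -c dnnmark_*

# each step must succeed before the next runs
setup_cmake && build_library && gen_cachefiles && gen_mmap_data && run_gem5
```

Skipping or reordering any stage (e.g. running gem5 before the cachefiles exist) is what the script's in-docker cleanup and rebuild steps guard against.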
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51367 ) Change subject: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock .. mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock In the GPU VIPER TCC, programs with mixes of atomics and data accesses to the same address, in the same kernel, can experience deadlock when large applications (e.g., Pannotia's graph analytics algorithms) are running on very small GPUs (e.g., the default 4 CU GPU configuration). In this situation, deadlocks occur due to resource stalls interacting with the behavior of the current implementation for handling races between atomic accesses. The specific order of events causing this deadlock is: 1. TCC is waiting on an atomic to return from directory 2. In the meantime it receives another atomic to the same address -- when this happens, the TCC increments the number of atomics to this address (numAtomics = 2) that are pending in TBE, and does a write through of the atomic to the directory. 3. When the first atomic returns from the Directory, it decrements the numAtomics counter. numAtomics was at 2 though, because of step #2. So it doesn't deallocate the TBE entry and calls Event:AtomicNotDone. 4. Another request (a LD) to the same address comes along. The LD does z_stall since the second atomic is pending -- so the LD retries every cycle until the deadlock counter times out (or until the second atomic comes back). 5. The second atomic returns to the TCC. However, because there are so many LD's pending in the cache, all doing z_stall's and retrying every cycle, there are a lot of resource stalls. So, when the second atomic returns, it is forced to retry its operation multiple times -- and each time it decrements the atomicDoneCnt flag (which was added to catch a race between atomics arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result atomicDoneCnt becomes negative. 6.
Since this atomicDoneCnt flag is used to determine when Event:AtomicDone happens, and since the resource stalls caused the atomicDoneCnt flag to become negative, we never complete the atomic. Which means the pending LD can never access the line, because it's stuck waiting for the atomic to complete. 7. Eventually the deadlock threshold is reached. To fix this issue, this commit changes the VIPER TCC protocol from using z_stall to using the stall_and_wait buffer method that the Directory-level of the SLICC already uses. This change effectively prevents resource stalls from dominating the TCC level, by putting pending requests for a given address in a per-address stall buffer. These requests are then woken up when the pending request returns. As part of this change, this change also makes two small changes to the Directory-level protocol (MOESI_AMD_BASE-dir): 1. Updated the names of the wakeup actions to match the TCC wakeup actions, to avoid confusion. 2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers, as some requests were being placed later in the stall buffer than was being checked. This mirrors the changes in 187c44fe44 to other Directory transitions to resolve races between GPU and DMA requests, but for transitions prior workloads did not stress. 
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367 Reviewed-by: Jason Lowe-Power Reviewed-by: Matthew Poremba Maintainer: Jason Lowe-Power Tested-by: kokoro --- M src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 2 files changed, 109 insertions(+), 15 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved Matthew Poremba: Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 6c07416..6112f38 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -126,6 +126,7 @@ void unset_tbe(); void wakeUpAllBuffers(); void wakeUpBuffers(Addr a); + void wakeUpAllBuffers(Addr a); MachineID mapAddressToMachine(Addr addr, MachineType mtype); @@ -569,6 +570,14 @@ probeNetwork_in.dequeue(clockEdge()); } + action(st_stallAndWaitRequest, "st", desc="Stall and wait on the address") { +stall_and_wait(coreRequestNetwork_in, address); + } + + action(wada_wakeUpAllDependentsAddr, "wada", desc="Wake up any requests waiting for this address") { +wakeUpAllBuffers(address); + } + action(z_stall, "z", desc="stall") { // built-in } @@ -606,13 +615,22 @@ // they can cause a resource stall deadlock! transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - z_stall; + // put putting the stalled requests
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Move VIPER TCC decrements to action from in_port
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51368 ) Change subject: mem-ruby: Move VIPER TCC decrements to action from in_port .. mem-ruby: Move VIPER TCC decrements to action from in_port Currently, the GPU VIPER TCC protocol handles races between atomics in the triggerQueue_in. This in_port does not check for resource availability, which can cause the trigger queue to execute multiple times. Although this is the expected behavior, the code for handling atomic races decrements the atomicDoneCnt flag in the trigger queue, which is not safe since resource contention may cause it to execute multiple times. To resolve this issue, this commit moves the decrementing of this counter to a new action that is called in an event that happens only when the race between atomics is detected. Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368 Reviewed-by: Jason Lowe-Power Reviewed-by: Matthew Poremba Maintainer: Jason Lowe-Power Tested-by: kokoro --- M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 1 file changed, 31 insertions(+), 1 deletion(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved Matthew Poremba: Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 6112f38..571587f 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -268,7 +268,6 @@ if (tbe.numAtomics == 0 && tbe.atomicDoneCnt == 1) { trigger(Event:AtomicDone, in_msg.addr, cache_entry, tbe); } else { -tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1; trigger(Event:AtomicNotDone, in_msg.addr, cache_entry, tbe); } } @@ -599,6 +598,10 @@ } } + action(dadc_decrementAtomicDoneCnt, "dadc", desc="decrement atomics done cnt flag") { +tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1; + } + action(ptr_popTriggerQueue, 
"ptr", desc="pop Trigger") { triggerQueue_in.dequeue(clockEdge()); } @@ -787,6 +790,7 @@ } transition(A, AtomicNotDone) {TagArrayRead} { +dadc_decrementAtomicDoneCnt; ptr_popTriggerQueue; } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51368 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f Gerrit-Change-Number: 51368 Gerrit-PatchSet: 3 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51371 ) Change subject: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues .. dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues GFX7 (not supported in gem5) and GFX8 have a bug with how virtual addresses are calculated for their HSA queues. The ROCr component of ROCm solves this problem by doubling the HSA queue size that is requested, then mapping all virtual addresses in the second half of the queue to the same virtual addresses as the first half of the queue. This commit fixes gem5's support to mimic this behavior. Note that this change does not affect Vega's HSA queue support, because according to the ROCm documentation, Vega does not have the same problem as GCN3. Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51 --- M src/dev/hsa/hsa_packet_processor.cc M src/dev/hsa/hsa_packet_processor.hh M src/gpu-compute/gpu_compute_driver.cc M src/dev/hsa/hw_scheduler.cc M src/dev/hsa/hw_scheduler.hh 5 files changed, 71 insertions(+), 17 deletions(-) diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc index 0427def..22124b1 100644 --- a/src/dev/hsa/hsa_packet_processor.cc +++ b/src/dev/hsa/hsa_packet_processor.cc @@ -44,6 +44,7 @@ #include "dev/dma_device.hh" #include "dev/hsa/hsa_packet.hh" #include "dev/hsa/hw_scheduler.hh" +#include "enums/GfxVersion.hh" #include "gpu-compute/gpu_command_processor.hh" #include "mem/packet_access.hh" #include "mem/page_table.hh" @@ -100,13 +101,15 @@ HSAPacketProcessor::setDeviceQueueDesc(uint64_t hostReadIndexPointer, uint64_t basePointer, uint64_t queue_id, - uint32_t size, int doorbellSize) + uint32_t size, int doorbellSize, + GfxVersion gfxVersion) { DPRINTF(HSAPacketProcessor, "%s:base = %p, qID = %d, ze = %d\n", __FUNCTION__, (void *)basePointer, queue_id, size); hwSchdlr->registerNewQueue(hostReadIndexPointer, - basePointer, queue_id, size, doorbellSize); + basePointer, 
queue_id, size, doorbellSize, + gfxVersion); } AddrRangeList diff --git a/src/dev/hsa/hsa_packet_processor.hh b/src/dev/hsa/hsa_packet_processor.hh index 9545006..aabe24e 100644 --- a/src/dev/hsa/hsa_packet_processor.hh +++ b/src/dev/hsa/hsa_packet_processor.hh @@ -39,9 +39,11 @@ #include #include "base/types.hh" +#include "debug/HSAPacketProcessor.hh" #include "dev/dma_virt_device.hh" #include "dev/hsa/hsa.h" #include "dev/hsa/hsa_queue.hh" +#include "enums/GfxVersion.hh" #include "params/HSAPacketProcessor.hh" #include "sim/eventq.hh" @@ -84,14 +86,16 @@ uint64_t hostReadIndexPtr; bool stalledOnDmaBufAvailability; bool dmaInProgress; +GfxVersion gfxVersion; HSAQueueDescriptor(uint64_t base_ptr, uint64_t db_ptr, - uint64_t hri_ptr, uint32_t size) + uint64_t hri_ptr, uint32_t size, + GfxVersion gfxVersion) : basePointer(base_ptr), doorbellPointer(db_ptr), writeIndex(0), readIndex(0), numElts(size / AQL_PACKET_SIZE), hostReadIndexPtr(hri_ptr), stalledOnDmaBufAvailability(false), -dmaInProgress(false) +dmaInProgress(false), gfxVersion(gfxVersion) { } uint64_t spaceRemaining() { return numElts - (writeIndex - readIndex); } uint64_t spaceUsed() { return writeIndex - readIndex; } @@ -102,15 +106,38 @@ uint64_t ptr(uint64_t ix) { -/** - * Sometimes queues report that their size is 512k, which would - * indicate numElts of 0x2000. However, they only have 256k - * mapped which means any index over 0x1000 will fail an - * address translation. 
+/* + * Based on ROCm Documentation: + * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/ + 10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/ + rocr/src/core/runtime/amd_aql_queue.cpp#L99 + * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/ + 10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/ + rocr/src/core/runtime/amd_aql_queue.cpp#L624 + * + * GFX7 and GFX8 will allocate twice as much space for their HSA + * queues as they actually access (using mod operations to map the + * virtual addresses from the upper half of the queue to the same +
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51367 ) Change subject: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock .. mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock In the GPU VIPER TCC, programs with mixes of atomics and data accesses to the same address, in the same kernel, can experience deadlock when large applications (e.g., Pannotia's graph analytics algorithms) are running on very small GPUs (e.g., the default 4 CU GPU configuration). In this situation, deadlocks occur due to resource stalls interacting with the behavior of the current implementation for handling races between atomic accesses. The specific order of events causing this deadlock is: 1. TCC is waiting on an atomic to return from directory 2. In the meantime it receives another atomic to the same address -- when this happens, the TCC increments the number of atomics to this address (numAtomics = 2) that are pending in TBE, and does a write through of the atomic to the directory. 3. When the first atomic returns from the Directory, it decrements the numAtomics counter. numAtomics was at 2 though, because of step #2. So it doesn't deallocate the TBE entry and calls Event:AtomicNotDone. 4. Another request (a LD) to the same address comes along. The LD does z_stall since the second atomic is pending -- so the LD retries every cycle until the deadlock counter times out (or until the second atomic comes back). 5. The second atomic returns to the TCC. However, because there are so many LD's pending in the cache, all doing z_stall's and retrying every cycle, there are a lot of resource stalls. So, when the second atomic returns, it is forced to retry its operation multiple times -- and each time it decrements the atomicDoneCnt flag (which was added to catch a race between atomics arriving and leaving the TCC in 7246f70bfb) repeatedly. As a result atomicDoneCnt becomes negative. 6.
Since this atomicDoneCnt flag is used to determine when Event:AtomicDone happens, and since the resource stalls caused the atomicDoneCnt flag to become negative, we never complete the atomic, which means the pending LD can never access the line, because it's stuck waiting for the atomic to complete. 7. Eventually the deadlock threshold is reached. To fix this issue, this commit changes the VIPER TCC protocol from using z_stall to using the stall_and_wait buffer method that the Directory-level of the SLICC already uses. This change effectively prevents resource stalls from dominating the TCC level, by putting pending requests for a given address in a per-address stall buffer. These requests are then woken up when the pending request returns. As part of this change, this commit also makes two small changes to the Directory-level protocol (MOESI_AMD_BASE-dir): 1. Updated the names of the wakeup actions to match the TCC wakeup actions, to avoid confusion. 2. Changed transition(B, UnblockWriteThrough, U) to check all stall buffers, as some requests were being placed later in the stall buffer than was being checked. This mirrors the changes in 187c44fe44 to other Directory transitions to resolve races between GPU and DMA requests, but for transitions prior workloads did not stress. 
Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2 --- M src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 2 files changed, 104 insertions(+), 15 deletions(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 6c07416..6112f38 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -126,6 +126,7 @@ void unset_tbe(); void wakeUpAllBuffers(); void wakeUpBuffers(Addr a); + void wakeUpAllBuffers(Addr a); MachineID mapAddressToMachine(Addr addr, MachineType mtype); @@ -569,6 +570,14 @@ probeNetwork_in.dequeue(clockEdge()); } + action(st_stallAndWaitRequest, "st", desc="Stall and wait on the address") { +stall_and_wait(coreRequestNetwork_in, address); + } + + action(wada_wakeUpAllDependentsAddr, "wada", desc="Wake up any requests waiting for this address") { +wakeUpAllBuffers(address); + } + action(z_stall, "z", desc="stall") { // built-in } @@ -606,13 +615,22 @@ // they can cause a resource stall deadlock! transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} { - z_stall; + // By putting the stalled requests in a buffer, we reduce resource contention + // since they won't try again every cycle and will instead only try again once + // woken up + st_stallAndWaitRequest; } transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} { - z_stall; + // By putting the stalled requests in a buffer, we reduce resource con
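The z_stall to stall_and_wait change above is the heart of the fix. As a minimal illustration of the mechanism -- plain Python standing in for the SLICC built-ins, with method names only loosely modeled on stall_and_wait/wakeUpAllBuffers(addr) -- a per-address stall buffer parks blocked requests instead of letting them retry (and consume resources) every cycle:

```python
from collections import defaultdict

class PerAddressStallBuffer:
    """Toy model of SLICC's stall_and_wait / wakeUpAllBuffers(addr).

    z_stall leaves a blocked request at the head of its queue, where it
    retries every cycle and competes for cache resources. stall_and_wait
    instead parks the request keyed by address; it is only re-enqueued
    when the pending transaction for that address completes.
    """

    def __init__(self):
        self._waiting = defaultdict(list)

    def stall_and_wait(self, addr, request):
        # Park the request; it no longer contends for resources.
        self._waiting[addr].append(request)

    def wake_up_all_buffers(self, addr):
        # Called when the pending atomic/writethrough for addr returns.
        return self._waiting.pop(addr, [])

buf = PerAddressStallBuffer()
buf.stall_and_wait(0x40, "LD#1")
buf.stall_and_wait(0x40, "LD#2")
buf.stall_and_wait(0x80, "WrVicBlk#1")
print(buf.wake_up_all_buffers(0x40))  # only address 0x40's waiters wake up
```

Only the waiters for the completed address are woken; the request to 0x80 stays parked until its own transaction returns.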
[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Move VIPER TCC decrements to action from in_port
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51368 ) Change subject: mem-ruby: Move VIPER TCC decrements to action from in_port .. mem-ruby: Move VIPER TCC decrements to action from in_port Currently, the GPU VIPER TCC protocol handles races between atomics in the triggerQueue_in. This in_port does not check for resource availability, which can cause the trigger queue to execute multiple times. Although this is the expected behavior, the code for handling atomic races decrements the atomicDoneCnt flag in the trigger queue, which is not safe since resource contention may cause it to execute multiple times. To resolve this issue, this commit moves the decrementing of this counter to a new action that is called in an event that happens only when the race between atomics is detected. Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f --- M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index 6112f38..cf7cda5 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -268,7 +268,6 @@ if (tbe.numAtomics == 0 && tbe.atomicDoneCnt == 1) { trigger(Event:AtomicDone, in_msg.addr, cache_entry, tbe); } else { -tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1; trigger(Event:AtomicNotDone, in_msg.addr, cache_entry, tbe); } } @@ -599,6 +598,10 @@ } } + action(dadc_decrementAtomicDoneCnt, "dadc", desc="decrement atomics done cnt flag") { +tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1; + } + action(ptr_popTriggerQueue, "ptr", desc="pop Trigger") { triggerQueue_in.dequeue(clockEdge()); } @@ -787,6 +790,7 @@ } transition(A, AtomicNotDone) {TagArrayRead} { +dadc_decrementAtomicDoneCnt; ptr_popTriggerQueue; } -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51368 To unsubscribe, or for help writing mail filters, visit 
https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f Gerrit-Change-Number: 51368 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
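The race this change fixes is easy to reproduce in miniature. The sketch below is plain Python, not SLICC, and the poll/retry model is a simplification: it shows why a decrement placed in the in_port logic runs once per resource-stalled retry (so the counter can go negative), while one placed in the fired action runs exactly once.

```python
def final_atomic_done_cnt(polls, decrement_in_in_port):
    """Each entry in `polls` is one evaluation of the trigger queue's
    in_port: False means the attempt was resource-stalled and will be
    retried, True means the AtomicNotDone event actually fired."""
    atomic_done_cnt = 1
    for fired in polls:
        if decrement_in_in_port:
            atomic_done_cnt -= 1   # old: side effect runs on every retry
        elif fired:
            atomic_done_cnt -= 1   # new: side effect runs only in the action
    return atomic_done_cnt

# Two resource-stalled retries before the event finally fires:
print(final_atomic_done_cnt([False, False, True], True))   # old behavior: -2
print(final_atomic_done_cnt([False, False, True], False))  # new behavior: 0
```

With no retries the two placements agree; the divergence only appears under resource contention, which is why the bug escaped lightly loaded configurations.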
[gem5-dev] Change in gem5/gem5[develop]: tests: fix square nightly regression command
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51247 ) Change subject: tests: fix square nightly regression command .. tests: fix square nightly regression command Square's pre-built binary was downloaded into the tests folder, but the docker command running it assumed we were in GEM5_ROOT. This commit fixes this problem by specifying the benchmark root for the square binary. Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac --- M tests/nightly.sh 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index 6631bb0..03f9c6a 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -101,7 +101,8 @@ # basic GPU functionality is working. docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ -configs/example/apu_se.py -n3 -c square +configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -csquare + # get HeteroSync wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51247 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac Gerrit-Change-Number: 51247 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
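The underlying issue is just path resolution: the run script joins the `-c` binary name against the benchmark root when one is given, and against the working directory otherwise. A hypothetical sketch of that lookup (this is illustrative, not gem5's actual apu_se.py code; `resolve_binary` and its paths are invented for the example):

```python
import os

def resolve_binary(binary, cwd, benchmark_root=None):
    """Hypothetical model of -c binary lookup with/without --benchmark-root."""
    # With no --benchmark-root, the binary is looked up relative to the
    # working directory -- which fails when square was downloaded into
    # tests/ but docker runs from GEM5_ROOT.
    base = benchmark_root if benchmark_root is not None else cwd
    return os.path.join(base, binary)

print(resolve_binary("square", "/gem5"))                 # /gem5/square (not there)
print(resolve_binary("square", "/gem5", "/gem5/tests"))  # /gem5/tests/square
```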
[gem5-dev] Change in gem5/gem5[develop]: tests: fix LULESH weekly regression command
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51207 ) Change subject: tests: fix LULESH weekly regression command .. tests: fix LULESH weekly regression command 7756c5e added LULESH to the weekly regression script. However, it assumed a local installation of gem5-resources, which it should not have. This commit fixes that so the weekly regression builds the LULESH binary and then runs it instead. Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5 --- M tests/weekly.sh 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index b697c29..cdc7fa0 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -52,10 +52,28 @@ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" # get LULESH -wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh +# Pull the gem5 resources to the root of the gem5 directory -- currently the +# pre-built binaries for LULESH are out-of-date and won't run correctly with +# ROCm 4.0. 
In the meantime, we can build the binary as part of this script +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" mkdir -p tests/testing-results +# build LULESH +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}/gem5-resources/src/gpu/lulesh" \ + -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \ + "make" + # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 --mem-size=8GB \ +--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -clulesh + +# Delete the gem5 resources repo we created +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"rm -rf ${gem5_root}/gem5-resources" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51207 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5 Gerrit-Change-Number: 51207 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
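The new flow replaces a single wget with a clone/build/run/cleanup sequence. A sketch that assembles the four stages as plain command strings (the flags mirror the tests/weekly.sh diff; the helper function itself is invented for illustration and is not part of the test scripts):

```python
def lulesh_weekly_steps(gem5_root):
    """Build the four-stage LULESH weekly sequence as command strings."""
    res = f"{gem5_root}/gem5-resources"
    return [
        # 1. clone gem5-resources (the pre-built LULESH binaries are stale)
        f"git clone -b develop https://gem5.googlesource.com/public/gem5-resources {res}",
        # 2. build LULESH (run inside the gcn-gpu container in the real script)
        f"make -C {res}/src/gpu/lulesh",
        # 3. run it with an explicit benchmark root
        f"build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 "
        f"--mem-size=8GB --benchmark-root={res}/src/gpu/lulesh/bin -clulesh",
        # 4. delete the clone again so the checkout stays clean
        f"rm -rf {res}",
    ]

for step in lulesh_weekly_steps("/gem5"):
    print(step)
```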
[gem5-dev] Change in gem5/gem5[develop]: tests: add DNNMark to weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/51187 ) Change subject: tests: add DNNMark to weekly regression .. tests: add DNNMark to weekly regression DNNMark is representative of several simple (fast) layers within ML applications, which are heavily used in modern GPU applications. Thus, we want to make sure support for these applications is tested. This commit updates the weekly regression to run three variants: fwd_softmax, bwd_bn, and fwd_pool -- ensuring we test both inference and training as well as a variety of ML layers. Change-Id: I38bfa9bd3a2817099ece46afc2d6132ce346e21a --- M tests/weekly.sh 1 file changed, 87 insertions(+), 0 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index b697c29..dd25d4b 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -59,3 +59,74 @@ # LULESH is heavily used in the HPC community on GPUs, and does a good job of # stressing several GPU compute and memory components docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh + +# get DNNMark +# Pull the gem5 resources to the root of the gem5 directory -- DNNMark +# builds a library and thus doesn't have a binary, so we need to build +# it before we run it +git clone -b develop https://gem5.googlesource.com/public/gem5-resources \ +"${gem5_root}/gem5-resources" + +# setup cmake for DNNMark +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark" \ + gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP" + +# make the DNNMark library +docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \ +gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}" + +# generate cachefiles -- since we are testing gfx801 and 4 
CUs (default config) +# in tester, we want cachefiles for this setup +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" \ +"-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ +gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \ +--num-cus=4" + +# generate mmap data for DNNMark (makes simulation much faster) +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data" + +docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"./generate_rand_data" + +# now we can run DNNMark! +# DNNMark is representative of several simple (fast) layers within ML +# applications, which are heavily used in modern GPU applications. So, we want +# to make sure support for these applications is tested. 
Run three variants: +# fwd_softmax, bwd_bn, fwd_pool; these tests ensure we run a variety of ML kernels, +# including both inference and training +docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ + -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ + "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \ + -cdnnmark_test_fwd_softmax \ + --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \ + -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" + +docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ + -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ + "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \ + --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool" \ + -cdnnmark_test_fwd_pool \ + --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark \ + -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin" + +docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \ + "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \ + -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \ + "$
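Each DNNMark run differs only in its benchmark root, binary name, and config file. A small table-driven sketch covering the two runs fully visible above (the third variant, bwd_bn, is truncated in this digest, so its config filename is omitted rather than guessed at; `dnnmark_cmd` is invented for illustration):

```python
def dnnmark_cmd(gem5_root, variant, config):
    """Assemble one DNNMark gem5 invocation, mirroring the weekly script."""
    dnn = f"{gem5_root}/gem5-resources/src/gpu/DNNMark"
    return (f"build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 "
            f"--benchmark-root={dnn}/build/benchmarks/test_{variant} "
            f"-cdnnmark_test_{variant} "
            f'--options="-config {dnn}/config_example/{config} '
            f'-mmap {dnn}/mmap.bin"')

runs = [("fwd_softmax", "softmax_config.dnnmark"),
        ("fwd_pool", "pool_config.dnnmark")]
for variant, config in runs:
    print(dnnmark_cmd("/gem5", variant, config))
```

Factoring the runs this way makes it obvious that all three invocations share the cachefiles mount and the mmap.bin option, and differ only per variant.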
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Fix MUBUF out-of-bounds case 1
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/51127 ) Change subject: arch-gcn3: Fix MUBUF out-of-bounds case 1 .. arch-gcn3: Fix MUBUF out-of-bounds case 1 This patch updates the out-of-bounds check to properly check against the correct buffer_offset, which is different depending on if the const_swizzle_enable is true or false. Change-Id: I5c687c09ee7f8e446618084b8545b74a84211d4d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51127 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Reviewed-by: Alex Dutu Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/gcn3/insts/op_encodings.hh 1 file changed, 42 insertions(+), 20 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.hh b/src/arch/amdgpu/gcn3/insts/op_encodings.hh index 24edfa7..be96924 100644 --- a/src/arch/amdgpu/gcn3/insts/op_encodings.hh +++ b/src/arch/amdgpu/gcn3/insts/op_encodings.hh @@ -634,6 +634,7 @@ Addr stride = 0; Addr buf_idx = 0; Addr buf_off = 0; +Addr buffer_offset = 0; BufferRsrcDescriptor rsrc_desc; std::memcpy((void*)&rsrc_desc, s_rsrc_desc.rawDataPtr(), @@ -656,6 +657,26 @@ buf_off = v_off[lane] + inst_offset; +if (rsrc_desc.swizzleEn) { +Addr idx_stride = 8 << rsrc_desc.idxStride; +Addr elem_size = 2 << rsrc_desc.elemSize; +Addr idx_msb = buf_idx / idx_stride; +Addr idx_lsb = buf_idx % idx_stride; +Addr off_msb = buf_off / elem_size; +Addr off_lsb = buf_off % elem_size; +DPRINTF(GCN3, "mubuf swizzled lane %d: " +"idx_stride = %llx, elem_size = %llx, " +"idx_msb = %llx, idx_lsb = %llx, " +"off_msb = %llx, off_lsb = %llx\n", +lane, idx_stride, elem_size, idx_msb, idx_lsb, +off_msb, off_lsb); + +buffer_offset = (idx_msb * stride + off_msb * elem_size) +* idx_stride + idx_lsb * elem_size + 
off_lsb; +} else { +buffer_offset = buf_off + stride * buf_idx; +} + /** * Range check behavior causes out of range accesses to @@ -665,7 +686,7 @@ * basis. */ if (rsrc_desc.stride == 0 || !rsrc_desc.swizzleEn) { -if (buf_off + stride * buf_idx >= +if (buffer_offset >= rsrc_desc.numRecords - s_offset.rawData()) { DPRINTF(GCN3, "mubuf out-of-bounds condition 1: " "lane = %d, buffer_offset = %llx, " @@ -692,25 +713,7 @@ } } -if (rsrc_desc.swizzleEn) { -Addr idx_stride = 8 << rsrc_desc.idxStride; -Addr elem_size = 2 << rsrc_desc.elemSize; -Addr idx_msb = buf_idx / idx_stride; -Addr idx_lsb = buf_idx % idx_stride; -Addr off_msb = buf_off / elem_size; -Addr off_lsb = buf_off % elem_size; -DPRINTF(GCN3, "mubuf swizzled lane %d: " -"idx_stride = %llx, elem_size = %llx, " -"idx_msb = %llx, idx_lsb = %llx, " -"off_msb = %llx, off_lsb = %llx\n", -lane, idx_stride, elem_size, idx_msb, idx_lsb, -off_msb, off_lsb); - -vaddr += ((idx_msb * stride + off_msb * elem_size) -* idx_stride + idx_lsb * elem_size + off_lsb); -} else { -vaddr += buf_off + stride * buf_idx; -} +vaddr += buffer_offset; DPRINTF(GCN3, "Calculating mubuf address for lane %d: " "vaddr = %llx, base_addr = %llx, " -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51127 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I5c687c09ee7f8e446618084b8545b74a84211d4d Gerrit-Chang
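The core of the fix is computing buffer_offset once, before the bounds check, instead of only when forming the final address. The address math from the patch transcribes directly to Python (a sketch: descriptor decoding is simplified to the two shift encodings the diff uses, idx_stride = 8 << idxStride and elem_size = 2 << elemSize):

```python
def mubuf_buffer_offset(buf_idx, buf_off, stride, idx_stride_enc,
                        elem_size_enc, swizzle_en):
    """Mirror of the patched op_encodings.hh MUBUF offset math."""
    if swizzle_en:
        idx_stride = 8 << idx_stride_enc      # from rsrc_desc.idxStride
        elem_size = 2 << elem_size_enc        # from rsrc_desc.elemSize
        idx_msb, idx_lsb = divmod(buf_idx, idx_stride)
        off_msb, off_lsb = divmod(buf_off, elem_size)
        return ((idx_msb * stride + off_msb * elem_size) * idx_stride
                + idx_lsb * elem_size + off_lsb)
    # Unswizzled: the simple linear form the old bounds check assumed
    # for every case.
    return buf_off + stride * buf_idx

print(mubuf_buffer_offset(3, 4, 16, 0, 1, False))  # linear: 4 + 16*3 = 52
print(mubuf_buffer_offset(3, 4, 16, 0, 1, True))   # swizzled: 44
```

Because the two forms disagree (52 vs. 44 here), a bounds check written only in terms of the linear form misclassifies swizzled accesses, which is exactly what the patch corrects.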
[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: update GPU scripts to remove master/slave
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/50967 ) Change subject: configs, gpu-compute: update GPU scripts to remove master/slave .. configs, gpu-compute: update GPU scripts to remove master/slave Update apu_se and underlying configuration files for GPU runs to replace the master/slave terminology. Change-Id: Icf309782f0899dc412eccd27e3ac017902316a70 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50967 Tested-by: kokoro Reviewed-by: Matthew Poremba Reviewed-by: Jason Lowe-Power Reviewed-by: Bobby R. Bruce Maintainer: Jason Lowe-Power Maintainer: Bobby R. Bruce --- M configs/common/GPUTLBConfig.py M configs/example/apu_se.py 2 files changed, 53 insertions(+), 28 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved Matthew Poremba: Looks good to me, approved Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/common/GPUTLBConfig.py b/configs/common/GPUTLBConfig.py index 958cf1f..d7adaee 100644 --- a/configs/common/GPUTLBConfig.py +++ b/configs/common/GPUTLBConfig.py @@ -148,8 +148,8 @@ for TLB_type in hierarchy_level: name = TLB_type['name'] for index in range(TLB_type['width']): -exec('system.%s_coalescer[%d].master[0] = \ -system.%s_tlb[%d].slave[0]' % \ +exec('system.%s_coalescer[%d].mem_side_ports[0] = \ +system.%s_tlb[%d].cpu_side_ports[0]' % \ (name, index, name, index)) # Connect the cpuSidePort (slave) of all the coalescers in level 1 @@ -163,12 +163,12 @@ if tlb_per_cu: for tlb in range(tlb_per_cu): exec('system.cpu[%d].CUs[%d].translation_port[%d] = \ -system.l1_coalescer[%d].slave[%d]' % \ + system.l1_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, cu_idx, tlb, cu_idx*tlb_per_cu+tlb, 0)) else: exec('system.cpu[%d].CUs[%d].translation_port[%d] = \ -system.l1_coalescer[%d].slave[%d]' % \ +system.l1_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, cu_idx, 
tlb_per_cu, cu_idx / (n_cu / num_TLBs), cu_idx % (n_cu / num_TLBs))) @@ -177,14 +177,14 @@ sqc_tlb_index = index / options.cu_per_sqc sqc_tlb_port_id = index % options.cu_per_sqc exec('system.cpu[%d].CUs[%d].sqc_tlb_port = \ -system.sqc_coalescer[%d].slave[%d]' % \ +system.sqc_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, index, sqc_tlb_index, sqc_tlb_port_id)) elif name == 'scalar': # Scalar D-TLB for index in range(n_cu): scalar_tlb_index = index / options.cu_per_scalar_cache scalar_tlb_port_id = index % options.cu_per_scalar_cache exec('system.cpu[%d].CUs[%d].scalar_tlb_port = \ -system.scalar_coalescer[%d].slave[%d]' % \ +system.scalar_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, index, scalar_tlb_index, scalar_tlb_port_id)) @@ -196,11 +196,12 @@ for TLB_type in L1: name = TLB_type['name'] for index in range(TLB_type['width']): -exec('system.%s_tlb[%d].master[0] = \ -system.l2_coalescer[0].slave[%d]' % \ +exec('system.%s_tlb[%d].mem_side_ports[0] = \ +system.l2_coalescer[0].cpu_side_ports[%d]' % \ (name, index, l2_coalescer_index)) l2_coalescer_index += 1 # L2 <-> L3 -system.l2_tlb[0].master[0] = system.l3_coalescer[0].slave[0] +system.l2_tlb[0].mem_side_ports[0] = \ +system.l3_coalescer[0].cpu_side_ports[0] return system diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 7a45952..29ceddb 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -342,8 +342,9 @@ compute_units[-1].prefetch_prev_type = args.pf_type # attach the LDS and the CU to the bus (actually a Bridge) -compute_units[-1].ldsPort = compute_units[-1].ldsBus.slave -compute_units[-1].ldsBus.master = compute_units[-1].localDataStore.cuPort +compute_units[-1].ldsPort = compute_units[-1].ldsBus.cpu_side_port +compute_units[-1].ldsBus.mem_side_port = \ +compute_units[-1].localDataStore.cuPort # Attach compute units to GPU sh
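Mechanically, this rename is a textual substitution over the port expressions in the exec strings of GPUTLBConfig.py. A hedged helper showing the vector-port case only; note that Bridge-style scalar ports (e.g. ldsBus.slave becoming ldsBus.cpu_side_port) take the singular spellings and are deliberately not handled by this sketch:

```python
import re

def modernize_ports(expr):
    """Rewrite deprecated vector-port names to current gem5 terminology:
    .master -> .mem_side_ports, .slave -> .cpu_side_ports."""
    expr = re.sub(r"\.master\b", ".mem_side_ports", expr)
    return re.sub(r"\.slave\b", ".cpu_side_ports", expr)

old = "system.l2_tlb[0].master[0] = system.l3_coalescer[0].slave[0]"
print(modernize_ports(old))
```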
[gem5-dev] Change in gem5/gem5[develop]: tests: add LULESH to weekly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/50952 ) ( 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. ) Change subject: tests: add LULESH to weekly regression .. tests: add LULESH to weekly regression LULESH is a popular GPU HPC application that acts as a good test for several memory and compute patterns. Thus, including it in the weekly regressions will help verify correctness and functionality for code that affects the GPU. The default LULESH input runs 10 iterations and takes 3-4 hours. Hence, it is not appropriate for nightly regressions. Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50952 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Bobby R. Bruce --- M tests/weekly.sh 1 file changed, 36 insertions(+), 0 deletions(-) Approvals: Bobby R. Bruce: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/weekly.sh b/tests/weekly.sh index 393c66f..b697c29 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -43,3 +43,19 @@ docker run -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} + +# For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. 
+docker pull gcr.io/gem5-test/gcn-gpu:latest +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"scons build/GCN3_X86/gem5.opt -j${threads} \ +|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" + +# get LULESH +wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh + +mkdir -p tests/testing-results + +# LULESH is heavily used in the HPC community on GPUs, and does a good job of +# stressing several GPU compute and memory components +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50952 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e Gerrit-Change-Number: 50952 Gerrit-PatchSet: 4 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Kyle Roarty Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: Add HeteroSync to nightly regression
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/50951 ) Change subject: tests: Add HeteroSync to nightly regression .. tests: Add HeteroSync to nightly regression HeteroSync does a good job of testing the GPU memory system and atomics support, without requiring a long runtime. Thus, this commit adds a mutex and barrier test from HeteroSync to the nightly regression to ensure these components are tested. Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50951 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Bobby R. Bruce --- M tests/nightly.sh 1 file changed, 40 insertions(+), 0 deletions(-) Approvals: Bobby R. Bruce: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/nightly.sh b/tests/nightly.sh index 91b19f5..6631bb0 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -102,3 +102,25 @@ docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c square + +# get HeteroSync +wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel + +# run HeteroSync sleepMutex -- 16 WGs (4 per CU in default config), each doing +# 10 Ld/St per thread and 4 iterations of the critical section is a reasonable +# moderate contention case for the default 4 CU GPU config and helps ensure GPU +# atomics are tested. +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ +--options="sleepMutex 10 16 4" + +# run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs +# accessing unique data and then joining a lock-free barrier, 10 Ld/St per +# thread, 4 iterations of critical section. 
Again this is representative of a +# moderate contention case for the default 4 CU GPU config and helps ensure GPU +# atomics are tested. +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ +--options="lfTreeBarrUniq 10 16 4" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50951 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a Gerrit-Change-Number: 50951 Gerrit-PatchSet: 3 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Kyle Roarty Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
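For intuition about the "sleepMutex 10 16 4" parameters (10 Ld/St per thread, 16 workgroups, 4 iterations of the critical section), here is a host-side toy model: Python threads stand in for GPU workgroups and a plain lock stands in for HeteroSync's GPU sleep mutex, so only the contention structure is modeled, not the atomics themselves.

```python
import threading

def sleep_mutex_model(n_wgs=16, iters=4, ld_st=10):
    """Toy contention model of the HeteroSync sleepMutex run."""
    lock = threading.Lock()
    total = [0]

    def workgroup():
        for _ in range(iters):
            with lock:               # acquire = the mutex critical section
                for _ in range(ld_st):
                    total[0] += 1    # the per-thread loads/stores

    threads = [threading.Thread(target=workgroup) for _ in range(n_wgs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

print(sleep_mutex_model())  # 16 WGs * 4 iterations * 10 ops = 640
```

With the default 4 CU GPU config, 16 workgroups means 4 per CU, which is why the commit describes this as a moderate (rather than extreme) contention case.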
[gem5-dev] Change in gem5/gem5[develop]: tests: update nightly tests to document square
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/50949 ) Change subject: tests: update nightly tests to document square .. tests: update nightly tests to document square Add some information and comments on why square is included in the nightly tests. Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50949 Maintainer: Matt Sinclair Tested-by: kokoro Reviewed-by: Bobby R. Bruce --- M tests/nightly.sh 1 file changed, 20 insertions(+), 0 deletions(-) Approvals: Bobby R. Bruce: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/tests/nightly.sh b/tests/nightly.sh index 3ffdbcd..91b19f5 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -91,10 +91,14 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" +# get square wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square mkdir -p tests/testing-results +# Square is the simplest, fastest, and most heavily tested GPU application. +# Thus, we always want to run this in the nightly regressions to make sure +# basic GPU functionality is working. docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c square -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50949 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382 Gerrit-Change-Number: 50949 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Bobby R. 
Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: update GPU scripts to remove master/slave
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/50967 ) Change subject: configs, gpu-compute: update GPU scripts to remove master/slave .. configs, gpu-compute: update GPU scripts to remove master/slave Update apu_se and underlying configuration files for GPU runs to replace the master/slave terminology. Change-Id: Icf309782f0899dc412eccd27e3ac017902316a70 --- M configs/common/GPUTLBConfig.py M configs/example/apu_se.py 2 files changed, 34 insertions(+), 28 deletions(-) diff --git a/configs/common/GPUTLBConfig.py b/configs/common/GPUTLBConfig.py index 958cf1f..d7adaee 100644 --- a/configs/common/GPUTLBConfig.py +++ b/configs/common/GPUTLBConfig.py @@ -148,8 +148,8 @@ for TLB_type in hierarchy_level: name = TLB_type['name'] for index in range(TLB_type['width']): -exec('system.%s_coalescer[%d].master[0] = \ -system.%s_tlb[%d].slave[0]' % \ +exec('system.%s_coalescer[%d].mem_side_ports[0] = \ +system.%s_tlb[%d].cpu_side_ports[0]' % \ (name, index, name, index)) # Connect the cpuSidePort (slave) of all the coalescers in level 1 @@ -163,12 +163,12 @@ if tlb_per_cu: for tlb in range(tlb_per_cu): exec('system.cpu[%d].CUs[%d].translation_port[%d] = \ -system.l1_coalescer[%d].slave[%d]' % \ + system.l1_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, cu_idx, tlb, cu_idx*tlb_per_cu+tlb, 0)) else: exec('system.cpu[%d].CUs[%d].translation_port[%d] = \ -system.l1_coalescer[%d].slave[%d]' % \ +system.l1_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, cu_idx, tlb_per_cu, cu_idx / (n_cu / num_TLBs), cu_idx % (n_cu / num_TLBs))) @@ -177,14 +177,14 @@ sqc_tlb_index = index / options.cu_per_sqc sqc_tlb_port_id = index % options.cu_per_sqc exec('system.cpu[%d].CUs[%d].sqc_tlb_port = \ -system.sqc_coalescer[%d].slave[%d]' % \ +system.sqc_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, index, sqc_tlb_index, sqc_tlb_port_id)) elif name == 'scalar': # Scalar D-TLB for index in range(n_cu): scalar_tlb_index = index / 
options.cu_per_scalar_cache scalar_tlb_port_id = index % options.cu_per_scalar_cache exec('system.cpu[%d].CUs[%d].scalar_tlb_port = \ -system.scalar_coalescer[%d].slave[%d]' % \ +system.scalar_coalescer[%d].cpu_side_ports[%d]' % \ (shader_idx, index, scalar_tlb_index, scalar_tlb_port_id)) @@ -196,11 +196,12 @@ for TLB_type in L1: name = TLB_type['name'] for index in range(TLB_type['width']): -exec('system.%s_tlb[%d].master[0] = \ -system.l2_coalescer[0].slave[%d]' % \ +exec('system.%s_tlb[%d].mem_side_ports[0] = \ +system.l2_coalescer[0].cpu_side_ports[%d]' % \ (name, index, l2_coalescer_index)) l2_coalescer_index += 1 # L2 <-> L3 -system.l2_tlb[0].master[0] = system.l3_coalescer[0].slave[0] +system.l2_tlb[0].mem_side_ports[0] = \ +system.l3_coalescer[0].cpu_side_ports[0] return system diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 7a45952..29ceddb 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -342,8 +342,9 @@ compute_units[-1].prefetch_prev_type = args.pf_type # attach the LDS and the CU to the bus (actually a Bridge) -compute_units[-1].ldsPort = compute_units[-1].ldsBus.slave -compute_units[-1].ldsBus.master = compute_units[-1].localDataStore.cuPort +compute_units[-1].ldsPort = compute_units[-1].ldsBus.cpu_side_port +compute_units[-1].ldsBus.mem_side_port = \ +compute_units[-1].localDataStore.cuPort # Attach compute units to GPU shader.CUs = compute_units @@ -561,8 +562,8 @@ Ruby.create_system(args, None, system, None, dma_list, None) system.ruby.clk_domain = SrcClockDomain(clock = args.ruby_clock, voltage_domain = system.voltage_domain) -gpu_cmd_proc.pio = system.piobus.master -gpu_hsapp.pio = system.piobus.master +gpu_cmd_proc.pio = system.piobus.mem_side_ports +gpu_hsapp.pio = system.piobus.mem_side_ports for i, dma_device in enumerate(dma_list): exec('system.
[gem5-dev] Change in gem5/gem5[develop]: tests: add LULESH to weekly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/50952 ) Change subject: tests: add LULESH to weekly regression .. tests: add LULESH to weekly regression LULESH is a popular GPU HPC application that acts as a good test for several memory and compute patterns. Thus, including it in the weekly regressions will help verify correctness and functionality for code that affects the GPU. The default LULESH input runs 10 iterations and takes 3-4 hours. Hence, it is not appropriate for nightly regressions. Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e --- M tests/weekly.sh 1 file changed, 16 insertions(+), 0 deletions(-) diff --git a/tests/weekly.sh b/tests/weekly.sh index 393c66f..d837940 100755 --- a/tests/weekly.sh +++ b/tests/weekly.sh @@ -43,3 +43,19 @@ docker run -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \ ./main.py run --length very-long -j${threads} -t${threads} + +# For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container. 
+docker pull gcr.io/gem5-test/gcn-gpu:latest +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \ +"scons build/GCN3_X86/gem5.opt -j${threads} \ +|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" + +# get LULESH +wget -qN http://dist.gem5.org/dist/v21-1/test-progs/lulesh/lulesh + +mkdir -p tests/testing-results + +# LULESH is heavily used in the HPC community on GPUs, and does a good job of +# stressing several GPU compute and memory components +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50952 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e Gerrit-Change-Number: 50952 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: Add HeteroSync to nightly regression
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/50951 ) Change subject: tests: Add HeteroSync to nightly regression .. tests: Add HeteroSync to nightly regression HeteroSync does a good job of testing the GPU memory system and atomics support, without requiring a long runtime. Thus, this commit adds a mutex and barrier test from HeteroSync to the nightly regression to ensure these components are tested. Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a --- M tests/nightly.sh 1 file changed, 22 insertions(+), 0 deletions(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index 91b19f5..3f115e0 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -102,3 +102,25 @@ docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c square + +# get HeteroSync +wget -qN http://dist.gem5.org/dist/v21-1/test-progs/heterosync/gcn3/allSyncPrims-1kernel + +# run HeteroSync sleepMutex -- 16 WGs (4 per CU in default config), each doing +# 10 Ld/St per thread and 4 iterations of the critical section is a reasonable +# moderate contention case for the default 4 CU GPU config and help ensure GPU +# atomics are tested. +docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ +--options="sleepMutex 10 16 4" + +# run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs +# accessing unique data and then joining a lock-free barrier, 10 Ld/St per +# thread, 4 iterations of critical section. Again this is representative of a +# moderate contention case for the default 4 CU GPU config and help ensure GPU +# atomics are tested. 
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ +"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ +configs/example/apu_se.py -n3 -callSyncPrims-1kernel \ +--options="lfTreeBarrUniq 10 16 4" -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50951 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a Gerrit-Change-Number: 50951 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: tests: update nightly tests to document square
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/50949 ) Change subject: tests: update nightly tests to document square .. tests: update nightly tests to document square Add some information and comments on why square is included in the nightly tests. Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382 --- M tests/nightly.sh 1 file changed, 4 insertions(+), 0 deletions(-) diff --git a/tests/nightly.sh b/tests/nightly.sh index 3ffdbcd..91b19f5 100755 --- a/tests/nightly.sh +++ b/tests/nightly.sh @@ -91,10 +91,14 @@ "scons build/GCN3_X86/gem5.opt -j${threads} \ || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})" +# get square wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square mkdir -p tests/testing-results +# Square is the simplest, fastest, more heavily tested GPU application +# Thus, we always want to run this in the nightly regressions to make sure +# basic GPU functionality is working. docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \ "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \ configs/example/apu_se.py -n3 -c square -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50949 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382 Gerrit-Change-Number: 50949 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: fix typo in compute driver comments
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48023 ) Change subject: gpu-compute: fix typo in compute driver comments .. gpu-compute: fix typo in compute driver comments Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48023 Maintainer: Matt Sinclair Reviewed-by: Matthew Poremba Tested-by: kokoro --- M src/gpu-compute/gpu_compute_driver.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc index d51b4c3..52a437a 100644 --- a/src/gpu-compute/gpu_compute_driver.cc +++ b/src/gpu-compute/gpu_compute_driver.cc @@ -836,7 +836,7 @@ // of the region. // // This is a simplified version of regular system VMAs, but for -// GPUVM space (non of the clobber/remap nonsense we find in real +// GPUVM space (none of the clobber/remap nonsense we find in real // OS managed memory). allocateGpuVma(mtype, args->va_addr, args->size); 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48023 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 Gerrit-Change-Number: 48023 Gerrit-PatchSet: 3 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: sim-se: Properly handle a clone with the VFORK flag
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48346 ) Change subject: sim-se: Properly handle a clone with the VFORK flag .. sim-se: Properly handle a clone with the VFORK flag When clone is called with the VFORK flag, the calling process is suspended until the child process either exits, or calls execve. This patch adds in a new variable to Process, which is used to store the context of the calling process if this process is created through a clone with VFORK set. This patch also adds the required support in clone to suspend the calling thread, and in exitImpl and execveFunc to wake up the calling thread when the child thread calls either of those functions Change-Id: I85af67544ea1d5df7102dcff1331b5a6f6f4fa7c Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48346 Tested-by: kokoro Reviewed-by: Bobby R. Bruce Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/sim/process.cc M src/sim/process.hh M src/sim/syscall_emul.cc M src/sim/syscall_emul.hh 4 files changed, 34 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved Bobby R. 
Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/src/sim/process.cc b/src/sim/process.cc index 207c275..272fc9f 100644 --- a/src/sim/process.cc +++ b/src/sim/process.cc @@ -175,6 +175,9 @@ #ifndef CLONE_THREAD #define CLONE_THREAD 0 #endif +#ifndef CLONE_VFORK +#define CLONE_VFORK 0 +#endif if (CLONE_VM & flags) { /** * Share the process memory address space between the new process @@ -249,6 +252,10 @@ np->exitGroup = exitGroup; } +if (CLONE_VFORK & flags) { +np->vforkContexts.push_back(otc->contextId()); +} + np->argv.insert(np->argv.end(), argv.begin(), argv.end()); np->envp.insert(np->envp.end(), envp.begin(), envp.end()); } diff --git a/src/sim/process.hh b/src/sim/process.hh index 632ba90..34768a0 100644 --- a/src/sim/process.hh +++ b/src/sim/process.hh @@ -284,6 +284,9 @@ // Process was forked with SIGCHLD set. bool *sigchld; +// Contexts to wake up when this thread exits or calls execve +std::vector vforkContexts; + // Track how many system calls are executed statistics::Scalar numSyscalls; }; diff --git a/src/sim/syscall_emul.cc b/src/sim/syscall_emul.cc index 147cb39..713bec4 100644 --- a/src/sim/syscall_emul.cc +++ b/src/sim/syscall_emul.cc @@ -193,6 +193,16 @@ } } +/** + * If we were a thread created by a clone with vfork set, wake up + * the thread that created us + */ +if (!p->vforkContexts.empty()) { +ThreadContext *vtc = sys->threads[p->vforkContexts.front()]; +assert(vtc->status() == ThreadContext::Suspended); +vtc->activate(); +} + tc->halt(); /** diff --git a/src/sim/syscall_emul.hh b/src/sim/syscall_emul.hh index 09be700..8695638 100644 --- a/src/sim/syscall_emul.hh +++ b/src/sim/syscall_emul.hh @@ -1521,6 +1521,10 @@ ctc->pcState(cpc); ctc->activate(); +if (flags & OS::TGT_CLONE_VFORK) { +tc->suspend(); +} + return cp->pid(); } @@ -1998,6 +2002,16 @@ }; /** + * If we were a thread created by a clone with vfork set, wake up + * the thread that created us + */ +if (!p->vforkContexts.empty()) { +ThreadContext *vtc = 
p->system->threads[p->vforkContexts.front()]; +assert(vtc->status() == ThreadContext::Suspended); +vtc->activate(); +} + +/** * Note that ProcessParams is generated by swig and there are no other * examples of how to create anything but this default constructor. The * fields are manually initialized instead of passing parameters to the -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48346 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: release-staging-v21-1 Gerrit-Change-Id: I85af67544ea1d5df7102dcff1331b5a6f6f4fa7c Gerrit-Change-Number: 48346 Gerrit-PatchSet: 3 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Bobby R. Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
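[Editor's note] The suspend/wake handshake the VFORK patch above adds can be modeled in miniature. All types below are simplified stand-ins for gem5's Process and ThreadContext, not the real API; only the bookkeeping pattern (record the parent's context id, suspend it, reactivate it on exit/execve) mirrors the patch.

```cpp
#include <vector>

// Simplified stand-in for gem5's thread-context status.
enum class Status { Active, Suspended };

struct Context { int id; Status status; };

struct Proc {
    // Contexts to wake when this process exits or calls execve,
    // mirroring the vforkContexts vector added by the patch.
    std::vector<int> vforkContexts;
};

// clone(CLONE_VFORK): the child records the parent's context id and
// the parent is suspended until the child exits or calls execve.
void cloneVfork(Proc &child, Context &parent)
{
    child.vforkContexts.push_back(parent.id);
    parent.status = Status::Suspended;
}

// exit/execve in the child: reactivate the recorded parent context.
void wakeVforkParent(const Proc &child, std::vector<Context> &contexts)
{
    if (!child.vforkContexts.empty())
        contexts[child.vforkContexts.front()].status = Status::Active;
}
```

Usage follows the two call sites in the patch: `cloneVfork` corresponds to the `TGT_CLONE_VFORK` branch in the clone syscall, `wakeVforkParent` to the checks added in `exitImpl` and `execveFunc`.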
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: sim-se: Fix execve syscall
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48345 ) Change subject: sim-se: Fix execve syscall .. sim-se: Fix execve syscall There were three things preventing execve from working Firstly, the entrypoint for the new program wasn't correct. This was fixed by calling Process::init, which adds a bias to the entrypoint. Secondly, the uname string wasn't being copied over. This meant when the new executable tried to run, it would think the kernel was too old to run on, and would error out. This was fixed by copying over the uname string (the `release` string in Process) when creating the new process. Additionally, this patch also ensures we copy over the uname string in the clone implementation, as otherwise a cloned thread that called execve would crash. Finally, we choose to not delete the new ProcessParams or the old Process. This is done both because it matches what is done in cloneFunc, but also because deleting the old process results in a segfault later on. Change-Id: I4ca201da689e9e37671b4cb477dc76fa12eecf69 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48345 Reviewed-by: Matt Sinclair Reviewed-by: Bobby R. Bruce Maintainer: Matt Sinclair Tested-by: kokoro --- M src/sim/syscall_emul.hh 1 file changed, 6 insertions(+), 2 deletions(-) Approvals: Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved Bobby R. 
Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/src/sim/syscall_emul.hh b/src/sim/syscall_emul.hh index aa02fd6..09be700 100644 --- a/src/sim/syscall_emul.hh +++ b/src/sim/syscall_emul.hh @@ -1452,6 +1452,7 @@ pp->euid = p->euid(); pp->gid = p->gid(); pp->egid = p->egid(); +pp->release = p->release; /* Find the first free PID that's less than the maximum */ std::set const& pids = p->system->PIDs; @@ -2017,6 +2018,7 @@ pp->errout.assign("cerr"); pp->cwd.assign(p->tgtCwd); pp->system = p->system; +pp->release = p->release; /** * Prevent process object creation with identical PIDs (which will trip * a fatal check in Process constructor). The execve call is supposed to @@ -2027,7 +2029,9 @@ */ p->system->PIDs.erase(p->pid()); Process *new_p = pp->create(); -delete pp; +// TODO: there is no way to know when the Process SimObject is done with +// the params pointer. Both the params pointer (pp) and the process +// pointer (p) are normally managed in python and are never cleaned up. /** * Work through the file descriptor array and close any files marked @@ -2042,10 +2046,10 @@ *new_p->sigchld = true; -delete p; tc->clearArchRegs(); tc->setProcessPtr(new_p); new_p->assignThreadContext(tc->contextId()); +new_p->init(); new_p->initState(); tc->activate(); TheISA::PCState pcState = tc->pcState(); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48345 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: release-staging-v21-1 Gerrit-Change-Id: I4ca201da689e9e37671b4cb477dc76fa12eecf69 Gerrit-Change-Number: 48345 Gerrit-PatchSet: 3 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Bobby R. 
Bruce Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Validate if scalar sources are scalar gprs
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48344 ) Change subject: arch-gcn3: Validate if scalar sources are scalar gprs .. arch-gcn3: Validate if scalar sources are scalar gprs Scalar sources can either be a general-purpose register or a constant register that holds a single value. If we don't check for if the register is a general-purpose register, it's possible that we get a constant register, which then causes all of the register mapping code to break, as the constant registers aren't supposed to be mapped like the general-purpose registers are. This fix adds an isScalarReg check to the instruction encodings that were missing it. Change-Id: I3d7d5393aa324737301c3269cc227b60e8a159e4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48344 Tested-by: kokoro Reviewed-by: Matt Sinclair Reviewed-by: Bobby R. Bruce Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair --- M src/arch/amdgpu/gcn3/insts/op_encodings.cc 1 file changed, 6 insertions(+), 6 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved Bobby R. 
Bruce: Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.cc b/src/arch/amdgpu/gcn3/insts/op_encodings.cc index cbbb767..cf20a2e 100644 --- a/src/arch/amdgpu/gcn3/insts/op_encodings.cc +++ b/src/arch/amdgpu/gcn3/insts/op_encodings.cc @@ -1277,12 +1277,12 @@ reg = extData.SRSRC; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; reg = extData.SOFFSET; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; } @@ -1368,12 +1368,12 @@ reg = extData.SRSRC; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; reg = extData.SOFFSET; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; // extData.VDATA moves in the reg list depending on the instruction @@ -1441,13 +1441,13 @@ reg = extData.SRSRC; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; if (getNumOperands() == 4) { reg = extData.SSAMP; srcOps.emplace_back(reg, getOperandSize(opNum), true, - true, false, false); + isScalarReg(reg), false, false); opNum++; } 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48344 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: release-staging-v21-1 Gerrit-Change-Id: I3d7d5393aa324737301c3269cc227b60e8a159e4 Gerrit-Change-Number: 48344 Gerrit-PatchSet: 3 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Bobby R. 
Bruce Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
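[Editor's note] The failure mode the scalar-source patch above guards against can be sketched with a toy register map. The index boundary below is purely illustrative, not GCN3's actual register encoding; the point is only that hard-coding the is-scalar flag to `true` lets a constant register reach GPR mapping logic that cannot handle it.

```cpp
// Toy register file: indices below kFirstConstReg are general-purpose
// scalar registers; indices at or above it are constant registers.
// (Illustrative boundary, not GCN3's real one.)
constexpr int kFirstConstReg = 128;

bool isScalarReg(int reg) { return reg < kFirstConstReg; }

struct Operand { int reg; bool isScalarGpr; };

// Before the fix the flag was effectively hard-coded to true, so a
// constant register would be fed to the scalar-GPR mapping code.
// After the fix, the operand is classified by an explicit check.
Operand makeSrcOperand(int reg)
{
    return {reg, isScalarReg(reg)};
}
```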
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Implement LDS accesses in Flat instructions
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48343 ) Change subject: arch-gcn3: Implement LDS accesses in Flat instructions .. arch-gcn3: Implement LDS accesses in Flat instructions Add support for LDS accesses by allowing Flat instructions to dispatch into the local memory pipeline if the requested address is in the group aperture. This requires implementing LDS accesses in the Flat initMemRead/Write functions, in a similar fashion to the DS functions of the same name. Because we now can potentially dispatch to the local memory pipeline, this change also adds a check to regain any tokens we requested as a flat instruction. Change-Id: Id26191f7ee43291a5e5ca5f39af06af981ec23ab Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48343 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/arch/amdgpu/gcn3/insts/op_encodings.hh M src/gpu-compute/gpu_dyn_inst.cc M src/gpu-compute/local_memory_pipeline.cc 4 files changed, 184 insertions(+), 32 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index 79af7ac..65d008b 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -36314,7 +36314,7 @@ gpuDynInst->computeUnit()->globalMemoryPipe. issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } @@ -36363,7 +36363,7 @@ gpuDynInst->computeUnit()->globalMemoryPipe. 
issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } void @@ -39384,8 +39384,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) { gpuDynInst->computeUnit()->globalMemoryPipe .issueRequest(gpuDynInst); +} else if (gpuDynInst->executedAs() == enums::SC_GROUP) { +gpuDynInst->computeUnit()->localMemoryPipe +.issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } // execute @@ -39448,8 +39451,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) { gpuDynInst->computeUnit()->globalMemoryPipe .issueRequest(gpuDynInst); +} else if (gpuDynInst->executedAs() == enums::SC_GROUP) { +gpuDynInst->computeUnit()->localMemoryPipe +.issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } @@ -39511,8 +39517,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) { gpuDynInst->computeUnit()->globalMemoryPipe .issueRequest(gpuDynInst); +} else if (gpuDynInst->executedAs() == enums::SC_GROUP) { +gpuDynInst->computeUnit()->localMemoryPipe +.issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } @@ -39603,8 +39612,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) { gpuDynInst->computeUnit()->globalMemoryPipe .issueRequest(gpuDynInst); +} else if (gpuDynInst->executedAs() == enums::SC_GROUP) { +gpuDynInst->computeUnit()->localMemoryPipe +.issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } @@ -39667,8 +39679,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) { gpuDynInst->computeUnit()->globalMemoryPipe .issueRequest(gpuDynInst); +} else if (gpuDynInst->executedAs() == 
enums::SC_GROUP) { +gpuDynInst->computeUnit()->localMemoryPipe +.issueRequest(gpuDynInst); } else { -fatal("Non global flat instructions not implemented yet.\n"); +fatal("Unsupported scope for flat instruction.\n"); } } @@ -39731,8 +39746,11 @@ if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: gpu-compute: Fix TLB coalescer starvation
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48340 ) Change subject: gpu-compute: Fix TLB coalescer starvation .. gpu-compute: Fix TLB coalescer starvation Currently, we are storing coalesced accesses in an std::unordered_map indexed by a tick index, i.e. issue tick / coalescing window. If there are multiple coalesced requests, at different tick indexes, to the same virtual address, then the TLB coalescer will issue just the first one. However, std::unordered_map is not a sorted container and we issue coalesced requests by iterating through that container. This means that the coalesced request sent in TLBCoalescer::processProbeTLBEvent is not necessarily the oldest one. Because of this, in cases of high contention the oldest coalesced request will have a huge TLB access latency. To fix this issue, we will use an std::map which is a sorted container and therefore guarantees the oldest coalesced request will be sent first. Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48340 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/tlb_coalescer.hh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/tlb_coalescer.hh b/src/gpu-compute/tlb_coalescer.hh index b97801b..fce8740 100644 --- a/src/gpu-compute/tlb_coalescer.hh +++ b/src/gpu-compute/tlb_coalescer.hh @@ -100,7 +100,7 @@ * option is to change it to curTick(), so we coalesce based * on the receive time. 
*/ -typedef std::unordered_map> +typedef std::map> CoalescingFIFO; CoalescingFIFO coalescerFIFO; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48340 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: release-staging-v21-1 Gerrit-Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0 Gerrit-Change-Number: 48340 Gerrit-PatchSet: 3 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
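[Editor's note] The container-ordering property this starvation fix relies on can be shown in isolation. The tick indices and payload strings below are hypothetical, not the coalescer's real types; the guarantee being demonstrated is that iterating a `std::map` visits keys in ascending order regardless of insertion order, so the oldest tick index is always processed first.

```cpp
#include <map>
#include <vector>

// Return the tick indices in the order a range-for over the container
// visits them. std::map iterates in ascending key order, so the
// entry with the smallest (oldest) tick index always comes first --
// std::unordered_map makes no such guarantee.
std::vector<long> issueOrder(const std::map<long, const char*> &fifo)
{
    std::vector<long> order;
    for (const auto &entry : fifo)
        order.push_back(entry.first);
    return order;
}
```

For example, inserting tick indices 30, 10, 20 in that order still yields the issue order 10, 20, 30, which is exactly the FIFO behavior the patch restores.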
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Implement large ds_read/write instructions
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48342 ) Change subject: arch-gcn3: Implement large ds_read/write instructions .. arch-gcn3: Implement large ds_read/write instructions This implements the 96 and 128b ds_read/write instructions in a similar fashion to the 3 and 4 dword flat_load/store instructions. These instructions are treated as reads/writes of 3 or 4 dwords, instead of as a single 96b/128b memory transaction, due to the limitations of the VecOperand class used in the amdgpu code. In order to handle treating the memory transaction as multiple dwords, the patch also adds in new initMemRead/initMemWrite functions for ds instructions. These are similar to the functions used in flat instructions for the same purpose. Change-Id: I0f2ba3cb7cf040abb876e6eae55a6d38149ee960 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48342 Tested-by: kokoro Reviewed-by: Alex Dutu Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/arch/amdgpu/gcn3/insts/instructions.hh M src/arch/amdgpu/gcn3/insts/op_encodings.hh 3 files changed, 232 insertions(+), 4 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index 21ab58d..79af7ac 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -34335,9 +34335,52 @@ void Inst_DS__DS_WRITE_B96::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 
data0(gpuDynInst, extData.DATA0); +ConstVecOperandU32 data1(gpuDynInst, extData.DATA0 + 1); +ConstVecOperandU32 data2(gpuDynInst, extData.DATA0 + 2); + +addr.read(); +data0.read(); +data1.read(); +data2.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4] = data0[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4 + 1] = data1[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4 + 2] = data2[lane]; +} +} + + gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst); } +void +Inst_DS__DS_WRITE_B96::initiateAcc(GPUDynInstPtr gpuDynInst) +{ +Addr offset0 = instData.OFFSET0; +Addr offset1 = instData.OFFSET1; +Addr offset = (offset1 << 8) | offset0; + +initMemWrite<3>(gpuDynInst, offset); +} // initiateAcc + +void +Inst_DS__DS_WRITE_B96::completeAcc(GPUDynInstPtr gpuDynInst) +{ +} // completeAcc + Inst_DS__DS_WRITE_B128::Inst_DS__DS_WRITE_B128(InFmt_DS *iFmt) : Inst_DS(iFmt, "ds_write_b128") { @@ -34354,9 +34397,56 @@ void Inst_DS__DS_WRITE_B128::execute(GPUDynInstPtr gpuDynInst) { -panicUnimplemented(); +Wavefront *wf = gpuDynInst->wavefront(); +gpuDynInst->execUnitId = wf->execUnitId; +gpuDynInst->latency.init(gpuDynInst->computeUnit()); +gpuDynInst->latency.set( +gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); +ConstVecOperandU32 addr(gpuDynInst, extData.ADDR); +ConstVecOperandU32 data0(gpuDynInst, extData.DATA0); +ConstVecOperandU32 data1(gpuDynInst, extData.DATA0 + 1); +ConstVecOperandU32 data2(gpuDynInst, extData.DATA0 + 2); +ConstVecOperandU32 data3(gpuDynInst, extData.DATA0 + 3); + +addr.read(); +data0.read(); +data1.read(); +data2.read(); +data3.read(); + +calcAddr(gpuDynInst, addr); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4] = data0[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4 + 1] = 
data1[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4 + 2] = data2[lane]; +(reinterpret_cast( +gpuDynInst->d_data))[lane * 4 + 3] = data3[lane]; +} +} + + gp
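The offset handling and per-lane data layout used by these ds_write implementations can be sketched in Python (a simplified model for illustration; the real code operates on VecOperand registers and a raw d_data byte buffer, and the helper names here are hypothetical):

```python
def ds_offset(offset0, offset1):
    # The DS encoding carries two 8-bit offset fields; initiateAcc
    # combines them into one byte offset: (OFFSET1 << 8) | OFFSET0.
    return ((offset1 & 0xFF) << 8) | (offset0 & 0xFF)

def pack_lane_data(exec_mask, lane_data, dwords_per_lane):
    # Each active lane occupies a fixed 4-dword slot in the staging
    # buffer, mirroring the d_data indexing (lane * 4 + i) in the
    # patch; inactive lanes leave their slot untouched.
    buf = [0] * (len(exec_mask) * 4)
    for lane, active in enumerate(exec_mask):
        if active:
            for i in range(dwords_per_lane):
                buf[lane * 4 + i] = lane_data[lane][i]
    return buf
```

This mirrors why a 96b write uses 3 of the 4 dwords in each lane's slot while a 128b write fills all 4.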
[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: mem-ruby: Account for misaligned accesses in GPUCoalescer
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/48341 ) Change subject: mem-ruby: Account for misaligned accesses in GPUCoalescer .. mem-ruby: Account for misaligned accesses in GPUCoalescer Previously, we assumed that the maximum number of requests that would be issued by an instruction was equal to the number of threads that were active for that instruction. However, if a thread has an access that crosses a cache line, that thread has a misaligned access, and needs to request both cache lines. This patch takes that into account by checking the status vector for each thread in that instruction to determine the number of requests. Change-Id: I1994962c46d504b48654dbd22bcd786c9f382fd9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48341 Tested-by: kokoro Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair --- M src/mem/ruby/system/GPUCoalescer.cc 1 file changed, 4 insertions(+), 1 deletion(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/system/GPUCoalescer.cc b/src/mem/ruby/system/GPUCoalescer.cc index c00e7c0..2390ba6 100644 --- a/src/mem/ruby/system/GPUCoalescer.cc +++ b/src/mem/ruby/system/GPUCoalescer.cc @@ -645,7 +645,10 @@ // of the exec_mask. int num_packets = 1; if (!m_usingRubyTester) { -num_packets = getDynInst(pkt)->exec_mask.count(); +num_packets = 0; +for (int i = 0; i < TheGpuISA::NumVecElemPerVecReg; i++) { +num_packets += getDynInst(pkt)->getLaneStatus(i); +} } // the pkt is temporarily stored in the uncoalesced table until 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48341 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: release-staging-v21-1 Gerrit-Change-Id: I1994962c46d504b48654dbd22bcd786c9f382fd9 Gerrit-Change-Number: 48341 Gerrit-PatchSet: 3 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
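The counting change in the GPUCoalescer patch can be illustrated with a small model (the cache-line size and per-lane accesses here are hypothetical; the real code sums gem5's per-lane status vector via getLaneStatus):

```python
LINE_BYTES = 64  # hypothetical cache-line size

def lines_touched(addr, size):
    # An access that crosses a cache-line boundary needs a request
    # for each line it touches, not just one.
    return (addr + size - 1) // LINE_BYTES - addr // LINE_BYTES + 1

def num_packets(lane_accesses):
    # Old scheme: one packet per active lane, i.e. exec_mask.count().
    # New scheme: sum per-lane request counts, so a misaligned lane
    # contributes 2 instead of 1.
    return sum(lines_touched(addr, size) for addr, size in lane_accesses)
```

For example, a lane reading 8 bytes at address 60 straddles the 64-byte boundary and needs two packets, which the old exec_mask-based count would undercount.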
[gem5-dev] Change in gem5/gem5[develop]: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 i...
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/48021 ) Change subject: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop .. Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop Change-Id: I884540d26228cddb739e93eb03541e1deffc4390 --- 1 file changed, 0 insertions(+), 0 deletions(-) -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48021 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I884540d26228cddb739e93eb03541e1deffc4390 Gerrit-Change-Number: 48021 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: fix typo in compute driver comments
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/48023 ) Change subject: gpu-compute: fix typo in compute driver comments .. gpu-compute: fix typo in compute driver comments Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 --- M src/gpu-compute/gpu_compute_driver.cc 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc index 92ac641..4389f31 100644 --- a/src/gpu-compute/gpu_compute_driver.cc +++ b/src/gpu-compute/gpu_compute_driver.cc @@ -831,7 +831,7 @@ // of the region. // // This is a simplified version of regular system VMAs, but for -// GPUVM space (non of the clobber/remap nonsense we find in real +// GPUVM space (none of the clobber/remap nonsense we find in real // OS managed memory). allocateGpuVma(mtype, args->va_addr, args->size); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48023 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5 Gerrit-Change-Number: 48023 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
[gem5-dev] Change in gem5/gem5[develop]: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 i...
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/48022 ) Change subject: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop .. Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop Change-Id: I479b6de37af0de2e92227761794730d37157c803 --- 1 file changed, 0 insertions(+), 0 deletions(-) -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48022 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I479b6de37af0de2e92227761794730d37157c803 Gerrit-Change-Number: 48022 Gerrit-PatchSet: 1 Gerrit-Owner: Matt Sinclair Gerrit-MessageType: newchange
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add fatal when decoding missing insts
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47522 ) Change subject: arch-vega: Add fatal when decoding missing insts .. arch-vega: Add fatal when decoding missing insts Certain instructions don't have implementations in instructions.cc, and get decoded as a nullptr. This adds a fatal when decoding a missing instruction, as we aren't able to properly run a program if all its instructions aren't implemented, and it allows us to figure out which instruction is missing due to fatals printing the line they were called. Change-Id: I7e3690f079b790dceee102063773d5fbbc8619f1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47522 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/decoder.cc 1 file changed, 229 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc index e4b7922..3054d1a 100644 --- a/src/arch/amdgpu/vega/decoder.cc +++ b/src/arch/amdgpu/vega/decoder.cc @@ -4440,6 +4440,7 @@ GPUStaticInst* Decoder::decode_OP_SOP2__S_MUL_HI_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } @@ -4452,42 +4453,49 @@ GPUStaticInst* Decoder::decode_OP_SOP2__S_LSHL1_ADD_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OP_SOP2__S_LSHL2_ADD_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OP_SOP2__S_LSHL3_ADD_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OP_SOP2__S_LSHL4_ADD_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* 
Decoder::decode_OP_SOP2__S_PACK_LL_B32_B16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OP_SOP2__S_PACK_LH_B32_B16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OP_SOP2__S_HH_B32_B16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } @@ -4614,6 +4622,7 @@ GPUStaticInst* Decoder::decode_OP_SOPK__S_CALL_B64(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } @@ -6834,108 +6843,126 @@ GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAD_U32_U16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAD_I32_I16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_XAD_U32(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MIN3_F16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MIN3_I16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MIN3_U16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAX3_F16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAX3_I16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAX3_U16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MED3_F16(MachInst 
iFmt) { +fatal("Trying to decode instruction without a class\n"); return nullptr; } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MED3_I16(MachInst iFmt) { +fatal("Trying to decode instruction without a class\n");
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Update GET_PROCESS_APERTURES IOCTLs
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47529 ) Change subject: gpu-compute: Update GET_PROCESS_APERTURES IOCTLs .. gpu-compute: Update GET_PROCESS_APERTURES IOCTLs The apertures for non-gfx801 GPUs are set differently. If the apertures aren't set properly, ROCm will error out. This change sets the apertures appropriately based on the gfx version of the simulated GPU. It also adds in new functions to set the scratch and lds apertures in GFX9 to mimic the linux kernel. Change-Id: I1fa6f60bc20c7b6eb3896057841d96846460a9f8 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47529 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/gpu_compute_driver.cc M src/gpu-compute/gpu_compute_driver.hh 2 files changed, 88 insertions(+), 22 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc index 472ced4..2fe5275 100644 --- a/src/gpu-compute/gpu_compute_driver.cc +++ b/src/gpu-compute/gpu_compute_driver.cc @@ -316,18 +316,50 @@ * ensure that the base/limit addresses are * calculated correctly. 
*/ -args->process_apertures[i].scratch_base -= scratchApeBase(i + 1); + +switch (gfxVersion) { + case GfxVersion::gfx801: + case GfxVersion::gfx803: +args->process_apertures[i].scratch_base = +scratchApeBase(i + 1); +args->process_apertures[i].lds_base = +ldsApeBase(i + 1); +break; + case GfxVersion::gfx900: +args->process_apertures[i].scratch_base = +scratchApeBaseV9(); +args->process_apertures[i].lds_base = +ldsApeBaseV9(); +break; + default: +fatal("Invalid gfx version\n"); +} + +// GFX8 and GFX9 set lds and scratch limits the same way args->process_apertures[i].scratch_limit = scratchApeLimit(args->process_apertures[i].scratch_base); -args->process_apertures[i].lds_base = ldsApeBase(i + 1); args->process_apertures[i].lds_limit = ldsApeLimit(args->process_apertures[i].lds_base); -args->process_apertures[i].gpuvm_base = gpuVmApeBase(i + 1); -args->process_apertures[i].gpuvm_limit = -gpuVmApeLimit(args->process_apertures[i].gpuvm_base); +switch (gfxVersion) { + case GfxVersion::gfx801: +args->process_apertures[i].gpuvm_base = +gpuVmApeBase(i + 1); +args->process_apertures[i].gpuvm_limit = + gpuVmApeLimit(args->process_apertures[i].gpuvm_base); +break; + case GfxVersion::gfx803: + case GfxVersion::gfx900: +// Taken from SVM_USE_BASE in Linux kernel +args->process_apertures[i].gpuvm_base = 0x100ull; +// Taken from AMDGPU_GMC_HOLE_START in Linux kernel +args->process_apertures[i].gpuvm_limit = +0x8000ULL - 1; +break; + default: +fatal("Invalid gfx version"); +} // NOTE: Must match ID populated by hsaTopology.py // @@ -396,14 +428,6 @@ 47) != 0x1); assert(bits(args->process_apertures[i].lds_limit, 63, 47) != 0); -assert(bits(args->process_apertures[i].gpuvm_base, 63, - 47) != 0x1); -assert(bits(args->process_apertures[i].gpuvm_base, 63, - 47) != 0); -assert(bits(args->process_apertures[i].gpuvm_limit, 63, - 47) != 0x1); -assert(bits(args->process_apertures[i].gpuvm_limit, 63, - 47) != 0); } args.copyOut(virt_proxy); @@ -593,13 +617,41 @@ TypedBufferArg ape_args 
(ioc_args->kfd_process_device_apertures_ptr); -ape_args->scratch_base = scratchApeBase(i + 1); +switch (gfxVersion) { + case GfxVersion::gfx801: + case GfxVersion::gfx803: +
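The aperture dispatch introduced by this patch can be summarized as a small version table (a sketch only; the returned labels stand in for the driver's scratchApeBase/ldsApeBase versus scratchApeBaseV9/ldsApeBaseV9 helper families, and raising on unknown versions mirrors the fatal calls in the diff):

```python
def aperture_source(gfx_version):
    # GFX8 parts compute per-node aperture bases; gfx900 uses the
    # fixed GFX9-style bases that mimic the Linux kernel.
    if gfx_version in ("gfx801", "gfx803"):
        return "per-node"   # scratchApeBase(i + 1) / ldsApeBase(i + 1)
    if gfx_version == "gfx900":
        return "v9-fixed"   # scratchApeBaseV9() / ldsApeBaseV9()
    raise ValueError("Invalid gfx version")
```

The scratch and lds limits are then derived from whichever base was chosen, which is why the patch notes that GFX8 and GFX9 set the limits the same way.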
[gem5-dev] Change in gem5/gem5[develop]: configs: Add shared_cpu_list to cache directories
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47524 ) Change subject: configs: Add shared_cpu_list to cache directories .. configs: Add shared_cpu_list to cache directories The ROCm thunk uses this file instead of the shared_cpu_map file. Change-Id: I985512245c9f51106b8347412ed643f78b567b24 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47524 Tested-by: kokoro Reviewed-by: Matt Sinclair Reviewed-by: Jason Lowe-Power Maintainer: Matt Sinclair --- M configs/common/FileSystemConfig.py 1 file changed, 2 insertions(+), 0 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/common/FileSystemConfig.py b/configs/common/FileSystemConfig.py index 0d9f221..66a6315 100644 --- a/configs/common/FileSystemConfig.py +++ b/configs/common/FileSystemConfig.py @@ -217,6 +217,8 @@ file_append((indexdir, 'number_of_sets'), num_sets) file_append((indexdir, 'physical_line_partition'), '1') file_append((indexdir, 'shared_cpu_map'), hex_mask(cpus)) +file_append((indexdir, 'shared_cpu_list'), +','.join(str(cpu) for cpu in cpus)) def _redirect_paths(options): # Redirect filesystem syscalls from src to the first matching dests 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47524 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I985512245c9f51106b8347412ed643f78b567b24 Gerrit-Change-Number: 47524 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
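The two sysfs formats involved in the shared_cpu_list patch can be sketched as follows (hex_mask here is a simplified stand-in for the helper of the same name in FileSystemConfig.py, whose exact output format may differ):

```python
def shared_cpu_list(cpus):
    # shared_cpu_list: comma-separated decimal CPU ids, matching the
    # ','.join(str(cpu) for cpu in cpus) expression in the patch.
    return ",".join(str(cpu) for cpu in cpus)

def hex_mask(cpus):
    # shared_cpu_map companion format: a hex bitmask with one bit
    # set per CPU sharing the cache.
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return "%08x" % mask
```

Both files describe the same set of CPUs; the ROCm thunk simply parses the list form rather than the mask form, which is why the extra file is needed.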
[gem5-dev] Change in gem5/gem5[develop]: arch-x86: Ignore mbind syscall
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47526 ) Change subject: arch-x86: Ignore mbind syscall .. arch-x86: Ignore mbind syscall mbind gets called when running with a dGPU in ROCm 4, but we are able to ignore it without breaking anything Change-Id: I7c1ba47656122a5eb856981dca2a05359098e3b2 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47526 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/x86/linux/syscall_tbl64.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/x86/linux/syscall_tbl64.cc b/src/arch/x86/linux/syscall_tbl64.cc index e1eed18..5c983c0 100644 --- a/src/arch/x86/linux/syscall_tbl64.cc +++ b/src/arch/x86/linux/syscall_tbl64.cc @@ -284,7 +284,7 @@ { 234, "tgkill", tgkillFunc }, { 235, "utimes" }, { 236, "vserver" }, -{ 237, "mbind" }, +{ 237, "mbind", ignoreFunc }, { 238, "set_mempolicy" }, { 239, "get_mempolicy", ignoreFunc }, { 240, "mq_open" }, 1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47526 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I7c1ba47656122a5eb856981dca2a05359098e3b2 Gerrit-Change-Number: 47526 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Gabe Black Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
[gem5-dev] Change in gem5/gem5[develop]: configs: Don't report CPU cores on Fiji properties
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47525 ) Change subject: configs: Don't report CPU cores on Fiji properties .. configs: Don't report CPU cores on Fiji properties ROCm determines if a device is a dGPU in two ways. The first is by looking at the device ID. The second is through a flag that gets set only if the reported cpu_cores_count is 0. If these don't agree, ROCm breaks when doing memory operations. Previously, cpu_cores_count was non-zero on the Fiji config. This patch sets it to 0 to appease ROCm Change-Id: I0fd0ce724f491ed6a4598188b3799468668585f4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47525 Tested-by: kokoro Reviewed-by: Jason Lowe-Power Reviewed-by: Matthew Poremba Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M configs/example/hsaTopology.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Jason Lowe-Power: Looks good to me, but someone else must approve Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/hsaTopology.py b/configs/example/hsaTopology.py index 78193e0..28060cc 100644 --- a/configs/example/hsaTopology.py +++ b/configs/example/hsaTopology.py @@ -359,7 +359,7 @@ file_append((io_dir, 'properties'), io_prop) # Populate GPU node properties -node_prop = 'cpu_cores_count %s\n' % options.num_cpus + \ +node_prop = 'cpu_cores_count 0\n' + \ 'simd_count %s\n' \ % (options.num_compute_units * options.simds_per_cu)+ \ 'mem_banks_count 1\n' + \ 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47525 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0fd0ce724f491ed6a4598188b3799468668585f4 Gerrit-Change-Number: 47525 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
[gem5-dev] Change in gem5/gem5[develop]: configs: Set valid heap_type values
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47528 ) Change subject: configs: Set valid heap_type values .. configs: Set valid heap_type values The variables that were used to set heap_type don't exist. Explicitly set them to the proper values. Also add pointer to what heap value means in the ROCm stack. Change-Id: I8df7fca7442f6640be1154ef147c4e302ea491bb Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47528 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M configs/example/hsaTopology.py 1 file changed, 12 insertions(+), 2 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/hsaTopology.py b/configs/example/hsaTopology.py index 28060cc..a4dbebb 100644 --- a/configs/example/hsaTopology.py +++ b/configs/example/hsaTopology.py @@ -140,7 +140,9 @@ # CPU memory reporting mem_dir = joinpath(node_dir, 'mem_banks/0') remake_dir(mem_dir) -mem_prop = 'heap_type %s\n' % HsaHeaptype.HSA_HEAPTYPE_SYSTEM.value + \ +# Heap type value taken from real system, heap type values: +# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317 +mem_prop = 'heap_type 0\n' + \ 'size_in_bytes 33704329216\n'+ \ 'flags 0\n' + \ 'width 72\n' + \ @@ -221,7 +223,9 @@ # TODO: Extract size, clk, and width from sim paramters mem_dir = joinpath(node_dir, 'mem_banks/0') remake_dir(mem_dir) -mem_prop = 'heap_type %s\n' % heap_type.value + \ +# Heap type value taken from real system, heap type values: +# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317 +mem_prop = 'heap_type 1\n' + \ 'size_in_bytes 17163091968\n'+ \ 'flags 0\n' + \ 'width 2048\n' + \ @@ -316,6 +320,8 @@ # CPU memory reporting mem_dir = joinpath(node_dir, 
'mem_banks/0') remake_dir(mem_dir) +# Heap type value taken from real system, heap type values: +# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317 mem_prop = 'heap_type 0\n' + \ 'size_in_bytes 33704329216\n'+ \ 'flags 0\n' + \ @@ -394,6 +400,8 @@ # TODO: Extract size, clk, and width from sim paramters mem_dir = joinpath(node_dir, 'mem_banks/0') remake_dir(mem_dir) +# Heap type value taken from real system, heap type values: +# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317 mem_prop = 'heap_type 1\n' + \ 'size_in_bytes 4294967296\n' + \ 'flags 0\n' + \ @@ -471,6 +479,8 @@ mem_dir = joinpath(node_dir, f'mem_banks/{i}') remake_dir(mem_dir) +# Heap type value taken from real system, heap type values: +# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317 mem_prop = f'heap_type 0\n' + \ f'size_in_bytes {toMemorySize(options.mem_size)}'+ \ f'flags 0\n' + \ -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47528 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I8df7fca7442f6640be1154ef147c4e302ea491bb Gerrit-Change-Number: 47528 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
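The heap_type literals written by this patch correspond to the HSA heap-type enumeration in the hsakmttypes.h header linked in the comments; a minimal mapping (the enum values are an assumption based on that header and worth double-checking against it):

```python
# Assumed from the ROCT-Thunk-Interface hsakmttypes.h enumeration.
HSA_HEAPTYPE = {
    "SYSTEM": 0,      # CPU-accessible system memory
    "FB_PUBLIC": 1,   # GPU frame buffer, host-visible
}

def mem_bank_heap_type(is_gpu_bank):
    # CPU mem_banks report heap_type 0 and GPU banks heap_type 1,
    # matching the literal values the patch writes in place of the
    # nonexistent variables it removes.
    return HSA_HEAPTYPE["FB_PUBLIC" if is_gpu_bank else "SYSTEM"]
```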
[gem5-dev] Change in gem5/gem5[develop]: configs,gpu-compute: Set proper dGPUPoolID defaults
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47527 ) Change subject: configs,gpu-compute: Set proper dGPUPoolID defaults .. configs,gpu-compute: Set proper dGPUPoolID defaults In GPU.py, dGPUPoolID is defined as an int, but was defaulted to False. Explicitly set it to 0, instead. In apu_se.py, dGPUPoolID was being set to 1, but that was resulting in crashes. Setting it to 0 avoids those crashes. Change-Id: I0f1161588279a335bbd0d8ae7acda97fc23201b5 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47527 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M configs/example/apu_se.py M src/gpu-compute/GPU.py 2 files changed, 3 insertions(+), 2 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py index 98a1e19..6f686f3 100644 --- a/configs/example/apu_se.py +++ b/configs/example/apu_se.py @@ -432,9 +432,10 @@ args.m_type = 6 # HSA kernel mode driver +# dGPUPoolID is 0 because we only have one memory pool gpu_driver = GPUComputeDriver(filename = "kfd", isdGPU = args.dgpu, gfxVersion = args.gfx_version, - dGPUPoolID = 1, m_type = args.m_type) + dGPUPoolID = 0, m_type = args.m_type) renderDriNum = 128 render_driver = GPURenderDriver(filename = f'dri/renderD{renderDriNum}') diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py index 6b0bb2e..d2f9b6e 100644 --- a/src/gpu-compute/GPU.py +++ b/src/gpu-compute/GPU.py @@ -245,7 +245,7 @@ device = Param.GPUCommandProcessor('GPU controlled by this driver') isdGPU = Param.Bool(False, 'Driver is for a dGPU') gfxVersion = Param.GfxVersion('gfx801', 'ISA of gpu to model') -dGPUPoolID = Param.Int(False, 'Pool ID for dGPU.') +dGPUPoolID = Param.Int(0, 'Pool ID for dGPU.') # Default Mtype for caches #-- 1 1 1 C_RW_S 
(Cached-ReadWrite-Shared) #-- 1 1 0 C_RW_US (Cached-ReadWrite-Unshared) -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47527 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0f1161588279a335bbd0d8ae7acda97fc23201b5 Gerrit-Change-Number: 47527 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Add mmap functionality to GPURenderDriver
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47523 ) Change subject: gpu-compute: Add mmap functionality to GPURenderDriver .. gpu-compute: Add mmap functionality to GPURenderDriver dGPUs mmap the GPURenderDriver, however it doesn't appear that they do anything with it. This patch implements the mmap function by just returning the address provided, while not doing anything else Change-Id: Ia010a2aebcf7e2c75e22d93dfb440937d1bef3b1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47523 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/gpu_render_driver.cc M src/gpu-compute/gpu_render_driver.hh 2 files changed, 14 insertions(+), 1 deletion(-) Approvals: Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_render_driver.cc b/src/gpu-compute/gpu_render_driver.cc index 260a61f..ad75c82 100644 --- a/src/gpu-compute/gpu_render_driver.cc +++ b/src/gpu-compute/gpu_render_driver.cc @@ -41,7 +41,7 @@ /* ROCm 4 utilizes the render driver located at /dev/dri/renderDXXX. This * patch implements a very simple driver that just returns a file - * descriptor when opened, as testing has shown that's all that's needed + * descriptor when opened. */ int GPURenderDriver::open(ThreadContext *tc, int mode, int flags) @@ -52,4 +52,14 @@ return tgt_fd; } +/* DGPUs try to mmap the driver file. 
It doesn't appear they do anything + * with it, so we just return the address that's provided + */ +Addr GPURenderDriver::mmap(ThreadContext *tc, Addr start, uint64_t length, + int prot, int tgt_flags, int tgt_fd, off_t offset) +{ +warn_once("GPURenderDriver::mmap returning start address %#x", start); +return start; +} + } // namespace gem5 diff --git a/src/gpu-compute/gpu_render_driver.hh b/src/gpu-compute/gpu_render_driver.hh index f94fdef..ab1ddcf 100644 --- a/src/gpu-compute/gpu_render_driver.hh +++ b/src/gpu-compute/gpu_render_driver.hh @@ -50,6 +50,9 @@ { return -EBADF; } + +Addr mmap(ThreadContext *tc, Addr start, uint64_t length, + int prot, int tgt_flags, int tgt_fd, off_t offset) override; }; } // namespace gem5 -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47523 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ia010a2aebcf7e2c75e22d93dfb440937d1bef3b1 Gerrit-Change-Number: 47523 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add decoding for implemented insts
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47521 ) Change subject: arch-vega: Add decoding for implemented insts .. arch-vega: Add decoding for implemented insts Certain instructions were implemented in instructions.cc, but weren't actually being decoded by the decoder, causing the decoder to return nullptr for valid instructions. This patch fixes the decoder to return the proper instruction class for implemented instructions Change-Id: I8d8525a1c435147017cb38d9df8e1675986ef04b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47521 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Reviewed-by: Alex Dutu Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/decoder.cc 1 file changed, 9 insertions(+), 9 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc index 359e125..e4b7922 100644 --- a/src/arch/amdgpu/vega/decoder.cc +++ b/src/arch/amdgpu/vega/decoder.cc @@ -4158,19 +4158,19 @@ GPUStaticInst* Decoder::decode_OP_VOP2__V_ADD_U32(MachInst iFmt) { -return nullptr; +return new Inst_VOP2__V_ADD_U32(&iFmt->iFmt_VOP2); } GPUStaticInst* Decoder::decode_OP_VOP2__V_SUB_U32(MachInst iFmt) { -return nullptr; +return new Inst_VOP2__V_SUB_U32(&iFmt->iFmt_VOP2); } GPUStaticInst* Decoder::decode_OP_VOP2__V_SUBREV_U32(MachInst iFmt) { -return nullptr; +return new Inst_VOP2__V_SUBREV_U32(&iFmt->iFmt_VOP2); } GPUStaticInst* @@ -4446,7 +4446,7 @@ GPUStaticInst* Decoder::decode_OP_SOP2__S_MUL_HI_I32(MachInst iFmt) { -return nullptr; +return new Inst_SOP2__S_MUL_I32(&iFmt->iFmt_SOP2); } GPUStaticInst* @@ -6942,31 +6942,31 @@ GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAD_F16(MachInst iFmt) { -return nullptr; +return new 
Inst_VOP3__V_MAD_F16(&iFmt->iFmt_VOP3A); } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAD_U16(MachInst iFmt) { -return nullptr; +return new Inst_VOP3__V_MAD_U16(&iFmt->iFmt_VOP3A); } GPUStaticInst* Decoder::decode_OPU_VOP3__V_MAD_I16(MachInst iFmt) { -return nullptr; +return new Inst_VOP3__V_MAD_I16(&iFmt->iFmt_VOP3A); } GPUStaticInst* Decoder::decode_OPU_VOP3__V_FMA_F16(MachInst iFmt) { -return nullptr; +return new Inst_VOP3__V_FMA_F16(&iFmt->iFmt_VOP3A); } GPUStaticInst* Decoder::decode_OPU_VOP3__V_DIV_FIXUP_F16(MachInst iFmt) { -return nullptr; +return new Inst_VOP3__V_DIV_FIXUP_F16(&iFmt->iFmt_VOP3A); } GPUStaticInst* -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47521 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I8d8525a1c435147017cb38d9df8e1675986ef04b Gerrit-Change-Number: 47521 Gerrit-PatchSet: 2 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
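[Editor's note] The failure mode fixed by the decoder change above — implemented instructions whose decode entries still returned nullptr — can be sketched with a minimal table-driven decoder. All types and names below are illustrative, not the gem5 API:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for gem5's GPUStaticInst hierarchy.
struct GPUStaticInst { std::string name; };

using InstPtr = std::unique_ptr<GPUStaticInst>;
using DecodeFn = std::function<InstPtr(uint32_t)>;

// Before the patch: a stub entry that silently drops a valid instruction.
InstPtr decode_v_add_u32_stub(uint32_t) { return nullptr; }

// After the patch: the entry constructs the proper instruction class.
InstPtr decode_v_add_u32(uint32_t) {
    return std::make_unique<GPUStaticInst>(GPUStaticInst{"v_add_u32"});
}

// A table-driven decode step: unknown opcodes (or stub entries) yield nullptr.
InstPtr decode(const std::map<uint32_t, DecodeFn>& table, uint32_t opcode,
               uint32_t machInst)
{
    auto it = table.find(opcode);
    return it == table.end() ? nullptr : it->second(machInst);
}
```

The symptom described in the commit message follows directly: with the stub entry installed, decode() returns nullptr for a machine instruction the execute path already supports.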
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add missing return to flat_load_dwordx4
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47520 ) Change subject: arch-vega: Add missing return to flat_load_dwordx4 .. arch-vega: Add missing return to flat_load_dwordx4 Change-Id: Ibf56c25a3d22d3c12ae2c1bb11f00f4a44b5919a Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47520 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Reviewed-by: Alex Dutu Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 1 insertion(+), 0 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 6e8c854..cc5a161 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -42984,6 +42984,7 @@ if (gpuDynInst->exec_mask.none()) { wf->decVMemInstsIssued(); wf->decLGKMInstsIssued(); +return; } gpuDynInst->execUnitId = wf->execUnitId; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47520 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ibf56c25a3d22d3c12ae2c1bb11f00f4a44b5919a Gerrit-Change-Number: 47520 Gerrit-PatchSet: 2 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
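[Editor's note] The one-line fix above adds an early return after the counter rollback. A simplified model of that control flow, with hypothetical counter names, shows why the return matters — without it, execution falls through and issues a memory access for a wave with no active lanes:

```cpp
#include <bitset>
#include <cassert>

// Hypothetical wavefront bookkeeping, loosely modeled on the commit above.
struct Wavefront {
    int vmemInstsIssued = 0;
    int lgkmInstsIssued = 0;
};

// Returns true if the access was actually issued.
bool execute_flat_load(Wavefront& wf, const std::bitset<64>& exec_mask)
{
    if (exec_mask.none()) {
        --wf.vmemInstsIssued;   // roll back issue bookkeeping
        --wf.lgkmInstsIssued;
        return false;           // the missing `return` this patch adds
    }
    // ... issue the memory access for the active lanes ...
    return true;
}
```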
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Fix s_endpgm instruction
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47519 ) Change subject: arch-vega: Fix s_endpgm instruction .. arch-vega: Fix s_endpgm instruction Copy over changes that had been made to s_engpgm in GCN3 but weren't added to the Vega implementation Change-Id: I1063f83b1ce8f7c5e451c8c227265715c8f725b9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47519 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Reviewed-by: Alex Dutu Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/vega/insts/instructions.cc 1 file changed, 11 insertions(+), 2 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 47ea892..6e8c854 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -4137,7 +4137,12 @@ ComputeUnit *cu = gpuDynInst->computeUnit(); // delete extra instructions fetched for completed work-items -wf->instructionBuffer.clear(); +wf->instructionBuffer.erase(wf->instructionBuffer.begin() + 1, +wf->instructionBuffer.end()); + +if (wf->pendingFetch) { +wf->dropFetch = true; +} wf->computeUnit->fetchStage.fetchUnit(wf->simdId) .flushBuf(wf->wfSlotId); @@ -4215,8 +4220,11 @@ bool kernelEnd = wf->computeUnit->shader->dispatcher().isReachingKernelEnd(wf); +bool relNeeded = +wf->computeUnit->shader->impl_kern_end_rel; + //if it is not a kernel end, then retire the workgroup directly -if (!kernelEnd) { +if (!kernelEnd || !relNeeded) { wf->computeUnit->shader->dispatcher().notifyWgCompl(wf); wf->setStatus(Wavefront::S_STOPPED); wf->computeUnit->completedWGs++; @@ -4232,6 +4240,7 @@ * the complex */ setFlag(MemSync); +setFlag(GlobalSegment); // Notify Memory System of Kernel 
Completion // Kernel End = isKernel + isMemSync wf->setStatus(Wavefront::S_RETURNING); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47519 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I1063f83b1ce8f7c5e451c8c227265715c8f725b9 Gerrit-Change-Number: 47519 Gerrit-PatchSet: 2 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
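[Editor's note] The buffer-flush portion of the s_endpgm change above replaces a blanket clear() with an erase of everything after the first entry, plus a squash of any in-flight fetch. A minimal sketch of that logic, with simplified types:

```cpp
#include <cassert>
#include <deque>
#include <string>

// Simplified wavefront state; fields mirror the names in the diff above.
struct Wavefront {
    std::deque<std::string> instructionBuffer;
    bool pendingFetch = false;
    bool dropFetch = false;
};

void flush_on_endpgm(Wavefront& wf)
{
    // Keep the first entry (the instruction being retired) and drop only the
    // extra instructions fetched past the end of the kernel.
    if (!wf.instructionBuffer.empty()) {
        wf.instructionBuffer.erase(wf.instructionBuffer.begin() + 1,
                                   wf.instructionBuffer.end());
    }
    // Any fetch still in flight must be discarded when it returns.
    if (wf.pendingFetch)
        wf.dropFetch = true;
}
```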
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3,arch-vega,gpu-compute: Move request counters
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/45347 ) Change subject: arch-gcn3,arch-vega,gpu-compute: Move request counters .. arch-gcn3,arch-vega,gpu-compute: Move request counters When the Vega ISA got committed, it lacked the request counter tracking for memory requests that existed in the GCN3 code. Instead of copying over the same lines from the GCN3 code to the Vega code, this commit makes the various memory pipelines handle updating the request counter information instead, as every memory instruction calls a memory pipeline. This commit also adds an issueRequest in scalar_memory_pipeline, as previously, the gpuDynInsts were explicitly placed in the queue of issuedRequests. Change-Id: I5140d3b2f12be582f2ae9ff7c433167aeec5b68e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45347 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.cc M src/gpu-compute/global_memory_pipeline.cc M src/gpu-compute/local_memory_pipeline.cc M src/gpu-compute/scalar_memory_pipeline.cc M src/gpu-compute/scalar_memory_pipeline.hh 6 files changed, 82 insertions(+), 408 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index bc66ebe..a421454 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -4497,12 +4497,7 @@ calcAddr(gpuDynInst, addr, offset); gpuDynInst->computeUnit()->scalarMemoryPipe -.getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +.issueRequest(gpuDynInst); } void @@ -4556,12 +4551,7 @@ calcAddr(gpuDynInst, addr, offset); 
gpuDynInst->computeUnit()->scalarMemoryPipe. -getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +issueRequest(gpuDynInst); } void @@ -4613,12 +4603,7 @@ calcAddr(gpuDynInst, addr, offset); gpuDynInst->computeUnit()->scalarMemoryPipe. -getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +issueRequest(gpuDynInst); } void @@ -4670,12 +4655,7 @@ calcAddr(gpuDynInst, addr, offset); gpuDynInst->computeUnit()->scalarMemoryPipe. -getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +issueRequest(gpuDynInst); } void @@ -4727,12 +4707,7 @@ calcAddr(gpuDynInst, addr, offset); gpuDynInst->computeUnit()->scalarMemoryPipe. 
-getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +issueRequest(gpuDynInst); } void @@ -4785,12 +4760,7 @@ calcAddr(gpuDynInst, rsrcDesc, offset); gpuDynInst->computeUnit()->scalarMemoryPipe -.getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +.issueRequest(gpuDynInst); } // execute void @@ -4844,12 +4814,7 @@ calcAddr(gpuDynInst, rsrcDesc, offset); gpuDynInst->computeUnit()->scalarMemoryPipe -.getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefront()->validateRequestCounters(); +.issueRequest(gpuDynInst); } // execute void @@ -4903,12 +4868,7 @@ calcAddr(gpuDynInst, rsrcDesc, offset); gpuDynInst->computeUnit()->scalarMemoryPipe -.getGMReqFIFO().push(gpuDynInst); - -wf->scalarRdGmReqsInPipe--; -wf->scalarOutstandingReqsRdGm++; -gpuDynInst->wavefront()->outstandingReqs++; -gpuDynInst->wavefron
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/45346 ) Change subject: arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use .. arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use vector_register_file uses the exec_mask of a memory instruction in order to determine if it should mark a register as in-use or not. Previously, the exec_mask of memory instructions was only set on execution of that instruction, which occurs after the code in vector_register_file. This led to the code reading potentially garbage data, leading to a scenario where a register would be marked used when it shouldn't be. This fix sets the exec_mask of memory instructions in schedule_stage, which works because the only time the wavefront execMask() is updated is on a instruction executing, and we know the previous instruction will have executed by the time schedule_stage executes, due to the order the pipeline is executed in. This also undoes part of a patch from last year (62ec973) which treated the symptom of accidental register allocation, without preventing the registers from being allocated in the first place. 
This patch also removes now redundant code that sets the exec_mask in instructions.cc for memory instructions Change-Id: Idabd3502764fb06133ac2458606c1aaf6f04 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45346 Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Maintainer: Matthew Poremba Tested-by: kokoro --- M src/arch/amdgpu/gcn3/insts/instructions.cc M src/gpu-compute/schedule_stage.cc M src/gpu-compute/vector_register_file.cc 3 files changed, 30 insertions(+), 156 deletions(-) Approvals: Matthew Poremba: Looks good to me, approved; Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index 8c77b8c..bc66ebe 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -31243,7 +31243,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -31304,7 +31303,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -31368,7 +31366,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -31548,7 +31545,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( 
gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -31608,7 +31604,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -32073,7 +32068,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -32135,7 +32129,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -32200,7 +32193,6 @@ { Wavefront *wf = gpuDynInst->wavefront(); gpuDynInst->execUnitId = wf->execUnitId; -gpuDynInst->exec_mask = wf->execMask(); gpuDynInst->latency.init(gpuDynInst->computeUnit()); gpuDynInst->latency.set( gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24))); @@ -32284,7 +32276,6 @@ { W
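[Editor's note] The ordering bug described above — the register file reading an exec_mask that execute() had not yet initialized — can be modeled in a few lines. Names are illustrative; the point is that seeding the mask at schedule time guarantees it is valid wherever the register file later inspects it:

```cpp
#include <bitset>
#include <cassert>

using ExecMask = std::bitset<64>;

// Hypothetical dynamic instruction; maskValid models "garbage vs. set".
struct GpuDynInst { ExecMask exec_mask; bool maskValid = false; };

struct Wavefront { ExecMask execMask() const { return ExecMask{0xFF}; } };

// After the fix: schedule_stage seeds the mask before the register file runs,
// which is safe because the previous instruction has already executed by then.
void schedule(GpuDynInst& inst, const Wavefront& wf)
{
    inst.exec_mask = wf.execMask();
    inst.maskValid = true;
}

// vector_register_file only marks destination registers busy when some lane
// will actually write them; with an uninitialized mask this was a coin flip.
bool marks_register_busy(const GpuDynInst& inst)
{
    return inst.maskValid && inst.exec_mask.any();
}
```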
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Read registers in execute instead of initiateAcc
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/45345 ) Change subject: arch-gcn3: Read registers in execute instead of initiateAcc .. arch-gcn3: Read registers in execute instead of initiateAcc Certain memory writes were reading their registers in initiateAcc, which lead to scenarios where a subsequent instruction would execute, clobbering the value in that register before the memory writes' initiateAcc method was called, causing the memory write to read wrong data. This patch moves all register reads to execute, preventing the above scenario from happening. Change-Id: Iee107c19e4b82c2e172bf2d6cc95b79983a43d83 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45345 Tested-by: kokoro Reviewed-by: Matt Sinclair Reviewed-by: Matthew Poremba Reviewed-by: Alex Dutu Maintainer: Matt Sinclair --- M src/arch/amdgpu/gcn3/insts/instructions.cc 1 file changed, 116 insertions(+), 125 deletions(-) Approvals: Alex Dutu: Looks good to me, approved Matthew Poremba: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc index b5a4300..8c77b8c 100644 --- a/src/arch/amdgpu/gcn3/insts/instructions.cc +++ b/src/arch/amdgpu/gcn3/insts/instructions.cc @@ -5068,8 +5068,13 @@ gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); ScalarRegU32 offset(0); ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1); +ConstScalarOperandU32 sdata(gpuDynInst, instData.SDATA); addr.read(); +sdata.read(); + +std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), +sizeof(ScalarRegU32)); if (instData.IMM) { offset = extData.OFFSET; @@ -5093,10 +5098,6 @@ void Inst_SMEM__S_STORE_DWORD::initiateAcc(GPUDynInstPtr gpuDynInst) { -ConstScalarOperandU32 sdata(gpuDynInst, instData.SDATA); -sdata.read(); 
-std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), -sizeof(ScalarRegU32)); initMemWrite<1>(gpuDynInst); } // initiateAcc @@ -5127,8 +5128,13 @@ gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); ScalarRegU32 offset(0); ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1); +ConstScalarOperandU64 sdata(gpuDynInst, instData.SDATA); addr.read(); +sdata.read(); + +std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), +sizeof(ScalarRegU64)); if (instData.IMM) { offset = extData.OFFSET; @@ -5152,10 +5158,6 @@ void Inst_SMEM__S_STORE_DWORDX2::initiateAcc(GPUDynInstPtr gpuDynInst) { -ConstScalarOperandU64 sdata(gpuDynInst, instData.SDATA); -sdata.read(); -std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), -sizeof(ScalarRegU64)); initMemWrite<2>(gpuDynInst); } // initiateAcc @@ -5186,8 +5188,13 @@ gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod()); ScalarRegU32 offset(0); ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1); +ConstScalarOperandU128 sdata(gpuDynInst, instData.SDATA); addr.read(); +sdata.read(); + +std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), +4 * sizeof(ScalarRegU32)); if (instData.IMM) { offset = extData.OFFSET; @@ -5211,10 +5218,6 @@ void Inst_SMEM__S_STORE_DWORDX4::initiateAcc(GPUDynInstPtr gpuDynInst) { -ConstScalarOperandU128 sdata(gpuDynInst, instData.SDATA); -sdata.read(); -std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(), -4 * sizeof(ScalarRegU32)); initMemWrite<4>(gpuDynInst); } // initiateAcc @@ -35746,9 +35749,18 @@ ConstVecOperandU32 addr1(gpuDynInst, extData.VADDR + 1); ConstScalarOperandU128 rsrcDesc(gpuDynInst, extData.SRSRC * 4); ConstScalarOperandU32 offset(gpuDynInst, extData.SOFFSET); +ConstVecOperandI8 data(gpuDynInst, extData.VDATA); rsrcDesc.read(); offset.read(); +data.read(); + +for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { +if (gpuDynInst->exec_mask[lane]) { +(reinterpret_cast(gpuDynInst->d_data))[lane] += 
data[lane]; +} +} int inst_offset = instData.OFFSET; @@ -35793,16 +35805,6 @@ void Inst_MUBUF__BUFFER_STORE_BYTE::initiateAcc(GPUDynInstPtr gpuDynInst) { -ConstVecOperandI8 data(gpuDynInst, extData.VDATA); -data.read(); - -for (int lane = 0; lane < NumVecElemPerVecR
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Check for WAX dependences
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47539 ) Change subject: gpu-compute: Check for WAX dependences .. gpu-compute: Check for WAX dependences This adds checking if the destination registers are free or busy in the operandsReady() function for both scalar and vector registers. This allows us to catch WAX dependences between instructions. Change-Id: I0fb0b29e9608fca0d90c059422d4d9500d5b2a7d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47539 Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair Tested-by: kokoro --- M src/gpu-compute/scalar_register_file.cc M src/gpu-compute/vector_register_file.cc 2 files changed, 22 insertions(+), 0 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/scalar_register_file.cc b/src/gpu-compute/scalar_register_file.cc index 52e0a2f..3a00093 100644 --- a/src/gpu-compute/scalar_register_file.cc +++ b/src/gpu-compute/scalar_register_file.cc @@ -64,6 +64,17 @@ } } +for (const auto& dstScalarOp : ii->dstScalarRegOperands()) { +for (const auto& physIdx : dstScalarOp.physIndices()) { +if (regBusy(physIdx)) { +DPRINTF(GPUSRF, "WAX stall: WV[%d]: %s: physReg[%d]\n", +w->wfDynId, ii->disassemble(), physIdx); +w->stats.numTimesBlockedDueWAXDependencies++; +return false; +} +} +} + return true; } diff --git a/src/gpu-compute/vector_register_file.cc b/src/gpu-compute/vector_register_file.cc index dc5434d..2355643 100644 --- a/src/gpu-compute/vector_register_file.cc +++ b/src/gpu-compute/vector_register_file.cc @@ -71,6 +71,17 @@ } } +for (const auto& dstVecOp : ii->dstVecRegOperands()) { +for (const auto& physIdx : dstVecOp.physIndices()) { +if (regBusy(physIdx)) { +DPRINTF(GPUVRF, "WAX stall: WV[%d]: %s: physReg[%d]\n", +w->wfDynId, ii->disassemble(), physIdx); +w->stats.numTimesBlockedDueWAXDependencies++; +return false; +} +} +} + return true; } -- To view, visit 
https://gem5-review.googlesource.com/c/public/gem5/+/47539 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I0fb0b29e9608fca0d90c059422d4d9500d5b2a7d Gerrit-Change-Number: 47539 Gerrit-PatchSet: 2 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
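[Editor's note] The WAX check above extends operandsReady() so destination registers are tested against the busy set alongside sources. A simplified model of that predicate — register operands flattened to plain indices, names illustrative:

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Instruction {
    std::vector<int> srcRegs;
    std::vector<int> dstRegs;
};

bool operands_ready(const Instruction& ii, const std::set<int>& busyRegs)
{
    // Pre-existing check: sources must be free (RAW hazards).
    for (int r : ii.srcRegs)
        if (busyRegs.count(r)) return false;
    // New check from this change: destinations must be free too, so
    // write-after-read/write (WAX) hazards stall issue instead of racing.
    for (int r : ii.dstRegs)
        if (busyRegs.count(r)) return false;
    return true;
}
```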
[gem5-dev] Change in gem5/gem5[develop]: ruby: fix typo in VIPER TCC triggerQueue
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/44905 ) Change subject: ruby: fix typo in VIPER TCC triggerQueue .. ruby: fix typo in VIPER TCC triggerQueue The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead of "TriggerMsg" for the triggerQueue_in port. This was a benign bug beacuse the msg type is not used in the in_port implementation but still makes the SLICC harder to understand, so fixing it is worthwhile. Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905 Reviewed-by: Jason Lowe-Power Reviewed-by: Matthew Poremba Maintainer: Jason Lowe-Power Tested-by: kokoro --- M src/mem/ruby/protocol/GPU_VIPER-TCC.sm 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved Matthew Poremba: Looks good to me, approved kokoro: Regressions pass diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm index e21ba99..6c07416 100644 --- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm +++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm @@ -252,7 +252,7 @@ // ** IN_PORTS ** - in_port(triggerQueue_in, TiggerMsg, triggerQueue) { + in_port(triggerQueue_in, TriggerMsg, triggerQueue) { if (triggerQueue_in.isReady(clockEdge())) { peek(triggerQueue_in, TriggerMsg) { TBE tbe := TBEs.lookup(in_msg.addr); -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/44905 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9 Gerrit-Change-Number: 44905 Gerrit-PatchSet: 2 Gerrit-Owner: Matt Sinclair Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ 
gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: Fix override for updateHsaSignal
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/44046 ) Change subject: dev-hsa,gpu-compute: Fix override for updateHsaSignal .. dev-hsa,gpu-compute: Fix override for updateHsaSignal Change 965ad12 removed a parameter from the updateHsaSignal function. Change 25e8a14 added the parameter back, but only for the derived class, breaking the override. This patch adds that parameter back to the base class, fixing the override. Change-Id: Id1e96e29ca4be7f3ce244bac83a112e3250812d1 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44046 Reviewed-by: Jason Lowe-Power Reviewed-by: Alex Dutu Reviewed-by: Matt Sinclair Tested-by: kokoro Maintainer: Matt Sinclair --- M src/dev/hsa/hsa_device.hh M src/gpu-compute/gpu_command_processor.hh 2 files changed, 3 insertions(+), 2 deletions(-) Approvals: Jason Lowe-Power: Looks good to me, approved Alex Dutu: Looks good to me, approved Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved kokoro: Regressions pass diff --git a/src/dev/hsa/hsa_device.hh b/src/dev/hsa/hsa_device.hh index 157c459..5b6f388 100644 --- a/src/dev/hsa/hsa_device.hh +++ b/src/dev/hsa/hsa_device.hh @@ -101,7 +101,8 @@ fatal("%s does not need HSA driver\n", name()); } virtual void -updateHsaSignal(Addr signal_handle, uint64_t signal_value) +updateHsaSignal(Addr signal_handle, uint64_t signal_value, +HsaSignalCallbackFunction function = [] (const uint64_t &) { }) { fatal("%s does not have HSA signal update functionality.\n", name()); } diff --git a/src/gpu-compute/gpu_command_processor.hh b/src/gpu-compute/gpu_command_processor.hh index c78ae0b..67cda7d 100644 --- a/src/gpu-compute/gpu_command_processor.hh +++ b/src/gpu-compute/gpu_command_processor.hh @@ -90,7 +90,7 @@ void updateHsaSignal(Addr signal_handle, uint64_t signal_value, HsaSignalCallbackFunction function = -[] (const uint64_t &) { }); +[] (const uint64_t &) { }) override; uint64_t 
functionalReadHsaSignal(Addr signal_handle) override; -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/44046 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Id1e96e29ca4be7f3ce244bac83a112e3250812d1 Gerrit-Change-Number: 44046 Gerrit-PatchSet: 4 Gerrit-Owner: Kyle Roarty Gerrit-Reviewer: Alex Dutu Gerrit-Reviewer: Jason Lowe-Power Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
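[Editor's note] The override fix above illustrates a general C++ rule: a derived method that adds a parameter — even a defaulted one — has a different signature and no longer overrides the base virtual. A compilable sketch of the repaired shape (simplified return types so the behavior is observable; the real methods return void):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using Addr = uint64_t;
using HsaSignalCallbackFunction = std::function<void(const uint64_t&)>;

struct HSADevice {
    virtual ~HSADevice() = default;
    // The fix: the base declares the same callback parameter, so the derived
    // signature matches and `override` compiles again.
    virtual int updateHsaSignal(Addr, uint64_t,
        HsaSignalCallbackFunction = [](const uint64_t&) {}) { return 0; }
};

struct GPUCommandProcessor : HSADevice {
    int updateHsaSignal(Addr, uint64_t,
        HsaSignalCallbackFunction = [](const uint64_t&) {}) override
    { return 1; }
};
```

Note that default arguments are bound statically, not virtually — both declarations here supply the same no-op default, so calls through a base reference behave as expected.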
[gem5-dev] Change in gem5/gem5[develop]: arch-vega, gpu-compute: Add vectors to hold op info
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42211 ) Change subject: arch-vega, gpu-compute: Add vectors to hold op info .. arch-vega, gpu-compute: Add vectors to hold op info This removes the need for redundant functions like isScalarRegister/isVectorRegister, as well as isSrcOperand/isDstOperand. Also, the op info is only generated once this way instead of every time it's needed. Change-Id: I8af5080502ed08ed9107a441e2728828f86496f4 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42211 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh M src/arch/amdgpu/gcn3/insts/instructions.hh M src/arch/amdgpu/gcn3/insts/op_encodings.cc M src/arch/amdgpu/gcn3/insts/op_encodings.hh M src/arch/amdgpu/vega/insts/gpu_static_inst.hh M src/arch/amdgpu/vega/insts/instructions.hh M src/arch/amdgpu/vega/insts/op_encodings.cc M src/arch/amdgpu/vega/insts/op_encodings.hh M src/gpu-compute/gpu_dyn_inst.cc M src/gpu-compute/gpu_static_inst.hh A src/gpu-compute/operand_info.hh 11 files changed, 1,233 insertions(+), 80,257 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42211 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I8af5080502ed08ed9107a441e2728828f86496f4 Gerrit-Change-Number: 42211 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Bobby R. 
Bruce Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
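[Editor's note] The op-info change above trades repeated predicate calls (isScalarRegister, isSrcOperand, ...) for operand vectors populated once at construction. A sketch of the caching idea — the types and accessors below are illustrative, not the gem5 OperandInfo API:

```cpp
#include <cassert>
#include <vector>

// Classification computed once, then stored.
struct OperandInfo {
    int regIndex;
    bool isScalar;
    bool isDst;
};

class StaticInst {
    std::vector<OperandInfo> srcOps_, dstOps_;
  public:
    // Called once per operand during decode/construction.
    void addOperand(const OperandInfo& op) {
        (op.isDst ? dstOps_ : srcOps_).push_back(op);
    }
    // Consumers iterate the pre-built vectors instead of re-querying
    // per-operand predicates on every access.
    const std::vector<OperandInfo>& srcOperands() const { return srcOps_; }
    const std::vector<OperandInfo>& dstOperands() const { return dstOps_; }
};
```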
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3, gpu-compute: Update getRegisterIndex() API
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42210 ) Change subject: arch-gcn3, gpu-compute: Update getRegisterIndex() API .. arch-gcn3, gpu-compute: Update getRegisterIndex() API This change removes the GPUDynInstPtr argument from getRegisterIndex(). The dynamic inst was only needed to get access to its parent WF's state so it could determine the number of scalar registers the wave was allocated. However, we can simply pass the number of scalar registers directly. This cuts down on shared pointer usage. Change-Id: I29ab8d9a3de1f8b82b820ef421fc653284567c65 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42210 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh M src/arch/amdgpu/gcn3/insts/op_encodings.cc M src/arch/amdgpu/gcn3/insts/op_encodings.hh M src/gpu-compute/fetch_unit.cc M src/gpu-compute/gpu_dyn_inst.cc M src/gpu-compute/gpu_dyn_inst.hh M src/gpu-compute/gpu_static_inst.hh M src/gpu-compute/scalar_register_file.cc M src/gpu-compute/vector_register_file.cc M src/gpu-compute/wavefront.cc 10 files changed, 86 insertions(+), 120 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh b/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh index 03beb20..e4983e8 100644 --- a/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh +++ b/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh @@ -70,7 +70,7 @@ int getOperandSize(int opIdx) override { return 0; } int -getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) override +getRegisterIndex(int opIdx, int num_scalar_regs) override { return 0; } diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.cc b/src/arch/amdgpu/gcn3/insts/op_encodings.cc index a6a3a26..34bd35f 100644 --- a/src/arch/amdgpu/gcn3/insts/op_encodings.cc +++ b/src/arch/amdgpu/gcn3/insts/op_encodings.cc @@ 
-128,21 +128,18 @@ } int -Inst_SOP2::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) +Inst_SOP2::getRegisterIndex(int opIdx, int num_scalar_regs) { assert(opIdx >= 0); assert(opIdx < getNumOperands()); switch (opIdx) { case 0: -return opSelectorToRegIdx(instData.SSRC0, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SSRC0, num_scalar_regs); case 1: -return opSelectorToRegIdx(instData.SSRC1, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SSRC1, num_scalar_regs); case 2: -return opSelectorToRegIdx(instData.SDST, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SDST, num_scalar_regs); default: fatal("Operand at idx %i does not exist\n", opIdx); return -1; @@ -244,7 +241,7 @@ } int -Inst_SOPK::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) +Inst_SOPK::getRegisterIndex(int opIdx, int num_scalar_regs) { assert(opIdx >= 0); assert(opIdx < getNumOperands()); @@ -253,8 +250,7 @@ case 0: return -1; case 1: -return opSelectorToRegIdx(instData.SDST, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SDST, num_scalar_regs); default: fatal("Operand at idx %i does not exist\n", opIdx); return -1; @@ -349,7 +345,7 @@ } int -Inst_SOP1::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) +Inst_SOP1::getRegisterIndex(int opIdx, int num_scalar_regs) { assert(opIdx >= 0); assert(opIdx < getNumOperands()); @@ -359,14 +355,11 @@ if (instData.OP == 0x1C) { // Special case for s_getpc, which has no source reg. // Instead, it implicitly reads the PC. 
-return opSelectorToRegIdx(instData.SDST, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SDST, num_scalar_regs); } -return opSelectorToRegIdx(instData.SSRC0, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SSRC0, num_scalar_regs); case 1: -return opSelectorToRegIdx(instData.SDST, -gpuDynInst->wavefront()->reservedScalarRegs); +return opSelectorToRegIdx(instData.SDST, num_scalar_regs); default: fatal("Operand at idx %i does not exist\n", opIdx); return -1; @@ -467
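The refactor above can be sketched as follows. This is a minimal illustration of the pattern, not gem5's actual implementation: the selector-to-index mapping inside `opSelectorToRegIdx` is an assumption made up for the example, and the point is only that the callee now takes a plain `int` (the wave's reserved scalar-register count) instead of a `GPUDynInstPtr`, avoiding a shared-pointer refcount bump per call.

```cpp
#include <cassert>

// Hypothetical stand-in for gem5's opSelectorToRegIdx(); the mapping logic
// here (selectors below the reserved count map 1:1 onto SGPRs) is an
// illustrative assumption, not the real ISA mapping.
int opSelectorToRegIdx(int op_selector, int num_scalar_regs)
{
    return (op_selector < num_scalar_regs) ? op_selector : -1;
}

// After the change: a plain int parameter carries the wave's reserved
// scalar-register count, so no GPUDynInstPtr (shared_ptr) is copied.
int getRegisterIndex(int op_idx, int num_scalar_regs)
{
    assert(op_idx >= 0);
    return opSelectorToRegIdx(op_idx, num_scalar_regs);
}
```

The caller that previously passed `gpuDynInst` now passes `gpuDynInst->wavefront()->reservedScalarRegs` once, at the call site.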
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Update instruction encodings
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42205 ) Change subject: arch-vega: Update instruction encodings .. arch-vega: Update instruction encodings This also renames VOP3 and VOP3_SDST_ENC to VOP3A and VOP3B, matching the ISA. Change-Id: I56f254433b1f3181d4ee6896f957a2256e3c7b29 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42205 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/decoder.cc M src/arch/amdgpu/vega/gpu_decoder.hh M src/arch/amdgpu/vega/insts/inst_util.hh M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/instructions.hh M src/arch/amdgpu/vega/insts/op_encodings.cc M src/arch/amdgpu/vega/insts/op_encodings.hh 7 files changed, 2,111 insertions(+), 2,063 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42205 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I56f254433b1f3181d4ee6896f957a2256e3c7b29 Gerrit-Change-Number: 42205 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: kokoro Gerrit-CC: Bobby R. Bruce Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add decodings for Flat, Global, Scratch
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42206 ) Change subject: arch-vega: Add decodings for Flat, Global, Scratch .. arch-vega: Add decodings for Flat, Global, Scratch Does not implement the functions yet Change-Id: I32feab747b13bd2eff98983e3281c0d82e756221 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42206 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/decoder.cc M src/arch/amdgpu/vega/gpu_decoder.hh 2 files changed, 832 insertions(+), 9 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc index 5dac7f9..3015313 100644 --- a/src/arch/amdgpu/vega/decoder.cc +++ b/src/arch/amdgpu/vega/decoder.cc @@ -1623,19 +1623,19 @@ &Decoder::decode_OP_FLAT__FLAT_LOAD_DWORDX3, &Decoder::decode_OP_FLAT__FLAT_LOAD_DWORDX4, &Decoder::decode_OP_FLAT__FLAT_STORE_BYTE, -&Decoder::decode_invalid, +&Decoder::decode_OP_FLAT__FLAT_STORE_BYTE_D16_HI, &Decoder::decode_OP_FLAT__FLAT_STORE_SHORT, -&Decoder::decode_invalid, +&Decoder::decode_OP_FLAT__FLAT_STORE_SHORT_D16_HI, &Decoder::decode_OP_FLAT__FLAT_STORE_DWORD, &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX2, &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX3, &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX4, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, -&Decoder::decode_invalid, +&Decoder::decode_OP_FLAT__FLAT_LOAD_UBYTE_D16, +&Decoder::decode_OP_FLAT__FLAT_LOAD_UBYTE_D16_HI, +&Decoder::decode_OP_FLAT__FLAT_LOAD_SBYTE_D16, +&Decoder::decode_OP_FLAT__FLAT_LOAD_SBYTE_D16_HI, +&Decoder::decode_OP_FLAT__FLAT_LOAD_SHORT_D16, +&Decoder::decode_OP_FLAT__FLAT_LOAD_SHORT_D16_HI, &Decoder::decode_invalid, &Decoder::decode_invalid, &Decoder::decode_invalid, @@ -1728,6 +1728,137 @@ 
&Decoder::decode_invalid }; +IsaDecodeMethod Decoder::tableSubDecode_OP_GLOBAL[] = { +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_USHORT, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SSHORT, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORD, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX2, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX3, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX4, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_BYTE, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_BYTE_D16_HI, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_SHORT, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_SHORT_D16_HI, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORD, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX2, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX3, +&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX4, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE_D16, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE_D16_HI, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE_D16, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE_D16_HI, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SHORT_D16, +&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SHORT_D16_HI, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, 
+&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +&Decoder::decode_invalid, +
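The sub-decode tables added above follow a common dispatch pattern: an opcode field indexes an array of handler function pointers, and every slot without an implemented decoding points at a shared `decode_invalid` handler. A minimal sketch of that pattern, with illustrative names and return values that are not gem5's:

```cpp
#include <array>
#include <cassert>

// Illustrative decode result; gem5's handlers construct instruction objects.
struct Inst { int result; };

using DecodeFn = Inst (*)();

Inst decode_invalid()           { return {-1}; }
Inst decode_global_load_dword() { return {20}; }

// Build the sub-table: every slot defaults to decode_invalid, and only
// implemented opcodes get a real handler (slot 20 chosen arbitrarily here).
std::array<DecodeFn, 64> makeSubTable()
{
    std::array<DecodeFn, 64> t;
    t.fill(decode_invalid);
    t[20] = decode_global_load_dword;
    return t;
}

Inst subDecode(unsigned op)
{
    static const auto table = makeSubTable();
    return table[op % table.size()]();
}
```

This is why the change above mostly replaces `decode_invalid` entries in place: filling in a decoding is a one-slot edit in the table, with no change to the dispatch logic.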
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Add operand info class to GPUDynInst
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42209 ) Change subject: gpu-compute: Add operand info class to GPUDynInst .. gpu-compute: Add operand info class to GPUDynInst This change adds a class that stores operand register info for the GPUDynInst. The operand info is calculated when the instruction object is created and stored for easy access by the RF, etc. Change-Id: I3cf267942e54fe60fcb4224d3b88da08a1a0226e Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42209 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/gcn3/registers.hh M src/gpu-compute/SConscript M src/gpu-compute/fetch_unit.cc M src/gpu-compute/gpu_dyn_inst.cc M src/gpu-compute/gpu_dyn_inst.hh M src/gpu-compute/wavefront.cc 6 files changed, 223 insertions(+), 9 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/gcn3/registers.hh b/src/arch/gcn3/registers.hh index 7ad9b1f..df1ef4e 100644 --- a/src/arch/gcn3/registers.hh +++ b/src/arch/gcn3/registers.hh @@ -168,6 +168,12 @@ typedef int64_t VecElemI64; typedef double VecElemF64; +const int DWORDSize = sizeof(VecElemU32); +/** + * Size of a single-precision register in DWORDs. 
+ */ +const int RegSizeDWORDs = sizeof(VecElemU32) / DWORDSize; + // typedefs for the various sizes/types of vector regs using VecRegU8 = ::VecRegT; using VecRegI8 = ::VecRegT; diff --git a/src/gpu-compute/SConscript b/src/gpu-compute/SConscript index e41e387..adb9b0e 100644 --- a/src/gpu-compute/SConscript +++ b/src/gpu-compute/SConscript @@ -80,6 +80,7 @@ DebugFlag('GPUDisp') DebugFlag('GPUExec') DebugFlag('GPUFetch') +DebugFlag('GPUInst') DebugFlag('GPUKernelInfo') DebugFlag('GPUMem') DebugFlag('GPUPort') diff --git a/src/gpu-compute/fetch_unit.cc b/src/gpu-compute/fetch_unit.cc index 62b9e73..d2af7b3 100644 --- a/src/gpu-compute/fetch_unit.cc +++ b/src/gpu-compute/fetch_unit.cc @@ -557,6 +557,7 @@ wavefront, gpu_static_inst, wavefront->computeUnit-> getAndIncSeqNum()); +gpu_dyn_inst->initOperandInfo(gpu_dyn_inst); wavefront->instructionBuffer.push_back(gpu_dyn_inst); DPRINTF(GPUFetch, "WF[%d][%d]: Id%ld decoded %s (%d bytes). " @@ -597,6 +598,7 @@ wavefront, gpu_static_inst, wavefront->computeUnit-> getAndIncSeqNum()); +gpu_dyn_inst->initOperandInfo(gpu_dyn_inst); wavefront->instructionBuffer.push_back(gpu_dyn_inst); DPRINTF(GPUFetch, "WF[%d][%d]: Id%d decoded split inst %s (%#x) " diff --git a/src/gpu-compute/gpu_dyn_inst.cc b/src/gpu-compute/gpu_dyn_inst.cc index b9b23d4..c08e4b9 100644 --- a/src/gpu-compute/gpu_dyn_inst.cc +++ b/src/gpu-compute/gpu_dyn_inst.cc @@ -33,6 +33,7 @@ #include "gpu-compute/gpu_dyn_inst.hh" +#include "debug/GPUInst.hh" #include "debug/GPUMem.hh" #include "gpu-compute/gpu_static_inst.hh" #include "gpu-compute/scalar_register_file.hh" @@ -43,7 +44,8 @@ GPUStaticInst *static_inst, InstSeqNum instSeqNum) : GPUExecContext(_cu, _wf), scalarAddr(0), addr(computeUnit()->wfSize(), (Addr)0), numScalarReqs(0), isSaveRestore(false), - _staticInst(static_inst), _seqNum(instSeqNum) + _staticInst(static_inst), _seqNum(instSeqNum), + maxSrcVecRegOpSize(0), maxSrcScalarRegOpSize(0) { statusVector.assign(TheGpuISA::NumVecElemPerVecReg, 0); 
tlbHitLevel.assign(computeUnit()->wfSize(), -1); @@ -82,6 +84,109 @@ } } +void +GPUDynInst::initOperandInfo(GPUDynInstPtr &gpu_dyn_inst) +{ +assert(gpu_dyn_inst->wavefront()); +/** + * Generate and cache the operand to register mapping information. This + * prevents this info from being generated multiple times throughout + * the CU pipeline. + */ +DPRINTF(GPUInst, "%s: generating operand info for %d operands\n", +disassemble(), getNumOperands()); + +for (int op_idx = 0; op_idx < getNumOperands(); ++op_idx) { +int virt_idx(-1); +int phys_idx(-1); +int op_num_dwords(-1); + +if (isVectorRegister(op_idx)) { +virt_idx = getRegisterIndex(op_idx, gpu_dyn_inst); +op_num_dwords = numOpdDWORDs(op_idx); + +if (isSrcOperand(op_idx)) { +std::vector virt_indices; +std::vector phys_indices; + +if (op_num_dwords > maxSrcVecRegOpSize) { +maxSrcVecRegOpSize = op_num_dwords; +} + +for (int i = 0; i < op_num_dwords; ++i) { +phys_idx = compu
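The idea behind `initOperandInfo()` above — compute the operand-to-register mapping once when the dynamic instruction is created, and cache it for later pipeline stages — can be sketched as follows. The struct and field names below are illustrative stand-ins, not gem5's actual types; only `maxSrcVecRegOpSize` mirrors a field visible in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-operand record; gem5 tracks virtual/physical indices
// and operand width in DWORDs, which this sketch approximates.
struct OperandInfo
{
    int virtIdx;
    int numDWords;
};

struct DynInst
{
    std::vector<OperandInfo> srcVecOps;  // cached at construction
    int maxSrcVecRegOpSize = 0;

    // Called once when the instruction object is built, so the register
    // files and pipeline stages read cached info instead of re-deriving it.
    void initOperandInfo(const std::vector<OperandInfo> &ops)
    {
        for (const auto &op : ops) {
            srcVecOps.push_back(op);
            if (op.numDWords > maxSrcVecRegOpSize)
                maxSrcVecRegOpSize = op.numDWords;
        }
    }
};
```

The payoff is the one stated in the commit message: the mapping is generated once at decode/creation time rather than recomputed at each use throughout the CU pipeline.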
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Update FLAT instructions to use offset
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42213 ) Change subject: arch-vega: Update FLAT instructions to use offset .. arch-vega: Update FLAT instructions to use offset In Vega, flat instructions use an offset when computing the address (section 9.4 of chapter 9 'Flat Memory Instructions' in Vega ISA manual). This is different from the GCN3 baseline. Change-Id: I9fe36f028014889ef566055458c451442403a289 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42213 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/amdgpu/vega/insts/instructions.cc M src/arch/amdgpu/vega/insts/op_encodings.hh 2 files changed, 20 insertions(+), 19 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc index 281fd95..0a01bf2 100644 --- a/src/arch/amdgpu/vega/insts/instructions.cc +++ b/src/arch/amdgpu/vega/insts/instructions.cc @@ -42461,7 +42461,7 @@ addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42552,7 +42552,7 @@ addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42644,7 +42644,7 @@ addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42707,7 +42707,7 @@ addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42770,7 +42770,7 @@ addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42841,7 +42841,7 @@ 
addr.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -42920,7 +42920,7 @@ data.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -42983,7 +42983,7 @@ addr.read(); data.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43046,7 +43046,7 @@ addr.read(); data.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43117,7 +43117,7 @@ } } -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); if (isFlatGlobal()) { gpuDynInst->computeUnit()->globalMemoryPipe @@ -43178,7 +43178,7 @@ data1.read(); data2.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43252,7 +43252,7 @@ data2.read(); data3.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43329,7 +43329,7 @@ addr.read(); data.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43417,7 +43417,7 @@ data.read(); cmp.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { @@ -43501,7 +43501,7 @@ addr.read(); data.read(); -calcAddr(gpuDynInst, addr); +calcAddr(gpuDynInst, addr, instData.OFFSET); for (int lane = 0; lane < 
NumVecElemPerVecReg; ++lane) { if (gpuDynInst->exec_mask[lane]) { (reinterpret_cast(gpuDynInst->
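The addressing change above — Vega's FLAT instructions add the instruction's immediate OFFSET field to the register-supplied address, where the GCN3 baseline used the register value alone — reduces, per lane, to something like the following. This is a sketch of the arithmetic only, not gem5's `calcAddr`; the assumption that OFFSET is a signed immediate added to the base follows the description in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane address for a Vega FLAT access: base address from the VGPR pair
// plus the instruction's signed immediate OFFSET (assumption noted above).
uint64_t calcLaneAddr(uint64_t vgpr_addr, int32_t offset)
{
    return vgpr_addr + static_cast<int64_t>(offset);
}
```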
[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add Vega ISA as a copy of GCN3
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42204 ) Change subject: arch-vega: Add Vega ISA as a copy of GCN3 .. arch-vega: Add Vega ISA as a copy of GCN3 This changeset adds Vega support as a copy of GCN3. Configs have been modified to include both ISAs. Current implementation is not complete and needs modifications to fully comply with the ISA manual: https://developer.amd.com/wp-content/resources/ Vega_Shader_ISA_28July2017.pdf Change-Id: I608aa6747a45594f8e1bd7802da1883cf612168b Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42204 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M MAINTAINERS.yaml M src/arch/SConscript A src/arch/amdgpu/vega/SConscript A src/arch/amdgpu/vega/SConsopts A src/arch/amdgpu/vega/decoder.cc A src/arch/amdgpu/vega/gpu_decoder.hh A src/arch/amdgpu/vega/gpu_isa.hh A src/arch/amdgpu/vega/gpu_mem_helpers.hh A src/arch/amdgpu/vega/gpu_registers.hh A src/arch/amdgpu/vega/gpu_types.hh A src/arch/amdgpu/vega/insts/gpu_static_inst.cc A src/arch/amdgpu/vega/insts/gpu_static_inst.hh A src/arch/amdgpu/vega/insts/inst_util.hh A src/arch/amdgpu/vega/insts/instructions.cc A src/arch/amdgpu/vega/insts/instructions.hh A src/arch/amdgpu/vega/insts/op_encodings.cc A src/arch/amdgpu/vega/insts/op_encodings.hh A src/arch/amdgpu/vega/isa.cc A src/arch/amdgpu/vega/operand.hh A src/arch/amdgpu/vega/registers.cc 20 files changed, 144,242 insertions(+), 1 deletion(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42204 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I608aa6747a45594f8e1bd7802da1883cf612168b Gerrit-Change-Number: 42204 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Bobby R. Bruce Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Modify directory structure as prep for adding vega isa
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42203 ) Change subject: arch-gcn3: Modify directory structure as prep for adding vega isa .. arch-gcn3: Modify directory structure as prep for adding vega isa Change-Id: I7c5f4a3a9d82ca4550e833dec2cd576dbe333627 Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42203 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/arch/SConscript R src/arch/amdgpu/gcn3/SConscript R src/arch/amdgpu/gcn3/SConsopts A src/arch/amdgpu/gcn3/ast_interpreter.py A src/arch/amdgpu/gcn3/ast_objects.py R src/arch/amdgpu/gcn3/decoder.cc A src/arch/amdgpu/gcn3/description_objects.py A src/arch/amdgpu/gcn3/description_parser.py R src/arch/amdgpu/gcn3/gpu_decoder.hh R src/arch/amdgpu/gcn3/gpu_isa.hh A src/arch/amdgpu/gcn3/gpu_isa_main.py A src/arch/amdgpu/gcn3/gpu_isa_parser.py R src/arch/amdgpu/gcn3/gpu_mem_helpers.hh A src/arch/amdgpu/gcn3/gpu_registers.hh R src/arch/amdgpu/gcn3/gpu_types.hh A src/arch/amdgpu/gcn3/hand_coded.py R src/arch/amdgpu/gcn3/insts/gpu_static_inst.cc R src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh R src/arch/amdgpu/gcn3/insts/inst_util.hh R src/arch/amdgpu/gcn3/insts/instructions.cc R src/arch/amdgpu/gcn3/insts/instructions.hh R src/arch/amdgpu/gcn3/insts/op_encodings.cc R src/arch/amdgpu/gcn3/insts/op_encodings.hh R src/arch/amdgpu/gcn3/isa.cc R src/arch/amdgpu/gcn3/operand.hh R src/arch/amdgpu/gcn3/registers.cc 26 files changed, 7,485 insertions(+), 50 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass 3 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. 
-- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42203 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: I7c5f4a3a9d82ca4550e833dec2cd576dbe333627 Gerrit-Change-Number: 42203 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Bobby R. Bruce Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Remove unused functions
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/42202 ) Change subject: gpu-compute: Remove unused functions .. gpu-compute: Remove unused functions These functions were probably used for some stat collection, but they're no longer used, so they're being removed Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42202 Tested-by: kokoro Reviewed-by: Matt Sinclair Maintainer: Matt Sinclair --- M src/gpu-compute/gpu_dyn_inst.cc M src/gpu-compute/gpu_dyn_inst.hh 2 files changed, 0 insertions(+), 37 deletions(-) Approvals: Matt Sinclair: Looks good to me, approved; Looks good to me, approved kokoro: Regressions pass diff --git a/src/gpu-compute/gpu_dyn_inst.cc b/src/gpu-compute/gpu_dyn_inst.cc index b827632..b9b23d4 100644 --- a/src/gpu-compute/gpu_dyn_inst.cc +++ b/src/gpu-compute/gpu_dyn_inst.cc @@ -268,40 +268,6 @@ return _staticInst->executed_as; } -bool -GPUDynInst::hasVgprRawDependence(GPUDynInstPtr s) -{ -assert(s); -for (int i = 0; i < getNumOperands(); ++i) { -if (isVectorRegister(i) && isSrcOperand(i)) { -for (int j = 0; j < s->getNumOperands(); ++j) { -if (s->isVectorRegister(j) && s->isDstOperand(j)) { -if (i == j) -return true; -} -} -} -} -return false; -} - -bool -GPUDynInst::hasSgprRawDependence(GPUDynInstPtr s) -{ -assert(s); -for (int i = 0; i < getNumOperands(); ++i) { -if (isScalarRegister(i) && isSrcOperand(i)) { -for (int j = 0; j < s->getNumOperands(); ++j) { -if (s->isScalarRegister(j) && s->isDstOperand(j)) { -if (i == j) -return true; -} -} -} -} -return false; -} - // Process a memory instruction and (if necessary) submit timing request void GPUDynInst::initiateAcc(GPUDynInstPtr gpuDynInst) diff --git a/src/gpu-compute/gpu_dyn_inst.hh b/src/gpu-compute/gpu_dyn_inst.hh index 851a46a..97eea01 100644 --- a/src/gpu-compute/gpu_dyn_inst.hh +++ b/src/gpu-compute/gpu_dyn_inst.hh @@ -101,9 +101,6 @@ bool hasDestinationVgpr() const; bool 
hasSourceVgpr() const; -bool hasSgprRawDependence(GPUDynInstPtr s); -bool hasVgprRawDependence(GPUDynInstPtr s); - // returns true if the string "opcodeStr" is found in the // opcode of the instruction bool isOpcode(const std::string& opcodeStr) const; 2 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted one. -- To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42202 To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings Gerrit-Project: public/gem5 Gerrit-Branch: develop Gerrit-Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d Gerrit-Change-Number: 42202 Gerrit-PatchSet: 4 Gerrit-Owner: Alex Dutu Gerrit-Reviewer: Matt Sinclair Gerrit-Reviewer: Matthew Poremba Gerrit-Reviewer: kokoro Gerrit-CC: Kyle Roarty Gerrit-MessageType: merged ___ gem5-dev mailing list -- gem5-dev@gem5.org To unsubscribe send an email to gem5-dev-le...@gem5.org
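For context on what the removed `has{V,S}gprRawDependence` helpers were checking: a read-after-write (RAW) hazard exists when a source register of the younger instruction matches a destination register of the older one. A minimal, illustrative sketch of that check over plain register-index lists (not gem5's operand API):

```cpp
#include <cassert>
#include <vector>

// True when any source register of the younger instruction matches any
// destination register of the older instruction (a RAW hazard).
bool hasRawDependence(const std::vector<int> &younger_srcs,
                      const std::vector<int> &older_dsts)
{
    for (int src : younger_srcs)
        for (int dst : older_dsts)
            if (src == dst)
                return true;
    return false;
}
```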