[gem5-dev] Change in gem5/gem5[develop]: tests: update nightly tests to document square

2021-09-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50949 )



Change subject: tests: update nightly tests to document square
..

tests: update nightly tests to document square

Add some information and comments on why square is included in the
nightly tests.

Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382
---
M tests/nightly.sh
1 file changed, 4 insertions(+), 0 deletions(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index 3ffdbcd..91b19f5 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -91,10 +91,14 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

+# get square
 wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square

 mkdir -p tests/testing-results

+# Square is the simplest, fastest, and most heavily tested GPU application.
+# Thus, we always want to run this in the nightly regressions to make sure
+# basic GPU functionality is working.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 -c square

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50949
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382
Gerrit-Change-Number: 50949
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: tests: Add HeteroSync to nightly regression

2021-09-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50951 )



Change subject: tests: Add HeteroSync to nightly regression
..

tests: Add HeteroSync to nightly regression

HeteroSync does a good job of testing the GPU memory system and
atomics support, without requiring a long runtime.  Thus, this
commit adds a mutex and barrier test from HeteroSync to the
nightly regression to ensure these components are tested.

Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a
---
M tests/nightly.sh
1 file changed, 22 insertions(+), 0 deletions(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index 91b19f5..3f115e0 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -102,3 +102,25 @@
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 -c square
+
+# get HeteroSync
+wget -qN http://dist.gem5.org/dist/v21-1/test-progs/heterosync/gcn3/allSyncPrims-1kernel
+
+# run HeteroSync sleepMutex -- 16 WGs (4 per CU in default config), each doing
+# 10 Ld/St per thread and 4 iterations of the critical section, is a reasonable
+# moderate contention case for the default 4 CU GPU config and helps ensure GPU
+# atomics are tested.
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
+--options="sleepMutex 10 16 4"
+
+# run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs
+# accessing unique data and then joining a lock-free barrier, 10 Ld/St per
+# thread, 4 iterations of the critical section.  Again, this is representative
+# of a moderate contention case for the default 4 CU GPU config and helps
+# ensure GPU atomics are tested.
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
+--options="lfTreeBarrUniq 10 16 4"
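The positional values in the `--options` string above, and the per-CU load they imply, can be sanity-checked with a small sketch. The field names here are my reading of the comments in nightly.sh, not HeteroSync's actual argument names, and `decode_heterosync_options` is a hypothetical helper, not part of gem5:

```python
# Decode a HeteroSync options string: "<algorithm> <ldst_per_thread> <num_wgs> <iters>"
# (field meanings inferred from the nightly.sh comments; helper is illustrative only)
def decode_heterosync_options(options, num_cus=4):
    algo, ldst, wgs, iters = options.split()
    ldst, wgs, iters = int(ldst), int(wgs), int(iters)
    return {
        "algorithm": algo,
        "ldst_per_thread": ldst,
        "workgroups": wgs,
        "wgs_per_cu": wgs // num_cus,        # 16 WGs / 4 CUs = 4 WGs per CU
        "critical_section_iters": iters,
    }

params = decode_heterosync_options("sleepMutex 10 16 4")
print(params["wgs_per_cu"])  # 4 -> the "moderate contention" case described above
```

With the default 4-CU GPU config, 16 workgroups gives 4 per CU, which is the moderate-contention point the comments aim for.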

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50951
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a
Gerrit-Change-Number: 50951
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: tests: add LULESH to weekly regression

2021-09-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50952 )



Change subject: tests: add LULESH to weekly regression
..

tests: add LULESH to weekly regression

LULESH is a popular GPU HPC application that acts as a good test
for several memory and compute patterns.  Thus, including it in
the weekly regressions will help verify correctness and
functionality for code that affects the GPU.  The default LULESH
input runs 10 iterations and takes 3-4 hours.  Hence, it is not
appropriate for nightly regressions.

Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e
---
M tests/weekly.sh
1 file changed, 16 insertions(+), 0 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index 393c66f..d837940 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -43,3 +43,19 @@
 docker run -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \
 ./main.py run --length very-long -j${threads} -t${threads}
+
+# For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
+docker pull gcr.io/gem5-test/gcn-gpu:latest
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"scons build/GCN3_X86/gem5.opt -j${threads} \
+|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
+
+# get LULESH
+wget -qN http://dist.gem5.org/dist/v21-1/test-progs/lulesh/lulesh
+
+mkdir -p tests/testing-results
+
+# LULESH is heavily used in the HPC community on GPUs, and does a good job of
+# stressing several GPU compute and memory components
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50952
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e
Gerrit-Change-Number: 50952
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: update GPU scripts to remove master/slave

2021-09-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50967 )



Change subject: configs, gpu-compute: update GPU scripts to remove master/slave
..

configs, gpu-compute: update GPU scripts to remove master/slave

Update apu_se and underlying configuration files for GPU runs to
replace the master/slave terminology.

Change-Id: Icf309782f0899dc412eccd27e3ac017902316a70
---
M configs/common/GPUTLBConfig.py
M configs/example/apu_se.py
2 files changed, 34 insertions(+), 28 deletions(-)



diff --git a/configs/common/GPUTLBConfig.py b/configs/common/GPUTLBConfig.py
index 958cf1f..d7adaee 100644
--- a/configs/common/GPUTLBConfig.py
+++ b/configs/common/GPUTLBConfig.py
@@ -148,8 +148,8 @@
 for TLB_type in hierarchy_level:
 name = TLB_type['name']
 for index in range(TLB_type['width']):
-exec('system.%s_coalescer[%d].master[0] = \
-system.%s_tlb[%d].slave[0]' % \
+exec('system.%s_coalescer[%d].mem_side_ports[0] = \
+system.%s_tlb[%d].cpu_side_ports[0]' % \
 (name, index, name, index))

 # Connect the cpuSidePort (slave) of all the coalescers in level 1
@@ -163,12 +163,12 @@
 if tlb_per_cu:
 for tlb in range(tlb_per_cu):
 exec('system.cpu[%d].CUs[%d].translation_port[%d] = \
-system.l1_coalescer[%d].slave[%d]' % \
+system.l1_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, cu_idx, tlb,
 cu_idx*tlb_per_cu+tlb, 0))
 else:
 exec('system.cpu[%d].CUs[%d].translation_port[%d] = \
-system.l1_coalescer[%d].slave[%d]' % \
+system.l1_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, cu_idx, tlb_per_cu,
 cu_idx / (n_cu / num_TLBs),
 cu_idx % (n_cu / num_TLBs)))
@@ -177,14 +177,14 @@
 sqc_tlb_index = index / options.cu_per_sqc
 sqc_tlb_port_id = index % options.cu_per_sqc
 exec('system.cpu[%d].CUs[%d].sqc_tlb_port = \
-system.sqc_coalescer[%d].slave[%d]' % \
+system.sqc_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, index, sqc_tlb_index, sqc_tlb_port_id))

 elif name == 'scalar': # Scalar D-TLB
 for index in range(n_cu):
 scalar_tlb_index = index / options.cu_per_scalar_cache
 scalar_tlb_port_id = index % options.cu_per_scalar_cache
 exec('system.cpu[%d].CUs[%d].scalar_tlb_port = \
-system.scalar_coalescer[%d].slave[%d]' % \
+system.scalar_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, index, scalar_tlb_index,
  scalar_tlb_port_id))

@@ -196,11 +196,12 @@
 for TLB_type in L1:
 name = TLB_type['name']
 for index in range(TLB_type['width']):
-exec('system.%s_tlb[%d].master[0] = \
-system.l2_coalescer[0].slave[%d]' % \
+exec('system.%s_tlb[%d].mem_side_ports[0] = \
+system.l2_coalescer[0].cpu_side_ports[%d]' % \
 (name, index, l2_coalescer_index))
 l2_coalescer_index += 1
 # L2 <-> L3
-system.l2_tlb[0].master[0] = system.l3_coalescer[0].slave[0]
+system.l2_tlb[0].mem_side_ports[0] = \
+system.l3_coalescer[0].cpu_side_ports[0]

 return system
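The rename above is mechanical: every `master` port becomes `mem_side_ports` and every `slave` port becomes `cpu_side_ports`, with the wiring otherwise unchanged. A minimal mock of the coalescer-to-TLB connection the exec() strings build (these classes are illustrative stand-ins, not gem5's SimObject port machinery):

```python
# Simplified mock of the coalescer -> TLB wiring from GPUTLBConfig.py,
# using the new port names (mem_side_ports / cpu_side_ports).
class Port:
    def __init__(self):
        self.peer = None  # the port on the other side, once connected

class Component:
    def __init__(self, n_ports=1):
        self.mem_side_ports = [Port() for _ in range(n_ports)]
        self.cpu_side_ports = [Port() for _ in range(n_ports)]

def connect(mem_side, cpu_side):
    # equivalent of: coalescer.mem_side_ports[0] = tlb.cpu_side_ports[0]
    mem_side.peer, cpu_side.peer = cpu_side, mem_side

coalescer, tlb = Component(), Component()
connect(coalescer.mem_side_ports[0], tlb.cpu_side_ports[0])
assert coalescer.mem_side_ports[0].peer is tlb.cpu_side_ports[0]
```

In gem5 itself the assignment operator on port objects performs the binding, which is why the config scripts can express the same connection as a plain `=`.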
diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 7a45952..29ceddb 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -342,8 +342,9 @@
 compute_units[-1].prefetch_prev_type = args.pf_type

 # attach the LDS and the CU to the bus (actually a Bridge)
-compute_units[-1].ldsPort = compute_units[-1].ldsBus.slave
-compute_units[-1].ldsBus.master = compute_units[-1].localDataStore.cuPort
+compute_units[-1].ldsPort = compute_units[-1].ldsBus.cpu_side_port
+compute_units[-1].ldsBus.mem_side_port = \
+compute_units[-1].localDataStore.cuPort

 # Attach compute units to GPU
 shader.CUs = compute_units
@@ -561,8 +562,8 @@
 Ruby.create_system(args, None, system, None, dma_list, None)
 system.ruby.clk_domain = SrcClockDomain(clock = args.ruby_clock,
 voltage_domain = system.voltage_domain)
-gpu_cmd_proc.pio = system.piobus.master
-gpu_hsapp.pio = system.piobus.master
+gpu_cmd_proc.pio = system.piobus.mem_side_ports
+gpu_hsapp.pio = system.piobus.mem_side_ports

 for i, dma_device in enumerate(dma_list):
 exec('system.

[gem5-dev] Change in gem5/gem5[develop]: tests: update nightly tests to document square

2021-09-27 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50949 )


Change subject: tests: update nightly tests to document square
..

tests: update nightly tests to document square

Add some information and comments on why square is included in the
nightly tests.

Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50949
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Bobby R. Bruce 
---
M tests/nightly.sh
1 file changed, 20 insertions(+), 0 deletions(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/nightly.sh b/tests/nightly.sh
index 3ffdbcd..91b19f5 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -91,10 +91,14 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

+# get square
 wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square

 mkdir -p tests/testing-results

+# Square is the simplest, fastest, and most heavily tested GPU application.
+# Thus, we always want to run this in the nightly regressions to make sure
+# basic GPU functionality is working.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 -c square

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50949
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I80b61fb90f16ad0d693ec29975908549e8102382
Gerrit-Change-Number: 50949
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: tests: Add HeteroSync to nightly regression

2021-09-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50951 )


Change subject: tests: Add HeteroSync to nightly regression
..

tests: Add HeteroSync to nightly regression

HeteroSync does a good job of testing the GPU memory system and
atomics support, without requiring a long runtime.  Thus, this
commit adds a mutex and barrier test from HeteroSync to the
nightly regression to ensure these components are tested.

Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50951
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Bobby R. Bruce 
---
M tests/nightly.sh
1 file changed, 40 insertions(+), 0 deletions(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/nightly.sh b/tests/nightly.sh
index 91b19f5..6631bb0 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -102,3 +102,25 @@
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 -c square
+
+# get HeteroSync
+wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel
+
+# run HeteroSync sleepMutex -- 16 WGs (4 per CU in default config), each doing
+# 10 Ld/St per thread and 4 iterations of the critical section, is a reasonable
+# moderate contention case for the default 4 CU GPU config and helps ensure GPU
+# atomics are tested.
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
+--options="sleepMutex 10 16 4"
+
+# run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs
+# accessing unique data and then joining a lock-free barrier, 10 Ld/St per
+# thread, 4 iterations of the critical section.  Again, this is representative
+# of a moderate contention case for the default 4 CU GPU config and helps
+# ensure GPU atomics are tested.
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
+--options="lfTreeBarrUniq 10 16 4"

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50951
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I65998a0a63d41dd3ba165c3a000cee7e42e9034a
Gerrit-Change-Number: 50951
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Kyle Roarty 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: tests: add LULESH to weekly regression

2021-09-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50952 )


(2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the submitted one.)

Change subject: tests: add LULESH to weekly regression
..

tests: add LULESH to weekly regression

LULESH is a popular GPU HPC application that acts as a good test
for several memory and compute patterns.  Thus, including it in
the weekly regressions will help verify correctness and
functionality for code that affects the GPU.  The default LULESH
input runs 10 iterations and takes 3-4 hours.  Hence, it is not
appropriate for nightly regressions.

Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50952
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Bobby R. Bruce 
---
M tests/weekly.sh
1 file changed, 36 insertions(+), 0 deletions(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index 393c66f..b697c29 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -43,3 +43,19 @@
 docker run -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \
 ./main.py run --length very-long -j${threads} -t${threads}
+
+# For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
+docker pull gcr.io/gem5-test/gcn-gpu:latest
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"scons build/GCN3_X86/gem5.opt -j${threads} \
+|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
+
+# get LULESH
+wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh
+
+mkdir -p tests/testing-results
+
+# LULESH is heavily used in the HPC community on GPUs, and does a good job of
+# stressing several GPU compute and memory components
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/50952
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic1b73ab32fdd5cb1b973f2676b272adb91b2a98e
Gerrit-Change-Number: 50952
Gerrit-PatchSet: 4
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Kyle Roarty 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: update GPU scripts to remove master/slave

2021-09-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/50967 )


Change subject: configs, gpu-compute: update GPU scripts to remove master/slave
..

configs, gpu-compute: update GPU scripts to remove master/slave

Update apu_se and underlying configuration files for GPU runs to
replace the master/slave terminology.

Change-Id: Icf309782f0899dc412eccd27e3ac017902316a70
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/50967
Tested-by: kokoro 
Reviewed-by: Matthew Poremba 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Bobby R. Bruce 
Maintainer: Jason Lowe-Power 
Maintainer: Bobby R. Bruce 
---
M configs/common/GPUTLBConfig.py
M configs/example/apu_se.py
2 files changed, 53 insertions(+), 28 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks  
good to me, approved

  Matthew Poremba: Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/common/GPUTLBConfig.py b/configs/common/GPUTLBConfig.py
index 958cf1f..d7adaee 100644
--- a/configs/common/GPUTLBConfig.py
+++ b/configs/common/GPUTLBConfig.py
@@ -148,8 +148,8 @@
 for TLB_type in hierarchy_level:
 name = TLB_type['name']
 for index in range(TLB_type['width']):
-exec('system.%s_coalescer[%d].master[0] = \
-system.%s_tlb[%d].slave[0]' % \
+exec('system.%s_coalescer[%d].mem_side_ports[0] = \
+system.%s_tlb[%d].cpu_side_ports[0]' % \
 (name, index, name, index))

 # Connect the cpuSidePort (slave) of all the coalescers in level 1
@@ -163,12 +163,12 @@
 if tlb_per_cu:
 for tlb in range(tlb_per_cu):
 exec('system.cpu[%d].CUs[%d].translation_port[%d] = \
-system.l1_coalescer[%d].slave[%d]' % \
+system.l1_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, cu_idx, tlb,
 cu_idx*tlb_per_cu+tlb, 0))
 else:
 exec('system.cpu[%d].CUs[%d].translation_port[%d] = \
-system.l1_coalescer[%d].slave[%d]' % \
+system.l1_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, cu_idx, tlb_per_cu,
 cu_idx / (n_cu / num_TLBs),
 cu_idx % (n_cu / num_TLBs)))
@@ -177,14 +177,14 @@
 sqc_tlb_index = index / options.cu_per_sqc
 sqc_tlb_port_id = index % options.cu_per_sqc
 exec('system.cpu[%d].CUs[%d].sqc_tlb_port = \
-system.sqc_coalescer[%d].slave[%d]' % \
+system.sqc_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, index, sqc_tlb_index, sqc_tlb_port_id))

 elif name == 'scalar': # Scalar D-TLB
 for index in range(n_cu):
 scalar_tlb_index = index / options.cu_per_scalar_cache
 scalar_tlb_port_id = index % options.cu_per_scalar_cache
 exec('system.cpu[%d].CUs[%d].scalar_tlb_port = \
-system.scalar_coalescer[%d].slave[%d]' % \
+system.scalar_coalescer[%d].cpu_side_ports[%d]' % \
 (shader_idx, index, scalar_tlb_index,
  scalar_tlb_port_id))

@@ -196,11 +196,12 @@
 for TLB_type in L1:
 name = TLB_type['name']
 for index in range(TLB_type['width']):
-exec('system.%s_tlb[%d].master[0] = \
-system.l2_coalescer[0].slave[%d]' % \
+exec('system.%s_tlb[%d].mem_side_ports[0] = \
+system.l2_coalescer[0].cpu_side_ports[%d]' % \
 (name, index, l2_coalescer_index))
 l2_coalescer_index += 1
 # L2 <-> L3
-system.l2_tlb[0].master[0] = system.l3_coalescer[0].slave[0]
+system.l2_tlb[0].mem_side_ports[0] = \
+system.l3_coalescer[0].cpu_side_ports[0]

 return system
diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 7a45952..29ceddb 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -342,8 +342,9 @@
 compute_units[-1].prefetch_prev_type = args.pf_type

 # attach the LDS and the CU to the bus (actually a Bridge)
-compute_units[-1].ldsPort = compute_units[-1].ldsBus.slave
-compute_units[-1].ldsBus.master = compute_units[-1].localDataStore.cuPort
+compute_units[-1].ldsPort = compute_units[-1].ldsBus.cpu_side_port
+compute_units[-1].ldsBus.mem_side_port = \
+compute_units[-1].localDataStore.cuPort

 # Attach compute units to GPU
 shader.CUs = compute_units

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Fix MUBUF out-of-bounds case 1

2021-09-30 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51127 )


Change subject: arch-gcn3: Fix MUBUF out-of-bounds case 1
..

arch-gcn3: Fix MUBUF out-of-bounds case 1

This patch updates the out-of-bounds check to properly check
against the correct buffer_offset, which differs depending
on whether const_swizzle_enable is true or false.

Change-Id: I5c687c09ee7f8e446618084b8545b74a84211d4d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51127
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Reviewed-by: Alex Dutu 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/gcn3/insts/op_encodings.hh
1 file changed, 42 insertions(+), 20 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.hh  
b/src/arch/amdgpu/gcn3/insts/op_encodings.hh

index 24edfa7..be96924 100644
--- a/src/arch/amdgpu/gcn3/insts/op_encodings.hh
+++ b/src/arch/amdgpu/gcn3/insts/op_encodings.hh
@@ -634,6 +634,7 @@
 Addr stride = 0;
 Addr buf_idx = 0;
 Addr buf_off = 0;
+Addr buffer_offset = 0;
 BufferRsrcDescriptor rsrc_desc;

 std::memcpy((void*)&rsrc_desc, s_rsrc_desc.rawDataPtr(),
@@ -656,6 +657,26 @@

 buf_off = v_off[lane] + inst_offset;

+if (rsrc_desc.swizzleEn) {
+Addr idx_stride = 8 << rsrc_desc.idxStride;
+Addr elem_size = 2 << rsrc_desc.elemSize;
+Addr idx_msb = buf_idx / idx_stride;
+Addr idx_lsb = buf_idx % idx_stride;
+Addr off_msb = buf_off / elem_size;
+Addr off_lsb = buf_off % elem_size;
+DPRINTF(GCN3, "mubuf swizzled lane %d: "
+"idx_stride = %llx, elem_size = %llx, "
+"idx_msb = %llx, idx_lsb = %llx, "
+"off_msb = %llx, off_lsb = %llx\n",
+lane, idx_stride, elem_size, idx_msb, idx_lsb,
+off_msb, off_lsb);
+
+buffer_offset = (idx_msb * stride + off_msb * elem_size)
+* idx_stride + idx_lsb * elem_size + off_lsb;
+} else {
+buffer_offset = buf_off + stride * buf_idx;
+}
+

 /**
  * Range check behavior causes out of range accesses to
@@ -665,7 +686,7 @@
  * basis.
  */
 if (rsrc_desc.stride == 0 || !rsrc_desc.swizzleEn) {
-if (buf_off + stride * buf_idx >=
+if (buffer_offset >=
 rsrc_desc.numRecords - s_offset.rawData()) {
 DPRINTF(GCN3, "mubuf out-of-bounds condition 1: "
 "lane = %d, buffer_offset = %llx, "
@@ -692,25 +713,7 @@
 }
 }

-if (rsrc_desc.swizzleEn) {
-Addr idx_stride = 8 << rsrc_desc.idxStride;
-Addr elem_size = 2 << rsrc_desc.elemSize;
-Addr idx_msb = buf_idx / idx_stride;
-Addr idx_lsb = buf_idx % idx_stride;
-Addr off_msb = buf_off / elem_size;
-Addr off_lsb = buf_off % elem_size;
-DPRINTF(GCN3, "mubuf swizzled lane %d: "
-"idx_stride = %llx, elem_size = %llx, "
-"idx_msb = %llx, idx_lsb = %llx, "
-"off_msb = %llx, off_lsb = %llx\n",
-lane, idx_stride, elem_size, idx_msb, idx_lsb,
-off_msb, off_lsb);
-
-vaddr += ((idx_msb * stride + off_msb * elem_size)
-* idx_stride + idx_lsb * elem_size + off_lsb);
-} else {
-vaddr += buf_off + stride * buf_idx;
-}
+vaddr += buffer_offset;

 DPRINTF(GCN3, "Calculating mubuf address for lane %d: "
 "vaddr = %llx, base_addr = %llx, "

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51127
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5c687c09ee7f8e446618084b8545b74a84211d4d
Gerrit-Change-Number: 51127

[gem5-dev] Change in gem5/gem5[develop]: tests: add DNNMark to weekly regression

2021-10-01 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51187 )



Change subject: tests: add DNNMark to weekly regression
..

tests: add DNNMark to weekly regression

DNNMark is representative of several simple (fast) layers within ML
applications, which are heavily used in modern GPU applications.  Thus,
we want to make sure support for these applications is tested.  This
commit updates the weekly regression to run three variants: fwd_softmax,
bwd_bn, and fwd_pool -- ensuring we test both inference and training as
well as a variety of ML layers.

Change-Id: I38bfa9bd3a2817099ece46afc2d6132ce346e21a
---
M tests/weekly.sh
1 file changed, 87 insertions(+), 0 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index b697c29..dd25d4b 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -59,3 +59,74 @@
 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
 docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh

+
+# get DNNMark
+# Pull the gem5 resources to the root of the gem5 directory -- DNNMark
+# builds a library and thus doesn't have a binary, so we need to build
+# it before we run it
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"
+
+# setup cmake for DNNMark
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
+ gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP"
+
+# make the DNNMark library
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \
+gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}"
+
+# generate cachefiles -- since we are testing gfx801 and 4 CUs (default config)
+# in tester, we want cachefiles for this setup
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" \
+"-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \
+gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \
+--num-cus=4"
+
+# generate mmap data for DNNMark (makes simulation much faster)
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data"
+
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"./generate_rand_data"
+
+# now we can run DNNMark!
+# DNNMark is representative of several simple (fast) layers within ML
+# applications, which are heavily used in modern GPU applications.  So, we  
want
+# to make sure support for these applications are tested.  Run three  
variants:
+# fwd_softmax, bwd_bn, fwd_pool; these tests ensure we run a variety of ML  
kernels,

+# including both inference and training
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \
+   "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \
+   -w "${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu \
+   "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \
+   --benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax" \
+   -cdnnmark_test_fwd_softmax \
+   --options="-config ${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark \
+   -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"
+
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \
+   "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\
+   -w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
+   "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
+
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool"  
\

+   -cdnnmark_test_fwd_pool \
+   --options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark  
\

+   -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"
+
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \
+   "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\
+   -w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
+   "$

[gem5-dev] Change in gem5/gem5[develop]: tests: fix LULESH weekly regression command

2021-10-01 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51207 )



Change subject: tests: fix LULESH weekly regression command
..

tests: fix LULESH weekly regression command

7756c5e added LULESH to the weekly regression script.  However,
it assumed a local installation of gem5-resources which it should
not have.  This commit fixes that so the weekly regression builds the
LULESH binary and then runs it instead.

Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5
---
M tests/weekly.sh
1 file changed, 34 insertions(+), 2 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index b697c29..cdc7fa0 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -52,10 +52,28 @@
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

 # get LULESH
-wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh
+# Pull the gem5 resources to the root of the gem5 directory -- currently the
+# pre-built binaries for LULESH are out-of-date and won't run correctly with
+# ROCm 4.0.  In the meantime, we can build the binary as part of this script
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"

 mkdir -p tests/testing-results

+# build LULESH
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}/gem5-resources/src/gpu/lulesh" \
+   -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "make"
+
 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh
+docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

+configs/example/apu_se.py -n3 --mem-size=8GB \
+--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -clulesh
+
+# Delete the gem5 resources repo we created
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"rm -rf ${gem5_root}/gem5-resources"

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51207
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5
Gerrit-Change-Number: 51207
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: tests: fix square nightly regression command

2021-10-02 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51247 )



Change subject: tests: fix square nightly regression command
..

tests: fix square nightly regression command

Square's pre-built binary was downloaded into the tests folder,
but the docker command running it assumed we were in GEM5_ROOT.
This commit fixes this problem by specifying the benchmark root
for the square binary.
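The path problem above can be illustrated with a small sketch. The Python below is illustrative only -- it is not the actual apu_se.py logic, and `resolve_binary` and its search order are assumptions -- but it shows why a bare `-c square` fails when the binary was downloaded into tests/ while the run starts from the gem5 root, and why passing a benchmark root fixes the lookup:

```python
# Illustrative sketch (hypothetical helper, not apu_se.py): resolve the
# benchmark binary relative to --benchmark-root when it is given, otherwise
# relative to the working directory.

import os

def resolve_binary(cmd, benchmark_root=None, cwd="/gem5"):
    # assumption: the config looks under benchmark_root first, else the cwd
    candidates = []
    if benchmark_root:
        candidates.append(os.path.join(benchmark_root, cmd))
    candidates.append(os.path.join(cwd, cmd))
    return candidates[0]

# before the fix: lookup is relative to the working directory (gem5 root),
# but wget placed the binary under tests/ -- so this path does not exist
print(resolve_binary("square"))                 # /gem5/square
# after the fix: lookup under the tests folder where wget put the binary
print(resolve_binary("square", "/gem5/tests"))  # /gem5/tests/square
```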

Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac
---
M tests/nightly.sh
1 file changed, 16 insertions(+), 1 deletion(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index 6631bb0..03f9c6a 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -101,7 +101,8 @@
 # basic GPU functionality is working.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 -c square
+configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -csquare
+

 # get HeteroSync
 wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51247
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac
Gerrit-Change-Number: 51247
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Move VIPER TCC decrements to action from in_port

2021-10-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51368 )



Change subject: mem-ruby: Move VIPER TCC decrements to action from in_port
..

mem-ruby: Move VIPER TCC decrements to action from in_port

Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in.  This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times.  Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.

To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.
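The hazard described above can be sketched outside SLICC. The Python below is purely illustrative (`TBE`, `in_port_buggy`, and friends are made-up names, and SLICC semantics are only approximated): it shows why a side effect inside an in_port body that may re-evaluate under resource stalls over-decrements the counter, while the same decrement performed in a once-per-event action stays correct.

```python
# Sketch: side effect in a re-evaluated in_port vs. in a once-per-event action.

class TBE:
    def __init__(self):
        self.atomic_done_cnt = 2  # two pending atomics raced

def in_port_buggy(tbe, retries):
    # the in_port body may be evaluated several times for one message when
    # resources stall; the side effect then runs on every evaluation
    for _ in range(retries):
        tbe.atomic_done_cnt -= 1          # decrement in the in_port (unsafe)
    return "AtomicNotDone"                # the event still fires only once

def in_port_fixed(tbe, retries):
    for _ in range(retries):
        pass                              # the in_port only inspects state
    return "AtomicNotDone"

def action_decrement(tbe):
    # like dadc_decrementAtomicDoneCnt: runs once, when the event is handled
    tbe.atomic_done_cnt -= 1

tbe = TBE()
in_port_buggy(tbe, retries=3)             # over-decrements: 2 - 3 = -1
buggy = tbe.atomic_done_cnt

tbe = TBE()
in_port_fixed(tbe, retries=3)
action_decrement(tbe)                     # exactly one decrement per event
fixed = tbe.atomic_done_cnt

print(buggy, fixed)                       # -1 1
```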

Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 26 insertions(+), 1 deletion(-)



diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 6112f38..cf7cda5 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -268,7 +268,6 @@
 if (tbe.numAtomics == 0 && tbe.atomicDoneCnt == 1) {
 trigger(Event:AtomicDone, in_msg.addr, cache_entry, tbe);
 } else {
-tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1;
 trigger(Event:AtomicNotDone, in_msg.addr, cache_entry, tbe);
 }
   }
@@ -599,6 +598,10 @@
 }
   }

+  action(dadc_decrementAtomicDoneCnt, "dadc", desc="decrement atomics done cnt flag") {
+tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1;
+  }
+
   action(ptr_popTriggerQueue, "ptr", desc="pop Trigger") {
 triggerQueue_in.dequeue(clockEdge());
   }
@@ -787,6 +790,7 @@
   }

   transition(A, AtomicNotDone) {TagArrayRead} {
+dadc_decrementAtomicDoneCnt;
 ptr_popTriggerQueue;
   }


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51368
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Gerrit-Change-Number: 51368
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock

2021-10-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51367 )



Change subject: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock
..

mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock

In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration).  In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses.  The specific order of events
causing this deadlock are:

1. TCC is waiting on an atomic to return from directory

2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments number of atomics to this address
(numAtomics = 2) that are pending in TBE, and does a write through of the
atomic to the directory.

3. When the first atomic returns from the Directory, it decrements the
numAtomics counter.  numAtomics was at 2 though, because of step #2.  So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.

4. Another request (a LD) to the same address comes along for the same
address.  The LD does z_stall since the second atomic is pending –- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).

5.  The second atomic returns to the TCC.  However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls.  So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly.  As a result
atomicDoneCnt becomes negative.

6.  Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic.  Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.

7.  Eventually the deadlock threshold is reached.

To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses.  This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.
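The cost difference between the two stall schemes can be sketched numerically. The Python below is illustrative only (not SLICC; the functions are made-up stand-ins for z_stall and the stall_and_wait/wakeup pair): it counts how many in_port evaluations each scheme spends on a burst of stalled loads.

```python
# Sketch: z_stall re-evaluates every stalled request each cycle, while a
# per-address stall buffer parks each request once and wakes it once.

from collections import defaultdict

def z_stall_evals(num_loads, cycles_until_atomic_returns):
    # every pending LD retries the in_port every cycle until the
    # outstanding atomic returns
    return num_loads * cycles_until_atomic_returns

def stall_buffer_evals(num_loads, cycles_until_atomic_returns):
    # each LD is evaluated once when parked (st_stallAndWaitRequest) and once
    # more on wakeup (wada_wakeUpAllDependentsAddr) -- how long the atomic is
    # outstanding no longer matters
    stall_buffer = defaultdict(list)
    addr = 0x1000
    evals = 0
    for ld in range(num_loads):
        evals += 1                        # parked into the per-address buffer
        stall_buffer[addr].append(ld)
    evals += len(stall_buffer.pop(addr))  # one wakeup evaluation per request
    return evals

print(z_stall_evals(100, 500))       # 50000
print(stall_buffer_evals(100, 500))  # 200
```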

As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):

1.  Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.

2.  Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked.  This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.

Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
---
M src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
2 files changed, 104 insertions(+), 15 deletions(-)



diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 6c07416..6112f38 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -126,6 +126,7 @@
   void unset_tbe();
   void wakeUpAllBuffers();
   void wakeUpBuffers(Addr a);
+  void wakeUpAllBuffers(Addr a);

   MachineID mapAddressToMachine(Addr addr, MachineType mtype);

@@ -569,6 +570,14 @@
 probeNetwork_in.dequeue(clockEdge());
   }

+  action(st_stallAndWaitRequest, "st", desc="Stall and wait on the address") {
+stall_and_wait(coreRequestNetwork_in, address);
+  }
+
+  action(wada_wakeUpAllDependentsAddr, "wada", desc="Wake up any requests waiting for this address") {
+wakeUpAllBuffers(address);
+  }
+
   action(z_stall, "z", desc="stall") {
   // built-in
   }
@@ -606,13 +615,22 @@
   // they can cause a resource stall deadlock!

   transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  z_stall;
+  // by putting the stalled requests in a buffer, we reduce resource contention
+  // since they won't try again every cycle and will instead only try again once
+  // woken up
+  st_stallAndWaitRequest;
   }
   transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} {
-  z_stall;
+  // by putting the stalled requests in a buffer, we reduce resource contention
[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues

2021-10-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51371 )



Change subject: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues
..

dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues

GFX7 (not supported in gem5) and GFX8 have a bug with how virtual
addresses are calculated for their HSA queues.  The ROCr component of
ROCm solves this problem by doubling the HSA queue size that is
requested, then mapping all virtual addresses in the second half of the
queue to the same virtual addresses as the first half of the queue.
This commit fixes gem5's support to mimic this behavior.

Note that this change does not affect Vega's HSA queue support, because
according to the ROCm documentation, Vega does not have the same problem
as GCN3.
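The workaround described above amounts to a modulo address mapping. A minimal Python sketch follows, with assumed names and example constants (the actual logic lives in the `ptr()` method of the HSA queue descriptor shown in the diff below; the base address and slot count here are made up for illustration):

```python
# Sketch of the ROCr workaround for GFX7/GFX8: the queue is allocated at
# twice the requested size, and indices in the doubled upper half are
# mapped (mod numElts) back to the same virtual addresses as the lower half.

AQL_PACKET_SIZE = 64  # bytes per AQL packet

def queue_slot_va(base, num_elts, ix):
    # mod folds the upper (doubled) half of the queue onto the lower half
    return base + (ix % num_elts) * AQL_PACKET_SIZE

base = 0x70000000     # example base VA (illustrative)
num_elts = 0x1000     # slots actually mapped

# an index in the doubled upper half aliases the matching lower-half slot
print(queue_slot_va(base, num_elts, 0) == queue_slot_va(base, num_elts, num_elts))  # True
print(queue_slot_va(base, num_elts, num_elts + 5) == queue_slot_va(base, num_elts, 5))  # True
```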

Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51
---
M src/dev/hsa/hsa_packet_processor.cc
M src/dev/hsa/hsa_packet_processor.hh
M src/gpu-compute/gpu_compute_driver.cc
M src/dev/hsa/hw_scheduler.cc
M src/dev/hsa/hw_scheduler.hh
5 files changed, 71 insertions(+), 17 deletions(-)



diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc
index 0427def..22124b1 100644
--- a/src/dev/hsa/hsa_packet_processor.cc
+++ b/src/dev/hsa/hsa_packet_processor.cc
@@ -44,6 +44,7 @@
 #include "dev/dma_device.hh"
 #include "dev/hsa/hsa_packet.hh"
 #include "dev/hsa/hw_scheduler.hh"
+#include "enums/GfxVersion.hh"
 #include "gpu-compute/gpu_command_processor.hh"
 #include "mem/packet_access.hh"
 #include "mem/page_table.hh"
@@ -100,13 +101,15 @@
 HSAPacketProcessor::setDeviceQueueDesc(uint64_t hostReadIndexPointer,
uint64_t basePointer,
uint64_t queue_id,
-   uint32_t size, int doorbellSize)
+   uint32_t size, int doorbellSize,
+   GfxVersion gfxVersion)
 {
 DPRINTF(HSAPacketProcessor,
  "%s:base = %p, qID = %d, ze = %d\n", __FUNCTION__,
  (void *)basePointer, queue_id, size);
 hwSchdlr->registerNewQueue(hostReadIndexPointer,
-   basePointer, queue_id, size, doorbellSize);
+   basePointer, queue_id, size, doorbellSize,
+   gfxVersion);
 }

 AddrRangeList
diff --git a/src/dev/hsa/hsa_packet_processor.hh b/src/dev/hsa/hsa_packet_processor.hh
index 9545006..aabe24e 100644
--- a/src/dev/hsa/hsa_packet_processor.hh
+++ b/src/dev/hsa/hsa_packet_processor.hh
@@ -39,9 +39,11 @@
 #include 

 #include "base/types.hh"
+#include "debug/HSAPacketProcessor.hh"
 #include "dev/dma_virt_device.hh"
 #include "dev/hsa/hsa.h"
 #include "dev/hsa/hsa_queue.hh"
+#include "enums/GfxVersion.hh"
 #include "params/HSAPacketProcessor.hh"
 #include "sim/eventq.hh"

@@ -84,14 +86,16 @@
 uint64_t hostReadIndexPtr;
 bool stalledOnDmaBufAvailability;
 bool dmaInProgress;
+GfxVersion   gfxVersion;

 HSAQueueDescriptor(uint64_t base_ptr, uint64_t db_ptr,
-   uint64_t hri_ptr, uint32_t size)
+   uint64_t hri_ptr, uint32_t size,
+   GfxVersion gfxVersion)
   : basePointer(base_ptr), doorbellPointer(db_ptr),
 writeIndex(0), readIndex(0),
 numElts(size / AQL_PACKET_SIZE), hostReadIndexPtr(hri_ptr),
 stalledOnDmaBufAvailability(false),
-dmaInProgress(false)
+dmaInProgress(false), gfxVersion(gfxVersion)
 {  }
 uint64_t spaceRemaining() { return numElts - (writeIndex - readIndex); }
 uint64_t spaceUsed() { return writeIndex - readIndex; }
@@ -102,15 +106,38 @@

 uint64_t ptr(uint64_t ix)
 {
-/**
- * Sometimes queues report that their size is 512k, which would
- * indicate numElts of 0x2000. However, they only have 256k
- * mapped which means any index over 0x1000 will fail an
- * address translation.
+/*
+ * Based on ROCm Documentation:
+ * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/
+   10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/
+   rocr/src/core/runtime/amd_aql_queue.cpp#L99
+ * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/
+   10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/
+   rocr/src/core/runtime/amd_aql_queue.cpp#L624
+ *
+ * GFX7 and GFX8 will allocate twice as much space for their HSA
+ * queues as they actually access (using mod operations to map the
+ * virtual addresses from the upper half of the queue to the same
+

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Move VIPER TCC decrements to action from in_port

2021-10-08 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51368 )


Change subject: mem-ruby: Move VIPER TCC decrements to action from in_port
..

mem-ruby: Move VIPER TCC decrements to action from in_port

Currently, the GPU VIPER TCC protocol handles races between atomics in
the triggerQueue_in.  This in_port does not check for resource
availability, which can cause the trigger queue to execute multiple
times.  Although this is the expected behavior, the code for handling
atomic races decrements the atomicDoneCnt flag in the trigger queue,
which is not safe since resource contention may cause it to execute
multiple times.

To resolve this issue, this commit moves the decrementing of this
counter to a new action that is called in an event that happens only
when the race between atomics is detected.

Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51368
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matthew Poremba 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 31 insertions(+), 1 deletion(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 6112f38..571587f 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -268,7 +268,6 @@
 if (tbe.numAtomics == 0 && tbe.atomicDoneCnt == 1) {
 trigger(Event:AtomicDone, in_msg.addr, cache_entry, tbe);
 } else {
-tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1;
 trigger(Event:AtomicNotDone, in_msg.addr, cache_entry, tbe);
 }
   }
@@ -599,6 +598,10 @@
 }
   }

+  action(dadc_decrementAtomicDoneCnt, "dadc", desc="decrement atomics done cnt flag") {
+tbe.atomicDoneCnt := tbe.atomicDoneCnt - 1;
+  }
+
   action(ptr_popTriggerQueue, "ptr", desc="pop Trigger") {
 triggerQueue_in.dequeue(clockEdge());
   }
@@ -787,6 +790,7 @@
   }

   transition(A, AtomicNotDone) {TagArrayRead} {
+dadc_decrementAtomicDoneCnt;
 ptr_popTriggerQueue;
   }


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51368
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I552fd4f34fdd9ebeec99fb7aeb4eeb7b150f577f
Gerrit-Change-Number: 51368
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock

2021-10-08 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51367 )


Change subject: mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock
..

mem-ruby: Update GPU VIPER TCC protocol to resolve deadlock

In the GPU VIPER TCC, programs with mixes of atomics and data
accesses to the same address, in the same kernel, can experience
deadlock when large applications (e.g., Pannotia's graph analytics
algorithms) are running on very small GPUs (e.g., the default 4 CU GPU
configuration).  In this situation, deadlocks occur due to resource
stalls interacting with the behavior of the current implementation for
handling races between atomic accesses.  The specific order of events
causing this deadlock are:

1. TCC is waiting on an atomic to return from directory

2. In the meantime it receives another atomic to the same address -- when
this happens, the TCC increments number of atomics to this address
(numAtomics = 2) that are pending in TBE, and does a write through of the
atomic to the directory.

3. When the first atomic returns from the Directory, it decrements the
numAtomics counter.  numAtomics was at 2 though, because of step #2.  So
it doesn't deallocate the TBE entry and calls Event:AtomicNotDone.

4. Another request (a LD) to the same address comes along for the same
address.  The LD does z_stall since the second atomic is pending –- so the
LD retries every cycle until the deadlock counter times out (or until the
second atomic comes back).

5.  The second atomic returns to the TCC.  However, because there are so
many LD's pending in the cache, all doing z_stall's and retrying every cycle,
there are a lot of resource stalls.  So, when the second atomic returns, it is
forced to retry its operation multiple times -- and each time it decrements
the atomicDoneCnt flag (which was added to catch a race between atomics
arriving and leaving the TCC in 7246f70bfb) repeatedly.  As a result
atomicDoneCnt becomes negative.

6.  Since this atomicDoneCnt flag is used to determine when Event:AtomicDone
happens, and since the resource stalls caused the atomicDoneCnt flag to become
negative, we never complete the atomic.  Which means the pending LD can never
access the line, because it's stuck waiting for the atomic to complete.

7.  Eventually the deadlock threshold is reached.

To fix this issue, this commit changes the VIPER TCC protocol from using
z_stall to using the stall_and_wait buffer method that the
Directory-level of the SLICC already uses.  This change effectively
prevents resource stalls from dominating the TCC level, by putting
pending requests for a given address in a per-address stall buffer.
These requests are then woken up when the pending request returns.

As part of this change, this change also makes two small changes to the
Directory-level protocol (MOESI_AMD_BASE-dir):

1.  Updated the names of the wakeup actions to match the TCC wakeup actions,
to avoid confusion.

2.  Changed transition(B, UnblockWriteThrough, U) to check all stall buffers,
as some requests were being placed later in the stall buffer than was
being checked.  This mirrors the changes in 187c44fe44 to other Directory
transitions to resolve races between GPU and DMA requests, but for
transitions prior workloads did not stress.

Change-Id: I60ac9830a87c125e9ac49515a7fc7731a65723c2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51367
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matthew Poremba 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
2 files changed, 109 insertions(+), 15 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 6c07416..6112f38 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -126,6 +126,7 @@
   void unset_tbe();
   void wakeUpAllBuffers();
   void wakeUpBuffers(Addr a);
+  void wakeUpAllBuffers(Addr a);

   MachineID mapAddressToMachine(Addr addr, MachineType mtype);

@@ -569,6 +570,14 @@
 probeNetwork_in.dequeue(clockEdge());
   }

+  action(st_stallAndWaitRequest, "st", desc="Stall and wait on the address") {
+stall_and_wait(coreRequestNetwork_in, address);
+  }
+
+  action(wada_wakeUpAllDependentsAddr, "wada", desc="Wake up any requests waiting for this address") {
+wakeUpAllBuffers(address);
+  }
+
   action(z_stall, "z", desc="stall") {
   // built-in
   }
@@ -606,13 +615,22 @@
   // they can cause a resource stall deadlock!

   transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  z_stall;
+  // by putting the stalled requests

[gem5-dev] Change in gem5/gem5[develop]: tests: add DNNMark to weekly regression

2021-10-11 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51187 )


Change subject: tests: add DNNMark to weekly regression
..

tests: add DNNMark to weekly regression

DNNMark is representative of several simple (fast) layers within ML
applications, which are heavily used in modern GPU applications.  Thus,
we want to make sure support for these applications is tested.  This
commit updates the weekly regression to run three variants: fwd_softmax,
bwd_bn, and fwd_pool -- ensuring we test both inference and training as
well as a variety of ML layers.

Change-Id: I38bfa9bd3a2817099ece46afc2d6132ce346e21a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51187
Reviewed-by: Bobby R. Bruce 
Maintainer: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/weekly.sh
1 file changed, 103 insertions(+), 1 deletion(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index b697c29..c699f65 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -58,4 +58,86 @@

 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID gcr.io/gem5-test/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

+configs/example/apu_se.py -n3 --mem-size=8GB -clulesh
+
+# get DNNMark
+# Delete gem5 resources repo if it already exists -- need to do in docker
+# because of cachefiles DNNMark creates
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "rm -rf ${gem5_root}/gem5-resources"
+
+# Pull the gem5 resources to the root of the gem5 directory -- DNNMark
+# builds a library and thus doesn't have a binary, so we need to build
+# it before we run it
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"
+
+# setup cmake for DNNMark
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+ "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
+ gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP"
+
+# make the DNNMark library
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \
+gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}"
+
+# generate cachefiles -- since we are testing gfx801 and 4 CUs (default config)
+# in tester, we want cachefiles for this setup
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" \
+"-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

+gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \
+--num-cus=4"
+
+# generate mmap data for DNNMark (makes simulation much faster)
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu:latest bash -c \

+"g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data"
+
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu:latest bash -c \

+"./generate_rand_data"
+
+# now we can run DNNMark!
+# DNNMark is representative of several simple (fast) layers within ML
+# applications, which are heavily used in modern GPU applications.  So, we want
+# to make sure support for these applications is tested.  Run three variants:
+# fwd_softmax, bwd_bn, fwd_pool; these tests ensure we run a variety of ML kernels,
+# including both inference and training
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \
+   "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\
+   -w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
+   "${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
+
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax"  
\

+   -cdnnmark_test_fwd_softmax \
+   --options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark  
\

+   -mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"
+
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -v \
+   "${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/

[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues

2021-10-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51371 )


Change subject: dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues
..

dev-hsa,gpu-compute: fix bug with gfx8 VAs for HSA Queues

GFX7 (not supported in gem5) and GFX8 have a bug with how virtual
addresses are calculated for their HSA queues.  The ROCr component of
ROCm solves this problem by doubling the HSA queue size that is
requested, then mapping all virtual addresses in the second half of the
queue to the same virtual addresses as the first half of the queue.
This commit fixes gem5's support to mimic this behavior.

Note that this change does not affect Vega's HSA queue support, because
according to the ROCm documentation, Vega does not have the same problem
as GCN3.

Change-Id: I133cf1acc3a00a0baded0c4c3c2a25f39effdb51
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51371
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matthew Poremba 
---
M src/dev/hsa/hsa_packet_processor.cc
M src/dev/hsa/hsa_packet_processor.hh
M src/gpu-compute/gpu_compute_driver.cc
M src/dev/hsa/hw_scheduler.cc
M src/dev/hsa/hw_scheduler.hh
5 files changed, 75 insertions(+), 17 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc
index 0427def..22124b1 100644
--- a/src/dev/hsa/hsa_packet_processor.cc
+++ b/src/dev/hsa/hsa_packet_processor.cc
@@ -44,6 +44,7 @@
 #include "dev/dma_device.hh"
 #include "dev/hsa/hsa_packet.hh"
 #include "dev/hsa/hw_scheduler.hh"
+#include "enums/GfxVersion.hh"
 #include "gpu-compute/gpu_command_processor.hh"
 #include "mem/packet_access.hh"
 #include "mem/page_table.hh"
@@ -100,13 +101,15 @@
 HSAPacketProcessor::setDeviceQueueDesc(uint64_t hostReadIndexPointer,
uint64_t basePointer,
uint64_t queue_id,
-   uint32_t size, int doorbellSize)
+   uint32_t size, int doorbellSize,
+   GfxVersion gfxVersion)
 {
 DPRINTF(HSAPacketProcessor,
  "%s:base = %p, qID = %d, ze = %d\n", __FUNCTION__,
  (void *)basePointer, queue_id, size);
 hwSchdlr->registerNewQueue(hostReadIndexPointer,
-   basePointer, queue_id, size, doorbellSize);
+   basePointer, queue_id, size, doorbellSize,
+   gfxVersion);
 }

 AddrRangeList
diff --git a/src/dev/hsa/hsa_packet_processor.hh b/src/dev/hsa/hsa_packet_processor.hh
index 9545006..aabe24e 100644
--- a/src/dev/hsa/hsa_packet_processor.hh
+++ b/src/dev/hsa/hsa_packet_processor.hh
@@ -39,9 +39,11 @@
 #include 

 #include "base/types.hh"
+#include "debug/HSAPacketProcessor.hh"
 #include "dev/dma_virt_device.hh"
 #include "dev/hsa/hsa.h"
 #include "dev/hsa/hsa_queue.hh"
+#include "enums/GfxVersion.hh"
 #include "params/HSAPacketProcessor.hh"
 #include "sim/eventq.hh"

@@ -84,14 +86,16 @@
 uint64_t hostReadIndexPtr;
 bool stalledOnDmaBufAvailability;
 bool dmaInProgress;
+GfxVersion   gfxVersion;

 HSAQueueDescriptor(uint64_t base_ptr, uint64_t db_ptr,
-   uint64_t hri_ptr, uint32_t size)
+   uint64_t hri_ptr, uint32_t size,
+   GfxVersion gfxVersion)
   : basePointer(base_ptr), doorbellPointer(db_ptr),
 writeIndex(0), readIndex(0),
 numElts(size / AQL_PACKET_SIZE), hostReadIndexPtr(hri_ptr),
 stalledOnDmaBufAvailability(false),
-dmaInProgress(false)
+dmaInProgress(false), gfxVersion(gfxVersion)
 {  }
 uint64_t spaceRemaining() { return numElts - (writeIndex - readIndex); }

 uint64_t spaceUsed() { return writeIndex - readIndex; }
@@ -102,15 +106,38 @@

 uint64_t ptr(uint64_t ix)
 {
-/**
- * Sometimes queues report that their size is 512k, which would
- * indicate numElts of 0x2000. However, they only have 256k
- * mapped which means any index over 0x1000 will fail an
- * address translation.
+/*
+ * Based on ROCm Documentation:
+ * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/
+     10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/
+     rocr/src/core/runtime/amd_aql_queue.cpp#L99
+ * - https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/
+     10ca0a99bbd0252f5bf6f08d1503e59f1129df4a/ROCm_Libraries/
+     rocr/src/core/runtime/amd_aql_queue.cpp#L

[gem5-dev] Change in gem5/gem5[develop]: tests: fix square and HeteroSync nightly regression command

2021-10-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51247 )


Change subject: tests: fix square and HeteroSync nightly regression command
..

tests: fix square and HeteroSync nightly regression command

Square and HeteroSync's pre-built binaries were downloaded into the
tests folder in the nightly regression script, but the docker
command running them assumed we were in GEM5_ROOT.  This commit
fixes this problem by specifying the benchmark root for the
applications.
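The failure can be pictured with a small shell sketch. The `resolve_binary` helper below is hypothetical (the real resolution happens inside `apu_se.py`); it only illustrates why a bare `-c square` depends on the working directory.

```shell
# Hypothetical sketch: without a benchmark root, the -c argument is
# resolved relative to the current working directory, which in the docker
# command was ${gem5_root}, not ${gem5_root}/tests where the binary lives.
resolve_binary() {
    local root="$1" cmd="$2"
    if [ -n "$root" ]; then
        echo "${root}/${cmd}"
    else
        echo "${cmd}"    # relative: only works if the cwd holds the binary
    fi
}

resolve_binary ""                    square   # cwd-dependent
resolve_binary "/path/to/gem5/tests" square   # absolute, works from any cwd
```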

Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51247
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Maintainer: Bobby R. Bruce 
---
M tests/nightly.sh
1 file changed, 24 insertions(+), 5 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/nightly.sh b/tests/nightly.sh
index 6631bb0..89c7005 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -101,7 +101,7 @@
 # basic GPU functionality is working.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 -c square
+configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -c  
square


 # get HeteroSync
 wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel

@@ -112,8 +112,8 @@
 # atomics are tested.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
---options="sleepMutex 10 16 4"
+configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \
+-c allSyncPrims-1kernel --options="sleepMutex 10 16 4"

 # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs
 # accessing unique data and then joining a lock-free barrier, 10 Ld/St per
@@ -122,5 +122,5 @@
 # atomics are tested.
 docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 -callSyncPrims-1kernel \
---options="lfTreeBarrUniq 10 16 4"
+configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \
+-c allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4"

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51247
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I905c8bde7231bc708db01bff196fd85d99c7ceac
Gerrit-Change-Number: 51247
Gerrit-PatchSet: 5
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Alex Dutu 
Gerrit-CC: Kyle Roarty 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: tests: add additional space in weekly DNNMark tests

2021-10-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51453 )



Change subject: tests: add additional space in weekly DNNMark tests
..

tests: add additional space in weekly DNNMark tests

Add space between -c and binary name for all DNNMark tests to conform to
the other tests' style and reduce confusion.
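The two spellings parse identically, so the change is purely cosmetic. A quick `getopts` analogue (a shell stand-in for the Python option parsing that `apu_se.py` uses) shows both forms hand over the same argument:

```shell
# Both "-cNAME" and "-c NAME" give getopts the same OPTARG, so adding the
# space only improves readability.
get_c() {
    local OPTIND=1 opt
    while getopts "c:" opt "$@"; do
        [ "$opt" = "c" ] && echo "$OPTARG"
    done
}

get_c -cdnnmark_test_fwd_softmax    # prints dnnmark_test_fwd_softmax
get_c -c dnnmark_test_fwd_softmax   # prints dnnmark_test_fwd_softmax
```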

Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624
---
M tests/weekly.sh
1 file changed, 15 insertions(+), 3 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index 5a8accc..5997ae9 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -128,7 +128,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax"  
\

-   -cdnnmark_test_fwd_softmax \
+   -c dnnmark_test_fwd_softmax \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"

@@ -137,7 +137,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool"  
\

-   -cdnnmark_test_fwd_pool \
+   -c dnnmark_test_fwd_pool \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"

@@ -146,7 +146,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn"  
\

-   -cdnnmark_test_bwd_bn \
+   -c dnnmark_test_bwd_bn \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51453
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624
Gerrit-Change-Number: 51453
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: tests: convert all nightly GPU tests from GUID to GID

2021-10-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51567 )



Change subject: tests: convert all nightly GPU tests from GUID to GID
..

tests: convert all nightly GPU tests from GUID to GID

As part of the docker commands for the nightly GPU regression tests,
earlier commits inadvertently used GUID instead of GID, where GUID does
not exist.  This causes some failures when run in Jenkins.  This patch
fixes this issue.
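The failure mode is easy to reproduce in a shell: bash sets `$UID` but never `$GUID`, so the expansion silently dropped the group field. A small sketch, using `id` to recover the real ids:

```shell
# $GUID is not a variable bash defines, so "$UID:$GUID" expands with an
# empty group field; id -g supplies the real group id instead.
uid="$(id -u)"
gid="$(id -g)"

broken="${uid}:${GUID}"   # e.g. "1000:" -- docker sees no group id
fixed="${uid}:${gid}"     # e.g. "1000:1000"

echo "broken=${broken}"
echo "fixed=${fixed}"
```

Passing `-u "$(id -u):$(id -g)"` would sidestep the problem entirely, since it never relies on shell-provided variables.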

Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377
---
M tests/nightly.sh
1 file changed, 18 insertions(+), 4 deletions(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index 89c7005..4e7420d 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -86,7 +86,7 @@

 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 docker pull gcr.io/gem5-test/gcn-gpu:latest
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
@@ -99,7 +99,7 @@
 # Square is the simplest, fastest, most heavily tested GPU application
 # Thus, we always want to run this in the nightly regressions to make sure
 # basic GPU functionality is working.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\
 configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" -c  
square


@@ -110,7 +110,7 @@
 # 10 Ld/St per thread and 4 iterations of the critical section is a reasonable
 # moderate contention case for the default 4 CU GPU config and help ensure GPU
 # atomics are tested.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

 configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \
 -c allSyncPrims-1kernel --options="sleepMutex 10 16 4"
@@ -120,7 +120,7 @@
 # thread, 4 iterations of critical section.  Again this is representative of a
 # moderate contention case for the default 4 CU GPU config and help ensure GPU
 # atomics are tested.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

 configs/example/apu_se.py -n3 --benchmark-root="${gem5_root}/tests" \
 -c allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4"

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51567
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377
Gerrit-Change-Number: 51567
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: tests: fix LULESH weekly regression command

2021-10-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51207 )


Change subject: tests: fix LULESH weekly regression command
..

tests: fix LULESH weekly regression command

7756c5e added LULESH to the weekly regression script.  However,
it assumed a local installation of gem5-resources which it should
not have.  This commit fixes that so the weekly regression builds the
LULESH binary and then runs it instead.

Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51207
Maintainer: Matt Sinclair 
Maintainer: Bobby R. Bruce 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/weekly.sh
1 file changed, 38 insertions(+), 4 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index c699f65..33dd70b 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -46,21 +46,35 @@

 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 docker pull gcr.io/gem5-test/gcn-gpu:latest
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

-# get LULESH
-wget -qN http://dist.gem5.org/dist/develop/test-progs/lulesh/lulesh
+# test LULESH
+# before pulling gem5 resources, make sure it doesn't exist already
+rm -rf ${gem5_root}/gem5-resources
+
+# Pull gem5 resources to the root of the gem5 directory -- currently the
+# pre-built binares for LULESH are out-of-date and won't run correctly with
+# ROCm 4.0.  In the meantime, we can build the binary as part of this script
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"

 mkdir -p tests/testing-results

+# build LULESH
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}/gem5-resources/src/gpu/lulesh" \
+   -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "make"
+
 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 --mem-size=8GB -clulesh
+configs/example/apu_se.py -n3 --mem-size=8GB \
+--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh


 # get DNNMark
 # Delete gem5 resources repo if it already exists -- need to do in docker

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51207
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If91f4340f2d042b0bcb366c5da10f7d0dc5643c5
Gerrit-Change-Number: 51207
Gerrit-PatchSet: 7
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Alex Dutu 
Gerrit-CC: Kyle Roarty 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests: add additional space in weekly DNNMark tests

2021-10-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51453 )


Change subject: tests: add additional space in weekly DNNMark tests
..

tests: add additional space in weekly DNNMark tests

Add space between -c and binary name for all DNNMark tests to conform to
the other tests' style and reduce confusion.

Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51453
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Bobby R. Bruce 
Maintainer: Jason Lowe-Power 
Maintainer: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/weekly.sh
1 file changed, 21 insertions(+), 3 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index 33dd70b..1d14f4f 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -128,7 +128,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax"  
\

-   -cdnnmark_test_fwd_softmax \
+   -c dnnmark_test_fwd_softmax \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"

@@ -137,7 +137,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool"  
\

-   -cdnnmark_test_fwd_pool \
+   -c dnnmark_test_fwd_pool \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"

@@ -146,7 +146,7 @@
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn"  
\

-   -cdnnmark_test_bwd_bn \
+   -c dnnmark_test_bwd_bn \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark  
\

-mmap ${gem5_root}/gem5-resources/src/gpu/DNNMark/mmap.bin"


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51453
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I6d0777ba2186f0eedfe7e99db51161106837a624
Gerrit-Change-Number: 51453
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests: convert all nightly GPU tests from GUID to GID

2021-10-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51567 )


Change subject: tests: convert all nightly GPU tests from GUID to GID
..

tests: convert all nightly GPU tests from GUID to GID

As part of the docker commands for the nightly GPU regression tests,
earlier commits inadvertently used GUID instead of GID, where GUID does
not exist.  This causes some failures when run in Jenkins.  This patch
fixes this issue.

Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51567
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Bobby R. Bruce 
Maintainer: Jason Lowe-Power 
Maintainer: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/nightly.sh
1 file changed, 24 insertions(+), 4 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/nightly.sh b/tests/nightly.sh
index 30e2c58..b3708fd 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -86,7 +86,7 @@

 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 docker pull gcr.io/gem5-test/gcn-gpu:latest
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
@@ -99,7 +99,7 @@
 # Square is the simplest, fastest, most heavily tested GPU application
 # Thus, we always want to run this in the nightly regressions to make sure
 # basic GPU functionality is working.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

 configs/example/apu_se.py -n3 -c square

@@ -110,7 +110,7 @@
 # 10 Ld/St per thread and 4 iterations of the critical section is a reasonable
 # moderate contention case for the default 4 CU GPU config and help ensure GPU
 # atomics are tested.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

 configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
 --options="sleepMutex 10 16 4"
@@ -120,7 +120,7 @@
 # thread, 4 iterations of critical section.  Again this is representative of a
 # moderate contention case for the default 4 CU GPU config and help ensure GPU
 # atomics are tested.
-docker run --rm -u $UID:$GUID --volume "${gem5_root}":"${gem5_root}" -w \
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

 configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
 --options="lfTreeBarrUniq 10 16 4"

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51567
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I429c079ae3df9fd97a956f23a2fc9baeed3f7377
Gerrit-Change-Number: 51567
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: fix typo in GPU VIPER TCC comment

2021-10-16 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51687 )



Change subject: mem-ruby: fix typo in GPU VIPER TCC comment
..

mem-ruby: fix typo in GPU VIPER TCC comment

72ee6d1a fixed a deadlock in the GPU VIPER TCC.  However, it
inadvertently added a typo to the comments explaining the change.  This
commit fixes that.

Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 17 insertions(+), 4 deletions(-)



diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 571587f..dc6cf03 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -618,19 +618,19 @@
   // they can cause a resource stall deadlock!

   transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
   }
   transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
   }
   transition(IV, {WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
@@ -681,7 +681,7 @@

   transition(A, Atomic) {
 p_profileMiss;
-// put putting the stalled requests in a buffer, we reduce resource contention
+// by putting the stalled requests in a buffer, we reduce resource contention
 // since they won't try again every cycle and will instead only try again once
 // woken up
 st_stallAndWaitRequest;

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51687
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c
Gerrit-Change-Number: 51687
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: fix typo in GPU VIPER TCC comment

2021-10-16 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51687 )


Change subject: mem-ruby: fix typo in GPU VIPER TCC comment
..

mem-ruby: fix typo in GPU VIPER TCC comment

72ee6d1a fixed a deadlock in the GPU VIPER TCC.  However, it
inadvertently added a typo to the comments explaining the change.  This
commit fixes that.

Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51687
Maintainer: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
Tested-by: kokoro 
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 21 insertions(+), 4 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index 571587f..dc6cf03 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -618,19 +618,19 @@
   // they can cause a resource stall deadlock!

   transition(WI, {RdBlk, WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
   }
   transition(A, {RdBlk, WrVicBlk, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
   }
   transition(IV, {WrVicBlk, Atomic, WrVicBlkBack}) { //TagArrayRead} {
-  // put putting the stalled requests in a buffer, we reduce resource contention
+  // by putting the stalled requests in a buffer, we reduce resource contention
   // since they won't try again every cycle and will instead only try again once
   // woken up
   st_stallAndWaitRequest;
@@ -681,7 +681,7 @@

   transition(A, Atomic) {
 p_profileMiss;
-// put putting the stalled requests in a buffer, we reduce resource contention
+// by putting the stalled requests in a buffer, we reduce resource contention
 // since they won't try again every cycle and will instead only try again once
 // woken up
 st_stallAndWaitRequest;

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51687
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ibba835aa907be33fc3dd8e576ad2901d5f8f509c
Gerrit-Change-Number: 51687
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests: simplify weekly regression

2021-10-17 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51707 )



Change subject: tests: simplify weekly regression
..

tests: simplify weekly regression

DNNMark and LULESH were both cloning and removing gem5-resources as part
of their tests, since they were committed separately/in parallel.  Clean
this up so we only remove and pull gem5-resources once now in the weekly
regression script.
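The consolidated pattern boils down to "remove any stale copy, then clone once." A runnable sketch of that idempotent setup step (with `mkdir` standing in for the actual `git clone` so the sketch works offline):

```shell
# Idempotent fresh-checkout helper: safe to run on every regression pass,
# whether or not a stale gem5-resources tree was left behind.
fresh_checkout() {
    local dest="$1"
    rm -rf "$dest"
    mkdir -p "$dest"   # stand-in for: git clone -b develop <url> "$dest"
}

fresh_checkout "${TMPDIR:-/tmp}/gem5-resources-demo"
fresh_checkout "${TMPDIR:-/tmp}/gem5-resources-demo"   # second run is harmless
```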

Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44
---
M tests/weekly.sh
1 file changed, 24 insertions(+), 13 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index 1d14f4f..c5811a4 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -51,13 +51,17 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

-# test LULESH
 # before pulling gem5 resources, make sure it doesn't exist already
-rm -rf ${gem5_root}/gem5-resources
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "rm -rf ${gem5_root}/gem5-resources"

+# test LULESH
 # Pull gem5 resources to the root of the gem5 directory -- currently the
 # pre-built binares for LULESH are out-of-date and won't run correctly with
-# ROCm 4.0.  In the meantime, we can build the binary as part of this script
+# ROCm 4.0.  In the meantime, we can build the binary as part of this script.
+# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
+# need to build it before we run it.
 git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
 "${gem5_root}/gem5-resources"

@@ -76,19 +80,12 @@
 configs/example/apu_se.py -n3 --mem-size=8GB \
 --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh

-# get DNNMark
-# Delete gem5 resources repo if it already exists -- need to do in docker
-# because of cachefiles DNNMark creates
-docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
-   "rm -rf ${gem5_root}/gem5-resources"
-
-# Pull the gem5 resources to the root of the gem5 directory -- DNNMark
-# builds a library and thus doesn't have a binary, so we need to build
-# it before we run it
+# get DNNMark; it builds a library and thus doesn't have a binary, so we
+# need to build it before we run it
 git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
 "${gem5_root}/gem5-resources"

+# test DNNMark
 # setup cmake for DNNMark
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
  "${gem5_root}/gem5-resources/src/gpu/DNNMark" \

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51707


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44
Gerrit-Change-Number: 51707
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: tests: add HACC to weekly regression

2021-10-17 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51708 )



Change subject: tests: add HACC to weekly regression
..

tests: add HACC to weekly regression

This commit adds HACC (a GPU HPC workload) to the weekly regression
tests.  HACC requires a number of environment variables to be set, so
to avoid setting all of them manually, we use a specific Dockerfile for
it.  To avoid compiling gem5 once for this Docker image and once for the
other GPU tests, this commit also updates the weekly regression so that
all GPU tests use HACC's image.
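One subtlety in this change's diff: the scons fallback is rewritten from `|| (rm -rf build && scons ...)` to an unparenthesized form. In POSIX shell, `A || B && C` groups as `(A || B) && C`, not `A || (B && C)`, so dropping the grouping makes the retry command run even when the first build succeeded. A minimal, self-contained demonstration (the function names are illustrative stand-ins, not from the script):

```shell
# In POSIX shell, `A || B && C` groups as `(A || B) && C`.
log=""
a_ok()    { log="${log}A"; return 0; }   # stand-in for a successful scons build
b_rm()    { log="${log}B"; return 0; }   # stand-in for `rm -rf build`
c_build() { log="${log}C"; return 0; }   # stand-in for the retry build

a_ok || b_rm && c_build
without_parens=$log        # retry still ran after a successful first build

log=""
a_ok || { b_rm && c_build; }
with_parens=$log           # fallback skipped, as intended

echo "$without_parens $with_parens"
```

With explicit grouping the fallback only fires on failure; without it, the trailing `&& c_build` runs whenever the left-hand list succeeds.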

Change-Id: I9adabbca01537f031cbc491ddf1d3e7dd155f3f2
---
M tests/weekly.sh
1 file changed, 53 insertions(+), 15 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index 3f6a93c..f5308b5 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -45,15 +45,21 @@
 ./main.py run --length very-long -j${threads} -t${threads}

 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
+# HACC requires setting numerous environment variables to run correctly.  To
+# avoid needing to set all of these, we instead build a docker for it, which
+# has all these variables pre-set in its Dockerfile
+# To avoid compiling gem5 multiple times, all GPU benchmarks will use this
 docker pull gcr.io/gem5-test/gcn-gpu:latest
+docker build -t hacc-test-weekly ${gem5_root}/gem5-resources/src/gpu/halo-finder
+
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"${gem5_root}" hacc-test-weekly bash -c \
 "scons build/GCN3_X86/gem5.opt -j${threads} \
-|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
+|| rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

 # before pulling gem5 resources, make sure it doesn't exist already
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "${gem5_root}" hacc-test-weekly bash -c \
"rm -rf ${gem5_root}/gem5-resources"

 # test LULESH
@@ -70,43 +76,44 @@
 # build LULESH
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}/gem5-resources/src/gpu/lulesh" \
-   -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   -u $UID:$GID hacc-test-weekly bash -c \
"make"

 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+"${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 --mem-size=8GB \
---benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh
+--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh \
+--options="1.0e-2 1"

 # test DNNMark
 # setup cmake for DNNMark
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
  "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
- gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP"
+ hacc-test-weekly bash -c "./setup.sh HIP"

 # make the DNNMark library
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \
-gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}"
+hacc-test-weekly bash -c "make -j${threads}"

 # generate cachefiles -- since we are testing gfx801 and 4 CUs (default config)
 # in tester, we want cachefiles for this setup
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
 "-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \
-gcr.io/gem5-test/gcn-gpu:latest bash -c \
+hacc-test-weekly bash -c \
 "python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \
 --num-cus=4"

 # generate mmap data for DNNMark (makes simulation much faster)
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \
 "g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data"

 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \
 "./generate_rand_data"

 # now we can run DNNMark!
@@ -117,7 +124,7 @@
 # including both inference and training
 docker run --rm --volume "${gem5_root}":"$

[gem5-dev] Change in gem5/gem5[develop]: tests: simplify weekly regression

2021-10-18 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51707 )


Change subject: tests: simplify weekly regression
..

tests: simplify weekly regression

DNNMark and LULESH were both cloning and removing gem5-resources as part
of their tests, since they were committed separately and in parallel.  Clean
this up so that gem5-resources is removed and pulled only once in the weekly
regression script.

Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51707
Reviewed-by: Bobby R. Bruce 
Maintainer: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/weekly.sh
1 file changed, 26 insertions(+), 16 deletions(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index 1d14f4f..3f6a93c 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -51,13 +51,17 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"

-# test LULESH
 # before pulling gem5 resources, make sure it doesn't exist already
-rm -rf ${gem5_root}/gem5-resources
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "rm -rf ${gem5_root}/gem5-resources"

+# test LULESH
 # Pull gem5 resources to the root of the gem5 directory -- currently the
 # pre-built binares for LULESH are out-of-date and won't run correctly with
-# ROCm 4.0.  In the meantime, we can build the binary as part of this  
script
+# ROCm 4.0.  In the meantime, we can build the binary as part of this  
script.

+# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
+# need to build it before we run it.
 git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
 "${gem5_root}/gem5-resources"

@@ -76,19 +80,7 @@
 configs/example/apu_se.py -n3 --mem-size=8GB \
 --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh

-# get DNNMark
-# Delete gem5 resources repo if it already exists -- need to do in docker
-# because of cachefiles DNNMark creates
-docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
-   "rm -rf ${gem5_root}/gem5-resources"
-
-# Pull the gem5 resources to the root of the gem5 directory -- DNNMark
-# builds a library and thus doesn't have a binary, so we need to build
-# it before we run it
-git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
-"${gem5_root}/gem5-resources"
-
+# test DNNMark
 # setup cmake for DNNMark
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
  "${gem5_root}/gem5-resources/src/gpu/DNNMark" \

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51707


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5ab1410b0934bf20ed817e379f4e494aa53bfa44
Gerrit-Change-Number: 51707
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Kyle Roarty 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests: add HACC to weekly regression

2021-10-19 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51708 )


Change subject: tests: add HACC to weekly regression
..

tests: add HACC to weekly regression

This commit adds HACC (a GPU HPC workload) to the weekly regression
tests.  HACC requires a number of environment variables to be set, so
to avoid setting all of them manually, we use a specific Dockerfile for
it.  To avoid compiling gem5 once for this Docker image and once for the
other GPU tests, this commit also updates the weekly regression so that
all GPU tests use HACC's image.

Change-Id: I9adabbca01537f031cbc491ddf1d3e7dd155f3f2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51708
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matt Sinclair 
Maintainer: Bobby R. Bruce 
---
M tests/weekly.sh
1 file changed, 56 insertions(+), 14 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve
  Matt Sinclair: Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index 3f6a93c..12793da 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -45,15 +45,21 @@
 ./main.py run --length very-long -j${threads} -t${threads}

 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
+# HACC requires setting numerous environment variables to run correctly.  To
+# avoid needing to set all of these, we instead build a docker for it, which
+# has all these variables pre-set in its Dockerfile
+# To avoid compiling gem5 multiple times, all GPU benchmarks will use this
 docker pull gcr.io/gem5-test/gcn-gpu:latest
+docker build -t hacc-test-weekly ${gem5_root}/gem5-resources/src/gpu/halo-finder
+
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"${gem5_root}" hacc-test-weekly bash -c \
 "scons build/GCN3_X86/gem5.opt -j${threads} \
-|| (rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads})"
+|| rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

 # before pulling gem5 resources, make sure it doesn't exist already
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "${gem5_root}" hacc-test-weekly bash -c \
"rm -rf ${gem5_root}/gem5-resources"

 # test LULESH
@@ -70,13 +76,13 @@
 # build LULESH
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}/gem5-resources/src/gpu/lulesh" \
-   -u $UID:$GID gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   -u $UID:$GID hacc-test-weekly bash -c \
"make"

 # LULESH is heavily used in the HPC community on GPUs, and does a good job of
 # stressing several GPU compute and memory components
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
+"${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \
 configs/example/apu_se.py -n3 --mem-size=8GB \
 --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh

@@ -84,29 +90,29 @@
 # setup cmake for DNNMark
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
  "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
- gcr.io/gem5-test/gcn-gpu:latest bash -c "./setup.sh HIP"
+ hacc-test-weekly bash -c "./setup.sh HIP"

 # make the DNNMark library
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}/gem5-resources/src/gpu/DNNMark/build" \
-gcr.io/gem5-test/gcn-gpu:latest bash -c "make -j${threads}"
+hacc-test-weekly bash -c "make -j${threads}"

 # generate cachefiles -- since we are testing gfx801 and 4 CUs (default config)
 # in tester, we want cachefiles for this setup
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}/gem5-resources/src/gpu/DNNMark" \
 "-v${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \
-gcr.io/gem5-test/gcn-gpu:latest bash -c \
+hacc-test-weekly bash -c \
 "python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 \
 --num-cus=4"

 # generate mmap data for DNNMark (makes simulation much faster)
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}/gem5-resources/src/gpu/DNNMark" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+"${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly bash -c \
 "g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data"

 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
-"${gem5_root}/gem5-resources/src/gpu/DNNMark"  
gcr.io/gem5-test/gcn-gpu:l

[gem5-dev] Change in gem5/gem5[develop]: tests: fix bug in weekly regression

2021-10-21 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51907 )



Change subject: tests: fix bug in weekly regression
..

tests: fix bug in weekly regression

66a056b8 changed the weekly regression to use a single docker for
all GPU tests, to reduce how many times gem5 needed to be compiled.
However, in my local testing of that patch, gem5-resources was not
deleted until after the Docker image was created -- which causes a problem
when gem5-resources does not already exist from a prior run, since
building the HACC Docker image requires that path to exist.  This commit
fixes this problem by moving the pull of gem5-resources before
anything else related to the GPU happens.
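The ordering dependency this commit fixes reduces to a runnable sketch, with `mkdir` standing in for `git clone` and a directory check standing in for `docker build` of the halo-finder Dockerfile (all names are illustrative):

```shell
# Minimal sketch of the ordering bug: the image build needs the
# gem5-resources checkout on disk, so the clone has to happen first.
root=$(mktemp -d)

build_image() {
    # Fails, like `docker build`, when the build context is missing.
    [ -d "$1" ] && echo built || echo missing
}

before=$(build_image "$root/gem5-resources")   # clone has not happened yet
mkdir -p "$root/gem5-resources"                # stand-in for `git clone`
after=$(build_image "$root/gem5-resources")    # now the build context exists

echo "$before $after"
```

Running the build step before the clone reproduces the failure; moving the clone first, as the commit does, makes the build succeed.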

Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c
---
M tests/weekly.sh
1 file changed, 32 insertions(+), 13 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index 12793da..b91dbbc 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -44,6 +44,20 @@
 "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \
 ./main.py run --length very-long -j${threads} -t${threads}

+# before pulling gem5 resources, make sure it doesn't exist already
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" hacc-test-weekly bash -c \
+   "rm -rf ${gem5_root}/gem5-resources"
+
+# Pull gem5 resources to the root of the gem5 directory -- currently the
+# pre-built binares for LULESH are out-of-date and won't run correctly with
+# ROCm 4.0.  In the meantime, we can build the binary as part of this script.
+# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
+# need to build it before we run it.
+# Need to pull this first because HACC's docker requires this path to exist
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"
+
 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 # HACC requires setting numerous environment variables to run correctly.  To
 # avoid needing to set all of these, we instead build a docker for it, which
@@ -57,20 +71,7 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

-# before pulling gem5 resources, make sure it doesn't exist already
-docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" hacc-test-weekly bash -c \
-   "rm -rf ${gem5_root}/gem5-resources"
-
 # test LULESH
-# Pull gem5 resources to the root of the gem5 directory -- currently the
-# pre-built binares for LULESH are out-of-date and won't run correctly with
-# ROCm 4.0.  In the meantime, we can build the binary as part of this script.
-# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
-# need to build it before we run it.
-git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
-"${gem5_root}/gem5-resources"
-
 mkdir -p tests/testing-results

 # build LULESH

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51907


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c
Gerrit-Change-Number: 51907
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: tests: fix bug in weekly regression

2021-10-22 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51907 )


Change subject: tests: fix bug in weekly regression
..

tests: fix bug in weekly regression

66a056b8 changed the weekly regression to use a single docker for
all GPU tests, to reduce how many times gem5 needed to be compiled.
However, in my local testing of that patch, gem5-resources was not
deleted until after the Docker image was created -- which causes a problem
when gem5-resources does not already exist from a prior run, since
building the HACC Docker image requires that path to exist.  This commit
fixes this problem by moving the pull of gem5-resources before
anything else related to the GPU happens.

Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51907
Maintainer: Matt Sinclair 
Maintainer: Bobby R. Bruce 
Reviewed-by: Bobby R. Bruce 
Tested-by: kokoro 
---
M tests/weekly.sh
1 file changed, 37 insertions(+), 13 deletions(-)

Approvals:
  Bobby R. Bruce: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index 12793da..c7ba7e6 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -44,6 +44,20 @@
 "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \
 ./main.py run --length very-long -j${threads} -t${threads}

+# before pulling gem5 resources, make sure it doesn't exist already
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
+   "rm -rf ${gem5_root}/gem5-resources"
+
+# Pull gem5 resources to the root of the gem5 directory -- currently the
+# pre-built binares for LULESH are out-of-date and won't run correctly with
+# ROCm 4.0.  In the meantime, we can build the binary as part of this script.
+# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
+# need to build it before we run it.
+# Need to pull this first because HACC's docker requires this path to exist
+git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
+"${gem5_root}/gem5-resources"
+
 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 # HACC requires setting numerous environment variables to run correctly.  To
 # avoid needing to set all of these, we instead build a docker for it, which
@@ -57,20 +71,7 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

-# before pulling gem5 resources, make sure it doesn't exist already
-docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
-   "${gem5_root}" hacc-test-weekly bash -c \
-   "rm -rf ${gem5_root}/gem5-resources"
-
 # test LULESH
-# Pull gem5 resources to the root of the gem5 directory -- currently the
-# pre-built binares for LULESH are out-of-date and won't run correctly with
-# ROCm 4.0.  In the meantime, we can build the binary as part of this script.
-# Moreover, DNNMark builds a library and thus doesn't have a binary, so we
-# need to build it before we run it.
-git clone -b develop https://gem5.googlesource.com/public/gem5-resources \
-"${gem5_root}/gem5-resources"
-
 mkdir -p tests/testing-results

 # build LULESH

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/51907


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I006860204d03807d95628aa5dcf6e82d202fef9c
Gerrit-Change-Number: 51907
Gerrit-PatchSet: 4
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests: add Pannotia to weekly regression

2021-10-23 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51968 )



Change subject: tests: add Pannotia to weekly regression
..

tests: add Pannotia to weekly regression

Add the Pannotia benchmarks to the weekly regression suite.  These
applications do a good job of testing the GPU support for irregular
access patterns of various kinds.  All inputs have been sized to use
relatively small graphs to avoid increasing runtime too much.  However,
even with small input sizes Pannotia does run for a while.

Note that the Pannotia benchmarks also use m5ops.  Thus, this
commit also adds support to the weekly regression for compiling the
m5ops (for x86, since that is what the GPU model assumes for the CPU).
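The weekly script repeats a nearly identical invocation for each Pannotia variant, so the repetition could be factored into a loop. The sketch below only constructs the command strings (the variant list comes from the commit text; the flags are abridged and no gem5 run is launched):

```shell
# Build one apu_se.py command line per Pannotia variant; variant names
# are from the commit message, flags abridged for illustration.
cmds=""
for app in bc color_max color_maxmin fw mis pagerank sssp; do
    # Every variant uses the same config; only the gem5-compiled binary differs.
    cmds="${cmds}configs/example/apu_se.py -n3 --mem-size=8GB -c ${app}.gem5
"
done
printf '%s' "$cmds"
```

Factoring the loop this way would also make it harder for one variant's flags to silently drift from the others.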

Change-Id: I1f68b02b38ff24505a2894694b7544977024f8fa
---
M tests/weekly.sh
1 file changed, 160 insertions(+), 2 deletions(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index c7ba7e6..a9f7531 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -44,10 +44,15 @@
 "${gem5_root}"/tests --rm gcr.io/gem5-test/ubuntu-20.04_all-dependencies \
 ./main.py run --length very-long -j${threads} -t${threads}

+mkdir -p tests/testing-results
+
+# GPU weekly tests start here
 # before pulling gem5 resources, make sure it doesn't exist already
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
"rm -rf ${gem5_root}/gem5-resources"
+# delete Pannotia datasets in case a failed regression run left them around
+rm -f coAuthorsDBLP.graph 1k_128k.gr

 # Pull gem5 resources to the root of the gem5 directory -- currently the
 # pre-built binares for LULESH are out-of-date and won't run correctly with
@@ -71,9 +76,14 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

-# test LULESH
-mkdir -p tests/testing-results
+# Some of the apps we test use m5ops (and x86), so compile them for x86
+# Note: setting TERM in the environment is necessary as scons fails for m5ops if
+# it is not set.
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/util/m5" hacc-test-weekly bash -c \
+"export TERM=xterm-256color ; scons build/x86/out/m5"

+# test LULESH
 # build LULESH
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}/gem5-resources/src/gpu/lulesh" \
@@ -163,8 +173,137 @@
 
--benchmark-root=${gem5_root}/gem5-resources/src/gpu/halo-finder/src/hip \
-c ForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb"

+# test Pannotia
+# Pannotia has 6 different benchmarks (BC, Color, FW, MIS, PageRank, SSSP), of
+# which 3 (Color, PageRank, SSSP) have 2 different variants.  Since they are
+# useful for testing irregular GPU application behavior, we test each.
+
+# build BC
+docker run --rm -v ${PWD}:${PWD} \
+   -w ${gem5_root}/gem5-resources/src/gpu/pannotia/bc -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; make gem5-fusion"
+
+# # get input dataset for BC test
+wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr
+# run BC
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+   hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \
+   ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \
+   --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 \
+   --options="1k_128k.gr"
+
+# build Color Max
+docker run --rm -v ${gem5_root}:${gem5_root} -w \
+   ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; make gem5-fusion"
+
+# run Color (Max) (use same input dataset as BC for faster testing)
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+   hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \
+   ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \
+
--benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \

+   -c color_max.gem5 --options="1k_128k.gr 0"
+
+# build Color (MaxMin)
+docker run --rm -v ${gem5_root}:${gem5_root} -w \
+   ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; export VARIANT=MAXMIN ; make gem5-fusion"
+
+# run Color (MaxMin) (use same input dataset as BC for faster testing)
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+   hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \
+   ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \
+   --benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \
+   -c color_maxmin.gem5 --options="1k_128k.gr 0"
+
+# build FW
+docker run --rm -v ${ge

[gem5-dev] Change in gem5/gem5[develop]: tests, gpu-compute: test dynamic register policy in weekly

2021-10-27 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/52163 )



Change subject: tests, gpu-compute: test dynamic register policy in weekly
..

tests, gpu-compute: test dynamic register policy in weekly

The GPU models support a simple register allocation policy (1 WF/CU at a
time) and a dynamic register allocation policy (up to max WF/CU at a
time).  By default, the simple policy is used.  However, the dynamic
policy is much closer to real hardware, so it is important to ensure it
keeps working in the regressions.  This commit updates the nightly and
weekly regressions accordingly to run the dynamic register allocation
policy.
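One way to keep the new flag consistent across every invocation is to hold the shared arguments in a single variable. This is a hedged sketch only: the helper and variable names are hypothetical and do not appear in the actual scripts, which instead repeat the flag per command.

```shell
# Shared gem5 arguments, including the dynamic register allocation policy,
# kept in one place so nightly and weekly invocations cannot drift apart.
GPU_COMMON_ARGS="--reg-alloc-policy=dynamic -n3"

make_cmd() {
    # Emits the apu_se.py portion of a run; the real script wraps this
    # in a `docker run ... gem5.opt` invocation.
    printf 'configs/example/apu_se.py %s -c %s' "$GPU_COMMON_ARGS" "$1"
}

square_cmd=$(make_cmd square)
hsync_cmd=$(make_cmd allSyncPrims-1kernel)
echo "$square_cmd"
```

Changing the policy back to `simple` would then be a one-line edit instead of touching every benchmark invocation.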

Change-Id: Id263d3d5e19e4ff47f0eb6d9b08cbafdf2177fb9
---
M tests/weekly.sh
M tests/nightly.sh
2 files changed, 36 insertions(+), 8 deletions(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index b3708fd..41db369 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -83,7 +83,6 @@
 ./main.py run --length long -j${threads} -t${threads}

 # Run the GPU tests.
-
 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 docker pull gcr.io/gem5-test/gcn-gpu:latest
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
@@ -101,7 +100,7 @@
 # basic GPU functionality is working.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
-configs/example/apu_se.py -n3 -c square
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c square

 # get HeteroSync
 wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel
@@ -112,8 +111,8 @@
 # atomics are tested.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
-configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
---options="sleepMutex 10 16 4"
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \
+allSyncPrims-1kernel --options="sleepMutex 10 16 4"

 # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs
 # accessing unique data and then joining a lock-free barrier, 10 Ld/St per
@@ -122,5 +121,5 @@
 # atomics are tested.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt \
-configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
---options="lfTreeBarrUniq 10 16 4"
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \
+allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4"
diff --git a/tests/weekly.sh b/tests/weekly.sh
index 51376bd..172d955 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -95,7 +95,7 @@
 # stressing several GPU compute and memory components
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \
-configs/example/apu_se.py -n3 --mem-size=8GB \
+configs/example/apu_se.py -n3 --mem-size=8GB --reg-alloc-policy=dynamic \
 --benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh

 # test DNNMark
@@ -137,6 +137,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0" \
-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py" -n3 \
+   --reg-alloc-policy=dynamic \
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax"  
\

-c dnnmark_test_fwd_softmax \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark  
\

@@ -146,6 +147,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\

+   --reg-alloc-policy=dynamic \
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool"  
\

-c dnnmark_test_fwd_pool \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark  
\

@@ -155,6 +157,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\

+   --reg-alloc-policy=dynamic \
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn"  
\

-c dnnmark_test_bwd

[gem5-dev] Change in gem5/gem5[develop]: tests: add Pannotia to weekly regression

2021-11-01 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/51968 )


Change subject: tests: add Pannotia to weekly regression
..

tests: add Pannotia to weekly regression

Add the Pannotia benchmarks to the weekly regression suite.  These
applications do a good job of testing the GPU support for irregular
access patterns of various kinds.  All inputs have been sized to use
relatively small graphs to avoid increasing runtime too much.  However,
even with small input sizes, Pannotia still runs for a while.

Note that the Pannotia benchmarks also use m5ops in them.  Thus, this
commit also adds support into the weekly regression for compiling the
m5ops (for x86, since that is what the GPU model assumes for the CPU).

Change-Id: I1f68b02b38ff24505a2894694b7544977024f8fa
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/51968
Tested-by: kokoro 
Maintainer: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
---
M tests/weekly.sh
1 file changed, 165 insertions(+), 2 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/weekly.sh b/tests/weekly.sh
index c7ba7e6..51376bd 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -44,10 +44,16 @@
 "${gem5_root}"/tests --rm  
gcr.io/gem5-test/ubuntu-20.04_all-dependencies \

 ./main.py run --length very-long -j${threads} -t${threads}

+mkdir -p tests/testing-results
+
+# GPU weekly tests start here
 # before pulling gem5 resources, make sure it doesn't exist already
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest bash -c \
"rm -rf ${gem5_root}/gem5-resources"
+# delete Pannotia datasets and output files in case a failed regression run left
+# them around
+rm -f coAuthorsDBLP.graph 1k_128k.gr result.out

 # Pull gem5 resources to the root of the gem5 directory -- currently the
 # pre-built binares for LULESH are out-of-date and won't run correctly with
@@ -71,9 +77,14 @@
 "scons build/GCN3_X86/gem5.opt -j${threads} \
 || rm -rf build && scons build/GCN3_X86/gem5.opt -j${threads}"

-# test LULESH
-mkdir -p tests/testing-results
+# Some of the apps we test use m5ops (and x86), so compile them for x86
+# Note: setting TERM in the environment is necessary as scons fails for m5ops if
+# it is not set.
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}/util/m5" hacc-test-weekly bash -c \
+"export TERM=xterm-256color ; scons build/x86/out/m5"

+# test LULESH
 # build LULESH
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}/gem5-resources/src/gpu/lulesh" \
@@ -163,8 +174,137 @@
 
--benchmark-root=${gem5_root}/gem5-resources/src/gpu/halo-finder/src/hip \

-c ForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb"

+# test Pannotia
+# Pannotia has 6 different benchmarks (BC, Color, FW, MIS, PageRank, SSSP), of
+# which 3 (Color, PageRank, SSSP) have 2 different variants.  Since they are
+# useful for testing irregular GPU application behavior, we test each.
+
+# build BC
+docker run --rm -v ${PWD}:${PWD} \
+   -w ${gem5_root}/gem5-resources/src/gpu/pannotia/bc -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; make gem5-fusion"
+
+# # get input dataset for BC test
+wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr
+# run BC
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+   hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \
+   ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \
+   --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 \
+   --options="1k_128k.gr"
+
+# build Color Max
+docker run --rm -v ${gem5_root}:${gem5_root} -w \
+   ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; make gem5-fusion"
+
+# run Color (Max) (use same input dataset as BC for faster testing)
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+   hacc-test-weekly ${gem5_root}/build/GCN3_X86/gem5.opt \
+   ${gem5_root}/configs/example/apu_se.py -n3 --mem-size=8GB \
+
--benchmark-root=${gem5_root}/gem5-resources/src/gpu/pannotia/color/bin \

+   -c color_max.gem5 --options="1k_128k.gr 0"
+
+# build Color (MaxMin)
+docker run --rm -v ${gem5_root}:${gem5_root} -w \
+   ${gem5_root}/gem5-resources/src/gpu/pannotia/color -u $UID:$GID \
+   hacc-test-weekly bash -c \
+   "export GEM5_PATH=${gem5_root} ; export VARIANT=MAXMIN ; make  
gem5-fusion"

+
+# run Color (MaxMin) (use same input dataset as BC for faster testing)
+docker run --rm -v ${gem5_root}:${gem5_root} -w ${gem5_root} -u $UID:$GID \
+ 

[gem5-dev] Change in gem5/gem5[develop]: tests, gpu-compute: test dynamic register policy in regressions

2021-11-02 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/52163 )


Change subject: tests, gpu-compute: test dynamic register policy in  
regressions

..

tests, gpu-compute: test dynamic register policy in regressions

The GPU models support a simple register allocation policy (1 WF/CU at a
time) and a dynamic register allocation policy (up to max WF/CU at a
time).  By default, the simple policy is used.  However, the dynamic
policy is much more realistic relative to real hardware, so it is
important to ensure it works in the regressions.  This commit
updates the nightly and weekly regressions accordingly to run the
dynamic register allocation policy.
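The difference between the two policies can be sketched with a toy model (this is an illustration only, not gem5's actual resource calculation; the function name and the register/wave counts below are made up):

```python
def resident_waves(policy, vgprs_per_wave, vgprs_per_cu, hw_max_waves):
    """Toy model of how many wavefronts can be resident on one CU."""
    if policy == "simple":
        # Simple policy: only one wavefront runs per CU at a time.
        return 1
    elif policy == "dynamic":
        # Dynamic policy: schedule as many waves as register file space
        # allows, capped by the hardware limit on waves per CU.
        return min(hw_max_waves, vgprs_per_cu // vgprs_per_wave)
    raise ValueError(f"unknown policy: {policy}")

print(resident_waves("simple", 64, 2048, 40))    # 1
print(resident_waves("dynamic", 64, 2048, 40))   # 32
print(resident_waves("dynamic", 256, 2048, 40))  # 8
```

Running many waves concurrently (the dynamic case) is what exposes races and resource-sharing bugs that the one-wave-at-a-time policy can never hit, which is why the regressions switch to it.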

Change-Id: Id263d3d5e19e4ff47f0eb6d9b08cbafdf2177fb9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/52163
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Bobby R. Bruce 
---
M tests/weekly.sh
M tests/nightly.sh
2 files changed, 40 insertions(+), 8 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved
  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/tests/nightly.sh b/tests/nightly.sh
index b3708fd..41db369 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -83,7 +83,6 @@
 ./main.py run --length long -j${threads} -t${threads}

 # Run the GPU tests.
-
 # For the GPU tests we compile and run GCN3_X86 inside a gcn-gpu container.
 docker pull gcr.io/gem5-test/gcn-gpu:latest
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
@@ -101,7 +100,7 @@
 # basic GPU functionality is working.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3 -c square
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c square

 # get HeteroSync
wget -qN http://dist.gem5.org/dist/develop/test-progs/heterosync/gcn3/allSyncPrims-1kernel
@@ -112,8 +111,8 @@
 # atomics are tested.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
---options="sleepMutex 10 16 4"
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \
+allSyncPrims-1kernel --options="sleepMutex 10 16 4"

 # run HeteroSync LFBarr -- similar setup to sleepMutex above -- 16 WGs
 # accessing unique data and then joining a lock-free barrier, 10 Ld/St per
@@ -122,5 +121,5 @@
 # atomics are tested.
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest build/GCN3_X86/gem5.opt  
\

-configs/example/apu_se.py -n3  -c allSyncPrims-1kernel \
---options="lfTreeBarrUniq 10 16 4"
+configs/example/apu_se.py --reg-alloc-policy=dynamic -n3 -c \
+allSyncPrims-1kernel --options="lfTreeBarrUniq 10 16 4"
diff --git a/tests/weekly.sh b/tests/weekly.sh
index 51376bd..172d955 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -95,7 +95,7 @@
 # stressing several GPU compute and memory components
 docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
 "${gem5_root}" hacc-test-weekly build/GCN3_X86/gem5.opt \
-configs/example/apu_se.py -n3 --mem-size=8GB \
+configs/example/apu_se.py -n3 --mem-size=8GB --reg-alloc-policy=dynamic \
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/lulesh/bin" -c lulesh


 # test DNNMark
@@ -137,6 +137,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\

+   --reg-alloc-policy=dynamic \
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax"  
\

-c dnnmark_test_fwd_softmax \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark  
\

@@ -146,6 +147,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-weekly \
"${gem5_root}/build/GCN3_X86/gem5.opt" "${gem5_root}/configs/example/apu_se.py"  
-n3  
\

+   --reg-alloc-policy=dynamic \
 
--benchmark-root="${gem5_root}/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_pool"  
\

-c dnnmark_test_fwd_pool \
--options="-config  
${gem5_root}/gem5-resources/src/gpu/DNNMark/config_example/pool_config.dnnmark  
\

@@ -155,6 +157,7 @@
"${gem5_root}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0"  
\

-w "${gem5_root}/gem5-resources/src/gpu/DNNMark" hacc-test-w

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Set scratch_base, lds_base for gfx902

2022-02-17 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/54663 )


Change subject: gpu-compute: Set scratch_base, lds_base for gfx902
..

gpu-compute: Set scratch_base, lds_base for gfx902

When updating how scratch_base and lds_base were set, gfx902 was left out.
This adds gfx902 to the case statement, allowing the apertures to be
set and simulations using gfx902 to run without erroring out.

Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/54663
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matthew Poremba 
Tested-by: kokoro 
---
M src/gpu-compute/gpu_compute_driver.cc
1 file changed, 21 insertions(+), 0 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc
index e908f4e..d98f4c6 100644
--- a/src/gpu-compute/gpu_compute_driver.cc
+++ b/src/gpu-compute/gpu_compute_driver.cc
@@ -331,6 +331,7 @@
 ldsApeBase(i + 1);
 break;
   case GfxVersion::gfx900:
+  case GfxVersion::gfx902:
 args->process_apertures[i].scratch_base =
 scratchApeBaseV9();
 args->process_apertures[i].lds_base =
@@ -631,6 +632,7 @@
 ape_args->lds_base = ldsApeBase(i + 1);
 break;
   case GfxVersion::gfx900:
+  case GfxVersion::gfx902:
 ape_args->scratch_base = scratchApeBaseV9();
 ape_args->lds_base = ldsApeBaseV9();
 break;

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/54663
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0e1adbdf63f7c129186fb835e30adac9cd4b72d0
Gerrit-Change-Number: 54663
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby Bruce 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Fix register checking and allocation in dyn manager

2022-02-18 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/56909 )


Change subject: gpu-compute: Fix register checking and allocation in dyn  
manager

..

gpu-compute: Fix register checking and allocation in dyn manager

This patch updates the canAllocate function to account both for
the number of regions of registers that need to be allocated,
and for the fact that the registers aren't one continuous chunk.

The patch also consolidates the registers as much as possible when
a register chunk is freed. This prevents fragmentation from making
it impossible to allocate enough registers.
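The two behaviors can be sketched in a few lines of Python (a simplified model of the free-list logic in this patch, not the gem5 code itself; chunks are half-open [start, end) ranges and the sizes are illustrative):

```python
def can_allocate(free_chunks, num_regions, region_size):
    # A region cannot span the gap between free chunks, so count how many
    # whole regions fit inside each chunk rather than summing raw space.
    fitting = sum((end - start) // region_size for start, end in free_chunks)
    return fitting >= num_regions

def free_region(free_chunks, first, last):
    # Return freed chunk [first, last), coalescing with any adjacent free
    # chunks so fragmentation cannot starve future allocations.
    left = next((c for c in free_chunks if c[1] == first), None)
    right = next((c for c in free_chunks if c[0] == last), None)
    if left and right:
        left[1] = right[1]
        free_chunks.remove(right)
    elif left:
        left[1] = last
    elif right:
        right[0] = first
    else:
        free_chunks.append([first, last])

chunks = [[0, 4], [8, 12]]
# 8 registers are free in total, but no single chunk fits a 6-register region.
print(can_allocate(chunks, 1, 6))   # False
free_region(chunks, 4, 8)           # freeing [4, 8) merges everything
print(chunks)                       # [[0, 12]]
print(can_allocate(chunks, 1, 6))   # True
```

The first print shows the bug the patch fixes: checking only the total free space would wrongly report success even though no contiguous chunk is large enough.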

Change-Id: Ic95cfe614d247add475f7139d3703991042f8149
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/56909
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matthew Poremba 
---
M src/gpu-compute/dyn_pool_manager.cc
1 file changed, 69 insertions(+), 6 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass




diff --git a/src/gpu-compute/dyn_pool_manager.cc b/src/gpu-compute/dyn_pool_manager.cc
index 62a39a9..3db5e7f 100644
--- a/src/gpu-compute/dyn_pool_manager.cc
+++ b/src/gpu-compute/dyn_pool_manager.cc
@@ -93,8 +93,24 @@
 DynPoolManager::canAllocate(uint32_t numRegions, uint32_t size)
 {
 uint32_t actualSize = minAllocatedElements(size);
-DPRINTF(GPUVRF,"Can Allocate %d\n",actualSize);
-return (_totRegSpaceAvailable >= actualSize);
+uint32_t numAvailChunks = 0;
+DPRINTF(GPUVRF, "Checking if we can allocate %d regions of size %d "
+"registers\n", numRegions, actualSize);
+for (auto it : freeSpaceRecord) {
+numAvailChunks += (it.second - it.first)/actualSize;
+}
+
+if (numAvailChunks >= numRegions) {
+DPRINTF(GPUVRF, "Able to allocate %d regions of size %d; "
+"number of available regions: %d\n",
+numRegions, actualSize, numAvailChunks);
+return true;
+} else {
+DPRINTF(GPUVRF, "Unable to allocate %d regions of size %d; "
+"number of available regions: %d\n",
+numRegions, actualSize, numAvailChunks);
+return false;
+}
 }

 uint32_t
@@ -105,7 +121,8 @@
 uint32_t actualSize = minAllocatedElements(size);
 auto it = freeSpaceRecord.begin();
 while (it != freeSpaceRecord.end()) {
-if (it->second >= actualSize) {
+uint32_t curChunkSize = it->second - it->first;
+if (curChunkSize >= actualSize) {
 // assign the next block starting from here
 startIdx = it->first;
 _regionSize = actualSize;
@@ -115,14 +132,13 @@
 // This case sees if this chunk size is exactly equal to
 // the size of the requested chunk. If yes, then this can't
 // contribute to future requests and hence, should be removed
-if (it->second == actualSize) {
+if (curChunkSize == actualSize) {
 it = freeSpaceRecord.erase(it);
 // once entire freeSpaceRecord allocated, increment
 // reservedSpaceRecord count
 ++reservedSpaceRecord;
 } else {
 it->first += actualSize;
-it->second -= actualSize;
 }
 break;
 }
@@ -144,7 +160,32 @@
 // Current dynamic register allocation does not handle wraparound
 assert(firstIdx < lastIdx);
 _totRegSpaceAvailable += lastIdx-firstIdx;
-freeSpaceRecord.push_back(std::make_pair(firstIdx,lastIdx-firstIdx));
+
+// Consolidate with other regions. Need to check if firstIdx or lastIdx
+// already exist
+auto firstIt = std::find_if(
+freeSpaceRecord.begin(),
+freeSpaceRecord.end(),
+[&](const std::pair& element){
+return element.second == firstIdx;} );
+
+auto lastIt = std::find_if(
+freeSpaceRecord.begin(),
+freeSpaceRecord.end(),
+[&](const std::pair& element){
+return element.first == lastIdx;} );
+
+if (firstIt != freeSpaceRecord.end() && lastIt != freeSpaceRecord.end()) {
+firstIt->second = lastIt->second;
+freeSpaceRecord.erase(lastIt);
+} else if (firstIt != freeSpaceRecord.end()) {
+firstIt->second = lastIdx;
+} else if (lastIt != freeSpaceRecord.end()) {
+lastIt->first = firstIdx;
+} else {
+freeSpaceRecord.push_back(std::make_pair(firstIdx, lastIdx));
+}
+
 // remove corresponding entry from reservedSpaceRecord too
 --reservedSpaceRecord;
 }

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/56909
To unsubscribe, or for help writing mail filters, visit  

[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: change default GPU reg allocator to dynamic

2022-03-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/57537 )



Change subject: configs, gpu-compute: change default GPU reg allocator to  
dynamic

..

configs, gpu-compute: change default GPU reg allocator to dynamic

The current default GPU register allocator is the "simple" policy,
which only allows 1 wavefront to run at a time on each CU.  This is
not very realistic and also means the tester (when not specifically
choosing the dynamic policy) is less rigorous in terms of validating
correctness.

To resolve this, this commit changes the default to the "dynamic"
register allocator, which runs as many waves per CU as there is
space in terms of registers and other resources -- thus it is more
realistic and does a better job of ensuring test coverage.

Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e
---
M configs/example/apu_se.py
1 file changed, 21 insertions(+), 1 deletion(-)



diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 532fb98..b5fb9ff 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -161,7 +161,7 @@
 ' m5_switchcpu pseudo-ops will toggle back and forth')
 parser.add_argument("--num-hw-queues", type=int, default=10,
 help="number of hw queues in packet processor")
-parser.add_argument("--reg-alloc-policy", type=str, default="simple",
+parser.add_argument("--reg-alloc-policy", type=str, default="dynamic",
 help="register allocation policy (simple/dynamic)")

 parser.add_argument("--dgpu", action="store_true", default=False,

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57537
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e
Gerrit-Change-Number: 57537
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester

2022-03-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/57535 )



Change subject: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester
..

tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester

Currently VIPER does not support partial cache line writes,
so we need numDMAs to be 0 for it to work with the GPU Ruby
tester.  To enable this, change the default value to -1 so we can
detect when a user has explicitly requested DMAs.
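The sentinel-default pattern used here can be shown in isolation (a minimal standalone sketch, not the tester config itself; `effective_num_dmas` and `config_default` are names invented for this example):

```python
import argparse

parser = argparse.ArgumentParser()
# -1 is a sentinel meaning "the user did not pick a value", so the config
# can tell "unset" apart from an explicit request and default to 0 DMAs
# (VIPER currently cannot handle the partial cache line writes DMAs need).
parser.add_argument("--num-dmas", type=int, default=-1)

def effective_num_dmas(argv, config_default=2):
    args = parser.parse_args(argv)
    if args.num_dmas < 1:
        return 0               # unset (or 0): run without DMA engines
    return config_default      # set: use the per-config DMA count

print(effective_num_dmas([]))                   # 0
print(effective_num_dmas(["--num-dmas", "4"]))  # 2
```

This mirrors the diff below: any positive user value enables the per-config DMA count, while the untouched default silently drops to 0.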

Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2
---
M configs/example/ruby_gpu_random_test.py
1 file changed, 36 insertions(+), 4 deletions(-)



diff --git a/configs/example/ruby_gpu_random_test.py b/configs/example/ruby_gpu_random_test.py
index 0763454..7fcaeeb 100644
--- a/configs/example/ruby_gpu_random_test.py
+++ b/configs/example/ruby_gpu_random_test.py
@@ -79,7 +79,7 @@
 help="Random seed number. Default value (i.e., 0)  
means \

 using runtime-specific value")
 parser.add_argument("--log-file", type=str, default="gpu-ruby-test.log")
-parser.add_argument("--num-dmas", type=int, default=0,
+parser.add_argument("--num-dmas", type=int, default=-1,
 help="The number of DMA engines to use in tester  
config.")


 args = parser.parse_args()
@@ -108,7 +108,13 @@
 args.wf_size = 1
 args.wavefronts_per_cu = 1
 args.num_cpus = 1
-args.num_dmas = 1
+# if user didn't specify number of DMAs, then assume 0
+if args.num_dmas < 1:
+  # currently VIPER does not support partial cache line writes,
+  # so we need numDMAs to be 0 for it
+  args.num_dmas = 0
+else:
+  args.num_dmas = 1
 args.cu_per_sqc = 1
 args.cu_per_scalar_cache = 1
 args.num_compute_units = 1
@@ -117,7 +123,13 @@
 args.wf_size = 16
 args.wavefronts_per_cu = 4
 args.num_cpus = 4
-args.num_dmas = 2
+# if user didn't specify number of DMAs, then assume 0
+if args.num_dmas < 1:
+  # currently VIPER does not support partial cache line writes,
+  # so we need numDMAs to be 0 for it
+  args.num_dmas = 0
+else:
+  args.num_dmas = 2
 args.cu_per_sqc = 4
 args.cu_per_scalar_cache = 4
 args.num_compute_units = 4
@@ -126,7 +138,13 @@
 args.wf_size = 32
 args.wavefronts_per_cu = 4
 args.num_cpus = 4
-args.num_dmas = 4
+# if user didn't specify number of DMAs, then assume 0
+if args.num_dmas < 1:
+  # currently VIPER does not support partial cache line writes,
+  # so we need numDMAs to be 0 for it
+  args.num_dmas = 0
+else:
+  args.num_dmas = 4
 args.cu_per_sqc = 4
 args.cu_per_scalar_cache = 4
 args.num_compute_units = 8

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57535
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2
Gerrit-Change-Number: 57535
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: tests, mem-ruby: add GPU Ruby random tester to nightly tests

2022-03-13 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/57536 )



Change subject: tests, mem-ruby: add GPU Ruby random tester to nightly tests
..

tests, mem-ruby: add GPU Ruby random tester to nightly tests

This commit adds the GPU protocol random tester to the nightly tests.
The input has been sized to take around 30 seconds and provide good
coverage for the coherence protocol.

Change-Id: If789d9d15a16fbd95fd7b115ffbf10e45bbb45c4
---
M tests/nightly.sh
1 file changed, 26 insertions(+), 0 deletions(-)



diff --git a/tests/nightly.sh b/tests/nightly.sh
index e421d97..978e463 100755
--- a/tests/nightly.sh
+++ b/tests/nightly.sh
@@ -109,6 +109,19 @@
 "scons build/${gpu_isa}/gem5.opt -j${compile_threads} \
|| (rm -rf build && scons build/${gpu_isa}/gem5.opt -j${compile_threads})"


+# first run the GPU protocol random tester -- it should take about 30 seconds
+# to run and provides good coverage for the coherence protocol
+# Input choices (some are default and thus implicit):
+# - use small cache size to encourage races
+# - use small system size to encourage races since more requests per CU (and faster sim)
+# - use small address range to encourage more races
+# - use small episode length to encourage more races
+# - 50K tests runs in ~30 seconds with reasonably good coverage
+# - num-dmas = 0 because VIPER doesn't support partial cache line writes, which DMAs need
+docker run --rm -u $UID:$GID --volume "${gem5_root}":"${gem5_root}" -w \
+"${gem5_root}" gcr.io/gem5-test/gcn-gpu:latest   
build/${gpu_isa}/gem5.opt \
+configs/example/ruby_gpu_random_test.py --test-length=5  
--num-dmas=0

+
 # get square
 wget -qN http://dist.gem5.org/dist/develop/test-progs/square/square


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57536
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If789d9d15a16fbd95fd7b115ffbf10e45bbb45c4
Gerrit-Change-Number: 57536
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[release-staging-v21-0]: gpu-compute: Fix accidental execution when stopped at barrier

2021-03-04 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/41573 )


Change subject: gpu-compute: Fix accidental execution when stopped at  
barrier

..

gpu-compute: Fix accidental execution when stopped at barrier

Due to the compute unit pipeline being executed in reverse order, there
exists a scenario where a compute unit will execute an extra
instruction when it's supposed to be stopped at a barrier. It occurs
as follows:

* The ScheduleStage sets a barrier instruction ready to execute.

* The ScoreboardCheckStage adds another instruction to the readyList.
This is where the barrier is checked, but because the barrier isn't
executing yet, the instruction can be passed along to ScheduleStage

* The barrier executes, and stalls

* The ScheduleStage sees that there's a new instruction and schedules
it to be executed.

* Only now will the ScoreboardCheckStage realize a barrier is active
and stall accordingly

* The subsequent instruction executes

This patch sets the wavefront status to be S_BARRIER in ScheduleStage
instead of in the barrier instruction execution in order to have
ScoreboardCheckStage realize that we're going to execute a barrier,
preventing it from marking another instruction as ready.

Change-Id: Ib683e2c68f361d7ee60a3beaf53b4b6c888c9f8d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/41573
Reviewed-by: Matt Sinclair 
Reviewed-by: Alexandru Duțu 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/gcn3/insts/instructions.cc
M src/gpu-compute/schedule_stage.cc
2 files changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Alexandru Duțu: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/gcn3/insts/instructions.cc b/src/arch/gcn3/insts/instructions.cc
index 29de1a8..bde87ef 100644
--- a/src/arch/gcn3/insts/instructions.cc
+++ b/src/arch/gcn3/insts/instructions.cc
@@ -4114,8 +4114,6 @@

 if (wf->hasBarrier()) {
 int bar_id = wf->barrierId();
-assert(wf->getStatus() != Wavefront::S_BARRIER);
-wf->setStatus(Wavefront::S_BARRIER);
 cu->incNumAtBarrier(bar_id);
 DPRINTF(GPUSync, "CU[%d] WF[%d][%d] Wave[%d] - Stalling at "
 "barrier Id%d. %d waves now at barrier, %d waves "
diff --git a/src/gpu-compute/schedule_stage.cc b/src/gpu-compute/schedule_stage.cc
index 8a2ea18..ace6d0c 100644
--- a/src/gpu-compute/schedule_stage.cc
+++ b/src/gpu-compute/schedule_stage.cc
@@ -314,6 +314,9 @@
 computeUnit.insertInPipeMap(wf);
 wavesInSch.emplace(wf->wfDynId);
 schList.at(exeType).push_back(std::make_pair(gpu_dyn_inst,  
RFBUSY));

+if (wf->isOldestInstBarrier() && wf->hasBarrier()) {
+wf->setStatus(Wavefront::S_BARRIER);
+}
 if (wf->isOldestInstWaitcnt()) {
 wf->setStatus(Wavefront::S_WAITCNT);
 }

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/41573
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-0
Gerrit-Change-Id: Ib683e2c68f361d7ee60a3beaf53b4b6c888c9f8d
Gerrit-Change-Number: 41573
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alexandru Duțu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: mem-ruby: Add missing transitions + wakes for Dma events

2021-03-11 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42463 )


Change subject: mem-ruby: Add missing transitions + wakes for Dma events
..

mem-ruby: Add missing transitions + wakes for Dma events

This also changes one of the wakeUpDependents calls to a
wakeUpAllDependentsAddr call to prevent a hang.

Change-Id: Ia076414e5c6d9c8c0b2576d1f442195d75d275fc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42463
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
1 file changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm b/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
index 684d03e..4d24891 100644
--- a/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
+++ b/src/mem/ruby/protocol/MOESI_AMD_Base-dir.sm
@@ -1119,7 +1119,7 @@

  // The exit state is always going to be U, so wakeUpDependents logic should be covered in all the
   // transitions which are flowing into U.
-  transition({BL, BS_M, BM_M, B_M, BP, BDW_P, BS_PM, BM_PM, B_PM, BS_Pm, BM_Pm, B_Pm, B}, {DmaRead,DmaWrite}){
+  transition({BL, BDR_M, BS_M, BM_M, B_M, BP, BDR_PM, BDW_P, BS_PM, BM_PM, B_PM, BDR_Pm, BS_Pm, BM_Pm, B_Pm, B}, {DmaRead,DmaWrite}){
 sd_stallAndWaitRequest;
   }

@@ -1280,6 +1280,7 @@
   transition(BDR_M, MemData, U) {
 mt_writeMemDataToTBE;
 dd_sendResponseDmaData;
+wa_wakeUpAllDependentsAddr;
 dt_deallocateTBE;
 pm_popMemQueue;
   }
@@ -1373,7 +1374,7 @@
 dd_sendResponseDmaData;
    // Check for pending requests from the core we put to sleep while waiting
 // for a response
-wa_wakeUpDependents;
+wa_wakeUpAllDependentsAddr;
 dt_deallocateTBE;
 pt_popTriggerQueue;
   }

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42463
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ia076414e5c6d9c8c0b2576d1f442195d75d275fc
Gerrit-Change-Number: 42463
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: dev-hsa: Fix size of HSA Queue

2021-03-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42423 )


Change subject: dev-hsa: Fix size of HSA Queue
..

dev-hsa: Fix size of HSA Queue

In the HSAQueueDescriptor ptr function, we mod the index by numElts, but
numElts was previously just set to size, which was the raw size of the
queue. This led to indexing past the end of the queue. We fix this by
dividing the size by the AQL packet size to get the actual number of elements the
queue can hold.

We also add an assert for indexing into the queue, as there is a
scenario where the queue reports a larger size than it actually has mapped.

Change-Id: Ie5e699379f303255305c279e58a34dc783df86a0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42423
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/dev/hsa/hsa_packet_processor.hh
1 file changed, 8 insertions(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/dev/hsa/hsa_packet_processor.hh  
b/src/dev/hsa/hsa_packet_processor.hh

index babf702..93adf57 100644
--- a/src/dev/hsa/hsa_packet_processor.hh
+++ b/src/dev/hsa/hsa_packet_processor.hh
@@ -85,7 +85,7 @@
uint64_t hri_ptr, uint32_t size)
   : basePointer(base_ptr), doorbellPointer(db_ptr),
 writeIndex(0), readIndex(0),
-numElts(size), hostReadIndexPtr(hri_ptr),
+numElts(size / AQL_PACKET_SIZE), hostReadIndexPtr(hri_ptr),
 stalledOnDmaBufAvailability(false),
 dmaInProgress(false)
 {  }
@@ -98,6 +98,13 @@

 uint64_t ptr(uint64_t ix)
 {
+/**
+ * Sometimes queues report that their size is 512k, which would
+ * indicate numElts of 0x2000. However, they only have 256k
+ * mapped which means any index over 0x1000 will fail an
+ * address translation.
+ */
+assert(ix % numElts < 0x1000);
 return basePointer +
 ((ix % numElts) * objSize());
 }

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42423
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ie5e699379f303255305c279e58a34dc783df86a0
Gerrit-Change-Number: 42423
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Support dynamic scratch allocations

2021-03-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42201 )


Change subject: gpu-compute: Support dynamic scratch allocations
..

gpu-compute: Support dynamic scratch allocations

dGPUs in all versions of ROCm and APUs starting with ROCM 2.2 can
under-allocate scratch resources.  This patch adds support for
the CP to trigger a recoverable error so that the host can attempt to
re-allocate scratch to satisfy the currently stalled kernel.

Note that this patch does not include a mechanism to handle dynamic
scratch allocation for queues with in-flight kernels, as these queues
would first need to be drained and descheduled, which would require some
additional effort in the hsaPP and HW queue scheduler.  If the CP
encounters this scenario it will assert.  I suspect this is not a
particularly common occurrence in most of our applications so it is left
as a TODO.

This patch also fixes a few memory leaks and updates the old DMA callback
object interface to use a much cleaner c++11 lambda interface.

Change-Id: Ica8a5fc8283415507544d6cc49fa748fe84d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42201
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/gpu-compute/gpu_command_processor.cc
M src/gpu-compute/gpu_command_processor.hh
2 files changed, 146 insertions(+), 60 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/gpu-compute/gpu_command_processor.cc  
b/src/gpu-compute/gpu_command_processor.cc

index 5a9bbd5..842b515 100644
--- a/src/gpu-compute/gpu_command_processor.cc
+++ b/src/gpu-compute/gpu_command_processor.cc
@@ -157,7 +157,8 @@
 }

 void
-GPUCommandProcessor::updateHsaSignal(Addr signal_handle, uint64_t  
signal_value)
+GPUCommandProcessor::updateHsaSignal(Addr signal_handle, uint64_t  
signal_value,

+ HsaSignalCallbackFunction function)
 {
 // The signal value is aligned 8 bytes from
 // the actual handle in the runtime
@@ -166,10 +167,9 @@
 Addr event_addr = getHsaSignalEventAddr(signal_handle);
 DPRINTF(GPUCommandProc, "Triggering completion signal: %x!\n",  
value_addr);


-Addr *new_signal = new Addr;
-*new_signal = signal_value;
+auto cb = new CPDmaCallback<uint64_t>(function, signal_value);

-dmaWriteVirt(value_addr, sizeof(Addr), nullptr, new_signal, 0);
+dmaWriteVirt(value_addr, sizeof(Addr), cb, &cb->dmaBuffer, 0);

 auto tc = system()->threads[0];
 ConstVPtr mailbox_ptr(mailbox_addr, tc);
@@ -297,14 +297,15 @@
 void
 GPUCommandProcessor::initABI(HSAQueueEntry *task)
 {
-auto *readDispIdOffEvent = new ReadDispIdOffsetDmaEvent(*this, task);
+auto cb = new CPDmaCallback<uint32_t>(
+[ = ] (const uint32_t &readDispIdOffset)
+{ ReadDispIdOffsetDmaEvent(task, readDispIdOffset); }, 0);

 Addr hostReadIdxPtr
 = hsaPP->getQueueDesc(task->queueId())->hostReadIndexPtr;

 dmaReadVirt(hostReadIdxPtr + sizeof(hostReadIdxPtr),
-sizeof(readDispIdOffEvent->readDispIdOffset), readDispIdOffEvent,
-&readDispIdOffEvent->readDispIdOffset);
+sizeof(uint32_t), cb, &cb->dmaBuffer);
 }

 System*
diff --git a/src/gpu-compute/gpu_command_processor.hh  
b/src/gpu-compute/gpu_command_processor.hh

index 342f788..c78ae0b 100644
--- a/src/gpu-compute/gpu_command_processor.hh
+++ b/src/gpu-compute/gpu_command_processor.hh
@@ -45,6 +45,7 @@
 #ifndef __DEV_HSA_GPU_COMMAND_PROCESSOR_HH__
 #define __DEV_HSA_GPU_COMMAND_PROCESSOR_HH__

+#include "debug/GPUCommandProc.hh"
 #include "dev/hsa/hsa_device.hh"
 #include "dev/hsa/hsa_signal.hh"
 #include "gpu-compute/gpu_compute_driver.hh"
@@ -58,6 +59,7 @@
 {
   public:
 typedef GPUCommandProcessorParams Params;
+typedef std::function<void (const uint64_t &)> HsaSignalCallbackFunction;


 GPUCommandProcessor() = delete;
 GPUCommandProcessor(const Params &p);
@@ -86,7 +88,9 @@
 AddrRangeList getAddrRanges() const override;
 System *system();

-void updateHsaSignal(Addr signal_handle, uint64_t signal_value)  
override;

+void updateHsaSignal(Addr signal_handle, uint64_t signal_value,
+ HsaSignalCallbackFunction function =
+[] (const uint64_t &) { });

 uint64_t functionalReadHsaSignal(Addr signal_handle) override;

@@ -112,6 +116,33 @@

 void initABI(HSAQueueEntry *task);

+
+/**
+ * Wraps a std::function object in a DmaCallback.  Much cleaner than
+ * defining a bunch of callback objects for each desired behavior when  
a
+ * DMA completes.  Contains a built in templated buffer that can be  
used

+ * for DMA temporary storage.
+ */
+template <class T>
+class CPDmaCallback : public DmaCallback
+{
+std::function<void(const T &)> _function;
+
+virtual void
+process() override
+{
+

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Remove unused functions

2021-03-25 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42202 )


Change subject: gpu-compute: Remove unused functions
..

gpu-compute: Remove unused functions

These functions were probably used for some stat collection,
but they're no longer used, so they're being removed.

Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42202
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/gpu_dyn_inst.hh
2 files changed, 0 insertions(+), 37 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/gpu-compute/gpu_dyn_inst.cc  
b/src/gpu-compute/gpu_dyn_inst.cc

index b827632..b9b23d4 100644
--- a/src/gpu-compute/gpu_dyn_inst.cc
+++ b/src/gpu-compute/gpu_dyn_inst.cc
@@ -268,40 +268,6 @@
 return _staticInst->executed_as;
 }

-bool
-GPUDynInst::hasVgprRawDependence(GPUDynInstPtr s)
-{
-assert(s);
-for (int i = 0; i < getNumOperands(); ++i) {
-if (isVectorRegister(i) && isSrcOperand(i)) {
-for (int j = 0; j < s->getNumOperands(); ++j) {
-if (s->isVectorRegister(j) && s->isDstOperand(j)) {
-if (i == j)
-return true;
-}
-}
-}
-}
-return false;
-}
-
-bool
-GPUDynInst::hasSgprRawDependence(GPUDynInstPtr s)
-{
-assert(s);
-for (int i = 0; i < getNumOperands(); ++i) {
-if (isScalarRegister(i) && isSrcOperand(i)) {
-for (int j = 0; j < s->getNumOperands(); ++j) {
-if (s->isScalarRegister(j) && s->isDstOperand(j)) {
-if (i == j)
-return true;
-}
-}
-}
-}
-return false;
-}
-
 // Process a memory instruction and (if necessary) submit timing request
 void
 GPUDynInst::initiateAcc(GPUDynInstPtr gpuDynInst)
diff --git a/src/gpu-compute/gpu_dyn_inst.hh  
b/src/gpu-compute/gpu_dyn_inst.hh

index 851a46a..97eea01 100644
--- a/src/gpu-compute/gpu_dyn_inst.hh
+++ b/src/gpu-compute/gpu_dyn_inst.hh
@@ -101,9 +101,6 @@
 bool hasDestinationVgpr() const;
 bool hasSourceVgpr() const;

-bool hasSgprRawDependence(GPUDynInstPtr s);
-bool hasVgprRawDependence(GPUDynInstPtr s);
-
 // returns true if the string "opcodeStr" is found in the
 // opcode of the instruction
 bool isOpcode(const std::string& opcodeStr) const;



2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42202
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic99f22391c0d5ffb0e9963670efb35e503f9957d
Gerrit-Change-Number: 42202
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Modify directory structure as prep for adding vega isa

2021-03-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42203 )


Change subject: arch-gcn3: Modify directory structure as prep for adding vega isa
..

arch-gcn3: Modify directory structure as prep for adding vega isa

Change-Id: I7c5f4a3a9d82ca4550e833dec2cd576dbe333627
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42203
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/SConscript
R src/arch/amdgpu/gcn3/SConscript
R src/arch/amdgpu/gcn3/SConsopts
A src/arch/amdgpu/gcn3/ast_interpreter.py
A src/arch/amdgpu/gcn3/ast_objects.py
R src/arch/amdgpu/gcn3/decoder.cc
A src/arch/amdgpu/gcn3/description_objects.py
A src/arch/amdgpu/gcn3/description_parser.py
R src/arch/amdgpu/gcn3/gpu_decoder.hh
R src/arch/amdgpu/gcn3/gpu_isa.hh
A src/arch/amdgpu/gcn3/gpu_isa_main.py
A src/arch/amdgpu/gcn3/gpu_isa_parser.py
R src/arch/amdgpu/gcn3/gpu_mem_helpers.hh
A src/arch/amdgpu/gcn3/gpu_registers.hh
R src/arch/amdgpu/gcn3/gpu_types.hh
A src/arch/amdgpu/gcn3/hand_coded.py
R src/arch/amdgpu/gcn3/insts/gpu_static_inst.cc
R src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
R src/arch/amdgpu/gcn3/insts/inst_util.hh
R src/arch/amdgpu/gcn3/insts/instructions.cc
R src/arch/amdgpu/gcn3/insts/instructions.hh
R src/arch/amdgpu/gcn3/insts/op_encodings.cc
R src/arch/amdgpu/gcn3/insts/op_encodings.hh
R src/arch/amdgpu/gcn3/isa.cc
R src/arch/amdgpu/gcn3/operand.hh
R src/arch/amdgpu/gcn3/registers.cc
26 files changed, 7,485 insertions(+), 50 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass






3 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42203
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I7c5f4a3a9d82ca4550e833dec2cd576dbe333627
Gerrit-Change-Number: 42203
Gerrit-PatchSet: 5
Gerrit-Owner: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add Vega ISA as a copy of GCN3

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42204 )


Change subject: arch-vega: Add Vega ISA as a copy of GCN3
..

arch-vega: Add Vega ISA as a copy of GCN3

This changeset adds Vega support as a copy of GCN3.
Configs have been modified to include both ISAs.
The current implementation is not complete and needs
modifications to fully comply with the ISA manual:

https://developer.amd.com/wp-content/resources/
Vega_Shader_ISA_28July2017.pdf

Change-Id: I608aa6747a45594f8e1bd7802da1883cf612168b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42204
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M MAINTAINERS.yaml
M src/arch/SConscript
A src/arch/amdgpu/vega/SConscript
A src/arch/amdgpu/vega/SConsopts
A src/arch/amdgpu/vega/decoder.cc
A src/arch/amdgpu/vega/gpu_decoder.hh
A src/arch/amdgpu/vega/gpu_isa.hh
A src/arch/amdgpu/vega/gpu_mem_helpers.hh
A src/arch/amdgpu/vega/gpu_registers.hh
A src/arch/amdgpu/vega/gpu_types.hh
A src/arch/amdgpu/vega/insts/gpu_static_inst.cc
A src/arch/amdgpu/vega/insts/gpu_static_inst.hh
A src/arch/amdgpu/vega/insts/inst_util.hh
A src/arch/amdgpu/vega/insts/instructions.cc
A src/arch/amdgpu/vega/insts/instructions.hh
A src/arch/amdgpu/vega/insts/op_encodings.cc
A src/arch/amdgpu/vega/insts/op_encodings.hh
A src/arch/amdgpu/vega/isa.cc
A src/arch/amdgpu/vega/operand.hh
A src/arch/amdgpu/vega/registers.cc
20 files changed, 144,242 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass






2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42204
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I608aa6747a45594f8e1bd7802da1883cf612168b
Gerrit-Change-Number: 42204
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Update FLAT instructions to use offset

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42213 )


Change subject: arch-vega: Update FLAT instructions to use offset
..

arch-vega: Update FLAT instructions to use offset

In Vega, flat instructions use an offset when
computing the address (section 9.4 of chapter 9
'Flat Memory Instructions' in Vega ISA manual).
This is different from the GCN3 baseline.

Change-Id: I9fe36f028014889ef566055458c451442403a289
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42213
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/op_encodings.hh
2 files changed, 20 insertions(+), 19 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc  
b/src/arch/amdgpu/vega/insts/instructions.cc

index 281fd95..0a01bf2 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -42461,7 +42461,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42552,7 +42552,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42644,7 +42644,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42707,7 +42707,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42770,7 +42770,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42841,7 +42841,7 @@

 addr.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -42920,7 +42920,7 @@
 data.read();


-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -42983,7 +42983,7 @@
 addr.read();
 data.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43046,7 +43046,7 @@
 addr.read();
 data.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43117,7 +43117,7 @@
 }
 }

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 if (isFlatGlobal()) {
 gpuDynInst->computeUnit()->globalMemoryPipe
@@ -43178,7 +43178,7 @@
 data1.read();
 data2.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43252,7 +43252,7 @@
 data2.read();
 data3.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43329,7 +43329,7 @@
 addr.read();
 data.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43417,7 +43417,7 @@
 data.read();
 cmp.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -43501,7 +43501,7 @@
 addr.read();
 data.read();

-calcAddr(gpuDynInst, addr);
+calcAddr(gpuDynInst, addr, instData.OFFSET);
 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
 (reinterpret_cast(gpuDynInst->

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Add operand info class to GPUDynInst

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42209 )


Change subject: gpu-compute: Add operand info class to GPUDynInst
..

gpu-compute: Add operand info class to GPUDynInst

This change adds a class that stores operand register info
for the GPUDynInst. The operand info is calculated when the
instruction object is created and stored for easy access
by the RF, etc.

Change-Id: I3cf267942e54fe60fcb4224d3b88da08a1a0226e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42209
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/gcn3/registers.hh
M src/gpu-compute/SConscript
M src/gpu-compute/fetch_unit.cc
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/gpu_dyn_inst.hh
M src/gpu-compute/wavefront.cc
6 files changed, 223 insertions(+), 9 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/gcn3/registers.hh b/src/arch/gcn3/registers.hh
index 7ad9b1f..df1ef4e 100644
--- a/src/arch/gcn3/registers.hh
+++ b/src/arch/gcn3/registers.hh
@@ -168,6 +168,12 @@
 typedef int64_t VecElemI64;
 typedef double VecElemF64;

+const int DWORDSize = sizeof(VecElemU32);
+/**
+ * Size of a single-precision register in DWORDs.
+ */
+const int RegSizeDWORDs = sizeof(VecElemU32) / DWORDSize;
+
 // typedefs for the various sizes/types of vector regs
 using VecRegU8 = ::VecRegT;
 using VecRegI8 = ::VecRegT;
diff --git a/src/gpu-compute/SConscript b/src/gpu-compute/SConscript
index e41e387..adb9b0e 100644
--- a/src/gpu-compute/SConscript
+++ b/src/gpu-compute/SConscript
@@ -80,6 +80,7 @@
 DebugFlag('GPUDisp')
 DebugFlag('GPUExec')
 DebugFlag('GPUFetch')
+DebugFlag('GPUInst')
 DebugFlag('GPUKernelInfo')
 DebugFlag('GPUMem')
 DebugFlag('GPUPort')
diff --git a/src/gpu-compute/fetch_unit.cc b/src/gpu-compute/fetch_unit.cc
index 62b9e73..d2af7b3 100644
--- a/src/gpu-compute/fetch_unit.cc
+++ b/src/gpu-compute/fetch_unit.cc
@@ -557,6 +557,7 @@
wavefront, gpu_static_inst,
wavefront->computeUnit->
 getAndIncSeqNum());
+gpu_dyn_inst->initOperandInfo(gpu_dyn_inst);
 wavefront->instructionBuffer.push_back(gpu_dyn_inst);

 DPRINTF(GPUFetch, "WF[%d][%d]: Id%ld decoded %s (%d bytes). "
@@ -597,6 +598,7 @@
wavefront, gpu_static_inst,
wavefront->computeUnit->
getAndIncSeqNum());
+gpu_dyn_inst->initOperandInfo(gpu_dyn_inst);
 wavefront->instructionBuffer.push_back(gpu_dyn_inst);

 DPRINTF(GPUFetch, "WF[%d][%d]: Id%d decoded split inst %s (%#x) "
diff --git a/src/gpu-compute/gpu_dyn_inst.cc  
b/src/gpu-compute/gpu_dyn_inst.cc

index b9b23d4..c08e4b9 100644
--- a/src/gpu-compute/gpu_dyn_inst.cc
+++ b/src/gpu-compute/gpu_dyn_inst.cc
@@ -33,6 +33,7 @@

 #include "gpu-compute/gpu_dyn_inst.hh"

+#include "debug/GPUInst.hh"
 #include "debug/GPUMem.hh"
 #include "gpu-compute/gpu_static_inst.hh"
 #include "gpu-compute/scalar_register_file.hh"
@@ -43,7 +44,8 @@
GPUStaticInst *static_inst, InstSeqNum instSeqNum)
 : GPUExecContext(_cu, _wf), scalarAddr(0),  
addr(computeUnit()->wfSize(),

   (Addr)0), numScalarReqs(0), isSaveRestore(false),
-  _staticInst(static_inst), _seqNum(instSeqNum)
+  _staticInst(static_inst), _seqNum(instSeqNum),
+  maxSrcVecRegOpSize(0), maxSrcScalarRegOpSize(0)
 {
 statusVector.assign(TheGpuISA::NumVecElemPerVecReg, 0);
 tlbHitLevel.assign(computeUnit()->wfSize(), -1);
@@ -82,6 +84,109 @@
 }
 }

+void
+GPUDynInst::initOperandInfo(GPUDynInstPtr &gpu_dyn_inst)
+{
+assert(gpu_dyn_inst->wavefront());
+/**
+ * Generate and cache the operand to register mapping information. This
+ * prevents this info from being generated multiple times throughout
+ * the CU pipeline.
+ */
+DPRINTF(GPUInst, "%s: generating operand info for %d operands\n",
+disassemble(), getNumOperands());
+
+for (int op_idx = 0; op_idx < getNumOperands(); ++op_idx) {
+int virt_idx(-1);
+int phys_idx(-1);
+int op_num_dwords(-1);
+
+if (isVectorRegister(op_idx)) {
+virt_idx = getRegisterIndex(op_idx, gpu_dyn_inst);
+op_num_dwords = numOpdDWORDs(op_idx);
+
+if (isSrcOperand(op_idx)) {
+std::vector virt_indices;
+std::vector phys_indices;
+
+if (op_num_dwords > maxSrcVecRegOpSize) {
+maxSrcVecRegOpSize = op_num_dwords;
+}
+
+for (int i = 0; i < op_num_dwords; ++i) {
+phys_idx = compu

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add decodings for Flat, Global, Scratch

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42206 )


Change subject: arch-vega: Add decodings for Flat, Global, Scratch
..

arch-vega: Add decodings for Flat, Global, Scratch

Does not implement the functions yet

Change-Id: I32feab747b13bd2eff98983e3281c0d82e756221
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42206
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/gpu_decoder.hh
2 files changed, 832 insertions(+), 9 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/decoder.cc  
b/src/arch/amdgpu/vega/decoder.cc

index 5dac7f9..3015313 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -1623,19 +1623,19 @@
 &Decoder::decode_OP_FLAT__FLAT_LOAD_DWORDX3,
 &Decoder::decode_OP_FLAT__FLAT_LOAD_DWORDX4,
 &Decoder::decode_OP_FLAT__FLAT_STORE_BYTE,
-&Decoder::decode_invalid,
+&Decoder::decode_OP_FLAT__FLAT_STORE_BYTE_D16_HI,
 &Decoder::decode_OP_FLAT__FLAT_STORE_SHORT,
-&Decoder::decode_invalid,
+&Decoder::decode_OP_FLAT__FLAT_STORE_SHORT_D16_HI,
 &Decoder::decode_OP_FLAT__FLAT_STORE_DWORD,
 &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX2,
 &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX3,
 &Decoder::decode_OP_FLAT__FLAT_STORE_DWORDX4,
-&Decoder::decode_invalid,
-&Decoder::decode_invalid,
-&Decoder::decode_invalid,
-&Decoder::decode_invalid,
-&Decoder::decode_invalid,
-&Decoder::decode_invalid,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_UBYTE_D16,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_UBYTE_D16_HI,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_SBYTE_D16,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_SBYTE_D16_HI,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_SHORT_D16,
+&Decoder::decode_OP_FLAT__FLAT_LOAD_SHORT_D16_HI,
 &Decoder::decode_invalid,
 &Decoder::decode_invalid,
 &Decoder::decode_invalid,
@@ -1728,6 +1728,137 @@
 &Decoder::decode_invalid
 };

+IsaDecodeMethod Decoder::tableSubDecode_OP_GLOBAL[] = {
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_USHORT,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SSHORT,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORD,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX2,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX3,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_DWORDX4,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_BYTE,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_BYTE_D16_HI,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_SHORT,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_SHORT_D16_HI,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORD,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX2,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX3,
+&Decoder::decode_OP_GLOBAL__GLOBAL_STORE_DWORDX4,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE_D16,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_UBYTE_D16_HI,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE_D16,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SBYTE_D16_HI,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SHORT_D16,
+&Decoder::decode_OP_GLOBAL__GLOBAL_LOAD_SHORT_D16_HI,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+&Decoder::decode_invalid,
+   

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Update instruction encodings

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42205 )


Change subject: arch-vega: Update instruction encodings
..

arch-vega: Update instruction encodings

This also renames VOP3 and VOP3_SDST_ENC to
VOP3A and VOP3B, matching the ISA.

Change-Id: I56f254433b1f3181d4ee6896f957a2256e3c7b29
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42205
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/gpu_decoder.hh
M src/arch/amdgpu/vega/insts/inst_util.hh
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
M src/arch/amdgpu/vega/insts/op_encodings.cc
M src/arch/amdgpu/vega/insts/op_encodings.hh
7 files changed, 2,111 insertions(+), 2,063 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass






2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42205
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I56f254433b1f3181d4ee6896f957a2256e3c7b29
Gerrit-Change-Number: 42205
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3, gpu-compute: Update getRegisterIndex() API

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42210 )


Change subject: arch-gcn3, gpu-compute: Update getRegisterIndex() API
..

arch-gcn3, gpu-compute: Update getRegisterIndex() API

This change removes the GPUDynInstPtr argument from
getRegisterIndex(). The dynamic inst was only needed
to get access to its parent WF's state so it could
determine the number of scalar registers the wave was
allocated. However, we can simply pass the number of
scalar registers directly. This cuts down on shared
pointer usage.
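
The shape of this change can be sketched as follows (a minimal sketch with illustrative stand-ins; `opSelectorToRegIdx` and its mapping here are hypothetical, not gem5's real implementation):

```cpp
#include <cassert>

// Hypothetical stand-in for gem5's opSelectorToRegIdx(): maps an operand
// selector to a physical register index given the number of scalar
// registers the wave reserved. The mapping below is illustrative only.
int opSelectorToRegIdx(int opSelector, int numScalarRegs)
{
    return (opSelector < numScalarRegs) ? opSelector : -1;
}

// Before: getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) -- the
// shared_ptr was only dereferenced to reach the wave's reservedScalarRegs.
// After: the caller passes that count directly, so no shared pointer (and
// no atomic refcount bump) is needed per lookup.
int getRegisterIndex(int opSelector, int numScalarRegs)
{
    return opSelectorToRegIdx(opSelector, numScalarRegs);
}
```

The callers in op_encodings.cc already have the wavefront in hand, so supplying the count at the call site costs nothing.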

Change-Id: I29ab8d9a3de1f8b82b820ef421fc653284567c65
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42210
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
M src/arch/amdgpu/gcn3/insts/op_encodings.cc
M src/arch/amdgpu/gcn3/insts/op_encodings.hh
M src/gpu-compute/fetch_unit.cc
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/gpu_dyn_inst.hh
M src/gpu-compute/gpu_static_inst.hh
M src/gpu-compute/scalar_register_file.cc
M src/gpu-compute/vector_register_file.cc
M src/gpu-compute/wavefront.cc
10 files changed, 86 insertions(+), 120 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh b/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
index 03beb20..e4983e8 100644
--- a/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
+++ b/src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
@@ -70,7 +70,7 @@
 int getOperandSize(int opIdx) override { return 0; }

 int
-getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst) override
+getRegisterIndex(int opIdx, int num_scalar_regs) override
 {
 return 0;
 }
diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.cc b/src/arch/amdgpu/gcn3/insts/op_encodings.cc
index a6a3a26..34bd35f 100644
--- a/src/arch/amdgpu/gcn3/insts/op_encodings.cc
+++ b/src/arch/amdgpu/gcn3/insts/op_encodings.cc
@@ -128,21 +128,18 @@
 }

 int
-Inst_SOP2::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst)
+Inst_SOP2::getRegisterIndex(int opIdx, int num_scalar_regs)
 {
 assert(opIdx >= 0);
 assert(opIdx < getNumOperands());

 switch (opIdx) {
   case 0:
-return opSelectorToRegIdx(instData.SSRC0,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SSRC0, num_scalar_regs);
   case 1:
-return opSelectorToRegIdx(instData.SSRC1,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SSRC1, num_scalar_regs);
   case 2:
-return opSelectorToRegIdx(instData.SDST,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SDST, num_scalar_regs);
   default:
 fatal("Operand at idx %i does not exist\n", opIdx);
 return -1;
@@ -244,7 +241,7 @@
 }

 int
-Inst_SOPK::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst)
+Inst_SOPK::getRegisterIndex(int opIdx, int num_scalar_regs)
 {
 assert(opIdx >= 0);
 assert(opIdx < getNumOperands());
@@ -253,8 +250,7 @@
   case 0:
 return  -1;
   case 1:
-return opSelectorToRegIdx(instData.SDST,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SDST, num_scalar_regs);
   default:
 fatal("Operand at idx %i does not exist\n", opIdx);
 return -1;
@@ -349,7 +345,7 @@
 }

 int
-Inst_SOP1::getRegisterIndex(int opIdx, GPUDynInstPtr gpuDynInst)
+Inst_SOP1::getRegisterIndex(int opIdx, int num_scalar_regs)
 {
 assert(opIdx >= 0);
 assert(opIdx < getNumOperands());
@@ -359,14 +355,11 @@
 if (instData.OP == 0x1C) {
 // Special case for s_getpc, which has no source reg.
 // Instead, it implicitly reads the PC.
-return opSelectorToRegIdx(instData.SDST,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SDST, num_scalar_regs);
 }
-return opSelectorToRegIdx(instData.SSRC0,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SSRC0, num_scalar_regs);
   case 1:
-return opSelectorToRegIdx(instData.SDST,
-gpuDynInst->wavefront()->reservedScalarRegs);
+return opSelectorToRegIdx(instData.SDST, num_scalar_regs);
   default:
 fatal("Operand at idx %i does not exist\n", opIdx);
 return -1;
@@ -467

[gem5-dev] Change in gem5/gem5[develop]: arch-vega, gpu-compute: Add vectors to hold op info

2021-03-31 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/42211 )


Change subject: arch-vega, gpu-compute: Add vectors to hold op info
..

arch-vega, gpu-compute: Add vectors to hold op info

This removes the need for redundant functions like
isScalarRegister/isVectorRegister, as well as
isSrcOperand/isDstOperand. It also means the op info is
generated only once instead of every time it is needed.
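
The idea — compute operand metadata once and keep it in per-kind vectors, instead of re-deriving it through predicate functions on every query — can be sketched roughly like this (simplified stand-ins, not gem5's actual OperandInfo API):

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for gem5's OperandInfo.
struct OperandInfo
{
    bool scalar; // scalar vs. vector register
    bool dst;    // destination vs. source operand
};

struct GPUStaticInstSketch
{
    // Populated once (e.g. at instruction construction), rather than
    // recomputed by isScalarRegister()/isSrcOperand() on each use.
    std::vector<OperandInfo> srcScalarOps, srcVecOps, dstScalarOps, dstVecOps;

    void addOperand(OperandInfo op)
    {
        auto &v = op.scalar ? (op.dst ? dstScalarOps : srcScalarOps)
                            : (op.dst ? dstVecOps : srcVecOps);
        v.push_back(op);
    }
};
```

Consumers then iterate the vector they care about directly, which is how the register files use it in later patches in this series.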

Change-Id: I8af5080502ed08ed9107a441e2728828f86496f4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/42211
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/gcn3/insts/gpu_static_inst.hh
M src/arch/amdgpu/gcn3/insts/instructions.hh
M src/arch/amdgpu/gcn3/insts/op_encodings.cc
M src/arch/amdgpu/gcn3/insts/op_encodings.hh
M src/arch/amdgpu/vega/insts/gpu_static_inst.hh
M src/arch/amdgpu/vega/insts/instructions.hh
M src/arch/amdgpu/vega/insts/op_encodings.cc
M src/arch/amdgpu/vega/insts/op_encodings.hh
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/gpu_static_inst.hh
A src/gpu-compute/operand_info.hh
11 files changed, 1,233 insertions(+), 80,257 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass






2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/42211
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8af5080502ed08ed9107a441e2728828f86496f4
Gerrit-Change-Number: 42211
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby R. Bruce 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: dev-hsa,gpu-compute: Fix override for updateHsaSignal

2021-04-02 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/44046 )


Change subject: dev-hsa,gpu-compute: Fix override for updateHsaSignal
..

dev-hsa,gpu-compute: Fix override for updateHsaSignal

Change 965ad12 removed a parameter from the updateHsaSignal
function. Change 25e8a14 added the parameter back, but only for the
derived class, breaking the override. This patch adds that parameter
back to the base class, fixing the override.
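
Why the mismatched parameter broke the override — and the fixed shape, where both declarations carry the callback with a no-op default — can be sketched like this (class and type names are simplified stand-ins for gem5's HSADevice/GPUCommandProcessor):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using Addr = uint64_t;
using HsaSignalCallbackFunction = std::function<void(const uint64_t &)>;

// Fixed shape: the base declares the callback parameter too, so the
// derived method's signature matches and `override` compiles.
struct HSADeviceBase
{
    virtual ~HSADeviceBase() = default;
    virtual void
    updateHsaSignal(Addr handle, uint64_t value,
                    HsaSignalCallbackFunction fn = [](const uint64_t &) {})
    {
        lastValue = value; // base fallback behavior, for the sketch only
    }
    uint64_t lastValue = 0;
};

struct GPUCommandProcessor : HSADeviceBase
{
    void
    updateHsaSignal(Addr handle, uint64_t value,
                    HsaSignalCallbackFunction fn =
                        [](const uint64_t &) {}) override // now a real override
    {
        lastValue = value + 1; // distinguishable behavior, for the sketch
        fn(value);
    }
};
```

With the extra parameter only on the derived class, `override` would be a hard compile error, since the two signatures no longer match.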

Change-Id: Id1e96e29ca4be7f3ce244bac83a112e3250812d1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44046
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Alex Dutu 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Maintainer: Matt Sinclair 
---
M src/dev/hsa/hsa_device.hh
M src/gpu-compute/gpu_command_processor.hh
2 files changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Alex Dutu: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/dev/hsa/hsa_device.hh b/src/dev/hsa/hsa_device.hh
index 157c459..5b6f388 100644
--- a/src/dev/hsa/hsa_device.hh
+++ b/src/dev/hsa/hsa_device.hh
@@ -101,7 +101,8 @@
 fatal("%s does not need HSA driver\n", name());
 }
 virtual void
-updateHsaSignal(Addr signal_handle, uint64_t signal_value)
+updateHsaSignal(Addr signal_handle, uint64_t signal_value,
+HsaSignalCallbackFunction function = [] (const uint64_t &) { })
 {
 fatal("%s does not have HSA signal update functionality.\n",  
name());

 }
diff --git a/src/gpu-compute/gpu_command_processor.hh b/src/gpu-compute/gpu_command_processor.hh
index c78ae0b..67cda7d 100644
--- a/src/gpu-compute/gpu_command_processor.hh
+++ b/src/gpu-compute/gpu_command_processor.hh
@@ -90,7 +90,7 @@

 void updateHsaSignal(Addr signal_handle, uint64_t signal_value,
  HsaSignalCallbackFunction function =
-[] (const uint64_t &) { });
+[] (const uint64_t &) { }) override;

 uint64_t functionalReadHsaSignal(Addr signal_handle) override;


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/44046
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id1e96e29ca4be7f3ce244bac83a112e3250812d1
Gerrit-Change-Number: 44046
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: ruby: fix typo in VIPER TCC triggerQueue

2021-04-27 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/44905 )



Change subject: ruby: fix typo in VIPER TCC triggerQueue
..

ruby: fix typo in VIPER TCC triggerQueue

The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead
of "TriggerMsg" for the triggerQueue_in port.  This was a benign
bug because the msg type is not used in the in_port implementation
but still makes the SLICC harder to understand, so fixing it is
worthwhile.

Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 1 insertion(+), 1 deletion(-)



diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index e21ba99..6c07416 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -252,7 +252,7 @@


 // ** IN_PORTS **
-  in_port(triggerQueue_in, TiggerMsg, triggerQueue) {
+  in_port(triggerQueue_in, TriggerMsg, triggerQueue) {
 if (triggerQueue_in.isReady(clockEdge())) {
   peek(triggerQueue_in, TriggerMsg) {
 TBE tbe := TBEs.lookup(in_msg.addr);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/44905
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Gerrit-Change-Number: 44905
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: ruby: fix typo in VIPER TCC triggerQueue

2021-04-27 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/44905 )


Change subject: ruby: fix typo in VIPER TCC triggerQueue
..

ruby: fix typo in VIPER TCC triggerQueue

The GPU VIPER TCC protocol accidentally used "TiggerMsg" instead
of "TriggerMsg" for the triggerQueue_in port.  This was a benign
bug because the msg type is not used in the in_port implementation
but still makes the SLICC harder to understand, so fixing it is
worthwhile.

Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/44905
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matthew Poremba 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M src/mem/ruby/protocol/GPU_VIPER-TCC.sm
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
index e21ba99..6c07416 100644
--- a/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
+++ b/src/mem/ruby/protocol/GPU_VIPER-TCC.sm
@@ -252,7 +252,7 @@


 // ** IN_PORTS **
-  in_port(triggerQueue_in, TiggerMsg, triggerQueue) {
+  in_port(triggerQueue_in, TriggerMsg, triggerQueue) {
 if (triggerQueue_in.isReady(clockEdge())) {
   peek(triggerQueue_in, TriggerMsg) {
 TBE tbe := TBEs.lookup(in_msg.addr);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/44905
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I88cbc72bac93bcc58a66f057a32f7bddf821cac9
Gerrit-Change-Number: 44905
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Check for WAX dependences

2021-07-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/47539 )


Change subject: gpu-compute: Check for WAX dependences
..

gpu-compute: Check for WAX dependences

This adds a check that the destination registers are free
in the operandsReady() function for both scalar and vector
registers, allowing us to catch WAX dependences between instructions.
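
The extended readiness check can be sketched as follows (a minimal scoreboard stand-in; `Scoreboard` and its members are illustrative, not gem5's register-file classes):

```cpp
#include <cassert>
#include <vector>

// Sketch of the operandsReady() extension: besides the source operands,
// the destination registers must also be free, otherwise a younger write
// could race an older in-flight access (a WAX hazard).
struct Scoreboard
{
    std::vector<bool> busy;
    explicit Scoreboard(int n) : busy(n, false) {}
    bool regBusy(int idx) const { return busy[idx]; }

    bool
    operandsReady(const std::vector<int> &srcRegs,
                  const std::vector<int> &dstRegs) const
    {
        for (int r : srcRegs)       // pre-existing source-operand check
            if (regBusy(r)) return false;
        for (int r : dstRegs)       // new check on destination registers
            if (regBusy(r)) return false;
        return true;
    }
};
```

In the real patch, the second loop is what walks `dstScalarRegOperands()`/`dstVecRegOperands()` and bumps `numTimesBlockedDueWAXDependencies` on a stall.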

Change-Id: I0fb0b29e9608fca0d90c059422d4d9500d5b2a7d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47539
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/scalar_register_file.cc
M src/gpu-compute/vector_register_file.cc
2 files changed, 22 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/gpu-compute/scalar_register_file.cc b/src/gpu-compute/scalar_register_file.cc
index 52e0a2f..3a00093 100644
--- a/src/gpu-compute/scalar_register_file.cc
+++ b/src/gpu-compute/scalar_register_file.cc
@@ -64,6 +64,17 @@
 }
 }

+for (const auto& dstScalarOp : ii->dstScalarRegOperands()) {
+for (const auto& physIdx : dstScalarOp.physIndices()) {
+if (regBusy(physIdx)) {
+DPRINTF(GPUSRF, "WAX stall: WV[%d]: %s: physReg[%d]\n",
+w->wfDynId, ii->disassemble(), physIdx);
+w->stats.numTimesBlockedDueWAXDependencies++;
+return false;
+}
+}
+}
+
 return true;
 }

diff --git a/src/gpu-compute/vector_register_file.cc b/src/gpu-compute/vector_register_file.cc
index dc5434d..2355643 100644
--- a/src/gpu-compute/vector_register_file.cc
+++ b/src/gpu-compute/vector_register_file.cc
@@ -71,6 +71,17 @@
 }
 }

+for (const auto& dstVecOp : ii->dstVecRegOperands()) {
+for (const auto& physIdx : dstVecOp.physIndices()) {
+if (regBusy(physIdx)) {
+DPRINTF(GPUVRF, "WAX stall: WV[%d]: %s: physReg[%d]\n",
+w->wfDynId, ii->disassemble(), physIdx);
+w->stats.numTimesBlockedDueWAXDependencies++;
+return false;
+}
+}
+}
+
 return true;
 }


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47539
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0fb0b29e9608fca0d90c059422d4d9500d5b2a7d
Gerrit-Change-Number: 47539
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Read registers in execute instead of initiateAcc

2021-07-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/45345 )


Change subject: arch-gcn3: Read registers in execute instead of initiateAcc
..

arch-gcn3: Read registers in execute instead of initiateAcc

Certain memory writes were reading their registers in
initiateAcc, which led to scenarios where a subsequent instruction
would execute, clobbering the value in that register before the memory
write's initiateAcc method was called, causing the memory write to read
the wrong data.

This patch moves all register reads to execute, preventing the above
scenario from happening.
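
The hazard, and the fix of snapshotting at execute time, can be sketched with illustrative stand-in types (not gem5's real RegFile/instruction classes):

```cpp
#include <cassert>
#include <cstdint>

// A store "executes" first (registers are read) and its memory access is
// initiated later. If the register read waited until initiateAcc(), an
// intervening instruction could clobber the register in between.
struct RegFile { uint32_t sdata = 0; };

struct StoreInst
{
    uint32_t snapshot = 0;

    // Fixed behavior: capture the source register at execute() time.
    void execute(const RegFile &rf) { snapshot = rf.sdata; }

    // initiateAcc() then stores the snapshot, so later register writes
    // can no longer change what gets written to memory.
    uint32_t initiateAcc() const { return snapshot; }
};
```

This mirrors how the patch moves the `sdata.read()`/`memcpy` into execute() and leaves only `initMemWrite` in initiateAcc().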

Change-Id: Iee107c19e4b82c2e172bf2d6cc95b79983a43d83
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45345
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Reviewed-by: Alex Dutu 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
1 file changed, 116 insertions(+), 125 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc
index b5a4300..8c77b8c 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -5068,8 +5068,13 @@
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
 ScalarRegU32 offset(0);
 ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1);
+ConstScalarOperandU32 sdata(gpuDynInst, instData.SDATA);

 addr.read();
+sdata.read();
+
+std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
+sizeof(ScalarRegU32));

 if (instData.IMM) {
 offset = extData.OFFSET;
@@ -5093,10 +5098,6 @@
 void
 Inst_SMEM__S_STORE_DWORD::initiateAcc(GPUDynInstPtr gpuDynInst)
 {
-ConstScalarOperandU32 sdata(gpuDynInst, instData.SDATA);
-sdata.read();
-std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
-sizeof(ScalarRegU32));
 initMemWrite<1>(gpuDynInst);
 } // initiateAcc

@@ -5127,8 +5128,13 @@
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
 ScalarRegU32 offset(0);
 ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1);
+ConstScalarOperandU64 sdata(gpuDynInst, instData.SDATA);

 addr.read();
+sdata.read();
+
+std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
+sizeof(ScalarRegU64));

 if (instData.IMM) {
 offset = extData.OFFSET;
@@ -5152,10 +5158,6 @@
 void
 Inst_SMEM__S_STORE_DWORDX2::initiateAcc(GPUDynInstPtr gpuDynInst)
 {
-ConstScalarOperandU64 sdata(gpuDynInst, instData.SDATA);
-sdata.read();
-std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
-sizeof(ScalarRegU64));
 initMemWrite<2>(gpuDynInst);
 } // initiateAcc

@@ -5186,8 +5188,13 @@
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
 ScalarRegU32 offset(0);
 ConstScalarOperandU64 addr(gpuDynInst, instData.SBASE << 1);
+ConstScalarOperandU128 sdata(gpuDynInst, instData.SDATA);

 addr.read();
+sdata.read();
+
+std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
+4 * sizeof(ScalarRegU32));

 if (instData.IMM) {
 offset = extData.OFFSET;
@@ -5211,10 +5218,6 @@
 void
 Inst_SMEM__S_STORE_DWORDX4::initiateAcc(GPUDynInstPtr gpuDynInst)
 {
-ConstScalarOperandU128 sdata(gpuDynInst, instData.SDATA);
-sdata.read();
-std::memcpy((void*)gpuDynInst->scalar_data, sdata.rawDataPtr(),
-4 * sizeof(ScalarRegU32));
 initMemWrite<4>(gpuDynInst);
 } // initiateAcc

@@ -35746,9 +35749,18 @@
 ConstVecOperandU32 addr1(gpuDynInst, extData.VADDR + 1);
 ConstScalarOperandU128 rsrcDesc(gpuDynInst, extData.SRSRC * 4);
 ConstScalarOperandU32 offset(gpuDynInst, extData.SOFFSET);
+ConstVecOperandI8 data(gpuDynInst, extData.VDATA);

 rsrcDesc.read();
 offset.read();
+data.read();
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->d_data))[lane]
+= data[lane];
+}
+}

 int inst_offset = instData.OFFSET;

@@ -35793,16 +35805,6 @@
 void
 Inst_MUBUF__BUFFER_STORE_BYTE::initiateAcc(GPUDynInstPtr gpuDynInst)
 {
-ConstVecOperandI8 data(gpuDynInst, extData.VDATA);
-data.read();
-
-for (int lane = 0; lane < NumVecElemPerVecR

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use

2021-07-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/45346 )


Change subject: arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use
..

arch-gcn3,gpu-compute: Set gpuDynInst exec_mask before use

vector_register_file uses the exec_mask of a memory instruction in
order to determine if it should mark a register as in-use or not.
Previously, the exec_mask of memory instructions was only set on
execution of that instruction, which occurs after the code in
vector_register_file. This led to the code reading potentially garbage
data, leading to a scenario where a register would be marked used when
it shouldn't be.

This fix sets the exec_mask of memory instructions in schedule_stage,
which works because the only time the wavefront execMask() is updated is
on an instruction executing, and we know the previous instruction will
have executed by the time schedule_stage executes, due to the order the
pipeline is executed in.

This also undoes part of a patch from last year (62ec973) which treated
the symptom of accidental register allocation, without preventing the
registers from being allocated in the first place.

This patch also removes the now-redundant code that sets the exec_mask in
instructions.cc for memory instructions.
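
The timing argument above can be sketched with simplified stand-ins (not gem5's Wavefront/GPUDynInst types):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the fix: copy the wavefront's execMask() into the dynamic
// instruction during the schedule stage, before the register file looks
// at it, instead of waiting for the instruction's own execute().
struct Wavefront { uint64_t execMask = 0xF; };

struct GPUDynInst
{
    uint64_t exec_mask = 0; // garbage until someone sets it
};

// schedule_stage: snapshot the mask while it is known to be up to date
// (the previous instruction has already executed by this point).
void scheduleStage(GPUDynInst &inst, const Wavefront &wf)
{
    inst.exec_mask = wf.execMask;
}

// vector_register_file: only marks registers in-use for active lanes, so
// it must see a valid mask rather than uninitialized data.
bool anyLaneActive(const GPUDynInst &inst)
{
    return inst.exec_mask != 0;
}
```

Setting the mask once in schedule_stage is what lets the per-instruction `exec_mask = wf->execMask()` lines be deleted from instructions.cc.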

Change-Id: Idabd3502764fb06133ac2458606c1aaf6f04
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45346
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matthew Poremba 
Tested-by: kokoro 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/gpu-compute/schedule_stage.cc
M src/gpu-compute/vector_register_file.cc
3 files changed, 30 insertions(+), 156 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc
index 8c77b8c..bc66ebe 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -31243,7 +31243,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -31304,7 +31303,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -31368,7 +31366,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -31548,7 +31545,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -31608,7 +31604,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -32073,7 +32068,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -32135,7 +32129,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -32200,7 +32193,6 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 gpuDynInst->execUnitId = wf->execUnitId;
-gpuDynInst->exec_mask = wf->execMask();
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(
 gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
@@ -32284,7 +32276,6 @@
 {
 W

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3,arch-vega,gpu-compute: Move request counters

2021-07-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/45347 )


Change subject: arch-gcn3,arch-vega,gpu-compute: Move request counters
..

arch-gcn3,arch-vega,gpu-compute: Move request counters

When the Vega ISA was committed, it lacked the request counter
tracking for memory requests that existed in the GCN3 code.

Instead of copying over the same lines from the GCN3 code to the Vega
code, this commit makes the various memory pipelines handle updating the
request counter information instead, as every memory instruction calls a
memory pipeline.

This commit also adds an issueRequest() method to
scalar_memory_pipeline; previously, the gpuDynInsts were explicitly
pushed into the issuedRequests queue.
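
The refactor — moving the counter bookkeeping out of every instruction's execute() and into one issueRequest() on the pipeline — can be sketched like this (structures are simplified stand-ins that mirror gem5's names):

```cpp
#include <cassert>

// Simplified stand-in for the wavefront's request counters.
struct Wavefront
{
    int scalarRdGmReqsInPipe = 1;
    int scalarOutstandingReqsRdGm = 0;
    int outstandingReqs = 0;
};

struct ScalarMemPipeline
{
    int queued = 0;

    // One place updates the counters, instead of the same three lines
    // being copy-pasted at every scalar-load call site in instructions.cc.
    void issueRequest(Wavefront *wf)
    {
        wf->scalarRdGmReqsInPipe--;
        wf->scalarOutstandingReqsRdGm++;
        wf->outstandingReqs++;
        queued++; // stands in for issuedRequests.push(gpuDynInst)
    }
};
```

Centralizing the updates also means Vega gets the tracking for free, since both ISAs go through the same pipelines.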

Change-Id: I5140d3b2f12be582f2ae9ff7c433167aeec5b68e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/45347
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.cc
M src/gpu-compute/global_memory_pipeline.cc
M src/gpu-compute/local_memory_pipeline.cc
M src/gpu-compute/scalar_memory_pipeline.cc
M src/gpu-compute/scalar_memory_pipeline.hh
6 files changed, 82 insertions(+), 408 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc
index bc66ebe..a421454 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -4497,12 +4497,7 @@
 calcAddr(gpuDynInst, addr, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe
-.getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+.issueRequest(gpuDynInst);
 }

 void
@@ -4556,12 +4551,7 @@
 calcAddr(gpuDynInst, addr, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe.
-getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+issueRequest(gpuDynInst);
 }

 void
@@ -4613,12 +4603,7 @@
 calcAddr(gpuDynInst, addr, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe.
-getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+issueRequest(gpuDynInst);
 }

 void
@@ -4670,12 +4655,7 @@
 calcAddr(gpuDynInst, addr, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe.
-getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+issueRequest(gpuDynInst);
 }

 void
@@ -4727,12 +4707,7 @@
 calcAddr(gpuDynInst, addr, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe.
-getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+issueRequest(gpuDynInst);
 }

 void
@@ -4785,12 +4760,7 @@
 calcAddr(gpuDynInst, rsrcDesc, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe
-.getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+.issueRequest(gpuDynInst);
 } // execute

 void
@@ -4844,12 +4814,7 @@
 calcAddr(gpuDynInst, rsrcDesc, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe
-.getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefront()->validateRequestCounters();
+.issueRequest(gpuDynInst);
 } // execute

 void
@@ -4903,12 +4868,7 @@
 calcAddr(gpuDynInst, rsrcDesc, offset);

 gpuDynInst->computeUnit()->scalarMemoryPipe
-.getGMReqFIFO().push(gpuDynInst);
-
-wf->scalarRdGmReqsInPipe--;
-wf->scalarOutstandingReqsRdGm++;
-gpuDynInst->wavefront()->outstandingReqs++;
-gpuDynInst->wavefron

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Fix s_endpgm instruction

2021-07-08 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/47519 )


Change subject: arch-vega: Fix s_endpgm instruction
..

arch-vega: Fix s_endpgm instruction

Copy over changes that had been made to s_endpgm in GCN3
but weren't added to the Vega implementation

Change-Id: I1063f83b1ce8f7c5e451c8c227265715c8f725b9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47519
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Reviewed-by: Alex Dutu 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 11 insertions(+), 2 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 47ea892..6e8c854 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -4137,7 +4137,12 @@
 ComputeUnit *cu = gpuDynInst->computeUnit();

 // delete extra instructions fetched for completed work-items
-wf->instructionBuffer.clear();
+wf->instructionBuffer.erase(wf->instructionBuffer.begin() + 1,
+wf->instructionBuffer.end());
+
+if (wf->pendingFetch) {
+wf->dropFetch = true;
+}

 wf->computeUnit->fetchStage.fetchUnit(wf->simdId)
 .flushBuf(wf->wfSlotId);
@@ -4215,8 +4220,11 @@
 bool kernelEnd =
  
wf->computeUnit->shader->dispatcher().isReachingKernelEnd(wf);


+bool relNeeded =
+wf->computeUnit->shader->impl_kern_end_rel;
+
 //if it is not a kernel end, then retire the workgroup directly
-if (!kernelEnd) {
+if (!kernelEnd || !relNeeded) {
 wf->computeUnit->shader->dispatcher().notifyWgCompl(wf);
 wf->setStatus(Wavefront::S_STOPPED);
 wf->computeUnit->completedWGs++;
@@ -4232,6 +4240,7 @@
  * the complex
  */
 setFlag(MemSync);
+setFlag(GlobalSegment);
 // Notify Memory System of Kernel Completion
 // Kernel End = isKernel + isMemSync
 wf->setStatus(Wavefront::S_RETURNING);

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47519
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I1063f83b1ce8f7c5e451c8c227265715c8f725b9
Gerrit-Change-Number: 47519
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add missing return to flat_load_dwordx4

2021-07-08 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47520 )


Change subject: arch-vega: Add missing return to flat_load_dwordx4
..

arch-vega: Add missing return to flat_load_dwordx4

Change-Id: Ibf56c25a3d22d3c12ae2c1bb11f00f4a44b5919a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47520
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Reviewed-by: Alex Dutu 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 6e8c854..cc5a161 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -42984,6 +42984,7 @@
 if (gpuDynInst->exec_mask.none()) {
 wf->decVMemInstsIssued();
 wf->decLGKMInstsIssued();
+return;
 }

 gpuDynInst->execUnitId = wf->execUnitId;
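
The guard this hunk completes can be sketched as follows; without the added `return`, the bookkeeping was undone but execution still fell through to the issue path. This is a simplified stand-in with hypothetical names, not the actual gem5 types.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical stand-in for the per-wavefront issue counters.
struct IssueCounters { int vmem = 0; int lgkm = 0; };

// If no lanes are active, undo the issue bookkeeping and return early so
// the instruction does not fall through to the memory pipeline -- the
// early return is exactly what the patch adds.
bool tryIssue(const std::bitset<64> &execMask, IssueCounters &c)
{
    ++c.vmem;
    ++c.lgkm;
    if (execMask.none()) {
        --c.vmem;   // decVMemInstsIssued()
        --c.lgkm;   // decLGKMInstsIssued()
        return false;
    }
    // ... issue to the memory pipeline ...
    return true;
}
```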

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47520
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ibf56c25a3d22d3c12ae2c1bb11f00f4a44b5919a
Gerrit-Change-Number: 47520
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add decoding for implemented insts

2021-07-08 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47521 )


Change subject: arch-vega: Add decoding for implemented insts
..

arch-vega: Add decoding for implemented insts

Certain instructions were implemented in instructions.cc,
but weren't actually being decoded by the decoder, causing
the decoder to return nullptr for valid instructions.

This patch fixes the decoder to return the proper instruction
class for implemented instructions.
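
The failure mode can be illustrated with a minimal sketch (hypothetical types, not the generated gem5 decoder): a decode entry that returns nullptr even though the instruction class exists, versus the fixed entry that constructs it.

```cpp
#include <cassert>
#include <memory>

// Illustrative stand-ins only, heavily simplified from gem5's decoder.
struct Inst { virtual ~Inst() = default; };
struct InstAddU32 : Inst {};   // stands in for Inst_VOP2__V_ADD_U32

// Before the patch: the entry returned nullptr for a valid instruction.
Inst *decodeAddU32Broken() { return nullptr; }

// After the patch: the entry constructs the implemented class.
Inst *decodeAddU32Fixed() { return new InstAddU32; }
```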

Change-Id: I8d8525a1c435147017cb38d9df8e1675986ef04b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47521
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Reviewed-by: Alex Dutu 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 9 insertions(+), 9 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index 359e125..e4b7922 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -4158,19 +4158,19 @@
 GPUStaticInst*
 Decoder::decode_OP_VOP2__V_ADD_U32(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP2__V_ADD_U32(&iFmt->iFmt_VOP2);
 }

 GPUStaticInst*
 Decoder::decode_OP_VOP2__V_SUB_U32(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP2__V_SUB_U32(&iFmt->iFmt_VOP2);
 }

 GPUStaticInst*
 Decoder::decode_OP_VOP2__V_SUBREV_U32(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP2__V_SUBREV_U32(&iFmt->iFmt_VOP2);
 }

 GPUStaticInst*
@@ -4446,7 +4446,7 @@
 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_MUL_HI_I32(MachInst iFmt)
 {
-return nullptr;
+return new Inst_SOP2__S_MUL_I32(&iFmt->iFmt_SOP2);
 }

 GPUStaticInst*
@@ -6942,31 +6942,31 @@
 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAD_F16(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP3__V_MAD_F16(&iFmt->iFmt_VOP3A);
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAD_U16(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP3__V_MAD_U16(&iFmt->iFmt_VOP3A);
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAD_I16(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP3__V_MAD_I16(&iFmt->iFmt_VOP3A);
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_FMA_F16(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP3__V_FMA_F16(&iFmt->iFmt_VOP3A);
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_DIV_FIXUP_F16(MachInst iFmt)
 {
-return nullptr;
+return new Inst_VOP3__V_DIV_FIXUP_F16(&iFmt->iFmt_VOP3A);
 }

 GPUStaticInst*

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47521
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8d8525a1c435147017cb38d9df8e1675986ef04b
Gerrit-Change-Number: 47521
Gerrit-PatchSet: 2
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Add mmap functionality to GPURenderDriver

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47523 )


Change subject: gpu-compute: Add mmap functionality to GPURenderDriver
..

gpu-compute: Add mmap functionality to GPURenderDriver

dGPUs mmap the GPURenderDriver; however, it doesn't appear that they do
anything with it. This patch implements the mmap function by simply
returning the address provided, without doing anything else.
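
A minimal sketch of the identity mmap described above (illustrative stand-in types, not the gem5 `ThreadContext` signature):

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;

// Sketch of the no-op mmap the render driver implements: the call is
// accepted and the requested start address is simply handed back.
Addr renderDriverMmap(Addr start, uint64_t length)
{
    (void)length;   // ROCm does not appear to use the mapping contents
    return start;
}
```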

Change-Id: Ia010a2aebcf7e2c75e22d93dfb440937d1bef3b1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47523
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/gpu_render_driver.cc
M src/gpu-compute/gpu_render_driver.hh
2 files changed, 14 insertions(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/src/gpu-compute/gpu_render_driver.cc b/src/gpu-compute/gpu_render_driver.cc
index 260a61f..ad75c82 100644
--- a/src/gpu-compute/gpu_render_driver.cc
+++ b/src/gpu-compute/gpu_render_driver.cc
@@ -41,7 +41,7 @@

 /* ROCm 4 utilizes the render driver located at /dev/dri/renderDXXX. This
  * patch implements a very simple driver that just returns a file
- * descriptor when opened, as testing has shown that's all that's needed
+ * descriptor when opened.
  */
 int
 GPURenderDriver::open(ThreadContext *tc, int mode, int flags)
@@ -52,4 +52,14 @@
 return tgt_fd;
 }

+/* DGPUs try to mmap the driver file. It doesn't appear they do anything
+ * with it, so we just return the address that's provided
+ */
+Addr GPURenderDriver::mmap(ThreadContext *tc, Addr start, uint64_t length,
+   int prot, int tgt_flags, int tgt_fd, off_t offset)
+{
+warn_once("GPURenderDriver::mmap returning start address %#x", start);
+return start;
+}
+
 } // namespace gem5
diff --git a/src/gpu-compute/gpu_render_driver.hh b/src/gpu-compute/gpu_render_driver.hh
index f94fdef..ab1ddcf 100644
--- a/src/gpu-compute/gpu_render_driver.hh
+++ b/src/gpu-compute/gpu_render_driver.hh
@@ -50,6 +50,9 @@
 {
 return -EBADF;
 }
+
+Addr mmap(ThreadContext *tc, Addr start, uint64_t length,
+  int prot, int tgt_flags, int tgt_fd, off_t offset) override;
 };

 } // namespace gem5

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47523
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ia010a2aebcf7e2c75e22d93dfb440937d1bef3b1
Gerrit-Change-Number: 47523
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: configs,gpu-compute: Set proper dGPUPoolID defaults

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47527 )


Change subject: configs,gpu-compute: Set proper dGPUPoolID defaults
..

configs,gpu-compute: Set proper dGPUPoolID defaults

In GPU.py, dGPUPoolID is defined as an int, but its default was
False. Explicitly set it to 0 instead.

In apu_se.py, dGPUPoolID was being set to 1, but that was
resulting in crashes. Setting it to 0 avoids those crashes.
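
The original default "worked" because Python's bool is a subclass of int, so False quietly behaves as the integer 0; the explicit 0 simply states that intent. A quick demonstration:

```python
# bool is a subclass of int in Python, which is why Param.Int(False)
# did not fail outright: False behaves exactly like the integer 0.
assert isinstance(False, int)
assert False == 0
assert int(False) == 0
```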

Change-Id: I0f1161588279a335bbd0d8ae7acda97fc23201b5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47527
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M configs/example/apu_se.py
M src/gpu-compute/GPU.py
2 files changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 98a1e19..6f686f3 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -432,9 +432,10 @@
 args.m_type = 6

 # HSA kernel mode driver
+# dGPUPoolID is 0 because we only have one memory pool
 gpu_driver = GPUComputeDriver(filename = "kfd", isdGPU = args.dgpu,
   gfxVersion = args.gfx_version,
-  dGPUPoolID = 1, m_type = args.m_type)
+  dGPUPoolID = 0, m_type = args.m_type)

 renderDriNum = 128
 render_driver = GPURenderDriver(filename = f'dri/renderD{renderDriNum}')
diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index 6b0bb2e..d2f9b6e 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -245,7 +245,7 @@
 device = Param.GPUCommandProcessor('GPU controlled by this driver')
 isdGPU = Param.Bool(False, 'Driver is for a dGPU')
 gfxVersion = Param.GfxVersion('gfx801', 'ISA of gpu to model')
-dGPUPoolID = Param.Int(False, 'Pool ID for dGPU.')
+dGPUPoolID = Param.Int(0, 'Pool ID for dGPU.')
 # Default Mtype for caches
 #-- 1   1   1   C_RW_S  (Cached-ReadWrite-Shared)
 #-- 1   1   0   C_RW_US (Cached-ReadWrite-Unshared)

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47527
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0f1161588279a335bbd0d8ae7acda97fc23201b5
Gerrit-Change-Number: 47527
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: configs: Set valid heap_type values

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47528 )


Change subject: configs: Set valid heap_type values
..

configs: Set valid heap_type values

The variables that were used to set heap_type don't exist.
Explicitly set them to the proper values.

Also add a pointer to what the heap value means in the ROCm stack.
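
A hypothetical helper showing the shape of a mem_banks `properties` file as assembled in hsaTopology.py; the field names match the diff below, but the helper itself is illustrative only.

```python
# Illustrative sketch: build the text of a KFD topology mem_banks
# 'properties' file from explicit values (hypothetical helper, not
# part of hsaTopology.py).
def mem_bank_props(heap_type, size_in_bytes, flags, width):
    return ('heap_type %d\n' % heap_type +
            'size_in_bytes %d\n' % size_in_bytes +
            'flags %d\n' % flags +
            'width %d\n' % width)
```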

Change-Id: I8df7fca7442f6640be1154ef147c4e302ea491bb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47528
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M configs/example/hsaTopology.py
1 file changed, 12 insertions(+), 2 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/configs/example/hsaTopology.py b/configs/example/hsaTopology.py
index 28060cc..a4dbebb 100644
--- a/configs/example/hsaTopology.py
+++ b/configs/example/hsaTopology.py
@@ -140,7 +140,9 @@
 # CPU memory reporting
 mem_dir = joinpath(node_dir, 'mem_banks/0')
 remake_dir(mem_dir)
-mem_prop = 'heap_type %s\n' % HsaHeaptype.HSA_HEAPTYPE_SYSTEM.value + \
+# Heap type value taken from real system, heap type values:
+# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317
+mem_prop = 'heap_type 0\n'   + \
'size_in_bytes 33704329216\n'+ \
'flags 0\n'  + \
'width 72\n' + \
@@ -221,7 +223,9 @@
 # TODO: Extract size, clk, and width from sim paramters
 mem_dir = joinpath(node_dir, 'mem_banks/0')
 remake_dir(mem_dir)
-mem_prop = 'heap_type %s\n' % heap_type.value   + \
+# Heap type value taken from real system, heap type values:
+# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317
+mem_prop = 'heap_type 1\n'  + \
'size_in_bytes 17163091968\n'+ \
'flags 0\n'  + \
'width 2048\n'   + \
@@ -316,6 +320,8 @@
 # CPU memory reporting
 mem_dir = joinpath(node_dir, 'mem_banks/0')
 remake_dir(mem_dir)
+# Heap type value taken from real system, heap type values:
+# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317
 mem_prop = 'heap_type 0\n'  + \
'size_in_bytes 33704329216\n'+ \
'flags 0\n'  + \
@@ -394,6 +400,8 @@
 # TODO: Extract size, clk, and width from sim paramters
 mem_dir = joinpath(node_dir, 'mem_banks/0')
 remake_dir(mem_dir)
+# Heap type value taken from real system, heap type values:
+# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317
 mem_prop = 'heap_type 1\n'  + \
'size_in_bytes 4294967296\n' + \
'flags 0\n'  + \
@@ -471,6 +479,8 @@
 mem_dir = joinpath(node_dir, f'mem_banks/{i}')
 remake_dir(mem_dir)

+# Heap type value taken from real system, heap type values:
+# https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-4.0.x/include/hsakmttypes.h#L317
 mem_prop = f'heap_type 0\n' + \
 f'size_in_bytes {toMemorySize(options.mem_size)}'+ \
 f'flags 0\n' + \


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47528
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8df7fca7442f6640be1154ef147c4e302ea491bb
Gerrit-Change-Number: 47528
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: configs: Don't report CPU cores on Fiji properties

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47525 )


Change subject: configs: Don't report CPU cores on Fiji properties
..

configs: Don't report CPU cores on Fiji properties

ROCm determines if a device is a dGPU in two ways. The first
is by looking at the device ID. The second is through a flag that
gets set only if the reported cpu_cores_count is 0.

If these don't agree, ROCm breaks when doing memory operations.

Previously, cpu_cores_count was non-zero on the Fiji config.
This patch sets it to 0 to appease ROCm.
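
The consistency requirement described above can be sketched as follows; the device-ID set and function names are hypothetical, not ROCm APIs.

```python
# Illustrative sketch: ROCm decides a node is a dGPU both from its device
# ID and from cpu_cores_count == 0. Memory operations break when the two
# views disagree. 0x7300 below is a hypothetical placeholder ID.
DGPU_DEVICE_IDS = {0x7300}

def rocm_views_agree(device_id, cpu_cores_count):
    by_id = device_id in DGPU_DEVICE_IDS
    by_cores = (cpu_cores_count == 0)
    return by_id == by_cores
```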

Change-Id: I0fd0ce724f491ed6a4598188b3799468668585f4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47525
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matthew Poremba 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M configs/example/hsaTopology.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/configs/example/hsaTopology.py b/configs/example/hsaTopology.py
index 78193e0..28060cc 100644
--- a/configs/example/hsaTopology.py
+++ b/configs/example/hsaTopology.py
@@ -359,7 +359,7 @@
 file_append((io_dir, 'properties'), io_prop)

 # Populate GPU node properties
-node_prop = 'cpu_cores_count %s\n' % options.num_cpus + \
+node_prop = 'cpu_cores_count 0\n' + \
 'simd_count %s\n' \
 % (options.num_compute_units * options.simds_per_cu) + \
 'mem_banks_count 1\n' + \




2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47525
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0fd0ce724f491ed6a4598188b3799468668585f4
Gerrit-Change-Number: 47525
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: arch-x86: Ignore mbind syscall

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47526 )


Change subject: arch-x86: Ignore mbind syscall
..

arch-x86: Ignore mbind syscall

mbind gets called when running with a dGPU in ROCm 4,
but we are able to ignore it without breaking anything.

Change-Id: I7c1ba47656122a5eb856981dca2a05359098e3b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47526
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/x86/linux/syscall_tbl64.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/x86/linux/syscall_tbl64.cc b/src/arch/x86/linux/syscall_tbl64.cc
index e1eed18..5c983c0 100644
--- a/src/arch/x86/linux/syscall_tbl64.cc
+++ b/src/arch/x86/linux/syscall_tbl64.cc
@@ -284,7 +284,7 @@
 { 234, "tgkill", tgkillFunc },
 { 235, "utimes" },
 { 236, "vserver" },
-{ 237, "mbind" },
+{ 237, "mbind", ignoreFunc },
 { 238, "set_mempolicy" },
 { 239, "get_mempolicy", ignoreFunc },
 { 240, "mq_open" },



1 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47526
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I7c1ba47656122a5eb856981dca2a05359098e3b2
Gerrit-Change-Number: 47526
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Gabe Black 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: configs: Add shared_cpu_list to cache directories

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47524 )


Change subject: configs: Add shared_cpu_list to cache directories
..

configs: Add shared_cpu_list to cache directories

The ROCm thunk uses this file instead of the
shared_cpu_map file.
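
For illustration, the two sysfs encodings side by side (a sketch only; real kernels also compress shared_cpu_list into ranges such as '0-2', which this does not):

```python
# shared_cpu_map is a hex bitmask of CPU IDs; shared_cpu_list is a
# comma-separated list of the same CPUs. Both describe which CPUs
# share a cache.
def shared_cpu_map(cpus):
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return hex(mask)

def shared_cpu_list(cpus):
    return ','.join(str(c) for c in cpus)
```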

Change-Id: I985512245c9f51106b8347412ed643f78b567b24
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47524
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
Maintainer: Matt Sinclair 
---
M configs/common/FileSystemConfig.py
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/configs/common/FileSystemConfig.py b/configs/common/FileSystemConfig.py
index 0d9f221..66a6315 100644
--- a/configs/common/FileSystemConfig.py
+++ b/configs/common/FileSystemConfig.py
@@ -217,6 +217,8 @@
 file_append((indexdir, 'number_of_sets'), num_sets)
 file_append((indexdir, 'physical_line_partition'), '1')
 file_append((indexdir, 'shared_cpu_map'), hex_mask(cpus))
+file_append((indexdir, 'shared_cpu_list'),
+','.join(str(cpu) for cpu in cpus))

 def _redirect_paths(options):
 # Redirect filesystem syscalls from src to the first matching dests



2 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/47524
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I985512245c9f51106b8347412ed643f78b567b24
Gerrit-Change-Number: 47524
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Update GET_PROCESS_APERTURES IOCTLs

2021-07-09 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47529 )


Change subject: gpu-compute: Update GET_PROCESS_APERTURES IOCTLs
..

gpu-compute: Update GET_PROCESS_APERTURES IOCTLs

The apertures for non-gfx801 GPUs are set differently.
If the apertures aren't set properly, ROCm will error out.

This change sets the apertures appropriately based on the
gfx version of the simulated GPU. It also adds new
functions to set the scratch and LDS apertures in GFX9 to mimic
the Linux kernel.
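
The structure of the patched selection can be sketched as below. The constants and helpers are placeholders only -- the real values come from the Linux kernel (SVM_USE_BASE, AMDGPU_GMC_HOLE_START) and gem5's per-process aperture helpers.

```cpp
#include <cassert>
#include <cstdint>

enum class GfxVersion { gfx801, gfx803, gfx900 };

// Placeholder constants and helpers, illustrative only.
constexpr uint64_t kSvmUseBase = 0x1000;                       // placeholder
uint64_t gpuVmApeBase(int id) { return uint64_t(id) << 32; }   // placeholder

// gfx801 keeps the per-process aperture base; gfx803/gfx900 use a fixed
// SVM base, mirroring the switch added by the patch.
uint64_t gpuvmBase(GfxVersion v, int id)
{
    switch (v) {
      case GfxVersion::gfx801:
        return gpuVmApeBase(id + 1);
      case GfxVersion::gfx803:
      case GfxVersion::gfx900:
        return kSvmUseBase;
    }
    return 0;  // unreachable for valid enum values
}
```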

Change-Id: I1fa6f60bc20c7b6eb3896057841d96846460a9f8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47529
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/gpu_compute_driver.cc
M src/gpu-compute/gpu_compute_driver.hh
2 files changed, 88 insertions(+), 22 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved

  kokoro: Regressions pass



diff --git a/src/gpu-compute/gpu_compute_driver.cc b/src/gpu-compute/gpu_compute_driver.cc
index 472ced4..2fe5275 100644
--- a/src/gpu-compute/gpu_compute_driver.cc
+++ b/src/gpu-compute/gpu_compute_driver.cc
@@ -316,18 +316,50 @@
  * ensure that the base/limit addresses are
  * calculated correctly.
  */
-args->process_apertures[i].scratch_base
-= scratchApeBase(i + 1);
+
+switch (gfxVersion) {
+  case GfxVersion::gfx801:
+  case GfxVersion::gfx803:
+args->process_apertures[i].scratch_base =
+scratchApeBase(i + 1);
+args->process_apertures[i].lds_base =
+ldsApeBase(i + 1);
+break;
+  case GfxVersion::gfx900:
+args->process_apertures[i].scratch_base =
+scratchApeBaseV9();
+args->process_apertures[i].lds_base =
+ldsApeBaseV9();
+break;
+  default:
+fatal("Invalid gfx version\n");
+}
+
+// GFX8 and GFX9 set lds and scratch limits the same way
 args->process_apertures[i].scratch_limit =
     scratchApeLimit(args->process_apertures[i].scratch_base);

-args->process_apertures[i].lds_base = ldsApeBase(i + 1);
 args->process_apertures[i].lds_limit =
 ldsApeLimit(args->process_apertures[i].lds_base);

-args->process_apertures[i].gpuvm_base = gpuVmApeBase(i + 1);
-args->process_apertures[i].gpuvm_limit =
-gpuVmApeLimit(args->process_apertures[i].gpuvm_base);
+switch (gfxVersion) {
+  case GfxVersion::gfx801:
+args->process_apertures[i].gpuvm_base =
+gpuVmApeBase(i + 1);
+args->process_apertures[i].gpuvm_limit =
+gpuVmApeLimit(args->process_apertures[i].gpuvm_base);
+break;
+  case GfxVersion::gfx803:
+  case GfxVersion::gfx900:
+// Taken from SVM_USE_BASE in Linux kernel
+args->process_apertures[i].gpuvm_base = 0x100ull;
+// Taken from AMDGPU_GMC_HOLE_START in Linux kernel
+args->process_apertures[i].gpuvm_limit =
+0x8000ULL - 1;
+break;
+  default:
+fatal("Invalid gfx version");
+}

 // NOTE: Must match ID populated by hsaTopology.py
 //
@@ -396,14 +428,6 @@
47) != 0x1);
 assert(bits(args->process_apertures[i].lds_limit, 63,
47) != 0);
-assert(bits(args->process_apertures[i].gpuvm_base, 63,
-   47) != 0x1);
-assert(bits(args->process_apertures[i].gpuvm_base, 63,
-   47) != 0);
-assert(bits(args->process_apertures[i].gpuvm_limit, 63,
-   47) != 0x1);
-assert(bits(args->process_apertures[i].gpuvm_limit, 63,
-   47) != 0);
 }

 args.copyOut(virt_proxy);
@@ -593,13 +617,41 @@
 TypedBufferArg ape_args
 (ioc_args->kfd_process_device_apertures_ptr);

-ape_args->scratch_base = scratchApeBase(i + 1);
+switch (gfxVersion) {
+  case GfxVersion::gfx801:
+  case GfxVersion::gfx803:
+   

[gem5-dev] Change in gem5/gem5[develop]: arch-vega: Add fatal when decoding missing insts

2021-07-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/47522 )


Change subject: arch-vega: Add fatal when decoding missing insts
..

arch-vega: Add fatal when decoding missing insts

Certain instructions don't have implementations in instructions.cc,
and get decoded as a nullptr.

This adds a fatal when decoding a missing instruction, since we aren't
able to properly run a program unless all of its instructions are
implemented, and it allows us to figure out which instruction is
missing, since fatals print the line from which they were called.
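
The idea can be sketched with a stand-in for gem5's fatal() (here a macro throwing an exception carrying `__FILE__`/`__LINE__`, purely illustrative):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for gem5's fatal(): fail loudly with the
// file/line of the decode stub, which is what makes the missing
// instruction identifiable in the output.
#define FATAL(msg) \
    throw std::runtime_error(std::string(__FILE__) + ":" + \
                             std::to_string(__LINE__) + ": " + (msg))

struct GPUStaticInst;  // opaque, mirrors the decoder's return type

GPUStaticInst *decodeMissing()
{
    FATAL("Trying to decode instruction without a class");
    return nullptr;  // unreachable; mirrors the generated stub's shape
}
```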

Change-Id: I7e3690f079b790dceee102063773d5fbbc8619f1
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/47522
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 229 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index e4b7922..3054d1a 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -4440,6 +4440,7 @@
 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_MUL_HI_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

@@ -4452,42 +4453,49 @@
 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_LSHL1_ADD_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_LSHL2_ADD_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_LSHL3_ADD_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_LSHL4_ADD_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_PACK_LL_B32_B16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_PACK_LH_B32_B16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OP_SOP2__S_HH_B32_B16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

@@ -4614,6 +4622,7 @@
 GPUStaticInst*
 Decoder::decode_OP_SOPK__S_CALL_B64(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

@@ -6834,108 +6843,126 @@
 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAD_U32_U16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAD_I32_I16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_XAD_U32(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MIN3_F16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MIN3_I16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MIN3_U16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAX3_F16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAX3_I16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MAX3_U16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MED3_F16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
 return nullptr;
 }

 GPUStaticInst*
 Decoder::decode_OPU_VOP3__V_MED3_I16(MachInst iFmt)
 {
+fatal("Trying to decode instruction without a class\n");
   

[gem5-dev] Change in gem5/gem5[develop]: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 i...

2021-07-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/48022 )



Change subject: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop

..

Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into develop


Change-Id: I479b6de37af0de2e92227761794730d37157c803
---
1 file changed, 0 insertions(+), 0 deletions(-)




--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48022
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I479b6de37af0de2e92227761794730d37157c803
Gerrit-Change-Number: 48022
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: fix typo in compute driver comments

2021-07-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48023 )



Change subject: gpu-compute: fix typo in compute driver comments
..

gpu-compute: fix typo in compute driver comments

Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5
---
M src/gpu-compute/gpu_compute_driver.cc
1 file changed, 1 insertion(+), 1 deletion(-)



diff --git a/src/gpu-compute/gpu_compute_driver.cc  
b/src/gpu-compute/gpu_compute_driver.cc

index 92ac641..4389f31 100644
--- a/src/gpu-compute/gpu_compute_driver.cc
+++ b/src/gpu-compute/gpu_compute_driver.cc
@@ -831,7 +831,7 @@
 // of the region.
 //
 // This is a simplified version of regular system VMAs, but for
-// GPUVM space (non of the clobber/remap nonsense we find in real
+// GPUVM space (none of the clobber/remap nonsense we find in real

 // OS managed memory).
 allocateGpuVma(mtype, args->va_addr, args->size);


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48023


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5
Gerrit-Change-Number: 48023
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[develop]: Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 i...

2021-07-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48021 )



Change subject: Merge branch 'develop' of  
https://gem5.googlesource.com/public/gem5 into develop

..

Merge branch 'develop' of https://gem5.googlesource.com/public/gem5 into  
develop


Change-Id: I884540d26228cddb739e93eb03541e1deffc4390
---
1 file changed, 0 insertions(+), 0 deletions(-)




--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48021


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I884540d26228cddb739e93eb03541e1deffc4390
Gerrit-Change-Number: 48021
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: mem-ruby: Account for misaligned accesses in GPUCoalescer

2021-07-24 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48341 )


Change subject: mem-ruby: Account for misaligned accesses in GPUCoalescer
..

mem-ruby: Account for misaligned accesses in GPUCoalescer

Previously, we assumed that the maximum number of requests that would be
issued by an instruction was equal to the number of threads that were
active for that instruction.

However, if a thread has an access that crosses a cache line, that
thread has a misaligned access, and needs to request both cache lines.

This patch takes that into account by checking the status vector for
each thread in that instruction to determine the number of requests.

Change-Id: I1994962c46d504b48654dbd22bcd786c9f382fd9
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48341
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
---
M src/mem/ruby/system/GPUCoalescer.cc
1 file changed, 4 insertions(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/mem/ruby/system/GPUCoalescer.cc  
b/src/mem/ruby/system/GPUCoalescer.cc

index c00e7c0..2390ba6 100644
--- a/src/mem/ruby/system/GPUCoalescer.cc
+++ b/src/mem/ruby/system/GPUCoalescer.cc
@@ -645,7 +645,10 @@
 // of the exec_mask.
 int num_packets = 1;
 if (!m_usingRubyTester) {
-num_packets = getDynInst(pkt)->exec_mask.count();
+num_packets = 0;
+for (int i = 0; i < TheGpuISA::NumVecElemPerVecReg; i++) {
+num_packets += getDynInst(pkt)->getLaneStatus(i);
+}
 }

 // the pkt is temporarily stored in the uncoalesced table until



1 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48341


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-1
Gerrit-Change-Id: I1994962c46d504b48654dbd22bcd786c9f382fd9
Gerrit-Change-Number: 48341
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Implement large ds_read/write instructions

2021-07-24 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48342 )


Change subject: arch-gcn3: Implement large ds_read/write instructions
..

arch-gcn3: Implement large ds_read/write instructions

This implements the 96 and 128b ds_read/write instructions in a similar
fashion to the 3 and 4 dword flat_load/store instructions.

These instructions are treated as reads/writes of 3 or 4 dwords, instead
of as a single 96b/128b memory transaction, due to the limitations of
the VecOperand class used in the amdgpu code.

In order to handle treating the memory transaction as multiple dwords,
the patch also adds in new initMemRead/initMemWrite functions for ds
instructions. These are similar to the functions used in flat
instructions for the same purpose.

Change-Id: I0f2ba3cb7cf040abb876e6eae55a6d38149ee960
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48342
Tested-by: kokoro 
Reviewed-by: Alex Dutu 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/arch/amdgpu/gcn3/insts/instructions.hh
M src/arch/amdgpu/gcn3/insts/op_encodings.hh
3 files changed, 232 insertions(+), 4 deletions(-)

Approvals:
  Alex Dutu: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc  
b/src/arch/amdgpu/gcn3/insts/instructions.cc

index 21ab58d..79af7ac 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -34335,9 +34335,52 @@
 void
 Inst_DS__DS_WRITE_B96::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data0(gpuDynInst, extData.DATA0);
+ConstVecOperandU32 data1(gpuDynInst, extData.DATA0 + 1);
+ConstVecOperandU32 data2(gpuDynInst, extData.DATA0 + 2);
+
+addr.read();
+data0.read();
+data1.read();
+data2.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4] = data0[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4 + 1] = data1[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4 + 2] = data2[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);

 }

+void
+Inst_DS__DS_WRITE_B96::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initMemWrite<3>(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_WRITE_B96::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
+
 Inst_DS__DS_WRITE_B128::Inst_DS__DS_WRITE_B128(InFmt_DS *iFmt)
 : Inst_DS(iFmt, "ds_write_b128")
 {
@@ -34354,9 +34397,56 @@
 void
 Inst_DS__DS_WRITE_B128::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data0(gpuDynInst, extData.DATA0);
+ConstVecOperandU32 data1(gpuDynInst, extData.DATA0 + 1);
+ConstVecOperandU32 data2(gpuDynInst, extData.DATA0 + 2);
+ConstVecOperandU32 data3(gpuDynInst, extData.DATA0 + 3);
+
+addr.read();
+data0.read();
+data1.read();
+data2.read();
+data3.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4] = data0[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4 + 1] = data1[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4 + 2] = data2[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 4 + 3] = data3[lane];
+}
+}
+
+ 
gp

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: gpu-compute: Fix TLB coalescer starvation

2021-07-24 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48340 )


Change subject: gpu-compute: Fix TLB coalescer starvation
..

gpu-compute: Fix TLB coalescer starvation

Currently, we are storing coalesced accesses in
an std::unordered_map indexed by a tick index, i.e.
issue tick / coalescing window. If there are
multiple coalesced requests, at different tick
indexes, to the same virtual address, then the
TLB coalescer will issue just the first one.

However, std::unordered_map is not a sorted
container, and we issue coalesced requests by iterating
through that container. This means the coalesced
request sent in TLBCoalescer::processProbeTLBEvent is
not necessarily the oldest one. Because of this, in
cases of high contention the oldest coalesced request
will have a huge TLB access latency.

To fix this issue, we will use a std::map, which is
a sorted container and therefore guarantees the
oldest coalesced request will be sent first.

Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48340
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/tlb_coalescer.hh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/gpu-compute/tlb_coalescer.hh  
b/src/gpu-compute/tlb_coalescer.hh

index b97801b..fce8740 100644
--- a/src/gpu-compute/tlb_coalescer.hh
+++ b/src/gpu-compute/tlb_coalescer.hh
@@ -100,7 +100,7 @@
  * option is to change it to curTick(), so we coalesce based
  * on the receive time.
  */
-typedef std::unordered_map>
+typedef std::map>
 CoalescingFIFO;

 CoalescingFIFO coalescerFIFO;

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48340


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-1
Gerrit-Change-Id: I9c7ab32c038d5e60f6b55236266a27b0cae8bfb0
Gerrit-Change-Number: 48340
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Implement LDS accesses in Flat instructions

2021-07-26 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48343 )


Change subject: arch-gcn3: Implement LDS accesses in Flat instructions
..

arch-gcn3: Implement LDS accesses in Flat instructions

Add support for LDS accesses by allowing Flat instructions to dispatch
into the local memory pipeline if the requested address is in the group
aperture.

This requires implementing LDS accesses in the Flat initMemRead/Write
functions, in a similar fashion to the DS functions of the same name.

Because we now can potentially dispatch to the local memory pipeline,
this change also adds a check to regain any tokens we requested as a
flat instruction.

Change-Id: Id26191f7ee43291a5e5ca5f39af06af981ec23ab
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48343
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/arch/amdgpu/gcn3/insts/op_encodings.hh
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/local_memory_pipeline.cc
4 files changed, 184 insertions(+), 32 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc  
b/src/arch/amdgpu/gcn3/insts/instructions.cc

index 79af7ac..65d008b 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -36314,7 +36314,7 @@
 gpuDynInst->computeUnit()->globalMemoryPipe.
 issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }

@@ -36363,7 +36363,7 @@
 gpuDynInst->computeUnit()->globalMemoryPipe.
 issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }
 void
@@ -39384,8 +39384,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
 gpuDynInst->computeUnit()->globalMemoryPipe
 .issueRequest(gpuDynInst);
+} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
+gpuDynInst->computeUnit()->localMemoryPipe
+.issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 } // execute

@@ -39448,8 +39451,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
 gpuDynInst->computeUnit()->globalMemoryPipe
 .issueRequest(gpuDynInst);
+} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
+gpuDynInst->computeUnit()->localMemoryPipe
+.issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }

@@ -39511,8 +39517,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
 gpuDynInst->computeUnit()->globalMemoryPipe
 .issueRequest(gpuDynInst);
+} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
+gpuDynInst->computeUnit()->localMemoryPipe
+.issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }

@@ -39603,8 +39612,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
 gpuDynInst->computeUnit()->globalMemoryPipe
 .issueRequest(gpuDynInst);
+} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
+gpuDynInst->computeUnit()->localMemoryPipe
+.issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }

@@ -39667,8 +39679,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
 gpuDynInst->computeUnit()->globalMemoryPipe
 .issueRequest(gpuDynInst);
+} else if (gpuDynInst->executedAs() == enums::SC_GROUP) {
+gpuDynInst->computeUnit()->localMemoryPipe
+.issueRequest(gpuDynInst);
 } else {
-fatal("Non global flat instructions not implemented yet.\n");
+fatal("Unsupported scope for flat instruction.\n");
 }
 }

@@ -39731,8 +39746,11 @@
 if (gpuDynInst->executedAs() == enums::SC_GLOBAL) {
  

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: arch-gcn3: Validate if scalar sources are scalar gprs

2021-07-26 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48344 )


Change subject: arch-gcn3: Validate if scalar sources are scalar gprs
..

arch-gcn3: Validate if scalar sources are scalar gprs

Scalar sources can either be a general-purpose register or a constant
register that holds a single value.

If we don't check for if the register is a general-purpose register,
it's possible that we get a constant register, which then causes all of
the register mapping code to break, as the constant registers aren't
supposed to be mapped like the general-purpose registers are.

This fix adds an isScalarReg check to the instruction encodings that
were missing it.

Change-Id: I3d7d5393aa324737301c3269cc227b60e8a159e4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48344
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Reviewed-by: Bobby R. Bruce 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/gcn3/insts/op_encodings.cc
1 file changed, 6 insertions(+), 6 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/amdgpu/gcn3/insts/op_encodings.cc  
b/src/arch/amdgpu/gcn3/insts/op_encodings.cc

index cbbb767..cf20a2e 100644
--- a/src/arch/amdgpu/gcn3/insts/op_encodings.cc
+++ b/src/arch/amdgpu/gcn3/insts/op_encodings.cc
@@ -1277,12 +1277,12 @@

 reg = extData.SRSRC;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;

 reg = extData.SOFFSET;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;
 }

@@ -1368,12 +1368,12 @@

 reg = extData.SRSRC;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;

 reg = extData.SOFFSET;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;

 // extData.VDATA moves in the reg list depending on the instruction
@@ -1441,13 +1441,13 @@

 reg = extData.SRSRC;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;

 if (getNumOperands() == 4) {
 reg = extData.SSAMP;
 srcOps.emplace_back(reg, getOperandSize(opNum), true,
-  true, false, false);
+  isScalarReg(reg), false, false);
 opNum++;
 }




1 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48344


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-1
Gerrit-Change-Id: I3d7d5393aa324737301c3269cc227b60e8a159e4
Gerrit-Change-Number: 48344
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: sim-se: Fix execve syscall

2021-07-26 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48345 )


Change subject: sim-se: Fix execve syscall
..

sim-se: Fix execve syscall

There were three things preventing execve from working

Firstly, the entrypoint for the new program wasn't correct. This was
fixed by calling Process::init, which adds a bias to the entrypoint.

Secondly, the uname string wasn't being copied over. This meant when the
new executable tried to run, it would think the kernel was too old to
run on, and would error out. This was fixed by copying over the uname
string (the `release` string in Process) when creating the new process.

Additionally, this patch also ensures we copy over the uname string in
the clone implementation, as otherwise a cloned thread that called
execve would crash.

Finally, we choose to not delete the new ProcessParams or the old
Process. This is done both because it matches what is done in cloneFunc,
but also because deleting the old process results in a segfault later
on.

Change-Id: I4ca201da689e9e37671b4cb477dc76fa12eecf69
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48345
Reviewed-by: Matt Sinclair 
Reviewed-by: Bobby R. Bruce 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/sim/syscall_emul.hh
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/sim/syscall_emul.hh b/src/sim/syscall_emul.hh
index aa02fd6..09be700 100644
--- a/src/sim/syscall_emul.hh
+++ b/src/sim/syscall_emul.hh
@@ -1452,6 +1452,7 @@
 pp->euid = p->euid();
 pp->gid = p->gid();
 pp->egid = p->egid();
+pp->release = p->release;

 /* Find the first free PID that's less than the maximum */
 std::set const& pids = p->system->PIDs;
@@ -2017,6 +2018,7 @@
 pp->errout.assign("cerr");
 pp->cwd.assign(p->tgtCwd);
 pp->system = p->system;
+pp->release = p->release;
 /**
  * Prevent process object creation with identical PIDs (which will trip
  * a fatal check in Process constructor). The execve call is supposed  
to

@@ -2027,7 +2029,9 @@
  */
 p->system->PIDs.erase(p->pid());
 Process *new_p = pp->create();
-delete pp;
+// TODO: there is no way to know when the Process SimObject is done  
with

+// the params pointer. Both the params pointer (pp) and the process
+// pointer (p) are normally managed in python and are never cleaned up.

 /**
  * Work through the file descriptor array and close any files marked
@@ -2042,10 +2046,10 @@

 *new_p->sigchld = true;

-delete p;
 tc->clearArchRegs();
 tc->setProcessPtr(new_p);
 new_p->assignThreadContext(tc->contextId());
+new_p->init();
 new_p->initState();
 tc->activate();
 TheISA::PCState pcState = tc->pcState();

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48345


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-1
Gerrit-Change-Id: I4ca201da689e9e37671b4cb477dc76fa12eecf69
Gerrit-Change-Number: 48345
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[release-staging-v21-1]: sim-se: Properly handle a clone with the VFORK flag

2021-07-26 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48346 )


Change subject: sim-se: Properly handle a clone with the VFORK flag
..

sim-se: Properly handle a clone with the VFORK flag

When clone is called with the VFORK flag, the calling process is
suspended until the child process either exits, or calls execve.

This patch adds in a new variable to Process, which is used to store the
context of the calling process if this process is created through a
clone with VFORK set.

This patch also adds the required support in clone to suspend the
calling thread, and in exitImpl and execveFunc to wake up the calling
thread when the child thread calls either of those functions.

Change-Id: I85af67544ea1d5df7102dcff1331b5a6f6f4fa7c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48346
Tested-by: kokoro 
Reviewed-by: Bobby R. Bruce 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/sim/process.cc
M src/sim/process.hh
M src/sim/syscall_emul.cc
M src/sim/syscall_emul.hh
4 files changed, 34 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, but someone else must approve; Looks  
good to me, approved

  Bobby R. Bruce: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/sim/process.cc b/src/sim/process.cc
index 207c275..272fc9f 100644
--- a/src/sim/process.cc
+++ b/src/sim/process.cc
@@ -175,6 +175,9 @@
 #ifndef CLONE_THREAD
 #define CLONE_THREAD 0
 #endif
+#ifndef CLONE_VFORK
+#define CLONE_VFORK 0
+#endif
 if (CLONE_VM & flags) {
 /**
  * Share the process memory address space between the new process
@@ -249,6 +252,10 @@
 np->exitGroup = exitGroup;
 }

+if (CLONE_VFORK & flags) {
+np->vforkContexts.push_back(otc->contextId());
+}
+
 np->argv.insert(np->argv.end(), argv.begin(), argv.end());
 np->envp.insert(np->envp.end(), envp.begin(), envp.end());
 }
diff --git a/src/sim/process.hh b/src/sim/process.hh
index 632ba90..34768a0 100644
--- a/src/sim/process.hh
+++ b/src/sim/process.hh
@@ -284,6 +284,9 @@
 // Process was forked with SIGCHLD set.
 bool *sigchld;

+// Contexts to wake up when this thread exits or calls execve
+std::vector vforkContexts;
+
 // Track how many system calls are executed
 statistics::Scalar numSyscalls;
 };
diff --git a/src/sim/syscall_emul.cc b/src/sim/syscall_emul.cc
index 147cb39..713bec4 100644
--- a/src/sim/syscall_emul.cc
+++ b/src/sim/syscall_emul.cc
@@ -193,6 +193,16 @@
 }
 }

+/**
+ * If we were a thread created by a clone with vfork set, wake up
+ * the thread that created us
+ */
+if (!p->vforkContexts.empty()) {
+ThreadContext *vtc = sys->threads[p->vforkContexts.front()];
+assert(vtc->status() == ThreadContext::Suspended);
+vtc->activate();
+}
+
 tc->halt();

 /**
diff --git a/src/sim/syscall_emul.hh b/src/sim/syscall_emul.hh
index 09be700..8695638 100644
--- a/src/sim/syscall_emul.hh
+++ b/src/sim/syscall_emul.hh
@@ -1521,6 +1521,10 @@
 ctc->pcState(cpc);
 ctc->activate();

+if (flags & OS::TGT_CLONE_VFORK) {
+tc->suspend();
+}
+
 return cp->pid();
 }

@@ -1998,6 +2002,16 @@
 };

 /**
+ * If we were a thread created by a clone with vfork set, wake up
+ * the thread that created us
+ */
+if (!p->vforkContexts.empty()) {
+ThreadContext *vtc = p->system->threads[p->vforkContexts.front()];
+assert(vtc->status() == ThreadContext::Suspended);
+vtc->activate();
+}
+
+/**
  * Note that ProcessParams is generated by swig and there are no other
  * examples of how to create anything but this default constructor. The
  * fields are manually initialized instead of passing parameters to the

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48346


Gerrit-Project: public/gem5
Gerrit-Branch: release-staging-v21-1
Gerrit-Change-Id: I85af67544ea1d5df7102dcff1331b5a6f6f4fa7c
Gerrit-Change-Number: 48346
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Bobby R. Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: fix typo in compute driver comments

2021-07-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/48023 )


Change subject: gpu-compute: fix typo in compute driver comments
..

gpu-compute: fix typo in compute driver comments

Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48023
Maintainer: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Tested-by: kokoro 
---
M src/gpu-compute/gpu_compute_driver.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/gpu-compute/gpu_compute_driver.cc  
b/src/gpu-compute/gpu_compute_driver.cc

index d51b4c3..52a437a 100644
--- a/src/gpu-compute/gpu_compute_driver.cc
+++ b/src/gpu-compute/gpu_compute_driver.cc
@@ -836,7 +836,7 @@
 // of the region.
 //
 // This is a simplified version of regular system VMAs, but for
-// GPUVM space (non of the clobber/remap nonsense we find in real
+// GPUVM space (none of the clobber/remap nonsense we find in real

 // OS managed memory).
 allocateGpuVma(mtype, args->va_addr, args->size);




1 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the  
submitted one.

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/48023


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I550c6c81ffb2ee9143a2676f93385a8b90c4ddd5
Gerrit-Change-Number: 48023
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Alex Dutu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: util: add dev-hsa commit message tag

2020-09-03 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/34095 )



Change subject: util: add dev-hsa commit message tag
..

util: add dev-hsa commit message tag

The dev-hsa commit message tag was originally an option, but
appears to have been removed during the merge of the AMD GCN3
staging branch.  This commit adds it back.

Change-Id: Ie755b5ebe6ca1e5e92583b1588fd7aaeddcb5b00
---
M util/git-commit-msg.py
1 file changed, 4 insertions(+), 4 deletions(-)



diff --git a/util/git-commit-msg.py b/util/git-commit-msg.py
index d33b5b0..9cba896 100755
--- a/util/git-commit-msg.py
+++ b/util/git-commit-msg.py
@@ -91,10 +91,10 @@
 valid_tags = ["arch", "arch-arm", "arch-gcn3",
 "arch-mips", "arch-power", "arch-riscv", "arch-sparc", "arch-x86",
 "base", "configs", "cpu", "cpu-kvm", "cpu-minor", "cpu-o3",
-"cpu-simple", "dev", "dev-arm", "dev-virtio", "ext", "fastmodel",
-"gpu-compute", "learning-gem5", "mem", "mem-cache", "mem-garnet",
-"mem-ruby", "misc", "python", "scons", "sim", "sim-se", "sim-power",
-"stats", "system", "system-arm", "systemc", "tests",
+"cpu-simple", "dev", "dev-arm", "dev-hsa", "dev-virtio", "ext",
+"fastmodel", "gpu-compute", "learning-gem5", "mem", "mem-cache",
+"mem-garnet", "mem-ruby", "misc", "python", "scons", "sim", "sim-se",
+"sim-power", "stats", "system", "system-arm", "systemc", "tests",
 "util", "RFC", "WIP"]

 tags = ''.join(commit_header.split(':')[0].split()).split(',')

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/34095
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ie755b5ebe6ca1e5e92583b1588fd7aaeddcb5b00
Gerrit-Change-Number: 34095
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: util: add dev-hsa commit message tag

2020-09-04 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/34095 )


Change subject: util: add dev-hsa commit message tag
..

util: add dev-hsa commit message tag

The dev-hsa commit message tag was originally an option, but
appears to have been removed during the merge of the AMD GCN3
staging branch.  This commit adds it back.

Change-Id: Ie755b5ebe6ca1e5e92583b1588fd7aaeddcb5b00
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/34095
Reviewed-by: Jason Lowe-Power 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M util/git-commit-msg.py
1 file changed, 4 insertions(+), 4 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/util/git-commit-msg.py b/util/git-commit-msg.py
index d33b5b0..9cba896 100755
--- a/util/git-commit-msg.py
+++ b/util/git-commit-msg.py
@@ -91,10 +91,10 @@
 valid_tags = ["arch", "arch-arm", "arch-gcn3",
 "arch-mips", "arch-power", "arch-riscv", "arch-sparc", "arch-x86",
 "base", "configs", "cpu", "cpu-kvm", "cpu-minor", "cpu-o3",
-"cpu-simple", "dev", "dev-arm", "dev-virtio", "ext", "fastmodel",
-"gpu-compute", "learning-gem5", "mem", "mem-cache", "mem-garnet",
-"mem-ruby", "misc", "python", "scons", "sim", "sim-se", "sim-power",
-"stats", "system", "system-arm", "systemc", "tests",
+"cpu-simple", "dev", "dev-arm", "dev-hsa", "dev-virtio", "ext",
+"fastmodel", "gpu-compute", "learning-gem5", "mem", "mem-cache",
+"mem-garnet", "mem-ruby", "misc", "python", "scons", "sim", "sim-se",
+"sim-power", "stats", "system", "system-arm", "systemc", "tests",
 "util", "RFC", "WIP"]

 tags = ''.join(commit_header.split(':')[0].split()).split(',')

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/34095
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ie755b5ebe6ca1e5e92583b1588fd7aaeddcb5b00
Gerrit-Change-Number: 34095
Gerrit-PatchSet: 2
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Alexandru Duțu 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Anthony Gutierrez 
Gerrit-CC: Bradford Beckmann 
Gerrit-CC: Kyle Roarty 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Number of TLBs equal to number of CUs

2020-12-10 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/32035 )


Change subject: gpu-compute: Number of TLBs equal to number of CUs
..

gpu-compute: Number of TLBs equal to number of CUs

The n_cu variable in GPUTLBConfig.py did not take
the number of CUs into consideration and instead
calculated the number of TLBs using cu_per_sa,
sa_per_complex, and num_gpu_complexes. Thus, changing
the number of CUs (n_cu) and none of the other flags
resulted in a segmentation fault, since the required
TLBs were not being instantiated.

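To illustrate the bug, here is a minimal Python sketch (not gem5 code; the 64-TLBs-per-CU factor matches the perLane case shown in the diff, the function names are invented): the old formula derived the CU count from the topology flags, so overriding only the CU count left the TLB count stale.

```python
def num_tlbs_old(cu_per_sa, sa_per_complex, num_gpu_complexes,
                 tlbs_per_cu=64):
    # Pre-fix: n_cu is derived from topology flags, ignoring any
    # explicit --num-compute-units override.
    n_cu = cu_per_sa * sa_per_complex * num_gpu_complexes
    return tlbs_per_cu * n_cu

def num_tlbs_new(num_compute_units, tlbs_per_cu=64):
    # Post-fix: the user-supplied CU count is used directly.
    return tlbs_per_cu * num_compute_units

# A user asking for 8 CUs while the topology flags still describe
# 4 CUs previously got TLBs for only half of them:
assert num_tlbs_old(cu_per_sa=4, sa_per_complex=1, num_gpu_complexes=1) == 256
assert num_tlbs_new(num_compute_units=8) == 512
```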
Change-Id: I569a4e6dc7db9b7a81aeede5ac68aacc0f400a5e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32035
Maintainer: Matt Sinclair 
Maintainer: Anthony Gutierrez 
Reviewed-by: Alexandru Duțu 
Reviewed-by: Anthony Gutierrez 
Reviewed-by: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
Tested-by: kokoro 
---
M configs/common/GPUTLBConfig.py
1 file changed, 1 insertion(+), 2 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Alexandru Duțu: Looks good to me, approved
  Anthony Gutierrez: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/configs/common/GPUTLBConfig.py b/configs/common/GPUTLBConfig.py
index 8e2b1e4..c06bda1 100644
--- a/configs/common/GPUTLBConfig.py
+++ b/configs/common/GPUTLBConfig.py
@@ -74,8 +74,7 @@
 coalescer_name.append(eval(Coalescer_constructor(my_level)))

 def config_tlb_hierarchy(options, system, shader_idx):
-n_cu = options.cu_per_sa * options.sa_per_complex * \
-   options.num_gpu_complexes
+n_cu = options.num_compute_units

 if options.TLB_config == "perLane":
 num_TLBs = 64 * n_cu

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/32035
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I569a4e6dc7db9b7a81aeede5ac68aacc0f400a5e
Gerrit-Change-Number: 32035
Gerrit-PatchSet: 4
Gerrit-Owner: GAURAV JAIN 
Gerrit-Reviewer: Alexandru Duțu 
Gerrit-Reviewer: Anthony Gutierrez 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bradford Beckmann 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Fix FLAT insts decrementing lgkm count early

2021-01-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/38696 )


Change subject: gpu-compute: Fix FLAT insts decrementing lgkm count early
..

gpu-compute: Fix FLAT insts decrementing lgkm count early

FLAT instructions used to decrement lgkm count on execute, while the
GCN3 ISA specifies that lgkm count should be decremented on data being
returned or data being written.

This patch changes it so that lgkm is decremented after initiateAcc (for
stores) and after completeAcc (for loads) to better reflect the ISA
definition.

This fixes a bug where waitcnts would be satisfied even though the
memory access wasn't completed, which led to instructions using the
wrong data.

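The ordering bug can be sketched with a toy counter model (all names invented here; this is not the gem5 implementation): if lgkm count drops at execute time, a dependent s_waitcnt is satisfied before the data actually arrives.

```python
class WaitcntModel:
    """Toy model of one wavefront's lgkm counter for a FLAT load."""

    def __init__(self):
        self.lgkm_cnt = 0

    def issue_flat_load(self):
        self.lgkm_cnt += 1          # counter raised when the access issues

    def complete_flat_load(self):
        self.lgkm_cnt -= 1          # fixed behavior: drop only on data return

    def waitcnt_satisfied(self, target=0):
        # s_waitcnt lgkmcnt(target) may proceed once the counter is low enough
        return self.lgkm_cnt <= target

wf = WaitcntModel()
wf.issue_flat_load()
assert not wf.waitcnt_satisfied()   # data still outstanding: must stall
wf.complete_flat_load()
assert wf.waitcnt_satisfied()       # now safe to consume the loaded data
```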
Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/38696
Reviewed-by: Matt Sinclair 
Reviewed-by: Matthew Poremba 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/global_memory_pipeline.cc
M src/gpu-compute/gpu_dyn_inst.cc
2 files changed, 7 insertions(+), 1 deletion(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/gpu-compute/global_memory_pipeline.cc b/src/gpu-compute/global_memory_pipeline.cc
index bcd93f8..a2b24e4 100644
--- a/src/gpu-compute/global_memory_pipeline.cc
+++ b/src/gpu-compute/global_memory_pipeline.cc
@@ -130,6 +130,9 @@
 DPRINTF(GPUMem, "CU%d: WF[%d][%d]: Completing global mem instr %s\n",
 m->cu_id, m->simdId, m->wfSlotId, m->disassemble());
 m->completeAcc(m);
+if (m->isFlat() && m->isLoad()) {
+w->decLGKMInstsIssued();
+}
 w->decVMemInstsIssued();

 if (m->isLoad() || m->isAtomicRet()) {
@@ -193,6 +196,10 @@
 mp->disassemble(), mp->seqNum());
 mp->initiateAcc(mp);

+if (mp->isFlat() && mp->isStore()) {
+mp->wavefront()->decLGKMInstsIssued();
+}
+
 if (mp->isStore() && mp->isGlobalSeg()) {
 mp->wavefront()->decExpInstsIssued();
 }
diff --git a/src/gpu-compute/gpu_dyn_inst.cc b/src/gpu-compute/gpu_dyn_inst.cc
index 03ed689..38e4ecf 100644
--- a/src/gpu-compute/gpu_dyn_inst.cc
+++ b/src/gpu-compute/gpu_dyn_inst.cc
@@ -819,7 +819,6 @@
 if (executedAs() == Enums::SC_GLOBAL) {
 // no transormation for global segment
 wavefront()->execUnitId =  wavefront()->flatGmUnitId;
-wavefront()->decLGKMInstsIssued();
 if (isLoad()) {
 wavefront()->rdLmReqsInPipe--;
 } else if (isStore()) {

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/38696
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I596cb031af9cda8d47a1b5e146e4a4ffd793d36c
Gerrit-Change-Number: 38696
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Alexandru Duțu 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Support for dynamic register alloc

2021-01-14 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/32034 )


Change subject: gpu-compute: Support for dynamic register alloc
..

gpu-compute: Support for dynamic register alloc

SimplePoolManager doesn't allow mapping of two WGs
simultaneously on the same Compute Unit (provided
the previous WG has been mapped to all the SIMDs)
even if there is sufficient VRF and SRF space
available.

DynPoolManager takes care of that by dynamically
allocating and deallocating register file space
to wavefronts.

Change-Id: I2255c68d4b421615d7b231edc05d3ebb27cbd66c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32034
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Alexandru Duțu 
---
M configs/example/apu_se.py
M src/gpu-compute/GPU.py
M src/gpu-compute/SConscript
M src/gpu-compute/compute_unit.cc
M src/gpu-compute/compute_unit.hh
A src/gpu-compute/dyn_pool_manager.cc
A src/gpu-compute/dyn_pool_manager.hh
M src/gpu-compute/pool_manager.hh
M src/gpu-compute/shader.cc
M src/gpu-compute/static_register_manager_policy.cc
10 files changed, 293 insertions(+), 14 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve
  Alexandru Duțu: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 14f7163..0bcf99b 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -182,6 +182,8 @@
   ' m5_switchcpu pseudo-ops will toggle back and forth')
 parser.add_option("--num-hw-queues", type="int", default=10,
   help="number of hw queues in packet processor")
+parser.add_option("--reg-alloc-policy",type="string", default="simple",
+  help="register allocation policy (simple/dynamic)")

 Ruby.define_options(parser)

@@ -295,18 +297,28 @@
 for k in range(shader.n_wf):
 wavefronts.append(Wavefront(simdId = j, wf_slot_id = k,
 wf_size = options.wf_size))
-vrf_pool_mgrs.append(SimplePoolManager(pool_size = \
+
+if options.reg_alloc_policy == "simple":
+vrf_pool_mgrs.append(SimplePoolManager(pool_size = \
options.vreg_file_size,
min_alloc = \
options.vreg_min_alloc))
+srf_pool_mgrs.append(SimplePoolManager(pool_size = \
+   options.sreg_file_size,
+   min_alloc = \
+   options.vreg_min_alloc))
+elif options.reg_alloc_policy == "dynamic":
+vrf_pool_mgrs.append(DynPoolManager(pool_size = \
+   options.vreg_file_size,
+   min_alloc = \
+   options.vreg_min_alloc))
+srf_pool_mgrs.append(DynPoolManager(pool_size = \
+   options.sreg_file_size,
+   min_alloc = \
+   options.vreg_min_alloc))

 vrfs.append(VectorRegisterFile(simd_id=j, wf_size=options.wf_size,
num_regs=options.vreg_file_size))
-
-srf_pool_mgrs.append(SimplePoolManager(pool_size = \
-   options.sreg_file_size,
-   min_alloc = \
-   options.vreg_min_alloc))
 srfs.append(ScalarRegisterFile(simd_id=j, wf_size=options.wf_size,
num_regs=options.sreg_file_size))

diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index b82ad18..d2959ac 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -28,8 +28,6 @@
 # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 # POSSIBILITY OF SUCH DAMAGE.
-#
-# Authors: Steve Reinhardt

 from m5.defines import buildEnv
 from m5.params import *
@@ -67,6 +65,12 @@
 cxx_class = 'SimplePoolManager'
 cxx_header = "gpu-compute/simple_pool_manager.hh"

+## This is for allowing multiple workgroups on one CU
+class DynPoolManager(PoolManager):
+type = 'DynPoolManager'
+cxx_class = 'DynPoolManager'
+cxx_header = "gpu-compute/dyn_pool_manager.hh"
+
 class RegisterFile(SimObject):
 type = 'RegisterFile'
 cxx_class = 'RegisterFile'
diff --git a/src/gpu-compute/SConscript b/src/gpu-compute/SConscript
index 416b9e9..e41e387 100644
--- a/src/gpu-compute

[gem5-dev] Change in gem5/gem5[develop]: arch-x86: Make JRCXZ instruction do 64-bit jump

2021-02-03 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/40195 )


Change subject: arch-x86: Make JRCXZ instruction do 64-bit jump
..

arch-x86: Make JRCXZ instruction do 64-bit jump

Per the AMD64 Architecture Programming Manual:

The size of the count register (CX, ECX, or RCX) depends on the
address-size attribute of the JrCXZ instruction. Therefore, JRCXZ can
only be executed in 64-bit mode

and

In 64-bit mode, the operand size defaults to 64 bits. The processor
sign-extends the 8-bit displacement value to 64 bits before adding it
to the RIP.

This patch also renames the instruction from JRCX to JRCXZ to match the
language in the programming manual.
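The 64-bit behavior quoted from the manual can be sketched in Python (hypothetical helper names; real hardware operates on registers, but the arithmetic is the same):

```python
def sign_extend8(disp):
    # Interpret the low 8 bits of `disp` as a signed two's-complement value.
    disp &= 0xFF
    return disp - 0x100 if disp & 0x80 else disp

def jrcxz_target(rip, disp8, rcx):
    # JRCXZ: if RCX is zero, add the sign-extended 8-bit displacement
    # to RIP, with all arithmetic modulo 2**64 as on real hardware.
    if rcx == 0:
        return (rip + sign_extend8(disp8)) % (1 << 64)
    return rip

assert jrcxz_target(0x1000, 0xF0, rcx=0) == 0xFF0    # -16: backward jump taken
assert jrcxz_target(0x1000, 0x10, rcx=5) == 0x1000   # RCX != 0: not taken
```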
Change-Id: Id55147d0602ff41ad6aaef483bef722ff56cae62
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40195
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/x86/isa/decoder/one_byte_opcodes.isa
M src/arch/x86/isa/insts/general_purpose/control_transfer/conditional_jump.py
2 files changed, 4 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/x86/isa/decoder/one_byte_opcodes.isa b/src/arch/x86/isa/decoder/one_byte_opcodes.isa
index b5f77cd..04b3adc 100644
--- a/src/arch/x86/isa/decoder/one_byte_opcodes.isa
+++ b/src/arch/x86/isa/decoder/one_byte_opcodes.isa
@@ -483,7 +483,7 @@
 0x0: LOOPNE(Jb);
 0x1: LOOPE(Jb);
 0x2: LOOP(Jb);
-0x3: JRCX(Jb);
+0x3: JRCXZ(Jb);
 0x4: IN(rAb,Ib);
 0x5: IN(rAv,Iv);
 0x6: OUT(Ib,rAb);
diff --git a/src/arch/x86/isa/insts/general_purpose/control_transfer/conditional_jump.py b/src/arch/x86/isa/insts/general_purpose/control_transfer/conditional_jump.py
index 390a08b..d0fa31a 100644
--- a/src/arch/x86/isa/insts/general_purpose/control_transfer/conditional_jump.py
+++ b/src/arch/x86/isa/insts/general_purpose/control_transfer/conditional_jump.py
@@ -210,8 +210,10 @@
 wrip t1, t2, flags=(nCOF,)
 };

-def macroop JRCX_I
+def macroop JRCXZ_I
 {
+# Make the default data size of jumps 64 bits in 64 bit mode
+.adjust_env oszIn64Override
 .control_direct

 rdip t1

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/40195
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id55147d0602ff41ad6aaef483bef722ff56cae62
Gerrit-Change-Number: 40195
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Gabe Black 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Alexandru Duțu 
Gerrit-CC: Jason Lowe-Power 
Gerrit-CC: Matthew Poremba 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: Implementation of s_sleep

2021-02-03 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/39115 )


Change subject: arch-gcn3: Implementation of s_sleep
..

arch-gcn3: Implementation of s_sleep

This changeset implements the s_sleep instruction in a similar
way to s_waitcnt.

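The duration rule used here is that the 16-bit immediate counts sleep time in multiples of 64 cycles. A one-line sketch (the tick conversion is an illustrative assumption, not the gem5 API):

```python
def s_sleep_ticks(simm16, clock_period_ticks):
    # s_sleep's 16-bit immediate counts sleep time in multiples of
    # 64 GPU cycles; convert to simulator ticks for illustration.
    return 64 * simm16 * clock_period_ticks

# Sleeping for simm16 = 2 at a 500-tick clock period:
assert s_sleep_ticks(2, 500) == 64000
```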
Change-Id: I4811c318ac2c76c485e2bfd9d93baa1205ecf183
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/39115
Maintainer: Matthew Poremba 
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/gcn3/insts/instructions.cc
M src/gpu-compute/GPUStaticInstFlags.py
M src/gpu-compute/gpu_dyn_inst.cc
M src/gpu-compute/gpu_dyn_inst.hh
M src/gpu-compute/gpu_static_inst.hh
M src/gpu-compute/schedule_stage.cc
M src/gpu-compute/scoreboard_check_stage.cc
M src/gpu-compute/scoreboard_check_stage.hh
M src/gpu-compute/wavefront.cc
M src/gpu-compute/wavefront.hh
10 files changed, 81 insertions(+), 4 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  Matthew Poremba: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/gcn3/insts/instructions.cc b/src/arch/gcn3/insts/instructions.cc
index 8c02951..03b11ab 100644
--- a/src/arch/gcn3/insts/instructions.cc
+++ b/src/arch/gcn3/insts/instructions.cc
@@ -4187,6 +4187,8 @@
 Inst_SOPP__S_SLEEP::Inst_SOPP__S_SLEEP(InFmt_SOPP *iFmt)
 : Inst_SOPP(iFmt, "s_sleep")
 {
+setFlag(ALU);
+setFlag(Sleep);
 } // Inst_SOPP__S_SLEEP

 Inst_SOPP__S_SLEEP::~Inst_SOPP__S_SLEEP()
@@ -4197,8 +4199,12 @@
 void
 Inst_SOPP__S_SLEEP::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
-}
+ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+gpuDynInst->wavefront()->setStatus(Wavefront::S_STALLED_SLEEP);
+// sleep duration is specified in multiples of 64 cycles
+gpuDynInst->wavefront()->setSleepTime(64 * simm16);
+} // execute
+// --- Inst_SOPP__S_SETPRIO class methods ---

 Inst_SOPP__S_SETPRIO::Inst_SOPP__S_SETPRIO(InFmt_SOPP *iFmt)
 : Inst_SOPP(iFmt, "s_setprio")
diff --git a/src/gpu-compute/GPUStaticInstFlags.py b/src/gpu-compute/GPUStaticInstFlags.py
index ad4c6c3..1dc143c 100644
--- a/src/gpu-compute/GPUStaticInstFlags.py
+++ b/src/gpu-compute/GPUStaticInstFlags.py
@@ -48,6 +48,7 @@
 'UnconditionalJump', #
 'SpecialOp', # Special op
 'Waitcnt',   # Is a waitcnt instruction
+'Sleep', # Is a sleep instruction

 # Memory ops
 'MemBarrier',# Barrier instruction
diff --git a/src/gpu-compute/gpu_dyn_inst.cc b/src/gpu-compute/gpu_dyn_inst.cc
index a17a93f..b827632 100644
--- a/src/gpu-compute/gpu_dyn_inst.cc
+++ b/src/gpu-compute/gpu_dyn_inst.cc
@@ -399,6 +399,12 @@
 }

 bool
+GPUDynInst::isSleep() const
+{
+return _staticInst->isSleep();
+}
+
+bool
 GPUDynInst::isBarrier() const
 {
 return _staticInst->isBarrier();
diff --git a/src/gpu-compute/gpu_dyn_inst.hh b/src/gpu-compute/gpu_dyn_inst.hh
index 8c7cf87..851a46a 100644
--- a/src/gpu-compute/gpu_dyn_inst.hh
+++ b/src/gpu-compute/gpu_dyn_inst.hh
@@ -180,6 +180,7 @@
 bool isUnconditionalJump() const;
 bool isSpecialOp() const;
 bool isWaitcnt() const;
+bool isSleep() const;

 bool isBarrier() const;
 bool isMemSync() const;
diff --git a/src/gpu-compute/gpu_static_inst.hh b/src/gpu-compute/gpu_static_inst.hh
index 88fd9f9..f973f2f 100644
--- a/src/gpu-compute/gpu_static_inst.hh
+++ b/src/gpu-compute/gpu_static_inst.hh
@@ -119,6 +119,7 @@

 bool isSpecialOp() const { return _flags[SpecialOp]; }
 bool isWaitcnt() const { return _flags[Waitcnt]; }
+bool isSleep() const { return _flags[Sleep]; }

 bool isBarrier() const { return _flags[MemBarrier]; }
 bool isMemSync() const { return _flags[MemSync]; }
diff --git a/src/gpu-compute/schedule_stage.cc b/src/gpu-compute/schedule_stage.cc
index 02580fe..8a2ea18 100644
--- a/src/gpu-compute/schedule_stage.cc
+++ b/src/gpu-compute/schedule_stage.cc
@@ -317,6 +317,9 @@
 if (wf->isOldestInstWaitcnt()) {
 wf->setStatus(Wavefront::S_WAITCNT);
 }
+if (wf->isOldestInstSleep()) {
+wf->setStatus(Wavefront::S_STALLED_SLEEP);
+}
 if (!gpu_dyn_inst->isScalar()) {
 computeUnit.vrf[wf->simdId]
 ->scheduleReadOperands(wf, gpu_dyn_inst);
diff --git a/src/gpu-compute/scoreboard_check_stage.cc b/src/gpu-compute/scoreboard_check_stage.cc
index c246279..08ce6a1 100644
--- a/src/gpu-compute/scoreboard_check_stage.cc
+++ b/src/gpu-compute/scoreboard_check_stage.cc
@@ -92,6 +92,15 @@
 }
 }

+// sleep instruction has been dispatched or executed: next
+// instruction should be blocked until the sleep period expires.
+if (w->getStatus() == Wavefront::S_STALLED_SLEEP) {
+if (!w-

[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Fixing HSA's barrier bit implementation

2020-06-15 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/30354 )



Change subject: gpu-compute: Fixing HSA's barrier bit implementation
..

gpu-compute: Fixing HSA's barrier bit implementation

This changeset fixes several bugs in the HSA barrier bit implementation.

1. Forces AQL packet launch to wait for completion of all previous packets
2. Enforces barrier bit blocking only if there are packets pending completion
3. Barrier bit unblocking is correctly done by the last pending packet
4. Implements barrier bit for all packets to conform to the HSA spec

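The blocking rule in this list can be sketched as a simple predicate (invented names, not the HSAPacketProcessor API):

```python
def can_launch(barrier_bit, completions_pending):
    # A packet whose barrier bit is set must wait until every previously
    # launched packet on the same queue has completed (rules 1 and 2);
    # the queue unblocks when the last pending packet finishes (rule 3).
    return not (barrier_bit and completions_pending > 0)

assert can_launch(True, 2) is False    # blocked behind outstanding packets
assert can_launch(True, 0) is True     # nothing pending: may launch
assert can_launch(False, 2) is True    # no barrier bit: no ordering enforced
```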
Change-Id: I62ce589dff57dcde4d64054a1b6ffd962acd5eb8
---
M src/dev/hsa/hsa_packet_processor.cc
M src/dev/hsa/hsa_packet_processor.hh
2 files changed, 84 insertions(+), 14 deletions(-)



diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc
index f9880e4..4183d6d 100644
--- a/src/dev/hsa/hsa_packet_processor.cc
+++ b/src/dev/hsa/hsa_packet_processor.cc
@@ -277,11 +277,11 @@
 }

 void
-HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx)
+HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx, Tick delay)
 {
 RQLEntry *queue = regdQList[rl_idx];
 if (!queue->aqlProcessEvent.scheduled()) {
-Tick processingTick = curTick() + pktProcessDelay;
+Tick processingTick = curTick() + delay;
 schedule(queue->aqlProcessEvent, processingTick);
 DPRINTF(HSAPacketProcessor, "AQL processing scheduled at tick: %d\n",
 processingTick);
@@ -290,32 +290,48 @@
 }
 }

-bool
-HSAPacketProcessor::processPkt(void* pkt, uint32_t rl_idx, Addr host_pkt_addr)
+void
+HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx)
 {
-bool is_submitted = false;
+schedAQLProcessing(rl_idx, pktProcessDelay);
+}
+
+Q_STATE HSAPacketProcessor::processPkt(void* pkt, uint32_t rl_idx,
+   Addr host_pkt_addr)
+{
+Q_STATE is_submitted = BLOCKED_BPKT;
 SignalState *dep_sgnl_rd_st = &(regdQList[rl_idx]->depSignalRdState);
 // Dependency signals are not read yet. And this can only be a retry.
 // The retry logic will schedule the packet processor wakeup
 if (dep_sgnl_rd_st->pendingReads != 0) {
-return false;
+return BLOCKED_BPKT;
 }
 // `pkt` can be typecasted to any type of AQL packet since they all
 // have header information at offset zero
 auto disp_pkt = (_hsa_dispatch_packet_t *)pkt;
 hsa_packet_type_t pkt_type = PKT_TYPE(disp_pkt);
+if (IS_BARRIER(disp_pkt) &&
+regdQList[rl_idx]->compltnPending() > 0) {
+// If this packet is using the "barrier bit" to enforce ordering with
+// previous packets, and if there are outstanding packets, set the
+// barrier bit for this queue and block the queue.
+DPRINTF(HSAPacketProcessor, "%s: setting barrier bit for active" \
+" list ID = %d\n", __FUNCTION__, rl_idx);
+regdQList[rl_idx]->setBarrierBit(true);
+return BLOCKED_BBIT;
+}
 if (pkt_type == HSA_PACKET_TYPE_VENDOR_SPECIFIC) {
 DPRINTF(HSAPacketProcessor, "%s: submitting vendor specific pkt" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
 // Submit packet to HSA device (dispatcher)
 hsa_device->submitVendorPkt((void *)disp_pkt, rl_idx, host_pkt_addr);
-is_submitted = true;
+is_submitted = UNBLOCKED;
 } else if (pkt_type == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
 DPRINTF(HSAPacketProcessor, "%s: submitting kernel dispatch pkt" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
 // Submit packet to HSA device (dispatcher)
 hsa_device->submitDispatchPkt((void *)disp_pkt, rl_idx, host_pkt_addr);
-is_submitted = true;
+is_submitted = UNBLOCKED;
 } else if (pkt_type == HSA_PACKET_TYPE_BARRIER_AND) {
 DPRINTF(HSAPacketProcessor, "%s: Processing barrier packet" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
@@ -372,7 +388,7 @@
 // TODO: Completion signal of barrier packet to be
 // atomically decremented here
 finishPkt((void*)bar_and_pkt, rl_idx);
-is_submitted = true;
+is_submitted = UNBLOCKED;
 // Reset signal values
 dep_sgnl_rd_st->resetSigVals();
 // The completion signal is connected
@@ -432,6 +448,13 @@
 " dispIdx %d, active list ID = %d\n",
 __FUNCTION__, aqlRingBuffer->rdIdx(),
 aqlRingBuffer->wrIdx(), aqlRingBuffer->dispIdx(), rqIdx);
+// If barrier bit is set, then this wakeup is a dummy wakeup
+// just to model the processing time. Do nothing.
+if (hsaPP->regdQList[rqIdx]->getBarrierBit()) {
+DPRINTF(HSAPacketProcessor,
+"Dummy wakeup with barrier bit for rdIdx %d\n", rqIdx);
+

[gem5-dev] Change in gem5/gem5[develop]: configs: Change env defaults in apu_se.py for ROCm

2020-07-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/30275 )


Change subject: configs: Change env defaults in apu_se.py for ROCm
..

configs: Change env defaults in apu_se.py for ROCm

This change simplifies the setup process for running
ROCm-based programs by adding the libraries that are
needed to LD_LIBRARY_PATH by default, using
preexisting environment variables that should be set
on the host.

HOME also gets set, as MIOpen-based programs can fail
without it.

Change-Id: Ic599674babeaebb52de8a55981d04454cdc96cd8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30275
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Reviewed-by: Anthony Gutierrez 
Reviewed-by: Bradford Beckmann 
Maintainer: Anthony Gutierrez 
Maintainer: Jason Lowe-Power 
---
M configs/example/apu_se.py
1 file changed, 11 insertions(+), 4 deletions(-)

Approvals:
  Bradford Beckmann: Looks good to me, approved
  Anthony Gutierrez: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  Jason Lowe-Power: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 4e9c75f..82e4022 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -456,11 +456,18 @@
 env = [line.rstrip() for line in f]
 else:
 env = ['LD_LIBRARY_PATH=%s' % ':'.join([
-   "/proj/radl_tools/rocm-1.6/lib",
-   "/proj/radl_tools/rocm-1.6/hcc/lib64",
-   "/tool/pandora64/.package/libunwind-1.1/lib",
-   "/tool/pandora64/.package/gcc-6.4.0/lib64"
+   os.getenv('ROCM_PATH','/opt/rocm')+'/lib',
+   os.getenv('HCC_HOME','/opt/rocm/hcc')+'/lib',
+   os.getenv('HSA_PATH','/opt/rocm/hsa')+'/lib',
+   os.getenv('HIP_PATH','/opt/rocm/hip')+'/lib',
+   os.getenv('ROCM_PATH','/opt/rocm')+'/libhsakmt/lib',
+   os.getenv('ROCM_PATH','/opt/rocm')+'/miopen/lib',
+   os.getenv('ROCM_PATH','/opt/rocm')+'/miopengemm/lib',
+   os.getenv('ROCM_PATH','/opt/rocm')+'/hipblas/lib',
+   os.getenv('ROCM_PATH','/opt/rocm')+'/rocblas/lib',
+   "/usr/lib/x86_64-linux-gnu"
]),
+   'HOME=%s' % os.getenv('HOME','/'),
"HSA_ENABLE_INTERRUPT=0"]

 process = Process(executable = executable, cmd = [options.cmd]

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/30275
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic599674babeaebb52de8a55981d04454cdc96cd8
Gerrit-Change-Number: 30275
Gerrit-PatchSet: 4
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Anthony Gutierrez 
Gerrit-Reviewer: Bradford Beckmann 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: add support for flat atomic adds, subs, incs, decs

2020-07-29 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/31974 )



Change subject: arch-gcn3: add support for flat atomic adds, subs, incs, decs
..

arch-gcn3: add support for flat atomic adds, subs, incs, decs

Add support for all missing flat atomic adds, subtracts, increments,
and decrements, including their x2 variants.

Change-Id: I37a67fcacca91a09a82be6597facaa366105d2dc
---
M src/arch/gcn3/insts/instructions.cc
M src/arch/gcn3/insts/instructions.hh
2 files changed, 410 insertions(+), 6 deletions(-)



diff --git a/src/arch/gcn3/insts/instructions.cc b/src/arch/gcn3/insts/instructions.cc
index 426f991..6e81e2c 100644
--- a/src/arch/gcn3/insts/instructions.cc
+++ b/src/arch/gcn3/insts/instructions.cc
@@ -40643,8 +40643,72 @@
 void
 Inst_FLAT__FLAT_ATOMIC_SUB::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (wf->execMask().none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+wf->wrGmReqsInPipe--;
+wf->rdGmReqsInPipe--;
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->exec_mask = wf->execMask();
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+if (gpuDynInst->executedAs() == Enums::SC_GLOBAL) {
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
+wf->wrGmReqsInPipe--;
+wf->outstandingReqsWrGm++;
+wf->rdGmReqsInPipe--;
+wf->outstandingReqsRdGm++;
+} else {
+fatal("Non global flat instructions not implemented yet.\n");
+}
+
+gpuDynInst->wavefront()->outstandingReqs++;
+gpuDynInst->wavefront()->validateRequestCounters();
 }
+void
+Inst_FLAT__FLAT_ATOMIC_SUB::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+initAtomicAccess(gpuDynInst);
+} // initiateAcc
+
+void
+Inst_FLAT__FLAT_ATOMIC_SUB::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+if (isAtomicRet()) {
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (reinterpret_cast(
+gpuDynInst->d_data))[lane];
+}
+}
+
+vdst.write();
+}
+} // completeAcc

 Inst_FLAT__FLAT_ATOMIC_SMIN::Inst_FLAT__FLAT_ATOMIC_SMIN(InFmt_FLAT  
*iFmt)

 : Inst_FLAT(iFmt, "flat_atomic_smin")
@@ -40843,9 +40907,74 @@
 void
 Inst_FLAT__FLAT_ATOMIC_INC::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (wf->execMask().none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+wf->wrGmReqsInPipe--;
+wf->rdGmReqsInPipe--;
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->exec_mask = wf->execMask();
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+if (gpuDynInst->executedAs() == Enums::SC_GLOBAL) {
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
+wf->wrGmReqsInPipe--;
+wf->outstandingReqsWrGm++;
+wf->rdGmReqsInPipe--;
+wf->outstandingReqsRdGm++;
+} else {
+fatal("Non global flat instructions not implemented yet.\n");
+}
+
+gpuDynInst->wavefront()->outstandingReqs++;
+gpuDynInst->wavefront()->validateRequestCounters();
 }

+void
+Inst_FLAT__FLAT_ATOMIC_INC::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+ini

[gem5-dev] Change in gem5/gem5[develop]: arch-gcn3: add support for flat atomic adds, subs, incs, decs

2020-07-30 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/31974 )


Change subject: arch-gcn3: add support for flat atomic adds, subs, incs,  
decs

..

arch-gcn3: add support for flat atomic adds, subs, incs, decs

Add support for all missing flat atomic adds, subtracts, increments,
and decrements, including their x2 variants.

Change-Id: I37a67fcacca91a09a82be6597facaa366105d2dc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/31974
Reviewed-by: Anthony Gutierrez 
Maintainer: Anthony Gutierrez 
Tested-by: kokoro 
---
M src/arch/gcn3/insts/instructions.cc
M src/arch/gcn3/insts/instructions.hh
2 files changed, 410 insertions(+), 6 deletions(-)

Approvals:
  Anthony Gutierrez: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/arch/gcn3/insts/instructions.cc b/src/arch/gcn3/insts/instructions.cc
index 426f991..6e81e2c 100644
--- a/src/arch/gcn3/insts/instructions.cc
+++ b/src/arch/gcn3/insts/instructions.cc
@@ -40643,8 +40643,72 @@
 void
 Inst_FLAT__FLAT_ATOMIC_SUB::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (wf->execMask().none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+wf->wrGmReqsInPipe--;
+wf->rdGmReqsInPipe--;
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->exec_mask = wf->execMask();
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+if (gpuDynInst->executedAs() == Enums::SC_GLOBAL) {
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
+wf->wrGmReqsInPipe--;
+wf->outstandingReqsWrGm++;
+wf->rdGmReqsInPipe--;
+wf->outstandingReqsRdGm++;
+} else {
+fatal("Non global flat instructions not implemented yet.\n");
+}
+
+gpuDynInst->wavefront()->outstandingReqs++;
+gpuDynInst->wavefront()->validateRequestCounters();
 }
+void
+Inst_FLAT__FLAT_ATOMIC_SUB::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+initAtomicAccess(gpuDynInst);
+} // initiateAcc
+
+void
+Inst_FLAT__FLAT_ATOMIC_SUB::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+if (isAtomicRet()) {
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (reinterpret_cast<VecElemU32*>(
+gpuDynInst->d_data))[lane];
+}
+}
+
+vdst.write();
+}
+} // completeAcc

 Inst_FLAT__FLAT_ATOMIC_SMIN::Inst_FLAT__FLAT_ATOMIC_SMIN(InFmt_FLAT *iFmt)
     : Inst_FLAT(iFmt, "flat_atomic_smin")
@@ -40843,9 +40907,74 @@
 void
 Inst_FLAT__FLAT_ATOMIC_INC::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (wf->execMask().none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+wf->wrGmReqsInPipe--;
+wf->rdGmReqsInPipe--;
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->exec_mask = wf->execMask();
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+if (gpuDynInst->executedAs() == Enums::SC_GLOBAL) {
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
+wf->wrGmReqsInPipe--;
+wf->outstandingReqsWrGm++;
+wf->rdGmReqsInPipe--;
+wf->outstandingReqsRdGm++;
+} else {
+fatal("Non global flat instructions not

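The pattern in the diff above is the standard three-phase flow for GCN3 flat atomics: execute() packs each active lane's operand into a_data and issues the request, initiateAcc() starts the atomic memory access, and completeAcc() writes the returned pre-op values from d_data back to VDST for the returning variants. As a rough functional model only (the helper name, the memory dict, and the 32-bit wrap are illustrative assumptions, not gem5's API), the lane-masked atomic subtract behaves like:

```python
# Functional sketch of a lane-masked atomic subtract across a wavefront.
# Names and data structures are illustrative only, not gem5's actual API.
def atomic_sub_lanes(memory, addrs, data, exec_mask):
    """For each active lane, atomically subtract data[lane] from
    memory[addrs[lane]] and return the pre-subtract value (the d_data
    that a returning atomic would write back to VDST)."""
    d_data = [None] * len(addrs)
    for lane, active in enumerate(exec_mask):
        if not active:
            continue  # masked-off lanes neither read nor write memory
        old = memory[addrs[lane]]                       # value returned to VDST
        memory[addrs[lane]] = (old - data[lane]) & 0xFFFFFFFF  # 32-bit wrap
        d_data[lane] = old
    return d_data

mem = {0x100: 10, 0x104: 7}
old = atomic_sub_lanes(mem, [0x100, 0x104], [3, 5], [True, False])
# lane 0 executed (10 -> 7); lane 1 was masked off and left untouched
```

Masked-off lanes neither touch memory nor receive a return value, which is why both the a_data pack loop and the completeAcc() write-back loop in the patch check exec_mask[lane].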
[gem5-dev] Change in gem5/gem5[develop]: gpu-compute: Fixing HSA's barrier bit implementation

2020-08-12 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/30354 )


Change subject: gpu-compute: Fixing HSA's barrier bit implementation
..

gpu-compute: Fixing HSA's barrier bit implementation

This changeset fixes several bugs in the HSA barrier bit implementation.

1. Forces AQL packet launch to wait for completion of all previous packets
2. Enforces barrier bit blocking only if there are packets pending completion
3. Barrier bit unblocking is correctly done by the last pending packet
4. Implementing barrier bit for all packets to conform to HSA spec

Change-Id: I62ce589dff57dcde4d64054a1b6ffd962acd5eb8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30354
Reviewed-by: Sooraj Puthoor 
Maintainer: Anthony Gutierrez 
Tested-by: kokoro 
---
M src/dev/hsa/hsa_packet_processor.cc
M src/dev/hsa/hsa_packet_processor.hh
2 files changed, 100 insertions(+), 34 deletions(-)

Approvals:
  Sooraj Puthoor: Looks good to me, approved
  Anthony Gutierrez: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/src/dev/hsa/hsa_packet_processor.cc b/src/dev/hsa/hsa_packet_processor.cc
index 4143019..68cdcf4 100644
--- a/src/dev/hsa/hsa_packet_processor.cc
+++ b/src/dev/hsa/hsa_packet_processor.cc
@@ -282,11 +282,11 @@
 }

 void
-HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx)
+HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx, Tick delay)
 {
 RQLEntry *queue = regdQList[rl_idx];
-if (!queue->aqlProcessEvent.scheduled() && !queue->getBarrierBit()) {
-Tick processingTick = curTick() + pktProcessDelay;
+if (!queue->aqlProcessEvent.scheduled()) {
+Tick processingTick = curTick() + delay;
 schedule(queue->aqlProcessEvent, processingTick);
         DPRINTF(HSAPacketProcessor, "AQL processing scheduled at tick: %d\n",
                 processingTick);
@@ -295,42 +295,48 @@
 }
 }

-bool
+void
+HSAPacketProcessor::schedAQLProcessing(uint32_t rl_idx)
+{
+schedAQLProcessing(rl_idx, pktProcessDelay);
+}
+
+Q_STATE
 HSAPacketProcessor::processPkt(void* pkt, uint32_t rl_idx, Addr host_pkt_addr)
 {
-bool is_submitted = false;
+Q_STATE is_submitted = BLOCKED_BPKT;
 SignalState *dep_sgnl_rd_st = &(regdQList[rl_idx]->depSignalRdState);
 // Dependency signals are not read yet. And this can only be a retry.
 // The retry logic will schedule the packet processor wakeup
 if (dep_sgnl_rd_st->pendingReads != 0) {
-return false;
+return BLOCKED_BPKT;
 }
 // `pkt` can be typecasted to any type of AQL packet since they all
 // have header information at offset zero
 auto disp_pkt = (_hsa_dispatch_packet_t *)pkt;
 hsa_packet_type_t pkt_type = PKT_TYPE(disp_pkt);
+if (IS_BARRIER(disp_pkt) &&
+regdQList[rl_idx]->compltnPending() > 0) {
+// If this packet is using the "barrier bit" to enforce ordering with
+// previous packets, and if there are outstanding packets, set the
+// barrier bit for this queue and block the queue.
+DPRINTF(HSAPacketProcessor, "%s: setting barrier bit for active" \
+" list ID = %d\n", __FUNCTION__, rl_idx);
+regdQList[rl_idx]->setBarrierBit(true);
+return BLOCKED_BBIT;
+}
 if (pkt_type == HSA_PACKET_TYPE_VENDOR_SPECIFIC) {
 DPRINTF(HSAPacketProcessor, "%s: submitting vendor specific pkt" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
 // Submit packet to HSA device (dispatcher)
         hsa_device->submitVendorPkt((void *)disp_pkt, rl_idx, host_pkt_addr);
-is_submitted = true;
+is_submitted = UNBLOCKED;
 } else if (pkt_type == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
 DPRINTF(HSAPacketProcessor, "%s: submitting kernel dispatch pkt" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
 // Submit packet to HSA device (dispatcher)
         hsa_device->submitDispatchPkt((void *)disp_pkt, rl_idx, host_pkt_addr);
-is_submitted = true;
-/*
-  If this packet is using the "barrier bit" to enforce ordering  
with

-  subsequent kernels, set the bit for this queue now, after
-  dispatching.
-*/
-if (IS_BARRIER(disp_pkt)) {
-DPRINTF(HSAPacketProcessor, "%s: setting barrier bit for active" \
-" list ID = %d\n", __FUNCTION__, rl_idx);
-regdQList[rl_idx]->setBarrierBit(true);
-}
+is_submitted = UNBLOCKED;
 } else if (pkt_type == HSA_PACKET_TYPE_BARRIER_AND) {
 DPRINTF(HSAPacketProcessor, "%s: Processing barrier packet" \
 " active list ID = %d\n", __FUNCTION__, rl_idx);
@@ -387,7 +393,7 @@
 // TODO: Completion signal of barrier packet to be
 // atomically decremented here
 finishPkt((void*)bar_and_pkt, rl_idx);
-i

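The fix above moves the barrier-bit check to run before dispatch: a packet with the barrier bit set blocks the queue only while earlier packets are still pending completion, and the last completing packet is responsible for clearing the bit. A toy model of that ordering rule (class and member names are invented for illustration, not the HSAPacketProcessor API):

```python
# Toy model of HSA barrier-bit ordering: a barrier-bit packet may not be
# dispatched while earlier packets from the same queue are still in flight.
# Illustrative only -- not gem5's HSAPacketProcessor implementation.
class ToyQueue:
    def __init__(self):
        self.pending = 0          # packets dispatched but not yet complete
        self.barrier_bit = False  # queue blocked behind a barrier packet

    def try_dispatch(self, pkt_is_barrier):
        if self.barrier_bit:
            return "BLOCKED_BBIT"            # queue is already blocked
        if pkt_is_barrier and self.pending > 0:
            self.barrier_bit = True          # block until pending packets drain
            return "BLOCKED_BBIT"
        self.pending += 1                    # packet is submitted
        return "UNBLOCKED"

    def finish_pkt(self):
        self.pending -= 1
        if self.pending == 0 and self.barrier_bit:
            self.barrier_bit = False         # last packet unblocks the queue

q = ToyQueue()
q.try_dispatch(False)         # a normal packet dispatches immediately
q.try_dispatch(True)          # barrier packet must wait behind it
q.finish_pkt()                # last pending packet clears the barrier bit
```

This mirrors the commit's two key points: blocking happens only when compltnPending() > 0, and unblocking is driven by the completion path rather than the dispatch path.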
[gem5-dev] Change in gem5/gem5[develop]: configs: Add import for FileSystemConfig in GPU_VIPER.py

2020-08-18 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/32675 )


Change subject: configs: Add import for FileSystemConfig in GPU_VIPER.py
..

configs: Add import for FileSystemConfig in GPU_VIPER.py

GPU_VIPER.py uses FileSystemConfig to register CPUs and caches in SE
mode. Without the import, it crashes.

Change-Id: I539a4060d705f6e1b9a12aca7836eca271f61557
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/32675
Reviewed-by: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M configs/ruby/GPU_VIPER.py
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass



diff --git a/configs/ruby/GPU_VIPER.py b/configs/ruby/GPU_VIPER.py
index 967b4d3..50ccd2b 100644
--- a/configs/ruby/GPU_VIPER.py
+++ b/configs/ruby/GPU_VIPER.py
@@ -37,6 +37,7 @@
 from m5.util import addToPath
 from .Ruby import create_topology
 from .Ruby import send_evicts
+from common import FileSystemConfig

 addToPath('../')


--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/32675
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I539a4060d705f6e1b9a12aca7836eca271f61557
Gerrit-Change-Number: 32675
Gerrit-PatchSet: 3
Gerrit-Owner: Kyle Roarty 
Gerrit-Reviewer: Anthony Gutierrez 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bradford Beckmann 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-dev] Change in gem5/gem5[develop]: configs, gpu-compute: change default GPU reg allocator to dynamic

2022-03-18 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/57537 )


( 1 is the latest approved patch-set.
No files were changed between the latest approved patch-set and the
submitted one. )

Change subject: configs, gpu-compute: change default GPU reg allocator to dynamic

..

configs, gpu-compute: change default GPU reg allocator to dynamic

The current default GPU register allocator is the "simple" policy,
which only allows 1 wavefront to run at a time on each CU.  This is
not very realistic and also means the tester (when not specifically
choosing the dynamic policy) is less rigorous in terms of validating
correctness.

To resolve this, this commit changes the default to the "dynamic"
register allocator, which runs as many waves per CU as there are
space in terms of registers and other resources -- thus it is more
realistic and does a better job of ensuring test coverage.

Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57537
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
---
M configs/example/apu_se.py
1 file changed, 25 insertions(+), 1 deletion(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 532fb98..b5fb9ff 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -161,7 +161,7 @@
 ' m5_switchcpu pseudo-ops will toggle back and forth')
 parser.add_argument("--num-hw-queues", type=int, default=10,
 help="number of hw queues in packet processor")
-parser.add_argument("--reg-alloc-policy", type=str, default="simple",
+parser.add_argument("--reg-alloc-policy", type=str, default="dynamic",
 help="register allocation policy (simple/dynamic)")

 parser.add_argument("--dgpu", action="store_true", default=False,

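To see why the dynamic policy stresses the machine harder, compare the wavefront occupancy each policy allows on one CU. The helper and all parameter values below are invented for illustration; this is not gem5's allocator code:

```python
# Toy comparison of the two register-allocation policies.
# All parameter values are made up for illustration.
def waves_per_cu(policy, vgprs_per_cu=2048, vgprs_per_wave=64, hw_max_waves=32):
    if policy == "simple":
        return 1  # one wavefront at a time, regardless of free registers
    elif policy == "dynamic":
        # admit waves until the register file (or a hardware cap) is exhausted
        return min(vgprs_per_cu // vgprs_per_wave, hw_max_waves)
    raise ValueError(policy)

print(waves_per_cu("simple"))    # 1
print(waves_per_cu("dynamic"))   # 32: more concurrency, more interleavings tested
```

More concurrent wavefronts means more interleaved memory and synchronization traffic per CU, which is exactly the extra test coverage the commit message is after.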
--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57537


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ifca915130bb4f44da6a9ef896336138542b4e93e
Gerrit-Change-Number: 57537
Gerrit-PatchSet: 3
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Alexandru Duțu 
Gerrit-Reviewer: Bradford Beckmann 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby Bruce 
Gerrit-MessageType: merged

[gem5-dev] Change in gem5/gem5[develop]: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester

2022-03-23 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/57535 )


Change subject: tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester
..

tests,configs,mem-ruby: Handle num DMAs in GPU Ruby tester

Currently the GPU Ruby tester does not support requests returned
as aliased.  To get around this, the GPU Ruby tester needs
numDMAs to be 0.  To enable this, change the default value to allow
us to identify when a user wants more DMAs.

Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/57535
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matthew Poremba 
---
M configs/example/ruby_gpu_random_test.py
1 file changed, 30 insertions(+), 7 deletions(-)

Approvals:
  Matthew Poremba: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/ruby_gpu_random_test.py b/configs/example/ruby_gpu_random_test.py
index 0763454..029a97d 100644
--- a/configs/example/ruby_gpu_random_test.py
+++ b/configs/example/ruby_gpu_random_test.py
@@ -79,7 +79,7 @@
                     help="Random seed number. Default value (i.e., 0) means \
                     using runtime-specific value")
 parser.add_argument("--log-file", type=str, default="gpu-ruby-test.log")
-parser.add_argument("--num-dmas", type=int, default=0,
+parser.add_argument("--num-dmas", type=int, default=None,
                     help="The number of DMA engines to use in tester config.")

 args = parser.parse_args()
@@ -108,7 +108,7 @@
 args.wf_size = 1
 args.wavefronts_per_cu = 1
 args.num_cpus = 1
-args.num_dmas = 1
+n_DMAs = 1
 args.cu_per_sqc = 1
 args.cu_per_scalar_cache = 1
 args.num_compute_units = 1
@@ -117,7 +117,7 @@
 args.wf_size = 16
 args.wavefronts_per_cu = 4
 args.num_cpus = 4
-args.num_dmas = 2
+n_DMAs = 2
 args.cu_per_sqc = 4
 args.cu_per_scalar_cache = 4
 args.num_compute_units = 4
@@ -126,11 +126,19 @@
 args.wf_size = 32
 args.wavefronts_per_cu = 4
 args.num_cpus = 4
-args.num_dmas = 4
+n_DMAs = 4
 args.cu_per_sqc = 4
 args.cu_per_scalar_cache = 4
 args.num_compute_units = 8

+# Number of DMA engines
+if not(args.num_dmas is None):
+n_DMAs = args.num_dmas
+# currently the tester does not support requests returned as
+# aliased, thus we need num_dmas to be 0 for it
+if not(args.num_dmas == 0):
+print("WARNING: num_dmas != 0 not supported with VIPER")
+
 #
 # Set address range - 2 options
 #   level 0: small
@@ -173,9 +181,6 @@
 # For now we're testing only GPU protocol, so we force num_cpus to be 0
 args.num_cpus = 0

-# Number of DMA engines
-n_DMAs = args.num_dmas
-
 # Number of CUs
 n_CUs = args.num_compute_units


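The key idea in the diff above is the sentinel-default pattern: default=None lets the script distinguish "the user said nothing" (fall back to the per-test-level preset) from "the user explicitly asked for 0 DMAs". A standalone sketch of that pattern (only the --num-dmas option mirrors the script; the helper is hypothetical):

```python
import argparse

# Sentinel-default pattern: default=None lets us tell an unspecified
# option apart from an explicit user value -- including an explicit 0.
parser = argparse.ArgumentParser()
parser.add_argument("--num-dmas", type=int, default=None)

def resolve_dmas(args, preset):
    # start from the per-test-level preset, then let the user override it
    n_dmas = preset if args.num_dmas is None else args.num_dmas
    if n_dmas != 0:
        print("WARNING: num_dmas != 0 not supported with VIPER")
    return n_dmas

args = parser.parse_args([])                   # user said nothing: preset wins
assert resolve_dmas(args, preset=2) == 2
args = parser.parse_args(["--num-dmas", "0"])  # explicit 0 is respected
assert resolve_dmas(args, preset=2) == 0
```

With the old default of 0, an explicit "--num-dmas 0" and "no flag at all" were indistinguishable, so the test-level presets silently clobbered the user's choice.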
--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/57535


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0a31f66c831f0379544c15bd7364f185e1edb1b2
Gerrit-Change-Number: 57535
Gerrit-PatchSet: 5
Gerrit-Owner: Matt Sinclair 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Alexandru Duțu 
Gerrit-CC: Bradford Beckmann 
Gerrit-MessageType: merged

[gem5-dev] [S] Change in gem5/gem5[develop]: tests: cleanup m5out directly in weekly

2023-01-07 Thread Matt Sinclair (Gerrit) via gem5-dev
Matt Sinclair has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67198?usp=email )



Change subject: tests: cleanup m5out directly in weekly
..

tests: cleanup m5out directly in weekly

The weekly test script was implicitly assuming that no m5out
directory existed in the folder where the script was run.
However, if a prior test ran and failed, it would not clean up
its m5out directory, causing the weekly tests to fail.

This commit resolves this by removing the m5out directory before
trying to run any tests in the weekly script.  Moreover, we also
update the weekly script to explicitly remove this m5out directory
at the end of the script.

Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50
---
M tests/weekly.sh
1 file changed, 27 insertions(+), 1 deletion(-)



diff --git a/tests/weekly.sh b/tests/weekly.sh
index c7f834b..f218729 100755
--- a/tests/weekly.sh
+++ b/tests/weekly.sh
@@ -70,10 +70,11 @@

 # GPU weekly tests start here
 # before pulling gem5 resources, make sure it doesn't exist already
+# likewise, remove any lingering m5out folder
 docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
"${gem5_root}" --memory="${docker_mem_limit}" \
gcr.io/gem5-test/gcn-gpu:${tag} bash -c \
-   "rm -rf ${gem5_root}/gem5-resources"
+   "rm -rf ${gem5_root}/gem5-resources ${gem5_root}/m5out"
 # delete Pannotia datasets and output files in case a failed regression run left
 # them around
 rm -f coAuthorsDBLP.graph 1k_128k.gr result.out
@@ -383,5 +384,11 @@
    "${gem5_root}" --memory="${docker_mem_limit}" hacc-test-weekly bash -c \
"rm -rf ${gem5_root}/gem5-resources"

+# Delete the gem5 m5out folder we created -- need to do in docker because it
+# creates
+docker run --rm --volume "${gem5_root}":"${gem5_root}" -w \
+   "${gem5_root}" --memory="${docker_mem_limit}" hacc-test-weekly bash -c \
+   "rm -rf ${gem5_root}/m5out"
+
 # delete Pannotia datasets we downloaded and output files it created
 rm -f coAuthorsDBLP.graph 1k_128k.gr result.out

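The underlying principle of the fix is idempotent cleanup: remove stale output before the run, so a previous failure cannot poison this one, and again afterwards so nothing is left behind. A generic sketch of the pattern (directory names are illustrative; the real script does this via docker and rm -rf):

```python
import os
import shutil
import tempfile

# Idempotent cleanup: remove stale output before the run so a previous
# failure can't affect this one, and again after -- even on failure --
# so we leave nothing behind for the next run. Names are illustrative.
def run_regression(workdir, test):
    out_dir = os.path.join(workdir, "m5out")
    shutil.rmtree(out_dir, ignore_errors=True)       # pre-clean stale output
    try:
        os.makedirs(out_dir)
        return test(out_dir)
    finally:
        shutil.rmtree(out_dir, ignore_errors=True)   # post-clean, always runs

with tempfile.TemporaryDirectory() as wd:
    result = run_regression(wd, lambda out: "passed")
    assert not os.path.exists(os.path.join(wd, "m5out"))
```

The finally-block cleanup is the part the original weekly script was missing: a failed test exited before its rm, leaving m5out behind for the next invocation.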
--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/67198?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If10c59034528e171cc2c5dacb928b3a81d6b8c50
Gerrit-Change-Number: 67198
Gerrit-PatchSet: 1
Gerrit-Owner: Matt Sinclair 
Gerrit-MessageType: newchange

