[gem5-users] Re: Error when running test_bwd_bn test

2022-04-12 Thread David Fong via gem5-users
Hi Matt S,

Here's the info you requested.

Linux
uname -a
Linux xconfidentialx.com 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 
16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

HIP_PLATFORM
I found this below. Is it used by the Docker image?
gem5/util/dockerfiles/gcn-gpu/Dockerfile:-DHIP_COMPILER=clang 
-DHIP_PLATFORM=rocclr -DCMAKE_PREFIX_PATH="/opt/rocm"\
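For reference, the value baked into that Dockerfile line can be pulled out mechanically; a small sketch against the quoted line (the sed pattern is illustrative, not something from gem5):

```shell
# Extract the HIP_PLATFORM value from the Dockerfile flags quoted above
# (line content copied from gem5/util/dockerfiles/gcn-gpu/Dockerfile).
dockerfile_line='-DHIP_COMPILER=clang -DHIP_PLATFORM=rocclr -DCMAKE_PREFIX_PATH="/opt/rocm"'
platform=$(printf '%s\n' "$dockerfile_line" | sed -n 's/.*-DHIP_PLATFORM=\([a-z]*\).*/\1/p')
echo "$platform"   # rocclr
```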

David

From: Matt Sinclair
Sent: Tuesday, April 12, 2022 10:11 AM
To: David Fong ; gem5 users mailing list ; Kyle Roarty
Cc: Poremba, Matthew
Subject: RE: Error when running test_bwd_bn test

In general, yes, MIOpen is less optimized for APUs.  I do not recall seeing 
this before for bwd_bn, though.  @Kyle Roarty: have you seen this?  I'm 
wondering if something is missing in how we set HIP_PLATFORM in the Docker image?

I did some quick digging and it appears to be coming from here: 
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.x/src/ocl/gcn_asm_utils.cpp#L144
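That validation appears to boil down to probing the configured assembler; a rough, behavior-only sketch (the function name, probe, and grep pattern here are my guesses, not MIOpen's actual code):

```shell
# Approximate sketch of MIOpen's assembler validation: run the assembler
# and look for AMDGPU support in its output. Note that a failed spawn
# (e.g. the "sh: 1: Cannot fork" seen later in the log) also lands in
# the "does not support" branch, since no output is produced.
check_gcn_assembler() {
    if "$1" --version 2>/dev/null | grep -qiE 'amdgcn|amdgpu'; then
        echo "assembler supports AMDGPU"
    else
        echo "Specified assembler does not support AMDGPU"
    fi
}
```

Usage would be, e.g., `check_gcn_assembler clang` for the assembler the container configures.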

David, what OS are you running this on?  In theory, since you're using the 
Docker image, I wouldn't expect it to matter, but unless Kyle is also seeing this, 
that print appears to happen when MIOpen doesn't think you are running on 
Linux.  Or perhaps somewhere in your setup the compiler is set to something other 
than HCC/clang/rocclr?  What is HIP_PLATFORM set to in your setup?

Matt

From: David Fong <da...@chronostech.com>
Sent: Tuesday, April 12, 2022 10:53 AM
To: Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: Error when running test_bwd_bn test

Hi Matt S,

I'm using gfx801.
It proceeds and does not error out.
So it just means it's less optimized when running on the APU.
I don't get this message for my other three tests (test_fwd_softmax, 
test_bwd_softmax, test_fwd_pool), only for test_bwd_bn.
I guess there's some special function used in test_bwd_bn that is better 
optimized on the GPU.

David



From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Monday, April 11, 2022 6:00 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>; Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>
Subject: RE: Error when running test_bwd_bn test

Hi David,

My guess is you are using gfx801 for this?  If so, does the application 
actually error out at this point, or just proceed beyond it?  If it's the 
latter, my guess is MIOpen is just complaining that you're running with an APU, 
which it is less well optimized for.  If it's the former, then there may be 
something else in your setup we need to check.

Matt

From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Monday, April 11, 2022 1:29 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: David Fong <da...@chronostech.com>
Subject: [gem5-users] Error when running test_bwd_bn test

Hi,

When I run the DNNMark test_bwd_bn,

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 --gpu-to-dir-latency 
120 --TCC_latency 16 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn
 -c dnnmark_test_bwd_bn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

I get this error:

MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.

Does this mean the test will not run properly on the AMD GPU and I should ignore 
this test?
Or will the AMD CPU do the computations, meaning the test will take longer to 
complete?

David

Log lines before and after the error:
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring 
non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 
'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:704: warn: unimplemented 
ioctl: AMDKFD_IOC_ACQUIRE_VM

[gem5-users] Error when running test_bwd_bn test

2022-04-11 Thread David Fong via gem5-users
Hi,

When I run the DNNMark test_bwd_bn,

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 --gpu-to-dir-latency 
120 --TCC_latency 16 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_bwd_bn
 -c dnnmark_test_bwd_bn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/bn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

I get this error:

MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.

Does this mean the test will not run properly on the AMD GPU and I should ignore 
this test?
Or will the AMD CPU do the computations, meaning the test will take longer to 
complete?

David

Log lines before and after the error:
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring 
non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 
'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:704: warn: unimplemented 
ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1862: warn: mmap: writing to shared mmap 
region is currently unsupported. The write succeeds on the target, but it will 
not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:455: warn: Signal events are 
only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the 
requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall 
set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/syscall_emul.hh:2081: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not 
support AMDGPU. Expect performance degradation.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6


___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-04-06 Thread David Fong via gem5-users
[Truncated stats.txt histogram dump omitted; it ends with "# delay distribution for stores (Unspecified)".]

Thanks,

David


From: Bharadwaj, Srikant <srikant.bharad...@amd.com>
Sent: Wednesday, March 30, 2022 11:00 AM
To: David Fong <da...@chronostech.com>; gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>
Cc: Kyle Roarty <kroa...@wisc.edu>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,
loadLatencyDist and storeLatencyDist are good stats for looking at the average 
latency experienced by GPU loads and stores respectively.

Thanks,
Srikant
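For anyone following along, one way to pull those two means out of a run's stats.txt (the stat names are the ones mentioned above; the file path and numeric values in the sample are placeholders):

```shell
# Grep the load/store latency distribution means from a gem5 stats.txt.
# The sample file below stands in for a real <outdir>/stats.txt; the
# values are made up, except the allLatencyDist mean quoted in this thread.
cat > /tmp/sample_stats.txt <<'EOF'
system.cpu3.loadLatencyDist::mean 1234.5678
system.cpu3.storeLatencyDist::mean 2345.6789
system.cpu3.allLatencyDist::mean 91121342.881356
EOF
awk '$1 ~ /(load|store)LatencyDist::mean/ {print $1, $2}' /tmp/sample_stats.txt
```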

From: David Fong <da...@chronostech.com>
Sent: Wednesday, March 30, 2022 10:35 AM
To: gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

[CAUTION: External Email]
Hi,

Matt P has not replied in over a week and may be on vacation.
Can anyone else reply to my question regarding which stats to examine for 
reduced latency in stats.txt?

Thanks,

David




From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Wednesday, March 23, 2022 11:23 AM
To: gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>; David Fong <da...@chronostech.com>
Subject: [gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt P,

Any feedback for my 

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-04-06 Thread David Fong via gem5-users
[Truncated stats.txt histogram dump omitted; it ends with "# delay distribution for stores (Unspecified)".]

Thanks,

David


From: Bharadwaj, Srikant
Sent: Wednesday, March 30, 2022 11:00 AM
To: David Fong ; gem5 users mailing list ; Poremba, Matthew ; Matt Sinclair
Cc: Kyle Roarty
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,
loadLatencyDist and storeLatencyDist are good stats for looking at the average 
latency experienced by GPU loads and stores respectively.

Thanks,
Srikant

From: David Fong <da...@chronostech.com>
Sent: Wednesday, March 30, 2022 10:35 AM
To: gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

[CAUTION: External Email]
Hi,

Matt P has not replied in over a week and may be on vacation.
Can anyone else reply to my question regarding which stats to examine for 
reduced latency in stats.txt?

Thanks,

David




From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Wednesday, March 23, 2022 11:23 AM
To: gem5 users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>; David Fong <da...@chronostech.com>
Subject: [gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt P,

Any feedback for my question below regarding stats (stats.txt) to check for 
overall improvements due to reduced latency?

Thanks,

David

From: David Fong via gem5-users <gem5-users@gem5.org>
Sent: Monday, March 21, 2022 9:35 AM
To: Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>; David Fong <da...@chronostech.com>
Subject: [gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt P,

When I tried --reg-alloc-policy=dynamic, a few runs did not improve and in 
fact got worse.
For now, I will not use this option.
Maybe the driver is not optimized for this release.

I did update my runs to use

--gpu-to-dir-latency 100 (instead of 120)
--TCC_latency 12 (instead of 16)

and saw some runs with positive improvements and some with negative ones, but 
overall positive.

To determine the improvement, I used stats.txt and picked "allLatencyDist".
I was told not to use the individual "CUsX" latencies since that is too focused 
on one CU; one should look at the big picture.

system.cpu3.allLatencyDist::mean 91121342.881356

I chose "allLatencyDist::mean" because it showed a similar percentage change to 
"storeLatency" and "loadLatency".
The simulations did not complete earlier even with the shorter latencies, so I 
decided to look at the overall latency.

Which stats do you think should improve overall?

Thanks,

David
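One way to turn two such means into an "overall positive" number is a simple percent change; a sketch, where the baseline is the allLatencyDist::mean quoted in this thread and the "reduced" value is a made-up stand-in:

```shell
# Percent improvement in allLatencyDist::mean between two runs.
# "baseline" is the value quoted in the thread; "reduced" is invented
# purely for illustration.
baseline=91121342.881356
reduced=88500000.0
awk -v a="$baseline" -v b="$reduced" \
    'BEGIN { printf "%.1f%% lower mean latency\n", (a - b) / a * 100 }'
# prints "2.9% lower mean latency"
```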


From: Poremba, Matthew <matthew.pore...@amd.com>
Sent: Thursday, March 17, 2022 2:10 PM
To: David Fong <da...@chronostech.com>; Matt Sinclair <sincl...

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-04-01 Thread David Fong via gem5-users
Thanks, Srikant, for your reply.
I saw that for most tests the value of storeLatencyDist::mean improved 
(decreased) with reduced latency.
A few tests showed slightly increased latency.
I would expect all tests to show improvement (reduced latency).

Is there some explanation for that?
Are there some inaccuracies in the model or the driver?

David


From: Bharadwaj, Srikant
Sent: Wednesday, March 30, 2022 11:00 AM
To: David Fong ; gem5 users mailing list ; Poremba, Matthew ; Matt Sinclair
Cc: Kyle Roarty
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,
loadLatencyDist and storeLatencyDist are good stats for looking at the average 
latency experienced by GPU loads and stores respectively.

Thanks,
Srikant



From: Poremba, Matthew <matthew.pore...@amd.com>
Sent: Thursday, March 17, 2022 2:10 PM
To: David Fong <da...@chronostech.com>; Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn


[AMD Official Use Only]

These would be valid for both as they both use the same cache protocol files.  
I'm not very familiar with how dGPU is hacked up in SE mode to look like a 
dGPU...


-Matt

From: David Fong <da...@chronostech.com>
Sent: Thursday, March 17, 2022 9:57 AM
To: Poremba, Matthew <matthew.pore...@amd.com>; Matt Sinclair <sincl...@cs.wisc.edu>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

[CAUTION: External Email]
Hi Matt P,

Thanks for the tip on latency parameters.

Are these parameters valid ONLY for the dGPU with VRAM, or do they apply to 
both the dGPU and the APU?

David

From: Poremba, Matthew <matthew.pore...@amd.com>
Sent: Thursday, March 17, 2022 7:51 AM
To: Matt Sinclair <sincl...@cs.wisc.edu>; David Fong <da...

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-30 Thread David Fong via gem5-users
Hi,

Matt P has not replied in over a week and may be on vacation.
Can anyone else reply to my question regarding which stats to examine for 
reduced latency in stats.txt?

Thanks,

David





From: Poremba, Matthew <matthew.pore...@amd.com>
Sent: Thursday, March 17, 2022 7:51 AM
To: Matt Sinclair <sincl...@cs.wisc.edu>; David Fong <da...@chronostech.com>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn


[AMD Official Use Only]

Hi David,


I don't think these are the parameters you want to be changing if you are 
trying to change the VRAM memory latency, which it seems like you are based on 
the GDDR5 comment.  Those parameters are for the latency between the CUs seeing a 
memory request and the request leaving the global memory pipeline, I believe.  
They don't really have anything to do with the interconnect or the latency to 
VRAM.

I think the parameters you probably want are the latencies defined in the 
GPU_VIPER SLICC files:

  *   l2_request_latency / l2_response_latency in GPU_VIPER-TCC.sm

It looks like configs/ruby/GPU_VIPER.py exposes command-line parameters that 
correspond to these:

  *   --gpu-to-dir-latency / --TCC_latency
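For illustration, those two knobs would be appended to an apu_se.py invocation like the ones earlier in this thread; a sketch with assumed values (the command and numbers are examples, not a recommendation):

```shell
# Hypothetical: tack the VIPER latency knobs onto an existing apu_se.py
# command line (binary/config paths follow this thread; values assumed).
GEM5_CMD="gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py"
VIPER_LAT="--gpu-to-dir-latency 100 --TCC_latency 12"
echo "$GEM5_CMD $VIPER_LAT"
```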


-Matt

From: Matt Sinclair <sincl...@cs.wisc.edu>
Sent: Wednesday, March 16, 2022 10:41 PM
To: David Fong <da...@chronostech.com>; gem5 users mailing list <gem5-users@gem5.org>
Cc: Kyle Roarty <kroa...@wisc.edu>; Poremba, Matthew <matthew.pore...@amd.com>; Bharadwaj, Srikant <srikant.bharad...@amd.com>
Subject: RE: gem5 : X86 + GCN3 (g

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-23 Thread David Fong via gem5-users
Hi Matt P,

Any feedback on my question regarding which stats (in stats.txt) to check for 
overall improvements due to reduced latency?

Thanks,

David

From: David Fong via gem5-users 
Sent: Monday, March 21, 2022 9:35 AM
To: Poremba, Matthew ; Matt Sinclair 
; gem5 users mailing list 
Cc: Kyle Roarty ; Bharadwaj, Srikant 
; David Fong 
Subject: [gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt P,

When I tried the
--reg-alloc-policy=dynamic
a few runs did not improve and in fact got worse.
For now, I will not use this option.
Maybe the driver is not optimizing for this release.

I did update my runs to use

--gpu-to-dir-latency 100  (instead of 120)
--TCC_latency 12 (instead of 16)

And saw some with positive improvements and some with negative improvements.
But overall positive.

To determine the improvement, I used stats.txt and picked "allLatencyDist".
I was told not to use the individual per-CU latencies, since they focus on a
single CU; one should look at the big picture.

system.cpu3.allLatencyDist::mean 91121342.881356

I chose "allLatencyDist::mean" because its percentage change tracked
"storeLatency" and "loadLatency".
The simulations did not finish earlier even with the shorter latencies, so I
decided to use the overall latency as the metric.
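[Editor's aside: comparing two runs on this metric is easy to script. The sketch below is illustrative only; the stat name `allLatencyDist::mean` matches the stats.txt line quoted above, but the helper names are made up for this example.]

```python
import re

# Matches gem5 stats.txt lines like:
#   system.cpu3.allLatencyDist::mean 91121342.881356
MEAN_RE = re.compile(r"^(\S+\.allLatencyDist::mean)\s+([0-9.eE+-]+)")

def all_latency_means(stats_path):
    """Collect every per-component allLatencyDist::mean from a stats.txt."""
    means = {}
    with open(stats_path) as f:
        for line in f:
            m = MEAN_RE.match(line)
            if m:
                means[m.group(1)] = float(m.group(2))
    return means

def pct_change(baseline, tuned):
    """Percent change; negative means the latency dropped (an improvement)."""
    return 100.0 * (tuned - baseline) / baseline
```

Running `all_latency_means()` on the baseline and tuned stats.txt files and feeding matching keys to `pct_change()` gives the overall improvement per component.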

Which stats do you think should improve overall ?

Thanks,

David


From: Poremba, Matthew mailto:matthew.pore...@amd.com>>
Sent: Thursday, March 17, 2022 2:10 PM
To: David Fong mailto:da...@chronostech.com>>; Matt 
Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 users 
mailing list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Bharadwaj, Srikant 
mailto:srikant.bharad...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn


[AMD Official Use Only]

These would be valid for both, as they both use the same cache protocol files.
I'm not very familiar with how the GPU is hacked up in SE mode to look like a
dGPU...


-Matt

From: David Fong mailto:da...@chronostech.com>>
Sent: Thursday, March 17, 2022 9:57 AM
To: Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 users 
mailing list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Bharadwaj, Srikant 
mailto:srikant.bharad...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

[CAUTION: External Email]
Hi Matt P,

Thanks for the tip on latency parameters.

Are these parameters valid ONLY for DGPU with VRAM or these apply to both DGPU 
and APU ?

David

From: Poremba, Matthew mailto:matthew.pore...@amd.com>>
Sent: Thursday, March 17, 2022 7:51 AM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; David 
Fong mailto:da...@chronostech.com>>; gem5 users mailing 
list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Bharadwaj, Srikant 
mailto:srikant.bharad...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn


[AMD Official Use Only]

Hi David,


I don't think these are the parameters you want to be changing if you are 
trying to change the VRAM memory latency which it seems like you are based on 
the GDDR5 comment.  Those parameters are for the latency between CUs seeing a 
memory request and the request leaving the global memory pipeline, I believe.  
It doesn't really have anything to do with interconnect or the latency to VRAM 
memory.

I think the parameters you probably want are the latencies defined in the 
GPU_VIPER slicc files:

  *   l2_request_latency / l2_response_latency in GPU_VIPER-TCC.sm

It looks like in configs/ruby/GPU_VIPER.py there are some command line 
parameters for this which correspond to:

  *   --gpu-to-dir-latency / --TCC_latency
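[Editor's aside: a sketch of how those two overrides might be passed when launching apu_se.py. The flag names come from configs/ruby/GPU_VIPER.py as noted above; the wrapper function and default values (the tuned ones tried later in this thread) are this example's own.]

```python
def viper_latency_args(gpu_to_dir_latency=100, tcc_latency=12):
    """Build the apu_se.py flags that override the GPU_VIPER L2/TCC
    latencies (l2_request_latency / l2_response_latency in
    GPU_VIPER-TCC.sm)."""
    return ["--gpu-to-dir-latency", str(gpu_to_dir_latency),
            "--TCC_latency", str(tcc_latency)]

# Example use (paths are placeholders), e.g. with subprocess.run():
#   ["gem5/build/GCN3_X86/gem5.opt", "gem5/configs/example/apu_se.py",
#    *viper_latency_args(), ...]
```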


-Matt

From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Wednesday, March 16, 2022 10:41 PM
To: David Fong mailto:da...@chronostech.com>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Poremba, Matthew 
mailto:matthew.pore...@amd.com>>; Bharadwaj, Srikant 
mailto:srikant.bharad...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

[CAUTION: External Email]
Matt P or Srikant: can you please help David with the latency question?  You 
know the answers better than I do here.

Matt

From: David Fong mailto:da...@chronostech.com>>
Sent: Wednesday, March 16, 2022 5:47 PM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Poremba, Matthew 
mailto:matthew.pore...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi Matt S,

Thanks again for your quick reply with useful information.
I will rerun with --reg-alloc-policy=dynamic
in my mini regression to see if it makes a dif

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-21 Thread David Fong via gem5-users
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176
 which ones we run on a weekly basis.  I expect all of those to pass (although 
your comment seems to indicate that is not always true?).  Your issues are 
exposing that perhaps we need to test more of them beyond these 3 - perhaps on 
a quarterly basis or something though to avoid inflating the weekly runtime.  
Having said that, I have not run LRN in a long time, as some ML people told me 
that LRN was not widely used anymore.  But when I did run it, I do remember it 
requiring a large amount of memory - which squares with what you are seeing 
here.  I thought LRN needed --mem-size=32GB to run, but based on your message
it seems that is not the case.

@Matt P: have you tried LRN lately?  If so, have you run into the same 
OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait 
behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting a memory-related error for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got an "out of memory"
error:

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue:

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot allocate memory
build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes 
for range [0:0x180]!


A smaller number of CUs, such as 4, also hits the same type of error.

Is there a regression script or regression log for DNNMark that shows the
mem-size or configurations known to work, so I can use the same setup to run a
few DNNMark tests?
Only test_fwd_softmax and test_bwd_softmax work for CU counts in
{4, 8, 16, 32, 64, 128, 256}.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-17 Thread David Fong via gem5-users
tself is failing here (from such a large allocation).  
Since the failure is with a C++ mmap call itself, that is perhaps more 
problematic - is "Cannot allocate memory" the failure from the perror() call on 
the line above the fatal() print?

Regarding the other question, and the failures more generally: we have never 
tested with > 64 CUs before, so certainly you are stressing the system and 
encountering different kinds of failures than we have seen previously.

In terms of applications, I had thought most/all of them passed previously, but 
we do not test each and every one all the time because this would make our 
weekly regressions run for a very long time.  You can see here: 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176
 which ones we run on a weekly basis.  I expect all of those to pass (although 
your comment seems to indicate that is not always true?).  Your issues are 
exposing that perhaps we need to test more of them beyond these 3 - perhaps on 
a quarterly basis or something though to avoid inflating the weekly runtime.  
Having said that, I have not run LRN in a long time, as some ML people told me 
that LRN was not widely used anymore.  But when I did run it, I do remember it 
requiring a large amount of memory - which squares with what you are seeing 
here.  I thought LRN needed --mem-size=32GB to run, but based on your message
it seems that is not the case.

@Matt P: have you tried LRN lately?  If so, have you run into the same 
OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait 
behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting a memory-related error for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got an "out of memory"
error:

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue:

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot allocate memory
build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes 
for range [0:0x180]!


A smaller number of CUs, such as 4, also hits the same type of error.

Is there a regression script or regression log for DNNMark that shows the
mem-size or configurations known to work, so I can use the same setup to run a
few DNNMark tests?
Only test_fwd_softmax and test_bwd_softmax work for CU counts in
{4, 8, 16, 32, 64, 128, 256}.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-16 Thread David Fong via gem5-users
lrn

Hi Matt S.,

Thanks for the detailed reply.

I looked at the link you sent me for the weekly run.

I see an additional parameter which I didn't use:

--reg-alloc-policy=dynamic

What does this do ?

I was able to run the two other tests you use in your weekly runs : 
test_fwd_pool, test_bwd_bn
for CUs=4.

David


From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Monday, March 14, 2022 7:41 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>; Kyle 
Roarty mailto:kroa...@wisc.edu>>; Poremba, Matthew 
mailto:matthew.pore...@amd.com>>
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,

I have not seen this mmap error before, and my initial guess was the mmap error 
is happening because you are trying to allocate more memory than we created 
when mmap'ing the inputs for the applications (we do this to speed up SE mode, 
because otherwise initializing arrays can take several hours).  However, the 
fact that it is failing in physical.cc and not in the application itself is 
throwing me off there.  Looking at where the failure is occurring, it seems the 
backing store code itself is failing here (from such a large allocation).  
Since the failure is with a C++ mmap call itself, that is perhaps more 
problematic - is "Cannot allocate memory" the failure from the perror() call on 
the line above the fatal() print?
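[Editor's aside: the numbers in the quoted log are self-consistent. A quick sanity check, assuming gem5 parses "1536GB" as 1536 GiB (which the byte count below confirms): the mmap size in the fatal message is exactly the requested memory size, and the DRAM warning compares a 2 TiB assigned address range against an 8 GiB simulated device.]

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

# --mem-size 1536GB -> 1536 GiB, which is exactly the backing-store
# allocation that fails in physical.cc ("Could not mmap 1649267441664
# bytes"):
assert 1536 * GiB == 1_649_267_441_664

# The mem_interface.cc warning compares an 8192 MiB (8 GiB) DRAM device
# against a 2097152 MiB (2 TiB) assigned address range:
assert 8192 * MiB == 8 * GiB
assert 2097152 * MiB == 2 * 1024 * GiB
```

So the "Cannot allocate memory" is the host kernel refusing a ~1.6 TB anonymous mmap for the simulated memory's backing store, independent of the per-device DRAM capacity warning.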

Regarding the other question, and the failures more generally: we have never 
tested with > 64 CUs before, so certainly you are stressing the system and 
encountering different kinds of failures than we have seen previously.

In terms of applications, I had thought most/all of them passed previously, but 
we do not test each and every one all the time because this would make our 
weekly regressions run for a very long time.  You can see here: 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176
 which ones we run on a weekly basis.  I expect all of those to pass (although 
your comment seems to indicate that is not always true?).  Your issues are 
exposing that perhaps we need to test more of them beyond these 3 - perhaps on 
a quarterly basis or something though to avoid inflating the weekly runtime.  
Having said that, I have not run LRN in a long time, as some ML people told me 
that LRN was not widely used anymore.  But when I did run it, I do remember it 
requiring a large amount of memory - which squares with what you are seeing 
here.  I thought LRN needed --mem-size=32GB to run, but based on your message
it seems that is not the case.

@Matt P: have you tried LRN lately?  If so, have you run into the same 
OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait 
behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting a memory-related error for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got an "out of memory"
error:

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue:

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot

[gem5-users] Re: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-15 Thread David Fong via gem5-users
Hi Matt S.,

Thanks for the detailed reply.

I looked at the link you sent me for the weekly run.

I see an additional parameter which I didn't use:

--reg-alloc-policy=dynamic

What does this do ?

I was able to run the two other tests you use in your weekly runs : 
test_fwd_pool, test_bwd_bn
for CUs=4.

David


From: Matt Sinclair 
Sent: Monday, March 14, 2022 7:41 PM
To: gem5 users mailing list 
Cc: David Fong ; Kyle Roarty ; 
Poremba, Matthew 
Subject: RE: gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi David,

I have not seen this mmap error before, and my initial guess was the mmap error 
is happening because you are trying to allocate more memory than we created 
when mmap'ing the inputs for the applications (we do this to speed up SE mode, 
because otherwise initializing arrays can take several hours).  However, the 
fact that it is failing in physical.cc and not in the application itself is 
throwing me off there.  Looking at where the failure is occurring, it seems the 
backing store code itself is failing here (from such a large allocation).  
Since the failure is with a C++ mmap call itself, that is perhaps more 
problematic - is "Cannot allocate memory" the failure from the perror() call on 
the line above the fatal() print?

Regarding the other question, and the failures more generally: we have never 
tested with > 64 CUs before, so certainly you are stressing the system and 
encountering different kinds of failures than we have seen previously.

In terms of applications, I had thought most/all of them passed previously, but 
we do not test each and every one all the time because this would make our 
weekly regressions run for a very long time.  You can see here: 
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/tests/weekly.sh#176
 which ones we run on a weekly basis.  I expect all of those to pass (although 
your comment seems to indicate that is not always true?).  Your issues are 
exposing that perhaps we need to test more of them beyond these 3 - perhaps on 
a quarterly basis or something though to avoid inflating the weekly runtime.  
Having said that, I have not run LRN in a long time, as some ML people told me 
that LRN was not widely used anymore.  But when I did run it, I do remember it 
requiring a large amount of memory - which squares with what you are seeing 
here.  I thought LRN needed --mem-size=32GB to run, but based on your message
it seems that is not the case.

@Matt P: have you tried LRN lately?  If so, have you run into the same 
OOM/backing store failures?

I know Kyle R. is looking into your other failure, so this one may have to wait 
behind it from our end, unless Matt P knows of a fix.

Thanks,
Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Monday, March 14, 2022 4:38 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

Hi,

I'm getting a memory-related error for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got an "out of memory"
error:

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue:

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot allocate memory
build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes 
for range [0:0x180]!


Smaller number of CUs like 4 also have same type of

[gem5-users] gem5 : X86 + GCN3 (gfx801) + test_fwd_lrn

2022-03-14 Thread David Fong via gem5-users
Hi,

I'm getting a memory-related error for test_fwd_lrn.
When I increased the memory size from 4GB to 512GB, I got an "out of memory"
error:

build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:599: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:609: warn: unimplemented 
ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/mem_pool.cc:120: fatal: fatal condition freePages() <= 0 
occurred: Out of memory, please increase size of physical memory.

But once I increased the mem size to 1024GB, 1536GB, or 2048GB, I got this DRAM
device capacity issue:

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 1536GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_lrn
 -cdnnmark_test_fwd_lrn --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/lrn_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_gpu_cu256_run_dnnmark_test_fwd_lrn_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (2097152 Mbytes)
mmap: Cannot allocate memory
build/GCN3_X86/mem/physical.cc:231: fatal: Could not mmap 1649267441664 bytes 
for range [0:0x180]!


A smaller number of CUs, such as 4, also hits the same type of error.

Is there a regression script or regression log for DNNMark that shows the
mem-size or configurations known to work, so I can use the same setup to run a
few DNNMark tests?
Only test_fwd_softmax and test_bwd_softmax work for CU counts in
{4, 8, 16, 32, 64, 128, 256}.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-14 Thread David Fong via gem5-users
Hi Kyle,

Any workarounds for this issue?
Also, what are the other DNNMark tests I can try which don’t need special 
cmd-line settings?

Thanks,

David


From: David Fong via gem5-users 
Sent: Friday, March 11, 2022 9:23 AM
To: Matt Sinclair ; gem5 users mailing list 
; Kyle Roarty 
Cc: Matthew Poremba ; David Fong 

Subject: [gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

Hi Matt,
I’m not changing any NCHW sizes.
I don’t even know what this parameter does.
I’m guessing the invalid filter channel number is related to the file read 
problem.
David


From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Friday, March 11, 2022 9:15 AM
To: David Fong mailto:da...@chronostech.com>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>; Kyle 
Roarty mailto:kroa...@wisc.edu>>
Cc: Matthew Poremba mailto:matthew.pore...@amd.com>>
Subject: RE: [gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

Well it looks like things are progressing further with the larger memory size, 
so that’s good.

@Kyle Roarty<mailto:kroa...@wisc.edu>: can you please take a look at this?  My 
guess is that > 200 CUs is again calling a different file?  Because this 
warning:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801100.HIP.fdb.txt

Seems like the root of the problem.  My line numbers are off by 1 from David's,
but I'm guessing this is where the "real" failure is happening:

MIOPEN_CALL(miopenCreateTensorDescriptor(&desc_));

But I’m guessing that is failing because the kernel files are not being read 
properly.

@David: are you changing the NCHW sizes at all?  The other weird thing is the 
invalid filter channel number part.

Matt

From: David Fong mailto:da...@chronostech.com>>
Sent: Friday, March 11, 2022 11:08 AM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>
Cc: Kyle Roarty mailto:kroa...@wisc.edu>>; Matthew Poremba 
mailto:matthew.pore...@amd.com>>
Subject: RE: [gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

Hi Matt,

I added --mem-size 8GB, and some other messages and errors are showing up now.

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801100.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
MIOpen Error: 3 at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h:1057
Ticks: 264369621500

David

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 8GB --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu256_8GB_run_dnnmark_test_fwd_conv_40latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
tcmalloc: large alloc 1073741824 bytes == 0x55f2039c2000 @  0x7f2bdb141680 
0x7f2bdb161ff4 0x55f1bb897441 0x55f1bbf47e53 0x55f1bb316617 0x7f2bdb609718 
0x7f2bdb609afb 0x7f2bdb609dc0 0x7f2bdb3d5d6d 0x7f2bdb3ddef6 0x7f2bdb52becb 
0x7f2bdb6090f4 0x7f2bdb3d5d6d 0x7f2bdb3ddef6 0x7f2bdb52becb 0x7f2bdb52c252 
0x7f2bdb52c63f 0x7f2bdb530c81 0x7f2bdb5c0527 0x7f2bdb3d5d6d 0x7f2bdb3d746d 
0x7f2bdb3e106b 0x7f2bdb609810 0x55f1bb92ed14 0x55f1ba9956f6 0x7f2bda4db0b3 
0x55f1ba9b604e
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memo

[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-11 Thread David Fong via gem5-users
GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801100.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid 
filter channel number
MIOpen Error: 3 at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h:1057
Ticks: 264369621500
Exiting because  exiting with last active thread context


From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Thursday, March 10, 2022 6:02 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>; Kyle 
Roarty mailto:kroa...@wisc.edu>>; Matthew Poremba 
mailto:matthew.pore...@amd.com>>
Subject: Re: [gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

Just to be clear: --mem-size is an input arg for the apu_se.py script.

Matt
Sent from my iPhone

On Mar 10, 2022, at 7:44 PM, Matt Sinclair via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
 I am on my phone and thus cannot easily look at the line that failed at the 
moment, but my first step would be to increase the size of the memory gem5 is 
assuming: try --mem-size=8GB or 16GB and let us know if that solves the problem.

Matt
Sent from my iPhone
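[Editor's sketch of the suggestion above, assuming the apu_se.py invocation used later in this thread; the command is echoed rather than executed, since a real run needs a built gem5.opt and the gcr.io/gem5-test/gcn-gpu docker image, and MEM_SIZE/GEM5_CMD are illustrative names:]

```shell
# Sketch only: where the suggested --mem-size flag goes in the run command.
MEM_SIZE="16GB"   # try 8GB first; raise it if hipErrorOutOfMemory persists
GEM5_CMD="gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py \
--num-compute-units 256 -n3 --mem-size=${MEM_SIZE} -cdnnmark_test_fwd_conv"
# Echo instead of executing; run it inside the gcn-gpu docker image instead.
echo "${GEM5_CMD}"
```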

On Mar 10, 2022, at 5:12 PM, David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:

Hi,

I’m trying to run test_fwd_conv for gem5 with an X86 CPU and a GCN3 (gfx801) APU 
with 256 CUs, using gem5 v21.2.1.0 from git.

Linux> cd gem5/gem5-resources/src/gpu/DNNMark
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
Linux> docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx801 --num-cus=256
Linux> mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_256.ukdb 
gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801100.ukdb

Linux> vim gem5/build_opts/GCN3_X86
NUMBER_BITS_PER_SET = '256'

Linux> cd gem5
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt

Linux> cd ../../../../

Linux> docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

An error message occurred for the test:
HIP Error at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/data_manager.h:49
hipErrorOutOfMemory

How to fix this error?

David

MESSAGES SHORTENED
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .

[gem5-users] Re: gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-11 Thread David Fong via gem5-users
Just to be clear: --mem-size is an input arg for the apu_se.py script.

Matt
Sent from my iPhone


On Mar 10, 2022, at 7:44 PM, Matt Sinclair via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
 I am on my phone and thus cannot easily look at the line that failed at the 
moment, but my first step would be to increase the size of the memory gem5 is 
assuming: try --mem-size=8GB or 16GB and let us know if that solves the problem.

Matt
Sent from my iPhone


On Mar 10, 2022, at 5:12 PM, David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:

Hi,

I’m trying to run test_fwd_conv for gem5 with an X86 CPU and a GCN3 (gfx801) APU 
with 256 CUs, using gem5 v21.2.1.0 from git.

Linux> cd gem5/gem5-resources/src/gpu/DNNMark
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
Linux> docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx801 --num-cus=256
Linux> mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_256.ukdb 
gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801100.ukdb

Linux> vim gem5/build_opts/GCN3_X86
NUMBER_BITS_PER_SET = '256'

Linux> cd gem5
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt

Linux> cd ../../../../

Linux> docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

An error message occurred for the test:
HIP Error at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/data_manager.h:49
hipErrorOutOfMemory

How to fix this error?

David

MESSAGES SHORTENED
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting 
simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy 
updates recently became the responsibility of SLICC state machines. Make sure 
to setMRU() near callbacks in .sm files!
gem5 Simulator System.  
http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.2.1.0
gem5 compiled Mar 10 2022 21:44:19
gem5 started Mar 10 2022 22:25:08
gem5 executing on 84084e0cba7d, pid 1
command line: gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py 
--num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv '--options=-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin'

info: Standard input is not a terminal, disabling listeners.
Num SQC =  64 Num scalar caches =  64 Num CU =  256
incrementing idx on  4
incrementing idx on  8
incrementing idx on  12
. . .
incrementing idx on  248
incrementing idx on  252
"dot" with args ['-Tsvg', '/tmp/tmp7b3e5gva'] returned code: 1

stdout, stderr:
b''
b'Error: /tmp

[gem5-users] gem5 + multicore CPU example

2022-03-10 Thread David Fong via gem5-users
Hi,

Is there an example system Python script, like apu_se.py, that puts together a 
multicore CPU (quad- or octa-core) and creates the interconnect with approximate 
latencies between those cores?
If yes, where can I find this example model?

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] gem5 : X86 + GCN3 (gfx8001) + test_fwd_conv

2022-03-10 Thread David Fong via gem5-users
Hi,

I'm trying to run test_fwd_conv for gem5 with an X86 CPU and a GCN3 (gfx801) APU 
with 256 CUs, using gem5 v21.2.1.0 from git.

Linux> cd gem5/gem5-resources/src/gpu/DNNMark
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
Linux> docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx801 --num-cus=256
Linux> mv gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801_256.ukdb 
gem5/gem5-resources/src/gpu/DNNMark/cachefiles/gfx801100.ukdb

Linux> vim gem5/build_opts/GCN3_X86
NUMBER_BITS_PER_SET = '256'

Linux> cd gem5
Linux> docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt

Linux> cd ../../../../

Linux> docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

An error message occurred for the test:
HIP Error at 
/home/dfong/work/ext_ips/gem5-apu-cu256-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/data_manager.h:49
hipErrorOutOfMemory

How to fix this error?

David

MESSAGES SHORTENED
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting 
simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy 
updates recently became the responsibility of SLICC state machines. Make sure 
to setMRU() near callbacks in .sm files!
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.2.1.0
gem5 compiled Mar 10 2022 21:44:19
gem5 started Mar 10 2022 22:25:08
gem5 executing on 84084e0cba7d, pid 1
command line: gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py 
--num-compute-units 256 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv '--options=-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin'

info: Standard input is not a terminal, disabling listeners.
Num SQC =  64 Num scalar caches =  64 Num CU =  256
incrementing idx on  4
incrementing idx on  8
incrementing idx on  12
. . .
incrementing idx on  248
incrementing idx on  252
"dot" with args ['-Tsvg', '/tmp/tmp7b3e5gva'] returned code: 1

stdout, stderr:
b''
b'Error: /tmp/tmp7b3e5gva: syntax error in line 236909 scanning a quoted string 
(missing endquote? longer than 16384?)\nString 
starting:"clk_domain=system.ruby.clk_domain
\\eventq_index=0
\\latency=1\n'

build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
. . .
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall 
set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigaction(...)
  (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall 
rt_sigprocmask(...)
  (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ig

[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread David Fong via gem5-users
Thanks Kyle. I confirmed that your instructions work and my sim can run and 
generate stats.txt.
David

From: Matt Sinclair 
Sent: Wednesday, March 9, 2022 3:13 PM
To: Kyle Roarty ; David Fong ; gem5 
users mailing list ; Poremba, Matthew 

Subject: Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

Thanks Kyle!  Should we add a patch to address this then?

Matt

From: Kyle Roarty mailto:kroa...@wisc.edu>>
Sent: Wednesday, March 9, 2022 5:06 PM
To: David Fong mailto:da...@chronostech.com>>; Matt 
Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 users 
mailing list mailto:gem5-users@gem5.org>>; Poremba, 
Matthew mailto:matthew.pore...@amd.com>>
Subject: Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

For whatever reason, MIOpen looks for a different filename when the number of 
CUs is above 100. However, we didn't see this because we never tested with such 
a large number of CUs.

If you look in the cachefiles directory in the DNNMark folder, you'll see a 
couple of relevant files: gfx801_128.ukdb and gfx80180.ukdb. Rename 
gfx801_128.ukdb to gfx80180.ukdb and it'll work.

Kyle
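[Editor's note: the renames in this thread are consistent with MIOpen looking for the CU count encoded in hex with no underscore once it is large: 128 = 0x80 gives gfx80180, and 256 = 0x100 gives gfx801100, matching the mv step elsewhere in this archive. A sketch under that assumption; the hex pattern is an inference from these filenames, not documented MIOpen behavior, and `ukdb_name` is a hypothetical helper:]

```shell
# Hypothetical helper: derive the perf-db filename MIOpen appears to look for.
# The underscore-free hex form is inferred from the renames in this thread
# (gfx801_128 -> gfx80180, gfx801_256 -> gfx801100).
ukdb_name() {
  gfx="$1"; cus="$2"
  printf '%s%x.ukdb\n' "${gfx}" "${cus}"
}
ukdb_name gfx801 128   # prints gfx80180.ukdb
ukdb_name gfx801 256   # prints gfx801100.ukdb
```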

From: David Fong mailto:da...@chronostech.com>>
Sent: Wednesday, March 9, 2022 3:42 PM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>; Poremba, 
Matthew mailto:matthew.pore...@amd.com>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax


Nothing gets echoed out to the screen when I run this cmd-line with the 
--num-cus=128 option.



docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128



Is there some option to make it verbose?



From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Wednesday, March 9, 2022 1:36 PM
To: David Fong mailto:da...@chronostech.com>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>; Poremba, 
Matthew mailto:matthew.pore...@amd.com>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax



@Kyle Roarty<mailto:kroa...@wisc.edu>: I believe the only way to check that the 
number was substituted in is to watch the terminal when it's run, is that right?



I am not aware of 128 CUs not being supported, but I also haven't tried that 
many before either.



Matt



From: David Fong mailto:da...@chronostech.com>>
Sent: Wednesday, March 9, 2022 3:32 PM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>; Poremba, 
Matthew mailto:matthew.pore...@amd.com>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax



Hi Matt,



I used these command-line for generating the cachefiles.



gem5/gem5-resources/src/gpu/DNNMark/



docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP

docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make

docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128



Maybe the --num-cus=128 option is NOT supported?



How to confirm that --num-cus=128 is reflected in some file(s)?



Thanks,



David





From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Wednesday, March 9, 2022 1:13 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
Kyle Roarty mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax



That error in #2 means MIOpen can't find the kernel again.  Did you change the 
number of CUs to 128 (or whatever number of CUs you are using) when you 
generated the cachefiles?



Matt



From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Wednesday, March 9, 2022 12:50 PM
To: Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with 
DNNMark test_fwd_softmax



Hi Matt,



Thanks for your quick response.

The hack is not working.

  1.  I had to start from scratch or I get same error
  2.  After running the same steps + the hack before gem5 compile, I'm getting 
these error messages

build/GCN3_X86

[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread David Fong via gem5-users
Nothing gets echoed out to the screen when I run this cmd-line with the 
--num-cus=128 option.

docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128

Is there some option to make it verbose?

From: Matt Sinclair 
Sent: Wednesday, March 9, 2022 1:36 PM
To: David Fong ; gem5 users mailing list 
; Poremba, Matthew ; Kyle Roarty 

Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

@Kyle Roarty<mailto:kroa...@wisc.edu>: I believe the only way to check that the 
number was substituted in is to watch the terminal when it's run, is that right?

I am not aware of 128 CUs not being supported, but I also haven't tried that 
many before either.

Matt

From: David Fong mailto:da...@chronostech.com>>
Sent: Wednesday, March 9, 2022 3:32 PM
To: Matt Sinclair mailto:sincl...@cs.wisc.edu>>; gem5 
users mailing list mailto:gem5-users@gem5.org>>; Poremba, 
Matthew mailto:matthew.pore...@amd.com>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

Hi Matt,

I used these command-line for generating the cachefiles.

gem5/gem5-resources/src/gpu/DNNMark/

docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128

Maybe the --num-cus=128 option is NOT supported?

How to confirm that --num-cus=128 is reflected in some file(s)?

Thanks,

David


From: Matt Sinclair mailto:sincl...@cs.wisc.edu>>
Sent: Wednesday, March 9, 2022 1:13 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
Kyle Roarty mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

That error in #2 means MIOpen can't find the kernel again.  Did you change the 
number of CUs to 128 (or whatever number of CUs you are using) when you 
generated the cachefiles?

Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Wednesday, March 9, 2022 12:50 PM
To: Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with 
DNNMark test_fwd_softmax

Hi Matt,

Thanks for your quick response.
The hack is not working.

  1.  I had to start from scratch or I get same error
  2.  After running the same steps + the hack before gem5 compile, I'm getting 
these error messages
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find 
file: /tmp/miopen-MIOpenSoftmax.cl-96e7-d3d7-ce59-9759/MIOpenSoftmax.cl.o
MIOpen Error: 7 at 
/home/dfong/work/ext_ips/gem5-apu-cu128-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h:485
Ticks: 574458882500

Am I missing some other setting?

David

FULL MESSAGE WITH . . . TO REDUCE SIZE

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [

[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread David Fong via gem5-users
Hi Matt,

I used these command-line for generating the cachefiles.

gem5/gem5-resources/src/gpu/DNNMark/

docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128

Maybe the --num-cus=128 option is NOT supported?

How to confirm that --num-cus=128 is reflected in some file(s)?

Thanks,

David


From: Matt Sinclair 
Sent: Wednesday, March 9, 2022 1:13 PM
To: gem5 users mailing list ; Poremba, Matthew 
; Kyle Roarty 
Cc: David Fong 
Subject: RE: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

That error in #2 means MIOpen can't find the kernel again.  Did you change the 
number of CUs to 128 (or whatever number of CUs you are using) when you 
generated the cachefiles?

Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Wednesday, March 9, 2022 12:50 PM
To: Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with 
DNNMark test_fwd_softmax

Hi Matt,

Thanks for your quick response.
The hack is not working.

  1.  I had to start from scratch or I get same error
  2.  After running the same steps + the hack before gem5 compile, I'm getting 
these error messages
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find 
file: /tmp/miopen-MIOpenSoftmax.cl-96e7-d3d7-ce59-9759/MIOpenSoftmax.cl.o
MIOpen Error: 7 at 
/home/dfong/work/ext_ips/gem5-apu-cu128-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h:485
Ticks: 574458882500

Am I missing some other setting?

David

FULL MESSAGE WITH . . . TO REDUCE SIZE

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
build/GCN3_X86/base/remote_gdb.cc:381: warn: Sockets disabled, not accepting 
gdb connections
warn: dir_cntrl0.memory is deprecated. The request port for Ruby memory output 
to the main memory is now called `memory_out_port`
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
warn: failed to generate dot output from m5out/config.dot
build/GCN3_X86/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting 
simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy 
updates recently became the responsibility of SLICC state machines. Make sure 
to setMRU() near callbacks in .sm files!

[gem5-users] Re: gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-09 Thread David Fong via gem5-users
You will need to set `NUMBER_BITS_PER_SET = '128'`, or higher, and then recompile 
gem5.  As far as I know there is not a limit on the number of CUs.


-Matt
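[Editor's sketch of the edit-and-recompile step, assuming the build_opts file format shown earlier in this archive (`NUMBER_BITS_PER_SET = '256'`); the `demo_build_opts` directory is an illustrative stand-in so the edit is visible without a gem5 checkout, and the real target would be gem5/build_opts/GCN3_X86:]

```shell
# Create a demo copy of the build-options file and bump NUMBER_BITS_PER_SET.
mkdir -p demo_build_opts
printf "NUMBER_BITS_PER_SET = '64'\n" > demo_build_opts/GCN3_X86
# In a real tree this sed would target gem5/build_opts/GCN3_X86:
sed -i "s/NUMBER_BITS_PER_SET = '[0-9]*'/NUMBER_BITS_PER_SET = '128'/" \
  demo_build_opts/GCN3_X86
grep NUMBER_BITS_PER_SET demo_build_opts/GCN3_X86
# Then recompile, e.g. inside the gcn-gpu docker image:
#   scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
```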

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Tuesday, March 8, 2022 3:51 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark 
test_fwd_softmax

Hi,

I built gem5 with X86 and an APU (gfx801) with CUs=128 to run DNNMark 
test_fwd_softmax; the steps and the message output from the run are shown below.

Is there a limitation on the number of CUs (compute units) for the APU (gfx801), or 
do I need to add the number of compute units (128) on one of the cmd-lines 
below?

Thanks,

David



git clone https://gem5.googlesource.com/public/gem5
git clone https://gem5.googlesource.com/public/gem5-resources gem5/gem5-resources

# COMPILE DNNMARK TESTS
cd gem5/gem5-resources/src/gpu/DNNMark
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128
g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data
./generate_rand_data
# BUILD GEM5
cd ../../../..
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
# RUN TEST
cd ../
docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-

[gem5-users] gem5 : X86 + APU (gfx801) with CUs128 error with DNNMark test_fwd_softmax

2022-03-08 Thread David Fong via gem5-users
Hi,

I built gem5 with X86 and an APU (gfx801) with 128 CUs to run DNNMark 
test_fwd_softmax; the steps and the output from the run are shown below.

Is there a limit on the number of CUs (compute units) for the APU (gfx801), or 
do I need to add the number of compute units (128) to one of the command lines 
below?

Thanks,

David



git clone https://gem5.googlesource.com/public/gem5
git clone https://gem5.googlesource.com/public/gem5-resources 
gem5/gem5-resources

# COMPILE DNNMARK TESTS
cd gem5/gem5-resources/src/gpu/DNNMark
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 ./setup.sh HIP
docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 make
docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py 
cachefiles.csv --gfx-version=gfx801 --num-cus=128
g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data
./generate_rand_data
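All of the steps above share the same docker invocation shape: bind-mount the working directory into the container at the same path and run as the host user so build artifacts stay owned by you. A sketch of that pattern (the helper name `docker_cmd` is made up for illustration, not part of gem5's tooling):

```python
import os

def docker_cmd(image, workdir, extra_mounts=(), command=("make",)):
    """Assemble a docker invocation like the ones above: bind-mount the
    working directory, run as the host user, execute `command` inside."""
    cmd = ["docker", "run", "--rm",
           "-v", f"{workdir}:{workdir}"]          # mount workdir at same path
    for host, guest in extra_mounts:              # e.g. the MIOpen cachefiles
        cmd += ["-v", f"{host}:{guest}"]
    cmd += ["-w", workdir,
            "-u", f"{os.getuid()}:{os.getgid()}", # run as host user
            image]
    cmd += list(command)
    return cmd
```

For example, the `make` step corresponds roughly to `docker_cmd("gcr.io/gem5-test/gcn-gpu:v21-2", os.getcwd() + "/build")`.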
# BUILD GEM5
cd ../../../..
docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
# RUN TEST
cd ../
docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --num-compute-units 128 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_apu_cu128_run_dnnmark_test_fwd_softmax_50latency.log
Global frequency set at 1 ticks per second
build/GCN3_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (512 Mbytes)
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
. . .
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/GCN3_X86/base/statistics.hh:280: warn: One of the stats is a legacy stat. 
Legacy stat is a stat that does not belong to any statistics::Group. Legacy 
stat is deprecated.
. . .
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
Forcing maxCoalescedReqs to 32 (TLB assoc.)
. . .
build/GCN3_X86/base/statistics.hh:280: warn: One of the stats is a legacy stat. 
Legacy stat is a stat that does not belong to any statistics::Group. Legacy 
stat is deprecated.
build/GCN3_X86/mem/ruby/common/Set.hh:214: fatal: Number of bits(64) < size 
specified(65). Increase the number of bits and recompile.
Memory Usage: 2359940 Kbytes
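The fatal at the end of the log comes from Ruby's fixed-width Set bitset: with 128 CUs, the number of machines a Set must track (65 here) exceeds the 64 bits it was compiled with, so gem5 asks you to raise the bit count and recompile. A minimal sketch of the guard (names are illustrative; this is not gem5's actual code):

```python
NUMBER_BITS = 64  # compile-time bitset width implied by the error message

def check_set_size(size, bits=NUMBER_BITS):
    """Mimic the guard behind 'Number of bits(64) < size specified(65)'."""
    if bits < size:
        raise RuntimeError(
            f"Number of bits({bits}) < size specified({size}). "
            "Increase the number of bits and recompile.")

check_set_size(64)    # 64 machines fit in a 64-bit word
# check_set_size(65)  # raises, as in the log above
```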

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] gem5 : x86 + VEGA DGPU (gfx900) with test_fwd_conv error

2022-03-07 Thread David Fong via gem5-users
Hi,

I'm trying to run DNNMark with x86 + VEGA DGPU (gfx900) with test_fwd_conv.

I'm getting this warning and error.
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: 
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx900_4.HIP.fdb.txt
MIOpen Error: 3 at 
/home/dfong/work/ext_ips/gem5-vega-gpu-dnn1/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h:1057
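Judging from the name gfx900_4.HIP.fdb.txt, MIOpen's find-db files appear to encode the target as <gfx-arch>_<num-CUs>.HIP.fdb.txt. A helper sketch under that inferred naming assumption (the pattern is read off this log, not taken from MIOpen documentation):

```python
def fdb_name(gfx_arch, num_cus):
    """Inferred MIOpen find-db filename: e.g. 'gfx900_4.HIP.fdb.txt'
    for gfx900 with 4 CUs."""
    return f"{gfx_arch}_{num_cus}.HIP.fdb.txt"
```

This suggests the unreadable file corresponds to a gfx900 configuration with 4 CUs; a mismatch between the CU count used for generate_cachefiles.py and the simulated GPU would look for a differently named file.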

Is there something wrong with my command line for running the test_fwd_conv 
test, or do I need to update a file?

David

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/VEGA_X86/gem5.opt 
gem5/configs/example/apu_se.py --mem-size 4GB --dgpu --gfx-version=gfx900 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
 -cdnnmark_test_fwd_conv --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin" |& tee 
gem5_vega_dgpu_run_dnnmark_test_fwd_conv_40latency_0307.log
Global frequency set at 1 ticks per second
build/VEGA_X86/mem/mem_interface.cc:791: warn: DRAM device capacity (8192 
Mbytes) does not match the address range assigned (4096 Mbytes)
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (5) does not divide 
range [1:75] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:10] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (2) does not divide 
range [1:64] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/statistics.hh:280: warn: One of the stats is a legacy stat. 
Legacy stat is a stat that does not belong to any statistics::Group. Legacy 
stat is deprecated.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buckets. Rounding up.
build/VEGA_X86/base/stats/storage.hh:279: warn: Bucket size (1) does not 
divide range [1:1.6e+06] into equal-sized buck

[gem5-users] Re: gem5 + DGPU (GCN3) build error

2022-03-04 Thread David Fong via gem5-users
Hi Matt,

Thanks for your quick response and support.

I'll try the gfx803 (dGPU) first and later the VEGA_X86 version.

FYI, I'm trying to adjust a latency parameter to see improvements in the stats.

gem5/build/GCN3_X86/gpu-compute/GPU.py

mem_req_latency = Param.Int(50, "Latency for request from the cu to ruby. "\
                            "Represents the pipeline to reach the TCP "\
                            "and specified in GPU clock cycles")
mem_resp_latency = Param.Int(50, "Latency for responses from ruby to the "\
                             "cu. Represents the pipeline between the "\
                             "TCP and cu as well as TCP data array "\
                             "access. Specified in GPU clock cycles")
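Both parameters are specified in GPU clock cycles, so their wall-clock effect depends on the GPU clock configured for the run. A quick sketch of the conversion (the 1 GHz clock below is an assumption for illustration, not gem5's default):

```python
def cycles_to_ns(cycles, gpu_clock_ghz):
    """Convert a latency given in GPU clock cycles to nanoseconds."""
    return cycles / gpu_clock_ghz

# At an assumed 1 GHz GPU clock, the default 50-cycle request latency is 50 ns.
request_ns = cycles_to_ns(50, 1.0)
```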

Which stat numbers for the CUs should I expect to improve?

Thanks,

David

From: Poremba, Matthew 
Sent: Friday, March 4, 2022 10:10 AM
To: David Fong ; gem5 users mailing list 
; Bobby Bruce ; Matt Sinclair 
; Kyle Roarty 
Subject: RE: [gem5-users] Re: gem5 + DGPU (GCN3) build error


[AMD Official Use Only]

Hi David,


gfx801 is an APU (e.g., "Carrizo") and gfx803 is a dGPU (e.g., the RX 4xx/5xx 
series).  From a gem5 perspective, they are basically set up differently in how 
memory is laid out.
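As a summary of the gfx versions discussed in this thread (descriptions taken from the messages above; illustrative, not an exhaustive list of AMD targets):

```python
# gfx version -> (device type, notes), as described in this thread
GFX_VERSIONS = {
    "gfx801": ("APU",  "e.g. Carrizo; default flow for apu_se.py"),
    "gfx803": ("dGPU", "e.g. RX 4xx/5xx series; GCN3 build"),
    "gfx900": ("dGPU", "Vega; requires the VEGA_X86 build and --dgpu"),
}

def is_dgpu(gfx):
    """True when the gfx version is a discrete GPU in the table above."""
    return GFX_VERSIONS[gfx][0] == "dGPU"
```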

Vega was recently added in 21.2 but from my notes not all of the DNNMark 
kernels are working and require docker changes to fix. Anyway you can try by 
generally replacing GCN3_X86 with VEGA_X86 and adding --gfx-version=gfx900 
--dgpu to application runs:

docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID 
gcr.io/gem5-test/gcn-gpu:v21-2 scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt

docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w 
${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py -n3 
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5-resources/src/gpu/DNNMark/mmap.bin" --gfx-version=gfx900 --dgpu
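Following the advice above to generally replace GCN3_X86 with VEGA_X86, the run command can be assembled like this; note that the gem5.opt path presumably needs to switch to VEGA_X86 as well (a sketch under that assumption, with the DNNMark-specific options omitted):

```python
def vega_run_cmd(build="VEGA_X86", gfx="gfx900", num_cpus=3):
    """Core gem5 arguments for a Vega dGPU run, per the instructions above.
    DNNMark benchmark-root/options flags would be appended as in the thread."""
    return [
        f"gem5/build/{build}/gem5.opt",   # binary path follows the build name
        "gem5/configs/example/apu_se.py",
        f"-n{num_cpus}",
        f"--gfx-version={gfx}",
        "--dgpu",
    ]

cmd = vega_run_cmd()
```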


-Matt

From: David Fong mailto:da...@chronostech.com>>
Sent: Friday, March 4, 2022 10:01 AM
To: Poremba, Matthew mailto:matthew.pore...@amd.com>>; 
gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Bobby Bruce mailto:bbr...@ucdavis.edu>>; Matt Sinclair 
mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Subject: RE: [gem5-users] Re: gem5 + DGPU (GCN3) build error


[AMD Official Use Only]

[CAUTION: External Email]
Hi Matt,

I used gfx801 and it ran ok.
What's the difference between gfx801 and gfx803 ?

Yes. I'm trying the DGPU flow with VEGA (GCN5).
Is this supported for the 21.2.1.0 release with DNNMark?

Please send instructions on how to compile and run DNNMark for VEGA.

Thanks,

David


From: Poremba, Matthew mailto:matthew.pore...@amd.com>>
Sent: Friday, March 4, 2022 9:55 AM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Bobby Bruce mailto:bbr...@ucdavis.edu>>; Matt Sinclair 
mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: RE: [gem5-users] Re: gem5 + DGPU (GCN3) build error


[AMD Official Use Only]

Hi,


I don't know if this is what is causing this specific forking problem, but 
gfx900 is VEGA not GCN3.  There is a separate build for VEGA.  If you want GCN3 
dGPU you want gfx803.


-Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Friday, March 4, 2022 9:34 AM
To: Bobby Bruce mailto:bbr...@ucdavis.edu>>; gem5 users 
mailing list mailto:gem5-users@gem5.org>>; Matt Sinclair 
mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] Re: gem5 + DGPU (GCN3) build error

[CAUTION: External Email]
Hi Bobby,

Thanks for your reply.
I tried to rebuild in new directory and rerun same steps.  Same results with 
error.

In your regression testing, did you run with the "--dgpu" and 
"--gfx-version=gfx900" options?
Maybe "--dgpu" requires some other code or options?

The default flow with the APU (no --dgpu, --gfx-version=gfx801) can run DNNMark 
with no problem.

David


From: Bobby Bruce mailto:bbr...@ucdavis.edu>>
Sent: Thursday, March 3, 2022 6:43 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Matt Sinclair mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: Re: [gem5-users] gem5 + DGPU (GCN3) build error

I think, based on the error I'm seeing here, your build is creating tmp files 
in the container, which are deleted after DNNMark

[gem5-users] Re: gem5 + DGPU (GCN3) build error

2022-03-04 Thread David Fong via gem5-users
Hi Matt,

I used gfx801 and it ran ok.
What's the difference between gfx801 and gfx803 ?

Yes. I'm trying the DGPU flow with VEGA (GCN5).
Is this supported for the 21.2.1.0 release with DNNMark?

Please send instructions on how to compile and run DNNMark for VEGA.

Thanks,

David


From: Poremba, Matthew 
Sent: Friday, March 4, 2022 9:55 AM
To: gem5 users mailing list ; Bobby Bruce 
; Matt Sinclair ; Kyle Roarty 

Cc: David Fong 
Subject: RE: [gem5-users] Re: gem5 + DGPU (GCN3) build error


[AMD Official Use Only]

Hi,


I don't know if this is what is causing this specific forking problem, but 
gfx900 is VEGA not GCN3.  There is a separate build for VEGA.  If you want GCN3 
dGPU you want gfx803.


-Matt

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Friday, March 4, 2022 9:34 AM
To: Bobby Bruce mailto:bbr...@ucdavis.edu>>; gem5 users 
mailing list mailto:gem5-users@gem5.org>>; Matt Sinclair 
mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] Re: gem5 + DGPU (GCN3) build error

[CAUTION: External Email]
Hi Bobby,

Thanks for your reply.
I tried to rebuild in new directory and rerun same steps.  Same results with 
error.

In your regression testing, did you run with the "--dgpu" and 
"--gfx-version=gfx900" options?
Maybe "--dgpu" requires some other code or options?

The default flow with the APU (no --dgpu, --gfx-version=gfx801) can run DNNMark 
with no problem.

David


From: Bobby Bruce mailto:bbr...@ucdavis.edu>>
Sent: Thursday, March 3, 2022 6:43 PM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>; 
Matt Sinclair mailto:sincl...@cs.wisc.edu>>; Kyle Roarty 
mailto:kroa...@wisc.edu>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: Re: [gem5-users] gem5 + DGPU (GCN3) build error

I think, based on the error I'm seeing here, your build is creating tmp files 
in the container, which are deleted after DNNMark is built and the docker 
container is discarded. These are, for some reason, needed in the run and 
cannot be found. Did you follow the README here for DNNMark and follow it 
exactly? 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/.
 I admit building and running the GPU code can be tricky as we're heavily 
dependent on the docker images and things can easily go wrong.

Matt or Kyle: do either of you have any idea what's going wrong here?

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: 
https://www.bobbybruce.net


On Wed, Mar 2, 2022 at 4:46 PM David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
Hi,

I built gem5 + DGPU (GCN3) (gfx900) and ran 

[gem5-users] Re: gem5 + DGPU (GCN3) build error

2022-03-04 Thread David Fong via gem5-users
Hi Bobby,

Thanks for your reply.
I tried to rebuild in new directory and rerun same steps.  Same results with 
error.

In your regression testing, did you run with the "--dgpu" and 
"--gfx-version=gfx900" options?
Maybe "--dgpu" requires some other code or options?

The default flow with the APU (no --dgpu, --gfx-version=gfx801) can run DNNMark 
with no problem.

David

From: Bobby Bruce 
Sent: Thursday, March 3, 2022 6:43 PM
To: gem5 users mailing list ; Matt Sinclair 
; Kyle Roarty 
Cc: David Fong 
Subject: Re: [gem5-users] gem5 + DGPU (GCN3) build error

I think, based on the error I'm seeing here, your build is creating tmp files 
in the container, which are deleted after DNNMark is built and the docker 
container is discarded. These are, for some reason, needed in the run and 
cannot be found. Did you follow the README here for DNNMark and follow it 
exactly? 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/<https://urldefense.proofpoint.com/v2/url?u=https-3A__gem5.googlesource.com_public_gem5-2Dresources_-2B_refs_heads_stable_src_gpu_DNNMark_&d=DwMFaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=OkH-8nM02VdNPRt_miVO36vI9580zW1SgNQ4MzWRfqc&m=aBkzz8UWgg6cGJOqO3QnvVOSrQN0fZg7T_jM-f-YQSc&s=waZIPhDvJYcIbOYvZAGH0pL63ezFEBPqm8wLIJL4QUE&e=>.
 I admit building and running the GPU code can be tricky as we're heavily 
dependent on the docker images and things can easily go wrong.

Matt or Kyle: do either of you have any idea what's going wrong here?

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: 
https://www.bobbybruce.net


On Wed, Mar 2, 2022 at 4:46 PM David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
Hi,

I built gem5 + DGPU (GCN3) (gfx900) and ran DNNMark with this command-line

[gem5-resources]  docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2
 python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx900 --num-cus=4

[gem5-gpu-dnn] docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2
 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py --dgpu 
--gfx-version=gfx900 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

and got this error message :

build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find 
file: /tmp/miopen-MIOpenSoftmax.cl-9c04-5b2f-4076-0450/MIOpenSoftmax.cl.o
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
MIOpen Error: 7 at 
/home/dfong/work/ext_ips/gem5-gpu-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h:485
Ticks: 571357584000
Exiting because  exiting with last active thread context

Am I missing a step in compilation process or some other settings ?

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org<mailto:gem5-users@gem5.org>
To unsubscribe send an email to 
gem5-users-le...@gem5.org<mailto:gem5-users-le...@g

[gem5-users] Re: gem5 + GCN3 questions

2022-03-04 Thread David Fong via gem5-users
Hi Srikant,

Thank you for the detailed reply.

I understand that the overall CU instruction numbers should be the same.

The main question: if the latency is reduced via "mem_req_latency" and 
"mem_resp_latency", which stat(s) would indicate the reduction in latency?

Are these two the only relevant ones, or are there other stats more directly 
correlated with reduced latency?


system.cpu3.CUs2.headTailLatency::mean   68651.850962 (50)

system.cpu3.CUs2.headTailLatency::mean   64891.258333  (40)



system.cpu3.CUs2.headTailLatency::stdev   157090.173635 (50)

system.cpu3.CUs2.headTailLatency::stdev   155057.054245 (40)
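For what it's worth, diffing the two stats.txt files programmatically makes it 
easy to see which counters actually move. A minimal sketch (it assumes the 
usual gem5 "name  value  # description" stats.txt layout; the stat names in 
the demo are just the ones quoted above):

```python
import re

def read_stats(text):
    """Parse gem5 stats.txt-style lines of the form '<name>  <value>  # ...'."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+)\s+([-+]?[0-9.eE]+)\b", line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats

def relative_change(run_a_text, run_b_text, names):
    """Relative change (b - a) / a for each named stat present in both runs."""
    a, b = read_stats(run_a_text), read_stats(run_b_text)
    return {n: (b[n] - a[n]) / a[n] for n in names if n in a and n in b}

# Tiny demo with the headTailLatency means quoted above (50- vs 40-cycle run).
run50 = "system.cpu3.CUs2.headTailLatency::mean   68651.850962\n"
run40 = "system.cpu3.CUs2.headTailLatency::mean   64891.258333\n"
delta = relative_change(run50, run40, ["system.cpu3.CUs2.headTailLatency::mean"])
print(delta)  # negative value -> the mean head-tail latency dropped
```

Run over the full 50- and 40-cycle stats.txt files, this should quickly show 
which counters are correlated with the latency change.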

Thanks,

David



From: Bharadwaj, Srikant 
Sent: Thursday, March 3, 2022 6:49 PM
To: gem5 users mailing list 
Cc: David Fong 
Subject: RE: gem5 + GCN3 questions


[AMD Official Use Only]

Hi David,

The instructions executed in each CU can change from run to run depending on 
the workgroup scheduler. So in your runs it is possible that CU2 is getting 
different workgroups in each run - which explains your observations #1, #2, and 
#3. You would want to look at the total number of instructions executed by the 
GPU (sum over all CUs) to ensure that the total count remains the same.
Further, in certain cases the total instruction count of the GPU can also 
differ if there are atomic operations within the workload. Changing the latency 
could mean that some CUs spin longer (and thus execute more instructions).
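As a sketch, that GPU-wide total can be summed straight out of stats.txt text 
(this assumes the system.cpuN.CUsM.numInstrExecuted naming shown in your 
excerpts; the demo values are made up):

```python
import re

# Sum numInstrExecuted over every CU so per-CU workgroup-scheduling noise
# cancels out; only the GPU-wide total is expected to stay constant.
CU_INST_RE = re.compile(r"^\s*system\.\S*?CUs\d+\.numInstrExecuted\s+(\d+)")

def total_gpu_instructions(stats_text):
    return sum(int(m.group(1))
               for line in stats_text.splitlines()
               if (m := CU_INST_RE.match(line)))

demo = ("system.cpu3.CUs0.numInstrExecuted   71000\n"
        "system.cpu3.CUs1.numInstrExecuted   70000\n"
        "system.cpu3.CUs2.numInstrExecuted   71838\n")
print(total_gpu_instructions(demo))  # 212838
```

Comparing this total between the 50- and 40-cycle runs is the apples-to-apples 
check.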

For your observation #4, it could be possible that the bottleneck is somewhere 
else in the system, thus changing the mem_req/resp_latency is not resulting in 
any performance difference.

Thanks,
Srikant

From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Friday, February 25, 2022 1:47 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 + GCN3 questions

[CAUTION: External Email]
Hi,

For those familiar with gem5 + GCN3 simulations, I need some answers to 
questions.

I downloaded and followed instructions at

https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/

to build gem5 + GCN3.

I ran the DNN test : test_fwd_softmax

With two runs to check how latency affects the runs

Original value for mem_req_latency and mem_resp_latency was 50, but I also ran 
with 40.
gem5/build/GCN3_X86/gpu-compute/GPU.py
mem_req_latency = Param.Int(40, "Latency for request from the cu to ruby. "\
"Represents the pipeline to reach the TCP "\
"and specified in GPU clock cycles")
mem_resp_latency = Param.Int(40, "Latency for responses from ruby to the "\
 "cu. Represents the pipeline between the "\
 "TCP and cu as well as TCP data array "\
 "access. Specified in GPU clock cycles")

  1.  Why does stats.txt for "40" show reduced number of instructions ?

system.cpu3.CUs2.numInstrExecuted   71838(50)

system.cpu3.CUs2.numInstrExecuted   69075(40)



  2.  Does the GPU kernel perform optimizations on the instructions due to less 
waiting time?



  3.  Do some stats like below make sense (first line is "50", second line is 
"40")?



system.cpu3.CUs2.ScheduleStage.dispNrdyStalls::Ready   347264

system.cpu3.CUs2.ScheduleStage.dispNrdyStalls::Ready   351634



system.cpu3.CUs2.instCyclesLdsPerSimd::0  800

system.cpu3.CUs2.instCyclesLdsPerSimd::0  696



system.cpu3.CUs2.tlbRequests   104000

system.cpu3.CUs2.tlbRequests   10



system.cpu3.CUs2.tlbCycles10101232000

system.cpu3.CUs2.tlbCycles  9141288000



system.cpu3.CUs2.numInstrExecuted   71838

system.cpu3.CUs2.numInstrExecuted   69075



system.cpu3.CUs2.headTailLatency::mean   68651.850962

system.cpu3.CUs2.headTailLatency::mean   64891.258333



system.

[gem5-users] gem5 with Torus XY NOC

2022-03-03 Thread David Fong via gem5-users
Hi,

Under gem5 or gem5-resources, is there a Torus XY NoC configuration that has 
been tested and doesn't deadlock?

Thanks,

David


___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

[gem5-users] gem5 + DGPU (GCN3) build error

2022-03-02 Thread David Fong via gem5-users
Hi,

I built gem5 + DGPU (GCN3) (gfx900) and ran DNNMark with this command-line

[gem5-resources]  docker run --rm -v ${PWD}:${PWD} 
-v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} 
gcr.io/gem5-test/gcn-gpu:v21-2 python3 generate_cachefiles.py cachefiles.csv 
--gfx-version=gfx900 --num-cus=4

[gem5-gpu-dnn] docker run --rm -v ${PWD}:${PWD} -v 
${PWD}/gem5/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v21-2 gem5/build/GCN3_X86/gem5.opt 
gem5/configs/example/apu_se.py --dgpu --gfx-version=gfx900 -n3 
--benchmark-root=gem5/gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
 -cdnnmark_test_fwd_softmax --options="-config 
gem5/gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap 
gem5/gem5-resources/src/gpu/DNNMark/mmap.bin"

and got this error message :

build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
sh: 1: Cannot fork
MIOpen Error: /root/driver/MLOpen/src/hipoc/hipoc_program.cpp:195: Cant find 
file: /tmp/miopen-MIOpenSoftmax.cl-9c04-5b2f-4076-0450/MIOpenSoftmax.cl.o
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mbind(...)
MIOpen Error: 7 at 
/home/dfong/work/ext_ips/gem5-gpu-dnn/gem5/gem5-resources/src/gpu/DNNMark/core/include/dnn_wrapper.h:485
Ticks: 571357584000
Exiting because  exiting with last active thread context

Am I missing a step in compilation process or some other settings ?

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] gem5 cpu-tests and benchmarks documentation

2022-03-01 Thread David Fong via gem5-users
Hi,

Is there documentation on the website that lists the groups of tests and 
benchmarks that are precompiled and can be downloaded for each CPU type (like 
x86, arm)?

Like these cmd-lines to get Bubblesort for each CPU type (x86, arm)?

wget dist.gem5.org/dist/v21-2/test-progs/cpu-tests/bin/x86/Bubblesort
wget dist.gem5.org/dist/v21-2/test-progs/cpu-tests/bin/arm/Bubblesort

I found some specific cpu-tests, like the ones above, on this webpage, but no 
additional information about further tests:

https://www.gem5.org/documentation/learning_gem5/part1/extending_configs

Thanks,

David


___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: gem5 with garnet3.0 and x86 examples

2022-03-01 Thread David Fong via gem5-users
Hi Srikant,

Thank you for the instructions to build the topology.

But then how do I build a gem5 system with this network topology that can 
execute AI applications like DNNs on CUs (compute units) within this network 
topology?

I can see

configs/example/fs.py

and it looks like one can build a full system and connect the NoC topology 
to it.

How does an AI benchmark application like a DNN (deep neural network) get 
compiled for a specific CPU ISA and mapped onto the full system, including the 
NoC topology, for execution?

David


From: Bharadwaj, Srikant 
Sent: Monday, February 28, 2022 5:08 PM
To: gem5 users mailing list 
Cc: David Fong 
Subject: RE: gem5 with garnet3.0 and x86 examples


[Public]

Hi David,
We don't have working examples with CPU and garnet3.0. But the general idea is 
to create network configurations using the topology files(configs/topologies).
The general methodology to build topologies in garnet3.0 is as follows:

  1.  Identify nodes and cache controllers

 *   You can do this using node.type (See configs/topologies/Mesh_XY.py for 
example)
 *   Nodes can be cache controllers like L1/L2 as well as directories, 
memory controllers, etc.
 *   Example:
if self.nodes[0].type == "L2Cache_Controller":
    l2_cntrls.append(self.nodes[0])

  2.  Build routers - configure their clock domain, supported flit size, 
latency, etc.

 *   Example:
router_domain_0 = SrcClockDomain(clock='4GHz', voltage_domain=VoltageDomain(voltage=options.sys_voltage))
routers.append(Router(router_id=0, latency=1, clk_domain=router_domain_0, width=8))

router_domain_1 = SrcClockDomain(clock='2GHz', voltage_domain=VoltageDomain(voltage=options.sys_voltage))
routers.append(Router(router_id=1, latency=1, clk_domain=router_domain_1, width=16))

  3.  Connect the routers using internal and external links - configure their 
clock domain, supported flit size, latency, SerDes, clock domain crossing, etc.

 *   Internal links are unidirectional and connect two routers
 *   External links are bidirectional and connect a node to a router.
 *   Example:
link_domain_0 = SrcClockDomain(clock='4GHz', voltage_domain=VoltageDomain(voltage=options.sys_voltage))
ext_links.append(ExtLink(link_id=0, ext_node=l2_cntrls[0], int_node=routers[0],
                         width=32, clk_domain=link_domain_0, int_serdes=True,
                         latency=1, weight=1))

link_domain_1 = SrcClockDomain(clock='4GHz', voltage_domain=VoltageDomain(voltage=options.sys_voltage))
int_links.append(IntLink(link_id=1, src_node=routers[0], dst_node=routers[1],
                         width=16, clk_domain=link_domain_1, dst_cdc=True,
                         src_serdes=True, latency=1, weight=1))
int_links.append(IntLink(link_id=2, src_node=routers[1], dst_node=routers[0],
                         width=16, clk_domain=link_domain_1, src_cdc=True,
                         dst_serdes=True, latency=1, weight=1))

You can follow examples in configs/topologies/ for building topologies. Hope 
this helps.

Thanks,
Srikant


From: David Fong via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Monday, February 28, 2022 4:33 PM
To: David Fong via gem5-users mailto:gem5-users@gem5.org>>
Cc: David Fong mailto:da...@chronostech.com>>
Subject: [gem5-users] gem5 with garnet3.0 and x86 examples

[CAUTION: External Email]
Hi,

Are there working examples in latest gem5 v21.2.1 with garnet3.0 and x86 (or 
ARM or RISCV) cpus with AI  applications like DNN ?
If not, please explain the flow on how to achieve this.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] gem5 with garnet3.0 and x86 examples

2022-02-28 Thread David Fong via gem5-users
Hi,

Are there working examples in latest gem5 v21.2.1 with garnet3.0 and x86 (or 
ARM or RISCV) cpus with AI  applications like DNN ?
If not, please explain the flow on how to achieve this.

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] gem5 + GCN3 questions

2022-02-25 Thread David Fong via gem5-users
Hi,

For those familiar with gem5 + GCN3 simulations, I need some answers to 
questions.

I downloaded and followed instructions at

https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/

to build gem5 + GCN3.

I ran the DNN test : test_fwd_softmax

With two runs to check how latency affects the runs

Original value for mem_req_latency and mem_resp_latency was 50, but I also ran 
with 40.
gem5/build/GCN3_X86/gpu-compute/GPU.py
mem_req_latency = Param.Int(40, "Latency for request from the cu to ruby. "\
"Represents the pipeline to reach the TCP "\
"and specified in GPU clock cycles")
mem_resp_latency = Param.Int(40, "Latency for responses from ruby to the "\
 "cu. Represents the pipeline between the "\
 "TCP and cu as well as TCP data array "\
 "access. Specified in GPU clock cycles")

  1.  Why does stats.txt for "40" show reduced number of instructions ?

system.cpu3.CUs2.numInstrExecuted   71838(50)

system.cpu3.CUs2.numInstrExecuted   69075(40)



  2.  Does the GPU kernel perform optimizations on the instructions due to less 
waiting time?



  3.  Do some stats like below make sense (first line is "50", second line is 
"40")?



system.cpu3.CUs2.ScheduleStage.dispNrdyStalls::Ready   347264

system.cpu3.CUs2.ScheduleStage.dispNrdyStalls::Ready   351634



system.cpu3.CUs2.instCyclesLdsPerSimd::0  800

system.cpu3.CUs2.instCyclesLdsPerSimd::0  696



system.cpu3.CUs2.tlbRequests   104000

system.cpu3.CUs2.tlbRequests   10



system.cpu3.CUs2.tlbCycles10101232000

system.cpu3.CUs2.tlbCycles  9141288000



system.cpu3.CUs2.numInstrExecuted   71838

system.cpu3.CUs2.numInstrExecuted   69075



system.cpu3.CUs2.headTailLatency::mean   68651.850962

system.cpu3.CUs2.headTailLatency::mean   64891.258333



system.cpu3.CUs2.headTailLatency::stdev   157090.173635

system.cpu3.CUs2.headTailLatency::stdev   155057.054245



  4.  The runtime is the same.

Is there a way to end simulation based upon completion of all instructions 
instead of a fixed time ?

This would be another way for me to know that the run with latency = 40 should 
end sooner.



-- Begin Simulation Statistics -- "50"

simSeconds   0.126230   # 
Number of seconds simulated (Second)

simTicks 126229990500   # 
Number of ticks simulated (Tick)

finalTick126229990500   # 
Number of ticks from beginning of simulation (restored from checkpoints and 
never reset) (Tick)

simFreq  1   # 
The number of ticks per simulated second ((Tick/Second))

hostSeconds199.21   # 
Real time elapsed on the host (Second)

hostTickRate633665289   # 
The number of ticks simulated per host second (ticks/s) ((Tick/Second))

hostMemory3596200   # 
Number of bytes of host memory used (Byte)

simInsts 38011242   # 
Number of instructions simulated (Count)

simOps   72305276   # 
Number of ops (including micro ops) simulated (Count)

hostInstRate   190811   # 
Simulator instruction rate (inst/s) ((Count/Second))

hostOpRate 362962   # 
Simulator op (including micro ops) rate (op/s) ((Count/Second))



-- Begin Simulation Statistics -- "40"

simSeconds   0.126230   # 
Number of seconds simulated (Second)

simTicks 126229990500   # 
Number of ticks simulated (Tick)

finalTick126229990500   # 
Number of ticks from beginning of simulation (restored from checkpoints and 
never reset) (Tick)

simFreq  1   # 
The number of ticks per simulated second ((Tick/Second))

hostSeconds199.32   # 
Real time elapsed on the host (Second)

hostTickRate633294503   # 
The number of ticks simulated per host second (ticks/s) ((Tick/Second))

hostMemory3598508   # 
Number of bytes of host memory used (Byte)

simInsts 

[gem5-users] Re: Not able to access webpage to run_npb.py

2022-02-22 Thread David Fong via gem5-users
https://www.gem5.org/documentation/gem5art/main/faq.
The results of your tests will be stored both on your file system, and the 
gem5art database. The result files include normal gem5 results files like 
stats.txt (which has the performance statistics about your simulation run) and 
some other gem5art related files like info.json (which will contain some high 
level information about your gem5 run).

Hope this helps!
Thanks,
-Ayaz


On Fri, Feb 18, 2022 at 11:40 AM David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
Hi Bobby,

Thanks for your recommendations.

We will stick to X86 to test the flow for NPB tests and adjust to ARM when 
needed.

But I have a few questions about the flow.

From just a user perspective and NOT a developer and following this webpage 
instructions:

https://www.gem5.org/documentation/gem5art/tutorials/npb-tutorial


  1.  It seems like I don’t need to do the “Setting up the environment” since I 
don’t plan to create or modify npb-tests.

I thought the npb tests are already on the disk image from “Creating a disk 
image” section.

Please confirm.

If the npb-tests repo is necessary, please explain “your-remote-add”.  Is this 
on my host machine, and does it need to be accessible as a web page on my local 
hard disk?

git remote add origin 
https://your-remote-add/npb-tests.git



  2.  To run one test, I just use, from the gem5 directory:

`./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py 
--benchmark ep --size A`


  3.  To run a suite of NPB benchmark tests, I can create launch_npb_tests.py.

This file seems to rebuild everything from scratch and run a regression.

As a first step, I'd prefer not to get that sophisticated until I have a few 
simple tests running.



Could I run, in sequential order, something like the commands below?

`./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py 
--benchmark ep --size A`

`./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py 
--benchmark bt --size A`

`./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py 
--benchmark cg --size A`

`./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py 
--benchmark ft --size A`
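Something like this shell loop is what I have in mind (a dry-run sketch: it 
only echoes each command; I'd drop the `echo` to actually launch the runs 
back to back):

```shell
#!/bin/sh
# Dry run: print the gem5 command for each NPB workload in sequence.
# Remove the "echo" to actually launch the simulations one after another.
for bench in ep bt cg ft; do
  echo "./build/X86/gem5.opt configs/example/gem5_library/x86-npb-benchmarks.py --benchmark ${bench} --size A"
done
```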


  4.  Extract NPB performance data

I’m not familiar with celery and but familiar with python.

Do I need to install celery on host system ?

Which directory and how to extract the NPB run data which is on the virtual 
machine over to the host machine ?

What kind of statistics are output : runtime of test , latencies of certain 
paths ?

Thanks,

David




From: Bobby Bruce mailto:bbr...@ucdavis.edu>>
Sent: Thursday, February 17, 2022 12:11 PM
To: David Fong mailto:da...@chronostech.com>>
Cc: gem5 users mailing list mailto:gem5-users@gem5.org>>
Subject: Re: [gem5-users] Re: Not able to access webpage to run_npb.py

gem5-X is a fork of gem5, which as far as I can tell, diverged from gem5 in the 
middle of 2018. gem5art was built on a version of gem5 in 2020-2021. While I 
can't say anything for certain, I wouldn't be surprised if you run into some 
difficulties getting this all to work perfectly.

Are you going to use ARM? We do provide NPB images and Linux kernels, but they 
target X86. If you're set on using ARM you'll need to make your own. 
Instructions on building a kernel for gem5 can be found here: 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/linux-kernel/,
 and info on creating a disk image containing NPB can be found here: 
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/

[gem5-users] ModuleNotFoundError: No module named 'gem5'

2022-02-18 Thread David Fong via gem5-users
Hi,

I got the error

ModuleNotFoundError: No module named 'gem5'

After I followed the steps to build and run a test for gem5.

### downloaded gem5 and created gem5.opt based upon the v21.1.0.2 version

git clone --recursive -b v21.1.0.2 https://gem5.googlesource.com/public/gem5/

virtualenv -p python3 venv
source venv/bin/activate

# and build m5

cd gem5/util/m5/
scons build/x86/out/m5
cd ../../

scons build/X86/gem5.opt

### and other disk image steps

cd disk-image/npb
git clone https://github.com/darchr/npb-hooks.git

cd ../disk-image/
wget https://releases.hashicorp.com/packer/1.4.3/packer_1.4.3_linux_amd64.zip
unzip packer_1.4.3_linux_amd64.zip
./packer validate npb/npb.json
./packer build npb/npb.json

# and built the kernel
cd ~/work/ext_ips
git clone --branch v4.19.83 --depth 1 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
mv linux linux-stable
cd linux-stable

cp -p ../config.4.19.83 .config
make -j8
cp vmlinux vmlinux-4.19.83

### run simple test
cd gem5

./build/X86/gem5.opt configs/example/x86-npb-benchmarks.py --benchmark ep 
--size A
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.1.0.2
gem5 compiled Feb 18 2022 14:14:39
gem5 started Feb 18 2022 14:18:54
gem5 executing on sundial.chronostech.com, pid 29766
command line: ./build/X86/gem5.opt configs/example/x86-npb-benchmarks.py 
--benchmark ep --size A

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "build/X86/python/m5/main.py", line 455, in main
    exec(filecode, scope)
  File "configs/example/x86-npb-benchmarks.py", line 47, in <module>
    from gem5.utils.requires import requires
ModuleNotFoundError: No module named 'gem5'

I'm getting error message above.

Did I miss a compilation step, download the wrong version, or something else?

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org

[gem5-users] Re: Not able to access webpage to run_npb.py

2022-02-18 Thread David Fong via gem5-users
`https://gem5.googlesource.com/public/gem5-resources`
 and look at the configs provided under `src/npb`. That will definitely give 
you something to start with.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: 
https://www.bobbybruce.net


On Thu, Feb 17, 2022 at 11:45 AM David Fong 
mailto:da...@chronostech.com>> wrote:
Hi Bobby,

I’m trying to modify my gem5-X setup to add the NPB performance tests.

My gem5-X setup doesn’t have a configs/example/gem5_library.

I believe the github repo for the  gem5-X build didn’t add the gem5_library 
directory and files.

I could use the gem5 setup from the NPB tutorial, but then I'll not be able to 
do architectural exploration with gem5-X.
I would prefer to stick to gem5-X and add on the NPB tests.

This is our cmd-line to build the gem5-X
`./build/ARM/gem5.fast --remote-gdb-port=0 -d log_dir configs/example/fs.py 
--cpu-clock=1GHz --kernel=vmlinux --machine-type=VExpress_GEM5_V1 
--dtb-file=/home/dfong/work/ext_ips/gem5-X/system/arm/dt/armv8_gem5_v1_1cpu.dtb 
-n 1 --disk-image=gem5_ubuntu16.img --caches --l2cache --l1i_size=32kB 
--l1d_size=32kB --l2_size=1MB --l2_assoc=2 --mem-type=DDR4_2400_4x16 
--mem-ranks=4 --mem-size=4GB --sys-clock=1600MHz`

What do you recommend I do?

Thanks,

David


From: Bobby Bruce via gem5-users 
mailto:gem5-users@gem5.org>>
Sent: Thursday, February 17, 2022 11:16 AM
To: gem5 users mailing list mailto:gem5-users@gem5.org>>
Cc: Bobby Bruce mailto:bbr...@ucdavis.edu>>
Subject: [gem5-users] Re: Not able to access webpage to run_npb.py

Hey David,

Sorry about the trouble you're running into. It seems the gem5art tutorial on 
the website has become a bit outdated. We've updated gem5-resources in the last 
release and clearly this has broken some links. I'll make sure updating this is 
prioritized.

I have two ways you can run NPB. The first is quite simple and is basically 
what Jason said, there's a script in 
`configs/example/gem5_library/x86-npb-benchmarks.py` which you can execute with 
gem5 and run NPB. This should work: `./build/X86/gem5.opt 
configs/example/gem5_library/x86-npb-benchmarks.py --benchmark ep --size A` 
(warning, this script assumes you're running gem5 on an X86 host with KVM). 
This approach is using our gem5 stdlib, a tutorial to which can be found here: 
https://www.gem5.org/documentation/gem5-stdlib/overview

The second way is to checkout the gem5-resources repo to the state it was at 
the last release. The tag you want is v21.1.0.2, 
https://gem5.googlesource.com/public/gem5-resources/+/refs/tags/v21.1.0.2/.
 This should contain the configs. (If you do this, I think it'd be best to 
checkout the gem5 repo to an earlier release as well.)

Kind regards,
Bobby
--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: 
https://www.bobbybruce.net


On Thu, Feb 17, 2022 at 10:43 AM David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
Hi,

I’m going through the steps to create the npb environment.

https://www.gem5.org/documentation/gem5art/tutorials/npb-tutorial

[gem5-users] Re: Not able to access webpage to run_npb.py

2022-02-17 Thread David Fong via gem5-users
Hi Bobby,

I’m trying to modify my gem5-X setup to add the NPB performance tests.

My gem5-X setup doesn’t have a configs/example/gem5_library.

I believe the GitHub repo for the gem5-X build didn’t add the gem5_library 
directory and files.

I could use the gem5 setup from the NPB tutorial, but then I wouldn’t be able 
to do architectural exploration with gem5-X.
I would prefer to stick with gem5-X and add on the NPB tests.

This is our command line to run gem5-X:
`./build/ARM/gem5.fast --remote-gdb-port=0 -d log_dir configs/example/fs.py 
--cpu-clock=1GHz --kernel=vmlinux --machine-type=VExpress_GEM5_V1 
--dtb-file=/home/dfong/work/ext_ips/gem5-X/system/arm/dt/armv8_gem5_v1_1cpu.dtb 
-n 1 --disk-image=gem5_ubuntu16.img --caches --l2cache --l1i_size=32kB 
--l1d_size=32kB --l2_size=1MB --l2_assoc=2 --mem-type=DDR4_2400_4x16 
--mem-ranks=4 --mem-size=4GB --sys-clock=1600MHz`

What do you recommend I should do ?

Thanks,

David


From: Bobby Bruce via gem5-users 
Sent: Thursday, February 17, 2022 11:16 AM
To: gem5 users mailing list 
Cc: Bobby Bruce 
Subject: [gem5-users] Re: Not able to access webpage to run_npb.py

Hey David,

Sorry about the trouble you're running into. It seems the gem5art tutorial on 
the website has become a bit outdated. We've updated gem5-resources in the last 
release and clearly this has broken some links. I'll make sure updating this is 
prioritized.

I have two ways you can run NPB. The first is quite simple and is basically 
what Jason said: there's a script in 
`configs/example/gem5_library/x86-npb-benchmarks.py` which you can execute with 
gem5 to run NPB. This should work: `./build/X86/gem5.opt 
configs/example/gem5_library/x86-npb-benchmarks.py --benchmark ep --size A` 
(warning: this script assumes you're running gem5 on an X86 host with KVM). 
This approach uses our gem5 stdlib, a tutorial for which can be found here: 
https://www.gem5.org/documentation/gem5-stdlib/overview

The second way is to check out the gem5-resources repo to the state it was at 
the last release. The tag you want is v21.1.0.2: 
https://gem5.googlesource.com/public/gem5-resources/+/refs/tags/v21.1.0.2/. 
This should contain the configs. (If you do this, I think it'd be best to 
check out the gem5 repo to an earlier release as well.)
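The checkout steps above can be sketched roughly as follows. The gem5-resources tag name is from the email above; using the same v21.1.0.2 tag for the gem5 repo itself is an assumption based on the matching release naming. DRY_RUN=1 only prints the commands instead of touching the network; set it to 0 to actually run them.

```shell
# Sketch of checking out gem5-resources (and gem5) at the v21.1.0.2 release tag.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"          # dry run: just show the command
  else
    "$@"                 # real run: execute it
  fi
}

run git clone https://gem5.googlesource.com/public/gem5-resources
run git -C gem5-resources checkout v21.1.0.2

# Check out gem5 itself to the matching release so the configs line up
# (assumed tag name; verify with `git tag` in your gem5 checkout).
run git clone https://gem5.googlesource.com/public/gem5
run git -C gem5 checkout v21.1.0.2
```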

Kind regards,
Bobby
--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: 
https://www.bobbybruce.net


On Thu, Feb 17, 2022 at 10:43 AM David Fong via gem5-users 
mailto:gem5-users@gem5.org>> wrote:
Hi,

I’m going through the steps to create the npb environment.

https://www.gem5.org/documentation/gem5art/tutorials/npb-tutorial

gem5 run scripts

Next, we need to add gem5 run scripts. We will do that in a folder named 
configs-npb-tests. Get the run script named run_npb.py from 
here <https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/run_npb.py>, 
and other system configuration files from 
here <https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/system/>.

[gem5-users] Not able to access webpage to run_npb.py

2022-02-17 Thread David Fong via gem5-users
Hi,

I'm going through the steps to create the npb environment.

https://www.gem5.org/documentation/gem5art/tutorials/npb-tutorial

gem5 run scripts

Next, we need to add gem5 run scripts. We will do that in a folder named 
configs-npb-tests. Get the run script named run_npb.py from 
here (https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/run_npb.py), 
and other system configuration files from 
here (https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/system/).


I'm not able to access the link to "run_npb.py".

https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb/configs/run_npb.py

I get an error (screenshot attached).

Does anyone else have this problem, and is there a workaround?
Is there another location to download "run_npb.py"?
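One possible workaround sketch: since the broken link points into gem5-resources, clone the repo and copy the files from a local checkout instead of the web view. The v21.1.0.2 tag is an assumption taken from Bobby Bruce's reply in this thread, and the configs-npb-tests folder name follows the tutorial. DRY_RUN=1 only prints the commands.

```shell
# Sketch: fetch run_npb.py and the system configs from a local clone
# rather than the (broken) gem5.googlesource.com web links.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"          # dry run: just show the command
  else
    "$@"                 # real run: execute it
  fi
}

run git clone https://gem5.googlesource.com/public/gem5-resources
run git -C gem5-resources checkout v21.1.0.2   # assumed tag, per the thread
run mkdir -p configs-npb-tests
run cp gem5-resources/src/npb/configs/run_npb.py configs-npb-tests/
run cp -r gem5-resources/src/npb/configs/system configs-npb-tests/
```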

Thanks,

David

___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org