I've done a git bisect and found the problematic commit.

bb94296373dde1d0ce971ee58ad111f4225c425e is the first bad commit
commit bb94296373dde1d0ce971ee58ad111f4225c425e
Author: Joe Gross <joseph.gr...@amd.com>
Date:   Mon Jul 20 09:15:18 2015 -0500

    mem-ruby: Fixed pipeline squashes caused by aliased requests

    This patch was created by Bihn Pham during his internship at AMD.

    This patch fixes a very significant performance bug when using the O3
    CPU model and Ruby. The issue was Ruby returned false when it received
    a request to the same address that already has an outstanding request or
    when the memory is blocked. As a result, O3 unnecessary squashed the
    pipeline and re-executed instructions. This fix merges readRequestTable
    and writeRequestTable in Sequencer into a single request table that
    keeps track of all requests and allows multiple outstanding requests to
    the same address. This prevents O3 from squashing the pipeline.

    Change-Id: If934d57b4736861e342de0ab18be4feec464273d
    Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21219
    Reviewed-by: Anthony Gutierrez <anthony.gutier...@amd.com>
    Maintainer: Anthony Gutierrez <anthony.gutier...@amd.com>
    Tested-by: kokoro <noreply+kok...@google.com>
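
For readers unfamiliar with bisection, the search that identified the commit above can be sketched as a binary search over history. This is only an illustration under simplifying assumptions: a linear history, a known-good oldest commit, a known-bad newest commit, and a single breaking change. The commit names and the is_bad predicate are hypothetical, and real git bisect works on the full commit graph rather than a flat list.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


# Hypothetical history: "c5" stands in for the change that broke checkpointing.
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

In practice the predicate would be "build this revision and check whether a checkpoint gets written", which is the manual step repeated at each bisect iteration.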

________________________________
From: gem5-dev <gem5-dev-boun...@gem5.org> on behalf of Timothy Hayes 
<timothy.ha...@arm.com>
Sent: 06 February 2020 17:55
To: gem5-dev@gem5.org <gem5-dev@gem5.org>
Subject: [gem5-dev] Ruby Checkpointing Broken

I’m using a workflow that relies on checkpoints created by gem5/Ruby with the 
MOESI_hammer protocol. I successfully created a set of checkpoints last 
September using the public branch as it stood at the time. I’ve now pulled the 
latest changes from upstream and tried to recreate them; however, the 
checkpointing mechanism is no longer working. My Linux disk image boots 
successfully, but calling “/sbin/m5 checkpoint” fails silently: no checkpoint 
is created, and the simulation repeatedly tries to resume from the end of time 
in an infinite loop. I’ve tried this with hack_back_ckpt.rcS as well as manually 
on an interactive terminal, with the same result.

Has something changed between last September and now that might have corrupted 
the checkpointing mechanism w/ Ruby? It was functioning correctly as of 
September 27th 2019.

I’ve tried creating a checkpoint using AtomicCPU w/ the classic memory system 
and it seems to work okay.

Extra details are in the post-script.

--
Timothy Hayes
Senior Research Engineer
Arm Research
Phone: +44-1223405170
timothy.ha...@arm.com


== compile ==
scons CC=gcc-8 CXX=g++-8 build/ARM_MOESI_hammer/gem5.opt TARGET_ISA=arm 
PROTOCOL=MOESI_hammer SLICC_HTML=True 
CPU_MODELS=AtomicSimpleCPU,TimingSimpleCPU,O3CPU -j 8

== run ==
./gem5/build/ARM_MOESI_hammer/gem5.opt ./gem5/configs/example/fs.py --ruby 
--num-cpus=1 --mem-type=SimpleMemory --mem-size=4GB --cpu-type=TimingSimpleCPU 
--kernel=vmlinux.vexpress_gem5_v1_64 --machine-type=VExpress_GEM5_V1 
--disk-image=arm64-ff2-gem5-D1-1.img 
--script=./gem5/configs/boot/hack_back_ckpt.rcS

== stdout ==
warn: You are trying to use Ruby on ARM, which is not working properly yet.
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 compiled Feb  6 2020 14:39:34
gem5 started Feb  6 2020 17:19:51
gem5 executing on machine, pid 18966
command line: ./gem5/build/ARM_MOESI_hammer/gem5.opt 
./gem5/configs/example/fs.py --ruby --num-cpus=1 --mem-type=SimpleMemory 
--mem-size=4GB --cpu-type=TimingSimpleCPU --kernel=vmlinux.vexpress_gem5_v1_64 
--machine-type=VExpress_GEM5_V1 --disk-image=arm64-ff2-gem5-D1-1.img 
--script=./gem5/configs/boot/hack_back_ckpt.rcS

Global frequency set at 1000000000000 ticks per second
info: kernel located at: ./dist/binaries/vmlinux.vexpress_gem5_v1_64
warn: Bootloader entry point 0x10 overriding reset address 0
warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. 
Assuming you wanted these to match.
system.vncserver: Listening for connections on port 5900
system.terminal: Listening for connections on port 3456
0: system.remote_gdb: listening for remote gdb on port 7000
info: Using bootloader at address 0x10
info: Using kernel entry physical address at 0x80080000
info: Loading DTB file: ./m5out/system.dtb at address 0x88000000
**** REAL SIMULATION ****
warn: Existing EnergyCtrl, but no enabled DVFSHandler found.
info: Entering event queue @ 0.  Starting simulation...
warn: Replacement policy updates recently became the responsibility of SLICC 
state machines. Make sure to setMRU() near callbacks in .sm files!
warn: SCReg: Access to unknown device dcc0:site0:pos0:fn7:dev0
warn: Cache maintenance operations are not supported in Ruby.
warn: Tried to read RealView I/O at offset 0x60 that doesn't exist
warn: Tried to read RealView I/O at offset 0x48 that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to write RVIO at offset 0xa8 (data 0) that doesn't exist
warn: Tried to read RealView I/O at offset 0x8 that doesn't exist
warn: Tried to read RealView I/O at offset 0x48 that doesn't exist
warn: EnergyCtrl: Disabled handler, ignoring read from reg 0
info: Entering event queue @ 848156507000.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
ad infinitum
<snip>
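
The tick values in the log above can be decoded with a short script. This is a sketch, not part of gem5: the LOG string is copied from the output above, and the entry_ticks helper is a hypothetical name. It shows that the repeated value 18446744073709551615 is exactly 2^64 - 1, the largest unsigned 64-bit integer, which is consistent with the "end of time" behaviour described in the report (reading it as gem5's maximum tick is an assumption here). It also converts the last sane tick into simulated seconds using the 1000000000000 ticks-per-second frequency printed in the log.

```python
import re
from typing import List

MAX_U64 = 2**64 - 1          # largest unsigned 64-bit value
TICKS_PER_SECOND = 10**12    # "Global frequency set at 1000000000000 ticks per second"

# Lines copied from the stdout section of the report.
LOG = """\
info: Entering event queue @ 848156507000.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
info: Entering event queue @ 18446744073709551615.  Starting simulation...
"""


def entry_ticks(log: str) -> List[int]:
    """Extract the tick at which each 'Entering event queue' line was logged."""
    return [int(t) for t in re.findall(r"Entering event queue @ (\d+)\.", log)]


ticks = entry_ticks(LOG)
stuck = [t for t in ticks if t == MAX_U64]

# Simulated time at which the simulation last re-entered the event queue sanely.
print(ticks[0] / TICKS_PER_SECOND, "simulated seconds")
print(len(stuck), "re-entries at the maximum 64-bit tick")
```

The first re-entry at roughly 0.85 simulated seconds lines up with the "Checkpointing simulation..." message on the terminal, after which every re-entry happens at the maximum tick.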

== system.terminal ==
<snip>
[    0.389716] EXT4-fs (sda1): couldn't mount as ext3 due to feature 
incompatibilities
[    0.396111] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: 
(null)
[    0.396159] VFS: Mounted root (ext4 filesystem) on device 8:1.
[    0.398130] devtmpfs: mounted
[    0.398489] Freeing unused kernel memory: 384K
Mounting file systems...
[    0.449089] EXT4-fs (sda1): re-mounted. Opts: (null)
Making new terminal device...
Setting up networking...
Bringing up terminal...
Checkpointing simulation...



IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev