[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2014-09-03 Thread Cron Daemon via gem5-dev
scons: *** [build/ALPHA/mem/ruby/structures/RubyMemoryControl.do] Error 1
scons: *** [build/ALPHA/mem/protocol/DMARequestMsg.do] Error 1
scons: *** [build/ALPHA/mem/protocol/DMA_Controller.do] Error 1
scons: *** [build/ALPHA/mem/protocol/DMA_Transitions.do] Error 1
scons: *** [build/ALPHA/mem/protocol/DMA_Wakeup.do] Error 1
scons: *** [build/ALPHA/mem/protocol/Directory_Controller.do] Error 1
scons: *** [build/ALPHA/mem/protocol/Directory_TBE.do] Error 1
scons: *** [build/ALPHA/mem/protocol/Directory_Transitions.do] Error 1
scons: *** [build/ALPHA/mem/protocol/Directory_Wakeup.do] Error 1
scons: *** [build/ALPHA/mem/protocol/L1Cache_Controller.do] Error 1
scons: *** [build/ALPHA/mem/protocol/L1Cache_Transitions.do] Error 1
scons: *** [build/ALPHA/mem/protocol/L1Cache_Wakeup.do] Error 1
scons: *** [build/ALPHA/mem/protocol/MachineType.do] Error 1
scons: *** [build/ALPHA/mem/protocol/MemoryMsg.do] Error 1
scons: *** [build/ALPHA/mem/protocol/RequestMsg.do] Error 1
scons: *** [build/ALPHA/mem/protocol/ResponseMsg.do] Error 1
scons: *** [build/ALPHA/python/m5/internal/param_RubyMemoryControl_wrap.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/ruby/structures/RubyMemoryControl.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/DMARequestMsg.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/DMA_Controller.do] Error 1
scons: *** [build/ALPHA/python/m5/internal/param_L1Cache_Controller_wrap.do] Error 1
scons: *** [build/ALPHA/python/m5/internal/param_Directory_Controller_wrap.do] Error 1
scons: *** [build/ALPHA/python/m5/internal/param_DMA_Controller_wrap.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/DMA_Transitions.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/DMA_Wakeup.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/Directory_Controller.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/Directory_PfEntry.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/Directory_TBE.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/Directory_Transitions.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/Directory_Wakeup.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/L1Cache_Controller.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/L1Cache_TBE.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/L1Cache_Transitions.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/L1Cache_Wakeup.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/MachineType.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/MemoryMsg.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/RequestMsg.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/mem/protocol/ResponseMsg.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/python/m5/internal/param_RubyMemoryControl_wrap.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/ruby/structures/RubyMemoryControl.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/DMA_Controller.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/DMA_Transitions.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/python/m5/internal/param_L1Cache_Controller_wrap.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/python/m5/internal/param_Directory_Controller_wrap.do] Error 1
scons: *** [build/ALPHA_MOESI_hammer/python/m5/internal/param_DMA_Controller_wrap.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/DMA_Wakeup.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/Directory_Controller.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/Directory_Entry.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/Directory_Transitions.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/Directory_Wakeup.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L1Cache_Controller.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L1Cache_Transitions.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L1Cache_Wakeup.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L2Cache_Controller.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L2Cache_Entry.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L2Cache_TBE.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L2Cache_Transitions.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/L2Cache_Wakeup.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/MachineType.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/MemoryMsg.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/RequestMsg.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/mem/protocol/ResponseMsg.do] Error 1
scons: *** [build/ALPHA_MESI_Two_Level/python/m5/internal/param_RubyMemoryControl_wrap.do] Error 1
scons: *** [build/ALPHA_MOESI_CMP_directory/mem/ruby/structures/RubyMemoryControl.do] Error 1
scons: *** 

Re: [gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2014-09-03 Thread Andreas Hansson via gem5-dev
Would someone be able to fry the build directory?

Thanks,

Andreas

On 03/09/2014 08:11, Cron Daemon via gem5-dev gem5-dev@gem5.org wrote:


Re: [gem5-dev] Ruby regression tests and null isa

2014-09-03 Thread Andreas Hansson via gem5-dev
Hi Nilay,

That all sounds good. I am not averse to the idea of including a ruby
protocol in the NULL build, but I'd like it to be for a good reason as it
does indeed add quite some time to the build. That's all...

Andreas

On 03/09/2014 05:44, Nilay Vaish ni...@cs.wisc.edu wrote:

On Mon, 1 Sep 2014, Andreas Hansson wrote:

 Hi Nilay,

 That is a very good point, and thanks for spending some cycles on this.
 I'm not pushing for a transition, I merely thought it made more sense, but
 I forgot about the hello world tests.

 Does the "hello world" actually add any value to the regressions? Would it
 not be better to: 1) run a more extensive regression using Ruby + an o3
 CPU model (linux boot etc), or 2) use a more extensive synthetic tester
 (e.g. memtester with actual sharing, which is something we're working
 on...) for some of these protocols?

I am fine with adding more tests.  I do sometimes test by booting Linux so
as to ensure things are in a working state.  I am not sure if we would
like to see the time for regressions going up.  I am unable to recall the
inner workings of the testers that we use for ruby, but I am sure they
test sharing.


 As a side note, I've managed to make the memory system (src/mem)
 completely ISA independent, so we could compile the entire memory
 directory once for all ISAs. Unfortunately we also need to compile it once
 for every coherency protocol in Ruby. I'm not sure there is any sensible
 way around it, but it would be good to get your thoughts on this.


If I remember correctly, there is one particular file (MachineType.hh)
that is the stumbling block in compiling all protocols together.  I might
look at this again once I am done with another ruby thing I am working on
currently.

Thanks
Nilay


___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: arm: Fix ExtMachInst hash operator underlying...

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset d2850235e31c in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=d2850235e31c
description:
arm: Fix ExtMachInst hash operator underlying type

This patch fixes the hash operator used for ARM ExtMachInst, which
incorrectly was still using uint32_t. Instead of changing it to
uint64_t it is now using the underlying data type of the BitUnion.
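
The idea in this description can be illustrated with a small standalone sketch. It is not the gem5 code: `ExtMachInstSketch` and its `DataType` alias are hypothetical stand-ins for the BitUnion type and its underlying data type, and `std::hash` is assumed in place of the `__hash_namespace` wrapper. The point is that the hash delegates to whatever the storage type is, so no bits are dropped by a cast to a narrower integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for a BitUnion-style wrapper (not gem5's real type).
struct ExtMachInstSketch {
    using DataType = std::uint64_t;   // assumed underlying storage type
    DataType bits;
    operator DataType() const { return bits; }
};

inline bool
operator==(const ExtMachInstSketch &a, const ExtMachInstSketch &b)
{
    return a.bits == b.bits;
}

namespace std {
// Delegate to the hash of the underlying type rather than hard-coding
// uint32_t, so a change of storage width is picked up automatically.
template <>
struct hash<ExtMachInstSketch> : public hash<ExtMachInstSketch::DataType> {
    size_t operator()(const ExtMachInstSketch &emi) const {
        return hash<ExtMachInstSketch::DataType>::operator()(emi.bits);
    }
};
} // namespace std
```

With a `(uint32_t)` cast, two values that differ only in their upper 32 bits would always hash identically; delegating to the full-width type avoids that systematic collision.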

diffstat:

 src/arch/arm/types.hh |  17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diffs (27 lines):

diff -r 9e02c14446bb -r d2850235e31c src/arch/arm/types.hh
--- a/src/arch/arm/types.hh Mon Sep 01 16:55:52 2014 -0500
+++ b/src/arch/arm/types.hh Wed Sep 03 07:42:19 2014 -0400
@@ -727,12 +727,17 @@
 } // namespace ArmISA
 
 __hash_namespace_begin
-template<>
-struct hash<ArmISA::ExtMachInst> : public hash<uint32_t> {
-size_t operator()(const ArmISA::ExtMachInst &emi) const {
-return hash<uint32_t>::operator()((uint32_t)emi);
-};
-};
+
+template<>
+struct hash<ArmISA::ExtMachInst> :
+public hash<ArmISA::ExtMachInst::__DataType> {
+
+size_t operator()(const ArmISA::ExtMachInst &emi) const {
+return hash<ArmISA::ExtMachInst::__DataType>::operator()(emi);
+}
+
+};
+
+
 __hash_namespace_end
 
 #endif


[gem5-dev] changeset in gem5: arch: Cleanup unused ISA traits constants

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 98771a936b61 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=98771a936b61
description:
arch: Cleanup unused ISA traits constants

This patch prunes unused values, and also unifies how the values are
defined (not using an enum for ALPHA), aligning the use of int vs Addr
etc.

The patch also removes the duplication of PageBytes/PageShift and
VMPageSize/LogVMPageSize. For all ISAs the two pairs had identical
values and the latter has been removed.
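
As a rough illustration of why one pair of constants is enough, the sketch below derives the page size from the shift and uses it for rounding, in the way the `roundUp(brk_point, PageBytes)` calls in this changeset do. The value 13 is taken from the removed Alpha `LogVMPageSize`; the `Addr` alias and the helper bodies are assumptions written for this example, not the gem5 definitions.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;   // assumed address type for this sketch

// One pair of constants: the size is derived from the shift.
const Addr PageShift = 13;                    // 8K pages, as on Alpha
const Addr PageBytes = Addr(1) << PageShift;

// Round val up to the next multiple of align (align must be a power of two).
inline Addr
roundUp(Addr val, Addr align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (val + align - 1) & ~(align - 1);
}

// Split an address into page number and offset using the same constants.
inline Addr pageNumber(Addr a) { return a >> PageShift; }
inline Addr pageOffset(Addr a) { return a & (PageBytes - 1); }
```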

diffstat:

 src/arch/alpha/isa_traits.hh   |  14 +-
 src/arch/alpha/process.cc  |   6 +++---
 src/arch/arm/isa_traits.hh |  11 ---
 src/arch/arm/process.cc|  10 +-
 src/arch/arm/utility.cc|   1 +
 src/arch/mips/isa_traits.hh|  10 --
 src/arch/mips/process.cc   |   6 +++---
 src/arch/null/isa_traits.hh|   3 ---
 src/arch/power/isa_traits.hh   |   3 ---
 src/arch/power/process.cc  |   6 +++---
 src/arch/sparc/isa_traits.hh   |  14 ++
 src/arch/sparc/process.cc  |   8 
 src/arch/x86/isa_traits.hh |  10 ++
 src/arch/x86/process.cc|  14 +++---
 src/kern/tru64/tru64.hh|   8 
 src/mem/cache/prefetch/base.cc |   2 +-
 src/mem/multi_level_page_table_impl.hh |  24 
 src/mem/page_table.hh  |   6 +++---
 src/mem/ruby/common/Address.cc |   2 +-
 src/mem/se_translating_port_proxy.cc   |  14 +++---
 src/sim/process.cc |   4 ++--
 src/sim/syscall_emul.cc|  12 ++--
 src/sim/syscall_emul.hh|  10 +-
 src/sim/system.cc  |   6 +++---
 24 files changed, 75 insertions(+), 129 deletions(-)

diffs (truncated from 730 to 300 lines):

diff -r 19f5df7ac6a1 -r 98771a936b61 src/arch/alpha/isa_traits.hh
--- a/src/arch/alpha/isa_traits.hh  Wed Sep 03 07:42:20 2014 -0400
+++ b/src/arch/alpha/isa_traits.hh  Wed Sep 03 07:42:21 2014 -0400
@@ -109,19 +109,7 @@
 mode_number // number of modes
 };
 
-// Constants Related to the number of registers
-
-enum {
-LogVMPageSize = 13,   // 8K bytes
-VMPageSize = (1 << LogVMPageSize),
-
-BranchPredAddrShiftAmt = 2, // instructions are 4-byte aligned
-
-MachineBytes = 8,
-WordBytes = 4,
-HalfwordBytes = 2,
-ByteBytes = 1
-};
+const int MachineBytes = 8;
 
 // return a no-op instruction... used for instruction fetch faults
 // Alpha UNOP (ldq_u r31,0(r0))
diff -r 19f5df7ac6a1 -r 98771a936b61 src/arch/alpha/process.cc
--- a/src/arch/alpha/process.cc Wed Sep 03 07:42:20 2014 -0400
+++ b/src/arch/alpha/process.cc Wed Sep 03 07:42:21 2014 -0400
@@ -49,7 +49,7 @@
 : LiveProcess(params, objFile)
 {
 brk_point = objFile->dataBase() + objFile->dataSize() + objFile->bssSize();
-brk_point = roundUp(brk_point, VMPageSize);
+brk_point = roundUp(brk_point, PageBytes);
 
 // Set up stack.  On Alpha, stack goes below text section.  This
 // code should get moved to some architecture-specific spot.
@@ -83,7 +83,7 @@
 // seem to be a problem.
 // check out _dl_aux_init() in glibc/elf/dl-support.c for details
 // --Lisa
-auxv.push_back(auxv_t(M5_AT_PAGESZ, AlphaISA::VMPageSize));
+auxv.push_back(auxv_t(M5_AT_PAGESZ, AlphaISA::PageBytes));
 auxv.push_back(auxv_t(M5_AT_CLKTCK, 100));
 auxv.push_back(auxv_t(M5_AT_PHDR, elfObject->programHeaderTable()));
 DPRINTF(Loader, "auxv at PHDR %08p\n", elfObject->programHeaderTable());
@@ -193,7 +193,7 @@
 
 LiveProcess::initState();
 
-argsInit(MachineBytes, VMPageSize);
+argsInit(MachineBytes, PageBytes);
 
 ThreadContext *tc = system->getThreadContext(contextIds[0]);
 tc->setIntReg(GlobalPointerReg, objFile->globalPointer());
diff -r 19f5df7ac6a1 -r 98771a936b61 src/arch/arm/isa_traits.hh
--- a/src/arch/arm/isa_traits.hh  Wed Sep 03 07:42:20 2014 -0400
+++ b/src/arch/arm/isa_traits.hh  Wed Sep 03 07:42:21 2014 -0400
@@ -51,8 +51,6 @@
 
 namespace LittleEndianGuest {}
 
-#define TARGET_ARM
-
 namespace ArmISA
 {
 using namespace LittleEndianGuest;
@@ -101,16 +99,7 @@
 // return a no-op instruction... used for instruction fetch faults
 const ExtMachInst NoopMachInst = 0x01E320F000ULL;
 
-const int LogVMPageSize = 12;   // 4K bytes
-const int VMPageSize = (1 << LogVMPageSize);
-
-// Shouldn't this be 1 because of Thumb?! Dynamic? --Ali
-const int BranchPredAddrShiftAmt = 2; // instructions are 4-byte aligned
-
 const int MachineBytes = 4;
-const int WordBytes = 4;
-const int HalfwordBytes = 2;
-const int ByteBytes = 1;
 
 const uint32_t HighVecs = 0x;
 
diff -r 19f5df7ac6a1 -r 98771a936b61 src/arch/arm/process.cc
--- 

[gem5-dev] changeset in gem5: sim: Fix checkpoint restore for Ticked

2014-09-03 Thread Andrew Bardsley via gem5-dev
changeset 82a4fa2d19a0 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=82a4fa2d19a0
description:
sim: Fix checkpoint restore for Ticked

This patch makes restoring the 'lastStopped' value for Ticked-containing
objects (including MinorCPU) optional so that Ticked-containing objects
can be restored from non-Ticked-containing objects (such as
AtomicSimpleCPU).
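
A minimal sketch of the optional-restore pattern described here, under stated assumptions: `CheckpointSketch`, `optParamInSketch` and `TickedSketch` are hypothetical names invented for this example (a checkpoint is modelled as nested string maps), and the helper only mirrors the general behaviour of an optional read, not gem5's actual `optParamIn` implementation.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical checkpoint: section name -> (parameter name -> text value).
using CheckpointSketch =
    std::map<std::string, std::map<std::string, std::string>>;

// Optional read: returns true and overwrites 'out' only if the value is
// present; otherwise 'out' keeps whatever default the caller gave it.
inline bool
optParamInSketch(const CheckpointSketch &cp, const std::string &section,
                 const std::string &name, std::uint64_t &out)
{
    auto sec = cp.find(section);
    if (sec == cp.end())
        return false;
    auto entry = sec->second.find(name);
    if (entry == sec->second.end())
        return false;
    out = std::stoull(entry->second);
    return true;
}

struct TickedSketch {
    std::uint64_t lastStopped = 0;

    void unserialize(const CheckpointSketch &cp, const std::string &section)
    {
        // Default of 0 is used when the checkpoint came from an object
        // that never wrote 'lastStopped'.
        std::uint64_t lastStoppedUint = 0;
        optParamInSketch(cp, section, "lastStopped", lastStoppedUint);
        lastStopped = lastStoppedUint;
    }
};
```

The explicit initialisation to 0 matters: with an optional read, the local variable is no longer guaranteed to be written before it is used.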

diffstat:

 src/sim/ticked_object.cc |  10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diffs (21 lines):

diff -r 4207f9bfcceb -r 82a4fa2d19a0 src/sim/ticked_object.cc
--- a/src/sim/ticked_object.cc  Wed Sep 03 07:42:22 2014 -0400
+++ b/src/sim/ticked_object.cc  Wed Sep 03 07:42:25 2014 -0400
@@ -82,9 +82,15 @@
 void
 Ticked::unserialize(Checkpoint *cp, const std::string &section)
 {
-uint64_t lastStoppedUint;
+uint64_t lastStoppedUint = 0;
 
-paramIn(cp, section, "lastStopped", lastStoppedUint);
+/* lastStopped is optional on checkpoint restore as this object may be
+ *  being restored from one which has a common base (and so possibly
+ *  many common checkpointed values) but where Ticked is used in the
+ *  checkpointed object but not this one.
+ *  An example would be a CPU model using Ticked restores from a
+ *  simple CPU without without Ticked */
+optParamIn(cp, section, "lastStopped", lastStoppedUint);
 
 lastStopped = Cycles(lastStoppedUint);
 }


[gem5-dev] changeset in gem5: arch, cpu: Factor out the ExecContext into a ...

2014-09-03 Thread Andreas Sandberg via gem5-dev
changeset 4207f9bfcceb in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=4207f9bfcceb
description:
arch, cpu: Factor out the ExecContext into a proper base class

We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.

The main argument for using one set of ISA code per CPU model has
always been performance as this avoids indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
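
The structure described above can be sketched as follows. This is a heavily reduced illustration, not the interface added in src/cpu/exec_context.hh: `ExecContextSketch`, `SimpleCpuSketch` and `executeAdd` are hypothetical names, and only two register accessors are shown. The instruction code is written once against the abstract base class, and each CPU model supplies its own implementation behind virtual calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced interface between ISA code and CPU models.
class ExecContextSketch
{
  public:
    virtual ~ExecContextSketch() = default;
    virtual std::uint64_t readIntReg(int idx) const = 0;
    virtual void setIntReg(int idx, std::uint64_t val) = 0;
};

// "ISA code": compiled once, independent of which CPU model runs it.
inline void
executeAdd(ExecContextSketch &xc, int dest, int src1, int src2)
{
    xc.setIntReg(dest, xc.readIntReg(src1) + xc.readIntReg(src2));
}

// One possible CPU model implementing the interface.
class SimpleCpuSketch : public ExecContextSketch
{
    std::uint64_t regs[32] = {};

  public:
    std::uint64_t readIntReg(int idx) const override
    {
        assert(idx >= 0 && idx < 32);
        return regs[idx];
    }

    void setIntReg(int idx, std::uint64_t val) override
    {
        assert(idx >= 0 && idx < 32);
        regs[idx] = val;
    }
};
```

The trade-off is the one the description names: virtual dispatch adds indirect calls, but the generated ISA code no longer has to be instantiated and compiled once per CPU model.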

diffstat:

 SConstruct  |   12 +-
 src/arch/SConscript |   13 +-
 src/arch/arm/isa/includes.isa   |1 +
 src/arch/isa_parser.py  |   22 ++-
 src/cpu/SConscript  |   64 +
 src/cpu/base_dyn_inst.hh|   43 +
 src/cpu/checker/SConsopts   |4 +-
 src/cpu/checker/cpu.hh  |   27 ++-
 src/cpu/exec_context.cc |   40 +
 src/cpu/exec_context.hh |  264 +++
 src/cpu/inorder/SConsopts   |5 +-
 src/cpu/inorder/inorder_dyn_inst.cc |5 +-
 src/cpu/inorder/inorder_dyn_inst.hh |   46 -
 src/cpu/minor/SConsopts |5 +-
 src/cpu/minor/exec_context.hh   |   25 +-
 src/cpu/nocpu/SConsopts |2 +-
 src/cpu/o3/SConsopts|5 +-
 src/cpu/o3/dyn_inst.hh  |   15 +-
 src/cpu/ozone/SConsopts |8 +-
 src/cpu/simple/SConsopts|   10 +-
 src/cpu/simple/base.hh  |   30 ++--
 src/cpu/simple_thread.cc|   16 ++
 src/cpu/static_inst.hh  |   38 ++--
 23 files changed, 406 insertions(+), 294 deletions(-)

diffs (truncated from 1355 to 300 lines):

diff -r 98771a936b61 -r 4207f9bfcceb SConstruct
--- a/SConstruct  Wed Sep 03 07:42:21 2014 -0400
+++ b/SConstruct  Wed Sep 03 07:42:22 2014 -0400
@@ -1025,17 +1025,10 @@
 
 # Dict of available CPU model objects.  Accessible as CpuModel.dict.
 dict = {}
-list = []
-defaults = []
 
 # Constructor.  Automatically adds models to CpuModel.dict.
-def __init__(self, name, filename, includes, strings, default=False):
+def __init__(self, name, default=False):
 self.name = name   # name of model
-self.filename = filename   # filename for output exec code
-self.includes = includes   # include files needed in exec file
-# The 'strings' dict holds all the per-CPU symbols we can
-# substitute into templates etc.
-self.strings = strings
 
 # This cpu is enabled by default
 self.default = default
@@ -1044,7 +1037,6 @@
 if name in CpuModel.dict:
 raise AttributeError, "CpuModel '%s' already registered" % name
 CpuModel.dict[name] = self
-CpuModel.list.append(name)
 
 Export('CpuModel')
 
@@ -1086,7 +1078,7 @@
 EnumVariable('TARGET_ISA', 'Target ISA', 'alpha', all_isa_list),
 ListVariable('CPU_MODELS', 'CPU models',
  sorted(n for n,m in CpuModel.dict.iteritems() if m.default),
- sorted(CpuModel.list)),
+ sorted(CpuModel.dict.keys())),
 BoolVariable('EFENCE', 'Link with Electric Fence malloc debugger',
  False),
 BoolVariable('SS_COMPATIBLE_FP',
diff -r 98771a936b61 -r 4207f9bfcceb src/arch/SConscript
--- a/src/arch/SConscript   Wed Sep 03 07:42:21 2014 -0400
+++ b/src/arch/SConscript   Wed Sep 03 07:42:22 2014 -0400
@@ -95,13 +95,11 @@
 # The emitter patches up the sources & targets to include the
 # autogenerated files as targets and isa parser itself as a source.
 def isa_desc_emitter(target, source, env):
-cpu_models = list(env['CPU_MODELS'])
-cpu_models.append('CheckerCPU')
-
 # List the isa parser as a source.
-source += [ isa_parser ]
-# Add in the CPU models.
-source += [ Value(m) for m in cpu_models ]
+source += [
+isa_parser,
+Value("ExecContext"),
+]
 
 # Specify different targets depending on if we're running the ISA
 # parser for its dependency information, or for the generated files.
@@ -137,8 +135,7 @@
 
 # Skip over the ISA description itself and the parser to the CPU models.
 models = [ s.get_contents() for s in source[2:] ]
-cpu_models = [CpuModel.dict[cpu] for cpu in models]
- 

[gem5-dev] changeset in gem5: mem: Packet queue clean up

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 7f4059e4f2d5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=7f4059e4f2d5
description:
mem: Packet queue clean up

No change in functionality, just a bit of tidying up.

diffstat:

 src/mem/packet_queue.cc |  20 +++-
 src/mem/packet_queue.hh |   5 ++---
 2 files changed, 9 insertions(+), 16 deletions(-)

diffs (84 lines):

diff -r 72890a571a7b -r 7f4059e4f2d5 src/mem/packet_queue.cc
--- a/src/mem/packet_queue.cc   Wed Sep 03 07:42:27 2014 -0400
+++ b/src/mem/packet_queue.cc   Wed Sep 03 07:42:28 2014 -0400
@@ -71,11 +71,10 @@
 {
 pkt->pushLabel(label);
 
-DeferredPacketIterator i = transmitList.begin();
-DeferredPacketIterator end = transmitList.end();
+auto i = transmitList.begin();
 bool found = false;
 
-while (!found && i != end) {
+while (!found && i != transmitList.end()) {
 // If the buffered packet contains data, and it overlaps the
 // current packet, then update data
 found = pkt->checkFunctional(i->pkt);
@@ -140,7 +139,7 @@
 }
 
 // this belongs in the middle somewhere, insertion sort
-DeferredPacketIterator i = transmitList.begin();
+auto i = transmitList.begin();
 ++i; // already checked for insertion at front
 while (i != transmitList.end() && when >= i->tick)
 ++i;
@@ -151,21 +150,16 @@
 {
 assert(deferredPacketReady());
 
-// take the next packet off the list here, as we might return to
-// ourselves through the sendTiming call below
 DeferredPacket dp = transmitList.front();
-transmitList.pop_front();
 
 // use the appropriate implementation of sendTiming based on the
 // type of port associated with the queue, and whether the packet
 // is to be sent as a snoop or not
 waitingOnRetry = !sendTiming(dp.pkt, dp.sendAsSnoop);
 
-if (waitingOnRetry) {
-// put the packet back at the front of the list (packet should
-// not have changed since it wasn't accepted)
-assert(!sendEvent.scheduled());
-transmitList.push_front(dp);
+if (!waitingOnRetry) {
+// take the packet off the list
+transmitList.pop_front();
 }
 }
 
@@ -216,7 +210,7 @@
 unsigned int
 PacketQueue::drain(DrainManager *dm)
 {
-if (transmitList.empty() && !sendEvent.scheduled())
+if (transmitList.empty())
 return 0;
 DPRINTF(Drain, "PacketQueue not drained\n");
 drainManager = dm;
diff -r 72890a571a7b -r 7f4059e4f2d5 src/mem/packet_queue.hh
--- a/src/mem/packet_queue.hh   Wed Sep 03 07:42:27 2014 -0400
+++ b/src/mem/packet_queue.hh   Wed Sep 03 07:42:28 2014 -0400
@@ -78,7 +78,6 @@
 };
 
 typedef std::list<DeferredPacket> DeferredPacketList;
-typedef std::list<DeferredPacket>::iterator DeferredPacketIterator;
 
 /** A list of outgoing timing response packets that haven't been
  * serviced yet. */
@@ -109,10 +108,10 @@
 bool waitingOnRetry;
 
 /** Check whether we have a packet ready to go on the transmit list. */
-bool deferredPacketReady()
+bool deferredPacketReady() const
 { return !transmitList.empty() && transmitList.front().tick <= curTick(); }
 
-Tick deferredPacketReadyTime()
+Tick deferredPacketReadyTime() const
 { return transmitList.empty() ? MaxTick : transmitList.front().tick; }
 
 /**


[gem5-dev] changeset in gem5: cpu: Change writeback modeling for outstandin...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 5b6279635c49 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5b6279635c49
description:
cpu: Change writeback modeling for outstanding instructions

As highlighted on the mailing list, gem5's writeback modeling can impact
performance.  This patch removes the limitation on maximum outstanding
issued instructions; however, the number that can writeback in a single
cycle is still respected in instToCommit().
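
The remaining constraint, a per-cycle writeback width with no cap on the total number of outstanding instructions, can be sketched roughly as below. This is a simplified model written for illustration and is not the body of instToCommit(): `WritebackSlots` and `schedule()` are hypothetical names, while `wbWidth`, `wbCycle` and `wbNumInst` are borrowed from the diff and assumed to play the roles shown in the comments.

```cpp
#include <cassert>

// Simplified model: each instruction takes the next free writeback slot;
// once wbWidth slots of a cycle are used, later ones spill to the next cycle.
struct WritebackSlots
{
    unsigned wbWidth;        // max instructions written back per cycle
    unsigned wbCycle = 0;    // cycle offset currently being filled
    unsigned wbNumInst = 0;  // slots already used in that cycle

    explicit WritebackSlots(unsigned width) : wbWidth(width)
    {
        assert(width > 0);
    }

    // Returns the cycle offset in which this instruction writes back.
    // There is deliberately no upper bound on how far ahead this can go.
    unsigned schedule()
    {
        unsigned cycle = wbCycle;
        if (++wbNumInst == wbWidth) {
            ++wbCycle;
            wbNumInst = 0;
        }
        return cycle;
    }
};
```

Under this model a burst of issued instructions is never refused; it is only spread over as many cycles as the width requires.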

diffstat:

 configs/common/O3_ARM_v7a.py  |   1 -
 src/cpu/o3/O3CPU.py   |   1 -
 src/cpu/o3/iew.hh |  53 ---
 src/cpu/o3/iew_impl.hh|  10 
 src/cpu/o3/inst_queue_impl.hh |   2 -
 src/cpu/o3/lsq_unit.hh|   7 -
 src/cpu/o3/lsq_unit_impl.hh   |   5 +---
 7 files changed, 1 insertions(+), 78 deletions(-)

diffs (210 lines):

diff -r 43516d8eabe9 -r 5b6279635c49 configs/common/O3_ARM_v7a.py
--- a/configs/common/O3_ARM_v7a.py  Wed Sep 03 07:42:32 2014 -0400
+++ b/configs/common/O3_ARM_v7a.py  Wed Sep 03 07:42:33 2014 -0400
@@ -126,7 +126,6 @@
 dispatchWidth = 6
 issueWidth = 8
 wbWidth = 8
-wbDepth = 1
 fuPool = O3_ARM_v7a_FUP()
 iewToCommitDelay = 1
 renameToROBDelay = 1
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:33 2014 -0400
@@ -84,7 +84,6 @@
 dispatchWidth = Param.Unsigned(8, "Dispatch width")
 issueWidth = Param.Unsigned(8, "Issue width")
 wbWidth = Param.Unsigned(8, "Writeback width")
-wbDepth = Param.Unsigned(1, "Writeback depth")
 fuPool = Param.FUPool(DefaultFUPool(), "Functional Unit pool")
 
 iewToCommitDelay = Param.Cycles(1, "Issue/Execute/Writeback to commit "
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:33 2014 -0400
@@ -219,49 +219,6 @@
 /** Returns if the LSQ has any stores to writeback. */
 bool hasStoresToWB(ThreadID tid) { return ldstQueue.hasStoresToWB(tid); }
 
-void incrWb(InstSeqNum sn)
-{
-++wbOutstanding;
-if (wbOutstanding == wbMax)
-ableToIssue = false;
-DPRINTF(IEW, "wbOutstanding: %i [sn:%lli]\n", wbOutstanding, sn);
-assert(wbOutstanding <= wbMax);
-#ifdef DEBUG
-wbList.insert(sn);
-#endif
-}
-
-void decrWb(InstSeqNum sn)
-{
-if (wbOutstanding == wbMax)
-ableToIssue = true;
-wbOutstanding--;
-DPRINTF(IEW, "wbOutstanding: %i [sn:%lli]\n", wbOutstanding, sn);
-assert(wbOutstanding >= 0);
-#ifdef DEBUG
-assert(wbList.find(sn) != wbList.end());
-wbList.erase(sn);
-#endif
-}
-
-#ifdef DEBUG
-std::set<InstSeqNum> wbList;
-
-void dumpWb()
-{
-std::set<InstSeqNum>::iterator wb_it = wbList.begin();
-while (wb_it != wbList.end()) {
-cprintf("[sn:%lli]\n",
-(*wb_it));
-wb_it++;
-}
-}
-#endif
-
-bool canIssue() { return ableToIssue; }
-
-bool ableToIssue;
-
 /** Check misprediction  */
 void checkMisprediction(DynInstPtr inst);
 
@@ -452,19 +409,9 @@
  */
 unsigned wbCycle;
 
-/** Number of instructions in flight that will writeback. */
-
-/** Number of instructions in flight that will writeback. */
-int wbOutstanding;
-
 /** Writeback width. */
 unsigned wbWidth;
 
-/** Writeback width * writeback depth, where writeback depth is
- * the number of cycles of writing back instructions that can be
- * buffered. */
-unsigned wbMax;
-
 /** Number of active threads. */
 ThreadID numThreads;
 
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hh  Wed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/iew_impl.hh  Wed Sep 03 07:42:33 2014 -0400
@@ -76,7 +76,6 @@
   issueToExecuteDelay(params->issueToExecuteDelay),
   dispatchWidth(params->dispatchWidth),
   issueWidth(params->issueWidth),
-  wbOutstanding(0),
   wbWidth(params->wbWidth),
   numThreads(params->numThreads)
 {
@@ -109,12 +108,8 @@
 fetchRedirect[tid] = false;
 }
 
-wbMax = wbWidth * params->wbDepth;
-
 updateLSQNextCycle = false;
 
-ableToIssue = true;
-
 skidBufferMax = (3 * (renameToIEWDelay * params->renameWidth)) + issueWidth;
 }
 
@@ -635,8 +630,6 @@
 ++wbCycle;
 wbNumInst = 0;
 }
-
-assert((wbCycle * wbWidth + wbNumInst) <= wbMax);
 }
 
 DPRINTF(IEW, "Current wb cycle: %i, width: %i, numInst: %i\nwbActual:%i\n",
@@ -1263,7 +1256,6 @@
 
 ++iewExecSquashedInsts;
 
-decrWb(inst->seqNum);
 continue;
 }
 
@@ -1502,8 +1494,6 @@
 }
 writebackCount[tid]++;
 }
-
-decrWb(inst->seqNum);
 }
 }
 
diff -r 43516d8eabe9 -r 

[gem5-dev] changeset in gem5: cache: Fix handling of LL/SC requests under c...

2014-09-03 Thread Geoffrey Blake via gem5-dev
changeset 7aacec2a247d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=7aacec2a247d
description:
cache: Fix handling of LL/SC requests under contention

If a set of LL/SC requests contend on the same cache block we
can get into a situation where CPUs will deadlock if they expect
a failed SC to supply them data.  This case happens where 3 or
more cores are contending for a cache block using LL/SC and the system
is configured where 2 cores are connected to a local bus and the
third is connected to a remote bus.  If a core on the local bus
sends an SCUpgrade and the core on the remote bus sends an SCUpgrade
they will race to see who will win the SC access.  In the meantime
if the other core appends a read to one of the SCUpgrades it will expect
to be supplied data by that SCUpgrade transaction.  If it happens that
the SCUpgrade that was picked to supply the data is failed, it will
drop the appended request for data and never respond, leaving the requesting
core to deadlock.  This patch makes all SC's behave as normal stores to
prevent this case but still makes sure to check whether it can perform
the update.
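
A toy model of the behaviour described above, written in Python rather than
gem5's C++. The names Block, load_locked and store_conditional are
hypothetical and are not gem5 APIs. The sketch only illustrates the idea
that a failed SC still leaves the line marked dirty so that it can supply
data, while the update itself is performed only if the reservation is still
valid.

```python
class Block:
    """Toy cache block with LL/SC reservations (hypothetical, not gem5 code)."""

    def __init__(self, data):
        self.data = data
        self.dirty = False
        self.reservations = set()

    def load_locked(self, cpu):
        # Record a reservation for this CPU and return the current data.
        self.reservations.add(cpu)
        return self.data

    def store_conditional(self, cpu, value):
        # The SC may only update the block if its reservation survived.
        success = cpu in self.reservations
        if success:
            self.data = value
            # A successful store invalidates every outstanding reservation.
            self.reservations.clear()
        # Mark the block dirty even when the SC fails, so this cache can
        # still supply data to readers that queued behind the request.
        self.dirty = True
        return success


block = Block(data=0)
block.load_locked("cpu0")
block.load_locked("cpu1")
first = block.store_conditional("cpu0", 1)   # wins the race
second = block.store_conditional("cpu1", 2)  # reservation was cleared
```

In this model the second SC fails without changing the data, yet the block
is still dirty and therefore still able to answer an appended read.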

diffstat:

 src/mem/cache/cache_impl.hh |  28 ++--
 src/mem/packet.cc   |  15 ---
 2 files changed, 22 insertions(+), 21 deletions(-)

diffs (96 lines):

diff -r f40134eb3f85 -r 7aacec2a247d src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue May 27 11:00:56 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Wed Sep 03 07:42:31 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved.
  *
  * The license below extends only to copyright in the software and shall
@@ -166,8 +166,12 @@
 } else if (pkt->isWrite()) {
 if (blk->checkWrite(pkt)) {
 pkt->writeDataToBlock(blk->data, blkSize);
-blk->status |= BlkDirty;
 }
+// Always mark the line as dirty even if we are a failed
+// StoreCond so we supply data to any snoops that have
+// appended themselves to this cache before knowing the store
+// will fail.
+blk->status |= BlkDirty;
 } else if (pkt->isRead()) {
 if (pkt->isLLSC()) {
 blk->trackLoadLocked(pkt);
@@ -658,6 +662,13 @@
 // (read-only) and we need exclusive
 assert(needsExclusive && !blk->isWritable());
 cmd = cpu_pkt->isLLSC() ? MemCmd::SCUpgradeReq : MemCmd::UpgradeReq;
+} else if (cpu_pkt->cmd == MemCmd::SCUpgradeFailReq ||
+   cpu_pkt->cmd == MemCmd::StoreCondFailReq) {
+// Even though this SC will fail, we still need to send out the
+// request and get the data to supply it to other snoopers in the case
+// where the determination the StoreCond fails is delayed due to
+// all caches not being on the same local bus.
+cmd = MemCmd::SCUpgradeFailReq;
 } else {
 // block is invalid
 cmd = needsExclusive ? MemCmd::ReadExReq : MemCmd::ReadReq;
@@ -1724,18 +1735,7 @@
 DPRINTF(CachePort, "%s %s for address %x size %d\n", __func__,
 tgt_pkt->cmdString(), tgt_pkt->getAddr(), tgt_pkt->getSize());
 
-if (tgt_pkt->cmd == MemCmd::SCUpgradeFailReq ||
-tgt_pkt->cmd == MemCmd::StoreCondFailReq) {
-// SCUpgradeReq or StoreCondReq saw invalidation while queued
-// in MSHR, so now that we are getting around to processing
-// it, just treat it as if we got a failure response
-pkt = new Packet(tgt_pkt);
-pkt->cmd = MemCmd::UpgradeFailResp;
-pkt->senderState = mshr;
-pkt->busFirstWordDelay = pkt->busLastWordDelay = 0;
-recvTimingResp(pkt);
-return NULL;
-} else if (mshr->isForwardNoResponse()) {
+if (mshr->isForwardNoResponse()) {
 // no response expected, just forward packet as it is
 assert(tags->findBlock(mshr->addr, mshr->isSecure) == NULL);
 pkt = tgt_pkt;
diff -r f40134eb3f85 -r 7aacec2a247d src/mem/packet.cc
--- a/src/mem/packet.cc Tue May 27 11:00:56 2014 -0500
+++ b/src/mem/packet.cc Wed Sep 03 07:42:31 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2011-2013 ARM Limited
+ * Copyright (c) 2011-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -115,12 +115,13 @@
 /* UpgradeResp */
 { SET3(NeedsExclusive, IsUpgrade, IsResponse),
 InvalidCmd, UpgradeResp },
-/* SCUpgradeFailReq: generates UpgradeFailResp ASAP */
-{ SET5(IsInvalidate, NeedsExclusive, IsLlsc,
-   IsRequest, NeedsResponse),
+/* SCUpgradeFailReq: generates UpgradeFailResp but still gets the data */
+{ SET6(IsRead, NeedsExclusive, IsInvalidate,
+   IsLlsc, IsRequest, NeedsResponse),
 UpgradeFailResp, SCUpgradeFailReq },
-   

[gem5-dev] changeset in gem5: mem: Add utility script to plot DRAM efficien...

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 5169ebd26163 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5169ebd26163
description:
mem: Add utility script to plot DRAM efficiency sweep

This patch adds basic functionality to quickly visualise the output
from the DRAM efficiency script. There are some unfortunate hacks
needed to communicate the needed information from one script to the
other, and we fall back on (ab)using the simout to do this.

As part of this patch we also trim the efficiency sweep to stop at 512
bytes as this should be sufficient for all foreseeable DRAMs.
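
As a rough illustration of the trimmed sweep, the following standalone
Python sketch enumerates the (banks, stride) points that would be visited.
The function name sweep_points and the example values (64-byte burst, 1 kB
page, 8 banks) are assumptions made for illustration; only the
min(512, page_size) cap and the burst-sized stride steps come from the patch.

```python
def sweep_points(burst_size, page_size, nbr_banks, stride_cap=512):
    """Return the (banks, stride) pairs visited by the sweep."""
    # The stride never exceeds the page size or the 512-byte cap.
    max_stride = min(stride_cap, page_size)
    return [(bank, stride)
            for bank in range(1, nbr_banks + 1)
            for stride in range(burst_size, max_stride + 1, burst_size)]


# Hypothetical configuration: 64-byte bursts, 1 kB pages, 8 banks.
points = sweep_points(burst_size=64, page_size=1024, nbr_banks=8)
```

With these example values each bank count is paired with strides of 64 to
512 bytes in 64-byte steps.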

diffstat:

 configs/dram/sweep.py   |   13 +++-
 util/dram_sweep_plot.py |  151 
 2 files changed, 161 insertions(+), 3 deletions(-)

diffs (185 lines):

diff -r 7f4059e4f2d5 -r 5169ebd26163 configs/dram/sweep.py
--- a/configs/dram/sweep.py Wed Sep 03 07:42:28 2014 -0400
+++ b/configs/dram/sweep.py Wed Sep 03 07:42:29 2014 -0400
@@ -124,12 +124,16 @@
 # assume we start at 0
 max_addr = mem_range.end
 
+# use min of the page size and 512 bytes as that should be more than
+# enough
+max_stride = min(512, page_size)
+
 # now we create the state by iterating over the stride size from burst
-# size to min of the page size and 1 kB, and from using only a single
-# bank up to the number of banks available
+# size to the max stride, and from using only a single bank up to the
+# number of banks available
 nxt_state = 0
 for bank in range(1, nbr_banks + 1):
-for stride_size in range(burst_size, min(1024, page_size) + 1, burst_size):
+for stride_size in range(burst_size, max_stride + 1, burst_size):
 cfg_file.write("STATE %d %d DRAM 100 0 %d "
 "%d %d %d %d %d %d %d %d 1\n" %
(nxt_state, period, max_addr, burst_size, itt, itt, 0,
@@ -168,3 +172,6 @@
 
 m5.instantiate()
 m5.simulate(nxt_state * period)
+
+print "DRAM sweep with burst: %d, banks: %d, max stride: %d" % \
+(burst_size, nbr_banks, max_stride)
diff -r 7f4059e4f2d5 -r 5169ebd26163 util/dram_sweep_plot.py
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/util/dram_sweep_plot.py   Wed Sep 03 07:42:29 2014 -0400
@@ -0,0 +1,151 @@
+#!/usr/bin/env python
+
+# Copyright (c) 2014 ARM Limited
+# All rights reserved
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder.  You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# Authors: Andreas Hansson
+
+try:
+    from mpl_toolkits.mplot3d import Axes3D
+    from matplotlib import cm
+    from matplotlib.ticker import LinearLocator, FormatStrFormatter
+    import matplotlib.pyplot as plt
+    import numpy as np
+except ImportError:
+    print "Failed to import matplotlib and numpy"
+    exit(-1)
+
+import sys
+import re
+
+# Determine the parameters of the sweep from the simout output, and
+# then parse the stats and plot the 3D surface corresponding to the
+# different combinations of parallel banks, and stride size, as
+# generated by the config/dram/sweep.py script
+def main():
+
+ 

[gem5-dev] changeset in gem5: arm: support 16kb vm granules

2014-09-03 Thread Curtis Dunham via gem5-dev
changeset f40134eb3f85 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=f40134eb3f85
description:
arm: support 16kb vm granules

diffstat:

 src/arch/arm/miscregs.hh |   26 -
 src/arch/arm/table_walker.cc |  125 +-
 src/arch/arm/table_walker.hh |   56 +-
 3 files changed, 139 insertions(+), 68 deletions(-)

diffs (truncated from 363 to 300 lines):

diff -r 5169ebd26163 -r f40134eb3f85 src/arch/arm/miscregs.hh
--- a/src/arch/arm/miscregs.hh  Wed Sep 03 07:42:29 2014 -0400
+++ b/src/arch/arm/miscregs.hh  Tue May 27 11:00:56 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -1715,6 +1715,30 @@
 Bitfield<20> tbi;
 EndBitUnion(TTBCR)
 
+// Fields of TCR_EL{1,2,3} (mostly overlapping)
+// TCR_EL1 is natively 64 bits, the others are 32 bits
+BitUnion64(TCR)
+Bitfield<5, 0> t0sz;
+Bitfield<7> epd0; // EL1
+Bitfield<9, 8> irgn0;
+Bitfield<11, 10> orgn0;
+Bitfield<13, 12> sh0;
+Bitfield<15, 14> tg0;
+Bitfield<18, 16> ps;
+Bitfield<20> tbi; // EL2/EL3
+Bitfield<21, 16> t1sz; // EL1
+Bitfield<22> a1; // EL1
+Bitfield<23> epd1; // EL1
+Bitfield<25, 24> irgn1; // EL1
+Bitfield<27, 26> orgn1; // EL1
+Bitfield<29, 28> sh1; // EL1
+Bitfield<31, 30> tg1; // EL1
+Bitfield<34, 32> ips; // EL1
+Bitfield<36> as; // EL1
+Bitfield<37> tbi0; // EL1
+Bitfield<38> tbi1; // EL1
+EndBitUnion(TCR)
+
 BitUnion32(HTCR)
 Bitfield<2, 0> t0sz;
 Bitfield<9, 8> irgn0;
diff -r 5169ebd26163 -r f40134eb3f85 src/arch/arm/table_walker.cc
--- a/src/arch/arm/table_walker.cc  Wed Sep 03 07:42:29 2014 -0400
+++ b/src/arch/arm/table_walker.cc  Tue May 27 11:00:56 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010, 2012-2013 ARM Limited
+ * Copyright (c) 2010, 2012-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -220,18 +220,18 @@
   case EL0:
   case EL1:
 currState->sctlr = currState->tc->readMiscReg(MISCREG_SCTLR_EL1);
-currState->ttbcr = currState->tc->readMiscReg(MISCREG_TCR_EL1);
+currState->tcr = currState->tc->readMiscReg(MISCREG_TCR_EL1);
 break;
   // @todo: uncomment this to enable Virtualization
   // case EL2:
   //   assert(haveVirtualization);
   //   currState->sctlr = currState->tc->readMiscReg(MISCREG_SCTLR_EL2);
-  //   currState->ttbcr = currState->tc->readMiscReg(MISCREG_TCR_EL2);
+  //   currState->tcr = currState->tc->readMiscReg(MISCREG_TCR_EL2);
   //   break;
   case EL3:
 assert(haveSecurity);
 currState->sctlr = currState->tc->readMiscReg(MISCREG_SCTLR_EL3);
-currState->ttbcr = currState->tc->readMiscReg(MISCREG_TCR_EL3);
+currState->tcr = currState->tc->readMiscReg(MISCREG_TCR_EL3);
 break;
   default:
 panic("Invalid exception level");
@@ -625,8 +625,7 @@
 
 currState->longDesc.lookupLevel = start_lookup_level;
 currState->longDesc.aarch64 = false;
-currState->longDesc.largeGrain = false;
-currState->longDesc.grainSize = 12;
+currState->longDesc.grainSize = Grain4KB;
 
 Event *event = start_lookup_level == L1 ? (Event *) &doL1LongDescEvent
 : (Event *) &doL2LongDescEvent;
@@ -663,13 +662,18 @@
 {
 assert(currState->aarch64);
 
-DPRINTF(TLB, "Beginning table walk for address %#llx, TTBCR: %#llx\n",
-currState->vaddr_tainted, currState->ttbcr);
+DPRINTF(TLB, "Beginning table walk for address %#llx, TCR: %#llx\n",
+currState->vaddr_tainted, currState->tcr);
+
+static const GrainSize GrainMapDefault[] =
+  { Grain4KB, Grain64KB, Grain16KB, ReservedGrain };
+static const GrainSize GrainMap_EL1_tg1[] =
+  { ReservedGrain, Grain16KB, Grain4KB, Grain64KB };
 
 // Determine TTBR, table size, granule size and phys. address range
 Addr ttbr = 0;
 int tsz = 0, ps = 0;
-bool large_grain = false;
+GrainSize tg = Grain4KB; // grain size computed from tg* field
 bool fault = false;
 switch (currState->el) {
   case EL0:
@@ -678,44 +682,44 @@
   case 0:
 DPRINTF(TLB, " - Selecting TTBR0 (AArch64)\n");
 ttbr = currState->tc->readMiscReg(MISCREG_TTBR0_EL1);
-tsz = adjustTableSizeAArch64(64 - currState->ttbcr.t0sz);
-large_grain = currState->ttbcr.tg0;
+tsz = adjustTableSizeAArch64(64 - currState->tcr.t0sz);
+tg = GrainMapDefault[currState->tcr.tg0];
 if (bits(currState->vaddr, 63, tsz) != 0x0 ||
-currState->ttbcr.epd0)
+

[gem5-dev] changeset in gem5: config: Change parsing of Addr so hex values ...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 19f5df7ac6a1 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=19f5df7ac6a1
description:
config: Change parsing of Addr so hex values work from scripts

When passed from a configuration script with a hexadecimal value (like
0x8000), gem5 would error out. This is because it would call
toMemorySize, which requires the argument to end with a size specifier
(like 1MB, etc).

This modification makes it so raw hex values can be passed through Addr
parameters from the configuration scripts.
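
The fallback can be sketched in standalone Python 3 as follows. Here
to_memory_size is a simplified, hypothetical stand-in for gem5's
convert.toMemorySize (it only knows three suffixes), parse_addr is a made-up
name, and int() replaces the Python 2 long() used by the patch. The
try/except shape and the base=0 conversion mirror the change itself.

```python
_UNITS = {"kB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}


def to_memory_size(value):
    """Hypothetical stand-in: accept only strings ending in a size suffix."""
    if not isinstance(value, str):
        raise TypeError("expected a string with a size suffix")
    for suffix, multiplier in _UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * multiplier
    raise ValueError("no size suffix in %r" % value)


def parse_addr(value):
    try:
        # Addresses are often written as sizes, for example "512MB".
        return to_memory_size(value)
    except (TypeError, ValueError):
        # No size suffix: let int() pick the base from the prefix (0x, 0o, 0b).
        return int(str(value), base=0)
```

For example, parse_addr("512MB") goes through the size path, while
parse_addr("0x8000") and parse_addr(4096) fall through to the numeric
conversion.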

diffstat:

 src/arch/arm/ArmSystem.py |   2 +-
 src/python/m5/params.py   |  12 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diffs (35 lines):

diff -r d2850235e31c -r 19f5df7ac6a1 src/arch/arm/ArmSystem.py
--- a/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:19 2014 -0400
+++ b/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:20 2014 -0400
@@ -65,7 +65,7 @@
 highest_el_is_64 = Param.Bool(False,
 "True if the register width of the highest implemented exception level "
 "is 64 bits (ARMv8)")
-reset_addr_64 = Param.UInt64(0x0,
+reset_addr_64 = Param.Addr(0x0,
 "Reset address if the highest implemented exception level is 64 bits "
 "(ARMv8)")
 phys_addr_range_64 = Param.UInt8(40,
diff -r d2850235e31c -r 19f5df7ac6a1 src/python/m5/params.py
--- a/src/python/m5/params.py   Wed Sep 03 07:42:19 2014 -0400
+++ b/src/python/m5/params.py   Wed Sep 03 07:42:20 2014 -0400
@@ -626,9 +626,17 @@
             self.value = value.value
         else:
             try:
+                # Often addresses are referred to with sizes. Ex: A device
+                # base address is at "512MB".  Use toMemorySize() to convert
+                # these into addresses. If the address is not specified with a
+                # "size", an exception will occur and numeric translation will
+                # proceed below.
                 self.value = convert.toMemorySize(value)
-            except TypeError:
-                self.value = long(value)
+            except (TypeError, ValueError):
+                # Convert number to string and use long() to do automatic
+                # base conversion (requires base=0 for auto-conversion)
+                self.value = long(str(value), base=0)
+
         self._check()
     def __add__(self, other):
         if isinstance(other, Addr):
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: dev: Avoid invalid sized reads in PL390 with ...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 72890a571a7b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=72890a571a7b
description:
dev: Avoid invalid sized reads in PL390 with DPRINTF enabled

The first DPRINTF() in PL390::writeDistributor always read a uint32_t,
though a packet may have only been 1 or 2 bytes.  This triggered an
assertion in packet->get().
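
The size-dependent read can be illustrated with a small Python sketch. The
function name read_packet_value is hypothetical, and little-endian byte
order is an assumption made only so the example is concrete; the point is
that the width of the read follows the payload size and any other size is
rejected.

```python
import struct

# Map payload size in bytes to an unsigned little-endian struct format.
_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def read_packet_value(payload):
    """Decode a 1, 2 or 4 byte payload without over-reading it."""
    fmt = _FORMATS.get(len(payload))
    if fmt is None:
        raise ValueError("invalid access size: %d" % len(payload))
    return struct.unpack(fmt, payload)[0]
```

A 2-byte payload such as b"\x34\x12" is decoded as a 16-bit value instead
of being read as if it were 4 bytes wide.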

diffstat:

 src/dev/arm/gic_pl390.cc |  19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 82a4fa2d19a0 -r 72890a571a7b src/dev/arm/gic_pl390.cc
--- a/src/dev/arm/gic_pl390.cc  Wed Sep 03 07:42:25 2014 -0400
+++ b/src/dev/arm/gic_pl390.cc  Wed Sep 03 07:42:27 2014 -0400
@@ -395,8 +395,25 @@
 assert(pkt->req->hasContextId());
 int ctx_id = pkt->req->contextId();
 
+uint32_t pkt_data M5_VAR_USED;
+switch (pkt->getSize())
+{
+  case 1:
+pkt_data = pkt->get<uint8_t>();
+break;
+  case 2:
+pkt_data = pkt->get<uint16_t>();
+break;
+  case 4:
+pkt_data = pkt->get<uint32_t>();
+break;
+  default:
+panic("Invalid size when writing to priority regs in Gic: %d\n",
+  pkt->getSize());
+}
+
 DPRINTF(GIC, "gic distributor write register %#x size %#x value %#x \n",
-daddr, pkt->getSize(), pkt->get<uint32_t>());
+daddr, pkt->getSize(), pkt_data);
 
 if (daddr >= ICDISER_ST && daddr < ICDISER_ED + 4) {
 assert((daddr-ICDISER_ST) >> 2 < 32);


[gem5-dev] changeset in gem5: arch: Properly guess OpClass from optional St...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 43516d8eabe9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=43516d8eabe9
description:
arch: Properly guess OpClass from optional StaticInst flags

isa_parser.py guesses the OpClass if none was given based upon the
StaticInst flags.  The existing code does not take into account optionally
set flags.  This code hoists the setting of optional flags so OpClass is
properly assigned.
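
A simplified Python sketch of the reordered logic follows. The function
name resolve_op_class is hypothetical, and the startswith("Is") and
endswith("Op") checks are simplifications standing in for the parser's own
matching of flags and OpClass names. What the sketch shows is the ordering:
optional arguments are folded in first, and the guess runs only if no
OpClass was given.

```python
def resolve_op_class(operand_flags, opt_args):
    """Return the OpClass after applying optional flags (simplified model)."""
    flags = list(operand_flags)
    op_class = None

    # Optional arguments are either extra StaticInst flags or an OpClass.
    for arg in opt_args:
        if arg.startswith("Is"):
            flags.append(arg)
        elif arg.endswith("Op"):
            op_class = arg
        else:
            raise ValueError("optional arg %s not recognized" % arg)

    # Guess only when no OpClass was given, now seeing the optional flags too.
    if not op_class:
        if "IsStore" in flags:
            op_class = "MemWriteOp"
        elif "IsLoad" in flags or "IsPrefetch" in flags:
            op_class = "MemReadOp"
        elif "IsFloating" in flags:
            op_class = "FloatAddOp"
        else:
            op_class = "IntAluOp"
    return op_class
```

With this ordering, an instruction whose only hint is an optional IsStore
flag is classified as MemWriteOp rather than falling through to IntAluOp.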

diffstat:

 src/arch/isa_parser.py |  36 +---
 1 files changed, 25 insertions(+), 11 deletions(-)

diffs (57 lines):

diff -r 7aacec2a247d -r 43516d8eabe9 src/arch/isa_parser.py
--- a/src/arch/isa_parser.py    Wed Sep 03 07:42:31 2014 -0400
+++ b/src/arch/isa_parser.py    Wed Sep 03 07:42:32 2014 -0400
@@ -1,3 +1,15 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder.  You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
 # Copyright (c) 2003-2005 The Regents of The University of Michigan
 # Copyright (c) 2013 Advanced Micro Devices, Inc.
 # All rights reserved.
@@ -1119,17 +1131,7 @@
 
 self.flags = self.operands.concatAttrLists('flags')
 
-# Make a basic guess on the operand class (function unit type).
-# These are good enough for most cases, and can be overridden
-# later otherwise.
-if 'IsStore' in self.flags:
-self.op_class = 'MemWriteOp'
-elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags:
-self.op_class = 'MemReadOp'
-elif 'IsFloating' in self.flags:
-self.op_class = 'FloatAddOp'
-else:
-self.op_class = 'IntAluOp'
+self.op_class = None
 
 # Optional arguments are assumed to be either StaticInst flags
 # or an OpClass value.  To avoid having to import a complete
@@ -1144,6 +1146,18 @@
 error('InstObjParams: optional arg %s not recognized '
   'as StaticInst::Flag or OpClass.' % oa)
 
+# Make a basic guess on the operand class if not set.
+# These are good enough for most cases.
+if not self.op_class:
+if 'IsStore' in self.flags:
+self.op_class = 'MemWriteOp'
+elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags:
+self.op_class = 'MemReadOp'
+elif 'IsFloating' in self.flags:
+self.op_class = 'FloatAddOp'
+else:
+self.op_class = 'IntAluOp'
+
 # add flag initialization to contructor here to include
 # any flags added via opt_args
 self.constructor += makeFlagConstructor(self.flags)


[gem5-dev] changeset in gem5: cpu: Fix SMT scheduling issue with the O3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset ed05298e8566 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ed05298e8566
description:
cpu: Fix SMT scheduling issue with the O3 cpu

The o3 cpu could attempt to schedule inactive threads under round-robin
SMT mode.

This is because it maintained a priority list of threads that was
independent of the active thread list.  This priority list could become
stale once threads were inactive, leading to the cpu trying to
fetch/commit from inactive threads.

Additionally the fetch queue is now forcibly flushed of instructions
from the de-scheduled thread.

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
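
The essence of the fix can be sketched in Python. RoundRobinScheduler and
its methods are hypothetical names, not gem5 classes; the sketch only shows
that deactivating a thread has to remove it from the priority list as well
as from the active list, otherwise round-robin selection keeps returning it.

```python
from collections import deque


class RoundRobinScheduler:
    """Toy round-robin thread picker (hypothetical, not gem5 code)."""

    def __init__(self, tids):
        self.active = list(tids)
        self.priority = deque(tids)

    def deactivate(self, tid):
        if tid in self.active:
            self.active.remove(tid)
        # Without this step the priority list would go stale.
        if tid in self.priority:
            self.priority.remove(tid)

    def next_thread(self):
        if not self.priority:
            return None
        # Rotate: the chosen thread moves to the back of the list.
        tid = self.priority.popleft()
        self.priority.append(tid)
        return tid


sched = RoundRobinScheduler([0, 1])
sched.deactivate(1)
picked = [sched.next_thread() for _ in range(3)]
```

After thread 1 is deactivated, only thread 0 is ever selected.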

diffstat:

 src/cpu/o3/O3CPU.py   |3 +-
 src/cpu/o3/commit.hh  |5 +-
 src/cpu/o3/commit_impl.hh |   15 +-
 src/cpu/o3/cpu.cc |5 +-
 src/cpu/o3/fetch.hh   |6 +-
 src/cpu/o3/fetch_impl.hh  |  109 +
 6 files changed, 99 insertions(+), 44 deletions(-)

diffs (truncated from 306 to 300 lines):

diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:37 2014 -0400
@@ -61,7 +61,8 @@
 commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay")
 fetchWidth = Param.Unsigned(8, "Fetch width")
 fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes")
-fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops")
+fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops "
+"per-thread")
 
 renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay")
 iewToDecodeDelay = Param.Cycles(1, "Issue/Execute/Writeback to decode "
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved.
  *
  * The license below extends only to copyright in the software and shall
@@ -218,6 +218,9 @@
 /** Takes over from another CPU's thread. */
 void takeOverFrom();
 
+/** Deschedules a thread from scheduling */
+void deactivateThread(ThreadID tid);
+
 /** Ticks the commit stage, which tries to commit instructions. */
 void tick();
 
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -463,6 +463,19 @@
 
 template <class Impl>
 void
+DefaultCommit<Impl>::deactivateThread(ThreadID tid)
+{
+list<ThreadID>::iterator thread_it = std::find(priority_list.begin(),
+priority_list.end(), tid);
+
+if (thread_it != priority_list.end()) {
+priority_list.erase(thread_it);
+}
+}
+
+
+template <class Impl>
+void
 DefaultCommit<Impl>::updateStatus()
 {
 // reset ROB changed variable
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/cpu.cc
--- a/src/cpu/o3/cpu.cc Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/cpu.cc Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2011-2012 ARM Limited
+ * Copyright (c) 2011-2012, 2014 ARM Limited
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved
  *
@@ -728,6 +728,9 @@
 tid);
 activeThreads.erase(thread_it);
 }
+
+fetch.deactivateThread(tid);
+commit.deactivateThread(tid);
 }
 
 template <class Impl>
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:37 2014 -0400
@@ -255,6 +255,8 @@
 /** Tells fetch to wake up from a quiesce instruction. */
 void wakeFromQuiesce();
 
+/** For priority-based fetch policies, need to keep update priorityList */
+void deactivateThread(ThreadID tid);
   private:
 /** Reset this pipeline stage */
 void resetStage();
@@ -484,8 +486,8 @@
 /** The size of the fetch queue in micro-ops */
 unsigned fetchQueueSize;
 
-/** Queue of fetched instructions */
-std::deque<DynInstPtr> fetchQueue;
+/** Queue of fetched instructions. Per-thread to prevent HoL blocking. */
+std::deque<DynInstPtr> fetchQueue[Impl::MaxThreads];
 
 /** Whether or not the fetch buffer data 

[gem5-dev] changeset in gem5: cpu: Add a fetch queue to the o3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 12e3be8203a5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=12e3be8203a5
description:
cpu: Add a fetch queue to the o3 cpu

This patch adds a fetch queue that sits between fetch and decode to the
o3 cpu.  This effectively decouples fetch from decode stalls allowing it
to be more aggressive, running further ahead in the instruction stream.
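
A minimal Python sketch of the decoupling idea follows. The class
FetchQueue and its method names are hypothetical and much simpler than the
o3 implementation; the sketch only shows that fetch can keep filling a
bounded queue while decode is stalled, and that decode drains it at its own
width once it resumes.

```python
from collections import deque


class FetchQueue:
    """Toy bounded queue between fetch and decode (hypothetical)."""

    def __init__(self, queue_size, fetch_width, decode_width):
        self.queue = deque()
        self.queue_size = queue_size
        self.fetch_width = fetch_width
        self.decode_width = decode_width

    def fetch(self, insts):
        """Enqueue up to fetch_width instructions; return how many fit."""
        accepted = 0
        for inst in insts:
            if accepted == self.fetch_width or len(self.queue) == self.queue_size:
                break
            self.queue.append(inst)
            accepted += 1
        return accepted

    def send_to_decode(self, decode_stalled):
        """Dequeue up to decode_width instructions unless decode is stalled."""
        if decode_stalled:
            return []
        sent = []
        while self.queue and len(sent) < self.decode_width:
            sent.append(self.queue.popleft())
        return sent
```

Fetch only stops once the queue is full, so a short decode stall no longer
stops fetch immediately.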

diffstat:

 src/cpu/o3/O3CPU.py  |   1 +
 src/cpu/o3/fetch.hh  |  14 +++---
 src/cpu/o3/fetch_impl.hh |  61 ++-
 3 files changed, 55 insertions(+), 21 deletions(-)

diffs (201 lines):

diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:35 2014 -0400
@@ -61,6 +61,7 @@
 commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay")
 fetchWidth = Param.Unsigned(8, "Fetch width")
 fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes")
+fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops")
 
 renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay")
 iewToDecodeDelay = Param.Cycles(1, "Issue/Execute/Writeback to decode "
diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:35 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -401,9 +401,6 @@
 /** Wire to get commit's information from backwards time buffer. */
 typename TimeBuffer<TimeStruct>::wire fromCommit;
 
-/** Internal fetch instruction queue. */
-TimeBuffer<FetchStruct> *fetchQueue;
-
 //Might be annoying how this name is different than the queue.
 /** Wire used to write any information heading to decode. */
 typename TimeBuffer<FetchStruct>::wire toDecode;
@@ -455,6 +452,9 @@
 /** The width of fetch in instructions. */
 unsigned fetchWidth;
 
+/** The width of decode in instructions. */
+unsigned decodeWidth;
+
 /** Is the cache blocked?  If so no threads can access it. */
 bool cacheBlocked;
 
@@ -481,6 +481,12 @@
 /** The PC of the first instruction loaded into the fetch buffer. */
 Addr fetchBufferPC[Impl::MaxThreads];
 
+/** The size of the fetch queue in micro-ops */
+unsigned fetchQueueSize;
+
+/** Queue of fetched instructions */
+std::deque<DynInstPtr> fetchQueue;
+
 /** Whether or not the fetch buffer data is valid. */
 bool fetchBufferValid[Impl::MaxThreads];
 
diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:35 2014 -0400
@@ -82,11 +82,13 @@
   iewToFetchDelay(params->iewToFetchDelay),
   commitToFetchDelay(params->commitToFetchDelay),
   fetchWidth(params->fetchWidth),
+  decodeWidth(params->decodeWidth),
   retryPkt(NULL),
   retryTid(InvalidThreadID),
   cacheBlkSize(cpu->cacheLineSize()),
   fetchBufferSize(params->fetchBufferSize),
   fetchBufferMask(fetchBufferSize - 1),
+  fetchQueueSize(params->fetchQueueSize),
   numThreads(params->numThreads),
   numFetchingThreads(params->smtNumFetchingThreads),
   finishTranslationEvent(this)
@@ -313,12 +315,10 @@
 
 template<class Impl>
 void
-DefaultFetch<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *fq_ptr)
+DefaultFetch<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *ftb_ptr)
 {
-fetchQueue = fq_ptr;
-
-// Create wire to write information to proper place in fetch queue.
-toDecode = fetchQueue->getWire(0);
+// Create wire to write information to proper place in fetch time buf.
+toDecode = ftb_ptr->getWire(0);
 }
 
 template<class Impl>
@@ -342,6 +342,7 @@
 cacheBlocked = false;
 
 priorityList.clear();
+fetchQueue.clear();
 
 // Setup PC and nextPC with initial state.
 for (ThreadID tid = 0; tid < numThreads; ++tid) {
@@ -454,6 +455,10 @@
 return false;
 }
 
+// Not drained if fetch queue contains entries
+if (!fetchQueue.empty())
+return false;
+
 /* The pipeline might start up again in the middle of the drain
  * cycle if the finish translation event is scheduled, so make
  * sure that's not the case.
@@ -673,11 +678,8 @@
 fetchStatus[tid] = IcacheWaitResponse;
 }
 } else {
-// Don't send an instruction to decode if it can't handle it.
-// Asynchronous nature of this function's calling means we have to
-// check 2 signals to see if decode is stalled.
-if (!(numInst < fetchWidth) || stalls[tid].decode ||
-fromDecode->decodeBlock[tid]) {
+// Don't send an instruction to decode if we can't handle it.
+if (!(numInst 

[gem5-dev] changeset in gem5: cpu: Fix o3 front-end pipeline interlock beha...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 867b536a68be in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=867b536a68be
description:
cpu: Fix o3 front-end pipeline interlock behavior

The o3 pipeline interlock/stall logic is incorrect.  o3 unnecessarily
stalled fetch and decode due to later stages in the pipeline.  In general,
a stage should usually only consider if it is stalled by the adjacent,
downstream stage.  Forcing stalls due to later stages creates bubbles in
the pipeline.  Additionally, o3 stalled the entire frontend (fetch, decode,
rename) on a branch mispredict while the ROB is being serially walked to
update the RAT (robSquashing).  It should only have stalled at rename.
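
The adjacent-stage rule can be pictured with a short Python sketch. The
stage names follow the commit message, but the function stalled_stages and
the set-based model are hypothetical simplifications that ignore skid
buffers and multi-cycle effects.

```python
STAGES = ["fetch", "decode", "rename", "iew", "commit"]


def stalled_stages(blocked):
    """Return the stages that must stall in this cycle.

    `blocked` is the set of stages that cannot accept new instructions.
    A stage stalls only when its immediate downstream neighbour is blocked.
    """
    return [stage
            for stage, downstream in zip(STAGES, STAGES[1:])
            if downstream in blocked]
```

Under this rule a blocked commit stage stalls only iew in that cycle, and
a blocked rename stage stalls only decode, so fetch is not held up
directly.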

diffstat:

 src/cpu/o3/comm.hh|   2 -
 src/cpu/o3/commit.hh  |  11 
 src/cpu/o3/commit_impl.hh |  40 -
 src/cpu/o3/decode.hh  |   4 +--
 src/cpu/o3/decode_impl.hh |  55 +++-
 src/cpu/o3/fetch.hh   |   3 --
 src/cpu/o3/fetch_impl.hh  |  64 --
 src/cpu/o3/iew.hh |  11 
 src/cpu/o3/iew_impl.hh|  23 +---
 src/cpu/o3/rename_impl.hh |  25 +
 10 files changed, 26 insertions(+), 212 deletions(-)

diffs (truncated from 525 to 300 lines):

diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/comm.hh
--- a/src/cpu/o3/comm.hh    Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/comm.hh    Wed Sep 03 07:42:34 2014 -0400
@@ -229,8 +229,6 @@
 bool renameUnblock[Impl::MaxThreads];
 bool iewBlock[Impl::MaxThreads];
 bool iewUnblock[Impl::MaxThreads];
-bool commitBlock[Impl::MaxThreads];
-bool commitUnblock[Impl::MaxThreads];
 };
 
 #endif //__CPU_O3_COMM_HH__
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:34 2014 -0400
@@ -185,9 +185,6 @@
 /** Sets the pointer to the IEW stage. */
 void setIEWStage(IEW *iew_stage);
 
-/** Skid buffer between rename and commit. */
-std::queue<DynInstPtr> skidBuffer;
-
 /** The pointer to the IEW stage. Used solely to ensure that
  * various events (traps, interrupts, syscalls) do not occur until
  * all stores have written back.
@@ -251,11 +248,6 @@
  */
 void setNextStatus();
 
-/** Checks if the ROB is completed with squashing. This is for the case
- * where the ROB can take multiple cycles to complete squashing.
- */
-bool robDoneSquashing();
-
 /** Returns if any of the threads have the number of ROB entries changed
  * on this cycle. Used to determine if the number of free ROB entries needs
  * to be sent back to previous stages.
@@ -321,9 +313,6 @@
 /** Gets instructions from rename and inserts them into the ROB. */
 void getInsts();
 
-/** Insert all instructions from rename into skidBuffer */
-void skidInsert();
-
 /** Marks completed instructions using information sent from IEW. */
 void markCompletedInsts();
 
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:34 2014 -0400
@@ -1335,29 +1335,6 @@
 
 template <class Impl>
 void
-DefaultCommit<Impl>::skidInsert()
-{
-DPRINTF(Commit, "Attempting to any instructions from rename into "
-"skidBuffer.\n");
-
-for (int inst_num = 0; inst_num < fromRename->size; ++inst_num) {
-DynInstPtr inst = fromRename->insts[inst_num];
-
-if (!inst->isSquashed()) {
-DPRINTF(Commit, "Inserting PC %s [sn:%i] [tid:%i] into ",
-"skidBuffer.\n", inst->pcState(), inst->seqNum,
-inst->threadNumber);
-skidBuffer.push(inst);
-} else {
-DPRINTF(Commit, "Instruction PC %s [sn:%i] [tid:%i] was "
-"squashed, skipping.\n",
-inst->pcState(), inst->seqNum, inst->threadNumber);
-}
-}
-}
-
-template <class Impl>
-void
 DefaultCommit<Impl>::markCompletedInsts()
 {
 // Grab completed insts out of the IEW instruction queue, and mark
@@ -1380,23 +1357,6 @@
 }
 
 template <class Impl>
-bool
-DefaultCommit<Impl>::robDoneSquashing()
-{
-list<ThreadID>::iterator threads = activeThreads->begin();
-list<ThreadID>::iterator end = activeThreads->end();
-
-while (threads != end) {
-ThreadID tid = *threads++;
-
-if (!rob->isDoneSquashing(tid))
-return false;
-}
-
-return true;
-}
-
-template <class Impl>
 void
 DefaultCommit<Impl>::updateComInstStats(DynInstPtr inst)
 {
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/decode.hh
--- a/src/cpu/o3/decode.hh  Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/decode.hh  Wed Sep 03 07:42:34 2014 -0400
@@ -126,7 +126,7 @@
 void drainSanityCheck() const;
 
 /** Has the stage 

[gem5-dev] changeset in gem5: cpu: Fix o3 drain bug

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 40d24a672351 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=40d24a672351
description:
cpu: Fix o3 drain bug

For X86, the o3 CPU would get stuck with the commit stage not being
drained if an interrupt arrived while drain was pending. isDrained()
makes sure that pcState.microPC() == 0, thus ensuring that we are at
an instruction boundary. However, when we take an interrupt we
execute:

pcState.upc(romMicroPC(entry));
pcState.nupc(romMicroPC(entry) + 1);
tc-pcState(pcState);

As a result, the MicroPC is no longer zero. This patch ensures the drain is
delayed until no interrupts are present.  Once draining, non-synchronous
interrupts are deferred until after the switch.
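
The drain condition described above can be sketched as a small state check.
This is only an illustration under stated assumptions: CommitState,
tryStartDrain and mayPropagateInterrupt are hypothetical names, and the real
commit stage tracks this state per thread and checks more conditions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the commit-stage state the description mentions.
struct CommitState {
    std::uint16_t microPC = 0;      // non-zero while inside a macro-op or ROM routine
    bool interruptPending = false;  // an interrupt has been latched but not handled
    bool trapPending = false;       // a trap is in flight for this thread
    bool drainPending = false;      // a drain has been requested
    bool drainImminent = false;     // the last architectural instruction has committed
};

// Returns true (and sets drainImminent) once commit is at a clean
// instruction boundary with no interrupt or trap outstanding.
inline bool tryStartDrain(CommitState &s)
{
    if (!s.drainPending)
        return false;
    if (s.microPC != 0 || s.interruptPending || s.trapPending)
        return false;               // keep committing; try again later
    s.drainImminent = true;
    return true;
}

// Once the drain is imminent, new interrupts are held back until resume.
inline bool mayPropagateInterrupt(const CommitState &s)
{
    return !s.trapPending && !s.interruptPending && !s.drainImminent;
}
```

The point of the second flag is that a pending drain alone does not block
interrupts; only the transition to drainImminent does.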

diffstat:

 src/cpu/o3/commit.hh  |  11 ++-
 src/cpu/o3/commit_impl.hh |  15 ---
 2 files changed, 22 insertions(+), 4 deletions(-)

diffs (72 lines):

diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:44 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:45 2014 -0400
@@ -438,9 +438,18 @@
 /** Number of Active Threads */
 ThreadID numThreads;
 
-/** Is a drain pending. */
+/** Is a drain pending? Commit is looking for an instruction boundary while
+ * there are no pending interrupts
+ */
 bool drainPending;
 
+/** Is a drain imminent? Commit has found an instruction boundary while no
+ * interrupts were present or in flight.  This was the last architecturally
+ * committed instruction.  Interrupts disabled and pipeline flushed.
+ * Waiting for structures to finish draining.
+ */
+bool drainImminent;
+
 /** The latency to handle a trap.  Used when scheduling trap
  * squash event.
  */
diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:44 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:45 2014 -0400
@@ -104,6 +104,7 @@
   commitWidth(params->commitWidth),
   numThreads(params->numThreads),
   drainPending(false),
+  drainImminent(false),
   trapLatency(params->trapLatency),
   canHandleInterrupts(true),
   avoidQuiesceLiveLock(false)
@@ -406,6 +407,7 @@
 DefaultCommit<Impl>::drainResume()
 {
 drainPending = false;
+drainImminent = false;
 }
 
 template <class Impl>
@@ -816,8 +818,10 @@
 void
 DefaultCommit<Impl>::propagateInterrupt()
 {
+// Don't propagate intterupts if we are currently handling a trap or
+// in draining and the last observable instruction has been committed.
 if (commitStatus[0] == TrapPending || interrupt || trapSquash[0] ||
-tcSquash[0])
+tcSquash[0] || drainImminent)
 return;
 
 // Process interrupts if interrupts are enabled, not in PAL
@@ -1089,10 +1093,15 @@
 squashAfter(tid, head_inst);
 
 if (drainPending) {
-DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]);
-if (pc[tid].microPC() == 0 && interrupt == NoFault) {
+if (pc[tid].microPC() == 0 && interrupt == NoFault &&
+!thread[tid]->trapPending) {
+// Last architectually committed instruction.
+// Squash the pipeline, stall fetch, and use
+// drainImminent to disable interrupts
+DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]);
 squashAfter(tid, head_inst);
 cpu->commitDrained(tid);
+drainImminent = true;
 }
 }
 
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: arm: Fix v8 neon latency issue for loads/stores

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 53278be85b40 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=53278be85b40
description:
arm: Fix v8 neon latency issue for loads/stores

Neon memory ops that operate on multiple registers currently have very poor
performance because of interleave/deinterleave micro-ops.

This patch marks the deinterleave/interleave micro-ops as No_OpClass such
that they take minimum cycles to execute and are never resource constrained.

Additionally, the micro-ops over-read registers.  Although one form may need
to read up to 20 sources, not all do.  This adds new forms so that false
dependencies are not modeled.  Instructions read their minimum number of
sources.

diffstat:

 src/arch/arm/insts/macromem.cc|  47 +-
 src/arch/arm/isa/insts/neon64_mem.isa |  24 +++-
 2 files changed, 56 insertions(+), 15 deletions(-)

diffs (140 lines):

diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/insts/macromem.cc
--- a/src/arch/arm/insts/macromem.cc  Tue Apr 29 16:05:02 2014 -0500
+++ b/src/arch/arm/insts/macromem.cc  Wed Sep 03 07:42:44 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -1107,9 +1107,26 @@
 }
 
 for (int i = 0; i < numMarshalMicroops; ++i) {
-microOps[uopIdx++] = new MicroDeintNeon64(
-machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
-numStructElems, numRegs, i /* step */);
+switch(numRegs) {
+case 1: microOps[uopIdx++] = new MicroDeintNeon64_1Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 1, i /* step */);
+break;
+case 2: microOps[uopIdx++] = new MicroDeintNeon64_2Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 2, i /* step */);
+break;
+case 3: microOps[uopIdx++] = new MicroDeintNeon64_3Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 3, i /* step */);
+break;
+case 4: microOps[uopIdx++] = new MicroDeintNeon64_4Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 4, i /* step */);
+break;
+default: panic("Invalid number of registers");
+}
+
 }
 
 assert(uopIdx == numMicroops);
@@ -1150,9 +1167,25 @@
 unsigned uopIdx = 0;
 
 for(int i = 0; i < numMarshalMicroops; ++i) {
-microOps[uopIdx++] = new MicroIntNeon64(
-machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
-numStructElems, numRegs, i /* step */);
+switch (numRegs) {
+case 1: microOps[uopIdx++] = new MicroIntNeon64_1Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 1, i /* step */);
+break;
+case 2: microOps[uopIdx++] = new MicroIntNeon64_2Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 2, i /* step */);
+break;
+case 3: microOps[uopIdx++] = new MicroIntNeon64_3Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 3, i /* step */);
+break;
+case 4: microOps[uopIdx++] = new MicroIntNeon64_4Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 4, i /* step */);
+break;
+default: panic("Invalid number of registers");
+}
 }
 
 uint32_t memaccessFlags = TLB::MustBeOne | (TLB::ArmFlags) eSize |
diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/isa/insts/neon64_mem.isa
--- a/src/arch/arm/isa/insts/neon64_mem.isa Tue Apr 29 16:05:02 2014 -0500
+++ b/src/arch/arm/isa/insts/neon64_mem.isa Wed Sep 03 07:42:44 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode: c++ -*-
 
-// Copyright (c) 2012-2013 ARM Limited
+// Copyright (c) 2012-2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -163,11 +163,11 @@
 header_output += MicroNeonMemDeclare64.subst(loadIop) + \
 MicroNeonMemDeclare64.subst(storeIop)
 
-def mkMarshalMicroOp(name, Name):
+def mkMarshalMicroOp(name, Name, numRegs=4):
 global header_output, decoder_output, exec_output
 
 getInputCodeOp1L = ''
-for v in range(4):
+for v in range(numRegs):
 for p in 

[gem5-dev] changeset in gem5: cpu: fix bimodal predictor to use correct glo...

2014-09-03 Thread Dam Sunwoo via gem5-dev
changeset 1b627a6ddac0 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1b627a6ddac0
description:
cpu: fix bimodal predictor to use correct global history reg

A small bug in the bimodal predictor caused significant degradation in
performance on some benchmarks. This was caused by using the wrong
globalHistoryReg during the update phase. This patch fixes the bug
and brings the performance back to a normal level.
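
Why the choice of history register matters can be seen in a small sketch.
The code below is an assumption-laden illustration rather than the predictor
itself: BranchHistory and globalIndex are hypothetical names, and the index
function only mirrors the shift, XOR and mask shape of the changed line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-branch record saved when the prediction is made.
struct BranchHistory {
    std::uint32_t globalHistoryReg;   // global history as seen at prediction time
};

// Table index: (PC >> shift) XOR history, masked to the table size.
inline std::uint32_t globalIndex(std::uint64_t branchAddr, unsigned instShiftAmt,
                                 std::uint32_t historyReg, std::uint32_t mask)
{
    return static_cast<std::uint32_t>((branchAddr >> instShiftAmt) ^ historyReg)
           & mask;
}

// Self-check: updating with the live history trains a different entry
// than the one that produced the prediction.
inline void historyExample()
{
    const BranchHistory saved{0x5};         // snapshot taken at predict time
    const std::uint32_t live = 0xB;         // history has moved on since then
    const std::uint32_t predicted = globalIndex(0x1000, 2, saved.globalHistoryReg, 0xFF);
    const std::uint32_t wrongUpdate = globalIndex(0x1000, 2, live, 0xFF);
    assert(predicted == 0x05);
    assert(wrongUpdate == 0x0B);
    assert(predicted != wrongUpdate);
}
```

Using the saved snapshot at update time keeps the predict and update steps
pointed at the same counter.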

diffstat:

 src/cpu/pred/bi_mode.cc |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diffs (12 lines):

diff -r 5e424aa952c5 -r 1b627a6ddac0 src/cpu/pred/bi_mode.cc
--- a/src/cpu/pred/bi_mode.cc   Wed Sep 03 07:42:40 2014 -0400
+++ b/src/cpu/pred/bi_mode.cc   Wed Sep 03 07:42:41 2014 -0400
@@ -167,7 +167,7 @@
 unsigned choiceHistoryIdx = ((branchAddr >> instShiftAmt)
 & choiceHistoryMask);
 unsigned globalHistoryIdx = (((branchAddr >> instShiftAmt)
-^ globalHistoryReg)
+^ history->globalHistoryReg)
 & globalHistoryMask);
 
 assert(choiceHistoryIdx < choicePredictorSize);


[gem5-dev] changeset in gem5: mem: Refactor assignment of Packet types

2014-09-03 Thread Curtis Dunham via gem5-dev
changeset 711eb0e64249 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=711eb0e64249
description:
mem: Refactor assignment of Packet types

Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
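
The kind of mapping being centralized can be sketched as below. This is a
hedged illustration modeled on the logic removed from the call sites in the
diff, not the actual Packet implementation: Cmd, Req and refine are
hypothetical stand-ins for MemCmd, Request and refineCommand().

```cpp
#include <cassert>

// Hypothetical stand-ins for the packet commands named in the diff.
enum class Cmd { ReadReq, WriteReq, LoadLockedReq, StoreCondReq, SwapReq };

// Hypothetical stand-in for the request flags that drive the choice.
struct Req {
    bool llsc = false;   // load-locked / store-conditional access
    bool swap = false;   // atomic swap
};

// Map a generic read or write onto the more specific command the
// request calls for; anything else is returned unchanged.
inline Cmd refine(Cmd cmd, const Req &req)
{
    if (cmd == Cmd::ReadReq && req.llsc)
        return Cmd::LoadLockedReq;
    if (cmd == Cmd::WriteReq) {
        if (req.swap)
            return Cmd::SwapReq;       // swap takes priority over LL/SC
        if (req.llsc)
            return Cmd::StoreCondReq;
    }
    return cmd;
}

// Self-check covering the three special cases and the default.
inline void refineExample()
{
    Req plain, locked, swapped;
    locked.llsc = true;
    swapped.swap = true;
    assert(refine(Cmd::ReadReq, plain) == Cmd::ReadReq);
    assert(refine(Cmd::ReadReq, locked) == Cmd::LoadLockedReq);
    assert(refine(Cmd::WriteReq, locked) == Cmd::StoreCondReq);
    assert(refine(Cmd::WriteReq, swapped) == Cmd::SwapReq);
}
```

Keeping this decision in one function means each CPU model no longer has to
repeat the same conditional chain.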

diffstat:

 src/cpu/checker/cpu.cc  |   5 +---
 src/cpu/inorder/resources/cache_unit.cc |  14 +-
 src/cpu/o3/lsq_unit.hh  |   7 ++---
 src/cpu/o3/lsq_unit_impl.hh |   9 ++
 src/cpu/ozone/lw_lsq.hh |   5 +---
 src/cpu/ozone/lw_lsq_impl.hh|   5 +---
 src/cpu/simple/atomic.cc|   5 +--
 src/cpu/simple/timing.cc|  15 +---
 src/mem/packet.hh   |  41 -
 9 files changed, 54 insertions(+), 52 deletions(-)

diffs (223 lines):

diff -r 0b4d10f53c2d -r 711eb0e64249 src/cpu/checker/cpu.cc
--- a/src/cpu/checker/cpu.cc  Wed Sep 03 07:42:46 2014 -0400
+++ b/src/cpu/checker/cpu.cc  Tue May 13 12:20:48 2014 -0500
@@ -170,10 +170,7 @@
 // Now do the access
 if (fault == NoFault &&
 !memReq->getFlags().isSet(Request::NO_ACCESS)) {
-PacketPtr pkt = new Packet(memReq,
-   memReq->isLLSC() ?
-   MemCmd::LoadLockedReq :
-   MemCmd::ReadReq);
+PacketPtr pkt = Packet::createRead(memReq);
 
 pkt->dataStatic(data);
 
diff -r 0b4d10f53c2d -r 711eb0e64249 src/cpu/inorder/resources/cache_unit.cc
--- a/src/cpu/inorder/resources/cache_unit.cc   Wed Sep 03 07:42:46 2014 -0400
+++ b/src/cpu/inorder/resources/cache_unit.cc   Tue May 13 12:20:48 2014 -0500
@@ -812,21 +812,11 @@
 void
 CacheUnit::buildDataPacket(CacheRequest *cache_req)
 {
-// Check for LL/SC and if so change command
-if (cache_req->memReq->isLLSC() && cache_req->pktCmd == MemCmd::ReadReq) {
-cache_req->pktCmd = MemCmd::LoadLockedReq;
-}
-
-if (cache_req->pktCmd == MemCmd::WriteReq) {
-cache_req->pktCmd =
-cache_req->memReq->isSwap() ? MemCmd::SwapReq :
-(cache_req->memReq->isLLSC() ? MemCmd::StoreCondReq
- : MemCmd::WriteReq);
-}
-
 cache_req->dataPkt = new CacheReqPacket(cache_req,
 cache_req->pktCmd,
 cache_req->instIdx);
+cache_req->dataPkt->refineCommand(); // handle LL/SC, etc.
+
 DPRINTF(InOrderCachePort, "[slot:%i]: Slot marked for %x\n",
 cache_req->getSlot(),
 cache_req->dataPkt->getAddr());
diff -r 0b4d10f53c2d -r 711eb0e64249 src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hh  Wed Sep 03 07:42:46 2014 -0400
+++ b/src/cpu/o3/lsq_unit.hh  Tue May 13 12:20:48 2014 -0500
@@ -776,8 +776,7 @@
 
 // if we the cache is not blocked, do cache access
 bool completedFirst = false;
-MemCmd command = req->isLLSC() ? MemCmd::LoadLockedReq : MemCmd::ReadReq;
-PacketPtr data_pkt = new Packet(req, command);
+PacketPtr data_pkt = Packet::createRead(req);
 PacketPtr fst_data_pkt = NULL;
 PacketPtr snd_data_pkt = NULL;
 
@@ -794,8 +793,8 @@
 fst_data_pkt = data_pkt;
 } else {
 // Create the split packets.
-fst_data_pkt = new Packet(sreqLow, command);
-snd_data_pkt = new Packet(sreqHigh, command);
+fst_data_pkt = Packet::createRead(sreqLow);
+snd_data_pkt = Packet::createRead(sreqHigh);
 
 fst_data_pkt->dataStatic(load_inst->memData);
 snd_data_pkt->dataStatic(load_inst->memData + sreqLow->getSize());
diff -r 0b4d10f53c2d -r 711eb0e64249 src/cpu/o3/lsq_unit_impl.hh
--- a/src/cpu/o3/lsq_unit_impl.hh   Wed Sep 03 07:42:46 2014 -0400
+++ b/src/cpu/o3/lsq_unit_impl.hh   Tue May 13 12:20:48 2014 -0500
@@ -839,9 +839,6 @@
 else
 memcpy(inst->memData, storeQueue[storeWBIdx].data, req->getSize());
 
-MemCmd command =
-req->isSwap() ? MemCmd::SwapReq :
-(req->isLLSC() ? MemCmd::StoreCondReq : MemCmd::WriteReq);
 PacketPtr data_pkt;
 PacketPtr snd_data_pkt = NULL;
 
@@ -853,13 +850,13 @@
 if (!TheISA::HasUnalignedMemAcc || !storeQueue[storeWBIdx].isSplit) {
 
 // Build a single data packet if the store isn't split.
-data_pkt = new Packet(req, command);
+data_pkt = Packet::createWrite(req);
 data_pkt->dataStatic(inst->memData);
 data_pkt->senderState = state;
 } else {
 // Create two packets if the store is split in two.
-data_pkt = new Packet(sreqLow, command);
-snd_data_pkt = new Packet(sreqHigh, command);
+data_pkt = Packet::createWrite(sreqLow);
+snd_data_pkt = Packet::createWrite(sreqHigh);
 
 data_pkt->dataStatic(inst->memData);
 snd_data_pkt->dataStatic(inst->memData + 

[gem5-dev] changeset in gem5: x86: Flag instructions that call suspend as I...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 0b4d10f53c2d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b4d10f53c2d
description:
x86: Flag instructions that call suspend as IsQuiesce

The o3 cpu relies upon instructions that suspend a thread context being
flagged as IsQuiesce.  If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.

diffstat:

 src/arch/x86/isa/decoder/two_byte_opcodes.isa |  6 +++---
 src/arch/x86/isa/microops/specop.isa  |  3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diffs (33 lines):

diff -r 40d24a672351 -r 0b4d10f53c2d src/arch/x86/isa/decoder/two_byte_opcodes.isa
--- a/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:45 2014 -0400
+++ b/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:46 2014 -0400
@@ -141,13 +141,13 @@
 }}, IsNonSpeculative);
 0x01: m5quiesce({{
 PseudoInst::quiesce(xc->tcBase());
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x02: m5quiesceNs({{
 PseudoInst::quiesceNs(xc->tcBase(), Rdi);
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x03: m5quiesceCycle({{
 PseudoInst::quiesceCycles(xc->tcBase(), Rdi);
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x04: m5quiesceTime({{
 Rax = PseudoInst::quiesceTime(xc->tcBase());
 }}, IsNonSpeculative);
diff -r 40d24a672351 -r 0b4d10f53c2d src/arch/x86/isa/microops/specop.isa
--- a/src/arch/x86/isa/microops/specop.isa  Wed Sep 03 07:42:45 2014 -0400
+++ b/src/arch/x86/isa/microops/specop.isa  Wed Sep 03 07:42:46 2014 -0400
@@ -63,7 +63,8 @@
 MicroHalt(ExtMachInst _machInst, const char * instMnem,
 uint64_t setFlags) :
 X86MicroopBase(_machInst, "halt", instMnem,
-   setFlags | (ULL(1) << StaticInst::IsNonSpeculative),
+   setFlags | (ULL(1) << StaticInst::IsNonSpeculative) |
+   (ULL(1) << StaticInst::IsQuiesce),
No_OpClass)
 {
 }


[gem5-dev] changeset in gem5: arm: ISA X31 destination register fix

2014-09-03 Thread Andrew Bardsley via gem5-dev
changeset 85001c018d4c in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=85001c018d4c
description:
arm: ISA X31 destination register fix

This patch substitutes the zero register for X31 when X31 is used as a
destination register.  This prevents false dependencies based on
X31.

diffstat:

 src/arch/arm/intregs.hh  |9 ++-
 src/arch/arm/isa/formats/aarch64.isa |  120 ++
 2 files changed, 72 insertions(+), 57 deletions(-)

diffs (truncated from 349 to 300 lines):

diff -r 60dddc0a6f78 -r 85001c018d4c src/arch/arm/intregs.hh
--- a/src/arch/arm/intregs.hh   Wed Sep 03 07:42:41 2014 -0400
+++ b/src/arch/arm/intregs.hh   Wed Sep 03 07:42:43 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -510,6 +510,13 @@
 return reg;
 }
 
+static inline IntRegIndex
+makeZero(IntRegIndex reg)
+{
+if (reg == INTREG_X31)
+reg = INTREG_ZERO;
+return reg;
+}
 
 static inline bool
 isSP(IntRegIndex reg)
diff -r 60dddc0a6f78 -r 85001c018d4c src/arch/arm/isa/formats/aarch64.isa
--- a/src/arch/arm/isa/formats/aarch64.isa  Wed Sep 03 07:42:41 2014 -0400
+++ b/src/arch/arm/isa/formats/aarch64.isa  Wed Sep 03 07:42:43 2014 -0400
@@ -1,4 +1,4 @@
-// Copyright (c) 2011-2013 ARM Limited
+// Copyright (c) 2011-2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -63,6 +63,7 @@
 {
 IntRegIndex rd = (IntRegIndex)(uint32_t)bits(machInst, 4, 0);
 IntRegIndex rdsp = makeSP(rd);
+IntRegIndex rdzr = makeZero(rd);
 IntRegIndex rn = (IntRegIndex)(uint32_t)bits(machInst, 9, 5);
 IntRegIndex rnsp = makeSP(rn);
 
@@ -79,9 +80,9 @@
 uint64_t immhi = bits(machInst, 23, 5);
 uint64_t imm = (immlo << 0) | (immhi << 2);
 if (bits(machInst, 31) == 0)
-return new AdrXImm(machInst, rd, INTREG_ZERO, sext<21>(imm));
+return new AdrXImm(machInst, rdzr, INTREG_ZERO, sext<21>(imm));
 else
-return new AdrpXImm(machInst, rd, INTREG_ZERO,
+return new AdrpXImm(machInst, rdzr, INTREG_ZERO,
 sext<33>(imm << 12));
   }
   case 0x2:
@@ -100,11 +101,11 @@
   case 0x0:
 return new AddXImm(machInst, rdsp, rnsp, imm);
   case 0x1:
-return new AddXImmCc(machInst, rd, rnsp, imm);
+return new AddXImmCc(machInst, rdzr, rnsp, imm);
   case 0x2:
 return new SubXImm(machInst, rdsp, rnsp, imm);
   case 0x3:
-return new SubXImmCc(machInst, rd, rnsp, imm);
+return new SubXImmCc(machInst, rdzr, rnsp, imm);
 }
   }
   case 0x4:
@@ -146,23 +147,24 @@
   case 0x2:
 return new EorXImm(machInst, rdsp, rn, imm);
   case 0x3:
-return new AndXImmCc(machInst, rd, rn, imm);
+return new AndXImmCc(machInst, rdzr, rn, imm);
 }
   }
   case 0x5:
   {
 IntRegIndex rd = (IntRegIndex)(uint32_t)bits(machInst, 4, 0);
+IntRegIndex rdzr = makeZero(rd);
 uint32_t imm16 = bits(machInst, 20, 5);
 uint32_t hw = bits(machInst, 22, 21);
 switch (opc) {
   case 0x0:
-return new Movn(machInst, rd, imm16, hw * 16);
+return new Movn(machInst, rdzr, imm16, hw * 16);
   case 0x1:
 return new Unknown64(machInst);
   case 0x2:
-return new Movz(machInst, rd, imm16, hw * 16);
+return new Movz(machInst, rdzr, imm16, hw * 16);
   case 0x3:
-return new Movk(machInst, rd, imm16, hw * 16);
+return new Movk(machInst, rdzr, imm16, hw * 16);
 }
   }
   case 0x6:
@@ -170,11 +172,11 @@
 return new Unknown64(machInst);
 switch (opc) {
   case 0x0:
-return new Sbfm64(machInst, rd, rn, immr, imms);
+return new Sbfm64(machInst, rdzr, rn, immr, imms);
   case 0x1:
-return new Bfm64(machInst, rd, rn, immr, imms);
+return new Bfm64(machInst, rdzr, rn, immr, imms);
   case 0x2:
-return new Ubfm64(machInst, rd, rn, immr, imms);
+return new Ubfm64(machInst, rdzr, rn, immr, imms);
   case 0x3:
 return new Unknown64(machInst);
 }
@@ -184,7 +186,7 @@
 if (opc || bits(machInst, 21))
 return new Unknown64(machInst);
 else
-return new 

[gem5-dev] changeset in gem5: alpha: Stop using 'inorder' and rely entirely...

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 35241e33c38f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=35241e33c38f
description:
alpha: Stop using 'inorder' and rely entirely on 'minor'

This patch avoids building the 'inorder' CPU model for any permutation
of ALPHA, and also removes the ALPHA regressions using the 'inorder'
CPU. The 'minor' CPU already provides broader test coverage.

diffstat:

 build_opts/ALPHA  |2 +-
 build_opts/ALPHA_MESI_Two_Level   |2 +-
 build_opts/ALPHA_MOESI_CMP_directory  |2 +-
 build_opts/ALPHA_MOESI_CMP_token  |2 +-
 build_opts/ALPHA_MOESI_hammer |2 +-
 build_opts/ALPHA_Network_test |2 +-
 tests/SConscript  |1 -
 tests/configs/tsunami-inorder.py  |   43 -
 tests/long/se/30.eon/ref/alpha/tru64/inorder-timing/config.ini|  346 
 tests/long/se/30.eon/ref/alpha/tru64/inorder-timing/simerr|   51 -
 tests/long/se/30.eon/ref/alpha/tru64/inorder-timing/simout|   14 -
 tests/long/se/30.eon/ref/alpha/tru64/inorder-timing/stats.txt |  721 -
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/config.ini |  346 
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/simerr |5 -
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/simout |   11 -
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/smred.msg  |  158 --
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/smred.out  |  258 ---
 tests/long/se/50.vortex/ref/alpha/tru64/inorder-timing/stats.txt  |  752 -
 tests/long/se/60.bzip2/ref/alpha/tru64/inorder-timing/config.ini  |  346 
 tests/long/se/60.bzip2/ref/alpha/tru64/inorder-timing/simerr  |5 -
 tests/long/se/60.bzip2/ref/alpha/tru64/inorder-timing/simout  |   26 -
 tests/long/se/60.bzip2/ref/alpha/tru64/inorder-timing/stats.txt   |  759 --
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/config.ini  |  346 
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/simerr  |5 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/simout  |   26 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.out   |  276 ---
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.pin   |   17 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.pl1   |   11 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.pl2   |2 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.sav   |   18 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.sv2   |   19 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/smred.twf   |   29 -
 tests/long/se/70.twolf/ref/alpha/tru64/inorder-timing/stats.txt   |  722 -
 tests/quick/se/00.hello/ref/alpha/linux/inorder-timing/config.ini |  346 
 tests/quick/se/00.hello/ref/alpha/linux/inorder-timing/simerr |1 -
 tests/quick/se/00.hello/ref/alpha/linux/inorder-timing/simout |   12 -
 tests/quick/se/00.hello/ref/alpha/linux/inorder-timing/stats.txt  |  699 -
 37 files changed, 6 insertions(+), 6377 deletions(-)

diffs (truncated from 6560 to 300 lines):

diff -r 939094c17866 -r 35241e33c38f build_opts/ALPHA
--- a/build_opts/ALPHA  Wed Sep 03 07:42:55 2014 -0400
+++ b/build_opts/ALPHA  Wed Sep 03 07:42:56 2014 -0400
@@ -1,4 +1,4 @@
 TARGET_ISA = 'alpha'
 SS_COMPATIBLE_FP = 1
-CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,InOrderCPU,MinorCPU'
+CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,MinorCPU'
 PROTOCOL = 'MI_example'
diff -r 939094c17866 -r 35241e33c38f build_opts/ALPHA_MESI_Two_Level
--- a/build_opts/ALPHA_MESI_Two_Level   Wed Sep 03 07:42:55 2014 -0400
+++ b/build_opts/ALPHA_MESI_Two_Level   Wed Sep 03 07:42:56 2014 -0400
@@ -1,3 +1,3 @@
 SS_COMPATIBLE_FP = 1
-CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,InOrderCPU'
+CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,MinorCPU'
 PROTOCOL = 'MESI_Two_Level'
diff -r 939094c17866 -r 35241e33c38f build_opts/ALPHA_MOESI_CMP_directory
--- a/build_opts/ALPHA_MOESI_CMP_directory  Wed Sep 03 07:42:55 2014 -0400
+++ b/build_opts/ALPHA_MOESI_CMP_directory  Wed Sep 03 07:42:56 2014 -0400
@@ -1,3 +1,3 @@
 SS_COMPATIBLE_FP = 1
-CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,InOrderCPU'
+CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,MinorCPU'
 PROTOCOL = 'MOESI_CMP_directory'
diff -r 939094c17866 -r 35241e33c38f build_opts/ALPHA_MOESI_CMP_token
--- a/build_opts/ALPHA_MOESI_CMP_token  Wed Sep 03 07:42:55 2014 -0400
+++ b/build_opts/ALPHA_MOESI_CMP_token  Wed Sep 03 07:42:56 2014 -0400
@@ -1,3 +1,3 @@
 SS_COMPATIBLE_FP = 1
-CPU_MODELS = 'AtomicSimpleCPU,TimingSimpleCPU,O3CPU,InOrderCPU'
+CPU_MODELS = 

[gem5-dev] changeset in gem5: config: Update Streamline scripts and configs

2014-09-03 Thread Dam Sunwoo via gem5-dev
changeset 2d6d7a056a38 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=2d6d7a056a38
description:
config: Update Streamline scripts and configs

Updated the stat_config.ini files to reflect new structure.

Moved to a more generic stat naming scheme that can easily handle
multiple CPUs and L2s by letting the script replace pre-defined '#'
symbols with CPU or L2 ids.

Removed the previous per_switch_cpus sections. They can still be used by
spelling out the stat names if necessary. (Resuming from checkpoints
no longer uses switch_cpus. Only fast-forwarding does.)
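
The '#' substitution described above amounts to a simple string expansion.
The sketch below is written in C++ for illustration only; the real script is
Python, and expandStatName is a hypothetical helper, not a function from
m5stats2streamline.py.

```cpp
#include <cassert>
#include <string>

// Replace every '#' in a stat-name template with the given CPU or L2 id.
inline std::string expandStatName(const std::string &pattern, unsigned id)
{
    const std::string idText = std::to_string(id);
    std::string out;
    for (char c : pattern) {
        if (c == '#')
            out += idText;   // placeholder becomes the numeric id
        else
            out += c;        // everything else is copied through
    }
    return out;
}

// Self-check using a stat name from the updated config file.
inline void expandExample()
{
    assert(expandStatName("system.cluster.cpu#.num_busy_cycles", 1)
           == "system.cluster.cpu1.num_busy_cycles");
}
```

Because the full stat path is spelled out in the config, the same mechanism
covers CPUs, L1 caches and L2 caches without a hard-coded prefix.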

diffstat:

 util/streamline/atomic_stat_config.ini |  55 ---
 util/streamline/m5stats2streamline.py  |  53 +++---
 util/streamline/o3_stat_config.ini |  78 +++--
 3 files changed, 81 insertions(+), 105 deletions(-)

diffs (truncated from 343 to 300 lines):

diff -r dfebd39c48a7 -r 2d6d7a056a38 util/streamline/atomic_stat_config.ini
--- a/util/streamline/atomic_stat_config.ini  Wed Sep 03 07:43:01 2014 -0400
+++ b/util/streamline/atomic_stat_config.ini  Wed Sep 03 07:43:02 2014 -0400
@@ -40,54 +40,55 @@
 # Stats grouped together will show as grouped in Streamline.
 # E.g.,
 #
-# icache =
-#icache.overall_hits::total
-#icache.overall_misses::total
+# commit_inst_count =
+# system.cluster.cpu#.commit.committedInsts
+# system.cluster.cpu#.commit.commitSquashedInsts
 #
-# will display the icache as a stacked line chart.
+# will display the inst counts (committed/squashed) as a stacked line chart.
 # Charts will still be configurable in Streamline.
 
 [PER_CPU_STATS]
-# system.cpu#. will automatically prepended for per-CPU stats
+# '#' will be automatically replaced with the correct CPU id.
+
+commit_inst_count =
+system.cluster.cpu#.committedInsts
 
 cycles =
-num_busy_cycles
-num_idle_cycles
+system.cluster.cpu#.num_busy_cycles
+system.cluster.cpu#.num_idle_cycles
 
 register_access =
-num_int_register_reads
-num_int_register_writes
+system.cluster.cpu#.num_int_register_reads
+system.cluster.cpu#.num_int_register_writes
 
 mem_refs =
-num_mem_refs
+system.cluster.cpu#.num_mem_refs
 
 inst_breakdown =
-num_conditional_control_insts
-num_int_insts
-num_fp_insts
-num_load_insts
-num_store_insts
+system.cluster.cpu#.num_conditional_control_insts
+system.cluster.cpu#.num_int_insts
+system.cluster.cpu#.num_fp_insts
+system.cluster.cpu#.num_load_insts
+system.cluster.cpu#.num_store_insts
 
 icache =
-icache.overall_hits::total
-icache.overall_misses::total
+system.cluster.il1_cache#.overall_hits::total
+system.cluster.il1_cache#.overall_misses::total
 
 dcache =
-dcache.overall_hits::total
-dcache.overall_misses::total
-
-[PER_SWITCHCPU_STATS]
-# If starting from checkpoints, gem5 keeps CPU stats in system.switch_cpus# structures.
-# List per-switchcpu stats here if any
-# system.switch_cpus# will automatically prepended for per-CPU stats
+system.cluster.dl1_cache#.overall_hits::total
+system.cluster.dl1_cache#.overall_misses::total
 
 [PER_L2_STATS]
+# '#' will be automatically replaced with the correct L2 id.
 
 l2_cache =
-overall_hits::total
-overall_misses::total
+system.cluster.l2_cache#.overall_hits::total
+system.cluster.l2_cache#.overall_misses::total
 
 [OTHER_STATS]
+# Anything that doesn't belong to CPU or L2 caches
 
 physmem =
-system.physmem.bw_total::total
+system.memsys.mem_ctrls.bytes_read::total
+system.memsys.mem_ctrls.bytes_written::total
diff -r dfebd39c48a7 -r 2d6d7a056a38 util/streamline/m5stats2streamline.py
--- a/util/streamline/m5stats2streamline.py Wed Sep 03 07:43:01 2014 -0400
+++ b/util/streamline/m5stats2streamline.py Wed Sep 03 07:43:02 2014 -0400
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 
-# Copyright (c) 2012 ARM Limited
+# Copyright (c) 2012, 2014 ARM Limited
 # All rights reserved
 #
 # The license below extends only to copyright in the software and shall
@@ -142,18 +142,18 @@
 print "ERROR: config file '", config_file, "' not found"
 sys.exit(1)
 
-if config.has_section("system.cpu"):
+if config.has_section("system.cluster.cpu"):
 num_cpus = 1
 else:
 num_cpus = 0
-while config.has_section("system.cpu" + str(num_cpus)):
+while config.has_section("system.cluster.cpu" + str(num_cpus)):
 num_cpus += 1
 
-if config.has_section("system.l2"):
+if config.has_section("system.cluster.l2_cache"):
 num_l2 = 1
 else:
 num_l2 = 0
-while config.has_section("system.l2" + str(num_l2)):
+while config.has_section("system.cluster.l2_cache" + str(num_l2)):
 num_l2 += 1
 
 print "Num CPUs:", num_cpus
@@ -713,7 +713,7 @@
 
 # StatsEntry that contains individual statistics
 class StatsEntry(object):
-def __init__(self, name, 

[gem5-dev] changeset in gem5: arm: use condition code registers for ARM ISA

2014-09-03 Thread Curtis Dunham via gem5-dev
changeset 8bee5f4edb92 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=8bee5f4edb92
description:
arm: use condition code registers for ARM ISA

Analogous to ee049bf (for x86).  Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.

diffstat:

 src/arch/arm/ccregs.hh|  85 +++
 src/arch/arm/faults.cc|  18 
 src/arch/arm/insts/static_inst.cc |   5 +-
 src/arch/arm/intregs.hh   |   5 --
 src/arch/arm/isa.cc   |  14 +++---
 src/arch/arm/isa.hh   |   6 +-
 src/arch/arm/isa/operands.isa |  54 
 src/arch/arm/miscregs.hh  |  19 
 src/arch/arm/nativetrace.cc   |  12 ++--
 src/arch/arm/registers.hh |  14 --
 src/arch/arm/utility.cc   |   6 +-
 src/cpu/o3/O3CPU.py   |   2 +-
 src/cpu/simple_thread.hh  |   1 +
 src/sim/serialize.hh  |   2 +-
 util/cpt_upgrader.py  |  28 
 15 files changed, 184 insertions(+), 87 deletions(-)

diffs (truncated from 538 to 300 lines):

diff -r 85001c018d4c -r 8bee5f4edb92 src/arch/arm/ccregs.hh
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/arch/arm/ccregs.hh  Tue Apr 29 16:05:02 2014 -0500
@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2014 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met: redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer;
+ * redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution;
+ * neither the name of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Authors: Curtis Dunham
+ */
+#ifndef __ARCH_ARM_CCREGS_HH__
+#define __ARCH_ARM_CCREGS_HH__
+
+namespace ArmISA
+{
+
+enum ccRegIndex {
+CCREG_NZ,
+CCREG_C,
+CCREG_V,
+CCREG_GE,
+CCREG_FP,
+CCREG_ZERO,
+NUM_CCREGS
+};
+
+const char * const ccRegName[NUM_CCREGS] = {
+"nz",
+"c",
+"v",
+"ge",
+"fp",
+"zero"
+};
+
+enum ConditionCode {
+COND_EQ  =   0,
+COND_NE, //  1
+COND_CS, //  2
+COND_CC, //  3
+COND_MI, //  4
+COND_PL, //  5
+COND_VS, //  6
+COND_VC, //  7
+COND_HI, //  8
+COND_LS, //  9
+COND_GE, // 10
+COND_LT, // 11
+COND_GT, // 12
+COND_LE, // 13
+COND_AL, // 14
+COND_UC  // 15
+};
+
+}
+
+#endif // __ARCH_ARM_CCREGS_HH__
diff -r 85001c018d4c -r 8bee5f4edb92 src/arch/arm/faults.cc
--- a/src/arch/arm/faults.ccWed Sep 03 07:42:43 2014 -0400
+++ b/src/arch/arm/faults.ccTue Apr 29 16:05:02 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010, 2012-2013 ARM Limited
+ * Copyright (c) 2010, 2012-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -466,10 +466,10 @@
 SCTLR sctlr = tc->readMiscReg(MISCREG_SCTLR);
 SCR scr = tc->readMiscReg(MISCREG_SCR);
 CPSR saved_cpsr = tc->readMiscReg(MISCREG_CPSR);
-saved_cpsr.nz = tc->readIntReg(INTREG_CONDCODES_NZ);
-saved_cpsr.c = tc->readIntReg(INTREG_CONDCODES_C);
-

[gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 6be8945d226b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b
description:
cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu.  Rather
than flushing the entire pipeline, this patch replays loads once the cache
becomes unblocked.

Additionally, deferred memory instructions (loads which had conflicting
stores), when replayed, would not respect the number of functional units
(only respected issue width).  This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
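
The replay idea described above can be sketched in a few lines of Python. This is only an illustrative model, not the gem5 implementation: the method names echo blockMemInst() and cacheUnblocked() from the diff, while the class name, the retry queue, and issue_cycle() are assumptions made for the example.

```python
from collections import deque


class ReplayQueueModel:
    """Toy model: park loads while the cache is blocked, replay them later."""

    def __init__(self, num_mem_units):
        # Assumed limit: how many memory ops may issue in one cycle.
        self.num_mem_units = num_mem_units
        self.blocked = deque()  # loads rejected because the cache was blocked
        self.retry = deque()    # loads that may be reissued

    def block_mem_inst(self, inst):
        # Remember the load instead of squashing the pipeline.
        self.blocked.append(inst)

    def cache_unblocked(self):
        # Everything that was parked becomes eligible for replay, in order.
        self.retry.extend(self.blocked)
        self.blocked.clear()

    def issue_cycle(self):
        # Replay at most num_mem_units loads per cycle.
        issued = []
        while self.retry and len(issued) < self.num_mem_units:
            issued.append(self.retry.popleft())
        return issued
```

With two memory units and three parked loads, the first cycle after the cache unblocks replays two loads and the next cycle replays the third; nothing is replayed while the cache is still blocked.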

diffstat:

 src/cpu/o3/iew.hh   |   13 +-
 src/cpu/o3/iew_impl.hh  |   57 ++
 src/cpu/o3/inst_queue.hh|   25 -
 src/cpu/o3/inst_queue_impl.hh   |   68 ++---
 src/cpu/o3/lsq.hh   |   27 +-
 src/cpu/o3/lsq_impl.hh  |   23 +---
 src/cpu/o3/lsq_unit.hh  |  198 ---
 src/cpu/o3/lsq_unit_impl.hh |   40 ++-
 src/cpu/o3/mem_dep_unit.hh  |4 +-
 src/cpu/o3/mem_dep_unit_impl.hh |4 +-
 10 files changed, 203 insertions(+), 256 deletions(-)

diffs (truncated from 846 to 300 lines):

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:39 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -181,6 +181,12 @@
 /** Re-executes all rescheduled memory instructions. */
 void replayMemInst(DynInstPtr inst);
 
+/** Moves memory instruction onto the list of cache blocked instructions */
+void blockMemInst(DynInstPtr inst);
+
+/** Notifies that the cache has become unblocked */
+void cacheUnblocked();
+
 /** Sends an instruction to commit through the time buffer. */
 void instToCommit(DynInstPtr inst);
 
@@ -233,11 +239,6 @@
  */
 void squashDueToMemOrder(DynInstPtr inst, ThreadID tid);
 
-/** Sends commit proper information for a squash due to memory becoming
- * blocked (younger issued instructions must be retried).
- */
-void squashDueToMemBlocked(DynInstPtr inst, ThreadID tid);
-
 /** Sets Dispatch to blocked, and signals back to other stages to block. */
 void block(ThreadID tid);
 
diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:39 2014 -0400
@@ -530,29 +530,6 @@
 
 template<class Impl>
 void
-DefaultIEW<Impl>::squashDueToMemBlocked(DynInstPtr inst, ThreadID tid)
-{
-DPRINTF(IEW, "[tid:%i]: Memory blocked, squashing load and younger insts, "
-"PC: %s [sn:%i].\n", tid, inst->pcState(), inst->seqNum);
-if (!toCommit->squash[tid] ||
-inst->seqNum < toCommit->squashedSeqNum[tid]) {
-toCommit->squash[tid] = true;
-
-toCommit->squashedSeqNum[tid] = inst->seqNum;
-toCommit->pc[tid] = inst->pcState();
-toCommit->mispredictInst[tid] = NULL;
-
-// Must include the broadcasted SN in the squash.
-toCommit->includeSquashInst[tid] = true;
-
-ldstQueue.setLoadBlockedHandled(tid);
-
-wroteToTimeBuffer = true;
-}
-}
-
-template<class Impl>
-void
 DefaultIEW<Impl>::block(ThreadID tid)
 {
 DPRINTF(IEW, "[tid:%u]: Blocking.\n", tid);
@@ -610,6 +587,20 @@
 
 template<class Impl>
 void
+DefaultIEW<Impl>::blockMemInst(DynInstPtr inst)
+{
+instQueue.blockMemInst(inst);
+}
+
+template<class Impl>
+void
+DefaultIEW<Impl>::cacheUnblocked()
+{
+instQueue.cacheUnblocked();
+}
+
+template<class Impl>
+void
 DefaultIEW<Impl>::instToCommit(DynInstPtr inst)
 {
 // This function should not be called after writebackInsts in a
@@ -1376,15 +1367,6 @@
 squashDueToMemOrder(violator, tid);
 
 ++memOrderViolationEvents;
-} else if (ldstQueue.loadBlocked(tid) &&
-   !ldstQueue.isLoadBlockedHandled(tid)) {
-fetchRedirect[tid] = true;
-
-DPRINTF(IEW, "Load operation couldn't execute because the "
-"memory system is blocked.  PC: %s [sn:%lli]\n",
-inst->pcState(), inst->seqNum);
-
-squashDueToMemBlocked(inst, tid);
 }
 } else {
 // Reset any state associated with redirects that will not
@@ -1403,17 +1385,6 @@
 
 ++memOrderViolationEvents;
 }
-if (ldstQueue.loadBlocked(tid) &&
-!ldstQueue.isLoadBlockedHandled(tid)) {
-DPRINTF(IEW, "Load operation couldn't execute because the "
-"memory system is blocked.  PC: %s [sn:%lli]\n",
-

[gem5-dev] changeset in gem5: tests: Use medium dataset for perlbmk regress...

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset ee383b8e4d3f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ee383b8e4d3f
description:
tests: Use medium dataset for perlbmk regressions

This patch changes the perlbmk regression script from the large to the
medium dataset to reduce the regression run time. For all ISAs and CPU
models, the total perlbmk host CPU time with the large dataset is
roughly 12 hours (constituting 30% of the total regression host
time). There is, most likely, almost no added value in terms of code
coverage for this rather excessive run time.

diffstat:

 tests/long/se/40.perlbmk/test.py |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diffs (10 lines):

diff -r 35241e33c38f -r ee383b8e4d3f tests/long/se/40.perlbmk/test.py
--- a/tests/long/se/40.perlbmk/test.py  Wed Sep 03 07:42:56 2014 -0400
+++ b/tests/long/se/40.perlbmk/test.py  Wed Sep 03 07:42:57 2014 -0400
@@ -29,5 +29,5 @@
 m5.util.addToPath('../configs/common')
 from cpu2000 import perlbmk_makerand
 
-workload = perlbmk_makerand(isa, opsys, 'lgred')
+workload = perlbmk_makerand(isa, opsys, 'mdred')
 root.system.cpu[0].workload = workload.makeLiveProcess()
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: config: Refactor RealviewEMM to fit into new ...

2014-09-03 Thread Geoffrey Blake via gem5-dev
changeset dfebd39c48a7 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=dfebd39c48a7
description:
config: Refactor RealviewEMM to fit into new config system

This eliminates some default devices and adds helper functions that
associate the devices defined here with the proper clock domains.

diffstat:

 configs/common/FSConfig.py |3 +
 src/dev/arm/RealView.py|  169 +++-
 2 files changed, 152 insertions(+), 20 deletions(-)

diffs (283 lines):

diff -r 5f1f92bf76ee -r dfebd39c48a7 configs/common/FSConfig.py
--- a/configs/common/FSConfig.pyWed Sep 03 07:42:59 2014 -0400
+++ b/configs/common/FSConfig.pyWed Sep 03 07:43:01 2014 -0400
@@ -221,6 +221,9 @@
 
 self.cf0 = CowIdeDisk(driveID='master')
 self.cf0.childImage(mdesc.disk())
+
+# Attach any PCI devices this platform supports
+self.realview.attachPciDevices()
 # default to an IDE controller rather than a CF one
 # assuming we've got one; EMM64 is an exception for the moment
 if machine_type != "VExpress_EMM64":
diff -r 5f1f92bf76ee -r dfebd39c48a7 src/dev/arm/RealView.py
--- a/src/dev/arm/RealView.py   Wed Sep 03 07:42:59 2014 -0400
+++ b/src/dev/arm/RealView.py   Wed Sep 03 07:43:01 2014 -0400
@@ -1,4 +1,4 @@
-# Copyright (c) 2009-2013 ARM Limited
+# Copyright (c) 2009-2014 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -44,7 +44,7 @@
 from m5.proxy import *
 from Device import BasicPioDevice, PioDevice, IsaFake, BadAddr, DmaDevice
 from Pci import PciConfigAll
-from Ethernet import NSGigE, IGbE_e1000, IGbE_igb
+from Ethernet import NSGigE, IGbE_igb, IGbE_e1000
 from Ide import *
 from Platform import Platform
 from Terminal import Terminal
@@ -184,6 +184,18 @@
 mem_start_addr = Param.Addr(0, "Start address of main memory")
 max_mem_size = Param.Addr('256MB', "Maximum amount of RAM supported by platform")
 
+def attachPciDevices(self):
+pass
+
+def enableMSIX(self):
+pass
+
+def onChipIOClkDomain(self, clkdomain):
+pass
+
+def offChipIOClkDomain(self, clkdomain):
+pass
+
 def setupBootLoader(self, mem_bus, cur_sys, loc):
 self.nvmem = SimpleMemory(range = AddrRange('2GB', size = '64MB'),
   conf_table_reported = False)
@@ -250,6 +262,14 @@
   self.flash_fake.pio_addr + \
   self.flash_fake.pio_size - 1)]
 
+# Set the clock domain for IO objects that are considered
+# to be close to the cores.
+def onChipIOClkDomain(self, clkdomain):
+self.gic.clk_domain = clkdomain
+self.l2x0_fake.clk_domain   = clkdomain
+self.a9scu.clkdomain= clkdomain
+self.local_cpu_timer.clk_domain = clkdomain
+
 # Attach I/O devices to specified bus object.  Can't do this
 # earlier, since the bus object itself is typically defined at the
 # System level.
@@ -282,12 +302,40 @@
self.rtc.pio   = bus.master
self.flash_fake.pio= bus.master
 
+# Set the clock domain for IO objects that are considered
+# to be far away from the cores.
+def offChipIOClkDomain(self, clkdomain):
+self.uart.clk_domain  = clkdomain
+self.realview_io.clk_domain   = clkdomain
+self.timer0.clk_domain= clkdomain
+self.timer1.clk_domain= clkdomain
+self.clcd.clk_domain  = clkdomain
+self.kmi0.clk_domain  = clkdomain
+self.kmi1.clk_domain  = clkdomain
+self.cf_ctrl.clk_domain   = clkdomain
+self.dmac_fake.clk_domain = clkdomain
+self.uart1_fake.clk_domain= clkdomain
+self.uart2_fake.clk_domain= clkdomain
+self.uart3_fake.clk_domain= clkdomain
+self.smc_fake.clk_domain  = clkdomain
+self.sp810_fake.clk_domain= clkdomain
+self.watchdog_fake.clk_domain = clkdomain
+self.gpio0_fake.clk_domain= clkdomain
+self.gpio1_fake.clk_domain= clkdomain
+self.gpio2_fake.clk_domain= clkdomain
+self.ssp_fake.clk_domain  = clkdomain
+self.sci_fake.clk_domain  = clkdomain
+self.aaci_fake.clk_domain = clkdomain
+self.mmc_fake.clk_domain  = clkdomain
+self.rtc.clk_domain   = clkdomain
+self.flash_fake.clk_domain= clkdomain
+
 # Reference for memory map and interrupt number
 # RealView Emulation Baseboard User Guide (ARM DUI 0143B)
 # Chapter 4: Programmer's Reference
 class RealViewEB(RealView):
 uart = Pl011(pio_addr=0x10009000, int_num=44)
-realview_io = RealViewCtrl(pio_addr=0x1000)
+realview_io = RealViewCtrl(pio_addr=0x1000, idreg=0x01400500)
 gic = Pl390(dist_addr=0x10041000, cpu_addr=0x1004)
 timer0 

[gem5-dev] changeset in gem5: arm: Assume we have a kernel that supports pc...

2014-09-03 Thread Ali Saidi via gem5-dev
changeset 1aff1376921e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1aff1376921e
description:
arm: Assume we have a kernel that supports pci devices

Change the default kernel for AArch64 and, since it supports PCI devices,
remove the hack that made it use CF. Unfortunately, there isn't really
a half-way here and we need to switch. Current users will get an error
message that the kernel isn't found and hopefully go download a new
kernel that supports PCI.

diffstat:

 configs/common/FSConfig.py |  12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r 198dfef33403 -r 1aff1376921e configs/common/FSConfig.py
--- a/configs/common/FSConfig.pyWed Sep 03 07:43:04 2014 -0400
+++ b/configs/common/FSConfig.pyWed Sep 03 07:43:04 2014 -0400
@@ -225,13 +225,9 @@
 # Attach any PCI devices this platform supports
 self.realview.attachPciDevices()
 # default to an IDE controller rather than a CF one
-# assuming we've got one; EMM64 is an exception for the moment
-if machine_type != "VExpress_EMM64":
-try:
-self.realview.ide.disks = [self.cf0]
-except:
-self.realview.cf_ctrl.disks = [self.cf0]
-else:
+try:
+self.realview.ide.disks = [self.cf0]
+except:
 self.realview.cf_ctrl.disks = [self.cf0]
 
 if bare_metal:
@@ -241,7 +237,7 @@
  size = mdesc.mem())]
 else:
 if machine_type == "VExpress_EMM64":
-self.kernel = binary('vmlinux-3.14-aarch64-vexpress-emm64')
+self.kernel = binary('vmlinux-3.16-aarch64-vexpress-emm64-pcie')
 elif machine_type == "VExpress_EMM":
 self.kernel = binary('vmlinux-3.3-arm-vexpress-emm-pcie')
 else:


[gem5-dev] changeset in gem5: arm: Mark v7 cbz instructions as direct branches

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 5e424aa952c5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5e424aa952c5
description:
arm: Mark v7 cbz instructions as direct branches

v7 cbz/cbnz instructions were improperly marked as indirect branches.
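
As a rough illustration of why cbz/cbnz count as direct branches: the target depends only on the instruction's own PC and its encoded immediate, so it can be computed without reading any register. The Python sketch below mirrors the `(uint32_t)(PC + imm)` expression that appears in the diff; direct_branch_target is a hypothetical helper name, not a gem5 function.

```python
def direct_branch_target(pc, imm):
    """Return the PC-relative branch target, wrapped to 32 bits.

    Mirrors (uint32_t)(PC + imm): only the branch's own PC and its
    immediate are needed, which is what makes the branch direct.
    """
    return (pc + imm) & 0xFFFFFFFF
```

An indirect branch, by contrast, would need a register value that is only known at execute time, so a branch predictor cannot precompute its target this way.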

diffstat:

 src/arch/arm/isa/insts/branch.isa |  11 +++
 src/arch/arm/isa/templates/branch.isa |   6 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diffs (52 lines):

diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/insts/branch.isa
--- a/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:39 2014 -0400
+++ b/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:40 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode:c++ -*-
 
-// Copyright (c) 2010-2012 ARM Limited
+// Copyright (c) 2010-2012, 2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -174,12 +174,15 @@
 #CBNZ, CBZ. These are always unconditional as far as predicates
 for (mnem, test) in (("cbz", "=="), ("cbnz", "!=")):
 code = 'NPC = (uint32_t)(PC + imm);\n'
+br_tgt_code = '''pcs.instNPC((uint32_t)(branchPC.instPC() + imm));'''
 predTest = "Op1 %(test)s 0" % {"test": test}
 iop = InstObjParams(mnem, mnem.capitalize(), "BranchImmReg",
-{"code": code, "predicate_test": predTest},
-["IsIndirectControl"])
+{"code": code, "predicate_test": predTest,
+"brTgtCode" : br_tgt_code},
+["IsDirectControl"])
 header_output += BranchImmRegDeclare.subst(iop)
-decoder_output += BranchImmRegConstructor.subst(iop)
+decoder_output += BranchImmRegConstructor.subst(iop) + \
+  BranchTarget.subst(iop)
 exec_output += PredOpExecute.subst(iop)
 
 #TBB, TBH
diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/templates/branch.isa
--- a/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:39 2014 -0400
+++ b/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:40 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode:c++ -*-
 
-// Copyright (c) 2010 ARM Limited
+// Copyright (c) 2010, 2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -212,6 +212,10 @@
 %(class_name)s(ExtMachInst machInst,
int32_t imm, IntRegIndex _op1);
 %(BasicExecDeclare)s
+ArmISA::PCState branchTarget(const ArmISA::PCState branchPC) const;
+
+/// Explicitly import the otherwise hidden branchTarget
+using StaticInst::branchTarget;
 };
 }};
 


[gem5-dev] changeset in gem5: cpu: Fix o3 quiesce fetch bug

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 1ba825974ee6 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1ba825974ee6
description:
cpu: Fix o3 quiesce fetch bug

O3 is supposed to stop fetching instructions once a quiesce is encountered.
However, due to a bug, it would continue fetching instructions from the
current fetch buffer.  This is because of a break statement that only broke
out of the first of 2 nested loops.  It should have broken out of both.
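
The nested-loop problem can be reproduced in miniature. The Python sketch below is an assumption-laden toy, not the gem5 fetch stage: fetch_until_quiesce, the string "quiesce", and the list-of-lists fetch buffer are all made up for illustration. It shows the same pattern as the fix: a flag set next to the inner break and tested in the outer loop condition.

```python
def fetch_until_quiesce(buffer_blocks, fetch_width):
    """Fetch instructions block by block, stopping for good at a quiesce."""
    fetched = []
    quiesce = False  # set by the inner loop, checked by the outer loop
    blocks = iter(buffer_blocks)
    while len(fetched) < fetch_width and not quiesce:
        block = next(blocks, None)
        if block is None:
            break
        for inst in block:
            if len(fetched) >= fetch_width:
                break
            fetched.append(inst)
            if inst == "quiesce":
                quiesce = True
                break  # leaves only the inner loop; the flag stops the outer
    return fetched
```

Without the flag in the outer condition, the inner break would merely skip the rest of the current block and fetching would resume with the next one.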

diffstat:

 src/cpu/o3/fetch_impl.hh |  8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diffs (34 lines):

diff -r ed05298e8566 -r 1ba825974ee6 src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:37 2014 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:38 2014 -0400
@@ -1236,6 +1236,9 @@
 // ended this fetch block.
 bool predictedBranch = false;
 
+// Need to halt fetch if quiesce instruction detected
+bool quiesce = false;
+
 TheISA::MachInst *cacheInsts =
 reinterpret_cast<TheISA::MachInst *>(fetchBuffer[tid]);
 
@@ -1246,7 +1249,7 @@
 // Keep issuing while fetchWidth is available and branch is not
 // predicted taken
 while (numInst < fetchWidth && fetchQueue[tid].size() < fetchQueueSize
-&& !predictedBranch) {
+&& !predictedBranch && !quiesce) {
 // We need to process more memory if we aren't going to get a
 // StaticInst from the rom, the current macroop, or what's already
 // in the decoder.
@@ -1363,9 +1366,10 @@
 
 if (instruction->isQuiesce()) {
 DPRINTF(Fetch,
-"Quiesce instruction encountered, halting fetch!");
+"Quiesce instruction encountered, halting fetch!\n");
 fetchStatus[tid] = QuiescePending;
 status_change = true;
+quiesce = true;
 break;
 }
 } while ((curMacroop || decoder[tid]->instReady()) &&


[gem5-dev] changeset in gem5: arm: Make memory ops work on 64bit/128-bit qu...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset d96b61d843b2 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=d96b61d843b2
description:
arm: Make memory ops work on 64bit/128-bit quantities

Multiple instructions assume only 32-bit load operations are available.
This patch increases load sizes to 64-bit or 128-bit for many load pair
and load multiple instructions.
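
To make the effect of pairing loads concrete, here is a small Python sketch of the micro-op count for a load/store multiple, following the arithmetic visible in the macromem.cc hunk of this changeset. It is one reading of that hunk, not the gem5 code itself; count_microops and its argument names are hypothetical.

```python
def count_microops(reglist, rn, load, writeback, user):
    """Count micro-ops for a load/store multiple, per the diff's formula."""
    ones = bin(reglist).count("1")          # registers in the list
    pc_in_list = bool((reglist >> 15) & 1)  # bit 15 is the PC
    base_in_list = bool((reglist >> rn) & 1)

    # Copy the base register if a load overwrites it, or if the list is empty.
    copy_base = (base_in_list and load) or ones == 0
    exception_ret = user and pc_in_list
    pc_temp = load and writeback and pc_in_list

    if ones == 0:
        return 1
    if load:
        return ((ones + 1) // 2                              # paired loads
                + int(ones % 2 == 0 and exception_ret)
                + int(copy_base)
                + int(writeback)
                + int(pc_temp))
    return ones + int(writeback)                             # stores unpaired
```

For example, loading four registers with no writeback, where the base register is not in the list, takes two micro-ops under this formula, while the equivalent store still takes four.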

diffstat:

 src/arch/arm/insts/macromem.cc  |  388 ++-
 src/arch/arm/insts/macromem.hh  |   22 +-
 src/arch/arm/isa/insts/ldr64.isa|   90 +++---
 src/arch/arm/isa/insts/macromem.isa |   24 +-
 src/arch/arm/isa/insts/mem.isa  |4 +-
 src/arch/arm/isa/templates/macromem.isa |   35 ++-
 6 files changed, 355 insertions(+), 208 deletions(-)

diffs (truncated from 864 to 300 lines):

diff -r b5bef3c8e070 -r d96b61d843b2 src/arch/arm/insts/macromem.cc
--- a/src/arch/arm/insts/macromem.ccFri Jun 27 12:29:00 2014 -0500
+++ b/src/arch/arm/insts/macromem.ccWed Sep 03 07:42:52 2014 -0400
@@ -61,14 +61,29 @@
 {
 uint32_t regs = reglist;
 uint32_t ones = number_of_ones(reglist);
-// Remember that writeback adds a uop or two and the temp register adds one
-numMicroops = ones + (writeback ? (load ? 2 : 1) : 0) + 1;
+uint32_t mem_ops = ones;
 
-// It's technically legal to do a lot of nothing
-if (!ones)
+// Copy the base address register if we overwrite it, or if this instruction
+// is basically a no-op (we have to do something)
+bool copy_base =  (bits(reglist, rn) && load) || !ones;
+bool force_user = user && !bits(reglist, 15);
+bool exception_ret = user && bits(reglist, 15);
+bool pc_temp = load && writeback && bits(reglist, 15);
+
+if (!ones) {
 numMicroops = 1;
+} else if (load) {
+numMicroops = ((ones + 1) / 2)
++ ((ones % 2 == 0 && exception_ret) ? 1 : 0)
++ (copy_base ? 1 : 0)
++ (writeback? 1 : 0)
++ (pc_temp ? 1 : 0);
+} else {
+numMicroops = ones + (writeback ? 1 : 0);
+}
 
 microOps = new StaticInstPtr[numMicroops];
+
 uint32_t addr = 0;
 
 if (!up)
@@ -81,94 +96,129 @@
 
 // Add 0 to Rn and stick it in ureg0.
 // This is equivalent to a move.
-*uop = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0);
+if (copy_base)
+*uop++ = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0);
 
 unsigned reg = 0;
-unsigned regIdx = 0;
-bool force_user = user && !bits(reglist, 15);
-bool exception_ret = user && bits(reglist, 15);
+while (mem_ops != 0) {
+// Do load operations in pairs if possible
+if (load && mem_ops >= 2 &&
+!(mem_ops == 2 && bits(regs,INTREG_PC) && exception_ret)) {
+// 64-bit memory operation
+// Find 2 set register bits (clear them after finding)
+unsigned reg_idx1;
+unsigned reg_idx2;
 
-for (int i = 0; i < ones; i++) {
-// Find the next register.
-while (!bits(regs, reg))
-reg++;
-replaceBits(regs, reg, 0);
+// Find the first register
+while (!bits(regs, reg)) reg++;
+replaceBits(regs, reg, 0);
+reg_idx1 = force_user ? intRegInMode(MODE_USER, reg) : reg;
 
-regIdx = reg;
-if (force_user) {
-regIdx = intRegInMode(MODE_USER, regIdx);
-}
+// Find the second register
+while (!bits(regs, reg)) reg++;
+replaceBits(regs, reg, 0);
+reg_idx2 = force_user ? intRegInMode(MODE_USER, reg) : reg;
 
-if (load) {
-if (writeback && i == ones - 1) {
-// If it's a writeback and this is the last register
-// do the load into a temporary register which we'll move
-// into the final one later
-*++uop = new MicroLdrUop(machInst, INTREG_UREG1, INTREG_UREG0,
-up, addr);
-} else {
-// Otherwise just do it normally
-if (reg == INTREG_PC && exception_ret) {
-// This must be the exception return form of ldm.
-*++uop = new MicroLdrRetUop(machInst, regIdx,
-   INTREG_UREG0, up, addr);
+// Load into temp reg if necessary
+if (reg_idx2 == INTREG_PC && pc_temp)
+reg_idx2 = INTREG_UREG1;
+
+// Actually load both registers from memory
+*uop = new MicroLdr2Uop(machInst, reg_idx1, reg_idx2,
+copy_base ? INTREG_UREG0 : rn, up, addr);
+
+if (!writeback && reg_idx2 == INTREG_PC) {
+// No writeback if idx==pc, set appropriate flags
+(*uop)->setFlag(StaticInst::IsControl);
+(*uop)->setFlag(StaticInst::IsIndirectControl);
+
+if (!(condCode == COND_AL || condCode == COND_UC))
+  

[gem5-dev] changeset in gem5: tests: Use O3_ARM_v7a config for full-system ...

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 60dddc0a6f78 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=60dddc0a6f78
description:
tests: Use O3_ARM_v7a config for full-system ARM regressions

This patch changes the CPU configuration used for the full-system ARM
regressions to increase the test coverage. Note that it is only the
core configuration, and not the caches etc.

diffstat:

 tests/configs/realview-o3-checker.py |  3 ++-
 tests/configs/realview-o3-dual.py|  3 ++-
 tests/configs/realview-o3.py |  3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diffs (41 lines):

diff -r 1b627a6ddac0 -r 60dddc0a6f78 tests/configs/realview-o3-checker.py
--- a/tests/configs/realview-o3-checker.py  Wed Sep 03 07:42:41 2014 -0400
+++ b/tests/configs/realview-o3-checker.py  Wed Sep 03 07:42:41 2014 -0400
@@ -37,8 +37,9 @@
 
 from m5.objects import *
 from arm_generic import *
+from O3_ARM_v7a import O3_ARM_v7a_3
 
 root = LinuxArmFSSystemUniprocessor(mem_mode='timing',
 mem_class=DDR3_1600_x64,
-cpu_class=DerivO3CPU,
+cpu_class=O3_ARM_v7a_3,
 checker=True).create_root()
diff -r 1b627a6ddac0 -r 60dddc0a6f78 tests/configs/realview-o3-dual.py
--- a/tests/configs/realview-o3-dual.py Wed Sep 03 07:42:41 2014 -0400
+++ b/tests/configs/realview-o3-dual.py Wed Sep 03 07:42:41 2014 -0400
@@ -37,8 +37,9 @@
 
 from m5.objects import *
 from arm_generic import *
+from O3_ARM_v7a import O3_ARM_v7a_3
 
 root = LinuxArmFSSystem(mem_mode='timing',
 mem_class=DDR3_1600_x64,
-cpu_class=DerivO3CPU,
+cpu_class=O3_ARM_v7a_3,
 num_cpus=2).create_root()
diff -r 1b627a6ddac0 -r 60dddc0a6f78 tests/configs/realview-o3.py
--- a/tests/configs/realview-o3.py  Wed Sep 03 07:42:41 2014 -0400
+++ b/tests/configs/realview-o3.py  Wed Sep 03 07:42:41 2014 -0400
@@ -37,7 +37,8 @@
 
 from m5.objects import *
 from arm_generic import *
+from O3_ARM_v7a import O3_ARM_v7a_3
 
 root = LinuxArmFSSystemUniprocessor(mem_mode='timing',
 mem_class=DDR3_1600_x64,
-cpu_class=DerivO3CPU).create_root()
+cpu_class=O3_ARM_v7a_3).create_root()


[gem5-dev] changeset in gem5: dev: separate legacy io offsets from PCI offset

2014-09-03 Thread Ali Saidi via gem5-dev
changeset 1e2f39859382 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1e2f39859382
description:
dev: separate legacy io offsets from PCI offset

The PC platform has a single IO range that is used for both legacy IO and
PCI IO, while other platforms may use separate regions. Provide another
mechanism to configure the legacy IO base address range and set it to the
PCI IO address range for x86.
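
A minimal Python sketch of the address computation this describes, modelled on the loop in the pcidev.cc hunk: a legacy BAR resolves to the configured legacy IO base plus the value held in the BAR, and every other BAR starts out at 0. The function name and the example base address are hypothetical and chosen only for illustration.

```python
def resolve_bar_addrs(bar_values, legacy_flags, legacy_io_base):
    """Resolve BAR addresses: legacy BARs are offsets from legacy_io_base."""
    addrs = []
    for value, is_legacy in zip(bar_values, legacy_flags):
        if is_legacy:
            # Hardwired legacy IO: platform-specific base plus the BAR value.
            addrs.append(legacy_io_base + value)
        else:
            # Non-legacy BARs stay unassigned until software programs them.
            addrs.append(0)
    return addrs
```

Keeping the base as a per-device parameter lets a platform place legacy IO somewhere other than its PCI IO window, while x86 can simply set both to the same range.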

diffstat:

 src/dev/Pci.py |  1 +
 src/dev/pcidev.cc  |  2 +-
 src/dev/x86/SouthBridge.py |  1 +
 3 files changed, 3 insertions(+), 1 deletions(-)

diffs (34 lines):

diff -r 644b615fbe6a -r 1e2f39859382 src/dev/Pci.py
--- a/src/dev/Pci.pyWed Sep 03 07:43:05 2014 -0400
+++ b/src/dev/Pci.pyWed Sep 03 07:43:06 2014 -0400
@@ -98,6 +98,7 @@
 BAR3LegacyIO = Param.Bool(False, "Whether BAR3 is hardwired legacy IO")
 BAR4LegacyIO = Param.Bool(False, "Whether BAR4 is hardwired legacy IO")
 BAR5LegacyIO = Param.Bool(False, "Whether BAR5 is hardwired legacy IO")
+LegacyIOBase = Param.Addr(0x0, "Base Address for Legacy IO")
 
 CardbusCIS = Param.UInt32(0x00, "Cardbus Card Information Structure")
 SubsystemID = Param.UInt16(0x00, "Subsystem ID")
diff -r 644b615fbe6a -r 1e2f39859382 src/dev/pcidev.cc
--- a/src/dev/pcidev.cc Wed Sep 03 07:43:05 2014 -0400
+++ b/src/dev/pcidev.cc Wed Sep 03 07:43:06 2014 -0400
@@ -213,7 +213,7 @@
 
 for (int i = 0; i < 6; ++i) {
 if (legacyIO[i]) {
-BARAddrs[i] = platform->calcPciIOAddr(letoh(config.baseAddr[i]));
+BARAddrs[i] = p->LegacyIOBase + letoh(config.baseAddr[i]);
 config.baseAddr[i] = 0;
 } else {
 BARAddrs[i] = 0;
diff -r 644b615fbe6a -r 1e2f39859382 src/dev/x86/SouthBridge.py
--- a/src/dev/x86/SouthBridge.pyWed Sep 03 07:43:05 2014 -0400
+++ b/src/dev/x86/SouthBridge.pyWed Sep 03 07:43:06 2014 -0400
@@ -84,6 +84,7 @@
 ide.ProgIF = 0x80
 ide.InterruptLine = 14
 ide.InterruptPin = 1
+ide.LegacyIOBase = x86IOAddress(0)
 
 def attachIO(self, bus, dma_ports):
 # Route interupt signals


[gem5-dev] changeset in gem5: arm: Support 2GB of memory for AArch64 systems

2014-09-03 Thread Ali Saidi via gem5-dev
changeset 644b615fbe6a in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=644b615fbe6a
description:
arm: Support 2GB of memory for AArch64 systems

diffstat:

 configs/common/FSConfig.py |  27 +++
 src/dev/arm/RealView.py|   9 +
 2 files changed, 24 insertions(+), 12 deletions(-)

diffs (75 lines):

diff -r 1aff1376921e -r 644b615fbe6a configs/common/FSConfig.py
--- a/configs/common/FSConfig.pyWed Sep 03 07:43:04 2014 -0400
+++ b/configs/common/FSConfig.pyWed Sep 03 07:43:05 2014 -0400
@@ -246,19 +246,30 @@
 if dtb_filename:
 self.dtb_filename = binary(dtb_filename)
 self.machine_type = machine_type
-if convert.toMemorySize(mdesc.mem()) > int(self.realview.max_mem_size):
-print "The currently selected ARM platforms doesn't support"
-print " the amount of DRAM you've selected. Please try"
-print " another platform"
-sys.exit(1)
-
 # Ensure that writes to the UART actually go out early in the boot
 boot_flags = 'earlyprintk=pl011,0x1c09 console=ttyAMA0 ' + \
  'lpj=19988480 norandmaps rw loglevel=8 ' + \
  'mem=%s root=/dev/sda1' % mdesc.mem()
 
-self.mem_ranges = [AddrRange(self.realview.mem_start_addr,
- size = mdesc.mem())]
+self.mem_ranges = []
+size_remain = long(Addr(mdesc.mem()))
+for region in self.realview._mem_regions:
+if size_remain > long(region[1]):
+self.mem_ranges.append(AddrRange(region[0], size=region[1]))
+size_remain = size_remain - long(region[1])
+else:
+self.mem_ranges.append(AddrRange(region[0], size=size_remain))
+size_remain = 0
+break
+warn("Memory size specified spans more than one region. Creating" \
+ " another memory controller for that range.")
+
+if size_remain > 0:
+fatal("The currently selected ARM platforms doesn't support" \
+  " the amount of DRAM you've selected. Please try" \
+  " another platform")
+
+
 self.realview.setupBootLoader(self.membus, self, binary)
 self.gic_cpu_addr = self.realview.gic.cpu_addr
 self.flags_addr = self.realview.realview_io.pio_addr + 0x30
diff -r 1aff1376921e -r 644b615fbe6a src/dev/arm/RealView.py
--- a/src/dev/arm/RealView.py   Wed Sep 03 07:43:04 2014 -0400
+++ b/src/dev/arm/RealView.py   Wed Sep 03 07:43:05 2014 -0400
@@ -184,8 +184,7 @@
 pci_cfg_base = Param.Addr(0, "Base address of PCI Configuraiton Space")
 pci_cfg_gen_offsets = Param.Bool(False, "Should the offsets used for PCI cfg access"
  " be compatible with the pci-generic-host or the legacy host bridge?")
-mem_start_addr = Param.Addr(0, "Start address of main memory")
-max_mem_size = Param.Addr('256MB', "Maximum amount of RAM supported by platform")
+_mem_regions = [(Addr(0), Addr('256MB'))]
 
 def attachPciDevices(self):
 pass
@@ -444,8 +443,7 @@
 self.smcreg_fake.clk_domain   = clkdomain
 
 class VExpress_EMM(RealView):
-mem_start_addr = '2GB'
-max_mem_size = '2GB'
+_mem_regions = [(Addr('2GB'), Addr('2GB'))]
 pci_cfg_base = 0x3000
 uart = Pl011(pio_addr=0x1c09, int_num=37)
 realview_io = RealViewCtrl(proc_id0=0x1400, proc_id1=0x1400, \
@@ -602,6 +600,9 @@
 class VExpress_EMM64(VExpress_EMM):
 pci_io_base = 0x2f00
 pci_cfg_gen_offsets = True
+# Three memory regions are specified totalling 512GB
+_mem_regions = [(Addr('2GB'), Addr('2GB')), (Addr('34GB'), Addr('30GB')),
+(Addr('512GB'), Addr('480GB'))]
 def setupBootLoader(self, mem_bus, cur_sys, loc):
 self.nvmem = SimpleMemory(range = AddrRange(0, size = '64MB'))
 self.nvmem.port = mem_bus.master
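
The region-filling loop in the FSConfig.py hunk above can be tried out in isolation. The Python sketch below restates it with plain integers instead of gem5's Addr and AddrRange types; split_memory and the GB constant are hypothetical names, and the ValueError stands in for the fatal() call.

```python
GB = 1 << 30


def split_memory(requested, regions):
    """Place `requested` bytes into (start, size) regions, in listed order."""
    ranges = []
    remaining = requested
    for start, size in regions:
        if remaining > size:
            # Fill this region completely and carry on with the next one.
            ranges.append((start, size))
            remaining -= size
        else:
            # The rest fits here; this is the last range.
            ranges.append((start, remaining))
            remaining = 0
            break
    if remaining > 0:
        raise ValueError("platform does not support the requested DRAM size")
    return ranges
```

With the three VExpress_EMM64 regions from the diff (2GB at 2GB, 30GB at 34GB, 480GB at 512GB), a 4GB request yields a full 2GB range at 2GB plus a 2GB range at 34GB.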


[gem5-dev] changeset in gem5: mem: Fix a bug in the cache port flow control

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset fa9ef374075f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=fa9ef374075f
description:
mem: Fix a bug in the cache port flow control

This patch fixes a bug in the cache port where the retry flag was
reset too early, allowing new requests to arrive before the retry was
actually sent, but with the event already scheduled. This caused a
deadlock in the interactions with the O3 LSQ.

The patch fixes the underlying issue by shifting the resetting of the
flag to be done by the event that also calls sendRetry(). The patch
also tidies up the flow control in recvTimingReq and ensures that we
also check if we already have a retry outstanding.
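
The ordering described above can be illustrated with a toy Python model. It is only a sketch of the fixed behaviour, not gem5 code: the class and method names are hypothetical, the event queue is reduced to a direct call, and the inhibited-request path is left out.

```python
class PortModel:
    """Toy model of the fixed flow control (hypothetical, not gem5 code)."""

    def __init__(self):
        self.blocked = False
        self.must_send_retry = False
        self.retry_scheduled = False
        self.retries_sent = 0

    def recv_timing_req(self):
        # Refuse while blocked or while a retry is still owed to the sender.
        success = not (self.blocked or self.must_send_retry)
        # Remember whether the sender has to be told to retry.
        self.must_send_retry = not success
        return success

    def clear_blocked(self):
        self.blocked = False
        if self.must_send_retry:
            # Only schedule the event; the flag stays set until it runs.
            self.retry_scheduled = True

    def process_send_retry(self):
        # The event resets the flag and sends the retry in one step.
        self.retry_scheduled = False
        self.must_send_retry = False
        self.retries_sent += 1


port = PortModel()
port.blocked = True
first = port.recv_timing_req()   # rejected: the port is blocked
port.clear_blocked()             # retry event scheduled for a later tick
early = port.recv_timing_req()   # still rejected: retry not sent yet
port.process_send_retry()        # the event fires
late = port.recv_timing_req()    # accepted once the retry has gone out
print(first, early, late, port.retries_sent)
```

Because the flag is cleared only inside process_send_retry(), a request that arrives between scheduling and sending is turned away instead of slipping past a retry that is still pending.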

diffstat:

 src/mem/cache/base.cc   |  11 +--
 src/mem/cache/base.hh   |   5 -
 src/mem/cache/cache_impl.hh |  27 +++
 3 files changed, 32 insertions(+), 11 deletions(-)

diffs (80 lines):

diff -r a1eea45928e6 -r fa9ef374075f src/mem/cache/base.cc
--- a/src/mem/cache/base.cc Tue May 13 12:20:49 2014 -0500
+++ b/src/mem/cache/base.cc Wed Sep 03 07:42:50 2014 -0400
@@ -106,13 +106,20 @@
 DPRINTF(CachePort, "Cache port %s accepting new requests\n", name());
 blocked = false;
 if (mustSendRetry) {
-DPRINTF(CachePort, "Cache port %s sending retry\n", name());
-mustSendRetry = false;
 // @TODO: need to find a better time (next bus cycle?)
 owner.schedule(sendRetryEvent, curTick() + 1);
 }
 }
 
+void
+BaseCache::CacheSlavePort::processSendRetry()
+{
+DPRINTF(CachePort, "Cache port %s sending retry\n", name());
+
+// reset the flag and call retry
+mustSendRetry = false;
+sendRetry();
+}
 
 void
 BaseCache::init()
diff -r a1eea45928e6 -r fa9ef374075f src/mem/cache/base.hh
--- a/src/mem/cache/base.hh Tue May 13 12:20:49 2014 -0500
+++ b/src/mem/cache/base.hh Wed Sep 03 07:42:50 2014 -0400
@@ -182,7 +182,10 @@
 
   private:
 
-EventWrapper<SlavePort, &SlavePort::sendRetry> sendRetryEvent;
+void processSendRetry();
+
+EventWrapper<CacheSlavePort,
+ &CacheSlavePort::processSendRetry> sendRetryEvent;
 
 };
 
diff -r a1eea45928e6 -r fa9ef374075f src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue May 13 12:20:49 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Wed Sep 03 07:42:50 2014 -0400
@@ -1937,16 +1937,27 @@
 bool
 Cache<TagStore>::CpuSidePort::recvTimingReq(PacketPtr pkt)
 {
-// always let inhibited requests through even if blocked
-if (!pkt->memInhibitAsserted() && blocked) {
-assert(!cache->system->bypassCaches());
-DPRINTF(Cache,"Scheduling a retry while blocked\n");
-mustSendRetry = true;
-return false;
+assert(!cache->system->bypassCaches());
+
+bool success = false;
+
+// always let inhibited requests through, even if blocked
+if (pkt->memInhibitAsserted()) {
+// this should always succeed
+success = cache->recvTimingReq(pkt);
+assert(success);
+} else if (blocked || mustSendRetry) {
+// either already committed to send a retry, or blocked
+success = false;
+} else {
+// for now this should always succeed
+success = cache->recvTimingReq(pkt);
+assert(success);
 }
 
-cache->recvTimingReq(pkt);
-return true;
+// remember if we have to retry
+mustSendRetry = !success;
+return success;
 }
 
 template<class TagStore>


[gem5-dev] changeset in gem5: base: Use the global Mersenne twister throughout

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset c91b23c72d5e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=c91b23c72d5e
description:
base: Use the global Mersenne twister throughout

This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.

As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.

Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calls random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global random generator is not thread safe at this point.

Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
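
The note on instantiation order can be made concrete with a small Python sketch. Python's random module is also built on the Mersenne Twister, so a seeded random.Random instance serves here as a stand-in for the single global generator; the run helper and the module names "A" and "B" are purely illustrative.

```python
import random


def run(order, seed=42):
    """Hand one draw to each module, in the given call order."""
    rng = random.Random(seed)  # stand-in for the one shared generator
    draws = {}
    for module in order:
        draws[module] = rng.randint(0, 100)
    return draws


ab = run(["A", "B"])
ba = run(["B", "A"])

# The generated sequence is identical in both runs ...
print(list(ab.values()) == list(ba.values()))
# ... but which module receives which number depends on who calls first.
print(ab["A"] == ba["B"], ab["B"] == ba["A"])
```

With a fixed seed the stream of numbers never changes; only its assignment to callers does, which is why regression statistics shift when the call order changes.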

diffstat:

 src/cpu/testers/directedtest/SeriesRequestGenerator.cc |   3 +-
 src/cpu/testers/memtest/memtest.cc |  18 --
 src/cpu/testers/networktest/networktest.cc |   7 +++--
 src/cpu/testers/rubytest/Check.cc  |  22 +
 src/cpu/testers/rubytest/CheckTable.cc |   3 +-
 src/cpu/testers/traffic_gen/generators.cc  |  15 +--
 src/cpu/testers/traffic_gen/traffic_gen.cc |   2 +-
 src/mem/ruby/common/NetDest.cc |   7 -
 src/mem/ruby/common/NetDest.hh |   1 -
 src/mem/ruby/common/Set.cc |  16 -
 src/mem/ruby/common/Set.hh |   1 -
 src/mem/ruby/network/MessageBuffer.cc  |   7 +++--
 src/mem/ruby/network/simple/PerfectSwitch.cc   |   4 ++-
 src/mem/ruby/slicc_interface/RubySlicc_Util.hh |   6 
 src/mem/ruby/structures/RubyMemoryControl.cc   |   5 ++-
 15 files changed, 48 insertions(+), 69 deletions(-)

diffs (truncated from 450 to 300 lines):

diff -r d548d1d7597c -r c91b23c72d5e src/cpu/testers/directedtest/SeriesRequestGenerator.cc
--- a/src/cpu/testers/directedtest/SeriesRequestGenerator.cc   Wed Sep 03 07:42:53 2014 -0400
+++ b/src/cpu/testers/directedtest/SeriesRequestGenerator.cc   Wed Sep 03 07:42:54 2014 -0400
@@ -27,6 +27,7 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include "base/random.hh"
 #include "cpu/testers/directedtest/DirectedGenerator.hh"
 #include "cpu/testers/directedtest/RubyDirectedTester.hh"
 #include "cpu/testers/directedtest/SeriesRequestGenerator.hh"
@@ -60,7 +61,7 @@
 Request *req = new Request(m_address, 1, flags, masterId);
 
 Packet::Command cmd;
-bool do_write = ((random() % 100) < m_percent_writes);
+bool do_write = (random_mt.random(0, 100) < m_percent_writes);
 if (do_write) {
 cmd = MemCmd::WriteReq;
 } else {
diff -r d548d1d7597c -r c91b23c72d5e src/cpu/testers/memtest/memtest.cc
--- a/src/cpu/testers/memtest/memtest.cc   Wed Sep 03 07:42:53 2014 -0400
+++ b/src/cpu/testers/memtest/memtest.cc   Wed Sep 03 07:42:54 2014 -0400
@@ -37,6 +37,7 @@
 #include <vector>
 
 #include "base/misc.hh"
+#include "base/random.hh"
 #include "base/statistics.hh"
 #include "cpu/testers/memtest/memtest.hh"
 #include "debug/MemTest.hh"
@@ -261,14 +262,14 @@
 }
 
 //make new request
-unsigned cmd = random() % 100;
-unsigned offset = random() % size;
-unsigned base = random() % 2;
-uint64_t data = random();
-unsigned access_size = random() % 4;
-bool uncacheable = (random() % 100) < percentUncacheable;
+unsigned cmd = random_mt.random(0, 100);
+unsigned offset = random_mt.random<unsigned>(0, size - 1);
+unsigned base = random_mt.random(0, 1);
+uint64_t data = random_mt.random<uint64_t>();
+unsigned access_size = random_mt.random(0, 3);
+bool uncacheable = random_mt.random(0, 100) < percentUncacheable;
 
-unsigned dma_access_size = random() % 4; 
+unsigned dma_access_size = random_mt.random(0, 3);
 
 //If we aren't doing copies, use id as offset, and do a false sharing
 //mem tester
@@ -296,7 +297,8 @@
 return;
 }
 
-bool do_functional = (random() % 100 < percentFunctional) && !uncacheable;
+bool do_functional = (random_mt.random(0, 100) < percentFunctional) &&
+!uncacheable;
 Request *req = new Request();
 uint8_t *result = new uint8_t[8];
 
diff -r d548d1d7597c -r c91b23c72d5e src/cpu/testers/networktest/networktest.cc
--- a/src/cpu/testers/networktest/networktest.cc   Wed Sep 03 07:42:53 2014 -0400
+++ b/src/cpu/testers/networktest/networktest.cc   Wed Sep 03 07:42:54 2014 -0400
@@ -35,6 

[gem5-dev] changeset in gem5: stats: Update stats for CPU and cache changes

2014-09-03 Thread Andreas Hansson via gem5-dev
changeset 5f1f92bf76ee in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5f1f92bf76ee
description:
stats: Update stats for CPU and cache changes

This patch updates the stats to reflect the fixes and changes to the
CPU (mainly the o3), and the caches.

diffstat:

 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-minor/stats.txt |  1532 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3-dual/stats.txt |  3819
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3/stats.txt |  2187 ++--
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-switcheroo-full/stats.txt |  3122 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor-dual/stats.txt |  2214 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor/stats.txt |  1311 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-checker/stats.txt |  2244 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-dual/stats.txt |  3684
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3/stats.txt |  2214 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-full/stats.txt |  3077 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-o3/stats.txt |  3339 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-timing/stats.txt |  2117 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt |  2461 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt |  3138 +++---
 tests/long/se/10.mcf/ref/arm/linux/minor-timing/stats.txt |  1128 +-
 tests/long/se/10.mcf/ref/arm/linux/o3-timing/stats.txt |  1460 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-atomic/stats.txt |   130 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-timing/stats.txt |   430 +-
 tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt |  1438 +-
 tests/long/se/20.parser/ref/alpha/tru64/minor-timing/stats.txt |   838 +-
 tests/long/se/20.parser/ref/arm/linux/minor-timing/stats.txt |  1321 +-
 tests/long/se/20.parser/ref/arm/linux/o3-timing/stats.txt |  1700 +-
 tests/long/se/20.parser/ref/arm/linux/simple-atomic/stats.txt |   130 +-
 tests/long/se/20.parser/ref/arm/linux/simple-timing/stats.txt |   456 +-
 tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt |  1599 +-
 tests/long/se/30.eon/ref/alpha/tru64/minor-timing/stats.txt |   468 +-
 tests/long/se/30.eon/ref/alpha/tru64/o3-timing/stats.txt |  1328 +-
 tests/long/se/30.eon/ref/alpha/tru64/simple-atomic/stats.txt |    18 +-
 tests/long/se/30.eon/ref/alpha/tru64/simple-timing/stats.txt |    18 +-
 tests/long/se/30.eon/ref/arm/linux/minor-timing/stats.txt |  1152 +-
 tests/long/se/30.eon/ref/arm/linux/o3-timing/stats.txt |  1493 +-
 tests/long/se/30.eon/ref/arm/linux/simple-atomic/stats.txt |   160 +-
 tests/long/se/30.eon/ref/arm/linux/simple-timing/stats.txt |   492 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/minor-timing/stats.txt |   948 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/o3-timing/stats.txt |  1584 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/simple-atomic/stats.txt |   194 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/simple-timing/stats.txt |   806 +-
 tests/long/se/40.perlbmk/ref/arm/linux/minor-timing/stats.txt |  1248 +-
 tests/long/se/40.perlbmk/ref/arm/linux/o3-timing/stats.txt |  1645 +-
 tests/long/se/40.perlbmk/ref/arm/linux/simple-atomic/stats.txt |   172 +-
 tests/long/se/40.perlbmk/ref/arm/linux/simple-timing/stats.txt |   848 +-
 tests/long/se/50.vortex/ref/alpha/tru64/minor-timing/stats.txt |   888 +-
 tests/long/se/50.vortex/ref/alpha/tru64/o3-timing/stats.txt |  1558 +-
 tests/long/se/50.vortex/ref/alpha/tru64/simple-atomic/stats.txt |    14 +-
 tests/long/se/50.vortex/ref/alpha/tru64/simple-timing/stats.txt |    14 +-
 tests/long/se/50.vortex/ref/arm/linux/minor-timing/stats.txt |   940 +-
 

[gem5-dev] changeset in gem5: dev, arm: Add support for linux generic pci h...

2014-09-03 Thread Ali Saidi via gem5-dev
changeset 198dfef33403 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=198dfef33403
description:
dev, arm: Add support for linux generic pci host driver

This change adds support for a generic pci host bus driver that
has been included in recent Linux kernel instead of the more
bespoke one we've been using to date. It also works with
aarch64 so it provides PCI support for 64-bit ARM Linux.

To make this work a new configuration option pci_io_base is added
to the RealView platform that should be set to the start of
the memory used as memory mapped IO ports (IO ports that are
memory mapped, not regular memory mapped IO). And a parameter
pci_cfg_gen_offsets which specifies if the config space
offsets should be used that the generic driver expects.
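
As an illustration of what pci_cfg_gen_offsets selects, the sketch below computes the config-space offset for a device/function pair in both layouts. The shift amounts are taken from the realview.cc hunk in this changeset (function at bit 12 and device at bit 15 for the generic host, function at bit 16 and device at bit 19 for the legacy bridge); the legacy branch is assumed to match the line the patch removes, since the new else branch is cut off in this archive. The function name cfg_offset is hypothetical, and the platform's pci_cfg_base would still have to be OR-ed in.

```python
def cfg_offset(dev, func, generic):
    """Offset of a device/function within PCI config space (bus 0 only)."""
    if generic:
        # Layout expected by the generic PCI host driver.
        return ((func & 7) << 12) | ((dev & 0x1f) << 15)
    # Legacy host bridge layout, as in the line the patch removes.
    return ((func & 7) << 16) | ((dev & 0x1f) << 19)


for dev, func in [(0, 1), (1, 0), (2, 3)]:
    print(dev, func,
          hex(cfg_offset(dev, func, True)),
          hex(cfg_offset(dev, func, False)))
```

For the same device and function the legacy offset is the generic offset shifted left by four bits, so a kernel using one layout against a platform configured for the other would address the wrong config registers.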

To use the pci-host-generic device you need to:
pci_io_base = 0x2f00 (Valid for VExpress EMM)
pci_cfg_gen_offsets = True

and add the following to your device tree:

pci {
compatible = pci-host-ecam-generic;
device_type = pci;
#address-cells = 0x3;
#size-cells = 0x2;
#interrupt-cells = 0x1;
//bus-range = 0x0 0x1;

// CPU_PHYSICAL(2)  SIZE(2)
// Note, some DTS blobs only support 1 size
reg = 0x0 0x3000 0x0 0x1000;

// IO (1), no bus address (2), cpu address (2), size (2)
// MMIO (1), at address (2), cpu address (2), size (2)
ranges = 0x0100 0x0 0x 0x0 0x2f00 0x0 0x1,
 0x0200 0x0 0x4000 0x0 0x4000 0x0 
0x1000;

// With gem5 we typically use INTA/B/C/D one per device
interrupt-map = 0x 0x0 0x0 0x1 0x1 0x0 0x11 0x1
 0x 0x0 0x0 0x2 0x1 0x0 0x12 0x1
 0x 0x0 0x0 0x3 0x1 0x0 0x13 0x1
 0x 0x0 0x0 0x4 0x1 0x0 0x14 0x1;

// Only match INTA/B/C/D and not BDF
interrupt-map-mask = 0x 0x0 0x0 0x7;
};

diffstat:

 src/dev/arm/RealView.py |   5 +
 src/dev/arm/realview.cc |  27 ---
 src/dev/arm/realview.hh |   3 +++
 3 files changed, 32 insertions(+), 3 deletions(-)

diffs (90 lines):

diff -r 7565dcd505a4 -r 198dfef33403 src/dev/arm/RealView.py
--- a/src/dev/arm/RealView.py   Wed Sep 03 07:43:03 2014 -0400
+++ b/src/dev/arm/RealView.py   Wed Sep 03 07:43:04 2014 -0400
@@ -180,7 +180,10 @@
 type = 'RealView'
 cxx_header = "dev/arm/realview.hh"
 system = Param.System(Parent.any, "system")
+pci_io_base = Param.Addr(0, "Base address of PCI IO Space")
 pci_cfg_base = Param.Addr(0, "Base address of PCI Configuraiton Space")
+pci_cfg_gen_offsets = Param.Bool(False, "Should the offsets used for PCI cfg access"
+ " be compatible with the pci-generic-host or the legacy host bridge?")
 mem_start_addr = Param.Addr(0, "Start address of main memory")
 max_mem_size = Param.Addr('256MB', "Maximum amount of RAM supported by platform")
 
@@ -597,6 +600,8 @@
 self.mmc_fake.clk_domain  = clkdomain
 
 class VExpress_EMM64(VExpress_EMM):
+pci_io_base = 0x2f00
+pci_cfg_gen_offsets = True
 def setupBootLoader(self, mem_bus, cur_sys, loc):
 self.nvmem = SimpleMemory(range = AddrRange(0, size = '64MB'))
 self.nvmem.port = mem_bus.master
diff -r 7565dcd505a4 -r 198dfef33403 src/dev/arm/realview.cc
--- a/src/dev/arm/realview.cc   Wed Sep 03 07:43:03 2014 -0400
+++ b/src/dev/arm/realview.cc   Wed Sep 03 07:43:04 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2009 ARM Limited
+ * Copyright (c) 2009, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -63,6 +63,21 @@
 {}
 
 void
+RealView::initState()
+{
+Addr junk;
+bool has_gen_pci_host;
+has_gen_pci_host = system->kernelSymtab->findAddress("gen_pci_setup", junk);
+
+if (has_gen_pci_host && !params()->pci_cfg_gen_offsets)
+warn("Kernel supports generic PCI host but PCI Config offsets "
+"configured for legacy. Set pci_cfg_gen_offsets to True");
+if (has_gen_pci_host && !params()->pci_io_base)
+warn("Kernel supports generic PCI host but PCI IO base is set "
+"to 0. Set pci_io_base to the start of PCI IO space");
+}
+
+void
 RealView::postConsoleInt()
 {
 warn_once("Don't know what interrupt to post for console.\n");
@@ -100,13 +115,19 @@
 {
 if (bus != 0)
 return ULL(-1);
-return params()->pci_cfg_base | ((func & 7) << 16) | ((dev & 0x1f) << 19);
+
+Addr cfg_offset = 0;
+if (params()->pci_cfg_gen_offsets)
+cfg_offset |= ((func & 7) << 12) | ((dev & 0x1f) << 15);
+else
+cfg_offset |= ((func & 7) << 16) | ((dev &

Re: [gem5-dev] Review Request 2372: style: add .clang-format file

2014-09-03 Thread Andreas Sandberg via gem5-dev


> On Sept. 1, 2014, 6:14 p.m., Andreas Sandberg wrote:
> > .clang-format, line 18
> > http://reviews.gem5.org/r/2372/diff/1/?file=41128#file41128line18
> >
> > Has this changed name? The clang documentation lists
> > DerivePointerAlignment, but not DerivePointerBinding.
>
> Nilay Vaish wrote:
>     Documentation for version 3.4 lists DerivePointerBinding.

That explains it. I was looking at the 3.6 documentation. 
(http://clang.llvm.org/docs/ClangFormatStyleOptions.html)


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2372/#review5321
---


On Sept. 3, 2014, 5:50 a.m., Nilay Vaish wrote:
 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2372/
> ---
>
> (Updated Sept. 3, 2014, 5:50 a.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> ---
>
> Changeset 10318:34b549ec182b
> ---
> style: add .clang-format file
>
> The format specified in this file is used by clang-format to fix
> the formatting of a given file.  Hopefully, this will ease the burden
> on the developers as they no longer need to manually format things.
>
>
> Diffs
> -
>
>   .clang-format PRE-CREATION
>
> Diff: http://reviews.gem5.org/r/2372/diff/
>
>
> Testing
> ---
>
>
> Thanks,
>
> Nilay Vaish
 




Re: [gem5-dev] bi-mode branch predictor miss prediction rate is high

2014-09-03 Thread Mitch Hayenga via gem5-dev
A bug was recently found in the bimodal predictor.  If you are still
looking at this, you might want to try a new checkout.  Hope this helps.


On Wed, Jul 2, 2014 at 4:52 PM, Zi Yan via gem5-dev <gem5-dev@gem5.org>
wrote:

> I get 5 100-million-instruction simpoints for each benchmark in
> SPEC CPU 2006 with *ref input*. I am using cross-tool
> arm-cortex_a15-linux-gnueabi-gcc version 4.8.2 to compile.
>
> For gcc, I got from 0.2% to 5% miss rate from tournament, but 3% to 22%
> miss rate from bi-mode cross all simpoints.
>
> Most weird part is hmmer, I got from 0.3% to 0.5% miss rate from
> tournament, but 52% to 60% miss rate from bi-mode.
>
> --
> Best Regards
> Yan Zi
>
> On 2 Jul 2014, at 17:11, Anthony Gutierrez via gem5-dev wrote:
>
> > This could depend on a lot of factors. How are you running the
> > benchmarks?
> >
> > E.g., running SPEC 2k6's gcc to completion with the train input set in FS
> > mode yields a 6.45% miss rate for bi-mode, while the tournament predictor
> > yields a 7.12% miss rate.
> >
> > Anthony Gutierrez
> > http://web.eecs.umich.edu/~atgutier
> >
> > On Wed, Jul 2, 2014 at 4:37 PM, Zi Yan via gem5-dev <gem5-dev@gem5.org>
> > wrote:
> >
> > > Hi,
> > >
> > > I just updated gem5-dev and got bi-mode as ARM's default
> > > branch predictor.
> > >
> > > I got mis-prediction rate
> > > (system.cpu.branchPred.condIncorrect/system.cpu.branchPred.condPredicted)
> > > ranging from 10% to 60%, whereas I saw mis-prediction rate ranging
> > > from 1% to 9% with tournament for SPEC CPU 2006 benchmarks.
> > >
> > > Should I expect this from bi-mode?
> > >
> > > Thanks.
> > >
> > > --
> > > Best Regards
> > > Yan Zi



Re: [gem5-dev] workaround: Ruby functional read failed error

2014-09-03 Thread Jiri Kaspar via gem5-dev
> BTW, in the scenario above, the functionalWrite will not work correctly,
> data updated by functionalWrite in controllers will be replaced with old
> data from a queued packet several ticks later.

functionalWrite works well in this scenario; the "functional write failed"
bug is already fixed.
Sorry for my misunderstanding, I need to improve my C++ reading skills.

Reading the code carefully, it seems to me that the same logic that is
implemented for writing in RubySystem::functionalWrite(PacketPtr pkt) and
RubyMemoryControl::functionalWriteBuffers(Packet *pkt) is already
prepared for reading in
RubyMemoryControl::functionalReadBuffers(Packet *pkt).

Is there any reason why it is not used in the
RubySystem::functionalRead(PacketPtr pkt) function?
Is adding the following code before the "return false" the right solution?

Regards,
Jiri Kaspar
---
for (unsigned int i = 0; i < num_controllers; ++i) {
    if (m_abs_cntrl_vec[i]->functionalReadBuffers(pkt)) return true;
}

for (unsigned int i = 0; i < m_memory_controller_vec.size(); ++i) {
    if (m_memory_controller_vec[i]->functionalReadBuffers(pkt)) return true;
}

if (m_network_ptr->functionalWrite(pkt)) return true;
 
