Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

2015-01-30 Thread Mitch Hayenga via gem5-dev
Hi,

Stores should be fine since they are only sent to the memory system after
commit.   The relevant functions to look at are
sendStore, recvRetry, and writebackStores in lsq_unit_impl.hh.

Basically, if a store gets blocked the core just waits until it gets a
retry.  Since stores are sent in-order from the SQ to the memory system,
that queue just waits.  The stores are never removed from the SQ unless
they succeed.

Loads were special in that they were effectively removed from the
scheduler, even if they might fail.  Stores however always maintain their
entries/order until they succeed.
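The SQ behavior described above can be sketched as a toy model (this is a simplified Python illustration of the sendStore/recvRetry handshake, not gem5 code; the class and method names are made up for the sketch):

```python
from collections import deque

class StoreQueueModel:
    """Toy model of the in-order SQ -> memory handshake: only the head
    store is sent, and it leaves the queue only when the cache accepts it."""
    def __init__(self, stores):
        self.sq = deque(stores)      # stores enter the SQ in program order
        self.sent = []
        self.awaiting_retry = False  # set when the cache blocks a store

    def writeback_stores(self, cache_accepts):
        # Keep sending the head of the SQ; on a block, just stop and wait.
        while self.sq and not self.awaiting_retry:
            if cache_accepts(self.sq[0]):
                self.sent.append(self.sq.popleft())
            else:
                self.awaiting_retry = True  # wait for recv_retry

    def recv_retry(self, cache_accepts):
        # The retry simply resumes the in-order drain of the SQ.
        self.awaiting_retry = False
        self.writeback_stores(cache_accepts)
```

Because a blocked store is never removed from the SQ, no replay machinery is needed for stores — unlike the loads the patch fixes.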





On Thu, Jan 29, 2015 at 6:01 PM, Beckmann, Brad via gem5-dev 
gem5-dev@gem5.org wrote:

 Hi Mitch,

 Quick question regarding this patch.  Does this patch also handle
 replaying stores once the cache becomes unblocked?  The changes and
 comments appear to only handle loads, but it seems like stores could have
 the same problem.

 Thanks,

 Brad



 -Original Message-
 From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Mitch
 Hayenga via gem5-dev
 Sent: Wednesday, September 03, 2014 4:38 AM
 To: gem5-...@m5sim.org
 Subject: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load
 behavior in o3 cpu

 changeset 6be8945d226b in /z/repo/gem5
 details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b
 description:
 cpu: Fix cache blocked load behavior in o3 cpu

 This patch fixes the load blocked/replay mechanism in the o3 cpu.
 Rather than
 flushing the entire pipeline, this patch replays loads once the
 cache becomes
 unblocked.

 Additionally, deferred memory instructions (loads which had
 conflicting stores),
 when replayed would not respect the number of functional units
 (only respected
 issue width).  This patch also corrects that.

 Improvements over 20% have been observed on a microbenchmark
 designed to
 exercise this behavior.

 diffstat:

  src/cpu/o3/iew.hh   |   13 +-
  src/cpu/o3/iew_impl.hh  |   57 ++
  src/cpu/o3/inst_queue.hh|   25 -
  src/cpu/o3/inst_queue_impl.hh   |   68 ++---
  src/cpu/o3/lsq.hh   |   27 +-
  src/cpu/o3/lsq_impl.hh  |   23 +---
  src/cpu/o3/lsq_unit.hh  |  198
 ---
  src/cpu/o3/lsq_unit_impl.hh |   40 ++-
  src/cpu/o3/mem_dep_unit.hh  |4 +-
  src/cpu/o3/mem_dep_unit_impl.hh |4 +-
  10 files changed, 203 insertions(+), 256 deletions(-)

 diffs (truncated from 846 to 300 lines):

 diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh
 --- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:38 2014 -0400
 +++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:39 2014 -0400
 @@ -1,5 +1,5 @@
  /*
 - * Copyright (c) 2010-2012 ARM Limited
 + * Copyright (c) 2010-2012, 2014 ARM Limited
   * All rights reserved
   *
   * The license below extends only to copyright in the software and shall
 @@ -181,6 +181,12 @@
  /** Re-executes all rescheduled memory instructions. */
  void replayMemInst(DynInstPtr inst);

 +/** Moves memory instruction onto the list of cache blocked
 instructions */
 +void blockMemInst(DynInstPtr inst);
 +
 +/** Notifies that the cache has become unblocked */
 +void cacheUnblocked();
 +
  /** Sends an instruction to commit through the time buffer. */
  void instToCommit(DynInstPtr inst);

 @@ -233,11 +239,6 @@
   */
  void squashDueToMemOrder(DynInstPtr inst, ThreadID tid);

 -/** Sends commit proper information for a squash due to memory
 becoming
 - * blocked (younger issued instructions must be retried).
 - */
 -void squashDueToMemBlocked(DynInstPtr inst, ThreadID tid);
 -
  /** Sets Dispatch to blocked, and signals back to other stages to
 block. */
  void block(ThreadID tid);

 diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh
 --- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:38 2014 -0400
 +++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:39 2014 -0400
 @@ -530,29 +530,6 @@

  template<class Impl>
  void
 -DefaultIEW<Impl>::squashDueToMemBlocked(DynInstPtr inst, ThreadID tid)
 -{
 -    DPRINTF(IEW, "[tid:%i]: Memory blocked, squashing load and younger "
 -            "insts, PC: %s [sn:%i].\n", tid, inst->pcState(), inst->seqNum);
 -    if (!toCommit->squash[tid] ||
 -        inst->seqNum < toCommit->squashedSeqNum[tid]) {
 -        toCommit->squash[tid] = true;
 -
 -        toCommit->squashedSeqNum[tid] = inst->seqNum;
 -        toCommit->pc[tid] = inst->pcState();
 -        toCommit->mispredictInst[tid] = NULL;
 -
 -        // Must include the broadcasted SN in the squash.
 -        toCommit->includeSquashInst[tid] = true;
 -
 -        ldstQueue.setLoadBlockedHandled(tid);
 -
 -        wroteToTimeBuffer = true;
 -    }
 -}
 -
 -template<class Impl>
 -void
  DefaultIEW<Impl>::block(ThreadID tid)
  {
      DPRINTF(IEW, "[tid:%u]: Blocking.\n", tid);
 @@ -610,6 +587,20 @@

  template<class Impl>
  void

Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

2015-01-30 Thread Mitch Hayenga via gem5-dev
Ahh, yeah, I'm familiar with speculatively grabbing coherence rights for
stores prior to commit.  But the store isn't done at that point, right?  It's
just globally ordered.  And other system activity might make that ownership
go away prior to the actual store commit.

How about just dropping/ignoring the prefetch if the blocked case actually
happens?

On Fri, Jan 30, 2015 at 6:16 PM, Beckmann, Brad via gem5-dev 
gem5-dev@gem5.org wrote:

 Thanks Mitch for the quick reply.

 While assuming stores are only sent after commit is true for the current
 O3 model, aggressive out-of-order processors send store addresses to the
 memory system as soon as they are available (i.e. speculatively).  We
 actually have a patch that provides such a capability, but I'm having a
 tough time figuring out how to merge it with your change.  Any suggestions
 you may have would be very much appreciated.

 Thanks,

 Brad



 -Original Message-
 From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Mitch
 Hayenga via gem5-dev
 Sent: Friday, January 30, 2015 9:34 AM
 To: gem5 Developer List
 Cc: gem5-...@m5sim.org
 Subject: Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load
 behavior in o3 cpu

 Hi,

 Stores should be fine since they are only sent to the memory system after
 commit.   The relevant functions to look at are
 sendStore, recvRetry, and writebackStores in lsq_unit_impl.hh.

 Basically, if a store gets blocked the core just waits until it gets a
 retry.  Since stores are sent in-order from the SQ to the memory system,
 that queue just waits.  The stores are never removed from the SQ unless
 they succeed.

 Loads were special in that they were effectively removed from the
 scheduler, even if they might fail.  Stores however always maintain their
 entries/order until they succeed.





 On Thu, Jan 29, 2015 at 6:01 PM, Beckmann, Brad via gem5-dev 
 gem5-dev@gem5.org wrote:

  Hi Mitch,
 
  Quick question regarding this patch.  Does this patch also handle
  replaying stores once the cache becomes unblocked?  The changes and
  comments appear to only handle loads, but it seems like stores could
  have the same problem.
 
  Thanks,
 
  Brad
 
 
 

Re: [gem5-dev] simpoints and KVM

2015-01-12 Thread Mitch Hayenga via gem5-dev
Hi Mike,

I'm the one who wrote the initial version of the simpoint
collection/generation a few years ago. I enforced the fastmem option
primarily because I didn't see it necessary to simulate caches during
simpoint generation and it made simulation faster.  You can simply disable
this and it should all still work.

For the --cpu-type=atomic option, initially simpoints were hardcoded
directly into the atomic CPU.  Since then, they've been changed to use the
probe system.  However, a quick grep of the code shows the initialization
for the SimPoint probe only exists in the atomic CPU.  If you registered
the probe point with the TimingCPU as well, then that should work (I think).


On Mon, Jan 12, 2015 at 5:02 PM, mike upton via gem5-dev gem5-dev@gem5.org
wrote:

 I am trying to enable simpoint generation with kvm enabled.

 Is there anything that inherently blocks this?

 Simpoints are currently enabled only with --fastmem and --cpu-type=atomic.
 How fundamental are each of these restrictions?

 Thanks,

 Mike
 ___
 gem5-dev mailing list
 gem5-dev@gem5.org
 http://m5sim.org/mailman/listinfo/gem5-dev

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Rework the structuring of the prefetchers

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset b9646f4546ad in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b9646f4546ad
description:
mem: Rework the structuring of the prefetchers

Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now uses a customizable lookup 
table
(sets/ways) rather than the previous fully associative structure.
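The set/way lookup the description mentions amounts to hashing the PC into a set and searching only that set's ways. A toy illustration (the table sizes and hash here are made up, not gem5's defaults):

```python
PC_TABLE_SETS = 16   # illustrative sizes, not gem5 parameters
PC_TABLE_ASSOC = 4

def pc_hash(pc: int) -> int:
    """Map a PC to a set index, as a set-associative lookup table does."""
    return (pc >> 2) % PC_TABLE_SETS

def lookup(table, pc):
    """Search only the ways of one set instead of the whole table."""
    s = pc_hash(pc)
    for way, entry in enumerate(table[s]):
        if entry is not None and entry["pc"] == pc:
            return s, way
    return s, None   # miss: caller picks a victim way in set s
```

The win over a fully associative table is that each lookup touches at most PC_TABLE_ASSOC entries rather than all of them.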

diffstat:

 src/mem/cache/cache_impl.hh  |   10 +-
 src/mem/cache/prefetch/Prefetcher.py |   62 ---
 src/mem/cache/prefetch/SConscript|1 +
 src/mem/cache/prefetch/base.cc   |  258 --
 src/mem/cache/prefetch/base.hh   |  139 +-
 src/mem/cache/prefetch/queued.cc |  213 
 src/mem/cache/prefetch/queued.hh |  108 ++
 src/mem/cache/prefetch/stride.cc |  205 +++---
 src/mem/cache/prefetch/stride.hh |   55 +++---
 src/mem/cache/prefetch/tagged.cc |   16 +-
 src/mem/cache/prefetch/tagged.hh |   19 +-
 11 files changed, 599 insertions(+), 487 deletions(-)

diffs (truncated from 1401 to 300 lines):

diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -535,7 +535,7 @@
 bool satisfied = access(pkt, blk, lat, writebacks);
 
 // track time of availability of next prefetch, if any
-Tick next_pf_time = 0;
+Tick next_pf_time = MaxTick;
 
 bool needsResponse = pkt->needsResponse();
 
 @@ -548,7 +548,7 @@
 
         // Don't notify on SWPrefetch
         if (!pkt->cmd.isSWPrefetch())
-            next_pf_time = prefetcher->notify(pkt, time);
+            next_pf_time = prefetcher->notify(pkt);
     }
 
     if (needsResponse) {
@@ -648,7 +648,7 @@
         if (prefetcher) {
             // Don't notify on SWPrefetch
             if (!pkt->cmd.isSWPrefetch())
-                next_pf_time = prefetcher->notify(pkt, time);
+                next_pf_time = prefetcher->notify(pkt);
         }
     }
 } else {
@@ -688,12 +688,12 @@
     if (prefetcher) {
         // Don't notify on SWPrefetch
         if (!pkt->cmd.isSWPrefetch())
-            next_pf_time = prefetcher->notify(pkt, time);
+            next_pf_time = prefetcher->notify(pkt);
 }
 }
 }
 
-if (next_pf_time != 0)
+if (next_pf_time != MaxTick)
 requestMemSideBus(Request_PF, std::max(time, next_pf_time));
 
 // copy writebacks to write buffer
diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/prefetch/Prefetcher.py
--- a/src/mem/cache/prefetch/Prefetcher.py  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/prefetch/Prefetcher.py  Tue Dec 23 09:31:18 2014 -0500
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 ARM Limited
+# Copyright (c) 2012, 2014 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -37,6 +37,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Authors: Ron Dreslinski
+#  Mitch Hayenga
 
 from ClockedObject import ClockedObject
 from m5.params import *
@@ -46,39 +47,46 @@
 type = 'BasePrefetcher'
 abstract = True
 cxx_header = "mem/cache/prefetch/base.hh"
-size = Param.Int(100,
-    "Number of entries in the hardware prefetch queue")
-cross_pages = Param.Bool(False,
-    "Allow prefetches to cross virtual page boundaries")
-serial_squash = Param.Bool(False,
-    "Squash prefetches with a later time on a subsequent miss")
-degree = Param.Int(1,
-    "Degree of the prefetch depth")
-latency = Param.Cycles('1', "Latency of the prefetcher")
-use_master_id = Param.Bool(True,
-    "Use the master id to separate calculations of prefetches")
-data_accesses_only = Param.Bool(False,
-    "Only prefetch on data not on instruction accesses")
-on_miss_only = Param.Bool(False,
-    "Only prefetch on miss (as opposed to always)")
-on_read_only = Param.Bool(False,
-    "Only prefetch on read requests (write requests ignored)")
-on_prefetch = Param.Bool(True,
-    "Let lower cache prefetcher train on prefetch requests")
-inst_tagged = Param.Bool(True,
-    "Perform a tagged prefetch for instruction fetches always")
 sys = Param.System(Parent.any, "System this prefetcher belongs to")
 
-class 

[gem5-dev] changeset in gem5: mem: Fix event scheduling issue for prefetches

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 00965520c9f5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=00965520c9f5
description:
mem: Fix event scheduling issue for prefetches

The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime(), which is the time when the next MSHR is ready or when
a future prefetch is ready.  However, a prefetch being ready does not 
guarantee
that it can obtain an MSHR.  So, when all MSHRs are full,
the simulation ends up unnecessarily scheduling a sendEvent every 
picosecond
until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the 
prefetch
could not be generated.  The event is rescheduled as soon as an MSHR 
becomes
available.
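The fix amounts to a guard on the "next ready time" computation: fold in the prefetcher's ready time only when an MSHR could actually be allocated. A minimal Python sketch of that logic (MAX_TICK standing in for gem5's MaxTick sentinel):

```python
MAX_TICK = float("inf")  # stand-in for gem5's MaxTick sentinel

def next_ready_time(mshr_ready, wb_ready, pf_ready, can_prefetch):
    """Earliest time the mem-side queue should schedule its sendEvent.

    Before the fix, pf_ready was always included, so with a full MSHR
    queue the event fired immediately (and uselessly) over and over;
    with the guard, the prefetch time is only considered when an MSHR
    is available.
    """
    t = min(mshr_ready, wb_ready)
    if can_prefetch:
        t = min(t, pf_ready)
    return t
```

With all MSHRs busy, the wakeup falls back to the real MSHR/writeback ready times instead of the (unachievable) prefetch time.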

diffstat:

 src/mem/cache/cache_impl.hh |  13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 97aa1ee1c2d9 -r 00965520c9f5 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1197,6 +1197,15 @@
     if (wasFull && !mq->isFull()) {
         clearBlocked((BlockedCause)mq->index);
     }
+
+    // Request the bus for a prefetch if this deallocation freed enough
+    // MSHRs for a prefetch to take place
+    if (prefetcher && mq == &mshrQueue && mshrQueue.canPrefetch()) {
+        Tick next_pf_time = std::max(prefetcher->nextPrefetchReadyTime(),
+                                     curTick());
+        if (next_pf_time != MaxTick)
+            requestMemSideBus(Request_PF, next_pf_time);
+    }
 }
 
 // copy writebacks to write buffer
@@ -1955,7 +1964,9 @@
     Tick nextReady = std::min(mshrQueue.nextMSHRReadyTime(),
                               writeBuffer.nextMSHRReadyTime());
 
-    if (prefetcher) {
+    // Don't signal prefetch ready time if no MSHRs available
+    // Will signal once enough MSHRs are deallocated
+    if (prefetcher && mshrQueue.canPrefetch()) {
         nextReady = std::min(nextReady,
                              prefetcher->nextPrefetchReadyTime());
 }
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Add parameter to reserve MSHR entries fo...

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 0b969a35781f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b969a35781f
description:
mem: Add parameter to reserve MSHR entries for demand access

Adds a new parameter that reserves some number of MSHR entries for 
demand
accesses.  This helps prevent prefetchers from taking all MSHRs, 
forcing demand
requests from the CPU to stall.
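The reservation check in the patch boils down to simple arithmetic over the allocation count (this mirrors the canPrefetch() predicate in the diff below; the parameter values in the test are just examples, not defaults):

```python
def can_prefetch(allocated, num_entries, num_reserve, demand_reserve):
    """A prefetch may allocate an MSHR only while enough entries remain
    free to cover both the general reserve and the demand-access reserve."""
    return allocated < num_entries - (num_reserve + demand_reserve)
```

So with demand_reserve > 0, the prefetcher always stops one (or more) allocations short of filling the queue, leaving headroom for demand misses.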

diffstat:

 src/mem/cache/BaseCache.py  |   1 +
 src/mem/cache/base.cc   |   4 ++--
 src/mem/cache/cache_impl.hh |   2 +-
 src/mem/cache/mshr_queue.cc |   8 +---
 src/mem/cache/mshr_queue.hh |  19 ++-
 5 files changed, 27 insertions(+), 7 deletions(-)

diffs (101 lines):

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/BaseCache.py
--- a/src/mem/cache/BaseCache.pyTue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/BaseCache.pyTue Dec 23 09:31:18 2014 -0500
@@ -54,6 +54,7 @@
 max_miss_count = Param.Counter(0,
     "number of misses to handle before calling exit")
 mshrs = Param.Int("number of MSHRs (max outstanding requests)")
+demand_mshr_reserve = Param.Int(1, "mshrs to reserve for demand access")
 size = Param.MemorySize("capacity in bytes")
 forward_snoops = Param.Bool(True,
     "forward snoops from mem side to cpu side")
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/base.cc
--- a/src/mem/cache/base.cc Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/base.cc Tue Dec 23 09:31:18 2014 -0500
@@ -68,8 +68,8 @@
 BaseCache::BaseCache(const Params *p)
 : MemObject(p),
   cpuSidePort(nullptr), memSidePort(nullptr),
-  mshrQueue("MSHRs", p->mshrs, 4, MSHRQueue_MSHRs),
-  writeBuffer("write buffer", p->write_buffers, p->mshrs+1000,
+  mshrQueue("MSHRs", p->mshrs, 4, p->demand_mshr_reserve, MSHRQueue_MSHRs),
+  writeBuffer("write buffer", p->write_buffers, p->mshrs+1000, 0,
               MSHRQueue_WriteBuffer),
   blkSize(p-system-cacheLineSize()),
   hitLatency(p-hit_latency),
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1841,7 +1841,7 @@
 
 // fall through... no pending requests.  Try a prefetch.
 assert(!miss_mshr  !write_mshr);
-    if (prefetcher && !mshrQueue.isFull()) {
+    if (prefetcher && mshrQueue.canPrefetch()) {
 // If we have a miss queue slot, we can try a prefetch
 PacketPtr pkt = prefetcher-getPacket();
 if (pkt) {
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.cc   Tue Dec 23 09:31:18 2014 -0500
@@ -52,10 +52,12 @@
 using namespace std;
 
 MSHRQueue::MSHRQueue(const std::string &_label,
-                     int num_entries, int reserve, int _index)
+                     int num_entries, int reserve, int demand_reserve,
+                     int _index)
     : label(_label), numEntries(num_entries + reserve - 1),
-      numReserve(reserve), registers(numEntries),
-      drainManager(NULL), allocated(0), inServiceEntries(0), index(_index)
+      numReserve(reserve), demandReserve(demand_reserve),
+      registers(numEntries), drainManager(NULL), allocated(0),
+      inServiceEntries(0), index(_index)
 {
     for (int i = 0; i < numEntries; ++i) {
 registers[i].queue = this;
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -77,6 +77,12 @@
  */
 const int numReserve;
 
+/**
+ * The number of entries to reserve for future demand accesses.
+ * Prevent prefetcher from taking all mshr entries
+ */
+const int demandReserve;
+
 /**  MSHR storage. */
 std::vector<MSHR> registers;
 /** Holds pointers to all allocated entries. */
@@ -106,9 +112,11 @@
  * @param num_entrys The number of entries in this queue.
  * @param reserve The minimum number of entries needed to satisfy
  * any access.
+ * @param demand_reserve The minimum number of entries needed to satisfy
+ * demand accesses.
  */
 MSHRQueue(const std::string &_label, int num_entries, int reserve,
-          int index);
+          int demand_reserve, int index);
 
 /**
  * Find the first MSHR that matches the provided address.
@@ -218,6 +226,15 @@
 }
 
 /**
+ * Returns true if sufficient mshrs for prefetch.
+ * @return True if sufficient mshrs for prefetch.
+ */
+bool canPrefetch() const
+{
        return (allocated < numEntries - (numReserve + demandReserve));
+}
+
+/**
  * Returns the MSHR at the head of the readyList.
  * @return The next request to service.
  */
___
gem5-dev mailing list
gem5-dev@gem5.org

[gem5-dev] changeset in gem5: mem: Change prefetcher to use random_mt

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 63edd4a1243f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=63edd4a1243f
description:
mem: Change prefetcher to use random_mt

Prefetchers previously used rand() to generate random numbers.

diffstat:

 src/mem/cache/prefetch/stride.cc |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diffs (20 lines):

diff -r 7982e539d003 -r 63edd4a1243f src/mem/cache/prefetch/stride.cc
--- a/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
+++ b/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
@@ -46,6 +46,7 @@
  * Stride Prefetcher template instantiations.
  */
 
+#include "base/random.hh"
 #include "debug/HWPrefetch.hh"
 #include "mem/cache/prefetch/stride.hh"
 
@@ -176,7 +177,7 @@
 {
 // Rand replacement for now
 int set = pcHash(pc);
-    int way = rand() % pcTableAssoc;
+    int way = random_mt.random<int>(0, pcTableAssoc - 1);
 
     DPRINTF(HWPrefetch, "Victimizing lookup table[%d][%d].\n", set, way);
 return pcTable[master_id][set][way];
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Fix bug relating to writebacks and prefe...

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 97aa1ee1c2d9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=97aa1ee1c2d9
description:
mem: Fix bug relating to writebacks and prefetches

Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.

diffstat:

 src/mem/cache/cache_impl.hh |  12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r b9646f4546ad -r 97aa1ee1c2d9 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1892,12 +1892,6 @@
 BlkType *blk = tags->findBlock(mshr->addr, mshr->isSecure);
 
 if (tgt_pkt->cmd == MemCmd::HardPFReq) {
-    // It might be possible for a writeback to arrive between
-    // the time the prefetch is placed in the MSHRs and when
-    // it's selected to send... if so, this assert will catch
-    // that, and then we'll have to figure out what to do.
-    assert(blk == NULL);
-
     // We need to check the caches above us to verify that
     // they don't have a copy of this block in the dirty state
     // at the moment. Without this check we could get a stale
@@ -1909,8 +1903,10 @@
     cpuSidePort->sendTimingSnoopReq(&snoop_pkt);
 
     // Check to see if the prefetch was squashed by an upper cache
-    if (snoop_pkt.prefetchSquashed()) {
-        DPRINTF(Cache, "Prefetch squashed by upper cache.  "
+    // Or if a writeback arrived between the time the prefetch was
+    // placed in the MSHRs and when it was selected to send.
+    if (snoop_pkt.prefetchSquashed() || blk != NULL) {
+        DPRINTF(Cache, "Prefetch squashed by cache.  "
                 "Deallocating mshr target %#x.\n", mshr->addr);
 
 // Deallocate the mshr target
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Asimbench : Turn off dumping of file system.framebuffer.bmp

2014-11-23 Thread Mitch Hayenga via gem5-dev
Set the enable_capture parameter from src/dev/arm/RealView.py to false.
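In a full-system config script that would look roughly like the following (the exact attribute path is an assumption — it depends on which SimObject in your platform exposes the parameter, so check src/dev/arm/RealView.py for where `enable_capture` is defined in your tree):

```python
# Hypothetical snippet: disable framebuffer capture so the simulator
# stops writing system.framebuffer.bmp to the output directory.
# `system.realview.clcd` is an illustrative path, not guaranteed.
system.realview.clcd.enable_capture = False
```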



On Sat, Nov 22, 2014 at 8:46 PM, Lokesh Jindal via gem5-dev 
gem5-dev@gem5.org wrote:

 Hello everyone,

 While running asimbench, I see that the benchmarks dump the file
 system.framebuffer.bmp very frequently in the output directory. This is
 creating a lot of unnecessary writes to disk and slowing down my
 simulation/file-system. It would be a life saver if someone could suggest
 a way to turn off dumping of this bmp file.

 Thanks and Regards
 Lokesh Jindal
 ___
 gem5-dev mailing list
 gem5-dev@gem5.org
 http://m5sim.org/mailman/listinfo/gem5-dev

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Delete unused variable in Garnet Network...

2014-11-12 Thread Mitch Hayenga via gem5-dev
changeset 50bbc64efbb8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=50bbc64efbb8
description:
mem: Delete unused variable in Garnet NetworkLink

With recent changes OSX clang compilation fails due to an unused 
variable.

diffstat:

 src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh |  1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diffs (11 lines):

diff -r d1dce0b728b6 -r 50bbc64efbb8 
src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh
--- a/src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh   Wed Nov 
12 09:05:22 2014 -0500
+++ b/src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh   Wed Nov 
12 09:05:23 2014 -0500
@@ -74,7 +74,6 @@
 flitBuffer_d *linkBuffer;
 Consumer *link_consumer;
 flitBuffer_d *link_srcQueue;
-int m_flit_width;
 
 // Statistical variables
 unsigned int m_link_utilized;
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: cpu: Add writeback modeling for drain functio...

2014-10-29 Thread Mitch Hayenga via gem5-dev
changeset e57f5bffc553 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e57f5bffc553
description:
cpu: Add writeback modeling for drain functionality

It is possible for the O3 CPU to consider itself drained and
later have a squashed instruction perform a writeback.  This
patch re-adds tracking of in-flight instructions to prevent
falsely signaling a drained event.
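The tracking added by the patch is just a counter of in-flight writebacks that gates the drained check. A toy Python model of that bookkeeping (names mirror the diff below but the class is illustrative):

```python
class DrainModel:
    """Toy model: the IQ only reports drained once every op issued to a
    functional unit has written back, including ops squashed after issue."""
    def __init__(self):
        self.wb_outstanding = 0

    def issue_to_fu(self):
        self.wb_outstanding += 1   # a completion event is now in flight

    def writeback(self):
        self.wb_outstanding -= 1   # fires even for squashed instructions

    def is_drained(self, depend_graph_empty=True, insts_to_execute_empty=True):
        return (depend_graph_empty and insts_to_execute_empty
                and self.wb_outstanding == 0)
```

Without the counter, a squashed instruction's late writeback could arrive after the CPU had already declared itself drained — exactly the bug described above.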

diffstat:

 src/cpu/o3/inst_queue.hh  |  3 +++
 src/cpu/o3/inst_queue_impl.hh |  7 ++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diffs (51 lines):

diff -r 7e54a9a9f6b2 -r e57f5bffc553 src/cpu/o3/inst_queue.hh
--- a/src/cpu/o3/inst_queue.hh  Wed Oct 29 23:18:26 2014 -0500
+++ b/src/cpu/o3/inst_queue.hh  Wed Oct 29 23:18:27 2014 -0500
@@ -437,6 +437,9 @@
 /** The number of physical registers in the CPU. */
 unsigned numPhysRegs;
 
+/** Number of instructions currently in flight to FUs */
+int wbOutstanding;
+
 /** Delay between commit stage and the IQ.
  *  @todo: Make there be a distinction between the delays within IEW.
  */
diff -r 7e54a9a9f6b2 -r e57f5bffc553 src/cpu/o3/inst_queue_impl.hh
--- a/src/cpu/o3/inst_queue_impl.hh Wed Oct 29 23:18:26 2014 -0500
+++ b/src/cpu/o3/inst_queue_impl.hh Wed Oct 29 23:18:27 2014 -0500
@@ -415,6 +415,7 @@
 deferredMemInsts.clear();
 blockedMemInsts.clear();
 retryMemInsts.clear();
+wbOutstanding = 0;
 }
 
 template <class Impl>
@@ -444,7 +445,9 @@
 bool
 InstructionQueue<Impl>::isDrained() const
 {
-    bool drained = dependGraph.empty() && instsToExecute.empty();
+    bool drained = dependGraph.empty() &&
+                   instsToExecute.empty() &&
+                   wbOutstanding == 0;
     for (ThreadID tid = 0; tid < numThreads; ++tid)
         drained = drained && memDepUnit[tid].isDrained();
 
@@ -723,6 +726,7 @@
     assert(!cpu->switchedOut());
     // The CPU could have been sleeping until this op completed (*extremely*
     // long latency op).  Wake it if it was.  This may be overkill.
+    --wbOutstanding;
     iewStage->wakeCPU();
 
     if (fu_idx > -1)
@@ -823,6 +827,7 @@
     } else {
         Cycles issue_latency = fuPool->getIssueLatency(op_class);
         // Generate completion event for the FU
+        ++wbOutstanding;
         FUCompletion *execution = new FUCompletion(issuing_inst,
                                                    idx, this);
 
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Missing m_flit_width definition.

2014-10-27 Thread Mitch Hayenga via gem5-dev
This kind of thing happens fairly often.  The compilation is set to error
out if it detects unused variables.  Over time the compilers get smarter
and flag variables as unused that they previously couldn't detect.

In my experience, Clang on OS X tends to catch most of these in the
committed code, since most of the other developers develop solely on
GCC/Linux.

I fixed this error as well last week, just haven't gotten around to
submitting a patch.

On Sat, Oct 25, 2014 at 3:03 AM, Todd Bezenek via gem5-dev 
gem5-dev@gem5.org wrote:

 (I'm a gem5 newbie, so please excuse me if this is an easy/know issue.)

 Please let me know if this is the wrong place to post this.

 I'm using Mac OS X (Mavericks).

 I downloaded the standard gem5 source tree (on Oct. 25, 2014):

 bash hg clone http://repo.gem5.org/gem5

 and built it with the following error:

 bash> scons build/ARM/gem5.opt
 ...
 [ CXX] ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.cc -> .o
 In file included from
     build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.cc:31:
 In file included from
     build/ARM/mem/ruby/network/garnet/fixed-pipeline/CreditLink_d.hh:34:
 build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh:77:9:
     error: private field 'm_flit_width' is not used
     [-Werror,-Wunused-private-field]

     int m_flit_width;
         ^

 1 error generated.

 scons: ***
 [build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.o] Error 1

 scons: building terminated because of errors.

 Sat Oct 25 00:17:37 ~/Work/Gem5/Src/gem5

 - I fixed this by commenting out the definition of m_flit_width and
 everything worked.

 - I figure it would be a good thing to fix this for the distribution in
 general.
 -Todd
 --
 Todd Bezenek http://www.linkedin.com/in/toddbezenek/, MScCS, MScEE
 beze...@gmail.com

 A people hire A people, B people hire C people.
   --Jim Gray http://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
 ___
 gem5-dev mailing list
 gem5-dev@gem5.org
 http://m5sim.org/mailman/listinfo/gem5-dev

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] Changeset 10484 breaks compilation on Mac OS X

2014-10-21 Thread Mitch Hayenga via gem5-dev
changeset:   10484:6709bbcf564d
user:Michael Adler michael.ad...@intel.com
date:Mon Oct 20 16:44:53 2014 -0500
summary: sim: implement getdents/getdents64 in user mode


Errors:
build/ARM/sim/syscall_emul.cc:881:30: error: use of undeclared identifier
'SYS_getdents'
int bytes_read = syscall(SYS_getdents, fd, bufArg.bufferPtr(), nbytes);
 ^
build/ARM/sim/syscall_emul.cc:899:30: error: use of undeclared identifier
'SYS_getdents64'
int bytes_read = syscall(SYS_getdents64, fd, bufArg.bufferPtr(),
nbytes);


It looks like this recent changeset for syscall emulation directly makes a
syscall to SYS_getdents, which seemingly does not exist on Mac OS X.

From the getdents manpage:
"This is not the function you are interested in.  Look at readdir(3) for
the POSIX conforming C library interface."
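A portable emulation path would go through readdir(3) on the host and repack entries into the guest's dirent layout. A minimal sketch of the host-side listing using only POSIX calls (illustrative names, not gem5's syscall_emul helpers):

```cpp
#include <dirent.h>

#include <cassert>
#include <string>
#include <vector>

// Enumerate directory entry names via POSIX opendir/readdir/closedir,
// which exist on both Linux and OS X, unlike the raw SYS_getdents
// syscall. A real getdents emulation would additionally serialize each
// entry into the guest's struct dirent layout.
std::vector<std::string> listDirectory(const std::string &path)
{
    std::vector<std::string> names;
    DIR *dirp = opendir(path.c_str());
    if (!dirp)
        return names;  // an empty result doubles as the error case here
    while (struct dirent *de = readdir(dirp))
        names.emplace_back(de->d_name);
    closedir(dirp);
    return names;
}
```
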


- Mitch


[gem5-dev] changeset in gem5: cpu: Remove Ozone CPU from the source tree

2014-10-10 Thread Mitch Hayenga via gem5-dev
changeset cba563d00376 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=cba563d00376
description:
cpu: Remove Ozone CPU from the source tree

The Ozone CPU is now very much out of date and completely
non-functional, with no one actively working on restoring it. It is a
source of confusion for new users who attempt to use it before
realizing its current state. RIP

diffstat:

 src/cpu/checker/cpu_impl.hh| 2 +-
 src/cpu/o3/SConscript  | 8 +-
 src/cpu/ozone/OzoneCPU.py  |   118 -
 src/cpu/ozone/OzoneChecker.py  |38 -
 src/cpu/ozone/SConscript   |57 -
 src/cpu/ozone/SConsopts|35 -
 src/cpu/ozone/SimpleOzoneCPU.py|   111 -
 src/cpu/ozone/back_end.cc  |34 -
 src/cpu/ozone/back_end.hh  |   535 
 src/cpu/ozone/back_end_impl.hh |  1919 
 src/cpu/ozone/base_dyn_inst.cc |35 -
 src/cpu/ozone/bpred_unit.cc|36 -
 src/cpu/ozone/checker_builder.cc   |   100 -
 src/cpu/ozone/cpu.cc   |37 -
 src/cpu/ozone/cpu.hh   |   419 --
 src/cpu/ozone/cpu_builder.cc   |   200 ---
 src/cpu/ozone/cpu_impl.hh  |   886 --
 src/cpu/ozone/dyn_inst.cc  |37 -
 src/cpu/ozone/dyn_inst.hh  |   225 ---
 src/cpu/ozone/dyn_inst_impl.hh |   277 
 src/cpu/ozone/ea_list.cc   |80 -
 src/cpu/ozone/ea_list.hh   |75 -
 src/cpu/ozone/front_end.cc |36 -
 src/cpu/ozone/front_end.hh |   317 -
 src/cpu/ozone/front_end_impl.hh|   995 
 src/cpu/ozone/inorder_back_end.cc  |34 -
 src/cpu/ozone/inorder_back_end.hh  |   382 --
 src/cpu/ozone/inorder_back_end_impl.hh |   527 
 src/cpu/ozone/inst_queue.cc|38 -
 src/cpu/ozone/inst_queue.hh|   508 
 src/cpu/ozone/inst_queue_impl.hh   |  1349 --
 src/cpu/ozone/lsq_unit.cc  |36 -
 src/cpu/ozone/lsq_unit.hh  |   636 --
 src/cpu/ozone/lsq_unit_impl.hh |   844 --
 src/cpu/ozone/lw_back_end.cc   |34 -
 src/cpu/ozone/lw_back_end.hh   |   435 ---
 src/cpu/ozone/lw_back_end_impl.hh  |  1677 ---
 src/cpu/ozone/lw_lsq.cc|36 -
 src/cpu/ozone/lw_lsq.hh|   695 ---
 src/cpu/ozone/lw_lsq_impl.hh   |   965 
 src/cpu/ozone/null_predictor.hh|   105 -
 src/cpu/ozone/ozone_base_dyn_inst.cc   |39 -
 src/cpu/ozone/ozone_impl.hh|74 -
 src/cpu/ozone/rename_table.cc  |36 -
 src/cpu/ozone/rename_table.hh  |56 -
 src/cpu/ozone/rename_table_impl.hh |58 -
 src/cpu/ozone/simple_base_dyn_inst.cc  |39 -
 src/cpu/ozone/simple_cpu_builder.cc|   196 ---
 src/cpu/ozone/simple_impl.hh   |70 -
 src/cpu/ozone/simple_params.hh |   192 ---
 src/cpu/ozone/thread_state.hh  |   152 --
 51 files changed, 4 insertions(+), 15821 deletions(-)

diffs (truncated from 16048 to 300 lines):

diff -r ceb471d74fe9 -r cba563d00376 src/cpu/checker/cpu_impl.hh
--- a/src/cpu/checker/cpu_impl.hh   Thu Oct 09 17:51:57 2014 -0400
+++ b/src/cpu/checker/cpu_impl.hh   Thu Oct 09 17:51:58 2014 -0400
@@ -410,7 +410,7 @@
 if (FullSystem) {
 // @todo: Determine if these should happen only if the
 // instruction hasn't faulted.  In the SimpleCPU case this may
-// not be true, but in the O3 or Ozone case this may be true.
+// not be true, but in the O3 case this may be true.
 Addr oldpc;
 int count = 0;
 do {
diff -r ceb471d74fe9 -r cba563d00376 src/cpu/o3/SConscript
--- a/src/cpu/o3/SConscript Thu Oct 09 17:51:57 2014 -0400
+++ b/src/cpu/o3/SConscript Thu Oct 09 17:51:58 2014 -0400
@@ -32,11 +32,6 @@
 
 Import('*')
 
-if 'O3CPU' in env['CPU_MODELS'] or 'OzoneCPU' in env['CPU_MODELS']:
-DebugFlag('CommitRate')
-DebugFlag('IEW')
-DebugFlag('IQ')
-
 if 'O3CPU' in env['CPU_MODELS']:
 SimObject('FUPool.py')
 SimObject('FuncUnitConfig.py')
@@ -64,6 +59,9 @@
 Source('store_set.cc')
 Source('thread_context.cc')
 
+DebugFlag('CommitRate')
+DebugFlag('IEW')
+DebugFlag('IQ')
 DebugFlag('LSQ')
 DebugFlag('LSQUnit')
 DebugFlag('MemDepUnit')
diff -r ceb471d74fe9 -r cba563d00376 src/cpu/ozone/OzoneCPU.py
--- a/src/cpu/ozone/OzoneCPU.py Thu Oct 09 17:51:57 2014 -0400
+++ /dev/null   Thu Jan 01 00:00:00 1970 +
@@ -1,118 +0,0 @@
-# Copyright (c) 2006-2007 The Regents of The University of Michigan
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided 

[gem5-dev] changeset in gem5: mem: Remove the GHB prefetcher from the sourc...

2014-09-20 Thread Mitch Hayenga via gem5-dev
changeset 452a5f178ec5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=452a5f178ec5
description:
mem: Remove the GHB prefetcher from the source tree

There are two primary issues with this code which make it deserving of
deletion.

1) GHB is a way to structure a prefetcher, not a definitive type of
   prefetcher.
2) This prefetcher isn't even structured like a GHB prefetcher.
   It's basically a worse version of the stride prefetcher.

It primarily serves to confuse new gem5 users and most functionality is
already present in the stride prefetcher.
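As a rough illustration of why the deleted code added little: the core of a stride prefetcher is just a per-PC table that confirms a repeating address delta before predicting the next line. A minimal sketch of that idea (illustrative names, not gem5's StridePrefetcher API):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Minimal stride-detection sketch: remember the last address seen per
// PC, and only predict once the same delta repeats. Returns the
// predicted next address, or 0 when no stable stride exists yet.
class StrideTable
{
  public:
    uint64_t observe(uint64_t pc, uint64_t addr)
    {
        Entry &e = table[pc];
        int64_t delta = static_cast<int64_t>(addr - e.lastAddr);
        bool confirmed = e.valid && delta == e.stride;
        e.stride = delta;    // track the most recent delta
        e.lastAddr = addr;
        e.valid = true;
        return confirmed ? addr + delta : 0;
    }

  private:
    struct Entry
    {
        uint64_t lastAddr = 0;
        int64_t stride = 0;
        bool valid = false;
    };
    std::unordered_map<uint64_t, Entry> table;
};
```
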

diffstat:

 src/mem/cache/prefetch/Prefetcher.py |   5 -
 src/mem/cache/prefetch/SConscript|   1 -
 src/mem/cache/prefetch/ghb.cc|  97 
 src/mem/cache/prefetch/ghb.hh|  77 
 4 files changed, 0 insertions(+), 180 deletions(-)

diffs (208 lines):

diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/Prefetcher.py
--- a/src/mem/cache/prefetch/Prefetcher.py  Sat Sep 20 17:17:43 2014 -0400
+++ b/src/mem/cache/prefetch/Prefetcher.py  Sat Sep 20 17:17:44 2014 -0400
@@ -69,11 +69,6 @@
         "Perform a tagged prefetch for instruction fetches always")
     sys = Param.System(Parent.any, "System this device belongs to")
 
-class GHBPrefetcher(BasePrefetcher):
-    type = 'GHBPrefetcher'
-    cxx_class = 'GHBPrefetcher'
-    cxx_header = "mem/cache/prefetch/ghb.hh"
-
 class StridePrefetcher(BasePrefetcher):
 type = 'StridePrefetcher'
 cxx_class = 'StridePrefetcher'
diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/SConscript
--- a/src/mem/cache/prefetch/SConscript Sat Sep 20 17:17:43 2014 -0400
+++ b/src/mem/cache/prefetch/SConscript Sat Sep 20 17:17:44 2014 -0400
@@ -33,7 +33,6 @@
 SimObject('Prefetcher.py')
 
 Source('base.cc')
-Source('ghb.cc')
 Source('stride.cc')
 Source('tagged.cc')
 
diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/ghb.cc
--- a/src/mem/cache/prefetch/ghb.cc Sat Sep 20 17:17:43 2014 -0400
+++ /dev/null   Thu Jan 01 00:00:00 1970 +
@@ -1,97 +0,0 @@
-/*
- * Copyright (c) 2012-2013 ARM Limited
- * All rights reserved
- *
- * The license below extends only to copyright in the software and shall
- * not be construed as granting a license to any other intellectual
- * property including but not limited to intellectual property relating
- * to a hardware implementation of the functionality of the software
- * licensed hereunder.  You may use the software subject to the license
- * terms below provided that you ensure that this notice is replicated
- * unmodified and in its entirety in all distributions of the software,
- * modified or unmodified, in source code or in binary form.
- *
- * Copyright (c) 2005 The Regents of The University of Michigan
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are
- * met: redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer;
- * redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in the
- * documentation and/or other materials provided with the distribution;
- * neither the name of the copyright holders nor the names of its
- * contributors may be used to endorse or promote products derived from
- * this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- * Authors: Ron Dreslinski
- *  Steve Reinhardt
- */
-
-/**
- * @file
- * GHB Prefetcher implementation.
- */
-
-#include "base/trace.hh"
-#include "debug/HWPrefetch.hh"
-#include "mem/cache/prefetch/ghb.hh"
-
-void
-GHBPrefetcher::calculatePrefetch(PacketPtr pkt, std::list<Addr> &addresses,
-                                 std::list<Cycles> &delays)
-{
-    Addr blk_addr = pkt->getAddr() & ~(Addr)(blkSize-1);
-    bool is_secure = pkt->isSecure();
-    int master_id = useMasterId ? pkt->req->masterId() : 0;
-    assert(master_id < Max_Masters);
-
-    bool same_sec_state = true;
-    // Avoid activating 

[gem5-dev] changeset in gem5: cpu: Add ExecFlags debug flag

2014-09-20 Thread Mitch Hayenga via gem5-dev
changeset b31580e27d1f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b31580e27d1f
description:
cpu: Add ExecFlags debug flag

Adds a debug flag to print out the flags an instruction is tagged with.

diffstat:

 src/cpu/SConscript  |  3 ++-
 src/cpu/exetrace.cc |  6 ++
 2 files changed, 8 insertions(+), 1 deletions(-)

diffs (36 lines):

diff -r 452a5f178ec5 -r b31580e27d1f src/cpu/SConscript
--- a/src/cpu/SConscriptSat Sep 20 17:17:44 2014 -0400
+++ b/src/cpu/SConscriptSat Sep 20 17:17:45 2014 -0400
@@ -96,6 +96,7 @@
 DebugFlag('ExecUser', 'Filter: Trace user mode instructions')
 DebugFlag('ExecKernel', 'Filter: Trace kernel mode instructions')
 DebugFlag('ExecAsid', 'Format: Include ASID in trace')
+DebugFlag('ExecFlags', 'Format: Include instruction flags in trace')
 DebugFlag('Fetch')
 DebugFlag('IntrControl')
 DebugFlag('O3PipeView')
@@ -106,7 +107,7 @@
 'ExecFaulting', 'ExecFetchSeq', 'ExecOpClass', 'ExecRegDelta',
 'ExecResult', 'ExecSpeculative', 'ExecSymbol', 'ExecThread',
 'ExecTicks', 'ExecMicro', 'ExecMacro', 'ExecUser', 'ExecKernel',
-'ExecAsid' ])
+'ExecAsid', 'ExecFlags' ])
 CompoundFlag('Exec', [ 'ExecEnable', 'ExecTicks', 'ExecOpClass', 'ExecThread',
 'ExecEffAddr', 'ExecResult', 'ExecSymbol', 'ExecMicro', 'ExecFaulting',
 'ExecUser', 'ExecKernel' ])
diff -r 452a5f178ec5 -r b31580e27d1f src/cpu/exetrace.cc
--- a/src/cpu/exetrace.cc   Sat Sep 20 17:17:44 2014 -0400
+++ b/src/cpu/exetrace.cc   Sat Sep 20 17:17:45 2014 -0400
@@ -131,6 +131,12 @@
 
     if (Debug::ExecCPSeq && cp_seq_valid)
         outs << "CPSeq=" << dec << cp_seq;
+
+    if (Debug::ExecFlags) {
+        outs << "flags=(";
+        inst->printFlags(outs, "|");
+        outs << ")";
+    }
 }
 
 //


[gem5-dev] changeset in gem5: cpu: Remove unused deallocateContext calls

2014-09-20 Thread Mitch Hayenga via gem5-dev
changeset a59c189de383 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=a59c189de383
description:
cpu: Remove unused deallocateContext calls

The call paths for de-scheduling a thread are halt() and suspend(), from
the thread context. There is no call to deallocateContext() in general,
though some CPUs chose to define it. This patch removes the function
from BaseCPU and the cores which do not require it.

diffstat:

 src/cpu/base.hh |   3 ---
 src/cpu/inorder/inorder_dyn_inst.cc |   6 --
 src/cpu/inorder/inorder_dyn_inst.hh |   7 ---
 src/cpu/inorder/thread_context.hh   |   3 ---
 src/cpu/o3/cpu.cc   |  16 +---
 src/cpu/o3/cpu.hh   |   5 -
 src/cpu/simple/base.cc  |   8 
 src/cpu/simple/base.hh  |   1 -
 8 files changed, 5 insertions(+), 44 deletions(-)

diffs (140 lines):

diff -r a9023811bf9e -r a59c189de383 src/cpu/base.hh
--- a/src/cpu/base.hh   Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/base.hh   Sat Sep 20 17:18:36 2014 -0400
@@ -257,9 +257,6 @@
 /// Notify the CPU that the indicated context is now suspended.
 virtual void suspendContext(ThreadID thread_num) {}
 
-/// Notify the CPU that the indicated context is now deallocated.
-virtual void deallocateContext(ThreadID thread_num) {}
-
 /// Notify the CPU that the indicated context is now halted.
 virtual void haltContext(ThreadID thread_num) {}
 
diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/inorder_dyn_inst.cc
--- a/src/cpu/inorder/inorder_dyn_inst.cc   Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/inorder/inorder_dyn_inst.cc   Sat Sep 20 17:18:36 2014 -0400
@@ -571,12 +571,6 @@
 }
 }
 
-void
-InOrderDynInst::deallocateContext(int thread_num)
-{
-    this->cpu->deallocateContext(thread_num);
-}
-
 Fault
 InOrderDynInst::readMem(Addr addr, uint8_t *data,
 unsigned size, unsigned flags)
diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/inorder_dyn_inst.hh
--- a/src/cpu/inorder/inorder_dyn_inst.hh   Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/inorder/inorder_dyn_inst.hh   Sat Sep 20 17:18:36 2014 -0400
@@ -533,13 +533,6 @@
 
 
 //
-// MULTITHREADING INTERFACE TO CPU MODELS
-//
-
-virtual void deallocateContext(int thread_num);
-
-
-//
 //  PROGRAM COUNTERS - PC/NPC/NPC
 //
 
diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/thread_context.hh
--- a/src/cpu/inorder/thread_context.hh Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/inorder/thread_context.hh Sat Sep 20 17:18:36 2014 -0400
@@ -281,9 +281,6 @@
     void activateContext()
     { cpu->activateContext(thread->threadId()); }
 
-    void deallocateContext()
-    { cpu->deallocateContext(thread->threadId()); }
-
 /** Returns the number of consecutive store conditional failures. */
 // @todo: Figure out where these store cond failures should go.
 unsigned readStCondFailures()
diff -r a9023811bf9e -r a59c189de383 src/cpu/o3/cpu.cc
--- a/src/cpu/o3/cpu.cc Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/o3/cpu.cc Sat Sep 20 17:18:36 2014 -0400
@@ -730,20 +730,12 @@
 
 template <class Impl>
 void
-FullO3CPU<Impl>::deallocateContext(ThreadID tid, bool remove)
-{
-    deactivateThread(tid);
-    if (remove)
-        removeThread(tid);
-}
-
-template <class Impl>
-void
 FullO3CPU<Impl>::suspendContext(ThreadID tid)
 {
     DPRINTF(O3CPU, "[tid: %i]: Suspending Thread Context.\n", tid);
     assert(!switchedOut());
-    deallocateContext(tid, false);
+
+    deactivateThread(tid);
 
 // If this was the last thread then unschedule the tick event.
 if (activeThreads.size() == 0)
@@ -761,7 +753,9 @@
     //For now, this is the same as deallocate
     DPRINTF(O3CPU, "[tid:%i]: Halt Context called. Deallocating", tid);
     assert(!switchedOut());
-    deallocateContext(tid, true);
+
+    deactivateThread(tid);
+    removeThread(tid);
 }
 
 template <class Impl>
diff -r a9023811bf9e -r a59c189de383 src/cpu/o3/cpu.hh
--- a/src/cpu/o3/cpu.hh Sat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/o3/cpu.hh Sat Sep 20 17:18:36 2014 -0400
@@ -326,11 +326,6 @@
 void suspendContext(ThreadID tid);
 
 /** Remove Thread from Active Threads List 
- *  Possibly Remove Thread Context from CPU.
- */
-void deallocateContext(ThreadID tid, bool remove);
-
-/** Remove Thread from Active Threads List 
  *  Remove Thread Context from CPU.
  */
 void haltContext(ThreadID tid);
diff -r a9023811bf9e -r a59c189de383 src/cpu/simple/base.cc
--- a/src/cpu/simple/base.ccSat Sep 20 17:18:35 2014 -0400
+++ b/src/cpu/simple/base.ccSat Sep 20 17:18:36 2014 -0400
@@ -133,14 +133,6 @@
 }
 
 

[gem5-dev] changeset in gem5: alpha, arm, mips, power, x86, cpu, sim: Cleanup act...

2014-09-20 Thread Mitch Hayenga via gem5-dev
changeset a9023811bf9e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=a9023811bf9e
description:
alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate

activate(), suspend(), and halt() used on thread contexts had an 
optional
delay parameter. However this parameter was often ignored. Also, when 
used,
the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays 
were
ever specified). This patch removes the delay parameter and 'Events'
associated with them across all ISAs and cores. Unused activate logic
is also removed.

diffstat:

 src/arch/alpha/utility.hh |2 +-
 src/arch/arm/utility.hh   |2 +-
 src/arch/mips/mt.hh   |2 +-
 src/arch/mips/utility.cc  |2 +-
 src/arch/power/utility.hh |2 +-
 src/arch/sparc/utility.hh |2 +-
 src/arch/x86/utility.cc   |4 +-
 src/cpu/base.hh   |8 +-
 src/cpu/checker/thread_context.hh |   10 +-
 src/cpu/inorder/cpu.cc|   12 +-
 src/cpu/inorder/cpu.hh|6 +-
 src/cpu/inorder/thread_context.cc |8 +-
 src/cpu/inorder/thread_context.hh |   13 +-
 src/cpu/kvm/base.cc   |6 +-
 src/cpu/kvm/base.hh   |2 +-
 src/cpu/minor/cpu.cc  |   30 +---
 src/cpu/minor/cpu.hh  |   19 +---
 src/cpu/o3/cpu.cc |  212 ++---
 src/cpu/o3/cpu.hh |  124 +-
 src/cpu/o3/fetch_impl.hh  |8 +-
 src/cpu/o3/thread_context.hh  |9 +-
 src/cpu/o3/thread_context_impl.hh |   11 +-
 src/cpu/simple/atomic.cc  |6 +-
 src/cpu/simple/atomic.hh  |2 +-
 src/cpu/simple/timing.cc  |6 +-
 src/cpu/simple/timing.hh  |2 +-
 src/cpu/simple_thread.cc  |   12 +-
 src/cpu/simple_thread.hh  |5 +-
 src/cpu/thread_context.hh |   19 +-
 src/sim/process.cc|2 +-
 30 files changed, 95 insertions(+), 453 deletions(-)

diffs (truncated from 1157 to 300 lines):

diff -r 3819b85ff21a -r a9023811bf9e src/arch/alpha/utility.hh
--- a/src/arch/alpha/utility.hh Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/alpha/utility.hh Sat Sep 20 17:18:35 2014 -0400
@@ -68,7 +68,7 @@
 // Alpha IPR register accessors
 inline bool PcPAL(Addr addr) { return addr & 0x3; }
 inline void startupCPU(ThreadContext *tc, int cpuId)
-{ tc->activate(Cycles(0)); }
+{ tc->activate(); }
 
 
 //
diff -r 3819b85ff21a -r a9023811bf9e src/arch/arm/utility.hh
--- a/src/arch/arm/utility.hh   Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/arm/utility.hh   Sat Sep 20 17:18:35 2014 -0400
@@ -104,7 +104,7 @@
 
 inline void startupCPU(ThreadContext *tc, int cpuId)
 {
-    tc->activate(Cycles(0));
+    tc->activate();
 }
 
 void copyRegs(ThreadContext *src, ThreadContext *dest);
diff -r 3819b85ff21a -r a9023811bf9e src/arch/mips/mt.hh
--- a/src/arch/mips/mt.hh   Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/mips/mt.hh   Sat Sep 20 17:18:35 2014 -0400
@@ -96,7 +96,7 @@
 
 // TODO: SET PC WITH AN EVENT INSTEAD OF INSTANTANEOUSLY
     tc->pcState(restartPC);
-    tc->activate(Cycles(0));
+    tc->activate();
 
     warn("%i: Restoring thread %i in %s @ PC %x",
          curTick(), tc->threadId(), tc->getCpuPtr()->name(), restartPC);
diff -r 3819b85ff21a -r a9023811bf9e src/arch/mips/utility.cc
--- a/src/arch/mips/utility.cc  Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/mips/utility.cc  Sat Sep 20 17:18:35 2014 -0400
@@ -231,7 +231,7 @@
 void
 startupCPU(ThreadContext *tc, int cpuId)
 {
-    tc->activate(Cycles(0));
+    tc->activate();
 }
 
 void
diff -r 3819b85ff21a -r a9023811bf9e src/arch/power/utility.hh
--- a/src/arch/power/utility.hh Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/power/utility.hh Sat Sep 20 17:18:35 2014 -0400
@@ -59,7 +59,7 @@
 inline void
 startupCPU(ThreadContext *tc, int cpuId)
 {
-    tc->activate(Cycles(0));
+    tc->activate();
 }
 
 void
diff -r 3819b85ff21a -r a9023811bf9e src/arch/sparc/utility.hh
--- a/src/arch/sparc/utility.hh Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/sparc/utility.hh Sat Sep 20 17:18:35 2014 -0400
@@ -77,7 +77,7 @@
 {
 // Other CPUs will get activated by IPIs
     if (cpuId == 0 || !FullSystem)
-        tc->activate(Cycles(0));
+        tc->activate();
 }
 
 void copyRegs(ThreadContext *src, ThreadContext *dest);
diff -r 3819b85ff21a -r a9023811bf9e src/arch/x86/utility.cc
--- a/src/arch/x86/utility.cc   Sat Sep 20 17:18:33 2014 -0400
+++ b/src/arch/x86/utility.cc   Sat Sep 20 17:18:35 2014 -0400
@@ -203,12 +203,12 @@
 void startupCPU(ThreadContext *tc, int cpuId)
 {
 if (cpuId == 0 || !FullSystem) {
-        tc->activate(Cycles(0));
+        tc->activate();
 } else {
 // This is an application processor (AP). It should be initialized to
 // 

[gem5-dev] changeset in gem5: cpu: Only iterate over possible threads on th...

2014-09-09 Thread Mitch Hayenga via gem5-dev
changeset c870b43d2ba6 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=c870b43d2ba6
description:
cpu: Only iterate over possible threads on the o3 cpu

Some places in O3 always iterated over Impl::MaxThreads even if a CPU 
had
fewer threads.  This removes a few of those instances.

diffstat:

 src/cpu/o3/fetch_impl.hh |  6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diffs (30 lines):

diff -r 535e088955ca -r c870b43d2ba6 src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Tue Sep 09 04:36:33 2014 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Tue Sep 09 04:36:34 2014 -0400
@@ -419,7 +419,7 @@
 void
 DefaultFetchImpl::drainResume()
 {
-    for (ThreadID i = 0; i < Impl::MaxThreads; ++i)
+    for (ThreadID i = 0; i < numThreads; ++i)
 stalls[i].drain = false;
 }
 
@@ -887,7 +887,7 @@
 
 wroteToTimeBuffer = false;
 
-    for (ThreadID i = 0; i < Impl::MaxThreads; ++i) {
+    for (ThreadID i = 0; i < numThreads; ++i) {
 issuePipelinedIfetch[i] = false;
 }
 
@@ -927,7 +927,7 @@
 }
 
 // Issue the next I-cache request if possible.
-    for (ThreadID i = 0; i < Impl::MaxThreads; ++i) {
+    for (ThreadID i = 0; i < numThreads; ++i) {
 if (issuePipelinedIfetch[i]) {
 pipelineIcacheAccesses(i);
 }


[gem5-dev] changeset in gem5: cpu: Change writeback modeling for outstandin...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 5b6279635c49 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5b6279635c49
description:
cpu: Change writeback modeling for outstanding instructions

As highlighted on the mailing list, gem5's writeback modeling can impact
performance.  This patch removes the limitation on maximum outstanding issued
instructions; however, the number that can writeback in a single cycle is
still respected in instToCommit().
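The surviving per-cycle limit works roughly like the following sketch of the slot arithmetic: each completing instruction takes the next free writeback port, spilling into later cycles once wbWidth ports are taken in a cycle. (A simplified model under assumed names, not the actual iew_impl.hh code.)

```cpp
#include <cassert>

// Simplified model of writeback port allocation: at most wbWidth
// instructions write back in any one relative cycle; extra completions
// spill into subsequent cycles instead of being refused at issue time.
struct WbScheduler
{
    explicit WbScheduler(unsigned width) : wbWidth(width) {}

    // Returns the relative cycle in which the next completing
    // instruction is granted a writeback port.
    unsigned allocSlot()
    {
        unsigned cycle = wbCycle;
        if (++wbNumInst == wbWidth) {
            ++wbCycle;      // this cycle's ports are now full
            wbNumInst = 0;  // start filling the next cycle
        }
        return cycle;
    }

    unsigned wbWidth;        // writeback ports per cycle
    unsigned wbCycle = 0;    // relative cycle currently being filled
    unsigned wbNumInst = 0;  // ports already taken in that cycle
};
```

With the old wbDepth cap removed, the cycle offset can grow without bound; only the per-cycle width still constrains writeback.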

diffstat:

 configs/common/O3_ARM_v7a.py  |   1 -
 src/cpu/o3/O3CPU.py   |   1 -
 src/cpu/o3/iew.hh |  53 ---
 src/cpu/o3/iew_impl.hh|  10 
 src/cpu/o3/inst_queue_impl.hh |   2 -
 src/cpu/o3/lsq_unit.hh|   7 -
 src/cpu/o3/lsq_unit_impl.hh   |   5 +---
 7 files changed, 1 insertions(+), 78 deletions(-)

diffs (210 lines):

diff -r 43516d8eabe9 -r 5b6279635c49 configs/common/O3_ARM_v7a.py
--- a/configs/common/O3_ARM_v7a.py  Wed Sep 03 07:42:32 2014 -0400
+++ b/configs/common/O3_ARM_v7a.py  Wed Sep 03 07:42:33 2014 -0400
@@ -126,7 +126,6 @@
 dispatchWidth = 6
 issueWidth = 8
 wbWidth = 8
-wbDepth = 1
 fuPool = O3_ARM_v7a_FUP()
 iewToCommitDelay = 1
 renameToROBDelay = 1
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:33 2014 -0400
@@ -84,7 +84,6 @@
     dispatchWidth = Param.Unsigned(8, "Dispatch width")
     issueWidth = Param.Unsigned(8, "Issue width")
     wbWidth = Param.Unsigned(8, "Writeback width")
-    wbDepth = Param.Unsigned(1, "Writeback depth")
     fuPool = Param.FUPool(DefaultFUPool(), "Functional Unit pool")
 
     iewToCommitDelay = Param.Cycles(1, "Issue/Execute/Writeback to commit 
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:33 2014 -0400
@@ -219,49 +219,6 @@
 /** Returns if the LSQ has any stores to writeback. */
 bool hasStoresToWB(ThreadID tid) { return ldstQueue.hasStoresToWB(tid); }
 
-    void incrWb(InstSeqNum sn)
-    {
-        ++wbOutstanding;
-        if (wbOutstanding == wbMax)
-            ableToIssue = false;
-        DPRINTF(IEW, "wbOutstanding: %i [sn:%lli]\n", wbOutstanding, sn);
-        assert(wbOutstanding <= wbMax);
-#ifdef DEBUG
-        wbList.insert(sn);
-#endif
-    }
-
-    void decrWb(InstSeqNum sn)
-    {
-        if (wbOutstanding == wbMax)
-            ableToIssue = true;
-        wbOutstanding--;
-        DPRINTF(IEW, "wbOutstanding: %i [sn:%lli]\n", wbOutstanding, sn);
-        assert(wbOutstanding >= 0);
-#ifdef DEBUG
-        assert(wbList.find(sn) != wbList.end());
-        wbList.erase(sn);
-#endif
-    }
-
-#ifdef DEBUG
-    std::set<InstSeqNum> wbList;
-
-    void dumpWb()
-    {
-        std::set<InstSeqNum>::iterator wb_it = wbList.begin();
-        while (wb_it != wbList.end()) {
-            cprintf("[sn:%lli]\n", (*wb_it));
-            wb_it++;
-        }
-    }
-#endif
-
-    bool canIssue() { return ableToIssue; }
-
-    bool ableToIssue;
-
-
 /** Check misprediction  */
 void checkMisprediction(DynInstPtr inst);
 
@@ -452,19 +409,9 @@
  */
 unsigned wbCycle;
 
-/** Number of instructions in flight that will writeback. */
-
-/** Number of instructions in flight that will writeback. */
-int wbOutstanding;
-
 /** Writeback width. */
 unsigned wbWidth;
 
-/** Writeback width * writeback depth, where writeback depth is
- * the number of cycles of writing back instructions that can be
- * buffered. */
-unsigned wbMax;
-
 /** Number of active threads. */
 ThreadID numThreads;
 
diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:32 2014 -0400
+++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:33 2014 -0400
@@ -76,7 +76,6 @@
       issueToExecuteDelay(params->issueToExecuteDelay),
       dispatchWidth(params->dispatchWidth),
       issueWidth(params->issueWidth),
-      wbOutstanding(0),
       wbWidth(params->wbWidth),
       numThreads(params->numThreads)
 {
@@ -109,12 +108,8 @@
 fetchRedirect[tid] = false;
 }
 
-    wbMax = wbWidth * params->wbDepth;
-
 updateLSQNextCycle = false;
 
-ableToIssue = true;
-
     skidBufferMax = (3 * (renameToIEWDelay * params->renameWidth)) +
         issueWidth;
 }
 
@@ -635,8 +630,6 @@
 ++wbCycle;
 wbNumInst = 0;
 }
-
-    assert((wbCycle * wbWidth + wbNumInst) <= wbMax);
 }
 
     DPRINTF(IEW, "Current wb cycle: %i, width: %i, numInst: %i\nwbActual:%i\n",
@@ -1263,7 +1256,6 @@
 
 ++iewExecSquashedInsts;
 
-    decrWb(inst->seqNum);
 continue;
 }
 
@@ -1502,8 +1494,6 @@
 }
 writebackCount[tid]++;
 }
-
-    decrWb(inst->seqNum);
 }
 }
 
diff -r 43516d8eabe9 -r 

[gem5-dev] changeset in gem5: config: Change parsing of Addr so hex values ...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 19f5df7ac6a1 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=19f5df7ac6a1
description:
config: Change parsing of Addr so hex values work from scripts

When passed from a configuration script with a hexadecimal value (like
0x8000), gem5 would error out. This is because it would call
toMemorySize which requires the argument to end with a size specifier 
(like
1MB, etc).

This modification makes it so raw hex values can be passed through Addr
parameters from the configuration scripts.

diffstat:

 src/arch/arm/ArmSystem.py |   2 +-
 src/python/m5/params.py   |  12 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diffs (35 lines):

diff -r d2850235e31c -r 19f5df7ac6a1 src/arch/arm/ArmSystem.py
--- a/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:19 2014 -0400
+++ b/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:20 2014 -0400
@@ -65,7 +65,7 @@
     highest_el_is_64 = Param.Bool(False,
         "True if the register width of the highest implemented exception level "
         "is 64 bits (ARMv8)")
-    reset_addr_64 = Param.UInt64(0x0,
+    reset_addr_64 = Param.Addr(0x0,
         "Reset address if the highest implemented exception level is 64 bits "
         "(ARMv8)")
     phys_addr_range_64 = Param.UInt8(40,
diff -r d2850235e31c -r 19f5df7ac6a1 src/python/m5/params.py
--- a/src/python/m5/params.py   Wed Sep 03 07:42:19 2014 -0400
+++ b/src/python/m5/params.py   Wed Sep 03 07:42:20 2014 -0400
@@ -626,9 +626,17 @@
 self.value = value.value
 else:
 try:
+# Often addresses are referred to with sizes. Ex: A device
+# base address is at 512MB.  Use toMemorySize() to convert
+# these into addresses. If the address is not specified with a
+# size, an exception will occur and numeric translation will
+# proceed below.
 self.value = convert.toMemorySize(value)
-except TypeError:
-self.value = long(value)
+except (TypeError, ValueError):
+# Convert number to string and use long() to do automatic
+# base conversion (requires base=0 for auto-conversion)
+self.value = long(str(value), base=0)
+
 self._check()
 def __add__(self, other):
 if isinstance(other, Addr):


[gem5-dev] changeset in gem5: dev: Avoid invalid sized reads in PL390 with ...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 72890a571a7b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=72890a571a7b
description:
dev: Avoid invalid sized reads in PL390 with DPRINTF enabled

The first DPRINTF() in PL390::writeDistributor always read a uint32_t, 
though a
packet may have only been 1 or 2 bytes.  This caused an assertion in
packet->get().

diffstat:

 src/dev/arm/gic_pl390.cc |  19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 82a4fa2d19a0 -r 72890a571a7b src/dev/arm/gic_pl390.cc
--- a/src/dev/arm/gic_pl390.cc  Wed Sep 03 07:42:25 2014 -0400
+++ b/src/dev/arm/gic_pl390.cc  Wed Sep 03 07:42:27 2014 -0400
@@ -395,8 +395,25 @@
     assert(pkt->req->hasContextId());
     int ctx_id = pkt->req->contextId();
 
+    uint32_t pkt_data M5_VAR_USED;
+    switch (pkt->getSize())
+    {
+      case 1:
+        pkt_data = pkt->get<uint8_t>();
+        break;
+      case 2:
+        pkt_data = pkt->get<uint16_t>();
+        break;
+      case 4:
+        pkt_data = pkt->get<uint32_t>();
+        break;
+      default:
+        panic("Invalid size when writing to priority regs in Gic: %d\n",
+              pkt->getSize());
+    }
+
     DPRINTF(GIC, "gic distributor write register %#x size %#x value %#x \n",
-            daddr, pkt->getSize(), pkt->get<uint32_t>());
+            daddr, pkt->getSize(), pkt_data);
 
     if (daddr >= ICDISER_ST && daddr < ICDISER_ED + 4) {
         assert((daddr-ICDISER_ST) >> 2 < 32);


[gem5-dev] changeset in gem5: arch: Properly guess OpClass from optional St...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 43516d8eabe9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=43516d8eabe9
description:
arch: Properly guess OpClass from optional StaticInst flags

isa_parser.py guesses the OpClass if none were given based upon the 
StaticInst
flags.  The existing code does not take into account optionally set 
flags.
This code hoists the setting of optional flags so OpClass is properly 
assigned.

diffstat:

 src/arch/isa_parser.py |  36 +---
 1 files changed, 25 insertions(+), 11 deletions(-)

diffs (57 lines):

diff -r 7aacec2a247d -r 43516d8eabe9 src/arch/isa_parser.py
--- a/src/arch/isa_parser.pyWed Sep 03 07:42:31 2014 -0400
+++ b/src/arch/isa_parser.pyWed Sep 03 07:42:32 2014 -0400
@@ -1,3 +1,15 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder.  You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
 # Copyright (c) 2003-2005 The Regents of The University of Michigan
 # Copyright (c) 2013 Advanced Micro Devices, Inc.
 # All rights reserved.
@@ -1119,17 +1131,7 @@
 
 self.flags = self.operands.concatAttrLists('flags')
 
-# Make a basic guess on the operand class (function unit type).
-# These are good enough for most cases, and can be overridden
-# later otherwise.
-if 'IsStore' in self.flags:
-self.op_class = 'MemWriteOp'
-elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags:
-self.op_class = 'MemReadOp'
-elif 'IsFloating' in self.flags:
-self.op_class = 'FloatAddOp'
-else:
-self.op_class = 'IntAluOp'
+self.op_class = None
 
 # Optional arguments are assumed to be either StaticInst flags
 # or an OpClass value.  To avoid having to import a complete
@@ -1144,6 +1146,18 @@
 error('InstObjParams: optional arg %s not recognized '
   'as StaticInst::Flag or OpClass.' % oa)
 
+# Make a basic guess on the operand class if not set.
+# These are good enough for most cases.
+if not self.op_class:
+if 'IsStore' in self.flags:
+self.op_class = 'MemWriteOp'
+elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags:
+self.op_class = 'MemReadOp'
+elif 'IsFloating' in self.flags:
+self.op_class = 'FloatAddOp'
+else:
+self.op_class = 'IntAluOp'
+
 # add flag initialization to contructor here to include
 # any flags added via opt_args
 self.constructor += makeFlagConstructor(self.flags)


[gem5-dev] changeset in gem5: cpu: Fix SMT scheduling issue with the O3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset ed05298e8566 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ed05298e8566
description:
cpu: Fix SMT scheduling issue with the O3 cpu

The o3 cpu could attempt to schedule inactive threads under round-robin
SMT mode.

This is because it maintained a priority list of threads independent of
the active thread list.  This priority list could become stale once
threads were inactive, leading to the cpu trying to fetch/commit from
inactive threads.


Additionally, the fetch queue is now forcibly flushed of instructions
from the de-scheduled thread.
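A minimal Python model of the fix (class and method names are illustrative, not gem5's): the round-robin priority list must be kept consistent with the active thread list, so deactivating a thread removes it from the rotation and the picker can never return a stale thread id.

```python
class RoundRobinPicker:
    """Toy round-robin thread picker with deactivation support."""

    def __init__(self, threads):
        self.priority_list = list(threads)

    def deactivate_thread(self, tid):
        # The fix: drop the thread from the rotation, not just the
        # active list, so it can never be picked again.
        if tid in self.priority_list:
            self.priority_list.remove(tid)

    def next_thread(self):
        tid = self.priority_list.pop(0)   # highest priority first
        self.priority_list.append(tid)    # rotate it to the back
        return tid
```

Without `deactivate_thread`, the rotation would keep handing out the de-scheduled thread id, which is the stale-fetch behavior shown in the trace above.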

Relevant output:

24557000: system.cpu: [tid:1]: Calling deactivate thread.
24557000: system.cpu: [tid:1]: Removing from active threads list

24557500: system.cpu:
FullO3CPU: Ticking main, FullO3CPU.
24557500: system.cpu.fetch: Running stage.
24557500: system.cpu.fetch: Attempting to fetch from [tid:1]

diffstat:

 src/cpu/o3/O3CPU.py   |3 +-
 src/cpu/o3/commit.hh  |5 +-
 src/cpu/o3/commit_impl.hh |   15 +-
 src/cpu/o3/cpu.cc |5 +-
 src/cpu/o3/fetch.hh   |6 +-
 src/cpu/o3/fetch_impl.hh  |  109 +
 6 files changed, 99 insertions(+), 44 deletions(-)

diffs (truncated from 306 to 300 lines):

diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:37 2014 -0400
@@ -61,7 +61,8 @@
 commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay")
 fetchWidth = Param.Unsigned(8, "Fetch width")
 fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes")
-fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops")
+fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops "
+                                    "per-thread")
 
 renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay")
 iewToDecodeDelay = Param.Cycles(1, Issue/Execute/Writeback to decode 
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved.
  *
  * The license below extends only to copyright in the software and shall
@@ -218,6 +218,9 @@
 /** Takes over from another CPU's thread. */
 void takeOverFrom();
 
+/** Deschedules a thread from scheduling */
+void deactivateThread(ThreadID tid);
+
 /** Ticks the commit stage, which tries to commit instructions. */
 void tick();
 
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -463,6 +463,19 @@
 
 template <class Impl>
 void
+DefaultCommit<Impl>::deactivateThread(ThreadID tid)
+{
+    list<ThreadID>::iterator thread_it = std::find(priority_list.begin(),
+            priority_list.end(), tid);
+
+    if (thread_it != priority_list.end()) {
+        priority_list.erase(thread_it);
+    }
+}
+
+
+template <class Impl>
+void
 DefaultCommit<Impl>::updateStatus()
 {
 // reset ROB changed variable
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/cpu.cc
--- a/src/cpu/o3/cpu.cc Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/cpu.cc Wed Sep 03 07:42:37 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2011-2012 ARM Limited
+ * Copyright (c) 2011-2012, 2014 ARM Limited
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved
  *
@@ -728,6 +728,9 @@
 tid);
 activeThreads.erase(thread_it);
 }
+
+fetch.deactivateThread(tid);
+commit.deactivateThread(tid);
 }
 
 template <class Impl>
diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:36 2014 -0400
+++ b/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:37 2014 -0400
@@ -255,6 +255,8 @@
 /** Tells fetch to wake up from a quiesce instruction. */
 void wakeFromQuiesce();
 
+/** For priority-based fetch policies, we need to keep the priorityList up to date */
+void deactivateThread(ThreadID tid);
   private:
 /** Reset this pipeline stage */
 void resetStage();
@@ -484,8 +486,8 @@
 /** The size of the fetch queue in micro-ops */
 unsigned fetchQueueSize;
 
-/** Queue of fetched instructions */
-std::deque<DynInstPtr> fetchQueue;
+/** Queue of fetched instructions. Per-thread to prevent HoL blocking. */
+std::deque<DynInstPtr> fetchQueue[Impl::MaxThreads];
 
 /** Whether or not the fetch buffer data 

[gem5-dev] changeset in gem5: cpu: Add a fetch queue to the o3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 12e3be8203a5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=12e3be8203a5
description:
cpu: Add a fetch queue to the o3 cpu

This patch adds a fetch queue that sits between fetch and decode in the
o3 cpu.  This effectively decouples fetch from decode stalls, allowing
fetch to be more aggressive and run further ahead in the instruction
stream.
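The decoupling idea can be modeled as a bounded queue (an illustrative sketch, not the gem5 implementation): fetch keeps filling the queue until it is full, so a decode stall no longer stalls fetch immediately.

```python
from collections import deque

class FetchQueue:
    """Toy bounded fetch queue decoupling fetch from decode."""

    def __init__(self, size):
        self.size = size
        self.q = deque()

    def can_fetch(self):
        # Fetch keeps running ahead until the queue is full.
        return len(self.q) < self.size

    def push(self, inst):
        assert self.can_fetch()
        self.q.append(inst)

    def pop_for_decode(self, width):
        # Decode drains up to its width per cycle.
        return [self.q.popleft() for _ in range(min(width, len(self.q)))]
```

If decode stalls for a cycle, fetch simply keeps pushing until `can_fetch()` goes false, instead of stalling in lockstep.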

diffstat:

 src/cpu/o3/O3CPU.py  |   1 +
 src/cpu/o3/fetch.hh  |  14 +++---
 src/cpu/o3/fetch_impl.hh |  61 ++-
 3 files changed, 55 insertions(+), 21 deletions(-)

diffs (201 lines):

diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/O3CPU.py
--- a/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/O3CPU.py   Wed Sep 03 07:42:35 2014 -0400
@@ -61,6 +61,7 @@
 commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay")
 fetchWidth = Param.Unsigned(8, "Fetch width")
 fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes")
+fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops")
 
 renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay")
 iewToDecodeDelay = Param.Cycles(1, Issue/Execute/Writeback to decode 
diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/fetch.hh   Wed Sep 03 07:42:35 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -401,9 +401,6 @@
 /** Wire to get commit's information from backwards time buffer. */
 typename TimeBuffer<TimeStruct>::wire fromCommit;
 
-/** Internal fetch instruction queue. */
-TimeBuffer<FetchStruct> *fetchQueue;
-
 //Might be annoying how this name is different than the queue.
 /** Wire used to write any information heading to decode. */
 typename TimeBuffer<FetchStruct>::wire toDecode;
@@ -455,6 +452,9 @@
 /** The width of fetch in instructions. */
 unsigned fetchWidth;
 
+/** The width of decode in instructions. */
+unsigned decodeWidth;
+
 /** Is the cache blocked?  If so no threads can access it. */
 bool cacheBlocked;
 
@@ -481,6 +481,12 @@
 /** The PC of the first instruction loaded into the fetch buffer. */
 Addr fetchBufferPC[Impl::MaxThreads];
 
+/** The size of the fetch queue in micro-ops */
+unsigned fetchQueueSize;
+
+/** Queue of fetched instructions */
+std::deque<DynInstPtr> fetchQueue;
+
 /** Whether or not the fetch buffer data is valid. */
 bool fetchBufferValid[Impl::MaxThreads];
 
diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:34 2014 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:35 2014 -0400
@@ -82,11 +82,13 @@
   iewToFetchDelay(params->iewToFetchDelay),
   commitToFetchDelay(params->commitToFetchDelay),
   fetchWidth(params->fetchWidth),
+  decodeWidth(params->decodeWidth),
   retryPkt(NULL),
   retryTid(InvalidThreadID),
   cacheBlkSize(cpu->cacheLineSize()),
   fetchBufferSize(params->fetchBufferSize),
   fetchBufferMask(fetchBufferSize - 1),
+  fetchQueueSize(params->fetchQueueSize),
   numThreads(params->numThreads),
   numFetchingThreads(params->smtNumFetchingThreads),
   finishTranslationEvent(this)
@@ -313,12 +315,10 @@
 
 template<class Impl>
 void
-DefaultFetch<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *fq_ptr)
+DefaultFetch<Impl>::setFetchQueue(TimeBuffer<FetchStruct> *ftb_ptr)
 {
-    fetchQueue = fq_ptr;
-
-    // Create wire to write information to proper place in fetch queue.
-    toDecode = fetchQueue->getWire(0);
+    // Create wire to write information to proper place in fetch time buf.
+    toDecode = ftb_ptr->getWire(0);
 }
 
 template<class Impl>
@@ -342,6 +342,7 @@
 cacheBlocked = false;
 
 priorityList.clear();
+fetchQueue.clear();
 
 // Setup PC and nextPC with initial state.
 for (ThreadID tid = 0; tid < numThreads; ++tid) {
@@ -454,6 +455,10 @@
 return false;
 }
 
+// Not drained if fetch queue contains entries
+if (!fetchQueue.empty())
+return false;
+
 /* The pipeline might start up again in the middle of the drain
  * cycle if the finish translation event is scheduled, so make
  * sure that's not the case.
@@ -673,11 +678,8 @@
 fetchStatus[tid] = IcacheWaitResponse;
 }
 } else {
-// Don't send an instruction to decode if it can't handle it.
-// Asynchronous nature of this function's calling means we have to
-// check 2 signals to see if decode is stalled.
-if (!(numInst < fetchWidth) || stalls[tid].decode ||
-    fromDecode->decodeBlock[tid]) {
+// Don't send an instruction to decode if we can't handle it.
+if (!(numInst 

[gem5-dev] changeset in gem5: cpu: Fix o3 front-end pipeline interlock beha...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 867b536a68be in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=867b536a68be
description:
cpu: Fix o3 front-end pipeline interlock behavior

The o3 pipeline interlock/stall logic was incorrect.  o3 unnecessarily
stalled fetch and decode due to later stages in the pipeline.  In general,
a stage should usually only consider whether it is stalled by the
adjacent, downstream stage.  Forcing stalls due to later stages creates
bubbles in the pipeline.  Additionally, o3 stalled the entire frontend
(fetch, decode, rename) on a branch mispredict while the ROB was being
serially walked to update the RAT (robSquashing).  It should only have
stalled at rename.
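A toy comparison of the two stall policies described above (names are illustrative): with the old coupling, a block anywhere downstream froze fetch; with the fix, fetch only stalls when decode, its adjacent downstream stage, is blocked.

```python
def fetch_stalled(stalls, couple_all_stages):
    """stalls: dict of blocked flags per downstream stage."""
    if couple_all_stages:
        # Old behavior: any later stage being blocked stalls fetch,
        # creating unnecessary bubbles.
        return any(stalls.values())
    # Fixed behavior: only the adjacent stage matters.
    return stalls['decode']
```

A commit-side stall therefore no longer propagates all the way back to fetch.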

diffstat:

 src/cpu/o3/comm.hh|   2 -
 src/cpu/o3/commit.hh  |  11 
 src/cpu/o3/commit_impl.hh |  40 -
 src/cpu/o3/decode.hh  |   4 +--
 src/cpu/o3/decode_impl.hh |  55 +++-
 src/cpu/o3/fetch.hh   |   3 --
 src/cpu/o3/fetch_impl.hh  |  64 --
 src/cpu/o3/iew.hh |  11 
 src/cpu/o3/iew_impl.hh|  23 +---
 src/cpu/o3/rename_impl.hh |  25 +
 10 files changed, 26 insertions(+), 212 deletions(-)

diffs (truncated from 525 to 300 lines):

diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/comm.hh
--- a/src/cpu/o3/comm.hhWed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/comm.hhWed Sep 03 07:42:34 2014 -0400
@@ -229,8 +229,6 @@
 bool renameUnblock[Impl::MaxThreads];
 bool iewBlock[Impl::MaxThreads];
 bool iewUnblock[Impl::MaxThreads];
-bool commitBlock[Impl::MaxThreads];
-bool commitUnblock[Impl::MaxThreads];
 };
 
 #endif //__CPU_O3_COMM_HH__
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:34 2014 -0400
@@ -185,9 +185,6 @@
 /** Sets the pointer to the IEW stage. */
 void setIEWStage(IEW *iew_stage);
 
-/** Skid buffer between rename and commit. */
-std::queue<DynInstPtr> skidBuffer;
-
 /** The pointer to the IEW stage. Used solely to ensure that
  * various events (traps, interrupts, syscalls) do not occur until
  * all stores have written back.
@@ -251,11 +248,6 @@
  */
 void setNextStatus();
 
-/** Checks if the ROB is completed with squashing. This is for the case
- * where the ROB can take multiple cycles to complete squashing.
- */
-bool robDoneSquashing();
-
 /** Returns if any of the threads have the number of ROB entries changed
  * on this cycle. Used to determine if the number of free ROB entries needs
  * to be sent back to previous stages.
@@ -321,9 +313,6 @@
 /** Gets instructions from rename and inserts them into the ROB. */
 void getInsts();
 
-/** Insert all instructions from rename into skidBuffer */
-void skidInsert();
-
 /** Marks completed instructions using information sent from IEW. */
 void markCompletedInsts();
 
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:34 2014 -0400
@@ -1335,29 +1335,6 @@
 
 template <class Impl>
 void
-DefaultCommit<Impl>::skidInsert()
-{
-    DPRINTF(Commit, "Attempting to any instructions from rename into "
-            "skidBuffer.\n");
-
-    for (int inst_num = 0; inst_num < fromRename->size; ++inst_num) {
-        DynInstPtr inst = fromRename->insts[inst_num];
-
-        if (!inst->isSquashed()) {
-            DPRINTF(Commit, "Inserting PC %s [sn:%i] [tid:%i] into "
-                    "skidBuffer.\n", inst->pcState(), inst->seqNum,
-                    inst->threadNumber);
-            skidBuffer.push(inst);
-        } else {
-            DPRINTF(Commit, "Instruction PC %s [sn:%i] [tid:%i] was "
-                    "squashed, skipping.\n",
-                    inst->pcState(), inst->seqNum, inst->threadNumber);
-        }
-    }
-}
-
-template <class Impl>
-void
 DefaultCommit<Impl>::markCompletedInsts()
 {
 // Grab completed insts out of the IEW instruction queue, and mark
@@ -1380,23 +1357,6 @@
 }
 
 template <class Impl>
-bool
-DefaultCommit<Impl>::robDoneSquashing()
-{
-    list<ThreadID>::iterator threads = activeThreads->begin();
-    list<ThreadID>::iterator end = activeThreads->end();
-
-    while (threads != end) {
-        ThreadID tid = *threads++;
-
-        if (!rob->isDoneSquashing(tid))
-            return false;
-    }
-
-    return true;
-}
-
-template <class Impl>
 void
 DefaultCommit<Impl>::updateComInstStats(DynInstPtr inst)
 {
diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/decode.hh
--- a/src/cpu/o3/decode.hh  Wed Sep 03 07:42:33 2014 -0400
+++ b/src/cpu/o3/decode.hh  Wed Sep 03 07:42:34 2014 -0400
@@ -126,7 +126,7 @@
 void drainSanityCheck() const;
 
 /** Has the stage 

[gem5-dev] changeset in gem5: cpu: Fix o3 drain bug

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 40d24a672351 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=40d24a672351
description:
cpu: Fix o3 drain bug

For X86, the o3 CPU would get stuck with the commit stage not being
drained if an interrupt arrived while drain was pending. isDrained()
makes sure that pcState.microPC() == 0, thus ensuring that we are at
an instruction boundary. However, when we take an interrupt we
execute:

pcState.upc(romMicroPC(entry));
pcState.nupc(romMicroPC(entry) + 1);
tc->pcState(pcState);

As a result, the MicroPC is no longer zero.  This patch ensures the drain
is delayed until no interrupts are present.  Once draining, non-synchronous
interrupts are deferred until after the switch.
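The drain condition described above boils down to a simple predicate (a hedged sketch with illustrative names, not gem5 code): commit may only signal "drained" at a macro-instruction boundary, i.e. micro PC zero, with no interrupt or trap in flight.

```python
def can_complete_drain(micro_pc, interrupt_pending, trap_pending):
    # Drain may only complete at a macro-instruction boundary
    # (micro PC == 0) and with no interrupt or trap in flight;
    # otherwise commit keeps running and re-checks next cycle.
    return micro_pc == 0 and not interrupt_pending and not trap_pending
```

The bug was effectively that an interrupt arriving mid-drain pushed the micro PC off zero, so the first condition could never again be satisfied.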

diffstat:

 src/cpu/o3/commit.hh  |  11 ++-
 src/cpu/o3/commit_impl.hh |  15 ---
 2 files changed, 22 insertions(+), 4 deletions(-)

diffs (72 lines):

diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh  Wed Sep 03 07:42:44 2014 -0400
+++ b/src/cpu/o3/commit.hh  Wed Sep 03 07:42:45 2014 -0400
@@ -438,9 +438,18 @@
 /** Number of Active Threads */
 ThreadID numThreads;
 
-/** Is a drain pending. */
+/** Is a drain pending? Commit is looking for an instruction boundary while
+ * there are no pending interrupts
+ */
 bool drainPending;
 
+/** Is a drain imminent? Commit has found an instruction boundary while no
+ * interrupts were present or in flight.  This was the last architecturally
+ * committed instruction.  Interrupts disabled and pipeline flushed.
+ * Waiting for structures to finish draining.
+ */
+bool drainImminent;
+
 /** The latency to handle a trap.  Used when scheduling trap
  * squash event.
  */
diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:44 2014 -0400
+++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:45 2014 -0400
@@ -104,6 +104,7 @@
   commitWidth(params->commitWidth),
   numThreads(params->numThreads),
   drainPending(false),
+  drainImminent(false),
   trapLatency(params->trapLatency),
   canHandleInterrupts(true),
   avoidQuiesceLiveLock(false)
@@ -406,6 +407,7 @@
 DefaultCommit<Impl>::drainResume()
 {
 drainPending = false;
+drainImminent = false;
 }
 
 template <class Impl>
@@ -816,8 +818,10 @@
 void
 DefaultCommit<Impl>::propagateInterrupt()
 {
+    // Don't propagate interrupts if we are currently handling a trap or
+    // are draining and the last observable instruction has been committed.
 if (commitStatus[0] == TrapPending || interrupt || trapSquash[0] ||
-tcSquash[0])
+tcSquash[0] || drainImminent)
 return;
 
 // Process interrupts if interrupts are enabled, not in PAL
@@ -1089,10 +1093,15 @@
 squashAfter(tid, head_inst);
 
 if (drainPending) {
-        DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]);
-        if (pc[tid].microPC() == 0 && interrupt == NoFault) {
+        if (pc[tid].microPC() == 0 && interrupt == NoFault &&
+            !thread[tid]->trapPending) {
+            // Last architecturally committed instruction.
+            // Squash the pipeline, stall fetch, and use
+            // drainImminent to disable interrupts
+            DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]);
             squashAfter(tid, head_inst);
             cpu->commitDrained(tid);
+            drainImminent = true;
 }
 }
 


[gem5-dev] changeset in gem5: arm: Fix v8 neon latency issue for loads/stores

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 53278be85b40 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=53278be85b40
description:
arm: Fix v8 neon latency issue for loads/stores

Neon memory ops that operate on multiple registers currently have very 
poor
performance because of interleave/deinterleave micro-ops.

This patch marks the deinterleave/interleave micro-ops as No_OpClass such
that they take minimum cycles to execute and are never resource
constrained.

Additionally, the micro-ops over-read registers.  Although one form may
need to read up to 20 sources, not all do.  This adds new forms so false
dependencies are not modeled.  Instructions read their minimum number of
sources.
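The variant selection in the diff below can be summarized as picking the micro-op form that reads only the registers it actually needs (the variant names come from the patch; this Python table is just an illustration of the dispatch):

```python
def pick_deint_microop(num_regs):
    # One micro-op form per register count, so an op on 1 register no
    # longer carries false dependencies on 4 registers' worth of sources.
    variants = {1: 'MicroDeintNeon64_1Reg', 2: 'MicroDeintNeon64_2Reg',
                3: 'MicroDeintNeon64_3Reg', 4: 'MicroDeintNeon64_4Reg'}
    if num_regs not in variants:
        raise ValueError('Invalid number of registers')
    return variants[num_regs]
```

This mirrors the `switch (numRegs)` added in macromem.cc, including the panic on an out-of-range count.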

diffstat:

 src/arch/arm/insts/macromem.cc|  47 +-
 src/arch/arm/isa/insts/neon64_mem.isa |  24 +++-
 2 files changed, 56 insertions(+), 15 deletions(-)

diffs (140 lines):

diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/insts/macromem.cc
--- a/src/arch/arm/insts/macromem.ccTue Apr 29 16:05:02 2014 -0500
+++ b/src/arch/arm/insts/macromem.ccWed Sep 03 07:42:44 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2013 ARM Limited
+ * Copyright (c) 2010-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -1107,9 +1107,26 @@
 }
 
 for (int i = 0; i < numMarshalMicroops; ++i) {
-microOps[uopIdx++] = new MicroDeintNeon64(
-machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
-numStructElems, numRegs, i /* step */);
+switch(numRegs) {
+case 1: microOps[uopIdx++] = new MicroDeintNeon64_1Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 1, i /* step */);
+break;
+case 2: microOps[uopIdx++] = new MicroDeintNeon64_2Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 2, i /* step */);
+break;
+case 3: microOps[uopIdx++] = new MicroDeintNeon64_3Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 3, i /* step */);
+break;
+case 4: microOps[uopIdx++] = new MicroDeintNeon64_4Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize,
+numStructElems, 4, i /* step */);
+break;
+default: panic("Invalid number of registers");
+}
+
 }
 
 assert(uopIdx == numMicroops);
@@ -1150,9 +1167,25 @@
 unsigned uopIdx = 0;
 
 for(int i = 0; i < numMarshalMicroops; ++i) {
-microOps[uopIdx++] = new MicroIntNeon64(
-machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
-numStructElems, numRegs, i /* step */);
+switch (numRegs) {
+case 1: microOps[uopIdx++] = new MicroIntNeon64_1Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 1, i /* step */);
+break;
+case 2: microOps[uopIdx++] = new MicroIntNeon64_2Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 2, i /* step */);
+break;
+case 3: microOps[uopIdx++] = new MicroIntNeon64_3Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 3, i /* step */);
+break;
+case 4: microOps[uopIdx++] = new MicroIntNeon64_4Reg(
+machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize,
+numStructElems, 4, i /* step */);
+break;
+default: panic("Invalid number of registers");
+}
 }
 
 uint32_t memaccessFlags = TLB::MustBeOne | (TLB::ArmFlags) eSize |
diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/isa/insts/neon64_mem.isa
--- a/src/arch/arm/isa/insts/neon64_mem.isa Tue Apr 29 16:05:02 2014 -0500
+++ b/src/arch/arm/isa/insts/neon64_mem.isa Wed Sep 03 07:42:44 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode: c++ -*-
 
-// Copyright (c) 2012-2013 ARM Limited
+// Copyright (c) 2012-2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -163,11 +163,11 @@
 header_output += MicroNeonMemDeclare64.subst(loadIop) + \
 MicroNeonMemDeclare64.subst(storeIop)
 
-def mkMarshalMicroOp(name, Name):
+def mkMarshalMicroOp(name, Name, numRegs=4):
 global header_output, decoder_output, exec_output
 
 getInputCodeOp1L = ''
-for v in range(4):
+for v in range(numRegs):
 for p in 

[gem5-dev] changeset in gem5: x86: Flag instructions that call suspend as I...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 0b4d10f53c2d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b4d10f53c2d
description:
x86: Flag instructions that call suspend as IsQuiesce

The o3 cpu relies upon instructions that suspend a thread context being
flagged as IsQuiesce.  If they are not, unpredictable behavior can 
occur.
This patch fixes that for the x86 ISA.
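The dependence can be illustrated with a toy flag-bit model (bit positions here are made up; gem5 defines the real ones in StaticInst): a scheduler that tests the quiesce flag can halt fetch for the thread, but only if the suspending instruction actually carries the flag.

```python
# Illustrative flag bits, not gem5's actual encoding.
IS_NON_SPECULATIVE = 1 << 0
IS_QUIESCE = 1 << 1

def is_quiesce(flags):
    # An instruction that suspends a thread context must carry
    # IS_QUIESCE, otherwise the core keeps fetching past it.
    return bool(flags & IS_QUIESCE)
```

An unflagged suspending instruction slips through this check, which is the unpredictable behavior the patch closes off.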

diffstat:

 src/arch/x86/isa/decoder/two_byte_opcodes.isa |  6 +++---
 src/arch/x86/isa/microops/specop.isa  |  3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diffs (33 lines):

diff -r 40d24a672351 -r 0b4d10f53c2d 
src/arch/x86/isa/decoder/two_byte_opcodes.isa
--- a/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:45 
2014 -0400
+++ b/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:46 
2014 -0400
@@ -141,13 +141,13 @@
 }}, IsNonSpeculative);
 0x01: m5quiesce({{
     PseudoInst::quiesce(xc->tcBase());
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x02: m5quiesceNs({{
     PseudoInst::quiesceNs(xc->tcBase(), Rdi);
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x03: m5quiesceCycle({{
     PseudoInst::quiesceCycles(xc->tcBase(), Rdi);
-}}, IsNonSpeculative);
+}}, IsNonSpeculative, IsQuiesce);
 0x04: m5quiesceTime({{
     Rax = PseudoInst::quiesceTime(xc->tcBase());
 }}, IsNonSpeculative);
diff -r 40d24a672351 -r 0b4d10f53c2d src/arch/x86/isa/microops/specop.isa
--- a/src/arch/x86/isa/microops/specop.isa  Wed Sep 03 07:42:45 2014 -0400
+++ b/src/arch/x86/isa/microops/specop.isa  Wed Sep 03 07:42:46 2014 -0400
@@ -63,7 +63,8 @@
 MicroHalt(ExtMachInst _machInst, const char * instMnem,
           uint64_t setFlags) :
     X86MicroopBase(_machInst, "halt", instMnem,
-                   setFlags | (ULL(1) << StaticInst::IsNonSpeculative),
+                   setFlags | (ULL(1) << StaticInst::IsNonSpeculative) |
+                   (ULL(1) << StaticInst::IsQuiesce),
                    No_OpClass)
 {
 }


[gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 6be8945d226b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b
description:
cpu: Fix cache blocked load behavior in o3 cpu

This patch fixes the load blocked/replay mechanism in the o3 cpu.  
Rather than
flushing the entire pipeline, this patch replays loads once the cache 
becomes
unblocked.

Additionally, deferred memory instructions (loads which had conflicting
stores) would not respect the number of functional units when replayed
(they only respected issue width).  This patch also corrects that.

Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.
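The replay mechanism can be modeled in a few lines (a simplified sketch with illustrative names, not gem5's implementation): a load that hits a blocked cache is parked in a list rather than triggering a pipeline squash, and the whole list is reissued when the cache signals it is unblocked.

```python
class LoadReplayer:
    """Toy model of parking blocked loads instead of squashing."""

    def __init__(self):
        self.blocked = []
        self.issued = []

    def issue(self, load, cache_blocked):
        if cache_blocked:
            self.blocked.append(load)   # park the load, don't squash
        else:
            self.issued.append(load)

    def cache_unblocked(self):
        # Retry every parked load now that the cache accepts requests.
        retry, self.blocked = self.blocked, []
        for load in retry:
            self.issue(load, cache_blocked=False)
```

The performance win in the description comes from avoiding the full-pipeline flush the old mechanism performed on every blocked load.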

diffstat:

 src/cpu/o3/iew.hh   |   13 +-
 src/cpu/o3/iew_impl.hh  |   57 ++
 src/cpu/o3/inst_queue.hh|   25 -
 src/cpu/o3/inst_queue_impl.hh   |   68 ++---
 src/cpu/o3/lsq.hh   |   27 +-
 src/cpu/o3/lsq_impl.hh  |   23 +---
 src/cpu/o3/lsq_unit.hh  |  198 ---
 src/cpu/o3/lsq_unit_impl.hh |   40 ++-
 src/cpu/o3/mem_dep_unit.hh  |4 +-
 src/cpu/o3/mem_dep_unit_impl.hh |4 +-
 10 files changed, 203 insertions(+), 256 deletions(-)

diffs (truncated from 846 to 300 lines):

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:39 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -181,6 +181,12 @@
 /** Re-executes all rescheduled memory instructions. */
 void replayMemInst(DynInstPtr inst);
 
+/** Moves memory instruction onto the list of cache blocked instructions */
+void blockMemInst(DynInstPtr inst);
+
+/** Notifies that the cache has become unblocked */
+void cacheUnblocked();
+
 /** Sends an instruction to commit through the time buffer. */
 void instToCommit(DynInstPtr inst);
 
@@ -233,11 +239,6 @@
  */
 void squashDueToMemOrder(DynInstPtr inst, ThreadID tid);
 
-/** Sends commit proper information for a squash due to memory becoming
- * blocked (younger issued instructions must be retried).
- */
-void squashDueToMemBlocked(DynInstPtr inst, ThreadID tid);
-
 /** Sets Dispatch to blocked, and signals back to other stages to block. */
 void block(ThreadID tid);
 
diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:39 2014 -0400
@@ -530,29 +530,6 @@
 
 template<class Impl>
 void
-DefaultIEW<Impl>::squashDueToMemBlocked(DynInstPtr inst, ThreadID tid)
-{
-    DPRINTF(IEW, "[tid:%i]: Memory blocked, squashing load and younger insts, "
-            "PC: %s [sn:%i].\n", tid, inst->pcState(), inst->seqNum);
-    if (!toCommit->squash[tid] ||
-            inst->seqNum < toCommit->squashedSeqNum[tid]) {
-        toCommit->squash[tid] = true;
-
-        toCommit->squashedSeqNum[tid] = inst->seqNum;
-        toCommit->pc[tid] = inst->pcState();
-        toCommit->mispredictInst[tid] = NULL;
-
-        // Must include the broadcasted SN in the squash.
-        toCommit->includeSquashInst[tid] = true;
-
-        ldstQueue.setLoadBlockedHandled(tid);
-
-        wroteToTimeBuffer = true;
-    }
-}
-
-template<class Impl>
-void
 DefaultIEW<Impl>::block(ThreadID tid)
 {
 DPRINTF(IEW, "[tid:%u]: Blocking.\n", tid);
@@ -610,6 +587,20 @@
 
 template<class Impl>
 void
+DefaultIEW<Impl>::blockMemInst(DynInstPtr inst)
+{
+    instQueue.blockMemInst(inst);
+}
+
+template<class Impl>
+void
+DefaultIEW<Impl>::cacheUnblocked()
+{
+    instQueue.cacheUnblocked();
+}
+
+template<class Impl>
+void
 DefaultIEW<Impl>::instToCommit(DynInstPtr inst)
 {
 // This function should not be called after writebackInsts in a
@@ -1376,15 +1367,6 @@
 squashDueToMemOrder(violator, tid);
 
 ++memOrderViolationEvents;
-} else if (ldstQueue.loadBlocked(tid) &&
-           !ldstQueue.isLoadBlockedHandled(tid)) {
-    fetchRedirect[tid] = true;
-
-    DPRINTF(IEW, "Load operation couldn't execute because the "
-            "memory system is blocked.  PC: %s [sn:%lli]\n",
-            inst->pcState(), inst->seqNum);
-
-    squashDueToMemBlocked(inst, tid);
 }
 } else {
 // Reset any state associated with redirects that will not
@@ -1403,17 +1385,6 @@
 
 ++memOrderViolationEvents;
 }
-if (ldstQueue.loadBlocked(tid) &&
-    !ldstQueue.isLoadBlockedHandled(tid)) {
-    DPRINTF(IEW, "Load operation couldn't execute because the "
-            "memory system is blocked.  PC: %s [sn:%lli]\n",
-

[gem5-dev] changeset in gem5: arm: Mark v7 cbz instructions as direct branches

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 5e424aa952c5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5e424aa952c5
description:
arm: Mark v7 cbz instructions as direct branches

v7 cbz/cbnz instructions were improperly marked as indirect branches.
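The distinction matters for branch prediction: a direct branch's target is computable from the PC and an immediate alone, so the front end can resolve it early, while an indirect branch needs a register value. A minimal sketch of the cbz/cbnz target computation (mirroring the `(uint32_t)(PC + imm)` expression in the isa code below):

```python
def cbz_target(pc, imm):
    # Target depends only on PC and the immediate: a *direct* branch.
    # Mask models the uint32_t wraparound in the ISA description.
    return (pc + imm) & 0xFFFFFFFF
```

Whether the branch is taken depends on the tested register being zero, but the *target* never does, which is why IsDirectControl is the correct flag.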

diffstat:

 src/arch/arm/isa/insts/branch.isa |  11 +++
 src/arch/arm/isa/templates/branch.isa |   6 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diffs (52 lines):

diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/insts/branch.isa
--- a/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:39 2014 -0400
+++ b/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:40 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode:c++ -*-
 
-// Copyright (c) 2010-2012 ARM Limited
+// Copyright (c) 2010-2012, 2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -174,12 +174,15 @@
 #CBNZ, CBZ. These are always unconditional as far as predicates
 for (mnem, test) in (("cbz", "=="), ("cbnz", "!=")):
     code = 'NPC = (uint32_t)(PC + imm);\n'
+    br_tgt_code = '''pcs.instNPC((uint32_t)(branchPC.instPC() + imm));'''
     predTest = "Op1 %(test)s 0" % {"test": test}
     iop = InstObjParams(mnem, mnem.capitalize(), "BranchImmReg",
-                        {"code": code, "predicate_test": predTest},
-                        ["IsIndirectControl"])
+                        {"code": code, "predicate_test": predTest,
+                         "brTgtCode" : br_tgt_code},
+                        ["IsDirectControl"])
     header_output += BranchImmRegDeclare.subst(iop)
-    decoder_output += BranchImmRegConstructor.subst(iop)
+    decoder_output += BranchImmRegConstructor.subst(iop) + \
+                      BranchTarget.subst(iop)
     exec_output += PredOpExecute.subst(iop)
 
 #TBB, TBH
diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/templates/branch.isa
--- a/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:39 2014 -0400
+++ b/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:40 2014 -0400
@@ -1,6 +1,6 @@
 // -*- mode:c++ -*-
 
-// Copyright (c) 2010 ARM Limited
+// Copyright (c) 2010, 2014 ARM Limited
 // All rights reserved
 //
 // The license below extends only to copyright in the software and shall
@@ -212,6 +212,10 @@
 %(class_name)s(ExtMachInst machInst,
int32_t imm, IntRegIndex _op1);
 %(BasicExecDeclare)s
+ArmISA::PCState branchTarget(const ArmISA::PCState &branchPC) const;
+
+/// Explicitly import the otherwise hidden branchTarget
+using StaticInst::branchTarget;
 };
 }};
 


[gem5-dev] changeset in gem5: cpu: Fix o3 quiesce fetch bug

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset 1ba825974ee6 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1ba825974ee6
description:
cpu: Fix o3 quiesce fetch bug

O3 is supposed to stop fetching instructions once a quiesce is
encountered.  However, due to a bug, it would continue fetching
instructions from the current fetch buffer.  This is because of a break
statement that only broke out of the first of two nested loops.  It
should have broken out of both.
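A toy reproduction of this bug class (illustrative code, not the gem5 fetch loop): a bare `break` only exits the inner loop, so the fix threads a flag through to the outer loop's condition, exactly like the `quiesce` flag added in the diff below.

```python
def fetch_until_quiesce(insts, width):
    fetched = []
    quiesce = False
    i = 0
    # Outer loop checks the flag, so a quiesce halts fetch entirely.
    while i < len(insts) and not quiesce:
        for _ in range(width):
            if i >= len(insts):
                break
            inst = insts[i]
            i += 1
            fetched.append(inst)
            if inst == 'quiesce':
                quiesce = True
                break   # exits only the inner loop; outer sees the flag
    return fetched
```

Without the flag, the next outer iteration would keep fetching from the remaining buffer, which is the buggy behavior described above.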

diffstat:

 src/cpu/o3/fetch_impl.hh |  8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diffs (34 lines):

diff -r ed05298e8566 -r 1ba825974ee6 src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:37 2014 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Wed Sep 03 07:42:38 2014 -0400
@@ -1236,6 +1236,9 @@
 // ended this fetch block.
 bool predictedBranch = false;
 
+// Need to halt fetch if quiesce instruction detected
+bool quiesce = false;
+
 TheISA::MachInst *cacheInsts =
reinterpret_cast<TheISA::MachInst *>(fetchBuffer[tid]);
 
@@ -1246,7 +1249,7 @@
 // Keep issuing while fetchWidth is available and branch is not
 // predicted taken
while (numInst < fetchWidth && fetchQueue[tid].size() < fetchQueueSize
-&& !predictedBranch) {
+&& !predictedBranch && !quiesce) {
 // We need to process more memory if we aren't going to get a
 // StaticInst from the rom, the current macroop, or what's already
 // in the decoder.
@@ -1363,9 +1366,10 @@
 
 if (instruction->isQuiesce()) {
 DPRINTF(Fetch,
-"Quiesce instruction encountered, halting fetch!");
+"Quiesce instruction encountered, halting fetch!\n");
 fetchStatus[tid] = QuiescePending;
 status_change = true;
+quiesce = true;
 break;
 }
 } while ((curMacroop || decoder[tid]->instReady()) &&
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: arm: Make memory ops work on 64bit/128-bit qu...

2014-09-03 Thread Mitch Hayenga via gem5-dev
changeset d96b61d843b2 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=d96b61d843b2
description:
arm: Make memory ops work on 64bit/128-bit quantities

Multiple instructions assume only 32-bit load operations are available,
this patch increases load sizes to 64-bit or 128-bit for many load pair 
and
load multiple instructions.

diffstat:

 src/arch/arm/insts/macromem.cc  |  388 ++-
 src/arch/arm/insts/macromem.hh  |   22 +-
 src/arch/arm/isa/insts/ldr64.isa|   90 +++---
 src/arch/arm/isa/insts/macromem.isa |   24 +-
 src/arch/arm/isa/insts/mem.isa  |4 +-
 src/arch/arm/isa/templates/macromem.isa |   35 ++-
 6 files changed, 355 insertions(+), 208 deletions(-)

diffs (truncated from 864 to 300 lines):

diff -r b5bef3c8e070 -r d96b61d843b2 src/arch/arm/insts/macromem.cc
--- a/src/arch/arm/insts/macromem.ccFri Jun 27 12:29:00 2014 -0500
+++ b/src/arch/arm/insts/macromem.ccWed Sep 03 07:42:52 2014 -0400
@@ -61,14 +61,29 @@
 {
 uint32_t regs = reglist;
 uint32_t ones = number_of_ones(reglist);
-// Remember that writeback adds a uop or two and the temp register adds one
-numMicroops = ones + (writeback ? (load ? 2 : 1) : 0) + 1;
+uint32_t mem_ops = ones;
 
-// It's technically legal to do a lot of nothing
-if (!ones)
+// Copy the base address register if we overwrite it, or if this instruction
+// is basically a no-op (we have to do something)
+bool copy_base = (bits(reglist, rn) && load) || !ones;
+bool force_user = user && !bits(reglist, 15);
+bool exception_ret = user && bits(reglist, 15);
+bool pc_temp = load && writeback && bits(reglist, 15);
+
+if (!ones) {
 numMicroops = 1;
+} else if (load) {
+numMicroops = ((ones + 1) / 2)
++ ((ones % 2 == 0 && exception_ret) ? 1 : 0)
++ (copy_base ? 1 : 0)
++ (writeback? 1 : 0)
++ (pc_temp ? 1 : 0);
+} else {
+numMicroops = ones + (writeback ? 1 : 0);
+}
 
 microOps = new StaticInstPtr[numMicroops];
+
 uint32_t addr = 0;
 
 if (!up)
@@ -81,94 +96,129 @@
 
 // Add 0 to Rn and stick it in ureg0.
 // This is equivalent to a move.
-*uop = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0);
+if (copy_base)
+*uop++ = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0);
 
 unsigned reg = 0;
-unsigned regIdx = 0;
-bool force_user = user && !bits(reglist, 15);
-bool exception_ret = user && bits(reglist, 15);
+while (mem_ops != 0) {
+// Do load operations in pairs if possible
+if (load && mem_ops >= 2 &&
+!(mem_ops == 2 && bits(regs, INTREG_PC) && exception_ret)) {
+// 64-bit memory operation
+// Find 2 set register bits (clear them after finding)
+unsigned reg_idx1;
+unsigned reg_idx2;
 
-for (int i = 0; i  ones; i++) {
-// Find the next register.
-while (!bits(regs, reg))
-reg++;
-replaceBits(regs, reg, 0);
+// Find the first register
+while (!bits(regs, reg)) reg++;
+replaceBits(regs, reg, 0);
+reg_idx1 = force_user ? intRegInMode(MODE_USER, reg) : reg;
 
-regIdx = reg;
-if (force_user) {
-regIdx = intRegInMode(MODE_USER, regIdx);
-}
+// Find the second register
+while (!bits(regs, reg)) reg++;
+replaceBits(regs, reg, 0);
+reg_idx2 = force_user ? intRegInMode(MODE_USER, reg) : reg;
 
-if (load) {
-if (writeback  i == ones - 1) {
-// If it's a writeback and this is the last register
-// do the load into a temporary register which we'll move
-// into the final one later
-*++uop = new MicroLdrUop(machInst, INTREG_UREG1, INTREG_UREG0,
-up, addr);
-} else {
-// Otherwise just do it normally
-if (reg == INTREG_PC && exception_ret) {
-// This must be the exception return form of ldm.
-*++uop = new MicroLdrRetUop(machInst, regIdx,
-   INTREG_UREG0, up, addr);
+// Load into temp reg if necessary
+if (reg_idx2 == INTREG_PC && pc_temp)
+reg_idx2 = INTREG_UREG1;
+
+// Actually load both registers from memory
+*uop = new MicroLdr2Uop(machInst, reg_idx1, reg_idx2,
+copy_base ? INTREG_UREG0 : rn, up, addr);
+
+if (!writeback && reg_idx2 == INTREG_PC) {
+// No writeback if idx==pc, set appropriate flags
+(*uop)->setFlag(StaticInst::IsControl);
+(*uop)->setFlag(StaticInst::IsIndirectControl);
+
+if (!(condCode == COND_AL || condCode == COND_UC))
+  

Re: [gem5-dev] bi-mode branch predictor miss prediction rate is high

2014-09-03 Thread Mitch Hayenga via gem5-dev
A bug was recently found in the bimodal predictor.  If you are still
looking at this, you might want to try a new checkout.  Hope this helps.


On Wed, Jul 2, 2014 at 4:52 PM, Zi Yan via gem5-dev <gem5-dev@gem5.org>
wrote:

 I get 5 100-million-instruction simpoints for each benchmark in
 SPEC CPU 2006 with *ref input*. I am using cross-tool
 arm-cortex_a15-linux-gnueabi-gcc version 4.8.2 to compile.

 For gcc, I got from 0.2% to 5% miss rate from tournament, but 3% to 22%
 miss rate from bi-mode cross all simpoints.

 Most weird part is hmmer, I got from 0.3% to 0.5% miss rate from
 tournament,
 but 52% to 60% miss rate from bi-mode.



 --
 Best Regards
 Yan Zi

 On 2 Jul 2014, at 17:11, Anthony Gutierrez via gem5-dev wrote:

  This could depend on a lot of factors. How are you running the
 benchmarks?
 
  E.g., running SPEC 2k6's gcc to completion with the train input set in FS
  mode yields a 6.45% miss rate for bi-mode, while the tournament predictor
  yields a 7.12% miss rate.
 
 
  Anthony Gutierrez
  http://web.eecs.umich.edu/~atgutier
 
 
  On Wed, Jul 2, 2014 at 4:37 PM, Zi Yan via gem5-dev <gem5-dev@gem5.org>
  wrote:
 
  Hi,
 
  I just updated gem5-dev and got bi-mode as ARM's default
  branch predictor.
 
  I got mis-prediction rate
 
 (system.cpu.branchPred.condIncorrect/system.cpu.branchPred.condPredicted)
  ranging from 10% to 60%, whereas I saw mis-prediction rate ranging
  from 1% to 9% with tournament for SPEC CPU 2006 benchmarks.
 
  Should I expect this from bi-mode?
 
  Thanks.
 
  --
  Best Regards
  Yan Zi
  ___
  gem5-dev mailing list
  gem5-dev@gem5.org
  http://m5sim.org/mailman/listinfo/gem5-dev
 
  ___
  gem5-dev mailing list
  gem5-dev@gem5.org
  http://m5sim.org/mailman/listinfo/gem5-dev
 ___
 gem5-dev mailing list
 gem5-dev@gem5.org
 http://m5sim.org/mailman/listinfo/gem5-dev

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mips: Fix RLIMIT_RSS naming

2014-08-26 Thread Mitch Hayenga via gem5-dev
changeset b7715fb7cf9f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b7715fb7cf9f
description:
mips: Fix RLIMIT_RSS naming

MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with
RLIMIT_RSS from the host system.  Broke clang+MacOS build.

diffstat:

 src/arch/mips/linux/linux.hh |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diffs (12 lines):

diff -r 4593282280e4 -r b7715fb7cf9f src/arch/mips/linux/linux.hh
--- a/src/arch/mips/linux/linux.hh  Tue Aug 26 10:13:28 2014 -0400
+++ b/src/arch/mips/linux/linux.hh  Tue Aug 26 10:13:31 2014 -0400
@@ -117,7 +117,7 @@
 /// Resource constants for getrlimit() (overide some generics).
 static const unsigned TGT_RLIMIT_NPROC = 8;
 static const unsigned TGT_RLIMIT_AS = 6;
-static const unsigned RLIMIT_RSS = 7;
+static const unsigned TGT_RLIMIT_RSS = 7;
 static const unsigned TGT_RLIMIT_NOFILE = 5;
 static const unsigned TGT_RLIMIT_MEMLOCK = 9;
 
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2332: cpu: Fix cached block load behavior in o3 cpu

2014-08-19 Thread Mitch Hayenga via gem5-dev


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/inst_queue.hh, line 322
  http://reviews.gem5.org/r/2332/diff/1/?file=40490#file40490line322
 
  I think we need better differentiation between this list and the one 
  declared after it.
  
  On further reading, it seems that we may not need the two lists.  Can 
  we just mark the instructions that they should be retried?  While adding 
  them back to the ready queue, we can check which ones are marked.  Or may 
  be keep an iterator that tracks the point till which we 
  should retry.
  
  One more thought,  can we do with a queue instead of a list?

That could be done.  Real hardware would likely mark these blocked 
instructions with a bit to guard from execution in the IQ.  The clearing of the 
blocked cause (cache blocked in this case) would flash-clear that guard bit for 
all load instructions in the IQ.

Functionally these two approaches should be equivalent.  The same alternative 
implementation could have been done for deferred memory instructions.  I just 
coded it similarly to how deferred instructions were already handled.  I haven't 
looked to verify how easy this alternative implementation would be to do.

A queue could be used, the only difference would be that the clear() would be 
replaced with a swap with an empty queue.


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/inst_queue_impl.hh, line 416
  http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line416
 
  Should we not clear retryMemInsts as well?

Yep, will add.


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/lsq_unit.hh, line 888
  http://reviews.gem5.org/r/2332/diff/1/?file=40494#file40494line888
 
  Do we need all these changes that appear over next 15-20 lines?  It 
  seems from my initial reading that the previous code structure could have 
  been retained.

I moved it below during the coding to clearly show the deletion events.  I hit 
a few issues due to bad deletions and split loads.  Having all deletions done 
together and not mixed in with code segments above made it clearer to me.


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/inst_queue_impl.hh, line 759
  http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line759
 
  Let's retain the new line above this while loop.

Ok.


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/inst_queue_impl.hh, line 1116
  http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line1116
 
  We should use nullptr now that we have gcc minimum dependency at 4.6.

Hmm, at one point in this patch or another I had to try to return nullptr and 
someone took issue with it, I believe.  Either way, personally I don't care.

Grepping our code base (this might be different from the mainline gem5) it 
seems we only use nullptr eleven times in the code base and only in 
network-related code.  Should a later consistency patch try to change this?  
Because currently it looks like NULL is preferred.

$ grep -r "NULL" * | wc -l
1515
$ grep -r "nullptr" * | wc -l
  11


 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote:
  src/cpu/o3/lsq_unit_impl.hh, line 111
  http://reviews.gem5.org/r/2332/diff/1/?file=40495#file40495line111
 
  New line after.

Will do.


- Mitch


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2332/#review5272
---


On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2332/
 ---
 
 (Updated Aug. 13, 2014, 2:06 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10300:bddebc19285f
 ---
 cpu: Fix cached block load behavior in o3 cpu
 
 This patch fixes the load blocked/replay mechanism in the o3 cpu.  Rather than
 flushing the entire pipeline, this patch replays loads once the cache becomes
 unblocked.
 
 Additionally, deferred memory instructions (loads which had conflicting 
 stores),
 when replayed would not respect the number of functional units (only respected
 issue width).  This patch also corrects that.
 
 Improvements over 20% have been observed on a microbenchmark designed to
 exercise this behavior.
 
 
 Diffs
 -
 
   src/cpu/o3/iew.hh 79fde1c67ed8 
   src/cpu/o3/iew_impl.hh 79fde1c67ed8 
   src/cpu/o3/inst_queue.hh 79fde1c67ed8 
   src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8 
   src/cpu/o3/lsq.hh 79fde1c67ed8 
   src/cpu/o3/lsq_impl.hh 79fde1c67ed8 
   src/cpu/o3/lsq_unit.hh 79fde1c67ed8 
   src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8 
   src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8 
   src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8 
 
 Diff: 

Re: [gem5-dev] Review Request 2332: cpu: Fix cached block load behavior in o3 cpu

2014-08-16 Thread Mitch Hayenga via gem5-dev
Hi,

I'm the one who wrote this patch.


** The opening comment in the patch states that it is trying to do
two things.  I would suggest that we split the patch.*

Related code was already being changed by this patch, and going out of the
way to handle blocked and deferred memory instructions differently than
each other seemed wrong.  They are both cases of memory instructions
replayed for different reasons.  A side effect of treating them the same is
that the resources are now properly modeled.






** I think we should not drop the original behaviour.  Firstly, it was not
incorrect.  Secondly, no reason has been provided as to why the behaviour
implemented should be preferred.  Are we sure that most out-of-order
processors would choose the proposed over the original?*


The existing replay logic in gem5 attempts to model that which was present
in the Alpha 21264 (
http://www.ece.cmu.edu/~ece447/s14/lib/exe/fetch.php?media=21264hrm.pdf).
 Specifically:

*There are some situations in which a load or store instruction cannot be
executed due to a condition that occurs after that instruction issues from
the IQ or FQ. The instruction is aborted (along with all newer
instructions) and restarted from the fetch stage of the pipeline. This
mechanism is called a replay trap.*


However, it is doubtful that any modern processor would desire this
behavior.  The current symptom is that o3 repeatedly re-fetches and
executes the same sequence of instructions unnecessarily multiple times
until the cache finally becomes unblocked.  This behavior is more
detrimental to performance and power efficiency than even the P4's replay
mechanism, which was so power hungry that it has its own Wikipedia entry (
http://en.wikipedia.org/wiki/Replay_system).  The P4 was more conservative
than gem5 in that it didn't have to re-fetch. Nowadays, due to power
considerations, replay events are rarer and more selective in modern
processors.  Mikko previously had a paper explaining replay mechanisms and
the importance of limited replay in modern processors (
http://pharm.ece.wisc.edu/papers/hpca2004ikim.pdf).

Though truthful to the 21264, complaints about the existing logic have
already been brought up multiple times on the mailing list and in Karu's
WDDD paper.  Since gem5 is a research simulator and not a historical
tribute to the 21264, I see little value in keeping the old semantics.  It
would add complication to the code and likely add infrequently tested code
paths.


Hope that clears up the reasoning behind dropping the old functionality.



On Sat, Aug 16, 2014 at 11:01 AM, Nilay Vaish via gem5-dev <
gem5-dev@gem5.org> wrote:


 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2332/#review5261
 ---


 Two points that I would like to make:
 * The opening comment in the patch states that it is trying to do two
 things.  I would suggest that we split the patch.

 * I think we should not drop the original behaviour.  Firstly, it was not
 incorrect.
 Secondly, no reason has been provided as to why the behaviour implemented
 should be preferred.  Are we sure that most out-of-order processors would
 choose the proposed over the original?

 - Nilay Vaish


 On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote:
 
  ---
  This is an automatically generated e-mail. To reply, visit:
  http://reviews.gem5.org/r/2332/
  ---
 
  (Updated Aug. 13, 2014, 2:06 p.m.)
 
 
  Review request for Default.
 
 
  Repository: gem5
 
 
  Description
  ---
 
  Changeset 10300:bddebc19285f
  ---
  cpu: Fix cached block load behavior in o3 cpu
 
  This patch fixes the load blocked/replay mechanism in the o3 cpu.
 Rather than
  flushing the entire pipeline, this patch replays loads once the cache
 becomes
  unblocked.
 
  Additionally, deferred memory instructions (loads which had conflicting
 stores),
  when replayed would not respect the number of functional units (only
 respected
  issue width).  This patch also corrects that.
 
  Improvements over 20% have been observed on a microbenchmark designed to
  exercise this behavior.
 
 
  Diffs
  -
 
src/cpu/o3/iew.hh 79fde1c67ed8
src/cpu/o3/iew_impl.hh 79fde1c67ed8
src/cpu/o3/inst_queue.hh 79fde1c67ed8
src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8
src/cpu/o3/lsq.hh 79fde1c67ed8
src/cpu/o3/lsq_impl.hh 79fde1c67ed8
src/cpu/o3/lsq_unit.hh 79fde1c67ed8
src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8
src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8
src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8
 
  Diff: http://reviews.gem5.org/r/2332/diff/
 
 
  Testing
  ---
 
 
  Thanks,
 
  Andreas Hansson
 
 

 ___
 gem5-dev mailing list
 gem5-dev@gem5.org
 

[gem5-dev] changeset in gem5: ext: clang fix for flexible array members

2014-08-13 Thread Mitch Hayenga via gem5-dev
changeset 0edd36ea6130 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0edd36ea6130
description:
ext: clang fix for flexible array members

Changes how flexible array members are defined so clang does not error
out during compilation.

diffstat:

 ext/dnet/os.h |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diffs (13 lines):

diff -r 763f76d5dea7 -r 0edd36ea6130 ext/dnet/os.h
--- a/ext/dnet/os.h Sun Aug 10 05:39:40 2014 -0400
+++ b/ext/dnet/os.h Wed Aug 13 06:57:19 2014 -0400
@@ -98,7 +98,8 @@
 
 /* Support for flexible arrays. */
 #undef __flexarr
-#if defined(__GNUC__) && ((__GNUC__ > 2) || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97))
+#if !defined(__clang__) && defined(__GNUC__) && \
+((__GNUC__ > 2) || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97))
 /* GCC 2.97 supports C99 flexible array members.  */
 # define __flexarr []
 #else
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2292: python: Change parsing of Addr so hex values work from scripts

2014-07-02 Thread Mitch Hayenga via gem5-dev


 On June 20, 2014, 2:36 a.m., Steve Reinhardt wrote:
  Why is Addr being parsed using toMemorySize() in the first place?  That 
  seems wrong.  At least some of the places Addr is used with a size (like 
  RealView.max_mem_size), I think the problem is that the param should really 
  be a Param.MemorySize to begin with.

Hi Steve,  I'm the one that made this edit.

I agree that the call to toMemorySize() on an address seems strange, but it's 
an idiom that seems pretty widespread in gem5. Specifying an address as 
something like '512MB' is done in multiple places.


1. It's pretty well baked into the AddrRange() param as most places that call 
it give it a starting address or size in MB, GB, etc., which is then directly 
passed to Addr() within the param class.

common/FSConfig.py
143:self.mem_ranges = [AddrRange(Addr('1MB'), size = '64MB'),
144:   AddrRange(Addr('2GB'), size ='256MB')]
405:self.mem_ranges = [AddrRange('3GB'),
406:AddrRange(Addr('4GB'), size = excess_mem_size)]


2. It's also used on other systems for arithmetic.

src/arch/sparc/SparcSystem.py
59:hypervisor_addr = Param.Addr(Addr('64kB') + _rom_base,
61:openboot_addr = Param.Addr(Addr('512kB') + _rom_base,


It could be changed over, but this would require changing multiple other places 
in the code.  Changing these places, which are really trying to directly set an 
Addr to MemorySize, also seems wrong though.  In the FSConfig.py example, 
we're trying to specify the starting address, not a MemorySize.

Either way is going to have some bad semantics (unless we get rid of the 
ability to nicely specify starting addresses via sizes).
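To make the parsing question concrete, here is a hedged sketch of the two styles under discussion — raw (possibly hex) integers versus size-suffixed strings. The function name and unit table are illustrative only, not gem5's actual `params.py` code:

```python
def to_addr(value):
    """Accept an int, a hex/decimal string, or a size string like '512MB'."""
    if isinstance(value, int):
        return value
    units = {'kB': 2**10, 'MB': 2**20, 'GB': 2**30, 'TB': 2**40}
    for suffix, scale in units.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * scale
    # No size suffix: fall back to integer parsing; base 0 accepts '0x...'
    return int(value, 0)

print(to_addr('512MB'))       # 536870912
print(to_addr('0x80000000'))  # 2147483648
```

The `int(value, 0)` fallback is what makes raw hex values from configuration scripts work, while the suffix table keeps the "lives at 512MB" idiom intact.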


- Mitch


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2292/#review5144
---


On June 12, 2014, 10:47 p.m., Ali Saidi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2292/
 ---
 
 (Updated June 12, 2014, 10:47 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10240:d4f21d820604
 ---
 python: Change parsing of Addr so hex values work from scripts
 
 When passed from a configuration script with a hexadecimal value (like
 0x8000), gem5 would error out.  This is because it would call
 toMemorySize which requires the argument to end with a size specifier (like
 1MB, etc).
 
 This modification makes it so raw hex values can be passed through Addr
 parameters from the configuration scripts.
 
 
 Diffs
 -
 
   src/arch/arm/ArmSystem.py a2bb75a474fd 
   src/python/m5/params.py a2bb75a474fd 
 
 Diff: http://reviews.gem5.org/r/2292/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ali Saidi
 


___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Proposal to untemplate the o3 CPU

2014-05-15 Thread Mitch Hayenga via gem5-dev
*Would it be possible to split this change into a series of smaller
patches?*

Thinking about what could be easily split off.

1) The moving of cpu/base_dyn_inst_impl.hh into o3's DynInst
2) The changing of the checker templating

Both of those could go in before the full untemplating patch.  But I'd
guess those are only ~1k lines of the 35k.  I'm unsure how easy the rest
would be to part out given how many cross dependencies exist.  I'll try to
re-do it for a single stage, to see if it is possible, but expect to
discover pain.

This patch was originally written in late December, so some re-basing work
is needed to bring it up to date.  Luckily, other than Tony's fetch patches
and the recent review requests by Steve, not many people have made
significant o3 changes.  Any feedback from here can go into the rebasing
effort.



*and I suspect RB would just fall over.*

Luckily it doesn't.  It ended up being split across 3 very large pages on
the internal ARM review board.



*Have you been able to test that un-templated o3 passes all the
regressions as well?*

Yes, the patch passed all of the regression tests.  It makes no difference
in the stats.




On Thu, May 15, 2014 at 10:56 AM, Korey Sewell via gem5-dev <
gem5-dev@gem5.org> wrote:

 Fair points Ali and Tony.

 I think at the end of the day a ~35k line patch (if it is that) will be a
 pain to review no matter how you slice/dice it although I'd still maintain
 at least dicing it into pieces would allow the reviewers to handle it
 better.

 If people all agree that this is the way to go, then maybe Mitch should
 just go ahead and provide the full patch to the RB (no matter how
 gi-normous!). This at least keeps us in the process of
 local_patch-RB-commit. I'm doubtful that any one person is going to get
 to 35k lines whether it is one patch or multiple patches anyway.


 -Korey



 On Thu, May 15, 2014 at 8:39 AM, Anthony Gutierrez via gem5-dev <
 gem5-dev@gem5.org> wrote:

  I like the idea of this patch as well. In fact, the templating doesn't
  really help with extending the CPU model in my experience.
 
  As far as splitting it up into multiple smaller patches, I don't think
 that
  is necessary or really a good idea unless the changes are truly
  independent. Instead of having a single 35k line patch, we'll have many
  (tens or hundreds?) of patches that add up to 35k lines.
 
 
  Anthony Gutierrez
  http://web.eecs.umich.edu/~atgutier
 
 
  On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev <
  gem5-dev@gem5.org> wrote:
 
   Hi Mitch/gem5-ers,
   I think I would support the untemplating movement as well, so you
 have
  my
   approval there too :)
  
   With regards to implementation, I agree  that splitting up the patches
   makes for a better review although it will take a small amount of work
 to
   get a reasonable splitting granularity.
  
   Have you been able to test that un-templated o3 passes all the
  regressions
   as well?
  
   Lastly, once this is reviewed you'll need to coordinate with people who
   have outstanding o3 patches as it could be a real pain for them to pull
  in
   a patch that all of a sudden untemplates the o3 code.
  
   -Korey
  
  
  
   On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
   gem5-dev@gem5.org> wrote:
  
Hi Mitch,
   
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB
  would
just fall over. Would it be possible to split this change into a
 series
   of
smaller patches?
   
For example, you could split it into one patch per functional unit
 and
  a
final patch that does some cleaning up. You could probably just
 'fake'
   new
un-templated class names as typedefs in the relevant header files.
   
//Andreas
   
   
   
On 2014-05-13 18:23, Mitch Hayenga via gem5-dev wrote:
   
Hi All,
   
Recently I have written a patch that removes templating from the o3
  cpu.
  In general templating in o3 makes the code significantly more
  verbose,
adds compile time overheads, and doesn't actually benefit
 performance.
 The
templating is largely pointless as 1) there aren't multiple versions
  of
fetch, rename, etc to make the  compile time Impl pattern worth
 doing
  2)
Modern CPUs have indirect branch predictors that hide the penalties
  that
the templating was trying to mask.
   
*I was wondering what peoples feelings were on a patch of this
 sort? *
   It
is a quite large modification (~35k line patch file, changes almost
  all
localized to the o3 directory).  Many of the lines are simply
 because
   the
impl header files were changed to source files.
   
Here are a few benefits of the patch
   
- Cleaner, less verbose code.
- Due to the current templating

[gem5-dev] Proposal to untemplate the o3 CPU

2014-05-13 Thread Mitch Hayenga via gem5-dev
Hi All,

Recently I have written a patch that removes templating from the o3 cpu.
 In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance.  The
templating is largely pointless because 1) there aren't multiple versions of
fetch, rename, etc. to make the compile-time Impl pattern worth doing, and 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.

*I was wondering what people's feelings were on a patch of this sort?* It
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory).  Many of the lines are simply because the
impl header files were changed to source files.

Here are a few benefits of the patch

   - Cleaner, less verbose code.
   - Due to the current templating/DynInst interaction, gem5 often requires
   rebuilding the function execution signatures (o3_cpu_exec.o) when a
   modification is made to the o3 cpu.  This patch eliminates having to
   rebuild the execution signatures on o3 changes.
   - Marginally better compile/run times.
   - Moved base_dyn_inst_impl.hh into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
   - Made the checker directly templated on the execution context (DynInst)
   instead of an Impl like o3.  Seems like it was coded dependently on o3.


Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.

*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible 0.0001%


*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real    21m32.240s
user    20m20.019s
sys     1m6.721s

real    21m29.963s
user    20m17.016s
sys     1m7.108s

*Untemplated:*
real    21m24.396s
user    20m13.158s
sys     1m5.798s

real    21m23.177s
user    20m11.911s
sys     1m5.843s


*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real    11m35.848s
user    67m20.828s
sys     2m2.292s

*Untemplated:*
real    11m42.167s
user    67m7.572s
sys     2m2.056s


*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06

*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53


*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
   Run 1) 12m19.700s



Any thoughts on eventually merging this?
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Squash prefetch requests from downstream...

2014-05-09 Thread Mitch Hayenga via gem5-dev
changeset 5c2c4195b839 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5c2c4195b839
description:
mem: Squash prefetch requests from downstream caches

This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu.  It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.

diffstat:

 src/mem/cache/cache_impl.hh |  39 +++
 src/mem/cache/mshr_queue.cc |  16 
 src/mem/cache/mshr_queue.hh |   6 ++
 src/mem/packet.hh   |   4 
 4 files changed, 65 insertions(+), 0 deletions(-)

diffs (133 lines):

diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/cache_impl.hh   Fri May 09 18:58:46 2014 -0400
@@ -1394,6 +1394,12 @@
 if (snoopPkt.sharedAsserted()) {
 pkt->assertShared();
 }
+// If this request is a prefetch and an
+// upper level squashes the prefetch request,
+// make sure to propogate the squash to the requester.
+if (snoopPkt.prefetchSquashed()) {
+pkt->setPrefetchSquashed();
+}
 } else {
 cpuSidePort->sendAtomicSnoop(pkt);
 if (!alreadyResponded && pkt->memInhibitAsserted()) {
@@ -1420,6 +1426,17 @@
 bool respond = blk->isDirty() && pkt->needsResponse();
 bool have_exclusive = blk->isWritable();
 
+// Invalidate any prefetch's from below that would strip write permissions
+// MemCmd::HardPFReq is only observed by upstream caches.  After missing
+// above and in it's own cache, a new MemCmd::ReadReq is created that
+// downstream caches observe.
+if (pkt->cmd == MemCmd::HardPFReq) {
+DPRINTF(Cache, "Squashing prefetch from lower cache %#x\n",
+pkt->getAddr());
+pkt->setPrefetchSquashed();
+return;
+}
+
 if (pkt->isRead() && !invalidate) {
 assert(!needs_exclusive);
 pkt->assertShared();
@@ -1503,6 +1520,14 @@
 Addr blk_addr = blockAlign(pkt->getAddr());
 MSHR *mshr = mshrQueue.findMatch(blk_addr, is_secure);
 
+// Squash any prefetch requests from below on MSHR hits
+if (mshr && pkt->cmd == MemCmd::HardPFReq) {
+DPRINTF(Cache, "Squashing prefetch from lower cache on mshr hit %#x\n",
+pkt->getAddr());
+pkt->setPrefetchSquashed();
+return;
+}
+
 // Let the MSHR itself track the snoop and decide whether we want
 // to go ahead and do the regular cache snoop
 if (mshr && mshr->handleSnoop(pkt, order++)) {
@@ -1730,6 +1755,20 @@
 snoop_pkt.senderState = mshr;
 cpuSidePort->sendTimingSnoopReq(snoop_pkt);
 
+// Check to see if the prefetch was squashed by an upper cache
+if (snoop_pkt.prefetchSquashed()) {
+DPRINTF(Cache, "Prefetch squashed by upper cache.  "
+   "Deallocating mshr target %#x.\n", mshr->addr);
+
+// Deallocate the mshr target
+if (mshr->queue->forceDeallocateTarget(mshr)) {
+// Clear block if this deallocation resulted freed an
+// mshr when all had previously been utilized
+clearBlocked((BlockedCause)(mshr->queue->index));
+}
+return NULL;
+}
+
 if (snoop_pkt.memInhibitAsserted()) {
 markInService(mshr, snoop_pkt);
 DPRINTF(Cache, "Upward snoop of prefetch for addr
diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc   Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/mshr_queue.cc   Fri May 09 18:58:46 2014 -0400
@@ -232,6 +232,22 @@
 mshr->readyIter = addToReadyList(mshr);
 }
 
+bool
+MSHRQueue::forceDeallocateTarget(MSHR *mshr)
+{
+bool was_full = isFull();
+assert(mshr->hasTargets());
+// Pop the prefetch off of the target list
+mshr->popTarget();
+// Delete mshr if no remaining targets
+if (!mshr->hasTargets() && !mshr->promoteDeferredTargets()) {
+deallocateOne(mshr);
+}
+
+// Notify if MSHR queue no longer full
+return was_full && !isFull();
+}
+
 void
 MSHRQueue::squash(int threadNum)
 {
diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh   Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/mshr_queue.hh   Fri May 09 18:58:46 2014 -0400
@@ -194,6 +194,12 @@
 void squash(int threadNum);
 
 /**
+ * Deallocate top target, possibly freeing the MSHR
+ * @return if MSHR queue is no longer full
+ */
+bool forceDeallocateTarget(MSHR *mshr);
+
+/**
  * Returns true if the pending list is not empty.
  * @return True if there are outstanding requests.
  */
diff