Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
Hi, Stores should be fine since they are only sent to the memory system after commit. The relevant functions to look at are sendStore, recvRetry, and writebackStores in lsq_unit_impl.hh. Basically, if a store gets blocked the core just waits until it gets a retry. Since stores are sent in-order from the SQ to the memory system, that queue just waits. The stores are never removed from the SQ unless they succeed. Loads were special in that they were effectively removed from the scheduler, even if they might fail. Stores however always maintain their entries/order until they succeed. On Thu, Jan 29, 2015 at 6:01 PM, Beckmann, Brad via gem5-dev gem5-dev@gem5.org wrote: Hi Mitch, Quick question regarding this patch. Does this patch also handle replaying stores once the cache becomes unblocked? The changes and comments appear to only handle loads, but it seems like stores could have the same problem. Thanks, Brad -Original Message- From: gem5-dev [mailto:gem5-dev-boun...@gem5.org] On Behalf Of Mitch Hayenga via gem5-dev Sent: Wednesday, September 03, 2014 4:38 AM To: gem5-...@m5sim.org Subject: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu changeset 6be8945d226b in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b description: cpu: Fix cache blocked load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. 
diffstat:

 src/cpu/o3/iew.hh               |   13 +-
 src/cpu/o3/iew_impl.hh          |   57 ++
 src/cpu/o3/inst_queue.hh        |   25 -
 src/cpu/o3/inst_queue_impl.hh   |   68 ++---
 src/cpu/o3/lsq.hh               |   27 +-
 src/cpu/o3/lsq_impl.hh          |   23 +---
 src/cpu/o3/lsq_unit.hh          |  198 ---
 src/cpu/o3/lsq_unit_impl.hh     |   40 ++-
 src/cpu/o3/mem_dep_unit.hh      |    4 +-
 src/cpu/o3/mem_dep_unit_impl.hh |    4 +-
 10 files changed, 203 insertions(+), 256 deletions(-)

diffs (truncated from 846 to 300 lines):

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:39 2014 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012 ARM Limited
+ * Copyright (c) 2010-2012, 2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -181,6 +181,12 @@
     /** Re-executes all rescheduled memory instructions. */
     void replayMemInst(DynInstPtr &inst);

+    /** Moves memory instruction onto the list of cache blocked instructions */
+    void blockMemInst(DynInstPtr &inst);
+
+    /** Notifies that the cache has become unblocked */
+    void cacheUnblocked();
+
     /** Sends an instruction to commit through the time buffer. */
     void instToCommit(DynInstPtr &inst);

@@ -233,11 +239,6 @@
      */
     void squashDueToMemOrder(DynInstPtr &inst, ThreadID tid);

-    /** Sends commit proper information for a squash due to memory becoming
-     * blocked (younger issued instructions must be retried).
-     */
-    void squashDueToMemBlocked(DynInstPtr &inst, ThreadID tid);
-
     /** Sets Dispatch to blocked, and signals back to other stages to block.
      */
     void block(ThreadID tid);

diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hh    Wed Sep 03 07:42:38 2014 -0400
+++ b/src/cpu/o3/iew_impl.hh    Wed Sep 03 07:42:39 2014 -0400
@@ -530,29 +530,6 @@
 template <class Impl>
 void
-DefaultIEW<Impl>::squashDueToMemBlocked(DynInstPtr &inst, ThreadID tid)
-{
-    DPRINTF(IEW, "[tid:%i]: Memory blocked, squashing load and younger insts, "
-            "PC: %s [sn:%i].\n", tid, inst->pcState(), inst->seqNum);
-    if (!toCommit->squash[tid] ||
-            inst->seqNum < toCommit->squashedSeqNum[tid]) {
-        toCommit->squash[tid] = true;
-
-        toCommit->squashedSeqNum[tid] = inst->seqNum;
-        toCommit->pc[tid] = inst->pcState();
-        toCommit->mispredictInst[tid] = NULL;
-
-        // Must include the broadcasted SN in the squash.
-        toCommit->includeSquashInst[tid] = true;
-
-        ldstQueue.setLoadBlockedHandled(tid);
-
-        wroteToTimeBuffer = true;
-    }
-}
-
-template <class Impl>
-void
 DefaultIEW<Impl>::block(ThreadID tid)
 {
     DPRINTF(IEW, "[tid:%u]: Blocking.\n", tid);
@@ -610,6 +587,20 @@
 template <class Impl>
 void
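The store-queue behavior described at the top of this thread — stores wait at the head of the SQ on a blocked cache and resend on retry, so order and entries are never lost — can be sketched as a toy model. This is an illustrative Python sketch only, not gem5's C++ implementation; the names ToyPort and ToyStoreQueue are invented, standing in for the real sendStore/recvRetry/writebackStores machinery:

```python
from collections import deque

class ToyPort:
    """Toy memory port: refuses packets while blocked, like a blocked cache."""
    def __init__(self):
        self.blocked = False
        self.accepted = []

    def send_timing(self, store_id):
        if self.blocked:
            return False          # refused; the sender must wait for a retry
        self.accepted.append(store_id)
        return True

class ToyStoreQueue:
    """Stores leave the head only on success, so order and entries survive."""
    def __init__(self, stores):
        self.sq = deque(stores)
        self.awaiting_retry = False

    def writeback_stores(self, port):
        while self.sq and not self.awaiting_retry:
            if port.send_timing(self.sq[0]):
                self.sq.popleft()            # success: retire the entry
            else:
                self.awaiting_retry = True   # blocked: keep entry, just wait

    def recv_retry(self, port):
        self.awaiting_retry = False          # cache unblocked: resume at head
        self.writeback_stores(port)

port = ToyPort()
lsq = ToyStoreQueue([1, 2, 3])
port.blocked = True
lsq.writeback_stores(port)   # nothing drains while the cache is blocked
port.blocked = False
lsq.recv_retry(port)         # the retry drains the queue, in order
```

Loads, by contrast, were pulled out of the scheduler on issue — which is exactly why they needed the replay fix in this changeset.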
Re: [gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
Ahh, yeah I'm familiar with speculatively grabbing coherence rights for stores prior to commit. But the store isn't done, right? It's just globally ordered. And other system activity might make that ownership go away prior to the actual store commit. How about just dropping/ignoring the prefetch if the blocked case actually happens? On Fri, Jan 30, 2015 at 6:16 PM, Beckmann, Brad via gem5-dev gem5-dev@gem5.org wrote: Thanks Mitch for the quick reply. While assuming stores are only sent after commit is true for the current O3 model, aggressive out-of-order processors send store addresses to the memory system as soon as they are available (i.e. speculatively). We actually have a patch that provides such a capability, but I'm having a tough time figuring out how to merge it with your change. Any suggestions you may have would be very much appreciated. Thanks, Brad
Re: [gem5-dev] simpoints and KVM
Hi Mike, I'm the one who wrote the initial version of the simpoint collection/generation a few years ago. I enforced the fastmem option primarily because I didn't see it as necessary to simulate caches during simpoint generation, and it made simulation faster. You can simply disable this and it should all still work. For the --cpu-type=atomic option, initially simpoints were hardcoded directly into the atomic CPU. Since then, they've been changed to use the probe system. However, a quick grep of the code shows the initialization for the SimPoint probe only exists in the atomic CPU. If you registered the probe point with the TimingCPU as well, then that should work (I think). On Mon, Jan 12, 2015 at 5:02 PM, mike upton via gem5-dev gem5-dev@gem5.org wrote: I am trying to enable simpoint generation with kvm enabled. Is there anything that inherently blocks this? Simpoints are currently enabled only with --fastmem and --cpu-type=atomic. How fundamental are each of these restrictions? Thanks, Mike ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
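The probe hookup described above can be sketched generically: a CPU model owns a named probe point, and any listener (here, a SimPoint-style collector) attaches to it — so any CPU model that registers the probe point can drive the same listener. This is a hypothetical Python sketch of a notify/listen pattern, not gem5's actual ProbePoint API; ToyCPU and the probe name are invented:

```python
class ProbePoint:
    """Minimal notify/listen probe, loosely modeled on a probe system."""
    def __init__(self, name):
        self.name = name
        self.listeners = []

    def add_listener(self, fn):
        self.listeners.append(fn)

    def notify(self, arg):
        # Fan out to every registered listener in attach order.
        for fn in self.listeners:
            fn(arg)

class ToyCPU:
    """A CPU model that registers the probe point; any model that does so
    could drive the same SimPoint-style listener."""
    def __init__(self):
        self.retired_insts = ProbePoint("RetiredInstsPC")

    def tick(self, pc):
        self.retired_insts.notify(pc)

seen = []
cpu = ToyCPU()
cpu.retired_insts.add_listener(seen.append)  # e.g. a BBV collector
for pc in (0x400, 0x404, 0x408):
    cpu.tick(pc)
```

The point of the design is that the collector never knows which CPU model is ticking it — which is why wiring the probe into the timing (or KVM) CPU is plausibly all that's missing.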
[gem5-dev] changeset in gem5: mem: Rework the structuring of the prefetchers
changeset b9646f4546ad in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b9646f4546ad
description:
        mem: Rework the structuring of the prefetchers

        Re-organizes the prefetcher class structure. Previously the
        BasePrefetcher forced multiple assumptions on the prefetchers that
        inherited from it. This patch makes the BasePrefetcher class truly
        representative of base functionality. For example, the base class no
        longer enforces FIFO order. Instead, prefetchers with FIFO requests
        (like the existing stride and tagged prefetchers) now inherit from a
        new QueuedPrefetcher base class. Finally, the stride-based prefetcher
        now assumes a customizable lookup table (sets/ways) rather than the
        previous fully associative structure.

diffstat:

 src/mem/cache/cache_impl.hh          |   10 +-
 src/mem/cache/prefetch/Prefetcher.py |   62 ---
 src/mem/cache/prefetch/SConscript    |    1 +
 src/mem/cache/prefetch/base.cc       |  258 --
 src/mem/cache/prefetch/base.hh       |  139 +-
 src/mem/cache/prefetch/queued.cc     |  213 ++++
 src/mem/cache/prefetch/queued.hh     |  108 ++
 src/mem/cache/prefetch/stride.cc     |  205 +++---
 src/mem/cache/prefetch/stride.hh     |   55 +++---
 src/mem/cache/prefetch/tagged.cc     |   16 +-
 src/mem/cache/prefetch/tagged.hh     |   19 +-
 11 files changed, 599 insertions(+), 487 deletions(-)

diffs (truncated from 1401 to 300 lines):

diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
@@ -535,7 +535,7 @@
     bool satisfied = access(pkt, blk, lat, writebacks);

     // track time of availability of next prefetch, if any
-    Tick next_pf_time = 0;
+    Tick next_pf_time = MaxTick;

     bool needsResponse = pkt->needsResponse();

@@ -548,7 +548,7 @@
         // Don't notify on SWPrefetch
         if (!pkt->cmd.isSWPrefetch())
-            next_pf_time = prefetcher->notify(pkt, time);
+            next_pf_time = prefetcher->notify(pkt);
     }

     if (needsResponse) {
@@ -648,7 +648,7 @@
         if (prefetcher) {
             // Don't notify on SWPrefetch
             if (!pkt->cmd.isSWPrefetch())
-                next_pf_time = prefetcher->notify(pkt, time);
+                next_pf_time = prefetcher->notify(pkt);
         }
     } else {
@@ -688,12 +688,12 @@
         if (prefetcher) {
             // Don't notify on SWPrefetch
             if (!pkt->cmd.isSWPrefetch())
-                next_pf_time = prefetcher->notify(pkt, time);
+                next_pf_time = prefetcher->notify(pkt);
         }
     }

-    if (next_pf_time != 0)
+    if (next_pf_time != MaxTick)
         requestMemSideBus(Request_PF, std::max(time, next_pf_time));

     // copy writebacks to write buffer

diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/prefetch/Prefetcher.py
--- a/src/mem/cache/prefetch/Prefetcher.py      Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/prefetch/Prefetcher.py      Tue Dec 23 09:31:18 2014 -0500
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 ARM Limited
+# Copyright (c) 2012, 2014 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -37,6 +37,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Authors: Ron Dreslinski
+#          Mitch Hayenga

 from ClockedObject import ClockedObject
 from m5.params import *
@@ -46,39 +47,46 @@
     type = 'BasePrefetcher'
     abstract = True
     cxx_header = "mem/cache/prefetch/base.hh"
-    size = Param.Int(100,
-        "Number of entries in the hardware prefetch queue")
-    cross_pages = Param.Bool(False,
-        "Allow prefetches to cross virtual page boundaries")
-    serial_squash = Param.Bool(False,
-        "Squash prefetches with a later time on a subsequent miss")
-    degree = Param.Int(1,
-        "Degree of the prefetch depth")
-    latency = Param.Cycles('1', "Latency of the prefetcher")
-    use_master_id = Param.Bool(True,
-        "Use the master id to separate calculations of prefetches")
-    data_accesses_only = Param.Bool(False,
-        "Only prefetch on data not on instruction accesses")
-    on_miss_only = Param.Bool(False,
-        "Only prefetch on miss (as opposed to always)")
-    on_read_only = Param.Bool(False,
-        "Only prefetch on read requests (write requests ignored)")
-    on_prefetch = Param.Bool(True,
-        "Let lower cache prefetcher train on prefetch requests")
-    inst_tagged = Param.Bool(True,
-        "Perform a tagged prefetch for instruction fetches always")
     sys = Param.System(Parent.any, "System this prefetcher belongs to")

-class
[gem5-dev] changeset in gem5: mem: Fix event scheduling issue for prefetches
changeset 00965520c9f5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=00965520c9f5
description:
        mem: Fix event scheduling issue for prefetches

        The cache's MemSidePacketQueue schedules a sendEvent based upon
        nextMSHRReadyTime(), which is the time when the next MSHR is ready or
        whenever a future prefetch is ready. However, a prefetch being ready
        does not guarantee that it can obtain an MSHR. So, when all MSHRs are
        full, the simulation ends up unnecessarily scheduling a sendEvent
        every picosecond until an MSHR is finally freed and the prefetch can
        happen. This patch fixes this by not signaling the prefetch ready
        time if the prefetch could not be generated. The event is rescheduled
        as soon as an MSHR becomes available.

diffstat:

 src/mem/cache/cache_impl.hh |  13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 97aa1ee1c2d9 -r 00965520c9f5 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
@@ -1197,6 +1197,15 @@
         if (wasFull && !mq->isFull()) {
             clearBlocked((BlockedCause)mq->index);
         }
+
+        // Request the bus for a prefetch if this deallocation freed enough
+        // MSHRs for a prefetch to take place
+        if (prefetcher && mq == &mshrQueue && mshrQueue.canPrefetch()) {
+            Tick next_pf_time = std::max(prefetcher->nextPrefetchReadyTime(),
+                                         curTick());
+            if (next_pf_time != MaxTick)
+                requestMemSideBus(Request_PF, next_pf_time);
+        }
     }

     // copy writebacks to write buffer
@@ -1955,7 +1964,9 @@
     Tick nextReady = std::min(mshrQueue.nextMSHRReadyTime(),
                               writeBuffer.nextMSHRReadyTime());

-    if (prefetcher) {
+    // Don't signal prefetch ready time if no MSHRs available
+    // Will signal once enough MSHRs are deallocated
+    if (prefetcher && mshrQueue.canPrefetch()) {
        nextReady = std::min(nextReady, prefetcher->nextPrefetchReadyTime());
     }
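The scheduling fix can be modeled in miniature: only advertise a prefetch ready time when an MSHR could actually be allocated, and re-run the check once a deallocation frees one. The sketch below is an illustrative Python toy with invented names (ToyCache, maybe_schedule), not gem5's event queue:

```python
MAXTICK = float("inf")  # sentinel meaning "nothing to schedule"

class ToyCache:
    """Only advertise the prefetcher's ready time when an MSHR is free,
    so no event is (re)scheduled every tick while the MSHRs are full."""
    def __init__(self, mshrs, pf_ready):
        self.free_mshrs = mshrs
        self.pf_ready = pf_ready   # tick at which a prefetch is ready
        self.scheduled = []        # ticks at which a send event was scheduled

    def can_prefetch(self):
        return self.free_mshrs > 0

    def next_ready_time(self):
        # The fix: report MAXTICK (i.e. "never") while MSHRs are full.
        return self.pf_ready if self.can_prefetch() else MAXTICK

    def maybe_schedule(self, now):
        t = self.next_ready_time()
        if t != MAXTICK:
            self.scheduled.append(max(t, now))

    def free_one(self, now):
        self.free_mshrs += 1       # deallocation retries the prefetch once
        self.maybe_schedule(now)

cache = ToyCache(mshrs=0, pf_ready=100)
cache.maybe_schedule(now=100)   # full: no wasteful event is scheduled
cache.free_one(now=250)         # freed: one event, no earlier than now
```

Without the `can_prefetch()` gate, `maybe_schedule` would fire an event at every tick from 100 onward even though no MSHR could accept the prefetch.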
[gem5-dev] changeset in gem5: mem: Add parameter to reserve MSHR entries fo...
changeset 0b969a35781f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b969a35781f
description:
        mem: Add parameter to reserve MSHR entries for demand access

        Adds a new parameter that reserves some number of MSHR entries for
        demand accesses. This helps prevent prefetchers from taking all
        MSHRs, forcing demand requests from the CPU to stall.

diffstat:

 src/mem/cache/BaseCache.py  |   1 +
 src/mem/cache/base.cc       |   4 ++--
 src/mem/cache/cache_impl.hh |   2 +-
 src/mem/cache/mshr_queue.cc |   8 +---
 src/mem/cache/mshr_queue.hh |  19 ++-
 5 files changed, 27 insertions(+), 7 deletions(-)

diffs (101 lines):

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/BaseCache.py
--- a/src/mem/cache/BaseCache.py        Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/BaseCache.py        Tue Dec 23 09:31:18 2014 -0500
@@ -54,6 +54,7 @@
     max_miss_count = Param.Counter(0,
         "number of misses to handle before calling exit")
     mshrs = Param.Int("number of MSHRs (max outstanding requests)")
+    demand_mshr_reserve = Param.Int(1, "mshrs to reserve for demand access")
     size = Param.MemorySize("capacity in bytes")
     forward_snoops = Param.Bool(True,
         "forward snoops from mem side to cpu side")

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/base.cc
--- a/src/mem/cache/base.cc     Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/base.cc     Tue Dec 23 09:31:18 2014 -0500
@@ -68,8 +68,8 @@
 BaseCache::BaseCache(const Params *p)
     : MemObject(p),
       cpuSidePort(nullptr), memSidePort(nullptr),
-      mshrQueue("MSHRs", p->mshrs, 4, MSHRQueue_MSHRs),
-      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000,
+      mshrQueue("MSHRs", p->mshrs, 4, p->demand_mshr_reserve, MSHRQueue_MSHRs),
+      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000, 0,
                   MSHRQueue_WriteBuffer),
       blkSize(p->system->cacheLineSize()),
       hitLatency(p->hit_latency),

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
@@ -1841,7 +1841,7 @@
     // fall through... no pending requests.  Try a prefetch.
     assert(!miss_mshr && !write_mshr);
-    if (prefetcher && !mshrQueue.isFull()) {
+    if (prefetcher && mshrQueue.canPrefetch()) {
         // If we have a miss queue slot, we can try a prefetch
         PacketPtr pkt = prefetcher->getPacket();
         if (pkt) {

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.cc       Tue Dec 23 09:31:18 2014 -0500
@@ -52,10 +52,12 @@
 using namespace std;

 MSHRQueue::MSHRQueue(const std::string &_label,
-                     int num_entries, int reserve, int _index)
+                     int num_entries, int reserve, int demand_reserve,
+                     int _index)
     : label(_label), numEntries(num_entries + reserve - 1),
-      numReserve(reserve), registers(numEntries),
-      drainManager(NULL), allocated(0), inServiceEntries(0), index(_index)
+      numReserve(reserve), demandReserve(demand_reserve),
+      registers(numEntries), drainManager(NULL), allocated(0),
+      inServiceEntries(0), index(_index)
 {
     for (int i = 0; i < numEntries; ++i) {
         registers[i].queue = this;

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.hh       Tue Dec 23 09:31:18 2014 -0500
@@ -77,6 +77,12 @@
      */
     const int numReserve;

+    /**
+     * The number of entries to reserve for future demand accesses.
+     * Prevent prefetcher from taking all mshr entries
+     */
+    const int demandReserve;
+
     /** MSHR storage. */
     std::vector<MSHR> registers;
     /** Holds pointers to all allocated entries. */
@@ -106,9 +112,11 @@
      * @param num_entries The number of entries in this queue.
      * @param reserve The minimum number of entries needed to satisfy
      * any access.
+     * @param demand_reserve The minimum number of entries needed to satisfy
+     * demand accesses.
      */
     MSHRQueue(const std::string &_label, int num_entries, int reserve,
-              int index);
+              int demand_reserve, int index);

     /**
      * Find the first MSHR that matches the provided address.
@@ -218,6 +226,15 @@
     }

     /**
+     * Returns true if sufficient mshrs for prefetch.
+     * @return True if sufficient mshrs for prefetch.
+     */
+    bool canPrefetch() const
+    {
+        return (allocated < numEntries - (numReserve + demandReserve));
+    }
+
+    /**
      * Returns the MSHR at the head of the readyList.
      * @return The next request to service.
      */
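The canPrefetch() predicate added by this changeset is simple arithmetic: a prefetch may allocate an MSHR only if doing so leaves both the general reserve and the new demand reserve untouched. A quick sketch mirroring the C++ one-liner (Python, illustrative values only):

```python
def can_prefetch(allocated, num_entries, num_reserve, demand_reserve):
    """Mirror of MSHRQueue::canPrefetch(): a prefetch may proceed only if
    it leaves num_reserve + demand_reserve entries free for other uses."""
    return allocated < num_entries - (num_reserve + demand_reserve)

# With 8 entries, a reserve of 4, and a demand reserve of 1, prefetches may
# occupy at most 3 entries: allocation 0..2 is allowed, 3+ is not.
ok = can_prefetch(2, num_entries=8, num_reserve=4, demand_reserve=1)
blocked = can_prefetch(3, num_entries=8, num_reserve=4, demand_reserve=1)
```

With the default demand_mshr_reserve of 1, a prefetch-heavy workload can therefore never quite fill the queue, so at least one demand miss can always allocate.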
[gem5-dev] changeset in gem5: mem: Change prefetcher to use random_mt
changeset 63edd4a1243f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=63edd4a1243f
description:
        mem: Change prefetcher to use random_mt

        Prefetchers previously used rand() to generate random numbers.

diffstat:

 src/mem/cache/prefetch/stride.cc |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diffs (20 lines):

diff -r 7982e539d003 -r 63edd4a1243f src/mem/cache/prefetch/stride.cc
--- a/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
+++ b/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
@@ -46,6 +46,7 @@
  * Stride Prefetcher template instantiations.
  */

+#include "base/random.hh"
 #include "debug/HWPrefetch.hh"
 #include "mem/cache/prefetch/stride.hh"

@@ -176,7 +177,7 @@
 {
     // Rand replacement for now
     int set = pcHash(pc);
-    int way = rand() % pcTableAssoc;
+    int way = random_mt.random<int>(0, pcTableAssoc - 1);
     DPRINTF(HWPrefetch, "Victimizing lookup table[%d][%d].\n", set, way);
     return pcTable[master_id][set][way];
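The point of switching from libc's rand() to the simulator's own seeded Mersenne Twister is reproducibility: results no longer depend on global libc state, and two runs with the same seed pick the same victims. A Python analogue, where an explicitly seeded random.Random stands in for gem5's random_mt (victim_way is an invented name for illustration):

```python
import random

def victim_way(rng, pc_table_assoc):
    """Pick a random victim way from an explicitly seeded generator,
    in the spirit of replacing rand() with a simulator-owned RNG."""
    return rng.randrange(pc_table_assoc)  # uniform over [0, assoc)

# Two generators with the same seed produce identical victim sequences,
# which is what makes simulations repeatable run-to-run.
a = random.Random(1234)
b = random.Random(1234)
ways_a = [victim_way(a, 4) for _ in range(8)]
ways_b = [victim_way(b, 4) for _ in range(8)]
```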
[gem5-dev] changeset in gem5: mem: Fix bug relating to writebacks and prefe...
changeset 97aa1ee1c2d9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=97aa1ee1c2d9
description:
        mem: Fix bug relating to writebacks and prefetches

        Previously the code commented about an unhandled case where it might
        be possible for a writeback to arrive after a prefetch was generated
        but before it was sent to the memory system. I hit that case. Luckily
        the prefetchSquash() logic already in the code handles dropping
        prefetch requests in certain circumstances.

diffstat:

 src/mem/cache/cache_impl.hh |  12
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r b9646f4546ad -r 97aa1ee1c2d9 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh       Tue Dec 23 09:31:18 2014 -0500
@@ -1892,12 +1892,6 @@
     BlkType *blk = tags->findBlock(mshr->addr, mshr->isSecure);

     if (tgt_pkt->cmd == MemCmd::HardPFReq) {
-        // It might be possible for a writeback to arrive between
-        // the time the prefetch is placed in the MSHRs and when
-        // it's selected to send... if so, this assert will catch
-        // that, and then we'll have to figure out what to do.
-        assert(blk == NULL);
-
         // We need to check the caches above us to verify that
         // they don't have a copy of this block in the dirty state
         // at the moment. Without this check we could get a stale
@@ -1909,8 +1903,10 @@
         cpuSidePort->sendTimingSnoopReq(&snoop_pkt);

         // Check to see if the prefetch was squashed by an upper cache
-        if (snoop_pkt.prefetchSquashed()) {
-            DPRINTF(Cache, "Prefetch squashed by upper cache.  "
+        // Or if a writeback arrived between the time the prefetch was
+        // placed in the MSHRs and when it was selected to send.
+        if (snoop_pkt.prefetchSquashed() || blk != NULL) {
+            DPRINTF(Cache, "Prefetch squashed by cache.  "
                     "Deallocating mshr target %#x.\n", mshr->addr);

             // Deallocate the mshr target
Re: [gem5-dev] Asimbench : Turn off dumping of file system.framebuffer.bmp
Set the enable_capture parameter from src/dev/arm/RealView.py to false. On Sat, Nov 22, 2014 at 8:46 PM, Lokesh Jindal via gem5-dev gem5-dev@gem5.org wrote: Hello everyone, While running asimbench, I see that the benchmarks dump the file system.framebuffer.bmp very frequently in the output directory. This is creating a lot of unnecessary writes to disk and slowing down my simulation/file-system. It would be a life saver if someone could suggest a way to turn off dumping of this bmp file. Thanks and Regards, Lokesh Jindal
[gem5-dev] changeset in gem5: mem: Delete unused variable in Garnet Network...
changeset 50bbc64efbb8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=50bbc64efbb8
description:
        mem: Delete unused variable in Garnet NetworkLink

        With recent changes, OSX clang compilation fails due to an unused
        variable.

diffstat:

 src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh |  1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diffs (11 lines):

diff -r d1dce0b728b6 -r 50bbc64efbb8 src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh
--- a/src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh       Wed Nov 12 09:05:22 2014 -0500
+++ b/src/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh       Wed Nov 12 09:05:23 2014 -0500
@@ -74,7 +74,6 @@
     flitBuffer_d *linkBuffer;
     Consumer *link_consumer;
     flitBuffer_d *link_srcQueue;
-    int m_flit_width;

     // Statistical variables
     unsigned int m_link_utilized;
[gem5-dev] changeset in gem5: cpu: Add writeback modeling for drain functio...
changeset e57f5bffc553 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e57f5bffc553
description:
        cpu: Add writeback modeling for drain functionality

        It is possible for the O3 CPU to consider itself drained and later
        have a squashed instruction perform a writeback. This patch re-adds
        tracking of in-flight instructions to prevent falsely signaling a
        drained event.

diffstat:

 src/cpu/o3/inst_queue.hh      |  3 +++
 src/cpu/o3/inst_queue_impl.hh |  7 ++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diffs (51 lines):

diff -r 7e54a9a9f6b2 -r e57f5bffc553 src/cpu/o3/inst_queue.hh
--- a/src/cpu/o3/inst_queue.hh  Wed Oct 29 23:18:26 2014 -0500
+++ b/src/cpu/o3/inst_queue.hh  Wed Oct 29 23:18:27 2014 -0500
@@ -437,6 +437,9 @@
     /** The number of physical registers in the CPU. */
     unsigned numPhysRegs;

+    /** Number of instructions currently in flight to FUs */
+    int wbOutstanding;
+
     /** Delay between commit stage and the IQ.
      *  @todo: Make there be a distinction between the delays within IEW.
      */

diff -r 7e54a9a9f6b2 -r e57f5bffc553 src/cpu/o3/inst_queue_impl.hh
--- a/src/cpu/o3/inst_queue_impl.hh     Wed Oct 29 23:18:26 2014 -0500
+++ b/src/cpu/o3/inst_queue_impl.hh     Wed Oct 29 23:18:27 2014 -0500
@@ -415,6 +415,7 @@
     deferredMemInsts.clear();
     blockedMemInsts.clear();
     retryMemInsts.clear();
+    wbOutstanding = 0;
 }

 template <class Impl>
@@ -444,7 +445,9 @@
 bool
 InstructionQueue<Impl>::isDrained() const
 {
-    bool drained = dependGraph.empty() && instsToExecute.empty();
+    bool drained = dependGraph.empty() &&
+                   instsToExecute.empty() &&
+                   wbOutstanding == 0;
     for (ThreadID tid = 0; tid < numThreads; ++tid)
         drained = drained && memDepUnit[tid].isDrained();

@@ -723,6 +726,7 @@
     assert(!cpu->switchedOut());
     // The CPU could have been sleeping until this op completed (*extremely*
     // long latency op).  Wake it if it was.  This may be overkill.
+    --wbOutstanding;
     iewStage->wakeCPU();

     if (fu_idx > -1)
@@ -823,6 +827,7 @@
     } else {
         Cycles issue_latency = fuPool->getIssueLatency(op_class);

         // Generate completion event for the FU
+        ++wbOutstanding;
         FUCompletion *execution = new FUCompletion(issuing_inst,
                                                    idx, this);
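The wbOutstanding bookkeeping above can be sketched in miniature: increment when a completion event is generated at issue, decrement when the writeback arrives, and refuse to report "drained" while the count is nonzero. This is an illustrative Python toy (ToyIQ and its method names are invented), not the O3 code itself:

```python
class ToyIQ:
    """Count completions in flight to functional units so the queue does
    not report drained while a writeback is still pending."""
    def __init__(self):
        self.insts_to_execute = []
        self.wb_outstanding = 0   # completion events generated but not done

    def issue(self, inst):
        self.insts_to_execute.append(inst)

    def execute_all(self):
        # Each executed op schedules a completion (writeback) event.
        while self.insts_to_execute:
            self.insts_to_execute.pop()
            self.wb_outstanding += 1

    def complete_writeback(self):
        self.wb_outstanding -= 1

    def is_drained(self):
        # The fix: an empty execute list alone is not enough.
        return not self.insts_to_execute and self.wb_outstanding == 0

iq = ToyIQ()
iq.issue("ld")
iq.execute_all()
drained_too_early = iq.is_drained()   # False: a writeback is in flight
iq.complete_writeback()
```

Without the counter, `is_drained` would return True right after `execute_all`, which is exactly the false drain signal the changeset fixes.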
Re: [gem5-dev] Missing m_flit_width definition.
This kind of thing happens fairly often. The compilation is set to error out if it detects unused variables. Over time the compilers get smarter and detect variables as unused that, for whatever reason, they previously couldn't. In my experience Clang on OS X tends to catch most of these in the committed code, since most of the other developers develop solely on GCC/Linux. I fixed this error as well last week, just haven't gotten around to submitting a patch. On Sat, Oct 25, 2014 at 3:03 AM, Todd Bezenek via gem5-dev gem5-dev@gem5.org wrote: (I'm a gem5 newbie, so please excuse me if this is an easy/known issue.) Please let me know if this is the wrong place to post this. I'm using Mac OS X (Mavericks). I downloaded the standard gem5 source tree (on Oct. 25, 2014):

hg clone http://repo.gem5.org/gem5

and built it with the following error:

scons build/ARM/gem5.opt
...
[ CXX] ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.cc -> .o
In file included from build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.cc:31:
In file included from build/ARM/mem/ruby/network/garnet/fixed-pipeline/CreditLink_d.hh:34:
build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.hh:77:9: error: private field 'm_flit_width' is not used [-Werror,-Wunused-private-field]
    int m_flit_width;
        ^
1 error generated.
scons: *** [build/ARM/mem/ruby/network/garnet/fixed-pipeline/NetworkLink_d.o] Error 1
scons: building terminated because of errors.

I fixed this by commenting out the definition of m_flit_width and everything worked. I figure it would be a good thing to fix this for the distribution in general. -Todd -- Todd Bezenek http://www.linkedin.com/in/toddbezenek/, MScCS, MScEE beze...@gmail.com "A people hire A people, B people hire C people."
--Jim Gray http://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
[gem5-dev] Changeset 10484 breaks compilation on Mac OS X
changeset: 10484:6709bbcf564d
user:      Michael Adler michael.ad...@intel.com
date:      Mon Oct 20 16:44:53 2014 -0500
summary:   sim: implement getdents/getdents64 in user mode

Errors:

build/ARM/sim/syscall_emul.cc:881:30: error: use of undeclared identifier 'SYS_getdents'
    int bytes_read = syscall(SYS_getdents, fd, bufArg.bufferPtr(), nbytes);
build/ARM/sim/syscall_emul.cc:899:30: error: use of undeclared identifier 'SYS_getdents64'
    int bytes_read = syscall(SYS_getdents64, fd, bufArg.bufferPtr(), nbytes);

It looks like this recent changeset for syscall emulation directly makes a syscall to SYS_getdents, which does not exist on Mac OS X. From the getdents manpage: "This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface." - Mitch
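The manpage's advice — use the portable directory-reading interface rather than a raw getdents syscall — is the general fix here. A Python analogue of the same idea, where os.listdir plays the role of readdir(3) as the portable API (the helper name is invented for illustration):

```python
import os
import tempfile

def list_dir_portable(path):
    """List directory entries through the portable API instead of a raw
    Linux-only getdents syscall; works the same on Linux and OS X."""
    return sorted(os.listdir(path))  # excludes "." and ".."

# Demonstrate on a throwaway directory with two known entries.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    open(os.path.join(d, "b.txt"), "w").close()
    entries = list_dir_portable(d)
```

For the actual gem5 fix, the C-level equivalent would be to iterate with readdir(3) and marshal the results into the guest's dirent layout, instead of forwarding a host syscall number that only Linux defines.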
[gem5-dev] changeset in gem5: cpu: Remove Ozone CPU from the source tree
changeset cba563d00376 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=cba563d00376 description: cpu: Remove Ozone CPU from the source tree The Ozone CPU is now very much out of date and completely non-functional, with no one actively working on restoring it. It is a source of confusion for new users who attempt to use it before realizing its current state. RIP diffstat: src/cpu/checker/cpu_impl.hh| 2 +- src/cpu/o3/SConscript | 8 +- src/cpu/ozone/OzoneCPU.py | 118 - src/cpu/ozone/OzoneChecker.py |38 - src/cpu/ozone/SConscript |57 - src/cpu/ozone/SConsopts|35 - src/cpu/ozone/SimpleOzoneCPU.py| 111 - src/cpu/ozone/back_end.cc |34 - src/cpu/ozone/back_end.hh | 535 src/cpu/ozone/back_end_impl.hh | 1919 src/cpu/ozone/base_dyn_inst.cc |35 - src/cpu/ozone/bpred_unit.cc|36 - src/cpu/ozone/checker_builder.cc | 100 - src/cpu/ozone/cpu.cc |37 - src/cpu/ozone/cpu.hh | 419 -- src/cpu/ozone/cpu_builder.cc | 200 --- src/cpu/ozone/cpu_impl.hh | 886 -- src/cpu/ozone/dyn_inst.cc |37 - src/cpu/ozone/dyn_inst.hh | 225 --- src/cpu/ozone/dyn_inst_impl.hh | 277 src/cpu/ozone/ea_list.cc |80 - src/cpu/ozone/ea_list.hh |75 - src/cpu/ozone/front_end.cc |36 - src/cpu/ozone/front_end.hh | 317 - src/cpu/ozone/front_end_impl.hh| 995 src/cpu/ozone/inorder_back_end.cc |34 - src/cpu/ozone/inorder_back_end.hh | 382 -- src/cpu/ozone/inorder_back_end_impl.hh | 527 src/cpu/ozone/inst_queue.cc|38 - src/cpu/ozone/inst_queue.hh| 508 src/cpu/ozone/inst_queue_impl.hh | 1349 -- src/cpu/ozone/lsq_unit.cc |36 - src/cpu/ozone/lsq_unit.hh | 636 -- src/cpu/ozone/lsq_unit_impl.hh | 844 -- src/cpu/ozone/lw_back_end.cc |34 - src/cpu/ozone/lw_back_end.hh | 435 --- src/cpu/ozone/lw_back_end_impl.hh | 1677 --- src/cpu/ozone/lw_lsq.cc|36 - src/cpu/ozone/lw_lsq.hh| 695 --- src/cpu/ozone/lw_lsq_impl.hh | 965 src/cpu/ozone/null_predictor.hh| 105 - src/cpu/ozone/ozone_base_dyn_inst.cc |39 - src/cpu/ozone/ozone_impl.hh|74 - src/cpu/ozone/rename_table.cc |36 - src/cpu/ozone/rename_table.hh |56 - 
src/cpu/ozone/rename_table_impl.hh |58 - src/cpu/ozone/simple_base_dyn_inst.cc |39 - src/cpu/ozone/simple_cpu_builder.cc| 196 --- src/cpu/ozone/simple_impl.hh |70 - src/cpu/ozone/simple_params.hh | 192 --- src/cpu/ozone/thread_state.hh | 152 -- 51 files changed, 4 insertions(+), 15821 deletions(-) diffs (truncated from 16048 to 300 lines): diff -r ceb471d74fe9 -r cba563d00376 src/cpu/checker/cpu_impl.hh --- a/src/cpu/checker/cpu_impl.hh Thu Oct 09 17:51:57 2014 -0400 +++ b/src/cpu/checker/cpu_impl.hh Thu Oct 09 17:51:58 2014 -0400 @@ -410,7 +410,7 @@ if (FullSystem) { // @todo: Determine if these should happen only if the // instruction hasn't faulted. In the SimpleCPU case this may -// not be true, but in the O3 or Ozone case this may be true. +// not be true, but in the O3 case this may be true. Addr oldpc; int count = 0; do { diff -r ceb471d74fe9 -r cba563d00376 src/cpu/o3/SConscript --- a/src/cpu/o3/SConscript Thu Oct 09 17:51:57 2014 -0400 +++ b/src/cpu/o3/SConscript Thu Oct 09 17:51:58 2014 -0400 @@ -32,11 +32,6 @@ Import('*') -if 'O3CPU' in env['CPU_MODELS'] or 'OzoneCPU' in env['CPU_MODELS']: -DebugFlag('CommitRate') -DebugFlag('IEW') -DebugFlag('IQ') - if 'O3CPU' in env['CPU_MODELS']: SimObject('FUPool.py') SimObject('FuncUnitConfig.py') @@ -64,6 +59,9 @@ Source('store_set.cc') Source('thread_context.cc') +DebugFlag('CommitRate') +DebugFlag('IEW') +DebugFlag('IQ') DebugFlag('LSQ') DebugFlag('LSQUnit') DebugFlag('MemDepUnit') diff -r ceb471d74fe9 -r cba563d00376 src/cpu/ozone/OzoneCPU.py --- a/src/cpu/ozone/OzoneCPU.py Thu Oct 09 17:51:57 2014 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 + @@ -1,118 +0,0 @@ -# Copyright (c) 2006-2007 The Regents of The University of Michigan -# All rights reserved. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided
[gem5-dev] changeset in gem5: mem: Remove the GHB prefetcher from the sourc...
changeset 452a5f178ec5 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=452a5f178ec5 description: mem: Remove the GHB prefetcher from the source tree There are two primary issues with this code which make it deserving of deletion. 1) GHB is a way to structure a prefetcher, not a definitive type of prefetcher 2) This prefetcher isn't even structured like a GHB prefetcher. It's basically a worse version of the stride prefetcher. It primarily serves to confuse new gem5 users and most functionality is already present in the stride prefetcher. diffstat: src/mem/cache/prefetch/Prefetcher.py | 5 - src/mem/cache/prefetch/SConscript| 1 - src/mem/cache/prefetch/ghb.cc| 97 src/mem/cache/prefetch/ghb.hh| 77 4 files changed, 0 insertions(+), 180 deletions(-) diffs (208 lines): diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/Prefetcher.py --- a/src/mem/cache/prefetch/Prefetcher.py Sat Sep 20 17:17:43 2014 -0400 +++ b/src/mem/cache/prefetch/Prefetcher.py Sat Sep 20 17:17:44 2014 -0400 @@ -69,11 +69,6 @@ Perform a tagged prefetch for instruction fetches always) sys = Param.System(Parent.any, System this device belongs to) -class GHBPrefetcher(BasePrefetcher): -type = 'GHBPrefetcher' -cxx_class = 'GHBPrefetcher' -cxx_header = mem/cache/prefetch/ghb.hh - class StridePrefetcher(BasePrefetcher): type = 'StridePrefetcher' cxx_class = 'StridePrefetcher' diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/SConscript --- a/src/mem/cache/prefetch/SConscript Sat Sep 20 17:17:43 2014 -0400 +++ b/src/mem/cache/prefetch/SConscript Sat Sep 20 17:17:44 2014 -0400 @@ -33,7 +33,6 @@ SimObject('Prefetcher.py') Source('base.cc') -Source('ghb.cc') Source('stride.cc') Source('tagged.cc') diff -r ab8b8601b6ff -r 452a5f178ec5 src/mem/cache/prefetch/ghb.cc --- a/src/mem/cache/prefetch/ghb.cc Sat Sep 20 17:17:43 2014 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 + @@ -1,97 +0,0 @@ -/* - * Copyright (c) 2012-2013 ARM Limited - * All rights reserved - * - * The 
license below extends only to copyright in the software and shall - * not be construed as granting a license to any other intellectual - * property including but not limited to intellectual property relating - * to a hardware implementation of the functionality of the software - * licensed hereunder. You may use the software subject to the license - * terms below provided that you ensure that this notice is replicated - * unmodified and in its entirety in all distributions of the software, - * modified or unmodified, in source code or in binary form. - * - * Copyright (c) 2005 The Regents of The University of Michigan - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions are - * met: redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer; - * redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution; - * neither the name of the copyright holders nor the names of its - * contributors may be used to endorse or promote products derived from - * this software without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * - * Authors: Ron Dreslinski - * Steve Reinhardt - */ - -/** - * @file - * GHB Prefetcher implementation. - */ - -#include base/trace.hh -#include debug/HWPrefetch.hh -#include mem/cache/prefetch/ghb.hh - -void -GHBPrefetcher::calculatePrefetch(PacketPtr pkt, std::listAddr addresses, - std::listCycles delays) -{ -Addr blk_addr = pkt-getAddr() ~(Addr)(blkSize-1); -bool is_secure = pkt-isSecure(); -int master_id = useMasterId ? pkt-req-masterId() : 0; -assert(master_id Max_Masters); - -bool same_sec_state = true; -// Avoid activating
[gem5-dev] changeset in gem5: cpu: Add ExecFlags debug flag
changeset b31580e27d1f in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=b31580e27d1f description: cpu: Add ExecFlags debug flag Adds a debug flag to print out the flags an instruction is tagged with. diffstat: src/cpu/SConscript | 3 ++- src/cpu/exetrace.cc | 6 ++++++ 2 files changed, 8 insertions(+), 1 deletions(-) diffs (36 lines): diff -r 452a5f178ec5 -r b31580e27d1f src/cpu/SConscript --- a/src/cpu/SConscript Sat Sep 20 17:17:44 2014 -0400 +++ b/src/cpu/SConscript Sat Sep 20 17:17:45 2014 -0400 @@ -96,6 +96,7 @@ DebugFlag('ExecUser', 'Filter: Trace user mode instructions') DebugFlag('ExecKernel', 'Filter: Trace kernel mode instructions') DebugFlag('ExecAsid', 'Format: Include ASID in trace') +DebugFlag('ExecFlags', 'Format: Include instruction flags in trace') DebugFlag('Fetch') DebugFlag('IntrControl') DebugFlag('O3PipeView') @@ -106,7 +107,7 @@ 'ExecFaulting', 'ExecFetchSeq', 'ExecOpClass', 'ExecRegDelta', 'ExecResult', 'ExecSpeculative', 'ExecSymbol', 'ExecThread', 'ExecTicks', 'ExecMicro', 'ExecMacro', 'ExecUser', 'ExecKernel', -'ExecAsid' ]) +'ExecAsid', 'ExecFlags' ]) CompoundFlag('Exec', [ 'ExecEnable', 'ExecTicks', 'ExecOpClass', 'ExecThread', 'ExecEffAddr', 'ExecResult', 'ExecSymbol', 'ExecMicro', 'ExecFaulting', 'ExecUser', 'ExecKernel' ]) diff -r 452a5f178ec5 -r b31580e27d1f src/cpu/exetrace.cc --- a/src/cpu/exetrace.cc Sat Sep 20 17:17:44 2014 -0400 +++ b/src/cpu/exetrace.cc Sat Sep 20 17:17:45 2014 -0400 @@ -131,6 +131,12 @@ if (Debug::ExecCPSeq && cp_seq_valid) outs << "CPSeq=" << dec << cp_seq; + +if (Debug::ExecFlags) { +    outs << "flags=("; +    inst->printFlags(outs, "|"); +    outs << ")"; +} } // ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
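The new trace output simply joins the instruction's flag names with a '|' separator inside "flags=(...)". A minimal Python sketch of that formatting (the function name and flag strings here are illustrative, not gem5 API):

```python
def format_flags(flags, separator="|"):
    """Render a list of StaticInst flag names the way the new
    Debug::ExecFlags trace code does: flags=(A|B|C)."""
    return "flags=(%s)" % separator.join(flags)
```

So a micro-op tagged IsLoad and IsMemRef would show up in the Exec trace as "flags=(IsLoad|IsMemRef)".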
[gem5-dev] changeset in gem5: cpu: Remove unused deallocateContext calls
changeset a59c189de383 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=a59c189de383 description: cpu: Remove unused deallocateContext calls The call paths for de-scheduling a thread are halt() and suspend(), from the thread context. There is no call to deallocateContext() in general, though some CPUs chose to define it. This patch removes the function from BaseCPU and the cores which do not require it. diffstat: src/cpu/base.hh | 3 --- src/cpu/inorder/inorder_dyn_inst.cc | 6 -- src/cpu/inorder/inorder_dyn_inst.hh | 7 --- src/cpu/inorder/thread_context.hh | 3 --- src/cpu/o3/cpu.cc | 16 +--- src/cpu/o3/cpu.hh | 5 - src/cpu/simple/base.cc | 8 src/cpu/simple/base.hh | 1 - 8 files changed, 5 insertions(+), 44 deletions(-) diffs (140 lines): diff -r a9023811bf9e -r a59c189de383 src/cpu/base.hh --- a/src/cpu/base.hh Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/base.hh Sat Sep 20 17:18:36 2014 -0400 @@ -257,9 +257,6 @@ /// Notify the CPU that the indicated context is now suspended. virtual void suspendContext(ThreadID thread_num) {} -/// Notify the CPU that the indicated context is now deallocated. -virtual void deallocateContext(ThreadID thread_num) {} - /// Notify the CPU that the indicated context is now halted. 
virtual void haltContext(ThreadID thread_num) {} diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/inorder_dyn_inst.cc --- a/src/cpu/inorder/inorder_dyn_inst.cc Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/inorder/inorder_dyn_inst.cc Sat Sep 20 17:18:36 2014 -0400 @@ -571,12 +571,6 @@ } } -void -InOrderDynInst::deallocateContext(int thread_num) -{ -this-cpu-deallocateContext(thread_num); -} - Fault InOrderDynInst::readMem(Addr addr, uint8_t *data, unsigned size, unsigned flags) diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/inorder_dyn_inst.hh --- a/src/cpu/inorder/inorder_dyn_inst.hh Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/inorder/inorder_dyn_inst.hh Sat Sep 20 17:18:36 2014 -0400 @@ -533,13 +533,6 @@ // -// MULTITHREADING INTERFACE TO CPU MODELS -// - -virtual void deallocateContext(int thread_num); - - -// // PROGRAM COUNTERS - PC/NPC/NPC // diff -r a9023811bf9e -r a59c189de383 src/cpu/inorder/thread_context.hh --- a/src/cpu/inorder/thread_context.hh Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/inorder/thread_context.hh Sat Sep 20 17:18:36 2014 -0400 @@ -281,9 +281,6 @@ void activateContext() { cpu-activateContext(thread-threadId()); } -void deallocateContext() -{ cpu-deallocateContext(thread-threadId()); } - /** Returns the number of consecutive store conditional failures. */ // @todo: Figure out where these store cond failures should go. 
unsigned readStCondFailures() diff -r a9023811bf9e -r a59c189de383 src/cpu/o3/cpu.cc --- a/src/cpu/o3/cpu.cc Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/o3/cpu.cc Sat Sep 20 17:18:36 2014 -0400 @@ -730,20 +730,12 @@ template class Impl void -FullO3CPUImpl::deallocateContext(ThreadID tid, bool remove) -{ -deactivateThread(tid); -if (remove) -removeThread(tid); -} - -template class Impl -void FullO3CPUImpl::suspendContext(ThreadID tid) { DPRINTF(O3CPU,[tid: %i]: Suspending Thread Context.\n, tid); assert(!switchedOut()); -deallocateContext(tid, false); + +deactivateThread(tid); // If this was the last thread then unschedule the tick event. if (activeThreads.size() == 0) @@ -761,7 +753,9 @@ //For now, this is the same as deallocate DPRINTF(O3CPU,[tid:%i]: Halt Context called. Deallocating, tid); assert(!switchedOut()); -deallocateContext(tid, true); + +deactivateThread(tid); +removeThread(tid); } template class Impl diff -r a9023811bf9e -r a59c189de383 src/cpu/o3/cpu.hh --- a/src/cpu/o3/cpu.hh Sat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/o3/cpu.hh Sat Sep 20 17:18:36 2014 -0400 @@ -326,11 +326,6 @@ void suspendContext(ThreadID tid); /** Remove Thread from Active Threads List - * Possibly Remove Thread Context from CPU. - */ -void deallocateContext(ThreadID tid, bool remove); - -/** Remove Thread from Active Threads List * Remove Thread Context from CPU. */ void haltContext(ThreadID tid); diff -r a9023811bf9e -r a59c189de383 src/cpu/simple/base.cc --- a/src/cpu/simple/base.ccSat Sep 20 17:18:35 2014 -0400 +++ b/src/cpu/simple/base.ccSat Sep 20 17:18:36 2014 -0400 @@ -133,14 +133,6 @@ }
[gem5-dev] changeset in gem5: alpha, arm, mips, power, x86, cpu, sim: Cleanup act...
changeset a9023811bf9e in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=a9023811bf9e description: alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed. diffstat: src/arch/alpha/utility.hh |2 +- src/arch/arm/utility.hh |2 +- src/arch/mips/mt.hh |2 +- src/arch/mips/utility.cc |2 +- src/arch/power/utility.hh |2 +- src/arch/sparc/utility.hh |2 +- src/arch/x86/utility.cc |4 +- src/cpu/base.hh |8 +- src/cpu/checker/thread_context.hh | 10 +- src/cpu/inorder/cpu.cc| 12 +- src/cpu/inorder/cpu.hh|6 +- src/cpu/inorder/thread_context.cc |8 +- src/cpu/inorder/thread_context.hh | 13 +- src/cpu/kvm/base.cc |6 +- src/cpu/kvm/base.hh |2 +- src/cpu/minor/cpu.cc | 30 +--- src/cpu/minor/cpu.hh | 19 +--- src/cpu/o3/cpu.cc | 212 ++--- src/cpu/o3/cpu.hh | 124 +- src/cpu/o3/fetch_impl.hh |8 +- src/cpu/o3/thread_context.hh |9 +- src/cpu/o3/thread_context_impl.hh | 11 +- src/cpu/simple/atomic.cc |6 +- src/cpu/simple/atomic.hh |2 +- src/cpu/simple/timing.cc |6 +- src/cpu/simple/timing.hh |2 +- src/cpu/simple_thread.cc | 12 +- src/cpu/simple_thread.hh |5 +- src/cpu/thread_context.hh | 19 +- src/sim/process.cc|2 +- 30 files changed, 95 insertions(+), 453 deletions(-) diffs (truncated from 1157 to 300 lines): diff -r 3819b85ff21a -r a9023811bf9e src/arch/alpha/utility.hh --- a/src/arch/alpha/utility.hh Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/alpha/utility.hh Sat Sep 20 17:18:35 2014 -0400 @@ -68,7 +68,7 @@ // Alpha IPR register accessors inline bool PcPAL(Addr addr) { return addr 0x3; } inline void startupCPU(ThreadContext *tc, int cpuId) -{ tc-activate(Cycles(0)); } +{ tc-activate(); 
} // diff -r 3819b85ff21a -r a9023811bf9e src/arch/arm/utility.hh --- a/src/arch/arm/utility.hh Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/arm/utility.hh Sat Sep 20 17:18:35 2014 -0400 @@ -104,7 +104,7 @@ inline void startupCPU(ThreadContext *tc, int cpuId) { -tc-activate(Cycles(0)); +tc-activate(); } void copyRegs(ThreadContext *src, ThreadContext *dest); diff -r 3819b85ff21a -r a9023811bf9e src/arch/mips/mt.hh --- a/src/arch/mips/mt.hh Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/mips/mt.hh Sat Sep 20 17:18:35 2014 -0400 @@ -96,7 +96,7 @@ // TODO: SET PC WITH AN EVENT INSTEAD OF INSTANTANEOUSLY tc-pcState(restartPC); -tc-activate(Cycles(0)); +tc-activate(); warn(%i: Restoring thread %i in %s @ PC %x, curTick(), tc-threadId(), tc-getCpuPtr()-name(), restartPC); diff -r 3819b85ff21a -r a9023811bf9e src/arch/mips/utility.cc --- a/src/arch/mips/utility.cc Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/mips/utility.cc Sat Sep 20 17:18:35 2014 -0400 @@ -231,7 +231,7 @@ void startupCPU(ThreadContext *tc, int cpuId) { -tc-activate(Cycles(0)); +tc-activate(); } void diff -r 3819b85ff21a -r a9023811bf9e src/arch/power/utility.hh --- a/src/arch/power/utility.hh Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/power/utility.hh Sat Sep 20 17:18:35 2014 -0400 @@ -59,7 +59,7 @@ inline void startupCPU(ThreadContext *tc, int cpuId) { -tc-activate(Cycles(0)); +tc-activate(); } void diff -r 3819b85ff21a -r a9023811bf9e src/arch/sparc/utility.hh --- a/src/arch/sparc/utility.hh Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/sparc/utility.hh Sat Sep 20 17:18:35 2014 -0400 @@ -77,7 +77,7 @@ { // Other CPUs will get activated by IPIs if (cpuId == 0 || !FullSystem) -tc-activate(Cycles(0)); +tc-activate(); } void copyRegs(ThreadContext *src, ThreadContext *dest); diff -r 3819b85ff21a -r a9023811bf9e src/arch/x86/utility.cc --- a/src/arch/x86/utility.cc Sat Sep 20 17:18:33 2014 -0400 +++ b/src/arch/x86/utility.cc Sat Sep 20 17:18:35 2014 -0400 @@ -203,12 +203,12 @@ void 
startupCPU(ThreadContext *tc, int cpuId) { if (cpuId == 0 || !FullSystem) { -tc-activate(Cycles(0)); +tc-activate(); } else { // This is an application processor (AP). It should be initialized to //
[gem5-dev] changeset in gem5: cpu: Only iterate over possible threads on th...
changeset c870b43d2ba6 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=c870b43d2ba6 description: cpu: Only iterate over possible threads on the o3 cpu Some places in O3 always iterated over Impl::MaxThreads even if a CPU had fewer threads. This removes a few of those instances. diffstat: src/cpu/o3/fetch_impl.hh | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diffs (30 lines): diff -r 535e088955ca -r c870b43d2ba6 src/cpu/o3/fetch_impl.hh --- a/src/cpu/o3/fetch_impl.hh Tue Sep 09 04:36:33 2014 -0400 +++ b/src/cpu/o3/fetch_impl.hh Tue Sep 09 04:36:34 2014 -0400 @@ -419,7 +419,7 @@ void DefaultFetch<Impl>::drainResume() { -for (ThreadID i = 0; i < Impl::MaxThreads; ++i) +for (ThreadID i = 0; i < numThreads; ++i) stalls[i].drain = false; } @@ -887,7 +887,7 @@ wroteToTimeBuffer = false; -for (ThreadID i = 0; i < Impl::MaxThreads; ++i) { +for (ThreadID i = 0; i < numThreads; ++i) { issuePipelinedIfetch[i] = false; } @@ -927,7 +927,7 @@ } // Issue the next I-cache request if possible. -for (ThreadID i = 0; i < Impl::MaxThreads; ++i) { +for (ThreadID i = 0; i < numThreads; ++i) { if (issuePipelinedIfetch[i]) { pipelineIcacheAccesses(i); } }
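The distinction the patch makes is between the compile-time capacity (Impl::MaxThreads, which sizes the arrays) and the number of threads actually configured (numThreads). A small Python model of the drainResume() change — class and field names here are stand-ins, not gem5 code:

```python
MAX_THREADS = 4  # stand-in for the compile-time Impl::MaxThreads constant


class Fetch:
    def __init__(self, num_threads):
        self.num_threads = num_threads           # threads actually configured
        # Arrays are still sized for the maximum, as in the C++ model.
        self.drain_stall = [True] * MAX_THREADS

    def drain_resume(self):
        # After the patch: iterate only over configured threads,
        # leaving the unused tail of the MaxThreads-sized array alone.
        for tid in range(self.num_threads):
            self.drain_stall[tid] = False
```

With two configured threads, only slots 0 and 1 are touched; slots 2 and 3 (which have no backing thread context) are never visited.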
[gem5-dev] changeset in gem5: cpu: Change writeback modeling for outstandin...
changeset 5b6279635c49 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=5b6279635c49 description: cpu: Change writeback modeling for outstanding instructions As highlighed on the mailing list gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit(). diffstat: configs/common/O3_ARM_v7a.py | 1 - src/cpu/o3/O3CPU.py | 1 - src/cpu/o3/iew.hh | 53 --- src/cpu/o3/iew_impl.hh| 10 src/cpu/o3/inst_queue_impl.hh | 2 - src/cpu/o3/lsq_unit.hh| 7 - src/cpu/o3/lsq_unit_impl.hh | 5 +--- 7 files changed, 1 insertions(+), 78 deletions(-) diffs (210 lines): diff -r 43516d8eabe9 -r 5b6279635c49 configs/common/O3_ARM_v7a.py --- a/configs/common/O3_ARM_v7a.py Wed Sep 03 07:42:32 2014 -0400 +++ b/configs/common/O3_ARM_v7a.py Wed Sep 03 07:42:33 2014 -0400 @@ -126,7 +126,6 @@ dispatchWidth = 6 issueWidth = 8 wbWidth = 8 -wbDepth = 1 fuPool = O3_ARM_v7a_FUP() iewToCommitDelay = 1 renameToROBDelay = 1 diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/O3CPU.py --- a/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:32 2014 -0400 +++ b/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:33 2014 -0400 @@ -84,7 +84,6 @@ dispatchWidth = Param.Unsigned(8, Dispatch width) issueWidth = Param.Unsigned(8, Issue width) wbWidth = Param.Unsigned(8, Writeback width) -wbDepth = Param.Unsigned(1, Writeback depth) fuPool = Param.FUPool(DefaultFUPool(), Functional Unit pool) iewToCommitDelay = Param.Cycles(1, Issue/Execute/Writeback to commit diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew.hh --- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:32 2014 -0400 +++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:33 2014 -0400 @@ -219,49 +219,6 @@ /** Returns if the LSQ has any stores to writeback. 
*/ bool hasStoresToWB(ThreadID tid) { return ldstQueue.hasStoresToWB(tid); } -void incrWb(InstSeqNum sn) -{ -++wbOutstanding; -if (wbOutstanding == wbMax) -ableToIssue = false; -DPRINTF(IEW, wbOutstanding: %i [sn:%lli]\n, wbOutstanding, sn); -assert(wbOutstanding = wbMax); -#ifdef DEBUG -wbList.insert(sn); -#endif -} - -void decrWb(InstSeqNum sn) -{ -if (wbOutstanding == wbMax) -ableToIssue = true; -wbOutstanding--; -DPRINTF(IEW, wbOutstanding: %i [sn:%lli]\n, wbOutstanding, sn); -assert(wbOutstanding = 0); -#ifdef DEBUG -assert(wbList.find(sn) != wbList.end()); -wbList.erase(sn); -#endif -} - -#ifdef DEBUG -std::setInstSeqNum wbList; - -void dumpWb() -{ -std::setInstSeqNum::iterator wb_it = wbList.begin(); -while (wb_it != wbList.end()) { -cprintf([sn:%lli]\n, -(*wb_it)); -wb_it++; -} -} -#endif - -bool canIssue() { return ableToIssue; } - -bool ableToIssue; - /** Check misprediction */ void checkMisprediction(DynInstPtr inst); @@ -452,19 +409,9 @@ */ unsigned wbCycle; -/** Number of instructions in flight that will writeback. */ - -/** Number of instructions in flight that will writeback. */ -int wbOutstanding; - /** Writeback width. */ unsigned wbWidth; -/** Writeback width * writeback depth, where writeback depth is - * the number of cycles of writing back instructions that can be - * buffered. */ -unsigned wbMax; - /** Number of active threads. 
*/ ThreadID numThreads; diff -r 43516d8eabe9 -r 5b6279635c49 src/cpu/o3/iew_impl.hh --- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:32 2014 -0400 +++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:33 2014 -0400 @@ -76,7 +76,6 @@ issueToExecuteDelay(params-issueToExecuteDelay), dispatchWidth(params-dispatchWidth), issueWidth(params-issueWidth), - wbOutstanding(0), wbWidth(params-wbWidth), numThreads(params-numThreads) { @@ -109,12 +108,8 @@ fetchRedirect[tid] = false; } -wbMax = wbWidth * params-wbDepth; - updateLSQNextCycle = false; -ableToIssue = true; - skidBufferMax = (3 * (renameToIEWDelay * params-renameWidth)) + issueWidth; } @@ -635,8 +630,6 @@ ++wbCycle; wbNumInst = 0; } - -assert((wbCycle * wbWidth + wbNumInst) = wbMax); } DPRINTF(IEW, Current wb cycle: %i, width: %i, numInst: %i\nwbActual:%i\n, @@ -1263,7 +1256,6 @@ ++iewExecSquashedInsts; -decrWb(inst-seqNum); continue; } @@ -1502,8 +1494,6 @@ } writebackCount[tid]++; } - -decrWb(inst-seqNum); } } diff -r 43516d8eabe9 -r
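After this change the only writeback limit left is the per-cycle width enforced in instToCommit(): when a cycle's wbWidth slots fill up, the instruction is pushed into the next cycle's slots rather than being refused issue. A hedged Python sketch of that slot-assignment logic (names and the standalone-function shape are illustrative):

```python
WB_WIDTH = 8  # stand-in for the wbWidth parameter


def next_wb_slot(wb_cycle, wb_num_inst):
    """Assign a writeback slot the way instToCommit() does after the
    patch: no cap on total outstanding instructions, but once this
    cycle's wbWidth slots are used, spill into the following cycle.
    Returns the updated (wb_cycle, wb_num_inst) pair."""
    if wb_num_inst == WB_WIDTH:
        wb_cycle += 1
        wb_num_inst = 0
    return wb_cycle, wb_num_inst + 1
```

The ninth instruction arriving in a cycle therefore lands in slot 0 of the next cycle instead of tripping the old wbMax assertion.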
[gem5-dev] changeset in gem5: config: Change parsing of Addr so hex values ...
changeset 19f5df7ac6a1 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=19f5df7ac6a1 description: config: Change parsing of Addr so hex values work from scripts When passed from a configuration script with a hexadecimal value (like 0x8000), gem5 would error out. This is because it would call toMemorySize which requires the argument to end with a size specifier (like 1MB, etc). This modification makes it so raw hex values can be passed through Addr parameters from the configuration scripts. diffstat: src/arch/arm/ArmSystem.py | 2 +- src/python/m5/params.py | 12 ++-- 2 files changed, 11 insertions(+), 3 deletions(-) diffs (35 lines): diff -r d2850235e31c -r 19f5df7ac6a1 src/arch/arm/ArmSystem.py --- a/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:19 2014 -0400 +++ b/src/arch/arm/ArmSystem.py Wed Sep 03 07:42:20 2014 -0400 @@ -65,7 +65,7 @@ highest_el_is_64 = Param.Bool(False, True if the register width of the highest implemented exception level is 64 bits (ARMv8)) -reset_addr_64 = Param.UInt64(0x0, +reset_addr_64 = Param.Addr(0x0, Reset address if the highest implemented exception level is 64 bits (ARMv8)) phys_addr_range_64 = Param.UInt8(40, diff -r d2850235e31c -r 19f5df7ac6a1 src/python/m5/params.py --- a/src/python/m5/params.py Wed Sep 03 07:42:19 2014 -0400 +++ b/src/python/m5/params.py Wed Sep 03 07:42:20 2014 -0400 @@ -626,9 +626,17 @@ self.value = value.value else: try: +# Often addresses are referred to with sizes. Ex: A device +# base address is at 512MB. Use toMemorySize() to convert +# these into addresses. If the address is not specified with a +# size, an exception will occur and numeric translation will +# proceed below. 
self.value = convert.toMemorySize(value) -except TypeError: -self.value = long(value) +except (TypeError, ValueError): +# Convert number to string and use long() to do automatic +# base conversion (requires base=0 for auto-conversion) +self.value = long(str(value), base=0) + self._check() def __add__(self, other): if isinstance(other, Addr): ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
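The parsing order the patch establishes — try a size-suffixed memory-size string first, then fall back to numeric conversion with automatic base detection — can be sketched as follows. This is a simplified stand-in for gem5's convert.toMemorySize() (only three suffixes, no error-message polish), not the real implementation:

```python
def to_addr(value):
    """Parse an Addr-style parameter: accept size-suffixed strings
    like '512MB' (as convert.toMemorySize() does), and fall back to
    base=0 integer conversion so raw hex like '0x80000000' works."""
    suffixes = {'kB': 2 ** 10, 'MB': 2 ** 20, 'GB': 2 ** 30}
    if isinstance(value, str):
        for suf, mult in suffixes.items():
            if value.endswith(suf):
                return int(value[:-len(suf)]) * mult
    # No size suffix: int() with base=0 auto-detects 0x.../0o.../decimal.
    return int(str(value), base=0)
```

This is why reset_addr_64 could be switched from Param.UInt64 to Param.Addr: scripts can now pass either "1GB" or 0x40000000 and get the same value.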
[gem5-dev] changeset in gem5: dev: Avoid invalid sized reads in PL390 with ...
changeset 72890a571a7b in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=72890a571a7b description: dev: Avoid invalid sized reads in PL390 with DPRINTF enabled The first DPRINTF() in PL390::writeDistributor always read a uint32_t, though a packet may have only been 1 or 2 bytes. This caused an assertion in packet->get(). diffstat: src/dev/arm/gic_pl390.cc | 19 ++- 1 files changed, 18 insertions(+), 1 deletions(-) diffs (30 lines): diff -r 82a4fa2d19a0 -r 72890a571a7b src/dev/arm/gic_pl390.cc --- a/src/dev/arm/gic_pl390.cc Wed Sep 03 07:42:25 2014 -0400 +++ b/src/dev/arm/gic_pl390.cc Wed Sep 03 07:42:27 2014 -0400 @@ -395,8 +395,25 @@ assert(pkt->req->hasContextId()); int ctx_id = pkt->req->contextId(); +uint32_t pkt_data M5_VAR_USED; +switch (pkt->getSize()) +{ + case 1: +pkt_data = pkt->get<uint8_t>(); +break; + case 2: +pkt_data = pkt->get<uint16_t>(); +break; + case 4: +pkt_data = pkt->get<uint32_t>(); +break; + default: +panic("Invalid size when writing to priority regs in Gic: %d\n", + pkt->getSize()); +} + DPRINTF(GIC, "gic distributor write register %#x size %#x value %#x \n", -daddr, pkt->getSize(), pkt->get<uint32_t>()); +daddr, pkt->getSize(), pkt_data); if (daddr >= ICDISER_ST && daddr < ICDISER_ED + 4) { assert((daddr - ICDISER_ST) >> 2 < 32);
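The fix dispatches on the packet's actual size and only decodes that many bytes. A Python model of the same idea using struct (function name and error text are illustrative; the real code uses pkt->get<T>() and panic()):

```python
import struct


def read_payload(data, size):
    """Decode a write packet's payload at the width the master used.
    Unconditionally decoding 4 bytes from a 1- or 2-byte packet is
    what tripped the packet->get() assertion this patch avoids."""
    fmts = {1: '<B', 2: '<H', 4: '<I'}  # little-endian, like the GIC regs
    if size not in fmts:
        raise ValueError("invalid write size: %d" % size)
    return struct.unpack(fmts[size], data[:size])[0]
```

A 2-byte write is now decoded as a uint16_t for the trace message instead of over-reading the buffer.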
[gem5-dev] changeset in gem5: arch: Properly guess OpClass from optional St...
changeset 43516d8eabe9 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=43516d8eabe9 description: arch: Properly guess OpClass from optional StaticInst flags isa_parser.py guesses the OpClass if none were given based upon the StaticInst flags. The existing code does not take into account optionally set flags. This code hoists the setting of optional flags so OpClass is properly assigned. diffstat: src/arch/isa_parser.py | 36 +--- 1 files changed, 25 insertions(+), 11 deletions(-) diffs (57 lines): diff -r 7aacec2a247d -r 43516d8eabe9 src/arch/isa_parser.py --- a/src/arch/isa_parser.pyWed Sep 03 07:42:31 2014 -0400 +++ b/src/arch/isa_parser.pyWed Sep 03 07:42:32 2014 -0400 @@ -1,3 +1,15 @@ +# Copyright (c) 2014 ARM Limited +# All rights reserved +# +# The license below extends only to copyright in the software and shall +# not be construed as granting a license to any other intellectual +# property including but not limited to intellectual property relating +# to a hardware implementation of the functionality of the software +# licensed hereunder. You may use the software subject to the license +# terms below provided that you ensure that this notice is replicated +# unmodified and in its entirety in all distributions of the software, +# modified or unmodified, in source code or in binary form. +# # Copyright (c) 2003-2005 The Regents of The University of Michigan # Copyright (c) 2013 Advanced Micro Devices, Inc. # All rights reserved. @@ -1119,17 +1131,7 @@ self.flags = self.operands.concatAttrLists('flags') -# Make a basic guess on the operand class (function unit type). -# These are good enough for most cases, and can be overridden -# later otherwise. 
-if 'IsStore' in self.flags: -self.op_class = 'MemWriteOp' -elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags: -self.op_class = 'MemReadOp' -elif 'IsFloating' in self.flags: -self.op_class = 'FloatAddOp' -else: -self.op_class = 'IntAluOp' +self.op_class = None # Optional arguments are assumed to be either StaticInst flags # or an OpClass value. To avoid having to import a complete @@ -1144,6 +1146,18 @@ error('InstObjParams: optional arg %s not recognized ' 'as StaticInst::Flag or OpClass.' % oa) +# Make a basic guess on the operand class if not set. +# These are good enough for most cases. +if not self.op_class: +if 'IsStore' in self.flags: +self.op_class = 'MemWriteOp' +elif 'IsLoad' in self.flags or 'IsPrefetch' in self.flags: +self.op_class = 'MemReadOp' +elif 'IsFloating' in self.flags: +self.op_class = 'FloatAddOp' +else: +self.op_class = 'IntAluOp' + # add flag initialization to contructor here to include # any flags added via opt_args self.constructor += makeFlagConstructor(self.flags) ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
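The patch's point is ordering: optional arguments (which may set an explicit OpClass) must be processed before falling back to a flag-based guess. The guessing logic itself, extracted into a hedged standalone Python function (the function signature is illustrative; in isa_parser.py this lives inside InstObjParams):

```python
def guess_op_class(flags, op_class=None):
    """Guess the functional-unit class from StaticInst flags, but only
    when no OpClass was supplied explicitly via the optional args."""
    if op_class is not None:
        return op_class          # explicit OpClass always wins
    if 'IsStore' in flags:
        return 'MemWriteOp'
    if 'IsLoad' in flags or 'IsPrefetch' in flags:
        return 'MemReadOp'
    if 'IsFloating' in flags:
        return 'FloatAddOp'
    return 'IntAluOp'
```

Before the fix, the guess ran first, so an OpClass passed through the optional arguments could be shadowed by the default derived from the mandatory flags alone.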
[gem5-dev] changeset in gem5: cpu: Fix SMT scheduling issue with the O3 cpu
changeset ed05298e8566 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=ed05298e8566 description: cpu: Fix SMT scheduling issue with the O3 cpu The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode. This is because it maintained a priority list of threads independent of the active thread list. This priority list could become stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads. Additionally the fetch queue is now forcibly flushed of instructions from the de-scheduled thread. Relevant output: 24557000: system.cpu: [tid:1]: Calling deactivate thread. 24557000: system.cpu: [tid:1]: Removing from active threads list 24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU. 24557500: system.cpu.fetch: Running stage. 24557500: system.cpu.fetch: Attempting to fetch from [tid:1] diffstat: src/cpu/o3/O3CPU.py |3 +- src/cpu/o3/commit.hh |5 +- src/cpu/o3/commit_impl.hh | 15 +- src/cpu/o3/cpu.cc |5 +- src/cpu/o3/fetch.hh |6 +- src/cpu/o3/fetch_impl.hh | 109 + 6 files changed, 99 insertions(+), 44 deletions(-) diffs (truncated from 306 to 300 lines): diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/O3CPU.py --- a/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:36 2014 -0400 +++ b/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:37 2014 -0400 @@ -61,7 +61,8 @@ commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay") fetchWidth = Param.Unsigned(8, "Fetch width") fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes") -fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops") +fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops " +"per-thread") renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay") iewToDecodeDelay = Param.Cycles(1, "Issue/Execute/Writeback to decode diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit.hh --- a/src/cpu/o3/commit.hh Wed Sep 03 07:42:36 2014 -0400 +++ b/src/cpu/o3/commit.hh Wed Sep 03 07:42:37 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright
(c) 2010-2012 ARM Limited + * Copyright (c) 2010-2012, 2014 ARM Limited * All rights reserved. * * The license below extends only to copyright in the software and shall @@ -218,6 +218,9 @@ /** Takes over from another CPU's thread. */ void takeOverFrom(); +/** Deschedules a thread from scheduling */ +void deactivateThread(ThreadID tid); + /** Ticks the commit stage, which tries to commit instructions. */ void tick(); diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/commit_impl.hh --- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:36 2014 -0400 +++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:37 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010-2013 ARM Limited + * Copyright (c) 2010-2014 ARM Limited * All rights reserved * * The license below extends only to copyright in the software and shall @@ -463,6 +463,19 @@ template class Impl void +DefaultCommitImpl::deactivateThread(ThreadID tid) +{ +listThreadID::iterator thread_it = std::find(priority_list.begin(), +priority_list.end(), tid); + +if (thread_it != priority_list.end()) { +priority_list.erase(thread_it); +} +} + + +template class Impl +void DefaultCommitImpl::updateStatus() { // reset ROB changed variable diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/cpu.cc --- a/src/cpu/o3/cpu.cc Wed Sep 03 07:42:36 2014 -0400 +++ b/src/cpu/o3/cpu.cc Wed Sep 03 07:42:37 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2011-2012 ARM Limited + * Copyright (c) 2011-2012, 2014 ARM Limited * Copyright (c) 2013 Advanced Micro Devices, Inc. * All rights reserved * @@ -728,6 +728,9 @@ tid); activeThreads.erase(thread_it); } + +fetch.deactivateThread(tid); +commit.deactivateThread(tid); } template class Impl diff -r f54586c894e3 -r ed05298e8566 src/cpu/o3/fetch.hh --- a/src/cpu/o3/fetch.hh Wed Sep 03 07:42:36 2014 -0400 +++ b/src/cpu/o3/fetch.hh Wed Sep 03 07:42:37 2014 -0400 @@ -255,6 +255,8 @@ /** Tells fetch to wake up from a quiesce instruction. 
*/ void wakeFromQuiesce(); +/** For priority-based fetch policies, need to keep update priorityList */ +void deactivateThread(ThreadID tid); private: /** Reset this pipeline stage */ void resetStage(); @@ -484,8 +486,8 @@ /** The size of the fetch queue in micro-ops */ unsigned fetchQueueSize; -/** Queue of fetched instructions */ -std::dequeDynInstPtr fetchQueue; +/** Queue of fetched instructions. Per-thread to prevent HoL blocking. */ +std::dequeDynInstPtr fetchQueue[Impl::MaxThreads]; /** Whether or not the fetch buffer data
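The essence of the fix above is keeping the round-robin priority list consistent with the active-thread set: a de-scheduled thread must be erased from the list so a stale entry can never be handed back to fetch or commit. A minimal, self-contained sketch of that bookkeeping (class and member names here are ours, not gem5's, apart from `deactivateThread` mirroring the added method):

```cpp
#include <algorithm>
#include <cassert>
#include <list>

using ThreadID = short;

// Hypothetical stand-in for the commit/fetch priority bookkeeping.
struct PrioritySched {
    std::list<ThreadID> priorityList;

    void activateThread(ThreadID tid) { priorityList.push_back(tid); }

    // Mirror of the added DefaultCommit::deactivateThread(): erase the
    // thread with std::find + erase so it can never be scheduled again.
    void deactivateThread(ThreadID tid) {
        auto it = std::find(priorityList.begin(), priorityList.end(), tid);
        if (it != priorityList.end())
            priorityList.erase(it);
    }

    // Round-robin pick: take the head thread and rotate it to the back.
    ThreadID nextThread() {
        ThreadID tid = priorityList.front();
        priorityList.pop_front();
        priorityList.push_back(tid);
        return tid;
    }
};
```

Before the patch, only the active-thread list was updated on deactivation, so `nextThread()` could still return a de-scheduled tid.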
[gem5-dev] changeset in gem5: cpu: Add a fetch queue to the o3 cpu
changeset 12e3be8203a5 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=12e3be8203a5 description: cpu: Add a fetch queue to the o3 cpu This patch adds a fetch queue that sits between fetch and decode to the o3 cpu. This effectively decouples fetch from decode stalls, allowing it to be more aggressive and run further ahead in the instruction stream. diffstat: src/cpu/o3/O3CPU.py | 1 + src/cpu/o3/fetch.hh | 14 +++--- src/cpu/o3/fetch_impl.hh | 61 ++- 3 files changed, 55 insertions(+), 21 deletions(-) diffs (201 lines): diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/O3CPU.py --- a/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:34 2014 -0400 +++ b/src/cpu/o3/O3CPU.py Wed Sep 03 07:42:35 2014 -0400 @@ -61,6 +61,7 @@ commitToFetchDelay = Param.Cycles(1, "Commit to fetch delay") fetchWidth = Param.Unsigned(8, "Fetch width") fetchBufferSize = Param.Unsigned(64, "Fetch buffer size in bytes") +fetchQueueSize = Param.Unsigned(32, "Fetch queue size in micro-ops") renameToDecodeDelay = Param.Cycles(1, "Rename to decode delay") iewToDecodeDelay = Param.Cycles(1, "Issue/Execute/Writeback to decode diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch.hh --- a/src/cpu/o3/fetch.hh Wed Sep 03 07:42:34 2014 -0400 +++ b/src/cpu/o3/fetch.hh Wed Sep 03 07:42:35 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010-2012 ARM Limited + * Copyright (c) 2010-2012, 2014 ARM Limited * All rights reserved * * The license below extends only to copyright in the software and shall @@ -401,9 +401,6 @@ /** Wire to get commit's information from backwards time buffer. */ typename TimeBuffer<TimeStruct>::wire fromCommit; -/** Internal fetch instruction queue. */ -TimeBuffer<FetchStruct> *fetchQueue; - //Might be annoying how this name is different than the queue. /** Wire used to write any information heading to decode. */ typename TimeBuffer<FetchStruct>::wire toDecode; @@ -455,6 +452,9 @@ /** The width of fetch in instructions. */ unsigned fetchWidth; +/** The width of decode in instructions.
*/ +unsigned decodeWidth; + /** Is the cache blocked? If so no threads can access it. */ bool cacheBlocked; @@ -481,6 +481,12 @@ /** The PC of the first instruction loaded into the fetch buffer. */ Addr fetchBufferPC[Impl::MaxThreads]; +/** The size of the fetch queue in micro-ops */ +unsigned fetchQueueSize; + +/** Queue of fetched instructions */ +std::dequeDynInstPtr fetchQueue; + /** Whether or not the fetch buffer data is valid. */ bool fetchBufferValid[Impl::MaxThreads]; diff -r 867b536a68be -r 12e3be8203a5 src/cpu/o3/fetch_impl.hh --- a/src/cpu/o3/fetch_impl.hh Wed Sep 03 07:42:34 2014 -0400 +++ b/src/cpu/o3/fetch_impl.hh Wed Sep 03 07:42:35 2014 -0400 @@ -82,11 +82,13 @@ iewToFetchDelay(params-iewToFetchDelay), commitToFetchDelay(params-commitToFetchDelay), fetchWidth(params-fetchWidth), + decodeWidth(params-decodeWidth), retryPkt(NULL), retryTid(InvalidThreadID), cacheBlkSize(cpu-cacheLineSize()), fetchBufferSize(params-fetchBufferSize), fetchBufferMask(fetchBufferSize - 1), + fetchQueueSize(params-fetchQueueSize), numThreads(params-numThreads), numFetchingThreads(params-smtNumFetchingThreads), finishTranslationEvent(this) @@ -313,12 +315,10 @@ templateclass Impl void -DefaultFetchImpl::setFetchQueue(TimeBufferFetchStruct *fq_ptr) +DefaultFetchImpl::setFetchQueue(TimeBufferFetchStruct *ftb_ptr) { -fetchQueue = fq_ptr; - -// Create wire to write information to proper place in fetch queue. -toDecode = fetchQueue-getWire(0); +// Create wire to write information to proper place in fetch time buf. +toDecode = ftb_ptr-getWire(0); } templateclass Impl @@ -342,6 +342,7 @@ cacheBlocked = false; priorityList.clear(); +fetchQueue.clear(); // Setup PC and nextPC with initial state. 
for (ThreadID tid = 0; tid numThreads; ++tid) { @@ -454,6 +455,10 @@ return false; } +// Not drained if fetch queue contains entries +if (!fetchQueue.empty()) +return false; + /* The pipeline might start up again in the middle of the drain * cycle if the finish translation event is scheduled, so make * sure that's not the case. @@ -673,11 +678,8 @@ fetchStatus[tid] = IcacheWaitResponse; } } else { -// Don't send an instruction to decode if it can't handle it. -// Asynchronous nature of this function's calling means we have to -// check 2 signals to see if decode is stalled. -if (!(numInst fetchWidth) || stalls[tid].decode || -fromDecode-decodeBlock[tid]) { +// Don't send an instruction to decode if we can't handle it. +if (!(numInst
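The decoupling this changeset describes can be sketched with a bounded queue: fetch keeps filling the queue while decode is stalled, limited only by `fetchQueueSize`, and decode later drains up to its width per cycle. This is an illustrative model (names are ours, entries stand in for `DynInstPtr`), not the actual gem5 implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy model of the fetch->decode queue added by the patch.
struct FetchQueue {
    std::deque<uint64_t> q;        // stand-in for DynInstPtr entries
    unsigned fetchQueueSize;       // capacity in micro-ops

    explicit FetchQueue(unsigned size) : fetchQueueSize(size) {}

    // Fetch side: accept an instruction only while there is room;
    // a full queue is the only thing that stalls fetch.
    bool push(uint64_t inst) {
        if (q.size() >= fetchQueueSize)
            return false;          // fetch stalls this cycle
        q.push_back(inst);
        return true;
    }

    // Decode side: drain up to `width` instructions in one cycle.
    unsigned drain(unsigned width) {
        unsigned n = 0;
        while (n < width && !q.empty()) {
            q.pop_front();
            ++n;
        }
        return n;
    }
};
```

With this buffer in place, a one-cycle decode stall no longer forces fetch to idle, which is exactly the "run further ahead" behavior the description claims.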
[gem5-dev] changeset in gem5: cpu: Fix o3 front-end pipeline interlock beha...
changeset 867b536a68be in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=867b536a68be description: cpu: Fix o3 front-end pipeline interlock behavior The o3 pipeline interlock/stall logic was incorrect. o3 unnecessarily stalled fetch and decode due to later stages in the pipeline. In general, a stage should usually only consider whether it is stalled by the adjacent, downstream stage. Forcing stalls due to later stages creates unnecessary bubbles in the pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename) on a branch mispredict while the ROB is being serially walked to update the RAT (robSquashing). Only rename should have stalled. diffstat: src/cpu/o3/comm.hh| 2 - src/cpu/o3/commit.hh | 11 src/cpu/o3/commit_impl.hh | 40 - src/cpu/o3/decode.hh | 4 +-- src/cpu/o3/decode_impl.hh | 55 +++- src/cpu/o3/fetch.hh | 3 -- src/cpu/o3/fetch_impl.hh | 64 -- src/cpu/o3/iew.hh | 11 src/cpu/o3/iew_impl.hh| 23 +--- src/cpu/o3/rename_impl.hh | 25 + 10 files changed, 26 insertions(+), 212 deletions(-) diffs (truncated from 525 to 300 lines): diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/comm.hh --- a/src/cpu/o3/comm.hh Wed Sep 03 07:42:33 2014 -0400 +++ b/src/cpu/o3/comm.hh Wed Sep 03 07:42:34 2014 -0400 @@ -229,8 +229,6 @@ bool renameUnblock[Impl::MaxThreads]; bool iewBlock[Impl::MaxThreads]; bool iewUnblock[Impl::MaxThreads]; -bool commitBlock[Impl::MaxThreads]; -bool commitUnblock[Impl::MaxThreads]; }; #endif //__CPU_O3_COMM_HH__ diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit.hh --- a/src/cpu/o3/commit.hh Wed Sep 03 07:42:33 2014 -0400 +++ b/src/cpu/o3/commit.hh Wed Sep 03 07:42:34 2014 -0400 @@ -185,9 +185,6 @@ /** Sets the pointer to the IEW stage. */ void setIEWStage(IEW *iew_stage); -/** Skid buffer between rename and commit. */ -std::queue<DynInstPtr> skidBuffer; - /** The pointer to the IEW stage. Used solely to ensure that * various events (traps, interrupts, syscalls) do not occur until * all stores have written back.
@@ -251,11 +248,6 @@ */ void setNextStatus(); -/** Checks if the ROB is completed with squashing. This is for the case - * where the ROB can take multiple cycles to complete squashing. - */ -bool robDoneSquashing(); - /** Returns if any of the threads have the number of ROB entries changed * on this cycle. Used to determine if the number of free ROB entries needs * to be sent back to previous stages. @@ -321,9 +313,6 @@ /** Gets instructions from rename and inserts them into the ROB. */ void getInsts(); -/** Insert all instructions from rename into skidBuffer */ -void skidInsert(); - /** Marks completed instructions using information sent from IEW. */ void markCompletedInsts(); diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/commit_impl.hh --- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:33 2014 -0400 +++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:34 2014 -0400 @@ -1335,29 +1335,6 @@ template class Impl void -DefaultCommitImpl::skidInsert() -{ -DPRINTF(Commit, Attempting to any instructions from rename into -skidBuffer.\n); - -for (int inst_num = 0; inst_num fromRename-size; ++inst_num) { -DynInstPtr inst = fromRename-insts[inst_num]; - -if (!inst-isSquashed()) { -DPRINTF(Commit, Inserting PC %s [sn:%i] [tid:%i] into , -skidBuffer.\n, inst-pcState(), inst-seqNum, -inst-threadNumber); -skidBuffer.push(inst); -} else { -DPRINTF(Commit, Instruction PC %s [sn:%i] [tid:%i] was -squashed, skipping.\n, -inst-pcState(), inst-seqNum, inst-threadNumber); -} -} -} - -template class Impl -void DefaultCommitImpl::markCompletedInsts() { // Grab completed insts out of the IEW instruction queue, and mark @@ -1380,23 +1357,6 @@ } template class Impl -bool -DefaultCommitImpl::robDoneSquashing() -{ -listThreadID::iterator threads = activeThreads-begin(); -listThreadID::iterator end = activeThreads-end(); - -while (threads != end) { -ThreadID tid = *threads++; - -if (!rob-isDoneSquashing(tid)) -return false; -} - -return true; -} - -template class Impl void 
DefaultCommitImpl::updateComInstStats(DynInstPtr inst) { diff -r 5b6279635c49 -r 867b536a68be src/cpu/o3/decode.hh --- a/src/cpu/o3/decode.hh Wed Sep 03 07:42:33 2014 -0400 +++ b/src/cpu/o3/decode.hh Wed Sep 03 07:42:34 2014 -0400 @@ -126,7 +126,7 @@ void drainSanityCheck() const; /** Has the stage
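The stall rule the description argues for can be stated very compactly: each stage consults only its immediate downstream consumer, and back-pressure propagates one stage per cycle through skid buffers rather than freezing the whole front end at once. The enum and function below are a toy model of that rule (hypothetical names, not gem5 code):

```cpp
#include <cassert>

// Pipeline stages of the o3 front/back end, in program-flow order.
enum Stage { Fetch = 0, Decode, Rename, IEW, Commit, NumStages };

// stalled[s] says stage s cannot accept new instructions this cycle.
// Buggy pre-patch behavior: fetch/decode also checked stages further
// downstream (e.g. commit), creating needless bubbles.
// Fixed rule: a stage is blocked only by the adjacent downstream stage.
bool stageBlocked(const bool stalled[NumStages], Stage s) {
    if (s == Commit)
        return false;              // commit has no downstream consumer
    return stalled[s + 1];         // look one stage ahead, no further
}
```

Under this rule a stall in IEW blocks rename in the next cycle, but fetch and decode keep working into their downstream buffers until the back-pressure actually reaches them.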
[gem5-dev] changeset in gem5: cpu: Fix o3 drain bug
changeset 40d24a672351 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=40d24a672351 description: cpu: Fix o3 drain bug For X86, the o3 CPU would get stuck with the commit stage not being drained if an interrupt arrived while drain was pending. isDrained() makes sure that pcState.microPC() == 0, thus ensuring that we are at an instruction boundary. However, when we take an interrupt we execute: pcState.upc(romMicroPC(entry)); pcState.nupc(romMicroPC(entry) + 1); tc->pcState(pcState); As a result, the MicroPC is no longer zero. This patch ensures the drain is delayed until no interrupts are present. Once draining, non-synchronous interrupts are deferred until after the switch. diffstat: src/cpu/o3/commit.hh | 11 ++- src/cpu/o3/commit_impl.hh | 15 --- 2 files changed, 22 insertions(+), 4 deletions(-) diffs (72 lines): diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit.hh --- a/src/cpu/o3/commit.hh Wed Sep 03 07:42:44 2014 -0400 +++ b/src/cpu/o3/commit.hh Wed Sep 03 07:42:45 2014 -0400 @@ -438,9 +438,18 @@ /** Number of Active Threads */ ThreadID numThreads; -/** Is a drain pending. */ +/** Is a drain pending? Commit is looking for an instruction boundary while + * there are no pending interrupts + */ bool drainPending; +/** Is a drain imminent? Commit has found an instruction boundary while no + * interrupts were present or in flight. This was the last architecturally + * committed instruction. Interrupts disabled and pipeline flushed. + * Waiting for structures to finish draining. + */ +bool drainImminent; + /** The latency to handle a trap. Used when scheduling trap * squash event.
*/ diff -r 53278be85b40 -r 40d24a672351 src/cpu/o3/commit_impl.hh --- a/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:44 2014 -0400 +++ b/src/cpu/o3/commit_impl.hh Wed Sep 03 07:42:45 2014 -0400 @@ -104,6 +104,7 @@ commitWidth(params->commitWidth), numThreads(params->numThreads), drainPending(false), + drainImminent(false), trapLatency(params->trapLatency), canHandleInterrupts(true), avoidQuiesceLiveLock(false) @@ -406,6 +407,7 @@ DefaultCommit<Impl>::drainResume() { drainPending = false; +drainImminent = false; } template <class Impl> void DefaultCommit<Impl>::propagateInterrupt() { +// Don't propagate interrupts if we are currently handling a trap or +// are draining and the last observable instruction has been committed. if (commitStatus[0] == TrapPending || interrupt || trapSquash[0] || -tcSquash[0]) +tcSquash[0] || drainImminent) return; // Process interrupts if interrupts are enabled, not in PAL @@ -1089,10 +1093,15 @@ squashAfter(tid, head_inst); if (drainPending) { -DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]); -if (pc[tid].microPC() == 0 && interrupt == NoFault) { +if (pc[tid].microPC() == 0 && interrupt == NoFault && +!thread[tid]->trapPending) { +// Last architecturally committed instruction. +// Squash the pipeline, stall fetch, and use +// drainImminent to disable interrupts +DPRINTF(Drain, "Draining: %i:%s\n", tid, pc[tid]); squashAfter(tid, head_inst); cpu->commitDrained(tid); +drainImminent = true; } } ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
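The fixed drain handshake condenses to two conditions: commit only declares the drain point on a macro-op boundary (microPC == 0) with no interrupt or trap in flight, and from that point `drainImminent` suppresses further interrupt propagation until the switch completes. The struct below is a simplified model of that state machine, not the actual gem5 commit stage:

```cpp
#include <cassert>

// Condensed model of the drainPending/drainImminent handshake.
struct DrainState {
    bool drainPending = false;    // a drain has been requested
    bool drainImminent = false;   // drain point found, interrupts deferred

    // Returns true when this commit slot is the last architecturally
    // committed instruction and the pipeline may be flushed.
    bool tryDrain(unsigned microPC, bool interruptPending, bool trapPending) {
        if (!drainPending)
            return false;
        if (microPC == 0 && !interruptPending && !trapPending) {
            drainImminent = true; // defer interrupts until after the switch
            return true;
        }
        return false;             // keep committing until a clean boundary
    }

    // Mirrors the guard the patch adds to propagateInterrupt().
    bool mayPropagateInterrupt() const { return !drainImminent; }
};
```

The pre-patch bug was effectively the middle case: an interrupt arriving mid-drain pushed microPC off zero, so the boundary test never succeeded and commit hung.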
[gem5-dev] changeset in gem5: arm: Fix v8 neon latency issue for loads/stores
changeset 53278be85b40 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=53278be85b40 description: arm: Fix v8 neon latency issue for loads/stores Neon memory ops that operate on multiple registers currently have very poor performance because of interleave/deinterleave micro-ops. This patch marks the deinterleave/interleave micro-ops as No_OpClass such that they take minimum cycles to execute and are never resource constrained. Additionally, the micro-ops over-read registers. Although one form may need to read up to 20 sources, not all do. This adds new forms so false dependencies are not modeled. Instructions read their minimum number of sources. diffstat: src/arch/arm/insts/macromem.cc| 47 +- src/arch/arm/isa/insts/neon64_mem.isa | 24 +++- 2 files changed, 56 insertions(+), 15 deletions(-) diffs (140 lines): diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/insts/macromem.cc --- a/src/arch/arm/insts/macromem.cc Tue Apr 29 16:05:02 2014 -0500 +++ b/src/arch/arm/insts/macromem.cc Wed Sep 03 07:42:44 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010-2013 ARM Limited + * Copyright (c) 2010-2014 ARM Limited * All rights reserved * * The license below extends only to copyright in the software and shall @@ -1107,9 +1107,26 @@ } for (int i = 0; i < numMarshalMicroops; ++i) { -microOps[uopIdx++] = new MicroDeintNeon64( -machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize, -numStructElems, numRegs, i /* step */); +switch (numRegs) { +case 1: microOps[uopIdx++] = new MicroDeintNeon64_1Reg( +machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize, +numStructElems, 1, i /* step */); +break; +case 2: microOps[uopIdx++] = new MicroDeintNeon64_2Reg( +machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize, +numStructElems, 2, i /* step */); +break; +case 3: microOps[uopIdx++] = new MicroDeintNeon64_3Reg( +machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize, +numStructElems, 3, i /* step */); +break; +case 4: microOps[uopIdx++] = new MicroDeintNeon64_4Reg(
+machInst, vd + (RegIndex) (2 * i), vx, eSize, dataSize, +numStructElems, 4, i /* step */); +break; +default: panic(Invalid number of registers); +} + } assert(uopIdx == numMicroops); @@ -1150,9 +1167,25 @@ unsigned uopIdx = 0; for(int i = 0; i numMarshalMicroops; ++i) { -microOps[uopIdx++] = new MicroIntNeon64( -machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize, -numStructElems, numRegs, i /* step */); +switch (numRegs) { +case 1: microOps[uopIdx++] = new MicroIntNeon64_1Reg( +machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize, +numStructElems, 1, i /* step */); +break; +case 2: microOps[uopIdx++] = new MicroIntNeon64_2Reg( +machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize, +numStructElems, 2, i /* step */); +break; +case 3: microOps[uopIdx++] = new MicroIntNeon64_3Reg( +machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize, +numStructElems, 3, i /* step */); +break; +case 4: microOps[uopIdx++] = new MicroIntNeon64_4Reg( +machInst, vx + (RegIndex) (2 * i), vd, eSize, dataSize, +numStructElems, 4, i /* step */); +break; +default: panic(Invalid number of registers); +} } uint32_t memaccessFlags = TLB::MustBeOne | (TLB::ArmFlags) eSize | diff -r 8bee5f4edb92 -r 53278be85b40 src/arch/arm/isa/insts/neon64_mem.isa --- a/src/arch/arm/isa/insts/neon64_mem.isa Tue Apr 29 16:05:02 2014 -0500 +++ b/src/arch/arm/isa/insts/neon64_mem.isa Wed Sep 03 07:42:44 2014 -0400 @@ -1,6 +1,6 @@ // -*- mode: c++ -*- -// Copyright (c) 2012-2013 ARM Limited +// Copyright (c) 2012-2014 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -163,11 +163,11 @@ header_output += MicroNeonMemDeclare64.subst(loadIop) + \ MicroNeonMemDeclare64.subst(storeIop) -def mkMarshalMicroOp(name, Name): +def mkMarshalMicroOp(name, Name, numRegs=4): global header_output, decoder_output, exec_output getInputCodeOp1L = '' -for v in range(4): +for v in range(numRegs): for p in
[gem5-dev] changeset in gem5: x86: Flag instructions that call suspend as I...
changeset 0b4d10f53c2d in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=0b4d10f53c2d description: x86: Flag instructions that call suspend as IsQuiesce The o3 cpu relies upon instructions that suspend a thread context being flagged as IsQuiesce. If they are not, unpredictable behavior can occur. This patch fixes that for the x86 ISA. diffstat: src/arch/x86/isa/decoder/two_byte_opcodes.isa | 6 +++--- src/arch/x86/isa/microops/specop.isa | 3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diffs (33 lines): diff -r 40d24a672351 -r 0b4d10f53c2d src/arch/x86/isa/decoder/two_byte_opcodes.isa --- a/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:45 2014 -0400 +++ b/src/arch/x86/isa/decoder/two_byte_opcodes.isa Wed Sep 03 07:42:46 2014 -0400 @@ -141,13 +141,13 @@ }}, IsNonSpeculative); 0x01: m5quiesce({{ PseudoInst::quiesce(xc-tcBase()); -}}, IsNonSpeculative); +}}, IsNonSpeculative, IsQuiesce); 0x02: m5quiesceNs({{ PseudoInst::quiesceNs(xc-tcBase(), Rdi); -}}, IsNonSpeculative); +}}, IsNonSpeculative, IsQuiesce); 0x03: m5quiesceCycle({{ PseudoInst::quiesceCycles(xc-tcBase(), Rdi); -}}, IsNonSpeculative); +}}, IsNonSpeculative, IsQuiesce); 0x04: m5quiesceTime({{ Rax = PseudoInst::quiesceTime(xc-tcBase()); }}, IsNonSpeculative); diff -r 40d24a672351 -r 0b4d10f53c2d src/arch/x86/isa/microops/specop.isa --- a/src/arch/x86/isa/microops/specop.isa Wed Sep 03 07:42:45 2014 -0400 +++ b/src/arch/x86/isa/microops/specop.isa Wed Sep 03 07:42:46 2014 -0400 @@ -63,7 +63,8 @@ MicroHalt(ExtMachInst _machInst, const char * instMnem, uint64_t setFlags) : X86MicroopBase(_machInst, halt, instMnem, - setFlags | (ULL(1) StaticInst::IsNonSpeculative), + setFlags | (ULL(1) StaticInst::IsNonSpeculative) | + (ULL(1) StaticInst::IsQuiesce), No_OpClass) { } ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
[gem5-dev] changeset in gem5: cpu: Fix cache blocked load behavior in o3 cpu
changeset 6be8945d226b in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=6be8945d226b description: cpu: Fix cache blocked load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. diffstat: src/cpu/o3/iew.hh | 13 +- src/cpu/o3/iew_impl.hh | 57 ++ src/cpu/o3/inst_queue.hh| 25 - src/cpu/o3/inst_queue_impl.hh | 68 ++--- src/cpu/o3/lsq.hh | 27 +- src/cpu/o3/lsq_impl.hh | 23 +--- src/cpu/o3/lsq_unit.hh | 198 --- src/cpu/o3/lsq_unit_impl.hh | 40 ++- src/cpu/o3/mem_dep_unit.hh |4 +- src/cpu/o3/mem_dep_unit_impl.hh |4 +- 10 files changed, 203 insertions(+), 256 deletions(-) diffs (truncated from 846 to 300 lines): diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew.hh --- a/src/cpu/o3/iew.hh Wed Sep 03 07:42:38 2014 -0400 +++ b/src/cpu/o3/iew.hh Wed Sep 03 07:42:39 2014 -0400 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2010-2012 ARM Limited + * Copyright (c) 2010-2012, 2014 ARM Limited * All rights reserved * * The license below extends only to copyright in the software and shall @@ -181,6 +181,12 @@ /** Re-executes all rescheduled memory instructions. */ void replayMemInst(DynInstPtr inst); +/** Moves memory instruction onto the list of cache blocked instructions */ +void blockMemInst(DynInstPtr inst); + +/** Notifies that the cache has become unblocked */ +void cacheUnblocked(); + /** Sends an instruction to commit through the time buffer. 
*/ void instToCommit(DynInstPtr inst); @@ -233,11 +239,6 @@ */ void squashDueToMemOrder(DynInstPtr inst, ThreadID tid); -/** Sends commit proper information for a squash due to memory becoming - * blocked (younger issued instructions must be retried). - */ -void squashDueToMemBlocked(DynInstPtr inst, ThreadID tid); - /** Sets Dispatch to blocked, and signals back to other stages to block. */ void block(ThreadID tid); diff -r 1ba825974ee6 -r 6be8945d226b src/cpu/o3/iew_impl.hh --- a/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:38 2014 -0400 +++ b/src/cpu/o3/iew_impl.hhWed Sep 03 07:42:39 2014 -0400 @@ -530,29 +530,6 @@ templateclass Impl void -DefaultIEWImpl::squashDueToMemBlocked(DynInstPtr inst, ThreadID tid) -{ -DPRINTF(IEW, [tid:%i]: Memory blocked, squashing load and younger insts, -PC: %s [sn:%i].\n, tid, inst-pcState(), inst-seqNum); -if (!toCommit-squash[tid] || -inst-seqNum toCommit-squashedSeqNum[tid]) { -toCommit-squash[tid] = true; - -toCommit-squashedSeqNum[tid] = inst-seqNum; -toCommit-pc[tid] = inst-pcState(); -toCommit-mispredictInst[tid] = NULL; - -// Must include the broadcasted SN in the squash. -toCommit-includeSquashInst[tid] = true; - -ldstQueue.setLoadBlockedHandled(tid); - -wroteToTimeBuffer = true; -} -} - -templateclass Impl -void DefaultIEWImpl::block(ThreadID tid) { DPRINTF(IEW, [tid:%u]: Blocking.\n, tid); @@ -610,6 +587,20 @@ templateclass Impl void +DefaultIEWImpl::blockMemInst(DynInstPtr inst) +{ +instQueue.blockMemInst(inst); +} + +templateclass Impl +void +DefaultIEWImpl::cacheUnblocked() +{ +instQueue.cacheUnblocked(); +} + +templateclass Impl +void DefaultIEWImpl::instToCommit(DynInstPtr inst) { // This function should not be called after writebackInsts in a @@ -1376,15 +1367,6 @@ squashDueToMemOrder(violator, tid); ++memOrderViolationEvents; -} else if (ldstQueue.loadBlocked(tid) - !ldstQueue.isLoadBlockedHandled(tid)) { -fetchRedirect[tid] = true; - -DPRINTF(IEW, Load operation couldn't execute because the -memory system is blocked. 
PC: %s [sn:%lli]\n, -inst-pcState(), inst-seqNum); - -squashDueToMemBlocked(inst, tid); } } else { // Reset any state associated with redirects that will not @@ -1403,17 +1385,6 @@ ++memOrderViolationEvents; } -if (ldstQueue.loadBlocked(tid) -!ldstQueue.isLoadBlockedHandled(tid)) { -DPRINTF(IEW, Load operation couldn't execute because the -memory system is blocked. PC: %s [sn:%lli]\n, -
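The replacement mechanism the patch introduces is a side list: a load that misses because the cache is blocked is parked instead of triggering a pipeline squash, and when the cache unblocks the entire list is handed back to issue for retry. A minimal sketch of that flow (integer sequence numbers stand in for instruction pointers; only `blockMemInst`/`cacheUnblocked` echo names from the patch):

```cpp
#include <cassert>
#include <list>

// Toy model of the blocked/retry bookkeeping added to the IQ.
struct InstQueueModel {
    std::list<int> blockedMemInsts;  // loads waiting on a blocked cache
    std::list<int> retryMemInsts;    // ready to be rescheduled by issue

    // Called instead of squashing when the cache rejects a load.
    void blockMemInst(int sn) { blockedMemInsts.push_back(sn); }

    // Mirrors the patch's cacheUnblocked(): move every parked load to the
    // retry list in one O(1) splice; order is preserved, nothing is lost.
    void cacheUnblocked() {
        retryMemInsts.splice(retryMemInsts.end(), blockedMemInsts);
    }
};
```

Issue then drains `retryMemInsts` subject to functional-unit availability, which is the second fix the description mentions (previously only issue width was respected).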
[gem5-dev] changeset in gem5: arm: Mark v7 cbz instructions as direct branches
changeset 5e424aa952c5 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=5e424aa952c5 description: arm: Mark v7 cbz instructions as direct branches v7 cbz/cbnz instructions were improperly marked as indirect branches. diffstat: src/arch/arm/isa/insts/branch.isa | 11 +++ src/arch/arm/isa/templates/branch.isa | 6 +- 2 files changed, 12 insertions(+), 5 deletions(-) diffs (52 lines): diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/insts/branch.isa --- a/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:39 2014 -0400 +++ b/src/arch/arm/isa/insts/branch.isa Wed Sep 03 07:42:40 2014 -0400 @@ -1,6 +1,6 @@ // -*- mode:c++ -*- -// Copyright (c) 2010-2012 ARM Limited +// Copyright (c) 2010-2012, 2014 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and shall @@ -174,12 +174,15 @@ #CBNZ, CBZ. These are always unconditional as far as predicates for (mnem, test) in ((cbz, ==), (cbnz, !=)): code = 'NPC = (uint32_t)(PC + imm);\n' +br_tgt_code = '''pcs.instNPC((uint32_t)(branchPC.instPC() + imm));''' predTest = Op1 %(test)s 0 % {test: test} iop = InstObjParams(mnem, mnem.capitalize(), BranchImmReg, -{code: code, predicate_test: predTest}, -[IsIndirectControl]) +{code: code, predicate_test: predTest, +brTgtCode : br_tgt_code}, +[IsDirectControl]) header_output += BranchImmRegDeclare.subst(iop) -decoder_output += BranchImmRegConstructor.subst(iop) +decoder_output += BranchImmRegConstructor.subst(iop) + \ + BranchTarget.subst(iop) exec_output += PredOpExecute.subst(iop) #TBB, TBH diff -r 6be8945d226b -r 5e424aa952c5 src/arch/arm/isa/templates/branch.isa --- a/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:39 2014 -0400 +++ b/src/arch/arm/isa/templates/branch.isa Wed Sep 03 07:42:40 2014 -0400 @@ -1,6 +1,6 @@ // -*- mode:c++ -*- -// Copyright (c) 2010 ARM Limited +// Copyright (c) 2010, 2014 ARM Limited // All rights reserved // // The license below extends only to copyright in the software and 
shall @@ -212,6 +212,10 @@ %(class_name)s(ExtMachInst machInst, int32_t imm, IntRegIndex _op1); %(BasicExecDeclare)s +ArmISA::PCState branchTarget(const ArmISA::PCState &branchPC) const; + +/// Explicitly import the otherwise hidden branchTarget +using StaticInst::branchTarget; }; }};
[gem5-dev] changeset in gem5: cpu: Fix o3 quiesce fetch bug
changeset 1ba825974ee6 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=1ba825974ee6 description: cpu: Fix o3 quiesce fetch bug O3 is supposed to stop fetching instructions once a quiesce is encountered. However, due to a bug, it would continue fetching instructions from the current fetch buffer. This is because of a break statement that only broke out of the first of 2 nested loops. It should have broken out of both. diffstat: src/cpu/o3/fetch_impl.hh | 8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diffs (34 lines): diff -r ed05298e8566 -r 1ba825974ee6 src/cpu/o3/fetch_impl.hh --- a/src/cpu/o3/fetch_impl.hh Wed Sep 03 07:42:37 2014 -0400 +++ b/src/cpu/o3/fetch_impl.hh Wed Sep 03 07:42:38 2014 -0400 @@ -1236,6 +1236,9 @@ // ended this fetch block. bool predictedBranch = false; +// Need to halt fetch if quiesce instruction detected +bool quiesce = false; + TheISA::MachInst *cacheInsts = reinterpret_cast<TheISA::MachInst *>(fetchBuffer[tid]); @@ -1246,7 +1249,7 @@ // Keep issuing while fetchWidth is available and branch is not // predicted taken while (numInst < fetchWidth && fetchQueue[tid].size() < fetchQueueSize -&& !predictedBranch) { +&& !predictedBranch && !quiesce) { // We need to process more memory if we aren't going to get a // StaticInst from the rom, the current macroop, or what's already // in the decoder. @@ -1363,9 +1366,10 @@ if (instruction->isQuiesce()) { DPRINTF(Fetch, -"Quiesce instruction encountered, halting fetch!"); +"Quiesce instruction encountered, halting fetch!\n"); fetchStatus[tid] = QuiescePending; status_change = true; +quiesce = true; break; } } while ((curMacroop || decoder[tid]->instReady())
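The bug class here is worth isolating: a `break` only exits the innermost loop, so the outer fetch loop kept issuing from the buffer after the quiesce. The fix is a flag both loops test. A standalone illustration (the function and its inputs are hypothetical, modeled loosely on the fetch/decode loop structure):

```cpp
#include <cassert>
#include <vector>

// Count instructions fetched up to and including the first quiesce.
// Each inner vector models one fetch block; `true` marks a quiesce inst.
int fetchUntilQuiesce(const std::vector<std::vector<bool>> &fetchBlocks)
{
    int fetched = 0;
    bool quiesce = false;
    for (const auto &block : fetchBlocks) {      // outer loop over blocks
        if (quiesce)
            break;                               // flag stops the outer loop
        for (bool isQuiesce : block) {           // inner loop within a block
            ++fetched;
            if (isQuiesce) {
                quiesce = true;  // without this flag, only the inner loop
                break;           // would stop -- the pre-patch bug
            }
        }
    }
    return fetched;
}
```

Dropping the `quiesce` flag reproduces the pre-patch behavior: fetch resumes with the next block even though the thread should be quiescing.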
[gem5-dev] changeset in gem5: arm: Make memory ops work on 64bit/128-bit qu...
changeset d96b61d843b2 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=d96b61d843b2 description: arm: Make memory ops work on 64bit/128-bit quantities Multiple instructions assume only 32-bit load operations are available, this patch increases load sizes to 64-bit or 128-bit for many load pair and load multiple instructions. diffstat: src/arch/arm/insts/macromem.cc | 388 ++- src/arch/arm/insts/macromem.hh | 22 +- src/arch/arm/isa/insts/ldr64.isa| 90 +++--- src/arch/arm/isa/insts/macromem.isa | 24 +- src/arch/arm/isa/insts/mem.isa |4 +- src/arch/arm/isa/templates/macromem.isa | 35 ++- 6 files changed, 355 insertions(+), 208 deletions(-) diffs (truncated from 864 to 300 lines): diff -r b5bef3c8e070 -r d96b61d843b2 src/arch/arm/insts/macromem.cc --- a/src/arch/arm/insts/macromem.ccFri Jun 27 12:29:00 2014 -0500 +++ b/src/arch/arm/insts/macromem.ccWed Sep 03 07:42:52 2014 -0400 @@ -61,14 +61,29 @@ { uint32_t regs = reglist; uint32_t ones = number_of_ones(reglist); -// Remember that writeback adds a uop or two and the temp register adds one -numMicroops = ones + (writeback ? (load ? 2 : 1) : 0) + 1; +uint32_t mem_ops = ones; -// It's technically legal to do a lot of nothing -if (!ones) +// Copy the base address register if we overwrite it, or if this instruction +// is basically a no-op (we have to do something) +bool copy_base = (bits(reglist, rn) load) || !ones; +bool force_user = user !bits(reglist, 15); +bool exception_ret = user bits(reglist, 15); +bool pc_temp = load writeback bits(reglist, 15); + +if (!ones) { numMicroops = 1; +} else if (load) { +numMicroops = ((ones + 1) / 2) ++ ((ones % 2 == 0 exception_ret) ? 1 : 0) ++ (copy_base ? 1 : 0) ++ (writeback? 1 : 0) ++ (pc_temp ? 1 : 0); +} else { +numMicroops = ones + (writeback ? 1 : 0); +} microOps = new StaticInstPtr[numMicroops]; + uint32_t addr = 0; if (!up) @@ -81,94 +96,129 @@ // Add 0 to Rn and stick it in ureg0. // This is equivalent to a move. 
-*uop = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0); +if (copy_base) +*uop++ = new MicroAddiUop(machInst, INTREG_UREG0, rn, 0); unsigned reg = 0; -unsigned regIdx = 0; -bool force_user = user !bits(reglist, 15); -bool exception_ret = user bits(reglist, 15); +while (mem_ops != 0) { +// Do load operations in pairs if possible +if (load mem_ops = 2 +!(mem_ops == 2 bits(regs,INTREG_PC) exception_ret)) { +// 64-bit memory operation +// Find 2 set register bits (clear them after finding) +unsigned reg_idx1; +unsigned reg_idx2; -for (int i = 0; i ones; i++) { -// Find the next register. -while (!bits(regs, reg)) -reg++; -replaceBits(regs, reg, 0); +// Find the first register +while (!bits(regs, reg)) reg++; +replaceBits(regs, reg, 0); +reg_idx1 = force_user ? intRegInMode(MODE_USER, reg) : reg; -regIdx = reg; -if (force_user) { -regIdx = intRegInMode(MODE_USER, regIdx); -} +// Find the second register +while (!bits(regs, reg)) reg++; +replaceBits(regs, reg, 0); +reg_idx2 = force_user ? intRegInMode(MODE_USER, reg) : reg; -if (load) { -if (writeback i == ones - 1) { -// If it's a writeback and this is the last register -// do the load into a temporary register which we'll move -// into the final one later -*++uop = new MicroLdrUop(machInst, INTREG_UREG1, INTREG_UREG0, -up, addr); -} else { -// Otherwise just do it normally -if (reg == INTREG_PC exception_ret) { -// This must be the exception return form of ldm. -*++uop = new MicroLdrRetUop(machInst, regIdx, - INTREG_UREG0, up, addr); +// Load into temp reg if necessary +if (reg_idx2 == INTREG_PC pc_temp) +reg_idx2 = INTREG_UREG1; + +// Actually load both registers from memory +*uop = new MicroLdr2Uop(machInst, reg_idx1, reg_idx2, +copy_base ? INTREG_UREG0 : rn, up, addr); + +if (!writeback reg_idx2 == INTREG_PC) { +// No writeback if idx==pc, set appropriate flags +(*uop)-setFlag(StaticInst::IsControl); +(*uop)-setFlag(StaticInst::IsIndirectControl); + +if (!(condCode == COND_AL || condCode == COND_UC)) +
Re: [gem5-dev] bi-mode branch predictor miss prediction rate is high
A bug was recently found in the bimodal predictor. If you are still looking at this, you might want to try a new checkout. Hope this helps. On Wed, Jul 2, 2014 at 4:52 PM, Zi Yan via gem5-dev gem5-dev@gem5.org wrote: I get 5 100-million-instruction simpoints for each benchmark in SPEC CPU 2006 with *ref input*. I am using cross-tool arm-cortex_a15-linux-gnueabi-gcc version 4.8.2 to compile. For gcc, I got from 0.2% to 5% miss rate from tournament, but 3% to 22% miss rate from bi-mode cross all simpoints. Most weird part is hmmer, I got from 0.3% to 0.5% miss rate from tournament, but 52% to 60% miss rate from bi-mode. -- Best Regards Yan Zi On 2 Jul 2014, at 17:11, Anthony Gutierrez via gem5-dev wrote: This could depend on a lot of factors. How are you running the benchmarks? E.g., running SPEC 2k6's gcc to completion with the train input set in FS mode yields a 6.45% miss rate for bi-mode, while the tournament predictor yields a 7.12% miss rate. Anthony Gutierrez http://web.eecs.umich.edu/~atgutier On Wed, Jul 2, 2014 at 4:37 PM, Zi Yan via gem5-dev gem5-dev@gem5.org wrote: Hi, I just updated gem5-dev and got bi-mode as ARM's default branch predictor. I got mis-prediction rate (system.cpu.branchPred.condIncorrect/system.cpu.branchPred.condPredicted) ranging from 10% to 60%, whereas I saw mis-prediction rate ranging from 1% to 9% with tournament for SPEC CPU 2006 benchmarks. Should I expect this from bi-mode? Thanks. -- Best Regards Yan Zi ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
[gem5-dev] changeset in gem5: mips: Fix RLIMIT_RSS naming
changeset b7715fb7cf9f in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=b7715fb7cf9f description: mips: Fix RLIMIT_RSS naming MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with RLIMIT_RSS from the host system. Broke clang+MacOS build. diffstat: src/arch/mips/linux/linux.hh | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diffs (12 lines): diff -r 4593282280e4 -r b7715fb7cf9f src/arch/mips/linux/linux.hh --- a/src/arch/mips/linux/linux.hh Tue Aug 26 10:13:28 2014 -0400 +++ b/src/arch/mips/linux/linux.hh Tue Aug 26 10:13:31 2014 -0400 @@ -117,7 +117,7 @@ /// Resource constants for getrlimit() (overide some generics). static const unsigned TGT_RLIMIT_NPROC = 8; static const unsigned TGT_RLIMIT_AS = 6; -static const unsigned RLIMIT_RSS = 7; +static const unsigned TGT_RLIMIT_RSS = 7; static const unsigned TGT_RLIMIT_NOFILE = 5; static const unsigned TGT_RLIMIT_MEMLOCK = 9; ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
Re: [gem5-dev] Review Request 2332: cpu: Fix cache blocked load behavior in o3 cpu
On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/inst_queue.hh, line 322 http://reviews.gem5.org/r/2332/diff/1/?file=40490#file40490line322 I think we need better differentiation between this list and the one declared after it. On further reading, it seems that we may not need the two lists. Can we just mark the instructions that they should be retried? While adding them back to the ready queue, we can check which ones are marked. Or maybe keep an iterator that tracks the point up to which we should retry. One more thought: can we do with a queue instead of a list? That could be done. Real hardware would likely mark these blocked instructions with a bit to guard them from execution in the IQ. Clearing the blocked cause (cache blocked in this case) would flash-clear that guard bit for all load instructions in the IQ. Functionally these two approaches should be equivalent. The same alternative implementation could have been done for deferred memory instructions. I just coded it similarly to how deferred instructions were already handled. I haven't looked to verify how easy this alternative implementation would be to do. A queue could be used; the only difference would be that the clear() would be replaced with a swap with an empty queue. On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/inst_queue_impl.hh, line 416 http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line416 Should we not clear retryMemInsts as well? Yep, will add. On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/lsq_unit.hh, line 888 http://reviews.gem5.org/r/2332/diff/1/?file=40494#file40494line888 Do we need all these changes that appear over the next 15-20 lines? It seems from my initial reading that the previous code structure could have been retained. I moved it below during the coding to clearly show the deletion events. I hit a few issues due to bad deletions and split loads.
Having all deletions done together and not mixed in with code segments above made it clearer to me. On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/inst_queue_impl.hh, line 759 http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line759 Let's retain the new line above this while loop. Ok. On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/inst_queue_impl.hh, line 1116 http://reviews.gem5.org/r/2332/diff/1/?file=40491#file40491line1116 We should use nullptr now that we have gcc minimum dependency at 4.6. Hmm, at one point in this patch or another I tried to return nullptr and someone took issue with it, I believe. Either way, personally I don't care. Grepping our code base (this might be different from the mainline gem5), it seems we only use nullptr eleven times, and only in network-related code. Should a later consistency patch try to change this? Because currently it looks like NULL is the preferred style. $ grep -r NULL * | wc -l 1515 $ grep -r nullptr * | wc -l 11 On Aug. 19, 2014, 5:11 p.m., Nilay Vaish wrote: src/cpu/o3/lsq_unit_impl.hh, line 111 http://reviews.gem5.org/r/2332/diff/1/?file=40495#file40495line111 New line after. Will do. - Mitch --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2332/#review5272 --- On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2332/ --- (Updated Aug. 13, 2014, 2:06 p.m.) Review request for Default. Repository: gem5 Description --- Changeset 10300:bddebc19285f --- cpu: Fix cache blocked load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width).
This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. Diffs - src/cpu/o3/iew.hh 79fde1c67ed8 src/cpu/o3/iew_impl.hh 79fde1c67ed8 src/cpu/o3/inst_queue.hh 79fde1c67ed8 src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8 src/cpu/o3/lsq.hh 79fde1c67ed8 src/cpu/o3/lsq_impl.hh 79fde1c67ed8 src/cpu/o3/lsq_unit.hh 79fde1c67ed8 src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8 src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8 src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8 Diff:
Re: [gem5-dev] Review Request 2332: cpu: Fix cache blocked load behavior in o3 cpu
Hi, I'm the one who wrote this patch. *The opening comment in the patch states that it is trying to do two things. I would suggest that we split the patch.* Related code was already being changed by this patch, and going out of the way to handle blocked and deferred memory instructions differently from each other seemed wrong. They are both cases of memory instructions replayed for different reasons. A side effect of treating them the same is that the resources are now properly modeled. *I think we should not drop the original behaviour. Firstly, it was not incorrect. Secondly, no reason has been provided as to why the behaviour implemented should be preferred. Are we sure that most out-of-order processors would choose the proposed over the original?* The existing replay logic in gem5 attempts to model that which was present in the Alpha 21264 ( http://www.ece.cmu.edu/~ece447/s14/lib/exe/fetch.php?media=21264hrm.pdf). Specifically: *There are some situations in which a load or store instruction cannot be executed due to a condition that occurs after that instruction issues from the IQ or FQ. The instruction is aborted (along with all newer instructions) and restarted from the fetch stage of the pipeline. This mechanism is called a replay trap.* However, it is doubtful that any modern processor would desire this behavior. The current symptom is that o3 repeatedly re-fetches and executes the same sequence of instructions unnecessarily multiple times until the cache finally becomes unblocked. This behavior is more detrimental to performance and power efficiency than even the P4's replay mechanism, which was so power hungry it has its own Wikipedia entry ( http://en.wikipedia.org/wiki/Replay_system). The P4 was more conservative than gem5 in that it didn't have to re-fetch. Nowadays, due to power considerations, replay events are rarer and more selective in modern processors.
Mikko previously had a paper explaining replay mechanisms and the importance of limited replay in modern processors ( http://pharm.ece.wisc.edu/papers/hpca2004ikim.pdf). Though truthful to the 21264, complaints about the existing logic have already been brought up multiple times on the mailing list and in Karu's WDDD paper. Since gem5 is a research simulator and not a historical tribute to the 21264, I see little value in keeping the old semantics. It would add complication to the code and likely add infrequently tested code paths. Hope that clears up the reasoning behind dropping the old functionality. On Sat, Aug 16, 2014 at 11:01 AM, Nilay Vaish via gem5-dev gem5-dev@gem5.org wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2332/#review5261 --- Two points that I would like to make: * The opening comment in the patch states that it is trying to do two things. I would suggest that we split the patch. * I think we should not drop the original behaviour. Firstly, it was not incorrect. Secondly, no reason has been provided as to why the behaviour implemented should be preferred. Are we sure that most out-of-order processors would choose the proposed over the original? - Nilay Vaish On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2332/ --- (Updated Aug. 13, 2014, 2:06 p.m.) Review request for Default. Repository: gem5 Description --- Changeset 10300:bddebc19285f --- cpu: Fix cached block load behavior in o3 cpu This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. 
Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior. Diffs - src/cpu/o3/iew.hh 79fde1c67ed8 src/cpu/o3/iew_impl.hh 79fde1c67ed8 src/cpu/o3/inst_queue.hh 79fde1c67ed8 src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8 src/cpu/o3/lsq.hh 79fde1c67ed8 src/cpu/o3/lsq_impl.hh 79fde1c67ed8 src/cpu/o3/lsq_unit.hh 79fde1c67ed8 src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8 src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8 src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8 Diff: http://reviews.gem5.org/r/2332/diff/ Testing --- Thanks, Andreas Hansson ___ gem5-dev mailing list gem5-dev@gem5.org
[gem5-dev] changeset in gem5: ext: clang fix for flexible array members
changeset 0edd36ea6130 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=0edd36ea6130 description: ext: clang fix for flexible array members Changes how flexible array members are defined so clang does not error out during compilation. diffstat: ext/dnet/os.h | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diffs (13 lines):

diff -r 763f76d5dea7 -r 0edd36ea6130 ext/dnet/os.h
--- a/ext/dnet/os.h Sun Aug 10 05:39:40 2014 -0400
+++ b/ext/dnet/os.h Wed Aug 13 06:57:19 2014 -0400
@@ -98,7 +98,8 @@
 /* Support for flexible arrays. */
 #undef __flexarr
-#if defined(__GNUC__) && ((__GNUC__ > 2) || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97))
+#if !defined(__clang__) && defined(__GNUC__) && \
+    ((__GNUC__ > 2) || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97))
 /* GCC 2.97 supports C99 flexible array members. */
 # define __flexarr []
 #else
___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
Re: [gem5-dev] Review Request 2292: python: Change parsing of Addr so hex values work from scripts
On June 20, 2014, 2:36 a.m., Steve Reinhardt wrote: Why is Addr being parsed using toMemorySize() in the first place? That seems wrong. At least some of the places Addr is used with a size (like RealView.max_mem_size), I think the problem is that the param should really be a Param.MemorySize to begin with. Hi Steve, I'm the one that made this edit. I agree that the call to toMemorySize() on an address seems strange, but it's an idiom that seems pretty well spread throughout gem5. Saying that something lives at 512MB for an address is used in multiple places. 1. It's pretty well baked into the AddrRange() param as most places that call it give it a starting address or size in MB, GB, etc which is then directly passed to Addr() within the param class. common/FSConfig.py 143:self.mem_ranges = [AddrRange(Addr('1MB'), size = '64MB'), 144: AddrRange(Addr('2GB'), size ='256MB')] 405:self.mem_ranges = [AddrRange('3GB'), 406:AddrRange(Addr('4GB'), size = excess_mem_size)] 2. It's also used on other systems for arithmetic. src/arch/sparc/SparcSystem.py 59:hypervisor_addr = Param.Addr(Addr('64kB') + _rom_base, 61:openboot_addr = Param.Addr(Addr('512kB') + _rom_base, It could be changed over, but this would require changing multiple other places in the code. Changing these places, which are really trying to directly set an Addr to MemorySize, also seems wrong though. In the FSConfig.py example, we're trying to specify the starting address, not a MemorySize. Either way is going to have some bad semantics (unless we get rid of the ability to nicely specify starting addresses via sizes). - Mitch --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2292/#review5144 --- On June 12, 2014, 10:47 p.m., Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2292/ --- (Updated June 12, 2014, 10:47 p.m.) Review request for Default. 
Repository: gem5 Description --- Changeset 10240:d4f21d820604 --- python: Change parsing of Addr so hex values work from scripts When passed from a configuration script with a hexadecimal value (like 0x8000), gem5 would error out. This is because it would call toMemorySize which requires the argument to end with a size specifier (like 1MB, etc). This modification makes it so raw hex values can be passed through Addr parameters from the configuration scripts. Diffs - src/arch/arm/ArmSystem.py a2bb75a474fd src/python/m5/params.py a2bb75a474fd Diff: http://reviews.gem5.org/r/2292/diff/ Testing --- Thanks, Ali Saidi ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
Re: [gem5-dev] Proposal to untemplate the o3 CPU
*Would it be possible to split this change into a series of smaller patches?* Thinking about what could be easily split off. 1) The moving of cpu/base_dyn_inst_impl.hh into o3's DynInst 2) The changing of the checker templating Both of those could go in before the full untemplating patch. But I'd guess those are only ~1k lines of the 35k. I'm unsure how easy the rest would be to part out given how many cross dependencies exist. I'll try to re-do it for a single stage, to see if it is possible, but expect to discover pain. This patch was originally written in late December, so some re-basing work is needed to bring it up to date. Luckily, other than Tony's fetch patches and the recent review requests by Steve, not many people have made significant o3 changes. Any feedback from here can go into the rebasing effort. *and I suspect RB would just fall over.* Luckily it doesn't. It ended up being split across 3 very large pages on the internal ARM review board. *Have you been able to test that un-templated o3 passes all the regressions as well?* Yes, the patch passed all of the regression tests. It makes no difference in the stats. On Thu, May 15, 2014 at 10:56 AM, Korey Sewell via gem5-dev gem5-dev@gem5.org wrote: Fair points Ali and Tony. I think at the end of the day a 35k line patch (if it is that) will be a pain to review no matter how you slice/dice it although I'd still maintain at least dicing it into pieces would allow the reviewers to handle it better. If people all agree that this is the way to go, then maybe Mitch should just go ahead and provide the full patch to the RB (no matter how gi-normous!). This at least keeps us in the process of local_patch-RB-commit. I'm doubtful that any one person is going to get to 35k lines whether it is one patch or multiple patches anyway. -Korey On Thu, May 15, 2014 at 8:39 AM, Anthony Gutierrez via gem5-dev gem5-dev@gem5.org wrote: I like the idea of this patch as well. 
In fact, the templating doesn't really help with extending the CPU model in my experience. As far as splitting it up into multiple smaller patches, I don't think that is necessary or really a good idea unless the changes are truly independent. Instead of having a single 35k line patch, we'll have many (tens or hundreds?) of patches that add up to 35k lines. Anthony Gutierrez http://web.eecs.umich.edu/~atgutier On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev gem5-dev@gem5.org wrote: Hi Mitch/gem5-ers, I think I would support the untemplating movement as well, so you have my approval there too :) With regards to implementation, I agree that splitting up the patches makes for a better review although it will take a small amount of work to get a reasonable splitting granularity. Have you been able to test that un-templated o3 passes all the regressions as well? Lastly, once this is reviewed you'll need to coordinate with people who have outstanding o3 patches as it could be a real pain for them to pull in a patch that all of a sudden untemplates the o3 code. -Korey On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev gem5-dev@gem5.org wrote: Hi Mitch, In general, I like the idea of removing some of the pointless/awkward templates we have in gem5. I would definitely support moving in this direction. However, I really dislike the idea of reviewing a 32k line patch. Reviewing such a patch would be a headache and I suspect RB would just fall over. Would it be possible to split this change into a series of smaller patches? For example, you could split it into one patch per functional unit and a final patch that does some cleaning up. You could probably just 'fake' new un-templated class names as typedefs in the relevant header files. //Andreas On 2014-05-13 18:23, Mitch Hayenga via gem5-dev wrote: Hi All, Recently I have written a patch that removes templating from the o3 cpu. 
In general templating in o3 makes the code significantly more verbose, adds compile time overheads, and doesn't actually benefit performance. The templating is largely pointless as 1) there aren't multiple versions of fetch, rename, etc to make the compile time Impl pattern worth doing 2) Modern CPUs have indirect branch predictors that hide the penalties that the templating was trying to mask. *I was wondering what peoples feelings were on a patch of this sort? * It is a quite large modification (~35k line patch file, changes almost all localized to the o3 directory). Many of the lines are simply because the impl header files were changed to source files. Here are a few benefits of the patch - Cleaner, less verbose code. - Due to the current templating
[gem5-dev] Proposal to untemplate the o3 CPU
Hi All, Recently I have written a patch that removes templating from the o3 cpu. In general, templating in o3 makes the code significantly more verbose, adds compile-time overhead, and doesn't actually benefit performance. The templating is largely pointless because 1) there aren't multiple versions of fetch, rename, etc. to make the compile-time Impl pattern worth doing, and 2) modern CPUs have indirect branch predictors that hide the penalties that the templating was trying to mask. *I was wondering what people's feelings were on a patch of this sort?* It is a quite large modification (~35k line patch file, with changes almost all localized to the o3 directory). Many of the lines are simply because the impl header files were changed to source files. Here are a few benefits of the patch:
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires rebuilding the function execution signatures (o3_cpu_exec.o) when a modification is made to the o3 cpu. This patch eliminates having to rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved base_dyn_inst_impl.hh into o3; it's too dependent on o3 as is. No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst) instead of an Impl like o3. Seems like it was coded dependently on o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible, 0.0001%

*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s  user 20m20.019s  sys 1m6.721s
real 21m29.963s  user 20m17.016s  sys 1m7.108s
*Untemplated*
real 21m24.396s  user 20m13.158s  sys 1m5.798s
real 21m23.177s  user 20m11.911s  sys 1m5.843s

*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s  user 67m20.828s  sys 2m2.292s
*Untemplated*
real 11m42.167s  user 67m7.572s  sys 2m2.056s

*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63  Run 2) 1167.50  Run 3) 1172.06
*Untemplated*
Run 1) 1142.29  Run 2) 1154.49  Run 3) 1165.53

*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s

Any thoughts on eventually merging this? ___ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev
[gem5-dev] changeset in gem5: mem: Squash prefetch requests from downstream...
changeset 5c2c4195b839 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=5c2c4195b839 description: mem: Squash prefetch requests from downstream caches This patch squashes prefetch requests from downstream caches, so that they do not steal cachelines away from caches closer to the cpu. It was originally coded by Mitch Hayenga and modified by Aasheesh Kolli. diffstat: src/mem/cache/cache_impl.hh | 39 +++ src/mem/cache/mshr_queue.cc | 16 src/mem/cache/mshr_queue.hh | 6 ++ src/mem/packet.hh | 4 4 files changed, 65 insertions(+), 0 deletions(-) diffs (133 lines):

diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/cache_impl.hh Fri May 09 18:58:46 2014 -0400
@@ -1394,6 +1394,12 @@
             if (snoopPkt.sharedAsserted()) {
                 pkt->assertShared();
             }
+            // If this request is a prefetch and an
+            // upper level squashes the prefetch request,
+            // make sure to propogate the squash to the requester.
+            if (snoopPkt.prefetchSquashed()) {
+                pkt->setPrefetchSquashed();
+            }
         } else {
             cpuSidePort->sendAtomicSnoop(pkt);
             if (!alreadyResponded && pkt->memInhibitAsserted()) {
@@ -1420,6 +1426,17 @@
     bool respond = blk->isDirty() && pkt->needsResponse();
     bool have_exclusive = blk->isWritable();
+    // Invalidate any prefetch's from below that would strip write permissions
+    // MemCmd::HardPFReq is only observed by upstream caches. After missing
+    // above and in it's own cache, a new MemCmd::ReadReq is created that
+    // downstream caches observe.
+    if (pkt->cmd == MemCmd::HardPFReq) {
+        DPRINTF(Cache, "Squashing prefetch from lower cache %#x\n",
+                pkt->getAddr());
+        pkt->setPrefetchSquashed();
+        return;
+    }
+
     if (pkt->isRead() && !invalidate) {
         assert(!needs_exclusive);
         pkt->assertShared();
@@ -1503,6 +1520,14 @@
     Addr blk_addr = blockAlign(pkt->getAddr());
     MSHR *mshr = mshrQueue.findMatch(blk_addr, is_secure);
+    // Squash any prefetch requests from below on MSHR hits
+    if (mshr && pkt->cmd == MemCmd::HardPFReq) {
+        DPRINTF(Cache, "Squashing prefetch from lower cache on mshr hit %#x\n",
+                pkt->getAddr());
+        pkt->setPrefetchSquashed();
+        return;
+    }
+
     // Let the MSHR itself track the snoop and decide whether we want
     // to go ahead and do the regular cache snoop
     if (mshr && mshr->handleSnoop(pkt, order++)) {
@@ -1730,6 +1755,20 @@
         snoop_pkt.senderState = mshr;
         cpuSidePort->sendTimingSnoopReq(&snoop_pkt);
+        // Check to see if the prefetch was squashed by an upper cache
+        if (snoop_pkt.prefetchSquashed()) {
+            DPRINTF(Cache, "Prefetch squashed by upper cache. "
+                    "Deallocating mshr target %#x.\n", mshr->addr);
+
+            // Deallocate the mshr target
+            if (mshr->queue->forceDeallocateTarget(mshr)) {
+                // Clear block if this deallocation resulted freed an
+                // mshr when all had previously been utilized
+                clearBlocked((BlockedCause)(mshr->queue->index));
+            }
+            return NULL;
+        }
+
         if (snoop_pkt.memInhibitAsserted()) {
             markInService(mshr, snoop_pkt);
             DPRINTF(Cache, "Upward snoop of prefetch for addr

diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/mshr_queue.cc Fri May 09 18:58:46 2014 -0400
@@ -232,6 +232,22 @@
     mshr->readyIter = addToReadyList(mshr);
 }
+bool
+MSHRQueue::forceDeallocateTarget(MSHR *mshr)
+{
+    bool was_full = isFull();
+    assert(mshr->hasTargets());
+    // Pop the prefetch off of the target list
+    mshr->popTarget();
+    // Delete mshr if no remaining targets
+    if (!mshr->hasTargets() && !mshr->promoteDeferredTargets()) {
+        deallocateOne(mshr);
+    }
+
+    // Notify if MSHR queue no longer full
+    return was_full && !isFull();
+}
+
 void
 MSHRQueue::squash(int threadNum)
 {

diff -r 3ab094e72dad -r 5c2c4195b839 src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh Fri May 09 18:58:46 2014 -0400
+++ b/src/mem/cache/mshr_queue.hh Fri May 09 18:58:46 2014 -0400
@@ -194,6 +194,12 @@
     void squash(int threadNum);
     /**
+     * Deallocate top target, possibly freeing the MSHR
+     * @return if MSHR queue is no longer full
+     */
+    bool forceDeallocateTarget(MSHR *mshr);
+
+    /**
      * Returns true if the pending list is not empty.
      * @return True if there are outstanding requests.
      */
 diff